can not reproduce $6000 #12103

taishan1994 · 2025-02-28T06:48:28Z


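Here's my machine configuration:
AMD dual-socket server:
CPU: 2 x AMD EPYC 9115 / 2.60 GHz / 64 MB cache / 16C/32T / 165 W
Memory: 24 x 32 GB DDR5 5600 MHz REG (consumer grade), 768 GB total
SSD: 1 x 3.84 TB SATA 6 Gb/s, 2.5-inch, read-intensive
HDD: 1 x 4 TB SATA 7200 RPM, 3.5-inch, enterprise grade
RAID card: 1 x LSI 9560-8I 4G, supports RAID 0, 1, 5, 6, 10, 50, 60
NIC: 1 x dual-port Gigabit copper, RJ45, I350-T2
Onboard: 2 x 10 Gigabit copper ports, Broadcom® BCM57416
Power supply: 1200 W redundant, 1+1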

When I tested with the sample prompt used by the $6000 build, I got only 3.0 tokens per second (TPS), while the $6000 build reported 5.4.

llama-cli -m model_path --temp 0.6 -no-cnv -c 16384 -p "<丨User丨>How many Rs are there in strawberry?<丨Assistant丨>"

ejrydhfs · 2025-03-02T10:25:43Z


Are you running llama.cpp on a virtual machine by any chance? I noticed you were using llama.cpp as root and with some tabs at the top of the image. Virtualization could be the reason why the TPS is lower.


ejrydhfs · 2025-03-02T10:47:11Z


Also is this 6000? https://openi.pcl.ac.cn/6000/aiforge

2 replies

taishan1994 Mar 3, 2025
Author

Also is this 6000? https://openi.pcl.ac.cn/6000/aiforge

The screenshot above was taken inside a container. We will test directly on the host later. On average, Q8_0 decodes at about 3.3 tokens per second. The $6000 setup we are referring to is this one: https://x.com/carrigmat/status/1884244369907278106

ejrydhfs Mar 10, 2025

Please forgive me if you have done this already but did you do this?

And that's your system! Put it all together and throw Linux on it. Also, an important tip: Go into the BIOS and set the number of NUMA groups to 0. This will ensure that every layer of the model is interleaved across all RAM chips, doubling our throughput. Don't forget!

Sounds like this problem is rather common. If you still experience issues, I would look at the CPU temperatures to see if they are overheating. Maybe there's still a sticker between the heatsink and a CPU.
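
A quick way to confirm that the BIOS change took effect is to count the NUMA nodes Linux exposes. This is only a minimal sketch, assuming a Linux host with the usual sysfs layout under /sys/devices/system/node; the function name count_numa_nodes and its optional path argument are made up for illustration. On a dual-socket EPYC system set to NPS0 it should normally report 1 node, while 2 or more suggests memory is still split per socket.

```shell
#!/usr/bin/env bash
# Count the NUMA nodes visible to the OS by counting nodeN directories in sysfs.
# The sysfs root is an optional argument (hypothetical convenience) so the
# function can also be pointed at a different directory tree.
count_numa_nodes() {
  local root="${1:-/sys/devices/system/node}"
  local n=0
  local d
  for d in "$root"/node[0-9]*; do
    # If the glob matched nothing, "$d" is the literal pattern and is skipped.
    if [ -d "$d" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

echo "NUMA nodes visible to the OS: $(count_numa_nodes)"
```

If numactl is installed, `numactl --hardware` shows the same node count along with per-node memory sizes.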

taishan1994 · 2025-03-10T08:59:20Z


I have matched the reported result now. I needed to set NUMA to NPS0 and enable memory interleaving.

