Hi everyone,
I am new to Reddit, I started testing with local LLMs using a Xeon W2255, 128GB RAM, and 2x RTX 3080s, and everything ran smoothly. Since my primary goal was inference, I initially upgraded to two AMD R9700s to get more VRAM.
The project is working well so far, so I'm moving to the next step with new hardware. My pipeline requires an LLM, a VLM, and a RAG system (including Embeddings and Reranking).
I have now purchased two additional R9700s and plan to build a Threadripper 9955WX Pro system with 128GB DDR5 housing the four R9700s, which will be dedicated exclusively to running vLLM. My old Xeon W2255 system would remain in service to handle the VLM and the rest of the workload, with both systems connected directly via...
Hi everyone,
I am new to Reddit, I started testing with local LLMs using a Xeon W2255, 128GB RAM, and 2x RTX 3080s, and everything ran smoothly. Since my primary goal was inference, I initially upgraded to two AMD R9700s to get more VRAM.
The project is working well so far, so I'm moving to the next step with new hardware. My pipeline requires an LLM, a VLM, and a RAG system (including Embeddings and Reranking).
I have now purchased two additional R9700s and plan to build a Threadripper 9955WX Pro system with 128GB DDR5 housing the four R9700s, which will be dedicated exclusively to running vLLM. My old Xeon W2255 system would remain in service to handle the VLM and the rest of the workload, with both systems connected directly via a 10Gb network.
My original plan was to put everything into the Threadripper build and run 6x R9700s, but it feels like going beyond 4 GPUs in one system introduces too many extra problems.
I just wanted to hear your thoughts on this plan. Also, since I haven't found much info on 4x R9700 systems yet, let me know if there are specific models you'd like me to test. Currently, I’m planning to run gpt-oss 120b.