A client involved in machine learning research using large-scale language models contacted us to inquire about introducing a high-performance PC for the purpose of local verification.
The client's budget was around 2.5 to 5 million yen, and they wanted a workstation equipped with as many high-performance GPUs as possible, with the intention of using Ollama.
Pre-installation of GPU drivers, CUDA Toolkit, and cuDNN is also required.
It is assumed that the device will be installed in a 200V power supply environment.
| Component | Specification |
| --- | --- |
| CPU | Intel Xeon w3-2525 3.5GHz (up to 4.5GHz with Turbo Boost), 8C/16T |
| Memory | 256GB total, DDR5-5600 Registered ECC (64GB x 4) |
| Storage | 1TB SATA SSD |
| GPU | NVIDIA RTX PRO 6000 Max-Q 96GB x 2 |
| Network | Onboard (2.5GbE x 1 / 10GbE x 1) |
| Chassis + PSU | Mid-tower chassis + 1600W 80PLUS Platinum |
| OS | Ubuntu 24.04 |
| Other | 200V 12A power cable (C19 – C14); setup of GPU driver, CUDA Toolkit, cuDNN |
GPU selection
In this application, balancing GPU performance and VRAM capacity is important.
Depending on the LLM, around 140GB of VRAM may be required, so we recommended a configuration equipped with two NVIDIA RTX PRO 6000 Max-Q (96GB) cards.
In addition, the design includes an available PCIe slot to accommodate the addition of a third RTX PRO 6000 Max-Q.
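As a rough sanity check on the 140GB figure, a model's weight footprint can be estimated as parameter count × bytes per parameter. The helper below is an illustrative sketch (the function name and the 70B example are our own assumptions, not figures from the case):

```python
def estimate_weight_vram_gb(params_billion: float, bytes_per_param: float) -> float:
    """Rough VRAM needed just for model weights, in GB (1 GB = 1e9 bytes).

    Ignores KV cache and activation overhead, which add further headroom.
    """
    return params_billion * bytes_per_param

# A 70B-parameter model in FP16 needs about 140 GB for weights alone,
# which is why two 96 GB RTX PRO 6000 Max-Q cards (192 GB total) fit the bill.
fp16 = estimate_weight_vram_gb(70, 2.0)  # 2 bytes/param -> 140.0 GB
q4   = estimate_weight_vram_gb(70, 0.5)  # 4-bit quantization -> 35.0 GB
```

Quantized variants (as Ollama typically serves) need far less VRAM, but running near-full-precision weights is what drives the 140GB-class requirement.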
Tegsys has published technical articles summarizing GPU performance differences for LLM workloads.
The first part presents an actual comparison of the RTX 5090 / RTX 4090 / RTX 5000 Ada, and the second part covers testing with the RTX PRO 6000 Max-Q.
For detailed verification results, please see below.
Memory Configuration and Scalability
LLM inference requires not only sufficient VRAM but also system memory of equal or greater capacity.
This configuration implements 256GB of memory (4 x 64GB), and modules of the same capacity (64GB) can be added in the available slots.
If a third GPU is added in the future, total VRAM becomes 96GB x 3 = 288GB.
In that case, the corresponding system memory requirement can easily be met by adding modules to the available slots.
This allows you to avoid bottlenecks and achieve stable data processing even after adding GPUs.
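The sizing rule above (system RAM at least equal to total VRAM, in 64GB module increments) can be sketched as follows; the helper names are hypothetical, for illustration only:

```python
def total_vram_gb(gpus: int, vram_per_gpu_gb: int = 96) -> int:
    """Total VRAM across identical GPUs (96 GB per RTX PRO 6000 Max-Q)."""
    return gpus * vram_per_gpu_gb

def min_system_ram_gb(vram_gb: int, module_gb: int = 64) -> int:
    """Smallest multiple of the 64 GB module size that is >= total VRAM."""
    modules = -(-vram_gb // module_gb)  # ceiling division
    return modules * module_gb

# Two GPUs: 192 GB VRAM -> minimum 192 GB; the installed 256 GB already exceeds it.
# Three GPUs: 288 GB VRAM -> 320 GB (5 x 64 GB modules) would be the minimum.
```

With two GPUs the delivered 256GB configuration already satisfies the rule; a third GPU would call for one additional 64GB module.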
Pre-construction of software environment
We deliver a ready-to-use environment with the appropriate versions of the GPU driver, CUDA Toolkit, and cuDNN pre-installed.
Customers are expected to set up frameworks such as PyTorch themselves, but we have experience implementing them and can provide consultation if needed.
For those who are active in these fields
Tegara's custom PC build service supports not only initial deployment but also system expansion as the scale of research grows.
We not only propose configurations that meet various software requirements, but also accept consultations regarding the construction of an entire research environment.
Please feel free to contact us and we will provide the best solution to suit your needs.
Keywords
What is the CUDA Toolkit? The CUDA Toolkit is a GPU computing development environment provided by NVIDIA.
What is cuDNN (CUDA Deep Neural Network library)? cuDNN is a high-performance library provided by NVIDIA for accelerating deep neural network (DNN) computations on GPUs. The biggest advantage of using cuDNN is that you do not need to write GPU-optimized code for each framework.

■ Click here for details and inquiries about this PC case * Please include the case name or your desired conditions.




