
The client contacted us about building a generative AI workstation that could run large-scale language models entirely in-house. The software under consideration included Ollama, LM Studio, Dify, and Python.
Specifically, the client requested a CPU with a sufficient number of cores and a memory configuration of at least 1TB, preferably expandable to 2-4TB.
For GPUs, they wanted to install as many NVIDIA RTX Pro 6000 Blackwell cards as possible, on the assumption that multiple cards could be accommodated.
The OS had to be Ubuntu, and the power supply had to operate in a 100V environment.
For storage, they wanted the maximum number of the highest-capacity NVMe SSDs available, to provide high overall throughput and room for expansion.
The client wanted a configuration that met these requirements within a budget of approximately 20 million yen.
Taking the above into consideration, we proposed the following configuration:
| Component | Specification |
| --- | --- |
| CPU | Intel Xeon 6515P 2.30GHz (Turbo 3.80GHz), 16C/32T |
| Memory | 4TB total, DDR5-6400 REG ECC 128GB x 32 |
| Storage 1 | 7.68TB U.2 NVMe SSD |
| Storage 2 | 15.36TB U.2 NVMe SSD x 5 (RAID 5) |
| GPU | NVIDIA RTX PRO 6000 Blackwell Server Edition x 4 |
| Network | 10GbE RJ45 2-port network card |
| Chassis / PSU | 4U rackmount enclosure, 3200W/200V redundant power supply (3+1) |
| OS | Ubuntu 24.04 |
| Other | Broadcom MegaRAID RAID card, complete rail kit, 3-year send-back warranty (1-year standard + 2-year extended) |
CPU/memory configuration optimized for LLM inference and generative AI processing
This configuration prioritizes the CPU core count and memory bandwidth required for running generative AI and local LLM workloads. Efficiently serving large models depends not only on GPU performance but also on multi-core CPU throughput and the ability to address large amounts of memory.
To that end, we adopted the latest generation of server CPUs, balancing multi-core processing power with large memory capacity. The client initially envisioned a 1TB memory configuration, but in anticipation of larger future models and RAG workloads, we proposed the maximum possible capacity of 4TB.
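As a rough illustration of why memory capacity matters here, the sketch below estimates how much RAM the weights of a model occupy at different precisions. The parameter counts and byte-per-parameter figures are illustrative assumptions, not numbers from the proposal:

```python
# Rough sketch: approximate memory needed to hold LLM weights in RAM.
# Parameter counts and precisions below are illustrative assumptions.

def weights_gib(params_billions: float, bytes_per_param: float) -> float:
    """Approximate size of model weights in GiB (weights only,
    ignoring KV cache and runtime overhead)."""
    return params_billions * 1e9 * bytes_per_param / 2**30

# A 70B-parameter model at FP16 (2 bytes/param) vs 4-bit quantization
# (~0.5 bytes/param):
print(round(weights_gib(70, 2.0), 1))   # -> 130.4 GiB
print(round(weights_gib(70, 0.5), 1))   # -> 32.6 GiB
```

With 4TB of RAM, even very large unquantized models (and the working sets of RAG pipelines alongside them) fit comfortably in memory.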
High-density GPU configuration featuring RTX Pro 6000 Blackwell
Because the client wanted to install multiple NVIDIA RTX Pro 6000 Blackwell GPUs, we chose a 4U rack-mount chassis and a motherboard configuration that enables stable operation of the maximum number of GPUs.
Installing multiple PRO 6000-class GPUs imposes significant constraints on PCIe lane count, heat dissipation, and power supply capacity. This configuration satisfies all of these requirements, delivering stable AI compute as a high-density GPU server.
Increasing NVMe storage capacity and RAID configuration
In response to the request for the maximum number of highest-capacity NVMe SSDs, we propose installing multiple NVMe SSDs alongside the system SSD, combining read/write speed with redundancy.
This allows research data and embedding caches to be handled at high speed, and delivers strong performance in local inference setups such as RAG pipelines built on Dify and Ollama.
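The redundancy trade-off of the proposed array can be sketched as a simple calculation: RAID 5 spends one drive's worth of space on parity, so usable capacity is (n-1) drives:

```python
# Sketch: usable capacity of the proposed RAID 5 array (5 x 15.36 TB).
# RAID 5 dedicates one drive's worth of space to parity, so usable
# capacity is (n - 1) * drive size, and the array survives one failure.

def raid5_usable_tb(drive_tb: float, n_drives: int) -> float:
    if n_drives < 3:
        raise ValueError("RAID 5 requires at least 3 drives")
    return drive_tb * (n_drives - 1)

print(raid5_usable_tb(15.36, 5))  # -> 61.44 TB usable of 76.8 TB raw
```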
Power supply environment (100V preferred → changed to 200V)
The client initially requested operation in a 100V environment, but since 100V could not supply enough power for a configuration with this many GPUs, the plan was changed to a 200V environment.
This allows the use of server-grade power supply units, ensuring the stability of the entire machine and future expandability.
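The 100V limitation comes down to simple arithmetic. The sketch below uses the published 600W board power of the RTX PRO 6000 Server Edition; the CPU/platform figure and the 15A circuit limit are rough assumptions for illustration, not measured values:

```python
# Sketch: why a 100V circuit was insufficient for this build.
# GPU board power per published RTX PRO 6000 Server Edition spec;
# CPU/platform draw is a rough estimate, not a measurement.

GPU_TDP_W = 600        # per-card board power
NUM_GPUS = 4
CPU_PLATFORM_W = 500   # CPU, RAM, drives, fans (assumed)

total_w = GPU_TDP_W * NUM_GPUS + CPU_PLATFORM_W

# A typical Japanese 100V outlet circuit is limited to 15A = 1500W.
CIRCUIT_LIMIT_W = 100 * 15
print(total_w, total_w > CIRCUIT_LIMIT_W)  # -> 2900 True
```

A 200V feed roughly doubles the available wattage per circuit, which is why the 3200W (3+1) redundant PSU requires it.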


