Introduction
When introducing a workstation for research, what criteria do you use to choose its configuration and GPU type?
Many people may wonder, "I know the model number and VRAM capacity, but what about the actual processing speed and stability?"
Some things cannot be determined from catalog specifications alone; they only become clear when you actually run the hardware.
To clarify differences between GPU configurations that are difficult to see from a spec sheet alone, TEGSYS asked an LLM to summarize Osamu Dazai's "Run, Melos" on each configuration, verified the performance and the answers each GPU configuration produced, and published the results.
Verification details
The environment used in this verification is as follows:
Hardware
| Verification environment | GPU type | VRAM capacity | Models used |
| --- | --- | --- | --- |
| Configuration 1 | NVIDIA RTX 6000 Ada x 1 | 48GB | q2_K, standard model |
| Configuration 2 | NVIDIA RTX 4500 Ada x 2 | 48GB (24GB x 2) | q2_K, standard model |
| Configuration 3 | NVIDIA RTX 6000 Ada x 2 | 96GB (48GB x 2) | q2_K, standard model, q8_0, FP16 |
Software
- Model used: Llama 3.3 (70B)
- Execution environment: Ollama + Dify
- Quantization settings and model sizes:
  - q2_K: 26GB (lightweight, low precision)
  - Standard model: 43GB
  - q8_0: 75GB (high precision) *
  - FP16: 141GB (full precision) *
  - *q8_0 and FP16 were run only on the 6000 Ada x 2 setup (Configuration 3)
- Settings: defaults (no additional data, no prior fine-tuning)
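As a rough illustration of the relationship between total VRAM and usable model size, the figures quoted above can be turned into a simple fit check. This is a sketch only, with names of our own choosing: it ignores KV-cache and runtime overhead, assumes a multi-GPU setup can split model weights across cards, and since FP16 (141GB) exceeds even 96GB of VRAM, running it on Configuration 3 implies partial offload to system RAM (as Ollama does when a model does not fit on the GPU).

```python
# Rough VRAM fit check for the model sizes and GPU configurations above.
# Illustrative sketch only: ignores KV-cache and runtime overhead.

MODEL_SIZES_GB = {          # sizes quoted in the article
    "q2_K": 26,
    "standard": 43,
    "q8_0": 75,
    "FP16": 141,            # exceeds 96GB, so it needs partial CPU offload
}

CONFIG_VRAM_GB = {          # total VRAM per verified configuration
    "1x RTX 6000 Ada": 48,
    "2x RTX 4500 Ada": 48,  # 24GB x 2
    "2x RTX 6000 Ada": 96,  # 48GB x 2
}

def models_that_fit(total_vram_gb: float) -> list[str]:
    """Return model variants whose weights fit entirely in the given VRAM."""
    return [name for name, size in MODEL_SIZES_GB.items()
            if size <= total_vram_gb]

for config, vram in CONFIG_VRAM_GB.items():
    print(f"{config} ({vram}GB): {', '.join(models_that_fit(vram))}")
```

This matches the table above: the 48GB configurations can hold q2_K and the standard model in VRAM, while q8_0 additionally fits only in the 96GB configuration.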
Items verified
- Inference speed
- VRAM capacity and available models
To check the out-of-the-box behavior of the LLM, verification was carried out with default settings, without fine-tuning or additional training data.
Verification results
The GPU's own processing power and VRAM capacity determine the inference speed and the size of the models that can be handled. On the other hand, even when multiple GPUs are installed, adding a GPU does not improve inference speed unless parallel inference is configured, highlighting the fact that a simple comparison of specifications alone is not enough to make a judgment.
The verification conditions, the specific figures for the performance differences, and the full summaries of "Run, Melos" generated by the LLM are available on the official TEGSYS website.
Based on actual testing, TEGSYS has published example workstation configurations suited to language model inference and research and development.
Below, we introduce models suited to LLM inference and research and development, based on the GPU configurations used in this verification.
Reference example of actual machine configuration
Below is an example of a workstation configuration for LLM operation.
It can be flexibly customized according to your application and analysis target.
Uses: Llama inference, local LLM, Dify application development
| CPU | Intel Xeon w5-2565X 3.20GHz (up to 4.8GHz with Turbo Boost), 18C/36T |
| Memory | 256GB total: DDR5-5600 REG ECC 32GB x 8 |
| Storage | 2TB SSD M.2 NVMe Gen4 |
| GPU | NVIDIA RTX 6000 Ada 48GB x 2, NVIDIA RTX A400 4GB (Mini DisplayPort x 4) |
| OS | Microsoft Windows 11 Professional 64bit |
| Other | Software installation |
Uses: Image analysis, natural language processing
| CPU | Intel Xeon w7-2575X 3.00GHz (up to 4.8GHz with Turbo Boost), 22C/44T |
| Memory | 512GB total: DDR5-5600 REG ECC 64GB x 8 |
| Storage | 2TB SSD S-ATA + 8TB HDD S-ATA |
| GPU | NVIDIA RTX 6000 Ada 48GB (DisplayPort x 4) |
| OS | Ubuntu 24.04 |
We can work out an implementation plan suited to your research environment, taking into account the software you will use and your budget.
Please feel free to contact us even if your requirements differ from those listed above.
Life Science Campaign Announcement
TEGSYS is currently running a special campaign for researchers in the life sciences who are considering introducing a workstation.
In addition to helping you select a GPU configuration, we offer special benefits such as support for building storage and analysis environments.
The campaign page also publishes examples of the implementation of various software and analysis environments.
Please take a look at this as a reference for environment construction and system design.
Conclusion
This verification revealed that LLM inference speed and the models that can be run differ depending on the GPU configuration. Under some conditions, however, there was little difference in performance, so it is important to choose based on actual testing rather than judging by specifications alone.
Based on the results of this verification, TEGSYS is proposing configurations tailored to the research field and purpose of use.
Please use this as a reference when selecting a model that suits your analysis environment.
In addition, we are running a special campaign for researchers in the life sciences field who are considering adoption.
Please see below for details.