A customer engaged in research and development of medical products consulted us about a training machine for large language models used in biology.
The assumption is that biology-oriented large language models such as ProteinBERT, ChemBERTa, and HyenaDNA will be trained from scratch, starting with pre-training.
The customer asked us to prioritize GPU performance, having learned that ProteinBERT was trained on an NVIDIA Quadro RTX 5000, ChemBERTa on an NVIDIA Tesla T4, and HyenaDNA on an NVIDIA A100.
Further requirements were a budget of 3 million yen or less, the fastest configuration possible within it, a case roughly the size of a mid-tower, and operation on a standard 100V power supply.
Based on these conditions, we proposed the following configuration.
| Component | Specification |
| --- | --- |
| CPU | Intel Xeon w5-2455X (3.20GHz, 12 cores) |
| Memory | 128GB REG ECC |
| Storage 1 | 2TB M.2 SSD |
| Storage 2 | 4TB SATA SSD |
| GPU | NVIDIA RTX A6000 48GB x2 |
| Network | Onboard (1GbE x1 / 10GbE x1) |
| Case + PSU | Mid-tower case + 1500W power supply |
| OS | Microsoft Windows 11 Professional 64bit |
This configuration emphasizes GPU performance within your budget and usage environment.
For the GPU, we selected two NVIDIA RTX A6000 cards.
According to the ProteinBERT developers, building the pre-trained model took about a month on an NVIDIA Quadro RTX 5000.
The RTX A6000 is a newer generation than the Quadro RTX 5000 and sits higher in the product lineup, so you can expect considerably higher processing performance.
The NVIDIA Tesla T4 you cited is a product most often used for inference; this configuration instead uses the RTX A6000, which has higher per-card performance than the Tesla T4.
The NVIDIA A100, unlike the A6000, is a GPGPU-only card.
It offers high FP64 performance and is well suited to scientific computing, but FP64 is rarely used in deep learning workloads like this one.
It is also considerably more expensive than the A6000 and can only be installed in a dedicated server chassis, so we judged it a poor match for your usage conditions and purpose.
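As a rough illustration of why 48GB per card gives comfortable headroom for models of this class, the sketch below estimates the GPU memory occupied by model state during training. The parameter counts and the "16 bytes per parameter for fp32 training with Adam" rule of thumb are illustrative assumptions, not figures from this case study.

```python
# Rough estimate of GPU memory needed to hold model state during training.
# Assumption: fp32 training with the Adam optimizer stores, per parameter,
#   weights (4 B) + gradients (4 B) + two optimizer moments (8 B) = 16 B.
# Activation memory is workload-dependent and excluded here.

def model_state_gib(num_params: float, bytes_per_param: int = 16) -> float:
    """Return the estimated model-state size in GiB."""
    return num_params * bytes_per_param / 1024**3

# Hypothetical parameter counts (assumptions, for illustration only)
models = {
    "A ProteinBERT-scale model (~16M params)": 16e6,
    "A RoBERTa-scale model (~125M params)": 125e6,
    "A 1B-parameter model": 1e9,
}

for name, n in models.items():
    print(f"{name}: ~{model_state_gib(n):.2f} GiB of model state")
```

Even a 1B-parameter model's state fits well within a single 48GB card under these assumptions; the remaining memory is available for activations and larger batch sizes.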
For storage, the ProteinBERT developers recommend securing at least 1TB of capacity when training the model yourself, so the machine is equipped with a 2TB system disk and a 4TB data disk.
Since frequent data access is expected during training, all storage is SSD.
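To make the benefit of all-SSD storage concrete, here is a minimal sketch comparing the time to stream a training corpus from disk at assumed sequential throughputs. The dataset size and drive speeds are typical ballpark figures chosen for illustration, not measured values for this machine.

```python
# Estimate the time to read a dataset once at a sustained throughput.
# Throughput figures below are ballpark assumptions:
#   HDD ~150 MB/s, SATA SSD ~500 MB/s, NVMe (M.2) SSD ~3000 MB/s.

def read_time_minutes(dataset_gb: float, throughput_mb_s: float) -> float:
    """Minutes to stream `dataset_gb` gigabytes at `throughput_mb_s` MB/s."""
    return dataset_gb * 1000 / throughput_mb_s / 60

dataset_gb = 500  # hypothetical corpus size
for drive, mb_s in [("HDD", 150), ("SATA SSD", 500), ("NVMe SSD", 3000)]:
    minutes = read_time_minutes(dataset_gb, mb_s)
    print(f"{drive}: ~{minutes:.1f} min per full pass over the data")
```

Over many epochs of training, the gap between minutes and tens of minutes per pass compounds, which is why SSD storage is used throughout.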
We selected Windows 11 as the OS.
The language models you plan to use are distributed as Python packages, so the OS can be changed on request to any platform that supports Python.
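Because the models ship as Python packages, a quick standard-library check like the one below can confirm whether the expected libraries are importable on whichever OS is chosen. This is a sketch; the package names listed are the commonly used PyPI distribution names and are assumptions here.

```python
# Check whether common deep-learning packages are importable on this system.
# The package names below are the usual PyPI distribution names (assumed).
import importlib.util
import platform

def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None

print(f"OS: {platform.system()} {platform.release()}")
for pkg in ["torch", "transformers", "tensorflow"]:
    status = "available" if is_installed(pkg) else "not installed"
    print(f"{pkg}: {status}")
```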
The configuration in this case study is based on the conditions provided by the customer.
We flexibly propose machines to match your requirements, so please feel free to contact us even if your conditions differ from those listed here.
■ Keywords
・What is Deep Learning? ・What is Python? ・What is BERT?
・What is ProteinBERT? ・What is ChemBERTa? ・What is HyenaDNA?
■ For details and inquiries about the PC in this case study, click here. * Please include the name of the case study or your desired conditions.