Machine for learning large-scale language models for biology

2023/11/10 TEGARA Co., Ltd. Research workstation, Biology / Agriculture, Artificial intelligence, R & D PC configuration example (Tegsys)

A customer involved in research and development of medical products asked us about a machine for training large-scale language models for biology.
The customer plans to run large-scale language models used in biology, such as ProteinBERT, ChemBERTa, and HyenaDNA, starting from pre-training.

The customer asked us to prioritize GPU performance, noting that ProteinBERT was reportedly trained on an NVIDIA Quadro RTX 5000, ChemBERTa on an NVIDIA Tesla T4, and HyenaDNA on an NVIDIA A100.

In addition, the customer asked for a budget of 3 million yen or less, the fastest configuration possible within that budget, and a case about the size of a mid-tower that can be used in a 100V power environment.

Based on these conditions, we proposed the following configuration.

CPU: Intel Xeon W5-2455X (3.20GHz, 12 cores)
Memory: 128GB REG ECC
Storage 1: 2TB M.2 SSD
Storage 2: 4TB SATA SSD
Video: NVIDIA RTX A6000 48GB x2
Network: Onboard (1GbE x1 / 10GbE x1)
Case + power supply: Mid-tower case + 1500W
OS: Microsoft Windows 11 Professional 64bit

This configuration emphasizes GPU performance within the customer's budget and usage environment.
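
The 100V requirement can be sanity-checked with simple arithmetic. The following is a minimal sketch, not a measurement of the proposed machine: the per-component wattages, the 15A breaker rating, and the 90% power supply efficiency are all assumed nominal values chosen for illustration.

```python
# All wattages, the breaker rating, and the PSU efficiency are assumed
# nominal values for illustration, not measurements of the proposed machine.
COMPONENT_WATTS = {
    "RTX A6000 #1": 300,
    "RTX A6000 #2": 300,
    "CPU": 200,
    "Motherboard, memory, storage, fans": 150,
}


def wall_draw_watts(component_watts, psu_efficiency=0.9):
    """Estimate power drawn from the outlet, allowing for PSU conversion loss."""
    return sum(component_watts.values()) / psu_efficiency


def fits_circuit(component_watts, volts=100, amps=15, psu_efficiency=0.9):
    """Return True if the estimated wall draw stays within volts * amps."""
    return wall_draw_watts(component_watts, psu_efficiency) <= volts * amps


print(round(wall_draw_watts(COMPONENT_WATTS)), fits_circuit(COMPONENT_WATTS))
```

Under these assumptions, two GPUs leave headroom on a single 100V circuit, while adding further 300W cards would quickly exceed it.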

The machine is equipped with two NVIDIA RTX A6000 cards.
According to the ProteinBERT developer's official site, building the trained model took about a month on an NVIDIA Quadro RTX 5000.
The A6000 is a newer generation than the Quadro RTX 5000 and sits higher in the lineup, so higher processing performance can be expected.

The NVIDIA Tesla T4 that the customer cited as an example is a product often used for inference. Therefore, this configuration uses the A6000, which has higher per-card performance than the NVIDIA Tesla T4.

Also, the NVIDIA A100, unlike the A6000, is a compute-only (GPGPU) card.
It has high fp64 performance and suits scientific computing, but fp64 is rarely used in deep learning workloads like this one.
In addition, it costs much more than the A6000 and can only be used in a dedicated chassis, so we judged that it was not a good match for the customer's conditions and purpose.
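
To give a feel for why fp64 is rarely needed here, the sketch below uses only the Python standard library to compare the storage size of fp32 and fp64 values and to measure the error introduced by rounding one value to fp32. The value 0.1 is an arbitrary stand-in for a model weight, and the helper name to_fp32 is hypothetical.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


weight = 0.1  # arbitrary example value standing in for a model weight
rounding_error = abs(to_fp32(weight) - weight)

print("fp32 bytes:", struct.calcsize("f"))
print("fp64 bytes:", struct.calcsize("d"))
print("relative error after rounding to fp32:", rounding_error / weight)
```

fp32 halves the memory per value, and the rounding error stays small relative to the value itself, which is generally acceptable for neural network training.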

Regarding storage, the developer of ProteinBERT recommends at least 1TB of storage capacity for users who train models themselves, so the machine is equipped with a 2TB system disk and a 4TB data disk.
In addition, because frequent data access is expected during training, all storage is SSD.
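
Before starting a long training run, the free space on the data disk can be checked against that 1TB guideline. This is a minimal sketch using the standard library; the function name has_free_space and the use of the current directory as the data path are assumptions for illustration.

```python
import shutil

TB = 10 ** 12  # decimal terabyte, as used on drive labels


def has_free_space(path: str, required_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= required_bytes


# "." is a placeholder; point this at the actual data disk.
print(has_free_space(".", 1 * TB))
```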

The OS selected is Windows 11.
The language models the customer plans to use are mostly provided as Python packages, so the OS can be changed on request to any other OS that supports Python.

The configuration of this case study is based on the conditions given by the customer.
We will flexibly propose machines according to your conditions, so please feel free to contact us even if your conditions differ from those listed here.

■ Keywords

・What is Deep Learning?
Deep learning is a type of machine learning that uses multilayer neural networks to perform advanced pattern recognition and prediction. It generally requires a large amount of data, so it is considered effective when data is abundant. Deep learning is widely used in fields such as image recognition, speech recognition, and natural language processing. Because it can learn complex features and relationships, it can achieve higher accuracy than traditional machine learning methods.

Reference: [Special article] What is machine learning? * Jump to our owned media "TEGAKARI"

・What is Python?
Python is an object-oriented programming language copyrighted by the Python Software Foundation (PSF). Its syntax is simple and highly readable, and it offers a wide variety of libraries and frameworks suited to different purposes. It is popular with everyone from programming beginners to advanced users.

Reference: Python *Jumps to an external site

・What is BERT?
BERT (Bidirectional Encoder Representations from Transformers) is a natural language processing (NLP) model developed by Google. It interprets each word based on its surrounding context and is applied to a wide range of language processing tasks.
BERT is trained in two phases: pre-training and fine-tuning. Pre-training builds a general-purpose language model from a large corpus. Fine-tuning then adjusts the pre-trained model on a smaller dataset to apply it to a specific task.
It is characterized by higher accuracy than conventional NLP models and the ability to handle complex tasks, and it is widely applied to text generation, question answering, document classification, language translation, and more.
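
BERT's pre-training phase is based on masked language modeling: some input tokens are hidden and the model learns to predict them from context. The sketch below illustrates only the masking step, under simplifying assumptions: every selected position is replaced with [MASK] (the original BERT recipe also substitutes random or unchanged tokens for part of the selection), and the function name mask_tokens and the example amino acid sequence are hypothetical.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_ratio=0.15, seed=0):
    """Hide a fraction of tokens and return (masked tokens, labels to predict)."""
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_ratio))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK if i in positions else tok for i, tok in enumerate(tokens)]
    labels = {i: tokens[i] for i in positions}  # position -> original token
    return masked, labels


# Hypothetical example: one token per amino acid residue.
sequence = list("MKTAYIAKQRQISFVKSHFSRQ")
masked, labels = mask_tokens(sequence)
print(masked)
print(labels)
```

During pre-training the model would be scored on how well it recovers the tokens stored in labels; fine-tuning later reuses the same weights with a task-specific objective.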

・What is ProteinBERT?
ProteinBERT is a protein language model based on BERT. Pretrained on roughly 106 million proteins from the UniRef90 database, it can handle protein sequences of almost any length, including very long ones.

Reference: GitHub – nadavbra/protein_bert *Jumps to an external site

・What is ChemBERTa?
ChemBERTa is a large-scale language model built on RoBERTa (a variant of BERT) and trained on SMILES strings, a text notation for chemical structures. It is used in drug design, chemical modeling, property prediction, and similar tasks.

Reference: GitHub – seyonechithrananda/bert-loves-chemistry: bert-loves-chemistry: a repository of HuggingFace models applied on chemical SMILES data for drug design, chemical modeling, etc. *Jumps to an external site
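
To show what treating SMILES as text looks like, here is a minimal sketch of a regex-based SMILES tokenizer. It is an illustration only and is not ChemBERTa's actual tokenizer; the pattern and the function name tokenize_smiles are assumptions, and the pattern covers common SMILES syntax rather than the full specification.

```python
import re

# Bracket atoms first, then two-letter halogens, two-digit ring closures,
# single-letter atoms, ring-closure digits, and bond/branch symbols.
SMILES_PATTERN = re.compile(
    r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-+()/\\.@:~*$]"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into tokens; raise if any character is skipped."""
    tokens = SMILES_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
print(tokenize_smiles("C[C@H](N)C(=O)O"))        # alanine with stereo centre
```

The resulting token list is what a language model would consume in place of words.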

・What is HyenaDNA?
HyenaDNA is a large-scale language model pre-trained on the human genome with input sequences of up to 1 million tokens. Tokenization at the single-nucleotide level (A, T, G, C) allows analysis at nucleotide resolution.

Reference: GitHub – HazyResearch/hyena-dna: Official implementation for HyenaDNA, a long-range genomic foundation model built with Hyena *Jumps to an external site
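
Single-nucleotide tokenization can be sketched as a direct mapping from each base to an integer id. The vocabulary below, including the N entry for unknown bases and the specific id values, is hypothetical and is not HyenaDNA's actual vocabulary.

```python
# Hypothetical vocabulary: one id per nucleotide, plus N for unknown bases.
VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}
INVERSE_VOCAB = {idx: base for base, idx in VOCAB.items()}


def encode(sequence: str) -> list[int]:
    """Map each nucleotide to its id; unrecognized letters are treated as N."""
    return [VOCAB.get(base, VOCAB["N"]) for base in sequence.upper()]


def decode(ids: list[int]) -> str:
    """Map ids back to a nucleotide string."""
    return "".join(INVERSE_VOCAB[i] for i in ids)


ids = encode("ATGCGTTA")
print(ids)
print(decode(ids))
```

Because every base is its own token, the sequence length in tokens equals the sequence length in nucleotides, which is why long context lengths matter for genomic models.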

 

