Machine for training large-scale language models for biology

2023 January 11 TEGARA Co., Ltd. Research workstation, Biology / Agriculture, Artificial intelligence, R & D PC configuration example (Tegsys)

A customer involved in research and development of medical products asked us about a machine for training large-scale language models for biology.
The customer plans to run large-scale language models used in biology, such as ProteinBERT, ChemBERTa, and HyenaDNA, starting from pre-training.

The customer asked us to prioritize GPU performance, having learned that ProteinBERT was trained on an NVIDIA Quadro RTX 5000, ChemBERTa on an NVIDIA Tesla T4, and HyenaDNA on an NVIDIA A100.

In addition, the customer wanted the fastest configuration possible within a budget of 3 million yen, in a case about the size of a mid-tower that can be used in a 100V power environment.

Based on these conditions, we proposed the following configuration.

CPU: Intel Xeon W5-2455X (3.20GHz, 12 cores)
Memory: 128GB REG ECC
Storage 1: 2TB M.2 SSD
Storage 2: 4TB SATA SSD
Video: NVIDIA RTX A6000 48GB x2
Network: Onboard (1GbE x1 / 10GbE x1)
Case + power supply: Mid-tower case + 1500W
OS: Microsoft Windows 11 Professional 64bit

This configuration emphasizes GPU performance within the customer's budget and usage environment.
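The 100V requirement can be sanity-checked with simple arithmetic. The sketch below is a rough estimate rather than a measurement: the 15A circuit rating, the 300W per GPU, the 200W for the CPU, and the 150W allowance for other components are all assumptions chosen for illustration, and real draw depends on the workload and the actual parts.

```python
# Rough power-budget check for a 100V outlet (all wattages are assumptions).
OUTLET_VOLTS = 100   # mains voltage from the customer's requirement
CIRCUIT_AMPS = 15    # assumed rating of the wall circuit
GPU_WATTS = 300      # assumed board power per RTX A6000
GPU_COUNT = 2
CPU_WATTS = 200      # assumed CPU power
OTHER_WATTS = 150    # assumed allowance for memory, drives, and fans


def power_budget(volts, amps, loads):
    """Return (circuit limit, estimated load, headroom), all in watts."""
    limit = volts * amps
    load = sum(loads)
    return limit, load, limit - load


limit, load, headroom = power_budget(
    OUTLET_VOLTS,
    CIRCUIT_AMPS,
    [GPU_WATTS * GPU_COUNT, CPU_WATTS, OTHER_WATTS],
)
print(f"circuit limit {limit} W, estimated load {load} W, headroom {headroom} W")
```

Under these assumptions the estimated load stays below what a single 100V circuit can supply, which is consistent with choosing a 1500W power supply.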

The machine is equipped with two NVIDIA RTX A6000 GPUs.
According to the ProteinBERT developers' official page, building the pre-trained model took about a month on an NVIDIA Quadro RTX 5000.
The A6000 belongs to a newer generation than the Quadro RTX 5000 and sits higher in the lineup, so it can be expected to deliver higher processing performance.
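Once the machine is set up, it is worth confirming that both GPUs are visible before starting a long training run. The following is a minimal sketch that calls the `nvidia-smi` command-line tool, assuming the NVIDIA driver is installed. The helper names `parse_gpu_csv` and `list_gpus` are our own for illustration.

```python
import subprocess


def parse_gpu_csv(text):
    """Parse 'name, memory.total' CSV lines into (name, memory) tuples."""
    gpus = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, memory = (field.strip() for field in line.split(",", 1))
        gpus.append((name, memory))
    return gpus


def list_gpus():
    """Query nvidia-smi; return an empty list if the tool is unavailable."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    for name, memory in list_gpus():
        print(name, memory)
```

On the proposed configuration this should list two entries, one per RTX A6000.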

The NVIDIA Tesla T4 cited as an example is a product that is often used for inference. This configuration therefore uses the A6000, which has higher per-card performance than the Tesla T4.

The NVIDIA A100, unlike the A6000, is a compute-only (GPGPU) card.
It offers high FP64 performance and is well suited to scientific computing, but FP64 is rarely used in deep learning workloads like this one.
It is also far more expensive than the A6000 and can only be installed in a dedicated chassis, so we judged that it would not be a good match for the customer's conditions and purpose.
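Because deep learning training typically runs in FP32 or mixed FP16/FP32 precision rather than FP64, GPU memory per card tends to matter more than FP64 throughput. The following back-of-the-envelope sketch assumes mixed-precision training with the Adam optimizer at roughly 16 bytes per parameter and ignores activation memory. The parameter counts are hypothetical and are not taken from the models discussed here.

```python
# Assumed bytes per parameter: fp16 weights + fp16 gradients
# + fp32 master weights + two fp32 Adam states.
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4
GPU_MEMORY_GB = 48  # per RTX A6000, from the configuration above


def training_state_gb(num_params, bytes_per_param=BYTES_PER_PARAM):
    """Estimate memory for weights, gradients, and optimizer state in GB."""
    return num_params * bytes_per_param / 1e9


def fits_on_one_gpu(num_params, gpu_memory_gb=GPU_MEMORY_GB):
    """True if the estimated training state fits in one GPU's memory."""
    return training_state_gb(num_params) <= gpu_memory_gb


hypothetical_params = 1_000_000_000  # illustrative model size only
print(training_state_gb(hypothetical_params), fits_on_one_gpu(hypothetical_params))
```

Activations and batch size add to this figure, so the estimate is a lower bound rather than a guarantee that a model fits.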

Regarding storage, the ProteinBERT developers recommend at least 1TB of storage capacity for users who train the model themselves, so the machine is equipped with a 2TB system disk and a 4TB data disk.
In addition, all storage is SSD, on the assumption that data will be accessed frequently during training.
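The storage recommendation can be checked before downloading any training data. This is a minimal sketch using Python's standard library. The 1TB threshold follows the recommendation quoted above, while the directory and the helper name `has_free_space` are hypothetical.

```python
import shutil

TB = 10**12  # decimal terabyte


def has_free_space(path, required_bytes=1 * TB):
    """Return True if the filesystem holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    data_dir = "."  # hypothetical location of the training data
    print("at least 1TB free:", has_free_space(data_dir))
```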

The OS selected is Windows 11.
The language models the customer plans to use are generally provided as Python packages, so the OS can be changed on request to any other OS that supports Python.

The configuration in this case study is based on the conditions given by the customer.
We flexibly propose machines to suit your conditions, so please feel free to contact us even if your requirements differ from those listed here.

■ Keywords

・What is Deep Learning?
Deep learning is a type of machine learning that uses multilayer neural networks to perform advanced pattern recognition and prediction. Since it generally requires a large amount of data, it is considered an effective method when data is abundant. Deep learning is also widely used in fields such as image recognition, speech recognition, and natural language processing. Because it can learn complex features and relationships, it can achieve higher accuracy than traditional machine learning methods.

Reference: [Special article] What is machine learning? (on our owned media "TEGAKARI")

・What is Python?
Python is an object-oriented programming language whose copyright is held by the Python Software Foundation (PSF). Its syntax is simple and highly readable, and it offers a wide variety of components, such as libraries and frameworks, suited to different purposes. It is popular with everyone from programming beginners to advanced users.

Reference: Python

・What is BERT?
BERT (Bidirectional Encoder Representations from Transformers) is a natural language processing (NLP) model developed by Google. It can understand words based on their surrounding context and is applied to a wide range of language processing tasks.
BERT is trained in two phases: pre-training and fine-tuning. Pre-training creates a general-purpose language model from a large corpus. Fine-tuning then adapts the pre-trained model to a specific task using a smaller dataset.
It is characterized by higher accuracy than conventional NLP models and by its ability to handle complex tasks. It is applied to text generation, question answering, document classification, language translation, and more, and is widely used as an NLP model.
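The pre-training phase can be illustrated with the masked-token objective commonly described for BERT: about 15% of tokens are selected for prediction, and of those roughly 80% are replaced by a mask symbol, 10% by a random token, and 10% are left unchanged. The sketch below is a simplified illustration under those assumptions, not the code used by any of the models in this article. The function name `mask_tokens` and the toy protein sequence are hypothetical.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Return (corrupted tokens, labels); a label is None where nothing is predicted."""
    rng = random.Random(seed)
    corrupted, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            labels.append(token)  # the model must recover the original token
            roll = rng.random()
            if roll < 0.8:
                corrupted.append(MASK)  # about 80%: replace with the mask symbol
            elif roll < 0.9:
                corrupted.append(rng.choice(vocab))  # about 10%: random token
            else:
                corrupted.append(token)  # about 10%: keep unchanged
        else:
            labels.append(None)  # position not selected for prediction
            corrupted.append(token)
    return corrupted, labels


tokens = list("MKTAYIAKQRQISFVKSHFSRQ")  # toy amino-acid sequence
vocab = sorted(set(tokens))
corrupted, labels = mask_tokens(tokens, vocab)
print(corrupted)
```

During pre-training the model sees the corrupted sequence and is scored only on the positions whose label is not `None`.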

・What is ProteinBERT?
ProteinBERT is a protein language model based on BERT. Pre-trained on roughly 106 million proteins from the UniRef90 database, it can handle protein sequences of almost any length, including very long ones.

Reference: GitHub – nadavbra/protein_bert

・What is ChemBERTa?
ChemBERTa is a large-scale language model for SMILES, a text notation for chemical structures, built on RoBERTa (a variant of BERT). It is used in drug design, chemical modeling, property prediction, and similar tasks.

Reference: GitHub – seyonechithrananda/bert-loves-chemistry: a repository of HuggingFace models applied on chemical SMILES data for drug design, chemical modeling, etc.
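To show what a language model over SMILES operates on, the sketch below splits a SMILES string into atom, bond, and ring-closure tokens with a regular expression. This is a simplified illustrative tokenizer and is not necessarily the tokenizer ChemBERTa itself uses. The name `tokenize_smiles` is hypothetical.

```python
import re

SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"           # bracket atoms such as [NH4+] or [C@@H]
    r"|Br|Cl"               # two-letter halogens
    r"|%\d{2}"              # two-digit ring-closure labels
    r"|[A-Za-z]"            # single-letter atoms (aromatic atoms are lowercase)
    r"|\d"                  # single-digit ring-closure labels
    r"|[=#\-+()/\\.@:~*$]"  # bonds, branches, and other symbols
)


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens, rejecting unrecognized characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
```

Each token would then be mapped to an integer id before being fed to the model, in the same way words are handled in an ordinary text model.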

・What is HyenaDNA?
HyenaDNA is a large-scale language model pre-trained on the human genome, handling base sequences of up to 1 million tokens. Tokenization at the single-nucleotide level (A, T, G, C) allows analysis at nucleotide resolution.

Reference: GitHub – HazyResearch/hyena-dna: Official implementation for HyenaDNA, a long-range genomic foundation model built with Hyena
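Single-nucleotide tokenization means that every base becomes exactly one token. A minimal sketch follows. The id assignments and the function names `encode_dna` and `decode_dna` are hypothetical and do not reflect HyenaDNA's actual vocabulary.

```python
# Hypothetical vocabulary: one id per base, with N for unknown symbols.
NUCLEOTIDE_IDS = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}


def encode_dna(sequence):
    """Map each base to one integer id; unrecognized symbols become N."""
    unknown = NUCLEOTIDE_IDS["N"]
    return [NUCLEOTIDE_IDS.get(base, unknown) for base in sequence.upper()]


def decode_dna(ids):
    """Map integer ids back to a base sequence."""
    id_to_base = {i: base for base, i in NUCLEOTIDE_IDS.items()}
    return "".join(id_to_base[i] for i in ids)


print(encode_dna("ATGC"))
```

Because the token count equals the sequence length, long genomic regions translate directly into long model inputs, which is why context length matters for this kind of model.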

 
