A customer involved in plant genome research consulted us about the configuration of a PC for analysis.
The goal is to assemble the genome of a diploid organism with an estimated size of 280 Mb, and the research is expected to expand to include RNA-seq analysis. The budget is on the order of one million yen, and the customer asked for a machine with Linux (Ubuntu) pre-installed.
For RNA-seq analysis, the customer plans to use Trinity. For genome analysis, SPAdes, Platanus, Racon, and medaka are the candidates for haploid samples, and FALCON or Canu for diploid samples.
Initially, the customer plans to perform assembly and synteny analysis on approximately 10 Gb of data; the specific software to be used has not yet been finalized.
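As a rough check on the starting data volume, the average sequencing depth can be estimated by dividing the total sequenced bases by the genome size. The sketch below assumes that "10 Gb" means 10 gigabases of sequence (not gigabytes of files); the function and constant names are illustrative only.

```python
def coverage_depth(total_bases: float, genome_size: float) -> float:
    """Return the average sequencing depth (fold coverage)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


GENOME_SIZE_BP = 280e6   # estimated genome size from the consultation (280 Mb)
INITIAL_DATA_BP = 10e9   # assumption: "10 Gb" is 10 gigabases of reads

depth = coverage_depth(INITIAL_DATA_BP, GENOME_SIZE_BP)
print(f"Approximate depth: {depth:.1f}x")
```

Under that assumption the initial dataset corresponds to a depth of roughly 36x, which gives a sense of how the data volume may grow if deeper coverage is added later.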
In light of these requirements, we proposed the following configuration:
CPU | Intel Xeon w5-3435X 3.10GHz (up to 4.70GHz with Turbo Boost Max 3.0) 16C/32T |
Memory | 256GB total, DDR5-4800 Registered ECC, 32GB x 8 |
Storage 1 | 1TB SSD, M.2 NVMe Gen4 |
Storage 2 | 16TB HDD, SATA |
Video | NVIDIA T400 4GB (MiniDisplayPort x3) |
Network | Onboard (1GbE x1, 10GbE x1) |
Chassis + power supply | Tower chassis + 1000W |
OS | Ubuntu 22.04 |
When selecting a PC for this kind of analysis, the standard approach is to prioritize sufficient memory capacity for the computation. The order of component selection and cost weighting is therefore to secure the required amount of memory first, then allocate the remaining budget to the CPU and data storage.
Since the analysis will start with about 10 Gb of data, we proposed a configuration that supports a reasonable level of analysis within a budget of 1.2 million yen. The key point of this configuration is memory expandability: up to 768GB can be added later, so memory can be increased if it runs short as the amount of data grows step by step.
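The staged approach to memory can be sketched as a simple check of an estimated requirement against the installed capacity and the expansion headroom. This is a minimal illustration, assuming the 768GB figure is added on top of the installed 256GB; it ignores DIMM slot and module-matching constraints, and `memory_plan` is a hypothetical helper, not part of any tool.

```python
INSTALLED_GB = 256        # 32GB x 8, from the proposed configuration
MAX_ADDITIONAL_GB = 768   # expansion headroom stated in the proposal


def memory_plan(required_gb: float) -> str:
    """Classify an estimated memory requirement against this configuration."""
    # Assumption: the headroom is additive to the installed capacity.
    ceiling_gb = INSTALLED_GB + MAX_ADDITIONAL_GB
    if required_gb <= INSTALLED_GB:
        return "fits in installed memory"
    if required_gb <= ceiling_gb:
        shortfall = required_gb - INSTALLED_GB
        return f"needs upgrade: add at least {shortfall:.0f} GB"
    return "exceeds the expandable ceiling"


for need in (200, 400, 2000):
    print(need, "GB ->", memory_plan(need))
```

The required figure itself has to come from the documentation of the assembler in question or from a trial run on a subset of the data.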
Please note that this configuration prioritizes having enough memory to complete the analysis rather than raw processing speed. If you have specific requirements for processing speed, please feel free to contact us.
Reference: Trinity's notes on memory requirements (Running Trinity · trinityrnaseq/trinityrnaseq Wiki · GitHub)
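As an illustration of how such memory guidance might be applied, the sketch below estimates memory for a Trinity run and builds a command line. It assumes the commonly cited rule of thumb of about 1 GB of RAM per million Illumina read pairs; the file names and read count are hypothetical, and the result is capped at the 256GB installed in this configuration. `--seqType`, `--left`, `--right`, `--CPU`, and `--max_memory` are standard Trinity options.

```python
import math
import shlex


def trinity_memory_gb(read_pairs: int, gb_per_million_pairs: float = 1.0) -> int:
    """Estimate RAM in GB from the read-pair count (rule-of-thumb assumption)."""
    return math.ceil(read_pairs / 1_000_000 * gb_per_million_pairs)


def trinity_command(left: str, right: str, read_pairs: int,
                    cpu: int = 16, installed_gb: int = 256) -> list[str]:
    """Build a Trinity command, capping --max_memory at the installed RAM."""
    max_mem = min(trinity_memory_gb(read_pairs), installed_gb)
    return [
        "Trinity", "--seqType", "fq",
        "--left", left, "--right", right,
        "--CPU", str(cpu),
        "--max_memory", f"{max_mem}G",
    ]


# Hypothetical input: 100 million read pairs in two FASTQ files.
cmd = trinity_command("reads_1.fq", "reads_2.fq", read_pairs=100_000_000)
print(shlex.join(cmd))
```

If the estimate exceeds the installed memory, the cap only keeps the command valid for the machine; the run itself may still fail or slow down, which is the case the memory expansion is meant to cover.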
■ Keywords
・What is Trinity? Trinity is software for de novo transcriptome assembly. It reconstructs transcript (mRNA) sequences from short RNA-Seq reads, which makes it useful for organisms without a reference genome and for discovering novel transcripts.
・What is SPAdes? SPAdes is a de novo genome assembler that reconstructs genome sequences from next-generation sequencing (NGS) data. It is particularly suited to bacterial genomes and also supports single-cell sequencing (SCS) data.
・What is Platanus? Platanus is a de novo genome assembler that is particularly suited to highly heterozygous genomes. It reconstructs genome sequences with high accuracy from short-read NGS data.
・What is Racon? Racon is a tool for generating consensus sequences quickly and accurately in de novo genome assembly from long-read data. It is designed to produce high-quality consensus sequences from long reads with high error rates, and is particularly suitable for data from PacBio and Oxford Nanopore Technologies.
・What is medaka? medaka is a polishing and variant-calling tool for long-read sequencing data, mainly from Oxford Nanopore Technologies (ONT). It is particularly suitable for post-processing de novo assemblies and for variant calling against a known reference genome.
・What is FALCON? FALCON is a de novo genome assembler for PacBio long-read data. It was developed by Pacific Biosciences (PacBio) and is suitable for assembling large, complex genomes.
・What is Canu? Canu is a de novo genome assembler specialized for long-read data from PacBio and Oxford Nanopore Technologies. It is suitable for assembling large, complex genomes.