ddRAD-Seq Analysis Workstation

A customer engaged in forest ecosystem research contacted us about a machine for analyzing next-generation sequencing (NGS) data.
Specifically, they are considering a workstation for ddRAD-Seq analysis of data from the Illumina NovaSeq X.
The main applications are as follows:

・Analysis of large volumes of sequence read data using the analysis software Stacks
・Population genetic analyses (ADMIXTURE, PCA, BayesAss, etc.) on tens of thousands of SNPs

They currently use a PC with the following configuration, previously purchased from Tegsys. Calculations run very fast, but the customer reports that the machine's specs are not fully utilized because the workloads they process are not large.

■ Current PC configuration
CPU: AMD Ryzen Threadripper 3970X (3.70GHz, 32 cores)
Memory: 256GB
Storage 1: 500GB SSD (S-ATA)
Storage 2: 8TB HDD (S-ATA)
Video: NVIDIA T400 2GB
Network: Onboard (10/100/1000Base-T x1), Wi-Fi x1
Chassis + power supply: Mid-tower chassis + 850W
OS: Ubuntu 20.04

For this proposal, the customer therefore asked us to suggest the optimal configuration for a budget of around 700,000 to 800,000 yen, even if the specs are somewhat lower than those of the PC currently in use.

Based on the customer's requests, we proposed the following configuration:

CPU: AMD Ryzen 9 7950X (4.50GHz, 16 cores)
Memory: 192GB
Storage 1: 500GB SSD (S-ATA)
Storage 2: 8TB HDD (S-ATA)
Video: NVIDIA T400 4GB
Network: Onboard (2.5GbE x1, 10/100/1000Base-T x1), Wi-Fi x1
Chassis + power supply: Mid-tower chassis + 850W
OS: Ubuntu 22.04

Configuration that prioritizes analysis processing performance within your budget

This configuration is scaled down from the machine you are currently using, assuming a budget of 700,000 to 800,000 yen.

The CPU is the AMD Ryzen 9 7950X, the top model of the Ryzen 7000 series, which is the latest as of June 2024.
The basic configuration, such as storage, follows that of the machine you are currently using, while parts are selected to concentrate the budget on the CPU and memory capacity, the specifications that most affect analysis speed.
With a 16-core/32-thread CPU and 192GB of memory, you can expect particularly fast analysis processing among configurations within your budget.
Please note that both the number of CPU cores and the memory capacity are at the upper limit for this platform. If you would like a higher-performance machine or a configuration that allows for future upgrades, please contact us.
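
As a rough illustration of how the memory is balanced against the CPU, the short sketch below compares memory per hardware thread on the two machines. The helper name is hypothetical, and the 64-thread figure for the Threadripper 3970X is an assumption based on its 32 cores with simultaneous multithreading; actual memory use depends on the software and dataset.

```python
def memory_per_thread_gb(memory_gb, threads):
    """Memory available to each thread if every hardware thread is busy at once."""
    return memory_gb / threads


# Current machine: 256GB, Threadripper 3970X (assumed 32 cores / 64 threads).
current = memory_per_thread_gb(256, 64)
# Proposed machine: 192GB, Ryzen 9 7950X (16 cores / 32 threads).
proposed = memory_per_thread_gb(192, 32)

print(f"current: {current:.1f} GB/thread, proposed: {proposed:.1f} GB/thread")
# prints "current: 4.0 GB/thread, proposed: 6.0 GB/thread"
```

Under these assumptions, the proposed machine has fewer threads but somewhat more memory per thread than the current one.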

The configuration of this case study is based on the conditions given by the customer.
We will flexibly propose machines according to your conditions, so please feel free to contact us even if you are considering different conditions than what is listed.


■ Keywords

・What is ddRAD-Seq analysis?

ddRAD-Seq (double digest restriction-site associated DNA sequencing) is a type of RAD-seq (restriction-site associated DNA sequencing) analysis that digests the genome with two restriction enzymes and uses next-generation sequencing to read the regions adjacent to the restriction enzyme recognition sites. It can produce genome-scale data from non-model species, allowing large numbers of genetic markers to be developed rapidly and efficiently.
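
The double-digest idea can be sketched in a few lines of Python. This is a minimal in-silico illustration, not a laboratory protocol or any tool's actual method: the enzyme pair (EcoRI and MspI), the size window, and the function names are all assumptions chosen for the example, and each cut is placed at the start of the recognition sequence rather than at the enzyme's true cut offset.

```python
import re

# Recognition sequences for an example enzyme pair (an assumption for this sketch).
ECORI = "GAATTC"
MSPI = "CCGG"


def find_sites(genome, site):
    """Return the start position of every (possibly overlapping) occurrence of site."""
    return [m.start() for m in re.finditer(f"(?={site})", genome)]


def double_digest(genome, site_a=ECORI, site_b=MSPI, min_len=10, max_len=50):
    """Return (start, end) spans flanked by one site of each enzyme.

    Simplification: each cut is placed at the start of the recognition
    sequence, ignoring the enzyme-specific cut offset.
    """
    cuts = sorted(
        [(pos, "A") for pos in find_sites(genome, site_a)]
        + [(pos, "B") for pos in find_sites(genome, site_b)]
    )
    fragments = []
    # Adjacent cuts delimit one fragment of the complete digest.
    for (start, left), (end, right) in zip(cuts, cuts[1:]):
        # Keep fragments with different enzymes at each end and a length in the window.
        if left != right and min_len <= end - start <= max_len:
            fragments.append((start, end))
    return fragments
```

Only fragments with a different enzyme site at each end and a length inside the chosen window are kept, which is why ddRAD-Seq samples a reduced and repeatable subset of the genome.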

・What is Stacks?

Stacks is a software pipeline for building loci from short-read sequences, such as those generated on the Illumina platform. It was developed to work with restriction enzyme-based data such as RAD-seq, for building genetic maps and for population genomics and phylogeography.

Reference: Stacks
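
To give a feel for what "building loci from short reads" means, here is a toy sketch in Python. It is not Stacks' actual algorithm or API: the function names, the greedy clustering, and the default thresholds are hypothetical, and reads are assumed to be of equal length. Identical reads are first piled into stacks, and stacks that differ by only a few bases are then grouped as candidate alleles of one locus.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def build_stacks(reads, min_depth=2):
    """Group identical reads; keep groups with at least min_depth copies."""
    counts = Counter(reads)
    return {seq: n for seq, n in counts.items() if n >= min_depth}


def merge_into_loci(stacks, max_mismatches=1):
    """Greedily cluster stacks that differ by at most max_mismatches bases."""
    loci = []
    # Deepest stacks first (ties broken alphabetically) so the output is deterministic.
    for seq in sorted(stacks, key=lambda s: (-stacks[s], s)):
        for locus in loci:
            # Compare against the first (deepest) sequence of each existing locus.
            if hamming(seq, locus[0]) <= max_mismatches:
                locus.append(seq)
                break
        else:
            loci.append([seq])
    return loci
```

Grouping identical reads is inexpensive, while comparing stacks against each other grows quickly with the number of stacks, which is one reason a real pipeline working on large read sets benefits from many CPU cores and ample memory.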

・What is ADMIXTURE?

ADMIXTURE is software for maximum likelihood estimation of individual ancestry from multilocus SNP genotype datasets. It uses fast numerical optimization algorithms to compute estimates quickly.

Reference: ADMIXTURE: fast ancestry estimation
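
The quantity being maximized can be illustrated with a small Python sketch of a binomial admixture log-likelihood (up to an additive constant). This is an assumption-laden illustration rather than ADMIXTURE's implementation: the function name and data layout are hypothetical, and the optimization step that the software performs is not shown.

```python
import math


def admixture_log_likelihood(genotypes, q, f):
    """Binomial log-likelihood of genotypes given ancestry fractions and allele frequencies.

    genotypes[i][j]: copies (0, 1 or 2) of the counted allele for individual i at SNP j
    q[i][k]: fraction of individual i's ancestry from population k (each row sums to 1)
    f[k][j]: frequency of the counted allele at SNP j in population k (strictly between 0 and 1)
    """
    total = 0.0
    for i, row in enumerate(genotypes):
        for j, g in enumerate(row):
            # Expected allele frequency for this individual: ancestry-weighted mix of populations.
            p = sum(q[i][k] * f[k][j] for k in range(len(f)))
            total += g * math.log(p) + (2 - g) * math.log(1.0 - p)
    return total
```

Every evaluation loops over all individuals, SNPs, and populations, so with tens of thousands of SNPs the cost of repeated evaluations during optimization adds up, which is where CPU performance matters.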

・What is BayesAss?

BayesAss is a program for inferring recent migration rates between populations using unlinked multilocus genotypes. It uses Markov chain Monte Carlo methods to estimate the posterior probability of recent migration rates between populations.

Reference: brannala/BA3