Bayesian statistics machine

A customer involved in marketing research consulted us about a machine for statistical analysis.
The customer wanted a machine to run the MCMC method for Bayesian statistics in R (RStan), mainly for statistical analysis of consumer questionnaire data. The data sets contain up to about 1,000 questionnaires.
The customer also planned to run Bayesian network analysis with BayoLinkS and with R. The specific conditions were as follows.

・CPU: Core i9
・Memory: 64GB
・Storage: SSD 1TB
・OS: Windows 11
・Display: 23.8-inch liquid crystal display
・Software used: R, RStudio, Python, BayoLinkS
・Others: Four MCMC calculations should run comfortably in parallel.
・Budget: About 50 million yen

Based on these requirements, we proposed the following configuration.

・CPU: Core i9-13900KS (3.20GHz 8 cores + 2.40GHz 16 cores)
・Memory: 64GB
・Storage: 2TB M.2 SSD
・Video: Onboard (DP x1, HDMI x1)
・Network: Onboard (2.5GBase-T x1), Wi-Fi x1
・Case + power supply: Mid-tower case + 850W
・OS: Microsoft Windows 11 Professional 64bit
・Others: 23.8-inch wide Full HD LCD display

Given the use of R and the budget, we consider the requested Core i9 CPU a good choice.
With four calculations running in parallel, clock boost is likely to be effective, so within the 13th-generation Core i9 lineup we selected a model with high single-core performance. In a test running only the P-cores, we confirmed operation at around 5.6GHz with 8 cores running.
Compared with the 12th-generation Core i9-12900K, the CPU Mark score is about 33% higher (about 12% higher for single-thread performance).

Reference: Intel Core i9-12900K vs Intel Core i9-13900KS (PassMark Software)
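The requirement to run four MCMC calculations in parallel can be illustrated with a minimal sketch. It is not RStan: it is a standard-library Python stand-in that runs one random-walk Metropolis chain per worker process, which is why four chains occupy four cores at once. The data (620 "yes" answers out of 1,000 questionnaires), the flat prior, and the names `run_chain`, `run_chains`, and `posterior_mean` are all assumptions made for illustration. In RStan itself, the number of chains and cores is set with the `chains` and `cores` arguments of `stan()`.

```python
import math
import random
from concurrent.futures import ProcessPoolExecutor


def log_posterior(theta, successes, trials):
    """Log posterior of a yes-rate theta under a flat prior (hypothetical model)."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return successes * math.log(theta) + (trials - successes) * math.log(1.0 - theta)


def run_chain(seed, successes=620, trials=1000, n_iter=5000, step=0.05):
    """Run one random-walk Metropolis chain and return all of its draws."""
    rng = random.Random(seed)          # independent random stream per chain
    theta = rng.uniform(0.1, 0.9)      # random starting point
    current = log_posterior(theta, successes, trials)
    draws = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, successes, trials)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        draws.append(theta)
    return draws


def posterior_mean(draws, warmup=1000):
    """Mean of the draws after discarding the warm-up iterations."""
    kept = draws[warmup:]
    return sum(kept) / len(kept)


def run_chains(seeds, max_workers=4):
    """Run one chain per seed, each in its own worker process."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_chain, seeds))


# Example call. Keep the guard: Windows starts workers by re-importing this file.
# if __name__ == "__main__":
#     for i, draws in enumerate(run_chains([1, 2, 3, 4]), start=1):
#         print(f"chain {i}: posterior mean = {posterior_mean(draws):.3f}")
```

Each chain is a single-threaded loop, so the time per chain depends mostly on per-core speed, while the number of chains that can run simultaneously depends on the number of free cores.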

ECC memory is an option, but for this use case and a capacity of 64GB it is hard to say that it is essential, and it would increase the cost, so we chose a non-ECC specification. The CPU itself supports ECC memory, so the configuration can be changed if the cost increase is acceptable. If you expect continuous calculations lasting several weeks, we recommend ECC memory.

Also, in case R writes intermediate files to storage, we chose a 2TB product: a larger SLC area, which the SSD itself manages as a cache, improves performance. It is a high-speed model that supports PCI-E Gen4.

The configuration in this case study is based on the conditions given by the customer.
Please feel free to contact us even if your requirements differ from those described here.

■FAQ

・What is R?
R is a free, open-source programming language and execution environment for statistical analysis. It is used for statistical calculations and for graphing.
Because many libraries are available, even complex techniques can often be applied simply by calling a library.

Reference: The R Project for Statistical Computing (external site)

・What is RStudio?
RStudio is an integrated development environment (IDE) for R. It offers an intuitive, easy-to-use interface with features including project management, a code editor, code auto-completion, syntax highlighting, a debugger, a profiler, support for Markdown documents, package management, and interactive graphics viewing.

Reference: RStudio IDE (Posit) (external site)

・What is Python?
Python is an object-oriented programming language whose copyright is held by the Python Software Foundation (PSF). Its syntax is simple and highly readable, and it offers a wide variety of libraries and frameworks suited to different purposes. It is popular with everyone from programming beginners to advanced users.

Reference: Python (external site)

Reference: [Feature Article] Programming language Python: Why is it so popular? Tools to accelerate Python programming (our owned media "TEGAKARI")

・What is BayoLinkS?
BayoLinkS is a business intelligence tool that performs predictive analysis using machine learning algorithms. It provides the tools needed for Bayesian network analysis and covers the whole workflow, from data collection and preprocessing through analysis.

Reference: BayoLinkS (NTT DATA Mathematical Systems) (external site)

・What is Bayesian network analysis?
Bayesian network analysis is a data analysis method that represents multiple probabilistically related variables as a graph. It constructs a directed graph that expresses the causal relationships between variables and estimates the probability distribution of each variable under given conditions. Bayesian networks are suited to decision-making and predictive analysis under high uncertainty, and are used in fields such as medicine and finance. Because they offer high prediction accuracy and can account for uncertainty in the data, they play an important role in data analysis.
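As a rough illustration of estimating a probability "under given conditions", the sketch below builds a three-variable network (saw an ad → interested → purchased) and answers queries by summing the joint distribution over all assignments. The variables, the probability tables, and the function names `joint` and `query` are hypothetical and chosen only for this example; they are not taken from BayoLinkS or from any R package.

```python
from itertools import product

# Hypothetical conditional probability tables (made-up numbers).
P_AD = {True: 0.3, False: 0.7}           # P(saw_ad)
P_INTEREST = {True: 0.6, False: 0.2}     # P(interested=True | saw_ad)
P_PURCHASE = {True: 0.5, False: 0.05}    # P(purchased=True | interested)

NAMES = ("saw_ad", "interested", "purchased")


def joint(saw_ad, interested, purchased):
    """Joint probability as the product of each node's conditional probability."""
    p = P_AD[saw_ad]
    p_interest = P_INTEREST[saw_ad]
    p *= p_interest if interested else 1.0 - p_interest
    p_purchase = P_PURCHASE[interested]
    p *= p_purchase if purchased else 1.0 - p_purchase
    return p


def query(target, evidence):
    """Return P(target=True | evidence) by enumerating every assignment."""
    numerator = 0.0
    denominator = 0.0
    for values in product((True, False), repeat=len(NAMES)):
        world = dict(zip(NAMES, values))
        # Skip assignments that contradict the observed evidence.
        if any(world[name] != value for name, value in evidence.items()):
            continue
        p = joint(**world)
        denominator += p
        if world[target]:
            numerator += p
    return numerator / denominator


if __name__ == "__main__":
    print(query("purchased", {}))                    # no evidence
    print(query("purchased", {"saw_ad": True}))      # predictive query
    print(query("saw_ad", {"purchased": True}))      # diagnostic query
```

With these made-up tables, the purchase probability rises from 0.194 with no evidence to 0.32 once ad exposure is observed. Enumeration is only practical for a handful of variables; it is used here because it makes the calculation explicit.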