[Article] About GPU support for Windows Subsystem for Linux 2 (WSL2) [2/3]

■ This article was posted in November 2020, so some of the information may be out of date.

[Please check] This article is a sequel to the following article:

About GPU support of Windows Subsystem for Linux 2 (WSL2) [1/3]

In the previous article, I briefly introduced how to set up a GPGPU environment on Windows Subsystem for Linux 2 (WSL2). This time, I present the results of verifying performance when actually running GPGPU workloads on WSL2.

 

About WSL2 overhead

WSL2 is built on Hyper-V, the virtualization system provided as a Windows feature. Because the Linux environment on WSL2 runs as a virtual machine on top of that mechanism, the overhead that can occur in virtualized systems in general can also occur on WSL2. We verify this point in the following sections. For background, please refer to Microsoft's official documentation:

Exceptions for using WSL 1 rather than WSL 2
https://docs.microsoft.com/ja-jp/windows/wsl/compare-versions#exceptions-for-using-wsl-1-rather-than-wsl-2

 

■ What is Hyper-V?

Hyper-V is a virtualization system provided by Microsoft. With Hyper-V, you can run multiple operating systems as virtual machines on Windows. For example, when you need to verify how programs and software behave on several operating systems, Hyper-V lets you run all of those environments on a single computer (desktop or laptop), which is very convenient.

Hyper-V is available for 64-bit versions of Windows 10 Pro, Enterprise, and Education (not for Home Edition).

Overview of Hyper-V in Windows 10
https://docs.microsoft.com/ja-jp/virtualization/hyper-v-on-windows/about/

 

How to measure GPGPU performance

This time, we used NBody (an N-body simulation), which ships as a sample program* with the NVIDIA CUDA Toolkit. We used it to measure the time required and the FLOPS for a fixed set of parameters.

NBody has a built-in benchmark mode that measures effective performance. To find out whether effective performance differs between the two environments, we ran the tests under the same conditions on Windows and on Ubuntu on WSL2. Both environments were set up on the same computer.


 

■ About the sample program of CUDA Toolkit

The CUDA Toolkit includes many code samples to help you build software with CUDA C/C++. The sample used this time (nbody) demonstrates an efficient CUDA implementation of the all-pairs interaction calculation in an N-body simulation.
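The following is a minimal pure-Python sketch of the all-pairs (O(N²)) interaction calculation. It is not NVIDIA's CUDA code. It only shows the idea that the sample parallelizes on the GPU: every body accumulates the gravitational pull of every other body. The softening term `eps2`, the choice G = 1 and the function name `accelerations` are assumptions made for illustration.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def accelerations(positions: List[Vec3], masses: List[float],
                  eps2: float = 0.01) -> List[Vec3]:
    """All-pairs gravitational accelerations with softening (assumed G = 1).

    Each body interacts with all N bodies, including itself (which adds
    zero because the displacement is zero), so one step costs N * N
    interactions. This quadratic cost is what the benchmark exercises.
    """
    result = []
    for xi, yi, zi in positions:
        ax = ay = az = 0.0
        for (xj, yj, zj), mj in zip(positions, masses):
            # displacement from body i to body j
            dx, dy, dz = xj - xi, yj - yi, zj - zi
            # softened squared distance avoids division by zero when i == j
            dist2 = dx * dx + dy * dy + dz * dz + eps2
            scale = mj / (dist2 * math.sqrt(dist2))
            ax += dx * scale
            ay += dy * scale
            az += dz * scale
        result.append((ax, ay, az))
    return result
```

For two equal masses on the x axis, the two accelerations come out equal in size and opposite in direction, as Newton's third law requires.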

nbody - CUDA N-Body Simulation
https://docs.nvidia.com/cuda/cuda-samples/#cuda-n-body-simulation

CUDA CODE SAMPLES
https://developer.nvidia.com/cuda-code-samples

 

For setting up the CUDA Toolkit and running the sample programs, please refer to the first article in this series.

 

Measurement results

Conditions:
・ NBody run on the GPU in benchmark mode, with the number of bodies (elements) set to 1,280,000
・ GPU: one NVIDIA GeForce RTX 2080 Ti

Measurement method: each configuration was measured 5 times and the results were averaged

* All results are for single-precision (fp32) calculation
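The measurement procedure above can be scripted. The following is a minimal sketch that runs the sample repeatedly and averages the results. `-benchmark`, `-numbodies=<N>` and `-cpu` are options of the nbody sample. The rest is assumed: the binary path, the helper names, and the output format (lines like `... total time for 10 iterations: 35403.003 ms` and `= 9256.453 single-precision GFLOP/s ...`). Check the exact wording against your own build before relying on the regular expressions.

```python
import re
import statistics
import subprocess
from typing import List, Tuple

# Assumed output format of the nbody sample in benchmark mode (verify locally).
TIME_RE = re.compile(r"total time for \d+ iterations:\s*([\d.]+)\s*ms")
GFLOPS_RE = re.compile(r"=\s*([\d.]+)\s*(?:single|double)-precision GFLOP/s")


def parse_nbody_output(text: str) -> Tuple[float, float]:
    """Return (time_ms, gflops) parsed from one benchmark run's stdout."""
    time_match = TIME_RE.search(text)
    gflops_match = GFLOPS_RE.search(text)
    if time_match is None or gflops_match is None:
        raise ValueError("benchmark lines not found in nbody output")
    return float(time_match.group(1)), float(gflops_match.group(1))


def average_runs(results: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Average the time and the GFLOP/s values separately over all runs."""
    return (statistics.mean(t for t, _ in results),
            statistics.mean(g for _, g in results))


def benchmark(binary: str = "./nbody", num_bodies: int = 1_280_000,
              runs: int = 5, cpu: bool = False) -> Tuple[float, float]:
    """Run the sample `runs` times; return the averaged (time_ms, gflops).

    `binary` is a hypothetical path; on Windows it would point to nbody.exe.
    """
    args = [binary, "-benchmark", f"-numbodies={num_bodies}"]
    if cpu:
        args.append("-cpu")  # CPU-only execution, as in the reference test
    results = []
    for _ in range(runs):
        out = subprocess.run(args, capture_output=True, text=True, check=True)
        results.append(parse_nbody_output(out.stdout))
    return average_runs(results)
```

Calling `benchmark()` once on Windows and once inside Ubuntu on WSL2, with the same arguments, keeps the two environments under identical conditions.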

 

Performance comparison (GPU execution): Windows vs. WSL2

GPU execution             Time required (ms)    GFLOP/s
Windows 10                35403.003             9256.453
Ubuntu 20.04 (on WSL2)    31231.972             10492.176
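The GFLOP/s column can be derived from the measured time. As far as I know, the nbody sample counts 20 floating-point operations per body-body interaction in single precision and runs 10 iterations in benchmark mode by default. Taking those two values as assumptions, the following sketch recomputes the figures in the table above.

```python
def nbody_gflops(num_bodies: int, time_ms: float, iterations: int = 10,
                 flops_per_interaction: int = 20) -> float:
    """GFLOP/s for an all-pairs N-body run.

    Assumptions: num_bodies**2 interactions per iteration, 10 iterations,
    and 20 flops per interaction (single precision).
    """
    interactions = float(num_bodies) * num_bodies * iterations
    seconds = time_ms / 1000.0
    return interactions * flops_per_interaction / seconds / 1e9


# Averaged times from the GPU table (1,280,000 bodies)
print(round(nbody_gflops(1_280_000, 35403.003), 1))  # Windows 10
print(round(nbody_gflops(1_280_000, 31231.972), 1))  # Ubuntu 20.04 on WSL2
```

The recomputed values agree with the table to within about 0.01%. The small remaining gap is consistent with the time and the GFLOP/s values having been averaged separately over the five runs, because the mean of the ratios is not exactly the ratio of the means.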

Running on WSL2 gave better performance. This differs from what we had predicted, and we can only speculate about the cause. GPU support in WSL2 uses the GPU driver installed on the Windows side directly, which is a somewhat special mechanism for a virtualization system, so the virtualization overhead probably has little effect on GPGPU workloads. Another possible factor is the difference between the operating systems themselves.

This judgment rests only on the results of NBody, so it cannot be generalized. Still, I think it shows, to some extent, an advantage of WSL2 for GPGPU use.

 

[Reference] Performance comparison (CPU execution): Windows vs. WSL2

CPU execution             Time required (ms)    GFLOP/s
Windows 10                907883.125            3.609
Ubuntu 20.04 (on WSL2)    2968239.750           1.104

With the -cpu option, NBody can also run on the CPU alone, without using the GPU. CPU execution appears to be single-threaded rather than parallelized, so under the same conditions as the GPU test it takes a very long time. For this reason, I reduced the number of bodies to 128,000, one tenth of the GPU setting. Even so, the time required is still considerably longer than for GPU execution.
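Because the all-pairs calculation scales with N², cutting the body count to one tenth cuts the work to one hundredth. This makes the raw times hard to compare directly. The following sketch normalizes both runs to interactions per second and extrapolates the CPU time back to 1,280,000 bodies. It assumes 10 iterations and pure O(N²) scaling, and the helper names are hypothetical.

```python
def interactions_per_second(num_bodies: int, time_ms: float,
                            iterations: int = 10) -> float:
    """Body-body interactions per second (N**2 per iteration, assumed 10 iterations)."""
    return float(num_bodies) ** 2 * iterations / (time_ms / 1000.0)


def extrapolate_time_ms(time_ms: float, measured_bodies: int,
                        target_bodies: int) -> float:
    """Estimate run time at another body count, assuming pure O(N**2) scaling."""
    return time_ms * (target_bodies / measured_bodies) ** 2


gpu_rate = interactions_per_second(1_280_000, 35403.003)  # Windows 10, GPU
cpu_rate = interactions_per_second(128_000, 907883.125)   # Windows 10, CPU
print(f"GPU/CPU throughput ratio: {gpu_rate / cpu_rate:.0f}x")
hours = extrapolate_time_ms(907883.125, 128_000, 1_280_000) / 3.6e6
print(f"Estimated CPU time at 1,280,000 bodies: {hours:.1f} h")
```

With the Windows figures, GPU throughput comes out at more than 2,500 times the CPU throughput. The naive extrapolation puts the CPU run at roughly 25 hours for 1,280,000 bodies. This is only a rough estimate, because it ignores cache and memory effects.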

For CPU execution, the result is the opposite of the GPU case: performance on WSL2 drops significantly. This is presumably because CPU-bound processing is affected by the virtualization overhead.
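The two comparisons point in opposite directions. The following short sketch divides the averaged Windows times by the WSL2 times to show the contrast. The values come from the two tables above. The dictionary layout and the helper name `wsl2_speedup` are only for illustration.

```python
from typing import Dict

# Averaged times (ms) taken from the GPU and CPU tables
TIMES_MS: Dict[str, Dict[str, float]] = {
    "GPU": {"Windows 10": 35403.003, "WSL2": 31231.972},
    "CPU": {"Windows 10": 907883.125, "WSL2": 2968239.750},
}


def wsl2_speedup(times: Dict[str, float]) -> float:
    """Windows time divided by WSL2 time: a value above 1 means WSL2 was faster."""
    return times["Windows 10"] / times["WSL2"]


for mode, times in TIMES_MS.items():
    print(f"{mode}: WSL2 speedup over Windows = {wsl2_speedup(times):.2f}x")
```

This gives roughly 1.13 for GPU execution, where WSL2 is about 13% faster. For CPU execution it gives roughly 0.31, so WSL2 is about 3.3 times slower.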

Next time, in the third article, we plan to verify WSL2 in more applied, practical use cases.

Author: Wada, Engineering Department
Date written: 2020.11.12
Postscript (2021/05/10):
[Article] About GPU support of Windows Subsystem for Linux 2 (WSL2) [3/3] has been released!