[Article] Operation report on a machine fitted with three GeForce TITAN X cards

■ This article was posted in 2016, so some of the information may be out of date.

For deep learning applications, we are receiving a growing number of inquiries about the TITAN X, which offers high GPGPU performance and good cost performance compared with Tesla cards.

Because the TITAN X is a popular product, many people have probably already verified its processing power, so I will not cover that here. Instead, using our research PC (PC-3875 case) fitted with three TITAN X cards, I would like to report on aspects that are hard to see but require attention, such as temperature and power consumption.

Verification machine equipped with three GeForce TITAN X cards

・ Verification environment

Chipset: X99 series
CPU: Intel Core i7-5960X (cooled by a water cooling unit)
GPU: NVIDIA GeForce GTX TITAN X 12GB PCI-E x 3
Memory: 8GB DDR4 UDIMM x 8
Power supply unit: Enermax EMR1500EWT 1500W
OS: Linux (Ubuntu 14.04)

・ Measurement method, tools

CPU: Prime95 (http://www.mersenne.org/)
GPU:
– CUDA 7.5 Toolkit (https://developer.nvidia.com/)
– NVIDIA Inspector (http://orbmu2k.de/l)
Power consumption measurement: Watts up? Pro (https://www.wattsupmeters.com/)

The CPU is loaded with Prime95. For the GPUs, the CUDA 7.5 Toolkit is installed, and the nbody program from the CUDA samples folder is run with each TITAN X specified explicitly, so that the cards are loaded one after another.

The loads are started about 2 minutes apart in the order CPU → 1st GPU → 2nd GPU → 3rd GPU, and the fully loaded state is then held for about 10 minutes.
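The article does not include its launch procedure. The following is a minimal Python sketch of the staggered schedule described above, assuming the Linux build of Prime95 (`mprime -t` for the torture test) and the CUDA samples `nbody` binary with its `-device=<d>` option. The helper names are hypothetical, and the flags should be checked against `./nbody -help` for your CUDA version.

```python
import subprocess
import sys
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stage:
    start_s: int      # offset from the start of the test, in seconds
    label: str        # "CPU", "GPU0", "GPU1", ...
    cmd: List[str]    # command line that generates the load


def build_schedule(num_gpus: int = 3, interval_s: int = 120,
                   nbody_path: str = "./nbody",
                   cpu_cmd: Optional[List[str]] = None) -> List[Stage]:
    """CPU load first, then one nbody process per GPU, each interval_s apart."""
    # Assumption: `mprime -t` starts the Prime95 torture test on Linux.
    stages = [Stage(0, "CPU", cpu_cmd or ["mprime", "-t"])]
    for gpu in range(num_gpus):
        # Assumption: -device=<d> pins this nbody instance to GPU <d>.
        stages.append(Stage((gpu + 1) * interval_s, f"GPU{gpu}",
                            [nbody_path, f"-device={gpu}"]))
    return stages


def end_time_s(stages: List[Stage], hold_s: int = 600) -> int:
    """Time at which the test ends: last launch plus the full-load hold period."""
    return stages[-1].start_s + hold_s


def run(stages: List[Stage], hold_s: int = 600) -> None:
    """Launch each stage at its offset, hold full load, then stop everything."""
    t0 = time.monotonic()
    procs = []
    for stage in stages:
        delay = stage.start_s - (time.monotonic() - t0)
        if delay > 0:
            time.sleep(delay)
        procs.append(subprocess.Popen(stage.cmd))
    time.sleep(hold_s)
    for proc in procs:
        proc.terminate()
        proc.wait()


if __name__ == "__main__":
    schedule = build_schedule()
    if "--run" in sys.argv:
        run(schedule)
    else:
        # Dry run: print the schedule instead of launching anything.
        for stage in schedule:
            print(f"t={stage.start_s:4d}s  {stage.label:5s}  {' '.join(stage.cmd)}")
        print(f"end at t={end_time_s(schedule)}s")
```

With the defaults, the third GPU starts 6 minutes in and the test ends at 16 minutes; only passing `--run` actually launches processes.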

・ Measurement results

1. Power consumption (Fig. 1)

First, the change in power consumption over the sequence of operations described above. Power consumption is measured with the watt checker inserted between the power supply unit and the wall outlet.


At startup, power briefly spikes to 360 W, then settles at about 150-160 W when idle. With load on the CPU only, it is about 270 W, and it rises each time another GPU is started. Peak power consumption was 811.9 W.

The graph suggests that power consumption does not rise much when the second and third GPUs are loaded, so I checked GPU utilization (next section).
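To make this observation concrete, the sketch below compares the measured peak with a naive sum, using the wall-side figures from Fig. 1 and assuming 250 W per card as the TITAN X catalog power. It is a rough check only: it mixes a wall-side CPU figure with a DC-side catalog figure and ignores power supply losses, which would only widen the gap.

```python
# Values read from Fig. 1 of this article (wall-side measurements).
CPU_ONLY_W = 270.0   # CPU (Prime95) load only
PEAK_W = 811.9       # peak with all three GPUs loaded
NUM_GPUS = 3
# Assumption: 250 W per card, taken as the TITAN X catalog board power.
CATALOG_GPU_W = 250.0


def naive_full_load_w(cpu_only_w: float, per_gpu_w: float, num_gpus: int) -> float:
    """Power if every GPU added its full catalog power on top of the CPU load."""
    return cpu_only_w + per_gpu_w * num_gpus


def avg_increase_per_gpu_w(cpu_only_w: float, peak_w: float, num_gpus: int) -> float:
    """Average measured increase per GPU between CPU-only load and the peak."""
    return (peak_w - cpu_only_w) / num_gpus


if __name__ == "__main__":
    print(f"naive estimate: {naive_full_load_w(CPU_ONLY_W, CATALOG_GPU_W, NUM_GPUS):.0f} W")
    print(f"measured peak : {PEAK_W:.1f} W")
    print(f"avg per GPU   : {avg_increase_per_gpu_w(CPU_ONLY_W, PEAK_W, NUM_GPUS):.1f} W")
```

With these inputs the naive estimate is 1020 W against a measured peak of 811.9 W, or roughly 181 W per GPU on average, which is consistent with the GPUs not all being fully busy at the same time.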

2-1. Utilization when all three GPUs are loaded (Fig. 2)

The graph below shows GPU utilization with all three GPUs loaded.

You can see that the GPUs are not under constant load; utilization fluctuates rapidly. In places, GPU 0 utilization even drops momentarily to 0%. We therefore also checked utilization with one GPU and with two GPUs loaded.
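The article used NVIDIA Inspector for these graphs. On Linux, one alternative (a swapped-in tool, not the author's method) is to log utilization with `nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader,nounits -l 1 > util.csv` and summarize the log. The minimal sketch below reports, per GPU, the minimum and mean utilization and how often a sample was 0%; the function name is hypothetical.

```python
import sys
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable


def summarize_utilization(lines: Iterable[str]) -> Dict[int, Dict[str, float]]:
    """Per-GPU min/mean utilization and share of 0% samples from nvidia-smi CSV lines.

    Each line is expected as "<index>, <utilization>" (noheader, nounits).
    """
    samples = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        index, util = (field.strip() for field in line.split(","))
        samples[int(index)].append(int(util))
    return {
        gpu: {
            "min": min(vals),
            "mean": mean(vals),
            "zero_fraction": vals.count(0) / len(vals),
        }
        for gpu, vals in sorted(samples.items())
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize.py util.csv
    with open(sys.argv[1]) as f:
        for gpu, stats in summarize_utilization(f).items():
            print(gpu, stats)
```

A nonzero `zero_fraction` on a card that is supposed to be under load is the kind of momentary drop seen for GPU 0 in Fig. 2.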

2-2. Utilization when one GPU is loaded (Fig. 3)

Only GPU1 is loaded.

2-3. Utilization when two GPUs are loaded (Fig. 4)

GPU1 and GPU2 are loaded.

Comparing these results, utilization rises and falls somewhat when only one card is loaded (Fig. 3), but with a single card the lower limit stays at around 4%. Once a second card is loaded (Fig. 4), the graph looks much like the three-card case (Fig. 2).

From this, within the scope of this test, it appears that when several TITAN X cards run simultaneously, they operate while switching to some extent between the GPUs in use. This behavior may be caused by the program used in this test, so it does not mean that multiple GPUs cannot be used together. However, with a computation that properly loads every card, power consumption could greatly exceed the values measured here (Fig. 1).

In Fig. 1, adding the first GPU on top of the CPU load raises power consumption from about 270 W to about 520 W, an increase of about 250 W per GPU. Looking only at this step, the figure is in line with the catalog specification, and the graph moves up to near 250 W (slightly above the catalog specification).

This may be because power consumption here was measured on the outlet side (outlet → Watts up? Pro → power supply unit), so the reading includes the losses of the power supply unit (its efficiency and power factor). On top of the catalog power consumed inside the machine, the power supply unit may therefore draw more from the wall than the catalog specification suggests.
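The relationship between DC-side load and wall-side readings can be sketched as follows. A watt meter's W reading rises with power supply losses (load divided by efficiency), while a low power factor raises apparent power (VA) and line current rather than the watt figure. The efficiency and power factor values below are hypothetical, not measured for the EMR1500EWT.

```python
def wall_power_w(dc_load_w: float, efficiency: float) -> float:
    """Real power drawn from the outlet for a given DC-side load and PSU efficiency."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return dc_load_w / efficiency


def apparent_power_va(real_power_w: float, power_factor: float) -> float:
    """Apparent power S = P / PF; a low PF raises VA and current, not the W reading."""
    if not 0 < power_factor <= 1:
        raise ValueError("power factor must be in (0, 1]")
    return real_power_w / power_factor


if __name__ == "__main__":
    # Hypothetical figures: one card at 250 W DC, PSU at 88% efficiency, PF 0.95.
    p = wall_power_w(250.0, 0.88)
    print(f"wall: {p:.1f} W, apparent: {apparent_power_va(p, 0.95):.1f} VA")
```

Under these assumptions a 250 W card shows up as roughly 284 W at the outlet, which is one plausible reason a wall-side reading sits above the catalog value.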

Conclusion 1 (About power consumption)

When actually computing on GPUs, it seems necessary to size the power supply with some margin over the GPGPU cards' catalog specifications.

In particular, when considering a dual-CPU model or a four-card configuration, operation on a 100 V supply may be quite demanding, so it is probably better to consider building a 200 V environment if possible.
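The 100 V versus 200 V point comes down to line current, I = P / (V × PF). The sketch below applies this to the 811.9 W peak measured here and to a hypothetical four-card load (270 W CPU-side plus 4 × 250 W, an illustrative assumption, not a measurement). The 15 A figure is an assumed rating for a typical 100 V wall receptacle and should be checked for the actual circuit.

```python
# Assumption: 15 A rating for a typical 100 V wall receptacle.
OUTLET_RATING_A = 15.0


def current_a(real_power_w: float, voltage_v: float, power_factor: float = 1.0) -> float:
    """Line current I = P / (V * PF)."""
    if voltage_v <= 0 or not 0 < power_factor <= 1:
        raise ValueError("voltage must be positive and 0 < PF <= 1")
    return real_power_w / (voltage_v * power_factor)


if __name__ == "__main__":
    loads = {
        "measured peak (3 GPUs)": 811.9,
        "hypothetical 4 GPUs": 270.0 + 4 * 250.0,  # illustrative, not measured
    }
    for name, watts in loads.items():
        for volts in (100.0, 200.0):
            amps = current_a(watts, volts)
            print(f"{name}: {volts:.0f} V -> {amps:.2f} A")
    print(f"assumed 100 V outlet rating: {OUTLET_RATING_A:.0f} A")
```

At 100 V the measured peak already draws about 8.1 A, and the hypothetical four-card case about 12.7 A before PSU losses; doubling the voltage halves the current for the same power.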

3. Temperature (Fig. 5)

This graph shows temperatures measured in the same three-GPU setup. The ambient temperature at the time of measurement was about 20 °C (measured at the front of the case), and the measurement points were near the GPUs inside the case (front side of the case) and near the GPU exhaust outside the case.

When the power is turned on, both the internal and external temperatures rise. As soon as the GPU load starts, the exhaust temperature outside the case jumps sharply and levels off at around 80 °C. The internal temperature keeps rising slightly after that, but stops rising about 10 minutes after the first load is applied and then remains stable.

In this test the load on the three GPUs was held for about 10 minutes and then stopped, and the graph shows the exhaust-side temperature dropping sharply at the moment the load stopped.

The load status after 10 minutes is shown below (Fig. 6).

4. Utilization after 10 minutes with three GPUs loaded (Fig. 6)

This graph shows the state after 10 minutes with all three GPUs loaded.

The maximum GPU temperature is 86 °C.

As for how the load is applied, GPU1 and GPU2 swing widely, while GPU0 fluctuates little and stays under a steady load. Correspondingly, GPU0 runs its fan at 50%, while GPU1 and GPU2 run theirs below 50%. The trend in the graph shows that this is not a temporary effect.

From the above, the temperature could rise somewhat further depending on the load and the program (in fact, the fans in this configuration appear to have some headroom left).

Also, because this test was run in winter, the ambient temperature was a relatively low 20 °C. If the ambient temperature is higher, for example in summer, the difference may be added on top of these measurements.
The concern is that this test reached a maximum of 86 °C after about 10 minutes of load, so if the ambient temperature reaches 25 °C, the GPU may hit the maximum temperature listed in the TITAN X catalog specifications.
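The reasoning above can be written as a simple constant-offset model: GPU temperature shifts one-for-one with ambient temperature. This is a rough sketch, not a thermal model; in practice the fan controller will react, so treat the result as an indication only. The 91 °C limit is taken as the maximum GPU temperature on the GeForce GTX TITAN X spec sheet and should be verified against the reference below.

```python
# Assumption: 91 C maximum GPU temperature from the TITAN X spec sheet (verify).
SPEC_MAX_C = 91.0


def extrapolate_gpu_temp_c(measured_gpu_c: float, measured_ambient_c: float,
                           new_ambient_c: float) -> float:
    """Constant-offset model: GPU temperature moves one-for-one with ambient."""
    return measured_gpu_c + (new_ambient_c - measured_ambient_c)


if __name__ == "__main__":
    # Measured in this article: 86 C maximum at about 20 C ambient.
    for ambient in (20.0, 25.0, 30.0):
        t = extrapolate_gpu_temp_c(86.0, 20.0, ambient)
        print(f"ambient {ambient:.0f} C -> GPU ~{t:.0f} C, margin {SPEC_MAX_C - t:+.0f} C")
```

Under this model a 5 °C warmer room leaves no margin at all, which is why ambient temperature matters when planning an installation.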

Conclusion 2 (about temperature)

Depending on how the load is applied and on the program, the fan speed should rise further. However, the temperature inside the case is of course also affected by the ambient temperature, so in practice the GPUs end up being cooled with air at close to 50 °C. Ambient temperature is therefore also an important factor when deploying such a system.

We hope this is useful for those who are considering introducing GPGPU.

References:
GeForce GTX TITAN X spec sheet
http://www.nvidia.co.jp/object/geforce-gtx-titan-x-jp.html#pdpContent=2