Metashape (formerly PhotoScan), about which we often receive inquiries, can also run its processing in a cluster configuration over a network.
We previously measured PhotoScan processing speed with different GPUs and other hardware. This time we verified how processing speed behaves when Metashape is run in a cluster configuration.
About the verification environment
The cluster used for this test is configured as follows. We prepared four of the systems below: three operate purely as cluster nodes, and one doubles as the Metashape server, the storage server, and a node.
Cluster system
CPU | Intel Core i9 9900K (3.60GHz / TB5.0GHz, 8C / 16T) |
Memory | 40GB |
SSD | 1TB SATA |
GPU | GeForce RTX 2080 Ti × 1 |
LAN | Onboard (1GbE) |
OS | Microsoft Windows 10 Professional 64bit |
Metashape | Ver. 1.5.2.7838 |
For reference, results from a machine with the following specifications, run standalone, are also shown for comparison.
Standalone system for comparison
CPU | Intel Xeon W-2155 (3.30GHz / TB4.50GHz, 10C / 20T) |
Memory | 256GB |
SSD | 1TB M.2 |
GPU | GeForce RTX 2080 Ti × 2 |
OS | Microsoft Windows 10 Professional 64bit |
Metashape | Ver. 1.5.2.7838 |
About processing contents
As in the last test, we used the Doll sample data published by the developer (Agisoft) on the Agisoft data download page, and ran the following steps: ① MatchPhotos ② AlignCameras ③ BuildDepthMaps ④ BuildDenseCloud ⑤ BuildModel ⑥ BuildUV ⑦ BuildTexture.
* The processing was run with the following parameters.
Align Photos: Highest
Build Dense Cloud: Ultra High
Build Mesh: Arbitrary & Ultra High
Build Texture: Generic
About processing results
The graph below shows the total processing time from ① MatchPhotos to ⑦ BuildTexture, calculated from the server log.
(Y axis = elapsed time: the longer the bar, the longer the processing took)
1 cluster node (RTX 2080 Ti × 1) | 20 minutes 35 seconds |
2 cluster nodes (RTX 2080 Ti × 2) | 18 minutes 33 seconds |
3 cluster nodes (RTX 2080 Ti × 3) | 17 minutes 09 seconds |
3 cluster nodes + 1 shared server/node (RTX 2080 Ti × 4) | 16 minutes 44 seconds |
Standalone system for comparison (RTX 2080 Ti × 2) | 20 minutes 03 seconds |
A single cluster node is somewhat slow, but within the cluster results the processing gets faster as the number of machines increases. However, the difference made by the number of GPUs appears small, which raises the question: is a cluster configuration really effective for this processing?
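To put a number on "the difference appears small", the totals in the results table can be converted to seconds and expressed as a speedup over the single-node run. This is only a minimal sketch: the dictionary keys and function names are illustrative labels chosen here, and the values are simply the times listed in the table.

```python
# Total times copied from the results table (minutes, seconds).
# The labels are illustrative; only the numbers come from the measurements.
RESULTS = {
    "1 node (1 GPU)": (20, 35),
    "2 nodes (2 GPUs)": (18, 33),
    "3 nodes (3 GPUs)": (17, 9),
    "3 nodes + shared server (4 GPUs)": (16, 44),
    "standalone (2 GPUs)": (20, 3),
}


def to_seconds(minutes, seconds):
    """Convert a (minutes, seconds) pair to seconds."""
    return minutes * 60 + seconds


def speedups(results, baseline):
    """Return each configuration's speedup relative to the baseline entry."""
    base = to_seconds(*results[baseline])
    return {name: base / to_seconds(*t) for name, t in results.items()}


if __name__ == "__main__":
    for name, s in speedups(RESULTS, "1 node (1 GPU)").items():
        total = to_seconds(*RESULTS[name])
        print(f"{name:34s} {total:5d} s  x{s:.2f}")
```

With these figures, going from one GPU to four gives a speedup of only about 1.23x, and two nodes give about 1.11x.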
We therefore went through the logs for the runs above and totaled the time for each part that appeared to correspond to a Metashape phase. The result is the following graph.
# For the standalone system the log output is different, so it is not included in the above graph.
Judging from this distribution, the processing speed of the first half, from ① MatchPhotos to ④ BuildDenseCloud, reflects the number of GPUs and cluster nodes to some extent, but from ⑤ BuildModel to ⑦ BuildTexture the speed hardly changes with the number of nodes.
We also checked the load on each machine during the measurement. It confirmed that all processing from ⑤ BuildModel to ⑦ BuildTexture ran on only one cluster node and was not distributed to the other nodes.
Another point to note is the time taken by each phase. In this measurement, ⑤ BuildModel to ⑦ BuildTexture accounted for more than 50% of the total processing time.
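One way to see why this matters is Amdahl's law, which is not part of the original measurement but is a common model for this situation: if a fraction s of the work does not scale with the number of nodes, n nodes can give at most a speedup of 1 / (s + (1 - s) / n). The sketch below works backwards from the measured totals to the fraction that behaved as non-scaling. It assumes the shared server counts as a fourth node; the function names are illustrative.

```python
def amdahl_speedup(serial_fraction, n):
    """Speedup predicted by Amdahl's law for n nodes."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)


def implied_serial_fraction(speedup, n):
    """Solve 1/S = s + (1 - s)/n for s, given a measured speedup S."""
    return (1.0 / speedup - 1.0 / n) / (1.0 - 1.0 / n)


# Measured totals in seconds, keyed by node count (from the results table).
# Assumption: "3 cluster nodes + shared server" is treated as 4 nodes.
TOTAL_SECONDS = {1: 1235, 2: 1113, 3: 1029, 4: 1004}

if __name__ == "__main__":
    base = TOTAL_SECONDS[1]
    for n in (2, 3, 4):
        measured = base / TOTAL_SECONDS[n]
        s = implied_serial_fraction(measured, n)
        print(f"{n} nodes: speedup x{measured:.2f}, implied non-scaling share {s:.0%}")
```

Under this simple model the implied non-scaling share comes out at roughly 75 to 80%, which is consistent with the single-node phases taking more than half of the total time. The figure also absorbs network and scheduling overhead, so it should be read as a rough indication rather than a measurement of phases ⑤ to ⑦ alone.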
Look again at the first graph, showing the total processing time from ① MatchPhotos to ⑦ BuildTexture. The standalone comparison system (navy blue), which with two GPUs should be a fairly high-spec machine, actually took longer than the two-node cluster (two GPUs, light blue). We guessed that the cause lies in a specification difference between the standalone system and the cluster systems, other than the number of GPUs, that affects the phases accounting for more than 50% of the total processing time.
Comparing the specifications on that assumption, the CPU appeared to be the factor. Let's compare the two CPUs again.
Cluster system
CPU | Intel Core i9 9900K (3.60GHz / TB5.0GHz, 8C / 16T) |
Standalone system for comparison
CPU | Intel Xeon W-2155 (3.30GHz / TB4.50GHz, 10C / 20T) |
The cluster side runs at a Turbo Boost (TB) clock of 5.0GHz, while the standalone side runs at 4.50GHz. In other words, we speculate that for ⑤ BuildModel to ⑦ BuildTexture, the single-core clock of the CPU has a particularly strong effect on processing speed.
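As a rough plausibility check of this speculation, the ratio of the two Turbo Boost clocks can be set against the ratio of the two measured totals (standalone system versus two-node cluster, both with two GPUs). This is a sketch under strong assumptions: it ignores differences in CPU architecture, memory, storage, and network, and the function name is illustrative.

```python
# Turbo Boost clocks from the specification tables (GHz).
CLUSTER_TURBO_GHZ = 5.0      # Core i9 9900K
STANDALONE_TURBO_GHZ = 4.5   # Xeon W-2155

# Measured totals from the results table (seconds).
TWO_NODE_CLUSTER_S = 18 * 60 + 33   # 2 cluster nodes, 2 GPUs
STANDALONE_S = 20 * 60 + 3          # standalone system, 2 GPUs


def compare_ratios():
    """Return (clock ratio, time ratio) for cluster vs. standalone."""
    clock_ratio = CLUSTER_TURBO_GHZ / STANDALONE_TURBO_GHZ
    time_ratio = STANDALONE_S / TWO_NODE_CLUSTER_S
    return clock_ratio, time_ratio


if __name__ == "__main__":
    clock_ratio, time_ratio = compare_ratios()
    print(f"clock ratio (cluster / standalone): {clock_ratio:.2f}")
    print(f"time ratio  (standalone / cluster): {time_ratio:.2f}")
```

The clock ratio is about 1.11 and the time ratio about 1.08. The two point in the same direction and are of similar size, which does not contradict the speculation, although it is far from proving it.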
Then what happens if ⑤ BuildModel to ⑦ BuildTexture are processed on a system that prioritizes core count, with far more cores than in this example? That is the next question.
We will report on it soon in "Processing speed measurement and trend verification in a "Metashape" cluster configuration (Part 2)", together with the verification results for a batch run of orthoimage processing.
■ Added in June 2019: The second part has been released!