OptiStruct SPMD (Hybrid Shared/Distributed Memory Parallelization)
Single Program, Multiple Data (SPMD) is a parallelization technique in computing that is employed to achieve faster results by splitting the program into multiple subsets and running them simultaneously on multiple processors/machines. SPMD typically implies running the same process or program on different machines (Nodes) with different input data for each individual task. In this section, SPMD refers to the application of Shared Memory Parallelization (SMP) in conjunction with MPI-based parallelization. This combination is termed Hybrid Shared/Distributed Memory Parallelization and is henceforth referred to as SPMD.
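In practical terms, the two levels of parallelism multiply: an SPMD run that starts -np MPI processes with -nt SMP threads per process occupies roughly np × nt cores in total. As a purely illustrative example, -np 4 combined with -nt 2 engages about eight cores spread across the participating nodes; both run options appear in the examples later in this section.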
Supported platforms and MPI versions for OptiStruct SPMD are listed in Table 1:
Application | Version | Supported Platforms | MPI
---|---|---|---
OptiStruct SPMD | 2017 | Linux | IBM Platform MPI - Version 9.1.2
OptiStruct SPMD | 2017 | Windows | IBM Platform MPI - Version 9.1.2

Table 1: Supported Platforms for OptiStruct SPMD
However, depending on the program and on hardware limitations/requirements, SPMD can also be implemented on a single machine with multiple processors. SPMD in OptiStruct is implemented by the following MPI-based functionalities:
• Load Decomposition Method (LDM)
• Domain Decomposition Method (DDM)
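Both methods are selected through OptiStruct run options at launch time. As a minimal sketch of an LDM run (the -ldm, -np, and -nt options also appear in the examples later in this section; the process and thread counts are placeholders to be matched to the available hardware):

    optistruct <inputfile> -ldm -np 4 -nt 4

This starts four MPI processes, each running four SMP threads.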
In addition to the Load Decomposition Method (LDM), OptiStruct SPMD includes another approach for parallelization called the Domain Decomposition Method (DDM) for analysis and optimization. DDM allows you to run a single subcase of analysis and/or optimization with multiple processors. The solution time is significantly reduced in DDM mode, and the scalability is much higher compared to the legacy shared memory parallelization approach, especially on machines with a high number of processors/sockets (for example, greater than 8).

Figure 2: Example illustrating graph partitioning for the DDM implementation in OptiStruct

The DDM process utilizes graph partitioning algorithms to automatically partition the geometric structure into multiple domains (equal to the number of MPI processes). During FEA analysis/optimization, an individual domain/MPI process only handles its domain-related calculations. Such procedures include element matrix assembly, linear solution, stress calculations, sensitivity calculations, and so on.

Figure 3: Example DDM setup with four MPI processes (-np=4). There are 2 nodes/sockets available for use; two MPI processes are assigned to each node in this case.

The necessary communication across domains is handled by OptiStruct and is required to guarantee the accuracy of the final solution. When the solution is complete, result data is collected and output to a single copy of the .out file. From the user's perspective, there is no difference between DDM and serial runs in this respect.

Supported Solution Sequences for DDM
Linear and nonlinear static analysis/optimization, structural direct frequency response analysis (MUMPS is also available for the SMP run via the SOLVTYP entry), normal modes analysis, and buckling analysis solution sequences are generally supported. Preloaded modal frequency response (with AMLS/AMSES) is supported. Direct frequency response with fluid-structure interaction (acoustic analysis) is supported. Fatigue analysis (based on linear static analysis) is also supported. Normal modes and buckling analysis/optimization are supported via the Lanczos eigensolver; additionally, the MUMPS solver can be activated for the SMP run using the SOLVTYP entry. Iterative solvers, however, are currently not supported in conjunction with DDM.
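A command line matching the setup in Figure 3 (four MPI processes spread over two nodes) might look like the following sketch. The -np and -nt options are the same as above; the -ddm option name is an assumption made here by analogy with -ldm, and both it and the mechanism used to place two MPI processes on each node (for example, a host list) should be verified against the Run Options documentation and the MPI manual.

    optistruct <inputfile> -ddm -np 4 -nt 2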
Frequently Asked Questions (DDM)
Can OptiStruct SPMD be run over a LAN?
It is possible to run OptiStruct SPMD over a LAN. Follow the corresponding MPI manual to set up a different working directory on each node on which OptiStruct SPMD is launched.
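For instance (a sketch only; the exact mechanism depends on the MPI vendor), the machines taking part in the run are typically listed in a host file that is passed to the launcher, and each listed node needs its own working/scratch directory as described in the MPI manual. The -hostfile option name used below is an assumption and should be checked against the OptiStruct run options and the IBM Platform MPI documentation.

    optistruct <inputfile> -ldm -np 4 -nt 2 -hostfile hosts.txt

where hosts.txt contains one machine name per line, for example node01 and node02.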
Is it faster to run all MPI tasks on a single machine than to distribute them over a network?
There is no single answer to this question. If the computer has sufficient memory to run all tasks in-core, expect faster solution times, since MPI communication is not slowed down by the network speed. However, if the tasks have to run out-of-core, computations are slowed down by disk read/write delays. Multiple tasks on the same machine may compete for disk access and, in extreme situations, even result in wall clock times slower than those of serial (non-MPI) runs.
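As a rough, purely hypothetical illustration: if each MPI task needs about 20 GB of memory to stay in-core and the machine has 64 GB of RAM, no more than three tasks should be placed on it; adding a fourth would force the tasks out-of-core and make them compete for the same disk, which can cancel out the benefit of the extra process.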
How do I determine the number of nodes and the type of parallelization to use?
The flowchart below provides a quick reference to determine the number of nodes for each parallelization.
Figure 4: Flowchart showing the process to follow for LDM and DDM runs.
How should the number of MPI processes and SMP threads be chosen?
To run parallel MPI processes, distributed memory (with parallel access) is essential. If a single node contains multiple sockets (each with a single processor), then theoretically an equivalent number of MPI processes (equal to the number of sockets) can be run on the node, provided sufficient RAM is available to handle all MPI processes simultaneously in parallel. However, if sufficient distributed memory is not available in RAM, it is typically more efficient to use Shared Memory Parallelization (SMP) instead of SPMD and use multiple logical processors/cores within the node in parallel via the -nt run option. When each node has only enough RAM to execute a single serial OptiStruct run, activate SMP on each node by splitting the run into multiple threads (using more than four threads is usually not effective, so -nt=4 is a typical choice).
For example, on a 4-node cluster with 2 sockets per node and 8 cores per node, you can run:
Insufficient RAM: optistruct <inputfile> -ldm -np 4 -nt 4
Sufficient RAM: optistruct <inputfile> -ldm -np 8 -nt 4
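For reference, the SMP-only fallback mentioned above (no MPI processes, threads within a single node) is a plain run with the -nt option; the thread count here is a placeholder and should match the cores available on the node:

    optistruct <inputfile> -nt 8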
There are several ways to launch parallel programs with OptiStruct SPMD. Remember to propagate environment variables when launching OptiStruct SPMD, if needed; refer to the respective MPI vendor's manual for details. As of OptiStruct 14.0, commonly used MPI runtime software is automatically included as part of the HyperWorks installation. The various MPI installations are located at $ALTAIR_HOME/mpi.
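As a sketch of what this can look like on Linux (bash syntax; MY_ENV_VAR is a placeholder for whatever variable the run actually needs, and the propagation mechanism itself should be taken from the MPI vendor's manual, since some launchers require variables to be forwarded explicitly to remote nodes):

    export MY_ENV_VAR=value
    optistruct <inputfile> -ldm -np 4 -nt 4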