OptiStruct SPMD (Hybrid Shared/Distributed Memory Parallelization)
Single Program, Multiple Data (SPMD) is a parallelization technique in computing that is employed to achieve faster results by splitting the program into multiple subsets and running them simultaneously on multiple processors/machines. SPMD typically implies running the same process or program on different machines (Nodes) with different input data for each individual task. In this section, SPMD refers to the application of Shared Memory Parallelization (SMP) in conjunction with MPI-based parallelization. This combination is termed Hybrid Shared/Distributed Memory Parallelization and is henceforth referred to as SPMD.
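In practical terms, the two levels of parallelism multiply: an SPMD run that starts -np MPI processes with -nt SMP threads per process occupies roughly np × nt cores in total. As a purely illustrative example, -np 4 combined with -nt 2 engages about eight cores spread across the participating nodes; both run options appear in the examples later in this section.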
Supported platforms and MPI versions for OptiStruct SPMD are listed in Table 1:
Application | Version | Supported Platforms | MPI
---|---|---|---
OptiStruct SPMD | 2017 | Linux | IBM Platform MPI - Version 9.1.2
OptiStruct SPMD | 2017 | Windows | IBM Platform MPI - Version 9.1.2

Table 1: Supported Platforms for OptiStruct SPMD
However, depending on the program and on hardware limitations/requirements, SPMD can also be implemented on a single machine with multiple processors. SPMD in OptiStruct is implemented by the following MPI-based functionalities:
• Load Decomposition Method (LDM)
• Domain Decomposition Method (DDM)
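Both methods are selected through OptiStruct run options at launch time. As a minimal sketch of an LDM run (the -ldm, -np, and -nt options also appear in the examples later in this section; the process and thread counts are placeholders to be matched to the available hardware):

    optistruct <inputfile> -ldm -np 4 -nt 4

This starts four MPI processes, each running four SMP threads.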
In addition to the Load Decomposition Method (LDM), OptiStruct SPMD includes another approach for parallelization called the Domain Decomposition Method (DDM) for analysis and optimization. DDM allows you to run a single subcase of analysis and/or optimization with multiple processors. The solution time is significantly reduced in DDM mode, and the scalability is much higher compared to the legacy shared memory parallelization approach, especially on machines with a high number of processors/sockets (for example, greater than 8).

Figure 2: Example illustrating graph partitioning for the DDM implementation in OptiStruct

The DDM process utilizes graph partitioning algorithms to automatically partition the geometric structure into multiple domains (equal to the number of MPI processes). During FEA analysis/optimization, an individual domain/MPI process only handles its domain-related calculations. Such procedures include element matrix assembly, linear solution, stress calculations, sensitivity calculations, and so on.

Figure 3: Example DDM setup with four MPI processes (-np=4). There are 2 nodes/sockets available for use; two MPI processes are assigned to each node in this case.

The necessary communication across domains is handled by OptiStruct and is required to guarantee the accuracy of the final solution. When the solution is complete, result data is collected and output to a single copy of the .out file. From the user's perspective, there is no difference between DDM and serial runs in this respect.

Supported Solution Sequences for DDM
Linear and nonlinear static analysis/optimization, structural direct frequency response analysis (MUMPS is also available for the SMP run via the SOLVTYP entry), normal modes analysis, and buckling analysis solution sequences are generally supported. Preloaded modal frequency response (with AMLS/AMSES) is supported. Direct frequency response with fluid-structure interaction (acoustic analysis) is supported. Fatigue analysis (based on linear static analysis) is also supported. Normal modes and buckling analysis/optimization are supported via the Lanczos eigensolver; additionally, the MUMPS solver can be activated for the SMP run using the SOLVTYP entry. Iterative solvers, however, are currently not supported in conjunction with DDM.
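A command line matching the setup in Figure 3 (four MPI processes spread over two nodes) might look like the following sketch. The -np and -nt options are the same as above; the -ddm option name is an assumption made here by analogy with -ldm, and both it and the mechanism used to place two MPI processes on each node (for example, a host list) should be verified against the Run Options documentation and the MPI manual.

    optistruct <inputfile> -ddm -np 4 -nt 2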
Frequently Asked Questions (DDM)
Can OptiStruct SPMD be run over a LAN?
It is possible to run OptiStruct SPMD over a LAN. Follow the corresponding MPI manual to set up a different working directory on each node on which OptiStruct SPMD is launched.
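For instance (a sketch only; the exact mechanism depends on the MPI vendor), the machines taking part in the run are typically listed in a host file that is passed to the launcher, and each listed node needs its own working/scratch directory as described in the MPI manual. The -hostfile option name used below is an assumption and should be checked against the OptiStruct run options and the IBM Platform MPI documentation.

    optistruct <inputfile> -ldm -np 4 -nt 2 -hostfile hosts.txt

where hosts.txt contains one machine name per line, for example node01 and node02.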
Is it faster to run all MPI tasks on a single machine than to distribute them over a network?
There is no single answer to this question. If the computer has sufficient memory to run all tasks in-core, expect faster solution times, since MPI communication is not slowed down by the network speed. However, if the tasks have to run out-of-core, computations are slowed down by disk read/write delays. Multiple tasks on the same machine may compete for disk access and, in extreme situations, even result in wall clock times slower than those of serial (non-MPI) runs.
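As a rough, purely hypothetical illustration: if each MPI task needs about 20 GB of memory to stay in-core and the machine has 64 GB of RAM, no more than three tasks should be placed on it; adding a fourth would force the tasks out-of-core and make them compete for the same disk, which can cancel out the benefit of the extra process.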
How do I determine the number of nodes and the type of parallelization to use?
The flowchart below provides a quick reference to determine the number of nodes for each parallelization.
Figure 4: Flowchart showing the process to follow for LDM and DDM runs.
How should the number of MPI processes and SMP threads be chosen?
To run parallel MPI processes, distributed memory (with parallel access) is essential. If a single node contains multiple sockets (each with a single processor), then theoretically an equivalent number of MPI processes (equal to the number of sockets) can be run on the node, provided sufficient RAM is available to handle all MPI processes simultaneously in parallel. However, if sufficient distributed memory is not available in RAM, it is typically more efficient to use Shared Memory Parallelization (SMP) instead of SPMD and use multiple logical processors/cores within the node in parallel via the -nt run option. When each node has only enough RAM to execute a single serial OptiStruct run, activate SMP on each node by splitting the run into multiple threads (using more than four threads is usually not effective, so -nt=4 is a typical choice).
For example, on a 4-node cluster with 2 sockets per node and 8 cores per node, you can run:
Insufficient RAM: optistruct <inputfile> -ldm -np 4 -nt 4
Sufficient RAM: optistruct <inputfile> -ldm -np 8 -nt 4
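For reference, the SMP-only fallback mentioned above (no MPI processes, threads within a single node) is a plain run with the -nt option; the thread count here is a placeholder and should match the cores available on the node:

    optistruct <inputfile> -nt 8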
There are several ways to launch parallel programs with OptiStruct SPMD. Remember to propagate environment variables when launching OptiStruct SPMD, if needed; refer to the respective MPI vendor's manual for details. As of OptiStruct 14.0, commonly used MPI runtime software is automatically included as part of the HyperWorks installation. The various MPI installations are located at $ALTAIR_HOME/mpi.
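As a sketch of what this can look like on Linux (bash syntax; MY_ENV_VAR is a placeholder for whatever variable the run actually needs, and the propagation mechanism itself should be taken from the MPI vendor's manual, since some launchers require variables to be forwarded explicitly to remote nodes):

    export MY_ENV_VAR=value
    optistruct <inputfile> -ldm -np 4 -nt 4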