Benchmark Test of MPI Programs on the PRIMEPOWER HPC2500
Benchmark Test of New Domain Decomposition 3-Dimensional MHD Code
Method of the Benchmark Test of MPI Fortran Programs on the PRIMEPOWER HPC2500
November 9, 2004
STEL, Nagoya University
Tatsuki Ogino
We have prepared four programs from the 3-dimensional MHD code written in MPI (Message
Passing Interface) Fortran that simulates the interaction between the solar wind and the
earth's magnetosphere: 1D, 2D and 3D domain decomposition codes for the vector-parallel
machine, and a 3D domain decomposition code for the scalar-parallel machine. All input
and output operations are commented out, so each program is a complete, self-contained
code that can be run by itself. These 3-dimensional MHD benchmark codes are available on
the MPI homepage, and the execution time of each test is a few minutes.
(A) mearthd1dc2n.f 1D Domain Decomposition for Vector-Parallel Machine f(nx2,ny2,nz2,nb)
(B) mearthd2dc2n.f 2D Domain Decomposition for Vector-Parallel Machine f(nx2,ny2,nz2,nb)
(C) mearthd3dc2n.f 3D Domain Decomposition for Vector-Parallel Machine f(nx2,ny2,nz2,nb)
(D) mearthd3dd2n.f 3D Domain Decomposition for Scalar-Parallel Machine f(nb,nx2,ny2,nz2)
Let us explain the details of the program concretely using (D) mearthd3dd2n.f as an example.
The number of decomposed domains in each direction and the number of cpus are defined as follows.
npex=2              Number of decomposed domains in the x-direction
npey=2              Number of decomposed domains in the y-direction
npez=2              Number of decomposed domains in the z-direction
npe=npex*npey*npez  Total number of cpus
They are defined in the program by the following statements.
parameter (npex=2,npey=2,npez=2)
parameter (npe=npex*npey*npez,npexy=npex*npey)
c parameter (nx=510,ny=254,nz=254,nxp=155)
When the decomposition is set up as npex=2 in the x-direction, npey=2 in the y-direction
and npez=2 in the z-direction, a total of npe=npex*npey*npez=8 cpus is required. Each of
npex, npey and npez must be 2 or more. The array sizes in the x, y and z directions,
including the boundary grids on both sides, are chosen to be powers of 2 in the present
computation: nx2=nx+2=512, ny2=ny+2=256, nz2=nz+2=256. In general, nx2, ny2 and nz2
must be divisible by npex, npey and npez, respectively.
The numbers of decomposed domains and the array size, (nx=510,ny=254,nz=254),
must be the same in the main program and in all subroutines.
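As a concrete illustration of these rules, the following is a minimal sketch (not taken from
mearthd3dd2n.f; the rank-to-block mapping and the variable names ipx, ipy, ipz are
assumptions) of how each MPI process could check the divisibility of the grid and locate
its own block.

c     Minimal sketch (assumption, not the actual benchmark code): each rank
c     checks that nx2, ny2, nz2 are divisible by npex, npey, npez and then
c     finds its block coordinates, taking x as the fastest-varying direction.
      program decomp_sketch
      include 'mpif.h'
      parameter (npex=2,npey=2,npez=2)
      parameter (npe=npex*npey*npez,npexy=npex*npey)
      parameter (nx=510,ny=254,nz=254)
      parameter (nx2=nx+2,ny2=ny+2,nz2=nz+2)
      integer ierr,irank,isize,ipx,ipy,ipz
      call mpi_init(ierr)
      call mpi_comm_rank(mpi_comm_world,irank,ierr)
      call mpi_comm_size(mpi_comm_world,isize,ierr)
c     exactly npe cpus are required
      if (isize.ne.npe) call mpi_abort(mpi_comm_world,1,ierr)
c     nx2, ny2 and nz2 must be divisible by npex, npey and npez
      if (mod(nx2,npex).ne.0 .or. mod(ny2,npey).ne.0
     &    .or. mod(nz2,npez).ne.0) then
        if (irank.eq.0) write(*,*) 'grid not divisible'
        call mpi_abort(mpi_comm_world,1,ierr)
      end if
c     block coordinates of this rank (x fastest, z slowest; an assumption)
      ipz = irank/npexy
      ipy = (irank-ipz*npexy)/npex
      ipx = irank-ipz*npexy-ipy*npex
c     local block size handled by each cpu
      write(*,*) 'rank',irank,' block',ipx,ipy,ipz,' size',
     &           nx2/npex,ny2/npey,nz2/npez
      call mpi_finalize(ierr)
      end

Running this sketch with 8 processes would print one line per rank with its block
coordinates and the local block size 256 x 128 x 128.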
f(nb,nx2,ny2,nz2)=f(8,512,256,256)
parameter (npex=2,npey=2,npez=2) npe= 8 cpu's
parameter (npex=2,npey=2,npez=4) npe= 16 cpu's
parameter (npex=2,npey=4,npez=4) npe= 32 cpu's
parameter (npex=4,npey=4,npez=4) npe= 64 cpu's
parameter (npex=4,npey=4,npez=8) npe= 128 cpu's
parameter (npex=4,npey=8,npez=8) npe= 256 cpu's
parameter (npex=8,npey=8,npez=8) npe= 512 cpu's
parameter (npex=8,npey=8,npez=16) npe=1024 cpu's
Thus, by using the four kinds of MHD codes, we can run the benchmark test to examine the
efficiency of a supercomputer such as a scalar-parallel machine as the number of cpus is
increased according to the above table. The three MHD codes (A) mearthd1dc2n.f,
(B) mearthd2dc2n.f and (C) mearthd3dc2n.f print the "cpu time" in seconds for every two
time steps of the advance, while (D) mearthd3dd2n.f prints the "cpu time" for every eight
time steps. The cpu time for one time step of the advance can then be obtained for each
MHD code.
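The codes print the elapsed time themselves; a minimal sketch of this kind of measurement,
assuming MPI_Wtime is used to bracket a fixed number of steps (the actual output statements
of the benchmark codes may differ), is:

c     Minimal timing sketch (assumption, not the actual output code of the
c     benchmark programs): MPI_Wtime brackets nstep time steps and rank 0
c     reports the time per step.
      program timing_sketch
      include 'mpif.h'
      integer ierr,irank,istep,nstep
      real*8 t0,t1
      parameter (nstep=8)
      call mpi_init(ierr)
      call mpi_comm_rank(mpi_comm_world,irank,ierr)
      t0 = mpi_wtime()
      do istep=1,nstep
c       one time step of the MHD advance would be called here
      end do
      t1 = mpi_wtime()
      if (irank.eq.0) write(*,*) 'sec per step =',(t1-t0)/nstep
      call mpi_finalize(ierr)
      end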
Compilation and execution on the Fujitsu PRIMEPOWER HPC2500 (ngrd1) are done as follows:
mpifrt -Kfast_GP2=3,largepage=2 -O5 -Kprefetch=4,prefetch_cache_level=2,prefetch_strong -Cpp -o progmpi mearthd3dd2n.f
mv progmpi ztest0008
qsub mpiex08b.sh
ngrd1% more mpiex08b.sh
# @$-q gs -eo -o ztest0008.out
# @$-lP 8 -lM 3.0gb -ls 128mb -lB 0 -lb 0
mpiexec -n 8 -mode limited ./mearthd5/ztest0008
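On a generic MPI installation (not the Fujitsu environment above; the compiler driver name
and options depend on the local system, so the following lines are only an assumed example),
the same code can typically be built and run as:

mpif90 -O3 -o progmpi mearthd3dd2n.f
mpiexec -n 8 ./progmpi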
----------------------------------------------------------------------------------
Parallel Computation by Domain Decomposition Method (October 1, 2004)
We have developed 3-dimensional MHD simulation codes of the earth's magnetosphere with
1D, 2D and 3D domain decomposition methods for parallel computation. The programs for the
vector-parallel machine keep the array order f(x,y,z,components), as in the former MHD
program, so that the first array dimension provides long vectors. On the other hand, the
program for the scalar-parallel machine uses the array order f(components,x,y,z) in order
to raise the cache hit rate; that is, the array order is changed so that the variables
used together in the calculation lie closer to each other in memory.
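The effect of the two orderings can be seen in the following sketch (an assumption for
illustration with tiny array sizes, not taken from the benchmark codes; in the actual
codes nx2=512, ny2=256, nz2=256 and nb=8). Fortran stores the leftmost index contiguously,
so fv gives long contiguous runs in x for each component, while fs keeps the nb components
of one grid point adjacent in memory.

c     Sketch of the two array orderings (assumption, tiny sizes).
      program ordering_sketch
      parameter (nx2=8,ny2=8,nz2=8,nb=8)
      dimension fv(nx2,ny2,nz2,nb),fs(nb,nx2,ny2,nz2)
      fv = 1.0
      fs = 1.0
      s = 0.0
c     scalar-parallel ordering (D): inner loop over the nb components
c     of one grid point touches contiguous memory (good for cache)
      do k=1,nz2
        do j=1,ny2
          do i=1,nx2
            do m=1,nb
              s = s + fs(m,i,j,k)
            end do
          end do
        end do
      end do
c     vector-parallel ordering (A)-(C): inner loop over x gives long
c     contiguous vectors for each component m (good for vectorization)
      do m=1,nb
        do k=1,nz2
          do j=1,ny2
            do i=1,nx2
              s = s + fv(i,j,k,m)
            end do
          end do
        end do
      end do
      write(*,*) s
      end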
gpcs%/vpp/home/usr6/a41456c/mearthd4
(A) mearthd1dc2n.f 1D Domain Decomposition for Vector-Parallel Machine f(nx2,ny2,nz2,nb)
(B) mearthd2dc2n.f 2D Domain Decomposition for Vector-Parallel Machine f(nx2,ny2,nz2,nb)
(C) mearthd3dc2n.f 3D Domain Decomposition for Vector-Parallel Machine f(nx2,ny2,nz2,nb)
(D) mearthd3dd2n.f 3D Domain Decomposition for Scalar-Parallel Machine f(nb,nx2,ny2,nz2)
The 2-dimensional domain decomposition code (B) is expected to give the maximum performance
on the vector-parallel machine for a large number of cpus (more than 1000), whereas the
3-dimensional domain decomposition code (D) is expected to give the maximum performance
on the scalar-parallel machine.