Benchmark Test of MPI Programs on the PRIMEPOWER HPC2500


Benchmark Test of New Domain Decomposition 3-Dimensional MHD Code



Method of Benchmark Test of MPI Fortran Programs on the PRIMEPOWER HPC2500

                     November 9, 2004
                     STEL, Nagoya University
                     Tatsuki Ogino

We have prepared four kinds of programs from the 3-dimensional MHD code written in MPI (Message 
Passing Interface) Fortran that simulates the interaction between the solar wind and the earth's 
magnetosphere: 1D, 2D and 3D domain decomposition codes for the vector-parallel machine and a 
3D domain decomposition code for the scalar-parallel machine. All input and output processing 
has been commented out, so each program is a complete, self-contained code that can be run by 
itself. These 3-dimensional MHD codes for the benchmark test are available on the MPI homepage, 
and the execution time of each test is a few minutes. 

(A) mearthd1dc2n.f  1D Domain Decomposition for Vector-Parallel Machine  f(nx2,ny2,nz2,nb)
(B) mearthd2dc2n.f  2D Domain Decomposition for Vector-Parallel Machine  f(nx2,ny2,nz2,nb)
(C) mearthd3dc2n.f  3D Domain Decomposition for Vector-Parallel Machine  f(nx2,ny2,nz2,nb)
(D) mearthd3dd2n.f  3D Domain Decomposition for Scalar-Parallel Machine  f(nb,nx2,ny2,nz2)

Let us explain the details of the program concretely, taking (D) mearthd3dd2n.f as an example. 
The number of domain decompositions in each direction and the number of cpus are defined as follows.

  npex=2    Number of domain decomposition in x-direction
  npey=2    Number of domain decomposition in y-direction
  npez=2    Number of domain decomposition in z-direction
  npe=npex*npey*npez     Number of cpus

They are concretely defined in the program by the following statements. 

       parameter (npex=2,npey=2,npez=2)
       parameter (npe=npex*npey*npez,npexy=npex*npey)
c      parameter (nx=510,ny=254,nz=254,nxp=155)

When the numbers of domain decomposition are set to npex=2 in the x-direction, 
npey=2 in the y-direction and npez=2 in the z-direction, the total number of cpus 
required is npe=npex*npey*npez=8. Each of npex, npey and npez must be at least 2. 
The array sizes in the x, y and z directions are chosen as powers of 2, namely 
nx2=nx+2=512, ny2=ny+2=256 and nz2=nz+2=256 in the present computation, where both 
boundary grids are included. In general, nx2, ny2 and nz2 must be divisible 
by npex, npey and npez, respectively. 
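
For illustration only, the local (per-cpu) block sizes implied by these choices can be written 
as parameter statements in the same style as the code. The names nxx, nyy and nzz below are 
assumptions for this sketch and are not necessarily those used in the benchmark codes.

       parameter (npex=2,npey=2,npez=2)
       parameter (nx=510,ny=254,nz=254)
       parameter (nx2=nx+2,ny2=ny+2,nz2=nz+2)
c  local block size per cpu in each direction; exact only when
c  nx2, ny2 and nz2 are divisible by npex, npey and npez
       parameter (nxx=nx2/npex,nyy=ny2/npey,nzz=nz2/npez)
c  with npex=npey=npez=2 this gives nxx=256, nyy=128, nzz=128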

These numbers of domain decomposition and the array size (nx=510,ny=254,nz=254) 
must be the same in the main program and in all the subroutines. 

 f(nb,nx2,ny2,nz2)=f(8,512,256,256)
       parameter (npex=2,npey=2,npez=2)  npe=   8 cpu's
       parameter (npex=2,npey=2,npez=4)  npe=  16 cpu's
       parameter (npex=2,npey=4,npez=4)  npe=  32 cpu's
       parameter (npex=4,npey=4,npez=4)  npe=  64 cpu's
       parameter (npex=4,npey=4,npez=8)  npe= 128 cpu's
       parameter (npex=4,npey=8,npez=8)  npe= 256 cpu's
       parameter (npex=8,npey=8,npez=8)  npe= 512 cpu's
       parameter (npex=8,npey=8,npez=16) npe=1024 cpu's
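
For reference, a common way to map the MPI rank onto the three decomposition indices uses 
npexy=npex*npey, which is defined in the parameter statements above. The short program below 
is only a sketch under that assumption; the index names and the actual mapping used in 
mearthd3dd2n.f may differ.

       program rankmap
       include 'mpif.h'
       parameter (npex=2,npey=2,npez=2)
       parameter (npe=npex*npey*npez,npexy=npex*npey)
       call mpi_init(ierr)
       call mpi_comm_rank(mpi_comm_world,irank,ierr)
c  decomposition indices of this cpu in the x, y and z directions
       ipz = irank/npexy
       ipy = (irank-ipz*npexy)/npex
       ipx = irank-ipz*npexy-ipy*npex
       write(6,*) 'rank',irank,' -> (ipx,ipy,ipz)=',ipx,ipy,ipz
       call mpi_finalize(ierr)
       end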

Thus we can run the benchmark test with the four kinds of MHD codes to measure the 
efficiency of a supercomputer such as a scalar-parallel machine as the number of 
cpus is increased according to the above table. The three MHD codes (A) mearthd1dc2n.f, 
(B) mearthd2dc2n.f and (C) mearthd3dc2n.f print the "cpu time" in seconds for every 
two time steps of advance, while the MHD code (D) mearthd3dd2n.f prints the "cpu 
time" for every eight time steps of advance. The cpu time for one time step of 
advance can then be obtained for each MHD code. 
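
A minimal sketch of how such a per-step time can be measured is given below; it assumes the 
standard mpi_wtime timer, while the benchmark codes themselves may use a different timer or 
print format. The routine that advances the MHD equations is left as a comment because its 
name is not part of this note.

       program steptime
       include 'mpif.h'
       real*8 t1,t2,tstep
       call mpi_init(ierr)
       call mpi_comm_rank(mpi_comm_world,irank,ierr)
       t1 = mpi_wtime()
       do ii = 1,8
c         one time step of the MHD advance would be called here
       end do
       t2 = mpi_wtime()
c  elapsed time for one time step of advance
       tstep = (t2-t1)/8.0d0
       if (irank.eq.0) write(6,*) 'time per step =',tstep,' sec'
       call mpi_finalize(ierr)
       end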


Compilation and execution on the Fujitsu PRIMEPOWER HPC2500 (ngrd1) are done as follows. 

mpifrt -Kfast_GP2=3,largepage=2 -O5 -Kprefetch=4,prefetch_cache_level=2,prefetch_strong -Cpp -o progmpi mearthd3dd2n.f
mv progmpi ztest0008
qsub mpiex08b.sh

ngrd1% more mpiex08b.sh
#  @$-q gs  -eo -o  ztest0008.out
#  @$-lP 8 -lM 3.0gb -ls 128mb -lB 0 -lb 0
mpiexec -n 8 -mode limited ./mearthd5/ztest0008
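
For a run with a different number of cpus, the process counts in the script would be changed 
to match npe. The example below for npe=16 is only an assumed variant of the 8-cpu script 
above, with the queue name and memory limits simply carried over.

#  @$-q gs  -eo -o  ztest0016.out
#  @$-lP 16 -lM 3.0gb -ls 128mb -lB 0 -lb 0
mpiexec -n 16 -mode limited ./mearthd5/ztest0016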

----------------------------------------------------------------------------------
Parallel Computation by Domain Decomposition Method (October 1, 2004)

We have developed 3-dimensional MHD simulation codes of the earth's magnetosphere 
with 1D, 2D and 3D domain decomposition methods for parallel computation. The programs 
for the vector-parallel machine keep the array order f(x,y,z,components), as in the 
former MHD program, so that the first array dimensions remain long for vectorization. On the 
other hand, the program for the scalar-parallel machine uses the array order f(components,x,y,z) 
in order to increase the cache hit rate. That is, the order of the array is changed so that 
the variables used together in a calculation are stored closer to each other in memory. 
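
A minimal sketch of the two orderings is given below; the dimension names follow the benchmark 
codes, but the declarations and the loop are only an illustration and are not taken from them.

       program ordering
       parameter (nx2=8,ny2=8,nz2=8,nb=8)
c  vector-parallel ordering (A)-(C): the long spatial dimension comes
c  first, so the innermost loop over x runs over a long stride-1 axis
       dimension fv(nx2,ny2,nz2,nb)
c  scalar-parallel ordering (D): the nb components of one grid point
c  are adjacent in memory and stay in cache together
       dimension fs(nb,nx2,ny2,nz2)
c  stride-1 access pattern for the scalar-parallel ordering
       do k = 1,nz2
       do j = 1,ny2
       do i = 1,nx2
       do m = 1,nb
          fs(m,i,j,k) = 0.0
       end do
       end do
       end do
       end do
       end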
 
gpcs%/vpp/home/usr6/a41456c/mearthd4
(A) mearthd1dc2n.f  1D Domain Decomposition for Vector-Parallel Machine  f(nx2,ny2,nz2,nb)
(B) mearthd2dc2n.f  2D Domain Decomposition for Vector-Parallel Machine  f(nx2,ny2,nz2,nb)
(C) mearthd3dc2n.f  3D Domain Decomposition for Vector-Parallel Machine  f(nx2,ny2,nz2,nb)
(D) mearthd3dd2n.f  3D Domain Decomposition for Scalar-Parallel Machine  f(nb,nx2,ny2,nz2)

The 2-dimensional domain decomposition code (B) is expected to give the maximum performance on 
the vector-parallel machine for a large number of cpus (more than 1000 cpus), whereas the 
3-dimensional domain decomposition code (D) is expected to give the maximum performance on 
the scalar-parallel machine. 

