Tatsuki Ogino
1. Introduction
We have executed 3-dimensional global magnetohydrodynamic (MHD)
simulations of the interaction between the solar wind and the earth's
magnetosphere on a variety of computers, using a fully vectorized MHD code,
in order to study the structure and dynamics of the magnetosphere.
Among the computers we have used are the CRAY Y-MP, Fujitsu VP-2600, Hitachi S820,
and NEC SX-3. The flexibility of running on many different computers has allowed
us to work together with many scientists in executing our computer simulations.
However, as vector-parallel and massively parallel supercomputers have come
into the simulation community, a number of different approaches are now being
used. Many simulation scientists have lost their common language and are being
forced to speak new dialects.
We began to use the vector-parallel supercomputer Fujitsu VPP500 at the
Computer Center of Nagoya University in 1995 and succeeded in rewriting
our 3-dimensional global MHD simulation code in VPP Fortran, achieving
a high performance of over 17 Gflops. Moreover, since last December we have
used a new supercomputer, the Fujitsu VPP5000/56, achieving a higher
performance of over 400 Gflops in VPP Fortran. However, the MHD code written
in VPP Fortran cannot achieve such high performance on other
supercomputers such as the Hitachi SR8000 and the NEC SX-5. Thus we face a
difficult problem in collaborating on computer simulations with other
scientists, even within our domestic universities. We very much hope to
recover a common language in the supercomputer world. Recent candidates for
this common language appear to be High Performance Fortran (HPF) and Message
Passing Interface (MPI). We look forward to the time when we can use these
compilers on supercomputers.
Last year I heard from engineers at Japanese supercomputer companies that
they had been developing compilers for an extended version of HPF specified by
the Japanese HPF Association (JAHPF), to be supplied in the near future.
I was very interested in this news because the performance of the
original HPF had been questionable, and we wanted to test the new version ourselves.
Since last June we have had the opportunity to use HPF/JA (an extended version
of HPF) on a supercomputer of the vector-parallel type, the Fujitsu VPP5000/56.
We immediately began to translate our 3-dimensional MHD simulation code for
the earth's magnetosphere from VPP Fortran to HPF/JA. The MHD code was already
fully vectorized and fully parallelized in VPP Fortran. We successfully rewrote
the code from VPP Fortran to HPF/JA in three weeks,
and needed an additional two weeks to verify the calculation results.
The overall performance and capability of the HPF/JA MHD code proved to be
almost comparable to those of the VPP Fortran code in typical simulations
on the Fujitsu VPP5000/56.
2. From VPP Fortran to HPF/JA
We asked the Computer Center of Nagoya University to make HPF available on
their supercomputer as soon as possible. Last June we had an
opportunity to hear a lecture on HPF by a Fujitsu engineer, and we began
to use HPF/JA on the Fujitsu VPP5000. We had heard that the use of HPF was
not very successful in the USA and Europe, and that users tended to
abandon HPF because of the difficulty of achieving high
performance with it. However, we had high expectations of HPF/JA because the
supercomputer companies in Japan have had good experience with compilers for
their vector-parallel machines. I decided that we would translate our MHD code
from VPP Fortran to HPF if half the performance of VPP Fortran could be obtained
using HPF. We had experience in fully vectorizing and parallelizing several
test programs and 3-dimensional global MHD codes for the earth's magnetosphere
using VPP Fortran. Thus, after hearing the HPF lecture, I had the impression
that we could obtain good performance by translating the MHD codes from
VPP Fortran to HPF/JA. The main points are summarized in items (1)-(6),
listed at the end of this article.
Owing to points (1)-(3), Fortran programs written in VPP Fortran can be
rewritten directly in HPF/JA while keeping their original style. If unnecessary
communication can be completely suppressed by (5), the efficiency of
parallelization can be greatly improved. These directives may be used
to stop unnecessary communication whenever the calculation results do not
change when they are inserted. In general, it is very difficult
for a parallelizing compiler to know in advance whether communication is needed,
because variables might be rewritten at any time. Therefore
unnecessary communication of which users are unaware happens frequently,
and the efficiency of parallelization cannot increase. That is, the compiler
must compile the program so as to cover all possible cases; it cannot produce
a good solution whenever any uncertainty remains, and therefore unexpected and
unnecessary communication very often occurs during the execution of a program.
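As a concrete illustration of the kind of directive insertion meant by point (5),
the fragment below shows a purely local update written in the style described
above. It is only a sketch: the subroutine, the arrays f and g, and the loop
bounds are placeholders rather than excerpts from our MHD code, and the exact
spelling of the HPF/JA directives should be checked against the Fujitsu manual
(Fujitsu Co., 2000).

      subroutine smooth(f, g, nx, ny, nz2)
      integer nx, ny, nz2
      real f(nx+2, ny+2, nz2), g(nx+2, ny+2, nz2)
!hpf$ distribute f(*, *, block)
!hpf$ distribute g(*, *, block)
!hpf$ independent, new(i, j)
      do k = 2, nz2-1
!hpf$ on home(g(:, :, k)), local begin
         do j = 2, ny+1
            do i = 2, nx+1
!              every right-hand-side reference lies on the same z-plane
!              that owns g(i,j,k), so no inter-PE communication is needed,
!              and "on home ... local" tells the compiler so explicitly
               g(i,j,k) = 0.25*(f(i+1,j,k) + f(i-1,j,k)
     &                        + f(i,j+1,k) + f(i,j-1,k))
            end do
         end do
!hpf$ end on
      end do
      return
      end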
Even in the worst case we can succeed in parallelizing the MHD program,
because the high-speed alternation of the division direction provided by (4)
can resolve almost all of the difficulties of parallelization in the MHD
program, e.g. in the treatment of boundary conditions. We had already fully
parallelized our MHD code by using this kind of division alternation with
the ADETRAN compiler of Matsushita Electric Co., and we obtained a
parallelization efficiency of over 85% on the ADENART massively parallel
computer with 256 processors. Moreover, maximum, minimum and summation
operations can be computed in parallel by using the "reduction" directive in
HPF/JA (a brief sketch is given below). Because we had high expectations for
translating programs from VPP Fortran to HPF/JA, we soon began to rewrite
several test programs and the 3-dimensional MHD code.
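For example, a global maximum over a distributed array (such as the largest
wave speed used for time-step control) might be obtained with a construction
of the following kind. This is a sketch only: the names are placeholders, and
the form of the reduction clause follows HPF 2.0, in which only the reduction
variable is named and the operation is inferred from the loop body.

      subroutine globalmax(v, nx2, ny2, nz2, vmax)
      integer nx2, ny2, nz2
      real v(nx2, ny2, nz2), vmax
!hpf$ distribute v(*, *, block)
      vmax = 0.0
!hpf$ independent, new(i, j), reduction(vmax)
      do k = 1, nz2
         do j = 1, ny2
            do i = 1, nx2
!              each PE accumulates a partial maximum over the z-planes
!              it owns; the compiler combines the partial results
               vmax = max(vmax, abs(v(i, j, k)))
            end do
         end do
      end do
      return
      end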
3. Comparison of Processing Capability by VPP Fortran and HPF/JA
Our aim is to translate the 3-dimensional global MHD simulation code for the
interaction between the solar wind and the earth's magnetosphere from
VPP Fortran to HPF/JA and to achieve at least 50% of the performance obtained
with VPP Fortran. In the global MHD simulation, the MHD and Maxwell's equations
are solved by the modified leap-frog method as an initial value and boundary
value problem in order to study the response of the magnetosphere to variations
of the solar wind and the interplanetary magnetic field (IMF). Because the
external boundary is placed in a distant region to minimize its influence on
the solution, and the tail boundary is extended in order to examine the
structure of the distant tail, high spatial resolution is required to obtain
numerically accurate results. Therefore, the number of 3-dimensional grid
points increases up to the limit of the available computers.
Examples of MHD simulations of the solar wind-magnetosphere interaction
are shown in Figures 1 and 2. Figure 1 shows the 3-dimensional structure
of magnetic field lines under steady-state conditions for the earth's
magnetosphere when the IMF is northward and duskward. A dawn-dusk
asymmetry appears in the structure of the magnetic field lines because
magnetic reconnection at the magnetopause occurs in the high-latitude
tail on the dusk side in the northern hemisphere and on the dawn side
in the southern hemisphere. Figure 2 shows a snapshot of the earth's
magnetosphere obtained by a 3-dimensional global MHD simulation in which
the upstream solar wind and IMF monitored every minute by the ACE satellite
were used as input. This simulation is one of the fundamental studies in the
international space weather program, which is designed to develop numerical
models and to understand the variations of the solar-terrestrial environment
from moment to moment. Even on the VPP5000/56, it requires 50-300 hours of
computation time to execute these 3-dimensional global MHD simulations of
the earth's magnetosphere.
Table 1 shows a comparison of the computer processing capabilities of
VPP Fortran and HPF/JA for the 3-dimensional global MHD code for
the solar wind-magnetosphere interaction run on the Fujitsu
VPP5000/56. Both Fortran codes are fully vectorized and fully
parallelized, and were tuned to reach maximum performance
on the basis of several test runs of practical simulations.
The table lists the number of PEs (Processing Elements), the number of
3-dimensional grid points excluding the boundary, the cpu time (in seconds)
needed to advance one time step, the computation speed (Gflops) and the
computation speed per PE (Gflops/PE) for both VPP Fortran and HPF/JA.
The characteristics of the Fujitsu VPP5000/56 at the Computer Center of
Nagoya University are listed at the end of this article.
As is known from Amdahl's law, it is quite important to fully vectorize
all inner do loops and to fully parallelize all outer do loops
in order to increase the efficiency of vectorization and parallelization.
The efficiency would be quite low if even one do loop could not be vectorized
or parallelized, and users achieve excellent computational performance only
when they succeed in vectorizing or parallelizing the last remaining do loop.
In the 3-dimensional MHD simulation, the number of grid points
is nx*ny*nz, the distributed data are divided in the z-direction,
and the array size in the z-direction becomes nz2=nz+2. It is therefore
desirable that the array size in the divided direction, nz2, be an
integer multiple of the number of PEs. Moreover, the efficiency of
vectorization and parallelization improves as the amount of work, namely
the number of grid points nx*ny*nz, increases. The computational results
for the scalar mode and for the vector mode with only 1 PE are also shown
in the Table for comparison with the parallel computations, and the Table
is easily understood when viewed from these points of view.
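A minimal sketch of this data layout (with placeholder parameter values, not
those of the production code) is given below; the essential points are that
the distributed dimension has extent nz2 = nz + 2 and that nz2 is preferably
an integer multiple of the number of PEs.

      parameter (nx=200, ny=100, nz=478, nz2=nz+2, npe=16)
      dimension f(nx+2, ny+2, nz2, 8)
!hpf$ processors pe(npe)
!hpf$ distribute f(*, *, block, *) onto pe
!     here nz2 = 480 planes over npe = 16 PEs gives exactly 30 planes
!     per PE, so the block distribution is perfectly balanced; the last
!     dimension holds the 8 MHD variables stored at each grid point
      end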
It should be noted from the Table that the vector mode with 1 PE is about 40
times faster than the scalar mode and that the computation speed of HPF/JA is
almost comparable to that of VPP Fortran. Furthermore, the scalability of the
parallel computation is good, in that the computation speed is roughly
proportional to the number of PEs. However, when we look at the Table in more
detail, we see that the grid size nx*ny*nz=500*100*200 used in our previous
simulations is not well suited to parallel computation. This is because the
ratio of the array size in the divided direction, nz2=202=2*101, to the number
of PEs is generally not an integer. This can be seen from the fact that the
computation speed still increases with the number of PEs, but more slowly than
linear proportionality would predict.
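The size of this effect can be estimated from the block sizes alone. The short
program below (a rough illustration that ignores communication costs and
vector-length effects) prints, for nz2 = 202, the number of z-planes held by
the busiest PE and the corresponding upper bound on the parallel efficiency.

      program balance
      integer nz2, npe, nblk
      nz2 = 202
      do npe = 8, 56, 8
!        the busiest PE of a block distribution holds ceiling(nz2/npe)
!        planes, so the attainable fraction of the ideal speedup is at
!        most nz2/(npe*nblk)
         nblk = (nz2 + npe - 1)/npe
         write(*,*) npe, nblk, real(nz2)/real(npe*nblk)
      end do
      end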
The scalability is greatly improved when we choose the number of grid
points as nx*ny*nz=200*100*478 (nz2=480=2**5*3*5) and 800*200*670
(nz2=672=2**5*3*7) in the MHD simulation. Furthermore, we achieved a
very high performance of over 400 Gflops with both VPP Fortran
and HPF/JA when we chose large numbers of grid points,
nx*ny*nz=1000*500*1118 and 1000*1000*1118. This is because nz2 remains an
integer multiple of the number of PEs and also the program size of the MHD
code becomes larger. These results clearly show that a 3-dimensional global
MHD simulation of the earth's magnetosphere can achieve a high efficiency of
76.5% of the theoretical peak performance of vector and parallel computation
using 56 PEs of the Fujitsu VPP5000/56.
4. Development of MHD Simulation
Since the 2 and 3-dimensional global MHD simulations of the interaction
between the solar wind and the earth's magnetosphere have been carried
out over the past 20 years, we shall present a short history of their
development and evaluate their future. Figure 3 shows a summary
plot of the 2 and 3-dimensional global MHD simulations indicating
how many grid points were used in each year. In this figure, the black
circles and triangles show the MHD simulations executed by other
groups, and the white ones and crosses show the MHD simulations
executed by our group. We began the 3-dimensional MHD simulations to
study MHD instabilities in toroidal plasmas using small number of grid
points given by 18*16*12 and 26*26*18 including the boundary in 1976.
However, the MHD simulations required about 50-200 hours of computing
time using a Fujitsu M-100 and M-200. The execution of such large scale
MHD simulations was practically limited by speed of the computers at that time.
We began global MHD simulations of the earth's magnetosphere with
larger numbers of grid points, 62*32*32 and 50*50*26 including
the boundary, on a CRAY-1 in 1982, when the 1 MW memory of the CRAY-1
limited the scale of the MHD simulation. At that time we used to
wonder how many interesting simulations could be executed if 100^3
grid points could be used. Japanese supercomputers subsequently
appeared, and we could execute MHD simulations with 100^3 grid
points on the Fujitsu VP-100, VP-200 and VP-2600. Later, we had a good
opportunity to use the Fujitsu VPP500 vector-parallel machine in 1995
and began to use the VPP5000 in 2000; these machines in practice allow us to
execute MHD simulations with up to 1000^3 grid points. When we look
at the development of the MHD simulations from the point of view of the
number of grid points used over those 24 years, we can recognize a pattern
of a 4-fold increase every 3 years, or a 10-fold increase every 5 years.
Figure 4 shows the computation time needed to advance the MHD simulation
by one time step, while Figure 5 shows the computer memory needed to
execute the MHD simulation, both as functions of the total number of
grid points. The computation time and memory requirement generally depend
on the kind of computer and the numerical method, and both increase in
linear proportion to the total number of grid points. Since a global MHD
simulation of the earth's magnetosphere requires repeating the one-time-step
advance from several thousand to tens of thousands of times, the computation
time for one time step must be kept below about 10 seconds in order to carry
out practical MHD simulations (for example, 20,000 steps at 10 seconds per
step already amount to about 56 hours). Thus, the number of usable grid
points in MHD simulations is automatically limited by the speed of the
computers. At the same time, the program size for an MHD simulation with
the chosen grid points must be less than the size of the computer memory.
Since the time evolution of 8 physical quantities (the density,
3 components of the velocity, the plasma pressure and 3 components of the
magnetic field) is calculated in the MHD code, the number of independent
variables is 8 Ng, and their storage requires 32 Ng bytes in single
precision, where Ng is the total number of grid points. Therefore the key
variables of the MHD code occupy 32 Ng bytes when the two-step Lax-Wendroff
method (2LW) is used and 64 Ng bytes when the modified leap-frog method (MLF)
is used, because in the latter the MHD quantities must be stored at two time
levels. The 3-dimensional global MHD code that was originally developed for
the CRAY-1, and has since been used on other vector machines, requires only
1.3 times the capacity of the key MHD variables, because a 1-dimensional
array formulation effectively reduces the work area. On parallel computers
it becomes impossible to reduce the work area so effectively, and the program
size increases considerably on parallel machines such as the ADENART, VPP500
and VPP5000. Even so, the required computer memory on the VPP500 and VPP5000
is 2.5 times the capacity of the key variables (160 Ng = 2.5*64 Ng) for MLF.
Once the dependence of the required computer memory on the total number of
grid points is known concretely, we can easily estimate how large an MHD
simulation can be executed to study a given subject and can establish more
precisely the prospects for the future.
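As a rough worked example of this estimate (assuming 4-byte words and the
2.5-fold factor quoted above), the short program below evaluates the memory
required for the largest grid in Table 1; the numbers are approximate and
serve only to show that such a run fits within the 56 x 7.5 GB of user memory
on the VPP5000/56.

      program memest
      double precision ng, keyvar, total
!     approximate total grid number for the 1000*1000*1118 run
      ng     = 1000.d0*1000.d0*1120.d0
!     key MHD variables for the modified leap-frog method: 64*Ng bytes
      keyvar = 64.d0*ng
!     total memory on VPP500/VPP5000: 160*Ng = 2.5*64*Ng bytes
      total  = 2.5d0*keyvar
      write(*,*) keyvar/1.d9, ' GB for the key variables'
      write(*,*) total/1.d9,  ' GB in total (56 PEs * 7.5 GB = 420 GB)'
      end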
Figure 6 shows a comparison of the computer processing capability
versus the total number of grid points in MHD simulations. Rapid progress
has been made in performance by moving from vector machines to
vector-parallel machines. Newer high-performance workstations
and personal computers now have the capability of older supercomputers
(speeds of about 0.3 Gflops). However, the processing capability of
high-end supercomputers has remained about a thousand times higher than
that of these other computers at all times. Moreover, a decisive difference
between supercomputers and other computers appears in the main memory and
cache, which on the latter are too small to carry out MHD simulations with
large numbers of grid points. It is practically impossible to execute a
3-dimensional MHD simulation on workstations and personal computers when
even the fully vectorized and fully parallelized MHD code needs about
100 hours of cpu time on a supercomputer such as the VPP5000. It therefore
remains necessary to use a supercomputer of maximum performance for studies
based on 3-dimensional global MHD simulations of the earth's magnetosphere.
5. Conclusions
We have executed 3-dimensional global magnetohydrodynamic (MHD)
simulations of the interaction between the solar wind and the earth's
magnetosphere on a variety of computers using a fully vectorized MHD
code. However, simulation scientists have lost their common language
with the appearance of vector-parallel and massively parallel supercomputers.
We sincerely hope to see the development of a common language in the
supercomputer world. The candidates appear to be High Performance Fortran
(HPF) and Message Passing Interface (MPI), and we look forward to the time
when we can use these compilers on supercomputers.
We have had the opportunity to use HPF/JA on the Fujitsu VPP5000/56
supercomputer at the Computer Center of Nagoya University since last June.
We translated our 3-dimensional MHD simulation code for the earth's
magnetosphere, which was fully vectorized and fully parallelized in
VPP Fortran, from VPP Fortran to HPF/JA. The performance of the HPF/JA MHD
code was almost comparable to that of the VPP Fortran code in a typical
simulation using a large number (56) of Processing Elements (PEs). Thus we
have reached the following conclusion: fluid and MHD codes that are fully
vectorized and fully parallelized in VPP Fortran can be translated to HPF/JA
relatively easily, and a code in HPF/JA can be expected to achieve performance
comparable to that of the same code written in VPP Fortran.
The 3-dimensional global MHD simulation code for the earth's magnetosphere
solves the MHD and Maxwell's equations in 3-dimensional Cartesian coordinates
(x, y, z) as an initial and boundary value problem using a modified leap-frog
method. The MHD quantities are distributed in the z-direction. Quantities at
neighboring grid points can be referenced through the HPF/JA directives
"shadow" and "reflect" when they reside on another PE. Unnecessary
communication among PEs can be completely suppressed by the directives
"independent, new" and "on home, local." Moreover, lump transmission of
data with the "asynchronous" directive is used in the calculation of the
boundary conditions. It was not necessary to change the fundamental structure
of the MHD code in the translation procedure, which was a big advantage in
translating the MHD code from VPP Fortran to HPF/JA.
Based on this experience, I anticipate that we will find little difficulty
in translating programs from VPP Fortran to HPF/JA, and that we can expect
almost comparable performance on the Fujitsu VPP5000. The maximum performance
of the 3-dimensional MHD code was over 230 Gflops with 32 PEs and over
400 Gflops with 56 PEs. We hope that the MHD code rewritten in HPF/JA can be
executed on other supercomputers, such as the Hitachi SR8000 and the NEC SX-5,
in the near future, and that HPF/JA will become a real common language in the
supercomputer world. I believe that this is a necessary condition for
restarting collaborative research on 3-dimensional global MHD simulation of
the magnetosphere with other scientists around the world at a new and higher
level. I look forward to efforts by simulation scientists, as well as by the
Japanese supercomputer companies, toward this goal.
We have made available part of the boundary-condition coding of the HPF/JA
MHD code and a test program for the 3-dimensional wave equation at the
following Web address:
http://gedas.stelab.nagoya-u.ac.jp/simulation/hpfja/hpf01.html.
Acknowledgments. This work was supported by a Grant-in-Aid for Science
Research from the Ministry of Education, Science and Culture. Computing
support was provided by the Computer Center of Nagoya University.
References
Ogino, T., Magnetohydrodynamic simulation of interaction between the solar
wind and magnetosphere, J. Plasma and Fusion Research, (Review paper), Vol.75,
No.5, CD-ROM 20-30, 1999.
Ogino, T., Computer simulation of solar wind-magnetosphere interaction,
Center News, Computer Center of Nagoya University, Vol.28, No.4, 280-291, 1997.
Tsuda, T., On usage of supercomputer VPP5000/56,
Center News, Computer Center of Nagoya University, Vol.31, No.1, 18-33, 2000.
High Performance Fortran Forum, Official Manual of High Performance Fortran
2.0, Springer, 1999.
Fujitsu Co., HPF programming -parallel programming (HPF version)-,
UXP/V Ver.1.0, June, 2000.
Fujitsu Co., VPP Fortran programming, UXP/V, UXP/M Ver.1.2, April, 1997.
Table 1. Comparison of the computer processing capability of VPP Fortran
and HPF/JA for a 3-dimensional global MHD code for the solar wind-magnetosphere
interaction on the Fujitsu VPP5000/56.
Figure Captions
Figure 1 Structure of 3-dimensional magnetic field lines under steady-state
conditions in the earth's magnetosphere for an interplanetary magnetic field
that is northward and duskward. An asymmetry appears in the magnetosphere
due to the occurrence of magnetic reconnection near the magnetopause on the
dusk side in the northern hemisphere and on the dawn side in the southern
hemisphere.
Figure 2 An example of a snapshot of the earth's magnetosphere obtained by
the 3-dimensional global MHD simulation, in which the observational data for
the solar wind and interplanetary magnetic field obtained by the ACE satellite
were used as input to the simulation.
Figure 3 Development of 2- and 3-dimensional MHD simulations as quantified
by the total number of grid points used; the total grid number has increased
4-fold every 3 years, or 10-fold every 5 years.
Figure 4 Comparison of computation times corresponding to an advance of one
time step versus total grid number in the MHD simulation codes.
Figure 5 Comparison of the necessary computer memory capacity versus
total grid number in the MHD simulation codes.
Figure 6 Comparison of the computer processing capability versus total grid
number in the MHD simulation codes.
Solar-Terrestrial Environment Laboratory, Nagoya University
(1) We can use the same domain decomposition (division in the z-direction
in 3 dimensions) as in VPP Fortran, and the overlap data at the boundaries
of the distributed data, which were handled by "sleeve" in VPP Fortran,
can be handled by "shadow" and "reflect" in HPF/JA.
(2) Parallelization of the divided loops is specified by the "independent"
directive, as is done in VPP Fortran.
(3) Lump transmission of distributed data is performed by the "asynchronous"
directive.
(4) The "asynchronous" directive can also be used for lump transmission
between distributed data with different division directions.
(5) Unnecessary communication, which was a problem in HPF, can be removed
by the "independent, new( )" and "on home, local( )" directives.
(6) Maximum, minimum and summation operations can be calculated in parallel
by "reduction".
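As an illustration of items (1) and (2), the fragment below sketches how a
one-plane overlap region in z might be declared and refreshed before a
z-derivative is taken. It is only a sketch with placeholder names and sizes,
not an excerpt from the MHD code, and the exact spelling of the directives
should be checked against the Fujitsu HPF/JA manual (Fujitsu Co., 2000).

      parameter (nx=200, ny=100, nz=478, nz2=nz+2, npe=16)
      dimension f(nx+2, ny+2, nz2), df(nx+2, ny+2, nz2)
!hpf$ processors pe(npe)
!hpf$ distribute f(*, *, block) onto pe
!hpf$ distribute df(*, *, block) onto pe
!hpf$ shadow f(0, 0, 1)
      f = 0.0
!     refresh the one-plane overlap region of f from the neighbouring
!     PEs (the role played by "sleeve" in VPP Fortran)
!hpf$ reflect f
!hpf$ independent, new(i, j)
      do k = 2, nz2-1
         do j = 2, ny+1
            do i = 2, nx+1
!              f(i,j,k-1) and f(i,j,k+1) may lie in the shadow area
!              filled by the reflect directive above
               df(i,j,k) = 0.5*(f(i,j,k+1) - f(i,j,k-1))
            end do
         end do
      end do
      end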
Number of PEs: 56
Theoretical maximum speed per PE: 9.6 Gflops
Maximum memory capacity per PE available to users: 7.5 GB
Number   Number of        ------ VPP Fortran ------      -------- HPF/JA --------
of PEs   grid points      cpu time(s) (Gflops) Gf/PE     cpu time(s) (Gflops) Gf/PE
1PE 200x100x478 119.607 ( 0.17) 0.17 (scalar)
1PE 200x100x478 2.967 ( 6.88) 6.88 3.002 ( 6.80) 6.80
2PE 200x100x478 1.458 ( 14.01) 7.00 1.535 ( 13.30) 6.65
4PE 200x100x478 0.721 ( 28.32) 7.08 0.761 ( 26.85) 6.71
8PE 200x100x478 0.365 ( 55.89) 6.99 0.386 ( 52.92) 6.62
16PE 200x100x478 0.205 ( 99.38) 6.21 0.219 ( 93.39) 5.84
32PE 200x100x478 0.107 (191.23) 5.98 0.110 (186.13) 5.82
48PE 200x100x478 0.069 (297.96) 6.21 0.074 (276.96) 5.77
56PE 200x100x478 0.064 (319.53) 5.71 0.068 (299.27) 5.34
1PE 500x100x200 2.691 ( 7.94) 7.94 2.691 ( 7.94) 7.94
2PE 500x100x200 1.381 ( 15.47) 7.73 1.390 ( 15.37) 7.68
4PE 500x100x200 0.715 ( 29.97) 7.47 0.712 ( 29.99) 7.50
8PE 500x100x200 0.398 ( 53.65) 6.71 0.393 ( 54.38) 6.80
16PE 500x100x200 0.210 (101.87) 6.37 0.202 (105.74) 6.61
32PE 500x100x200 0.131 (163.55) 5.11 0.120 (175.50) 5.48
48PE 500x100x200 0.100 (214.48) 4.46 0.091 (231.69) 4.82
56PE 500x100x200 0.089 (239.48) 4.28 0.086 (244.85) 4.37
4PE 800x200x670 7.618 ( 30.06) 7.52 8.001 ( 28.62) 7.16
8PE 800x200x670 3.794 ( 60.36) 7.54 3.962 ( 57.81) 7.23
12PE 800x200x670 2.806 ( 81.61) 6.80 3.005 ( 76.21) 6.35
16PE 800x200x670 1.924 (119.00) 7.44 2.012 (113.85) 7.12
24PE 800x200x670 1.308 (175.10) 7.30 1.360 (168.44) 7.02
32PE 800x200x670 0.979 (233.85) 7.31 1.032 (221.88) 6.93
48PE 800x200x670 0.682 (335.62) 6.99 0.721 (317.80) 6.62
56PE 800x200x670 0.595 (384.61) 6.87 0.628 (364.87) 6.52
16PE 1000x500x1118 9.668 (123.52) 7.72 9.619 (125.50) 7.84
32PE 1000x500x1118 5.044 (236.73) 7.40 4.992 (241.83) 7.56
48PE 1000x500x1118 3.550 (336.40) 7.01 3.479 (346.97) 7.23
56PE 1000x500x1118 2.985 (400.04) 7.14 2.935 (411.36) 7.35
32PE 1000x1000x1118 9.979 (239.33) 7.48 9.813 (243.37) 7.61
48PE 1000x1000x1118 7.177 (332.79) 6.93 7.028 (339.85) 7.08
56PE 1000x1000x1118 5.817 (410.55) 7.33 5.794 (412.23) 7.36
------------------------------------------------------------------------
Note: the Gflops values are estimated by comparison with a computation by
1 processor of a CRAY Y-MP C90.