A Report on Computer Processing Capability for the Magnetohydrodynamic Simulation Model



Tatsuki Ogino

Solar-Terrestrial Environment Laboratory, Nagoya University
Honohara 3-13, Toyokawa, Aichi 442, Japan

1. Introduction

In computer simulations of space plasma phenomena, it is important to maintain high numerical accuracy and fine spatial and temporal resolution. This requires extracting the maximum performance from the computers that execute the simulation codes, as well as using numerical algorithms of high accuracy.

For example, in the global simulation of the earth's magnetosphere, we want to keep the outer boundaries far from the earth to avoid troublesome boundary effects, and also to resolve what is happening in the narrow regions of the bow shock, magnetopause and plasma sheet. We therefore need to use as many grid points as possible, which automatically increases the required computer memory and computation time. If the grid interval is halved in a 3-dimensional simulation box of the same total length, the simulation code usually needs 8 times the computer memory and 16 times the computation time, because the number of grid points increases 8 times and the time step usually must be halved as well. This is the essential reason why we need supercomputers with higher speed and greater memory.
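The factors of 8 and 16 quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming an explicit scheme whose time step shrinks in proportion to the grid interval; the function name `refinement_cost` is introduced here for illustration only and does not come from the MHD codes.

```python
def refinement_cost(factor, dims=3):
    """Return (memory_multiplier, time_multiplier) when the grid interval
    is divided by `factor` in a simulation box of fixed total length.

    Assumption: the time step is reduced by the same factor as the grid
    interval, so the number of time steps grows by `factor`.
    """
    memory = factor ** dims      # grid points multiply in every dimension
    time = memory * factor       # more points per step, and more steps
    return memory, time


print(refinement_cost(2))        # (8, 16)
```

Under the same assumption, halving the interval in a 2-dimensional box gives `refinement_cost(2, dims=2)`, that is, 4 times the memory and 8 times the computation time.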

It is generally not easy to evaluate computer performance, because the results depend strongly on the nature of the program itself and on the conditions at execution time. Moreover, it is very difficult to estimate how long a particular simulation code will take to execute from the catalog specifications of a computer alone. Thus we often want a rough evaluation, or a practical example, of computational performance on different kinds of computers. For this purpose, we have executed some test programs on many kinds of computers and have used the results as a guide in developing magnetohydrodynamic (MHD) simulation codes.

2. Comparison of Computer Processing Capability

We have had good opportunities to use several kinds of computers. On these occasions we ran some test programs to evaluate computer performance for the fundamental arithmetic calculations and for 2- and 3-dimensional MHD simulation codes. Tables 1 and 2 show the resulting comparisons of computer processing capability. Table 1 shows the simple average over the four fundamental arithmetic calculations (addition, subtraction, multiplication and division) as a measure of basic processing capability. The unit is millions of floating-point operations per second (MFLOPS), and where no particular compiler option is listed, the compiler options giving the maximum performance were adopted.

Table 2 shows the execution times for a single time step advance of the 2- and 3-dimensional MHD simulation codes. The 3-dimensional global MHD simulation codes of the interaction between the solar wind and the earth's magnetosphere were used for (a) the dawn-dusk asymmetry model with 50x50x26 grid points [Ogino et al., 1985, 1986a] and (b) the dawn-dusk symmetry model with 62x32x32 grid points [Ogino, 1986]; the 2-dimensional MHD simulation code of the interaction between the solar wind and the cometary plasma was used for (c) with 402x102 grid points [Ogino et al., 1986b]. The grid numbers include the boundary points. Since the three MHD codes were originally developed to execute efficiently on the CRAY-1 supercomputer, the program size is not large and requires less than 1 MW of memory. We successively applied the MHD codes to other computers, after modifying the original codes to obtain good performance while keeping the number of grid points. One essential difference between the two 3-dimensional MHD models (a) and (b) is the length of the "do loops" in the programs. In model (a), each long "do loop" was separated into several small "do loops" so that the CRAY-1 compiler could vectorize all of them, because the length of a "do loop" that can be vectorized is limited on CRAY computers. On other computers, in contrast, a minimum number of long "do loops" usually gives better processing performance, and model (b) corresponds to that case.
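The difference between one long "do loop" and several small ones can be illustrated as follows. This is only a structural sketch in Python, not the Fortran of the MHD codes: the arrays `u`, `v`, `w` and the two update formulas are hypothetical placeholders. It shows that a loop body can be separated into shorter loops without changing the result, provided that no statement depends on a value produced later in the original body.

```python
def long_loop(u, v, w):
    """One long loop body that fills two output arrays per iteration."""
    n = len(u)
    out1 = [0.0] * n
    out2 = [0.0] * n
    for i in range(n):
        out1[i] = u[i] * v[i] + w[i]
        out2[i] = u[i] * w[i] - v[i]
    return out1, out2


def split_loops(u, v, w):
    """The same work separated into two short loops, one per output array."""
    n = len(u)
    out1 = [0.0] * n
    out2 = [0.0] * n
    for i in range(n):
        out1[i] = u[i] * v[i] + w[i]
    for i in range(n):
        out2[i] = u[i] * w[i] - v[i]
    return out1, out2


u = [1.0, 2.0, 3.0]
v = [4.0, 5.0, 6.0]
w = [7.0, 8.0, 9.0]
print(long_loop(u, v, w) == split_loops(u, v, w))   # True
```

Which form runs faster depends on the compiler and the machine, as the comparison of models (a) and (b) in Table 2 suggests.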

Table 1 presents the average values for the four fundamental arithmetic calculations and gives a rough evaluation of computer processing capability; arrays with a length of 10,000 elements were used in the calculations. For the supercomputers, we obtained the average values with both the vector and scalar compiler options, and confirmed that all the vector "do loops" were fully vectorized. The ratio of the vector to the scalar values can be understood as the maximum gain attainable from vectorization in practice on the supercomputers. This vectorization ratio, or acceleration ratio, ranges from about 10 to 100, and it may serve as a guide in developing and executing simulation codes.
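A test of this kind might be sketched as follows. This is not the original test program, which is not reproduced in this report; it is a hedged Python illustration of the procedure described above (arrays of length 10,000, one rate per arithmetic operation, then a simple average). The helper names `mflops`, `time_operation` and `average_rate`, the repeat count and the array contents are all assumptions. In Python the interpreter overhead dominates, so the absolute value is not comparable with the entries of Table 1.

```python
import operator
import time

N = 10_000  # array length stated in the text


def mflops(n_ops, seconds):
    """Convert an operation count and an elapsed time to MFLOPS."""
    return n_ops / seconds / 1.0e6


def time_operation(op, a, b, repeats=20):
    """Apply `op` element by element `repeats` times and return MFLOPS."""
    start = time.perf_counter()
    for _ in range(repeats):
        c = [op(x, y) for x, y in zip(a, b)]
    elapsed = time.perf_counter() - start
    return mflops(repeats * len(a), elapsed)


def average_rate():
    """Simple average of the rates for +, -, * and /."""
    a = [1.0 + 0.001 * i for i in range(N)]
    b = [2.0 + 0.002 * i for i in range(N)]   # never zero, so division is safe
    ops = [operator.add, operator.sub, operator.mul, operator.truediv]
    rates = [time_operation(op, a, b) for op in ops]
    return sum(rates) / len(rates)


print(f"average rate: {average_rate():.2f} MFLOPS")
```

The printed value depends on the machine and the Python version.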

Table 1 shows only the average values for the four arithmetic calculations; the individual values for addition, subtraction, multiplication and division are not equal. In most cases, the processing speeds for addition, subtraction and multiplication are almost the same, whereas that for division is considerably lower, about a quarter of the other values. This is also true for the vector compilers on supercomputers, so it should be noted that division has the worst efficiency of the four arithmetic calculations. Therefore, we should decrease the number of divisions in each "do loop" of a simulation code if we want higher efficiency.
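One common way to decrease the number of divisions is to compute a reciprocal once and multiply by it afterwards. The sketch below is a hypothetical example, not taken from the MHD codes: it converts three momentum components to velocities by dividing by the density, first with three divisions per grid point and then with one. The variable names are assumptions chosen for illustration.

```python
import math


def velocity_with_divisions(mx, my, mz, rho):
    """Three divisions per grid point."""
    vx = [m / r for m, r in zip(mx, rho)]
    vy = [m / r for m, r in zip(my, rho)]
    vz = [m / r for m, r in zip(mz, rho)]
    return vx, vy, vz


def velocity_with_reciprocal(mx, my, mz, rho):
    """One division per grid point, followed by three multiplications."""
    inv_rho = [1.0 / r for r in rho]
    vx = [m * q for m, q in zip(mx, inv_rho)]
    vy = [m * q for m, q in zip(my, inv_rho)]
    vz = [m * q for m, q in zip(mz, inv_rho)]
    return vx, vy, vz


rho = [1.5, 2.5, 3.0]
mx = [3.0, 5.0, 9.0]
my = [1.0, 2.0, 3.0]
mz = [0.5, 0.25, 0.75]

a = velocity_with_divisions(mx, my, mz, rho)
b = velocity_with_reciprocal(mx, my, mz, rho)
# The two forms may differ in the last bit, so compare with a tolerance.
print(all(math.isclose(x, y, rel_tol=1e-12)
          for ca, cb in zip(a, b) for x, y in zip(ca, cb)))   # True
```

The two forms are not bit-for-bit identical in floating-point arithmetic, so this substitution should be checked against the accuracy required by the simulation.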

It is surprising that new-generation supercomputers such as the NEC SX-3 and Fujitsu VP-2600 show quite high performance, exceeding 1 GFLOPS. Although the processing capability of workstations has recently become much higher, their practical computation speed is about 1 to 10 MFLOPS, which is less than a hundredth of the fastest supercomputer speed. Therefore, we must depend on supercomputers when we run a large simulation code. The performance of a massively parallel processor, the Matsusita ADENART, is also shown in the table and is almost equivalent to that of a vector-type supercomputer such as the Fujitsu VP-200.

Strictly speaking, the processing capability for the four fundamental arithmetic calculations does not directly reflect that for complete simulation codes, because a complete program is composed of many kinds of calculations and processing. The processing capability therefore depends strongly on the character of each complete program. Table 2 shows an example of the computer processing capability when three types of global MHD simulation codes are used. The computation times for a single time step advance in the MHD codes are given in seconds. The new-generation supercomputers such as the NEC SX-3 and Fujitsu VP-2600 again give excellent results for the global MHD simulation codes. In our tests with the MHD simulation codes, three supercomputers (the Fujitsu VP-200, NEC SX-2 and CRAY-YMP-864) and a massively parallel processor, the Matsusita ADENART, give almost comparable performance. Note that the CRAY-2 did not give good values and that the CRAY-XMP and CRAY-YMP did not show good performance for model (b). In those cases, full vectorization by the compiler was not achieved, either because we did not understand well how to vectorize the code or because some "do loops" were too long to be vectorized on the CRAY computers.

Moreover, recent supercomputers now give about 10 to 20 times the computation performance of the first supercomputer, the CRAY-1. At the same time, we can use a large amount of computer memory, from 300 MB to 1 GB, in a simulation, which permits us to handle large numbers of grid points, much greater than 100x100x100 in 3-dimensional and 1000x1000 in 2-dimensional MHD simulation codes. As a result, we can confidently expect to obtain many physically meaningful results from computer simulations in the STEP interval.

3. Summary

We have presented comparisons of computer processing capability for the fundamental arithmetic calculations and for three kinds of complete MHD simulation codes. Almost all of the computations used to obtain the results were in fact executed by ourselves. The tables are of course particular examples and do not represent the general performance of the computers. However, they may provide useful guidance when we develop new simulation codes and execute them on particular computers.

In the 3-dimensional global MHD simulation of the interaction between the solar wind and the earth's magnetosphere, we particularly need high calculation speed and large computer memory. Since such high computer performance has been achieved so quickly, we will be able to study the dynamics of the earth's magnetosphere in more detail and compare the results with theories and observations in the STEP interval.

I would like to express my gratitude to the many computer centers where I had opportunities to execute the test programs, and also to the staff of those computer centers.

References

Ogino, T., A three dimensional MHD simulation of the interaction of the solar wind with the earth's magnetosphere: The generation of field-aligned currents, J. Geophys. Res., 91, A6, 6791, 1986.

Ogino, T., R.J. Walker, M. Ashour-Abdalla, and J.M. Dawson, An MHD simulation of By-dependent magnetospheric convection and field-aligned currents during northward IMF, J. Geophys. Res., 90, 10,835, 1985.

Ogino, T., R.J. Walker, M. Ashour-Abdalla, and J.M. Dawson, An MHD simulation of the effects of the interplanetary magnetic field By component on the interaction of the solar wind with the earth's magnetosphere during southward interplanetary magnetic field, J. Geophys. Res., 91, 10,029, 1986a.

Ogino, T., R.J. Walker, and M. Ashour-Abdalla, An MHD simulation of the interaction of the solar wind with the outflowing plasma from a comet, Geophys. Res. Lett., 13, 929, 1986b.

Table 1. A comparison of the processing capability of computers. A test program executing the four fundamental arithmetic calculations (addition, subtraction, multiplication and division) was used to evaluate the computer processing capability. The unit is millions of floating-point operations per second (MFLOPS), and where no compiler option is given, the compiler option giving the maximum performance was adopted. In the table, IAP means that the inner array processor was used and NIAP (or NOIAP) means that it was not used.

   ---------------------------------------------------------------------
   computer             compiler option       processing capability (MFLOPS)
   ---------------------------------------------------------------------
   NEC ACOS-650            Fortran                    0.41
   NEC ACOS-850            NIAP                       1.09
   NEC ACOS-850            IAP                        2.47
   NEC ACOS-930            NIAP, OPT=1                1.04
   NEC ACOS-930            IAP, opt=3                 6.87
   NEC S-2000              NIAP                       7.37
   NEC S-2000              IAP                       13.58
   NEC SX-2A               vector                   196.3
   NEC SX-2                scalar                     7.74
   NEC SX-2                vector                   247.4
   NEC SX-3/14             scalar                    10.78
   NEC SX-3/14             vector                   583.1
   NEC SX-3                vector                 1,406.9
   Fujitsu M-200                                      1.54
   Fujitsu M-380                                      3.64
   Fujitsu M-780/20                                   8.31
   Fujitsu M-780/30        FORT77 O2                 14.16
   Fujitsu M-780/30        FORT77EX O3               18.07
   Fujitsu M-1800          FORT77                    18.46
   Fujitsu VP-100          scalar                     4.17
   Fujitsu VP-100          vector                    94.91
   Fujitsu VP-200          scalar                     3.20
   Fujitsu VP-200          vector                   225.0
   Fujitsu VP-400          vector                   262.8
   Fujitsu VP-2600         FORT77EX O3            1,238.4
   Fujitsu VPP-500 (1PE)   frtpx, -sc                19.78
   Fujitsu VPP-500 (1PE)   frtpx                    730.3
   Fujitsu VPP-5000 (1PE)  frt, -sc                 189.78    (1999.12.27)
   Fujitsu VPP-5000 (1PE)  frt                    3,073.7     (1999.12.27)   
   Hitachi M-680                                      8.76
   Hitachi M-680D          NOIAP, OPT=0               1.23
   Hitachi M-680D          NOIAP, OPT=3               4.37
   Hitachi M-680D          IAP, OPT=3                49.12
   Hitachi M-680H          NOIAP                      6.65
   Hitachi M-680H          IAP                       44.54
   Hitachi S810/10         scalar                     3.96
   Hitachi S810/10         vector                    51.94
   Hitachi S820            vector                   358.9
   Hitachi S820/80         vector                   497.4
   Hitachi S3800/480       vector                   820.7
   VAX 8600                                           0.507
   IBM-3090                Level(0)                   4.03
   IBM-3090                Level(1)                   8.52
   IBM-3090                Level(2)                   8.28
   CRAY-XMP-48             CFT114i off=v              3.39
   CRAY-XMP-48             CFT114i                   36.00
   CRAY-2                  CIVIC                     29.51
   CRAY-YMP-864            -o off                     1.45
   CRAY-YMP-864            -o novector               11.09
   CRAY-YMP-864            -o full                  116.36
   SCS40                   SCSFT o=h                  0.695
   SCS40                   SCSFT vector               8.30
   Asahi Stellar GS-1000   O1 (scalar)                0.538
     (version 1.6)         O2 (vector)                4.67
                           O3 (parallel)             10.77
   NEC EWS-4800/20                                    0.188
   NEC EWS-4800/50                                    0.112
   NEC EWS-4800/210        f77 -O                     1.665
   NEC EWS-4800/220        f77 -O                     1.812
   NEC EWS-4800/260        f77 -O                     2.009
   NEC EWS-4800/350        f77 -O                     4.929
   NEC EWS-4800/360        f77 -O                     5.016
   MicroVAX-3400                                      0.419
   Sun SPARC Station 1     f77 -O                     0.961
   Sun SPARC Station 2     f77 -O                     2.188
   Sun SPARC IPX           f77 -O                     1.646
   Sun SPARC 2 (AS4075)    f77 -O                     1.400
   Sun SPARC Station 10    f77 -O                     2.999
   Sun SPARC S-4/5         f77 -O                     4.217
   Sun SPARC S-4/CL4       f77 -O                     4.419
   Sun SPARC S-4/20H       f77 -O                    18.67
   Sun SPARC S-4/20H(stcpu1) f77 -O                  18.78    (1998.04.17)    
   Sun SPARC S-4/20H(stcpu1) f90 -O                  30.75    (1998.04.17)
   Sun S-7/300U            f77 -O                    20.18
   Sun Ultra 2   (162MHz)  f77 -O                    24.04    (1998.04.09)
   Sun Ultra 2   (162MHz)  f90 -O                    13.56    (1998.04.09)
   Sun Ultra 2   (162MHz)  frt -O (Fujitsu f90)      23.46    (1998.04.18)
   Sun S-7/7000U (296MHz)  f77 -O                    38.19    (1998.04.07)
   Sun S-7/7000U (296MHz)  f90 -O                    23.12    (1998.04.07)
   Sun S-7/7000U 350       f77 -O                    42.01    (1999.08.02)
   Sun S-4/420U            f77 -O                    44.59    (1999.08.02)
   Sun PanaStation         f77 -O                    18.06
   DELL OptiPlex GXi       f77 -O                    11.36    (1997.11.12)
   DEC Alpha (500MHz)      f77                       63.0     (1998.04.17)
   DEC Alpha (500MHz)      f90                       64.1     (1998.04.17)
   SGI Indy                f77 -O                     3.96
   SGI Indigo2             f77 -O                     9.46
   SGI Origin2000(1CPU)    Fortran77                 27.00
   SGI Octane              f77 -O                    17.30    (1999.08.02)
   SGI O2                  f77 -O                     4.87    (1999.08.02)
   DEC alpha 3000AXP/500   f77 -O3                   13.52
   Solbourne               f77 -O3                    1.824
   TITAN                   O1 (scalar)                0.904
   TITAN                   O2 (vector)                3.756
   TITAN III               O1 (scalar)                1.176
   TITAN III               O2 (vector)                6.228
   TITAN III               O3 (parallel)              6.543
   IBM 6091-19             f77 -O                     8.125
   Matsusita ADENART       ADETRAN (parallel)       218.0
   Convex C3810            Fortran -O1 (scalar)       4.089
   Convex C3810            Fortran -O2 (vector)      94.37
   nCUBE2                  HPF                        0.758
   nCUBE2                  HPF SSS32                  1.788
   DECmpp 12000 MP-2 1K pe HPF Ver.3.1               70.95
   DECmpp 12000 MP-2 2K pe HPF Ver.3.1              128.69
   DECmpp 12000 MP-2 4K pe HPF Ver.3.1              189.34

   ---------------------------------------------------------------------

Table 2. Comparison of the computer processing capability for the 2-dimensional and 3-dimensional global magnetohydrodynamic (MHD) simulations. The numerical values are the computation times (in seconds) for one time step advance in the MHD simulation codes, with the corresponding MFLOPS values in parentheses. Where no particular compiler option is given, the compiler options giving the maximum performance were adopted. The grid numbers used in the MHD simulation codes, including the boundary grid points, are (a) 50 x 50 x 26 and (b) 62 x 32 x 32 for the 3-dimensional simulations and (c) 402 x 102 for the 2-dimensional simulation.

   ---------------------------------------------------------------------
  computer           compiler        (a)3D-MHD     (b)3D-MHD     (c)2D-MHD
                                      50x50x26      62x32x32      402x102
                                        sec (MFLOPS)  sec (MFLOPS)  sec (MFLOPS)
   ---------------------------------------------------------------------
  NEC ACOS-650        Fortran 77      187.1  (  0.6)159.5  (  0.7) 29.3  (  1.0)
  NEC ACOS-930        NIAP, OPT=1      14.07 (    9) 13.76 (    8)  4.31 (  6.5)
  NEC ACOS-930        IAP,OPT=3         9.97 (   12) 11.34 (   10)  2.44 ( 11.4)
  NEC SX-2            opt=scalar        3.66 (   33)  5.02 (   23)  0.90 ( 31.0)
  NEC SX-2            Fortran 77        0.34 (  356)  0.28 (  412)  0.042(  664)
  NEC SX-3/14         opt=scalar        2.11 (   57)  1.81 (   64)  0.48 ( 58.1)
  NEC SX-3/14         Fortran 77        0.097(1,248)  0.116(  994)  0.0149(1,871)
  NEC SX-3            Fortran 77                                    0.014(1,991)
  Fujitsu M-200       Fortran 77       34.4  (  3.5) 34.2  (  3.4)  7.84 (  3.6)
  Fujitsu M-380       Fortran 77       11.60 (   10)  9.37 (   12)  3.31 (  8.4)
  Fujitsu M-780/20    Fortran 77        4.87 (   25)  3.94 (   30)  1.14 ( 24.5)
  Fujitsu M-780/30    FORT77 O2         3.95 (   31)  5.06 (   23)  0.84 ( 33.2)
  Fujitsu M-780/30    FORT77EX O3       2.63 (   46)  2.21 (   53)  0.66 ( 42.2)
  Fujitsu VP-100      opt=scalar       11.44 (   11)  9.61 (   12)  2.39 ( 11.7)
  Fujitsu VP-100      Fortran 77        0.80 (  151)  0.75 (  154)  0.13 (  214)
  Fujitsu VP-200      opt=scalar       12.20 (   10) 10.23 (   11)  2.56 ( 10.9)
  Fujitsu VP-200      Fortran 77        0.50 (  242)  0.41 (  281)  0.080(  348)
  Fujitsu VP-400      Fortran 77        0.49 (  247)  0.39 (  296)  0.042(  664)
  Fujitsu VP-2600     FORTCLG           0.099(1,223)  0.082(1,405)  0.014(1,991)
  Fujitsu VPP-500( 1PE) frtpx, -sc      2.65 (   46)  3.26 (   36)  0.704( 39.6)
  Fujitsu VPP-500( 1PE) frtpx           0.150(  807)  0.132(  881)  0.029(  961)
  Fujitsu VPP-500( 1PE) frtpx, -sc     3.1043(   39) 5.359 (   22)
  Fujitsu VPP-500( 1PE) frtpx          0.1396(  867) 0.1194(  974)
  Fujitsu VPP-500( 2PE) frtpx, -Wx     0.0749(1,616) 0.0632(1,840)
  Fujitsu VPP-500( 4PE) frtpx, -Wx     0.0440(2,751) 0.0372(3,126)
  Fujitsu VPP-500( 8PE) frtpx, -Wx     0.0277(4,370) 0.0244(4,766)
  Fujitsu VPP-500(16PE) frtpx, -Wx     0.0189(6,405) 0.0155(7,503)
  Fujitsu VPP-5000( 1PE) frt,  -sc     0.717 (  169) 0.694 (  168)0.1238 (  225)
  Fujitsu VPP-5000( 1PE) frt           0.0301( 4026) 0.0264( 4416)0.00441( 6316)
  Fujitsu VPP-5000( 1PE) frt,  -sc     0.7209(  168) 0.8529(  136)
  Fujitsu VPP-5000( 1PE) frt          0.02330( 5201)0.02270( 5110)
  Fujitsu VPP-5000( 2PE) frt,  -Wx    0.01279( 9475)0.01050(11047)
  Fujitsu VPP-5000( 4PE) frt,  -Wx    0.00751(16136)0.00594(19528)
  Fujitsu VPP-5000( 8PE) frt,  -Wx    0.00451(26870)0.00356(32583)
  Fujitsu VPP-5000(16PE) frt,  -Wx    0.00306(39602)0.00225(51554)  
  Hitachi M-680       Fortran 77                                    1.49 ( 18.7)
  Hitachi M-680D      NOIAP, OPT=3      8.42 (   14)  9.31 (   12)  2.11 ( 13.2)
  Hitachi M-680D      IAP, OPT=3        3.54 (   34)  2.75 (   42)  0.57 ( 48.9)
  Hitachi M-680D      IAP, SOPT         3.25 (   37)  2.44 (   48)  0.53 ( 52.6)
  Hitachi S810/10     opt=scalar                                    3.17 (  8.8)
  Hitachi S810/10     Fortran 77                                    0.167(  167)
  Hitachi S820/20     Fortran 77        0.23 (  526)  0.16 (  727)  0.020(1,394)
  Hitachi S3800/480   Fortran 77        0.125(  968)  0.103(1,129)  0.0093(2,998)
  IBM-3033            VS               33.3  (  3.6) 27.8  (  4.2)  7.90 (  3.5)
  IBM-3090            VS                9.17 (   13)  8.76 (   13)  2.27 ( 12.3)
  IBM-3090            Fortvclg L0       9.11 (   13)  8.78 (   13)  2.28 ( 12.2)
  IBM-3090            Fortvclg L1       5.19 (   23)  4.01 (   29)  1.11 ( 25.1)
  VAX-11/750          Fortran         449.5  (  0.3)432.9  (  0.3) 96.64 (  0.3)
  CRAY-1              CFT               1.88 (   64)  1.76 (   65)  0.372( 74.9)
  CRAY-XMP            CFT 1.13          1.67 (   73)  3.85 (   39)  0.282( 98.9)
  CRAY-2              CIVIC 131        10.3  (   12)  7.29 (   16)
  CRAY-XMP-48         off=v             5.68 (   21)  6.15 (   19)  1.436( 19.4)
  CRAY-XMP-48         CFT114i           1.29 (   94)  1.13 (  101)  0.252(  111)
  CRAY-YMP-864        -o off            9.36 (   13)  9.26 (   13)  2.74 ( 10.2)
  CRAY-YMP-864        -o novector       3.62 (   33)  3.81 (   31)  0.999( 27.9)
  CRAY-YMP-864        -o full           0.430(  282)  1.921(   61)  0.0982( 284)
  SCS40               SCSFT o=h        18.86 (  6.4) 20.25 (  5.7)  5.71 (  4.9)
  SCS40               SCSFT             3.94 (   31)  3.81 (   31)  0.964( 28.9)
  TITAN               O1 (scalar)      49.44 (  2.5) 56.70 (  2.1) 12.59 (  2.2)
  TITAN               O2 (vector)      22.02 (  5.5) 23.97 (  4.8)  5.47 (  5.1)
  TITAN III           O1 (scalar)      15.66 (  7.7) 18.57 (  6.3)  3.68 (  7.6)
  TITAN III           O2 (vector)       7.96 (   15)  7.54 (   15)  1.62 ( 17.2)
  TITAN III           O3 (parallel)     7.73 (   16)  7.31 (   16)  1.59 ( 17.5)
  Sun SPARC Station 1 f77 -O           47.50 (  2.5) 47.25 (  2.4) 13.00 (  2.1)
  Sun SPARC Station 2 f77 -O           20.50 (  5.9) 19.88 (  5.8)  4.81 (  5.8)
  Sun SPARC IPX       f77 -O           16.30 (  7.2) 17.60 (  6.6)  5.23 (  5.3)
  Sun SPARC 2(AS4075) f77 -O           15.18 (  8.0) 16.63 (  7.0)  4.92 (  5.7)
  Sun SPARC Station 10 f77 -O           8.22 ( 14.7) 11.46 ( 10.2)  2.04 ( 13.7)
  Sun SPARC S-4/5     f77 -O            5.57 ( 21.7)  6.81 ( 17.1)  1.79 ( 15.6)
  Sun SPARC S-4/CL4   f77 -O            5.68 ( 21.3)  6.79 ( 17.1)  1.79 ( 15.6)
  Sun SPARC S-4/20H   f77 -O            1.90 ( 63.7)  1.92 ( 60.6)  0.344( 81.0)
  Sun SPARC S-4/20H(stcpu1) f77 -O      1.90 ( 63.7)  1.92 ( 60.6)  0.344( 81.0)
  Sun SPARC S-4/20H(stcpu1) f90 -O      1.97 ( 61.4)  1.79 ( 65.0)  0.450( 61.9)
  Sun S-7/300U        f77 -O            1.76 ( 68.8)  1.97 ( 59.1)  0.426( 65.4)
  Sun Ultra 2   (162MHz) f77 -O         1.49 (   81)  3.39 (   34)  0.344(   81)
  Sun Ultra 2   (162MHz) f90 -O         3.91 (   31)  5.14 (   23)  0.680(   41)
  Sun Ultra 2   (162MHz) frt -O (f90)   1.53 (   79)  2.74 (   42)  0.380(   74)
  Sun S-7/7000U (296MHz) f77 -O         0.836(  145)  1.09 (  107)  0.195(  143)
  Sun S-7/7000U (296MHz) f90 -O         3.04 (   40)  2.72 (   43)  0.375(   74)
  Sun S-7/7000U 350   f77 -O            1.055(  115)  1.039(  113)  0.172(  161)
  Sun S-4/420U        f77 -O            0.969(  125)  0.953(  123)  0.156(  178)
  Sun PanaStation     f77 -O            2.08 ( 58.2)  1.87 ( 62.2)  0.348( 80.1)
  DELL OptiPlex GXi   f77 -O            2.68 ( 45.2)  3.14 ( 37.0)  0.641( 43.5)
  DEC Alpha (500MHz)  f77               0.359(  337)  0.383(  304)  0.0781( 346)
  DEC Alpha (500MHz)  f90               0.359(  337)  0.383(  304)  0.0781( 346)
  SGI Indy            f77 -O            8.20 ( 14.8) 10.35 ( 11.2)  2.21 ( 12.6)
  SGI Indigo2         f77 -O            2.68 ( 45.2)  2.85 ( 40.8)  0.775( 36.0)
  SGI Octane          f77 -O            0.475(  255)  0.550(  211)  0.133(  210)
  SGI O2              f77 -O            1.875( 64.6)  2.125( 54.7)  0.325( 85.8)
  SGI Origin2000(1CPU) Fortran77        0.531(  228)  0.797(  146)  0.129(  216)
  SGI Origin2000(2CPU) Fortran77        0.324(  374)  0.464(  251)
  SGI Origin2000(4CPU) Fortran77        0.202(  599)  0.275(  423)
  SGI Origin2000(8CPU) Fortran77        0.155(  781)  0.191(  609)
  DEC alpha 3000AXP/500 f77 -O3         2.59 ( 46.8)  4.41 ( 26.4)  0.56 ( 49.8)
  Solbourne           f77 -O3          23.68 (  5.1) 25.70 (  4.5)  6.06 (  4.6)
  NEC EWS-4800/210    f77 -O3          16.65 (  7.3) 21.15 (  5.5)  4.17 (  6.7)
  NEC EWS-4800/220    f77 -O3          18.50 (  6.5) 17.00 (  6.7)  3.50 (  8.0)
  NEC EWS-4800/260    f77 -O           12.53 (  9.7) 14.87 (  7.8)  3.04 (  9.2)
  NEC EWS-4800/350    f77 -O            6.72 ( 18.0)  7.54 ( 15.4)  1.46 ( 19.1)
  NEC EWS-4800/360    f77 -O            4.69 ( 25.8)  5.40 ( 21.5)  1.06 ( 26.3)
  IBM 6091-19         f77 -O            4.437( 27.3)  4.500( 25.9)  1.125( 24.8)
  Matsusita ADENART   ADETRAN(parallel) 0.431(  281)  0.307(  375)  0.110(  253)
  Convex C3810        f77 -O1 (scalar)  5.704(   21)  6.286(   19)  1.378( 20.2)
  Convex C3810        f77 -O2 (vector)  0.948(  128)  0.895(  130)  0.213(  131)
  nCUBE2E  16 pe      f90 -O (parallel) 2.78 ( 43.5)                0.544( 51.2)
  nCUBE2E  32 pe      f90 -O (parallel)                             0.293( 95.1)
  nCUBE2S 128 pe      f90 -O (parallel)                             0.083(  336)
  nCUBE2  256 pe      f90 -O (parallel)                             0.072(  378)

   ---------------------------------------------------------------------
  CRAY Y-MP4E         1 processor       0.460(  263)  0.431(  263)  0.106(  263)
                      2 processors      0.246(  492)  0.233(  495)  0.062(  450)
                      4 processors      0.136(  890)  0.129(  893)  0.040(  697)
  CRAY Y-MP C90       1 processor       0.289(  419)  0.265(  439)
                      2 processors      0.159(  762)  0.144(  800)
                      4 processors      0.087(1,392)  0.079(1,459)
   ----------------------------------------------------------------------
   ----------------------------------------------------------------------
  computer             compiler       (a)3D-MHD     (b)3D-MHD     (c)2D-MHD
                                         sec (MFLOPS) sec (MFLOPS)    sec (MFLOPS)
                       Grid points    192x192x96     240x120x120   1600x400
  Convex C3810 (1cpu)  240 MFLOPS      52.8  (  143) 77.7  (   95)  3.4  (  131)
  Convex C3820 (2cpu)  480 MFLOPS      29.5  (  256) 41.6  (  178)  1.9  (  235)
  Convex C3840 (4cpu)  960 MFLOPS      18.9  (  400) 16.1  (  283)  1.2  (  372)

                       Grid points    240x120x120    240x120x120   1600x400
  SGI Origin2000(1CPU) Fortran77       37.59 (  201) 40.58 (  183)  2.24 (  199)
  SGI Origin2000(2CPU) Fortran77       19.86 (  381) 21.21 (  351)  1.76 (  253)
  SGI Origin2000(4CPU) Fortran77       10.45 (  724) 11.08 (  672)  1.40 (  319)
  SGI Origin2000(8CPU) Fortran77        5.94 (1,274)  6.21 (1,199)
                       Grid points    320x 80x160    320x 80x160
  SGI Origin2000(1CPU) Fortran77       77.49 (  116) 82.97 (  100)
  SGI Origin2000(2CPU) Fortran77       40.39 (  222) 43.01 (  194)
  SGI Origin2000(4CPU) Fortran77       21.12 (  424) 22.85 (  364)
  SGI Origin2000(8CPU) Fortran77       11.56 (  774) 12.25 (  680)

  -----------------------------------------------------------------------
 Computer Processing Capability
                                         2000.6.25  by Tatsuki OGINO
  -----------------------------------------------------------------------
  computer                           grid number       sec   (MFLOPS) GF/PE
 -----------------------------------------------------------------------
  Matsusita ADENART (256CPU)         180x 60x 60      3.46   (   400)
  Matsusita ADENART (256CPU)         150x100x 50      5.81   (   276)
  CRAY Y-MP C90  (8CPU)              400x200x200      7.00   ( 4,883) 0.61
  SGI Origin2000 (1CPU, earthb)      240x120x120     40.58   (   183) 0.18
  SGI Origin2000 (2CPU)              240x120x120     21.21   (   351) 0.18
  SGI Origin2000 (4CPU)              240x120x120     11.08   (   672) 0.17
  SGI Origin2000 (8CPU)              240x120x120      6.21   ( 1,199) 0.15
  Fujitsu VP-200                     240x 80x 80     10.38   (   316) 0.32
  Fujitsu VP-2600                    240x 80x 80      1.50   ( 2,188) 2.19
  Fujitsu VP-2600                    320x 80x 80      1.76   ( 2,486) 2.49
  Fujitsu VP-2600                    300x100x100      2.57   ( 2,494) 2.49
  Fujitsu VP-2600                    320x 80x160      3.63   ( 2,417) 2.42
  Fujitsu VPP-500 (1PE, earthb)      320x 80x 80      3.556  ( 1,230) 1.23
  Fujitsu VPP-500 (2PE)              320x 80x 80      1.846  ( 2,370) 1.19
  Fujitsu VPP-500 (4PE)              320x 80x 80      1.012  ( 4,323) 1.08
  Fujitsu VPP-500 (8PE)              320x 80x 80      0.591  ( 7,403) 0.93
  Fujitsu VPP-500 (16PE)             320x 80x 80      0.368  (11,889) 0.74
  Fujitsu VPP-500 (16PE)             400x100x100      0.666  (12,831) 0.80
  Fujitsu VPP-500 (16PE)             640x160x160      2.308  (15,165) 0.95
  Fujitsu VPP-500 (16PE)             800x200x200      4.119  (16,597) 1.04
  Fujitsu VPP-500 (1PE, eartha2)     320x 80x160      7.088  ( 1,234) 1.23
  Fujitsu VPP-500 (2PE)              320x 80x160      3.620  ( 2,417) 1.21
  Fujitsu VPP-500 (4PE)              320x 80x160      1.899  ( 4,608) 1.15
  Fujitsu VPP-500 (8PE)              320x 80x160      1.035  ( 8,454) 1.06
  Fujitsu VPP-500 (16PE)             320x 80x160      0.592  (14,781) 0.92
  Fujitsu VPP-500 (16PE)             400x100x200      1.088  (15,708) 0.98
  Fujitsu VPP-500 (16PE)             640x160x320      4.064  (17,225) 1.08
  Fujitsu VPP-500 (16PE)             800x200x400      7.632  (17,914) 1.12
  Fujitsu VPP-500 (16PE, Venus)      400x100x100      0.667  (12,811) 0.80
  Fujitsu VPP-500 (16PE, Jupiter)    300x200x100      0.975  (13,146) 0.82
  Fujitsu VPP-5000 (1PE, earthb)     400x100x100      1.154 (  7,405) 7.40
  Fujitsu VPP-5000 (2PE, earthb)     400x100x100      0.5762( 14,831) 7.41
  Fujitsu VPP-5000 (4PE, earthb)     400x100x100      0.3039( 28,119) 7.03
  Fujitsu VPP-5000 (8PE, earthb)     400x100x100      0.1613( 52,979) 6.62
  Fujitsu VPP-5000 (16PE, earthb)    400x100x100     0.09355( 91,346) 5.71
  Fujitsu VPP-5000 (16PE, earthb)    800x200x200     0.62417(109,526) 6.85
  Fujitsu VPP-5000 (16PE, eartha2)   500x100x200     0.19975(106,948) 6.68
  Fujitsu VPP-5000 (16PE, eartha2)   800x200x400     1.20162(113,779) 7.11
  Fujitsu VPP-5000 ( 2PE, eartha2)   800x200x478    10.65936( 15,327) 7.66
  Fujitsu VPP-5000 ( 4PE, eartha2)   800x200x478     5.35061( 30,534) 7.63
  Fujitsu VPP-5000 ( 8PE, eartha2)   800x200x478     2.73815( 59,666) 7.46
  Fujitsu VPP-5000 (12PE, eartha2)   800x200x478     1.86540( 87,581) 7.30
  Fujitsu VPP-5000 (16PE, eartha2)   800x200x478     1.41918(115,119) 7.19
  Fujitsu VPP-5000 (32PE, eartha2)   800x200x478     0.72187(226,328) 7.07
  Fujitsu VPP-5000 (48PE, eartha2)   800x200x478     0.53445(305,698) 6.36
  Fujitsu VPP-5000 (56PE, eartha2)   800x200x478     0.49367(330,950) 5.91

  Fujitsu VPP-5000 (32PE, eartha2)  1000x200x478     0.91633(222,872) 6.96
  Fujitsu VPP-5000 (32PE, eartha2)   800x400x478     1.44683(225,845) 7.06
  Fujitsu VPP-5000 ( 2PE, eartha2)   800x200x670     -.-----(---,---)
  Fujitsu VPP-5000 ( 4PE, eartha2)   800x200x670     7.61763( 30,063) 7.52
  Fujitsu VPP-5000 ( 8PE, eartha2)   800x200x670     3.79406( 60,359) 7.54
  Fujitsu VPP-5000 (12PE, eartha2)   800x200x670     2.80623( 81,606) 6.80
  Fujitsu VPP-5000 (16PE, eartha2)   800x200x670     1.92435(119,004) 7.44
  Fujitsu VPP-5000 (24PE, eartha2)   800x200x670     1.30786(175,099) 7.30
  Fujitsu VPP-5000 (32PE, eartha2)   800x200x670     0.97929(233,848) 7.31
  Fujitsu VPP-5000 (48PE, eartha2)   800x200x670     0.68234(335,618) 6.99
  Fujitsu VPP-5000 (56PE, eartha2)   800x200x670     0.59542(384,611) 6.87
  Fujitsu VPP-5000 (16PE, eartha2)   1000x500x1118   9.66792(123,518) 7.72 (2000.07.21)
  Fujitsu VPP-5000 (32PE, eartha2)   1000x500x1118   5.04442(236,729) 7.40 (2000.07.21)
  Fujitsu VPP-5000 (48PE, eartha2)   1000x500x1118   3.54985(336,397) 7.01 (2000.08.07)
  Fujitsu VPP-5000 (56PE, eartha2)   1000x500x1118   3.00623(397,228) 7.09 (2000.08.07)
  Fujitsu VPP-5000 (56PE, eartha2)   1000x500x1118   2.98512(400,038) 7.14
  Fujitsu VPP-5000 (32PE, eartha2)  1000x1000x1118   9.97933(239,327) 7.48 (2000.07.19)
  Fujitsu VPP-5000 (48PE, eartha2)  1000x1000x1118   7.17658(332,794) 6.93 (2000.08.07)
  Fujitsu VPP-5000 (56PE, eartha2)  1000x1000x1118   5.81743(410,546) 7.33
  Fujitsu VPP-5000 (56PE, eartha2)  1000x1000x1118   5.97927(399,433) 7.13 (2000.08.07)
  Fujitsu VPP-5000 (32PE, eartha2)   2238x558x1118  12.96936(229,926) 7.19 (2000.07.28)
  Fujitsu VPP-5000 (48PE, eartha2)   2238x558x1118   9.49812(313,956) 6.54 (2000.08.07)
  Fujitsu VPP-5000 (56PE, eartha2)   2238x558x1118   8.04309(370,752) 6.62 (2000.08.07)

  Fujitsu VPP-5000 ( 1PE, eartha2, scalar)  200x100x478   119.60663(    171) 0.171
  Fujitsu VPP-5000 ( 1PE, eartha2)   200x100x478     2.96691(  6,883) 6.88
  Fujitsu VPP-5000 ( 2PE, eartha2)   200x100x478     1.45819( 14,005) 7.00
  Fujitsu VPP-5000 ( 4PE, eartha2)   200x100x478     0.72109( 28,320) 7.08
  Fujitsu VPP-5000 ( 8PE, eartha2)   200x100x478     0.36541( 55,886) 6.99
  Fujitsu VPP-5000 (16PE, eartha2)   200x100x478     0.20548( 99,383) 6.21
  Fujitsu VPP-5000 (32PE, eartha2)   200x100x478     0.10678(191,226) 5.98
  Fujitsu VPP-5000 (48PE, eartha2)   200x100x478     0.06853(297,959) 6.21
  Fujitsu VPP-5000 (56PE, eartha2)   200x100x478     0.06391(319,531) 5.71

  /vpp/home/usr6/a41456a/heartha2/prog9032.f  revised boundary
  Fujitsu VPP-5000 ( 1PE, eartha2, frt)   500x100x200     2.69078(  7,939) 7.94
  Fujitsu VPP-5000 ( 2PE, eartha2, frt)   500x100x200     1.38118( 15,467) 7.73
  Fujitsu VPP-5000 ( 4PE, eartha2, frt)   500x100x200     0.71535( 29,965) 7.47
  Fujitsu VPP-5000 ( 8PE, eartha2, frt)   500x100x200     0.39820( 53,648) 6.71
  Fujitsu VPP-5000 (16PE, eartha2, frt)   500x100x200     0.20970(101,873) 6.37
  Fujitsu VPP-5000 (32PE, eartha2, frt)   500x100x200     0.13062(163,548) 5.11
  Fujitsu VPP-5000 (48PE, eartha2, frt)   500x100x200     0.09960(214,479) 4.46
  Fujitsu VPP-5000 (56PE, eartha2, frt)   500x100x200     0.08921(239,478) 4.28
  -----------------------------------------------------------------------
   HPF/JA (High Performance Fortran)
  /vpp/home/usr6/a41456a/heartha2/proghpf53.f
  Fujitsu VPP-5000 ( 1PE, eartha2, HPF)   500x100x200     2.69089(  7,938) 7.94
  Fujitsu VPP-5000 ( 2PE, eartha2, HPF)   500x100x200     1.39017( 15,366) 7.68
  Fujitsu VPP-5000 ( 4PE, eartha2, HPF)   500x100x200     0.71228( 29,993) 7.50
  Fujitsu VPP-5000 ( 8PE, eartha2, HPF)   500x100x200     0.39285( 54,381) 6.80
  Fujitsu VPP-5000 (16PE, eartha2, HPF)   500x100x200     0.20202(105,742) 6.61
  Fujitsu VPP-5000 (32PE, eartha2, HPF)   500x100x200     0.12034(175,496) 5.48
  Fujitsu VPP-5000 (48PE, eartha2, HPF)   500x100x200     0.09115(231,688) 4.82
  Fujitsu VPP-5000 (56PE, eartha2, HPF)   500x100x200     0.08625(244,846) 4.37

  HPF/JA (High Performance Fortran)
  Fujitsu VPP-5000 ( 1PE, eartha2, HPF)   200x100x478     3.00248(  6,801) 6.80 OK
  Fujitsu VPP-5000 ( 2PE, eartha2, HPF)   200x100x478     1.53509( 13,303) 6.65 OK
  Fujitsu VPP-5000 ( 4PE, eartha2, HPF)   200x100x478     0.76061( 26,849) 6.71 OK
  Fujitsu VPP-5000 ( 8PE, eartha2, HPF)   200x100x478     0.38589( 52,921) 6.62 OK
  Fujitsu VPP-5000 (16PE, eartha2, HPF)   200x100x478     0.21867( 93,390) 5.84 OK
  Fujitsu VPP-5000 (32PE, eartha2, HPF)   200x100x478     0.10972(186,129) 5.82 OK
  Fujitsu VPP-5000 (48PE, eartha2, HPF)   200x100x478     0.07374(276,956) 5.77 OK
  Fujitsu VPP-5000 (56PE, eartha2, HPF)   200x100x478     0.06823(299,269) 5.34 OK

  Fujitsu VPP-5000 ( 2PE, proghpf63.f)    800x200x478    10.74172( 15,210) 7.60
  Fujitsu VPP-5000 ( 4PE, proghpf63.f)    800x200x478     5.35382( 30,516) 7.63
  Fujitsu VPP-5000 ( 8PE, proghpf63.f)    800x200x478     2.72973( 59,851) 7.48
  Fujitsu VPP-5000 (12PE, proghpf63.f)    800x200x478     1.91098( 85,493) 7.12
  Fujitsu VPP-5000 (16PE, proghpf63.f)    800x200x478     1.38854(117,660) 7.35
  Fujitsu VPP-5000 (32PE, proghpf63.f)    800x200x478     0.71746(227,715) 7.12
  Fujitsu VPP-5000 (48PE, proghpf63.f)    800x200x478     0.51497(317,257) 6.61
  Fujitsu VPP-5000 (56PE, proghpf63.f)    800x200x478     0.46350(352,488) 6.29

  Fujitsu VPP-5000 ( 2PE, proghpf63.f)    800x200x670     -.-----(---,---)
  Fujitsu VPP-5000 ( 4PE, proghpf63.f)    800x200x670     8.00096( 28,622) 7.16 OK
  Fujitsu VPP-5000 ( 8PE, proghpf63.f)    800x200x670     3.96162( 57,806) 7.23 OK
  Fujitsu VPP-5000 (12PE, proghpf63.f)    800x200x670     3.00484( 76,212) 6.35 OK
  Fujitsu VPP-5000 (16PE, proghpf63.f)    800x200x670     2.01151(113,848) 7.12 OK
  Fujitsu VPP-5000 (24PE, proghpf63.f)    800x200x670     1.35955(168,442) 7.02 OK
  Fujitsu VPP-5000 (32PE, proghpf63.f)    800x200x670     1.03211(221,880) 6.93 OK
  Fujitsu VPP-5000 (48PE, proghpf63.f)    800x200x670     0.72060(317,798) 6.62 OK
  Fujitsu VPP-5000 (56PE, proghpf63.f)    800x200x670     0.62764(364,866) 6.52 OK

  Fujitsu VPP-5000 (16PE, eartha2)   1000x500x1118   9.84601(122,615) 7.66 (2000.07.21)
  Fujitsu VPP-5000 (16PE, eartha2a)  1000x500x1118   9.61939(125,504) 7.84 (2000.07.21)
  Fujitsu VPP-5000 (32PE, eartha2)   1000x500x1118   5.19470(232,403) 7.26 (2000.07.21)
  Fujitsu VPP-5000 (32PE, eartha2a)  1000x500x1118   4.99224(241,828) 7.56 (2000.07.21)
  Fujitsu VPP-5000 (48PE, eartha2a)  1000x500x1118   3.47943(346,972) 7.23 (2000.08.07)
  Fujitsu VPP-5000 (56PE, eartha2a)  1000x500x1118   2.93481(411,361) 7.35 (2000.08.07)
  Fujitsu VPP-5000 (32PE, eartha2)  1000x1000x1118  10.22563(233,562) 7.30 (2000.07.19)
  Fujitsu VPP-5000 (32PE, eartha2a) 1000x1000x1118   9.81345(243,372) 7.61 (2000.07.21)
  Fujitsu VPP-5000 (48PE, eartha2a) 1000x1000x1118   7.02753(339,852) 7.08 (2000.08.07)
  Fujitsu VPP-5000 (56PE, eartha2a) 1000x1000x1118   5.79368(412,228) 7.36 (2000.08.07)

  Fujitsu VPP-5000 (48PE, eartha2)   1678x558x1118   6.52886(342,453) 7.13 (2000.08.07)
  Fujitsu VPP-5000 (56PE, eartha2)   1678x558x1118   5.54894(402,929) 7.20 (2000.08.07)

  Fujitsu VPP-5000 (32PE, eartha2)   2238x558x1118  13.25331(225,000) 7.03 (2000.07.21)
  Fujitsu VPP-5000 (32PE, eartha2a)  2238x558x1118  12.71245(234,573) 7.33 (2000.07.21)
  Fujitsu VPP-5000 (48PE, eartha2a)  2238x558x1118   9.22722(323,174) 6.73 (2000.08.07)
  Fujitsu VPP-5000 (56PE, eartha2a)  2238x558x1118   7.80778(381,926) 6.82 (2000.08.07)
  -----------------------------------------------------------------------
   frt: Fujitsu VPP Fortran 90    HPF: High Performance Fortran

Note: MFLOPS is an estimated value, obtained by comparison with the same computation on one processor of the CRAY Y-MP C90.
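
The relation between the columns of the table can be checked with a short calculation. The following Python sketch is only an illustration and is not part of the original test programs. It assumes that the first numeric column is the computation time in seconds per time step, that the value in parentheses is the aggregate MFLOPS, and that the last column is GFLOPS per processor element (PE). The function names are hypothetical. The rows are copied from the VPP-5000 (eartha2) entries for 200x100x478 grid points.

```python
# Rows copied from the table: (number of PEs, seconds per step, MFLOPS)
# for Fujitsu VPP-5000, eartha2, 200x100x478 grid points.
ROWS = [
    (1, 2.96691, 6883),
    (2, 1.45819, 14005),
    (4, 0.72109, 28320),
    (8, 0.36541, 55886),
    (16, 0.20548, 99383),
    (32, 0.10678, 191226),
    (48, 0.06853, 297959),
    (56, 0.06391, 319531),
]
GRID_POINTS = 200 * 100 * 478


def gflops_per_pe(mflops, pes):
    """Aggregate MFLOPS divided by the PE count, expressed in GFLOPS."""
    return mflops / 1000.0 / pes


def speedup(t1, tn):
    """Ratio of the 1-PE time to the n-PE time."""
    return t1 / tn


def efficiency(t1, tn, pes):
    """Parallel efficiency: speedup divided by the PE count."""
    return speedup(t1, tn) / pes


def ops_per_point(seconds, mflops, grid_points):
    """Operation count per grid point per step implied by one table row,
    assuming MFLOPS = (operations per step) / (seconds per step) / 1e6."""
    return seconds * mflops * 1.0e6 / grid_points


def main():
    t1 = ROWS[0][1]
    print(" PE   GF/PE  speedup  efficiency  ops/point")
    for pes, sec, mflops in ROWS:
        print("%3d  %6.2f  %7.2f  %10.3f  %9.0f" % (
            pes,
            gflops_per_pe(mflops, pes),
            speedup(t1, sec),
            efficiency(t1, sec, pes),
            ops_per_point(sec, mflops, GRID_POINTS),
        ))


if __name__ == "__main__":
    main()
```

Under these assumptions the last column of the table is reproduced by gflops_per_pe, and the operation count per grid point per step comes out nearly the same for every row. This is what one would expect if the MFLOPS values are scaled from a single reference count rather than measured independently on each machine.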

