Tatsuki Ogino
Solar-Terrestrial Environment Laboratory, Nagoya University
1. Introduction
In computer simulations of space plasma phenomena, it is very important
to maintain high numerical accuracy and fine spatial and temporal resolution.
To do so, we must extract the maximum performance from the computers that
execute the simulation codes, as well as use numerical algorithms of high
accuracy.
For example, in the global simulation of the earth's magnetosphere, we want
to keep the outer boundaries far from the earth to avoid troublesome boundary
effects, and we also want to resolve what is happening in the narrow regions
of the bow shock, magnetopause and plasma sheet. Therefore, we need to use as
many grid points as possible, which automatically increases the required
computer memory and computation time. If the grid interval is halved in a
3-dimensional simulation box of the same total length, the simulation code
usually needs 8 times the computer memory and 16 times the computation time,
because the number of grid points increases by a factor of 2^3 = 8 and the
stable time step must also be roughly halved. This is the essential reason
why we need a supercomputer with higher speed and larger memory.
In general, it is not easy to evaluate computer performance, because the
results depend strongly on the nature of the program itself and on the
conditions at execution time. It is also very difficult to estimate how long
our particular simulation codes will take to run from catalog specifications
of computer performance alone. Thus we often want a rough evaluation, or a
practical example, of the computational performance of different kinds of
computers. For this purpose, we have executed test programs on many kinds of
computers and have used the results as a guide in developing our
magnetohydrodynamic (MHD) simulation codes.
2. Comparison of Computer Processing Capability
We have had good opportunities to use several kinds of computers. In these
trials we executed test programs to evaluate computer performance both for
fundamental arithmetic calculations and for 2- and 3-dimensional MHD
simulation codes. Tables 1 and 2 show the resulting comparisons of computer
processing capability. Table 1 evaluates basic processing capability with the
four fundamental arithmetic operations, addition, subtraction, multiplication
and division; the unit is millions of floating-point operations per second
(MFLOPS), and where no particular compiler option is listed, the option
giving the maximum performance was adopted. The processing-capability values
(MFLOPS) are simple averages over the four arithmetic operations.
Table 2 shows the execution times for a single time-step advance of 2- and
3-dimensional MHD simulation codes. Two 3-dimensional global MHD codes of the
interaction between the solar wind and the earth's magnetosphere were used:
(a) the dawn-dusk asymmetry model with 50x50x26 grid points [Ogino et al.,
1985, 1986a] and (b) the dawn-dusk symmetry model with 62x32x32 grid points
[Ogino, 1986]. A 2-dimensional MHD code of the interaction between the solar
wind and cometary plasma was used for (c), with 402x102 grid points [Ogino et
al., 1986b]; all grid numbers include boundary points. Since the three MHD
codes were originally developed to execute efficiently on the CRAY-1
supercomputer, the program sizes are not large, requiring less than 1 MW (one
million words) of memory. We then applied the MHD codes to the other
computers, modifying the original codes to obtain good performance while
keeping the number of grid points unchanged. One essential difference between
the two 3-dimensional MHD models (a) and (b) is the length of the "do loops"
in the programs. In model (a), each long "do loop" was separated into several
small "do loops" so that the CRAY-1 compiler could vectorize all of the "do
loops", because the length of a "do loop" that can be vectorized is limited
on CRAY computers. On the other hand, using the minimum number of long "do
loops" usually gives better processing performance, and model (b) corresponds
to that case.
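The difference between the two coding styles can be sketched as follows.
This is only a minimal illustration with made-up variable and routine names,
not an excerpt from the actual MHD codes: model (b) advances several
quantities in one long loop body, while model (a) does the same work in
separate short loops so that an older vectorizing compiler can handle every
loop.

      ! model (b) style: one long loop body
      subroutine advance_b(rho, vx, p, frho, fvx, fp, dt, n)
        implicit none
        integer, intent(in) :: n
        real(8), intent(in) :: frho(n), fvx(n), fp(n), dt
        real(8), intent(inout) :: rho(n), vx(n), p(n)
        integer :: i
        do i = 1, n
           rho(i) = rho(i) + dt*frho(i)
           vx(i)  = vx(i)  + dt*fvx(i)
           p(i)   = p(i)   + dt*fp(i)
        end do
      end subroutine advance_b

      ! model (a) style: the same work split into several short loops
      subroutine advance_a(rho, vx, p, frho, fvx, fp, dt, n)
        implicit none
        integer, intent(in) :: n
        real(8), intent(in) :: frho(n), fvx(n), fp(n), dt
        real(8), intent(inout) :: rho(n), vx(n), p(n)
        integer :: i
        do i = 1, n
           rho(i) = rho(i) + dt*frho(i)
        end do
        do i = 1, n
           vx(i) = vx(i) + dt*fvx(i)
        end do
        do i = 1, n
           p(i) = p(i) + dt*fp(i)
        end do
      end subroutine advance_a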
Table 1 gives the average values for the four fundamental arithmetic
operations and provides a rough evaluation of computer processing capability;
arrays of length 10,000 were used in the calculations. For the
supercomputers, averages are given for both the vector and the scalar
compiler options, and all vector do loops were confirmed to be fully
vectorized. The ratio of the vector to the scalar result can be taken as the
maximum acceleration obtainable from practical vectorization on the
supercomputers. This vectorization (or acceleration) ratio ranges from about
10 to 100, and it can serve as a guide when developing and executing
simulation codes.
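The kind of kernel behind these numbers can be sketched as follows. This is
not the actual test program; the array names, repeat count and the use of the
cpu_time intrinsic are our own choices for illustration. Each of the four
operations is repeated over arrays of length 10,000 and the MFLOPS value is
obtained from the elapsed CPU time.

      program arith_bench
        implicit none
        integer, parameter :: n = 10000, nrep = 20000
        real(8) :: a(n), b(n), c(n), t0, t1, mflops
        integer :: i, k
        a = 1.000001d0
        b = 0.999999d0
        call cpu_time(t0)
        do k = 1, nrep
           do i = 1, n
              c(i) = a(i)*b(i)         ! multiplication kernel; use +, - or /
           end do                      ! here to test the other operations
           a(1) = a(1) + c(n)*1.0d-12  ! keep the result live so the loop
        end do                         ! is not optimized away
        call cpu_time(t1)
        mflops = dble(n)*dble(nrep)/((t1 - t0)*1.0d6)
        print *, 'multiplication:', mflops, ' MFLOPS'
      end program arith_bench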
Table 1 shows only the averages over the four arithmetic operations; the
individual values for addition, subtraction, multiplication and division are
not equal. In most cases the processing speeds for addition, subtraction and
multiplication are almost the same, whereas the speed for division is
considerably lower, roughly a quarter of the others. This also holds for the
vector compilers of the supercomputers, so it should be noted that division
is the least efficient of the four operations. Therefore, we should decrease
the number of divisions in each "do loop" of a simulation code if we want
higher efficiency.
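A typical rewriting of this kind is sketched below: a division by a
loop-invariant quantity (for example, the grid spacing in a finite-difference
derivative) is hoisted out of the loop as a reciprocal and replaced by a
multiplication. The routine and variable names are illustrative only.

      subroutine gradient(f, df, dx, n)
        implicit none
        integer, intent(in) :: n
        real(8), intent(in) :: f(n), dx
        real(8), intent(out) :: df(n)
        real(8) :: dxi
        integer :: i
        ! slower form: one division per grid point
        !    df(i) = (f(i+1) - f(i-1))/(2.0d0*dx)
        ! faster form: divide once outside the loop, multiply inside
        dxi = 1.0d0/(2.0d0*dx)
        do i = 2, n-1
           df(i) = (f(i+1) - f(i-1))*dxi
        end do
        df(1) = 0.0d0
        df(n) = 0.0d0
      end subroutine gradient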
It is striking that the new generation of supercomputers, such as the NEC
SX-3 and Fujitsu VP-2600, shows very high performance, exceeding 1 GFLOPS.
Although the processing capability of workstations has increased considerably
in recent years, their practical computation speed is still about 1 to 10
MFLOPS, less than a hundredth of the fastest supercomputer speed. Therefore,
we must rely on supercomputers when we carry out large simulations. The table
also shows the performance of a massively parallel processor, the Matsusita
ADENART, which is almost equivalent to that of a vector-type supercomputer
such as the Fujitsu VP-200.
Strictly speaking, the processing capability measured with the four
fundamental arithmetic operations does not reflect that of complete
simulation codes, because a complete program is composed of many kinds of
calculations and processing; the processing capability therefore depends
strongly on the character of each program. Table 2 shows an example of
computer processing capability for the three global MHD simulation codes.
The computation times corresponding to a single time-step advance of the MHD
codes are given in seconds. The new supercomputers such as the NEC SX-3 and
Fujitsu VP-2600 again give excellent results for the global MHD simulation
codes. In our tests with the MHD codes, three kinds of supercomputers, the
Fujitsu VP-200, NEC SX-2 and CRAY-YMP-864, together with the massively
parallel Matsusita ADENART, give almost comparable performance. It should be
noted that the CRAY-2 did not give good values and that the CRAY-XMP and
CRAY-YMP did not perform well for model (b). In those cases full
vectorization by the compiler was not achieved, either because we did not
fully understand how to vectorize the code or because some "do loops" were
too long to be vectorized on the CRAY computers.
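One simple way to measure such a per-step time is sketched below. This is
only an outline, not the procedure actually used for Table 2; mhd_step is a
placeholder for one time-step advance of the MHD variables, and averaging
over many steps is assumed in order to smooth out timer resolution.

      program step_timing
        implicit none
        integer, parameter :: nstep = 100
        real(8) :: t0, t1
        integer :: istep
        call cpu_time(t0)
        do istep = 1, nstep
           call mhd_step()             ! placeholder for one time-step advance
        end do
        call cpu_time(t1)
        print *, 'time per step:', (t1 - t0)/dble(nstep), ' s'
      contains
        subroutine mhd_step()
          ! dummy routine; the real code advances all MHD variables here
        end subroutine mhd_step
      end program step_timing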
Moreover, the recent supercomputers now provide about 10 to 20 times the
computational performance of the first supercomputer, the CRAY-1. At the same
time, a large amount of computer memory, from 300 MB to 1 GB, can be used in
a simulation, which may permit us to handle numbers of grid points much
greater than 100x100x100 in the 3-dimensional and 1000x1000 in the
2-dimensional MHD simulation codes. As a result, we can confidently expect to
obtain many physically meaningful results from computer simulations during
the STEP interval.
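A rough memory estimate, under assumptions not stated above (8 MHD variables,
8-byte reals and two time levels kept in memory), is sketched below; for a
200x200x200 grid it gives about 1,000 MB, that is, the 1 GB class of memory
mentioned above.

      program mhd_memory
        implicit none
        integer, parameter :: nx = 200, ny = 200, nz = 200
        integer, parameter :: nvar = 8    ! rho, p, 3 velocity, 3 field (assumed)
        integer, parameter :: nlevel = 2  ! two time levels (assumed work copy)
        real(8) :: mbytes
        mbytes = dble(nx)*dble(ny)*dble(nz)*nvar*nlevel*8.0d0/1.0d6
        print *, 'estimated array memory:', mbytes, ' MB'
      end program mhd_memory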
3. Summary
We have presented comparisons of computer processing capability for the
fundamental arithmetic operations and for three kinds of complete MHD
simulation codes. Almost all of the computations behind these results were
executed by ourselves. The tables are of course particular examples and do
not represent the general performance of the computers. Nevertheless, they
may provide useful guidance when we develop new simulation codes and execute
them on particular computers.
The 3-dimensional global MHD simulation of the interaction between the solar
wind and the earth's magnetosphere particularly requires high computation
speed and large computer memory. Since such high computer performance is
rapidly being achieved, we will be able to study the dynamics of the earth's
magnetosphere in more detail and to compare the results with theories and
observations during the STEP interval.
I would like to express my gratitude to the many computer centers where I
had the opportunity to execute the test programs, and to their staffs.
References
Ogino, T., A three-dimensional MHD simulation of the interaction of the
solar wind with the earth's magnetosphere: The generation of field-aligned
currents, J. Geophys. Res., 91, 6791, 1986.
Ogino, T., R. J. Walker, M. Ashour-Abdalla, and J. M. Dawson, An MHD
simulation of By-dependent magnetospheric convection and field-aligned
currents during northward IMF, J. Geophys. Res., 90, 10,835, 1985.
Ogino, T., R. J. Walker, M. Ashour-Abdalla, and J. M. Dawson, An MHD
simulation of the effects of the interplanetary magnetic field By component
on the interaction of the solar wind with the earth's magnetosphere during
southward interplanetary magnetic field, J. Geophys. Res., 91, 10,029, 1986a.
Ogino, T., R. J. Walker, and M. Ashour-Abdalla, An MHD simulation of the
interaction of the solar wind with the outflowing plasma from a comet,
Geophys. Res. Lett., 13, 929, 1986b.
Table 1. A comparison of computer processing capability. A test program
executing the four fundamental arithmetic operations, addition, subtraction,
multiplication and division, was used to evaluate the processing capability;
the unit is millions of floating-point operations per second (MFLOPS), and
where no compiler option is given, the option yielding the maximum
performance was adopted. In the table, IAP means that the integrated array
processor was used and NIAP (or NOIAP) means that it was not.
Table 2. A comparison of computer processing capability for the 2-dimensional
and 3-dimensional global magnetohydrodynamic (MHD) simulations, where the
numerical values are the computation times (in seconds) corresponding to one
time-step advance of the MHD simulation codes. In the tests, where no
particular compiler option is given, the option yielding the maximum
performance was adopted. The grid numbers used in the MHD simulation codes
are (a) 50x50x26 and (b) 62x32x32 for the 3-dimensional simulations and
(c) 402x102 for the 2-dimensional simulation, including boundary grid points.
Honohara 3-13, Toyokawa, Aichi 442, Japan
---------------------------------------------------------------------
computer compiler option processing capability (MFLOPS)
---------------------------------------------------------------------
NEC ACOS-650 Fortran 0.41
NEC ACOS-850 NIAP 1.09
NEC ACOS-850 IAP 2.47
NEC ACOS-930 NIAP, OPT=1 1.04
NEC ACOS-930 IAP, opt=3 6.87
NEC S-2000 NIAP 7.37
NEC S-2000 IAP 13.58
NEC SX-2A vector 196.3
NEC SX-2 scalar 7.74
NEC SX-2 vector 247.4
NEC SX-3/14 scalar 10.78
NEC SX-3/14 vector 583.1
NEC SX-3 vector 1,406.9
Fujitsu M-200 1.54
Fujitsu M-380 3.64
Fujitsu M-780/20 8.31
Fujitsu M-780/30 FORT77 O2 14.16
Fujitsu M-780/30 FORT77EX O3 18.07
Fujitsu M-1800 FORT77 18.46
Fujitsu VP-100 scalar 4.17
Fujitsu VP-100 vector 94.91
Fujitsu VP-200 scalar 3.20
Fujitsu VP-200 vector 225.0
Fujitsu VP-400 vector 262.8
Fujitsu VP-2600 FORT77EX O3 1,238.4
Fujitsu VPP-500 (1PE) frtpx, -sc 19.78
Fujitsu VPP-500 (1PE) frtpx 730.3
Fujitsu VPP-5000 (1PE) frt, -sc 189.78 (1999.12.27)
Fujitsu VPP-5000 (1PE) frt 3,073.7 (1999.12.27)
Hitachi M-680 8.76
Hitachi M-680D NOIAP, OPT=0 1.23
Hitachi M-680D NOIAP, OPT=3 4.37
Hitachi M-680D IAP, OPT=3 49.12
Hitachi M-680H NOIAP 6.65
Hitachi M-680H IAP 44.54
Hitachi S810/10 scalar 3.96
Hitachi S810/10 vector 51.94
Hitachi S820 vector 358.9
Hitachi S820/80 vector 497.4
Hitachi S3800/480 vector 820.7
VAX 8600 0.507
IBM-3090 Level(0) 4.03
IBM-3090 Level(1) 8.52
IBM-3090 Level(2) 8.28
CRAY-XMP-48 CFT114i off=v 3.39
CRAY-XMP-48 CFT114i 36.00
CRAY-2 CIVIC 29.51
CRAY-YMP-864 -o off 1.45
CRAY-YMP-864 -o novector 11.09
CRAY-YMP-864 -o full 116.36
SCS40 SCSFT o=h 0.695
SCS40 SCSFT vector 8.30
Asahi Stellar GS-1000 O1 (scalar) 0.538
(version 1.6) O2 (vector) 4.67
O3 (parallel) 10.77
NEC EWS-4800/20 0.188
NEC EWS-4800/50 0.112
NEC EWS-4800/210 f77 -O 1.665
NEC EWS-4800/220 f77 -O 1.812
NEC EWS-4800/260 f77 -O 2.009
NEC EWS-4800/350 f77 -O 4.929
NEC EWS-4800/360 f77 -O 5.016
MicroVAX-3400 0.419
Sun SPARC Station 1 f77 -O 0.961
Sun SPARC Station 2 f77 -O 2.188
Sun SPARC IPX f77 -O 1.646
Sun SPARC 2 (AS4075) f77 -O 1.400
Sun SPARC Station 10 f77 -O 2.999
Sun SPARC S-4/5 f77 -O 4.217
Sun SPARC S-4/CL4 f77 -O 4.419
Sun SPARC S-4/20H f77 -O 18.67
Sun SPARC S-4/20H(stcpu1) f77 -O 18.78 (1998.04.17)
Sun SPARC S-4/20H(stcpu1) f90 -O 30.75 (1998.04.17)
Sun S-7/300U f77 -O 20.18
Sun Ultra 2 (162MHz) f77 -O 24.04 (1998.04.09)
Sun Ultra 2 (162MHz) f90 -O 13.56 (1998.04.09)
Sun Ultra 2 (162MHz) frt -O (Fujitsu f90) 23.46 (1998.04.18)
Sun S-7/7000U (296MHz) f77 -O 38.19 (1998.04.07)
Sun S-7/7000U (296MHz) f90 -O 23.12 (1998.04.07)
Sun S-7/7000U 350 f77 -O 42.01 (1999.08.02)
Sun S-4/420U f77 -O 44.59 (1999.08.02)
Sun PanaStation f77 -O 18.06
DELL OptiPlex GXi f77 -O 11.36 (1997.11.12)
DEC Alpha (500MHz) f77 63.0 (1998.04.17)
DEC Alpha (500MHz) f90 64.1 (1998.04.17)
SGI Indy f77 -O 3.96
SGI Indigo2 f77 -O 9.46
SGI Origin2000(1CPU) Fortran77 27.00
SGI Octane f77 -O 17.30 (1999.08.02)
SGI O2 f77 -O 4.87 (1999.08.02)
DEC alpha 3000AXP/500 f77 -O3 13.52
Solbourne f77 -O3 1.824
TITAN O1 (scalar) 0.904
TITAN O2 (vector) 3.756
TITAN III O1 (scalar) 1.176
TITAN III O2 (vector) 6.228
TITAN III O3 (parallel) 6.543
IBM 6091-19 f77 -O 8.125
Matsusita ADENART ADETRAN (parallel) 218.0
Convex C3810 Fortran -O1 (scalar) 4.089
Convex C3810 Fortran -O2 (vector) 94.37
nCUBE2 HPF 0.758
nCUBE2 HPF SSS32 1.788
DECmpp 12000 MP-2 1K pe HPF Ver.3.1 70.95
DECmpp 12000 MP-2 2K pe HPF Ver.3.1 128.69
DECmpp 12000 MP-2 4K pe HPF Ver.3.1 189.34
---------------------------------------------------------------------
---------------------------------------------------------------------
computer compiler (a)3D-MHD (b)3D-MHD (c)2D-MHD
50x50x26 62x32x32 402x102
sec (MFLOPS) sec (MFLOPS) sec (MFLOPS)
---------------------------------------------------------------------
NEC ACOS-650 Fortran 77 187.1 ( 0.6) 159.5 ( 0.7) 29.3 ( 1.0)
NEC ACOS-930 NIAP, OPT=1 14.07 ( 9) 13.76 ( 8) 4.31 ( 6.5)
NEC ACOS-930 IAP,OPT=3 9.97 ( 12) 11.34 ( 10) 2.44 ( 11.4)
NEC SX-2 opt=scalar 3.66 ( 33) 5.02 ( 23) 0.90 ( 31.0)
NEC SX-2 Fortran 77 0.34 ( 356) 0.28 ( 412) 0.042( 664)
NEC SX-3/14 opt=scalar 2.11 ( 57) 1.81 ( 64) 0.48 ( 58.1)
NEC SX-3/14 Fortran 77 0.097(1,248) 0.116( 994) 0.0149(1,871)
NEC SX-3 Fortran 77 0.014(1,991)
Fujitsu M-200 Fortran 77 34.4 ( 3.5) 34.2 ( 3.4) 7.84 ( 3.6)
Fujitsu M-380 Fortran 77 11.60 ( 10) 9.37 ( 12) 3.31 ( 8.4)
Fujitsu M-780/20 Fortran 77 4.87 ( 25) 3.94 ( 30) 1.14 ( 24.5)
Fujitsu M-780/30 FORT77 O2 3.95 ( 31) 5.06 ( 23) 0.84 ( 33.2)
Fujitsu M-780/30 FORT77EX O3 2.63 ( 46) 2.21 ( 53) 0.66 ( 42.2)
Fujitsu VP-100 opt=scalar 11.44 ( 11) 9.61 ( 12) 2.39 ( 11.7)
Fujitsu VP-100 Fortran 77 0.80 ( 151) 0.75 ( 154) 0.13 ( 214)
Fujitsu VP-200 opt=scalar 12.20 ( 10) 10.23 ( 11) 2.56 ( 10.9)
Fujitsu VP-200 Fortran 77 0.50 ( 242) 0.41 ( 281) 0.080( 348)
Fujitsu VP-400 Fortran 77 0.49 ( 247) 0.39 ( 296) 0.042( 664)
Fujitsu VP-2600 FORTCLG 0.099(1,223) 0.082(1,405) 0.014(1,991)
Fujitsu VPP-500( 1PE) frtpx, -sc 2.65 ( 46) 3.26 ( 36) 0.704( 39.6)
Fujitsu VPP-500( 1PE) frtpx 0.150( 807) 0.132( 881) 0.029( 961)
Fujitsu VPP-500( 1PE) frtpx, -sc 3.1043( 39) 5.359 ( 22)
Fujitsu VPP-500( 1PE) frtpx 0.1396( 867) 0.1194( 974)
Fujitsu VPP-500( 2PE) frtpx, -Wx 0.0749(1,616) 0.0632(1,840)
Fujitsu VPP-500( 4PE) frtpx, -Wx 0.0440(2,751) 0.0372(3,126)
Fujitsu VPP-500( 8PE) frtpx, -Wx 0.0277(4,370) 0.0244(4,766)
Fujitsu VPP-500(16PE) frtpx, -Wx 0.0189(6,405) 0.0155(7,503)
Fujitsu VPP-5000( 1PE) frt, -sc 0.717 ( 169) 0.694 ( 168) 0.1238 ( 225)
Fujitsu VPP-5000( 1PE) frt 0.0301( 4026) 0.0264( 4416) 0.00441( 6316)
Fujitsu VPP-5000( 1PE) frt, -sc 0.7209( 168) 0.8529( 136)
Fujitsu VPP-5000( 1PE) frt 0.02330( 5201) 0.02270( 5110)
Fujitsu VPP-5000( 2PE) frt, -Wx 0.01279( 9475) 0.01050(11047)
Fujitsu VPP-5000( 4PE) frt, -Wx 0.00751(16136) 0.00594(19528)
Fujitsu VPP-5000( 8PE) frt, -Wx 0.00451(26870) 0.00356(32583)
Fujitsu VPP-5000(16PE) frt, -Wx 0.00306(39602) 0.00225(51554)
Hitachi M-680 Fortran 77 1.49 ( 18.7)
Hitachi M-680D NOIAP, OPT=3 8.42 ( 14) 9.31 ( 12) 2.11 ( 13.2)
Hitachi M-680D IAP, OPT=3 3.54 ( 34) 2.75 ( 42) 0.57 ( 48.9)
Hitachi M-680D IAP, SOPT 3.25 ( 37) 2.44 ( 48) 0.53 ( 52.6)
Hitachi S810/10 opt=scalar 3.17 ( 8.8)
Hitachi S810/10 Fortran 77 0.167( 167)
Hitachi S820/20 Fortran 77 0.23 ( 526) 0.16 ( 727) 0.020(1,394)
Hitachi S3800/480 Fortran 77 0.125( 968) 0.103(1,129) 0.0093(2,998)
IBM-3033 VS 33.3 ( 3.6) 27.8 ( 4.2) 7.90 ( 3.5)
IBM-3090 VS 9.17 ( 13) 8.76 ( 13) 2.27 ( 12.3)
IBM-3090 Fortvclg L0 9.11 ( 13) 8.78 ( 13) 2.28 ( 12.2)
IBM-3090 Fortvclg L1 5.19 ( 23) 4.01 ( 29) 1.11 ( 25.1)
VAX-11/750 Fortran 449.5 ( 0.3)432.9 ( 0.3) 96.64 ( 0.3)
CRAY-1 CFT 1.88 ( 64) 1.76 ( 65) 0.372( 74.9)
CRAY-XMP CFT 1.13 1.67 ( 73) 3.85 ( 39) 0.282( 98.9)
CRAY-2 CIVIC 131 10.3 ( 12) 7.29 ( 16)
CRAY-XMP-48 off=v 5.68 ( 21) 6.15 ( 19) 1.436( 19.4)
CRAY-XMP-48 CFT114i 1.29 ( 94) 1.13 ( 101) 0.252( 111)
CRAY-YMP-864 -o off 9.36 ( 13) 9.26 ( 13) 2.74 ( 10.2)
CRAY-YMP-864 -o novector 3.62 ( 33) 3.81 ( 31) 0.999( 27.9)
CRAY-YMP-864 -o full 0.430( 282) 1.921( 61) 0.0982( 284)
SCS40 SCSFT o=h 18.86 ( 6.4) 20.25 ( 5.7) 5.71 ( 4.9)
SCS40 SCSFT 3.94 ( 31) 3.81 ( 31) 0.964( 28.9)
TITAN O1 (scalar) 49.44 ( 2.5) 56.70 ( 2.1) 12.59 ( 2.2)
TITAN O2 (vector) 22.02 ( 5.5) 23.97 ( 4.8) 5.47 ( 5.1)
TITAN III O1 (scalar) 15.66 ( 7.7) 18.57 ( 6.3) 3.68 ( 7.6)
TITAN III O2 (vector) 7.96 ( 15) 7.54 ( 15) 1.62 ( 17.2)
TITAN III O3 (parallel) 7.73 ( 16) 7.31 ( 16) 1.59 ( 17.5)
Sun SPARC Station 1 f77 -O 47.50 ( 2.5) 47.25 ( 2.4) 13.00 ( 2.1)
Sun SPARC Station 2 f77 -O 20.50 ( 5.9) 19.88 ( 5.8) 4.81 ( 5.8)
Sun SPARC IPX f77 -O 16.30 ( 7.2) 17.60 ( 6.6) 5.23 ( 5.3)
Sun SPARC 2(AS4075) f77 -O 15.18 ( 8.0) 16.63 ( 7.0) 4.92 ( 5.7)
Sun SPARC Station 10 f77 -O 8.22 ( 14.7) 11.46 ( 10.2) 2.04 ( 13.7)
Sun SPARC S-4/5 f77 -O 5.57 ( 21.7) 6.81 ( 17.1) 1.79 ( 15.6)
Sun SPARC S-4/CL4 f77 -O 5.68 ( 21.3) 6.79 ( 17.1) 1.79 ( 15.6)
Sun SPARC S-4/20H f77 -O 1.90 ( 63.7) 1.92 ( 60.6) 0.344( 81.0)
Sun SPARC S-4/20H(stcpu1) f77 -O 1.90 ( 63.7) 1.92 ( 60.6) 0.344( 81.0)
Sun SPARC S-4/20H(stcpu1) f90 -O 1.97 ( 61.4) 1.79 ( 65.0) 0.450( 61.9)
Sun S-7/300U f77 -O 1.76 ( 68.8) 1.97 ( 59.1) 0.426( 65.4)
Sun Ultra 2 (162MHz) f77 -O 1.49 ( 81) 3.39 ( 34) 0.344( 81)
Sun Ultra 2 (162MHz) f90 -O 3.91 ( 31) 5.14 ( 23) 0.680( 41)
Sun Ultra 2 (162MHz) frt -O (f90) 1.53 ( 79) 2.74 ( 42) 0.380( 74)
Sun S-7/7000U (296MHz) f77 -O 0.836( 145) 1.09 ( 107) 0.195( 143)
Sun S-7/7000U (296MHz) f90 -O 3.04 ( 40) 2.72 ( 43) 0.375( 74)
Sun S-7/7000U 350 f77 -O 1.055( 115) 1.039( 113) 0.172( 161)
Sun S-4/420U f77 -O 0.969( 125) 0.953( 123) 0.156( 178)
Sun PanaStation f77 -O 2.08 ( 58.2) 1.87 ( 62.2) 0.348( 80.1)
DELL OptiPlex GXi f77 -O 2.68 ( 45.2) 3.14 ( 37.0) 0.641( 43.5)
DEC Alpha (500MHz) f77 0.359( 337) 0.383( 304) 0.0781( 346)
DEC Alpha (500MHz) f90 0.359( 337) 0.383( 304) 0.0781( 346)
SGI Indy f77 -O 8.20 ( 14.8) 10.35 ( 11.2) 2.21 ( 12.6)
SGI Indigo2 f77 -O 2.68 ( 45.2) 2.85 ( 40.8) 0.775( 36.0)
SGI Octane f77 -O 0.475( 255) 0.550( 211) 0.133( 210)
SGI O2 f77 -O 1.875( 64.6) 2.125( 54.7) 0.325( 85.8)
SGI Origin2000(1CPU) Fortran77 0.531( 228) 0.797( 146) 0.129( 216)
SGI Origin2000(2CPU) Fortran77 0.324( 374) 0.464( 251)
SGI Origin2000(4CPU) Fortran77 0.202( 599) 0.275( 423)
SGI Origin2000(8CPU) Fortran77 0.155( 781) 0.191( 609)
DEC alpha 3000AXP/500 f77 -O3 2.59 ( 46.8) 4.41 ( 26.4) 0.56 ( 49.8)
Solbourne f77 -O3 23.68 ( 5.1) 25.70 ( 4.5) 6.06 ( 4.6)
NEC EWS-4800/210 f77 -O3 16.65 ( 7.3) 21.15 ( 5.5) 4.17 ( 6.7)
NEC EWS-4800/220 f77 -O3 18.50 ( 6.5) 17.00 ( 6.7) 3.50 ( 8.0)
NEC EWS-4800/260 f77 -O 12.53 ( 9.7) 14.87 ( 7.8) 3.04 ( 9.2)
NEC EWS-4800/350 f77 -O 6.72 ( 18.0) 7.54 ( 15.4) 1.46 ( 19.1)
NEC EWS-4800/360 f77 -O 4.69 ( 25.8) 5.40 ( 21.5) 1.06 ( 26.3)
IBM 6091-19 f77 -O 4.437( 27.3) 4.500( 25.9) 1.125( 24.8)
Matsusita ADENART ADETRAN(parallel) 0.431( 281) 0.307( 375) 0.110( 253)
Convex C3810 f77 -O1 (scalar) 5.704( 21) 6.286( 19) 1.378( 20.2)
Convex C3810 f77 -O2 (vector) 0.948( 128) 0.895( 130) 0.213( 131)
nCUBE2E 16 pe f90 -O (parallel) 2.78 ( 43.5) 0.544( 51.2)
nCUBE2E 32 pe f90 -O (parallel) 0.293( 95.1)
nCUBE2S 128 pe f90 -O (parallel) 0.083( 336)
nCUBE2 256 pe f90 -O (parallel) 0.072( 378)
---------------------------------------------------------------------
CRAY Y-MP4E 1 processor 0.460( 263) 0.431( 263) 0.106( 263)
2 processors 0.246( 492) 0.233( 495) 0.062( 450)
4 processors 0.136( 890) 0.129( 893) 0.040( 697)
CRAY Y-MP C90 1 processor 0.289( 419) 0.265( 439)
2 processors 0.159( 762) 0.144( 800)
4 processors 0.087(1,392) 0.079(1,459)
----------------------------------------------------------------------
----------------------------------------------------------------------
computer compiler (a)3D-MHD (b)3D-MHD (c)2D-MHD
sec (MFLOPS) sec (MFLOPS) sec (MFLOPS)
Grid points 192x192x96 240x120x120 1600x400
Convex C3810 (1cpu) 240 MFLOPS 52.8 ( 143) 77.7 ( 95) 3.4 ( 131)
Convex C3820 (2cpu) 480 MFLOPS 29.5 ( 256) 41.6 ( 178) 1.9 ( 235)
Convex C3840 (4cpu) 960 MFLOPS 18.9 ( 400) 16.1 ( 283) 1.2 ( 372)
Grid points 240x120x120 240x120x120 1600x400
SGI Origin2000(1CPU) Fortran77 37.59 ( 201) 40.58 ( 183) 2.24 ( 199)
SGI Origin2000(2CPU) Fortran77 19.86 ( 381) 21.21 ( 351) 1.76 ( 253)
SGI Origin2000(4CPU) Fortran77 10.45 ( 724) 11.08 ( 672) 1.40 ( 319)
SGI Origin2000(8CPU) Fortran77 5.94 (1,274) 6.21 (1,199)
Grid points 320x 80x160 320x 80x160
SGI Origin2000(1CPU) Fortran77 77.49 ( 116) 82.97 ( 100)
SGI Origin2000(2CPU) Fortran77 40.39 ( 222) 43.01 ( 194)
SGI Origin2000(4CPU) Fortran77 21.12 ( 424) 22.85 ( 364)
SGI Origin2000(8CPU) Fortran77 11.56 ( 774) 12.25 ( 680)
-----------------------------------------------------------------------
Computer Processing Capability
2000.6.25 by Tatsuki OGINO
-----------------------------------------------------------------------
computer grid number sec (MFLOPS) GF/PE
-----------------------------------------------------------------------
Matsusita ADENART (256CPU) 180x 60x 60 3.46 ( 400)
Matsusita ADENART (256CPU) 150x100x 50 5.81 ( 276)
CRAY Y-MP C90 (8CPU) 400x200x200 7.00 ( 4,883) 0.61
SGI Origin2000 (1CPU, earthb) 240x120x120 40.58 ( 183) 0.18
SGI Origin2000 (2CPU) 240x120x120 21.21 ( 351) 0.18
SGI Origin2000 (4CPU) 240x120x120 11.08 ( 672) 0.17
SGI Origin2000 (8CPU) 240x120x120 6.21 ( 1,199) 0.15
Fujitsu VP-200 240x 80x 80 10.38 ( 316) 0.32
Fujitsu VP-2600 240x 80x 80 1.50 ( 2,188) 2.19
Fujitsu VP-2600 320x 80x 80 1.76 ( 2,486) 2.49
Fujitsu VP-2600 300x100x100 2.57 ( 2,494) 2.49
Fujitsu VP-2600 320x 80x160 3.63 ( 2,417) 2.42
Fujitsu VPP-500 (1PE, earthb) 320x 80x 80 3.556 ( 1,230) 1.23
Fujitsu VPP-500 (2PE) 320x 80x 80 1.846 ( 2,370) 1.19
Fujitsu VPP-500 (4PE) 320x 80x 80 1.012 ( 4,323) 1.08
Fujitsu VPP-500 (8PE) 320x 80x 80 0.591 ( 7,403) 0.93
Fujitsu VPP-500 (16PE) 320x 80x 80 0.368 (11,889) 0.74
Fujitsu VPP-500 (16PE) 400x100x100 0.666 (12,831) 0.80
Fujitsu VPP-500 (16PE) 640x160x160 2.308 (15,165) 0.95
Fujitsu VPP-500 (16PE) 800x200x200 4.119 (16,597) 1.04
Fujitsu VPP-500 (1PE, eartha2) 320x 80x160 7.088 ( 1,234) 1.23
Fujitsu VPP-500 (2PE) 320x 80x160 3.620 ( 2,417) 1.21
Fujitsu VPP-500 (4PE) 320x 80x160 1.899 ( 4,608) 1.15
Fujitsu VPP-500 (8PE) 320x 80x160 1.035 ( 8,454) 1.06
Fujitsu VPP-500 (16PE) 320x 80x160 0.592 (14,781) 0.92
Fujitsu VPP-500 (16PE) 400x100x200 1.088 (15,708) 0.98
Fujitsu VPP-500 (16PE) 640x160x320 4.064 (17,225) 1.08
Fujitsu VPP-500 (16PE) 800x200x400 7.632 (17,914) 1.12
Fujitsu VPP-500 (16PE, Venus) 400x100x100 0.667 (12,811) 0.80
Fujitsu VPP-500 (16PE, Jupiter) 300x200x100 0.975 (13,146) 0.82
Fujitsu VPP-5000 (1PE, earthb) 400x100x100 1.154 ( 7,405) 7.40
Fujitsu VPP-5000 (2PE, earthb) 400x100x100 0.5762( 14,831) 7.41
Fujitsu VPP-5000 (4PE, earthb) 400x100x100 0.3039( 28,119) 7.03
Fujitsu VPP-5000 (8PE, earthb) 400x100x100 0.1613( 52,979) 6.62
Fujitsu VPP-5000 (16PE, earthb) 400x100x100 0.09355( 91,346) 5.71
Fujitsu VPP-5000 (16PE, earthb) 800x200x200 0.62417(109,526) 6.85
Fujitsu VPP-5000 (16PE, eartha2) 500x100x200 0.19975(106,948) 6.68
Fujitsu VPP-5000 (16PE, eartha2) 800x200x400 1.20162(113,779) 7.11
Fujitsu VPP-5000 ( 2PE, eartha2) 800x200x478 10.65936( 15,327) 7.66
Fujitsu VPP-5000 ( 4PE, eartha2) 800x200x478 5.35061( 30,534) 7.63
Fujitsu VPP-5000 ( 8PE, eartha2) 800x200x478 2.73815( 59,666) 7.46
Fujitsu VPP-5000 (12PE, eartha2) 800x200x478 1.86540( 87,581) 7.30
Fujitsu VPP-5000 (16PE, eartha2) 800x200x478 1.41918(115,119) 7.19
Fujitsu VPP-5000 (32PE, eartha2) 800x200x478 0.72187(226,328) 7.07
Fujitsu VPP-5000 (48PE, eartha2) 800x200x478 0.53445(305,698) 6.36
Fujitsu VPP-5000 (56PE, eartha2) 800x200x478 0.49367(330,950) 5.91
Fujitsu VPP-5000 (32PE, eartha2) 1000x200x478 0.91633(222,872) 6.96
Fujitsu VPP-5000 (32PE, eartha2) 800x400x478 1.44683(225,845) 7.06
Fujitsu VPP-5000 ( 2PE, eartha2) 800x200x670 -.-----(---,---)
Fujitsu VPP-5000 ( 4PE, eartha2) 800x200x670 7.61763( 30,063) 7.52
Fujitsu VPP-5000 ( 8PE, eartha2) 800x200x670 3.79406( 60,359) 7.54
Fujitsu VPP-5000 (12PE, eartha2) 800x200x670 2.80623( 81,606) 6.80
Fujitsu VPP-5000 (16PE, eartha2) 800x200x670 1.92435(119,004) 7.44
Fujitsu VPP-5000 (24PE, eartha2) 800x200x670 1.30786(175,099) 7.30
Fujitsu VPP-5000 (32PE, eartha2) 800x200x670 0.97929(233,848) 7.31
Fujitsu VPP-5000 (48PE, eartha2) 800x200x670 0.68234(335,618) 6.99
Fujitsu VPP-5000 (56PE, eartha2) 800x200x670 0.59542(384,611) 6.87
Fujitsu VPP-5000 (16PE, eartha2) 1000x500x1118 9.66792(123,518) 7.72 (2000.07.21)
Fujitsu VPP-5000 (32PE, eartha2) 1000x500x1118 5.04442(236,729) 7.40 (2000.07.21)
Fujitsu VPP-5000 (48PE, eartha2) 1000x500x1118 3.54985(336,397) 7.01 (2000.08.07)
Fujitsu VPP-5000 (56PE, eartha2) 1000x500x1118 3.00623(397,228) 7.09 (2000.08.07)
Fujitsu VPP-5000 (56PE, eartha2) 1000x500x1118 2.98512(400,038) 7.14
Fujitsu VPP-5000 (32PE, eartha2) 1000x1000x1118 9.97933(239,327) 7.48 (2000.07.19)
Fujitsu VPP-5000 (48PE, eartha2) 1000x1000x1118 7.17658(332,794) 6.93 (2000.08.07)
Fujitsu VPP-5000 (56PE, eartha2) 1000x1000x1118 5.81743(410,546) 7.33
Fujitsu VPP-5000 (56PE, eartha2) 1000x1000x1118 5.97927(399,433) 7.13 (2000.08.07)
Fujitsu VPP-5000 (32PE, eartha2) 2238x558x1118 12.96936(229,926) 7.19 (2000.07.28)
Fujitsu VPP-5000 (48PE, eartha2) 2238x558x1118 9.49812(313,956) 6.54 (2000.08.07)
Fujitsu VPP-5000 (56PE, eartha2) 2238x558x1118 8.04309(370,752) 6.62 (2000.08.07)
Fujitsu VPP-5000(1PE,eartha2,scalar) 200x100x478 119.60663( 171) 0.171
Fujitsu VPP-5000 ( 1PE, eartha2) 200x100x478 2.96691( 6,883) 6.88
Fujitsu VPP-5000 ( 2PE, eartha2) 200x100x478 1.45819( 14,005) 7.00
Fujitsu VPP-5000 ( 4PE, eartha2) 200x100x478 0.72109( 28,320) 7.08
Fujitsu VPP-5000 ( 8PE, eartha2) 200x100x478 0.36541( 55,886) 6.99
Fujitsu VPP-5000 (16PE, eartha2) 200x100x478 0.20548( 99,383) 6.21
Fujitsu VPP-5000 (32PE, eartha2) 200x100x478 0.10678(191,226) 5.98
Fujitsu VPP-5000 (48PE, eartha2) 200x100x478 0.06853(297,959) 6.21
Fujitsu VPP-5000 (56PE, eartha2) 200x100x478 0.06391(319,531) 5.71
/vpp/home/usr6/a41456a/heartha2/prog9032.f (revised boundary)
Fujitsu VPP-5000 ( 1PE, eartha2, frt) 500x100x200 2.69078( 7,939) 7.94
Fujitsu VPP-5000 ( 2PE, eartha2, frt) 500x100x200 1.38118( 15,467) 7.73
Fujitsu VPP-5000 ( 4PE, eartha2, frt) 500x100x200 0.71535( 29,965) 7.47
Fujitsu VPP-5000 ( 8PE, eartha2, frt) 500x100x200 0.39820( 53,648) 6.71
Fujitsu VPP-5000 (16PE, eartha2, frt) 500x100x200 0.20970(101,873) 6.37
Fujitsu VPP-5000 (32PE, eartha2, frt) 500x100x200 0.13062(163,548) 5.11
Fujitsu VPP-5000 (48PE, eartha2, frt) 500x100x200 0.09960(214,479) 4.46
Fujitsu VPP-5000 (56PE, eartha2, frt) 500x100x200 0.08921(239,478) 4.28
-----------------------------------------------------------------------
HPF/JA (High Performance Fortran)
/vpp/home/usr6/a41456a/heartha2/proghpf53.f
Fujitsu VPP-5000 ( 1PE, eartha2, HPF) 500x100x200 2.69089( 7,938) 7.94
Fujitsu VPP-5000 ( 2PE, eartha2, HPF) 500x100x200 1.39017( 15,366) 7.68
Fujitsu VPP-5000 ( 4PE, eartha2, HPF) 500x100x200 0.71228( 29,993) 7.50
Fujitsu VPP-5000 ( 8PE, eartha2, HPF) 500x100x200 0.39285( 54,381) 6.80
Fujitsu VPP-5000 (16PE, eartha2, HPF) 500x100x200 0.20202(105,742) 6.61
Fujitsu VPP-5000 (32PE, eartha2, HPF) 500x100x200 0.12034(175,496) 5.48
Fujitsu VPP-5000 (48PE, eartha2, HPF) 500x100x200 0.09115(231,688) 4.82
Fujitsu VPP-5000 (56PE, eartha2, HPF) 500x100x200 0.08625(244,846) 4.37
HPF/JA (High Performance Fortran)
Fujitsu VPP-5000 ( 1PE, eartha2, HPF) 200x100x478 3.00248( 6,801) 6.80 OK
Fujitsu VPP-5000 ( 2PE, eartha2, HPF) 200x100x478 1.53509( 13,303) 6.65 OK
Fujitsu VPP-5000 ( 4PE, eartha2, HPF) 200x100x478 0.76061( 26,849) 6.71 OK
Fujitsu VPP-5000 ( 8PE, eartha2, HPF) 200x100x478 0.38589( 52,921) 6.62 OK
Fujitsu VPP-5000 (16PE, eartha2, HPF) 200x100x478 0.21867( 93,390) 5.84 OK
Fujitsu VPP-5000 (32PE, eartha2, HPF) 200x100x478 0.10972(186,129) 5.82 OK
Fujitsu VPP-5000 (48PE, eartha2, HPF) 200x100x478 0.07374(276,956) 5.77 OK
Fujitsu VPP-5000 (56PE, eartha2, HPF) 200x100x478 0.06823(299,269) 5.34 OK
Fujitsu VPP-5000 ( 2PE, proghpf63.f) 800x200x478 10.74172( 15,210) 7.60
Fujitsu VPP-5000 ( 4PE, proghpf63.f) 800x200x478 5.35382( 30,516) 7.63
Fujitsu VPP-5000 ( 8PE, proghpf63.f) 800x200x478 2.72973( 59,851) 7.48
Fujitsu VPP-5000 (12PE, proghpf63.f) 800x200x478 1.91098( 85,493) 7.12
Fujitsu VPP-5000 (16PE, proghpf63.f) 800x200x478 1.38854(117,660) 7.35
Fujitsu VPP-5000 (32PE, proghpf63.f) 800x200x478 0.71746(227,715) 7.12
Fujitsu VPP-5000 (48PE, proghpf63.f) 800x200x478 0.51497(317,257) 6.61
Fujitsu VPP-5000 (56PE, proghpf63.f) 800x200x478 0.46350(352,488) 6.29
Fujitsu VPP-5000 ( 2PE, proghpf63.f) 800x200x670 -.-----(---,---)
Fujitsu VPP-5000 ( 4PE, proghpf63.f) 800x200x670 8.00096( 28,622) 7.16 OK
Fujitsu VPP-5000 ( 8PE, proghpf63.f) 800x200x670 3.96162( 57,806) 7.23 OK
Fujitsu VPP-5000 (12PE, proghpf63.f) 800x200x670 3.00484( 76,212) 6.35 OK
Fujitsu VPP-5000 (16PE, proghpf63.f) 800x200x670 2.01151(113,848) 7.12 OK
Fujitsu VPP-5000 (24PE, proghpf63.f) 800x200x670 1.35955(168,442) 7.02 OK
Fujitsu VPP-5000 (32PE, proghpf63.f) 800x200x670 1.03211(221,880) 6.93 OK
Fujitsu VPP-5000 (48PE, proghpf63.f) 800x200x670 0.72060(317,798) 6.62 OK
Fujitsu VPP-5000 (56PE, proghpf63.f) 800x200x670 0.62764(364,866) 6.52 OK
Fujitsu VPP-5000 (16PE, eartha2) 1000x500x1118 9.84601(122,615) 7.66 (2000.07.21)
Fujitsu VPP-5000 (16PE, eartha2a) 1000x500x1118 9.61939(125,504) 7.84 (2000.07.21)
Fujitsu VPP-5000 (32PE, eartha2) 1000x500x1118 5.19470(232,403) 7.26 (2000.07.21)
Fujitsu VPP-5000 (32PE, eartha2a) 1000x500x1118 4.99224(241,828) 7.56 (2000.07.21)
Fujitsu VPP-5000 (48PE, eartha2a) 1000x500x1118 3.47943(346,972) 7.23 (2000.08.07)
Fujitsu VPP-5000 (56PE, eartha2a) 1000x500x1118 2.93481(411,361) 7.35 (2000.08.07)
Fujitsu VPP-5000 (32PE, eartha2) 1000x1000x1118 10.22563(233,562) 7.30 (2000.07.19)
Fujitsu VPP-5000 (32PE, eartha2a) 1000x1000x1118 9.81345(243,372) 7.61 (2000.07.21)
Fujitsu VPP-5000 (48PE, eartha2a) 1000x1000x1118 7.02753(339,852) 7.08 (2000.08.07)
Fujitsu VPP-5000 (56PE, eartha2a) 1000x1000x1118 5.79368(412,228) 7.36 (2000.08.07)
Fujitsu VPP-5000 (48PE, eartha2) 1678x558x1118 6.52886(342,453) 7.13 (2000.08.07)
Fujitsu VPP-5000 (56PE, eartha2) 1678x558x1118 5.54894(402,929) 7.20 (2000.08.07)
Fujitsu VPP-5000 (32PE, eartha2) 2238x558x1118 13.25331(225,000) 7.03 (2000.07.21)
Fujitsu VPP-5000 (32PE, eartha2a) 2238x558x1118 12.71245(234,573) 7.33 (2000.07.21)
Fujitsu VPP-5000 (48PE, eartha2a) 2238x558x1118 9.22722(323,174) 6.73 (2000.08.07)
Fujitsu VPP-5000 (56PE, eartha2a) 2238x558x1118 7.80778(381,926) 6.82 (2000.08.07)
-----------------------------------------------------------------------
frt: Fujitsu VPP Fortran 90   HPF: High Performance Fortran
Note: MFLOPS is an estimated value obtained by comparison with the
computation by 1 processor of the CRAY Y-MP C90.