A Report on Computer Processing Capability for the Magnetohydrodynamic Simulation Model

Tatsuki Ogino
Solar-Terrestrial Environment Laboratory, Nagoya University
Honohara 3-13, Toyokawa, Aichi 442, Japan

1. Introduction

In computer simulations of space plasma phenomena, it is quite important to maintain high numerical accuracy and fine spatial and temporal resolution. To do so, we must extract the maximum performance from the computers that execute the simulation codes, as well as use numerical algorithms of high accuracy. For example, in a global simulation of the earth's magnetosphere, we want to keep the outer boundaries far from the earth to avoid troublesome boundary effects, and at the same time to resolve what is happening in the narrow regions of the bow shock, magnetopause and plasma sheet. We therefore need as many grid points as possible, and that automatically increases the required computer memory and computation time. If the grid intervals are halved in a 3-dimensional simulation box of the same total length, the number of grid points doubles in each direction, so the simulation code usually needs 8 times the computer memory; and since the time step must also be halved, the computation time grows by about 16 times. This is the essential reason why we need a supercomputer with higher speed and larger memory.

It is not easy in general to evaluate computer performance, because the results depend strongly on the nature of the program itself and on the conditions at execution time. It is also very difficult to estimate how long our particular simulation codes will take to run from the catalog specifications of a computer alone. Thus we often want a rough evaluation, or a practical example, of computation performance on different kinds of computers. For this purpose, we have executed test programs on many kinds of computers and have used the results as a guide in developing the magnetohydrodynamic (MHD) simulation codes.

2.
Comparison of Computer Processing Capability

We have had good opportunities to use several kinds of computers. In these trials we executed test program runs to evaluate the computer performance for the fundamental arithmetic calculations and for 2- and 3-dimensional MHD simulation codes. Tables 1 and 2 show the resulting comparisons of computer processing capability. Table 1 shows simple averages over the four fundamental arithmetic calculations (addition, subtraction, multiplication and division) as a measure of basic processing capability; the unit is millions of floating-point operations per second (MFLOPS), and the compiler option giving the maximum performance was adopted wherever a particular compiler option is not listed. The processing capability values (MFLOPS) therefore stand for a simple average over the four arithmetic calculations.

Table 2 shows the execution times for a single time step of the 2- and 3-dimensional MHD simulation codes. The 3-dimensional global MHD simulation codes of the interaction between the solar wind and the earth's magnetosphere were used for (a) the dawn-dusk asymmetry model with 50x50x26 grid points [Ogino et al., 1985, 1986a] and (b) the dawn-dusk symmetry model with 62x32x32 grid points [Ogino, 1986]; the 2-dimensional MHD simulation code of the interaction between the solar wind and a cometary plasma was used for (c), with 402x102 grid points [Ogino et al., 1986b], boundary points included. Since the three MHD codes were originally developed to execute efficiently on the CRAY-1 supercomputer, the programs are not large and require less than 1 MW of memory. We then applied the MHD codes successively to other computers, modifying the original codes to obtain good performance while keeping the number of grid points fixed. One essential difference between the two 3-dimensional MHD models (a) and (b) is the length of the "do loops" in the programs.
In model (a), the long "do loops" were separated into several short "do loops" so that the CRAY-1 compiler could vectorize all of them, because the length of a vectorizable "do loop" is limited on CRAY computers. On the other hand, using the minimum number of long "do loops" usually gives better processing performance, and model (b) corresponds to that case.

Table 1 demonstrates the average values for the four fundamental arithmetic calculations and gives a rough evaluation of computer processing capability; array arguments of length 10,000 were used in the calculations. For the supercomputers we obtained average values under both the vector and the scalar compiler options, after confirming that all the vector "do loops" were fully vectorized. The ratio of the vector to the scalar result can be understood as the maximum practical acceleration from vectorization on each supercomputer. This vectorization (acceleration) ratio lies in the range of 10 to 100 times, and it can serve as a guide when developing and executing simulation codes.

Table 1 shows only the averages over the four arithmetic calculations; the individual values for addition, subtraction, multiplication and division are not equal. In most cases the processing rates for addition, subtraction and multiplication are almost the same, whereas the rate for division is considerably lower, roughly a quarter of the others. This holds for the vector compilers of the supercomputers as well, so it should be noted that division has the worst efficiency of the four arithmetic calculations. Therefore, we should reduce the number of divisions in each "do loop" of a simulation code if we want higher efficiency. It is remarkable that new-generation supercomputers such as the NEC SX-3 and Fujitsu VP-2600 show quite high performance, above 1 GFLOPS.
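Because division is the slowest of the four operations under the vector compilers, a common optimization is to replace per-element divisions inside a "do loop" with one reciprocal and a multiplication. A minimal sketch in Python (standing in for the Fortran loops of the actual codes; the function names are illustrative):

```python
def scale_by_division(values, dx):
    # One division per element: the slowest of the four operations
    # on the vector machines discussed above.
    return [v / dx for v in values]

def scale_by_reciprocal(values, dx):
    # One division in total, then only multiplications in the loop.
    rdx = 1.0 / dx
    return [v * rdx for v in values]

vals = [1.0, 2.5, 4.0]
print(scale_by_division(vals, 0.25))    # [4.0, 10.0, 16.0]
print(scale_by_reciprocal(vals, 0.25))  # [4.0, 10.0, 16.0]
```

For a grid spacing like 0.25 the two loops agree exactly; for a general divisor the results can differ in the last bit, which is usually acceptable at the accuracy of these codes.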
Even though the processing capability of workstations has recently become much higher, their practical computation speed is about 1 to 10 MFLOPS, less than a hundredth of the fastest supercomputer speed. Therefore, we must rely on a supercomputer whenever we run a large simulation code. The last line of the table shows the performance of a massively parallel processor, the Matsusita ADENART, which is almost equivalent to that of a vector supercomputer like the Fujitsu VP-200.

Strictly speaking, the processing capability for the four fundamental arithmetic calculations does not directly reflect that for complete simulation codes, because a complete program is composed of many kinds of calculations and processing; the achieved performance therefore depends strongly on the character of each program. Table 2 shows an example of the computer processing capability for the three types of global MHD simulation codes, giving the computation time in seconds for a single time step. The new-generation supercomputers such as the NEC SX-3 and Fujitsu VP-2600 again give excellent results for the global MHD simulation codes. In our tests with the MHD simulation codes, three kinds of supercomputers, the Fujitsu VP-200, NEC SX-2 and CRAY-YMP-864, and the massively parallel Matsusita ADENART gave almost comparable performance. It should be noted that the CRAY-2 did not give good values, and that the CRAY-XMP and CRAY-YMP did not perform well for model (b). In those cases full vectorization was not achieved by the compiler, either because we did not understand well how to vectorize the codes or because some "do loops" were too long for vectorization on the CRAY computers. Moreover, with the recent supercomputers we can now obtain about 10 to 20 times the computation performance of the first supercomputer, the CRAY-1.
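The memory demanded by a given grid can be sketched roughly as follows. This is a minimal estimate assuming 8 MHD state variables stored in 64-bit precision with a factor-of-two allowance for work arrays; both factors and the function name are illustrative assumptions, not figures taken from the actual codes:

```python
def mhd_memory_bytes(nx, ny, nz, nvars=8, bytes_per_word=8, work_factor=2.0):
    """Rough memory estimate for a 3-dimensional MHD grid.

    nvars: state variables per grid point (e.g. density, pressure,
    three velocity and three magnetic field components);
    work_factor: allowance for temporary/work arrays (an assumption).
    """
    return int(nx * ny * nz * nvars * bytes_per_word * work_factor)

# A 100x100x100 grid under these assumptions, in megabytes:
print(mhd_memory_bytes(100, 100, 100) / 2**20)  # 122.0703125

# Halving the grid spacing doubles the points in each direction,
# giving the 8-times memory increase quoted in the introduction:
print(mhd_memory_bytes(200, 200, 200) // mhd_memory_bytes(100, 100, 100))  # 8
```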
At the same time, we can now use a large amount of computer memory, from 300 MB to 1 GB, in a simulation, which permits us to handle many more grid points, well beyond 100x100x100 in the 3-dimensional and 1000x1000 in the 2-dimensional MHD simulation codes. As a result, we can confidently expect to obtain many physically meaningful results from computer simulations during the STEP interval.

3. Summary

We have presented comparisons of computer processing capability for the fundamental arithmetic calculations and for three kinds of complete MHD simulation codes. Almost all of the computations behind these results were executed by ourselves. The tables are of course particular examples and do not represent the general performance of the computers. However, they may provide useful guidance when we develop new simulation codes and execute them on particular computers. The 3-dimensional global MHD simulation of the interaction between the solar wind and the earth's magnetosphere particularly demands high computation speed and large computer memory. Since such high computer performance has now been achieved, we will be able to study the dynamics of the earth's magnetosphere in more detail, for comparison with theories and observations, during the STEP interval.

I would like to express my gratitude to the many computer centers where I had the opportunity to execute the test programs, and to their staffs.

References

Ogino, T., A three dimensional MHD simulation of the interaction of the solar wind with the earth's magnetosphere: The generation of field-aligned currents, J. Geophys. Res., 91, A6, 6791, 1986.

Ogino, T., R.J. Walker, M. Ashour-Abdalla, and J.M. Dawson, An MHD simulation of By-dependent magnetospheric convection and field-aligned currents during northward IMF, J. Geophys. Res., 90, 10,835, 1985.

Ogino, T., R.J. Walker, M. Ashour-Abdalla, and J.M.
Dawson, An MHD simulation of the effects of the interplanetary magnetic field By component on the interaction of the solar wind with the earth's magnetosphere during southward interplanetary magnetic field, J. Geophys. Res., 91, 10,029, 1986a.

Ogino, T., R.J. Walker, and M. Ashour-Abdalla, An MHD simulation of the interaction of the solar wind with the outflowing plasma from a comet, Geophys. Res. Lett., 13, 929, 1986b.

Table 1. A comparison of the processing capability of computers. A test program executing the four fundamental arithmetic calculations (addition, subtraction, multiplication and division) was used to evaluate the computer processing capability; the unit is millions of floating-point operations per second (MFLOPS), and the compiler option giving the maximum performance was adopted wherever a compiler option is not given. In the table, IAP means that the inner array processor was used and NIAP (or NOIAP) means that it was not.

---------------------------------------------------------------------
computer                  compiler option        processing capability
                                                 (MFLOPS)
---------------------------------------------------------------------
NEC ACOS-650              Fortran                    0.41
NEC ACOS-850              NIAP                       1.09
NEC ACOS-850              IAP                        2.47
NEC ACOS-930              NIAP, OPT=1                1.04
NEC ACOS-930              IAP, OPT=3                 6.87
NEC S-2000                NIAP                       7.37
NEC S-2000                IAP                       13.58
NEC SX-2A                 vector                   196.3
NEC SX-2                  scalar                     7.74
NEC SX-2                  vector                   247.4
NEC SX-3/14               scalar                    10.78
NEC SX-3/14               vector                   583.1
NEC SX-3                  vector                 1,406.9
Fujitsu M-200                                        1.54
Fujitsu M-380                                        3.64
Fujitsu M-780/20                                     8.31
Fujitsu M-780/30          FORT77 O2                 14.16
Fujitsu M-780/30          FORT77EX O3               18.07
Fujitsu M-1800            FORT77                    18.46
Fujitsu VP-100            scalar                     4.17
Fujitsu VP-100            vector                    94.91
Fujitsu VP-200            scalar                     3.20
Fujitsu VP-200            vector                   225.0
Fujitsu VP-400            vector                   262.8
Fujitsu VP-2600           FORT77EX O3             1,238.4
Fujitsu VPP-500 (1PE)     frtpx, -sc                19.78
Fujitsu VPP-500 (1PE)     frtpx                    730.3
Fujitsu VPP-5000 (1PE)    frt, -sc                 189.78  (1999.12.27)
Fujitsu VPP-5000 (1PE)    frt                    3,073.7   (1999.12.27)
Hitachi M-680                                        8.76
Hitachi M-680D            NOIAP, OPT=0               1.23
Hitachi M-680D            NOIAP, OPT=3               4.37
Hitachi M-680D            IAP, OPT=3                49.12
Hitachi M-680H            NOIAP                      6.65
Hitachi M-680H            IAP                       44.54
Hitachi S810/10           scalar                     3.96
Hitachi S810/10           vector                    51.94
Hitachi S820              vector                   358.9
Hitachi S820/80           vector                   497.4
Hitachi S3800/480         vector                   820.7
VAX 8600                                             0.507
IBM-3090                  Level(0)                   4.03
IBM-3090                  Level(1)                   8.52
IBM-3090                  Level(2)                   8.28
CRAY-XMP-48               CFT114i off=v              3.39
CRAY-XMP-48               CFT114i                   36.00
CRAY-2                    CIVIC                     29.51
CRAY-YMP-864              -o off                     1.45
CRAY-YMP-864              -o novector               11.09
CRAY-YMP-864              -o full                  116.36
SCS40                     SCSFT o=h                  0.695
SCS40                     SCSFT vector               8.30
Asahi Stellar GS-1000     O1 (scalar)                0.538
  (version 1.6)           O2 (vector)                4.67
                          O3 (parallel)             10.77
NEC EWS-4800/20                                      0.188
NEC EWS-4800/50                                      0.112
NEC EWS-4800/210          f77 -O                     1.665
NEC EWS-4800/220          f77 -O                     1.812
NEC EWS-4800/260          f77 -O                     2.009
NEC EWS-4800/350          f77 -O                     4.929
NEC EWS-4800/360          f77 -O                     5.016
MicroVAX-3400                                        0.419
Sun SPARC Station 1       f77 -O                     0.961
Sun SPARC Station 2       f77 -O                     2.188
Sun SPARC IPX             f77 -O                     1.646
Sun SPARC 2 (AS4075)      f77 -O                     1.400
Sun SPARC Station 10      f77 -O                     2.999
Sun SPARC S-4/5           f77 -O                     4.217
Sun SPARC S-4/CL4         f77 -O                     4.419
Sun SPARC S-4/20H         f77 -O                    18.67
Sun SPARC S-4/20H(stcpu1) f77 -O                    18.78  (1998.04.17)
Sun SPARC S-4/20H(stcpu1) f90 -O                    30.75  (1998.04.17)
Sun S-7/300U              f77 -O                    20.18
Sun Ultra 2 (162MHz)      f77 -O                    24.04  (1998.04.09)
Sun Ultra 2 (162MHz)      f90 -O                    13.56  (1998.04.09)
Sun Ultra 2 (162MHz)      frt -O (Fujitsu f90)      23.46  (1998.04.18)
Sun S-7/7000U (296MHz)    f77 -O                    38.19  (1998.04.07)
Sun S-7/7000U (296MHz)    f90 -O                    23.12  (1998.04.07)
Sun S-7/7000U 350         f77 -O                    42.01  (1999.08.02)
Sun S-4/420U              f77 -O                    44.59  (1999.08.02)
Sun GP7000F               frt -O                    92.38  (2001.01.31)
Sun PanaStation           f77 -O                    18.06
DELL OptiPlex GXi         f77 -O                    11.36  (1997.11.12)
DEC Alpha (500MHz)        f77                       63.0   (1998.04.17)
DEC Alpha (500MHz)        f90                       64.1   (1998.04.17)
SGI Indy                  f77 -O                     3.96
SGI Indigo2               f77 -O                     9.46
SGI Origin2000(1CPU)      Fortran77                 27.00
SGI Octane                f77 -O                    17.30  (1999.08.02)
SGI O2                    f77 -O                     4.87  (1999.08.02)
DEC alpha 3000AXP/500     f77 -O3                   13.52
Solbourne                 f77 -O3                    1.824
TITAN                     O1 (scalar)                0.904
TITAN                     O2 (vector)                3.756
TITAN III                 O1 (scalar)                1.176
TITAN III                 O2 (vector)                6.228
TITAN III                 O3 (parallel)              6.543
IBM 6091-19               f77 -O                     8.125
Matsusita ADENART         ADETRAN (parallel)       218.0
Convex C3810              Fortran -O1 (scalar)       4.089
Convex C3810              Fortran -O2 (vector)      94.37
nCUBE2                    HPF                        0.758
nCUBE2                    HPF SSS32                  1.788
DECmpp 12000 MP-2 1K pe   HPF Ver.3.1               70.95
DECmpp 12000 MP-2 2K pe   HPF Ver.3.1              128.69
DECmpp 12000 MP-2 4K pe   HPF Ver.3.1              189.34
---------------------------------------------------------------------

Table 2. Comparison of the computer processing capability for the 2-dimensional and 3-dimensional global magnetohydrodynamic (MHD) simulations, where the numerical values are computation times (in seconds) for one time step of the MHD simulation codes. In the tests, the compiler options giving maximum performance were adopted wherever a particular compiler option is not given. The grid numbers used in the MHD simulation codes are (a) 50x50x26 and (b) 62x32x32 for the 3-dimensional simulations and (c) 402x102 for the 2-dimensional simulation, boundary grid points included.
---------------------------------------------------------------------
computer               compiler        (a)3D-MHD       (b)3D-MHD       (c)2D-MHD
                                       50x50x26        62x32x32        402x102
                                       sec (MFLOPS)    sec (MFLOPS)    sec (MFLOPS)
---------------------------------------------------------------------
NEC ACOS-650           Fortran 77      187.1 (  0.6)   159.5 (  0.7)   29.3   (  1.0)
NEC ACOS-930           NIAP, OPT=1      14.07(  9)      13.76(  8)      4.31  (  6.5)
NEC ACOS-930           IAP, OPT=3        9.97( 12)      11.34( 10)      2.44  ( 11.4)
NEC SX-2               opt=scalar        3.66( 33)       5.02( 23)      0.90  ( 31.0)
NEC SX-2               Fortran 77        0.34( 356)      0.28( 412)     0.042 (  664)
NEC SX-3/14            opt=scalar        2.11( 57)       1.81( 64)      0.48  ( 58.1)
NEC SX-3/14            Fortran 77        0.097(1,248)    0.116( 994)    0.0149(1,871)
NEC SX-3               Fortran 77                                       0.014 (1,991)
Fujitsu M-200          Fortran 77       34.4 (  3.5)    34.2 (  3.4)    7.84  (  3.6)
Fujitsu M-380          Fortran 77       11.60( 10)       9.37( 12)      3.31  (  8.4)
Fujitsu M-780/20       Fortran 77        4.87( 25)       3.94( 30)      1.14  ( 24.5)
Fujitsu M-780/30       FORT77 O2         3.95( 31)       5.06( 23)      0.84  ( 33.2)
Fujitsu M-780/30       FORT77EX O3       2.63( 46)       2.21( 53)      0.66  ( 42.2)
Fujitsu VP-100         opt=scalar       11.44( 11)       9.61( 12)      2.39  ( 11.7)
Fujitsu VP-100         Fortran 77        0.80( 151)      0.75( 154)     0.13  (  214)
Fujitsu VP-200         opt=scalar       12.20( 10)      10.23( 11)      2.56  ( 10.9)
Fujitsu VP-200         Fortran 77        0.50( 242)      0.41( 281)     0.080 (  348)
Fujitsu VP-400         Fortran 77        0.49( 247)      0.39( 296)     0.042 (  664)
Fujitsu VP-2600        FORTCLG           0.099(1,223)    0.082(1,405)   0.014 (1,991)
Fujitsu VPP-500 ( 1PE) frtpx, -sc        2.65( 46)       3.26( 36)      0.704 ( 39.6)
Fujitsu VPP-500 ( 1PE) frtpx             0.150( 807)     0.132( 881)    0.029 (  961)
Fujitsu VPP-500 ( 1PE) frtpx, -sc        3.1043( 39)     5.359 ( 22)
Fujitsu VPP-500 ( 1PE) frtpx             0.1396( 867)    0.1194( 974)
Fujitsu VPP-500 ( 2PE) frtpx, -Wx        0.0749(1,616)   0.0632(1,840)
Fujitsu VPP-500 ( 4PE) frtpx, -Wx        0.0440(2,751)   0.0372(3,126)
Fujitsu VPP-500 ( 8PE) frtpx, -Wx        0.0277(4,370)   0.0244(4,766)
Fujitsu VPP-500 (16PE) frtpx, -Wx        0.0189(6,405)   0.0155(7,503)
Fujitsu VPP-5000( 1PE) frt, -sc          0.717( 169)     0.694( 168)    0.1238 (  225)
Fujitsu VPP-5000( 1PE) frt               0.0301( 4026)   0.0264( 4416)  0.00441( 6316)
Fujitsu VPP-5000( 1PE) frt, -sc          0.7209( 168)    0.8529( 136)
Fujitsu VPP-5000( 1PE) frt               0.02330( 5201)  0.02270( 5110)
Fujitsu VPP-5000( 2PE) frt, -Wx          0.01279( 9475)  0.01050(11047)
Fujitsu VPP-5000( 4PE) frt, -Wx          0.00751(16136)  0.00594(19528)
Fujitsu VPP-5000( 8PE) frt, -Wx          0.00451(26870)  0.00356(32583)
Fujitsu VPP-5000(16PE) frt, -Wx          0.00306(39602)  0.00225(51554)
Hitachi M-680          Fortran 77                                       1.49  ( 18.7)
Hitachi M-680D         NOIAP, OPT=3      8.42( 14)       9.31( 12)      2.11  ( 13.2)
Hitachi M-680D         IAP, OPT=3        3.54( 34)       2.75( 42)      0.57  ( 48.9)
Hitachi M-680D         IAP, SOPT         3.25( 37)       2.44( 48)      0.53  ( 52.6)
Hitachi S810/10        opt=scalar                                       3.17  (  8.8)
Hitachi S810/10        Fortran 77                                       0.167 (  167)
Hitachi S820/20        Fortran 77        0.23( 526)      0.16( 727)     0.020 (1,394)
Hitachi S3800/480      Fortran 77        0.125( 968)     0.103(1,129)   0.0093(2,998)
IBM-3033               VS               33.3 (  3.6)    27.8 (  4.2)    7.90  (  3.5)
IBM-3090               VS                9.17( 13)       8.76( 13)      2.27  ( 12.3)
IBM-3090               Fortvclg L0       9.11( 13)       8.78( 13)      2.28  ( 12.2)
IBM-3090               Fortvclg L1       5.19( 23)       4.01( 29)      1.11  ( 25.1)
VAX-11/750             Fortran         449.5 (  0.3)   432.9 (  0.3)   96.64  (  0.3)
CRAY-1                 CFT               1.88( 64)       1.76( 65)      0.372 ( 74.9)
CRAY-XMP               CFT 1.13          1.67( 73)       3.85( 39)      0.282 ( 98.9)
CRAY-2                 CIVIC 131        10.3 ( 12)       7.29( 16)
CRAY-XMP-48            off=v             5.68( 21)       6.15( 19)      1.436 ( 19.4)
CRAY-XMP-48            CFT114i           1.29( 94)       1.13( 101)     0.252 (  111)
CRAY-YMP-864           -o off            9.36( 13)       9.26( 13)      2.74  ( 10.2)
CRAY-YMP-864           -o novector       3.62( 33)       3.81( 31)      0.999 ( 27.9)
CRAY-YMP-864           -o full           0.430( 282)     1.921( 61)     0.0982(  284)
SCS40                  SCSFT o=h        18.86(  6.4)    20.25(  5.7)    5.71  (  4.9)
SCS40                  SCSFT             3.94( 31)       3.81( 31)      0.964 ( 28.9)
TITAN                  O1 (scalar)      49.44(  2.5)    56.70(  2.1)   12.59  (  2.2)
TITAN                  O2 (vector)      22.02(  5.5)    23.97(  4.8)    5.47  (  5.1)
TITAN III              O1 (scalar)      15.66(  7.7)    18.57(  6.3)    3.68  (  7.6)
TITAN III              O2 (vector)       7.96( 15)       7.54( 15)      1.62  ( 17.2)
TITAN III              O3 (parallel)     7.73( 16)       7.31( 16)      1.59  ( 17.5)
Sun SPARC Station 1    f77 -O           47.50(  2.5)    47.25(  2.4)   13.00  (  2.1)
Sun SPARC Station 2    f77 -O           20.50(  5.9)    19.88(  5.8)    4.81  (  5.8)
Sun SPARC IPX          f77 -O           16.30(  7.2)    17.60(  6.6)    5.23  (  5.3)
Sun SPARC 2 (AS4075)   f77 -O           15.18(  8.0)    16.63(  7.0)    4.92  (  5.7)
Sun SPARC Station 10   f77 -O            8.22( 14.7)    11.46( 10.2)    2.04  ( 13.7)
Sun SPARC S-4/5        f77 -O            5.57( 21.7)     6.81( 17.1)    1.79  ( 15.6)
Sun SPARC S-4/CL4      f77 -O            5.68( 21.3)     6.79( 17.1)    1.79  ( 15.6)
Sun SPARC S-4/20H      f77 -O            1.90( 63.7)     1.92( 60.6)    0.344 ( 81.0)
Sun SPARC S-4/20H(stcpu1) f77 -O         1.90( 63.7)     1.92( 60.6)    0.344 ( 81.0)
Sun SPARC S-4/20H(stcpu1) f90 -O         1.97( 61.4)     1.79( 65.0)    0.450 ( 61.9)
Sun S-7/300U           f77 -O            1.76( 68.8)     1.97( 59.1)    0.426 ( 65.4)
Sun Ultra 2 (162MHz)   f77 -O            1.49( 81)       3.39( 34)      0.344 ( 81)
Sun Ultra 2 (162MHz)   f90 -O            3.91( 31)       5.14( 23)      0.680 ( 41)
Sun Ultra 2 (162MHz)   frt -O (f90)      1.53( 79)       2.74( 42)      0.380 ( 74)
Sun S-7/7000U (296MHz) f77 -O            0.836( 145)     1.09( 107)     0.195 ( 143)
Sun S-7/7000U (296MHz) f90 -O            3.04( 40)       2.72( 43)      0.375 ( 74)
Sun S-7/7000U 350      f77 -O            1.055( 115)     1.039( 113)    0.172 ( 161)
Sun S-4/420U           f77 -O            0.969( 125)     0.953( 123)    0.156 ( 178)
Sun GP7000F            frt -O            0.781( 155)     0.629( 186)    0.119 ( 233)
Sun PanaStation        f77 -O            2.08( 58.2)     1.87( 62.2)    0.348 ( 80.1)
DELL OptiPlex GXi      f77 -O            2.68( 45.2)     3.14( 37.0)    0.641 ( 43.5)
DEC Alpha (500MHz)     f77               0.359( 337)     0.383( 304)    0.0781( 346)
DEC Alpha (500MHz)     f90               0.359( 337)     0.383( 304)    0.0781( 346)
SGI Indy               f77 -O            8.20( 14.8)    10.35( 11.2)    2.21  ( 12.6)
SGI Indigo2            f77 -O            2.68( 45.2)     2.85( 40.8)    0.775 ( 36.0)
SGI Octane             f77 -O            0.475( 255)     0.550( 211)    0.133 ( 210)
SGI O2                 f77 -O            1.875( 64.6)    2.125( 54.7)   0.325 ( 85.8)
SGI Origin2000(1CPU)   Fortran77         0.531( 228)     0.797( 146)    0.129 ( 216)
SGI Origin2000(2CPU)   Fortran77         0.324( 374)     0.464( 251)
SGI Origin2000(4CPU)   Fortran77         0.202( 599)     0.275( 423)
SGI Origin2000(8CPU)   Fortran77         0.155( 781)     0.191( 609)
DEC alpha 3000AXP/500  f77 -O3           2.59( 46.8)     4.41( 26.4)    0.56  ( 49.8)
Solbourne              f77 -O3          23.68(  5.1)    25.70(  4.5)    6.06  (  4.6)
NEC EWS-4800/210       f77 -O3          16.65(  7.3)    21.15(  5.5)    4.17  (  6.7)
NEC EWS-4800/220       f77 -O3          18.50(  6.5)    17.00(  6.7)    3.50  (  8.0)
NEC EWS-4800/260       f77 -O           12.53(  9.7)    14.87(  7.8)    3.04  (  9.2)
NEC EWS-4800/350       f77 -O            6.72( 18.0)     7.54( 15.4)    1.46  ( 19.1)
NEC EWS-4800/360       f77 -O            4.69( 25.8)     5.40( 21.5)    1.06  ( 26.3)
IBM 6091-19            f77 -O            4.437( 27.3)    4.500( 25.9)   1.125 ( 24.8)
Matsusita ADENART      ADETRAN (parallel) 0.431( 281)    0.307( 375)    0.110 ( 253)
Convex C3810           f77 -O1 (scalar)  5.704( 21)      6.286( 19)     1.378 ( 20.2)
Convex C3810           f77 -O2 (vector)  0.948( 128)     0.895( 130)    0.213 ( 131)
nCUBE2E  16 pe         f90 -O (parallel) 2.78( 43.5)                    0.544 ( 51.2)
nCUBE2E  32 pe         f90 -O (parallel)                                0.293 ( 95.1)
nCUBE2S 128 pe         f90 -O (parallel)                                0.083 ( 336)
nCUBE2  256 pe         f90 -O (parallel)                                0.072 ( 378)
---------------------------------------------------------------------
CRAY Y-MP4E            1 processor       0.460( 263)     0.431( 263)    0.106 ( 263)
                       2 processors      0.246( 492)     0.233( 495)    0.062 ( 450)
                       4 processors      0.136( 890)     0.129( 893)    0.040 ( 697)
CRAY Y-MP C90          1 processor       0.289( 419)     0.265( 439)
                       2 processors      0.159( 762)     0.144( 800)
                       4 processors      0.087(1,392)    0.079(1,459)
----------------------------------------------------------------------

----------------------------------------------------------------------
computer               compiler        (a)3D-MHD       (b)3D-MHD       (c)2D-MHD
                                       sec (MFLOPS)    sec (MFLOPS)    sec (MFLOPS)

Grid points                            192x192x96      240x120x120     1600x400
Convex C3810 (1cpu)    240 MFLOPS      52.8 ( 143)     77.7 (  95)     3.4  ( 131)
Convex C3820 (2cpu)    480 MFLOPS      29.5 ( 256)     41.6 ( 178)     1.9  ( 235)
Convex C3840 (4cpu)    960 MFLOPS      18.9 ( 400)     16.1 ( 283)     1.2  ( 372)

Grid points                            240x120x120     240x120x120     1600x400
SGI Origin2000(1CPU)   Fortran77       37.59( 201)     40.58( 183)     2.24 ( 199)
SGI Origin2000(2CPU)   Fortran77       19.86( 381)     21.21( 351)     1.76 ( 253)
SGI Origin2000(4CPU)   Fortran77       10.45( 724)     11.08( 672)     1.40 ( 319)
SGI Origin2000(8CPU)   Fortran77        5.94(1,274)     6.21(1,199)

Grid points                            320x 80x160     320x 80x160
SGI Origin2000(1CPU)   Fortran77       77.49( 116)     82.97( 100)
SGI Origin2000(2CPU)   Fortran77       40.39( 222)     43.01( 194)
SGI Origin2000(4CPU)   Fortran77       21.12( 424)     22.85( 364)
SGI Origin2000(8CPU)   Fortran77       11.56( 774)     12.25( 680)
-----------------------------------------------------------------------

Computer Processing Capability  2000.6.25  by Tatsuki OGINO
-----------------------------------------------------------------------
computer                              grid number       sec (MFLOPS)      GF/PE
-----------------------------------------------------------------------
Matsusita ADENART (256CPU)            180x 60x 60       3.46 (   400)
Matsusita ADENART (256CPU)            150x100x 50       5.81 (   276)
CRAY Y-MP C90 (8CPU)                  400x200x200       7.00 ( 4,883)     0.61
SGI Origin2000 (1CPU, earthb)         240x120x120      40.58 (   183)     0.18
SGI Origin2000 (2CPU)                 240x120x120      21.21 (   351)     0.18
SGI Origin2000 (4CPU)                 240x120x120      11.08 (   672)     0.17
SGI Origin2000 (8CPU)                 240x120x120       6.21 ( 1,199)     0.15
Fujitsu VP-200                        240x 80x 80      10.38 (   316)     0.32
Fujitsu VP-2600                       240x 80x 80       1.50 ( 2,188)     2.19
Fujitsu VP-2600                       320x 80x 80       1.76 ( 2,486)     2.49
Fujitsu VP-2600                       300x100x100       2.57 ( 2,494)     2.49
Fujitsu VP-2600                       320x 80x160       3.63 ( 2,417)     2.42
Fujitsu VPP-500 ( 1PE, earthb)        320x 80x 80       3.556 ( 1,230)    1.23
Fujitsu VPP-500 ( 2PE)                320x 80x 80       1.846 ( 2,370)    1.19
Fujitsu VPP-500 ( 4PE)                320x 80x 80       1.012 ( 4,323)    1.08
Fujitsu VPP-500 ( 8PE)                320x 80x 80       0.591 ( 7,403)    0.93
Fujitsu VPP-500 (16PE)                320x 80x 80       0.368 (11,889)    0.74
Fujitsu VPP-500 (16PE)                400x100x100       0.666 (12,831)    0.80
Fujitsu VPP-500 (16PE)                640x160x160       2.308 (15,165)    0.95
Fujitsu VPP-500 (16PE)                800x200x200       4.119 (16,597)    1.04
Fujitsu VPP-500 ( 1PE, eartha2)       320x 80x160       7.088 ( 1,234)    1.23
Fujitsu VPP-500 ( 2PE)                320x 80x160       3.620 ( 2,417)    1.21
Fujitsu VPP-500 ( 4PE)                320x 80x160       1.899 ( 4,608)    1.15
Fujitsu VPP-500 ( 8PE)                320x 80x160       1.035 ( 8,454)    1.06
Fujitsu VPP-500 (16PE)                320x 80x160       0.592 (14,781)    0.92
Fujitsu VPP-500 (16PE)                400x100x200       1.088 (15,708)    0.98
Fujitsu VPP-500 (16PE)                640x160x320       4.064 (17,225)    1.08
Fujitsu VPP-500 (16PE)                800x200x400       7.632 (17,914)    1.12
Fujitsu VPP-500 (16PE, Venus)         400x100x100       0.667 (12,811)    0.80
Fujitsu VPP-500 (16PE, Jupiter)       300x200x100       0.975 (13,146)    0.82
Fujitsu VPP-5000 ( 1PE, earthb)       400x100x100       1.154 ( 7,405)    7.40
Fujitsu VPP-5000 ( 2PE, earthb)       400x100x100       0.5762( 14,831)   7.41
Fujitsu VPP-5000 ( 4PE, earthb)       400x100x100       0.3039( 28,119)   7.03
Fujitsu VPP-5000 ( 8PE, earthb)       400x100x100       0.1613( 52,979)   6.62
Fujitsu VPP-5000 (16PE, earthb)       400x100x100       0.09355( 91,346)  5.71
Fujitsu VPP-5000 (16PE, earthb)       800x200x200       0.62417(109,526)  6.85
Fujitsu VPP-5000 (16PE, eartha2)      500x100x200       0.19975(106,948)  6.68
Fujitsu VPP-5000 (16PE, eartha2)      800x200x400       1.20162(113,779)  7.11
Fujitsu VPP-5000 ( 2PE, eartha2)      800x200x478      10.65936( 15,327)  7.66
Fujitsu VPP-5000 ( 4PE, eartha2)      800x200x478       5.35061( 30,534)  7.63
Fujitsu VPP-5000 ( 8PE, eartha2)      800x200x478       2.73815( 59,666)  7.46
Fujitsu VPP-5000 (12PE, eartha2)      800x200x478       1.86540( 87,581)  7.30
Fujitsu VPP-5000 (16PE, eartha2)      800x200x478       1.41918(115,119)  7.19
Fujitsu VPP-5000 (32PE, eartha2)      800x200x478       0.72187(226,328)  7.07
Fujitsu VPP-5000 (48PE, eartha2)      800x200x478       0.53445(305,698)  6.36
Fujitsu VPP-5000 (56PE, eartha2)      800x200x478       0.49367(330,950)  5.91
Fujitsu VPP-5000 (32PE, eartha2)     1000x200x478       0.91633(222,872)  6.96
Fujitsu VPP-5000 (32PE, eartha2)      800x400x478       1.44683(225,845)  7.06
Fujitsu VPP-5000 ( 2PE, eartha2)      800x200x670       -.-----(---,---)
Fujitsu VPP-5000 ( 4PE, eartha2)      800x200x670       7.61763( 30,063)  7.52
Fujitsu VPP-5000 ( 8PE, eartha2)      800x200x670       3.79406( 60,359)  7.54
Fujitsu VPP-5000 (12PE, eartha2)      800x200x670       2.80623( 81,606)  6.80
Fujitsu VPP-5000 (16PE, eartha2)      800x200x670       1.92435(119,004)  7.44
Fujitsu VPP-5000 (24PE, eartha2)      800x200x670       1.30786(175,099)  7.30
Fujitsu VPP-5000 (32PE, eartha2)      800x200x670       0.97929(233,848)  7.31
Fujitsu VPP-5000 (48PE, eartha2)      800x200x670       0.68234(335,618)  6.99
Fujitsu VPP-5000 (56PE, eartha2)      800x200x670       0.59542(384,611)  6.87
Fujitsu VPP-5000 (16PE, eartha2)     1000x500x1118      9.66792(123,518)  7.72  (2000.07.21)
Fujitsu VPP-5000 (32PE, eartha2)     1000x500x1118      5.04442(236,729)  7.40  (2000.07.21)
Fujitsu VPP-5000 (48PE, eartha2)     1000x500x1118      3.54985(336,397)  7.01  (2000.08.07)
Fujitsu VPP-5000 (56PE, eartha2)     1000x500x1118      3.00623(397,228)  7.09  (2000.08.07)
Fujitsu VPP-5000 (56PE, eartha2)     1000x500x1118      2.98512(400,038)  7.14
Fujitsu VPP-5000 (32PE, eartha2)     1000x1000x1118     9.97933(239,327)  7.48  (2000.07.19)
Fujitsu VPP-5000 (48PE, eartha2)     1000x1000x1118     7.17658(332,794)  6.93  (2000.08.07)
Fujitsu VPP-5000 (56PE, eartha2)     1000x1000x1118     5.81743(410,546)  7.33
Fujitsu VPP-5000 (56PE, eartha2)     1000x1000x1118     5.97927(399,433)  7.13  (2000.08.07)
Fujitsu VPP-5000 (32PE, eartha2)     2238x558x1118     12.96936(229,926)  7.19  (2000.07.28)
Fujitsu VPP-5000 (48PE, eartha2)     2238x558x1118      9.49812(313,956)  6.54  (2000.08.07)
Fujitsu VPP-5000 (56PE, eartha2)     2238x558x1118      8.04309(370,752)  6.62  (2000.08.07)
Fujitsu VPP-5000 ( 1PE, eartha2, scalar) 200x100x478  119.60663(    171)  0.171
Fujitsu VPP-5000 ( 1PE, eartha2)      200x100x478       2.96691( 6,883)   6.88
Fujitsu VPP-5000 ( 2PE, eartha2)      200x100x478       1.45819( 14,005)  7.00
Fujitsu VPP-5000 ( 4PE, eartha2)      200x100x478       0.72109( 28,320)  7.08
Fujitsu VPP-5000 ( 8PE, eartha2)      200x100x478       0.36541( 55,886)  6.99
Fujitsu VPP-5000 (16PE, eartha2)      200x100x478       0.20548( 99,383)  6.21
Fujitsu VPP-5000 (32PE, eartha2)      200x100x478       0.10678(191,226)  5.98
Fujitsu VPP-5000 (48PE, eartha2)      200x100x478       0.06853(297,959)  6.21
Fujitsu VPP-5000 (56PE, eartha2)      200x100x478       0.06391(319,531)  5.71

/vpp/home/usr6/a41456a/heartha2/prog9032.f (revised boundary)
Fujitsu VPP-5000 ( 1PE, eartha2, frt) 500x100x200       2.69078( 7,939)   7.94
Fujitsu VPP-5000 ( 2PE, eartha2, frt) 500x100x200       1.38118( 15,467)  7.73
Fujitsu VPP-5000 ( 4PE, eartha2, frt) 500x100x200       0.71535( 29,965)  7.47
Fujitsu VPP-5000 ( 8PE, eartha2, frt) 500x100x200       0.39820( 53,648)  6.71
Fujitsu VPP-5000 (16PE, eartha2, frt) 500x100x200       0.20970(101,873)  6.37
Fujitsu VPP-5000 (32PE, eartha2, frt) 500x100x200       0.13062(163,548)  5.11
Fujitsu VPP-5000 (48PE, eartha2, frt) 500x100x200       0.09960(214,479)  4.46
Fujitsu VPP-5000 (56PE, eartha2, frt) 500x100x200       0.08921(239,478)  4.28
-----------------------------------------------------------------------
HPF/JA (High Performance Fortran)  /vpp/home/usr6/a41456a/heartha2/proghpf53.f
Fujitsu VPP-5000 ( 1PE, eartha2, HPF) 500x100x200       2.69089( 7,938)   7.94
Fujitsu VPP-5000 ( 2PE, eartha2, HPF) 500x100x200       1.39017( 15,366)  7.68
Fujitsu VPP-5000 ( 4PE, eartha2, HPF) 500x100x200       0.71228( 29,993)  7.50
Fujitsu VPP-5000 ( 8PE, eartha2, HPF) 500x100x200       0.39285( 54,381)  6.80
Fujitsu VPP-5000 (16PE, eartha2, HPF) 500x100x200       0.20202(105,742)  6.61
Fujitsu VPP-5000 (32PE, eartha2, HPF) 500x100x200       0.12034(175,496)  5.48
Fujitsu VPP-5000 (48PE, eartha2, HPF) 500x100x200       0.09115(231,688)  4.82
Fujitsu VPP-5000 (56PE, eartha2, HPF) 500x100x200       0.08625(244,846)  4.37

HPF/JA (High Performance Fortran)
Fujitsu VPP-5000 ( 1PE, eartha2, HPF) 200x100x478       3.00248( 6,801)   6.80  OK
Fujitsu VPP-5000 ( 2PE, eartha2, HPF) 200x100x478       1.53509( 13,303)  6.65  OK
Fujitsu VPP-5000 ( 4PE, eartha2, HPF) 200x100x478       0.76061( 26,849)  6.71  OK
Fujitsu VPP-5000 ( 8PE, eartha2, HPF) 200x100x478       0.38589( 52,921)  6.62  OK
Fujitsu VPP-5000 (16PE, eartha2, HPF) 200x100x478       0.21867( 93,390)  5.84  OK
Fujitsu VPP-5000 (32PE, eartha2, HPF) 200x100x478       0.10972(186,129)  5.82  OK
Fujitsu VPP-5000 (48PE, eartha2, HPF) 200x100x478       0.07374(276,956)  5.77  OK
Fujitsu VPP-5000 (56PE, eartha2, HPF) 200x100x478       0.06823(299,269)  5.34  OK
Fujitsu VPP-5000 ( 2PE, proghpf63.f)  800x200x478      10.74172( 15,210)  7.60
Fujitsu VPP-5000 ( 4PE, proghpf63.f)  800x200x478       5.35382( 30,516)  7.63
Fujitsu VPP-5000 ( 8PE, proghpf63.f)  800x200x478       2.72973( 59,851)  7.48
Fujitsu VPP-5000 (12PE, proghpf63.f)  800x200x478       1.91098( 85,493)  7.12
Fujitsu VPP-5000 (16PE, proghpf63.f)  800x200x478       1.38854(117,660)  7.35
Fujitsu VPP-5000 (32PE, proghpf63.f)  800x200x478       0.71746(227,715)  7.12
Fujitsu VPP-5000 (48PE, proghpf63.f)  800x200x478       0.51497(317,257)  6.61
Fujitsu VPP-5000 (56PE, proghpf63.f)  800x200x478       0.46350(352,488)  6.29
Fujitsu VPP-5000 ( 2PE, proghpf63.f)  800x200x670       -.-----(---,---)
Fujitsu VPP-5000 ( 4PE, proghpf63.f)  800x200x670       8.00096( 28,622)  7.16  OK
Fujitsu VPP-5000 ( 8PE, proghpf63.f)  800x200x670       3.96162( 57,806)  7.23  OK
Fujitsu VPP-5000 (12PE, proghpf63.f)  800x200x670       3.00484( 76,212)  6.35  OK
Fujitsu VPP-5000 (16PE, proghpf63.f)  800x200x670       2.01151(113,848)  7.12  OK
Fujitsu VPP-5000 (24PE, proghpf63.f)  800x200x670       1.35955(168,442)  7.02  OK
Fujitsu VPP-5000 (32PE, proghpf63.f)  800x200x670       1.03211(221,880)  6.93  OK
Fujitsu VPP-5000 (48PE, proghpf63.f)  800x200x670       0.72060(317,798)  6.62  OK
Fujitsu VPP-5000 (56PE, proghpf63.f)  800x200x670       0.62764(364,866)  6.52  OK
Fujitsu VPP-5000 (16PE, eartha2)     1000x500x1118      9.84601(122,615)  7.66  (2000.07.21)
Fujitsu VPP-5000 (16PE, eartha2a)    1000x500x1118      9.61939(125,504)  7.84  (2000.07.21)
Fujitsu VPP-5000 (32PE, eartha2)     1000x500x1118      5.19470(232,403)  7.26  (2000.07.21)
Fujitsu VPP-5000 (32PE, eartha2a)    1000x500x1118      4.99224(241,828)  7.56  (2000.07.21)
Fujitsu VPP-5000 (48PE, eartha2a)    1000x500x1118      3.47943(346,972)  7.23  (2000.08.07)
Fujitsu VPP-5000 (56PE, eartha2a)    1000x500x1118      2.93481(411,361)  7.35  (2000.08.07)
Fujitsu VPP-5000 (32PE, eartha2)     1000x1000x1118    10.22563(233,562)  7.30  (2000.07.19)
Fujitsu VPP-5000 (32PE, eartha2a)    1000x1000x1118     9.81345(243,372)  7.61  (2000.07.21)
Fujitsu VPP-5000 (48PE, eartha2a)    1000x1000x1118     7.02753(339,852)  7.08  (2000.08.07)
Fujitsu VPP-5000 (56PE, eartha2a)    1000x1000x1118     5.79368(412,228)  7.36  (2000.08.07)
Fujitsu VPP-5000 (48PE, eartha2)     1678x558x1118      6.52886(342,453)  7.13  (2000.08.07)
Fujitsu VPP-5000 (56PE, eartha2)     1678x558x1118      5.54894(402,929)  7.20  (2000.08.07)
Fujitsu VPP-5000 (32PE, eartha2)     2238x558x1118     13.25331(225,000)  7.03  (2000.07.21)
Fujitsu VPP-5000 (32PE, eartha2a)    2238x558x1118     12.71245(234,573)  7.33  (2000.07.21)
Fujitsu VPP-5000 (48PE, eartha2a)    2238x558x1118      9.22722(323,174)  6.73  (2000.08.07)
Fujitsu VPP-5000 (56PE, eartha2a)    2238x558x1118      7.80778(381,926)  6.82  (2000.08.07)
-----------------------------------------------------------------------
frt: Fujitsu VPP Fortran 90
HPF: High Performance Fortran
Note: MFLOPS is an estimated value obtained by comparison with the computation by 1 processor of the CRAY Y-MP C90.
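As the note says, the MFLOPS figures in the later tables are estimated by scaling from a reference machine rather than by counting operations directly. A minimal sketch of that estimate, assuming both runs execute the same floating-point work per step (the reference values are taken from Table 2; the function name is illustrative):

```python
def estimated_mflops(time_s, ref_time_s, ref_mflops):
    """Estimate MFLOPS from the time per step, scaling from a reference run.

    Assumes both runs perform the same number of floating-point
    operations per step, so MFLOPS scales inversely with time.
    """
    return ref_mflops * ref_time_s / time_s

# Reference from Table 2: CRAY Y-MP C90, 1 processor, model (a):
# 0.289 s per step at 419 MFLOPS.  A machine taking 2.89 s per step
# on the same model would then be rated at about a tenth of that:
print(round(estimated_mflops(2.89, 0.289, 419.0)))  # 42
```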