The Richest Man in Babylon

1 The Richest Man in Babylon by George S. Clason
Progress in NUCLEAR SCIENCE and TECHNOLOGY, Vol. 2, pp.663-669 (2011)
ARTICLE
On-the-Fly Computing on Many-Core Processors in Nuclear Applications
Noriyuki KUSHIDA*
Japan Atomic Energy Agency, 2-4 Shirakata, Tokai-mura, Naka-gun, Ibaraki-ken 319-1195, Japan
Many nuclear applications still require more computational power than current computers can provide.
Furthermore, some of them require dedicated machines, because they must run constantly or cannot tolerate
any delay. To satisfy these requirements, we introduce computer accelerators, which provide higher
computational power at lower prices than current commodity processors. However, the feasibility of
accelerators had not been well investigated for nuclear applications. Thus, we applied the Cell and GPGPU
to plasma stability monitoring and infrasound propagation analysis, respectively. In the plasma monitoring,
we focused on the eigenvalue solver. To obtain sufficient power, we connected Cells with Ethernet and
implemented a preconditioned conjugate gradient method. Moreover, we applied a hierarchical parallelization
method to minimize communications among the Cells. As a result, we could solve a block tri-diagonal
Hermitian matrix that had 1,024 diagonal blocks, each of size 128×128, within one second. On the basis of
these results, we showed the potential of plasma monitoring using our Cell cluster system. In infrasound
propagation analysis, we accelerated the two-dimensional parabolic equation (PE) method by using GPGPU. PE
is one of the most accurate methods, but it requires higher computational power than other methods. By
applying software pipelining and memory layout optimization, we obtained an 18.3× speedup on the GPU over
the CPU. The computing speed we achieved is comparable to that of faster but less accurate methods.
KEYWORDS: PowerXCell 8i, HD5870, GPGPU, accelerators, plasma stability monitoring, infrasound propagation
analysis, preconditioned conjugate gradient method, finite difference method
*Corresponding author, E-mail: kushida.noriyuki@jaea.go.jp
© 2011 Atomic Energy Society of Japan, All Rights Reserved.

I. Introduction
Many nuclear applications still need more computational power. However, high performance computing (HPC)
machines are said to be facing three walls today and to have hit a glass ceiling in speedup: the “Memory
Wall,” the “Power Wall,” and the “Instruction-level parallelism (ILP) Wall.” Here, the “Memory Wall” is the
growing difference in speed between the processing unit and the main memory. The “Power Wall” is the
increasing power consumption and resulting heat generation of the processing unit, whereas the “ILP Wall”
is the increasing difficulty of finding enough parallelism in an instruction stream. In order to overcome
the memory wall problem, out of order execution, speculative execution, data prefetch mechanisms and other
techniques have been developed and implemented. The common aspect of these techniques is that they minimize
the total processing time by performing whatever calculations are possible while data are being
transferred. However, these techniques cause so many extra calculations that they magnify the power wall
problem. Here, the combined usage of software controlled memory and a single instruction multiple data
(SIMD) processing unit seems to be a good way to break the memory wall and the power wall.1) In particular,
the Cell processor2) and general-purpose computing on graphics processing units (GPGPU),3) which are
implemented on the second and third fastest supercomputers in the world,4) perform well with HPC
applications. Additionally, both are kinds of multicore processor, and multicore processors have been
employed to attack the ILP wall. Consequently, we contend that they include the essentials of the future
HPC processing unit. In addition, Cell and GPGPU provide higher computational power at a cheaper price than
current processors. This feature is suitable for nuclear applications that require a dedicated computer
system, because they must run constantly or cannot tolerate any delay. In this study, we apply Cell and
GPGPU to two nuclear applications in order to investigate the feasibility of accelerators for nuclear
applications: one is plasma stability monitoring, and the other is infrasound propagation analysis. Both
need dedicated HPC machines, so the characteristics of Cell and GPGPU suit them well. The details,
including the individual motivations, are described in Sections II and III, respectively. Section IV gives
our conclusions.

II. Plasma Stability Monitoring for Fusion Reactors
1. Motivations
In this study, we have developed a high speed eigenvalue solver on a Cell cluster system, which is an
essential component of a plasma stability analysis system for fusion reactors. The Japan Atomic Energy
Agency (JAEA) has been developing a plasma stability analysis system in order to achieve sustainable
operation. In Fig. 1, we illustrate a schematic view of the stability analysis module in the real time
plasma profile control system, which works as follows:

1. Monitor the plasma current profile, the pressure profile, and the boundary condition of the magnetic
flux surface.
2. Calculate the plasma equilibrium using the equilibrium code.
3. Evaluate the plasma stability for all possible modes (the plasma is stable/unstable when the smallest
eigenvalue, λ, is greater/smaller than zero).
4. If the plasma is unstable, control the pressure/current profiles to stabilize the plasma.

Fig. 1 Illustration of plasma stability analysis module (fusion reactor, data sender, Cell cluster with
data receiver, matrix generation, eigensolver at about 1 sec., and result sender, controller; λ > 0:
stable, λ < 0: unstable; entire monitoring cycle: 2 to 3 sec.)
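The stability test in step 3 can be illustrated with a minimal Python sketch. It is not the MARG2D eigensolver: the paper uses a preconditioned conjugate gradient method on a Cell cluster, whereas this sketch substitutes NumPy's dense Hermitian eigenvalue routine on a tiny matrix. The function names `block_tridiagonal_hermitian` and `is_stable` and the toy block values are assumptions made only for illustration.

```python
import numpy as np


def block_tridiagonal_hermitian(diag_blocks, sub_blocks):
    """Assemble a dense Hermitian block tri-diagonal matrix.

    diag_blocks: list of n square (b x b) blocks placed on the diagonal.
    sub_blocks:  list of n-1 (b x b) blocks placed below the diagonal;
                 their conjugate transposes are placed above it.
    (Hypothetical helper; a production solver would keep the blocks sparse.)
    """
    n = len(diag_blocks)
    b = diag_blocks[0].shape[0]
    a = np.zeros((n * b, n * b), dtype=complex)
    for i, d in enumerate(diag_blocks):
        # Symmetrize each diagonal block so the whole matrix is Hermitian.
        a[i * b:(i + 1) * b, i * b:(i + 1) * b] = (d + d.conj().T) / 2
    for i, s in enumerate(sub_blocks):
        a[(i + 1) * b:(i + 2) * b, i * b:(i + 1) * b] = s
        a[i * b:(i + 1) * b, (i + 1) * b:(i + 2) * b] = s.conj().T
    return a


def is_stable(matrix):
    """Return (stable, smallest_eigenvalue) for a Hermitian matrix.

    Stable means the smallest eigenvalue is greater than zero.
    np.linalg.eigvalsh returns eigenvalues in ascending order.
    """
    smallest = np.linalg.eigvalsh(matrix)[0]
    return bool(smallest > 0.0), float(smallest)


if __name__ == "__main__":
    # Toy problem: 4 diagonal blocks of size 3 x 3 (the paper's case is
    # 1,024 blocks of size 128 x 128, far beyond this dense sketch).
    rng = np.random.default_rng(0)
    diag = [rng.standard_normal((3, 3)) + 5.0 * np.eye(3) for _ in range(4)]
    sub = [0.1 * rng.standard_normal((3, 3)) for _ in range(3)]
    stable, lam = is_stable(block_tridiagonal_hermitian(diag, sub))
    print("stable" if stable else "unstable", lam)
```

A dense solver computes every eigenvalue, although the monitoring decision needs only the smallest one. Computing only what the decision needs is one reason an iterative method such as the preconditioned conjugate gradient method mentioned in the abstract is attractive for the one-second budget.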
We need to evaluate the plasma equilibrium (step 2) and the stability of all possible modes (step 3) every
two to three seconds if real time profile control is applied in fusion reactors such as the International
Thermonuclear Experimental Reactor (ITER).5) This time limit is rooted in the characteristic confinement
time of the density and temperature in fusion reactors, which is three to five seconds. Moreover, we
estimated that the plasma equilibrium and stability should be evaluated within half of the characteristic
confinement time, taking into account the time for data transfer and other such activities. Since we must
analyze the plasma stability within such a short time interval, a high-speed computer is essential. The
main component of the stability analysis module is the plasma simulation program MARG2D.6) MARG2D consists
of roughly two parts: the matrix generation part and the eigensolver. The eigensolver consumes the greatest
share of the computation time of MARG2D. Therefore, we focused on the eigensolver in this study. A
massively parallel supercomputer (MPP), which obtains its high calculation speed by connecting many
processing units and is the current trend for heavy duty computation, is inadequate for the following two
reasons.

1. MPPs cannot be dedicated to the monitoring system.
2. MPPs have a network communication overhead.

We elaborate on these two points. Regarding the first point, a plasma monitoring system must use a computer
during the entire reactor operation, because fusion reactors must be monitored continuously and in real
time. For this reason, MPPs are inadequate, because they are usually shared through a batch job system.
Furthermore, using an MPP is unrealistic because of its high price. Therefore, MPPs could not be dedicated
to such a monitoring system. Regarding the second point, MPPs consist of many processing units that are
connected via a network. The data transfer performance of a network is lower than that of main memory.7)
In ad...

...entire time for computation can be longer when the number of processors increases. Thus, we cannot
utilize MPPs for the monitoring system. In order to solve the problems mentioned above, we introduced a
Cell cluster system into this study. A Cell processor is faster than a traditional processor, hence we
could obtain sufficient computational power with a small number of processors. Thus, we were able to
establish the Cell cluster system at much cheaper cost, and we can dedicate it to monitoring. Moreover, our
Cell cluster system requires less network overhead. Therefore, it should be suitable for the monitoring
system. The Cell processor obtains its greater computational power at the cost of more complex programming.
Therefore, we also introduce our newly developed eigensolver in the present paper. The details of our Cell
cluster system and the eigenvalue solver are described in the following subsections (Subsections 2 and 3).
Moreover, the performance is evaluated in Subsection 4 and conclusions are given in Subsection 5.

2. Cell Cluster
(1) PowerXCell 8i
PowerXCell 8i, which has a faster double precision computational unit than the original version, is a kind
of Cell processor. An overview of PowerXCell 8i is shown in Fig. 2. In the figure, PPE denotes a PowerPC
Processor Element. The PPE has a PPU, which is a processing unit equivalent to a PowerPC, and also includes
a second level cache memory. SPE denotes a Synergistic Processor Element, which consists of a 128 bit
single instruction multiple data processing unit
