ARTICLE Simulating Plasma Turbulence with the Global Eulerian Gyrokinetic Code GT5D * Sébastien JOLLIETand Yasuhiro IDOMURA Japan Atomic Energy Agency, Higashi Ueno 6-9-3, Taitou, 110-0015 Tokyo, Japan The efficiency of future fusion devices such as ITER is strongly affected by plasma turbulence, which produces outwards particle, momentum and heat transport. The most advanced tools to study such problems are gyrokinetic codes, which solve the Boltzmann-Maxwell system in five-dimensional phase-space. In this work, the gyrokinetic global Eulerian code GT5D is presented, focusing on the different numerical schemes and on the parallelization. 4 Weak and strong scaling shows excellent scalability up to 10processors. Finally, a straight-field-line solver is pre-sented, which, for large plasma size, reduces the total memory by two orders of magnitude. Using Fourier transforms, this solver takes advantage of the alignment of turbulence with the magnetic field lines to reduce 3D turbulent fields to quasi-2D by filtering out non-aligned components. This procedure does not affect the steady state of GT5D simu-lations. Then, the code is applied to study the influence of plasma size scaling on plasma turbulence: worse-than-Bohm scaling is found for the first time in experimentally realistic gyrokinetic simulations. KEYWORDS: plasma turbulence, gyrokinetics, ITG 1 I. IntroductionThis paper presents the gyrokinetic global Eulerian GT5D 1) code, astate-of-the-art code to study Ion Temperature In fusion devices such as Tokamaks, the measured-3 Gradient (ITG) turbulence. GT5D memory scales like(ρ*)particle, momentum and heat transport perpendicular to the and rapidly becomes a burden. This paper presents the magnetic field greatly exceeds the neoclassical (resulting2) implementation of a new straight-field-line solverwhich from Coulomb collisions) predictions. This so-called greatly reduces the required memory and will enable the anomalous transport: isattributed to plasma turbulence simulation of much larger plasmas. gradients of density and temperature from the core to the The rest of the paper is organized as follows. The edge of the plasma provide a source of free energy for gyrokinetic equations and the associated numerical schemes electromagnetic perturbations to develop. The radially are presented in SectionII. The parallelization and scaling outwards transport degrades the energy confinement timeτE, properties are detailed in SectionIII. In SectionIV, the hence the fusion performance. Numerical simulations of implementation of the straight-field-line solver is presented plasma turbulence are extremely complex and require and some simulation results are shown in Section V. Finally, state-of-the-art computers: firstly, although the gyrokinetic conclusions are given in Section VI. theory reduces the dimensionality from 6 to 5 by getting rid of the fast (compared to the turbulent time scale) gyromotion II. The GT5D Code of particles, the number of grid points is typically of the 9-7 order of 10 . Time steps∆t~10 s mustbe used but profiles1. Gyrokinetic Model -3 typically evolve on a collision timeτii~10 s. Global gyroki-A full description of GT5D can be found in Reference 3. 5 6 netic simulations typically require 10-10 CPUhours. Themain numerical aspects are presented here for com-Realistic gyrokinetic simulations will need to solve simulta-pleteness. The Boltzmann equation reads: neously the ion and the electron dynamics, increasing the total cost by√(mi/me)~60 for Deuterium plasmas. One of the main parameters of turbulence isρ*=√(τ)ρL/a(1), where τ=Te/Ti isthe ion to electron temperature ratio,ρLis the . Larmor radius anda isthe minor radius of the Tokamak. -1 Wheref(R,v//,µ,t)the ion guiding distribution function, isR(ρ*) definesthe spatial resolution, and will increase by a is the guiding-center position,v//the velocity parallel to is factor 2-3 in future fusion devices such as ITER, for which -1 3 the magnetic fieldB,µis the magnetic moment andJpsis the (ρ.*) ~10Global gyrokinetic simulations of ITER retaining phase space Jacobian. The equations of motions are obtained all relevant physical effects are unreachable with today’s through: supercomputers. Consequently, many efforts must be undertaken to optimize the numerical algorithms and parallelization of gyrokinetic codes. *Corresponding author, E-mail: jolliet.sebastien@jaea.go.jp
and accuracy of a conservative Vlasov simulation. The ad-vection of the distribution function is split into the linear stiff motion and the nonlinear motion. The latter is advected ex-(2) plicitly while the linear motion is solved implicitly, based on 6) a additive semi-implicit Runge-Kutta method (ASIRK)in order to decrease the Courant-Friedrich-Levy number. A 9 huge linear implicit operator (~10degrees of freedom) is solved using a generalized conjugate residual method, which .(3) converges normally with several tens of iterations. Another advantage of separating the motion in this way is that both WhereB=Bb,B*=B+Bv///Ωi∇×b,B*//=b⋅B*,Ωi=qiB/(mic)the linear and nonlinear Hamiltonian functions are 4D func-is the cyclotron frequency and<⋅>α= 1/(2π)∫⋅dαis the gy-tions, which greatly reduces the needed memory. ro-averaging operator, whereαthe gyophase angle. The is Velocity derivatives of the distribution function appearing magnetic field can either be specified analytically or ob-in the collision operator are computed with a 6th order cen-tained numerically from an equilibrium solver. It is tered finite difference scheme.CT(f) iscalculated at each axisymetric and can be writtenB=Bϕeϕ+Bθeθ, whereBϕ is spatial grid point, andCF(f)computed numerically to en- is the toroidal component andBθis the poloidal component. force the conservation of density, momentum and energy. The collision operatorC(f) isa linearized, drift-kinetic 4) Therefore, at each time step a 3x3 matrix is solved at each Fokker-Planck operatorC(f) = CT(f)+CF(f) whereCT(f) is spatial grid point. the test-particle operator andCF(f) isthe field-particle oper-Then, the Poisson equation is solved using quadratic ator. In particular the field-particle operator is constructed B-splines finite elements on a 2D grid on flux coordinates. numerically in order to conserve density, parallel momentum 5) The toroidal angle is treated in Fourier space. This scheme and energy up to machine precision. -1 transforms Eq. (5) into a linear system. The matrix is hermi-The source operator isSsrc =Asrc(R)τsrc(fM1-fM2), where 2 tian, positive-definite with3(Nr+2)Nχcomplex elements, Asrcis a deposition profile,fM1andfM2are (shifted) Maxwel-whereNr andNχare the finite elements grid resolution. The lian distributions andτsrca time constant. isτsrcset by is 7 8 number of unknowns is typically 10-10 .The imposing no particle and momentum input but fixed power flux-surface-average operator must be treated separately: inputPin. 2 matrices mustbe stored. The linear system is then solved with a direct LAPACK solver using a LU decomposition. III. Parallelization (4) 1. Description -1 9 The sink operator isSsnk=Asnk(R)τsnk(f0-f), whereAsnk isa Evolving a distribution function of more than 10grid deposition profile,fthe initial distribution and isτa is 0snkpoints is of course impossible without parallelization. GT5D time constant. uses hybrid MPI and Open-MP parallelization. The Self-consistency is imposed by the quasi-neutrality equa-MPI-domain is a 3D domain(µ,R,Z). Sinceµis a constant of tion: motion, it only appears parametrically in the equations. Communications in theµdirection must be performed when one computesµ(5)) or derivatives(r.h.s. of Eq. integrals (5) (when computingCT(f)). In the(R,Z)the value of direction, boundary cells must be communicated after each advection. . The quasi-neutrality Eq.(5) is parallelized in the toroidal 6 angle directionϕ (n directionin Fourier space) using the WhereR+ρ isthe particle position, dZ=JpsdRdv//dµdα is (R,Z)All-to-all communications must be communicator. the phase space volume,ρti isthe Larmor radius evaluated performed to transpose the data from the(R,Z)the ton pa-with the thermal velocity,λDi,λDe arethe ion and electron rallelization. Such large parallel data transpose are also Debye lengths, <⋅>f isa flux-surface-average operator and needed for the collision operator, since theµ valuesof the n0eis the equilibrium electron density. Electrons are adiabat-distribution function are stored on different processors. ic. Dirichlet boundary conditions are imposed at the plasma edge, while a free boundary is imposed at the magnetic axis. 2. Scalability 2. ImplementationFigure 1 showsthe strong scaling speedup of GT5D on The kernel of the code aims at solving Eq.(1) with andifferent machine architectures, for a problem size of Eulerian scheme. The distribution function is therefore240×240×64×128×32. It is excellent up to 32,768 cores on discretized on a 5D gridNR,NZ,Nϕ,Nv//,Nw, where(R,ϕ,Z)are theSR16000 machine and up to 2,048 cores on the FX1 cylindrical coordinates, and is evolved using amachine. The scaling is good up to 8,192 processors on the Non-Dissipative Conservative Finite Difference schemeBX900 machine. Note that on this architecture, the peak 3) (NDCFD). Thisscheme ensures the numerical conservationperformance is 11.5% at 8,192 cores and goes up to 13.2% of the L1 and L2 norms, which are important for the stabilityfor typical production runs at 4,096 processors. The peak
Fig. 1scaling speedup achieved with GT5D on a Fujitsu StrongFig. 2scaling speedup achieved with GT5D on a Fujitsu Weak FX1 machine (blue,crosses) from 512 to 2,048 cores, on a Hita-BX900 machine from 2,000 to 16,000 cores. chi SR16000 machine (red, circles) from 8,192 to 32,768 cores and on a Fujitsu BX900 (green squares) from 1,024 to 8,192 poloidal space. Any poloidal coordinate may be used. A cores. The dashed line is the ideal scaling. convenient choice is to used the straight-field-line angleθ*defined by: performance is 10.5% on the FX1 machine at 2,048 cores and 8.7% on the SR16000 at 32,768 cores. The code performance can be improved by using . OpenMP parallelism on the BX900 machine. At fixed prob-Whereq(r)is the safety factor. This coordinate is such that lem size and number of cores, using OpenMP up to 4 threads magnetic field lines are straight in a(θ*,ϕ) plane.This decreases the cost MPI communications and increases the choice is adapted because the turbulence aligns with the sustained performance.Figure 2shows a weak scaling of magnetic field lines: one of the main assumptions of the GT5D for a fixed plasma sizeρ*=1/600(the largest that can gyrokinetic theory is the smallness of the ratiok///k⊥. Large enter the BX900 machine). The number of grid points per k//components are physically damped by Landau damping. 6 process is kept fixed (4⋅10 ) and the number of points in the Consequently, the turbulent Fourier spectrum of the turbu-toroidal direction is scaled proportionally with the number of lence is extremely narrow whenθ*used. Since is processors (this procedure is physically relevant as one does k//=(m-nq(ψ))/q(ψ)R, it means that relevant Fourier modes not need to simulate a full torus in the toroidal direction to are given by|m-nq(ψ)|<∆m, with∆ma small integer num-7) obtain a converged value of heat transport ). The largest ber (typically 5). The idea of the straight-field-line solver is resolution is 640×640×256×80×20. By going from 2,000 to then to filter out Fourier modes that are not satisfying this 16,000 cores (4 open MP PEs, 500 to 4,000 MPI PEs), the inequation. Each processor must now store degradation is only 17%, due to the increase of parallel 2 -1 3Nr(2∆m+1) Nϕ/4Pϕ∝(ρ*)complex elements, whereNϕis communications. the number of grid points in the toroidal direction andPϕis Finally, the scaling with plasma size has been checked by the number of processors used to parallelize the solver in running 10 iterations on 16,000 cores for plasma sizes rang-the toroidal direction. Remarkably, the value of∆mneither ing fromρ*=1/75 toρ*=1/600. The time per iteration for a depends on the plasma size nor on the magnetic geometry -3 fixed time step is scales likeρ*as can be anticipated theo-providedθ*used as the poloidal coordinate. An ITER is retically. However, in practice a simulation should be run up plasma would require only 80Mb. The Fourier filter de--1 totfin∝a/cs∝ρ*, such that practical simulations will scale pends on the magnetic surface through the safety factor -4 likeρ*. profile. It has been checked that this procedure does not affect the numerical particle and energy conservation prop-IV. The Straight-Field-Line Solver erties of the simulations. The memory of the field solver described in Section III Figure 3shows the poloidal cross-sections of the poten--3 scales as(ρ*). For standard simulations,ρ*=1/150 and tial for linear simulations with the original and the needed memory is 780Mb. For ITER plasmas, straight-field-line solvers. They are undistinguishable, ρ*=1/1000165 Gb would andshowing thbe required. Clearly, a new∆h to capture linear physics of atm=3 isenoug solver is needed. The straight-field-line solver presented in ITG turbulence. Growth rates differ by less than 1%. In non-Reference 2 has been implemented in GT5D. First, the qu-linear simulations, it must be checked that the filtering asi-neutrality linear systemAx=Btransformed into an is procedure does not affect the steady state of the simulation. -1 equivalent systemCy=D withC=FAF,y=Fx,D=FB and In ITG turbulent simulations, Coulomb collision processes Fthe Discrete Fourier Transform (DFT) operator in the is (resp. the perturbed electric field) induce neoclassical (resp.
Fig. 3 Poloidalcross-section of the potential for the original solver (left) and the straight-field-line solver with∆m=3.Axes unit isρs
Fig. 5 Radialprofile of time-averagedχi/χGBthree GT5D for simulations atρ*=1/100,1/150and1/225. Dashed-lines show a Bohm-like scaling.sitivity decrease (in the average sense) when the simulation time is increased, but it can be as high as 15% for the simu-lation times considered here. Since the effects of parallel filtering are smaller, or at most comparable to the effects of initial conditions, they can be considered to be not signifi-cant. V.ρ∗Scaling Plasma size is one of the main plasma parameters affect-ing fusion performance. For example, the Kadomstev 8)-2.7 Fig. 4 Radialprofile ofχi/χGB (top)and R0/LTifor (bottom) constraint leadstoBτE~(ρ*). This strong dependence is GT5D simulations with different∆mvalues one of the reasons why future devices will have a larger mi-nor radius. The confinement time will also depend on the turbulent activity in the plasma, but this dependence in yet turbulent) radial fluxes. For example, the radial turbulent still unknown. However, since ITER will haveρ* twiceas heat flux is defined by: small as present day tokamaks, the understanding of plasma size on plasma turbulence is of primary importance. Heat (7) .-α transport is characterized through the relationχi/χB∝(ρ*)-1 whereχB=(ρ*)χGB. Ifα= 1, transport is said to be Gyro-The associate transport coefficient is defined by:bohm; ifα= 0, transport is said to be Bohm. Finally, ifα< 0 transportis said to be worse-than-Bohm. Many scaling (8) 9) lawsassume Gyrobohm scaling, but both experimentsand 10) fluid simulationshave reported Bohm scaling. GT5D is a fixed-flux code: the source operatorSsrc inputs Fixed-gradient gyrokinetic simulations find a transition from some heat in the0<r/a<0.5region, which is transported ra-Bohm to Gyro-Bohm scaling whenρ* decreaseswhich dially and absorbed by the sink operator in the0.8 < r/a < 1region. Like in the experiments, the main output of the codedepends ontive an effecρ*defined asρ*eff=√(τ)ρLi/w where wthe temperature profile gradient width, indicating that is is the profiles (density, parallel momentum and temperature), finite size effects on transport might still be important for which are dictated by the input power and by plasma turbu-11) lence. ITER. However, fixed flux simulations do not fix the pro-files and the effects of plasma size for such physical model Figure 4shows the radial profiles ofχi/χGB andR/LTi, 2 must be studied. This becomes now possible with the whereχGB=(ρ*) csaa normalization coefficient and is straight-field-line solver implemented in GT5D. Simulations LTi=Ti/|∇Ti| isthe characteristic length of temperature for atρ*=1/100,1/150 and1/225have been performed by vary-simulations with different values of∆m, averaged over the ing plasma size and keeping the total input power fixed steady-state phase of the simulations.∆m=∞is run with the 6 (2 MW).The latter simulation has required 1.4⋅10 CPU Fourier solver and no filtering. This case is equivalent to the hours on the JAEA BX900 cluster. Assuming a direct rela-original solver (up to machine precision). The parallel filter-tion between the input and turbulent heat flux, transport ing does not significantly affect the steady state of the should be Gyro-Bohm like.Figure 5 showsthe simulations. The zonal flow amplitude differ maximum by time-averaged profile ofχi/χGB forthe 3 different 5% and differences up to 15% are observed forχi/χGB, which simulations, which exhibit worse-than-Bohm scaling in the are comparable to the intrinsic variability observed in GT5D source-free region0.5<r/a<0.8. Worse-than-Bohm scaling simulations. In other words, due to its chaotic behavior, 12) has already been observed for extremely steep profilesor plasma turbulence is sensitive to initial conditions: this sen-
Fig. 6and temporal evolution of Radialχi/χGBfor theρ*=1/225simulation
11) very small plasma sizes.It is observed for the first time in experimentally realistic conditions in this work. Indeed, asdisplayed onFig. 6,GT5D simulationsexhibit heat avalanches which propagate inwards or outwards depending 1) on the sign of the radial electric shear.This non-local processes may be responsible for the observed worse-than-Bohm scaling. The radial electric field profile is linked with the parallel momentum profile through the force 1) balance relation.Such complicated inter-dependence is yet still not understood. VI. Conclusions
In this work, the implementation of a new straight-field-line solver in the GT5D code is presented. Global gyrokinetic codes are the most advanced tools to study turbulence in magnetically confined plasmas and re-quire intensive optimizations and top-end HPC resources. GT5D has good scalability up to 16,000 processors, using an MPI-openMP hybrid parallelization scheme. The limiting solver in terms of memory has been replaced with a new scheme that takes advantage of the alignment of turbulence with the magnetic field lines by filtering small parallel wavelengths of the turbulent spectrum. For large plasma size, at least one order of magnitude is gained in the total memory, and the simulation results are unaffected by the Fourier filtering. Then, the code is applied to study the influence of plasma size on ITG turbulence. Results show a
worse-than-Bohm scaling in the source-free region. Larger plasma size simulations will be required to assess theρ*effects on ITER. Future developments will include the kinetic electron re-sponse and the extension to multiple ion species. Acknowledgment The simulations presented in this work were performed on the JAEA Altix3700Bx2 system and on the JAEA BX900 system. One of the authors (Y. I.) is supported by the MEXT, Grant No. 22866086. References 1)Y. Idomuraet al., “Study of ion turbulent transport and profile formations using global gyrokinetic full-fVlasov simulations,” Nucl. Fusion,49, 065029 (2009). 2)B. F. McMillanet al., “Rapid Fourier space solution of linear partial integro-differential equations in magnetic confinement geometries,”Comput. Phys. Commun.,181, 715 (2010). 3)Y. Idomuraet al., “Conservative global gyrokinetic toroidal full-f five dimensional Vlasov simulation,” Comput. Phys. Commun.,179, 391 (2008). 4)X. Q. Xu, M. N. Rosenbluth, “Numerical simulation of íon-temperature-gradient-driven modes,”Phys. Fluids,B3, 627 (1991). 5)S. Satakeet al., “Benchmark test of drift-kinetic and gyrokinetic codes through neoclassical transport simulations,” Comput. Phys. Commun.,181, 1069 (2010). 6)X. Zhong, “Additive Semi-Implicit Runge-Kutta Methods for Computing High-Speed Nonequilibrium Reactive Flows,”J. Comput. Phys.,128, 19 (1996). 7)B. F. Mcmillanet al., “System size effects on gyrokinetic turbulence,”Phys. Rev. Lett.,105, 155001 (2010). 8)B. B. Kadomstev, “Tokamaks and dimensional analysis.”Sov. J. Plasma Phys.,1, 295 (1975). 9)C. C. Pettyet al., “Gyroradius scaling of Electron and Ion Transport,”Phys. Rev. Lett.,74[10], 1763 (1995). 10)X. Garbet, R. E. Waltz, “Action at distance and Bohm scaling of turbulence in tokamaks,”Phys. Plasmas,3[5), 1898 (1996). 11)L. Villardet al., “Gyrokinetic simulations of turbulent transport: size scaling and chaotic behaviour,”Plasma Phys. Control. Fusion,52, 124038 (2010). 12)R. E. Waltzet al., “ Gyrokinetic turbulence simulations of profile shear stabilization and broken gyroBohm scaling,”Phys. Plasmas,9[5], 1938 (2002).