CENTER FOR HIGH PERFORMANCE COMPUTING
GPUs for Scientific Computing
Wim R.M. Cardoen
Center for High Performance Computing
wim.cardoen@utah.edu
Fall 2011
Overview
• Why GPUs?
• Architecture
• CUDA
• Basic example(s)
• Shared Memory
• Libraries
• CUDA Fortran
• Alternatives to CUDA
12/20/11
Why GPUs?
M2090 (Fermi Architecture):
Peak performance: 665 GFLOP/s (DP) & 1331 GFLOP/s (SP)
Memory Bandwidth: 177 GB/s (no ECC)
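One way to read these peak figures together is the ratio of arithmetic throughput to memory bandwidth. The sketch below only divides the numbers quoted above; the function names are illustrative, and the 8-byte and 4-byte operand sizes assume IEEE double and single precision.

```python
# Peak figures for the M2090 as quoted above (vendor peak values, not measurements).
PEAK_DP_GFLOPS = 665.0      # double precision, GFLOP/s
PEAK_SP_GFLOPS = 1331.0     # single precision, GFLOP/s
MEM_BANDWIDTH_GBS = 177.0   # global memory bandwidth, GB/s (no ECC)


def flops_per_byte(peak_gflops, bandwidth_gbs):
    """Floating-point operations available per byte read from global memory."""
    return peak_gflops / bandwidth_gbs


def flops_per_operand(peak_gflops, bandwidth_gbs, bytes_per_operand):
    """Operations needed per operand loaded to keep the cores busy at peak."""
    return flops_per_byte(peak_gflops, bandwidth_gbs) * bytes_per_operand


if __name__ == "__main__":
    dp = flops_per_operand(PEAK_DP_GFLOPS, MEM_BANDWIDTH_GBS, 8)
    sp = flops_per_operand(PEAK_SP_GFLOPS, MEM_BANDWIDTH_GBS, 4)
    print(f"DP: {dp:.1f} operations per double loaded")
    print(f"SP: {sp:.1f} operations per float loaded")
```

Both ratios come out at roughly 30 operations per operand loaded, which suggests that kernels doing little arithmetic per element are likely to be limited by memory bandwidth rather than by the cores.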
Architecture
• CPU/Multi-GPU System HP-SL390
Source: K. Spafford, J. S. Meredith and J. S. Vetter. "Quantifying NUMA and Contention Effects in
Multi-GPU Systems", Fourth Workshop on General-Purpose Computation on Graphics Processors
(GPGPU), 2011
• M2090:
o SIMT (Single Instruction, Multiple Threads; cf. SIMD)
o 16 SMPs (Streaming Multiprocessors)
o 32 cores per SMP => 16 x 32 = 512 cores
o L2 cache: 768 kB, shared by all 16 SMPs (new)
o Constant Memory: 64 kB
o Global Memory: 6 GB (GDDR5)
o GPU clock speed: 1.3 GHz
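The peak GFLOP/s figures quoted for the M2090 can be reproduced from the core count and clock speed in this list. This is a back-of-the-envelope sketch. It assumes each core completes one fused multiply-add (counted as 2 floating-point operations) per clock cycle, and that double precision runs at half the single-precision rate. Neither assumption is stated on the slide, and the function name is illustrative.

```python
# Values taken from the M2090 list above.
SMP_COUNT = 16
CORES_PER_SMP = 32
CLOCK_GHZ = 1.3

# Assumptions (not stated on the slide):
FLOPS_PER_CORE_PER_CYCLE = 2   # one fused multiply-add = 2 operations
DP_TO_SP_RATIO = 0.5           # double precision at half the SP rate


def peak_gflops(smp_count, cores_per_smp, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFLOP/s: cores x clock (GHz) x operations per cycle."""
    total_cores = smp_count * cores_per_smp
    return total_cores * clock_ghz * flops_per_cycle


if __name__ == "__main__":
    sp = peak_gflops(SMP_COUNT, CORES_PER_SMP, CLOCK_GHZ, FLOPS_PER_CORE_PER_CYCLE)
    dp = sp * DP_TO_SP_RATIO
    print(f"Cores: {SMP_COUNT * CORES_PER_SMP}")
    print(f"Peak SP: {sp:.1f} GFLOP/s")
    print(f"Peak DP: {dp:.1f} GFLOP/s")
```

Under these assumptions the sketch gives 1331.2 GFLOP/s (SP) and 665.6 GFLOP/s (DP), consistent with the quoted 1331 and 665.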
Figure: Fermi architecture block diagram (L2 cache labeled)
Source: T. R. Halfhill. White Paper “Looking Beyond Graphics”
• SMP:
o Each SMP: 32 cores & 4 SFUs (Special Function Units)
o Each core: FP/INT Unit
o L1 Cache (new)
o Each SMP: can manage 48 warps (48 x 32 = 1536 threads)
o Warp Size: 32 threads
o Shared memory (per block): 48 kB
o #Registers (per block): 32768
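The limits in this list determine how many thread blocks can be resident on one SMP at the same time. The following is a simplified sketch, not the exact hardware rule. It assumes the Fermi (compute capability 2.x) limits of 48 resident warps (1536 threads) and 8 resident blocks per SMP, and it treats the 32768 registers and 48 kB of shared memory as the budget of one SMP. It ignores the hardware's allocation granularity, so real occupancy can be lower. The function names are hypothetical.

```python
WARP_SIZE = 32
MAX_WARPS_PER_SMP = 48            # assumption: Fermi limit (1536 threads)
MAX_BLOCKS_PER_SMP = 8            # assumption: Fermi limit
REGISTERS_PER_SMP = 32768         # from the list above, taken as the SMP budget
SHARED_MEM_PER_SMP = 48 * 1024    # bytes, from the list above


def warps_per_block(threads_per_block):
    """Number of warps needed for one block (partial warps count as whole)."""
    return (threads_per_block + WARP_SIZE - 1) // WARP_SIZE


def resident_blocks(threads_per_block, regs_per_thread, shared_bytes_per_block):
    """Blocks that fit on one SMP: the tightest of the four resource limits."""
    warps = warps_per_block(threads_per_block)
    limits = [MAX_BLOCKS_PER_SMP, MAX_WARPS_PER_SMP // warps]
    if regs_per_thread > 0:
        limits.append(REGISTERS_PER_SMP // (regs_per_thread * warps * WARP_SIZE))
    if shared_bytes_per_block > 0:
        limits.append(SHARED_MEM_PER_SMP // shared_bytes_per_block)
    return min(limits)


def occupancy(threads_per_block, regs_per_thread, shared_bytes_per_block):
    """Fraction of the 48 warp slots that are filled by resident blocks."""
    blocks = resident_blocks(threads_per_block, regs_per_thread, shared_bytes_per_block)
    return blocks * warps_per_block(threads_per_block) / MAX_WARPS_PER_SMP


if __name__ == "__main__":
    for regs in (20, 32):
        occ = occupancy(256, regs, 4096)
        print(f"256 threads/block, {regs} registers/thread: occupancy {occ:.2f}")
```

In this model a block of 256 threads using 20 registers per thread fills all 48 warp slots, while raising the register use to 32 per thread leaves room for only 4 blocks (32 warps), so register pressure becomes the limiting resource.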
• SMP block diagram:
• Multithreading in Fermi Arch.:
Source: T. R. Halfhill. White Paper “Looking Beyond Graphics”