Lecture 2: Sets and Functions - MA101: Calculus (Semester 1)

Written by
Julia D Harrison

Published by
veap

MA101: Calculus (Semester 1), Lecture 2: Sets and Functions. Tuesday, 20 September 2011.

CENTER FOR HIGH PERFORMANCE COMPUTING
GPUs for Scientific Computing
Wim R.M. Cardoen
Center for High Performance Computing
wim.cardoen@utah.edu
Fall 2011
Overview
•  Why GPUs?
•  Architecture
•  CUDA
•  Basic example(s)
•  Shared Memory
•  Libraries
•  CUDA Fortran
•  Alternatives to CUDA

12/20/11
Why GPUs?
M2090 (Fermi Architecture):
Peak performance: 665 GFLOP/s (DP) and 1331 GFLOP/s (SP)
Memory Bandwidth: 177 GB/s (no ECC)
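
One way to read these peak figures together is as a machine balance: how many floating-point operations the card can perform in the time it takes to fetch one byte from global memory. The sketch below is only illustrative arithmetic on the slide's numbers. The function and constant names are hypothetical, and real kernels rarely reach either peak.

```cpp
// Peak figures for the M2090 as quoted on the slide.
const double kPeakDpGflops = 665.0;   // double precision, GFLOP/s
const double kPeakSpGflops = 1331.0;  // single precision, GFLOP/s
const double kBandwidthGBs = 177.0;   // global memory bandwidth, GB/s (ECC off)

// FLOPs the GPU can execute in the time it takes to move one byte.
// GFLOP/s divided by GB/s leaves FLOP per byte.
double flops_per_byte(double peak_gflops, double bandwidth_gbs) {
    return peak_gflops / bandwidth_gbs;
}

// The same ratio expressed per operand (8 bytes for double, 4 for float).
double flops_per_operand(double peak_gflops, double bandwidth_gbs,
                         int operand_bytes) {
    return flops_per_byte(peak_gflops, bandwidth_gbs) * operand_bytes;
}
```

With the slide's numbers, `flops_per_operand(kPeakDpGflops, kBandwidthGBs, 8)` and `flops_per_operand(kPeakSpGflops, kBandwidthGBs, 4)` both come out at roughly 30. Under this simple model, a kernel needs on the order of 30 operations per value loaded from global memory before arithmetic, rather than memory traffic, becomes the limit.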
Architecture
•  CPU/Multi-GPU System HP-SL390

Source: K. Spafford, J. S. Meredith and J. S. Vetter. "Quantifying NUMA and Contention Effects in Multi-GPU Systems", Fourth Workshop on General-Purpose Computation on Graphics Processors (GPGPU), 2011
•  M2090:
o  SIMT (cf. SIMD)
o  16 SMPs (Streaming Multiprocessors)
o  32 cores per SMP => 16 x 32 = 512 cores
o  The 16 SMPs share a 768 kB L2 cache (new)
o  Constant Memory: 64 kB
o  Global Memory: 6 GB (GDDR5)
o  GPU clock speed: 1.3 GHz
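
The core count and clock speed listed here are enough to reproduce the peak rates quoted earlier for the M2090 (1331 GFLOP/s SP, 665 GFLOP/s DP). The sketch below rests on two assumptions that are not stated on the slide: each core retires one fused multiply-add (counted as 2 FLOPs) per cycle, and double precision runs at half the single-precision rate. The function names are hypothetical.

```cpp
// Total number of cores: SMPs times cores per SMP.
int total_cores(int smps, int cores_per_smp) {
    return smps * cores_per_smp;
}

// Theoretical single-precision peak in GFLOP/s.
// Assumption: one fused multiply-add (2 FLOPs) per core per cycle.
double peak_sp_gflops(int cores, double clock_ghz) {
    const double flops_per_core_per_cycle = 2.0;
    return cores * clock_ghz * flops_per_core_per_cycle;
}

// Theoretical double-precision peak in GFLOP/s.
// Assumption: double precision runs at half the single-precision rate.
double peak_dp_gflops(int cores, double clock_ghz) {
    return 0.5 * peak_sp_gflops(cores, clock_ghz);
}
```

For example, `peak_sp_gflops(total_cores(16, 32), 1.3)` gives 512 x 1.3 x 2 = 1331.2 GFLOP/s, and `peak_dp_gflops` gives half of that, 665.6 GFLOP/s, in line with the quoted figures.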
[Figure: Fermi architecture block diagram, showing the L2 cache]
Source: T. R. Halfhill. White Paper “Looking Beyond Graphics”
•  SMP:
o  Each SMP: 32 cores & 4 SFUs (Special Function Units)
o  Each core: FP/INT Unit
o  L1 Cache (new)
o  Each SMP: can manage 48 warps (1536 threads)
o  Warp Size: 32 threads
o  Shared memory (per block): up to 48 kB
o  #Registers (per block): 32768
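
These limits interact: the number of thread blocks that can be resident on one SMP at a time is bounded by whichever resource runs out first. The sketch below is a simplified model, not NVIDIA's occupancy calculator. It assumes the Fermi limit of 48 resident warps (1536 threads) per SMP, treats the 48 kB of shared memory and the 32768 registers as the pool available on one SMP, adds a cap of 8 resident blocks per SMP that is not on the slide, and ignores allocation granularity. All names are hypothetical.

```cpp
#include <algorithm>

// Per-SMP resource limits (Fermi values; see the assumptions in the text).
struct SmpLimits {
    int warp_size = 32;
    int max_warps = 48;                 // resident warps per SMP
    int shared_mem_bytes = 48 * 1024;   // shared memory pool
    int registers = 32768;              // 32-bit registers
    int max_blocks = 8;                 // assumption: not listed on the slide
};

// Warps needed by one block; a partially filled warp still occupies a slot.
int warps_per_block(int threads_per_block, const SmpLimits& lim) {
    return (threads_per_block + lim.warp_size - 1) / lim.warp_size;
}

// Blocks that fit on one SMP given per-thread registers and
// per-block shared memory (in bytes).
int resident_blocks(int threads_per_block, int regs_per_thread,
                    int smem_per_block, const SmpLimits& lim) {
    const int warps = warps_per_block(threads_per_block, lim);
    if (warps <= 0) return 0;
    const int by_warps = lim.max_warps / warps;
    const int regs_per_block = regs_per_thread * warps * lim.warp_size;
    const int by_regs = regs_per_block > 0 ? lim.registers / regs_per_block
                                           : lim.max_blocks;
    const int by_smem = smem_per_block > 0
                            ? lim.shared_mem_bytes / smem_per_block
                            : lim.max_blocks;
    return std::min(std::min(by_warps, by_regs),
                    std::min(by_smem, lim.max_blocks));
}

// Fraction of the SMP's warp slots that are occupied.
double occupancy(int threads_per_block, int regs_per_thread,
                 int smem_per_block, const SmpLimits& lim) {
    const int blocks = resident_blocks(threads_per_block, regs_per_thread,
                                       smem_per_block, lim);
    return static_cast<double>(blocks * warps_per_block(threads_per_block, lim))
           / lim.max_warps;
}
```

As an illustration, a 256-thread block using 20 registers per thread and 4 kB of shared memory fits 6 times on an SMP under this model (48 of 48 warp slots). Raising the register use to 32 per thread drops this to 4 blocks, or two thirds of the warp slots.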
•  SMP block diagram:
•  Multithreading in the Fermi architecture:

Source: T. R. Halfhill. White Paper “Looking Beyond Graphics”