- CUDA & GPGPU Programming
- Summary
- CUDA - Programming Model
- PTX Introduction
- Thread Hierarchy
- Memory Hierarchy
- Execution Model
- Hardware Model
- Warp
- Program Counters Within a Warp
- grid_size & block_size
- Triton - Programming Model
- CUDA vs. Triton: Scope Comparison
- Triton Programming Model
- Example Code
- Pass
- Data Layout
- Language
- CUDA program
- program structure
- CUDA Context
- CUDA Stream
- Memory Layout
- intra-SM parallelism
- Common Code Snippets
- peak_gflops
- Querying Device Info
- Performance Considerations
- Memory Access
- Latency Hiding
- Occupancy
- Memory Coalescing
- AoS vs. SoA
- Tiling
- Bank Conflict
CUDA & GPGPU Programming
Modified April 2 · Created September 30, 2023
Links:
- Lecture 26: GPU Programming (Fall 2022)
- CUDA C++ Programming Guide
- Berkeley - Understanding Latency Hiding on GPUs, August 12, 2016 (128-page PDF)
- Reusable software components for every layer of the CUDA programming model
Summary
Key points to master:
1. GPU architecture basics: the CUDA framework & core terminology.
   a. The basic concepts of grid / block / thread.
   b. Kernel function declaration & kernel launch.
   c. The SM / SP hardware execution model.
   d. Warps, the warp scheduler & warp-synchronicity.
2. Performance optimization, which boils down to two directions:
   a. I/O optimization (improving cache hit rates; using shared memory, registers, etc.).
   b. Parallelism planning (overlapping compute with I/O, I/O with I/O, and compute with compute).
3. Memory copy.
4. Streams & async copy.
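A minimal sketch tying these points together: a kernel declaration and launch (point 1), grid/block sizing, and host↔device copies issued asynchronously on a stream (points 3–4). All names (`saxpy`, `n`, the stream variable) are illustrative, not from any particular codebase.

```cuda
#include <cuda_runtime.h>

// Kernel declaration: one thread computes one element (point 1b).
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Pinned host memory is required for truly asynchronous copies.
    float *hx, *hy;
    cudaMallocHost(&hx, bytes);
    cudaMallocHost(&hy, bytes);
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    float *dx, *dy;
    cudaMalloc(&dx, bytes);
    cudaMalloc(&dy, bytes);

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Async copy -> kernel -> async copy, ordered on one stream;
    // work issued on other streams could overlap with this pipeline.
    cudaMemcpyAsync(dx, hx, bytes, cudaMemcpyHostToDevice, stream);
    cudaMemcpyAsync(dy, hy, bytes, cudaMemcpyHostToDevice, stream);

    int block_size = 256;                               // threads per block
    int grid_size = (n + block_size - 1) / block_size;  // blocks per grid
    saxpy<<<grid_size, block_size, 0, stream>>>(n, 2.0f, dx, dy);

    cudaMemcpyAsync(hy, dy, bytes, cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);  // wait for the whole pipeline

    cudaFree(dx); cudaFree(dy);
    cudaFreeHost(hx); cudaFreeHost(hy);
    cudaStreamDestroy(stream);
    return 0;
}
```

The `(n + block_size - 1) / block_size` ceiling division is the standard way to choose `grid_size` so that every element is covered; the `if (i < n)` guard handles the last, partially-full block.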