Loop Tiling
Loop Tiling
2024年12月23日修改
2024年3月17日创建
716
773
简要介绍
也叫 cache blocking, loop blocking
针对 nested loop 的优化技术。
核心思想:通过改变 loop 内的执行顺序,提高 cache 命中率。
1.
divide the iteration space of a nested loop into smaller rectangular blocks, or "tiles".
2.
These tiles are then processed in a specific order, rather than executing the entire loop nest in its original order.
The main benefits of loop tiling are:
1.
Improved cache utilization: By processing a smaller set of data at a time (the tiles), the program is more likely to keep the required data in the cache, reducing the number of cache misses and improving overall performance.
2.
Reduced memory access latency: When working with large data sets that do not fit entirely in the cache, tiling can help reduce the time spent accessing memory by keeping the working set smaller and more manageable.
3.
Better CPU utilization: Tiling can also help improve the CPU's ability to execute instructions in parallel, as the smaller working set can better fit the CPU's registers and functional units.
The process of loop tiling generally involves the following steps:
1.
Identifying the loop nest: The first step is to identify the nested loops in the program that would benefit from tiling.
2.
Determining the tile size: The size of the tiles is a crucial parameter that affects the performance of the tiled loop. The tile size should be chosen based on the characteristics of the target hardware, such as the cache size and memory bandwidth.
3.
Transforming the loop nest: The original loop nest is then transformed to process the tiles in the desired order, typically by adding additional loop levels to iterate over the tiles.
4.
Optimizing the tile size: The tile size may need to be tuned iteratively to find the optimal balance between reducing cache misses and maintaining a manageable working set.
理论基础
背景