pystencils.codegen.gpu_indexing.DynamicBlockSizeLaunchConfiguration
- class pystencils.codegen.gpu_indexing.DynamicBlockSizeLaunchConfiguration(rank, num_work_items, hw_props, assume_warp_aligned_block_size)
GPU launch configuration that dynamically computes the grid size from either the default block size or a computed block size. Computing a block size can be triggered via the trim_block_size() or fit_block_size() member functions. These functions adapt a user-defined initial block size that they receive as an argument. The adaptation works as follows.
For each dimension \(c \in \{ x, y, z \}\):
  - if fit_block_size() was chosen: the initial block size is adapted such that it aligns with multiples of the hardware’s warp size. A fitting algorithm first trims the initial block size to the iteration space and then increases it incrementally until it is large enough and coincides with a multiple of the warp size, i.e.
    block_size.c = _fit_block_size_to_it_space(iter_space.c, init_block_size.c, hardware_properties)
    The fitted block size is therefore also valid when the user sets GpuOptions.assume_warp_aligned_block_size.
  - elif trim_block_size() was chosen: the initial block size is trimmed to the number of work items in the kernel’s iteration space, i.e.
    if init_block_size.c > num_work_items.c, block_size.c = num_work_items.c
    otherwise, block_size.c = init_block_size.c
    When GpuOptions.assume_warp_aligned_block_size is set, warp alignment is ensured by rounding the block size dimension that is closest to the next multiple of the warp size.
  - otherwise: the default block size is taken, i.e.
    block_size.c = get_default_block_size(rank=3).c
The actual launch grid size is then computed as
grid_size.c = ceil(num_work_items.c / block_size.c).
- Parameters:
rank (int)
num_work_items (_Dim3Lambda)
hw_props (HardwareProperties)
assume_warp_aligned_block_size (bool)
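The trimming rule and the grid-size formula described above can be illustrated with a minimal sketch. It is not the pystencils implementation: the helper names trim_to_work_items and launch_grid_size are hypothetical, the work-item counts are plain integers rather than Lambda expressions, and the warp-alignment rounding and the fitting algorithm are deliberately left out.

```python
# Illustrative sketch only; these helpers are hypothetical and not part of pystencils.

def trim_to_work_items(init_block_size, num_work_items):
    """Per dimension: use the initial block size unless it exceeds the work items."""
    return tuple(min(b, n) for b, n in zip(init_block_size, num_work_items))


def launch_grid_size(num_work_items, block_size):
    """Per dimension: grid_size.c = ceil(num_work_items.c / block_size.c)."""
    # -(-n // b) is integer ceiling division and avoids floating-point rounding.
    return tuple(-(-n // b) for n, b in zip(num_work_items, block_size))


work_items = (100, 50, 1)

# An initial block size that is too large in x is cut down to the work items.
trimmed = trim_to_work_items((128, 2, 1), work_items)

# An initial block size that already fits is left unchanged.
unchanged = trim_to_work_items((32, 4, 1), work_items)

grid_trimmed = launch_grid_size(work_items, trimmed)
grid_unchanged = launch_grid_size(work_items, unchanged)
```

Because the grid size is rounded up, the launched grid may cover more threads than there are work items whenever a dimension is not an exact multiple of the block size.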
- property num_work_items: tuple[Lambda, Lambda, Lambda]
Lambda expressions that compute the number of work items in each iteration space dimension from kernel parameters.
- property block_size: tuple[int, int, int] | None
Block size is only available once evaluate() has been called.
- evaluate(**kwargs)
Compute block and grid size for a kernel launch.