AST Transformations#
pystencils.backend.transformations
This module contains various transformation and optimization passes that can be executed on the backend AST.
Transformations#
Canonicalization#
- class pystencils.backend.transformations.CanonicalizeSymbols(ctx, constify=True)#
Remove duplicate symbol declarations and declare all non-updated symbols const.
The CanonicalizeSymbols pass will remove multiple declarations of the same symbol by renaming all but the last occurrence, and will optionally const-qualify all symbols encountered in the AST that are never updated.
- Parameters:
ctx (KernelCreationContext)
constify (bool)
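The renaming rule can be illustrated on a toy statement list. This is only a sketch and does not use the pystencils AST: the statement encoding, the helper `canonicalize` and the `name__N` naming scheme are all hypothetical.

```python
from collections import Counter

def canonicalize(stmts):
    """Toy model: stmts is a list of (declared_name, used_names) pairs.

    Every declaration of a name except the last one gets a fresh name
    (hypothetical scheme: name__0, name__1, ...); uses are rewritten to
    refer to the most recent declaration.
    """
    remaining = Counter(name for name, _ in stmts)  # declarations still ahead
    seen = Counter()   # declarations already processed, per name
    current = {}       # original name -> name of its latest declaration
    result = []
    for name, uses in stmts:
        new_uses = tuple(current.get(u, u) for u in uses)
        remaining[name] -= 1
        # Only the last declaration keeps the original name
        new_name = f"{name}__{seen[name]}" if remaining[name] > 0 else name
        seen[name] += 1
        current[name] = new_name
        result.append((new_name, new_uses))
    return result

stmts = [("x", ("a",)), ("y", ("x",)), ("x", ("b",)), ("z", ("x", "y"))]
canonical = canonicalize(stmts)
# -> [("x__0", ("a",)), ("y", ("x__0",)), ("x", ("b",)), ("z", ("x", "y"))]
```

After the renaming, every name is declared exactly once, which is the property later passes such as HoistIterationInvariantDeclarations rely on.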
AST Cloning#
- class pystencils.backend.transformations.CanonicalClone(ctx)#
Clone a subtree, and rename all symbols declared inside it to retain canonicality.
- Parameters:
ctx (KernelCreationContext)
Simplifying Transformations#
- class pystencils.backend.transformations.EliminateConstants(ctx, extract_constant_exprs=False, fold_integers=True, fold_relations=True, fold_floats=False)#
Eliminate constant expressions in various ways.
Constant folding: Nontrivial constant integer (and optionally floating point) expressions are evaluated and replaced by their result
Idempotence elimination: Idempotent operations (e.g. addition of zero, multiplication with one) are replaced by their result
Dominance elimination: Multiplication by zero is replaced by zero
Constant extraction: Optionally, nontrivial constant expressions are extracted and listed at the beginning of the outermost block.
- Parameters:
ctx (KernelCreationContext)
extract_constant_exprs (bool)
fold_integers (bool)
fold_relations (bool)
fold_floats (bool)
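The first three rewrite rules can be sketched on a toy expression tree. The tuple encoding and the function `simplify` below are hypothetical and independent of the pystencils AST classes; only integer `+` and `*` are handled, and constant extraction is not modelled.

```python
def simplify(expr):
    """Simplify a toy expression: int constants, str symbols, (op, lhs, rhs) tuples."""
    if not isinstance(expr, tuple):
        return expr
    op, lhs, rhs = expr
    lhs, rhs = simplify(lhs), simplify(rhs)
    # Constant folding: both operands are integer constants
    if isinstance(lhs, int) and isinstance(rhs, int):
        return lhs + rhs if op == "+" else lhs * rhs
    if op == "+":
        # Idempotence elimination: x + 0 -> x
        if lhs == 0:
            return rhs
        if rhs == 0:
            return lhs
    if op == "*":
        # Dominance elimination: x * 0 -> 0
        if lhs == 0 or rhs == 0:
            return 0
        # Idempotence elimination: x * 1 -> x
        if lhs == 1:
            return rhs
        if rhs == 1:
            return lhs
    return (op, lhs, rhs)

folded = simplify(("+", ("*", 2, 3), "x"))        # -> ("+", 6, "x")
idempotent = simplify(("*", ("+", "x", 0), 1))    # -> "x"
dominated = simplify(("*", "x", ("+", 1, -1)))    # -> 0
```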
- class pystencils.backend.transformations.TypifyAndFold(ctx)#
Compute data types and fold constants on AST nodes.
- Parameters:
ctx (KernelCreationContext)
- class pystencils.backend.transformations.EliminateBranches(ctx, use_isl=True)#
Replace conditional branches by their then- or else-branch if their condition can be unequivocally evaluated.
This pass will attempt to evaluate branch conditions within their context in the AST, and replace conditionals by either their then- or their else-block if the branch is unequivocal.
If islpy is installed, this pass will incorporate information about the iteration regions of enclosing loops and enclosing conditionals into its analysis.
- Parameters:
use_isl (bool, optional) – enable islpy based analysis (default: True)
ctx (KernelCreationContext)
Code Rewriting#
- pystencils.backend.transformations.substitute_symbols(node, subs)#
Substitute expressions for symbols throughout a subtree.
- Return type:
- Parameters:
node (PsAstNode)
subs (dict[PsSymbol, PsExpression])
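The idea of a subtree-wide substitution can be sketched on nested tuples. The encoding and the helper `substitute` are hypothetical; the real function operates on PsAstNode and PsSymbol objects.

```python
def substitute(expr, subs):
    """Replace every symbol (a str leaf) found in subs by its replacement expression."""
    if isinstance(expr, tuple):
        op, *args = expr
        # The operator tag is kept; only the operands are traversed
        return (op, *(substitute(a, subs) for a in args))
    return subs.get(expr, expr)

expr = ("+", "x", ("*", "y", "x"))
result = substitute(expr, {"x": ("-", "a", 1)})
# -> ("+", ("-", "a", 1), ("*", "y", ("-", "a", 1)))
```

In this sketch the replacement expressions are inserted as-is and are not themselves searched for further substitutions.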
- pystencils.backend.transformations.collapse_blocks(node)#
Collapse trivially nested blocks to improve readability.
Blocks that just have another block as their single child are collapsed.
- Return type:
PsStructuralNode
- Parameters:
node (PsStructuralNode)
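A minimal sketch of the collapsing rule, with blocks modelled as Python lists and statements as strings. The helper `collapse` is hypothetical and not the library implementation.

```python
def collapse(node):
    """Collapse blocks (lists) whose only child is another block."""
    if not isinstance(node, list):
        return node                      # a plain statement
    children = [collapse(c) for c in node]
    # A block that just wraps a single block is replaced by that inner block
    while len(children) == 1 and isinstance(children[0], list):
        children = children[0]
    return children

nested = collapse([[["a", "b"]]])        # -> ["a", "b"]
mixed = collapse(["a", [["b"]]])         # -> ["a", ["b"]]
```

In the second case the inner block is kept once, because the outer block has two children and is therefore not trivially nested.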
Code Motion#
- class pystencils.backend.transformations.HoistIterationInvariantDeclarations(ctx)#
Hoist loop-invariant declarations out of the loop nest.
This transformation moves loop-invariant symbol declarations outside of the loop nest to prevent their repeated execution within the loops. If this transformation results in the complete elimination of a loop body, the respective loop is removed.
HoistIterationInvariantDeclarations assumes that symbols are canonical; in particular, each symbol may have at most one declaration. To ensure this, a CanonicalizeSymbols pass should be run before HoistIterationInvariantDeclarations.
HoistIterationInvariantDeclarations assumes that all PsMathFunctions are pure (have no side effects), but makes no such assumption about instances of CFunction.
- Parameters:
ctx (KernelCreationContext)
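The invariance analysis can be sketched for a single loop on a toy model. The body encoding and the helper `hoist_invariants` are hypothetical; the sketch assumes canonical symbols (one declaration per name) and side-effect-free right-hand sides.

```python
def hoist_invariants(counter, body):
    """body: list of (declared_name, set_of_symbols_read) in program order.

    Returns (hoisted, remaining): declarations that can move in front of
    the loop, and declarations that must stay inside it.
    """
    variant = {counter}          # symbols that change between iterations
    hoisted, remaining = [], []
    for name, reads in body:
        if reads & variant:
            variant.add(name)    # depends on the counter, directly or transitively
            remaining.append((name, reads))
        else:
            hoisted.append((name, reads))
    return hoisted, remaining

body = [("a", {"alpha"}), ("b", {"a", "i"}), ("c", {"b"}), ("d", {"a", "beta"})]
hoisted, remaining = hoist_invariants("i", body)
# hoisted declares a and d; b and c stay inside the loop
```

If `remaining` came out empty, the loop body would be eliminated entirely, which is the case in which the pass removes the loop.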
Axis and Loop Transformations#
- class pystencils.backend.transformations.AxisExpansion(ctx)#
- Parameters:
ctx (KernelCreationContext)
- loop(coordinate=None)#
Expand one dimension fully as a loop.
- Return type:
ExpansionFunc
- Parameters:
coordinate (int | None)
- parallel_loop(coordinate=None, *, num_threads=None, schedule=None, collapse=None)#
Expand one dimension fully as a parallel loop.
- block_loop(block_size, coordinate=None, *, assume_divisible=False)#
Introduce a block loop with given block size in the given dimension.
- Parameters:
- Return type:
ExpansionFunc
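The iteration pattern a block loop produces can be sketched in plain Python. This illustrates loop blocking in general, not the code generated by pystencils; the helper `blocked_indices` is hypothetical.

```python
def blocked_indices(start, stop, block_size):
    """Visit [start, stop) block by block and return the visiting order."""
    visited = []
    for block_start in range(start, stop, block_size):      # block loop
        block_stop = min(block_start + block_size, stop)    # clamp the last block
        for i in range(block_start, block_stop):            # iterations of one block
            visited.append(i)
    return visited

order = blocked_indices(0, 10, 4)   # blocks [0, 4), [4, 8), [8, 10)
```

The clamp is only needed when the extent is not a multiple of the block size. It is an assumption, not something stated here, that `assume_divisible=True` corresponds to omitting such a guard.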
- parallel_block_loop(block_size, coordinate=None, *, assume_divisible=False, num_threads=None, schedule=None, collapse=None)#
Introduce a parallel block loop with given block size in the given dimension.
- Parameters:
- Return type:
ExpansionFunc
- peel_for_divisibility(k, coordinate=None)#
Peel off the minimal number of iterations from the back of one dimension such that the number of iterations in the bulk part is divisible by k.
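The arithmetic behind this split can be sketched as follows, assuming a unit step and concrete integer bounds. The helper `split_for_divisibility` is hypothetical.

```python
def split_for_divisibility(start, stop, k):
    """Split [start, stop) into a bulk part whose length is divisible by k
    and a remainder peeled off the back."""
    n = max(stop - start, 0)          # total number of iterations
    bulk_stop = stop - n % k          # peel the minimal n % k iterations
    return (start, bulk_stop), (bulk_stop, stop)

bulk, rest = split_for_divisibility(0, 10, 4)   # -> (0, 8), (8, 10)
exact = split_for_divisibility(3, 19, 8)        # -> ((3, 19), (19, 19))
```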
- simd(lanes)#
Apply to a cube with only one coordinate to convert it to a vectorized block
- Return type:
ExpansionFunc
- Parameters:
lanes (int)
- gpu_block(dim, coordinate=None)#
Map one cube coordinate onto the GPU block index in the given grid dimension.
- Parameters:
dim (str | GpuGridDimension) – GPU grid coordinate, "x", "y" or "z".
coordinate (int | None)
- Return type:
ExpansionFunc
- gpu_thread(dim, coordinate=None)#
Map one cube coordinate onto the GPU thread index in the given grid dimension.
- Parameters:
dim (str | GpuGridDimension) – GPU grid coordinate, "x", "y" or "z".
coordinate (int | None)
- Return type:
ExpansionFunc
- gpu_block_x_thread(dim, coordinate=None)#
Map one cube coordinate onto the product of GPU block and thread index in the given grid dimension.
- Parameters:
dim (str | GpuGridDimension) – GPU grid coordinate, "x", "y" or "z".
coordinate (int | None)
- Return type:
ExpansionFunc
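As a point of reference, the sketch below shows the conventional CUDA/HIP way of combining a block index and a thread index into one linear coordinate. That gpu_block_x_thread emits exactly this expression is an assumption; the helper `global_index` is hypothetical.

```python
def global_index(block_idx, block_dim, thread_idx):
    """Conventional linearization: blockIdx * blockDim + threadIdx."""
    return block_idx * block_dim + thread_idx

# 3 blocks of 4 threads each cover 12 consecutive coordinates
covered = [global_index(b, 4, t) for b in range(3) for t in range(4)]
```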
- class pystencils.backend.transformations.MaterializeAxes(ctx)#
Materialize iteration axes.
This transformer converts all iteration axes in a given AST to their lower-level implementation. It introduces loops for loop axes, OpenMP constructs for parallel loops, applies vectorization in SIMD axes and constructs GPU index translations for GPU axes.
The axis materializer furthermore introduces declarations and agglomeration of modulo variables for reductions occurring in the kernel.
- Parameters:
ctx (KernelCreationContext)
- class pystencils.backend.transformations.ReshapeLoops(ctx)#
Various transformations for reshaping loop nests.
- Parameters:
ctx (KernelCreationContext)
- peel_loop_front(loop, num_iterations, omit_range_check=False)#
Peel off iterations from the front of a loop.
Removes num_iterations from the front of the given loop and returns them as a sequence of independent blocks.
- Parameters:
- Return type:
- Returns:
Tuple containing the peeled-off iterations as a sequence of blocks, and the remaining loop.
- peel_loop_back(loop, num_iterations, omit_range_check=False)#
Peel off iterations from the back of a loop.
Removes num_iterations from the back of the given loop and returns them as a sequence of independent blocks.
- Parameters:
- Return type:
- Returns:
Tuple containing the modified loop and the peeled-off iterations (sequence of blocks).
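Both peeling operations, including the differing order of their return values, can be sketched on plain integer ranges. The sketch assumes a unit step and concrete bounds; the helpers `peel_front` and `peel_back` are hypothetical stand-ins, and the clamping plays the role of a range check.

```python
def peel_front(start, stop, num_iterations):
    """Return (peeled iterations, remaining loop range)."""
    new_start = min(start + num_iterations, stop)   # do not peel past the end
    return list(range(start, new_start)), range(new_start, stop)

def peel_back(start, stop, num_iterations):
    """Return (remaining loop range, peeled iterations)."""
    new_stop = max(stop - num_iterations, start)    # do not peel past the start
    return range(start, new_stop), list(range(new_stop, stop))

front, rest = peel_front(0, 10, 2)   # -> [0, 1], range(2, 10)
bulk, back = peel_back(0, 10, 3)     # -> range(0, 7), [7, 8, 9]
```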
- cut_loop(loop, cutting_points)#
Cut a loop at the given cutting points.
Cut the given loop at the iterations specified by the given cutting points, producing n new subtrees representing the iterations (loop.start:cutting_points[0]), (cutting_points[0]:cutting_points[1]), ..., (cutting_points[-1]:loop.stop).
Resulting subtrees representing zero iterations are dropped; subtrees representing exactly one iteration are returned without the trivial loop structure.
Currently, cut_loop performs no checks to ensure that the given cutting points are in fact inside the loop’s iteration range.
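The cutting rule can be sketched on integer ranges, assuming a unit step and cutting points that are sorted and lie within the iteration range. The helper `cut_range` is hypothetical; a bare integer stands for a single iteration without loop structure.

```python
def cut_range(start, stop, cutting_points):
    """Cut [start, stop) at the given points and return the resulting pieces."""
    bounds = [start, *cutting_points, stop]
    pieces = []
    for lo, hi in zip(bounds, bounds[1:]):
        if hi - lo <= 0:
            continue                      # zero iterations: dropped
        if hi - lo == 1:
            pieces.append(lo)             # one iteration: no loop structure
        else:
            pieces.append(range(lo, hi))  # proper sub-loop
    return pieces

pieces = cut_range(0, 10, [1, 1, 6])
# -> [0, range(1, 6), range(6, 10)]
```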
- class pystencils.backend.transformations.InsertPragmasAtLoops(ctx, insertions)#
Insert pragmas before loops in a loop nest.
This transformation augments the AST with pragma directives which are prepended to loops. The directives are annotated with the nesting depth of the loops they should be added to, where -1 indicates the innermost loop.
The relative order of pragmas with the (exact) same nesting depth is preserved; however, no guarantees are given about the relative order of pragmas inserted at -1 and at the actual depth of the innermost loop.
- Parameters:
ctx (KernelCreationContext)
insertions (Sequence[LoopPragma])
- class pystencils.backend.transformations.AddOpenMP(ctx, reductions=(), nesting_depth=0, num_threads=None, schedule=None, collapse=None, omit_parallel=False)#
Apply OpenMP directives to loop nests.
This transformation augments the AST with OpenMP pragmas according to the given configuration.
Vectorization#
- class pystencils.backend.transformations.VectorizationAxis(counter, step=PsConstantExpr(1: <untyped>))#
Information about the iteration axis along which a subtree is being vectorized.
- Parameters:
counter (PsSymbol)
step (PsExpression)
- step: PsExpression = PsConstantExpr(1: <untyped>)#
Step size of the scalar iteration
- class pystencils.backend.transformations.VectorizationContext(ctx, lanes, axis, vectorized_symbols=None)#
Context information for AST vectorization.
- Parameters:
lanes (int) – Number of vector lanes
axis (VectorizationAxis) – Iteration axis along which code is being vectorized
ctx (KernelCreationContext)
- property axis: VectorizationAxis#
Iteration axis along which to vectorize
- property vectorized_symbols: dict[PsSymbol, PsSymbol]#
Dictionary mapping scalar symbols that are being vectorized to their vectorized copies
- property lane_mask: PsSymbol | None#
Symbol representing the current lane execution mask, or None if all lanes are active.
- get_lane_mask_expr()#
Retrieve an expression representing the current lane execution mask.
- Return type:
- vectorize_symbol(symb)#
Vectorize the given symbol of scalar type.
Creates a duplicate of the given symbol with vectorized data type, adds it to the vectorized_symbols dict, and returns the duplicate.
- Raises:
VectorizationError – If the symbol’s data type was not a PsScalarType, or if the symbol was already vectorized
- Return type:
- Parameters:
symb (PsSymbol)
- vector_type(scalar_type)#
Vectorize the given scalar data type.
- Raises:
VectorizationError – If the given data type was not a PsScalarType.
- Return type:
- Parameters:
scalar_type (PsType)
- class pystencils.backend.transformations.AstVectorizer(ctx)#
Transform a scalar subtree into a SIMD-parallel version of itself.
The AstVectorizer constructs a vectorized copy of a subtree by creating a SIMD-parallel version of each of its nodes, one at a time. It relies on information given in a VectorizationContext that defines the current environment, including the vectorization axis, the number of vector lanes, and an execution mask determining which vector lanes are active.
Memory Accesses: The AST vectorizer is capable of vectorizing PsMemAcc and PsBufferAcc only under certain circumstances:
If all indices are independent of both the vectorization axis’ counter and any vectorized symbols, the memory access is lane-invariant, and its result will be broadcast to all vector lanes.
If at most one index depends on the axis counter via an affine expression, and does not depend on any vectorized symbols, the memory access can be performed in parallel, either contiguously or strided, and is replaced by a PsVecMemAcc.
All other cases cause vectorization to fail.
Legality: The AST vectorizer performs no legality checks and in particular assumes the absence of loop-carried dependencies; i.e. all iterations of the vectorized subtree must already be independent of each other, and insensitive to execution order.
Result and Failures: The AST vectorizer does not alter the original subtree, but constructs and returns a copy of it. Any symbols declared within the subtree are therein replaced by canonically renamed, vectorized copies of themselves.
If the AST vectorizer is unable to transform a subtree, it raises a VectorizationError.
- Parameters:
ctx (KernelCreationContext)
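The case distinction for memory accesses can be sketched as a small classifier. The encoding is hypothetical: each index is given as a dict mapping symbol names to affine coefficients, so non-affine dependencies cannot be expressed at all. The function `classify_access` is not part of pystencils.

```python
def classify_access(indices, counter, vectorized_symbols=frozenset()):
    """indices: one dict per index, mapping symbol names to affine coefficients."""
    if any(set(idx) & set(vectorized_symbols) for idx in indices):
        return "fail"          # an index depends on a vectorized symbol
    dependent = [idx for idx in indices if idx.get(counter, 0) != 0]
    if not dependent:
        return "broadcast"     # lane-invariant: same value in every lane
    if len(dependent) == 1:
        return "vector"        # contiguous or strided vector access
    return "fail"              # more than one index depends on the counter

a = classify_access([{"i": 1}, {"j": 1}], "i")             # -> "vector"
b = classify_access([{"j": 1}, {"k": 2}], "i")             # -> "broadcast"
c = classify_access([{"i": 1}, {"i": 2}], "i")             # -> "fail"
d = classify_access([{"i": 1, "t": 1}], "i", {"t"})        # -> "fail"
```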
- visit(node, vc)#
Vectorize a subtree.
- Return type:
- Parameters:
node (PsAstNode)
vc (VectorizationContext)
- visit_expr(expr, vc)#
Vectorize an expression.
- Return type:
- Parameters:
expr (PsExpression)
vc (VectorizationContext)
Code Lowering and Materialization#
- class pystencils.backend.transformations.ReductionsToMemory(ctx, reductions)#
Introduce IR nodes for performing reductions to memory.
This transformer takes a block and adds to it the declarations and write-back IR functions for the given list of reductions. Modulo variable declarations are prepended, and write-back logic is appended to the end of the block.
- Parameters:
ctx (KernelCreationContext)
reductions (Iterable[ReductionInfo])
- class pystencils.backend.transformations.LowerToC(ctx)#
Lower high-level IR constructs to C language concepts.
This pass will replace a number of IR constructs that have no direct counterpart in the C language by lower-level AST nodes. These include:
Linearization of Buffer Accesses: PsBufferAcc buffer accesses are linearized according to their buffers’ stride information and replaced by PsMemAcc.
Erasure of Anonymous Structs: For buffers whose element type is an anonymous struct, the struct type is erased from the base pointer, making it a pointer to uint8_t. Member lookups on accesses into these buffers are then transformed using type casts.
- Parameters:
ctx (KernelCreationContext)
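Linearization by strides amounts to a dot product of indices and strides. The sketch below shows that arithmetic with concrete integers; the helper `linearize` is hypothetical, and the exact expression pystencils emits (for example how a base offset is handled) is not specified here.

```python
def linearize(indices, strides):
    """Flatten a multi-dimensional index into a linear element offset."""
    if len(indices) != len(strides):
        raise ValueError("need exactly one stride per index")
    return sum(i * s for i, s in zip(indices, strides))

offset_2d = linearize((2, 3), (1, 10))          # 2*1 + 3*10 -> 32
offset_3d = linearize((1, 2, 3), (12, 4, 1))    # 12 + 8 + 3 -> 23
```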
- class pystencils.backend.transformations.SelectFunctions(platform)#
Traverse the AST to replace all instances of PsMathFunction by their implementation provided by the given Platform.
- Parameters:
platform (Platform)
- class pystencils.backend.transformations.SelectIntrinsics(ctx, use_builtin_convertvector=False)#
Lower IR vector types to intrinsic vector types, and IR vector operations to intrinsic vector operations.
Implementations of this transformation will replace all vectorial IR elements by conforming implementations using compiler intrinsics for the given execution platform.
A subclass implementing this visitor’s abstract methods must be set up for each vector CPU platform.
- Parameters:
ctx (KernelCreationContext) – The current kernel creation context
use_builtin_convertvector (bool) – If True, type conversions between SIMD vectors use the compiler builtin __builtin_convertvector instead of intrinsics. It is supported by Clang >= 3.7, GCC >= 9.1, and ICX. It is not supported by ICC or MSVC. Activate if you need type conversions not natively supported by your CPU, e.g. conversion from 64-bit integer to double on an x86 AVX machine. Defaults to False.
- Raises:
MaterializationError – If a vector type or operation cannot be represented by intrinsics on the given platform
- abstract type_intrinsic(vector_type, sc)#
Return the intrinsic vector type for the given generic vector type, or raise a MaterializationError if the type is not supported.
- Return type:
- Parameters:
vector_type (PsVectorType)
sc (SelectionContext)
- abstract constant_intrinsic(c, sc)#
Return an expression that initializes a constant vector, or raise a MaterializationError if not supported.
- Return type:
- Parameters:
c (PsConstant)
sc (SelectionContext)
- abstract op_intrinsic(expr, operands, sc)#
Return an expression intrinsically invoking the given operation, or raise a MaterializationError if not supported.
- Return type:
- Parameters:
expr (PsExpression)
operands (Sequence[PsExpression])
sc (SelectionContext)
- abstract math_func_intrinsic(expr, operands, sc)#
Return an expression intrinsically invoking the given mathematical function, or raise a MaterializationError if not supported.
- Return type:
- Parameters:
expr (PsCall)
operands (Sequence[PsExpression])
sc (SelectionContext)
- abstract vector_load(acc, sc)#
Return an expression intrinsically performing a vector load, or raise a MaterializationError if not supported.
- Return type:
- Parameters:
acc (PsVecMemAcc)
sc (SelectionContext)
- abstract vector_store(acc, arg, sc)#
Return an expression intrinsically performing a vector store, or raise a MaterializationError if not supported.
- Return type:
- Parameters:
acc (PsVecMemAcc)
arg (PsExpression)
sc (SelectionContext)