AST Transformations#
pystencils.backend.transformations
This module contains various transformation and optimization passes that can be executed on the backend AST.
Canonical Form#
Many transformations in this module require that their input AST is in canonical form. This means that:
Each symbol, constant, and expression node is annotated with a data type;
Each symbol has at most one declaration;
Each symbol that is never written to apart from its declaration has a
const
type; andEach symbol whose type is not
const
has at least one non-declaring assignment.
The first requirement can be ensured by running the Typifier
on each newly constructed subtree.
The other three requirements are ensured by the CanonicalizeSymbols
pass,
which should be run first before applying any optimizing transformations.
All transformations in this module retain canonicality of the AST.
Canonicality allows transformations to forego various checks that would otherwise be necessary to prove their legality.
Certain transformations, like the LoopVectorizer
, state additional requirements, e.g.
the absence of loop-carried dependencies.
Transformations#
Canonicalization#
- class pystencils.backend.transformations.CanonicalizeSymbols(ctx, constify=True)#
Remove duplicate symbol declarations and declare all non-updated symbols
const
.The
CanonicalizeSymbols
pass will remove multiple declarations of the same symbol by renaming all but the last occurence, and will optionallyconst
-qualify all symbols encountered in the AST that are never updated.- Parameters:
ctx (KernelCreationContext)
constify (bool)
AST Cloning#
- class pystencils.backend.transformations.CanonicalClone(ctx)#
Clone a subtree, and rename all symbols declared inside it to retain canonicality.
- Parameters:
ctx (KernelCreationContext)
Simplifying Transformations#
- class pystencils.backend.transformations.EliminateConstants(ctx, extract_constant_exprs=False, fold_integers=True, fold_relations=True, fold_floats=False)#
Eliminate constant expressions in various ways.
Constant folding: Nontrivial constant integer (and optionally floating point) expressions are evaluated and replaced by their result
Idempotence elimination: Idempotent operations (e.g. addition of zero, multiplication with one) are replaced by their result
Dominance elimination: Multiplication by zero is replaced by zero
Constant extraction: Optionally, nontrivial constant expressions are extracted and listed at the beginning of the outermost block.
- Parameters:
ctx (KernelCreationContext)
extract_constant_exprs (bool)
fold_integers (bool)
fold_relations (bool)
fold_floats (bool)
- class pystencils.backend.transformations.EliminateBranches(ctx, use_isl=True)#
Replace conditional branches by their then- or else-branch if their condition can be unequivocally evaluated.
This pass will attempt to evaluate branch conditions within their context in the AST, and replace conditionals by either their then- or their else-block if the branch is unequivocal.
If islpy is installed, this pass will incorporate information about the iteration regions of enclosing loops and enclosing conditionals into its analysis.
- Parameters:
use_isl (bool, optional) – enable islpy based analysis (default: True)
ctx (KernelCreationContext)
Code Rewriting#
Code Motion#
- class pystencils.backend.transformations.HoistLoopInvariantDeclarations(ctx)#
Hoist loop-invariant declarations out of the loop nest.
This transformation moves loop-invariant symbol declarations outside of the loop nest to prevent their repeated execution within the loops. If this transformation results in the complete elimination of a loop body, the respective loop is removed.
HoistLoopInvariantDeclarations
assumes that symbols are canonical; in particular, each symbol may have at most one declaration. To ensure this, aCanonicalizeSymbols
pass should be run beforeHoistLoopInvariantDeclarations
.HoistLoopInvariantDeclarations
assumes that allPsMathFunction
s are pure (have no side effects), but makes no such assumption about instances ofCFunction
.- Parameters:
ctx (KernelCreationContext)
Loop Reshaping Transformations#
- class pystencils.backend.transformations.ReshapeLoops(ctx)#
Various transformations for reshaping loop nests.
- Parameters:
ctx (KernelCreationContext)
- peel_loop_front(loop, num_iterations, omit_range_check=False)#
Peel off iterations from the front of a loop.
Removes
num_iterations
from the front of the given loop and returns them as a sequence of independent blocks.- Parameters:
- Return type:
- Returns:
Tuple containing the peeled-off iterations as a sequence of blocks, and the remaining loop.
- peel_loop_back(loop, num_iterations, omit_range_check=False)#
Peel off iterations from the back of a loop.
Removes
num_iterations
from the back of the given loop and returns them as a sequence of independent blocks.- Parameters:
- Return type:
- Returns:
Tuple containing the modified loop and the peeled-off iterations (sequence of blocks).
- cut_loop(loop, cutting_points)#
Cut a loop at the given cutting points.
Cut the given loop at the iterations specified by the given cutting points, producing
n
new subtrees representing the iterations(loop.start:cutting_points[0]), (cutting_points[0]:cutting_points[1]), ..., (cutting_points[-1]:loop.stop)
.Resulting subtrees representing zero iterations are dropped; subtrees representing exactly one iteration are returned without the trivial loop structure.
Currently,
cut_loop
performs no checks to ensure that the given cutting points are in fact inside the loop’s iteration range.
- class pystencils.backend.transformations.InsertPragmasAtLoops(ctx, insertions)#
Insert pragmas before loops in a loop nest.
This transformation augments the AST with pragma directives which are prepended to loops. The directives are annotated with the nesting depth of the loops they should be added to, where
-1
indicates the innermost loop.The relative order of pragmas with the (exact) same nesting depth is preserved; however, no guarantees are given about the relative order of pragmas inserted at
-1
and at the actual depth of the innermost loop.- Parameters:
ctx (KernelCreationContext)
insertions (Sequence[LoopPragma])
- class pystencils.backend.transformations.AddOpenMP(ctx, nesting_depth=0, num_threads=None, schedule=None, collapse=None, omit_parallel=False)#
Apply OpenMP directives to loop nests.
This transformation augments the AST with OpenMP pragmas according to the given configuration.
Vectorization#
- class pystencils.backend.transformations.VectorizationAxis(counter, vectorized_counter=None, step=PsConstantExpr(1: <untyped>))#
Information about the iteration axis along which a subtree is being vectorized.
- Parameters:
counter (PsSymbol)
vectorized_counter (PsSymbol | None)
step (PsExpression)
-
step:
PsExpression
= PsConstantExpr(1: <untyped>)# Step size of the scalar iteration
- class pystencils.backend.transformations.VectorizationContext(ctx, lanes, axis, vectorized_symbols=None)#
Context information for AST vectorization.
- Parameters:
lanes (
int
) – Number of vector lanesaxis (
VectorizationAxis
) – Iteration axis along which code is being vectorizedctx (KernelCreationContext)
- property axis: VectorizationAxis#
Iteration axis along which to vectorize
- property vectorized_symbols: dict[PsSymbol, PsSymbol]#
Dictionary mapping scalar symbols that are being vectorized to their vectorized copies
- property lane_mask: PsSymbol | None#
Symbol representing the current lane execution mask, or
None
if all lanes are active.
- get_lane_mask_expr()#
Retrieve an expression representing the current lane execution mask.
- Return type:
- vectorize_symbol(symb)#
Vectorize the given symbol of scalar type.
Creates a duplicate of the given symbol with vectorized data type, adds it to the
vectorized_symbols
dict, and returns the duplicate.- Raises:
VectorizationError – If the symbol’s data type was not a
PsScalarType
, or if the symbol was already vectorized- Return type:
- Parameters:
symb (PsSymbol)
- vector_type(scalar_type)#
Vectorize the given scalar data type.
- Raises:
VectorizationError – If the given data type was not a
PsScalarType
.- Return type:
- Parameters:
scalar_type (PsType)
- class pystencils.backend.transformations.AstVectorizer(ctx)#
Transform a scalar subtree into a SIMD-parallel version of itself.
The
AstVectorizer
constructs a vectorized copy of a subtree by creating a SIMD-parallel version of each of its nodes, one at a time. It relies on information given in aVectorizationContext
that defines the current environment, including the vectorization axis, the number of vector lanes, and an execution mask determining which vector lanes are active.Memory Accesses: The AST vectorizer is capable of vectorizing
PsMemAcc
andPsBufferAcc
only under certain circumstances:If all indices are independent of both the vectorization axis’ counter and any vectorized symbols, the memory access is lane-invariant, and its result will be broadcast to all vector lanes.
If at most one index depends on the axis counter via an affine expression, and does not depend on any vectorized symbols, the memory access can be performed in parallel, either contiguously or strided, and is replaced by a
PsVecMemAcc
.All other cases cause vectorization to fail.
Legality: The AST vectorizer performs no legality checks and in particular assumes the absence of loop-carried dependencies; i.e. all iterations of the vectorized subtree must already be independent of each other, and insensitive to execution order.
Result and Failures: The AST vectorizer does not alter the original subtree, but constructs and returns a copy of it. Any symbols declared within the subtree are therein replaced by canonically renamed, vectorized copies of themselves.
If the AST vectorizer is unable to transform a subtree, it raises a
VectorizationError
.- Parameters:
ctx (KernelCreationContext)
- visit(node, vc)#
Vectorize a subtree.
- Return type:
- Parameters:
node (PsAstNode)
vc (VectorizationContext)
- visit_expr(expr, vc)#
Vectorize an expression.
- Return type:
- Parameters:
expr (PsExpression)
vc (VectorizationContext)
- class pystencils.backend.transformations.LoopVectorizer(ctx, lanes, trailing_iters=TrailingItersTreatment.SCALAR_LOOP)#
Vectorize loops.
The loop vectorizer provides methods to vectorize single loops inside an AST using a given number of vector lanes. During vectorization, the loop body is transformed using the
AstVectorizer
, The loop’s limits are adapted according to the number of vector lanes, and a block treating trailing iterations is optionally added.- Parameters:
ctx (
KernelCreationContext
) – The current kernel creation contextlanes (
int
) – The number of vector lanes to usetrailing_iters (
TrailingItersTreatment
) – Mode for the treatment of trailing iterations
- class TrailingItersTreatment(value)#
How to treat trailing iterations during loop vectorization.
- SCALAR_LOOP = 1#
Cover trailing iterations using a scalar remainder loop.
- MASKED_BLOCK = 2#
Cover trailing iterations using a masked block.
- NONE = 3#
Assume that the loop iteration count is a multiple of the number of lanes and do not cover any trailing iterations
- vectorize_select_loops(node, predicate)#
Select and vectorize loops from a syntax tree according to a predicate.
Finds each loop inside a subtree and evaluates
predicate
on them. Ifpredicate(loop)
evaluates toTrue
, the loop is vectorized.Loops nested inside a vectorized loop will not be processed.
Code Lowering and Materialization#
- class pystencils.backend.transformations.LowerToC(ctx)#
Lower high-level IR constructs to C language concepts.
This pass will replace a number of IR constructs that have no direct counterpart in the C language to lower-level AST nodes. These include:
Linearization of Buffer Accesses:
PsBufferAcc
buffer accesses are linearized according to their buffers’ stride information and replaced byPsMemAcc
.Erasure of Anonymous Structs: For buffers whose element type is an anonymous struct, the struct type is erased from the base pointer, making it a pointer to uint8_t. Member lookups on accesses into these buffers are then transformed using type casts.
- Parameters:
ctx (KernelCreationContext)
- class pystencils.backend.transformations.SelectFunctions(platform)#
Traverse the AST to replace all instances of
PsMathFunction
by their implementation provided by the givenPlatform
.- Parameters:
platform (Platform)
- class pystencils.backend.transformations.SelectIntrinsics(ctx, platform, use_builtin_convertvector=False)#
Lower IR vector types to intrinsic vector types, and IR vector operations to intrinsic vector operations.
This transformation will replace all vectorial IR elements by conforming implementations using compiler intrinsics for the given execution platform.
- Parameters:
ctx (
KernelCreationContext
) – The current kernel creation contextplatform (
GenericVectorCpu
) – Platform object representing the target hardware, which provides the intrinsicsuse_builtin_convertvector (
bool
) – IfTrue
, type conversions between SIMD vectors use the compiler builtin__builtin_convertvector
instead of instrinsics. It is supported by Clang >= 3.7, GCC >= 9.1, and ICX. Not supported by ICC or MSVC. Activate if you need type conversions not natively supported by your CPU, e.g. conversion from 64bit integer to double on an x86 AVX machine. Defaults toFalse
.
- Raises:
MaterializationError – If a vector type or operation cannot be represented by intrinsics on the given platform