AST Transformations#

pystencils.backend.transformations

This module contains various transformation and optimization passes that can be executed on the backend AST.

Canonical Form#

Many transformations in this module require that their input AST is in canonical form. This means that:

  • Each symbol, constant, and expression node is annotated with a data type;

  • Each symbol has at most one declaration;

  • Each symbol that is never written to apart from its declaration has a const type; and

  • Each symbol whose type is not const has at least one non-declaring assignment.

The first requirement can be ensured by running the Typifier on each newly constructed subtree. The other three requirements are ensured by the CanonicalizeSymbols pass, which should be run first before applying any optimizing transformations. All transformations in this module retain canonicality of the AST.

Canonicality allows transformations to forego various checks that would otherwise be necessary to prove their legality.

Certain transformations, like the LoopVectorizer, state additional requirements, e.g. the absence of loop-carried dependencies.

Transformations#

Canonicalization#

class pystencils.backend.transformations.CanonicalizeSymbols(ctx, constify=True)#

Remove duplicate symbol declarations and declare all non-updated symbols const.

The CanonicalizeSymbols pass will remove multiple declarations of the same symbol by renaming all but the last occurence, and will optionally const-qualify all symbols encountered in the AST that are never updated.

Parameters:
__call__(node)#

Call self as a function.

Return type:

PsAstNode

Parameters:

node (PsAstNode)

AST Cloning#

class pystencils.backend.transformations.CanonicalClone(ctx)#

Clone a subtree, and rename all symbols declared inside it to retain canonicality.

Parameters:

ctx (KernelCreationContext)

__call__(node)#

Call self as a function.

Return type:

TypeVar(Node_T, bound= PsAstNode)

Parameters:

node (Node_T)

Simplifying Transformations#

class pystencils.backend.transformations.EliminateConstants(ctx, extract_constant_exprs=False, fold_integers=True, fold_relations=True, fold_floats=False)#

Eliminate constant expressions in various ways.

  • Constant folding: Nontrivial constant integer (and optionally floating point) expressions are evaluated and replaced by their result

  • Idempotence elimination: Idempotent operations (e.g. addition of zero, multiplication with one) are replaced by their result

  • Dominance elimination: Multiplication by zero is replaced by zero

  • Constant extraction: Optionally, nontrivial constant expressions are extracted and listed at the beginning of the outermost block.

Parameters:
__call__(node)#

Call self as a function.

Return type:

PsAstNode

Parameters:

node (PsAstNode)

class pystencils.backend.transformations.EliminateBranches(ctx, use_isl=True)#

Replace conditional branches by their then- or else-branch if their condition can be unequivocally evaluated.

This pass will attempt to evaluate branch conditions within their context in the AST, and replace conditionals by either their then- or their else-block if the branch is unequivocal.

If islpy is installed, this pass will incorporate information about the iteration regions of enclosing loops and enclosing conditionals into its analysis.

Parameters:
__call__(node)#

Call self as a function.

Return type:

PsAstNode

Parameters:

node (PsAstNode)

Code Rewriting#

pystencils.backend.transformations.substitute_symbols(node, subs)#

Substitute expressions for symbols throughout a subtree.

Return type:

PsAstNode

Parameters:

Code Motion#

class pystencils.backend.transformations.HoistLoopInvariantDeclarations(ctx)#

Hoist loop-invariant declarations out of the loop nest.

This transformation moves loop-invariant symbol declarations outside of the loop nest to prevent their repeated execution within the loops. If this transformation results in the complete elimination of a loop body, the respective loop is removed.

HoistLoopInvariantDeclarations assumes that symbols are canonical; in particular, each symbol may have at most one declaration. To ensure this, a CanonicalizeSymbols pass should be run before HoistLoopInvariantDeclarations.

HoistLoopInvariantDeclarations assumes that all PsMathFunction s are pure (have no side effects), but makes no such assumption about instances of CFunction.

Parameters:

ctx (KernelCreationContext)

__call__(node)#

Call self as a function.

Return type:

PsAstNode

Parameters:

node (PsAstNode)

Loop Reshaping Transformations#

class pystencils.backend.transformations.ReshapeLoops(ctx)#

Various transformations for reshaping loop nests.

Parameters:

ctx (KernelCreationContext)

peel_loop_front(loop, num_iterations, omit_range_check=False)#

Peel off iterations from the front of a loop.

Removes num_iterations from the front of the given loop and returns them as a sequence of independent blocks.

Parameters:
  • loop (PsLoop) – The loop node from which to peel iterations

  • num_iterations (int) – The number of iterations to peel off

  • omit_range_check (bool) – If set to True, assume that the peeled-off iterations will always be executed, and omit their enclosing conditional.

Return type:

tuple[Sequence[PsBlock], PsLoop]

Returns:

Tuple containing the peeled-off iterations as a sequence of blocks, and the remaining loop.

peel_loop_back(loop, num_iterations, omit_range_check=False)#

Peel off iterations from the back of a loop.

Removes num_iterations from the back of the given loop and returns them as a sequence of independent blocks.

Parameters:
  • loop (PsLoop) – The loop node from which to peel iterations

  • num_iterations (int) – The number of iterations to peel off

  • omit_range_check (bool) – If set to True, assume that the peeled-off iterations will always be executed, and omit their enclosing conditional.

Return type:

tuple[PsLoop, Sequence[PsBlock]]

Returns:

Tuple containing the modified loop and the peeled-off iterations (sequence of blocks).

cut_loop(loop, cutting_points)#

Cut a loop at the given cutting points.

Cut the given loop at the iterations specified by the given cutting points, producing n new subtrees representing the iterations (loop.start:cutting_points[0]), (cutting_points[0]:cutting_points[1]), ..., (cutting_points[-1]:loop.stop).

Resulting subtrees representing zero iterations are dropped; subtrees representing exactly one iteration are returned without the trivial loop structure.

Currently, cut_loop performs no checks to ensure that the given cutting points are in fact inside the loop’s iteration range.

Return type:

Sequence[PsLoop | PsBlock]

Returns:

Sequence of n subtrees representing the respective iteration ranges

Parameters:
class pystencils.backend.transformations.InsertPragmasAtLoops(ctx, insertions)#

Insert pragmas before loops in a loop nest.

This transformation augments the AST with pragma directives which are prepended to loops. The directives are annotated with the nesting depth of the loops they should be added to, where -1 indicates the innermost loop.

The relative order of pragmas with the (exact) same nesting depth is preserved; however, no guarantees are given about the relative order of pragmas inserted at -1 and at the actual depth of the innermost loop.

Parameters:
class pystencils.backend.transformations.AddOpenMP(ctx, nesting_depth=0, num_threads=None, schedule=None, collapse=None, omit_parallel=False)#

Apply OpenMP directives to loop nests.

This transformation augments the AST with OpenMP pragmas according to the given configuration.

Parameters:

Vectorization#

class pystencils.backend.transformations.VectorizationAxis(counter, vectorized_counter=None, step=PsConstantExpr(1: <untyped>))#

Information about the iteration axis along which a subtree is being vectorized.

Parameters:
counter: PsSymbol#

Scalar iteration counter of this axis

vectorized_counter: PsSymbol | None = None#

Vectorized iteration counter of this axis

step: PsExpression = PsConstantExpr(1: <untyped>)#

Step size of the scalar iteration

class pystencils.backend.transformations.VectorizationContext(ctx, lanes, axis, vectorized_symbols=None)#

Context information for AST vectorization.

Parameters:
property lanes: int#

Number of vector lanes

property axis: VectorizationAxis#

Iteration axis along which to vectorize

property vectorized_symbols: dict[PsSymbol, PsSymbol]#

Dictionary mapping scalar symbols that are being vectorized to their vectorized copies

property lane_mask: PsSymbol | None#

Symbol representing the current lane execution mask, or None if all lanes are active.

get_lane_mask_expr()#

Retrieve an expression representing the current lane execution mask.

Return type:

PsExpression

vectorize_symbol(symb)#

Vectorize the given symbol of scalar type.

Creates a duplicate of the given symbol with vectorized data type, adds it to the vectorized_symbols dict, and returns the duplicate.

Raises:

VectorizationError – If the symbol’s data type was not a PsScalarType, or if the symbol was already vectorized

Return type:

PsSymbol

Parameters:

symb (PsSymbol)

vector_type(scalar_type)#

Vectorize the given scalar data type.

Raises:

VectorizationError – If the given data type was not a PsScalarType.

Return type:

PsVectorType

Parameters:

scalar_type (PsType)

axis_ctr_dependees(symbols)#

Returns all symbols in symbols that depend on the axis counter.

Return type:

set[PsSymbol]

Parameters:

symbols (set[PsSymbol])

class pystencils.backend.transformations.AstVectorizer(ctx)#

Transform a scalar subtree into a SIMD-parallel version of itself.

The AstVectorizer constructs a vectorized copy of a subtree by creating a SIMD-parallel version of each of its nodes, one at a time. It relies on information given in a VectorizationContext that defines the current environment, including the vectorization axis, the number of vector lanes, and an execution mask determining which vector lanes are active.

Memory Accesses: The AST vectorizer is capable of vectorizing PsMemAcc and PsBufferAcc only under certain circumstances:

  • If all indices are independent of both the vectorization axis’ counter and any vectorized symbols, the memory access is lane-invariant, and its result will be broadcast to all vector lanes.

  • If at most one index depends on the axis counter via an affine expression, and does not depend on any vectorized symbols, the memory access can be performed in parallel, either contiguously or strided, and is replaced by a PsVecMemAcc.

  • All other cases cause vectorization to fail.

Legality: The AST vectorizer performs no legality checks and in particular assumes the absence of loop-carried dependencies; i.e. all iterations of the vectorized subtree must already be independent of each other, and insensitive to execution order.

Result and Failures: The AST vectorizer does not alter the original subtree, but constructs and returns a copy of it. Any symbols declared within the subtree are therein replaced by canonically renamed, vectorized copies of themselves.

If the AST vectorizer is unable to transform a subtree, it raises a VectorizationError.

Parameters:

ctx (KernelCreationContext)

visit(node, vc)#

Vectorize a subtree.

Return type:

PsAstNode

Parameters:
visit_expr(expr, vc)#

Vectorize an expression.

Return type:

PsExpression

Parameters:
class pystencils.backend.transformations.LoopVectorizer(ctx, lanes, trailing_iters=TrailingItersTreatment.SCALAR_LOOP)#

Vectorize loops.

The loop vectorizer provides methods to vectorize single loops inside an AST using a given number of vector lanes. During vectorization, the loop body is transformed using the AstVectorizer, The loop’s limits are adapted according to the number of vector lanes, and a block treating trailing iterations is optionally added.

Parameters:
class TrailingItersTreatment(value)#

How to treat trailing iterations during loop vectorization.

SCALAR_LOOP = 1#

Cover trailing iterations using a scalar remainder loop.

MASKED_BLOCK = 2#

Cover trailing iterations using a masked block.

NONE = 3#

Assume that the loop iteration count is a multiple of the number of lanes and do not cover any trailing iterations

vectorize_select_loops(node, predicate)#

Select and vectorize loops from a syntax tree according to a predicate.

Finds each loop inside a subtree and evaluates predicate on them. If predicate(loop) evaluates to True, the loop is vectorized.

Loops nested inside a vectorized loop will not be processed.

Parameters:
  • node (PsAstNode) – Root of the subtree to process

  • predicate (Callable[[PsLoop], bool]) – Callback telling the vectorizer which loops to vectorize

Return type:

PsAstNode

vectorize_loop(loop)#

Vectorize the given loop.

Return type:

PsLoop | PsBlock

Parameters:

loop (PsLoop)

Code Lowering and Materialization#

class pystencils.backend.transformations.LowerToC(ctx)#

Lower high-level IR constructs to C language concepts.

This pass will replace a number of IR constructs that have no direct counterpart in the C language to lower-level AST nodes. These include:

  • Linearization of Buffer Accesses: PsBufferAcc buffer accesses are linearized according to their buffers’ stride information and replaced by PsMemAcc.

  • Erasure of Anonymous Structs: For buffers whose element type is an anonymous struct, the struct type is erased from the base pointer, making it a pointer to uint8_t. Member lookups on accesses into these buffers are then transformed using type casts.

Parameters:

ctx (KernelCreationContext)

__call__(node)#

Call self as a function.

Return type:

PsAstNode

Parameters:

node (PsAstNode)

class pystencils.backend.transformations.SelectFunctions(platform)#

Traverse the AST to replace all instances of PsMathFunction by their implementation provided by the given Platform.

Parameters:

platform (Platform)

__call__(node)#

Call self as a function.

Return type:

PsAstNode

Parameters:

node (PsAstNode)

class pystencils.backend.transformations.SelectIntrinsics(ctx, platform, use_builtin_convertvector=False)#

Lower IR vector types to intrinsic vector types, and IR vector operations to intrinsic vector operations.

This transformation will replace all vectorial IR elements by conforming implementations using compiler intrinsics for the given execution platform.

Parameters:
  • ctx (KernelCreationContext) – The current kernel creation context

  • platform (GenericVectorCpu) – Platform object representing the target hardware, which provides the intrinsics

  • use_builtin_convertvector (bool) – If True, type conversions between SIMD vectors use the compiler builtin __builtin_convertvector instead of instrinsics. It is supported by Clang >= 3.7, GCC >= 9.1, and ICX. Not supported by ICC or MSVC. Activate if you need type conversions not natively supported by your CPU, e.g. conversion from 64bit integer to double on an x86 AVX machine. Defaults to False.

Raises:

MaterializationError – If a vector type or operation cannot be represented by intrinsics on the given platform