AST Transformations

Contents

AST Transformations#

pystencils.backend.transformations

This module contains various transformation and optimization passes that can be executed on the backend AST.

Transformations#

Canonicalization#

class pystencils.backend.transformations.CanonicalizeSymbols(ctx, constify=True)#

Remove duplicate symbol declarations and declare all non-updated symbols const.

The CanonicalizeSymbols pass will remove multiple declarations of the same symbol by renaming all but the last occurence, and will optionally const-qualify all symbols encountered in the AST that are never updated.

Parameters:
__call__(node)#

Call self as a function.

Return type:

PsAstNode

Parameters:

node (PsAstNode)

AST Cloning#

class pystencils.backend.transformations.CanonicalClone(ctx)#

Clone a subtree, and rename all symbols declared inside it to retain canonicality.

Parameters:

ctx (KernelCreationContext)

__call__(node)#

Call self as a function.

Return type:

TypeVar(Node_T, bound= PsAstNode)

Parameters:

node (Node_T)

Simplifying Transformations#

class pystencils.backend.transformations.EliminateConstants(ctx, extract_constant_exprs=False, fold_integers=True, fold_relations=True, fold_floats=False)#

Eliminate constant expressions in various ways.

  • Constant folding: Nontrivial constant integer (and optionally floating point) expressions are evaluated and replaced by their result

  • Idempotence elimination: Idempotent operations (e.g. addition of zero, multiplication with one) are replaced by their result

  • Dominance elimination: Multiplication by zero is replaced by zero

  • Constant extraction: Optionally, nontrivial constant expressions are extracted and listed at the beginning of the outermost block.

Parameters:
__call__(node)#

Call self as a function.

Return type:

PsAstNode

Parameters:

node (PsAstNode)

class pystencils.backend.transformations.TypifyAndFold(ctx)#

Compute data types and fold constants on AST nodes.

Parameters:

ctx (KernelCreationContext)

class pystencils.backend.transformations.EliminateBranches(ctx, use_isl=True)#

Replace conditional branches by their then- or else-branch if their condition can be unequivocally evaluated.

This pass will attempt to evaluate branch conditions within their context in the AST, and replace conditionals by either their then- or their else-block if the branch is unequivocal.

If islpy is installed, this pass will incorporate information about the iteration regions of enclosing loops and enclosing conditionals into its analysis.

Parameters:
__call__(node)#

Call self as a function.

Return type:

PsAstNode

Parameters:

node (PsAstNode)

Code Rewriting#

pystencils.backend.transformations.substitute_symbols(node, subs)#

Substitute expressions for symbols throughout a subtree.

Return type:

PsAstNode

Parameters:
pystencils.backend.transformations.collapse_blocks(node)#

Collapse trivially nested blocks to improve readability.

Blocks that just have another block as their single child are collapsed.

Return type:

PsStructuralNode

Parameters:

node (PsStructuralNode)

Code Motion#

class pystencils.backend.transformations.HoistIterationInvariantDeclarations(ctx)#

Hoist loop-invariant declarations out of the loop nest.

This transformation moves loop-invariant symbol declarations outside of the loop nest to prevent their repeated execution within the loops. If this transformation results in the complete elimination of a loop body, the respective loop is removed.

HoistIterationInvariantDeclarations assumes that symbols are canonical; in particular, each symbol may have at most one declaration. To ensure this, a CanonicalizeSymbols pass should be run before HoistIterationInvariantDeclarations.

HoistIterationInvariantDeclarations assumes that all PsMathFunction s are pure (have no side effects), but makes no such assumption about instances of CFunction.

Parameters:

ctx (KernelCreationContext)

__call__(node)#

Call self as a function.

Return type:

PsAstNode

Parameters:

node (PsAstNode)

Axis and Loop Transformations#

class pystencils.backend.transformations.AxisExpansion(ctx)#
Parameters:

ctx (KernelCreationContext)

loop(coordinate=None)#

Expand one dimension fully as a loop.

Return type:

ExpansionFunc

Parameters:

coordinate (int | None)

parallel_loop(coordinate=None, *, num_threads=None, schedule=None, collapse=None)#

Expand one dimension fully as a parallel loop.

Parameters:
  • coordinate (Optional[int]) – Which dimension to expand. If None, the first dimension is used

  • num_threads (Optional[int]) – OpenMP num_threads clause

  • schedule (Optional[str]) – OpenMP schedule clause

  • collapse (Optional[int]) – OpenMP collapse clause

Return type:

ExpansionFunc

block_loop(block_size, coordinate=None, *, assume_divisible=False)#

Introduce a block loop with given block size in the given dimension.

Parameters:
  • block_size (int) – Block size of the block loop

  • coordinate (Optional[int]) – Which dimension to block; if None, the first dimension is used

  • assume_divisible (bool) – If True, assume that the iteration count is divisible by the block size and optimize accordingly.

Return type:

ExpansionFunc

parallel_block_loop(block_size, coordinate=None, *, assume_divisible=False, num_threads=None, schedule=None, collapse=None)#

Introduce a parallel block loop with given block size in the given dimension.

Parameters:
  • block_size (int) – Block size of the block loop

  • coordinate (Optional[int]) – Which dimension to block; if None, the first dimension is used

  • assume_divisible (bool) – If True, assume that the iteration count is divisible by the block size and optimize accordingly.

  • num_threads (Optional[int]) – OpenMP num_threads clause

  • schedule (Optional[str]) – OpenMP schedule clause

  • collapse (Optional[int]) – OpenMP collapse clause

Return type:

ExpansionFunc

peel_for_divisibility(k, coordinate=None)#

Peel off the minimal number of iterations from the back of one dimension such that the number of iterations in the bulk part is divisible by k.

Return type:

ExpansionFunc

Parameters:
  • k (int)

  • coordinate (int | None)

simd(lanes)#

Apply to a cube with only one coordinate to convert it to a vectorized block

Return type:

ExpansionFunc

Parameters:

lanes (int)

gpu_block(dim, coordinate=None)#

Map one cube coordinate onto the GPU block index in the given grid dimension.

Parameters:
Return type:

ExpansionFunc

gpu_thread(dim, coordinate=None)#

Map one cube coordinate onto the GPU thread index in the given grid dimension.

Parameters:
Return type:

ExpansionFunc

gpu_block_x_thread(dim, coordinate=None)#

Map one cube coordinate onto the product of GPU block and thread index in the given grid dimension.

Parameters:
Return type:

ExpansionFunc

class pystencils.backend.transformations.MaterializeAxes(ctx)#

Materialize iteration axes.

This transformer converts all iteration axis in a given AST to their lower-level implementation. It introduces loops for loop axes, OpenMP constructs for parallel loops, applies vectorization in SIMD axes and constructs GPU index translations for GPU axes.

The axis materializer furthermore introduces declarations and agglomeration of modulo variables for reductions occuring in the kernel.

Parameters:

ctx (KernelCreationContext)

class pystencils.backend.transformations.ReshapeLoops(ctx)#

Various transformations for reshaping loop nests.

Parameters:

ctx (KernelCreationContext)

peel_loop_front(loop, num_iterations, omit_range_check=False)#

Peel off iterations from the front of a loop.

Removes num_iterations from the front of the given loop and returns them as a sequence of independent blocks.

Parameters:
  • loop (PsLoop) – The loop node from which to peel iterations

  • num_iterations (int) – The number of iterations to peel off

  • omit_range_check (bool) – If set to True, assume that the peeled-off iterations will always be executed, and omit their enclosing conditional.

Return type:

tuple[Sequence[PsBlock], PsLoop]

Returns:

Tuple containing the peeled-off iterations as a sequence of blocks, and the remaining loop.

peel_loop_back(loop, num_iterations, omit_range_check=False)#

Peel off iterations from the back of a loop.

Removes num_iterations from the back of the given loop and returns them as a sequence of independent blocks.

Parameters:
  • loop (PsLoop) – The loop node from which to peel iterations

  • num_iterations (int) – The number of iterations to peel off

  • omit_range_check (bool) – If set to True, assume that the peeled-off iterations will always be executed, and omit their enclosing conditional.

Return type:

tuple[PsLoop, Sequence[PsBlock]]

Returns:

Tuple containing the modified loop and the peeled-off iterations (sequence of blocks).

cut_loop(loop, cutting_points)#

Cut a loop at the given cutting points.

Cut the given loop at the iterations specified by the given cutting points, producing n new subtrees representing the iterations (loop.start:cutting_points[0]), (cutting_points[0]:cutting_points[1]), ..., (cutting_points[-1]:loop.stop).

Resulting subtrees representing zero iterations are dropped; subtrees representing exactly one iteration are returned without the trivial loop structure.

Currently, cut_loop performs no checks to ensure that the given cutting points are in fact inside the loop’s iteration range.

Return type:

Sequence[PsLoop | PsBlock]

Returns:

Sequence of n subtrees representing the respective iteration ranges

Parameters:
class pystencils.backend.transformations.InsertPragmasAtLoops(ctx, insertions)#

Insert pragmas before loops in a loop nest.

This transformation augments the AST with pragma directives which are prepended to loops. The directives are annotated with the nesting depth of the loops they should be added to, where -1 indicates the innermost loop.

The relative order of pragmas with the (exact) same nesting depth is preserved; however, no guarantees are given about the relative order of pragmas inserted at -1 and at the actual depth of the innermost loop.

Parameters:
class pystencils.backend.transformations.AddOpenMP(ctx, reductions=(), nesting_depth=0, num_threads=None, schedule=None, collapse=None, omit_parallel=False)#

Apply OpenMP directives to loop nests.

This transformation augments the AST with OpenMP pragmas according to the given configuration.

Parameters:

Vectorization#

class pystencils.backend.transformations.VectorizationAxis(counter, step=PsConstantExpr(1: <untyped>))#

Information about the iteration axis along which a subtree is being vectorized.

Parameters:
counter: PsSymbol#

Scalar iteration counter of this axis

step: PsExpression = PsConstantExpr(1: <untyped>)#

Step size of the scalar iteration

class pystencils.backend.transformations.VectorizationContext(ctx, lanes, axis, vectorized_symbols=None)#

Context information for AST vectorization.

Parameters:
property lanes: int#

Number of vector lanes

property axis: VectorizationAxis#

Iteration axis along which to vectorize

property vectorized_symbols: dict[PsSymbol, PsSymbol]#

Dictionary mapping scalar symbols that are being vectorized to their vectorized copies

property lane_mask: PsSymbol | None#

Symbol representing the current lane execution mask, or None if all lanes are active.

property induction_vars: dict[PsSymbol, Affine]#

The vectorization context’s induction variables.

get_lane_mask_expr()#

Retrieve an expression representing the current lane execution mask.

Return type:

PsExpression

vectorize_symbol(symb)#

Vectorize the given symbol of scalar type.

Creates a duplicate of the given symbol with vectorized data type, adds it to the vectorized_symbols dict, and returns the duplicate.

Raises:

VectorizationError – If the symbol’s data type was not a PsScalarType, or if the symbol was already vectorized

Return type:

PsSymbol

Parameters:

symb (PsSymbol)

vector_type(scalar_type)#

Vectorize the given scalar data type.

Raises:

VectorizationError – If the given data type was not a PsScalarType.

Return type:

PsVectorType

Parameters:

scalar_type (PsType)

axis_ctr_dependees(symbols)#

Returns all symbols in symbols that depend on the axis counter.

Return type:

set[PsSymbol]

Parameters:

symbols (set[PsSymbol])

class pystencils.backend.transformations.AstVectorizer(ctx)#

Transform a scalar subtree into a SIMD-parallel version of itself.

The AstVectorizer constructs a vectorized copy of a subtree by creating a SIMD-parallel version of each of its nodes, one at a time. It relies on information given in a VectorizationContext that defines the current environment, including the vectorization axis, the number of vector lanes, and an execution mask determining which vector lanes are active.

Memory Accesses: The AST vectorizer is capable of vectorizing PsMemAcc and PsBufferAcc only under certain circumstances:

  • If all indices are independent of both the vectorization axis’ counter and any vectorized symbols, the memory access is lane-invariant, and its result will be broadcast to all vector lanes.

  • If at most one index depends on the axis counter via an affine expression, and does not depend on any vectorized symbols, the memory access can be performed in parallel, either contiguously or strided, and is replaced by a PsVecMemAcc.

  • All other cases cause vectorization to fail.

Legality: The AST vectorizer performs no legality checks and in particular assumes the absence of loop-carried dependencies; i.e. all iterations of the vectorized subtree must already be independent of each other, and insensitive to execution order.

Result and Failures: The AST vectorizer does not alter the original subtree, but constructs and returns a copy of it. Any symbols declared within the subtree are therein replaced by canonically renamed, vectorized copies of themselves.

If the AST vectorizer is unable to transform a subtree, it raises a VectorizationError.

Parameters:

ctx (KernelCreationContext)

visit(node, vc)#

Vectorize a subtree.

Return type:

PsAstNode

Parameters:
visit_expr(expr, vc)#

Vectorize an expression.

Return type:

PsExpression

Parameters:

Code Lowering and Materialization#

class pystencils.backend.transformations.ReductionsToMemory(ctx, reductions)#

Introduce IR nodes for performing reductions to memory.

This transformer takes a block and adds to it the declarations and write-back IR functions for the given list of reductions. Modulo variable declarations are prepended, and write-back logic is appended to the end of the block.

Parameters:
class pystencils.backend.transformations.LowerToC(ctx)#

Lower high-level IR constructs to C language concepts.

This pass will replace a number of IR constructs that have no direct counterpart in the C language to lower-level AST nodes. These include:

  • Linearization of Buffer Accesses: PsBufferAcc buffer accesses are linearized according to their buffers’ stride information and replaced by PsMemAcc.

  • Erasure of Anonymous Structs: For buffers whose element type is an anonymous struct, the struct type is erased from the base pointer, making it a pointer to uint8_t. Member lookups on accesses into these buffers are then transformed using type casts.

Parameters:

ctx (KernelCreationContext)

__call__(node)#

Call self as a function.

Return type:

PsAstNode

Parameters:

node (PsAstNode)

class pystencils.backend.transformations.SelectFunctions(platform)#

Traverse the AST to replace all instances of PsMathFunction by their implementation provided by the given Platform.

Parameters:

platform (Platform)

__call__(node)#

Call self as a function.

Return type:

PsAstNode

Parameters:

node (PsAstNode)

class pystencils.backend.transformations.SelectIntrinsics(ctx, use_builtin_convertvector=False)#

Lower IR vector types to intrinsic vector types, and IR vector operations to intrinsic vector operations.

Implementations of this transformation will replace all vectorial IR elements by conforming implementations using compiler intrinsics for the given execution platform.

A subclass implementing this visitor’s abstract methods must be set up for each vector CPU platform.

Parameters:
  • ctx (KernelCreationContext) – The current kernel creation context

  • use_builtin_convertvector (bool) – If True, type conversions between SIMD vectors use the compiler builtin __builtin_convertvector instead of instrinsics. It is supported by Clang >= 3.7, GCC >= 9.1, and ICX. Not supported by ICC or MSVC. Activate if you need type conversions not natively supported by your CPU, e.g. conversion from 64bit integer to double on an x86 AVX machine. Defaults to False.

Raises:

MaterializationError – If a vector type or operation cannot be represented by intrinsics on the given platform

abstract type_intrinsic(vector_type, sc)#

Return the intrinsic vector type for the given generic vector type, or raise a MaterializationError if type is not supported.

Return type:

PsCustomType

Parameters:
abstract constant_intrinsic(c, sc)#

Return an expression that initializes a constant vector, or raise a MaterializationError if not supported.

Return type:

PsExpression

Parameters:
abstract op_intrinsic(expr, operands, sc)#

Return an expression intrinsically invoking the given operation or raise a MaterializationError if not supported.

Return type:

PsExpression

Parameters:
abstract math_func_intrinsic(expr, operands, sc)#

Return an expression intrinsically invoking the given mathematical function or raise a MaterializationError if not supported.

Return type:

PsExpression

Parameters:
abstract vector_load(acc, sc)#

Return an expression intrinsically performing a vector load, or raise a MaterializationError if not supported.

Return type:

PsExpression

Parameters:
abstract vector_store(acc, arg, sc)#

Return an expression intrinsically performing a vector store, or raise a MaterializationError if not supported.

Return type:

PsExpression

Parameters: