How to Add a New Size Processor

This guide first briefly describes the size processing system in NIMBLE. It then describes the process of adding a new size processor and lists some commonly used size processors. Note that NIMBLE’s size processing functionality can largely be found in genCpp_sizeProcessing.R.

Size Processing Overview

The size processing step in NIMBLE proceeds by traveling recursively through the syntax tree, annotating each node with dimensionality, type, size expressions, and eigenizability (“yes”, “no”, or “maybe” for conversion to C++ Eigen package code). It also generates run-time size checks, inserts intermediate variables, and populates the local symbol table with locally created objects.

Nodes of the syntax tree are represented as exprClass objects. Each node of the syntax tree (as given by the code argument to a size processor) should exit the size processor with the following fields set:

- code$nDim: the number of dimensions of the object. Must be an integer.
- code$type: the type of the object, e.g. “double”.
- code$sizeExprs: usually a list of length nDim whose entries are R expressions (not exprClass objects) giving the size of the object in each dimension. Entries can be constants, e.g. sizeExprs[[1]] <- 5 if the first dimension of this object is known to have length 5. Alternatively, an entry can be a general expression, e.g. sizeExprs[[1]] <- quote(dim(x)[1]), which makes the first dimension of this object the same size as the first dimension of x.
  - In some cases, general sizeExprs should be set using one of two functions:
    - productSizeExprs() should be used when the current node should have a single dimension whose size is the product of the dimensions in another node's sizeExprs.
    - makeSizeExpressions() (located in genCpp_initSizes.R) should be used when the sizeExprs for a node are a combination of constants and expressions.
  - An exception: for a syntax tree node representing a nimbleList or a nimbleFunction, code$sizeExprs contains the symbolTable entry of the corresponding nimbleList or nimbleFunction. This allows information from these objects (e.g. the elements of a nimbleList) to be accessed easily at other stages of size processing.
- code$toEigenize: “yes”, “no”, or “maybe”. Indicates whether the node should be converted to code using the C++ Eigen package.
- code$name: the name of the node being size processed. For a function call such as foo(x), code$name is "foo". Typically the name is already set and does not need to be modified, but sometimes a size processor splits out different cases for C++ by changing the name; e.g., values can be changed to setValues.
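
To make the field contract concrete, here is a small sketch that annotates two nodes by hand. It is not NIMBLE code: `mockNode()` and `evalSizes()` are hypothetical helpers, and an environment stands in for an exprClass object. What it illustrates is that sizeExprs entries are ordinary R constants or expressions that refer to other objects by name.

```r
# Hypothetical stand-in for an exprClass node (not NIMBLE code). exprClass is
# a reference class; an environment gives the same by-reference behavior.
mockNode <- function(name, args = list()) {
    node <- new.env()
    node$name <- name
    node$args <- args
    node
}

# foo(x), assuming x is a 2D double matrix and foo returns a vector with
# one element per element of x.
code <- mockNode("foo", list(as.name("x")))
code$nDim <- 1L
code$type <- "double"
# sizeExprs holds plain R expressions, not exprClass objects. The product is
# written out by hand here purely for illustration.
code$sizeExprs <- list(quote(dim(x)[1] * dim(x)[2]))
code$toEigenize <- "maybe"

# bar(x), assuming a result whose first dimension is always 5 and whose
# second dimension matches the first dimension of x.
code2 <- mockNode("bar", list(as.name("x")))
code2$nDim <- 2L
code2$type <- "double"
code2$sizeExprs <- list(5, quote(dim(x)[1]))
code2$toEigenize <- "maybe"

# Hypothetical helper: evaluate each size expression against a concrete x.
# This only demonstrates what the expressions mean; NIMBLE uses them during
# code generation rather than evaluating them like this.
evalSizes <- function(node, x) {
    sapply(node$sizeExprs, function(e) eval(e, list(x = x)))
}
x <- matrix(0, nrow = 3, ncol = 4)
evalSizes(code, x)   # 12
evalSizes(code2, x)  # 5 3
```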

Two useful fields that should only be modified with care are:

- code$caller: the exprClass object of the call in which the current call is nested.
- code$callerArgID: the integer index of the current call among its caller's arguments. That is, identical(code, code$caller$args[[ code$callerArgID ]]) should be TRUE.
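
The caller/callerArgID relationship can be checked with a few lines of base R. This sketch assumes a hypothetical `mockNode()` built from environments, which share the reference semantics of exprClass objects; `checkCallerLink()` is a hypothetical helper that simply evaluates the identity test stated above for the expression foo(bar(x), z).

```r
# Hypothetical mock node (not NIMBLE code); environments are shared, not
# copied, when stored in a list, as exprClass objects are.
mockNode <- function(name, args = list()) {
    node <- new.env()
    node$name <- name
    node$args <- args
    node
}

# Build foo(bar(x), z).
barNode <- mockNode("bar", list(as.name("x")))
fooNode <- mockNode("foo", list(barNode, as.name("z")))

# Link the child back to its caller.
barNode$caller <- fooNode
barNode$callerArgID <- 1L

# The invariant: the caller's argument at callerArgID is this very node.
checkCallerLink <- function(code) {
    identical(code, code$caller$args[[code$callerArgID]])
}
checkCallerLink(barNode)  # TRUE

# Changing one side of the link without the other breaks the invariant.
barNode$callerArgID <- 2L
checkCallerLink(barNode)  # FALSE
```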

Note that some or all of these fields may be set by other functions called within a size processor (e.g. makeSizeExpressions()), so they do not always need to be set explicitly in the processor itself.

Typically the first step in a size processor is to recurse on its arguments (see recurseSetSizes() below), so that the arguments can be counted on to have the above fields set. For example, code$args[[1]]$nDim will then be the number of dimensions of the first argument (or of whatever the first argument returns).

Size processors collect additional expressions, called asserts, that are later inserted into the syntax tree as lines of code before or after the line being processed. Asserts may be collected for each line of code that is size processed; the lines they generate are inserted before that line by default, or after it if wrapped in after(). Asserts are frequently run-time size checks, but they can also create intermediate variables, among other uses. For an example of asserts that go both before and after a line of code, see the sizeasDoublePtr() size processor.
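
The before/after placement of asserts can be illustrated with a small base R sketch. Everything here is hypothetical: `markAfter()` stands in for NIMBLE's after() wrapper, `spliceAsserts()` is a simplified model of where the generated lines end up (not NIMBLE's insertion code), and `releaseTemp()` is a placeholder name for some follow-up line.

```r
# Hypothetical marker standing in for NIMBLE's after() wrapper.
markAfter <- function(expr) structure(list(expr = expr), class = "afterAssert")

# Hypothetical model of insertion: unwrapped asserts go before the line,
# after-marked asserts go after it, each group keeping its original order.
spliceAsserts <- function(line, asserts) {
    isAfter <- vapply(asserts, inherits, logical(1), what = "afterAssert")
    before <- asserts[!isAfter]
    after <- lapply(asserts[isAfter], function(a) a$expr)
    c(before, list(line), after)
}

line <- quote(y <- foo(Interm1))
asserts <- list(
    quote(Interm1 <- bar(x)),            # default: inserted before the line
    markAfter(quote(releaseTemp(Interm1)))  # wrapped: inserted after the line
)

newLines <- spliceAsserts(line, asserts)
for (l in newLines) print(l)
```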

Adding a New Size Processor

- Any new size processing function must take the following arguments:
  - code: the exprClass object representing the node of the syntax tree
  - symTab: the symbol table for the nimbleFunction method that is currently being size processed
  - typeEnv: an environment originally designed to hold size expressions for objects known from initialization. It is now also used to store flags that are set in one step and need to be seen in another. See the additional information at the top of genCpp_initSizes.R.
- Any new size processor should set the code fields described in Size Processing Overview.
- If another size processor is called from within the new one (e.g. sizeInsertIntermediate() is commonly called from within a size processor), be sure to collect the asserts it returns. The asserts are a list, so they can simply be concatenated, e.g. asserts <- c(asserts, recursiveCallOfSomeKind()). A size processor can also create its own asserts. Every assert created or collected within a size processor should be returned from it; if there are none, it should return an empty list().
- Any new size processing function must be added to the sizeCalls list at the top of genCpp_sizeProcessing.R. The entry should have the form fxnName = 'sizeProcessorName', where fxnName is the name of the DSL function to be size processed and 'sizeProcessorName' is a character string naming the new size processing function.
- A size processor for a special-case function foo would typically be called sizeFoo. Some size processors handle groups of functions; e.g., sizeBinaryCwise handles component-wise binary operations such as A + B.
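
Putting these rules together, a new size processor might look roughly like the following sketch for a hypothetical DSL function foo(x) that returns a double with the same shape as x. It runs on its own only because `mockNode()` and `mockRecurseSetSizes()` are hypothetical stand-ins; in NIMBLE the processor would call recurseSetSizes(code, symTab, typeEnv), and the registration would be an entry in sizeCalls in genCpp_sizeProcessing.R rather than the one-entry `exampleSizeCalls` list used here.

```r
# Hypothetical mock node (not NIMBLE code).
mockNode <- function(name, args = list()) {
    node <- new.env()
    node$name <- name
    node$args <- args
    node
}

# Hypothetical stand-in for recurseSetSizes(): pretend every node argument
# was annotated as a double vector of length n. Returns its asserts (none).
mockRecurseSetSizes <- function(code, symTab, typeEnv) {
    for (a in code$args) {
        if (is.environment(a)) {
            a$nDim <- 1L
            a$type <- "double"
            a$sizeExprs <- list(quote(n))
            a$toEigenize <- "maybe"
        }
    }
    list()
}

sizeFoo <- function(code, symTab, typeEnv) {
    # 1. Recurse first, so the arguments' fields can be relied on.
    asserts <- mockRecurseSetSizes(code, symTab, typeEnv)
    arg <- code$args[[1]]
    # 2. Set this node's fields from the annotated argument.
    code$nDim <- arg$nDim
    code$type <- "double"
    code$sizeExprs <- arg$sizeExprs
    code$toEigenize <- "maybe"
    # 3. Return every collected assert (an empty list() if there are none).
    asserts
}

# Registration in the form fxnName = 'sizeProcessorName'.
exampleSizeCalls <- list(foo = 'sizeFoo')

xNode <- mockNode("x")
code <- mockNode("foo", list(xNode))
processor <- get(exampleSizeCalls[[code$name]], mode = "function")
asserts <- processor(code, symTab = NULL, typeEnv = new.env())
```

The order of steps (recurse, read the annotated arguments, set this node's fields, return the asserts) follows the rules listed above; a real processor would also append run-time size checks or intermediates to asserts where needed.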

Tips, tricks, and useful size processors

exprClasses_setSizes(): This is the entry point for all size processing. In some cases it is called explicitly for recursion, but usually recursion is done via a call to recurseSetSizes(). See the note above the exprClasses_setSizes() function in genCpp_sizeProcessing.R for more information.
sizeInsertIntermediate(): Used to lift an expression out into an intermediate variable. This is especially useful when part of a line of code needs to be eigenized. E.g., if we have foo(bar(x), z) and the size processor for foo determines that bar(x) needs to be evaluated outside of the foo expression, then asserts <- c(asserts, sizeInsertIntermediate(code, 1, symTab, typeEnv)) will produce Interm32 <- bar(x) as an assert (where 32 is an arbitrary unique integer) and foo(Interm32, z) as the current code. Note that bar(x) should already have been size processed (by recursion) before it is lifted to an intermediate.
sizeAssignAfterRecursing(): The main size processor for assignment operations. It is called after the right-hand side of an assignment has been annotated. This processor considers all valid combinations of left-hand and right-hand side types for an assignment, and as such is rather long.
recurseSetSizes(code, symTab, typeEnv): Recursively annotates arguments of a node in the syntax tree. Should generally be called at the beginning of a size processor if that node’s arguments may need processing. If not all arguments should be recursed into, the optional fourth argument useArgs (logical vector corresponding to arguments) can be given.
Checking on properties of an argument: Say you need to know the number of dimensions of the 2nd argument. You can’t just assume code$args[[2]]$nDim is valid, because code$args[[2]] could be a constant (like 42). So typically we’d use something like:

arg2nDim <- if(inherits(code$args[[2]], 'exprClass')) code$args[[2]]$nDim else 0

A self-lift: Say foo(x) can never appear in any expression except an assignment. E.g. y <- foo(x) is ok but w <- foo(x)$q + 3 is not. The size processor for foo can check whether code$caller$name %in% assignmentOperators and, if not, use sizeInsertIntermediate() to lift itself out of its caller expression. An example is in generalFunSizeHandler.
What happens to an RCfunction or a member function of another nimbleFunction? Near the end of exprClasses_setSizes, RCfunctions not in sizeCalls are found in the user environment and added to neededRCfuns. Then the generalFunSizeHandler is called. This does not check types, but it lifts any argument expressions and sometimes lifts itself out of other expressions.
What happens to member data and methods of other nimbleFunctions or nimbleLists? By the time we get to size processing, nf$a appears as NFvar(nf, 'a') and is handled by sizeNFvar. Similarly, nf$foo(x) becomes nfMethod(nf, 'foo')(x) which is handled by sizeChainedCall.
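
The argument guard and the self-lift check from the tips above can be sketched on mock nodes as follows. This is an illustration under stated assumptions, not NIMBLE code: `mockNode()`, `argNDim()` and `needsSelfLift()` are hypothetical, the mocks are environments (so the guard uses is.environment() where real code tests inherits(x, 'exprClass')), and `assignmentOps` is an illustrative stand-in for NIMBLE's assignmentOperators whose real contents may differ.

```r
# Hypothetical mock node (not NIMBLE code).
mockNode <- function(name, args = list()) {
    node <- new.env()
    node$name <- name
    node$args <- args
    node
}

# Illustrative stand-in for assignmentOperators.
assignmentOps <- c("<-", "<<-", "=")

# Guard: a constant argument such as 42 has no fields, so treat it as 0-D.
argNDim <- function(code, i) {
    a <- code$args[[i]]
    if (is.environment(a)) a$nDim else 0
}

# Self-lift is needed when there is a caller and it is not an assignment.
needsSelfLift <- function(code) {
    is.environment(code$caller) && !(code$caller$name %in% assignmentOps)
}

# y <- foo(42): foo sits directly under an assignment.
fooInAssign <- mockNode("foo", list(42))
assignNode <- mockNode("<-", list(as.name("y"), fooInAssign))
fooInAssign$caller <- assignNode
fooInAssign$callerArgID <- 2L

# foo(x) + 3, with x a matrix: foo is nested inside `+`.
xNode <- mockNode("x")
xNode$nDim <- 2L
fooInSum <- mockNode("foo", list(xNode))
sumNode <- mockNode("+", list(fooInSum, 3))
fooInSum$caller <- sumNode
fooInSum$callerArgID <- 1L

argNDim(fooInAssign, 1)     # 0: the argument is the constant 42
argNDim(fooInSum, 1)        # 2
needsSelfLift(fooInAssign)  # FALSE
needsSelfLift(fooInSum)     # TRUE
```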