Writing code that writes itself

Executing code

We write code to make computers do things. For example, let’s define the variable A to be a sequence of numbers from -1 to 1, in increments of 0.5. Then we write

A <- seq(-1, 1, 0.5)
A

## [1] -1.0 -0.5  0.0  0.5  1.0

Of course, this is very simple. You can write arbitrarily complex code if that floats your boat.

Executing code from a string

In R, there is another way to write and run code. We can write out the code we want as a string (a bunch of letters and symbols enclosed by quotation marks) and evaluate the code in that string. (It’s possible a similar feature exists in many other programming languages; I simply don’t know them that well.) We can reproduce the example from above in this way.

code <- "B <- seq(-1, 1, 0.5)"
eval(parse(text = code))
B

## [1] -1.0 -0.5  0.0  0.5  1.0
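It can help to see the two steps separately. The sketch below splits the call: parse() turns the string into an unevaluated expression, and eval() then runs it. The variable name my_seq is made up for this illustration.

```r
code <- "my_seq <- seq(-1, 1, 0.5)"

# Step 1: parse the string into an unevaluated R expression
parsed <- parse(text = code)
class(parsed)
## [1] "expression"

# Step 2: evaluate the expression, which creates my_seq
eval(parsed)
my_seq
## [1] -1.0 -0.5  0.0  0.5  1.0
```

Nothing is assigned until eval() is called; parse() alone only checks that the string is syntactically valid R.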

Code that writes itself

We can exploit the trick of executing code from a string to write “code that writes itself”.

However, at this juncture, it’s important to note that the title of this post is slightly misleading. Our code can only write itself if we take a very narrow view of what that does and does not mean. To be clear, we don’t produce self-writing code in the sense of stating an intention in natural language which is then converted to a programming language. Instead, we mean ‘code that writes itself’ in the sense that we construct the code we want to run in an automated way (that is, we don’t type out all of our commands explicitly).
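As a minimal sketch of what "constructing code in an automated way" can look like, the loop below pastes together three assignment commands and runs each one. The variable names sq1, sq2 and sq3 are invented for this illustration.

```r
for (i in 1:3) {
  # Build a command such as "sq2 <- 2^2" as a string
  code <- paste0("sq", i, " <- ", i, "^2")
  # Run the command we just built
  eval(parse(text = code))
}

c(sq1, sq2, sq3)
## [1] 1 4 9
```

We never typed sq1 <- 1^2 and so on ourselves; the loop wrote those commands for us.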

An example

The trick of writing code as strings has been exceedingly helpful for me. One example that comes to mind has an application in experimental design: constructing factorial designs.

A factorial design is a common experimental setup for many controlled experiments. In this kind of setup, we have some variables, say \(X_1\) and \(X_2\), which we believe may have an effect on an experimental outcome, say \(Y\). We can efficiently measure the effect of both \(X_1\) and \(X_2\) on \(Y\) by taking all possible combinations of common levels, e.g. the minimum and maximum, of \(X_1\) and \(X_2\) and measuring \(Y\) at these combinations of our variables of interest. It is standard in experimental design to code our experimental variables such that their minimum value is \(-1\) and their maximum is \(+1\). The notation for these designs is often written and spoken of as an ‘\(n^k\) factorial design’, where \(n\) is the number of levels per variable (which are typically distributed evenly between \(-1\) and \(+1\)) and \(k\) is the number of variables.
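To make the coding and the run count concrete, here is a small sketch. The helper code_variable() and the temperature values are hypothetical and only serve as an illustration; the rescaling assumes a simple linear map from the observed range onto \([-1, +1]\).

```r
# Hypothetical helper: linearly rescale x so that min(x) maps to -1
# and max(x) maps to +1
code_variable <- function(x) {
  mid <- (max(x) + min(x)) / 2
  half_range <- (max(x) - min(x)) / 2
  (x - mid) / half_range
}

temperature <- c(150, 175, 200) # made-up levels in natural units
code_variable(temperature)
## [1] -1  0  1

# An n^k design has n^k runs: every combination of n levels for k variables
n <- 3
k <- 2
n_runs <- n^k
n_runs
## [1] 9
```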

In R, a convenient way to construct a factorial design is to use expand.grid(). For example, we can construct an \(n = 2\) and \(k = 2\) factorial design with

design <- expand.grid(X1 = c(-1, 1), X2 = c(-1, 1))
design

##   X1 X2
## 1 -1 -1
## 2  1 -1
## 3 -1  1
## 4  1  1

However, for general \(k\) this is a problem because expand.grid() requires you to specify all variables in a single call. Calls cannot be nested to add one variable at a time. That is, expand.grid(X1 = c(-1, 1), X2 = c(-1, 1)) does not equal expand.grid(expand.grid(X1 = c(-1, 1)), X2 = c(-1, 1)), so building up a design for a user-specified number of variables can be challenging. This is where constructing code as a string really helps. We can build up an expand.grid() command as a string so that it includes the necessary number of variables before we execute it. This is done in the function below:

factorial_design <- function(
  n, # number of levels
  k, # number of variables
  replicates = 1 # times the design is replicated
){
  
  # Set up levels of variables
  levels <- "seq(-1, 1, length.out = n)"
  
  # Set up expand.grid
  code <- "expand.grid(" # a string
  
  # Fill up the expand.grid structure
  for(i in seq_len(k)){
    code <- paste0(code, "X", i, " = ", levels)
    # for i = 1, this looks like 
    # X1 = seq(-1, 1, length.out = n)
    if(i != k) code <- paste0(code, ", ") # need commas
  }
  
  # Book-end code with right-bracket
  code <- paste0(code, ")")
  
  # Evaluate the code with eval(parse(text = ...))
  result <- eval(parse(text = code))
  
  # Replicate if necessary
  if(replicates > 1){
    one_set_of_runs <- result
    for(j in 2:replicates){
      result <- rbind(result, one_set_of_runs)
    }
  }
  
  # Return
  return(result)
  
}

This gives us a nice general function for constructing factorial designs. Have a look! I’m sure there are more applications where this trick is useful. I hope it serves you well.

factorial_design(2, 2)

##   X1 X2
## 1 -1 -1
## 2  1 -1
## 3 -1  1
## 4  1  1

factorial_design(2, 3)

##   X1 X2 X3
## 1 -1 -1 -1
## 2  1 -1 -1
## 3 -1  1 -1
## 4  1  1 -1
## 5 -1 -1  1
## 6  1 -1  1
## 7 -1  1  1
## 8  1  1  1

factorial_design(3, 2)

##   X1 X2
## 1 -1 -1
## 2  0 -1
## 3  1 -1
## 4 -1  0
## 5  0  0
## 6  1  0
## 7 -1  1
## 8  0  1
## 9  1  1

factorial_design(3, 3)

##    X1 X2 X3
## 1  -1 -1 -1
## 2   0 -1 -1
## 3   1 -1 -1
## 4  -1  0 -1
## 5   0  0 -1
## 6   1  0 -1
## 7  -1  1 -1
## 8   0  1 -1
## 9   1  1 -1
## 10 -1 -1  0
## 11  0 -1  0
## 12  1 -1  0
## 13 -1  0  0
## 14  0  0  0
## 15  1  0  0
## 16 -1  1  0
## 17  0  1  0
## 18  1  1  0
## 19 -1 -1  1
## 20  0 -1  1
## 21  1 -1  1
## 22 -1  0  1
## 23  0  0  1
## 24  1  0  1
## 25 -1  1  1
## 26  0  1  1
## 27  1  1  1
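To see what kind of string factorial_design() assembles before evaluating it, the sketch below builds the same command but returns it instead of running it. The helper build_design_code() is hypothetical (it is not part of the function above) and assumes k is at least 1.

```r
# Hypothetical helper: return the generated command as a string
build_design_code <- function(k) {
  # One "Xi = seq(...)" argument per variable
  args <- paste0("X", seq_len(k), " = seq(-1, 1, length.out = n)")
  paste0("expand.grid(", paste(args, collapse = ", "), ")")
}

code <- build_design_code(3)
cat(code, "\n")
## expand.grid(X1 = seq(-1, 1, length.out = n), X2 = seq(-1, 1, length.out = n), X3 = seq(-1, 1, length.out = n))

# The string only mentions `n` by name; its value is looked up
# when the code is evaluated, not when the string is built
n <- 2
design <- eval(parse(text = code))
dim(design)
## [1] 8 3
```

This is also why "seq(-1, 1, length.out = n)" can stay a fixed string inside factorial_design(): eval() runs in the function's own environment by default, where the argument n is defined.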

Conclusion

R gives you the ability to write out your code as a string and execute it using eval(parse(text = ...)). For some, this is just a novelty. For others, this fact can simplify their lives greatly.