‘Hidden’ Variables and R Environments

There are a number of ways of handling objects in R - the S3 and S4 systems, and now the new Reference Class approach. Here I will outline a slightly simpler approach - it isn’t actually an object system, but for some applications, such as a turtle graphics package, it can work quite effectively. The package users do not really see any of the ‘tricks’ being done here, but hopefully anyone working through this might learn some practical ideas about R environments.

Definition

An environment, in R, can be thought of as a list of variables and their values. I’m not sure if this is how it is achieved in practice, but it helps me to think of it as a look-up table - for example if a variable x appears in an expression, then the R interpreter refers to the entry for x in the appropriate look-up table. From this, it retrieves the value for x - basically some kind of R entity - and substitutes this for x in the expression. If there is no entry for x in the table - or in any of the other environments that R goes on to search - an error is flagged.

Which environment R uses depends on context - if you are typing into the R command line, the environment used is called the global environment. When a function is called, a new environment especially for this call is created automatically - and is normally destroyed on leaving the function. This is the default environment for any variables created during the execution of the function. Finally, it is worth noting that a particular variable name can appear in more than one environment - and so if R tries to find the value of a variable, and the variable name appears in more than one environment, the rules governing which environment R will search determine the value that will be found.
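As a small sketch of the point about each call getting its own environment, the snippet below uses a made-up function, where.am.i, that hands back the environment created for the call. Returning it keeps that environment alive so it can be inspected, which is not what normally happens; the names where.am.i and scratch are purely illustrative.

```r
# Hypothetical helper: returns the environment created for this particular call
where.am.i <- function() {
  scratch <- 1        # created in the call's own environment
  environment()       # the environment of the call currently executing
}

first.call  <- where.am.i()
second.call <- where.am.i()

identical(first.call, second.call)   # FALSE: each call got its own environment
ls(first.call)                       # "scratch"
get("scratch", envir = first.call)   # 1
```

Each environment here behaves like the look-up table described above: ls lists the names it holds and get retrieves the value stored against a name.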

Environments in practice

Try typing the following at the R command line. The variable test.var appears at the command line and again in the function definition of foo.

test.var <- 8
foo <- function(x) {
  test.var <- x
  return(test.var*test.var)
}
foo(5)

## [1] 25

test.var

## [1] 8

The initial assignment of test.var places an entry in the global environment. The function foo is then defined - and this is also placed in the global environment (in R, functions are assigned to variable names just like other objects - this also means that functions are associated with environments, and that there could be functions with the same name defined in more than one environment). The next line actually calls foo with an argument of \(5\). The result (\(25\)) is printed out.

When foo was being executed, a variable called test.var was created in the environment that was created when foo was called (environments created in this way are called local environments). This was used internally during the execution of the function, before a value was returned. Once the function finished executing and returned the result, the local environment was deleted along with all of the variables contained in it. Thus, the local environment variable test.var was different to test.var in the global environment - and the latter was unaffected by the execution of foo. The final line typed in verifies this - it still contains the value assigned to it before foo was called - i.e. \(8\).

This arrangement is pretty sensible. If foo had assigned something to the global environment version of test.var then calling the function would have an unseen side effect. A person who didn’t write the function, and was using it without being aware of its content, might unknowingly overwrite a variable they were using for something else. Indeed most programming languages have some kind of ‘local variable’ arrangement for functions that is similar to this.

Global Variables in Functions

Although functions by default refer to the local environment, there are some situations where they refer to other environments. For example:

bar <- function(x) {
  return(x + test.var)
}
bar(3)

## [1] 11

What happens here is that, as before, the function bar sets up a local environment when it is called, but then, on execution, a variable test.var is encountered that isn’t defined in this environment. The function then searches the parent environment - this is essentially the environment in which the function definition took place. In this case as bar was defined at the command line, the parent environment is just the global environment. On searching this, the command line assignment of test.var is found, and used here - leading to the value 11 being returned.
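The search follows the place where the function was defined, not the place it was called from. The sketch below illustrates this by calling bar from inside another function that has its own test.var; the wrapper name call.bar.locally is made up for the illustration.

```r
test.var <- 8

bar <- function(x) {
  return(x + test.var)   # test.var is not local, so R looks where bar was defined
}

# Hypothetical wrapper with a local variable of the same name
call.bar.locally <- function() {
  test.var <- 100        # local to call.bar.locally, invisible to bar
  bar(3)
}

call.bar.locally()       # 11, not 103
```

The local test.var inside the wrapper is ignored, because bar was not defined there.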

It is a good idea to avoid this kind of situation - again because of unseen effects. In this case the problem is that a command line alteration of test.var will have a side effect on the behavior of bar:

test.var <- 1:6
bar(3)

## [1] 4 5 6 7 8 9

This not only changed the value returned by the function when given the same input as earlier, it even changed the ‘shape’ of the answer from a single value to a vector of six values.
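One way of avoiding the hidden dependency, sketched here under the assumption that the extra quantity can simply be handed over by the caller, is to make it an explicit argument. The name bar.explicit and its offset argument are illustrative only.

```r
# Hypothetical variant of bar: the value that used to come from the
# global environment is now an ordinary argument
bar.explicit <- function(x, offset) {
  return(x + offset)
}

bar.explicit(3, offset = 8)     # 11
bar.explicit(3, offset = 1:6)   # 4 5 6 7 8 9
```

The result can still change shape, but only because the caller visibly asked for it.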

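Another way to access the parent environment is via the <<- operator. Generally in a function, if an assignment occurs to a new variable using <- (such as my.var <- 5) the new variable is created in the local environment. However, if the assignment is made via <<- (such as my.var <<- 5), R starts in the parent environment and works outwards looking for an existing variable of that name, and changes the first one it finds. If there is no such variable anywhere, it is created in the global environment - which, for a function defined at the command line, is the parent environment anyway.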

bad.fun <- function(x) {
  other.test.var <<- x
  return(other.test.var * 5)
}
bad.fun(6)

## [1] 30

other.test.var

## [1] 6

bad.fun(4)

## [1] 20

other.test.var

## [1] 4

Again, use of this is not recommended (at least in the form shown here) - it is basically a license to override the ‘sensible’ default local variable system and cause the problems of unseen effects discussed earlier.
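For contrast, here is a sketch of a more contained form, where the variable changed by <<- lives in the local environment of an outer function rather than in the global environment. The names outer.fun, inner.fun and counter are made up for the illustration.

```r
outer.fun <- function() {
  counter <- 0                 # lives in outer.fun's local environment

  inner.fun <- function() {
    counter <<- counter + 1    # found in the parent environment and updated there
  }

  inner.fun()
  inner.fun()
  return(counter)
}

outer.fun()                    # 2
```

Because counter already exists in the environment where inner.fun was defined, <<- updates that copy and nothing outside outer.fun is touched.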

‘Roll Your Own’ Environments

As well as the automatically managed global and local environments, R allows the manual creation of other environments. This is achieved using the new.env function:

my.env <- new.env()

This creates a new environment, and assigns a reference to it to the variable my.env. Thus, the reference to the new environment is stored in the global environment. So how can my.env be used? Probably the simplest way is with the local function. This takes two arguments - the first is an R expression, and the second is a reference to an environment (formally this argument is named envir; the shorter env= used here works because R allows argument names to be abbreviated). The value returned is the result of the expression - but the variables in the expression are searched for in the environment specified in the second argument, rather than in the global or local environment. If the expression creates a new variable, then this is also in the specified environment. For example, to make a simple totaliser - with a variable keeping track of a cumulative sum - a first step might be:

local(totaliser <- 0, env=my.env)

This creates a variable called totaliser in the environment you have just created. Just typing totaliser at the command line will just give an error - as there is not a variable with this name in the global environment - assuming you have not already created one.

totaliser

## Error: object 'totaliser' not found

However, you can access this variable via the local function. Unlike local environments in functions, environments made via new.env() are persistent - that is, they are not destroyed after the local function is evaluated. To see this, enter:

local(totaliser,env=my.env)

## [1] 0

In that expression, R accessed the totaliser variable in the environment whose reference was stored in my.env. The result of local is effectively passed back into the environment in which it was called - in this case the global environment, if commands are being typed in at the main prompt. It can then be used in expressions in the calling environment:

local(totaliser,env=my.env) + 8.5

## [1] 8.5

Here, the result of the R expression - simply the current value of totaliser in the new environment - was substituted into a larger expression. It is also possible to obtain a copy of the value of the variable, and transfer it to a variable in the global environment:

global.thing <- local(totaliser,env=my.env)
global.thing

## [1] 0

This puts the value of totaliser in the new environment into a new variable global.thing in the global environment. As usual in R, it is only the current value that is transferred - so that future changes to totaliser won’t alter global.thing. Thus, we have created an environment that isn’t the global environment, or a function’s local environment - and ideally we would like to use this to pass ‘turtle-style’ messages between functions, as suggested above. For the basic totalising example, we would like to create a function called accumulate that adds a quantity to the total (stored in the totaliser variable created above). This is done below:

accumulate <- local( 
  function(x) {
    totaliser <<- totaliser + x
    return(totaliser)
  }, env=my.env)

This needs a little explanation. Firstly, the way R defines functions needs some deconstructing. The general form is something like my.fun <- function(x,y) { ... } - which looks like an assignment statement, due to the <- symbol. Effectively, it is an assignment. On the left is the function name, and on the right is the definition of what the function actually does. In fact, the right hand side is a valid stand-alone expression - describing, but not naming, an algorithm. This can be passed to other functions - such as optimise - that require a function as part of their input.

Here, the right hand side is evaluated inside local. Generally, this doesn’t have much effect - a function still makes its own local environment each time it is called - but this function contains a <<- operator. Recall this assigns values to variables in the parent environment. Usually, this is the global environment, but because this definition is inside the local expression, it is the environment referred to in my.env. Thus, this function adds its argument to the value of totaliser stored in my.env and returns the updated total as a value.

Finally, note that the name of the function is set outside of the local expression. The value of the local expression is the definition part of the function - but as the assignment occurs outside of the expression, the name matched with this definition is in the global environment. Thus, we have a function name registered in the global environment, but the parent environment of this function (the one referred to by <<-) is the one referenced in my.env.
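The claim about the parent environment can be checked with the base R function environment, which reports the environment a function was defined in. The sketch below rebuilds the totaliser set-up so that it runs on its own (with envir spelled out in full); plain.fun is a made-up function defined in the ordinary way, included only for comparison.

```r
my.env <- new.env()
local(totaliser <- 0, envir = my.env)

accumulate <- local(
  function(x) {
    totaliser <<- totaliser + x
    return(totaliser)
  }, envir = my.env)

# Hypothetical comparison function, defined without local
plain.fun <- function(x) x

identical(environment(accumulate), my.env)   # TRUE
identical(environment(plain.fun), my.env)    # FALSE
```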

Effectively this is ‘job done’. The accumulate function has a variable totaliser that persists between calls of the function, but is not in the global environment. Just to demonstrate that this works, enter the following:

totaliser <- "Leave me alone!"
accumulate(4)

## [1] 4

accumulate(6)

## [1] 10

totaliser

## [1] "Leave me alone!"

This shows firstly that the ‘secret’ variable is capable of passing information between the calls to accumulate - and secondly that although it is called totaliser it doesn’t interfere with a variable also called totaliser (a string) that exists in the global environment. To demonstrate that the technique allows passing between different functions, we add another function definition:

deduct <- local( 
  function(x) {
    totaliser <<- totaliser - x
    return(totaliser)
  }, env=my.env)

This works in much the same way as accumulate but it takes the value of x away from totaliser:

deduct(3)

## [1] 7

and still the global totaliser is unaffected:

totaliser

## [1] "Leave me alone!"
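As an alternative to local, the contents of the hidden environment can also be looked at with the base R functions ls and get. The sketch below repeats the set-up from this section so that it is self-contained, and then inspects what the environment holds after the same three calls.

```r
my.env <- new.env()
local(totaliser <- 0, envir = my.env)

accumulate <- local(
  function(x) {
    totaliser <<- totaliser + x
    return(totaliser)
  }, envir = my.env)

deduct <- local(
  function(x) {
    totaliser <<- totaliser - x
    return(totaliser)
  }, envir = my.env)

accumulate(4)
accumulate(6)
deduct(3)

ls(my.env)                         # "totaliser" - the only name stored there
get("totaliser", envir = my.env)   # 7
```

Note that accumulate and deduct do not appear in the listing: their names were assigned outside the local expressions, so only the shared variable lives in my.env.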

Turtle time

The above techniques pretty much give us a toolkit sufficiently sophisticated to build a turtle graphics system, as suggested earlier. The code for this is below:

# Create a new environment
turtle.env <- new.env()

# Create the variables to be shared by the functions
local({
  x <- 0
  y <- 0
  direction <- 0
  pen <- TRUE
  d2r <- function(z) z *  pi / 180
}, env=turtle.env)

# Move the turtle forward l units in the current direction
forward <- local(
  function(l) {
    old.x <- x
    old.y <- y
    x <<- x + sin(d2r(direction)) * l
    y <<- y + cos(d2r(direction)) * l
    if (pen) lines(c(old.x,x),c(old.y,y))
  }, env=turtle.env)

# Move the turtle back l units in the current direction
back <- local(
  function(l) {
    old.x <- x
    old.y <- y
    x <<- x - sin(d2r(direction)) * l
    y <<- y - cos(d2r(direction)) * l
    if (pen) lines(c(old.x,x),c(old.y,y))
  }, env=turtle.env)


# Put the virtual pen up from the virtual paper (ie switch off draw mode)
pen.up <- local({
  function() pen <<- FALSE
}, env=turtle.env)

# Put the virtual pen down on the virtual paper (ie switch on draw mode)
pen.down <- local({
  function() pen <<- TRUE
}, env=turtle.env)

# Change the virtual pen status (ie up if down; down if up) 
pen.change <- local({
  function() pen <<- ! pen
}, env=turtle.env)

# Rotate the current direction d degrees clockwise
clockwise <- local({
  function(d) direction <<- direction + d
}, env=turtle.env)

# Rotate the current direction d degrees anticlockwise
anticlockwise <- local({
  function(d) direction <<- direction - d
}, env=turtle.env)

# Clear the graph 'canvas' before using turtle graphics 
reset <- local(
  function() {
    direction <<- 0
    x <<- 0
    y <<- 0
    pen <<- TRUE
    plot(c(-10,10),c(-10,10),type='n',asp=1,xlab='X',ylab='Y')
    title('Turtle Graphics Demo')
  },env=turtle.env)

The only extra things that require explaining are in the section defining the shared variables, where a number of statements are grouped together in curly brackets. The R expression in a local call can consist of a series of R statements in curly brackets - the value of the bracketed expression is the value of the last individual statement between the brackets. Here, each of the statements defines a variable stored in the environment referenced by turtle.env. The only other thing of note is that one of these statements defines a function in the hidden environment - d2r - that converts degrees to radians. This is useful in some of the turtle functions, but it is not really necessary to add this to the global environment. Since the entire function (including the name) is set up in this environment, the other functions (ie forward, back etc) can see it, but it isn’t in the global environment.
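Both points can be seen in a much smaller setting than the turtle code. In the sketch below, all of the names (shape.env, side, square, area) are invented for the illustration: a bracketed block defines a variable and a helper function in a new environment, its last statement supplies the value of the whole local call, and a function defined later in the same environment can use the helper.

```r
shape.env <- new.env()

# The value of the bracketed block is the value of its last statement
last.value <- local({
  side <- 3
  square <- function(z) z * z    # helper stored in shape.env, like d2r
  square(side)
}, envir = shape.env)

last.value       # 9
ls(shape.env)    # "side" "square"

# A function defined in the same environment can see the helper
area <- local(
  function() square(side),
  envir = shape.env)

area()           # 9
```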

Play Time

Now a turtle graphics system has been set up, here are some examples of its use. Note that since the turtle graphics functions are now just part of the R system, they can be used in conjunction with standard R language elements, such as for loops, function definitions and so on. Here is a simple example:

reset()
lgth <- 5
for (i in 1:50) {
  forward(lgth)
  lgth <- lgth * 0.9
  clockwise(74)
}

Turtle graphics tend to be more interesting if they are used in recursive definitions. Here is an example of this…

poly <- function(x) {
  if (x == 0) return()
  for (i in 1:15) {
    pen.down()
    forward(x)
    poly(x - 1)
    pen.up()
    back(x)
    clockwise(24)
  }
}


reset()
poly(3)

A Final ‘Trick’

The above system provides a pretty good way of having ‘hidden’ variables that won’t accidentally get changed from the command line. However, it is still possible to access and change these variables, via the local function - since the variable turtle.env is in the global environment - and contains a reference to the hidden environment. To be honest, anyone going to those lengths to access these variables is probably pretty intent on making a mess - and it is unlikely that anyone would do this by accident. But, it is possible to stop this happening. Note that new.env() basically creates a handle to a new environment - a bit like a file handle - and that turtle.env is really just a handy variable to store this reference. But note that when this variable is removed, the reference - and indeed the environment - still exist. Most important - the functions that were defined to refer to this environment will still work. Thus, removing the variable just makes it that much harder for somebody to mess things up - this can be done by adding the line

rm(turtle.env)

to the end of the definition of the turtle system. It is still possible to access the environment if you really feel you must, but it requires more system-level delving, and makes it virtually impossible to do by accident!
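The same behaviour can be sketched on a smaller scale than the turtle system. In the example below the names hidden.env, total and add.to.total are made up; the function keeps working after the variable holding the environment reference is removed, and the base R function environment is one way of getting the reference back from the function itself.

```r
hidden.env <- new.env()
local(total <- 0, envir = hidden.env)

add.to.total <- local(
  function(x) {
    total <<- total + x
    return(total)
  }, envir = hidden.env)

# Remove the handle; the environment survives because the function still refers to it
rm(hidden.env)

add.to.total(5)                       # 5

# Deliberately recovering the environment from the function
recovered <- environment(add.to.total)
get("total", envir = recovered)       # 5
```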

Conclusion

The method here is useful for creating sets of functions that can be passed on to other users - for example somebody might just load the turtle graphics system from a file without worrying about the content of the functions, but if they had variables called x, y and so on in the global environment they wouldn’t get overwritten. However, these are not fully fledged objects - the main limitation is that only one instance of a turtle graphics system can occur at a given time, as there is only one set of variables x, y, direction and so on in the ‘hidden’ environment. A proper object system (such as the new R reference classes) allows multiple instances of objects, each with their own set of hidden variables and functions. For example, this means several turtles could be in action at once, each following their own paths. In fact, a lot of the coding for the provision of R reference classes is written in R, and exploits the use of environments in a more complex way - and it is useful to understand something about the way they work…
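To give a flavour of what multiple instances look like, here is a rough sketch that uses nothing beyond ordinary functions and <<-. It is not how reference classes are implemented, and the name make.totaliser is invented for the illustration: each call creates a fresh local environment, and the functions returned from it keep that environment alive as their own private store.

```r
# Hypothetical factory: every call produces an independent totaliser
make.totaliser <- function() {
  totaliser <- 0                       # private to this particular call
  list(
    accumulate = function(x) {
      totaliser <<- totaliser + x
      return(totaliser)
    },
    deduct = function(x) {
      totaliser <<- totaliser - x
      return(totaliser)
    }
  )
}

first  <- make.totaliser()
second <- make.totaliser()

first$accumulate(10)    # 10
first$deduct(3)         # 7
second$accumulate(1)    # 1 - unaffected by what happened to first
```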