Binding Values to Symbol

When R tries to bind a value to a symbol, it searches through a series of environments to find the appropriate value.

When you are working on the command line and need to retrieve the value of an R object, the order in which things occur is roughly:

  1. Search the global environment (i.e. your workspace) for a symbol name matching the one requested.
  2. Search the namespaces of each of the packages on the search list

The search list can be found by using the search() function.

search()
##  [1] ".GlobalEnv"        "package:stats"     "package:graphics" 
##  [4] "package:grDevices" "package:utils"     "package:datasets" 
##  [7] ".env"              "package:methods"   "Autoloads"        
## [10] "package:base"

The global environment or the user’s workspace is always the first element of the search list and the base package is always the last. For better or for worse, the order of the packages on the search list matters, particularly if there are multiple objects with the same name in different packages.

When a user loads a package with library() the namespace of that package gets put in position 2 of the search list (by default) and everything else gets shifted down the list.

Scoping Rules

Related to the scoping rules is how R uses the search list to bind a value to a symbol.

Consider the following function:

f <- function(x, y) {
  x^2 + y / z
}

This function has 2 formal arguments x and y. In the body of the function there is another symbol, z. In this case z is called a free variable.

The scoping rules of a language determine how values are assigned to free variables. Free variables are not formal arguments and are not local variables (assigned insided the function body).

R uses lexical scoping = static scoping. An alternative to lexical scoping is dynamic scoping which is implemented by some languages. Lexical scoping turns out to be particularly useful for simplifying statistical computations.

Lexical scoping in R means that:

the values of free variables are searched for in the environment in which the function was defined.

A function, together with its enclosing environment (the environment where the function was defined, aka enclosure), makes up what is called a closure or function closure.

The function closure model can be used to create functions that “carry around” data with them.

How do we associate a value to a free variable? There is a search process that occurs that goes as follows:

Lexical scoping importance

in R you can have functions defined inside other functions. Now things get interesting: in this case the environment in which a function is defined is the body of another function!

make.power <- function(n) {
  pow <- function(x) {
    x^n
  }
  pow
}
cube <- make.power(3)
square <- make.power(2)
cube(3)
## [1] 27
square(3)
## [1] 9
cube
## function(x) {
##     x^n
##   }
## <environment: 0x0000000016acc648>

Notice that cube() has a free variable n. What is the value of n here? Well, its value is taken from the environment where the function was defined. When I defined the cube() function it was when I called make.power(3), so the value of n at that time was 3.

We can explore the environment of a function to see what objects are there and their values.

ls(environment(cube))
## [1] "n"   "pow"
get("n", environment(cube))
## [1] 3
ls(environment(square))
## [1] "n"   "pow"
get("n", environment(square))
## [1] 2

Example

y <- 10

f <- function(x) {
  y <- 2
  y^2 + g(x)
}

g <- function(x) {
  x*y
}
f(3)
## [1] 34

With lexical scoping the value of y in the function g is looked up in the environment in which the function was defined, in this case the global environment, so the value of y is 10.

With dynamic scoping, the value of y is looked up in the environment from which the function was called (sometimes referred to as the calling environment). In R the calling environment is known as the parent frame.

In this case, the value of y would be 2.

When a function is defined in the global environment and is subsequently called from the globalenvironment, then the defining environment and the calling environment are the same. This can sometimes give the appearance of dynamic scoping.

Application: Optimization

## Kinda “constructor” function that creates a negative log-likelihood
# function that can be minimized to find maximum likelihood estimates in a
# statistical model.
make.NegLogLik <- function(data, fixed = c(FALSE, FALSE)) {
  
  params <- fixed
  
  function(p) {
    params[!fixed] <- p
    mu <- params[1]
    sigma <- params[2]
    
    ## Calculate the Normal density
    a <- -0.5*length(data)*log(2*pi*sigma^2)
    b <- -0.5*sum((data-mu)^2) / (sigma^2)
    
    # Optimization functions in R minimize functions, so we need to use 
    # the _negative_ loglikelihood.
    -(a + b)
  }
}

## Generate some data
set.seed(1)
normals <- rnorm(100, 1, 2)

## Construct negative log-likelihood
nLL <- make.NegLogLik(normals)
nLL
## function(p) {
##     params[!fixed] <- p
##     mu <- params[1]
##     sigma <- params[2]
##     
##     ## Calculate the Normal density
##     a <- -0.5*length(data)*log(2*pi*sigma^2)
##     b <- -0.5*sum((data-mu)^2) / (sigma^2)
##     
##     # Optimization functions in R minimize functions, so we need to use 
##     # the _negative_ loglikelihood.
##     -(a + b)
##   }
## <environment: 0x0000000016b56dc0>
## What's in the function environment?
ls(environment(nLL))
## [1] "data"   "fixed"  "params"
ls.str(environment(nLL))
## data :  num [1:100] -0.253 1.367 -0.671 4.191 1.659 ...
## fixed :  logi [1:2] FALSE FALSE
## params :  logi [1:2] FALSE FALSE

Now that we have our nLL() function, we can try to minimize it with optim() to estimate the parameters:

optim(c(mu = 0, sigma = 1), nLL)$par
##       mu    sigma 
## 1.218239 1.787343

We can also try to estimate one parameter while holding another parameter fixed. Here we fix sigma to be equal to 2.

## Because we now have a one-dimensional problem, we can use the simpler
## optimize() function rather than optim().
nLL <- make.NegLogLik(normals, c(FALSE, 2))
optimize(nLL, c(-1, 3))$minimum
## [1] 1.217775

We can also try to estimate sigma while holding mu fixed at 1.

nLL <- make.NegLogLik(normals, c(1, FALSE))
optimize(nLL, c(1e-6, 10))$minimum
## [1] 1.800596

Another nice feature that you can take advantage of is plotting the negative log-likelihood to see how peaked or flat it is. Here is the function when mu is fixed.

## Fix 'mu' to be equal to 1
nLL <- make.NegLogLik(normals, c(1, FALSE))
x <- seq(1.7, 1.9, len = 100)

## Evaluate 'nLL()' at every point in 'x'
y <- sapply(x, nLL)
plot(x, exp(-(y - min(y))), type = "l")

And ere is the function when sigma is fixed.

## Fix 'sigma' to be equal to 2
nLL <- make.NegLogLik(normals, c(FALSE, 2))
x <- seq(0.5, 1.5, len = 100)

## Evaluate 'nLL()' at every point in 'x'
y <- sapply(x, nLL)
plot(x, exp(-(y - min(y))), type = "l")