In R, when want to use one/multiple functions inside another function, maybe there are two ways. An example function can be:

Method 1

make.power <- function(n) {
 pow <- function(x) {
 x^n
 }
 pow
}

Method 2

make.power2 <- function(n) {
     pow(n)
    }

pow <- function(x) {
     x^n
     }

In my opinion, the first method is a better way for the one calls function (i.e. easier for users) but more complex to write.

Questions raised

  1. Are there any functional differences between the two ways? E.g., how the function will pass variables, or the relationship between the children and parent functions, etc..

  2. Which one might be a preferred one (maybe more computational efficient or structurally clear) for R?

Review

Function components

R functions have three parts:

  • the body(), the code inside the function.

  • the formals(), the list of arguments which controls how you can call the function.

  • the environment(), the “map” of the location of the function’s variables.

When you print a function in R, it shows you these three important components. If the environment isn’t displayed, it means that the function was created in the global environment.

f <- function(x) x^2
f
## function(x) x^2
formals(f)
## $x
body(f)
## x^2
environment(f)
## <environment: R_GlobalEnv>
Primitive functions: like `sum()`, call C code directly with `.Primitive()` and contain no R code. Therefore their `formals()`, `body()`, and `environment()` are all NULL.

A list of all functions in the base package

objs <- mget(ls("package:base"), inherits = TRUE)
funs <- Filter(is.function, objs)
funs[109:110]
## $all.equal.raw
## function (target, current, ..., check.attributes = TRUE) 
## {
##     if (!is.logical(check.attributes)) 
##         stop(gettextf("'%s' must be logical", "check.attributes"), 
##             domain = NA)
##     msg <- if (check.attributes) 
##         attr.all.equal(target, current, ...)
##     if (data.class(target) != data.class(current)) {
##         msg <- c(msg, paste0("target is ", data.class(target), 
##             ", current is ", data.class(current)))
##         return(msg)
##     }
##     lt <- length(target)
##     lc <- length(current)
##     if (lt != lc) {
##         if (!is.null(msg)) 
##             msg <- msg[-grep("\\bLengths\\b", msg)]
##         msg <- c(msg, paste0("Lengths (", lt, ", ", lc, ") differ (comparison on first ", 
##             ll <- min(lt, lc), " components)"))
##         ll <- seq_len(ll)
##         target <- target[ll]
##         current <- current[ll]
##     }
##     nas <- is.na(target)
##     nasc <- is.na(current)
##     if (any(nas != nasc)) {
##         msg <- c(msg, paste("'is.NA' value mismatch:", sum(nasc), 
##             "in current", sum(nas), "in target"))
##         return(msg)
##     }
##     ne <- !nas & (target != current)
##     if (!any(ne) && is.null(msg)) 
##         TRUE
##     else if (sum(ne) == 1L) 
##         c(msg, paste("1 element mismatch"))
##     else if (sum(ne) > 1L) 
##         c(msg, paste(sum(ne), "element mismatches"))
##     else msg
## }
## <bytecode: 0x0000000013397208>
## <environment: namespace:base>
## 
## $all.names
## function (expr, functions = TRUE, max.names = -1L, unique = FALSE) 
## .Internal(all.names(expr, functions, max.names, unique))
## <bytecode: 0x00000000133761c8>
## <environment: namespace:base>

Lexical scoping

Scoping is the set of rules that govern how R looks up the value of a symbol.

R has two types of scoping:

  • lexical scoping, implemented automatically at the language level, and
  • dynamic scoping, used in select functions to save typing during interactive analysis.

There are four basic principles behind R’s implementation of lexical scoping:

  • name masking
  • functions vs. variables
  • a fresh start: every time a function is called, a new environment is created to host execution. A function has no way to tell what happened the last time it was run; each invocation is completely independent.
  • dynamic lookup: Lexical scoping determines where to look for values, not when to look for them. R looks for values when the function is run, not when it’s created. This means that the output of a function can be different depending on objects outside its environment.

Name masking

Look up the output with single function or function nested in a function

f <- function() {
  x <- 1
  y <- 2
  c(x, y)
}
f()
## [1] 1 2

If a name isn’t defined inside a function, R will look one level up.

x <- 2
g <- function() {
  y <- 1
  c(x, y)
}
g()
## [1] 2 1

The same rules apply if a function is defined inside another function

x <- 1
h <- function() {
  y <- 2
  i <- function() {
    z <- 3
    c(x, y, z)
  }
  i()
}
h()
## [1] 1 2 3

Or

j <- function(x) {
  y <- 2
  function() {
    c(x, y)
  }
}
k <- j(1)
k()
## [1] 1 2

When creating a function inside another function, the enclosing environment of the child function is the execution environment of the parent, and the execution environment is no longer ephemeral (transitory).

Infix functions

Most functions in R are “prefix” operators: the name of the function comes before the arguments. You can also create infix functions where the function name comes in between its arguments, like + or -. All user-created infix functions must start and end with %. R comes with the following infix functions predefined: %%, %*%, %/%, %in%, %o%, %x%. (The complete list of built-in infix operators that don’t need % is: :, ::, :::, $, @, ^, *, /, +, -, >, >=, <, <=, ==, !=, !, &, &&, |, ||, ~, <-, <<-)

`%+%` <- function(a, b) paste0(a, b)
"new" %+% " string"
## [1] "new string"

Replacement functions

Replacement functions act like they modify their arguments in place, and have the special name xxx<-. They typically have two arguments (x and value), although they can have more, and they must return the modified object. For example, the following function allows you to modify the second element of a vector

`second<-` <- function(x, value) {
  x[2] <- value
  x
}
x <- 1:10
second(x) <- 5L
x
##  [1]  1  5  3  4  5  6  7  8  9 10

Return values and On exit

It’s good style to reserve the use of an explicit return() for when you are returning early, such as for an error, or a simple case of the function.

f <- function(x, y) {
  if (!x) return(y)

  # complicated processing here
}
in_dir <- function(dir, code) {
  old <- setwd(dir)
  on.exit(setwd(old))

  force(code)
}
getwd()
## [1] "C:/Users/minhh/Dropbox/MRC/TW"
in_dir("~", getwd())
## [1] "C:/Users/minhh/Documents"

Mutable state

Having variables at two levels allows you to maintain state across function invocations. This is possible because while the execution environment is refreshed every time, the enclosing environment is constant. The key to managing variables at different levels is the double arrow assignment operator (<<-).

new_counter <- function() {
  i <- 0
  function() {
    i <<- i + 1
    i
  }
}

Each time new_counter is run, it creates an environment, initialises the counter i in this environment, and then creates a new function.
The new function is a closure, and its enclosing environment is the environment created when new_counter() is run. Ordinarily, function execution environments are temporary, but a closure maintains access to the environment in which it was created. In the example below, closures counter_one() and counter_two() each get their own enclosing environments when run, so they can maintain different counts.

counter_one <- new_counter()
counter_two <- new_counter()

counter_one()
## [1] 1
counter_one()
## [1] 2
counter_two()
## [1] 1

What happens if you don’t use a closure? What happens if you use <- instead of <<-?

i <- 0
new_counter2 <- function() {
  i <<- i + 1
  i
}
new_counter3 <- function() {
  i <- 0
  function() {
    i <- i + 1
    i
  }
}

Lists of functions

compute_mean <- list(
  base = function(x) mean(x),
  sum = function(x) sum(x) / length(x),
  manual = function(x) {
    total <- 0
    n <- length(x)
    for (i in seq_along(x)) {
      total <- total + x[i] / n
    }
    total
  }
)

Calling a function from a list is straightforward. You extract it then call it:

x <- runif(1e5)
system.time(compute_mean$base(x))
##    user  system elapsed 
##       0       0       0
system.time(compute_mean[[2]](x))
##    user  system elapsed 
##       0       0       0
system.time(compute_mean[["manual"]](x))
##    user  system elapsed 
##    0.02    0.00    0.01
lapply(compute_mean, function(f) f(x))
## $base
## [1] 0.5007404
## 
## $sum
## [1] 0.5007404
## 
## $manual
## [1] 0.5007404
call_fun <- function(f, ...) f(...)
lapply(compute_mean, call_fun, x)
## $base
## [1] 0.5007404
## 
## $sum
## [1] 0.5007404
## 
## $manual
## [1] 0.5007404
lapply(compute_mean, function(f) system.time(f(x)))
## $base
##    user  system elapsed 
##       0       0       0 
## 
## $sum
##    user  system elapsed 
##       0       0       0 
## 
## $manual
##    user  system elapsed 
##    0.01    0.00    0.02

Coming back to the questions

The main difference here is the evaluation of n. For method 1, the function that gets returned will essentially have a hard-coded n value.

n = 100
f1 = make.power(2)
f1(2)
## [1] 4
n = 1
f1(2)
## [1] 4

Method 2 will not, instead it will rely on the global definition of n

n = 1
make.power2(2)
## [1] 2
n = 100
make.power2(2)
## [1] 1.267651e+30

An applicantion

Select the wanted paragraph

para.extract<-function(inputfile, begphrase, endphrase, outputfile){
  ## read file
  txt <- readLines(inputfile)
  
  ## extract paragraph(s) defined from beginning phrase and ending phrase
  lns <- data.frame(beg=which(grepl(begphrase,txt)),
                    end=which(grepl(endphrase,txt)))
  txt.2 <- lapply(seq_along(lns$beg),function(l){
    paste(txt[seq(from=lns$beg[l], to=lns$end[l])])})
  
  ## write to file
  file<-file(outputfile)
  writeLines(txt.2[[1]], file)
  close(file)
}

para.extract(inputfile = "Mplus ouput.txt", begphrase = "INPUT INSTRUCTIONS", endphrase = "SUMMARY OF ANALYSIS", outputfile  = "ext1_input instructions.txt")
para.extract(inputfile = "Mplus ouput.txt", begphrase = "INPUT INSTRUCTIONS", endphrase= "SUMMARY OF ANALYSIS", outputfile  = "ext2_summary of analysis.txt")
para.extract(inputfile = "Mplus ouput.txt", begphrase = "INPUT INSTRUCTIONS", endphrase= "SUMMARY OF ANALYSIS", outputfile  = "ext3_model fit information.txt")
para.extract(inputfile = "Mplus ouput.txt", begphrase = "INPUT INSTRUCTIONS", endphrase= "SUMMARY OF ANALYSIS", outputfile  = "ext4_model results.txt")
para.extract(inputfile = "Mplus ouput.txt", begphrase = "INPUT INSTRUCTIONS", endphrase= "SUMMARY OF ANALYSIS", outputfile  = "ext5_alignment output.txt")
para.extract(inputfile = "Mplus ouput.txt", begphrase = "INPUT INSTRUCTIONS", endphrase= "SUMMARY OF ANALYSIS", outputfile  = "ext6_savedata information.txt")

Can become a function nested in a function

cut.file<-function(inputfile,outputfilepath){
  para.extract<-function(inputfile, begphrase, endphrase, outputfile){
    txt <- readLines(inputfile)
    ## extract paragraph(s) defined from beginning phrase and ending phrase
    lns <- data.frame(beg=which(grepl(begphrase,txt)),
                      end=which(grepl(endphrase,txt)))
    txt.2 <- lapply(seq_along(lns$beg),function(l){
      paste(txt[seq(from=lns$beg[l], to=lns$end[l])])})
    ## write to file
    file<-file(outputfile)
    writeLines(txt.2[[1]], file)
    close(file)
  }
  
  filepath1<-paste(outputfilepath,"ext1_input instructions.txt",sep="/")
  filepath2<-paste(outputfilepath,"ext2_summary of analysis.txt",sep="/")
  filepath3<-paste(outputfilepath,"ext3_model fit information.txt",sep="/")
  filepath4<-paste(outputfilepath,"ext4_model results.txt",sep="/")
  filepath5<-paste(outputfilepath,"ext5_alignment output.txt",sep="/")
  filepath6<-paste(outputfilepath,"ext6_savedata information.txt",sep="/")
  
  ##INPUT INSTRUCTIONS
  sapply(X = inputfile,FUN=para.extract, begphrase="INPUT INSTRUCTIONS", endphrase="SUMMARY OF ANALYSIS", filepath1)
  ##SUMMARY OF ANALYSIS
  sapply(X = inputfile,FUN=para.extract, begphrase="SUMMARY OF ANALYSIS", endphrase="MODEL FIT INFORMATION", filepath2)
  ##MODEL FIT INFORMATION
  sapply(X = inputfile,FUN=para.extract, begphrase="MODEL FIT INFORMATION", endphrase="MODEL RESULTS", filepath3)
  ##MODEL RESULTS
  sapply(X = inputfile,FUN=para.extract, begphrase="MODEL RESULTS", endphrase="ALIGNMENT OUTPUT", filepath4)
  ##ALIGNMENT OUTPUT
  sapply(X = inputfile,FUN=para.extract, begphrase="ALIGNMENT OUTPUT", endphrase="SAVEDATA INFORMATION", filepath5)
  ##SAVEDATA INFORMATION
  sapply(X = inputfile,FUN=para.extract, begphrase="SAVEDATA INFORMATION", endphrase="Support: Support@StatModel.com", filepath6)
}

Reference