In R, when want to use one/multiple functions inside another function, maybe there are two ways. An example function can be:
make.power <- function(n) {
pow <- function(x) {
x^n
}
pow
}
make.power2 <- function(n) {
pow(n)
}
pow <- function(x) {
x^n
}
In my opinion, the first method is a better way for the one calls function (i.e. easier for users) but more complex to write.
Are there any functional differences between the two ways? E.g., how the function will pass variables, or the relationship between the children and parent functions, etc..
Which one might be a preferred one (maybe more computational efficient or structurally clear) for R?
R functions have three parts:
the body(), the code inside the function.
the formals(), the list of arguments which controls how you can call the function.
the environment(), the “map” of the location of the function’s variables.
When you print a function in R, it shows you these three important components. If the environment isn’t displayed, it means that the function was created in the global environment.
f <- function(x) x^2
f
## function(x) x^2
formals(f)
## $x
body(f)
## x^2
environment(f)
## <environment: R_GlobalEnv>
Primitive functions: like `sum()`, call C code directly with `.Primitive()` and contain no R code. Therefore their `formals()`, `body()`, and `environment()` are all NULL.
objs <- mget(ls("package:base"), inherits = TRUE)
funs <- Filter(is.function, objs)
funs[109:110]
## $all.equal.raw
## function (target, current, ..., check.attributes = TRUE)
## {
## if (!is.logical(check.attributes))
## stop(gettextf("'%s' must be logical", "check.attributes"),
## domain = NA)
## msg <- if (check.attributes)
## attr.all.equal(target, current, ...)
## if (data.class(target) != data.class(current)) {
## msg <- c(msg, paste0("target is ", data.class(target),
## ", current is ", data.class(current)))
## return(msg)
## }
## lt <- length(target)
## lc <- length(current)
## if (lt != lc) {
## if (!is.null(msg))
## msg <- msg[-grep("\\bLengths\\b", msg)]
## msg <- c(msg, paste0("Lengths (", lt, ", ", lc, ") differ (comparison on first ",
## ll <- min(lt, lc), " components)"))
## ll <- seq_len(ll)
## target <- target[ll]
## current <- current[ll]
## }
## nas <- is.na(target)
## nasc <- is.na(current)
## if (any(nas != nasc)) {
## msg <- c(msg, paste("'is.NA' value mismatch:", sum(nasc),
## "in current", sum(nas), "in target"))
## return(msg)
## }
## ne <- !nas & (target != current)
## if (!any(ne) && is.null(msg))
## TRUE
## else if (sum(ne) == 1L)
## c(msg, paste("1 element mismatch"))
## else if (sum(ne) > 1L)
## c(msg, paste(sum(ne), "element mismatches"))
## else msg
## }
## <bytecode: 0x0000000013397208>
## <environment: namespace:base>
##
## $all.names
## function (expr, functions = TRUE, max.names = -1L, unique = FALSE)
## .Internal(all.names(expr, functions, max.names, unique))
## <bytecode: 0x00000000133761c8>
## <environment: namespace:base>
Scoping is the set of rules that govern how R looks up the value of a symbol.
R has two types of scoping:
There are four basic principles behind R’s implementation of lexical scoping:
Look up the output with single function or function nested in a function
f <- function() {
x <- 1
y <- 2
c(x, y)
}
f()
## [1] 1 2
If a name isn’t defined inside a function, R will look one level up.
x <- 2
g <- function() {
y <- 1
c(x, y)
}
g()
## [1] 2 1
The same rules apply if a function is defined inside another function
x <- 1
h <- function() {
y <- 2
i <- function() {
z <- 3
c(x, y, z)
}
i()
}
h()
## [1] 1 2 3
Or
j <- function(x) {
y <- 2
function() {
c(x, y)
}
}
k <- j(1)
k()
## [1] 1 2
When creating a function inside another function, the enclosing environment of the child function is the execution environment of the parent, and the execution environment is no longer ephemeral (transitory).
Most functions in R are “prefix” operators: the name of the function comes before the arguments. You can also create infix functions where the function name comes in between its arguments, like + or -. All user-created infix functions must start and end with %. R comes with the following infix functions predefined: %%, %*%, %/%, %in%, %o%, %x%. (The complete list of built-in infix operators that don’t need % is: :, ::, :::, $, @, ^, *, /, +, -, >, >=, <, <=, ==, !=, !, &, &&, |, ||, ~, <-, <<-)
`%+%` <- function(a, b) paste0(a, b)
"new" %+% " string"
## [1] "new string"
Replacement functions act like they modify their arguments in place, and have the special name xxx<-. They typically have two arguments (x and value), although they can have more, and they must return the modified object. For example, the following function allows you to modify the second element of a vector
`second<-` <- function(x, value) {
x[2] <- value
x
}
x <- 1:10
second(x) <- 5L
x
## [1] 1 5 3 4 5 6 7 8 9 10
It’s good style to reserve the use of an explicit return() for when you are returning early, such as for an error, or a simple case of the function.
f <- function(x, y) {
if (!x) return(y)
# complicated processing here
}
in_dir <- function(dir, code) {
old <- setwd(dir)
on.exit(setwd(old))
force(code)
}
getwd()
## [1] "C:/Users/minhh/Dropbox/MRC/TW"
in_dir("~", getwd())
## [1] "C:/Users/minhh/Documents"
Having variables at two levels allows you to maintain state across function invocations. This is possible because while the execution environment is refreshed every time, the enclosing environment is constant. The key to managing variables at different levels is the double arrow assignment operator (<<-).
new_counter <- function() {
i <- 0
function() {
i <<- i + 1
i
}
}
Each time new_counter is run, it creates an environment, initialises the counter i in this environment, and then creates a new function.
The new function is a closure, and its enclosing environment is the environment created when new_counter() is run. Ordinarily, function execution environments are temporary, but a closure maintains access to the environment in which it was created. In the example below, closures counter_one() and counter_two() each get their own enclosing environments when run, so they can maintain different counts.
counter_one <- new_counter()
counter_two <- new_counter()
counter_one()
## [1] 1
counter_one()
## [1] 2
counter_two()
## [1] 1
What happens if you don’t use a closure? What happens if you use <- instead of <<-?
i <- 0
new_counter2 <- function() {
i <<- i + 1
i
}
new_counter3 <- function() {
i <- 0
function() {
i <- i + 1
i
}
}
compute_mean <- list(
base = function(x) mean(x),
sum = function(x) sum(x) / length(x),
manual = function(x) {
total <- 0
n <- length(x)
for (i in seq_along(x)) {
total <- total + x[i] / n
}
total
}
)
Calling a function from a list is straightforward. You extract it then call it:
x <- runif(1e5)
system.time(compute_mean$base(x))
## user system elapsed
## 0 0 0
system.time(compute_mean[[2]](x))
## user system elapsed
## 0 0 0
system.time(compute_mean[["manual"]](x))
## user system elapsed
## 0.02 0.00 0.01
lapply(compute_mean, function(f) f(x))
## $base
## [1] 0.5007404
##
## $sum
## [1] 0.5007404
##
## $manual
## [1] 0.5007404
call_fun <- function(f, ...) f(...)
lapply(compute_mean, call_fun, x)
## $base
## [1] 0.5007404
##
## $sum
## [1] 0.5007404
##
## $manual
## [1] 0.5007404
lapply(compute_mean, function(f) system.time(f(x)))
## $base
## user system elapsed
## 0 0 0
##
## $sum
## user system elapsed
## 0 0 0
##
## $manual
## user system elapsed
## 0.01 0.00 0.02
The main difference here is the evaluation of n. For method 1, the function that gets returned will essentially have a hard-coded n value.
n = 100
f1 = make.power(2)
f1(2)
## [1] 4
n = 1
f1(2)
## [1] 4
Method 2 will not, instead it will rely on the global definition of n
n = 1
make.power2(2)
## [1] 2
n = 100
make.power2(2)
## [1] 1.267651e+30
Select the wanted paragraph
para.extract<-function(inputfile, begphrase, endphrase, outputfile){
## read file
txt <- readLines(inputfile)
## extract paragraph(s) defined from beginning phrase and ending phrase
lns <- data.frame(beg=which(grepl(begphrase,txt)),
end=which(grepl(endphrase,txt)))
txt.2 <- lapply(seq_along(lns$beg),function(l){
paste(txt[seq(from=lns$beg[l], to=lns$end[l])])})
## write to file
file<-file(outputfile)
writeLines(txt.2[[1]], file)
close(file)
}
para.extract(inputfile = "Mplus ouput.txt", begphrase = "INPUT INSTRUCTIONS", endphrase = "SUMMARY OF ANALYSIS", outputfile = "ext1_input instructions.txt")
para.extract(inputfile = "Mplus ouput.txt", begphrase = "INPUT INSTRUCTIONS", endphrase= "SUMMARY OF ANALYSIS", outputfile = "ext2_summary of analysis.txt")
para.extract(inputfile = "Mplus ouput.txt", begphrase = "INPUT INSTRUCTIONS", endphrase= "SUMMARY OF ANALYSIS", outputfile = "ext3_model fit information.txt")
para.extract(inputfile = "Mplus ouput.txt", begphrase = "INPUT INSTRUCTIONS", endphrase= "SUMMARY OF ANALYSIS", outputfile = "ext4_model results.txt")
para.extract(inputfile = "Mplus ouput.txt", begphrase = "INPUT INSTRUCTIONS", endphrase= "SUMMARY OF ANALYSIS", outputfile = "ext5_alignment output.txt")
para.extract(inputfile = "Mplus ouput.txt", begphrase = "INPUT INSTRUCTIONS", endphrase= "SUMMARY OF ANALYSIS", outputfile = "ext6_savedata information.txt")
Can become a function nested in a function
cut.file<-function(inputfile,outputfilepath){
para.extract<-function(inputfile, begphrase, endphrase, outputfile){
txt <- readLines(inputfile)
## extract paragraph(s) defined from beginning phrase and ending phrase
lns <- data.frame(beg=which(grepl(begphrase,txt)),
end=which(grepl(endphrase,txt)))
txt.2 <- lapply(seq_along(lns$beg),function(l){
paste(txt[seq(from=lns$beg[l], to=lns$end[l])])})
## write to file
file<-file(outputfile)
writeLines(txt.2[[1]], file)
close(file)
}
filepath1<-paste(outputfilepath,"ext1_input instructions.txt",sep="/")
filepath2<-paste(outputfilepath,"ext2_summary of analysis.txt",sep="/")
filepath3<-paste(outputfilepath,"ext3_model fit information.txt",sep="/")
filepath4<-paste(outputfilepath,"ext4_model results.txt",sep="/")
filepath5<-paste(outputfilepath,"ext5_alignment output.txt",sep="/")
filepath6<-paste(outputfilepath,"ext6_savedata information.txt",sep="/")
##INPUT INSTRUCTIONS
sapply(X = inputfile,FUN=para.extract, begphrase="INPUT INSTRUCTIONS", endphrase="SUMMARY OF ANALYSIS", filepath1)
##SUMMARY OF ANALYSIS
sapply(X = inputfile,FUN=para.extract, begphrase="SUMMARY OF ANALYSIS", endphrase="MODEL FIT INFORMATION", filepath2)
##MODEL FIT INFORMATION
sapply(X = inputfile,FUN=para.extract, begphrase="MODEL FIT INFORMATION", endphrase="MODEL RESULTS", filepath3)
##MODEL RESULTS
sapply(X = inputfile,FUN=para.extract, begphrase="MODEL RESULTS", endphrase="ALIGNMENT OUTPUT", filepath4)
##ALIGNMENT OUTPUT
sapply(X = inputfile,FUN=para.extract, begphrase="ALIGNMENT OUTPUT", endphrase="SAVEDATA INFORMATION", filepath5)
##SAVEDATA INFORMATION
sapply(X = inputfile,FUN=para.extract, begphrase="SAVEDATA INFORMATION", endphrase="Support: Support@StatModel.com", filepath6)
}
Good ways to define functions inside function in R - stack overflow
Hadley Wickham, Advanced R. Functions. On website