This is a reproduction of the Functional Programming chapter (Chapter 10) of Advanced R. The purpose was to learn the concepts, so I deviate a little from the book: what is here has some differences from the chapter, mostly extensions for my own learning benefit. I skipped the Numerical Integration section, so refer to the book if interested.
library(tidyverse)
library(pryr)
Functional programming has many advantages. One example is a function that can adapt depending on the needs of the user.
missing_fixer <- function(na_value) {
  function(x) {
    x[x == na_value] <- NA
    x
  }
}
Now we can adapt the function depending on our needs.
fix_missing_99 <- missing_fixer(-99)
fix_missing_99(c(-99, -999))
[1] NA -999
fix_missing_999 <- missing_fixer(-999)
fix_missing_999(c(-99, -999))
[1] -99 NA
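As a quick sketch of my own (using a small made-up data frame df), the same fixer can be applied to every column of a data frame at once:
df <- data.frame(a = c(1, -99, 3), b = c(-99, 5, 6))
df[] <- lapply(df, missing_fixer(-99)) # every -99 becomes NA, column by column
df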
Another use is to reduce duplication. The code below loops over a list of functions with lapply(). Using lapply() avoids writing a separate call for each summary function, and the anonymous function means the na.rm = TRUE argument only has to be written once. Note how each function in funs gets passed to the anonymous function as f.
summary2 <- function(x) {
  funs <- c(mean = mean, median = median, sd = sd, mad = mad, IQR = IQR)
  lapply(funs, function(f) f(x, na.rm = TRUE))
}
Here’s the result:
mtcars$mpg %>%
summary2() %>%
unlist()
mean median sd mad IQR
20.090625 19.200000 6.026948 5.411490 7.375000
Anonymous functions are useful when it's not worth the effort to give a function a name.
mtcars %>%
lapply(function(x) x %>% unique() %>% length()) %>%
unlist()
mpg cyl disp hp drat wt qsec vs am gear carb
25 3 27 22 22 29 30 2 2 3 6
Like named functions, anonymous functions have formals(), a body(), and a parent environment().
formals(function(x = 4) g(x) + h(x))
$x
[1] 4
body(function(x = 4) g(x) + h(x))
g(x) + h(x)
environment(function(x = 4) g(x) + h(x))
<environment: R_GlobalEnv>
See how we can replace a named function with an anonymous function by wrapping it in parentheses.
f <- function(x) x + 3
f(2)
[1] 5
Replace f with (function(x) x + 3):
(function(x) x + 3)(2)
[1] 5
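A side note of my own: the parentheses matter here. Without them, R parses the trailing (2) as part of the function body rather than as a call.
f2 <- function(x) x + 3 (2) # body is x + 3(2); calling f2 errors because 3 is not a function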
match.fun() lets you find a function from its name. Given a function, can you find its name? That doesn't really make sense in R, because a function can be bound to many names, or to no name at all. Get a function from a character string using match.fun():
match.fun("mean")
function (x, ...)
UseMethod("mean")
<bytecode: 0x0000000015e4d358>
<environment: namespace:base>
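A small sketch of where match.fun() is handy (my own example, with a hypothetical apply_fun()): inside a wrapper that accepts either a function or its name.
apply_fun <- function(f, x, ...) {
  f <- match.fun(f) # accept "mean" or mean
  f(x, ...)
}
apply_fun("mean", 1:10) # 5.5
apply_fun(median, 1:10) # 5.5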
Convert a function name to a character string using quote() and as.character():
quote(mean) %>%
as.character()
[1] "mean"
Use lapply() and an anonymous function to compute mean(x) / sd(x) (the reciprocal of the coefficient of variation) for all the columns of mtcars:
mtcars %>%
lapply(function(x) mean(x) / sd(x)) %>%
unlist() %>%
round(2)
mpg cyl disp hp drat wt qsec vs am gear carb
3.33 3.46 1.86 2.14 6.73 3.29 9.99 0.87 0.81 5.00 1.74
Use integrate() and an anonymous function to find the area under a curve:
integrate(function(x) x ^ 2 - x, 0, 10)
283.3333 with absolute error < 3.1e-12
integrate(function(x) sin(x) - cos(x), -pi, pi)
-2.615901e-16 with absolute error < 6.3e-14
integrate(function(x) exp(x) / x, 10, 20)
25613160 with absolute error < 2.8e-07
Closures are functions written by functions. Closures get their name because they enclose the environment of the parent function and can access all of its variables. This is useful because it allows two levels of parameters:
power2 <- function(exponent) {
  function(x) {
    x ^ exponent
  }
}
We can use the function factory power2() to create the closures square() and cube().
square <- power2(2)
square(4)
[1] 16
cube <- power2(3)
cube(4)
[1] 64
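Another way (my own aside) to see what a closure encloses is to list the objects in its enclosing environment:
as.list(environment(square))
$exponent
[1] 2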
Use pryr::unenclose() to see what's going on inside a closure; it replaces variables defined in the enclosing environment with their values.
cube
function(x) {
x ^ exponent
}
<environment: 0x000000001599d648>
See the difference:
unenclose(cube)
function (x)
{
x^3
}
Function factories like this let you create many versions of the same general function; power2() and missing_fixer() are examples. They are most useful for problems like maximum likelihood, where you want to fix the data once and optimise over the remaining parameters.
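Here's a minimal sketch of my own (assuming a Poisson model; not an example from the book): a factory that fixes the data and returns the negative log-likelihood as a function of lambda only.
poisson_nll <- function(x) {
  n <- length(x)
  sum_x <- sum(x)
  function(lambda) {
    n * lambda - sum_x * log(lambda) # negative log-likelihood, up to a constant
  }
}
nll <- poisson_nll(c(41, 30, 31, 38, 29))
optimise(nll, c(0.01, 100))$minimum # the estimate is close to the sample mean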
We can use the double-arrow assignment operator (<<-) to maintain state across function calls. For example, a new_counter() function factory can be used to create counter closures:
new_counter <- function() {
  i <- 0
  function() {
    i <<- i + 1
    i
  }
}
counter_one <- new_counter()
counter_two <- new_counter()
Because counter_one() and counter_two() each get their own enclosing environments, they can keep track of their own respective counts.
counter_one()
[1] 1
counter_one()
[1] 2
counter_two()
[1] 1
The counters get around the "fresh start" limitation by not modifying variables in their local execution environment, but instead modifying i in the enclosing environment with <<-. Those changes are preserved across function calls. This works because the parent function encloses the child function, so the child can find, and modify, the variables in the parent's environment.
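To see the contrast, here is a sketch of my own: with ordinary local assignment (<-), each call gets a fresh execution environment and the count never advances.
broken_counter <- function() {
  i <- 0
  function() {
    i <- i + 1 # local assignment: creates a new i in this call's environment
    i
  }
}
count <- broken_counter()
count() # 1
count() # still 1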
bc <- function(lambda) {
  if (lambda == 0) {
    function(x) log(x)
  } else {
    function(x) (x ^ lambda - 1) / lambda
  }
}
This function factory implements the Box-Cox transformation. A better name might be new_box_cox(), since it creates Box-Cox transformation functions:
new_box_cox <- bc
We can create many variants of the Box-Cox transformation by varying lambda.
bc_0 <- new_box_cox(0)
bc_0(10)
[1] 2.302585
bc_0.5 <- new_box_cox(0.5)
bc_0.5(10)
[1] 4.324555
bc_1 <- new_box_cox(1)
bc_1(10)
[1] 9
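Building on this (a sketch of my own), we can generate a whole list of Box-Cox transformers over a grid of lambda values and apply each to the same input:
lambdas <- c(0, 0.5, 1, 2)
transformers <- lapply(lambdas, new_box_cox)
sapply(transformers, function(f) f(10)) # 2.302585 4.324555 9.000000 49.500000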
What does approxfun() do? It interpolates between the points in a data set and returns a function, which can be evaluated (or plotted as a curve) at values in between the original data points.
set.seed(198)
x <- 1:10
y <- rnorm(10)
f <- approxfun(x, y)
plot(x, y, col = 1, pch = "*")
plot(f, 0, 11, col = "tomato", add = TRUE, lty = 3, lwd = 2) # interpolating function
points(x = 2.5, y = f(2.5), col = "orange", pch = 6) # interpolated point
We can use the function f to interpolate between the points at x = 2 and x = 3.
f(2.5)
[1] -0.4438546
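Since approxfun() defaults to linear interpolation, the value halfway between x = 2 and x = 3 is just the average of the two neighbouring y values:
(y[2] + y[3]) / 2 # matches f(2.5)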
What does ecdf() do? It stands for empirical cumulative distribution function. ecdf() returns a step function that, for any value, gives the proportion of the sample less than or equal to that value.
set.seed(12)
x <- rnorm(50)
Fn <- ecdf(x)
To get the empirical cumulative probability of a new point:
Fn(0)
[1] 0.58
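By definition, Fn(0) is simply the proportion of sample values that are less than or equal to 0:
mean(x <= 0) # matches Fn(0)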
Create a function factory, moment(), that returns a function computing the nth central moment of a numeric vector:
moment <- function(n) {
  function(x) {
    sum((x - mean(x)) ^ n) / length(x)
  }
}
Run Hadley’s code:
m1 <- moment(1)
m2 <- moment(2)
x <- runif(100)
stopifnot(all.equal(m1(x), 0))
stopifnot(all.equal(m2(x), var(x) * 99 / 100))
It works: the first central moment is always zero, and m2() divides by n where var() divides by n - 1, hence the 99/100 factor.
Create pick(), which takes an index, i, as an argument and returns a function with an argument x that subsets x with i.
pick <- function(i) {
  function(x) x[[i]]
}
Run code:
lapply(mtcars, pick(5)) %>% unlist()
mpg cyl disp hp drat wt qsec vs am gear carb
18.70 8.00 360.00 175.00 3.15 3.44 17.02 0.00 0.00 3.00 2.00
Check results:
lapply(mtcars, function(x) x[[5]]) %>% unlist()
mpg cyl disp hp drat wt qsec vs am gear carb
18.70 8.00 360.00 175.00 3.15 3.44 17.02 0.00 0.00 3.00 2.00
We can store functions in lists, which reduces redundancy.
x <- 1:10
funs <- list(
  sum = sum,
  mean = mean,
  median = median
)
lapply(funs, function(f) f(x, na.rm = TRUE)) %>%
unlist()
sum mean median
55.0 5.5 5.5
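As a small aside of mine, individual functions can also be pulled back out of the list by name:
funs$median(x) # 5.5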
Write a function that replicates base::summary(), but uses a list of functions. Then modify the function so it returns a closure, making it possible to use it as a function factory. Here's the function we need to replicate:
x <- 1:10
summary(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.00 3.25 5.50 5.50 7.75 10.00
First, recreate base::summary():
summary2 <- function(x) {
  funs <- list(
    "Min." = min,
    "1st Qu." = function(x) quantile(x, 0.25)[[1]],
    "Median" = median,
    "Mean" = mean,
    "3rd Qu." = function(x) quantile(x, 0.75)[[1]],
    "Max" = max
  )
  lapply(funs, function(f) f(x)) %>%
    unlist()
}
summary2(x)
Min. 1st Qu. Median Mean 3rd Qu. Max
1.00 3.25 5.50 5.50 7.75 10.00
Now, make a function factory:
make_summary <- function(
  funs = list(
    "Min." = min,
    "1st Qu." = function(x) quantile(x, 0.25)[[1]],
    "Median" = median,
    "Mean" = mean,
    "3rd Qu." = function(x) quantile(x, 0.75)[[1]],
    "Max" = max)
) {
  function(x) {
    lapply(funs, function(f) f(x)) %>%
      unlist()
  }
}
We can now use the function factory to create a closure that replicates base::summary()…
summary2 <- make_summary()
summary2(x)
Min. 1st Qu. Median Mean 3rd Qu. Max
1.00 3.25 5.50 5.50 7.75 10.00
…or we can give it our own custom functions to run.
summary3 <- make_summary(funs = list(Mean = mean,
Std.Dev = sd,
Quantile = quantile,
IQR = IQR))
summary3(x)
Mean Std.Dev Quantile.0% Quantile.25% Quantile.50% Quantile.75%
5.50000 3.02765 1.00000 3.25000 5.50000 7.75000
Quantile.100% IQR
10.00000 4.50000
We can even apply our new summary function to a data frame:
lapply(mtcars[,1:4], summary3)
$mpg
Mean Std.Dev Quantile.0% Quantile.25% Quantile.50% Quantile.75%
20.090625 6.026948 10.400000 15.425000 19.200000 22.800000
Quantile.100% IQR
33.900000 7.375000
$cyl
Mean Std.Dev Quantile.0% Quantile.25% Quantile.50% Quantile.75%
6.187500 1.785922 4.000000 4.000000 6.000000 8.000000
Quantile.100% IQR
8.000000 4.000000
$disp
Mean Std.Dev Quantile.0% Quantile.25% Quantile.50% Quantile.75%
230.7219 123.9387 71.1000 120.8250 196.3000 326.0000
Quantile.100% IQR
472.0000 205.1750
$hp
Mean Std.Dev Quantile.0% Quantile.25% Quantile.50% Quantile.75%
146.68750 68.56287 52.00000 96.50000 123.00000 180.00000
Quantile.100% IQR
335.00000 83.50000
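As a final sketch of my own, sapply() simplifies the same result into a matrix, with one column per variable:
sapply(mtcars[, 1:4], summary3)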
Section skipped