User defined functions

Introduction

Functions in programming are sets of related instructions bundled together to perform a specific task. They are used when a task must be performed repeatedly, when a computation is complex, or when a programmer wants to break a large program into smaller, more manageable chunks. Functions may or may not take arguments, and they can return multiple values, a single value, or no value at all.

There are three types of functions in R.

Built-in functions, e.g. print(), which displays a specified object in the console, and mean(), which calculates the average of a given set of values.
User-Defined Functions (UDFs), which are functions created by users to perform a specific task. These functions have a clearly defined function name.
Anonymous functions, which are functions that do not have a function name e.g. (function(x, y) 2 * x^2 + x * y - 14)(5, -1/4).
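As a brief illustration of the three kinds, the sketch below calls a built-in function, defines and calls a user-defined function, and evaluates the anonymous function given above. The name square.plus is hypothetical and is used only for illustration.

```r
# built-in function: mean() is provided by base R
avg = mean(c(2, 4, 6))

# user-defined function (the name square.plus is hypothetical)
square.plus = function(x, y){
    x^2 + y
}
udf.value = square.plus(3, 1)

# anonymous function: defined and called in one expression, never named
anon.value = (function(x, y) 2 * x^2 + x * y - 14)(5, -1/4)

avg         # 4
udf.value   # 10
anon.value  # 34.75
```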

This page focuses on user-defined functions, which, as already mentioned, are functions with a clearly defined function name.

Install required packages

The following package is required and should be loaded first. If it is not already installed, install it with the install.packages() function, e.g. install.packages("kableExtra").

library(kableExtra) # display table formatting

Creating user-defined functions

There are four steps to defining a function in R. They are as follows.

Begin with the function name, followed by the assignment operator (either <- or =) and the keyword function.
Add the function parameters. These should be defined within the parentheses that follow the keyword function.
Add statements or instructions to the function body. These statements are the ones that will be executed by the function.
If the function is intended to return output, end it with a return() statement (with the value to be returned inside the brackets) or with the expression whose value should be returned. If return() is omitted, R returns the value of the last evaluated expression.

The function below for example is designed to add two numbers a and b and returns their sum.

sum.ab = function(a, b){
    # function sum.ab - returns the sum of two numbers
    sum.numbers = a + b
    return(sum.numbers)
}

In the above example we have the following.

sum.ab - function name.
function - R keyword for function declaration.
a, b - function parameters.
sum.numbers = a + b - function statement (this is what the function will do, i.e. add the two numbers a and b).
return(sum.numbers) - the return statement which returns the result of the function execution, in this case, the sum of a and b.
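The sketch below is a variation on sum.ab() that illustrates how R handles return values: return() is optional, because R returns the value of the last evaluated expression. The names sum.ab.implicit and sum.ab.quiet are hypothetical and are not part of the original example.

```r
# explicit return, as in sum.ab() above
sum.ab = function(a, b){
    sum.numbers = a + b
    return(sum.numbers)
}

# implicit return: the last evaluated expression is the function's value
sum.ab.implicit = function(a, b){
    a + b
}

# invisible() still returns the value but suppresses automatic printing
sum.ab.quiet = function(a, b){
    invisible(a + b)
}

identical(sum.ab(5, 3), sum.ab.implicit(5, 3))  # TRUE
quiet.value = sum.ab.quiet(5, 3)                # nothing is printed
quiet.value                                     # 8
```

An explicit return() remains useful when a function needs to exit early, for example from inside an if statement.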

Calling functions

In programming, calling a function simply means executing it, either directly or inside other functions. This is done by replacing the function parameters (if any) with specific values (the specific values passed to a function are called function arguments). In the code below, we call the function sum.ab() with the arguments 5 and 3 replacing the parameters a and b respectively. The result is given immediately after the code.

result = sum.ab(5, 3)
result

[1] 8
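A function call can also appear wherever a value is expected, including as an argument to another call. The sketch below repeats the definition of sum.ab() so that it runs on its own; the variable names direct, nested and root are chosen here for illustration.

```r
sum.ab = function(a, b){
    sum.numbers = a + b
    return(sum.numbers)
}

# direct call: 5 and 3 are matched to a and b by position
direct = sum.ab(5, 3)

# call inside another call: the inner result (8) becomes argument a
nested = sum.ab(sum.ab(5, 3), 10)

# call inside a built-in function
root = sqrt(sum.ab(9, 7))

direct  # 8
nested  # 18
root    # 4
```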

If a function does not have parameters, it should be called without specifying any arguments (i.e. just use empty brackets). For example, the function below does not accept any arguments. Its task is to calculate the absolute difference between the exact value \(\pi = 3.1415926535897931...\) and the rational approximation \(\pi \approx \frac{22}{7}.\)

pi.error = function(){
    # function pi.error - returns the absolute difference between the approximation 22/7 and R's built-in value of pi
    pi.rational = 22/7
    abs_error = abs(pi - pi.rational)
    return(abs_error)
}

Since the function was declared without function parameters, we call it with empty brackets as shown in the code below.

result = pi.error()
result

[1] 0.001264489

More about function arguments

As already mentioned, function arguments are the specific values that are passed to a function as inputs either directly or inside other functions. User-defined functions in R can take the following types of arguments.

Default arguments
Required arguments
Keyword arguments
Variable number of arguments

Default arguments

These are values assigned to the parameters of a user-defined function when it is created. If the function is called without specifying a value for such a parameter, the default value is used. This is illustrated below.

circle = function(radius, area = TRUE, pi.value = 22/7){
    # function circle - returns area or circumference of a circle given its radius
    result = data.frame() # initialize an empty data frame
    if (isTRUE(area)){
        calculate = "Area"
        circle.area = pi.value * radius^2
        result = rbind(result, c(radius, circle.area))
    }else{
        calculate = "Perimeter"
        circle.perimeter = 2 * pi.value * radius
        result = rbind(result, c(radius, circle.perimeter))
    }
    names(result) = c("Radius", calculate)
    result
}

The function circle() above calculates either the area (area = TRUE) or circumference (area = FALSE) of a circle. Here, the default argument for the parameter area is TRUE, so, if a user does not specify a value for this parameter (i.e. omits it during function call), then the function will automatically use the default argument TRUE and hence return the area. Similarly, the parameter pi.value has a default value of 22/7 assigned to it. Again if the user omits this parameter during function call, the function will automatically use the default value of 22/7. Below is an example.

# call function
result = circle(21)
# display results
kbl(result) %>%
    kable_styling(bootstrap_options = "striped", full_width = FALSE, position = "left")

Radius	Area
21	1386

In the code above, only one argument was specified, i.e. the 21 in the parentheses of the function call. This argument is therefore taken as the radius, and the other two parameters, area and pi.value, automatically take their default values TRUE and 22/7 respectively.
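To see defaults at work in isolation, the following sketch uses a simplified, hypothetical function circle.simple() (not the circle() function above) that returns a single number, and it inspects one of the defaults with base R's formals().

```r
# simplified stand-in for circle(); the name circle.simple is hypothetical
circle.simple = function(radius, area = TRUE, pi.value = 22/7){
    if (isTRUE(area)) pi.value * radius^2 else 2 * pi.value * radius
}

default.area = circle.simple(21)               # area = TRUE, pi.value = 22/7
perimeter = circle.simple(21, area = FALSE)    # override one default
exact.area = circle.simple(21, pi.value = pi)  # override the other default

default.area  # 1386
perimeter     # 132

# formals() lists the parameters; area carries the default TRUE
formals(circle.simple)$area
```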

Required arguments

These are arguments that must be specified when the function is called because their parameters have no default value. When they are passed without names, they must also be given in the correct order. Failing to specify a required argument results in an error when the function tries to use it (a run-time error, not a syntax error). In the example above, the first argument, radius, is a required argument. The example below returns an error when this argument is not specified (run the code without the # symbol to get the error shown after the code).

# circle()

Error in circle() : argument "radius" is missing, with no default
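The error can also be captured rather than displayed, which is convenient when demonstrating it in a script. The sketch below uses base R's tryCatch() with two hypothetical functions, circle.area() and circle.area.checked(); missing() is shown as one possible way to supply a friendlier message.

```r
# hypothetical function with one required and one default argument
circle.area = function(radius, pi.value = 22/7){
    pi.value * radius^2
}

# capture the error message instead of stopping the script
msg = tryCatch(circle.area(), error = function(e) conditionMessage(e))
msg

# optional: test for the required argument explicitly
circle.area.checked = function(radius, pi.value = 22/7){
    if (missing(radius)) stop("please supply a radius")
    pi.value * radius^2
}
msg.checked = tryCatch(circle.area.checked(),
                       error = function(e) conditionMessage(e))
msg.checked  # "please supply a radius"
```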

Keyword arguments

Keyword arguments (called named arguments in R) are arguments that are specified together with the names of their respective parameters. The advantage of keyword arguments is that a user can change the order of the arguments in the function call without changing the result or output of the function. The example below illustrates this (note that we have even interchanged the order of the function parameters).

# call function
result = circle(pi.value = pi, radius = 21, area = FALSE)
# display results
kbl(result) %>%
    kable_styling(bootstrap_options = "striped", full_width = FALSE, position = "left")

Radius	Perimeter
21	131.9469

Note that in the above call, the function has calculated the circumference of a circle with a radius equal to 21 units using the built-in value of the constant \(\pi\), i.e. the argument pi assigned to the parameter pi.value in the function call statement.
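The order-independence of named arguments can be checked directly. The sketch below uses a hypothetical helper, circle.perimeter(), so that it runs without the table-formatting package.

```r
# hypothetical helper returning the circumference as a single number
circle.perimeter = function(radius, pi.value = 22/7){
    2 * pi.value * radius
}

by.position = circle.perimeter(21, pi)                  # matched by position
by.name = circle.perimeter(pi.value = pi, radius = 21)  # matched by name, order swapped
mixed = circle.perimeter(pi.value = pi, 21)             # named first, the rest by position

identical(by.position, by.name)  # TRUE
identical(by.position, mixed)    # TRUE
round(by.name, 4)                # 131.9469
```

In the mixed call, R matches the named argument first and then assigns the remaining unnamed value to the first unmatched parameter, radius.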

Variable number of arguments

It often occurs in programming that the exact number of arguments a function will receive is not known in advance. The R programming language uses three dots (...) in the parameter list to allow a user-defined function to accept an arbitrary number of arguments during a function call. The function given below calculates the total of an arbitrary number of sales figures.

quartely.sales = function(description, ...){
    # returns the supplied sales figures together with their total
    sales.figures = c(...)
    n = length(sales.figures)
    Total = sum(sales.figures)
    result = data.frame(Sales = c(sales.figures, Total))
    rownames(result) = c(paste0(description, 1:n), "Total")
    return(result)
}

The function above is applied below to calculate the sum of the quarterly sales \(3253, 4218, 2514\) and \(3210.\)

# call function
result = quartely.sales(description = "Quarter", 
                        Quarter1 = 3253, Quarter2 = 4218, Quarter3 = 2514, Quarter4 = 3210)
# display results
kbl(result,
    caption = "Table 1: Quarterly sales.") %>%
    kable_styling(bootstrap_options = "striped", full_width = FALSE, position = "left")

Table 1: Quarterly sales.
	Sales
Quarter1	3253
Quarter2	4218
Quarter3	2514
Quarter4	3210
Total	13195

Note that we have used the above very simple example for demonstration purposes, otherwise we could just have created two vectors, one with the different quarters and another with the respective sales figures (or better still, we could have used a list).
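One way to act on that remark is to capture the dots with list(...), which keeps the argument names so that the row labels can come from the call itself. This is only a sketch under that assumption, and the function name sales.summary is hypothetical.

```r
sales.summary = function(...){
    sales = list(...)                  # named list of the supplied arguments
    figures = unlist(sales)            # named numeric vector
    values = c(figures, Total = sum(figures))
    result = data.frame(Sales = unname(values))
    rownames(result) = names(values)   # labels taken from the argument names
    result
}

result = sales.summary(Quarter1 = 3253, Quarter2 = 4218,
                       Quarter3 = 2514, Quarter4 = 3210)
result
```

Here the description parameter is no longer needed, because the labels are read from the names used in the call.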

Scope of variables

Variables defined within a program can be either local or global. Local variables are defined within the function body, which means that they have only a local scope. These variables are only accessible within the body of the function in which they are declared. On the other hand, global variables are accessible at any point in the program and by any user-defined function that may be in your code. The code below shows two functions, circlearea() and circleperimeter(), used to demonstrate the concept of global variables.

pi.value = 22/7 # this value is global, so it will be accessed by both the
                # circlearea() and circleperimeter() functions
circlearea = function(radius){
    # function circlearea - returns area of a circle given its radius
    circle.area = pi.value * radius^2
    result = data.frame(Radius = radius, Area = circle.area)
    return(result)
}

circleperimeter = function(radius){
    # function circleperimeter - returns perimeter of a circle given its radius
    circle.perimeter = 2 * pi.value * radius
    result = data.frame(Radius = radius, Perimeter = circle.perimeter)
    return(result)
}

In the above code, pi.value = 22/7 is a global variable since it is defined outside the function. It is therefore available to both the circlearea() and circleperimeter() functions. The functions are called as follows.

result = circlearea(21)
# display results
kbl(result) %>%
    kable_styling(bootstrap_options = "striped", full_width = FALSE, position = "left")

Radius	Area
21	1386

result = circleperimeter(21)
# display results
kbl(result) %>%
    kable_styling(bootstrap_options = "striped", full_width = FALSE, position = "left")

Radius	Perimeter
21	132

In the following example, we now declare pi.value = 22/7 inside the circlearea1() function. In such a case, it becomes a local variable and is only accessible by the circlearea1() function and not by the circleperimeter1() function.

rm(pi.value) # remove this variable
circlearea1 = function(radius){
    # function circlearea1 - returns area of a circle given its radius
    pi.value = 22/7 # this is a local variable only available to this function
    circle.area = pi.value * radius^2
    result = data.frame(Radius = radius, Area = circle.area)
    return(result)
}

circleperimeter1 = function(radius){
    # function circleperimeter1 - returns perimeter of a circle given its radius
    circle.perimeter = 2 * pi.value * radius
    result = data.frame(Radius = radius, Perimeter = circle.perimeter)
    return(result)
}

The function circlearea1() executes successfully and returns the result as shown below.

result = circlearea1(21)
# display results
kbl(result) %>%
    kable_styling(bootstrap_options = "striped", full_width = FALSE, position = "left")

Radius	Area
21	1386

However, when circleperimeter1(21) is run, it fails and returns the error given after the code below. Remove the # sign and then run the code to get the error. The call fails because the function circleperimeter1() cannot access the local variable pi.value defined in the body of the circlearea1() function.

# circleperimeter1(21)

Error in circleperimeter1(21) : object 'pi.value' not found
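A compact way to see the same rules is the sketch below, which uses hypothetical names (rate, apply.rate, apply.local.rate). It shows that an assignment inside a function creates a local variable and leaves the global one untouched.

```r
rate = 0.25  # global variable

apply.rate = function(amount){
    amount * rate   # rate is not defined locally, so the global value is used
}

apply.local.rate = function(amount){
    rate = 0.5      # local variable; hides the global rate inside this function
    amount * rate
}

global.result = apply.rate(100)       # 25
local.result = apply.local.rate(100)  # 50
rate                                  # still 0.25
```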

Nested functions

Nested functions are functions that incorporate other functions within their function body. This can be done in two ways.

Writing the function(s) within the function body of another function.
Calling the function(s) by their name(s) within the function body of another function.
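For completeness, the first approach might look like the sketch below, where a helper is defined inside the body of the function that uses it. The names circle.summary and square are hypothetical.

```r
circle.summary = function(radius, pi.value = 22/7){
    # helper defined inside the outer function; it is created each time
    # circle.summary() runs and is not visible outside it
    square = function(x){
        x^2
    }
    area = pi.value * square(radius)
    perimeter = 2 * pi.value * radius
    c(Area = area, Perimeter = perimeter)
}

result = circle.summary(21)
result
```

A helper written this way cannot be reused or tested on its own, which is one reason to prefer separately defined functions.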

The second method is recommended, and for this reason, it is the one we will demonstrate. First, consider the function odds.ci() below, whose task is to calculate the risk ratios, odds ratio and the \((1-\alpha) \times 100\%\) confidence interval for the odds ratio.

odds.ci = function(ctable, alpha = 0.05){
    # function odds.ci - returns the odds ratio and its (1-alpha)*100% confidence interval
    conflevel = 1-alpha/2
    # add totals
    ctable = as.matrix(addmargins(as.table(ctable)))
    # calculate risk in exposed and unexposed
    risk.exposed = rbind(ctable[1, 1]/ctable[3, 1])
    risk.unexposed = rbind(ctable[1, 2]/ctable[3, 2])
    risk.overall = rbind(ctable[1, 3]/ctable[3, 3])
    risks = c(risk.exposed, risk.unexposed, risk.overall)
    # add proportions to given table
    ctable = rbind(ctable, round(risks, 4))
    # give row and column names
    dimnames(ctable) = list(c("Exposed", "Unexposed", "Total", "Risk"),
                            c("Cases", "Controls", "Total"))
    # calculation of odds ratio
    OR = ctable[1, 1] * ctable[2, 2]/(ctable[2, 1] * ctable[1, 2])
    v = 1/ctable[1, 1] + 1/ctable[1, 2] + 1/ctable[2, 1] + 1/ctable[2, 2]
    ss = qnorm(conflevel) * sqrt(v)
    # Confidence intervals
    LCI = round(exp(log(OR) - ss), 4)
    UCI = round(exp(log(OR) + ss), 4)
    OR.CI = paste0("OR = ", round(OR, 4), ", ", (1-alpha) * 100, "% CI [", LCI, ", ", UCI, "]")
    result = list(Table = ctable, ORCI = OR.CI)
    return(result)
}

Run the function as follows. Results are presented in Table 2.

m = c(178, 1411, 79, 1486)
ctab = matrix(m, nrow = 2, byrow = TRUE)
result = odds.ci(ctable = ctab, alpha = 0.05)
# display table with proportion of exposed
kbl(result$Table,
    caption = "Table 2: Cases and controls and their risk ratios.") %>%
    kable_styling(bootstrap_options = "striped", full_width = FALSE, position = "left")

Table 2: Cases and controls and their risk ratios.
	Cases	Controls	Total
Exposed	178.0000	1411.0000	1589.0000
Unexposed	79.0000	1486.0000	1565.0000
Total	257.0000	2897.0000	3154.0000
Risk	0.6926	0.4871	0.5038

# display the odds ratio and the 95% confidence interval
print(result$ORCI)

[1] "OR = 2.3729, 95% CI [1.8028, 3.1234]"
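The calculation performed in the body of odds.ci() can be written out as follows, using \(a, b, c, d\) as assumed labels for the cells ctable[1, 1], ctable[1, 2], ctable[2, 1] and ctable[2, 2] (these symbols do not appear in the code).

\[
\widehat{OR} = \frac{a\,d}{c\,b}, \qquad v = \frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}
\]

\[
\text{CI}_{(1-\alpha) \times 100\%} = \exp\left(\ln \widehat{OR} \pm z_{1-\alpha/2}\,\sqrt{v}\right)
\]

Here \(z_{1-\alpha/2}\) is the value returned by qnorm(1 - alpha/2). With \(a = 178\), \(b = 1411\), \(c = 79\) and \(d = 1486\), the odds ratio is \(264508/111469 \approx 2.3729\), which agrees with the output above.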

For demonstration purposes on nested functions, we are now going to create two functions from the body of the above function. This is done as follows.

risk.ratios.df = function(df){
    # function risk.ratios.df - calculates risk ratios for exposed, unexposed and overall
    risk.exposed = rbind(df[1, 1]/df[3, 1])
    risk.unexposed = rbind(df[1, 2]/df[3, 2])
    risk.overall = rbind(df[1, 3]/df[3, 3])
    risks = c(risk.exposed, risk.unexposed, risk.overall)
    # add proportions to given table
    df = rbind(df, risks)
    # give row and column names
    dimnames(df) = list(c("Exposed", "Unexposed", "Total", "Risk"),
                        c("Cases", "Controls", "Total"))
    return(df)
}

odds.ratio = function(df, alpha = 0.05){
    # function odds.ratio - calculates the odds ratio and the (1-alpha)% confidence interval
    OR = df[1, 1] * df[2, 2]/(df[2, 1] * df[1, 2])
    v = 1/df[1, 1] + 1/df[1, 2] + 1/df[2, 1] + 1/df[2, 2]
    conflevel = 1-alpha/2
    ss = qnorm(conflevel) * sqrt(v)
    # confidence intervals
    LCI = round(exp(log(OR) - ss), 4)
    UCI = round(exp(log(OR) + ss), 4)
    OR.CI = paste0("OR = ", round(OR, 4), ", ", (1-alpha) * 100, "% CI [", LCI, ", ", UCI, "]")
    return(OR.CI)
}

Now we write the main function that calls the above two functions within its body. This is done in the code below. This main function is what is referred to as a nested function because it calls other functions within its body.

odds.ci.new = function(x, by.rows = TRUE){
    # odds.ci.new - returns table with risk ratios, odds ratio and CI of the odds ratio
    M = matrix(x, nrow = 2, byrow = by.rows)
    df.totals = as.data.frame(addmargins(M))
    # call the risk.ratios.df() function to calculate risk ratios
    risk.df = risk.ratios.df(df = df.totals)
    # call the odds.ratio() function to calculate the odds ratio and its CI
    OR.CI = odds.ratio(risk.df, alpha = 0.05)
    result = list(Table = risk.df, ORCI = OR.CI)
    return(result)
}

Now run the above nested function to get the results presented in Table 3.

m = c(178, 1411, 79, 1486)
result = odds.ci.new(x = m)
# display table with proportion of exposed
kbl(round(result$Table, 4),
    caption = "Table 3: Risk ratios, odds ratio and 95% confidence interval.") %>%
    kable_styling(bootstrap_options = "striped", full_width = FALSE, position = "left")

Table 3: Risk ratios, odds ratio and 95% confidence interval.
	Cases	Controls	Total
Exposed	178.0000	1411.0000	1589.0000
Unexposed	79.0000	1486.0000	1565.0000
Total	257.0000	2897.0000	3154.0000
Risk	0.6926	0.4871	0.5038

# display the odds ratio and the 95% confidence interval
print(result$ORCI)

[1] "OR = 2.3729, 95% CI [1.8028, 3.1234]"

STEM Research
https://stemresearchs.com