Functions in R

Jacob Martin

STAT 187

Same setup chunk as before

Functions and Arguments

Functions

In the previous Rmarkdown file, we looked at creating and saving objects, either alone or in a vector.

This document will focus on functions.

I tend to think of objects like the nouns of R code and functions as the verbs. Objects, like nouns, exist while functions, like verbs, do something.

You signify a function in R by using () after the function name.

In fact, we’ve already seen a function when creating a vector: c()

vec1 <- c(1, 2, 3, 4, 5)

The c() function binds the values inside it to create a vector.

Arguments

Most, but not all, functions in R have arguments, sometimes called parameters in other programming languages, that are used to specify and alter what the function does and how it behaves.

Most arguments have names. For instance, the seq() function can be used as a shortcut to create a vector without having to list every single value in it. The names for the 2 common arguments are:

from = the first element value in the vector - 1 in vec1
to = the last element value in the vector - 5 in vec1

and then 1 of two different additional arguments:

by = how much you want to increase each element by. If by = 2, the sequence will be 1, 3, 5
length.out = the total number of elements in the vector

It is important when using functions that argument names are followed by =, not <-.

Inside a function, use = to set an argument. Outside a function, use <- to save an object.
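Here is a small sketch of why the distinction matters. It relies on exists(), a base R function not covered in these notes, to check whether an object has been saved. Using <- inside the parentheses still runs, but it quietly creates objects as a side effect, which is rarely what you want.

```r
# Using = inside the function only sets the arguments; nothing is saved
seq(from = 1, to = 5)
## [1] 1 2 3 4 5

# inherits = FALSE means "only look at the objects we have created ourselves"
exists("from", inherits = FALSE)
## [1] FALSE

# Using <- inside the function also creates objects named from and to
seq(from <- 1, to <- 5)
## [1] 1 2 3 4 5

exists("from", inherits = FALSE)
## [1] TRUE
```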

Create a vector named vec2 that starts at 0, ends at 100, and increases in increments of 10

vec2 <- seq(from = 0,
            to = 100,
            by = 10)
vec2

##  [1]   0  10  20  30  40  50  60  70  80  90 100
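As a quick sketch of the other option, the same vector can be built with length.out instead of by. The name vec2_alt is just a placeholder for this example. Going from 0 to 100 in steps of 10 takes 11 elements, so that is the value to give length.out.

```r
# Same start and end as vec2, but specifying how many elements we want
vec2_alt <- seq(from = 0,
                to = 100,
                length.out = 11)
vec2_alt
##  [1]   0  10  20  30  40  50  60  70  80  90 100
```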

In R, you don’t need to specify every argument in a function. Some arguments have default values that will be used unless the argument is specified.

For instance, if we leave out the by = part of seq(), what do we get?

seq(from = 0,
    to = 100)

##   [1]   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
##  [19]  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35
##  [37]  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53
##  [55]  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71
##  [73]  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89
##  [91]  90  91  92  93  94  95  96  97  98  99 100

A vector from 0 to 100 in increments of 1! That’s because seq() defaults to changing the value by 1. What if we just use seq() without specifying any arguments?

seq()

## [1] 1

Why does it just return the number 1? Looking at the help menu for seq(), the default values are:

from = 1
to = 1
by = 1 (written in a fancier way so that it also works with the length.out argument)

Not using argument names

You also don’t need to specify the names of the arguments, but I strongly recommend you do (unless the argument’s name is x or there is only 1 argument in the function).

If you don’t name the arguments, R will assign them in the order that they appear in the help menu.

For instance, seq(1, 10, 2) will have from = 1, to = 10, and by = 2 since those are the first 3 arguments listed.

It’s not uncommon for a function to not do what we want because one of the values given in the function was assigned to a different argument than we wanted. If you use the argument names, you don’t have to worry about a value being assigned to the wrong argument. Also, you can write the arguments in any order that you want!

# Not as intuitive ordering, but we can specify the arguments any way we want!
seq(to = 100, by = 10, from = -10)

##  [1] -10   0  10  20  30  40  50  60  70  80  90 100
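Here is a sketch of the kind of mix-up described above. Suppose the goal is 5 evenly spaced values from 0 to 1. Without names, the third value is matched to by rather than length.out, so R quietly does something different from what we meant.

```r
# Unnamed: the 5 is assigned to by, so the step size is 5 and we only get the start
seq(0, 1, 5)
## [1] 0

# Named: the 5 goes to length.out, as intended
seq(from = 0, to = 1, length.out = 5)
## [1] 0.00 0.25 0.50 0.75 1.00
```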

Repeated values:

If you want to repeat the same value a specific number of times, you can use the rep() function.

It needs 2 arguments:

x = what you want repeated
times = or each = how many times you want x repeated

Use rep() to create a vector of five 1s

rep(x = 1, times = 5)

## [1] 1 1 1 1 1

x doesn’t have to be a number. It can be a character or a string. Repeat “a” 3 times

rep(x = "a", times = 3)

## [1] "a" "a" "a"

You can also repeat an entire vector! Repeat the vector a, b, c three times, using the times argument and the each argument.

# Using the times = argument
rep(x = c("a", "b", "c"),
    times = 3)

## [1] "a" "b" "c" "a" "b" "c" "a" "b" "c"

# Using the each = argument
rep(x = c("a", "b", "c"),
    each = 3)

## [1] "a" "a" "a" "b" "b" "b" "c" "c" "c"

What’s the difference between the two arguments?

You can also repeat the values in a vector different numbers of times; you just need to give times a vector as well. Use rep() to create a vector of:

1, 2, 2, 3, 3, 3

rep(x = 1:3,
    times = 1:3)

## [1] 1 2 2 3 3 3
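The times and each arguments can also be used together in one rep() call. As a small sketch: R applies each first, and then repeats that result times times.

```r
# each = 2 gives "a" "a" "b" "b", and times = 2 then repeats that whole result twice
rep(x = c("a", "b"),
    each = 2,
    times = 2)
## [1] "a" "a" "b" "b" "a" "a" "b" "b"
```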

Common Transformative Functions

A transformative function will take a single value and change it to a different value.

round()

For instance, round() will transform a number to have fewer decimal places. The arguments are:

x = the number you want rounded
digits = the number of decimal places to round it to

round(x = 5.63,
      digits = 1)

## [1] 5.6

If you don’t specify the digits argument, round() defaults to digits = 0, which rounds to the nearest whole number:

round(5.63)

## [1] 6
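To see how digits changes the result, here is a short sketch using pi, a constant built into R (it doesn't appear elsewhere in these notes):

```r
# Keep 2 decimal places
round(x = pi, digits = 2)
## [1] 3.14

# Keep 4 decimal places
round(x = pi, digits = 4)
## [1] 3.1416

# No digits argument, so the default of 0 is used
round(x = pi)
## [1] 3
```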

logs

There are two main log functions in R: log() and log10()

If we use log10() on 1000, it should give us 3 (\(1000 = 10^3\)):

log10(x = 1000)

## [1] 3

But what about if we just used log()?

log(x = 1000)

## [1] 6.907755

log() defaults to the natural log, ln(). When working with data, the natural log is the most commonly used log, for mathematical reasons we won’t discuss in this class.

What do you do if you want log base 2? Tell it what base you want using base =!

Take the log base 2 of 16

log(x = 16,
    base = 2)

## [1] 4

In essence, log10(x) is just a shortcut for log(x, base = 10)

log10(x = 1000)

## [1] 3

log(x = 1000,
    base = 10)

## [1] 3
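R also has a log2() shortcut for base 2, which works the same way as log10(). As a sketch of what the base argument is doing behind the scenes, the change-of-base rule \(\log_b(x) = \ln(x) / \ln(b)\) gives the same answer using only the natural log:

```r
# Shortcut for log(x = 16, base = 2)
log2(x = 16)
## [1] 4

# Change of base: natural log of 16 divided by natural log of 2
log(x = 16) / log(x = 2)
## [1] 4
```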

Summarizing data functions

Transformative functions take a single value and change it to a different value:

round()
log()
sqrt()

What a summarizing function does is take a vector (multiple values) and convert it into a single number (or handful of numbers, depending on the function).

No matter how long the vector is, a given summarizing function will always return the same number of values in the end.

Common summarizing functions we’ll see are:

mean()
median()
sum()
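length() calculates the number of elements in a vector
range() returns the smallest and largest values
fivenum() returns Tukey’s five-number summary: minimum, lower hinge, median, upper hinge, and maximum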

# Create a vector of ages as a demo vector
ages <- c(18, 22, 21, 18, 19, 19)

mean(x = ages)

## [1] 19.5

median(x = ages)

## [1] 19

sum(x = ages)

## [1] 117

length(x = ages)

## [1] 6

range(x = ages)

## [1] 18 22

fivenum(x = ages)

## [1] 18 18 19 21 22

The difference between a summarizing function and a transformative function is what happens when you give them a vector of values:

sqrt(ages)

## [1] 4.242641 4.690416 4.582576 4.242641 4.358899 4.358899

mean(ages)

## [1] 19.5

A transformative function will apply that function to each value in the vector, while a summarizing function always returns the same number of values!
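One way to check which kind of function you have is to wrap it in length() and count how many values come back. A quick sketch, re-creating the ages vector so the chunk runs on its own:

```r
# Same demo vector as before
ages <- c(18, 22, 21, 18, 19, 19)

# Transformative: one result per element, so 6 values
length(sqrt(ages))
## [1] 6

# Summarizing: always 1 value for mean()
length(mean(ages))
## [1] 1

# Summarizing: always 2 values for range()
length(range(ages))
## [1] 2
```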

NA: The Missing Value

R denotes a missing value by the value NA. It isn’t a character, so you don’t want ‘NA’, just NA. Otherwise R will treat it as a known character that just happened to be named NA.

Note that when you type NA in the code chunk, it will be a different color than other letters (and likely will be the same color as a number!)
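The difference between NA and the character "NA" shows up right away in code. This sketch uses two base R functions not covered in these notes: is.na(), which asks whether a value is missing, and class(), which reports what type of vector you have.

```r
# A true missing value
is.na(NA)
## [1] TRUE

# A known character that happens to be spelled NA
is.na("NA")
## [1] FALSE

# With a real NA, the vector stays numeric
class(c(18, 22, NA))
## [1] "numeric"

# With "NA" in quotes, every value is converted to a character
class(c(18, 22, "NA"))
## [1] "character"
```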

Let’s add an NA value to the end of ages and find the mean(). What happens when an NA is included in a summarizing function?

ages2 <- c(ages, NA)

mean(x = ages2)

## [1] NA

It gives us back NA because as long as there is at least 1 missing value, most summarizing functions in R will give back NA.

That way, we know that we don’t have all the data (i.e., there is at least 1 missing value).

We can have R ignore the missing value by adding the argument na.rm = TRUE. The name na.rm stands for “Not Available: Remove”, and setting it to TRUE tells R yes, remove them.

mean(x = ages2,
     na.rm = TRUE)

## [1] 19.5

Now it will use only the 6 known values in the data and completely ignore any NA values in the vector.

Which functions require na.rm = TRUE and which don’t?

Unfortunately, like most answers in life, the answer is it depends.

You’ll learn which functions need na.rm = TRUE specified and which remove missing values by default. But most of our common summarizing functions default to na.rm = FALSE and will return NA as long as there is at least 1 missing value.
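A short sketch of the "it depends" in action, re-creating the vector with a missing value so the chunk runs on its own. Note that length() has no na.rm argument and counts the NA as an element. One way to count the missing values themselves is to combine sum() with is.na(), a base R function not covered in these notes.

```r
# The ages vector with a missing value at the end
ages2 <- c(18, 22, 21, 18, 19, 19, NA)

# sum() behaves like mean(): NA unless we ask it to remove missing values
sum(ages2)
## [1] NA

sum(ages2, na.rm = TRUE)
## [1] 117

# length() counts every element, missing or not
length(ages2)
## [1] 7

# Number of missing values in the vector
sum(is.na(ages2))
## [1] 1
```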

Getting Help inside RStudio

You can use either help(function_name) or ?function_name and RStudio will pull up the help menu in the bottom right pane.

help(mean)

## starting httpd help server ... done

?sum

The help menus are very useful. I still use them frequently.

Like any language, you aren’t expected to memorize the exact name of every argument of every function. Just pull up the help menu!
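If you only need a reminder of a function's argument names and defaults, you don't always need the full help page. This is a sketch using two base R functions not covered in these notes, args() and formals(); I use median() as the example because its default ties back to the missing value section.

```r
# Print the argument names and default values of median() in the console
args(median)

# Pull out a single default: median() does not remove NAs unless told to
formals(median)$na.rm
## [1] FALSE
```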