Functions and Arguments
Functions
In the previous R Markdown file, we looked at creating and saving objects, either alone or in a vector.
This document will focus on functions.
I tend to think of objects as the nouns of R code and functions as the verbs. Objects, like nouns, exist, while functions, like verbs, do something.
You signify a function in R by putting () after the function name.
In fact, we’ve already seen one function, c(), when creating a vector:
vec1 <- c(1, 2, 3, 4, 5)
The c() function combines the values inside it into a single vector.
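As a small extra sketch (the name vec_longer is only for illustration), c() can also combine an existing vector with new values to make a longer vector:

```r
# Recreate vec1 from above
vec1 <- c(1, 2, 3, 4, 5)
# c() also accepts whole vectors: the values of vec1 come first, then 6 and 7
vec_longer <- c(vec1, 6, 7)
vec_longer
## [1] 1 2 3 4 5 6 7
```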
Arguments
Most, but not all, functions in R have arguments, sometimes called parameters in other programming languages, that are used to specify and alter what the function does and how it behaves.
Most arguments have names. For instance, the seq() function can be used as a shortcut to create a vector without having to list every single value in it. The names of its 2 most common arguments are:
from = the value of the first element in the vector (1 in vec1)
to = the value of the last element in the vector (5 in vec1)
and then 1 of 2 additional arguments:
by = how much to increase each element by. If by = 2, the sequence (for vec1) will be 1, 3, 5
length.out = the total number of elements in the vector
When using functions, it is important that argument values are given with =, not <-.
Inside a function, use =; outside a function (for example, when saving an object), use <-.
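To see why this rule matters, here is a sketch (the names right and wrong are only illustrative) where <- is accidentally used for the length.out argument of seq(). Instead of setting length.out, R saves a new object named length.out in your environment and then matches the value 5 by position, so it lands in the by argument:

```r
# Correct: = passes 5 to the length.out argument, giving 5 evenly spaced values
right <- seq(from = 0, to = 100, length.out = 5)
right
## [1]   0  25  50  75 100

# Mistake: <- creates an object named length.out in your environment,
# and the value 5 is matched by position to the next unused argument, by
wrong <- seq(from = 0, to = 100, length.out <- 5)
length(wrong)
## [1] 21
```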
Create a vector named vec2 that starts at 0, ends at 100, and increases in increments of 10
vec2 <- seq(from = 0,
to = 100,
by = 10)
vec2
## [1] 0 10 20 30 40 50 60 70 80 90 100
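The same vector can also be made with the length.out argument instead of by. In this sketch (vec2_alt is just an illustrative name), asking for 11 evenly spaced values from 0 to 100 works out to steps of 10:

```r
# 11 evenly spaced values from 0 to 100, so each step is 10
vec2_alt <- seq(from = 0,
                to = 100,
                length.out = 11)
vec2_alt
## [1]   0  10  20  30  40  50  60  70  80  90 100
```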
In R, you don’t need to specify every argument in a function. Some arguments have default values that will be used unless the argument is specified.
For instance, if we leave out the by = part of seq(), what do we get?
seq(from = 0,
to = 100)
## [1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
## [19] 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
## [37] 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
## [55] 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
## [73] 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89
## [91] 90 91 92 93 94 95 96 97 98 99 100
A vector from 0 to 100 in increments of 1! That’s because seq() defaults to increasing the value by 1. What if we use seq() without specifying any arguments?
seq()
## [1] 1
Why does it just return the number 1? Looking at the help menu for seq(), the default values are:
from = 1
to = 1
by = 1
(The help menu writes the default for by as a formula involving the length.out argument, but when length.out isn’t given, it works out to steps of size 1.)
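Defaults only fill in the arguments you leave out. In this sketch (up_to_5 is an illustrative name), only to is given, so from falls back to its default of 1 and the steps are 1:

```r
# Only `to` is supplied; `from` uses its default of 1
up_to_5 <- seq(to = 5)
up_to_5
## [1] 1 2 3 4 5
```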
Not using argument names
You also don’t need to specify the names of the arguments, but I strongly recommend you do (unless the argument’s name is x or the function has only 1 argument).
If you don’t name the arguments, R will assign them in the order that they appear in the help menu.
For instance, seq(1, 10, 2) will have from = 1, to = 10, and by = 2, since those are the first 3 arguments listed.
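A quick way to check this is to run the unnamed and named versions side by side (the names by_position and by_name are only illustrative); both give the same vector:

```r
# Unnamed: values are matched by position (from, to, by)
by_position <- seq(1, 10, 2)
# Named: the same call with every argument spelled out
by_name <- seq(from = 1, to = 10, by = 2)
by_position
## [1] 1 3 5 7 9
by_name
## [1] 1 3 5 7 9
```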
It’s common for a function to not do what we want because one of the values we gave it was matched to a different argument than we intended. If you use the argument names, you don’t have to worry about a value being assigned to the wrong argument. Also, you can write the arguments in any order you want!
# Not as intuitive ordering, but we can specify the arguments any way we want!
seq(to = 100, by = 10, from = -10)
## [1] -10 0 10 20 30 40 50 60 70 80 90 100
Repeated values
If you want to repeat the same value a specific number of times, you can use the rep() function.
It takes 2 arguments:
x = what you want repeated
times = (or each =) how many times you want x repeated
Use rep() to create a vector of five 1s:
rep(x = 1, times = 5)
## [1] 1 1 1 1 1
x doesn’t have to be a number. It can also be a character string. Repeat “a” 3 times:
rep(x = "a", times = 3)
## [1] "a" "a" "a"
You can also repeat an entire vector! Repeat the vector a, b, c three times, once using the times argument and once using the each argument.
# Using the times = argument
rep(x = c("a", "b", "c"),
times = 3)
## [1] "a" "b" "c" "a" "b" "c" "a" "b" "c"
# Using the each = argument
rep(x = c("a", "b", "c"),
each = 3)
## [1] "a" "a" "a" "b" "b" "b" "c" "c" "c"
What’s the difference between the two arguments?
You can also repeat each value in a vector a different number of times; you just need to give times a vector as well (1:3 is a shortcut for the vector 1, 2, 3). Use rep() to create the vector 1, 2, 2, 3, 3, 3:
rep(x = 1:3,
times = 1:3)
## [1] 1 2 2 3 3 3
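times and each can also be used together. In this sketch (combo is an illustrative name), each is applied first, and then that result is repeated times times:

```r
# each = 2 gives "a" "a" "b" "b"; times = 2 then repeats that whole result
combo <- rep(x = c("a", "b"),
             each = 2,
             times = 2)
combo
## [1] "a" "a" "b" "b" "a" "a" "b" "b"
```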
Common Transformative Functions
A transformative function will take a single value and change it to a different value.
round()
For instance, round() will transform a number to have fewer decimal places. The arguments are:
x = the number you want rounded
digits = the number of decimal places to round it to
round(x = 5.63,
digits = 1)
## [1] 5.6
If you don’t specify the digits argument, round() defaults to digits = 0 (it rounds to a whole number):
round(5.63)
## [1] 6
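Two more small sketches of round() (the object names are illustrative): rounding R’s built-in constant pi to 2 decimal places, and rounding every value of a vector at once:

```r
# pi is a built-in constant (3.14159...); keep 2 decimal places
pi_rounded <- round(x = pi, digits = 2)
pi_rounded
## [1] 3.14

# round() is applied to each value in the vector
rounded_vec <- round(x = c(1.234, 5.678), digits = 1)
rounded_vec
## [1] 1.2 5.7
```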
logs
There are two main log functions in R: log() and log10().
If we use log10() on 1000, it should give us 3 (\(1000 = 10^3\)):
log10(x = 1000)
## [1] 3
But what if we just use log()?
log(x = 1000)
## [1] 6.907755
log() defaults to the natural log (written ln in math), which uses base \(e \approx 2.718\).
When working with data, the natural log is the most common log to use, for mathematical reasons we won’t discuss in this class.
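A short sketch to confirm the default base: exp(1) returns \(e\), so the natural log of exp(1) is 1, and setting base = exp(1) explicitly gives the same answer as plain log():

```r
# exp(1) is e (about 2.718); the natural log of e is 1
log(x = exp(1))
## [1] 1

# Same result as log(x = 1000) above
log(x = 1000, base = exp(1))
## [1] 6.907755
```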
What do you do if you want log base 2? Tell log() which base you want using the base = argument! Take the log base 2 of 16:
log(x = 16,
base = 2)
## [1] 4
In essence, log10(x) is just a shortcut for log(x, base = 10):
log10(x = 1000)
## [1] 3
log(x = 1000,
base = 10)
## [1] 3
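Base R has a similar shortcut for base 2, log2(), and exp() undoes the natural log. A minimal sketch:

```r
# log2(x) is a shortcut for log(x, base = 2)
log2(16)
## [1] 4
log(x = 16, base = 2)
## [1] 4

# exp() reverses log(), returning (up to rounding error) the original 1000
exp(log(x = 1000))
## [1] 1000
```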
Summarizing data functions
Transformative functions take a single value and change it to a different value:
round()
log()
sqrt()
A summarizing function, by contrast, takes a vector (multiple values) and reduces it to a single number (or a handful of numbers, depending on the function).
No matter how long the vector is, a summarizing function always returns the same number of values.
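Common summarizing functions we’ll see are:
mean()
median()
sum()
length() (the number of elements in a vector)
range() (the smallest and largest values)
fivenum() (Tukey’s five-number summary: minimum, lower hinge, median, upper hinge, maximum)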
# Create a vector of ages as a demo vector
ages <- c(18, 22, 21, 18, 19, 19)
mean(x = ages)
## [1] 19.5
median(x = ages)
## [1] 19
sum(x = ages)
## [1] 117
length(x = ages)
## [1] 6
range(x = ages)
## [1] 18 22
fivenum(x = ages)
## [1] 18 18 19 21 22
The difference between a summarizing function and a transformative function is what happens when you give them a vector of values:
sqrt(ages)
## [1] 4.242641 4.690416 4.582576 4.242641 4.358899 4.358899
mean(ages)
## [1] 19.5
A transformative function will apply that function to each value in the vector, while a summarizing function always returns the same number of values!
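Counting the values each kind of function returns makes the difference concrete. A short sketch using the same ages vector:

```r
ages <- c(18, 22, 21, 18, 19, 19)

# Transformative: one output value per input value
length(sqrt(ages))
## [1] 6

# Summarizing: one value, no matter how long the vector is
length(mean(ages))
## [1] 1
length(mean(1:1000))
## [1] 1
```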
NA: The Missing Value
R denotes a missing value by the value NA. It isn’t a character string, so you don’t want ‘NA’ in quotes, just NA. Otherwise, R will treat it as ordinary text that just happens to read NA.
Note that when you type NA in the code chunk, it will be a different color than other letters (and likely will be the same color as a number!)
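One way to see the difference is is.na(), a function not otherwise covered here that returns TRUE when a value is missing. A minimal sketch:

```r
# The real missing value
is.na(NA)
## [1] TRUE

# Text that just happens to read "NA" is not missing
is.na("NA")
## [1] FALSE
```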
Let’s add an NA value to the end of ages and find the mean(). What happens when an NA is included in a summarizing function?
ages2 <- c(ages, NA)
mean(x = ages2)
## [1] NA
It gives us back NA because most summarizing functions in R return NA whenever there is at least 1 missing value.
This way, we know that we don’t have all the data.
We can have R ignore the missing value by adding the argument na.rm = TRUE, which stands for “Not Available: Remove”, and we want to tell it yes (yes = TRUE; T is a shortcut for TRUE):
mean(x = ages2,
na.rm = T)
## [1] 19.5
Now it uses only the 6 non-missing values and ignores any NA values in the vector.
Which functions require na.rm = T and which don’t?
Unfortunately, like most answers in life, the answer is it depends.
Over time, you’ll learn which functions need na.rm = T and which remove missing values by default. But most of the summarizing functions we use default to na.rm = F and will return NA as long as there is at least 1 missing value.
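A short sketch of how this plays out with ages2: sum() and median() accept na.rm, while length() has no na.rm argument and simply counts the NA as one of the elements:

```r
ages2 <- c(18, 22, 21, 18, 19, 19, NA)

sum(x = ages2, na.rm = TRUE)
## [1] 117
median(x = ages2, na.rm = TRUE)
## [1] 19

# length() has no na.rm argument; the NA still counts as an element
length(x = ages2)
## [1] 7
```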
Getting Help inside RStudio
You can use either help(function_name) or ?function_name, and RStudio will pull up the help menu in the bottom right pane.
help(mean)
?sum
The help menus are very useful. I still use them frequently.
Like any language, you aren’t expected to memorize the exact name of every argument of every function. Just pull up the help menu!