Functions are an extremely important concept in almost every programming language; R is no different. After learning what a function is and how you can use one, you’ll take full control by writing your own functions.
Before even thinking of using an R function, you should clarify which
arguments it expects. All the relevant details such as a description,
usage, and arguments can be found in the documentation. To consult the
documentation on the sample() function, for example, you
can use either of the following R commands:
help(sample)
?sample
If you execute these commands in the console of the DataCamp interface, you’ll be redirected to www.rdocumentation.org.
A quick hack to see the arguments of the sample()
function is the args() function. Try it out in the
console:
args(sample)
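As a small sketch of what args() shows, the same information can be read programmatically with formals(), a base R function that is not otherwise used in this text. The variable name arg_names is only illustrative.

```r
# List the argument names of sample(), in the order R matches them by position
arg_names <- names(formals(sample))
print(arg_names)

# Arguments with a default value are optional; replace defaults to FALSE
print(formals(sample)$replace)
```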
In the next exercises, you’ll be learning how to use the
mean() function with increasing complexity. The first thing
you’ll have to do is get acquainted with the mean()
function.
Consult the documentation on the mean() function: ?mean or help(mean).
Inspect the arguments of the mean() function using the args() function.
# Consult the documentation on the mean() function
?mean
# Inspect the arguments of the mean() function
args(mean)
Great! That wasn’t too hard, was it? Take a look at the documentation and head over to the next exercise.
The documentation on the mean() function gives us quite a bit of
information:
The mean() function computes the arithmetic mean.
The most general usage takes two arguments: x and ....
The x argument should be a vector containing numeric, logical or time-related information.
Remember that R can match arguments both by position and by name. Can you still remember the difference? You’ll find out in this exercise!
Once more, you’ll be working with the view counts of your social
network profiles for the past 7 days. These are stored in the
linkedin and facebook vectors and have already
been defined in the editor on the right.
Calculate the average number of views for both linkedin and facebook and assign
the result to avg_li and avg_fb, respectively. Experiment with different types
of argument matching!
Inspect avg_li and avg_fb.
# The linkedin and facebook vectors have already been created for you
linkedin <- c(16, 9, 13, 5, 2, 17, 14)
facebook <- c(17, 7, 5, 16, 8, 13, 14)
# Calculate average number of views
avg_li <- mean(linkedin)
avg_fb <- mean(facebook)
# Inspect avg_li and avg_fb
avg_li
## [1] 10.85714
avg_fb
## [1] 11.42857
Nice! I’m sure you’ve already called more advanced R functions in your history as a programmer. Now you also know what actually happens under the hood.
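To make the difference between matching by position and matching by name concrete, here is a minimal sketch using the linkedin vector from this exercise. The variable names by_position, by_name and reordered are illustrative only.

```r
linkedin <- c(16, 9, 13, 5, 2, 17, 14)

# Matching by position: the first argument of mean() is x
by_position <- mean(linkedin)

# Matching by name: the argument is spelled out explicitly
by_name <- mean(x = linkedin)

# Named arguments can be given in any order
reordered <- mean(na.rm = FALSE, x = linkedin)

print(c(by_position, by_name, reordered))
```

All three calls should return the same value, because R ends up binding linkedin to x in each case.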
Check the documentation on the mean() function
again:
?mean
The Usage section of the documentation includes two versions of the
mean() function. The first usage,
mean(x, ...)
is the most general usage of the mean function. The ‘Default S3 method’, however, is:
mean(x, trim = 0, na.rm = FALSE, ...)
The ... is called the ellipsis. It is a way for R to
pass arguments along without the function having to name them
explicitly. The ellipsis will be treated in more detail in future
courses.
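As a rough illustration of how the ellipsis passes arguments along, consider the sketch below. rounded_mean() and views are hypothetical names made up for this example; they are not part of base R or of the course material.

```r
# rounded_mean() is a hypothetical helper, not part of base R.
# Anything passed through ... is handed on to mean() unchanged.
rounded_mean <- function(x, digits = 1, ...) {
  round(mean(x, ...), digits)
}

views <- c(16, 9, 13, 5, NA, 17, 14)

rounded_mean(views)                # NA, because na.rm defaults to FALSE in mean()
rounded_mean(views, na.rm = TRUE)  # na.rm travels through ... to mean()
```

The helper never names na.rm itself, yet the caller can still set it, which is the point of the ellipsis.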
For the remainder of this exercise, just work with the second usage
of the mean function. Notice that both trim and
na.rm have default values. This makes them optional
arguments.
Calculate the mean of the element-wise sum of linkedin and facebook and store
the result in a variable avg_sum.
Calculate the trimmed mean of the same sum: set the trim argument equal to 0.2 and assign the result
to avg_sum_trimmed.
Inspect avg_sum and avg_sum_trimmed; can you spot the difference?
# The linkedin and facebook vectors have already been created for you
linkedin <- c(16, 9, 13, 5, 2, 17, 14)
facebook <- c(17, 7, 5, 16, 8, 13, 14)
# Calculate the mean of the sum
avg_sum <- mean(linkedin + facebook)
# Calculate the trimmed mean of the sum
avg_sum_trimmed <- mean(linkedin + facebook, trim = 0.2)
# Inspect both new variables
avg_sum
## [1] 22.28571
avg_sum_trimmed
## [1] 22.6
Nice! When the trim argument is
not zero, mean() sorts the vector you pass as argument x and drops a
fraction (equal to trim) of the observations from each end before
averaging the rest.
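To see where the trimmed value comes from, the calculation can be reproduced by hand. This is a sketch of the idea rather than the actual source of mean(); the names total, k, sorted and manual_trimmed are illustrative.

```r
linkedin <- c(16, 9, 13, 5, 2, 17, 14)
facebook <- c(17, 7, 5, 16, 8, 13, 14)
total <- linkedin + facebook

trim <- 0.2
n <- length(total)

# Number of observations dropped from EACH end of the sorted vector
k <- floor(n * trim)

# Sort, drop the k smallest and k largest values, then average the rest
sorted <- sort(total)
manual_trimmed <- mean(sorted[(k + 1):(n - k)])

print(manual_trimmed)
print(mean(total, trim = 0.2))
```

With 7 values and trim = 0.2, one value is removed from each end (10 and 33), and the mean of the remaining five values is 22.6.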
In the video, Filip guided you through the example of specifying
arguments of the sd() function. The sd()
function has an optional argument, na.rm, that specifies
whether or not to remove missing values from the input vector before
calculating the standard deviation.
If you’ve had a good look at the documentation, you’ll know by now
that the mean() function also has this argument,
na.rm, and it does the exact same thing. By default, it is
set to FALSE, as the Usage of the
Default S3 method shows:
mean(x, trim = 0, na.rm = FALSE, ...)
Let’s see what happens if your vectors linkedin and
facebook contain missing values (NA).
# The linkedin and facebook vectors have already been created for you
linkedin <- c(16, 9, 13, 5, NA, 17, 14)
facebook <- c(17, NA, 5, 16, 8, 13, 14)
# Basic average of linkedin
mean(linkedin)
## [1] NA
# Average of linkedin with missing values removed
mean(linkedin, na.rm = TRUE)
## [1] 12.33333
Awesome! Up to the next exercise!
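As a quick check of what na.rm does, the sketch below removes the missing value by hand and compares the result with na.rm = TRUE. The names plain, by_hand and with_na_rm are illustrative.

```r
linkedin <- c(16, 9, 13, 5, NA, 17, 14)

# Any arithmetic involving NA yields NA, so the plain mean is NA
plain <- mean(linkedin)

# Dropping the NA by hand gives the same result as na.rm = TRUE
by_hand <- mean(linkedin[!is.na(linkedin)])
with_na_rm <- mean(linkedin, na.rm = TRUE)

print(plain)
print(c(by_hand, with_na_rm))
```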
You already know that R functions return objects that you can then use somewhere else. This makes it easy to use functions inside functions, as you’ve seen before:
speed <- 31
print(paste("Your speed is", speed))
Notice that both the print() and paste()
functions use the ellipsis - ... - as an argument. Can you
figure out how they’re used?
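One way to explore the question is to pass extra values and see where they end up. In this sketch the extra string "and rising" and the variable msg are made up for illustration.

```r
speed <- 31

# paste() collects any number of pieces through ... and joins them with sep
msg <- paste("Your speed is", speed, "and rising", sep = " ")
print(msg)

# print() forwards extra arguments through ... to the print method in use;
# for a number, digits is one such argument
print(pi, digits = 3)
```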
Use abs() on
linkedin - facebook to get the absolute differences
between the daily LinkedIn and Facebook profile views. Next, use this
function call inside mean() to calculate the Mean
Absolute Deviation. In the mean() call, make sure
to specify na.rm to treat missing values
correctly!
# The linkedin and facebook vectors have already been created for you
linkedin <- c(16, 9, 13, 5, NA, 17, 14)
facebook <- c(17, NA, 5, 16, 8, 13, 14)
# Calculate the mean absolute deviation
mean(abs(linkedin - facebook), na.rm = TRUE)
## [1] 4.8
Excellent! Proceed to the next exercise.
By now, you will probably have a good understanding of the difference
between required and optional arguments. Let’s refresh this difference
by having one last look at the mean() function:
mean(x, trim = 0, na.rm = FALSE, ...)
x is required; if you do not specify it, R will throw an
error. trim and na.rm are optional arguments:
they have a default value which is used if the arguments are not
explicitly specified.
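The difference can be observed directly. The sketch below wraps the failing call in tryCatch() so that the script keeps running; the message text and the variable result are chosen for this example only.

```r
# Calling mean() without x fails, because x has no default value
result <- tryCatch(
  mean(),
  error = function(e) "error: x is missing"
)
print(result)

# trim and na.rm can be left out; their defaults are used
print(mean(c(1, 2, 3)))
```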
Which of the following statements about the read.table()
function are true?
header, sep and quote are all optional arguments.
row.names and fileEncoding don’t have default values.
read.table("myfile.txt", "-", TRUE) will throw an error.
read.table("myfile.txt", sep = "-", header = TRUE) will throw an error.
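A helpful way to reason about the two read.table() calls is to ask R which argument each value would be matched to, without reading any file. The sketch below uses match.call() from base R for that; "myfile.txt" is only a placeholder name, and positional and named are illustrative variable names.

```r
# match.call() reports which argument each supplied value is matched to.
# Nothing is read from disk here; "myfile.txt" is only a placeholder name.
positional <- match.call(read.table,
                         quote(read.table("myfile.txt", "-", TRUE)))
named <- match.call(read.table,
                    quote(read.table("myfile.txt", sep = "-", header = TRUE)))

print(positional)
print(named)
```

In the positional call, "-" is matched to header and TRUE to sep, which is probably not what the caller intended. In the named call, each value reaches the argument it was meant for.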
my_fun <- function(arg1, arg2) {
body
}
Notice that this recipe uses the assignment operator
(<-) just as if you were assigning a vector to a
variable for example. This is not a coincidence. Creating a function in
R basically is the assignment of a function object to a variable! In the
recipe above, you’re creating a new R variable my_fun, that
becomes available in the workspace as soon as you execute the
definition. From then on, you can use my_fun as a
function.
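To illustrate that a function really is an object stored in a variable, here is a minimal sketch. The body arg1 + arg2 and the second name also_my_fun are made up for this example; the recipe itself leaves the body open.

```r
# my_fun is the placeholder name from the recipe; the body is a made-up example
my_fun <- function(arg1, arg2) {
  arg1 + arg2
}

# The variable my_fun now holds a function object
print(class(my_fun))
print(is.function(my_fun))

# Like any other object, it can be assigned to a second name
also_my_fun <- my_fun
print(also_my_fun(1, 2))
```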
Create a function pow_two(): it takes one argument and returns that number
squared (that number times itself).
Call pow_two() with 12 as input.
Create a function sum_abs(), that takes two arguments and returns the sum of
the absolute values of both arguments.
Call sum_abs() with arguments -2 and 3 afterwards.
## [1] 144
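The 144 above is what a call such as pow_two(12) would print; the definition of pow_two() itself does not appear in this text. A minimal sketch, assuming the simplest body that fits the description (the argument name x is an assumption):

```r
# Sketch of pow_two(): one argument, returns that number squared
pow_two <- function(x) {
  x * x
}

pow_two(12)
```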
# Create a function sum_abs()
sum_abs <- function(a, b) {
  abs(a) + abs(b)
}
# Use the function
sum_abs(-2, 3)
## [1] 5
Great! Step it up a notch in the next exercise!
There are situations in which your function does not require an input. Let’s say you want to write a function that gives us the random outcome of throwing a fair die:
throw_die <- function() {
  number <- sample(1:6, size = 1)
  number
}
throw_die()
Up to you to code a function that doesn’t take any arguments!
Define a divide function
# Divide function: reports whether x is divisible by 21
divide.function <- function(x) {
  if (x %% 21 == 0) {
    print("The number is divisible by 21")
  } else {
    print("The number is not divisible by 21")
  }
}
divide.function(23)
## [1] "The number is not divisible by 21"
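The divisor 21 is hard-coded in the function above. As a sketch of how an optional argument with a default value could make it configurable, consider the variant below. is_divisible() and its divisor argument are hypothetical names, and the function returns a logical value instead of printing a message.

```r
# is_divisible() is a hypothetical variant: divisor is optional (default 21)
is_divisible <- function(x, divisor = 21) {
  x %% divisor == 0
}

is_divisible(23)               # FALSE
is_divisible(42)               # TRUE
is_divisible(42, divisor = 5)  # FALSE
```

Returning TRUE or FALSE rather than printing lets the result be reused in other code, for example inside an if statement.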
Converting Temperatures
fahrenheit_to_celsius <- function(temp_F) {
  temp_C <- (temp_F - 32) * 5 / 9
  return(temp_C)
}
# boiling point of water
fahrenheit_to_celsius(212)
## [1] 100
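Because the arithmetic operators in R work element-wise, the same conversion also accepts a whole vector of temperatures. The sketch below repeats the function so that it runs on its own; temps_F and temps_C are illustrative names.

```r
fahrenheit_to_celsius <- function(temp_F) {
  (temp_F - 32) * 5 / 9
}

# Freezing point of water, body temperature and boiling point, in Fahrenheit
temps_F <- c(32, 98.6, 212)
temps_C <- fahrenheit_to_celsius(temps_F)
print(temps_C)
```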
Define a statistics function
# Load the propagate package, which provides fitDistr() (startup messages omitted)
library(propagate)
MystatFn <- function(x, what) {
  # Draw a histogram of the data
  if (what == "histogram") {
    hist(x, yaxt = "n", ylab = "", border = "white",
         col = "red", xlab = "data",
         main = "histogram")
  }
  # Print the mean and standard deviation
  if (what == "stats") {
    print(paste("Dear class! The mean of this data be ",
                round(mean(x), 4),
                " and the standard deviation be ",
                round(sd(x), 4),
                sep = ""))
  }
  # Fit candidate distributions with fitDistr() and return the fit statistics
  if (what == "dist") {
    u <- fitDistr(x)$stat
    return(u)
  }
}
y <- rnorm(1000, 0, 1)
MystatFn(y, "stats")
## [1] "Dear class! The mean of this data be -0.0081 and the standard deviation be 0.9929"
MystatFn(y, "dist")
## (fitDistr() progress output for the 32 candidate distributions omitted)
##                 Distribution         BIC          RSS         MSE
## 8                 Triangular -354.613592 4.000000e-04 0.002870822
## 9                Trapezoidal -354.509302 4.000000e-04 0.002763688
## 6                   Logistic -353.948251 8.354378e-05 0.003000675
## 1                     Normal -353.356953 2.663390e-04 0.003015018
## 5          Scaled/shifted t- -349.485098 1.542771e-04 0.002992045
## 22                 von Mises -349.251660 2.390328e-05 0.003116507
## 2              Skewed-normal -348.555942 2.527231e-04 0.003014549
## 3         Generalized normal -348.540912 2.613800e-04 0.003014914
## 24 Generalized Extreme Value -346.996589 3.236395e-04 0.003052698
## 18                3P Weibull -345.948006 3.969057e-04 0.003078622
## 16                Johnson SU -344.747029 1.520895e-04 0.002990062
## 17                Johnson SB -343.719672 2.615970e-04 0.003014938
## 20                   4P Beta -343.595095 2.691953e-04 0.003017968
## 32                    Cosine -343.420296 7.331232e-05 0.003266568
## 14                   Laplace -339.880835 1.034948e-05 0.003361153
## 15                    Gumbel -319.250976 3.999995e-04 0.003969554
## 13                    Cauchy -303.225592 1.559499e-05 0.004517193
## 29                      Burr -301.792627 4.884315e-06 0.004750834
## 10   Curvilinear Trapezoidal -258.552013 4.000000e-04 0.006229452
## 25                  Rayleigh -241.311271 2.892150e-02 0.007442448
## 7                    Uniform -125.010353 4.000000e-04 0.019012763
## 21                   Arcsine  -99.499463 4.000000e-04 0.023355756
## 4                 Log-normal  -77.550455 4.000000e-04 0.027878383
## 28                        F-  -77.456122 4.000000e-04 0.027899600
## 12             Inverse Gamma  -71.510238 4.000000e-04 0.029269998
## 27               Exponential  -69.030757 4.000000e-04 0.031044824
## 11                     Gamma  -68.966429 4.000000e-04 0.029876661
## 31        Inverse Chi-Square  -66.015969 4.000000e-04 0.031808861
## 26                Chi-Square  -65.680155 4.000000e-04 0.031895122
## 23          Inverse Gaussian  -59.813439 4.000000e-04 0.032165423
## 30                       Chi  -15.399433 4.000000e-04 0.047843858
## 19                   2P Beta   -8.076324 4.000000e-04 0.048819355
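A function that switches on a string argument, like MystatFn() above, silently does nothing when the string is misspelled. One possible alternative, sketched here with base R only, uses match.arg() to validate the choice and switch() to pick the branch. describe_data() is a hypothetical name, the "dist" branch is left out so that the example does not depend on the propagate package, and the seed is arbitrary.

```r
# describe_data() is a hypothetical rewrite that uses only base R
describe_data <- function(x, what = c("stats", "histogram")) {
  # match.arg() picks the first choice by default and rejects unknown values
  what <- match.arg(what)
  switch(what,
    stats = c(mean = round(mean(x), 4), sd = round(sd(x), 4)),
    histogram = hist(x, col = "red", border = "white",
                     xlab = "data", main = "histogram")
  )
}

set.seed(1)  # arbitrary seed so the example is repeatable
y <- rnorm(1000, mean = 0, sd = 1)
describe_data(y, "stats")
```

Calling describe_data(y, "median") would stop with an error listing the allowed values, which makes typos easier to spot.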