Introduction

Today you will get some practice writing functions. Load package tidyverse.

library(tidyverse)

Below is a template for how R functions are structured. The three main components are

  • name, a short informative verb;
  • arguments, inputs to the function;
  • body, code to be executed.

my_fcn_name <- function(arg_1, arg_2, arg_3) {
  # body
  # of
  # the
  # function
  # goes
  # here
  return(r_object) # object to return
}
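
As a concrete illustration of the template, here is a minimal sketch of a complete function. The name add_squares and its arguments x and y are hypothetical and chosen only for this example.

```r
# name: add_squares (hypothetical); arguments: x and y
add_squares <- function(x, y) {
  # body: square each input and add the results
  result <- x^2 + y^2
  return(result) # object to return
}

# call the function with named arguments
add_squares(x = 3, y = 4)
```

The call at the end returns 25, since 3^2 + 4^2 = 25.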

Creating viewr()

Your goal is to create a function called viewr() that takes a data frame as its input and returns a specified number of randomly selected rows and a specified number of columns.

The following two functions will be useful in your development of viewr().

dim()

Function dim() retrieves the dimension of an object. For example,

dim(mtcars)
dim(diamonds)

gives the number of rows and columns of the data frames mtcars and diamonds as a vector of length two.
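
Because dim() returns a vector of length two, its elements can be extracted by position. The sketch below uses mtcars, which ships with base R; the variable names dims, n_rows, and n_cols are only illustrative.

```r
# mtcars is part of base R (package datasets), so no extra packages are needed
dims <- dim(mtcars)

n_rows <- dims[1] # first element: number of rows
n_cols <- dims[2] # second element: number of columns

n_rows
n_cols
```

For mtcars this gives 32 rows and 11 columns.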

sample_n()

Function sample_n() from package dplyr randomly selects rows from a data frame. For example,

sample_n(mtcars, size = 5)

randomly selects five rows from mtcars.

Step 1

The first step in building any function is to get your code working before turning it into a function.

  1. Use a sequence of dplyr functions to select the first 3 columns of diamonds and randomly pick 7 rows.

  2. Modify your above sequence to select all the columns and randomly pick 20 rows.

Step 2

Use your code in Step 1 as a template to create a function named viewr_1(). The function should have three arguments:

  • df: will take a data frame as its input
  • nrow: will take a positive integer as its input
  • ncol: will take a positive integer as its input

The function will return part of the data frame provided as input: nrow randomly chosen rows and the first ncol columns.

After you write your function, test your function with the following calls:

viewr_1(df = diamonds, nrow = 4, ncol = 3)
viewr_1(df = diamonds, nrow = 10, ncol = 10)
viewr_1(df = diamonds, nrow = 10, ncol = 11)

You should have received an error when you ran the third line. Data set diamonds has only 10 variables, so the function cannot select 11 columns. You will fix this in Step 3.

Step 3

Modify viewr_1() to include an if-else statement. If ncol does not exceed the number of columns in the data frame, then the function should run as normal. Otherwise, the function should output a message with cat("The data frame does not have that many columns!"). Call this new function viewr_2().

Test viewr_2() with the following calls:

viewr_2(df = diamonds, nrow = 4, ncol = 3)
viewr_2(df = diamonds, nrow = 10, ncol = 10)
viewr_2(df = diamonds, nrow = 10, ncol = 11)
viewr_2(df = diamonds, nrow = 10000000, ncol = 3)

You should have received an error when you ran the fourth line. Data set diamonds does not have 10000000 rows. To fix this, proceed to Step 4.

Step 4

Modify viewr_1() to include a stop-if-not statement using function stopifnot(). Call this new function viewr(). Function stopifnot() takes one or more conditions, separated by commas. If any condition is not true, stop() is called, producing an error message that indicates the first condition that failed. Below is an example.

ranger <- function(x) {
  # check if x is a numeric vector of length at least 2
  stopifnot(is.numeric(x), length(x) >= 2)
  value <- max(x) - min(x)
  return(value)
}

# function test
ranger(x = 1:10)
ranger(x = 3)
ranger(x = letters)

Using stopifnot() is more convenient than including multiple if-else statements.
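
To see which condition stopifnot() reports, the error message can be captured with tryCatch(). This is a minimal sketch that redefines ranger() from the example above so that it runs on its own; the names msg_short and msg_letters are only illustrative.

```r
# ranger() as defined in the example above
ranger <- function(x) {
  stopifnot(is.numeric(x), length(x) >= 2)
  value <- max(x) - min(x)
  return(value)
}

# capture the error message instead of stopping the script
msg_short <- tryCatch(ranger(x = 3), error = function(e) conditionMessage(e))
msg_letters <- tryCatch(ranger(x = letters), error = function(e) conditionMessage(e))

msg_short   # reports that length(x) >= 2 is not TRUE
msg_letters # reports that is.numeric(x) is not TRUE
```

Each message names the first condition that failed, which is why the order of the conditions matters.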

Test viewr() with the following calls:

viewr(df = diamonds, nrow = 4, ncol = 3)
viewr(df = diamonds, nrow = 10, ncol = 11)
viewr(df = diamonds, nrow = 10000000, ncol = 3)

Creating limit_e()

Create a function called limit_e() that takes one argument, n. Argument n should be a vector of integers greater than zero. Function limit_e() will compute and return the quantity
\[\bigg(1 + \frac{1}{n}\bigg)^n.\]
From calculus you know that the mathematical constant \(e\) is defined as
\[e = \lim_{n \to \infty}\bigg(1 + \frac{1}{n}\bigg)^n.\]

After you write your function, test it with the following function calls. What do you notice? Why does this happen?

limit_e(100)
limit_e(1000000)
limit_e(c(1, 1000000, 100000000000))
limit_e(c(1, 1000000, 1000000000000000000))

Computer arithmetic

R, like most software, uses floating point arithmetic, which is not the same as the arithmetic we learn in school. Computers cannot represent all numbers exactly.
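
As a small illustration of this limited precision, the sketch below prints the stored value of 0.1 with extra digits and inspects the machine epsilon that R exposes as .Machine$double.eps. The variable names stored and eps are only illustrative.

```r
# print 0.1 with 20 decimal places to reveal the stored approximation
stored <- sprintf("%.20f", 0.1)
stored

# machine epsilon: the smallest positive number x such that 1 + x != 1
eps <- .Machine$double.eps
eps

# adding eps to 1 changes the result; adding half of eps does not
(1 + eps) == 1     # FALSE
(1 + eps / 2) == 1 # TRUE
```

Any quantity smaller than about half of eps is lost when it is added to 1.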

For your mind to be blown, run the following examples.

# example 1
0.2 == 0.6 / 3

# example 2
point3 <- c(0.3, 0.4 - 0.1, 0.5 - 0.2, 0.6 - 0.3, 0.7 - 0.4)
point3

point3 == 0.3

To work around these issues, use all.equal() for checking the equality of two numeric quantities in R.

# example 1, all.equal()
all.equal(0.2, 0.6 / 3)

# example 2, all.equal()
point3 <- c(0.3, 0.4 - 0.1, 0.5 - 0.2, 0.6 - 0.3, 0.7 - 0.4)
point3

all.equal(point3, rep(.3, length(point3)))
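
Two practical notes, sketched below under the assumption that only base R is available. First, all.equal() returns either TRUE or a character description of the difference, so it is usually wrapped in isTRUE() when used as a condition. Second, all.equal() gives a single answer for the whole vector; an element-wise check can be built by comparing absolute differences to a tolerance. The names same, different, tol, and close_enough are only illustrative.

```r
point3 <- c(0.3, 0.4 - 0.1, 0.5 - 0.2, 0.6 - 0.3, 0.7 - 0.4)

# isTRUE() converts the result of all.equal() into a single TRUE or FALSE
same <- isTRUE(all.equal(0.2, 0.6 / 3))
different <- isTRUE(all.equal(0.2, 0.3))
same      # TRUE
different # FALSE

# element-wise comparison within a small tolerance
tol <- sqrt(.Machine$double.eps)
close_enough <- abs(point3 - 0.3) < tol
close_enough
```

Package dplyr also provides near() for element-wise comparisons of this kind.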