Data Science Module

Topic 9B: Writing R Functions


Example R code solutions for the Data Science Module Computer Lab 9B are presented below.

1 R Functions Overview

No answer required.

2 Writing Simple Functions

2.1 Mean Function

Example R code is provided below:

mean_func <- function(values){
  # Argument:
  # values: a numeric vector of values

  n <- length(values)
  sum(values) / n

}

Note that, written this way, our function can compute the mean of a numeric vector of any non-zero length.
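To see why the length needs to be non-zero, the short sketch below passes an empty vector to the function. The sum of an empty vector is 0 and its length is 0, so the result is 0 / 0, which R reports as NaN. The guarded variant mean_func_checked is a hypothetical name introduced here for illustration only and is not part of the lab.

```r
mean_func <- function(values){
  n <- length(values)
  sum(values) / n
}

# With an empty vector, sum() is 0 and n is 0, so the result is 0 / 0
empty_result <- mean_func(numeric(0))
is.nan(empty_result)
## [1] TRUE

# Hypothetical guarded variant (not part of the lab):
# stop with an error if the input is not numeric or is empty
mean_func_checked <- function(values){
  stopifnot(is.numeric(values), length(values) > 0)
  sum(values) / length(values)
}

mean_func_checked(c(2:8))
## [1] 5
```

Whether to add a check like this is a design choice. The simpler version in the solution is sufficient for the lab exercises.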

2.1.1

mean_func(c(2:8))
## [1] 5

2.2 Sample standard deviation Function

Example R code is provided below:

Note that I have used quite a few brackets here, to ensure calculations are performed in the correct order.

sample_sd_func <- function(values){
  # Argument:
  # values: a numeric vector of values
  
  n <- length(values)
  sqrt( (1 / (n-1) ) * sum( (values - mean_func(values) )^2 ))
  
}

2.2.1

sample_sd_func(c(2:8))
## [1] 2.160247

2.2.2

sd(c(2:8))
## [1] 2.160247

Note that both our function and the inbuilt R function provide values of 2.160247. Success!
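Rather than comparing the printed digits by eye, one option is to let R make the comparison. The sketch below uses all.equal(), which compares two numbers up to a small numerical tolerance. The names x and sd_matches are introduced here for illustration only.

```r
mean_func <- function(values){
  n <- length(values)
  sum(values) / n
}

sample_sd_func <- function(values){
  n <- length(values)
  sqrt( (1 / (n-1) ) * sum( (values - mean_func(values) )^2 ))
}

x <- c(2:8)

# all.equal() returns TRUE when the two values agree within tolerance;
# isTRUE() converts any other result to FALSE
sd_matches <- isTRUE(all.equal(sample_sd_func(x), sd(x)))
sd_matches
## [1] TRUE
```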

3 Writing a \(t\)-test Function

3.1 \(t\)-test test statistic

Example R code is provided below:

t_test_func <- function(values, mu){
  # Arguments:
  # values: a numeric vector of values
  # mu: The mean under H0
  
  n <- length(values)
  ( mean_func(values) - mu ) / (sample_sd_func(values) / sqrt(n) )
  
}

3.2

We run the following commands:

t_test_func(c(2:8), 4)
## [1] 1.224745
t.test(c(2:8), mu = 4)
## 
##  One Sample t-test
## 
## data:  c(2:8)
## t = 1.2247, df = 6, p-value = 0.2666
## alternative hypothesis: true mean is not equal to 4
## 95 percent confidence interval:
##  3.002105 6.997895
## sample estimates:
## mean of x 
##         5

Our t_test_func output provides the same \(t\) test statistic as the t.test function, correct to 4 decimal places.
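The comparison can also be made on the unrounded values. The object returned by t.test stores the test statistic in its statistic element, so a sketch along the following lines could be used. The names x, t_from_ours and t_from_builtin are introduced here for illustration only.

```r
mean_func <- function(values){
  n <- length(values)
  sum(values) / n
}

sample_sd_func <- function(values){
  n <- length(values)
  sqrt( (1 / (n-1) ) * sum( (values - mean_func(values) )^2 ))
}

t_test_func <- function(values, mu){
  n <- length(values)
  ( mean_func(values) - mu ) / (sample_sd_func(values) / sqrt(n) )
}

x <- c(2:8)

t_from_ours <- t_test_func(x, 4)

# t.test() returns a list-like object; the statistic element holds t.
# unname() drops the "t" label so only the number is compared.
t_from_builtin <- unname(t.test(x, mu = 4)$statistic)

all.equal(t_from_ours, t_from_builtin)
## [1] TRUE
```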

3.3 Degrees of freedom

Our updated t_test_func could look like this:

t_test_func <- function(values, mu){
  # Arguments:
  # values: a numeric vector of values
  # mu: The mean under H0
  
  n <- length(values)
  t.val <- ( mean_func(values) - mu ) / (sample_sd_func(values) / sqrt(n) )
  
  df <- n - 1
  
  c("test.stat" = t.val, "df" = df)
  
}
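
As a rough illustration of how this version might be used, the sketch below calls the function and extracts individual elements by name. The object name result is introduced here for illustration only.

```r
mean_func <- function(values){
  n <- length(values)
  sum(values) / n
}

sample_sd_func <- function(values){
  n <- length(values)
  sqrt( (1 / (n-1) ) * sum( (values - mean_func(values) )^2 ))
}

t_test_func <- function(values, mu){
  n <- length(values)
  t.val <- ( mean_func(values) - mu ) / (sample_sd_func(values) / sqrt(n) )
  df <- n - 1
  c("test.stat" = t.val, "df" = df)
}

result <- t_test_func(c(2:8), 4)

# Elements of a named vector can be extracted by name with [[ ]]
round(result[["test.stat"]], 4)
## [1] 1.2247
result[["df"]]
## [1] 6
```

Note that c() builds an atomic vector, so every element is stored as the same type. This is fine here because both values are numeric.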

3.4 The concatenate Function

Our updated t_test_func could look like this:

t_test_func <- function(values, mu){
  # Arguments:
  # values: a numeric vector of values
  # mu: The mean under H0
  
  n <- length(values)
  t.val <- ( mean_func(values) - mu ) / (sample_sd_func(values) / sqrt(n) )
  
  df <- n - 1
  
  cat("The test statistic is", round(t.val, 4), "\n",
      "The degrees of freedom is", df, "\n")
  
}
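
One consequence of this version is worth noting: cat prints text to the console but returns NULL, so the computed values cannot be stored for later use. The sketch below illustrates this. The object name res is introduced here for illustration only.

```r
mean_func <- function(values){
  n <- length(values)
  sum(values) / n
}

sample_sd_func <- function(values){
  n <- length(values)
  sqrt( (1 / (n-1) ) * sum( (values - mean_func(values) )^2 ))
}

t_test_func <- function(values, mu){
  n <- length(values)
  t.val <- ( mean_func(values) - mu ) / (sample_sd_func(values) / sqrt(n) )
  df <- n - 1
  cat("The test statistic is", round(t.val, 4), "\n",
      "The degrees of freedom is", df, "\n")
}

# The text is still printed, but nothing useful is stored in res
res <- t_test_func(c(2:8), 4)
is.null(res)
## [1] TRUE
```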

3.5 \(p\)-value Function

Our updated t_test_func could look like this:

t_test_func <- function(values, mu){
  # Arguments:
  # values: a numeric vector of values
  # mu: The mean under H0
 
  n <- length(values)
 
  t.val <- (mean_func(values) - mu) / (sample_sd_func(values) / sqrt(n))
 
  df <- n - 1
 
  p.val <- 2*pt(-abs(t.val), df)
 
  cat("The test statistic is", round(t.val, 4), "\n",
      "The degrees of freedom is", df, "\n",
      "The p-value is", round(p.val, 4), "\n")
}

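Note that 2*pt(-abs(t.val), df) relies on the symmetry of the Student’s \(t\)-distribution: the lower tail area below \(-|t|\) equals the upper tail area above \(|t|\), so doubling the lower tail gives the two-sided \(p\)-value.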
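
The symmetry can be checked numerically. The sketch below computes the two tail areas separately, using the test statistic and degrees of freedom from the example in Section 3.2, and confirms that they agree. The names lower_tail and upper_tail are introduced here for illustration only.

```r
t.val <- 1.224745   # test statistic from the example in Section 3.2
df <- 6

lower_tail <- pt(-abs(t.val), df)                     # P(T <= -|t|)
upper_tail <- pt(abs(t.val), df, lower.tail = FALSE)  # P(T >= |t|)

# By symmetry, the two tail areas are equal
all.equal(lower_tail, upper_tail)
## [1] TRUE

# So doubling one tail gives the same answer as adding both tails
all.equal(2 * lower_tail, lower_tail + upper_tail)
## [1] TRUE

round(2 * lower_tail, 4)
## [1] 0.2666
```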

3.5.1

t_test_func(c(2:8), 4)
## The test statistic is 1.2247 
##  The degrees of freedom is 6 
##  The p-value is 0.2666
t.test(c(2:8), mu = 4)
## 
##  One Sample t-test
## 
## data:  c(2:8)
## t = 1.2247, df = 6, p-value = 0.2666
## alternative hypothesis: true mean is not equal to 4
## 95 percent confidence interval:
##  3.002105 6.997895
## sample estimates:
## mean of x 
##         5

Our t_test_func output provides the same \(p\)-value as the t.test function, correct to 4 decimal places.

4 Extension: Adding Additional Details to our \(t\)-test Function

4.1

Example R code with comments is provided below:

t_test_func <- function(values, mu){
 
  n <- length(values) # compute sample size
 
  mean.val <- mean_func(values) # compute mean
 
  s.val <- sample_sd_func(values) # compute sample standard deviation
 
  t.val <- (mean.val - mu) / (s.val / sqrt(n)) # compute t test statistic
 
  df <- n - 1 # compute degrees of freedom
 
  p.val <- 2*pt(-abs(t.val), df) # compute p-value
  
  cat("The test statistic is", round(t.val, 4), "\n",
      "The degrees of freedom is", df, "\n",
      "The p-value is", round(p.val, 4), "\n")
}

4.2

Example R code is provided below:

t_test_func <- function(values, mu){
 
  n <- length(values) # compute sample size
 
  mean.val <- mean_func(values) # compute mean
 
  s.val <- sample_sd_func(values) # compute sample standard deviation
 
  t.val <- (mean.val - mu) / (s.val / sqrt(n)) # compute t test statistic
 
  df <- n - 1 # compute degrees of freedom
 
  p.val <- 2*pt(-abs(t.val), df) # compute p-value
 
  list(t.val = t.val, df = df, p.val = p.val, mean.val = mean.val)
 
}

4.2.1

We can use:

final_check <- t_test_func(c(2:8), 4)
final_check
## $t.val
## [1] 1.224745
## 
## $df
## [1] 6
## 
## $p.val
## [1] 0.2665697
## 
## $mean.val
## [1] 5
names(final_check)
## [1] "t.val"    "df"       "p.val"    "mean.val"
final_check$p.val # and similarly for the other elements
## [1] 0.2665697
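
Because the function now returns a list, each element can be compared directly with the corresponding element of the object returned by t.test. The sketch below is one way this might be done. The names x, ours, builtin and checks are introduced here for illustration only.

```r
mean_func <- function(values){
  n <- length(values)
  sum(values) / n
}

sample_sd_func <- function(values){
  n <- length(values)
  sqrt( (1 / (n-1) ) * sum( (values - mean_func(values) )^2 ))
}

t_test_func <- function(values, mu){
  n <- length(values)
  mean.val <- mean_func(values)
  s.val <- sample_sd_func(values)
  t.val <- (mean.val - mu) / (s.val / sqrt(n))
  df <- n - 1
  p.val <- 2*pt(-abs(t.val), df)
  list(t.val = t.val, df = df, p.val = p.val, mean.val = mean.val)
}

x <- c(2:8)
ours <- t_test_func(x, 4)
builtin <- t.test(x, mu = 4)

# t.test() stores its results in the statistic, parameter, p.value and
# estimate elements; unname() drops their labels before comparing
checks <- c(
  t.val    = isTRUE(all.equal(ours$t.val,    unname(builtin$statistic))),
  df       = isTRUE(all.equal(ours$df,       unname(builtin$parameter))),
  p.val    = isTRUE(all.equal(ours$p.val,    builtin$p.value)),
  mean.val = isTRUE(all.equal(ours$mean.val, unname(builtin$estimate)))
)

all(checks)
## [1] TRUE
```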


That’s everything for this computer lab!


These notes have been prepared by Rupert Kuveke and Amanda Shaker. The copyright for the material in these notes resides with the authors named above, with the Department of Mathematical and Physical Sciences and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License BY-NC-ND.

