Problem 1

Install the ISwR R package. Write the built-in dataset thuesen to a tab-separated text file. View it with a text editor.

library(ISwR)
## Warning: package 'ISwR' was built under R version 4.3.2
write.table(thuesen, file = "thuesen.txt", row.names = FALSE, sep = "\t")

thuesen.tab.delim <- readLines("thuesen.txt")
# Replace the NA marker with a period (each line holds at most one NA,
# so sub() suffices; gsub() would handle multiple per line)
thuesen.tab.delim <- sub("NA", ".", thuesen.tab.delim)
writeLines(thuesen.tab.delim, "thuesen_period.txt")
cat(readLines("thuesen_period.txt"), sep = "\n")
## "blood.glucose"  "short.velocity"
## 15.3 1.76
## 10.8 1.34
## 8.1  1.27
## 19.5 1.47
## 7.2  1.27
## 5.3  1.49
## 9.3  1.31
## 11.1 1.09
## 7.5  1.18
## 12.2 1.22
## 6.7  1.25
## 5.2  1.19
## 19   1.95
## 15.1 1.28
## 6.7  1.52
## 8.6  .
## 4.2  1.12
## 10.3 1.37
## 12.5 1.19
## 16.1 1.05
## 13.3 1.32
## 4.9  1.03
## 8.8  1.12
## 9.5  1.7

Problem 2

The exponential growth of a population is described by this mathematical function: N_t = N_0 * e^(r*t), where N_t is the population size at time t, N_0 is the initial population size, and r is the rate of growth or reproductive rate. Write an exponential growth function in R that also generates a plot with time on the x axis and N_t on the y axis. Using that function, create plots for 20 days assuming an initial population size of 10 individuals under three growth rate scenarios: (0.5, 0.8, -0.1).

# Adds the curve N_t = N0 * exp(r * t) to an existing plot
exponential_growth <- function(N0, r, days, col) {
  t <- 1:days
  Nt <- N0 * exp(r * t)
  lines(t, Nt, lty = 2, lwd = 2.5, col = col)
}

I originally made three separate plots with no modifications to color, legends, etc., but I saw a tutorial for generating combined plots like the one below and wanted to give it a shot…

plot(0, type = "n", xlim = c(1, 20), ylim = c(0, 1000), xlab = "Time (days)",
     ylab = "Population Size (Number)", main = "Exponential Growth")
exponential_growth(10, 0.5, 20, col = "darkred")
exponential_growth(10, 0.8, 20, col = "blue")
exponential_growth(10, -0.1, 20, col = "darkgreen")
legend("topright", legend = c("r = 0.5", "r = 0.8", "r = -0.1"),
       col = c("darkred", "blue", "darkgreen"), lty = 2, lwd = 2.5)

Problem 3

Under highly favorable conditions, populations grow exponentially. However, resources eventually limit population growth, and exponential growth cannot continue indefinitely. This phenomenon is described by the logistic growth function: N_t = K*N_0 / (N_0 + (K - N_0) * e^(-r*t)), where K is the carrying capacity. Write a logistic growth function in R that also generates a plot with time on the x axis and N_t on the y axis. Using that function, create plots for 20 days assuming an initial population size of 10 individuals and a carrying capacity of 1,000 individuals under three growth rate scenarios: (0.5, 0.8, 0.4).

# Adds the logistic curve N_t = K*N0 / (N0 + (K - N0) * exp(-r*t))
# to an existing plot
log.growth <- function(N0, r, t, K, col) {
  Nt <- K * N0 / (N0 + (K - N0) * exp(-r * t))
  lines(t, Nt, lty = 2, lwd = 2.5, col = col)
}

plot(0, type = "n", xlim = c(0, 20), ylim = c(0, 1000), xlab = "Time (days)", ylab = "Population (Number)", main = "Logistic Growth")
log.growth(N0 = 10, r = 0.5, t = 0:20, K = 1000, col = "darkred")
log.growth(N0 = 10, r = 0.8, t = 0:20, K = 1000, col = "blue")
log.growth(N0 = 10, r = 0.4, t = 0:20, K = 1000, col = "darkgreen")

legend("topleft", legend = c("r = 0.5", "r = 0.8", "r = 0.4"), col = c("darkred", "blue", "darkgreen"), lty = 1, lwd = 2.5)

Problem 4

Write a function (sum_n) that for any given value, say n, computes the sum of the integers from 1 to n (inclusive). Use the function to determine the sum of integers from 1 to 5,000.

sum_n <- function(n) {
  sum(1:n)
}
sum_n(n = 5000)
## [1] 12502500
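As a sanity check, the closed-form (Gauss) formula n(n + 1)/2 gives the same total:

5000 * 5001 / 2
## [1] 12502500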

Problem 5

Write a function (sqrt_round) that takes a number x as input, takes the square root, rounds it to the nearest whole number and then returns the result.

sqrt_round <- function(x) {
  round(sqrt(x))
}
sqrt_round(x = 8.3)
## [1] 3

Problem 6

Install the R package ‘nycflights13’, and load the ‘weather’ data. a) Explore the column names and the top part of the dataset to get a sense of the data

library(nycflights13)
data(weather)
head(weather)
## # A tibble: 6 × 15
##   origin  year month   day  hour  temp  dewp humid wind_dir wind_speed wind_gust
##   <chr>  <int> <int> <int> <int> <dbl> <dbl> <dbl>    <dbl>      <dbl>     <dbl>
## 1 EWR     2013     1     1     1  39.0  26.1  59.4      270      10.4         NA
## 2 EWR     2013     1     1     2  39.0  27.0  61.6      250       8.06        NA
## 3 EWR     2013     1     1     3  39.0  28.0  64.4      240      11.5         NA
## 4 EWR     2013     1     1     4  39.9  28.0  62.2      250      12.7         NA
## 5 EWR     2013     1     1     5  39.0  28.0  64.4      260      12.7         NA
## 6 EWR     2013     1     1     6  37.9  28.0  67.2      240      11.5         NA
## # ℹ 4 more variables: precip <dbl>, pressure <dbl>, visib <dbl>,
## #   time_hour <dttm>
  b) Make a subset of the data from just the first month (1) and then save that subset as ‘weather1’
weather1 <- subset(weather, month == 1)
library(ggplot2)
ggplot(data = weather1, aes(x = temp)) + geom_histogram(bins = 30)

  c) Using ggplot2, make a beautiful histogram of the variable ‘temp’
ggplot(data = weather1, aes(x = temp)) + geom_histogram(col = "white", 
                                                        fill = "blue") +
  labs(title = "Temperatures in Month 1",
       x = "Temperature (°F)",
       y = "Count")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

  d) Using ggplot2, make a beautiful line plot of ‘temp’ as a function of ‘time_hour’
ggplot(weather1, aes(x = time_hour, y = temp)) + geom_line(color = "hotpink") +
  labs(title = "Temperatures Over Time (Month 1)",
       x = "Date",
       y = "Temperature (°F)")

Okay, pink is my favourite colour but I’ll select a better colour for reporting…

ggplot(weather1, aes(x = time_hour, y = temp)) + geom_line(color = "red") +
  labs(title = "Temperatures Over Time (Month 1)",
       x = "Date",
       y = "Temperature (°F)")

  e) Using ggplot2, make a beautiful boxplot of ‘temp’ as a function of ‘origin’
ggplot(weather1, aes(x = origin, y = temp, fill = origin)) + geom_boxplot() +
  labs(title = "Temperatures by Origin Airport (Month 1)",
       x = "Origin Airport",
       y = "Temperature (°F)")

Problem 1b

A researcher videotaped the glides of 8 tree snakes leaping from a 10-m tower. Undulation rates of the snakes measured in hertz (cycles per second) were as follows: 0.9, 1.4, 1.2, 1.2, 1.3, 2.0, 1.4, 1.6.

library(ggplot2)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ lubridate 1.9.3     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
undulation_rates <- c(0.9, 1.4, 1.2, 1.2, 1.3, 2.0, 1.4, 1.6)
  a) Draw a histogram of the undulation rates
hist(undulation_rates, xlab = "Hertz", ylab = "Frequency", main = "Tree Snake Undulation Rates")

  b) Calculate the sample mean (sur = snake undulation rates)
sur_mean <- mean(undulation_rates)
sur_mean
## [1] 1.375
  c) Calculate the range
sur_range <- range(undulation_rates)
sur_range
## [1] 0.9 2.0
  d) Calculate the standard deviation
sur_sd <- sd(undulation_rates)
sur_sd
## [1] 0.324037
  e) Write a function to express the standard deviation as a percentage of the mean (that is, the coefficient of variation) and calculate it.
# Coefficient of variation: sd as a percentage of the mean
coef_v <- function(m, s) {
  (s / m) * 100
}
coef_v(sur_mean, sur_sd)
## [1] 23.56633

Problem 2b

Blood pressure was measured (in units of mm Hg). Here are the measurements: 112, 128, 108, 129, 125, 153, 155, 132, 137.

bp <- c(112, 128, 108, 129, 125, 153, 155, 132, 137)
  a) How many individuals are in the sample?
length(bp)
## [1] 9
  b) What is the mean of this sample?
bp_mean <- mean(bp)
bp_mean
## [1] 131
  c) What is the variance?
bp_variance <- var(bp)
bp_variance
## [1] 254.5
  d) What is the standard deviation?
bp_sd <- sd(bp)
bp_sd
## [1] 15.95306
  e) What is the coefficient of variation?
coef_v(bp_mean, bp_sd)
## [1] 12.17791

Problem 3b

Calculate the probability of each of the following events: a) A standard normally distributed variable is larger than 3

pnorm(3, lower.tail = FALSE)
## [1] 0.001349898
  b) A normally distributed variable with mean 35 and standard deviation 6 is larger than 42
pnorm(42, mean = 35, sd = 6, lower.tail = FALSE)
## [1] 0.1216725
  c) Getting 10 out of 10 successes in a binomial distribution with probability 0.8
dbinom(10, size = 10, prob = 0.8)
## [1] 0.1073742
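Since all ten independent trials must succeed, this probability is simply 0.8 raised to the tenth power, which confirms the dbinom result:

0.8^10
## [1] 0.1073742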
  d) X > 6.5 in a Chi-squared distribution with 2 degrees of freedom
pchisq(6.5, df = 2, lower.tail = FALSE)
## [1] 0.03877421

Problem 5b

Imagine height is genetically determined by the combined (that is, summed) effect of several genes (a polygenic trait). Assume that each gene affects height as a uniform distribution with min = 1 and max = 3. Simulate a stochastic model of height for 1,000 random people based on 1 gene,

single <- runif(1000, 1, 3)
hist(single, main = "Single Gene Height Determination", xlab = "Height")

2 genes,

double <- runif(1000, 1, 3) + runif(1000, 1, 3)
hist(double, main = "Double Gene Height Determination", xlab = "Height")

and 5 genes.

quintuple <- rowSums(replicate(5, runif(1000, 1, 3)))
hist(quintuple, main = "Quintuple Gene Height Determination", xlab = "Height")

As we increase the number of genes, what is the resulting height distribution?

As we increase the number of genes, the distribution becomes more concentrated around the mean, with relatively fewer people at the extremes. The nearly flat (rectangular) single-gene histogram becomes triangular with 2 genes and approximately bell-shaped with 5, as expected from the central limit theorem: the sum of several independent uniform effects approaches a normal distribution.
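A minimal sketch of that convergence (the object names here are my own): overlay the normal density predicted by the central limit theorem on the 5-gene histogram. A Uniform(1, 3) gene has mean 2 and variance (3 - 1)^2/12 = 1/3, so the sum of 5 genes is approximately normal with mean 10 and variance 5/3.

heights5 <- rowSums(replicate(5, runif(1000, 1, 3)))
hist(heights5, freq = FALSE, xlab = "Height",
     main = "5-Gene Heights vs. Normal Approximation")
curve(dnorm(x, mean = 10, sd = sqrt(5 / 3)), add = TRUE, col = "red", lwd = 2)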

Problem 1c

Normal human body temperature is 98.6 F. Researchers obtained body-temperature measurements on randomly chosen healthy people: 98.4, 98.6, 97.8, 98.8, 97.9, 99.0, 98.2, 98.8, 98.8, 99.0, 98.0, 99.2, 99.5, 99.4, 98.4, 99.1, 98.4, 97.6, 97.4, 97.5, 97.5, 98.8, 98.6, 100.0, 98.4.

bodytempsurvey <- c(98.4, 98.6, 97.8, 98.8, 97.9, 99.0, 98.2, 98.8, 98.8, 99.0, 98.0, 99.2, 99.5, 99.4, 98.4, 99.1, 98.4, 97.6, 97.4, 97.5, 97.5, 98.8, 98.6, 100.0, 98.4)
  a) Make a histogram of the data
hist(bodytempsurvey)

  b) Make a normal quantile plot
qqnorm(bodytempsurvey)
qqline(bodytempsurvey)

  c) Are the data normally distributed?
shapiro.test(bodytempsurvey)
## 
##  Shapiro-Wilk normality test
## 
## data:  bodytempsurvey
## W = 0.97216, p-value = 0.7001

Yes. The Shapiro-Wilk test gives p = 0.70, so there is no evidence against normality, which agrees with the normal quantile plot.

  d) Are these measurements consistent with a population mean of 98.6 F?
t.test(bodytempsurvey, mu = 98.6)
## 
##  One Sample t-test
## 
## data:  bodytempsurvey
## t = -0.56065, df = 24, p-value = 0.5802
## alternative hypothesis: true mean is not equal to 98.6
## 95 percent confidence interval:
##  98.24422 98.80378
## sample estimates:
## mean of x 
##    98.524

Yes: the one-sample t-test gives p = 0.58, and 98.6 F lies within the 95% confidence interval (98.24, 98.80), so the measurements are consistent with a population mean of 98.6 F.

Problem 2c

Ten epileptic patients participated in a study of a new anticonvulsant drug. During the first 8-week period, half the patients received a placebo and half were given the drug, and the number of seizures was recorded. Following this, the same patients were given the opposite treatment, and the number of seizures was again recorded. Assuming that the distribution of the difference between the placebo and drug counts meets the assumption of normality, perform an appropriate test to determine whether there were differences in the number of epileptic seizures with and without the drug.

epimedstudy <- data.frame(c(1:10), c(37, 52, 68, 4, 29, 32, 19, 52, 19, 12), c(5, 23, 40, 3, 38, 19, 9, 24, 17, 14))
names(epimedstudy) <- c("Patient Number", "Seizures: Placebo", "Seizures: Actual Med")
head(epimedstudy)
##   Patient Number Seizures: Placebo Seizures: Actual Med
## 1              1                37                    5
## 2              2                52                   23
## 3              3                68                   40
## 4              4                 4                    3
## 5              5                29                   38
## 6              6                32                   19
placebo <- c(37, 52, 68, 4, 29, 32, 19, 52, 19, 12)
actual <- c(5, 23, 40, 3, 38, 19, 9, 24, 17, 14)
  a) Make a boxplot of the data
boxplot(placebo, actual, names=c("Placebo", "Experimental"), main="Seizures by Treatment")

  b) Test the difference.
t.test(placebo, actual, paired = TRUE)
## 
##  Paired t-test
## 
## data:  placebo and actual
## t = 2.7661, df = 9, p-value = 0.02189
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##   2.404666 23.995334
## sample estimates:
## mean difference 
##            13.2
t.test(placebo, actual, paired = TRUE)$p.value
## [1] 0.02189433

The p-value (0.022) is below 0.05, indicating a significant difference between the placebo and drug periods: on average, patients had 13.2 more seizures on placebo than on the drug.
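Equivalently (a quick sanity check), a paired t-test is just a one-sample t-test on the within-patient differences, so this reproduces the same p-value:

t.test(placebo - actual, mu = 0)$p.value
## [1] 0.02189433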

Problem 4c

A bee biologist is analyzing whether there was an association between bee colony number and type of forest habitat. We expect that there is no habitat preference for bee colony number.

beecolhab <- c(Oak = 33, Hickory = 30, Maple = 29, "Red Cedar" = 4, Poplar = 4)
  a) Make a barplot of the data
barplot(beecolhab)

  b) Test the hypothesis of no association
chisq.test(beecolhab)
## 
##  Chi-squared test for given probabilities
## 
## data:  beecolhab
## X-squared = 43.1, df = 4, p-value = 9.865e-09
chisq.test(beecolhab)$p.value
## [1] 9.865077e-09

Is this true based on this data? No. The chi-squared goodness-of-fit test compares the observed counts with equal expected counts across habitats; its p-value (≈ 1e-08) is far below 0.05, so we reject the hypothesis of no association. The data provide strong evidence that bee colony numbers differ among forest habitat types.
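The test statistic is easy to verify by hand: 100 colonies over 5 habitats gives an expected count of 20 per habitat, and summing (observed - expected)^2 / expected reproduces X-squared:

expected <- rep(sum(beecolhab) / length(beecolhab), length(beecolhab))
sum((beecolhab - expected)^2 / expected)
## [1] 43.1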

Problem 5c

Perform 10 one-sample t-tests for mu=0 on simulated standard normally distributed data of 25 observations each and get the P-value.

replicate(10, t.test(rnorm(25), mu = 0)$p.value)
##  [1] 0.52416886 0.96484912 0.80712460 0.46682219 0.27731529 0.82666178
##  [7] 0.58099065 0.06106349 0.78961751 0.99383502

Repeat the experiment, but instead simulate samples of 25 observations from a t-distribution with 2 degrees of freedom.

replicate(10, t.test(rt(25, df = 2), mu = 0)$p.value)
##  [1] 0.8506760 0.6567203 0.2315762 0.5864703 0.2089147 0.6842468 0.7622087
##  [8] 0.4138133 0.9491339 0.6274438

Find a way to automate the first experiment to do it 1,000 times, applying a false discovery rate correction to the P-values (check ?replicate).

p_fdr <- p.adjust(replicate(1000, t.test(rnorm(25), mu = 0)$p.value), method = "fdr")
head(p_fdr)
## [1] 0.9723148 0.8249871 0.9988447 0.9982948 0.9330501 0.9330501
sum(p_fdr < 0.05)
## [1] 0

Storing the adjusted P-values instead of printing all 1,000 keeps the output readable. As expected when the null hypothesis is true for every test, none of the FDR-adjusted P-values falls below 0.05.

Problem 1d

You are hired by the US Department of Agriculture to develop effective control practices for invasive plants (data_ANOVA.xlsx). In particular, you are asked to investigate pesticide controls for kudzu, an invasive vine that grows in thick mats that smother underlying plants. Two of the most widely used pesticides for kudzu are glyphosate and triclopyr. To determine the effectiveness of a single application of pesticide, you conduct an experiment in 18 plots that each had 50% kudzu cover. In mid-summer, you applied equal amounts of 2% glyphosate to 6 plots, 2% triclopyr to 6 plots, and water without pesticides to 6 plots. Then you returned in autumn and measured the percent cover of kudzu in the plots.

library(readxl)
# install.packages("reshape2")  # run once if reshape2 is not installed
library(reshape2)
library(ggplot2)

  a) Transform the data from wide to long format (Google how to do it - Hint: function melt from the reshape2 package)
USDA_Exp <- readxl::read_xlsx(path = "data_ANOVA.xlsx", col_names = TRUE)

USDA_Exp
## # A tibble: 6 × 3
##   Glyphosate Triclopyr Control
##        <dbl>     <dbl>   <dbl>
## 1         20        17      34
## 2         17        14      31
## 3         24        12      29
## 4         18        16      32
## 5         16        18      27
## 6         22        13      30
KudzuCon <- reshape2::melt(USDA_Exp)
## No id variables; using all as measure variables
KudzuCon
##      variable value
## 1  Glyphosate    20
## 2  Glyphosate    17
## 3  Glyphosate    24
## 4  Glyphosate    18
## 5  Glyphosate    16
## 6  Glyphosate    22
## 7   Triclopyr    17
## 8   Triclopyr    14
## 9   Triclopyr    12
## 10  Triclopyr    16
## 11  Triclopyr    18
## 12  Triclopyr    13
## 13    Control    34
## 14    Control    31
## 15    Control    29
## 16    Control    32
## 17    Control    27
## 18    Control    30
boxplot(value~variable, data=KudzuCon, main="Kudzu Response", xlab = "Treatment", ylab = "Measured Cover (%)")
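As an aside, reshape2 is retired these days, so the same wide-to-long reshaping can be done with tidyr (a sketch, assuming tidyr is installed; melt's default variable/value column names are kept for comparability):

library(tidyr)
KudzuLong <- pivot_longer(USDA_Exp, cols = everything(),
                          names_to = "variable", values_to = "value")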

  b) Perform an ANOVA
anova(lm(value ~ variable, data = KudzuCon))
## Analysis of Variance Table
## 
## Response: value
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## variable   2    763   381.5    54.5 1.318e-07 ***
## Residuals 15    105     7.0                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(lm(value ~ variable, data = KudzuCon))
## 
## Call:
## lm(formula = value ~ variable, data = KudzuCon)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.500 -1.875  0.000  1.875  4.500 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         19.500      1.080  18.053 1.38e-11 ***
## variableTriclopyr   -4.500      1.528  -2.946     0.01 *  
## variableControl     11.000      1.528   7.201 3.07e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.646 on 15 degrees of freedom
## Multiple R-squared:  0.879,  Adjusted R-squared:  0.8629 
## F-statistic:  54.5 on 2 and 15 DF,  p-value: 1.318e-07
summary(lm(value ~ variable, data = KudzuCon))$r.squared
## [1] 0.8790323
  c) What is the variation explained (R2)?

R^2 is the coefficient of determination, assessing how well a model fits the data. The R^2 value here is 0.88, meaning the treatment applied explains about 88% of the variation in kudzu cover. What counts as a "high" R^2 depends on the data collected (the data I'm working with for my research at the moment often has a value of 0.98 or 0.99), but 0.88 is by no means "bad".

Problem 2d

Data in growth.txt come from a farm-scale trial of animal diets. There are two factors: diet and supplement. Diet is a factor with three levels: barley, oats and wheat. Supplement is a factor with four levels: agrimore, control, supergain and supersupp. The response variable is weight gain after 6 weeks.

growth <- read.table("growth.txt", sep="\t", header = T)
head(growth)
##   supplement  diet     gain
## 1  supergain wheat 17.37125
## 2  supergain wheat 16.81489
## 3  supergain wheat 18.08184
## 4  supergain wheat 15.78175
## 5    control wheat 17.70656
## 6    control wheat 18.22717
  a) Inspect the data using boxplots. I did this two ways: the “barebones” way I was originally going to do it, and with more customization that I saw someone else use in similar applications online in the past…
library(ggplot2)

Weight as a function of diet

boxplot(gain ~ diet, data = growth)

diet_bp <- ggplot(growth, aes(x = diet, y = gain, fill = diet)) + geom_boxplot() +
  geom_jitter(shape = 16, position = position_jitter(0.2))+
  labs(x = "Diet", y = "Weight Gain (g)")
diet_bp

Weight as a function of supplement

boxplot(gain ~ supplement, data = growth)

supplement_bp <- ggplot(growth, aes(x = supplement, y = gain, fill = supplement)) +
  geom_boxplot() +
  geom_jitter(shape = 16, position = position_jitter(0.2)) + 
  labs(x = "Supplement", y = "Weight Gain (g)")
supplement_bp

Combinations. Full disclosure: I found this formatting style when double-checking my own work online against others’, and wanted to try learning how to format like this… my own entry, before this formatting, was the first “barebones” one.

boxplot(gain ~ diet + supplement, data = growth)

combo_bp <- ggplot(growth, aes(x = supplement, y = gain, fill = supplement)) + geom_boxplot() +
  geom_jitter(shape = 16, position = position_jitter(0.2)) +
  facet_wrap(~diet) + theme(axis.text.x = element_text(angle = 45)) +
  labs(x = "Supplement (faceted by diet)", y = "Weight Gain (g)")
combo_bp

  b) Perform a two-way ANOVA
anova(lm(gain ~ diet * supplement, data = growth))
## Analysis of Variance Table
## 
## Response: gain
##                 Df  Sum Sq Mean Sq F value    Pr(>F)    
## diet             2 287.171 143.586 83.5201 2.999e-14 ***
## supplement       3  91.881  30.627 17.8150 2.952e-07 ***
## diet:supplement  6   3.406   0.568  0.3302    0.9166    
## Residuals       36  61.890   1.719                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  c) Assess graphically if there exists an interaction between both factors
interaction.plot(growth$supplement, growth$diet, growth$gain, type = "b",
                 col = c("darkgreen", "blue", "darkred"),
                 main = "Interaction of Supplement and Diet on Weight Gain",
                 xlab = "Supplement", ylab = "Gain (g)", trace.label = "Diet")

# Four supplement traces here, so four colours are needed
interaction.plot(growth$diet, growth$supplement, growth$gain, type = "b",
                 col = c("darkgreen", "blue", "darkred", "purple"),
                 main = "Interaction of Diet and Supplement on Weight Gain",
                 xlab = "Diet", ylab = "Gain (g)", trace.label = "Supplement")

  d) Given what you learnt about the interaction, what would be a better model? The interaction term is nowhere near significant (p = 0.92), so dropping it and fitting an additive model is a better choice:
anova(lm(gain ~ diet + supplement, data = growth))
## Analysis of Variance Table
## 
## Response: gain
##            Df  Sum Sq Mean Sq F value    Pr(>F)    
## diet        2 287.171 143.586  92.358 4.199e-16 ***
## supplement  3  91.881  30.627  19.700 3.980e-08 ***
## Residuals  42  65.296   1.555                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
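A direct way to confirm that dropping the interaction loses nothing (an extra check, not required by the problem) is an F-test comparing the nested models; it reproduces the interaction row of the two-way table (F = 0.33, p = 0.92):

anova(lm(gain ~ diet + supplement, data = growth),
      lm(gain ~ diet * supplement, data = growth))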
  e) What is the variation explained (R2) of this new model?
summary(lm(gain ~ diet + supplement, data = growth))
## 
## Call:
## lm(formula = gain ~ diet + supplement, data = growth)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.30792 -0.85929 -0.07713  0.92052  2.90615 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          26.1230     0.4408  59.258  < 2e-16 ***
## dietoats             -3.0928     0.4408  -7.016 1.38e-08 ***
## dietwheat            -5.9903     0.4408 -13.589  < 2e-16 ***
## supplementcontrol    -2.6967     0.5090  -5.298 4.03e-06 ***
## supplementsupergain  -3.3815     0.5090  -6.643 4.72e-08 ***
## supplementsupersupp  -0.7274     0.5090  -1.429     0.16    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.247 on 42 degrees of freedom
## Multiple R-squared:  0.8531, Adjusted R-squared:  0.8356 
## F-statistic: 48.76 on 5 and 42 DF,  p-value: < 2.2e-16

Multiple R^2 = 0.8531 (adjusted R^2 = 0.8356): diet and supplement together explain about 85% of the variation in weight gain.

  f) Perform a Tukey HSD test of this new model
TukeyHSD(aov(gain ~ diet + supplement, data = growth))
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = gain ~ diet + supplement, data = growth)
## 
## $diet
##                   diff       lwr       upr p adj
## oats-barley  -3.092817 -4.163817 -2.021817 0e+00
## wheat-barley -5.990298 -7.061298 -4.919297 0e+00
## wheat-oats   -2.897481 -3.968481 -1.826481 2e-07
## 
## $supplement
##                           diff        lwr        upr     p adj
## control-agrimore    -2.6967005 -4.0583332 -1.3350677 0.0000234
## supergain-agrimore  -3.3814586 -4.7430914 -2.0198259 0.0000003
## supersupp-agrimore  -0.7273521 -2.0889849  0.6342806 0.4888738
## supergain-control   -0.6847581 -2.0463909  0.6768746 0.5400389
## supersupp-control    1.9693484  0.6077156  3.3309811 0.0020484
## supersupp-supergain  2.6541065  1.2924737  4.0157392 0.0000307
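Every pair of diets differs significantly, while among supplements agrimore vs. supersupp and control vs. supergain are statistically indistinguishable. The same comparisons can also be displayed graphically as confidence intervals:

plot(TukeyHSD(aov(gain ~ diet + supplement, data = growth)))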

Problem 1e

In a study of hyena laughter, a researcher investigated whether sound spectral properties of hyenas’ giggles are associated with age.

# Age(years): 2, 2, 2, 6, 9, 10, 13, 10, 14, 14, 12, 7, 11, 11, 14, 20

age <- c(2,2,2,6,9,10,13,10,14,14,12,7,11,11,14,20)

# Frequency (Hz): 840,670,580,470,540,660,510,520,500,480,400,650,460,500,580,500

freq <- c(840,670,580,470,540,660,510,520,500,480,400,650,460,500,580,500)
  a) Inspect the data using a scatterplot
agelaugh <- data.frame(age = age, freq = freq)
head(agelaugh)
##   age freq
## 1   2  840
## 2   2  670
## 3   2  580
## 4   6  470
## 5   9  540
## 6  10  660
plot(agelaugh)

  b) Test the linear association between both variables
cor.test(agelaugh$age, agelaugh$freq, method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  agelaugh$age and agelaugh$freq
## t = -2.8194, df = 14, p-value = 0.01365
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.8453293 -0.1511968
## sample estimates:
##       cor 
## -0.601798
  c) Assume that the data are not normally distributed, and test the linear association using a non-parametric correlation coefficient
cor.test(agelaugh$age, agelaugh$freq, method = "spearman", exact = FALSE)
## 
##  Spearman's rank correlation rho
## 
## data:  agelaugh$age and agelaugh$freq
## S = 1047.1, p-value = 0.03092
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.5397807

Problem 1f

Testosterone is known to predict aggressive behavior and is involved in face shape during puberty. Does face shape predict aggression? Researchers measured the face width-to-height ratio of 21 university hockey players along with the average number of penalty minutes awarded per game for aggressive infractions. The data are available in face.txt.

library(ggplot2)
library(see)
## Warning: package 'see' was built under R version 4.3.2
testo <- read.table("face.txt", sep = "\t", header = T)
head(testo)
##   Face Penalty
## 1 1.59    0.44
## 2 1.67    1.43
## 3 1.65    1.57
## 4 1.72    0.14
## 5 1.79    0.27
## 6 1.77    0.35
plota <- ggplot(testo, aes(x = Face, y = Penalty)) + geom_point()
plota

model_testo <- lm(Penalty ~ Face, data = testo)
summary(model_testo)
## 
## Call:
## lm(formula = Penalty ~ Face, data = testo)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.9338 -0.3883 -0.1260  0.3381  1.0711 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   -4.505      2.089  -2.157   0.0440 *
## Face           3.189      1.150   2.774   0.0121 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6129 on 19 degrees of freedom
## Multiple R-squared:  0.2882, Adjusted R-squared:  0.2508 
## F-statistic: 7.694 on 1 and 19 DF,  p-value: 0.0121
plotb <- ggplot(testo, aes(x = Face, y = Penalty)) +
  geom_point() +
  stat_smooth(method = "lm", col = "blue", se = FALSE)
plotb
## `geom_smooth()` using formula = 'y ~ x'

library(performance)
## Warning: package 'performance' was built under R version 4.3.2
check_model(lm(Penalty ~ Face, data = testo))

  1. How steeply does the number of penalty minutes increase per unit increase in face ratio? Calculate the estimate of the intercept. Write the result in the form of an equation for the line. In y = mx + b form: Penalty = 3.189 * Face - 4.505, where the slope is 3.189 (penalty minutes per unit increase in face ratio) and the intercept is -4.505. (See the prediction sketch after this list.)

  2. How many degrees of freedom does this analysis have? See summary(model_testo): 19 residual degrees of freedom (21 players minus 2 estimated parameters).

  3. What is the t-statistic? See summary(model_testo): t = 2.774 for the Face slope (and t = -2.157 for the intercept).

  4. What is your final conclusion about the slope? The slope is positive and significantly different from zero (p = 0.012): penalty minutes increase by about 3.19 for each one-unit increase in face width-to-height ratio.

  5. What is the variation explained, R2? Multiple R^2 = 0.2882 (adjusted 0.2508), so face ratio explains about 29% of the variation in penalty minutes.

  6. Make a scatterplot of the data and include a linear regression line. See plotb in the code above.

  7. Check the model assumptions. See the check_model() output above.
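A minimal sketch of using the fitted line for prediction (the face-ratio value 1.7 is an arbitrary illustrative choice):

# Predicted penalty minutes at a face width-to-height ratio of 1.7
predict(model_testo, newdata = data.frame(Face = 1.7))
# By hand: 3.189 * 1.7 - 4.505 = 0.9163 penalty minutes per game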