The purpose of this lab is to have you investigate more complex statistical models, ones that include multiple variables!

NOTE: there is code in the hypothesis testing guide document to help create code to run these types of tests, but use this link for more detail and for other methods of graphing: https://www.datanovia.com/en/lessons/ancova-in-r/

PART 1: ANCOVA: flower characteristics

The “iris.csv” dataset is commonly used for examples in R. The dataset utilizes measurements of flower parts across different species of irises. Please download and bring in the “iris.csv” file.

For this dataset, I would like you to do the following:

  1. Code: run the ANCOVA to determine how ‘variety’ and ‘sepal.width’ influence ‘sepal.length’ NOTE: don’t worry about testing assumptions for this (you would need to do this normally)
iris<-read.table('iris.csv', ',', header=T)
iris
##     sepal.length sepal.width petal.length petal.width    variety
## 1            5.1         3.5          1.4         0.2     Setosa
## 2            4.9         3.0          1.4         0.2     Setosa
## 3            4.7         3.2          1.3         0.2     Setosa
## 4            4.6         3.1          1.5         0.2     Setosa
## 5            5.0         3.6          1.4         0.2     Setosa
## 6            5.4         3.9          1.7         0.4     Setosa
## 7            4.6         3.4          1.4         0.3     Setosa
## 8            5.0         3.4          1.5         0.2     Setosa
## 9            4.4         2.9          1.4         0.2     Setosa
## 10           4.9         3.1          1.5         0.1     Setosa
## 11           5.4         3.7          1.5         0.2     Setosa
## 12           4.8         3.4          1.6         0.2     Setosa
## 13           4.8         3.0          1.4         0.1     Setosa
## 14           4.3         3.0          1.1         0.1     Setosa
## 15           5.8         4.0          1.2         0.2     Setosa
## 16           5.7         4.4          1.5         0.4     Setosa
## 17           5.4         3.9          1.3         0.4     Setosa
## 18           5.1         3.5          1.4         0.3     Setosa
## 19           5.7         3.8          1.7         0.3     Setosa
## 20           5.1         3.8          1.5         0.3     Setosa
## 21           5.4         3.4          1.7         0.2     Setosa
## 22           5.1         3.7          1.5         0.4     Setosa
## 23           4.6         3.6          1.0         0.2     Setosa
## 24           5.1         3.3          1.7         0.5     Setosa
## 25           4.8         3.4          1.9         0.2     Setosa
## 26           5.0         3.0          1.6         0.2     Setosa
## 27           5.0         3.4          1.6         0.4     Setosa
## 28           5.2         3.5          1.5         0.2     Setosa
## 29           5.2         3.4          1.4         0.2     Setosa
## 30           4.7         3.2          1.6         0.2     Setosa
## 31           4.8         3.1          1.6         0.2     Setosa
## 32           5.4         3.4          1.5         0.4     Setosa
## 33           5.2         4.1          1.5         0.1     Setosa
## 34           5.5         4.2          1.4         0.2     Setosa
## 35           4.9         3.1          1.5         0.2     Setosa
## 36           5.0         3.2          1.2         0.2     Setosa
## 37           5.5         3.5          1.3         0.2     Setosa
## 38           4.9         3.6          1.4         0.1     Setosa
## 39           4.4         3.0          1.3         0.2     Setosa
## 40           5.1         3.4          1.5         0.2     Setosa
## 41           5.0         3.5          1.3         0.3     Setosa
## 42           4.5         2.3          1.3         0.3     Setosa
## 43           4.4         3.2          1.3         0.2     Setosa
## 44           5.0         3.5          1.6         0.6     Setosa
## 45           5.1         3.8          1.9         0.4     Setosa
## 46           4.8         3.0          1.4         0.3     Setosa
## 47           5.1         3.8          1.6         0.2     Setosa
## 48           4.6         3.2          1.4         0.2     Setosa
## 49           5.3         3.7          1.5         0.2     Setosa
## 50           5.0         3.3          1.4         0.2     Setosa
## 51           7.0         3.2          4.7         1.4 Versicolor
## 52           6.4         3.2          4.5         1.5 Versicolor
## 53           6.9         3.1          4.9         1.5 Versicolor
## 54           5.5         2.3          4.0         1.3 Versicolor
## 55           6.5         2.8          4.6         1.5 Versicolor
## 56           5.7         2.8          4.5         1.3 Versicolor
## 57           6.3         3.3          4.7         1.6 Versicolor
## 58           4.9         2.4          3.3         1.0 Versicolor
## 59           6.6         2.9          4.6         1.3 Versicolor
## 60           5.2         2.7          3.9         1.4 Versicolor
## 61           5.0         2.0          3.5         1.0 Versicolor
## 62           5.9         3.0          4.2         1.5 Versicolor
## 63           6.0         2.2          4.0         1.0 Versicolor
## 64           6.1         2.9          4.7         1.4 Versicolor
## 65           5.6         2.9          3.6         1.3 Versicolor
## 66           6.7         3.1          4.4         1.4 Versicolor
## 67           5.6         3.0          4.5         1.5 Versicolor
## 68           5.8         2.7          4.1         1.0 Versicolor
## 69           6.2         2.2          4.5         1.5 Versicolor
## 70           5.6         2.5          3.9         1.1 Versicolor
## 71           5.9         3.2          4.8         1.8 Versicolor
## 72           6.1         2.8          4.0         1.3 Versicolor
## 73           6.3         2.5          4.9         1.5 Versicolor
## 74           6.1         2.8          4.7         1.2 Versicolor
## 75           6.4         2.9          4.3         1.3 Versicolor
## 76           6.6         3.0          4.4         1.4 Versicolor
## 77           6.8         2.8          4.8         1.4 Versicolor
## 78           6.7         3.0          5.0         1.7 Versicolor
## 79           6.0         2.9          4.5         1.5 Versicolor
## 80           5.7         2.6          3.5         1.0 Versicolor
## 81           5.5         2.4          3.8         1.1 Versicolor
## 82           5.5         2.4          3.7         1.0 Versicolor
## 83           5.8         2.7          3.9         1.2 Versicolor
## 84           6.0         2.7          5.1         1.6 Versicolor
## 85           5.4         3.0          4.5         1.5 Versicolor
## 86           6.0         3.4          4.5         1.6 Versicolor
## 87           6.7         3.1          4.7         1.5 Versicolor
## 88           6.3         2.3          4.4         1.3 Versicolor
## 89           5.6         3.0          4.1         1.3 Versicolor
## 90           5.5         2.5          4.0         1.3 Versicolor
## 91           5.5         2.6          4.4         1.2 Versicolor
## 92           6.1         3.0          4.6         1.4 Versicolor
## 93           5.8         2.6          4.0         1.2 Versicolor
## 94           5.0         2.3          3.3         1.0 Versicolor
## 95           5.6         2.7          4.2         1.3 Versicolor
## 96           5.7         3.0          4.2         1.2 Versicolor
## 97           5.7         2.9          4.2         1.3 Versicolor
## 98           6.2         2.9          4.3         1.3 Versicolor
## 99           5.1         2.5          3.0         1.1 Versicolor
## 100          5.7         2.8          4.1         1.3 Versicolor
## 101          6.3         3.3          6.0         2.5  Virginica
## 102          5.8         2.7          5.1         1.9  Virginica
## 103          7.1         3.0          5.9         2.1  Virginica
## 104          6.3         2.9          5.6         1.8  Virginica
## 105          6.5         3.0          5.8         2.2  Virginica
## 106          7.6         3.0          6.6         2.1  Virginica
## 107          4.9         2.5          4.5         1.7  Virginica
## 108          7.3         2.9          6.3         1.8  Virginica
## 109          6.7         2.5          5.8         1.8  Virginica
## 110          7.2         3.6          6.1         2.5  Virginica
## 111          6.5         3.2          5.1         2.0  Virginica
## 112          6.4         2.7          5.3         1.9  Virginica
## 113          6.8         3.0          5.5         2.1  Virginica
## 114          5.7         2.5          5.0         2.0  Virginica
## 115          5.8         2.8          5.1         2.4  Virginica
## 116          6.4         3.2          5.3         2.3  Virginica
## 117          6.5         3.0          5.5         1.8  Virginica
## 118          7.7         3.8          6.7         2.2  Virginica
## 119          7.7         2.6          6.9         2.3  Virginica
## 120          6.0         2.2          5.0         1.5  Virginica
## 121          6.9         3.2          5.7         2.3  Virginica
## 122          5.6         2.8          4.9         2.0  Virginica
## 123          7.7         2.8          6.7         2.0  Virginica
## 124          6.3         2.7          4.9         1.8  Virginica
## 125          6.7         3.3          5.7         2.1  Virginica
## 126          7.2         3.2          6.0         1.8  Virginica
## 127          6.2         2.8          4.8         1.8  Virginica
## 128          6.1         3.0          4.9         1.8  Virginica
## 129          6.4         2.8          5.6         2.1  Virginica
## 130          7.2         3.0          5.8         1.6  Virginica
## 131          7.4         2.8          6.1         1.9  Virginica
## 132          7.9         3.8          6.4         2.0  Virginica
## 133          6.4         2.8          5.6         2.2  Virginica
## 134          6.3         2.8          5.1         1.5  Virginica
## 135          6.1         2.6          5.6         1.4  Virginica
## 136          7.7         3.0          6.1         2.3  Virginica
## 137          6.3         3.4          5.6         2.4  Virginica
## 138          6.4         3.1          5.5         1.8  Virginica
## 139          6.0         3.0          4.8         1.8  Virginica
## 140          6.9         3.1          5.4         2.1  Virginica
## 141          6.7         3.1          5.6         2.4  Virginica
## 142          6.9         3.1          5.1         2.3  Virginica
## 143          5.8         2.7          5.1         1.9  Virginica
## 144          6.8         3.2          5.9         2.3  Virginica
## 145          6.7         3.3          5.7         2.5  Virginica
## 146          6.7         3.0          5.2         2.3  Virginica
## 147          6.3         2.5          5.0         1.9  Virginica
## 148          6.5         3.0          5.2         2.0  Virginica
## 149          6.2         3.4          5.4         2.3  Virginica
## 150          5.9         3.0          5.1         1.8  Virginica
ancova <- aov(sepal.length ~ variety + sepal.width, data = iris)
summary(ancova)
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## variety       2  63.21  31.606   164.8  < 2e-16 ***
## sepal.width   1  10.95  10.953    57.1 4.19e-12 ***
## Residuals   146  28.00   0.192                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  1. Question: how do you interpret the results of ANCOVA? These p-vales are far below 0.05 therefore we reject the null hypothesis. Meaning there is some influence on variety and sepal.width on sepal.length.

PART 2: Two Way ANOVA: tilling methods and plant yield

Download and bring in the “biomassTill.csv” file. This file contains data on how different soil tilling methods impact the growth (biomass) of the plants. But the researchers also manipulated the amount of water received by each plant, which would obviously influence any measurement of growth.

For this dataset, I would like you to do the following:

  1. Code: change the “DVS” variable to a factor

  2. Code: Run a two-way ANOVA that looks at how “Tillage” and “DVS” (watering regime) influence “Biomass” NOTE: don’t worry about testing assumptions for this (you would need to do this normally)

  3. Question: how do you interpret the results of the two way ANOVA

The p-value of Tillage alone is 0.0796, meaning tillage alone significantly impacts Biomass. The p-value of DVS alone is 4.71e-11, meaning on its own, DVS does not have an impact on Biomass. The p-value of Tillage:DVS is 0.7422 telling us that the effect of one of the independent variables on Biomass is dependent on the level of the other independent variable.

  1. Code: running a Tukey test for a two way ANOVA can be confusing if there are many iterations, the easiest way to look at comparisons is through a visual. Create a visual that shows the two way model. Note: you can use code from my example on the guide OR use code from the website linked at the top.
soil<-read.table('biomassTill.csv', ',', header=T)
soil
##    rownames Tillage       DVS     Biomass      Biom.2
## 1         1     CA- 0.4903887   0.5412327 108.2465444
## 2         2     CA- 0.4903887   1.3090013 261.8002616
## 3         3     CA- 0.4903887   7.0303258   7.0303258
## 4         4     CA- 0.4903887   2.0303258   2.0303258
## 5         5     CA- 0.7488892   5.9105648   5.9105648
## 6         6     CA- 0.7488892  68.2182182  68.2182182
## 7         7     CA- 0.7488892  26.6471017  26.6471017
## 8         8     CA- 1.0000000  50.6356090  50.6356090
## 9         9     CA- 1.0000000  37.7156002  37.7156002
## 10       10     CA- 1.0000000 116.0872520 116.0872520
## 11       11     CA- 1.0000000  52.8712804  52.8712804
## 12       12     CA- 1.3880686  48.2025909  48.2025909
## 13       13     CA- 1.3880686 100.7209132 100.7209132
## 14       14     CA- 1.3880686 158.3894974 158.3894974
## 15       15     CA- 1.3880686 168.0453182 168.0453182
## 16       16     CA- 2.0000000 103.7269983 103.7269983
## 17       17     CA- 2.0000000 284.4575343 284.4575343
## 18       18     CA- 2.0000000 431.9106036 431.9106036
## 19       19     CA- 2.0000000 152.5598327 152.5598327
## 20       20     CA+ 0.4903887   1.0071369   1.0071369
## 21       21     CA+ 0.4903887   0.9085065   0.9085065
## 22       22     CA+ 0.4903887   1.5603494   1.5603494
## 23       23     CA+ 0.4903887  11.2221944  11.2221944
## 24       24     CA+ 0.7488892  12.0856433  12.0856433
## 25       25     CA+ 0.7488892  17.3492120  17.3492120
## 26       26     CA+ 0.7488892  11.2746862  11.2746862
## 27       27     CA+ 0.7488892  63.9362974  63.9362974
## 28       28     CA+ 1.0000000  74.4275526  74.4275526
## 29       29     CA+ 1.0000000  48.9460049  48.9460049
## 30       30     CA+ 1.0000000 101.5051511 101.5051511
## 31       31     CA+ 1.0000000 147.7895951 147.7895951
## 32       32     CA+ 1.3880686  39.6237584  39.6237584
## 33       33     CA+ 1.3880686  85.2690927  85.2690927
## 34       34     CA+ 1.3880686 174.6212360 174.6212360
## 35       35     CA+ 1.3880686 323.9773106 323.9773106
## 36       36     CA+ 2.0000000 231.4165129 231.4165129
## 37       37     CA+ 2.0000000 200.8488960 200.8488960
## 38       38     CA+ 2.0000000 579.6046045   5.7960460
## 39       39      CT 0.4903887   3.6286286   3.6286286
## 40       40      CT 0.4903887  16.1427911  16.1427911
## 41       41      CT 0.4903887   3.4733134   3.4733134
## 42       42      CT 0.4903887   1.9655578   1.9655578
## 43       43      CT 0.7488892  74.8043743  74.8043743
## 44       44      CT 0.7488892  61.8082368  61.8082368
## 45       45      CT 0.7488892  21.8416090  21.8416090
## 46       46      CT 0.7488892  31.4149100  31.4149100
## 47       47      CT 1.0000000 187.8068930 187.8068930
## 48       48      CT 1.0000000 114.2407517 114.2407517
## 49       49      CT 1.0000000  54.8790069  54.8790069
## 50       50      CT 1.0000000  42.7952224  42.7952224
## 51       51      CT 1.3880686 246.4736243 246.4736243
## 52       52      CT 1.3880686 345.5412829 345.5412829
## 53       53      CT 1.3880686 128.1885083 128.1885083
## 54       54      CT 1.3880686 270.4160471 270.4160471
## 55       55      CT 2.0000000 513.6421369 513.6421369
## 56       56      CT 2.0000000 361.9505493 361.9505493
## 57       57      CT 2.0000000 295.0319506 295.0319506
## 58       58      CT 2.0000000 317.4047834 317.4047834
soil$DVS <- as.factor(soil$DVS)
str(soil)
## 'data.frame':    58 obs. of  5 variables:
##  $ rownames: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Tillage : chr  "CA-" "CA-" "CA-" "CA-" ...
##  $ DVS     : Factor w/ 5 levels "0.490388663",..: 1 1 1 1 2 2 2 3 3 3 ...
##  $ Biomass : num  0.541 1.309 7.03 2.03 5.911 ...
##  $ Biom.2  : num  108.25 261.8 7.03 2.03 5.91 ...
TDvB_model <- aov(Biomass ~ DVS * Tillage, data = soil)
summary(TDvB_model)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## DVS          4 708855  177214  26.123 5.03e-11 ***
## Tillage      2  39556   19778   2.915    0.065 .  
## DVS:Tillage  8  34574    4322   0.637    0.742    
## Residuals   43 291703    6784                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
library(ggpubr)
## Loading required package: ggplot2
library(ggplot2)

soil_means <- soil %>%
  group_by(DVS, Tillage) %>%
  summary(Biomass_mean = mean(Biomass))

ggplot(soil, aes(x = DVS, y = Biomass, color = Tillage)) +
  geom_line() + geom_point() + facet_wrap(~ Tillage) + theme_minimal() +
  labs(x = "DVS (Watering Regime)", y = "Biomass")

PART 3: Logistic Regression: Non-restorative sleep and age

Download and bring in the “sleep.csv” file. This file contains data on participants in a sleep study and documenting the occurance of non-restorative sleep. Here, data in the “NRS” column are either a 0 or a 1, 0 meaning NRS not observed, 1 meaning NRS observed.

For this dataset, I would like you to do the following:

  1. Code: run a logistic regression that looks at how “Age_2013” correlates with NRS.

  2. Question: how would you interpret the results of the logistic regression function?

The coefficient estimate for Age_2013 is -0.031965 seeing that this value is negative it tells us that an increase in age is associated with lower odds of the NRS outcome. The p - value being <2e-16 tells us to reject the null hypothesis, meaning there is a significant association between age and NRS.

  1. Code: create a logistic regression curve for the model

  2. Question: using the curve, how would you explain patterns of NRS across age? As predicted by the coefficient estimate, the logistic regression curve confirms that as age increases, the probability of NRS decreases.

sleep<-read.table('sleep.csv', ',', header=T)

sleep_model <- glm(NRS ~ Age_2013, data = sleep, family = binomial)
summary(sleep_model)
## 
## Call:
## glm(formula = NRS ~ Age_2013, family = binomial, data = sleep)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.868077   0.068995   12.58   <2e-16 ***
## Age_2013    -0.031965   0.001079  -29.63   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 98557  on 90188  degrees of freedom
## Residual deviance: 97701  on 90187  degrees of freedom
## AIC: 97705
## 
## Number of Fisher Scoring iterations: 4
plot(sleep$Age_2013, sleep$NRS, main = "Age vs. NRS", xlab = "Age_2013", ylab = "NRS", pch = 19, col = "blue")

age_seq <- seq(from = min(sleep$Age_2013), to = max(sleep$Age_2013), length.out = 90190)
predictions <- data.frame(Age_2013 = age_seq)
predictions$NRS_prob <- predict(sleep_model, newdata = predictions, type = "response")
plot(sleep$Age_2013, sleep$NRS, main = "Logistic Regression Curve", xlab = "Age_2013", ylab = "Probability of NRS", col = "blue", pch = 20)
lines(predictions$Age_2013, predictions$NRS_prob, col = "red", lwd = 2)