NOTE: there is code in the hypothesis testing guide document to help create code to run these types of tests, but use this link for more detail and for other methods of graphing: https://www.datanovia.com/en/lessons/ancova-in-r/
The “iris.csv” dataset is commonly used for examples in R. The dataset utilizes measurements of flower parts across different species of irises. Please download and bring in the “iris.csv” file.
For this dataset, I would like you to do the following:
iris<-read.table('iris.csv', ',', header=T)
iris
## sepal.length sepal.width petal.length petal.width variety
## 1 5.1 3.5 1.4 0.2 Setosa
## 2 4.9 3.0 1.4 0.2 Setosa
## 3 4.7 3.2 1.3 0.2 Setosa
## 4 4.6 3.1 1.5 0.2 Setosa
## 5 5.0 3.6 1.4 0.2 Setosa
## 6 5.4 3.9 1.7 0.4 Setosa
## 7 4.6 3.4 1.4 0.3 Setosa
## 8 5.0 3.4 1.5 0.2 Setosa
## 9 4.4 2.9 1.4 0.2 Setosa
## 10 4.9 3.1 1.5 0.1 Setosa
## 11 5.4 3.7 1.5 0.2 Setosa
## 12 4.8 3.4 1.6 0.2 Setosa
## 13 4.8 3.0 1.4 0.1 Setosa
## 14 4.3 3.0 1.1 0.1 Setosa
## 15 5.8 4.0 1.2 0.2 Setosa
## 16 5.7 4.4 1.5 0.4 Setosa
## 17 5.4 3.9 1.3 0.4 Setosa
## 18 5.1 3.5 1.4 0.3 Setosa
## 19 5.7 3.8 1.7 0.3 Setosa
## 20 5.1 3.8 1.5 0.3 Setosa
## 21 5.4 3.4 1.7 0.2 Setosa
## 22 5.1 3.7 1.5 0.4 Setosa
## 23 4.6 3.6 1.0 0.2 Setosa
## 24 5.1 3.3 1.7 0.5 Setosa
## 25 4.8 3.4 1.9 0.2 Setosa
## 26 5.0 3.0 1.6 0.2 Setosa
## 27 5.0 3.4 1.6 0.4 Setosa
## 28 5.2 3.5 1.5 0.2 Setosa
## 29 5.2 3.4 1.4 0.2 Setosa
## 30 4.7 3.2 1.6 0.2 Setosa
## 31 4.8 3.1 1.6 0.2 Setosa
## 32 5.4 3.4 1.5 0.4 Setosa
## 33 5.2 4.1 1.5 0.1 Setosa
## 34 5.5 4.2 1.4 0.2 Setosa
## 35 4.9 3.1 1.5 0.2 Setosa
## 36 5.0 3.2 1.2 0.2 Setosa
## 37 5.5 3.5 1.3 0.2 Setosa
## 38 4.9 3.6 1.4 0.1 Setosa
## 39 4.4 3.0 1.3 0.2 Setosa
## 40 5.1 3.4 1.5 0.2 Setosa
## 41 5.0 3.5 1.3 0.3 Setosa
## 42 4.5 2.3 1.3 0.3 Setosa
## 43 4.4 3.2 1.3 0.2 Setosa
## 44 5.0 3.5 1.6 0.6 Setosa
## 45 5.1 3.8 1.9 0.4 Setosa
## 46 4.8 3.0 1.4 0.3 Setosa
## 47 5.1 3.8 1.6 0.2 Setosa
## 48 4.6 3.2 1.4 0.2 Setosa
## 49 5.3 3.7 1.5 0.2 Setosa
## 50 5.0 3.3 1.4 0.2 Setosa
## 51 7.0 3.2 4.7 1.4 Versicolor
## 52 6.4 3.2 4.5 1.5 Versicolor
## 53 6.9 3.1 4.9 1.5 Versicolor
## 54 5.5 2.3 4.0 1.3 Versicolor
## 55 6.5 2.8 4.6 1.5 Versicolor
## 56 5.7 2.8 4.5 1.3 Versicolor
## 57 6.3 3.3 4.7 1.6 Versicolor
## 58 4.9 2.4 3.3 1.0 Versicolor
## 59 6.6 2.9 4.6 1.3 Versicolor
## 60 5.2 2.7 3.9 1.4 Versicolor
## 61 5.0 2.0 3.5 1.0 Versicolor
## 62 5.9 3.0 4.2 1.5 Versicolor
## 63 6.0 2.2 4.0 1.0 Versicolor
## 64 6.1 2.9 4.7 1.4 Versicolor
## 65 5.6 2.9 3.6 1.3 Versicolor
## 66 6.7 3.1 4.4 1.4 Versicolor
## 67 5.6 3.0 4.5 1.5 Versicolor
## 68 5.8 2.7 4.1 1.0 Versicolor
## 69 6.2 2.2 4.5 1.5 Versicolor
## 70 5.6 2.5 3.9 1.1 Versicolor
## 71 5.9 3.2 4.8 1.8 Versicolor
## 72 6.1 2.8 4.0 1.3 Versicolor
## 73 6.3 2.5 4.9 1.5 Versicolor
## 74 6.1 2.8 4.7 1.2 Versicolor
## 75 6.4 2.9 4.3 1.3 Versicolor
## 76 6.6 3.0 4.4 1.4 Versicolor
## 77 6.8 2.8 4.8 1.4 Versicolor
## 78 6.7 3.0 5.0 1.7 Versicolor
## 79 6.0 2.9 4.5 1.5 Versicolor
## 80 5.7 2.6 3.5 1.0 Versicolor
## 81 5.5 2.4 3.8 1.1 Versicolor
## 82 5.5 2.4 3.7 1.0 Versicolor
## 83 5.8 2.7 3.9 1.2 Versicolor
## 84 6.0 2.7 5.1 1.6 Versicolor
## 85 5.4 3.0 4.5 1.5 Versicolor
## 86 6.0 3.4 4.5 1.6 Versicolor
## 87 6.7 3.1 4.7 1.5 Versicolor
## 88 6.3 2.3 4.4 1.3 Versicolor
## 89 5.6 3.0 4.1 1.3 Versicolor
## 90 5.5 2.5 4.0 1.3 Versicolor
## 91 5.5 2.6 4.4 1.2 Versicolor
## 92 6.1 3.0 4.6 1.4 Versicolor
## 93 5.8 2.6 4.0 1.2 Versicolor
## 94 5.0 2.3 3.3 1.0 Versicolor
## 95 5.6 2.7 4.2 1.3 Versicolor
## 96 5.7 3.0 4.2 1.2 Versicolor
## 97 5.7 2.9 4.2 1.3 Versicolor
## 98 6.2 2.9 4.3 1.3 Versicolor
## 99 5.1 2.5 3.0 1.1 Versicolor
## 100 5.7 2.8 4.1 1.3 Versicolor
## 101 6.3 3.3 6.0 2.5 Virginica
## 102 5.8 2.7 5.1 1.9 Virginica
## 103 7.1 3.0 5.9 2.1 Virginica
## 104 6.3 2.9 5.6 1.8 Virginica
## 105 6.5 3.0 5.8 2.2 Virginica
## 106 7.6 3.0 6.6 2.1 Virginica
## 107 4.9 2.5 4.5 1.7 Virginica
## 108 7.3 2.9 6.3 1.8 Virginica
## 109 6.7 2.5 5.8 1.8 Virginica
## 110 7.2 3.6 6.1 2.5 Virginica
## 111 6.5 3.2 5.1 2.0 Virginica
## 112 6.4 2.7 5.3 1.9 Virginica
## 113 6.8 3.0 5.5 2.1 Virginica
## 114 5.7 2.5 5.0 2.0 Virginica
## 115 5.8 2.8 5.1 2.4 Virginica
## 116 6.4 3.2 5.3 2.3 Virginica
## 117 6.5 3.0 5.5 1.8 Virginica
## 118 7.7 3.8 6.7 2.2 Virginica
## 119 7.7 2.6 6.9 2.3 Virginica
## 120 6.0 2.2 5.0 1.5 Virginica
## 121 6.9 3.2 5.7 2.3 Virginica
## 122 5.6 2.8 4.9 2.0 Virginica
## 123 7.7 2.8 6.7 2.0 Virginica
## 124 6.3 2.7 4.9 1.8 Virginica
## 125 6.7 3.3 5.7 2.1 Virginica
## 126 7.2 3.2 6.0 1.8 Virginica
## 127 6.2 2.8 4.8 1.8 Virginica
## 128 6.1 3.0 4.9 1.8 Virginica
## 129 6.4 2.8 5.6 2.1 Virginica
## 130 7.2 3.0 5.8 1.6 Virginica
## 131 7.4 2.8 6.1 1.9 Virginica
## 132 7.9 3.8 6.4 2.0 Virginica
## 133 6.4 2.8 5.6 2.2 Virginica
## 134 6.3 2.8 5.1 1.5 Virginica
## 135 6.1 2.6 5.6 1.4 Virginica
## 136 7.7 3.0 6.1 2.3 Virginica
## 137 6.3 3.4 5.6 2.4 Virginica
## 138 6.4 3.1 5.5 1.8 Virginica
## 139 6.0 3.0 4.8 1.8 Virginica
## 140 6.9 3.1 5.4 2.1 Virginica
## 141 6.7 3.1 5.6 2.4 Virginica
## 142 6.9 3.1 5.1 2.3 Virginica
## 143 5.8 2.7 5.1 1.9 Virginica
## 144 6.8 3.2 5.9 2.3 Virginica
## 145 6.7 3.3 5.7 2.5 Virginica
## 146 6.7 3.0 5.2 2.3 Virginica
## 147 6.3 2.5 5.0 1.9 Virginica
## 148 6.5 3.0 5.2 2.0 Virginica
## 149 6.2 3.4 5.4 2.3 Virginica
## 150 5.9 3.0 5.1 1.8 Virginica
ancova <- aov(sepal.length ~ variety + sepal.width, data = iris)
summary(ancova)
## Df Sum Sq Mean Sq F value Pr(>F)
## variety 2 63.21 31.606 164.8 < 2e-16 ***
## sepal.width 1 10.95 10.953 57.1 4.19e-12 ***
## Residuals 146 28.00 0.192
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Download and bring in the “biomassTill.csv” file. This file contains data on how different soil tilling methods impact the growth (biomass) of the plants. But the researchers also manipulated the amount of water received by each plant, which would obviously influence any measurement of growth.
For this dataset, I would like you to do the following:
Code: change the “DVS” variable to a factor
Code: Run a two-way ANOVA that looks at how “Tillage” and “DVS” (watering regime) influence “Biomass” NOTE: don’t worry about testing assumptions for this (you would need to do this normally)
Question: how do you interpret the results of the two way ANOVA
The p-value of Tillage alone is 0.0796, meaning tillage alone significantly impacts Biomass. The p-value of DVS alone is 4.71e-11, meaning on its own, DVS does not have an impact on Biomass. The p-value of Tillage:DVS is 0.7422 telling us that the effect of one of the independent variables on Biomass is dependent on the level of the other independent variable.
soil<-read.table('biomassTill.csv', ',', header=T)
soil
## rownames Tillage DVS Biomass Biom.2
## 1 1 CA- 0.4903887 0.5412327 108.2465444
## 2 2 CA- 0.4903887 1.3090013 261.8002616
## 3 3 CA- 0.4903887 7.0303258 7.0303258
## 4 4 CA- 0.4903887 2.0303258 2.0303258
## 5 5 CA- 0.7488892 5.9105648 5.9105648
## 6 6 CA- 0.7488892 68.2182182 68.2182182
## 7 7 CA- 0.7488892 26.6471017 26.6471017
## 8 8 CA- 1.0000000 50.6356090 50.6356090
## 9 9 CA- 1.0000000 37.7156002 37.7156002
## 10 10 CA- 1.0000000 116.0872520 116.0872520
## 11 11 CA- 1.0000000 52.8712804 52.8712804
## 12 12 CA- 1.3880686 48.2025909 48.2025909
## 13 13 CA- 1.3880686 100.7209132 100.7209132
## 14 14 CA- 1.3880686 158.3894974 158.3894974
## 15 15 CA- 1.3880686 168.0453182 168.0453182
## 16 16 CA- 2.0000000 103.7269983 103.7269983
## 17 17 CA- 2.0000000 284.4575343 284.4575343
## 18 18 CA- 2.0000000 431.9106036 431.9106036
## 19 19 CA- 2.0000000 152.5598327 152.5598327
## 20 20 CA+ 0.4903887 1.0071369 1.0071369
## 21 21 CA+ 0.4903887 0.9085065 0.9085065
## 22 22 CA+ 0.4903887 1.5603494 1.5603494
## 23 23 CA+ 0.4903887 11.2221944 11.2221944
## 24 24 CA+ 0.7488892 12.0856433 12.0856433
## 25 25 CA+ 0.7488892 17.3492120 17.3492120
## 26 26 CA+ 0.7488892 11.2746862 11.2746862
## 27 27 CA+ 0.7488892 63.9362974 63.9362974
## 28 28 CA+ 1.0000000 74.4275526 74.4275526
## 29 29 CA+ 1.0000000 48.9460049 48.9460049
## 30 30 CA+ 1.0000000 101.5051511 101.5051511
## 31 31 CA+ 1.0000000 147.7895951 147.7895951
## 32 32 CA+ 1.3880686 39.6237584 39.6237584
## 33 33 CA+ 1.3880686 85.2690927 85.2690927
## 34 34 CA+ 1.3880686 174.6212360 174.6212360
## 35 35 CA+ 1.3880686 323.9773106 323.9773106
## 36 36 CA+ 2.0000000 231.4165129 231.4165129
## 37 37 CA+ 2.0000000 200.8488960 200.8488960
## 38 38 CA+ 2.0000000 579.6046045 5.7960460
## 39 39 CT 0.4903887 3.6286286 3.6286286
## 40 40 CT 0.4903887 16.1427911 16.1427911
## 41 41 CT 0.4903887 3.4733134 3.4733134
## 42 42 CT 0.4903887 1.9655578 1.9655578
## 43 43 CT 0.7488892 74.8043743 74.8043743
## 44 44 CT 0.7488892 61.8082368 61.8082368
## 45 45 CT 0.7488892 21.8416090 21.8416090
## 46 46 CT 0.7488892 31.4149100 31.4149100
## 47 47 CT 1.0000000 187.8068930 187.8068930
## 48 48 CT 1.0000000 114.2407517 114.2407517
## 49 49 CT 1.0000000 54.8790069 54.8790069
## 50 50 CT 1.0000000 42.7952224 42.7952224
## 51 51 CT 1.3880686 246.4736243 246.4736243
## 52 52 CT 1.3880686 345.5412829 345.5412829
## 53 53 CT 1.3880686 128.1885083 128.1885083
## 54 54 CT 1.3880686 270.4160471 270.4160471
## 55 55 CT 2.0000000 513.6421369 513.6421369
## 56 56 CT 2.0000000 361.9505493 361.9505493
## 57 57 CT 2.0000000 295.0319506 295.0319506
## 58 58 CT 2.0000000 317.4047834 317.4047834
soil$DVS <- as.factor(soil$DVS)
str(soil)
## 'data.frame': 58 obs. of 5 variables:
## $ rownames: int 1 2 3 4 5 6 7 8 9 10 ...
## $ Tillage : chr "CA-" "CA-" "CA-" "CA-" ...
## $ DVS : Factor w/ 5 levels "0.490388663",..: 1 1 1 1 2 2 2 3 3 3 ...
## $ Biomass : num 0.541 1.309 7.03 2.03 5.911 ...
## $ Biom.2 : num 108.25 261.8 7.03 2.03 5.91 ...
TDvB_model <- aov(Biomass ~ DVS * Tillage, data = soil)
summary(TDvB_model)
## Df Sum Sq Mean Sq F value Pr(>F)
## DVS 4 708855 177214 26.123 5.03e-11 ***
## Tillage 2 39556 19778 2.915 0.065 .
## DVS:Tillage 8 34574 4322 0.637 0.742
## Residuals 43 291703 6784
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
library(ggpubr)
## Loading required package: ggplot2
library(ggplot2)
soil_means <- soil %>%
group_by(DVS, Tillage) %>%
summary(Biomass_mean = mean(Biomass))
ggplot(soil, aes(x = DVS, y = Biomass, color = Tillage)) +
geom_line() + geom_point() + facet_wrap(~ Tillage) + theme_minimal() +
labs(x = "DVS (Watering Regime)", y = "Biomass")
Download and bring in the “sleep.csv” file. This file contains data on participants in a sleep study and documenting the occurance of non-restorative sleep. Here, data in the “NRS” column are either a 0 or a 1, 0 meaning NRS not observed, 1 meaning NRS observed.
For this dataset, I would like you to do the following:
Code: run a logistic regression that looks at how “Age_2013” correlates with NRS.
Question: how would you interpret the results of the logistic regression function?
The coefficient estimate for Age_2013 is -0.031965 seeing that this value is negative it tells us that an increase in age is associated with lower odds of the NRS outcome. The p - value being <2e-16 tells us to reject the null hypothesis, meaning there is a significant association between age and NRS.
Code: create a logistic regression curve for the model
Question: using the curve, how would you explain patterns of NRS across age? As predicted by the coefficient estimate, the logistic regression curve confirms that as age increases, the probability of NRS decreases.
sleep<-read.table('sleep.csv', ',', header=T)
sleep_model <- glm(NRS ~ Age_2013, data = sleep, family = binomial)
summary(sleep_model)
##
## Call:
## glm(formula = NRS ~ Age_2013, family = binomial, data = sleep)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.868077 0.068995 12.58 <2e-16 ***
## Age_2013 -0.031965 0.001079 -29.63 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 98557 on 90188 degrees of freedom
## Residual deviance: 97701 on 90187 degrees of freedom
## AIC: 97705
##
## Number of Fisher Scoring iterations: 4
plot(sleep$Age_2013, sleep$NRS, main = "Age vs. NRS", xlab = "Age_2013", ylab = "NRS", pch = 19, col = "blue")
age_seq <- seq(from = min(sleep$Age_2013), to = max(sleep$Age_2013), length.out = 90190)
predictions <- data.frame(Age_2013 = age_seq)
predictions$NRS_prob <- predict(sleep_model, newdata = predictions, type = "response")
plot(sleep$Age_2013, sleep$NRS, main = "Logistic Regression Curve", xlab = "Age_2013", ylab = "Probability of NRS", col = "blue", pch = 20)
lines(predictions$Age_2013, predictions$NRS_prob, col = "red", lwd = 2)