library(tidyverse)
library(palmerpenguins)
library(ggfortify)
library(dplyr)
newts<-read.csv("C:/Users/paul/OneDrive/Documents/ES3307/Newts.csv",header=TRUE)
head(newts)
## SVL Crest Pond Date LSVL LCREST
## 1 108.9 10.12 4 27 2.037 1.005
## 2 94.7 5.62 7 37 1.976 0.750
## 3 96.9 2.00 7 28 1.986 0.302
## 4 110.1 15.04 10 24 2.042 1.177
## 5 103.7 3.88 6 9 2.016 0.589
## 6 99.3 4.45 8 12 1.997 0.648
str(newts)
## 'data.frame': 87 obs. of 6 variables:
## $ SVL : num 108.9 94.7 96.9 110.1 103.7 ...
## $ Crest : num 10.12 5.62 2 15.04 3.88 ...
## $ Pond : int 4 7 7 10 6 8 7 6 7 2 ...
## $ Date : int 27 37 28 24 9 12 32 16 15 25 ...
## $ LSVL : num 2.04 1.98 1.99 2.04 2.02 ...
## $ LCREST: num 1.005 0.75 0.302 1.177 0.589 ...
summary(newts)
## SVL Crest Pond Date
## Min. : 87.60 Min. : 1.320 Min. : 1.00 Min. : 1.00
## 1st Qu.: 92.90 1st Qu.: 3.765 1st Qu.: 4.00 1st Qu.:11.50
## Median : 99.10 Median : 5.900 Median : 7.00 Median :22.00
## Mean : 99.64 Mean : 7.439 Mean : 6.08 Mean :21.99
## 3rd Qu.:105.75 3rd Qu.: 9.065 3rd Qu.: 8.00 3rd Qu.:32.00
## Max. :112.40 Max. :25.940 Max. :10.00 Max. :45.00
## LSVL LCREST
## Min. :1.942 Min. :0.1200
## 1st Qu.:1.968 1st Qu.:0.5755
## Median :1.996 Median :0.7710
## Mean :1.997 Mean :0.7792
## 3rd Qu.:2.024 3rd Qu.:0.9575
## Max. :2.051 Max. :1.4140
library(ggplot2)
newts$Pond<-as.factor(newts$Pond)
ggplot(newts, aes(x = LSVL, y = LCREST, shape = Pond, color = Pond)) +
geom_point(alpha = 2, size = 2) +
labs(
x = "Body size (log snout-vent length, mm)",
y = "Dorsal crest size (log crest height, mm)",
title = "Scatter plot showing Dorsal crest size vs body size across ponds",
shape = "Pond",
color = "Pond"
) +
scale_shape_manual(values = 0:9) +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5, face = "bold")
)
m1<-lm(LCREST~LSVL,newts)
par(mfrow=c(2,2))
plot(m1)
The diagnostic plots suggest that the data follows the assumptions of
the linear model relatively well. The residuals vs Fitted plot shows an
even distribution of residuals suggesting homogeneity of variance.The QQ
residuals plot demonstrates most of the residuals in alignment with the
model, apart from two outliers, suggesting normally distributed data.
The scale-location plot shows an even spread of residuals suggesting
homogeneity of residuals.The Residuals vs Leverage plot suggest that the
outliers are not sufficiant enough to significantly influence the
results as they donot exceed the cook’s distance. Therefore, an ANOVA
test can be done.
anova(m1)
## Analysis of Variance Table
##
## Response: LCREST
## Df Sum Sq Mean Sq F value Pr(>F)
## LSVL 1 2.3894 2.38939 45.808 1.579e-09 ***
## Residuals 85 4.4337 0.05216
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The ANOVA showed that there is a statistically significant difference in the size of the dorsal crest (LCREST) based on the body size of the newts (LSVL) (F1,85 = 45.8, p<0.001)
m2<-lm(LCREST~LSVL*Pond,data=newts)
anova(m2)
## Analysis of Variance Table
##
## Response: LCREST
## Df Sum Sq Mean Sq F value Pr(>F)
## LSVL 1 2.3894 2.38939 42.6712 1.031e-08 ***
## Pond 9 0.2406 0.02674 0.4775 0.8848
## LSVL:Pond 9 0.4414 0.04904 0.8759 0.5510
## Residuals 67 3.7517 0.05600
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
There doesn’t seem to be a statistically significant interaction effect of Pond as a blocking factor on LSVL (F9,67 = 0.876, p=0.551), however it still accounts for 17.9% of the variation caused by LSVL, suggesting that interaction between some of the ponds must be present.
ggplot(newts, aes(x = LSVL, y = LCREST, color = Pond)) +
geom_point(alpha = 0.7, size = 2) +
geom_smooth(method = "lm", se = FALSE, aes(linetype = Pond)) + # regression per pond
labs(
x = "Body size (log snout-vent length, mm)",
y = "Dorsal crest size (log height, mm)",
title = "Interaction between body size and pond on crest size",
color = "Pond",
linetype = "Pond"
) +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, face = "bold"))
The plot above shows an interaction of pond as a blocking factor in certain ponds but not enough to be considered a statistically significant interaction, suggesting that pond should not be included in the model.
m3<-lm(LCREST~Date,data=newts)
anova(m3)
## Analysis of Variance Table
##
## Response: LCREST
## Df Sum Sq Mean Sq F value Pr(>F)
## Date 1 0.0310 0.031014 0.3881 0.535
## Residuals 85 6.7921 0.079907
There is no statistically significant interaction between LCREST and the covariate Date (anova: F1,85 = 0.388, p = 0.535) and we have already demonstrated that the covariate Pond has no significant interaction. As a result of there being no significant interactions, we will run a type 2 SS
library(car)
m5<-lm(LCREST~Date+LSVL,data=newts)
Anova(m5, type="2")
## Anova Table (Type II tests)
##
## Response: LCREST
## Sum Sq Df F value Pr(>F)
## Date 0.0674 1 1.2972 0.258
## LSVL 2.4258 1 46.6682 1.242e-09 ***
## Residuals 4.3663 84
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
There is a significant difference between LCREST (Anova type 2: F1,84= 40.1, p<0.001) and LSVL (F1,84 = 46.7, p<0.001) based on the date. As a result, the date covariate has a significant interaction in the model and should be kept.
sheep<-read.csv("C:/Users/paul/OneDrive/Documents/ES3307/sheep.csv",header=TRUE)
head(sheep)
## Duration nlookups sex sheep sheepn obsper luprate Sex2 sheep2 avluprate
## 1 39 3 1 1 1 1 0.076923 1 1 0.13270
## 2 40 7 1 1 1 2 0.175000 1 2 0.23485
## 3 39 3 1 1 1 3 0.076923 1 3 0.16753
## 4 34 7 1 1 1 4 0.205882 2 4 0.19490
## 5 31 6 1 1 1 5 0.193548 2 5 0.21700
## 6 37 5 1 1 1 6 0.135135 2 6 0.32280
str(sheep)
## 'data.frame': 120 obs. of 10 variables:
## $ Duration : int 39 40 39 34 31 37 38 36 36 37 ...
## $ nlookups : int 3 7 3 7 6 5 3 4 2 5 ...
## $ sex : int 1 1 1 1 1 1 1 1 1 1 ...
## $ sheep : int 1 1 1 1 1 1 1 1 1 1 ...
## $ sheepn : int 1 1 1 1 1 1 1 1 1 1 ...
## $ obsper : int 1 2 3 4 5 6 7 8 9 10 ...
## $ luprate : num 0.0769 0.175 0.0769 0.2059 0.1935 ...
## $ Sex2 : int 1 1 1 2 2 2 NA NA NA NA ...
## $ sheep2 : int 1 2 3 4 5 6 NA NA NA NA ...
## $ avluprate: num 0.133 0.235 0.168 0.195 0.217 ...
summary(sheep)
## Duration nlookups sex sheep sheepn
## Min. :27.00 Min. : 2.00 Min. :1.0 Min. :1.0 Min. :1
## 1st Qu.:34.75 1st Qu.: 6.00 1st Qu.:1.0 1st Qu.:2.0 1st Qu.:1
## Median :38.00 Median : 7.00 Median :1.5 Median :3.5 Median :2
## Mean :38.09 Mean : 7.85 Mean :1.5 Mean :3.5 Mean :2
## 3rd Qu.:41.00 3rd Qu.:10.00 3rd Qu.:2.0 3rd Qu.:5.0 3rd Qu.:3
## Max. :53.00 Max. :14.00 Max. :2.0 Max. :6.0 Max. :3
##
## obsper luprate Sex2 sheep2
## Min. : 1.00 Min. :0.05556 Min. :1.0 Min. :1.00
## 1st Qu.: 5.75 1st Qu.:0.15790 1st Qu.:1.0 1st Qu.:2.25
## Median :10.50 Median :0.20521 Median :1.5 Median :3.50
## Mean :10.50 Mean :0.21164 Mean :1.5 Mean :3.50
## 3rd Qu.:15.25 3rd Qu.:0.25659 3rd Qu.:2.0 3rd Qu.:4.75
## Max. :20.00 Max. :0.43333 Max. :2.0 Max. :6.00
## NA's :114 NA's :114
## avluprate
## Min. :0.1327
## 1st Qu.:0.1744
## Median :0.2059
## Mean :0.2116
## 3rd Qu.:0.2304
## Max. :0.3228
## NA's :114
sm1<-lm(luprate~obsper+sex, data=sheep)
summary(sm1)
##
## Call:
## lm(formula = luprate ~ obsper + sex, data = sheep)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.142030 -0.053607 -0.007543 0.045107 0.182505
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.086954 0.023550 3.692 0.000339 ***
## obsper 0.002369 0.001119 2.117 0.036386 *
## sex 0.066537 0.012907 5.155 1.04e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0707 on 117 degrees of freedom
## Multiple R-squared: 0.2098, Adjusted R-squared: 0.1962
## F-statistic: 15.53 on 2 and 117 DF, p-value: 1.045e-06
This anova suggests that there is a significant difference between look up rates in male and female sheep (F2,117 = 15.53, p<0.001). However, there is obvious pseudo replication in this experimental design by including OBSPER in the model, which ultimately has incorrectly inflated the degrees of freedom to 117, due to the increase in sample size. As a result, the P value is most likely much lower than it should be causing a type 1 error, which mistakenly rejects a null hypothesis when no statistical difference actually exists.
sm2<-lm(avluprate~Sex2,data=sheep)
summary(sm2)
##
## Call:
## lm(formula = avluprate ~ Sex2, data = sheep)
##
## Residuals:
## 1 2 3 4 5 6
## -0.04566 0.05649 -0.01083 -0.05000 -0.02790 0.07790
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.11182 0.07837 1.427 0.227
## Sex2 0.06654 0.04956 1.342 0.251
##
## Residual standard error: 0.0607 on 4 degrees of freedom
## (114 observations deleted due to missingness)
## Multiple R-squared: 0.3106, Adjusted R-squared: 0.1383
## F-statistic: 1.802 on 1 and 4 DF, p-value: 0.2506
A more appropriate analysis will look at the interaction between the mean look up rates of each of the three sheep in the two different sexes. This way the degrees of freedom will be correctly based on the sample size of 6, reducing the df by 114). As a result, this results of this anova suggest no significant difference bin the average look up rates of male and female sheep (F1,4= 1.802, P=0.251), proving that the pseudoreplication in the first analysis caused a type 1 error.
weight<-read.csv("C:/Users/paul/OneDrive/Documents/ES3307/weight.csv",header=TRUE)
head(sheep)
## Duration nlookups sex sheep sheepn obsper luprate Sex2 sheep2 avluprate
## 1 39 3 1 1 1 1 0.076923 1 1 0.13270
## 2 40 7 1 1 1 2 0.175000 1 2 0.23485
## 3 39 3 1 1 1 3 0.076923 1 3 0.16753
## 4 34 7 1 1 1 4 0.205882 2 4 0.19490
## 5 31 6 1 1 1 5 0.193548 2 5 0.21700
## 6 37 5 1 1 1 6 0.135135 2 6 0.32280
str(sheep)
## 'data.frame': 120 obs. of 10 variables:
## $ Duration : int 39 40 39 34 31 37 38 36 36 37 ...
## $ nlookups : int 3 7 3 7 6 5 3 4 2 5 ...
## $ sex : int 1 1 1 1 1 1 1 1 1 1 ...
## $ sheep : int 1 1 1 1 1 1 1 1 1 1 ...
## $ sheepn : int 1 1 1 1 1 1 1 1 1 1 ...
## $ obsper : int 1 2 3 4 5 6 7 8 9 10 ...
## $ luprate : num 0.0769 0.175 0.0769 0.2059 0.1935 ...
## $ Sex2 : int 1 1 1 2 2 2 NA NA NA NA ...
## $ sheep2 : int 1 2 3 4 5 6 NA NA NA NA ...
## $ avluprate: num 0.133 0.235 0.168 0.195 0.217 ...
summary(sheep)
## Duration nlookups sex sheep sheepn
## Min. :27.00 Min. : 2.00 Min. :1.0 Min. :1.0 Min. :1
## 1st Qu.:34.75 1st Qu.: 6.00 1st Qu.:1.0 1st Qu.:2.0 1st Qu.:1
## Median :38.00 Median : 7.00 Median :1.5 Median :3.5 Median :2
## Mean :38.09 Mean : 7.85 Mean :1.5 Mean :3.5 Mean :2
## 3rd Qu.:41.00 3rd Qu.:10.00 3rd Qu.:2.0 3rd Qu.:5.0 3rd Qu.:3
## Max. :53.00 Max. :14.00 Max. :2.0 Max. :6.0 Max. :3
##
## obsper luprate Sex2 sheep2
## Min. : 1.00 Min. :0.05556 Min. :1.0 Min. :1.00
## 1st Qu.: 5.75 1st Qu.:0.15790 1st Qu.:1.0 1st Qu.:2.25
## Median :10.50 Median :0.20521 Median :1.5 Median :3.50
## Mean :10.50 Mean :0.21164 Mean :1.5 Mean :3.50
## 3rd Qu.:15.25 3rd Qu.:0.25659 3rd Qu.:2.0 3rd Qu.:4.75
## Max. :20.00 Max. :0.43333 Max. :2.0 Max. :6.00
## NA's :114 NA's :114
## avluprate
## Min. :0.1327
## 1st Qu.:0.1744
## Median :0.2059
## Mean :0.2116
## 3rd Qu.:0.2304
## Max. :0.3228
## NA's :114
par(mfrow=c(1,2))
ggplot(weight, aes(x = HT, y = FOREARM)) +
geom_point(alpha = 0.7, color = "#2A9571", size = 2) +
geom_smooth(method = "lm", se = FALSE, color = "#1F4E79") +
labs(
x = "HEIGHT)",
y = "Forearm length (FOREARM)",
title = "Relationship between Forearm length and height"
) +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, face = "bold"))
ggplot(weight, aes(x = WT, y = FOREARM)) +
geom_point(alpha = 0.7, color = "#2A9571", size = 2) +
geom_smooth(method = "lm", se = FALSE, color = "#1F4E79") +
labs(
x = "weight",
y = "Forearm length (FOREARM)",
title = "Relationship between Forearm length and weight"
) +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, face = "bold"))
wm1<-lm(FOREARM~HT,data=weight)
summary(wm1)
##
## Call:
## lm(formula = FOREARM ~ HT, data = weight)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.5203 -1.5167 -0.3876 1.4310 4.9971
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.95880 9.82623 0.098 0.923
## HT 0.02605 0.06219 0.419 0.678
##
## Residual standard error: 2.32 on 37 degrees of freedom
## Multiple R-squared: 0.004719, Adjusted R-squared: -0.02218
## F-statistic: 0.1754 on 1 and 37 DF, p-value: 0.6778
There is no significant interaction between Forearm and height measurements, suggesting that height is not a good predictor of obesity (F1,37 = 0.175, P=0.678)
wm2<-lm(FOREARM~WT,data=weight)
summary(wm2)
##
## Call:
## lm(formula = FOREARM ~ WT, data = weight)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.4386 -1.3166 -0.5204 0.6831 5.1331
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.59476 2.47273 -1.858 0.071115 .
## WT 0.15298 0.03882 3.941 0.000347 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.951 on 37 degrees of freedom
## Multiple R-squared: 0.2956, Adjusted R-squared: 0.2766
## F-statistic: 15.53 on 1 and 37 DF, p-value: 0.000347
There is a significant interaction between Forearm and weight measurements, suggesting that weight is a much better predictor of obesity than height (F1,37 = 15.53, P<0.001)
wm3<-lm(FOREARM~WT+HT,data=weight)
anova(wm3)
## Analysis of Variance Table
##
## Response: FOREARM
## Df Sum Sq Mean Sq F value Pr(>F)
## WT 1 59.137 59.137 18.3334 0.0001314 ***
## HT 1 24.777 24.777 7.6813 0.0087755 **
## Residuals 36 116.124 3.226
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Anova(wm3,type="2")
## Anova Table (Type II tests)
##
## Response: FOREARM
## Sum Sq Df F value Pr(>F)
## WT 82.970 1 25.7220 1.207e-05 ***
## HT 24.777 1 7.6813 0.008775 **
## Residuals 116.124 36
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Including HT and WT as main effects in the model using type 1 SS, decreases the informativeness of the model as HT is seen to have a significant effect on Forearm (F1,36= 7.68, P=0.009).
Including HT and WT as main effects in the model but using type 11 SS doesn’t change the decreased informativess as the p-value of HT is the same, however, the statistical significance of WT in the model increases as the p value is smaller (F1,36= 25.72, P<0.001).
wm4<-lm(FOREARM~HT+WT,data=weight)
anova(wm4)
## Analysis of Variance Table
##
## Response: FOREARM
## Df Sum Sq Mean Sq F value Pr(>F)
## HT 1 0.944 0.944 0.2926 0.5919
## WT 1 82.970 82.970 25.7220 1.207e-05 ***
## Residuals 36 116.124 3.226
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Putting HT first increases the informativeness of the model as HT is shown to have no statistical significance on forearm (F1,36= 0.293, p=0.592), suggesting that the order of the two explantory variables in the model matters.
blooms<-read.csv("C:/Users/paul/OneDrive/Documents/ES3307/blooms.csv",header=TRUE)
head(blooms)
## SQBLOOMS BED WATER SHADE X SQ2 B2 W2 S2
## 1 4.359 1 1 1 NA 4.359 1 1 1
## 2 3.317 1 1 2 NA 3.317 1 1 2
## 3 3.606 1 1 3 NA 3.606 1 1 3
## 4 4.123 1 1 4 NA 4.123 1 1 4
## 5 4.472 1 2 1 NA 4.472 1 2 1
## 6 4.583 1 2 2 NA 4.583 1 2 2
str(blooms)
## 'data.frame': 36 obs. of 9 variables:
## $ SQBLOOMS: num 4.36 3.32 3.61 4.12 4.47 ...
## $ BED : int 1 1 1 1 1 1 1 1 1 1 ...
## $ WATER : int 1 1 1 1 2 2 2 2 3 3 ...
## $ SHADE : int 1 2 3 4 1 2 3 4 1 2 ...
## $ X : logi NA NA NA NA NA NA ...
## $ SQ2 : num 4.36 3.32 3.61 4.12 4.47 ...
## $ B2 : int 1 1 1 1 1 1 1 1 1 1 ...
## $ W2 : int 1 1 1 1 2 2 2 2 3 3 ...
## $ S2 : int 1 2 3 4 1 2 3 4 1 2 ...
summary(blooms)
## SQBLOOMS BED WATER SHADE X
## Min. :2.449 Min. :1 Min. :1 Min. :1.00 Mode:logical
## 1st Qu.:3.571 1st Qu.:1 1st Qu.:1 1st Qu.:1.75 NA's:36
## Median :4.061 Median :2 Median :2 Median :2.50
## Mean :4.029 Mean :2 Mean :2 Mean :2.50
## 3rd Qu.:4.583 3rd Qu.:3 3rd Qu.:3 3rd Qu.:3.25
## Max. :5.385 Max. :3 Max. :3 Max. :4.00
##
## SQ2 B2 W2 S2
## Min. :2.828 Min. :1.00 Min. :1 Min. :1.000
## 1st Qu.:3.606 1st Qu.:1.00 1st Qu.:1 1st Qu.:1.000
## Median :4.123 Median :2.00 Median :2 Median :2.000
## Mean :4.094 Mean :1.97 Mean :2 Mean :2.424
## 3rd Qu.:4.583 3rd Qu.:3.00 3rd Qu.:3 3rd Qu.:3.000
## Max. :5.385 Max. :3.00 Max. :3 Max. :4.000
## NA's :3 NA's :3 NA's :3 NA's :3
library(ggplot2)
library(gghalves)
library(dplyr)
blooms <- blooms %>%
mutate(
Water_bin = cut(WATER, breaks = 3, labels = c("1", "2", "3")),
Shade_bin = cut(SHADE, breaks = 4, labels = c("none", "1/4", "1/2","fully"))
)
ggplot(blooms) +
geom_boxplot(aes(x = Water_bin, y = SQBLOOMS, fill = Shade_bin),
width = 0.3,
outlier.shape = NA,
alpha = 0.7,
position = position_dodge(0.8)) +
geom_half_point(aes(x = Water_bin, y = SQBLOOMS, colour = Shade_bin),
side = "l",
range_scale = 0.1,
position = position_dodge(0.8),
alpha = 0.7,
size = 2) +
labs(
x = "Number of times watered per week",
y = "Number of Blooms",
fill = "Shade Level",
colour = "Shade Level",
title = "Number of Blooms across Water and Shade Levels"
) +
scale_fill_manual(values = c("#2A9571", "#DFB468", "#4B8CD8", "#E27D60")) +
scale_color_manual(values = c("#2A9571", "#DFB468", "#4B8CD8", "#E27D60")) +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
text = element_text(family = "Roboto", size = 14)
)
bm1<-lm(SQBLOOMS~BED+WATER+SHADE,data=blooms)
Anova(bm1,type="2")
## Anova Table (Type II tests)
##
## Response: SQBLOOMS
## Sum Sq Df F value Pr(>F)
## BED 1.5266 1 4.1248 0.05064 .
## WATER 1.2092 1 3.2671 0.08009 .
## SHADE 1.0322 1 2.7890 0.10467
## Residuals 11.8434 32
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(bm1)
## Analysis of Variance Table
##
## Response: SQBLOOMS
## Df Sum Sq Mean Sq F value Pr(>F)
## BED 1 1.5266 1.52662 4.1248 0.05064 .
## WATER 1 1.2092 1.20916 3.2671 0.08009 .
## SHADE 1 1.0322 1.03225 2.7890 0.10467
## Residuals 32 11.8434 0.37011
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The analysis should in theory be orthogonal because the experimental design in balanced with the same number of replicates across the different treatments and the variables are categorical rather than continuous. This is shown by the sequential and adjusted sums of squares being the same and the p values for all treatments remaining the same after using both type 1 and type 11 SS.(Bed: F1,32=4.12, p=0.051)(Water: F 1,32= 3.27,p=0.0801)(Shade: F1,32= 2.79, P=0.105)
bm2<-lm(SQBLOOMS~WATER+SHADE,data=blooms)
anova(bm2)
## Analysis of Variance Table
##
## Response: SQBLOOMS
## Df Sum Sq Mean Sq F value Pr(>F)
## WATER 1 1.2092 1.20916 2.9845 0.09342 .
## SHADE 1 1.0322 1.03225 2.5478 0.11998
## Residuals 33 13.3700 0.40515
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
By getting rid of bed from the model, the p values have remained very similar and have not changed enough to affect the results (Water: F1,33= 2.98, P=0.093) (Shade: F1,33= 2.55, p=0.12), suggesting that it wasn’t worthwhile blocking for bed. However, this increases the residual error as less of the error can be explained by blocking into bed, reducing the models power. As a result, it may be more beneficial to keep blocking for bed in the experiment as it doesn’t affect the outcome of the results (p-value) but does increase the power of the model by accounting for some of the residual error.
bm3<-lm(SQ2~B2+W2+S2,data=blooms)
anova(bm3)
## Analysis of Variance Table
##
## Response: SQ2
## Df Sum Sq Mean Sq F value Pr(>F)
## B2 1 0.8176 0.81757 2.3194 0.1386
## W2 1 1.3946 1.39456 3.9563 0.0562 .
## S2 1 0.3360 0.33599 0.9532 0.3370
## Residuals 29 10.2223 0.35249
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The model above (model 3) has a smaller sample size due to shorter data set which has reduced the residual df from both models 1 and 2, causing a decrease f-value, suggesting that less of the variability in the model is explained by the three factors. All three models have similar p values providing the same results of no statistical significance between the three factors on bloom numbers.