Set up summarySE function to summarize data.
Summarize and plot mean length (and error bars) by Sex and Region
LabCultureSummary<-summarySE(LabCultures,measurevar = "Length", groupvars=c("Region","Sex"))
LabCultureSummary
## Region Sex N Length sd se ci
## 1 North F 52 15.63462 1.645300 0.2281621 0.4580546
## 2 North M 51 18.80392 3.612587 0.5058634 1.0160564
## 3 South F 57 14.09649 1.596421 0.2114511 0.4235874
## 4 South M 90 17.01111 2.722129 0.2869376 0.5701389
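The se and ci columns above are consistent with the usual definitions: se = sd / sqrt(N), and ci = se times the 97.5% t quantile on N - 1 degrees of freedom. The sketch below recomputes both from the printed N and sd values. It assumes summarySE uses a 95% confidence level, which the output matches but the text does not state; the vector names are made up for this example.

```r
# N and sd copied from the LabCultureSummary output above
n_obs  <- c(52, 51, 57, 90)
sd_len <- c(1.645300, 3.612587, 1.596421, 2.722129)

se <- sd_len / sqrt(n_obs)             # standard error of the mean
ci <- se * qt(0.975, df = n_obs - 1)   # half-width of a 95% CI (assumed level)

round(data.frame(N = n_obs, sd = sd_len, se = se, ci = ci), 4)
```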
# Plots
# Point Plot of Standard error of the mean
ggplot(LabCultureSummary, aes(x=Region, y=Length, color=Sex)) +
geom_errorbar(aes(ymin=Length-se, ymax=Length+se), width=.1) +
geom_line() +
geom_point()
## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?
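The geom_path message appears because Region is a discrete x variable, so geom_line() treats each Region-by-Sex combination as its own one-point group and has nothing to connect. One possible fix is to set group=Sex so that each sex gets a single line across the two regions. The sketch below uses a data frame rebuilt from the summary values printed above; the name summary_df is made up for this example.

```r
library(ggplot2)

# Means and standard errors copied from the LabCultureSummary output
summary_df <- data.frame(
  Region = c("North", "North", "South", "South"),
  Sex    = c("F", "M", "F", "M"),
  Length = c(15.63462, 18.80392, 14.09649, 17.01111),
  se     = c(0.2281621, 0.5058634, 0.2114511, 0.2869376)
)

# group=Sex tells geom_line() to join the North and South points within each sex
p <- ggplot(summary_df, aes(x=Region, y=Length, color=Sex, group=Sex)) +
  geom_errorbar(aes(ymin=Length-se, ymax=Length+se), width=.1) +
  geom_line() +
  geom_point()
p
```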
Look for a significant difference between Sexes in mean length.
model1=lm(Length~Sex, data=LabCultures)
summary(model1)
##
## Call:
## lm(formula = Length ~ Sex, data = LabCultures)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.6596 -1.8303 -0.1596 1.3404 10.3404
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 14.8303 0.2553 58.092 < 2e-16 ***
## SexM 2.8293 0.3399 8.323 5.79e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.665 on 248 degrees of freedom
## Multiple R-squared: 0.2183, Adjusted R-squared: 0.2152
## F-statistic: 69.27 on 1 and 248 DF, p-value: 5.794e-15
library(car)
Anova(model1)
## Anova Table (Type II tests)
##
## Response: Length
## Sum Sq Df F value Pr(>F)
## Sex 492.11 1 69.273 5.794e-15 ***
## Residuals 1761.77 248
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
F(1, 248) = 69.27, p = 5.79e-15. The very small p-value indicates a significant difference between Sexes in mean length: males are on average 2.83 longer than females (the SexM estimate).
Look for a significant difference between Regions in mean length.
model2=lm(Length~Region, data=LabCultures)
summary(model2)
##
## Call:
## lm(formula = Length ~ Region, data = LabCultures)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.881 -2.204 -0.881 1.796 10.796
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.2039 0.2900 59.329 < 2e-16 ***
## RegionSouth -1.3229 0.3782 -3.498 0.000555 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.943 on 248 degrees of freedom
## Multiple R-squared: 0.04703, Adjusted R-squared: 0.04319
## F-statistic: 12.24 on 1 and 248 DF, p-value: 0.0005547
library(car)
Anova(model2)
## Anova Table (Type II tests)
##
## Response: Length
## Sum Sq Df F value Pr(>F)
## Region 106.0 1 12.239 0.0005547 ***
## Residuals 2147.9 248
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
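F(1, 248) = 12.24, p = 0.00055. The F value is lower than for Sex, but the p-value is still well below 0.05, so there is a significant difference between Regions in mean length: individuals from the South are on average 1.32 shorter than those from the North (the RegionSouth estimate). Region explains only about 5% of the variance in Length (R-squared = 0.047).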
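Significance is read from the p-value, which depends on the F statistic together with its degrees of freedom, not from the size of F alone. As a quick check, the sketch below recomputes the p-values from the F statistics and degrees of freedom printed in the two Anova tables above; the variable names are made up for this example.

```r
# Upper-tail probability of the F distribution = p-value of the F test
p_sex    <- pf(69.273, df1 = 1, df2 = 248, lower.tail = FALSE)
p_region <- pf(12.239, df1 = 1, df2 = 248, lower.tail = FALSE)

c(Sex = p_sex, Region = p_region)

# Both fall below the conventional 0.05 threshold
c(Sex = p_sex < 0.05, Region = p_region < 0.05)
```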
Does the effect of Region differ between sexes?
model3=lm(Length~ Region*Sex, data=LabCultures)
library(car)
Anova(model3)
## Anova Table (Type II tests)
##
## Response: Length
## Sum Sq Df F value Pr(>F)
## Region 168.00 1 25.9472 6.988e-07 ***
## Sex 554.12 1 85.5806 < 2.2e-16 ***
## Region:Sex 0.96 1 0.1484 0.7004
## Residuals 1592.81 246
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
F(1, 246) = 0.148, p = 0.70. The non-significant interaction means there is no evidence that the difference between males and females in the North differs from the difference between males and females in the South.
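What the Region:Sex term tests can be seen directly in the group means. The sketch below takes the four means from the LabCultureSummary table and computes the male-minus-female gap in each region. The interaction test asks whether the difference between those two gaps is distinguishable from zero. The object names are made up for this example.

```r
# Mean Length per Region x Sex cell, copied from LabCultureSummary
means <- matrix(c(15.63462, 18.80392,
                  14.09649, 17.01111),
                nrow = 2, byrow = TRUE,
                dimnames = list(Region = c("North", "South"),
                                Sex = c("F", "M")))

# Male minus female gap within each region
sex_gap <- means[, "M"] - means[, "F"]
sex_gap

# Difference between the two gaps: the quantity the interaction term estimates
gap_difference <- sex_gap[["North"]] - sex_gap[["South"]]
gap_difference
```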
Plot model-fitted group means and standard errors.
#sex
plot(allEffects(model1))
#Region
plot(allEffects(model2))
#Region, sex
plot(allEffects(model3))
Plot Number of Eggs vs. Length and color code points by Region.
EggData<-read_excel("ManyakBellSotka_AmNat_AllData.xlsx", sheet=3)
names(EggData)[4] <- "Length"
names(EggData)[6] <- "NumberOfEggs"
EggData
## # A tibble: 200 x 7
## Region Population Number Length `Width (mm)` NumberOfEggs `Dry Mass (mg)`
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 North Nahant 1 14.6 4.19 172 7.2
## 2 North Nahant 2 12.6 3.42 110 9
## 3 North Nahant 3 11.4 3.65 80 3.8
## 4 North Nahant 4 12.9 3.87 104 4.5
## 5 North Nahant 5 13.9 4.10 148 10.4
## 6 North Nahant 6 12.0 3.85 83 5.3
## 7 North Nahant 7 14.4 4.68 184 10.6
## 8 North Nahant 8 13.7 4.47 79 10.9
## 9 North Nahant 9 13.0 4.2 86 9.9
## 10 North Nahant 10 11.9 3.85 58 8
## # ... with 190 more rows
ggplot(EggData, aes(x=Length, y=NumberOfEggs, color=Region)) +
geom_point() +
ggtitle("Number of Eggs vs Length, by Region") +
xlab("Length (mm)") +
ylab("Number of Eggs")
Test whether the relationship between Number of Eggs and Length differs between Regions.
modelEgg=lm(NumberOfEggs~Length*Region, data=EggData)
summary(modelEgg)
##
## Call:
## lm(formula = NumberOfEggs ~ Length * Region, data = EggData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -102.922 -13.479 0.379 15.391 64.127
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -151.316 21.354 -7.086 2.43e-11 ***
## Length 18.825 1.526 12.333 < 2e-16 ***
## RegionSouth 73.091 29.456 2.481 0.0139 *
## Length:RegionSouth -7.454 2.414 -3.088 0.0023 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 24.44 on 196 degrees of freedom
## Multiple R-squared: 0.7388, Adjusted R-squared: 0.7348
## F-statistic: 184.8 on 3 and 196 DF, p-value: < 2.2e-16
Anova(modelEgg)
## Anova Table (Type II tests)
##
## Response: NumberOfEggs
## Sum Sq Df F value Pr(>F)
## Length 107208 1 179.5393 < 2.2e-16 ***
## Region 6371 1 10.6689 0.001286 **
## Length:Region 5696 1 9.5382 0.002304 **
## Residuals 117037 196
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
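The significant Length:Region interaction (F = 9.54, p = 0.0023) indicates that the relationship between Number of Eggs and Length differs between Regions: the slope is about 18.8 eggs per mm in the North and about 7.5 eggs per mm lower in the South.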
Plot the two linear regression lines, one for each Region, on the previous plot.
##### Add the two linear Regression lines for each region
ggplot(EggData, aes(x=Length, y=NumberOfEggs, color=Region)) +
geom_point() +
geom_abline(intercept=-151.316, slope=18.825, color="red") +
geom_abline(intercept=(-151.316+73.091), slope=(18.825-7.454), color="blue") +
ggtitle("Number of Eggs vs Length, by Region") +
xlab("Length (mm)") +
ylab("Number of Eggs")
#The first geom_abline is for North and the second is for South. The intercepts and slopes are typed in by hand from the summary(modelEgg) coefficients; coef(modelEgg) returns the same values and could be used instead.
North: red line; South: blue line.
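The two lines follow from the interaction model's coefficients: North is the reference level, so its line uses the (Intercept) and Length terms alone, while the South line adds RegionSouth to the intercept and Length:RegionSouth to the slope. The sketch below does that arithmetic on the values printed by summary(modelEgg). The vector b is a stand-in; with the fitted model at hand, coef(modelEgg) should give the same named vector.

```r
# Coefficients copied from the summary(modelEgg) output above
b <- c("(Intercept)"        = -151.316,
       "Length"             =   18.825,
       "RegionSouth"        =   73.091,
       "Length:RegionSouth" =   -7.454)

# North is the reference level: baseline intercept and slope
north <- c(intercept = b[["(Intercept)"]],
           slope     = b[["Length"]])

# South: baseline plus the Region offsets
south <- c(intercept = b[["(Intercept)"]] + b[["RegionSouth"]],
           slope     = b[["Length"]] + b[["Length:RegionSouth"]])

rbind(north, south)
```

Another option, if hand-computed lines are not required, would be to add geom_smooth(method="lm", se=FALSE) to the plot, which fits a separate line for each color group.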
For the Egg Data model, make plots that explore whether the residuals of the model are normally distributed.
EggResid=resid(modelEgg)
qqnorm(EggResid)
qqline(EggResid)
The points on the Q-Q plot are approximately linear, so the residuals look roughly normally distributed. They may be heavy-tailed, as some points near both ends diverge from the line.
Look at whether the variance of the residuals increases as Length increases.
ggplot(EggData,aes(x=Length,y=EggResid))+geom_point()+geom_hline(yintercept=0)
Variance appears to increase as length increases.
Look at whether the variance of the residuals varies between regions
plot(modelEgg)
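plot(modelEgg) draws the four standard lm diagnostic plots (Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage). None of them separates the residuals by Region, so I am unsure how to read the role of Region from these graphs; plotting the residuals grouped by Region would address the question more directly.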
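One way to compare residual variance between regions is to group the residuals by Region and compare their spread, both numerically and with a boxplot. The sketch below shows the idea on simulated data, not the real EggData: the data frame sim, its noise levels, and the object names are all invented for illustration. With the real data, the same steps would use resid(modelEgg) and EggData$Region.

```r
set.seed(1)

# Simulated stand-in for EggData (NOT the real measurements):
# North is given noisier egg counts than South on purpose
sim <- data.frame(
  Region = rep(c("North", "South"), each = 100),
  Length = runif(200, min = 10, max = 16)
)
noise_sd <- ifelse(sim$Region == "North", 30, 15)
sim$NumberOfEggs <- 10 * sim$Length + rnorm(200, mean = 0, sd = noise_sd)

# Same model structure as modelEgg
fit <- lm(NumberOfEggs ~ Length * Region, data = sim)
sim$Resid <- resid(fit)

# Standard deviation of the residuals within each region
sd_by_region <- tapply(sim$Resid, sim$Region, sd)
sd_by_region

# Boxes of clearly different height suggest unequal residual variance
boxplot(Resid ~ Region, data = sim, ylab = "Residual")
```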