Load all necessary packages for analysis
library(ggplot2)
library(ggthemes)
library(RColorBrewer)
library(extrafont)
library(grid)
library(tidyr)
library(plyr)
library(car)
Load in my dataset. Cleaning the dataset of any toads we don’t have the lifestage for and one outlier that is an obvious measurement error
setwd('/Users/Connor/Dropbox/Old Mac/School_Stuff/Amphibian Seminar/Cane toad project')
toads = read.csv('Cane.Toad.Data.csv')
toads = toads[1:13]
toads = toads %>% drop_na(Year) %>% drop_na(Lifestage)
toads = subset(toads, Catalog != 23988)
Create new columns in my data set for tibia difference correction, from Palmer 1994. This adjustment corrects for size-biased tibia length differences. Eq: 2 * abs(Right - Left) / (Right - Left)
tibia_dif = toads$Tibia_Right - toads$Tibia_Left
tibia_correction = (toads$Tibia_Right + toads$Tibia_Left) / 2
tibia_corrected = tibia_dif/tibia_correction
toads = cbind(toads, tibia_dif)
toads = cbind(toads, tibia_correction)
toads = cbind(toads, tibia_corrected)
toads = droplevels.data.frame(toads)
I’m grouping years for a categorical analysis of FA across years that I’ll perform later. The sample sizes aren’t even across blocks of years (1960s- 5, 1970s- 56 (16 S, 40 C), 1990s- 60 (45 S, 15 C), 2000s- 42 (15 S, 27 C)). This is important for later analyses, where we will have to conduct a type III anova for an unbalanced design. Since our sample size from the 1960s is so small, we need to obtain more measurements for a publication-worthy analysis.
breaks_yrs = c(0, 1970,1980,1999,2015)
labels_yrs = c('sixties', 'seventies', 'nineties', 'oughts')
toads$year_groups = cut(toads$Year,
breaks = breaks_yrs,
labels = labels_yrs)
#year_groups plot
ggplot(toads, aes(x = year_groups, fill = Region)) +
geom_bar()
For statistical analysis, I excluded the 1960s individuals. It doesn’t include both regions and has a tiny sample size (N = 5). If I include the 1960s, I run into the problem of perfect multicollinearity, and the 2000s southern region tibia differences get excluded from the model.
toads_new = droplevels(subset(toads, year_groups != 'sixties'))
Justification for inclusion of juveniles. Juveniles don’t appear to differ from adults. The low number of juveniles doesn’t justify the use of a statistical test.
juveniles = subset(toads_new, toads_new$Lifestage == "J")
#histogram
ggplot(juveniles, aes(x = abs(tibia_corrected), fill = Region)) +
geom_histogram(alpha = 0.75, binwidth = 0.005, position = 'dodge') +
geom_density(alpha = 0, aes(color = Region))
#scatterplot of juveniles
ggplot(juveniles, aes(x = Year, y = abs(tibia_corrected), color = Region)) +
geom_point(alpha = 0.75)
#scatterplot of juveniles vs. adults
ggplot(toads_new, aes(x = Year, y = abs(tibia_corrected), color = Lifestage)) +
geom_point(alpha = 0.75)
The beginning of my tibia FA analysis. My workflow (checking assumptions aside) is as follows: 1) Correct for size. -I did the actual correction earlier to accomodate for adding a year grouping variable. Below is my assessment to make sure it was necessary 2) Check distribution of tibia length differences 3) Assess directional asymmetry 4) Assess fluctuating asymmetry 5) Compare fluctuating asymmetry
Step 1) Correct for size. I am checking to make sure a size correction is necessary.
plot(tibia_dif ~ SVL, data = toads_new)
plot(abs(tibia_dif) ~ tibia_correction, data = toads_new)
It is. SVL definitely varies with differences in tibia length.
Step 2) Check the distribution of tibia length differences. A non-normal distribution may lead to incorrect statistical inference later on.
Total sample:
shapiro.test(toads_new$tibia_corrected)
Shapiro-Wilk normality test
data: toads_new$tibia_corrected
W = 0.91112, p-value = 3.118e-08
ggplot(toads_new, aes(x = tibia_corrected)) +
geom_histogram(binwidth = 0.005) +
geom_density(alpha = 0, color = 'red')
The distribution of tibia differences is leptokurtic. This may lead to an overabundance of false negatives.
Southern Region:
shapiro.test(toads_new$tibia_corrected[toads_new$Region == 'South'])
Shapiro-Wilk normality test
data: toads_new$tibia_corrected[toads_new$Region == "South"]
W = 0.9205, p-value = 0.0001499
ggplot(subset(toads_new, Region == "South"), aes(x = tibia_corrected)) +
geom_histogram(binwidth = 0.005) +
geom_density(alpha = 0, color = 'red') +
labs(x = "Tibia Differences (South)")
Also leptokurtic. An outlier may affect the test as well.
Central Region:
shapiro.test(toads_new$tibia_corrected[toads_new$Region == 'Central'])
Shapiro-Wilk normality test
data: toads_new$tibia_corrected[toads_new$Region == "Central"]
W = 0.85828, p-value = 2.376e-07
ggplot(subset(toads_new, Region == "Central"), aes(x = tibia_corrected)) +
geom_histogram(binwidth = 0.005) +
geom_density(alpha = 0, color = 'red') +
labs(x = "Tibia Differences (Central)")
Leptokurtic and may also be slightly left-skewed.
QQplots of the three distributions.
par(mfrow = c(2,2))
qqnorm(total_dist, main = 'Total')
qqline(total_dist)
qqnorm(south_dist, main = 'South')
qqline(south_dist)
qqnorm(central_dist, main = 'Central')
qqline(central_dist)
This tells us that our analysis may not be ideal for a study of fluctuating asymmetry since our distributions are not normal. Since the distributions are mainly leptokurtic and not extremely right or left skewed, I think interpretations of any significant test statistics will be accurate.
Step 3) Assessment of directional assymetry. Looking at the plots above, there doesn’t appear to be any, but I will use a one-sample t-test to test for any deviations of mean tibia differences from zero to check.
DA Total:
t.test(toads_new$tibia_corrected)
One Sample t-test
data: toads_new$tibia_corrected
t = -0.36686, df = 157, p-value = 0.7142
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-0.004881135 0.003351951
sample estimates:
mean of x
-0.0007645922
There is no significant directional asymmetry.
DA South:
t.test(toads_new$tibia_corrected[toads_new$Region == 'South'])
One Sample t-test
data: toads_new$tibia_corrected[toads_new$Region == "South"]
t = 1.1499, df = 75, p-value = 0.2538
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-0.002477156 0.009242170
sample estimates:
mean of x
0.003382507
There is no significant directional asymmetry.
DA Central:
t.test(toads_new$tibia_corrected[toads_new$Region == 'Central'])
One Sample t-test
data: toads_new$tibia_corrected[toads_new$Region == "Central"]
t = -1.5883, df = 81, p-value = 0.1161
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-0.010381026 0.001164535
sample estimates:
mean of x
-0.004608245
There is no significant directional asymmetry.
Step 4) Assess fluctuating asymmetry.
Total FA:
t.test(abs(toads_new$tibia_corrected)) #p < 0.001 ###Do this per region
One Sample t-test
data: abs(toads_new$tibia_corrected)
t = 10.052, df = 157, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
0.01313521 0.01956003
sample estimates:
mean of x
0.01634762
ggplot(toads_new, aes(x = abs(tibia_corrected))) +
geom_histogram(binwidth = 0.005) +
geom_density(alpha = 0, color = 'red') +
labs(x = "Tibia Difference Magnitudes")
There is significant overall fluctuating asymmetry.
South FA:
t.test(abs(toads_new$tibia_corrected[toads_new$Region == 'South']))
One Sample t-test
data: abs(toads_new$tibia_corrected[toads_new$Region == "South"])
t = 7.9618, df = 75, p-value = 1.411e-11
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
0.01304029 0.02174342
sample estimates:
mean of x
0.01739186
ggplot(subset(toads_new, Region == 'South'), aes(x = abs(tibia_corrected))) +
geom_histogram(binwidth = 0.005) +
geom_density(alpha = 0, color = 'red') +
labs(x = "Tibia Difference Magnitudes (South)")
There is significant fluctuating asymmetry in the southern region.
Central FA:
t.test(abs(toads_new$tibia_corrected[toads_new$Region == 'Central']))
One Sample t-test
data: abs(toads_new$tibia_corrected[toads_new$Region == "Central"])
t = 6.4084, df = 81, p-value = 9.11e-09
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
0.01060463 0.02015495
sample estimates:
mean of x
0.01537979
ggplot(subset(toads_new, Region == 'Central'), aes(x = abs(tibia_corrected))) +
geom_histogram(binwidth = 0.005) +
geom_density(alpha = 0, color = 'red') +
labs(x = "Tibia Difference Magnitudes (Central)")
There is significant fluctuating asymmetry in the central region.
Step 4) Compare fluctuating asymmetry levels. I approached the comparison by both binning years into categorical variables and including years as a continuous variable.
First I need to check assumptions and any potential confounding affects that I can account for.
Assumptions:
#linear model fit
fit_cat = lm(abs(tibia_corrected) ~ Region * year_groups, data = toads_new)
#Check assumptions
par(mfrow = c(2, 2))
plot(fit_cat)
The response deviates quite a bit from normality, and the residuals vs leverage plot extends beyond Cook’s distance. Therefore, I need to transform the response variable.
I’m transforming the corrected tibia lengths with a logit transformation since we have proportional data that is not binomial. I had to set a low number (the lowest non-zero tibia length difference) so there are no infinity values.
Here’s the transformation compared to the non-transformed data.
logit_tibia = logit(abs(toads_new$tibia_corrected), adjust = 0.001177941) #Change
toads_new = cbind.data.frame(toads_new, logit_tibia)
par(mfrow = c(1, 2)) #to compare the distributions
plot(toads_new$logit_tibia)
plot(abs(toads_new$tibia_corrected))
It looks overall more varied with fewer outliers.
Testing the assumptions again with transformed data:
fit_tibia = lm(logit_tibia ~ Region * year_groups, data = toads_new)
plot(fit_tibia)
This still deviates a bit from normality, but is better than the untransformed data. The variance is more even as well.
Now I need to make sure sex doesn’t covary with either year or region.
Year:
ggplot(toads_new, aes(x = year_groups, y = abs(tibia_corrected), fill = Sex)) +
geom_boxplot(alpha = 0.75)
fit_sex_year = lm(logit_tibia ~ Sex * year_groups, data = toads_new)
model_fit_sex_year = Anova(fit_sex_year, type = 3)
model_fit_sex_year
Anova Table (Type III tests)
Response: logit_tibia
Sum Sq Df F value Pr(>F)
(Intercept) 515 1 263.30 <2e-16 ***
Sex 1 1 0.26 0.61
year_groups 3 2 0.65 0.52
Sex:year_groups 2 2 0.52 0.59
Residuals 231 118
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Region:
ggplot(toads_new, aes(x = Region, y = abs(tibia_corrected), fill = Sex)) +
geom_boxplot(alpha = 0.75)
fit_sex_region = lm(logit_tibia ~ Sex * Region, data = toads_new)
model_fit_sex_region = Anova(fit_sex_region, type = 3)
model_fit_sex_region
Anova Table (Type III tests)
Response: logit_tibia
Sum Sq Df F value Pr(>F)
(Intercept) 930 1 486.96 <2e-16 ***
Sex 3 1 1.37 0.245
Region 0 1 0.01 0.909
Sex:Region 5 1 2.85 0.094 .
Residuals 229 120
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Sex doesn’t affect anything, so we’re good to continue.
It’s finally time to make the model!
ggplot(toads_new, aes(x = year_groups, y = logit_tibia, color = Region)) +
geom_boxplot(alpha = 0.75) +
scale_y_continuous(limits = c(-10, 0))
model_cat_trans = Anova(fit_tibia, type = 3)
model_cat_trans
Anova Table (Type III tests)
Response: logit_tibia
Sum Sq Df F value Pr(>F)
(Intercept) 1191 1 646.88 < 2e-16 ***
Region 28 1 15.09 0.00015 ***
year_groups 14 2 3.89 0.02252 *
Region:year_groups 22 2 6.08 0.00288 **
Residuals 280 152
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Since the time frame from the 1990s-2000s doesn’t have a clear-cut separation, I’m also going to analyze the data with a linear model. The linear model assumptions are met, so we can go ahead and look at the least-squares regression.
ggplot(toads_new, aes(x = Year, y = logit_tibia, color = Region)) +
geom_point(alpha = 0.75) +
geom_smooth(method = "lm")
fit_continuous = lm(logit_tibia ~ Region * Year, data = toads_new)
summary(fit_continuous)
Call:
lm(formula = logit_tibia ~ Region * Year, data = toads_new)
Residuals:
Min 1Q Median 3Q Max
-2.1357 -1.3842 0.0511 1.0649 2.7544
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -50.7549 23.8993 -2.12 0.0353 *
RegionSouth 134.4906 44.1253 3.05 0.0027 **
Year 0.0229 0.0120 1.91 0.0578 .
RegionSouth:Year -0.0672 0.0221 -3.04 0.0028 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.37 on 154 degrees of freedom
Multiple R-squared: 0.0924, Adjusted R-squared: 0.0748
F-statistic: 5.23 on 3 and 154 DF, p-value: 0.00183
I reran the model using the southern region as the reference region, so we can see the change across years for the central region.
fit_continuous_C = lm(logit_tibia ~ relevel(Region, ref = 'South') * Year, data = toads_new)
summary(fit_continuous_C)
Call:
lm(formula = logit_tibia ~ relevel(Region, ref = "South") * Year,
data = toads_new)
Residuals:
Min 1Q Median 3Q Max
-2.1357 -1.3842 0.0511 1.0649 2.7544
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 83.7357 37.0926 2.26 0.0254 *
relevel(Region, ref = "South")Central -134.4906 44.1253 -3.05 0.0027 **
Year -0.0443 0.0186 -2.38 0.0186 *
relevel(Region, ref = "South")Central:Year 0.0672 0.0221 3.04 0.0028 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.37 on 154 degrees of freedom
Multiple R-squared: 0.0924, Adjusted R-squared: 0.0748
F-statistic: 5.23 on 3 and 154 DF, p-value: 0.00183
I think the general trends are more clear with the continuous model, but the initial increase in FA with subsequent decrease in FA over time with the Central region gets lost with a single regression line.
Here’s a plot of tibia differences across years when including the sixties specimens. It shows the initial increase and subsequent decrease in FA for the southern region specimens shifted ahead of the central region’s similar pattern.
ggplot(toads, aes(x = year_groups, y = logit_tibia, fill = Region)) +
geom_boxplot(alpha = 0.75) +
scale_y_continuous(limits = c(-10, 0))
To obtain an index of relative tibia length, we conducted a general linear regression of mean tibia length against SVL and extracted the residuals.
length_glm = glm(mean_tibia ~ SVL, data = toads_new)
length_residuals = length_glm$residuals
toads_new = cbind(toads_new, length_residuals)
We then constructed an ANCOVA with relative tibia length as the response and the interaction of region and year as the fixed effect
ggplot(toads_new, aes(x = year_groups, y = length_residuals, color = Region)) +
geom_boxplot() +
scale_y_continuous(limits = c(-7, 7))
#model.
length_yr_fit = lm(length_residuals ~ Region * year_groups,
data = toads_new)
length_yr_model = Anova(length_yr_fit, type = 3)
length_yr_model
Anova Table (Type III tests)
Response: length_residuals
Sum Sq Df F value Pr(>F)
(Intercept) 3.65 1 0.9999 0.31893
Region 12.55 1 3.4337 0.06582 .
year_groups 29.78 2 4.0740 0.01890 *
Region:year_groups 11.39 2 1.5575 0.21400
Residuals 555.62 152
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Plot of leg length across time, including the 1960s individuals
mean_tibia_old = (toads$Tibia_Left + toads$Tibia_Right) / 2
toads = cbind(toads, mean_tibia_old)
#Correct for SVL by using residuals of GLM
length_glm_old = glm(mean_tibia_old ~ SVL, data = toads)
length_residuals_old = length_glm_old$residuals
toads = cbind(toads, length_residuals_old)
##Linear model of length residuals as a function of year
#plot
ggplot(toads, aes(x = year_groups, y = length_residuals_old, color = Region)) +
geom_boxplot() +
scale_y_continuous(limits = c(-7, 7))