Read in the dataset:
wolf <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter16/chap16e2InbreedingWolves.csv"))
Plot with no formatting:
plot(wolf$nPups~wolf$inbreedCoef)
Question 1: Plot with formatting
plot(wolf$nPups~wolf$inbreedCoef, xlab = "Inbreeding Coefficient", ylab = "Number of Pups", col = "blue", pch = 16)
Question 2: Yes, there appears to be a moderate, negative, linear correlation between the inbreeding coefficient and the number of pups surviving the winter. There also appears to be an outlier among the wolves with an inbreeding coefficient of 0.
Question 3: H0: rho = 0; HA: rho != 0 (!= means “does not equal”)
Finding the correlation coefficient and testing the hypotheses:
cor.test(wolf$nPups, wolf$inbreedCoef)
Pearson's product-moment correlation
data: wolf$nPups and wolf$inbreedCoef
t = -3.5893, df = 22, p-value = 0.001633
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.8120418 -0.2706791
sample estimates:
cor
-0.6077184
Question 4: The p-value is 0.001633, so we reject the null hypothesis and conclude that there is a correlation between the inbreeding coefficient and the number of pups surviving the winter. The estimated correlation is r = -0.61. Because correlation does not distinguish between an explanatory and a response variable, changing the order of the variables should not change the coefficient or the result of the hypothesis test:
cor.test(wolf$inbreedCoef, wolf$nPups)
Pearson's product-moment correlation
data: wolf$inbreedCoef and wolf$nPups
t = -3.5893, df = 22, p-value = 0.001633
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.8120418 -0.2706791
sample estimates:
cor
-0.6077184
Linear Regression:
Determine if the response variable is normally distributed:
hist(wolf$nPups)
boxplot(wolf$nPups)
shapiro.test(wolf$nPups)
Shapiro-Wilk normality test
data: wolf$nPups
W = 0.91711, p-value = 0.05043
qqnorm(wolf$nPups)
qqline(wolf$nPups)
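There are no outliers, and although the p-value of the Shapiro-Wilk test is barely above 0.05, the Q-Q plot looks fairly normal, so I think we are OK to proceed with fitting a linear model. (The histogram and boxplot above can also be used to check this.) Strictly speaking, the regression assumption concerns the residuals, which are checked in Question 8.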
Fit a linear model:
wolf.lm <- lm(nPups~inbreedCoef, data = wolf)
summary(wolf.lm)
Call:
lm(formula = nPups ~ inbreedCoef, data = wolf)
Residuals:
Min      1Q       Median   3Q      Max
-2.1332  -0.8200  -0.4345  0.6680  3.6076
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   6.5672     0.7906   8.307 3.14e-08 ***
inbreedCoef -11.4467     3.1891  -3.589  0.00163 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.523 on 22 degrees of freedom
Multiple R-squared: 0.3693, Adjusted R-squared: 0.3407
F-statistic: 12.88 on 1 and 22 DF, p-value: 0.001633
Question 6:
1.) The intercept is 6.5672.
2.) The slope is -11.4467.
3.) Estimated number of pups = 6.5672 - 11.4467 * inbreeding coefficient.
4.) R^2 = 0.3693, which is the square of the r we found earlier: (-0.6077)^2 = 0.3693.
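As a quick check, the fitted equation and the R-squared value can be reproduced by hand from the printed output. The sketch below copies the rounded coefficients and the correlation from the results above; the object names (`intercept`, `slope`, `predict_pups`, `pups_at_quarter`) are made up for illustration and are not part of the lab.

```r
# Values copied (rounded) from the summary(wolf.lm) and cor.test() output
intercept <- 6.5672
slope     <- -11.4467
r         <- -0.6077184

# Hypothetical helper: apply the fitted line to an inbreeding coefficient
predict_pups <- function(inbreed_coef) {
  intercept + slope * inbreed_coef
}

# Estimated number of pups at an inbreeding coefficient of 0.25
pups_at_quarter <- predict_pups(0.25)

# Squaring the correlation should reproduce Multiple R-squared
r_squared <- r^2

print(pups_at_quarter)
print(round(r_squared, 4))
```

Because the coefficients are rounded in the printed summary, a hand calculation like this can differ from `predict()` in the last few decimal places.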
Question 7: The intercept is not equal to zero, as determined by the very small p-value (3.14e-08). Additionally, the slope for the inbreeding coefficient is different from zero (p = 0.00163). This was expected, as a correlation was detected. In fact, this is exactly the same t statistic and p-value reported for the correlation coefficient! With a single explanatory variable, both tests use the same t statistic with 22 degrees of freedom, which is why they produce the same results.
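The agreement between the two tests can be checked directly from the printed numbers. This is a minimal sketch that uses the standard formula for the correlation t statistic, t = r * sqrt(n - 2) / sqrt(1 - r^2), and compares it with the slope divided by its standard error. All values are copied from the output above; the variable names are illustrative.

```r
# Values copied from the cor.test() and summary(wolf.lm) output
r        <- -0.6077184  # sample correlation
df       <- 22          # degrees of freedom, n - 2
slope    <- -11.4467    # slope estimate
slope_se <- 3.1891      # standard error of the slope

# t statistic for H0: rho = 0
t_from_r <- r * sqrt(df) / sqrt(1 - r^2)

# t statistic for H0: slope = 0
t_from_slope <- slope / slope_se

# Two-sided p-value from the t distribution with 22 df
p_value <- 2 * pt(-abs(t_from_r), df)

print(c(t_from_r = t_from_r, t_from_slope = t_from_slope, p_value = p_value))
```

Both t statistics should come out near -3.589, up to rounding in the printed slope and standard error.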
Saving the residuals of the linear model:
wolf.residuals <- wolf.lm$residuals
Question 8:
plot(wolf.residuals~wolf$inbreedCoef)
abline(h=0)
hist(wolf.residuals)
qqnorm(wolf.residuals)
qqline(wolf.residuals)
The residuals plot shows fairly even variance across the dataset. The residuals look fairly normal as visualized by the histogram and the qqplot. It does not look like there are any violations of our assumptions.
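The same residual checks can be sketched on simulated data so that the code runs without downloading anything. Everything here is an assumption for illustration: the data are made up with `set.seed()` and are not the wolf dataset, and the names `x`, `y`, `fit`, `res`, and `sw` are hypothetical. Plotting residuals against the fitted values (or against the explanatory variable) is a common way to look for uneven variance or curvature.

```r
# Made-up data for illustration only (not the wolf dataset)
set.seed(1)
x <- runif(24, min = 0, max = 0.4)
y <- 6.5 - 11 * x + rnorm(24, sd = 1.5)

fit <- lm(y ~ x)
res <- residuals(fit)

# Residuals against fitted values: look for a funnel shape or a curve
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normality of the residuals: Q-Q plot plus a Shapiro-Wilk test
qqnorm(res)
qqline(res)
sw <- shapiro.test(res)
print(sw$p.value)
```

With the real model, `fitted(wolf.lm)` and `residuals(wolf.lm)` would take the place of the simulated objects.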
Add the linear model to the scatterplot of the data:
plot(wolf$nPups~wolf$inbreedCoef, xlab = "Inbreeding Coefficient", ylab = "Number of Pups", col = "blue", pch = 16)
abline(wolf.lm)
Predicting a value from the linear model:
predict(wolf.lm, data.frame(inbreedCoef = 0.250), interval = "confidence")
fit lwr upr
1 3.705553 3.044309 4.366796
For an inbreeding coefficient of 0.25, we are 95% confident that the mean number of pups surviving the winter is between 3.04 and 4.37.
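Note that `interval = "confidence"` describes uncertainty in the mean response at the given value, while `interval = "prediction"` describes where a single new observation is likely to fall. The sketch below contrasts the two on simulated data. The data frame `sim`, the model `sim.lm`, and the other names are hypothetical, and the numbers it prints are not results for the wolves.

```r
# Made-up data for illustration only (not the wolf dataset)
set.seed(2)
sim <- data.frame(x = runif(24, min = 0, max = 0.4))
sim$y <- 6.5 - 11 * sim$x + rnorm(24, sd = 1.5)

sim.lm <- lm(y ~ x, data = sim)
new_x  <- data.frame(x = 0.25)

# Interval for the mean response at x = 0.25
conf_int <- predict(sim.lm, new_x, interval = "confidence")

# Interval for a single new observation at x = 0.25
pred_int <- predict(sim.lm, new_x, interval = "prediction")

print(conf_int)
print(pred_int)
```

The two calls return the same fitted value, but the prediction interval is always wider because it also includes the scatter of individual observations around the line.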
Question 10: This is not advisable because this would be extrapolation. Our linear model is only fit within the bounds (min and max) for our explanatory variable. Thus, we cannot be confident the same rate would persist beyond the bounds of our data.
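One way to guard against accidental extrapolation is to compare a new value with the observed range of the explanatory variable before predicting. This is a minimal sketch: the helper `in_data_range` and the vector `observed_x` are made up for illustration and do not contain the wolf data.

```r
# Hypothetical helper: TRUE when a new value lies inside the observed range
in_data_range <- function(new_value, observed) {
  rng <- range(observed, na.rm = TRUE)
  new_value >= rng[1] & new_value <= rng[2]
}

# Made-up explanatory values for illustration (not the wolf data)
observed_x <- c(0.00, 0.05, 0.12, 0.20, 0.25, 0.36)

print(in_data_range(0.25, observed_x))  # inside the range: interpolation
print(in_data_range(0.80, observed_x))  # outside the range: extrapolation
```

With the real data, passing `wolf$inbreedCoef` as `observed` would apply the same check to the fitted model's explanatory variable.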