Correlation and Regression Exercises

Read in the dataset:

wolf <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter16/chap16e2InbreedingWolves.csv"))

Plot with no formatting:

plot(wolf$nPups~wolf$inbreedCoef)

Question 1: Plot with formatting

plot(wolf$nPups~wolf$inbreedCoef, xlab = "Inbreeding Coefficient", ylab = "Number of Pups", col = "blue", pch = 16)

Question 2: Yes, there looks to be a moderate, linear, negative correlation between the inbreeding coefficient and the number of pups surviving the winter. There also looks to be an outlier among the wolves that had an inbreeding coefficient of 0.

Question 3: H0: rho = 0; HA: rho != 0 (!= means “does not equal”)

Finding the correlation coefficient and testing the hypotheses:

cor.test(wolf$nPups, wolf$inbreedCoef)


    Pearson's product-moment correlation

data:  wolf$nPups and wolf$inbreedCoef
t = -3.5893, df = 22, p-value = 0.001633
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.8120418 -0.2706791
sample estimates:
       cor 
-0.6077184

Question 4: The p-value is 0.001633, so we reject the null hypothesis and conclude that there is a correlation between the inbreeding coefficient and the number of pups surviving the winter. The estimated correlation is r = -0.61. Because correlation does not distinguish between an explanatory and a response variable, swapping the order of the variables should not change the coefficient or the result of the hypothesis test.

cor.test(wolf$inbreedCoef, wolf$nPups)


    Pearson's product-moment correlation

data:  wolf$inbreedCoef and wolf$nPups
t = -3.5893, df = 22, p-value = 0.001633
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.8120418 -0.2706791
sample estimates:
       cor 
-0.6077184
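
The numbers in the cor.test output can also be reproduced by hand, which shows why the order of the variables does not matter: everything depends only on r and the sample size. The following is a minimal sketch that copies r and df from the output above. It assumes the usual t statistic for a Pearson correlation and a Fisher z-transformation for the confidence interval.

```r
# Values copied from the cor.test() output above
r  <- -0.6077184   # sample correlation
df <- 22           # degrees of freedom, n - 2
n  <- df + 2       # number of observations

# t statistic for H0: rho = 0
t_stat <- r * sqrt(df / (1 - r^2))

# Two-sided p-value from a t distribution with df degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval via the Fisher z-transformation
z  <- atanh(r)            # transform r to the z scale
se <- 1 / sqrt(n - 3)     # standard error on the z scale
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)  # back-transform to the r scale

print(c(t = t_stat, p = p_value))
print(ci)
```

Since r is the same whichever variable is listed first, t, the p-value, and the interval are all unchanged by the swap.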

Linear Regression:

Determine if the response variable is normally distributed:

hist(wolf$nPups)

boxplot(wolf$nPups)

shapiro.test(wolf$nPups)


    Shapiro-Wilk normality test

data:  wolf$nPups
W = 0.91711, p-value = 0.05043

qqnorm(wolf$nPups)
qqline(wolf$nPups)

There are no outliers, and although the p-value for the Shapiro-Wilk test is only barely above 0.05, the Q-Q plot looks fairly normal. So I think we are OK to proceed with fitting a linear model. (The boxplot could also be used to check this.)

Fit a linear model:

wolf.lm <- lm(nPups~inbreedCoef, data = wolf)
summary(wolf.lm)


Call:
lm(formula = nPups ~ inbreedCoef, data = wolf)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.1332 -0.8200 -0.4345  0.6680  3.6076 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   6.5672     0.7906   8.307 3.14e-08 ***
inbreedCoef -11.4467     3.1891  -3.589  0.00163 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.523 on 22 degrees of freedom
Multiple R-squared:  0.3693,    Adjusted R-squared:  0.3407 
F-statistic: 12.88 on 1 and 22 DF,  p-value: 0.001633

Question 6:
1.) The intercept is 6.5672.
2.) The slope is -11.4467.
3.) Estimated number of pups = 6.5672 - 11.4467 * inbreeding coefficient.
4.) R2 = 0.3693, which is the correlation coefficient r found earlier, squared.
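
As a quick check on these answers, here is a small sketch that uses only numbers copied from the output above. The function name predict_pups is made up for illustration, and n = 24 is inferred from the 22 residual degrees of freedom.

```r
# Values copied from the cor.test() and summary(wolf.lm) output
r <- -0.6077184   # correlation coefficient
n <- 24           # sample size (22 residual df + 2 estimated coefficients)

# R-squared is the squared correlation in simple linear regression
r_squared <- r^2

# Adjusted R-squared penalizes for the one predictor in the model
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - 2)

# Fitted line as a function (hypothetical helper name)
predict_pups <- function(inbreed_coef) 6.5672 - 11.4467 * inbreed_coef

print(round(c(R2 = r_squared, adjR2 = adj_r_squared), 4))

# Interpreting the slope: change in expected pups per 0.1 increase in inbreeding
print(predict_pups(0.1) - predict_pups(0))
```

The last line shows that an increase of 0.1 in the inbreeding coefficient corresponds to about 1.14 fewer surviving pups on average, according to the fitted line.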

Question 7: The intercept is not equal to zero, as determined by the very small p-value. Additionally, the slope for the inbreeding coefficient is different from zero (p = 0.00163). This was expected because a correlation was detected. In fact, this is exactly the same t statistic and p-value reported for the correlation coefficient! In simple linear regression, the t test of slope = 0 and the t test of rho = 0 are equivalent (both use a t distribution with n - 2 = 22 degrees of freedom), which is why they produce the same results.
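
To see the equivalence numerically, here is a short sketch using values copied from the two outputs. It assumes the standard formulas: the slope t statistic is estimate / standard error, and the correlation t statistic is r * sqrt(df / (1 - r^2)). Small differences are due to rounding in the printed output.

```r
# Values copied from summary(wolf.lm)
slope    <- -11.4467
se_slope <- 3.1891
df       <- 22

# Value copied from cor.test()
r <- -0.6077184

# t statistic for H0: slope = 0
t_slope <- slope / se_slope

# t statistic for H0: rho = 0
t_cor <- r * sqrt(df / (1 - r^2))

# With a single predictor, the overall F statistic is the slope t squared
f_stat <- t_slope^2

# Two-sided p-value for the slope
p_slope <- 2 * pt(-abs(t_slope), df)

print(c(t_slope = t_slope, t_cor = t_cor, F = f_stat, p = p_slope))
```

This also explains why the F-statistic line at the bottom of the summary reports the same p-value as the slope.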

Saving the residuals of the linear model:

wolf.residuals <- wolf.lm$residuals

Question 8:

plot(wolf.residuals~wolf$inbreedCoef)
abline(h=0)

hist(wolf.residuals)

qqnorm(wolf.residuals)
qqline(wolf.residuals)

The residuals plot shows fairly even variance across the dataset. The residuals look fairly normal as visualized by the histogram and the qqplot. It does not look like there are any violations of our assumptions.
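
One thing to keep in mind when reading a residual plot: least-squares residuals are uncorrelated with the explanatory variable and with the fitted values by construction, but they are always positively correlated with the response. For that reason residuals are usually plotted against the explanatory variable or the fitted values. The sketch below illustrates this with simulated data (the numbers are made up and are not the wolf data).

```r
# Simulated data for illustration only (NOT the wolf dataset)
set.seed(1)
x <- runif(30, min = 0, max = 0.4)
y <- 6.5 - 11 * x + rnorm(30, sd = 1.5)

fit <- lm(y ~ x)
res <- resid(fit)
r2  <- summary(fit)$r.squared

# Residuals are uncorrelated with x and with the fitted values
cor_res_x   <- cor(res, x)
cor_res_fit <- cor(res, fitted(fit))

# Residuals ARE correlated with the response: cor = sqrt(1 - R^2)
cor_res_y <- cor(res, y)

print(c(with_x = cor_res_x, with_fitted = cor_res_fit,
        with_y = cor_res_y, sqrt_1_minus_R2 = sqrt(1 - r2)))
```

A trend in a plot of residuals against the response is therefore expected and does not by itself indicate a problem with the model.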

Add the linear model to the scatterplot of the data:

plot(wolf$nPups~wolf$inbreedCoef, xlab = "Inbreeding Coefficient", ylab = "Number of Pups", col = "blue", pch = 16)
abline(wolf.lm)

Predicting a value from the linear model:

predict(wolf.lm, data.frame(inbreedCoef = 0.250), interval = "confidence")

       fit      lwr      upr
1 3.705553 3.044309 4.366796

For an inbreeding coefficient of 0.25, the predicted number of pups is 3.71, and we are 95% confident that the mean number of pups surviving the winter is between 3.04 and 4.37.
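
Note that interval = "confidence" gives an interval for the mean response at that inbreeding coefficient, not for an individual litter. The sketch below reproduces the fitted value by hand and then backs out an approximate prediction interval for a single litter from the printed output. It assumes the standard formula se_pred = sqrt(se_mean^2 + s^2), and the results are only as precise as the rounded numbers copied in.

```r
# Values copied from summary(wolf.lm) and predict() output above
b0    <- 6.5672     # intercept
b1    <- -11.4467   # slope
s     <- 1.523      # residual standard error
df    <- 22         # residual degrees of freedom
lwr   <- 3.044309   # lower 95% confidence limit for the mean
upr   <- 4.366796   # upper 95% confidence limit for the mean
x_new <- 0.25

# Fitted value from the regression equation
fit <- b0 + b1 * x_new

# Standard error of the mean response, recovered from the interval width
t_crit  <- qt(0.975, df)
se_mean <- (upr - lwr) / (2 * t_crit)

# A prediction interval adds the residual variance for a single new litter
se_pred  <- sqrt(se_mean^2 + s^2)
pred_int <- fit + c(-1, 1) * t_crit * se_pred

print(c(fit = fit, se_mean = se_mean, se_pred = se_pred))
print(pred_int)
```

With the data loaded, predict(wolf.lm, data.frame(inbreedCoef = 0.25), interval = "prediction") should give this wider interval directly.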

Question 10: This is not advisable because it would be extrapolation. Our linear model was fit only within the observed range (min to max) of the explanatory variable, so we cannot be confident that the same relationship would persist beyond the bounds of our data.
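
A simple guard against accidental extrapolation is to check whether a new value falls inside the observed range before calling predict. The helper below is hypothetical (it is not part of the original analysis), and the x values are made up for illustration.

```r
# Hypothetical helper: TRUE if x_new lies within the observed range of x
in_observed_range <- function(x_new, x_observed) {
  rng <- range(x_observed, na.rm = TRUE)
  x_new >= rng[1] & x_new <= rng[2]
}

# Made-up explanatory values for illustration only (NOT the wolf data)
x_observed <- c(0.00, 0.10, 0.25, 0.40)

# 0.25 is inside the range; 0.60 would be extrapolation
result <- in_observed_range(c(0.25, 0.60), x_observed)
print(result)
```

With the data loaded, the same check could be run as in_observed_range(x_new, wolf$inbreedCoef) for whatever value is being considered.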
