## On PC
## brains <- read.table(file = "clipboard", header = F)
# For some people the following approach also worked last time:
# brains <- readClipboard()
## On Mac
brains <- read.table(pipe(description = "pbpaste"), header = F)
names
command to assign the following headers: gender, age.range, head.size and brain.weighthead(brains, n = 3)
names(brains) <- c("gender", "age.range", "head.size", "brain.weight")
head(brains, n = 3) # check whether name change has happened
brain.weight
against head.size
.## Check the structure
str(brains)
## 'data.frame': 237 obs. of 4 variables:
## $ gender : int 1 1 1 1 1 1 1 1 1 1 ...
## $ age.range : int 1 1 1 1 1 1 1 1 1 1 ...
## $ head.size : int 4512 3738 4261 3777 4177 3585 3785 3559 3613 3982 ...
## $ brain.weight: int 1530 1297 1335 1282 1590 1300 1400 1255 1355 1375 ...
plot(brain.weight ~ head.size, data = brains)
brain.weight
as response variable and head.size
as predictor variable.m1 <- lm(brain.weight ~ head.size, data = brains)
m0 <- lm(brain.weight ~ 1, data = brains)
mean(brains$brain.weight)
## [1] 1282.873
summary(m0)
##
## Call:
## lm(formula = brain.weight ~ 1, data = brains)
##
## Residuals:
## Min 1Q Median 3Q Max
## -327.87 -75.87 -2.87 67.13 352.13
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1282.873 7.817 164.1 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 120.3 on 236 degrees of freedom
anova(m0, m1)
par(mfrow = c(2, 2))
to split the plot window into four panels and inspect the model diagnostic plots using plot(yourmodelname)
(where ‘yourmodelname’ is obviously the object name that you gave your model). Comment on the underlying model assumptions of variance homogeneity and normality of the residuals and whether there are influential observations.par(mfrow = c(2, 2), mar = c(3.5, 3, 1.5, 1), cex.lab = 0.9, cex.axis = 0.8, mgp = c(1.9, 0.7, 0)) # see ?par if you are interested in what all those arguments are specifying...
plot(m1)
The residual vs. fitted values plot indicates no violation of the variance homogeneity criterion and the quantile-quantile plot suggests normality of the residuals. No influential observations were flagged.
summary(m1) # An intercept of 325 doesn't make any sense: no head capsule - no brain!
##
## Call:
## lm(formula = brain.weight ~ head.size, data = brains)
##
## Residuals:
## Min 1Q Median 3Q Max
## -175.98 -49.76 -1.76 46.60 242.34
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 325.57342 47.14085 6.906 4.61e-11 ***
## head.size 0.26343 0.01291 20.409 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 72.43 on 235 degrees of freedom
## Multiple R-squared: 0.6393, Adjusted R-squared: 0.6378
## F-statistic: 416.5 on 1 and 235 DF, p-value: < 2.2e-16
Both, the intercept and slope paramters are statistically significant but does a significant intercept make sense biologically? The slope suggests that with an increase in 1 \(cm^3\) head volume, the brain weight increases by 0.26 g. The adjusted \(R^2\) values implies that model explains about 64% of the variations seen in the raw data.\ We should rerun the model without intercept term to force the model fit through the origin, i.e. there is no brain when head size is zero.
m2 <- lm(brain.weight ~ head.size - 1, data = brains)
summary(m2)
##
## Call:
## lm(formula = brain.weight ~ head.size - 1, data = brains)
##
## Residuals:
## Min 1Q Median 3Q Max
## -234.153 -47.988 6.495 56.307 250.163
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## head.size 0.35213 0.00141 249.8 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 79.27 on 236 degrees of freedom
## Multiple R-squared: 0.9962, Adjusted R-squared: 0.9962
## F-statistic: 6.238e+04 on 1 and 236 DF, p-value: < 2.2e-16
plot(brain.weight ~ head.size, data = brains, ylim = c(950, 1750))
abline(m1)
abline(m2, col = "red")
predict
function to obtain a brain weight prediction for a head size of 4750 \(cm^3\).point.pred <- predict(m2, newdata = data.frame(head.size = 4750))
interval = "confidence"
within the predict
function. What does it return in addition to the predicted value?predict(m2, newdata = data.frame(head.size = 4750), interval = "confidence")
## fit lwr upr
## 1 1672.609 1659.416 1685.802
By specifying interval = "confidence"
, the predict
function returns the fitted value including the upper and lower 95% confidence intervals around it.
predict
function returns a matrix, so create an assignment for the predictions and convert the object into a data frame). Now figure out how to add lines indicating the upper and lower bounds of the confidence intervals to the plot. Use the lines
command with the argument lty = 2
(meaning dashed lines). Ok, here is the procedure:## Create a fine grid of new predictor values
xv <- with(brains, seq(min(head.size), max(head.size), length.out = 100)) # xv is short for x-values (new set of predictor values)
## Get model predictions based on the new x-values
preds <- as.data.frame(predict(m1, newdata = data.frame(head.size = xv), interval = "confidence"))
plot(brain.weight ~ head.size, data = brains, ylab = "Brain weight (g)", xlab = expression(paste("Brain size (cm"^3, ")")))
text(x = 2800, y = 1600, labels = expression(paste(italic("r")^2, " = 0.99 ***")), adj = 0)
lines(xv, preds$fit, col = "white", lwd = 2.5) # Graphical trick: add a thick white regression line first to create a white edge, which makes the following black line more conspicuous!
lines(xv, preds$fit) # the regression line
lines(xv, preds$upr, lty = 2) # upper 95% CI
lines(xv, preds$lwr, lty = 2) # lower 95% CI
For graph enthusiasts, we can fancy things up by using the polygon
function to plot the 95% confidence interval as a grey area underlying the points and the regression line.
plot(brain.weight ~ head.size, data = brains, ylab = "Brain weight (g)", xlab = expression(paste("Brain size (cm"^3, ")")))
text(x = 2800, y = 1600, labels = expression(paste(italic("r")^2, " = 0.99 ***")), adj = 0)
polygon(x = c(xv, rev(xv)), y = c(preds$upr, rev(preds$lwr)), col = "grey", border = NA) # this plot over the points, so we'll have to replot those again...
points(brain.weight ~ head.size, data = brains, pch = 21, bg = "black", col = "white", lwd = 0.7, cex = 1.1) # makes black points with tiny white margin to enhance contrast in overlapping points
lines(xv, preds$fit, col = "white", lwd = 2.5) # Graphical trick: add a thick white regression line first to create a white edge, which makes the following black line more conspicuous!
lines(xv, preds$fit) # the regression line