Simple linear regression

  1. Go to http://www.stat.ufl.edu/~winner/datasets.html and find the dataset called “Head Size and Brain Weight”. Open the “Description” and “Data” files in separate browser tabs. Read the description, then switch to the data tab, press Ctrl + A to select the entire dataset and copy it with Ctrl + C (on a Mac, use Cmd + A and Cmd + C). Use the following code to import the data into R:
## On PC
## brains <- read.table(file = "clipboard", header = FALSE)

# For some people the following approach also worked last time
# (Windows only; note that it returns a character vector, not a data frame):
# brains <- readClipboard()

## On Mac
brains <- read.table(pipe(description = "pbpaste"), header = FALSE)
## Warning in read.table(pipe(description = "pbpaste"), header = FALSE):
## incomplete final line found by readTableHeader on 'pbpaste'
  2. The data frame comes without column names. Use the names function to assign the following headers: gender, age.range, head.size and brain.weight.

  3. Examine the structure of the data frame and plot the variable brain.weight against head.size.

  4. Fit a simple linear regression model with brain.weight as the response variable and head.size as the predictor variable.
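
The three steps above might look as follows. This is a minimal sketch, not the official solution: so that it runs without the clipboard, it starts from a small simulated data frame (an assumption, not the real dataset), and the model name brain.lm is an arbitrary choice.

```r
# Simulated stand-in for the imported data (NOT the real dataset):
# four unnamed columns, as read.table(..., header = FALSE) would return them.
set.seed(1)
n <- 50
head.sim <- round(runif(n, 2800, 4700))
brains <- data.frame(
  V1 = sample(1:2, n, replace = TRUE),
  V2 = sample(1:2, n, replace = TRUE),
  V3 = head.sim,
  V4 = round(300 + 0.25 * head.sim + rnorm(n, sd = 70))
)

# Assign the column names
names(brains) <- c("gender", "age.range", "head.size", "brain.weight")

# Examine the structure and plot brain weight against head size
str(brains)
plot(brain.weight ~ head.size, data = brains,
     xlab = "Head size", ylab = "Brain weight")

# Fit the simple linear regression (response ~ predictor)
brain.lm <- lm(brain.weight ~ head.size, data = brains)
```

With the real data, only the simulated block at the top is replaced by the clipboard import; the remaining lines stay the same.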

     a. Use par(mfrow = c(2, 2)) to split the plot window into four panels and inspect the model diagnostic plots using plot(yourmodelname) (where ‘yourmodelname’ is the name you gave your model object). Comment on whether the model assumptions of homogeneous variance and normally distributed residuals hold, and on whether there are influential observations.

     b. Inspect and interpret the model summary output.

     c. Add the model fit to the plot of the raw data.

     d. Use the predict function to obtain a brain weight prediction for a head size of 4750 \(\mathrm{cm}^3\).

     e. Repeat step d., adding the argument interval = "confidence" within the predict function. What does it return in addition to the predicted value?

     f. This last task is for the keen ones among you: compute the model predictions along with their 95% confidence intervals across the entire range of head sizes on a very fine grid. The predict function returns a matrix, so assign the predictions to an object and convert it into a data frame. Now figure out how to add lines indicating the upper and lower bounds of the confidence intervals to the plot. Use the lines command with the argument lty = 2 (meaning dashed lines). Ok, here is the procedure:
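
As one possible sketch of the tasks above, using simulated stand-in data rather than the real dataset: the object names brain.lm, new.head, grid and preds are arbitrary choices, and the grid size of 200 points is an assumption.

```r
# Simulated stand-in data (NOT the real dataset), so the sketch runs on its own
set.seed(1)
n <- 50
head.size <- round(runif(n, 2800, 4700))
brains <- data.frame(
  head.size = head.size,
  brain.weight = round(300 + 0.25 * head.size + rnorm(n, sd = 70))
)
brain.lm <- lm(brain.weight ~ head.size, data = brains)

# Diagnostic plots in a 2 x 2 layout, then restore the previous settings
op <- par(mfrow = c(2, 2))
plot(brain.lm)
par(op)

# Model summary: coefficients, standard errors, R-squared
print(summary(brain.lm))

# Raw data with the fitted regression line
plot(brain.weight ~ head.size, data = brains)
abline(brain.lm)

# Point prediction for a head size of 4750, then with a confidence interval
new.head <- data.frame(head.size = 4750)
print(predict(brain.lm, newdata = new.head))
print(predict(brain.lm, newdata = new.head, interval = "confidence"))

# Predictions with 95% confidence limits on a fine grid of head sizes;
# predict() returns a matrix with columns fit, lwr and upr
grid <- data.frame(head.size = seq(min(brains$head.size),
                                   max(brains$head.size),
                                   length.out = 200))
preds <- as.data.frame(predict(brain.lm, newdata = grid,
                               interval = "confidence", level = 0.95))

# Dashed lines for the lower and upper confidence bounds
lines(grid$head.size, preds$lwr, lty = 2)
lines(grid$head.size, preds$upr, lty = 2)
```

The column name in new.head and grid has to match the predictor name used in the model formula; otherwise predict cannot find the variable in newdata.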