Matrix Algebra and Linear Models

Week 2 - Testing Linear Models

See edX

library(UsingR)
## Warning: package 'UsingR' was built under R version 3.1.3
## Loading required package: MASS
## Loading required package: HistData
## Warning: package 'HistData' was built under R version 3.1.3
## Loading required package: Hmisc
## Loading required package: grid
## Loading required package: lattice
## Loading required package: survival
## Loading required package: splines
## Loading required package: Formula
## 
## Attaching package: 'Hmisc'
## 
## The following objects are masked from 'package:base':
## 
##     format.pval, round.POSIXt, trunc.POSIXt, units
## 
## 
## Attaching package: 'UsingR'
## 
## The following object is masked from 'package:survival':
## 
##     cancer
set.seed(1)
n <- nrow(father.son)
N <- 50
index <- sample(n,N)
sampledat <- father.son[index,]
x <- sampledat$fheight
y <- sampledat$sheight

Question 2.2.1

Note that the fitted values (Y-hat) from a linear model can be obtained with:

fit = lm(y ~ x)
y_hat <- fit$fitted.values

What is the sum of the squared residuals (where residuals are given by r_i = Y_i - Y-hat_i)?

ssr <- sum((y - y_hat)^2)
ssr
## [1] 256.2152

Our estimate of sigma^2 will be the sum of squared residuals divided by (N - p), the sample size minus the number of terms in the model. Since we have a sample of 50 and 2 terms in the model (an intercept and a slope), our estimate of sigma^2 will be the sum of squared residuals divided by 48. Save this to a variable ‘sigma2’:

sigma2 <- ssr / 48
sigma2
## [1] 5.337816

Question 2.2.2

Form the design matrix X (note: use a capital X!). This can be done by combining a column of 1’s with a column of ‘x’ the father’s heights.

X <- cbind(1, x)

Now calculate (X^T X)^-1, the inverse of X transpose times X. Use the solve() function for the inverse and t() for the transpose. What is the element in the first row, first column?

Xinv <- solve(crossprod(X))
Xinv[1,1]
## [1] 11.30275

Question 2.2.3

Now we are one step away from the standard error of beta-hat. Take the diagonals from the (X^T X)^-1 matrix above, using the diag() function. Now multiply our estimate of sigma^2 and the diagonals of this matrix. This is the estimated variance of beta-hat, so take the square root of this. You should end up with two numbers, the standard error for the intercept and the standard error for the slope.

What is the standard error for the slope?

d <- diag(Xinv)
beta_hat <- sqrt(sigma2 * d)
beta_hat[2]
##         x 
## 0.1141966