---
title: "INFERENCE"
output: html_notebook
---

In this note we demonstrate how to use linear algebra to compute the standard errors of regression coefficients. We start by defining the variance-covariance matrix:
We denote the variance-covariance matrix of a random vector $Y$ by $\mathbb{V}[Y]=\Sigma$, with entries
$$\Sigma_{ij}=\mathbb{C}[y_{i},y_{j}]$$
In the cases we have been studying,
$$\mathbb{C}[y_{i},y_{i}]=\mathbb{V}[y_{i}]=\sigma^{2}$$
$$\mathbb{C}[y_{i},y_{j}]=0 \quad \forall\, i\neq j$$
Thus, $\mathbb{V}[Y]=\sigma^{2}I$.
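As a quick sanity check of $\mathbb{V}[Y]=\sigma^{2}I$, here is a minimal simulation sketch (the values of sigma and B below are arbitrary illustration choices, not from the lesson): we generate many independent copies of a length-3 vector with iid entries and look at their empirical variance-covariance matrix, which should be close to $\sigma^{2}I$.
```{r}
# Minimal simulation check that V[Y] = sigma^2 * I when the entries of Y are iid.
set.seed(1)
sigma <- 2
B <- 10000                                        # number of simulated copies of Y
Ys <- matrix(rnorm(3 * B, sd = sigma), ncol = 3)  # each row is one copy of Y
cov(Ys)                                           # close to sigma^2 * diag(3) = 4 * I
```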

Variance of a linear combination.
A useful matrix algebra result is that
$$\mathbb{V}[AY]=A\mathbb{V}[Y]A^{T}$$
For example, if $y_{1}$ and $y_{2}$ are independent, both with variance $\sigma^{2}$:
$$\mathbb{V}[y_{1}+y_{2}]=\mathbb{V}\left\{\begin{pmatrix}1&1\end{pmatrix}\underbrace{\begin{pmatrix}y_{1}\\y_{2}\end{pmatrix}}_{Y}\right\}=\begin{pmatrix}1&1\end{pmatrix}\sigma^{2}I\begin{pmatrix}1\\1\end{pmatrix}=2\sigma^{2}$$
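A quick Monte Carlo check of this result (a sketch; sigma and B are arbitrary illustration values):
```{r}
# Check that V[y1 + y2] = 2 * sigma^2 when y1 and y2 are independent, each with variance sigma^2.
set.seed(1)
sigma <- 3
B <- 10000
y1 <- rnorm(B, sd = sigma)
y2 <- rnorm(B, sd = sigma)
var(y1 + y2)        # close to 2 * sigma^2 = 18
```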
LSE Standard Errors:

Note that $\hat{\beta}$ is a linear combination of $Y$:
$$\begin{align}\hat{\beta}&=(X^{T}X)^{-1}X^{T}Y\\ \mathbb{V}[\hat{\beta}]&=\mathbb{V}[(X^{T}X)^{-1}X^{T}Y]\\ &=(X^{T}X)^{-1}X^{T}\,\sigma^{2}I\,\{(X^{T}X)^{-1}X^{T}\}^{T}\\ &=\sigma^{2}(X^{T}X)^{-1}X^{T}X(X^{T}X)^{-1}\\ &=\sigma^{2}(X^{T}X)^{-1}\end{align}$$
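Here is a hedged simulation sketch of this result (the design, beta, sigma, and B below are made up for illustration): keeping $X$ fixed and regenerating $Y=X\beta+\varepsilon$ many times, the empirical variance-covariance matrix of the least squares estimates should be close to $\sigma^{2}(X^{T}X)^{-1}$.
```{r}
# Monte Carlo check: with X fixed, the covariance of beta-hat over repeated Y's
# should approach sigma^2 * (X^T X)^{-1}.
set.seed(1)
n <- 50
x <- rnorm(n, mean = 68, sd = 3)     # made-up covariate, held fixed across simulations
X <- cbind(1, x)                     # fixed design matrix
beta <- c(34, 0.5)
sigma <- 2
B <- 10000
betahats <- replicate(B, {
  y <- X %*% beta + rnorm(n, sd = sigma)
  as.vector(solve(t(X) %*% X) %*% t(X) %*% y)   # least squares estimate for this Y
})
cov(t(betahats))                     # empirical variance-covariance of beta-hat
sigma^2 * solve(t(X) %*% X)          # theoretical value
```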
Linear Combination of Estimates

Commonly we want to compute the standard deviation of a linear combination of estimates, such as

$$\hat{\beta}_{1}-\hat{\beta}_{2}=\begin{pmatrix}0&1&-1&0&\cdots&0\end{pmatrix}\begin{pmatrix}\hat{\beta}_{0}\\ \hat{\beta}_{1}\\ \hat{\beta}_{2}\\ \vdots \\ \hat{\beta}_{p}\end{pmatrix}$$
We know how to compute the variance-covariance matrix of $\hat{\beta}$.
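By the identity $\mathbb{V}[AY]=A\mathbb{V}[Y]A^{T}$, the variance of this contrast is $a^{T}\,\sigma^{2}(X^{T}X)^{-1}\,a$ with $a^{T}=\begin{pmatrix}0&1&-1&0&\cdots&0\end{pmatrix}$. Here is a minimal sketch in R; the design matrix and $\sigma^{2}$ below are made up purely for illustration.
```{r}
# Sketch: standard error of the contrast beta1 - beta2 via a^T V[beta-hat] a.
set.seed(1)
n <- 50
X <- cbind(1, rnorm(n), rnorm(n))   # made-up design: intercept plus two covariates
sigma2 <- 4                         # pretend sigma^2 were known (or already estimated)
a <- c(0, 1, -1)                    # contrast picking out beta1 - beta2
V <- sigma2 * solve(t(X) %*% X)     # variance-covariance matrix of beta-hat
sqrt(t(a) %*% V %*% a)              # standard error of beta1-hat minus beta2-hat
```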

The standard approach to writing linear models either assumes the $X$ are fixed or conditions on them. Thus $X\beta$ has no variance, as $X$ is considered fixed. This is why we write $\mathbb{V}[y_{i}]=\mathbb{V}[\varepsilon_{i}]=\sigma^{2}$. This can cause confusion in practice: if you, for example, compute the following,

```{r}
# install.packages("UsingR")  # run once if the package is not installed
library(UsingR)               # provides the father.son data set
x <- father.son$fheight       # fathers' heights
beta <- c(34, 0.5)            # fixed coefficients
var(beta[1] + beta[2] * x)    # sample variance of beta0 + beta1 * x over the observed x
```
it is nowhere near 0. This is an example in which we have to be careful to distinguish code from math. The function var simply computes the sample variance of the vector we feed it, while the mathematical use of $\mathbb{V}$ applies only to quantities that are random variables. In the R code above, $x$ is not fixed at all: we are letting it vary. But when we write $\mathbb{V}[y_{i}]=\sigma^{2}$ we are treating $x$ as mathematically fixed. Similarly, if we use R to compute the variance of $Y$ in our object-dropping example, we obtain something very different from $\sigma^{2}=1$ (the known variance):
```{r}
g <- 9.8                                   # gravitational acceleration (m/s^2)
h.0 <- 56.67                               # initial height (m)
v.0 <- 0                                   # initial velocity (m/s)
N <- 25
tt <- seq(0, 3.4, len = N)                 # observation times
y <- h.0 + v.0 * tt - 0.5 * g * tt^2 + rnorm(N, sd = 1)
var(y)                                     # sample variance of y, not sigma^2 = 1
```
Again, this is because we are not fixing tt.


Exercises:

In the previous assessment, we used a Monte Carlo technique to see that the linear model coefficients are random variables when the data is a random sample. Now we will use the matrix algebra from the previous lesson to try to estimate the standard error of the linear model coefficients. Again, take a random sample of the father.son heights data:

```{r}
library(UsingR)                  # provides the father.son data set
x <- father.son$fheight
y <- father.son$sheight
n <- length(y)
N <- 50
set.seed(1)
index <- sample(n, N)            # sample() draws N indices at random from 1:n
sampledat <- father.son[index, ]
x.s <- sampledat$fheight
y.s <- sampledat$sheight
beta.hat <- lm(y.s ~ x.s)$coef
```

The formula for the standard error in the previous lesson was

$$\begin{align}  SE(\hat{\beta})&=\sqrt{\mathbb{V}(\hat{\beta})}\\  \mathbb{V}(\hat{\beta})&=\sigma^{2}(X^{T}X)^{-1} \end{align}$$
We will estimate or calculate each part of this equation and then combine them.

First, we want to estimate $\sigma^{2}$, the variance of $Y$. As we have seen in the previous unit, the random part of $Y$ comes only from $\varepsilon$, because we assume $X\beta$ is fixed. So we can estimate the variance of the epsilons from the residuals, $Y$ minus the fitted values from the linear model.

Standard Errors Exercises #1

Note that the fitted values $\hat{Y}$ from a linear model can be obtained with:

```{r}
fit <- lm(y.s ~ x.s) # Estimates parameters for a given model from a set of data.
fit$fitted.values
```
What is the sum of the squared residuals (where the residuals are given by $r_{i}=y_{i}-\hat{y}_{i}$)?

```{r}
y.hat <- fit$fitted.values
r <- y.s - y.hat              # residuals; same as fit$residuals
SSR <- t(r) %*% r             # sum of squared residuals; equivalently sum(r^2)
SSR
```
Our estimate of $\sigma^{2}$ will be the sum of squared residuals divided by $N-p$, the sample size minus the number of terms in the model. Since we have a sample of 50 and 2 terms in the model (an intercept and a slope), our estimate of $\sigma^{2}$ will be the sum of squared residuals divided by 48. Save this to a variable sigma2:
```{r}
sigma2 <- SSR / (N - 2)   # N - p, with p = 2 terms (intercept and slope), so 50 - 2 = 48
sigma2
```
where SSR is the answer to the previous question.
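As a cross-check, the residual standard error reported by summary() is computed the same way, so its square should match sigma2:
```{r}
summary(fit)$sigma^2   # residual standard error squared; should agree with sigma2
```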

Standard Errors Exercises #2

Form the design matrix $X$ (note: use a capital $X$!). This can be done by combining a column of 1's with the column of sampled fathers' heights x.s.

```{r}
X <- cbind(rep(1, N), x.s)   # column of 1's for the intercept, then the sampled fathers' heights
```
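Equivalently, the design matrix that lm() built internally can be recovered with model.matrix(); it should match X up to column names:
```{r}
head(model.matrix(fit))   # same design matrix lm() used: an intercept column and x.s
```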

Now calculate $(X^{T}X)^{-1}$, the inverse of $X$ transpose times $X$. Use the solve() function to compute the inverse and t() for the transpose. What is the element in the first row, first column?
```{r}
Var.X <- solve(t(X) %*% X)   # (X^T X)^{-1}
Var.X[1, 1]
```

Standard Errors Exercises #3

Now we are one step away from the standard errors of $\hat{\beta}$. Take the diagonal of the matrix $(X^{T}X)^{-1}$ above, using the diag() function. Now multiply our estimate of $\sigma^{2}$ by the diagonal of this matrix. This gives the estimated variances of the entries of $\hat{\beta}$, so take the square root. You should end up with two numbers, the standard error for the intercept and the standard error for the slope.

What is the standard error for the slope?

```{r}
diagonal <- diag(Var.X)            # diagonal of (X^T X)^{-1}
diagonal
sqrt(diagonal[2] * sigma2)         # standard error of the slope
```
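Finally, as a check on the whole calculation, these hand-computed standard errors should match the ones reported by lm():
```{r}
summary(fit)$coefficients[, "Std. Error"]   # standard errors reported by lm()
```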















