Once we’ve described some basic concepts of matrix algebra, we’re ready to move on to see how this is used in statistics and data analysis. A simple example:
- Mean \[\begin{align}A=\underbrace{\begin{pmatrix}1\\1\\\vdots\\1\end{pmatrix}}_{N\times 1}\Longrightarrow\frac{1}{N}A^{T}Y&=\frac{1}{N}\begin{pmatrix}1&1&...&1\end{pmatrix}\begin{pmatrix}y_{1}\\y_{2}\\\vdots\\y_{n}\end{pmatrix}\\&=\frac{1}{N}\sum_{i=1}^{N}y_{1}=\overline{Y}\end{align}\] As shown, we can use the matrix algebra notation to represent the sample average.
mean(y)
[1] 68.68407
N<-length(y)
A<-matrix(1,N,1)
Y<-matrix(y,N,1)
Y.bar<-(1/N)*t(A)%*%Y
Y.bar
[,1]
[1,] 68.68407
Let \(r\) be the deviations of each observation in \(Y\) with respect to the mean \(\overline{Y}\), called \(\textit{vector of residuals}\) \[r\equiv\begin{pmatrix}y_{1}-\overline{Y}\\\vdots\\y_{n}-\overline{Y}\end{pmatrix}\] and we can write the \(\textit{variance}\) as, \[\frac{1}{N}r^{T}r=\frac{1}{N}\sum_{i=1}^{N}(y_{i}-\overline{Y})^{2}\] in general, \(r^{T}r\) gives us the sum of squares of \(r\) entries.
Y.bar.v<-matrix(Y.bar,N,1)
r<-Y-Y.bar.v
variance<-(t(r)%*%r)/N
print(variance)
[,1]
[1,] 7.915196
var(Y)*(N-1)/N
[,1]
[1,] 7.915196
Using matrix algebra we can rewrite the general model: \[\begin{align}y_{i}=\beta_{0}+\sum_{j=1}^{N}\beta_{j}x_{i,j}+\varepsilon_{i} & & i=1,...,N\end{align}\] Simply as, \[Y=X\beta+\varepsilon\] Such that, \[\begin{align}Y=\begin{pmatrix}y_{1}\\y_{2}\\ \vdots\\ y_{N}\end{pmatrix},& & X=\begin{pmatrix}1&x_{1,1}&...&x_{1,p}\\1&x_{2,1}&...&x_{2,p}\\ \vdots &\vdots&\ddots&\vdots\\1&x_{N,1}&...&x_{N,p}\end{pmatrix},\end{align}\] \[\begin{align}\beta=\begin{pmatrix}\beta_{0}\\ \beta_{1}\\ \vdots\\ \beta_{p}\end{pmatrix}& &\text{and} & &\varepsilon=\begin{pmatrix}\varepsilon_{1}\\ \varepsilon_{2}\\ \vdots \\ \varepsilon_{N}\end{pmatrix}\end{align}\] So we have, \[\begin{pmatrix}y_{1}\\ y_{2}\\ \vdots \\ y_{N}\end{pmatrix}=\begin{pmatrix}1&x_{1,1}&...&x_{1,p}\\1&x_{2,1}&...&x_{2,p}\\ \vdots &\vdots&\ddots&\vdots\\1&x_{N,1}&...&x_{N,p}\end{pmatrix}\begin{pmatrix}\beta_{0}\\ \beta_{1}\\ \vdots\\ \beta_{p}\end{pmatrix}+\begin{pmatrix}\varepsilon_{1}\\ \varepsilon_{2}\\ \vdots \\ \varepsilon_{N}\end{pmatrix}\] One of the ways this is useful is that we can write down the residual sum of squares in a relatively simple formula. First, we need to find is the vector of betas that minimize the residual sums of squares. Minimizing Residual Sum of Squares: The RSS equation becomes simpler \[(Y-X\beta)^{T}(Y-X\beta)\] and to find the \(\hat{\beta}\) that minimizes this we solve (by taking the derivative): \[\begin{align}2X^{T}(Y-X\hat{\beta})&=0\\ \Leftrightarrow X^{T}X\hat{\beta}&=X^{T}Y\\ \Leftrightarrow \hat{\beta}&=(X^{T}X)^{-1}X^{T}Y\end{align}\]
x<-father.son$fheight
y<-father.son$sheight
X<-cbind(1,x)
beta.hat<-solve(t(X)%*%X)%*%t(X)%*%y
beta.hat
[,1]
33.886604
x 0.514093
Practice
g<-9.8 #gravity constant
N<-25 #number of observations
t<- seq(0,3.4,len=N) #sequence of numbers btwn 0-3,4
f<-56.67+0*t-0.5*g*t^{2} #1st term = constant height of the Tower of Pisa, 2nd term=linear term in time, 3rd term=square of time times g.
epsilon<-rnorm(N,0,1) #error term distributed as normal with mean=0 and variance=1
y<-f+epsilon
#install.packages("ggplot")
#library(ggplot2)
plot(t,y, xlab="Time in secs", ylab="Distance in metters")
lines(t,f,col=2)

t2<-t^2
fit<-lm(y~t+t2)
summary(fit)
Call:
lm(formula = y ~ t + t2)
Residuals:
Min 1Q Median 3Q Max
-2.1189 -0.3065 0.1061 0.8327 1.3394
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 56.2953 0.6000 93.825 < 2e-16 ***
t 0.3284 0.8173 0.402 0.692
t2 -4.9640 0.2322 -21.379 3.29e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.081 on 22 degrees of freedom
Multiple R-squared: 0.9967, Adjusted R-squared: 0.9964
F-statistic: 3285 on 2 and 22 DF, p-value: < 2.2e-16
We need first construct the \(X\) that has covariates as columns:
t<- seq(0,3.4,len=N)
t2<-t^2
N<-25
f<-56.67+0*t-0.5*g*t^{2}
epsilon<-rnorm(N,0,1)
y<-f+epsilon
intercept<-matrix(1,N,1)
X<-cbind(intercept,t,t2)
betas<-solve(t(X)%*%X)%*%t(X)%*%y
betas
[,1]
56.2523269
t 0.5732126
t2 -5.0763945
Exercises: Suppose we are analyzing a set of 4 samples. The first two samples are from a treatment group A and the second two samples are from a treatment group B. This design can be represented with a model matrix like so:
X
[,1] [,2]
a 1 0
a 1 0
b 1 1
b 1 1
Suppose that the fitted parameters for a linear model give us:
beta
[1] 5 2
Use the matrix multiplication operator, %*%, in R to answer the following questions:
Matrix Algebra Examples Exercises #1
What is the fitted value for the A samples? (The fitted Y values.)
Y<-X%*%beta
Y[1:2,]
Matrix Algebra Examples Exercises #2 What is the fitted value for the B samples? (The fitted Y values.)
Y[3:4,]
b b
7 7
Inference Review Exercises #1 We have shown how to find the least squares estimates with matrix algebra. These estimates are random variables as they are linear combinations of the data. For these estimates to be useful we also need to compute the standard errors.
Here we review standard errors in the context of linear models.
It is useful to think about where randomness comes from. In our falling object example, randomness was introduced through measurement errors. Every time we rerun the experiment a new set of measurement errors will be made which implies our data will be random. This implies that our estimate of, for example, the gravitational constant will change. The constant is fixed, but our estimates are not. To see this we can run a Monte Carlo simulation. Specifically we will generate the data repeatedly and compute the estimate for the quadratic term each time.
g <- 9.8 ## meters per second
h.0 <- 56.67
v.0 <- 0
N <- 25
tt <- seq(0,3.4,len=N) ##time in secs, t is a base function
y <- h.0 + v.0 *tt - 0.5* g*tt^2 + rnorm(N,sd=1)
Now we act as if we didn’t know \(h.0\), \(v.0\) and \(-0.5*g\) and use regression to estimate these. We can rewrite the model as \(y = \beta_{0} + \beta_{1}t + \beta_{2} t^2 + \varepsilon\) and obtain the LSE we have used in this class. Note that \(g = -2\beta_{2}\). To obtain the LSE in R we could write:
X <- cbind(1,tt,tt^2)
A <- solve(t(X)%*%X)%*%t(X)
Given how we have defined A, which of the following is the LSE of g, the acceleration due to gravity?
betas<-A%*%y
-2*betas[3]
[1] 8.895099
Inference Review Exercises #2 In the lines of code above, there was a call to a random function rnorm(). This means that each time the lines of code above are repeated, the estimate of \(g\) will be different.
Set the seed to 1, then use the code above in conjunction with the function replicate() to generate 100,000 Monte Carlo simulated datasets. For each dataset compute an estimate of g (remember to multiply by -2).
What is the standard deviation of this estimate?:
##Info Parenthesis
To use Monte Carlo methods, you need to be able to replicate some random process many times. There are two main ways this is commonly done: either with replicate() or with for() loops.
The replicate() function executes some expression many times and returns the output from each execution. Say we have a vector x, which represents 30 observations of fish length (mm):
x
[1] 468.4231 552.0125 492.7318 505.9992 522.0911 517.0467 464.9127 514.3202 495.0259 576.6961 509.8098
[12] 449.0348 466.7975 459.5749 493.6497 489.5787 474.1776 527.2837 533.8535 450.2837 536.4532 459.5868
[23] 495.1613 490.2220 510.0694 536.4575 484.2347 489.3954 509.4261 568.6227
We wish to build the sampling distribution of the mean length “by hand”. We can sample randomly from it, calculate the mean, then repeat this process many times:
means <- replicate(n = 1000, expr = {
x_i = sample(x, length(x), replace = T)
mean(x_i)
})
If we take mean(means) and sd(means), that should be very similar to mean(x) and se(x). Create the se() function and prove this to yourself:
se <- function(x) sd(x)/sqrt(length(x))
mean(means); mean(x)
[1] 501.4155
[1] 501.4311
sd(means); se(x)
[1] 6.144784
[1] 6.077123
##.
set.seed(1)
B <- 100000
g <- 9.8 ## meters per second
n <- 25
tt <- seq(0,3.4,len=n) ##time in secs, t is a base function
X <- cbind(1,tt,tt^2)
A <- solve(crossprod(X))%*%t(X)
betahat <- replicate(B,{
y = 56.67 - 0.5*g*tt^2 + rnorm(n,sd=1)
betahats = -2*A%*%y
return(betahats[3])
})
sqrt(mean( (betahat-mean(betahat) )^2))
[1] 0.4297449
---
title: "DATA ANALYSIS WITH MATRIX ALGEBRA"
output: html_notebook
---
Once we've described some basic concepts of matrix algebra, we're ready to move on to see how this is used in statistics and data analysis.
A simple example:

- Mean
$$\begin{align}A=\underbrace{\begin{pmatrix}1\\1\\\vdots\\1\end{pmatrix}}_{N\times 1}\Longrightarrow\frac{1}{N}A^{T}Y&=\frac{1}{N}\begin{pmatrix}1&1&...&1\end{pmatrix}\begin{pmatrix}y_{1}\\y_{2}\\\vdots\\y_{n}\end{pmatrix}\\&=\frac{1}{N}\sum_{i=1}^{N}y_{1}=\overline{Y}\end{align}$$
As shown, we can use the matrix algebra notation to represent the sample average.
```{r Average}
#install.packages("UsingR")
#library(UsingR)
y<-father.son$sheight #generates vector of son heights.
mean(y)
```
```{r}
N<-length(y)
A<-matrix(1,N,1)
Y<-matrix(y,N,1)
Y.bar<-(1/N)*t(A)%*%Y
Y.bar
```
- The variance

Let $r$ be the deviations of each observation in $Y$ with respect to the mean $\overline{Y}$, called $\textit{vector of residuals}$
$$r\equiv\begin{pmatrix}y_{1}-\overline{Y}\\\vdots\\y_{n}-\overline{Y}\end{pmatrix}$$
and we can write the $\textit{variance}$ as,
$$\frac{1}{N}r^{T}r=\frac{1}{N}\sum_{i=1}^{N}(y_{i}-\overline{Y})^{2}$$
in general, $r^{T}r$ gives us the sum of squares of $r$ entries.
```{r Variance}
Y.bar.v<-matrix(Y.bar,N,1)
r<-Y-Y.bar.v
variance<-(t(r)%*%r)/N
print(variance)
#Equivalent to:
var(Y)*(N-1)/N #unbiased sample variance
```
Using matrix algebra we can rewrite the general model:
$$\begin{align}y_{i}=\beta_{0}+\sum_{j=1}^{N}\beta_{j}x_{i,j}+\varepsilon_{i} & & i=1,...,N\end{align}$$
Simply as,
$$Y=X\beta+\varepsilon$$
Such that,
$$\begin{align}Y=\begin{pmatrix}y_{1}\\y_{2}\\ \vdots\\ y_{N}\end{pmatrix},& & X=\begin{pmatrix}1&x_{1,1}&...&x_{1,p}\\1&x_{2,1}&...&x_{2,p}\\ \vdots &\vdots&\ddots&\vdots\\1&x_{N,1}&...&x_{N,p}\end{pmatrix},\end{align}$$
$$\begin{align}\beta=\begin{pmatrix}\beta_{0}\\ \beta_{1}\\ \vdots\\  \beta_{p}\end{pmatrix}& &\text{and} & &\varepsilon=\begin{pmatrix}\varepsilon_{1}\\ \varepsilon_{2}\\ \vdots \\ \varepsilon_{N}\end{pmatrix}\end{align}$$
So we have,
$$\begin{pmatrix}y_{1}\\ y_{2}\\ \vdots \\ y_{N}\end{pmatrix}=\begin{pmatrix}1&x_{1,1}&...&x_{1,p}\\1&x_{2,1}&...&x_{2,p}\\ \vdots &\vdots&\ddots&\vdots\\1&x_{N,1}&...&x_{N,p}\end{pmatrix}\begin{pmatrix}\beta_{0}\\ \beta_{1}\\ \vdots\\  \beta_{p}\end{pmatrix}+\begin{pmatrix}\varepsilon_{1}\\ \varepsilon_{2}\\ \vdots \\ \varepsilon_{N}\end{pmatrix}$$
One of the ways this is useful is that we can write down the residual sum of squares in a relatively simple formula. First, we need to find is the vector of betas that minimize the residual sums of squares.
Minimizing Residual Sum of Squares:
The RSS equation becomes simpler
$$(Y-X\beta)^{T}(Y-X\beta)$$
and to find the $\hat{\beta}$ that minimizes this we solve (by taking the derivative):
$$\begin{align}2X^{T}(Y-X\hat{\beta})&=0\\ \Leftrightarrow X^{T}X\hat{\beta}&=X^{T}Y\\  \Leftrightarrow  \hat{\beta}&=(X^{T}X)^{-1}X^{T}Y\end{align}$$
```{r}
x<-father.son$fheight
y<-father.son$sheight
X<-cbind(1,x)
beta.hat<-solve(t(X)%*%X)%*%t(X)%*%y
beta.hat
```

Practice
```{r}
g<-9.8 #gravity constant
N<-25 #number of observations
t<- seq(0,3.4,len=N) #sequence of numbers btwn 0-3,4
f<-56.67+0*t-0.5*g*t^{2} #1st term = constant height of the Tower of Pisa, 2nd term=linear term in time, 3rd term=square of time times g.
epsilon<-rnorm(N,0,1) #error term distributed as normal with mean=0 and variance=1
y<-f+epsilon
#install.packages("ggplot")
#library(ggplot2)
plot(t,y, xlab="Time in secs", ylab="Distance in metters")
lines(t,f,col=2)
```
```{r}
t2<-t^2
fit<-lm(y~t+t2)
summary(fit)
```

We need first construct the $X$ that has covariates as columns:
```{r}
t<- seq(0,3.4,len=N)
t2<-t^2
N<-25 
f<-56.67+0*t-0.5*g*t^{2} 
epsilon<-rnorm(N,0,1)
y<-f+epsilon
intercept<-matrix(1,N,1)
X<-cbind(intercept,t,t2)
betas<-solve(t(X)%*%X)%*%t(X)%*%y
betas
```

Exercises:
Suppose we are analyzing a set of 4 samples. The first two samples are from a treatment group A and the second two samples are from a treatment group B. This design can be represented with a model matrix like so:
```{r}
X <- matrix(c(1,1,1,1,0,0,1,1),nrow=4)
rownames(X) <- c("a","a","b","b")
X
```
Suppose that the fitted parameters for a linear model give us:
```{r}
beta <- c(5, 2)
beta
```
Use the matrix multiplication operator, %*%, in R to answer the following questions:

Matrix Algebra Examples Exercises #1

What is the fitted value for the A samples? (The fitted Y values.)

```{r}
Y<-X%*%beta
Y[1:2,]
```
Matrix Algebra Examples Exercises #2
What is the fitted value for the B samples? (The fitted Y values.)
```{r}
Y[3:4,]
```

Inference Review Exercises #1
We have shown how to find the least squares estimates with matrix algebra. These estimates are random variables as they are linear combinations of the data. For these estimates to be useful we also need to compute the standard errors.

Here we review standard errors in the context of linear models.

It is useful to think about where randomness comes from. In our falling object example, randomness was introduced through measurement errors. Every time we rerun the experiment a new set of measurement errors will be made which implies our data will be random. This implies that our estimate of, for example, the gravitational constant will change. The constant is fixed, but our estimates are not. To see this we can run a Monte Carlo simulation. Specifically we will generate the data repeatedly and compute the estimate for the quadratic term each time.

```{r}
g <- 9.8 ## meters per second
h.0 <- 56.67
v.0 <- 0
N <- 25
tt <- seq(0,3.4,len=N) ##time in secs, t is a base function
y <- h.0 + v.0 *tt  - 0.5* g*tt^2 + rnorm(N,sd=1)
```

Now we act as if we didn't know $h.0$, $v.0$ and $-0.5*g$ and use regression to estimate these. We can rewrite the model as $y = \beta_{0} + \beta_{1}t + \beta_{2} t^2 + \varepsilon$ and obtain the LSE we have used in this class. Note that $g = -2\beta_{2}$.
To obtain the LSE in R we could write:
```{r}
X <- cbind(1,tt,tt^2)
A <- solve(t(X)%*%X)%*%t(X)
```
Given how we have defined A, which of the following is the LSE of g, the acceleration due to gravity?
```{r}
betas<-A%*%y
-2*betas[3]
```
Inference Review Exercises #2
In the lines of code above, there was a call to a random function rnorm(). This means that each time the lines of code above are repeated, the estimate of $g$ will be different.

Set the seed to 1, then use the code above in conjunction with the function replicate() to generate 100,000 Monte Carlo simulated datasets. For each dataset compute an estimate of g (remember to multiply by -2).

What is the standard deviation of this estimate?:

##Info Parenthesis

To use Monte Carlo methods, you need to be able to replicate some random process many times. There are two main ways this is commonly done: either with replicate() or with for() loops.

The replicate() function executes some expression many times and returns the output from each execution. Say we have a vector x, which represents 30 observations of fish length (mm):
```{r}
x <- rnorm(30, 500, 30)
```
We wish to build the sampling distribution of the mean length “by hand”. We can sample randomly from it, calculate the mean, then repeat this process many times:
```{r}
means <- replicate(n = 1000, expr = {
  x_i = sample(x, length(x), replace = T)
  mean(x_i)
})
```
If we take mean(means) and sd(means), that should be very similar to mean(x) and se(x). Create the se() function  and prove this to yourself:
```{r}
se <- function(x) sd(x)/sqrt(length(x))
mean(means); mean(x)
```
```{r}
sd(means); se(x)
```

##.

```{r}
set.seed(1)
B <- 100000
g <- 9.8 ## meters per second
n <- 25
tt <- seq(0,3.4,len=n) ##time in secs, t is a base function
X <- cbind(1,tt,tt^2)
A <- solve(crossprod(X))%*%t(X)

betahat <- replicate(B,{
  y = 56.67  - 0.5*g*tt^2 + rnorm(n,sd=1)
  betahats = -2*A%*%y
  return(betahats[3])
})
sqrt(mean( (betahat-mean(betahat) )^2))

```








