Stat 510 Homework 3

Initial Data:

Given the predictor and response matricies, I import them into R as so:

y <- matrix(data = c(50,73,32,121,156,98,62,51,80),ncol = 1)
x <- matrix(c(1,14,11,1,28,18,1,10,5,1,30,20,1,48,30,1,30,21,1,20,15,1,16,11,1,25,17),
            ncol = 3, byrow = T)
mat_all <- cbind(y,x)
colnames(mat_all) <-  c("Y","B_0","B_1","B_2");mat_all

##         Y B_0 B_1 B_2
##  [1,]  50   1  14  11
##  [2,]  73   1  28  18
##  [3,]  32   1  10   5
##  [4,] 121   1  30  20
##  [5,] 156   1  48  30
##  [6,]  98   1  30  21
##  [7,]  62   1  20  15
##  [8,]  51   1  16  11
##  [9,]  80   1  25  17

Let’s look at X and Y in matrix form:

\[ X = \begin{bmatrix} 1 & 14 & 11\\ 1 & 28 & 18\\ 1 & 10 & 5\\ 1 & 30 & 20\\ 1 & 48 & 30\\ 1 & 30 & 21\\ 1 & 20 & 15\\ 1 & 16 & 11\\ 1 & 25 & 17 \end{bmatrix} \qquad Y= \begin{bmatrix} 50\\ 73\\ 32\\ 121\\ 156\\ 98\\ 62\\ 51\\ 80\\ \end{bmatrix} \]

Now that we have a matrix with our data cleaned and ready to go, we can begin to find the elements needed to analyze residuals.

Part A:

Find \((X'X)^{-1}\)

\[ \left( \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\ 14 & 28 & 10 & 30 & 48 & 30 & 20 & 16 & 25\\ 11 & 18 & 5 & 20 & 30 & 21 & 15 & 11 & 17 \end{bmatrix} * \begin{bmatrix} 1 & 14 & 11\\ 1 & 28 & 18\\ 1 & 10 & 5\\ 1 & 30 & 20\\ 1 & 48 & 30\\ 1 & 30 & 21\\ 1 & 20 & 15\\ 1 & 16 & 11\\ 1 & 25 & 17 \end{bmatrix} \right)^{-1} \\= \begin{bmatrix} 0.81290680 & 0.03927051 &-0.10131719\\ 0.03927051 &0.03367859 &-0.05267840\\ -0.10131719 &-0.05267840 &0.08482285 \end{bmatrix} \]

Using R,

solve(t(x)%*%x)

##             [,1]        [,2]        [,3]
## [1,]  0.81290680  0.03927051 -0.10131719
## [2,]  0.03927051  0.03367859 -0.05267840
## [3,] -0.10131719 -0.05267840  0.08482285

Part B:

Find \((X'Y)\)

\[ \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\ 14 & 28 & 10 & 30 & 48 & 30 & 20 & 16 & 25\\ 11 & 18 & 5 & 20 & 30 & 21 & 15 & 11 & 17 \end{bmatrix} * \begin{bmatrix} 50\\ 73\\ 32\\ 121\\ 156\\ 98\\ 62\\ 51\\ 80 \end{bmatrix} \\ = \begin{bmatrix} 723\\ 21178\\ 14033 \end{bmatrix} \]

Using R,

t(x)%*%y

##       [,1]
## [1,]   723
## [2,] 21178
## [3,] 14033

Part C:

Find \((Y'Y)\)

\[ \begin{bmatrix} 50 & 73 & 32 & 121 & 156 & 98 & 62 & 51 & 80 \end{bmatrix} * \begin{bmatrix} 50\\ 73\\ 32\\ 121\\ 156\\ 98\\ 62\\ 51\\ 80 \end{bmatrix} \\ = \begin{bmatrix} 70279 \end{bmatrix} \]

Using R,

t(y)%*%y

##       [,1]
## [1,] 70279

Part D

Find the least square estimator of \(\hat{\beta}\).

\[ \hat{\beta} = (X'X)^{-1}X'Y \]

Using results from parts A and B, \[ \begin{bmatrix} 0.81290680 & 0.03927051 &-0.10131719\\ 0.03927051 &0.03367859 &-0.05267840\\ -0.10131719 &-0.05267840 &0.08482285 \end{bmatrix} * \begin{bmatrix} 723\\ 21178\\ 14033 \end{bmatrix}\\ = \begin{bmatrix} -2.381630\\ 2.401792\\ 1.443504 \end{bmatrix} = \begin{bmatrix} \hat{\beta_0}\\ \hat{\beta_1}\\ \hat{\beta_2} \end{bmatrix} \]

Using R,

beta_hat <- solve(t(x)%*%x) %*% t(x)%*%y
rownames(beta_hat) <- c("Beta0","Beta1","Beta2");beta_hat

##            [,1]
## Beta0 -2.381630
## Beta1  2.401792
## Beta2  1.443504

Part E

Compute the estimate for \(s^2\) of \(\sigma^2\). Note,

\[ s^2 = \frac{SSE}{n-(k+1)} = \frac{Y'Y-\hat{\beta}'X'Y}{n-(k+1)} \] Therefore, once the matrix calculations are made, we end up with: \[ s^2 = \frac{70279 - 69399.93}{9 - (2 + 1)} = 146.5119 \]

Using R,

s_2 <- (t(y)%*%y-t(beta_hat)%*%t(x)%*%y)/(length(x[,1])-(2+1));s_2

##          [,1]
## [1,] 146.5119

Part F

Find the variance-covariance matric of \(\hat{\beta}\).

The variance-covariance matrix for \(\hat{\beta}\) or \(cov(\hat{\beta})\) is given by

\[ cov(\hat{\beta}) = \sigma^2(X'X)^{-1} \] Note, the resulting matrix is just the \((X'X)^{-1}\) element since \(\sigma^2\) must be estimated using \(s^2\), which comes later.

\[ cov(\hat{\beta}) = \sigma^2 \begin{bmatrix} 0.81290680 & 0.03927051 & -0.10131719\\ 0.03927051 & 0.03367859 & -0.05267840\\ -0.10131719 & -0.05267840 & 0.08482285\\ \end{bmatrix} \]

Using R,

var_cov_B <- (solve(t(x)%*%x));var_cov_B

##             [,1]        [,2]        [,3]
## [1,]  0.81290680  0.03927051 -0.10131719
## [2,]  0.03927051  0.03367859 -0.05267840
## [3,] -0.10131719 -0.05267840  0.08482285

Part G

Using your answers to parts E and F, find the variances of \(\hat{\beta_0}\), \(\hat{\beta_1}\) and \(\hat{\beta_2}\).

Using the two previous parts, we use the estimate for \(\sigma^2 = s^2\), and extract the diagonal from the variance-covariance matrix after multiplying the matrix by the constant. The diagonal of the resulting matrix holds the variances for our predictors, and the surrounding values are their respective relevant covariances.

\[ cov(\hat{\beta}) = s^2(X'X)^{-1}\\ \] \[ \begin{aligned} cov(\hat{\beta}) &= 146.5119 \begin{bmatrix} 0.81290680 & 0.03927051 & -0.10131719\\ 0.03927051 & 0.03367859 & -0.05267840\\ -0.10131719 & -0.05267840 & 0.08482285\\ \end{bmatrix}\\ \\ &= \begin{bmatrix} 119.100534 & 5.753598 & -14.844175\\ 5.753598 & 4.934315 & -7.718014\\ -14.844175 & -7.718014 & 12.427558 \end{bmatrix}\\ \end{aligned} \] So, the variances for our estimators are as follows:

\[ var(\hat{\beta_0})= 119.100534 \qquad var(\hat{\beta_1})= 4.934315 \qquad var(\hat{\beta_2)}= 12.427558 \]

Using R,

var_cov <- as.double(s_2)*(solve(t(x)%*%x));var_cov

##            [,1]      [,2]       [,3]
## [1,] 119.100534  5.753598 -14.844175
## [2,]   5.753598  4.934315  -7.718014
## [3,] -14.844175 -7.718014  12.427558

est_vars <- as.data.frame(diag(var_cov))
rownames(est_vars) <- c("var(B_0)","var(B_1)","var(B_2)");est_vars

##          diag(var_cov)
## var(B_0)    119.100534
## var(B_1)      4.934315
## var(B_2)     12.427558

Part H

Find the variance of the fitted values \((\hat{Y})\)

To find the variance of the fitted values for Y, we need to compute the hat matrix, and then multiply the hat matrix by our sample variance, as follows: \[ cov(\hat{Y})= s^2H \qquad \text{for} \qquad H= X(X'X)^{-1}X' \] So the resulting matrix equation follows:

\[ cov(\hat{Y}) = 146.5119 \begin{bmatrix} 0.32312385 & 0.02109678 & 0.0324712461 \\0.02109678 & 0.15142657 & 0.1818643960\\0.03247125 & 0.18186440 & 0.8057353462 \\0.07218657 & 0.12054394 & -0.0007806897\\-0.22189744 & 0.23860965 & -0.0126907470&&\dots\\0.16642308 & 0.07104277 & -0.2047676540\\0.28792019 & 0.02745123 & -0.1074900825\\0.18574061 & 0.08921650 & 0.2578000890\\0.13293512 & 0.09874817 & 0.0478580960 \end{bmatrix} \] Please note, the hat matrix is large (9x9 in this case) and was truncated. We now need to extract the diagonal from the resulting matrix to get the variances for our fitted values. Here they are listed in vector format:

\[ var(\hat{Y})= \begin{bmatrix} 47.34149\\ 22.18580\\ 118.04983\\ 20.83983\\ 106.38038\\ 37.60054\\ 43.03760\\ 26.82219\\ 17.27808 \end{bmatrix} \]

Using R,

hat_mat <- x%*%solve(t(x)%*%x)%*%t(x)
var_fitted <- as.data.frame(as.double(s_2)*diag(hat_mat))
colnames(var_fitted) <- c("Variance of Fitted Values");var_fitted

##   Variance of Fitted Values
## 1                  47.34149
## 2                  22.18580
## 3                 118.04983
## 4                  20.83983
## 5                 106.38038
## 6                  37.60054
## 7                  43.03760
## 8                  26.82219
## 9                  17.27808

Part I

Find the variance-covariance matrix of the residuals

The variance-covariance matrix for the residuals is found by: \[ cov(\epsilon) = \sigma^2(I-H) \] where I is the identity matrix corresponding to the dimensions of the hat matrix, and H obviously is the hat matrix. The calculation of the variance-covariance matrix for residuals includes the estimator for \(\sigma^2\) so we use \(s^2\).

The resulting matrix is: \[ \begin{bmatrix} 99.170423 & -3.090929 & -4.7574245 & -10.5761923 & 32.510619 & -24.382964 & -42.183739 & -27.213213 & -19.476580\\ -3.090929 & 124.326120 & -26.6453014 & -17.6611240 & -34.959157 & -10.408612 & -4.021932 & -13.071280 & -14.467784\\ -4.757425 & -26.645301 & 28.4620870 & 0.1143803 & 1.859346 & 30.000902 & 15.748578 & -37.770785 & -7.011781\\ -10.576192 & -17.661124 & 0.1143803 & 125.6720840 & -32.115608 & -23.006410 & -15.779168 & -9.421749 & -17.226213\\ 32.510619 & -34.959157 & 1.8593457 & -32.1156081 & 40.131543 & -19.633520 & 16.077089 & 10.389992 & -14.260304\\ -24.382964 & -10.408612 & 30.0009017 & -23.0064103 & -19.633520 & 108.911373 & -32.988091 & -7.792494 & -20.700183\\ -42.183739 & -4.021932 & 15.7485781 & -15.7791682 & 16.077089 & -32.988091 & 103.474316 & -19.523130 & -20.803924\\ -27.213213 & -13.071280 & -37.7707854 & -9.4217488 & 10.389992 & -7.792494 & -19.523130 & 119.689726 & -15.287067\\ -19.476580 & -14.467784 & -7.0117814 & -17.2262126 & -14.260304 & -20.700183 & -20.803924 & -15.287067 & 129.233835 \end{bmatrix} \] and the vector of the variances for our residuals are extracted from the diagonal of the above matrix:

\[ var(\hat{\epsilon}) = \begin{bmatrix} 99.17042\\ 124.32612\\ 28.46209\\ 125.67208\\ 40.13154\\ 108.91137\\ 103.47432\\ 119.68973\\ 129.23383\\ \end{bmatrix} \]

Using R,

cov_var_mat <- as.double(s_2)*(diag(9)-hat_mat)
var_resid <- cbind(diag(cov_var_mat))
colnames(var_resid) <- c("Variance of Residuals");var_resid

##       Variance of Residuals
##  [1,]              99.17042
##  [2,]             124.32612
##  [3,]              28.46209
##  [4,]             125.67208
##  [5,]              40.13154
##  [6,]             108.91137
##  [7,]             103.47432
##  [8,]             119.68973
##  [9,]             129.23383

Stat 510 Homework 3

Sean Krinik

2/21/2018

Initial Data:

Part A:

Part B:

Part C:

Part D

Part E

Part F

Part G

Part H

Part I