In this problem set we’ll work out some properties of the least squares solution that we reviewed in the weekly readings. Consider the unsolvable system \(Ax = b\) as given below:
\[ \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 3 \\ 1 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 8 \\ 8 \\ 20 \end{bmatrix} \]
#Prepare the matrix A, and the constraint vector b matrix.
A <- matrix(c(1,0,1,1,1,3,1,4), byrow=TRUE, nrow=4)
b <- matrix(c(0,8,8,20), nrow=4)
#Let's compute the $A^T A$ and $A^T b$
(ATA <- t(A) %*% A)
## [,1] [,2]
## [1,] 4 8
## [2,] 8 26
(ATb <- t(A) %*% b)
## [,1]
## [1,] 36
## [2,] 112
From the formula \(A^T A \hat{x} = A^T b\) , the \(\hat{x} = (A^T A)^{-1} (A^T b)\)
(x_hat <- solve(ATA) %*% ATb)
## [,1]
## [1,] 1
## [2,] 4
#Compare it with built-in R function 'lsfit' , which provides the least squares estimate of b, just like above.
x_hat_R <- lsfit(A, b, intercept=FALSE)
round(x_hat) == round(x_hat_R$coefficients)
## [,1]
## [1,] TRUE
## [2,] TRUE
The projection \(p = \hat{x_1} a_1 + \hat{x_2} a_2\), where \(\hat{x_1} and \hat{x_2}\) indicate ‘estimates’ From the above, p can be written as , \(p=A \hat{x}\). We already have the \(\hat{x}\), lets find the projection vector p. Also, We know that, \(b = p + e\) , so the error vector \(e = b - p\) , which is what we need to minimize :
(p <- A %*% x_hat)
## [,1]
## [1,] 1
## [2,] 5
## [3,] 13
## [4,] 17
#Now , lets compute the error vector e , which is b - p
(e <- b - p)
## [,1]
## [1,] -1
## [2,] 3
## [3,] -5
## [4,] 3
# Compare the error vector, with built-in lsfit function - residuals.
round(e) == round(x_hat_R$residuals)
## [,1]
## [1,] TRUE
## [2,] TRUE
## [3,] TRUE
## [4,] TRUE
#Find the squared error
(sq.e <- sum(e^2))
## [1] 44
#basically, the case where b=p.
#Let's first find the A^T p , we have already got the (A^T A)A -1.
p = matrix(c(1,5,13,17), nrow=4)
(ATp <- t(A) %*% p)
## [,1]
## [1,] 36
## [2,] 112
Find x-hat, by finding \((A^T A)^{-1} (A^T p)\)
(x_hat_p <- solve(ATA) %*% ATp)
## [,1]
## [1,] 1
## [2,] 4
#p=Ax, and then, calculate the error
(p2 <- A %*% x_hat_p)
## [,1]
## [1,] 1
## [2,] 5
## [3,] 13
## [4,] 17
(error <- round(p2 - p))
## [,1]
## [1,] 0
## [2,] 0
## [3,] 0
## [4,] 0
(sq.error <- round(sum(error^2), digits=6))
## [1] 0
So, from the above, we have solvable system of equations.
(res.error <- b - p)
## [,1]
## [1,] -1
## [2,] 3
## [3,] -5
## [4,] 3
If 2 vectors are orthogonal , then their dot product is equal to zero.
#e and p are orthogonal
round(sum(e*p))
## [1] 0
#e and the columns of A are orthogonal
round(sum(e*A[,1]))
## [1] 0
round(sum(e*A[,2]))
## [1] 0
Consider the modified auto-mpg data (obtained from the UC Irvine Machine Learning dataset). This dataset contains 5 columns: displacement, horsepower, weight, acceleration, mpg. We are going to model mpg as a function of the other four variables.
data <- read.table('auto-mpg.data',
col.names=c('displacement', 'horsepower', 'weight', 'acceleration', 'mpg'))
summary(data)
## displacement horsepower weight acceleration
## Min. : 68.0 Min. : 46.0 Min. :1613 Min. : 8.00
## 1st Qu.:105.0 1st Qu.: 75.0 1st Qu.:2225 1st Qu.:13.78
## Median :151.0 Median : 93.5 Median :2804 Median :15.50
## Mean :194.4 Mean :104.5 Mean :2978 Mean :15.54
## 3rd Qu.:275.8 3rd Qu.:126.0 3rd Qu.:3615 3rd Qu.:17.02
## Max. :455.0 Max. :230.0 Max. :5140 Max. :24.80
## mpg
## Min. : 9.00
## 1st Qu.:17.00
## Median :22.75
## Mean :23.45
## 3rd Qu.:29.00
## Max. :46.60
Lets convert the data frame into \(A\) and \(b\) matrices
A <- data.matrix(data[,1:4])
A <- cbind(rep(1, nrow(A)), A) #added a column of 1's for the intercepts
b <- data.matrix(data[,5])
head(A)
## displacement horsepower weight acceleration
## [1,] 1 307 130 3504 12.0
## [2,] 1 350 165 3693 11.5
## [3,] 1 318 150 3436 11.0
## [4,] 1 304 150 3433 12.0
## [5,] 1 302 140 3449 10.5
## [6,] 1 429 198 4341 10.0
head(b)
## [,1]
## [1,] 18
## [2,] 15
## [3,] 18
## [4,] 16
## [5,] 17
## [6,] 15
We got the required inputs, to calculate the coordinates \(\hat{x}\) which approximates the solution to the \(Ax = b\).
ATA <- t(A) %*% A
ATb <- t(A) %*% b
#Lets calculate the x_hat
(x_hat <- solve(ATA) %*% ATb)
## [,1]
## 45.251139699
## displacement -0.006000871
## horsepower -0.043607731
## weight -0.005280508
## acceleration -0.023147999
The above \(\hat{x}\) gives us the best fitting solution for the given auto mpg data, Lets find out the error.
p <- A %*% x_hat
e <- b - p
(sq_e <- sum(e^2))
## [1] 6979.413
Let’s match with R built-in function lsfit
lsr <- lsfit(A, b, intercept=FALSE) #we already added 1st vector for intercept, so no need to add again.
lsr$coefficients
## displacement horsepower weight acceleration
## 45.251139699 -0.006000871 -0.043607731 -0.005280508 -0.023147999
round(x_hat) == round(lsr$coefficients)
## [,1]
## TRUE
## displacement TRUE
## horsepower TRUE
## weight TRUE
## acceleration TRUE