In this problem set we’ll work out some properties of the least squares solution that we reviewed in the weekly readings. Consider the unsolvable system Ax = b as given below
$$ ( \[\begin{array}{cccc} 1 & 0\\ 1 & 1\\ 1 & 3\\ 1 & 4\\ \end{array}\])
( \[\begin{array}{cc} x1\\ x2 \end{array}\]) $$
write code in R to compute= \(A^{T}A\) and \(A^{T}b\)
A = matrix(c(1,0,1,1,1,3,1,4), nrow=4,byrow=TRUE)
A
## [,1] [,2]
## [1,] 1 0
## [2,] 1 1
## [3,] 1 3
## [4,] 1 4
b = matrix(c(0,8,8,20), nrow=4,byrow=TRUE)
b
## [,1]
## [1,] 0
## [2,] 8
## [3,] 8
## [4,] 20
taa = t(A) %*% A
taa
## [,1] [,2]
## [1,] 4 8
## [2,] 8 26
tab = t(A) %*% b
tab
## [,1]
## [1,] 36
## [2,] 112
Solve for \(\hat{x}\) in R using the above two computed matrice
xhat = solve(taa) %*% tab
xhat
## [,1]
## [1,] 1
## [2,] 4
What is the squared error of this solution?
Axhat = A %*% xhat
Axhat
## [,1]
## [1,] 1
## [2,] 5
## [3,] 13
## [4,] 17
#error
er = Axhat - b
er
## [,1]
## [1,] 1
## [2,] -3
## [3,] 5
## [4,] -3
#squared error
sqdE = sum(er^2)
sqdE
## [1] 44
Instead of b = [0;8;8;20], start with p = [1;5;13;17] and find the exact solution (i.e. show that this system is solvable as all equations are consistent with each other. This should result in an error vector e = 0).
p = matrix(c(1,5,13,17), nrow=4,byrow=TRUE)
p
## [,1]
## [1,] 1
## [2,] 5
## [3,] 13
## [4,] 17
taa = t(A) %*% A
taa
## [,1] [,2]
## [1,] 4 8
## [2,] 8 26
tap = t(A) %*% p
tap
## [,1]
## [1,] 36
## [2,] 112
xhat = solve(taa) %*% tap
xhat
## [,1]
## [1,] 1
## [2,] 4
Axhat = A %*% xhat
Axhat
## [,1]
## [1,] 1
## [2,] 5
## [3,] 13
## [4,] 17
er = Axhat - p
er
## [,1]
## [1,] 0.000000e+00
## [2,] 8.881784e-16
## [3,] 3.552714e-15
## [4,] 3.552714e-15
#squared error
sqdE = sum(er^2)
sqdE
## [1] 2.603241e-29
Show that the error e = b−p = [−1;3;−5;3].
e = b-p
e
## [,1]
## [1,] -1
## [2,] 3
## [3,] -5
## [4,] 3
Show that the error e is orthogonal to p and to each of the columns of A.
# if the values are orthogonal the dot products will be equal to 0
sum(e*p)
## [1] 0
sum(e*A[,1])
## [1] 0
sum(e*A[,2])
## [1] 0
Consider the modified auto-mpg data (obtained from the UC Irvine Machine Learning dataset). This dataset contains 5 columns: displacement, horsepower, weight, acceleration, mpg. We are going to model mpg as a function of the other four variables. Write an R markdown script that takes in the auto-mpg data, extracts an A matrix from the first 4 columns and b vector from the fifth (mpg) column. Using the least squares approach, your code should compute the best fitting solution. That is, find the best fitting equation that expresses mpg in terms of the other 4 variables. Finally, calculate the fitting error between the predicted mpg of your model and the actual mpg. Your script should be able to load in the 5 column data set, extract A and b, and perform the rest of the calculations. Please have adequate comments in your code to make it easy to follow your work.
cars = read.table("https://raw.githubusercontent.com/bkreis84/Math/master/auto-mpg.data")
#read data into r
colnames(cars) = c('displacement', 'hp', 'wgt', 'acc', 'mpq')
#break the columns of the dataframeinto A and b
A = data.matrix(cars[,1:4])
b = data.matrix(cars[,5])
taa = t(A) %*% A
taa
## displacement hp wgt acc
## displacement 19097634 9374647.0 259345480 1123011.9
## hp 9374647 4857524.0 132989885 607832.3
## wgt 259345480 132989885.0 3757575489 17758103.6
## acc 1123012 607832.3 17758104 97656.9
tab = t(A) %*% b
tab
## [,1]
## displacement 1529685.9
## hp 868718.8
## wgt 25209061.4
## acc 146401.4
#solve for xhat
xhat = solve(taa) %*% tab
xhat
## [,1]
## displacement -0.030037938
## hp 0.157115685
## wgt -0.006217883
## acc 1.997320955
#our best fit of b is A %*% xhat
Axhat = A %*% xhat
# error is equal to Axhat - b for each observation
#squared error
sqdE = sum((Axhat - b)^2)
sqdE
## [1] 13101.43