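In this problem set we’ll work out some properties of the least squares solution that we reviewed in the weekly readings. Consider the unsolvable system $Ax = b$ as given below (the same `A` and `b` are defined in the code that follows):

$$
\begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 3 \\ 1 & 4 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
=
\begin{bmatrix} 0 \\ 8 \\ 8 \\ 20 \end{bmatrix}
$$

##### Write an R Markdown script to compute $A^TA$ and $A^Tb$.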
A <- matrix(c(1,1,1,1,0,1,3,4), nrow = 4)
b <- matrix(c(0,8,8,20), nrow = 4)
# calculating A transpose * A
ata <- t(A) %*% A
ata
## [,1] [,2]
## [1,] 4 8
## [2,] 8 26
# calculating A transpose * b
atb <- t(A) %*% b
atb
## [,1]
## [1,] 36
## [2,] 112
# solve the normal equations (A^T A) x = A^T b for x
x <- solve(ata, atb)
x
## [,1]
## [1,] 1
## [2,] 4
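As a cross-check on the normal-equations solution above, base R's `qr.solve()` returns the least squares solution of an overdetermined system directly from `A` and `b`, without forming $A^TA$. The following is a minimal sketch that re-declares the same `A` and `b` so it runs on its own; the names `x_ne` and `x_qr` are illustrative and not part of the original script.

```r
# Same A and b as in the problem statement
A <- matrix(c(1, 1, 1, 1, 0, 1, 3, 4), nrow = 4)
b <- matrix(c(0, 8, 8, 20), nrow = 4)

# Route 1: normal equations (A^T A) x = A^T b
# crossprod(A) is t(A) %*% A, crossprod(A, b) is t(A) %*% b
x_ne <- solve(crossprod(A), crossprod(A, b))

# Route 2: QR-based least squares on the rectangular system
x_qr <- qr.solve(A, b)

# Both routes should agree up to floating-point rounding
all.equal(as.vector(x_ne), as.vector(x_qr))
```

For a 4 x 2 system the two routes are interchangeable; the QR route is usually preferred when the columns of `A` are nearly dependent, because it avoids squaring the condition number.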
# projection of b onto the column space of A
p <- A %*% x
# calculate the error vector e = b - p
e <- b - p
# squared error: sum of the squared entries of e
sq_err <- sum(e^2)
sq_err
## [1] 44
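The numbers printed so far can be checked by hand. Written out with the values from the output above (the symbol $\hat{x}$ for the least squares solution is notation introduced here), the normal equations, the projection, and the squared error are:

```latex
\begin{aligned}
A^{T}A\,\hat{x} = A^{T}b
  &:\quad
  \begin{bmatrix} 4 & 8 \\ 8 & 26 \end{bmatrix}
  \begin{bmatrix} \hat{x}_1 \\ \hat{x}_2 \end{bmatrix}
  =
  \begin{bmatrix} 36 \\ 112 \end{bmatrix}
  \;\Longrightarrow\;
  \hat{x} = \begin{bmatrix} 1 \\ 4 \end{bmatrix} \\[4pt]
p = A\hat{x}
  &= \begin{bmatrix} 1 \\ 5 \\ 13 \\ 17 \end{bmatrix},
  \qquad
  e = b - p = \begin{bmatrix} -1 \\ 3 \\ -5 \\ 3 \end{bmatrix} \\[4pt]
\lVert e \rVert^{2}
  &= (-1)^2 + 3^2 + (-5)^2 + 3^2 = 1 + 9 + 25 + 9 = 44
\end{aligned}
```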
# replace b with p, which lies in the column space of A, and solve again
p <- matrix(c(1, 5, 13, 17), nrow = 4)
atp <- t(A) %*% p
x1 <- solve(ata, atp)
x1
## [,1]
## [1,] 1
## [2,] 4
e <- p - A %*% x1 # the error should be zero
e
## [,1]
## [1,] 0
## [2,] 0
## [3,] 0
## [4,] 0
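The zero error is expected because `p` lies in the column space of `A`. One way to see this, sketched below, is to build the projection matrix $P = A(A^TA)^{-1}A^T$ explicitly and check that it maps `b` to `p` and leaves `p` unchanged. Forming `P` is fine for a 4 x 2 example but is rarely done for large problems; the name `P` is introduced here for illustration only.

```r
A <- matrix(c(1, 1, 1, 1, 0, 1, 3, 4), nrow = 4)
b <- matrix(c(0, 8, 8, 20), nrow = 4)
p <- matrix(c(1, 5, 13, 17), nrow = 4)

# Projection onto the column space of A
P <- A %*% solve(crossprod(A)) %*% t(A)

all.equal(P %*% b, p)   # projecting b gives p
all.equal(P %*% p, p)   # p is already in the column space, so it is unchanged
all.equal(P %*% P, P)   # projecting twice is the same as projecting once
```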
# error vector for the original right-hand side b
e <- b - p
e
## [,1]
## [1,] -1
## [2,] 3
## [3,] -5
## [4,] 3
# A zero dot product means the two vectors are orthogonal;
# round() hides floating-point noise
round(sum(e*p))
## [1] 0
round(sum(e*A[,1]))
## [1] 0
round(sum(e*A[,2]))
## [1] 0
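The three dot products above can also be checked in one step, since $A^Te$ stacks the dot products of `e` with every column of `A`. The sketch below recomputes everything from `A` and `b` so it is self-contained (the name `x_hat` is illustrative), and adds the Pythagorean identity that follows from `p` and `e` being orthogonal.

```r
A <- matrix(c(1, 1, 1, 1, 0, 1, 3, 4), nrow = 4)
b <- matrix(c(0, 8, 8, 20), nrow = 4)

x_hat <- solve(crossprod(A), crossprod(A, b))  # least squares solution
p <- A %*% x_hat                               # projection of b
e <- b - p                                     # error vector

# A^T e: one entry per column of A, all (numerically) zero
crossprod(A, e)

# Pythagoras: ||b||^2 = ||p||^2 + ||e||^2, here 528 = 484 + 44
c(b = sum(b^2), p_plus_e = sum(p^2) + sum(e^2))
```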
Consider the modified auto-mpg data (obtained from the UC Irvine Machine Learning Repository). This dataset contains 5 columns: displacement, horsepower, weight, acceleration, and mpg. We are going to model mpg as a function of the other four variables.
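# read the whitespace-delimited file and name the five columns
car_mpg <- read.table("~/Desktop/mpg_car.txt",
                      col.names = c("disp", "hp", "wt", "acc", "mpg"))
# View() opens a data viewer in interactive sessions only
View(car_mpg)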
Write an R markdown script that takes in the auto-mpg data, extracts an A matrix from the first 4 columns and b vector from the fifth (mpg) column.
# convert into matrices
A <- as.matrix(car_mpg[,1:4])
b <- as.matrix(car_mpg[,5])
head(A)
## disp hp wt acc
## [1,] 307 130 3504 12.0
## [2,] 350 165 3693 11.5
## [3,] 318 150 3436 11.0
## [4,] 304 150 3433 12.0
## [5,] 302 140 3449 10.5
## [6,] 429 198 4341 10.0
head(b)
## [,1]
## [1,] 18
## [2,] 15
## [3,] 18
## [4,] 16
## [5,] 17
## [6,] 15
Using the least squares approach, your code should compute the best fitting solution. That is, find the best fitting equation that expresses mpg in terms of the other 4 variables.
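# normal equations: (A^T A) x = A^T b
ATA <- t(A) %*% A
ATB <- t(A) %*% b
# solve the system directly instead of forming the inverse of ATA
x1 <- solve(ATA, ATB)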
x1
## [,1]
## disp -0.030037938
## hp 0.157115685
## wt -0.006217883
## acc 1.997320955
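Because `A` has no column of ones, the fit above has no intercept term. The sketch below shows how the same computation relates to `lm()`: with `+ 0` in the formula, `lm()` solves the same no-intercept least squares problem. The data frame is synthetic (random numbers with made-up coefficients, not the auto-mpg file), and the names `toy`, `A_toy`, `b_toy`, `x_ne`, `fit`, and `fit_int` are all illustrative.

```r
set.seed(1)

# Synthetic stand-in with the same column layout as car_mpg (NOT the real data)
n <- 50
toy <- data.frame(
  disp = runif(n, 70, 450),
  hp   = runif(n, 45, 230),
  wt   = runif(n, 1600, 5100),
  acc  = runif(n, 8, 25)
)
# Made-up relationship, used only to give the fit something to recover
toy$mpg <- 45 - 0.01 * toy$disp - 0.04 * toy$hp - 0.005 * toy$wt + rnorm(n)

A_toy <- as.matrix(toy[, 1:4])
b_toy <- as.matrix(toy[, 5])

# Normal equations, as in the text
x_ne <- solve(crossprod(A_toy), crossprod(A_toy, b_toy))

# Same model via lm(); "+ 0" drops the intercept
fit <- lm(mpg ~ disp + hp + wt + acc + 0, data = toy)

# The two sets of coefficients should agree up to rounding
all.equal(as.vector(x_ne), unname(coef(fit)))

# For comparison: the usual model with an intercept (one extra coefficient)
fit_int <- lm(mpg ~ disp + hp + wt + acc, data = toy)
length(coef(fit_int))
```

Whether an intercept belongs in the mpg model is a modeling choice; the problem statement asks only for mpg in terms of the four variables, which is what the no-intercept fit provides.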
Finally, calculate the fitting error between the predicted mpg of your model and the actual mpg. Your script should be able to load in the 5 column data set, extract A and b, and perform the rest of the calculations.
# fitting error: Euclidean norm of the difference between predicted and actual mpg
e1 <- sqrt(sum(((A %*% x1) - b)^2))
e1
## [1] 114.4615
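The fitting error reported here is the Euclidean norm of the residual, which grows with the number of rows. As an optional sketch, a small helper (hypothetical, named `fit_error` here) can report the same norm alongside the root-mean-square error, which reads as a typical miss per observation. It is checked on the small 4 x 2 example from the first part, where the squared error was 44.

```r
# Hypothetical helper: summarize the fitting error of a least squares solution
fit_error <- function(A, b, x) {
  r <- b - A %*% x                 # residual vector
  c(norm = sqrt(sum(r^2)),         # Euclidean norm of the residual
    rmse = sqrt(mean(r^2)))        # norm divided by sqrt(number of rows)
}

# Check on the small example from the first part
A_small <- matrix(c(1, 1, 1, 1, 0, 1, 3, 4), nrow = 4)
b_small <- matrix(c(0, 8, 8, 20), nrow = 4)
x_small <- solve(crossprod(A_small), crossprod(A_small, b_small))

res <- fit_error(A_small, b_small, x_small)
res   # norm is sqrt(44), rmse is sqrt(44 / 4) = sqrt(11)
```

With the auto-mpg objects from the text, the equivalent call would be `fit_error(A, b, x1)`.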