1. Problem Set 1

In this problem set we’ll work out some properties of the least squares solution that we reviewed in the weekly readings. Consider the unsolvable system Ax = b, where A = [1 0; 1 1; 1 3; 1 4] and b = [0; 8; 8; 20].

Write an R Markdown script to compute AᵀA and Aᵀb.

A <- matrix(c(1,1,1,1,0,1,3,4), nrow = 4)
b <- matrix(c(0,8,8,20), nrow = 4)
# calculating A transpose * A
ata <- t(A) %*% A
ata
##      [,1] [,2]
## [1,]    4    8
## [2,]    8   26
# Calculating A transpose * b
atb <- t(A) %*% b
atb
##      [,1]
## [1,]   36
## [2,]  112
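Base R's `crossprod()` computes `t(x) %*% y` without forming the transpose explicitly. A minimal check, re-entering the same A and b, that it reproduces both products:

```r
# Same A and b as above
A <- matrix(c(1, 1, 1, 1, 0, 1, 3, 4), nrow = 4)
b <- matrix(c(0, 8, 8, 20), nrow = 4)

# crossprod(x) is t(x) %*% x; crossprod(x, y) is t(x) %*% y
ata_cp <- crossprod(A)
atb_cp <- crossprod(A, b)

all.equal(ata_cp, t(A) %*% A)  # TRUE
all.equal(atb_cp, t(A) %*% b)  # TRUE
```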
Solve for x̂ in R using the two matrices computed above.
x <- solve(ata, atb) # solve the normal equations (AᵀA) x = Aᵀb
x
##      [,1]
## [1,]    1
## [2,]    4
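Solving the normal equations works here, but forming AᵀA squares the condition number of A. A QR-based least-squares solve avoids that step; base R's `qr.solve()` returns the least-squares solution when A has more rows than columns. A sketch checking that it agrees with the normal-equations answer:

```r
A <- matrix(c(1, 1, 1, 1, 0, 1, 3, 4), nrow = 4)
b <- matrix(c(0, 8, 8, 20), nrow = 4)

# Normal equations: (A^T A) x = A^T b
x_normal <- solve(crossprod(A), crossprod(A, b))

# QR-based least squares: minimizes ||A x - b|| without forming A^T A
x_qr <- qr.solve(A, b)

x_qr                        # should be (1, 4), matching solve(ata, atb)
all.equal(x_normal, x_qr)
```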
What is the squared error of this solution?
# p = A x̂ is the projection of b onto the column space of A
p <- A %*% x
# calculate the error vector e = b - p
e <- b - p
sq_err <- sum(e^2)
sq_err
## [1] 44
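Because p = Ax̂ is the orthogonal projection of b onto the column space of A, the residual e is orthogonal to p, so ‖b‖² = ‖p‖² + ‖e‖². A quick arithmetic check of the squared error with these numbers:

```r
A <- matrix(c(1, 1, 1, 1, 0, 1, 3, 4), nrow = 4)
b <- matrix(c(0, 8, 8, 20), nrow = 4)
x <- solve(crossprod(A), crossprod(A, b))
p <- A %*% x

# Pythagorean identity for an orthogonal projection: ||b||^2 = ||p||^2 + ||e||^2
norm_b2 <- sum(b^2)   # 0 + 64 + 64 + 400 = 528
norm_p2 <- sum(p^2)   # 1 + 25 + 169 + 289 = 484 (up to rounding)
norm_b2 - norm_p2     # equals sq_err = 44 (up to floating-point rounding)
```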
Instead of b = [0; 8; 8; 20], start with p = [1; 5; 13; 17] and find the exact solution (i.e., show that this system is solvable because all of its equations are consistent with each other; this should result in an error vector e = 0).
p <- matrix(c(1,5,13,17), nrow = 4)
atp <- t(A) %*% p
x1<-solve(ata,atp)
x1
##      [,1]
## [1,]    1
## [2,]    4
e <- p - A %*% x1 # error should be zero
e
##      [,1]
## [1,]    0
## [2,]    0
## [3,]    0
## [4,]    0
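Another way to see why the system with p is solvable while the original one is not: Ac = y has an exact solution exactly when y lies in the column space of A, i.e. when appending y as an extra column does not increase the rank. A sketch using the rank reported by base R's `qr()`:

```r
A <- matrix(c(1, 1, 1, 1, 0, 1, 3, 4), nrow = 4)
b <- matrix(c(0, 8, 8, 20), nrow = 4)
p <- matrix(c(1, 5, 13, 17), nrow = 4)

qr(A)$rank             # 2
qr(cbind(A, p))$rank   # 2: p = 1*A[,1] + 4*A[,2] lies in the column space
qr(cbind(A, b))$rank   # 3: b lies outside it, so Ax = b has no exact solution
```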
Show that the error e = b − p = [−1; 3; −5; 3].
e <- b - p
e
##      [,1]
## [1,]   -1
## [2,]    3
## [3,]   -5
## [4,]    3
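The same p and e also follow directly from the projection matrix P = A(AᵀA)⁻¹Aᵀ, with p = Pb and e = (I − P)b. A sketch (the names `P`, `p_proj` and `e_proj` are illustrative) that builds P for this A and checks the two defining properties of an orthogonal projection, P² = P and Pᵀ = P:

```r
A <- matrix(c(1, 1, 1, 1, 0, 1, 3, 4), nrow = 4)
b <- matrix(c(0, 8, 8, 20), nrow = 4)

# Orthogonal projection onto the column space of A
P <- A %*% solve(crossprod(A)) %*% t(A)

p_proj <- P %*% b               # projection of b, should be (1, 5, 13, 17)
e_proj <- (diag(4) - P) %*% b   # residual, should be (-1, 3, -5, 3)

all.equal(P %*% P, P)  # idempotent
all.equal(t(P), P)     # symmetric
```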
Show that the error e is orthogonal to p and to each of the columns of A.
# Two vectors are orthogonal when their dot product is zero (round() removes any floating-point noise)
round(sum(e*p))
## [1] 0
round(sum(e*A[,1]))
## [1] 0
round(sum(e*A[,2]))
## [1] 0
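The two column checks can be done in one step: the normal equations AᵀAx̂ = Aᵀb are equivalent to Aᵀ(b − Ax̂) = 0, so `crossprod(A, e)` should be the zero vector. Orthogonality to p then follows, since p = Ax̂ is a combination of the columns of A.

```r
A <- matrix(c(1, 1, 1, 1, 0, 1, 3, 4), nrow = 4)
b <- matrix(c(0, 8, 8, 20), nrow = 4)
x <- solve(crossprod(A), crossprod(A, b))
e <- b - A %*% x

# Normal equations rewritten: A^T (b - A x) = 0
round(crossprod(A, e), 10)   # both entries are 0
```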

2. Problem Set 2

Consider the modified auto-mpg data (obtained from the UC Irvine Machine Learning Repository). This dataset contains 5 columns: displacement, horsepower, weight, acceleration, and mpg. We are going to model mpg as a function of the other four variables.

car_mpg <- read.table("~/Desktop/mpg_car.txt",col.names=c('disp', 'hp', 'wt', 'acc', 'mpg'))
View(car_mpg)

Write an R Markdown script that takes in the auto-mpg data, extracts an A matrix from the first 4 columns and a b vector from the fifth (mpg) column.

# convert into matrices
A <- as.matrix(car_mpg[,1:4])
b <- as.matrix(car_mpg[,5])
head(A)
##      disp  hp   wt  acc
## [1,]  307 130 3504 12.0
## [2,]  350 165 3693 11.5
## [3,]  318 150 3436 11.0
## [4,]  304 150 3433 12.0
## [5,]  302 140 3449 10.5
## [6,]  429 198 4341 10.0
head(b)
##      [,1]
## [1,]   18
## [2,]   15
## [3,]   18
## [4,]   16
## [5,]   17
## [6,]   15

Using the least squares approach, your code should compute the best-fitting solution. That is, find the best-fitting equation that expresses mpg in terms of the other 4 variables.

# Normal equations: x = (AᵀA)⁻¹ Aᵀb
ATA <- t(A) %*% A
ATB <- t(A) %*% b
x1 <- solve(ATA) %*% ATB
x1
##              [,1]
## disp -0.030037938
## hp    0.157115685
## wt   -0.006217883
## acc   1.997320955
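As a cross-check, R's `lm()` should reproduce these coefficients. Because A has no column of ones, the matching formula drops the intercept with `- 1`. The sketch below uses a small synthetic data frame `sim` with made-up values in place of `car_mpg`, so it runs without `mpg_car.txt`; with the real data, substitute `data = car_mpg`.

```r
set.seed(1)
n <- 50
# Hypothetical stand-in for car_mpg: same column names, made-up values
sim <- data.frame(
  disp = runif(n, 70, 450),
  hp   = runif(n, 45, 230),
  wt   = runif(n, 1600, 5100),
  acc  = runif(n, 8, 25)
)
sim$mpg <- 40 - 0.01 * sim$disp - 0.05 * sim$hp - 0.004 * sim$wt + 0.2 * sim$acc + rnorm(n)

A_sim <- as.matrix(sim[, 1:4])
b_sim <- as.matrix(sim[, 5])

# Normal-equations solution, as in the script above
x_ne <- solve(crossprod(A_sim), crossprod(A_sim, b_sim))

# Same model via lm(); "- 1" removes the intercept so the design matrix equals A
fit <- lm(mpg ~ disp + hp + wt + acc - 1, data = sim)

cbind(normal_eq = as.vector(x_ne), lm = coef(fit))
all.equal(as.vector(x_ne), unname(coef(fit)), tolerance = 1e-6)
```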
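
The fitted coefficients give the best-fitting equation (with no intercept term, since A has no column of ones):
mpg ≈ −0.0300·disp + 0.1571·hp − 0.00622·wt + 1.9973·acc
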
Finally, calculate the fitting error between the predicted mpg of your model and the actual mpg. Your script should be able to load in the 5-column data set, extract A and b, and perform the rest of the calculations.

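# Fitting error: Euclidean norm of the residual, sqrt(sum((Ax - b)^2)); the outer parentheses also print it
(e1 <- sqrt(sum(((A %*% x1) - b)^2)))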
## [1] 114.4615
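The task asks for a script that loads the 5-column file, extracts A and b, and performs the rest of the calculations. One way to package those steps is sketched below as a hypothetical helper `fit_mpg()` (the function name and the returned field names are illustrative, not part of the assignment). It reads the file with the same `read.table()` call and returns the coefficients together with both error measures: the squared error `sum((Ax - b)^2)` and its square root, which is what `e1` above reports.

```r
# Hypothetical helper: load the 5-column file and fit mpg by least squares
fit_mpg <- function(path) {
  car_mpg <- read.table(path, col.names = c("disp", "hp", "wt", "acc", "mpg"))
  A <- as.matrix(car_mpg[, 1:4])              # predictors
  b <- as.matrix(car_mpg[, 5])                # mpg
  x <- solve(crossprod(A), crossprod(A, b))   # normal equations
  r <- A %*% x - b                            # residual vector
  list(coef = x, sq_err = sum(r^2), err_norm = sqrt(sum(r^2)))
}

# Usage with the file path from above:
# res <- fit_mpg("~/Desktop/mpg_car.txt")
# res$coef
# res$err_norm
```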