Problem Set 1

In this problem set we’ll work out some properties of the least squares solution that we reviewed in the weekly readings. Consider the unsolvable system \(Ax = b\) as given below:

\[ \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 3 \\ 1 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 8 \\ 8 \\ 20 \end{bmatrix} \]

Write R markdown script to compute \(A^T A\) and \(A^T b\)

#Prepare the matrix A, and the constraint vector b matrix.
A <- matrix(c(1,0,1,1,1,3,1,4), byrow=TRUE, nrow=4)
b <- matrix(c(0,8,8,20), nrow=4)

#Let's compute the $A^T A$ and $A^T b$
(ATA <- t(A) %*% A)
##      [,1] [,2]
## [1,]    4    8
## [2,]    8   26
(ATb <- t(A) %*% b)
##      [,1]
## [1,]   36
## [2,]  112

Solve for \(\hat{x}\) in R using the above two computed matrices.

From the formula \(A^T A \hat{x} = A^T b\) , the \(\hat{x} = (A^T A)^{-1} (A^T b)\)

(x_hat <- solve(ATA) %*% ATb)
##      [,1]
## [1,]    1
## [2,]    4
#Compare it with built-in R function 'lsfit' , which provides the least squares estimate of b, just like above.
x_hat_R <- lsfit(A, b, intercept=FALSE)
round(x_hat) == round(x_hat_R$coefficients)
##      [,1]
## [1,] TRUE
## [2,] TRUE

What is the squared error of this solution?

The projection \(p = \hat{x_1} a_1 + \hat{x_2} a_2\), where \(\hat{x_1} and \hat{x_2}\) indicate ‘estimates’ From the above, p can be written as , \(p=A \hat{x}\). We already have the \(\hat{x}\), lets find the projection vector p. Also, We know that, \(b = p + e\) , so the error vector \(e = b - p\) , which is what we need to minimize :

(p <- A %*% x_hat)
##      [,1]
## [1,]    1
## [2,]    5
## [3,]   13
## [4,]   17
#Now , lets compute the error vector e , which is b - p
(e <- b - p)
##      [,1]
## [1,]   -1
## [2,]    3
## [3,]   -5
## [4,]    3
# Compare the error vector, with built-in lsfit function - residuals.
round(e) == round(x_hat_R$residuals)
##      [,1]
## [1,] TRUE
## [2,] TRUE
## [3,] TRUE
## [4,] TRUE
#Find the squared error
(sq.e <- sum(e^2))
## [1] 44

Instead of b = [0; 8; 8; 20], start with p = [1; 5; 13; 17] and find the exact solution (i.e. show that this system is solvable as all equations are consistent with each other. This should result in an error vector e = 0).

#basically, the case where b=p.

#Let's first find the A^T p , we have already got the (A^T A)A -1.
p = matrix(c(1,5,13,17), nrow=4)
(ATp <- t(A) %*% p)
##      [,1]
## [1,]   36
## [2,]  112

Find x-hat, by finding \((A^T A)^{-1} (A^T p)\)

(x_hat_p <- solve(ATA) %*% ATp)
##      [,1]
## [1,]    1
## [2,]    4
#p=Ax, and then, calculate the error
(p2 <- A %*% x_hat_p)
##      [,1]
## [1,]    1
## [2,]    5
## [3,]   13
## [4,]   17
(error <- round(p2 - p))
##      [,1]
## [1,]    0
## [2,]    0
## [3,]    0
## [4,]    0
(sq.error <- round(sum(error^2), digits=6))
## [1] 0

So, from the above, we have solvable system of equations.

Show that the error e = b - p = [-1; 3;-5; 3].

(res.error <- b - p)
##      [,1]
## [1,]   -1
## [2,]    3
## [3,]   -5
## [4,]    3

Show that the error e is orthogonal to p and to each of the columns of A.

If 2 vectors are orthogonal , then their dot product is equal to zero.

#e and p are orthogonal
round(sum(e*p))
## [1] 0
#e and the columns of A are orthogonal
round(sum(e*A[,1]))
## [1] 0
round(sum(e*A[,2]))
## [1] 0

Problem Set 2

Consider the modified auto-mpg data (obtained from the UC Irvine Machine Learning dataset). This dataset contains 5 columns: displacement, horsepower, weight, acceleration, mpg. We are going to model mpg as a function of the other four variables.

data <- read.table('auto-mpg.data', 
                       col.names=c('displacement', 'horsepower', 'weight', 'acceleration', 'mpg'))

summary(data)
##   displacement     horsepower        weight      acceleration  
##  Min.   : 68.0   Min.   : 46.0   Min.   :1613   Min.   : 8.00  
##  1st Qu.:105.0   1st Qu.: 75.0   1st Qu.:2225   1st Qu.:13.78  
##  Median :151.0   Median : 93.5   Median :2804   Median :15.50  
##  Mean   :194.4   Mean   :104.5   Mean   :2978   Mean   :15.54  
##  3rd Qu.:275.8   3rd Qu.:126.0   3rd Qu.:3615   3rd Qu.:17.02  
##  Max.   :455.0   Max.   :230.0   Max.   :5140   Max.   :24.80  
##       mpg       
##  Min.   : 9.00  
##  1st Qu.:17.00  
##  Median :22.75  
##  Mean   :23.45  
##  3rd Qu.:29.00  
##  Max.   :46.60

Lets convert the data frame into \(A\) and \(b\) matrices

A <- data.matrix(data[,1:4])
A <- cbind(rep(1, nrow(A)), A)  #added a column of 1's for the intercepts
b <- data.matrix(data[,5])
head(A)
##        displacement horsepower weight acceleration
## [1,] 1          307        130   3504         12.0
## [2,] 1          350        165   3693         11.5
## [3,] 1          318        150   3436         11.0
## [4,] 1          304        150   3433         12.0
## [5,] 1          302        140   3449         10.5
## [6,] 1          429        198   4341         10.0
head(b)
##      [,1]
## [1,]   18
## [2,]   15
## [3,]   18
## [4,]   16
## [5,]   17
## [6,]   15

We got the required inputs, to calculate the coordinates \(\hat{x}\) which approximates the solution to the \(Ax = b\).

ATA <- t(A) %*% A
ATb <- t(A) %*% b

#Lets calculate the x_hat
(x_hat <- solve(ATA) %*% ATb)
##                      [,1]
##              45.251139699
## displacement -0.006000871
## horsepower   -0.043607731
## weight       -0.005280508
## acceleration -0.023147999

The above \(\hat{x}\) gives us the best fitting solution for the given auto mpg data, Lets find out the error.

p <- A %*% x_hat
e <- b - p
(sq_e <- sum(e^2))
## [1] 6979.413

Let’s match with R built-in function lsfit

lsr <- lsfit(A, b, intercept=FALSE)  #we already added 1st vector for intercept, so no need to add again.
lsr$coefficients
##              displacement   horsepower       weight acceleration 
## 45.251139699 -0.006000871 -0.043607731 -0.005280508 -0.023147999
round(x_hat) == round(lsr$coefficients)
##              [,1]
##              TRUE
## displacement TRUE
## horsepower   TRUE
## weight       TRUE
## acceleration TRUE