Problem Set 1

(1) Consider the unsolvable system \(Ax = b\) as given below:

\[ \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 3 \\ 1 & 4 \\ \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \end{bmatrix} = \begin{bmatrix} 0 \\ 8 \\ 8 \\ 20 \\ \end{bmatrix} \]

Write an R Markdown script to compute \(A^TA\) and \(A^Tb\).

A <- matrix(c(1, 0, 1, 1, 1, 3, 1, 4), nrow=4, ncol=2, byrow=TRUE)

b <- c(0, 8, 8, 20)

(ATA <- t(A) %*% A)
##      [,1] [,2]
## [1,]    4    8
## [2,]    8   26
(ATb <- t(A) %*% b)
##      [,1]
## [1,]   36
## [2,]  112

\[ A^TA = \begin{bmatrix} 4 & 8 \\ 8 & 26 \\ \end{bmatrix} \]

\[ A^Tb = \begin{bmatrix} 36 \\ 112 \\ \end{bmatrix} \]

Solve for \(\hat{x}\) in R using the above two computed matrices.

The least squares solution for an unsolvable system of equations is given by the \(\hat{x}\) that makes the error as small as possible. From before, we know that this is the vector \(\hat{x}\) that solves the modified equation \(A^TA\hat{x} = A^Tb\).
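
This equation follows from requiring the error \(e = b - A\hat{x}\) to be orthogonal to every column of \(A\):

\[ A^T(b - A\hat{x}) = 0 \quad \Longrightarrow \quad A^TA\hat{x} = A^Tb \]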

We can solve for \(\hat{x}\) using the R solve command on the two computed matrices \(A^TA\) and \(A^Tb\).

x_hat <- solve(ATA, ATb)

\[ \hat{x} = \begin{bmatrix} 1 \\ 4 \\ \end{bmatrix} \]
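
As a quick check, substituting \(\hat{x} = (1, 4)\) back into the normal equations reproduces \(A^Tb\):

\[ \begin{bmatrix} 4 & 8 \\ 8 & 26 \\ \end{bmatrix} \begin{bmatrix} 1 \\ 4 \\ \end{bmatrix} = \begin{bmatrix} 4 + 32 \\ 8 + 104 \\ \end{bmatrix} = \begin{bmatrix} 36 \\ 112 \\ \end{bmatrix} \]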

# alternatively, this system of equations can be solved using the following options:

solve(t(A) %*% A, t(A) %*% b)
solve(crossprod(A), crossprod(A, b))
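
Base R can also compute the least squares solution directly from the QR decomposition, without forming \(A^TA\) at all; a minimal sketch (qr.solve returns the least squares solution when its first argument is a rectangular matrix):

qr.solve(A, b)   # least squares solution via QR; returns the same x_hat = (1, 4)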

What is the squared error of this solution?

\(e = b - A\hat{x}\)

The squared error is calculated as the sum of the squared differences between the actual values \(b\) and the predicted values \(A\hat{x}\). For any \(x\), the error decomposes into a column-space part and an orthogonal part:

\(||Ax - b||^2 = ||Ax - p||^2 + ||e||^2\)

We know that when we choose \(x = \hat{x}\), the first term on the right-hand side vanishes (since \(A\hat{x} = p\)), leaving us with only the orthogonal error term \(||e||^2\). This is the smallest error we can get.

\(||e||^2 = ||b - A\hat{x}||^2\)
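
We can verify the decomposition numerically for an arbitrarily chosen \(x\) (here \(x = (2, 3)\), picked purely for illustration):

x_any <- c(2, 3)                    # arbitrary x; any choice works
proj <- A %*% x_hat                 # projection of b onto the column space of A
sum((A %*% x_any - b)^2)            # ||Ax - b||^2 = 58
sum((A %*% x_any - proj)^2) + sum((b - proj)^2)   # ||Ax - p||^2 + ||e||^2 = 14 + 44 = 58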

error <- b - (A %*% x_hat)  #    (A %*% x_hat) = y_hat = predicted values

(error_squared <- sum((error)^2))
## [1] 44
# OR

error_squared <- sum((b - (A %*% x_hat))^2)

The squared error of this solution is 44.

Instead of \(b = [0; 8; 8; 20]\), start with \(p = [1; 5; 13; 17]\) and find the exact solution (i.e., show that this system is solvable because all of its equations are consistent with each other; this should result in an error vector \(e = 0\)).

\(Ax = b = p + e\), where \(p\) is the projection of \(b\) onto the column space of \(A\)

\[ p = \begin{bmatrix} 1 \\ 5 \\ 13 \\ 17 \\ \end{bmatrix} \]

# Ax = b = p + e

# solve the system using p in place of b
# we'll calculate the error term e after solving

p <- matrix(c(1,5,13,17), nrow=4)   

ATp <- t(A) %*% p; ATp
##      [,1]
## [1,]   36
## [2,]  112
# solve x_hat using p instead of b
x_hat.p <- solve(ATA, ATp)

# calculate the predicted values 
y_hat <- A %*% x_hat.p

If the predicted values are exact, then

\(\hat{y} = p + e\)

or, rearranging,

\(\hat{y} - p = e = 0\)

(e <- y_hat - p)
##      [,1]
## [1,]    0
## [2,]    0
## [3,]    0
## [4,]    0

As shown, the result for \(e\) is a 0 vector.
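
This is expected: \(p\) was chosen to lie exactly in the column space of \(A\). In fact, \(p = A\hat{x}\) for \(\hat{x} = (1, 4)\):

A %*% c(1, 4)   # reproduces p = (1, 5, 13, 17) exactly, so the error e must be 0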

Show that the error \(e = b - p = [-1; 3; -5; 3]\).

\[ e = \begin{bmatrix} 0 \\ 8 \\ 8 \\ 20 \\ \end{bmatrix} - \begin{bmatrix} 1 \\ 5 \\ 13 \\ 17 \\ \end{bmatrix} = \begin{bmatrix} -1 \\ 3 \\ -5 \\ 3 \\ \end{bmatrix} \]

\(e = b - p\) can be calculated in R by the following:

(e <- b - p)
##      [,1]
## [1,]   -1
## [2,]    3
## [3,]   -5
## [4,]    3

Show that the error e is orthogonal to p and to each of the columns of A.

If e and p are orthogonal, their dot product should equal 0.

Show that e and p are orthogonal:

(crossprod(e, p))   # or sum(e*p) or t(e) %*% p
##      [,1]
## [1,]    0

Show that e and column 1 of matrix \(A\) are orthogonal:

crossprod(e, A[, 1])   
##      [,1]
## [1,]    0

Show that e and column 2 of matrix \(A\) are orthogonal:

crossprod(e, A[, 2])    
##      [,1]
## [1,]    0
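
The column checks can also be combined into one step: orthogonality of \(e\) to every column of \(A\) is exactly the statement \(A^Te = 0\), the same condition that produced the normal equations:

crossprod(A, e)   # t(A) %*% e; a zero vector confirms orthogonality to both columns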

Problem Set 2

Consider the modified auto-mpg data (obtained from the UC Irvine Machine Learning dataset). This dataset contains 5 columns: displacement, horsepower, weight, acceleration, mpg. We are going to model mpg as a function of the other four variables. Write an R markdown script that takes in the auto-mpg data, extracts an A matrix from the first 4 columns and b vector from the fifth (mpg) column. Using the least squares approach, your code should compute the best fitting solution. That is, find the best fitting equation that expresses mpg in terms of the other 4 variables.

Finally, calculate the fitting error between the predicted mpg of your model and the actual mpg. Your script should be able to load in the 5 column data set, extract A and b, and perform the rest of the calculations. Please have adequate comments in your code to make it easy to follow your work.

# read the auto-mpg dataset into a dataframe
# NOTE: set the working directory below to wherever auto-mpg.data is stored

setwd("C:\\Users\\keith\\Documents\\DataScience\\CUNY\\DATA605\\Week5\\assign5")

auto_mpg <-  read.table("auto-mpg.data",  header=FALSE)

# prepare the matrices: columns 1-4 of the auto-mpg dataframe become the design matrix A
# (with a leading column of 1s for the intercept); the dependent variable b, mpg, is the 5th column

A <- cbind(constant = 1, as.matrix(auto_mpg[, 1:4]))  # design matrix
b <- auto_mpg[, 5]                                    # dependent variable


# parameter estimates via the normal equations
x.hat <- solve(t(A) %*% A) %*% t(A) %*% b

# OR, this can be solved more efficiently.

# As noted on CRAN, there is no need to calculate the inverse of a matrix explicitly
# when solving a linear system of equations; the two-argument form of the solve
# function solves the system directly. The above can be rewritten as:

x.hat <- solve(crossprod(A), crossprod(A,b))

x.hat
##                  [,1]
## constant 45.251139699
## V1       -0.006000871
## V2       -0.043607731
## V3       -0.005280508
## V4       -0.023147999
# predicted values    
y.hat <- A %*% x.hat
y.hat[1:5, 1]
## [1] 18.95919 16.18844 18.40325 18.47996 18.87827
# calculate the fitting error between the predicted mpg of your model and the actual mpg
squared.error <- sum((b - y.hat)^2)

The best fitting equation that expresses mpg in terms of the other 4 variables is given by:

\[ \widehat{mpg} = 45.2511397 - 0.0060009 \times displacement - 0.0436077 \times horsepower - 0.0052805 \times weight - 0.0231480 \times acceleration \]
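
As a sanity check, we can evaluate the fitted equation by hand for a single observation. Assuming the first row of the modified dataset matches the standard UCI auto-mpg record (displacement 307, horsepower 130, weight 3504, acceleration 12.0), the equation reproduces the first predicted value shown above:

# hand evaluation of the fitted equation for one observation
# (307, 130, 3504, 12.0 are assumed from the standard UCI first record)
45.2511397 - 0.0060009*307 - 0.0436077*130 - 0.0052805*3504 - 0.0231480*12
# approximately 18.96, in line with y.hat[1] = 18.95919 above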
We can compare these results to the values calculated by R's built-in lm command:
m1 <- lm(V5 ~ V1 + V2 + V3 + V4, auto_mpg)

summary(m1)
## 
## Call:
## lm(formula = V5 ~ V1 + V2 + V3 + V4, data = auto_mpg)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -11.378  -2.793  -0.333   2.193  16.256 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 45.2511397  2.4560447  18.424  < 2e-16 ***
## V1          -0.0060009  0.0067093  -0.894  0.37166    
## V2          -0.0436077  0.0165735  -2.631  0.00885 ** 
## V3          -0.0052805  0.0008109  -6.512  2.3e-10 ***
## V4          -0.0231480  0.1256012  -0.184  0.85388    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.247 on 387 degrees of freedom
## Multiple R-squared:  0.707,  Adjusted R-squared:  0.704 
## F-statistic: 233.4 on 4 and 387 DF,  p-value: < 2.2e-16
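
The two sets of coefficient estimates can also be lined up side by side; they agree up to floating point rounding:

cbind(x.hat, coef(m1))   # by-hand estimates next to lm's coefficients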

Note. While researching least squares calculations, I found some other quantities that are derived as part of the standard R lm command:

n <- nrow(A)
p <- ncol(A)

degrees_of_freedom <- n - p
    
# the residual variance is estimated as the squared error divided by the degrees of freedom

sigma2 <- sum((b - y.hat)^2)/(n - p)

# residual standard error

sqrt(sigma2)
## [1] 4.246723
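
The remaining pieces of the lm summary can be reproduced the same way; a minimal sketch using the quantities already computed (the coefficient standard errors come from the diagonal of \(\sigma^2 (A^TA)^{-1}\), and \(R^2\) compares the squared error to the total sum of squares):

# coefficient standard errors: sqrt of the diagonal of sigma2 * (A^T A)^(-1)
std.errors <- sqrt(diag(sigma2 * solve(crossprod(A))))
std.errors        # matches the Std. Error column of summary(m1)

# multiple R-squared: 1 - SSE/SST
SST <- sum((b - mean(b))^2)    # total sum of squares
1 - squared.error / SST        # matches Multiple R-squared: 0.707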