1. Problem Set 1

In this problem set we’ll work out some properties of the least squares solution
that we reviewed in the weekly readings. Consider the unsolvable system Ax = b as given below:
A<- matrix(c(1,0,1,1,1,3,1,4), nrow=4, ncol=2, byrow=TRUE)
A
##      [,1] [,2]
## [1,]    1    0
## [2,]    1    1
## [3,]    1    3
## [4,]    1    4
b<- matrix(c(0,8,8,20), nrow=4, ncol=1, byrow=TRUE)
b
##      [,1]
## [1,]    0
## [2,]    8
## [3,]    8
## [4,]   20
# Write R markdown script to compute t(A)A and t(A)b.
# t(A)A
t(A)%*%A
##      [,1] [,2]
## [1,]    4    8
## [2,]    8   26
#t(A)b
t(A)%*%b 
##      [,1]
## [1,]   36
## [2,]  112
# Solve for x-hat (the least squares solution) in R using the two matrices computed above.

least_x <- solve(t(A)%*%A, t(A)%*%b )
least_x
##      [,1]
## [1,]    1
## [2,]    4
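As a cross-check on the result above, the same solution can be obtained without writing out the normal equations by hand. The sketch below is an illustration only, assuming base R: crossprod(A) is equivalent to t(A) %*% A, and qr.solve() returns a least squares fit when the system is overdetermined. The names x_normal and x_qr are introduced here for illustration.

```r
# Same A and b as above, redefined so this block runs on its own
A <- matrix(c(1, 0, 1, 1, 1, 3, 1, 4), nrow = 4, ncol = 2, byrow = TRUE)
b <- matrix(c(0, 8, 8, 20), nrow = 4, ncol = 1)

# Normal equations: crossprod(A) is t(A) %*% A, crossprod(A, b) is t(A) %*% b
x_normal <- solve(crossprod(A), crossprod(A, b))

# QR-based least squares, which avoids forming t(A) %*% A explicitly
x_qr <- qr.solve(A, b)

# Compare the two solutions as plain vectors, up to floating point tolerance
all.equal(as.vector(x_normal), as.vector(x_qr))
```

For larger or poorly conditioned problems the QR route is usually the more numerically stable choice, because forming t(A) %*% A squares the condition number of A.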
# What is the squared error of this solution?
# A least_x - b = e
# ||e||^2 = ||A least_x - b||^2

squared_error<- t((A%*%least_x - b)) %*% (A%*%least_x - b)
squared_error
##      [,1]
## [1,]   44
# or, equivalently, with the sign of the residual flipped
squared_error<- t((b- A%*%least_x)) %*% (b- A%*%least_x) 
squared_error
##      [,1]
## [1,]   44
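The value 44 can be confirmed by hand: the residual is e = b - A x-hat = [-1; 3; -5; 3], so ||e||^2 = 1 + 9 + 25 + 9 = 44. A slightly shorter way to compute it, sketched here with illustrative names (x_hat, sq_err) that are not part of the original script, is to sum the squared residuals directly, which returns a plain number rather than a 1 x 1 matrix.

```r
# Same A and b as above, redefined so this block runs on its own
A <- matrix(c(1, 0, 1, 1, 1, 3, 1, 4), nrow = 4, ncol = 2, byrow = TRUE)
b <- matrix(c(0, 8, 8, 20), nrow = 4, ncol = 1)

x_hat <- solve(crossprod(A), crossprod(A, b))  # least squares solution
e <- b - A %*% x_hat                           # residual vector
sq_err <- sum(e^2)                             # same quantity as t(e) %*% e
sq_err
```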
# Instead of b = [0; 8; 8; 20], start with p = [1; 5; 13; 17] and find the exact solution
# (i.e. show that this system is solvable, as all equations are consistent with each
# other. This should result in an error vector e = 0).


p <- matrix(c(1,5,13,17),nrow=4)
p
##      [,1]
## [1,]    1
## [2,]    5
## [3,]   13
## [4,]   17
# Use rref() from the pracma package to compute the reduced row echelon form of the
# augmented matrix [A | p] (Gauss-Jordan elimination with partial pivoting)

library(pracma)
new_eq<- cbind(A,p)
new_eq
##      [,1] [,2] [,3]
## [1,]    1    0    1
## [2,]    1    1    5
## [3,]    1    3   13
## [4,]    1    4   17
new_sol<- rref(new_eq)
new_sol
##      [,1] [,2] [,3]
## [1,]    1    0    1
## [2,]    0    1    4
## [3,]    0    0    0
## [4,]    0    0    0
# For p = [1; 5; 13; 17], from new_sol, we see that x1 = 1 and x2= 4

# Show that this system is solvable as all equations are consistent with each
# other: substitute x1 = 1 and x2 = 4 into the original system with p = [1; 5; 13; 17]

new_eq
##      [,1] [,2] [,3]
## [1,]    1    0    1
## [2,]    1    1    5
## [3,]    1    3   13
## [4,]    1    4   17
#we see that: 
x1= 1
x2 = 4

c((x1 + 0 == 1),(x1 + x2 == 5),(x1 + 3 * x2 == 13),(x1 + 4 * x2 == 17))
## [1] TRUE TRUE TRUE TRUE
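Another way to test consistency, offered here as a supplementary sketch rather than the method used above, is to compare ranks: Ax = p has an exact solution precisely when appending p to A does not increase the rank. The helper rank_of() is a hypothetical name introduced for this example; it relies on the rank reported by base R's qr().

```r
# Same A, b and p as above, redefined so this block runs on its own
A <- matrix(c(1, 0, 1, 1, 1, 3, 1, 4), nrow = 4, ncol = 2, byrow = TRUE)
b <- matrix(c(0, 8, 8, 20), nrow = 4, ncol = 1)
p <- matrix(c(1, 5, 13, 17), nrow = 4, ncol = 1)

# Hypothetical helper: numerical rank from the QR decomposition
rank_of <- function(M) qr(M)$rank

# p lies in the column space of A, so the augmented matrix keeps the same rank
rank_of(A) == rank_of(cbind(A, p))

# b does not, so appending it raises the rank and Ax = b has no exact solution
rank_of(A) == rank_of(cbind(A, b))
```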
# Show that the error e = b - p = [-1; 3; -5; 3].

e<- b - p
e
##      [,1]
## [1,]   -1
## [2,]    3
## [3,]   -5
## [4,]    3
# To show that the error e is orthogonal to p and to each of the columns of A,
# we show that the dot product of e with p is zero, and likewise that the dot
# product of e with each column of A is zero:



(t(e) %*% p ==0)
##      [,1]
## [1,] TRUE
(t(e) %*% A[,1] == 0)
##      [,1]
## [1,] TRUE
(t(e) %*% A[,2] == 0)
##      [,1]
## [1,] TRUE
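The three checks above are connected: p = A x-hat is a linear combination of the columns of A, so once e is orthogonal to every column of A it is automatically orthogonal to p. The sketch below does all column checks in one call with crossprod(A, e), which is t(A) %*% e, and compares against zero with a small tolerance rather than ==, since exact equality can fail for non-integer data. The tolerance 1e-10 is an arbitrary choice for illustration. It also confirms the Pythagorean relation ||b||^2 = ||p||^2 + ||e||^2 that follows from the orthogonality (528 = 484 + 44).

```r
# Same A, b and p as above, redefined so this block runs on its own
A <- matrix(c(1, 0, 1, 1, 1, 3, 1, 4), nrow = 4, ncol = 2, byrow = TRUE)
b <- matrix(c(0, 8, 8, 20), nrow = 4, ncol = 1)
p <- matrix(c(1, 5, 13, 17), nrow = 4, ncol = 1)
e <- b - p

tol <- 1e-10  # illustrative tolerance, not from the original script

# e is orthogonal to both columns of A at once: t(A) %*% e is the zero vector
max(abs(crossprod(A, e))) < tol

# e is orthogonal to the projection p
abs(crossprod(p, e)) < tol

# Pythagoras: the squared lengths of p and e add up to that of b
abs(sum(b^2) - (sum(p^2) + sum(e^2))) < tol
```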

2. Problem Set 2

Consider the modified auto-mpg data (obtained from the UC Irvine Machine Learning
dataset). This dataset contains 5 columns: displacement, horsepower, weight, acceleration,
mpg. We are going to model mpg as a function of the other four variables.
Write an R markdown script that takes in the auto-mpg data, extracts an A matrix
from the first 4 columns and b vector from the fifth (mpg) column. Using the least squares
approach, your code should compute the best fitting solution. That is, find the best fitting
equation that expresses mpg in terms of the other 4 variables. Finally, calculate the fitting
error between the predicted mpg of your model and the actual mpg. Your script should
be able to load in the 5 column data set, extract A and b, and perform the rest of the
calculations. Please have adequate comments in your code to make it easy to follow your
work.
Please complete both problem set 1 & problem set 2 in one R markdown document
and upload it to the site. You don’t have to attach the auto-mpg data. Just write your
markdown document in such a way that it expects and loads the auto-mpg data file from the

Solution Problem Set 2

Data loading and matrix conversion

datapath <- "C:/CUNY/Courses/IS605/Assignments/Assignment05/auto-mpg.DATA"
autompg_data <- scan(datapath)
#str(autompg_data)

autompg_data_mtrx <- t(matrix(autompg_data, nrow = 5))
#head(autompg_data_mtrx)
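The reshaping step works because scan() reads the file row by row into one long vector, while matrix() fills column by column: with nrow = 5 each column holds one record, and the transpose turns records back into rows. The sketch below illustrates this on a temporary stand-in file. The three rows are made-up placeholder values, not taken from the auto-mpg data, and the whitespace-separated layout (displacement, horsepower, weight, acceleration, mpg) is an assumption based on the problem statement.

```r
# Stand-in for the real file: three MADE-UP rows in the assumed 5-column layout
tmp <- tempfile(fileext = ".data")
writeLines(c("300 130 3500 12.0 18",
             "350 165 3700 11.5 15",
             "120  90 2400 15.0 27"), tmp)

vals <- scan(tmp, quiet = TRUE)              # one long vector, read row by row
m1 <- t(matrix(vals, nrow = 5))              # the approach used above
m2 <- matrix(vals, ncol = 5, byrow = TRUE)   # equivalent, without the transpose
identical(m1, m2)

# Alternative: read.table() keeps the columns named, which helps with lm() later
df <- read.table(tmp, col.names = c("displacement", "horsepower", "weight",
                                    "acceleration", "mpg"))
str(df)
```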

Finding the least squares

##           [,1]        [,2]       [,3]       [,4]
## [1,]  19097634   9374647.0  259345480  1123011.9
## [2,]   9374647   4857524.0  132989885   607832.3
## [3,] 259345480 132989885.0 3757575489 17758103.6
## [4,]   1123012    607832.3   17758104    97656.9
##            [,1]
## [1,]  1529685.9
## [2,]   868718.8
## [3,] 25209061.4
## [4,]   146401.4
##              [,1]
## [1,] -0.030037938
## [2,]  0.157115685
## [3,] -0.006217883
## [4,]  1.997320955
## [1] -0.03003794
## [1] 0.1571157
## [1] -0.006217883
## [1] 1.997321
The least squares coefficients:
C1 = -0.030037938
C2 = 0.157115685
C3 = -0.006217883
C4 = 1.997320955
The best fitting equation that expresses mpg in terms of the other 4 variables:
mpg = C1 * displacement + C2 * horsepower + C3 * weight + C4 * acceleration
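The code that produced the matrices and coefficients in this section is not shown in the output, so the following is only a sketch of one way to carry out the steps, not the original script. The helper least_squares_fit() and its return fields are hypothetical names. Because the data file is not bundled here, the helper is exercised on the small system from Problem Set 1, where the expected answers (x-hat = [1; 4], squared error 44) are already known; the commented lines show how it would be applied to autompg_data_mtrx. As a rough guide to the fitting error on the real data, the residual standard error of 5.811 on 388 degrees of freedom reported by lm() in this document implies a sum of squared residuals of about 5.811^2 * 388, roughly 13,100.

```r
# Hypothetical helper: least squares via the normal equations t(A) A x = t(A) b
least_squares_fit <- function(A, b) {
  AtA <- crossprod(A)            # t(A) %*% A
  Atb <- crossprod(A, b)         # t(A) %*% b
  x_hat <- solve(AtA, Atb)       # best fitting coefficients
  residuals <- b - A %*% x_hat   # actual minus predicted
  list(coef = x_hat,
       sse  = sum(residuals^2),          # squared fitting error ||b - A x_hat||^2
       rmse = sqrt(mean(residuals^2)))   # root mean squared error
}

# Check on the Problem Set 1 system, where the answer is known
A <- matrix(c(1, 0, 1, 1, 1, 3, 1, 4), nrow = 4, ncol = 2, byrow = TRUE)
b <- c(0, 8, 8, 20)
fit <- least_squares_fit(A, b)
fit$coef
fit$sse

# On the auto-mpg matrix (assumes autompg_data_mtrx was loaded as above):
# fit_mpg <- least_squares_fit(autompg_data_mtrx[, 1:4], autompg_data_mtrx[, 5])
# fit_mpg$coef   # C1..C4
# fit_mpg$sse    # fitting error
```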

Validation and checking using the linear regression function lm()

## 
## Call:
## lm(formula = mpg ~ 0 + displacement + horsepower + weight + acceleration)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -19.719  -2.894   0.136   3.590  20.747 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## displacement -0.030038   0.009005  -3.336 0.000933 ***
## horsepower    0.157116   0.017090   9.193  < 2e-16 ***
## weight       -0.006218   0.001107  -5.615 3.75e-08 ***
## acceleration  1.997321   0.083790  23.837  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.811 on 388 degrees of freedom
## Multiple R-squared:  0.9453, Adjusted R-squared:  0.9447 
## F-statistic:  1675 on 4 and 388 DF,  p-value: < 2.2e-16

## displacement   horsepower       weight acceleration 
## -0.030037938  0.157115685 -0.006217883  1.997320955
##              [,1]
## [1,] -0.030037938
## [2,]  0.157115685
## [3,] -0.006217883
## [4,]  1.997320955

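Note that the lm() coefficients match the least squares solution only when the intercept is forced to 0 (the "0 +" term in the formula).

If the intercept is not forced to 0, lm() also fits a constant term, for which the matrix A has no column, so the coefficients do not match.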
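The role of the intercept can be illustrated on the small system from Problem Set 1, whose matrix A already contains a column of ones. In the sketch below (variable names are illustrative), lm() with its default intercept reproduces the least squares solution [1; 4] of that system, because the intercept plays the part of the ones column. Forcing the intercept to 0 drops that column and gives a different slope, 112/26. By the same reasoning, an lm() fit with an intercept on the auto-mpg data would correspond to least squares on cbind(1, A) rather than on A.

```r
# Second column of A from Problem Set 1, and the same b
t_col <- c(0, 1, 3, 4)
b <- c(0, 8, 8, 20)

# Default lm(): intercept plus slope, equivalent to A = [1, t_col]
with_int <- coef(lm(b ~ t_col))
with_int

# Intercept forced to 0: equivalent to A = [t_col] only
no_int <- coef(lm(b ~ 0 + t_col))
no_int

# Least squares with an explicit column of ones matches the default lm() fit
ls_with_ones <- qr.solve(cbind(1, t_col), b)
all.equal(unname(with_int), as.vector(ls_with_ones))
```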