Assignment 5

Problem Set 1

In this problem set we’ll work out some properties of the least squares solution that we reviewed in the weekly readings. Consider the unsolvable system Ax = b as given below: [ \[\begin{bmatrix} 1 & 0\\ 1 & 1\\ 1 & 3\\ 1 & 4\\ \end{bmatrix}\begin{bmatrix} x_1\\ x_2\\ \end{bmatrix}\] = \[\begin{bmatrix} 0\\ 8\\ 8\\ 20\\ \end{bmatrix}\]

]

  • Write R markdown script to compute \[A^TA\] and \[A^Tb\].
A <- matrix(c(1, 1, 1, 1, 0, 1, 3, 4), nrow = 4, byrow = FALSE) #build A
b <- matrix(c(0, 8, 8, 20), nrow = 4) #build b
A.tran <- t(A)#use the built in function to find A transpose

ATA <- A.tran%*%A #calculate A transpose multiplied by A
ATA
##      [,1] [,2]
## [1,]    4    8
## [2,]    8   26
ATb <- A.tran%*%b #calculate A transpose multiplied by b
ATb
##      [,1]
## [1,]   36
## [2,]  112
  • Solve for \[\hat{x}\] in R using the above two computed matrices.

    • To solve the above, we will use equation \[A^TA*\hat{x} = A^Tb\]. We will essentially just plug in what we have from above to find \[\hat{x}\] = \[A^Tb/A^TA\]
x.hat <- solve(ATA, ATb) #use the built in function to find xhat
x.hat
##      [,1]
## [1,]    1
## [2,]    4
  • What is the squared error of this solution
error <- b - A%*%x.hat #this equation is straight from the weeks reading 
error
##      [,1]
## [1,]   -1
## [2,]    3
## [3,]   -5
## [4,]    3
error.square <- sum(error^2)#square error to find error squared
error.square
## [1] 44

 + Instead of b = [0; 8; 8; 20], start with p = [1; 5; 13; 17] and find the exact solution(i.e. show that this system is solvable as all equations are consistent with each other. This should result in an error vector e = 0).

p <- matrix(c(1, 5, 13, 17), nrow = 4)#build p
ATp <- A.tran%*%p #find A transpose p
p.xhat <- solve(ATA, ATp) #find the xhat associated with equation when using p  
error.p <- p - A%*%p.xhat #use the above equation to find error when using p
error.p
##      [,1]
## [1,]    0
## [2,]    0
## [3,]    0
## [4,]    0
errorp.square <- sum(error^2)
errorp.square
## [1] 44
  • Show that the error e = b - p = [-1;3;-5;3].
e <- b - p #simply plug in our variables
e
##      [,1]
## [1,]   -1
## [2,]    3
## [3,]   -5
## [4,]    3

 + Show that the error e is orthogonal to p and to each of the columns of A. + Two vectors are orthogonal when their dot products are 0.

require(Matrix) #for the crossprod function to check that the dot products are 0
## Loading required package: Matrix
#break A down in to it's columns
col1 <- matrix(c(1, 1, 1, 1))
col2 <- matrix(c(0, 1, 3, 4))
e.p <- crossprod(e,p)
e.p
##      [,1]
## [1,]    0
e.col1 <- crossprod(e, col1)
e.col1 
##      [,1]
## [1,]    0
e.col2 <- crossprod(e, col2)
e.col2 
##      [,1]
## [1,]    0

Your code should be able to print all of the above requested quantities. Please include enough comments to make it easy to follow your R markdown document.

Problem Set 2

Consider the modified auto-mpg data (obtained from the UC Irvine Machine Learning dataset). This dataset contains 5 columns: displacement, horsepower, weight, acceleration,mpg. We are going to model mpg as a function of the other four variables. Write an R markdown script that takes in the auto-mpg data, extracts an A matrix from the first 4 columns and b vector from the fifth (mpg) column. Using the least squares approach, your code should compute the best fitting solution. That is,find the best fitting equation that expresses mpg in terms of the other 4 variables. Finally, calculate the fitting error between the predicted mpg of your model and the actual mpg. Your script should be able to load in the 5 column data set, extract A and b, and perform the rest of the calculations. Please have adequate comments in your code to make it easy to follow your work. Please complete both problem set 1 & problem set 2 in one R markdown document and upload it to the site. You don’t have to attach the auto-mpg data. Just write your markdown document in such a way that it expects and loads the auto-mpg data file from the current working directoy. As always, your code is expected to compile and run successfully. Adding test cases to demostrate that your code is working will be very helpful.

require(RCurl)
## Loading required package: RCurl
## Loading required package: bitops
require(plyr)
## Loading required package: plyr
data <- getURL('https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data') #I was unable to load data from zip file so I loaded from URL
auto.data <- read.table(textConnection(data),header=F)
auto.data <- auto.data[c(1:5)] #since we only needed the first 5 columns, I created a subset
colnames(auto.data) <- c( 'displacement', 'horsepower','weight', 'acceleration', 'mpg')
head(auto.data)#the class for our data is dataframe and we eventually want numerical matrix
##   displacement horsepower weight acceleration  mpg
## 1           18          8    307        130.0 3504
## 2           15          8    350        165.0 3693
## 3           18          8    318        150.0 3436
## 4           16          8    304        150.0 3433
## 5           17          8    302        140.0 3449
## 6           15          8    429        198.0 4341

Now that we have loaded our data, we will start my creating our matrix A.

A.auto <- data.matrix(auto.data[,1:4])#create the proper subset for A
intercept <- rep(1, nrow(A.auto))
A.auto <- cbind(intercept, A.auto) #had to go back to add this as my solution was incorrect initially
b.auto <- data.matrix(auto.data[,5]) #do the same for b
#intially tried creating matrix and subset in two steps but the class of my data got compromised

One we have our matrices (A and b), we can compute \[A^TA, A^Tb, \hat{x}\]. We will use our \[\hat{x}\] values as our coeffients.

A.trans <- t(A.auto)
ATA <- A.trans %*% A.auto
ATb <- A.trans %*% b.auto
xhat.auto <- solve(ATA, ATb) #use the built in function to find xhat
round(xhat.auto, digits = 2)
##                 [,1]
## intercept    2378.20
## displacement  -24.54
## horsepower     21.93
## weight          5.64
## acceleration   -0.82
#alternatively we could use the following
lm(b.auto ~ A.auto)
## 
## Call:
## lm(formula = b.auto ~ A.auto)
## 
## Coefficients:
##        (Intercept)     A.autointercept  A.autodisplacement  
##          2378.2041                  NA            -24.5435  
##   A.autohorsepower        A.autoweight  A.autoacceleration  
##            21.9349              5.6440             -0.8173

Our equation: 2378.20 - 24.54 displacement + 21.93 horsepower + 5.64 weight - .82 acceleration = mpg

error.auto <- b.auto - A.auto%*%xhat.auto
error.auto
##                 [,1]
##   [1,]  -326.7258176
##   [2,]  -439.3381511
##   [3,]  -447.0022398
##   [4,]  -420.0727273
##   [5,]  -372.3277912
##   [6,]  -231.4960729
##   [7,]  -380.0539623
##   [8,]  -343.8547182
##   [9,]  -313.8806676
##  [10,]  -504.0131285
##  [11,]  -753.9568468
##  [12,]  -492.2586067
##  [13,]  -658.4442407
##  [14,] -1652.8806676
##  [15,]   -68.2980584
##  [16,]  -179.9984690
##  [17,]  -341.1819739
##  [18,]  -470.0034474
##  [19,]  -152.0841464
##  [20,]  -499.4171095
##  [21,]   266.6389107
##  [22,]    19.4795095
##  [23,]    10.0418486
##  [24,]  -268.5675505
##  [25,]  -399.2727252
##  [26,]   312.4944623
##  [27,]   370.1766509
##  [28,]   340.2704011
##  [29,]   716.9305690
##  [30,]  -152.0841464
##  [31,]  -234.5997227
##  [32,]  -187.7545245
##  [33,]  -358.6541563
##  [34,]  -717.2693977
##  [35,]    56.0602991
##  [36,]  -172.9492117
##  [37,]   -83.8405794
##  [38,]   -87.8129317
##  [39,]    52.1183149
##  [40,]    27.3682507
##  [41,]   -11.7950711
##  [42,]   114.8236245
##  [43,]   566.0472237
##  [44,]   284.0073807
##  [45,]   678.8247168
##  [46,]  -555.6539925
##  [47,]  -251.7556408
##  [48,]  -170.8621439
##  [49,]  -271.3841134
##  [50,]  -302.9939908
##  [51,]  -240.1427278
##  [52,]   -46.7555131
##  [53,]  -102.4652054
##  [54,]  -282.1463283
##  [55,]  -346.3468894
##  [56,]  -470.1522225
##  [57,]  -331.6581462
##  [58,]  -162.2980584
##  [59,]  -214.5319242
##  [60,]  -149.9610306
##  [61,]  -286.9479941
##  [62,]  -346.0810587
##  [63,]    92.5747810
##  [64,]   -51.6317493
##  [65,]   178.3671584
##  [66,]   -36.7950711
##  [67,]  -156.5291934
##  [68,]   -36.0355363
##  [69,]   318.1227725
##  [70,]   249.2139109
##  [71,]   -37.5406109
##  [72,]    33.2476363
##  [73,]    14.3837388
##  [74,]   144.5565127
##  [75,]   374.4980731
##  [76,]    95.8236245
##  [77,]   233.2668419
##  [78,]   -39.0668448
##  [79,]   419.0243604
##  [80,]  -126.6956897
##  [81,]  -152.5375247
##  [82,]    33.7287321
##  [83,]     3.2847898
##  [84,]  -105.7233432
##  [85,]  -182.0841464
##  [86,]   -78.9732105
##  [87,]  -230.1597952
##  [88,]  -200.7812444
##  [89,]   144.5895985
##  [90,]  -179.6328416
##  [91,]   305.8733254
##  [92,]    -4.5313086
##  [93,]   174.2960672
##  [94,]   255.8236245
##  [95,]    54.6017479
##  [96,]   163.0322645
##  [97,]  -414.4136251
##  [98,]  -212.8526330
##  [99,]  -248.4927457
## [100,]  -430.8129317
## [101,]  -389.3841134
## [102,]   -84.4549351
## [103,]  -384.4171095
## [104,]   479.3816236
## [105,]   418.6465106
## [106,]   417.7690388
## [107,]   296.3005917
## [108,]  -586.8129317
## [109,]  -174.8888840
## [110,]  -283.2991748
## [111,]   -82.9822552
## [112,]  -203.0172507
## [113,]  -311.9854627
## [114,]  -392.3217934
## [115,]   -45.6370495
## [116,]   -57.6941765
## [117,]  -100.5539837
## [118,]  -228.4746333
## [119,]  -315.5769058
## [120,]    34.6144198
## [121,]   192.8103759
## [122,]  -557.6328416
## [123,]   106.7107094
## [124,]   -80.9706796
## [125,]  -563.2429423
## [126,]    39.9144631
## [127,]  -247.3903395
## [128,]  -450.2693977
## [129,]  -215.0362796
## [130,]  -148.6639876
## [131,]    -3.2674059
## [132,]  -194.6027944
## [133,]   -42.4903668
## [134,]   254.5072543
## [135,]    65.2589396
## [136,]   279.1473670
## [137,]   295.1286749
## [138,]   512.6707641
## [139,]   475.8236245
## [140,]   743.0416070
## [141,]   354.8402048
## [142,]   -23.7278008
## [143,]  -258.3816573
## [144,]   -14.8010419
## [145,]  -441.7225609
## [146,]   -98.6006364
## [147,]  -103.6576923
## [148,]  -218.8318280
## [149,]  -178.4898380
## [150,]    10.8283237
## [151,]    26.3745444
## [152,]   -98.6639876
## [153,]    24.9818099
## [154,]   -15.9536694
## [155,]   -64.2747574
## [156,]  -338.2747574
## [157,]   279.6379825
## [158,]   300.3058235
## [159,]   565.9106924
## [160,]   487.9355843
## [161,]   517.1915928
## [162,]   372.9592628
## [163,]   138.7154057
## [164,]   521.4382760
## [165,]  -252.6342715
## [166,]  -314.0129097
## [167,]  -757.0406162
## [168,]   -72.6224486
## [169,]    11.9612546
## [170,]  -412.7258638
## [171,]   -39.1254262
## [172,]   143.9944072
## [173,]   -81.7403026
## [174,]    72.4723652
## [175,]    26.8511868
## [176,]  -270.3835030
## [177,]   -71.6131610
## [178,]   217.8703247
## [179,]   446.9287644
## [180,]   412.9145505
## [181,]   144.7062518
## [182,]  -330.4787787
## [183,]   148.3843007
## [184,]  -224.1293550
## [185,]     1.4043478
## [186,]   -64.6277472
## [187,]  -106.7469930
## [188,]   389.0118514
## [189,]   257.9106924
## [190,]    83.5781273
## [191,]    60.6593597
## [192,]    -5.1305058
## [193,]   -23.7795336
## [194,]    25.3578098
## [195,]  -111.7107922
## [196,]  -155.6060019
## [197,]  -207.1550988
## [198,]  -270.3835030
## [199,]  -330.4787787
## [200,]   363.7824264
## [201,]   155.4425251
## [202,]   184.7701061
## [203,]  -269.0868589
## [204,]  -408.8026900
## [205,]  -115.5326939
## [206,]  -113.1659825
## [207,]    15.6902618
## [208,]   443.6535198
## [209,]   -65.7199094
## [210,]   661.7546287
## [211,]    11.7644334
## [212,]   777.5811181
## [213,]   287.7464944
## [214,]  -133.7812444
## [215,]   -55.2232800
## [216,]  -250.7199094
## [217,]  -147.8116721
## [218,]  -139.0088142
## [219,]  -157.4850072
## [220,]  -153.4617944
## [221,]  -123.7173930
## [222,]    55.6465237
## [223,]   462.6445714
## [224,]   193.1869169
## [225,]   418.8637879
## [226,]    35.2265721
## [227,]   118.6419531
## [228,]   318.2388924
## [229,]   135.0610151
## [230,]  -166.7273452
## [231,]    46.5682881
## [232,]   -73.1817761
## [233,]   215.8399883
## [234,]  -301.1704401
## [235,]    91.7787803
## [236,]   -52.2530503
## [237,]   194.2241063
## [238,]  -170.4418868
## [239,]   -57.2818981
## [240,]  -239.8002677
## [241,]   -14.3551392
## [242,]    97.1263588
## [243,]   -14.6481255
## [244,]   358.6924837
## [245,]   110.6029470
## [246,]  -281.5460883
## [247,]   -72.8322827
## [248,]   146.0894572
## [249,]  -246.9418151
## [250,]  -161.1791802
## [251,]  -117.7279730
## [252,]  -173.6058188
## [253,]   196.7353590
## [254,]    93.8981472
## [255,]  -111.6382745
## [256,]   148.5893566
## [257,]   155.0541933
## [258,]   -62.7957474
## [259,]    76.0963065
## [260,]     8.0878458
## [261,]   303.3254958
## [262,]  -101.1129584
## [263,]  -357.6294686
## [264,]    94.4401426
## [265,]  -590.1472400
## [266,]   180.6393125
## [267,]   -74.6269730
## [268,]    87.0794398
## [269,]   -93.9883263
## [270,]   -12.1420657
## [271,]  -114.9991773
## [272,]   -27.9175771
## [273,]   187.1462981
## [274,]   -69.9819882
## [275,]   104.2549651
## [276,]   139.7077646
## [277,]   185.2582364
## [278,]   393.3422821
## [279,]  -149.5632906
## [280,]  -106.8987400
## [281,]   -31.9104961
## [282,]   -96.4556881
## [283,]   249.8674616
## [284,]    11.8390797
## [285,]    92.4125636
## [286,]    -3.9812686
## [287,]   -88.1403601
## [288,]  -156.7922617
## [289,]   -55.4495583
## [290,]   271.8425549
## [291,]   -79.8837871
## [292,]    27.8534089
## [293,]  -167.7802138
## [294,]  -204.7458770
## [295,]   -88.7219949
## [296,]  -165.7381319
## [297,]   255.7375830
## [298,]   692.3330270
## [299,]   -52.3366027
## [300,]   651.4093580
## [301,]    55.7471753
## [302,]    35.5822517
## [303,]    -7.0546882
## [304,]   -94.5280815
## [305,]   119.8664510
## [306,]   119.1332349
## [307,]  -175.3882941
## [308,]  -119.4753620
## [309,]   130.3052580
## [310,]   202.3450203
## [311,]   -18.5666643
## [312,]   -56.4508794
## [313,]    31.3629603
## [314,]   117.3158213
## [315,]   330.4959507
## [316,]   351.5047458
## [317,]   140.3494826
## [318,]    76.9102897
## [319,]   290.4428871
## [320,]   225.0147261
## [321,]   276.4516255
## [322,]    37.8324040
## [323,]   353.0721792
## [324,]   142.4370324
## [325,]   216.3637239
## [326,]   240.0551877
## [327,]   467.9660072
## [328,]   724.8864450
## [329,]   748.6417011
## [330,]    17.3995764
## [331,]  -106.0394280
## [332,]    13.4651612
## [333,]  -343.6433236
## [334,]   269.2730485
## [335,]   164.2246560
## [336,]   273.1630887
## [337,]   228.9351551
## [338,]    71.7504800
## [339,]    -4.9183595
## [340,]    35.0508569
## [341,]   -21.2654883
## [342,]  -177.9210324
## [343,]   -41.1964645
## [344,]  -151.4000520
## [345,]   -69.2760148
## [346,]  -255.0449345
## [347,]  -103.3501396
## [348,]   -11.9017050
## [349,]    55.2505944
## [350,]  -104.4901937
## [351,]    57.1326656
## [352,]   -79.0874321
## [353,]   145.4666652
## [354,]    -1.4179806
## [355,]   137.2213331
## [356,]    25.2917464
## [357,]   127.7411108
## [358,]   286.5322273
## [359,]   324.5604501
## [360,]   720.0372277
## [361,]   644.1350863
## [362,]    75.2012339
## [363,]   200.7352413
## [364,]   157.7266760
## [365,]  -147.1532421
## [366,]   -14.1862661
## [367,]   183.4475009
## [368,]   262.7987657
## [369,]   273.2552318
## [370,]   200.0599693
## [371,]   303.9773590
## [372,]    74.2600016
## [373,]   149.7722874
## [374,]   269.8608139
## [375,]   282.1245788
## [376,]  -137.7873788
## [377,]     6.6860547
## [378,]  -195.5751489
## [379,]    48.1263275
## [380,]    44.2689029
## [381,]   -31.0052945
## [382,]    76.7418744
## [383,]    58.7414205
## [384,]   -29.5877475
## [385,]  -176.8489511
## [386,]     0.4122525
## [387,]    33.7419369
## [388,]    25.3060594
## [389,]   -51.3567815
## [390,]  -436.9174429
## [391,]   246.9022641
## [392,]    91.0647391
## [393,]   364.7722874
## [394,]   263.5873988
## [395,]   239.8185097
## [396,]   -82.1093966
## [397,]   230.2904087
## [398,]   407.0170604
errorsquare.auto <-sum(error.auto^2)
errorsquare.auto
## [1] 31308059

The error seems extremely large which leads me to believe that I may have an error somewhere.