Overview and summary

Regression Analysis for the price of Automobiles:

Our objective is to carry out a regression analysis that answers the three questions below:

1. Stepwise Regression: Use the stepAIC function from the MASS package to perform stepwise regression starting with the linear model including all features. Which features remain in the model? Evaluate this model using the summary and plot methods for your model object.

2. SVD Regression: Create a model matrix for all features using the model.matrix function with no intercept term (e.g. -1 in your model formula). Examine the first few rows of the model matrix. Notice how the categorical variables are encoded by a series of dummy variables. Does the coding make sense given the number of unique levels? Next, compute the SVD of the model matrix. Examine the singular values and determine which diagonal elements of the inverse singular value matrix should be set to zero. A plot may help you understand the fall-off in singular values. Then compute the weight vector. Use the weight vector to compute scores and evaluate the model. You may need to try models with several different numbers of inverse singular values set to zero to find a model with good overall performance.

3. Elastic Net Regression: Elastic net regression is the combination of ridge regression and lasso regression. In this case use an alpha parameter of 0.5 to give equal weight to each regularization method. Using the model matrix you created for part 2 and the 1-d matrix of log price, compute a Gaussian regression model using 20 values of the regularization parameter lambda. Compute the scores using the predict method. Plot and examine the evolution of the parameter values and deviance with lambda. Choose a value of lambda (a column in the scores matrix) and evaluate your model. You may need to try several lambdas to find one which gives good overall performance.

Note: The required packages are MASS, ggplot2, gridExtra, dplyr and glmnet.

Data staging:

  require(MASS)
  require(ggplot2)
  require(gridExtra)
  require(dplyr)
  require(glmnet)
  #require(HistData)
read.auto = function(file = "C:\\Tejo\\DataScience\\UW_Datascience_Course\\350\\DataScience350-master\\Lecture7\\Automobile price data _Raw_.csv"){
  ## Read the csv file
  auto.price <- read.csv(file, header = TRUE, 
                         stringsAsFactors = FALSE)
  ## Coerce some character columns to numeric
  numcols <- c('price', 'bore', 'stroke', 'horsepower', 'peak.rpm','wheel.base','length','width','height','curb.weight','engine.size','compression.ratio','city.mpg','highway.mpg')
  auto.price[, numcols] <- lapply(auto.price[, numcols], as.numeric)
  auto.price$log.price=log(auto.price$price)
  auto.price[, numcols] <- lapply(auto.price[, numcols], scale)
  
  
  ## Remove symboling, normalized.losses and the raw price (log.price is the response)
  ## auto.price = auto.price[, names(auto.price) != c("symboling", "normalized.losses")]  
  auto.price <- auto.price %>% dplyr::select(-symboling,-normalized.losses,-price)
  ## Remove cases or rows with missing values. In this case we keep the 
  ## rows which do not have nas. 
  auto.price[complete.cases(auto.price), ]
  
}
auto.price = read.auto()
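
The staging steps above (coerce to numeric, take the log of the response, scale, drop incomplete rows) can be sketched on a toy data frame. The toy values and the use of "?" as the missing-value code are assumptions made for illustration; the sketch also wraps scale() in as.numeric() so that each column stays a plain vector, which is a small variation on the function above.

```r
## Toy stand-in for the raw CSV (hypothetical values). Numeric columns
## arrive as character because missing values are assumed to be coded "?".
toy <- data.frame(
  price      = c("13495", "16500", "?", "13950"),
  horsepower = c("111", "111", "154", "?"),
  make       = c("audi", "audi", "bmw", "bmw"),
  stringsAsFactors = FALSE
)
numcols <- c("price", "horsepower")

## as.numeric() turns "?" into NA (the coercion warning is silenced here)
toy[numcols] <- lapply(toy[numcols],
                       function(x) suppressWarnings(as.numeric(x)))

## Take the log of the response before any scaling
toy$log.price <- log(toy$price)

## as.numeric(scale(x)) centers and scales but keeps a plain numeric vector
toy$horsepower <- as.numeric(scale(toy$horsepower))

## Keep only the rows without missing values
toy.clean <- toy[complete.cases(toy), ]
print(toy.clean)
```

Taking the log before scaling matters: a scaled price has negative values, for which the log is undefined.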

Let's solve Question 1:

1. Stepwise Regression: Use the stepAIC function from the MASS package to perform stepwise regression starting with the linear model including all features. Which features remain in the model? Evaluate this model using the summary and plot methods for your model object.
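
As a minimal sketch of the workflow this question asks for, the same steps can be tried on the built-in mtcars data, which stands in for the automobile data here. The names full.fit, step.fit and kept are illustrative only.

```r
library(MASS)  # provides stepAIC()

## Full linear model on a built-in data set (a stand-in for the auto data)
full.fit <- lm(mpg ~ ., data = mtcars)

## Stepwise search in both directions; trace = FALSE suppresses the step log
step.fit <- stepAIC(full.fit, direction = "both", trace = FALSE)

## Features that remain after the search
kept <- attr(terms(step.fit), "term.labels")
print(kept)

## Path of removed or re-added terms, with the AIC at each step
print(step.fit$anova)
```

The $anova component is the same object that is printed at the end of the analysis below.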

lm.auto.log.price = lm(log.price ~ . , data = auto.price)
summary(lm.auto.log.price)
## 
## Call:
## lm(formula = log.price ~ ., data = auto.price)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.26018 -0.05512  0.00000  0.06592  0.20811 
## 
## Coefficients: (3 not defined because of singularities)
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            10.1852947  0.4137735  24.616  < 2e-16 ***
## makeaudi                0.2285854  0.1355484   1.686 0.093983 .  
## makebmw                 0.4409076  0.1411200   3.124 0.002172 ** 
## makechevrolet          -0.0858318  0.1384351  -0.620 0.536271    
## makedodge              -0.2323665  0.1137079  -2.044 0.042902 *  
## makehonda               0.1014587  0.1336205   0.759 0.448965    
## makeisuzu              -0.3678090  0.1498096  -2.455 0.015325 *  
## makejaguar             -0.3593378  0.1695327  -2.120 0.035832 *  
## makemazda              -0.0002466  0.1034542  -0.002 0.998102    
## makemercedes-benz       0.0257536  0.1522460   0.169 0.865920    
## makemercury             0.0080468  0.1785745   0.045 0.964124    
## makemitsubishi         -0.2697050  0.1127970  -2.391 0.018149 *  
## makenissan             -0.0123073  0.1023058  -0.120 0.904421    
## makepeugot             -0.6400802  0.2717795  -2.355 0.019924 *  
## makeplymouth           -0.2442795  0.1119721  -2.182 0.030832 *  
## makeporsche             0.4211767  0.1782252   2.363 0.019515 *  
## makesaab                0.1965853  0.1211351   1.623 0.106902    
## makesubaru             -0.1493564  0.1176602  -1.269 0.206439    
## maketoyota             -0.1339611  0.0956239  -1.401 0.163484    
## makevolkswagen         -0.0114735  0.1067106  -0.108 0.914533    
## makevolvo               0.0357455  0.1326739   0.269 0.788006    
## fuel.typegas           -0.8278023  0.4038746  -2.050 0.042292 *  
## aspirationturbo         0.0954193  0.0489902   1.948 0.053478 .  
## num.of.doorsfour        0.0159817  0.0857732   0.186 0.852463    
## num.of.doorstwo        -0.0408062  0.0908550  -0.449 0.654038    
## body.stylehardtop      -0.1588071  0.0721192  -2.202 0.029325 *  
## body.stylehatchback    -0.1995974  0.0670226  -2.978 0.003428 ** 
## body.stylesedan        -0.1788447  0.0725350  -2.466 0.014904 *  
## body.stylewagon        -0.2054669  0.0786239  -2.613 0.009962 ** 
## drive.wheelsfwd        -0.0394053  0.0561587  -0.702 0.484062    
## drive.wheelsrwd         0.0481098  0.0744878   0.646 0.519433    
## engine.locationrear     0.4489981  0.1616325   2.778 0.006233 ** 
## wheel.base              0.0976269  0.0334353   2.920 0.004091 ** 
## length                 -0.0517560  0.0378006  -1.369 0.173166    
## width                   0.0401781  0.0298421   1.346 0.180395    
## height                 -0.0711441  0.0211272  -3.367 0.000983 ***
## curb.weight             0.2858047  0.0533241   5.360 3.41e-07 ***
## engine.typel            0.2241708  0.2540721   0.882 0.379143    
## engine.typeohc         -0.0990792  0.0732466  -1.353 0.178370    
## engine.typeohcf                NA         NA      NA       NA    
## engine.typeohcv        -0.1062898  0.0751138  -1.415 0.159306    
## num.of.cylindersfive   -0.0312688  0.1737810  -0.180 0.857470    
## num.of.cylindersfour    0.0880023  0.2142437   0.411 0.681887    
## num.of.cylinderssix    -0.0428522  0.1630979  -0.263 0.793144    
## num.of.cylindersthree          NA         NA      NA       NA    
## num.of.cylinderstwelve  0.2481157  0.3119906   0.795 0.427824    
## engine.size             0.1065991  0.0631797   1.687 0.093817 .  
## fuel.system2bbl         0.1534935  0.0900585   1.704 0.090561 .  
## fuel.systemidi                 NA         NA      NA       NA    
## fuel.systemmfi          0.1708235  0.1600510   1.067 0.287697    
## fuel.systemmpfi         0.1965818  0.0953881   2.061 0.041194 *  
## fuel.systemspdi         0.1628443  0.1116275   1.459 0.146887    
## fuel.systemspfi         0.3994432  0.1853389   2.155 0.032881 *  
## bore                   -0.0591592  0.0305156  -1.939 0.054584 .  
## stroke                  0.0010852  0.0190376   0.057 0.954626    
## compression.ratio      -0.1906659  0.1181942  -1.613 0.108995    
## horsepower             -0.0297056  0.0591075  -0.503 0.616068    
## peak.rpm                0.0116223  0.0189560   0.613 0.540807    
## city.mpg               -0.1201985  0.0533627  -2.252 0.025870 *  
## highway.mpg             0.0895815  0.0481500   1.860 0.064948 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.106 on 138 degrees of freedom
## Multiple R-squared:  0.9691, Adjusted R-squared:  0.9566 
## F-statistic: 77.36 on 56 and 138 DF,  p-value: < 2.2e-16
plot(lm.auto.log.price)

lm.auto.log.price.step = stepAIC(lm.auto.log.price, direction = "both")
## Start:  AIC=-828.59
## log.price ~ make + fuel.type + aspiration + num.of.doors + body.style + 
##     drive.wheels + engine.location + wheel.base + length + width + 
##     height + curb.weight + engine.type + num.of.cylinders + engine.size + 
##     fuel.system + bore + stroke + compression.ratio + horsepower + 
##     peak.rpm + city.mpg + highway.mpg
## 
## 
## Step:  AIC=-828.59
## log.price ~ make + fuel.type + aspiration + num.of.doors + body.style + 
##     drive.wheels + wheel.base + length + width + height + curb.weight + 
##     engine.type + num.of.cylinders + engine.size + fuel.system + 
##     bore + stroke + compression.ratio + horsepower + peak.rpm + 
##     city.mpg + highway.mpg
## 
## 
## Step:  AIC=-828.59
## log.price ~ make + aspiration + num.of.doors + body.style + drive.wheels + 
##     wheel.base + length + width + height + curb.weight + engine.type + 
##     num.of.cylinders + engine.size + fuel.system + bore + stroke + 
##     compression.ratio + horsepower + peak.rpm + city.mpg + highway.mpg
## 
##                     Df Sum of Sq    RSS     AIC
## - stroke             1   0.00004 1.5515 -830.59
## - horsepower         1   0.00284 1.5543 -830.24
## - peak.rpm           1   0.00423 1.5557 -830.06
## <none>                           1.5514 -828.59
## - fuel.system        6   0.09857 1.6500 -828.58
## - width              1   0.02038 1.5718 -828.05
## - length             1   0.02108 1.5725 -827.96
## - num.of.cylinders   4   0.07187 1.6233 -827.76
## - num.of.doors       2   0.04263 1.5941 -827.31
## - drive.wheels       2   0.04363 1.5951 -827.19
## - compression.ratio  1   0.02926 1.5807 -826.95
## - engine.size        1   0.03200 1.5835 -826.61
## - highway.mpg        1   0.03891 1.5904 -825.76
## - bore               1   0.04225 1.5937 -825.35
## - aspiration         1   0.04265 1.5941 -825.31
## - city.mpg           1   0.05704 1.6085 -823.55
## - body.style         4   0.11434 1.6658 -822.73
## - wheel.base         1   0.09585 1.6473 -818.90
## - engine.type        3   0.13315 1.6846 -818.54
## - height             1   0.12748 1.6789 -815.19
## - curb.weight        1   0.32296 1.8744 -793.72
## - make              19   1.40945 2.9609 -740.56
## 
## Step:  AIC=-830.59
## log.price ~ make + aspiration + num.of.doors + body.style + drive.wheels + 
##     wheel.base + length + width + height + curb.weight + engine.type + 
##     num.of.cylinders + engine.size + fuel.system + bore + compression.ratio + 
##     horsepower + peak.rpm + city.mpg + highway.mpg
## 
##                     Df Sum of Sq    RSS     AIC
## - horsepower         1   0.00281 1.5543 -832.24
## - peak.rpm           1   0.00419 1.5557 -832.06
## <none>                           1.5515 -830.59
## - fuel.system        6   0.09894 1.6504 -830.53
## - width              1   0.02043 1.5719 -830.04
## - length             1   0.02106 1.5725 -829.96
## - num.of.doors       2   0.04262 1.5941 -829.30
## - drive.wheels       2   0.04362 1.5951 -829.18
## - compression.ratio  1   0.02923 1.5807 -828.95
## - num.of.cylinders   4   0.08121 1.6327 -828.64
## - engine.size        1   0.03209 1.5836 -828.60
## + stroke             1   0.00004 1.5514 -828.59
## - highway.mpg        1   0.03903 1.5905 -827.74
## - aspiration         1   0.04278 1.5943 -827.28
## - bore               1   0.04599 1.5975 -826.89
## - city.mpg           1   0.05731 1.6088 -825.52
## - body.style         4   0.11498 1.6665 -824.65
## - wheel.base         1   0.09628 1.6478 -820.85
## - engine.type        3   0.13530 1.6868 -820.28
## - height             1   0.13036 1.6818 -816.86
## - curb.weight        1   0.32929 1.8808 -795.06
## - make              19   1.58097 3.1324 -731.58
## 
## Step:  AIC=-832.24
## log.price ~ make + aspiration + num.of.doors + body.style + drive.wheels + 
##     wheel.base + length + width + height + curb.weight + engine.type + 
##     num.of.cylinders + engine.size + fuel.system + bore + compression.ratio + 
##     peak.rpm + city.mpg + highway.mpg
## 
##                     Df Sum of Sq    RSS     AIC
## - peak.rpm           1   0.00226 1.5566 -833.95
## <none>                           1.5543 -832.24
## - fuel.system        6   0.10058 1.6549 -832.01
## - width              1   0.01862 1.5729 -831.91
## - length             1   0.02096 1.5753 -831.62
## - num.of.doors       2   0.04142 1.5957 -831.11
## - drive.wheels       2   0.04150 1.5958 -831.10
## - compression.ratio  1   0.02650 1.5808 -830.94
## - num.of.cylinders   4   0.07843 1.6327 -830.64
## + horsepower         1   0.00281 1.5515 -830.59
## - engine.size        1   0.03099 1.5853 -830.39
## + stroke             1   0.00001 1.5543 -830.24
## - highway.mpg        1   0.03751 1.5918 -829.58
## - bore               1   0.04463 1.5989 -828.71
## - aspiration         1   0.04573 1.6000 -828.58
## - city.mpg           1   0.05544 1.6097 -827.40
## - body.style         4   0.12165 1.6759 -825.54
## - wheel.base         1   0.10350 1.6578 -821.66
## - engine.type        3   0.13873 1.6930 -821.56
## - height             1   0.13191 1.6862 -818.35
## - curb.weight        1   0.33119 1.8855 -796.57
## - make              19   1.65349 3.2078 -728.95
## 
## Step:  AIC=-833.95
## log.price ~ make + aspiration + num.of.doors + body.style + drive.wheels + 
##     wheel.base + length + width + height + curb.weight + engine.type + 
##     num.of.cylinders + engine.size + fuel.system + bore + compression.ratio + 
##     city.mpg + highway.mpg
## 
##                     Df Sum of Sq    RSS     AIC
## <none>                           1.5566 -833.95
## - fuel.system        6   0.10031 1.6569 -833.77
## - width              1   0.01927 1.5758 -833.55
## - length             1   0.02053 1.5771 -833.40
## - num.of.doors       2   0.04055 1.5971 -832.94
## - compression.ratio  1   0.02637 1.5829 -832.68
## - drive.wheels       2   0.04303 1.5996 -832.63
## - engine.size        1   0.02957 1.5861 -832.28
## + peak.rpm           1   0.00226 1.5543 -832.24
## + horsepower         1   0.00088 1.5557 -832.06
## + stroke             1   0.00001 1.5565 -831.95
## - highway.mpg        1   0.03620 1.5928 -831.47
## - num.of.cylinders   4   0.08807 1.6446 -831.22
## - aspiration         1   0.04347 1.6000 -830.58
## - bore               1   0.04987 1.6064 -829.80
## - city.mpg           1   0.05754 1.6141 -828.87
## - body.style         4   0.12039 1.6770 -827.42
## - wheel.base         1   0.10150 1.6581 -823.63
## - engine.type        3   0.16814 1.7247 -819.95
## - height             1   0.13317 1.6897 -819.94
## - curb.weight        1   0.33000 1.8866 -798.46
## - make              19   1.85756 3.4141 -718.79
summary(lm.auto.log.price.step)
## 
## Call:
## lm(formula = log.price ~ make + aspiration + num.of.doors + body.style + 
##     drive.wheels + wheel.base + length + width + height + curb.weight + 
##     engine.type + num.of.cylinders + engine.size + fuel.system + 
##     bore + compression.ratio + city.mpg + highway.mpg, data = auto.price)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.26258 -0.05555  0.00000  0.06920  0.21369 
## 
## Coefficients: (1 not defined because of singularities)
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             9.388919   0.245487  38.246  < 2e-16 ***
## makeaudi                0.242740   0.126450   1.920 0.056920 .  
## makebmw                 0.464288   0.123883   3.748 0.000259 ***
## makechevrolet          -0.066704   0.125000  -0.534 0.594440    
## makedodge              -0.207767   0.100761  -2.062 0.041047 *  
## makehonda               0.119490   0.120630   0.991 0.323603    
## makeisuzu              -0.352075   0.146422  -2.405 0.017492 *  
## makejaguar             -0.313952   0.149418  -2.101 0.037406 *  
## makemazda               0.014885   0.097277   0.153 0.878608    
## makemercedes-benz       0.043028   0.146721   0.293 0.769753    
## makemercury             0.002417   0.159330   0.015 0.987919    
## makemitsubishi         -0.248010   0.101604  -2.441 0.015889 *  
## makenissan              0.004783   0.091880   0.052 0.958558    
## makepeugot             -0.597076   0.258326  -2.311 0.022264 *  
## makeplymouth           -0.220036   0.099419  -2.213 0.028491 *  
## makeporsche             0.432156   0.163033   2.651 0.008950 ** 
## makesaab                0.201051   0.109877   1.830 0.069394 .  
## makesubaru             -0.582996   0.200138  -2.913 0.004164 ** 
## maketoyota             -0.122274   0.090517  -1.351 0.178911    
## makevolkswagen          0.007413   0.097369   0.076 0.939425    
## makevolvo               0.059697   0.116970   0.510 0.610592    
## aspirationturbo         0.077720   0.039165   1.984 0.049149 *  
## num.of.doorsfour        0.010775   0.084409   0.128 0.898604    
## num.of.doorstwo        -0.044574   0.089620  -0.497 0.619708    
## body.stylehardtop      -0.162044   0.070241  -2.307 0.022513 *  
## body.stylehatchback    -0.202478   0.065047  -3.113 0.002244 ** 
## body.stylesedan        -0.180640   0.070615  -2.558 0.011580 *  
## body.stylewagon        -0.206151   0.077263  -2.668 0.008520 ** 
## drive.wheelsfwd        -0.043380   0.054367  -0.798 0.426261    
## drive.wheelsrwd         0.040809   0.069906   0.584 0.560308    
## wheel.base              0.098475   0.032477   3.032 0.002890 ** 
## length                 -0.050919   0.037337  -1.364 0.174814    
## width                   0.038562   0.029188   1.321 0.188593    
## height                 -0.071620   0.020621  -3.473 0.000683 ***
## curb.weight             0.279232   0.051072   5.467 2.02e-07 ***
## engine.typel            0.205689   0.250140   0.822 0.412295    
## engine.typeohc         -0.102758   0.057524  -1.786 0.076189 .  
## engine.typeohcf         0.435964   0.155698   2.800 0.005826 ** 
## engine.typeohcv        -0.103281   0.071153  -1.452 0.148850    
## num.of.cylindersfive   -0.030838   0.167983  -0.184 0.854608    
## num.of.cylindersfour    0.077548   0.205606   0.377 0.706614    
## num.of.cylinderssix    -0.067828   0.153169  -0.443 0.658566    
## num.of.cylindersthree         NA         NA      NA       NA    
## num.of.cylinderstwelve  0.165848   0.244833   0.677 0.499266    
## engine.size             0.088377   0.054001   1.637 0.103950    
## fuel.system2bbl         0.142534   0.087745   1.624 0.106519    
## fuel.systemidi          0.756979   0.379445   1.995 0.047974 *  
## fuel.systemmfi          0.156825   0.157099   0.998 0.319864    
## fuel.systemmpfi         0.181128   0.091088   1.988 0.048693 *  
## fuel.systemspdi         0.153179   0.109548   1.398 0.164227    
## fuel.systemspfi         0.390775   0.182972   2.136 0.034430 *  
## bore                   -0.059913   0.028187  -2.126 0.035286 *  
## compression.ratio      -0.168782   0.109212  -1.545 0.124476    
## city.mpg               -0.119238   0.052229  -2.283 0.023927 *  
## highway.mpg             0.085383   0.047148   1.811 0.072275 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1051 on 141 degrees of freedom
## Multiple R-squared:  0.969,  Adjusted R-squared:  0.9574 
## F-statistic: 83.23 on 53 and 141 DF,  p-value: < 2.2e-16
lm.auto.log.price.step$anova # ANOVA of the result 
## Stepwise Model Path 
## Analysis of Deviance Table
## 
## Initial Model:
## log.price ~ make + fuel.type + aspiration + num.of.doors + body.style + 
##     drive.wheels + engine.location + wheel.base + length + width + 
##     height + curb.weight + engine.type + num.of.cylinders + engine.size + 
##     fuel.system + bore + stroke + compression.ratio + horsepower + 
##     peak.rpm + city.mpg + highway.mpg
## 
## Final Model:
## log.price ~ make + aspiration + num.of.doors + body.style + drive.wheels + 
##     wheel.base + length + width + height + curb.weight + engine.type + 
##     num.of.cylinders + engine.size + fuel.system + bore + compression.ratio + 
##     city.mpg + highway.mpg
## 
## 
##                Step Df     Deviance Resid. Df Resid. Dev       AIC
## 1                                         138   1.551445 -828.5934
## 2 - engine.location  0 6.661338e-16       138   1.551445 -828.5934
## 3       - fuel.type  0 1.532108e-14       138   1.551445 -828.5934
## 4          - stroke  1 3.652842e-05       139   1.551482 -830.5888
## 5      - horsepower  1 2.814223e-03       140   1.554296 -832.2355
## 6        - peak.rpm  1 2.260803e-03       141   1.556557 -833.9520

Conclusion:

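stepAIC selected the model below, which has the lowest AIC found (-833.95, against -828.59 for the full model). The search removed engine.location, fuel.type, stroke, horsepower and peak.rpm, leaving 18 features. The adjusted R-squared rises slightly, from 0.9566 to 0.9574, and the residual standard error is 0.1051 on 141 degrees of freedom.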

log.price ~ make + aspiration + num.of.doors + body.style + drive.wheels + wheel.base + length + width + height + curb.weight + engine.type + num.of.cylinders + engine.size + fuel.system + bore + compression.ratio + city.mpg + highway.mpg
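
The AIC values in the stepAIC trace can be checked by hand. For a linear model, stepAIC uses the form below (up to an additive constant), where n is the number of rows, RSS the residual sum of squares and p the number of estimated coefficients including the intercept. Here n = 195, taken as 138 residual degrees of freedom plus 57 estimated coefficients in the full model; the final model has 54 coefficients.

```latex
\begin{aligned}
\mathrm{AIC} &= n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 2p \\
\text{Full model:}\quad
  & 195 \ln\!\left(\frac{1.551445}{195}\right) + 2(57)
    \approx -942.59 + 114 = -828.59 \\
\text{Final model:}\quad
  & 195 \ln\!\left(\frac{1.556557}{195}\right) + 2(54)
    \approx -941.95 + 108 = -833.95
\end{aligned}
```

Both results agree with the first and last rows of the stepwise model path above: the residual sum of squares barely changes, so the gain comes almost entirely from the smaller penalty term.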

Let's solve Question 2:

2. SVD Regression: Create a model matrix for all features using the model.matrix function with no intercept term (e.g. -1 in your model formula). Examine the first few rows of the model matrix. Notice how the categorical variables are encoded by a series of dummy variables. Does the coding make sense given the number of unique levels? Next, compute the SVD of the model matrix. Examine the singular values and determine which diagonal elements of the inverse singular value matrix should be set to zero. A plot may help you understand the fall-off in singular values. Then compute the weight vector. Use the weight vector to compute scores and evaluate the model. You may need to try models with several different numbers of inverse singular values set to zero to find a model with good overall performance.
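
The dummy coding that the question asks about can be seen on a small, made-up data frame. The column names and values are hypothetical; the point is only how model.matrix treats factors when the intercept is removed.

```r
## Toy data with two factors: body has 3 levels, fuel has 2
toy <- data.frame(
  y    = c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0),
  body = factor(c("sedan", "wagon", "hatchback", "sedan", "wagon", "hatchback")),
  fuel = factor(c("gas", "diesel", "gas", "gas", "diesel", "gas")),
  size = c(0.1, -0.3, 1.2, 0.4, -1.1, 0.7)
)

## With "- 1" there is no intercept column, so the FIRST factor keeps all
## of its levels; later factors still drop their reference level.
X <- model.matrix(y ~ . - 1, data = toy)
print(head(X))
print(colnames(X))

## Column count: 3 (body) + (2 - 1) (fuel) + 1 (size) = 5
print(ncol(X))
```

Because the dummy columns of the first factor sum to one in every row, the matrix still spans a constant term even though no intercept column is present.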

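## Model matrix without an intercept column; factors become dummy variables
ModelMatrix = model.matrix(log.price ~ . -1, data = auto.price)
M = as.matrix(ModelMatrix)

## SVD of t(M) %*% M (the normal-equations matrix) rather than of M itself
MTM = t(M) %*% M
#MTM
mSVD = svd(MTM)
#mSVD$d

## M has 60 columns but rank 57: keep the 57 largest singular values and
## set the inverses of the remaining 3 to zero
d.trim = rep(0,60)
d.trim[1:57] = 1/mSVD$d[1:57]
mD=diag(d.trim)
#mD
## Pseudo-inverse of t(M) %*% M, then the weight vector b
mInv = mSVD$v %*% mD %*% t(mSVD$u)
#mInv
MTMTM = mInv %*% t(M)

b = MTMTM %*% auto.price$log.price

## Scores and residuals (score minus observed log.price)
auto.price$score = M%*% b + mean(auto.price$log.price)
auto.price$resids = auto.price$score - auto.price$log.price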

plot.svd.reg <- function(df, k = 4){
  
  p1 <- ggplot(df) + 
    geom_point(aes(score, resids), size = 2) + 
    stat_smooth(aes(score, resids)) +
    ggtitle('Residuals vs. fitted values')
  
  p2 <- ggplot(df, aes(resids)) +
    geom_histogram(aes(y = ..density..)) +
    geom_density(color = 'red', fill = 'red', alpha = 0.2) +
    ggtitle('Histogram of residuals')
  
  qqnorm(df$resids)
  
  grid.arrange(p1, p2, ncol = 2)
  
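  ## Absolute deviation of each residual from the mean residual
  ## (not divided by a standard error, despite the name)
  df$std.resids = sqrt((df$resids - mean(df$resids))^2)  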
  
  p3 = ggplot(df) + 
    geom_point(aes(score, std.resids), size = 2) + 
    stat_smooth(aes(score, std.resids)) +
    ggtitle('Standardized residuals vs. fitted values')
  print(p3) 
  
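  n = nrow(df)
  Ybar = mean(df$log.price)
  ## Naming used here: SST = total, SSR = residual, SSE = explained sum of squares
  SST <- sum((df$log.price - Ybar)^2)
  SSR <- sum(df$resids * df$resids)
  SSE = SST - SSR
  cat(paste('SSE =', as.character(SSE), '\n'))
  cat(paste('SSR =', as.character(SSR), '\n'))
  cat(paste('SST =', as.character(SSE + SSR), '\n'))
  ## Note: the value labelled RMSE is SSE/(n - 2), not sqrt(SSR/n)
  cat(paste('RMSE =', as.character(SSE/(n - 2)), '\n'))
  
  ## k is the number of predictors assumed in the adjustment (default 4)
  adjR2  <- 1.0 - (SSR/SST) * ((n - 1)/(n - k - 1))
  cat(paste('Adjusted R^2 =', as.character(adjR2)), '\n')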
}

plot.svd.reg(auto.price)

## SSE = -16997.3121390497 
## SSR = 17047.5681761553 
## SST = 50.2560371056279 
## RMSE = -88.068974813729 
## Adjusted R^2 = -345.355688804103

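Conclusion:

The model matrix has 60 columns. The 57 largest singular values are inverted and the remaining 3 are set to zero, which matches the 3 coefficients that lm() reported as not defined because of singularities. The scores as computed, however, do not give a usable model: SSR (17047.57) is far larger than SST (50.26), so SSE and the adjusted R^2 are negative. With n = 195 rows, SSR/n is about 87.4, which corresponds to a typical residual of about 9.35, close to the level of log.price itself (the intercept of the stepwise fit is about 9.39). This is the pattern expected if M %*% b already reproduces log.price and mean(auto.price$log.price) is then added on top: the response was not centered before b was computed, and the dummy columns for make already span a constant term. Under that reading, dropping the added mean should bring the residual sum of squares down to roughly that of the full linear model (1.5514), but this has not been re-run here.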
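
A minimal sketch of SVD (pseudo-inverse) regression on synthetic data is given below. It differs from the code above in three deliberate ways, all of them assumptions of this sketch: the SVD is taken of the design matrix itself, the number of singular values to drop is chosen by a relative tolerance rather than fixed by hand, and no mean is added to the scores because the response is not centered.

```r
set.seed(42)

## Synthetic design with a deliberately redundant column (x3 = x1 + x2),
## so the matrix is rank deficient
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
X  <- cbind(const = 1, x1 = x1, x2 = x2, x3 = x1 + x2)
y  <- 9 + 0.5 * x1 - 0.3 * x2 + rnorm(n, sd = 0.1)

## SVD of the design matrix: X = U D V^T
s <- svd(X)
print(s$d)  # the last singular value is numerically zero

## Invert only singular values above a relative tolerance; zero the rest
tol   <- 1e-8
d.inv <- ifelse(s$d > tol * s$d[1], 1 / s$d, 0)

## Weight vector b = V D^+ U^T y
b <- s$v %*% (d.inv * crossprod(s$u, y))

## Scores: X already spans a constant and y was not centered,
## so no mean is added back here
score  <- drop(X %*% b)
resids <- score - y

rss <- sum(resids^2)
tss <- sum((y - mean(y))^2)
cat("R^2 =", 1 - rss / tss, "\n")
```

With the zero singular value removed, the scores coincide with the fitted values of an ordinary least-squares fit on the non-redundant columns.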

Let's solve Question 3:

3. Elastic Net Regression: Elastic net regression is the combination of ridge regression and lasso regression. In this case use an alpha parameter of 0.5 to give equal weight to each regularization method. Using the model matrix you created for part 2 and the 1-d matrix of log price, compute a Gaussian regression model using 20 values of the regularization parameter lambda. Compute the scores using the predict method. Plot and examine the evolution of the parameter values and deviance with lambda. Choose a value of lambda (a column in the scores matrix) and evaluate your model. You may need to try several lambdas to find one which gives good overall performance.
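
The steps in this question can be sketched on synthetic data as follows. The data, the object names and the choice of column 15 are illustrative assumptions; only documented components of a glmnet fit (lambda, df, dev.ratio) are used.

```r
library(glmnet)

set.seed(1)
## Synthetic data: 10 predictors, only the first three matter
n <- 200
X <- matrix(rnorm(n * 10), nrow = n)
y <- 2 * X[, 1] - 1.5 * X[, 2] + X[, 3] + rnorm(n, sd = 0.5)

## alpha = 0.5 mixes the ridge (alpha = 0) and lasso (alpha = 1) penalties
fit <- glmnet(X, y, family = "gaussian", alpha = 0.5, nlambda = 20)

## One row per lambda actually fitted (glmnet may stop before nlambda)
path <- data.frame(lambda    = fit$lambda,
                   nonzero   = fit$df,
                   dev.ratio = fit$dev.ratio)
print(path)

## predict() returns one column of scores per lambda
scores <- predict(fit, newx = X)

## Pick a column, guarding against a path shorter than expected
j      <- min(15, ncol(scores))
resids <- scores[, j] - y
cat("lambda =", fit$lambda[j], " RSS =", sum(resids^2), "\n")
```

Printing the path table alongside the two plots makes it easier to see how many coefficients are non-zero at the chosen lambda.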

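## Response as a 1-column matrix (this overwrites the SVD weight vector b)
b = as.matrix(auto.price$log.price)
## Note: alpha = 0.5 makes this an elastic net fit, the same call as mod.ridge.lasso below
mod.ridge = glmnet(M, b, family = 'gaussian', nlambda = 20, alpha = 0.5)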
plot(mod.ridge, xvar = 'lambda', label = TRUE)

plot(mod.ridge, xvar = 'dev', label = TRUE)

mod.ridge.lasso = glmnet(M, b, family = 'gaussian', nlambda = 20, alpha = 0.5)
plot(mod.ridge.lasso, xvar = 'lambda', label = TRUE)

plot(mod.ridge.lasso, xvar = 'dev', label = TRUE)

auto.price$score = predict(mod.ridge.lasso, newx = M)[, 15]
auto.price$resids = auto.price$score - auto.price$log.price

plot.svd.reg(auto.price)

## SSE = 48.5980375634444 
## SSR = 1.65799954218514 
## SST = 50.2560371056296 
## RMSE = 0.251803303437536 
## Adjusted R^2 = 0.96631439935969

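Conclusion:

Using the 15th value of lambda, the elastic net fit leaves a residual sum of squares (SSR) of 1.658 against a total sum of squares of 50.256, so the model explains about 96.7% of the variance in log.price. That is close to the stepwise model, whose residual sum of squares is 1.5566. Two of the printed figures need care. The value labelled RMSE (0.2518) is SSE/(n - 2); the root mean squared residual, sqrt(SSR/n), is about 0.092. The adjusted R^2 (0.9663) uses the default k = 4 in plot.svd.reg rather than the number of non-zero coefficients, so it understates the penalty for model size. All figures are in-sample, and other columns of the scores matrix were not compared here.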
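
For reference, the usual definitions of these fit metrics can be written as a small helper. The function fit.metrics is hypothetical and not part of the analysis above; it is checked here against lm() on the built-in mtcars data.

```r
## Hypothetical helper: in-sample fit metrics from observed values,
## scores, and the number of predictors k
fit.metrics <- function(y, score, k) {
  n    <- length(y)
  rss  <- sum((y - score)^2)       # residual sum of squares
  tss  <- sum((y - mean(y))^2)     # total sum of squares
  rmse <- sqrt(rss / n)            # root mean squared error
  r2   <- 1 - rss / tss
  adj  <- 1 - (1 - r2) * (n - 1) / (n - k - 1)
  c(RSS = rss, TSS = tss, RMSE = rmse, R2 = r2, adjR2 = adj)
}

## Check against lm() on a built-in data set
fit <- lm(mpg ~ wt + hp, data = mtcars)
m   <- fit.metrics(mtcars$mpg, fitted(fit), k = 2)
print(round(m, 4))
print(summary(fit)$adj.r.squared)  # should agree with m["adjR2"]
```

For a penalized fit, one reasonable choice for k is the number of non-zero coefficients at the chosen lambda, although that is only an approximation to the effective degrees of freedom.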