class: center, middle, inverse, title-slide

.title[
# Week 2 Assignment: Presentation
]
.subtitle[
## MLR of CO2 Emissions for Vehicles
]
.author[
### Alice Xiang & Angelo Saporito
]
.date[
### 2024-02-25
]

---
class: inverse1

<h2 align="center"> Table of Contents</h2>

.pull-left[
- Introduction to the Dataset
  - Dataset and variables
  - Research Question
- Full Model
  - Analysis
    - Residual analysis
    - Variance Inflation Factor (VIF)
  - Discussion
- Edited Model
  - Analysis
    - Residual analysis
    - Variance Inflation Factor (VIF)
  - Discussion
]

.pull-right[
- Transformed Model
  - Box-Cox Transform
  - Log Transform
  - Residual Analysis
  - Discussion
- The Bootstrap
- Model Selection
  - Goodness-of-fit
  - Final Model
- Conclusion
]

---
class: inverse center middle

## Introduction to the Dataset

---

.pull-left[
## Introduction

We chose [this dataset](https://www.kaggle.com/datasets/bhuviranga/co2-emissions) on CO2 emissions of different vehicles to do multiple linear regression analysis.
]

.pull-right[
## Variables

The following are the variables included in the dataset (6 continuous, 2 categorical):

- Engine.Size
- Cylinders
- Fuel.Type
- Fuel.Consumption.City
- Fuel.Consumption.Hwy
- Fuel.Consumption.Combined
- Fuel.Consumption.mpg
- CO2.Emissions
]

---
class: inverse center middle

## Research Question: How do different predictor variables relate to the CO2 emissions of the vehicle?

---
class: inverse center middle

## Full Model + Discussion

---

## The Full Model

Using R, we create the full model.

```r
full.model = lm(CO2.Emissions ~ ., data = emissions)
```
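Residual diagnostics for a model fitted this way can be drawn with base R's `plot()` method for `lm` objects. The sketch below is a minimal illustration on simulated data, not the real dataset: the data frame `sim`, its values, the baseline fuel type `"D"`, and the name `sim.model` are all assumptions made for the example.

```r
# Simulated stand-in for the emissions data (hypothetical values, not the real dataset)
set.seed(1)
n <- 200
sim <- data.frame(
  Engine.Size = runif(n, 1, 6),
  Fuel.Type   = factor(sample(c("D", "E", "X", "Z"), n, replace = TRUE))
)
sim$CO2.Emissions <- 100 + 40 * sim$Engine.Size + rnorm(n, sd = 15)

# Same call pattern as the full model: regress the response on every other column
sim.model <- lm(CO2.Emissions ~ ., data = sim)

# Draw the four default diagnostic plots in a 2 x 2 grid:
# residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage
op <- par(mfrow = c(2, 2))
plot(sim.model)
par(op)  # restore the previous graphics settings
```

Replacing `sim.model` with `full.model` would give the same four panels for the real fit.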
---

## Residual Analysis

<img src="Final-Presentation_files/figure-html/unnamed-chunk-4-1.png" width="100%" />

---

.pull-left[
## VIF

|                          |    GVIF| Df| GVIF^(1/(2*Df))|
|:-------------------------|-------:|--:|---------------:|
|Engine.Size               |   11.67|  1|            3.42|
|Cylinders                 |   14.40|  7|            1.21|
|Fuel.Type                 |    2.48|  4|            1.12|
|Fuel.Consumption.City     | 2069.97|  1|           45.50|
|Fuel.Consumption.Hwy      |  568.00|  1|           23.83|
|Fuel.Consumption.Combined | 4651.99|  1|           68.21|
|Fuel.Consumption.mpg      |   10.26|  1|            3.20|
]

.pull-right[
## Issues We See

- nonconstant variance
- residuals not normal (Q-Q plot)
- multicollinearity among the Fuel Consumption variables
]

---
class: inverse center middle

## Edited Model + Discussion

---

## Edited Model: Removing Predictors due to Multicollinearity

Of the Fuel Consumption variables, we keep only Fuel.Consumption.mpg and create the following model.
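The slides do not show the code for this step, so what follows is only a sketch of the idea on simulated data. It uses the plain formula VIF_j = 1 / (1 - R_j^2) for numeric predictors, which is simpler than the generalized VIF (GVIF) reported in the table for models with factor terms. The simulated values, the conversion constant 280, and the helper `vif.manual` are hypothetical.

```r
# Simulated predictors (hypothetical values): three near-duplicate consumption
# measures plus an mpg column derived from the combined figure
set.seed(2)
n <- 300
city     <- runif(n, 6, 20)
hwy      <- 0.7 * city + rnorm(n, sd = 0.5)
combined <- 0.55 * city + 0.45 * hwy + rnorm(n, sd = 0.05)
sim <- data.frame(
  Fuel.Consumption.City     = city,
  Fuel.Consumption.Hwy      = hwy,
  Fuel.Consumption.Combined = combined,
  Fuel.Consumption.mpg      = 280 / combined + rnorm(n, sd = 1),  # 280 is an arbitrary constant
  Engine.Size               = runif(n, 1, 6)
)

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the others
vif.manual <- function(predictors) {
  sapply(names(predictors), function(v) {
    others <- setdiff(names(predictors), v)
    fit <- lm(reformulate(others, response = v), data = predictors)
    1 / (1 - summary(fit)$r.squared)
  })
}

vif.all  <- vif.manual(sim)  # all consumption variables together
vif.kept <- vif.manual(sim[, c("Fuel.Consumption.mpg", "Engine.Size")])  # after dropping three

print(round(vif.all, 2))
print(round(vif.kept, 2))
```

Keeping a single consumption measure removes the near-linear dependence, which is the same effect the edited model's VIF table shows on the real data.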
---

## Residual Plots of Edited Model

<img src="Final-Presentation_files/figure-html/unnamed-chunk-8-1.png" width="100%" />

---

.pull-left[
## VIF of Edited Model

|                     |  GVIF| Df| GVIF^(1/(2*Df))|
|:--------------------|-----:|--:|---------------:|
|Engine.Size          | 11.15|  1|            3.34|
|Cylinders            | 11.79|  7|            1.19|
|Fuel.Type            |  1.41|  4|            1.04|
|Fuel.Consumption.mpg |  2.95|  1|            1.72|
]

.pull-right[
## Edited Model Discussion

We see that the residual plots improved, and the issues with multicollinearity have been resolved.

### Remaining Issues:

- variances still nonconstant
- assumption of normality still violated
]

---
class: inverse center middle

## Transformed Model + Discussion

---

## Box-Cox Transformation

We proceed by performing several Box-Cox transformations on the data.

<img src="Final-Presentation_files/figure-html/unnamed-chunk-10-1.png" width="100%" />

The plots show that log-transforming Fuel Consumption changes the estimated lambda.

---

## Log Transformed Model

Using log-transformed mpg and the log of the response variable CO2.Emissions, we create the following model:

```r
log.model = lm(log(CO2.Emissions) ~ Engine.Size + Cylinders + Fuel.Type +
                 log(Fuel.Consumption.mpg), data = emissions.edit)
```
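A Box-Cox check of the kind that motivates a log response can be sketched with `boxcox()` from the MASS package. The example uses simulated data in which the response is multiplicative in mpg, so the profile likelihood should peak near lambda = 0, the log transform. The data frame `sim`, its columns, and the lambda grid are assumptions for illustration only.

```r
library(MASS)  # provides boxcox(); MASS ships with standard R installations

# Simulated data (hypothetical): log(co2) is linear in log(mpg)
set.seed(3)
n <- 300
sim <- data.frame(mpg = runif(n, 15, 60))
sim$co2 <- exp(9 - 1 * log(sim$mpg) + rnorm(n, sd = 0.03))

# Profile log-likelihood of the Box-Cox parameter over a grid of lambda values
bc <- boxcox(co2 ~ log(mpg), data = sim,
             lambda = seq(-2, 2, by = 0.05), plotit = FALSE)

# Lambda with the highest log-likelihood; a value near 0 points to a log transform
lambda.hat <- bc$x[which.max(bc$y)]
print(lambda.hat)
```

Setting `plotit = TRUE` draws the likelihood curve with its 95% interval instead of only returning the grid.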
---

## Residual Plots of Transformed Model

<img src="Final-Presentation_files/figure-html/unnamed-chunk-13-1.png" width="100%" />

---

## Transformed Model Discussion

- Significant improvements from earlier models
- Curvature in residual plot greatly improved
- Q-Q plot closest to normal

---
class: inverse center middle

## The Bootstrap

---

<h2 align="center">Bootstrapping Coefficients</h2>

.pull-center[
<img src="Final-Presentation_files/figure-html/unnamed-chunk-17-1.png" width="70%" height="70%" style="display: block; margin: auto;" />
]

---

<h2 align="center">Confidence Intervals</h2>

.pull-center[
|                          |Estimate |Std. Error |t value  |Pr(>&#124;t&#124;) |btc.ci.95              |
|:-------------------------|:--------|:----------|:--------|:------------------|:----------------------|
|(Intercept)               |8.891    |0.006      |1414.010 |0.000              |[ 8.8791 , 8.9015 ]    |
|Engine.Size               |0.000    |0.000      |0.798    |0.425              |[ -4e-04 , 0.0013 ]    |
|Cylinders4                |-0.002   |0.002      |-1.388   |0.165              |[ -0.0051 , 4e-04 ]    |
|Cylinders5                |-0.009   |0.004      |-2.359   |0.018              |[ -0.0143 , -0.0036 ]  |
|Cylinders6                |0.002    |0.002      |0.856    |0.392              |[ -0.0013 , 0.005 ]    |
|Cylinders8                |0.001    |0.002      |0.564    |0.572              |[ -0.0028 , 0.0054 ]   |
|Cylinders10               |0.003    |0.004      |0.699    |0.485              |[ -0.0038 , 0.009 ]    |
|Cylinders12               |0.007    |0.003      |2.181    |0.029              |[ 0.0012 , 0.0128 ]    |
|Fuel.TypeE                |-0.492   |0.002      |-292.677 |0.000              |[ -0.4963 , -0.4875 ]  |
|Fuel.TypeX                |-0.141   |0.001      |-108.385 |0.000              |[ -0.1426 , -0.139 ]   |
|Fuel.TypeZ                |-0.142   |0.001      |-108.189 |0.000              |[ -0.1441 , -0.1404 ]  |
|log(Fuel.Consumption.mpg) |-0.988   |0.002      |-654.666 |0.000              |[ -0.9903 , -0.9848 ]  |
]

---

<h2 align="center">Bootstrapping Residuals</h2>

<img src="Final-Presentation_files/figure-html/unnamed-chunk-19-1.png" width="60%" height="60%" style="display: block; margin: auto;" />

- Residuals are largely symmetric
- Presence of at least one outlier and some slight right skew

---

<h2 align="center">Bootstrapping Residuals cont.</h2>

<img src="Final-Presentation_files/figure-html/unnamed-chunk-20-1.png"
width="70%" height="70%" style="display: block; margin: auto;" />

---

<h2 align="center">Bootstrapped Coefficients & Residuals</h2>

|                          |Estimate |Std. Error |Pr(>&#124;t&#124;) |btc.ci.95              |btr.ci.95              |
|:-------------------------|:--------|:----------|:------------------|:----------------------|:----------------------|
|(Intercept)               |8.891    |0.006      |0.000              |[ 8.8791 , 8.9015 ]    |[ 8.8781 , 8.9035 ]    |
|Engine.Size               |0.000    |0.000      |0.425              |[ -4e-04 , 0.0013 ]    |[ -6e-04 , 0.0014 ]    |
|Cylinders4                |-0.002   |0.002      |0.165              |[ -0.0051 , 4e-04 ]    |[ -0.0058 , 0.0014 ]   |
|Cylinders5                |-0.009   |0.004      |0.018              |[ -0.0143 , -0.0036 ]  |[ -0.0156 , -0.0015 ]  |
|Cylinders6                |0.002    |0.002      |0.392              |[ -0.0013 , 0.005 ]    |[ -0.0021 , 0.0058 ]   |
|Cylinders8                |0.001    |0.002      |0.572              |[ -0.0028 , 0.0054 ]   |[ -0.0034 , 0.0064 ]   |
|Cylinders10               |0.003    |0.004      |0.485              |[ -0.0038 , 0.009 ]    |[ -0.0042 , 0.0101 ]   |
|Cylinders12               |0.007    |0.003      |0.029              |[ 0.0012 , 0.0128 ]    |[ 8e-04 , 0.0135 ]     |
|Fuel.TypeE                |-0.492   |0.002      |0.000              |[ -0.4963 , -0.4875 ]  |[ -0.4951 , -0.4883 ]  |
|Fuel.TypeX                |-0.141   |0.001      |0.000              |[ -0.1426 , -0.139 ]   |[ -0.1433 , -0.1382 ]  |
|Fuel.TypeZ                |-0.142   |0.001      |0.000              |[ -0.1441 , -0.1404 ]  |[ -0.1449 , -0.1396 ]  |
|log(Fuel.Consumption.mpg) |-0.988   |0.002      |0.000              |[ -0.9903 , -0.9848 ]  |[ -0.9907 , -0.9844 ]  |

---
class: inverse center middle

## Model Selection

---

## Comparison of the Models' Goodness of Fit

Table: Goodness-of-fit Measures of Candidate Models

|                  |     SSE|  R.sq| R.adj|     Cp|
|:-----------------|-------:|-----:|-----:|------:|
|Full Model        | 1673191| 0.934| 0.934| 14.015|
|Edited Model      | 1673191| 0.934| 0.934| 14.015|
|Transformed Model |       2| 0.995| 0.995|    Inf|

The Transformed Model's SSE is measured on the log scale, so it is not directly comparable with the SSE of the other two models.

---

## Model Selection

Log Transformed Model selected

- highest adjusted R squared
- fewest violations of assumptions

## Variable Selection

- Most levels of Cylinders have large p-values (only Cylinders5 and Cylinders12 fall below 0.05)
- Engine.Size also has a large p-value (0.425)

We remove Engine.Size from the model.

---

## Final Model

```r
log.model = lm(log(CO2.Emissions) ~ Cylinders + Fuel.Type +
                 log(Fuel.Consumption.mpg), data = emissions.edit)
```
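Case-resampling intervals like the `btc.ci.95` column in the bootstrap tables could be computed for a model of this form roughly as follows. This is a sketch under assumptions: the data are simulated, the number of resamples `B` is arbitrary, and percentile intervals are used, which may differ from the method behind the reported intervals.

```r
# Simulated data (hypothetical): a log-log relationship with slope -1
set.seed(4)
n <- 200
sim <- data.frame(mpg = runif(n, 15, 60))
sim$co2 <- exp(9 - 1 * log(sim$mpg) + rnorm(n, sd = 0.05))

fit <- lm(log(co2) ~ log(mpg), data = sim)

# Case bootstrap: resample rows with replacement, refit, and keep the coefficients
B <- 1000
boot.coefs <- t(replicate(B, {
  idx <- sample(n, replace = TRUE)
  coef(lm(log(co2) ~ log(mpg), data = sim[idx, ]))
}))

# Percentile 95% interval for each coefficient (one row per coefficient)
boot.ci <- t(apply(boot.coefs, 2, quantile, probs = c(0.025, 0.975)))
print(round(boot.ci, 4))
```

A residual bootstrap, as in the `btr.ci.95` column, would instead keep the predictors fixed and resample the fitted model's residuals to build each new response.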
---
class: inverse center middle

## Conclusions

---

## Conclusions

- Log transformed model chosen as best model due to residual analysis and goodness of fit
- Still shows violations of assumptions
  - nonconstant variance in residuals
  - assumption of normality
- Includes outliers

Further analysis can be done through bootstrapping to mitigate some of these issues.

---
class: inverse center middle

## Contributions:

Alice: Beginning through Edited Model

Angelo: Transformed Model through Conclusion