class: center, middle, inverse, title-slide .title[ #
State Petrol Prices
] .author[ ###
By: Evan Parker
] .institute[ ###
West Chester University of Pennsylvania
] .date[ ###
09/27/2022
Prepared for
STA 490: Capstone Statistics
]

---
class: inverse, middle

## <center><b><font color = gold>Research Question </font></b></center>

<center><font size = 6 color = white>What effects do petrol tax, income, miles of paved highways, and the proportion of licensed drivers have on fuel consumption in the United States?</font></center>

---

## <center><b><font color = purple>Data Set Description </font></b></center>

The *State Petrol Prices* data set was collected by Helmut Spaeth in 1991. The data were collected over the span of a year in 48 states (most likely the continental United States).

The *State Petrol Prices* data set consists of **48** observations of **5** variables:

- **P.Tax**: The petrol tax for each state (*cents per gallon*)
- **Income**: The income per capita for each state (*US dollars*)
- **Paved**: Miles of paved state highways (*miles*)
- **Driver.Prop**: Proportion of the state's population with a valid driver's license
- **Consumption**: Consumption of petrol (*millions of gallons*)

Our response variable for this data set is **Consumption**.
<table>
 <thead>
  <tr> <th style="text-align:right;"> Index </th> <th style="text-align:right;"> P.Tax </th> <th style="text-align:right;"> Income </th> <th style="text-align:right;"> Paved </th> <th style="text-align:right;"> Driver.Prop </th> <th style="text-align:right;"> Consumption </th> </tr>
 </thead>
 <tbody>
  <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 9.0 </td> <td style="text-align:right;"> 3571 </td> <td style="text-align:right;"> 1976 </td> <td style="text-align:right;"> 0.525 </td> <td style="text-align:right;"> 541 </td> </tr>
  <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 9.0 </td> <td style="text-align:right;"> 4092 </td> <td style="text-align:right;"> 1250 </td> <td style="text-align:right;"> 0.572 </td> <td style="text-align:right;"> 524 </td> </tr>
  <tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 9.0 </td> <td style="text-align:right;"> 3865 </td> <td style="text-align:right;"> 1586 </td> <td style="text-align:right;"> 0.580 </td> <td style="text-align:right;"> 561 </td> </tr>
  <tr> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 7.5 </td> <td style="text-align:right;"> 4870 </td> <td style="text-align:right;"> 2351 </td> <td style="text-align:right;"> 0.529 </td> <td style="text-align:right;"> 414 </td> </tr>
  <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 8.0 </td> <td style="text-align:right;"> 4399 </td> <td style="text-align:right;"> 431 </td> <td style="text-align:right;"> 0.544 </td> <td style="text-align:right;"> 410 </td> </tr>
  <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:right;"> 10.0 </td> <td style="text-align:right;"> 5342 </td> <td style="text-align:right;"> 1333 </td> <td style="text-align:right;"> 0.571 </td> <td style="text-align:right;"> 457 </td> </tr>
 </tbody>
</table>

---

## <center><b><font color = purple>Assumptions </font></b></center>

- The response variable **Consumption** is normally distributed

<img src="data:image/png;base64,#Parker-HTML-Presentation_files/figure-html/Consumption Normal-1.png" width="50%" style="display: block; margin: auto;" />

- Explanatory variables are non-random and uncorrelated with each other
- Observations are independent and identically distributed

<b><font color = purple>All assumptions have been met! Time to build the model</font></b>

---

## <center><b><font color = purple >Model #1 </font></b></center>

<font color = purple><b>Suggested Model:</b> Consumption = P.Tax + Income + Paved + Driver.Prop</font>

```
##                  Estimate   Std. Error    t value     Pr(>|t|)
## (Intercept)  3.772911e+02 1.855412e+02  2.0334630 4.820676e-02
## P.Tax       -3.479015e+01 1.297020e+01 -2.6823137 1.033230e-02
## Income      -6.658875e-02 1.722175e-02 -3.8665501 3.684222e-04
## Paved       -2.425889e-03 3.389174e-03 -0.7157758 4.779989e-01
## Driver.Prop  1.336449e+03 1.922981e+02  6.9498818 1.520838e-08
```

As seen above, every variable apart from **Paved** is statistically significant. Therefore, it might be beneficial to leave **Paved** out of our model.

```
##    value    numdf    dendf
## 22.70644  4.00000 43.00000
```

Above is the F-Statistic for our model: *22.706*

In the next model, we will see how the F-Statistic changes if we do not include **Paved**.

---

## <center><b><font color = purple>Model #2 </font></b></center>

<font color = purple><b>Suggested Model:</b> Consumption = P.Tax + Income + Driver.Prop</font>

```
##                  Estimate   Std. Error   t value     Pr(>|t|)
## (Intercept)  307.32789650 156.83066992  1.959616 5.639436e-02
## P.Tax        -29.48380857  10.58357587 -2.785808 7.848184e-03
## Income        -0.06802286   0.01700975 -3.999051 2.396943e-04
## Driver.Prop 1374.76841106 183.66953983  7.485010 2.238764e-09
```

With **Paved** excluded, all of the remaining variables are statistically significant.
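
Each *t value* in the Model #2 coefficient table is the estimate divided by its standard error. The sketch below redoes that arithmetic in Python, used here only as a calculator; it is not the R code that produced the output. The cutoff `ALPHA = 0.05` is an assumption, since the slides do not state a significance level.

```python
# (estimate, standard error, reported p-value) copied from the Model #2 output.
model2 = {
    "(Intercept)": (307.32789650, 156.83066992, 5.639436e-02),
    "P.Tax": (-29.48380857, 10.58357587, 7.848184e-03),
    "Income": (-0.06802286, 0.01700975, 2.396943e-04),
    "Driver.Prop": (1374.76841106, 183.66953983, 2.238764e-09),
}

ALPHA = 0.05  # assumed significance level (not stated on the slides)

t_values = {}
significant = []
for name, (estimate, std_error, p_value) in model2.items():
    # t statistic for the null hypothesis that the coefficient is zero
    t_values[name] = estimate / std_error
    # Only predictors are screened; the intercept is kept regardless
    if name != "(Intercept)" and p_value < ALPHA:
        significant.append(name)
    print(f"{name:<12} t = {t_values[name]:8.4f}  p = {p_value:.2e}")

print("Significant predictors:", significant)
```

The computed ratios should agree with the reported *t value* column up to rounding, and all three predictors fall below the assumed cutoff.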
```
##    value    numdf    dendf
## 30.44188  3.00000 44.00000
```

Above is the F-Statistic for our new model: *30.44*

Since all of the variables are statistically significant and the F-Statistic is higher, we will move forward with *Model #2*.

---

## <center><b><font color = purple>Residual Testing </font></b></center>

.pull-left[

<center><b><font color = purple>Residual Plot</font></b></center>

<img src="data:image/png;base64,#Parker-HTML-Presentation_files/figure-html/resid2-1.png" width="100%" style="display: block; margin: auto;" />

The residual plot above shows the points scattered randomly, with no clear pattern. This indicates that the linear model is appropriate for prediction and estimation.

]

.pull-right[

<center><b><font color = purple>Q-Q Plot</font></b></center>

<img src="data:image/png;base64,#Parker-HTML-Presentation_files/figure-html/qqplot2-1.png" width="100%" style="display: block; margin: auto;" />

The Q-Q plot above shows that most of the points hug the reference line. This indicates that the residuals are approximately normally distributed and that the linear model is appropriate for prediction and estimation.

]

---
class: inverse, middle

## <center><b><font color = gold>Goodness-of-Fit </font></b></center>

The adjusted *R-squared* statistic for our model is:

```
## [1] 0.6526896
```

The *Residual Standard Error* or *RSE* for our model is:

```
## [1] 65.93772
```

The *F-Statistic* for our model is **30.44** on **3** and **44** degrees of freedom (shown below).

```
##    value    numdf    dendf
## 30.44188  3.00000 44.00000
```

The *P-Value* for our model is **0.00000000008235**.
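
The F-Statistic and the adjusted R-squared are two views of the same fit. For a linear model with an intercept, R-squared can be recovered from F and its degrees of freedom. The sketch below does that arithmetic in Python, used only as a calculator rather than as the R code behind the slides; the variable names are illustrative.

```python
# Values copied from the F-statistic output for Model #2.
f_stat = 30.44188
num_df = 3   # number of predictors
den_df = 44  # residual degrees of freedom
n = num_df + den_df + 1  # 48 observations

# F = (R^2 / num_df) / ((1 - R^2) / den_df), solved for R^2.
r2 = (f_stat * num_df) / (f_stat * num_df + den_df)

# Adjusted R^2 rescales the unexplained share by (n - 1) / den_df.
adj_r2 = 1 - (1 - r2) * (n - 1) / den_df

print(f"R-squared:          {r2:.4f}")
print(f"Adjusted R-squared: {adj_r2:.4f}")
```

The adjusted value computed this way should match the reported 0.6526896 up to rounding of the F-Statistic.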
---

## <center><b><font color = purple>Model Interpretation</font></b></center>

<font color = purple><b>Final Model: Consumption</b> = 307.33 - 29.48(<b>P.Tax</b>) - 0.07(<b>Income</b>) + 1374.77(<b>Driver.Prop</b>)</font>

Interpretation of the model's variables can be seen below (each slope holds the other variables fixed):

- **Intercept**: In a state with no petrol tax, $0 income per capita, and no licensed drivers, there would be 307 million gallons of petrol consumption. This has no practical interpretation.
- **P.Tax**: When a state's petrol tax increases by 1 cent per gallon, the state's petrol consumption decreases by *29.48 million gallons*.
- **Income**: When a state's income per capita increases by 1 US dollar, the state's petrol consumption decreases by *68 thousand gallons*.
- **Driver.Prop**: When the proportion of a state's population with a valid driver's license increases by 0.01 (one percentage point), the state's petrol consumption increases by *13.75 million gallons*. The coefficient of 1374.77 corresponds to a change of 1 in the proportion, that is, from no licensed drivers to everyone licensed.

<font color = purple><b>Potential Drawbacks: </b></font>This data set is a population rather than a sample for the year 1991, potentially creating an issue for prediction in future years. However, it is common to assume that gas consumption does not change much from year to year, which would make this model suitable for prediction and estimation.

---
class: inverse, middle

## <center><b><font color = gold>Conclusions </font></b></center>

This model can be used for prediction in future years. While the data are a population of the continental United States in 1991, they can also be treated as a sample of any given year. Thus, the model can be used for future years (1992 and on).

- **P.Tax** negatively affects petrol consumption
- **Income** negatively affects petrol consumption
- **Driver.Prop** positively affects petrol consumption

# <b><center><font color = white> Any Questions?</font></center></b>
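
---

## <center><b><font color = purple>Appendix: Worked Prediction</font></b></center>

As a backup illustration, the final model can be evaluated by hand for one state. The sketch below plugs the first row of the data preview into the Model #2 coefficients. Python is used only as a calculator here, and `predict_consumption` is a hypothetical helper name, not part of the original analysis.

```python
# Coefficients copied from the Model #2 output (full precision).
COEFFICIENTS = {
    "Intercept": 307.32789650,
    "P.Tax": -29.48380857,
    "Income": -0.06802286,
    "Driver.Prop": 1374.76841106,
}


def predict_consumption(p_tax, income, driver_prop):
    """Fitted consumption, in the same units as the Consumption column."""
    return (COEFFICIENTS["Intercept"]
            + COEFFICIENTS["P.Tax"] * p_tax
            + COEFFICIENTS["Income"] * income
            + COEFFICIENTS["Driver.Prop"] * driver_prop)


# First row of the data preview: P.Tax = 9.0, Income = 3571, Driver.Prop = 0.525.
fitted = predict_consumption(9.0, 3571, 0.525)
observed = 541  # Consumption reported for that row
residual = observed - fitted

print(f"fitted = {fitted:.1f}, observed = {observed}, residual = {residual:.1f}")
```

For this row the fitted value is about 521 against an observed 541, a residual of about 20, which is smaller than the residual standard error of 65.94.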