My research question is how fuel economy changes over time. I am a car owner and want to save money. US standards for passenger vehicle fuel economy and greenhouse gas emissions are slated to tighten steeply. The goals of the standards are to reduce greenhouse gas emissions, improve energy security, and reduce consumers’ fuel costs.
The data come from one of the data packages Hadley Wickham released to CRAN: fuel economy data for all cars sold in the US from 1984 to 2015, with 33,442 rows and 12 variables (source: Environmental Protection Agency). URL: https://github.com/hadley/fueleconomy I will study two variables: hwy (highway fuel economy, in mpg) and year (model year). Year is the explanatory variable and hwy is the response variable. This is an observational study.
The population of interest is vehicles in the Large Cars class with front-wheel drive, regular fuel, six cylinders, and a 3.5-litre engine displacement. The findings from this analysis can be generalized only to that population, since every model in the sample shares the same specification. Because the data set is meant to cover all cars sold in the US, I do not see an obvious source of bias that would prevent this generalization. However, because this is an observational study, these data cannot by themselves establish causal links between the variables of interest.
if(!require(devtools)) install.packages("devtools")
## Loading required package: devtools
## Loading required package: usethis
devtools::install_github("hadley/fueleconomy")
## Skipping install of 'fueleconomy' from a github remote, the SHA1 (d590bcf6) has not changed since last install.
## Use `force = TRUE` to force installation
library(fueleconomy)
library(psych)
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
# Build a working data frame with explicit column types
df_full <- data.frame(
  vehicle_id = as.numeric(fueleconomy::vehicles$id),
  make = fueleconomy::vehicles$make,
  model = fueleconomy::vehicles$model,
  year = as.numeric(fueleconomy::vehicles$year),
  class = fueleconomy::vehicles$class,
  trans = fueleconomy::vehicles$trans,
  drive = fueleconomy::vehicles$drive,
  cyl = as.numeric(fueleconomy::vehicles$cyl),
  displ = as.character(fueleconomy::vehicles$displ),
  fuel = fueleconomy::vehicles$fuel,
  hwy = as.numeric(fueleconomy::vehicles$hwy),
  cty = as.numeric(fueleconomy::vehicles$cty)
)
df <- subset(df_full, drive == "Front-Wheel Drive" & fuel == "Regular" & class == "Large Cars" & cyl == 6 & displ == "3.5")
cat("Data frame row number is",nrow(df),", column number is",ncol(df), "\n")
## Data frame row number is 81 , column number is 12
summary(df)
## vehicle_id make model year
## Min. :10113 Chrysler :21 Intrepid :15 Min. :1993
## 1st Qu.:13627 Dodge :15 Taurus FWD:10 1st Qu.:1997
## Median :19839 Ford :10 Avalon : 8 Median :2004
## Mean :20309 Chevrolet: 8 Vision : 7 Mean :2003
## 3rd Qu.:26004 Toyota : 8 Concorde : 6 3rd Qu.:2009
## Max. :33682 Eagle : 7 300 M : 5 Max. :2014
## (Other) :12 (Other) :30
## class trans
## Large Cars :81 Automatic 4-spd:45
## Compact Cars : 0 Automatic (S6) :11
## Midsize-Large Station Wagons: 0 Automatic 5-spd: 8
## Midsize Cars : 0 Automatic 6-spd: 8
## Midsize Station Wagons : 0 Automatic (S4) : 6
## Minicompact Cars : 0 Automatic (S5) : 3
## (Other) : 0 (Other) : 0
## drive cyl displ
## 2-Wheel Drive : 0 Min. :6 3.5 :81
## 4-Wheel Drive : 0 1st Qu.:6 0 : 0
## 4-Wheel or All-Wheel Drive: 0 Median :6 1 : 0
## All-Wheel Drive : 0 Mean :6 1.1 : 0
## Front-Wheel Drive :81 3rd Qu.:6 1.2 : 0
## Part-time 4-Wheel Drive : 0 Max. :6 1.3 : 0
## Rear-Wheel Drive : 0 (Other): 0
## fuel hwy cty
## Regular :81 Min. :23.00 Min. :15.00
## CNG : 0 1st Qu.:24.00 1st Qu.:16.00
## Diesel : 0 Median :25.00 Median :16.00
## Electricity : 0 Mean :25.83 Mean :17.11
## Gasoline or E85 : 0 3rd Qu.:28.00 3rd Qu.:18.00
## Gasoline or natural gas: 0 Max. :30.00 Max. :20.00
## (Other) : 0
describe(df$year)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 81 2003.31 6.16 2004 2003.38 7.41 1993 2014 21 -0.12 -1.32 0.68
describe(df$hwy)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 81 25.83 2.15 25 25.68 1.48 23 30 7 0.42 -1.53 0.24
ggplot(df, aes(x = year, y = hwy)) + geom_point() # refer to columns by name inside aes()
## The relationship looks linear and positive.
cat("The correlation coefficient is", cor(df$year, df$hwy))
## The correlation coefficient is 0.7977006
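In a simple linear regression with a single predictor, the squared correlation coefficient equals the model's R-squared. A minimal check, using only the correlation value printed above (the names `r` and `r_squared` are illustrative and not part of the original analysis):

```r
# Correlation between year and hwy, copied from the output above
r <- 0.7977006

# With one predictor, R-squared is the square of the correlation
r_squared <- r^2
cat("Share of variability in hwy explained by year:", round(r_squared, 4), "\n")
```

The result should agree, up to rounding, with the Multiple R-squared that `summary()` reports for a model of hwy on year.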
DATA606::plot_ss(x = df$year, y = df$hwy, showSquares = TRUE)
## Click two points to make a line.
## Call:
## lm(formula = y ~ x, data = pts)
##
## Coefficients:
## (Intercept) x
## -532.1659 0.2785
##
## Sum of Squares: 134.407
m1 <- lm(hwy ~ year, data = df)
summary(m1)
##
## Call:
## lm(formula = hwy ~ year, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.6909 -0.9056 0.2086 0.8661 2.0305
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -532.16594 47.46062 -11.21 <2e-16 ***
## year 0.27854 0.02369 11.76 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.304 on 79 degrees of freedom
## Multiple R-squared: 0.6363, Adjusted R-squared: 0.6317
## F-statistic: 138.2 on 1 and 79 DF, p-value: < 2.2e-16
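The fitted line can be used to predict highway fuel economy for a given model year. The sketch below copies the rounded coefficients from the output above, so results may differ slightly from those of the fitted object; `predict_hwy` is a hypothetical helper introduced here for illustration.

```r
# Coefficients copied (rounded) from the summary(m1) output above
b0 <- -532.16594  # intercept
b1 <- 0.27854     # slope: change in hwy (mpg) per model year

# Hypothetical helper: predicted highway mpg for a given model year
predict_hwy <- function(year) b0 + b1 * year

# Stay inside the observed range of model years (1993 to 2014);
# predictions outside that range would be extrapolation
years <- c(1995, 2005, 2014)
predictions <- data.frame(year = years,
                          predicted_hwy = round(predict_hwy(years), 2))
print(predictions)
```

With the model object available, `predict(m1, newdata = data.frame(year = 2005))` should give the same kind of prediction without copying coefficients by hand.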
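Linearity: The scatterplot suggests that the relationship between year and hwy is approximately linear. A plot of the residuals against year provides a further check.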
plot(m1$residuals ~ df$year)
abline(h = 0, lty = 3) # adds a horizontal dashed line at y = 0
## The residuals show no clear pattern, which is consistent with a linear relationship.
Nearly normal residuals: To check this condition, we can look at a histogram
hist(m1$residuals)
or a normal probability plot of the residuals.
qqnorm(m1$residuals)
qqline(m1$residuals) # adds diagonal line to the normal prob plot
## The nearly normal residuals condition is approximately met, although the histogram is slightly left-skewed and not perfectly symmetrical.
Constant variability: Based on the residual plot, the constant variability condition appears to be met.
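Interpretation of the intercept doesn’t make sense: the intercept (-532.17) is the predicted highway fuel economy for model year 0, which is far outside the observed range of model years (1993 to 2014). The slope is the quantity of interest: each additional model year is associated with an increase of about 0.28 mpg in highway fuel economy, and year explains about 63.6% of the variability in hwy.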
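One common way to obtain an interpretable intercept is to center the explanatory variable at its mean, so that the intercept becomes the predicted hwy for an average model year. The sketch below is not part of the original analysis; it derives the centered intercept from the rounded values reported above (the name `b0_centered` is illustrative).

```r
# Values copied (rounded) from the regression and describe() output
b0 <- -532.16594      # intercept of hwy ~ year
b1 <- 0.27854         # slope for year
mean_year <- 2003.31  # mean model year in the sample

# Intercept of the equivalent model hwy ~ I(year - mean_year):
# the predicted highway mpg at the mean model year
b0_centered <- b0 + b1 * mean_year
cat("Predicted hwy at the mean model year:", round(b0_centered, 2), "\n")
```

The result is close to the sample mean of hwy, as expected for a least-squares line, which passes through the point of means. With the data available, the same model could be fitted directly with `lm(hwy ~ I(year - mean(year)), data = df)`; the slope is unchanged by centering.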
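As a general rule, we reject H0 when the p-value is less than 0.05, i.e. we use a significance level of α = 0.05. Here H0 states that the slope for year is zero. The p-value for the slope is below 2e-16, so we reject H0 and conclude that highway fuel economy is positively associated with model year in this population.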
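The test of the slope can be reproduced by hand from the estimate, its standard error, and the residual degrees of freedom reported in the regression output. The following is a minimal sketch using those rounded values; the variable names are illustrative, and the 95% confidence interval is an addition not shown in the original output.

```r
# Values copied (rounded) from the regression output for the year coefficient
est <- 0.27854    # slope estimate
se <- 0.02369     # standard error of the slope
df_resid <- 79    # residual degrees of freedom
alpha <- 0.05     # significance level

# Two-sided t-test of H0: slope = 0
t_stat <- est / se
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

# 95% confidence interval for the slope
t_crit <- qt(1 - alpha / 2, df = df_resid)
ci <- c(lower = est - t_crit * se, upper = est + t_crit * se)

cat("t =", round(t_stat, 2), " reject H0:", p_value < alpha, "\n")
print(round(ci, 3))
```

If the whole interval lies above zero, it leads to the same conclusion as the p-value. With the model object available, `confint(m1, "year")` should return a comparable interval.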
Fuel-efficient vehicles require less gas to go a given distance. When we burn less gas, we produce less pollution and spend less on gas, a lot less. Our dependence on oil makes us vulnerable to oil market manipulation and price shocks. Improving the fuel efficiency of US vehicles is the single biggest step we can take to cut America’s oil consumption. Oil is a non-renewable resource, and we cannot sustain our current rate of use indefinitely. Using it wisely now allows us time to find alternative technologies and fuels that will be more sustainable.