\(\color{blue}{\text{QUESTION 1}}\)

head(cars)


colnames(cars)

str(cars)

\(~\) \(~\)

# Convert horsepower from character to numeric (double) with as.numeric().
# Entries that cannot be parsed as numbers become NA, with a coercion warning.
cars$horsepower <- as.numeric(cars$horsepower)
write_csv(cars, "cars.csv")
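
To see what this conversion does to entries that are not valid numbers, here is a minimal sketch on a made-up character vector. The values, including the "?" placeholder, are hypothetical and are not taken from the cars data.

```r
# Hypothetical toy values; "?" stands in for a non-numeric entry
hp_chr <- c("130", "165", "?", "150")

# "?" cannot be parsed as a number, so it becomes NA (the warning is silenced here)
hp_num <- suppressWarnings(as.numeric(hp_chr))

is.na(hp_num)        # which entries were lost in the conversion
sum(is.na(hp_num))   # how many entries were lost
```

Counting the NA values right after the conversion is a cheap check, because lm() drops rows with NA in any variable that appears in the model.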

\(~\) \(~\) \(~\) \(~\)

\(\color{blue}{\text{QUESTION 2}}\)

cars %>%
  ggplot(aes(x=cylinders, y=mpg)) +
  geom_point() +
  stat_smooth(method= "lm", se=FALSE) +
  labs(y="Fuel Efficiency (mpg)", x="No. of cylinders")

\(~\) \(~\) \(~\) \(~\) \(~\)

\(\color{blue}{\text{QUESTION 3}}\)

\(~\) \(~\) \(~\) \(~\)

The slope coefficient (B1 = -3.5629) is consistent with the plot in Question 2, where the regression line slopes downward. A one-unit increase in the number of cylinders is associated with a decrease in fuel efficiency of 3.5629 miles per gallon (mpg), on average.

cars %>%
  lm(mpg ~ cylinders, data=.) %>%
  summary()
## 
## Call:
## lm(formula = mpg ~ cylinders, data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.2607  -3.3841  -0.6478   2.5538  17.9022 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  42.9493     0.8330   51.56   <2e-16 ***
## cylinders    -3.5629     0.1458  -24.43   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.942 on 396 degrees of freedom
## Multiple R-squared:  0.6012, Adjusted R-squared:  0.6002 
## F-statistic: 597.1 on 1 and 396 DF,  p-value: < 2.2e-16
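
As a quick check of this interpretation, the fitted line can be evaluated by hand from the two coefficients in the output above. This is only a sketch: it copies the rounded estimates from the summary, and the cylinder counts 4, 6, and 8 are illustrative inputs.

```r
# Rounded estimates copied from the summary output above
b0 <- 42.9493   # intercept
b1 <- -3.5629   # slope on cylinders

cyl <- c(4, 6, 8)            # illustrative cylinder counts
pred_mpg <- b0 + b1 * cyl    # fitted mpg on the regression line
pred_mpg
# 28.6977 21.5719 14.4461

# Each step of 2 cylinders changes the fitted mpg by 2 * b1
diff(pred_mpg)
```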

\(~\) \(~\) \(~\) \(~\) \(~\)

\(\color{blue}{\text{QUESTION 4}}\)

\(~\) \(~\)

Holding vehicle weight and model year constant, a one-unit increase in the number of cylinders is associated with a decrease in fuel efficiency of 0.0874 miles per gallon (mpg).

Holding cylinders and model year constant, a one-pound increase in vehicle weight is associated with a decrease in fuel efficiency of 0.006511 mpg.

Holding cylinders and weight constant, a one-year increase in model year is associated with an increase in fuel efficiency of 0.7533 mpg, i.e. newer models are more fuel efficient.

The coefficient on cylinders is not statistically significant in this model: its p-value (0.706665) is greater than 0.1.

cars %>%
  lm(mpg ~ cylinders + weight + year, data = .) %>%
  summary()
## 
## Call:
## lm(formula = mpg ~ cylinders + weight + year, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.9727 -2.3180 -0.0755  2.0138 14.3505 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -13.925603   4.037305  -3.449 0.000623 ***
## cylinders    -0.087402   0.232075  -0.377 0.706665    
## weight       -0.006511   0.000459 -14.185  < 2e-16 ***
## year          0.753286   0.049802  15.126  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.438 on 394 degrees of freedom
## Multiple R-squared:  0.8079, Adjusted R-squared:  0.8065 
## F-statistic: 552.4 on 3 and 394 DF,  p-value: < 2.2e-16
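
The significance statement for cylinders can be reproduced from the estimate and standard error in the output above. The sketch below recomputes the t statistic, the two-sided p-value, and a 95% confidence interval from the rounded values, so the last digits may differ slightly from the summary.

```r
# Values copied from the cylinders row and the residual degrees of freedom above
est <- -0.087402
se  <- 0.232075
df  <- 394

t_stat <- est / se                                # t statistic
p_val  <- 2 * pt(-abs(t_stat), df)                # two-sided p-value
ci_95  <- est + c(-1, 1) * qt(0.975, df) * se     # 95% confidence interval

round(c(t = t_stat, p = p_val), 3)
ci_95   # the interval contains 0, in line with the lack of significance
```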

\(~\) \(~\) \(~\) \(~\) \(~\)

\(\color{blue}{\text{QUESTION 5}}\)

\(~\) \(~\)

Omitted variable bias explains the difference between the results in Questions 3 and 4. In Question 3, the estimated effect of cylinders on fuel efficiency was -3.56 (B1); in Question 4, it shrank in magnitude to -0.0874 (B1) and was no longer statistically significant. Question 3 was a bivariate regression that accounted only for cylinders, so the cylinders coefficient also picked up the effects of weight and year. For an omitted variable to bias a coefficient, it must be correlated with the included regressor and must itself affect the outcome. Both conditions hold here. First, the regression below of cylinders on weight and year shows that both are related to cylinders: their p-values are less than 0.1, so both are statistically significant. Second, Question 4 shows that weight and year each affect fuel efficiency, again with p-values less than 0.1.

cars %>%
  lm(cylinders ~ weight + year, data = .) %>%
  summary()
## 
## Call:
## lm(formula = cylinders ~ weight + year, data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.97944 -0.51579 -0.02345  0.45658  2.20968 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.116e+00  8.612e-01   3.619 0.000334 ***
## weight       1.749e-03  4.642e-05  37.691  < 2e-16 ***
## year        -3.760e-02  1.063e-02  -3.537 0.000452 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7455 on 395 degrees of freedom
## Multiple R-squared:  0.8089, Adjusted R-squared:  0.8079 
## F-statistic:   836 on 2 and 395 DF,  p-value: < 2.2e-16
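
The omitted variable bias argument can be made concrete with the standard OLS identity for one included and one omitted regressor:

\[ \tilde{b}_1 = \hat{b}_1 + \hat{b}_2 \, \hat{d}_1 \]

Here \(\tilde{b}_1\) is the slope from the short regression, \(\hat{b}_1\) and \(\hat{b}_2\) are the slopes from the long regression, and \(\hat{d}_1\) is the slope from regressing the omitted variable on the included one. Note that this auxiliary regression runs in the opposite direction to the regression above, although both reflect the same correlation. The sketch below checks the identity on simulated data; the variable names mirror the report, but every number in it is made up and none comes from the cars data.

```r
set.seed(1)
n <- 500

# Simulated (hypothetical) data: cylinders is correlated with weight,
# but only weight has a direct effect on mpg
weight    <- rnorm(n, mean = 3000, sd = 800)
cylinders <- 2 + 0.0015 * weight + rnorm(n, sd = 0.7)
mpg       <- 45 - 0.007 * weight + rnorm(n, sd = 3)

short <- lm(mpg ~ cylinders)            # omits weight
long  <- lm(mpg ~ cylinders + weight)   # includes weight
aux   <- lm(weight ~ cylinders)         # omitted variable on included variable

lhs <- unname(coef(short)["cylinders"])
rhs <- unname(coef(long)["cylinders"] + coef(long)["weight"] * coef(aux)["cylinders"])

c(short = lhs, long = unname(coef(long)["cylinders"]))
all.equal(lhs, rhs)   # the identity holds exactly in-sample
```

In the simulation the short regression shows a large negative cylinders slope even though cylinders has no direct effect, which is the same pattern as the change from Question 3 to Question 4.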

\(~\) \(~\) \(~\) \(~\) \(~\) \(~\)

\(~\) \(~\) \(~\) \(~\) \(~\)

\(\color{blue}{\text{QUESTION 6}}\)

\(~\) \(~\) \(~\) \(~\) \(~\)

\(~\) \(~\) \(~\)

\(\color{maroon}{\text{PART A}}\) Relationship between car model year and fuel efficiency

\(~\) \(~\) \(~\)

cars %>%
  lm(mpg ~ year, data = .) %>%
  summary()
## 
## Call:
## lm(formula = mpg ~ year, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -12.024  -5.451  -0.390   4.947  18.200 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -69.55560    6.58911  -10.56   <2e-16 ***
## year          1.22445    0.08659   14.14   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.379 on 396 degrees of freedom
## Multiple R-squared:  0.3356, Adjusted R-squared:  0.3339 
## F-statistic:   200 on 1 and 396 DF,  p-value: < 2.2e-16

\(~\) \(~\) \(~\) \(~\) \(~\)

\(~\) \(~\) \(~\) \(~\) \(~\)

\(\color{maroon}{\text{PART B}}\) Visualizing the relationship between car model year and fuel efficiency

\(~\) \(~\) \(~\)

cars %>%
  ggplot(aes(y=mpg, x=year)) +
  geom_point() +
  stat_smooth(method= "lm", se=FALSE) +
  labs(y="Fuel Efficiency (mpg)", x="Car Model Year")

\(~\) \(~\) \(~\) \(~\) \(~\)

\(~\) \(~\) \(~\) \(~\) \(~\)

\(\color{maroon}{\text{PART C}}\) Understanding the effect of cylinders, displacement, horsepower, weight, acceleration & year on fuel efficiency

\(~\) \(~\) \(~\) \(~\) \(~\)

cars %>%
  lm(mpg ~ cylinders + displacement + horsepower + weight + acceleration + year, data=.) %>%
  summary()
## 
## Call:
## lm(formula = mpg ~ cylinders + displacement + horsepower + weight + 
##     acceleration + year, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.6927 -2.3864 -0.0801  2.0291 14.3607 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -1.454e+01  4.764e+00  -3.051  0.00244 ** 
## cylinders    -3.299e-01  3.321e-01  -0.993  0.32122    
## displacement  7.678e-03  7.358e-03   1.044  0.29733    
## horsepower   -3.914e-04  1.384e-02  -0.028  0.97745    
## weight       -6.795e-03  6.700e-04 -10.141  < 2e-16 ***
## acceleration  8.527e-02  1.020e-01   0.836  0.40383    
## year          7.534e-01  5.262e-02  14.318  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.435 on 385 degrees of freedom
##   (6 observations deleted due to missingness)
## Multiple R-squared:  0.8093, Adjusted R-squared:  0.8063 
## F-statistic: 272.2 on 6 and 385 DF,  p-value: < 2.2e-16
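
Individually insignificant coefficients in the output above do not by themselves show that those variables are jointly irrelevant, because engine characteristics tend to be correlated with each other. One way to check is a joint F-test that compares a restricted model with the full model. The sketch below uses R's built-in mtcars data purely as a stand-in, since it has similar columns; it is not the cars data from this report, and its results say nothing about the regression above.

```r
# mtcars ships with base R; it is only a stand-in for the report's data
restricted <- lm(mpg ~ wt, data = mtcars)
full       <- lm(mpg ~ wt + cyl + disp + hp + qsec, data = mtcars)

# Joint F-test of H0: the coefficients on cyl, disp, hp and qsec are all zero
joint_test <- anova(restricted, full)
joint_test

# p-value of the joint test (second row of the anova table)
joint_test[["Pr(>F)"]][2]
```

With the report's data, the same pattern would compare a model with only weight and year against the six-variable model, fitted on the same rows so that the 6 observations with missing values are excluded from both.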

\(~\) \(~\) \(~\)

Relationship between weight and car model year

\(~\) \(~\) \(~\)

cars %>%
  ggplot(aes(x=year, y=weight)) +
  geom_point() +
  stat_smooth(method= "lm", se=FALSE) +
  labs(y="Vehicle Weight (in pounds)", x="Car Model Year")

\(~\) \(~\) \(~\) \(~\)

Relationship between horsepower and car model year

\(~\) \(~\) \(~\)

cars %>%
  ggplot(aes(x=year, y=horsepower)) +
  geom_point() +
  stat_smooth(method= "lm", se=FALSE) +
  labs(y="Engine Horsepower", x="Car Model Year")

\(~\) \(~\) \(~\) \(~\) \(~\)

Relationship between cylinders and car model year

\(~\) \(~\) \(~\)

cars %>%
  ggplot(aes(x=year, y=cylinders)) +
  geom_point() +
  stat_smooth(method= "lm", se=FALSE) +
  labs(y="No. of cylinders in engine", x="Car Model Year")

\(~\) \(~\) \(~\) \(~\) \(~\)

Relationship between displacement and car model year

\(~\) \(~\) \(~\)

cars %>%
  ggplot(aes(x=year, y=displacement)) +
  geom_point() +
  stat_smooth(method= "lm", se=FALSE) +
  labs(y="Engine Displacement (in cubic inches)", x="Car Model Year") 

\(~\) \(~\) \(~\) \(~\) \(~\)

Relationship between acceleration and car model year

\(~\) \(~\) \(~\)

cars %>%
  ggplot(aes(x=year, y=acceleration)) +
  geom_point() +
  stat_smooth(method= "lm", se=FALSE) +
  labs(y="Acceleration time (in seconds)", x="Car Model Year")

\(~\) \(~\) \(~\) \(~\) \(~\)

Conclusion:

The friend was correct: fuel efficiency improved as model years advanced, which is consistent with car manufacturers becoming more conscious about producing fuel-efficient cars. I first ran a bivariate regression (\(\color{maroon}{\text{PART A}}\)) to estimate the effect of car model year on fuel efficiency. The effect is statistically significant, since the p-value for year (<2e-16) is less than 0.1. To visualize the relationship, I plotted the data with the regression line. The visualization (\(\color{maroon}{\text{PART B}}\)) shows a positive, upward-sloping relationship between car model year and fuel efficiency.

\(~\) \(~\) \(~\)

In \(\color{maroon}{\text{PART C}}\), I ran a multivariate linear regression to test whether the effects of cylinders, displacement, horsepower, weight, acceleration, and year on fuel efficiency are statistically significant. Only the effects of weight and year are statistically significant. I therefore plotted a bivariate regression line to understand the relationship between car model year and vehicle weight. As the years went by, car manufacturers produced lighter vehicles. This is in line with the regression in Question 4, which showed that a one-pound increase in vehicle weight decreases fuel efficiency.

\(~\) \(~\) \(~\)

Further, I plotted bivariate regression lines to see how each of the other variables changed as the model year increased. As the years went by, car manufacturers produced cars with fewer cylinders, lower engine displacement (in cubic inches), lower engine horsepower, and faster acceleration times.
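
The direction of each trend can also be read from the fitted slopes rather than from the plots alone. The sketch below defines a small helper, slopes_on(), which is a hypothetical name introduced here and not part of any package, and runs it on a made-up toy data frame whose values are not from the cars data.

```r
# Slope of each outcome when regressed on one common predictor
slopes_on <- function(df, predictor, outcomes) {
  sapply(outcomes, function(y) {
    fit <- lm(reformulate(predictor, response = y), data = df)
    unname(coef(fit)[predictor])
  })
}

# Toy data with hypothetical values, only to show the call
toy <- data.frame(
  year      = c(70, 72, 74, 76, 78, 80),
  weight    = c(3500, 3400, 3100, 3000, 2800, 2600),
  cylinders = c(8, 8, 6, 6, 4, 4)
)

trend <- slopes_on(toy, "year", c("weight", "cylinders"))
trend   # a negative slope means the variable falls as the year increases
```

Applied to this report, the call would pass the cars data frame with "year" as the predictor and the five plotted variables as outcomes.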
