Synopsis

In this report, we explore the relationship between a set of variables and miles per gallon (MPG) in a data set of cars. In particular, we ask whether automatic or manual transmission is better for MPG, and how large the difference between the two is.

Summary

The data were extracted from the 1974 Motor Trend US magazine and comprise fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).

In total, there are 32 observations of 11 variables. The first three rows look like this:

##                mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
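The dimensions and first rows above can be reproduced with a short sketch like the following. It assumes the data is R's built-in `mtcars` data set, which matches the `data = mtcars` argument in the model call later in this report.

```r
# Assumption: the report uses the built-in mtcars data set (datasets package).
data(mtcars)

dim(mtcars)      # number of observations and number of variables
head(mtcars, 3)  # first three rows, as in the listing above
```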

Data Exploration

Based on the boxplot below, manual transmission cars seem to be more fuel efficient than automatic transmission cars.

Next, we model the relationship between the dependent variable, mpg, and the regressor am.
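A boxplot of this kind could be drawn roughly as follows. This is a sketch in base graphics, not necessarily the plotting code behind the original figure. The factor labels are our own choice, based on the coding 0 = automatic, 1 = manual used elsewhere in this report.

```r
# Assumption: built-in mtcars data; am is coded 0 = automatic, 1 = manual.
data(mtcars)

# Readable labels for the two transmission types (labels are illustrative).
transmission <- factor(mtcars$am, levels = c(0, 1),
                       labels = c("Automatic", "Manual"))

# Side-by-side boxplots of mpg for each transmission type.
boxplot(mtcars$mpg ~ transmission,
        xlab = "Transmission", ylab = "Miles per gallon (MPG)")

# Group medians, i.e. the centre lines of the two boxes.
mpg_medians <- tapply(mtcars$mpg, transmission, median)
mpg_medians
```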

##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 17.147368   1.124603 15.247492 1.133983e-15
## factor(am)1  7.244939   1.764422  4.106127 2.850207e-04

The mean mpg for automatic cars is 17.147 (the intercept), while 7.245 is the estimated increase in mean mpg for manual transmission cars relative to automatic ones.
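The coefficient table and its reading as group means can be checked with a sketch like this one, assuming the built-in `mtcars` data. The object names `fit1` and `group_means` are illustrative.

```r
# Assumption: built-in mtcars data; am is coded 0 = automatic, 1 = manual.
data(mtcars)

# Simple linear model with transmission type as the only regressor.
fit1 <- lm(mpg ~ factor(am), data = mtcars)
summary(fit1)$coefficients

# Mean mpg within each transmission group.
group_means <- tapply(mtcars$mpg, mtcars$am, mean)
group_means

# The intercept is the automatic mean; the am coefficient is the
# difference between the manual and automatic means.
unname(group_means["1"] - group_means["0"])
```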

Hypothesis Testing

Next, we run a Welch two-sample t-test to check whether the difference is significant.

## 
##  Welch Two Sample t-test
## 
## data:  mpg by am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean in group 0 mean in group 1 
##        17.14737        24.39231
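The test output above corresponds to a call along these lines, again assuming the built-in `mtcars` data. `t.test` uses the Welch (unequal variance) version by default, which matches the fractional degrees of freedom in the output.

```r
# Assumption: built-in mtcars data; am is coded 0 = automatic, 1 = manual.
data(mtcars)

# Welch two-sample t-test of mpg between the two transmission groups.
tt <- t.test(mpg ~ am, data = mtcars)
tt

# Individual pieces of the result.
tt$statistic  # t statistic
tt$p.value    # two-sided p-value
tt$conf.int   # 95% confidence interval for mean(group 0) - mean(group 1)
```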

Since the p-value of 0.0013736 is less than 0.05, we reject the null hypothesis of no difference: there is a significant difference in mean MPG between automatic and manual transmission cars. Taken on its own, this comparison suggests that manual transmission is better for mpg.

##                2.5 %   97.5 %
## (Intercept) 14.85062 19.44411
## factor(am)1  3.64151 10.84837
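These intervals are the 95% confidence intervals for the coefficients of the simple model. A minimal sketch, assuming the built-in `mtcars` data and the illustrative name `fit1`, also shows the same interval computed by hand.

```r
# Assumption: built-in mtcars data.
data(mtcars)
fit1 <- lm(mpg ~ factor(am), data = mtcars)

# 95% confidence intervals for both coefficients.
ci <- confint(fit1, level = 0.95)
ci

# The same interval for the am coefficient by hand:
# estimate +/- t quantile (residual df) * standard error.
coefs <- coef(summary(fit1))
est <- coefs["factor(am)1", "Estimate"]
se  <- coefs["factor(am)1", "Std. Error"]
manual_ci <- est + c(-1, 1) * qt(0.975, df = fit1$df.residual) * se
manual_ci
```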

We are 95% confident that, on average, manual transmission cars get between 3.64 and 10.85 more miles per gallon than automatic transmission cars.

Model Building

This is a scatterplot matrix across all 11 variables with regression lines and correlation values, grouped by transmission (red = automatic, green = manual).

From the pairs plot above, besides am, the other variables that are highly correlated with mpg are cyl, disp, hp and wt. Hence, we will try adding cyl, disp and wt to the model one at a time and compare the nested models with an analysis of variance.
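The correlations with mpg can also be listed numerically. The sketch below assumes the built-in `mtcars` data and uses plain Pearson correlations. It does not reproduce the scatterplot matrix itself.

```r
# Assumption: built-in mtcars data; all 11 columns are numeric.
data(mtcars)

# Pearson correlation of every variable with mpg.
cor_mpg <- cor(mtcars)["mpg", ]

# Drop mpg's correlation with itself and sort from most negative to most positive.
round(sort(cor_mpg[names(cor_mpg) != "mpg"]), 2)
```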

## Analysis of Variance Table
## 
## Model 1: mpg ~ factor(am)
## Model 2: mpg ~ factor(am) + factor(cyl)
## Model 3: mpg ~ factor(am) + factor(cyl) + disp
## Model 4: mpg ~ factor(am) + factor(cyl) + disp + wt
##   Res.Df    RSS Df Sum of Sq       F    Pr(>F)    
## 1     30 720.90                                   
## 2     28 264.50  2    456.40 32.4451 8.589e-08 ***
## 3     27 230.46  1     34.04  4.8391   0.03691 *  
## 4     26 182.87  1     47.59  6.7663   0.01513 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
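The table above compares four nested models. A sketch of how such a comparison can be produced, assuming the built-in `mtcars` data, is shown below. The names `fit1` to `fit4` and `nested` are illustrative.

```r
# Assumption: built-in mtcars data.
data(mtcars)

# Four nested models, each adding one term to the previous one.
fit1 <- lm(mpg ~ factor(am), data = mtcars)
fit2 <- lm(mpg ~ factor(am) + factor(cyl), data = mtcars)
fit3 <- lm(mpg ~ factor(am) + factor(cyl) + disp, data = mtcars)
fit4 <- lm(mpg ~ factor(am) + factor(cyl) + disp + wt, data = mtcars)

# Sequential F-tests: each row compares a model with the one before it.
nested <- anova(fit1, fit2, fit3, fit4)
nested
```

Each p-value in the table refers to the term added in that row, so the last row tests model 4 against model 3.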

Each added term gives a significant improvement at the 0.05 level. The last p-value, 0.01513, shows that adding wt (model 4) improves on model 3, so we prefer model 4 over our initial model 1.

Next, we construct residual plots to check for any regular patterns.

The residual plots show no systematic pattern, indicating that the linear model is a reasonable fit.
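Diagnostic plots of this kind can be drawn with base R as sketched below, assuming the built-in `mtcars` data and the illustrative name `fit4` for model 4. The original figure may have been produced differently.

```r
# Assumption: built-in mtcars data; fit4 is model 4 from the ANOVA table.
data(mtcars)
fit4 <- lm(mpg ~ factor(am) + factor(cyl) + disp + wt, data = mtcars)

# Arrange the four standard lm diagnostic plots in a 2 x 2 grid.
old_par <- par(mfrow = c(2, 2))
plot(fit4)   # residuals vs fitted, normal Q-Q, scale-location, leverage
par(old_par) # restore the previous graphics settings
```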

## 
## Call:
## lm(formula = mpg ~ factor(am) + factor(cyl) + disp + wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5029 -1.2829 -0.4825  1.4954  5.7889 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  33.816067   2.914272  11.604 8.79e-12 ***
## factor(am)1   0.141212   1.326751   0.106  0.91605    
## factor(cyl)6 -4.304782   1.492355  -2.885  0.00777 ** 
## factor(cyl)8 -6.318406   2.647658  -2.386  0.02458 *  
## disp          0.001632   0.013757   0.119  0.90647    
## wt           -3.249176   1.249098  -2.601  0.01513 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.652 on 26 degrees of freedom
## Multiple R-squared:  0.8376, Adjusted R-squared:  0.8064 
## F-statistic: 26.82 on 5 and 26 DF,  p-value: 1.73e-09
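The summary above can be reproduced, and individual figures pulled out of it, with a sketch like the following. It assumes the built-in `mtcars` data. The names `fit4` and `s4` are illustrative.

```r
# Assumption: built-in mtcars data.
data(mtcars)
fit4 <- lm(mpg ~ factor(am) + factor(cyl) + disp + wt, data = mtcars)

s4 <- summary(fit4)
s4

# Share of the variance in mpg explained by the model.
s4$r.squared

# Estimate, standard error, t value and p-value for the transmission term.
s4$coefficients["factor(am)1", ]
```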

This model explains 83.76% of the variance in mpg. However, with a p-value of 0.91605 for the am coefficient, transmission type has no statistically significant effect on fuel efficiency once the number of cylinders, displacement and weight are accounted for.