#Load and Explore the Data
# Load dataset
data(mtcars)
# View first few rows
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
# Summary statistics
summary(mtcars)
##       mpg             cyl             disp             hp
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0
##       drat             wt             qsec             vs
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000
##        am              gear            carb
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000
##  Median :0.0000   Median :4.000   Median :2.000
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000
Interpretation of the mtcars Summary Statistics:
The mtcars dataset contains information on 32 cars and their performance characteristics.
The average miles per gallon (mpg) is 20.09, with values ranging from 10.4 to 33.9, so fuel efficiency varies considerably among the cars. The number of cylinders (cyl) ranges from 4 to 8 with a median of 6; the mean of 6.19 is less informative here because every car in the dataset has 4, 6, or 8 cylinders. Engine displacement (disp) averages 230.7 cubic inches and ranges from 71.1 to 472.0. The average horsepower (hp) is 146.7, but it ranges from 52 to 335, showing large differences in engine power. Weight (wt) is recorded in units of 1,000 lbs; the mean of 3.217 corresponds to about 3,217 pounds, with cars ranging from 1,513 to 5,424 pounds. The average rear axle ratio (drat) is 3.60, while the average quarter-mile time (qsec) is 17.85 seconds. For transmission type (am, coded 0 = automatic and 1 = manual), the mean is 0.41, which indicates that about 41% of the cars have manual transmission and 59% have automatic transmission. The average number of gears is 3.69, and at least three quarters of the cars have 3 or 4 gears (the first and third quartiles are 3 and 4). The average number of carburetors (carb) is 2.81, ranging from 1 to 8.
Overall Conclusion:
The summary statistics show that the cars in the dataset differ substantially in terms of fuel efficiency, engine size, horsepower, and weight. The dataset includes both small, fuel-efficient cars and larger, more powerful vehicles, making it useful for analyzing relationships between vehicle characteristics and performance.
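The proportions and ranges quoted in the interpretation above can be checked directly from the data. The following is a minimal sketch using only base R; the names `manual_share`, `cyl_counts`, and `wt_range_lbs` are illustrative and not part of the original analysis.

```r
data(mtcars)

# Share of cars with manual transmission (am is coded 0 = automatic, 1 = manual)
manual_share <- mean(mtcars$am)

# Number of cars at each cylinder count
cyl_counts <- table(mtcars$cyl)

# Range of weight in pounds (wt is recorded in units of 1,000 lbs)
wt_range_lbs <- range(mtcars$wt) * 1000

print(round(manual_share, 4))
print(cyl_counts)
print(wt_range_lbs)
```

Tabulating a discrete variable such as cyl is usually more informative than its mean, because the mean (6.19) is not a value any car actually takes.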
#Fit a Multiple Linear Regression Model
# Fit model
model <- lm(mpg ~ wt + hp + am, data = mtcars)
# Display results
summary(model)
##
## Call:
## lm(formula = mpg ~ wt + hp + am, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.4221 -1.7924 -0.3788  1.2249  5.5317
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34.002875   2.642659  12.867 2.82e-13 ***
## wt          -2.878575   0.904971  -3.181 0.003574 **
## hp          -0.037479   0.009605  -3.902 0.000546 ***
## am           2.083710   1.376420   1.514 0.141268
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.538 on 28 degrees of freedom
## Multiple R-squared: 0.8399, Adjusted R-squared: 0.8227
## F-statistic: 48.96 on 3 and 28 DF, p-value: 2.908e-11
Interpretation of the Multiple Linear Regression Model:
A multiple linear regression model was fitted to examine how weight (wt), horsepower (hp), and transmission type (am) affect a car’s fuel efficiency (mpg).
Coefficients Interpretation:
Weight (wt) has a coefficient of -2.88 and is statistically significant (p = 0.0036). Because wt is measured in units of 1,000 lbs, this means that, holding horsepower and transmission type constant, each additional 1,000 lbs of weight is associated with a decrease of about 2.88 mpg. Horsepower (hp) has a coefficient of -0.037 and is statistically significant (p = 0.0005): holding the other variables constant, each additional horsepower is associated with a decrease of about 0.037 mpg, or roughly 3.7 mpg per 100 hp. Transmission type (am) has a coefficient of 2.08, suggesting that manual cars get about 2.08 mpg more than automatic cars of the same weight and horsepower. However, this effect is not statistically significant at the 5% level (p = 0.141), so there is not enough evidence to conclude that transmission type affects mpg in this model.
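To make the "holding other variables constant" reading concrete, the sketch below compares predictions for two hypothetical cars that differ only in weight. The values in `base_car` (3,000 lbs, 120 hp, automatic) are made up for illustration; any other values would give the same difference.

```r
model <- lm(mpg ~ wt + hp + am, data = mtcars)

# Hypothetical car: 3,000 lbs, 120 hp, automatic (illustrative values only)
base_car <- data.frame(wt = 3.0, hp = 120, am = 0)

# Same car, but 1,000 lbs heavier (wt is in units of 1,000 lbs)
heavier_car <- transform(base_car, wt = wt + 1)

# The change in predicted mpg equals the wt coefficient
mpg_change <- unname(predict(model, heavier_car) - predict(model, base_car))

print(mpg_change)
print(unname(coef(model)["wt"]))
```

Because the model is linear with no interaction terms, the predicted change does not depend on the horsepower or transmission values chosen for the baseline car.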
Model Performance:
The R-squared value (0.8399) shows that approximately 84% of the variation in mpg is explained by weight, horsepower, and transmission type. The Adjusted R-squared (0.8227) indicates that the model still explains about 82% of the variation after accounting for the number of predictors. The F-statistic p-value (2.908e-11) is very small, indicating that the overall regression model is statistically significant.
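The two fit measures above can be reproduced from their definitions, which may help clarify what they measure. This is a sketch in base R; the names `rss`, `tss`, `r2`, and `adj_r2` are illustrative.

```r
model <- lm(mpg ~ wt + hp + am, data = mtcars)

# Residual sum of squares and total sum of squares
rss <- sum(residuals(model)^2)
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

# R-squared: share of the variation in mpg explained by the model
r2 <- 1 - rss / tss

# Adjusted R-squared: penalises R-squared for the number of predictors p
n <- nrow(mtcars)
p <- length(coef(model)) - 1
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(c(r2 = r2, adj_r2 = adj_r2))
```

The adjustment multiplies the unexplained share by (n - 1) / (n - p - 1), so adding a predictor raises adjusted R-squared only if it reduces the residual sum of squares enough to offset the lost degree of freedom.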
Conclusion:
The results show that weight and horsepower are important predictors of fuel efficiency, with heavier and more powerful cars generally having lower mpg. Although manual transmission cars appear to have better fuel efficiency, the effect is not statistically significant. Overall, the model provides a good explanation of fuel efficiency, accounting for about 84% of the variation in mpg.
#Check Model Assumptions
par(mfrow = c(2, 2))
plot(model)
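The four diagnostic plots can be complemented with a couple of numeric checks. The sketch below is one possible approach, not part of the original analysis: it applies a Shapiro-Wilk test to the residuals and flags observations whose Cook's distance exceeds 4/n, which is a common rule of thumb rather than a formal threshold.

```r
model <- lm(mpg ~ wt + hp + am, data = mtcars)

# Normality of residuals: a small p-value would suggest departure from normality
sw <- shapiro.test(residuals(model))

# Influence of individual cars on the fitted coefficients
cd <- cooks.distance(model)

# Rule-of-thumb cut-off (an assumption, not a formal test)
influential <- names(cd)[cd > 4 / nrow(mtcars)]

print(sw)
print(influential)
```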
# Confidence intervals
confint(model)
##                   2.5 %      97.5 %
## (Intercept) 28.58963286 39.41611738
## wt          -4.73232353 -1.02482730
## hp          -0.05715454 -0.01780291
## am          -0.73575874  4.90317900
# Predictions
predict(model)
## Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive
## 24.422058 23.688021 25.922769 20.625595
## Hornet Sportabout Valiant Duster 360 Merc 240D
## 17.541799 20.107738 14.544073 22.496539
## Merc 230 Merc 280 Merc 280C Merc 450SE
## 21.374884 19.490692 19.490692 15.540903
## Merc 450SL Merc 450SLC Cadillac Fleetwood Lincoln Continental
## 16.519618 16.375689 11.207215 10.331556
## Chrysler Imperial Fiat 128 Honda Civic Toyota Corolla
## 9.996783 27.280123 29.488792 28.368282
## Toyota Corona Dodge Challenger AMC Javelin Camaro Z28
## 23.271750 18.248481 18.493160 13.766858
## Pontiac Firebird Fiat X1-9 Porsche 914-2 Lotus Europa
## 16.375976 28.042946 26.515870 27.496205
## Ford Pantera L Ferrari Dino Maserati Bora Volvo 142E
## 17.067118 21.554154 13.254698 23.998964
Interpretation of Confidence Intervals:
Each 95% confidence interval gives a range of plausible values for the true coefficient: intervals constructed this way would contain the true value in about 95% of repeated samples.
For weight (wt), the confidence interval is (-4.732, -1.025). Since zero is not included in this interval, weight has a significant effect on fuel efficiency. This means heavier cars tend to have lower mpg. For horsepower (hp), the confidence interval is (-0.057, -0.018). Since zero is not included, horsepower is also a significant predictor. Cars with higher horsepower generally have lower fuel efficiency. For transmission type (am), the confidence interval is (-0.736, 4.903). Because zero lies within this interval, there is not enough evidence to conclude that transmission type significantly affects fuel efficiency.
Overall, the confidence intervals support the results obtained from the p-values in the regression output.
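The agreement between the intervals and the p-values is not a coincidence: both are built from the same estimates and standard errors. As a sketch, the intervals can be rebuilt by hand as estimate ± t × standard error, using the t quantile with the residual degrees of freedom (28 here). The names `est`, `se`, `t_crit`, and `manual_ci` are illustrative.

```r
model <- lm(mpg ~ wt + hp + am, data = mtcars)

# Estimates and standard errors from the coefficient table
coef_table <- coef(summary(model))
est <- coef_table[, "Estimate"]
se <- coef_table[, "Std. Error"]

# Critical t value for a two-sided 95% interval
t_crit <- qt(0.975, df = df.residual(model))

# Hand-built confidence intervals
manual_ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)

print(manual_ci)
print(confint(model))
```

An interval excludes zero exactly when the absolute t value exceeds `t_crit`, which is the same condition as p < 0.05.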
Conclusion:
The multiple linear regression model describes the relationship between fuel efficiency and weight, horsepower, and transmission type. Weight and horsepower significantly influence fuel efficiency, while transmission type does not have a significant effect once the other two are accounted for. With an R-squared of 0.84, the model explains about 84% of the variation in mpg, suggesting that it fits the data well and can predict fuel efficiency reasonably well for cars similar to those in the dataset.
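Calling predict(model) with no new data returns fitted values for the 32 cars already in the dataset. To predict for a car outside the data, a data frame of predictor values can be supplied. The sketch below uses a hypothetical car (3,000 lbs, 150 hp, manual) chosen only for illustration, and asks for a 95% prediction interval to convey the uncertainty of a single prediction.

```r
model <- lm(mpg ~ wt + hp + am, data = mtcars)

# Hypothetical new car (illustrative values, not from the dataset)
new_car <- data.frame(wt = 3.0, hp = 150, am = 1)

# Point prediction with a 95% prediction interval
pred_int <- predict(model, newdata = new_car,
                    interval = "prediction", level = 0.95)

print(pred_int)
```

The prediction interval is wider than a confidence interval for the mean response because it also accounts for the residual variation of an individual car around the regression line.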
#Homework 2
#Read about variable selection methods
#Fit the Full Regression Model
# Full model
full_model <- lm(mpg ~ ., data = mtcars)
# Summary
summary(full_model)
##
## Call:
## lm(formula = mpg ~ ., data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.4506 -1.6044 -0.1196  1.2193  4.6271
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.30337   18.71788   0.657   0.5181
## cyl         -0.11144    1.04502  -0.107   0.9161
## disp         0.01334    0.01786   0.747   0.4635
## hp          -0.02148    0.02177  -0.987   0.3350
## drat         0.78711    1.63537   0.481   0.6353
## wt          -3.71530    1.89441  -1.961   0.0633 .
## qsec         0.82104    0.73084   1.123   0.2739
## vs           0.31776    2.10451   0.151   0.8814
## am           2.52023    2.05665   1.225   0.2340
## gear         0.65541    1.49326   0.439   0.6652
## carb        -0.19942    0.82875  -0.241   0.8122
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.65 on 21 degrees of freedom
## Multiple R-squared: 0.869, Adjusted R-squared: 0.8066
## F-statistic: 13.93 on 10 and 21 DF, p-value: 3.793e-07
#Backward Elimination
backward_model <- step(full_model,
                       direction = "backward")
## Start: AIC=70.9
## mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb
##
## Df Sum of Sq RSS AIC
## - cyl 1 0.0799 147.57 68.915
## - vs 1 0.1601 147.66 68.932
## - carb 1 0.4067 147.90 68.986
## - gear 1 1.3531 148.85 69.190
## - drat 1 1.6270 149.12 69.249
## - disp 1 3.9167 151.41 69.736
## - hp 1 6.8399 154.33 70.348
## - qsec 1 8.8641 156.36 70.765
## <none> 147.49 70.898
## - am 1 10.5467 158.04 71.108
## - wt 1 27.0144 174.51 74.280
##
## Step: AIC=68.92
## mpg ~ disp + hp + drat + wt + qsec + vs + am + gear + carb
##
## Df Sum of Sq RSS AIC
## - vs 1 0.2685 147.84 66.973
## - carb 1 0.5201 148.09 67.028
## - gear 1 1.8211 149.40 67.308
## - drat 1 1.9826 149.56 67.342
## - disp 1 3.9009 151.47 67.750
## - hp 1 7.3632 154.94 68.473
## <none> 147.57 68.915
## - qsec 1 10.0933 157.67 69.032
## - am 1 11.8359 159.41 69.384
## - wt 1 27.0280 174.60 72.297
##
## Step: AIC=66.97
## mpg ~ disp + hp + drat + wt + qsec + am + gear + carb
##
## Df Sum of Sq RSS AIC
## - carb 1 0.6855 148.53 65.121
## - gear 1 2.1437 149.99 65.434
## - drat 1 2.2139 150.06 65.449
## - disp 1 3.6467 151.49 65.753
## - hp 1 7.1060 154.95 66.475
## <none> 147.84 66.973
## - am 1 11.5694 159.41 67.384
## - qsec 1 15.6830 163.53 68.200
## - wt 1 27.3799 175.22 70.410
##
## Step: AIC=65.12
## mpg ~ disp + hp + drat + wt + qsec + am + gear
##
## Df Sum of Sq RSS AIC
## - gear 1 1.565 150.09 63.457
## - drat 1 1.932 150.46 63.535
## <none> 148.53 65.121
## - disp 1 10.110 158.64 65.229
## - am 1 12.323 160.85 65.672
## - hp 1 14.826 163.35 66.166
## - qsec 1 26.408 174.94 68.358
## - wt 1 69.127 217.66 75.350
##
## Step: AIC=63.46
## mpg ~ disp + hp + drat + wt + qsec + am
##
## Df Sum of Sq RSS AIC
## - drat 1 3.345 153.44 62.162
## - disp 1 8.545 158.64 63.229
## <none> 150.09 63.457
## - hp 1 13.285 163.38 64.171
## - am 1 20.036 170.13 65.466
## - qsec 1 25.574 175.67 66.491
## - wt 1 67.572 217.66 73.351
##
## Step: AIC=62.16
## mpg ~ disp + hp + wt + qsec + am
##
## Df Sum of Sq RSS AIC
## - disp 1 6.629 160.07 61.515
## <none> 153.44 62.162
## - hp 1 12.572 166.01 62.682
## - qsec 1 26.470 179.91 65.255
## - am 1 32.198 185.63 66.258
## - wt 1 69.043 222.48 72.051
##
## Step: AIC=61.52
## mpg ~ hp + wt + qsec + am
##
## Df Sum of Sq RSS AIC
## - hp 1 9.219 169.29 61.307
## <none> 160.07 61.515
## - qsec 1 20.225 180.29 63.323
## - am 1 25.993 186.06 64.331
## - wt 1 78.494 238.56 72.284
##
## Step: AIC=61.31
## mpg ~ wt + qsec + am
##
## Df Sum of Sq RSS AIC
## <none> 169.29 61.307
## - am 1 26.178 195.46 63.908
## - qsec 1 109.034 278.32 75.217
## - wt 1 183.347 352.63 82.790
# Final model
summary(backward_model)
##
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.4811 -1.5555 -0.7257  1.4110  4.6610
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   9.6178     6.9596   1.382 0.177915
## wt           -3.9165     0.7112  -5.507 6.95e-06 ***
## qsec          1.2259     0.2887   4.247 0.000216 ***
## am            2.9358     1.4109   2.081 0.046716 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared: 0.8497, Adjusted R-squared: 0.8336
## F-statistic: 52.75 on 3 and 28 DF, p-value: 1.21e-11
#Forward Selection
# Null model
null_model <- lm(mpg ~ 1, data = mtcars)
# Forward selection
forward_model <- step(null_model,
                      scope = list(lower = null_model,
                                   upper = full_model),
                      direction = "forward")
## Start: AIC=115.94
## mpg ~ 1
##
## Df Sum of Sq RSS AIC
## + wt 1 847.73 278.32 73.217
## + cyl 1 817.71 308.33 76.494
## + disp 1 808.89 317.16 77.397
## + hp 1 678.37 447.67 88.427
## + drat 1 522.48 603.57 97.988
## + vs 1 496.53 629.52 99.335
## + am 1 405.15 720.90 103.672
## + carb 1 341.78 784.27 106.369
## + gear 1 259.75 866.30 109.552
## + qsec 1 197.39 928.66 111.776
## <none> 1126.05 115.943
##
## Step: AIC=73.22
## mpg ~ wt
##
## Df Sum of Sq RSS AIC
## + cyl 1 87.150 191.17 63.198
## + hp 1 83.274 195.05 63.840
## + qsec 1 82.858 195.46 63.908
## + vs 1 54.228 224.09 68.283
## + carb 1 44.602 233.72 69.628
## + disp 1 31.639 246.68 71.356
## <none> 278.32 73.217
## + drat 1 9.081 269.24 74.156
## + gear 1 1.137 277.19 75.086
## + am 1 0.002 278.32 75.217
##
## Step: AIC=63.2
## mpg ~ wt + cyl
##
## Df Sum of Sq RSS AIC
## + hp 1 14.5514 176.62 62.665
## + carb 1 13.7724 177.40 62.805
## <none> 191.17 63.198
## + qsec 1 10.5674 180.60 63.378
## + gear 1 3.0281 188.14 64.687
## + disp 1 2.6796 188.49 64.746
## + vs 1 0.7059 190.47 65.080
## + am 1 0.1249 191.05 65.177
## + drat 1 0.0010 191.17 65.198
##
## Step: AIC=62.66
## mpg ~ wt + cyl + hp
##
## Df Sum of Sq RSS AIC
## <none> 176.62 62.665
## + am 1 6.6228 170.00 63.442
## + disp 1 6.1762 170.44 63.526
## + carb 1 2.5187 174.10 64.205
## + drat 1 2.2453 174.38 64.255
## + qsec 1 1.4010 175.22 64.410
## + gear 1 0.8558 175.76 64.509
## + vs 1 0.0599 176.56 64.654
# Final model
summary(forward_model)
##
## Call:
## lm(formula = mpg ~ wt + cyl + hp, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.9290 -1.5598 -0.5311  1.1850  5.8986
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 38.75179    1.78686  21.687  < 2e-16 ***
## wt          -3.16697    0.74058  -4.276 0.000199 ***
## cyl         -0.94162    0.55092  -1.709 0.098480 .
## hp          -0.01804    0.01188  -1.519 0.140015
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.512 on 28 degrees of freedom
## Multiple R-squared: 0.8431, Adjusted R-squared: 0.8263
## F-statistic: 50.17 on 3 and 28 DF, p-value: 2.184e-11
#Stepwise Selection
stepwise_model <- step(null_model,
                       scope = list(lower = null_model,
                                    upper = full_model),
                       direction = "both")
## Start: AIC=115.94
## mpg ~ 1
##
## Df Sum of Sq RSS AIC
## + wt 1 847.73 278.32 73.217
## + cyl 1 817.71 308.33 76.494
## + disp 1 808.89 317.16 77.397
## + hp 1 678.37 447.67 88.427
## + drat 1 522.48 603.57 97.988
## + vs 1 496.53 629.52 99.335
## + am 1 405.15 720.90 103.672
## + carb 1 341.78 784.27 106.369
## + gear 1 259.75 866.30 109.552
## + qsec 1 197.39 928.66 111.776
## <none> 1126.05 115.943
##
## Step: AIC=73.22
## mpg ~ wt
##
## Df Sum of Sq RSS AIC
## + cyl 1 87.15 191.17 63.198
## + hp 1 83.27 195.05 63.840
## + qsec 1 82.86 195.46 63.908
## + vs 1 54.23 224.09 68.283
## + carb 1 44.60 233.72 69.628
## + disp 1 31.64 246.68 71.356
## <none> 278.32 73.217
## + drat 1 9.08 269.24 74.156
## + gear 1 1.14 277.19 75.086
## + am 1 0.00 278.32 75.217
## - wt 1 847.73 1126.05 115.943
##
## Step: AIC=63.2
## mpg ~ wt + cyl
##
## Df Sum of Sq RSS AIC
## + hp 1 14.551 176.62 62.665
## + carb 1 13.772 177.40 62.805
## <none> 191.17 63.198
## + qsec 1 10.567 180.60 63.378
## + gear 1 3.028 188.14 64.687
## + disp 1 2.680 188.49 64.746
## + vs 1 0.706 190.47 65.080
## + am 1 0.125 191.05 65.177
## + drat 1 0.001 191.17 65.198
## - cyl 1 87.150 278.32 73.217
## - wt 1 117.162 308.33 76.494
##
## Step: AIC=62.66
## mpg ~ wt + cyl + hp
##
## Df Sum of Sq RSS AIC
## <none> 176.62 62.665
## - hp 1 14.551 191.17 63.198
## + am 1 6.623 170.00 63.442
## + disp 1 6.176 170.44 63.526
## - cyl 1 18.427 195.05 63.840
## + carb 1 2.519 174.10 64.205
## + drat 1 2.245 174.38 64.255
## + qsec 1 1.401 175.22 64.410
## + gear 1 0.856 175.76 64.509
## + vs 1 0.060 176.56 64.654
## - wt 1 115.354 291.98 76.750
# Final model
summary(stepwise_model)
##
## Call:
## lm(formula = mpg ~ wt + cyl + hp, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.9290 -1.5598 -0.5311  1.1850  5.8986
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 38.75179    1.78686  21.687  < 2e-16 ***
## wt          -3.16697    0.74058  -4.276 0.000199 ***
## cyl         -0.94162    0.55092  -1.709 0.098480 .
## hp          -0.01804    0.01188  -1.519 0.140015
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.512 on 28 degrees of freedom
## Multiple R-squared: 0.8431, Adjusted R-squared: 0.8263
## F-statistic: 50.17 on 3 and 28 DF, p-value: 2.184e-11
#Compare Models
AIC(full_model)
## [1] 163.7098
AIC(backward_model)
## [1] 154.1194
AIC(forward_model)
## [1] 155.4766
AIC(stepwise_model)
## [1] 155.4766
Interpretation:
Variable selection methods were applied to the mtcars dataset to identify the most important predictors of fuel efficiency (mpg). Three approaches were used: forward selection, backward elimination, and stepwise selection. The step() function in R was used with the Akaike Information Criterion (AIC) as the selection criterion: at each step it adds or removes the variable that lowers AIC the most, and it stops when no single change lowers AIC further. The selected models therefore retain variables that improve the balance between fit and complexity, even when an individual coefficient is not significant at the 5% level (for example, cyl and hp in the forward-selection model). The model with the lowest AIC was considered the best model.
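The AIC values printed in the step() traces (for example 61.31 for the final backward model) differ from those returned by AIC() (154.12 for the same model). For linear models, step() relies on extractAIC(), which drops constant terms that AIC() keeps, so the two scales differ by a fixed offset for a given dataset and model rankings are unaffected. The sketch below checks this; the name `offset` and the closed-form expression for it are derived from the Gaussian log-likelihood and are not taken from the original analysis.

```r
full_model <- lm(mpg ~ ., data = mtcars)

# trace = 0 suppresses the step-by-step printout
backward_model <- step(full_model, direction = "backward", trace = 0)

# AIC on the scale used inside step()
step_aic <- extractAIC(backward_model)[2]

# AIC on the scale reported by AIC()
full_aic <- AIC(backward_model)

# Constant difference for a linear model with n observations:
# n * (1 + log(2 * pi)) from the likelihood, plus 2 for the variance parameter
n <- nrow(mtcars)
offset <- n * (1 + log(2 * pi)) + 2

print(c(step_aic = step_aic, full_aic = full_aic, difference = full_aic - step_aic))
print(offset)
```

Because the offset depends only on n, comparing models fitted to the same 32 cars gives the same ordering on either scale.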
Conclusion:
This assignment compared forward selection, backward elimination, and stepwise selection on the mtcars dataset to identify the variables that best predict fuel efficiency (mpg).
The backward elimination method selected weight (wt), quarter-mile time (qsec), and transmission type (am) as the final predictors. This model had the lowest AIC value (154.12), indicating the best balance between model fit and simplicity. The model also explained about 85% of the variation in mpg (R² = 0.8497).
Both forward selection and stepwise selection produced the same model containing weight (wt), number of cylinders (cyl), and horsepower (hp). However, this model had a slightly higher AIC (155.48) than the backward elimination model (154.12).
Overall, the results suggest that vehicle weight is the most important factor influencing fuel efficiency, while transmission type and acceleration performance also contribute to predicting miles per gallon. Therefore, the model obtained from backward elimination was selected as the preferred model for this analysis.
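The comparison behind this choice can be collected into a single table. The sketch below refits the models with trace = 0 and gathers the number of predictors, AIC, and adjusted R-squared for each; the names `models` and `comparison` are illustrative. The stepwise model is omitted because it is identical to the forward-selection model.

```r
full_model <- lm(mpg ~ ., data = mtcars)
null_model <- lm(mpg ~ 1, data = mtcars)

# Refit the selected models silently (trace = 0 suppresses the step output)
backward_model <- step(full_model, direction = "backward", trace = 0)
forward_model <- step(null_model,
                      scope = list(lower = null_model, upper = full_model),
                      direction = "forward", trace = 0)

models <- list(full = full_model,
               backward = backward_model,
               forward = forward_model)

# One row per model: size, AIC, and adjusted R-squared
comparison <- data.frame(
  predictors = sapply(models, function(m) length(coef(m)) - 1),
  AIC = sapply(models, AIC),
  adj_r2 = sapply(models, function(m) summary(m)$adj.r.squared)
)

# Sort so the preferred (lowest-AIC) model comes first
print(comparison[order(comparison$AIC), ])
```

The gap between the backward and forward models is small (about 1.4 AIC units), so the preference for the backward model is a modest one rather than a decisive result.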