#Load and Explore the Data

# Load dataset
data(mtcars)

# View first few rows
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
# Summary statistics
summary(mtcars)
##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000

Interpretation of the mtcars Summary Statistics:

The mtcars dataset contains information on 32 cars and their performance characteristics.

The average miles per gallon (mpg) is 20.09, with values ranging from 10.4 to 33.9, so fuel efficiency varies considerably among the cars. The number of cylinders (cyl) takes only the values 4, 6, and 8 in this dataset, so its mean of about 6.19 is a summary rather than a value any car actually has; the median is 6. The average horsepower (hp) is 146.7, but it ranges from 52 to 335, showing large differences in engine power. The average weight (wt) is 3.22 (in units of 1,000 lbs), meaning that the average car weighs about 3,217 pounds; individual weights range from 1,513 to 5,424 pounds. The average rear axle ratio (drat) is 3.60, while the average quarter-mile time (qsec) is 17.85 seconds. For transmission type (am, coded 0 = automatic, 1 = manual), the mean is 0.41, which indicates that about 41% of the cars have manual transmission and 59% have automatic transmission. The average number of gears is 3.69, with most cars having 3 or 4 gears. The average number of carburetors (carb) is 2.81, ranging from 1 to 8.
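Because cyl, gear, and am take only a few distinct values, counts are easier to read than means for these columns. A minimal sketch in base R, using only columns shown in the summary above (the name am_share is illustrative):

```r
# Frequency tables for the discrete columns; a mean hides how few values they take
table(mtcars$cyl)    # counts of 4-, 6-, and 8-cylinder cars
table(mtcars$gear)   # counts of 3-, 4-, and 5-gear cars

# Share of each transmission type; am is coded 0 = automatic, 1 = manual
am_share <- prop.table(table(mtcars$am))
round(am_share, 3)
```

The share reported for am = 1 is the same quantity as the mean of am in the summary output.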

Overall Conclusion:

The summary statistics show that the cars in the dataset differ substantially in terms of fuel efficiency, engine size, horsepower, and weight. The dataset includes both small, fuel-efficient cars and larger, more powerful vehicles, making it useful for analyzing relationships between vehicle characteristics and performance.

#Fit a Multiple Linear Regression Model

# Fit model
model <- lm(mpg ~ wt + hp + am, data = mtcars)

# Display results
summary(model)
## 
## Call:
## lm(formula = mpg ~ wt + hp + am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4221 -1.7924 -0.3788  1.2249  5.5317 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 34.002875   2.642659  12.867 2.82e-13 ***
## wt          -2.878575   0.904971  -3.181 0.003574 ** 
## hp          -0.037479   0.009605  -3.902 0.000546 ***
## am           2.083710   1.376420   1.514 0.141268    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.538 on 28 degrees of freedom
## Multiple R-squared:  0.8399, Adjusted R-squared:  0.8227 
## F-statistic: 48.96 on 3 and 28 DF,  p-value: 2.908e-11

Interpretation of the Multiple Linear Regression Model:

A multiple linear regression model was fitted to examine how weight (wt), horsepower (hp), and transmission type (am) are related to a car’s fuel efficiency (mpg).

Coefficients Interpretation:

Weight (wt) has a coefficient of -2.88 and is statistically significant (p = 0.0036). Because wt is measured in units of 1,000 lbs, this means that, holding horsepower and transmission type constant, each additional 1,000 lbs is associated with a decrease of about 2.88 mpg. Horsepower (hp) has a coefficient of -0.037 and is statistically significant (p = 0.0005): holding the other variables constant, each additional horsepower is associated with a decrease of about 0.037 mpg. Transmission type (am) has a coefficient of 2.08, suggesting that manual transmission cars may get about 2.08 mpg more than automatic cars of the same weight and horsepower. However, this effect is not statistically significant (p = 0.141), so there is not enough evidence to conclude that transmission type affects mpg in this model.
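The per-unit coefficients can be easier to read on a larger scale. The sketch below refits the same model and rescales the hp coefficient to a 100-horsepower difference; the factor of 100 is an illustrative choice, not part of the original analysis.

```r
model <- lm(mpg ~ wt + hp + am, data = mtcars)
b <- coef(model)

# wt is recorded in units of 1,000 lbs, so this is mpg per additional 1,000 lbs
b[["wt"]]

# hp effect rescaled to a 100-horsepower difference (illustrative scale)
hp_per_100 <- 100 * b[["hp"]]
hp_per_100

# Estimated manual-minus-automatic difference at fixed wt and hp
b[["am"]]
```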

Model Performance:

The R-squared value (0.8399) shows that approximately 84% of the variation in mpg is explained by weight, horsepower, and transmission type. The Adjusted R-squared (0.8227) indicates that the model still explains about 82% of the variation after accounting for the number of predictors. The F-statistic p-value (2.908e-11) is very small, indicating that the overall regression model is statistically significant.
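Both fit measures can be reproduced from the residuals, which makes the penalty for the number of predictors explicit. A short sketch using the standard formulas (the variable names rss, tss, r2, and adj_r2 are illustrative):

```r
model <- lm(mpg ~ wt + hp + am, data = mtcars)

rss <- sum(residuals(model)^2)                  # residual sum of squares
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)   # total sum of squares
n <- nrow(mtcars)                               # 32 observations
p <- length(coef(model)) - 1                    # 3 predictors, intercept excluded

r2 <- 1 - rss / tss
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

c(r2 = r2, adj_r2 = adj_r2)
```

The two values should agree with the Multiple R-squared and Adjusted R-squared lines in the summary(model) output.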

Conclusion:

The results show that weight and horsepower are important predictors of fuel efficiency, with heavier and more powerful cars generally having lower mpg. Although manual transmission cars appear to have better fuel efficiency, the effect is not statistically significant. Overall, the model provides a good explanation of fuel efficiency, accounting for about 84% of the variation in mpg.

#Check Model Assumptions
par(mfrow = c(2, 2))
plot(model)

# Confidence intervals
confint(model)
##                   2.5 %      97.5 %
## (Intercept) 28.58963286 39.41611738
## wt          -4.73232353 -1.02482730
## hp          -0.05715454 -0.01780291
## am          -0.73575874  4.90317900
# Predictions
predict(model)
##           Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive 
##           24.422058           23.688021           25.922769           20.625595 
##   Hornet Sportabout             Valiant          Duster 360           Merc 240D 
##           17.541799           20.107738           14.544073           22.496539 
##            Merc 230            Merc 280           Merc 280C          Merc 450SE 
##           21.374884           19.490692           19.490692           15.540903 
##          Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental 
##           16.519618           16.375689           11.207215           10.331556 
##   Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla 
##            9.996783           27.280123           29.488792           28.368282 
##       Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28 
##           23.271750           18.248481           18.493160           13.766858 
##    Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa 
##           16.375976           28.042946           26.515870           27.496205 
##      Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E 
##           17.067118           21.554154           13.254698           23.998964

Interpretation of Confidence Intervals:

The 95% confidence intervals give, for each coefficient, a range of values that is consistent with the data at the 95% confidence level.

For weight (wt), the confidence interval is (-4.732, -1.025). Since zero is not included in this interval, weight has a significant effect on fuel efficiency. This means heavier cars tend to have lower mpg. For horsepower (hp), the confidence interval is (-0.057, -0.018). Since zero is not included, horsepower is also a significant predictor. Cars with higher horsepower generally have lower fuel efficiency. For transmission type (am), the confidence interval is (-0.736, 4.903). Because zero lies within this interval, there is not enough evidence to conclude that transmission type significantly affects fuel efficiency.

Overall, the confidence intervals support the results obtained from the p-values in the regression output.
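The agreement with the p-values is not a coincidence: for a linear model, each interval is the estimate plus or minus a t critical value times the standard error. A sketch that rebuilds the intervals by hand (manual_ci is an illustrative name):

```r
model <- lm(mpg ~ wt + hp + am, data = mtcars)

coefs <- coef(summary(model))
est <- coefs[, "Estimate"]
se <- coefs[, "Std. Error"]

# Two-sided 95% critical value on the residual degrees of freedom (28 here)
t_crit <- qt(0.975, df = df.residual(model))

manual_ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
manual_ci
```

The result should match the confint(model) output row for row.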

Conclusion:

The multiple linear regression model describes the relationship between fuel efficiency and weight, horsepower, and transmission type. The analysis showed that weight and horsepower are significantly associated with fuel efficiency, while transmission type does not have a significant effect once weight and horsepower are accounted for. The model has an R-squared of 0.84, suggesting that it fits these data well. Its accuracy for cars outside the dataset was not assessed, so predictions for new vehicles should be treated as approximate.
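Calling predict(model) without new data returns fitted values for the 32 cars already in the dataset. To predict mpg for a different car, a data frame can be passed through newdata. The car below is hypothetical (its weight, horsepower, and transmission values are made up for illustration), and the prediction interval assumes the usual linear model assumptions hold.

```r
model <- lm(mpg ~ wt + hp + am, data = mtcars)

# Hypothetical car, not in mtcars: 3,000 lbs, 150 hp, manual transmission
new_car <- data.frame(wt = 3.0, hp = 150, am = 1)

# Point prediction with a 95% prediction interval for a single new car
pred <- predict(model, newdata = new_car, interval = "prediction", level = 0.95)
pred
```

The columns fit, lwr, and upr hold the predicted mpg and the interval bounds.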

#Homework 2
#Read about variable selection methods
#Fit the Full Regression Model
# Full model
full_model <- lm(mpg ~ ., data = mtcars)

# Summary
summary(full_model)
## 
## Call:
## lm(formula = mpg ~ ., data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4506 -1.6044 -0.1196  1.2193  4.6271 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 12.30337   18.71788   0.657   0.5181  
## cyl         -0.11144    1.04502  -0.107   0.9161  
## disp         0.01334    0.01786   0.747   0.4635  
## hp          -0.02148    0.02177  -0.987   0.3350  
## drat         0.78711    1.63537   0.481   0.6353  
## wt          -3.71530    1.89441  -1.961   0.0633 .
## qsec         0.82104    0.73084   1.123   0.2739  
## vs           0.31776    2.10451   0.151   0.8814  
## am           2.52023    2.05665   1.225   0.2340  
## gear         0.65541    1.49326   0.439   0.6652  
## carb        -0.19942    0.82875  -0.241   0.8122  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.65 on 21 degrees of freedom
## Multiple R-squared:  0.869,  Adjusted R-squared:  0.8066 
## F-statistic: 13.93 on 10 and 21 DF,  p-value: 3.793e-07
#Backward Elimination

backward_model <- step(full_model,
                       direction = "backward")
## Start:  AIC=70.9
## mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb
## 
##        Df Sum of Sq    RSS    AIC
## - cyl   1    0.0799 147.57 68.915
## - vs    1    0.1601 147.66 68.932
## - carb  1    0.4067 147.90 68.986
## - gear  1    1.3531 148.85 69.190
## - drat  1    1.6270 149.12 69.249
## - disp  1    3.9167 151.41 69.736
## - hp    1    6.8399 154.33 70.348
## - qsec  1    8.8641 156.36 70.765
## <none>              147.49 70.898
## - am    1   10.5467 158.04 71.108
## - wt    1   27.0144 174.51 74.280
## 
## Step:  AIC=68.92
## mpg ~ disp + hp + drat + wt + qsec + vs + am + gear + carb
## 
##        Df Sum of Sq    RSS    AIC
## - vs    1    0.2685 147.84 66.973
## - carb  1    0.5201 148.09 67.028
## - gear  1    1.8211 149.40 67.308
## - drat  1    1.9826 149.56 67.342
## - disp  1    3.9009 151.47 67.750
## - hp    1    7.3632 154.94 68.473
## <none>              147.57 68.915
## - qsec  1   10.0933 157.67 69.032
## - am    1   11.8359 159.41 69.384
## - wt    1   27.0280 174.60 72.297
## 
## Step:  AIC=66.97
## mpg ~ disp + hp + drat + wt + qsec + am + gear + carb
## 
##        Df Sum of Sq    RSS    AIC
## - carb  1    0.6855 148.53 65.121
## - gear  1    2.1437 149.99 65.434
## - drat  1    2.2139 150.06 65.449
## - disp  1    3.6467 151.49 65.753
## - hp    1    7.1060 154.95 66.475
## <none>              147.84 66.973
## - am    1   11.5694 159.41 67.384
## - qsec  1   15.6830 163.53 68.200
## - wt    1   27.3799 175.22 70.410
## 
## Step:  AIC=65.12
## mpg ~ disp + hp + drat + wt + qsec + am + gear
## 
##        Df Sum of Sq    RSS    AIC
## - gear  1     1.565 150.09 63.457
## - drat  1     1.932 150.46 63.535
## <none>              148.53 65.121
## - disp  1    10.110 158.64 65.229
## - am    1    12.323 160.85 65.672
## - hp    1    14.826 163.35 66.166
## - qsec  1    26.408 174.94 68.358
## - wt    1    69.127 217.66 75.350
## 
## Step:  AIC=63.46
## mpg ~ disp + hp + drat + wt + qsec + am
## 
##        Df Sum of Sq    RSS    AIC
## - drat  1     3.345 153.44 62.162
## - disp  1     8.545 158.64 63.229
## <none>              150.09 63.457
## - hp    1    13.285 163.38 64.171
## - am    1    20.036 170.13 65.466
## - qsec  1    25.574 175.67 66.491
## - wt    1    67.572 217.66 73.351
## 
## Step:  AIC=62.16
## mpg ~ disp + hp + wt + qsec + am
## 
##        Df Sum of Sq    RSS    AIC
## - disp  1     6.629 160.07 61.515
## <none>              153.44 62.162
## - hp    1    12.572 166.01 62.682
## - qsec  1    26.470 179.91 65.255
## - am    1    32.198 185.63 66.258
## - wt    1    69.043 222.48 72.051
## 
## Step:  AIC=61.52
## mpg ~ hp + wt + qsec + am
## 
##        Df Sum of Sq    RSS    AIC
## - hp    1     9.219 169.29 61.307
## <none>              160.07 61.515
## - qsec  1    20.225 180.29 63.323
## - am    1    25.993 186.06 64.331
## - wt    1    78.494 238.56 72.284
## 
## Step:  AIC=61.31
## mpg ~ wt + qsec + am
## 
##        Df Sum of Sq    RSS    AIC
## <none>              169.29 61.307
## - am    1    26.178 195.46 63.908
## - qsec  1   109.034 278.32 75.217
## - wt    1   183.347 352.63 82.790
# Final model
summary(backward_model)
## 
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4811 -1.5555 -0.7257  1.4110  4.6610 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   9.6178     6.9596   1.382 0.177915    
## wt           -3.9165     0.7112  -5.507 6.95e-06 ***
## qsec          1.2259     0.2887   4.247 0.000216 ***
## am            2.9358     1.4109   2.081 0.046716 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared:  0.8497, Adjusted R-squared:  0.8336 
## F-statistic: 52.75 on 3 and 28 DF,  p-value: 1.21e-11
#Forward Selection
# Null model
null_model <- lm(mpg ~ 1, data = mtcars)

# Forward selection
forward_model <- step(null_model,
                      scope = list(lower = null_model,
                                   upper = full_model),
                      direction = "forward")
## Start:  AIC=115.94
## mpg ~ 1
## 
##        Df Sum of Sq     RSS     AIC
## + wt    1    847.73  278.32  73.217
## + cyl   1    817.71  308.33  76.494
## + disp  1    808.89  317.16  77.397
## + hp    1    678.37  447.67  88.427
## + drat  1    522.48  603.57  97.988
## + vs    1    496.53  629.52  99.335
## + am    1    405.15  720.90 103.672
## + carb  1    341.78  784.27 106.369
## + gear  1    259.75  866.30 109.552
## + qsec  1    197.39  928.66 111.776
## <none>              1126.05 115.943
## 
## Step:  AIC=73.22
## mpg ~ wt
## 
##        Df Sum of Sq    RSS    AIC
## + cyl   1    87.150 191.17 63.198
## + hp    1    83.274 195.05 63.840
## + qsec  1    82.858 195.46 63.908
## + vs    1    54.228 224.09 68.283
## + carb  1    44.602 233.72 69.628
## + disp  1    31.639 246.68 71.356
## <none>              278.32 73.217
## + drat  1     9.081 269.24 74.156
## + gear  1     1.137 277.19 75.086
## + am    1     0.002 278.32 75.217
## 
## Step:  AIC=63.2
## mpg ~ wt + cyl
## 
##        Df Sum of Sq    RSS    AIC
## + hp    1   14.5514 176.62 62.665
## + carb  1   13.7724 177.40 62.805
## <none>              191.17 63.198
## + qsec  1   10.5674 180.60 63.378
## + gear  1    3.0281 188.14 64.687
## + disp  1    2.6796 188.49 64.746
## + vs    1    0.7059 190.47 65.080
## + am    1    0.1249 191.05 65.177
## + drat  1    0.0010 191.17 65.198
## 
## Step:  AIC=62.66
## mpg ~ wt + cyl + hp
## 
##        Df Sum of Sq    RSS    AIC
## <none>              176.62 62.665
## + am    1    6.6228 170.00 63.442
## + disp  1    6.1762 170.44 63.526
## + carb  1    2.5187 174.10 64.205
## + drat  1    2.2453 174.38 64.255
## + qsec  1    1.4010 175.22 64.410
## + gear  1    0.8558 175.76 64.509
## + vs    1    0.0599 176.56 64.654
# Final model
summary(forward_model)
## 
## Call:
## lm(formula = mpg ~ wt + cyl + hp, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9290 -1.5598 -0.5311  1.1850  5.8986 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 38.75179    1.78686  21.687  < 2e-16 ***
## wt          -3.16697    0.74058  -4.276 0.000199 ***
## cyl         -0.94162    0.55092  -1.709 0.098480 .  
## hp          -0.01804    0.01188  -1.519 0.140015    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.512 on 28 degrees of freedom
## Multiple R-squared:  0.8431, Adjusted R-squared:  0.8263 
## F-statistic: 50.17 on 3 and 28 DF,  p-value: 2.184e-11
#Stepwise Selection

stepwise_model <- step(null_model,
                       scope = list(lower = null_model,
                                    upper = full_model),
                       direction = "both")
## Start:  AIC=115.94
## mpg ~ 1
## 
##        Df Sum of Sq     RSS     AIC
## + wt    1    847.73  278.32  73.217
## + cyl   1    817.71  308.33  76.494
## + disp  1    808.89  317.16  77.397
## + hp    1    678.37  447.67  88.427
## + drat  1    522.48  603.57  97.988
## + vs    1    496.53  629.52  99.335
## + am    1    405.15  720.90 103.672
## + carb  1    341.78  784.27 106.369
## + gear  1    259.75  866.30 109.552
## + qsec  1    197.39  928.66 111.776
## <none>              1126.05 115.943
## 
## Step:  AIC=73.22
## mpg ~ wt
## 
##        Df Sum of Sq     RSS     AIC
## + cyl   1     87.15  191.17  63.198
## + hp    1     83.27  195.05  63.840
## + qsec  1     82.86  195.46  63.908
## + vs    1     54.23  224.09  68.283
## + carb  1     44.60  233.72  69.628
## + disp  1     31.64  246.68  71.356
## <none>               278.32  73.217
## + drat  1      9.08  269.24  74.156
## + gear  1      1.14  277.19  75.086
## + am    1      0.00  278.32  75.217
## - wt    1    847.73 1126.05 115.943
## 
## Step:  AIC=63.2
## mpg ~ wt + cyl
## 
##        Df Sum of Sq    RSS    AIC
## + hp    1    14.551 176.62 62.665
## + carb  1    13.772 177.40 62.805
## <none>              191.17 63.198
## + qsec  1    10.567 180.60 63.378
## + gear  1     3.028 188.14 64.687
## + disp  1     2.680 188.49 64.746
## + vs    1     0.706 190.47 65.080
## + am    1     0.125 191.05 65.177
## + drat  1     0.001 191.17 65.198
## - cyl   1    87.150 278.32 73.217
## - wt    1   117.162 308.33 76.494
## 
## Step:  AIC=62.66
## mpg ~ wt + cyl + hp
## 
##        Df Sum of Sq    RSS    AIC
## <none>              176.62 62.665
## - hp    1    14.551 191.17 63.198
## + am    1     6.623 170.00 63.442
## + disp  1     6.176 170.44 63.526
## - cyl   1    18.427 195.05 63.840
## + carb  1     2.519 174.10 64.205
## + drat  1     2.245 174.38 64.255
## + qsec  1     1.401 175.22 64.410
## + gear  1     0.856 175.76 64.509
## + vs    1     0.060 176.56 64.654
## - wt    1   115.354 291.98 76.750
# Final model
summary(stepwise_model)
## 
## Call:
## lm(formula = mpg ~ wt + cyl + hp, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9290 -1.5598 -0.5311  1.1850  5.8986 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 38.75179    1.78686  21.687  < 2e-16 ***
## wt          -3.16697    0.74058  -4.276 0.000199 ***
## cyl         -0.94162    0.55092  -1.709 0.098480 .  
## hp          -0.01804    0.01188  -1.519 0.140015    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.512 on 28 degrees of freedom
## Multiple R-squared:  0.8431, Adjusted R-squared:  0.8263 
## F-statistic: 50.17 on 3 and 28 DF,  p-value: 2.184e-11
#Compare Models
AIC(full_model)
## [1] 163.7098
AIC(backward_model)
## [1] 154.1194
AIC(forward_model)
## [1] 155.4766
AIC(stepwise_model)
## [1] 155.4766

Interpretation:

Variable selection methods were applied to the mtcars dataset to identify the most important predictors of fuel efficiency (mpg). Three approaches were used: forward selection, backward elimination, and stepwise selection, each run with the step() function in R using the Akaike Information Criterion (AIC) as the selection criterion. At each step, step() adds or drops the variable whose change lowers AIC the most and stops when no single change lowers it further. Retention under AIC is not the same as statistical significance: in the model chosen by forward and stepwise selection, cyl (p = 0.098) and hp (p = 0.140) are not significant at the 5% level. The model with the lowest AIC was considered the best model.
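The AIC values printed by step() (for example 61.31 for the backward model) differ from those returned by AIC() (154.12 for the same model). The two functions use different additive constants, so the ranking of models fitted to the same data is unaffected. A sketch showing the offset for a Gaussian linear model; trace = 0 only suppresses the step-by-step printout:

```r
full_model <- lm(mpg ~ ., data = mtcars)
backward_model <- step(full_model, direction = "backward", trace = 0)

aic_step <- extractAIC(backward_model)[2]   # the scale printed by step()
aic_full <- AIC(backward_model)             # includes the likelihood constants
n <- nrow(mtcars)

aic_full - aic_step          # offset between the two scales
n * (1 + log(2 * pi)) + 2    # the same offset, computed from n alone
```

Because the offset depends only on the number of observations, comparing models on either scale leads to the same choice, as long as the two scales are not mixed.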

Conclusion

In this assignment, variable selection methods were applied to the mtcars dataset to identify the most important variables affecting fuel efficiency (mpg). Three methods were used: forward selection, backward elimination, and stepwise selection.

The backward elimination method selected weight (wt), quarter-mile time (qsec), and transmission type (am) as the final predictors. This model had the lowest AIC value (154.12), indicating the best balance between model fit and simplicity. The model also explained about 85% of the variation in mpg (R² = 0.8497).

Both forward selection and stepwise selection produced the same model containing weight (wt), number of cylinders (cyl), and horsepower (hp). However, these models had slightly higher AIC values than the backward elimination model.

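Overall, the results suggest that vehicle weight is the most important factor influencing fuel efficiency: it is the first variable added in forward selection and is retained by every method. Transmission type (am) and quarter-mile time (qsec) also contribute to predicting miles per gallon in the backward elimination model. The AIC difference between the two selected models is small (154.12 versus 155.48), so the preference is a modest one. On that basis, the model obtained from backward elimination was selected as the preferred model for this analysis.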
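The comparison can be collected into one table so that AIC, model size, and adjusted R-squared are read side by side. A minimal sketch that refits the models from this section (the names models and comparison are illustrative; the stepwise model is omitted because it is identical to the forward model):

```r
full_model <- lm(mpg ~ ., data = mtcars)
null_model <- lm(mpg ~ 1, data = mtcars)

backward_model <- step(full_model, direction = "backward", trace = 0)
forward_model <- step(null_model,
                      scope = list(lower = null_model, upper = full_model),
                      direction = "forward", trace = 0)

models <- list(full = full_model,
               backward = backward_model,
               forward = forward_model)

# One row per model: number of predictors, AIC, and adjusted R-squared
comparison <- t(sapply(models, function(m) {
  c(n_predictors = length(coef(m)) - 1,
    AIC = AIC(m),
    adj_r2 = summary(m)$adj.r.squared)
}))
round(comparison, 4)
```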