Chapter 2 — Exercise 9

This exercise involves the Auto data set. Make sure that the missing values have been removed from the data.

library(ISLR2)  # assumed package; provides the Auto, Boston, and Weekly data sets used below
data(Auto)
Auto <- na.omit(Auto)

(a) Quantitative vs. Qualitative Predictors

str(Auto)
## 'data.frame':    392 obs. of  9 variables:
##  $ mpg         : num  18 15 18 16 17 15 14 14 14 15 ...
##  $ cylinders   : int  8 8 8 8 8 8 8 8 8 8 ...
##  $ displacement: num  307 350 318 304 302 429 454 440 455 390 ...
##  $ horsepower  : int  130 165 150 150 140 198 220 215 225 190 ...
##  $ weight      : int  3504 3693 3436 3433 3449 4341 4354 4312 4425 3850 ...
##  $ acceleration: num  12 11.5 11 12 10.5 10 9 8.5 10 8.5 ...
##  $ year        : int  70 70 70 70 70 70 70 70 70 70 ...
##  $ origin      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ name        : Factor w/ 304 levels "amc ambassador brougham",..: 49 36 231 14 161 141 54 223 241 2 ...
##  - attr(*, "na.action")= 'omit' Named int [1:5] 33 127 331 337 355
##   ..- attr(*, "names")= chr [1:5] "33" "127" "331" "337" ...

Answer:

  • Quantitative: mpg, cylinders, displacement, horsepower, weight, acceleration, year
  • Qualitative: name, origin (origin is coded as 1/2/3 representing American/European/Japanese, so it is treated as categorical; see the quick check below)
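
A minimal sketch of that recoding (the 1/2/3 label mapping follows the Auto documentation; origin_f is a new helper object, kept separate so the later regressions on Auto are unaffected):

# Recode origin as a labelled factor to make its qualitative nature explicit
origin_f <- factor(Auto$origin, levels = 1:3,
                   labels = c("American", "European", "Japanese"))
table(origin_f)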

(b) Range of Each Quantitative Predictor

quant_vars <- c("mpg", "cylinders", "displacement", "horsepower",
                "weight", "acceleration", "year")

sapply(Auto[, quant_vars], range)
##       mpg cylinders displacement horsepower weight acceleration year
## [1,]  9.0         3           68         46   1613          8.0   70
## [2,] 46.6         8          455        230   5140         24.8   82

(c) Mean and Standard Deviation of Each Quantitative Predictor

rbind(
  Mean = sapply(Auto[, quant_vars], mean),
  SD   = sapply(Auto[, quant_vars], sd)
)
##            mpg cylinders displacement horsepower    weight acceleration
## Mean 23.445918  5.471939      194.412  104.46939 2977.5842    15.541327
## SD    7.805007  1.705783      104.644   38.49116  849.4026     2.758864
##           year
## Mean 75.979592
## SD    3.683737

(d) Remove Observations 10–85, Then Recompute

Auto_sub <- Auto[-(10:85), ]

rbind(
  Range_Min = sapply(Auto_sub[, quant_vars], min),
  Range_Max = sapply(Auto_sub[, quant_vars], max),
  Mean      = sapply(Auto_sub[, quant_vars], mean),
  SD        = sapply(Auto_sub[, quant_vars], sd)
)
##                 mpg cylinders displacement horsepower    weight acceleration
## Range_Min 11.000000  3.000000     68.00000   46.00000 1649.0000     8.500000
## Range_Max 46.600000  8.000000    455.00000  230.00000 4997.0000    24.800000
## Mean      24.404430  5.373418    187.24051  100.72152 2935.9715    15.726899
## SD         7.867283  1.654179     99.67837   35.70885  811.3002     2.693721
##                year
## Range_Min 70.000000
## Range_Max 82.000000
## Mean      77.145570
## SD         3.106217
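
A small follow-up sketch (a new helper computation, not required by the exercise) makes the shift caused by dropping rows 10–85 explicit:

# Change in each mean after removing observations 10-85
round(sapply(Auto_sub[, quant_vars], mean) -
        sapply(Auto[, quant_vars], mean), 2)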

(e) Graphical Investigation of Predictors

pairs(Auto[, quant_vars], main = "Scatterplot Matrix of Auto Dataset")

Findings:

  • mpg has a clear negative relationship with displacement, horsepower, and weight — heavier, more powerful cars tend to be less fuel-efficient.
  • displacement, horsepower, and weight are highly correlated with each other (multicollinearity).
  • year shows a modest positive relationship with mpg, suggesting newer cars tend to be more fuel-efficient.
  • cylinders correlates strongly with displacement and weight.

(f) Which Variables Are Useful for Predicting mpg?

par(mfrow = c(2, 3))
for (v in setdiff(quant_vars, "mpg")) {
  plot(Auto[[v]], Auto$mpg,
       xlab = v, ylab = "mpg",
       main = paste("mpg vs", v),
       pch  = 19, col = "steelblue", cex = 0.6)
  abline(lm(Auto$mpg ~ Auto[[v]]), col = "red", lwd = 2)
}

par(mfrow = c(1, 1))

Answer:

Based on the plots, displacement, horsepower, weight, and year all appear useful for predicting mpg:

  • displacement, horsepower, and weight show strong negative linear (or slightly curved) relationships with mpg.
  • year shows a positive trend — newer cars are more fuel efficient.
  • cylinders is also associated but is largely captured by displacement.
  • acceleration has a weaker, noisier relationship (the sorted correlations below quantify these impressions).
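
A quick numeric complement to the plots, ordering the correlations already implicit in the scatterplot matrix (sketch):

# Absolute correlation of each quantitative variable with mpg, strongest first
sort(abs(cor(Auto[, quant_vars])[, "mpg"]), decreasing = TRUE)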

Chapter 3 — Exercise 9

Multiple linear regression on the Auto data set.

(a) Scatterplot Matrix

pairs(Auto[, quant_vars], main = "Scatterplot Matrix — Auto")

(b) Correlation Matrix

round(cor(Auto[, quant_vars]), 3)
##                 mpg cylinders displacement horsepower weight acceleration
## mpg           1.000    -0.778       -0.805     -0.778 -0.832        0.423
## cylinders    -0.778     1.000        0.951      0.843  0.898       -0.505
## displacement -0.805     0.951        1.000      0.897  0.933       -0.544
## horsepower   -0.778     0.843        0.897      1.000  0.865       -0.689
## weight       -0.832     0.898        0.933      0.865  1.000       -0.417
## acceleration  0.423    -0.505       -0.544     -0.689 -0.417        1.000
## year          0.581    -0.346       -0.370     -0.416 -0.309        0.290
##                year
## mpg           0.581
## cylinders    -0.346
## displacement -0.370
## horsepower   -0.416
## weight       -0.309
## acceleration  0.290
## year          1.000

(c) Multiple Linear Regression

lm_fit <- lm(mpg ~ . - name, data = Auto)
summary(lm_fit)
## 
## Call:
## lm(formula = mpg ~ . - name, data = Auto)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.5903 -2.1565 -0.1169  1.8690 13.0604 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -17.218435   4.644294  -3.707  0.00024 ***
## cylinders     -0.493376   0.323282  -1.526  0.12780    
## displacement   0.019896   0.007515   2.647  0.00844 ** 
## horsepower    -0.016951   0.013787  -1.230  0.21963    
## weight        -0.006474   0.000652  -9.929  < 2e-16 ***
## acceleration   0.080576   0.098845   0.815  0.41548    
## year           0.750773   0.050973  14.729  < 2e-16 ***
## origin         1.426141   0.278136   5.127 4.67e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.328 on 384 degrees of freedom
## Multiple R-squared:  0.8215, Adjusted R-squared:  0.8182 
## F-statistic: 252.4 on 7 and 384 DF,  p-value: < 2.2e-16

Comments:

i. Is there a relationship between the predictors and the response?
Yes. The overall F-statistic is very large with a p-value essentially 0, indicating at least one predictor is significantly related to mpg.

ii. Which predictors have a statistically significant relationship?
Based on the p-values (< 0.05): displacement, weight, year, and origin are statistically significant.

iii. What does the coefficient for year suggest?
The positive coefficient for year (approximately +0.75) suggests that, holding all other variables constant, each additional model year is associated with about 0.75 more miles per gallon. Cars have become more fuel-efficient over time.
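
A back-of-envelope check of that interpretation, using only the coefficient shown above:

# Implied mpg gain over one decade of model years, other predictors held fixed
coef(lm_fit)["year"] * 10   # 0.750773 * 10, roughly 7.5 mpg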

(d) Diagnostic Plots

par(mfrow = c(2, 2))
plot(lm_fit)

par(mfrow = c(1, 1))

Comments:

  • Residuals vs Fitted: There is a slight non-linear pattern (U-shape), suggesting the linear model may not fully capture the relationship — a non-linear transformation could help.
  • Normal Q-Q: Residuals are roughly normal, with slight deviation in the tails.
  • Scale-Location: Some heteroscedasticity is visible.
  • Residuals vs Leverage: Observation 14 has unusually high leverage (confirmed numerically below). A few points (e.g., 327, 394) are potential outliers but do not have extreme leverage.
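
A one-line check of which observation the leverage plot flags (sketch):

# Index of the observation with the largest hat value (leverage)
which.max(hatvalues(lm_fit))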

(e) Interaction Effects

# Test some meaningful interactions
lm_interact <- lm(mpg ~ displacement + weight + year + origin +
                    displacement:weight + weight:year, data = Auto)
summary(lm_interact)
## 
## Call:
## lm(formula = mpg ~ displacement + weight + year + origin + displacement:weight + 
##     weight:year, data = Auto)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.9585 -1.6093 -0.0373  1.3563 12.7428 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -6.140e+01  1.374e+01  -4.468 1.04e-05 ***
## displacement        -6.043e-02  9.407e-03  -6.424 3.92e-10 ***
## weight               9.045e-03  4.895e-03   1.848   0.0654 .  
## year                 1.498e+00  1.739e-01   8.616  < 2e-16 ***
## origin               3.388e-01  2.525e-01   1.342   0.1805    
## displacement:weight  1.708e-05  2.383e-06   7.169 3.89e-12 ***
## weight:year         -2.486e-04  6.159e-05  -4.037 6.54e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.958 on 385 degrees of freedom
## Multiple R-squared:  0.8586, Adjusted R-squared:  0.8564 
## F-statistic: 389.6 on 6 and 385 DF,  p-value: < 2.2e-16

Comment:
Both interaction terms are statistically significant in this fit: displacement:weight (p ≈ 3.9e-12) and weight:year (p ≈ 6.5e-05). The weight:year interaction suggests that the effect of weight on mpg changes across model years. The interaction model's adjusted R² (≈ 0.856) is also noticeably higher than that of the earlier main-effects model (≈ 0.818).
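
The two interaction terms can also be tested jointly with a nested-model F-test (a sketch; lm_base is a new helper fit with the same four main effects):

# Main-effects-only fit, then a joint F-test for the two interactions
lm_base <- lm(mpg ~ displacement + weight + year + origin, data = Auto)
anova(lm_base, lm_interact)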

(f) Variable Transformations

par(mfrow = c(1, 3))
plot(log(Auto$horsepower), Auto$mpg,
     xlab = "log(horsepower)", ylab = "mpg",
     main = "mpg vs log(horsepower)", pch = 19, col = "steelblue", cex = 0.6)
plot(sqrt(Auto$horsepower), Auto$mpg,
     xlab = "sqrt(horsepower)", ylab = "mpg",
     main = "mpg vs sqrt(horsepower)", pch = 19, col = "darkgreen", cex = 0.6)
plot(Auto$horsepower^2, Auto$mpg,
     xlab = "horsepower^2", ylab = "mpg",
     main = "mpg vs horsepower^2", pch = 19, col = "tomato", cex = 0.6)

par(mfrow = c(1, 1))
lm_log  <- lm(mpg ~ log(horsepower) + weight + year + origin, data = Auto)
lm_sqrt <- lm(mpg ~ sqrt(horsepower) + weight + year + origin, data = Auto)
lm_sq   <- lm(mpg ~ I(horsepower^2) + weight + year + origin, data = Auto)

cat("log(horsepower) R²:", summary(lm_log)$r.squared, "\n")
## log(horsepower) R²: 0.827627
cat("sqrt(horsepower) R²:", summary(lm_sqrt)$r.squared, "\n")
## sqrt(horsepower) R²: 0.8214188
cat("horsepower^2 R²:", summary(lm_sq)$r.squared, "\n")
## horsepower^2 R²: 0.8185632

Comment:
Among the three fits, log(horsepower) gives the highest R² (≈ 0.828), sqrt(horsepower) comes next (≈ 0.821), and the squared term does worst (≈ 0.819). The scatterplot with log(horsepower) also shows the cleanest linear pattern, making the log transformation the most defensible choice here; the AIC comparison below points the same way.
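
A single-criterion comparison of the three fits (sketch; lower AIC is better):

# Information-criterion comparison of the horsepower transformations
AIC(lm_log, lm_sqrt, lm_sq)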


Chapter 3 — Exercise 15

Predict per capita crime rate (crim) using the Boston data set.

data(Boston)
predictors <- setdiff(names(Boston), "crim")

(a) Simple Linear Regression for Each Predictor

# Fit simple regression for each predictor and collect p-values & coefficients
slr_results <- data.frame(
  predictor = predictors,
  coef      = NA,
  p_value   = NA
)

for (i in seq_along(predictors)) {
  v   <- predictors[i]
  fit <- lm(crim ~ Boston[[v]], data = Boston)
  s   <- summary(fit)
  slr_results$coef[i]    <- coef(fit)[2]
  slr_results$p_value[i] <- s$coefficients[2, 4]
}

slr_results$significant <- slr_results$p_value < 0.05
print(slr_results)
##    predictor        coef      p_value significant
## 1         zn -0.07393498 5.506472e-06        TRUE
## 2      indus  0.50977633 1.450349e-21        TRUE
## 3       chas -1.89277655 2.094345e-01       FALSE
## 4        nox 31.24853120 3.751739e-23        TRUE
## 5         rm -2.68405122 6.346703e-07        TRUE
## 6        age  0.10778623 2.854869e-16        TRUE
## 7        dis -1.55090168 8.519949e-19        TRUE
## 8        rad  0.61791093 2.693844e-56        TRUE
## 9        tax  0.02974225 2.357127e-47        TRUE
## 10   ptratio  1.15198279 2.942922e-11        TRUE
## 11     lstat  0.54880478 2.654277e-27        TRUE
## 12      medv -0.36315992 1.173987e-19        TRUE
par(mfrow = c(4, 4))
for (v in predictors) {
  plot(Boston[[v]], Boston$crim,
       xlab = v, ylab = "crim",
       main = paste("crim ~", v),
       pch = 19, col = "steelblue", cex = 0.4)
  abline(lm(crim ~ Boston[[v]], data = Boston), col = "red", lwd = 1.5)
}
par(mfrow = c(1, 1))

Answer:
Every predictor except chas (the Charles River dummy variable, p ≈ 0.21) has a statistically significant association with crim in simple linear regression.
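
The same conclusion can be pulled straight from the results table (sketch):

# Predictors whose simple-regression slope is not significant at the 5% level
slr_results$predictor[!slr_results$significant]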

(b) Multiple Linear Regression

mlr_fit <- lm(crim ~ ., data = Boston)
summary(mlr_fit)
## 
## Call:
## lm(formula = crim ~ ., data = Boston)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -8.534 -2.248 -0.348  1.087 73.923 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.7783938  7.0818258   1.946 0.052271 .  
## zn           0.0457100  0.0187903   2.433 0.015344 *  
## indus       -0.0583501  0.0836351  -0.698 0.485709    
## chas        -0.8253776  1.1833963  -0.697 0.485841    
## nox         -9.9575865  5.2898242  -1.882 0.060370 .  
## rm           0.6289107  0.6070924   1.036 0.300738    
## age         -0.0008483  0.0179482  -0.047 0.962323    
## dis         -1.0122467  0.2824676  -3.584 0.000373 ***
## rad          0.6124653  0.0875358   6.997 8.59e-12 ***
## tax         -0.0037756  0.0051723  -0.730 0.465757    
## ptratio     -0.3040728  0.1863598  -1.632 0.103393    
## lstat        0.1388006  0.0757213   1.833 0.067398 .  
## medv        -0.2200564  0.0598240  -3.678 0.000261 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.46 on 493 degrees of freedom
## Multiple R-squared:  0.4493, Adjusted R-squared:  0.4359 
## F-statistic: 33.52 on 12 and 493 DF,  p-value: < 2.2e-16

Answer:
In the multiple regression model, we can reject H₀: βⱼ = 0 (at α = 0.05) for zn, dis, rad, and medv. Many predictors that were significant in simple regression lose significance here, a typical consequence of multicollinearity (checked below).
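
One way to check that multicollinearity explanation, assuming the car package is available (it is not used elsewhere in this homework):

# Variance inflation factors; values well above 5-10 indicate collinearity
library(car)
vif(mlr_fit)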

(c) Univariate vs. Multiple Regression Coefficients

mlr_coefs <- coef(mlr_fit)[-1]  # drop intercept

# Align
common <- intersect(slr_results$predictor, names(mlr_coefs))
slr_c  <- slr_results$coef[slr_results$predictor %in% common]
mlr_c  <- mlr_coefs[common]

plot(slr_c, mlr_c,
     xlab = "Simple Regression Coefficient",
     ylab = "Multiple Regression Coefficient",
     main = "Univariate vs. Multiple Regression Coefficients",
     pch = 19, col = "steelblue")
abline(0, 1, lty = 2, col = "red")
text(slr_c, mlr_c, labels = common, cex = 0.7, pos = 3)

Comment:
The coefficients differ substantially. The most dramatic case is nox (nitrogen oxides concentration): its coefficient is about +31 in simple regression but about -10 in the multiple regression, a sign flip driven by nox being strongly correlated with several of the other predictors.
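
A compact tabular view of the same comparison, reusing the aligned vectors from the plot (sketch):

# Simple vs. multiple regression coefficients, side by side
round(cbind(simple = slr_c, multiple = mlr_c), 3)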

(d) Non-linear Association (Polynomial Regression)

poly_results <- data.frame(
  predictor   = predictors,
  p_X2        = NA,
  p_X3        = NA
)

for (i in seq_along(predictors)) {
  v <- predictors[i]
  # Skip binary variable chas
  if (length(unique(Boston[[v]])) <= 2) next
  fit3 <- lm(crim ~ poly(Boston[[v]], 3), data = Boston)
  s    <- summary(fit3)$coefficients
  poly_results$p_X2[i] <- s[3, 4]
  poly_results$p_X3[i] <- s[4, 4]
}

poly_results$nonlinear <- with(poly_results,
  (!is.na(p_X2) & p_X2 < 0.05) | (!is.na(p_X3) & p_X3 < 0.05))

print(poly_results)
##    predictor         p_X2         p_X3 nonlinear
## 1         zn 4.420507e-03 2.295386e-01      TRUE
## 2      indus 1.086057e-03 1.196405e-12      TRUE
## 3       chas           NA           NA     FALSE
## 4        nox 7.736755e-05 6.961110e-16      TRUE
## 5         rm 1.508545e-03 5.085751e-01      TRUE
## 6        age 2.291156e-06 6.679915e-03      TRUE
## 7        dis 7.869767e-14 1.088832e-08      TRUE
## 8        rad 9.120558e-03 4.823138e-01      TRUE
## 9        tax 3.665348e-06 2.438507e-01      TRUE
## 10   ptratio 2.405468e-03 6.300514e-03      TRUE
## 11     lstat 3.780418e-02 1.298906e-01      TRUE
## 12      medv 2.928577e-35 1.046510e-12      TRUE

Answer:
Several predictors show evidence of non-linear association with crim (significant X² or X³ terms), including indus, nox, age, dis, ptratio, and medv. This suggests polynomial or other non-linear models may be more appropriate for those predictors.
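
For the strongest case in the table, medv, a nested comparison confirms that the polynomial terms add real explanatory power (a sketch; fit_lin and fit_cub are new helper fits):

# Linear vs. cubic fit for medv, compared with an F-test
fit_lin <- lm(crim ~ medv, data = Boston)
fit_cub <- lm(crim ~ poly(medv, 3), data = Boston)
anova(fit_lin, fit_cub)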


Chapter 4 — Exercise 13

Logistic regression using the Weekly data set.

data(Weekly)

(a) Numerical and Graphical Summaries

summary(Weekly)
##       Year           Lag1               Lag2               Lag3         
##  Min.   :1990   Min.   :-18.1950   Min.   :-18.1950   Min.   :-18.1950  
##  1st Qu.:1995   1st Qu.: -1.1540   1st Qu.: -1.1540   1st Qu.: -1.1580  
##  Median :2000   Median :  0.2410   Median :  0.2410   Median :  0.2410  
##  Mean   :2000   Mean   :  0.1506   Mean   :  0.1511   Mean   :  0.1472  
##  3rd Qu.:2005   3rd Qu.:  1.4050   3rd Qu.:  1.4090   3rd Qu.:  1.4090  
##  Max.   :2010   Max.   : 12.0260   Max.   : 12.0260   Max.   : 12.0260  
##       Lag4               Lag5              Volume            Today         
##  Min.   :-18.1950   Min.   :-18.1950   Min.   :0.08747   Min.   :-18.1950  
##  1st Qu.: -1.1580   1st Qu.: -1.1660   1st Qu.:0.33202   1st Qu.: -1.1540  
##  Median :  0.2380   Median :  0.2340   Median :1.00268   Median :  0.2410  
##  Mean   :  0.1458   Mean   :  0.1399   Mean   :1.57462   Mean   :  0.1499  
##  3rd Qu.:  1.4090   3rd Qu.:  1.4050   3rd Qu.:2.05373   3rd Qu.:  1.4050  
##  Max.   : 12.0260   Max.   : 12.0260   Max.   :9.32821   Max.   : 12.0260  
##  Direction 
##  Down:484  
##  Up  :605  
##            
##            
##            
## 
pairs(Weekly[, -9], col = ifelse(Weekly$Direction == "Up", "steelblue", "tomato"),
      main = "Weekly Data — Blue = Up, Red = Down")

table(Weekly$Direction)
## 
## Down   Up 
##  484  605
prop.table(table(Weekly$Direction))
## 
##      Down        Up 
## 0.4444444 0.5555556

Patterns:

  • The market went Up about 55.6% of the time vs. Down 44.4% — a slight upward bias.
  • Volume (average shares traded) increases notably over time, reflecting growth in market activity (quantified below).
  • The lag variables (Lag1–Lag5) show very little correlation with each other or with Direction, making prediction challenging.
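
A one-line check of the volume trend noted above (sketch):

# Correlation between calendar year and weekly trading volume
cor(Weekly$Year, Weekly$Volume)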

(b) Logistic Regression — Full Model

glm_full <- glm(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume,
                data    = Weekly,
                family  = binomial)
summary(glm_full)
## 
## Call:
## glm(formula = Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + 
##     Volume, family = binomial, data = Weekly)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept)  0.26686    0.08593   3.106   0.0019 **
## Lag1        -0.04127    0.02641  -1.563   0.1181   
## Lag2         0.05844    0.02686   2.175   0.0296 * 
## Lag3        -0.01606    0.02666  -0.602   0.5469   
## Lag4        -0.02779    0.02646  -1.050   0.2937   
## Lag5        -0.01447    0.02638  -0.549   0.5833   
## Volume      -0.02274    0.03690  -0.616   0.5377   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1496.2  on 1088  degrees of freedom
## Residual deviance: 1486.4  on 1082  degrees of freedom
## AIC: 1500.4
## 
## Number of Fisher Scoring iterations: 4

Answer:
Only Lag2 appears statistically significant (p ≈ 0.03). The other lag variables and Volume are not significant predictors of Direction.

(c) Confusion Matrix — Full Model (Training Data)

glm_probs <- predict(glm_full, type = "response")
glm_pred  <- ifelse(glm_probs > 0.5, "Up", "Down")

conf_matrix <- table(Predicted = glm_pred, Actual = Weekly$Direction)
print(conf_matrix)
##          Actual
## Predicted Down  Up
##      Down   54  48
##      Up    430 557
cat("\nOverall fraction correct:",
    mean(glm_pred == Weekly$Direction), "\n")
## 
## Overall fraction correct: 0.5610652

Interpretation:

  • The model predicts Up the vast majority of the time (987 of 1089 weeks), reflecting the base rate.
  • It correctly classifies 557/605 ≈ 92% of Up weeks but only 54/484 ≈ 11% of Down weeks (the per-class rates are computed below).
  • Overall accuracy ≈ 56%, barely above the naive baseline of always predicting "Up" (605/1089 ≈ 55.6%), and this is a training-set estimate, so it is likely optimistic.
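
The class-specific rates can be read directly off the confusion matrix (sketch):

# Fraction of each actual class predicted correctly (Down, then Up)
diag(conf_matrix) / colSums(conf_matrix)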

(d) Logistic Regression — Training 1990–2008, Test 2009–2010

train <- Weekly$Year <= 2008
test  <- !train

glm_lag2 <- glm(Direction ~ Lag2,
                data   = Weekly,
                subset = train,
                family = binomial)

# Predict on held-out data (2009–2010)
test_probs <- predict(glm_lag2, newdata = Weekly[test, ], type = "response")
test_pred  <- ifelse(test_probs > 0.5, "Up", "Down")

conf_test <- table(Predicted = test_pred, Actual = Weekly$Direction[test])
print(conf_test)
##          Actual
## Predicted Down Up
##      Down    9  5
##      Up     34 56
cat("\nFraction correct on test set:",
    mean(test_pred == Weekly$Direction[test]), "\n")
## 
## Fraction correct on test set: 0.625

Interpretation:

  • Using only Lag2 as a predictor and testing on 2009–2010 data, the model achieves approximately 62.5% correct predictions, better than the full model managed even on its own training data.
  • This suggests Lag2 is the most informative predictor and that the simpler model generalizes better, though the test-set baseline below shows that much of this accuracy comes from the prevalence of Up weeks.
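
For context, the accuracy of the naive always-Up rule on the same test period (a sketch; the figure is implied by the confusion matrix above):

# Test-set frequency of "Up" = accuracy of always predicting "Up"
mean(Weekly$Direction[test] == "Up")   # 61/104, about 0.587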

End of Midterm Homework