Chapter 2: Exercise 10 — Boston Housing Data

(a) Load the Boston data and describe it

data("Boston")
dim(Boston)
## [1] 506  13
head(Boston)
##      crim zn indus chas   nox    rm  age    dis rad tax ptratio lstat medv
## 1 0.00632 18  2.31    0 0.538 6.575 65.2 4.0900   1 296    15.3  4.98 24.0
## 2 0.02731  0  7.07    0 0.469 6.421 78.9 4.9671   2 242    17.8  9.14 21.6
## 3 0.02729  0  7.07    0 0.469 7.185 61.1 4.9671   2 242    17.8  4.03 34.7
## 4 0.03237  0  2.18    0 0.458 6.998 45.8 6.0622   3 222    18.7  2.94 33.4
## 5 0.06905  0  2.18    0 0.458 7.147 54.2 6.0622   3 222    18.7  5.33 36.2
## 6 0.02985  0  2.18    0 0.458 6.430 58.7 6.0622   3 222    18.7  5.21 28.7
?Boston
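
# The data set contains 506 rows and 13 columns. Each row is a Boston
# census tract; each column is a tract-level variable (e.g., crim = per
# capita crime rate, rm = average rooms per dwelling, medv = median home
# value in $1000s).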

(b) Pairwise scatterplots of the predictors

pairs(Boston, main = "Pairwise Scatterplots of Boston Predictors")

# Findings: the scatterplots reveal several notable relationships.
# - lstat and medv have a strong negative correlation: higher
#   lower-status population shares are linked to lower home values.
# - rm and medv show a strong positive relationship: more rooms
#   generally mean higher home values.
# - nox and dis are negatively correlated, suggesting higher pollution
#   in tracts closer to employment centers.
# - rad, tax, and ptratio cluster into bands, hinting at discrete,
#   categorical-like behavior.
# - Some relationships (e.g., crim vs. medv) are nonlinear but still
#   show clear general trends.

(c) Correlations with per capita crime rate (crim)

cor(Boston$crim, Boston[, names(Boston) != "crim"])  # crim vs. all other variables
##              zn     indus        chas       nox         rm       age        dis
## [1,] -0.2004692 0.4065834 -0.05589158 0.4209717 -0.2192467 0.3527343 -0.3796701
##            rad       tax   ptratio     lstat       medv
## [1,] 0.6255051 0.5827643 0.2899456 0.4556215 -0.3883046
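
# Finding: rad (0.63) and tax (0.58) show the strongest positive
# correlations with crim, suggesting higher crime in tracts with greater
# highway access and higher property-tax rates; medv has the strongest
# negative correlation (-0.39).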

(d) Outliers and predictor ranges

summary(Boston$crim)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##  0.00632  0.08204  0.25651  3.61352  3.67708 88.97620
summary(Boston$tax)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   187.0   279.0   330.0   408.2   666.0   711.0
summary(Boston$ptratio)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   12.60   17.40   19.05   18.46   20.20   22.00
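
# Findings: crim is heavily right-skewed (mean 3.61 far above the median
# of 0.26, maximum 88.98), so a handful of tracts have extremely high
# crime rates. tax spans a wide range (187 to 711), with at least a
# quarter of tracts at 666 or above. ptratio varies less (12.6 to 22.0),
# with no dramatic outliers.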

(e) Number of tracts bounding the Charles River

sum(Boston$chas == 1)
## [1] 35

(f) Median pupil-teacher ratio

median(Boston$ptratio)
## [1] 19.05

(g) Tract with lowest median home value

min_index <- which.min(Boston$medv)
Boston[min_index, ]
##        crim zn indus chas   nox    rm age    dis rad tax ptratio lstat medv
## 399 38.3518  0  18.1    0 0.693 5.453 100 1.4896  24 666    20.2 30.59    5
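
# Finding: tract 399 pairs the lowest medv (5) with a very high crime
# rate (38.35, far beyond the 3rd quartile of 3.68), tax = 666 and
# ptratio = 20.2 (both at their 3rd quartiles), age = 100 (all units
# built before 1940), and a high lstat of 30.59.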

(h) Number of tracts averaging more than 7 or 8 rooms per dwelling

sum(Boston$rm > 7)
## [1] 64
sum(Boston$rm > 8)
## [1] 13
Boston[Boston$rm > 8, ]
##        crim zn indus chas    nox    rm  age    dis rad tax ptratio lstat medv
## 98  0.12083  0  2.89    0 0.4450 8.069 76.0 3.4952   2 276    18.0  4.21 38.7
## 164 1.51902  0 19.58    1 0.6050 8.375 93.9 2.1620   5 403    14.7  3.32 50.0
## 205 0.02009 95  2.68    0 0.4161 8.034 31.9 5.1180   4 224    14.7  2.88 50.0
## 225 0.31533  0  6.20    0 0.5040 8.266 78.3 2.8944   8 307    17.4  4.14 44.8
## 226 0.52693  0  6.20    0 0.5040 8.725 83.0 2.8944   8 307    17.4  4.63 50.0
## 227 0.38214  0  6.20    0 0.5040 8.040 86.5 3.2157   8 307    17.4  3.13 37.6
## 233 0.57529  0  6.20    0 0.5070 8.337 73.3 3.8384   8 307    17.4  2.47 41.7
## 234 0.33147  0  6.20    0 0.5070 8.247 70.4 3.6519   8 307    17.4  3.95 48.3
## 254 0.36894 22  5.86    0 0.4310 8.259  8.4 8.9067   7 330    19.1  3.54 42.8
## 258 0.61154 20  3.97    0 0.6470 8.704 86.9 1.8010   5 264    13.0  5.12 50.0
## 263 0.52014 20  3.97    0 0.6470 8.398 91.5 2.2885   5 264    13.0  5.91 48.8
## 268 0.57834 20  3.97    0 0.5750 8.297 67.0 2.4216   5 264    13.0  7.44 50.0
## 365 3.47428  0 18.10    1 0.7180 8.780 82.9 1.9047  24 666    20.2  5.29 21.9
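
# Finding: the 13 tracts with rm > 8 generally have high home values
# (medv mostly above 35, several at the apparent ceiling of 50), low
# lstat, and low crime rates; tract 365 (crim = 3.47, medv = 21.9) is
# the main exception.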

Chapter 3: Exercise 2 — KNN Classifier vs. Regression

# KNN classifier:
# - Used for classification problems (categorical response).
# - Predicts the most frequent class among the k nearest neighbors.

# KNN regression:
# - Used for regression problems (continuous response).
# - Predicts the average response value of the k nearest neighbors.

# Key differences:
# - The classifier outputs a class label; regression outputs a numeric value.
# - The classifier aggregates neighbors by majority vote; regression by
#   averaging. Otherwise the two methods are identical non-parametric,
#   distance-based procedures; see the sketch below.
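
# A minimal sketch of both flavors in base R, assuming numeric predictors
# and Euclidean distance. knn_predict() is a hypothetical helper written
# for illustration; in practice class::knn() (classification) and
# FNN::knn.reg() (regression) are the standard implementations.
knn_predict <- function(train_x, train_y, x0, k, type = c("class", "reg")) {
  type <- match.arg(type)
  # distance from the query point x0 to every training point
  d <- sqrt(rowSums(sweep(as.matrix(train_x), 2, x0)^2))
  nn <- order(d)[1:k]  # indices of the k nearest neighbors
  if (type == "class") {
    names(which.max(table(train_y[nn])))  # majority vote -> class label
  } else {
    mean(train_y[nn])                     # neighbor average -> numeric value
  }
}

# Example: predict medv for a hypothetical tract with lstat = 10, rm = 6
# from its 5 nearest neighbors in the Boston data.
knn_predict(Boston[, c("lstat", "rm")], Boston$medv,
            x0 = c(lstat = 10, rm = 6), k = 5, type = "reg")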

Chapter 3: Exercise 10 — Carseats Regression

(a) Fit a multiple regression model

data("Carseats")
model_full <- lm(Sales ~ Price + Urban + US, data = Carseats)
summary(model_full)
## 
## Call:
## lm(formula = Sales ~ Price + Urban + US, data = Carseats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9206 -1.6220 -0.0564  1.5786  7.0581 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.043469   0.651012  20.036  < 2e-16 ***
## Price       -0.054459   0.005242 -10.389  < 2e-16 ***
## UrbanYes    -0.021916   0.271650  -0.081    0.936    
## USYes        1.200573   0.259042   4.635 4.86e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.472 on 396 degrees of freedom
## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2335 
## F-statistic: 41.52 on 3 and 396 DF,  p-value: < 2.2e-16

(b) Interpretation of coefficients

# Price (-0.0545): each $1 increase in price is associated with roughly
#   54 fewer car seats sold (Sales is measured in thousands of units).
# UrbanYes (-0.0219): urban stores sell about 22 fewer units than
#   non-urban stores, but the effect is not statistically significant.
# USYes (1.2006): US stores sell about 1,201 more units than stores
#   outside the US, holding Price and Urban fixed.

(c) Regression equation

# Sales = 13.0435 - 0.0545 * Price - 0.0219 * UrbanYes + 1.2006 * USYes + ε
# where UrbanYes = 1 if the store is urban (0 otherwise) and
# USYes = 1 if the store is in the US (0 otherwise).

(d) Hypothesis tests

# H0: βj = 0 for each predictor, rejected when the p-value is below 0.05.
# From the summary, Price (p < 2e-16) and USYes (p = 4.86e-06) are highly
# significant, so we reject H0 for both. UrbanYes (p = 0.936) is not
# significant, so we fail to reject H0 for Urban.

(e) Fit reduced model

model_reduced <- lm(Sales ~ Price + US, data = Carseats)
summary(model_reduced)
## 
## Call:
## lm(formula = Sales ~ Price + US, data = Carseats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9269 -1.6286 -0.0574  1.5766  7.0515 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.03079    0.63098  20.652  < 2e-16 ***
## Price       -0.05448    0.00523 -10.416  < 2e-16 ***
## USYes        1.19964    0.25846   4.641 4.71e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.469 on 397 degrees of freedom
## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2354 
## F-statistic: 62.43 on 2 and 397 DF,  p-value: < 2.2e-16

(f) Compare model fits

summary(model_full)$adj.r.squared
## [1] 0.2335123
summary(model_reduced)$adj.r.squared
## [1] 0.2354305
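
# Finding: the reduced model fits essentially as well as the full model.
# Adjusted R-squared is slightly higher (0.2354 vs. 0.2335) and the
# residual standard error slightly lower (2.469 vs. 2.472), so dropping
# Urban costs nothing.

# As an optional extra check (not required by the exercise), a nested-model
# F-test comparing the two fits; for a single dropped term it is
# equivalent to the t-test on UrbanYes in the full model.
anova(model_reduced, model_full)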

(g) 95% Confidence intervals

confint(model_reduced)
##                   2.5 %      97.5 %
## (Intercept) 11.79032020 14.27126531
## Price       -0.06475984 -0.04419543
## USYes        0.69151957  1.70776632

(h) Outliers and leverage

par(mfrow = c(2, 2))
plot(model_reduced)
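
# The diagnostic plots can be supplemented with a quick numeric check.
# A minimal sketch using base R's rstudent() and hatvalues(), with the
# usual rule-of-thumb cutoffs: |studentized residual| > 2 flags potential
# outliers, and leverage above 2(p + 1)/n flags high-leverage points.
rs <- rstudent(model_reduced)   # studentized residuals
hv <- hatvalues(model_reduced)  # leverage (hat) values
p <- length(coef(model_reduced)) - 1
sum(abs(rs) > 2)                        # candidate outliers
sum(hv > 2 * (p + 1) / nrow(Carseats))  # candidate high-leverage points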

Chapter 4: Exercise 12 — Logistic Regression vs. Softmax

(a) Log odds in your model

# log(p_orange / (1 - p_orange)) = β0 + β1 * x

(b) Log odds in your friend’s model

# log(p_orange / p_apple) = (α_orange0 - α_apple0) + (α_orange1 - α_apple1) * x

(c) Match coefficients: β0 = 2, β1 = -1

# β0 = α_orange0 - α_apple0 = 2
# β1 = α_orange1 - α_apple1 = -1
# The softmax parameters are not identifiable: only the differences are
# pinned down, so infinitely many coefficient sets are consistent with
# the logistic fit. One example:
# α_orange0 = 2, α_orange1 = -1, α_apple0 = 0, α_apple1 = 0

(d) Friend’s softmax model estimates

# α_orange0 = 1.2, α_orange1 = -2
# α_apple0 = 3, α_apple1 = 0.6
# Then:
# β0 = 1.2 - 3 = -1.8
# β1 = -2 - 0.6 = -2.6

(e) Predictions comparison

# The two models produce the same class prediction whenever their log
# odds have the same sign. Since
#   β0 + β1 * x = (α_orange0 - α_apple0) + (α_orange1 - α_apple1) * x,
# the two log-odds functions are identical at every x, so the decision
# boundary is the same. Therefore we expect 100% agreement across the
# test set.
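
# As a sanity check (a sketch, not part of the exercise), the two sets
# of estimates from (d) can be compared numerically: they yield
# identical Pr(orange) at every x, and therefore identical class labels.
x <- seq(-10, 10, by = 0.1)
p_logit <- plogis(-1.8 - 2.6 * x)             # my model's Pr(orange)
e_orange <- exp(1.2 - 2 * x)                  # friend's softmax terms
e_apple <- exp(3 + 0.6 * x)
p_softmax <- e_orange / (e_orange + e_apple)  # friend's Pr(orange)
all.equal(p_logit, p_softmax)                 # probabilities match
mean((p_logit > 0.5) == (p_softmax > 0.5))    # fraction of matching labels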