Chapter 2: Exercise 10 — Boston Housing Data
(a) Load the Boston data and describe it
data("Boston")
dim(Boston)
## [1] 506 13
head(Boston)
## crim zn indus chas nox rm age dis rad tax ptratio lstat medv
## 1 0.00632 18 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3 4.98 24.0
## 2 0.02731 0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 9.14 21.6
## 3 0.02729 0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 4.03 34.7
## 4 0.03237 0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 2.94 33.4
## 5 0.06905 0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 5.33 36.2
## 6 0.02985 0 2.18 0 0.458 6.430 58.7 6.0622 3 222 18.7 5.21 28.7
?Boston
(b) Pairwise scatterplots of the predictors
pairs(Boston, main = "Pairwise Scatterplots of Boston Predictors")

# Findings
# The scatterplots reveal several notable relationships:
# - lstat and medv have a strong negative correlation: tracts with a higher share of lower-status residents tend to have lower median home values.
# - rm and medv show a strong positive relationship: more rooms per dwelling generally means higher home values.
# - nox and dis are negatively correlated, suggesting higher pollution in areas closer to employment centers.
# - rad, tax, and ptratio show clustering, hinting at categorical-like behavior.
# - Some relationships (e.g., crim vs. medv) are nonlinear but still show a general trend.
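# These visual impressions can be checked numerically; a minimal sketch using cor()
# on the pairs called out above (assumes Boston is already loaded):
cor(Boston$lstat, Boston$medv)  # expected to be strongly negative
cor(Boston$rm, Boston$medv)     # expected to be strongly positive
cor(Boston$nox, Boston$dis)     # expected to be strongly negative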
(c) Correlations with per capita crime rate (crim)
cor(Boston$crim, Boston[-which(names(Boston) == "crim")])
## zn indus chas nox rm age dis
## [1,] -0.2004692 0.4065834 -0.05589158 0.4209717 -0.2192467 0.3527343 -0.3796701
## rad tax ptratio lstat medv
## [1,] 0.6255051 0.5827643 0.2899456 0.4556215 -0.3883046
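# To rank the predictors by association with crim, one option is to sort the
# absolute correlations; a small sketch:
crim_cors <- cor(Boston$crim, Boston[-which(names(Boston) == "crim")])
sort(abs(crim_cors[1, ]), decreasing = TRUE)  # rad and tax are the largest above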
(d) Outliers and predictor ranges
summary(Boston$crim)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00632 0.08204 0.25651 3.61352 3.67708 88.97620
summary(Boston$tax)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 187.0 279.0 330.0 408.2 666.0 711.0
summary(Boston$ptratio)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 12.60 17.40 19.05 18.46 20.20 22.00
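# The summaries show a heavy right tail for crim (median 0.26 vs. max 88.98) and a wide
# range for tax (187 to 711), while ptratio spans a comparatively narrow 12.6 to 22.
# As a rough check with arbitrary, illustrative cutoffs:
sum(Boston$crim > 20)                 # tracts with very high per capita crime rates
sum(Boston$tax == max(Boston$tax))    # tracts at the maximum tax rate
range(Boston$ptratio)                 # pupil-teacher ratio range is modest by comparison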
(e) Number of tracts bounding the Charles River
sum(Boston$chas == 1)
## [1] 35
(h) Number of tracts averaging more than 7 or 8 rooms per dwelling
sum(Boston$rm > 7)
## [1] 64
sum(Boston$rm > 8)
## [1] 13
Boston[Boston$rm > 8, ]
## crim zn indus chas nox rm age dis rad tax ptratio lstat medv
## 98 0.12083 0 2.89 0 0.4450 8.069 76.0 3.4952 2 276 18.0 4.21 38.7
## 164 1.51902 0 19.58 1 0.6050 8.375 93.9 2.1620 5 403 14.7 3.32 50.0
## 205 0.02009 95 2.68 0 0.4161 8.034 31.9 5.1180 4 224 14.7 2.88 50.0
## 225 0.31533 0 6.20 0 0.5040 8.266 78.3 2.8944 8 307 17.4 4.14 44.8
## 226 0.52693 0 6.20 0 0.5040 8.725 83.0 2.8944 8 307 17.4 4.63 50.0
## 227 0.38214 0 6.20 0 0.5040 8.040 86.5 3.2157 8 307 17.4 3.13 37.6
## 233 0.57529 0 6.20 0 0.5070 8.337 73.3 3.8384 8 307 17.4 2.47 41.7
## 234 0.33147 0 6.20 0 0.5070 8.247 70.4 3.6519 8 307 17.4 3.95 48.3
## 254 0.36894 22 5.86 0 0.4310 8.259 8.4 8.9067 7 330 19.1 3.54 42.8
## 258 0.61154 20 3.97 0 0.6470 8.704 86.9 1.8010 5 264 13.0 5.12 50.0
## 263 0.52014 20 3.97 0 0.6470 8.398 91.5 2.2885 5 264 13.0 5.91 48.8
## 268 0.57834 20 3.97 0 0.5750 8.297 67.0 2.4216 5 264 13.0 7.44 50.0
## 365 3.47428 0 18.10 1 0.7180 8.780 82.9 1.9047 24 666 20.2 5.29 21.9
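# From the printed rows, tracts averaging more than 8 rooms mostly have low lstat and high
# medv (several at 50). A quick, hedged comparison against the full data set:
summary(Boston[Boston$rm > 8, c("crim", "lstat", "medv")])  # the 13 large-home tracts
summary(Boston[, c("crim", "lstat", "medv")])               # all 506 tracts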
Chapter 3: Exercise 2 — KNN Classifier vs. Regression
## KNN Classifier:
# - Used for classification problems (categorical response).
# - Assigns the most frequent class among the k nearest neighbors.
## KNN Regression:
# - Used for regression problems (continuous response).
# - Predicts the average response value of the k nearest neighbors.
# Key differences:
# - Classifier outputs class label; regression outputs a numeric value.
# - Classifier uses majority vote; regression uses averaging.
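# To make the distinction concrete, here is a minimal base-R sketch of 1-D KNN with k = 3;
# the training data are made up purely for illustration:
x_train <- c(1, 2, 3, 6, 7, 8)                # hypothetical predictor values
y_class <- c("A", "A", "A", "B", "B", "B")    # categorical response (classification)
y_num   <- c(1.0, 1.2, 0.9, 3.1, 3.0, 2.8)    # continuous response (regression)
k  <- 3
x0 <- 2.5                                     # query point
nn <- order(abs(x_train - x0))[1:k]           # indices of the k nearest neighbors
names(which.max(table(y_class[nn])))          # classifier: majority vote -> "A"
mean(y_num[nn])                               # regression: average of neighbors' responses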
Chapter 3: Exercise 10 — Carseats Regression
(a) Fit a multiple regression model
data("Carseats")
model_full <- lm(Sales ~ Price + Urban + US, data = Carseats)
summary(model_full)
##
## Call:
## lm(formula = Sales ~ Price + Urban + US, data = Carseats)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.9206 -1.6220 -0.0564 1.5786 7.0581
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.043469 0.651012 20.036 < 2e-16 ***
## Price -0.054459 0.005242 -10.389 < 2e-16 ***
## UrbanYes -0.021916 0.271650 -0.081 0.936
## USYes 1.200573 0.259042 4.635 4.86e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.472 on 396 degrees of freedom
## Multiple R-squared: 0.2393, Adjusted R-squared: 0.2335
## F-statistic: 41.52 on 3 and 396 DF, p-value: < 2.2e-16
(b) Interpretation of coefficients
# Price: holding Urban and US fixed, a $1 increase in price is associated with a decrease of
#   about 0.054 in Sales (roughly 54 fewer units sold, since Sales is measured in thousands).
# UrbanYes: urban stores sell about 0.022 (thousand) fewer units than non-urban stores, but
#   this effect is not statistically significant (p = 0.936).
# USYes: US stores sell about 1.20 thousand more units than non-US stores, holding the other
#   predictors fixed; this effect is highly significant.
(c) Regression equation
# Sales = β0 + β1 * Price + β2 * UrbanYes + β3 * USYes + ε
# With the estimates from summary(model_full), where UrbanYes and USYes are 0/1 indicators:
# Sales_hat ≈ 13.043 - 0.054 * Price - 0.022 * UrbanYes + 1.201 * USYes
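# As a hedged illustration, predict() evaluates this fitted equation for a hypothetical
# new store (the values below are made up):
new_store <- data.frame(Price = 120, Urban = "Yes", US = "Yes")
predict(model_full, newdata = new_store)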
(d) Hypothesis tests
# H0: βj = 0 for each predictor. From the p-values in summary(model_full):
# reject H0 for Price (p < 2e-16) and US (p = 4.86e-06); fail to reject H0 for Urban (p = 0.936).
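# The p-values can also be pulled directly from the coefficient table; a small sketch:
summary(model_full)$coefficients[, "Pr(>|t|)"]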
(e) Fit reduced model
model_reduced <- lm(Sales ~ Price + US, data = Carseats)
summary(model_reduced)
##
## Call:
## lm(formula = Sales ~ Price + US, data = Carseats)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.9269 -1.6286 -0.0574 1.5766 7.0515
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.03079 0.63098 20.652 < 2e-16 ***
## Price -0.05448 0.00523 -10.416 < 2e-16 ***
## USYes 1.19964 0.25846 4.641 4.71e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.469 on 397 degrees of freedom
## Multiple R-squared: 0.2393, Adjusted R-squared: 0.2354
## F-statistic: 62.43 on 2 and 397 DF, p-value: < 2.2e-16
(f) Compare model fits
summary(model_full)$adj.r.squared
## [1] 0.2335123
summary(model_reduced)$adj.r.squared
## [1] 0.2354305
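# The reduced model has a marginally higher adjusted R-squared and a slightly smaller residual
# standard error (2.469 vs. 2.472), so dropping Urban costs nothing. A partial F-test is another
# way to compare the nested fits; a minimal sketch:
anova(model_reduced, model_full)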
(g) 95% Confidence intervals
confint(model_reduced)
## 2.5 % 97.5 %
## (Intercept) 11.79032020 14.27126531
## Price -0.06475984 -0.04419543
## USYes 0.69151957 1.70776632
(h) Outliers and leverage
par(mfrow = c(2, 2))
plot(model_reduced)
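# Beyond the diagnostic plots, a rough numeric check (thresholds are conventional rules of thumb):
# studentized residuals beyond about +/-3 suggest outliers, and hat values well above the
# average (p + 1)/n suggest high leverage.
which(abs(rstudent(model_reduced)) > 3)                  # potential outliers
avg_hat <- length(coef(model_reduced)) / nrow(Carseats)  # average leverage (p + 1)/n
which(hatvalues(model_reduced) > 3 * avg_hat)            # high-leverage observations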

Chapter 4: Exercise 12 — Logistic Regression vs. Softmax
(a) Log odds in your model
# log(p_orange / (1 - p_orange)) = β0 + β1 * x
(b) Log odds in your friend’s model
# log(p_orange / p_apple) = (α_orange0 - α_apple0) + (α_orange1 - α_apple1) * x
(c) Match coefficients: β0 = 2, β1 = -1
# β0 = α_orange0 - α_apple0 = 2
# β1 = α_orange1 - α_apple1 = -1
# The individual softmax coefficients are not uniquely determined; only these differences are.
# One valid choice: α_orange0 = 2, α_orange1 = -1, α_apple0 = 0, α_apple1 = 0
(d) Friend’s softmax model estimates
# α_orange0 = 1.2, α_orange1 = -2
# α_apple0 = 3, α_apple1 = 0.6
# Then:
# β0 = 1.2 - 3 = -1.8
# β1 = -2 - 0.6 = -2.6
(e) Predictions comparison
# Within each data set, the logistic and softmax fits are reparameterizations of the same model:
# β0 + β1 * x = (α_orange0 - α_apple0) + (α_orange1 - α_apple1) * x,
# so both produce the same log-odds and the same decision boundary.
# Therefore the predicted class labels are expected to agree 100% of the time on the test set.
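# A small sketch verifying the equivalence numerically; the x values are simulated for illustration:
b0 <- -1.8; b1 <- -2.6                          # your logistic coefficients from (d)
ao0 <- 1.2; ao1 <- -2; aa0 <- 3; aa1 <- 0.6     # friend's softmax coefficients from (d)
x <- seq(-3, 3, by = 0.1)                       # illustrative grid of test points
p_logit <- 1 / (1 + exp(-(b0 + b1 * x)))        # P(orange) from the logistic model
s_or <- exp(ao0 + ao1 * x)                      # softmax score for orange
s_ap <- exp(aa0 + aa1 * x)                      # softmax score for apple
p_soft <- s_or / (s_or + s_ap)                  # P(orange) from the softmax model
all((p_logit > 0.5) == (p_soft > 0.5))          # should be TRUE: identical predicted labels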