1) Understanding Null Hypotheses and P-Values
Null Hypotheses
The p-values in Table 3.4 are linked to the following null hypotheses:
Null Hypothesis: TV advertising does not influence sales.
Interpretation: Changes in TV advertising spending do not impact sales.
Null Hypothesis: Radio advertising does not influence sales.
Interpretation: Changes in radio advertising spending do not impact sales.
Null Hypothesis: Newspaper advertising does not influence sales.
Interpretation: Changes in newspaper advertising spending do not impact sales.
Interpreting the P-Values
P-values help determine whether to reject the null hypotheses. A small p-value (typically less than 0.05) means we reject the null hypothesis, indicating a statistically significant relationship. A large p-value (greater than 0.05) means we fail to reject the null hypothesis, indicating insufficient evidence of a relationship.
TV: The p-value is less than 0.0001, which is very small.
Conclusion: We reject the null hypothesis. TV advertising significantly affects sales.
Implication: Increasing the TV ad budget is likely to increase sales.
Radio: The p-value is less than 0.0001, which is very small.
Conclusion: We reject the null hypothesis. Radio advertising significantly affects sales.
Implication: Increasing the radio ad budget is likely to increase sales.
Newspaper: The p-value is 0.8599, which is large.
Conclusion: We do not reject the null hypothesis. Newspaper advertising does not significantly affect sales.
Implication: Increasing the newspaper ad budget is unlikely to impact sales.
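As a rough sketch (not part of the original answer), these p-values come from a multiple regression of sales on the three advertising budgets. Assuming the Advertising data are saved locally as Advertising.csv with columns TV, radio, newspaper, and sales, the fit would look like this:
# Hedged sketch: reproduce Table 3.4-style p-values (file name and column
# names are assumptions about how the Advertising data are stored)
advertising <- read.csv("Advertising.csv")
ad_fit <- lm(sales ~ TV + radio + newspaper, data = advertising)
summary(ad_fit)$coefficients[, 4]   # the fourth column holds the p-values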
2) Differences Between KNN Classifier and KNN Regression Methods
KNN Classifier
How It Works:
Goal: The KNN classifier is used to predict the category or class of a data point.
Process:
For a new data point, the algorithm finds the K closest data points (neighbors) in the training data.
It then assigns the class that appears most often among these K neighbors.
Output: The output is a category or class label. For example, it can predict whether an email is “spam” or “not spam.”
Key Points:
Used for classification tasks.
The number of neighbors (K) affects the results. A small K can make the model too sensitive, while a large K can make it too general.
It works well for problems where the data has clear categories.
KNN Regression
How It Works:
Goal: The KNN regression method is used to predict a continuous value.
Process:
For a new data point, the algorithm finds the K closest data points in the training data.
It then calculates the average value of these K neighbors.
Output: The output is a continuous value. For example, it can predict the price of a house.
Key Points:
Used for regression tasks.
It works well for problems where the output is a number.
Key Differences
Aspect | KNN Classifier | KNN Regression
Goal | Predicts a category (e.g., “spam” or “not spam”). | Predicts a number (e.g., house price).
Output | A class label. | A continuous value.
How It Decides | Majority voting among K neighbors. | Average of K neighbors.
Use Case | Classification tasks. | Regression tasks.
Sensitivity to K | Small K can overfit; large K can underfit. | Small K can be noisy; large K can oversmooth.
Example
KNN Classifier Example:
Task: Predict if a fruit is an apple, orange, or banana based on its features.
Output: The algorithm predicts the fruit type (e.g., “Apple”).
KNN Regression Example:
Task: Predict the price of a house based on its size and location.
Output: The algorithm predicts a price (e.g., $350,000).
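To make the contrast concrete, here is a minimal hand-rolled sketch of both procedures (knn_classify and knn_regress are hypothetical helper names; Euclidean distance on a numeric feature matrix is assumed):
# Hypothetical helpers illustrating the two KNN variants
knn_classify <- function(train_x, train_y, new_x, k = 3) {
  d <- sqrt(colSums((t(train_x) - new_x)^2))   # distance from new_x to each training row
  nearest <- order(d)[1:k]                     # indices of the K closest neighbors
  names(which.max(table(train_y[nearest])))    # majority vote -> a class label
}
knn_regress <- function(train_x, train_y, new_x, k = 3) {
  d <- sqrt(colSums((t(train_x) - new_x)^2))
  nearest <- order(d)[1:k]
  mean(train_y[nearest])                       # average of the K neighbors -> a number
}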
3) Interpreting Regression Coefficients and Making Predictions
a) Coefficient for Level (X3):
The coefficient for Level is positive, so, before the interaction term is taken into account, college graduates (Level = 1) are predicted to earn more than high school graduates (Level = 0) when GPA and IQ are held fixed.
Coefficient for GPA/Level Interaction (X5):
For every 1-unit increase in GPA, the salary advantage of being a college graduate decreases by 10.
Determine When College Graduates Earn More:
To find when college graduates earn more than high school graduates, we require the Level terms to be positive: 35 − 10 × GPA > 0, which gives GPA < 3.5.
Conclusion:
College graduates earn more than high school graduates only if their GPA is less than 3.5; once GPA exceeds 3.5, high school graduates are predicted to earn more.
Therefore, the correct statement is:
iii. For a fixed value of IQ and GPA, high school graduates earn more, on average, than college graduates, provided that the GPA is high enough.
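A short numerical check (a sketch, assuming the exercise's fitted coefficients of 35 for Level and −10 for the GPA×Level interaction) makes the cutoff visible:
# Sketch: the college-graduate salary advantage as a function of GPA
# (beta for Level = 35 and beta for GPA:Level = -10 are taken as given)
gpa <- seq(2.0, 4.0, by = 0.5)
college_advantage <- 35 - 10 * gpa    # positive only when GPA < 3.5
data.frame(gpa, college_advantage)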
3b)
To assess whether the small coefficient for the GPA/IQ interaction provides evidence of an interaction effect, we need to check:
The standard error (to calculate significance using a t-test or p-value).
The scale of the data (a small coefficient can still matter if the predictor values are large).
Without knowing the standard error or p-value, we cannot conclude that the interaction is insignificant just because the coefficient is small.
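For example, a minimal sketch with hypothetical numbers (the exercise only reports the estimate) of how significance would actually be judged:
# Hedged sketch: significance depends on the estimate relative to its standard
# error, not on its absolute size (the SE and df below are hypothetical)
beta_hat <- 0.01                            # small interaction estimate
se       <- 0.002                           # hypothetical standard error
t_stat   <- beta_hat / se                   # t-statistic
p_value  <- 2 * pt(-abs(t_stat), df = 94)   # two-sided p-value; df assumed
c(t = t_stat, p = p_value)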
Lab:
library(ISLR)
str(Auto)
## 'data.frame': 392 obs. of 9 variables:
## $ mpg : num 18 15 18 16 17 15 14 14 14 15 ...
## $ cylinders : num 8 8 8 8 8 8 8 8 8 8 ...
## $ displacement: num 307 350 318 304 302 429 454 440 455 390 ...
## $ horsepower : num 130 165 150 150 140 198 220 215 225 190 ...
## $ weight : num 3504 3693 3436 3433 3449 ...
## $ acceleration: num 12 11.5 11 12 10.5 10 9 8.5 10 8.5 ...
## $ year : num 70 70 70 70 70 70 70 70 70 70 ...
## $ origin : num 1 1 1 1 1 1 1 1 1 1 ...
## $ name : Factor w/ 304 levels "amc ambassador brougham",..: 49 36 231 14 161 141 54 223 241 2 ...
lm_model <- lm(mpg ~ horsepower, data = Auto)
summary(lm_model)
##
## Call:
## lm(formula = mpg ~ horsepower, data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.5710 -3.2592 -0.3435 2.7630 16.9240
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.935861 0.717499 55.66 <2e-16 ***
## horsepower -0.157845 0.006446 -24.49 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.906 on 390 degrees of freedom
## Multiple R-squared: 0.6059, Adjusted R-squared: 0.6049
## F-statistic: 599.7 on 1 and 390 DF, p-value: < 2.2e-16
i. Is there a relationship between horsepower and mpg?
Yes. The p-value for horsepower is < 2e-16, far below 0.05, so there is a significant relationship between horsepower and mpg.
ii. How strong is the relationship between horsepower and mpg?
The Multiple R-squared = 0.6059, meaning that horsepower explains approximately 60.6% of the variance in mpg.
iii. Is the relationship between horsepower and mpg positive or negative?
The coefficient for horsepower is -0.157845, so the relationship is negative: as horsepower increases, mpg decreases.
iv. What is the predicted mpg for a car with 98 horsepower?
We can use the regression equation:
mpg^ = 39.935861 − 0.157845 × horsepower
For horsepower = 98:
mpg^ = 39.935861 − (0.157845 × 98) = 39.935861 − 15.46881 ≈ 24.47
Thus, the predicted mpg for a car with 98 horsepower is approximately 24.47 mpg, matching the predict() output below.
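The same value can be computed directly from the stored coefficients (a quick sketch):
# Manual computation from the fitted coefficients (matches the hand calculation above)
coef(lm_model)["(Intercept)"] + coef(lm_model)["horsepower"] * 98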
new_data <- data.frame(horsepower = 98)
predict(lm_model, new_data, interval = "confidence")
## fit lwr upr
## 1 24.46708 23.97308 24.96108
predict(lm_model, new_data, interval = "prediction")
## fit lwr upr
## 1 24.46708 14.8094 34.12476
We are 95% confident that the mean mpg for all cars with 98 horsepower falls between 23.97 and 24.96 mpg.
This range is narrower because it estimates the mean response.
We are 95% confident that the mpg for an individual car with 98 horsepower will fall between 14.81 and 34.12 mpg. This range is wider because it must also account for the variability of individual cars around the mean response.
b)
# Scatterplot of mpg vs horsepower
plot(Auto$horsepower, Auto$mpg,
xlab = "Horsepower", ylab = "Miles Per Gallon (mpg)",
main = "MPG vs Horsepower with Regression Line",
pch = 16, col = "blue")
# Add the regression line
abline(lm_model, col = "red", lwd = 2)
c)
# Generate diagnostic plots
par(mfrow = c(2, 2)) # Arrange plots in a 2x2 grid
plot(lm_model)
Carseats Dataset
library(ISLR)
str(Carseats)
## 'data.frame': 400 obs. of 11 variables:
## $ Sales : num 9.5 11.22 10.06 7.4 4.15 ...
## $ CompPrice : num 138 111 113 117 141 124 115 136 132 132 ...
## $ Income : num 73 48 35 100 64 113 105 81 110 113 ...
## $ Advertising: num 11 16 10 4 3 13 0 15 0 0 ...
## $ Population : num 276 260 269 466 340 501 45 425 108 131 ...
## $ Price : num 120 83 80 97 128 72 108 120 124 124 ...
## $ ShelveLoc : Factor w/ 3 levels "Bad","Good","Medium": 1 2 3 3 1 1 3 2 3 3 ...
## $ Age : num 42 65 59 55 38 78 71 67 76 76 ...
## $ Education : num 17 10 12 14 13 16 15 10 10 17 ...
## $ Urban : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 1 2 2 1 1 ...
## $ US : Factor w/ 2 levels "No","Yes": 2 2 2 2 1 2 1 2 1 2 ...
# Fit multiple regression model
lm_model <- lm(Sales ~ Price + Urban + US, data = Carseats)
summary(lm_model)
##
## Call:
## lm(formula = Sales ~ Price + Urban + US, data = Carseats)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.9206 -1.6220 -0.0564 1.5786 7.0581
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.043469 0.651012 20.036 < 2e-16 ***
## Price -0.054459 0.005242 -10.389 < 2e-16 ***
## UrbanYes -0.021916 0.271650 -0.081 0.936
## USYes 1.200573 0.259042 4.635 4.86e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.472 on 396 degrees of freedom
## Multiple R-squared: 0.2393, Adjusted R-squared: 0.2335
## F-statistic: 41.52 on 3 and 396 DF, p-value: < 2.2e-16
b) Interpretation of Coefficients:
From the model output:
Sales^ = 13.04 − 0.054 × Price − 0.022 × UrbanYes + 1.201 × USYes
Key findings:
Intercept (13.04):
The predicted sales (in thousands of units) when Price = 0 for a non-urban store outside the US; it serves as a baseline rather than a realistic scenario.
Price (-0.054):
Negative coefficient means that as price increases, sales decrease.
Specifically, for every $1 increase in price, predicted sales drop by about 0.054 thousand units (roughly 54 car seats).
Urban (-0.022):
Not significant (p = 0.936, much greater than 0.05).
This suggests that being in an urban vs. rural location does not significantly affect sales.
US (1.201):
Positive and statistically significant (p < 0.001).
At a given price, stores in the US sell about 1.2 thousand more units (roughly 1,200 more car seats) on average than stores outside the US.
R-squared (0.2393):
Only 23.93% of the variability in Sales is explained by Price, Urban, and US.
This suggests that other factors (e.g., Advertising, Income, ShelveLoc) may be important predictors.
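As an optional sketch of that last point (not required by the question), fitting all available Carseats predictors shows how much more variance could be explained:
# Sketch: adjusted R-squared when every predictor in Carseats is included
summary(lm(Sales ~ ., data = Carseats))$adj.r.squared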
c) Model Equation:
Sales^ = 13.04 − 0.054 × Price − 0.022 × UrbanYes + 1.201 × USYes
where UrbanYes = 1 if the store is in an urban location (0 otherwise) and USYes = 1 if the store is in the US (0 otherwise).
If Urban = No and US = No: Sales^ = 13.04 − 0.054 × Price
If Urban = Yes and US = No: Sales^ = 13.04 − 0.054 × Price − 0.022
If Urban = No and US = Yes: Sales^ = 13.04 − 0.054 × Price + 1.201
If Urban = Yes and US = Yes: Sales^ = 13.04 − 0.054 × Price − 0.022 + 1.201
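A short sketch evaluating these four cases with predict(), at a hypothetical price of $100:
# Sketch: predictions for the four Urban/US combinations at an assumed Price of 100
cases <- expand.grid(Price = 100, Urban = c("No", "Yes"), US = c("No", "Yes"))
cbind(cases, Predicted_Sales = predict(lm_model, newdata = cases))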
d)
summary(lm_model)$coefficients[, 4]
## (Intercept) Price UrbanYes USYes
## 3.626602e-62 1.609917e-22 9.357389e-01 4.860245e-06
# Identify significant predictors (p < 0.05)
significant_predictors <- summary(lm_model)$coefficients[, 4] < 0.05
significant_predictors
## (Intercept) Price UrbanYes USYes
## TRUE TRUE FALSE TRUE
Price and USYes are significant (p-value < 0.05), meaning we reject H0 for these predictors.
UrbanYes is not significant (p-value = 0.936), meaning we fail to reject H0.
e)
# Fit the reduced model without the 'Urban' predictor
lm_reduced <- lm(Sales ~ Price + US, data = Carseats)
# Display the summary of the reduced model
summary(lm_reduced)
##
## Call:
## lm(formula = Sales ~ Price + US, data = Carseats)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.9269 -1.6286 -0.0574 1.5766 7.0515
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.03079 0.63098 20.652 < 2e-16 ***
## Price -0.05448 0.00523 -10.416 < 2e-16 ***
## USYes 1.19964 0.25846 4.641 4.71e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.469 on 397 degrees of freedom
## Multiple R-squared: 0.2393, Adjusted R-squared: 0.2354
## F-statistic: 62.43 on 2 and 397 DF, p-value: < 2.2e-16
f)
Adjusted R² increased slightly (0.2335 → 0.2354), meaning the reduced model explains essentially the same variance with one fewer predictor.
Residual standard error (RSE) decreased slightly (2.472 → 2.469), suggesting a marginal improvement.
Since Urban was insignificant, removing it does not worsen model performance.
The reduced model is preferable because it is simpler and fits the data about as well.
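As an optional sketch, the two fits can also be compared formally with an F-test:
# Sketch: F-test comparing the reduced model (Price + US) to the full model
anova(lm_reduced, lm_model)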
g) 95% Confidence Intervals for Coefficients
confint(lm_reduced, level = 0.95)
## 2.5 % 97.5 %
## (Intercept) 11.79032020 14.27126531
## Price -0.06475984 -0.04419543
## USYes 0.69151957 1.70776632
We are 95% confident that the true coefficient of Price is between −0.065 and −0.044.
We are 95% confident that the true coefficient of USYes is between 0.692 and 1.708, meaning US stores consistently have higher sales.
par(mfrow = c(2, 2))
plot(lm_reduced)
14 a)
set.seed(1)
x1 <- rnorm(100)
x2 <- 2 * x1 + rnorm(100)
y <- 3 + 2 * x1 + 0.3 * x2 + rnorm(100)
# multiple linear regression model
lm_model <- lm(y ~ x1 + x2)
summary(lm_model)
##
## Call:
## lm(formula = y ~ x1 + x2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.94359 -0.43645 0.00202 0.63692 2.63941
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.0254 0.1052 28.760 < 2e-16 ***
## x1 2.1280 0.2480 8.580 1.55e-13 ***
## x2 0.2465 0.1095 2.252 0.0266 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.043 on 97 degrees of freedom
## Multiple R-squared: 0.8399, Adjusted R-squared: 0.8366
## F-statistic: 254.5 on 2 and 97 DF, p-value: < 2.2e-16
The model predicts y using x1 and x2 with the equation:
y = 3.0254 + 2.1280 * x1 + 0.2465 * x2
Intercept (3.0254): Predicted y when x1 and x2 are 0.
x1 (2.1280): A 1-unit increase in x1 (holding x2 fixed) increases y by 2.1280.
x2 (0.2465): A 1-unit increase in x2 (holding x1 fixed) increases y by 0.2465.
Both x1 and x2 are significant predictors (p-values < 0.05). The model explains 83.99% of the variation in y (R-squared = 0.8399). Residuals are balanced, and the model is highly significant (F-statistic = 254.5, p-value < 2.2e-16).
b) Correlation and Scatterplot of x1 and x2
cor(x1, x2)
## [1] 0.8822902
# Scatterplot of x1 vs x2
plot(x1, x2,
main = "Scatterplot of x1 vs x2",
xlab = "x1",
ylab = "x2",
pch = 16,
col = "blue")
# Add a regression line
abline(lm(x2 ~ x1), col = "red", lwd = 2)
I checked how x1 and x2 are related by calculating their correlation. The result was about 0.88, indicating a strong positive relationship. This makes sense because x2 was generated from x1 plus random noise. To visualize this, I made a scatterplot with x1 on the x-axis and x2 on the y-axis (blue points) and added a red least-squares line. The line slopes upward, confirming the strong linear association between x1 and x2. In short, both the plot and the correlation show that x2 depends on x1 in a clear way.
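A complementary check (a sketch; it assumes the car package is installed) is the variance inflation factor, which quantifies how much this collinearity inflates the coefficient standard errors:
# Sketch: variance inflation factors for the x1 + x2 model (needs the 'car' package)
library(car)
vif(lm_model)   # values well above 5-10 flag problematic collinearity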
c) Fitting a Least Squares Regression Model Using x1 and x2
# Fit the least squares regression model
lm_model <- lm(y ~ x1 + x2)
# Display the summary
summary(lm_model)
##
## Call:
## lm(formula = y ~ x1 + x2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.94359 -0.43645 0.00202 0.63692 2.63941
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.0254 0.1052 28.760 < 2e-16 ***
## x1 2.1280 0.2480 8.580 1.55e-13 ***
## x2 0.2465 0.1095 2.252 0.0266 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.043 on 97 degrees of freedom
## Multiple R-squared: 0.8399, Adjusted R-squared: 0.8366
## F-statistic: 254.5 on 2 and 97 DF, p-value: < 2.2e-16
I used a model to predict y using x1 and x2. The results show:
When x1 and x2 are zero, y is around 3.025.
Increasing x1 by 1 increases y by 2.128.
Increasing x2 by 1 increases y by 0.246.
Both x1 and x2 are important for predicting y, and the model explains
about 84% of the variation in y. The model works well and is
reliable.
d) Fitting a Least Squares Regression Model Using Only x1
# Fit the regression model using only x1
lm_model_reduced <- lm(y ~ x1)
# Display the summary
summary(lm_model_reduced)
##
## Call:
## lm(formula = y ~ x1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.07951 -0.48414 0.03561 0.71860 3.02889
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.0161 0.1073 28.11 <2e-16 ***
## x1 2.6208 0.1192 22.00 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.065 on 98 degrees of freedom
## Multiple R-squared: 0.8316, Adjusted R-squared: 0.8298
## F-statistic: 483.8 on 1 and 98 DF, p-value: < 2.2e-16
I used a model to predict y using only x1. The results show that when
x1 is zero, y is around 3.016. For every 1-unit increase in x1, y
increases by 2.6208. The model explains about 83% of the variation in y,
which means it fits the data well. The p-value for x1 is very small,
confirming it’s a strong predictor of y. Overall, the model works
effectively, and x1 alone is a good predictor of y.
e) Fitting a Least Squares Regression Model Using Only x2
# Fit the regression model using only x2
lm_model_x2 <- lm(y ~ x2)
# Display the summary
summary(lm_model_x2)
##
## Call:
## lm(formula = y ~ x2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4514 -0.9780 -0.1073 1.0144 3.3204
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.10793 0.13821 22.49 <2e-16 ***
## x2 1.07525 0.06799 15.81 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.377 on 98 degrees of freedom
## Multiple R-squared: 0.7185, Adjusted R-squared: 0.7156
## F-statistic: 250.1 on 1 and 98 DF, p-value: < 2.2e-16
I used a model to predict y using only x2. This is what I found:
When x2 is zero, y is around 3.108.
For every 1-unit increase in x2, y increases by 1.075.
The model explains about 72% of the variation in y, which is a good fit.
The p-value for x2 is very small, meaning x2 is a strong
predictor of y.
Overall, the model works well, and x2 alone is a good predictor of
y.
f)
The results do not contradict each other.
The apparent contradiction arises because multicollinearity between x1 and x2 inflates the standard errors of the coefficient estimates in the full model, which can make each predictor look less important than it is. Fitting the predictors one at a time removes that overlap and shows that both x1 and x2 are related to y.
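A small sketch makes that standard-error inflation visible by comparing x1's standard error in the two fits:
# Sketch: x1's standard error with and without x2 in the model;
# collinearity with x2 inflates it in the joint fit
summary(lm(y ~ x1 + x2))$coefficients["x1", "Std. Error"]
summary(lm(y ~ x1))$coefficients["x1", "Std. Error"]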
g)
# Add one additional (mismeasured) observation to x1, x2, and y
x1 <- c(x1, 0.1)
x2 <- c(x2, 0.8)
y <- c(y, 6)
# Re-fit the full model (y ~ x1 + x2)
lm_full <- lm(y ~ x1 + x2)
summary(lm_full)
##
## Call:
## lm(formula = y ~ x1 + x2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.95914 -0.44828 -0.01754 0.63117 2.61455
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.0513 0.1073 28.446 < 2e-16 ***
## x1 2.0923 0.2538 8.244 7.62e-13 ***
## x2 0.2643 0.1120 2.360 0.0202 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.069 on 98 degrees of freedom
## Multiple R-squared: 0.8319, Adjusted R-squared: 0.8285
## F-statistic: 242.6 on 2 and 98 DF, p-value: < 2.2e-16
# Re-fit the model using only x1 (y ~ x1)
lm_x1 <- lm(y ~ x1)
summary(lm_x1)
##
## Call:
## lm(formula = y ~ x1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.1065 -0.5046 0.0175 0.7306 3.0017
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.0430 0.1097 27.75 <2e-16 ***
## x1 2.6205 0.1224 21.41 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.094 on 99 degrees of freedom
## Multiple R-squared: 0.8224, Adjusted R-squared: 0.8206
## F-statistic: 458.4 on 1 and 99 DF, p-value: < 2.2e-16
# Re-fit the model using only x2 (y ~ x2)
lm_x2 <- lm(y ~ x2)
summary(lm_x2)
##
## Call:
## lm(formula = y ~ x2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4717 -1.0011 -0.1214 1.0702 3.2948
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.12748 0.13837 22.60 <2e-16 ***
## x2 1.07829 0.06836 15.77 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.385 on 99 degrees of freedom
## Multiple R-squared: 0.7154, Adjusted R-squared: 0.7125
## F-statistic: 248.8 on 1 and 99 DF, p-value: < 2.2e-16
I ran three models to predict y after adding the new observation:
Using x1 and x2 together:
When x1 and x2 are zero, y is around 3.051.
Increasing x1 by 1 increases y by about 2.09.
Increasing x2 by 1 increases y by about 0.26.
The model explains about 83.2% of the variation in y.
Using only x1:
When x1 is zero, y is around 3.043.
Increasing x1 by 1 increases y by about 2.62.
The model explains about 82.2% of the variation in y.
Using only x2:
When x2 is zero, y is around 3.127.
Increasing x2 by 1 increases y by about 1.08.
The model explains about 71.5% of the variation in y.
Both x1 and x2 remain strong predictors of y, but x1 alone performs almost as well as both predictors together.
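To gauge the effect of the added observation more directly, a hedged sketch of the usual checks (the new point is row 101 in each fit):
# Sketch: is observation 101 (the added point) an outlier or a high-leverage point?
hatvalues(lm_full)[101]    # leverage in the full model
rstudent(lm_full)[101]     # studentized residual in the full model
hatvalues(lm_x1)[101]      # leverage in the x1-only model
rstudent(lm_x2)[101]       # studentized residual in the x2-only model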