1) Understanding Null Hypotheses and P-Values

Null Hypotheses

The p-values in Table 3.4 are linked to the following null hypotheses:

  1. TV Advertising:

Null Hypothesis: TV advertising has no effect on sales, holding radio and newspaper advertising fixed.

Interpretation: Changes in TV advertising spending do not impact sales once the other two media are accounted for.

  2. Radio Advertising:

Null Hypothesis: Radio advertising has no effect on sales, holding TV and newspaper advertising fixed.

Interpretation: Changes in radio advertising spending do not impact sales once the other two media are accounted for.

  3. Newspaper Advertising:

Null Hypothesis: Newspaper advertising has no effect on sales, holding TV and radio advertising fixed.

Interpretation: Changes in newspaper advertising spending do not impact sales once the other two media are accounted for.

Interpreting the P-Values

P-values help determine whether to reject the null hypotheses. A small p-value (conventionally below 0.05) means we reject the null hypothesis, indicating a statistically significant relationship. A large p-value (0.05 or above) means we fail to reject the null hypothesis, indicating no significant evidence of a relationship. A short R sketch of this check appears after the three conclusions below.

  1. TV Advertising:

The p-value is less than 0.0001, which is very small.

Conclusion: We reject the null hypothesis. TV advertising significantly affects sales.

Implication: Increasing the TV ad budget is likely to increase sales.

  2. Radio Advertising:

The p-value is less than 0.0001, which is very small.

Conclusion: We reject the null hypothesis. Radio advertising significantly affects sales.

Implication: Increasing the radio ad budget is likely to increase sales.

  3. Newspaper Advertising:

The p-value is 0.8599, which is large.

Conclusion: We fail to reject the null hypothesis. Newspaper advertising does not significantly affect sales once TV and radio are accounted for.

Implication: Increasing the newspaper ad budget is unlikely to impact sales.
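
As a concrete illustration of this check in R, here is a minimal sketch. It assumes the Advertising data behind Table 3.4 has been downloaded from the book's website and saved as Advertising.csv in the working directory; the file name and column names are assumptions, not shown in the original output.

# Minimal sketch (assumed file and column names): fit the Table 3.4 regression
# of sales on TV, radio, and newspaper, then compare each p-value to 0.05.
Advertising <- read.csv("Advertising.csv")
fit <- lm(sales ~ TV + radio + newspaper, data = Advertising)
p_values <- coef(summary(fit))[, "Pr(>|t|)"]  # one p-value per coefficient
p_values < 0.05                               # TRUE = reject that null hypothesis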

2) Differences Between KNN Classifier and KNN Regression Methods

KNN Classifier

How It Works:

  1. Goal: The KNN classifier is used to predict the category or class of a data point.

  2. Process:

For a new data point, the algorithm finds the K closest data points (neighbors) in the training data.

It then assigns the class that appears most often among these K neighbors.

Output: The output is a category or class label. For example, it can predict whether an email is “spam” or “not spam.”

Key Points: The output is a class label, chosen by majority vote among the K neighbors; a small K can overfit, while a large K can underfit.

KNN Regression

How It Works:

  1. Goal: The KNN regression method is used to predict a continuous value.

  2. Process:

For a new data point, the algorithm finds the K closest data points in the training data.

It then calculates the average value of these K neighbors.

Output: The output is a continuous value. For example, it can predict the price of a house.

Key Points: The output is a continuous value, computed as the average of the K neighbors; a small K gives noisy predictions, while a large K oversmooths.

Aspect           | KNN Classifier                                    | KNN Regression
-----------------|---------------------------------------------------|-----------------------------------------------
Goal             | Predicts a category (e.g., “spam” or “not spam”). | Predicts a number (e.g., house price).
Output           | A class label.                                    | A continuous value.
How It Decides   | Majority voting among K neighbors.                | Average of K neighbors.
Use Case         | Classification tasks.                             | Regression tasks.
Sensitivity to K | Small K can overfit; large K can underfit.        | Small K can be noisy; large K can oversmooth.



Example

KNN Classifier Example:

Task: Predict if a fruit is an apple, orange, or banana based on its features.

Output: The algorithm predicts the fruit type (e.g., “Apple”).

KNN Regression Example:

Task: Predict the price of a house based on its size and location.

Output: The algorithm predicts a price (e.g., $350,000).
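
A minimal R sketch of both variants follows. It assumes the class and FNN packages are installed, and the simulated data is purely illustrative.

# Illustrative sketch on simulated data: the same K = 5 neighbors, used once
# for classification (majority vote) and once for regression (average).
set.seed(1)
train_x <- matrix(rnorm(200), ncol = 2)            # 100 training points, 2 features
test_x  <- matrix(rnorm(20),  ncol = 2)            # 10 new points to predict
train_class <- factor(ifelse(train_x[, 1] > 0, "spam", "not spam"))
train_value <- 5 + 3 * train_x[, 1] + rnorm(100)   # a continuous response

# KNN classifier: returns a class label for each test point
class::knn(train = train_x, test = test_x, cl = train_class, k = 5)

# KNN regression: returns a continuous prediction for each test point
FNN::knn.reg(train = train_x, test = test_x, y = train_value, k = 5)$pred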

3) Interpreting Regression Coefficients and Making Predictions

a)
Coefficient for Level (X3):
College graduates (Level = 1) earn 35 more on average than high school graduates (Level = 0), assuming all other predictors stay the same.

Coefficient for GPA/Level Interaction (X5):
For every 1-unit increase in GPA, the advantage of being a college graduate decreases by 10.

Determine When College Graduates Earn More:
To find when college graduates earn more than high school graduates, we solve

35 − 10 × GPA > 0, which gives GPA < 3.5.

Conclusion:
College graduates earn more than high school graduates only if their GPA is less than 3.5; once GPA exceeds 3.5, high school graduates earn more on average.
Therefore, the correct statement is:
iii. For a fixed value of IQ and GPA, high school graduates earn more, on average, than college graduates, provided that the GPA is high enough.
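
A quick numeric check of this conclusion in R (a sketch; the Level coefficient of 35 and the interaction coefficient of −10 are the values implied by the 3.5 breakeven above):

# Salary advantage of a college graduate over a high school graduate, holding
# IQ and GPA fixed: beta3 + beta5 * GPA = 35 - 10 * GPA.
gpa <- seq(2.5, 4.0, by = 0.25)
advantage <- 35 - 10 * gpa
data.frame(gpa, advantage, college_earns_more = advantage > 0)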

3b)

  1. The size of the coefficient alone doesn’t tell us whether the interaction effect is significant.

We need to check:

The standard error of the coefficient, which determines the t-statistic (estimate divided by standard error) and the corresponding p-value; a small coefficient can still be statistically significant if its standard error is even smaller, as illustrated in the sketch below.
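
A minimal numeric sketch (the estimate, standard error, and degrees of freedom below are hypothetical, chosen purely to illustrate the point):

# Hypothetical numbers: a tiny coefficient paired with an even tinier standard
# error still yields a large t-statistic and a small two-sided p-value.
estimate  <- 0.01    # hypothetical interaction coefficient
std_error <- 0.001   # hypothetical standard error
df        <- 100     # hypothetical residual degrees of freedom
t_stat  <- estimate / std_error
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)
c(t_stat = t_stat, p_value = p_value)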

Lab:

library(ISLR) 
str(Auto)
## 'data.frame':    392 obs. of  9 variables:
##  $ mpg         : num  18 15 18 16 17 15 14 14 14 15 ...
##  $ cylinders   : num  8 8 8 8 8 8 8 8 8 8 ...
##  $ displacement: num  307 350 318 304 302 429 454 440 455 390 ...
##  $ horsepower  : num  130 165 150 150 140 198 220 215 225 190 ...
##  $ weight      : num  3504 3693 3436 3433 3449 ...
##  $ acceleration: num  12 11.5 11 12 10.5 10 9 8.5 10 8.5 ...
##  $ year        : num  70 70 70 70 70 70 70 70 70 70 ...
##  $ origin      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ name        : Factor w/ 304 levels "amc ambassador brougham",..: 49 36 231 14 161 141 54 223 241 2 ...
lm_model <- lm(mpg ~ horsepower, data = Auto)
summary(lm_model)
## 
## Call:
## lm(formula = mpg ~ horsepower, data = Auto)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.5710  -3.2592  -0.3435   2.7630  16.9240 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 39.935861   0.717499   55.66   <2e-16 ***
## horsepower  -0.157845   0.006446  -24.49   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.906 on 390 degrees of freedom
## Multiple R-squared:  0.6059, Adjusted R-squared:  0.6049 
## F-statistic: 599.7 on 1 and 390 DF,  p-value: < 2.2e-16

i. Is there a relationship between horsepower and mpg?

Yes. The p-value for horsepower is less than 2e-16, far below 0.05, so there is a statistically significant relationship between horsepower and mpg.

ii. How strong is the relationship between horsepower and mpg?

The Multiple R-squared is 0.6059, meaning that horsepower explains approximately 60.6% of the variance in mpg, a reasonably strong relationship.

iii. Is the relationship between horsepower and mpg positive or negative?

The coefficient for horsepower is -0.157845, which is negative: as horsepower increases, mpg decreases.

iv. Predicted mpg for a horsepower of 98

We can use the regression equation:

mpg_hat = 39.935861 − 0.157845 × horsepower

For horsepower = 98:

mpg_hat = 39.935861 − (0.157845 × 98) = 39.935861 − 15.46881 = 24.467

Thus, the predicted mpg for a car with 98 horsepower is approximately 24.47 mpg.

new_data <- data.frame(horsepower = 98)
predict(lm_model, new_data, interval = "confidence")
##        fit      lwr      upr
## 1 24.46708 23.97308 24.96108
predict(lm_model, new_data, interval = "prediction")
##        fit     lwr      upr
## 1 24.46708 14.8094 34.12476

The 95% confidence interval for the average mpg at horsepower = 98 is (23.97, 24.96), while the 95% prediction interval for an individual car is (14.81, 34.12). The prediction interval is wider because it also accounts for the irreducible error of a single observation.

b)

# Scatterplot of mpg vs horsepower
plot(Auto$horsepower, Auto$mpg, 
     xlab = "Horsepower", ylab = "Miles Per Gallon (mpg)", 
     main = "MPG vs Horsepower with Regression Line", 
     pch = 16, col = "blue")

# Add the regression line
abline(lm_model, col = "red", lwd = 2)

c)

# Generate diagnostic plots
par(mfrow = c(2, 2))  # Arrange plots in a 2x2 grid
plot(lm_model)

The residuals-vs-fitted panel shows a clear U-shaped pattern, indicating that the relationship between mpg and horsepower is non-linear and that a straight-line fit is not fully adequate.

a) Multiple Regression Model Using the Carseats Dataset
library(ISLR)
str(Carseats)
## 'data.frame':    400 obs. of  11 variables:
##  $ Sales      : num  9.5 11.22 10.06 7.4 4.15 ...
##  $ CompPrice  : num  138 111 113 117 141 124 115 136 132 132 ...
##  $ Income     : num  73 48 35 100 64 113 105 81 110 113 ...
##  $ Advertising: num  11 16 10 4 3 13 0 15 0 0 ...
##  $ Population : num  276 260 269 466 340 501 45 425 108 131 ...
##  $ Price      : num  120 83 80 97 128 72 108 120 124 124 ...
##  $ ShelveLoc  : Factor w/ 3 levels "Bad","Good","Medium": 1 2 3 3 1 1 3 2 3 3 ...
##  $ Age        : num  42 65 59 55 38 78 71 67 76 76 ...
##  $ Education  : num  17 10 12 14 13 16 15 10 10 17 ...
##  $ Urban      : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 1 2 2 1 1 ...
##  $ US         : Factor w/ 2 levels "No","Yes": 2 2 2 2 1 2 1 2 1 2 ...
# Fit multiple regression model
lm_model <- lm(Sales ~ Price + Urban + US, data = Carseats)
summary(lm_model)
## 
## Call:
## lm(formula = Sales ~ Price + Urban + US, data = Carseats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9206 -1.6220 -0.0564  1.5786  7.0581 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.043469   0.651012  20.036  < 2e-16 ***
## Price       -0.054459   0.005242 -10.389  < 2e-16 ***
## UrbanYes    -0.021916   0.271650  -0.081    0.936    
## USYes        1.200573   0.259042   4.635 4.86e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.472 on 396 degrees of freedom
## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2335 
## F-statistic: 41.52 on 3 and 396 DF,  p-value: < 2.2e-16

b) Interpretation of Coefficients

From the model output, the fitted equation is:

Sales_hat = 13.04 − 0.054 × Price − 0.022 × UrbanYes + 1.201 × USYes

Key findings:

  1. Price: for each one-unit increase in Price, Sales decrease by about 0.054 units, holding the other predictors fixed; this effect is highly significant (p < 2e-16).

  2. UrbanYes: stores in urban locations sell about 0.022 units less on average, but the effect is not statistically significant (p = 0.936).

  3. USYes: stores located in the US sell about 1.20 units more on average, a highly significant effect (p = 4.86e-06).

f) Comparing Model Fits

The reduced model, which drops the non-significant Urban term, is preferable: it is simpler and fits essentially as well as the full model.
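
The refit itself is not shown above; here is a minimal sketch of the reduced model assumed in what follows (its form is consistent with the confint() output below, which lists only the Intercept, Price, and USYes):

# Reduced model: Sales on Price and US only (Urban dropped)
lm_reduced <- lm(Sales ~ Price + US, data = Carseats)
summary(lm_reduced)$r.squared   # compare with the full model's R-squared of 0.2393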

g) 95% Confidence Intervals for Coefficients

confint(lm_reduced, level = 0.95)
##                   2.5 %      97.5 %
## (Intercept) 11.79032020 14.27126531
## Price       -0.06475984 -0.04419543
## USYes        0.69151957  1.70776632
None of these intervals contains zero, which is consistent with Price and US both being significant predictors.

h) Checking for Outliers and High Leverage Points
par(mfrow = c(2, 2))
plot(lm_reduced)
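
Beyond the diagnostic plots, here is a small numeric sketch using common rules of thumb for flagging points; the thresholds are conventions, not part of the original analysis.

# Flag candidate high-leverage points (hat value well above the average
# leverage (p + 1)/n) and candidate outliers (|studentized residual| > 3).
lev <- hatvalues(lm_reduced)
avg_lev <- length(coef(lm_reduced)) / nrow(Carseats)   # average leverage (p + 1)/n
which(lev > 2 * avg_lev)                # possible high-leverage observations
which(abs(rstudent(lm_reduced)) > 3)    # possible outliers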

14 a)

set.seed(1)

x1 <- rnorm(100)  
x2 <- 2 * x1 + rnorm(100)  
y <- 3 + 2 * x1 + 0.3 * x2 + rnorm(100)

# Fit the multiple linear regression model
lm_model <- lm(y ~ x1 + x2)
summary(lm_model)
## 
## Call:
## lm(formula = y ~ x1 + x2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.94359 -0.43645  0.00202  0.63692  2.63941 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.0254     0.1052  28.760  < 2e-16 ***
## x1            2.1280     0.2480   8.580 1.55e-13 ***
## x2            0.2465     0.1095   2.252   0.0266 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.043 on 97 degrees of freedom
## Multiple R-squared:  0.8399, Adjusted R-squared:  0.8366 
## F-statistic: 254.5 on 2 and 97 DF,  p-value: < 2.2e-16

The model predicts y using x1 and x2 with the equation:
y = 3.0254 + 2.1280 * x1 + 0.2465 * x2

Both x1 and x2 are statistically significant predictors (p-values below 0.05, although the evidence for x2 is much weaker than for x1). The model explains 83.99% of the variation in y (R-squared = 0.8399), the residuals are roughly symmetric around zero (median ≈ 0.002), and the overall fit is highly significant (F-statistic = 254.5, p-value < 2.2e-16).

b) Compute the Correlation Between x1 and x2 and Create a Scatterplot
cor(x1, x2)
## [1] 0.8822902
# Scatterplot of x1 vs x2
plot(x1, x2, 
     main = "Scatterplot of x1 vs x2",
     xlab = "x1", 
     ylab = "x2", 
     pch = 16, 
     col = "blue")

# Add a regression line
abline(lm(x2 ~ x1), col = "red", lwd = 2)

I checked how x1 and x2 are related by calculating their correlation, which is about 0.88, a strong positive relationship. This makes sense because x2 was generated from x1 plus random noise. The scatterplot of x1 against x2 (blue points), with a red best-fit line added, slopes clearly upward, confirming the strong linear association between the two predictors.

c) Fit the Regression Model Using Both x1 and x2

# Fit the least squares regression model
lm_model <- lm(y ~ x1 + x2)

# Display the summary
summary(lm_model)
## 
## Call:
## lm(formula = y ~ x1 + x2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.94359 -0.43645  0.00202  0.63692  2.63941 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.0254     0.1052  28.760  < 2e-16 ***
## x1            2.1280     0.2480   8.580 1.55e-13 ***
## x2            0.2465     0.1095   2.252   0.0266 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.043 on 97 degrees of freedom
## Multiple R-squared:  0.8399, Adjusted R-squared:  0.8366 
## F-statistic: 254.5 on 2 and 97 DF,  p-value: < 2.2e-16

I used a least squares model to predict y from x1 and x2. The results show:

  1. When x1 and x2 are zero, y is around 3.025.

  2. Increasing x1 by 1 increases y by 2.128.

  3. Increasing x2 by 1 increases y by 0.246.
    Both x1 and x2 are important for predicting y, and the model explains about 84% of the variation in y, so it fits the data well (the p-values behind this claim are extracted directly in the sketch below).
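
A one-line check of the significance claims above, using the fitted lm_model from this section:

# P-values for the intercept, x1, and x2 in the full model; both predictors
# fall below 0.05, so we reject H0: beta1 = 0 and H0: beta2 = 0, although the
# evidence for x2 is much weaker than for x1.
coef(summary(lm_model))[, "Pr(>|t|)"]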

d) Fitting a Least Squares Regression Model Using Only x1

# Fit the regression model using only x1
lm_model_reduced <- lm(y ~ x1)

# Display the summary
summary(lm_model_reduced)
## 
## Call:
## lm(formula = y ~ x1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3.07951 -0.48414  0.03561  0.71860  3.02889 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.0161     0.1073   28.11   <2e-16 ***
## x1            2.6208     0.1192   22.00   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.065 on 98 degrees of freedom
## Multiple R-squared:  0.8316, Adjusted R-squared:  0.8298 
## F-statistic: 483.8 on 1 and 98 DF,  p-value: < 2.2e-16

I used a model to predict y using only x1. The results show that when x1 is zero, y is around 3.016. For every 1-unit increase in x1, y increases by 2.6208. The model explains about 83% of the variation in y, which means it fits the data well. The p-value for x1 is very small, confirming it’s a strong predictor of y. Overall, the model works effectively, and x1 alone is a good predictor of y.

e) Fitting a Least Squares Regression Model Using Only x2

# Fit the regression model using only x2
lm_model_x2 <- lm(y ~ x2)

# Display the summary
summary(lm_model_x2)
## 
## Call:
## lm(formula = y ~ x2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4514 -0.9780 -0.1073  1.0144  3.3204 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.10793    0.13821   22.49   <2e-16 ***
## x2           1.07525    0.06799   15.81   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.377 on 98 degrees of freedom
## Multiple R-squared:  0.7185, Adjusted R-squared:  0.7156 
## F-statistic: 250.1 on 1 and 98 DF,  p-value: < 2.2e-16

I used a model to predict y using only x2. This is what I found:

  1. When x2 is zero, y is around 3.108.

  2. For every 1-unit increase in x2, y increases by 1.075.

  3. The model explains about 72% of the variation in y, which is a good fit.

  4. The p-value for x2 is very small, meaning x2 is a strong predictor of y.
    Overall, the model works well, and x2 alone is a good predictor of y.

f) Do the Results from the Previous Parts Contradict Each Other?

The results do not contradict each other.
The apparent contradiction arises because multicollinearity (x1 and x2 are highly correlated, r ≈ 0.88) inflates the standard errors of the coefficient estimates in the full model, so x2 appears only marginally significant there even though it is strongly significant on its own. Removing one predictor at a time resolves the issue, revealing that both x1 and x2 carry real information about y.
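
A quick way to quantify this collinearity (a sketch using base R only; the variance inflation factor is computed by hand here rather than with an add-on package):

# Variance inflation factor for x1 in the full model: 1 / (1 - R^2) from
# regressing x1 on x2. A VIF near 1 means no collinearity; larger values mean
# the coefficient's variance is inflated by roughly that factor.
r2_x1_on_x2 <- summary(lm(x1 ~ x2))$r.squared
1 / (1 - r2_x1_on_x2)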

g) Re-fitting the Models After Adding One Additional Observation

# Add the extra observation (x1 = 0.1, x2 = 0.8, y = 6)
x1 <- c(x1, 0.1)
x2 <- c(x2, 0.8)
y  <- c(y, 6)

# Re-fit the full model (y ~ x1 + x2)
lm_full <- lm(y ~ x1 + x2)
summary(lm_full)
## 
## Call:
## lm(formula = y ~ x1 + x2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.95914 -0.44828 -0.01754  0.63117  2.61455 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.0513     0.1073  28.446  < 2e-16 ***
## x1            2.0923     0.2538   8.244 7.62e-13 ***
## x2            0.2643     0.1120   2.360   0.0202 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.069 on 98 degrees of freedom
## Multiple R-squared:  0.8319, Adjusted R-squared:  0.8285 
## F-statistic: 242.6 on 2 and 98 DF,  p-value: < 2.2e-16
# Re-fit the model using only x1 (y ~ x1)
lm_x1 <- lm(y ~ x1)
summary(lm_x1)
## 
## Call:
## lm(formula = y ~ x1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.1065 -0.5046  0.0175  0.7306  3.0017 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.0430     0.1097   27.75   <2e-16 ***
## x1            2.6205     0.1224   21.41   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.094 on 99 degrees of freedom
## Multiple R-squared:  0.8224, Adjusted R-squared:  0.8206 
## F-statistic: 458.4 on 1 and 99 DF,  p-value: < 2.2e-16
# Re-fit the model using only x2 (y ~ x2)
lm_x2 <- lm(y ~ x2)
summary(lm_x2)
## 
## Call:
## lm(formula = y ~ x2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4717 -1.0011 -0.1214  1.0702  3.2948 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.12748    0.13837   22.60   <2e-16 ***
## x2           1.07829    0.06836   15.77   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.385 on 99 degrees of freedom
## Multiple R-squared:  0.7154, Adjusted R-squared:  0.7125 
## F-statistic: 248.8 on 1 and 99 DF,  p-value: < 2.2e-16

I re-fit all three models with the new observation included:

Using x1 and x2 together:

  • When x1 and x2 are zero, y is around 3.051.

  • Increasing x1 by 1 increases y by about 2.092.

  • Increasing x2 by 1 increases y by about 0.264.

  • The model explains about 83.19% of the variation in y.

Using only x1:

  • When x1 is zero, y is around 3.043.

  • Increasing x1 by 1 increases y by about 2.621.

  • The model explains about 82.24% of the variation in y.

Using only x2:

  • When x2 is zero, y is around 3.127.

  • Increasing x2 by 1 increases y by about 1.078.

  • The model explains about 71.54% of the variation in y.

Both x1 and x2 are strong predictors of y, but x1 alone is almost as good as using both together.
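
To judge whether the added observation is unusual in each re-fitted model, one could inspect its studentized residual and leverage. This is a sketch only: the 101st position (the appended point) is implied by the code above, and the comparison against average leverage is a common rule of thumb, not part of the original output.

# Studentized residual and leverage of the added (101st) observation in the
# full model; compare the leverage with the average (p + 1)/n, and treat
# |studentized residual| > 3 as a rough outlier criterion.
rstudent(lm_full)[101]
hatvalues(lm_full)[101]
length(coef(lm_full)) / length(y)   # average leverage (p + 1)/n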