Exercise 2

Carefully explain the differences between the KNN classifier and KNN regression methods.

The KNN classifier and KNN regression are closely related: both find the K training observations nearest to a query point, and they differ mainly in how they combine those neighbors. The classifier predicts a qualitative response, usually the class held by the majority of the neighbors, while regression predicts a quantitative response by averaging the neighbors’ values. For classification, K is often chosen to be odd so that a two-class vote cannot end in a tie; regression has no such restriction. Both methods can weight the neighbors so that closer ones count more. The choice of weighting function can matter much more in regression, because the weights enter the predicted value directly, so it should be picked with care.
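The contrast can be illustrated with a minimal base-R sketch. The toy data and the helper names `nearest`, `knn_classify` and `knn_regress` are made up for illustration and are not taken from any package; the sketch uses unweighted neighbors and a one-dimensional predictor.

```r
# Toy 1-D training data (invented for this illustration).
train_x     <- c(1.0, 1.5, 2.0, 6.0, 6.5, 7.0)
train_class <- c("a", "a", "a", "b", "b", "b")  # labels for classification
train_y     <- c(1.1, 1.4, 2.1, 5.9, 6.6, 7.2)  # responses for regression

# Indices of the k training points closest to the query point x0.
nearest <- function(x0, k) order(abs(train_x - x0))[1:k]

# Classification: majority vote among the k nearest neighbors.
knn_classify <- function(x0, k) {
  votes <- table(train_class[nearest(x0, k)])
  names(votes)[which.max(votes)]
}

# Regression: unweighted average of the k nearest neighbors' responses.
knn_regress <- function(x0, k) mean(train_y[nearest(x0, k)])

print(knn_classify(1.8, k = 3))
print(knn_regress(1.8, k = 3))
```

Both functions share the same neighbor search; only the final step (vote versus mean) changes, which is the difference described above.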

[Figure: KNN regression example with K = 1, 2 and 3.]

[Figure: KNN classification example.]

Exercise 7

It is claimed in the text that in the case of simple linear regression of Y onto X, the $R^2$ statistic (3.17) is equal to the square of the correlation between X and Y (3.18). Prove that this is the case. For simplicity, you may assume that $\bar{x} = \bar{y} = 0$.
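One way to sketch the proof, assuming the usual least-squares estimates and the definitions $R^2 = 1 - \mathrm{RSS}/\mathrm{TSS}$ and $\mathrm{Cor}(X, Y) = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}$ (the equation numbers are the book's; the derivation below is not copied from it). With $\bar{x} = \bar{y} = 0$ the intercept estimate is zero, so the fitted values are $\hat{y}_i = \hat{\beta}_1 x_i$:

```latex
\begin{align*}
\hat{\beta}_1 &= \frac{\sum_i x_i y_i}{\sum_i x_i^2},
\qquad
\mathrm{TSS} = \sum_i y_i^2, \\
\mathrm{RSS} &= \sum_i \left(y_i - \hat{\beta}_1 x_i\right)^2
  = \sum_i y_i^2 - 2\hat{\beta}_1 \sum_i x_i y_i + \hat{\beta}_1^2 \sum_i x_i^2
  = \sum_i y_i^2 - \frac{\left(\sum_i x_i y_i\right)^2}{\sum_i x_i^2}, \\
R^2 &= \frac{\mathrm{TSS} - \mathrm{RSS}}{\mathrm{TSS}}
  = \frac{\left(\sum_i x_i y_i\right)^2}{\sum_i x_i^2 \, \sum_i y_i^2}
  = \left(\frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2 \, \sum_i y_i^2}}\right)^2
  = \mathrm{Cor}(X, Y)^2 .
\end{align*}
```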

Exercise 12

This problem involves simple linear regression without an intercept.

12.a)

Recall that the coefficient estimate $\hat{\beta}$ for the linear regression of Y onto X without an intercept is given by (3.38). Under what circumstance is the coefficient estimate for the regression of X onto Y the same as the coefficient estimate for the regression of Y onto X?
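A sketch of the answer, assuming (3.38) is the standard no-intercept least-squares estimate. The subscripts $Y \sim X$ and $X \sim Y$ are notation introduced here to tell the two regressions apart:

```latex
\begin{align*}
\hat{\beta}_{Y \sim X} &= \frac{\sum_i x_i y_i}{\sum_i x_i^2},
\qquad
\hat{\beta}_{X \sim Y} = \frac{\sum_i x_i y_i}{\sum_i y_i^2}, \\
\hat{\beta}_{Y \sim X} = \hat{\beta}_{X \sim Y}
&\iff
\sum_i x_i y_i \left(\sum_i y_i^2 - \sum_i x_i^2\right) = 0 .
\end{align*}
```

So the two estimates coincide when $\sum_i x_i^2 = \sum_i y_i^2$, or in the degenerate case $\sum_i x_i y_i = 0$, where both are zero.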

12.b)

Generate an example in R with n = 100 observations in which the coefficient estimate for the regression of X onto Y is different from the coefficient estimate for the regression of Y onto X.

> x = runif(100, min=3, max=5)
> y = runif(100, min=5, max=7)
> lm.fity = lm(y ~ x + 0)
> lm.fitx = lm(x ~ y + 0)
> summary(lm.fity)
Call:
lm(formula = y ~ x + 0)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.12033 -0.58598  0.06645  0.79872  1.93644 

Coefficients:
  Estimate Std. Error t value Pr(>|t|)    
x  1.47010    0.02344   62.72   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9542 on 99 degrees of freedom
Multiple R-squared:  0.9754,    Adjusted R-squared:  0.9752 
F-statistic:  3933 on 1 and 99 DF,  p-value: < 2.2e-16
> summary(lm.fitx)
Call:
lm(formula = x ~ y + 0)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.20111 -0.44104  0.06561  0.50158  1.52925 

Coefficients:
  Estimate Std. Error t value Pr(>|t|)    
y  0.66353    0.01058   62.72   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.641 on 99 degrees of freedom
Multiple R-squared:  0.9754,    Adjusted R-squared:  0.9752 
F-statistic:  3933 on 1 and 99 DF,  p-value: < 2.2e-16

12.c)

Generate an example in R with n = 100 observations in which the coefficient estimate for the regression of X onto Y is the same as the coefficient estimate for the regression of Y onto X.

> set.seed(1)
> x = runif(100, min=3, max=3.1)
> y = runif(100, min=3, max=3.1)
> lm.fity = lm(y ~ x + 0)
> lm.fitx = lm(x ~ y + 0)
> summary(lm.fity)
Call:
lm(formula = y ~ x + 0)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.088602 -0.029414  0.005026  0.027561  0.077283 

Coefficients:
  Estimate Std. Error t value Pr(>|t|)    
x 0.999913   0.001239     807   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.03782 on 99 degrees of freedom
Multiple R-squared:  0.9998,    Adjusted R-squared:  0.9998 
F-statistic: 6.512e+05 on 1 and 99 DF,  p-value: < 2.2e-16
> summary(lm.fitx)
Call:
lm(formula = x ~ y + 0)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.076819 -0.027097 -0.004566  0.029877  0.089068 

Coefficients:
  Estimate Std. Error t value Pr(>|t|)    
y 0.999935   0.001239     807   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.03782 on 99 degrees of freedom
Multiple R-squared:  0.9998,    Adjusted R-squared:  0.9998 
F-statistic: 6.512e+05 on 1 and 99 DF,  p-value: < 2.2e-16
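
Note that the two estimates above (0.999913 and 0.999935) are only approximately equal, because the two samples have nearly, but not exactly, the same sum of squares. A minimal sketch of one way to get agreement up to floating-point rounding is to make y a permutation of x, so that the sums of squares match; the variable names `beta_yx` and `beta_xy` are chosen here for illustration.

```r
set.seed(1)
x <- rnorm(100)
y <- sample(x)  # a permutation of x, so sum(y^2) equals sum(x^2)

beta_yx <- coef(lm(y ~ x + 0))  # regression of Y onto X
beta_xy <- coef(lm(x ~ y + 0))  # regression of X onto Y

# Compare the two slopes, ignoring their names ("x" vs "y").
print(all.equal(unname(beta_yx), unname(beta_xy)))
```

Other choices with equal sums of squares, such as y = x or y = -x, would work as well; the permutation simply gives a less trivial example.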