This work is part of my effort to become a well-versed data analyst. At this point, and for the immediate future, I will undoubtedly be a novice at using R and solving the problem sets from this book. Hence, my solutions will at times reflect my limited abilities. But with more practice, the quality and depth of my work will improve (that is the whole point!). I welcome you to comment on and critique my work to help me improve.
In this problem we will investigate the t-statistic for the null hypothesis H0 : β = 0 in simple linear regression without an intercept. To begin, we generate a predictor x and a response y as follows.
set.seed(1)
x=rnorm(100)
y=2*x+rnorm(100)
Perform a simple linear regression of y onto x, without an intercept. Report the coefficient estimate β̂, the standard error of this coefficient estimate, and the t-statistic and p-value associated with the null hypothesis H0 : β = 0. Comment on these results.
lm.fit1 = lm(y~x+0)
summary(lm.fit1)
##
## Call:
## lm(formula = y ~ x + 0)
##
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9154 -0.6472 -0.1771  0.5056  2.3109
##
## Coefficients:
##   Estimate Std. Error t value Pr(>|t|)    
## x   1.9939     0.1065   18.73   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9586 on 99 degrees of freedom
## Multiple R-squared: 0.7798, Adjusted R-squared: 0.7776
## F-statistic: 350.7 on 1 and 99 DF, p-value: < 2.2e-16
The coefficient estimate is β̂ = 1.9939 with standard error 0.1065; the t-statistic is 18.73 with a p-value below 2e-16, so we reject H0 : β = 0. The predictor x is highly significant, and the estimate is close to the true coefficient of 2 used to generate y.
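As a quick sanity check (my own addition, not part of the exercise), the estimate and its standard error can be reproduced from the closed-form expressions for regression through the origin, β̂ = Σxᵢyᵢ / Σxᵢ² and SE(β̂) = √(RSS / ((n − 1) Σxᵢ²)):
beta.hat = sum(x*y)/sum(x^2)                   # closed-form no-intercept estimate
rss = sum((y - beta.hat*x)^2)                  # residual sum of squares
se.hat = sqrt(rss/((length(x) - 1)*sum(x^2)))  # standard error of the estimate
c(beta.hat, se.hat, beta.hat/se.hat)           # should match 1.9939, 0.1065, 18.73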
Now perform a simple linear regression of x onto y without an intercept, and report the coefficient estimate, its standard error, and the corresponding t-statistic and p-value associated with the null hypothesis H0 : β = 0. Comment on these results.
lm.fit2 = lm(x~y+0)
summary(lm.fit2)
##
## Call:
## lm(formula = x ~ y + 0)
##
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.8699 -0.2368  0.1030  0.2858  0.8938
##
## Coefficients:
##   Estimate Std. Error t value Pr(>|t|)    
## y  0.39111    0.02089   18.73   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4246 on 99 degrees of freedom
## Multiple R-squared: 0.7798, Adjusted R-squared: 0.7776
## F-statistic: 350.7 on 1 and 99 DF, p-value: < 2.2e-16
The conclusion mirrors the previous model: β̂ = 0.39111 with standard error 0.02089, a t-statistic of 18.73, and a p-value below 2e-16, so the predictor y is highly significant. Notably, the t-statistic is identical to the one in (a).
What is the relationship between the results obtained in (a) and (b)?
par(mfrow=c(1,2))
plot(y~x); abline(lm.fit1)
plot(x~y); abline(lm.fit2)
The two fitted lines describe the same underlying relationship: since y = 2x + ε, we can just as well write x = (y − ε)/2. Both regressions therefore produce exactly the same t-statistic (18.73), p-value, and R², even though the coefficient estimates (1.9939 and 0.39111) are not reciprocals of one another.
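A sketch of why the t-statistics agree exactly (my own addition, anticipating a later part of this exercise): for regression without an intercept, the t-statistic can be written as t = √(n − 1) Σxᵢyᵢ / √(Σxᵢ² Σyᵢ² − (Σxᵢyᵢ)²), an expression that is symmetric in x and y. We can verify this numerically, and also confirm that the slopes themselves are not reciprocals:
n = length(x)
sqrt(n - 1)*sum(x*y)/sqrt(sum(x^2)*sum(y^2) - sum(x*y)^2)  # 18.73, the t value in both fits
sum(x*y)/sum(y^2)   # slope of x ~ y + 0: 0.39111
sum(x^2)/sum(x*y)   # reciprocal of the y ~ x + 0 slope: about 0.5015, not 0.39111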
In R, show that when regression is performed with an intercept, the t-statistic for H0 : β1 = 0 is the same for the regression of y onto x as it is for the regression of x onto y.
lm.fit1 = lm(y~x)
summary(lm.fit1)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8768 -0.6138 -0.1395  0.5394  2.3462
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.03769    0.09699  -0.389    0.698    
## x            1.99894    0.10773  18.556   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9628 on 98 degrees of freedom
## Multiple R-squared: 0.7784, Adjusted R-squared: 0.7762
## F-statistic: 344.3 on 1 and 98 DF, p-value: < 2.2e-16
lm.fit2 = lm(x~y)
summary(lm.fit2)
##
## Call:
## lm(formula = x ~ y)
##
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.90848 -0.28101  0.06274  0.24570  0.85736
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.03880    0.04266    0.91    0.365    
## y            0.38942    0.02099   18.56   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4249 on 98 degrees of freedom
## Multiple R-squared: 0.7784, Adjusted R-squared: 0.7762
## F-statistic: 344.3 on 1 and 98 DF, p-value: < 2.2e-16
The summary tables show that the t-statistics for the slope are identical in the two regressions: the values 18.556 and 18.56 differ only because of display rounding.
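To confirm the match is exact rather than approximate, we can pull the unrounded t-values out of the fitted models (a small check of my own):
t.yx = summary(lm.fit1)$coefficients["x", "t value"]
t.xy = summary(lm.fit2)$coefficients["y", "t value"]
c(t.yx, t.xy)          # identical up to machine precision
all.equal(t.yx, t.xy)  # TRUE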