(a) Perform a simple linear regression of y onto x, without an intercept. Report the coefficient estimate β̂, the standard error of this coefficient estimate, and the t-statistic and p-value associated with the null hypothesis H0 : β = 0. Comment on these results. (You can perform regression without an intercept using the command lm(y~x+0).)

The coefficient estimate is 1.9939, with a standard error of 0.1065 and a t-statistic of 18.73. The p-value is <2e-16, so we can reject the null hypothesis H0 : β = 0. The estimate is also close to the true slope of 2 used to generate the data.

set.seed (1)
x <- rnorm (100)
y <- 2 * x + rnorm (100)
lm_model = lm(y~x+0) #Create the linear regression
summary(lm_model)
## 
## Call:
## lm(formula = y ~ x + 0)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9154 -0.6472 -0.1771  0.5056  2.3109 
## 
## Coefficients:
##   Estimate Std. Error t value Pr(>|t|)    
## x   1.9939     0.1065   18.73   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9586 on 99 degrees of freedom
## Multiple R-squared:  0.7798, Adjusted R-squared:  0.7776 
## F-statistic: 350.7 on 1 and 99 DF,  p-value: < 2.2e-16
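
The quantities quoted above can also be pulled out of the fitted model programmatically rather than read off the printed summary. The following is a minimal sketch, assuming the same simulated data as above; the names `fit_yx` and `coef_table` are illustrative and not part of the original solution.

```r
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)

fit_yx <- lm(y ~ x + 0)              # regression of y onto x, no intercept
coef_table <- coef(summary(fit_yx))  # matrix: Estimate, Std. Error, t value, Pr(>|t|)

beta_hat <- coef_table["x", "Estimate"]
se_beta  <- coef_table["x", "Std. Error"]
t_value  <- coef_table["x", "t value"]
p_value  <- coef_table["x", "Pr(>|t|)"]

# The t-statistic is the estimate divided by its standard error
all.equal(t_value, beta_hat / se_beta)

# 95% confidence interval for the slope
confint(fit_yx)
```

Given the estimate and standard error reported above, the interval is roughly 1.78 to 2.21, which covers the true slope of 2.
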
  (b) Now perform a simple linear regression of x onto y without an intercept, and report the coefficient estimate, its standard error, and the corresponding t-statistic and p-value associated with the null hypothesis H0 : β = 0. Comment on these results.

The coefficient estimate is 0.39111, with a standard error of 0.02089 and a t-statistic of 18.73. The p-value is <2e-16, so we can again reject the null hypothesis H0 : β = 0.

  (c) What is the relationship between the results obtained in (a) and (b)?

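The t-statistic and the p-value are identical in the two regressions, as are the R-squared and the F-statistic. Both fits describe the same underlying linear relationship, since y = 2x + e can be rearranged as x = 0.5(y - e). The two fitted lines are not the same line, however: the slope from (b), 0.39111, is not the reciprocal of the slope from (a) (1/1.9939 ≈ 0.5015), because regressing x onto y minimizes squared errors in x rather than in y.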

set.seed (1)
x <- rnorm (100)
y <- 2 * x + rnorm (100)
lm_model = lm(x~y+0) #Create the linear regression
summary(lm_model)
## 
## Call:
## lm(formula = x ~ y + 0)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.8699 -0.2368  0.1030  0.2858  0.8938 
## 
## Coefficients:
##   Estimate Std. Error t value Pr(>|t|)    
## y  0.39111    0.02089   18.73   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4246 on 99 degrees of freedom
## Multiple R-squared:  0.7798, Adjusted R-squared:  0.7776 
## F-statistic: 350.7 on 1 and 99 DF,  p-value: < 2.2e-16
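
A quick numerical check helps make the relationship between (a) and (b) concrete. The sketch below assumes the same simulated data and uses illustrative names (`fit_yx`, `fit_xy`, `b_yx`, `b_xy`) that are not part of the original solution. It compares the two slopes and their t-statistics. For regressions through the origin, the product of the two slopes equals the R-squared that both summaries report, so the slopes would be reciprocals only if the fit were perfect.

```r
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)

fit_yx <- lm(y ~ x + 0)   # part (a): y onto x
fit_xy <- lm(x ~ y + 0)   # part (b): x onto y

b_yx <- unname(coef(fit_yx))
b_xy <- unname(coef(fit_xy))

# The slopes are not reciprocals of each other ...
c(b_xy = b_xy, reciprocal_of_b_yx = 1 / b_yx)

# ... but their product matches the R-squared reported by both fits
all.equal(b_yx * b_xy, summary(fit_yx)$r.squared)

# The t-statistics agree
all.equal(coef(summary(fit_yx))["x", "t value"],
          coef(summary(fit_xy))["y", "t value"])
```
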
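  (d) For the regression of Y onto X without an intercept, the t-statistic for H0 : β = 0 takes the form β̂/SE(β̂), where β̂ is given by (3.38), and where

$$ SE(\hat\beta) = \sqrt{\frac{\sum_i (y_i - x_i\hat\beta)^2}{(n-1)\sum_j x_j^2}}. $$

(These formulas are slightly different from those given in Sections 3.1.1 and 3.1.2, since here we are performing regression without an intercept.) Show algebraically, and confirm numerically in R, that the t-statistic can be written as

$$ \frac{\sqrt{n-1}\,\sum_i x_i y_i}{\sqrt{\left(\sum_i x_i^2\right)\left(\sum_j y_j^2\right) - \left(\sum_j x_j y_j\right)^2}}. $$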

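The formulae for the coefficient estimate and its standard error are

$$ \hat\beta = \frac{\sum_i x_i y_i}{\sum_j x_j^2}; \qquad SE(\hat\beta) = \sqrt{\frac{\sum_i (y_i - x_i\hat\beta)^2}{(n-1)\sum_j x_j^2}} $$

Since the t-statistic is the ratio of the coefficient estimate to its standard error, it can be written as

$$ t = \frac{\hat\beta}{SE(\hat\beta)} = \frac{\sqrt{n-1}\,\sum_i x_i y_i}{\sqrt{\sum_j x_j^2 \, \sum_i (y_i - x_i\hat\beta)^2}} $$

Substituting the value of the coefficient estimate in the denominator, we get

$$ t = \frac{\sqrt{n-1}\,\sum_i x_i y_i}{\sqrt{\sum_j x_j^2 \, \sum_i \left(y_i - x_i \frac{\sum_k x_k y_k}{\sum_j x_j^2}\right)^2}} = \frac{\sqrt{n-1}\,\sum_i x_i y_i}{\sqrt{\sum_j x_j^2 \, \sum_i y_i^2 - \left(\sum_i x_i y_i\right)^2}}. $$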
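
The substitution step can be spelled out by expanding the residual sum of squares with the no-intercept estimate $\hat\beta = \sum_i x_i y_i / \sum_j x_j^2$. The following is a worked version of that step, using the same symbols as above:

$$
\sum_i (y_i - x_i\hat\beta)^2
= \sum_i y_i^2 - 2\hat\beta \sum_i x_i y_i + \hat\beta^2 \sum_j x_j^2
= \sum_i y_i^2 - \frac{\left(\sum_i x_i y_i\right)^2}{\sum_j x_j^2}
$$

Multiplying this by $\sum_j x_j^2$ gives $\sum_j x_j^2 \sum_i y_i^2 - \left(\sum_i x_i y_i\right)^2$, which is the quantity under the square root in the denominator of the final expression for t.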

Evaluating the derived formula for the t-statistic in R gives the same value (18.73) as the summaries in parts (a) and (b).

n <- length(x)
t <- (x %*% y)*sqrt(n - 1)/sqrt(sum(x^2) * sum(y^2) - (x %*% y)^2)
t
##          [,1]
## [1,] 18.72593
  (e) Using the results from (d), argue that the t-statistic for the regression of y onto x is the same as the t-statistic for the regression of x onto y.

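The formula from (d) is symmetric in x and y: swapping them leaves both the numerator, sqrt(n - 1) * sum(x * y), and the quantity under the square root in the denominator, sum(x^2) * sum(y^2) - sum(x * y)^2, unchanged. The t-statistic is therefore the same for either regression, as the calculation below confirms.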

n <- length(y)
t <- (y %*% x)*sqrt(n - 1)/sqrt(sum(y^2) * sum(x^2) - (y %*% x)^2)
t
##          [,1]
## [1,] 18.72593
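
The symmetry argument can also be checked by wrapping the formula from (d) in a small helper. This is a sketch under the same simulated data; `t_no_intercept` is a hypothetical helper name introduced here for illustration. It uses `sum(u * v)` rather than `%*%` so that the result is a plain number rather than a 1x1 matrix, and it avoids reusing the name of the base function `t()`.

```r
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)

# t-statistic for a no-intercept regression, written symmetrically in u and v
t_no_intercept <- function(u, v) {
  n <- length(u)
  sxy <- sum(u * v)
  sqrt(n - 1) * sxy / sqrt(sum(u^2) * sum(v^2) - sxy^2)
}

t_yx <- t_no_intercept(x, y)
t_xy <- t_no_intercept(y, x)

# Swapping the arguments leaves the value unchanged
all.equal(t_yx, t_xy)

# Both agree with the t value that lm() reports
all.equal(t_yx, coef(summary(lm(y ~ x + 0)))["x", "t value"])
all.equal(t_xy, coef(summary(lm(x ~ y + 0)))["y", "t value"])
```
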
  (f) In R, show that when regression is performed with an intercept, the t-statistic for H0 : β1 = 0 is the same for the regression of y onto x as it is for the regression of x onto y.

When the intercept is included, the t-statistic is the same for the regression of y onto x and x onto y (both are equal to 18.56 as shown below).

lm_model_with_intercept<-lm(y~x)
summary(lm_model_with_intercept)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8768 -0.6138 -0.1395  0.5394  2.3462 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.03769    0.09699  -0.389    0.698    
## x            1.99894    0.10773  18.556   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9628 on 98 degrees of freedom
## Multiple R-squared:  0.7784, Adjusted R-squared:  0.7762 
## F-statistic: 344.3 on 1 and 98 DF,  p-value: < 2.2e-16
lm_model_with_intercept<-lm(x~y)
summary(lm_model_with_intercept)
## 
## Call:
## lm(formula = x ~ y)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.90848 -0.28101  0.06274  0.24570  0.85736 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.03880    0.04266    0.91    0.365    
## y            0.38942    0.02099   18.56   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4249 on 98 degrees of freedom
## Multiple R-squared:  0.7784, Adjusted R-squared:  0.7762 
## F-statistic: 344.3 on 1 and 98 DF,  p-value: < 2.2e-16
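
Rather than comparing the two printed summaries by eye, the slope t-statistics can be extracted and compared directly. The sketch below assumes the same simulated data, and the variable names are illustrative. It also checks a standard identity for simple regression with an intercept, t = r * sqrt(n - 2) / sqrt(1 - r^2), where r is the sample correlation. Since cor(x, y) equals cor(y, x), this gives another way to see why the two values coincide.

```r
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)

t_yx <- coef(summary(lm(y ~ x)))["x", "t value"]  # slope t-statistic, y onto x
t_xy <- coef(summary(lm(x ~ y)))["y", "t value"]  # slope t-statistic, x onto y
all.equal(t_yx, t_xy)

# Same value from the sample correlation, which is symmetric in x and y
n <- length(x)
r <- cor(x, y)
t_from_r <- r * sqrt(n - 2) / sqrt(1 - r^2)
all.equal(t_yx, t_from_r)
```
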