In collaboration with Sam Nurmi and Bryce O’Connor

Problem 1

set.seed(1)
x<-rnorm(100)
y<-2*x+rnorm(100)

Part A

Using x and y, we fit a simple linear regression of y onto x without an intercept and test the null hypothesis H0: β = 0.

regressor <- lm(y~x+0) 
summary(regressor)
## 
## Call:
## lm(formula = y ~ x + 0)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9154 -0.6472 -0.1771  0.5056  2.3109 
## 
## Coefficients:
##   Estimate Std. Error t value Pr(>|t|)    
## x   1.9939     0.1065   18.73   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9586 on 99 degrees of freedom
## Multiple R-squared:  0.7798, Adjusted R-squared:  0.7776 
## F-statistic: 350.7 on 1 and 99 DF,  p-value: < 2.2e-16

The coefficient estimate is 1.9939, the standard error is 0.1065, the t-statistic is 18.73, and the p-value is < 2.2e-16. With a p-value this small, we reject the null hypothesis that the coefficient is zero.
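
As a check on the summary above, the t-statistic for a regression without an intercept can be recomputed by hand. This is a minimal sketch that assumes the standard formulas β(hat) = Σ x_i y_i / Σ x_i^2 and SE = sqrt(Σ (y_i - x_i β(hat))^2 / ((n - 1) Σ x_i^2)). The names beta_hat, se_hat, t_manual, and t_lm are illustrative and not part of the original analysis.

```r
# Regenerate the Problem 1 data so the block runs on its own.
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
n <- length(x)

# Slope of the no-intercept regression of y onto x.
beta_hat <- sum(x * y) / sum(x^2)

# Standard error of the slope, using n - 1 residual degrees of freedom.
se_hat <- sqrt(sum((y - x * beta_hat)^2) / ((n - 1) * sum(x^2)))

# t-statistic by hand, next to the value reported by lm().
t_manual <- beta_hat / se_hat
t_lm <- coef(summary(lm(y ~ x + 0)))["x", "t value"]
all.equal(t_manual, t_lm)
```

If the formulas are right, the two values should agree and round to the 18.73 shown in the summary.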

Part B

Using the same method as in Part A, we now regress x onto y without an intercept.

regressant <- lm(x~y+0)
summary(regressant)
## 
## Call:
## lm(formula = x ~ y + 0)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.8699 -0.2368  0.1030  0.2858  0.8938 
## 
## Coefficients:
##   Estimate Std. Error t value Pr(>|t|)    
## y  0.39111    0.02089   18.73   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4246 on 99 degrees of freedom
## Multiple R-squared:  0.7798, Adjusted R-squared:  0.7776 
## F-statistic: 350.7 on 1 and 99 DF,  p-value: < 2.2e-16

The coefficient estimate is 0.39111, the standard error is 0.02089, the t-statistic is 18.73, and the p-value is < 2.2e-16. With this p-value we reject the null hypothesis that the coefficient is zero.

Part C

Comparing the two fits in Parts A and B, the t-statistic and p-value are identical, while the coefficient estimate and standard error are larger for the regression of y onto x.
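
One way to see why the t-statistics match is to write the statistic in a form that does not change when x and y are swapped. The sketch below assumes the algebraic identity t = sqrt(n - 1) Σ x_i y_i / sqrt(Σ x_i^2 Σ y_i^2 - (Σ x_i y_i)^2) for the no-intercept case. The names t_sym, t_yx, t_xy, slope_product, and r2 are illustrative.

```r
# Regenerate the Problem 1 data so the block runs on its own.
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
n <- length(x)

# t-statistic written so that swapping x and y leaves it unchanged.
t_sym <- sqrt(n - 1) * sum(x * y) /
  sqrt(sum(x^2) * sum(y^2) - sum(x * y)^2)

# t-statistics reported by the two no-intercept fits.
t_yx <- coef(summary(lm(y ~ x + 0)))["x", "t value"]
t_xy <- coef(summary(lm(x ~ y + 0)))["y", "t value"]
c(symmetric = t_sym, y_onto_x = t_yx, x_onto_y = t_xy)

# The two slopes differ, but their product equals the shared R-squared.
slope_product <- unname(coef(lm(y ~ x + 0)) * coef(lm(x ~ y + 0)))
r2 <- summary(lm(y ~ x + 0))$r.squared
all.equal(slope_product, r2)
```

The slope and standard error depend on which variable is the response, but their ratio does not, which is consistent with the identical t value and R-squared in the two summaries.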

Part D

nulley <- lm(y~x)
summary(nulley)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8768 -0.6138 -0.1395  0.5394  2.3462 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.03769    0.09699  -0.389    0.698    
## x            1.99894    0.10773  18.556   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9628 on 98 degrees of freedom
## Multiple R-squared:  0.7784, Adjusted R-squared:  0.7762 
## F-statistic: 344.3 on 1 and 98 DF,  p-value: < 2.2e-16
alt_j <- lm(x~y)
summary(alt_j)
## 
## Call:
## lm(formula = x ~ y)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.90848 -0.28101  0.06274  0.24570  0.85736 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.03880    0.04266    0.91    0.365    
## y            0.38942    0.02099   18.56   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4249 on 98 degrees of freedom
## Multiple R-squared:  0.7784, Adjusted R-squared:  0.7762 
## F-statistic: 344.3 on 1 and 98 DF,  p-value: < 2.2e-16
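
Both fits with an intercept report the same t-statistic for the slope (18.556 for y onto x and 18.56 for x onto y). A minimal sketch of why, assuming the identity t = r sqrt(n - 2) / sqrt(1 - r^2), where r is the sample correlation and therefore symmetric in x and y. The names r, t_from_r, t_yx, and t_xy are illustrative.

```r
# Regenerate the Problem 1 data so the block runs on its own.
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
n <- length(x)

# The sample correlation does not depend on which variable is the response.
r <- cor(x, y)
t_from_r <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Slope t-statistics from the two fits that include an intercept.
t_yx <- coef(summary(lm(y ~ x)))["x", "t value"]
t_xy <- coef(summary(lm(x ~ y)))["y", "t value"]
c(from_r = t_from_r, y_onto_x = t_yx, x_onto_y = t_xy)
```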

Problem 2

set.seed(1)

Part A

x<-rnorm(100)

Part B

eps<- rnorm(100, mean = 0, sqrt(0.25))

Part C

y <- (-1 + 0.5*x + eps)
length(y)
## [1] 100

The vector y has length 100. In this model, β0 = -1 and β1 = 0.5.

Part D

There appears to be a moderate positive linear relationship between x and y.
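
A scatterplot along the following lines would support this observation. This is a minimal sketch that assumes the x, eps, and y generated in Parts A to C. The name r is illustrative, and the sample correlation is added only as a numeric summary of the plot.

```r
# Regenerate the Problem 2 data so the block runs on its own.
set.seed(1)
x <- rnorm(100)
eps <- rnorm(100, mean = 0, sd = sqrt(0.25))
y <- -1 + 0.5 * x + eps

# Scatterplot of y against x.
plot(x, y)

# Sample correlation as a rough measure of the strength of the linear trend.
r <- cor(x, y)
r
```

For a simple regression with an intercept, r^2 equals the multiple R-squared, so r should be about 0.68 given the R-squared of 0.4674 reported in Part E.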

Part E

ls <- lm(y~x)
summary (ls)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.93842 -0.30688 -0.06975  0.26970  1.17309 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.01885    0.04849 -21.010  < 2e-16 ***
## x            0.49947    0.05386   9.273 4.58e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4814 on 98 degrees of freedom
## Multiple R-squared:  0.4674, Adjusted R-squared:  0.4619 
## F-statistic: 85.99 on 1 and 98 DF,  p-value: 4.583e-15

β0(hat) = -1.01885 and β1(hat) = 0.49947. The estimates are very close to the true values β0 = -1 and β1 = 0.5.
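
The same estimates can be reproduced from the closed-form least squares formulas. This sketch assumes β1(hat) = cov(x, y) / var(x) and β0(hat) = mean(y) - β1(hat) mean(x), then expresses each estimation error in units of its standard error. The names b0, b1, fit, and err_in_se are illustrative.

```r
# Regenerate the Problem 2 data so the block runs on its own.
set.seed(1)
x <- rnorm(100)
eps <- rnorm(100, mean = 0, sd = sqrt(0.25))
y <- -1 + 0.5 * x + eps

# Closed-form least squares estimates.
b1 <- cov(x, y) / var(x)
b0 <- mean(y) - b1 * mean(x)

# They should match the coefficients returned by lm().
fit <- lm(y ~ x)
all.equal(c(b0, b1), unname(coef(fit)))

# Distance of each estimate from the true value, in standard errors.
err_in_se <- (coef(fit) - c(-1, 0.5)) / coef(summary(fit))[, "Std. Error"]
err_in_se
```

Both errors should be well under one standard error in absolute value, which is one way to make "very close" concrete.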

Part F

plot(x, y)
abline(ls, col= "blue")
abline(-1, 0.5, col = "red")
legend("topleft", c("Least squares", "Population regression"), col = c("blue", "red"), lty= c(1,1))

Part G

set.seed(1)
x<-rnorm(100)
eps2<- rnorm(100, mean = 0, sqrt(0.15))
y2 <- (-1 + 0.5*x + eps2)
length(y2)
## [1] 100
plot(x, y2)
ls2 <- lm(y2~x)
summary (ls2)
## 
## Call:
## lm(formula = y2 ~ x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.72690 -0.23771 -0.05403  0.20891  0.90867 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.01460    0.03756  -27.01   <2e-16 ***
## x            0.49959    0.04172   11.97   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3729 on 98 degrees of freedom
## Multiple R-squared:  0.594,  Adjusted R-squared:  0.5899 
## F-statistic: 143.4 on 1 and 98 DF,  p-value: < 2.2e-16
plot(x, y2)
abline(ls2, col= "blue")
abline(-1, 0.5, col = "red")
legend("topleft", c("Least squares", "Population regression"), col = c("blue", "red"), lty= c(1,1))

The points lie closer to both the least squares line and the population regression line as the noise decreases.

Part H

set.seed(1)
x<-rnorm(100)
eps3<- rnorm(100, mean = 0, sqrt(5))
y3 <- (-1 + 0.5*x + eps3)
length(y3)
## [1] 100
plot(x, y3)
ls3 <- lm(y3~x)
summary (ls3)
## 
## Call:
## lm(formula = y3 ~ x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.1967 -1.3724 -0.3119  1.2061  5.2462 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -1.0843     0.2169  -5.000 2.52e-06 ***
## x             0.4976     0.2409   2.066   0.0415 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.153 on 98 degrees of freedom
## Multiple R-squared:  0.04173,    Adjusted R-squared:  0.03195 
## F-statistic: 4.268 on 1 and 98 DF,  p-value: 0.04148
plot(x, y3)
abline(ls3, col= "blue")
abline(-1, 0.5, col = "red")
legend("topleft", c("Least squares", "Population regression"), col = c("blue", "red"), lty= c(1,1))

As the noise increases, the data become more scattered and deviate further from both lines.
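
The effect of the noise level can also be summarized numerically. The sketch below refits the model for the three error variances used in Parts B, G, and H (0.25, 0.15, and 5) and collects the residual standard error and R-squared from each fit. The helper fit_noise and the names noise_vars, fits, and noise_summary are hypothetical, introduced only for this illustration.

```r
# Hypothetical helper: simulate the Problem 2 data for a given error
# variance (same seed as in the write-up) and fit y onto x.
fit_noise <- function(noise_var) {
  set.seed(1)
  x <- rnorm(100)
  eps <- rnorm(100, mean = 0, sd = sqrt(noise_var))
  y <- -1 + 0.5 * x + eps
  lm(y ~ x)
}

noise_vars <- c(0.15, 0.25, 5)
fits <- lapply(noise_vars, fit_noise)

# Residual standard error tracks the true error sd; R-squared falls with noise.
noise_summary <- data.frame(
  noise_var = noise_vars,
  true_sd   = sqrt(noise_vars),
  rse       = sapply(fits, sigma),
  r_squared = sapply(fits, function(f) summary(f)$r.squared)
)
noise_summary
```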

Part I

original<- confint(ls)
reduce<- confint(ls2)
increase<- confint(ls3)
original
##                  2.5 %     97.5 %
## (Intercept) -1.1150804 -0.9226122
## x            0.3925794  0.6063602
reduce
##                  2.5 %     97.5 %
## (Intercept) -1.0891409 -0.9400557
## x            0.4167924  0.5823863
increase
##                   2.5 %     97.5 %
## (Intercept) -1.51465507 -0.6539114
## x            0.01960053  0.9756573

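As the noise decreases, the confidence intervals become narrower. As the noise increases, they become wider. The width of the confidence interval therefore grows with the noise: the slope interval has a width of about 0.17 with reduced noise, 0.21 with the original noise, and 0.96 with increased noise.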
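
The widths of the slope intervals can be computed directly from confint(). This sketch assumes the same seed and the three error variances used above. The helper slope_ci_width and the names noise_vars and widths are hypothetical, introduced only for this illustration.

```r
# Hypothetical helper: width of the 95% confidence interval for the slope
# when the errors have variance noise_var (same seed as in the write-up).
slope_ci_width <- function(noise_var) {
  set.seed(1)
  x <- rnorm(100)
  y <- -1 + 0.5 * x + rnorm(100, mean = 0, sd = sqrt(noise_var))
  ci <- confint(lm(y ~ x))
  unname(ci["x", 2] - ci["x", 1])
}

noise_vars <- c(0.15, 0.25, 5)
widths <- sapply(noise_vars, slope_ci_width)

# Interval width for each noise level.
data.frame(noise_var = noise_vars, slope_ci_width = widths)

# With a fixed seed the errors are a rescaling of the same draws, so the
# width is proportional to the error standard deviation.
widths / sqrt(noise_vars)
```

Because the seed is reset each time, the three error vectors differ only by a scale factor, so the last line should return the same ratio for all three noise levels.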