In collaboration with Sam Nurmi and Bryce O’Connor
set.seed(1)
x<-rnorm(100)
y<-2*x+rnorm(100)
Using x and y, we fit a simple linear regression of y onto x without an intercept and test the null hypothesis that the coefficient is zero.
regressor <- lm(y~x+0)
summary(regressor)
##
## Call:
## lm(formula = y ~ x + 0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9154 -0.6472 -0.1771 0.5056 2.3109
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## x 1.9939 0.1065 18.73 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9586 on 99 degrees of freedom
## Multiple R-squared: 0.7798, Adjusted R-squared: 0.7776
## F-statistic: 350.7 on 1 and 99 DF, p-value: < 2.2e-16
The coefficient estimate is 1.9939, the standard error is 0.1065, the t-statistic is 18.73, and the p-value is < 2.2e-16. With a p-value this small we reject the null hypothesis that the coefficient is zero.
Using the same method as in part a, we now regress x onto y without an intercept and test the same null hypothesis.
regressant <- lm(x~y+0)
summary(regressant)
##
## Call:
## lm(formula = x ~ y + 0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.8699 -0.2368 0.1030 0.2858 0.8938
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## y 0.39111 0.02089 18.73 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4246 on 99 degrees of freedom
## Multiple R-squared: 0.7798, Adjusted R-squared: 0.7776
## F-statistic: 350.7 on 1 and 99 DF, p-value: < 2.2e-16
The coefficient estimate is 0.39111, the standard error is 0.02089, the t-statistic is 18.73, and the p-value is < 2.2e-16. With a p-value this small we again reject the null hypothesis that the coefficient is zero.
Comparing the two regressions in parts a and b, the t-statistic and p-value are identical, while the coefficient estimate and standard error are both larger for the regression of y onto x.
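The matching t-statistics are not a coincidence. For a regression through the origin, the t-statistic can be rewritten so that swapping x and y leaves it unchanged. The following is a minimal sketch that checks this numerically, assuming the same seed and data as above; the names `t_manual`, `t_yx`, and `t_xy` are introduced here for illustration only.

```r
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
n <- length(x)

# t-statistic for a no-intercept regression, written in a form
# that is unchanged when x and y are swapped
t_manual <- sqrt(n - 1) * sum(x * y) /
  sqrt(sum(x^2) * sum(y^2) - sum(x * y)^2)

# t-statistics reported by lm() for each direction
t_yx <- coef(summary(lm(y ~ x + 0)))["x", "t value"]
t_xy <- coef(summary(lm(x ~ y + 0)))["y", "t value"]

c(manual = t_manual, y_onto_x = t_yx, x_onto_y = t_xy)
```

All three values should agree with the 18.73 reported in the summaries above.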
nulley <- lm(y~x)
summary(nulley)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8768 -0.6138 -0.1395 0.5394 2.3462
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.03769 0.09699 -0.389 0.698
## x 1.99894 0.10773 18.556 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9628 on 98 degrees of freedom
## Multiple R-squared: 0.7784, Adjusted R-squared: 0.7762
## F-statistic: 344.3 on 1 and 98 DF, p-value: < 2.2e-16
alt_j <- lm(x~y)
summary(alt_j)
##
## Call:
## lm(formula = x ~ y)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.90848 -0.28101 0.06274 0.24570 0.85736
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.03880 0.04266 0.91 0.365
## y 0.38942 0.02099 18.56 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4249 on 98 degrees of freedom
## Multiple R-squared: 0.7784, Adjusted R-squared: 0.7762
## F-statistic: 344.3 on 1 and 98 DF, p-value: < 2.2e-16
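The two summaries above report the same slope t-statistic (18.556 and 18.56) even with an intercept included. One way to see why, offered as a sketch rather than a full derivation: in simple linear regression with an intercept, the slope's t-statistic depends on the data only through the sample correlation, and the correlation does not depend on which variable is the response. The names `r` and `t_from_r` are introduced here for illustration.

```r
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
n <- length(x)

# Sample correlation is symmetric: cor(x, y) equals cor(y, x)
r <- cor(x, y)

# Slope t-statistic expressed through the correlation alone
t_from_r <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Slope t-statistics reported by lm() for each direction
t_yx <- coef(summary(lm(y ~ x)))["x", "t value"]
t_xy <- coef(summary(lm(x ~ y)))["y", "t value"]

c(from_r = t_from_r, y_onto_x = t_yx, x_onto_y = t_xy)
```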
set.seed(1)
x<-rnorm(100)
eps<- rnorm(100, mean = 0, sqrt(0.25))
y <- (-1 + 0.5*x + eps)
length(y)
## [1] 100
The vector y has length 100. In this model, β0 = -1 and β1 = 0.5.
There appears to be a moderate positive linear relationship between x and y.
ls <- lm(y~x)
summary (ls)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.93842 -0.30688 -0.06975 0.26970 1.17309
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.01885 0.04849 -21.010 < 2e-16 ***
## x 0.49947 0.05386 9.273 4.58e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4814 on 98 degrees of freedom
## Multiple R-squared: 0.4674, Adjusted R-squared: 0.4619
## F-statistic: 85.99 on 1 and 98 DF, p-value: 4.583e-15
β0(hat) = -1.01885 and β1(hat) = 0.49947. Both estimates are very close to the true values β0 = -1 and β1 = 0.5.
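To make "close" concrete, the estimation error can be set next to each coefficient's standard error. This is a small sketch assuming the same simulated data as above; `fit`, `truth`, `estimate`, and `se` are names introduced here for illustration.

```r
set.seed(1)
x <- rnorm(100)
eps <- rnorm(100, mean = 0, sd = sqrt(0.25))
y <- -1 + 0.5 * x + eps

fit <- lm(y ~ x)

# True coefficients used to simulate y
truth <- c("(Intercept)" = -1, x = 0.5)

# Fitted coefficients and their standard errors
estimate <- coef(fit)
se <- coef(summary(fit))[, "Std. Error"]

# Side-by-side comparison of truth, estimate, error, and standard error
cbind(truth, estimate, error = estimate - truth, std_error = se)
```

Based on the summary output above, both errors are smaller in magnitude than one standard error.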
plot(x, y)
abline(ls, col= "blue")
abline(-1, 0.5, col = "red")
legend("topleft", c("Least squares", "Population regression"), col = c("blue", "red"), lty= c(1,1))
set.seed(1)
x<-rnorm(100)
eps2<- rnorm(100, mean = 0, sqrt(0.15))
y2 <- (-1 + 0.5*x + eps2)
length(y2)
## [1] 100
plot(x, y2)
ls2 <- lm(y2~x)
summary (ls2)
##
## Call:
## lm(formula = y2 ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.72690 -0.23771 -0.05403 0.20891 0.90867
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.01460 0.03756 -27.01 <2e-16 ***
## x 0.49959 0.04172 11.97 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3729 on 98 degrees of freedom
## Multiple R-squared: 0.594, Adjusted R-squared: 0.5899
## F-statistic: 143.4 on 1 and 98 DF, p-value: < 2.2e-16
plot(x, y2)
abline(ls2, col= "blue")
abline(-1, 0.5, col = "red")
legend("topleft", c("Least squares", "Population regression"), col = c("blue", "red"), lty= c(1,1))
The points lie closer to both the least squares line and the population regression line as the noise decreases.
set.seed(1)
x<-rnorm(100)
eps3<- rnorm(100, mean = 0, sqrt(5))
y3 <- (-1 + 0.5*x + eps3)
length(y3)
## [1] 100
plot(x, y3)
ls3 <- lm(y3~x)
summary (ls3)
##
## Call:
## lm(formula = y3 ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.1967 -1.3724 -0.3119 1.2061 5.2462
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.0843 0.2169 -5.000 2.52e-06 ***
## x 0.4976 0.2409 2.066 0.0415 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.153 on 98 degrees of freedom
## Multiple R-squared: 0.04173, Adjusted R-squared: 0.03195
## F-statistic: 4.268 on 1 and 98 DF, p-value: 0.04148
plot(x, y3)
abline(ls3, col= "blue")
abline(-1, 0.5, col = "red")
legend("topleft", c("Least squares", "Population regression"), col = c("blue", "red"), lty= c(1,1))
The data become more scattered and deviate further from both lines as the noise increases.
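The visual impression can be backed by two numbers from each fit: the residual standard error and R-squared. The sketch below refits the three models in one pass, assuming the same seed and noise variances (0.15, 0.25, 5) used above; `noise_var`, `fits`, and `fit_stats` are names introduced here for illustration.

```r
# Noise variances used in the three simulations above
noise_var <- c(reduced = 0.15, original = 0.25, increased = 5)

# Refit each model, resetting the seed so x and the underlying draws match
fits <- lapply(noise_var, function(v) {
  set.seed(1)
  x <- rnorm(100)
  eps <- rnorm(100, mean = 0, sd = sqrt(v))
  lm(y ~ x, data = data.frame(x = x, y = -1 + 0.5 * x + eps))
})

# Residual standard error and R-squared for each noise level
fit_stats <- sapply(fits, function(f) {
  s <- summary(f)
  c(rse = s$sigma, r_squared = s$r.squared)
})
fit_stats
```

The residual standard error should rise and R-squared should fall as the noise variance grows, in line with the summaries shown above.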
original <- confint(ls)
reduce <- confint(ls2)
increase <- confint(ls3)
original
## 2.5 % 97.5 %
## (Intercept) -1.1150804 -0.9226122
## x 0.3925794 0.6063602
reduce
## 2.5 % 97.5 %
## (Intercept) -1.0891409 -0.9400557
## x 0.4167924 0.5823863
increase
## 2.5 % 97.5 %
## (Intercept) -1.51465507 -0.6539114
## x 0.01960053 0.9756573
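As the noise decreases, the confidence intervals become narrower; as the noise increases, they become wider. For the slope, the interval width is about 0.17 with reduced noise, 0.21 with the original noise, and 0.96 with increased noise. The noise and the width of the confidence interval therefore move in the same direction: more noise means less certainty about the coefficients.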
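The interval widths can be computed directly instead of being read off the printed bounds. This is a minimal sketch that refits the three models under the same seed and noise variances as above; `noise_var`, `fits`, and `widths` are names introduced here for illustration.

```r
# Noise variances used in the three simulations above
noise_var <- c(reduced = 0.15, original = 0.25, increased = 5)

# Refit each model, resetting the seed so the data match the earlier runs
fits <- lapply(noise_var, function(v) {
  set.seed(1)
  x <- rnorm(100)
  eps <- rnorm(100, mean = 0, sd = sqrt(v))
  lm(y ~ x, data = data.frame(x = x, y = -1 + 0.5 * x + eps))
})

# Width of each 95% confidence interval: upper bound minus lower bound
widths <- sapply(fits, function(f) {
  ci <- confint(f)
  ci[, 2] - ci[, 1]
})
widths
```

Each row of `widths` corresponds to a coefficient and each column to a noise level, so reading across a row shows how the interval changes with the noise.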