Lauren Collar HW #4

Problem 1

set.seed(1)
x <- rnorm(100)
y <- 2*x + rnorm(100)
Reg <- lm(y~x+0)
summary(Reg)
## 
## Call:
## lm(formula = y ~ x + 0)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9154 -0.6472 -0.1771  0.5056  2.3109 
## 
## Coefficients:
##   Estimate Std. Error t value Pr(>|t|)    
## x   1.9939     0.1065   18.73   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9586 on 99 degrees of freedom
## Multiple R-squared:  0.7798, Adjusted R-squared:  0.7776 
## F-statistic: 350.7 on 1 and 99 DF,  p-value: < 2.2e-16

β̂ = 1.9939 with a standard error of 0.1065, giving a t-statistic of 18.73 and a p-value below 2e-16. Because the p-value is so small, we reject the null hypothesis H0: β = 0.
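
As a sanity check, the no-intercept estimate and its standard error have closed forms, β̂ = Σxᵢyᵢ / Σxᵢ² and SE(β̂) = sqrt(Σ(yᵢ − xᵢβ̂)² / ((n−1)Σxᵢ²)); a minimal sketch using the x and y simulated above:

# Closed-form estimate for regression through the origin
beta_hat <- sum(x*y) / sum(x^2)
se_hat <- sqrt(sum((y - x*beta_hat)^2) / ((length(x) - 1) * sum(x^2)))
c(estimate = beta_hat, std.error = se_hat, t = beta_hat/se_hat)
# should reproduce the 1.9939, 0.1065, and 18.73 from summary(Reg)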

set.seed(1)
Reg2 <- lm(x~y+0)
summary(Reg2)
## 
## Call:
## lm(formula = x ~ y + 0)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.8699 -0.2368  0.1030  0.2858  0.8938 
## 
## Coefficients:
##   Estimate Std. Error t value Pr(>|t|)    
## y  0.39111    0.02089   18.73   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4246 on 99 degrees of freedom
## Multiple R-squared:  0.7798, Adjusted R-squared:  0.7776 
## F-statistic: 350.7 on 1 and 99 DF,  p-value: < 2.2e-16

β̂ = 0.3911 with a standard error of 0.02089, giving a t-statistic of 18.73 and a p-value below 2e-16. Because the p-value is so small, we again reject the null hypothesis.

1c We observe the same t-statistic and p-value in parts a and b. This makes sense: both regressions describe the same underlying linear relationship between x and y, with only the roles of predictor and response swapped, and the no-intercept t-statistic is symmetric in x and y, as checked below.
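
Algebraically, the no-intercept t-statistic reduces to t = sqrt(n−1) · Σxᵢyᵢ / sqrt(Σxᵢ² · Σyᵢ² − (Σxᵢyᵢ)²), which is unchanged when x and y are swapped; a quick check:

# Symmetric form of the no-intercept t-statistic
n <- length(x)
sqrt(n - 1) * sum(x*y) / sqrt(sum(x^2) * sum(y^2) - sum(x*y)^2)
# gives the 18.73 reported in both summaries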

Reg3 <- lm(y~x)
summary(Reg3)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8768 -0.6138 -0.1395  0.5394  2.3462 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.03769    0.09699  -0.389    0.698    
## x            1.99894    0.10773  18.556   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9628 on 98 degrees of freedom
## Multiple R-squared:  0.7784, Adjusted R-squared:  0.7762 
## F-statistic: 344.3 on 1 and 98 DF,  p-value: < 2.2e-16
Reg4 <- lm(x~y)
summary(Reg4)
## 
## Call:
## lm(formula = x ~ y)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.90848 -0.28101  0.06274  0.24570  0.85736 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.03880    0.04266    0.91    0.365    
## y            0.38942    0.02099   18.56   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4249 on 98 degrees of freedom
## Multiple R-squared:  0.7784, Adjusted R-squared:  0.7762 
## F-statistic: 344.3 on 1 and 98 DF,  p-value: < 2.2e-16

The slope t-statistic is about 18.56 both when regressing y onto x and when regressing x onto y, so the symmetry from part c also holds when an intercept is included.
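
This symmetry follows because the slope t-statistic in a simple regression with an intercept can be written purely in terms of the sample correlation, t = r·sqrt(n−2)/sqrt(1−r²), and cor(x, y) = cor(y, x). A quick check using the x and y generated above:

# Slope t-statistic expressed through the (symmetric) sample correlation
r <- cor(x, y)
r * sqrt(length(x) - 2) / sqrt(1 - r^2)
# matches the 18.556 seen in both Reg3 and Reg4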

Problem 2

set.seed(1)
x <- rnorm(100, mean=0, sd=1)
eps <- rnorm(100, mean=0, sd=0.25)
y <- -1+0.5*x+eps
length(y)
## [1] 100
summary(y)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -2.2700 -1.3294 -0.9215 -0.9550 -0.6021  0.3071

The length of y is 100. In this model, β0 = -1 and β1 = 0.5.

plot(y~x)

The scatterplot shows a positive, fairly strong linear relationship between x and y.
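
To put a number on that impression, the sample correlation (not computed in the original chunk) summarizes the strength of the linear association:

cor(x, y)  # quantifies how tightly the points follow a line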

LSq <- lm(y~x)
summary(LSq)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.46921 -0.15344 -0.03487  0.13485  0.58654 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.00942    0.02425  -41.63   <2e-16 ***
## x            0.49973    0.02693   18.56   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2407 on 98 degrees of freedom
## Multiple R-squared:  0.7784, Adjusted R-squared:  0.7762 
## F-statistic: 344.3 on 1 and 98 DF,  p-value: < 2.2e-16

β̂0 and β̂1 are quite close to the true β0 and β1: β̂0 = -1.00942 versus β0 = -1, and β̂1 = 0.49973 versus β1 = 0.5. This suggests the least squares estimates recover the true model very well.

plot(y~x)
abline(LSq, col="blue")
abline(a = -1, b = .5, col = "red")
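
A legend (not in the original chunk) makes the two overlaid lines easier to tell apart; one way to add it, assuming it runs right after the abline calls above while the plot is still the active device:

legend("topleft", legend = c("least squares fit", "population line"),
       col = c("blue", "red"), lty = 1)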

eps2 <- rnorm(100, mean=0, sd=0.10)
y <- -1+0.5*x+eps2
LSq2 <- lm(y~x)
summary(LSq2)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.291411 -0.048230 -0.004533  0.064924  0.264157 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.99726    0.01047  -95.25   <2e-16 ***
## x            0.50212    0.01163   43.17   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1039 on 98 degrees of freedom
## Multiple R-squared:  0.9501, Adjusted R-squared:  0.9495 
## F-statistic:  1864 on 1 and 98 DF,  p-value: < 2.2e-16
plot(y~x)
abline(LSq2, col="blue")
abline(a = -1, b = .5, col = "red")

With less noise (ε drawn with sd = 0.10 instead of 0.25), the residual standard error drops from 0.2407 to 0.1039 and R-squared rises from 0.7784 to 0.9501: the points sit more tightly around the line, and the fitted line nearly coincides with the population line.

eps3 <- rnorm(100, mean=0, sd=0.5)
y <- -1+0.5*x+eps3
LSq3 <- lm(y~x)
summary(LSq3)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.25813 -0.27262 -0.01888  0.33644  0.93944 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.97117    0.05014 -19.369  < 2e-16 ***
## x            0.47216    0.05569   8.478  2.4e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4977 on 98 degrees of freedom
## Multiple R-squared:  0.4231, Adjusted R-squared:  0.4172 
## F-statistic: 71.87 on 1 and 98 DF,  p-value: 2.4e-13
plot(y~x)
abline(LSq3, col="blue")
abline(a = -1, b = .5, col = "red")

With more noise (sd = 0.5), the residual standard error grows to 0.4977 and R-squared falls to 0.4231: the points scatter more widely, and the estimates drift further from the true values.
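
Putting the three fits side by side makes the pattern explicit; a short comparison using sigma(), which returns a model's residual standard error:

# Residual standard error and R^2 for the original, low-noise, and high-noise fits
sapply(list(original = LSq, less_noise = LSq2, more_noise = LSq3),
       function(m) c(RSE = sigma(m), R2 = summary(m)$r.squared))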

confint(LSq)
##                  2.5 %     97.5 %
## (Intercept) -1.0575402 -0.9613061
## x            0.4462897  0.5531801
confint(LSq2)
##                  2.5 %     97.5 %
## (Intercept) -1.0180413 -0.9764850
## x            0.4790377  0.5251957
confint(LSq3)
##                 2.5 %     97.5 %
## (Intercept) -1.070670 -0.8716647
## x            0.361636  0.5826779

The confidence intervals are narrowest for the low-noise data, widest for the high-noise data, and in between for the original data. More noise in the response translates directly into wider intervals, i.e., more uncertainty about the coefficient estimates.
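
That ordering can be confirmed directly from the interval widths; a quick check on the slope intervals printed above:

# Width of the 95% confidence interval for the slope under each noise level
sapply(list(original = LSq, less_noise = LSq2, more_noise = LSq3),
       function(m) diff(confint(m)["x", ]))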