Lauren Collar HW #4
Problem 1
set.seed(1)
x <- rnorm(100)
y <- 2*x + rnorm(100)
Reg <- lm(y~x+0)
summary(Reg)
##
## Call:
## lm(formula = y ~ x + 0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9154 -0.6472 -0.1771 0.5056 2.3109
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## x 1.9939 0.1065 18.73 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9586 on 99 degrees of freedom
## Multiple R-squared: 0.7798, Adjusted R-squared: 0.7776
## F-statistic: 350.7 on 1 and 99 DF, p-value: < 2.2e-16
β^ = 1.9939, the standard error is 0.1065, the t-statistic is 18.73, and the p-value is < 2e-16. Because the p-value is very small, we reject the null hypothesis H0: β = 0.
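As a quick sanity check (not required by the problem), β^ and its t-statistic can be recomputed from the closed-form formulas for regression without an intercept, β^ = Σ(xy)/Σ(x^2) and SE(β^) = sqrt(RSS/((n-1)Σ(x^2))):
beta_hat <- sum(x*y)/sum(x^2)  # 1.9939, matches the summary
se <- sqrt(sum((y - beta_hat*x)^2)/((length(x)-1)*sum(x^2)))
beta_hat/se                    # t-statistic: 18.73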
set.seed(1)
Reg2 <- lm(x~y+0)
summary(Reg2)
##
## Call:
## lm(formula = x ~ y + 0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.8699 -0.2368 0.1030 0.2858 0.8938
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## y 0.39111 0.02089 18.73 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4246 on 99 degrees of freedom
## Multiple R-squared: 0.7798, Adjusted R-squared: 0.7776
## F-statistic: 350.7 on 1 and 99 DF, p-value: < 2.2e-16
β^ = 0.3911, the standard error is 0.02089, the t-statistic is 18.73, and the p-value is < 2e-16. Because the p-value is very small, we reject the null hypothesis H0: β = 0.
1c We observe the same t-statistic and p-value in parts (a) and (b). The two regressions describe the same relationship between x and y, just with the roles of the variables swapped, so it makes sense that these values agree.
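One way to see this numerically: for regression through the origin, the t-statistic can be written as sqrt(n-1)·Σ(xy)/sqrt(Σ(x^2)Σ(y^2) - (Σ(xy))^2), which is symmetric in x and y. A quick sketch to confirm:
n <- length(x)
# symmetric in x and y, so identical for y ~ x + 0 and x ~ y + 0
sqrt(n-1)*sum(x*y)/sqrt(sum(x^2)*sum(y^2) - sum(x*y)^2)  # 18.73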
Reg3 <- lm(y~x)
summary(Reg3)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8768 -0.6138 -0.1395 0.5394 2.3462
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.03769 0.09699 -0.389 0.698
## x 1.99894 0.10773 18.556 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9628 on 98 degrees of freedom
## Multiple R-squared: 0.7784, Adjusted R-squared: 0.7762
## F-statistic: 344.3 on 1 and 98 DF, p-value: < 2.2e-16
Reg4 <- lm(x~y)
summary(Reg4)
##
## Call:
## lm(formula = x ~ y)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.90848 -0.28101 0.06274 0.24570 0.85736
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.03880 0.04266 0.91 0.365
## y 0.38942 0.02099 18.56 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4249 on 98 degrees of freedom
## Multiple R-squared: 0.7784, Adjusted R-squared: 0.7762
## F-statistic: 344.3 on 1 and 98 DF, p-value: < 2.2e-16
The t-statistic for the slope is about 18.56 both for the regression of y onto x and for the regression of x onto y.
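This happens because both fits share the same R^2 (in simple regression R^2 = cor(x, y)^2), and with an intercept the slope t-statistic satisfies t^2 = (n-2)R^2/(1-R^2). A quick check:
r2 <- summary(Reg3)$r.squared    # 0.7784, same as summary(Reg4)$r.squared
sqrt((length(x)-2)*r2/(1 - r2))  # about 18.56 for both fits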
Problem 2
set.seed(1)
x <- rnorm(100, mean=0, sd=1)
eps <- rnorm(100, mean=0, sd=0.25)
y <- -1+0.5*x+eps
length(y)
## [1] 100
summary(y)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2.2700 -1.3294 -0.9215 -0.9550 -0.6021 0.3071
The length of y is 100. In this model, β0 = -1 and β1 = 0.5.
plot(y~x)
The scatterplot depicts a positive, moderately strong linear relationship between x and y.
LSq <- lm(y~x)
summary(LSq)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.46921 -0.15344 -0.03487 0.13485 0.58654
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.00942 0.02425 -41.63 <2e-16 ***
## x 0.49973 0.02693 18.56 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2407 on 98 degrees of freedom
## Multiple R-squared: 0.7784, Adjusted R-squared: 0.7762
## F-statistic: 344.3 on 1 and 98 DF, p-value: < 2.2e-16
β^0 and β^1 are relatively close to the true β0 and β1: β^0 = -1.00942 versus β0 = -1, and β^1 = 0.49973 versus β1 = 0.5. This suggests that the least squares fit recovers the true model closely.
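To quantify how close the estimates are (a quick sketch, not required by the problem):
coef(LSq) - c(-1, 0.5)  # differences from the true coefficients; both near 0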
plot(y~x)
abline(LSq, col="blue")
abline(a = -1, b = .5, col = "red")
eps2 <- rnorm(100, mean=0, sd=0.10)
y <- -1+0.5*x+eps2
LSq2 <- lm(y~x)
summary(LSq2)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.291411 -0.048230 -0.004533 0.064924 0.264157
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.99726 0.01047 -95.25 <2e-16 ***
## x 0.50212 0.01163 43.17 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1039 on 98 degrees of freedom
## Multiple R-squared: 0.9501, Adjusted R-squared: 0.9495
## F-statistic: 1864 on 1 and 98 DF, p-value: < 2.2e-16
plot(y~x)
abline(LSq2, col="blue")
abline(a = -1, b = .5, col = "red")
We find that the standard errors are smaller when there is less noise in the data: the residual standard error drops from 0.2407 to 0.1039, and R-squared rises from 0.7784 to 0.9501.
eps3 <- rnorm(100, mean=0, sd=0.5)
y <- -1+0.5*x+eps3
LSq3 <- lm(y~x)
summary(LSq3)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.25813 -0.27262 -0.01888 0.33644 0.93944
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.97117 0.05014 -19.369 < 2e-16 ***
## x 0.47216 0.05569 8.478 2.4e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4977 on 98 degrees of freedom
## Multiple R-squared: 0.4231, Adjusted R-squared: 0.4172
## F-statistic: 71.87 on 1 and 98 DF, p-value: 2.4e-13
plot(y~x)
abline(LSq3, col="blue")
abline(a = -1, b = .5, col = "red")
We find that the standard errors are larger when there is more noise in the data: the residual standard error grows from 0.2407 to 0.4977, and R-squared falls from 0.7784 to 0.4231.
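The three residual standard errors can be pulled out side by side to make the comparison concrete (a quick sketch using sigma() from base R):
# residual standard error grows with the noise sd (0.10 < 0.25 < 0.50)
c(less = sigma(LSq2), original = sigma(LSq), more = sigma(LSq3))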
confint(LSq)
## 2.5 % 97.5 %
## (Intercept) -1.0575402 -0.9613061
## x 0.4462897 0.5531801
confint(LSq2)
## 2.5 % 97.5 %
## (Intercept) -1.0180413 -0.9764850
## x 0.4790377 0.5251957
confint(LSq3)
## 2.5 % 97.5 %
## (Intercept) -1.070670 -0.8716647
## x 0.361636 0.5826779
We find that the confidence intervals are widest for the modified data with more noise, narrowest for the modified data with less noise, and in between for the original data. This matches the ordering of the standard errors across the three fits.
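The widths of the slope intervals confirm this ordering (a quick sketch):
# width of the 95% CI for the slope under each noise level
sapply(list(less = LSq2, original = LSq, more = LSq3),
       function(m) as.numeric(diff(confint(m)["x", ])))
# roughly 0.046, 0.107, and 0.221, respectively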