Part II: Practice Problems

Problem 1: Investigating the T-stat

a)

set.seed(1)
x<-rnorm(100)
y<-2*x+rnorm(100)

slr<-lm(y~x+0)
summary(slr)
## 
## Call:
## lm(formula = y ~ x + 0)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9154 -0.6472 -0.1771  0.5056  2.3109 
## 
## Coefficients:
##   Estimate Std. Error t value Pr(>|t|)    
## x   1.9939     0.1065   18.73   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9586 on 99 degrees of freedom
## Multiple R-squared:  0.7798, Adjusted R-squared:  0.7776 
## F-statistic: 350.7 on 1 and 99 DF,  p-value: < 2.2e-16

Coefficient estimate: 1.9939; standard error of the coefficient estimate: 0.1065; t-statistic: 18.73; p-value: <2e-16.

The summary shows a very small p-value (< 2e-16, well below α = 0.05), so we reject the null hypothesis that the coefficient is zero and conclude that the coefficient on x is statistically significant.

b)

set.seed(1)
x<-rnorm(100)
y<-2*x+rnorm(100)

slr2<-lm(x~y+0)
summary(slr2)
## 
## Call:
## lm(formula = x ~ y + 0)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.8699 -0.2368  0.1030  0.2858  0.8938 
## 
## Coefficients:
##   Estimate Std. Error t value Pr(>|t|)    
## y  0.39111    0.02089   18.73   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4246 on 99 degrees of freedom
## Multiple R-squared:  0.7798, Adjusted R-squared:  0.7776 
## F-statistic: 350.7 on 1 and 99 DF,  p-value: < 2.2e-16

Coefficient estimate: 0.39111; standard error of the coefficient estimate: 0.02089; t-statistic: 18.73; p-value: <2e-16.

These results again show a very small p-value (< 2e-16, well below α = 0.05), so we reject the null hypothesis that the coefficient is zero and conclude that the coefficient on y is statistically significant.
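The t value and p-value reported in a) and b) can be recomputed by hand as a check: the t-statistic is the estimate divided by its standard error, and the two-sided p-value comes from a t distribution with the residual degrees of freedom (99 here). A minimal sketch for the regression in a), reusing the same seed and data; the names `fit`, `coefs`, `est`, `se`, `t_stat`, and `p_val` are introduced here only for illustration:

```r
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)

fit <- lm(y ~ x + 0)                  # regression through the origin, as in a)
coefs <- summary(fit)$coefficients    # columns: Estimate, Std. Error, t value, Pr(>|t|)

est <- coefs["x", "Estimate"]
se  <- coefs["x", "Std. Error"]

t_stat <- est / se                                       # t = estimate / standard error
p_val  <- 2 * pt(-abs(t_stat), df = df.residual(fit))    # two-sided p-value

c(t = t_stat, p = p_val)
```

Swapping the formula to `x ~ y + 0` and the row name to `"y"` performs the same check for b).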

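  1. The results obtained in a) and b) have the same t-statistic (18.73) and p-value (< 2e-16). Both models are fit without an intercept to the same data simulated from y = 2x + ε; a) regresses y onto x and b) regresses x onto y. The coefficient estimates differ (1.9939 versus 0.39111) and are not reciprocals of each other, but the ratio of each estimate to its standard error is identical.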
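One way to see why the two t-statistics match: for regression through the origin, the t-statistic can be written as sqrt(n - 1) * Σ x_i y_i / sqrt(Σ x_i^2 * Σ y_i^2 - (Σ x_i y_i)^2), an expression that is unchanged when x and y are swapped. The sketch below checks this numerically on the same simulated data; the names `t_formula`, `t_yx`, and `t_xy` are illustrative and not part of the original solution.

```r
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
n <- length(x)

# Closed-form t-statistic for a no-intercept regression; symmetric in x and y
t_formula <- sqrt(n - 1) * sum(x * y) /
  sqrt(sum(x^2) * sum(y^2) - sum(x * y)^2)

# t-statistics reported by lm() in each direction
t_yx <- summary(lm(y ~ x + 0))$coefficients["x", "t value"]
t_xy <- summary(lm(x ~ y + 0))$coefficients["y", "t value"]

c(formula = t_formula, y_onto_x = t_yx, x_onto_y = t_xy)
```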

slr3<- lm(y~x)
summary(slr3)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8768 -0.6138 -0.1395  0.5394  2.3462 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.03769    0.09699  -0.389    0.698    
## x            1.99894    0.10773  18.556   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9628 on 98 degrees of freedom
## Multiple R-squared:  0.7784, Adjusted R-squared:  0.7762 
## F-statistic: 344.3 on 1 and 98 DF,  p-value: < 2.2e-16

The test statistic for the coefficient estimate is 18.556 when the regression of y onto x is performed with an intercept.

slr4<-lm(x~y)
summary(slr4)
## 
## Call:
## lm(formula = x ~ y)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.90848 -0.28101  0.06274  0.24570  0.85736 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.03880    0.04266    0.91    0.365    
## y            0.38942    0.02099   18.56   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4249 on 98 degrees of freedom
## Multiple R-squared:  0.7784, Adjusted R-squared:  0.7762 
## F-statistic: 344.3 on 1 and 98 DF,  p-value: < 2.2e-16

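The test statistic for the coefficient estimate is 18.56 when the regression of x onto y is performed with an intercept, matching the 18.556 obtained for the regression of y onto x. The t-statistic for the slope is therefore the same in both directions even when an intercept is included.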
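With an intercept, the slope's t-statistic depends on the data only through the sample correlation r between x and y: t = r * sqrt(n - 2) / sqrt(1 - r^2). Because cor(x, y) equals cor(y, x), the statistic is the same in both directions. A short sketch that checks this on the same data (the names `r`, `t_from_r`, `t_yx`, and `t_xy` are introduced here for illustration):

```r
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
n <- length(x)

r <- cor(x, y)                                   # sample correlation, symmetric in x and y
t_from_r <- r * sqrt(n - 2) / sqrt(1 - r^2)      # slope t-statistic written in terms of r

# Slope t-statistics from lm() with an intercept, in each direction
t_yx <- summary(lm(y ~ x))$coefficients["x", "t value"]
t_xy <- summary(lm(x ~ y))$coefficients["y", "t value"]

c(from_r = t_from_r, y_onto_x = t_yx, x_onto_y = t_xy)
```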

Problem 2: SLR Estimation

a)

set.seed(1)
x<- rnorm(100, mean=0, sd=1)
eps <- rnorm(100, mean = 0, sd = 0.25)
y <- -1+0.5*x+ eps
length(y)
## [1] 100

The length of the vector y is 100. The value of β0 is -1. The value of β1 is 0.5.

plot(x,y)

The scatterplot shows a positive, linear, moderately strong relationship between x and y, with no apparent outliers.

model <- lm(y~x)
summary(model)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.46921 -0.15344 -0.03487  0.13485  0.58654 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.00942    0.02425  -41.63   <2e-16 ***
## x            0.49973    0.02693   18.56   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2407 on 98 degrees of freedom
## Multiple R-squared:  0.7784, Adjusted R-squared:  0.7762 
## F-statistic: 344.3 on 1 and 98 DF,  p-value: < 2.2e-16

βhat0 is -1.00942 and βhat1 is 0.49973. These estimates are extremely close to the values in the population model (β0 = -1, β1 = 0.5). The model has an R-squared of 0.7784, so x explains about 78% of the variance in y. Both coefficients also have very small p-values (< 2e-16), so the null hypothesis that each coefficient is zero can be rejected.

plot(x,y)
abline(model, col="blue")
abline(-1, 0.5, col="yellow")
legend("topleft", c("Least squares", "Population regression"), col= c("blue","yellow"), lty=c(1,1))

  1. Less Noise
set.seed(1)
eps <-rnorm(100, sd = 0.10)
x <- rnorm(100)
y <- -1 + 0.5*x + eps
model_less <- lm(y~x)
summary(model_less)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.232416 -0.060361  0.000536  0.058305  0.229316 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.989115   0.009035 -109.48   <2e-16 ***
## x            0.499907   0.009472   52.78   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09028 on 98 degrees of freedom
## Multiple R-squared:  0.966,  Adjusted R-squared:  0.9657 
## F-statistic:  2785 on 1 and 98 DF,  p-value: < 2.2e-16
plot(x,y)
abline(model_less, col="blue")
abline(-1, 0.5, col = "yellow")
legend("topleft", c("Least squares", "Population regression"), col= c("blue","yellow"), lty=c(1,1))

The noise was decreased by lowering the standard deviation of ε from 0.25 to 0.10 (that is, by decreasing its variance). The relationship is strongly linear, with an R^2 of 0.966 and a residual standard error of 0.09028. The least squares line and the population regression line nearly overlap for this model. Compared to the original model, the R^2 is greater (0.966 versus 0.7784) and the residual standard error is lower (0.09028 versus 0.2407).

set.seed(1)
eps <-rnorm(100, sd = 0.5)
x <- rnorm(100)
y <- -1 + 0.5*x + eps
model_more <- lm(y~x)
summary(model_more)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.16208 -0.30181  0.00268  0.29152  1.14658 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.94557    0.04517  -20.93   <2e-16 ***
## x            0.49953    0.04736   10.55   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4514 on 98 degrees of freedom
## Multiple R-squared:  0.5317, Adjusted R-squared:  0.5269 
## F-statistic: 111.2 on 1 and 98 DF,  p-value: < 2.2e-16
plot(x,y)
abline(model_more, col="blue")
abline(-1, 0.5, col = "yellow")
legend("topleft", c("Least squares", "Population regression"), col= c("blue","yellow"), lty=c(1,1))

The noise was increased by raising the standard deviation of ε from 0.25 to 0.5 (that is, by increasing its variance). The relationship is moderately weak but still linear, with an R^2 of 0.5317, which is lower than both the original model (0.7784) and the model with less noise (0.966). This model also has a residual standard error of 0.4514, which is greater than both the original model (0.2407) and the model with less noise (0.09028).
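The effect of the noise level can also be tabulated in one pass. The sketch below is an illustration rather than part of the original solution: `fit_with_noise` is a hypothetical helper that simulates y = -1 + 0.5x + ε for a given noise standard deviation, using the same draw order as the less-noise and more-noise code (ε first, then x). Because the original model drew x before ε, the row for sd = 0.25 is not expected to reproduce the original summary exactly. The last column is the population R^2 implied by the model, 0.25 / (0.25 + sd^2), which follows from Var(0.5x) = 0.25 when x has variance 1.

```r
# Hypothetical helper: fit y = -1 + 0.5 * x + eps for a given noise sd
fit_with_noise <- function(noise_sd, n = 100, seed = 1) {
  set.seed(seed)
  eps <- rnorm(n, sd = noise_sd)   # noise drawn first, as in the less/more noise code
  x <- rnorm(n)
  y <- -1 + 0.5 * x + eps
  s <- summary(lm(y ~ x))
  c(noise_sd = noise_sd,
    rse = s$sigma,                                  # residual standard error
    r_squared = s$r.squared,                        # sample R^2
    pop_r_squared = 0.25 / (0.25 + noise_sd^2))     # R^2 implied by the population model
}

# One row per noise level, from least to most noise
results <- t(sapply(c(0.10, 0.25, 0.50), fit_with_noise))
results
```

The residual standard error tracks the noise standard deviation, while R^2 falls as the noise grows.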

  1. Original Confidence Interval
confint(model)
##                  2.5 %     97.5 %
## (Intercept) -1.0575402 -0.9613061
## x            0.4462897  0.5531801

Less Noise Confidence Interval

confint(model_less)
##                  2.5 %     97.5 %
## (Intercept) -1.0070441 -0.9711855
## x            0.4811096  0.5187039

More Noise Confidence Interval

confint(model_more)
##                  2.5 %     97.5 %
## (Intercept) -1.0352203 -0.8559276
## x            0.4055479  0.5935197

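All three sets of confidence intervals are centered close to the true values and contain β0 = -1 and β1 = 0.5, but the intervals widen as the noise increases: the interval for the slope has a width of about 0.038 with less noise, 0.107 for the original data, and 0.188 with more noise. When there is less noise in the data, the coefficients are estimated more precisely.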
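The link between noise and interval width can be made explicit: each 95% interval is the estimate plus or minus a t quantile times the standard error, so its width is 2 * qt(0.975, df) * SE, and the standard error grows with the residual noise. A minimal sketch for the original model (sd = 0.25); the names `ci`, `width`, `t_crit`, and `manual` are illustrative:

```r
set.seed(1)
x <- rnorm(100)
y <- -1 + 0.5 * x + rnorm(100, sd = 0.25)
model <- lm(y ~ x)

ci <- confint(model, level = 0.95)   # 95% intervals as reported above
width <- ci[, 2] - ci[, 1]           # upper bound minus lower bound

# Rebuild the same intervals by hand: estimate +/- t quantile * standard error
coefs <- summary(model)$coefficients
t_crit <- qt(0.975, df = df.residual(model))
manual <- cbind(coefs[, "Estimate"] - t_crit * coefs[, "Std. Error"],
                coefs[, "Estimate"] + t_crit * coefs[, "Std. Error"])

width
```

Repeating the same steps with `model_less` and `model_more` gives the widths for the other two noise levels.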