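To test the significance of the relationship between x and y, set H_0: beta_1 = 0 and H_1: beta_1 =/= 0. The hypotheses are about the true slope beta_1, not the estimate beta_1 hat. Under the usual regression assumptions, beta_1 hat is normally distributed with mean beta_1, which is 0 under H_0. The test stat is beta_1 hat over SE(beta_1 hat), which under H_0 follows a t distribution with n-2 degrees of freedom (close to a standard normal when n is large). This is the "t value" column in the summary output.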

data(women)
attach(women)
mymod <- lm(weight ~ height, data=women)
summary(mymod)
## 
## Call:
## lm(formula = weight ~ height, data = women)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7333 -1.1333 -0.3833  0.7417  3.1167 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -87.51667    5.93694  -14.74 1.71e-09 ***
## height        3.45000    0.09114   37.85 1.09e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.525 on 13 degrees of freedom
## Multiple R-squared:  0.991,  Adjusted R-squared:  0.9903 
## F-statistic:  1433 on 1 and 13 DF,  p-value: 1.091e-14
cor(height,weight)
## [1] 0.9954948
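
As a rough check on the test described at the top, the t statistic and its two-sided p-value can be recomputed from the coefficient table. This is only a sketch: the names `fit`, `coefs`, `t_stat`, `df_resid` and `p_val` are made up for illustration, and the model is refit so the block runs on its own.

```r
# Refit the same model so this block is self-contained (`fit` is a hypothetical name).
fit <- lm(weight ~ height, data = women)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|).
coefs <- summary(fit)$coefficients

# Test stat = beta_1 hat / SE(beta_1 hat).
t_stat <- coefs["height", "Estimate"] / coefs["height", "Std. Error"]

# Under H_0 the stat follows a t distribution with n - 2 degrees of freedom.
df_resid <- df.residual(fit)

# Two-sided p-value from the t distribution.
p_val <- 2 * pt(abs(t_stat), df = df_resid, lower.tail = FALSE)

c(t = t_stat, df = df_resid, p = p_val)
```

The values should agree with the `height` row of the summary (t value and Pr(>|t|)) and with the 13 degrees of freedom reported there.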

\[\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\]

Now we can look at confidence intervals and prediction intervals. A confidence interval estimates the mean of y when x = x_0, while a prediction interval covers a single new observation of y at x = x_0. From our summary, beta_0 hat = -87.51667 and beta_1 hat = 3.45. First, create a new data frame holding the x value we want:

newdata <- data.frame(height=50)
conf <- predict(mymod, newdata, interval="confidence")
pred <- predict(mymod, newdata, interval="prediction")
conf
##        fit      lwr      upr
## 1 84.98333 81.90994 88.05673
pred
##        fit      lwr      upr
## 1 84.98333 80.47778 89.48888

As you can see, the prediction interval is wider than the confidence interval. This is always true when both are computed at the same x_0 and the same confidence level, because the prediction interval also has to account for the error of an individual observation. We can measure the lengths just to be sure:

conf %*% c(0, -1, 1)
##       [,1]
## 1 6.146788
pred %*% c(0, -1, 1)
##       [,1]
## 1 9.011097

And yes, the prediction interval is longer. If we want to check that they are centered at the same number, we can do so by writing:

conf[1]==pred[1]
## [1] TRUE

So both intervals are centered at the same fitted value.
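
Both observations can be read off the standard simple linear regression interval formulas. The symbols here are assumptions introduced for illustration and do not appear elsewhere in these notes: s is the residual standard error, S_xx is the sum of squared deviations of x from its mean, and t_{alpha/2, n-2} is the t critical value.

\[\text{confidence: } \hat{y}_0 \pm t_{\alpha/2,\,n-2}\; s\,\sqrt{\frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}}\]

\[\text{prediction: } \hat{y}_0 \pm t_{\alpha/2,\,n-2}\; s\,\sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}}\]

The two share the center \(\hat{y}_0 = \hat{\beta}_0 + \hat{\beta}_1 x_0\), and the extra 1 under the square root makes the prediction interval strictly wider.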

We used distance values to calculate the confidence and prediction intervals by hand, but we don’t need to do so in R because it’s an intermediate step that predict() handles for us.
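
For comparison, here is a minimal sketch of the by-hand calculation with the distance value, assuming the textbook definition 1/n + (x_0 - xbar)^2 / S_xx and a 95% level. The names `fit`, `x0`, `dist_val`, `conf_manual` and `pred_manual` are hypothetical, and the model is refit so the block runs on its own.

```r
# Refit the model so this block is self-contained (`fit` is a hypothetical name).
fit <- lm(weight ~ height, data = women)

x0 <- 50                      # x value at which we want the intervals
x  <- women$height
n  <- length(x)
s  <- summary(fit)$sigma      # residual standard error

# Distance value (assumed textbook definition).
dist_val <- 1 / n + (x0 - mean(x))^2 / sum((x - mean(x))^2)

# Point estimate: beta_0 hat + beta_1 hat * x0.
y_hat <- sum(coef(fit) * c(1, x0))

# t critical value for a 95% interval with n - 2 degrees of freedom.
t_crit <- qt(0.975, df = n - 2)

# Confidence interval for the mean of y at x0.
conf_manual <- y_hat + c(-1, 1) * t_crit * s * sqrt(dist_val)

# Prediction interval for a single new y at x0 (note the extra 1).
pred_manual <- y_hat + c(-1, 1) * t_crit * s * sqrt(1 + dist_val)

rbind(confidence = conf_manual, prediction = pred_manual)
```

The two rows should match the lwr and upr columns that predict() returns for height = 50.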

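We worked on correlation coefficients, which only work for 2 variables, and the simple coefficient of determination, which works more generally and extends to multiple variables. Lastly we learned about F testing, which basically just asks whether the variables are related: the null hypothesis is that they are not (beta_1 = 0) and the alternative is that they are. Our test stat is the explained variation / (unexplained variation/(n-2)), which under the null follows an F distribution with 1 and n-2 df. We reject the null if the test stat is greater than F_alpha or if the p-value is less than alpha. For our model, this is the F-statistic on the last line of summary(mymod): 1433 on 1 and 13 DF. The var.test call below is a different F test: it compares the variances of height and weight and does not test the regression.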

var.test(height,weight,alternative="two.sided")
## 
##  F test to compare two variances
## 
## data:  height and weight
## F = 0.083261, num df = 14, denom df = 14, p-value = 3.586e-05
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.02795306 0.24799912
## sample estimates:
## ratio of variances 
##         0.08326065
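
The regression F stat described above (explained variation over unexplained variation/(n-2)) can also be rebuilt by hand. This is a sketch under the usual simple linear regression setup, with alpha = 0.05 assumed; the names `fit`, `explained`, `unexplained`, `f_stat`, `f_crit`, `p_val` and `r_squared` are made up for illustration.

```r
# Refit the model so this block is self-contained (`fit` is a hypothetical name).
fit <- lm(weight ~ height, data = women)

y <- women$weight
n <- length(y)

# Explained variation: fitted values around the mean of y.
explained <- sum((fitted(fit) - mean(y))^2)

# Unexplained variation: sum of squared residuals.
unexplained <- sum(residuals(fit)^2)

# F stat with 1 and n - 2 degrees of freedom.
f_stat <- explained / (unexplained / (n - 2))

# Reject H_0 when f_stat exceeds the critical value (alpha = 0.05 assumed).
f_crit <- qf(0.95, df1 = 1, df2 = n - 2)
p_val  <- pf(f_stat, df1 = 1, df2 = n - 2, lower.tail = FALSE)

# Simple coefficient of determination: explained / total variation.
r_squared <- explained / (explained + unexplained)

c(F = f_stat, F_crit = f_crit, p = p_val, R2 = r_squared)
```

The F value should agree with the F-statistic line of summary(mymod), and in simple linear regression it equals the square of the t value for height. R2 should equal the squared correlation between height and weight.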