During this class period we covered Chapter 3, focusing on Sections 3.5, 3.6, 3.7, and 3.8.

3.5 Significance of Slope

Section 3.5 covered the significance of the slope of a regression line. Using the built-in women data, we test the null hypothesis H0: B1 = 0, which means the slope is 0, against the alternative hypothesis HA: B1 is not equal to 0.

attach(women)                 # built-in data: average height (in) and weight (lb) of 15 women
mod <- lm(height ~ weight)    # simple linear regression of height on weight
mod
## 
## Call:
## lm(formula = height ~ weight)
## 
## Coefficients:
## (Intercept)       weight  
##     25.7235       0.2872
summary(mod)
## 
## Call:
## lm(formula = height ~ weight)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.83233 -0.26249  0.08314  0.34353  0.49790 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 25.723456   1.043746   24.64 2.68e-12 ***
## weight       0.287249   0.007588   37.85 1.09e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.44 on 13 degrees of freedom
## Multiple R-squared:  0.991,  Adjusted R-squared:  0.9903 
## F-statistic:  1433 on 1 and 13 DF,  p-value: 1.091e-14

The mod output shows that the estimated intercept for women's height is 25.7235 inches and that the estimated slope is 0.2872 inches per pound of weight. The summary shows that the p-value is 1.091e-14; since it is so small, I would reject H0 in favor of HA. This means the slope is not 0, i.e. the fitted line is not flat.
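
As a quick programmatic check (my own addition, using the mod object fit above), the slope estimate, its t statistic, and its p-value can be pulled straight from the coefficient table:

coefs <- summary(mod)$coefficients   # coefficient table from summary(mod)
coefs["weight", "Estimate"]          # slope, 0.2872 inches per pound
coefs["weight", "t value"]           # t statistic for H0: slope = 0
coefs["weight", "Pr(>|t|)"]          # p-value, about 1.09e-14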


3.6 Confidence Intervals and Prediction Intervals

Section 3.6 covered confidence intervals, prediction intervals, and the “distance value”. The distance value for a particular x0 is 1/n + (x0 - xbar)^2 / SSxx, where n is the sample size, xbar is the sample mean, and SSxx = sum(xi^2) - (sum(xi))^2 / n. The distance value is important because it is used to calculate both the confidence interval and the prediction interval. With s = sqrt(MSE), where MSE is the mean squared error, the confidence interval is yhat +/- t(a/2, n-2) * s * sqrt(distance value) and the prediction interval is yhat +/- t(a/2, n-2) * s * sqrt(distance value + 1). The extra 1 under the square root shows that the prediction interval is always wider than the confidence interval.

The confidence interval is for the mean value of y when x = x0, while the prediction interval is for a single new observation at x0. The variation of a mean is smaller than the variation of an individual point: the variance of a mean is the variance divided by the sample size, var(xbar) = sigma^2/n, while the variance of a single point is just sigma^2. Even though the variances differ in size, both intervals are centered at yhat = B0hat + B1hat * x0.
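
To make the formulas concrete, here is a sketch of my own (the names d_val, s, and tcrit are mine, not from the class) that computes the 95% intervals at x0 = 120 pounds directly from the distance value; it should reproduce the predict() output further down.

x <- women$weight
n <- length(x)
x0 <- 120
SSxx <- sum(x^2) - sum(x)^2 / n                # SSxx as defined above
d_val <- 1/n + (x0 - mean(x))^2 / SSxx         # the "distance value" at x0
s <- sqrt(sum(resid(mod)^2) / (n - 2))         # s = sqrt(MSE)
tcrit <- qt(0.975, n - 2)                      # t(a/2) with n - 2 degrees of freedom
yhat <- coef(mod)[1] + coef(mod)[2] * x0       # fitted height at 120 pounds
yhat + c(-1, 1) * tcrit * s * sqrt(d_val)      # 95% confidence interval
yhat + c(-1, 1) * tcrit * s * sqrt(d_val + 1)  # 95% prediction interval

The confint and predict calls below give the slope interval and these same intervals at 120 pounds without the hand computation.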

confint(mod)
##                  2.5 %     97.5 %
## (Intercept) 23.4685789 27.9783326
## weight       0.2708562  0.3036423
newdata <- data.frame(weight = 120)                        # predict at x0 = 120 pounds
predy <- predict(mod, newdata, interval = "prediction")    # prediction interval for a new observation
predy
##        fit      lwr      upr
## 1 60.19336 59.17394 61.21279
confy <- predict(mod, newdata, interval = "confidence")    # confidence interval for the mean response
confy
##        fit      lwr      upr
## 1 60.19336 59.82527 60.56146

According to what confint gave us, we are 95% confident that the slope is between 0.271 and 0.304 inches per pound.
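
The same slope interval can be rebuilt by hand from the estimate and standard error in summary(mod); this is my own cross-check, not part of the class notes.

est <- summary(mod)$coefficients["weight", "Estimate"]    # 0.287249
se  <- summary(mod)$coefficients["weight", "Std. Error"]  # 0.007588
est + c(-1, 1) * qt(0.975, df = 13) * se                  # matches confint(mod)["weight", ]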

predy and confy give the prediction and confidence intervals for a woman's height when her weight is 120 pounds. As you can see, the intervals differ, with the confidence interval being narrower. We can also compute the widths of the two intervals in R.

confy %*% c(0, -1, 1)   # width of the confidence interval (upr - lwr)
##        [,1]
## 1 0.7361925
predy %*% c(0, -1, 1)   # width of the prediction interval (upr - lwr)
##       [,1]
## 1 2.038847

F Tests and Correlation

Section 3.7 looked at variation and correlation. The F statistic compares explained to unexplained variation:

F-stat = (explained variation / 1) / (unexplained variation / (n - 2)) ~ F(1, n - 2) under H0
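
To tie the formula to the output, here is a sketch of my own that rebuilds the F statistic from the explained and unexplained sums of squares in the ANOVA table for mod.

a <- anova(mod)                        # ANOVA table for the regression
SSR <- a["weight", "Sum Sq"]           # explained variation
SSE <- a["Residuals", "Sum Sq"]        # unexplained variation
Fstat <- (SSR / 1) / (SSE / (15 - 2))  # n = 15 women, so n - 2 = 13
Fstat                                  # about 1433, matching the summary below
pf(Fstat, 1, 13, lower.tail = FALSE)   # p-value, about 1.09e-14

The same F statistic and p-value appear in the summary output below.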

mod <- lm(height ~ weight)   # same fit as in Section 3.5, shown again for the F statistic
summary(mod)
## 
## Call:
## lm(formula = height ~ weight)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.83233 -0.26249  0.08314  0.34353  0.49790 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 25.723456   1.043746   24.64 2.68e-12 ***
## weight       0.287249   0.007588   37.85 1.09e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.44 on 13 degrees of freedom
## Multiple R-squared:  0.991,  Adjusted R-squared:  0.9903 
## F-statistic:  1433 on 1 and 13 DF,  p-value: 1.091e-14
cor(height, weight)   # sample correlation between height and weight
## [1] 0.9954948

As you can see from the summary, R-squared is 0.991, and the p-value for the F statistic is essentially 0 at 1.091e-14. Since the p-value is so small, we reject the null hypothesis and accept HA. The cor command returns 0.995, which is positive and close to 1, so height and weight have a strong positive linear association; squaring it gives R-squared (0.995^2 is about 0.991), so nearly all of the variation in height is explained by mod.
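
As a final check of my own on the link between r and R-squared, the squared correlation can be compared with the value reported by summary(mod):

r <- cor(women$height, women$weight)   # sample correlation r
r^2                                    # about 0.991
summary(mod)$r.squared                 # R-squared straight from the fitted model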