Chapter 10: Inference for Regression

Example 1: Beer and Blood Alcohol

The following are the R commands and answers to the in-class handout.

1

(a)

Load the data and create a scatterplot:

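library(mosaic)  # provides read.file(), makeFun(), plotFun(), mplot(), and the formula interface used below
bac<-read.file("/home/emesekennedy/Data/Ch10/bac.txt")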
## Reading data with read.table()
xyplot(BAC~Beers, data=bac)

(b)

As the scatterplot shows, there is a fairly strong positive linear relationship between the number of beers consumed and blood alcohol content.

(c)

cor(BAC~Beers, data=bac)
## [1] 0.8943381

\(r=.8943\), which means that the linear relationship between the two variables is strong and positive since the \(r\) value is close to 1.

(d)

cor(BAC~Beers, data=bac)^2
## [1] 0.7998407

\(r^2=.7998\), which means that 79.98% of the variation in blood alcohol content can be explained by the least-squares regression of blood alcohol content on the number of beers.

(e)

Fit a regression line:

y<-lm(BAC~Beers, data=bac)

Display the slope and intercept of the line:

y
## 
## Call:
## lm(formula = BAC ~ Beers, data = bac)
## 
## Coefficients:
## (Intercept)        Beers  
##    -0.01270      0.01796

The least-squares regression line is \(\hat{y}=.01796\,\text{Beers}-.0127\).
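To see what the fitted line says in practice, the rounded coefficients from the printout can be plugged in by hand. The following is only a minimal sketch: `predict_bac` is a hypothetical helper name that is not part of the handout, and it uses the rounded coefficients, so its results differ slightly from what `predict()` returns on the fitted model.

```r
# Coefficients copied (rounded) from the lm() printout above.
intercept <- -0.01270
slope     <-  0.01796

# Hypothetical helper: predicted BAC for a given number of beers.
predict_bac <- function(beers) intercept + slope * beers

predict_bac(5)        # predicted BAC after 5 beers, about 0.0771
predict_bac(c(1, 9))  # the helper is vectorized over several beer counts
```

The slope is the part to interpret: each additional beer is associated with an increase of about .018 in predicted blood alcohol content.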

(f)

Create a function out of the regression line:

f<-makeFun(y)

Plot the line on the scatterplot:

plotFun(f(Beers)~Beers, data=bac, add=TRUE)

(g)

Plot the residuals:

mplot(y, which=1)
## [[1]]

The residuals are scattered randomly around zero with no clear pattern and no outliers, which means that the least-squares regression line fits the data well.

(h)

Create a Normal quantile plot of the residuals:

mplot(y, which=2)
## [[1]]

The residuals appear fairly close to Normal, so it is appropriate to use the inference procedures from Chapter 10 on the regression line.

2

(a)

\(H_0: \rho=0\) (no correlation between beers and BAC)

\(H_a: \rho>0\)
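In simple linear regression, testing \(\rho=0\) is equivalent to testing that the slope is zero, and the test statistic can be computed from the sample correlation alone as \(t=r\sqrt{n-2}/\sqrt{1-r^2}\). The sketch below checks this with the value of \(r\) from part 1(c). The sample size \(n=16\) is an inference from the 14 residual degrees of freedom reported by `summary(y)`, not a number stated in the handout.

```r
r <- 0.8943381   # sample correlation from part 1(c)
n <- 16          # assumed sample size: 14 residual degrees of freedom + 2

# t statistic for H0: rho = 0, computed from r alone
t_from_r <- r * sqrt(n - 2) / sqrt(1 - r^2)
t_from_r         # roughly 7.48, the t value reported for the Beers slope
```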

(b)

Get the statistics for the regression line:

summary(y)
## 
## Call:
## lm(formula = BAC ~ Beers, data = bac)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.027118 -0.017350  0.001773  0.008623  0.041027 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.012701   0.012638  -1.005    0.332    
## Beers        0.017964   0.002402   7.480 2.97e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02044 on 14 degrees of freedom
## Multiple R-squared:  0.7998, Adjusted R-squared:  0.7855 
## F-statistic: 55.94 on 1 and 14 DF,  p-value: 2.969e-06
2.969/2
## [1] 1.4845

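The value of the test statistic is \(t=7.48\). Because \(H_a\) is one-sided and \(t\) is positive, the \(P\)-value is half of the two-sided value reported by summary(), that is, \(2.969\times 10^{-6}/2=1.4845\times 10^{-6}\).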
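The same numbers can be reproduced directly from the Beers row of the coefficient table, which makes the link between the estimate, its standard error, and the \(P\)-value explicit. This is a sketch that copies the rounded values from the `summary(y)` output, so the results agree with the printout only up to rounding.

```r
estimate <- 0.017964   # slope estimate for Beers
std_err  <- 0.002402   # its standard error
df       <- 14         # residual degrees of freedom

# t statistic: how many standard errors the slope is away from 0
t_stat <- estimate / std_err

# One-sided P-value for Ha: rho > 0 (upper tail of the t distribution)
p_one_sided <- pt(t_stat, df = df, lower.tail = FALSE)

# Two-sided P-value, the quantity that summary() prints
p_two_sided <- 2 * p_one_sided

t_stat
p_one_sided
p_two_sided
```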

(c)

The \(P\)-value is very small (\(P<.0001\)), so we can reject the null hypothesis at the .01% significance level. The data provide very strong evidence that there is a positive correlation between the number of beers and blood alcohol content.

3

(a)

predict(y, data.frame(Beers=5), interval="confidence", level=.9)
##          fit        lwr       upr
## 1 0.07711821 0.06808261 0.0861538

The 90% confidence interval for the mean blood alcohol content corresponding to 5 beers is \((.068, .086)\), which means that we are 90% confident that the average blood alcohol content of all students who drink 5 beers is between .068 and .086.

(b)

predict(y, data.frame(Beers=5), interval="prediction", level=.9)
##          fit        lwr       upr
## 1 0.07711821 0.03999884 0.1142376

The 90% prediction interval for blood alcohol content corresponding to 5 beers is \((.04, .114)\). This means that we are 90% confident that an individual who drinks 5 beers will have a blood alcohol content between .04 and .114.
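The prediction interval is wider than the confidence interval because it adds the scatter of individual observations around the line to the uncertainty in the estimated mean: \(SE_{\hat{y}}^2=s^2+SE_{\hat{\mu}}^2\). The sketch below checks this relationship using only numbers printed above (the two intervals and the residual standard error). The variable names are illustrative, and the standard errors are backed out of the interval widths rather than taken from R's internal calculation.

```r
t_star <- qt(0.95, df = 14)   # critical value for a 90% interval with 14 df
s      <- 0.02044             # residual standard error from summary(y)

conf_int <- c(0.06808261, 0.0861538)   # 90% confidence interval at Beers = 5
pred_int <- c(0.03999884, 0.1142376)   # 90% prediction interval at Beers = 5

# Back out each standard error from its margin of error (half-width / t*)
se_mean <- diff(conf_int) / 2 / t_star
se_pred <- diff(pred_int) / 2 / t_star

se_pred
sqrt(s^2 + se_mean^2)   # should agree with se_pred up to rounding
```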

(c)

xyplot(BAC~Beers, data=bac, panel=panel.lmbands)

(d)

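No, the student cannot be confident that he won’t be arrested if he drives after 5 beers and is stopped. His predicted blood alcohol content is about .077, just under .08, and the 90% prediction interval \((.04, .114)\) extends well above .08, so there is a good chance that his blood alcohol content will be over .08.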
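A rough way to put a number on "a good chance" is to use the same \(t\) distribution that underlies the prediction interval. The sketch below is only an approximation under the regression model's assumptions (linear relationship, Normal errors with constant spread). It backs the prediction standard error out of the 90% prediction interval printed above, and the variable names are illustrative.

```r
fit <- 0.07711821                     # predicted BAC at Beers = 5
lwr <- 0.03999884; upr <- 0.1142376   # 90% prediction interval at Beers = 5
df  <- 14                             # residual degrees of freedom

# Standard error for predicting one individual's BAC
se_pred <- (upr - lwr) / 2 / qt(0.95, df)

# Approximate chance that an individual's BAC exceeds .08 after 5 beers
p_over <- pt((0.08 - fit) / se_pred, df, lower.tail = FALSE)
p_over
```

Because the predicted value sits just below .08, this estimate comes out a little under one half, which is far too large for the student to feel safe.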