Chapter 10 (cont.)

Example 3: Temperature and Utility Bills

Below are the R commands and answers for the More Inference for Regression handout posted on Blackboard.

1

(a)

Load the data:

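library(mosaic)  # provides read.file(), mplot(), xqt(), the formula interface to cor(), and attaches lattice for xyplot()
bills<-read.file("/home/emesekennedy/Data/bills.txt")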
## Reading data with read.table()

(b)

xyplot(gasbill~temp, data=bills)

The relationship between temperature and gas bill appears to be strong and negative. It looks fairly linear with a slight curve.

(c)

cor(gasbill~temp, data=bills)
## [1] -0.9267069

The correlation is \(r=-.9267\), which means that there is a strong negative linear relationship between the two variables.

(d)

y<-lm(gasbill~temp, data=bills)
y
## 
## Call:
## lm(formula = gasbill ~ temp, data = bills)
## 
## Coefficients:
## (Intercept)         temp  
##     228.896       -3.034

The equation of the least-squares regression line is \(\hat{y}=-3.034\text{temp}+228.896\).
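To see how the fitted line is used, here is a minimal base R sketch that plugs a temperature into the rounded coefficients printed above. The helper name `predict_gasbill` is hypothetical and introduced only for illustration. Because the coefficients are rounded to three decimals, the result differs slightly from a prediction made with the unrounded fit.

```r
# Rounded coefficients copied from the lm() output above
intercept <- 228.896
slope <- -3.034

# Hypothetical helper: predicted gas bill (dollars) at a given
# average outdoor temperature (degrees F)
predict_gasbill <- function(temp) {
  intercept + slope * temp
}

# Predicted gas bill at 52 degrees F
predict_gasbill(52)
```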

(e)

mplot(y, which=1)
## [[1]]

The residuals seem to have a slight curve, which indicates that the least-squares regression line does not capture some behavior in the data.

mplot(y, which=2)
## [[1]]

The residuals are close to Normal, so it is appropriate to use the inference procedures from Chapter 10.

(f)

\(H_0: \rho=0\) (no correlation between temperature and gas bill)

\(H_a: \rho\ne0\)

summary(y)
## 
## Call:
## lm(formula = gasbill ~ temp, data = bills)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -78.400 -15.868  -0.015  16.551  75.723 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  228.895      6.419   35.66   <2e-16 ***
## temp          -3.034      0.123  -24.66   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 25.15 on 100 degrees of freedom
## Multiple R-squared:  0.8588, Adjusted R-squared:  0.8574 
## F-statistic: 608.1 on 1 and 100 DF,  p-value: < 2.2e-16

The test statistic is \(t=-24.66\), \(r^2=.8588\), and the \(P\)-value is less than \(2.2\times 10^{-16}\). We can conclude that the data provide strong evidence that there is a correlation between temperature and gas bill. The \(r^2\) value indicates that 85.88% of the variation in the gas bill is explained by the regression line.
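In simple linear regression the test of \(H_0: \rho=0\) and the \(t\) test for the slope are the same test, with \(t = r\sqrt{n-2}/\sqrt{1-r^2}\). The sketch below recovers \(r^2\) and \(t\) from the correlation found in part (c). It assumes \(n-2=100\), taken from the residual degrees of freedom in the summary output, and uses only base R.

```r
r  <- -0.9267069  # correlation from part (c)
df <- 100         # residual degrees of freedom (n - 2), from summary(y)

r_squared <- r^2                          # proportion of variation explained
t_stat    <- r * sqrt(df) / sqrt(1 - r^2) # t statistic for H0: rho = 0
p_value   <- 2 * pt(-abs(t_stat), df)     # two-sided P-value

r_squared
t_stat
p_value
```

Both values should agree with the summary output up to rounding.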

(g)

predict(y, data.frame(temp=52), interval="conf", level=.95)
##        fit      lwr      upr
## 1 71.11614 66.08417 76.14812

The 95% confidence interval for the mean gas bill corresponding to an average outdoor temperature of \(52^\circ\) F is \((66.08, 76.15)\). This means that we are 95% confident that the mean gas bill when the average outdoor temperature is \(52^\circ\) F is between $66.08 and $76.15.

(h)

predict(y, data.frame(temp=52), interval="predict", level=.95)
##        fit    lwr      upr
## 1 71.11614 20.966 121.2663

The 95% prediction interval for a future gas bill when the average outdoor temperature is \(52^\circ\) F is \((20.97, 121.27)\). This means that there is a 95% chance that when the average outdoor temperature is \(52^\circ\) F, the gas bill will be between $20.97 and $121.27.
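The prediction interval is much wider than the confidence interval because it must also account for the scatter of individual bills about the line: \(SE_{\hat{y}}=\sqrt{s^2+SE_{\hat{\mu}}^2}\), where \(s\) is the residual standard error. The following base R sketch illustrates this by backing out \(SE_{\hat{\mu}}\) from the confidence interval in part (g) and rebuilding the prediction interval. The variable names are illustrative, and \(s=25.15\) is the rounded value from the summary output, so the result matches only up to rounding.

```r
fit    <- 71.11614   # fitted value at temp = 52
ci_lwr <- 66.08417   # 95% confidence limits for the mean response, part (g)
ci_upr <- 76.14812
s      <- 25.15      # residual standard error (rounded), from summary(y)
df     <- 100        # residual degrees of freedom

t_star  <- qt(0.975, df)                 # critical value for 95% intervals
se_mean <- (ci_upr - ci_lwr) / 2 / t_star # SE of the estimated mean response
se_pred <- sqrt(s^2 + se_mean^2)          # SE for predicting a single bill

pred_int <- fit + c(-1, 1) * t_star * se_pred
pred_int
```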

(i)

Find \(t^*\):

xqt(.975, df=100)

## [1] 1.983972
-3.034-1.984*0.123
## [1] -3.278032
-3.034+1.984*0.123
## [1] -2.789968

The 95% confidence interval for the slope is \((-3.278, -2.790)\). This means we are 95% confident that each \(1^\circ\) F increase in temperature is associated with an average decrease of between $2.79 and $3.28 in the gas bill.

2

Repeat question 1 with the electricity bill.

(b)

xyplot(elecbill~temp, data=bills)

The scatterplot looks random, and there doesn’t seem to be a relationship between outdoor temperature and electricity bill.

(c)

cor(elecbill~temp, data=bills)
## [1] 0.3143961

The correlation is \(r=.3144\), which means that there is a very weak positive linear relationship between the average outdoor temperature and the electricity bill.

(d)

y<-lm(elecbill~temp, data=bills)
y
## 
## Call:
## lm(formula = elecbill ~ temp, data = bills)
## 
## Coefficients:
## (Intercept)         temp  
##     59.2082       0.3636

The equation of the least-squares regression line is \(\hat{y}=.3636\text{temp}+59.2082\).

(e)

mplot(y, which=1)
## [[1]]

The residuals seem fairly random, which indicates that the least-squares regression line captures the behavior of the data fairly well.

mplot(y, which=2)
## [[1]]

The residuals appear to be fairly close to Normal, so it is appropriate to use the inference procedures from Chapter 10.

(f)

\(H_0: \rho=0\) (no correlation between temperature and electricity bill)

\(H_a: \rho\ne0\)

summary(y)
## 
## Call:
## lm(formula = elecbill ~ temp, data = bills)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -40.492 -18.870   0.748  16.279  57.127 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  59.2082     5.7278  10.337  < 2e-16 ***
## temp          0.3636     0.1098   3.312  0.00129 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.44 on 100 degrees of freedom
## Multiple R-squared:  0.09884,    Adjusted R-squared:  0.08983 
## F-statistic: 10.97 on 1 and 100 DF,  p-value: 0.00129

The test statistic is \(t=3.312\), \(r^2=.09884\), and the \(P\)-value is \(.00129\). We can conclude that the data provide evidence at the 1% significance level that there is a correlation between temperature and electricity bill. The \(r^2\) value indicates that only 9.88% of the variation in the electricity bill is explained by the regression line.

(g)

predict(y, data.frame(temp=52), interval="conf", level=.95)
##        fit      lwr      upr
## 1 78.11482 73.62502 82.60462

The 95% confidence interval for the mean electricity bill corresponding to an average outdoor temperature of \(52^\circ\) F is \((73.625, 82.605)\). This means that we are 95% confident that the mean electricity bill when the average outdoor temperature is \(52^\circ\) F is between $73.63 and $82.61.

(h)

predict(y, data.frame(temp=52), interval="predict", level=.95)
##        fit      lwr      upr
## 1 78.11482 33.36814 122.8615

The 95% prediction interval for a future electricity bill when the average outdoor temperature is \(52^\circ\) F is \((33.3681, 122.8615)\). This means that there is a 95% chance that when the average outdoor temperature is \(52^\circ\) F, the electricity bill will be between $33.37 and $122.86.

(i)

Find \(t^*\):

xqt(.975, df=100)

## [1] 1.983972
0.3636-1.984*0.1098
## [1] 0.1457568
0.3636+1.984*0.1098
## [1] 0.5814432

The 95% confidence interval for the slope is \((0.146, 0.581)\). This means we are 95% confident that each \(1^\circ\) F increase in temperature is associated with an average increase of between $0.15 and $0.58 in the electricity bill.
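The interval \(b_1 \pm t^* SE_{b_1}\) can also be computed in one step. The sketch below wraps the calculation in a small function. The name `slope_ci` is hypothetical, and base R `qt()` is used here in place of `xqt()`; it returns the same critical value without drawing a plot. Because it uses the unrounded \(t^*\), the endpoints may differ in the last digit from the hand calculation above.

```r
# Hypothetical helper: t-based confidence interval for a regression slope
slope_ci <- function(estimate, se, df, level = 0.95) {
  t_star <- qt(1 - (1 - level) / 2, df)  # upper critical value
  estimate + c(-1, 1) * t_star * se      # lower and upper endpoints
}

# Slope and standard error for temp, taken from summary(y) above
ci <- slope_ci(estimate = 0.3636, se = 0.1098, df = 100)
ci
```

When the fitted model object is available, `confint(y, "temp", level = 0.95)` should give the same interval directly from the unrounded estimates.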

3

Repeat question 1 to investigate the correlation between gas bill and electricity bill.

(b)

xyplot(elecbill~gasbill, data=bills)

The scatterplot looks fairly random with a slight curve. There is a weak curved relationship between gas bill and electricity bill.

(c)

cor(elecbill~gasbill, data=bills)
## [1] -0.1866627

The correlation is \(r=-.1867\), which means that there is a very weak negative linear relationship between the gas bill and the electricity bill.

(d)

y<-lm(elecbill~gasbill, data=bills)
y
## 
## Call:
## lm(formula = elecbill ~ gasbill, data = bills)
## 
## Coefficients:
## (Intercept)      gasbill  
##    82.16380     -0.06593

The equation of the least-squares regression line is \(\hat{y}=-.06593x+82.1638\), where \(x\) is the value of the gas bill.

(e)

mplot(y, which=1)
## [[1]]

The residuals seem fairly random, which indicates that the least-squares regression line captures the behavior of the data fairly well.

mplot(y, which=2)
## [[1]]

The Normal quantile plot of the residuals shows a slight curve. This means that inference procedures from Chapter 10 might not be accurate.

(f)

\(H_0: \rho=0\) (no correlation between gas bill and electricity bill)

\(H_a: \rho\ne0\)

summary(y)
## 
## Call:
## lm(formula = elecbill ~ gasbill, data = bills)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -41.826 -21.436   1.485  17.151  54.509 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 82.16380    3.68504    22.3   <2e-16 ***
## gasbill     -0.06593    0.03470    -1.9   0.0603 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 23.22 on 100 degrees of freedom
## Multiple R-squared:  0.03484,    Adjusted R-squared:  0.02519 
## F-statistic:  3.61 on 1 and 100 DF,  p-value: 0.06031

The test statistic is \(t=-1.9\), \(r^2=.03484\), and the \(P\)-value is \(.06031\). We cannot reject the null hypothesis at the 5% significance level, so the data do not provide enough evidence to conclude that there is a correlation between the gas bill and the electricity bill. The \(r^2\) value indicates that only 3.48% of the variation in the electricity bill is explained by the regression line.
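The decision above can be traced back to the coefficient table: the \(t\) statistic is the estimate divided by its standard error, and the two-sided \(P\)-value is twice the tail area beyond \(|t|\) under a \(t\) distribution with 100 degrees of freedom. A minimal base R sketch, using the rounded values from the summary output (variable names are illustrative):

```r
estimate <- -0.06593  # slope for gasbill, from summary(y)
se       <- 0.03470   # its standard error
df       <- 100       # residual degrees of freedom
alpha    <- 0.05      # significance level

t_stat  <- estimate / se               # test statistic
p_value <- 2 * pt(-abs(t_stat), df)    # two-sided P-value

t_stat
p_value
p_value < alpha   # TRUE would mean rejecting H0 at the 5% level
```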