Below are the R commands and answers for the More Inference for Regression handout posted on Blackboard.
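Load the mosaic package, which provides read.file(), xyplot(), mplot(), and xqt(), and then load the data:
library(mosaic)
bills<-read.file("/home/emesekennedy/Data/bills.txt")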
## Reading data with read.table()
xyplot(gasbill~temp, data=bills)
The relationship between temperature and gas bill appears to be strong and negative. It looks fairly linear with a slight curve.
cor(gasbill~temp, data=bills)
## [1] -0.9267069
The correlation is \(r=-.9267\), which means that there is a strong negative linear relationship between the two variables.
y<-lm(gasbill~temp, data=bills)
y
##
## Call:
## lm(formula = gasbill ~ temp, data = bills)
##
## Coefficients:
## (Intercept) temp
## 228.896 -3.034
The equation of the least-squares regression line is \(\hat{y}=-3.034\text{temp}+228.896\).
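As a quick check on the equation, you can plug a temperature into it by hand. The sketch below uses the rounded coefficients printed above, so its answer differs slightly from the fitted value of 71.116 that predict() reports later, because predict() uses the unrounded estimates.

```r
# Coefficients copied (rounded) from the lm() output above
b0 <- 228.896   # intercept
b1 <- -3.034    # slope for temp

# Predicted gas bill at an average outdoor temperature of 52 degrees F
yhat_52 <- b0 + b1 * 52
print(yhat_52)  # 71.128
```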
mplot(y, which=1)
## [[1]]
The residual plot shows a slight curve, which indicates that the least-squares regression line does not capture all of the structure in the data.
mplot(y, which=2)
## [[1]]
The residuals are close to Normal, so it is appropriate to use the inference procedures from Chapter 10.
\(H_0: \rho=0\) (no correlation between temperature and gas bill)
\(H_a: \rho\ne0\)
summary(y)
##
## Call:
## lm(formula = gasbill ~ temp, data = bills)
##
## Residuals:
## Min 1Q Median 3Q Max
## -78.400 -15.868 -0.015 16.551 75.723
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 228.895 6.419 35.66 <2e-16 ***
## temp -3.034 0.123 -24.66 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 25.15 on 100 degrees of freedom
## Multiple R-squared: 0.8588, Adjusted R-squared: 0.8574
## F-statistic: 608.1 on 1 and 100 DF, p-value: < 2.2e-16
The test statistic is \(t=-24.66\), \(r^2=.8588\), and the \(P\)-value is less than \(2.2\times 10^{-16}\). (The \(t\) test for the slope in the summary output is equivalent to the test of \(H_0: \rho=0\).) We can conclude that the data provides strong evidence of a correlation between temperature and gas bill. The \(r^2\) value indicates that 85.88% of the variation in the gas bill is explained by the linear relationship with temperature.
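To see why the slope test answers the question about \(\rho\), here is a small sketch that recomputes the test statistic two ways using numbers copied from the output above. The first uses \(t = r\sqrt{n-2}/\sqrt{1-r^2}\). The second uses \(t = b_1/SE_{b_1}\). The sample size \(n=102\) is inferred from the 100 residual degrees of freedom, since \(df = n-2\).

```r
# Values copied from the output above
r <- -0.9267069   # sample correlation from cor()
n <- 102          # inferred from 100 residual degrees of freedom (df = n - 2)

# t statistic for H0: rho = 0, computed from the sample correlation
t_from_r <- r * sqrt(n - 2) / sqrt(1 - r^2)

# t statistic for H0: beta1 = 0, computed from the slope and its standard error
t_from_slope <- -3.034 / 0.123

# Squaring r should reproduce "Multiple R-squared" from summary(y)
r_squared <- r^2

print(c(t_from_r = t_from_r, t_from_slope = t_from_slope, r_squared = r_squared))
```

Both statistics agree with the printed \(t=-24.66\) up to rounding of the copied inputs.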
predict(y, data.frame(temp=52), interval="conf", level=.95)
## fit lwr upr
## 1 71.11614 66.08417 76.14812
The 95% confidence interval for the mean gas bill corresponding to an average outdoor temperature of \(52^\circ\) F is \((66.08, 76.15)\). This means that we are 95% confident that the mean gas bill when the average outdoor temperature is \(52^\circ\) F is between $66.08 and $76.15.
predict(y, data.frame(temp=52), interval="predict", level=.95)
## fit lwr upr
## 1 71.11614 20.966 121.2663
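The 95% prediction interval for a future gas bill when the average outdoor temperature is \(52^\circ\) F is \((20.97, 121.27)\). This means that we are 95% confident that the gas bill for a single month with an average outdoor temperature of \(52^\circ\) F will be between $20.97 and $121.27. The prediction interval is much wider than the confidence interval because it must account for the variability of individual bills around the mean, not just the uncertainty in the estimated mean.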
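The two intervals are linked. The prediction interval uses the standard error \(\sqrt{s^2 + SE_{\hat\mu}^2}\), where \(s\) is the residual standard error and \(SE_{\hat\mu}\) is the standard error of the estimated mean response. The sketch below recovers \(SE_{\hat\mu}\) from the width of the confidence interval printed above and then rebuilds the prediction interval. It relies only on numbers copied from the output and base R's qt(). Because \(s=25.15\) is rounded, the result matches predict() only to about two decimal places.

```r
# Numbers copied from the output above (temp = 52)
fit <- 71.11614                 # fitted mean gas bill
ci  <- c(66.08417, 76.14812)    # 95% confidence interval for the mean response
s   <- 25.15                    # residual standard error from summary(y)
tstar <- qt(0.975, df = 100)    # t* with n - 2 = 100 degrees of freedom

# Standard error of the estimated mean response, backed out of the CI half-width
se_mu <- (ci[2] - ci[1]) / (2 * tstar)

# Standard error for predicting one new observation: add the residual variance
se_pred <- sqrt(s^2 + se_mu^2)

# Rebuilt 95% prediction interval
pred_int <- fit + c(-1, 1) * tstar * se_pred
print(pred_int)
```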
Find \(t^*\):
xqt(.975, df=100)
## [1] 1.983972
-3.034-1.984*0.123
## [1] -3.278032
-3.034+1.984*0.123
## [1] -2.789968
The 95% confidence interval for the slope is \((-3.278, -2.790)\). This means that we are 95% confident that each \(1^\circ\) F increase in average outdoor temperature is associated with a decrease of between $2.79 and $3.28 in the mean gas bill.
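The same interval can be computed in one step with base R's qt(), which returns \(t^*\) without drawing the plot that xqt() produces. This is a sketch using the rounded slope and standard error from summary(y). With the fitted model object itself, confint(y, level = 0.95) would report the interval computed from the unrounded estimates.

```r
b1    <- -3.034    # estimated slope for temp (from summary(y))
se_b1 <- 0.123     # standard error of the slope
dfree <- 100       # residual degrees of freedom, n - 2

tstar <- qt(0.975, df = dfree)           # critical value for 95% confidence
ci_slope <- b1 + c(-1, 1) * tstar * se_b1
print(ci_slope)
```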
Repeat question 1 with the electricity bill.
xyplot(elecbill~temp, data=bills)
The scatterplot shows a lot of scatter, and there does not appear to be a strong relationship between outdoor temperature and electricity bill.
cor(elecbill~temp, data=bills)
## [1] 0.3143961
The correlation is \(r=.3144\), which means that there is a weak positive linear relationship between the average outdoor temperature and the electricity bill.
y<-lm(elecbill~temp, data=bills)
y
##
## Call:
## lm(formula = elecbill ~ temp, data = bills)
##
## Coefficients:
## (Intercept) temp
## 59.2082 0.3636
The equation of the least-squares regression line is \(\hat{y}=.3636\text{temp}+59.2082\).
mplot(y, which=1)
## [[1]]
The residuals seem fairly random, which indicates that the least-squares regression line captures the behavior of the data fairly well.
mplot(y, which=2)
## [[1]]
The residuals appear to be fairly close to Normal, so it is appropriate to use the inference procedures from Chapter 10.
\(H_0: \rho=0\) (no correlation between temperature and electricity bill)
\(H_a: \rho\ne0\)
summary(y)
##
## Call:
## lm(formula = elecbill ~ temp, data = bills)
##
## Residuals:
## Min 1Q Median 3Q Max
## -40.492 -18.870 0.748 16.279 57.127
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 59.2082 5.7278 10.337 < 2e-16 ***
## temp 0.3636 0.1098 3.312 0.00129 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22.44 on 100 degrees of freedom
## Multiple R-squared: 0.09884, Adjusted R-squared: 0.08983
## F-statistic: 10.97 on 1 and 100 DF, p-value: 0.00129
The test statistic is \(t=3.312\), \(r^2=.09884\), and the \(P\)-value is \(.00129\). We can conclude that the data provides evidence at the 1% significance level of a correlation between temperature and electricity bill. The \(r^2\) value indicates that only 9.88% of the variation in the electricity bill is explained by the linear relationship with temperature.
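The \(P\)-value in the summary is the two-sided tail area of the \(t\) distribution with 100 degrees of freedom beyond the observed statistic. Here is a short sketch that reproduces it with base R's pt(), using the rounded estimate and standard error copied from summary(y), and then compares it to the 1% significance level.

```r
# Slope estimate divided by its standard error (values from summary(y))
t_stat <- 0.3636 / 0.1098

# Two-sided P-value: twice the lower-tail area beyond -|t|
p_value <- 2 * pt(-abs(t_stat), df = 100)

# Decision at the 1% significance level
reject_at_1pct <- p_value < 0.01
print(c(t = t_stat, p = p_value))
print(reject_at_1pct)
```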
predict(y, data.frame(temp=52), interval="conf", level=.95)
## fit lwr upr
## 1 78.11482 73.62502 82.60462
The 95% confidence interval for the mean electricity bill corresponding to an average outdoor temperature of \(52^\circ\) F is \((73.63, 82.60)\). This means that we are 95% confident that the mean electricity bill when the average outdoor temperature is \(52^\circ\) F is between $73.63 and $82.60.
predict(y, data.frame(temp=52), interval="predict", level=.95)
## fit lwr upr
## 1 78.11482 33.36814 122.8615
The 95% prediction interval for a future electricity bill when the average outdoor temperature is \(52^\circ\) F is \((33.37, 122.86)\). This means that we are 95% confident that the electricity bill for a single month with an average outdoor temperature of \(52^\circ\) F will be between $33.37 and $122.86.
Find \(t^*\):
xqt(.975, df=100)
## [1] 1.983972
0.3636-1.984*0.1098
## [1] 0.1457568
0.3636+1.984*0.1098
## [1] 0.5814432
The 95% confidence interval for the slope is \((0.146, 0.581)\). This means that we are 95% confident that each \(1^\circ\) F increase in average outdoor temperature is associated with an increase of between $0.15 and $0.58 in the mean electricity bill.
Repeat question 1 to investigate the correlation between gas bill and electricity bill.
xyplot(elecbill~gasbill, data=bills)
The scatterplot looks fairly random with a slight curve. There is a weak curved relationship between gas bill and electricity bill.
cor(elecbill~gasbill, data=bills)
## [1] -0.1866627
The correlation is \(r=-.1867\), which means that there is a very weak negative linear relationship between the gas bill and the electricity bill.
y<-lm(elecbill~gasbill, data=bills)
y
##
## Call:
## lm(formula = elecbill ~ gasbill, data = bills)
##
## Coefficients:
## (Intercept) gasbill
## 82.16380 -0.06593
The equation of the least-squares regression line is \(\hat{y}=-.06593x+82.1638\), where \(x\) is the value of the gas bill.
mplot(y, which=1)
## [[1]]
The residuals seem fairly random, which indicates that the least-squares regression line captures the behavior of the data fairly well.
mplot(y, which=2)
## [[1]]
The Normal quantile plot of the residuals shows a slight curve. This means that inference procedures from Chapter 10 might not be accurate.
\(H_0: \rho=0\) (no correlation between gas bill and electricity bill)
\(H_a: \rho\ne0\)
summary(y)
##
## Call:
## lm(formula = elecbill ~ gasbill, data = bills)
##
## Residuals:
## Min 1Q Median 3Q Max
## -41.826 -21.436 1.485 17.151 54.509
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 82.16380 3.68504 22.3 <2e-16 ***
## gasbill -0.06593 0.03470 -1.9 0.0603 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 23.22 on 100 degrees of freedom
## Multiple R-squared: 0.03484, Adjusted R-squared: 0.02519
## F-statistic: 3.61 on 1 and 100 DF, p-value: 0.06031
The test statistic is \(t=-1.9\), \(r^2=.03484\), and the \(P\)-value is \(.06031\). We cannot reject the null hypothesis at the 5% significance level, so the data does not provide enough evidence to conclude that there is a correlation between the gas bill and the electricity bill. The \(r^2\) value indicates that only 3.48% of the variation in the electricity bill is explained by the linear relationship with the gas bill.
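The same conclusion follows from comparing \(|t|\) with the critical value \(t^*\) instead of comparing the \(P\)-value with 0.05. The sketch below computes \(t\) from the correlation printed above. It assumes \(n=102\), which is inferred from the 100 residual degrees of freedom.

```r
r <- -0.1866627   # sample correlation between gas bill and electricity bill
n <- 102          # inferred from 100 residual degrees of freedom (df = n - 2)

# t statistic for H0: rho = 0
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided 5% critical value and P-value
tstar   <- qt(0.975, df = n - 2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

# Reject H0 only if |t| exceeds t*; here it does not
reject <- abs(t_stat) > tstar
print(c(t = t_stat, tstar = tstar, p = p_value))
print(reject)
```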