Section 12.2, #16. The article “Characterization of Highway Runoff in Austin, Texas, Area” gave a scatter plot, along with the least squares line, of x = rainfall volume and y = runoff volume for a particular location. The following values were read from the plot.
a) Does a scatter plot of the data support the use of the simple linear regression model?
rainfall.vol <- c(5, 12, 14, 17, 23, 30, 40, 47, 55, 67, 72, 81, 96, 112, 127)
runoff.vol <- c(4, 10, 13, 15, 15, 25, 27, 46, 38, 46, 53, 70, 82, 99, 100)
# x = rainfall volume goes on the horizontal axis, y = runoff volume on the vertical axis
plot(rainfall.vol, runoff.vol, xlab = "Rainfall Volume", ylab = "Runoff Volume",
     main = "Highway Runoff in Austin, Texas, Area")
Most of the data points on the scatter plot appear to lie on a straight line, supporting the use of the simple linear regression model.
b) Calculate point estimates of the slope and intercept of the population regression line.
fit <- lm(runoff.vol ~ rainfall.vol)
summary(fit)
##
## Call:
## lm(formula = runoff.vol ~ rainfall.vol)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.28 -4.42 1.21 3.15 8.26
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.1283 2.3678 -0.48 0.64
## rainfall.vol 0.8270 0.0365 22.64 7.9e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.24 on 13 degrees of freedom
## Multiple R-squared: 0.975, Adjusted R-squared: 0.973
## F-statistic: 513 on 1 and 13 DF, p-value: 7.9e-12
For the population regression line, the point estimate of the slope is 0.8270 and the point estimate of the intercept is -1.1283.
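As a cross-check on the `lm()` output, the least squares estimates can also be computed from the usual formulas \( \hat\beta_1 = S_{xy}/S_{xx} \) and \( \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} \). The following is a minimal sketch; the names `Sxx`, `Sxy`, `b0`, and `b1` are introduced here for illustration and do not appear in the original solution.

```r
rainfall.vol <- c(5, 12, 14, 17, 23, 30, 40, 47, 55, 67, 72, 81, 96, 112, 127)
runoff.vol <- c(4, 10, 13, 15, 15, 25, 27, 46, 38, 46, 53, 70, 82, 99, 100)

# Corrected sum of squares for x and corrected sum of cross-products
Sxx <- sum((rainfall.vol - mean(rainfall.vol))^2)
Sxy <- sum((rainfall.vol - mean(rainfall.vol)) * (runoff.vol - mean(runoff.vol)))

# Least squares slope and intercept (hypothetical names b1, b0)
b1 <- Sxy / Sxx
b0 <- mean(runoff.vol) - b1 * mean(rainfall.vol)
c(intercept = b0, slope = b1)
```

The two values should agree with the `Estimate` column of the coefficient table up to rounding.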
c) Calculate a point estimate of the true average runoff volume when rainfall volume is 50.
est <- c(fit$coefficients[1] + fit$coefficients[2] * 50)
est
## (Intercept)
## 40.22
When rainfall volume is 50, the estimate of the true average runoff volume is 40.22.
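The same point estimate can be obtained with `predict()`, which avoids indexing the coefficient vector by position and does not carry over the stray `(Intercept)` label seen in the output. This is only an alternative sketch; `est.predict` is a name assumed here for illustration.

```r
rainfall.vol <- c(5, 12, 14, 17, 23, 30, 40, 47, 55, 67, 72, 81, 96, 112, 127)
runoff.vol <- c(4, 10, 13, 15, 15, 25, 27, 46, 38, 46, 53, 70, 82, 99, 100)
fit <- lm(runoff.vol ~ rainfall.vol)

# The column name in newdata must match the predictor name used in the formula
est.predict <- predict(fit, newdata = data.frame(rainfall.vol = 50))
est.predict
```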
d) Calculate a point estimate of the standard deviation \( \sigma \).
summary(fit)$sigma
## [1] 5.24
The estimate of the standard deviation \( \sigma \) is 5.24.
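For reference, the value reported by `summary(fit)$sigma` is \( s = \sqrt{\mathrm{SSE}/(n-2)} \), where SSE is the sum of squared residuals. A short sketch of that computation follows; the names `sse`, `n`, and `s` are assumptions made for this example.

```r
rainfall.vol <- c(5, 12, 14, 17, 23, 30, 40, 47, 55, 67, 72, 81, 96, 112, 127)
runoff.vol <- c(4, 10, 13, 15, 15, 25, 27, 46, 38, 46, 53, 70, 82, 99, 100)
fit <- lm(runoff.vol ~ rainfall.vol)

n <- length(runoff.vol)
sse <- sum(residuals(fit)^2)   # error sum of squares
s <- sqrt(sse / (n - 2))       # n - 2 degrees of freedom in simple linear regression
s
```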
e) What proportion of the observed variation in runoff volume can be attributed to the simple linear regression relationship between runoff and rainfall?
summary(fit)$r.squared
## [1] 0.9753
The proportion of the observed variation in runoff volume that can be attributed to the simple linear regression relationship between runoff and rainfall is 0.9753.
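The coefficient of determination can be verified by hand as \( r^2 = 1 - \mathrm{SSE}/\mathrm{SST} \); in simple linear regression it also equals the squared sample correlation between x and y. The sketch below uses the illustrative names `sse`, `sst`, and `r2`, which are not part of the original solution.

```r
rainfall.vol <- c(5, 12, 14, 17, 23, 30, 40, 47, 55, 67, 72, 81, 96, 112, 127)
runoff.vol <- c(4, 10, 13, 15, 15, 25, 27, 46, 38, 46, 53, 70, 82, 99, 100)
fit <- lm(runoff.vol ~ rainfall.vol)

sse <- sum(residuals(fit)^2)                   # unexplained variation
sst <- sum((runoff.vol - mean(runoff.vol))^2)  # total variation in y
r2 <- 1 - sse / sst
c(r2 = r2, cor.squared = cor(rainfall.vol, runoff.vol)^2)
```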
Section 12.3, #38. The following data is representative of that reported in the article “An Experimental Correlation of Oxides of Nitrogen Emissions from Power Boilers Based on Field Data”, with x = burner-area liberation rate and y = oxides of nitrogen emission rate.
a) Does the simple linear regression model specify a useful relationship between the two rates? Use the appropriate test procedure to obtain information about the P-value, and then reach a conclusion at significance level 0.01.
x <- c(100, 125, 125, 150, 150, 200, 200, 250, 250, 300, 300, 350, 400, 400)
y <- c(150, 140, 180, 210, 190, 320, 280, 400, 430, 440, 390, 600, 610, 670)
plot(x, y)
fit <- lm(y ~ x)  # response y = emission rate, predictor x = liberation rate
summary(fit)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -77.88 -26.20 5.23 24.12 47.69
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -45.5519 25.4678 -1.79 0.099 .
## x 1.7114 0.0997 17.17 8.2e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 36.7 on 12 degrees of freedom
## Multiple R-squared: 0.961, Adjusted R-squared: 0.958
## F-statistic: 295 on 1 and 12 DF, p-value: 8.23e-10
Based on the scatter plot, a linear relationship between the two rates is plausible, since most of the data points lie close to a straight line. To test this formally, take \( H_0: \beta_1 = 0 \) against \( H_a: \beta_1 \neq 0 \). The t statistic for the slope is 17.17 on 12 degrees of freedom, with a P-value of 8.2e-10, which is far smaller than 0.01. At the 1% significance level we therefore reject \( H_0 \) and conclude that the simple linear regression model specifies a useful relationship between the two rates.
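The model utility test can be reproduced step by step: divide the estimated slope by its standard error and compare the result with a t distribution on n - 2 degrees of freedom. The following sketch regresses emission rate on liberation rate; the names `fit.yx`, `t.stat`, `df.resid`, `p.value`, and `reject` are assumptions made for this example.

```r
x <- c(100, 125, 125, 150, 150, 200, 200, 250, 250, 300, 300, 350, 400, 400)
y <- c(150, 140, 180, 210, 190, 320, 280, 400, 430, 440, 390, 600, 610, 670)
fit.yx <- lm(y ~ x)  # y = emission rate is the response

coefs <- summary(fit.yx)$coefficients
t.stat <- coefs["x", "Estimate"] / coefs["x", "Std. Error"]
df.resid <- fit.yx$df.residual

# Two-sided P-value for H0: beta1 = 0
p.value <- 2 * pt(-abs(t.stat), df.resid)
reject <- p.value < 0.01
c(t = t.stat, df = df.resid, p = p.value)
```

In simple linear regression the t statistic for the slope is the same whichever variable is treated as the response, so the P-value does not depend on the direction of the fit; the slope estimate itself does.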
b) Compute a 95% CI for the expected change in emission rate associated with a 10 MBtu/hr-ft² increase in liberation rate.
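# The expected change in emission rate (y) per unit of liberation rate (x) is the
# slope of y regressed on x, so the model must be y ~ x
fit.yx <- lm(y ~ x)
tble <- summary(fit.yx)$coefficients
slope.mean <- tble[2, 1]
slope.se <- tble[2, 2]
df <- fit.yx$df.residual
critval <- abs(qt(0.025, df))
# Multiply by 10 because the question asks about a 10-unit increase in liberation rate
CIlow <- 10 * (slope.mean - critval * slope.se)
CIup <- 10 * (slope.mean + critval * slope.se)
A 95% confidence interval for the expected change in emission rate associated with a 10-unit increase in liberation rate is (14.94, 19.29).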
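As a cross-check, `confint()` returns the confidence interval for the slope directly. Since the question concerns the change in emission rate (y) for a 10-unit increase in liberation rate (x), the sketch below fits y on x and multiplies the slope interval by 10. The names `fit.yx` and `ci10` are hypothetical and chosen only for this example.

```r
x <- c(100, 125, 125, 150, 150, 200, 200, 250, 250, 300, 300, 350, 400, 400)
y <- c(150, 140, 180, 210, 190, 320, 280, 400, 430, 440, 390, 600, 610, 670)
fit.yx <- lm(y ~ x)

# 95% CI for the slope, scaled to a 10-unit increase in x
ci10 <- 10 * confint(fit.yx, "x", level = 0.95)
ci10
```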