STT 430 Homework 11

Jessica Stuart

Section 12.2, #16. The article “Characterization of Highway Runoff in Austin, Texas, Area” gave a scatter plot, along with the least squares line, of x = rainfall volume and y = runoff volume for a particular location. The values below were read from the plot.

a) Does a scatter plot of the data support the use of the simple linear regression model?

rainfall.vol <- c(5, 12, 14, 17, 23, 30, 40, 47, 55, 67, 72, 81, 96, 112, 127)
runoff.vol <- c(4, 10, 13, 15, 15, 25, 27, 46, 38, 46, 53, 70, 82, 99, 100)
# Rainfall is the predictor (x-axis); runoff is the response (y-axis)
plot(rainfall.vol, runoff.vol, xlab = "Rainfall Volume", ylab = "Runoff Volume", 
    main = "Highway Runoff in Austin, Texas, Area")

[Figure: scatter plot of runoff volume against rainfall volume]

The points on the scatter plot fall close to a straight line with no obvious curvature, which supports the use of the simple linear regression model.

b) Calculate point estimates of the slope and intercept of the population regression line.

fit <- lm(runoff.vol ~ rainfall.vol)
summary(fit)
## 
## Call:
## lm(formula = runoff.vol ~ rainfall.vol)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -8.28  -4.42   1.21   3.15   8.26 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -1.1283     2.3678   -0.48     0.64    
## rainfall.vol   0.8270     0.0365   22.64  7.9e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.24 on 13 degrees of freedom
## Multiple R-squared:  0.975,  Adjusted R-squared:  0.973 
## F-statistic:  513 on 1 and 13 DF,  p-value: 7.9e-12

For the population regression line, the point estimate of the slope is 0.8270 and the point estimate of the intercept is -1.1283.
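
As a cross-check on the `lm()` output, the least squares estimates can also be computed directly from the sums of squares, \( \hat\beta_1 = S_{xy}/S_{xx} \) and \( \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} \). This is only a sketch; the names `Sxx`, `Sxy`, `b1`, and `b0` are introduced here for illustration and do not appear in the original solution.

```r
rainfall.vol <- c(5, 12, 14, 17, 23, 30, 40, 47, 55, 67, 72, 81, 96, 112, 127)
runoff.vol <- c(4, 10, 13, 15, 15, 25, 27, 46, 38, 46, 53, 70, 82, 99, 100)

# Centered sums of squares and cross-products
Sxx <- sum((rainfall.vol - mean(rainfall.vol))^2)
Sxy <- sum((rainfall.vol - mean(rainfall.vol)) * (runoff.vol - mean(runoff.vol)))

# Least squares slope and intercept
b1 <- Sxy / Sxx
b0 <- mean(runoff.vol) - b1 * mean(rainfall.vol)
c(intercept = b0, slope = b1)
```

These values should agree with the `Estimate` column of the coefficient table above.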

c) Calculate a point estimate of the true average runoff volume when rainfall volume is 50.

est <- c(fit$coefficients[1] + fit$coefficients[2] * 50)
est
## (Intercept) 
##       40.22

When rainfall volume is 50, the estimate of the true average runoff volume is 40.22.
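
The same point estimate can likely be obtained with `predict()`, which avoids indexing the coefficients by position. A minimal sketch, assuming the model is refit from the data above; the name `est.predict` is introduced here for illustration.

```r
rainfall.vol <- c(5, 12, 14, 17, 23, 30, 40, 47, 55, 67, 72, 81, 96, 112, 127)
runoff.vol <- c(4, 10, 13, 15, 15, 25, 27, 46, 38, 46, 53, 70, 82, 99, 100)
fit <- lm(runoff.vol ~ rainfall.vol)

# The column name in newdata must match the predictor name used in the formula
est.predict <- predict(fit, newdata = data.frame(rainfall.vol = 50))
est.predict
```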

d) Calculate a point estimate of the standard deviation \( \sigma \).

summary(fit)$sigma
## [1] 5.24

The estimate of the standard deviation \( \sigma \) is 5.24.
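
For reference, this estimate is \( s = \sqrt{SSE/(n-2)} \), the square root of the residual sum of squares divided by the residual degrees of freedom. The sketch below recomputes it by hand; the names `sse` and `s` are introduced here for illustration.

```r
rainfall.vol <- c(5, 12, 14, 17, 23, 30, 40, 47, 55, 67, 72, 81, 96, 112, 127)
runoff.vol <- c(4, 10, 13, 15, 15, 25, 27, 46, 38, 46, 53, 70, 82, 99, 100)
fit <- lm(runoff.vol ~ rainfall.vol)

# Residual sum of squares, then divide by n - 2 degrees of freedom
sse <- sum(residuals(fit)^2)
s <- sqrt(sse / (length(runoff.vol) - 2))
s
```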

e) What proportion of the observed variation in runoff volume can be attributed to the simple linear regression relationship between runoff and rainfall?

summary(fit)$r.squared
## [1] 0.9753

The proportion of the observed variation in runoff volume that can be attributed to the simple linear regression relationship between runoff and rainfall is 0.9753.
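
The coefficient of determination is \( r^2 = 1 - SSE/SST \), where \( SST \) is the total variation in runoff volume about its mean. A minimal sketch of that calculation; the names `sse`, `sst`, and `r2` are introduced here for illustration.

```r
rainfall.vol <- c(5, 12, 14, 17, 23, 30, 40, 47, 55, 67, 72, 81, 96, 112, 127)
runoff.vol <- c(4, 10, 13, 15, 15, 25, 27, 46, 38, 46, 53, 70, 82, 99, 100)
fit <- lm(runoff.vol ~ rainfall.vol)

sse <- sum(residuals(fit)^2)                  # variation left unexplained by the line
sst <- sum((runoff.vol - mean(runoff.vol))^2) # total variation in runoff volume
r2 <- 1 - sse / sst
r2
```

In simple linear regression this is also the squared sample correlation, `cor(rainfall.vol, runoff.vol)^2`.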

Section 12.3, #38. The following data is representative of that reported in the article “An Experimental Correlation of Oxides of Nitrogen Emissions from Power Boilers Based on Field Data”, with x = burner-area liberation rate and y = oxides of nitrogen emission rate.

a) Does the simple linear regression model specify a useful relationship between the two rates? Use the appropriate test procedure to obtain information about the P-value, and then reach a conclusion at significance level 0.01.

x <- c(100, 125, 125, 150, 150, 200, 200, 250, 250, 300, 300, 350, 400, 400)
y <- c(150, 140, 180, 210, 190, 320, 280, 400, 430, 440, 390, 600, 610, 670)
plot(x, y)

[Figure: scatter plot of oxides of nitrogen emission rate (y) against burner-area liberation rate (x)]

# Emission rate (y) is the response; liberation rate (x) is the predictor
fit <- lm(y ~ x)
summary(fit)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -77.88 -26.20   5.23  24.12  47.69 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -45.5519    25.4678   -1.79    0.099 .  
## x             1.7114     0.0997   17.17  8.2e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 36.7 on 12 degrees of freedom
## Multiple R-squared:  0.961,  Adjusted R-squared:  0.958 
## F-statistic:  295 on 1 and 12 DF,  p-value: 8.23e-10

Based on the scatter plot, a linear relationship looks plausible, since the points fall close to a straight line. The formal check is the model utility test of \( H_0: \beta_1 = 0 \) against \( H_a: \beta_1 \neq 0 \). The t statistic for the slope is 17.17 on 12 degrees of freedom, with a P-value of 8.2e-10. This is far smaller than 0.01, so at the 1% significance level we reject the null hypothesis and conclude that the simple linear regression model specifies a useful relationship between the two rates.
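
The t statistic and P-value for the slope can be reproduced by hand as \( t = \hat\beta_1 / s_{\hat\beta_1} \) with \( s_{\hat\beta_1} = s/\sqrt{S_{xx}} \), referred to a t distribution with \( n - 2 \) degrees of freedom. The sketch below regresses emission rate on liberation rate; the names `liberation`, `emission`, `t.stat`, and `p.value` are introduced here for illustration.

```r
liberation <- c(100, 125, 125, 150, 150, 200, 200, 250, 250, 300, 300, 350, 400, 400)
emission <- c(150, 140, 180, 210, 190, 320, 280, 400, 430, 440, 390, 600, 610, 670)
n <- length(liberation)

# Least squares estimates from the sums of squares
Sxx <- sum((liberation - mean(liberation))^2)
Sxy <- sum((liberation - mean(liberation)) * (emission - mean(emission)))
b1 <- Sxy / Sxx
b0 <- mean(emission) - b1 * mean(liberation)

# Residual standard deviation and standard error of the slope
sse <- sum((emission - (b0 + b1 * liberation))^2)
s <- sqrt(sse / (n - 2))
se.b1 <- s / sqrt(Sxx)

# Two-sided test of H0: slope = 0
t.stat <- b1 / se.b1
p.value <- 2 * pt(-abs(t.stat), df = n - 2)
c(t = t.stat, p = p.value)
```

In simple linear regression the slope t statistic does not depend on which variable is treated as the response, so the test conclusion is the same in either direction.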

b) Compute a 95% CI for the expected change in emission rate associated with a 10 MBtu/hr-ft² increase in liberation rate.

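# The slope of interest is for emission rate (y) regressed on liberation rate (x)
fit <- lm(y ~ x)
tble <- summary(fit)$coefficients
slope.mean <- tble[2, 1]
slope.se <- tble[2, 2]
df <- fit$df.residual
critval <- abs(qt(0.025, df))

# Multiply by 10 because the question asks about a 10-unit increase in x
CIlow <- 10 * (slope.mean - critval * slope.se)
CIup <- 10 * (slope.mean + critval * slope.se)

A 95% confidence interval for the expected change in emission rate associated with a 10-unit increase in liberation rate is (14.94, 19.29).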
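
An interval of this kind can also be obtained with the built-in `confint()`. This is a minimal sketch, assuming the model regresses emission rate (y) on liberation rate (x) and that the slope interval is scaled by 10 to match the 10-unit increase; the names `fit.yx` and `ci10` are introduced here for illustration.

```r
x <- c(100, 125, 125, 150, 150, 200, 200, 250, 250, 300, 300, 350, 400, 400)
y <- c(150, 140, 180, 210, 190, 320, 280, 400, 430, 440, 390, 600, 610, 670)
fit.yx <- lm(y ~ x)

# 95% CI for the slope, scaled to a 10-unit increase in liberation rate
ci10 <- 10 * confint(fit.yx, "x", level = 0.95)
ci10
```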