Section 12.2 Problem 16: The article gave a scatter plot, along with the least squares line, of x = rainfall volume (m^3) and y = runoff volume (m^3) for a particular location. The accompanying values were read from the plot.
x <- c(5, 12, 14, 17, 23, 30, 40, 47, 55, 67, 72, 81, 96, 112, 127)
y <- c(4, 10, 13, 15, 15, 25, 27, 46, 38, 46, 53, 70, 82, 99, 100)
a.) Does a scatter plot of the data support the use of the simple linear regression model? Scatter plot:
plot(x, y, xlab = "Rainfall Volume (m^3)", ylab = "Runoff Volume", pch = 19,
main = "Scatter Plot", col = "blue")
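The scatter plot shows a roughly linear relationship, so the simple linear regression model appears reasonable. A summary of x and the sample correlation between x and y:
summary(x)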
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.0 20.0 47.0 53.2 76.5 127.0
cor(x, y)
## [1] 0.9876
b.) Fitting a linear regression model and observing the regression output, we see:
fit <- lm(y ~ x)
summary(fit)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.28 -4.42 1.21 3.15 8.26
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.1283 2.3678 -0.48 0.64
## x 0.8270 0.0365 22.64 7.9e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.24 on 13 degrees of freedom
## Multiple R-squared: 0.975, Adjusted R-squared: 0.973
## F-statistic: 513 on 1 and 13 DF, p-value: 7.9e-12
c.) Calculate a point estimate of the true average runoff volume when rainfall volume is 50.
plot(y ~ x, xlab = "Rainfall Volume (m^3)", ylab = "Runoff Volume", pch = 19,
main = "Scatter Plot", col = "blue")
abline(fit)
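Note that the estimated regression line is y = -1.1283 + 0.8270x. Replacing x with 50 yields y = -1.1283 + 41.3487 = 40.2204 (using the unrounded slope). In R:
# store the estimate in y_hat so the data vector y is not overwritten
y_hat <- fit$coefficients[1] + fit$coefficients[2] * 50
y_hat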
## (Intercept)
## 40.22
So the point estimate of the true average runoff volume when rainfall volume is 50 is about 40.22 m^3.
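The same estimate can be obtained with `predict()`, which avoids combining the coefficients by hand. This is a minimal sketch that refits the model from the data above; the name `y_hat` is chosen here for illustration only.

```r
# Rainfall (x) and runoff (y) values from Problem 16
x <- c(5, 12, 14, 17, 23, 30, 40, 47, 55, 67, 72, 81, 96, 112, 127)
y <- c(4, 10, 13, 15, 15, 25, 27, 46, 38, 46, 53, 70, 82, 99, 100)
fit <- lm(y ~ x)

# Estimated mean runoff volume at rainfall volume x = 50
y_hat <- unname(predict(fit, newdata = data.frame(x = 50)))
y_hat
```

The column name in `newdata` must match the predictor name used in the model formula (`x` here).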
d.) Calculate a point estimate of the standard deviation sigma.
summary(fit)$sigma
## [1] 5.24
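As a check on where this value comes from, the estimate of sigma is the square root of SSE / (n - 2). The sketch below recomputes it from the residuals; `sse`, `n`, and `sigma_hat` are illustrative names, not part of the original solution.

```r
# Rainfall (x) and runoff (y) values from Problem 16
x <- c(5, 12, 14, 17, 23, 30, 40, 47, 55, 67, 72, 81, 96, 112, 127)
y <- c(4, 10, 13, 15, 15, 25, 27, 46, 38, 46, 53, 70, 82, 99, 100)
fit <- lm(y ~ x)

sse <- sum(residuals(fit)^2)      # error sum of squares
n <- length(y)                    # number of observations
sigma_hat <- sqrt(sse / (n - 2))  # n - 2 degrees of freedom in simple linear regression
sigma_hat
```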
e.) What proportion of the observed variation in runoff volume can be attributed to the simple linear regression relationship between runoff and rainfall?
summary(fit)$r.squared
## [1] 0.9753
summary(fit)$adj.r.squared
## [1] 0.9734
Approximately 97.5% (r^2 = 0.9753) of the observed variation in runoff volume can be attributed to the simple linear regression relationship between runoff and rainfall.
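The coefficient of determination can also be computed directly as 1 - SSE/SST, which in simple linear regression equals the squared sample correlation. A short sketch, with `sse`, `sst`, and `r2` as illustrative names:

```r
# Rainfall (x) and runoff (y) values from Problem 16
x <- c(5, 12, 14, 17, 23, 30, 40, 47, 55, 67, 72, 81, 96, 112, 127)
y <- c(4, 10, 13, 15, 15, 25, 27, 46, 38, 46, 53, 70, 82, 99, 100)
fit <- lm(y ~ x)

sse <- sum(residuals(fit)^2)   # variation left unexplained by the line
sst <- sum((y - mean(y))^2)    # total variation in y
r2 <- 1 - sse / sst            # proportion of variation explained
r2
```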
Section 12.4 Problem 38
Refer to the data on x = liberation rate and y = NO3 emission rate given in Exercise 19:
x <- c(100, 125, 125, 150, 150, 200, 200, 250, 250, 300, 300, 350, 400, 400)
y <- c(150, 140, 180, 210, 190, 320, 280, 400, 430, 440, 390, 600, 610, 670)
a.) Does the simple linear regression model specify a useful relationship between the two rates? Use the appropriate test procedure to obtain information about the P-value and then reach a conclusion at significance level .01.
plot(x, y, xlab = "Burner area Liberation Rate (MBtu/hr-ft^2)", ylab = "No3 emission rate (ppm)",
pch = 19, main = "Scatter Plot", col = "gold")
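The scatter plot shows a roughly linear relationship, so a straight-line fit appears reasonable. A summary of x and the sample correlation between x and y:
summary(x)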
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 100 150 225 236 300 400
cor(x, y)
## [1] 0.9802
fit <- lm(y ~ x)
summary(fit)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -77.88 -26.20 5.23 24.12 47.69
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -45.5519 25.4678 -1.79 0.099 .
## x 1.7114 0.0997 17.17 8.2e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 36.7 on 12 degrees of freedom
## Multiple R-squared: 0.961, Adjusted R-squared: 0.958
## F-statistic: 295 on 1 and 12 DF, p-value: 8.23e-10
plot(y ~ x, xlab = "Burner area Liberation Rate (MBtu/hr-ft^2)", ylab = "No3 emission rate (ppm)",
pch = 19, main = "Scatter Plot", col = "gold")
abline(fit)
The test is of H0: beta1 = 0 against Ha: beta1 != 0. Since the p-value for the slope is 8.23e-10, which is much smaller than 0.01, we reject the null hypothesis and conclude that the simple linear regression model specifies a useful linear relationship between liberation rate and emission rate.
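The t statistic and two-sided p-value in the regression output can be reproduced from the coefficient table. This is a sketch for illustration; `coefs`, `t_stat`, and `p_value` are names introduced here.

```r
# Liberation rate (x) and emission rate (y) values from Problem 38
x <- c(100, 125, 125, 150, 150, 200, 200, 250, 250, 300, 300, 350, 400, 400)
y <- c(150, 140, 180, 210, 190, 320, 280, 400, 430, 440, 390, 600, 610, 670)
fit <- lm(y ~ x)

coefs <- summary(fit)$coefficients
# t = (estimated slope - 0) / (standard error of the slope)
t_stat <- coefs["x", "Estimate"] / coefs["x", "Std. Error"]
# two-sided p-value from the t distribution with n - 2 degrees of freedom
p_value <- 2 * pt(abs(t_stat), df = fit$df.residual, lower.tail = FALSE)
c(t = t_stat, p = p_value)
```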
b.) Compute a 95% CI for the expected change in emission rate associated with a 10 MBtu/hr-ft^2 increase in liberation rate.
tble <- summary(fit)$coefficients
slope.mean <- tble[2, 1]
slope.se <- tble[2, 2]
df <- fit$df.residual
critval <- abs(qt(0.025, df))
CIlow <- slope.mean - critval * slope.se
CIup <- slope.mean + critval * slope.se
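A 95% CI for the slope is (1.4942, 1.9286). The expected change associated with a 10 MBtu/hr-ft^2 increase is 10 times the slope, so the corresponding 95% CI is (14.942, 19.286) ppm.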
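The built-in `confint()` function gives the same interval for the slope, and multiplying by 10 rescales it to a 10-unit increase. A minimal sketch, with `ci_slope` and `ci_10` as illustrative names:

```r
# Liberation rate (x) and emission rate (y) values from Problem 38
x <- c(100, 125, 125, 150, 150, 200, 200, 250, 250, 300, 300, 350, 400, 400)
y <- c(150, 140, 180, 210, 190, 320, 280, 400, 430, 440, 390, 600, 610, 670)
fit <- lm(y ~ x)

# 95% CI for the slope (expected change per 1-unit increase in x)
ci_slope <- confint(fit, "x", level = 0.95)
# 95% CI for the expected change per 10-unit increase in x
ci_10 <- 10 * ci_slope
ci_10
```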