Homework 11

Kellie Stilson

Section 12.2 Problem 16: The article gave a scatter plot, along with the least squares line, of x = rainfall volume (m³) and y = runoff volume (m³) for a particular location. The accompanying values were read from the plot.

x <- c(5, 12, 14, 17, 23, 30, 40, 47, 55, 67, 72, 81, 96, 112, 127)
y <- c(4, 10, 13, 15, 15, 25, 27, 46, 38, 46, 53, 70, 82, 99, 100)

a.) Does a scatter plot of the data support the use of the simple linear regression model? Scatter plot:

plot(x, y, xlab = "Rainfall Volume (m^3)", ylab = "Runoff Volume", pch = 19, 
    main = "Scatter Plot", col = "blue")

[Figure: scatter plot of runoff volume against rainfall volume]

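The points in the scatter plot fall close to a straight line, so the simple linear regression model appears reasonable. The sample correlation computed below supports this.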

summary(x)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     5.0    20.0    47.0    53.2    76.5   127.0

cor(x, y)

## [1] 0.9876

b.) Fitting a linear regression model and observing the regression output, we see:

fit <- lm(y ~ x)
summary(fit)

## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -8.28  -4.42   1.21   3.15   8.26 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -1.1283     2.3678   -0.48     0.64    
## x             0.8270     0.0365   22.64  7.9e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.24 on 13 degrees of freedom
## Multiple R-squared:  0.975,  Adjusted R-squared:  0.973 
## F-statistic:  513 on 1 and 13 DF,  p-value: 7.9e-12

c.) Calculate a point estimate of the true average runoff volume when rainfall volume is 50.

plot(y ~ x, xlab = "Rainfall Volume (m^3)", ylab = "Runoff Volume", pch = 19, 
    main = "Scatter Plot", col = "blue")
abline(fit)

[Figure: scatter plot of runoff volume against rainfall volume with the fitted least squares line]

Note that the estimated regression line is ŷ = -1.1283 + 0.8270x. Substituting x = 50 gives ŷ = -1.1283 + 41.3487 = 40.2204. The same calculation in R:

# Store the estimate under a new name so the data vector y is not overwritten
y.hat <- fit$coefficients[1] + fit$coefficients[2] * 50
y.hat

## (Intercept) 
##       40.22

So the point estimate of the true average runoff volume when rainfall volume is 50 m³ is approximately 40.22 m³.
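As a cross-check, the same estimate can be obtained with predict() instead of combining the coefficients by hand. This is a minimal sketch that refits the model on the data given above. The names rainfall, runoff, fit.runoff, and est.50 are illustrative and do not appear in the original solution.

```r
# Data from Problem 16 (illustrative variable names, not from the original)
rainfall <- c(5, 12, 14, 17, 23, 30, 40, 47, 55, 67, 72, 81, 96, 112, 127)
runoff <- c(4, 10, 13, 15, 15, 25, 27, 46, 38, 46, 53, 70, 82, 99, 100)

fit.runoff <- lm(runoff ~ rainfall)

# Estimated mean runoff at rainfall = 50, read off the fitted line
est.50 <- predict(fit.runoff, newdata = data.frame(rainfall = 50))
unname(est.50)
```

Using predict() keeps the data vectors untouched and extends directly to several rainfall values by adding rows to newdata.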

d.) Calculate a point estimate of the standard deviation sigma.

summary(fit)$sigma

## [1] 5.24
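The value reported by summary(fit)$sigma is the residual standard error, that is, the square root of SSE / (n - 2). The sketch below recomputes it from the residuals to make that definition explicit. The names rainfall, runoff, fit.runoff, sse, and sigma.hat are illustrative.

```r
# Data from Problem 16 (illustrative variable names, not from the original)
rainfall <- c(5, 12, 14, 17, 23, 30, 40, 47, 55, 67, 72, 81, 96, 112, 127)
runoff <- c(4, 10, 13, 15, 15, 25, 27, 46, 38, 46, 53, 70, 82, 99, 100)

fit.runoff <- lm(runoff ~ rainfall)

# Error sum of squares and the point estimate of sigma
sse <- sum(residuals(fit.runoff)^2)
n <- length(runoff)
sigma.hat <- sqrt(sse / (n - 2))  # n - 2 = 13 degrees of freedom
sigma.hat
```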

e.) What proportion of the observed variation in runoff volume can be attributed to the simple linear regression relationship between runoff and rainfall?

summary(fit)$r.squared

## [1] 0.9753

summary(fit)$adj.r.squared

## [1] 0.9734

Approximately 97.53% of the observed variation in runoff volume can be attributed to the simple linear regression relationship between runoff and rainfall.
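The coefficient of determination can also be computed from its definition, r² = 1 - SSE/SST, which shows where the proportion comes from. This is a sketch under the same data. The names rainfall, runoff, fit.runoff, sse, sst, and r2 are illustrative.

```r
# Data from Problem 16 (illustrative variable names, not from the original)
rainfall <- c(5, 12, 14, 17, 23, 30, 40, 47, 55, 67, 72, 81, 96, 112, 127)
runoff <- c(4, 10, 13, 15, 15, 25, 27, 46, 38, 46, 53, 70, 82, 99, 100)

fit.runoff <- lm(runoff ~ rainfall)

# Unexplained variation (SSE) relative to total variation (SST)
sse <- sum(residuals(fit.runoff)^2)
sst <- sum((runoff - mean(runoff))^2)
r2 <- 1 - sse / sst
r2
```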

Section 12.4 Problem 38

Refer to the data on x = liberation rate and y = NO3 emission rate given in Exercise 19:

x <- c(100, 125, 125, 150, 150, 200, 200, 250, 250, 300, 300, 350, 400, 400)
y <- c(150, 140, 180, 210, 190, 320, 280, 400, 430, 440, 390, 600, 610, 670)

a.) Does the simple linear regression model specify a useful relationship between the two rates? Use the appropriate test procedure to obtain information about the P value and then reach a conclusion at the significance level .01

plot(x, y, xlab = "Burner area Liberation Rate (MBtu/hr-ft^2)", ylab = "NO3 emission rate (ppm)", 
    pch = 19, main = "Scatter Plot", col = "gold")

[Figure: scatter plot of emission rate against liberation rate]

The points in the scatter plot fall close to a straight line, so a linear fit appears reasonable.

summary(x)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     100     150     225     236     300     400

cor(x, y)

## [1] 0.9802

fit <- lm(y ~ x)
summary(fit)

## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -77.88 -26.20   5.23  24.12  47.69 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -45.5519    25.4678   -1.79    0.099 .  
## x             1.7114     0.0997   17.17  8.2e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 36.7 on 12 degrees of freedom
## Multiple R-squared:  0.961,  Adjusted R-squared:  0.958 
## F-statistic:  295 on 1 and 12 DF,  p-value: 8.23e-10

plot(y ~ x, xlab = "Burner area Liberation Rate (MBtu/hr-ft^2)", ylab = "NO3 emission rate (ppm)", 
    pch = 19, main = "Scatter Plot", col = "gold")
abline(fit)

[Figure: scatter plot of emission rate against liberation rate with the fitted least squares line]

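The test is of H0: slope = 0 against Ha: slope ≠ 0. The test statistic is t = 17.17 on 12 degrees of freedom, with P-value 8.23e-10. Since the P-value is much smaller than 0.01, we reject the null hypothesis and conclude that the simple linear regression model specifies a useful linear relationship between liberation rate and emission rate.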
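The quantities behind this conclusion can be reproduced directly from the coefficient table: the t statistic is the slope estimate divided by its standard error, and the two-sided P-value comes from the t distribution with n - 2 degrees of freedom. The sketch below assumes the data given above. The names lib.rate, emission, fit.emission, t.stat, and p.value are illustrative.

```r
# Data from Problem 38 (illustrative variable names, not from the original)
lib.rate <- c(100, 125, 125, 150, 150, 200, 200, 250, 250, 300, 300, 350, 400, 400)
emission <- c(150, 140, 180, 210, 190, 320, 280, 400, 430, 440, 390, 600, 610, 670)

fit.emission <- lm(emission ~ lib.rate)
coefs <- summary(fit.emission)$coefficients

# Model utility test: H0 slope = 0 versus Ha slope != 0
t.stat <- coefs["lib.rate", "Estimate"] / coefs["lib.rate", "Std. Error"]
df.resid <- fit.emission$df.residual
p.value <- 2 * pt(-abs(t.stat), df.resid)  # two-sided P-value

# Decision at significance level 0.01
reject.h0 <- p.value < 0.01
c(t = t.stat, p = p.value)
reject.h0
```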

b.) Compute a 95% CI for the expected change in emission rate associated with a 10 MBtu/hr-ft² increase in liberation rate.

tble <- summary(fit)$coefficients
slope.mean <- tble[2, 1]
slope.se <- tble[2, 2]
df <- fit$df.residual
critval <- abs(qt(0.025, df))

CIlow <- slope.mean - critval * slope.se
CIup <- slope.mean + critval * slope.se

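A 95% CI for the slope is (1.4942, 1.9286). The expected change in emission rate for a 10 MBtu/hr-ft² increase in liberation rate is 10 times the slope, so multiplying both endpoints by 10 gives a 95% CI of approximately (14.94, 19.29) ppm.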
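As a cross-check, confint() returns the slope interval without assembling it from the critical value and standard error. Scaling that interval by 10 gives the interval for the expected change associated with a 10-unit increase in liberation rate. This is a sketch under the data given above. The names lib.rate, emission, fit.emission, ci.slope, and ci.change10 are illustrative.

```r
# Data from Problem 38 (illustrative variable names, not from the original)
lib.rate <- c(100, 125, 125, 150, 150, 200, 200, 250, 250, 300, 300, 350, 400, 400)
emission <- c(150, 140, 180, 210, 190, 320, 280, 400, 430, 440, 390, 600, 610, 670)

fit.emission <- lm(emission ~ lib.rate)

# 95% CI for the slope (change in emission rate per unit of liberation rate)
ci.slope <- confint(fit.emission, "lib.rate", level = 0.95)

# 95% CI for the expected change over a 10-unit increase: scale both endpoints
ci.change10 <- 10 * ci.slope
ci.slope
ci.change10
```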