##Once you finished, submit the generated word file or html file to WebCampus.

1. Corrosion loss in Cu-Ni alloys

We will use a dataset from package ‘faraway’. First install the package faraway and include it in the environment.

library(faraway)
data(corrosion)
corrosion
attach(corrosion)

The dataset contains 13 speicimens of 90/10 Cu-Ni alloys with varying iron content in percent Fe, and their weight loss loss due to corrosion when they were submerged in sea water for 60 days.

  1. Draw a scatter plot, make sure you specify the correct explanatory variable and the response variable.
plot(Fe,loss)

Describe what pattern you observe for this data, is it reasonable to consider a linear model? Why or why not?

We observe a decreasing pattern. When iron content increases the weight loss decreases. It is reasonable to consider a linear model because the observations are approximately linear in their trend.

  1. Fit a linear model, describe your fitted model and interepret the meaning of the slope.
fit=lm(loss~Fe, data = corrosion)
fit
## 
## Call:
## lm(formula = loss ~ Fe, data = corrosion)
## 
## Coefficients:
## (Intercept)           Fe  
##      129.79       -24.02

The model we fit is \(y=-24.02x+129.79+e\), which means 1 percent increase in iron content will result in a 24.02 decrease in the weight loss due to corrosion.When iron content is zero percent, the expected weight loss is 129.79mg per square decimeter.

  1. Add the fitted line to the original scatter plot.

The Purple plot point represents the expected weight loss for a specimen with 0.9% iron content.

plot(Fe, loss)
abline(fit)
points(0.9,108.1687, col=6)

  1. Conduct inference on the parameter. If we want to test \(H_0: \beta_1=0\) vs. \(H_1: \beta_1\neq 0\), should we reject \(H_0\) or not?
summary(fit)
## 
## Call:
## lm(formula = loss ~ Fe, data = corrosion)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.7980 -1.9464  0.2971  0.9924  5.7429 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  129.787      1.403   92.52  < 2e-16 ***
## Fe           -24.020      1.280  -18.77 1.06e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.058 on 11 degrees of freedom
## Multiple R-squared:  0.9697, Adjusted R-squared:  0.967 
## F-statistic: 352.3 on 1 and 11 DF,  p-value: 1.055e-09

Since the p-value is close to zero, we should reject the \(H_0:\beta_1=0\). We are 99 percent confident that \(\beta_1\) is not zero.

  1. Conduct ANOVA F-test, what conclucsion can we make about this linear model? How can we say about the influence of iron conent on the corrosion of alloys?
anova(fit)

The test shows that the slope of the explanatory variable is statistically significant since the p-value is close to zero. This linear model (\(y=\beta_1 x+\beta_0+e\)) is appropriate for this data. We can conclude there is a negative influence of the iron content on the weight loss of alloys.

  1. What is the percentage of the variation in corrosion loss explained by this model?
summary(fit)
## 
## Call:
## lm(formula = loss ~ Fe, data = corrosion)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.7980 -1.9464  0.2971  0.9924  5.7429 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  129.787      1.403   92.52  < 2e-16 ***
## Fe           -24.020      1.280  -18.77 1.06e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.058 on 11 degrees of freedom
## Multiple R-squared:  0.9697, Adjusted R-squared:  0.967 
## F-statistic: 352.3 on 1 and 11 DF,  p-value: 1.055e-09

There is a 96.97 percent variation in corrosion loss explained by this model. This means that almost all the variation in corrosion loss is explained by the model, making it a good representation of the trend.

  1. Suppose we have an new specimen with 0.9% iron content, what is the expected weight loss for this specimen? What is the 95% confidence interval for the expected response?
newx=data.frame(x=0.9)
colnames(newx) = "Fe"
newx
predict(fit, newdata=newx, interval="confidence",level=0.95)
##        fit      lwr      upr
## 1 108.1687 106.3006 110.0368

The expected weight loss of a specimen with 0.9% iron content is 108.1687 mg per square decimeter per day and the 95% CI for expected weight loss is (106.3006, 110.0368).

  1. For the same new specimen above, what is the 95% prediction interval for the weight loss?
predict(fit, newdata=newx, interval="prediction",level=0.95)
##        fit      lwr      upr
## 1 108.1687 101.1841 115.1533

The 95% prediction interval for the weight loss is (101.1841, 115.1533) when the iron content is 0.9%.

  1. Draw a plot with the information of original data set, fitted line, 95% confidence band and 95% prediction band.
x=seq(from=0, to=2, by=0.1)
newx=data.frame(x);colnames(newx)="Fe"
CI=predict(fit,newdata = newx, interval = "confidence")
PI=predict(fit,newdata=newx, interval = "predict")
plot(Fe, loss)
abline(fit)
lines(x,CI[,2],col=6)
lines(x,CI[,3],col=6)
lines(x,PI[,2],col=3)
lines(x,PI[,3],col=3)

The green lines represent the prediction interval and the purple lines represent the confidence interval.

2. Tree growth prediction

In forestry study, biologist measures the circumference of trees to predict the height of the tree. The excel file contains information about 15 trees with measurment of circumference and height.

  1. Import the data set and make a scatter plot, describe what pattern you observe.
library(readxl)
tree1 <- read_excel("tree1.xlsx")
tree1
attach(tree1)
plot(circ, height)

We observe an increasing pattern. As the circumference of a tree increases in size, its height also increases. The height and circumference look to increase in an approximately linear fashion.

  1. Because it’s reasonable to assume trees with 0 circumference will have 0 height, so we want fit a linear regression model through the origin (no-intercept)
fit=lm(height~circ-1)
fit
## 
## Call:
## lm(formula = height ~ circ - 1)
## 
## Coefficients:
##  circ  
## 10.11

The model we fit is \(y=10.11x+e\). A 1 percent increase in circumference will result in a 10.11 increase in height. When circumference is zero the height is also zero.

plot(circ, height)
abline(fit)

summary(fit)
## 
## Call:
## lm(formula = height ~ circ - 1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -48.354  -8.061   5.117  22.025  68.819 
## 
## Coefficients:
##      Estimate Std. Error t value Pr(>|t|)    
## circ  10.1057     0.1293   78.17   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 32.21 on 14 degrees of freedom
## Multiple R-squared:  0.9977, Adjusted R-squared:  0.9976 
## F-statistic:  6111 on 1 and 14 DF,  p-value: < 2.2e-16
  1. If a new tree is measured to have circumference of 70 inches, what is the expected height of this tree? What is the 95% prediction interval for its height?
newtree = data.frame(x=70)
colnames(newtree)="circ"
newtree
predict(fit,newdata=newtree, interval="prediction",level=0.95)
##        fit      lwr      upr
## 1 707.4021 635.6398 779.1645

The expected height of a tree with a circumference of 70 inches is 707.4021 inches. The 95% prediction interval for tree height is (635.6398,779.1645) when circumference is 70 inches.