## Once you have finished, submit the generated Word file or HTML file to WebCampus.
We will use a dataset from the package 'faraway'. First, install the package 'faraway' and load it into the environment.
library(faraway)
data(corrosion)
corrosion
attach(corrosion)
The dataset contains 13 specimens of 90/10 Cu-Ni alloys with varying iron content (percent Fe) and their weight loss due to corrosion when they were submerged in sea water for 60 days.
plot(Fe,loss)
Describe what pattern you observe for this data. Is it reasonable to consider a linear model? Why or why not?
We observe a decreasing pattern: as iron content increases, the weight loss decreases. It is reasonable to consider a linear model because the observations follow an approximately linear trend.
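As a quick numerical check on this (a minimal sketch), the sample correlation between iron content and loss can be computed; given the clear decreasing trend, it should be strongly negative.
cor(Fe, loss)   # sample correlation between iron content and weight loss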
fit=lm(loss~Fe, data = corrosion)
fit
##
## Call:
## lm(formula = loss ~ Fe, data = corrosion)
##
## Coefficients:
## (Intercept) Fe
## 129.79 -24.02
The model we fit is \(y = 129.79 - 24.02x + e\), which means a 1 percent increase in iron content corresponds to a 24.02 decrease in weight loss due to corrosion. When the iron content is zero percent, the expected weight loss is 129.79 mg per square decimeter per day.
The purple point on the plot below represents the expected weight loss for a specimen with 0.9% iron content.
plot(Fe, loss)
abline(fit)
points(0.9,108.1687, col=6)
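As a quick check, that expected value can be computed directly from the estimated coefficients (a small sketch; it matches the value of about 108.17 plotted above).
coef(fit)["(Intercept)"] + coef(fit)["Fe"]*0.9   # expected loss at 0.9% iron, about 108.17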
summary(fit)
##
## Call:
## lm(formula = loss ~ Fe, data = corrosion)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.7980 -1.9464 0.2971 0.9924 5.7429
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 129.787 1.403 92.52 < 2e-16 ***
## Fe -24.020 1.280 -18.77 1.06e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.058 on 11 degrees of freedom
## Multiple R-squared: 0.9697, Adjusted R-squared: 0.967
## F-statistic: 352.3 on 1 and 11 DF, p-value: 1.055e-09
Since the p-value (1.06e-09) is essentially zero, we reject \(H_0:\beta_1=0\). At the 1 percent significance level, there is strong evidence that \(\beta_1\) is not zero.
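The same conclusion can be drawn from a confidence interval for the slope; a minimal sketch using confint():
confint(fit, level = 0.99)   # the 99% interval for Fe should exclude 0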
anova(fit)
The test shows that the slope of the explanatory variable is statistically significant, since the p-value is close to zero. The linear model \(y=\beta_0+\beta_1 x+e\) is appropriate for these data, and we can conclude that iron content has a negative effect on the weight loss of the alloys.
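Note that in simple linear regression the ANOVA F-test for the slope is equivalent to the t-test from the summary (F = t^2); this can be verified with a quick sketch:
summary(fit)$coefficients["Fe", "t value"]^2   # should equal the F-statistic, about 352.3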
summary(fit)
##
## Call:
## lm(formula = loss ~ Fe, data = corrosion)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.7980 -1.9464 0.2971 0.9924 5.7429
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 129.787 1.403 92.52 < 2e-16 ***
## Fe -24.020 1.280 -18.77 1.06e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.058 on 11 degrees of freedom
## Multiple R-squared: 0.9697, Adjusted R-squared: 0.967
## F-statistic: 352.3 on 1 and 11 DF, p-value: 1.055e-09
This model explains 96.97 percent of the variation in corrosion loss. Since almost all of the variation in corrosion loss is accounted for, the model is a good representation of the trend.
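The R-squared value can also be recovered by hand as \(1-\mathrm{SSE}/\mathrm{SST}\); a minimal sketch:
1 - sum(resid(fit)^2)/sum((loss - mean(loss))^2)   # about 0.9697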
newx=data.frame(x=0.9)
colnames(newx) = "Fe"
newx
predict(fit, newdata=newx, interval="confidence",level=0.95)
## fit lwr upr
## 1 108.1687 106.3006 110.0368
The expected weight loss of a specimen with 0.9% iron content is 108.1687 mg per square decimeter per day and the 95% CI for expected weight loss is (106.3006, 110.0368).
predict(fit, newdata=newx, interval="prediction",level=0.95)
## fit lwr upr
## 1 108.1687 101.1841 115.1533
The 95% prediction interval for the weight loss is (101.1841, 115.1533) when the iron content is 0.9%.
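The prediction interval can also be reproduced from the usual formula \(\hat{y}_0 \pm t_{0.975,\,n-2}\, s\sqrt{1 + 1/n + (x_0-\bar{x})^2/S_{xx}}\); a minimal sketch of that calculation (the variable names here are illustrative):
x0 = 0.9
n = nrow(corrosion)
s = summary(fit)$sigma                        # residual standard error (3.058)
Sxx = sum((Fe - mean(Fe))^2)
yhat = coef(fit)[1] + coef(fit)[2]*x0         # point prediction, about 108.17
se.pred = s*sqrt(1 + 1/n + (x0 - mean(Fe))^2/Sxx)
yhat + c(-1, 1)*qt(0.975, df = n - 2)*se.pred # should match the interval above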
x=seq(from=0, to=2, by=0.1)
newx=data.frame(x);colnames(newx)="Fe"
CI=predict(fit,newdata = newx, interval = "confidence")
PI=predict(fit,newdata=newx, interval = "predict")
plot(Fe, loss)
abline(fit)
lines(x,CI[,2],col=6)
lines(x,CI[,3],col=6)
lines(x,PI[,2],col=3)
lines(x,PI[,3],col=3)
The green lines represent the prediction interval and the purple lines represent the confidence interval.
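Optionally, a legend can be added so the color coding is explicit on the plot itself (a small sketch; the placement is arbitrary):
legend("topright", legend = c("95% confidence band", "95% prediction band"),
       col = c(6, 3), lty = 1)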
In a forestry study, a biologist measures the circumference of trees to predict their height. The Excel file contains measurements of circumference and height for 15 trees.
library(readxl)
tree1 <- read_excel("tree1.xlsx")
tree1
attach(tree1)
plot(circ, height)
We observe an increasing pattern: as the circumference of a tree increases, its height also increases, and the relationship appears approximately linear.
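As before, a correlation check gives a quick numerical summary of this pattern (a minimal sketch); it should be strongly positive.
cor(circ, height)   # sample correlation between circumference and height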
fit=lm(height~circ-1)
fit
##
## Call:
## lm(formula = height ~ circ - 1)
##
## Coefficients:
## circ
## 10.11
The model we fit is \(y=10.11x+e\): a one-inch increase in circumference corresponds to a 10.11-inch increase in height, and because the model has no intercept, a circumference of zero implies a height of zero.
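One way to justify the no-intercept model is to fit the version with an intercept and check whether the intercept term is significantly different from zero (a sketch; fit_int is an illustrative name and its output is not shown here):
fit_int = lm(height ~ circ, data = tree1)   # model with an intercept, for comparison
summary(fit_int)$coefficients               # if the intercept is not significant, dropping it is reasonable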
plot(circ, height)
abline(fit)
summary(fit)
##
## Call:
## lm(formula = height ~ circ - 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -48.354 -8.061 5.117 22.025 68.819
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## circ 10.1057 0.1293 78.17 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 32.21 on 14 degrees of freedom
## Multiple R-squared: 0.9977, Adjusted R-squared: 0.9976
## F-statistic: 6111 on 1 and 14 DF, p-value: < 2.2e-16
newtree = data.frame(x=70)
colnames(newtree)="circ"
newtree
predict(fit,newdata=newtree, interval="prediction",level=0.95)
## fit lwr upr
## 1 707.4021 635.6398 779.1645
The expected height of a tree with a circumference of 70 inches is 707.4021 inches. The 95% prediction interval for tree height is (635.6398, 779.1645) when the circumference is 70 inches.
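As a quick check, the point prediction can be reproduced directly from the slope estimate (a small sketch):
coef(fit)["circ"]*70   # predicted height at circumference 70, about 707.4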