The dataset is from The Statistical Sleuth, Ramsey and Schafer (http://www.statisticalsleuth.com/). The data on corn yields and rainfall are in `ex0915’. Variables:
Plot corn yield vs Rainfall. Fit the multiple regression of corn yield on Rainfall and Rainfall^2.
library(Sleuth3)
attach(ex0915)
plot(Rainfall, Yield)
fit1 <- lm(Yield~Rainfall + I(Rainfall^2), data = ex0915)
Asses the model fit with diagnostic plots - everything seems OK.
op <- par(mfrow = c(1, 2) )
plot(fit1, which = c(1, 2))
par(op)
Plot the residuals from the above model vs Year.
plot(Year, residuals(fit1))
There is a pattern evident in this plot - yield increases with years after adjusting for rainfall. This could imply there have been advances in technology from 1890 - 1930 that have enabled increased corn yield.
fit2 <- lm(Yield~Rainfall + I(Rainfall^2) + Year, data = ex0915)
summary(fit1)
##
## Call:
## lm(formula = Yield ~ Rainfall + I(Rainfall^2), data = ex0915)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.4642 -2.3236 -0.1265 3.5151 7.1597
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.01467 11.44158 -0.438 0.66387
## Rainfall 6.00428 2.03895 2.945 0.00571 **
## I(Rainfall^2) -0.22936 0.08864 -2.588 0.01397 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.763 on 35 degrees of freedom
## Multiple R-squared: 0.2967, Adjusted R-squared: 0.2565
## F-statistic: 7.382 on 2 and 35 DF, p-value: 0.002115
summary(fit2)
##
## Call:
## lm(formula = Yield ~ Rainfall + I(Rainfall^2) + Year, data = ex0915)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3995 -1.8086 -0.0479 2.4050 5.1839
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -263.30324 98.24094 -2.680 0.01126 *
## Rainfall 5.67038 1.88824 3.003 0.00499 **
## I(Rainfall^2) -0.21550 0.08207 -2.626 0.01286 *
## Year 0.13634 0.05156 2.644 0.01229 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.477 on 34 degrees of freedom
## Multiple R-squared: 0.4167, Adjusted R-squared: 0.3652
## F-statistic: 8.095 on 3 and 34 DF, p-value: 0.0003339
Year is a significant predictor of corn yield (P-value= 0.012). The proportion of variance explains increases from 30% to 40%.
library(rgl)
to visualize the fitted model. This allows us to see the effect of the two predictors on the response.xRain <- seq(min(Rainfall), max(Rainfall), by = 0.1)
xYear <- seq(min(Year), max(Year), length.out= length(xRain))
f <- function(x,x2) { r <- predict(fit2, newdata =data.frame( cbind(Year = x2,Rainfall = x)))}
z <- outer(xRain, xYear, f)
persp3d(xRain, xYear, z, col = "light blue", xlab = "Rain", ylab = "Year", zlab = "Yield")
points3d(Rainfall, Year, Yield, col="red")
You must enable Javascript to view this page properly.
The effect of Year is linear - each additional year adds 0.136 to the average yield for fixed rainfall.
The model has a quadratic term for Rainfall - the effect of increase of one inch of rainfall on the mean yield differs over the range of ranfalls.
This model does not include an interaction term - the effect of increase of one inch of rainfall on the mean yield is the same over the range of years.
rainbyyear <- Rainfall * Year
e2 <- residuals(lm(Yield~Rainfall + I(Rainfall^2) + Year, data = ex0915))
e1 <- residuals(lm(rainbyyear~Rainfall + I(Rainfall^2) + Year, data = ex0915))
plot(e1, e2)
abline(lm(e2~e1), col = 2)
fit3 <- lm(Yield~Rainfall* Year + I(Rainfall^2), data = ex0915)
summary(fit3)
##
## Call:
## lm(formula = Yield ~ Rainfall * Year + I(Rainfall^2), data = ex0915)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.2969 -2.5471 0.6011 1.9923 5.0204
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.909e+03 4.862e+02 -3.927 0.000414 ***
## Rainfall 1.588e+02 4.457e+01 3.564 0.001138 **
## Year 1.001e+00 2.555e-01 3.919 0.000423 ***
## I(Rainfall^2) -1.862e-01 7.198e-02 -2.588 0.014257 *
## Rainfall:Year -8.064e-02 2.345e-02 -3.439 0.001599 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.028 on 33 degrees of freedom
## Multiple R-squared: 0.5706, Adjusted R-squared: 0.5185
## F-statistic: 10.96 on 4 and 33 DF, p-value: 9.127e-06
We will plot this fitted model.
f <- function(x,x2) { r <- coef(fit3)[1] + coef(fit3)[2] * x + coef(fit3)[3] * x2 + coef(fit3)[4] * x^2 + coef(fit3)[5] * x * x2}
z <- outer(xRain, xYear, f)
persp3d(xRain, xYear, z, col = "light blue", xlab = "Rain", ylab = "Year", zlab = "Yield")
points3d(Rainfall, Year, Yield, col="red")
You must enable Javascript to view this page properly.
In 1890 Rainfall has an increasing effect on Yield. As Year increases that effect tapers off and the increase in Rainfall has a negative effect. There is a couple of influential points with high Rainfall and low Yield in the later years - they may be driving the fit.
# Cook's D plot
# identify D values > 4/(n-k-1)
cutoff <- 4/((nrow(ex0915)-length(fit3$coefficients)-2))
plot(fit3, which = 4, cook.levels = cutoff)
D <- cooks.distance(fit3)
f <- function(x,x2) { r <- coef(fit3)[1] + coef(fit3)[2] * x + coef(fit3)[3] * x2 + coef(fit3)[4] * x^2 + coef(fit3)[5] * x * x2}
z <- outer(xRain, xYear, f)
persp3d(xRain, xYear, z, col = "light blue", xlab = "Rain", ylab = "Year", zlab = "Yield")
spheres3d(Rainfall, Year, Yield,pointstyle = "s", col=as.numeric(D>cutoff)+1, radius = D^(1/3))
You must enable Javascript to view this page properly.
We can use library(condvis)
to visualize the model at different sections of variable Year.
library(condvis)
models <- list(quadratic = fit2, quadraticPlusinteraction = fit3)
ceplot(data = ex0915, model = models, sectionvars = "Rainfall", threshold = 1)