This short document contains the R code used in the article “Crop Yield estimation under climate change: a semiparametric approach using tensor splines” by Stephen Peplow and Malcolm Little.
We will need the R packages mgcv and earth, so load these now and the dataset from my public dropbox folder
library(mgcv)
library(earth)
tithetrimmed6sept <- read.csv("C:/Users/Stephen/Dropbox/Public/tithetrimmed6sept.csv")
There are some nas in the data which need to be cleaned and the dataset renamed to reflect this:
tithena <- na.omit(tithetrimmed6sept)
The first job is to run a regression using earth. I have set a seed to ensure consistency of results. The regression below creates an object e1 which is the regression of log wy (logged wheat yield) against elevation, easily available water (eawc), and remaining variables. The degree is 2 to allow for two levels of interaction
set.seed(2)
e1 <- earth(lwy ~ elev + eawc + augtemp + marain + augustrain + julrain + julaugtdiff, degree=2, data = tithena)
Now get figure 2
plot(e1, which =1, caption= "", col.grsq="blue")
and figure 3
plot(evimp(e1))
Finally a summary of the results:
summary(e1)
## Call: earth(formula=lwy~elev+eawc+augtemp+marain+augustrain+
## julrain+julaugtdiff, data=tithena, degree=2)
##
## coefficients
## (Intercept) 3.01351996
## h(241-elev) 0.00175879
## h(augustrain-63.91) -0.07521511
## h(augustrain-70.56) 0.08583560
## h(julrain-65.15) 0.05339493
## h(241-elev) * h(julaugtdiff-0.08) -0.00592140
## h(241-elev) * h(0.08-julaugtdiff) -0.01080131
## h(16.5626-augtemp) * h(augustrain-63.91) 0.02494627
## h(marain-64.44) * h(63.91-augustrain) -0.00534656
## h(marain-88.76) * h(augustrain-63.91) -0.00104507
## h(88.76-marain) * h(augustrain-63.91) 0.00353334
## h(63.91-augustrain) * h(julaugtdiff-0.22) 0.10649160
## h(63.91-augustrain) * h(0.22-julaugtdiff) -0.37102876
##
## Selected 13 of 20 terms, and 6 of 7 predictors
## Importance: augustrain, marain, elev, julaugtdiff, augtemp, julrain, ...
## Number of terms at each degree of interaction: 1 4 8
## GCV 0.02131443 RSS 7.509768 GRSq 0.5669889 RSq 0.6278952
The next job is to run mgcv with a general additive model (GAM). The code below creates an object bsg, with tensor products (te) for agustrain and july-august temperature difference. There are also splines (s) for march rain and a tensor product form august rain and elevation. The method is maximum likelihood (ML)
bsg <- gam(lwy~ te(augustrain,julaugtdiff, bs = "tp", mp=TRUE ,fx=FALSE ) + s(marain) + te(augustrain,elev, bs="tp") +s(julrain), select=TRUE, data = tithena, method = "ML")
We ask for a summary of results:
summary(bsg)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## lwy ~ te(augustrain, julaugtdiff, bs = "tp", mp = TRUE, fx = FALSE) +
## s(marain) + te(augustrain, elev, bs = "tp") + s(julrain)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.92161 0.00689 424 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## te(augustrain,julaugtdiff) 5.395 19 1.555 2.31e-08 ***
## s(marain) 5.061 9 2.848 1.82e-07 ***
## te(augustrain,elev) 3.095 14 1.764 1.08e-06 ***
## s(julrain) 5.623 9 2.114 1.03e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.602 Deviance explained = 62%
## -ML = -194.34 Scale est. = 0.019561 n = 412
and an attractive visualization of the results. First a perspective view and then a contour plot.
vis.gam(bsg, ticktype="detailed", plot.type="persp",color="topo", theta= -30, phi=10)
vis.gam(bsg, view=c("augustrain","elev"), plot.type = "contour", color= "topo", theta = -20)
Now the ordinary least squares method: the regression model below
tnew1 <- lm(lwy ~ elev + augustrain + marain + julrain + julaugtdiff + augtemp , data= tithena)
summary(tnew1)
##
## Call:
## lm(formula = lwy ~ elev + augustrain + marain + julrain + julaugtdiff +
## augtemp, data = tithena)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.39742 -0.09902 -0.00936 0.10994 0.44842
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.193e+00 1.122e+00 2.847 0.004645 **
## elev -3.272e-04 9.733e-05 -3.362 0.000848 ***
## augustrain -3.590e-03 7.281e-03 -0.493 0.622280
## marain -3.508e-03 2.860e-03 -1.227 0.220679
## julrain -3.977e-03 8.474e-03 -0.469 0.639120
## julaugtdiff 2.049e-01 7.964e-02 2.573 0.010423 *
## augtemp 2.883e-02 6.727e-02 0.429 0.668475
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1565 on 405 degrees of freedom
## Multiple R-squared: 0.5087, Adjusted R-squared: 0.5015
## F-statistic: 69.9 on 6 and 405 DF, p-value: < 2.2e-16
and check the diagnostics
plot(tnew1)
and now include quadratics
tnew2 <- lm(lwy ~ elev + augtemp + augtempsq + augustrain +augrainsq + marain + julrain + julrainsq + julaugtdiff, data= tithena)
summary(tnew2)
##
## Call:
## lm(formula = lwy ~ elev + augtemp + augtempsq + augustrain +
## augrainsq + marain + julrain + julrainsq + julaugtdiff, data = tithena)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.40876 -0.09544 -0.01290 0.10778 0.40970
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.572e+01 4.544e+01 1.226 0.220898
## elev -2.804e-04 9.737e-05 -2.879 0.004197 **
## augtemp -6.695e+00 5.457e+00 -1.227 0.220589
## augtempsq 2.068e-01 1.647e-01 1.256 0.209831
## augustrain 2.005e-01 6.305e-02 3.180 0.001589 **
## augrainsq -1.547e-03 4.648e-04 -3.329 0.000951 ***
## marain 4.771e-05 3.923e-03 0.012 0.990303
## julrain -1.999e-01 7.309e-02 -2.734 0.006527 **
## julrainsq 1.966e-03 6.602e-04 2.978 0.003073 **
## julaugtdiff 2.820e-01 8.992e-02 3.136 0.001837 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.154 on 402 degrees of freedom
## Multiple R-squared: 0.5274, Adjusted R-squared: 0.5169
## F-statistic: 49.85 on 9 and 402 DF, p-value: < 2.2e-16