This short document contains the R code used in the article “Crop Yield estimation under climate change: a semiparametric approach using tensor splines” by Stephen Peplow and Malcolm Little.

We will need the R packages mgcv and earth, so load these now and the dataset from my public dropbox folder

library(mgcv)
library(earth)
tithetrimmed6sept <- read.csv("C:/Users/Stephen/Dropbox/Public/tithetrimmed6sept.csv")

There are some nas in the data which need to be cleaned and the dataset renamed to reflect this:

tithena <- na.omit(tithetrimmed6sept)

The first job is to run a regression using earth. I have set a seed to ensure consistency of results. The regression below creates an object e1 which is the regression of log wy (logged wheat yield) against elevation, easily available water (eawc), and remaining variables. The degree is 2 to allow for two levels of interaction

set.seed(2)
e1 <- earth(lwy ~ elev + eawc + augtemp + marain + augustrain + julrain + julaugtdiff, degree=2, data = tithena)

Now get figure 2

plot(e1, which =1, caption= "", col.grsq="blue")

and figure 3

plot(evimp(e1))

Finally a summary of the results:

summary(e1)
## Call: earth(formula=lwy~elev+eawc+augtemp+marain+augustrain+
##       julrain+julaugtdiff, data=tithena, degree=2)
## 
##                                           coefficients
## (Intercept)                                 3.01351996
## h(241-elev)                                 0.00175879
## h(augustrain-63.91)                        -0.07521511
## h(augustrain-70.56)                         0.08583560
## h(julrain-65.15)                            0.05339493
## h(241-elev) * h(julaugtdiff-0.08)          -0.00592140
## h(241-elev) * h(0.08-julaugtdiff)          -0.01080131
## h(16.5626-augtemp) * h(augustrain-63.91)    0.02494627
## h(marain-64.44) * h(63.91-augustrain)      -0.00534656
## h(marain-88.76) * h(augustrain-63.91)      -0.00104507
## h(88.76-marain) * h(augustrain-63.91)       0.00353334
## h(63.91-augustrain) * h(julaugtdiff-0.22)   0.10649160
## h(63.91-augustrain) * h(0.22-julaugtdiff)  -0.37102876
## 
## Selected 13 of 20 terms, and 6 of 7 predictors 
## Importance: augustrain, marain, elev, julaugtdiff, augtemp, julrain, ...
## Number of terms at each degree of interaction: 1 4 8
## GCV 0.02131443    RSS 7.509768    GRSq 0.5669889    RSq 0.6278952

The next job is to run mgcv with a general additive model (GAM). The code below creates an object bsg, with tensor products (te) for agustrain and july-august temperature difference. There are also splines (s) for march rain and a tensor product form august rain and elevation. The method is maximum likelihood (ML)

bsg <- gam(lwy~ te(augustrain,julaugtdiff, bs = "tp", mp=TRUE ,fx=FALSE ) + s(marain) +  te(augustrain,elev, bs="tp") +s(julrain), select=TRUE, data = tithena, method = "ML")

We ask for a summary of results:

summary(bsg)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## lwy ~ te(augustrain, julaugtdiff, bs = "tp", mp = TRUE, fx = FALSE) + 
##     s(marain) + te(augustrain, elev, bs = "tp") + s(julrain)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.92161    0.00689     424   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                              edf Ref.df     F  p-value    
## te(augustrain,julaugtdiff) 5.395     19 1.555 2.31e-08 ***
## s(marain)                  5.061      9 2.848 1.82e-07 ***
## te(augustrain,elev)        3.095     14 1.764 1.08e-06 ***
## s(julrain)                 5.623      9 2.114 1.03e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.602   Deviance explained =   62%
## -ML = -194.34  Scale est. = 0.019561  n = 412

and an attractive visualization of the results. First a perspective view and then a contour plot.

vis.gam(bsg, ticktype="detailed", plot.type="persp",color="topo", theta= -30, phi=10)

vis.gam(bsg, view=c("augustrain","elev"), plot.type = "contour", color= "topo", theta = -20)

Now the ordinary least squares method: the regression model below

tnew1 <- lm(lwy ~ elev + augustrain +  marain + julrain + julaugtdiff + augtemp , data= tithena)
summary(tnew1)
## 
## Call:
## lm(formula = lwy ~ elev + augustrain + marain + julrain + julaugtdiff + 
##     augtemp, data = tithena)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.39742 -0.09902 -0.00936  0.10994  0.44842 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.193e+00  1.122e+00   2.847 0.004645 ** 
## elev        -3.272e-04  9.733e-05  -3.362 0.000848 ***
## augustrain  -3.590e-03  7.281e-03  -0.493 0.622280    
## marain      -3.508e-03  2.860e-03  -1.227 0.220679    
## julrain     -3.977e-03  8.474e-03  -0.469 0.639120    
## julaugtdiff  2.049e-01  7.964e-02   2.573 0.010423 *  
## augtemp      2.883e-02  6.727e-02   0.429 0.668475    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1565 on 405 degrees of freedom
## Multiple R-squared:  0.5087, Adjusted R-squared:  0.5015 
## F-statistic:  69.9 on 6 and 405 DF,  p-value: < 2.2e-16

and check the diagnostics

plot(tnew1)

and now include quadratics

tnew2 <- lm(lwy ~ elev + augtemp + augtempsq + augustrain +augrainsq + marain + julrain + julrainsq + julaugtdiff, data= tithena)
summary(tnew2)
## 
## Call:
## lm(formula = lwy ~ elev + augtemp + augtempsq + augustrain + 
##     augrainsq + marain + julrain + julrainsq + julaugtdiff, data = tithena)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.40876 -0.09544 -0.01290  0.10778  0.40970 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.572e+01  4.544e+01   1.226 0.220898    
## elev        -2.804e-04  9.737e-05  -2.879 0.004197 ** 
## augtemp     -6.695e+00  5.457e+00  -1.227 0.220589    
## augtempsq    2.068e-01  1.647e-01   1.256 0.209831    
## augustrain   2.005e-01  6.305e-02   3.180 0.001589 ** 
## augrainsq   -1.547e-03  4.648e-04  -3.329 0.000951 ***
## marain       4.771e-05  3.923e-03   0.012 0.990303    
## julrain     -1.999e-01  7.309e-02  -2.734 0.006527 ** 
## julrainsq    1.966e-03  6.602e-04   2.978 0.003073 ** 
## julaugtdiff  2.820e-01  8.992e-02   3.136 0.001837 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.154 on 402 degrees of freedom
## Multiple R-squared:  0.5274, Adjusted R-squared:  0.5169 
## F-statistic: 49.85 on 9 and 402 DF,  p-value: < 2.2e-16