12/9/2019

Computer model emulation

Statistical models can be used to emulate (approximate) complex computer models, such as APSIM (the Agricultural Production Systems sIMulator).

  • Approximate the computer model with Random Forests, Neural Networks, or Generalized Additive Models
  • Run more efficiently when many repeated runs are needed
  • Reduce computation for sensitivity analysis and large-scale spatial simulations

What is online prediction?

Predict the \(i^{th}\) observation with a model fitted to the previous \(i-1\) observations; once the \(i^{th}\) observation is known, update the model and predict the \((i+1)^{th}\) observation.
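A minimal R sketch of this expanding-window scheme, assuming a hypothetical response `y`, a single covariate `x`, and an ordinary linear model as the example learner (any model could be swapped in). `online_lm()` and `min_obs` are illustrative names, not part of the original analysis.

```r
# Hypothetical sketch: one-step-ahead (online) prediction with an expanding window.
# At step i the model only sees observations 1..(i-1); lm() is just an example model.
online_lm <- function(x, y, min_obs = 3) {
  n <- length(y)
  pred <- rep(NA_real_, n)
  for (i in seq_len(n)) {
    if (i - 1 < min_obs) next                      # not enough history to fit yet
    past <- data.frame(x = x[1:(i - 1)], y = y[1:(i - 1)])
    fit <- lm(y ~ x, data = past)                  # refit on all previous observations
    pred[i] <- predict(fit, newdata = data.frame(x = x[i]))
  }
  pred
}
```

For example, `online_lm(1:10, 2 * (1:10) + 1)` returns `NA` for the first three days and then predicts each later value from the days before it.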


Why study online prediction for APSIM?

It may seem silly to predict the next observation using a statistical model when we are only running a single APSIM simulation…

But, online prediction can be seen as a first step toward full emulation of APSIM.

  1. Accurate online prediction when the next day's weather is known
  2. Online prediction without knowing the next day's weather (weather imputed or naive)
  3. Predict the next week or month of APSIM output
  4. Fully predict APSIM output using a statistical model (emulation)
  5. Use APSIM emulator to run large-scale simulations more efficiently than APSIM

How does this connect to my research?

I work for the National Resources Inventory, which conducts land-use and erosion monitoring with complex geographic survey techniques across all non-Federal lands in the United States.

APSIM Setup

I used a simple APSIM simulation, as learning the basics of APSIM was one of my goals for this project.

  • 35-year simulation of continuous corn on a field in central Iowa
  • Fertiliser: 150 kg/ha of UreaN applied annually at sowing
  • Sowing: maize was sown on a fixed date (May 5) each year at a depth of 50 mm, with 760 mm row spacing and 8 plants per square meter
  • Harvesting: the crop was harvested automatically when the APSIM phenology stage reached 'ReadyForHarvesting'
  • SurfaceOrganicMatter: an initial soybean residue pool of 1250 kg/ha with a C:N ratio of 27 g/g was assumed present on the field
  • Maize: the GH_5019WX cultivar was used

Simulation → R

Covariates:

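data("simulation")  # load the simulated APSIM output
# columns 5-11 and 22: weather covariates, phenology stage, and date
print(colnames(simulation[c(5:11,22)]))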
## [1] "Weather.Rain"                     "Weather.Radn"                    
## [3] "Weather.MaxT"                     "Weather.MeanT"                   
## [5] "Weather.MinT"                     "Weather.VPD"                     
## [7] "Maize.Phenology.CurrentStageName" "Date"


Outputs:

data("simulation")
# columns 12-21: APSIM outputs to be predicted
print(colnames(simulation[c(12:21)]))
##  [1] "Maize.AboveGround.Wt"              
##  [2] "Maize.Grain.Wt"                    
##  [3] "Maize.Grain.Size"                  
##  [4] "Maize.Leaf.Transpiration"          
##  [5] "MicroClimate.RadiationInterception"
##  [6] "MicroClimate.PetTotal"             
##  [7] "Soil.SoilWater.Runoff"             
##  [8] "Soil.SoilWater.Drainage"           
##  [9] "sum(Soil.SoilWater.ESW)"           
## [10] "SoilWater.LeachNO3"

Training Data

Testing Data

Algorithm

  1. Has the crop been sown? If so, continue (otherwise predict 0)
  2. Is the crop still growing? If so, continue (otherwise predict 0)
  3. Update the model on the previous \(i-1\) observations
  4. Is the crop ripe? If so, harvest and predict 0
  5. If the crop is not ripe, predict the \(i^{th}\) observation using the model
  6. Retrieve the true \(i^{th}\) observation, then move on to \(i+1\)
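The steps above can be sketched in R as a single loop. This is a minimal sketch, not the original code: it assumes a hypothetical data frame `dat`, ordered by date, with a numeric response `y`, covariates `rain` and `maxt`, and logical flags `sown`, `growing`, and `ripe` derived from the phenology stage; the per-step model is a plain `lm()` fit.

```r
# Hypothetical sketch of the online-prediction algorithm.
# Assumed columns of `dat`: y, rain, maxt (numeric); sown, growing, ripe (logical).
online_predict <- function(dat, min_obs = 5) {
  n <- nrow(dat)
  pred <- numeric(n)                          # default prediction is 0
  for (i in seq_len(n)) {
    # Steps 1-2: crop not sown or no longer growing -> keep the 0 prediction
    if (!dat$sown[i] || !dat$growing[i]) next
    # Step 3: history = observations 1..(i-1) with a growing crop
    past <- dat[seq_len(i - 1), , drop = FALSE]
    past <- past[past$sown & past$growing, , drop = FALSE]
    # Step 4: crop is ripe -> harvest and keep the 0 prediction
    if (dat$ripe[i]) next
    if (nrow(past) < min_obs) next            # too little history to fit a model
    fit <- lm(y ~ rain + maxt, data = past)
    # Step 5: predict the i-th observation from its covariates
    pred[i] <- predict(fit, newdata = dat[i, , drop = FALSE])
    # Step 6: y[i] is now "revealed" and joins the history at step i + 1
  }
  pred
}
```

Checking the flags before fitting keeps the model from being refit on days when the rules already fix the prediction at 0.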

Trial 1 - Linear Model (LM)

Linear Model Assumptions

  1. Independence between observations
  2. Normally distributed residuals
  3. Constant variance
  4. The mean response is a linear combination of the covariates

Daily APSIM output definitely violates the independence assumption… why?

The residuals here are also ugly.
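A quick illustration of the independence problem, using made-up data rather than the APSIM output: an accumulated total (like Maize.AboveGround.Wt) builds up day by day, so residuals from a linear model on daily weather stay strongly correlated from one day to the next. All variables below are hypothetical.

```r
# Hypothetical simulated data, not APSIM output.
set.seed(42)
n <- 120
rain <- rgamma(n, shape = 0.5, rate = 0.2)        # hypothetical daily rainfall
growth <- pmax(0, rnorm(n, mean = 10, sd = 3))    # hypothetical daily increments
biomass <- cumsum(growth)                         # accumulated total over the season
fit <- lm(biomass ~ rain)                         # LM on the total, as in Trial 1
res <- residuals(fit)
# Lag-1 autocorrelation of the residuals (element 1 of $acf is lag 0)
lag1 <- acf(res, plot = FALSE)$acf[2]
lag1
```

A lag-1 autocorrelation near 1 means each day's residual almost determines the next one, which is exactly what the independence assumption rules out.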

Trial 2 - LM on daily change
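The slides do not spell out this trial; a minimal sketch, assuming it means regressing the day-to-day change in an accumulated output on that day's covariate and then adding the predicted change to yesterday's observed total. `online_lm_change()`, `x`, and `total` are hypothetical names.

```r
# Hypothetical sketch: model the daily change instead of the accumulated total.
online_lm_change <- function(x, total, min_obs = 3) {
  n <- length(total)
  change <- c(NA, diff(total))               # change[i] = total[i] - total[i - 1]
  pred <- rep(NA_real_, n)
  for (i in seq_len(n)) {
    if (i - 2 < min_obs) next                # changes are available for days 2..(i-1)
    past <- data.frame(x = x[2:(i - 1)], change = change[2:(i - 1)])
    fit <- lm(change ~ x, data = past)
    # predicted total = yesterday's observed total + predicted change
    pred[i] <- total[i - 1] + predict(fit, newdata = data.frame(x = x[i]))
  }
  pred
}
```

Differencing removes most of the day-to-day carry-over in an accumulated total, which targets the independence problem of the LM on totals.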

Trial 3 - Autoregressive Model
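The autoregressive specification used here is not stated; a minimal sketch, assuming an AR(1) model with intercept, \(y_t = a + b\,y_{t-1} + \varepsilon_t\), refit by least squares on an expanding window. `online_ar1()` is a hypothetical name.

```r
# Hypothetical sketch: one-step-ahead AR(1) prediction, refit at every step.
online_ar1 <- function(y, min_obs = 3) {
  n <- length(y)
  pred <- rep(NA_real_, n)
  for (i in seq_len(n)) {
    if (i - 1 < min_obs) next
    hist <- y[1:(i - 1)]                                  # observations seen so far
    lagged <- data.frame(y = hist[-1], y_lag = hist[-length(hist)])
    fit <- lm(y ~ y_lag, data = lagged)                   # y_t = a + b * y_{t-1}
    pred[i] <- predict(fit, newdata = data.frame(y_lag = y[i - 1]))
  }
  pred
}
```

Using `lm()` on the lagged series keeps the sketch transparent; `stats::arima()` would be the usual choice for a full ARIMA fit.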

Trial 4 - Generalized Additive Model (GAM)

Trial 5 - Local GAM
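The slides do not define "local"; one plausible reading, assumed here, is a GAM refit at each step on only the most recent `window` observations. The sketch uses `mgcv::gam()` with a single smooth term; the covariate `x`, the window size, and the basis dimension `k` are hypothetical choices.

```r
library(mgcv)  # recommended package shipped with standard R installations

# Hypothetical sketch: GAM refit on a moving window of recent observations.
online_local_gam <- function(x, y, window = 30, k = 5) {
  n <- length(y)
  pred <- rep(NA_real_, n)
  for (i in seq_len(n)) {
    if (i - 1 < window) next                     # wait until a full window exists
    idx <- (i - window):(i - 1)                  # the `window` most recent days
    past <- data.frame(x = x[idx], y = y[idx])
    fit <- gam(y ~ s(x, k = k), data = past)     # penalized regression spline in x
    pred[i] <- as.numeric(predict(fit, newdata = data.frame(x = x[i])))
  }
  pred
}
```

Restricting the fit to recent days lets the smooth follow the current stage of the season instead of averaging over the whole record.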

Trial 6 - Random Forest

Results

Method              RMSE
LM: total            160
LM: daily change     105
AR                    12
GAM: full             83
GAM: local             2
RF                    76
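The table compares methods by root mean squared error, \(\sqrt{\frac{1}{n}\sum_{i}(y_i - \hat{y}_i)^2}\). A small helper, assuming the standard definition; the slides do not say which days entered each RMSE, so the `na.rm` argument is only a hypothetical way to skip days without a prediction.

```r
# Root mean squared error between observed and predicted values.
rmse <- function(observed, predicted, na.rm = TRUE) {
  sqrt(mean((observed - predicted)^2, na.rm = na.rm))
}
```

For example, `rmse(c(1, 2, 3), c(1, 2, 5))` is `sqrt(4 / 3)`, about 1.155.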

Other outputs: Transpiration (local GAM)

Other outputs: Runoff (RF)

Conclusions

Online prediction can be used to approximate daily APSIM output.

  • Local GAM models work well for accumulated totals
  • AR and local RF models also work reasonably well
  • Even one-step-ahead prediction was not easy, so accurate full emulation may be very difficult
  • This project helped me see why mechanistic computer models can be more reasonable and useful than statistical models in some cases

Future Directions

Complex techniques might be needed to fully emulate APSIM

  • Sowing / Harvest events in APSIM require careful consideration
  • Predict events and then predict outputs?
  • Will weather data be assumed known, imputed, or missing?