We have data on the adoption of conservation tillage in British Columbia. We calculated the proportion under conservation tillage for each census subdivision from the 2011 and 2016 agricultural censuses. We want to understand the factors driving a decrease in that proportion, no change, or an increase. We originally classified the data into three groups according to those criteria, but found that such a coarse grouping throws away much information. Using the difference in proportions between 2011 and 2016 as a continuous response in a regression provides a much better model.
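To illustrate how much the three-way grouping discards, here is a minimal sketch on made-up numbers. The column name cons2011 and all the values are assumptions for illustration only; cons2016 is the only name taken from the real data.

```r
# Hypothetical toy data: cons2011 is an assumed column name, values are invented
toy <- data.frame(
  cons2011 = c(0.10, 0.40, 0.25, 0.60, 0.30),
  cons2016 = c(0.35, 0.38, 0.25, 0.20, 0.31)
)

# Continuous response: change in proportion between the two censuses
toy$change <- toy$cons2016 - toy$cons2011

# Coarse response: only the direction of the change is kept
toy$direction <- factor(sign(toy$change),
                        levels = c(-1, 0, 1),
                        labels = c("decrease", "no change", "increase"))

print(toy)
```

In this toy example a rise of 0.25 and a rise of 0.01 both end up labelled "increase", which is the kind of information the continuous difference retains.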

I tried two dependent variables: the difference in conservation tillage proportions between 2011 and 2016, and the 2016 proportion on its own. I found that the 2016 proportion gives a more accurate model: a 7% error rate compared to 12%. The code below is based on the 2016 proportions.

I found that the package ‘rpart.plot’ gives a nice, clean graphical rendering of the tree.

The code below imports the data, removes observations with missing values, and loads the ‘rpart’ package.

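# Import the consolidated data set
merged <- read.csv("E:/Malcolm Project/consoldwithownership.csv")
# Drop every row that contains a missing value
mergedna <- na.omit(merged)
library(rpart)
# Fit a tree for the 2016 proportion using all available predictors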
party3 <- rpart(cons2016  ~ slopepercent + marchprecipdailymean + marchprecipdailystdev  + julyprecipdailymean +  julyprecipdailystdev + augustprecipdailymean + augustprecipdailystdev +  julydailymeantemperature + augustdailymeantemperature  + julyaugustdailymeantemperaturedi + julymaxtempmean + julymintempmean  + augustmaxtempmean + augustmintempmean + age2011 +  age2016 + prop012011 +   propo12016 +  leased2011 + leased2016 + alfalfa2011 + alfalfa2016 + baledprop2011 +  baledprop2016 +  male2011 + male2016 + avsize2011 + avsize2016 + cultivatedstd  + cultivated_mean, data=mergedna )

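The above creates an object, party3, using all the variables we have. The control settings below, including 10-fold cross-validation (xval = 10), are the rpart defaults, so they are the ones that applied to the fit above. To change any of them, pass the result of rpart.control() to rpart() through its control argument.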

rpart.control(minsplit = 20, cp = 0.01, 
              maxcompete = 4, maxsurrogate = 5, usesurrogate = 2, xval = 10)
## $minsplit
## [1] 20
## 
## $minbucket
## [1] 7
## 
## $cp
## [1] 0.01
## 
## $maxcompete
## [1] 4
## 
## $maxsurrogate
## [1] 5
## 
## $usesurrogate
## [1] 2
## 
## $surrogatestyle
## [1] 0
## 
## $maxdepth
## [1] 30
## 
## $xval
## [1] 10
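As a minimal sketch of how such control settings could be attached to a fit and the cross-validation results inspected, the example below uses simulated data. The data frame toy and its columns slope, rain and cons are hypothetical stand-ins, not the real census variables.

```r
library(rpart)

# Simulated stand-in data: all names and values here are hypothetical
set.seed(1)
n <- 200
toy <- data.frame(slope = runif(n, 0, 30), rain = runif(n, 0, 10))
toy$cons <- ifelse(toy$slope > 15, 0.6, 0.2) + rnorm(n, sd = 0.05)

# Store the control settings, then hand them to rpart() via 'control'
ctrl <- rpart.control(minsplit = 20, cp = 0.01,
                      maxcompete = 4, maxsurrogate = 5,
                      usesurrogate = 2, xval = 10)
fit <- rpart(cons ~ slope + rain, data = toy,
             method = "anova", control = ctrl)

# The 'xerror' column is the 10-fold cross-validated relative error
printcp(fit)
```

Keeping the settings in a named object such as ctrl makes it easy to see exactly which values a given tree was grown with.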

Load the rpart.plot package, which you must have installed beforehand, and then draw the tree.

library(rpart.plot)
## Warning: package 'rpart.plot' was built under R version 3.2.5
rpart.plot(party3)

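Notice that only a few of the variables appear in the tree. rpart does not select variables by p value; it keeps a split only if it improves the overall fit by at least the complexity parameter (cp = 0.01 here), so variables that never provide such a split are left out. The proportions are in order in the terminal nodes.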
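One way to check which variables a fitted tree actually uses, and to prune it to the cross-validated optimum, is sketched below on simulated data. The data frame toy and its columns (slope, rain, noise, cons) are hypothetical, and pruning at the minimum xerror is just one common choice, not necessarily what was done for the tree above.

```r
library(rpart)

# Simulated stand-in data: all names and values here are hypothetical
set.seed(42)
n <- 300
toy <- data.frame(slope = runif(n, 0, 30),
                  rain  = runif(n, 0, 10),
                  noise = rnorm(n))
toy$cons <- ifelse(toy$slope > 15, 0.6, 0.2) + rnorm(n, sd = 0.05)

fit <- rpart(cons ~ ., data = toy, method = "anova")

# Variables that appear in at least one split ("<leaf>" marks terminal nodes)
used <- setdiff(unique(as.character(fit$frame$var)), "<leaf>")
print(used)

# Importance scores also credit variables that act only as surrogates
print(fit$variable.importance)

# Prune back to the cp value with the lowest cross-validated error
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned <- prune(fit, cp = best_cp)
print(pruned)
```

Comparing used with the full list of predictors shows directly which variables were left out of the tree.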