Use historical harvests, number of hunters, and weather data to predict the harvest for the upcoming hunting seasons.
NOTICE that I am only looking at the general rifle hunting seasons on public land. There are also hunters for Archery, Muzzleloader, Private Land, Ranching for Wildlife, etc.
Load required libraries for wrangling data, charting, and mapping
library(plyr,quietly = T) # data wrangling
library(dplyr,quietly = T) # data wrangling
library(ggplot2, quietly = T) # charting
library(ggthemes,quietly = T) # so I can add the highcharts theme and palette
library(scales,quietly = T) # to load the percent function when labeling plots
library(caret,quietly = T) # classification and regression training
library(foreach,quietly = T) # parallel processing to speed up the model training
library(doMC,quietly = T) # parallel processing to speed up the model training
library(lubridate,quietly = T) # for timing models
Set our preferred charting theme
theme_set(theme_minimal()+theme_hc()+theme(legend.key.width = unit(1.5, "cm")))
Run script to get hunter data
source('~/_code/colorado-dow/datasets/Colorado Elk Harvest Data.R', echo=F)
The working directory was changed to /Users/psarnow/_code/colorado-dow/datasets inside a notebook chunk. The working directory will be reset when the chunk is finished running. Use the knitr root.dir option in the setup chunk to change the working directory for notebook chunks.
Table of the harvest data
head(COElkRifleAll)
Unit Harvest.Antlered Hunters.Antlered Success.Antlered Season HuntCode Harvest.Antlerless Hunters.Antlerless
1 1 0 0 NA 1 EM001O1R NA NA
2 2 0 0 NA 1 EM002O1R NA NA
3 201 0 0 NA 1 EM201O1R NA NA
4 3 0 0 NA 1 EM003O1R NA NA
5 301 0 0 NA 1 EM301O1R NA NA
6 4 0 0 NA 1 EM004O1R NA NA
Success.Antlerless Hunters.Either Success.Either Year
1 NA NA NA 2006
2 NA NA NA 2006
3 NA NA NA 2006
4 NA NA NA 2006
5 NA NA NA 2006
6 NA NA NA 2006
Load the weather data
load("weatherdata5.RData")
head(weatherdata5)
source geodata
source('~/_code/colorado-dow/datasets/Colorado GMUnit and Road data.R', echo=F)
rgdal: version: 1.2-20, (SVN revision 725)
Geospatial Data Abstraction Library extensions to R successfully loaded
Loaded GDAL runtime: GDAL 2.1.3, released 2017/20/01
Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.5/Resources/library/rgdal/gdal
GDAL binary built with GEOS: FALSE
Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.5/Resources/library/rgdal/proj
Linking to sp version: 1.2-7
rgeos version: 0.3-26, (SVN revision 560)
GEOS runtime version: 3.6.1-CAPI-1.10.1 r0
Linking to sp version: 1.2-7
Polygon checking: TRUE
OGR data source with driver: ESRI Shapefile
Source: "/Users/psarnow/_code/colorado-dow/datasets/CPW_GMUBoundaries/BigGameGMUBoundaries03172015.shp", layer: "BigGameGMUBoundaries03172015"
with 185 features
It has 12 fields
Integer64 fields read as strings: GMUID
OGR data source with driver: ESRI Shapefile
Source: "/Users/psarnow/_code/colorado-dow/datasets/ne_10m_roads/ne_10m_roads.shp", layer: "ne_10m_roads"
with 56601 features
It has 29 fields
Integer64 fields read as strings: scalerank question
Take a peak at the boundary data
head(Unitboundaries2)
id long lat order hole piece group COUNTY DEERDAU ELKDAU ANTDAU MOOSEDAU BEARDAU LIONDAU SQ_MILES ACRES
1 1 -109.0486 40.83476 1 FALSE 1 1.1 MOFFAT D-1 E-47 A-11 M-99 B-15 L-1 127.2227 81422.53
2 1 -109.0472 40.83357 2 FALSE 1 1.1 MOFFAT D-1 E-47 A-11 M-99 B-15 L-1 127.2227 81422.53
3 1 -109.0460 40.83295 3 FALSE 1 1.1 MOFFAT D-1 E-47 A-11 M-99 B-15 L-1 127.2227 81422.53
4 1 -109.0449 40.83228 4 FALSE 1 1.1 MOFFAT D-1 E-47 A-11 M-99 B-15 L-1 127.2227 81422.53
5 1 -109.0438 40.83204 5 FALSE 1 1.1 MOFFAT D-1 E-47 A-11 M-99 B-15 L-1 127.2227 81422.53
6 1 -109.0423 40.83181 6 FALSE 1 1.1 MOFFAT D-1 E-47 A-11 M-99 B-15 L-1 127.2227 81422.53
SHAPE_area SHAPE_len Unit
1 329506619 100751.7 1
2 329506619 100751.7 1
3 329506619 100751.7 1
4 329506619 100751.7 1
5 329506619 100751.7 1
6 329506619 100751.7 1
Set to predictive analytics directory
setwd("~/_code/colorado-dow/phase III - predictive analytics")
I could also use the harvest and population estimates to create another elk population number, might be useful in predicting. Because there are some elk not accounted for in DecemberPopulation = JanuaryPopulation - FallHarvest
Group weather data by year
UnitWeather <- summarise(group_by(weatherdata5,Year,Unit),
daily.temperatureHigh = max(daily.temperatureHigh,na.rm = T),
daily.temperatureLow = min(daily.temperatureLow,na.rm = T),
daily.temperatureMean = mean(daily.temperatureMean,na.rm = T),
daily.precipAccumulation = mean(daily.precipAccumulation,na.rm = T),
daily.precipType = mean(daily.precipType,na.rm = T),
daily.windSpeed = mean(daily.windSpeed,na.rm = T),
daily.FullmoonPhase = mean(daily.FullmoonPhase,na.rm = T))
Appropriate field classes for model training
UnitWeather$Year <- as.numeric(UnitWeather$Year)
Group Hunter data by Year and Unit
COElkHarvest <- summarise(group_by(COElkRifleAll,Year,Unit),
Harvest = sum(c(Harvest.Antlered,Harvest.Antlerless),na.rm = T),
Hunters = sum(c(Hunters.Antlered,Hunters.Antlerless,Hunters.Either),na.rm = T))
COElkHarvest$Year <- as.numeric(COElkHarvest$Year)
Join Harvest, Hunter, and Weather data
COElkHarvest <- full_join(COElkHarvest, UnitWeather, by = c("Year","Unit"))
Remove rows with missing data
COElkHarvest <- filter(COElkHarvest, !is.na(Harvest) & !is.na(daily.temperatureMean) & Year != 2018)
Split into train and test sets. Will use 75% of the data to train on.
COElkHarvest <- mutate(group_by(COElkHarvest,Unit),
numentries = n())
COElkHarvest <- filter(COElkHarvest, numentries >= 3)
COElkHarvest$UnitYear <- paste(COElkHarvest$Unit, COElkHarvest$Year)
traindata <- COElkHarvest %>% group_by(Unit) %>% sample_frac(size = .75, replace = F)
testdata <- COElkHarvest[!COElkHarvest$UnitYear %in% traindata$UnitYear,]
COElkHarvest <- select(COElkHarvest, -UnitYear, -numentries)
traindata <- select(traindata, -UnitYear, -numentries)
testdata <- select(testdata, -UnitYear, -numentries)
Save off for importing into AzureML
write.csv(COElkHarvest,file = "~/_code/colorado-dow/datasets/COElkHarvest.csv",row.names = F)
Loop through possible methods, utilizing the quicker ‘adaptive_cv’ parameter search from caret. Consider scripting this into AzureML to make it run much faster, though there is more setup and errors to control for
quickmethods <- c("lm",'svmLinear',"svmRadial","knn","cubist","kknn")
step1_all <- NULL
for (imethod in quickmethods) {
step1 <- NULL
start <- now()
if (imethod == "lm" | imethod == "svmLinear") {
controlmethod <- "repeatedcv"
} else {controlmethod <- "adaptive_cv"}
fitControl <- trainControl(
method = controlmethod,
# search = 'random',
number = 4,
repeats = 4,
allowParallel = TRUE,
summaryFunction = defaultSummary)
registerDoSEQ()
registerDoMC(cores = 6)
HarvestModel_1 = train(Harvest ~ ., data = traindata,
method = imethod,
#preProc = c("center","scale"),
tuneLength = 15,
trControl = fitControl)
HarvestModel_1
# measure performance
predictdata <- predict(HarvestModel_1, testdata)
step1$method <- imethod
step1$RMSE <- postResample(pred = predictdata, obs = testdata$Harvest)[1]
step1$duration <- now() - start
step1 <- as.data.frame(step1)
step1_all <- rbind(step1_all,step1)
}
row names were found from a short variable and have been discardedrow names were found from a short variable and have been discardedrow names were found from a short variable and have been discardedrow names were found from a short variable and have been discardedrow names were found from a short variable and have been discardedrow names were found from a short variable and have been discardedrow names were found from a short variable and have been discardedrow names were found from a short variable and have been discardedrow names were found from a short variable and have been discardedrow names were found from a short variable and have been discarded
View Results
step1_all
method RMSE duration
RMSE lm 52.23203 1.742155 secs
RMSE1 svmLinear 56.67347 1.994666 secs
RMSE2 svmRadial 55.10650 7.456390 secs
RMSE3 knn 105.23611 4.174521 secs
RMSE4 cubist 60.09588 9.766908 secs
RMSE5 kknn 59.56523 15.774304 secs
Run the best model with all of the data
fitControl <- trainControl(
method = "repeatedcv",
# search = 'random',
number = 4,
repeats = 4,
allowParallel = TRUE,
summaryFunction = defaultSummary)
registerDoSEQ()
registerDoMC(cores = 6)
HarvestModel_1 = train(Harvest ~ ., data = COElkHarvest,
method = 'lm',
#preProc = c("center","scale"),
tuneLength = 15,
trControl = fitControl)
HarvestModel_1
Linear Regression
704 samples
10 predictor
No pre-processing
Resampling: Cross-Validated (4 fold, repeated 4 times)
Summary of sample sizes: 527, 527, 529, 529, 528, 529, ...
Resampling results:
RMSE Rsquared MAE
57.90792 0.9189491 41.26232
Tuning parameter 'intercept' was held constant at a value of TRUE
# Get 2018 Hunters from the Hunters Predicted model
load("~/_code/colorado-dow/datasets/COElkHunters2018_Season.RData")
COElkHunters2018Predicted <- summarise(group_by(COElkHunters2018_Season, Year, Unit),
Hunters = sum(Hunters,na.rm = T))
# ensure we include all of the units
COElkHarvest2018 <- as.data.frame(unique(COElkHarvest$Unit))
colnames(COElkHarvest2018) <- "Unit"
COElkHarvest2018$Year <- 2018
# Weather data for 2018
UnitWeather2018 <- filter(UnitWeather,Year==2018)
# A left join will autofill missing draw data with NAs, but will retain the full list of Units
COElkHarvest2018 <- left_join(COElkHarvest2018,UnitWeather2018)
Joining, by = c("Unit", "Year")
Column `Unit` joining factor and character vector, coercing into character vector
# Join in forecasted Hunters
COElkHarvest2018 <- left_join(COElkHarvest2018,COElkHunters2018Predicted)
Joining, by = c("Unit", "Year")
# Replace the hunter data with missing entries
COElkHarvest2018$Hunters[is.na(COElkHarvest2018$Hunters)] <- 0
COElkHarvest2018 <- COElkHarvest2018[, colnames(COElkHarvest2018) %in% c("Unit",HarvestModel_1$coefnames)]
COElkHarvest2018$Harvest <- predict(HarvestModel_1, COElkHarvest2018)
COElkHarvest2018$Harvest[COElkHarvest2018$Harvest<0] <- 0
# Combine with historic data
COElkHarvestAll <- rbind.fill(COElkHarvest,COElkHarvest2018)
ggplot(COElkHarvestAll, aes(Year,Harvest)) +
geom_bar(position="dodge",stat="identity") +
# coord_cartesian(ylim = c(130000,155000)) +
scale_fill_hc() +
labs(title="Statewide Elk Harvest", caption="source: cpw.state.co.us")
Harvest_77 <- filter(COElkHarvestAll, Unit == "77")
ggplot(Harvest_77, aes(Year,Harvest)) +
geom_bar(position="dodge",stat="identity") +
# coord_cartesian(ylim = c(130000,155000)) +
scale_fill_hc() +
labs(title="Unit 77 Elk Harvest", caption="source: cpw.state.co.us")