0.1 About This Document

This document summarizes a few different techniques for processing avalanche hazard data so that it can be used in a survival analysis of Sierra Nevada Bighorn Sheep.

The avalanche hazard data is large and high-dimensional. In order to meaningfully incorporate it into our survival analysis, we need to reduce its dimensionality. This document describes how I’ve done this, and compares the results of a few different methods. The methods are based on discussion and a rough understanding of the dangers avalanches present to sheep.

0.1.1 Goal

There are infinitely many ways of aggregating and reducing this type of data. I have no idea which way is best for this application. My goal with this document is to explain the methods and provide a toolset with which to manipulate large multidimensional raster datasets. The reader should be able to manipulate the code and use it in their own analyses.

0.1.2 Format

This document contains both the description of the data reduction and the actual code. The descriptive text appears as this text does, while the implementation code appears as follows:

# Installing Required Packages
# Running the code in these sections will reproduce 
#   the data processing steps.
# In order to run the code, your R must first be 
#   configured correctly. 

# Sets up your library source
# Note the option "eval=FALSE" in the markdown source.
# This code only needs to be run once. 
# If you need to run it, change that option to:
#   eval=TRUE

# Some of the required packages come from the Bioconductor repository
source("https://bioconductor.org/biocLite.R")
biocLite("rhdf5")

If you would like to reproduce the data processing steps as you go through this document, you can run each code section in order. The following section loads the required R packages, and installs those that are not already installed.

# Loading Required Packages 
# This section loads the packages necessary to run the data processing

libraries <- c("rhdf5",
               "rgdal",
               "gdalUtils",
               "rgeos",
               "maptools",
               "raster",
               "rasterVis",
               "ggmap",
               "dismo",
               "sp",
               "Rcpp",
               "coda",
               "bitops",
               "tmap", 
               "ggplot2",
               "reshape2", 
               "grid",
               "gridBase",
               "gridExtra"
               # "animation"
               )
#Install Packages that are not already installed:
for (i in 1:length(libraries) ) {
  if ( !require(libraries[i], character.only=TRUE) ) {
    install.packages(libraries[i])
  }
}
# Load the libraries:
lapply(libraries, library, character.only=TRUE)

# knitr is only required in the markdown document:
library(knitr)

This document and analysis use some custom-defined constants and functions. I put most of these in separate files for organizational purposes and code clarity. Some of them are also included in code snippets within this document, for illustrative purposes. For more details on their implementation, you can view their source files. To use them, they must be sourced as follows:

# Load project constants:
source('Constants.R')
# Load project functions:
source('avalancheFunctions.R')
# prepare Data:
source('dataPrep.r')

0.2 Summary of Data

The raw avalanche hazard data for each water year (Oct 1 - Sept 30) is a three-dimensional (two spatial, one temporal) array of hazard values ranging from 0-100. The time dimension for each year is 365 days long. For each day, there is a 602 by 722 georeferenced rectangular grid. The hazard values are dimensionless and represent the degree of danger. Ned Bair computed them as follows:

Hazard is 0-100, based on:

  1. terrain, i.e. the percent of 30-60 degree terrain (from 10 m DEM) that is snow covered. I ultimately wound up using SNODAS to identify snow covered pixels, irrespective of snow depth.
  2. Sum of daily relative class sizes (R-sizes) from Mammoth Ski Patrol avalanche control records. This sum is normalized by the maximum R-size sum recorded over the study period. The final hazard formula is the unweighted average of the terrain (0-100) and the normalized R-size sum (0-100).

The hazard is also zero if there were no avalanches recorded on Mammoth Mountain. I did this to avoid showing avalanche hazard throughout the snow cover season. The exception is if there are one or two days following recorded avalanches with zero R-size sums, in which case the hazard decays as an exponential function. This addresses the issue where hazard might linger in the backcountry, even though it has been mitigated on Mammoth Mountain.
- Ned Bair
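
To make that formula concrete, here is a minimal sketch of the calculation for a single day and pixel. This is not Ned’s actual code; the function name, the use of the previous day’s hazard, and the decay rate are assumptions made purely for illustration.

# Minimal illustrative sketch of the hazard formula described above -- not Ned Bair's code.
# pctSteepSnowCovered: percent of 30-60 degree terrain that is snow covered (0-100)
# rSizeSum:            sum of daily R-sizes from the Mammoth control records
# maxRSizeSum:         maximum R-size sum recorded over the study period
# previousHazard, daysSinceLastAvalanche, and the decay rate are illustrative assumptions.
dailyHazardSketch <- function(pctSteepSnowCovered, rSizeSum, maxRSizeSum,
                              previousHazard = 0, daysSinceLastAvalanche = Inf) {
  if (rSizeSum > 0) {
    rSizeScore <- 100 * rSizeSum / maxRSizeSum       # normalized R-size sum, scaled to 0-100
    return((pctSteepSnowCovered + rSizeScore) / 2)   # unweighted average of the two terms
  }
  if (daysSinceLastAvalanche <= 2) {
    # one or two days after recorded avalanches: hazard decays exponentially
    return(previousHazard * exp(-daysSinceLastAvalanche))
  }
  return(0)   # no avalanches recorded on Mammoth Mountain: hazard is zero
}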

The code example that follows shows the generation of a plot of one day’s worth of avalanche hazard for the entire Sierra.

# Important Constants
# This section hard-codes some of the constants relating to the data

# Spatial Coordinates of Hazard Data
sierraCoordinates <- c( left = -122.004,
                        right = -115.996 , 
                        bottom = 34.996 , 
                        top = 40.004)

# Coordinate Reference System
myCrsStr <- "+proj=longlat +datum=WGS84"

# Hazard Data Files
hazardFilenames <- list.files('data/HDF5', ".h5$")
hazardFiles <- list.files('data/HDF5', ".h5$", full.names=TRUE)
years <- seq(length(hazardFiles)) + YEAR_ZERO
## ---- Load_Example_Day ----
hazardBrick <- loadHazardYear(2011)
dates <- getZ(hazardBrick)
exampleDate <- "2010-12-25"
exampleDay <- hazardBrick[[grep(exampleDate,dates)]] # one day as an example
caMap <- gmap(x = extent(exampleDay), type="terrain", zoom=7) # map image from google
# reproject our example data onto the same CRS as the google data:
exampleDayGM <- projectRaster(from=exampleDay, crs = CRS("+init=epsg:3857")) 
caMap <- crop(caMap,exampleDayGM)   
plot(caMap, main=exampleDate)       # plot basemap from google
plot(exampleDayGM, add=T,legend=F, col=hazardPalette) # Plot example hazard over basemap
rm(exampleDayGM)

0.3 Data Reduction Goals

We want to use this avalanche data as an explanatory variable in a regression analysis of sheep survival. Ideally, we would like to summarize the degree to which avalanche hazard explains survival as a single value.

Sheep survival is calculated on a yearly basis over eight spatially-defined herd units, so we have 8 values per year compared to the \(~10^8\) hazard values. In order to say anything meaningful about the relationship between the hazard values and survival, we must drastically reduce the dimensions of the hazard data. Ideally we can reduce it to one scalar value per survival data point. In other words, our goal is to transform each \(365 \times 602 \times 722\) yearly hazard cube into a vector of 8 scalars corresponding to the 8 herd units.
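
For scale, a quick back-of-the-envelope check of the numbers quoted above:

# Back-of-the-envelope size of the reduction described above
rawValuesPerYear     <- 365 * 602 * 722   # one yearly hazard cube, ~1.6e8 values
reducedValuesPerYear <- 8                 # one scalar per herd unit
rawValuesPerYear / reducedValuesPerYear   # roughly a 20-million-fold reduction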

0.4 Feature Selection

The first step is to select meaningful features from our raw data; in other words, to throw out useless data points. To do this I make one basic assumption: hazard outside the 8 herd-unit home ranges cannot affect sheep survival.

Using this assumption, we eliminate most of the hazard data outright. Only data that spatially intersect the 8 herd-unit home ranges are kept. We will also look at feature selection based on elevation and time. Note that in all the feature selection steps, the data is reduced based on its location, not its value.

The code example shows importing shapefiles into R, making a list of herd-unit polygons, and plotting the polygons on a map.

hrShapeFiles <- list.files(gisDir, "kde.95.shp")
hrPolyList <- list()  # List of herd polygons 
for(i in 1:length(hrShapeFiles)) {
  herdCode <- substr(hrShapeFiles[i],1,2)
  hr.poly <- shapefile(paste(gisDir,hrShapeFiles[i],sep=''))
  hrPolyList[[herdCode]] <- hr.poly  # use [[ ]] so the polygon object is stored directly
}
allHerdPoly <- gUnionByList(hrPolyList) # Union of regions of interest
## ---- Plot_Example_Day_Filtered ----
sheepHazardBrick <- mask(crop(hazardBrick, allHerdPoly),allHerdPoly)
exampleDaySheepOnly <- sheepHazardBrick[[grep(exampleDate,dates)]] # one day as an example
exampleDayGM <- projectRaster(from=exampleDaySheepOnly, crs = CRS("+init=epsg:3857")) 
caMapSheep <- gmap(x = extent(mapCoordinates), type="terrain", zoom=8)
caMapSheep <- crop(caMapSheep,exampleDayGM) 
plot(caMapSheep)       
plot(exampleDayGM, add=T,legend=F, col=hazardPalette)

rm(exampleDayGM)
## ---- Plot_Herd_Polygons ----
plot(caMapSheep)
n <- length(hrPolyList)
mapColors <- adjustcolor(brewer.pal(n,"Set1"), alpha.f = 0.5)
for(i in 1:n) {
  plot(spTransform(hrPolyList[[i]], gCrs), add=T, col=mapColors[i])
}

[Figure: Herd Units]

rm(n, mapColors)

0.4.1 Spatial Divisions (Herd vs. Recovery)

All methods of data reduction are performed on a herd-unit basis as outlined above, and also on a recovery-unit basis. The recovery unit areas are defined as the union of the herd unit areas that make up the recovery unit. For example, the spatial boundaries of the Northern Recovery Unit are defined by the union of the Warren and Gibbs herd units.

The code example shows combining herd units to form recovery units, and plots the recovery units.

# Define recovery units:
ruSouthHerds <- hrPolyList[ruSouth]
ruCentralHerds <- hrPolyList[ruCentral]
ruNorthHerds <- hrPolyList[ruNorth]

# Build recovery unit polygons:
ruSouthPoly <- gUnionByList(ruSouthHerds)
ruCentralPoly <- gUnionByList(ruCentralHerds)
ruNorthPoly <- gUnionByList(ruNorthHerds)

# List of recovery polygons:
ruPolyList <- list(South=ruSouthPoly, Central=ruCentralPoly, North=ruNorthPoly)

Now we can plot the recovery units:

plot(caMapSheep)
n <- length(ruPolyList)
mapColors <- adjustcolor(brewer.pal(n,"Set1"), alpha.f = 0.5)
for(i in 1:n) {
  plot(spTransform(ruPolyList[[i]], gCrs), add=T, col=mapColors[i])
}
[Figure: Recovery Units]

rm(n, mapColors)

0.4.2 Elevation Filtering

We can also reduce the amount of features by discarding hazard data from certain elevations. In doing so, we assume that certain parts of the terrain do not influence sheep survival, regardless of their modeled avalanche hazard. For the elevation filtering in this data analysis, we make the following assumptions:

  • Below a certain elevation threshold, there is not enough snow to be hazardous to sheep, even if the model presents some hazard.
  • Above a certain elevation, sheep are not present during the time where there is avalanche hazard.
  • We can ignore the fine scale elevation differences within a single pixel of avalanche hazard.

The last assumption must be explained. I am filtering the hazard pixels based on the average elevation within one hazard pixel. If the average is too low or too high, that entire pixel is thrown away. It might be better to instead map the hazard values onto a higher resolution grid, so that it matches up precisely with the resolution of the elevation data. Doing this would allow me to select hazard pixels with a much more precise elevation filter. I am not doing this because I wish to use the same pixel scale for both the elevation-filtered data and the rest of the data reduction procedures. In other words, laziness.
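
The generateElevationFilter() function used below is defined in avalancheFunctions.R and is not reproduced here. The following is only a rough sketch of what such a function has to do; the resampling approach and the 1-versus-NA keep/drop convention are my assumptions.

# Rough sketch of an elevation-filter generator; the real implementation
# lives in avalancheFunctions.R. Assumes the filter is a raster of 1 (keep)
# and NA (drop) values on the same grid as the hazard data.
generateElevationFilterSketch <- function(elevationFile, alignmentFile,
                                          minElevation = -Inf, maxElevation = Inf) {
  dem       <- raster(elevationFile)    # fine-resolution elevation data
  alignment <- raster(alignmentFile)    # one hazard layer, defines the coarse target grid
  # Approximate the mean elevation within each hazard pixel by resampling the
  # DEM onto the hazard grid (bilinear resampling is only an approximation):
  meanElevation <- resample(dem, alignment, method = "bilinear")
  keep <- (meanElevation >= minElevation) & (meanElevation <= maxElevation)
  keep[keep == 0] <- NA   # pixels outside the elevation band are dropped entirely
  return(keep)
}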

The code example shows generating an elevation filter, and filtering a day’s worth of hazard data based on the elevation.

First, let’s look at the elevations at which avalanche mortalities have been observed:

mortElevations <- read.csv(file="ElevAvalMorts.csv", header=TRUE, sep=",")
ggplot(mortElevations, aes(x=RASTERVALU*M_PER_FOOT)) + 
  geom_bar(width=20) + 
  xlab("Elevation (m)") +
  ggtitle('Observed Avalanche Mortality Elevations')
[Figure: Observed Avalanche Mortality Elevations]

From the plot, we can guess some boundaries for an elevation filter. It looks like we want to use elevations roughly in the 2250m to 3500m range. However, our sample is small here and it’s unwise to trust these cutoffs too much. Also, observed sheep are probably at the bottom of slides, so we might want a higher upper bound.
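
One quick check on those eyeballed bounds is to look at the empirical quantiles of the observed mortality elevations. This is just a sanity check, not part of the processing pipeline.

# Empirical quantiles of the observed avalanche-mortality elevations (meters).
# A rough check on the eyeballed 2250-3500 m band.
quantile(mortElevations$RASTERVALU * M_PER_FOOT,
         probs = c(0, 0.05, 0.5, 0.95, 1), na.rm = TRUE)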

# Generate reference raster file:
referenceBrick <- loadHazardYear(years[1])
referenceDay <- referenceBrick[[86]]
writeRaster(referenceDay, filename="gis/referenceAvyRaster.tif",format="GTiff",overwrite=T)
# Clear large objects from memory:
rm(referenceBrick, referenceDay)   

# All-Pass Filter (used for default processing)
noFilter <- generateElevationFilter(elevationFile='gis/elev_hdr.tif', 
                                    alignmentFile='gis/referenceAvyRaster.tif')
# Elevation Filter Generation
elevationFilter <- generateElevationFilter(elevationFile='gis/elev_hdr.tif', 
                                           alignmentFile='gis/referenceAvyRaster.tif',
                                           minElevation=2250,
                                           maxElevation=3657)
## ---- Plot_Elevation_Filter ----
plot(caMap)
plot(projectRaster(from=elevationFilter, crs = CRS("+init=epsg:3857")),
     legend=F, add=T, col=c("#FF0F0F00"), alpha=0.5)
[Figure: Elevation Filter]

## ---- Plot_Hazard_Raw ----
plot(caMap)
plot(projectRaster(from=exampleDay, crs = CRS("+init=epsg:3857")),
     col=lavaPalette2, legend=F, add=T)
[Figure: Raw Hazard]

## ---- Plot_Hazard_ElevationFiltered ----
plot(caMap)
filteredExample <- filterRaster(exampleDay, elevationFilter)
plot(projectRaster(from=filteredExample, crs = CRS("+init=epsg:3857")),
     col=lavaPalette2, legend=F, add=T)
[Figure: Elevation-Filtered Hazard]

rm(filteredExample)

0.4.3 Seasonal Selection

We can throw out some data based on the season as well. We will do this by excluding summer data: the analysis time frame runs from the beginning of the water year through May. This does not have a big effect on the results, but it simplifies things and reduces the processing load.
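
As a sketch of how that restriction could be applied directly to a hazard brick (the October-through-May window comes from the goal stated above; the helper name and implementation are mine):

# Sketch: keep only layers from the start of the water year (October) through May.
# Assumes the brick has its dates attached via setZ(), as in the loading code above.
octThroughMay <- function(hazardBrick) {
  dates <- as.Date(getZ(hazardBrick))
  keep  <- which(as.numeric(format(dates, "%m")) %in% c(10:12, 1:5))
  setZ(hazardBrick[[keep]], dates[keep])
}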

Let’s take a look at the data to see what time frames have data.

First we can compute some statistics on the data and load them into a data frame:

stackedStatsDF <- data.frame()
for( year in years) {
  hazardBrick <- loadHazardYear(year)
  dates <- getZ(hazardBrick)
  hazardBrick <- mask(crop(hazardBrick, allHerdPoly),allHerdPoly)
  hazardBrick <- setZ(hazardBrick, dates)
  means <- cellStats(mask(crop(hazardBrick, allHerdPoly),allHerdPoly), stat='mean')
  FOT_60 <- as.numeric(brickApply(hazardBrick, fractionOverThreshold, 60))
  index <- c(1:365)
  dates <- dates[index]
  means <- means[index]
  FOT_60 <- FOT_60[index]
  tmpDF <- data.frame(index=index,
                      water_year=as.character(year),
                      date=dates, 
                      mean= means,
                      FOT_60=FOT_60)
  
  stackedStatsDF <- rbind(stackedStatsDF, tmpDF)
  rm(tmpDF)
}
head(stackedStatsDF)

Let’s examine the data with a plot:

minIdx <- min(stackedStatsDF[stackedStatsDF$mean>0,'index'])
maxIdx <- max(stackedStatsDF[stackedStatsDF$mean>0,'index'])
avySeasonIndex <- c(minIdx:maxIdx)
tmpDF <- stackedStatsDF[stackedStatsDF$mean>0,] # find non-zero value

monthLabels <- format(seq.Date(from = as.Date("2000-10-01"), to=as.Date("2001-09-30"), by="month"), "%b")
ggplot(stackedStatsDF, aes(x=index, y=mean, group=water_year)) +
  #geom_tile(aes(height=max(stackedStatsDF$value))) +
  geom_area(color = "black", fill = "red",alpha = .2) +
  facet_grid(water_year~., scales="free_x", space="free_x") +
  # geom_blank() +
  #scale_fill_gradient2(low="blue", high="red") +
  scale_x_continuous(name="Time of Year", breaks=seq(30, 365, by = 30), labels = monthLabels) +
  ylab("means") + 
  annotate("rect",xmin=minIdx, xmax=maxIdx, ymin=0, ymax=100,alpha=.1, fill="blue")
[Figure: Year Comparison]

tmpDF[tmpDF$index==minIdx,]$date # Earliest date with hazard
## [1] "2009-10-15"
tmpDF[tmpDF$index==maxIdx,]$date # Latest date with hazard.
## [1] "2011-06-02"
rm(tmpDF)

Ok, now we can restrict our data to a more meaningful avalanche season:

tmpDF <- stackedStatsDF[(stackedStatsDF$index > minIdx) & (stackedStatsDF$index < maxIdx), ]
ggplot(tmpDF, aes(x=index, y=mean, group=water_year)) +
  #geom_tile(aes(height=max(stackedStatsDF$value))) +
  geom_area(color = "black", fill = "red",alpha = .2) +
  facet_grid(water_year~., scales="free_x", space="free_x") +
  # geom_blank() +
  #scale_fill_gradient2(low="blue", high="red") +
  scale_x_continuous(name="Time of Year", breaks=seq(30, 365, by = 30), labels = monthLabels) +
  ylab("means") 
[Figure: Restricted Season]

rm(tmpDF)

0.4.4 Summary of Feature Selection

To review, we’ve now filtered out all the data outside of the herd unit boundaries, and split it up according to two separate methods: one based on herd units, and one based on recovery units. Lastly, for each of these feature selection methods, we can optionally filter the data based on elevation. Therefore, when we compare methods of feature extraction, we will have four categories of data: herd vs. recovery units, and elevation filtered vs. unfiltered.
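
Written out as a small grid, the four categories look like this; the labels match the spatialDivisions and spatialFilters lists defined in the crunch loop below.

# The four feature-selection combinations compared in the rest of the document.
featureSelectionGrid <- expand.grid(division = c("HerdUnits", "RecoveryUnits"),
                                    filter   = c("none", "elevation"))
featureSelectionGrid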

0.5 Feature Extraction

Now that the data is split up by herd unit, we still have a dimensionality problem. We need to reduce our remaining data to one scalar value per spatial unit per year. There are infinitely many ways of doing this, and the main point of this analysis is to compare a few simple ones.

0.5.1 Simple Means

The simplest method is to just take the mean of all the hazard values within a spatial region. This mean is taken across both space and time.

The code shows how to compute this with built-in R functions:

# statsOfBrickList
# calculate a basic statistics (mean, sd, median)
#   for a list of raster bricks
#   This aggregates the data first in space, 
#   and then by layer (time)
# Returns a list of stats, one value for each brick
#   in the brick list
statsOfBrickList <- function(brickList, statArg, na.rm.Arg=T) {
  statList <- lapply(brickList, FUN=cellStats, stat=statArg, na.rm=na.rm.Arg)
  statList <- lapply(statList, statArg)
  return(statList)
}

# meansOfBrickList
# Shortcut for the statsOfBrickList function
#   using the 'mean' as the stat. 
meansOfBrickList <- function(brickList) {
  return(statsOfBrickList(brickList, statArg='mean', na.rm.Arg=T))
}

0.5.2 Mean Fraction Over Threshold

This is another simple method. It is related to the skew of the hazard distribution, but uses a hard threshold: for each region we compute the fraction of the area with hazard above a certain threshold. This fraction is then averaged over the season.

The following code shows how to compute this stat:

# fractionOverThreshold
# Find the fraction of pixels in a raster above a certain threshold 
# arguments: 
#   raster: a raster layer to analyze
#   threshold: a threshold value to compare pixel values to
# returns: a real valued ratio 
fractionOverThreshold <- function(raster, threshold) {
  valueMatrix <- as.matrix(raster)
  nValues <- sum(!is.na(valueMatrix))
  nOverThreshold <- sum(!is.na(valueMatrix[valueMatrix > threshold]))
  return(nOverThreshold/nValues)
}

# meanFOTofBrickList
# Apply the fraction-over-threshold stat to a brick list
# Generates a list of meanFOT stats for each brick. 
meanFOTofBrickList <- function(brickList, thresholdArg) {
  overThresholdList <- list()
  for(brickName in names(brickList)) {
    y = vector()
    for (i in 1:nlayers(brickList[[brickName]])) {
      y[i] <- fractionOverThreshold(brickList[[brickName]][[i]], threshold=thresholdArg)
    }
    overThresholdList[[brickName]] <- y
  }
  meanFOTList <- lapply(overThresholdList, mean)
  return(meanFOTList)
}

Now that we’ve seen an example of how to apply a feature extraction function to a list of hazard bricks, let’s take that same code structure and generalize it. The following code shows the function “brickList2ScalarList”, which takes any function like “fractionOverThreshold” that generates scalars from rasters, and applies it to a list of raster bricks.

# brickList2ScalarList
# This is a generic function to aggregate data in a list of raster bricks.
# Arguments: 
#     brickList: list of raster bricks
#     statFun: by-layer integration function. Defaults to mean
#     raster2scalarFun: within-layer integration function
# Returns: 
#   scalarList: list of scalar values, one for each brick in brickList
brickList2ScalarList <- function(brickList, statFun=mean, raster2scalarFun, ...) {
  scalarList <- list()
  for (i in 1:length(brickList)) {
    brick <- brickList[[i]]
    y = vector()
    for (j in 1:nlayers(brick)) {
      y[j] <- raster2scalarFun( brick[[j]], ...)
    }
    scalarList[[i]] <- y
  }
  scalarList <- lapply(scalarList, statFun)
  if (!is.null(names(brickList))) {
    names(scalarList) <- names(brickList)
  }
  return(scalarList)
}

Here’s an example of using that general framework to compute the fraction-over-threshold statistic at several thresholds for a single year, and to compare the results with and without the elevation filter:

year <- 2011
hazardBrick <- loadHazardYear(year)   # load raw hazard data
sheepHazardBrick <- mask(crop(hazardBrick, allHerdPoly),allHerdPoly) # Apply spatial filter
rm(hazardBrick)   # unload raw data
polyList <- hrPolyList
myFilter <- noFilter
filteredBrick <- filterRaster(sheepHazardBrick, myFilter) # Apply elevation/other filter
brickList <- generateBricksForRegions(polyList, filteredBrick) # Divide data by spatial polygon
# rm(filteredBrick)

thresholds <- c(50, 60, 70)
plotDF <- data.frame(herd = names(brickList),
                     area=sapply(brickList,areaOfBrick),
                     lat=sapply(brickList, midLatitude)
                     )
for ( threshold in thresholds) {
  meanFOTList <- brickList2ScalarList(brickList, statFun=mean, fractionOverThreshold, threshold=threshold)
  plotDF <- cbind(plotDF,unlist(meanFOTList))
  colnames(plotDF)[length(plotDF)] <- paste("meanFractionOver", threshold,sep='')
}

plotDF_long <- melt(plotDF, measure.vars=colnames(plotDF)[4:6])
p1 <- ggplot(plotDF_long, aes(x=reorder(herd, lat), y=value, fill=variable)) +
  geom_bar(stat="identity", position="dodge", colour="black") +
  scale_fill_brewer(palette="Pastel2") +
  xlab("herd, ordered by latitude") +
  ylab("time-average of % over threshold") + 
  ggtitle("Threshold Metrics by Herd (2011)")

myFilter <- elevationFilter
filteredBrick <- filterRaster(filteredBrick, myFilter) # Apply elevation/other filter
brickList <- generateBricksForRegions(polyList, filteredBrick) # Divide data by spatial polygon
rm(filteredBrick)

plotDF <- data.frame(herd = names(brickList),
                     area=sapply(brickList,areaOfBrick),
                     lat=sapply(brickList, midLatitude)
                     )
for ( threshold in thresholds) {
  meanFOTList <- brickList2ScalarList(brickList, statFun=mean, fractionOverThreshold, threshold=threshold)
  plotDF <- cbind(plotDF,unlist(meanFOTList))
  colnames(plotDF)[length(plotDF)] <- paste("meanFractionOver", threshold,sep='')
}

plotDF_long <- melt(plotDF, measure.vars=colnames(plotDF)[4:6])
p2 <- ggplot(plotDF_long, aes(x=reorder(herd, lat), y=value, fill=variable)) +
  geom_bar(stat="identity", position="dodge", colour="black") +
  scale_fill_brewer(palette="Pastel2") +
  xlab("herd, ordered by latitude") +
  ylab("time-average of % over threshold") + 
  ggtitle("Threshold Metrics by Herd, elevaton filtered (2011)")


grid.arrange(p1, p2, nrow=1, ncol=2, widths=c(4,4), heights=2)

[Figure: Single Year Comparison]

0.6 Crunching the Data

Now that we have established how to load the data, select the features, and use different functions to extract the features, the last step in our data reduction pathway is to put it all together and apply those methods to all years of data. Then we can store the reduced data and compare it. To do this, we take all of the different methods discussed above and run loops over each of the possible combinations.

The code is as follows:

################################
# Feature Selection Methods    #
################################

spatialFilters <- list(none=noFilter, elevation=elevationFilter)
spatialDivisions <- list(HerdUnits=hrPolyList, RecoveryUnits=ruPolyList)

################################
# Feature Extraction Methods   #
################################

extractionMethods <- list()
extractionMethods[['simpleMeans']] <- function(brickList) { 
  statsOfBrickList(brickList, statArg='mean') }
extractionMethods[['meanFOT50']] <- function(brickList) { 
  brickList2ScalarList(brickList, statFun=mean, fractionOverThreshold, threshold=50) }
extractionMethods[['meanFOT60']] <- function(brickList) { 
  brickList2ScalarList(brickList, statFun=mean, fractionOverThreshold, threshold=60) }


################################
# Main Data Crunch Loop        #
################################

# Main data processing loop, runs over years in analysis
dataByYear <- list()
for ( year in years) {
  # Select hazard data file by year:
  hazardBrick <- loadHazardYear(year)
  dates <- getZ(hazardBrick)
  # Use only data within the herd polygons:
  sheepHazardBrick <- mask(crop(hazardBrick, allHerdPoly),allHerdPoly) 
  rm(hazardBrick)
  dataByDivision <- list()
  for ( divName in names(spatialDivisions)) { 
    dataByFilter <- list()
    for ( filterName  in names(spatialFilters))  {
      polyList <- spatialDivisions[[divName]]
      myFilter <- spatialFilters[[filterName]]
      
      filteredBrick <- filterRaster(sheepHazardBrick, myFilter)
      brickList <- generateBricksForRegions(polyList, filteredBrick)
      rm(filteredBrick)
      
      dataByMethod <- list()
      for ( method in names(extractionMethods) ) {
        extraction <- extractionMethods[[method]]
        extractedFeatureList <- extraction(brickList)
        dataByMethod[[method]] <- extractedFeatureList
      }
      dataByFilter[[filterName]] <- dataByMethod
    }
   dataByDivision[[divName]] <- dataByFilter 
  }
  dataByYear[[as.character(year)]] <- dataByDivision
}

0.7 Formatting Data

The main data loop stored the results in a recursive list-of-lists. The list has the water year as its first index, which makes it hard to see time sequences. This format is not easy to use for graphing, so I ran another loop to convert it into a couple of different formats.
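
That conversion loop is not reproduced in this document. The sketch below shows one way to flatten the nested list into the long data frame (called all.df here) that the plots in the next section expect; the loop structure mirrors the main crunch loop, and the exact column construction is an assumption.

# Sketch: flatten dataByYear (year -> division -> filter -> method -> unit)
# into one long data frame with the columns used by the plotting code below.
all.df <- data.frame()
for (year in names(dataByYear)) {
  for (divName in names(dataByYear[[year]])) {
    for (filterName in names(dataByYear[[year]][[divName]])) {
      for (method in names(dataByYear[[year]][[divName]][[filterName]])) {
        featureList <- dataByYear[[year]][[divName]][[filterName]][[method]]
        all.df <- rbind(all.df,
                        data.frame(year     = as.numeric(year),
                                   division = divName,
                                   filter   = filterName,
                                   method   = method,
                                   unit     = names(featureList),
                                   value    = unlist(featureList)))
      }
    }
  }
}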

0.8 Comparison and Visualization

0.8.1 Elevation Filtering Comparison

The elevation filter does not appear to have a big influence on the extracted data:

d <- all.df[all.df$division=="RecoveryUnits" & all.df$method=="simpleMeans", ]
elPlotMeans <- ggplot(d, aes(x=year, y=value, color=filter, group=filter) )+
  geom_line(size=2, alpha=0.5) + 
  scale_colour_brewer(palette="Set1") +
  guides(colour = guide_legend(override.aes = list(alpha = 1))) +
  facet_grid(unit~., scales="free_x", space="free_x") +
  ylab("means") + xlab("water year") +
  ggtitle('Elevation Filter for Mean Hazard, (2286-3657)m')

d <- all.df[all.df$division=="RecoveryUnits" & all.df$method=="meanFOT60", ]
elPlotFOT <- ggplot(d, aes(x=year, y=value, color=filter, group=filter) )+
  geom_line(size=2, alpha=0.5) + 
  scale_colour_brewer(palette="Set1") +
  guides(colour = guide_legend(override.aes = list(alpha = 1))) +
  facet_grid(unit~., scales="free_x", space="free_x") +
  ylab("means") + xlab("water year") +
  ggtitle('Elevation Filter for FOT_60, (2286-3657)m')

grid.arrange(elPlotMeans, elPlotFOT, nrow=1, ncol=2, widths=c(4,4), heights=2)
[Figure: Elevation Filtering Comparison]

d <- all.df[all.df$division=="RecoveryUnits" & 
              all.df$filter=="none" , ]
p1 <- ggplot(d, aes(x=year, y=(value/sum(value)), color=unit, group=unit) )+
  geom_line(size=2, alpha=0.5) + 
  #geom_bar(stat="identity", alpha = .5, width=1) +
  scale_colour_brewer(palette="Set1") +
  guides(colour = guide_legend(override.aes = list(alpha = 1))) +
  facet_grid(method ~., scales="free") +
  ggtitle("Fraction-Over-Threshold vs Means") +
  ylab("means") + xlab("water year") 

d <- all.df[all.df$division=="HerdUnits" & 
              all.df$filter=="none" , ]
p2 <- ggplot(d, aes(x=year, y=(value/sum(value)), color=unit, group=unit) )+
  geom_line(size=2, alpha=0.5) + 
  #geom_bar(stat="identity", alpha = .5, width=1) +
  scale_colour_brewer(palette="Set1") +
  guides(colour = guide_legend(override.aes = list(alpha = 1))) +
  facet_grid(method ~., scales="free") +
  ggtitle("Fraction-Over-Threshold vs Means") +
  ylab("means") + xlab("water year") 

grid.arrange(p1, p2, nrow=1, ncol=2, widths=c(6,6), heights=7)
[Figure: FOT vs Means]

0.9 Remarks