STAT 545A Final Project: Explore the Temporal and Spatial Distribution of Canadian Large Fires

Yumian Hu

2013-10-21

Introduction

In this project, I use the Canadian Large Fire Database (LFDB) to explore the temporal and spatial distribution of landscape-scale fires in Canada. The LFDB is a compilation of forest fire data from all Canadian agencies, including provinces, territories, and Parks Canada. The data set includes only fires greater than 200 hectares in final size; these represent only a few percent of all fires but account for most of the area burned (usually more than 97%). For more information, please visit the website of Natural Resources Canada; the LFDB can also be downloaded directly from that webpage.

All the data, code and figures for the project can be found on my GitHub.

Big Picture

In this project, I explore the data from the following aspects.

Data Description and Import

The raw dataset is a comma-delimited ASCII file, originally named “LFD_A02_5999_e.txt”, in which each row is the record of one fire. The raw data contain the following 16 variables:

I include the 8 most meaningful variables in my analysis and summarize them in the table below.

   Variable_Name  Variable_Class  Variable_Description                                    Missing_Entry  Missing_NumPer
1  Year           integer         Year of fire start: from 1959 to 1999                   -              0 (0.000)
2  Month          integer         Month of fire start (No January, strange)               0              111 (0.010)
3  Day            integer         Day of fire start                                       0              112 (0.010)
4  Province       factor          11 levels: BC, AB, SK, MB, ON, QC, NF, NB, NS, YK, NWT  -              0 (0.000)
5  Latitude       numeric         Fire start location, from 42.56 to 68.98                0              79 (0.007)
6  Longitude      numeric         Fire start location, from -141 to -52.65                0              79 (0.007)
7  Cause          factor          3 levels: MAN (human), LTG (lightning), UNK (unknown)   UNK            354 (0.032)
8  Size           numeric         Final fire size in hectares                             -              0 (0.000)

Curiously, the dataset contains no record of a fire starting in January.
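A check of this kind can be sketched in base R. The example below is only an illustration on a made-up Month vector (the values are not LFDB records), and it assumes, as the table above suggests, that 0 codes a missing month.

```r
# Toy stand-in for the Month column; 0 is assumed to code a missing month.
month <- c(5, 5, 6, 7, 7, 7, 8, 0, 4, 12)

# Fixing the levels to 0:12 keeps months without any record in the table as zeros.
month_counts <- table(factor(month, levels = 0:12))
print(month_counts)

# Calendar months (1 to 12) that never occur in the toy data.
absent_months <- as.integer(names(month_counts)[month_counts == 0])
absent_months <- absent_months[absent_months >= 1]
print(absent_months)
```

Applied to the real Month column, a January gap would show up as month 1 in absent_months.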

Now let's import the clean dataset and load necessary packages.

> jDat <- read.table("data/CleanData.tsv", header = TRUE)
> library(plyr)          # for data aggregation
> library(RColorBrewer)  # for color setting
> library(ggplot2)       # for advanced figure plotting 
> library(gplots)        # for heatmap
KernSmooth 2.23 loaded Copyright M. P. Wand 1997-2009

Attaching package: 'gplots'

The following object is masked from 'package:stats':

lowess
> library(RgoogleMaps)   # for showing map of Canada
Loading required package: png
Loading required package: RJSONIO

Perform a superficial check that data import went OK.

> str(jDat)
'data.frame':   10874 obs. of  8 variables:
 $ Year     : int  1959 1959 1959 1959 1959 1959 1959 1959 1959 1959 ...
 $ Month    : int  5 5 5 5 5 5 5 5 7 5 ...
 $ Day      : int  15 14 17 17 16 21 14 22 31 15 ...
 $ Province : Factor w/ 11 levels "AB","BC","MB",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Latitude : num  54.5 55.6 55.3 54.8 56.2 ...
 $ Longitude: num  -111 -116 -116 -117 -120 ...
 $ Cause    : Factor w/ 3 levels "LTG","MAN","UNK": 2 2 2 1 2 2 2 2 1 2 ...
 $ Size     : num  1190 478 607 245 260 ...
> head(jDat)
  Year Month Day Province Latitude Longitude Cause   Size
1 1959     5  15       AB    54.46    -111.2   MAN 1190.3
2 1959     5  14       AB    55.61    -116.5   MAN  477.7
3 1959     5  17       AB    55.30    -116.4   MAN  607.3
4 1959     5  17       AB    54.76    -117.3   LTG  245.3
5 1959     5  16       AB    56.24    -119.7   MAN  259.5
6 1959     5  21       AB    56.28    -119.6   MAN  696.4

Reorder the levels of Province from west to east and from south to north, then sort the data by Year, Month, Day, Province and Size.

> ProLevels <- c("BC", "AB", "SK", "MB", "ON", "QC", "NB", "NS", "NL", "YT", "NT")
> jDat <- arrange(transform(jDat, Province = factor(Province, levels = ProLevels)), Province)
> jDat <- jDat[with(jDat, order(Year, Month, Day, Province, Size)), ]
> str(jDat)
'data.frame':   10874 obs. of  8 variables:
 $ Year     : int  1959 1959 1959 1959 1959 1959 1959 1959 1959 1959 ...
 $ Month    : int  4 4 5 5 5 5 5 5 5 5 ...
 $ Day      : int  1 27 8 10 10 11 11 12 12 12 ...
 $ Province : Factor w/ 11 levels "BC","AB","SK",..: 1 2 3 1 6 1 1 1 1 1 ...
 $ Latitude : num  60 55.1 53.6 56.6 47.5 ...
 $ Longitude: num  -133.6 -116.4 -108.7 -122 -77.3 ...
 $ Cause    : Factor w/ 3 levels "LTG","MAN","UNK": 2 2 2 2 2 2 2 2 2 2 ...
 $ Size     : num  3205 317 1069 514 263 ...
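The same two steps (explicit factor levels, then a multi-key sort) can be sketched in base R without plyr. The toy data frame below is made up for illustration; only the column names mirror jDat.

```r
# Toy data frame; the column names mirror jDat but the values are made up.
toy <- data.frame(
  Year     = c(1960, 1959, 1959, 1959),
  Month    = c(5, 6, 5, 5),
  Province = c("ON", "AB", "QC", "BC"),
  stringsAsFactors = FALSE
)

# Explicit levels control the order used when sorting, plotting and faceting.
west_to_east <- c("BC", "AB", "ON", "QC")
toy$Province <- factor(toy$Province, levels = west_to_east)

# order() sorts by Year, then Month, then by the level order of Province.
toy <- toy[order(toy$Year, toy$Month, toy$Province), ]
print(toy)

# A label that is absent from `levels` silently becomes NA, so check for that.
stopifnot(!anyNA(toy$Province))
```

The final check is worth keeping whenever levels are typed by hand, because a province code spelled differently in the data would otherwise be dropped without a warning.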

Temporal Distribution of Large Fire

As a basic check, let's first look at how Fire Frequency and Fire Size change with time.

How does Fire Frequency Change over Year

plot of chunk unnamed-chunk-7 (two figures, left and right)

I plot the total number of large fires against Year, and the same counts grouped by cause in the right figure. Fire frequency shows no obvious increasing or decreasing trend; it fluctuates, with its peak in 1979. Most fires were caused by lightning, but a considerable number were caused by humans, especially from 1959 to 1969.
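The counts behind figures of this kind can be tabulated in base R. The sketch below is an illustration on a made-up toy data frame (not LFDB records) and uses table() rather than the plotting code of this report.

```r
# Toy records: one row per fire, with made-up years and causes.
toy <- data.frame(
  Year  = c(1959, 1959, 1959, 1960, 1960, 1961),
  Cause = c("MAN", "MAN", "LTG", "LTG", "LTG", "UNK")
)

# Number of fires per year, and the same counts split by cause.
fires_per_year <- table(toy$Year)
fires_by_cause <- table(toy$Year, toy$Cause)
print(fires_by_cause)

# Year with the most fires (the first one in case of ties).
peak_year <- as.integer(names(fires_per_year)[which.max(fires_per_year)])

# Share of each cause within a year; every row sums to 1.
cause_share <- prop.table(fires_by_cause, margin = 1)
print(round(cause_share, 2))
```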

How does Fire Size Change over Month

plot of chunk unnamed-chunk-8

To make the above figure, I select the records from 1959, 1969, 1979, 1989 and 1999, plot the log-transformed fire size against the month in which the fire started, and facet by province. Most provinces show a similar volcano shape centered on May to August, which implies that late spring and summer are the main seasons for large fires.
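The data preparation described here (pick every tenth year, log-transform the size, group by month) can be sketched as follows. The toy values are made up, and the base-10 logarithm and the per-month median are assumptions made for the illustration; the report does not state which logarithm was used.

```r
# Toy records with made-up sizes in hectares.
toy <- data.frame(
  Year  = c(1959, 1959, 1964, 1969, 1969, 1979),
  Month = c(5, 7, 6, 7, 7, 8),
  Size  = c(1000, 10000, 500, 100000, 1000, 250)
)

# Keep every tenth year from 1959 to 1999, as in the text.
keep_years <- seq(1959, 1999, by = 10)
sub <- toy[toy$Year %in% keep_years, ]

# Log-transform the size (base 10 is an assumption of this sketch).
sub$logSize <- log10(sub$Size)

# One summary value per month: the median of the log sizes.
median_by_month <- tapply(sub$logSize, sub$Month, median)
print(median_by_month)
```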

Spatial Distribution of Large Fire

Let's move to the spatial part and look at the geographical distribution of large fire locations every 5 years from 1959 to 1999.

Fire Frequency in Different Provinces

plot of chunk unnamed-chunk-9

Notice that in BC a major percentage of the large fires were caused by humans!

Fire Frequency seen from Google Map

Since the dataset includes the latitude and longitude of each fire, I can download a Google map of Canada and plot the fire locations on it. You need to install the RgoogleMaps package first.

> center = c(mean(jDat$Latitude), mean(jDat$Longitude))
> zoom <- min(MaxZoom(range(jDat$Latitude), range(jDat$Longitude)))
> GeoMap <- GetMap(center = center, zoom = zoom, destfile = "figure/capre.png")
[1] "http://maps.google.com/maps/api/staticmap?center=55.9010302651573,-103.490342349641&zoom=3&size=640x640&maptype=mobile&format=png32&sensor=true"
> ## jYear holds the years to display; it is defined in a chunk that is not shown here
> gDat <- subset(jDat, Year %in% jYear)
> PlotOnStaticMap(GeoMap, lat = gDat$Latitude, lon = gDat$Longitude, 
+                 destfile ="figure/capre.png", cex = 0.5, pch = 20, col = "red")

plot of chunk unnamed-chunk-10

Emulate Real Map with ggplot2

I would like to emulate it with ggplot2 and distinguish the months by different colors.

plot of chunk unnamed-chunk-11

Again it shows the seasonal trend, with most fires happening from May to August and in the middle of Canada (SK, MB, ON).

Bubble Chart: Seen from both Temporal and Spatial Aspects

In this part, I replicate the bubble chart from the Gapminder project to display both the temporal and spatial trends of large fires. Building on the above figure, I make the area of each solid circle proportional to the size of the fire at that location (a circle does not show the exact extent of the fire, for example where it would exceed the provincial territory).

plot of chunk unnamed-chunk-12
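Making the area (rather than the radius) proportional to fire size means the radius has to grow with the square root of the size. A minimal sketch of that scaling, on made-up sizes and not taken from the code of this report:

```r
# Made-up fire sizes in hectares (not LFDB records).
size_ha <- c(200, 800, 5000)

# A circle's area is pi * r^2, so area proportional to size means r ~ sqrt(size).
radius <- sqrt(size_ha / pi)

# Quadrupling the size only doubles the radius.
print(radius[2] / radius[1])

# The radii could then be handed to a plotting call, for example
# symbols(lon, lat, circles = radius, inches = 0.3) from base graphics,
# where lon and lat are hypothetical coordinate vectors.
```

Mapping size directly to the radius instead would exaggerate the largest fires, since the perceived area would then grow with the square of the size.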

Heatmap: Is the monthly distribution of Fire Frequency the same every Year?

To better show the temporal distribution of fires, I use a heatmap of the number of fires in every month of every year. The darker the color, the more fires it represents.

> gDat <- daply(jDat, ~ Year + Month, function(x) length(x$Year))
> ## replace NA with 0 in the gDat
> gDat[is.na(gDat)] <- 0
> gDat <- as.data.frame(gDat)
> colnames(gDat) <- paste0("Month ",colnames(gDat))
> rownames(gDat) <- paste0("Year ", rownames(gDat))
> gDat <- as.matrix(t(gDat))
> gBuPuFun <- colorRampPalette(brewer.pal(n = 9, "BuPu"))
> heatmap.2(gDat, col = gBuPuFun, trace = "none", main = "Heatmap of the Number of Fire")

plot of chunk unnamed-chunk-13
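For comparison, the same kind of Month by Year count matrix can be built with base R's table(), which fills combinations without any record with 0 directly. This is only a sketch on made-up toy data, not a replacement for the daply() call above.

```r
# Toy records: one row per fire (made-up values).
toy <- data.frame(
  Year  = c(1959, 1959, 1959, 1960, 1960),
  Month = c(5, 5, 7, 6, 7)
)

# Cross-tabulate; Month/Year pairs that never occur are counted as 0.
counts <- table(Month = toy$Month, Year = toy$Year)

# Drop the "table" class to get a plain integer matrix with dimnames,
# which is the kind of input a heatmap function expects.
count_matrix <- unclass(counts)
print(count_matrix)
```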

To explore whether the monthly distribution of fire frequency is similar in every year, I first calculate the correlation between every pair of years and then show the result as a heatmap.

> gDatCor <- cor(gDat)
> heatmap.2(gDatCor, col = gBuPuFun, trace = "none", 
+           main = "Heatmap of Correlation among Years")

plot of chunk unnamed-chunk-14
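Note that cor() correlates the columns of a matrix, which is why the years need to be the columns (and the months the rows) before it is called. A small sketch with made-up monthly counts:

```r
# Made-up monthly counts: rows are months, columns are years.
counts <- cbind(
  "Year 1959" = c(0, 2, 5, 3),
  "Year 1960" = c(0, 4, 10, 6),   # exactly twice the 1959 counts
  "Year 1961" = c(6, 3, 1, 0)     # early-season fires instead
)
rownames(counts) <- paste("Month", 5:8)

# Pairwise Pearson correlation between the columns (years).
year_cor <- cor(counts)
print(round(year_cor, 3))
```

Years with the same seasonal shape correlate strongly even when the totals differ, so this measure compares the monthly pattern rather than the overall number of fires.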

The cell for 1960 and 1995 has the lightest color, which indicates the lowest correlation. We can confirm this from the correlation matrix.

> min(gDatCor) 
[1] 0.112
> minrow <- which.min(gDatCor) %/% nrow(gDatCor) + 1
> mincol <- which.min(gDatCor) %% nrow(gDatCor)
> gDatCor[minrow, mincol] == min(gDatCor)
[1] TRUE
> (year1 <- as.numeric(substr(rownames(gDatCor)[minrow], 6, 9)))
[1] 1960
> (year2 <- as.numeric(substr(colnames(gDatCor)[mincol], 6, 9)))
[1] 1995
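The index arithmetic above works here, but it depends on the minimum not sitting in the last row of the matrix: in that case `%%` returns 0, which is not a valid index. A more robust alternative is base R's arrayInd(), sketched below on a made-up 3 x 3 correlation matrix.

```r
# Made-up symmetric correlation matrix with labelled years.
labels <- c("Year 1959", "Year 1960", "Year 1961")
m <- matrix(c(1.0, 0.8, 0.3,
              0.8, 1.0, 0.1,
              0.3, 0.1, 1.0),
            nrow = 3, byrow = TRUE, dimnames = list(labels, labels))

# which.min() returns a single column-major position in the matrix;
# arrayInd() converts it into a (row, column) pair.
idx <- arrayInd(which.min(m), dim(m))
print(idx)

# Labels of the least correlated pair of years.
print(c(rownames(m)[idx[1, 1]], colnames(m)[idx[1, 2]]))

# The modulo approach breaks on this matrix: the position is a multiple of nrow.
print(which.min(m) %% nrow(m))
```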

Let's plot the monthly trend of fire frequency for each year and highlight the years 1960 and 1995.

plot of chunk unnamed-chunk-16

Cubic Spline: Fluctuation of Fire Frequency over Year in Different Provinces

The following figure displays the trend in fire frequency over the years in each province. Most of the series fluctuate strongly, which makes fitting a linear regression line meaningless. However, we can still compare the fluctuation of fire frequency among provinces.

plot of chunk unnamed-chunk-17

I first fit a cubic spline (a spline built from piecewise third-order polynomials that pass through a set of control points) for each fire cause in each province. I then use the average curvature (the curvature divided by the length of the year range) as a measure of how much the trend fluctuates: the larger the average curvature, the more the curve fluctuates.

I drop the provinces “NB” and “NS” since they do not have enough points (at least four) to fit a cubic spline.

Province       LTG      MAN
NT        17251.58    58.30
MB        10945.36  3269.13
ON        10026.15   111.81
SK         8025.57   360.97
QC         4141.79   298.78
BC         3101.90  2978.33
YT         2793.78     7.18
NL          729.91    68.84
AB          680.95   894.57
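The exact curvature formula behind this table is not shown in the report. As one possible reading, the sketch below fits a natural cubic spline with splinefun() and uses the integrated squared second derivative divided by the year span as the fluctuation score. The helper name average_roughness, the choice of a natural spline, and the score itself are assumptions of this illustration, so its numbers are not expected to match the table.

```r
# Hypothetical helper: fit a natural cubic spline through yearly counts and
# return the integral of the squared second derivative divided by the year span.
average_roughness <- function(year, count, n_grid = 2001) {
  stopifnot(length(year) >= 4, length(year) == length(count))
  spline_fit <- splinefun(year, count, method = "natural")
  grid <- seq(min(year), max(year), length.out = n_grid)
  second_deriv_sq <- spline_fit(grid, deriv = 2)^2
  # Trapezoidal rule for the integral of f''(x)^2 over the observed years.
  step <- diff(grid)
  integral <- sum(step * (head(second_deriv_sq, -1) + tail(second_deriv_sq, -1)) / 2)
  integral / (max(year) - min(year))
}

# Made-up yearly counts: a steady linear trend and a strongly fluctuating series.
years  <- 1959:1968
steady <- 2 * (years - 1959) + 5
jumpy  <- c(5, 40, 3, 35, 8, 50, 2, 30, 6, 45)

print(average_roughness(years, steady))  # essentially zero for a straight line
print(average_roughness(years, jumpy))   # much larger for the fluctuating series
```

A score of this kind is zero for a perfectly linear trend and grows as the fitted curve bends more sharply, which matches the interpretation given above.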

Conclusion

Through this exploratory data analysis, I found some interesting preliminary patterns in the large fire dataset: the distribution of fire frequency and fire size shows both a seasonal trend and a geographical pattern. For more information, the following papers, which are recommended on the website of Natural Resources Canada, present analyses of this database.