2013-10-21
In this project, I use the Canadian Large Fire Database (LFDB) to explore the temporal and spatial distribution and association of landscape-scale fires in Canada. The LFDB is a compilation of forest fire data from all Canadian agencies, including provinces, territories, and Parks Canada. The data set includes only fires greater than 200 hectares in final size; these represent only a few percent of all fires but account for most of the area burned (usually more than 97%). For more information, please visit the website of Natural Resources Canada; the LFDB can also be downloaded directly from that page.
All the data, code and figures for the project can be found on my Github.
In this project, I will explore the data from the following aspects.
The dataset is a comma-delimited ASCII file, originally named “LFD_A02_5999_e.txt”, with each row representing the record of one fire. The raw data contains the following 16 variables:
Year, Month, Day, Province, Fire_ID, Latitude, Longitude, Start_Date, Detect_Date, Cause, Size, Fire_Region, Fire_Zone, EcoZone, EcoRegion, EcoDistrict. I include the 8 most meaningful variables in my data analysis and summarize their information and description in the table below.
| | Variable_Name | Variable_Class | Variable_Description | Missing_Entry | Missing_NumPer |
|---|---|---|---|---|---|
| 1 | Year | integer | Year of fire start: from 1959 to 1999 | | 0 (0.000) |
| 2 | Month | integer | Month of fire start (No January, strange) | 0 | 111 (0.010) |
| 3 | Day | integer | Day of fire start | 0 | 112 (0.010) |
| 4 | Province | factor | 11 levels: BC, AB, SK, MB, ON, QC, NF, NB, NS, YK, NWT | | 0 (0.000) |
| 5 | Latitude | numeric | Fire start location, from 42.56 to 68.98 | 0 | 79 (0.007) |
| 6 | Longitude | numeric | Fire start location, from -141 to -52.65 | 0 | 79 (0.007) |
| 7 | Cause | factor | 3 levels: MAN (human), LTG (lightning), UNK (unknown) | UNK | 354 (0.032) |
| 8 | Size | numeric | Final fire size in hectares | | 0 (0.000) |
Strangely, the dataset contains no fire records for January.
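The Missing_Entry and Missing_NumPer columns above can be reproduced with a small helper. This is only a sketch: `countMissing` and the toy vectors are hypothetical and are not part of the original analysis code.

```r
# Hypothetical helper: count the entries equal to the code that marks
# "missing" and format the result as "count (proportion)".
countMissing <- function(x, code) {
  n <- sum(x == code, na.rm = TRUE)
  sprintf("%d (%.3f)", n, n / length(x))
}

# Toy vectors standing in for the raw Month and Cause columns.
toyMonth <- c(5, 0, 7, 6, 0, 8, 5, 6)
toyCause <- c("LTG", "MAN", "UNK", "LTG", "LTG", "MAN", "LTG", "LTG")

countMissing(toyMonth, 0)      # Month uses 0 as its missing code
countMissing(toyCause, "UNK")  # Cause uses UNK as its missing code
```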
I set quote="\"" and comment.char="" when importing the raw data, since the dataset contains the symbols ' and #.
I keep the unknown entries (UNK in Cause) in my clean dataset, since their percentage is relatively large and they seem to follow a pattern (all the fire causes in BC in 1999 were unknown!).
Now let's import the clean dataset and load the necessary packages.
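Before that, here is a quick illustration of why those two `read.table` options matter for the raw file. The three-line sample is made up for this sketch (I am assuming the ' and # symbols sit in a text field such as Fire_ID); it is not an excerpt from the LFDB.

```r
# Hypothetical sample in the same comma-delimited layout as the raw file.
rawText <- "Fire_ID,Cause,Size
O'Brien #1,LTG,250.5
Lake #7,MAN,1200"

# quote = "\"" stops the apostrophe from opening a quoted string, and
# comment.char = "" stops # from starting a comment.
demo <- read.table(text = rawText, header = TRUE, sep = ",",
                   quote = "\"", comment.char = "",
                   stringsAsFactors = FALSE)
demo$Fire_ID
```

With the default settings, ' is treated as a quote character and everything after # is dropped as a comment, so lines like these would not be read correctly. The clean file is read with the defaults.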
> jDat <- read.table("data/CleanData.tsv", header = TRUE)
> library(plyr) # for data aggregation
> library(RColorBrewer) # for color setting
> library(ggplot2) # for advanced figure plotting
> library(gplots) # for heatmap
KernSmooth 2.23 loaded
Copyright M. P. Wand 1997-2009
Attaching package: 'gplots'
The following object is masked from 'package:stats':
lowess
> library(RgoogleMaps) # for showing map of Canada
Loading required package: png
Loading required package: RJSONIO
Perform a superficial check that data import went OK.
> str(jDat)
'data.frame': 10874 obs. of 8 variables:
$ Year : int 1959 1959 1959 1959 1959 1959 1959 1959 1959 1959 ...
$ Month : int 5 5 5 5 5 5 5 5 7 5 ...
$ Day : int 15 14 17 17 16 21 14 22 31 15 ...
$ Province : Factor w/ 11 levels "AB","BC","MB",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Latitude : num 54.5 55.6 55.3 54.8 56.2 ...
$ Longitude: num -111 -116 -116 -117 -120 ...
$ Cause : Factor w/ 3 levels "LTG","MAN","UNK": 2 2 2 1 2 2 2 2 1 2 ...
$ Size : num 1190 478 607 245 260 ...
> head(jDat)
Year Month Day Province Latitude Longitude Cause Size
1 1959 5 15 AB 54.46 -111.2 MAN 1190.3
2 1959 5 14 AB 55.61 -116.5 MAN 477.7
3 1959 5 17 AB 55.30 -116.4 MAN 607.3
4 1959 5 17 AB 54.76 -117.3 LTG 245.3
5 1959 5 16 AB 56.24 -119.7 MAN 259.5
6 1959 5 21 AB 56.28 -119.6 MAN 696.4
Reorder the levels of Province from west to east and from south to north, then sort the data by Year, Month, Day, Province and Size.
> ProLevels <- c("BC", "AB", "SK", "MB", "ON", "QC", "NB", "NS", "NL", "YT", "NT")
> jDat <- arrange(transform(jDat, Province = factor(Province, levels = ProLevels)), Province)
> jDat <- jDat[with(jDat, order(Year, Month, Day, Province, Size)), ]
> str(jDat)
'data.frame': 10874 obs. of 8 variables:
$ Year : int 1959 1959 1959 1959 1959 1959 1959 1959 1959 1959 ...
$ Month : int 4 4 5 5 5 5 5 5 5 5 ...
$ Day : int 1 27 8 10 10 11 11 12 12 12 ...
$ Province : Factor w/ 11 levels "BC","AB","SK",..: 1 2 3 1 6 1 1 1 1 1 ...
$ Latitude : num 60 55.1 53.6 56.6 47.5 ...
$ Longitude: num -133.6 -116.4 -108.7 -122 -77.3 ...
$ Cause : Factor w/ 3 levels "LTG","MAN","UNK": 2 2 2 2 2 2 2 2 2 2 ...
$ Size : num 3205 317 1069 514 263 ...
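The same two steps, shown on a toy data frame in base R. The data and the name `westToEast` are hypothetical and only meant to make the effect of the reordering visible.

```r
# Toy records; the values are invented for illustration.
toy <- data.frame(
  Year     = c(1960, 1959, 1959, 1960),
  Province = c("ON", "AB", "BC", "BC"),
  Size     = c(300, 500, 250, 900)
)

# Set the level order explicitly instead of the default alphabetical order.
westToEast <- c("BC", "AB", "ON")
toy$Province <- factor(toy$Province, levels = westToEast)

# order() sorts factors by level order, so BC now comes before AB.
toy <- toy[order(toy$Year, toy$Province), ]
toy
```

One thing to watch: any province code that is not listed in `levels` becomes NA, so the level vector has to match the codes in the data exactly.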
As a basic check, let's first look at how Fire Frequency and Fire Size change with time.
I plot the total number of large fires against Year, and group the counts by Cause in the right figure. There is no obvious increasing or decreasing trend in fire frequency; it fluctuates considerably, with its peak in 1979. Most of the fires were, of course, caused by lightning, but a substantial number were caused by humans, especially from 1959 to 1969.
To make the above figure, I choose the records from 1959, 1969, 1979, 1989 and 1999, plot the log-transformed fire size against the month of the fire, and facet by province. Most of the provinces show a similar volcano shape, centered from May to August, which implies that late spring and summer are the main seasons for large fires.
Let's move to the spatial part and look at the geographical distribution of large fire locations every 5 years from 1959 to 1999.
Notice that in BC, a major percentage of the large fires was caused by humans!
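The counts behind this kind of plot can be obtained with `table()`. The sketch below uses a made-up data frame rather than jDat, and is not the code that produced the figures.

```r
# Toy records; the values are invented for illustration.
toy <- data.frame(
  Year  = c(1978, 1979, 1979, 1979, 1980, 1980),
  Cause = c("LTG", "LTG", "MAN", "LTG", "MAN", "LTG")
)

byYear      <- table(toy$Year)             # fires per year
byYearCause <- table(toy$Year, toy$Cause)  # fires per year and cause

# Year with the largest number of fires.
peakYear <- names(byYear)[which.max(byYear)]
peakYear
byYearCause
```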
Since the dataset includes the latitude and longitude of each fire, I can download a Google map of Canada and plot the fire locations on it. You need to install the RgoogleMaps package first.
> center = c(mean(jDat$Latitude), mean(jDat$Longitude))
> zoom <- min(MaxZoom(range(jDat$Latitude), range(jDat$Longitude)))
> GeoMap <- GetMap(center = center, zoom = zoom, destfile = "figure/capre.png")
[1] "http://maps.google.com/maps/api/staticmap?center=55.9010302651573,-103.490342349641&zoom=3&size=640x640&maptype=mobile&format=png32&sensor=true"
> jYear <- seq(1959, 1999, by = 5) # assumption: every 5 years, as described above
> gDat <- subset(jDat, Year %in% jYear)
> PlotOnStaticMap(GeoMap, lat = gDat$Latitude, lon = gDat$Longitude,
+ destfile ="figure/capre.png", cex = 0.5, pch = 20, col = "red")
I would like to emulate it with ggplot2 and distinguish the months by different colors.
Again, it shows the seasonal trend, with most fires happening from May to August and in the middle of Canada (SK, MB, ON).
In this part, I replicate the bubble chart from the Gapminder project to display both the temporal and spatial trends of large fires. Building on the above figure, I make the area of each solid circle proportional to the fire size at that location (a circle that extends beyond its province does not show the exact burned area).
To better show the temporal distribution of fires, I use a heatmap of the number of fires in every month and year. The darker the color, the more fires it represents.
> gDat <- daply(jDat, ~ Year + Month, function(x) length(x$Year))
> ## replace NA with 0 in the gDat
> gDat[is.na(gDat)] <- 0
> gDat <- as.data.frame(gDat)
> colnames(gDat) <- paste0("Month ",colnames(gDat))
> rownames(gDat) <- paste0("Year ", rownames(gDat))
> gDat <- as.matrix(t(gDat))
> gBuPuFun <- colorRampPalette(brewer.pal(n = 9, "BuPu"))
> heatmap.2(gDat, col = gBuPuFun, trace = "none", main = "Heatmap of the Number of Fire")
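As an aside, a month-by-year count matrix can also be built in base R with `table()`, which fills empty combinations with 0 directly. The sketch below uses toy data; turning Month into a factor with all 12 levels is my own addition, so that months without any record (such as January) still appear as a row of zeros.

```r
# Toy records; the values are invented for illustration.
toy <- data.frame(
  Year  = c(1959, 1959, 1960, 1960, 1960),
  Month = c(5, 7, 5, 5, 8)
)

# Keep all 12 months as levels so that empty months are not dropped.
toy$Month <- factor(toy$Month, levels = 1:12)

counts   <- table(Month = toy$Month, Year = toy$Year)
countMat <- unclass(counts)  # plain integer matrix with dimnames
dim(countMat)
countMat["5", ]
```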
To explore whether the monthly distribution of fire frequency is similar across years, I first calculate the correlation between every pair of years and then show the result as a heatmap.
> gDatCor <- cor(gDat)
> heatmap.2(gDatCor, col = gBuPuFun, trace = "none",
+ main = "Heatmap of Correlation among Years")
The cell between the years 1960 and 1995 is the lightest, which indicates the lowest correlation. We can also confirm this from the correlation matrix.
> min(gDatCor)
[1] 0.112
> minrow <- which.min(gDatCor) %/% nrow(gDatCor) + 1
> mincol <- which.min(gDatCor) %% nrow(gDatCor)
> gDatCor[minrow, mincol] == min(gDatCor)
[1] TRUE
> (year1 <- as.numeric(substr(rownames(gDatCor)[minrow], 6, 9)))
[1] 1960
> (year2 <- as.numeric(substr(colnames(gDatCor)[mincol], 6, 9)))
[1] 1995
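The index arithmetic above works for this matrix, but the `%%` step would return 0 whenever the minimum falls in the last row. A more general way to turn the linear index from `which.min()` into a row and column is `arrayInd()`. The small matrix below is made up to show the idea; it is not the real correlation matrix.

```r
# Toy symmetric "correlation" matrix; the values are invented.
yrs <- c("Year 1959", "Year 1960", "Year 1961")
m <- matrix(c(1.0, 0.8, 0.3,
              0.8, 1.0, 0.1,
              0.3, 0.1, 1.0),
            nrow = 3, byrow = TRUE, dimnames = list(yrs, yrs))

# which.min() returns a column-major linear index;
# arrayInd() converts it into a (row, column) pair.
pos <- arrayInd(which.min(m), dim(m))
pos
m[pos]

# Recover the two years from the dimnames.
as.numeric(sub("Year ", "", rownames(m)[pos[1, 1]]))
as.numeric(sub("Year ", "", colnames(m)[pos[1, 2]]))
```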
Let's plot the monthly trend of fire frequency for each year and highlight the years 1960 and 1995.
The following figure displays the trend of fire frequency over the years in each province. Most of the trends fluctuate so much that fitting a linear regression line would be meaningless. However, we can still compare the fluctuation of fire frequency among the provinces.
I first fit a cubic spline for each type of fire cause in each province (a cubic spline is a spline constructed of piecewise third-order polynomials which pass through a set of control points). Then I use the average curvature (the curvature divided by the length of the year range) as a measure of how much the trend fluctuates: the larger the average curvature, the more the curve fluctuates.
I drop the provinces “NB” and “NS” since they do not have enough points (at least four) to fit the cubic spline.
| | LTG | MAN |
|---|---|---|
| NT | 17251.58 | 58.30 |
| MB | 10945.36 | 3269.13 |
| ON | 10026.15 | 111.81 |
| SK | 8025.57 | 360.97 |
| QC | 4141.79 | 298.78 |
| BC | 3101.90 | 2978.33 |
| YT | 2793.78 | 7.18 |
| NL | 729.91 | 68.84 |
| AB | 680.95 | 894.57 |
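As a rough sketch of how a measure like this could be computed, the function below fits a natural cubic spline with `splinefun()` and averages the squared second derivative over the year range. Taking the integrated squared second derivative as the "curvature" is an assumption made for this illustration, as are the name `avgCurvature` and the toy counts, so the numbers are not comparable to the table above.

```r
# Average squared second derivative of a cubic spline through (year, count).
# This definition of "average curvature" is an assumption for illustration.
avgCurvature <- function(year, count, nGrid = 501) {
  stopifnot(length(year) >= 4)  # at least four points, as in the text
  f <- splinefun(year, count, method = "natural")
  grid <- seq(min(year), max(year), length.out = nGrid)
  d2sq <- f(grid, deriv = 2)^2
  # Trapezoidal rule for the integral of f''(x)^2 over the year range.
  total <- sum(diff(grid) * (head(d2sq, -1) + tail(d2sq, -1)) / 2)
  total / (max(year) - min(year))
}

years  <- 1990:1999
steady <- seq(10, 28, by = 2)                     # hypothetical counts on a straight line
jumpy  <- c(10, 40, 5, 35, 8, 50, 3, 30, 12, 45)  # hypothetical fluctuating counts

avgCurvature(years, steady)  # essentially zero for a straight line
avgCurvature(years, jumpy)   # much larger for a fluctuating series
```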
Through this exploratory data analysis, I find some interesting preliminary patterns in the large fire dataset: there is a seasonal trend and a geographical pattern in the distribution of fire frequency and fire size. For more information, the following papers, which are recommended on the website of Natural Resources Canada, present analyses of this database.