When most people think of California, they imagine mild and sunny weather. The Mediterranean climate means more blue skies than rainy days. That is why most of the state's water does not come directly from rain. Coastal areas get most of their water from moist air blowing off the Pacific Ocean, while rivers fed by snow and glacial melt provide water to inland areas. Water becomes even scarcer as the global temperature rises, drying out trees and grasslands into tinder. The Golden State burns into bronze, black, and red when lightning storms ignite flames. Fires become more frequent and larger each year as California's long-term drought continues.

This document uses records of wildfire events to show fire trends from 2013 to 2021. These records are publicly available on the California Department of Forestry & Fire Protection (CAL FIRE) website. The step-by-step guidance below shows how I analyzed and visualized the data.


How To Find the Data


Step 1: Go to the CAL FIRE incidents page (https://www.fire.ca.gov/incidents/).

Step 2: Scroll to the bottom of the web page.

Step 3: Click the “All Incident Data (.csv format)” link to download the data.

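The link (https://www.fire.ca.gov/imapdata/mapdataall.csv) downloads a comma-separated values (.csv) file to your computer. You can open this file in Microsoft Excel. The import code later in this document expects the data saved as an Excel workbook named CALFIRE_all_incidents.xlsx.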
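
Because the download is plain text, R can also read it directly with base R's read.csv(), without opening it in Excel first. The sketch below is a minimal illustration under assumptions: read_incidents is a hypothetical helper name, and the two-row sample written to a temporary file is made up so that the example runs offline. Pass the path of your own downloaded file instead.

```r
# hypothetical helper: read the incident file, keeping text columns as text
read_incidents <- function(path) {
  read.csv(path, stringsAsFactors = FALSE)
}

# made-up two-row sample so the example runs without a download
sample_file <- tempfile(fileext = ".csv")
writeLines(c(
  "incident_name,incident_date_created,incident_acres_burned",
  "Example Fire A,2017-10-31T11:22:00Z,100",
  "Example Fire B,2013-02-24T08:00:00Z,0"
), sample_file)

incidents <- read_incidents(sample_file)
str(incidents)
```

This is an alternative to the read.xlsx() import used in this document, not the method I followed.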


Prepare to Import Data


The first step in data analysis is getting ready to import the data into the statistical programming language R. We will install and load packages, then set our working directory. The packages below were already installed in my copy of R, but you may need to install them by typing install.packages("package") and replacing the word package with the name of the package that you want to install.

# load packages

  # open data file in program
    # install.packages("openxlsx")
    library(openxlsx)
  # clean & sort data
    # install.packages("dplyr")
    library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
  # visualize data
    library(ggplot2)
  # map data
    library(sp)
    library(rgdal)
## Please note that rgdal will be retired by the end of 2023,
## plan transition to sf/stars/terra functions using GDAL and PROJ
## at your earliest convenience.
## 
## rgdal: version: 1.5-30, (SVN revision 1171)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 3.4.2, released 2022/03/08
## Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/4.1/Resources/library/rgdal/gdal
## GDAL binary built with GEOS: FALSE 
## Loaded PROJ runtime: Rel. 8.2.1, January 1st, 2022, [PJ_VERSION: 821]
## Path to PROJ shared files: /Library/Frameworks/R.framework/Versions/4.1/Resources/library/rgdal/proj
## PROJ CDN enabled: FALSE
## Linking to sp version:1.4-6
## To mute warnings of possible GDAL/OSR exportToProj4() degradation,
## use options("rgdal_show_exportToProj4_warnings"="none") before loading sp or rgdal.
    library(sf)
## Linking to GEOS 3.9.1, GDAL 3.4.0, PROJ 8.1.1; sf_use_s2() is TRUE
    library(leaflet)
  # sonify data
    library(tuneR)
      
    # packages extend R with the functions needed for these analyses

# set working directory
  setwd("~/OneDrive - ucsc.edu/Post-Grad/Grist - Data Fellow/Grist Article Visuals/Fires")
    # Change the file pathway for your computer system
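
To find out which packages still need installing before the library() calls above will work, a small check can help. This is a minimal sketch: missing_packages is a hypothetical helper name, and the package list simply repeats names loaded above. rgdal and sp are left out of the list because rgdal's own startup message above announces its retirement.

```r
# hypothetical helper: return the packages that are not installed yet
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("openxlsx", "dplyr", "ggplot2", "sf", "leaflet", "tuneR")
to_install <- missing_packages(needed)
print(to_install)

# uncomment to install whatever is missing
# if (length(to_install) > 0) install.packages(to_install)
```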

Import & Review Data


Now we can view the dataset and wrangle it for visuals of trends. Wrangling data means removing missing entries, sorting, and adding columns without changing existing values or creating new values in a misleading way.

# load data
  dat <- read.xlsx("CALFIRE_all_incidents.xlsx")
  
# review all years with data
  unique(dat$incident_dateonly_created)
##  [1] 2017 2009 2013 2014 2015 2016 2018 1969 2019 2020 2021
# determine if the first date variable is in the date format
  str(dat$incident_date_created) # character format, not yet a date
##  chr [1:1788] "2017-10-31T11:22:00Z" "2009-05-24T14:56:00Z" ...
  dat$incident_date_created <- 
    as.Date(dat$incident_date_created) 
  # convert to date & replace old variable with the date format
  str(dat$incident_date_created) # now is in date format
##  Date[1:1788], format: "2017-10-31" "2009-05-24" "2013-02-24" "2013-04-20" "2013-04-30" ...
# repeat steps for the second date variable
  str(dat$incident_date_last_update) # character format, not yet a date
##  chr [1:1788] "2018-01-09T13:46:00Z" "2020-09-16T14:07:35Z" ...
  dat$incident_date_last_update <- 
    as.Date(dat$incident_date_last_update) # convert to date & replace
  str(dat$incident_date_last_update) # now in date format
##  Date[1:1788], format: "2018-01-09" "2020-09-16" "2013-02-28" "2013-04-22" "2013-05-01" ...
# remove fire incidents with 0 acres burned
  dat2 <- filter(dat, incident_acres_burned != 0)
  
# review years with fires where acres burned
  unique(dat2$incident_dateonly_created)
##  [1] 2017 2009 2013 2014 2015 2016 2018 1969 2019 2020 2021
# data skips from 1969 to 2009 then to 2013 before it becomes consistent
  # so select data starting in 2013
  dat3 <- filter(dat2, incident_dateonly_created >= 2013)
  
# select columns of interest
  dat4 <- select(dat3, incident_name, incident_dateonly_created, 
                  incident_date_created, incident_date_last_update, 
                  incident_location, incident_acres_burned, 
                  incident_longitude, incident_latitude, incident_url)
  
  # sort by date, then add column for cumulative acres burned
    dat4 <- dat4 %>% 
      arrange(incident_date_created) %>%
      mutate(cumulative_acres_burned = cumsum(incident_acres_burned))
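
The wrangling steps above can be tried on a small, made-up table before running them on the full dataset. This sketch uses base R only so that it runs without extra packages. The toy data frame and its values are invented for illustration, the year column is derived from the date as an assumption, and the column names mirror the ones used above.

```r
# made-up rows that mimic the columns used above
toy <- data.frame(
  incident_name = c("A", "B", "C", "D"),
  incident_date_created = c("2017-10-31T11:22:00Z", "2009-05-24T14:56:00Z",
                            "2013-02-24T09:00:00Z", "2014-07-01T18:30:00Z"),
  incident_acres_burned = c(500, 40, 0, 120),
  stringsAsFactors = FALSE
)

# as.Date() reads the leading "YYYY-MM-DD" and drops the time of day
toy$incident_date_created <- as.Date(toy$incident_date_created)

# pull the year out of the date as a number
toy$year <- as.integer(format(toy$incident_date_created, "%Y"))

# keep fires that burned some area, from 2013 onward
toy <- toy[!is.na(toy$incident_acres_burned) &
             toy$incident_acres_burned != 0 &
             toy$year >= 2013, ]

# sort by date so the running total follows the calendar
toy <- toy[order(toy$incident_date_created), ]
toy$cumulative_acres_burned <- cumsum(toy$incident_acres_burned)

print(toy)
```

Sorting before cumsum() matters: a running total only reads as a timeline if the rows are in date order.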

Visualize Data


We are going to create two different plots to show trends in acres burned by California wildfires from 2013 to 2021.

Plot 1 - Acres Burned by Year (2013 - 2021)

The code below creates a plot showing how many acres each wildfire burned in each year. A scatter plot is a good choice for this visual because it shows acres burned, the dependent variable, on the y-axis against year, the independent variable, on the x-axis. Each fire appears as a semi-transparent grey circle, so darker areas mark multiple overlapping fire events. I could edit the code to space these circles apart, but I chose not to because the dark areas show that more fires occur in later years than in earlier ones. The y-axis runs from 20 acres to over 10 million acres burned. Although I could have rescaled the axis by showing only fires larger than 1,000 acres, I wanted to show the trend in acres burned across all fires together.

# plots
  # acres burned per year (scatter/dot plot)
    dat4 %>%
      ggplot(aes(x = incident_dateonly_created, y = incident_acres_burned)) +
      geom_point(alpha = 0.5) +
      labs(caption = "Data: CalFire Incidents (2013 - 2021); created by Will Ware",
      x = "",
      y = "Acres burned") +
      theme_classic() +
      scale_y_continuous(breaks =
          seq(20, 10400000, 40000)) +
      scale_x_continuous(breaks = 
          c(2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021),
          labels =
            c("2013", "2014", "2015", "2016", "2017", "2018", 
              "2019", "2020", "2021"))

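Because the fires range from tens of acres to millions, one option not used in the plot above is a logarithmic y-axis, which spreads out the small fires instead of compressing them near zero. The following is a minimal sketch under stated assumptions: the fires data frame is made up, log_breaks is a hypothetical variable name, and the plot is only built when ggplot2 is installed.

```r
# made-up stand-in for the fire data; values are invented
fires <- data.frame(
  year  = c(2013, 2013, 2020, 2021),
  acres = c(20, 1500, 380000, 960000)
)

# powers of ten that span the data, used as axis breaks
log_breaks <- 10^seq(floor(log10(min(fires$acres))),
                     ceiling(log10(max(fires$acres))))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(fires, aes(x = year, y = acres)) +
    geom_point(alpha = 0.5) +
    scale_y_log10(
      breaks = log_breaks,
      # plain numbers with thousands separators instead of 1e+05
      labels = function(b) format(b, big.mark = ",", scientific = FALSE,
                                  trim = TRUE)
    ) +
    labs(x = "", y = "Acres burned (log scale)") +
    theme_classic()
  print(p)
}
```

A log axis changes how the trend reads, since equal distances now mean equal ratios rather than equal differences, so the axis label should say so.
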
Plot 2 - Cumulative Acres Burned by Year (2013 - 2021)

The code below plots the running total of acres burned across all wildfires, with the year on the x-axis. A line plot is a good choice for this visual because it shows how the total acreage burned by California wildfires builds from year to year more clearly than a scatter plot would. The general trend is an increase in the acres burned each year, but years with mostly small fires interrupt it. I chose to show the trend for all fires without removing ones below a certain value.

# cumulative acres burned per year (line plot)
      dat4 %>%
        ggplot() +
        aes(x = incident_dateonly_created, y = cumulative_acres_burned) +
        geom_line() +
        theme_classic() +
        scale_x_continuous(breaks = 
          c(2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021)) +
        scale_y_continuous(breaks = seq(30, 7510730, 375085)) +
        ylab("Cumulative acres burned") +
        xlab("Year") +
        labs(caption = "Acres Burned by California's Wildfires (2013 - 2021), created by Will Ware")
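
If the goal is one total per year rather than a running total across individual fires, the acres can be summed within each year first. This is a sketch with base R's aggregate() on a made-up data frame; fires and annual are hypothetical names and the numbers are invented.

```r
# made-up fires: two in 2013, two in 2014, one in 2015
fires <- data.frame(
  year  = c(2013, 2013, 2014, 2014, 2015),
  acres = c(100, 250, 30, 70, 900)
)

# sum acres within each year; the result is sorted by year
annual <- aggregate(acres ~ year, data = fires, FUN = sum)

# running total across years, in calendar order
annual$cumulative_acres <- cumsum(annual$acres)

print(annual)
```

With one row per year, a line plot of acres against year shows annual totals, and a line plot of cumulative_acres shows the running total.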

After visualizing trends in the acres that each wildfire burned, I tried to make an interactive map of the fires in California. The resulting map neither showed the fires nor zoomed in to where they occurred. I am not sure why this happened, even after extensively searching online, but I considered alternative mapping techniques.

# map CA's wildfire incidents
  leaflet(data = dat4) %>%
    addTiles() %>%
    addCircleMarkers(lng = ~incident_longitude,
                     lat = ~incident_latitude)
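
One possible cause, offered as a guess rather than a diagnosis, is that some rows have missing, zero, or out-of-range coordinates, or that the coordinate columns were read as text. The sketch below keeps only rows inside a rough bounding box around California before mapping. The helper name clean_coords, the bounding-box numbers, and the sample rows are assumptions for illustration, and the map is only built when leaflet is installed.

```r
# hypothetical helper: keep rows whose coordinates fall inside a rough
# bounding box around California (approximate limits, an assumption)
clean_coords <- function(d) {
  d$incident_longitude <- suppressWarnings(as.numeric(d$incident_longitude))
  d$incident_latitude  <- suppressWarnings(as.numeric(d$incident_latitude))
  ok <- !is.na(d$incident_longitude) & !is.na(d$incident_latitude) &
    d$incident_longitude >= -125 & d$incident_longitude <= -114 &
    d$incident_latitude  >= 32   & d$incident_latitude  <= 42.5
  d[ok, ]
}

# made-up rows: one valid, one missing, one zeroed, one with swapped values
sample_fires <- data.frame(
  incident_name      = c("A", "B", "C", "D"),
  incident_longitude = c("-121.5", NA, "0", "38.6"),
  incident_latitude  = c("38.6", "37.0", "0", "-121.5"),
  stringsAsFactors = FALSE
)

mappable <- clean_coords(sample_fires)
print(mappable)

if (requireNamespace("leaflet", quietly = TRUE)) {
  library(leaflet)
  m <- leaflet(data = mappable) %>%
    addTiles() %>%
    addCircleMarkers(lng = ~incident_longitude, lat = ~incident_latitude,
                     radius = 4, popup = ~incident_name) %>%
    fitBounds(lng1 = -125, lat1 = 32, lng2 = -114, lat2 = 42.5)
  # type m in an interactive session to display the map
}
```

Comparing nrow() before and after the filter shows how many incidents were dropped, which is a quick way to test whether bad coordinates are the problem.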

I also attempted to make the data more accessible to people with visual impairments by sonifying it. Sonification assigns pitches to data values, such as high notes for high values and low notes for low values. The code did not create a sound, and I still want to solve this issue.

# sonify fire incidents by year
  sonify_stb <- function(y, out_dir = "~/Desktop") {
    
    tuneR::writeWave(
      sonify::sonify(
        x = as.Date(dat4$incident_dateonly_created), 
        y = dat4$incident_acres_burned,
        play = T),
      file.path(out_dir = "~/Desktop", paste0(y, ".wav")))
    
  }
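
In the code as shown, sonify_stb() is defined but never called, which by itself would explain why no sound is produced or saved. As a separate, minimal sketch, and a swapped-in technique rather than the sonify package's method, the example below maps yearly totals to pitches by hand and writes a .wav file with tuneR. The value_to_freq helper, the 220 to 880 Hz range, the quarter-second note length, the output file name, and the yearly totals are all assumptions made up for illustration.

```r
# hypothetical helper: rescale values linearly onto a pitch range in Hz,
# so low values get low notes and high values get high notes
value_to_freq <- function(v, low = 220, high = 880) {
  rng <- range(v, na.rm = TRUE)
  if (rng[1] == rng[2]) return(rep((low + high) / 2, length(v)))
  low + (v - rng[1]) / (rng[2] - rng[1]) * (high - low)
}

# made-up yearly totals of acres burned
annual_acres <- c(100, 400, 250, 1000)
freqs <- value_to_freq(annual_acres)
print(freqs)

if (requireNamespace("tuneR", quietly = TRUE)) {
  # one quarter-second sine tone per year
  notes <- lapply(freqs, function(f) {
    tuneR::sine(f, duration = 0.25, xunit = "time")
  })
  # join the tones end to end into one Wave object
  melody <- do.call(tuneR::bind, notes)
  # rescale to 16-bit before writing
  melody <- tuneR::normalize(melody, unit = "16")
  out_file <- file.path(tempdir(), "ca_fires.wav")
  tuneR::writeWave(melody, out_file)
}
```

Writing to a file and opening it in any audio player avoids depending on R being able to find a sound player on the system.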