DS736 US Wildfires

# add these two lines underneath the chunk where you have included the use_python line.
import os
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = 'D:/Anaconda3/Library/plugins/platforms'

Introduction

Wildfires in the United States have been monitored and tracked for over 100 years. Data is collected and updated regularly to support the national Fire Program Analysis (FPA) system. The National Wildfire Coordinating Group (NWCG) is a federal program that sets fire operating and recording standards. The data presented here describes 2.3 million wildfire records from 1992 to 2020.

Dataset

The dataset analyzed includes 2,303,566 rows and 22 columns. July has the most fire discoveries and December has the fewest. The most common NWCG Reporting Agency is ST/C&L. The most common NWCG Reporting Unit Name is the Georgia Forestry Commission. The largest recorded fire burned 662,700 acres in Oklahoma in 2017. The average recorded fire size is 78 acres.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
import folium
import seaborn as sns
from matplotlib.ticker import FuncFormatter #bc doing formatting
import matplotlib.patches as mpatches #patches are custom legends
import datetime

warnings.filterwarnings("ignore")
path = 'U:\\'
filename = 'firedata.csv'
#read only the columns we want, selected by position because the column names are long
df = pd.read_csv(path+filename, usecols = [4, 6, 8, 14, 20, 21, 22, 24, 25, 27, 28, 30, 31, 32, 33, 34, 35, 36, 38])

#parse the discovery date as a datetime
df['DISCOVERY_DATE'] = pd.to_datetime(df['DISCOVERY_DATE'], format = '%m/%d/%Y')
#parse the containment date as a datetime
df['CONT_DATE'] = pd.to_datetime(df['CONT_DATE'], format = '%m/%d/%Y')
#change FIRE_NAME NaNs to "No Name Given"
#(assign the result back: fillna(inplace=True) on a selected column may not update df in newer pandas versions)
df['FIRE_NAME'] = df['FIRE_NAME'].fillna("No Name Given")
#combine "No Name Given" and "NOT NAMED" into "UNKNOWN", and "LOCAL FIRE" into "LOCAL"
df['FIRE_NAME'] = df['FIRE_NAME'].replace(["No Name Given", "NOT NAMED"], "UNKNOWN")
df['FIRE_NAME'] = df['FIRE_NAME'].replace("LOCAL FIRE", "LOCAL")
#fill the rest of the NaNs for cont date, cont doy, county and fips name
for col in ['CONT_DATE', 'CONT_DOY', 'COUNTY', 'FIPS_NAME']:
    df[col] = df[col].fillna("UNKNOWN")

#some descriptions
df.shape;
#df.DISCOVERY_MONTH.value_counts();
df.NWCG_REPORTING_AGENCY.value_counts();
df.NWCG_REPORTING_UNIT_NAME.value_counts();
maxfire = df.FIRE_SIZE.max();
maxfiredf = df[df['FIRE_SIZE'] == maxfire];
df.FIRE_SIZE.mean();
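
The summary figures quoted in this section can be reproduced with a few pandas calls. The sketch below is only an illustration: the four rows are made up rather than taken from firedata.csv, and it assumes nothing beyond the DISCOVERY_DATE, FIRE_SIZE and STATE column names used in this notebook.

```python
import pandas as pd

# Hypothetical sample rows; the real values come from firedata.csv
sample = pd.DataFrame({
    'DISCOVERY_DATE': ['07/04/2015', '07/20/2016', '12/01/2016', '03/15/2017'],
    'FIRE_SIZE': [0.5, 12.0, 0.1, 200.0],
    'STATE': ['CA', 'GA', 'TX', 'OK'],
})
sample['DISCOVERY_DATE'] = pd.to_datetime(sample['DISCOVERY_DATE'], format='%m/%d/%Y')

# Month with the most discoveries: count rows per month, then take the top one
month_counts = sample['DISCOVERY_DATE'].dt.month.value_counts()
busiest_month = month_counts.idxmax()

# Largest fire: idxmax returns the row label of the maximum FIRE_SIZE
largest_fire = sample.loc[sample['FIRE_SIZE'].idxmax()]

# Average fire size in acres
mean_size = sample['FIRE_SIZE'].mean()

print(busiest_month, largest_fire['STATE'], round(mean_size, 2))
```

Using idxmax keeps the whole row of the largest fire, so its state and year can be read off without a second filter.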

Findings

Analyzing the wildfire data shows what causes fires, when and where they occur, and how large they are. The wildfire dataset includes information about NWCG reporting criteria, fire discovery, containment, classification, fire size, and state/county. Using this information uncovers trends in wildfires due to both natural and human causes.

Fire Size Class Pie Chart

This pie chart displays the count of fires by fire size class. The size of the fire determines the class. Class A is the smallest, at 1/4 acre or less, and is the second most common class. Class B, more than 1/4 acre but less than 10 acres, has the most occurrences with 47.94% of fires. The third most common is Class C, at 10 acres or more but less than 100 acres. Classes D through G cover fires of 100 acres or more and make up only 2.85% of fires. This data shows that most wildfires are smaller than 10 acres and very few reach 100 acres.

#create df with fire classes and the counts of each, plus sum of # of fires at end (and definition of class)
class_df = df.groupby(['FIRE_SIZE_CLASS'])['FIRE_SIZE_CLASS'].count().reset_index(name='count_class')
all_fires = class_df.count_class.sum()
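#group classes that each hold less than 5% of total fires into a single row
pct_allfires = class_df['count_class']/ all_fires
small_classes = class_df[pct_allfires<0.05]
#join the class letters (e.g. 'DEFG') and add up the counts of the small classes
newobj = pd.DataFrame({'FIRE_SIZE_CLASS': [''.join(small_classes['FIRE_SIZE_CLASS'])],
                       'count_class': [small_classes['count_class'].sum()]})
#DataFrame.append was removed in pandas 2.0, so keep the large classes and concat the grouped row
class_df = pd.concat([class_df[pct_allfires>=0.05], newobj], ignore_index=True)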

fig = plt.figure(figsize=(10,10)) 
ax = fig.add_subplot(1,1,1)

colors = plt.get_cmap('Set2')

#labels list
classlabels = ['Class A \n 1/4 acre or less', 'Class B \n more than 1/4 acre, but less than 10 acres', 
               'Class C \n 10 acres or more, but less than 100 acres', 'Other Classes (D-G) \n 100 acres or more']

#plot the count of fires per size class as a pie
class_df.groupby(['FIRE_SIZE_CLASS'])['count_class'].sum().plot(
    kind='pie', radius=1, pctdistance=0.85, labeldistance=1.1,
    autopct= '%1.2f%%', 
    cmap=colors, 
    wedgeprops=dict(edgecolor='w'), textprops={'fontsize':13}, 
    startangle=90, labels=classlabels) 

#add hole in middle
hole=plt.Circle((0,0), 0.4, fc='white')
fig1 = plt.gcf() #gcf is get current figure
fig1.gca().add_artist(hole) #get current axis

#get rid of y axis
ax.yaxis.set_visible(False)

ax.text(0, 0, 'Total Fires\n (1992-2020): \n' + str(('{:,}'.format(all_fires))), size=18,  ha='center', va='center')

ax.axis('equal') #center it in axes

plt.tight_layout() #scooch labels


plt.show()
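
The notebook relies on the dataset's own FIRE_SIZE_CLASS column, but the class boundaries listed in the chart labels can also be written out directly. The following is a minimal sketch under that assumption: the function name `fire_size_class` and the sample sizes are hypothetical, and classes D through G are lumped together as in the pie chart.

```python
from collections import Counter

def fire_size_class(acres):
    """Return the grouped size class used in the pie chart for a fire of `acres` acres."""
    if acres <= 0.25:
        return 'A'      # 1/4 acre or less
    elif acres < 10:
        return 'B'      # more than 1/4 acre, but less than 10 acres
    elif acres < 100:
        return 'C'      # 10 acres or more, but less than 100 acres
    return 'D-G'        # 100 acres or more

# Hypothetical fire sizes in acres, for illustration only
sizes = [0.1, 0.25, 0.3, 5, 9.9, 10, 50, 100, 5000]

# Count fires per class, then convert the counts to percentages of the total
counts = Counter(fire_size_class(s) for s in sizes)
total = sum(counts.values())
shares = {cls: round(100 * n / total, 2) for cls, n in counts.items()}
print(shares)
```

Writing the boundaries as explicit comparisons makes the edge cases visible: exactly 1/4 acre is Class A, and exactly 10 acres is Class C.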

NWCG General Cause Fire Count (1992-2020) Bar Chart

This bar chart shows the count of wildfires by NWCG General Cause. The most common cause is debris and open burning. Missing data is the second most common category, which points to gaps in the FPA’s record keeping. Beyond missing data, natural causes and arson are also among the more frequent causes. The least common causes are firearms and explosives use, other causes, and fireworks. Many of these causes are due to human intervention and are easy to distinguish.

#get count of general cause first
x = df.groupby(['NWCG_GENERAL_CAUSE', 'NWCG_CAUSE_CLASSIFICATION'])['NWCG_GENERAL_CAUSE'].count().reset_index(name='NWCG_CAUSE_COUNT')
x = pd.DataFrame(x)
#take out missing
x = x[x['NWCG_CAUSE_CLASSIFICATION'] != 'Missing data/not specified/undetermined']

fig=plt.figure(figsize=(16,10))
ax = fig.add_subplot(1,1,1)

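sort_xcauses = x.sort_values('NWCG_CAUSE_COUNT', ascending=True)
#color each bar by its cause classification so the legend patches below match the bars
bar_colors = sort_xcauses['NWCG_CAUSE_CLASSIFICATION'].map({'Human': 'indianred', 'Natural': 'y'}).fillna('gray').tolist()
plt.barh('NWCG_GENERAL_CAUSE', 'NWCG_CAUSE_COUNT', data = sort_xcauses, color=bar_colors)

#patches for legend

Human = mpatches.Patch(color='indianred', label='Human')
Natural = mpatches.Patch(color='y', label='Natural')
plt.legend(handles=[Human, Natural], fontsize=14)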

#titles and ticks
plt.title('NWCG General Cause Fire Count (1992-2020)', size=24)
plt.xlabel('Fire Count', fontsize=16)
plt.ylabel('NWCG General Cause', fontsize=16)
plt.xticks(fontsize=14)

plt.yticks(fontsize=14)

#format x tick labels with thousands separators without freezing the tick positions
plt.gca().xaxis.set_major_formatter(FuncFormatter(lambda value, pos: '{:,.0f}'.format(value)))

for row_counter, value_at_row_counter in enumerate(sort_xcauses.NWCG_CAUSE_COUNT):
    plt.text(value_at_row_counter + 10, row_counter, format(value_at_row_counter, ','), fontweight='bold') 
    

plt.show()
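
The counting step behind the bar chart, and one way to tie bar colors to the Human and Natural legend patches, can be sketched on a small frame. The six rows below are placeholders for illustration only, and the names `palette` and `class_share` are not part of the notebook.

```python
import pandas as pd

# Hypothetical records, for illustration only
causes = pd.DataFrame({
    'NWCG_GENERAL_CAUSE': ['Debris and open burning', 'Natural', 'Arson/incendiarism',
                           'Debris and open burning', 'Natural', 'Fireworks'],
    'NWCG_CAUSE_CLASSIFICATION': ['Human', 'Natural', 'Human', 'Human', 'Natural', 'Human'],
})

# Count fires per (cause, classification) pair, smallest first for a horizontal bar chart
cause_counts = (causes.groupby(['NWCG_GENERAL_CAUSE', 'NWCG_CAUSE_CLASSIFICATION'])
                      .size()
                      .reset_index(name='NWCG_CAUSE_COUNT')
                      .sort_values('NWCG_CAUSE_COUNT', ascending=True))

# One color per bar, looked up from the classification column
palette = {'Human': 'indianred', 'Natural': 'y'}
bar_colors = cause_counts['NWCG_CAUSE_CLASSIFICATION'].map(palette).tolist()

# Share of fires by classification, as percentages
class_share = (causes['NWCG_CAUSE_CLASSIFICATION'].value_counts(normalize=True) * 100).round(1)

print(cause_counts)
print(class_share)
```

The `bar_colors` list is in the same order as the sorted rows, so it could be passed as the `color` argument of a bar plot built from `cause_counts`.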

Fires by Discovery Date Heatmap

This heatmap displays the count of fires by discovery date, with the month of the year on the x axis and the year on the y axis. The colorbar covers a range of values from 0 to 20,000. The highest number of wildfires occurred in March of 2006, which stands out as significantly darker than all other months. The heatmap shows that the majority of fires take place in the middle of the year, with winter months experiencing far fewer. Both 2001 and 2006 show outliers of high fire counts in unexpected months. These could indicate drier years or possibly the presence of a serial arsonist.

#create new df that does the counts
df['DISCOVERY_YEAR'] = pd.DatetimeIndex(df['DISCOVERY_DATE']).year
df['DISCOVERY_MONTH'] = pd.DatetimeIndex(df['DISCOVERY_DATE']).month
#how many of each month of each year
x1 = df.groupby(['DISCOVERY_YEAR', 'DISCOVERY_MONTH'])['DISCOVERY_YEAR'].count().reset_index(name='Count')
x1 = pd.DataFrame(x1)
x1['New_Count'] = round(x1['Count']/10, 0) #round bc cant be decimal
x2 = pd.pivot_table(x1, index='DISCOVERY_YEAR', columns='DISCOVERY_MONTH', values='Count')
#HEATMAP
fig = plt.figure(figsize=(18,10))
ax = fig.add_subplot(1,1,1)

comma_fmt = FuncFormatter(lambda x, p: format(int(x), ','))

ax = sns.heatmap(x2, cmap='Reds', annot=True, fmt=',.0f',
                cbar_kws={'format': comma_fmt, 'orientation': 'vertical'})

ax.invert_yaxis()

plt.title('Fires by Discovery Date', fontsize=24, pad=15)
plt.xlabel('Month of the Year', fontsize=18, labelpad=10)
plt.ylabel('Year', fontsize=18, labelpad=10)
plt.yticks(size=12)

plt.xticks(size=12)

plt.show()
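
The grid behind the heatmap is a year-by-month table of counts. As a rough sketch of that reshaping step, the example below builds the table with `pd.crosstab` instead of the groupby and pivot_table used above, and the six dates are invented for illustration.

```python
import pandas as pd

# Hypothetical discovery dates, for illustration only
dates = pd.to_datetime(pd.Series(
    ['03/01/2006', '03/15/2006', '03/20/2006', '07/04/2006', '07/10/2007', '12/25/2007']),
    format='%m/%d/%Y')

toy = pd.DataFrame({'DISCOVERY_YEAR': dates.dt.year, 'DISCOVERY_MONTH': dates.dt.month})

# Rows are years, columns are months, cells are fire counts (0 where no fires occurred)
grid = pd.crosstab(toy['DISCOVERY_YEAR'], toy['DISCOVERY_MONTH'])

# The darkest cell of the heatmap is the (year, month) pair with the largest count
peak_year, peak_month = grid.stack().idxmax()

print(grid)
print(peak_year, peak_month)
```

Looking up the peak cell this way gives a numeric check on what the darkest square of the heatmap suggests visually.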

Count of Fires (1992-2020) by Day of Year Line Chart

These three line charts show the frequency of fires over the course of the year, using the total count of fires on each day of the year across all years. The first chart divides the data by region. The West tends to experience more wildfires in the middle of the year, with a peak not long after day 180. The line representing the South shows a less dramatic fire season occurring at the beginning of the year. The Mid-Atlantic, Midwest, New England, and Southwest tend to have fewer than the average of 1,050 fires on a particular day of the year.

The second line chart shows just the Western region divided by state. California is the clear outlier, with considerably more wildfires than other states in the region. California stays fairly consistently above the all-state mean of 129 wildfires on a given day of the year.

The final line chart displays just the Southern region broken up by state. The South is much more homogeneous, but Georgia tends to have the highest count of fires on any day of the year.

#lets group states by region to get 6 lines
    #maybe divide region with most counts into states OH MULTIPLE SUBPLOTS
west = ['AK', 'CA', 'CO', 'HI', 'ID', 'MT', 'NV', 'OR', 'UT', 'WA', 'WY']
newengland = ['CT', 'ME', 'MA', 'NH', 'RI', 'VT']
midatlantic = ['DE', 'MD', 'NJ', 'NY', 'PA']
south = ['AL', 'AR', 'FL', 'GA', 'KY', 'LA', 'MS', 'MO', 'NC', 'SC', 'TN', 'VA', 'WV']
midwest = ['IL', 'IN', 'IA', 'KS', 'MI', 'MN', 'NE', 'ND', 'OH', 'SD', 'WI']
southwest = ['AZ', 'NM', 'OK', 'TX']

def REGION(state):
    if state in west:
        return 'WEST'
    elif state in newengland:
        return 'NEW ENGLAND'
    elif state in midatlantic:
        return 'MID ATLANTIC'
    elif state in south:
        return 'SOUTH'
    elif state in midwest:
        return 'MIDWEST'
    else:
        #any state code not in the lists above also falls through to here
        return 'SOUTHWEST'
df['REGION'] = df['STATE'].map(REGION)

#count fires per day of year and region
line_df = df.groupby(['DISCOVERY_DOY', 'REGION'])['DISCOVERY_DOY'].count().reset_index(name='count_at_doy')

#count fires per day of year, region and state, then pull out the West and South below
regions_df = df.groupby(['DISCOVERY_DOY', 'REGION','STATE'])['DISCOVERY_DOY'].count().reset_index(name='count_at_doy')
west_df = regions_df[regions_df.REGION=='WEST']
south_df = regions_df[regions_df.REGION=='SOUTH']

fig = plt.figure(figsize=(18,14))

#setting up figure
ax0 = plt.gca()
ax0.set_ylabel('Count of Fires', fontsize=18, labelpad=40)
plt.setp(ax0.get_xticklabels(), visible=False)

plt.setp(ax0.get_yticklabels(), visible=False)

ax0.tick_params(axis='both', which='both', length=0)

#region graph
ax = fig.add_subplot(3, 1, 1)

#group by a plain column name so each key is the region string, and label each line with it
for key, grp in line_df.groupby('REGION'):
    grp.plot(ax=ax, kind='line', x='DISCOVERY_DOY', y='count_at_doy', label=key)

#region titles and ticks etc
ax.set_title('Count of Fires (1992-2020) by Day of Year', fontsize=24)
ax.legend()
ax.set(xlabel=None)
ax.set_xticks(np.arange(0, 365, 30))
plt.axhline(line_df.count_at_doy.mean(), color = 'black', linestyle = ':', linewidth=2)
ax.text(-12, line_df.count_at_doy.mean()+750, 'Region Mean \n' + str(round(line_df.count_at_doy.mean())), 
         fontsize=14, fontweight='bold',
         bbox={'edgecolor':'black', 'facecolor': 'lightgray', 'boxstyle':'round', 'alpha':0.5})

#yticks
plt.gca().yaxis.set_major_formatter(FuncFormatter(lambda value, pos: '{:,.0f}'.format(value)))

#adding just west
ax2 = fig.add_subplot(3, 1, 2)

for key1, grp1 in west_df.groupby('STATE'):
    grp1.plot(ax=ax2, kind='line', x='DISCOVERY_DOY', y='count_at_doy', label=key1)

#west titles and ticks etc
ax2.legend()
ax2.set(xlabel=None)
ax2.set_xticks(np.arange(0, 365, 30))
plt.axhline(regions_df.count_at_doy.mean(), color = 'black', linestyle = ':', linewidth=2)
ax2.text(-12, regions_df.count_at_doy.mean()+300, 'State Mean \n' + str(round(regions_df.count_at_doy.mean())), 
         fontsize=14, fontweight='bold',
         bbox={'edgecolor':'black', 'facecolor': 'lightgray', 'boxstyle':'round', 'alpha':0.5})
ax2.set_title('West', fontsize=18)


plt.gca().yaxis.set_major_formatter(FuncFormatter(lambda value, pos: '{:,.0f}'.format(value)))

#adding just south
ax3 = fig.add_subplot(3, 1, 3)
for key2, grp2 in south_df.groupby('STATE'):
    grp2.plot(ax=ax3, kind='line', x='DISCOVERY_DOY', y='count_at_doy', label=key2)

#south titles and ticks etc
ax3.legend()
ax3.set_xlabel('Discovery Day of Year', fontsize=14)
ax3.set_xticks(np.arange(0, 365, 30))
plt.axhline(regions_df.count_at_doy.mean(), color = 'black', linestyle = ':', linewidth=2)

ax3.text(-12, regions_df.count_at_doy.mean()+150, 'State Mean \n' + str(round(regions_df.count_at_doy.mean())), 
         fontsize=14, fontweight='bold',
         bbox={'edgecolor':'black', 'facecolor': 'lightgray', 'boxstyle':'round', 'alpha':0.5})
ax3.set_title('South', fontsize=18)


plt.gca().yaxis.set_major_formatter(FuncFormatter(lambda value, pos: '{:,.0f}'.format(value)))


plt.show()
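
The region assignment and the per-day counts behind these charts can also be sketched without pandas. The region lists below are copied from the notebook, but the `OTHER` bucket is an assumption of this sketch: the REGION function above returns 'SOUTHWEST' for any code it does not recognize, so the two approaches would give different counts wherever such codes occur. The records are made up for illustration.

```python
from collections import Counter
from datetime import date

# Region lists copied from the notebook
REGION_STATES = {
    'WEST': ['AK', 'CA', 'CO', 'HI', 'ID', 'MT', 'NV', 'OR', 'UT', 'WA', 'WY'],
    'NEW ENGLAND': ['CT', 'ME', 'MA', 'NH', 'RI', 'VT'],
    'MID ATLANTIC': ['DE', 'MD', 'NJ', 'NY', 'PA'],
    'SOUTH': ['AL', 'AR', 'FL', 'GA', 'KY', 'LA', 'MS', 'MO', 'NC', 'SC', 'TN', 'VA', 'WV'],
    'MIDWEST': ['IL', 'IN', 'IA', 'KS', 'MI', 'MN', 'NE', 'ND', 'OH', 'SD', 'WI'],
    'SOUTHWEST': ['AZ', 'NM', 'OK', 'TX'],
}

# Invert the lists into a single state -> region lookup
STATE_TO_REGION = {state: region
                   for region, states in REGION_STATES.items()
                   for state in states}

def region_of(state):
    """Return the region for a state code, or 'OTHER' (an assumption of this sketch) if unlisted."""
    return STATE_TO_REGION.get(state, 'OTHER')

# Hypothetical (state, discovery date) records, for illustration only
records = [('CA', date(2018, 7, 4)), ('CA', date(2019, 7, 4)),
           ('GA', date(2018, 2, 1)), ('PR', date(2018, 2, 1)),
           ('TX', date(2019, 7, 4))]

# (day of year, region) -> number of fires, summed over all years
doy_counts = Counter((d.timetuple().tm_yday, region_of(s)) for s, d in records)

# Mean count per (day, region) pair, the quantity drawn as the dotted mean line
mean_count = sum(doy_counts.values()) / len(doy_counts)

print(doy_counts)
print(mean_count)
```

A dictionary lookup keeps the mapping in one place and makes the fallback for unlisted codes explicit instead of leaving it to a final else branch.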

State/Territory Based on Count of Fires Folium Choropleth Map

This map also displays the count of fires, but shows where they happened. Darker shades of red indicate a higher count of fires, on a scale that ranges from 83 to 251,881 occurrences. The map includes all US states and territories, wherever in the world they are located. California is the darkest red and therefore has the greatest number of wildfires. States like Texas and Georgia are also a darker red, reflecting higher counts. The map clearly shows the regions with fewer fires, supporting the previous line charts: the states in the Midwest, Mid-Atlantic, and New England are a much lighter color because they experience fewer wildfires.

#state dict. tbh this was copy/paste bc there was no way i was going to type all this
    #to convert abbreviations to full name for json file
state_dict = {'AK': 'Alaska',
        'AL': 'Alabama',
        'AR': 'Arkansas',
        'AS': 'American Samoa',
        'AZ': 'Arizona',
        'CA': 'California',
        'CO': 'Colorado',
        'CT': 'Connecticut',
        'DC': 'District of Columbia',
        'DE': 'Delaware',
        'FL': 'Florida',
        'GA': 'Georgia',
        'GU': 'Guam',
        'HI': 'Hawaii',
        'IA': 'Iowa',
        'ID': 'Idaho',
        'IL': 'Illinois',
        'IN': 'Indiana',
        'KS': 'Kansas',
        'KY': 'Kentucky',
        'LA': 'Louisiana',
        'MA': 'Massachusetts',
        'MD': 'Maryland',
        'ME': 'Maine',
        'MI': 'Michigan',
        'MN': 'Minnesota',
        'MO': 'Missouri',
        'MP': 'Northern Mariana Islands',
        'MS': 'Mississippi',
        'MT': 'Montana',
        'NA': 'National',
        'NC': 'North Carolina',
        'ND': 'North Dakota',
        'NE': 'Nebraska',
        'NH': 'New Hampshire',
        'NJ': 'New Jersey',
        'NM': 'New Mexico',
        'NV': 'Nevada',
        'NY': 'New York',
        'OH': 'Ohio',
        'OK': 'Oklahoma',
        'OR': 'Oregon',
        'PA': 'Pennsylvania',
        'PR': 'Puerto Rico',
        'RI': 'Rhode Island',
        'SC': 'South Carolina',
        'SD': 'South Dakota',
        'TN': 'Tennessee',
        'TX': 'Texas',
        'UT': 'Utah',
        'VA': 'Virginia',
        'VI': 'Virgin Islands',
        'VT': 'Vermont',
        'WA': 'Washington',
        'WI': 'Wisconsin',
        'WV': 'West Virginia',
        'WY': 'Wyoming'}
#state count
state_df = df.groupby(['STATE']).size().reset_index(name="Count")
#from abbreviation to full name
state_df = state_df.replace({"STATE":state_dict})
#this json file was found @ https://eric.clst.org/tech/usgeojson/. i trust him.
mapoutline = path + 'gz_2010_us_040_00_20m.json' 

#setting center
center_of_map = [40, -95] #somewhere in the midwest

fire_map = folium.Map(location=center_of_map,
                     zoom_start=4,
                     tiles= 'cartodbpositron',
                     width='90%', 
                     height='100%', 
                     left='5%',
                     right='5%',
                     top='0%')

ch_map = folium.Choropleth(geo_data=mapoutline,
                          name='choropleth',
                          data=state_df,
                          columns=['STATE', 'Count'],
                          key_on= 'feature.properties.NAME', #navigate through geojson file
                          fill_color='Reds',
                          fill_opacity=0.9,
                          line_opacity=0.4,
                          legend_name='State/Territory Based on Count of Fires',
                          highlight=True).add_to(fire_map)

ch_map.geojson.add_child(
    folium.features.GeoJsonTooltip(fields=['NAME'], aliases=['State: '],
                                  labels=True, style=('background-color: white; color:black;')));

fire_map.save(path+'Choropleth_FIRES.html')

fire_map
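
Because the choropleth joins the data to the map shapes on 'feature.properties.NAME', a state name that does not match a shape name exactly will not be joined. A quick check like the one below can list mismatches before plotting. It is only a sketch: the tiny GeoJSON dictionary stands in for the real file, the counts are invented, and it makes no claim about which names the real file contains.

```python
# Tiny stand-in for the GeoJSON file; only the NAME property matters for the join
geojson = {'type': 'FeatureCollection', 'features': [
    {'type': 'Feature', 'properties': {'NAME': 'California'}, 'geometry': None},
    {'type': 'Feature', 'properties': {'NAME': 'Georgia'}, 'geometry': None},
    {'type': 'Feature', 'properties': {'NAME': 'Puerto Rico'}, 'geometry': None},
]}

# Abbreviation -> full name, a small excerpt in the style of state_dict
state_names = {'CA': 'California', 'GA': 'Georgia', 'VI': 'Virgin Islands'}

# Hypothetical fire counts per state abbreviation
fire_counts = {'CA': 120, 'GA': 95, 'VI': 3}

# Names available in the map shapes
geo_names = {feature['properties']['NAME'] for feature in geojson['features']}

# Counts keyed by full name; unknown abbreviations are kept as-is
named_counts = {state_names.get(abbr, abbr): n for abbr, n in fire_counts.items()}

# Data rows with no matching shape, and shapes with no data row
unmatched_data = sorted(set(named_counts) - geo_names)
unmatched_shapes = sorted(geo_names - set(named_counts))

print(unmatched_data)
print(unmatched_shapes)
```

Running the same comparison against the real file and the full state_df would show whether any states or territories are silently left out of the map.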


Conclusion

Tracking wildfires across the United States reveals their frequency in time and place. Of the 2.3 million wildfires recorded from 1992 to 2020, the vast majority were smaller than 10 acres. The leading causes of wildfires are human actions. Proper education could reduce the role humans play in starting and spreading wildfires, especially during wildfire season. The heatmap shows the frequency of wildfires over all recorded years, and the years and months with more fires point to riskier times. The heatmap covers the entire United States and is not region or state specific. Different regions of the US have different wildfire seasons, with the South’s falling toward the beginning of the year and the West’s in the middle. The day-of-year line charts agree with the choropleth map, which shows the states with the highest counts of recorded wildfires. California, Texas, and Georgia lead all other states and are located in the West, Southwest, and South, respectively. These locations are riskier and require more fire safety procedures. Another possible reason for the higher frequency of fires in these states is the recording practices of the agencies reporting the fires.