Exploring Fire Incidence Frequency and Demographic Data

Abstract:

Fire incidence can have a spatial component that can be useful to understand for better resource allocation for emergency management. This project analyzes the spatial distribution of fire incidents within the city of Boston and the relationship between fires and demographic information. The data on fires was collected by the Boston Fire Department and sourced from a Boston government website and the data on demographics was sourced from the US Census Bureau’s American Community Survey. The tabular fire data was transformed into spatial data using geocoding and additionally spatially joined for some of this analysis. This project created maps showing the clusters of fire incidents across Boston. Additionally, scatterplots exploring the relationship between demographic data/housing characteristics and fire incidence were created. Finally, multivariable spatial regression techniques was employed to attempt to model fire incidence using spatial and demographic information. The final result is a number of maps, tables, and graphs that display and attempt to model the fire incidence across the city.

Introduction:

Understanding the spatial component of fire incidence frequency can be helpful to providing appropriate resource allocation for departments and help quicker response to disaster. Therefore, research into the spatial distribution of fire incidence can be an important part of emergency preparedness. Furthermore, different communities of people might have a higher likelihood of experiencing fires based on certain demographics. Previous research into fire incidence data has shown that demographic data such as income and population density can be predictive to a certain extent of the number of fires in an area (Buffington et. al., 2021). These demographics could definitely have a causal relationship with fire incidence such as with people with lower income not being able to afford areas with more fire safe infrastructure. Therefore, it could reveal important information on fire incidence to analyze this demographic data along with fire data. Similarly information in the housing in an area might show patterns as there is a possibility that older homes have less resilience to fire due to being built without current safety measures.

This project attempts to better understand possible patterns and clustering in fire incidence within the specific spatial context of the city of Boston Massachusetts using data on fire incidence within the city. It also assessed the relationships between demographic data, such as income, population density, and housing information compared to the number of fires in an area. This analysis will be somewhat similar to what was done in Buffington et. al. (2021), but focusing specifically on the city of Boston. The question that this project attempts to answer is- what patterns in fire incidence data exist within the city of Boston and how do demographic/housing characteristics affect this incidence?

Data:

In order to complete this analysis, data on fires and on demographic/housing data needed to be collected. The fire incidence data is sourced from the “Fire Incidence Reporting” dataset on the Analyze Boston website. This data is collected by the Boston Fire department and includes dates from 2014 to 2025 and covers the city of Boston. The demographic/housing data is sourced from American Community Survey 5-year estimate datasets created by the US Census Bureau. This survey collects estimates on population and demographic data such as income across the United States. The smallest unit for this data is a block group but based on conversations with peers and exploration of the data, census tracts were used as the smallest spatial unit in this analysis as it largely had enough accuracy. The data that was used in this analysis was estimates of income, population estimates, number of units in a building, and year of construction of buildings within each census tract within the city of Boston.

Methodology:

There are a number of steps that were employed in order to better understand the spatial and demographic relationships of fire incidence within the city of Boston. First, the tabular fire and demographic data needed to be converted into a spatial format. For the fire data, there are addresses, but not coordinates. This means that geocoding was used to create points representing fires.

For this the tidygeocoder package was used which provides R access to several geocoding options. The geocoder employed in this analysis was geoapify due to the being the only free option for the amount of addresses that needed to be geocoded. This data was checked for general accuracy after this step and then a cluster map of fires was created to explore general spatial patterns in the fire data. The demographic data was loaded into R in a spatial format using the tidycensus package and the fire data was spatially joined to the census tract demographic data for appropriate comparison. The density of fire incidents and population was then calculated for appropriate spatial comparison. Scatterplots showing relationships between the demographic and housing data such as income and population, housing units, and building age, versus fire incidence data were then created. Finally, methods of spatial multivariate regression were employed using this data and the spatialreg pakcage in R in an attempt to use the demographic and housing data to model the density of fire incidents in a census tract. The specific spatial multivariate regression technique used in this analysis was a spatial lag model which takes into account neighboring values in the model. Figure 1 charts the general workflow of this project.

Figure 1: Workflow Chart

Results

Fire Cluster Patterns

First, the fire data was looked at as point clusters within the city of Boston. As seen in figure 2, all of the points occur in the general Boston area, which likely means that the points were geocoded correctly. Based on knowledge of the area, it would seem that clusters are generally in the parts of the city with higher populations.

#Load Packages
library(tidycensus)
library(tidyverse)
library(scales)
library(plotly)
library(mapview)
library(sf) 
library(here)
library(dplyr)
library(tidygeocoder)
library(sf)
library(leaflet)
library(leafem)
library(leafpop)
library(units)
library(cowplot)
library(tigris)
library(spatialreg)
library(spdep)
library(flextable)
library(stargazer)

#Load fire data
fireraw <- read.csv(here("rawfiredata.csv"))
fireraw$alarm_date <- as.Date(fireraw$alarm_date, format= "%m/%d/%Y")


#Filter to dates 2020 to 2024 for comparison to ACS estimates
fireraw$alarm_date <- as.Date(fireraw$alarm_date, format= "%m/%d/%Y")
fire <- subset(fireraw, alarm_date>= "2020-01-01" & alarm_date <= "2024-12-31")

#Filter to fire incidences
fire <- mutate(fire, inc= substr(fire$incident_type,1,1))
fire <- subset(fire, inc==1)

#Create address fields for geocoding
fire <- mutate(fire, st = "MA")
fire <- mutate(fire, cty = "Boston")

fire <-fire %>% 
  unite(street, street_number:street_name:street_type, sep= " ",remove = FALSE)
#Geocode addresses
fire_spatial <- fire %>%
  geocode(street = street, city = cty, state = st, method = "geoapify")

#Write geocoded addresses to a file for further use
write_csv(fire_spatial, "firesspatial.csv")

fire_spatial <- read.csv(here("firesspatial.csv"))
bostonfire <- st_as_sf(x = fire_spatial,                         
               coords = c("long", "lat"),
               crs = st_crs(4269),
               na.fail =FALSE)
#create cluster map
leaflet(data = bostonfire) |> 
  addTiles() |> #Add open street map basemap 
  addMarkers(clusterOptions = markerClusterOptions(),label = ~alarm_date, popup =~street)

Figure 2: Map of Fire Clusters in Boston, MA.

Demographic and Housing Data Comparison

To better understand how fire data might be related to areas with high population and how it might be related to other factors such as income, housing age and building size, the scatterplots in figures 3, 4, 5 and 6 were created comparing these variables to fire density within each census tract. Figure 3 seems to show a strong relationship between population density and fire incidence density per census tract.

#Load population estimate data, Median annual household income, building units, and age from 2020-2024 ACS data for Suffolk County Tracts

demographics<- get_acs(
  geography = "tract",
  variables = c(
    
    total_pop = "B01003_001",
    income =  "B19013_001",
    housing="DP04_0006P",
    units1= "DP04_0007P",
    units1a="DP04_0008P",
    units2= "DP04_0009P",
    units3_4="DP04_0010P",
    units5_9="DP04_0011P",
    units10_19="DP04_0012P",
    units20= "DP04_0013P",
    mobile="DP04_0014P",
    houseyear="DP04_0016P",
    year2020="DP04_0017P",
    year2010s="DP04_0018P",
    year2000s="DP04_0019P",
    year90s="DP04_0020P",
    year80s="DP04_0021P",
    year70s="DP04_0022P",
    year60s="DP04_0023P",
    year50s="DP04_0024P",
    year40s="DP04_0025P",
    yearpre39="DP04_0026P"
    
  ),
  state = "MA",
  county = "Suffolk",
  output = "wide",
  progress_bar = FALSE,
  geometry = TRUE
)

#filter to only boston
county_sub <-county_subdivisions("MA", county = "Suffolk",class="sf")

boston<-subset(county_sub, NAME=="Boston")

demographics<- st_filter(demographics, boston, .predicate = st_intersects)

mapview(demographics)

#aggregate year and unit data
demo<- demographics %>%
  mutate(units1_2=units1E+units1aE+units2E) %>%
  mutate(post2000=year2020E+year2010sE+year2000sE)%>%
  mutate(y1950_2000=year90sE+year80sE+year70sE+year60sE+year50sE)%>%
  mutate(pre1950=year40sE+yearpre39E)

#keep only neccessary columns
demo<- demo %>%
  select(GEOID, NAME, total_popE, incomeE, units1_2, units3_4E, units5_9E, units10_19E, units20E, mobileE, post2000, y1950_2000, pre1950)
  

#calculate area
demo <- demo %>%
  mutate(area = st_area(demo))%>%
  drop_units()

#Remove zero area
demo <- subset(demo, area != 0)

#calculate population density
demo <- demo %>%
  mutate(dens = 10^6 * 2.59 * total_popE / area)

#join fires data
  
firetract <- st_join(demo, bostonfire)#join tracts to fires

firetract <- st_drop_geometry(firetract)#drop geometry so summary data can be created

firesum <- firetract %>% #create a summary of fires by tract
  group_by(GEOID) %>%
  summarize(fcnt = n())

#join fires data to demographic data and calculate fire density
alldata <- demo %>%
  left_join(firesum, by = "GEOID") %>%
  replace(is.na(.), 0) %>%
  ungroup()%>%
  mutate(fdens = 10^6 * 2.59 * fcnt / area) %>%
  drop_units()

#Create scatterplots

#Population Density
ggplot(data = alldata,aes(x=dens, y=fdens))+
  geom_point()+
  geom_smooth()+
  labs(title="Relationship between Population Density and Fires", caption="Figure 3:The correlational relationship
       between population density and fire density by census tract in the city of Boston",  x="Population Density (mi^2)",
       y="Fires per square mile")

Figure 4 shows that median income has much less of a correlation with fire density as compared to population density.

#Create scatterplots

#Median Income
ggplot(data = alldata,aes(x=incomeE, y=fdens))+
  geom_point()+
  geom_smooth()+
  labs(title="Relationship between Household Income and Fires", caption="Figure 4:The correlational relationship
       between median household income and fire density by census tract in the city of Boston",  x="Median Household Income",
       y="Fires per square mile")

Figure 5 shows that there are some correlations with certain number of units especially with higher percentages of buildings with a larger number of units having higher amounts of fires per tract. However, this probably is somewhat related to the population density as places that are more densely populated need housing that has enough units to accommodate for that.

#Building Units
one<-ggplot(data = alldata,aes(x=units1_2, y=fdens))+
  geom_point()+
  geom_smooth()+
  labs(title="1 to 2 Unit Buildings",  x="Percent Buildings with 1 to 2 units",
  y="Fires per square mile")

three<-ggplot(data = alldata,aes(x=units3_4E, y=fdens))+
  geom_point()+
  geom_smooth()+
  labs(title="3 to 4 Unit Buildings",  x="Percent Buildings with 3 to 4 units",
       y="Fires per square mile")

five<-ggplot(data = alldata,aes(x=units5_9E, y=fdens))+
  geom_point()+
  geom_smooth()+
  labs(title="5 to 9 Unit Buildings",  x="Percent Buildings with 5 to 9 units",
       y="Fires per square mile")

ten<-ggplot(data = alldata,aes(x=units10_19E, y=fdens))+
  geom_point()+
  geom_smooth()+
  labs(title="10 to 19 Unit Buildings",  x="Percent Buildings with 10 to 19 units",
       y="Fires per square mile")

twenty<-ggplot(data = alldata,aes(x=units20E, y=fdens))+
  geom_point()+
  geom_smooth()+
  labs(title="20 plus Unit Buildings",  x="Percent Buildings with 20 plus units",
       y="Fires per square mile")

mobile<-ggplot(data = alldata,aes(x=mobileE, y=fdens))+
  geom_point()+
  geom_smooth()+
  labs(title="Mobile Homes",  x="Percent mobile homes",
       y="Fires per square mile")



#composite
#Create a composite figure containing the map of tornado paths and the map of tornado points using plot_grid().
plots1 <- plot_grid(one, three, five, ten, twenty,mobile, 
          ncol = 2, 
          hjust = 0, 
          label_x = 0, 
          align = "hv")

title <- ggdraw() + 
  draw_label(
    "Relationship of Fire Density with Buildings Units",
    fontface = 'bold',
    x = 0,
    hjust = 0
  ) +
  theme(
    # add margin on the left of the drawing canvas,
    # so title is aligned with left edge of first plot
    plot.margin = margin(0, 0, 0, 7)
  )

caption <- ggdraw() + 
  draw_label(
    str_wrap("Figure 5: The correlational relationship
       between percentage of buildings separated by amount of units in a and fire density by census tract Boston"),
    x = 0,
    hjust = 0,
    size=12
  ) +
  theme(

    
  )

plot_grid(
  title, plots1, caption,
  ncol = 1,
  # rel_heights values control vertical title margins
  rel_heights = c(0.1, 1,0.1)
)

Figure 6 shows that there did not seem to much correlation any particular age of a building and fire incidents in the census tracts.

#building year
recent<-ggplot(data = alldata,aes(x=post2000, y=fdens))+
  geom_point()+
  geom_smooth()+
  labs(title="Post 2000 buildings",  x="Percent buildings built after 2000",
       y="Fires per square mile")

middle<-ggplot(data = alldata,aes(x=y1950_2000, y=fdens))+
  geom_point()+
  geom_smooth()+
  labs(title="1950-2000 buildings",  x="Percent buildings built 1950-2000",
       y="Fires per square mile")

old<-ggplot(data = alldata,aes(x=pre1950, y=fdens))+
  geom_point()+
  geom_smooth()+
  labs(title="Pre 1950 buildings",  x="Percent buildings built before 1950",
       y="Fires per square mile")




#composite
#Create a composite figure containing the map of tornado paths and the map of tornado points using plot_grid().
plots2 <- plot_grid(recent, middle, old, 
          ncol = 1, 
          hjust = 0, 
          label_x = 0, 
          align = "hv")

title2 <- ggdraw() + 
  draw_label(
    "Relationship of Fire Density with Buildings Year Built",
    fontface = 'bold',
    x = 0,
    hjust = 0
  ) +
  theme(
    # add margin on the left of the drawing canvas,
    # so title is aligned with left edge of first plot
    plot.margin = margin(0, 0, 0, 7)
  )

caption2 <- ggdraw() + 
  draw_label(
    str_wrap("Figure 6: The correlational relationship
       between percentage of buildings separated by when they were built and fire density by census tract in Boston"),
    x = 0,
    hjust = 0,
    size=12
  ) +
  theme(

    
  )

plot_grid(
  title2, plots2, caption2,
  ncol = 1,
  # rel_heights values control vertical title margins
  rel_heights = c(0.1, 1,0.1)
)

Calculating the Pearson’s correlations coefficients can possibly give more information about the strengths in relationships. As shown in Table 1 the strongest correlation for fire density is with population density but there are some moderate positive and negative correlations shown with fire density and other.

#correlation calculations-------------------------------
alldata <- alldata %>%
  select(incomeE, units1_2, units3_4E, units5_9E, units10_19E, units20E, mobileE, post2000, y1950_2000, pre1950,dens,fdens)



alldata2 <- alldata %>%
  st_drop_geometry(alldata)

alldata2 <- alldata2 %>%
  rename(income=incomeE, units3_4= units3_4E, units5_9= units5_9E, units10_19=units10_19E, units20_=units20E, mobile_homes= mobileE, built_post_2000=post2000, built_1950_2000=y1950_2000,built_pre_1950 =pre1950,density=dens,fire_density=fdens)

corr<-cor(alldata2,
        method = "pearson"
)

corrs<-as.data.frame(corr)

corrs<-corrs %>%
  select(fire_density)

library(knitr)

kable(corrs, caption = "Table 1: Pearson Correlation Coefficients for Fire Density and Demographic/Housing Variables")

Table 1: Pearson Correlation Coefficients for Fire Density and Demographic/Housing Variables
	fire_density
income	-0.0987458
units1_2	-0.4396835
units3_4	-0.0143396
units5_9	0.4403991
units10_19	0.3965355
units20_	0.2605572
mobile_homes	-0.0584224
built_post_2000	-0.0268636
built_1950_2000	0.0939150
built_pre_1950	0.1705951
density	0.8149883
fire_density	1.0000000

There are some relationships that were able to be seen in this data. Therefore, an attempt was made to create a model using demographic and spatial information in order to attempt to predict fire incident data. The results of this model are shown in Table 2. Some experimentation showed that the year data did not add to the model so they were removed.

#create neighbors list
neighbors<-poly2nb(alldata, queen=T)

nlist<-nb2listw(neighbors, style="W", zero.policy = TRUE)

#create model
lag2<-lagsarlm(fdens ~ incomeE + units1_2 + units3_4E +
                units5_9E + units10_19E + units20E + mobileE + dens,  
              data = alldata, 
              listw = nlist)


stargazer(lag2, type = "html",
                    title="Table 2: Regression Results")

**Table 2: Regression Results**

	Dependent variable:

	fdens

incomeE	-0.001^***
	(0.0004)

units1_2	1.655
	(1.281)

units3_4E	1.485
	(1.426)

units5_9E	4.886^**
	(2.387)

units10_19E	8.243^***
	(2.198)

units20E	3.360^***
	(1.230)

mobileE	9.528
	(28.194)

dens	0.022^***
	(0.002)

Constant	-197.688^**
	(95.515)


Observations	216
Log Likelihood	-1,541.909
sigma²	90,480.180
Akaike Inf. Crit.	3,105.819
Wald Test	37.781^*** (df = 1)
LR Test	32.192^*** (df = 1)

Note:	p<0.1; p<0.05; p<0.01

Looking at the residuals created from this model can illuminate somewhat the fit of the model. As shown in figure 7, the residuals were generally normally distributed but there were definitely some numbers that seem high.

hist(residuals(lag2), main = "Residuals Histogram", xlab = "Residuals", 
     ylab = "Frequency", sub = "Figure 7: Residuals distribution for the created spatial model."
     )

Discussion

There were shown to be relationships with fire density and several demographic and housing variables with population density having the strongest relationship with fire density in the census tracts in Boston, MA. The strong relationship with population density makes sense as the more people there are the more likely it is for there to be people whose actions lead to accidental fires. From the demographic data and spatial data, a model was created for predicting fire density at the tract level in the city of Boston. Overall, this model did seem to have some insight into the factors that could influence the fire incidents in the city. However, there were some higher residual values meaning that the model was further away from the actual results. There could be other factors that affect fire incidents that would need to be brought into the model in order for a more accurate model if the goal is to predict fire densities.

Conclusion

Fire density in Boston MA was found to have correlation with demographic and housing data such as population density, income, and the building sizes. Spatial regression was performed for fire density using these variables which did show some ability to model the fire density throughout the city of Boston for this data but more data would be needed to see if the created model can be predictive in general. This analysis give some insight into the overall, patterns of fire density in the city of Boston which could be helpful to know from a safety perspctive.

References

Boston Fire Department. (2022, July 21). Fire Incident Reporting. Analyze Boston. https://data.boston.gov/dataset/fire-incident-reporting/resource/91a38b1f-8439-46df-ba47-a30c48845e06

Buffington, T., Scott, J. G., & Ezekoye, O. A. (2021). Combining spatial and sociodemographic regression techniques to predict residential fire counts at the census tract level. Computers, Environment and Urban Systems, 88, 101633.

Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables. R package version 5.2.3. https://CRAN.R-project.org/package=stargazer

U.S. Census Bureau. (n.d.). Income in the Past 12 Months (in 2024 Inflation-Adjusted Dollars). American Community Survey, ACS 5-Year Estimates Subject Tables, Table S1901. Retrieved March 10, 2026, from https://data.census.gov/table/ACSST5Y2024.S1901?t=Income+(Households,+Families,+Individuals)&g=010XX00US$1400000&d=American+Community+Survey.

U.S. Census Bureau. (n.d.). Selected Housing Characteristics. American Community Survey, ACS 5-Year Estimates Data Profiles, Table DP04. Retrieved April 26, 2026, from https://data.census.gov/table/ACSDP5Y2024.DP04?q=DP04&g=050XX00US25025,25025$1400000.