Project 1: Centrality Measures in California Wildfire Networks

Author

Tony Fraser and Mark Gonsalves

Published

February 25, 2025

Introduction

This analysis examines centrality measures within networks of structures affected by California wildfires. Using the CAL FIRE Damage Inspection Program (DINS) database, we construct geographical proximity-based networks to investigate how structural characteristics, location, and fire damage patterns relate to centrality in these networks. This approach allows us to identify potential hotspots and vulnerabilities in wildfire impact patterns.

Data Source Overview

The data come from the CAL FIRE Damage Inspection Program (DINS) database, which records structures damaged or destroyed by wildland fires in California since 2013.
The database includes structures impacted by wildfire that are inside or within 100 meters of the fire perimeter.
Coverage: 130,717 structures with 46 variables including damage levels, structure types, and construction materials.

Network Construction Methodology

For our centrality analysis, we built networks representing geographical proximity between structures. Each structure is a node, and edges connect structures that are within a specified distance threshold of each other. We experimented with different thresholds (100m, 250m, and 500m) to observe how network density affects centrality measures.


import pandas as pd
import networkx as nx
from scipy.spatial import cKDTree
from geopy.distance import geodesic
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import io
import base64
from IPython.display import HTML
from data620.helpers.dins_utils import clean_column_names

# Load and clean data
dins = pd.read_csv("https://tonyfraser-data.s3.us-east-1.amazonaws.com/calfire/raw/POSTFIRE_MASTER_DATA_SHARE_2064760709534146017.csv")

# Normalize column names with the project helper clean_column_names
df = clean_column_names(dins)

# Print column names to debug
print("Column names after cleaning:")
print(df.columns.tolist())

# Ensure coordinates are valid
df = df.dropna(subset=['latitude', 'longitude'])

# Round coordinates for precision consistency
df = df.assign(
    latitude=lambda x: x["latitude"].round(6),
    longitude=lambda x: x["longitude"].round(6)
)

# Function to convert matplotlib figure to base64 encoded image
def fig_to_base64(fig, dpi=300):
    buf = io.BytesIO()
    fig.savefig(buf, format='png', bbox_inches='tight', dpi=dpi)
    buf.seek(0)
    img_str = base64.b64encode(buf.getvalue()).decode('utf-8')
    plt.close(fig)
    return f'<img src="data:image/png;base64,{img_str}" class="chart-container" alt="Chart">'

class WildfireNetwork:
    def __init__(self, df, distance_threshold=100):
        self.df = df
        self.distance_threshold = distance_threshold
        self.graph = None
        
        # Verify expected columns exist
        expected_cols = ['object_id', 'damage', 'structure_type', 'incident_name', 
                         'latitude', 'longitude', 'roof_construction', 'county']
        
        missing_cols = [col for col in expected_cols if col not in df.columns]
        if missing_cols:
            print(f"Warning: Missing expected columns: {missing_cols}")
            print(f"Available columns: {df.columns.tolist()}")
        
        self.build_graph()

    def build_graph(self):
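        # Approximate conversion from meters to degrees (1 degree of latitude is about 111 km).
        # A degree of longitude is shorter by a factor of cos(latitude), so this radius can
        # miss east-west pairs that lie close to the distance threshold.
        threshold_degrees = self.distance_threshold / 111000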
        G = nx.Graph()
        
        # Add nodes keyed by 'object_id', carrying the attributes used in later analysis
        node_data = {}
        for idx, row in self.df.iterrows():
            node_data[row["object_id"]] = {
                "latitude": row["latitude"],
                "longitude": row["longitude"],
                "damage": row["damage"],
                "structure_type": row["structure_type"],
                "incident": row["incident_name"],
                "roof_construction": row.get("roof_construction", "Unknown"),
                "county": row.get("county", "Unknown")
            }
        
        G.add_nodes_from(node_data.items())
        
        # Build spatial query
        coords = np.array([[data["latitude"], data["longitude"]] for data in node_data.values()])
        tree = cKDTree(coords)
        pairs = tree.query_pairs(threshold_degrees, output_type='ndarray')
        
        # Add edges
        node_ids = list(node_data.keys())
        edges = []
        
        for i, j in pairs:
            node1_id = node_ids[i]
            node2_id = node_ids[j]
            distance = geodesic(
                (coords[i][0], coords[i][1]), 
                (coords[j][0], coords[j][1])
            ).meters
            
            if distance <= self.distance_threshold:
                edges.append((node1_id, node2_id, {"weight": distance}))
        
        G.add_edges_from(edges)
        self.graph = G
        
    def calculate_centrality(self):
        degree_cent = nx.degree_centrality(self.graph)
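        # Betweenness is approximated from k=100 sampled source nodes;
        # a fixed seed keeps the estimate reproducible between runs
        betweenness_cent = nx.betweenness_centrality(self.graph, k=100, normalized=True, seed=42)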
        closeness_cent = nx.closeness_centrality(self.graph)
        
        try:
            eigenvector_cent = nx.eigenvector_centrality(self.graph, max_iter=300)
        except nx.PowerIterationFailedConvergence:
            eigenvector_cent = nx.eigenvector_centrality_numpy(self.graph)
        
        # Build the frame from the dicts themselves so values align on node id
        # rather than on the iteration order of each dict
        cent_df = (
            pd.DataFrame({
                "degree_centrality": degree_cent,
                "betweenness_centrality": betweenness_cent,
                "closeness_centrality": closeness_cent,
                "eigenvector_centrality": eigenvector_cent,
            })
            .rename_axis("object_id")
            .reset_index()
        )
        
        # Attach structure attributes to each node's centrality scores
        result = cent_df.merge(
            self.df[["object_id", "damage", "structure_type", "incident_name", 
                    "latitude", "longitude", "roof_construction", "county"]],
            on="object_id"
        )
    
        return result

# Create networks
sample_size = 5000
df_sample = df.sample(n=min(sample_size, len(df)), random_state=42)

networks = {
    "100m": WildfireNetwork(df_sample, distance_threshold=100),
    "250m": WildfireNetwork(df_sample, distance_threshold=250),
    "500m": WildfireNetwork(df_sample, distance_threshold=500)
}

centrality_results = {dist: net.calculate_centrality() for dist, net in networks.items()}

Column names after cleaning:
['object_id', 'damage', 'street_number', 'street_name', 'street_type', 'street_suffix', 'city', 'state', 'zip_code', 'cal_fire_unit', 'county', 'community', 'battalion', 'incident_name', 'incident_number', 'incident_start_date', 'hazard_type', 'fire_start_location', 'fire_cause', 'defense_actions', 'structure_type', 'structure_category', 'num_units', 'damaged_outbuildings', 'non_damaged_outbuildings', 'roof_construction', 'eaves', 'vent_screen', 'exterior_siding', 'window_pane', 'deck_on_grade', 'deck_elevated', 'patio_carport_attached', 'fence_attached', 'distance_to_propane_tank', 'distance_to_utility_structure', 'fire_name_secondary', 'apn', 'assessed_value', 'built_in', 'site_address', 'global_id', 'latitude', 'longitude', 'x_coord', 'y_coord']
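
The `build_graph` method above queries the KD-tree in raw degrees, which treats a degree of longitude as equal in length to a degree of latitude. A minimal sketch of one possible alternative follows: it projects coordinates to approximate meters with a local equirectangular projection before querying, so the search radius can be given directly in meters. The helper name `proximity_pairs_m`, the Earth-radius constant, and the sample coordinates are assumptions for illustration and are not part of the original analysis; over large extents a proper projected coordinate system would likely be a better choice.

```python
import numpy as np
from scipy.spatial import cKDTree

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (an approximation)

def proximity_pairs_m(lat, lon, threshold_m):
    """Return index pairs (i, j), i < j, of points within threshold_m meters.

    Hypothetical helper: uses an equirectangular projection centered on the
    mean latitude, which is reasonable only for small geographic extents.
    """
    lat = np.radians(np.asarray(lat, dtype=float))
    lon = np.radians(np.asarray(lon, dtype=float))
    lat0 = lat.mean()
    # Scale longitude differences by cos(latitude) so x and y are both in meters
    x = EARTH_RADIUS_M * (lon - lon.mean()) * np.cos(lat0)
    y = EARTH_RADIUS_M * (lat - lat0)
    tree = cKDTree(np.column_stack([x, y]))
    return tree.query_pairs(threshold_m, output_type="ndarray")

# Made-up points near 38.5 N: the second is about 87 m east of the first,
# the third is about 1.1 km north of the first.
lats = [38.5000, 38.5000, 38.5100]
lons = [-122.5000, -122.4990, -122.5000]
pairs = proximity_pairs_m(lats, lons, threshold_m=100)
print(pairs)
```

For comparison, a radius of 100 / 111000 (about 0.0009 degrees) would not return the first pair, because those two points are 0.001 degrees of longitude apart even though they are less than 100 m from each other on the ground.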

Network Analysis Results

Network Characteristics

We created three different networks to examine how distance thresholds affect network structure and connectivity:

Threshold	Nodes	Edges	Density	Connected Components	Largest Component Size	Average Degree
100m	5000	1248	0.0001	3986	18	0.4992
250m	5000	6019	0.0005	2506	497	2.4076
500m	5000	19495	0.0016	1688	650	7.7980

The 250m network offers a good balance between connectivity and local structure: it forms sizable clusters (the largest component holds 497 structures) while its edges still represent close geographic neighbors. As the table shows, raising the threshold from 100m to 500m increases the average node degree from about 0.5 to 7.8 and reduces the number of connected components from 3,986 to 1,688.
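
The summary statistics in the table can be derived from any NetworkX graph. The sketch below shows one way to do so, assuming a hypothetical helper named `summarize_network` and a toy graph rather than the actual wildfire networks. It also repeats the average-degree arithmetic (2 × edges / nodes) for the 250m row.

```python
import networkx as nx

def summarize_network(G):
    """Return the table's summary statistics for an undirected graph G."""
    n, m = G.number_of_nodes(), G.number_of_edges()
    components = list(nx.connected_components(G))
    return {
        "nodes": n,
        "edges": m,
        "density": nx.density(G),  # 2m / (n(n-1)) for undirected graphs
        "connected_components": len(components),
        "largest_component_size": max((len(c) for c in components), default=0),
        "average_degree": 2 * m / n if n else 0.0,
    }

# Toy graph: a triangle, a single edge, and one isolated node
G = nx.Graph()
G.add_edges_from([(1, 2), (2, 3), (1, 3), (4, 5)])
G.add_node(6)
stats = summarize_network(G)
print(stats)

# Arithmetic check for the 250m row: 6019 edges among 5000 nodes
avg_degree_250m = 2 * 6019 / 5000
print(round(avg_degree_250m, 4))
```

Applied to the networks built earlier, the same helper could be called as `summarize_network(networks["250m"].graph)`.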

Network Visualization (250m Threshold)

The visualization shows clear clustering of structures with similar damage patterns. Notably, destroyed structures (black) often appear in connected groups, suggesting that wildfire damage tends to affect adjacent structures rather than random individual buildings. The green nodes represent undamaged structures, while red and black indicate major damage and destroyed structures, respectively.

Key Centrality Analysis Findings

Damage Categories and Centrality

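Structures with higher damage levels (“Destroyed” and “Major”) show significantly higher degree centrality than structures with no damage. In these networks, degree centrality reflects how many other inspected structures lie within the distance threshold, so the pattern suggests that structures with more close neighbors were more likely to suffer severe damage during wildfires.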
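
A comparison like this can be produced by grouping the centrality table on the damage column. The following is a minimal sketch on made-up rows that mimic two columns of `centrality_results["250m"]`; the damage labels and centrality values are placeholders, not values from the DINS data.

```python
import pandas as pd

# Made-up rows; labels and numbers are placeholders for illustration only
toy = pd.DataFrame({
    "damage": ["Destroyed", "Destroyed", "Major", "No Damage", "No Damage"],
    "degree_centrality": [0.004, 0.003, 0.003, 0.001, 0.000],
})

# Count, mean, and median degree centrality per damage category,
# ordered from the most to the least central category
summary = (
    toy.groupby("damage")["degree_centrality"]
       .agg(["count", "mean", "median"])
       .sort_values("mean", ascending=False)
)
print(summary)
```

Reporting the count next to the mean helps show when a category's average rests on only a few structures.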

Residential vs. Commercial Buildings

Among destroyed structures, residential buildings demonstrate consistently higher centrality measures compared to commercial structures. This supports our hypothesis that residential structures in high-density clusters faced greater wildfire vulnerability.

Structure Survival and Centrality

The survival rate analysis shows a clear inverse relationship between centrality and structure survival. Structures in the lowest centrality quartile had approximately a 62% survival rate, compared to only 38% in the highest centrality quartile. This is a 24 percentage point difference in survival rate associated with network position.
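
One way to compute survival rates by centrality quartile is sketched below on synthetic data. The definition of survival used here (any structure not labeled "Destroyed") is an assumption, since the text does not state the exact rule, and the column values are randomly generated rather than taken from the DINS data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # synthetic data only
toy = pd.DataFrame({
    "degree_centrality": rng.random(200),
    "damage": rng.choice(["No Damage", "Destroyed"], size=200),
})

# Assumption: a structure "survived" if it was not recorded as destroyed
toy["survived"] = toy["damage"] != "Destroyed"

# Ranking with method="first" breaks ties, so qcut always finds four
# distinct bin edges even when many nodes share the same centrality
toy["quartile"] = pd.qcut(
    toy["degree_centrality"].rank(method="first"),
    q=4,
    labels=["Q1 (lowest)", "Q2", "Q3", "Q4 (highest)"],
)

# The mean of a boolean column is the share of True values
survival_by_quartile = toy.groupby("quartile", observed=True)["survived"].mean()
print(survival_by_quartile)
```

The tie-breaking step matters for sparse networks such as the 100m one, where most nodes have a degree of zero and a plain `pd.qcut` on the raw values would raise an error about duplicate bin edges.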

Roof Construction Analysis

Asphalt roofing shows the highest average centrality, followed by tile and metal. Fire-resistant roofing materials appear less frequently in high-centrality positions. This suggests that structures with more fire-resistant roofing tend to be built in less densely connected areas, potentially as a deliberate risk mitigation strategy.

Community Analysis

Using the 250m network, we detected communities (clusters) of structures with the greedy modularity maximization algorithm.

We identified 15 distinct communities within our largest network component. The top five communities contained 105, 79, 51, 47, and 38 structures respectively. This clustering confirms that wildfire damage follows network-based patterns, affecting groups of proximate structures rather than random individual buildings.
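
The community detection step can be sketched with NetworkX's `greedy_modularity_communities`, applied to the largest connected component. The toy graph below is an assumption for illustration and stands in for the 250m wildfire network.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy graph: two 5-node cliques joined by one bridge edge, plus a separate pair
G = nx.Graph()
G.add_edges_from(nx.complete_graph(range(0, 5)).edges)
G.add_edges_from(nx.complete_graph(range(5, 10)).edges)
G.add_edge(4, 5)
G.add_edge(10, 11)

# Restrict the analysis to the largest connected component
largest_nodes = max(nx.connected_components(G), key=len)
largest = G.subgraph(largest_nodes).copy()

# Greedy modularity maximization returns a list of node sets
communities = greedy_modularity_communities(largest)
sizes = sorted((len(c) for c in communities), reverse=True)
print(len(communities), sizes)
```

On the real data, the same pattern would start from `networks["250m"].graph`, and the sorted sizes would correspond to the community sizes reported above.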

Conclusion

Our analysis of centrality measures in California wildfire structure networks reveals several key insights:

Network Position Impacts Survival: Structures with higher centrality (more connected) show significantly lower survival rates. The difference in survival probability between the lowest and highest centrality quartiles is 24 percentage points, demonstrating that network position is a critical risk factor.
Residential vs. Commercial Differences: Among destroyed structures, residential buildings have consistently higher centrality values across all measures compared to commercial structures. This suggests residential buildings are more likely to be in densely connected areas where fire can spread more readily between structures.
Geographic Clustering: Our community detection analysis identified distinct clusters of structures with similar damage patterns, confirming that wildfire damage follows network-based patterns rather than affecting structures randomly.
Building Materials and Placement: Structures with fire-resistant roofing materials tend to have lower centrality values, suggesting more strategic placement or construction in less connected areas.

These findings have important implications for wildfire risk assessment and mitigation strategies. Network-based approaches can help identify high-risk clusters and potential intervention points to reduce wildfire damage to structures. By understanding how centrality relates to damage patterns, emergency planners can better target prevention resources and building code requirements to the most vulnerable areas.

Future Research Directions

Incorporate temporal data to analyze how network structures change over different fire seasons
Include vegetation and topographic data as node attributes to enhance the network model
Develop predictive models using centrality measures to forecast structure vulnerability
Create interactive visualization tools for emergency planners to identify high-centrality, high-risk areas

References

California Department of Forestry and Fire Protection (CAL FIRE). (2025). Damage Inspection Program (DINS) database.
Newman, M. E. J. (2010). Networks: An Introduction. Oxford University Press.
Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of ‘small-world’ networks. Nature, 393(6684), 440-442.