Introduction

Extreme weather events impose substantial economic and societal costs on the United States. Over the past several decades, natural disasters such as floods, hurricanes, droughts, severe storms, and wildfires have generated billions of dollars in damages and severely disrupted communities across the country. Understanding how the frequency and economic impact of these events have changed over time is critical to our ability to assess long-term patterns and consequences of climate-related disasters.

This study analyzes billion-dollar weather and climate disasters in the United States between 1980 and 2021 using data from the National Oceanic and Atmospheric Administration (NOAA), focusing on the annual frequency and annual inflation-adjusted economic cost of these disasters. Adjusting for inflation allows costs from different years to be compared on a consistent basis. To investigate trends over time, the study uses descriptive statistics, data visualization, and linear regression. The analysis is limited to events in the NOAA database and is conducted at the annual level: individual disasters are aggregated into yearly counts and total costs in order to identify broader national patterns.

Research Question: Have billion-dollar weather disasters become more frequent and more costly over time in the United States?

Several limitations should be taken into consideration. Because only observational data are used, changes in disaster cost and frequency may reflect factors other than weather patterns, such as population growth, economic development, changes in land use, or improvements in damage assessment methods. Annual costs may also be dominated by a small number of exceptionally severe disasters, which can create large variations between years. In addition, because the study only examines disasters that exceeded the billion-dollar threshold, smaller but still potentially significant disasters are excluded. Finally, this analysis focuses on national-level trends without accounting for regional differences or specific characteristics of individual disasters.

Methods

The data used in this study were obtained from the Billion-Dollar Weather and Climate Disasters database maintained by NOAA’s National Centers for Environmental Information (NCEI). NOAA records climate and weather events occurring within the United States, and the data used include only events with inflation-adjusted economic costs of at least one billion dollars. The data selected for this study comprise 323 events between 1980 and 2021, with information on disaster type, event dates, inflation-adjusted economic costs, and fatalities. Disasters were assigned to the year in which they began; any disaster that began in one year and ended in the following year was counted in the year it began. This study relies solely on observational data, so no primary data collection or experimental procedures were conducted.

The dataset was imported into R for analysis. As part of the preprocessing stage, the year was extracted from each disaster's start date to create a year variable. Disaster events were then grouped by year to calculate two variables of interest: the annual frequency of billion-dollar disasters and the annual total inflation-adjusted economic cost of those disasters. Missing cost values were excluded when the annual totals were summed.

Descriptive statistics including the mean, median, standard deviation, minimum value, and maximum value were calculated to summarize the annual disaster frequency and total costs. Scatterplots with linear trend lines were created to visualize the annual number of billion-dollar disasters and the annual total inflation-adjusted disaster costs between 1980 and 2021. Simple linear regression analyses were performed to analyze trends over time. The first model examined the relationship between year and annual disaster frequency, while the second examined the relationship between year and annual total inflation-adjusted disaster costs.

Results

Dataset Preparation

The NOAA dataset is imported into R:

disasters <- read.csv("disasters2.csv")

Load libraries

library(lubridate)
library(dplyr)
library(ggplot2)

Create Year Variable

The disaster start date is converted into a year variable so that events can be grouped and analyzed by calendar year.

disasters$Year <- as.numeric(substr(as.character(disasters$Begin.Date), 1, 4))
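The line above takes the first four characters of the start date, which only works if the date is stored year-first. A minimal sketch of an alternative is shown below: it parses each value as a date before extracting the year, so malformed entries surface as NA instead of silently producing a wrong year. The toy vector and the YYYYMMDD layout are assumptions for illustration and are not taken from the dataset.

```r
# Hypothetical start dates in an assumed YYYYMMDD layout.
# The last entry is malformed on purpose.
begin_date <- c(19800410, 19921215, 20210108, 2021)

# Parse as real dates; values that do not match the format become NA.
parsed <- as.Date(as.character(begin_date), format = "%Y%m%d")

# Extract the calendar year as a number.
year <- as.numeric(format(parsed, "%Y"))

print(year)
```

Checking sum(is.na(year)) after a step like this shows how many records would be dropped from the yearly grouping.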

Annual Disaster Counts

Individual disaster records are grouped to calculate the total number of billion-dollar disasters occurring each year.

annual_counts <- disasters %>% group_by(Year) %>% summarise(Disaster_Count = n())

Annual Total Costs

The inflation-adjusted costs of all disasters occurring in the same year are added together to calculate the annual economic impact.

annual_costs <- disasters %>% group_by(Year) %>% summarise(Total_Cost = sum(Total.CPI.Adjusted.Cost..Millions.of.Dollars., na.rm = TRUE))

Merge Datasets

The annual disaster frequency and annual total cost datasets are combined into a single dataset for ease of analysis.

annual_data <- left_join(annual_counts, annual_costs, by = "Year")
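Grouping by year returns only those years that contain at least one event, so a year with no billion-dollar disasters would be missing from the annual dataset rather than recorded as zero. The sketch below shows one way such gaps could be filled before modeling, using base R and a small hypothetical table. The object names annual_counts_toy, all_years, and annual_full are illustrative and do not appear in the original analysis.

```r
# Toy annual counts with a gap: 1982 has no events and is therefore absent.
annual_counts_toy <- data.frame(Year = c(1980L, 1981L, 1983L),
                                Disaster_Count = c(3, 1, 2))

# Full sequence of years in the (shortened) study window.
all_years <- data.frame(Year = 1980:1983)

# Left-merge so every year appears; years without events get NA.
annual_full <- merge(all_years, annual_counts_toy, by = "Year", all.x = TRUE)

# Record missing years as zero events.
annual_full$Disaster_Count[is.na(annual_full$Disaster_Count)] <- 0

print(annual_full)
```

The same approach would apply to annual totals, where a year with no events would have a total cost of zero.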

Descriptive Statistics

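Summary statistics are calculated to describe the distribution of annual disaster frequency and annual costs. Total_Cost is in millions of inflation-adjusted dollars. The annual dataset contains 41 observations rather than 42 (the regressions below report 39 residual degrees of freedom), and the minimum count is 1, which suggests that one year in the 1980 to 2021 range had no billion-dollar events and is absent from the grouped data rather than recorded as zero.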

summary(annual_data)
##       Year      Disaster_Count     Total_Cost    
##  Min.   :1980   Min.   : 1.000   Min.   :  2893  
##  1st Qu.:1991   1st Qu.: 4.000   1st Qu.: 17826  
##  Median :2001   Median : 6.000   Median : 27798  
##  Mean   :2001   Mean   : 7.878   Mean   : 53656  
##  3rd Qu.:2011   3rd Qu.:10.000   3rd Qu.: 59109  
##  Max.   :2021   Max.   :22.000   Max.   :354705
sd(annual_data$Disaster_Count)
## [1] 5.197091
sd(annual_data$Total_Cost)
## [1] 67632.64

Visualization 1: Frequency Trend

ggplot(annual_data, aes(x = Year, y = Disaster_Count)) + geom_point() + geom_smooth(method = "lm") + labs(title = "Annual Number of Billion-Dollar Disasters", x = "Year", y = "Disaster Count")
## `geom_smooth()` using formula = 'y ~ x'

This scatterplot with a fitted regression line shows the relationship between year and the annual number of disasters. The upward slope of the fitted line indicates that the number of disasters has generally increased over time.

Visualization 2: Cost Trend

ggplot(annual_data, aes(x = Year, y = Total_Cost)) + geom_point() + geom_smooth(method = "lm") + labs(title = "Annual Disaster Costs", x = "Year", y = "Total Cost (Millions of Dollars)")
## `geom_smooth()` using formula = 'y ~ x'

This scatterplot with a fitted regression line shows the relationship between year and annual total disaster costs, which are measured in millions of inflation-adjusted dollars. The upward slope of the fitted line indicates that costs have generally increased over time.

Regression 1: Frequency Trend

freq_model <- lm(Disaster_Count ~ Year, data = annual_data)
summary(freq_model)
## 
## Call:
## lm(formula = Disaster_Count ~ Year, data = annual_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.0170 -2.2440 -0.1765  1.8634  7.4769 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -685.66273   78.74310  -8.708 1.11e-10 ***
## Year           0.34663    0.03935   8.808 8.20e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.044 on 39 degrees of freedom
## Multiple R-squared:  0.6655, Adjusted R-squared:  0.6569 
## F-statistic: 77.58 on 1 and 39 DF,  p-value: 8.197e-11

The frequency regression indicates a statistically significant increasing trend in billion-dollar disasters over time. The positive Year coefficient (0.3466) means that the annual number of disasters increased by approximately 0.35 events per year on average, or roughly 3.5 additional events per decade. The relationship is statistically significant (p = 8.2e-11, well below 0.05), and the R-squared value of 0.6655 indicates that year explains about 66.6% of the variation in annual disaster frequency.
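To make the slope more concrete, the fitted line can be evaluated at the first and last years of the study period. The sketch below does this by hand with the rounded coefficients printed in the regression output above, so the results are approximate. The variable names are illustrative.

```r
# Coefficients copied from the summary(freq_model) output (rounded as printed).
intercept <- -685.66273
slope <- 0.34663

# Fitted number of disasters for the first and last year of the study period.
fit_1980 <- intercept + slope * 1980
fit_2021 <- intercept + slope * 2021

# Change in the fitted value across the 41-year span.
change <- fit_2021 - fit_1980

print(round(c(fit_1980 = fit_1980, fit_2021 = fit_2021, change = change), 2))
```

With these coefficients the fitted count rises from under one event per year in 1980 to nearly fifteen in 2021. With the model object available, predict(freq_model, newdata = data.frame(Year = c(1980, 2021))) would give the same values without rounding error.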

Regression 2: Cost Trend

cost_model <- lm(Total_Cost ~ Year, data = annual_data)
summary(cost_model)
## 
## Call:
## lm(formula = Total_Cost ~ Year, data = annual_data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -65362 -34516  -9767  12799 259786 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -5051866.4  1571759.7  -3.214  0.00263 **
## Year            2551.7      785.5   3.248  0.00239 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 60770 on 39 degrees of freedom
## Multiple R-squared:  0.2129, Adjusted R-squared:  0.1928 
## F-statistic: 10.55 on 1 and 39 DF,  p-value: 0.002391

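The cost regression indicates that annual disaster costs have increased significantly over time. Because costs are recorded in millions of dollars, the Year coefficient (2551.7) means that annual total costs rose by approximately $2.55 billion per year on average. The relationship is statistically significant (p = 0.0024), but the R-squared value of 0.2129 means that year explains only about 21.3% of the variation in annual costs, suggesting that factors other than year strongly influence disaster costs.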

Diagnostic Plots

par(mfrow=c(2,2))
plot(freq_model)

par(mfrow=c(2,2))
plot(cost_model)

The diagnostic plots for the frequency model show that the linear regression fits the disaster frequency trend fairly well. In the residuals vs. fitted plot, the residuals are scattered around zero, indicating a roughly linear relationship between year and the number of disasters. In the Q-Q plot, the residuals generally lie near the reference line, indicating approximate normality. The scale-location plot shows a fairly even spread of residuals, which is consistent with constant variance. The residuals vs. leverage plot shows no observations with a large influence on the model. For the cost model, the regression output itself points to a poorer fit: the residuals range from -65,362 to 259,786 with a median of -9,767, a strongly right-skewed pattern consistent with a few extreme years, so the normality and constant-variance assumptions should be treated with more caution for costs.
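Influence can also be checked numerically rather than read off the residuals vs. leverage plot. The sketch below uses Cook's distance on a small hypothetical cost series with one extreme year planted in it. The data, the choice of 2017 as the extreme year, and the 4/n cutoff (a common rule of thumb, not a formal test) are all assumptions for illustration.

```r
# Hypothetical annual costs: a steady linear rise plus one extreme year.
toy <- data.frame(Year = 1980:2020)
toy$Total_Cost <- 1000 + 50 * (toy$Year - 1980)
toy$Total_Cost[toy$Year == 2017] <- toy$Total_Cost[toy$Year == 2017] + 300000

toy_model <- lm(Total_Cost ~ Year, data = toy)

# Cook's distance measures how much each observation shifts the fitted model.
cooks <- cooks.distance(toy_model)

# Year with the largest influence, and years above the 4/n rule-of-thumb cutoff.
most_influential <- toy$Year[which.max(cooks)]
flagged <- toy$Year[cooks > 4 / nrow(toy)]

print(most_influential)
print(flagged)
```

Applying cooks.distance() to cost_model in the same way would show whether any single year dominates the estimated cost trend.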

Cost by Disaster Type

ggplot(disasters, aes(x = Disaster, y = Total.CPI.Adjusted.Cost..Millions.of.Dollars.)) + geom_boxplot() + theme(axis.text.x = element_text(angle = 45, hjust = 1))

This boxplot shows the distribution of inflation-adjusted cost per event for each disaster type. It indicates that tropical cyclones typically cost the most, while freezes tend to cost the least.
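The ranking read from the boxplot can be backed with a table of median cost per disaster type, since the median is the center line of each box. The sketch below uses base R on a few hypothetical events. The costs are invented for illustration and are not values from the NOAA dataset.

```r
# Hypothetical per-event costs in millions of dollars.
toy_events <- data.frame(
  Disaster = c("Tropical Cyclone", "Tropical Cyclone", "Tropical Cyclone",
               "Freeze", "Freeze",
               "Severe Storm", "Severe Storm", "Severe Storm"),
  Cost = c(5000, 20000, 90000, 1200, 1500, 1100, 2500, 4000)
)

# Median cost per disaster type.
median_cost <- aggregate(Cost ~ Disaster, data = toy_events, FUN = median)

# Sort from most to least costly.
median_cost <- median_cost[order(-median_cost$Cost), ]

print(median_cost)
```

On the real data, the same call would use the Disaster and Total.CPI.Adjusted.Cost..Millions.of.Dollars. columns of the disasters data frame.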

Discussion/Conclusion

This study examined whether billion-dollar weather disasters in the United States increased in frequency and cost between 1980 and 2021. The findings show that both the annual number of disasters and their total economic losses increased over the study period. The frequency regression showed a strong positive relationship between year and the number of billion-dollar disasters, with year accounting for about 66.6% of the variation in annual disaster counts. This suggests that billion-dollar disasters have become more frequent in recent decades.

The analysis of economic costs also showed a significant upward trend over time. However, compared with disaster frequency, the lower R-squared value (21.3%) suggests that year alone accounts for a smaller share of the variation in annual disaster costs. This implies that factors other than time have an impact on costs. Rising economic losses may be driven by a number of factors, such as population growth, economic development, changes in land use, or improvements in damage assessment methods.

Additionally, the boxplot showed that costs differ substantially across disaster types. Tropical cyclones typically had the highest costs, while other categories, such as freezes, resulted in smaller economic losses. This variation indicates that changes in overall disaster costs are related not only to the number of events but also to the severity and type of disasters occurring in a given year. Large variations between years may result from a small number of exceptional events contributing disproportionately to annual total costs.

The findings support the conclusion that billion-dollar weather disasters have increased in frequency and economic impact over time. However, these results should not be interpreted as evidence that increasing disaster costs are caused only by changes in weather patterns. Because this study uses observational data, it cannot separate the effects of climate change from the social and economic factors that influence disaster losses. Additionally, the dataset only includes events exceeding one billion dollars, meaning smaller disasters and regional impacts are not represented.