This analysis explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to determine which types of severe weather events are most harmful to population health and which have the greatest economic consequences. The database covers 1950 to 2011 and records characteristics of major storms and weather events, including fatalities, injuries, property damage, and crop damage. The analysis shows that tornadoes cause the most fatalities and injuries, floods cause the most property damage, and droughts cause the most crop damage.
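# Load the packages used below: dplyr (%>%, group_by, summarise) and ggplot2 (figures)
library(dplyr)
library(ggplot2)
# Load the storm data file (read.csv reads the bzip2-compressed file directly)
data <- read.csv("repdata_data_StormData.csv.bz2")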
# Check the structure of the data
str(data)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
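The str() output lists 37 columns, but the analysis below only uses EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP. As a sketch (not the loading step used in this report), unneeded columns can be skipped at read time by passing "NULL" in colClasses. The tiny compressed file written here is a hypothetical stand-in for the real archive, and its columns are illustrative.

```r
# Write a tiny bzip2-compressed CSV as a stand-in for repdata_data_StormData.csv.bz2
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, "w")
write.csv(data.frame(STATE = c("AL", "AL"),
                     EVTYPE = c("TORNADO", "HAIL"),
                     FATALITIES = c(0, 1),
                     REMARKS = c("", "")),
          con, row.names = FALSE)
close(con)

# Read the header plus one row to learn the column order
header <- names(read.csv(path, nrows = 1))

# NA lets read.csv guess the column type; "NULL" skips the column entirely
keep <- c("EVTYPE", "FATALITIES")
classes <- ifelse(header %in% keep, NA_character_, "NULL")

subset_data <- read.csv(path, colClasses = classes)
str(subset_data)
```

On the real file, keep would list the seven columns named above; skipped columns are never stored, which can noticeably reduce memory use for a 902,297-row table.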
# Check for missing values in key variables
summary(data[c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")])
## EVTYPE FATALITIES INJURIES PROPDMG
## Length:902297 Min. : 0.0000 Min. : 0.0000 Min. : 0.00
## Class :character 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.: 0.00
## Mode :character Median : 0.0000 Median : 0.0000 Median : 0.00
## Mean : 0.0168 Mean : 0.1557 Mean : 12.06
## 3rd Qu.: 0.0000 3rd Qu.: 0.0000 3rd Qu.: 0.50
## Max. :583.0000 Max. :1700.0000 Max. :5000.00
## CROPDMG
## Min. : 0.000
## 1st Qu.: 0.000
## Median : 0.000
## Mean : 1.527
## 3rd Qu.: 0.000
## Max. :990.000
# Examine the first few rows
head(data[c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "CROPDMG", "PROPDMGEXP", "CROPDMGEXP")])
## EVTYPE FATALITIES INJURIES PROPDMG CROPDMG PROPDMGEXP CROPDMGEXP
## 1 TORNADO 0 15 25.0 0 K
## 2 TORNADO 0 0 2.5 0 K
## 3 TORNADO 0 2 25.0 0 K
## 4 TORNADO 0 2 2.5 0 K
## 5 TORNADO 0 2 2.5 0 K
## 6 TORNADO 0 6 2.5 0 K
# Function to convert a damage value to US dollars using its exponent code
convert_damage <- function(dmg, exp) {
  exp <- toupper(exp)
  # Named lookup: H/K/M/B are hundreds/thousands/millions/billions,
  # and digits 2-8 are powers of ten (10^2 ... 10^8)
  multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9,
                   setNames(10^(2:8), as.character(2:8)))
  multiplier <- unname(multipliers[exp])
  # Any other code ("", "0", "1", "+", "-", "?") matches no name and leaves the value unscaled
  multiplier[is.na(multiplier)] <- 1
  dmg * multiplier
}
# Calculate actual property and crop damage values (US dollars)
data$PROPDMG_ACTUAL <- convert_damage(data$PROPDMG, data$PROPDMGEXP)
data$CROPDMG_ACTUAL <- convert_damage(data$CROPDMG, data$CROPDMGEXP)
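To make the exponent rules concrete, here is a worked check on a few hand-made records (the values are illustrative, not taken from the NOAA file). This sketch keeps the codes in a small audit table, an assumption for illustration rather than part of the original analysis, and looks them up with match(); codes absent from the table keep a multiplier of 1, matching the rules of convert_damage.

```r
# Hypothetical audit table: each recognised exponent code and its multiplier
exp_codes <- data.frame(
  code = c("H", "K", "M", "B", as.character(2:8)),
  multiplier = c(1e2, 1e3, 1e6, 1e9, 10^(2:8))
)

# Toy records (illustrative values, not from the storm data)
toy <- data.frame(dmg = c(25, 2.5, 1.5, 5, 3),
                  exp = c("K", "m", "B", "5", "+"))

# match() returns NA for codes missing from the table; those stay unscaled
idx <- match(toupper(toy$exp), exp_codes$code)
toy$multiplier <- ifelse(is.na(idx), 1, exp_codes$multiplier[idx])

# e.g. 25 with "K" becomes 25 * 1e3 = 25000 dollars
toy$dollars <- toy$dmg * toy$multiplier
print(toy)
```

Keeping the codes in a data frame makes the mapping easy to print and audit next to a frequency table of the raw exponent codes.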
# Check for missing values in processed data
cat("Missing values in key variables:\n")
## Missing values in key variables:
cat("FATALITIES:", sum(is.na(data$FATALITIES)), "\n")
## FATALITIES: 0
cat("INJURIES:", sum(is.na(data$INJURIES)), "\n")
## INJURIES: 0
cat("PROPDMG_ACTUAL:", sum(is.na(data$PROPDMG_ACTUAL)), "\n")
## PROPDMG_ACTUAL: 0
cat("CROPDMG_ACTUAL:", sum(is.na(data$CROPDMG_ACTUAL)), "\n")
## CROPDMG_ACTUAL: 0
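The four cat() calls above can be condensed into a single colSums(is.na(...)) call, which counts missing values in every column at once. A sketch on a toy data frame (the values, including the NAs, are illustrative):

```r
# Toy frame standing in for the processed storm data
toy <- data.frame(FATALITIES = c(0, 1, NA),
                  INJURIES = c(2, 0, 0),
                  PROPDMG_ACTUAL = c(25000, NA, NA),
                  CROPDMG_ACTUAL = c(0, 0, 0))

# is.na() gives a logical matrix; colSums() counts the TRUEs per column
na_counts <- colSums(is.na(toy))
print(na_counts)
```

Applied to the real data, the equivalent call would be colSums(is.na(data[c("FATALITIES", "INJURIES", "PROPDMG_ACTUAL", "CROPDMG_ACTUAL")])).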
# Aggregate fatalities and injuries by event type
health_impact <- data %>%
  group_by(EVTYPE) %>%
  summarise(
    total_fatalities = sum(FATALITIES, na.rm = TRUE),
    total_injuries = sum(INJURIES, na.rm = TRUE),
    # summarise() can reuse columns created earlier in the same call
    total_health_impact = total_fatalities + total_injuries,
    .groups = "drop"
  ) %>%
  arrange(desc(total_health_impact))
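The same grouping and summing can be cross-checked without dplyr using base R's aggregate(). The sketch below runs it on a few toy rows (illustrative values) and orders event types by combined harm, as arrange(desc(...)) does above. One difference to keep in mind: the formula interface of aggregate() drops any row with an NA in a summed column (its default na.action is na.omit), whereas sum(..., na.rm = TRUE) drops NAs column by column. The two agree for this data because FATALITIES and INJURIES have no missing values.

```r
# Toy rows standing in for the storm data (values are illustrative)
toy <- data.frame(EVTYPE = c("TORNADO", "HEAT", "TORNADO", "FLOOD"),
                  FATALITIES = c(1, 3, 0, 2),
                  INJURIES = c(10, 1, 5, 0))

# Sum both columns within each event type
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
totals$HEALTH <- totals$FATALITIES + totals$INJURIES

# Largest combined harm first
totals <- totals[order(-totals$HEALTH), ]
print(totals)
```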
# Top 10 most harmful events for population health
top_health <- head(health_impact, 10)
print(top_health)
## # A tibble: 10 x 4
## EVTYPE total_fatalities total_injuries total_health_impact
## <chr> <dbl> <dbl> <dbl>
## 1 TORNADO 5633 91346 96979
## 2 EXCESSIVE HEAT 1903 6525 8428
## 3 TSTM WIND 504 6957 7461
## 4 FLOOD 470 6789 7259
## 5 LIGHTNING 816 5230 6046
## 6 HEAT 937 2100 3037
## 7 FLASH FLOOD 978 1777 2755
## 8 ICE STORM 89 1975 2064
## 9 THUNDERSTORM WIND 133 1488 1621
## 10 WINTER STORM 206 1321 1527
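The table lists TSTM WIND and THUNDERSTORM WIND as separate event types, even though TSTM abbreviates thunderstorm, because EVTYPE labels are grouped exactly as recorded. A quick sensitivity check, using only the ten totals printed above and a hypothetical one-entry label mapping, shows how merging those two labels affects the ranking. Variants outside the top ten, and judgment calls such as whether HEAT and EXCESSIVE HEAT belong together, are not covered by this sketch.

```r
# Combined fatalities + injuries, copied from the top 10 table above
health <- c("TORNADO" = 96979, "EXCESSIVE HEAT" = 8428, "TSTM WIND" = 7461,
            "FLOOD" = 7259, "LIGHTNING" = 6046, "HEAT" = 3037,
            "FLASH FLOOD" = 2755, "ICE STORM" = 2064,
            "THUNDERSTORM WIND" = 1621, "WINTER STORM" = 1527)

# Hypothetical mapping from an abbreviated label to its canonical name
canonical <- c("TSTM WIND" = "THUNDERSTORM WIND")

# Replace mapped labels and leave every other label unchanged
groups <- names(health)
hit <- groups %in% names(canonical)
groups[hit] <- canonical[groups[hit]]

# Re-total by canonical label and rank again
merged <- sort(vapply(split(health, groups), sum, numeric(1)), decreasing = TRUE)
print(merged)
```

With these ten totals, the merged THUNDERSTORM WIND label (9,082) moves ahead of EXCESSIVE HEAT (8,428) into second place, while TORNADO's lead is unchanged.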
# Separate analysis for fatalities and injuries
top_fatalities <- health_impact %>%
arrange(desc(total_fatalities)) %>%
head(10)
top_injuries <- health_impact %>%
arrange(desc(total_injuries)) %>%
head(10)
cat("\nTop 10 events by fatalities:\n")
##
## Top 10 events by fatalities:
print(top_fatalities[c("EVTYPE", "total_fatalities")])
## # A tibble: 10 x 2
## EVTYPE total_fatalities
## <chr> <dbl>
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
## 7 FLOOD 470
## 8 RIP CURRENT 368
## 9 HIGH WIND 248
## 10 AVALANCHE 224
cat("\nTop 10 events by injuries:\n")
##
## Top 10 events by injuries:
print(top_injuries[c("EVTYPE", "total_injuries")])
## # A tibble: 10 x 2
## EVTYPE total_injuries
## <chr> <dbl>
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
## 7 ICE STORM 1975
## 8 FLASH FLOOD 1777
## 9 THUNDERSTORM WIND 1488
## 10 HAIL 1361
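The two rankings overlap heavily but not completely. Set operations on the event names copied from the two tables make the difference explicit; this is a small illustration built from the printed results, not an additional step of the original analysis.

```r
# Event types copied from the two top 10 tables above, in rank order
top_fat <- c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT", "LIGHTNING",
             "TSTM WIND", "FLOOD", "RIP CURRENT", "HIGH WIND", "AVALANCHE")
top_inj <- c("TORNADO", "TSTM WIND", "FLOOD", "EXCESSIVE HEAT", "LIGHTNING",
             "HEAT", "ICE STORM", "FLASH FLOOD", "THUNDERSTORM WIND", "HAIL")

both <- intersect(top_fat, top_inj)      # in both lists, fatality order kept
fat_only <- setdiff(top_fat, top_inj)    # top 10 for fatalities only
inj_only <- setdiff(top_inj, top_fat)    # top 10 for injuries only

print(both)
print(fat_only)
print(inj_only)
```

Seven event types appear in both lists; RIP CURRENT, HIGH WIND and AVALANCHE rank in the top ten only for fatalities, and ICE STORM, THUNDERSTORM WIND and HAIL only for injuries.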
# Aggregate property and crop damage by event type
economic_impact <- data %>%
  group_by(EVTYPE) %>%
  summarise(
    total_property_damage = sum(PROPDMG_ACTUAL, na.rm = TRUE),
    total_crop_damage = sum(CROPDMG_ACTUAL, na.rm = TRUE),
    # summarise() can reuse columns created earlier in the same call
    total_economic_damage = total_property_damage + total_crop_damage,
    .groups = "drop"
  ) %>%
  arrange(desc(total_economic_damage))
# Top 10 most economically damaging events
top_economic <- head(economic_impact, 10)
print(top_economic)
## # A tibble: 10 x 4
## EVTYPE total_property_damage total_crop_damage total_economic_dama…
## <chr> <dbl> <dbl> <dbl>
## 1 FLOOD 144657709807 5661968450 150319678257
## 2 HURRICANE/TYPHO… 69305840000 2607872800 71913712800
## 3 TORNADO 56947380676. 414953270 57362333946.
## 4 STORM SURGE 43323536000 5000 43323541000
## 5 HAIL 15735267513. 3025954473 18761221986.
## 6 FLASH FLOOD 16822673978. 1421317100 18243991078.
## 7 DROUGHT 1046106000 13972566000 15018672000
## 8 HURRICANE 11868319010 2741910000 14610229010
## 9 RIVER FLOOD 5118945500 5029459000 10148404500
## 10 ICE STORM 3944927860 5022113500 8967041360
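Combined totals hide how differently the two damage types are distributed. Dividing crop damage by total economic damage for the ten rows above (values copied from the printed table, with the fractional part of TORNADO's total dropped) shows that drought losses are almost entirely agricultural, while flood losses are overwhelmingly property damage.

```r
# Crop and total damage (US dollars) for the top 10 rows of the table above
econ <- data.frame(
  EVTYPE = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "STORM SURGE", "HAIL",
             "FLASH FLOOD", "DROUGHT", "HURRICANE", "RIVER FLOOD", "ICE STORM"),
  crop = c(5661968450, 2607872800, 414953270, 5000, 3025954473,
           1421317100, 13972566000, 2741910000, 5029459000, 5022113500),
  total = c(150319678257, 71913712800, 57362333946, 43323541000, 18761221986,
            18243991078, 15018672000, 14610229010, 10148404500, 8967041360)
)

# Fraction of each event type's total damage that falls on crops
econ$crop_share <- round(econ$crop / econ$total, 3)

# Most crop-dominated event types first
print(econ[order(-econ$crop_share), c("EVTYPE", "crop_share")], row.names = FALSE)
```

For these rows, roughly 93% of drought damage is crop damage, compared with about 4% for floods.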
# Separate analysis for property and crop damage
top_property <- economic_impact %>%
arrange(desc(total_property_damage)) %>%
head(10)
top_crop <- economic_impact %>%
arrange(desc(total_crop_damage)) %>%
head(10)
cat("\nTop 10 events by property damage (in billions):\n")
##
## Top 10 events by property damage (in billions):
top_property$total_property_damage <- top_property$total_property_damage / 1e9
print(top_property[c("EVTYPE", "total_property_damage")])
## # A tibble: 10 x 2
## EVTYPE total_property_damage
## <chr> <dbl>
## 1 FLOOD 145.
## 2 HURRICANE/TYPHOON 69.3
## 3 TORNADO 56.9
## 4 STORM SURGE 43.3
## 5 FLASH FLOOD 16.8
## 6 HAIL 15.7
## 7 HURRICANE 11.9
## 8 TROPICAL STORM 7.70
## 9 WINTER STORM 6.69
## 10 HIGH WIND 5.27
cat("\nTop 10 events by crop damage (in billions):\n")
##
## Top 10 events by crop damage (in billions):
top_crop$total_crop_damage <- top_crop$total_crop_damage / 1e9
print(top_crop[c("EVTYPE", "total_crop_damage")])
## # A tibble: 10 x 2
## EVTYPE total_crop_damage
## <chr> <dbl>
## 1 DROUGHT 14.0
## 2 FLOOD 5.66
## 3 RIVER FLOOD 5.03
## 4 ICE STORM 5.02
## 5 HAIL 3.03
## 6 HURRICANE 2.74
## 7 HURRICANE/TYPHOON 2.61
## 8 FLASH FLOOD 1.42
## 9 EXTREME COLD 1.29
## 10 FROST/FREEZE 1.09
# Plot for health impact
health_plot_data <- head(health_impact, 10) %>%
  mutate(EVTYPE = reorder(EVTYPE, total_health_impact))
p1 <- ggplot(health_plot_data, aes(x = EVTYPE, y = total_health_impact)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(title = "Top 10 Weather Events Most Harmful to Population Health",
       x = "Event Type",
       y = "Total Health Impact (Fatalities + Injuries)") +
  theme_minimal() +
  theme(plot.title = element_text(size = 12, hjust = 0.5))
print(p1)
Figure 1: Top 10 Weather Events Most Harmful to Population Health
# Plot for economic impact
economic_plot_data <- head(economic_impact, 10) %>%
  mutate(
    EVTYPE = reorder(EVTYPE, total_economic_damage),
    total_economic_damage_billions = total_economic_damage / 1e9
  )
p2 <- ggplot(economic_plot_data, aes(x = EVTYPE, y = total_economic_damage_billions)) +
  geom_col(fill = "darkgreen") +
  coord_flip() +
  labs(title = "Top 10 Weather Events with Greatest Economic Impact",
       x = "Event Type",
       y = "Total Economic Damage (Billions USD)") +
  theme_minimal() +
  theme(plot.title = element_text(size = 12, hjust = 0.5))
print(p2)
Figure 2: Top 10 Weather Events with Greatest Economic Impact
Based on our analysis of the NOAA storm database from 1950-2011:
Population Health Impact:
- Tornadoes are by far the most harmful weather events to population health, causing both the most fatalities (5,633) and the most injuries (91,346) of any event type.
- Excessive Heat ranks second in fatalities, while TSTM Wind (thunderstorm wind) ranks second in injuries.
- Ranked by combined fatalities and injuries, the top 10 event types are: Tornado, Excessive Heat, TSTM Wind, Flood, Lightning, Heat, Flash Flood, Ice Storm, Thunderstorm Wind, and Winter Storm.
Economic Consequences:
- Floods cause the most property damage overall (about $145 billion), followed by hurricanes/typhoons and tornadoes.
- Droughts cause the most crop damage (about $14 billion), followed by floods and river floods.
- When property and crop damage are combined, floods have the greatest total economic impact (about $150 billion).
Recommendations: Public health and emergency management resources should prioritize:
1. Tornado preparedness and warning systems, given their extreme health impact
2. Heat wave prevention programs, especially for vulnerable populations
3. Flood mitigation and insurance programs, given the massive economic consequences
4. Drought preparedness in agricultural regions to minimize crop losses
These findings emphasize the need for comprehensive weather event preparedness that addresses both the human health and economic dimensions of severe weather impacts.