##Synopsis: This analysis examines the NOAA Storm Database to identify U.S. severe weather events that most impact health and the economy. Data on fatalities, injuries, property damage, and crop damage are cleaned and standardized for analysis. Events are ranked by total health impact and total economic damage. Tornadoes and heat waves cause the most human harm, while hurricanes and floods lead in economic losses. The results provide insights for policymakers and emergency managers to prioritize resources and preparedness. ##Data Processing The analysis begins by loading the raw NOAA Storm Database CSV file directly into R. No preprocessing is done outside this document, ensuring that all transformations are reproducible. The dataset includes information on storm event types, fatalities, injuries, and economic damages (property and crop). Some of the damage variables are expressed using exponents (e.g., K for thousands, M for millions), so these are converted to numeric dollar amounts. This section also demonstrates basic inspection of the data and preparation of key variables for analysis. To save time on repeated processing, the cache = TRUE option can be used in code chunks.
#Load necessary libraries
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
#Load the raw CSV data (compressed .bz2 file)
repdata <- read.csv("repdata-data-StormData.csv.bz2", stringsAsFactors = FALSE)
#Quick inspection
head(repdata)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL TORNADO
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL TORNADO
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL TORNADO
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL TORNADO
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL TORNADO
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL TORNADO
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1 0 0 NA
## 2 0 0 NA
## 3 0 0 NA
## 4 0 0 NA
## 5 0 0 NA
## 6 0 0 NA
## END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1 0 14.0 100 3 0 0 15 25.0
## 2 0 2.0 150 2 0 0 0 2.5
## 3 0 0.1 123 2 0 0 2 25.0
## 4 0 0.0 100 2 0 0 2 2.5
## 5 0 0.0 150 2 0 0 2 2.5
## 6 0 1.5 177 2 0 0 6 2.5
## PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1 K 0 3040 8812
## 2 K 0 3042 8755
## 3 K 0 3340 8742
## 4 K 0 3458 8626
## 5 K 0 3412 8642
## 6 K 0 3450 8748
## LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3051 8806 1
## 2 0 0 2
## 3 0 0 3
## 4 0 0 4
## 5 0 0 5
## 6 0 0 6
str(repdata)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
summary(repdata)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE
## Min. : 1.0 Length:902297 Length:902297 Length:902297
## 1st Qu.:19.0 Class :character Class :character Class :character
## Median :30.0 Mode :character Mode :character Mode :character
## Mean :31.2
## 3rd Qu.:45.0
## Max. :95.0
##
## COUNTY COUNTYNAME STATE EVTYPE
## Min. : 0.0 Length:902297 Length:902297 Length:902297
## 1st Qu.: 31.0 Class :character Class :character Class :character
## Median : 75.0 Mode :character Mode :character Mode :character
## Mean :100.6
## 3rd Qu.:131.0
## Max. :873.0
##
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE
## Min. : 0.000 Length:902297 Length:902297 Length:902297
## 1st Qu.: 0.000 Class :character Class :character Class :character
## Median : 0.000 Mode :character Mode :character Mode :character
## Mean : 1.484
## 3rd Qu.: 1.000
## Max. :3749.000
##
## END_TIME COUNTY_END COUNTYENDN END_RANGE
## Length:902297 Min. :0 Mode:logical Min. : 0.0000
## Class :character 1st Qu.:0 NA's:902297 1st Qu.: 0.0000
## Mode :character Median :0 Median : 0.0000
## Mean :0 Mean : 0.9862
## 3rd Qu.:0 3rd Qu.: 0.0000
## Max. :0 Max. :925.0000
##
## END_AZI END_LOCATI LENGTH WIDTH
## Length:902297 Length:902297 Min. : 0.0000 Min. : 0.000
## Class :character Class :character 1st Qu.: 0.0000 1st Qu.: 0.000
## Mode :character Mode :character Median : 0.0000 Median : 0.000
## Mean : 0.2301 Mean : 7.503
## 3rd Qu.: 0.0000 3rd Qu.: 0.000
## Max. :2315.0000 Max. :4400.000
##
## F MAG FATALITIES INJURIES
## Min. :0.00 Min. : 0.0 Min. : 0.00000 Min. : 0.0000
## 1st Qu.:0.00 1st Qu.: 0.0 1st Qu.: 0.00000 1st Qu.: 0.0000
## Median :1.00 Median : 50.0 Median : 0.00000 Median : 0.0000
## Mean :0.91 Mean : 46.9 Mean : 0.01678 Mean : 0.1557
## 3rd Qu.:1.00 3rd Qu.: 75.0 3rd Qu.: 0.00000 3rd Qu.: 0.0000
## Max. :5.00 Max. :22000.0 Max. :583.00000 Max. :1700.0000
## NA's :843563
## PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## Min. : 0.00 Length:902297 Min. : 0.000 Length:902297
## 1st Qu.: 0.00 Class :character 1st Qu.: 0.000 Class :character
## Median : 0.00 Mode :character Median : 0.000 Mode :character
## Mean : 12.06 Mean : 1.527
## 3rd Qu.: 0.50 3rd Qu.: 0.000
## Max. :5000.00 Max. :990.000
##
## WFO STATEOFFIC ZONENAMES LATITUDE
## Length:902297 Length:902297 Length:902297 Min. : 0
## Class :character Class :character Class :character 1st Qu.:2802
## Mode :character Mode :character Mode :character Median :3540
## Mean :2875
## 3rd Qu.:4019
## Max. :9706
## NA's :47
## LONGITUDE LATITUDE_E LONGITUDE_ REMARKS
## Min. :-14451 Min. : 0 Min. :-14455 Length:902297
## 1st Qu.: 7247 1st Qu.: 0 1st Qu.: 0 Class :character
## Median : 8707 Median : 0 Median : 0 Mode :character
## Mean : 6940 Mean :1452 Mean : 3509
## 3rd Qu.: 9605 3rd Qu.:3549 3rd Qu.: 8735
## Max. : 17124 Max. :9706 Max. :106220
## NA's :40
## REFNUM
## Min. : 1
## 1st Qu.:225575
## Median :451149
## Mean :451149
## 3rd Qu.:676723
## Max. :902297
##
#Function to convert damage exponents to multipliers
exp_to_multiplier <- function(exp) {
if (exp %in% c("K","k")) return(1e3)
if (exp %in% c("M","m")) return(1e6)
if (exp %in% c("B","b")) return(1e9)
if (exp %in% c("", "0")) return(1)
if (grepl("^[0-9]+$", exp)) return(as.numeric(exp))
return(1)
}
#Create numeric columns for property and crop damage
repdata$PROPDMGVAL <- repdata$PROPDMG * sapply(repdata$PROPDMGEXP, exp_to_multiplier)
repdata$CROPDMGVAL <- repdata$CROPDMG * sapply(repdata$CROPDMGEXP, exp_to_multiplier)
#Check processed data
head(repdata[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMGVAL", "CROPDMGVAL")])
## EVTYPE FATALITIES INJURIES PROPDMGVAL CROPDMGVAL
## 1 TORNADO 0 15 25000 0
## 2 TORNADO 0 0 2500 0
## 3 TORNADO 0 2 25000 0
## 4 TORNADO 0 2 2500 0
## 5 TORNADO 0 2 2500 0
## 6 TORNADO 0 6 2500 0
(echo = TRUE)
## [1] TRUE
##Results
The goal of this section is to identify which types of weather events are most harmful to population health and which have the greatest economic impact. Population health is measured by the total number of fatalities and injuries, while economic impact is measured as the sum of property and crop damages in U.S. dollars.
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.5.2
#Summarize fatalities and injuries by event type
health_impact <- repdata %>%
group_by(EVTYPE) %>%
summarise(
total_fatalities = sum(FATALITIES, na.rm = TRUE),
total_injuries = sum(INJURIES, na.rm = TRUE),
total_health = total_fatalities + total_injuries
) %>%
arrange(desc(total_health))
#Top 10 events for population health impact
top10_health <- head(health_impact, 10)
#Plot
ggplot(top10_health, aes(x = reorder(EVTYPE, total_health), y = total_health)) +
geom_bar(stat = "identity", fill = "steelblue") +
coord_flip() +
labs(title = "Top 10 Weather Events by Population Health Impact",
x = "Event Type", y = "Total Fatalities + Injuries")
The plot shows that tornadoes, heat waves, and floods have the highest impact on population health, causing the largest numbers of fatalities and injuries across the United States.
Economic Impact
Next, we summarize the economic damage by event type, combining property and crop damages. The top 10 most costly events are displayed in a bar plot.
The most economically damaging events are hurricanes, floods, and tropical storms, which cause billions of dollars in property and crop losses nationwide.