Severe weather events across the United States have significant impacts on both population health and the economy. This analysis uses the NOAA Storm Database to identify which event types are most harmful and most costly. We load the raw dataset, convert the coded property and crop damage values into dollar amounts, and summarize fatalities, injuries, and total economic damage by event type. Plots and tables of the top ten event types are intended for government or municipal managers. Tornadoes account for by far the most combined fatalities and injuries (96,979), while floods cause the greatest economic damage (about $150 billion). These results provide a data-driven foundation for prioritizing preparedness resources.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(readr)
storm_data <- read.csv("/Users/vivaannanda/Documents/reproducible_research/repdata-data-StormData.csv.bz2")
str(storm_data)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
summary(storm_data)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE
## Min. : 1.0 Length:902297 Length:902297 Length:902297
## 1st Qu.:19.0 Class :character Class :character Class :character
## Median :30.0 Mode :character Mode :character Mode :character
## Mean :31.2
## 3rd Qu.:45.0
## Max. :95.0
##
## COUNTY COUNTYNAME STATE EVTYPE
## Min. : 0.0 Length:902297 Length:902297 Length:902297
## 1st Qu.: 31.0 Class :character Class :character Class :character
## Median : 75.0 Mode :character Mode :character Mode :character
## Mean :100.6
## 3rd Qu.:131.0
## Max. :873.0
##
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE
## Min. : 0.000 Length:902297 Length:902297 Length:902297
## 1st Qu.: 0.000 Class :character Class :character Class :character
## Median : 0.000 Mode :character Mode :character Mode :character
## Mean : 1.484
## 3rd Qu.: 1.000
## Max. :3749.000
##
## END_TIME COUNTY_END COUNTYENDN END_RANGE
## Length:902297 Min. :0 Mode:logical Min. : 0.0000
## Class :character 1st Qu.:0 NA's:902297 1st Qu.: 0.0000
## Mode :character Median :0 Median : 0.0000
## Mean :0 Mean : 0.9862
## 3rd Qu.:0 3rd Qu.: 0.0000
## Max. :0 Max. :925.0000
##
## END_AZI END_LOCATI LENGTH WIDTH
## Length:902297 Length:902297 Min. : 0.0000 Min. : 0.000
## Class :character Class :character 1st Qu.: 0.0000 1st Qu.: 0.000
## Mode :character Mode :character Median : 0.0000 Median : 0.000
## Mean : 0.2301 Mean : 7.503
## 3rd Qu.: 0.0000 3rd Qu.: 0.000
## Max. :2315.0000 Max. :4400.000
##
## F MAG FATALITIES INJURIES
## Min. :0.00 Min. : 0.0 Min. : 0.00000 Min. : 0.0000
## 1st Qu.:0.00 1st Qu.: 0.0 1st Qu.: 0.00000 1st Qu.: 0.0000
## Median :1.00 Median : 50.0 Median : 0.00000 Median : 0.0000
## Mean :0.91 Mean : 46.9 Mean : 0.01678 Mean : 0.1557
## 3rd Qu.:1.00 3rd Qu.: 75.0 3rd Qu.: 0.00000 3rd Qu.: 0.0000
## Max. :5.00 Max. :22000.0 Max. :583.00000 Max. :1700.0000
## NA's :843563
## PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## Min. : 0.00 Length:902297 Min. : 0.000 Length:902297
## 1st Qu.: 0.00 Class :character 1st Qu.: 0.000 Class :character
## Median : 0.00 Mode :character Median : 0.000 Mode :character
## Mean : 12.06 Mean : 1.527
## 3rd Qu.: 0.50 3rd Qu.: 0.000
## Max. :5000.00 Max. :990.000
##
## WFO STATEOFFIC ZONENAMES LATITUDE
## Length:902297 Length:902297 Length:902297 Min. : 0
## Class :character Class :character Class :character 1st Qu.:2802
## Mode :character Mode :character Mode :character Median :3540
## Mean :2875
## 3rd Qu.:4019
## Max. :9706
## NA's :47
## LONGITUDE LATITUDE_E LONGITUDE_ REMARKS
## Min. :-14451 Min. : 0 Min. :-14455 Length:902297
## 1st Qu.: 7247 1st Qu.: 0 1st Qu.: 0 Class :character
## Median : 8707 Median : 0 Median : 0 Mode :character
## Mean : 6940 Mean :1452 Mean : 3509
## 3rd Qu.: 9605 3rd Qu.:3549 3rd Qu.: 8735
## Max. : 17124 Max. :9706 Max. :106220
## NA's :40
## REFNUM
## Min. : 1
## 1st Qu.:225575
## Median :451149
## Mean :451149
## 3rd Qu.:676723
## Max. :902297
##
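# Keep only the columns needed for the health and economic analyses
storm_data <- storm_data %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

# Convert an exponent code into a numeric multiplier.
# K, M, and B (either case) scale the value; any other code maps to 1.
exp_to_multiplier <- function(exp) {
  case_when(
    exp %in% c("K", "k") ~ 1e3,
    exp %in% c("M", "m") ~ 1e6,
    exp %in% c("B", "b") ~ 1e9,
    TRUE ~ 1
  )
}

# Damage in dollars for property and crops
storm_data <- storm_data %>%
  mutate(
    PROPDMGNUM = PROPDMG * exp_to_multiplier(PROPDMGEXP),
    CROPDMGNUM = CROPDMG * exp_to_multiplier(CROPDMGEXP)
  )

# Total economic damage per event record
storm_data <- storm_data %>%
  mutate(TOTALDMG = PROPDMGNUM + CROPDMGNUM)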
In this section, we load the raw NOAA Storm Database file (a bz2-compressed CSV) directly into R and explore its structure. The dataset contains 902,297 observations of 37 variables describing severe weather events, including event type (EVTYPE), human impacts (FATALITIES and INJURIES), and economic impacts (PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP). We keep only these seven columns. Since property and crop damage values use exponent codes (“K” for thousands, “M” for millions, “B” for billions), we define a function that converts these codes into numeric multipliers; any other code, including a blank, is treated as a multiplier of 1. We then calculate the property and crop damage amounts in dollars and add them into a single variable representing total economic damage (TOTALDMG). All subsequent analyses of population health and economic consequences use these numeric values derived directly from the raw dataset.
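To make the exponent conversion concrete, the following minimal sketch applies the same mapping to a few rows. The `toy` data frame and its values are invented for illustration and are not taken from the NOAA file.

```r
library(dplyr)

# Same mapping as in the analysis: K, M, B (either case) scale the value;
# any other code, including a blank, leaves it unchanged.
exp_to_multiplier <- function(exp) {
  case_when(
    exp %in% c("K", "k") ~ 1e3,
    exp %in% c("M", "m") ~ 1e6,
    exp %in% c("B", "b") ~ 1e9,
    TRUE ~ 1
  )
}

# Hypothetical rows (assumption: made up for this example only)
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 500),
  PROPDMGEXP = c("K", "m", "B", ""),
  stringsAsFactors = FALSE
)

# Dollar amount = reported value times the multiplier for its code
toy <- toy %>%
  mutate(PROPDMGNUM = PROPDMG * exp_to_multiplier(PROPDMGEXP))

print(toy)
```

Here a PROPDMG of 25 with code "K" becomes 25,000 dollars, while the row with a blank code keeps its reported value of 500, which shows why records with unrecognized codes may be understated.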
health_impact <- storm_data %>%
  group_by(EVTYPE) %>%
  summarise(
    total_fatalities = sum(FATALITIES, na.rm = TRUE),
    total_injuries = sum(INJURIES, na.rm = TRUE)
  ) %>%
  mutate(total_harm = total_fatalities + total_injuries) %>%
  arrange(desc(total_harm))
top_health_events <- head(health_impact, 10)
top_health_events
## # A tibble: 10 × 4
##    EVTYPE            total_fatalities total_injuries total_harm
##    <chr>                        <dbl>          <dbl>      <dbl>
##  1 TORNADO                       5633          91346      96979
##  2 EXCESSIVE HEAT                1903           6525       8428
##  3 TSTM WIND                      504           6957       7461
##  4 FLOOD                          470           6789       7259
##  5 LIGHTNING                      816           5230       6046
##  6 HEAT                           937           2100       3037
##  7 FLASH FLOOD                    978           1777       2755
##  8 ICE STORM                       89           1975       2064
##  9 THUNDERSTORM WIND              133           1488       1621
## 10 WINTER STORM                   206           1321       1527
ggplot(top_health_events, aes(x = reorder(EVTYPE, -total_harm), y = total_harm)) +
  geom_col(fill = "steelblue") +
  labs(title = "Top 10 Weather Events Harmful to Population Health",
       x = "Event Type",
       y = "Total Harm (Fatalities + Injuries)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
To evaluate the impact on human life, we group the data by event type (EVTYPE) and sum the fatalities and injuries for each type. We then define total harm as the sum of fatalities and injuries and sort the event types by it in descending order. Tornadoes are by far the most harmful, with 96,979 combined fatalities and injuries, more than ten times the total for the next event type, excessive heat (8,428). Event types are grouped exactly as recorded in EVTYPE, so closely related labels such as TSTM WIND and THUNDERSTORM WIND, or HEAT and EXCESSIVE HEAT, are counted separately. The top ten event types are shown in a bar chart for comparison.
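Because EVTYPE is used as recorded, one possible refinement is to merge alternate spellings before grouping. The sketch below is only an illustration: the `evtype_aliases` lookup and the `normalize_evtype` helper are hypothetical names, the mapping is an assumption rather than an official NOAA classification, and the rows are invented.

```r
library(dplyr)

# Hypothetical lookup: alternate spelling -> canonical label.
# This mapping is an assumption, not an official NOAA list.
evtype_aliases <- c(
  "TSTM WIND"          = "THUNDERSTORM WIND",
  "THUNDERSTORM WINDS" = "THUNDERSTORM WIND"
)

# Trim whitespace, upper-case, then replace known aliases
normalize_evtype <- function(x) {
  x <- toupper(trimws(x))
  canonical <- unname(evtype_aliases[x])  # NA where no alias is defined
  ifelse(is.na(canonical), x, canonical)
}

# Invented rows for illustration only
toy <- data.frame(
  EVTYPE     = c("TSTM WIND", "Thunderstorm Wind", " TORNADO", "THUNDERSTORM WINDS"),
  FATALITIES = c(1, 2, 5, 0),
  INJURIES   = c(10, 3, 40, 1),
  stringsAsFactors = FALSE
)

toy_harm <- toy %>%
  mutate(EVTYPE = normalize_evtype(EVTYPE)) %>%
  group_by(EVTYPE) %>%
  summarise(total_harm = sum(FATALITIES + INJURIES), .groups = "drop") %>%
  arrange(desc(total_harm))

print(toy_harm)
```

With this normalization the three thunderstorm wind spellings collapse into a single group. Applying a similar step to the full dataset could change the ranking, so any such mapping would need to be documented alongside the results.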
economic_impact <- storm_data %>%
  group_by(EVTYPE) %>%
  summarise(total_damage = sum(TOTALDMG, na.rm = TRUE)) %>%
  arrange(desc(total_damage))
top_econ_events <- head(economic_impact, 10)
top_econ_events
## # A tibble: 10 × 2
##    EVTYPE             total_damage
##    <chr>                     <dbl>
##  1 FLOOD             150319678257
##  2 HURRICANE/TYPHOON  71913712800
##  3 TORNADO            57352114049.
##  4 STORM SURGE        43323541000
##  5 HAIL               18758221521.
##  6 FLASH FLOOD        17562129167.
##  7 DROUGHT            15018672000
##  8 HURRICANE          14610229010
##  9 RIVER FLOOD        10148404500
## 10 ICE STORM           8967041360
ggplot(top_econ_events, aes(x = reorder(EVTYPE, -total_damage), y = total_damage / 1e9)) +
  geom_col(fill = "tomato") +
  labs(title = "Top 10 Weather Events with Greatest Economic Damage",
       x = "Event Type",
       y = "Total Damage (Billion $)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
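To assess economic impacts, we group the data by event type and sum the total economic damage (TOTALDMG) across all events of each type. Sorting these sums in descending order highlights the events with the greatest economic consequences. Floods are the most costly event type at about $150.3 billion, roughly twice the total for HURRICANE/TYPHOON ($71.9 billion), followed by tornadoes ($57.4 billion). As with the health results, event types are grouped as recorded, so HURRICANE/TYPHOON and HURRICANE, or FLOOD, FLASH FLOOD, and RIVER FLOOD, are counted separately. The ten most costly event types are plotted in a bar chart, with damage expressed in billions of dollars.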
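Since TOTALDMG combines property and crop losses, it can also help to see the two components side by side. The following sketch shows one way to do that, assuming the PROPDMGNUM and CROPDMGNUM columns computed earlier; the rows and the names `toy_damage`, `property_bn`, `crop_bn`, and `total_bn` are invented for illustration.

```r
library(dplyr)

# Invented rows for illustration; damage is already expressed in dollars
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "DROUGHT", "HAIL"),
  PROPDMGNUM = c(2e9, 5e8, 1e8, 3e8),
  CROPDMGNUM = c(1e8, 0, 9e8, 2e8),
  stringsAsFactors = FALSE
)

# Sum each component per event type and express it in billions of dollars
toy_damage <- toy %>%
  group_by(EVTYPE) %>%
  summarise(
    property_bn = sum(PROPDMGNUM, na.rm = TRUE) / 1e9,
    crop_bn     = sum(CROPDMGNUM, na.rm = TRUE) / 1e9,
    .groups = "drop"
  ) %>%
  mutate(total_bn = property_bn + crop_bn) %>%
  arrange(desc(total_bn))

print(toy_damage)
```

In this toy data the drought rows are dominated by crop losses while the flood rows are dominated by property losses, a distinction that a single total would hide.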