The purpose of this document is to outline some initial, exploratory findings from the NOAA Storm Database. The data frame was first preprocessed, which included scaling the property and crop damages using their exponent columns. The data were then aggregated by event type (EVTYPE) and sorted to identify the event types with the highest totals for damages, injuries, and fatalities. Further inspection of the event types revealed a few major groups, so the data were split into groups based on five strings: “FLOOD”, “HAIL”, “SNOW”, “THUNDERSTORM” (or “TSTM”), and “TORNADO”. Bar plots were then prepared to compare the totals of damages, injuries, and fatalities across these groups. The plots are followed by short commentary, and final conclusions are presented for further discussion.
After reading the data into a data frame, aggregating the fatalities and injuries by event type was straightforward, as shown below.
library(dplyr)
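# read.csv() reads the bzip2-compressed file directly; stringsAsFactors = FALSE keeps the
# exponent codes as character so that as.numeric() converts digits, not factor level codes
df2 <- read.csv("repdata_data_StormData.csv.bz2", stringsAsFactors = FALSE)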
fatalities <- aggregate(FATALITIES~EVTYPE, df2, sum, na.rm = TRUE)
injuries <- aggregate(INJURIES~EVTYPE, df2, sum, na.rm = TRUE)
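To make the shape of the `aggregate()` result concrete, here is a minimal sketch on a small, made-up data frame; the `toy` values are hypothetical and not taken from the NOAA data. It also shows that several outcome columns can be summed in one call by wrapping them in `cbind()` on the left-hand side of the formula.

```r
# Hypothetical toy data standing in for the storm data frame (values are made up)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(2, 0, 3, 0, 1),
  INJURIES   = c(10, 1, 4, 0, 2),
  stringsAsFactors = FALSE
)

# One row per event type (sorted alphabetically), with summed fatalities
toy_fatalities <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)
print(toy_fatalities)

# Both outcomes summed in a single call with cbind() on the left-hand side
toy_both <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
print(toy_both)
```

One caveat of the combined form: with the formula interface, a row with an NA in any of the `cbind()` columns is dropped before summing (the default `na.action = na.omit`), so separate calls, as in the analysis above, keep each outcome's rows independent.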
For total damage, the processing was slightly more involved. The exponent columns for the two types of damage (property and crop) were labeled very inconsistently. The following code replaced each value in the exponent columns with the corresponding base-10 exponent (for example, K becomes 3, since the damage is later multiplied by 10^3). Any value other than the digits 0-9 and the letters H/h, K/k, M/m, and B/b, including symbols such as '-', '?', and '+', blank strings, and NAs, was replaced with 0.
# Map each crop-damage exponent code to a power of 10; anything else (including NA) becomes 0
df2$CROPDMGEXP <- case_when(
  df2$CROPDMGEXP %in% c('h', 'H') ~ 2,
  df2$CROPDMGEXP %in% c('k', 'K') ~ 3,
  df2$CROPDMGEXP %in% c('m', 'M') ~ 6,
  df2$CROPDMGEXP %in% c('b', 'B') ~ 9,
  # Single digits are used as-is; suppressWarnings() silences the coercion warning raised
  # because case_when() evaluates as.numeric() on the whole column, not just matching rows
  grepl("^[0-9]$", df2$CROPDMGEXP) ~ suppressWarnings(as.numeric(df2$CROPDMGEXP)),
  TRUE ~ 0
)
# Same mapping for the property-damage exponent
df2$PROPDMGEXP <- case_when(
  df2$PROPDMGEXP %in% c('h', 'H') ~ 2,
  df2$PROPDMGEXP %in% c('k', 'K') ~ 3,
  df2$PROPDMGEXP %in% c('m', 'M') ~ 6,
  df2$PROPDMGEXP %in% c('b', 'B') ~ 9,
  grepl("^[0-9]$", df2$PROPDMGEXP) ~ suppressWarnings(as.numeric(df2$PROPDMGEXP)),
  TRUE ~ 0
)
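The two `case_when()` calls above repeat the same mapping for each column. A minimal sketch of an alternative, assuming the raw codes are still character strings, uses a named lookup vector inside a small helper; `exponent_code_to_power()` is a hypothetical name for illustration, not part of any package.

```r
# Hypothetical helper: convert raw exponent codes to base-10 exponents
exponent_code_to_power <- function(code) {
  code <- as.character(code)
  # Letter codes, matched case-insensitively via tolower()
  lookup <- c(h = 2, k = 3, m = 6, b = 9)
  power <- unname(lookup[tolower(code)])
  # Single-digit codes are used directly as the exponent
  is_digit <- !is.na(code) & grepl("^[0-9]$", code)
  power[is_digit] <- as.numeric(code[is_digit])
  # Everything else ('', '-', '?', '+', NA) becomes 0
  power[is.na(power)] <- 0
  power
}

exponent_code_to_power(c("K", "m", "B", "h", "5", "", "?", "+", NA))
# expected: 3 6 9 2 5 0 0 0 0
```

It would be applied to the raw columns, for example `df2$CROPDMGEXP <- exponent_code_to_power(df2$CROPDMGEXP)`, in place of the `case_when()` calls. Because `as.numeric()` only sees the single-digit codes, no coercion warning is raised.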
From here, a TOTALDAMAGE column was added to the data frame, which allowed the damages to be summed for each event type.
df2 <- mutate(df2, TOTALDAMAGE = CROPDMG * 10^CROPDMGEXP + PROPDMG * 10^PROPDMGEXP)
damages <- aggregate(TOTALDAMAGE~EVTYPE, df2, sum, na.rm = TRUE)
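As a worked check of the TOTALDAMAGE formula, consider one hypothetical record (values made up for illustration) with $25K of property damage (PROPDMG = 25, code K, so PROPDMGEXP = 3) and $1.5M of crop damage (CROPDMG = 1.5, code M, so CROPDMGEXP = 6). The total should be 25 × 10^3 + 1.5 × 10^6 = 1,525,000.

```r
# Hypothetical single record with exponents already converted to powers of 10
record <- data.frame(PROPDMG = 25, PROPDMGEXP = 3, CROPDMG = 1.5, CROPDMGEXP = 6)

# Same formula as the TOTALDAMAGE column: crop and property damage, each scaled by 10^exponent
record$TOTALDAMAGE <- with(record, CROPDMG * 10^CROPDMGEXP + PROPDMG * 10^PROPDMGEXP)
print(record$TOTALDAMAGE)
# expected: 1525000
```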
On closer inspection of the event types, there were many repeated or near-duplicate categories (for example, TSTM WIND and THUNDERSTORM WIND), as well as many variants within each broad class of event. To address this, the aggregated damages, fatalities, and injuries were joined into a single table, and the event types were then organized into a few major groups according to whether they contained certain strings.
totals_table <- damages %>% full_join(fatalities, by = "EVTYPE")
totals_table <- totals_table %>% full_join(injuries, by = "EVTYPE")
all_floods<-totals_table[grepl("FLOOD", totals_table$EVTYPE, ignore.case=TRUE),]
all_tstm<-totals_table[grepl("TSTM|THUNDERSTORM", totals_table$EVTYPE, ignore.case=TRUE),]
all_hail<-totals_table[grepl("HAIL", totals_table$EVTYPE, ignore.case=TRUE),]
all_snow<-totals_table[grepl("SNOW", totals_table$EVTYPE, ignore.case=TRUE),]
all_tornado<-totals_table[grepl("TORNADO", totals_table$EVTYPE, ignore.case=TRUE),]
labels <- c("Floods", "Thunderstorms", "Hail", "Snow", "Tornado")
damage_bars <- c(sum(all_floods$TOTALDAMAGE),
sum(all_tstm$TOTALDAMAGE),
sum(all_hail$TOTALDAMAGE),
sum(all_snow$TOTALDAMAGE),
sum(all_tornado$TOTALDAMAGE))
fatalities_bars <- c(sum(all_floods$FATALITIES),
sum(all_tstm$FATALITIES),
sum(all_hail$FATALITIES),
sum(all_snow$FATALITIES),
sum(all_tornado$FATALITIES))
injuries_bars <- c(sum(all_floods$INJURIES),
sum(all_tstm$INJURIES),
sum(all_hail$INJURIES),
sum(all_snow$INJURIES),
sum(all_tornado$INJURIES))
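The five subsets and their sums above can also be built in one pass over a named vector of patterns. A minimal sketch on a made-up totals table follows; the toy values and the names `patterns` and `group_totals` are hypothetical. It also illustrates a side effect of substring matching: a label such as THUNDERSTORM WINDS/HAIL (a toy label here) matches two patterns, so it is counted in both groups.

```r
# Hypothetical toy totals table (made-up values, not NOAA results)
toy_totals <- data.frame(
  EVTYPE      = c("FLOOD", "FLASH FLOOD", "TSTM WIND", "THUNDERSTORM WINDS/HAIL",
                  "HAIL", "HEAVY SNOW", "TORNADO"),
  TOTALDAMAGE = c(100, 40, 20, 10, 30, 5, 80),
  FATALITIES  = c(4, 6, 2, 1, 0, 1, 9),
  INJURIES    = c(10, 3, 7, 2, 1, 0, 50),
  stringsAsFactors = FALSE
)

# The five patterns used in the analysis, matched case-insensitively
patterns <- c(Floods = "FLOOD", Thunderstorms = "TSTM|THUNDERSTORM",
              Hail = "HAIL", Snow = "SNOW", Tornado = "TORNADO")

# For each group, sum the three outcome columns over the matching event types
group_totals <- t(sapply(patterns, function(p) {
  rows <- grepl(p, toy_totals$EVTYPE, ignore.case = TRUE)
  colSums(toy_totals[rows, c("TOTALDAMAGE", "FATALITIES", "INJURIES")], na.rm = TRUE)
}))
print(group_totals)

# Count how many patterns each event type matches; more than one means double counting
matches_per_type <- rowSums(sapply(patterns, grepl, x = toy_totals$EVTYPE, ignore.case = TRUE))
print(toy_totals$EVTYPE[matches_per_type > 1])
```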
The ten event types with the highest totals for damages, injuries, and fatalities are listed below.
damages %>%
arrange(desc(TOTALDAMAGE)) %>%
slice(1:10)
## EVTYPE TOTALDAMAGE
## 1 FLOOD 150319678257
## 2 HURRICANE/TYPHOON 71913712800
## 3 TORNADO 57362333946
## 4 STORM SURGE 43323541000
## 5 HAIL 18761221986
## 6 FLASH FLOOD 18243991078
## 7 DROUGHT 15018672000
## 8 HURRICANE 14610229010
## 9 RIVER FLOOD 10148404500
## 10 ICE STORM 8967041360
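The single event type with the highest total damage was FLOOD, at about $150 billion, roughly twice that of the next-highest type, HURRICANE/TYPHOON (about $72 billion).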
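For readers not using dplyr, the same kind of ranking can be produced in base R with `order()`. A minimal sketch on a hypothetical toy table (values made up) follows; `top_n_by()` is an illustrative helper name, not a library function.

```r
# Hypothetical aggregated damages table (made-up values)
toy_damages <- data.frame(
  EVTYPE      = c("HAIL", "FLOOD", "TORNADO", "DROUGHT"),
  TOTALDAMAGE = c(5, 12, 9, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sort by a column in decreasing order and keep the first n rows
top_n_by <- function(df, column, n) {
  ranked <- df[order(df[[column]], decreasing = TRUE), ]
  head(ranked, n)
}

top3 <- top_n_by(toy_damages, "TOTALDAMAGE", 3)
print(top3)
```

With `order()`, any NA totals are placed last by default, so they never displace real values from the top rows.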
injuries %>%
arrange(desc(INJURIES)) %>%
slice(1:10)
## EVTYPE INJURIES
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
## 7 ICE STORM 1975
## 8 FLASH FLOOD 1777
## 9 THUNDERSTORM WIND 1488
## 10 HAIL 1361
For injuries, the single event type with the highest total was TORNADO, with 91,346 injuries, more than 13 times the next-highest type, TSTM WIND (6,957).
fatalities %>%
arrange(desc(FATALITIES)) %>%
slice(1:10)
## EVTYPE FATALITIES
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
## 7 FLOOD 470
## 8 RIP CURRENT 368
## 9 HIGH WIND 248
## 10 AVALANCHE 224
For fatalities, the highest single event type was also TORNADO, with 5,633 deaths, nearly three times the next-highest type, EXCESSIVE HEAT (1,903).
The following bar plots compare the five event-type groups by total damage, fatalities, and injuries.
barplot(damage_bars/10^9, names.arg = labels, ylim = c(0,200),ylab = "Cost in Billions",
xlab = "Event Type", main = "Damage Totals for Event Type Groups")
barplot(fatalities_bars, names.arg = labels, ylim = c(0, 6000), ylab = "Fatalities",
xlab = "Event Type", main = "Fatality Totals for Event Type Groups")
barplot(injuries_bars/1000, names.arg = labels, ylim = c(0, 100), ylab = "Thousands of Injuries",
xlab = "Event Type", main = "Injury Totals for Event Type Groups")
If the government's goal is to reduce property and crop damage, these data suggest investing money and effort in flood awareness and prevention. If the concern is instead human health and safety, the data suggest directing research and funding toward tornado tracking, evacuation, and safety.