Synopsis

This document outlines initial, exploratory findings from the NOAA Storm Database. The data frame was first preprocessed, which included scaling the damage figures by the powers of ten given in the exponent columns. The data were then aggregated by event type (EVTYPE) and sorted to find the event types with the highest totals for damages, injuries, and fatalities. Closer inspection of the event types revealed a few major groups, so the data were split into groups based on five strings: “FLOOD”, “HAIL”, “SNOW”, “THUNDERSTORM” (or “TSTM”), and “TORNADO”. Bar plots were then prepared to compare the totals of damages, injuries, and fatalities across these groups. The plots are followed by short commentary, and final conclusions are presented for further discussion.

Data Processing

After the data were read into a data frame, aggregating the fatalities and injuries by event type was straightforward, as shown below.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
df2 <- read.csv("repdata_data_StormData.csv.bz2")
fatalities <- aggregate(FATALITIES~EVTYPE, df2, sum, na.rm = TRUE)

injuries <- aggregate(INJURIES~EVTYPE, df2, sum, na.rm = TRUE)
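
As a side note, the two aggregations can also be done in a single call. The following is a minimal base R sketch on a made-up toy data frame (the values and the names toy and casualties are illustrative assumptions, not part of the storm data); it is not the approach used in this report.

```r
# Toy stand-in for the storm data (values are made up for illustration)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
  FATALITIES = c(2, 1, 3, 0),
  INJURIES   = c(10, 4, 25, 1),
  stringsAsFactors = FALSE
)

# cbind() on the left-hand side sums both columns per event type in one call
casualties <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, toy, sum)
print(casualties)
```

The result has one row per event type with both sums side by side, which would also make the later join between the injuries and fatalities tables unnecessary.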

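For total damage, the processing was slightly more involved. The exponent column for each type of damage (property and crop) was labeled very inconsistently. The following code replaced each value in the exponent columns with the appropriate power of 10: the letter codes were mapped to their exponents (H or h to 2, K or k to 3, M or m to 6, B to 9), digit codes were used directly as exponents, and the remaining symbols ('-', '?', '+'), empty strings, and NAs were replaced with 0. The coercion warnings in the output arise because case_when() evaluates as.numeric() on the whole column, including the non-numeric codes; those entries are already handled by the earlier conditions.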

df2$CROPDMGEXP <- case_when(
  df2$CROPDMGEXP == 'k' ~ 3,
  df2$CROPDMGEXP == 'K' ~ 3, 
  df2$CROPDMGEXP == 'm' ~ 6,
  df2$CROPDMGEXP == 'M' ~ 6,
  df2$CROPDMGEXP == 'B' ~ 9,
  df2$CROPDMGEXP == 'h' ~ 2,
  df2$CROPDMGEXP == 'H' ~ 2,
  df2$CROPDMGEXP == '-' ~ 0,
  df2$CROPDMGEXP == '?' ~ 0,
  df2$CROPDMGEXP == '+' ~ 0,
  df2$CROPDMGEXP == '' ~ 0,
  is.na(df2$CROPDMGEXP) ~ 0,
  TRUE ~ as.numeric(df2$CROPDMGEXP)
)
## Warning in eval_tidy(pair$rhs, env = default_env): NAs introduced by coercion
df2$PROPDMGEXP <- case_when(
  df2$PROPDMGEXP == 'k' ~ 3,
  df2$PROPDMGEXP == 'K' ~ 3, 
  df2$PROPDMGEXP == 'm' ~ 6,
  df2$PROPDMGEXP == 'M' ~ 6,
  df2$PROPDMGEXP == 'B' ~ 9,
  df2$PROPDMGEXP == 'h' ~ 2,
  df2$PROPDMGEXP == 'H' ~ 2,
  df2$PROPDMGEXP == '-' ~ 0,
  df2$PROPDMGEXP == '?' ~ 0,
  df2$PROPDMGEXP == '+' ~ 0,
  df2$PROPDMGEXP == '' ~ 0,
  is.na(df2$PROPDMGEXP) ~ 0,
  TRUE ~ as.numeric(df2$PROPDMGEXP)
)
## Warning in eval_tidy(pair$rhs, env = default_env): NAs introduced by coercion
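
Because the same mapping is written out twice, it could also be expressed once as a small helper. The sketch below is an illustration only: exp_to_power is a hypothetical name, the input vector is made up, and the mapping is assumed to mirror the case_when() calls above. Since as.numeric() is applied only to single-digit strings, it does not trigger the coercion warning.

```r
# Hypothetical helper: convert exponent codes to numeric powers of 10
exp_to_power <- function(code) {
  code <- as.character(code)
  # Letter codes and their exponents, mirroring the case_when() calls above
  letters_map <- c(h = 2, H = 2, k = 3, K = 3, m = 6, M = 6, B = 9)
  power <- unname(letters_map[code])             # NA where the code is not a known letter
  is_digit <- !is.na(code) & grepl("^[0-9]$", code)
  power[is_digit] <- as.numeric(code[is_digit])  # digit codes are used as-is
  power[is.na(power)] <- 0                       # '', '-', '?', '+', NA and anything else -> 0
  power
}

# Made-up sample of codes covering each kind of case
codes <- c("K", "m", "B", "5", "", "?", NA, "h")
result <- exp_to_power(codes)
print(result)
```

Under these assumptions the helper could be applied to both columns, for example df2$CROPDMGEXP <- exp_to_power(df2$CROPDMGEXP). One difference from the code above is that any unexpected code falls through to 0 rather than NA.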

Next, a TOTALDAMAGE column was added to the data frame as the sum of the scaled crop and property damages, which then allowed damages to be summed for each event type.

df2 <- mutate(df2, TOTALDAMAGE = CROPDMG * 10^CROPDMGEXP + PROPDMG * 10^PROPDMGEXP)
damages <- aggregate(TOTALDAMAGE~EVTYPE, df2, sum, na.rm = TRUE)
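
To make the scaling arithmetic concrete, here is a small worked example on made-up rows (the data frame toy and its values are assumptions for illustration). The exponent columns are assumed to be numeric already, as they are after the recoding step.

```r
# Made-up rows; exponents are already numeric, as after the recoding step
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 0),
  PROPDMGEXP = c(3, 6, 0),
  CROPDMG    = c(0, 10, 5),
  CROPDMGEXP = c(0, 3, 9)
)

# Each damage figure is multiplied by 10 raised to its exponent, then the two are added
toy$TOTALDAMAGE <- toy$CROPDMG * 10^toy$CROPDMGEXP + toy$PROPDMG * 10^toy$PROPDMGEXP
print(toy$TOTALDAMAGE)
```

The first row is 25 * 10^3 = 25,000; the second is 10 * 10^3 + 2.5 * 10^6 = 2,510,000; the third is 5 * 10^9.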

Closer inspection of the event types showed many repeated categories, as well as many variants within each broad class of event. As a further step, the aggregated damages, injuries, and fatalities were joined into one table, and the event types were then organized into a few major categories according to whether their names contained certain strings.

totals_table <- damages %>% full_join(fatalities, by = "EVTYPE")
totals_table <- totals_table %>% full_join(injuries, by = "EVTYPE")

all_floods<-totals_table[grepl("FLOOD", totals_table$EVTYPE, ignore.case=TRUE),]
all_tstm<-totals_table[grepl("TSTM|THUNDERSTORM", totals_table$EVTYPE, ignore.case=TRUE),]
all_hail<-totals_table[grepl("HAIL", totals_table$EVTYPE, ignore.case=TRUE),]
all_snow<-totals_table[grepl("SNOW", totals_table$EVTYPE, ignore.case=TRUE),]
all_tornado<-totals_table[grepl("TORNADO", totals_table$EVTYPE, ignore.case=TRUE),]

labels <- c("Floods", "Thunderstorms", "Hail", "Snow", "Tornado")

damage_bars <- c(sum(all_floods$TOTALDAMAGE),
                  sum(all_tstm$TOTALDAMAGE),
                  sum(all_hail$TOTALDAMAGE),
                  sum(all_snow$TOTALDAMAGE),
                  sum(all_tornado$TOTALDAMAGE))


fatalities_bars <- c(sum(all_floods$FATALITIES),
                  sum(all_tstm$FATALITIES),
                  sum(all_hail$FATALITIES),
                  sum(all_snow$FATALITIES),
                  sum(all_tornado$FATALITIES))

injuries_bars <- c(sum(all_floods$INJURIES),
                  sum(all_tstm$INJURIES),
                  sum(all_hail$INJURIES),
                  sum(all_snow$INJURIES),
                  sum(all_tornado$INJURIES))
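
The grouping above can also be written as one pass over a named vector of patterns, which makes it easy to check whether an event type lands in more than one group. This is a sketch on a made-up table (toy_totals, patterns, in_group, and the values are illustrative assumptions); the combined event name is included to show the overlap case, and whether it matters for the real totals would need to be checked against the data.

```r
# Toy stand-in for totals_table (values are made up for illustration)
toy_totals <- data.frame(
  EVTYPE      = c("FLOOD", "FLASH FLOOD", "TSTM WIND", "TSTM WIND/HAIL",
                  "HAIL", "TORNADO", "HEAVY SNOW"),
  TOTALDAMAGE = c(100, 20, 30, 5, 40, 60, 10),
  stringsAsFactors = FALSE
)

# Same search strings as in the grepl() calls above
patterns <- c(Floods = "FLOOD", Thunderstorms = "TSTM|THUNDERSTORM",
              Hail = "HAIL", Snow = "SNOW", Tornado = "TORNADO")

# One logical column per group: TRUE where EVTYPE matches that group's pattern
in_group <- sapply(patterns, grepl, x = toy_totals$EVTYPE, ignore.case = TRUE)

# Damage total per group (rows outside a group contribute 0)
group_damage <- colSums(in_group * toy_totals$TOTALDAMAGE)
print(group_damage)

# Event types matched by more than one pattern are counted in each group
overlapping <- toy_totals$EVTYPE[rowSums(in_group) > 1]
print(overlapping)
```

In this toy table, "TSTM WIND/HAIL" contributes to both the thunderstorm and hail totals, so the group totals are not mutually exclusive.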

Results

The ten event types with the highest totals for damages, injuries, and fatalities are listed below.

damages %>%
  arrange(desc(TOTALDAMAGE)) %>%
  slice(1:10)
##               EVTYPE  TOTALDAMAGE
## 1              FLOOD 150319678257
## 2  HURRICANE/TYPHOON  71913712800
## 3            TORNADO  57362333946
## 4        STORM SURGE  43323541000
## 5               HAIL  18761221986
## 6        FLASH FLOOD  18243991078
## 7            DROUGHT  15018672000
## 8          HURRICANE  14610229010
## 9        RIVER FLOOD  10148404500
## 10         ICE STORM   8967041360

The single event type with the highest total damages was FLOOD, at roughly 150 billion, more than twice the total for the next entry (HURRICANE/TYPHOON).

injuries %>%
  arrange(desc(INJURIES)) %>%
  slice(1:10)
##               EVTYPE INJURIES
## 1            TORNADO    91346
## 2          TSTM WIND     6957
## 3              FLOOD     6789
## 4     EXCESSIVE HEAT     6525
## 5          LIGHTNING     5230
## 6               HEAT     2100
## 7          ICE STORM     1975
## 8        FLASH FLOOD     1777
## 9  THUNDERSTORM WIND     1488
## 10              HAIL     1361

For injuries, the highest single event type was TORNADO, with 91,346 injuries, about thirteen times the count for the next entry (TSTM WIND).

fatalities %>%
  arrange(desc(FATALITIES)) %>%
  slice(1:10)
##            EVTYPE FATALITIES
## 1         TORNADO       5633
## 2  EXCESSIVE HEAT       1903
## 3     FLASH FLOOD        978
## 4            HEAT        937
## 5       LIGHTNING        816
## 6       TSTM WIND        504
## 7           FLOOD        470
## 8     RIP CURRENT        368
## 9       HIGH WIND        248
## 10      AVALANCHE        224

For fatalities, the highest single event type was also TORNADO, with 5,633 deaths, nearly three times the count for EXCESSIVE HEAT.

Here are the graphs for each event type group with respect to damages, injuries, and fatalities.

barplot(damage_bars/10^9, names.arg = labels, ylim = c(0, 200), ylab = "Cost in Billions",
        xlab = "Event Type", main = "Damage Totals for Event Type Groups")

barplot(fatalities_bars, names.arg = labels, ylim = c(0, 6000), ylab = "Fatalities", 
        xlab = "Event Type", main = "Fatality Totals for Event Type Groups")

barplot(injuries_bars/1000, names.arg = labels, ylim = c(0, 100), ylab = "Thousands of Injuries",
        xlab = "Event Type", main = "Injury Totals for Event Type Groups")

Conclusions

If the government's goal is to reduce property and crop damage, money and effort should be invested in flood awareness and prevention. If the concern is instead human health and safety, the data suggest directing research and capital toward tornado tracking, evacuation, and safety.