Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Data

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:

Storm Data [47Mb] https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2
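A minimal sketch of fetching the file from R, assuming the download should be skipped when a local copy already exists. The URL is the one listed above and the destination name matches the file read in the Data Load step; the helper name fetch_if_missing is hypothetical.

```r
# URL and file name come from this report; the helper name is hypothetical
storm_url  <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
storm_file <- "repdata_data_StormData.csv.bz2"

fetch_if_missing <- function(url, dest) {
  # Skip the download when a local copy already exists
  if (!file.exists(dest)) {
    # mode = "wb" writes the compressed file as binary (matters on Windows)
    download.file(url, destfile = dest, mode = "wb")
  }
  invisible(dest)
}

# Usage (needs network access on the first run):
# fetch_if_missing(storm_url, storm_file)
```

Keeping the compressed file as-is is enough, since read.csv can read a .bz2 file directly.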

There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.

National Weather Service Storm Data Documentation https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf

National Climatic Data Center Storm Events FAQ https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2FNCDC%20Storm%20Events-FAQ%20Page.pdf

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

Data Processing

Data Load

First we load the data as a table and convert the date columns to Date objects, in case we need them for plotting:

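library(dplyr)       # %>%, group_by(), summarise(), arrange()
library(data.table)  # data.table(), melt()
library(ggplot2)     # ggplot()

storm_data = read.csv('repdata_data_StormData.csv.bz2')

storm_data$BGN_DATE = as.Date(storm_data$BGN_DATE, format = '%m/%d/%Y %H:%M:%S')
storm_data$END_DATE = as.Date(storm_data$END_DATE, format = '%m/%d/%Y %H:%M:%S')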
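As a small illustration of the date conversion, the sketch below parses a single made-up value written in the month/day/year layout that the format string expects. The sample string is an assumption for illustration, not a row quoted from the file.

```r
# Illustrative raw value in the layout the format string expects
raw_date <- "4/18/1950 0:00:00"
parsed <- as.Date(raw_date, format = "%m/%d/%Y %H:%M:%S")
print(parsed)
print(class(parsed))

# A value that does not match the format becomes NA instead of raising an error
unparsed <- as.Date("1950-04-18", format = "%m/%d/%Y %H:%M:%S")
print(is.na(unparsed))
```

Because mismatched values silently become NA, it can be worth counting the NAs in the converted columns before relying on them.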

Subsetting the data

We only need a few columns to describe the effects of each event: the event type, its human impact (fatalities and injuries), and its economic impact (property and crop damage).

relevant_columns <- c('EVTYPE', 'FATALITIES', 'INJURIES', 'PROPDMG', 'PROPDMGEXP', 'CROPDMG', 'CROPDMGEXP')
storm_data <- storm_data[, relevant_columns]

summary(storm_data)
##     EVTYPE            FATALITIES          INJURIES            PROPDMG       
##  Length:902297      Min.   :  0.0000   Min.   :   0.0000   Min.   :   0.00  
##  Class :character   1st Qu.:  0.0000   1st Qu.:   0.0000   1st Qu.:   0.00  
##  Mode  :character   Median :  0.0000   Median :   0.0000   Median :   0.00  
##                     Mean   :  0.0168   Mean   :   0.1557   Mean   :  12.06  
##                     3rd Qu.:  0.0000   3rd Qu.:   0.0000   3rd Qu.:   0.50  
##                     Max.   :583.0000   Max.   :1700.0000   Max.   :5000.00  
##   PROPDMGEXP           CROPDMG         CROPDMGEXP       
##  Length:902297      Min.   :  0.000   Length:902297     
##  Class :character   1st Qu.:  0.000   Class :character  
##  Mode  :character   Median :  0.000   Mode  :character  
##                     Mean   :  1.527                     
##                     3rd Qu.:  0.000                     
##                     Max.   :990.000
str(storm_data)
## 'data.frame':    902297 obs. of  7 variables:
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...

Calibrating the numbers

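The PROPDMGEXP and CROPDMGEXP columns hold the order of magnitude of each damage figure (for example, K for thousands, M for millions, B for billions). To get dollar amounts, we convert each code to a numeric multiplier using the scale below and multiply it by the corresponding PROPDMG or CROPDMG value. Codes that do not match any key in the lookup get a multiplier of 0.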

PROPDM_keys <- c("\"\"" = 10^0, "-" = 10^0, "+" = 10^0, "0" = 10^0, "1" = 10^1, "2" = 10^2, "3" = 10^3, "4" = 10^4, "5" = 10^5, 
                    "6" = 10^6, "7" = 10^7, "8" = 10^8, "9" = 10^9, "H" = 10^2, "K" = 10^3, "M" = 10^6, "B" = 10^9)
CROPDMG_keys <- c("\"\"" = 10^0, "?" = 10^0, "0" = 10^0, "K" = 10^3, "M" = 10^6, "B" = 10^9)

storm_data$PROPDMGEXP = PROPDM_keys[as.character(storm_data$PROPDMGEXP)]
storm_data$PROPDMGEXP[is.na(storm_data$PROPDMGEXP)] <- 0

storm_data$CROPDMGEXP = CROPDMG_keys[as.character(storm_data$CROPDMGEXP)]
storm_data$CROPDMGEXP[is.na(storm_data$CROPDMGEXP)] <- 0

storm_data$PROPDMG_DLLS = storm_data$PROPDMG * storm_data$PROPDMGEXP
storm_data$CROPDMG_DLLS = storm_data$CROPDMG * storm_data$CROPDMGEXP 
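The lookup above relies on indexing a named vector by character values. The sketch below repeats that technique on a toy data frame (made-up numbers and a shortened key table, both assumptions for illustration) to show which codes fall through to the 0 multiplier. Two behaviours are worth knowing: names are matched exactly, so a lower-case "m" does not match "M", and an empty string never matches any name. The key written as "\"\"" above is the two-character string of quote marks, so blank exponents also end up in the NA branch.

```r
# Shortened key table, for illustration only
keys <- c("0" = 10^0, "H" = 10^2, "K" = 10^3, "M" = 10^6, "B" = 10^9)

# Hypothetical sample records, not taken from the NOAA file
toy <- data.frame(dmg = c(25, 2.5, 10, 3),
                  exp = c("K", "M", "", "m"),
                  stringsAsFactors = FALSE)

# Character indexing returns NA for names that are absent,
# and "" never matches any name
toy$mult <- unname(keys[toy$exp])
toy$mult[is.na(toy$mult)] <- 0

toy$dollars <- toy$dmg * toy$mult
print(toy)
```

If blank or lower-case codes should count, one option would be to normalise the column first (for example with toupper()) and handle the empty string explicitly. That would change the totals reported below.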

Summarization

With the damages converted to dollars, we build two summary tables per event type, one for the total human impact and one for the total economic impact:

total_human <- storm_data %>%
  group_by(EVTYPE) %>%
  summarise(total_fatalities = sum(FATALITIES),
            total_injuries = sum(INJURIES),
            total = total_fatalities + total_injuries) %>%
  arrange(desc(total))

total_dlls <- storm_data %>%
  group_by(EVTYPE) %>%
  summarise(total_prop_damage = sum(PROPDMG_DLLS),
            total_crop_damage = sum(CROPDMG_DLLS),
            total = total_prop_damage + total_crop_damage) %>%
  arrange(desc(total))
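For readers who want to check the grouping logic without dplyr, the same per-event totals can be sketched in base R with aggregate(). This is a swapped-in technique on made-up rows, not the code used for the report; only the column names are borrowed from the data set.

```r
# Hypothetical rows, for illustration only
toy <- data.frame(EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
                  FATALITIES = c(1, 0, 2, 3),
                  INJURIES   = c(10, 4, 5, 0),
                  stringsAsFactors = FALSE)

# Sum both measures within each event type
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Combined impact, sorted from largest to smallest
totals$total <- totals$FATALITIES + totals$INJURIES
totals <- totals[order(-totals$total), ]
print(totals)
```

Note that both approaches group on the exact EVTYPE string, so spelling variants of the same event are totalled separately.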

Results

Population Health

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Ranking

First we list the top 10 event types by fatalities and by injuries. Viewed side by side, the two rankings are broadly similar:

total_human %>% arrange(desc(total_fatalities)) %>% head(10)
## # A tibble: 10 × 4
##    EVTYPE         total_fatalities total_injuries total
##    <chr>                     <dbl>          <dbl> <dbl>
##  1 TORNADO                    5633          91346 96979
##  2 EXCESSIVE HEAT             1903           6525  8428
##  3 FLASH FLOOD                 978           1777  2755
##  4 HEAT                        937           2100  3037
##  5 LIGHTNING                   816           5230  6046
##  6 TSTM WIND                   504           6957  7461
##  7 FLOOD                       470           6789  7259
##  8 RIP CURRENT                 368            232   600
##  9 HIGH WIND                   248           1137  1385
## 10 AVALANCHE                   224            170   394

Ranked by injuries, the order shifts slightly:

total_human %>% arrange(desc(total_injuries)) %>% head(10)
## # A tibble: 10 × 4
##    EVTYPE            total_fatalities total_injuries total
##    <chr>                        <dbl>          <dbl> <dbl>
##  1 TORNADO                       5633          91346 96979
##  2 TSTM WIND                      504           6957  7461
##  3 FLOOD                          470           6789  7259
##  4 EXCESSIVE HEAT                1903           6525  8428
##  5 LIGHTNING                      816           5230  6046
##  6 HEAT                           937           2100  3037
##  7 ICE STORM                       89           1975  2064
##  8 FLASH FLOOD                    978           1777  2755
##  9 THUNDERSTORM WIND              133           1488  1621
## 10 HAIL                            15           1361  1376

Summary

Overall, tornadoes have by far the highest human impact:

health_impact <- melt(data.table(head(total_human,10)), id.vars = "EVTYPE", variable.name = "Impact")

ggplot(health_impact, aes(x = reorder(EVTYPE, -value), y = value)) + 
  geom_bar(stat = "identity", aes(fill = Impact), position = "dodge") + 
  ylab("Total Human Impact") + 
  xlab("Event Type") + 
  theme(axis.text.x = element_text(angle=90, hjust=1)) + 
  ggtitle("Top 10 most impactful US Weather Events on Humans") + 
  theme(plot.title = element_text(hjust = 0.5))
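The melt() call above turns the wide summary table (one column per measure) into long format (one row per event type and measure), which is the shape the dodged bar chart needs. The sketch below builds that shape by hand in base R with made-up numbers, so it does not depend on data.table; the column names EVTYPE, Impact and value mirror the call above.

```r
# Hypothetical wide table: one row per event type, one column per measure
wide <- data.frame(EVTYPE           = c("TORNADO", "FLOOD"),
                   total_fatalities = c(3, 0),
                   total_injuries   = c(15, 4),
                   stringsAsFactors = FALSE)

# Long format: the id column is repeated once per measure,
# and the measure values are stacked into a single column
long <- data.frame(EVTYPE = rep(wide$EVTYPE, times = 2),
                   Impact = rep(c("total_fatalities", "total_injuries"),
                                each = nrow(wide)),
                   value  = c(wide$total_fatalities, wide$total_injuries),
                   stringsAsFactors = FALSE)
print(long)
```

With the data in this shape, fill = Impact gives one bar per measure for each event type.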

Economic Impact

Across the United States, which types of events have the greatest economic consequences?

Ranking

If we look at crop and property damage separately, we can see that property damage is mostly caused by flooding.

total_dlls %>% arrange(desc(total_prop_damage)) %>% head(10)
## # A tibble: 10 × 4
##    EVTYPE            total_prop_damage total_crop_damage         total
##    <chr>                         <dbl>             <dbl>         <dbl>
##  1 FLOOD                 144657709800         5661968450 150319678250 
##  2 HURRICANE/TYPHOON      69305840000         2607872800  71913712800 
##  3 TORNADO                56935880674.         414953270  57350833944.
##  4 STORM SURGE            43323536000               5000  43323541000 
##  5 FLASH FLOOD            16822673772.        1421317100  18243990872.
##  6 HAIL                   15730367456.        3025537470  18755904926.
##  7 HURRICANE              11868319010         2741910000  14610229010 
##  8 TROPICAL STORM          7703890550          678346000   8382236550 
##  9 WINTER STORM            6688497251           26944000   6715441251 
## 10 HIGH WIND               5270046295          638571300   5908617595

Crop damage is caused mostly by drought and flood events, but even the largest crop total is nowhere near the largest property damage totals:

total_dlls %>% arrange(desc(total_crop_damage)) %>% head(10)
## # A tibble: 10 × 4
##    EVTYPE            total_prop_damage total_crop_damage         total
##    <chr>                         <dbl>             <dbl>         <dbl>
##  1 DROUGHT                 1046106000        13972566000  15018672000 
##  2 FLOOD                 144657709800         5661968450 150319678250 
##  3 RIVER FLOOD             5118945500         5029459000  10148404500 
##  4 ICE STORM               3944927860         5022113500   8967041360 
##  5 HAIL                   15730367456.        3025537470  18755904926.
##  6 HURRICANE              11868319010         2741910000  14610229010 
##  7 HURRICANE/TYPHOON      69305840000         2607872800  71913712800 
##  8 FLASH FLOOD            16822673772.        1421317100  18243990872.
##  9 EXTREME COLD              67737400         1292973000   1360710400 
## 10 FROST/FREEZE               9480000         1094086000   1103566000
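To put the gap in numbers, the short calculation below compares the largest crop total (DROUGHT) with the largest property total (FLOOD), using the figures printed in the two tables above. The variable names are only for illustration.

```r
# Totals copied from the two tables above
top_crop_total <- 13972566000    # DROUGHT, total_crop_damage
top_prop_total <- 144657709800   # FLOOD, total_prop_damage

# Share of the largest property total represented by the largest crop total
ratio <- top_crop_total / top_prop_total
print(round(ratio, 3))
```

The largest crop total comes to a little under one tenth of the largest property total.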

Summary

When looking at the total economic impact, flooding is the leading cause of combined property and crop damage:

economic_impact <- melt(data.table(head(total_dlls,10)), id.vars = "EVTYPE", variable.name = "Impact")

ggplot(economic_impact, aes(x = reorder(EVTYPE, -value), y = value)) + 
  geom_bar(stat = "identity", aes(fill = Impact), position = "dodge") + 
  ylab("Total Economic Impact") + 
  xlab("Event Type") + 
  theme(axis.text.x = element_text(angle=90, hjust=1)) + 
  ggtitle("Top 10 most impactful US Weather Events on the Economy") + 
  theme(plot.title = element_text(hjust = 0.5))