library(readr)
## Warning: package 'readr' was built under R version 3.3.3
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.3.3
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.3.3

Synopsis

This analysis explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) Storm Database to learn about the consequences of storms and other severe weather events for public health and the economy. The analysis looks to answer the following two broad questions:

  1. Across the United States, which types of events are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

For the first question, the analysis focuses on two variables that describe the number of fatalities (FATALITIES) and the number of injuries (INJURIES) resulting from a severe weather event. The most harmful events are defined as those that caused the greatest total number of fatalities and injuries during 1950-2011. To answer the second question, the analysis focuses on two sources of economic damage from severe weather events - property damage (PROPDMG) and crop damage (CROPDMG). The most damaging events are defined as those that caused the greatest total damage over the period 1950-2011. To facilitate decision making, the analysis ranks the different types of events (EVTYPE) by severity.


Data Processing

Data are loaded using the read_csv function from the readr package. This function can read compressed files (here a bzip2 archive) directly, without extracting them first:

mydata <- read_csv("data/StormData.csv.bz2", col_names = TRUE)
## Parsed with column specification:
## cols(
##   .default = col_character(),
##   STATE__ = col_double(),
##   COUNTY = col_double(),
##   BGN_RANGE = col_double(),
##   COUNTY_END = col_double(),
##   END_RANGE = col_double(),
##   LENGTH = col_double(),
##   WIDTH = col_double(),
##   F = col_integer(),
##   MAG = col_double(),
##   FATALITIES = col_double(),
##   INJURIES = col_double(),
##   PROPDMG = col_double(),
##   CROPDMG = col_double(),
##   LATITUDE = col_double(),
##   LONGITUDE = col_double(),
##   LATITUDE_E = col_double(),
##   LONGITUDE_ = col_double(),
##   REFNUM = col_double()
## )
## See spec(...) for full column specifications.
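
As a minimal sketch of this loading step on a toy file (the temporary file, its three columns, and the name toy are hypothetical and not part of the original data), the snippet below writes a small bzip2-compressed CSV and reads it back with read_csv. Passing col_types is optional; here it is assumed that only two columns are of interest, so cols_only() is used to skip the rest and avoid column-type guessing:

```r
library(readr)

# Hypothetical toy file standing in for data/StormData.csv.bz2
toy_path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(toy_path, open = "wt")
writeLines(c("EVTYPE,INJURIES,FATALITIES",
             "TORNADO,3,0",
             "HAIL,0,0"), con)
close(con)

# read_csv decompresses based on the .bz2 extension;
# cols_only() keeps just the named columns with the given types
toy <- read_csv(toy_path,
                col_types = cols_only(EVTYPE = col_character(),
                                      INJURIES = col_double()))
```

On the full dataset the same col_types argument could be used to read only the columns needed for the analysis, which would shorten the parsing message shown above.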

Total Injuries

The following code summarizes the total injuries per event type over the period 1950-2011, calculates the number of severe weather events for each event type, and also calculates the average number of injuries per event across different types. The results are printed in tabular form in the order of decreasing severity (only the top 25 most severe event types are shown):

tot_inj <- mydata %>% 
        filter(!is.na(INJURIES)) %>% 
        group_by(EVTYPE) %>% 
        summarize(TotalInjuries = sum(INJURIES), N_Events = n(), Avg_Inj_per_Event = round(TotalInjuries/N_Events, digits=1)) %>%
        arrange(desc(TotalInjuries)) %>% 
        filter(TotalInjuries > 0)
## Warning: package 'bindrcpp' was built under R version 3.3.3
print(tot_inj, n=25)
## # A tibble: 158 x 4
##                EVTYPE TotalInjuries N_Events Avg_Inj_per_Event
##                 <chr>         <dbl>    <int>             <dbl>
##  1            TORNADO         91346    60652               1.5
##  2          TSTM WIND          6957   219944               0.0
##  3              FLOOD          6789    25326               0.3
##  4     EXCESSIVE HEAT          6525     1678               3.9
##  5          LIGHTNING          5230    15755               0.3
##  6               HEAT          2100      767               2.7
##  7          ICE STORM          1975     2006               1.0
##  8        FLASH FLOOD          1777    54278               0.0
##  9  THUNDERSTORM WIND          1488    82563               0.0
## 10               HAIL          1361   288661               0.0
## 11       WINTER STORM          1321    11433               0.1
## 12  HURRICANE/TYPHOON          1275       88              14.5
## 13          HIGH WIND          1137    20212               0.1
## 14         HEAVY SNOW          1021    15708               0.1
## 15           WILDFIRE           911     2761               0.3
## 16 THUNDERSTORM WINDS           908    20843               0.0
## 17           BLIZZARD           805     2719               0.3
## 18                FOG           734      538               1.4
## 19   WILD/FOREST FIRE           545     1457               0.4
## 20         DUST STORM           440      427               1.0
## 21     WINTER WEATHER           398     7026               0.1
## 22          DENSE FOG           342     1293               0.3
## 23     TROPICAL STORM           340      690               0.5
## 24          HEAT WAVE           309       74               4.2
## 25         HIGH WINDS           302     1533               0.2
## # ... with 133 more rows

The following code plots Total Injuries from severe weather events across event types in decreasing order of severity, with the most harmful events at the left. To order the events correctly in the plot, the EVTYPE variable is converted into a factor whose levels are explicitly ordered by decreasing Total Injuries:

tot_inj$EVTYPE <- factor(tot_inj$EVTYPE, 
                         levels = tot_inj$EVTYPE[order(tot_inj$TotalInjuries, decreasing = TRUE)])

ggplot(tot_inj[1:25,], aes(EVTYPE, TotalInjuries)) +  
    theme(axis.text.x=element_text(angle=90,hjust=1)) + geom_col() +
    labs(title = "Total Injuries by Event Type 1950-2011", x = "Event Type", y = "Total Injuries")
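
The effect of the factor conversion can be seen on a small example. The sketch below uses a hypothetical three-row summary (toy and its values are made up for illustration) and compares the default alphabetical levels with the levels produced by the idiom used above. It also shows reorder() from the stats package as one possible alternative; this is not what the analysis uses.

```r
# Hypothetical toy summary, deliberately not sorted by TotalInjuries
toy <- data.frame(EVTYPE = c("HAIL", "TORNADO", "FLOOD"),
                  TotalInjuries = c(10, 500, 60),
                  stringsAsFactors = FALSE)

# Default factor levels are alphabetical: FLOOD, HAIL, TORNADO
default_levels <- levels(factor(toy$EVTYPE))

# Same idiom as in the analysis: levels follow decreasing TotalInjuries
toy$EVTYPE <- factor(toy$EVTYPE,
                     levels = toy$EVTYPE[order(toy$TotalInjuries, decreasing = TRUE)])
ordered_levels <- levels(toy$EVTYPE)  # TORNADO, FLOOD, HAIL

# Alternative: reorder() sorts levels by a score in ascending order,
# so negating the totals gives the same decreasing order
reorder_levels <- levels(reorder(c("HAIL", "TORNADO", "FLOOD"), -c(10, 500, 60)))
```

Because ggplot2 lays out a discrete axis in the order of the factor levels, the bars follow ordered_levels rather than the alphabetical default.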

Because some events are substantially more frequent than others, it is also important to rank the events by the average number of injuries a single event causes. For example, a rare but extremely severe event can result in a relatively high number of injuries per event but a low total (cumulative) number of injuries over the years:

tot_inj <- mydata %>% 
        filter(!is.na(INJURIES)) %>% 
        group_by(EVTYPE) %>% 
        summarize(Avg_Inj_per_Event = round(sum(INJURIES)/n(), digits=1), 
                  N_Events = n(), 
                  TotalInjuries = sum(INJURIES)) %>%
        arrange(desc(Avg_Inj_per_Event)) %>% 
        filter(TotalInjuries > 0) %>%
        print(n=25)
## # A tibble: 158 x 4
##                     EVTYPE Avg_Inj_per_Event N_Events TotalInjuries
##                      <chr>             <dbl>    <int>         <dbl>
##  1               Heat Wave              70.0        1            70
##  2   TROPICAL STORM GORDON              43.0        1            43
##  3              WILD FIRES              37.5        4           150
##  4           THUNDERSTORMW              27.0        1            27
##  5      HIGH WIND AND SEAS              20.0        1            20
##  6         SNOW/HIGH WINDS              18.0        2            36
##  7         GLAZE/ICE STORM              15.0        1            15
##  8       HEAT WAVE DROUGHT              15.0        1            15
##  9 WINTER STORM HIGH WINDS              15.0        1            15
## 10       HURRICANE/TYPHOON              14.5       88          1275
## 11      WINTER WEATHER MIX              11.3        6            68
## 12            EXTREME HEAT               7.0       22           155
## 13  NON-SEVERE WIND DAMAGE               7.0        1             7
## 14                   GLAZE               6.8       32           216
## 15                 TSUNAMI               6.5       20           129
## 16           WINTER STORMS               5.7        3            17
## 17              TORNADO F2               5.3        3            16
## 18      EXCESSIVE RAINFALL               5.2        4            21
## 19      WATERSPOUT/TORNADO               5.2        8            42
## 20               HEAT WAVE               4.2       74           309
## 21     Torrential Rainfall               4.0        1             4
## 22          EXCESSIVE HEAT               3.9     1678          6525
## 23                    HEAT               2.7      767          2100
## 24            MIXED PRECIP               2.6       10            26
## 25           MARINE MISHAP               2.5        2             5
## # ... with 133 more rows
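
The difference between the two rankings can be illustrated with made-up numbers. In the sketch below, the event types "RARE STORM" and "COMMON WIND" and their injury counts are hypothetical. The frequent event type leads on total injuries, while the rare one leads on injuries per event:

```r
# Hypothetical records: one rare, severe event type and one frequent, mild one
toy <- data.frame(
  EVTYPE   = c("RARE STORM", rep("COMMON WIND", 100)),
  INJURIES = c(40, rep(c(1, 0), 50)),
  stringsAsFactors = FALSE
)

total_inj <- tapply(toy$INJURIES, toy$EVTYPE, sum)     # cumulative injuries
n_events  <- tapply(toy$INJURIES, toy$EVTYPE, length)  # number of records
avg_inj   <- total_inj / n_events                      # injuries per event

top_by_total <- names(which.max(total_inj))  # "COMMON WIND": 50 in total vs 40
top_by_avg   <- names(which.max(avg_inj))    # "RARE STORM": 40 per event vs 0.5
```

This is also why many of the top rows in the table above have N_Events equal to 1: a single record determines the whole average.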

Total Fatalities

A similar analysis is conducted for the measure of Total Fatalities:

tot_fat <- mydata %>% 
        filter(!is.na(FATALITIES)) %>% 
        group_by(EVTYPE) %>% 
        summarize(TotalFatalities = sum(FATALITIES), N = n(), Avg_F_per_Event = round(TotalFatalities/N, digits = 1)) %>%
        arrange(desc(TotalFatalities)) %>% 
        filter(TotalFatalities > 0)

print(tot_fat, n=25)
## # A tibble: 168 x 4
##                     EVTYPE TotalFatalities      N Avg_F_per_Event
##                      <chr>           <dbl>  <int>           <dbl>
##  1                 TORNADO            5633  60652             0.1
##  2          EXCESSIVE HEAT            1903   1678             1.1
##  3             FLASH FLOOD             978  54278             0.0
##  4                    HEAT             937    767             1.2
##  5               LIGHTNING             816  15755             0.1
##  6               TSTM WIND             504 219944             0.0
##  7                   FLOOD             470  25326             0.0
##  8             RIP CURRENT             368    470             0.8
##  9               HIGH WIND             248  20212             0.0
## 10               AVALANCHE             224    386             0.6
## 11            WINTER STORM             206  11433             0.0
## 12            RIP CURRENTS             204    304             0.7
## 13               HEAT WAVE             172     74             2.3
## 14            EXTREME COLD             160    655             0.2
## 15       THUNDERSTORM WIND             133  82563             0.0
## 16              HEAVY SNOW             127  15708             0.0
## 17 EXTREME COLD/WIND CHILL             125   1002             0.1
## 18             STRONG WIND             103   3566             0.0
## 19                BLIZZARD             101   2719             0.0
## 20               HIGH SURF             101    725             0.1
## 21              HEAVY RAIN              98  11723             0.0
## 22            EXTREME HEAT              96     22             4.4
## 23         COLD/WIND CHILL              95    539             0.2
## 24               ICE STORM              89   2006             0.0
## 25                WILDFIRE              75   2761             0.0
## # ... with 143 more rows

The results (25 deadliest) are graphically displayed in the following bar chart:

tot_fat$EVTYPE <- factor(tot_fat$EVTYPE, levels = tot_fat$EVTYPE[order(tot_fat$TotalFatalities, decreasing = TRUE)])
ggplot(tot_fat[1:25,], aes(EVTYPE, TotalFatalities)) +  theme(axis.text.x=element_text(angle=90,hjust=1)) + geom_col() + 
    labs(title = "Total Fatalities by Event Type 1950-2011", x = "Event Type", y = "Total Fatalities")

Finally, the average number of fatalities per event across different event types:

tot_fat <- mydata %>% 
        filter(!is.na(FATALITIES)) %>% 
        group_by(EVTYPE) %>% 
        summarize(Avg_F_per_Event = round(sum(FATALITIES)/n(), digits = 1), N = n(), TotalFatalities = sum(FATALITIES)) %>%
        arrange(desc(Avg_F_per_Event)) %>% 
        filter(TotalFatalities > 0) %>%
        print(n=25)
## # A tibble: 168 x 4
##                        EVTYPE Avg_F_per_Event     N TotalFatalities
##                         <chr>           <dbl> <int>           <dbl>
##  1 TORNADOES, TSTM WIND, HAIL            25.0     1              25
##  2              COLD AND SNOW            14.0     1              14
##  3      TROPICAL STORM GORDON             8.0     1               8
##  4      RECORD/EXCESSIVE HEAT             5.7     3              17
##  5               EXTREME HEAT             4.4    22              96
##  6          HEAT WAVE DROUGHT             4.0     1               4
##  7             HIGH WIND/SEAS             4.0     1               4
##  8              MARINE MISHAP             3.5     2               7
##  9              WINTER STORMS             3.3     3              10
## 10        Heavy surf and wind             3.0     1               3
## 11         HIGH WIND AND SEAS             3.0     1               3
## 12                 ROUGH SEAS             2.7     3               8
## 13                 HEAT WAVES             2.5     2               5
## 14    RIP CURRENTS/HEAVY SURF             2.5     2               5
## 15                  HEAT WAVE             2.3    74             172
## 16  UNSEASONABLY WARM AND DRY             2.2    13              29
## 17  HURRICANE OPAL/HIGH WINDS             2.0     1               2
## 18                    TSUNAMI             1.6    20              33
## 19                 HEAVY SEAS             1.5     2               3
## 20       Hypothermia/Exposure             1.3     3               4
## 21               COLD WEATHER             1.2     4               5
## 22                       HEAT             1.2   767             937
## 23             EXCESSIVE HEAT             1.1  1678            1903
## 24                   AVALANCE             1.0     1               1
## 25               COASTALSTORM             1.0     1               1
## # ... with 143 more rows

Economic Consequences

Data Transformations

In this analysis, the economic consequences of a severe weather event are defined as the sum of the property damage (PROPDMG) and the crop damage (CROPDMG) it caused. To add the two, both have to be expressed in the same units. In the original dataset this is not the case: each type of damage is described by two variables - a numeric variable (PROPDMG or CROPDMG) that gives the amount, and a character variable (PROPDMGEXP or CROPDMGEXP) that gives the measurement unit (“K” corresponds to thousands of dollars, “M” to millions, and “B” to billions). Therefore, each two-column measure is first transformed into a new variable expressed in dollars; the dollar amounts for property damage and crop damage are then added up and, for convenience, converted to millions of dollars. Records with a unit code other than “K”, “M”, or “B” (in either case) are treated as entry errors and removed from the analysis; records with a missing unit code are kept:

dmg <- mydata %>% 
        filter(grepl("[KkMmBb]", PROPDMGEXP) | is.na(PROPDMGEXP)) %>%
        mutate(PROPDMGEXP = toupper(PROPDMGEXP)) %>% 
        filter(grepl("[KkMmBb]", CROPDMGEXP) | is.na(CROPDMGEXP)) %>%
        mutate(CROPDMGEXP = toupper(CROPDMGEXP)) %>% 
        mutate(Prop_Dmg_Dollars = ifelse(PROPDMG==0, 0, ifelse(PROPDMG>0 & PROPDMGEXP=="K", PROPDMG*1000, 
                                                        ifelse(PROPDMG>0 & PROPDMGEXP=="M", PROPDMG*1000000, 
                                                        ifelse(PROPDMG>0 & PROPDMGEXP=="B", PROPDMG*1000000000, PROPDMG))))) %>%
        mutate(Crop_Dmg_Dollars = ifelse(CROPDMG==0, 0, ifelse(CROPDMG>0 & CROPDMGEXP=="K", CROPDMG*1000, 
                                                        ifelse(CROPDMG>0 & CROPDMGEXP=="M", CROPDMG*1000000, 
                                                        ifelse(CROPDMG>0 & CROPDMGEXP=="B", CROPDMG*1000000000, CROPDMG))))) %>%
        mutate(Dmg_Dollars_M = (Prop_Dmg_Dollars+Crop_Dmg_Dollars)/1000000)
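
The conversion can also be written with a lookup vector of multipliers instead of nested ifelse calls. The following is only a sketch: the names unit_multiplier and to_dollars are hypothetical, and treating a missing unit code as plain dollars is an assumption. One detail worth keeping in mind when comparing the two approaches: in a nested ifelse, a positive amount with a missing unit code evaluates to NA, and sum() without na.rm = TRUE returns NA for any group that contains such a row.

```r
# Multipliers for the unit codes described above
unit_multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert an amount and its unit code to dollars
to_dollars <- function(amount, unit) {
  mult <- unname(unit_multiplier[toupper(unit)])  # NA for unknown or missing codes
  mult[is.na(unit)] <- 1                          # assumption: no code means plain dollars
  amount * mult
}

# Gives 25000, 2500000, 1e9, 0 and 10 dollars
dollars <- to_dollars(c(25, 2.5, 1, 0, 10), c("K", "m", "B", NA, NA))

# For comparison: a test against a missing code is NA, so the
# condition and hence the ifelse() result are NA as well
na_case <- ifelse(10 > 0 & NA_character_ == "K", 10 * 1000, 10)
```

Inside the pipeline above, such a helper could be called as mutate(Prop_Dmg_Dollars = to_dollars(PROPDMG, PROPDMGEXP)); whether a missing code should count as dollars or be excluded is a judgment call that depends on the dataset documentation.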

Data Analysis

The Total Economic Damage (Total_Econ_Dmg_M) in millions of dollars over the whole period for each event type, along with the number of events (N) and the average damage per event (Avg_Dmg_per_Event_M), is shown below:

dmg_tot <- dmg %>%
        group_by(EVTYPE) %>% 
        summarise(Total_Econ_Dmg_M = round(sum(Dmg_Dollars_M), digits = 1), N = n(), Avg_Dmg_per_Event_M = round(Total_Econ_Dmg_M/N, digits=1)) %>% 
        arrange(desc(Total_Econ_Dmg_M)) %>% 
        print(n=25)
## # A tibble: 973 x 4
##                        EVTYPE Total_Econ_Dmg_M      N Avg_Dmg_per_Event_M
##                         <chr>            <dbl>  <int>               <dbl>
##  1          HURRICANE/TYPHOON          71913.7     88               817.2
##  2                STORM SURGE          43323.5    261               166.0
##  3                    DROUGHT          15018.7   2487                 6.0
##  4                  HURRICANE          14610.2    174                84.0
##  5                RIVER FLOOD          10148.4    173                58.7
##  6                  ICE STORM           8967.0   2005                 4.5
##  7             TROPICAL STORM           8382.2    690                12.1
##  8               WINTER STORM           6715.4  11432                 0.6
##  9                  HIGH WIND           5908.6  20210                 0.3
## 10                   WILDFIRE           5060.6   2761                 1.8
## 11                  TSTM WIND           5047.0 219943                 0.0
## 12           STORM SURGE/TIDE           4642.0    148                31.4
## 13             HURRICANE OPAL           3191.8      9               354.6
## 14           WILD/FOREST FIRE           3108.6   1457                 2.1
## 15  HEAVY RAIN/SEVERE WEATHER           2500.0      2              1250.0
## 16 TORNADOES, TSTM WIND, HAIL           1602.5      1              1602.5
## 17                 HEAVY RAIN           1427.6  11723                 0.1
## 18               EXTREME COLD           1360.7    655                 2.1
## 19        SEVERE THUNDERSTORM           1205.6     13                92.7
## 20               FROST/FREEZE           1103.6   1342                 0.8
## 21                 HEAVY SNOW           1067.2  15705                 0.1
## 22                   BLIZZARD            771.3   2719                 0.3
## 23                 WILD FIRES            624.1      4               156.0
## 24                    TYPHOON            601.1     11                54.6
## 25             EXCESSIVE HEAT            500.2   1678                 0.3
## # ... with 948 more rows

The following graph shows the Total Economic Damage caused by each event type over the period of 1950-2011 in the order of decreasing importance (from the most damaging at the left to the least damaging to the right):

dmg_tot$EVTYPE <- factor(dmg_tot$EVTYPE, levels = dmg_tot$EVTYPE[order(dmg_tot$Total_Econ_Dmg_M, decreasing = TRUE)])        

ggplot(dmg_tot[1:25,], aes(EVTYPE, Total_Econ_Dmg_M)) +  
    theme(axis.text.x=element_text(angle=90,hjust=1)) + geom_col() +
    labs(title = "Total Economic Damage by Event Type 1950-2011", x = "Event Type", y = "Total Economic Damage")

Because some events are much more frequent than others, the following table is also useful: it ranks the event types by the average damage caused by a single event:

dmg_event <- dmg %>% group_by(EVTYPE) %>% 
        summarise(Avg_Dmg_per_Event_M = round(sum(Dmg_Dollars_M)/n(), digits=1), N = n(), Total_Econ_Dmg_M = round(sum(Dmg_Dollars_M), digits=1)) %>% 
        arrange(desc(Avg_Dmg_per_Event_M)) %>%
        print(n=25)
## # A tibble: 973 x 4
##                        EVTYPE Avg_Dmg_per_Event_M     N Total_Econ_Dmg_M
##                         <chr>               <dbl> <int>            <dbl>
##  1 TORNADOES, TSTM WIND, HAIL              1602.5     1           1602.5
##  2  HEAVY RAIN/SEVERE WEATHER              1250.0     2           2500.0
##  3          HURRICANE/TYPHOON               817.2    88          71913.7
##  4             HURRICANE OPAL               354.6     9           3191.8
##  5                STORM SURGE               166.0   261          43323.5
##  6                 WILD FIRES               156.0     4            624.1
##  7          EXCESSIVE WETNESS               142.0     1            142.0
##  8  HURRICANE OPAL/HIGH WINDS               110.0     1            110.0
##  9        SEVERE THUNDERSTORM                92.7    13           1205.6
## 10                  HURRICANE                84.0   174          14610.2
## 11                  HAILSTORM                80.3     3            241.0
## 12    COLD AND WET CONDITIONS                66.0     1             66.0
## 13    WINTER STORM HIGH WINDS                65.0     1             65.0
## 14                RIVER FLOOD                58.7   173          10148.4
## 15             HURRICANE ERIN                56.3     7            394.1
## 16                    TYPHOON                54.6    11            601.1
## 17            HURRICANE EMILY                50.0     1             50.0
## 18            DAMAGING FREEZE                45.0     6            270.1
## 19                Early Frost                42.0     1             42.0
## 20                MAJOR FLOOD                35.0     3            105.0
## 21           STORM SURGE/TIDE                31.4   148           4642.0
## 22             River Flooding                26.8     5            134.2
## 23            HIGH WINDS/COLD                23.5     5            117.5
## 24           FLOOD/RAIN/WINDS                18.8     6            112.8
## 25            Damaging Freeze                17.1     2             34.1
## # ... with 948 more rows

Results

The analysis conducted here provides exploratory insights into which types of severe weather events cause the most injuries, which events are the deadliest, and which events cause the most economic damage.

In the context of harm to population health, the analysis shows that tornadoes caused the greatest number of injuries (91346) between 1950 and 2011. There were 60652 such events, causing 1.5 injuries per event on average. Some other event types were more severe per event, causing up to 70 injuries per event (such as Heat Wave), but they are relatively rare.

Tornado also leads as the deadliest type of event in terms of Total Fatalities (5633), followed by Excessive Heat with a cumulative 1903 fatalities and Flash Flood with 978 fatalities. These are not the deadliest events in terms of average fatalities per event, but they are quite frequent. Much less frequent (only one occurrence each between 1950 and 2011) but more severe in terms of fatalities per event are the “TORNADOES, TSTM WIND, HAIL”, “COLD AND SNOW”, and “TROPICAL STORM GORDON” events.

Finally, Hurricane/Typhoon and Storm Surge lead the ranking of the most severe weather events on the measure of Total Economic Damage, accounting for 71,913.7 and 43,323.5 million dollars of total economic damage respectively. These events are relatively frequent (88 and 261 records), causing on average 817.2 and 166.0 million dollars of damage per event. On the other hand, there are infrequent event types with even larger economic consequences per event: “TORNADOES, TSTM WIND, HAIL” (1 record) and “HEAVY RAIN/SEVERE WEATHER” (2 records) averaged 1602.5 and 1250.0 million dollars per event respectively.