Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
In this report we present a brief overview of the impact of the main types of severe weather events. More specifically, we focus on their effects on human life and their economic impact. This helps to paint a clearer picture of the total cost of the most destructive severe weather events.

 


Data loading and processing

The data for this project comes from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database and was obtained from the archive Storm Data. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. A detailed description of the data set can be obtained here.

 

Loading the data

The downloaded archive (bzip2) was loaded into a data frame directly using the read.csv function.

download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", 
              destfile = "repdata-data-StormData.csv.bz2")
raw <- read.csv("repdata-data-StormData.csv.bz2", na.strings = "")
dim(raw)
## [1] 902297     37
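
Re-running the report as written downloads the archive again every time. A minimal sketch of a guard that skips the download when the file is already on disk is shown below; the helper name fetch_once is hypothetical and not part of the original analysis.

```r
# Hypothetical helper: download `url` to `destfile` only if the file is not there yet.
fetch_once <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # mode = "wb" keeps the compressed archive byte-for-byte intact on Windows
    download.file(url, destfile = destfile, mode = "wb")
  }
  # Return the path invisibly so the call can be chained or ignored
  invisible(destfile)
}
```

Calling fetch_once() with the URL and destination file used above, followed by the same read.csv call, should give the same data frame while avoiding repeated downloads.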

We can look at the first few rows to get an idea of what this large data set of 902297 observations contains.

head(raw)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0    <NA>       <NA>     <NA>     <NA>          0
## 2 TORNADO         0    <NA>       <NA>     <NA>     <NA>          0
## 3 TORNADO         0    <NA>       <NA>     <NA>     <NA>          0
## 4 TORNADO         0    <NA>       <NA>     <NA>     <NA>          0
## 5 TORNADO         0    <NA>       <NA>     <NA>     <NA>          0
## 6 TORNADO         0    <NA>       <NA>     <NA>     <NA>          0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0    <NA>       <NA>   14.0   100 3   0          0
## 2         NA         0    <NA>       <NA>    2.0   150 2   0          0
## 3         NA         0    <NA>       <NA>    0.1   123 2   0          0
## 4         NA         0    <NA>       <NA>    0.0   100 2   0          0
## 5         NA         0    <NA>       <NA>    0.0   150 2   0          0
## 6         NA         0    <NA>       <NA>    1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP  WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0       <NA> <NA>       <NA>      <NA>
## 2        0     2.5          K       0       <NA> <NA>       <NA>      <NA>
## 3        2    25.0          K       0       <NA> <NA>       <NA>      <NA>
## 4        2     2.5          K       0       <NA> <NA>       <NA>      <NA>
## 5        2     2.5          K       0       <NA> <NA>       <NA>      <NA>
## 6        6     2.5          K       0       <NA> <NA>       <NA>      <NA>
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806    <NA>      1
## 2     3042      8755          0          0    <NA>      2
## 3     3340      8742          0          0    <NA>      3
## 4     3458      8626          0          0    <NA>      4
## 5     3412      8642          0          0    <NA>      5
## 6     3450      8748          0          0    <NA>      6
names(raw)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

 

A little preprocessing

library(dplyr, warn.conflicts = FALSE)
raw <- tbl_df(raw)
names(raw) <- sub("_+", ".", tolower(names(raw)))

For the analysis in this report we use the R package dplyr to turn our huge data frame into a format that is easier to work with. We also change the column names to lower case and replace the underscores with periods, in line with common R naming conventions.
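
To make the effect of the renaming concrete, the sketch below applies the same sub() call to a handful of the original column names (copied from the names(raw) output above). The variable names old and new are only for illustration.

```r
# Sample of the original column names, taken from the names(raw) output.
old <- c("STATE__", "BGN_DATE", "COUNTY_END", "LONGITUDE_", "REFNUM")

# sub() replaces only the first run of underscores in each name with a period.
new <- sub("_+", ".", tolower(old))
print(new)
# [1] "state."     "bgn.date"   "county.end" "longitude." "refnum"
```

Note that a trailing underscore becomes a trailing period (for example "state."). Because every column name in this data set contains at most one run of underscores, sub() is sufficient here; gsub() would be needed for names with several separate underscores.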

 


Results

There are two major questions that we are trying to address in this report:
1. Identify the weather events that are the most harmful in terms of population health.
2. Identify the weather events that have the greatest economic consequences.

 

Most harmful events, with respect to population health

As we can see above, our data set contains two columns that give the number of fatalities and injuries, respectively, for each recorded severe weather event. We can use this information to aggregate the total number of fatalities and injuries for each event type and then order the event types by their total toll on human life.

harm <- summarise(group_by(raw, evtype), deaths = sum(fatalities), injuries = sum(injuries), 
                  totaltoll = sum(fatalities) + sum(injuries))
harm <- arrange(harm, desc(totaltoll))
harm[1:15, ]
## Source: local data frame [15 x 4]
## 
##               evtype deaths injuries totaltoll
##               (fctr)  (dbl)    (dbl)     (dbl)
## 1            TORNADO   5633    91346     96979
## 2     EXCESSIVE HEAT   1903     6525      8428
## 3          TSTM WIND    504     6957      7461
## 4              FLOOD    470     6789      7259
## 5          LIGHTNING    816     5230      6046
## 6               HEAT    937     2100      3037
## 7        FLASH FLOOD    978     1777      2755
## 8          ICE STORM     89     1975      2064
## 9  THUNDERSTORM WIND    133     1488      1621
## 10      WINTER STORM    206     1321      1527
## 11         HIGH WIND    248     1137      1385
## 12              HAIL     15     1361      1376
## 13 HURRICANE/TYPHOON     64     1275      1339
## 14        HEAVY SNOW    127     1021      1148
## 15          WILDFIRE     75      911       986
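
The aggregation above can be illustrated on a toy example. The sketch below uses base R's aggregate() instead of dplyr so that it runs without extra packages, and the numbers in toy are made up for illustration only.

```r
# Toy records (made-up numbers), one row per recorded event.
toy <- data.frame(
  evtype     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  fatalities = c(2, 1, 0, 3),
  injuries   = c(10, 0, 5, 1)
)

# Sum both columns within each event type
# (a base R counterpart of group_by() followed by summarise()).
toll <- aggregate(cbind(fatalities, injuries) ~ evtype, data = toy, FUN = sum)
toll$totaltoll <- toll$fatalities + toll$injuries

# Order from the largest combined toll to the smallest.
toll <- toll[order(-toll$totaltoll), ]
print(toll)
```

Grouping works on the exact label, which is why closely related labels such as TSTM WIND and THUNDERSTORM WIND appear as separate rows in the table above.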

To get a better view of the scale of comparison, we can plot the total number of both injuries and deaths by event type for the 15 events that carry the greatest impacts.

library(ggplot2, warn.conflicts = FALSE)
library(gridExtra, warn.conflicts = FALSE)
deathsplot <- ggplot(arrange(harm, desc(deaths))[1:15,], aes(x=reorder(evtype, desc(deaths)), y=deaths)) + 
    geom_bar(stat = "identity", fill = "magenta4") + 
    labs(title = "Total number of deaths and injuries, by event", y = "total deaths") + 
    theme(axis.text.x = element_text(angle = 35, hjust = 1), axis.title.x=element_blank())
injrplot <- ggplot(arrange(harm, desc(injuries))[1:15,], aes(x=reorder(evtype, desc(injuries)), y=injuries)) + 
    geom_bar(stat = "identity", fill = "midnightblue") + 
    labs(x = "event type", y = "total injuries") +
    theme(axis.text.x = element_text(angle = 35, hjust = 1))
grid.arrange(deathsplot, injrplot, nrow = 2)

We can see from the figure that tornadoes cause by far the greatest number of both deaths and injuries. Excessive heat is a distant second in terms of total deaths (and in terms of injuries and deaths combined), followed by flash floods and heat. Thunderstorm winds (TSTM WIND), floods and excessive heat cause similar numbers of injuries, well behind tornadoes.

 

Events with greatest economic consequences

Our data set contains variables for the value of the property damage and crop damage caused by each event.

head(select(raw, propdmg, propdmgexp, cropdmg, cropdmgexp), 5)
## Source: local data frame [5 x 4]
## 
##   propdmg propdmgexp cropdmg cropdmgexp
##     (dbl)     (fctr)   (dbl)     (fctr)
## 1    25.0          K       0         NA
## 2     2.5          K       0         NA
## 3    25.0          K       0         NA
## 4     2.5          K       0         NA
## 5     2.5          K       0         NA

The propdmg and cropdmg variables give the base number for the value of property and crop damage, respectively, while propdmgexp and cropdmgexp give the corresponding exponent codes: “K” for thousands of dollars, “M” for millions and “B” for billions. In a few cases these variables take other values (including lower-case “k” and “m”), but we ignore them since they occur so rarely, and the base number in many of those cases is zero anyway.

table(filter(raw, cropdmg != 0)$cropdmgexp)
## 
##     ?     0     2     B     k     K     m     M 
##     0    12     0     7    21 20137     1  1918

We therefore mutate the data frame to add two new variables that convert the base numbers and exponent codes into dollar values. Then we add the two values up and aggregate the total value of the damages for each event type.

raw <- mutate(raw, proptotal = 
                       ifelse(is.na(propdmgexp), 0, propdmg * 
                                  ifelse(propdmgexp == "K", 1000, 
                                         ifelse(propdmgexp == "M", 1000000, 
                                                ifelse(propdmgexp == "B", 1000000000, 0)))))
raw <- mutate(raw, croptotal = 
                    ifelse(is.na(cropdmgexp), 0, cropdmg * 
                               ifelse(cropdmgexp == "K", 1000, 
                                      ifelse(cropdmgexp == "M", 1000000, 
                                             ifelse(cropdmgexp == "B", 1000000000, 0)))))
damages <- summarise(group_by(raw, evtype), totaldamage = sum(proptotal) + sum(croptotal))
damages <- arrange(damages, desc(totaldamage))
damages[1:15, ]
## Source: local data frame [15 x 2]
## 
##               evtype  totaldamage
##               (fctr)        (dbl)
## 1              FLOOD 150319678250
## 2  HURRICANE/TYPHOON  71913712800
## 3            TORNADO  57340613590
## 4        STORM SURGE  43323541000
## 5               HAIL  18752904170
## 6        FLASH FLOOD  17562128610
## 7            DROUGHT  15018672000
## 8          HURRICANE  14610229010
## 9        RIVER FLOOD  10148404500
## 10         ICE STORM   8967041310
## 11    TROPICAL STORM   8382236550
## 12      WINTER STORM   6715441250
## 13         HIGH WIND   5908617560
## 14          WILDFIRE   5060586800
## 15         TSTM WIND   5038935790
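
The nested ifelse() conversion above can alternatively be written with a named lookup vector. The following is only a sketch under one stated assumption: lower-case codes ("k", "m") are treated like their upper-case equivalents, which differs from the analysis above, where they count as zero. The names multiplier and to_dollars are hypothetical.

```r
# Multipliers for the exponent codes described in the text.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert base values and exponent codes to dollars.
# Assumption: lower-case codes are read as upper case; the original
# analysis maps them to 0 instead.
to_dollars <- function(base, code) {
  m <- multiplier[toupper(as.character(code))]
  m[is.na(m)] <- 0  # missing or unrecognised codes contribute nothing
  unname(base * m)
}

to_dollars(c(25, 2.5, 3, 7, 1), c("K", "M", "b", NA, "?"))
# [1] 2.5e+04 2.5e+06 3.0e+09 0.0e+00 0.0e+00
```

A lookup vector keeps the code-to-multiplier mapping in one place, so adding or removing a code does not require another level of nesting. Totals computed this way would differ slightly from the table above because of the lower-case codes.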

To get a better view of the economic consequences of the costliest weather events and how they compare to one another we can build a bar plot of the total damage from the storms.

damagesplot <- ggplot(arrange(damages, desc(totaldamage))[1:15,], 
                      aes(x=reorder(evtype, desc(totaldamage)), y=totaldamage/1000000000))
damagesplot + geom_bar(stat = "identity", fill = "forestgreen") + 
    labs(title = "Total property and crops damage, by event") +
    labs(x = "event type", y = "total value (in billions $)") + 
    theme(axis.text.x = element_text(angle = 35, hjust = 1))

As we can see, floods are the costliest of all, responsible for about $150 billion in damages in the data collected in our data set. Hurricanes/typhoons, tornadoes and storm surges follow with roughly $40 to $70 billion in damages each.

 


Concluding remarks

In this brief report, we have looked at which types of severe weather events have the greatest economic consequences and the biggest impact on human health. We have based our analysis on a data set of select severe weather event records from 1950 to 2011.