Synopsis

Storms and other severe weather events can wreak havoc on communities and municipalities. Preventing fatalities, injuries and property damage is a key concern for municipal managers. In this report we describe which types of severe weather events across the United States are most harmful to population health, and which have the greatest economic consequences.

For this analysis we used data from the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States from 1950 to 2011.

The study found that tornados are the biggest threat to human life overall, while floods cause the greatest total economic damage.

Data Processing

Preparation

library(plyr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:plyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)

Load the dataset

The raw data set is read once and never modified afterwards, so this slow decompression step can be cached rather than repeated on every run.

stormdata.raw <- read.csv(bzfile("repdata_data_StormData.csv.bz2"),header = TRUE)
dim(stormdata.raw)
## [1] 902297     37
str(stormdata.raw)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
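
One way to avoid re-reading the compressed file on every run is to keep a cached copy on disk. The following is a minimal sketch, not the method used in this report; the helper name `load_storm_data` and the cache file name `stormdata_raw.rds` are hypothetical.

```r
# Hypothetical helper: read the compressed CSV once, then reuse a cached .rds copy.
load_storm_data <- function(src = "repdata_data_StormData.csv.bz2",
                            cache = "stormdata_raw.rds")
{
   # reuse the cached copy when it already exists
   if (file.exists(cache))
   {
      return(readRDS(cache))
   }
   # otherwise decompress and parse the CSV, then store the result for later runs
   data <- read.csv(bzfile(src), header = TRUE)
   saveRDS(data, cache)
   return(data)
}
```

With this helper, `stormdata.raw <- load_storm_data()` would decompress the .bz2 file on the first run only and read the cached copy afterwards.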

Strip the unnecessary columns from the dataset, and add a new calculated column: the total economic cost of damage to property and crops, using the exponent values from the two exponent columns:

# create a function to calculate the damage values using the exponent columns

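CalcExponent <- function(value, exp)
{
   # normalise the exponent code to upper-case text so "k" and "K" are treated alike
   exp <- toupper(as.character(exp))
   # H, K, M and B map to powers of ten; any other code (blank, digits,
   # "+", "-", "?") gets a power of 0, so the value is used as recorded
   power <- ifelse(exp == "H", 2,
            ifelse(exp == "K", 3,
            ifelse(exp == "M", 6,
            ifelse(exp == "B", 9, 0))))
   return(value * 10 ^ power)
}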

# create a dataset containing only the relevant columns, including the total calculated damage

columns <- c( "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
stormdata.selected <- stormdata.raw[, columns]
stormdata.selected <- mutate(stormdata.selected, DAMAGE = as.numeric(CalcExponent(PROPDMG, PROPDMGEXP) + 
                                                                     CalcExponent(CROPDMG, CROPDMGEXP)))
head(stormdata.selected)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP DAMAGE
## 1 TORNADO          0       15    25.0          K       0             25000
## 2 TORNADO          0        0     2.5          K       0              2500
## 3 TORNADO          0        2    25.0          K       0             25000
## 4 TORNADO          0        2     2.5          K       0              2500
## 5 TORNADO          0        2     2.5          K       0              2500
## 6 TORNADO          0        6     2.5          K       0              2500
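
The exponent codes act as powers of ten: H is 10^2, K is 10^3, M is 10^6 and B is 10^9. A PROPDMG of 25 with exponent K is therefore 25 * 10^3 = 25000, as in the first row above. The sketch below expresses the same mapping as a named lookup vector; `exponent_power` and `calc_exponent_lookup` are hypothetical names and are not used elsewhere in this report.

```r
# Hypothetical lookup table: exponent code -> power of ten
exponent_power <- c(H = 2, K = 3, M = 6, B = 9)

calc_exponent_lookup <- function(value, exp)
{
   # look up each code; codes missing from the table come back as NA
   power <- exponent_power[toupper(as.character(exp))]
   # unknown codes (blank, digits, "+", "-", "?") are treated as 10^0
   power[is.na(power)] <- 0
   return(unname(value * 10 ^ power))
}

# worked example: thousands, millions, billions and a blank code
calc_exponent_lookup(c(25, 2.5, 1.2, 7), c("K", "m", "B", ""))
```

A lookup vector keeps the code-to-power mapping in one place, which makes it easier to audit than nested `ifelse` calls.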

Summarize the data per event type and extract the 10 largest contributors for each indicator:

# prepare the top 10 fatalities contributors

stormdata.fatalities <- aggregate(FATALITIES ~ EVTYPE, data = stormdata.selected, FUN="sum")
stormdata.fatalities.top10 <- stormdata.fatalities[order(-stormdata.fatalities$FATALITIES), ][1:10, ]
stormdata.fatalities.top10
##             EVTYPE FATALITIES
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816
## 856      TSTM WIND        504
## 170          FLOOD        470
## 585    RIP CURRENT        368
## 359      HIGH WIND        248
## 19       AVALANCHE        224
# prepare the top 10 injuries contributors

stormdata.injuries <- aggregate(INJURIES ~ EVTYPE, data = stormdata.selected,  FUN="sum")
stormdata.injuries.top10 <- stormdata.injuries[order(-stormdata.injuries$INJURIES), ][1:10, ]
stormdata.injuries.top10
##                EVTYPE INJURIES
## 834           TORNADO    91346
## 856         TSTM WIND     6957
## 170             FLOOD     6789
## 130    EXCESSIVE HEAT     6525
## 464         LIGHTNING     5230
## 275              HEAT     2100
## 427         ICE STORM     1975
## 153       FLASH FLOOD     1777
## 760 THUNDERSTORM WIND     1488
## 244              HAIL     1361
# prepare the top 10 economic contributors

stormdata.damage <- aggregate(DAMAGE ~ EVTYPE, data = stormdata.selected,  FUN="sum")
stormdata.damage.top10 <- stormdata.damage[order(-stormdata.damage$DAMAGE), ][1:10, ]
stormdata.damage.top10
##                EVTYPE       DAMAGE
## 170             FLOOD 150319678257
## 411 HURRICANE/TYPHOON  71913712800
## 834           TORNADO  57352114049
## 670       STORM SURGE  43323541000
## 244              HAIL  18758222016
## 153       FLASH FLOOD  17562129167
## 95            DROUGHT  15018672000
## 402         HURRICANE  14610229010
## 590       RIVER FLOOD  10148404500
## 427         ICE STORM   8967041360
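
The three aggregation steps above follow the same pattern: sum one column per event type, sort in descending order, and keep the first 10 rows. As a sketch, the pattern could be wrapped in a small helper. `top_events` is a hypothetical name, and the toy data frame only stands in for `stormdata.selected`.

```r
# Hypothetical helper: total one numeric column per EVTYPE and keep the n largest
top_events <- function(data, column, n = 10)
{
   totals <- aggregate(data[[column]], by = list(EVTYPE = data$EVTYPE), FUN = sum)
   names(totals)[2] <- column
   # sort descending by the summed column and keep the first n rows
   return(head(totals[order(-totals[[column]]), ], n))
}

# toy data standing in for stormdata.selected
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
                  FATALITIES = c(3, 1, 2, 0))
top_events(toy, "FATALITIES", n = 2)
```

Applied to the real data, `top_events(stormdata.selected, "DAMAGE")` would be expected to give the same ranking as the explicit `aggregate` and `order` calls.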

Results

Across the United States, which types of events are most harmful with respect to population health?

The following bar charts display the top 10 contributors with regard to fatalities and injuries, respectively:

par(mfrow = c(1,1), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8)
barplot(stormdata.fatalities.top10$FATALITIES, 
        names.arg = stormdata.fatalities.top10$EVTYPE, 
        main = "Top 10 Causes of Fatalities", 
        ylab = "Number of Fatalities",
        las = 3)

barplot(stormdata.injuries.top10$INJURIES, 
        names.arg = stormdata.injuries.top10$EVTYPE, 
        main = "Top 10 Causes of Injuries", 
        ylab = "Number of Injuries",
        las = 3)

Conclusion

From the above it is clear that tornados are by far the most harmful in terms of both fatalities (5,633) and injuries (91,346).
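
To put a number on "by far", the sketch below computes the tornado share of the top 10 totals, using the values printed in the Data Processing section. Note that this is a share of the top 10 only, not of all recorded events; the variable names are illustrative.

```r
# top 10 totals copied from the tables in the Data Processing section
fatalities <- c(TORNADO = 5633, "EXCESSIVE HEAT" = 1903, "FLASH FLOOD" = 978,
                HEAT = 937, LIGHTNING = 816, "TSTM WIND" = 504, FLOOD = 470,
                "RIP CURRENT" = 368, "HIGH WIND" = 248, AVALANCHE = 224)
injuries <- c(TORNADO = 91346, "TSTM WIND" = 6957, FLOOD = 6789,
              "EXCESSIVE HEAT" = 6525, LIGHTNING = 5230, HEAT = 2100,
              "ICE STORM" = 1975, "FLASH FLOOD" = 1777,
              "THUNDERSTORM WIND" = 1488, HAIL = 1361)

# tornado share within each top 10, as a fraction between 0 and 1
share.fatalities <- unname(fatalities["TORNADO"] / sum(fatalities))
share.injuries <- unname(injuries["TORNADO"] / sum(injuries))
round(100 * c(fatalities = share.fatalities, injuries = share.injuries), 1)
```

Within the top 10, tornados account for about 47% of the fatalities and about 73% of the injuries.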

Across the United States, which types of events have the greatest economic consequences?

The following bar chart displays the top 10 contributors with regard to economic consequences:

par(mfrow = c(1,1), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8)
barplot(stormdata.damage.top10$DAMAGE / 1000000, 
        names.arg = stormdata.damage.top10$EVTYPE, 
        main = "Top 10 Causes of Economic Damage", 
        ylab = "Damage (Million USD)",
        las = 3)

Conclusion

From the above it is clear that events recorded as FLOOD are the single biggest contributor to economic damage, at roughly 150 billion USD in combined property and crop damage.
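
The event labels in the damage table overlap: HURRICANE/TYPHOON and HURRICANE are listed separately. As a rough robustness check, and under the assumption (not made in this report) that these two labels describe the same kind of event, the sketch below adds them together using the totals printed in the Data Processing section and compares the result with FLOOD.

```r
# damage totals in USD, copied from the top 10 table in the Data Processing section
damage <- c(FLOOD = 150319678257, "HURRICANE/TYPHOON" = 71913712800,
            HURRICANE = 14610229010)

# assumption: both hurricane labels refer to the same kind of event
hurricane.combined <- unname(damage["HURRICANE/TYPHOON"] + damage["HURRICANE"])
flood <- unname(damage["FLOOD"])

# compare the two totals in billions of USD
round(c(flood = flood, hurricane.combined = hurricane.combined) / 1e9, 1)
```

Even combined, the hurricane labels come to about 86.5 billion USD, which is still well below the 150.3 billion USD recorded for FLOOD.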