Pareto Principle Applied to Weather, Health, and Economics: Data Analysis of Severe Weather and its Effects on U.S. Population Health and Economic Indicators (based on NOAA storm data from 1950-2011)

Synopsis

Severe weather events are costly in terms of lives and property, and municipalities can suffer for years from the impact of a single event. This analysis looks at 48 severe weather events and their effects on population health and economic indicators. The purpose of this study is to provide information to state, county, and city planners so they can prioritize resources and improve their emergency preparedness programs. National Oceanographic and Atmospheric Administration (NOAA) storm data collected from 1950 to 2011 provided the historical information. Preliminary results of this analysis indicate that weather impacts abide by the Pareto Principle and a small number of events (between four and eight) are responsible for approximately 80% of impacts. This insight allows planners to focus on the weather events most likely to occur in their region. Population health is most susceptible to tornadic, extreme heat, and tropical storm events. Property and crop damage is most susceptible to flood, hurricane, tornadic, and drought events. This study also provides a full summary of all 48 event types along with the aggregate impact on health and economics over the 60-years of data collected.

Data Processing

Data Collection

This analysis is based on a sample of severe weather events occurring in the United States between 1950 and 2011 [1]. NOAA collected this data to include numerical and categorical information as well as event description narratives. The data were downloaded from a site provided for educational use through Johns Hopkins University [2] using R programming software version 3.1.0 (2014-04-10, “Spring Dance”) [3].

The data itself was made available in a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The data contained 902,297 weather event observations and 37 variables.

Exploratory Analysis

This analysis was executed on a Windows i386-w64-mingw32/i386 (32-bit) platform. One package was used that is not included in the R base packages and must be installed:

xtable (version 1.7-3)

Reading the raw data file

Analysis began by loading the data into the R Programming environment. This analysis begins with the raw data in its original .csv.bz2 so as to enable reproducible research. This code chunk reads in the data and provides a view of the data as revealed by the variable names.

## Unzip and read in NOAA bzip2 file
StormData <- read.csv(bzfile("repdata-data-StormData.csv.bz2"))  # read in csv
names(StormData)  # take a look at names

##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

This analysis addresses two fundamental questions:

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?

The purpose of this analysis is to provide insights to municipality planners in order to prioritize resources and improve emergency preparedness programs. Based on these questions and purpose, the original dataframe of 37 variables was reduced to seven variables of interest: EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP.

storm <- StormData[, c(8, 23:28)]  # subset dataframe
names(storm) <- tolower(colnames(storm))  # reset variables using tidy data
## Get overall appreciation for data
str(storm)

## 'data.frame':    902297 obs. of  7 variables:
##  $ evtype    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ fatalities: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ injuries  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ propdmg   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ propdmgexp: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ cropdmg   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ cropdmgexp: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...

rm(StormData)  # view structure and clean up environment

Inspecting for missing and bad data

Exploratory analysis has as one of its purposes the goal of becoming familiar with the data and identifying any pre-processing needs prior to further analysis. First, the structure of the data is satisfactory.

bad <- is.na(storm)  # identify 'NA' in the dataset (i.e., missing data)
table(bad)  # view results

## bad
##   FALSE 
## 6316079

rm(bad)  # clean up R environment

A look for “NA” within observations shows no missing values in the storm dataframe because all (902297 obs.) * (7 variables) = 6,316,079 are FALSE. However, there is some concern about some of the levels under propdmgexp and cropdmgexp. A table() call of these two factors reveals some anomalies (possible bad data)

# Create tables of expontial data to see occurences of the factor levels
table(storm$propdmgexp)

## 
##             -      ?      +      0      1      2      3      4      5 
## 465934      1      8      5    216     25     13      4      4     28 
##      6      7      8      B      h      H      K      m      M 
##      4      5      1     40      1      6 424665      7  11330

table(storm$cropdmgexp)

## 
##             ?      0      2      B      k      K      m      M 
## 618413      7     19      1      9     21 281832      1   1994

Transforming variables for property and crop damage

The variables propdmgexp and cropdmgexp will need a transformation to mathematically standardize the numerical variables propdmg and cropdmg - specifically those levels labeled “-”, “?”, and “+” will need to be replaced and the labels for billion (B), million (m or M), thousand (k or K), and hundred (h or H) will need to be replaced with a number to allow scaling of the damage level.

Before eliminating or transforming the data one should understand the potential impacts:

options(scipen = 6)
look <- storm[storm$propdmgexp == "+" | storm$propdmgexp == "-" | storm$propdmgexp == 
    "?" | storm$cropdmgexp == "?", ]
look <- look[order(look$propdmgexp), ]
look[, c(1, 4:7)]

##                     evtype propdmg propdmgexp cropdmg cropdmgexp
## 192467   FLASH FLOOD WINDS    0.41                  0          ?
## 220300  THUNDERSTORM WINDS    0.00                  0          ?
## 229327           HIGH WIND   15.00          -       0           
## 198689  THUNDERSTORM WINDS    0.00          ?       0           
## 225254         FLASH FLOOD    0.00          ?       0           
## 227409         FLASH FLOOD    0.00          ?       0           
## 232016   THUNDERSTORM WIND    0.00          ?       0           
## 233746                HAIL    0.00          ?       0           
## 233747                HAIL    0.00          ?       0           
## 233748                HAIL    0.00          ?       0           
## 247617  THUNDERSTORM WINDS    0.00          ?       0           
## 188780    BREAKUP FLOODING   20.00          +       0           
## 189001           HIGH WIND   20.00          +       0           
## 192262 FLOODING/HEAVY RAIN    2.00          +       0           
## 216755          HIGH WINDS   15.00          +       0           
## 216802             TORNADO   60.00          +       0           
## 197066  THUNDERSTORM WINDS    0.50          K       0          ?
## 197331  THUNDERSTORM WINDS    0.50          K       0          ?
## 220877   FLOOD/FLASH FLOOD  400.00          K       0          ?
## 242953  THUNDERSTORM WINDS   80.00          K       0          ?
## 232901   FLOOD/FLASH FLOOD    0.50          M       0          ?

rm(look)  # clean up R environment

Based on this subset of data, the following transformations will be made using the procedures provided below:

For propdmgexp variable
- Change “?” to “” {blanks}; this retains event data but there is no monetary value tied to these
- Eliminate “-” and “+” observations; without knowing exponent factor the small dollar values could induce bias
For cropdmgexp variable
- Change “?” to “” {blanks}; this retains event and propdmg data but there is no monetary value tied to these cropdmg

stormMod <- storm  # create new dataframe to retain original
stormMod$propdmgexp <- as.character(stormMod$propdmgexp)  # set as char for gsub
stormMod$cropdmgexp <- as.character(stormMod$cropdmgexp)  # set as char for gsub
stormMod$propdmgexp <- gsub("\\?", "", stormMod$propdmgexp)  #replace ? with blank
stormMod$cropdmgexp <- gsub("\\?", "", stormMod$cropdmgexp)  #replace ? with blank
## Start a vector called remove to gather row numbers of data to remove
remove <- grep("\\+", stormMod$propdmgexp, value = FALSE, ignore.case = TRUE)
moreRemove <- grep("\\-", stormMod$propdmgexp, value = FALSE, ignore.case = TRUE)
remove <- append(remove, moreRemove, after = length(remove))  # combine grep info
stormMod <- stormMod[-remove, ]  # create final dataframe by removing rows
rm(moreRemove, remove)  #clean up R environment

Now the labels propmdgexp and cropdmgexp are changed to numeral characters.

## Replace exp labels for hundred, thousand, million, and billion
stormMod$propdmgexp <- gsub("[hH]", "2", stormMod$propdmgexp)
stormMod$propdmgexp <- gsub("[kK]", "3", stormMod$propdmgexp)
stormMod$propdmgexp <- gsub("[mM]", "6", stormMod$propdmgexp)
stormMod$propdmgexp <- gsub("[bB]", "9", stormMod$propdmgexp)

## Convert single digit to exponential form
for (i in 1:9) {
    stormMod$propdmgexp <- gsub(i, as.numeric(paste("1E", i, sep = "")), stormMod$propdmgexp)
}
stormMod$propdmgexp <- gsub("^[0]", "1", stormMod$propdmgexp)  #fixes the value '0'
stormMod$propdmgexp <- as.numeric(stormMod$propdmgexp)  # converts to numeric
fix.na <- is.na(stormMod$propdmgexp)  # finds blanks converted to 'NA'
stormMod$propdmgexp[fix.na] <- 1  # replaces 'NA' with the number 1

## Creates new column for normalized property damage values
stormMod$propertydamage <- stormMod$propdmg * stormMod$propdmgexp

## Compare original and transformed data for quality check
table(storm$propdmgexp)

## 
##             -      ?      +      0      1      2      3      4      5 
## 465934      1      8      5    216     25     13      4      4     28 
##      6      7      8      B      h      H      K      m      M 
##      4      5      1     40      1      6 424665      7  11330

table(stormMod$propdmgexp)

## 
##          1         10        100       1000      10000     100000 
##     466158         25         20     424669          4         28 
##    1000000   10000000  100000000 1000000000 
##      11341          5          1         40

rm(i, fix.na)  # clean up R environment

Although it is a lot of code to show, the nature of the gsub() function makes it a good choice to simply repeat the transformation for property damage to create good numbers for crop damage.

## Replace exp labels for hundred, thousand, million, and billion
stormMod$cropdmgexp <- gsub("[kK]", "3", stormMod$cropdmgexp)
stormMod$cropdmgexp <- gsub("[mM]", "6", stormMod$cropdmgexp)
stormMod$cropdmgexp <- gsub("[bB]", "9", stormMod$cropdmgexp)

## Convert single digit to exponential form
for (i in 1:9) {
    stormMod$cropdmgexp <- gsub(i, as.numeric(paste("1E", i, sep = "")), stormMod$cropdmgexp)
}
stormMod$cropdmgexp <- gsub("^[0]", "1", stormMod$cropdmgexp)  #fixes the value '0'
stormMod$cropdmgexp <- as.numeric(stormMod$cropdmgexp)  # converts to numeric
fix.na <- is.na(stormMod$cropdmgexp)  # finds blanks converted to 'NA'
stormMod$cropdmgexp[fix.na] <- 1  # replaces 'NA' with the number 1

## Creates new column for normalized property damage values
stormMod$cropdamage <- stormMod$cropdmg * stormMod$cropdmgexp

## Compare original and transformed data for quality check
table(storm$cropdmgexp)

## 
##             ?      0      2      B      k      K      m      M 
## 618413      7     19      1      9     21 281832      1   1994

table(stormMod$cropdmgexp)

## 
##          1        100       1000    1000000 1000000000 
##     618433          1     281853       1995          9

rm(i, fix.na)  # clean up R environment

Lastly, a new variable totaldamage is added to the original data, combining the levels of property and crop damage. The interim variables are removed to cleanup the transformed data.

## Creates new column for normalized property damage values
stormMod$totaldamage <- stormMod$propertydamage + stormMod$cropdamage
## remove all interim property and crop damage variables
stormMod <- stormMod[, -(4:7)]
## table(stormMod$evtype) #commented out as it is very long
message(paste("Number of unique event types in dataset:", length(unique(stormMod$evtype)), 
    sep = " "))

## Number of unique event types in dataset: 983

Transforming variables for event type

The most complex aspect of data transformation during the exploratory phases was the event types. A lengthy description plus all code is provided here to clearly demonstrate the assumptions and decisions made in the transformation process. There were 983 unique event types as determined from the length(unique()) call below. According to the document STORM DATA PREPARATION, NWSI 10-1605 [1], there are only 48 different types of entries permitted to be recorded. Section 2.1 of the document notes:

2.1 Permitted Storm Data Events. The only events permitted in Storm Data are listed in Table 1 of Section 2.1.1. The chosen event name should be the one that most accurately describes the meteorological event leading to fatalities, injuries, damage, etc. However, significant events, such as tornadoes, having no impact or causing no damage, should also be included in Storm Data. See Section 7 for detailed examples.

Therefore, during the exploratory analysis phase the event type variable is transformed to better conform to the 48 permitted entries for event type. This also serves to combine the data into less categories and make the results more useful.

stormMod$evtype <- as.character(stormMod$evtype)
stormMod$evtype <- gsub("^[ ]+", "", stormMod$evtype)  # remove leading whitespace

## Remove lines that are simply summaries containing no data
remove <- grep("[sS]ummar", stormMod$evtype, value = FALSE, ignore.case = TRUE)
remove <- append(remove, grep("\\?", stormMod$evtype, value = FALSE, ignore.case = TRUE), 
    after = length(remove))
remove <- append(remove, grep("^[mM]onthly", stormMod$evtype, value = FALSE, 
    ignore.case = TRUE), after = length(remove))
remove <- append(remove, grep("excessive$", stormMod$evtype, value = FALSE, 
    ignore.case = TRUE), after = length(remove))
remove <- append(remove, grep("^HIGH$", stormMod$evtype, value = FALSE, ignore.case = TRUE), 
    after = length(remove))
remove <- append(remove, grep("SOUTHEAST", stormMod$evtype, value = FALSE, ignore.case = TRUE), 
    after = length(remove))
remove <- append(remove, grep("APACHE COUNTY", stormMod$evtype, value = FALSE, 
    ignore.case = TRUE), after = length(remove))
stormMod <- stormMod[-remove, ]  # eliminates these observations
rm(remove)

The details of the grepl code will show readers the groupings of event types that were combined and transformed into one of the 48 permitted event types.

new <- stormMod
## Good codings of event types
new$swap[grepl("^ASTRO(.*)", new$evtype)] <- "ASTRONOMICAL LOW TIDE"
new$swap[grepl("^AVAL(.*)", new$evtype)] <- "AVALANCHE"
new$swap[grepl("^DUST(.*)L", new$evtype, ignore.case = T)] <- "DUST DEVIL"
dust <- c("sahar(.*)", "^DUSTS(.*)", "^dust sto(.*)")
new$swap[grepl(paste(dust, collapse = "|"), new$evtype, ignore.case = TRUE)] <- "DUST STORM"
heat <- c("^EXCESSIVE HEAT$", "RECORD/EXCESSIVE HEAT")
new$swap[grepl(paste(heat, collapse = "|"), new$evtype, ignore.case = TRUE)] <- "EXCESSIVE HEAT"
blizzard <- c("^BLIZZ(.*)", "^BLOW(.*)", "^RECORD(.*)snow$", "ground blizzard")
new$swap[grepl(paste(blizzard, collapse = "|"), new$evtype, ignore.case = TRUE)] <- "BLIZZARD"
new$swap[grepl("BLOWING DUST", new$evtype)] <- "DUST STORM"
cflood <- c("^Beach e", "^coastal", "cstl", "^tidal flood", "Beach flood")
new$swap[grepl(paste(cflood, collapse = "|"), new$evtype, ignore.case = TRUE)] <- "COASTAL FLOODING"
tstm <- c("^TROPICAL S(.*)", "^TSTM(.*)")
new$swap[grepl(paste(tstm, collapse = "|"), new$evtype, ignore.case = TRUE)] <- "TROPICAL STORM"
new$swap[grepl("TSUNAMI", new$evtype)] <- "TSUNAMI"
cold <- c("^cold", "cool", "^WIND CHILL$", "WIND CHILL/HIGH WIND", "^COLD/WIND CHILL", 
    "COLD WIND CHILL TEMPERATURES", "Unseasonable Cold", "UNSEASONABLY COLD", 
    "UNSEASONAL LOW TEMP")
new$swap[grepl(paste(cold, collapse = "|"), new$evtype, ignore.case = TRUE)] <- "COLD/WIND CHILL"
fog <- c("DENSE FOG", "^FOG", "FOG AND COLD TEMPERATURES", "VOG", "PATCHY DENSE FOG")
new$swap[grepl(paste(fog, collapse = "|"), new$evtype, ignore.case = TRUE)] <- "DENSE FOG"
new$swap[grepl("SMOKE", new$evtype)] <- "DENSE SMOKE"
drought <- c("BELOW NORMAL PRECIPITATION", "DRIEST MONTH", "^DROUGHT$", "DROUGHT/EXCESSIVE HEAT", 
    "^DRY$", "DRY CONDITIONS", "DRY HOT WEATHER", "DRY PATTERN", "DRY SPELL", 
    "DRY WEATHER", "DRYNESS", "EXCESSIVE HEAT/DROUGHT", "EXCESSIVELY DRY", "EXTREME HEAT", 
    "HEAT WAVE DROUGHT", "LACK OF SNOW", "Mild and Dry Pattern", "MILD/DRY PATTERN", 
    "PROLONG WARMTH", "Record dry month", "RECORD DRYNESS", "RECORD LOW RAINFALL", 
    "heat drought", "UNSEASONABLY DRY")
new$swap[grepl(paste(drought, collapse = "|"), new$evtype, ignore.case = TRUE)] <- "DROUGHT"
excold <- c("hyp", "low temperature", "^LOW WIND CHILL", "BITTER WIND CHILL", 
    "BITTER WIND CHILL TEMPERATURES", "Excessive Cold", "Extended Cold", "^Extreme ", 
    "EXTREME/RECORD COLD", "Prolong Cold", "RECORD  COLD", "Record Cold", "RECORD LOW$", 
    "SEVERE COLD", "UNUSUALLY COLD")
new$swap[grepl(paste(excold, collapse = "|"), new$evtype, ignore.case = TRUE)] <- "EXTREME COLD/WIND CHILL"
flash <- c("^flash flood", "floood", "LOCAL FLASH FLOOD", "RAPIDLY RISING WATER", 
    "^street flood", "^small stream and", "^small stream flood", "SMALL STREAM URBAN FLOOD", 
    "SMALL STREAM/URBAN FLOOD", "^Sml Stream Fld", "^STREAM FLOODING$")
new$swap[grepl(paste(flash, collapse = "|"), new$evtype, ignore.case = TRUE)] <- "FLASH FLOOD"
flood <- c("^flood", "^river", "HIGH WATER", "ICE JAM FLOODING", "LAKE FLOOD", 
    "LOCAL FLOOD", "MAJOR FLOOD", "rural flood", "^DAM ")
new$swap[grepl(paste(flood, collapse = "|"), new$evtype, ignore.case = TRUE)] <- "FLOOD"
fog <- c("g fog$", "ice fog")
new$swap[grepl(paste(fog, collapse = "|"), new$evtype, ignore.case = TRUE)] <- "FREEZING FOG"
frost <- c("^frost$", "freeze", "Freezing Spray", "Early Frost", "EARLY FROST", 
    "FIRST FROST", "freezing drizzle", "^freezing rain", "^frost", "glaze")
new$swap[grepl(paste(frost, collapse = "|"), new$evtype, ignore.case = TRUE)] <- "FROST/FREEZE"
new$swap[grepl("^funnel", new$evtype, ignore.case = T)] <- "FUNNEL CLOUD"
hail <- c("DEEP HAIL", "^hail", "LATE SEASON HAIL", "NON SEVERE HAIL", "small hail", 
    "THUNDERSTORM HAIL", "THUNDERSTORM WINDS/ HAIL")
new$swap[grepl(paste(hail, collapse = "|"), new$evtype, ignore.case = TRUE)] <- "HAIL"
heat <- c("^heat w", "^heat$", "ABNORMAL WARMTH", "ABNORMALLY DRY", "^heat/", 
    "record heat", "record high", "^hot", "hot$", "record warm", "red flag", 
    "UNSEASONABLY WARM", "HIGH TEMPERATURE RECORD", "HOT/DRY PATTERN", "UNSEASONABLY HOT", 
    "UNUSUAL WARMTH", "UNUSUALLY WARM", "VERY DRY", "VERY WARM", "WARM DRY CONDITIONS", 
    "WARM WEATHER")
new$swap[grepl(paste(heat, collapse = "|"), new$evtype, ignore.case = TRUE)] <- "HEAT"
hrain <- c("^heavy rain", "heavy prec", "^mud", "^rain", "torrent", "^landsl", 
    "heavy shower", "EXCESSIVE PRECIPITATION", "EXCESSIVE RAIN", "EXCESSIVE WETNESS", 
    "EXTREMELY WET", "Heatburst", "RECORD PRECIPITATION", "RECORD RAINFALL", 
    "RECORD/EXCESSIVE RAINFALL", "PROLONGED RAIN", "LOCALLY HEAVY RAIN", "Metro Storm, May 26", 
    "^hvy rain")
new$swap[grepl(paste(hrain, collapse = "|"), new$evtype, ignore.case = TRUE)] <- "HEAVY RAIN"
hsnow <- c("^heavy snow", "HEAVY LAKE SNOW", "HEAVY MIX", "EXCESSIVE SNOW", 
    "^snow", "HEAVY WET SNOW", "Mountain Snows", "NEAR RECORD SNOW", "RECORD SNOW/COLD", 
    "RECORD SNOWFALL")
new$swap[grepl(paste(hsnow, collapse = "|"), new$evtype, ignore.case = TRUE)] <- "HEAVY SNOW"
surf <- c("^high surf", "^heavy surf", " seas$", "HAZARDOUS SURF", "HEAVY SEAS", 
    "HEAVY SWELLS", "HIGH  SWELLS", "HIGH SEAS", "HIGH SWELLS", "^HIGH TIDES", 
    "HIGH WAVES", "ROGUE WAVE", "ROUGH SURF")
new$swap[grepl(paste(surf, collapse = "|"), new$evtype, ignore.case = TRUE)] <- "HIGH SURF"
hwind <- c("^high wind", "high  wind", "^wnd$", "^non(.*)wind", "^gust", "gradient")
new$swap[grepl(paste(hwind, collapse = "|"), new$evtype, ignore.case = TRUE)] <- "HIGH WIND"
hurr <- c("hurricane", "REMNANTS OF FLOYD", "TYPHOON")
new$swap[grepl(paste(hurr, collapse = "|"), new$evtype, ignore.case = TRUE)] <- "HURRICANE/TYPHOON"
ice <- c("^ice", "^icy", "LIGHT FREEZING RAIN")
new$swap[grepl(paste(ice, collapse = "|"), new$evtype, ignore.case = TRUE)] <- "ICE STORM"
new$swap[grepl("^lake(.*)snow$", new$evtype, ignore.case = T)] <- "LAKE-EFFECT SNOW"
new$swap[grepl("LAKESHORE FLOOD", new$evtype, ignore.case = T)] <- "LAKESHORE FLOOD"
lightn <- c("^lightn", "lighting", "LIGNTNING")
new$swap[grepl(paste(lightn, collapse = "|"), new$evtype, ignore.case = TRUE)] <- "LIGHTNING"
new$swap[grepl("MARINE HAIL", new$evtype, ignore.case = T)] <- "MARINE HAIL"
new$swap[grepl("MARINE HIGH WIND", new$evtype, ignore.case = T)] <- "MARINE HIGH WIND"
new$swap[grepl("MARINE STRONG WIND", new$evtype, ignore.case = T)] <- "MARINE STRONG WIND"
new$swap[grepl("MARINE THUNDERSTORM WIND", new$evtype, ignore.case = T)] <- "MARINE THUNDERSTORM WIND"
new$swap[grepl("rip current", new$evtype, ignore.case = T)] <- "RIP CURRENT"
new$swap[grepl("seiche", new$evtype, ignore.case = T)] <- "SEICHE"
new$swap[grepl("^sleet", new$evtype, ignore.case = T)] <- "SLEET"
new$swap[grepl("TROPICAL DEPRESSION", new$evtype, ignore.case = T)] <- "TROPICAL DEPRESSION"
new$swap[grepl("^VOLCANIC", new$evtype, ignore.case = T)] <- "VOLCANIC ASH"
wspout <- c("^waterspout", "WAYTERSPOUT", "WATER SPOUT")
new$swap[grepl(paste(wspout, collapse = "|"), new$evtype, ignore.case = TRUE)] <- "WATERSPOUT"
wfires <- c("fires$", "^wild(.*)fire$", "BRUSH FIRE")
new$swap[grepl(paste(wfires, collapse = "|"), new$evtype, ignore.case = TRUE)] <- "WILDFIRE"
wstorm <- c("^winter storm", "UNUSUALLY LATE SNOW", "WINTER MIX")
new$swap[grepl(paste(wstorm, collapse = "|"), new$evtype, ignore.case = TRUE)] <- "WINTER STORM"
tornado <- c("^tornado", "microburst", "downburst", "wall cloud", "mirco", "torndao")
new$swap[grepl(paste(tornado, collapse = "|"), new$evtype, ignore.case = TRUE)] <- "TORNADO"
swind <- c("^strong wind", "^wind", "whirlwind")
new$swap[grepl(paste(swind, collapse = "|"), new$evtype, ignore.case = TRUE)] <- "STRONG WIND"
tstorm <- c("^thund", "THUNERSTORM", "TUNDERSTORM", "micoburst", "severe thun", 
    "SEVERE TURBULENCE", "STORM FORCE WINDS", "THUDERSTORM WINDS", "LANDSPOUT", 
    "MARINE TSTM WIND")
new$swap[grepl(paste(tstorm, collapse = "|"), new$evtype, ignore.case = TRUE)] <- "THUNDERSTORM WIND"
surge <- c("^urban", "HIGHWAY FLOODING", "MINOR FLOOD", "storm surge")
new$swap[grepl(paste(surge, collapse = "|"), new$evtype, ignore.case = TRUE)] <- "STORM SURGE/TIDE"
winterwx <- c("BLACK ICE", "Drifting Snow", "EARLY SNOW", "FALLING SNOW/ICE", 
    "FIRST SNOW", "LATE(.*)SNOW", "Late(.*)fall", "^LIGHT SNOW", "MIXED PRECIP", 
    "MODERATE SNOW", "PATCHY ICE", "Seasonal Snowfall", "WINTER WEATHER", "WINTERY MIX", 
    "wintry mix")
new$swap[grepl(paste(winterwx, collapse = "|"), new$evtype, ignore.case = TRUE)] <- "WINTER WEATHER"
rm(surge, tstorm, wfires, wstorm, tornado, dust, heat, blizzard, cflood, tstm, 
    cold, fog, drought, excold, flash, frost, flood, hail, hrain, hsnow, surf, 
    hwind, hurr, ice, lightn, swind, winterwx, wspout)
rm(stormMod)

Lastly, the small number of orphaned event types are gathered up and placed into an event type called OTHER EVENT. This code makes the transformation. The table() call shows the 48 (plus one) list of event types now available after transformation.

## replace all 'oddball items' with 'OTHER EVENT'
bad <- is.na(new$swap)
new[is.na(new)] <- 0
new$swap <- gsub(0, "OTHER EVENT", new$swap)
table(new$swap)  # show final set of event types

## 
##    ASTRONOMICAL LOW TIDE                AVALANCHE                 BLIZZARD 
##                      277                      387                     2765 
##         COASTAL FLOODING          COLD/WIND CHILL                DENSE FOG 
##                      901                      704                     1836 
##              DENSE SMOKE                  DROUGHT               DUST DEVIL 
##                       21                     2601                      151 
##               DUST STORM           EXCESSIVE HEAT  EXTREME COLD/WIND CHILL 
##                      437                     1681                     2047 
##              FLASH FLOOD                    FLOOD             FREEZING FOG 
##                    55071                    26319                       46 
##             FROST/FREEZE             FUNNEL CLOUD                     HAIL 
##                     1857                     6980                   288838 
##                     HEAT               HEAVY RAIN               HEAVY SNOW 
##                     1291                    12521                    16663 
##                HIGH SURF                HIGH WIND        HURRICANE/TYPHOON 
##                     1088                    21929                      301 
##                ICE STORM         LAKE-EFFECT SNOW          LAKESHORE FLOOD 
##                     2149                      659                       23 
##                LIGHTNING              MARINE HAIL         MARINE HIGH WIND 
##                    15770                      442                      135 
##       MARINE STRONG WIND MARINE THUNDERSTORM WIND              OTHER EVENT 
##                       48                     5812                      166 
##              RIP CURRENT                   SEICHE                    SLEET 
##                      777                       21                       78 
##         STORM SURGE/TIDE              STRONG WIND        THUNDERSTORM WIND 
##                     4230                     4229                   109946 
##                  TORNADO      TROPICAL DEPRESSION           TROPICAL STORM 
##                    60920                       60                   221754 
##                  TSUNAMI             VOLCANIC ASH               WATERSPOUT 
##                       20                       29                     3861 
##                 WILDFIRE             WINTER STORM           WINTER WEATHER 
##                     4237                    11442                     8635

## permanently transform the dataframe
new$evtypeoriginal <- new$evtype
new$evtype <- new$swap
new <- new[, -7]
## Create final dataset for analysis and remove temporary dataframe
stormFinal <- new  # create finalized dataset
stormFinal$evtype <- as.factor(stormFinal$evtype)  # make class 'factor'
rm(storm, new)

A summary() call give us an idea of what the final dataframe looks like. Notice the highly left skewed distribution of raw data based on the median = 0 for all numeric variables.

summary(stormFinal)  # get idea of what data looks like

##                evtype         fatalities     injuries     
##  HAIL             :288838   Min.   :  0   Min.   :   0.0  
##  TROPICAL STORM   :221754   1st Qu.:  0   1st Qu.:   0.0  
##  THUNDERSTORM WIND:109946   Median :  0   Median :   0.0  
##  TORNADO          : 60920   Mean   :  0   Mean   :   0.2  
##  FLASH FLOOD      : 55071   3rd Qu.:  0   3rd Qu.:   0.0  
##  FLOOD            : 26319   Max.   :583   Max.   :1700.0  
##  (Other)          :139307                                 
##  propertydamage           cropdamage          totaldamage          
##  Min.   :           0   Min.   :         0   Min.   :           0  
##  1st Qu.:           0   1st Qu.:         0   1st Qu.:           0  
##  Median :           0   Median :         0   Median :           0  
##  Mean   :      474669   Mean   :     54430   Mean   :      529099  
##  3rd Qu.:         500   3rd Qu.:         0   3rd Qu.:        1000  
##  Max.   :115000000000   Max.   :5000000000   Max.   :115032500000  
##                                                                    
##  evtypeoriginal    
##  Length:902155     
##  Class :character  
##  Mode  :character  
##                    
##                    
##                    
##

The final code book accompanying this analysis after transformations is provided in Table 1. This table was coded by hand using HTML and not generated using the xtable() function of R.

Table 1: Codebook for Storm Data Analysis Accounting for Transformations of Original Dataset

Variable Name	Description {class of variable}
evtype	Event type per document STORM DATA PREPARATION, NWSI 10-1605 {factor}
fatalities	Number of fatalities during event {numeric}
injuries	Number of injuries during event {numeric}
propertydamage	Property damage from event in U.S. dollars {numeric}
cropdamage	Crop damage from event in U.S. dollars {numeric}
totaldamage	Addend of property and crop damage {numeric}
evtypeoriginal	Event type as coded in the original data set {factor}

Results

The results are presented in three sections: aggregating the dataset, plotting aggregated data, and interpreting analysis results.

Aggregating the dataset

The dataset stormFinal is a tidy dataset with one row per observation and one column per variable. Analyzing this data through visual means first requires an aggregation of variables of interest by categorical group to enable plotting. Beginning with the full dataset, a temporary dataframe sum is generated by applying the sum() function across each variable and across each event type then merging the new information into sum. This results in a summary of impact by event type as provided in Table 2.

## Calculate the total level of impact by aggregating Event Types Begin with
## fatalities and injuries
sum <- aggregate(stormFinal$fatalities, by = list(stormFinal$evtype), sum)
sumInjuries <- aggregate(stormFinal$injuries, by = list(stormFinal$evtype), 
    sum)
sum <- merge(sum, sumInjuries, by.x = "Group.1", by.y = "Group.1")
## add property damage to dataframe
sumProp <- aggregate(stormFinal$propertydamage, by = list(stormFinal$evtype), 
    sum)
sum <- merge(sum, sumProp, by.x = "Group.1", by.y = "Group.1")
## add crop damage to dataframe
sumCrop <- aggregate(stormFinal$cropdamage, by = list(stormFinal$evtype), sum)
sum <- merge(sum, sumCrop, by.x = "Group.1", by.y = "Group.1")
## add total damage to dataframe
sumTotDamage <- aggregate(stormFinal$totaldamage, by = list(stormFinal$evtype), 
    sum)
sum <- merge(sum, sumTotDamage, by.x = "Group.1", by.y = "Group.1")
## provide appropriate variable names
names(sum) <- c("eventtype", "fatalities", "injuries", "propertydamage", "cropdamage", 
    "totaldamage")
rm(sumCrop, sumInjuries, sumProp, sumTotDamage)

# Install package if not available to R and then open library
if (!"xtable" %in% installed.packages()) {
    install.packages("xtable")
}
library(xtable)
sumtab <- sum  # create temporary dataframe
sumtab[, 4:6] <- sumtab[, 4:6]/1000000  #scale down large numbers
names(sumtab) <- c("Event Type", "Fatalities", "Injuries", "Property Damage ($M)", 
    "Crop Damage ($M)", "Total Damage ($M)")  # give human readable names
tabletwo <- xtable(sumtab, caption = "Table 2: Weather Impacts. Fatalities, Injuries, and Damage Resulting from Severe Weather in the U.S. (1950-2011)", 
    digits = c(0, 0, 0, 0, 2, 2, 2), align = c("l", "l", "r", "r", "r", "r", 
        "r"), html.table.attributes = getOption("xtable.html.table.attributes", 
        "border=1"), html.table.attributes = getOption("xtable.html.table.attributes", 
        "align = center"))
print(tabletwo, type = "html", caption.placement = "top", include.rownames = F)

Table 2: Weather Impacts. Fatalities, Injuries, and Damage Resulting from Severe Weather in the U.S. (1950-2011)
Event Type	Fatalities	Injuries	Property Damage ($M)	Crop Damage ($M)	Total Damage ($M)
ASTRONOMICAL LOW TIDE	0	0	9.74	0.00	9.74
AVALANCHE	225	170	3.72	0.00	3.72
BLIZZARD	103	819	660.83	112.06	772.89
COASTAL FLOODING	10	10	445.20	0.06	445.25
COLD/WIND CHILL	160	60	2.54	101.74	104.29
DENSE FOG	81	1077	22.83	0.00	22.83
DENSE SMOKE	0	0	0.10	0.00	0.10
DROUGHT	2	4	1046.11	13972.57	15018.68
DUST DEVIL	2	43	0.72	0.00	0.72
DUST STORM	22	440	5.62	3.60	9.22
EXCESSIVE HEAT	1920	6525	7.75	492.40	500.16
EXTREME COLD/WIND CHILL	418	415	133.41	1335.02	1468.43
FLASH FLOOD	1019	1785	17414.73	1437.17	18851.90
FLOOD	502	6809	150301.11	10936.19	161237.31
FREEZING FOG	0	0	2.18	0.00	2.18
FROST/FREEZE	20	272	28.92	1997.06	2025.98
FUNNEL CLOUD	0	3	0.19	0.00	0.19
HAIL	15	1371	15977.54	3046.89	19024.43
HEAT	1160	2563	12.46	407.07	419.53
HEAVY RAIN	149	337	3564.10	958.52	4522.62
HEAVY SNOW	142	1139	984.52	134.68	1119.21
HIGH SURF	179	261	101.56	0.00	101.56
HIGH WIND	296	1491	6005.21	686.51	6691.72
HURRICANE/TYPHOON	135	1333	85356.41	5516.12	90872.53
ICE STORM	101	2146	3972.51	5027.11	8999.62
LAKE-EFFECT SNOW	0	0	40.18	0.00	40.18
LAKESHORE FLOOD	0	0	7.54	0.00	7.54
LIGHTNING	817	5232	935.46	12.09	947.55
MARINE HAIL	0	0	0.00	0.00	0.00
MARINE HIGH WIND	1	1	1.30	0.00	1.30
MARINE STRONG WIND	14	22	0.42	0.00	0.42
MARINE THUNDERSTORM WIND	10	26	0.44	0.05	0.49
OTHER EVENT	9	11	0.26	11.03	11.29
RIP CURRENT	577	529	0.16	0.00	0.16
SEICHE	0	0	0.98	0.00	0.98
SLEET	2	0	0.50	0.00	0.50
STORM SURGE/TIDE	54	122	48046.99	10.63	48057.63
STRONG WIND	137	388	188.41	70.81	259.23
THUNDERSTORM WIND	211	2461	6646.75	652.96	7299.71
TORNADO	5661	91393	58559.45	417.48	58976.93
TROPICAL DEPRESSION	0	0	1.74	0.00	1.74
TROPICAL STORM	576	7439	12252.46	1313.60	13566.06
TSUNAMI	33	129	144.06	0.02	144.08
VOLCANIC ASH	0	0	0.50	0.00	0.50
WATERSPOUT	6	72	60.73	0.00	60.73
WILDFIRE	90	1608	8496.63	403.28	8899.91
WINTER STORM	217	1353	6749.00	32.44	6781.44
WINTER WEATHER	67	668	30.88	15.00	45.88

rm(tabletwo, sumtab)

Plotting aggregated data

This table is useful, but visualization is even more useful in analysis. The first plot is provided to present a visual representation of the number of impacts to population health. It also will provide a feel for the distribution of the data as shown in Figure 1.

## Get a feel for what the big picture looks like...just exploring fatalities
par(mar = c(10, 4, 3, 2))
sumexplore <- sum[order(sum$fatalities, decreasing = T), ]
barplot(sumexplore$fatalities, names.arg = sumexplore$eventtype, horiz = F, 
    las = 2, axes = F, cex.names = 0.75, ylim = c(0, 6000), ylab = "Number of Fatalities by Event Type", 
    cex.lab = 1, main = "Exploring Distribution of Fatalities Based on Event Type", 
    cex.main = 1)
axis(2, las = 2, cex.axis = 0.8)

plot of chunk explorespace

Figure 1. Representative Distribution of “fatalities” Variable. This figure provides a view of a highly right-skewed variable aggregated from the pre-processed dataset. Similar highly skewed distributions would be seen for all of the six variables based on an inspection of the data in Table 2. This provides a rationale to further condense the data for a more meaningful analysis.

Analysis proceeds by employing the Pareto Principle, also referred to as the 80-20 rule, and named after the Italian economist Vilfredo Pareto. He observed that 20% of his countrymen owned 80% of the land. This led quality expert Joseph Juran to bring it to fame noting that a large number of events are based on sparsity and the vital few dictate the macro effects [4]. Based on the representative distribution in Figure 1, these appears to be a viable strategy to simplify the results. If the Pareto Principle for severe weather impacts in society, we might expect that 20% of the 49 categories (roughly 5 event types) would account for 80% of the impacts on population health and economics.

This chunk of code generates two dataframes for population health by ordering the data, cutting it at the 80% point, and recombining the remaining 20% into a single category. This transformed data is then plotted in Figure 2 with annotations showing various percentages of the whole.

rm(sumexplore)
## Generate Pareto-like plots showing the Event Types causing the top 80% of
## fatalities and injuries Extract fatality and injury data from summary
fatsort <- sum[, 1:2]
injsort <- sum[, c(1, 3)]

## Create pareto-style charts for both --> fatalities
fatsort <- fatsort[order(fatsort$fatalities, decreasing = T), ]  # order data
twentypercent <- sum(fatsort$fatalities[9:48])  # establish the sum of bottom 20%
eventtype = c("40 Other Events")
fatalities = twentypercent  # temp data frame
tempdf <- data.frame(eventtype, fatalities)
fatsort <- rbind(fatsort, tempdf)
fatsort <- fatsort[c(1:8, 49), ]
fatsort$fatalities <- as.numeric(fatsort$fatalities/1000)

## --> injuries
injsort <- injsort[order(injsort$injuries, decreasing = T), ]
twentypercent <- sum(injsort$injuries[5:48])
eventtype = c("44 Other Events")
injuries = twentypercent
temp <- data.frame(eventtype, injuries)
injsort <- rbind(injsort, temp)
injsort <- injsort[c(1:4, 49), ]
injsort$injuries <- as.numeric(injsort$injuries/1000)

## Generate figure - side-by-side plot
par(mfrow = c(1, 2))
par(mar = c(9, 4, 2, 2) + 0.1, oma = c(0, 0, 3, 0))
options(scipen = -2)

barplot(fatsort$fatalities, names.arg = fatsort$eventtype, horiz = F, las = 2, 
    axes = F, cex.names = 1, ylim = c(0, 10), col = "black", ylab = "Number of Fatalities by Event Type (in thousands)", 
    cex.lab = 1, main = "Eight (8) of 48 Event Types Represent 80% of Fatalities", 
    cex.main = 1)
axis(2, las = 2, cex.axis = 1)
tiles = c(sum(sum$fatalities)/1000 * 0.05, sum(sum$fatalities)/1000 * 0.2, sum(sum$fatalities)/1000 * 
    0.35)
abline(h = tiles, lty = 3)
text(8, tiles, c("5%", "20% of total", "35%"), cex = 1, pos = 3, offset = 0.25)
text(6, 9.5, cex = 1.2, "Note: left panel scale is an \norder of magnitude less than right panel")

barplot(injsort$injuries, names.arg = injsort$eventtype, horiz = F, las = 2, 
    axes = F, cex.names = 1, ylim = c(0, 100), col = "red3", ylab = "Number of Injuries by Event Type (in thousands)", 
    cex.lab = 1, main = "Four (4) of 48 Event Types Represent 80% of Injuries", 
    cex.main = 1)
axis(2, las = 2, cex.axis = 1)
tiles = c(sum(sum$injuries)/1000 * 0.05, sum(sum$injuries)/1000 * 0.2, sum(sum$injuries)/1000 * 
    0.65)
abline(h = tiles, lty = 3)
text(4, tiles, c("5%", "20% of total", "65%"), cex = 1, pos = 3, offset = 0.25)

par(mfrow = c(1, 1))
title(main = "Severe Weather Events with Greatest Impact to Population Health", 
    outer = TRUE)

plot of chunk generateplots

Figure 2. Impacts to Population Health. Two panels showing the impacts to population health in terms of fatalities (left panel) and injuries (right panel). It is worth noting that the left panel uses a y-axis that is one order of magnitude less than the right panel - simply to make the results more readable. Both distributions follow the Pareto Principle with results of 10-15% of event types causing 80% of the impact. All key event types causing injury are seen in roughly the same order as fatalities. A noteworthy difference is the impact of heat, lightning, and rip currents on fatalities.

Similar analysis was conducted on the individual and collective indicators of economic impact. Figure 3 provides similar annotations to show the levels representing 20% of the whole.

rm(eventtype, fatalities, injuries, tiles, twentypercent)
rm(tempdf, temp, fatsort, injsort)
## Generate Pareto-like plots showing the Event Types causing the top 80% of
## property, crop, and total damage

## Extract damage data from summary
propsort <- sum[, c(1, 4)]
cropsort <- sum[, c(1, 5)]
totalsort <- sum[, c(1, 6)]

## Create pareto-style chart with panel for individual and combined damage
## --> property
propsort <- propsort[order(propsort$propertydamage, decreasing = T), ]  # order data
twentypercent <- sum(propsort$propertydamage[5:48])  # sum of bottom 20%
eventtype = c("44 Other Events")
propertydamage = twentypercent  # temp data frame
tempdf <- data.frame(eventtype, propertydamage)
propsort <- rbind(propsort, tempdf)
propsort <- propsort[c(1:4, 49), ]
propsort$propertydamage <- as.numeric(propsort$propertydamage)

## --> crop
cropsort <- cropsort[order(cropsort$cropdamage, decreasing = T), ]  # order data
twentypercent <- sum(cropsort$cropdamage[6:48])  # sum of bottom 20%
eventtype = c("43 Other Events")
cropdamage = twentypercent  # temp data frame
tempdf <- data.frame(eventtype, cropdamage)
cropsort <- rbind(cropsort, tempdf)
cropsort <- cropsort[c(1:5, 49), ]
cropsort$cropdamage <- as.numeric(cropsort$cropdamage)

## --> total damage (property + crop)
totalsort <- totalsort[order(totalsort$totaldamage, decreasing = T), ]  # order data
twentypercent <- sum(totalsort$totaldamage[6:48])  # sum of bottom 20%
eventtype = c("43 Other Events")
totaldamage = twentypercent  # temp data frame
tempdf <- data.frame(eventtype, totaldamage)
totalsort <- rbind(totalsort, tempdf)
totalsort <- totalsort[c(1:5, 49), ]
totalsort$totaldamage <- as.numeric(totalsort$totaldamage)
# 
par(mfrow = c(1, 3))
par(mar = c(13, 4, 4, 2) + 0.1, oma = c(0, 0, 3, 0))

barplot(propsort$propertydamage/1e+09, names.arg = propsort$eventtype, horiz = F, 
    las = 2, axes = F, cex.names = 1.2, ylim = c(0, 200), col = "chocolate", 
    ylab = "Property Damage by Event Type (in billions of USD)", cex.lab = 1.2, 
    main = "Four (4) of 48 Event Types Represent 80% of Property Damage", cex.main = 1.2)
axis(2, las = 2, cex.axis = 1)
tiles = c(sum(sum$propertydamage)/1e+09 * 0.2, sum(sum$propertydamage)/1e+09 * 
    0.35)
abline(h = tiles, lty = 3)
text(4, tiles, c("20% of total", "35%"), cex = 1.2, pos = 3, offset = 0.25)

barplot(cropsort$cropdamage/1e+09, names.arg = cropsort$eventtype, horiz = F, 
    las = 2, axes = F, cex.names = 1.2, ylim = c(0, 20), col = "chartreuse3", 
    ylab = "Crop Damage by Event Type (in billions of USD)", cex.lab = 1.2, 
    main = "Five (5) of 48 Event Types Represent 80% of Crop Damage", cex.main = 1.2)
axis(2, las = 2, cex.axis = 1)
tiles = c(sum(sum$cropdamage)/1e+09 * 0.2, sum(sum$cropdamage)/1e+09 * 0.3)
abline(h = tiles, lty = 3)
text(5, tiles, c("20% of total", "30%"), cex = 1.2, pos = 3, offset = 0.25)
text(4, 18, cex = 1.2, "Note: center panel scale is an order \nof magnitude less than either left or right")

barplot(totalsort$totaldamage/1e+09, names.arg = totalsort$eventtype, horiz = F, 
    las = 2, axes = F, cex.names = 1.2, ylim = c(0, 200), col = "lightskyblue", 
    ylab = "Total Damage by Event Type (in billions of USD)", cex.lab = 1.2, 
    main = "Five (5) of 48 Event Types Represent 80% of Total Damage", cex.main = 1.2)
axis(2, las = 2, cex.axis = 1)
tiles = c(sum(sum$totaldamage)/1e+09 * 0.2, sum(sum$totaldamage)/1e+09 * 0.35)
abline(h = tiles, lty = 3)
text(5, tiles, c("20% of total", "35%"), cex = 1.2, pos = 3, offset = 0.25)

par(mfrow = c(1, 1))
title(main = "Severe Weather Events with Greatest Economic Consequences", outer = TRUE)

plot of chunk economicconsequences

Figure 3. Impacts to Economic Indicators. Three panels showing the impacts to population health in terms of property damage (left panel), crop damage (center panel), and total damage (property and crop combined) (right panel). It is worth noting that the center panel uses a y-axis that is one order of magnitude less than either the left or right panel - simply to make the results more readable. All three distributions follow the Pareto Principle with results of about 10% of event types causing 80% of the impact. Floods, hurricanes, and tornadoes are key events causing damage. Drought and hail have greater impact to crop damage than in property damage.

Interpreting analysis results

At the outset, two questions were presented to direct this analysis.

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?

With respect to the first questions, four event types are key causes of death and injury:

Tornadoes
Tropical Storms
Floods
Excessive Heat

Additionally heat, lightning, and rip currents have a greater impact on fatalities than on injuries.

With respect to the second question, three event types are key causes of property and crop damage:

Floods
Hurricanes
Tornadoes

Crop damage is also uniquely impacted more by drought than any other event type.

Lastly, it is very interesting to see that roughly 10-15% of the 48 severe weather event types are responsible for 80% of the population health and economic impact. This phenomenon may be leveraged by state, county, and city planners in developing their emergency preparedness plans. Focus and depth can be given to prevention and remediation plans for fewer event types rather than trying to address all the the possible severe weather possibilities.

References

[1] National Weather Service Storm Data Documentation. URL https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf

[2] Coursera-Johns Hopkins University. NOAA Storm Data. URL https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2

[3] R Core Team (2014). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/

[4] Pareto Principle. URL http://en.wikipedia.org/wiki/Pareto_principle

APPENDIX (Session and Platform Information)

sessionInfo()

## R version 3.1.0 (2014-04-10)
## Platform: i386-w64-mingw32/i386 (32-bit)
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] xtable_1.7-3 knitr_1.5   
## 
## loaded via a namespace (and not attached):
## [1] evaluate_0.5.5 formatR_0.10   stringr_0.6.2  tools_3.1.0