Performance Testing Version

This is a duplicate version of my National Weather Service stormData analysis, including performance instrumentation to illustrate the runtime of various steps in the analysis. By subsetting the data early in the analysis, it is possible to run the entire script on a Windows-based tablet computer (HP Envy X2, 1.8 GHz Intel Atom processor, 2 GB of RAM) in less than 5 minutes. Machines with a faster CPU, a faster disk, and more memory will complete the stormData analysis in less than 1 minute.

As an experiment we have also installed R and RStudio on an HP Chromebook with specifications similar to the HP Envy X2 to see how well the Chromebook performs relative to other computers.

To establish the operating system on which this specific run of the script was based, we’ll start with a call to the sessionInfo() function.

sessionInfo()
## R version 3.4.1 (2017-06-30)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.2 LTS
## 
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_3.4.1  backports_1.1.0 magrittr_1.5    rprojroot_1.2  
##  [5] tools_3.4.1     htmltools_0.3.6 yaml_2.1.14     Rcpp_0.12.11   
##  [9] stringi_1.1.5   rmarkdown_1.6   knitr_1.16      stringr_1.2.0  
## [13] digest_0.6.12   evaluate_0.10.1

Synopsis

The National Weather Service StormData data set describes the location, event type, and damage information for over 900,000 extreme weather events from 1950 to 2011. This study identifies the event types that are associated with the greatest amount of economic damage, as well as the greatest impact to population health. Because the database only tracked a small proportion of event types between 1950 and 1995, we restricted our analysis to January 1, 1996 through November 30, 2011.

Our analysis found the following across the 48 individual event types tracked between 1996 and 2011:

Understanding the Data

Per the NOAA website, the StormData database only has data for all 48 event types starting in 1996, even though the data set starts in 1950. See https://www.ncdc.noaa.gov/stormevents/details.jsp for a listing of the event types collected over time. Between 1950 and 1954 only Tornado event types were recorded. Between 1955 and 1995, data was collected on Tornado, Thunderstorm Wind, and Hail event types. Starting in 1996, data has been collected for all 48 event types listed in the National Weather Service Directive 10-1605 document.

The unit of analysis that defines an “observation” in the database is a storm event. Each event is assigned a unique reference number. An event has a start date and time, and an end date and time.

Research Questions

For the purposes of the course project, we must answer two questions:

  1. Across the United States, which types of events are most harmful to human health?
  2. Across the United States, which types of events have the greatest economic consequences?

To answer the research questions, we’ll need to complete the following steps.

  1. Decide whether to use the pre-1996 data, which contains only a small subset of total event types
  2. Establish the evaluation criteria for most harmful to health and greatest economic consequences
  3. Clean the event types, which do not exactly match the 48 event types described in Directive 10-1605
  4. Clean the property and crop damage values by applying the exponent factor
  5. Analyze the data and report the results

Consideration 1: Data Included in the Analysis

Given that the purpose of this study is to compare impact across event types, it is important that we use a time period when all 48 event types were being tracked. Therefore, we will restrict our analysis to data collected on or after January 1, 1996.

Also, for each research question, we will restrict our analysis to the storms that have non-zero values for the evaluation criteria. That is, for most harmful to health, we will evaluate storms that had at least one injury or fatality, and for greatest economic consequences we will evaluate storms that had some property or crop damage reported.

Finally, we will include data for all state Federal Information Processing Standard (FIPS) codes provided in the input data set, because the U.S. territories are under the jurisdiction of the United States of America. American emergency responders and tax dollars are used to respond to emergencies and repair damage after storms within the territories. Since these storms impact the American people and economy, we include them in the analysis.

Consideration 2: Evaluation Criteria

For the harmful to human health criterion, we will analyze the number of people injured or killed by a storm, summing the injuries and fatalities as the total number of people harmed. We will not attempt to assess the relative severity of deaths and injuries for the following reasons:

  • Severity of injury varies widely, and the StormData data set does not include objective information about injuries by severity, and
  • The actuarial valuation of a fatality is beyond the scope of this course.

For the greatest economic consequences criterion, we will analyze total economic impact, as measured by the sum of property damage and crop damage.

Data Processing

Data for this project assignment is stored on cloudfront.net. The following section of code downloads the file and decompresses it if it is not already present in the R working directory.

library(pryr)
startTime <- Sys.time() 
# download bz2 version of input data file if needed, and unzip
if(!file.exists("repdata-data-StormData.csv")){
     library(R.utils) 
     dlMethod <- "curl"
     if(substr(Sys.getenv("OS"),1,7) == "Windows") dlMethod <- "wininet"
     url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
     download.file(url,destfile='repdata-data-StormData.csv.bz2',method=dlMethod,mode="wb")
     # unzip first because it improves performance of read_csv()
     bunzip2("repdata-data-StormData.csv.bz2",
          "repdata-data-StormData.csv")
}
currentTime <- Sys.time()
paste("Total elapsed time - download:", round(currentTime - startTime,2),"seconds")
## [1] "Total elapsed time - download: 0 seconds"
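
The elapsed-time messages used throughout this script can be wrapped in a small helper. Subtracting two Sys.time() values yields a difftime whose units are chosen automatically, so a hard-coded "seconds" label can disagree with the number once a step runs longer than a minute. The following is a minimal sketch only; the function name reportElapsed and its arguments are hypothetical and are not part of the original script.

```r
# Hypothetical helper (not in the original script): report the elapsed time
# between two POSIXct timestamps with the unit stated explicitly.
reportElapsed <- function(label, start, end = Sys.time(), units = "secs") {
     elapsed <- difftime(end, start, units = units)
     paste0(label, ": ", round(as.numeric(elapsed), 2), " ", units)
}

# fixed timestamps so the example is reproducible
t0 <- as.POSIXct("2017-07-01 12:00:00", tz = "UTC")
t1 <- as.POSIXct("2017-07-01 12:01:30", tz = "UTC")
inSecs <- reportElapsed("Total elapsed time", t0, t1)
inMins <- reportElapsed("Total elapsed time", t0, t1, units = "mins")
inSecs
inMins
```

Requesting the units explicitly keeps the label and the value in agreement regardless of how long the interval is.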

Next, we load the data using readr. We pass the total number of rows (902,297) as n_max so the function allocates memory efficiently. Since the National Weather Service Directive 10-1605 codebook does not concisely describe the data types for all the columns in the data set, we will read everything as character and convert individual columns to numeric or date as needed. We refer the reader to the codebook for a detailed explanation of each variable contained in the StormData database.
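
The all-character column specification can be built rather than typed by hand. This is a small sketch, assuming the file has 37 columns (the count implied by the col_types string passed to read_csv() in this script); the names nCols and colSpec are illustrative.

```r
# Build the compact "read every column as character" specification.
# nCols = 37 is an assumption based on the col_types string used in this script.
nCols <- 37
colSpec <- strrep("c", nCols)
nchar(colSpec)

# With readr loaded, an alternative that needs no column count is
#      col_types = cols(.default = col_character())
```

Either form avoids miscounting characters in a long literal string.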

The property and crop damage values are stored as floating point numbers with 3 digits of precision plus an exponent. We standardize the numeric values by inflating them by one thousand, one million, or one billion based on the value of the PROPDMGEXP or CROPDMGEXP variables.

intervalStart <- Sys.time()
library(readr)
stormData <- read_csv("repdata-data-StormData.csv",
     n_max = 902297,
     col_names = TRUE,
     col_types = "ccccccccccccccccccccccccccccccccccccc")
intervalEnd <- Sys.time()
paste("read_csv took:",intervalEnd - intervalStart,attr(intervalEnd - intervalStart,"units"))
## [1] "read_csv took: 24.7505855560303 secs"
object_size(stormData)
## 545 MB
# subset data to 1996 or newer, and print min and max dates
stormData$BGN_DATE <- as.Date(stormData$BGN_DATE,"%m/%d/%Y %T")
stormData <- stormData[stormData$BGN_DATE > as.Date("12/31/1995","%m/%d/%Y"),]
min(stormData$BGN_DATE)
## [1] "1996-01-01"
max(stormData$BGN_DATE)
## [1] "2011-11-30"
message("Property Damage Exponent Values")
## Property Damage Exponent Values
table(stormData$PROPDMGEXP)
## 
##      0      B      K      M 
##      1     32 369938   7374
message("Crop Damage Exponent Values")
## Crop Damage Exponent Values
table(stormData$CROPDMGEXP)
## 
##      B      K      M 
##      4 278686   1771
# convert required variables to numeric
stormData$FATALITIES <- as.numeric(stormData$FATALITIES)
stormData$INJURIES <- as.numeric(stormData$INJURIES)
stormData$PROPDMG <- as.numeric(stormData$PROPDMG)
stormData$CROPDMG <- as.numeric(stormData$CROPDMG)
stormData$REFNUM <- as.numeric(stormData$REFNUM)

# adjust by exponents
stormData[which(stormData$PROPDMGEXP == "K"),"PROPDMG"] <- stormData[which(stormData$PROPDMGEXP == "K"),"PROPDMG"] * 1000
stormData[which(stormData$PROPDMGEXP == "M"),"PROPDMG"] <- stormData[which(stormData$PROPDMGEXP == "M"),"PROPDMG"] * 1000000
stormData[which(stormData$PROPDMGEXP == "B"),"PROPDMG"] <- stormData[which(stormData$PROPDMGEXP == "B"),"PROPDMG"] * 1000000000

stormData[which(stormData$CROPDMGEXP == "K"),"CROPDMG"] <- stormData[which(stormData$CROPDMGEXP == "K"),"CROPDMG"] * 1000
stormData[which(stormData$CROPDMGEXP == "M"),"CROPDMG"] <- stormData[which(stormData$CROPDMGEXP == "M"),"CROPDMG"] * 1000000
stormData[which(stormData$CROPDMGEXP == "B"),"CROPDMG"] <- stormData[which(stormData$CROPDMGEXP == "B"),"CROPDMG"] * 1000000000

# uppercase EVTYPE and remove leading / trailing blanks
stormData$EVTYPE <- trimws(toupper(stormData$EVTYPE))

# calculate total humans harmed
stormData$humansHarmed <- stormData$FATALITIES + stormData$INJURIES

# calculate economic impact 
stormData$econImpact <- stormData$PROPDMG + stormData$CROPDMG
currentTime <- Sys.time()
paste("Current interval time - read raw data:",round(currentTime - intervalStart,2),attr(currentTime - intervalStart,"units"))
## [1] "Current interval time - read raw data: 56.29 secs"
paste("Total elapsed time:",round(currentTime - startTime,2),attr(currentTime - startTime,"units"))
## [1] "Total elapsed time: 56.37 secs"
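
The exponent adjustment can also be expressed with a named lookup vector instead of one assignment per code. The sketch below uses a toy data frame with illustrative values, not the StormData file, and the names dmg, expMultiplier, and propDamage are hypothetical. Like the code above, it leaves values unchanged when the exponent is missing or is not K, M, or B.

```r
# Toy data standing in for the PROPDMG / PROPDMGEXP columns (illustrative values)
dmg <- data.frame(PROPDMG = c(2.5, 10, 1.2, 5, 3),
                  PROPDMGEXP = c("K", "M", "B", "0", NA),
                  stringsAsFactors = FALSE)

# multipliers for the exponent codes handled by the script
expMultiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Look up each row's multiplier; codes absent from the table (including "0"
# and NA) come back as NA and are treated as a multiplier of 1
mult <- unname(expMultiplier[dmg$PROPDMGEXP])
mult[is.na(mult)] <- 1
dmg$propDamage <- dmg$PROPDMG * mult
dmg$propDamage
```

The same lookup vector could be reused for the crop damage columns, which keeps the two adjustments consistent.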

Preparing the data for analysis

To prepare the data for analysis, we must complete two steps:

  1. Subset the data to events containing the variable of interest (fatalities / injuries, or economic impact), and
  2. For events in the subset, standardize them so they match the list of 48 event types, and no “non-standard” event types are included in the analysis reports.

Calculating harm to population health

The data set includes two variables related to population health: FATALITIES and INJURIES. We will assess the harm by each event type’s share of total fatalities and injuries. We will use a Pareto chart to identify the event types with the greatest impact on humans. The Pareto chart combines a frequency bar chart with a cumulative percentage line graph, sorted by descending frequency. Since the qcc package includes a Pareto chart, we’ll use the package instead of designing a custom Pareto chart.
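
The numbers behind a Pareto chart are simple to compute in base R: sort the totals in descending order, convert them to percentages, and take a running sum. The following sketch uses made-up totals for four event types, so the values say nothing about the real results, and the names harm and paretoTable are illustrative.

```r
# Toy totals of people harmed by event type (illustrative values only)
harm <- c(FLOOD = 15, TORNADO = 50, LIGHTNING = 5, "EXCESSIVE HEAT" = 30)

# sort descending, then compute each type's share and the cumulative share
harm <- sort(harm, decreasing = TRUE)
paretoTable <- data.frame(eventType = names(harm),
                          count = as.numeric(harm),
                          pct = as.numeric(harm) / sum(harm) * 100,
                          stringsAsFactors = FALSE)
paretoTable$cumPct <- cumsum(paretoTable$pct)
paretoTable
```

The bars of the chart correspond to the count column and the line to cumPct. With the qcc package installed, pareto.chart() should accept a named vector such as harm directly.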

Calculating economic damage

We will combine the damage amounts (PROPDMG and CROPDMG) with their exponent columns (PROPDMGEXP and CROPDMGEXP) to express the damage on a consistent scale across events, and then sum property and crop damage. We will assess the economic damage by analyzing event types relative to the total amount of damage across all event types. We will also use a Pareto chart to assess the event types, displaying total damage as the bar chart variable, and combining it with a cumulative percentage line graph.

Cleaning the event types: coding strategy

The data suffers from numerous data entry errors that cause the event types to fail to match the list of 48 types described in National Weather Service Directive 10-1605. We use a combination of regular expressions and direct assignment statements to recode the data to ensure all event types in our data match the event type names in the NWS Directive document.

We clean the event types after subsetting for our two questions: events harmful to human health, and events causing economic damage. We subset before cleaning to reduce the number of event types that must be cleaned prior to analysis.
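
When recoding with regular expressions, anchoring matters. An unanchored pattern also matches inside longer strings, which is how intermediate values such as "HIGH HIGH WIND" and "FROST/FREEZEFALL" arise and then need their own direct assignments later in the script. The sketch below uses a small illustrative vector rather than the real data.

```r
evtypes <- c("WINDS", "HIGH WINDS", "LIGHT SNOW", "LIGHT SNOWFALL")

# Unanchored (or end-anchored only): the pattern also matches inside longer strings
loose1 <- gsub("WINDS$", "HIGH WIND", evtypes)
loose2 <- gsub("LIGHT SNOW", "FROST/FREEZE", evtypes)
loose1
loose2

# Anchored at both ends: only whole-string matches are recoded
strict1 <- gsub("^WINDS$", "HIGH WIND", evtypes)
strict2 <- gsub("^LIGHT SNOW$", "FROST/FREEZE", evtypes)
strict1
strict2
```

Fully anchored patterns leave unexpected variants untouched, so they show up in the mismatch check and can be mapped deliberately.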

To ensure all event types in the analysis conform to the NWS Directive list, the following code builds a data frame containing the event types and designators from the Storm Data Event Table listed on page 6 of National Weather Service Directive 10-1605. The technique used below to read the data inline, as is frequently done with SAS or SPSS, is described in Bob Muenchen’s R for SAS and SPSS Users, page 14.

intervalStart <- Sys.time()
#
# read text data in R
#
theStormClasses <- 
     "eventType|designator
Astronomical Low Tide |Z 
Avalanche |Z 
Blizzard |Z 
Coastal Flood |Z 
Cold/Wind Chill |Z 
Debris Flow |C 
Dense Fog |Z 
Dense Smoke |Z 
Drought |Z 
Dust Devil |C 
Dust Storm |Z 
Excessive Heat |Z 
Extreme Cold/Wind Chill |Z 
Flash Flood |C 
Flood |C 
Frost/Freeze |Z 
Funnel Cloud |C 
Freezing Fog |Z 
Hail |C 
Heat |Z 
Heavy Rain  |C 
Heavy Snow |Z 
High Surf |Z 
High Wind |Z
Hurricane (Typhoon) |Z 
Ice Storm |Z 
Lake-Effect Snow |Z 
Lakeshore Flood |Z 
Lightning  |C 
Marine Hail |M 
Marine High Wind |M 
Marine Strong Wind |M 
Marine Thunderstorm Wind |M 
Rip Current |Z
Seiche |Z 
Sleet |Z 
Storm Surge/Tide |Z 
Strong Wind |Z 
Thunderstorm Wind  |C 
Tornado  |C 
Tropical Depression |Z 
Tropical Storm |Z 
Tsunami |Z 
Volcanic Ash |Z 
Waterspout |M 
Wildfire |Z 
Winter Storm |Z 
Winter Weather |Z"

stormClasses <- read.table(textConnection(theStormClasses),
     header=TRUE,sep="|",stringsAsFactors=FALSE)
stormClasses$eventType <- toupper(trimws(stormClasses$eventType))
stormClasses$designator <- trimws(stormClasses$designator)
currentTime <- Sys.time()
paste("Current interval time - build stormClasses table:",round(currentTime - intervalStart,2),attr(currentTime - intervalStart,"units"))
## [1] "Current interval time - build stormClasses table: 0.04 secs"
paste("Total elapsed time:",round(currentTime - startTime,2),attr(currentTime - startTime,"units"))
## [1] "Total elapsed time: 56.51 secs"
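
Once a reference table like stormClasses exists, checking a vector of event types against it takes one line of base R. This is a minimal sketch with toy vectors; validTypes and observed are hypothetical stand-ins for stormClasses$eventType and a cleaned EVTYPE column.

```r
# Toy stand-ins for the reference list and the observed event types
validTypes <- c("FLOOD", "HAIL", "TORNADO")
observed <- c("TORNADO", "HAIL", "TSTM WIND", "FLOOD", "TSTM WIND")

# distinct event types that are not in the reference list
mismatches <- setdiff(observed, validTypes)
mismatches

# a fully cleaned data set would satisfy length(mismatches) == 0
```

Because setdiff() returns each unmatched value once, the result doubles as a to-do list of event types that still need recoding.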

Event Types Most Harmful to Population Health

Here we reduce the data set to the events that killed or injured at least one person, then clean the event types so they conform to the list of 48 event types that are listed in the codebook provided by the National Weather Service. There are 12,764 events that caused at least one injury or death between January 1, 1996 and November 30, 2011.

intervalStart <- Sys.time()
humanHarm <- stormData[stormData$humansHarmed > 0, ]
nrow(humanHarm)
## [1] 12764
humanHarm$incidents <- 1
#
# Recoded event types here
#
humanHarm$EVTYPE <- gsub("AVALANCE","AVALANCHE",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("BLOWING SNOW","BLIZZARD",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("COASTAL FLOODING","COASTAL FLOOD",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("COASTAL FLOOD/EROSION","COASTAL FLOOD",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("COASTALSTORM","WINTER STORM",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("COASTAL STORM","WINTER STORM",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("BLACK ICE","FROST/FREEZE",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("BRUSH FIRE","WILDFIRE",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("COLD TEMPERATURE","cold/wind CHILL",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("COLD AND SNOW","WINTER STORM",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("DROWNING","FLASH FLOOD",humanHarm$EVTYPE)
# "$" is a regex anchor, so this pattern must not be used with fixed=TRUE
humanHarm$EVTYPE <- gsub("COLD WEATHER$","cold/wind CHILL",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("DRY MICROBURST","THUNDERSTORM WIND",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("^FOG","DENSE FOG",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("EXCESSIVE SNOW","HEAVY SNOW",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("EXTREME COLD$","EXTREME COLD/WIND CHILL",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("EXTENDED COLD$","cold/wind CHILL",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("EXTREME WINDCHILL","EXTREME COLD/WIND CHILL",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("^FREEZING","WINTER STORM",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("FROST$","FROST/FREEZE",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("GLAZE","ICE STORM",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("GUSTY WINDS","HIGH WIND",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("GUSTY WIND","HIGH WIND",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("^HAZARDOUS SURF","HIGH SURF",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("HEAT WAVE","HEAT",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("^HEAVY SURF AND WIND","HIGH SURF",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("^HEAVY SURF$","HIGH SURF",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("^HEAVY SURF/HIGH SURF","HIGH SURF",humanHarm$EVTYPE)
# NOTE there is no reasonable equivalent to high seas other than high surf
humanHarm$EVTYPE <- gsub("HEAVY SEAS","HIGH SURF",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("HIGH SEAS","HIGH SURF",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("HIGH SWELLS","HIGH SURF",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("HIGH WATER","FLOOD",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("HURRICANE$","HURRICANE (TYPHOON)",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("HURRICANE EDOUARD","HURRICANE (TYPHOON)",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("HURRICANE/TYPHOON$","HURRICANE (TYPHOON)",humanHarm$EVTYPE)
# Next item miscoded hypothermia as hyperthermia
humanHarm$EVTYPE <- gsub("^HYPERTHERMIA/EXPOSURE","cold/wind CHILL",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("^HYPOTHERMIA/EXPOSURE","cold/wind CHILL",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("ICE ON ROAD","FROST/FREEZE",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("^ICE ROADS","FROST/FREEZE",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("^ICY ROADS","FROST/FREEZE",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("FALLING SNOW/ICE","AVALANCHE",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("HEAVY SNOW SHOWER$","HEAVY SNOW",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("LANDSLIDES","HEAVY RAIN",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("LANDSLIDE$","HEAVY RAIN",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("LIGHT SNOW","FROST/FREEZE",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("MARINE ACCIDENT","HIGH SURF",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("MARINE TSTM WIND","MARINE THUNDERSTORM WIND",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("MIXED PRECIP","FROST/FREEZE",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("MUDSLIDE$","HEAVY RAIN",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("MUDSLIDES$","HEAVY RAIN",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("NON TSTM WIND$","HIGH WIND",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("NON-SEVERE WIND DAMAGE$","HIGH WIND",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("OTHER","DUST DEVIL",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("RAIN/SNOW","WINTER STORM",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("RECORD HEAT","EXCESSIVE HEAT",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("RIP CURRENTS","RIP CURRENT",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("RIVER FLOODING$","FLOOD",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("RIVER FLOOD$","FLOOD",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("ROGUE WAVE","HIGH SURF",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("ROUGH SEAS","HIGH SURF",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("ROUGH SURF","HIGH SURF",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("SMALL HAIL","HAIL",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("SNOW AND ICE","WINTER STORM",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("SNOW SQUALLS$","BLIZZARD",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("SNOW SQUALL$","BLIZZARD",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("STORM SURGE$","STORM SURGE/TIDE",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("STRONG WINDS","HIGH WIND",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("TIDAL FLOODING$","STORM SURGE/TIDE",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("TORRENTIAL RAINFALL","HEAVY RAIN",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("THUNDERSTORM WIND (.*)","THUNDERSTORM WIND",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("THUNDERSTORM$","THUNDERSTORM WIND",humanHarm$EVTYPE)
# "(.*)" matches any suffix, including the "(G45)" and "(G40)" variants
humanHarm$EVTYPE <- gsub("^TSTM WIND (.*)","THUNDERSTORM WIND",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("TSTM WIND/HAIL$","THUNDERSTORM WIND",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("TSTM WIND$","THUNDERSTORM WIND",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("^TYPHOON","HURRICANE (TYPHOON)",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("UNSEASONABLY WARM","HEAT",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("URBAN/SML STREAM FLD","FLOOD",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("WARM WEATHER","HEAT",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("WHIRLWIND","DUST DEVIL",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("WILD/FOREST FIRE$","WILDFIRE",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("WINDS$","HIGH WIND",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("^WIND","HIGH WIND",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("WINTER STORM AND ICE","WINTER STORM",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("WINTER STORM DRIZZLE","WINTER STORM",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("WINTER STORM RAIN","WINTER STORM",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("WINTER STORM SPRAY","WINTER STORM",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("WINTER STORM SQUALLS$","WINTER STORM",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("WINTER STORM SQUALL$","WINTER STORM",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("WINTER WEATHER MIX","WINTER STORM",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("WINTER WEATHER/MIX","WINTER STORM",humanHarm$EVTYPE)
humanHarm$EVTYPE <- gsub("WINTRY MIX","WINTER STORM",humanHarm$EVTYPE)
humanHarm[humanHarm$EVTYPE == "COLD","EVTYPE"] <- "COLD/WIND CHILL"
humanHarm[humanHarm$EVTYPE == "SNOW","EVTYPE"] <- "WINTER STORM"
#
# end recoded event types
# 

humanHarm$EVTYPE <- toupper(humanHarm$EVTYPE)
#
# compare to NWS Storm Event Table, mismatches should be zero
# 
library(sqldf)
## Loading required package: gsubfn
## Loading required package: proto
## Loading required package: RSQLite
sqldf("select EVTYPE from humanHarm where EVTYPE not 
     in(select eventType from stormClasses)")
## [1] EVTYPE
## <0 rows> (or 0-length row.names)
harmByEvtype <- aggregate(cbind(humansHarmed,incidents) ~ EVTYPE,
     data = humanHarm, FUN="sum")

# calculate percentages so we can conduct Pareto analysis 
harmByEvtype$pctHarm <- harmByEvtype$humansHarmed / sum(harmByEvtype$humansHarmed)
harmByEvtype$pctOfIncidents <- harmByEvtype$incidents / sum(harmByEvtype$incidents)
# order by descending percentage of people harmed
harmByEvtype <- harmByEvtype[order(-harmByEvtype$pctHarm),]
currentTime <- Sys.time()
paste("Current interval time - subset humanHarm and recode EVTYPE:",round(currentTime - intervalStart,2),attr(currentTime - intervalStart,"units"))
## [1] "Current interval time - subset humanHarm and recode EVTYPE: 5.22 secs"
paste("Total elapsed time:",round(currentTime - startTime,2),attr(currentTime - startTime,"units"))
## [1] "Total elapsed time: 1.03 mins"
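
The long run of gsub() calls above is repeated almost verbatim for the economic damage subset, so the recodes could instead be stored once in a table and applied by a small function. The following is a sketch only: recodeEvtype and rules are hypothetical names, and the table holds just four of the patterns used above. Rules are applied in order, as in the original code.

```r
# Hypothetical helper: apply an ordered table of regex recodes to a character vector
recodeEvtype <- function(x, rules) {
     for (i in seq_len(nrow(rules))) {
          x <- gsub(rules$pattern[i], rules$replacement[i], x)
     }
     x
}

# a few rules copied from the recodes above; order matters
rules <- data.frame(
     pattern = c("AVALANCE", "^FOG", "RIP CURRENTS", "TSTM WIND$"),
     replacement = c("AVALANCHE", "DENSE FOG", "RIP CURRENT", "THUNDERSTORM WIND"),
     stringsAsFactors = FALSE)

recoded <- recodeEvtype(c("AVALANCE", "FOG", "RIP CURRENTS", "TSTM WIND", "HAIL"), rules)
recoded
```

With this design, the same rules table could be passed to recodeEvtype() for both humanHarm$EVTYPE and damageData$EVTYPE, so a correction to one pattern applies to both analyses.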

Event Types with Greatest Economic Consequences

Here we reduce the data set to the events that caused property or crop damage, then clean the event types so they conform to the list of 48 event types that are listed in the codebook provided by the National Weather Service. There are 194,525 events that caused at least $1 of economic impact between January 1, 1996 and November 30, 2011.

intervalStart <- Sys.time()
damageData <- stormData[stormData$econImpact > 0, ]
nrow(damageData)
## [1] 194525
damageData$incidents <- 1
#
# recode EVTYPE to match code book for 48 event types
#
damageData$EVTYPE <- gsub("AVALANCE","AVALANCHE",damageData$EVTYPE)
damageData$EVTYPE <- gsub("BLOWING SNOW","BLIZZARD",damageData$EVTYPE)
damageData$EVTYPE <- gsub("COASTAL FLOODING","COASTAL FLOOD",damageData$EVTYPE)
damageData$EVTYPE <- gsub("COASTAL FLOOD/EROSION","COASTAL FLOOD",damageData$EVTYPE)
damageData$EVTYPE <- gsub("COASTALSTORM","WINTER STORM",damageData$EVTYPE)
damageData$EVTYPE <- gsub("COASTAL STORM","WINTER STORM",damageData$EVTYPE)
damageData$EVTYPE <- gsub("BLACK ICE","FROST/FREEZE",damageData$EVTYPE)
damageData$EVTYPE <- gsub("BRUSH FIRE","WILDFIRE",damageData$EVTYPE)
damageData$EVTYPE <- gsub("COLD TEMPERATURE","cold/wind CHILL",damageData$EVTYPE)
damageData$EVTYPE <- gsub("COLD AND SNOW","WINTER STORM",damageData$EVTYPE)
damageData$EVTYPE <- gsub("DROWNING","FLASH FLOOD",damageData$EVTYPE)
# "$" is a regex anchor, so this pattern must not be used with fixed=TRUE
damageData$EVTYPE <- gsub("COLD WEATHER$","cold/wind CHILL",damageData$EVTYPE)
damageData$EVTYPE <- gsub("DRY MICROBURST","THUNDERSTORM WIND",damageData$EVTYPE)
damageData$EVTYPE <- gsub("^FOG","DENSE FOG",damageData$EVTYPE)
damageData$EVTYPE <- gsub("EXCESSIVE SNOW","HEAVY SNOW",damageData$EVTYPE)
damageData$EVTYPE <- gsub("EXTREME COLD$","EXTREME COLD/WIND CHILL",damageData$EVTYPE)
damageData$EVTYPE <- gsub("EXTENDED COLD$","cold/wind CHILL",damageData$EVTYPE)
damageData$EVTYPE <- gsub("EXTREME WINDCHILL","EXTREME COLD/WIND CHILL",damageData$EVTYPE)
damageData$EVTYPE <- gsub("^FREEZING","WINTER STORM",damageData$EVTYPE)
damageData$EVTYPE <- gsub("FROST$","FROST/FREEZE",damageData$EVTYPE)
damageData$EVTYPE <- gsub("GLAZE","ICE STORM",damageData$EVTYPE)
damageData$EVTYPE <- gsub("GUSTY WINDS","HIGH WIND",damageData$EVTYPE)
damageData$EVTYPE <- gsub("GUSTY WIND","HIGH WIND",damageData$EVTYPE)
damageData$EVTYPE <- gsub("^HAZARDOUS SURF","HIGH SURF",damageData$EVTYPE)
damageData$EVTYPE <- gsub("HEAT WAVE","HEAT",damageData$EVTYPE)
damageData$EVTYPE <- gsub("^HEAVY SURF AND WIND","HIGH SURF",damageData$EVTYPE)
damageData$EVTYPE <- gsub("^HEAVY SURF$","HIGH SURF",damageData$EVTYPE)
damageData$EVTYPE <- gsub("^HEAVY SURF/HIGH SURF","HIGH SURF",damageData$EVTYPE)
# NOTE there is no reasonable equivalent to high seas other than high surf
damageData$EVTYPE <- gsub("HEAVY SEAS","HIGH SURF",damageData$EVTYPE)
damageData$EVTYPE <- gsub("HIGH SEAS","HIGH SURF",damageData$EVTYPE)
damageData$EVTYPE <- gsub("HIGH SWELLS","HIGH SURF",damageData$EVTYPE)
damageData$EVTYPE <- gsub("HIGH WATER","FLOOD",damageData$EVTYPE)
damageData$EVTYPE <- gsub("HURRICANE$","HURRICANE (TYPHOON)",damageData$EVTYPE)
damageData$EVTYPE <- gsub("HURRICANE EDOUARD","HURRICANE (TYPHOON)",damageData$EVTYPE)
damageData$EVTYPE <- gsub("HURRICANE/TYPHOON$","HURRICANE (TYPHOON)",damageData$EVTYPE)
# Next item miscoded hypothermia as hyperthermia
damageData$EVTYPE <- gsub("^HYPERTHERMIA/EXPOSURE","cold/wind CHILL",damageData$EVTYPE)
damageData$EVTYPE <- gsub("^HYPOTHERMIA/EXPOSURE","cold/wind CHILL",damageData$EVTYPE)
damageData$EVTYPE <- gsub("ICE ON ROAD","FROST/FREEZE",damageData$EVTYPE)
damageData$EVTYPE <- gsub("^ICE ROADS","FROST/FREEZE",damageData$EVTYPE)
damageData$EVTYPE <- gsub("^ICY ROADS","FROST/FREEZE",damageData$EVTYPE)
damageData$EVTYPE <- gsub("FALLING SNOW/ICE","AVALANCHE",damageData$EVTYPE)
damageData$EVTYPE <- gsub("HEAVY SNOW SHOWER$","HEAVY SNOW",damageData$EVTYPE)
damageData$EVTYPE <- gsub("LANDSLIDES","HEAVY RAIN",damageData$EVTYPE)
damageData$EVTYPE <- gsub("LANDSLIDE$","HEAVY RAIN",damageData$EVTYPE)
damageData$EVTYPE <- gsub("LIGHT SNOW","FROST/FREEZE",damageData$EVTYPE)
damageData$EVTYPE <- gsub("MARINE ACCIDENT","HIGH SURF",damageData$EVTYPE)
damageData$EVTYPE <- gsub("MARINE TSTM WIND","MARINE THUNDERSTORM WIND",damageData$EVTYPE)
damageData$EVTYPE <- gsub("MIXED PRECIP","FROST/FREEZE",damageData$EVTYPE)
damageData$EVTYPE <- gsub("MUDSLIDE$","HEAVY RAIN",damageData$EVTYPE)
damageData$EVTYPE <- gsub("MUDSLIDES$","HEAVY RAIN",damageData$EVTYPE)
damageData$EVTYPE <- gsub("NON TSTM WIND$","HIGH WIND",damageData$EVTYPE)
damageData$EVTYPE <- gsub("NON-SEVERE WIND DAMAGE$","HIGH WIND",damageData$EVTYPE)
damageData$EVTYPE <- gsub("OTHER","DUST DEVIL",damageData$EVTYPE)
damageData$EVTYPE <- gsub("RAIN/SNOW","WINTER STORM",damageData$EVTYPE)
damageData$EVTYPE <- gsub("RECORD HEAT","EXCESSIVE HEAT",damageData$EVTYPE)
damageData$EVTYPE <- gsub("RIP CURRENTS","RIP CURRENT",damageData$EVTYPE)
damageData$EVTYPE <- gsub("RIVER FLOODING$","FLOOD",damageData$EVTYPE)
damageData$EVTYPE <- gsub("RIVER FLOOD$","FLOOD",damageData$EVTYPE)
damageData$EVTYPE <- gsub("ROGUE WAVE","HIGH SURF",damageData$EVTYPE)
damageData$EVTYPE <- gsub("ROUGH SEAS","HIGH SURF",damageData$EVTYPE)
damageData$EVTYPE <- gsub("ROUGH SURF","HIGH SURF",damageData$EVTYPE)
damageData$EVTYPE <- gsub("SMALL HAIL","HAIL",damageData$EVTYPE)
damageData$EVTYPE <- gsub("SNOW AND ICE","WINTER STORM",damageData$EVTYPE)
damageData$EVTYPE <- gsub("SNOW SQUALLS$","BLIZZARD",damageData$EVTYPE)
damageData$EVTYPE <- gsub("SNOW SQUALL$","BLIZZARD",damageData$EVTYPE)
damageData$EVTYPE <- gsub("STORM SURGE$","STORM SURGE/TIDE",damageData$EVTYPE)
damageData$EVTYPE <- gsub("STRONG WINDS","HIGH WIND",damageData$EVTYPE)
damageData$EVTYPE <- gsub("TIDAL FLOODING$","STORM SURGE/TIDE",damageData$EVTYPE)
damageData$EVTYPE <- gsub("TORRENTIAL RAINFALL","HEAVY RAIN",damageData$EVTYPE)
damageData$EVTYPE <- gsub("THUNDERSTORM WIND (.*)","THUNDERSTORM WIND",damageData$EVTYPE)
damageData$EVTYPE <- gsub("THUNDERSTORM$","THUNDERSTORM WIND",damageData$EVTYPE)
# "(.*)" matches any suffix, including the "(G45)" and "(G40)" variants
damageData$EVTYPE <- gsub("^TSTM WIND (.*)","THUNDERSTORM WIND",damageData$EVTYPE)
damageData$EVTYPE <- gsub("TSTM WIND/HAIL$","THUNDERSTORM WIND",damageData$EVTYPE)
damageData$EVTYPE <- gsub("TSTM WIND$","THUNDERSTORM WIND",damageData$EVTYPE)
damageData$EVTYPE <- gsub("^TYPHOON","HURRICANE (TYPHOON)",damageData$EVTYPE)
damageData$EVTYPE <- gsub("UNSEASONABLY WARM","HEAT",damageData$EVTYPE)
damageData$EVTYPE <- gsub("URBAN/SML STREAM FLD","FLOOD",damageData$EVTYPE)
damageData$EVTYPE <- gsub("WARM WEATHER","HEAT",damageData$EVTYPE)
damageData$EVTYPE <- gsub("WHIRLWIND","DUST DEVIL",damageData$EVTYPE)
damageData$EVTYPE <- gsub("WILD/FOREST FIRE$","WILDFIRE",damageData$EVTYPE)
damageData$EVTYPE <- gsub("WINDS$","HIGH WIND",damageData$EVTYPE)
damageData$EVTYPE <- gsub("^WIND","HIGH WIND",damageData$EVTYPE)
damageData$EVTYPE <- gsub("WINTER STORM AND ICE","WINTER STORM",damageData$EVTYPE)
damageData$EVTYPE <- gsub("WINTER STORM DRIZZLE","WINTER STORM",damageData$EVTYPE)
damageData$EVTYPE <- gsub("WINTER STORM RAIN","WINTER STORM",damageData$EVTYPE)
damageData$EVTYPE <- gsub("WINTER STORM SPRAY","WINTER STORM",damageData$EVTYPE)
damageData$EVTYPE <- gsub("WINTER STORM SQUALLS$","WINTER STORM",damageData$EVTYPE)
damageData$EVTYPE <- gsub("WINTER STORM SQUALL$","WINTER STORM",damageData$EVTYPE)
damageData$EVTYPE <- gsub("WINTER WEATHER MIX","WINTER STORM",damageData$EVTYPE)
damageData$EVTYPE <- gsub("WINTER WEATHER/MIX","WINTER STORM",damageData$EVTYPE)
damageData$EVTYPE <- gsub("WINTRY MIX","WINTER STORM",damageData$EVTYPE)
damageData[damageData$EVTYPE == "COLD","EVTYPE"] <- "COLD/WIND CHILL"
damageData[damageData$EVTYPE == "SNOW","EVTYPE"] <- "WINTER STORM"
damageData[damageData$EVTYPE == "AGRICULTURAL FREEZE","EVTYPE"] <- "FROST/FREEZE"
damageData[damageData$EVTYPE == "ASTRONOMICAL HIGH TIDE","EVTYPE"] <- "STORM SURGE/TIDE"
damageData[damageData$EVTYPE == "BEACH EROSION","EVTYPE"] <- "STORM SURGE/TIDE"
damageData[damageData$EVTYPE == "BLOWING DUST","EVTYPE"] <- "DUST STORM"
damageData[damageData$EVTYPE == "COASTAL  FLOODING/EROSION","EVTYPE"] <- "COASTAL FLOOD"
damageData[damageData$EVTYPE == "COASTAL EROSION","EVTYPE"] <- "COASTAL FLOOD"
damageData[damageData$EVTYPE == "DAM BREAK","EVTYPE"] <- "HEAVY RAIN"
damageData[damageData$EVTYPE == "DAMAGING FREEZE","EVTYPE"] <- "FROST/FREEZE"
damageData[damageData$EVTYPE == "DOWNBURST","EVTYPE"] <- "THUNDERSTORM WIND"
damageData[damageData$EVTYPE == "EARLY FROST/FREEZE","EVTYPE"] <- "FROST/FREEZE"
damageData[damageData$EVTYPE == "EROSION/CSTL FLOOD","EVTYPE"] <- "STORM SURGE/TIDE"
damageData[damageData$EVTYPE == "FLOOD/FLASH/FLOOD","EVTYPE"] <- "FLASH FLOOD"
damageData[damageData$EVTYPE == "FLASH FLOOD/FLOOD","EVTYPE"] <- "FLASH FLOOD"
damageData[damageData$EVTYPE == "FREEZE","EVTYPE"] <- "FROST/FREEZE"
damageData[damageData$EVTYPE == "FROST/FREEZEFALL","EVTYPE"] <- "FROST/FREEZE"
damageData[damageData$EVTYPE == "FROST/FREEZEITATION","EVTYPE"] <- "FROST/FREEZE"
damageData[damageData$EVTYPE == "GRADIENT WIND","EVTYPE"] <- "HIGH WIND"
damageData[damageData$EVTYPE == "HARD FREEZE","EVTYPE"] <- "FROST/FREEZE"
damageData[damageData$EVTYPE == "HEAVY RAIN/HIGH SURF","EVTYPE"] <- "STORM SURGE/TIDE"
damageData[damageData$EVTYPE == "HIGH HIGH WIND","EVTYPE"] <- "HIGH WIND"
damageData$EVTYPE <- gsub("^HIGH WIND(.*)","HIGH WIND",damageData$EVTYPE)
damageData[damageData$EVTYPE == "HIGH SURF ADVISORY","EVTYPE"] <- "HIGH SURF"
damageData[damageData$EVTYPE == "ICE JAM FLOOD (MINOR","EVTYPE"] <- "FLOOD"
damageData[damageData$EVTYPE == "LAKE EFFECT SNOW","EVTYPE"] <- "LAKE-EFFECT SNOW"
damageData[damageData$EVTYPE == "LANDSLUMP","EVTYPE"] <- "COASTAL FLOOD"
damageData[damageData$EVTYPE == "LANDSPOUT","EVTYPE"] <- "DUST DEVIL"
damageData[damageData$EVTYPE == "LATE SEASON SNOW","EVTYPE"] <- "WINTER STORM"
damageData[damageData$EVTYPE == "LIGHT FREEZING RAIN","EVTYPE"] <- "WINTER STORM"
damageData[damageData$EVTYPE == "MICROBURST","EVTYPE"] <- "THUNDERSTORM WIND"
damageData[damageData$EVTYPE == "MUD SLIDE","EVTYPE"] <- "HEAVY RAIN"
damageData[damageData$EVTYPE == "NON-THUNDERSTORM WIND","EVTYPE"] <- "HIGH WIND"
damageData[damageData$EVTYPE == "RAIN","EVTYPE"] <- "HEAVY RAIN"
damageData[damageData$EVTYPE == "ROCK SLIDE","EVTYPE"] <- "HEAVY RAIN"
damageData[damageData$EVTYPE == "UNSEASONABLE COLD","EVTYPE"] <- "COLD/WIND CHILL"
damageData[damageData$EVTYPE == "UNSEASONABLY COLD","EVTYPE"] <- "COLD/WIND CHILL"
damageData[damageData$EVTYPE == "UNSEASONAL RAIN","EVTYPE"] <- "HEAVY RAIN"
damageData[damageData$EVTYPE == "WET MICROBURST","EVTYPE"] <- "THUNDERSTORM WIND"
damageData[damageData$EVTYPE == "WINTER STORM FOG","EVTYPE"] <- "DENSE FOG"
#
#
# end recoding
damageData$EVTYPE <- toupper(damageData$EVTYPE)
#
# compare to NWS Storm Event Table, mismatches should be zero
# 

sqldf("select EVTYPE from damageData where EVTYPE not 
     in(select eventType from stormClasses)")
## [1] EVTYPE
## <0 rows> (or 0-length row.names)
damageByEvtype <- aggregate(cbind(econImpact,incidents) ~ EVTYPE,
     data = damageData,FUN="sum")

# calculate percentages so we can conduct Pareto analysis
damageByEvtype$pctDamage <- 
     damageByEvtype$econImpact / sum(damageByEvtype$econImpact)
damageByEvtype$pctOfIncidents <- 
     damageByEvtype$incidents / sum(damageByEvtype$incidents)
damageByEvtype <- damageByEvtype[order(-damageByEvtype$pctDamage),]
currentTime <- Sys.time()
paste("Current interval time- subset damageData and recode EVTYPE:",round(currentTime - intervalStart,2),attr(currentTime - intervalStart,"units"))
## [1] "Current interval time- subset damageData and recode EVTYPE: 30.67 secs"
paste("Total elapsed time:",round(currentTime - startTime,2),attr(currentTime - startTime,"units"))
## [1] "Total elapsed time: 1.55 mins"
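The exact-match recoding statements and the mismatch check above can also be expressed with a lookup table. This is a minimal sketch, not the method used in this analysis: `recodeMap`, `evtype`, and `officialTypes` are hypothetical names, the toy vectors stand in for `damageData$EVTYPE` and `stormClasses$eventType`, and only four of the mappings from the script are reproduced.

```r
# Hypothetical lookup table: names are raw EVTYPE values, values are the
# official event types (four pairs copied from the recoding statements above)
recodeMap <- c("FREEZE"     = "FROST/FREEZE",
               "MICROBURST" = "THUNDERSTORM WIND",
               "RAIN"       = "HEAVY RAIN",
               "SNOW"       = "WINTER STORM")

# Toy stand-in for damageData$EVTYPE
evtype <- c("RAIN", "TORNADO", "SNOW", "FREEZE", "MICROBURST", "HAIL")

# Exact-match recode: look up each value, keep the original where no mapping exists
mapped <- unname(recodeMap[evtype])
recoded <- ifelse(is.na(mapped), evtype, mapped)

# Toy stand-in for stormClasses$eventType
officialTypes <- c("FROST/FREEZE", "HAIL", "HEAVY RAIN",
                   "THUNDERSTORM WIND", "TORNADO", "WINTER STORM")

# Base R counterpart of the sqldf mismatch check: values not in the official list
mismatches <- setdiff(recoded, officialTypes)
length(mismatches)
```

A lookup table keeps the mapping in one place and applies it in a single pass, but it only covers exact matches; the pattern-based `gsub()` calls would still be needed for partial matches.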

Results

Event types causing most harm to humans

Tornadoes cause the most harm to humans: 1,968 incidents killed or injured 22,178 people over the nearly 16 years covered by our analysis (January 1996 through November 2011). Tornadoes accounted for 33% of the total number of people killed or injured by storms.

The top five event types account for 72% of total injuries and deaths. Of these, only excessive heat and floods develop slowly enough that public safety officials can prepare the public well in advance to minimize the impact on humans. Tornadoes, thunderstorm wind, and lightning instead require rapid warning systems that alert people to take cover when an event is imminent.
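The 72% figure can be checked directly from the counts. This is a minimal sketch, assuming the top five `humansHarmed` values and the total of 66,707 reported in this section's output; the vector is typed in by hand rather than read from `harmByEvtype`.

```r
# Top five humansHarmed values as printed in this section's output
humansHarmed <- c("TORNADO"           = 22178,
                  "EXCESSIVE HEAT"    = 8190,
                  "FLOOD"             = 7285,
                  "THUNDERSTORM WIND" = 5536,
                  "LIGHTNING"         = 4792)

# Total injuries and deaths across all event types (final Cum.Freq. value)
totalHarmed <- 66707

# Running share of total harm as each event type is added
cumShare <- cumsum(humansHarmed) / totalHarmed
round(100 * cumShare, 1)
```

The first element corresponds to the 33% share for tornadoes and the last to the 72% share for the top five event types.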

intervalStart <- Sys.time()

harmByEvtype[1:5,]
##               EVTYPE humansHarmed incidents    pctHarm pctOfIncidents
## 31           TORNADO        22178      1968 0.33246886     0.15418364
## 9     EXCESSIVE HEAT         8190       666 0.12277572     0.05217800
## 12             FLOOD         7285       420 0.10920893     0.03290505
## 30 THUNDERSTORM WIND         5536      2330 0.08298979     0.18254466
## 23         LIGHTNING         4792      2649 0.07183654     0.20753682
harm <- harmByEvtype$humansHarmed
names(harm) <- harmByEvtype$EVTYPE
library(qcc)
## Package 'qcc', version 2.6
## Type 'citation("qcc")' for citing this R package in publications.
par(mar=c(3,1,1,1))
par(las=2)
pareto.chart(harm,
     ylab="Total Injuries & Deaths",
     main="Injuries and Deaths - All Event Types, 1996 - 2011",
     ylab2 = "Cum. Percentage",
     cex.names=0.5)

##                           
## Pareto chart analysis for harm
##                            Frequency Cum.Freq.   Percentage Cum.Percent.
##   TORNADO                      22178     22178 33.246885634     33.24689
##   EXCESSIVE HEAT                8190     30368 12.277572069     45.52446
##   FLOOD                         7285     37653 10.920892860     56.44535
##   THUNDERSTORM WIND             5536     43189  8.298979118     64.74433
##   LIGHTNING                     4792     47981  7.183653889     71.92798
##   FLASH FLOOD                   2562     50543  3.840676391     75.76866
##   WINTER STORM                  1788     52331  2.680378371     78.44904
##   HEAT                          1548     53879  2.320596039     80.76963
##   WILDFIRE                      1545     55424  2.316098760     83.08573
##   HIGH WIND                     1473     56897  2.208164061     85.29390
##   HURRICANE (TYPHOON)           1453     58350  2.178182200     87.47208
##   RIP CURRENT                   1045     59395  1.566552236     89.03863
##   DENSE FOG                      924     60319  1.385161977     90.42379
##   HEAVY SNOW                     809     61128  1.212766276     91.63656
##   HAIL                           730     61858  1.094337926     92.73090
##   ICE STORM                      613     62471  0.918944039     93.64984
##   BLIZZARD                       495     62966  0.742051059     94.39189
##   HEAVY RAIN                     426     63392  0.638613639     95.03051
##   HIGH SURF                      415     63807  0.622123615     95.65263
##   TROPICAL STORM                 395     64202  0.592141754     96.24477
##   DUST STORM                     387     64589  0.580149010     96.82492
##   AVALANCHE                      381     64970  0.571154452     97.39608
##   STRONG WIND                    381     65351  0.571154452     97.96723
##   WINTER WEATHER                 376     65727  0.563658986     98.53089
##   EXTREME COLD/WIND CHILL        365     66092  0.547168963     99.07806
##   TSUNAMI                        162     66254  0.242853074     99.32091
##   COLD/WIND CHILL                150     66404  0.224863957     99.54577
##   FROST/FREEZE                    88     66492  0.131920188     99.67769
##   STORM SURGE/TIDE                56     66548  0.083949211     99.76164
##   MARINE THUNDERSTORM WIND        53     66601  0.079451932     99.84110
##   DUST DEVIL                      46     66647  0.068958280     99.91005
##   MARINE STRONG WIND              36     66683  0.053967350     99.96402
##   COASTAL FLOOD                   13     66696  0.019488210     99.98351
##   DROUGHT                          4     66700  0.005996372     99.98951
##   WATERSPOUT                       4     66704  0.005996372     99.99550
##   MARINE HIGH WIND                 2     66706  0.002998186     99.99850
##   FUNNEL CLOUD                     1     66707  0.001499093    100.00000
currentTime <- Sys.time()
paste("Current interval time - humanHarm Pareto analysis :",round(currentTime - intervalStart,2),attr(currentTime - intervalStart,"units"))
## [1] "Current interval time - humanHarm Pareto analysis : 0.53 secs"
paste("Total elapsed time:",round(currentTime - startTime,2),attr(currentTime - startTime,"units"))
## [1] "Total elapsed time: 1.56 mins"

Event types causing most economic damage

Floods cause the greatest economic damage, with 10,170 incidents causing a total of $149.1 billion in damage over the nearly 16 years included in our analysis. Hurricanes and storm surges/tides rank second and third in total damage. Together these three event types account for $284 billion, or 71%, of the $401.5 billion in property and crop damage from storms in the United States between January 1996 and November 2011.
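The shares quoted above follow from a cumulative sum over the ranked damage totals. This is a minimal sketch, assuming the six largest values (in billions USD) and the $401.5203 billion total from this section's output; `typesNeeded` is a hypothetical helper, not part of the original script.

```r
# Six largest damage totals in billions USD, as printed in this section's output
damage <- c("FLOOD"               = 149.14274270,
            "HURRICANE (TYPHOON)" = 87.06899681,
            "STORM SURGE/TIDE"    = 47.87631700,
            "TORNADO"             = 24.90037072,
            "HAIL"                = 17.09203587,
            "FLASH FLOOD"         = 16.55717061)
totalDamage <- 401.5203

# Share of total damage covered by the three costliest event types
topThreeShare <- sum(damage[1:3]) / totalDamage

# Hypothetical helper: how many ranked event types are needed to reach a threshold
typesNeeded <- function(x, total, threshold) {
     unname(which(cumsum(x) / total >= threshold)[1])
}

round(topThreeShare, 3)
typesNeeded(damage, totalDamage, 0.80)
```

The helper assumes its input is already sorted in decreasing order, as `damageByEvtype` is after the `order()` call.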

Because the three costliest event types are all water-driven, mitigating their impact would require large investments in sea walls, dams, and other types of water management infrastructure. Further analysis of the geographic distribution of these events would be required to prioritize the needed infrastructure investments.

intervalStart <- Sys.time()
damageByEvtype[1:5,]
##                 EVTYPE   econImpact incidents  pctDamage pctOfIncidents
## 14               FLOOD 149142742700     10170 0.37144505    0.052281198
## 23 HURRICANE (TYPHOON)  87068996810       199 0.21684828    0.001023005
## 34    STORM SURGE/TIDE  47876317000       227 0.11923759    0.001166945
## 37             TORNADO  24900370720     12147 0.06201522    0.062444416
## 17                HAIL  17092035870     22621 0.04256829    0.116288395
sum(damageByEvtype$econImpact / 10^9)
## [1] 401.5203
sum(damageByEvtype$econImpact[1:3] / 10^9)
## [1] 284.0881
damage <- damageByEvtype$econImpact / 10^9
names(damage) <- damageByEvtype$EVTYPE
par(mar=c(3,1,1,1))
par(las=2)
pareto.chart(damage,
     ylab="Damage (in billions USD)",
     main="Economic Impact - All Event Types, 1996 - 2011",
     cex.names=0.5,
     ylab2 = "Cum. Percentage")

##                           
## Pareto chart analysis for damage
##                               Frequency Cum.Freq.   Percentage
##   FLOOD                    149.14274270  149.1427 3.714450e+01
##   HURRICANE (TYPHOON)       87.06899681  236.2117 2.168483e+01
##   STORM SURGE/TIDE          47.87631700  284.0881 1.192376e+01
##   TORNADO                   24.90037072  308.9884 6.201522e+00
##   HAIL                      17.09203587  326.0805 4.256829e+00
##   FLASH FLOOD               16.55717061  342.6376 4.123619e+00
##   DROUGHT                   14.41366700  357.0513 3.589773e+00
##   THUNDERSTORM WIND          8.93230308  365.9836 2.224620e+00
##   TROPICAL STORM             8.32018655  374.3038 2.072171e+00
##   WILDFIRE                   8.16270463  382.4665 2.032949e+00
##   HIGH WIND                  5.88972595  388.3562 1.466856e+00
##   ICE STORM                  3.65805881  392.0143 9.110519e-01
##   HEAVY RAIN                 1.67066134  393.6849 4.160839e-01
##   WINTER STORM               1.55509775  395.2400 3.873024e-01
##   FROST/FREEZE               1.39118720  396.6312 3.464799e-01
##   EXTREME COLD/WIND CHILL    1.35518640  397.9864 3.375138e-01
##   LIGHTNING                  0.74997552  398.7364 1.867839e-01
##   HEAVY SNOW                 0.70748464  399.4439 1.762014e-01
##   BLIZZARD                   0.53283395  399.9767 1.327041e-01
##   EXCESSIVE HEAT             0.50012570  400.4768 1.245580e-01
##   COASTAL FLOOD              0.39157556  400.8684 9.752322e-02
##   STRONG WIND                0.23971295  401.1081 5.970132e-02
##   TSUNAMI                    0.14408200  401.2522 3.588411e-02
##   HIGH SURF                  0.09544450  401.3476 2.377078e-02
##   LAKE-EFFECT SNOW           0.04018200  401.3878 1.000746e-02
##   WINTER WEATHER             0.03586600  401.4237 8.932549e-03
##   COLD/WIND CHILL            0.03338650  401.4571 8.315021e-03
##   DENSE FOG                  0.02264650  401.4797 5.640187e-03
##   DUST STORM                 0.00859400  401.4883 2.140365e-03
##   LAKESHORE FLOOD            0.00754000  401.4959 1.877863e-03
##   MARINE THUNDERSTORM WIND   0.00590740  401.5018 1.471258e-03
##   WATERSPOUT                 0.00573020  401.5075 1.427126e-03
##   AVALANCHE                  0.00371180  401.5112 9.244363e-04
##   DUST DEVIL                 0.00177253  401.5130 4.414546e-04
##   TROPICAL DEPRESSION        0.00173700  401.5147 4.326057e-04
##   HEAT                       0.00170650  401.5164 4.250096e-04
##   MARINE HIGH WIND           0.00129701  401.5177 3.230247e-04
##   SEICHE                     0.00098000  401.5187 2.440723e-04
##   VOLCANIC ASH               0.00050000  401.5192 1.245267e-04
##   MARINE STRONG WIND         0.00041833  401.5196 1.041865e-04
##   ASTRONOMICAL LOW TIDE      0.00032000  401.5199 7.969708e-05
##   RIP CURRENT                0.00016300  401.5201 4.059570e-05
##   FUNNEL CLOUD               0.00013410  401.5202 3.339806e-05
##   DENSE SMOKE                0.00010000  401.5203 2.490534e-05
##   MARINE HAIL                0.00000400  401.5203 9.962135e-07
##                           
## Pareto chart analysis for damage
##                            Cum.Percent.
##   FLOOD                        37.14450
##   HURRICANE (TYPHOON)          58.82933
##   STORM SURGE/TIDE             70.75309
##   TORNADO                      76.95461
##   HAIL                         81.21144
##   FLASH FLOOD                  85.33506
##   DROUGHT                      88.92483
##   THUNDERSTORM WIND            91.14945
##   TROPICAL STORM               93.22163
##   WILDFIRE                     95.25457
##   HIGH WIND                    96.72143
##   ICE STORM                    97.63248
##   HEAVY RAIN                   98.04857
##   WINTER STORM                 98.43587
##   FROST/FREEZE                 98.78235
##   EXTREME COLD/WIND CHILL      99.11986
##   LIGHTNING                    99.30665
##   HEAVY SNOW                   99.48285
##   BLIZZARD                     99.61555
##   EXCESSIVE HEAT               99.74011
##   COASTAL FLOOD                99.83763
##   STRONG WIND                  99.89733
##   TSUNAMI                      99.93322
##   HIGH SURF                    99.95699
##   LAKE-EFFECT SNOW             99.96700
##   WINTER WEATHER               99.97593
##   COLD/WIND CHILL              99.98424
##   DENSE FOG                    99.98988
##   DUST STORM                   99.99202
##   LAKESHORE FLOOD              99.99390
##   MARINE THUNDERSTORM WIND     99.99537
##   WATERSPOUT                   99.99680
##   AVALANCHE                    99.99773
##   DUST DEVIL                   99.99817
##   TROPICAL DEPRESSION          99.99860
##   HEAT                         99.99902
##   MARINE HIGH WIND             99.99935
##   SEICHE                       99.99959
##   VOLCANIC ASH                 99.99972
##   MARINE STRONG WIND           99.99982
##   ASTRONOMICAL LOW TIDE        99.99990
##   RIP CURRENT                  99.99994
##   FUNNEL CLOUD                 99.99997
##   DENSE SMOKE                 100.00000
##   MARINE HAIL                 100.00000
currentTime <- Sys.time()
paste("Current interval time - damageData Pareto analysis:",round(currentTime - intervalStart,2),attr(currentTime - intervalStart,"units"))
## [1] "Current interval time - damageData Pareto analysis: 0.11 secs"
paste("Total elapsed time:",round(currentTime - startTime,2),attr(currentTime - startTime,"units"))
## [1] "Total elapsed time: 1.57 mins"
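Each timing report in this script repeats the same pair of `paste()` calls. This is a minimal sketch of how that pattern could be wrapped in a function; `reportElapsed` is a hypothetical helper, and the fixed timestamps below are illustrative values chosen so the result is reproducible.

```r
# Hypothetical helper: formats the time elapsed since `since` in the same
# style as the paste() calls used throughout the script
reportElapsed <- function(label, since, now = Sys.time()) {
     elapsed <- now - since     # difftime; R picks secs, mins, etc. automatically
     paste(label, round(as.numeric(elapsed), 2), units(elapsed))
}

# Usage with fixed timestamps instead of Sys.time()
t0 <- as.POSIXct("2017-07-01 12:00:00", tz = "UTC")
t1 <- t0 + 93                   # 93 seconds later
msg <- reportElapsed("Total elapsed time:", t0, t1)
msg
```

In the script itself the call would take `intervalStart` or `startTime` as the `since` argument and rely on the default `now = Sys.time()`.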

References

Kabacoff, Robert (2015) – R in Action: Data Analysis and Graphics with R, Second Edition. Manning Publications, Shelter Island, NY 2015.

Muenchen, Robert (2006) – R for SAS and SPSS Users, PDF document retrieved from the internet on October 19, 2015.

National Oceanic and Atmospheric Administration of the United States – National Weather Service Directives, retrieved from the internet on October 15, 2015.

National Oceanic and Atmospheric Administration of the United States – National Weather Service Directive 10-1605, retrieved from the internet on October 10, 2015.

Scrucca, L. (2004) – qcc: an R package for quality control charting and statistical process control. R News 4/1, 11-17.

citation("qcc")
## 
## To cite qcc in publications use:
## 
##   Scrucca, L. (2004). qcc: an R package for quality control
##   charting and statistical process control. R News 4/1, 11-17.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Article{,
##     title = {qcc: an R package for quality control charting and statistical process control},
##     author = {Luca Scrucca},
##     journal = {R News},
##     year = {2004},
##     pages = {11--17},
##     volume = {4/1},
##     url = {http://CRAN.R-project.org/doc/Rnews/},
##   }
sessionInfo()
## R version 3.4.1 (2017-06-30)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.2 LTS
## 
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] qcc_2.6      sqldf_0.4-11 RSQLite_2.0  gsubfn_0.6-6 proto_1.0.0 
## [6] readr_1.1.1  pryr_0.1.2  
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.11     knitr_1.16       magrittr_1.5     MASS_7.3-47     
##  [5] hms_0.3          bit_1.1-12       R6_2.2.2         rlang_0.1.1     
##  [9] blob_1.1.0       stringr_1.2.0    tcltk_3.4.1      tools_3.4.1     
## [13] DBI_0.7          htmltools_0.3.6  yaml_2.1.14      bit64_0.9-7     
## [17] rprojroot_1.2    digest_0.6.12    tibble_1.3.3     codetools_0.2-15
## [21] memoise_1.1.0    evaluate_0.10.1  rmarkdown_1.6    stringi_1.1.5   
## [25] compiler_3.4.1   backports_1.1.0  chron_2.3-50     pkgconfig_2.0.1