Economic and Health Risk Assessments of Storm Events

NOAA Storm Data Analysis
Heng-Ru May Tan

• Synopsis :

Severe weather events like storms can, and often do, result in public health and economic consequences for local and national communities. Prevention and anticipation of the extent of severe storm event outcomes, such as fatalities, injuries, property and crop damages, are primary concerns. The assessment reported here involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The NOAA database tracks and documents characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, property and crop damages.

An initial data exploration was performed to tidy up the data for consistency: resolving the range of listed storm events to correspond with permitted NOAA events, reformatting values of estimated damages, and assessing potential outliers. The data analysis subsequently performed here investigated the effects of storm events on public health (in terms of fatalities and injuries) and economic consequences (as inferred from property and crop damages) at a national level across the U.S.

• Data Processing :

The data was retrieved from the NOAA storm database, which has data documented from 1950 to Nov 2011.
- DATA URL: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2
- DOCUMENTATION: National Weather Service Storm Data Documentation
- FAQ: National Climatic Data Center Storm Events

o SETTING THE ENVIRONMENT

The analysis environment is set by loading the required libraries.

library(knitr)
library(graphics)
library(ggplot2)
library(cowplot)
library(gridExtra)
library(lubridate)
library(tidyr)
library(plyr)
library(dplyr)
library(regexr)

The R session information is provided for reference:

sessionInfo()
## R version 3.2.2 (2015-08-14)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.11.1 (El Capitan)
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] magrittr_1.5    formatR_1.2.1   tools_3.2.2     htmltools_0.2.6
##  [5] yaml_2.1.13     stringi_1.0-1   rmarkdown_0.8.1 knitr_1.11     
##  [9] stringr_1.0.0   digest_0.6.8    evaluate_0.8

o READING IN THE DATA

The data file is read from the current working directory if present; otherwise it is first downloaded from the URL and then read into the environment.

## --- DOWNLOAD FILE (if file not in current directory)
bzname <- "repdata-data-StormData.csv.bz2"
if (!file.exists(bzname)) {
      fileurl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
      ## save under the expected file name so that later runs can skip the download
      download.file(fileurl, destfile = bzname, method = "curl", mode = "wb")
      rm(fileurl)
}

## --- READ FILE (read.csv decompresses .bz2 files transparently)
## stringsAsFactors = TRUE reproduces the factor columns shown below (the default before R 4.0.0)
stormD <- read.csv(bzname, stringsAsFactors = TRUE)

o EXPLORING | TIDYING RAW DATA

– DATE & TIME :
The date and time information was entered in a range of formats and is reformatted for consistency.

1– Beginning date and time columns of events are combined and parsed.

bgn_datetime <- paste(mdy_hms(stormD$BGN_DATE), stormD$BGN_TIME, sep=" ") 
bgn_datetime2 <- parse_date_time(bgn_datetime,  c('%Y-%m-%d %H%M',
                                                  '%Y-%m-%d %H:%M:%S %p')) 
## Warning: 5 failed to parse.

There were 5 values in the combined beginning date-time information that failed to parse.

## Explore what these NAs are
NANbgn_datetimeIDX <- which(is.na(bgn_datetime2)) 
bgn_datetime[NANbgn_datetimeIDX] 
## [1] "1993-01-12 1990" "1993-12-25 9999" "1994-06-29 2090" "1995-08-16 1580"
## [5] "1995-08-28 0572"

On inspection, these appear to be ‘nonsensical’ time entries from data prior to 1996.
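
A minimal sketch of how such entries could be flagged before parsing, assuming the time field is a four-digit 24-hour HHMM string. The helper name is_valid_hhmm is hypothetical and not part of the original analysis; the five test values are the ones listed above.

```r
# Hypothetical helper: TRUE when a string is a valid 24-hour HHMM time
# (hours 00-23, minutes 00-59).
is_valid_hhmm <- function(x) grepl("^([01][0-9]|2[0-3])[0-5][0-9]$", x)

# The five time entries that failed to parse
bad_times <- c("1990", "9999", "2090", "1580", "0572")

is_valid_hhmm(bad_times)           # all FALSE: hours or minutes out of range
is_valid_hhmm(c("0800", "2359"))   # both TRUE
```

A check of this kind only explains why the values fail; it does not recover the intended times.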

2– Similarly, the end date and time columns of events are combined and parsed.

end_datetime <- paste(mdy_hms(stormD$END_DATE), stormD$END_TIME, sep=" ") 
end_datetime2 <- parse_date_time(end_datetime,  c('%Y-%m-%d %I:%M:%S %p',
                                                  '%Y-%m-%d',
                                                  '%H%M %p',
                                                  '%Y-%m-%d %H%M',
                                                  '%Y-%m-%d %H:%M:%S %p'), truncated = 3) 
## Warning: 238923 failed to parse.

Many more end date-time values failed to parse. These correspond to missing end date-time and other related values in the raw data.

sum(is.na(end_datetime2))/nrow(stormD)
## [1] 0.2647942

This was roughly 26% of the raw data. On further inspection using the combined beginning date-time information, it appears that these missing end date-time values occur in data recorded before 1996, and include 3 of the 5 ‘nonsensical’ values from the combined beginning information.

NANend_datetimeIDX <- which(is.na(end_datetime2)) 
summary(year(bgn_datetime2[NANend_datetimeIDX]) )
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    1950    1975    1986    1983    1992    1995       3

The Storm Events Database notes that from 1996 to present, “All Event Types (48 from Directive 10-1605)” are recorded as defined in NWS Directive 10-1605. As such, and given the substantial missing end date-time values, it is likely that the storm data recorded prior to 1996 were less homogeneous.

Therefore, the subsequent pre-processing is performed on data recorded from 1996 onwards to November 2011.

We first merge the date-time columns in the raw data and substitute them with the parsed beginning and end date-times before subsetting data from the relevant years of interest.

stormD <- stormD %>% 
      unite(BGN_DATETIME, c(BGN_DATE,BGN_TIME), sep = "" ) %>% 
      mutate(BGN_DATETIME = bgn_datetime2)

stormD <- stormD %>% 
      unite(END_DATETIME, c(END_DATE,END_TIME), sep = "" ) %>% 
      mutate(END_DATETIME = end_datetime2)

o PRE-PROCESSING Storm Data (1996-Nov2011)

We subset the data from the relevant years of interest in a new dataframe.

stormD_9611 <- filter(stormD, year(BGN_DATETIME)>=1996 )
head(stormD_9611[,c("STATE","BGN_DATETIME","END_DATETIME","EVTYPE","FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP","REFNUM")],10)
##    STATE        BGN_DATETIME        END_DATETIME       EVTYPE FATALITIES
## 1     AL 1996-01-06 08:00:00 1996-01-07 15:00:00 WINTER STORM          0
## 2     AL 1996-01-11 06:35:00 1996-01-11 18:36:00      TORNADO          0
## 3     AL 1996-01-11 06:45:00 1996-01-11 18:45:00    TSTM WIND          0
## 4     AL 1996-01-11 07:05:00 1996-01-11 19:05:00    TSTM WIND          0
## 5     AL 1996-01-11 07:38:00 1996-01-11 19:38:00    TSTM WIND          0
## 6     AL 1996-01-18 05:05:00 1996-01-18 17:05:00         HAIL          0
## 7     AL 1996-01-18 06:00:00 1996-01-19 05:00:00    HIGH WIND          0
## 8     AL 1996-01-19 12:20:00 1996-01-19 12:20:00    TSTM WIND          0
## 9     AL 1996-01-24 03:40:00 1996-01-24 03:40:00    TSTM WIND          0
## 10    AL 1996-01-24 03:45:00 1996-01-24 03:45:00    TSTM WIND          0
##    INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP REFNUM
## 1         0     380          K      38          K 248768
## 2         0     100          K       0            248769
## 3         0       3          K       0            248770
## 4         0       5          K       0            248771
## 5         0       2          K       0            248772
## 6         0       0                  0            248773
## 7         0     400          K       0            248774
## 8         0      12          K       0            248775
## 9         0       8          K       0            248776
## 10        0      12          K       0            248777

The structure of the subset of storm data has 653530 obs. of 35 variables.

str(stormD_9611)
## 'data.frame':    653530 obs. of  35 variables:
##  $ STATE__     : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATETIME: POSIXct, format: "1996-01-06 08:00:00" "1996-01-11 06:35:00" ...
##  $ TIME_ZONE   : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY      : num  1 31 31 45 67 125 1 75 51 101 ...
##  $ COUNTYNAME  : Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 737 2926 2926 4388 5778 24418 734 10074 4558 13540 ...
##  $ STATE       : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE      : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 972 834 856 856 856 244 359 856 856 856 ...
##  $ BGN_RANGE   : num  0 5 0 0 0 8 0 0 8 23 ...
##  $ BGN_AZI     : Factor w/ 35 levels "","  N"," NW",..: 1 14 1 1 1 21 1 1 17 22 ...
##  $ BGN_LOCATI  : Factor w/ 54429 levels ""," Christiansburg",..: 1 25146 15520 38419 21271 49638 1 50473 52699 31662 ...
##  $ END_DATETIME: POSIXct, format: "1996-01-07 15:00:00" "1996-01-11 18:36:00" ...
##  $ COUNTY_END  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN  : logi  NA NA NA NA NA NA ...
##  $ END_RANGE   : num  0 5 0 0 0 8 0 0 8 0 ...
##  $ END_AZI     : Factor w/ 24 levels "","E","ENE","ESE",..: 1 9 1 1 1 14 1 1 10 1 ...
##  $ END_LOCATI  : Factor w/ 34506 levels ""," CANTON"," TULIA",..: 1 16298 10081 24389 13781 31439 1 31961 33339 20494 ...
##  $ LENGTH      : num  0 1 0 0 0 0 0 0 0 0 ...
##  $ WIDTH       : num  0 75 0 0 0 0 0 0 0 0 ...
##  $ F           : int  NA 1 NA NA NA NA NA NA NA NA ...
##  $ MAG         : num  0 0 0 0 0 75 40 50 50 50 ...
##  $ FATALITIES  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ INJURIES    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ PROPDMG     : num  380 100 3 5 2 0 400 12 8 12 ...
##  $ PROPDMGEXP  : Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 1 17 17 17 17 ...
##  $ CROPDMG     : num  38 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP  : Factor w/ 9 levels "","?","0","2",..: 7 1 1 1 1 1 1 1 1 1 ...
##  $ WFO         : Factor w/ 542 levels ""," CI","%SD",..: 113 503 503 503 503 113 113 113 113 113 ...
##  $ STATEOFFIC  : Factor w/ 250 levels "","ALABAMA, Central",..: 2 4 4 4 4 2 2 2 2 2 ...
##  $ ZONENAMES   : Factor w/ 25112 levels "","                                                                                                                               "| __truncated__,..: 11608 1 1 1 1 1 11611 1 1 1 ...
##  $ LATITUDE    : num  0 3116 3119 3119 3121 ...
##  $ LONGITUDE   : num  0 8608 8551 8533 8521 ...
##  $ LATITUDE_E  : num  0 3116 3119 3119 3121 ...
##  $ LONGITUDE_  : num  0 8608 8551 8533 8521 ...
##  $ REMARKS     : Factor w/ 436781 levels "","\t","\t\t",..: 53463 45112 354734 68033 68056 71225 296502 356375 74910 353280 ...
##  $ REFNUM      : num  248768 248769 248770 248771 248772 ...

Additionally, the data structure information indicates that there are 985 recorded event types. However, there are only 48 permitted storm data events, so the recorded types need to be resolved to the permitted ones.

o EVENTS MATCHING

Using the list of permitted storm events, an Event-Mapping strategy is implemented. The documented data events are string-matched to the list of 48 permitted events.

1–INITIAL EVENTS-MATCHING

PermittedEvents = (c("Astronomical Low Tide", "Avalanche", "Blizzard", "Coastal Flood", "Cold/Wind Chill",
                     "Debris Flow", "Dense Fog", "Dense Smoke", "Drought", "Dust Devil", "Dust Storm", 
                     "Excessive Heat", "Extreme Cold/Wind Chill", "Flash Flood", "Flood", "Frost/Freeze", 
                     "Funnel Cloud", "Freezing Fog", "Hail", "Heat", "Heavy Rain", "Heavy Snow", "High Surf",
                     "High Wind", "Hurricane (Typhoon)", "Ice Storm", "Lake-Effect Snow", "Lakeshore Flood",
                     "Lightning", "Marine Hail", "Marine High Wind", "Marine Strong Wind", 
                     "Marine Thunderstorm Wind", "Rip Current", "Seiche", "Sleet", "Storm Surge/Tide", 
                     "Strong Wind", "Thunderstorm Wind", "Tornado", "Tropical Depression", "Tropical Storm",
                     "Tsunami", "Volcanic Ash", "Waterspout", "Wildfire", "Winter Storm", "Winter Weather") )

A more ‘conservative’ approach of literal string matching is used (i.e. the grep() parameter “fixed” is set to “TRUE”), so each upper-cased permitted event name is searched for as a fixed, case-sensitive substring of the recorded event type. This means that some records that may well fall under the description of a permitted event are ignored. The reasoning is that, if documented appropriately, any storm data entry should (in principle) conform to the standardized input formats.

## fixed = TRUE matches the pattern literally; grep() ignores ignore.case (with a warning) in that case, so it is omitted
matchEvtIDX <- lapply(toupper(PermittedEvents), function(row) grep(row , as.character(stormD_9611$EVTYPE), 
                                                                   fixed =  TRUE) )
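
To make the behaviour of literal matching concrete, here is a small sketch on toy labels (illustrative values modelled on entries seen in EVTYPE, not the real data). It contrasts the fixed-string search with an anchored, case-insensitive regular expression, which is an alternative shown only for comparison and is not what the analysis uses.

```r
# Toy event labels (illustrative only)
evtypes <- c("COASTAL FLOOD", "COASTAL FLOODING", " COASTAL FLOOD",
             "Coastal Flood", "FLASH FLOOD")

# fixed = TRUE: literal, case-sensitive substring search
literal_idx <- grep("COASTAL FLOOD", evtypes, fixed = TRUE)

# Anchored, case-insensitive regular expression on trimmed labels:
# only whole-label matches are kept
exact_idx <- grep("^COASTAL FLOOD$", trimws(evtypes), ignore.case = TRUE)

evtypes[literal_idx]   # longer and padded variants match; mixed case does not
evtypes[exact_idx]     # padded and mixed-case variants match; "FLOODING" does not
```

The literal search therefore accepts longer variants such as "COASTAL FLOODING" but misses mixed-case labels, which is worth keeping in mind when reading the match rates below.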

Initial Event-Mapping revealed that about 87% of the documented events matched those on the permitted list.

length(unlist(matchEvtIDX))/nrow(stormD_9611)
## [1] 0.8701483
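
One caveat on this figure: because the patterns are substrings of one another (for example "FLOOD" is contained in "FLASH FLOOD"), a record can be indexed by more than one pattern, so counting unlist()-ed indices may overstate the share of matched records. A toy sketch of the difference, using made-up labels and hypothetical variable names:

```r
# Toy labels and patterns (illustrative only)
evtypes  <- c("FLOOD", "FLASH FLOOD", "HAIL", "OTHER")
patterns <- c("FLOOD", "FLASH FLOOD", "HAIL")

idx <- lapply(patterns, function(p) grep(p, evtypes, fixed = TRUE))

# Counts "FLASH FLOOD" twice (matched by both FLOOD and FLASH FLOOD)
raw_share <- length(unlist(idx)) / length(evtypes)

# Counts each matched record once
unique_share <- length(unique(unlist(idx))) / length(evtypes)

c(raw = raw_share, unique = unique_share)
```

Applying unique() to the combined indices would give the share of distinct records matched.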

However, it is observed that there are variant (long/short-form) descriptors of the ‘same’ storm events.

unique(as.character( stormD_9611$EVTYPE[unlist(matchEvtIDX)] ))
##   [1] "ASTRONOMICAL LOW TIDE"     "AVALANCHE"                
##   [3] "BLIZZARD"                  "COASTAL FLOOD"            
##   [5] "COASTAL FLOODING"          " COASTAL FLOOD"           
##   [7] "COASTAL FLOODING/EROSION"  "EXTREME COLD/WIND CHILL"  
##   [9] "COLD/WIND CHILL"           "DENSE FOG"                
##  [11] "PATCHY DENSE FOG"          "DENSE SMOKE"              
##  [13] "DROUGHT"                   "SNOW DROUGHT"             
##  [15] "EXCESSIVE HEAT/DROUGHT"    "DUST DEVIL"               
##  [17] "DUST STORM"                "EXCESSIVE HEAT"           
##  [19] "FLASH FLOOD"               "FLASH FLOOD/FLOOD"        
##  [21] "FLASH FLOODING"            " FLASH FLOOD"             
##  [23] "FLOOD"                     "COASTALFLOOD"             
##  [25] "STREET FLOODING"           "TIDAL FLOODING"           
##  [27] "RIVER FLOOD"               "URBAN/STREET FLOODING"    
##  [29] "COASTAL  FLOODING/EROSION" "URBAN FLOOD"              
##  [31] "RIVER FLOODING"            "FLOOD/FLASH/FLOOD"        
##  [33] "CSTL FLOODING/EROSION"     "SNOWMELT FLOODING"        
##  [35] "LAKESHORE FLOOD"           "FROST/FREEZE"             
##  [37] "FUNNEL CLOUD"              "FUNNEL CLOUDS"            
##  [39] "FREEZING FOG"              "HAIL"                     
##  [41] "TSTM WIND/HAIL"            "HAIL/WIND"                
##  [43] "SMALL HAIL"                "GUSTY WIND/HAIL"          
##  [45] "LATE SEASON HAIL"          "NON SEVERE HAIL"          
##  [47] "MARINE HAIL"               "HEAT"                     
##  [49] "RECORD HEAT"               "HEAT WAVE"                
##  [51] "HEAVY RAIN"                "HEAVY RAIN/WIND"          
##  [53] "TSTM HEAVY RAIN"           "HEAVY RAINFALL"           
##  [55] "LOCALLY HEAVY RAIN"        "HEAVY RAIN EFFECTS"       
##  [57] "HEAVY SNOW"                "HEAVY SNOW SQUALLS"       
##  [59] "HIGH SURF"                 "HIGH SURF ADVISORY"       
##  [61] "   HIGH SURF ADVISORY"     "HIGH SURF ADVISORIES"     
##  [63] "HEAVY SURF/HIGH SURF"      "HIGH WIND"                
##  [65] "HIGH WINDS"                "HIGH WIND (G40)"          
##  [67] "MARINE HIGH WIND"          "ICE STORM"                
##  [69] "LAKE-EFFECT SNOW"          "LIGHTNING"                
##  [71] "TSTM WIND AND LIGHTNING"   " LIGHTNING"               
##  [73] "MARINE STRONG WIND"        "MARINE THUNDERSTORM WIND" 
##  [75] "RIP CURRENTS"              "RIP CURRENT"              
##  [77] "SEICHE"                    "SLEET/FREEZING RAIN"      
##  [79] "FREEZING RAIN/SLEET"       "SNOW/SLEET"               
##  [81] "SNOW AND SLEET"            "SLEET"                    
##  [83] "SLEET STORM"               "STORM SURGE/TIDE"         
##  [85] "STRONG WIND"               "STRONG WINDS"             
##  [87] "STRONG WIND GUST"          "THUNDERSTORM WIND (G40)"  
##  [89] "GUSTY THUNDERSTORM WINDS"  "GUSTY THUNDERSTORM WIND"  
##  [91] "THUNDERSTORM WIND"         "TORNADO"                  
##  [93] "TORNADO DEBRIS"            "TROPICAL DEPRESSION"      
##  [95] "TROPICAL STORM"            "TSUNAMI"                  
##  [97] "VOLCANIC ASH"              "VOLCANIC ASHFALL"         
##  [99] "WATERSPOUT"                "WATERSPOUTS"              
## [101] " WATERSPOUT"               "WILDFIRE"                 
## [103] "WINTER STORM"              "WINTER WEATHER"           
## [105] "WINTER WEATHER MIX"        "WINTER WEATHER/MIX"

Through a series of iterative explorations to verify whether the Event-Mapping strategy yielded appropriate matches, an updated matching strategy was implemented. This uses an updated list of match strings (RegExp2use) to match the permitted events.

2–UPDATED EVENTS-MATCHING

### RegExp2use------------------------------------------------------
RegExp2use = (c("Astronomical Low Tide", 
                "Avalanche", 
                "Blizzard", 
                "Flood", 
                "Flash Flood", 
                "FLOOD/FLASH/FLOOD", #"Flash Flood", 
                "Coastal Flood", 
                "COASTAL  FLOODING/EROSION", 
                "CSTL FLOODING/EROSION", 
                "COASTALFLOOD", 
                "Lakeshore Flood", 
                "Cold/Wind Chill", 
                "Extreme Cold/Wind Chill",
                "Debris Flow", 
                "Dense Fog", 
                "Freezing Fog",
                "Dense Smoke", 
                "Drought", 
                "Heat", 
                "Excessive Heat", 
                "Dust Devil", 
                "Dust Storm",
                "Frost/Freeze", 
                "Funnel Cloud", 
                "Hail", 
                "Marine Hail", 
                "Heavy Rain", 
                "Heavy Snow", 
                "High Surf",
                "High Wind",
                "Marine High Wind", 
                "Strong Wind",
                "Marine Strong Wind", 
                "Thunderstorm Wind", 
                "TSTM WIND",
                "Marine Thunderstorm Wind",
                "Marine TSTM WIND",
                "Hurrican/Typhoon", # "Hurricane (Typhoon)",  
                "Hurrican",
                "Typhoon",
                "Ice Storm",
                "Lake-Effect Snow", 
                "Lightning",
                "Rip Current",
                "Seiche", 
                "Sleet",
                "Storm Surge/Tide", 
                "Tornado", 
                "Tropical Depression", 
                "Tropical Storm", 
                "Tsunami",
                "Volcanic Ash",
                "Waterspout", 
                "Wildfire",
                "Winter Storm", 
                "Winter Weather") )

The following table illustrates the mapping:

Events Mapping: A listing of the match string(s) used to map to each permitted storm event. All Permitted Event Types in the Storm Data (48 from Directive 10-1605) are recorded (from 1996 to present) as defined in NWS Directive 10-1605, Table 1 of Section 2.1.1.

PermittedEvents          | RegExp2use
Astronomical Low Tide    | Astronomical Low Tide
Avalanche                | Avalanche
Blizzard                 | Blizzard
Flood                    | Flood
Flash Flood              | Flash Flood
Flash Flood              | FLOOD/FLASH/FLOOD
Coastal Flood            | Coastal Flood
Coastal Flood            | COASTAL  FLOODING/EROSION
Coastal Flood            | CSTL FLOODING/EROSION
Coastal Flood            | COASTALFLOOD
Lakeshore Flood          | Lakeshore Flood
Cold/Wind Chill          | Cold/Wind Chill
Extreme Cold/Wind Chill  | Extreme Cold/Wind Chill
Debris Flow              | Debris Flow
Dense Fog                | Dense Fog
Freezing Fog             | Freezing Fog
Dense Smoke              | Dense Smoke
Drought                  | Drought
Heat                     | Heat
Excessive Heat           | Excessive Heat
Dust Devil               | Dust Devil
Dust Storm               | Dust Storm
Frost/Freeze             | Frost/Freeze
Funnel Cloud             | Funnel Cloud
Hail                     | Hail
Marine Hail              | Marine Hail
Heavy Rain               | Heavy Rain
Heavy Snow               | Heavy Snow
High Surf                | High Surf
High Wind                | High Wind
Marine High Wind         | Marine High Wind
Strong Wind              | Strong Wind
Marine Strong Wind       | Marine Strong Wind
Thunderstorm Wind        | Thunderstorm Wind
Thunderstorm Wind        | TSTM WIND
Marine Thunderstorm Wind | Marine Thunderstorm Wind
Marine Thunderstorm Wind | Marine TSTM WIND
Hurricane (Typhoon)      | Hurrican/Typhoon
Hurricane (Typhoon)      | Hurrican
Hurricane (Typhoon)      | Typhoon
Ice Storm                | Ice Storm
Lake-Effect Snow         | Lake-Effect Snow
Lightning                | Lightning
Rip Current              | Rip Current
Seiche                   | Seiche
Sleet                    | Sleet
Storm Surge/Tide         | Storm Surge/Tide
Tornado                  | Tornado
Tropical Depression      | Tropical Depression
Tropical Storm           | Tropical Storm
Tsunami                  | Tsunami
Volcanic Ash             | Volcanic Ash
Waterspout               | Waterspout
Wildfire                 | Wildfire
Winter Storm             | Winter Storm
Winter Weather           | Winter Weather
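
The code further below refers to a vector PermittedEvts2 whose definition is not shown in this report. The following is a sketch of how it could be built, assuming it is simply the first column of the mapping table, parallel to RegExp2use; the construction itself is an assumption, while the index positions follow the table.

```r
# Match strings, in the same order as in the table
RegExp2use <- c("Astronomical Low Tide", "Avalanche", "Blizzard", "Flood",
                "Flash Flood", "FLOOD/FLASH/FLOOD", "Coastal Flood",
                "COASTAL  FLOODING/EROSION", "CSTL FLOODING/EROSION",
                "COASTALFLOOD", "Lakeshore Flood", "Cold/Wind Chill",
                "Extreme Cold/Wind Chill", "Debris Flow", "Dense Fog",
                "Freezing Fog", "Dense Smoke", "Drought", "Heat",
                "Excessive Heat", "Dust Devil", "Dust Storm", "Frost/Freeze",
                "Funnel Cloud", "Hail", "Marine Hail", "Heavy Rain",
                "Heavy Snow", "High Surf", "High Wind", "Marine High Wind",
                "Strong Wind", "Marine Strong Wind", "Thunderstorm Wind",
                "TSTM WIND", "Marine Thunderstorm Wind", "Marine TSTM WIND",
                "Hurrican/Typhoon", "Hurrican", "Typhoon", "Ice Storm",
                "Lake-Effect Snow", "Lightning", "Rip Current", "Seiche",
                "Sleet", "Storm Surge/Tide", "Tornado", "Tropical Depression",
                "Tropical Storm", "Tsunami", "Volcanic Ash", "Waterspout",
                "Wildfire", "Winter Storm", "Winter Weather")

# Assumed construction: start from the match strings, then overwrite the
# variant spellings with the permitted event name they map to.
PermittedEvts2 <- RegExp2use
PermittedEvts2[6]     <- "Flash Flood"
PermittedEvts2[8:10]  <- "Coastal Flood"
PermittedEvts2[35]    <- "Thunderstorm Wind"
PermittedEvts2[37]    <- "Marine Thunderstorm Wind"
PermittedEvts2[38:40] <- "Hurricane (Typhoon)"

length(PermittedEvts2)          # one entry per match string
length(unique(PermittedEvts2))  # number of distinct permitted events
```

Keeping the two vectors the same length means position i of RegExp2use always maps to position i of PermittedEvts2.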

3–IMPLEMENTING UPDATED EVENTS-MATCHING WITH INDEXING UPDATE

Using the updated Events-Mapping strategy, the storm data events were matched to permitted events.

## fixed = TRUE matches literally; ignore.case would be ignored by grep() here, so it is omitted
matchEvtIDX2 <- lapply(toupper(RegExp2use), 
                       function(row) grep(as.character(row) , as.character(stormD_9611$EVTYPE), 
                                          fixed =  TRUE) )

As the string matching is not perfect, additional iterative checks made sure that each list of event-matching indices yielded the appropriate permitted event. Where it did not, the list of indices is resolved using the set difference with the relevant overlapping indexed events and/or additional string exclusions. An example is commented in the following code.

## SETdiff matchEvtIDX2 lists for appropriate matches:

# "FLOOD" 
matchEvtIDX2[[4]] <- setdiff(matchEvtIDX2[[4]], unlist(matchEvtIDX2[5:11] ) ) 

# "COLD/WIND CHILL"
matchEvtIDX2[[12]]<- setdiff(matchEvtIDX2[[12]], unlist(matchEvtIDX2[13] ) ) 

# "DROUGHT"
matchEvtIDX2[[18]]<- setdiff(matchEvtIDX2[[18]], unlist(matchEvtIDX2[20] ) ) 


# "HEAT"
matchNONHEAT_IDX <- grep("(^RECORD HEAT)|(^HEAT WAVE)" , as.character(stormD_9611$EVTYPE) , ignore.case = TRUE, fixed =  FALSE)
# > unique(as.character( stormD_9611$EVTYPE[matchNONHEAT_IDX] ))
# [1] "Record Heat" "Heat Wave"   "RECORD HEAT" "HEAT WAVE" 

matchEvtIDX2[[19]] <- setdiff(matchEvtIDX2[[19]], c(unlist(matchEvtIDX2[20]), matchNONHEAT_IDX) )
# diffIDX <- setdiff(matchEvtIDX2[[19]], unlist(matchEvtIDX2[20] ) )
# unique(as.character( stormD_9611$EVTYPE[diffIDX] ))
# [1] "HEAT"       

# "EXCESSIVE HEAT"
matchEvtIDX2[[20]] <- c(unlist(matchEvtIDX2[20]), matchNONHEAT_IDX) 
# > unique(as.character( stormD_9611$EVTYPE[matchEvtIDX2[[20]]] ))
# [1] "EXCESSIVE HEAT"         "EXCESSIVE HEAT/DROUGHT"
# [3] "Record Heat"            "Heat Wave"             
# [5] "RECORD HEAT"            "HEAT WAVE"             


# "HAIL"
matchEvtIDX2[[25]]<- setdiff(matchEvtIDX2[[25]], unlist(matchEvtIDX2[26] ) ) 

# "HIGH WIND"
matchEvtIDX2[[30]]<- setdiff(matchEvtIDX2[[30]], unlist(matchEvtIDX2[31] ) ) 

# "STRONG WIND" 
matchEvtIDX2[[32]]<- setdiff(matchEvtIDX2[[32]], unlist(matchEvtIDX2[33] ) ) 

# "THUNDERSTORM WIND"
matchEvtIDX2[[34]] <- setdiff(matchEvtIDX2[[34]], unlist(matchEvtIDX2[36] ) )


### AN EXAMPLE of set differencing & exclusion:
# > unique(as.character( stormD_9611$EVTYPE[matchEvtIDX2[[35]]] ))
# [1] "TSTM WIND"               "TSTM WIND/HAIL"          "TSTM WIND (G45)"        
# [4] "TSTM WIND 40"            "TSTM WIND 45"            "TSTM WIND (41)"         
# [7] "TSTM WIND (G40)"         " TSTM WIND"              "TSTM WIND AND LIGHTNING"
# [10] " TSTM WIND (G45)"        "TSTM WIND  (G45)"        "TSTM WIND (G35)"        
# [13] "TSTM WINDS"              "TSTM WIND G45"           "NON-TSTM WIND"          
# [16] "NON TSTM WIND"           "MARINE TSTM WIND"       
# > RegExp2use[35]
# [1] "TSTM WIND"

matchNONTSTM_IDX <- grep("(^NON TSTM Wind)|(^NON-TSTM Wind)" , as.character(stormD_9611$EVTYPE) ,
                         ignore.case = TRUE, fixed =  FALSE)
# > unique(as.character( stormD_9611$EVTYPE[matchNONTSTM_IDX] ))
# [1] "NON-TSTM WIND" "NON TSTM WIND"

matchEvtIDX2[[35]]<- setdiff(matchEvtIDX2[[35]], c(unlist(matchEvtIDX2[36:37]), matchNONTSTM_IDX ) )  
# diffIDX <- setdiff(matchEvtIDX2[[35]], c(unlist(matchEvtIDX2[36:37]), matchNONTSTM_IDX ) ) 
# unique(as.character( stormD_9611$EVTYPE[diffIDX] ))
# [1] "TSTM WIND"               "TSTM WIND/HAIL"          "TSTM WIND (G45)"        
# [4] "TSTM WIND 40"            "TSTM WIND 45"            "TSTM WIND (41)"         
# [7] "TSTM WIND (G40)"         " TSTM WIND"              "TSTM WIND AND LIGHTNING"
# [10] " TSTM WIND (G45)"        "TSTM WIND  (G45)"        "TSTM WIND (G35)"        
# [13] "TSTM WINDS"              "TSTM WIND G45" 
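
The set-difference idea used above can be seen in isolation on a toy vector. This is an illustrative sketch with made-up labels and hypothetical variable names, not a replacement for the indexed code above.

```r
# Toy labels (illustrative only)
evtypes <- c("HAIL", "MARINE HAIL", "SMALL HAIL", "TSTM WIND", "NON-TSTM WIND")

# A short pattern also catches the longer event that contains it
hail_idx        <- grep("HAIL", evtypes, fixed = TRUE)
marine_hail_idx <- grep("MARINE HAIL", evtypes, fixed = TRUE)

# Keep only the records that belong to the shorter event
hail_only <- setdiff(hail_idx, marine_hail_idx)

# Same idea with an explicit exclusion found by a regular expression
tstm_idx     <- grep("TSTM WIND", evtypes, fixed = TRUE)
non_tstm_idx <- grep("^NON[- ]TSTM WIND", evtypes, ignore.case = TRUE)
tstm_only    <- setdiff(tstm_idx, non_tstm_idx)

evtypes[hail_only]
evtypes[tstm_only]
```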

The final Events-Matching was checked for consistency and relevance.
(This gives a long printout, so the print commands are commented out to suppress output to HTML).

### Check matches to PermittedEvts2
MatchEvts <- list()
for(i in 1:length(RegExp2use)){
      if (length( unique(as.character( stormD_9611$EVTYPE[matchEvtIDX2[[i]]] )) ) !=0){
            MatchEvts[i] <- list(unique(as.character( stormD_9611$EVTYPE[matchEvtIDX2[[i]]] )) )
      }
      
#       print( c( "PermittedEvts2: ", PermittedEvts2[i]) ) 
#       print( MatchEvts[i] ) 
}

4–Now we update the recorded storm data EVTYPE values with the matched permitted events:

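### UPDATE Events with Permitted Event Codes
### (PermittedEvts2 holds the permitted event name for each RegExp2use entry, i.e. the first column of the Events Mapping table)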
stormEvtype <- as.character(stormD_9611$EVTYPE )
dim(stormEvtype) <- c(length(stormEvtype),1)

for(i in 1:length(RegExp2use)){
      stormEvtype[matchEvtIDX2[[i]]] <- PermittedEvts2[i] 
}

o RESOLVING PROPERTY & CROP DAMAGE INFORMATION VALUE : EXPONENTS H/K/M/B

We map the character exponent codes for the damage values, as described in the documentation, to the appropriate powers of ten (H/h = 2, K/k = 3, M/m = 6, B = 9; blank and 0 map to an exponent of 0). All other codes are set to NA.

oldV <- sort(unique(c(levels(stormD_9611$PROPDMGEXP),levels(stormD_9611$CROPDMGEXP) ) ))
# [1] ""  "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "k" "K" "m" "M"

newV <- c(0, NA, NA, NA, 0, NA, NA, NA, NA, NA, NA, NA, NA, 9, 2, 2, 3, 3, 6, 6)

## mapvalues() recodes the factor levels; the conversion to a power of ten is vectorised
PropDMGexp <- 10^as.numeric(as.character(
      mapvalues(stormD_9611$PROPDMGEXP, oldV, newV, warn_missing = FALSE) ))

CropDMGexp <- 10^as.numeric(as.character(
      mapvalues(stormD_9611$CROPDMGEXP, oldV, newV, warn_missing = FALSE) ))
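
The positional mapping above relies on oldV and newV staying in the same order, and the sort order of the codes can depend on the locale. As a hedged alternative, the same mapping can be written as a named lookup. This is a sketch in base R; exp_lookup and dmg_multiplier are hypothetical names that do not appear in the original analysis.

```r
# Exponent per (upper-cased) code; anything not listed becomes NA
exp_lookup <- c("H" = 2, "K" = 3, "M" = 6, "B" = 9)

# Hypothetical helper: multiplier for each exponent code
dmg_multiplier <- function(exp_code) {
  code <- toupper(as.character(exp_code))
  mult <- 10^unname(exp_lookup[code])    # NA for unknown codes
  mult[code %in% c("", "0")] <- 1        # blank and "0" mean exponent 0
  mult
}

# Worked example: 380 with code "K" is 380 * 10^3
380 * dmg_multiplier("K")
dmg_multiplier(c("K", "m", "", "B", "?"))
```

The helper reproduces the mapping in newV: H, K, M and B in either case, a multiplier of 1 for blank or "0", and NA for every other code.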

Subsequently, the property and crop damage values are updated to reflect the actual damages in numerical values.

PropDMG <- stormD_9611$PROPDMG * PropDMGexp

CropDMG <- stormD_9611$CROPDMG * CropDMGexp

The next steps involve merging the updated storm data variables, tidying and subsetting relevant information of interest.

o MERGE & TIDY & SUBSET Storm Data

## SUBSET stormD_9611
stormD_9611b <- subset(stormD_9611, select=c(STATE, BGN_DATETIME, END_DATETIME, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP, REMARKS, REFNUM, FATALITIES, INJURIES) )

## ADD Prop/CropDMG to stormD_9611
stormD_9611b <- mutate(stormD_9611b, PropDMG=PropDMG, CropDMG=CropDMG, stormEvtype=factor(stormEvtype) )

o REMOVE NonMatch Evtypes

## REMOVE NonMatch Evtypes 1st
stormD_9611c <- stormD_9611b[unlist(matchEvtIDX2),]
## --------------------------------------------------------------
# length(unlist(matchEvtIDX2))/nrow(stormD_9611b) 
# [1] 0.985612

o RESOLVE DUPLICATE REFNUMS

## Duplicate RefsNums???      
nonDupREFNUMidx <- which(!duplicated(stormD_9611c$REFNUM))
# length(nonDupREFNUMidx)/nrow(stormD_9611c) 
# [1] 0.9982659

## REMOVE duplicate Refnums!!!
stormD_9611d <- stormD_9611c[nonDupREFNUMidx,]
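
A plausible source of the duplicated REFNUMs is that a record indexed by more than one pattern is selected more than once when the rows are subset with unlist(matchEvtIDX2). This is an interpretation and is not stated in the original. A toy sketch with made-up records:

```r
# Toy records (illustrative only)
toy <- data.frame(REFNUM = 1:3,
                  EVTYPE = c("TSTM WIND/HAIL", "HAIL", "TORNADO"),
                  stringsAsFactors = FALSE)

# Record 1 is matched by both patterns
idx <- list(grep("HAIL", toy$EVTYPE, fixed = TRUE),
            grep("TSTM WIND", toy$EVTYPE, fixed = TRUE))

matched <- toy[unlist(idx), ]                        # record 1 appears twice
deduped <- matched[!duplicated(matched$REFNUM), ]    # first occurrence kept

nrow(matched)
deduped$REFNUM
```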

o CHECK FOR POSSIBLE OUTLIERS

Summary descriptives were computed, which indicated possible outliers in the values of property damages.
(The boxplot code is commented out here but used again below for plotting and comparison with subsequent removal of outliers).

summary(stormD_9611d$PropDMG)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.000e+00 0.000e+00 0.000e+00 4.976e+05 1.500e+03 1.150e+11
summary(stormD_9611d$CropDMG)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.000e+00 0.000e+00 0.000e+00 5.124e+04 0.000e+00 1.510e+09
# bp1<- boxplot(stormD_9611d$PropDMG, stormD_9611d$CropDMG, axes=FALSE, main="Prior to Outlier Removal") 

Sorting the property damages in descending order and reviewing the REMARKS associated with the top-listed REFNUMs indicated that several of these surprisingly high property damage entries contain discrepancies.

stormD_9611d_SortDescPropDMG <- arrange(stormD_9611d, desc(PropDMG) )

head(stormD_9611d_SortDescPropDMG[,c("REFNUM", "FATALITIES", "INJURIES", "PropDMG", "CropDMG", "stormEvtype")])
##   REFNUM FATALITIES INJURIES   PropDMG  CropDMG         stormEvtype
## 1 605943          0        0 1.150e+11 3.25e+07               Flood
## 2 577615          0        0 1.693e+10 0.00e+00 Hurricane (Typhoon)
## 3 569288          5        0 1.000e+10 0.00e+00 Hurricane (Typhoon)
## 4 581533          0        0 7.350e+09 0.00e+00 Hurricane (Typhoon)
## 5 581537         15      104 5.880e+09 1.51e+09 Hurricane (Typhoon)
## 6 529299          7      780 5.420e+09 2.85e+08 Hurricane (Typhoon)

These were also highlighted in the forum discussions.

A brief observation summary is given below:

#   REFNUM         ** OBSERVATIONS
# 1 605943        | NAPA 115Bil vs remarked 70Mil propDmg --> refNUM2 12634
# 2 577615        | PropDMG ~ 22Bil related to Katrina but no casualties reported?! 
# 3 569288        | Hurricane Wilma : crop damage ~ 222mil not reported
# 4 581533        | Hurricane Katrina : strangely no fatalities?

##   REFNUM FATALITIES INJURIES   PropDMG  CropDMG         stormEvtype        ** OBSERVATIONS
# 5 581537         15      104 5.880e+09 1.51e+09 Hurricane (Typhoon)         | seems ok
# 6 529299          7      780 5.420e+09 2.85e+08 Hurricane (Typhoon)         | seems ok

# REFNUM==567221
# stormD_9611d[match(567221,stormD_9611d$REFNUM),] #[1] 12634                 
# ** OBSERVATIONS       | This REFNUM info. appears to be a correct record for the anomalous entry REFNUM 605943 which had discrepant damage values as highlighted in the largest outlier value (and as discussed in forum).

o REMOVE SELECTED OUTLIERS & TIDY DATA

The 4 extreme outliers are thus removed from the current subset of storm data.

## TIDY DATA # Remove top4
stormD_9611e <- stormD_9611d_SortDescPropDMG[c(5:nrow(stormD_9611d)),] %>% arrange(BGN_DATETIME)
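
The removal above is positional (rows 1 to 4 of the sorted data). An equivalent and arguably more explicit sketch drops the flagged records by REFNUM. The toy data frame below reuses the six REFNUM and PropDMG values shown earlier; the variable names are hypothetical.

```r
# Top six records by property damage, as listed above
toy <- data.frame(
  REFNUM  = c(605943, 577615, 569288, 581533, 581537, 529299),
  PropDMG = c(1.150e11, 1.693e10, 1.000e10, 7.350e09, 5.880e09, 5.420e09)
)

# REFNUMs flagged as discrepant in the observation summary
flagged <- c(605943, 577615, 569288, 581533)

kept <- toy[!(toy$REFNUM %in% flagged), ]
kept$REFNUM
```

Selecting by REFNUM does not depend on the sort order, so it still removes the intended records if the data are re-sorted.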

str(stormD_9611e)
## 'data.frame':    643008 obs. of  14 variables:
##  $ STATE       : Factor w/ 72 levels "AK","AL","AM",..: 9 13 63 24 13 24 9 49 63 39 ...
##  $ BGN_DATETIME: POSIXct, format: "1996-01-01 01:00:00" "1996-01-01 01:45:00" ...
##  $ END_DATETIME: POSIXct, format: "1996-01-02 09:00:00" "1996-01-01 13:45:00" ...
##  $ PROPDMG     : num  0 5 0 0 10 0 0 0 0 0 ...
##  $ PROPDMGEXP  : Factor w/ 19 levels "","-","?","+",..: 1 17 1 1 17 1 1 1 1 1 ...
##  $ CROPDMG     : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP  : Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REMARKS     : Factor w/ 436781 levels "","\t","\t\t",..: 52157 403804 40406 5745 403805 1 379636 340239 60956 347143 ...
##  $ REFNUM      : num  251415 252698 274876 259428 252699 ...
##  $ FATALITIES  : num  0 0 0 1 0 0 0 0 0 0 ...
##  $ INJURIES    : num  0 0 0 1 0 0 0 0 0 0 ...
##  $ PropDMG     : num  0 5000 0 0 10000 0 0 0 0 0 ...
##  $ CropDMG     : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ stormEvtype : Factor w/ 426 levels " WIND","ABNORMAL WARMTH",..: 127 356 127 180 356 115 127 127 127 420 ...

A quick plot check is done to see whether removing the outliers made any difference.
(Plotting code is commented here and is used in the plotting below).

# ## FIG1 -- EXPLORATORY CHECKS
# 
# par(mfrow = c(2,3) , mar=c(4,4,4,1))
# 
# ## BEFORE
# bp1<- boxplot(stormD_9611d$PropDMG, stormD_9611d$CropDMG, axes=FALSE, 
#               main="Prior to Outlier Removal",
#               ylab="Damages USD$") #** A COUPLE OF EXTREME VALUES ?
# axis(2)
# axis(1, labels = as.character(c("PropDMG", "CropDMG")) , at=c( 1, 2), lty=0 )  
# h1 <- hist(log(stormD_9611d$PropDMG), 100, main = " ") 
# h2 <- hist(log(stormD_9611d$CropDMG), 100, main = " ") 
# 
# ## AFTER 
# bp2 <- boxplot(stormD_9611e$PropDMG, stormD_9611e$CropDMG, axes=FALSE, 
#                main="After Outlier Removal",
#                ylab="Damages USD$")
# axis(2)
# axis(1, labels = as.character(c("PropDMG", "CropDMG")) , at=c( 1, 2), lty=0 )  
# 
# h3 <- hist(log(stormD_9611e$PropDMG), 100, main = " ") 
# h4 <- hist(log(stormD_9611e$CropDMG), 100, main = " ") 

o FIG.1. Exploratory plot checks post outlier removal.

• Analysis : Data Descriptives & Correlations

With the tidy subset of storm data, we now address the questions of interest.
The analyses (and plotting code) are described in this section so that the Results section below is easier to follow.

o Q1 Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

i We summarize the total harm resulting from the different storm events, for FATALITIES and INJURIES separately.

EventPopHarm <- stormD_9611e %>% group_by(stormEvtype) %>% 
      summarise_each(funs(sum(., na.rm = TRUE)), FATALITIES, INJURIES) 

# summary(EventPopHarm$FATALITIES)
# 
# summary(EventPopHarm$INJURIES)
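
The report was run under R 3.2.2 with the dplyr of that time. In later dplyr releases `summarise_each()` and `funs()` are deprecated, so a reader re-running the analysis today might prefer `across()`. The sketch below assumes dplyr 1.0.0 or later and uses a made-up frame `toy`; it is not the code that produced the results in this report.

```r
library(dplyr)

# Toy stand-in for stormD_9611e (values are illustrative, not NOAA data)
toy <- data.frame(
  stormEvtype = c("Tornado", "Tornado", "Flood", "Flood", "Heat"),
  FATALITIES  = c(2, 1, 0, NA, 5),
  INJURIES    = c(10, 4, 3, 1, NA)
)

# Per-event totals for both columns, ignoring missing values
toy_harm <- toy %>%
  group_by(stormEvtype) %>%
  summarise(across(c(FATALITIES, INJURIES), ~ sum(.x, na.rm = TRUE)),
            .groups = "drop")

toy_harm
```

The same pattern applies to the property and crop damage columns by swapping the column names inside `across()`.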

ii These population health effects are then visualized.
The code is commented out here but is run to produce the plots in the Results section.

## FIG2 -- HEALTH EFFECTS -- CODE commented here but it is used to generate the plots in RESULTS

# ## SEPARATE PLOTs
# 
# g1<-ggplot(EventPopHarm, aes(stormEvtype, FATALITIES, fill=FATALITIES) ) + 
#       geom_bar(stat="identity") + 
#       labs(title = " ", x="Storm Data Events") +
#       theme(axis.text.x = element_text(angle = 60, hjust = 1), axis.text = element_text(size=11) ) + 
        #, legend.position="left") +
#       coord_flip() + 
#       scale_fill_gradient(low="light grey", high="dark red", space="Lab") 
# 
#       
# g2<-ggplot(EventPopHarm, aes(stormEvtype, INJURIES, fill=INJURIES) ) + 
#       geom_bar(stat="identity") + 
#       labs(title = " ", x=" ") +
#       theme(axis.text.x = element_text(angle = 60, hjust = 1), axis.text = element_text(size=11) ) + 
        #, legend.position="left") +
#       coord_flip() + 
#       scale_fill_gradient(low="light grey", high="dark green", space="Lab")
# 
# # ## COMBINED PLOT
# # grid.arrange(g1, g2, ncol=2, top="Effect of Storm Events on Population Health Across USA (1996-2011)" )
# grid.arrange(g1, g2, ncol=2, top=" " )

iii We rank-ordered the health effects to assess which storm events lead to the most fatalities and injuries.

EventPopHarm[order(-EventPopHarm$FATALITIES),c(1,2)]
## Source: local data frame [47 x 2]
## 
##          stormEvtype FATALITIES
##               (fctr)      (dbl)
## 1     Excessive Heat       1799
## 2            Tornado       1511
## 3        Flash Flood        887
## 4          Lightning        651
## 5        Rip Current        542
## 6              Flood        416
## 7  Thunderstorm Wind        378
## 8               Heat        237
## 9          High Wind        235
## 10         Avalanche        223
## ..               ...        ...
EventPopHarm[order(-EventPopHarm$INJURIES),c(1,3)]
## Source: local data frame [47 x 2]
## 
##            stormEvtype INJURIES
##                 (fctr)    (dbl)
## 1              Tornado    20667
## 2                Flood     6759
## 3       Excessive Heat     6461
## 4    Thunderstorm Wind     5128
## 5            Lightning     4141
## 6          Flash Flood     1674
## 7  Hurricane (Typhoon)     1326
## 8         Winter Storm     1292
## 9                 Heat     1222
## 10           High Wind     1083
## ..                 ...      ...

o Q2 Across the United States, which types of events have the greatest economic consequences?

i We summarize the total damage incurred by each storm event across the U.S., for property and crop damages separately.

EventEconConsq <- stormD_9611e %>% group_by(stormEvtype) %>% 
      summarise_each(funs(sum(., na.rm = TRUE)), PropDMG, CropDMG) 

# summary(EventEconConsq$PropDMG)
# 
# summary(EventEconConsq$CropDMG)

ii Likewise, these economic effects are visualized.
The code is commented out here but is run to produce the plots in the Results section.

# ## FIG3 -- ECONOMIC EFFECTS -- CODE commented here but it is used to generate the plots in RESULTS
# 
# ## SEPARATE PLOT
# g3<-ggplot(EventEconConsq, aes(stormEvtype, PropDMG, fill=PropDMG) ) + 
#       geom_bar(stat="identity") + 
#       labs(title = " ", x="Storm Data Events", y="Property Damage (USD$)") +
#       theme(axis.text.x = element_text(angle = 60, hjust = 1), axis.text = element_text(size=11) ) + 
        #, legend.position="left") +
#       coord_flip() + 
#       scale_fill_gradient(name="Property Damage \n (USD$)", low="light grey", high="slategrey", space="Lab") 
# 
# 
# g4<-ggplot(EventEconConsq, aes(stormEvtype, CropDMG, fill=CropDMG) ) + 
#       geom_bar(stat="identity") + 
#       labs(title = " ", x=" ", y="Crop Damage (USD$)") +
#       theme(axis.text.x = element_text(angle = 60, hjust = 1), axis.text = element_text(size=11) ) + 
        #, legend.position="left") +
#       coord_flip() + 
#       scale_fill_gradient(name="Crop Damage \n (USD$)",low="light grey", high="orange", space="Lab")
# 
# # ## COMBINED PLOT
# # grid.arrange(g3, g4, ncol=2, top="Economic Consequences of Storm Events Across USA (1996-2011)")
# grid.arrange(g3, g4, ncol=2, top=" ")

iii We rank-ordered the property and crop damages to assess which storm events lead to the greatest economic consequences.

EventEconConsq[order(-EventEconConsq$PropDMG),c(1,2)]
## Source: local data frame [47 x 2]
## 
##            stormEvtype     PropDMG
##                 (fctr)       (dbl)
## 1  Hurricane (Typhoon) 47438889010
## 2                Flood 28965118550
## 3              Tornado 24616945710
## 4          Flash Flood 15222268910
## 5                 Hail 14595233420
## 6    Thunderstorm Wind  7913445880
## 7       Tropical Storm  7642475550
## 8            High Wind  5248378360
## 9             Wildfire  4758667000
## 10    Storm Surge/Tide  4641188000
## ..                 ...         ...
EventEconConsq[order(-EventEconConsq$CropDMG),c(1,3)]
## Source: local data frame [47 x 2]
## 
##            stormEvtype     CropDMG
##                 (fctr)       (dbl)
## 1              Drought 13367566000
## 2  Hurricane (Typhoon)  5350107800
## 3                Flood  4944153400
## 4                 Hail  2496822450
## 5          Flash Flood  1334901700
## 6         Frost/Freeze  1094086000
## 7    Thunderstorm Wind  1016942600
## 8           Heavy Rain   728169800
## 9       Tropical Storm   677711000
## 10           High Wind   633561300
## ..                 ...         ...
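
The raw dollar totals above are hard to compare at a glance. As a small sketch, the top three property damage totals (copied from the ranked table above) can be expressed in billions of USD; the names `prop_top3` and `prop_billion` are introduced here for illustration only.

```r
# Top three property damage totals, copied from the ranked table above (USD)
prop_top3 <- c("Hurricane (Typhoon)" = 47438889010,
               "Flood"               = 28965118550,
               "Tornado"             = 24616945710)

# Rescale to billions of USD, rounded to two decimals
prop_billion <- round(prop_top3 / 1e9, 2)
prop_billion
```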

o 3 Additional Analysis of Associations :

i The association between population fatalities and injuries was tested.

cor.test(EventPopHarm$FATALITIES,EventPopHarm$INJURIES, method = "spearman") #| non-parametric
## 
##  Spearman's rank correlation rho
## 
## data:  EventPopHarm$FATALITIES and EventPopHarm$INJURIES
## S = 1741.9, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.8992877
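
As a reading aid for the output above: the S statistic that `cor.test()` reports for Spearman's test is tied to rho through S = (n^3 - n)(1 - rho)/6, where n is the number of paired observations. The sketch below assumes n = 47, taken from the "[47 x 2]" header of the ranked tables, and recovers the printed S from the printed rho.

```r
n   <- 47          # number of event types, from the ranked tables above
rho <- 0.8992877   # estimate printed by cor.test() above

# S statistic implied by rho and n
S <- (n^3 - n) * (1 - rho) / 6
round(S, 1)
```

A small S relative to (n^3 - n)/6 therefore corresponds to a rho close to 1.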

ii It was also of interest to see whether property damages are correlated with crop damages.

cor.test(EventEconConsq$PropDMG,EventEconConsq$CropDMG, method = "spearman") #| non-parametric
## 
##  Spearman's rank correlation rho
## 
## data:  EventEconConsq$PropDMG and EventEconConsq$CropDMG
## S = 3257, p-value = 4.471e-12
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.8116901

iii We also investigated the extent to which the fatalities and injuries might be associated with property damages.

cor.test(EventPopHarm$FATALITIES,EventEconConsq$PropDMG, method = "spearman") #| non-parametric
## 
##  Spearman's rank correlation rho
## 
## data:  EventPopHarm$FATALITIES and EventEconConsq$PropDMG
## S = 7958.7, p-value = 8.993e-05
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.5398544
cor.test(EventPopHarm$INJURIES,EventEconConsq$PropDMG, method = "spearman") #| non-parametric
## 
##  Spearman's rank correlation rho
## 
## data:  EventPopHarm$INJURIES and EventEconConsq$PropDMG
## S = 5569.8, p-value = 1.647e-07
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.6779693
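
The two tests above pair columns from two separate summary tables, which is valid only if both tables list the event types in the same row order. Both were grouped from the same data, so this should hold here, but a more defensive variant joins the tables on stormEvtype first. The sketch below uses made-up frames `pop` and `econ` with illustrative values.

```r
# Toy summaries with deliberately different row orders (illustrative values)
pop  <- data.frame(stormEvtype = c("Flood", "Hail", "Tornado", "Heat"),
                   FATALITIES  = c(4, 0, 15, 18))
econ <- data.frame(stormEvtype = c("Tornado", "Heat", "Flood", "Hail"),
                   PropDMG     = c(2.4e10, 1e6, 2.9e10, 1.4e10))

# Join on the event type so each row pairs values for the same event
both <- merge(pop, econ, by = "stormEvtype")

res <- cor.test(both$FATALITIES, both$PropDMG, method = "spearman")
res$estimate
```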

• Results :

o Effect of Storm Events on Population Health Across USA

summary(EventPopHarm$FATALITIES)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0     0.0    33.0   176.7   127.0  1799.0
summary(EventPopHarm$INJURIES)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0     1.0   143.0  1187.0   710.5 20670.0

Overall, across the U.S., there are generally more injuries than fatalities across all storm events. However, Fig.2. below indicates that this observation does not necessarily hold true for all storm events, e.g. “Excessive Heat”, “Flash Flood”, “Lightning”.
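
One way to inspect the balance between the two outcomes event by event is the number of fatalities per injury. The sketch below uses only totals printed in the ranked tables above, restricted to the eight events that appear in both top-10 lists; the frame `harm` and the column `fatal_per_injury` are names introduced here.

```r
# Totals copied from the ranked tables above (events present in both lists)
harm <- data.frame(
  stormEvtype = c("Excessive Heat", "Tornado", "Flash Flood", "Lightning",
                  "Flood", "Thunderstorm Wind", "Heat", "High Wind"),
  FATALITIES  = c(1799, 1511, 887, 651, 416, 378, 237, 235),
  INJURIES    = c(6461, 20667, 1674, 4141, 6759, 5128, 1222, 1083)
)

# Fatalities per injury: larger values mean relatively more deaths per injury
harm$fatal_per_injury <- round(harm$FATALITIES / harm$INJURIES, 3)

# Events ordered from the highest to the lowest ratio
ranked <- harm[order(-harm$fatal_per_injury), ]
ranked
```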

o FIG.2. Effect of Storm Events on Population Health Across USA (1996-2011)

Nonetheless, the fatalities and injuries resulting from storm events across the U.S. are highly correlated (Spearman's rho = 0.899; S = 1742, p-value < 2.2e-16).

o Economic Consequences of Storm Events Across USA

summary(EventEconConsq$PropDMG)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.000e+00 1.628e+06 9.480e+06 3.631e+09 1.289e+09 4.744e+10
summary(EventEconConsq$CropDMG)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.000e+00 0.000e+00 1.765e+05 7.003e+08 2.894e+08 1.337e+10

Overall, across the U.S., property damages generally exceed crop damages across storm events. Fig.3. below shows that this holds for most storm events, with “Drought” as the exception.

o FIG.3. Economic Consequences of Storm Events Across USA (1996-2011)

We note that the association between property and crop damages is also significant (Spearman's rho = 0.812; S = 3257, p-value = 4.471e-12).

o Further Observations

Additionally, we observed that fatalities are associated with property damages (rho = 0.540; S = 7959, p-value = 8.993e-05), and injuries exhibited a stronger association with property damages (rho = 0.678; S = 5570, p-value = 1.647e-07).

• Summary of Observations :

(1) Tornado, Flood, Excessive Heat, Thunderstorm Wind, Lightning, Flash Flood, and Hurricane (Typhoon) events are most harmful with respect to population health.

(2) Hurricane (Typhoon), Flood, Tornado, Flash Flood, Hail, and Drought events have the greatest economic consequences.

(3) The current analysis indicates that storm-related injuries and fatalities are both associated with property damages.

• Additional Note :

This is an initial analysis of the NOAA Storm Data. It is by no means exhaustive or accurate. In particular, further in-depth data preprocessing might yield a larger subset of the raw data for assessment. Additional analyses, e.g. breaking down the risk assessments by year, state, county, and/or longitude/latitude, could provide a more comprehensive overview of the effects of severe weather on population health and the economy. These assessments would require further analysis, statistical testing, and visualisation, which are beyond the scope of this report. It may be that some states are more prone to certain extreme weather events and that these events come in cycles; knowing this could aid forecasting, resource planning, and recovery measures.
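
As a pointer for such a follow-up, a breakdown by year and state could start along the lines below. This is a sketch only, not part of the reported analysis: the frame `toy` and its values are made up, and only the column names STATE, BGN_DATETIME and FATALITIES mirror the storm data.

```r
# Toy stand-in for the tidy storm subset (values are illustrative, not NOAA data)
toy <- data.frame(
  STATE        = c("TX", "TX", "FL", "FL", "TX"),
  BGN_DATETIME = as.POSIXct(c("1996-05-01", "1996-08-10", "1996-09-02",
                              "2005-08-25", "2005-09-24"), tz = "UTC"),
  FATALITIES   = c(1, 0, 2, 5, 3)
)

# Extract the calendar year of each event
toy$YEAR <- as.integer(format(toy$BGN_DATETIME, "%Y"))

# Total fatalities for each year and state combination
by_year_state <- aggregate(FATALITIES ~ YEAR + STATE, data = toy, FUN = sum)
by_year_state
```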

– last updated 02Dec2015: use of non-parametric correlations instead given non-normally distributed data values.