Storm Events of Highest Damage Potential

Synopsis

This report quantifies and analyzes storm event observations to identify the event types with the highest potential for severe consequences. The processed data cover events recorded in 1996-2011 (the incomplete data from 1950-1995 were omitted), and the analysis is based on the following fields of the processed data: evtype, year, fatalities, injuries, propdmg, cropdmg (propdmg and cropdmg are in million dollars), and remarks. Processing kept the relevant data, dropped unused variables, filtered observations, and handled noise such as inconsistent recordings and errors. NOAA leaves unfinalized numbers blank even when the other fields of an observation are filled in; such values are also ignored in this report. Event locations are ignored as well, since the analysis focuses on total numbers rather than geographic distribution. The report indicates that excessive heat and tornado are the major causes of loss of life (around 3300 fatalities out of ~8700 across the 48 event types) and tornado the major cause of injuries (around 20650 injuries out of ~58000), while hurricane (typhoon) is responsible for the highest property damage (around 82 billion dollars out of ~250 billion) and drought for the highest crop damage (around 13 billion dollars out of ~35 billion).

Data Processing

Reading the data

  1. Read the raw CSV file containing the data (repdata-data-StormData.csv.bz2).
  • First read a few lines to explore the variables:
readTest <- read.csv("repdata-data-StormData.csv.bz2", sep=",", nrows=4, header=TRUE,
                     na.strings=c("NA","N/A",""))
  • There are 37 variables in the dataset:
ncol(readTest)
## [1] 37
names(readTest)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
  • The variables that are relevant for this analysis are:
Variable     Description                  Comments
BGN_DATE     date of event
EVTYPE       type of event
FATALITIES   number of deaths             health related damages
INJURIES     number of injuries           health related damages
PROPDMG      property damages (dollar)    economic related damages
PROPDMGEXP   multiplier for PROPDMG       K=1,000 M=million B=billion
CROPDMG      crop damages (dollar)        economic related damages
CROPDMGEXP   multiplier for CROPDMG       K=1,000 M=million B=billion
REMARKS      comments
  2. Read only the relevant variables (see the table above) for the current analysis. The packages plyr and dplyr are required for reading and filtering the data with the following code.
library(plyr)  ##load plyr before dplyr so that the dplyr verbs are not masked
library(dplyr)
readSet <- select(read.csv("repdata-data-StormData.csv.bz2",header=TRUE, 
                na.strings=c("NA","N/A","")), EVTYPE, FATALITIES, INJURIES, 
                PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP, BGN_DATE, REMARKS)
  • There are 902297 observations in the dataset:
nrow(readSet)
## [1] 902297
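
As a possible alternative to reading all 37 columns and then selecting, the unneeded columns can be skipped at read time. The following is a minimal base-R sketch, not the method used in this report; the small CSV file and the object names (tmp, keep, classes, small) are invented for illustration.

```r
## Minimal sketch: skip unneeded columns at read time with colClasses.
## The tiny CSV written here is invented purely for illustration.
tmp <- tempfile(fileext = ".csv")
writeLines(c("STATE,EVTYPE,FATALITIES,PROPDMG,PROPDMGEXP",
             "AL,TORNADO,0,25,K",
             "AL,HAIL,0,,"), tmp)

## read the header only, then mark every column as "NULL" (= skip)
## except those named in 'keep' (NA = let read.csv guess the type)
header <- names(read.csv(tmp, nrows = 1))
keep <- c("EVTYPE", "FATALITIES", "PROPDMG", "PROPDMGEXP")
classes <- ifelse(header %in% keep, NA, "NULL")

small <- read.csv(tmp, colClasses = classes, na.strings = c("NA", "N/A", ""))
names(small)
```

With the full storm file, the same idea would apply to the nine variables listed in the table above; whether it saves noticeable time on a compressed file was not measured here.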

Cleaning the data

  1. Basic cleaning.
    The dataset originally included records for the years 1950-2011 (Figure 1). The processed data include only observations for the years 1996-2011, since the earlier data are incomplete: only three types of events were recorded for the years 1950-1995 (Tornado, Thunderstorm Wind and Hail). See the NOAA database details: http://www.ncdc.noaa.gov/stormevents/details.jsp
  • The following process does some primary formatting and filtering of the dataset:
    • format: convert all variable names and event type names to lower case.
    • format: date to year (add new column).
    • filter: keep only years 1996-2011.
names(readSet) <- tolower(names(readSet)) ##change column names to lower case
readSet$evtype <- tolower(readSet$evtype) ##change event names to lower case
readSet$year <- strptime(readSet$bgn_date, "%m/%d/%Y %H:%M:%S")
readSet$year <- format(readSet$year, "%Y")
ListAllYearsObsv <- readSet$year ##keep frequencies of all years for a later plot
readSet <- filter(readSet, year>1995)
  • The dataset now includes the years 1996-2011, and the number of observations is 653530:
nrow(readSet)
## [1] 653530
  • The following graph shows the distribution of observations in the original dataset for the years 1950-2011.
plot(table(ListAllYearsObsv), type="p", xlab="Years", ylab="Number of observations (yearly)", main="Number of observations from 1950 to 2011")
Figure 1: Number of observations for each year (1950-2011). The yearly number of observations until the early nineties is much lower than the following years.
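
The year extraction described above can also be sketched with as.Date and a numeric year, which makes the comparison with 1996 a numeric one. This is a minimal illustration under assumptions: the sample date strings are invented (in the same layout as bgn_date), and the names year and recent are hypothetical.

```r
## Minimal sketch: extract the year from BGN_DATE-style strings.
## The sample strings are invented, in the "%m/%d/%Y %H:%M:%S" layout.
bgn_date <- c("4/18/1950 0:00:00", "1/6/1996 0:00:00", "11/28/2011 0:00:00")

## as.Date() ignores the trailing time part of each string;
## converting to integer makes the comparison below numeric
year <- as.integer(format(as.Date(bgn_date, format = "%m/%d/%Y"), "%Y"))

recent <- year >= 1996   # TRUE for the observations to keep
```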


  2. Converting units
    The following process removes invalid damage values: only values whose “exp” multiplier is “K”, “M”, “B”, or “0” are relevant for this analysis.
  • format: property and crop damages in million dollars:
    • A: filter: keep only valid observations with EXP K, M, B or 0; set the rest to NA.
    • B: multiply/divide according to EXP to get a consistent unit of million dollars.
    • C: re-structure the dataset: remove the EXP columns and zero/empty observations.
##before processing property, check how many of each:
readSet$propdmgexp <- as.character(readSet$propdmgexp)
table(readSet$propdmgexp)  ##there are about 377,000 exp observations
## 
##      0      B      K      M 
##      1     32 369938   7374
sum(is.na(readSet$propdmgexp))  ##the rest of exp values are NA
## [1] 276185
readSet$propdmg <- as.numeric(readSet$propdmg)
readSet$propdmg[is.na(readSet$propdmgexp)] <- NA  ##reset property damages of no multipliers

readSet$cropdmgexp <- as.character(readSet$cropdmgexp)
readSet$cropdmg <- as.numeric(readSet$cropdmg)
readSet$cropdmg[is.na(readSet$cropdmgexp)] <- NA ##reset crop damages of no multipliers

##convert all values to units=million dollar
readSet$propdmg[readSet$propdmgexp == "K" & !is.na(readSet$propdmg)] <- readSet$propdmg[readSet$propdmgexp == "K" & !is.na(readSet$propdmg)] / 1000
readSet$propdmg[readSet$propdmgexp == "B" & !is.na(readSet$propdmg)] <- readSet$propdmg[readSet$propdmgexp == "B" & !is.na(readSet$propdmg)] * 1000
readSet$cropdmg[readSet$cropdmgexp == "K" & !is.na(readSet$cropdmg)] <- readSet$cropdmg[readSet$cropdmgexp == "K" & !is.na(readSet$cropdmg)] / 1000
readSet$cropdmg[readSet$cropdmgexp == "B" & !is.na(readSet$cropdmg)] <- readSet$cropdmg[readSet$cropdmgexp == "B" & !is.na(readSet$cropdmg)] * 1000

##re-structure the data to omit variables of multiplier
readSet <- select(readSet, evtype, fatalities, injuries, propdmg, cropdmg, bgn_date, year, remarks)

##remove unused observations (of zero or NA whole row)
readSet <-  readSet[!((readSet$fatalities==0 & readSet$injuries==0) & (is.na(readSet$propdmg) | readSet$propdmg==0) & (is.na(readSet$cropdmg) | readSet$cropdmg==0)),]
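
The conversion step above can also be expressed with a single lookup vector of multipliers. The following is a minimal sketch with invented sample values, not the code used in this report. Treating “0” as plain dollars is an assumption made for the sketch (the code above leaves that single value unchanged), and the names dmg, mult, toMillion and dmgMillion are hypothetical.

```r
## Minimal sketch: convert damage values to million dollars via a lookup vector.
## Sample values are invented for illustration.
dmg  <- c(25, 2.5, 1.2, 10, 500)
mult <- c("K", "M", "B", NA, "0")

## factor that turns each unit into million dollars
## ASSUMPTION: "0" means the value is given in plain dollars
toMillion <- c("0" = 1e-6, K = 1e-3, M = 1, B = 1e3)

## indexing by name yields NA for a missing multiplier,
## so that damage value becomes NA automatically
dmgMillion <- dmg * unname(toMillion[mult])
dmgMillion
```

One lookup replaces the four assignment lines, and a further multiplier only needs one more entry in toMillion.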
  3. Setting uniform event names
    Officially there should be 48 event types; however, there are many more due to inconsistent names or typos. The event names associated with fatalities, injuries, propdmg or cropdmg are listed below (186 different names):
sort(unique(readSet$evtype[readSet$fatalities>0 | readSet$injuries>0 | (readSet$propdmg>0 & !is.na(readSet$propdmg)) | (readSet$cropdmg>0 & !is.na(readSet$cropdmg))]))
##   [1] "   high surf advisory"     " flash flood"             
##   [3] " tstm wind"                " tstm wind (g45)"         
##   [5] "agricultural freeze"       "astronomical high tide"   
##   [7] "astronomical low tide"     "avalanche"                
##   [9] "beach erosion"             "black ice"                
##  [11] "blizzard"                  "blowing dust"             
##  [13] "blowing snow"              "brush fire"               
##  [15] "coastal  flooding/erosion" "coastal erosion"          
##  [17] "coastal flood"             "coastal flooding"         
##  [19] "coastal flooding/erosion"  "coastal storm"            
##  [21] "coastalstorm"              "cold"                     
##  [23] "cold and snow"             "cold temperature"         
##  [25] "cold weather"              "cold/wind chill"          
##  [27] "dam break"                 "damaging freeze"          
##  [29] "dense fog"                 "dense smoke"              
##  [31] "downburst"                 "drought"                  
##  [33] "drowning"                  "dry microburst"           
##  [35] "dust devil"                "dust storm"               
##  [37] "early frost"               "erosion/cstl flood"       
##  [39] "excessive heat"            "excessive snow"           
##  [41] "extended cold"             "extreme cold"             
##  [43] "extreme cold/wind chill"   "extreme windchill"        
##  [45] "falling snow/ice"          "flash flood"              
##  [47] "flash flood/flood"         "flood"                    
##  [49] "flood/flash/flood"         "fog"                      
##  [51] "freeze"                    "freezing drizzle"         
##  [53] "freezing fog"              "freezing rain"            
##  [55] "freezing spray"            "frost"                    
##  [57] "frost/freeze"              "funnel cloud"             
##  [59] "glaze"                     "gradient wind"            
##  [61] "gusty wind"                "gusty wind/hail"          
##  [63] "gusty wind/hvy rain"       "gusty wind/rain"          
##  [65] "gusty winds"               "hail"                     
##  [67] "hard freeze"               "hazardous surf"           
##  [69] "heat"                      "heat wave"                
##  [71] "heavy rain"                "heavy rain/high surf"     
##  [73] "heavy seas"                "heavy snow"               
##  [75] "heavy snow shower"         "heavy surf"               
##  [77] "heavy surf and wind"       "heavy surf/high surf"     
##  [79] "high seas"                 "high surf"                
##  [81] "high swells"               "high water"               
##  [83] "high wind"                 "high wind (g40)"          
##  [85] "high winds"                "hurricane"                
##  [87] "hurricane edouard"         "hurricane/typhoon"        
##  [89] "hyperthermia/exposure"     "hypothermia/exposure"     
##  [91] "ice jam flood (minor"      "ice on road"              
##  [93] "ice roads"                 "ice storm"                
##  [95] "icy roads"                 "lake-effect snow"         
##  [97] "lake effect snow"          "lakeshore flood"          
##  [99] "landslide"                 "landslides"               
## [101] "landslump"                 "landspout"                
## [103] "late season snow"          "light freezing rain"      
## [105] "light snow"                "light snowfall"           
## [107] "lightning"                 "marine accident"          
## [109] "marine hail"               "marine high wind"         
## [111] "marine strong wind"        "marine thunderstorm wind" 
## [113] "marine tstm wind"          "microburst"               
## [115] "mixed precip"              "mixed precipitation"      
## [117] "mud slide"                 "mudslide"                 
## [119] "mudslides"                 "non-severe wind damage"   
## [121] "non-tstm wind"             "non tstm wind"            
## [123] "other"                     "rain"                     
## [125] "rain/snow"                 "record heat"              
## [127] "rip current"               "rip currents"             
## [129] "river flood"               "river flooding"           
## [131] "rock slide"                "rogue wave"               
## [133] "rough seas"                "rough surf"               
## [135] "seiche"                    "small hail"               
## [137] "snow"                      "snow and ice"             
## [139] "snow squall"               "snow squalls"             
## [141] "storm surge"               "storm surge/tide"         
## [143] "strong wind"               "strong winds"             
## [145] "thunderstorm"              "thunderstorm wind"        
## [147] "thunderstorm wind (g40)"   "tidal flooding"           
## [149] "tornado"                   "torrential rainfall"      
## [151] "tropical depression"       "tropical storm"           
## [153] "tstm wind"                 "tstm wind  (g45)"         
## [155] "tstm wind (41)"            "tstm wind (g35)"          
## [157] "tstm wind (g40)"           "tstm wind (g45)"          
## [159] "tstm wind 40"              "tstm wind 45"             
## [161] "tstm wind and lightning"   "tstm wind g45"            
## [163] "tstm wind/hail"            "tsunami"                  
## [165] "typhoon"                   "unseasonable cold"        
## [167] "unseasonably cold"         "unseasonably warm"        
## [169] "unseasonal rain"           "urban/sml stream fld"     
## [171] "volcanic ash"              "warm weather"             
## [173] "waterspout"                "wet microburst"           
## [175] "whirlwind"                 "wild/forest fire"         
## [177] "wildfire"                  "wind"                     
## [179] "wind and wave"             "wind damage"              
## [181] "winds"                     "winter storm"             
## [183] "winter weather"            "winter weather mix"       
## [185] "winter weather/mix"        "wintry mix"
  • The following process renames events with improper names. Where clarification was needed to match a name to an official event name, the remarks were checked. Improper names that represent a single observation with minor damage were ignored in this process (evtype was set to “unused”), and the total amount of “unused” is presented below. Events with vague names and significant consequences were checked in the field “remarks” and set to the matching official event type name. They are also annotated with comments (##) specifying the amount of fatalities, injuries, propdmg and cropdmg (f, i, p, c, respectively).
  • An example of checking observations with improper event names: the event “wind” has no obvious match among the official event names, since several categories contain the word “wind”. It has 67 observations:
windObservations <- nrow(readSet[readSet$evtype=="wind",])
windFatal <- sum(readSet$fatalities[readSet$evtype=="wind"], na.rm=TRUE)
windInjur <- sum(readSet$injuries[readSet$evtype=="wind"], na.rm=TRUE)
windProp <- sum(readSet$propdmg[readSet$evtype=="wind"], na.rm=TRUE)
windCrop <- sum(readSet$cropdmg[readSet$evtype=="wind"], na.rm=TRUE)
print(paste0("number of WIND observations: " , windObservations , ", total fatalities: " , windFatal , ", total injuries: " , windInjur , ", total propdmg: " , windProp , " million dollar, total cropdmg: " , windCrop , " million dollar."))
## [1] "number of WIND observations: 67, total fatalities: 18, total injuries: 84, total propdmg: 2.2895 million dollar, total cropdmg: 0.3 million dollar."
##readSet[readSet$evtype=="wind",][1:70,]
  • For these 67 observations, the total fatalities is 18, injuries is 84, propdmg is 2.2895 million dollars and cropdmg is 0.3 million dollars. Therefore the remarks field was checked to better identify the events. Most of the remarks mentioned thunderstorm wind or a speed higher than 40 mph, which matches the definition of thunderstorm wind; therefore the event name was changed to “thunderstorm wind”. A comment was added in the code to list the amount of fatalities, injuries, propdmg, and cropdmg that were originally associated with the event name “wind”.

  • The following process sets uniform names for the event types in the variable evtype.

##cleaning the data: renaming event types

##wind different types
readSet$evtype[grep("^[m].*tstm.*", readSet$evtype)] <- "marine thunderstorm wind"
readSet$evtype[grep("non.?tstm wind*", readSet$evtype)] <- "strong wind"
readSet$evtype[grep("^[^m].*tstm.*", readSet$evtype)] <- "thunderstorm wind"
readSet$evtype[grep("tstm.*", readSet$evtype)] <- "thunderstorm wind"
readSet$evtype[grep("thunderstorm.*g40.*", readSet$evtype)] <- "thunderstorm wind"
readSet$evtype[grep("high.*g40.*", readSet$evtype)] <- "high wind"
readSet$evtype <- gsub("winds", "wind", readSet$evtype)
readSet$evtype[grep("extreme windchill", readSet$evtype)] <- "extreme cold/wind chill"
readSet$evtype[grep(".*gusty.*", readSet$evtype)] <- "thunderstorm wind"
readSet$evtype[grep("non-severe wind damage", readSet$evtype)] <- "high wind"
readSet$evtype[grep("gradient wind", readSet$evtype)] <- "strong wind"
readSet$evtype[grep("heavy surf and wind", readSet$evtype)] <- "high surf"
readSet$evtype[grep("wind and wave", readSet$evtype)] <- "marine thunderstorm wind"
##cold
readSet$evtype[grep(".*extended cold.*", readSet$evtype)] <- "extreme cold/wind chill"
readSet$evtype[grep(".*cold and snow.*", readSet$evtype)] <- "extreme cold/wind chill"
readSet$evtype[grep("^[e].*cold.*", readSet$evtype)] <- "extreme cold/wind chill"
readSet$evtype[grep("^[^e].*cold.*", readSet$evtype)] <- "cold/wind chill"
readSet$evtype[grep("^cold.*", readSet$evtype)] <- "cold/wind chill"
##heat
readSet$evtype[grep(".*heat wave.*", readSet$evtype)] <- "heat"
readSet$evtype[grep(".*record heat.*", readSet$evtype)] <- "excessive heat"
##flood
readSet$evtype[grep(".*coastal.*flood.*", readSet$evtype)] <- "coastal flood"
readSet$evtype[grep(".*cstl.*", readSet$evtype)] <- "coastal flood"
readSet$evtype[grep(".*tidal.*", readSet$evtype)] <- "coastal flood"
readSet$evtype[grep(".*flash.*flood.*", readSet$evtype)] <- "flash flood"
readSet$evtype[grep(".*river.*", readSet$evtype)] <- "flood" ##p=126.437M
readSet$evtype[grep(".*ice jam.*", readSet$evtype)] <- "flood"
##fog
readSet$evtype[grep("^fog.*", readSet$evtype)] <- "dense fog"  ##f=60 i=712 p=13.15M
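##freeze
##protect "freezing rain" and "freezing fog" with placeholders so that the
##general "freez" rule below does not turn them into frost/freeze
readSet$evtype[grep(".*freezing rain.*", readSet$evtype)] <- "temp1"
readSet$evtype[grep(".*freezing fog.*", readSet$evtype)] <- "temp2"
readSet$evtype[grep(".*freez.*", readSet$evtype)] <- "frost/freeze"
readSet$evtype[grep(".*temp1.*", readSet$evtype)] <- "sleet" ##temp1 = freezing rain
readSet$evtype[grep(".*temp2.*", readSet$evtype)] <- "freezing fog" ##temp2 = freezing fog
##snow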
readSet$evtype[grep(".*heavy snow shower.*", readSet$evtype)] <- "heavy snow"
readSet$evtype[grep(".*excessive snow.*", readSet$evtype)] <- "heavy snow"
readSet$evtype[grep(".*falling snow/ice.*", readSet$evtype)] <- "heavy snow"
readSet$evtype[grep(".*light snow.*", readSet$evtype)] <- "heavy snow" ##f=1, i=2, p=2.598M
readSet$evtype[grep(".*snow squall.*", readSet$evtype)] <- "blizzard"
readSet$evtype[grep(".*snow and ice.*", readSet$evtype)] <- "heavy snow"
readSet$evtype[grep("^snow?", readSet$evtype)] <- "heavy snow" ##f=2, i=12, p=2.554M
readSet$evtype[grep(".*rain/snow.*", readSet$evtype)] <- "blizzard"
readSet$evtype[grep(".*blowing snow.*", readSet$evtype)] <- "blizzard"
readSet$evtype[grep(".*lake.*snow.*", readSet$evtype)] <- "lake-effect snow"
readSet$evtype[grep(".*late season snow.*", readSet$evtype)] <- "heavy snow"
##various types
readSet$evtype[grep(".*surf.*", readSet$evtype)] <- "high surf"
readSet$evtype[grep(".*wild.*", readSet$evtype)] <- "wildfire"
readSet$evtype[grep(".*wint.*mix.*", readSet$evtype)] <- "winter weather" ##f=29 i=217 p=6M
readSet$evtype[grep(".*rip.*", readSet$evtype)] <- "rip current"
readSet$evtype[grep(".*fld.*", readSet$evtype)] <- "flood" ##f=28 i=79 p=58M
readSet$evtype[grep(".*dry microburst.*", readSet$evtype)] <- "thunderstorm wind" ##f=3 i=25 p=1.7M
readSet$evtype[grep(".*coastal storm.*", readSet$evtype)] <- "marine thunderstorm wind"
readSet$evtype[grep(".*hurricane|typhoon.*", readSet$evtype)] <- "hurricane (typhoon)"
readSet$evtype[grep(".*coastalstorm.*", readSet$evtype)] <- "thunderstorm wind"
readSet$evtype[grep(".*frost.*", readSet$evtype)] <- "frost/freeze"
readSet$evtype[grep(".*torrential rainfall.*", readSet$evtype)] <- "heavy rain"
readSet$evtype[grep(".*ic.*road.*", readSet$evtype)] <- "frost/freeze" ##f=1 i=1 p=0.012M
readSet$evtype[grep(".*glaze.*", readSet$evtype)] <- "frost/freeze" ##f=1 i=212 p=0.15M
readSet$evtype[grep(".*exposure.*", readSet$evtype)] <- "extreme cold/wind chill" ##f=8
readSet$evtype[grep(".*land.*", readSet$evtype)] <- "debris flow" ##f=38 i=53 p=325 c=20.017
readSet$evtype[grep(".*mudslide.*", readSet$evtype)] <- "debris flow" ##f=5 i=2 p=1.225
readSet$evtype[grep(".*mixed precip.*", readSet$evtype)] <- "frost/freeze" ##f=2 i=26 p=0.79
readSet$evtype[grep(".*rough seas.*", readSet$evtype)] <- "marine strong wind"
readSet$evtype[grep(".*small hail.*", readSet$evtype)] <- "hail"
readSet$evtype[grep(".*storm surge.*", readSet$evtype)] <- "storm surge/tide"
readSet$evtype[grep("^thunderstorm.*", readSet$evtype)] <- "thunderstorm wind"
readSet$evtype[grep(".*whirlwind.*", readSet$evtype)] <- "dust devil" ##f=1 i=0 p=0.012
readSet$evtype[grep(".*warm.*", readSet$evtype)] <- "excessive heat" ##f=0 i=16 c=0.01
readSet$evtype[grep("^wind.*", readSet$evtype)] <- "thunderstorm wind" ##f=19 i=85 p=2.33M c=0.3
readSet$evtype[grep(".*astronomical high tide.*", readSet$evtype)] <- "coastal flood" ##f=0 i=0 p=9.425M
readSet$evtype[grep(".*dam break.*", readSet$evtype)] <- "flash flood" ##p=1.002M
readSet$evtype[grep(".*mud slide.*", readSet$evtype)] <- "debris flow"
readSet$evtype[grep(".*rock slide.*", readSet$evtype)] <- "debris flow"
readSet$evtype[grep(".*unseasonal rain.*", readSet$evtype)] <- "heavy rain" ##c=10M

##The following event types (except for "other") were set as "unused":

##unused type           fatalities      injuries        propdmg M-dollar cropdmg
##other:                7               4               0.055           1.034
##black ice:            1               24              0               
##brush fire:           0               2               0               
##drowning:             1               0               0               
##heavy seas:           1               0               0               
##high seas:            3               7               0.015           
##high swells:          1               0               0.005           
##high water:           3               0               0               
##marine accident:      1               2               0.05    
##wind damage:          0               0               0.01
##beach erosion:        0               0               0.1             
##blowing dust:         0               0               0.02            
##downburst:            0               0               0.002   
##microburst:           0               0               0.055
##rain:                 0               0               0.3             0.25
##coastal erosion:      0               0               0.766

readSet$evtype[grep(".*black ice.*", readSet$evtype)] <- "unused"
readSet$evtype[grep(".*brush fire.*", readSet$evtype)] <- "unused"
readSet$evtype[grep(".*drowning.*", readSet$evtype)] <- "unused"
readSet$evtype[grep(".*heavy seas.*", readSet$evtype)] <- "unused"
readSet$evtype[grep(".*high seas.*", readSet$evtype)] <- "unused"
readSet$evtype[grep(".*high swells.*", readSet$evtype)] <- "unused"
readSet$evtype[grep(".*high water.*", readSet$evtype)] <- "unused"
readSet$evtype[grep(".*marine accident.*", readSet$evtype)] <- "unused"
readSet$evtype[grep(".*rogue wave.*", readSet$evtype)] <- "unused"
readSet$evtype[grep(".*wind damage.*", readSet$evtype)] <- "unused"
readSet$evtype[grep(".*beach erosion.*", readSet$evtype)] <- "unused"
readSet$evtype[grep(".*blowing dust.*", readSet$evtype)] <- "unused"
readSet$evtype[grep(".*downburst.*", readSet$evtype)] <- "unused"
readSet$evtype[grep(".*microburst.*", readSet$evtype)] <- "unused"
readSet$evtype[readSet$evtype=="rain"] <- "unused"
readSet$evtype[grep(".*coastal erosion.*", readSet$evtype)] <- "unused"
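
The renaming rules above are applied one after another, so their order matters: a specific pattern has to run before a more general one that would also match. That idea can be sketched as an ordered rule table. This is a minimal illustration only; rules holds a few of the patterns from the code above, and applyRules is a hypothetical helper that is not part of the original analysis.

```r
## Minimal sketch: apply renaming rules from an ordered table.
## 'applyRules' is a hypothetical helper, not part of the original analysis.
rules <- data.frame(
  pattern  = c("^m.*tstm", "non.?tstm wind", "tstm", "rip"),
  official = c("marine thunderstorm wind", "strong wind",
               "thunderstorm wind", "rip current"),
  stringsAsFactors = FALSE
)

applyRules <- function(evtype, rules) {
  for (i in seq_len(nrow(rules))) {
    ## rules run top to bottom, so specific patterns must precede general ones
    evtype[grepl(rules$pattern[i], evtype)] <- rules$official[i]
  }
  evtype
}

renamed <- applyRules(c("marine tstm wind", "non-tstm wind", " tstm wind (g45)",
                        "rip currents", "tornado"), rules)
renamed
```

Keeping the rules in a table makes the order explicit and lets the whole mapping be reviewed in one place.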
  • After renaming, there are 50 event types (see below): the 48 official event types, the event “other” (various events), and the event “unused”. The latter contains events with improper names and low values.
sort(unique(readSet$evtype[readSet$fatalities>0 | readSet$injuries>0 | (readSet$propdmg>0 & !is.na(readSet$propdmg)) | (readSet$cropdmg>0 & !is.na(readSet$cropdmg))]))
##  [1] "astronomical low tide"    "avalanche"               
##  [3] "blizzard"                 "coastal flood"           
##  [5] "cold/wind chill"          "debris flow"             
##  [7] "dense fog"                "dense smoke"             
##  [9] "drought"                  "dust devil"              
## [11] "dust storm"               "excessive heat"          
## [13] "extreme cold/wind chill"  "flash flood"             
## [15] "flood"                    "freezing fog"            
## [17] "frost/freeze"             "funnel cloud"            
## [19] "hail"                     "heat"                    
## [21] "heavy rain"               "heavy snow"              
## [23] "high surf"                "high wind"               
## [25] "hurricane (typhoon)"      "ice storm"               
## [27] "lake-effect snow"         "lakeshore flood"         
## [29] "lightning"                "marine hail"             
## [31] "marine high wind"         "marine strong wind"      
## [33] "marine thunderstorm wind" "other"                   
## [35] "rip current"              "seiche"                  
## [37] "sleet"                    "storm surge/tide"        
## [39] "strong wind"              "thunderstorm wind"       
## [41] "tornado"                  "tropical depression"     
## [43] "tropical storm"           "tsunami"                 
## [45] "unused"                   "volcanic ash"            
## [47] "waterspout"               "wildfire"                
## [49] "winter storm"             "winter weather"
  • The events “unused” and “other” are summarized below:
##summarize the "unused" observations (full column names, no partial matching)
unusedRows <- nrow(readSet[readSet$evtype=="unused",])
unusedFatal <- sum(readSet$fatalities[readSet$evtype=="unused"], na.rm=TRUE)
unusedInjur <- sum(readSet$injuries[readSet$evtype=="unused"], na.rm=TRUE)
unusedProp <- sum(readSet$propdmg[readSet$evtype=="unused"], na.rm=TRUE)
unusedCrop <- sum(readSet$cropdmg[readSet$evtype=="unused"], na.rm=TRUE)

##summarize the "other" observations
otherRows <- nrow(readSet[readSet$evtype=="other",])
otherFatal <- sum(readSet$fatalities[readSet$evtype=="other"], na.rm=TRUE)
otherInjur <- sum(readSet$injuries[readSet$evtype=="other"], na.rm=TRUE)
otherProp <- sum(readSet$propdmg[readSet$evtype=="other"], na.rm=TRUE)
otherCrop <- sum(readSet$cropdmg[readSet$evtype=="other"], na.rm=TRUE)

##note: in the output shown below, the cropdmg position for "unused" repeats the propdmg total
print(paste0("The number of unused observations: " , unusedRows , ", total fatalities: " , unusedFatal , ", total injuries: " , unusedInjur , ", total propdmg: " , unusedProp , " million dollar, total cropdmg: " , unusedCrop , " million dollar. The number of other observations: " , otherRows , ", total fatalities: " , otherFatal , ", total injuries: " , otherInjur , ", total propdmg: " , otherProp , " million dollar, total cropdmg: " , otherCrop , " million dollar."))
## [1] "The number of unused observations: 27, total fatalities: 11, total injuries: 37, total propdmg: 1.313 million dollar, total cropdmg: 1.313 million dollar. The number of other observations: 34, total fatalities: 0, total injuries: 4, total propdmg: 0.0555 million dollar, total cropdmg: 1.0344 million dollar."

Final re-structuring and error editing of the tidy data

NOTE: There is a significant error in the dataset. Two flood records refer to the Napa River (CA): 12/31/2005 with propdmg of 115 M, and 1/1/2006 with propdmg of 115 B. The correct value is 115 M; therefore, the value of 115 B was set to NA. These records are corrected in the updated data (2014) at the following link:
http://www.ncdc.noaa.gov/stormevents/textsearch.jsp?q=Napa+River+City+and+Parks+Department

readSet$propdmg[readSet$propdmg==115000] <- NA ##fix error in the dataset
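
Because propdmg contains NA values, the comparison above also yields NA for those rows. A slightly more defensive variant wraps the condition in which() and restricts it to flood events. The following is a minimal sketch with invented rows; the names storms and bad are hypothetical.

```r
## Minimal sketch with invented rows: blank one erroneous value while
## leaving NA entries and other records untouched.
storms <- data.frame(evtype  = c("flood", "flood", "hail"),
                     propdmg = c(115, 115000, NA),
                     stringsAsFactors = FALSE)

## which() drops any NA produced by comparing NA with a number,
## so the index contains only rows that really match
bad <- which(storms$evtype == "flood" & storms$propdmg == 115000)
storms$propdmg[bad] <- NA
storms
```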

The total of each field shows what 100% of the entire dataset amounts to.
The following process sums the four variables for a rough estimate:

print(paste0("total fatalities: " , sum(readSet$fatalities, na.rm=TRUE) , ", total injuries: " , sum(readSet$injuries, na.rm=TRUE) , ", total property damage in million dollar: " , round(sum(readSet$propdmg, na.rm=TRUE),2) , ", total crop damage in million dollar: " , round(sum(readSet$cropdmg, na.rm=TRUE),2)))
## [1] "total fatalities: 8732, total injuries: 57975, total property damage in million dollar: 251767.62, total crop damage in million dollar: 34752.73"
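
These totals are the denominators behind the shares quoted in the synopsis. As a worked example, the sketch below divides the rounded leading values from the synopsis by the totals printed above; because the leading values are rounded, the resulting percentages are approximate. The names top, total and share are introduced for illustration only.

```r
## Shares of the dataset totals, using figures quoted in this report.
## 'top' holds the rounded values from the synopsis (damages in million dollars).
top   <- c(fatalities = 3300, injuries = 20650, propdmg = 82000, cropdmg = 13000)
total <- c(fatalities = 8732, injuries = 57975, propdmg = 251767.62, cropdmg = 34752.73)

share <- round(100 * top / total, 1)   # percent of each total
share
```

Each leading group accounts for roughly a third or more of its category total.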
  • For the purpose of the analysis in this report, the following variables are required:
    • evtype
    • year
    • fatalities
    • injuries
    • propdmg
    • cropdmg
  • The following code re-structures the dataset, keeping only the variables that are relevant for the analysis.
readSet <- select(readSet, evtype, year, fatalities, injuries, propdmg, cropdmg)
readSet <- arrange(readSet, evtype, year)
  • The dataset is now tidy and matches the purpose of the analysis. The first 5 observations are listed below:
readSet[1:5,]
##                  evtype year fatalities injuries propdmg cropdmg
## 1 astronomical low tide 2007          0        0    0.12       0
## 2 astronomical low tide 2008          0        0    0.20       0
## 3             avalanche 1996          0        2      NA      NA
## 4             avalanche 1996          1        1      NA      NA
## 5             avalanche 1996          2        0      NA      NA

Creating summary tables of health and economics

The analyses here aim to indicate events that have the highest impact on health and economics, where health associated categories are fatalities and injuries, and economics associated categories are property damages and crop damages.
Therefore, two separate tables with the relevant calculations were created, as follows.

For the following code, the packages “reshape”, “data.table”, and “ggplot2” are required.

#library(reshape) ##reshape package is required for changing the data structure
#library(data.table) ##data.table package is required for fast binding
#library(ggplot2) ##ggplot2 package is required for the following plot
  1. First, total sum values were calculated for each of the four categories for each event type.
##calculate total fatalities, injuries, propdmg, cropdmg for each event
allTotals <- ddply(readSet, .(evtype), summarize, totalFatalities=sum(fatalities), totalInjuries=sum(injuries), totalPropdmg=sum(propdmg, na.rm=TRUE), totalCropdmg=sum(cropdmg, na.rm=TRUE))
##restructure the totals: column 1 evtype, column 2 categories, column 3 totals
allTotals <- melt(allTotals, id=c("evtype")) ##all non-id columns are melted as measure variables
names(allTotals)[2:3] <- c("categoryFactor", "totals")
  2. Second, yearly totals were calculated, i.e. the sum of each of the four categories for each year and each event type.
##calculate yearly fatalities, injuries, propdmg, cropdmg, for each event (1996-2011)
allYearly <- ddply(readSet, .(evtype, year), summarize, yearlyFatalities=sum(fatalities), yearlyInjuries=sum(injuries), yearlyPropdmg=sum(propdmg, na.rm=TRUE), yearlyCropdmg=sum(cropdmg, na.rm=TRUE))
##restructure the totals: col1 evtype, col2 year, col3 categories, col4 yearly totals
allYearly <- melt(allYearly, id=c("evtype", "year")) ##all non-id columns are melted as measure variables
names(allYearly)[3:4] <- c("categoryFactor", "yearlyTotals")
  3. The tables of totals and yearly totals were combined into one large table.
##the value columns have different names (totals, yearlyTotals), so bind by position
tableAll <- rbindlist(list(select(allTotals, evtype, categoryFactor, totals), select(allYearly, evtype, categoryFactor, yearlyTotals)), use.names=FALSE)
  4. Then, the large table was split into two smaller tables: one for health-associated consequences and one for economic consequences. In each table, the event type was defined as a factor whose levels are ordered, descending, by total fatalities (health) or by total propdmg+cropdmg (economics).
##health table
health <- tableAll[grep("Fatalities|Injuries", tableAll$categoryFactor)] ##subset health categories
healthLevels <- arrange(health[health$categoryFactor=="totalFatalities", ], desc(totals))[,"evtype", with=FALSE]
healthLevels$evtype <- factor(healthLevels$evtype, levels=healthLevels$evtype)
health$evtype <- factor(health$evtype, levels=levels(healthLevels$evtype)) ##sort event levels by totals of fatalities

##economics table
economic <- tableAll[grep("Propdmg|Cropdmg", tableAll$categoryFactor)] ##subset economic categories

totalEconomi <- economic[economic$categoryFactor=="totalPropdmg"] 
totalEconomi$categoryFactor <- "totalPropAndCropDmg"
totalEconomi$totals <- economic$totals[economic$categoryFactor=="totalPropdmg"] + economic$totals[economic$categoryFactor=="totalCropdmg"]
totalEconomi <- arrange(totalEconomi, desc(totals))
totalEconomi$evtype <- factor(totalEconomi$evtype, levels=totalEconomi$evtype)

economic$evtype <- factor(economic$evtype, levels=levels(totalEconomi$evtype)) ##economic table: sort the event levels by the sum of propdmg+cropdmg
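
The four steps above can be sketched on a small scale with base R only. This is a minimal illustration and not the code used for the report: the `toy` data frame and its values are hypothetical, only two of the four categories are included, `aggregate` stands in for `ddply`, and a manual reshape stands in for `melt`.

```r
## Toy data: hypothetical values, NOT taken from the NOAA dataset
toy <- data.frame(
  evtype     = c("tornado", "tornado", "flood", "flood", "heat", "heat"),
  year       = c(1996, 1997, 1996, 1997, 1996, 1997),
  fatalities = c(10, 5, 2, 4, 20, 1),
  injuries   = c(100, 50, 8, 6, 30, 2),
  stringsAsFactors = FALSE
)

## Step 1: totals per event type (one row per evtype)
totals <- aggregate(cbind(fatalities, injuries) ~ evtype, data = toy, FUN = sum)

## Reshape wide -> long: evtype, categoryFactor, totals
long <- data.frame(
  evtype         = rep(as.character(totals$evtype), times = 2),
  categoryFactor = rep(c("totalFatalities", "totalInjuries"), each = nrow(totals)),
  totals         = c(totals$fatalities, totals$injuries),
  stringsAsFactors = FALSE
)

## Step 4: order the evtype levels by total fatalities, descending
fat <- long[long$categoryFactor == "totalFatalities", ]
orderedLevels <- fat$evtype[order(fat$totals, decreasing = TRUE)]
long$evtype <- factor(long$evtype, levels = orderedLevels)

levels(long$evtype)
## [1] "heat"    "tornado" "flood"
```

Plots with a discrete axis place categories in level order, so setting the levels this way is what puts the event type with the most fatalities first in the figures.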

Results

The summary tables for health and economic consequences, prepared in the Data Processing section, contain totals for each event type: totals over all years and totals for each individual year (1996-2011). The 2011 data originally ran only through November.

Figure 2 and Figure 3 below allow a comparison of the impact of each event type on population health and on economic damage, respectively. The box plots give an idea of the year-to-year differences in impact: the box and points show the distribution of the yearly values for each event type.

The 15 most harmful event types in each category are listed below.

The most harmful event types with respect to population health

A. Causes of fatalities

head(arrange(health[health$categoryFactor=="totalFatalities", ], desc(totals)), 15)
##                      evtype  categoryFactor totals
##  1:          excessive heat totalFatalities   1799
##  2:                 tornado totalFatalities   1511
##  3:             flash flood totalFatalities    887
##  4:               lightning totalFatalities    651
##  5:             rip current totalFatalities    542
##  6:                   flood totalFatalities    444
##  7:       thunderstorm wind totalFatalities    406
##  8: extreme cold/wind chill totalFatalities    280
##  9:                    heat totalFatalities    237
## 10:               high wind totalFatalities    235
## 11:               avalanche totalFatalities    223
## 12:            winter storm totalFatalities    191
## 13:               high surf totalFatalities    145
## 14:     hurricane (typhoon) totalFatalities    125
## 15:         cold/wind chill totalFatalities    117

B. Causes of injuries

head(arrange(health[health$categoryFactor=="totalInjuries", ], desc(totals)), 15)
##                  evtype categoryFactor totals
##  1:             tornado  totalInjuries  20667
##  2:               flood  totalInjuries   6838
##  3:      excessive heat  totalInjuries   6410
##  4:   thunderstorm wind  totalInjuries   5250
##  5:           lightning  totalInjuries   4141
##  6:         flash flood  totalInjuries   1674
##  7:            wildfire  totalInjuries   1456
##  8: hurricane (typhoon)  totalInjuries   1328
##  9:                heat  totalInjuries   1292
## 10:        winter storm  totalInjuries   1292
## 11:           high wind  totalInjuries   1090
## 12:           dense fog  totalInjuries    855
## 13:                hail  totalInjuries    723
## 14:          heavy snow  totalInjuries    717
## 15:      winter weather  totalInjuries    560

C. Graphs

printHealth <- qplot(evtype, totals, data=health, facets=categoryFactor~.) 
printHealth + geom_boxplot() + theme(axis.text.x  = element_text(angle=90, vjust=0.5, size=11, face="bold", color="black")) + facet_grid(categoryFactor~., scales="free_y") + theme(strip.text=element_text(size=14)) + labs(x="Event Types", y="Fatalities or Injuries (free scale)", title="Weather harmful events with respect to population health \n (fatalities or injuries)")
Figure 2: Harmful weather events with respect to population health. The events are ordered from left to right in descending order of total fatalities over the years 1996-2011. The vertical scale indicates the number of fatalities or injuries. From top to bottom: the first and second panels show the total fatalities and injuries, respectively; the third and fourth panels show the yearly fatalities and injuries, respectively.

From the tables above and Figure 2, the event types associated with the highest numbers of fatalities are excessive heat and tornado, and the one associated with the highest number of injuries is tornado. Four other event types are also associated with large numbers of injuries: flood, excessive heat, thunderstorm wind, and lightning.
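
The comparison can be checked with a few lines of arithmetic. The sketch below copies the values from the two tables above into plain vectors (the vector names are illustrative, not objects from the analysis). The share it computes is relative to the 15 listed event types only, not to all 48.

```r
## Top-15 fatality totals, copied from table A above (1996-2011)
fatalities <- c(
  "excessive heat" = 1799, "tornado" = 1511, "flash flood" = 887,
  "lightning" = 651, "rip current" = 542, "flood" = 444,
  "thunderstorm wind" = 406, "extreme cold/wind chill" = 280,
  "heat" = 237, "high wind" = 235, "avalanche" = 223,
  "winter storm" = 191, "high surf" = 145,
  "hurricane (typhoon)" = 125, "cold/wind chill" = 117
)

## Fatalities for the two leading event types together
topTwo <- sum(fatalities[c("excessive heat", "tornado")])
topTwo
## [1] 3310

## Their share of the fatalities within the top-15 list, in percent
shareOfTop15 <- 100 * topTwo / sum(fatalities)

## Injuries, copied from table B above: tornado versus the next four event types
injuries <- c("tornado" = 20667, "flood" = 6838, "excessive heat" = 6410,
              "thunderstorm wind" = 5250, "lightning" = 4141)
nextFour <- sum(injuries[-1])
nextFour
## [1] 22639
```

Tornado alone accounts for almost as many injuries as the next four event types combined.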

The event types with the greatest economic consequences

A. Causes of property damage (in million dollars)

head(arrange(economic[economic$categoryFactor=="totalPropdmg", ], desc(totals)), 15)
##                  evtype categoryFactor     totals
##  1: hurricane (typhoon)   totalPropdmg 81718.8890
##  2:    storm surge/tide   totalPropdmg 47834.7240
##  3:               flood   totalPropdmg 29129.5812
##  4:             tornado   totalPropdmg 24616.9457
##  5:         flash flood   totalPropdmg 15223.2709
##  6:                hail   totalPropdmg 14595.2134
##  7:   thunderstorm wind   totalPropdmg  7919.2480
##  8:            wildfire   totalPropdmg  7760.4495
##  9:      tropical storm   totalPropdmg  7642.4756
## 10:           high wind   totalPropdmg  5248.3834
## 11:           ice storm   totalPropdmg  3642.2488
## 12:        winter storm   totalPropdmg  1532.7432
## 13:             drought   totalPropdmg  1046.1010
## 14:           lightning   totalPropdmg   743.0771
## 15:          heavy snow   totalPropdmg   641.6945

B. Causes of crop damage (in million dollars)

head(arrange(economic[economic$categoryFactor=="totalCropdmg", ], desc(totals)), 15)
##                      evtype categoryFactor     totals
##  1:                 drought   totalCropdmg 13367.5660
##  2:     hurricane (typhoon)   totalCropdmg  5350.1078
##  3:                   flood   totalCropdmg  5013.1615
##  4:                    hail   totalCropdmg  2496.8225
##  5:            frost/freeze   totalCropdmg  1368.7610
##  6:             flash flood   totalCropdmg  1334.9017
##  7: extreme cold/wind chill   totalCropdmg  1326.0230
##  8:       thunderstorm wind   totalCropdmg  1017.4676
##  9:              heavy rain   totalCropdmg   738.1698
## 10:          tropical storm   totalCropdmg   677.7110
## 11:               high wind   totalCropdmg   633.5613
## 12:          excessive heat   totalCropdmg   492.4120
## 13:                wildfire   totalCropdmg   402.2551
## 14:                 tornado   totalCropdmg   283.4250
## 15:              heavy snow   totalCropdmg    71.1221

C. Graphs

printEconomics <- qplot(evtype, totals, data=economic, facets=categoryFactor~.) 
printEconomics + geom_boxplot() + theme(axis.text.x  = element_text(angle=90, vjust=0.5, size=11, face="bold", color="black")) + facet_grid(categoryFactor~., scales="free_y") + theme(strip.text=element_text(size=14)) + labs(x="Event Types", y="Property and Crop Damages in million dollar (free scale)", title="Weather harmful events with respect to property and crop damages \n (in million dollar)")
Figure 3: Harmful weather events with respect to property and crop damage, in million dollars. The events are ordered from left to right in descending order of total property plus crop damage over the years 1996-2011. The vertical scale indicates the amount of damage in million dollars. From top to bottom: the first and second panels show the total property damage and crop damage, respectively; the third and fourth panels show the yearly property damage and crop damage, respectively.

From the tables above and Figure 3, the event type associated with the greatest property damage is hurricane (typhoon), and the one associated with the greatest crop damage is drought. Five other event types are also associated with high property damage (from higher to lower): storm surge/tide, flood, tornado, flash flood, and hail.
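
Since Figure 3 orders events by property plus crop damage, it can help to add the two tables together. The sketch below does this for six event types that appear in both top-15 lists, using values copied from the tables above (the vector names are illustrative). Storm surge/tide is left out because its crop damage is not shown in table B.

```r
## Values in million dollars, copied from tables A and B above
propdmg <- c("hurricane (typhoon)" = 81718.8890, "flood" = 29129.5812,
             "tornado" = 24616.9457, "flash flood" = 15223.2709,
             "hail" = 14595.2134, "drought" = 1046.1010)
cropdmg <- c("drought" = 13367.5660, "hurricane (typhoon)" = 5350.1078,
             "flood" = 5013.1615, "hail" = 2496.8225,
             "flash flood" = 1334.9017, "tornado" = 283.4250)

## Match crop values to the property vector by name, add, and sort descending
combined <- sort(propdmg + cropdmg[names(propdmg)], decreasing = TRUE)

## Combined damage in billion dollars, rounded to one decimal
round(combined / 1000, 1)

## Hurricane and tornado together, in million dollars
hurricaneTornado <- sum(combined[c("hurricane (typhoon)", "tornado")])
```

Indexing `cropdmg` by `names(propdmg)` matters here: the two tables list the event types in different orders, so adding the vectors by position would pair the wrong events.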

Summary

The report indicates that excessive heat and tornado are the leading causes of loss of life (around 1800 and 1500 fatalities, respectively) and that tornado is the leading cause of injuries (around 20670 injuries), while hurricane (typhoon) is responsible for the highest property damage (around 82 billion dollars) and drought for the highest crop damage (around 13 billion dollars). Hurricane and tornado together are associated with around 110 billion dollars of damage over the years 1996-2011; storm surge/tide, flood, and flash flood together are associated with a comparable amount (roughly 100 billion dollars).