Most damaging weather events in the United States from 1996 to 2011: an analysis using the NOAA Storm Data

Synopsis of the analysis

This analysis identifies the most damaging weather events in the United States in terms of population health and economic cost. It uses the Storm Data compiled by the U.S. National Oceanic and Atmospheric Administration (NOAA), a database that tracks major storms and weather events in the United States, including when and where they occur and estimates of the resulting fatalities, injuries, and property damage. The data indicate that the event types most harmful to population health, measured in fatalities and injuries, are by far Tornadoes, followed by Wind-related events (mainly Thunderstorm Wind), Heat-related events and Floods. The most economically damaging event types, measured by property and crop damage costs in millions of dollars, are Floods, Hurricanes, Tornadoes and Ocean-related events, the last mainly because of the damage attributed to High Surf.

Data Processing

The Storm Data was loaded with the read.csv function and restricted to the variables of interest and to events recorded from 1996 onward, the year in which all event types began to be recorded. Two variables then required cleaning. For economic damage, the character-coded exponents were converted to numeric multipliers and used to compute total cost variables. For event types, the amatch function replaced the non-standard, typo-filled labels in the data set with the most similar official event names. Because this did not resolve every label, the remaining event types were re-coded manually and then grouped into broader categories using the grep function. After cleaning, separate data sets were generated for the health and the economic damage data and processed to produce the plots and tables.

# load libraries
library(dplyr)
library(ggplot2)
library(stringdist)
library(reshape2)
library(knitr)

Data Cleaning

##############################################################
                  # load data and subset
##############################################################

# load data
StormData <- read.csv(
        "C:/Users/yo/Dropbox/coursera/Reproducible_Research/project_2/repdata_data_StormData.csv.bz2",
        na.strings = "")

# format date variable as dates
StormData$BGN_DATE <- as.Date(StormData$BGN_DATE, "%m/%d/%Y")

# subset data set eliminating rows with information before 1996 (when all event types began to be recorded)
StormData_sub <- subset(StormData, BGN_DATE >= as.Date("1996-01-01"))

# further subset the data set by selecting the variables of interest (event type, fatalities, injuries, and the property and crop damage variables) and keeping only rows where at least one of fatalities, injuries, crop damage or property damage is larger than 0
StormData_sub <- StormData_sub %>% 
        select(EVTYPE, FATALITIES, INJURIES, CROPDMG, CROPDMGEXP, PROPDMG, PROPDMGEXP) %>% 
        filter(FATALITIES > 0 | INJURIES > 0 | CROPDMG > 0 | PROPDMG > 0)


##############################################################
                  # prepare economic cost variable
##############################################################

# check the content of the variables to clean
unique(StormData_sub$PROPDMGEXP)
##  [1] "K" "M" NA  "B" "m" "+" "0" "5" "6" "4" "h" "2" "7" "3" "H" "-"
unique(StormData_sub$CROPDMGEXP)
## [1] NA  "M" "K" "m" "B" "?" "0" "k"
# recode the property and crop exponents into numeric multipliers for the damage values.
# Per the data documentation, the letters B, M, K and H stand for billions, millions, thousands and hundreds.
# Any other non-missing code (digits and symbols) is recoded as 1 through .default, assuming the amount
# entered in the PROPDMG and CROPDMG vars is already in dollars; missing exponents stay NA.

StormData_sub <- StormData_sub %>% mutate(PROPDMGEXP = recode(PROPDMGEXP,
                                            "H"=100,
                                            "h"=100,
                                            "K"=1000,
                                            "k"=1000,
                                            "M"=10^6,
                                            "m"=10^6,
                                            "B"=10^9,
                                            "b"=10^9,
                                            .default=1
                                             ))

StormData_sub <- StormData_sub %>% mutate(CROPDMGEXP = recode(CROPDMGEXP,
                                            "K"=1000,
                                            "k"=1000,
                                            "M"=10^6,
                                            "m"=10^6,
                                            "B"=10^9,
                                            "b"=10^9,
                                            .default=1
                                             ))

# check recoding of PROPDMGEXP and CROPDMGEXP vars
unique(StormData_sub$CROPDMGEXP)
## [1]    NA 1e+06 1e+03 1e+09
unique(StormData_sub$PROPDMGEXP)
## [1] 1e+03 1e+06    NA 1e+09 1e+02
# multiply dollar costs by exponents to get final cost and divide by 10^6 to get millions of $
StormData_sub <- StormData_sub %>% mutate(prop_damage = (PROPDMG * PROPDMGEXP)/1000000)
StormData_sub <- StormData_sub %>% mutate(crop_damage = (CROPDMG * CROPDMGEXP)/1000000)

# subset dataframe to only include total cost variables
StormData_sub <- StormData_sub %>% 
        select(EVTYPE, FATALITIES, INJURIES, prop_damage, crop_damage)

# check
str(StormData_sub)
## 'data.frame':    254633 obs. of  5 variables:
##  $ EVTYPE     : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ FATALITIES : num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES   : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ prop_damage: num  0.025 0.0025 0.025 0.0025 0.0025 0.0025 0.0025 0.0025 0.025 0.025 ...
##  $ crop_damage: num  NA NA NA NA NA NA NA NA NA NA ...
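
As a small worked example of the exponent conversion, the sketch below uses a named lookup vector in base R instead of `recode`. The helper name `exp_to_multiplier` and the toy values are illustrative assumptions, not part of the original analysis. Letters map to their multipliers, any other non-missing code maps to 1 as stated in the recoding comment, and missing codes stay NA.

```r
# Hypothetical helper: translate a damage exponent code into a numeric multiplier.
exp_to_multiplier <- function(code) {
        lookup <- c(H = 1e2, h = 1e2, K = 1e3, k = 1e3,
                    M = 1e6, m = 1e6, B = 1e9, b = 1e9)
        out <- unname(lookup[code])            # NA for codes that are not in the lookup
        out[is.na(out) & !is.na(code)] <- 1    # digits and symbols count as 1
        out
}

# Toy rows (made-up values): thousands, millions, a symbol, and a missing exponent
toy <- data.frame(PROPDMG = c(25, 2.5, 10, 3),
                  PROPDMGEXP = c("K", "m", "+", NA),
                  stringsAsFactors = FALSE)

# damage in millions of dollars, as in the analysis
toy$prop_damage <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP) / 1e6
toy$prop_damage
```

A row with a missing exponent yields NA here, which is why the later sums use `na.rm = TRUE`.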
##############################################################
                  # clean EVTYPE variable
##############################################################
# check content of EVTYPE variable
# unique(StormData_sub$EVTYPE) # returns a long list of inconsistent, typo-filled labels that need to be cleaned

# eliminate blank spaces before and after characters in the EVTYPE variable and convert them all to lower case
StormData_sub$EVTYPE <- trimws(tolower(as.character(StormData_sub$EVTYPE)), which = "both" )

#load official list of events from the NOAA  
EventList <- read.csv(
        "C:/Users/yo/Dropbox/coursera/Reproducible_Research/project_2/event_list.csv")

#list official event names to use them with the amatch function below
event_list <- EventList$event_name

#generate new column (EVTYPE_2) to add official event names
StormData_sub$EVTYPE_2 <- NA

# fill the new column (EVTYPE_2) with the closest official name when the label differs from it by 4 or fewer edits (maxDist = 4); amatch is vectorized over the labels
StormData_sub$EVTYPE_2 <- as.character(event_list)[amatch(StormData_sub$EVTYPE, 
                                                          as.character(event_list), 
                                                          maxDist = 4)]
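
To illustrate how distance-based matching behaves and why it leaves some labels unresolved, the sketch below uses base R's `adist` (generalized Levenshtein distance) as a stand-in for `amatch`. This is an assumption made to keep the example dependency-free: `amatch` uses a different default distance, so results may not coincide in every case. The helper `nearest_official` and the toy labels are hypothetical.

```r
# Hypothetical stand-in for amatch(): index of the closest official name,
# or NA when even the closest one is more than max_dist edits away.
nearest_official <- function(labels, official, max_dist = 4) {
        d <- adist(labels, official)           # rows: labels, columns: official names
        best <- apply(d, 1, which.min)         # closest official name per label
        best[d[cbind(seq_along(labels), best)] > max_dist] <- NA
        best
}

official <- c("Tornado", "Flood", "Thunderstorm Wind")
labels <- c("tornadoe", "flooding", "tstm wind")

official[nearest_official(labels, official)]
```

In this toy case the first two labels resolve to "Tornado" and "Flood", while the abbreviation "tstm wind" is too far from "Thunderstorm Wind" to match, which is the kind of label the pattern-based recoding has to handle. Because the labels were lower-cased, every capital letter in an official name already costs one edit in a case-sensitive comparison.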

# further fill the new column (EVTYPE_2) with official event names using grep 

StormData_sub$EVTYPE_2[grep("^flood", StormData_sub$EVTYPE)] <- "Flood"

StormData_sub$EVTYPE_2[grep("^heat", StormData_sub$EVTYPE)] <- "Heat"

StormData_sub$EVTYPE_2[grep("hail", StormData_sub$EVTYPE)] <- "Hail"

StormData_sub$EVTYPE_2[grep("cold", StormData_sub$EVTYPE)] <- "Cold/Wind Chill"

StormData_sub$EVTYPE_2[grep("^(extreme cold|extreme wind)", 
                                    StormData_sub$EVTYPE)] <- "Extreme Cold/Wind Chill"

StormData_sub$EVTYPE_2[grep("(tstm wind|tstmw|thunderstorm|severe thunderstorm)", 
                                    StormData_sub$EVTYPE)] <- "Thunderstorm Wind"

StormData_sub$EVTYPE_2[grep("^winter weather", 
                                    StormData_sub$EVTYPE)] <- "Winter Weather"

StormData_sub$EVTYPE_2[grep("hurricane|typhoon", 
                                    StormData_sub$EVTYPE)] <- "Hurricane (Typhoon)"

StormData_sub$EVTYPE_2[grep("landslide|mudslide|mud slide|landslump", 
                                    StormData_sub$EVTYPE)] <- "Debris Flow"

StormData_sub$EVTYPE_2[grep("frost|freezing|ice|freeze|cold", 
                                    StormData_sub$EVTYPE)] <- "Frost/Freeze"

StormData_sub$EVTYPE_2[grep("^(gusty wind|wind damage|gradient wind|wind storm)", 
                                    StormData_sub$EVTYPE)] <- "Strong Wind"

StormData_sub$EVTYPE_2[grep("^(coastal flood|coastal  flood)", 
                                    StormData_sub$EVTYPE)] <- "Coastal Flood"

StormData_sub$EVTYPE_2[grep("dust", StormData_sub$EVTYPE)] <- "Dust Storm"

StormData_sub$EVTYPE_2[grep("flash.flood|flood.flash", 
                                    StormData_sub$EVTYPE)] <- "Flash Flood"

StormData_sub$EVTYPE_2[grep("^(heavy surf|high tides|wind and wave|high waves|rapidly rising water|heavy swells|coastal surge|high seas|high swells|high water|heavy seas|high surf)", 
                                    StormData_sub$EVTYPE)] <- "High Surf"

StormData_sub$EVTYPE_2[grep("^(rogue wave|rough seas|rough surf|rip currents)", 
                                    StormData_sub$EVTYPE)] <- "Sneaker Wave"

StormData_sub$EVTYPE_2[grep("snow", StormData_sub$EVTYPE)] <- "Heavy Snow"

StormData_sub$EVTYPE_2[grep("^(high wind|winter storm high winds)", 
                                    StormData_sub$EVTYPE)] <- "High Wind"

StormData_sub$EVTYPE_2[grep("^waterspout", 
                                    StormData_sub$EVTYPE)] <- "Waterspout"

StormData_sub$EVTYPE_2[grep("^tropical storm", 
                                    StormData_sub$EVTYPE)] <- "Tropical Storm"

StormData_sub$EVTYPE_2[grep("precipitation", 
                                    StormData_sub$EVTYPE)] <- "Winter Weather"

StormData_sub$EVTYPE_2[grep("(heavy rain|rainfall|rain)", 
                                    StormData_sub$EVTYPE)] <- "Heavy Rain"

StormData_sub$EVTYPE_2[grep("^lightning", 
                                    StormData_sub$EVTYPE)] <- "Lightning"

StormData_sub$EVTYPE_2[grep("blizzard", 
                                    StormData_sub$EVTYPE)] <- "Blizzard"

StormData_sub$EVTYPE_2[grep("(brush fire|wild/forest fire|grass fires|fire)", 
                                    StormData_sub$EVTYPE)] <- "Wildfire"

StormData_sub$EVTYPE_2[grep("(hypothermia|hyperthermia)", 
                                    StormData_sub$EVTYPE)] <- "NA"


# finish filling the new column (EVTYPE_2) with official event names one by one for more specific labels. Some labels recoded as "NA" because they do not correspond to weather events
StormData_sub = StormData_sub %>% mutate(
     EVTYPE_2 = case_when( 
         
         EVTYPE=="marine mishap"  ~ "NA",
         EVTYPE=="severe turbulence"  ~ "NA",
         EVTYPE=="urban small"  ~ "NA",
         EVTYPE=="urban and small"  ~ "NA",
         EVTYPE=="apache county" ~ "NA",
         EVTYPE=="drowning" ~ "NA",
         EVTYPE=="marine accident" ~ "NA",
         EVTYPE=="non-severe wind damage" ~ "NA",

         EVTYPE=="excessive wetness"  ~ "Other",
         EVTYPE=="urban/small stream"  ~ "Other",
         EVTYPE=="other"  ~ "Other",
         EVTYPE=="beach erosion" ~ "Other",
         EVTYPE=="dam break" ~ "Other",
         EVTYPE=="downburst" ~ "Other",
         EVTYPE=="unseasonal rain" ~ "Other",
         EVTYPE=="erosion/cstl flood" ~ "Other",
         EVTYPE=="microburst" ~ "Other",
         EVTYPE=="coastal erosion" ~ "Other",
         EVTYPE=="late season snow" ~ "Other",
         EVTYPE=="unseasonably warm" ~ "Other",
         EVTYPE=="warm weather" ~ "Other",
         EVTYPE=="unseasonably warm and dry" ~ "Other",
         
         EVTYPE=="glaze"  ~ "Cold/Wind Chill",
         EVTYPE=="black ice" ~ "Cold/Wind Chill",
         EVTYPE=="low temperature" ~ "Cold/Wind Chill",
         EVTYPE=="icy roads" ~ "Cold/Wind Chill",
         
         EVTYPE=="blowing snow" ~ "Blizzard",
         
         EVTYPE=="cold and snow" ~ "Winter Weather",
         EVTYPE=="falling snow/ice" ~ "Winter Weather",
         EVTYPE=="light snow" ~ "Winter Weather",
         EVTYPE=="mixed precip" ~ "Winter Weather",
         EVTYPE=="rain/snow" ~ "Winter Weather", 
         EVTYPE=="snow and ice" ~ "Winter Weather", 
         EVTYPE=="snow" ~ "Winter Weather",
         EVTYPE=="wintry mix" ~ "Winter Weather",
         EVTYPE=="light snowfall" ~ "Winter Weather",
         EVTYPE=="cool and wet" ~ "Winter Weather",
         EVTYPE=="lake effect snow" ~ "Winter Weather",
         
         EVTYPE=="marine hail" ~ "Marine Hail",
         
         EVTYPE=="storm force winds"  ~ "High Wind",
         EVTYPE=="dry mircoburst winds" ~ "High Wind",
         EVTYPE=="dry microburst" ~ "High Wind",
         EVTYPE=="microburst winds" ~ "High Wind",
         EVTYPE=="whirlwind" ~ "High Wind",
         
         EVTYPE=="non tstm wind" ~ "Strong Wind", 
         EVTYPE=="winds" ~ "Strong Wind",
         
         EVTYPE=="urban/sml stream fld"  ~ "Urban flood",
         
         EVTYPE=="record heat"  ~ "Excessive Heat",
         
         EVTYPE=="coastal storm" ~ "Storm Surge/Tide", 
         EVTYPE=="coastalstorm" ~ "Storm Surge/Tide", 
          
         
         EVTYPE=="torrential rainfall" ~ "Heavy Rain",
         EVTYPE=="heavy shower" ~ "Heavy Rain",
          
         EVTYPE=="hazardous surf" ~ "High Surf", 
         EVTYPE=="storm surge" ~ "High Surf",
         EVTYPE=="astronomical high tide" ~ "High Surf",

         EVTYPE=="astronomical low tide" ~ "Astronomical Low Tide",
        
         EVTYPE=="marine tstm wind" ~ "Marine Thunderstorm Wind", 
          
         EVTYPE=="wet microburst" ~ "Thunderstorm Wind",

         EVTYPE=="flood/flash/flood" ~ "Flash Flood",
          
         EVTYPE=="fog" ~ "Dense Fog",
         
         EVTYPE=="rock slide" ~ "Avalanche",
         
         EVTYPE=="landspout" ~ "Funnel Cloud",
         
         EVTYPE=="hurricane/typhoon" ~ "Hurricane (Typhoon)",
         
         
         TRUE ~ as.character(EVTYPE_2)
     )
)

# eliminate event types recoded as "NA" (they are not weather events but specific accidents); filter() also drops rows where EVTYPE_2 is still a true NA
StormData_sub <- StormData_sub %>% filter(EVTYPE_2 != "NA")

# check recoding 
sort(unique(subset(StormData_sub, is.na(EVTYPE_2))$EVTYPE)) # (should return "character(0)")
## character(0)
sort(unique(StormData_sub$EVTYPE_2)) # returns list of official event names in the data
##  [1] "Astronomical Low Tide "   "Avalanche"               
##  [3] "Blizzard"                 "Coastal Flood"           
##  [5] "Cold/Wind Chill"          "Debris Flow"             
##  [7] "Dense Fog"                "Dense Smoke"             
##  [9] "Drought"                  "Dust Storm"              
## [11] "Excessive Heat"           "Extreme Cold/Wind Chill" 
## [13] "Flash Flood"              "Flood"                   
## [15] "Frost/Freeze"             "Funnel Cloud"            
## [17] "Hail"                     "Heat"                    
## [19] "Heavy Rain"               "Heavy Snow"              
## [21] "High Surf"                "High Wind"               
## [23] "Hurricane (Typhoon)"      "Lakeshore Flood"         
## [25] "Lightning"                "Marine Hail"             
## [27] "Marine High Wind"         "Marine Strong Wind"      
## [29] "Marine Thunderstorm Wind" "Other"                   
## [31] "Rip Current"              "Seiche"                  
## [33] "Sleet"                    "Sneaker Wave"            
## [35] "Storm Surge/Tide"         "Strong Wind"             
## [37] "Thunderstorm Wind"        "Tornado"                 
## [39] "Tropical Depression"      "Tropical Storm"          
## [41] "Tsunami"                  "Urban flood"             
## [43] "Volcanic Ash"             "Waterspout"              
## [45] "Wildfire"                 "Winter Storm"            
## [47] "Winter Weather"
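
A side effect of the filter step above can be shown on toy data. Comparing a missing value with `!=` yields NA, and both `dplyr::filter` and base `subset` keep only rows where the condition is TRUE. In the sketch below (hypothetical labels, base R only), the row whose `EVTYPE_2` is a true NA is dropped together with the row recoded to the string "NA", so a check for unmatched labels run after such a filter cannot list them.

```r
toy <- data.frame(EVTYPE = c("tornado", "drowning", "some unmatched label"),
                  EVTYPE_2 = c("Tornado", "NA", NA),
                  stringsAsFactors = FALSE)

toy$EVTYPE_2 != "NA"                        # TRUE FALSE NA

# drops the "NA" string and the true NA alike
kept <- subset(toy, EVTYPE_2 != "NA")
kept$EVTYPE

# to keep unmatched rows visible, test for missingness explicitly
kept_with_na <- subset(toy, is.na(EVTYPE_2) | EVTYPE_2 != "NA")
kept_with_na$EVTYPE
```

Running the unmatched-label check before the filter, or filtering with the explicit `is.na()` condition, would be one way to see which labels remain unmatched.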
# separate in broad groups in new variable EVTYPE_group for a final more general categorization using general patterns for broader analysis
StormData_sub$EVTYPE_group <- NA
StormData_sub$EVTYPE_group[grep("([Cc]old|[Ff]reeze)", StormData_sub$EVTYPE_2)] <- "Cold"
StormData_sub$EVTYPE_group[grep("[Hh]eat", StormData_sub$EVTYPE_2)] <- "Heat"
StormData_sub$EVTYPE_group[grep("[Ff]lood", StormData_sub$EVTYPE_2)] <- "Flood"
StormData_sub$EVTYPE_group[grep("[Dd]ust", StormData_sub$EVTYPE_2)] <- "Dust"
StormData_sub$EVTYPE_group[grep("[S]now|Blizzard", StormData_sub$EVTYPE_2)] <- "Snow"
StormData_sub$EVTYPE_group[grep("[Ii]ce", StormData_sub$EVTYPE_2)] <- "Ice"
StormData_sub$EVTYPE_group[grep("[Rr]ain", StormData_sub$EVTYPE_2)] <- "Rain"
StormData_sub$EVTYPE_group[grep("[Ff]og", StormData_sub$EVTYPE_2)] <- "Fog"
StormData_sub$EVTYPE_group[grep("([Cc]urrent|[Ss]urf|[Ww]ave|[Tt]ide|[Cc]oastal|Seiche)", 
                                StormData_sub$EVTYPE_2)] <- "Ocean Events"
StormData_sub$EVTYPE_group[grep("([Tt]hunderstorm|[Ll]ightning)", 
                                StormData_sub$EVTYPE_2)] <- "Thunderstorm"
StormData_sub$EVTYPE_group[grep("[Ww]ind", StormData_sub$EVTYPE_2)] <- "Wind"
StormData_sub$EVTYPE_group[grep("(Avalanche|Debris Flow)", StormData_sub$EVTYPE_2)] <- "Avalanche"
StormData_sub$EVTYPE_group[grep("(Dust Storm|Dust Devil)", StormData_sub$EVTYPE_2)] <- "Dust"
StormData_sub$EVTYPE_group[grep("(Tornado|Funnel Cloud|Waterspout)", 
                                StormData_sub$EVTYPE_2)] <- "Tornado"
StormData_sub$EVTYPE_group[grep("(Hurricane|Tropical Storm|Tropical Depression)", 
                                StormData_sub$EVTYPE_2)] <- "Hurricane"
StormData_sub$EVTYPE_group[grep("(Wildfire|Dense Smoke)", StormData_sub$EVTYPE_2)] <- "Wildfire"
StormData_sub$EVTYPE_group[grep("(Winter Storm|Winter Weather|[Pp]recipitation|Sleet)", 
                                StormData_sub$EVTYPE_2)] <- "Winter Weather"

# separate in broad groups (EVTYPE_group) for a more general categorization using specific names
StormData_sub = StormData_sub %>% mutate(
     EVTYPE_group = case_when( 
         EVTYPE_2=="Other"  ~ "Other",
         EVTYPE_2=="Volcanic Ash"  ~ "Volcanic Ash",
         EVTYPE_2=="Hail"  ~ "Hail",
         EVTYPE_2=="Marine Hail"  ~ "Hail",
         EVTYPE_2=="Drought"  ~ "Drought",
         EVTYPE_2=="Tsunami"  ~ "Tsunami",
         
         TRUE ~ as.character(EVTYPE_group)
     )
)

# check (both should return "character(0)")
sort(unique(subset(StormData_sub, is.na(EVTYPE_group))$EVTYPE))
## character(0)
sort(unique(subset(StormData_sub, is.na(EVTYPE_group))$EVTYPE_2))
## character(0)
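
Because each `grep` assignment overwrites whatever an earlier one set, the order of the patterns decides the final group whenever a name matches more than one of them. The toy vector below applies three patterns modeled on those in the grouping step, in the same order; the object names `official` and `group` are illustrative.

```r
official <- c("Thunderstorm Wind", "Lightning", "High Wind", "Cold/Wind Chill")
group <- rep(NA_character_, length(official))

# same order as in the grouping step: later assignments overwrite earlier ones
group[grep("([Cc]old|[Ff]reeze)", official)] <- "Cold"
group[grep("([Tt]hunderstorm|[Ll]ightning)", official)] <- "Thunderstorm"
group[grep("[Ww]ind", official)] <- "Wind"

setNames(group, official)
```

"Thunderstorm Wind" and "Cold/Wind Chill" both end up in the "Wind" group because the wind pattern runs last, which is consistent with how these names are grouped in the results tables. Reordering the assignments would change the grouping.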

Data preparation for plots and tables

#############################################################
    # create data sets with aggregate data for plotting
##############################################################

# change EVTYPE_group class to factor
StormData_sub$EVTYPE_group <- as.factor(StormData_sub$EVTYPE_group)

# create specific data sets for health and economic data and aggregate summary data
StormData_health <- StormData_sub %>% 
        select(EVTYPE_group, FATALITIES, INJURIES) %>% 
        group_by(EVTYPE_group) %>%
        summarize(sum_fat = sum(FATALITIES, na.rm = TRUE), 
                  sum_inj = sum(INJURIES, na.rm = TRUE))

StormData_econom <- StormData_sub %>% 
        select(EVTYPE_group, prop_damage, crop_damage) %>% 
        group_by(EVTYPE_group) %>%
        summarize(sum_propdam = sum(prop_damage, na.rm = TRUE), 
                  sum_cropdam = sum(crop_damage, na.rm = TRUE))

# reshape the data sets to prepare for plotting
melt_health <-melt(StormData_health, id=c("EVTYPE_group"))
melt_econom <-melt(StormData_econom, id=c("EVTYPE_group"))

# rename variable level names for plotting 
melt_health$variable <- factor(melt_health$variable,
                         levels = c("sum_fat", "sum_inj"),
                         labels = c("Total fatalities", "Total injuries"))

melt_econom$variable <- factor(melt_econom$variable,
                         levels = c("sum_propdam", "sum_cropdam"),
                         labels = c("Property damage", "Crop damage"))


#############################################################
# create more specific data sets with aggregate data for tables
##############################################################

# create lists with the official event names included in the event type groups that were identified as the most damaging for each type of damage

most_health_damage <- dput(sort(unique(subset(
  StormData_sub, EVTYPE_group == "Tornado" | 
    EVTYPE_group == "Wind" |
    EVTYPE_group == "Heat" | 
    EVTYPE_group == "Flood")$EVTYPE_2)))


most_econom_damage <- dput(sort(unique(subset(
  StormData_sub, EVTYPE_group == "Flood" | 
    EVTYPE_group == "Hurricane" | 
    EVTYPE_group == "Tornado" | 
    EVTYPE_group == "Ocean Events")$EVTYPE_2)))

# generate data sets to show as tables including the summary information on health and economic damage for the most damaging event types
StormData_most_health_table <- StormData_sub %>% 
                        filter(EVTYPE_2 %in% most_health_damage)  %>%
                        group_by(EVTYPE_group, EVTYPE_2) %>%
                        summarize(sum_fat = sum(FATALITIES, na.rm = TRUE), 
                                  sum_inj = sum(INJURIES, na.rm = TRUE),
                                  total_fat_inj = sum(sum_fat, sum_inj, na.rm = TRUE)) %>%
                        arrange(desc(total_fat_inj)) %>%
                        rename("Event type group" = EVTYPE_group,
                               "Official event name" = EVTYPE_2, 
                               "Fatalities" = sum_fat,
                               "Injuries" = sum_inj,
                               "Total fatalities and injuries" = total_fat_inj) 

StormData_most_econom_table <- StormData_sub %>% 
                        filter(EVTYPE_2 %in% most_econom_damage)  %>%
                        group_by(EVTYPE_group, EVTYPE_2) %>%
                        summarize(sum_propdam = sum(prop_damage, na.rm = TRUE), 
                                  sum_cropdam = sum(crop_damage, na.rm = TRUE),
                                  total_cost = sum(sum_propdam, sum_cropdam, na.rm = TRUE)) %>%
                        arrange(desc(total_cost)) %>%
                        rename("Event type group" = EVTYPE_group, 
                               "Official event name" = EVTYPE_2, 
                               "Property damage (mill $)" = sum_propdam,
                               "Crop damage (mill $)" = sum_cropdam,
                               "Total cost (mill $)" = total_cost) 

Results

1. Across the United States, which types of events are most harmful with respect to population health?

To answer this question, the total fatalities and injuries per event type group were plotted in a single bar plot, with event type groups on the x-axis and the total number of fatalities and injuries on the y-axis. The plot indicates that the event types most harmful to population health, measured in fatalities and injuries, are by far Tornadoes, followed by Wind-related events (mainly Thunderstorm Wind), Heat-related events and Floods.

ggplot(melt_health, aes(x = EVTYPE_group, y = value, fill = variable)) +
        geom_bar(stat = "identity") +
        labs(title = "Most harmful Storm Data events to population health", 
             x = "Event type group",
             y = "Number of fatalities and injuries",
             caption = "United States NOAA Storm Data years 1996-2011") +
        scale_fill_grey(start = 0.4, end = 0.7) +
        theme_bw() +
        theme(axis.text.x = element_text(angle = 45, hjust = 1),
              legend.title = element_blank(),
              plot.caption = element_text(face = "italic", hjust = 0))

For a more detailed view of the most damaging event type groups identified in the plot above, the following table lists the official event names that make up each group, together with their total fatalities and injuries. The table is sorted by the combined total of fatalities and injuries and shows that the most damaging event types within each group are Tornado, Thunderstorm Wind, Excessive Heat and Flood. Flash Flood ranks below Flood in the combined total but accounts for more fatalities (1035 versus 500).

kable(StormData_most_health_table, caption = "Table with detail of most harmful weather events to population health  that includes groups, official event names and number of fatalities and injuries as individual and aggregated variables.")
Table with detail of most harmful weather events to population health that includes groups, official event names and number of fatalities and injuries as individual and aggregated variables.

| Event type group | Official event name | Fatalities | Injuries | Total fatalities and injuries |
|:--|:--|--:|--:|--:|
| Tornado | Tornado | 5633 | 91364 | 96997 |
| Wind | Thunderstorm Wind | 746 | 9534 | 10280 |
| Heat | Excessive Heat | 1905 | 6575 | 8480 |
| Flood | Flood | 500 | 6877 | 7377 |
| Heat | Heat | 1118 | 2494 | 3612 |
| Flood | Flash Flood | 1035 | 1802 | 2837 |
| Wind | High Wind | 298 | 1515 | 1813 |
| Wind | Strong Wind | 118 | 315 | 433 |
| Wind | Cold/Wind Chill | 20 | 271 | 291 |
| Flood | Urban flood | 28 | 79 | 107 |
| Tornado | Waterspout | 6 | 72 | 78 |
| Wind | Marine Strong Wind | 14 | 22 | 36 |
| Wind | Extreme Cold/Wind Chill | 17 | 5 | 22 |
| Wind | Marine Thunderstorm Wind | 9 | 8 | 17 |
| Tornado | Funnel Cloud | 0 | 3 | 3 |
| Wind | Marine High Wind | 1 | 1 | 2 |
| Flood | Lakeshore Flood | 0 | 0 | 0 |
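
The ranking of the four groups can be checked by adding up the rows of the table above. The sketch below copies the "Total fatalities and injuries" column by hand into a small data frame (the object names `health` and `group_totals` are illustrative) and sums it per event type group.

```r
# Rows of the health table above: group and combined fatalities + injuries
health <- data.frame(
        group = c("Tornado", "Wind", "Heat", "Flood", "Heat", "Flood", "Wind",
                  "Wind", "Wind", "Flood", "Tornado", "Wind", "Wind", "Wind",
                  "Tornado", "Wind", "Flood"),
        total = c(96997, 10280, 8480, 7377, 3612, 2837, 1813,
                  433, 291, 107, 78, 36, 22, 17,
                  3, 2, 0),
        stringsAsFactors = FALSE)

# sum per group and sort from most to least harmful
group_totals <- sort(sapply(split(health$total, health$group), sum),
                     decreasing = TRUE)
group_totals

# share of each group within these four groups only
round(group_totals / sum(group_totals), 3)
```

The sums come to 97078 for Tornado, 12894 for Wind, 12092 for Heat and 10321 for Flood, so the Tornado group exceeds the next group by a factor of roughly 7.5.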

2. Across the United States, which types of events have the greatest economic consequences?

To answer this question, the total property and crop damage costs per event type group were plotted in a single bar plot, with event type groups on the x-axis and the total damage cost in millions of dollars on the y-axis. The plot indicates that the most economically damaging event types, measured by property and crop damage costs in millions of dollars, are Floods, Hurricanes, Tornadoes and Ocean-related events.

ggplot(melt_econom, aes(x = EVTYPE_group, y = value, fill = variable)) + 
        geom_bar(stat = "identity")  +
        labs(title = "Most economically damaging Storm Data events", 
             x = "Event type group",
             y = "Millions of dollars",
             caption = "United States NOAA Storm Data years 1996-2011") +
        scale_fill_grey(start = 0.4, end = 0.7) +
        theme_bw() +
        theme(axis.text.x = element_text(angle = 45, hjust = 1),
              legend.title = element_blank(),
              plot.caption = element_text(face = "italic", hjust = 0))

For a more detailed view of the most economically damaging event type groups identified in the plot above, the following table lists the official event names that make up each group, together with their total costs in millions of dollars ($). The table is sorted by the total cost column and shows that the most damaging event types within each group are the general categories Flood, Hurricane (Typhoon) and Tornado, plus High Surf within the Ocean Events group.

kable(StormData_most_econom_table, caption = "Table with detail of most economically damaging weather events that includes groups, official event names, and property and crop damage cost as individual and aggregated variables.")
Table with detail of most economically damaging weather events that includes groups, official event names, and property and crop damage cost as individual and aggregated variables.

| Event type group | Official event name | Property damage (mill $) | Crop damage (mill $) | Total cost (mill $) |
|:--|:--|--:|--:|--:|
| Flood | Flood | 144781.15930 | 5671.1740 | 150452.33325 |
| Hurricane | Hurricane (Typhoon) | 85356.41001 | 5516.1178 | 90872.52781 |
| Tornado | Tornado | 56942.03423 | 414.9629 | 57356.99709 |
| Ocean Events | High Surf | 43436.51650 | 0.0050 | 43436.52150 |
| Flood | Flash Flood | 16907.94561 | 1532.1971 | 18440.14276 |
| Hurricane | Tropical Storm | 7714.39055 | 694.8960 | 8409.28655 |
| Ocean Events | Storm Surge/Tide | 4641.23800 | 0.8500 | 4642.08800 |
| Ocean Events | Coastal Flood | 427.56606 | 0.0560 | 427.62206 |
| Flood | Urban flood | 58.30965 | 8.4881 | 66.79775 |
| Tornado | Waterspout | 60.73020 | 0.0000 | 60.73020 |
| Flood | Lakeshore Flood | 7.54000 | 0.0000 | 7.54000 |
| Hurricane | Tropical Depression | 1.73700 | 0.0000 | 1.73700 |
| Ocean Events | Seiche | 0.98000 | 0.0000 | 0.98000 |
| Ocean Events | Astronomical Low Tide | 0.32000 | 0.0000 | 0.32000 |
| Tornado | Funnel Cloud | 0.20160 | 0.0000 | 0.20160 |
| Ocean Events | Sneaker Wave | 0.17200 | 0.0000 | 0.17200 |
| Ocean Events | Rip Current | 0.00100 | 0.0000 | 0.00100 |