Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, crop damage and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, crop damage and property damage.

Objective

The analysis will address the following questions.

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

Synopsis

The findings of this analysis, limited to the two project objectives, are summarised below:

  1. For population health, with respect to the sum of fatalities and injuries
    • Tsunami is the most harmful event in terms of mean value per event from 1950 to 2011
    • Tornado is the most harmful event in terms of total value from 1950 to 2011
  2. For economic consequences, with respect to the sum of property damage and crop damage
    • Hurricane (Typhoon) has the greatest impact in terms of mean value per event from 1950 to 2011
    • Flood has the greatest impact in terms of total value from 1950 to 2011

Process Data

Original Data and Cookbook

  • The cookbook for the data is the documentation of the database, which describes how some of the variables are constructed and defined.

  • The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size.

  • Storm Data.

  • The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

Software Environment

library(knitr)
library(lubridate)
library(dplyr)
library(ggplot2)
library(gridExtra)
sessionInfo()
## R version 3.1.2 (2014-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] xtable_1.7-4    gridExtra_2.0.0 ggplot2_1.0.1   dplyr_0.4.3    
## [5] lubridate_1.3.3 knitr_1.11     
## 
## loaded via a namespace (and not attached):
##  [1] assertthat_0.1   colorspace_1.2-6 DBI_0.3.1        digest_0.6.8    
##  [5] evaluate_0.8     formatR_1.2.1    grid_3.1.2       gtable_0.1.2    
##  [9] htmltools_0.2.6  magrittr_1.5     MASS_7.3-44      memoise_0.2.1   
## [13] munsell_0.4.2    parallel_3.1.2   plyr_1.8.3       proto_0.3-10    
## [17] R6_2.1.1         Rcpp_0.12.1      reshape2_1.4.1   rmarkdown_0.8   
## [21] scales_0.3.0     stringi_0.5-5    stringr_1.0.0    tools_3.1.2     
## [25] yaml_2.1.13

Download and Read Dataset

# Download the compressed data file if it has not been downloaded yet
if (!file.exists("StormData.csv.bz2")) {
    download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                  "StormData.csv.bz2", mode = "wb")  # binary mode keeps the archive intact on Windows
}
# Read the original dataset into memory
stormData <- read.csv(bzfile("StormData.csv.bz2"), header = TRUE, stringsAsFactors = FALSE)
stormData <- tbl_df(data.frame(stormData))
dim_stormData <- dim(stormData)

The dimensions of the original dataset are (902297, 37).

Process Strategy

  1. The original dataset is subset to contain only the following variables of interest:
  • “STATE”
  • “BGN_DATE”
  • “COUNTY”
  • “REFNUM”
  • “EVTYPE”
  • “FATALITIES”
  • “INJURIES”
  • “PROPDMG”
  • “PROPDMGEXP”
  • “CROPDMG”
  • “CROPDMGEXP”
  2. Summary records (those whose EVTYPE contains the text “summary” or “Summary”) are removed from the original dataset as they are irrelevant.

  3. Crop (CROPDMG) and property (PROPDMG) damage amounts are converted to US dollars based on the variables “CROPDMGEXP” and “PROPDMGEXP” respectively. The valid codes are K (or k), M (or m) and B (or b). Any other entry (+, -, “”, 0, 1, 2, 3, 4, 5, 6, 7, 8, h, H) is treated as invalid, and the corresponding damage amount is counted as zero.

  4. Create three variables.
  • HARMFUL = FATALITIES + INJURIES;
  • ECOIMPACT = PROPDMG + CROPDMG (after both have been converted to US dollars);
  • YEAR_EVENT = year(mdy_hms(BGN_DATE)).
  5. Event types included in the analysis are cleaned up (see the Map EVTYPE Variable section) based on the provided documentation. Only 48 event types are defined there, and these should be the only event types allowed.

  6. Calculate and plot the top 10 events most harmful to population health, ranked by both the mean and the total of HARMFUL.

  7. Calculate and plot the top 10 events with the greatest economic consequences, ranked by both the mean and the total of ECOIMPACT.

  8. As an example, calculate the Harmful data by event for the state of Utah.

  9. As an example, calculate the Economic Consequences data by event for the state of Utah.

  10. As an example, calculate the Harmful and Economic Consequences data by year for the state of Utah.
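
As a small illustration of the YEAR_EVENT step, the sketch below extracts the year from date strings in the month/day/year hour:minute:second layout that mdy_hms() parses. It uses base R rather than lubridate so that it runs without extra packages; the helper name year_from_bgn_date and the sample strings are made up for illustration and are not part of the report's code.

```r
# Hypothetical helper: extract the calendar year from BGN_DATE-style strings.
# Assumes the "month/day/year hour:minute:second" layout.
year_from_bgn_date <- function(x) {
  parsed <- as.POSIXct(x, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
  as.integer(format(parsed, "%Y"))
}

# Made-up sample strings in the assumed layout
sample_dates <- c("4/18/1950 0:00:00", "11/28/2011 0:00:00")
years <- year_from_bgn_date(sample_dates)
print(years)
```

A string that does not follow the assumed layout yields NA, which makes malformed dates easy to spot before grouping by year.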

Meaning of Variables in Original Dataset

“EVTYPE” - The type of weather event

“FATALITIES” - The number of deaths directly associated with the weather event.

“INJURIES” - The number of non-fatal injuries directly associated with the weather event.

“PROPDMG” and “PROPDMGEXP” - The dollar cost of Property Damage associated with the weather event. This cost is divided into two variables, PROPDMG is the numeric estimate of the damage and PROPDMGEXP is the units associated with the numeric estimate. The units may be K-thousands, M-millions or B-billions.

“CROPDMG” and “CROPDMGEXP” - The dollar cost of Crop Damage associated with the weather event. This cost is divided into two variables, CROPDMG is the numeric estimate of the damage and CROPDMGEXP is the units associated with the numeric estimate. The units may be K-thousands, M-millions or B-billions.

Remove “summary” records

# Remove records whose EVTYPE contains "Summary" or "summary"
stormData <- stormData[!grepl("Summary|summary", stormData$EVTYPE), ]

Process Property and Crop Damage Amounts

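CROPDMGEXP and PROPDMGEXP also contain the entries (+, -, “”, 0, 1, 2, 3, 4, 5, 6, 7, 8, h, H). Because their meaning is uncertain, the code below sets these codes to NA and counts the corresponding damage amount as zero.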

# Convert damage amounts into US dollars. K/k refers to US$1,000;
# M/m refers to US$1,000,000; B/b refers to US$1,000,000,000.
# Any other exponent code is set to NA and its damage amount is counted as 0.
stormDataNA <- stormData %>% 
    mutate(YEAR_EVENT = year(mdy_hms(BGN_DATE))) %>%
    mutate(PROPDMGEXP = ifelse(PROPDMGEXP %in% (c("k","K","m","M","b","B")), PROPDMGEXP, NA)) %>% 
    mutate(CROPDMGEXP = ifelse(CROPDMGEXP %in% (c("k","K","m","M","b","B")), CROPDMGEXP, NA)) %>% 
    mutate(propDamage = ifelse(PROPDMGEXP %in% (c("k","K")), PROPDMG * 1000, 
                               ifelse(PROPDMGEXP %in% (c("m","M")), PROPDMG * 1000000, 
                                      ifelse(PROPDMGEXP %in% (c("b","B")), PROPDMG * 1000000000, 0)))) %>% 
    mutate(cropDamage = ifelse(CROPDMGEXP %in% (c("k","K")), CROPDMG * 1000, 
                               ifelse(CROPDMGEXP %in% (c("m","M")), CROPDMG * 1000000, 
                                      ifelse(CROPDMGEXP %in% (c("b","B")), CROPDMG * 1000000000, 0)))) %>% 
    mutate(ECOIMPACT = propDamage + cropDamage) %>%
    mutate(HARMFUL = FATALITIES + INJURIES) %>%
    select(STATE, YEAR_EVENT, COUNTY, EVTYPE, REFNUM, FATALITIES, INJURIES, HARMFUL, ECOIMPACT)
len <- length(unique(stormDataNA$EVTYPE))
dim_stormDataNA <- dim(stormDataNA)

Now, the dimensions of the processed dataset = (902224, 9).
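
The exponent conversion can also be written as a lookup table, which some readers may find easier to audit than nested ifelse() calls. The sketch below is an alternative formulation, not the code used to produce the results in this report; the names exp_multiplier and to_dollars and the sample values are hypothetical.

```r
# Multipliers for the exponent codes the analysis accepts (case-insensitive).
exp_multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert an amount and its exponent code to US dollars.
# Codes outside K/M/B yield 0, mirroring the fallback of the nested ifelse() calls.
to_dollars <- function(amount, exp_code) {
  mult <- exp_multiplier[toupper(exp_code)]
  mult[is.na(mult)] <- 0
  unname(amount * mult)
}

# Made-up sample values, not taken from the dataset
amount   <- c(25, 2.5, 1.2, 10, 5)
exp_code <- c("K", "m", "B", "", "+")
dollars  <- to_dollars(amount, exp_code)
print(dollars)
```

The last two sample rows carry codes outside K/M/B, so their damage is counted as zero while the rows themselves are kept.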

Map EVTYPE Variable

Before the EVTYPE clean-up, there are 921 unique entries in the EVTYPE variable (the value of len computed above). Any analysis performed without first mapping these entries to the defined event types would be unreliable. The cookbook states that there are 48 valid event types. See the picture here.

Event Type in the provided Cookbook

Therefore, a new variable called EVTYPE_Clean is created with the code below, which maps the raw entries to the 48 defined event types.

options(width = 10000)
# Generate a new variable EVTYPE_Clean
stormDataNA$EVTYPE_Clean <- NA
# Generate valid Event Type as described in Cookbook
stormDataNA$EVTYPE_Clean[grepl("Astronomical Low Tide", 
                                stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Astronomical Low Tide"
stormDataNA$EVTYPE_Clean[grepl("Avalanche", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Avalanche"
stormDataNA$EVTYPE_Clean[grepl("Blizzard", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Blizzard"
stormDataNA$EVTYPE_Clean[grepl("Coastal Flood", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Coastal Flood"
stormDataNA$EVTYPE_Clean[grepl("Cold/Wind Chill|COLD|WIND CHILL", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Cold/Wind Chill"
stormDataNA$EVTYPE_Clean[grepl("Debris Flow|Landslide", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Debris Flow"
stormDataNA$EVTYPE_Clean[grepl("Dense Fog", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Dense Fog"
stormDataNA$EVTYPE_Clean[grepl("Dense Smoke", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Dense Smoke"
stormDataNA$EVTYPE_Clean[grepl("Drought", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Drought"
stormDataNA$EVTYPE_Clean[grepl("Dust Devil", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Dust Devil"
stormDataNA$EVTYPE_Clean[grepl("Dust Storm", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Dust Storm"
stormDataNA$EVTYPE_Clean[grepl("Excessive Heat", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                    & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Excessive Heat"
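# Note: the Cold/Wind Chill rule above already matches every entry containing
# "cold" or "wind chill", so this rule never assigns any records.
stormDataNA$EVTYPE_Clean[grepl("Extreme Cold/Wind Chill|EXTREME COLD|WIND CHILL|EXTREME WIND CHILL", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Extreme Cold/Wind Chill"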
stormDataNA$EVTYPE_Clean[grepl("Flash Flood", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Flash Flood"
stormDataNA$EVTYPE_Clean[grepl("Flood", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Flood"
stormDataNA$EVTYPE_Clean[grepl("Frost/Freeze|FROST|FREEZE", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Frost/Freeze"
stormDataNA$EVTYPE_Clean[grepl("Funnel Cloud|FUNNEL CLOUD", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Funnel Cloud"
stormDataNA$EVTYPE_Clean[grepl("Freezing Fog|FREEZING FOG", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Freezing Fog"
stormDataNA$EVTYPE_Clean[grepl("Hail|HAIL", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Hail"
stormDataNA$EVTYPE_Clean[grepl("Heat|HEAT", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Heat"
stormDataNA$EVTYPE_Clean[grepl("Heavy Rain|HEAVY RAIN", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)== TRUE] <- "Heavy Rain"
stormDataNA$EVTYPE_Clean[grepl("Heavy Snow|HEAVY SNOW", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Heavy Snow"
stormDataNA$EVTYPE_Clean[grepl("High Surf|HIGH SURF", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "High Surf"
stormDataNA$EVTYPE_Clean[grepl("High Wind|HIGH WIND", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "High Wind"
# Unescaped parentheses act as regex groups, so match on the two keywords instead
stormDataNA$EVTYPE_Clean[grepl("Hurricane|Typhoon", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Hurricane (Typhoon)"
stormDataNA$EVTYPE_Clean[grepl("Ice Storm", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Ice Storm"
stormDataNA$EVTYPE_Clean[grepl("Lake-Effect Snow|LAKE-EFFECT SNOW", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Lake-Effect Snow"
stormDataNA$EVTYPE_Clean[grepl("Lakeshore Flood", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Lakeshore Flood"
stormDataNA$EVTYPE_Clean[grepl("Lightning", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Lightning"
stormDataNA$EVTYPE_Clean[grepl("Marine Hail", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Marine Hail"
stormDataNA$EVTYPE_Clean[grepl("Marine High Wind", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Marine High Wind"
stormDataNA$EVTYPE_Clean[grepl("Marine Strong Wind", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Marine Strong Wind"
stormDataNA$EVTYPE_Clean[grepl("Marine Thunderstorm Wind|MARINE TSTM WIND", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Marine Thunderstorm Wind"
stormDataNA$EVTYPE_Clean[grepl("Rip Current", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Rip Current"
stormDataNA$EVTYPE_Clean[grepl("Seiche", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Seiche"
stormDataNA$EVTYPE_Clean[grepl("Sleet", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Sleet"
stormDataNA$EVTYPE_Clean[grepl("Storm Surge/Tide|STORM SURGE|TIDE|STORM TIDE", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Storm Surge/Tide"
stormDataNA$EVTYPE_Clean[grepl("Strong Wind", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Strong Wind"
stormDataNA$EVTYPE_Clean[grepl("Thunderstorm Wind|THUNDERSTORM WIND|THUNDERSTORM WINDS|THUNDERSTORM WINDSS|TSTM WIND",  
                                stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Thunderstorm Wind"
stormDataNA$EVTYPE_Clean[grepl("Tornado", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Tornado"
stormDataNA$EVTYPE_Clean[grepl("Tropical Depression", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Tropical Depression"
stormDataNA$EVTYPE_Clean[grepl("Tropical Storm", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Tropical Storm"
stormDataNA$EVTYPE_Clean[grepl("Tsunami", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Tsunami"
stormDataNA$EVTYPE_Clean[grepl("Volcanic Ash", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Volcanic Ash"
stormDataNA$EVTYPE_Clean[grepl("Waterspout", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Waterspout"
stormDataNA$EVTYPE_Clean[grepl("Wildfire", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Wildfire"
stormDataNA$EVTYPE_Clean[grepl("Winter Storm", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Winter Storm"
stormDataNA$EVTYPE_Clean[grepl("Winter Weather", 
                               stormDataNA$EVTYPE, ignore.case = TRUE) 
                                & is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Winter Weather"
# Records with unmapped EVTYPE are removed
stormDataNA <- stormDataNA[!is.na(stormDataNA$EVTYPE_Clean),]
dim_stormDataNA <- dim(stormDataNA)
len2 <- length(unique(stormDataNA$EVTYPE_Clean))

Now, the dimensions of the processed dataset = (892768, 10).

After the EVTYPE clean up, there are 44 unique entries in EVTYPE_Clean variable.

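Note that EVTYPE entries are categorised in the order in which the rules appear in the script above: once an entry has been assigned a category, a later rule cannot reassign it. This ordering also explains why only 44 of the 48 defined event types appear. Every entry that could match Extreme Cold/Wind Chill, Lakeshore Flood, Marine Hail or Marine High Wind is claimed first by an earlier, broader rule (such as Cold/Wind Chill, Flood, Hail and High Wind respectively), so those four categories remain empty and their records are counted under the broader category.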
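
The first-match-wins behaviour can be sketched with a short table-driven version of the mapping. This is an illustration only: it reproduces five of the rules in their original relative order, the function name map_evtype is hypothetical, and the raw strings are made-up examples rather than rows from the dataset.

```r
# Ordered rules: names are regular expressions, values are cleaned labels.
# Order matters, because the first matching rule wins.
rules <- c(
  "Flash Flood"     = "Flash Flood",
  "Flood"           = "Flood",
  "Hail"            = "Hail",
  "Lakeshore Flood" = "Lakeshore Flood",
  "Marine Hail"     = "Marine Hail"
)

# Hypothetical helper: apply the rules in order, never overwriting a label.
map_evtype <- function(evtype, rules) {
  clean <- rep(NA_character_, length(evtype))
  for (i in seq_along(rules)) {
    hit <- grepl(names(rules)[i], evtype, ignore.case = TRUE) & is.na(clean)
    clean[hit] <- rules[[i]]
  }
  clean
}

# Made-up raw entries
raw   <- c("FLASH FLOODING", "RIVER FLOOD", "MARINE HAIL", "LAKESHORE FLOOD", "FOG")
clean <- map_evtype(raw, rules)
print(data.frame(raw, clean))
```

In this sketch "MARINE HAIL" is labelled Hail and "LAKESHORE FLOOD" is labelled Flood, because the broader rules come first; placing the more specific rules ahead of the broader ones would change that outcome. Entries that match no rule stay NA.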

Results

Total Harmful for Population Health

The total values of HARMFUL (fatalities + injuries) grouped by event type are calculated. The plot is shown below.

harmfulByEVTYPE <- stormDataNA %>% 
    group_by(EVTYPE_Clean) %>% 
    summarise(total_Harmful = sum(HARMFUL), 
              total_Fatalities = sum(FATALITIES),
              total_Injuries = sum(INJURIES)) %>% 
    top_n(n = 10, wt = total_Harmful) 
ggplot(data=harmfulByEVTYPE, 
       aes(x = reorder(EVTYPE_Clean, -total_Harmful), y = total_Harmful)) + 
       geom_bar(col="white", fill="blue", alpha = 0.85, stat="identity") + 
       labs(x="Weather Event Type", y="Total Harmful") + 
       ggtitle(expression(atop("Total Harmful (Fatalities + Injuries)", 
                          atop(italic("United States: 1950 - 2011"), "")))) + 
       theme(axis.text.x = element_text(angle = 90, hjust = 1))

For reference, the “total fatalities” and “total injuries” are also calculated, grouped by event type. The table is shown below.

dfHealth <- harmfulByEVTYPE %>%
    arrange(desc(total_Harmful)) %>%
    filter((row_number() <= 10)) %>%
    select(EVTYPE_Clean, total_Harmful, total_Fatalities, total_Injuries)
kable(x = dfHealth, row.names = TRUE,
       col.names = c("Weather Event", "Total Harmful", "Total Fatalities", "Total Injuries" ))
     Weather Event        Total Harmful   Total Fatalities   Total Injuries
 1   Tornado                      97043               5636            91407
 2   Thunderstorm Wind            10067                704             9363
 3   Excessive Heat                8445               1920             6525
 4   Flood                         7279                484             6795
 5   Lightning                     6049                817             5232
 6   Heat                          3896               1212             2684
 7   Flash Flood                   2837               1035             1802
 8   Ice Storm                     2079                 89             1990
 9   High Wind                     1815                297             1518
10   Winter Storm                  1554                216             1338

Mean Harmful for Population Health

The mean values of HARMFUL (fatalities + injuries) grouped by event type are calculated, and the result is shown below.

harmfulByEVTYPE <- stormDataNA %>% 
    group_by(EVTYPE_Clean) %>% 
    summarise(mean_Harmful = mean(HARMFUL),
              mean_Fatalities = mean(FATALITIES),
              mean_Injuries = mean(INJURIES)) %>% 
    top_n(n = 10, wt = mean_Harmful) 
ggplot(data=harmfulByEVTYPE, 
       aes(x = reorder(EVTYPE_Clean, -mean_Harmful), y = mean_Harmful)) + 
       geom_bar(col="white", fill="blue", alpha = 0.85, stat="identity") + 
       labs(x="Weather Event Type", y="Mean Harmful") + 
       ggtitle(expression(atop("Mean Harmful (Fatalities + Injuries)", 
                          atop(italic("United States: 1950 - 2011"), "")))) + 
       theme(axis.text.x = element_text(angle = 90, hjust = 1))

dfHealth2 <- harmfulByEVTYPE %>%
    arrange(desc(mean_Harmful)) %>%
    filter((row_number() <= 10)) %>%
    select(EVTYPE_Clean, mean_Harmful, mean_Fatalities, mean_Injuries)
kable(x = dfHealth2, row.names = TRUE, digits = c(0, 3, 3, 3),
       col.names = c("Weather Event", "Mean Harmful", "Mean Fatalities", "Mean Injuries"))
     Weather Event         Mean Harmful   Mean Fatalities   Mean Injuries
 1   Tsunami                      8.100             1.650           6.450
 2   Excessive Heat               5.024             1.142           3.882
 3   Hurricane (Typhoon)          4.919             0.446           4.473
 4   Heat                         4.101             1.276           2.825
 5   Tornado                      1.599             0.093           1.506
 6   Rip Current                  1.423             0.743           0.681
 7   Dust Storm                   1.077             0.051           1.026
 8   Ice Storm                    1.026             0.044           0.982
 9   Avalanche                    1.021             0.579           0.442
10   Marine Strong Wind           0.750             0.292           0.458
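
The two health rankings differ because a total grows with the number of recorded events, whereas a mean does not. The toy example below, with made-up numbers and the hypothetical labels "Frequent" and "Rare", shows how a rarely recorded event type can lead the mean ranking while a frequently recorded one leads the total ranking.

```r
# Made-up records: a frequent event type with low harm per record and a
# rare event type with high harm per record.
toy <- data.frame(
  EVTYPE_Clean = c(rep("Frequent", 6), rep("Rare", 2)),
  HARMFUL      = c(3, 0, 2, 4, 1, 2, 5, 5)
)

# Total and mean of HARMFUL per event type
total_harm <- tapply(toy$HARMFUL, toy$EVTYPE_Clean, sum)
mean_harm  <- tapply(toy$HARMFUL, toy$EVTYPE_Clean, mean)

print(total_harm)  # "Frequent" has the larger total
print(mean_harm)   # "Rare" has the larger mean
```

For this reason the mean ranking is best read together with the number of records behind each event type.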

Total Economic Consequences

The total values of ECOIMPACT (PROPDMG + CROPDMG) grouped by event type are calculated, and the result is shown below.

ecoimpactByEVTYPE <- stormDataNA %>% 
    group_by(EVTYPE_Clean) %>% 
    summarise(total_EcoImpact = sum(ECOIMPACT)/1000000000) %>% 
    top_n(n = 10, wt = total_EcoImpact) 
ggplot(data=ecoimpactByEVTYPE, 
       aes(x = reorder(EVTYPE_Clean, -total_EcoImpact), y = total_EcoImpact)) + 
       geom_bar(col="white", fill="blue", alpha = 0.85, stat="identity") + 
       labs(x="Weather Event Type", y="Total Economic Consequences") + 
       ggtitle(expression(atop("Total Economic Consequences (in US$ Billions)", 
                          atop(italic("United States: 1950 - 2011"), "")))) + 
       theme(axis.text.x = element_text(angle = 90, hjust = 1))

For reference, a table of Total Economic Consequences is also created and shown below.

ecoimpactByEVTYPE <- ecoimpactByEVTYPE %>%
    arrange(desc(total_EcoImpact)) %>%
    filter((row_number() <= 10)) %>%
    select(EVTYPE_Clean, total_EcoImpact)
kable(x = ecoimpactByEVTYPE, row.names = TRUE, digits = c(0, 3),
       col.names = c("Weather Event", "Total Economic Consequences (in US$ Billions)"))
     Weather Event         Total Economic Consequences (in US$ Billions)
 1   Flood                                                       161.053
 2   Hurricane (Typhoon)                                          90.763
 3   Tornado                                                      57.408
 4   Storm Surge/Tide                                             47.975
 5   Hail                                                         20.734
 6   Flash Flood                                                  18.439
 7   Drought                                                      15.019
 8   Thunderstorm Wind                                            10.904
 9   Ice Storm                                                     8.968
10   Tropical Storm                                                8.409

Mean Economic Consequences

The mean values of ECOIMPACT (PROPDMG + CROPDMG) grouped by event type are calculated, and the result is shown below.

ecoimpactByEVTYPE <- stormDataNA %>% 
    group_by(EVTYPE_Clean) %>% 
    summarise(mean_EcoImpact = mean(ECOIMPACT)/1000000) %>% 
    top_n(n = 10, wt = mean_EcoImpact) 
ggplot(data=ecoimpactByEVTYPE, 
       aes(x = reorder(EVTYPE_Clean, -mean_EcoImpact), y = mean_EcoImpact)) + 
       geom_bar(col="white", fill="blue", alpha = 0.85, stat="identity") + 
       labs(x="Weather Event Type", y="Mean Economic Consequences") + 
       ggtitle(expression(atop("Mean Economic Consequences (in US$ Millions)", 
                          atop(italic("United States: 1950 - 2011"), "")))) + 
       theme(axis.text.x = element_text(angle = 90, hjust = 1))

For reference, a table of Mean Economic Consequences is also created and shown below.

ecoimpactByEVTYPE <- ecoimpactByEVTYPE %>%
    arrange(desc(mean_EcoImpact)) %>%
    filter((row_number() <= 10)) %>%
    select(EVTYPE_Clean, mean_EcoImpact)
kable(x = ecoimpactByEVTYPE, row.names = TRUE, digits = c(0, 3),
       col.names = c("Weather Event", "Mean Economic Consequences (in US$ Millions)"))
     Weather Event         Mean Economic Consequences (in US$ Millions)
 1   Hurricane (Typhoon)                                        304.572
 2   Storm Surge/Tide                                            92.975
 3   Tropical Storm                                              12.065
 4   Tsunami                                                      7.204
 5   Flood                                                        6.144
 6   Drought                                                      5.979
 7   Ice Storm                                                    4.424
 8   Wildfire                                                     1.864
 9   Frost/Freeze                                                 1.341
10   Tornado                                                      0.946

Discussion

Most Harmful Event in the State of Utah

stormDataUtah <- stormDataNA[which(stormDataNA$STATE == "UT"), ]
harmfulByEVTYPEUT <- stormDataUtah %>%
    group_by(EVTYPE_Clean) %>% 
    summarise(total_Harmful_UT = sum(HARMFUL)) %>% 
    top_n(n = 10, wt = total_Harmful_UT)
plot1 <- ggplot(data=harmfulByEVTYPEUT, 
                aes(x = reorder(EVTYPE_Clean, -total_Harmful_UT), y = total_Harmful_UT)) + 
                geom_bar(col="white", fill="blue", alpha = 0.85, stat="identity") + 
                labs(x="Weather Event Type", y="Total Harmful in Utah") + 
                ggtitle(expression(atop("Total Harmful (Fatalities + Injuries)", 
                                        atop(italic("Utah: 1950 - 2011"), "")))) + 
                theme(axis.text.x = element_text(angle = 90, hjust = 1))
harmfulByEVTYPEUT <- stormDataUtah %>%
    group_by(EVTYPE_Clean) %>% 
    summarise(mean_Harmful_UT = mean(HARMFUL)) %>% 
    top_n(n = 10, wt = mean_Harmful_UT)
plot2 <- ggplot(data=harmfulByEVTYPEUT, 
                aes(x = reorder(EVTYPE_Clean, -mean_Harmful_UT), y = mean_Harmful_UT)) + 
                geom_bar(col="white", fill="blue", alpha = 0.85, stat="identity") + 
                labs(x="Weather Event Type", y="Mean Harmful in Utah") + 
                ggtitle(expression(atop("Mean Harmful (Fatalities + Injuries)", 
                                        atop(italic("Utah: 1950 - 2011"), "")))) + 
                theme(axis.text.x = element_text(angle = 90, hjust = 1))
grid.arrange(plot1, plot2, ncol = 2, nrow = 1)

Events with the Greatest Economic Consequences in the State of Utah

ecoimpactByEVTYPEUT <- stormDataUtah %>%
    group_by(EVTYPE_Clean) %>% 
    summarise(total_EcoImpact_UT = sum(ECOIMPACT)/1000000) %>% 
    top_n(n = 10, wt = total_EcoImpact_UT)
plot3 <- ggplot(data=ecoimpactByEVTYPEUT, 
                aes(x = reorder(EVTYPE_Clean, -total_EcoImpact_UT), y = total_EcoImpact_UT)) + 
                geom_bar(col="white", fill="blue", alpha = 0.85, stat="identity") + 
                labs(x="Weather Event Type", y="Total Economic Consequences") + 
                ggtitle(expression(atop("Total Eco Consequences in US$M", 
                                        atop(italic("Utah: 1950 - 2011"), "")))) + 
                theme(axis.text.x = element_text(angle = 90, hjust = 1))
ecoimpactByEVTYPEUT <- stormDataUtah %>%
    group_by(EVTYPE_Clean) %>% 
    summarise(mean_EcoImpact_UT = mean(ECOIMPACT)/1000000) %>% 
    top_n(n = 10, wt = mean_EcoImpact_UT)
plot4 <- ggplot(data=ecoimpactByEVTYPEUT, 
                aes(x = reorder(EVTYPE_Clean, -mean_EcoImpact_UT), y = mean_EcoImpact_UT)) + 
                geom_bar(col="white", fill="blue", alpha = 0.85, stat="identity") + 
                labs(x="Weather Event Type", y="Mean Economic Consequences") + 
                ggtitle(expression(atop("Mean Eco Consequences in US$M", 
                                    atop(italic("Utah: 1950 - 2011"), "")))) + 
                theme(axis.text.x = element_text(angle = 90, hjust = 1))
grid.arrange(plot3, plot4, ncol = 2, nrow = 1)

Harmful Impact and Economic Consequences by Year in the State of Utah

Because the dataset contains no records for Utah from 1950 to 1979, the following charts show only the years 1980 to 2011.

healthUT <- stormDataUtah %>%
    group_by(YEAR_EVENT) %>% 
    summarise(total_health_UT = sum(HARMFUL)) %>%
    filter(YEAR_EVENT >= 1980)
plot5 <- ggplot(data=healthUT, 
                aes(x = YEAR_EVENT, y = total_health_UT)) + 
                geom_bar(col="white", fill="blue", alpha = 0.85, stat="identity") + 
                labs(x="Year", y="Total Harmful") + 
                ggtitle(expression(atop("Total Harmful (Fatalities + Injuries)", 
                                        atop(italic("Utah: 1980 - 2011"), "")))) + 
                theme(axis.text.x = element_text(angle = 90, hjust = 1))
ecoimpactUT <- stormDataUtah %>%
    group_by(YEAR_EVENT) %>% 
    summarise(total_ecoimpact_UT = sum(ECOIMPACT)/1000000) %>%
    filter(YEAR_EVENT >= 1980)
plot6 <- ggplot(data=ecoimpactUT, 
                aes(x = YEAR_EVENT, y = total_ecoimpact_UT)) + 
                geom_bar(col="white", fill="blue", alpha = 0.85, stat="identity") + 
                labs(x="Year", y="Total Economic Consequences") + 
                ggtitle(expression(atop("Total Eco Consequences in US$M", 
                                    atop(italic("Utah: 1980 - 2011"), "")))) + 
                theme(axis.text.x = element_text(angle = 90, hjust = 1))
grid.arrange(plot5, plot6, ncol = 1, nrow = 2)
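
The claim that a state has no records before a given year can be checked by computing the first and last year with data for each state. The sketch below does this on made-up rows; the same tapply() calls could be applied to the STATE and YEAR_EVENT columns of stormDataNA, but the values shown here are not taken from the dataset.

```r
# Made-up rows with the same column names as the processed dataset
toy <- data.frame(
  STATE      = c("UT", "UT", "UT", "TX", "TX"),
  YEAR_EVENT = c(1983, 1995, 2011, 1950, 2011)
)

# First and last year with at least one record, per state
first_year <- tapply(toy$YEAR_EVENT, toy$STATE, min)
last_year  <- tapply(toy$YEAR_EVENT, toy$STATE, max)

print(first_year)
print(last_year)
```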


This document has been published on Rpubs.com.