Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The first question to be address is what events are most harmful with respect to population health. Therefore, the number of deaths and injuries that have occurred are analyzed. The second question is what events have the greatest economic consequences, so property and crop damages are analyzed.

From the analysis of the NOOA Storm Database, we found that on average across the United States, tornados are the most harmful event to population health both for fatalities and injuries. Floods cause the most property damage and droughts cause the most crop damage.


Data Processing

System Information

The analysis of the NOAA storm database was performed on the following system:

sessionInfo()
## R version 3.2.0 (2015-04-16)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] magrittr_1.5    formatR_1.2     tools_3.2.0     htmltools_0.2.6
##  [5] yaml_2.1.13     stringi_0.4-1   rmarkdown_0.8   knitr_1.11     
##  [9] stringr_1.0.0   digest_0.6.8    evaluate_0.7.2

R Packages

The following packages were used for the analysis:

library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.2.1
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 3.2.1
## Loading required package: grid

Getting and Cleaning Data

The data for this study come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. It can be downloaded using this link. The events in the database start in the year 1950 and end in November 2011

The following code can be used to download the data for further processing.

setwd("~/Coursera/Data Science/5_Reproducible Research/StormData")
if(!file.exists("StormData.csv.bz2")){
   download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2","StormData.csv.bz2")
}
data<-read.csv(bzfile("StormData.csv.bz2"))

The raw data consists of 37 variables. For this analysis, the following variables: EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP were used.

dataTB<-tbl_df(data) %>% 
        select(c(8,23,24,25,26,27,28)) 
head(dataTB)
## Source: local data frame [6 x 7]
## 
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0          K       0           
## 2 TORNADO          0        0     2.5          K       0           
## 3 TORNADO          0        2    25.0          K       0           
## 4 TORNADO          0        2     2.5          K       0           
## 5 TORNADO          0        2     2.5          K       0           
## 6 TORNADO          0        6     2.5          K       0

The basic goal of this study is to explore the NOAA Storm Database and answer some basic questions about severe weather events. The first question to be address is what events are most harmful with respect to population health. Therefore, the number of deaths and injuries that have occurred are analyzed. The second question is what events have the greatest economic consequences. So property and crop damages are analyzed. In this sense, four dataset are generated to analyze the impact of weather events on:

  1. Fatalities
  2. Injuries
  3. Property damage
  4. Crop damage

1. Fatalities

First the relevant data associated with Fatalities (Fatalities>0) is selected. This data is organized by weather event and then the total number of deaths is computed. A look-up table (LUT) is created to map the event types with the official forty-eight weather event types. After this cleaning step, the data is again organized by event and the total number of fatalities is recalculated.

LUT <- c("AVALANCE"   = "AVALANCHE",
"AVALANCHE"  = "AVALANCHE",
"BLACK ICE"  = "SLEET",                
"BLIZZARD"   = "BLIZZARD",
"BLOWING SNOW" = "HEAVY SNOW",
"COASTAL FLOOD" = "COASTAL FLOOD",               
"COASTAL FLOODING" = "COASTAL FLOOD",
"COASTAL STORM" = "STORM",
"COASTALSTORM"  = "STORM",
"COLD" = "COLD/WIND CHILL",
"COLD AND SNOW" = "COLD/WIND CHILL",
"COLD TEMPERATURE" = "COLD/WIND CHILL",
"COLD WAVE" = "COLD/WIND CHILL",
"COLD WEATHER" = "COLD/WIND CHILL",
"COLD/WIND CHILL" = "COLD/WIND CHILL",
"COLD/WINDS" = "COLD/WIND CHILL",
"DENSE FOG" = "DENSE FOG",
"DROUGHT/EXCESSIVE HEAT" = "EXCESSIVE HEAT",
"DROWNING" = "DROUGHT",
"DRY MICROBURST" = "",
"DUST DEVIL" = "DUST DEVIL",
"DUST STORM" = "DUST STORM",
"EXCESSIVE HEAT" = "EXCESSIVE HEAT",
"EXCESSIVE RAINFALL" = "HEAVY RAIN",
"EXTENDED COLD" = "COLD/WIND CHILL",
"EXTREME COLD" = "EXTREME COLD/WIND CHILL",
"EXTREME COLD/WIND CHILL" = "EXTREME COLD/WIND CHILL",
"EXTREME HEAT" = "EXCESSIVE HEAT",
"EXTREME WINDCHILL" = "EXTREME COLD/WIND CHILL",
"FALLING SNOW/ICE" = "ICE STORM",
"FLASH FLOOD" = "FLASH FLOOD",
"FLASH FLOOD/FLOOD" = "FLASH FLOOD",
"FLASH FLOODING" = "FLASH FLOOD",
"FLASH FLOODING/FLOOD" = "FLASH FLOOD",
"FLASH FLOODS" = "FLASH FLOOD",
"FLOOD" = "FLOOD",
"FLOOD & HEAVY RAIN" = "FLOOD",
"FLOOD/FLASH FLOOD" = "FLOOD",
"FLOOD/RIVER FLOOD" = "FLOOD",
"FLOODING" = "FLOOD",
"FOG" = "DENSE FOG",
"FOG AND COLD TEMPERATURES" = "FREEZING FOG",
"FREEZE" = "FROST/FREEZE",
"FREEZING DRIZZLE" = "FROST/FREEZE",
"FREEZING RAIN" = "FROST/FREEZE",
"FREEZING RAIN/SNOW" = "FROST/FREEZE",
"FREEZING SPRAY" = "FROST/FREEZE",
"FROST" = "FROST/FREEZE",
"GLAZE" = "FROST/FREEZE",
"GUSTY WIND"  = "HIGH WIND",
"GUSTY WINDS" = "HIGH WIND",
"HAIL" = "HAIL",
"HEAT" = "HEAT",
"HEAT WAVE" = "HEAT",
"HEAT WAVE DROUGHT" = "DROUGHT",
"HEAT WAVES" = "HEAT",
"HEAVY RAIN" = "HEAVY RAIN", 
"HEAVY SEAS" = "HIGH SURF",
"HEAVY SNOW" = "HEAVY SNOW",
"HEAVY SNOW AND HIGH WINDS" = "HEAVY SNOW",
"HEAVY SURF" = "HIGH SURF",
"HEAVY SURF AND WIND" = "HIGH SURF",
"HEAVY SURF/HIGH SURF" = "HIGH SURF",
"HIGH SEAS" = "HIGH SURF",
"HIGH SURF" = "HIGH SURF",
"HIGH SWELLS" = "HIGH SURF",
"HIGH WATER" = "HIGH SURF",
"HIGH WAVES" = "HIGH SURF",
"HIGH WIND" = "HIGH WIND",
"HIGH WIND AND SEAS" = "HIGH WIND",
"HIGH WIND/SEAS" = "HIGH WIND",
"HIGH WINDS" = "HIGH WIND",
"HIGH WINDS/SNOW" = "HEAVY SNOW",
"HURRICANE" = "HURRICANE/TYPHOON",
"HURRICANE ERIN" = "HURRICANE/TYPHOON",
"HURRICANE FELIX" = "HURRICANE/TYPHOON",
"HURRICANE OPAL" = "HURRICANE/TYPHOON",
"HURRICANE OPAL/HIGH WINDS" = "HURRICANE/TYPHOON",
"HURRICANE/TYPHOON" = "HURRICANE/TYPHOON",
"HYPERTHERMIA/EXPOSURE" = "",
"HYPOTHERMIA" = "",
"HYPOTHERMIA/EXPOSURE" = "",
"ICE" = "ICE STORM",
"ICE ON ROAD" = "SLEET",
"ICE STORM" = "ICE STORM",
"ICY ROADS" = "SLEET",
"LANDSLIDE" = "DEBRIS FLOW",
"LANDSLIDES" = "DEBRIS FLOW",
"LIGHT SNOW" = "LIGHTNING",
"LIGHTNING" = "LIGHTNING",
"LIGHTNING." = "LIGHTNING",
"LOW TEMPERATURE" = "COLD/WIND CHILL",
"MARINE ACCIDENT" = "",
"MARINE HIGH WIND" = "MARINE HIGH WIND",
"MARINE MISHAP" = "MARINE HIGH WIND",
"MARINE STRONG WIND" = "MARINE HIGH WIND",
"MARINE THUNDERSTORM WIND" = "MARINE THUNDERSTORM WIND",
"MARINE TSTM WIND" = "MARINE THUNDERSTORM WIND",
"MINOR FLOODING" = "FLASH FLOOD",
"MIXED PRECIP" = "",
"MUDSLIDE" = "DEBRIS FLOW",
"MUDSLIDES" = "DEBRIS FLOW",
"RAIN/SNOW" = "HEAVY RAIN",
"RAIN/WIND" = "HEAVY RAIN" ,
"RAPIDLY RISING WATER" = "FLASH FLOOD",
"RECORD COLD" = "EXTREME COLD/WIND CHILL",
"RECORD HEAT" = "EXCESSIVE HEAT",
"RECORD/EXCESSIVE HEAT" = "EXCESSIVE HEAT",
"RIP CURRENT" = "RIP CURRENT",
"RIP CURRENTS" = "RIP CURRENT",
"RIP CURRENTS/HEAVY SURF" = "RIP CURRENT",
"RIVER FLOOD" = "FLOOD",
"RIVER FLOODING" = "FLOOD",
"ROUGH SEAS" = "HIGH SURF",
"ROUGH SURF" = "HIGH SURF",
"SLEET" = "SLEET",
"SNOW" = "HEAVY SNOW",
"SNOW AND ICE" = "FROST/FREEZE",
"SNOW SQUALL" = "HEAVY SNOW",
"SNOW SQUALLS" = "HEAVY SNOW",
"SNOW/ BITTER COLD" = "",
"STORM SURGE" = "STORM TIDE",
"STORM SURGE/TIDE" = "STORM TIDE",
"STRONG WIND" = "STRONG WIND",
"STRONG WINDS" = "STRONG WIND",
"THUNDERSNOW" = "THUNDERSTORM WIND",
"THUNDERSTORM" = "THUNDERSTORM WIND",
"THUNDERSTORM WIND" = "THUNDERSTORM WIND",
"THUNDERSTORM WIND (G40)" = "THUNDERSTORM WIND",
"THUNDERSTORM WIND G52" = "THUNDERSTORM WIND",
"THUNDERSTORM WINDS" = "THUNDERSTORM WIND",
"THUNDERTORM WINDS" = "THUNDERSTORM WIND",
"TORNADO" = "TORNADO",
"TORNADOES, TSTM WIND, HAIL" = "TORNADO",
"TROPICAL STORM" = "TROPICAL STORM",
"TROPICAL STORM GORDON" = "TROPICAL STORM",
"TSTM WIND" = "THUNDERSTORM WIND",
"TSTM WIND (G35)" = "THUNDERSTORM WIND",
"TSTM WIND/HAIL" = "THUNDERSTORM WIND",
"TSUNAMI" = "TSUNAMI",
"UNSEASONABLY COLD" = "",
"UNSEASONABLY WARM" = "",
"UNSEASONABLY WARM AND DRY" = "",
"URBAN AND SMALL STREAM FLOODIN" = "FLOOD",
"URBAN/SML STREAM FLD" = "FLOOD",
"WATERSPOUT" = "WATERSPOUT",
"WATERSPOUT/TORNADO" = "WATERSPOUT",
"WHIRLWIND" = "TORNADO",
"WILD FIRES" = "WILDFIRE",
"WILD/FOREST FIRE" = "WILDFIRE",
"WILDFIRE" = "WILDFIRE",
"WIND" = "HIGH WIND",
"WIND STORM" = "HIGH WIND",
"WINDS" = "HIGH WIND",
"WINTER STORM" = "WINTER STORM",
"WINTER STORM HIGH WINDS" = "WINTER STORM",
"WINTER STORMS" = "WINTER STORM",
"WINTER WEATHER" = "WINTER WEATHER", 
"WINTER WEATHER/MIX" = "WINTER WEATHER",
"WINTRY MIX" = "WINTER WEATHER")
Fatalities <- filter(dataTB, FATALITIES > 0)                                 # data selected
Fatalities$EVTYPE <- toupper(Fatalities$EVTYPE)
Fatalities <- group_by(Fatalities, EVTYPE) %>% summarize(sum(FATALITIES))    # Total fatalities by event 
names(Fatalities)<- c("EVTYPE","FATALITIES")
Fatalities$EVTYPE <- LUT[Fatalities$EVTYPE]                                  # Lookup table
Fatalities <- group_by(Fatalities, EVTYPE) %>% summarize(sum(FATALITIES))    # Total Fatalities is recalculated
names(Fatalities)<- c("EVTYPE","FATALITIES")
Fatalities <- arrange(Fatalities, desc(FATALITIES))

head(Fatalities,n=15)
## Source: local data frame [15 x 2]
## 
##                     EVTYPE FATALITIES
## 1                  TORNADO       5659
## 2           EXCESSIVE HEAT       2020
## 3                     HEAT       1114
## 4              FLASH FLOOD       1020
## 5                LIGHTNING        818
## 6        THUNDERSTORM WIND        712
## 7              RIP CURRENT        577
## 8                    FLOOD        528
## 9                HIGH WIND        320
## 10 EXTREME COLD/WIND CHILL        305
## 11               AVALANCHE        225
## 12            WINTER STORM        217
## 13               HIGH SURF        182
## 14         COLD/WIND CHILL        166
## 15              HEAVY SNOW        142

2. Injuries

A similar approach is used in the case of Injuries.

Injuries <- filter(dataTB, INJURIES > 0)                                   # data selected
Injuries$EVTYPE <- toupper(Injuries$EVTYPE)
Injuries <- group_by(Injuries, EVTYPE) %>% summarize(sum(INJURIES))        # Total injuries by event
names(Injuries)<- c("EVTYPE","INJURIES")
Injuries$EVTYPE <- LUT[Injuries$EVTYPE]                                    # lookup table
Injuries <- group_by(Injuries, EVTYPE) %>% summarize(sum(INJURIES))
names(Injuries)<- c("EVTYPE","INJURIES")
Injuries <- arrange(Injuries, desc(INJURIES))                              # Total injuries is recalculated

head(Injuries,n=15)
## Source: local data frame [15 x 2]
## 
##               EVTYPE INJURIES
## 1            TORNADO    91346
## 2  THUNDERSTORM WIND     9461
## 3              FLOOD     6888
## 4     EXCESSIVE HEAT     6730
## 5          LIGHTNING     5232
## 6               HEAT     2479
## 7          ICE STORM     2113
## 8        FLASH FLOOD     1785
## 9           WILDFIRE     1606
## 10         HIGH WIND     1558
## 11              HAIL     1361
## 12      WINTER STORM     1353
## 13 HURRICANE/TYPHOON     1323
## 14        HEAVY SNOW     1107
## 15         DENSE FOG     1076

3. Property Damage

In order to calculate property damage caused by weather events, the following variables are used: EVTYPE, PROPDMG and PROPDMGEXP. PROPDMG represents the amount of money in property damage and PROPDMGEXP represents the units. For example a value of 3 or K in PROPDMGEXP represents 1000. PROPDMGEXP needs some cleaning before its use to calculate the total amount of property damage.

First step, select records that contain property damage

Property <- filter(dataTB, dataTB$PROPDMG>0) 
Property <- tbl_df(Property)
Property$EVTYPE <-toupper(Property$EVTYPE)

Second step, clean PROPDMGEXP and calculate the amount of property damage in USD

valor <- c("H","K","M","B","0","1","2","3","4","5","6","7","8","-","+","","?")
multiplicador <- c(100,1000,1000000,1000000000,1,10,100,1000,1000,100000,1000000,10000000,100000000,0,0,1,0)

Property$PROPDMGEXP <- toupper(Property$PROPDMGEXP)
unique(Property$PROPDMGEXP)
##  [1] "K" "M" "B" "+" "0" ""  "5" "6" "4" "H" "2" "7" "3" "-"
Property <- mutate(Property, PROPVAL = (PROPDMG * multiplicador[match(PROPDMGEXP,valor)]))

Third step, summarise property damage by events

DamageProperty <- group_by(Property, EVTYPE) %>% summarize(sum(PROPVAL))
names(DamageProperty)<- c("EVTYPE","DAMAGE")

Four step, cleaning events: For property and crop damage, the lookup table is not used because the size of this table is bigger than in the case of fatalities and injuries. A manual cleaning of the events type is performed.

DamageProperty$EVTYPE[grep("FLASH FLOOD", DamageProperty$EVTYPE)] <- "FLASH FLOOD"
DamageProperty$EVTYPE[grep("MARINE TSTM WIND", DamageProperty$EVTYPE)] <- "MARINE THUNDERSTORM WIND"
DamageProperty$EVTYPE[grep("NON-TSTM WIND", DamageProperty$EVTYPE)] <- "HIGH WIND"
DamageProperty$EVTYPE[grep("TSTM WIND", DamageProperty$EVTYPE)] <- "THUNDERSTORM WIND"
DamageProperty$EVTYPE[grep("MARINE THUNDERSTORM WIND", DamageProperty$EVTYPE)] <- "MARINE TSTM WIND"
DamageProperty$EVTYPE[grep("THUNDERSTORM WIND", DamageProperty$EVTYPE)] <- "THUNDERSTORM WIND"
DamageProperty$EVTYPE[grep("MARINE TSTM WIND", DamageProperty$EVTYPE)] <- "MARINE THUNDERSTORM WIND"
DamageProperty$EVTYPE[grep("HIGH SURF", DamageProperty$EVTYPE)] <- "HIGH SURF"
DamageProperty$EVTYPE[grep("ASTRONOMICAL", DamageProperty$EVTYPE)] <- "ASTRONOMICAL LOW TIDE"
DamageProperty$EVTYPE[grep("AVALAN", DamageProperty$EVTYPE)] <- "AVALANCHE"
DamageProperty$EVTYPE[grep("BLIZZARD", DamageProperty$EVTYPE)] <- "BLIZZARD"
DamageProperty$EVTYPE[grep("COASTAL FLOODING", DamageProperty$EVTYPE)] <- "COASTAL FLOOD" 
DamageProperty$EVTYPE[grep("COASTAL  FLOODING", DamageProperty$EVTYPE)] <- "COASTAL FLOOD" 
DamageProperty$EVTYPE[grep("RIVER FLOOD", DamageProperty$EVTYPE)] <- "FLOOD"
DamageProperty$EVTYPE[grep("URBAN FLOOD", DamageProperty$EVTYPE)] <- "FLOOD" 
DamageProperty$EVTYPE[grep("MAJOR FLOOD", DamageProperty$EVTYPE)] <- "FLOOD"
DamageProperty$EVTYPE[grep("CSTL", DamageProperty$EVTYPE)] <- "COASTAL FLOOD"
DamageProperty$EVTYPE[grep("FLASH", DamageProperty$EVTYPE)] <- "FLASH FLOOD"
DamageProperty$EVTYPE[grep("FLOODING", DamageProperty$EVTYPE)] <- "FLOOD"
DamageProperty$EVTYPE[grep("FLOODS", DamageProperty$EVTYPE)] <- "FLOOD"
DamageProperty$EVTYPE[grep("URBAN", DamageProperty$EVTYPE)] <- "FLOOD"
DamageProperty$EVTYPE[grep("COASTAL FLOOD", DamageProperty$EVTYPE)] <- "COASTAL FLOOD"
DamageProperty$EVTYPE[grep("FLOOD &", DamageProperty$EVTYPE)] <- "FLOOD"
DamageProperty$EVTYPE[grep("HEAVY SNOW", DamageProperty$EVTYPE)] <- "HEAVY SNOW"
DamageProperty$EVTYPE[grep("STREAM FLOOD", DamageProperty$EVTYPE)] <- "LAKESHORE FLOOD"
DamageProperty$EVTYPE[grep("RURAL FLOOD", DamageProperty$EVTYPE)] <- "FLOOD" 
DamageProperty$EVTYPE[grep("LAKE FLOOD", DamageProperty$EVTYPE)] <- "LAKESHORE FLOOD" 
DamageProperty$EVTYPE[grep("HEAVY RAIN", DamageProperty$EVTYPE)] <- "HEAVY RAIN"
DamageProperty$EVTYPE[grep("ICE JAM FLOOD", DamageProperty$EVTYPE)] <- "FLOOD"
DamageProperty$EVTYPE[grep("MARINE HAIL", DamageProperty$EVTYPE)] <- "Hail"
DamageProperty$EVTYPE[grep("HAIL", DamageProperty$EVTYPE)] <- "HAIL"
DamageProperty$EVTYPE[grep("Hail", DamageProperty$EVTYPE)] <- "MARINE HAIL"
DamageProperty$EVTYPE[grep("HURRI", DamageProperty$EVTYPE)] <- "HURRICANE/TYPHOON"
DamageProperty$EVTYPE[grep("DUST DEVIL", DamageProperty$EVTYPE)] <- "DUST DEVIL"
DamageProperty$EVTYPE[grep("WATERSP", DamageProperty$EVTYPE)] <- "WATERSPOUT" 
DamageProperty$EVTYPE[grep("TORN", DamageProperty$EVTYPE)] <- "TORNADO"
DamageProperty$EVTYPE[grep("STORM SURGE", DamageProperty$EVTYPE)] <- "STORM TIDE"
DamageProperty$EVTYPE[grep("TROPICAL", DamageProperty$EVTYPE)] <- "TROPICAL"
DamageProperty$EVTYPE[grep("WINTER STORM", DamageProperty$EVTYPE)] <- "WINTER STORM" 
DamageProperty$EVTYPE[grep("WINTER WEATHER", DamageProperty$EVTYPE)] <- "WINTER WEATHER" 
DamageProperty$EVTYPE[grep("THUNDERSTORMS WINDS", DamageProperty$EVTYPE)] <- "THUNDERSTORM WIND"
DamageProperty$EVTYPE[grep("THUNER", DamageProperty$EVTYPE)] <- "THUNDERSTORM WIND"
DamageProperty$EVTYPE[grep("THUNDERSTORMS", DamageProperty$EVTYPE)] <- "THUNDERSTORM WIND"
DamageProperty$EVTYPE[grep("THUNDERSTORMWINDS", DamageProperty$EVTYPE)] <- "THUNDERSTORM WIND" 
DamageProperty$EVTYPE[grep("THUNDERESTORM WINDS", DamageProperty$EVTYPE)] <- "THUNDERSTORM WIND" 
DamageProperty$EVTYPE[grep("THUDERSTORM WINDS", DamageProperty$EVTYPE)] <- "THUNDERSTORM WIND"
DamageProperty$EVTYPE[grep("THUNDERTORM WINDS", DamageProperty$EVTYPE)] <- "THUNDERSTORM WIND" 
DamageProperty$EVTYPE[grep("THUNDERSTORM  WINDS", DamageProperty$EVTYPE)] <- "THUNDERSTORM WIND" 
DamageProperty$EVTYPE[grep("THUNDEERSTORM WINDS", DamageProperty$EVTYPE)] <- "THUNDERSTORM WIND"
DamageProperty$EVTYPE[grep("TUNDERSTORM WIND", DamageProperty$EVTYPE)] <- "THUNDERSTORM WIND"
DamageProperty$EVTYPE[grep("DUST STORM", DamageProperty$EVTYPE)] <- "DUST STORM"
DamageProperty$EVTYPE[grep("EXTREME WIND CHILL", DamageProperty$EVTYPE)] <- "EXTREME COLD/WIND CHILL" 
DamageProperty$EVTYPE[grep("EXTREME WINDCHILL", DamageProperty$EVTYPE)] <- "EXTREME COLD/WIND CHILL" 
DamageProperty$EVTYPE[grep("GUSTY WIND", DamageProperty$EVTYPE)]  <- "HIGH WIND"
DamageProperty$EVTYPE[grep("WILD", DamageProperty$EVTYPE)] <- "WILDFIRE"
DamageProperty$EVTYPE[grep("DROUGHT", DamageProperty$EVTYPE)] <- "DROUGHT"
DamageProperty$EVTYPE[grep("HEAVY SHOWER", DamageProperty$EVTYPE)] <- "HEAVY RAIN"
DamageProperty$EVTYPE[grep("HEAVY LAKE SNOW", DamageProperty$EVTYPE)] <- "HEAVY SNOW"
DamageProperty$EVTYPE[grep("HEAVY PRECIPITATION", DamageProperty$EVTYPE)] <- "HEAVY RAIN"
DamageProperty$EVTYPE[grep("SEVERE THUND", DamageProperty$EVTYPE)] <- "THUNDERSTORM WIND"
DamageProperty$EVTYPE[grep("HIGH WIND", DamageProperty$EVTYPE)] <- "HIGH WIND"

Fifth step: summarize property damage by events

DamageProperty <- group_by(DamageProperty, EVTYPE) %>% summarize(sum(DAMAGE))
names(DamageProperty)<- c("EVTYPE","DAMAGE")
DamageProperty <- arrange(DamageProperty, desc(DAMAGE))

head(DamageProperty,n=20)
## Source: local data frame [20 x 2]
## 
##               EVTYPE       DAMAGE
## 1              FLOOD 150221016457
## 2  HURRICANE/TYPHOON  84756180010
## 3            TORNADO  56952152367
## 4         STORM TIDE  47964724000
## 5        FLASH FLOOD  17589772496
## 6               HAIL  15977565513
## 7  THUNDERSTORM WIND  12777900750
## 8           WILDFIRE   8491563500
## 9           TROPICAL   7716127550
## 10      WINTER STORM   6748997251
## 11         HIGH WIND   5993830003
## 12         ICE STORM   3944927860
## 13        HEAVY RAIN   3220671640
## 14           DROUGHT   1046306000
## 15        HEAVY SNOW    949197150
## 16         LIGHTNING    930379430
## 17          BLIZZARD    659913950
## 18           TYPHOON    600230000
## 19     COASTAL FLOOD    448816060
## 20         LANDSLIDE    324596000

4. Crop Damage

In order to calculate crop damage caused by weather events, we use the following variables from the dataset: EVTYPE, CROPDMG and CROPDMGEXP. CROPDMG represents the amount of money in crop damage and CROPDMGEXP represents the units. For example a value of 3 or K in CROPDMGEXP represents 1000. CROPDMGEXP needs some cleaning before its use to calculate the total amount of crop damage.

First step, select records that contain crop damage

Crop <- filter(dataTB, dataTB$CROPDMG>0) 
Crop <- tbl_df(Crop)
Crop$EVTYPE <-toupper(Crop$EVTYPE)

Second step, clean CROPDMGEXP and calculate the amount of crop damage in USD

Crop$CROPDMGEXP <- toupper(Crop$CROPDMGEXP)
unique(Crop$CROPDMGEXP)
## [1] "M" "K" "B" "0" ""
Crop <- mutate(Crop, CROPVAL = (CROPDMG * multiplicador[match(CROPDMGEXP,valor)]))

Third step, summarise property damage by events

DamageCrop <- group_by(Crop, EVTYPE) %>% summarize(sum(CROPVAL))
names(DamageCrop)<- c("EVTYPE","DAMAGE")

Four step, cleaning events:

DamageCrop$EVTYPE[grep("FLASH FLOOD", DamageCrop$EVTYPE)] <- "FLASH FLOOD"
DamageCrop$EVTYPE[grep("MARINE TSTM WIND", DamageCrop$EVTYPE)] <- "MARINE THUNDERSTORM WIND"
DamageCrop$EVTYPE[grep("NON-TSTM WIND", DamageCrop$EVTYPE)] <- "HIGH WIND"
DamageCrop$EVTYPE[grep("TSTM WIND", DamageCrop$EVTYPE)] <- "THUNDERSTORM WIND"
DamageCrop$EVTYPE[grep("MARINE THUNDERSTORM WIND", DamageCrop$EVTYPE)] <- "MARINE TSTM WIND"
DamageCrop$EVTYPE[grep("THUNDERSTORM WIND", DamageCrop$EVTYPE)] <- "THUNDERSTORM WIND"
DamageCrop$EVTYPE[grep("MARINE TSTM WIND", DamageCrop$EVTYPE)] <- "MARINE THUNDERSTORM WIND"
DamageCrop$EVTYPE[grep("HIGH SURF", DamageCrop$EVTYPE)] <- "HIGH SURF"
DamageCrop$EVTYPE[grep("ASTRONOMICAL", DamageCrop$EVTYPE)] <- "ASTRONOMICAL LOW TIDE"
DamageCrop$EVTYPE[grep("AVALAN", DamageCrop$EVTYPE)] <- "AVALANCHE"
DamageCrop$EVTYPE[grep("BLIZZARD", DamageCrop$EVTYPE)] <- "BLIZZARD"
DamageCrop$EVTYPE[grep("COASTAL FLOODING", DamageCrop$EVTYPE)] <- "COASTAL FLOOD" 
DamageCrop$EVTYPE[grep("COASTAL  FLOODING", DamageCrop$EVTYPE)] <- "COASTAL FLOOD" 
DamageCrop$EVTYPE[grep("RIVER FLOOD", DamageCrop$EVTYPE)] <- "FLOOD"
DamageCrop$EVTYPE[grep("URBAN FLOOD", DamageCrop$EVTYPE)] <- "FLOOD" 
DamageCrop$EVTYPE[grep("MAJOR FLOOD", DamageCrop$EVTYPE)] <- "FLOOD"
DamageCrop$EVTYPE[grep("CSTL", DamageCrop$EVTYPE)] <- "COASTAL FLOOD"
DamageCrop$EVTYPE[grep("FLASH", DamageCrop$EVTYPE)] <- "FLASH FLOOD"
DamageCrop$EVTYPE[grep("FLOODING", DamageCrop$EVTYPE)] <- "FLOOD"
DamageCrop$EVTYPE[grep("FLOODS", DamageCrop$EVTYPE)] <- "FLOOD"
DamageCrop$EVTYPE[grep("URBAN", DamageCrop$EVTYPE)] <- "FLOOD"
DamageCrop$EVTYPE[grep("COASTAL FLOOD", DamageCrop$EVTYPE)] <- "COASTAL FLOOD"
DamageCrop$EVTYPE[grep("FLOOD &", DamageCrop$EVTYPE)] <- "FLOOD"
DamageCrop$EVTYPE[grep("HEAVY SNOW", DamageCrop$EVTYPE)] <- "HEAVY SNOW"
DamageCrop$EVTYPE[grep("STREAM FLOOD", DamageCrop$EVTYPE)] <- "LAKESHORE FLOOD"
DamageCrop$EVTYPE[grep("RURAL FLOOD", DamageCrop$EVTYPE)] <- "FLOOD" 
DamageCrop$EVTYPE[grep("LAKE FLOOD", DamageCrop$EVTYPE)] <- "LAKESHORE FLOOD" 
DamageCrop$EVTYPE[grep("HEAVY RAIN", DamageCrop$EVTYPE)] <- "HEAVY RAIN"
DamageCrop$EVTYPE[grep("ICE JAM FLOOD", DamageCrop$EVTYPE)] <- "FLOOD"
DamageCrop$EVTYPE[grep("MARINE HAIL", DamageCrop$EVTYPE)] <- "Hail"
DamageCrop$EVTYPE[grep("HAIL", DamageCrop$EVTYPE)] <- "HAIL"
DamageCrop$EVTYPE[grep("Hail", DamageCrop$EVTYPE)] <- "MARINE HAIL"
DamageCrop$EVTYPE[grep("HURRI", DamageCrop$EVTYPE)] <- "HURRICANE/TYPHOON"
DamageCrop$EVTYPE[grep("DUST DEVIL", DamageCrop$EVTYPE)] <- "DUST DEVIL"
DamageCrop$EVTYPE[grep("WATERSP", DamageCrop$EVTYPE)] <- "WATERSPOUT" 
DamageCrop$EVTYPE[grep("TORN", DamageCrop$EVTYPE)] <- "TORNADO"
DamageCrop$EVTYPE[grep("STORM SURGE", DamageCrop$EVTYPE)] <- "STORM TIDE"
DamageCrop$EVTYPE[grep("TROPICAL", DamageCrop$EVTYPE)] <- "TROPICAL"
DamageCrop$EVTYPE[grep("WINTER STORM", DamageCrop$EVTYPE)] <- "WINTER STORM" 
DamageCrop$EVTYPE[grep("WINTER WEATHER", DamageCrop$EVTYPE)] <- "WINTER WEATHER" 
DamageCrop$EVTYPE[grep("THUNDERSTORMS WINDS", DamageCrop$EVTYPE)] <- "THUNDERSTORM WIND"
DamageCrop$EVTYPE[grep("THUNER", DamageCrop$EVTYPE)] <- "THUNDERSTORM WIND"
DamageCrop$EVTYPE[grep("THUNDERSTORMS", DamageCrop$EVTYPE)] <- "THUNDERSTORM WIND"
DamageCrop$EVTYPE[grep("THUNDERSTORMWINDS", DamageCrop$EVTYPE)] <- "THUNDERSTORM WIND" 
DamageCrop$EVTYPE[grep("THUNDERESTORM WINDS", DamageCrop$EVTYPE)] <- "THUNDERSTORM WIND" 
DamageCrop$EVTYPE[grep("THUDERSTORM WINDS", DamageCrop$EVTYPE)] <- "THUNDERSTORM WIND"
DamageCrop$EVTYPE[grep("THUNDERTORM WINDS", DamageCrop$EVTYPE)] <- "THUNDERSTORM WIND" 
DamageCrop$EVTYPE[grep("THUNDERSTORM  WINDS", DamageCrop$EVTYPE)] <- "THUNDERSTORM WIND" 
DamageCrop$EVTYPE[grep("THUNDEERSTORM WINDS", DamageCrop$EVTYPE)] <- "THUNDERSTORM WIND"
DamageCrop$EVTYPE[grep("TUNDERSTORM WIND", DamageCrop$EVTYPE)] <- "THUNDERSTORM WIND"
DamageCrop$EVTYPE[grep("DUST STORM", DamageCrop$EVTYPE)] <- "DUST STORM"
DamageCrop$EVTYPE[grep("EXTREME WIND CHILL", DamageCrop$EVTYPE)] <- "EXTREME COLD/WIND CHILL" 
DamageCrop$EVTYPE[grep("EXTREME WINDCHILL", DamageCrop$EVTYPE)] <- "EXTREME COLD/WIND CHILL" 
DamageCrop$EVTYPE[grep("GUSTY WIND", DamageCrop$EVTYPE)]  <- "HIGH WIND"
DamageCrop$EVTYPE[grep("WILD", DamageCrop$EVTYPE)] <- "WILDFIRE"
DamageCrop$EVTYPE[grep("DROUGHT", DamageCrop$EVTYPE)] <- "DROUGHT"
DamageCrop$EVTYPE[grep("HEAVY SHOWER", DamageCrop$EVTYPE)] <- "HEAVY RAIN"
DamageCrop$EVTYPE[grep("HEAVY LAKE SNOW", DamageCrop$EVTYPE)] <- "HEAVY SNOW"
DamageCrop$EVTYPE[grep("HEAVY PRECIPITATION", DamageCrop$EVTYPE)] <- "HEAVY RAIN"
DamageCrop$EVTYPE[grep("SEVERE THUND", DamageCrop$EVTYPE)] <- "THUNDERSTORM WIND"
DamageCrop$EVTYPE[grep("HIGH WIND", DamageCrop$EVTYPE)] <- "HIGH WIND"

Fifth step: summarize crop damage by events

DamageCrop <- group_by(DamageCrop, EVTYPE) %>% summarize(sum(DAMAGE))
names(DamageCrop)<- c("EVTYPE","DAMAGE")
DamageCrop <- arrange(DamageCrop, desc(DAMAGE))

head(DamageCrop,n=20)
## Source: local data frame [20 x 2]
## 
##               EVTYPE      DAMAGE
## 1            DROUGHT 13972621780
## 2              FLOOD 10743489050
## 3  HURRICANE/TYPHOON  5515292800
## 4          ICE STORM  5022113500
## 5               HAIL  3046937623
## 6        FLASH FLOOD  1532197150
## 7       EXTREME COLD  1312973000
## 8  THUNDERSTORM WIND  1273113988
## 9       FROST/FREEZE  1094186000
## 10        HEAVY RAIN   793909800
## 11          TROPICAL   694896000
## 12         HIGH WIND   686501900
## 13    EXCESSIVE HEAT   492402000
## 14            FREEZE   456725000
## 15           TORNADO   414961520
## 16          WILDFIRE   402781630
## 17              HEAT   401461500
## 18   DAMAGING FREEZE   296230000
## 19 EXCESSIVE WETNESS   142000000
## 20        HEAVY SNOW   134673100

Results

The next figure shows the top 10 weather events that have produced the greatest number of fatalities and injuries.

g1 <- ggplot(data=Fatalities[1:10,], aes(x=reorder(substr(EVTYPE,1,18),FATALITIES), y=FATALITIES, fill="red")) +
        geom_bar(stat="identity", colour="black", width=0.7, position = position_dodge(width=0.9)) +
        coord_flip() + xlab(" ") + ylab("Fatalities") + 
        theme(legend.position="none")


g2 <- ggplot(data=Injuries[1:10,], aes(x=reorder(substr(EVTYPE,1,18),INJURIES), y=INJURIES, fill="red")) +
        geom_bar(stat="identity", colour="black", width=0.7, position = position_dodge(width=0.9)) +
        coord_flip() + xlab(" ")  + ylab("Injuries") + 
        theme(legend.position="none")

grid.arrange(g1,g2,main="     Total Fatalities and Injuries by Weather Event Type", left="Weather Event")

From the plot above, we conclude that Tornados are the most harmful event to population health both for Fatalities and Injuries.

The following figure shows the top 10 weather events that have produced the greatest amount of damage to property and crops.

g3 <- ggplot(data=DamageProperty[1:10,], aes(x=reorder(substr(EVTYPE,1,18),DAMAGE), y=DAMAGE/1e+09, fill="red")) +
        geom_bar(stat="identity", colour="black", width=0.7, position = position_dodge(width=0.9)) +
        coord_flip() + xlab(" ") + ylab("Property Damage in Billion USD") + 
        theme(legend.position="none")


g4 <- ggplot(data=DamageCrop[1:10,], aes(x=reorder(substr(EVTYPE,1,18),DAMAGE), y=DAMAGE/1e+09, fill="red")) +
        geom_bar(stat="identity", colour="black", width=0.7, position = position_dodge(width=0.9)) +
        coord_flip() + xlab(" ")  + ylab("Crop Damage in Billion USD") + 
        theme(legend.position="none")

grid.arrange(g3,g4,main="     Total Property and Crop Damage by Weather Event Type", left="Weather Event")

From the plot above, Floods cause the most property damage and droughts cause the most crop damage.