Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The first question to be address is what events are most harmful with respect to population health. Therefore, the number of deaths and injuries that have occurred are analyzed. The second question is what events have the greatest economic consequences, so property and crop damages are analyzed.
From the analysis of the NOOA Storm Database, we found that on average across the United States, tornados are the most harmful event to population health both for fatalities and injuries. Floods cause the most property damage and droughts cause the most crop damage.
The analysis of the NOAA storm database was performed on the following system:
sessionInfo()
## R version 3.2.0 (2015-04-16)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] magrittr_1.5 formatR_1.2 tools_3.2.0 htmltools_0.2.6
## [5] yaml_2.1.13 stringi_0.4-1 rmarkdown_0.8 knitr_1.11
## [9] stringr_1.0.0 digest_0.6.8 evaluate_0.7.2
The following packages were used for the analysis:
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.2.1
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 3.2.1
## Loading required package: grid
The data for this study come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. It can be downloaded using this link. The events in the database start in the year 1950 and end in November 2011
The following code can be used to download the data for further processing.
setwd("~/Coursera/Data Science/5_Reproducible Research/StormData")
if(!file.exists("StormData.csv.bz2")){
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2","StormData.csv.bz2")
}
data<-read.csv(bzfile("StormData.csv.bz2"))
The raw data consists of 37 variables. For this analysis, the following variables: EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP were used.
dataTB<-tbl_df(data) %>%
select(c(8,23,24,25,26,27,28))
head(dataTB)
## Source: local data frame [6 x 7]
##
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 0 15 25.0 K 0
## 2 TORNADO 0 0 2.5 K 0
## 3 TORNADO 0 2 25.0 K 0
## 4 TORNADO 0 2 2.5 K 0
## 5 TORNADO 0 2 2.5 K 0
## 6 TORNADO 0 6 2.5 K 0
The basic goal of this study is to explore the NOAA Storm Database and answer some basic questions about severe weather events. The first question to be address is what events are most harmful with respect to population health. Therefore, the number of deaths and injuries that have occurred are analyzed. The second question is what events have the greatest economic consequences. So property and crop damages are analyzed. In this sense, four dataset are generated to analyze the impact of weather events on:
First the relevant data associated with Fatalities (Fatalities>0) is selected. This data is organized by weather event and then the total number of deaths is computed. A look-up table (LUT) is created to map the event types with the official forty-eight weather event types. After this cleaning step, the data is again organized by event and the total number of fatalities is recalculated.
LUT <- c("AVALANCE" = "AVALANCHE",
"AVALANCHE" = "AVALANCHE",
"BLACK ICE" = "SLEET",
"BLIZZARD" = "BLIZZARD",
"BLOWING SNOW" = "HEAVY SNOW",
"COASTAL FLOOD" = "COASTAL FLOOD",
"COASTAL FLOODING" = "COASTAL FLOOD",
"COASTAL STORM" = "STORM",
"COASTALSTORM" = "STORM",
"COLD" = "COLD/WIND CHILL",
"COLD AND SNOW" = "COLD/WIND CHILL",
"COLD TEMPERATURE" = "COLD/WIND CHILL",
"COLD WAVE" = "COLD/WIND CHILL",
"COLD WEATHER" = "COLD/WIND CHILL",
"COLD/WIND CHILL" = "COLD/WIND CHILL",
"COLD/WINDS" = "COLD/WIND CHILL",
"DENSE FOG" = "DENSE FOG",
"DROUGHT/EXCESSIVE HEAT" = "EXCESSIVE HEAT",
"DROWNING" = "DROUGHT",
"DRY MICROBURST" = "",
"DUST DEVIL" = "DUST DEVIL",
"DUST STORM" = "DUST STORM",
"EXCESSIVE HEAT" = "EXCESSIVE HEAT",
"EXCESSIVE RAINFALL" = "HEAVY RAIN",
"EXTENDED COLD" = "COLD/WIND CHILL",
"EXTREME COLD" = "EXTREME COLD/WIND CHILL",
"EXTREME COLD/WIND CHILL" = "EXTREME COLD/WIND CHILL",
"EXTREME HEAT" = "EXCESSIVE HEAT",
"EXTREME WINDCHILL" = "EXTREME COLD/WIND CHILL",
"FALLING SNOW/ICE" = "ICE STORM",
"FLASH FLOOD" = "FLASH FLOOD",
"FLASH FLOOD/FLOOD" = "FLASH FLOOD",
"FLASH FLOODING" = "FLASH FLOOD",
"FLASH FLOODING/FLOOD" = "FLASH FLOOD",
"FLASH FLOODS" = "FLASH FLOOD",
"FLOOD" = "FLOOD",
"FLOOD & HEAVY RAIN" = "FLOOD",
"FLOOD/FLASH FLOOD" = "FLOOD",
"FLOOD/RIVER FLOOD" = "FLOOD",
"FLOODING" = "FLOOD",
"FOG" = "DENSE FOG",
"FOG AND COLD TEMPERATURES" = "FREEZING FOG",
"FREEZE" = "FROST/FREEZE",
"FREEZING DRIZZLE" = "FROST/FREEZE",
"FREEZING RAIN" = "FROST/FREEZE",
"FREEZING RAIN/SNOW" = "FROST/FREEZE",
"FREEZING SPRAY" = "FROST/FREEZE",
"FROST" = "FROST/FREEZE",
"GLAZE" = "FROST/FREEZE",
"GUSTY WIND" = "HIGH WIND",
"GUSTY WINDS" = "HIGH WIND",
"HAIL" = "HAIL",
"HEAT" = "HEAT",
"HEAT WAVE" = "HEAT",
"HEAT WAVE DROUGHT" = "DROUGHT",
"HEAT WAVES" = "HEAT",
"HEAVY RAIN" = "HEAVY RAIN",
"HEAVY SEAS" = "HIGH SURF",
"HEAVY SNOW" = "HEAVY SNOW",
"HEAVY SNOW AND HIGH WINDS" = "HEAVY SNOW",
"HEAVY SURF" = "HIGH SURF",
"HEAVY SURF AND WIND" = "HIGH SURF",
"HEAVY SURF/HIGH SURF" = "HIGH SURF",
"HIGH SEAS" = "HIGH SURF",
"HIGH SURF" = "HIGH SURF",
"HIGH SWELLS" = "HIGH SURF",
"HIGH WATER" = "HIGH SURF",
"HIGH WAVES" = "HIGH SURF",
"HIGH WIND" = "HIGH WIND",
"HIGH WIND AND SEAS" = "HIGH WIND",
"HIGH WIND/SEAS" = "HIGH WIND",
"HIGH WINDS" = "HIGH WIND",
"HIGH WINDS/SNOW" = "HEAVY SNOW",
"HURRICANE" = "HURRICANE/TYPHOON",
"HURRICANE ERIN" = "HURRICANE/TYPHOON",
"HURRICANE FELIX" = "HURRICANE/TYPHOON",
"HURRICANE OPAL" = "HURRICANE/TYPHOON",
"HURRICANE OPAL/HIGH WINDS" = "HURRICANE/TYPHOON",
"HURRICANE/TYPHOON" = "HURRICANE/TYPHOON",
"HYPERTHERMIA/EXPOSURE" = "",
"HYPOTHERMIA" = "",
"HYPOTHERMIA/EXPOSURE" = "",
"ICE" = "ICE STORM",
"ICE ON ROAD" = "SLEET",
"ICE STORM" = "ICE STORM",
"ICY ROADS" = "SLEET",
"LANDSLIDE" = "DEBRIS FLOW",
"LANDSLIDES" = "DEBRIS FLOW",
"LIGHT SNOW" = "LIGHTNING",
"LIGHTNING" = "LIGHTNING",
"LIGHTNING." = "LIGHTNING",
"LOW TEMPERATURE" = "COLD/WIND CHILL",
"MARINE ACCIDENT" = "",
"MARINE HIGH WIND" = "MARINE HIGH WIND",
"MARINE MISHAP" = "MARINE HIGH WIND",
"MARINE STRONG WIND" = "MARINE HIGH WIND",
"MARINE THUNDERSTORM WIND" = "MARINE THUNDERSTORM WIND",
"MARINE TSTM WIND" = "MARINE THUNDERSTORM WIND",
"MINOR FLOODING" = "FLASH FLOOD",
"MIXED PRECIP" = "",
"MUDSLIDE" = "DEBRIS FLOW",
"MUDSLIDES" = "DEBRIS FLOW",
"RAIN/SNOW" = "HEAVY RAIN",
"RAIN/WIND" = "HEAVY RAIN" ,
"RAPIDLY RISING WATER" = "FLASH FLOOD",
"RECORD COLD" = "EXTREME COLD/WIND CHILL",
"RECORD HEAT" = "EXCESSIVE HEAT",
"RECORD/EXCESSIVE HEAT" = "EXCESSIVE HEAT",
"RIP CURRENT" = "RIP CURRENT",
"RIP CURRENTS" = "RIP CURRENT",
"RIP CURRENTS/HEAVY SURF" = "RIP CURRENT",
"RIVER FLOOD" = "FLOOD",
"RIVER FLOODING" = "FLOOD",
"ROUGH SEAS" = "HIGH SURF",
"ROUGH SURF" = "HIGH SURF",
"SLEET" = "SLEET",
"SNOW" = "HEAVY SNOW",
"SNOW AND ICE" = "FROST/FREEZE",
"SNOW SQUALL" = "HEAVY SNOW",
"SNOW SQUALLS" = "HEAVY SNOW",
"SNOW/ BITTER COLD" = "",
"STORM SURGE" = "STORM TIDE",
"STORM SURGE/TIDE" = "STORM TIDE",
"STRONG WIND" = "STRONG WIND",
"STRONG WINDS" = "STRONG WIND",
"THUNDERSNOW" = "THUNDERSTORM WIND",
"THUNDERSTORM" = "THUNDERSTORM WIND",
"THUNDERSTORM WIND" = "THUNDERSTORM WIND",
"THUNDERSTORM WIND (G40)" = "THUNDERSTORM WIND",
"THUNDERSTORM WIND G52" = "THUNDERSTORM WIND",
"THUNDERSTORM WINDS" = "THUNDERSTORM WIND",
"THUNDERTORM WINDS" = "THUNDERSTORM WIND",
"TORNADO" = "TORNADO",
"TORNADOES, TSTM WIND, HAIL" = "TORNADO",
"TROPICAL STORM" = "TROPICAL STORM",
"TROPICAL STORM GORDON" = "TROPICAL STORM",
"TSTM WIND" = "THUNDERSTORM WIND",
"TSTM WIND (G35)" = "THUNDERSTORM WIND",
"TSTM WIND/HAIL" = "THUNDERSTORM WIND",
"TSUNAMI" = "TSUNAMI",
"UNSEASONABLY COLD" = "",
"UNSEASONABLY WARM" = "",
"UNSEASONABLY WARM AND DRY" = "",
"URBAN AND SMALL STREAM FLOODIN" = "FLOOD",
"URBAN/SML STREAM FLD" = "FLOOD",
"WATERSPOUT" = "WATERSPOUT",
"WATERSPOUT/TORNADO" = "WATERSPOUT",
"WHIRLWIND" = "TORNADO",
"WILD FIRES" = "WILDFIRE",
"WILD/FOREST FIRE" = "WILDFIRE",
"WILDFIRE" = "WILDFIRE",
"WIND" = "HIGH WIND",
"WIND STORM" = "HIGH WIND",
"WINDS" = "HIGH WIND",
"WINTER STORM" = "WINTER STORM",
"WINTER STORM HIGH WINDS" = "WINTER STORM",
"WINTER STORMS" = "WINTER STORM",
"WINTER WEATHER" = "WINTER WEATHER",
"WINTER WEATHER/MIX" = "WINTER WEATHER",
"WINTRY MIX" = "WINTER WEATHER")
Fatalities <- filter(dataTB, FATALITIES > 0) # data selected
Fatalities$EVTYPE <- toupper(Fatalities$EVTYPE)
Fatalities <- group_by(Fatalities, EVTYPE) %>% summarize(sum(FATALITIES)) # Total fatalities by event
names(Fatalities)<- c("EVTYPE","FATALITIES")
Fatalities$EVTYPE <- LUT[Fatalities$EVTYPE] # Lookup table
Fatalities <- group_by(Fatalities, EVTYPE) %>% summarize(sum(FATALITIES)) # Total Fatalities is recalculated
names(Fatalities)<- c("EVTYPE","FATALITIES")
Fatalities <- arrange(Fatalities, desc(FATALITIES))
head(Fatalities,n=15)
## Source: local data frame [15 x 2]
##
## EVTYPE FATALITIES
## 1 TORNADO 5659
## 2 EXCESSIVE HEAT 2020
## 3 HEAT 1114
## 4 FLASH FLOOD 1020
## 5 LIGHTNING 818
## 6 THUNDERSTORM WIND 712
## 7 RIP CURRENT 577
## 8 FLOOD 528
## 9 HIGH WIND 320
## 10 EXTREME COLD/WIND CHILL 305
## 11 AVALANCHE 225
## 12 WINTER STORM 217
## 13 HIGH SURF 182
## 14 COLD/WIND CHILL 166
## 15 HEAVY SNOW 142
A similar approach is used in the case of Injuries.
Injuries <- filter(dataTB, INJURIES > 0) # data selected
Injuries$EVTYPE <- toupper(Injuries$EVTYPE)
Injuries <- group_by(Injuries, EVTYPE) %>% summarize(sum(INJURIES)) # Total injuries by event
names(Injuries)<- c("EVTYPE","INJURIES")
Injuries$EVTYPE <- LUT[Injuries$EVTYPE] # lookup table
Injuries <- group_by(Injuries, EVTYPE) %>% summarize(sum(INJURIES))
names(Injuries)<- c("EVTYPE","INJURIES")
Injuries <- arrange(Injuries, desc(INJURIES)) # Total injuries is recalculated
head(Injuries,n=15)
## Source: local data frame [15 x 2]
##
## EVTYPE INJURIES
## 1 TORNADO 91346
## 2 THUNDERSTORM WIND 9461
## 3 FLOOD 6888
## 4 EXCESSIVE HEAT 6730
## 5 LIGHTNING 5232
## 6 HEAT 2479
## 7 ICE STORM 2113
## 8 FLASH FLOOD 1785
## 9 WILDFIRE 1606
## 10 HIGH WIND 1558
## 11 HAIL 1361
## 12 WINTER STORM 1353
## 13 HURRICANE/TYPHOON 1323
## 14 HEAVY SNOW 1107
## 15 DENSE FOG 1076
In order to calculate property damage caused by weather events, the following variables are used: EVTYPE, PROPDMG and PROPDMGEXP. PROPDMG represents the amount of money in property damage and PROPDMGEXP represents the units. For example a value of 3 or K in PROPDMGEXP represents 1000. PROPDMGEXP needs some cleaning before its use to calculate the total amount of property damage.
First step, select records that contain property damage
Property <- filter(dataTB, dataTB$PROPDMG>0)
Property <- tbl_df(Property)
Property$EVTYPE <-toupper(Property$EVTYPE)
Second step, clean PROPDMGEXP and calculate the amount of property damage in USD
valor <- c("H","K","M","B","0","1","2","3","4","5","6","7","8","-","+","","?")
multiplicador <- c(100,1000,1000000,1000000000,1,10,100,1000,1000,100000,1000000,10000000,100000000,0,0,1,0)
Property$PROPDMGEXP <- toupper(Property$PROPDMGEXP)
unique(Property$PROPDMGEXP)
## [1] "K" "M" "B" "+" "0" "" "5" "6" "4" "H" "2" "7" "3" "-"
Property <- mutate(Property, PROPVAL = (PROPDMG * multiplicador[match(PROPDMGEXP,valor)]))
Third step, summarise property damage by events
DamageProperty <- group_by(Property, EVTYPE) %>% summarize(sum(PROPVAL))
names(DamageProperty)<- c("EVTYPE","DAMAGE")
Four step, cleaning events: For property and crop damage, the lookup table is not used because the size of this table is bigger than in the case of fatalities and injuries. A manual cleaning of the events type is performed.
DamageProperty$EVTYPE[grep("FLASH FLOOD", DamageProperty$EVTYPE)] <- "FLASH FLOOD"
DamageProperty$EVTYPE[grep("MARINE TSTM WIND", DamageProperty$EVTYPE)] <- "MARINE THUNDERSTORM WIND"
DamageProperty$EVTYPE[grep("NON-TSTM WIND", DamageProperty$EVTYPE)] <- "HIGH WIND"
DamageProperty$EVTYPE[grep("TSTM WIND", DamageProperty$EVTYPE)] <- "THUNDERSTORM WIND"
DamageProperty$EVTYPE[grep("MARINE THUNDERSTORM WIND", DamageProperty$EVTYPE)] <- "MARINE TSTM WIND"
DamageProperty$EVTYPE[grep("THUNDERSTORM WIND", DamageProperty$EVTYPE)] <- "THUNDERSTORM WIND"
DamageProperty$EVTYPE[grep("MARINE TSTM WIND", DamageProperty$EVTYPE)] <- "MARINE THUNDERSTORM WIND"
DamageProperty$EVTYPE[grep("HIGH SURF", DamageProperty$EVTYPE)] <- "HIGH SURF"
DamageProperty$EVTYPE[grep("ASTRONOMICAL", DamageProperty$EVTYPE)] <- "ASTRONOMICAL LOW TIDE"
DamageProperty$EVTYPE[grep("AVALAN", DamageProperty$EVTYPE)] <- "AVALANCHE"
DamageProperty$EVTYPE[grep("BLIZZARD", DamageProperty$EVTYPE)] <- "BLIZZARD"
DamageProperty$EVTYPE[grep("COASTAL FLOODING", DamageProperty$EVTYPE)] <- "COASTAL FLOOD"
DamageProperty$EVTYPE[grep("COASTAL FLOODING", DamageProperty$EVTYPE)] <- "COASTAL FLOOD"
DamageProperty$EVTYPE[grep("RIVER FLOOD", DamageProperty$EVTYPE)] <- "FLOOD"
DamageProperty$EVTYPE[grep("URBAN FLOOD", DamageProperty$EVTYPE)] <- "FLOOD"
DamageProperty$EVTYPE[grep("MAJOR FLOOD", DamageProperty$EVTYPE)] <- "FLOOD"
DamageProperty$EVTYPE[grep("CSTL", DamageProperty$EVTYPE)] <- "COASTAL FLOOD"
DamageProperty$EVTYPE[grep("FLASH", DamageProperty$EVTYPE)] <- "FLASH FLOOD"
DamageProperty$EVTYPE[grep("FLOODING", DamageProperty$EVTYPE)] <- "FLOOD"
DamageProperty$EVTYPE[grep("FLOODS", DamageProperty$EVTYPE)] <- "FLOOD"
DamageProperty$EVTYPE[grep("URBAN", DamageProperty$EVTYPE)] <- "FLOOD"
DamageProperty$EVTYPE[grep("COASTAL FLOOD", DamageProperty$EVTYPE)] <- "COASTAL FLOOD"
DamageProperty$EVTYPE[grep("FLOOD &", DamageProperty$EVTYPE)] <- "FLOOD"
DamageProperty$EVTYPE[grep("HEAVY SNOW", DamageProperty$EVTYPE)] <- "HEAVY SNOW"
DamageProperty$EVTYPE[grep("STREAM FLOOD", DamageProperty$EVTYPE)] <- "LAKESHORE FLOOD"
DamageProperty$EVTYPE[grep("RURAL FLOOD", DamageProperty$EVTYPE)] <- "FLOOD"
DamageProperty$EVTYPE[grep("LAKE FLOOD", DamageProperty$EVTYPE)] <- "LAKESHORE FLOOD"
DamageProperty$EVTYPE[grep("HEAVY RAIN", DamageProperty$EVTYPE)] <- "HEAVY RAIN"
DamageProperty$EVTYPE[grep("ICE JAM FLOOD", DamageProperty$EVTYPE)] <- "FLOOD"
DamageProperty$EVTYPE[grep("MARINE HAIL", DamageProperty$EVTYPE)] <- "Hail"
DamageProperty$EVTYPE[grep("HAIL", DamageProperty$EVTYPE)] <- "HAIL"
DamageProperty$EVTYPE[grep("Hail", DamageProperty$EVTYPE)] <- "MARINE HAIL"
DamageProperty$EVTYPE[grep("HURRI", DamageProperty$EVTYPE)] <- "HURRICANE/TYPHOON"
DamageProperty$EVTYPE[grep("DUST DEVIL", DamageProperty$EVTYPE)] <- "DUST DEVIL"
DamageProperty$EVTYPE[grep("WATERSP", DamageProperty$EVTYPE)] <- "WATERSPOUT"
DamageProperty$EVTYPE[grep("TORN", DamageProperty$EVTYPE)] <- "TORNADO"
DamageProperty$EVTYPE[grep("STORM SURGE", DamageProperty$EVTYPE)] <- "STORM TIDE"
DamageProperty$EVTYPE[grep("TROPICAL", DamageProperty$EVTYPE)] <- "TROPICAL"
DamageProperty$EVTYPE[grep("WINTER STORM", DamageProperty$EVTYPE)] <- "WINTER STORM"
DamageProperty$EVTYPE[grep("WINTER WEATHER", DamageProperty$EVTYPE)] <- "WINTER WEATHER"
DamageProperty$EVTYPE[grep("THUNDERSTORMS WINDS", DamageProperty$EVTYPE)] <- "THUNDERSTORM WIND"
DamageProperty$EVTYPE[grep("THUNER", DamageProperty$EVTYPE)] <- "THUNDERSTORM WIND"
DamageProperty$EVTYPE[grep("THUNDERSTORMS", DamageProperty$EVTYPE)] <- "THUNDERSTORM WIND"
DamageProperty$EVTYPE[grep("THUNDERSTORMWINDS", DamageProperty$EVTYPE)] <- "THUNDERSTORM WIND"
DamageProperty$EVTYPE[grep("THUNDERESTORM WINDS", DamageProperty$EVTYPE)] <- "THUNDERSTORM WIND"
DamageProperty$EVTYPE[grep("THUDERSTORM WINDS", DamageProperty$EVTYPE)] <- "THUNDERSTORM WIND"
DamageProperty$EVTYPE[grep("THUNDERTORM WINDS", DamageProperty$EVTYPE)] <- "THUNDERSTORM WIND"
DamageProperty$EVTYPE[grep("THUNDERSTORM WINDS", DamageProperty$EVTYPE)] <- "THUNDERSTORM WIND"
DamageProperty$EVTYPE[grep("THUNDEERSTORM WINDS", DamageProperty$EVTYPE)] <- "THUNDERSTORM WIND"
DamageProperty$EVTYPE[grep("TUNDERSTORM WIND", DamageProperty$EVTYPE)] <- "THUNDERSTORM WIND"
DamageProperty$EVTYPE[grep("DUST STORM", DamageProperty$EVTYPE)] <- "DUST STORM"
DamageProperty$EVTYPE[grep("EXTREME WIND CHILL", DamageProperty$EVTYPE)] <- "EXTREME COLD/WIND CHILL"
DamageProperty$EVTYPE[grep("EXTREME WINDCHILL", DamageProperty$EVTYPE)] <- "EXTREME COLD/WIND CHILL"
DamageProperty$EVTYPE[grep("GUSTY WIND", DamageProperty$EVTYPE)] <- "HIGH WIND"
DamageProperty$EVTYPE[grep("WILD", DamageProperty$EVTYPE)] <- "WILDFIRE"
DamageProperty$EVTYPE[grep("DROUGHT", DamageProperty$EVTYPE)] <- "DROUGHT"
DamageProperty$EVTYPE[grep("HEAVY SHOWER", DamageProperty$EVTYPE)] <- "HEAVY RAIN"
DamageProperty$EVTYPE[grep("HEAVY LAKE SNOW", DamageProperty$EVTYPE)] <- "HEAVY SNOW"
DamageProperty$EVTYPE[grep("HEAVY PRECIPITATION", DamageProperty$EVTYPE)] <- "HEAVY RAIN"
DamageProperty$EVTYPE[grep("SEVERE THUND", DamageProperty$EVTYPE)] <- "THUNDERSTORM WIND"
DamageProperty$EVTYPE[grep("HIGH WIND", DamageProperty$EVTYPE)] <- "HIGH WIND"
Fifth step: summarize property damage by events
DamageProperty <- group_by(DamageProperty, EVTYPE) %>% summarize(sum(DAMAGE))
names(DamageProperty)<- c("EVTYPE","DAMAGE")
DamageProperty <- arrange(DamageProperty, desc(DAMAGE))
head(DamageProperty,n=20)
## Source: local data frame [20 x 2]
##
## EVTYPE DAMAGE
## 1 FLOOD 150221016457
## 2 HURRICANE/TYPHOON 84756180010
## 3 TORNADO 56952152367
## 4 STORM TIDE 47964724000
## 5 FLASH FLOOD 17589772496
## 6 HAIL 15977565513
## 7 THUNDERSTORM WIND 12777900750
## 8 WILDFIRE 8491563500
## 9 TROPICAL 7716127550
## 10 WINTER STORM 6748997251
## 11 HIGH WIND 5993830003
## 12 ICE STORM 3944927860
## 13 HEAVY RAIN 3220671640
## 14 DROUGHT 1046306000
## 15 HEAVY SNOW 949197150
## 16 LIGHTNING 930379430
## 17 BLIZZARD 659913950
## 18 TYPHOON 600230000
## 19 COASTAL FLOOD 448816060
## 20 LANDSLIDE 324596000
In order to calculate crop damage caused by weather events, we use the following variables from the dataset: EVTYPE, CROPDMG and CROPDMGEXP. CROPDMG represents the amount of money in crop damage and CROPDMGEXP represents the units. For example a value of 3 or K in CROPDMGEXP represents 1000. CROPDMGEXP needs some cleaning before its use to calculate the total amount of crop damage.
First step, select records that contain crop damage
Crop <- filter(dataTB, dataTB$CROPDMG>0)
Crop <- tbl_df(Crop)
Crop$EVTYPE <-toupper(Crop$EVTYPE)
Second step, clean CROPDMGEXP and calculate the amount of crop damage in USD
Crop$CROPDMGEXP <- toupper(Crop$CROPDMGEXP)
unique(Crop$CROPDMGEXP)
## [1] "M" "K" "B" "0" ""
Crop <- mutate(Crop, CROPVAL = (CROPDMG * multiplicador[match(CROPDMGEXP,valor)]))
Third step, summarise property damage by events
DamageCrop <- group_by(Crop, EVTYPE) %>% summarize(sum(CROPVAL))
names(DamageCrop)<- c("EVTYPE","DAMAGE")
Four step, cleaning events:
DamageCrop$EVTYPE[grep("FLASH FLOOD", DamageCrop$EVTYPE)] <- "FLASH FLOOD"
DamageCrop$EVTYPE[grep("MARINE TSTM WIND", DamageCrop$EVTYPE)] <- "MARINE THUNDERSTORM WIND"
DamageCrop$EVTYPE[grep("NON-TSTM WIND", DamageCrop$EVTYPE)] <- "HIGH WIND"
DamageCrop$EVTYPE[grep("TSTM WIND", DamageCrop$EVTYPE)] <- "THUNDERSTORM WIND"
DamageCrop$EVTYPE[grep("MARINE THUNDERSTORM WIND", DamageCrop$EVTYPE)] <- "MARINE TSTM WIND"
DamageCrop$EVTYPE[grep("THUNDERSTORM WIND", DamageCrop$EVTYPE)] <- "THUNDERSTORM WIND"
DamageCrop$EVTYPE[grep("MARINE TSTM WIND", DamageCrop$EVTYPE)] <- "MARINE THUNDERSTORM WIND"
DamageCrop$EVTYPE[grep("HIGH SURF", DamageCrop$EVTYPE)] <- "HIGH SURF"
DamageCrop$EVTYPE[grep("ASTRONOMICAL", DamageCrop$EVTYPE)] <- "ASTRONOMICAL LOW TIDE"
DamageCrop$EVTYPE[grep("AVALAN", DamageCrop$EVTYPE)] <- "AVALANCHE"
DamageCrop$EVTYPE[grep("BLIZZARD", DamageCrop$EVTYPE)] <- "BLIZZARD"
DamageCrop$EVTYPE[grep("COASTAL FLOODING", DamageCrop$EVTYPE)] <- "COASTAL FLOOD"
DamageCrop$EVTYPE[grep("COASTAL FLOODING", DamageCrop$EVTYPE)] <- "COASTAL FLOOD"
DamageCrop$EVTYPE[grep("RIVER FLOOD", DamageCrop$EVTYPE)] <- "FLOOD"
DamageCrop$EVTYPE[grep("URBAN FLOOD", DamageCrop$EVTYPE)] <- "FLOOD"
DamageCrop$EVTYPE[grep("MAJOR FLOOD", DamageCrop$EVTYPE)] <- "FLOOD"
DamageCrop$EVTYPE[grep("CSTL", DamageCrop$EVTYPE)] <- "COASTAL FLOOD"
DamageCrop$EVTYPE[grep("FLASH", DamageCrop$EVTYPE)] <- "FLASH FLOOD"
DamageCrop$EVTYPE[grep("FLOODING", DamageCrop$EVTYPE)] <- "FLOOD"
DamageCrop$EVTYPE[grep("FLOODS", DamageCrop$EVTYPE)] <- "FLOOD"
DamageCrop$EVTYPE[grep("URBAN", DamageCrop$EVTYPE)] <- "FLOOD"
DamageCrop$EVTYPE[grep("COASTAL FLOOD", DamageCrop$EVTYPE)] <- "COASTAL FLOOD"
DamageCrop$EVTYPE[grep("FLOOD &", DamageCrop$EVTYPE)] <- "FLOOD"
DamageCrop$EVTYPE[grep("HEAVY SNOW", DamageCrop$EVTYPE)] <- "HEAVY SNOW"
DamageCrop$EVTYPE[grep("STREAM FLOOD", DamageCrop$EVTYPE)] <- "LAKESHORE FLOOD"
DamageCrop$EVTYPE[grep("RURAL FLOOD", DamageCrop$EVTYPE)] <- "FLOOD"
DamageCrop$EVTYPE[grep("LAKE FLOOD", DamageCrop$EVTYPE)] <- "LAKESHORE FLOOD"
DamageCrop$EVTYPE[grep("HEAVY RAIN", DamageCrop$EVTYPE)] <- "HEAVY RAIN"
DamageCrop$EVTYPE[grep("ICE JAM FLOOD", DamageCrop$EVTYPE)] <- "FLOOD"
DamageCrop$EVTYPE[grep("MARINE HAIL", DamageCrop$EVTYPE)] <- "Hail"
DamageCrop$EVTYPE[grep("HAIL", DamageCrop$EVTYPE)] <- "HAIL"
DamageCrop$EVTYPE[grep("Hail", DamageCrop$EVTYPE)] <- "MARINE HAIL"
DamageCrop$EVTYPE[grep("HURRI", DamageCrop$EVTYPE)] <- "HURRICANE/TYPHOON"
DamageCrop$EVTYPE[grep("DUST DEVIL", DamageCrop$EVTYPE)] <- "DUST DEVIL"
DamageCrop$EVTYPE[grep("WATERSP", DamageCrop$EVTYPE)] <- "WATERSPOUT"
DamageCrop$EVTYPE[grep("TORN", DamageCrop$EVTYPE)] <- "TORNADO"
DamageCrop$EVTYPE[grep("STORM SURGE", DamageCrop$EVTYPE)] <- "STORM TIDE"
DamageCrop$EVTYPE[grep("TROPICAL", DamageCrop$EVTYPE)] <- "TROPICAL"
DamageCrop$EVTYPE[grep("WINTER STORM", DamageCrop$EVTYPE)] <- "WINTER STORM"
DamageCrop$EVTYPE[grep("WINTER WEATHER", DamageCrop$EVTYPE)] <- "WINTER WEATHER"
DamageCrop$EVTYPE[grep("THUNDERSTORMS WINDS", DamageCrop$EVTYPE)] <- "THUNDERSTORM WIND"
DamageCrop$EVTYPE[grep("THUNER", DamageCrop$EVTYPE)] <- "THUNDERSTORM WIND"
DamageCrop$EVTYPE[grep("THUNDERSTORMS", DamageCrop$EVTYPE)] <- "THUNDERSTORM WIND"
DamageCrop$EVTYPE[grep("THUNDERSTORMWINDS", DamageCrop$EVTYPE)] <- "THUNDERSTORM WIND"
DamageCrop$EVTYPE[grep("THUNDERESTORM WINDS", DamageCrop$EVTYPE)] <- "THUNDERSTORM WIND"
DamageCrop$EVTYPE[grep("THUDERSTORM WINDS", DamageCrop$EVTYPE)] <- "THUNDERSTORM WIND"
DamageCrop$EVTYPE[grep("THUNDERTORM WINDS", DamageCrop$EVTYPE)] <- "THUNDERSTORM WIND"
DamageCrop$EVTYPE[grep("THUNDERSTORM WINDS", DamageCrop$EVTYPE)] <- "THUNDERSTORM WIND"
DamageCrop$EVTYPE[grep("THUNDEERSTORM WINDS", DamageCrop$EVTYPE)] <- "THUNDERSTORM WIND"
DamageCrop$EVTYPE[grep("TUNDERSTORM WIND", DamageCrop$EVTYPE)] <- "THUNDERSTORM WIND"
DamageCrop$EVTYPE[grep("DUST STORM", DamageCrop$EVTYPE)] <- "DUST STORM"
DamageCrop$EVTYPE[grep("EXTREME WIND CHILL", DamageCrop$EVTYPE)] <- "EXTREME COLD/WIND CHILL"
DamageCrop$EVTYPE[grep("EXTREME WINDCHILL", DamageCrop$EVTYPE)] <- "EXTREME COLD/WIND CHILL"
DamageCrop$EVTYPE[grep("GUSTY WIND", DamageCrop$EVTYPE)] <- "HIGH WIND"
DamageCrop$EVTYPE[grep("WILD", DamageCrop$EVTYPE)] <- "WILDFIRE"
DamageCrop$EVTYPE[grep("DROUGHT", DamageCrop$EVTYPE)] <- "DROUGHT"
DamageCrop$EVTYPE[grep("HEAVY SHOWER", DamageCrop$EVTYPE)] <- "HEAVY RAIN"
DamageCrop$EVTYPE[grep("HEAVY LAKE SNOW", DamageCrop$EVTYPE)] <- "HEAVY SNOW"
DamageCrop$EVTYPE[grep("HEAVY PRECIPITATION", DamageCrop$EVTYPE)] <- "HEAVY RAIN"
DamageCrop$EVTYPE[grep("SEVERE THUND", DamageCrop$EVTYPE)] <- "THUNDERSTORM WIND"
DamageCrop$EVTYPE[grep("HIGH WIND", DamageCrop$EVTYPE)] <- "HIGH WIND"
Fifth step: summarize crop damage by events
DamageCrop <- group_by(DamageCrop, EVTYPE) %>% summarize(sum(DAMAGE))
names(DamageCrop)<- c("EVTYPE","DAMAGE")
DamageCrop <- arrange(DamageCrop, desc(DAMAGE))
head(DamageCrop,n=20)
## Source: local data frame [20 x 2]
##
## EVTYPE DAMAGE
## 1 DROUGHT 13972621780
## 2 FLOOD 10743489050
## 3 HURRICANE/TYPHOON 5515292800
## 4 ICE STORM 5022113500
## 5 HAIL 3046937623
## 6 FLASH FLOOD 1532197150
## 7 EXTREME COLD 1312973000
## 8 THUNDERSTORM WIND 1273113988
## 9 FROST/FREEZE 1094186000
## 10 HEAVY RAIN 793909800
## 11 TROPICAL 694896000
## 12 HIGH WIND 686501900
## 13 EXCESSIVE HEAT 492402000
## 14 FREEZE 456725000
## 15 TORNADO 414961520
## 16 WILDFIRE 402781630
## 17 HEAT 401461500
## 18 DAMAGING FREEZE 296230000
## 19 EXCESSIVE WETNESS 142000000
## 20 HEAVY SNOW 134673100
The next figure shows the top 10 weather events that have produced the greatest number of fatalities and injuries.
g1 <- ggplot(data=Fatalities[1:10,], aes(x=reorder(substr(EVTYPE,1,18),FATALITIES), y=FATALITIES, fill="red")) +
geom_bar(stat="identity", colour="black", width=0.7, position = position_dodge(width=0.9)) +
coord_flip() + xlab(" ") + ylab("Fatalities") +
theme(legend.position="none")
g2 <- ggplot(data=Injuries[1:10,], aes(x=reorder(substr(EVTYPE,1,18),INJURIES), y=INJURIES, fill="red")) +
geom_bar(stat="identity", colour="black", width=0.7, position = position_dodge(width=0.9)) +
coord_flip() + xlab(" ") + ylab("Injuries") +
theme(legend.position="none")
grid.arrange(g1,g2,main=" Total Fatalities and Injuries by Weather Event Type", left="Weather Event")
From the plot above, we conclude that Tornados are the most harmful event to population health both for Fatalities and Injuries.
The following figure shows the top 10 weather events that have produced the greatest amount of damage to property and crops.
g3 <- ggplot(data=DamageProperty[1:10,], aes(x=reorder(substr(EVTYPE,1,18),DAMAGE), y=DAMAGE/1e+09, fill="red")) +
geom_bar(stat="identity", colour="black", width=0.7, position = position_dodge(width=0.9)) +
coord_flip() + xlab(" ") + ylab("Property Damage in Billion USD") +
theme(legend.position="none")
g4 <- ggplot(data=DamageCrop[1:10,], aes(x=reorder(substr(EVTYPE,1,18),DAMAGE), y=DAMAGE/1e+09, fill="red")) +
geom_bar(stat="identity", colour="black", width=0.7, position = position_dodge(width=0.9)) +
coord_flip() + xlab(" ") + ylab("Crop Damage in Billion USD") +
theme(legend.position="none")
grid.arrange(g3,g4,main=" Total Property and Crop Damage by Weather Event Type", left="Weather Event")
From the plot above, Floods cause the most property damage and droughts cause the most crop damage.