Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. The objective of this analysis is to determine which types of events are the most harmful to population health and which ones have the greatest economic consequences.
We have analysed data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database which tracks characteristics of major storms and weather events in the United States. The analysis has been conducted for the events occurring between 1996 and 2011, which represents the time period where all 48 official event types have been recorded, with some filtering applied to include only the events causing either health or economic damage and to manually map non standard event types, which exists due to typos, to the standard ones. We’ve then filtered out varialbles which were not necessary for the analysis, derived some summary descriptions and total figures in terms of fatalities, injuries and property/crop damage grouped by event type.
We’ve found the most harmful events in terms of human health are tornadoes, excessive heat, floods and lightnings. Besides these, rip currents have a certain impact in terms of fatalities and thunderstorm winds in terms of injuries. In terms of economic damage, floods have major consequences together with strong wind storms. Also, major effects have of storm surge/tides and hail.
From the Reproducible Research Course website we obtain data about major storms and weather events in the United States, including estimates of any fatalities, injuries, and property damage.
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
inputFile <- "StormData.csv.bz2"
if(!file.exists(inputFile)) {
download.file(fileUrl, destfile = inputFile, method = "curl")
}
We first read the data, including headers, from the raw compressed text file. The data is a CSV file with missing values coded as blank fields.
library(dplyr)
library(lubridate)
library(data.table)
stormdata <- fread(sprintf("bzcat %s", "StormData.csv.bz2"))
##
Read 25.8% of 967216 rows
Read 50.7% of 967216 rows
Read 68.2% of 967216 rows
Read 78.6% of 967216 rows
Read 92.0% of 967216 rows
Read 902297 rows and 37 (of 37) columns from 0.523 GB file in 00:00:07
After reading the data we check the dimension and the first few rows:
dim(stormdata)
## [1] 902297 37
head(stormdata)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1: 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2: 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3: 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4: 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5: 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6: 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1: TORNADO 0 0
## 2: TORNADO 0 0
## 3: TORNADO 0 0
## 4: TORNADO 0 0
## 5: TORNADO 0 0
## 6: TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1: NA 0 14.0 100 3 0 0
## 2: NA 0 2.0 150 2 0 0
## 3: NA 0 0.1 123 2 0 0
## 4: NA 0 0.0 100 2 0 0
## 5: NA 0 0.0 150 2 0 0
## 6: NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1: 15 25.0 K 0
## 2: 0 2.5 K 0
## 3: 2 25.0 K 0
## 4: 2 2.5 K 0
## 5: 2 2.5 K 0
## 6: 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1: 3040 8812 3051 8806 1
## 2: 3042 8755 0 0 2
## 3: 3340 8742 0 0 3
## 4: 3458 8626 0 0 4
## 5: 3412 8642 0 0 5
## 6: 3450 8748 0 0 6
There are 902297 observations in this data set, with 37 variables.
The events in the database start in year 1950 and end in 2011. According to NOAA official documentation (page 6), there are 48 official event types. However, only a small fraction of these have been consistently recorded across the above time range. As described in the Storm Events Database Details, from 1950 through 1954, only tornado events were recorded. From 1955 to 1995, only tornado, thunderstorm wind and hail events have been recorded in the database. Since 1996, all 48 official events are being recorded.
Convert the events start dates, which are strings in the format MM/DD/YYYY HH:MM:SS, to R dates:
stormdata <- stormdata %>% mutate(BGN_DATE = mdy_hms(BGN_DATE))
It is a legitimate goal here to show results comparing all 48 official event types within the same time period. For this reason, we filter out the records previous to 1996:
stormdata <- stormdata %>% filter(year(BGN_DATE) >= 1996)
Ignore the events that do not cause any loss, either in terms of human or physical damage:
stormdata <- stormdata %>% filter(FATALITIES | INJURIES | PROPDMG | CROPDMG)
For each observation, compute the total value for property and crop damage. In order to do that, we need to multiply the values in PROPDMG (resp. CROPDMG) by a value encoded in the corresponding PROPDMGEXP (resp. CROPDMGEXP) varialble, which is alphanumeric. These are the possible codes for the multiplier variables:
unique(stormdata$PROPDMGEXP)
## [1] "K" "" "M" "B"
unique(stormdata$CROPDMGEXP)
## [1] "K" "" "M" "B"
Among all possible alphanumeric codes, after the previous filtering only 4 remains, including the empty character: “K”, “M”, “B” mean respectively thousands, millions and billions. According to , the blank character map to a 0 multiplier. We overwrite the property and crop damage variable with their respective total taking into account the multipliers:
totaldmg <- function(dmg, mult) {
total = 0
if(mult == "K") { total = dmg * 1000 }
else if(mult == "M") { total = dmg * 1e06 }
else if(mult == "B") { total = dmg * 1e09 }
total
}
stormdata <-
stormdata %>%
rowwise() %>%
mutate(CROPDMG = totaldmg(CROPDMG,CROPDMGEXP), PROPDMG = totaldmg(PROPDMG,PROPDMGEXP))
As mentioned before, there are 48 official event types, which are defined here.
official_events <-c("ASTRONOMICAL LOW TIDE","AVALANCHE","BLIZZARD","COASTAL FLOOD","COLD/WIND CHILL","DEBRIS FLOW","DENSE FOG","DENSE SMOKE","DROUGHT","DUST DEVIL","DUST STORM","EXCESSIVE HEAT","EXTREME COLD/WIND CHILL","FLASH FLOOD","FLOOD","FROST/FREEZE","FUNNEL CLOUD","FREEZING FOG","HAIL","HEAT","HEAVY RAIN","HEAVY SNOW","HIGH SURF","HIGH WIND","HURRICANE/TYPHOON","ICE STORM","LAKE-EFFECT SNOW","LAKESHORE FLOOD","LIGHTNING","MARINE HAIL","MARINE HIGH WIND","MARINE STRONG WIND","MARINE THUNDERSTORM WIND","RIP CURRENT","SEICHE","SLEET","STORM SURGE/TIDE","STRONG WIND","THUNDERSTORM WIND","TORNADO","TROPICAL DEPRESSION","TROPICAL STORM","TSUNAMI","VOLCANIC ASH","WATERSPOUT","WILDFIRE","WINTER STORM","WINTER WEATHER")
The so far filtered dataset contains much more event types due to typos, e.g leading/trailing spaces, plurals, and non standard nomenclature. A first trivial step is to convert all events to upper case and remove leading and trailing spaces
stormdata <- stormdata %>% mutate(EVTYPE = trimws(toupper(EVTYPE)))
which leaves us with 183 event types.
This is the list of non standard events in the filtered dataset:
events <- sort(unique(stormdata$EVTYPE))
nostd_events <- events[is.na(match(events, official_events))]
nostd_events
## [1] "AGRICULTURAL FREEZE" "ASTRONOMICAL HIGH TIDE"
## [3] "BEACH EROSION" "BLACK ICE"
## [5] "BLOWING DUST" "BLOWING SNOW"
## [7] "BRUSH FIRE" "COASTAL EROSION"
## [9] "COASTAL FLOODING" "COASTAL FLOODING/EROSION"
## [11] "COASTAL FLOODING/EROSION" "COASTALSTORM"
## [13] "COASTAL STORM" "COLD"
## [15] "COLD AND SNOW" "COLD TEMPERATURE"
## [17] "COLD WEATHER" "DAMAGING FREEZE"
## [19] "DAM BREAK" "DOWNBURST"
## [21] "DROWNING" "DRY MICROBURST"
## [23] "EARLY FROST" "EROSION/CSTL FLOOD"
## [25] "EXCESSIVE SNOW" "EXTENDED COLD"
## [27] "EXTREME COLD" "EXTREME WINDCHILL"
## [29] "FALLING SNOW/ICE" "FLASH FLOOD/FLOOD"
## [31] "FLOOD/FLASH/FLOOD" "FOG"
## [33] "FREEZE" "FREEZING DRIZZLE"
## [35] "FREEZING RAIN" "FREEZING SPRAY"
## [37] "FROST" "GLAZE"
## [39] "GRADIENT WIND" "GUSTY WIND"
## [41] "GUSTY WIND/HAIL" "GUSTY WIND/HVY RAIN"
## [43] "GUSTY WIND/RAIN" "GUSTY WINDS"
## [45] "HARD FREEZE" "HAZARDOUS SURF"
## [47] "HEAT WAVE" "HEAVY RAIN/HIGH SURF"
## [49] "HEAVY SEAS" "HEAVY SNOW SHOWER"
## [51] "HEAVY SURF" "HEAVY SURF AND WIND"
## [53] "HEAVY SURF/HIGH SURF" "HIGH SEAS"
## [55] "HIGH SURF ADVISORY" "HIGH SWELLS"
## [57] "HIGH WATER" "HIGH WIND (G40)"
## [59] "HIGH WINDS" "HURRICANE"
## [61] "HURRICANE EDOUARD" "HYPERTHERMIA/EXPOSURE"
## [63] "HYPOTHERMIA/EXPOSURE" "ICE JAM FLOOD (MINOR"
## [65] "ICE ON ROAD" "ICE ROADS"
## [67] "ICY ROADS" "LAKE EFFECT SNOW"
## [69] "LANDSLIDE" "LANDSLIDES"
## [71] "LANDSLUMP" "LANDSPOUT"
## [73] "LATE SEASON SNOW" "LIGHT FREEZING RAIN"
## [75] "LIGHT SNOW" "LIGHT SNOWFALL"
## [77] "MARINE ACCIDENT" "MARINE TSTM WIND"
## [79] "MICROBURST" "MIXED PRECIP"
## [81] "MIXED PRECIPITATION" "MUDSLIDE"
## [83] "MUD SLIDE" "MUDSLIDES"
## [85] "NON-SEVERE WIND DAMAGE" "NON TSTM WIND"
## [87] "NON-TSTM WIND" "OTHER"
## [89] "RAIN" "RAIN/SNOW"
## [91] "RECORD HEAT" "RIP CURRENTS"
## [93] "RIVER FLOOD" "RIVER FLOODING"
## [95] "ROCK SLIDE" "ROGUE WAVE"
## [97] "ROUGH SEAS" "ROUGH SURF"
## [99] "SMALL HAIL" "SNOW"
## [101] "SNOW AND ICE" "SNOW SQUALL"
## [103] "SNOW SQUALLS" "STORM SURGE"
## [105] "STRONG WINDS" "THUNDERSTORM"
## [107] "THUNDERSTORM WIND (G40)" "TIDAL FLOODING"
## [109] "TORRENTIAL RAINFALL" "TSTM WIND"
## [111] "TSTM WIND 40" "TSTM WIND (41)"
## [113] "TSTM WIND 45" "TSTM WIND AND LIGHTNING"
## [115] "TSTM WIND (G35)" "TSTM WIND (G40)"
## [117] "TSTM WIND G45" "TSTM WIND (G45)"
## [119] "TSTM WIND (G45)" "TSTM WIND/HAIL"
## [121] "TYPHOON" "UNSEASONABLE COLD"
## [123] "UNSEASONABLY COLD" "UNSEASONABLY WARM"
## [125] "UNSEASONAL RAIN" "URBAN/SML STREAM FLD"
## [127] "WARM WEATHER" "WET MICROBURST"
## [129] "WHIRLWIND" "WILD/FOREST FIRE"
## [131] "WIND" "WIND AND WAVE"
## [133] "WIND DAMAGE" "WINDS"
## [135] "WINTER WEATHER MIX" "WINTER WEATHER/MIX"
## [137] "WINTRY MIX"
There are 137 non standard events. The majority of these events can be almost directly mapped to the official ones:
blizzard <- c("BLOWING SNOW","EXCESSIVE SNOW","FALLING SNOW/ICE","SNOW SQUALL","SNOW SQUALLS")
stormdata[stormdata$EVTYPE %in% blizzard,"EVTYPE"] = "BLIZZARD"
coastal_flood <- c("COASTAL FLOODING","COASTAL FLOODING/EROSION","COASTAL FLOODING/EROSION","EROSION/CSTL FLOOD","TIDAL FLOODING")
stormdata[stormdata$EVTYPE %in% coastal_flood,"EVTYPE"] = "COASTAL FLOOD"
cold_wind_chill <- c("EXTENDED COLD","EXTREME COLD","EXTREME WINDCHILL","HYPOTHERMIA/EXPOSURE","UNSEASONABLE COLD","UNSEASONABLY COLD")
stormdata[stormdata$EVTYPE %in% cold_wind_chill,"EVTYPE"] = "COLD/WIND CHILL"
stormdata[stormdata$EVTYPE == "FOG","EVTYPE"] = "DENSE FOG"
stormdata[stormdata$EVTYPE == "BLOWING DUST","EVTYPE"] = "DUST STORM"
frost_freeze <- c("AGRICULTURAL FREEZE","BLACK ICE","COLD","COLD AND SNOW","COLD TEMPERATURE","COLD WEATHER","DAMAGING FREEZE","EARLY FROST","FREEZE","FREEZING DRIZZLE","FREEZING RAIN","FREEZING SPRAY","FROST","GLAZE","HARD FREEZE","ICE ON ROAD","ICE ROADS","ICY ROADS","LIGHT FREEZING RAIN","SNOW AND ICE")
stormdata[stormdata$EVTYPE %in% frost_freeze,"EVTYPE"] = "FROST/FREEZE"
flood <- c("ICE JAM FLOOD (MINOR","RIVER FLOOD","RIVER FLOODING","URBAN/SML STREAM FLD")
stormdata[stormdata$EVTYPE %in% flood,"EVTYPE"] = "FLOOD"
flash_flood <- c("FLASH FLOOD/FLOOD","FLOOD/FLASH/FLOOD")
stormdata[stormdata$EVTYPE %in% flash_flood,"EVTYPE"] = "FLASH FLOOD"
excessive_heat <- c("HEAT WAVE","RECORD HEAT")
stormdata[stormdata$EVTYPE %in% excessive_heat,"EVTYPE"] = "EXCESSIVE HEAT"
stormdata[stormdata$EVTYPE == "SMALL HAIL","EVTYPE"] = "HAIL"
heat <- c("HYPERTHERMIA/EXPOSURE","UNSEASONABLY WARM","WARM WEATHER")
stormdata[stormdata$EVTYPE %in% heat,"EVTYPE"] = "HEAT"
stormdata[stormdata$EVTYPE == "HEAVY SNOW SHOWER","EVTYPE"] = "HEAVY SNOW"
heavy_rain <- c("RAIN","TORRENTIAL RAINFALL","UNSEASONAL RAIN")
stormdata[stormdata$EVTYPE %in% heavy_rain,"EVTYPE"] = "HEAVY RAIN"
heavy_snow <- c("LATE SEASON SNOW", "RAIN/SNOW","SNOW")
stormdata[stormdata$EVTYPE %in% heavy_snow,"EVTYPE"] = "HEAVY SNOW"
high_surf <- c("HAZARDOUS SURF","HEAVY RAIN/HIGH SURF", "HEAVY SURF","HEAVY SURF AND WIND","HEAVY SURF/HIGH SURF","HIGH SURF ADVISORY","ROUGH SURF")
stormdata[stormdata$EVTYPE %in% high_surf,"EVTYPE"] = "HIGH SURF"
hurricane_typhoon <- c("HURRICANE", "HURRICANE EDOUARD","TYPHOON")
stormdata[stormdata$EVTYPE %in% hurricane_typhoon,"EVTYPE"] = "HURRICANE/TYPHOON"
stormdata[stormdata$EVTYPE == "RIP CURRENTS","EVTYPE"] = "RIP CURRENT"
stormdata[stormdata$EVTYPE == "MARINE TSTM WIND","EVTYPE"] = "MARINE THUNDERSTORM WIND"
storm_surge_tide <- c("COASTALSTORM","COASTAL STORM","STORM SURGE")
stormdata[stormdata$EVTYPE %in% storm_surge_tide,"EVTYPE"] = "STORM SURGE/TIDE"
strong_wind <- c("GRADIENT WIND","GUSTY WIND","GUSTY WIND/HAIL","GUSTY WIND/HVY RAIN","GUSTY WIND/RAIN","GUSTY WINDS","HIGH WIND (G40)","HIGH WINDS","NON TSTM WIND","NON-TSTM WIND","WHIRLWIND","WIND","WIND AND WAVE","WIND DAMAGE","WINDS","STRONG WINDS")
stormdata[stormdata$EVTYPE %in% strong_wind,"EVTYPE"] = "STRONG WIND"
tstm_wind <- c("DOWNBURST","DRY MICROBURST","MICROBURST","TSTM WIND","TSTM WIND 40","TSTM WIND (41)","TSTM WIND 45","TSTM WIND AND LIGHTNING","TSTM WIND (G35)","TSTM WIND (G40)","TSTM WIND G45","TSTM WIND (G45)","TSTM WIND (G45)","TSTM WIND/HAIL","THUNDERSTORM","THUNDERSTORM WIND (G40)","WET MICROBURST")
stormdata[stormdata$EVTYPE %in% tstm_wind,"EVTYPE"] = "THUNDERSTORM WIND"
stormdata[stormdata$EVTYPE == "WILD/FOREST FIRE","EVTYPE"] = "WILDFIRE"
winter_weather <- c("WINTER WEATHER MIX","WINTER WEATHER/MIX","WINTRY MIX")
stormdata[stormdata$EVTYPE %in% winter_weather,"EVTYPE"] = "WINTER WEATHER"
Here we get the remaining non standard events which cannot be intuitively mapped to the official ones:
events <- sort(unique(stormdata$EVTYPE))
nostd_events <- events[is.na(match(events, official_events))]
nostd_events
## [1] "ASTRONOMICAL HIGH TIDE" "BEACH EROSION"
## [3] "BRUSH FIRE" "COASTAL EROSION"
## [5] "DAM BREAK" "DROWNING"
## [7] "HEAVY SEAS" "HIGH SEAS"
## [9] "HIGH SWELLS" "HIGH WATER"
## [11] "LAKE EFFECT SNOW" "LANDSLIDE"
## [13] "LANDSLIDES" "LANDSLUMP"
## [15] "LANDSPOUT" "LIGHT SNOW"
## [17] "LIGHT SNOWFALL" "MARINE ACCIDENT"
## [19] "MIXED PRECIP" "MIXED PRECIPITATION"
## [21] "MUDSLIDE" "MUD SLIDE"
## [23] "MUDSLIDES" "NON-SEVERE WIND DAMAGE"
## [25] "OTHER" "ROCK SLIDE"
## [27] "ROGUE WAVE" "ROUGH SEAS"
and whose observations are then filtered out from the dataset:
stormdata <-
stormdata %>% filter(!(EVTYPE %in% nostd_events))
Finally, we retain only the variables necessary for the analysis:
stormdata <-
stormdata %>% select(BGN_DATE, STATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, CROPDMG)
A quick glimpse into the filtered dataset:
dim(stormdata)
## [1] 200880 7
head(stormdata)
## # A tibble: 6 x 7
## BGN_DATE STATE EVTYPE FATALITIES INJURIES PROPDMG CROPDMG
## <dttm> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1996-01-06 AL WINTER STORM 0 0 380000 38000
## 2 1996-01-11 AL TORNADO 0 0 100000 0
## 3 1996-01-11 AL THUNDERSTORM WIND 0 0 3000 0
## 4 1996-01-11 AL THUNDERSTORM WIND 0 0 5000 0
## 5 1996-01-11 AL THUNDERSTORM WIND 0 0 2000 0
## 6 1996-01-18 AL HIGH WIND 0 0 400000 0
The filtered dataset contains 200880 each with 7 variables.
These are the recorded event types:
events <- sort(unique(stormdata$EVTYPE))
Among the official 48 event types, 46 are recorded.
official_events[is.na(match(official_events, events))]
## [1] "DEBRIS FLOW" "SLEET"
In this dataset, “DEBRIS FLOW” and “SLEET” are not recorded.
In this exploratory and subsequent analysis, for the purpose of investigating health and economic consequences it is more appropriate to split the dataset in two parts: one with only health consequences and one with only economic consequences:
health <-
stormdata %>% filter(FATALITIES !=0 | INJURIES != 0)
property <-
stormdata %>% filter(PROPDMG != 0 | CROPDMG != 0)
Events generally tend to produce far more economic damage than human loss or injuries:
nrow(health)
## [1] 12716
nrow(property)
## [1] 194116
This is a summary in terms of consequences to the population:
summary(health$FATALITIES)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.6817 1.0000 158.0000
summary(health$INJURIES)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 0.00 1.00 4.55 2.00 1150.00
On average, there has been less around one casualty/five injuries per event. The distributions seem to be particularly skewed towards low values and the max values indicate the presence of outliers.
health[which.max(health$FATALITIES),]
## # A tibble: 1 x 7
## BGN_DATE STATE EVTYPE FATALITIES INJURIES PROPDMG CROPDMG
## <dttm> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 2011-05-22 MO TORNADO 158 1150 2.8e+09 0
The max values both correspond to the Tornado outbreak sequence of May 21-26 2011.
These are the total number of casualties and injuries:
sum(health$FATALITIES)
## [1] 8668
sum(health$INJURIES)
## [1] 57863
We now transform the data to get the total number of fatalities and injuries grouped by event type:
health_sum_by_evt <-
health %>%
group_by(EVTYPE) %>%
summarise(tot_fat = sum(FATALITIES), tot_inj = sum(INJURIES))
## Warning: Grouping rowwise data frame strips rowwise nature
head(health_sum_by_evt)
## # A tibble: 6 x 3
## EVTYPE tot_fat tot_inj
## <chr> <dbl> <dbl>
## 1 AVALANCHE 223 156
## 2 BLIZZARD 75 424
## 3 COASTAL FLOOD 6 8
## 4 COLD/WIND CHILL 235 96
## 5 DENSE FOG 69 855
## 6 DROUGHT 0 4
This is a summary in terms of economic damage:
summary(property$PROPDMG)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000e+00 2.000e+03 1.000e+04 1.888e+06 3.000e+04 1.150e+11
summary(property$CROPDMG)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000e+00 0.000e+00 0.000e+00 1.789e+05 0.000e+00 1.510e+09
On average, damaging events produce hundreds of thousands of dollars of damage. The vast majority of them do not have major economic consequences. There are notable exceptions:
property[which.max(property$PROPDMG),]
## # A tibble: 1 x 7
## BGN_DATE STATE EVTYPE FATALITIES INJURIES PROPDMG CROPDMG
## <dttm> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 2006-01-01 CA FLOOD 0 0 1.15e+11 32500000
property[which.max(property$CROPDMG),]
## # A tibble: 1 x 7
## BGN_DATE STATE EVTYPE FATALITIES INJURIES PROPDMG
## <dttm> <chr> <chr> <dbl> <dbl> <dbl>
## 1 2005-08-29 MS HURRICANE/TYPHOON 15 104 5.88e+09
## # ... with 1 more variables: CROPDMG <dbl>
The former event refers to flooding in California occurred in January 2006. The latter event is Hurricane Katrina which hit the US Gulf Coast in August 2005.
These are the estimates of total values of property and crop damages:
sum(property$PROPDMG)
## [1] 366426101780
sum(health$CROPDMG)
## [1] 5368431100
Overall, since 1996 various events have produces damages for hundreds of billions of dollars on property and around five billions of dollars in agriculture.
Let’s transform the data by merging property and crop damange estimates together and grouping by event type:
eco_sum_by_evt <-
property %>%
mutate(DMG = PROPDMG + CROPDMG) %>%
group_by(EVTYPE) %>%
summarise(tot_dmg = sum(DMG))
## Warning: Grouping rowwise data frame strips rowwise nature
head(eco_sum_by_evt)
## # A tibble: 6 x 2
## EVTYPE tot_dmg
## <chr> <dbl>
## 1 ASTRONOMICAL LOW TIDE 320000
## 2 AVALANCHE 3711800
## 3 BLIZZARD 534768950
## 4 COASTAL FLOOD 406452560
## 5 COLD/WIND CHILL 1379320900
## 6 DENSE FOG 20464500
We answer the question by separately considering the effects of each event type in terms of fatalities and injuries. Firstly, we determine the deadliest event types:
head(health_sum_by_evt %>% arrange(desc(tot_fat)) %>% select(EVTYPE, tot_fat))
## # A tibble: 6 x 2
## EVTYPE tot_fat
## <chr> <dbl>
## 1 EXCESSIVE HEAT 1799
## 2 TORNADO 1511
## 3 FLASH FLOOD 887
## 4 LIGHTNING 651
## 5 RIP CURRENT 542
## 6 FLOOD 444
This is an overview of the impact of all event types on human loss:
library(ggplot2)
g <- ggplot(health_sum_by_evt, aes(reorder(EVTYPE,tot_fat), tot_fat)) +
geom_col() +
coord_flip() +
xlab("Event Type") + ylab("Tot Fatalities (1996-2011)") +
ggtitle("Tot Fatalities per Event Type (1996-2011)")
plot(g)
Histogram of the total number of fatalities in the 1996-2011 time period grouped by event type
Secondly, we derive the event types which are more impactful in terms of injuries to the population:
head(health_sum_by_evt %>% arrange(desc(tot_inj)) %>% select(EVTYPE, tot_inj))
## # A tibble: 6 x 2
## EVTYPE tot_inj
## <chr> <dbl>
## 1 TORNADO 20667
## 2 FLOOD 6838
## 3 EXCESSIVE HEAT 6461
## 4 THUNDERSTORM WIND 5154
## 5 LIGHTNING 4141
## 6 FLASH FLOOD 1674
and similarly we plot an overview of the total injuries for all event types:
g <- ggplot(health_sum_by_evt, aes(reorder(EVTYPE,tot_inj), tot_inj)) +
geom_col() +
coord_flip() +
xlab("Event Type") + ylab("Tot Injuries (1996-2011)") +
ggtitle("Tot Injuries per Event Type (1996-2011)")
plot(g)
Histogram of the total number of injuries in the 1996-2011 time period grouped by event type
If we consider the combined effect on fatalities and injuries, the most harmful events are tornadoes, excessive heat, floods and lightnings. Besides these, rip currents have a certain impact in terms of fatalities and thunderstorm winds in terms of injuries.
Let’s show the most damaging event types:
head(eco_sum_by_evt %>% arrange(desc(tot_dmg)))
## # A tibble: 6 x 2
## EVTYPE tot_dmg
## <chr> <dbl>
## 1 FLOOD 149142742700
## 2 HURRICANE/TYPHOON 87068996810
## 3 STORM SURGE/TIDE 47835629000
## 4 TORNADO 24900370720
## 5 HAIL 17092035870
## 6 FLASH FLOOD 16557170610
and an overview of the estimates of the damage for each event type:
g <- ggplot(eco_sum_by_evt, aes(reorder(EVTYPE,tot_dmg), tot_dmg)) +
geom_col() +
coord_flip() +
xlab("Event Type") + ylab("Tot Damage in US $ (1996-2011)") +
ggtitle("Estimates of Total Damage (1996-2011)")
plot(g)
Histogram of the total damage in the 1996-2011 time period grouped by event type
As with the case of human health, floods have major (economic) consequences together with strong wind storms (hurricanes/typhoons/tornadoes). Also, it turns out that also storm surge/tides and hail have major effects in terms of economic damage.