Storms cause public health and economic problems in the US. Preventing fatalities, injuries, and asset loss is a major concern for local authorities.
The goal of this analysis was to identify the weather events most hazardous to population health and the economy in the US. The analysis is based on US National Oceanic and Atmospheric Administration (NOAA) data covering events from 1950 to 2011.
The analysis shows that tornadoes are the most harmful events for population health and that floods are the most harmful for the economy.
This analysis used four packages: reshape2, dplyr, lubridate, ggplot2.
library(reshape2)
library(dplyr)
library(lubridate)
library(ggplot2)
The NOAA data is available from cloud storage here. The data description is here. More information about the storm data is available in the FAQ.
The compressed file is downloaded from the URL if it is not already in the working directory, and the data is then read with read.csv. The data is validated by checking the file size and data dimensions against the values given in the supporting forum.
raw_data_url <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists("repdata_data_StormData.csv.bz2")) {
download.file(url = raw_data_url,
destfile = "repdata_data_StormData.csv.bz2")
}
file_info <- file.info("repdata_data_StormData.csv.bz2")
storm_data <- read.csv("repdata_data_StormData.csv.bz2", sep = ",", header = TRUE)
stopifnot(file_info[,1] == 49177144)
stopifnot(dim(storm_data) == c(902297,37))
Seven variables are required for this analysis: event type, fatalities, injuries, property damage, property damage multiplier, crop damage, and crop damage multiplier. They can be identified from the variable names in the raw data.
names(storm_data)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
The list of necessary variables follows:
- EVTYPE - event type;
- FATALITIES - number of fatalities in event;
- INJURIES - number of injuries in event;
- PROPDMG - property damage;
- PROPDMGEXP - property damage multiplier;
- CROPDMG - crop damage;
- CROPDMGEXP - crop damage multiplier.
The dataset is reduced to these variables.
subset_data1 <- storm_data %>%
select(event_type = EVTYPE,
fatalities = FATALITIES,
injuries = INJURIES,
property_damage = PROPDMG,
property_multiplier = PROPDMGEXP,
crop_damage = CROPDMG,
crop_multiplier = CROPDMGEXP)
Some observations have the event type NONE; these are dropped. In addition, only observations with at least one value greater than zero (fatalities, injuries, property damage, or crop damage) are kept.
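subset_data2 <- subset_data1 %>%
  # Parentheses are required: `&` binds more tightly than `|` in R,
  # so without them the NONE test would only guard crop_damage.
  filter((fatalities > 0 | injuries > 0 |
          property_damage > 0 | crop_damage > 0) & event_type != "NONE")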
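In R, `&` takes precedence over `|`, so the way the conditions of a filter like the one above are grouped changes which rows survive. The following is a minimal sketch on hypothetical toy data; the `toy` data frame and its values are illustrative and not taken from the NOAA file.

```r
# Toy data: one "NONE" row with a fatality, one "NONE" row with crop damage
# only, and one ordinary row.
toy <- data.frame(
  event_type  = c("NONE", "NONE", "HAIL"),
  fatalities  = c(1, 0, 0),
  crop_damage = c(0, 5, 5),
  stringsAsFactors = FALSE
)

# Without parentheses, `&` is evaluated before `|`, so the event-type test
# only guards the crop_damage condition.
loose <- toy[toy$fatalities > 0 |
             toy$crop_damage > 0 & toy$event_type != "NONE", ]

# With parentheses, the event-type test applies to every harm condition.
strict <- toy[(toy$fatalities > 0 | toy$crop_damage > 0) &
              toy$event_type != "NONE", ]

nrow(loose)   # 2: the "NONE" row with a fatality slips through
nrow(strict)  # 1: only the HAIL row remains
```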
The subset is reduced to 254633 rows. Check that there are no missing values in the subset.
dim(subset_data2)
## [1] 254633 7
sum(complete.cases(subset_data2))
## [1] 254633
There are 48 official event types. The subset contains 488 unique event types.
length(unique(subset_data2$event_type))
## [1] 488
The reason for the large number of types is that the dataset contains typos and the same events written in different letter case, such as wind and WIND.
unique(subset_data2$event_type)
## [1] TORNADO TSTM WIND
## [3] HAIL ICE STORM/FLASH FLOOD
## [5] WINTER STORM HURRICANE OPAL/HIGH WINDS
## [7] THUNDERSTORM WINDS HURRICANE ERIN
## [9] HURRICANE OPAL HEAVY RAIN
## [11] LIGHTNING THUNDERSTORM WIND
## [13] DENSE FOG RIP CURRENT
## [15] THUNDERSTORM WINS FLASH FLOODING
## [17] FLASH FLOOD TORNADO F0
## [19] THUNDERSTORM WINDS LIGHTNING THUNDERSTORM WINDS/HAIL
## [21] HEAT HIGH WINDS
## [23] WIND HEAVY RAINS
## [25] LIGHTNING AND HEAVY RAIN THUNDERSTORM WINDS HAIL
## [27] COLD HEAVY RAIN/LIGHTNING
## [29] FLASH FLOODING/THUNDERSTORM WI FLOODING
## [31] WATERSPOUT EXTREME COLD
## [33] LIGHTNING/HEAVY RAIN BREAKUP FLOODING
## [35] HIGH WIND FREEZE
## [37] RIVER FLOOD HIGH WINDS HEAVY RAINS
## [39] AVALANCHE MARINE MISHAP
## [41] HIGH TIDES HIGH WIND/SEAS
## [43] HIGH WINDS/HEAVY RAIN HIGH SEAS
## [45] COASTAL FLOOD SEVERE TURBULENCE
## [47] RECORD RAINFALL HEAVY SNOW
## [49] HEAVY SNOW/WIND DUST STORM
## [51] FLOOD APACHE COUNTY
## [53] SLEET DUST DEVIL
## [55] ICE STORM EXCESSIVE HEAT
## [57] THUNDERSTORM WINDS/FUNNEL CLOU GUSTY WINDS
## [59] FLOODING/HEAVY RAIN HEAVY SURF COASTAL FLOODING
## [61] HIGH SURF WILD FIRES
## [63] HIGH WINTER STORM HIGH WINDS
## [65] WINTER STORMS MUDSLIDES
## [67] RAINSTORM SEVERE THUNDERSTORM
## [69] SEVERE THUNDERSTORMS SEVERE THUNDERSTORM WINDS
## [71] THUNDERSTORMS WINDS FLOOD/FLASH FLOOD
## [73] FLOOD/RAIN/WINDS THUNDERSTORMS
## [75] FLASH FLOOD WINDS WINDS
## [77] FUNNEL CLOUD HIGH WIND DAMAGE
## [79] STRONG WIND HEAVY SNOWPACK
## [81] FLASH FLOOD/ HEAVY SURF
## [83] DRY MIRCOBURST WINDS DRY MICROBURST
## [85] URBAN FLOOD THUNDERSTORM WINDSS
## [87] MICROBURST WINDS HEAT WAVE
## [89] UNSEASONABLY WARM COASTAL FLOODING
## [91] STRONG WINDS BLIZZARD
## [93] WATERSPOUT/TORNADO WATERSPOUT TORNADO
## [95] STORM SURGE URBAN/SMALL STREAM FLOOD
## [97] WATERSPOUT- TORNADOES, TSTM WIND, HAIL
## [99] TROPICAL STORM ALBERTO TROPICAL STORM
## [101] TROPICAL STORM GORDON TROPICAL STORM JERRY
## [103] LIGHTNING THUNDERSTORM WINDS URBAN FLOODING
## [105] MINOR FLOODING WATERSPOUT-TORNADO
## [107] LIGHTNING INJURY LIGHTNING AND THUNDERSTORM WIN
## [109] FLASH FLOODS THUNDERSTORM WINDS53
## [111] WILDFIRE DAMAGING FREEZE
## [113] THUNDERSTORM WINDS 13 HURRICANE
## [115] SNOW LIGNTNING
## [117] FROST FREEZING RAIN/SNOW
## [119] HIGH WINDS/ THUNDERSNOW
## [121] FLOODS COOL AND WET
## [123] HEAVY RAIN/SNOW GLAZE ICE
## [125] MUD SLIDE HIGH WINDS
## [127] RURAL FLOOD MUD SLIDES
## [129] EXTREME HEAT DROUGHT
## [131] COLD AND WET CONDITIONS EXCESSIVE WETNESS
## [133] SLEET/ICE STORM GUSTNADO
## [135] FREEZING RAIN SNOW AND HEAVY SNOW
## [137] GROUND BLIZZARD EXTREME WIND CHILL
## [139] MAJOR FLOOD SNOW/HEAVY SNOW
## [141] FREEZING RAIN/SLEET ICE JAM FLOODING
## [143] COLD AIR TORNADO WIND DAMAGE
## [145] FOG TSTM WIND 55
## [147] SMALL STREAM FLOOD THUNDERTORM WINDS
## [149] HAIL/WINDS SNOW AND ICE
## [151] WIND STORM GRASS FIRES
## [153] LAKE FLOOD HAIL/WIND
## [155] WIND/HAIL ICE
## [157] SNOW AND ICE STORM THUNDERSTORM WINDS
## [159] WINTER WEATHER DROUGHT/EXCESSIVE HEAT
## [161] THUNDERSTORMS WIND TUNDERSTORM WIND
## [163] URBAN AND SMALL STREAM FLOODIN THUNDERSTORM WIND/LIGHTNING
## [165] HEAVY RAIN/SEVERE WEATHER THUNDERSTORM
## [167] WATERSPOUT/ TORNADO LIGHTNING.
## [169] HURRICANE-GENERATED SWELLS RIVER AND STREAM FLOOD
## [171] HIGH WINDS/COASTAL FLOOD RAIN
## [173] RIVER FLOODING ICE FLOES
## [175] THUNDERSTORM WIND G50 LIGHTNING FIRE
## [177] HEAVY LAKE SNOW RECORD COLD
## [179] HEAVY SNOW/FREEZING RAIN COLD WAVE
## [181] DUST DEVIL WATERSPOUT TORNADO F3
## [183] TORNDAO FLOOD/RIVER FLOOD
## [185] MUD SLIDES URBAN FLOODING TORNADO F1
## [187] GLAZE/ICE STORM GLAZE
## [189] HEAVY SNOW/WINTER STORM MICROBURST
## [191] AVALANCE BLIZZARD/WINTER STORM
## [193] DUST STORM/HIGH WINDS ICE JAM
## [195] FOREST FIRES FROST\\FREEZE
## [197] THUNDERSTORM WINDS. HVY RAIN
## [199] HAIL 150 HAIL 075
## [201] HAIL 100 THUNDERSTORM WIND G55
## [203] HAIL 125 THUNDERSTORM WIND G60
## [205] THUNDERSTORM WINDS G60 HARD FREEZE
## [207] HAIL 200 HEAVY SNOW AND HIGH WINDS
## [209] HEAVY SNOW/HIGH WINDS & FLOOD HEAVY RAIN AND FLOOD
## [211] RIP CURRENTS/HEAVY SURF URBAN AND SMALL
## [213] WILDFIRES FOG AND COLD TEMPERATURES
## [215] SNOW/COLD FLASH FLOOD FROM ICE JAMS
## [217] TSTM WIND G58 MUDSLIDE
## [219] HEAVY SNOW SQUALLS SNOW SQUALL
## [221] SNOW/ICE STORM HEAVY SNOW/SQUALLS
## [223] HEAVY SNOW-SQUALLS ICY ROADS
## [225] HEAVY MIX SNOW FREEZING RAIN
## [227] SNOW/SLEET SNOW/FREEZING RAIN
## [229] SNOW SQUALLS SNOW/SLEET/FREEZING RAIN
## [231] RECORD SNOW HAIL 0.75
## [233] RECORD HEAT THUNDERSTORM WIND 65MPH
## [235] THUNDERSTORM WIND/ TREES THUNDERSTORM WIND/AWNING
## [237] THUNDERSTORM WIND 98 MPH THUNDERSTORM WIND TREES
## [239] TORNADO F2 RIP CURRENTS
## [241] HURRICANE EMILY COASTAL SURGE
## [243] HURRICANE GORDON HURRICANE FELIX
## [245] THUNDERSTORM WIND 60 MPH THUNDERSTORM WINDS 63 MPH
## [247] THUNDERSTORM WIND/ TREE THUNDERSTORM DAMAGE TO
## [249] THUNDERSTORM WIND 65 MPH FLASH FLOOD - HEAVY RAIN
## [251] THUNDERSTORM WIND. FLASH FLOOD/ STREET
## [253] BLOWING SNOW HEAVY SNOW/BLIZZARD
## [255] THUNDERSTORM HAIL THUNDERSTORM WINDSHAIL
## [257] LIGHTNING WAUSEON THUDERSTORM WINDS
## [259] ICE AND SNOW STORM FORCE WINDS
## [261] HEAVY SNOW/ICE LIGHTING
## [263] HIGH WIND/HEAVY SNOW THUNDERSTORM WINDS AND
## [265] HEAVY PRECIPITATION HIGH WIND/BLIZZARD
## [267] TSTM WIND DAMAGE FLOOD FLASH
## [269] RAIN/WIND SNOW/ICE
## [271] HAIL 75 HEAT WAVE DROUGHT
## [273] HEAVY SNOW/BLIZZARD/AVALANCHE HEAT WAVES
## [275] UNSEASONABLY WARM AND DRY UNSEASONABLY COLD
## [277] RECORD/EXCESSIVE HEAT THUNDERSTORM WIND G52
## [279] HIGH WAVES FLASH FLOOD/FLOOD
## [281] FLOOD/FLASH LOW TEMPERATURE
## [283] HEAVY RAINS/FLOODING THUNDERESTORM WINDS
## [285] THUNDERSTORM WINDS/FLOODING HYPOTHERMIA
## [287] THUNDEERSTORM WINDS THUNERSTORM WINDS
## [289] HIGH WINDS/COLD COLD/WINDS
## [291] SNOW/ BITTER COLD COLD WEATHER
## [293] RAPIDLY RISING WATER WILD/FOREST FIRE
## [295] ICE/STRONG WINDS SNOW/HIGH WINDS
## [297] HIGH WINDS/SNOW SNOWMELT FLOODING
## [299] HEAVY SNOW AND STRONG WINDS SNOW ACCUMULATION
## [301] SNOW/ ICE SNOW/BLOWING SNOW
## [303] TORNADOES THUNDERSTORM WIND/HAIL
## [305] FREEZING DRIZZLE HAIL 175
## [307] FLASH FLOODING/FLOOD HAIL 275
## [309] HAIL 450 EXCESSIVE RAINFALL
## [311] THUNDERSTORMW HAILSTORM
## [313] TSTM WINDS TSTMW
## [315] TSTM WIND 65) TROPICAL STORM DEAN
## [317] THUNDERSTORM WINDS/ FLOOD LANDSLIDE
## [319] HIGH WIND AND SEAS THUNDERSTORMWINDS
## [321] WILD/FOREST FIRES HEAVY SEAS
## [323] HAIL DAMAGE FLOOD & HEAVY RAIN
## [325] ? THUNDERSTROM WIND
## [327] FLOOD/FLASHFLOOD HIGH WATER
## [329] HIGH WIND 48 LANDSLIDES
## [331] URBAN/SMALL STREAM BRUSH FIRE
## [333] HEAVY SHOWER HEAVY SWELLS
## [335] URBAN SMALL URBAN FLOODS
## [337] FLASH FLOOD/LANDSLIDE HEAVY RAIN/SMALL STREAM URBAN
## [339] FLASH FLOOD LANDSLIDES TSTM WIND/HAIL
## [341] Other Ice jam flood (minor
## [343] Tstm Wind URBAN/SML STREAM FLD
## [345] ROUGH SURF Heavy Surf
## [347] Dust Devil Marine Accident
## [349] Freeze Strong Wind
## [351] COASTAL STORM Erosion/Cstl Flood
## [353] River Flooding Damaging Freeze
## [355] Beach Erosion High Surf
## [357] Heavy Rain/High Surf Unseasonable Cold
## [359] Early Frost Wintry Mix
## [361] Extreme Cold Coastal Flooding
## [363] Torrential Rainfall Landslump
## [365] Hurricane Edouard Coastal Storm
## [367] TIDAL FLOODING Tidal Flooding
## [369] Strong Winds EXTREME WINDCHILL
## [371] Glaze Extended Cold
## [373] Whirlwind Heavy snow shower
## [375] Light snow Light Snow
## [377] MIXED PRECIP Freezing Spray
## [379] DOWNBURST Mudslides
## [381] Microburst Mudslide
## [383] Cold Coastal Flood
## [385] Snow Squalls Wind Damage
## [387] Light Snowfall Freezing Drizzle
## [389] Gusty wind/rain GUSTY WIND/HVY RAIN
## [391] Wind Cold Temperature
## [393] Heat Wave Snow
## [395] COLD AND SNOW RAIN/SNOW
## [397] TSTM WIND (G45) Gusty Winds
## [399] GUSTY WIND TSTM WIND 40
## [401] TSTM WIND 45 TSTM WIND (41)
## [403] TSTM WIND (G40) Frost/Freeze
## [405] AGRICULTURAL FREEZE OTHER
## [407] Hypothermia/Exposure HYPOTHERMIA/EXPOSURE
## [409] Lake Effect Snow Freezing Rain
## [411] Mixed Precipitation BLACK ICE
## [413] COASTALSTORM LIGHT SNOW
## [415] DAM BREAK Gusty winds
## [417] blowing snow GRADIENT WIND
## [419] TSTM WIND AND LIGHTNING gradient wind
## [421] Gradient wind Freezing drizzle
## [423] WET MICROBURST Heavy surf and wind
## [425] TYPHOON HIGH SWELLS
## [427] SMALL HAIL UNSEASONAL RAIN
## [429] COASTAL FLOODING/EROSION TSTM WIND (G45)
## [431] TSTM WIND (G45) HIGH WIND (G40)
## [433] TSTM WIND (G35) COASTAL EROSION
## [435] SEICHE COASTAL FLOODING/EROSION
## [437] HYPERTHERMIA/EXPOSURE WINTRY MIX
## [439] ROCK SLIDE GUSTY WIND/HAIL
## [441] TSTM WIND LANDSPOUT
## [443] EXCESSIVE SNOW LAKE EFFECT SNOW
## [445] FLOOD/FLASH/FLOOD MIXED PRECIPITATION
## [447] WIND AND WAVE LIGHT FREEZING RAIN
## [449] ICE ROADS ROUGH SEAS
## [451] TSTM WIND G45 NON-SEVERE WIND DAMAGE
## [453] WARM WEATHER THUNDERSTORM WIND (G40)
## [455] FLASH FLOOD LATE SEASON SNOW
## [457] WINTER WEATHER MIX ROGUE WAVE
## [459] FALLING SNOW/ICE NON-TSTM WIND
## [461] NON TSTM WIND BLOWING DUST
## [463] VOLCANIC ASH HIGH SURF ADVISORY
## [465] HAZARDOUS SURF WHIRLWIND
## [467] ICE ON ROAD DROWNING
## [469] EXTREME COLD/WIND CHILL MARINE TSTM WIND
## [471] HURRICANE/TYPHOON WINTER WEATHER/MIX
## [473] FROST/FREEZE ASTRONOMICAL HIGH TIDE
## [475] HEAVY SURF/HIGH SURF TROPICAL DEPRESSION
## [477] LAKE-EFFECT SNOW MARINE HIGH WIND
## [479] TSUNAMI STORM SURGE/TIDE
## [481] COLD/WIND CHILL LAKESHORE FLOOD
## [483] MARINE THUNDERSTORM WIND MARINE STRONG WIND
## [485] ASTRONOMICAL LOW TIDE DENSE SMOKE
## [487] MARINE HAIL FREEZING FOG
## 985 Levels: HIGH SURF ADVISORY COASTAL FLOOD FLASH FLOOD ... WND
One way to reduce this problem is to convert all event types to lowercase. This reduces the number of event types to 447.
subset_data3 <- subset_data2 %>%
mutate(event_type = tolower(event_type))
length(unique(subset_data3$event_type))
## [1] 447
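Lowercasing removes differences in case only. Some labels in the listing above appear more than once, which suggests they differ by whitespace that the printed output hides; that is an assumption, not something verified here. A minimal sketch of a slightly broader normalisation follows, using a hypothetical helper `normalise_label` and made-up sample labels.

```r
# Hypothetical sample of raw labels that differ only by case or whitespace.
labels <- c("WIND", "Wind", " wind", "Heavy  Snow", "HEAVY SNOW")

normalise_label <- function(x) {
  x <- tolower(x)                  # fold case
  x <- trimws(x)                   # drop leading and trailing blanks
  gsub("[[:space:]]+", " ", x)     # collapse repeated internal blanks
}

length(unique(labels))                   # 5
length(unique(normalise_label(labels)))  # 2
```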
Further reduction is achieved by grouping the event types using keywords from the official event type names. Events that do not match any keyword are grouped into other.
subset_data3$event <- "other"
subset_data3$event[grep("avalanche", subset_data3$event_type)] <- "avalanche"
subset_data3$event[grep("blizzard", subset_data3$event_type)] <- "snow"
subset_data3$event[grep("flood", subset_data3$event_type)] <- "flood"
subset_data3$event[grep("wind", subset_data3$event_type)] <- "wind"
subset_data3$event[grep("fog", subset_data3$event_type)] <- "fog"
subset_data3$event[grep("cold", subset_data3$event_type)] <- "cold"
subset_data3$event[grep("chill", subset_data3$event_type)] <- "cold"
subset_data3$event[grep("frost", subset_data3$event_type)] <- "cold"
subset_data3$event[grep("freeze", subset_data3$event_type)] <- "cold"
subset_data3$event[grep("tornado", subset_data3$event_type)] <- "tornado"
subset_data3$event[grep("hail", subset_data3$event_type)] <- "hail"
subset_data3$event[grep("winds", subset_data3$event_type)] <- "wind"
subset_data3$event[grep("win", subset_data3$event_type)] <- "wind"
subset_data3$event[grep("wins", subset_data3$event_type)] <- "wind"
subset_data3$event[grep("storm", subset_data3$event_type)] <- "storm"
subset_data3$event[grep("rainstorm", subset_data3$event_type)] <- "storm"
subset_data3$event[grep("thunderstorm", subset_data3$event_type)] <- "storm"
subset_data3$event[grep("snow", subset_data3$event_type)] <- "snow"
subset_data3$event[grep("rain", subset_data3$event_type)] <- "rain"
subset_data3$event[grep("heat", subset_data3$event_type)] <- "heat"
subset_data3$event[grep("hurricane", subset_data3$event_type)] <- "hurricane"
subset_data3$event[grep("fld", subset_data3$event_type)] <- "flood"
subset_data3$event[grep("current", subset_data3$event_type)] <- "waves"
subset_data3$event[grep("surf", subset_data3$event_type)] <- "waves"
subset_data3$event[grep("fire", subset_data3$event_type)] <- "fire"
subset_data3$event[grep("water", subset_data3$event_type)] <- "flood"
subset_data3$event[grep("wave", subset_data3$event_type)] <- "waves"
subset_data3$event[grep("tsunami", subset_data3$event_type)] <- "waves"
subset_data3$event[grep("lightning", subset_data3$event_type)] <- "storm"
subset_data3$event[grep("warm", subset_data3$event_type)] <- "heat"
subset_data3$event[grep("torndao", subset_data3$event_type)] <- "tornado"
subset_data3$event[grep("high tides", subset_data3$event_type)] <- "waves"
subset_data3$event[grep("high seas", subset_data3$event_type)] <- "waves"
subset_data3$event[grep("marine mishap", subset_data3$event_type)] <- "waves"
subset_data3$event[grep("slide", subset_data3$event_type)] <- "slides"
subset_data3$event[grep("dust devil", subset_data3$event_type)] <- "heat"
subset_data3$event[grep("dry microburst", subset_data3$event_type)] <- "heat"
subset_data3$event[grep("low temperature", subset_data3$event_type)] <- "cold"
subset_data3$event[grep("freezing spray", subset_data3$event_type)] <- "cold"
subset_data3$event[grep("dam break", subset_data3$event_type)] <- "flood"
This operation reduces the event types in the dataset to 16 unique values.
length(unique(subset_data3$event))
## [1] 16
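The keyword grouping can also be sketched as an ordered rule table applied in a loop. This is only an illustration under stated assumptions: `rules`, `classify_events`, and `toy_types` are hypothetical names, and the table holds a small subset of the keywords used above. As with the sequential assignments, a later rule overrides an earlier one.

```r
# Ordered pattern -> group table (a hypothetical subset of the rules above).
rules <- data.frame(
  pattern = c("flood", "wind", "tornado", "storm", "fld"),
  group   = c("flood", "wind", "tornado", "storm", "flood"),
  stringsAsFactors = FALSE
)

classify_events <- function(event_type, rules, default = "other") {
  event <- rep(default, length(event_type))
  for (i in seq_len(nrow(rules))) {
    # Literal substring match; rows matched by a later rule are overwritten.
    hit <- grepl(rules$pattern[i], event_type, fixed = TRUE)
    event[hit] <- rules$group[i]
  }
  event
}

toy_types <- c("flash flood", "thunderstorm wind", "tornado f2",
               "urban/sml stream fld", "volcanic ash")
classify_events(toy_types, rules)
# "flood" "storm" "tornado" "flood" "other"
```

Note that "thunderstorm wind" ends up as storm rather than wind because the storm rule comes later in the table, so rule order is part of the classification.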
The top 10 event types by number of rows cover 99% of all observations.
sum(sort(table(subset_data3$event), decreasing = TRUE)[1:10]) / nrow(subset_data3)
## [1] 0.9927072
The dataset is reduced to the top 10 events.
sum(sort(table(subset_data3$event), decreasing = TRUE)[1:10])
## [1] 252776
sort(table(subset_data3$event), decreasing = TRUE)[1:10]
##
## wind storm tornado flood hail snow fire rain heat waves
## 74491 72246 39961 33223 26157 2134 1258 1246 1126 934
subset_data4 <- subset_data3 %>%
filter(event %in% c("wind", "storm", "tornado", "flood", "hail", "snow",
"fire", "rain", "heat", "waves"))
dim(subset_data4)
## [1] 252776 8
Property damage multiplier and crop damage multiplier contain different units.
unique(subset_data4$property_multiplier)
## [1] K M B + 0 5 m 6 4 h 2 7 3 H -
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
unique(subset_data4$crop_multiplier)
## [1] K M B ? 0 k
## Levels: ? 0 2 B k K m M
Some units, such as k or K, m or M, and B, denote thousands, millions, and billions of USD (the code below also treats h or H as hundreds). Digits from 1 to 8 give the power of ten for the multiplier:
\[ e = 10^k \]
where $e$ is the multiplier and $k$ is a digit from 1 to 8.
Other symbols do not translate to a multiplier directly. For the purpose of this analysis, those symbols are treated as a multiplier of 1 (exponent 0).
subset_data5 <- subset_data4 %>%
mutate(property_multiplier = tolower(property_multiplier),
crop_multiplier = tolower(crop_multiplier))
subset_data5$property_multiplier <- as.character(subset_data5$property_multiplier)
subset_data5$property_multiplier[is.na(subset_data5$property_multiplier)] <- 0
subset_data5$property_multiplier[!grepl("k|m|b|h|2|3|4|5|6|7", subset_data5$property_multiplier)] <- 0
subset_data5$property_multiplier[grep("k", subset_data5$property_multiplier)] <- "3"
subset_data5$property_multiplier[grep("m", subset_data5$property_multiplier)] <- "6"
subset_data5$property_multiplier[grep("b", subset_data5$property_multiplier)] <- "9"
subset_data5$property_multiplier[grep("h", subset_data5$property_multiplier)] <- "2"
subset_data5$crop_multiplier <- as.character(subset_data5$crop_multiplier)
subset_data5$crop_multiplier[is.na(subset_data5$crop_multiplier)] <- 0
subset_data5$crop_multiplier[!grepl("k|m|b|h|2|3|4|5|6|7", subset_data5$crop_multiplier)] <- 0
subset_data5$crop_multiplier[grep("k", subset_data5$crop_multiplier)] <- "3"
subset_data5$crop_multiplier[grep("m", subset_data5$crop_multiplier)] <- "6"
subset_data5$crop_multiplier[grep("b", subset_data5$crop_multiplier)] <- "9"
subset_data5$crop_multiplier[grep("h", subset_data5$crop_multiplier)] <- "2"
subset_data5$property_multiplier <- as.numeric(as.character(subset_data5$property_multiplier))
subset_data5$crop_multiplier <- as.numeric(as.character(subset_data5$crop_multiplier))
subset_data5 <- subset_data5 %>%
mutate(prop_damage = property_damage * 10^property_multiplier,
crop_damage = crop_damage * 10^crop_multiplier) %>%
select(event, fatalities, injuries, crop_damage, prop_damage)
head(subset_data5)
## event fatalities injuries crop_damage prop_damage
## 1 tornado 0 15 0 25000
## 2 tornado 0 0 0 2500
## 3 tornado 0 2 0 25000
## 4 tornado 0 2 0 2500
## 5 tornado 0 2 0 2500
## 6 tornado 0 6 0 2500
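The multiplier conversion can also be sketched as a single lookup function. This is an illustration, not the method used above: `exponent_of` is a hypothetical helper, and it treats every single digit as an exponent (as the text describes), whereas the code above only keeps the digits 2 to 7 that occur in this subset.

```r
# Map a damage-exponent symbol to a power of ten.
# Letters: h = 10^2, k = 10^3, m = 10^6, b = 10^9 (case-insensitive).
# Single digits are used as exponents; anything else gets exponent 0,
# which corresponds to a multiplier of 1.
exponent_of <- function(symbol) {
  symbol <- tolower(as.character(symbol))
  letters_map <- c(h = 2, k = 3, m = 6, b = 9)
  out <- rep(0, length(symbol))
  is_letter <- symbol %in% names(letters_map)
  out[is_letter] <- unname(letters_map[symbol[is_letter]])
  is_digit <- grepl("^[0-9]$", symbol)
  out[is_digit] <- as.numeric(symbol[is_digit])
  out
}

symbols <- c("K", "m", "B", "h", "5", "+", "?", "")
damage  <- c(25, 2.5, 1, 3, 1, 7, 7, 7)
damage * 10^exponent_of(symbols)
# 25000, 2.5e6, 1e9, 300, 1e5, 7, 7, 7
```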
The data is aggregated by event type using the summarise function, resulting in two datasets: health and economic. Economic figures are converted to billions of USD.
health_dt <- subset_data5 %>%
group_by(event) %>% summarise(fatalities = sum(fatalities),
injuries = sum(injuries))
fatal_h <- health_dt %>%
select(event, result = fatalities) %>%
mutate(type = "fatal")
injuries_h <- health_dt %>%
select(event, result = injuries) %>%
mutate(type = "injury")
health_dt <- rbind(fatal_h, injuries_h)
health_dt
## # A tibble: 20 x 3
## event result type
## <chr> <dbl> <chr>
## 1 fire 90 fatal
## 2 flood 1562 fatal
## 3 hail 15 fatal
## 4 heat 3002 fatal
## 5 rain 114 fatal
## 6 snow 265 fatal
## 7 storm 1450 fatal
## 8 tornado 5633 fatal
## 9 waves 968 fatal
## 10 wind 1291 fatal
## 11 fire 1608 injury
## 12 flood 8753 injury
## 13 hail 1371 injury
## 14 heat 8920 injury
## 15 rain 305 injury
## 16 snow 1969 injury
## 17 storm 11923 injury
## 18 tornado 91364 injury
## 19 waves 1313 injury
## 20 wind 9616 injury
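The wide-to-long step above is done with two select calls and rbind. As a hedged alternative, the same reshaping can be sketched with `melt` from reshape2, which the report already loads. The `wide` data frame below is a three-row excerpt of the totals shown above, and the resulting `type` column holds the original column names (fatalities, injuries) rather than the labels fatal and injury.

```r
library(reshape2)

# Three-row excerpt of the per-event totals, in wide form.
wide <- data.frame(
  event      = c("flood", "heat", "tornado"),
  fatalities = c(1562, 3002, 5633),
  injuries   = c(8753, 8920, 91364),
  stringsAsFactors = FALSE
)

# One row per (event, type) pair; `type` is taken from the column names.
long <- melt(wide, id.vars = "event",
             variable.name = "type", value.name = "result")
long
```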
economic_dt <- subset_data5 %>%
group_by(event) %>% summarise(crop_damage_bln = sum(crop_damage) / 1000000000,
prop_damage_bln = sum(prop_damage) / 1000000000)
prop_econ <- economic_dt %>%
select(event, result = prop_damage_bln) %>%
mutate(type = "property damage")
crop_econ <- economic_dt %>%
select(event, result = crop_damage_bln) %>%
mutate(type = "crop damage")
economic_dt <- rbind(prop_econ, crop_econ)
economic_dt
## # A tibble: 20 x 3
## event result type
## <chr> <dbl> <chr>
## 1 fire 8.50 property damage
## 2 flood 168. property damage
## 3 hail 15.7 property damage
## 4 heat 0.0171 property damage
## 5 rain 3.25 property damage
## 6 snow 1.68 property damage
## 7 storm 74.2 property damage
## 8 tornado 57.0 property damage
## 9 waves 0.271 property damage
## 10 wind 12.4 property damage
## 11 fire 0.403 crop damage
## 12 flood 12.3 crop damage
## 13 hail 3.05 crop damage
## 14 heat 0.899 crop damage
## 15 rain 0.918 crop damage
## 16 snow 0.247 crop damage
## 17 storm 6.42 crop damage
## 18 tornado 0.415 crop damage
## 19 waves 0.00712 crop damage
## 20 wind 1.41 crop damage
Tornadoes are the most harmful of all severe weather events in the US. The total number of injuries and deaths is 96997, about 7 times the total for storms.
health_dts <- health_dt %>%
group_by(event) %>% summarise(result = sum(result)) %>%
arrange(desc(result))
health_dts
## # A tibble: 10 x 2
## event result
## <chr> <dbl>
## 1 tornado 96997
## 2 storm 13373
## 3 heat 11922
## 4 wind 10907
## 5 flood 10315
## 6 waves 2281
## 7 snow 2234
## 8 fire 1698
## 9 hail 1386
## 10 rain 419
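The ratio between the two largest totals in the table above can be checked directly. This is a small arithmetic sketch; `totals` is a hypothetical vector holding the two values copied from the table.

```r
# Combined fatalities and injuries for the two leading event groups.
totals <- c(tornado = 96997, storm = 13373)

round(totals[["tornado"]] / totals[["storm"]], 2)  # 7.25
```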
Distribution of health impact is given in the figure below.
ggplot(health_dt) +
theme_bw() +
ggtitle("Weather impact on population health") +
xlab("Event type") + ylab("Fatalities and Injuries") +
geom_col(aes(reorder(event, result), result, fill = type))
In economic terms, floods caused a loss of 181 bln USD, which makes them the most harmful event.
economic_dts <- economic_dt %>%
group_by(event) %>% summarise(result = sum(result)) %>%
arrange(desc(result))
economic_dts
## # A tibble: 10 x 2
## event result
## <chr> <dbl>
## 1 flood 181.
## 2 storm 80.6
## 3 tornado 57.4
## 4 hail 18.8
## 5 wind 13.8
## 6 fire 8.90
## 7 rain 4.17
## 8 snow 1.93
## 9 heat 0.916
## 10 waves 0.278
Distribution of economic impact is given in the figure below.
ggplot(economic_dt) +
theme_bw() +
ggtitle("Weather impact on economy") +
xlab("Event type") + ylab("Loss in bln USD") +
geom_col(aes(reorder(event, result), result, fill = type))
Tornadoes cause more deaths and injuries than any other type of severe weather: almost 100k in total. For the economy, floods are the most harmful, with 181 bln USD in total loss. The major source of loss is property damage.