Iair Kleiman
The objective of this study is to identify the most lethal weather events and those with the greatest economic impact. To do this, I used historic NOAA data from 1950 to 2011. Some data processing and correction was needed: the raw database had misspellings and different names for the same weather events, and the economic impact was recorded in different units (hundreds, thousands, millions, billions).
The most dangerous events (by fatalities) were tornadoes, hot weather, floods, storms and lightning. The most expensive events were floods, hurricanes, storms, tornadoes and hail.
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(lubridate)
library(ggplot2)
library(pander) # A better looking alternative to XTABLE
## Warning: package 'pander' was built under R version 3.1.3
library(gridExtra) # package that enables multiple ggplots
## Warning: package 'gridExtra' was built under R version 3.1.3
## Loading required package: grid
storm_data <- read.csv(bzfile("repdata-data-StormData.csv.bz2") ) #data file import
storm_tbl <- tbl_df(storm_data) # convert the data.frame to a dplyr tbl
To avoid even more name fixing, I converted EVTYPE, PROPDMGEXP and CROPDMGEXP to lowercase.
storm_tbl <- storm_tbl %>% mutate(EVTYPE = tolower(EVTYPE), PROPDMGEXP =tolower(PROPDMGEXP),
CROPDMGEXP = tolower(CROPDMGEXP))
The damage amounts are not in the same units: some are in hundreds of dollars, others in thousands, millions or even billions. Let’s bring everything to millions of dollars. Rows with any other exponent code are counted as 0.
storm_tbl <- storm_tbl %>% mutate(Property_Damage = ifelse(PROPDMGEXP=="h",PROPDMG * 100/1000000,
ifelse(PROPDMGEXP=="k", PROPDMG * 1/1000, ifelse(PROPDMGEXP=="m",PROPDMG * 1,
ifelse(PROPDMGEXP=="b", PROPDMG * 1000, 0)))))
storm_tbl <- storm_tbl %>% mutate(Crop_Damage = ifelse(CROPDMGEXP=="h",CROPDMG * 100/1000000,
ifelse(CROPDMGEXP=="k", CROPDMG * 1/1000, ifelse(CROPDMGEXP=="m",CROPDMG * 1,
ifelse(CROPDMGEXP=="b", CROPDMG * 1000, 0)))))
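The nested ifelse() calls above can also be expressed as a lookup table. The sketch below is a base R illustration on made-up values. The names `multiplier` and `to_millions` are my own and are not part of the original analysis. As in the code above, any exponent other than h, k, m or b is treated as zero damage.

```r
# Hypothetical lookup: dollars per unit for each exponent code,
# divided by 1e6 so the result is in millions of dollars
multiplier <- c(h = 1e2, k = 1e3, m = 1e6, b = 1e9) / 1e6

# Hypothetical helper: convert an amount and its exponent code to millions
to_millions <- function(amount, exp_code) {
  # unname() drops the names picked up from the lookup vector
  mult <- unname(multiplier[as.character(exp_code)])
  # Unknown or empty codes come back as NA from the lookup; count them as 0
  mult[is.na(mult)] <- 0
  amount * mult
}

# Made-up example values, not taken from the NOAA file
example_millions <- to_millions(c(25, 2.5, 10, 3, 7), c("k", "m", "b", "h", ""))
print(example_millions)
```

With a helper like this, Property_Damage and Crop_Damage could each be computed with a single call inside mutate(), and adding a new exponent code only means adding one entry to the lookup vector.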
Before cleaning the EVTYPE names, I wanted to know how many different EVTYPE names there were.
sorted <- unique(storm_tbl$EVTYPE)
sorted <- sort(sorted)
head(sorted, 150)
## [1] " high surf advisory" " coastal flood"
## [3] " flash flood" " lightning"
## [5] " tstm wind" " tstm wind (g45)"
## [7] " waterspout" " wind"
## [9] "?" "abnormal warmth"
## [11] "abnormally dry" "abnormally wet"
## [13] "accumulated snowfall" "agricultural freeze"
## [15] "apache county" "astronomical high tide"
## [17] "astronomical low tide" "avalance"
## [19] "avalanche" "beach erosin"
## [21] "beach erosion" "beach erosion/coastal flood"
## [23] "beach flood" "below normal precipitation"
## [25] "bitter wind chill" "bitter wind chill temperatures"
## [27] "black ice" "blizzard"
## [29] "blizzard and extreme wind chil" "blizzard and heavy snow"
## [31] "blizzard summary" "blizzard weather"
## [33] "blizzard/freezing rain" "blizzard/heavy snow"
## [35] "blizzard/high wind" "blizzard/winter storm"
## [37] "blow-out tide" "blow-out tides"
## [39] "blowing dust" "blowing snow"
## [41] "blowing snow- extreme wind chi" "blowing snow & extreme wind ch"
## [43] "blowing snow/extreme wind chil" "breakup flooding"
## [45] "brush fire" "brush fires"
## [47] "coastal flooding/erosion" "coastal erosion"
## [49] "coastal flood" "coastal flooding"
## [51] "coastal flooding/erosion" "coastal storm"
## [53] "coastal surge" "coastal/tidal flood"
## [55] "coastalflood" "coastalstorm"
## [57] "cold" "cold air funnel"
## [59] "cold air funnels" "cold air tornado"
## [61] "cold and frost" "cold and snow"
## [63] "cold and wet conditions" "cold temperature"
## [65] "cold temperatures" "cold wave"
## [67] "cold weather" "cold wind chill temperatures"
## [69] "cold/wind chill" "cold/winds"
## [71] "cool and wet" "cool spell"
## [73] "cstl flooding/erosion" "dam break"
## [75] "dam failure" "damaging freeze"
## [77] "deep hail" "dense fog"
## [79] "dense smoke" "downburst"
## [81] "downburst winds" "driest month"
## [83] "drifting snow" "drought"
## [85] "drought/excessive heat" "drowning"
## [87] "dry" "dry conditions"
## [89] "dry hot weather" "dry microburst"
## [91] "dry microburst 50" "dry microburst 53"
## [93] "dry microburst 58" "dry microburst 61"
## [95] "dry microburst 84" "dry microburst winds"
## [97] "dry mircoburst winds" "dry pattern"
## [99] "dry spell" "dry weather"
## [101] "dryness" "dust devel"
## [103] "dust devil" "dust devil waterspout"
## [105] "dust storm" "dust storm/high winds"
## [107] "duststorm" "early freeze"
## [109] "early frost" "early rain"
## [111] "early snow" "early snowfall"
## [113] "erosion/cstl flood" "excessive"
## [115] "excessive cold" "excessive heat"
## [117] "excessive heat/drought" "excessive precipitation"
## [119] "excessive rain" "excessive rainfall"
## [121] "excessive snow" "excessive wetness"
## [123] "excessively dry" "extended cold"
## [125] "extreme cold" "extreme cold/wind chill"
## [127] "extreme heat" "extreme wind chill"
## [129] "extreme wind chill/blowing sno" "extreme wind chills"
## [131] "extreme windchill" "extreme windchill temperatures"
## [133] "extreme/record cold" "extremely wet"
## [135] "falling snow/ice" "first frost"
## [137] "first snow" "flash flood"
## [139] "flash flood - heavy rain" "flash flood from ice jams"
## [141] "flash flood landslides" "flash flood winds"
## [143] "flash flood/" "flash flood/ flood"
## [145] "flash flood/ street" "flash flood/flood"
## [147] "flash flood/heavy rain" "flash flood/landslide"
## [149] "flash flooding" "flash flooding/flood"
summary(sorted)
## Length Class Mode
## 898 character character
There are many inconsistencies in the EVTYPE naming. For example, thunderstorm is written in several ways, wind appears as wind, winds and wnd, and some names have words such as record or excessive added. There are also spelling errors and even leading spaces before a word. I will make some fixes, but there is a lot of work left to be done.
The first fix is to drop the rows whose event type includes the text “summary”.
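The leading spaces mentioned above do not affect grepl() matching, but they do keep otherwise identical names apart when the names are compared as whole strings. One way to normalise them is sketched below in base R. This step was not applied in the analysis, and `clean_names` is a hypothetical helper name.

```r
# Hypothetical helper: lowercase, trim and squeeze whitespace in event names
clean_names <- function(x) {
  x <- tolower(x)
  x <- gsub("^\\s+|\\s+$", "", x)  # strip leading and trailing whitespace
  gsub("\\s+", " ", x)             # collapse runs of whitespace to one space
}

# Made-up inputs in the style of the raw EVTYPE values
cleaned <- clean_names(c(" TSTM WIND", "coastal  flooding/erosion", "Hail "))
print(cleaned)
```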
storm_tbl <- storm_tbl %>% filter(!grepl("summary", EVTYPE))
Now let the heavy name fixing begin! I will try to cluster the event type categories so that similar events and their consequences can be added together later.
storm_tbl <- storm_tbl %>%
# "thund" already covers the thundeerstorm and thundertsorm misspellings
mutate(EVTYPE = ifelse(grepl("tstm|thund|tunderstorm", EVTYPE),"thunderstorm", EVTYPE)) %>%
mutate(EVTYPE = ifelse(grepl("tornado|torndao", EVTYPE),"tornado", EVTYPE)) %>%
mutate(EVTYPE = ifelse(grepl("hail", EVTYPE), "hail", EVTYPE)) %>%
mutate(EVTYPE = ifelse(grepl("lightning|lighting|ligntning", EVTYPE),
"lightning", EVTYPE)) %>%
mutate(EVTYPE = ifelse(grepl("blizzard", EVTYPE),"blizzard", EVTYPE)) %>%
mutate(EVTYPE = ifelse(grepl("rain", EVTYPE), "rain", EVTYPE)) %>%
mutate(EVTYPE = ifelse(grepl("cold", EVTYPE),"cold weather", EVTYPE)) %>%
mutate(EVTYPE = ifelse(grepl("mud", EVTYPE),"mudslide", EVTYPE)) %>%
mutate(EVTYPE = ifelse(grepl("hurricane", EVTYPE),"hurricane", EVTYPE)) %>%
mutate(EVTYPE = ifelse(grepl("hot", EVTYPE),"hot weather", EVTYPE)) %>%
mutate(EVTYPE = ifelse(grepl("wind|wnd", EVTYPE),"winds", EVTYPE)) %>%
mutate(EVTYPE = ifelse(grepl("flood|fld", EVTYPE), "flood", EVTYPE)) %>%
mutate(EVTYPE = ifelse(grepl("snow", EVTYPE),"snow", EVTYPE)) %>%
mutate(EVTYPE = ifelse(grepl("winter storm", EVTYPE),
"winter storm", EVTYPE)) %>%
mutate(EVTYPE = ifelse(grepl("tropical ", EVTYPE),
"tropical storm", EVTYPE)) %>%
mutate(EVTYPE = ifelse(grepl("storm", EVTYPE),"storm", EVTYPE)) %>%
mutate(EVTYPE = ifelse(grepl("volcanic", EVTYPE),"Volcanic Activity", EVTYPE)) %>%
mutate(EVTYPE = ifelse(grepl("wild", EVTYPE),"wildfire", EVTYPE)) %>%
mutate(EVTYPE = ifelse(grepl("heat", EVTYPE),"hot weather", EVTYPE)) %>%
mutate(EVTYPE = ifelse(grepl("current", EVTYPE),"rip current", EVTYPE))
sorted <- unique(storm_tbl$EVTYPE)
sorted <- sort(sorted)
sorted
## [1] " high surf advisory" " waterspout"
## [3] "?" "abnormal warmth"
## [5] "abnormally dry" "abnormally wet"
## [7] "agricultural freeze" "apache county"
## [9] "astronomical high tide" "astronomical low tide"
## [11] "avalance" "avalanche"
## [13] "beach erosin" "beach erosion"
## [15] "below normal precipitation" "black ice"
## [17] "blizzard" "blow-out tide"
## [19] "blow-out tides" "blowing dust"
## [21] "brush fire" "brush fires"
## [23] "coastal erosion" "coastal surge"
## [25] "cold weather" "cool and wet"
## [27] "cool spell" "dam break"
## [29] "dam failure" "damaging freeze"
## [31] "dense fog" "dense smoke"
## [33] "downburst" "driest month"
## [35] "drought" "drowning"
## [37] "dry" "dry conditions"
## [39] "dry microburst" "dry microburst 50"
## [41] "dry microburst 53" "dry microburst 58"
## [43] "dry microburst 61" "dry microburst 84"
## [45] "dry pattern" "dry spell"
## [47] "dry weather" "dryness"
## [49] "dust devel" "dust devil"
## [51] "dust devil waterspout" "early freeze"
## [53] "early frost" "excessive"
## [55] "excessive precipitation" "excessive wetness"
## [57] "excessively dry" "extremely wet"
## [59] "first frost" "flash floooding"
## [61] "flood" "fog"
## [63] "forest fires" "freeze"
## [65] "freezing drizzle" "freezing drizzle and freezing"
## [67] "freezing fog" "freezing spray"
## [69] "frost" "frost/freeze"
## [71] "frost\\freeze" "funnel"
## [73] "funnel cloud" "funnel cloud."
## [75] "funnel clouds" "funnels"
## [77] "glaze" "glaze ice"
## [79] "grass fires" "gustnado"
## [81] "gustnado and" "hail"
## [83] "hard freeze" "hazardous surf"
## [85] "heavy mix" "heavy precipatation"
## [87] "heavy precipitation" "heavy seas"
## [89] "heavy shower" "heavy showers"
## [91] "heavy surf" "heavy surf/high surf"
## [93] "heavy swells" "high"
## [95] "high swells" "high seas"
## [97] "high surf" "high surf advisories"
## [99] "high surf advisory" "high swells"
## [101] "high temperature record" "high tides"
## [103] "high water" "high waves"
## [105] "hot weather" "hurricane"
## [107] "hyperthermia/exposure" "hypothermia"
## [109] "hypothermia/exposure" "ice"
## [111] "ice floes" "ice fog"
## [113] "ice jam" "ice on road"
## [115] "ice pellets" "ice roads"
## [117] "icy roads" "landslide"
## [119] "landslides" "landslump"
## [121] "landspout" "large wall cloud"
## [123] "late freeze" "lightning"
## [125] "low temperature" "low temperature record"
## [127] "marine accident" "marine mishap"
## [129] "microburst" "mild and dry pattern"
## [131] "mild pattern" "mild/dry pattern"
## [133] "mixed precip" "mixed precipitation"
## [135] "monthly precipitation" "monthly temperature"
## [137] "mudslide" "no severe weather"
## [139] "none" "normal precipitation"
## [141] "northern lights" "other"
## [143] "patchy dense fog" "patchy ice"
## [145] "prolong warmth" "rain"
## [147] "rapidly rising water" "record cool"
## [149] "record dry month" "record dryness"
## [151] "record high" "record high temperature"
## [153] "record high temperatures" "record low"
## [155] "record precipitation" "record temperature"
## [157] "record temperatures" "record warm"
## [159] "record warm temps." "record warmth"
## [161] "red flag criteria" "red flag fire wx"
## [163] "remnants of floyd" "rip current"
## [165] "rock slide" "rogue wave"
## [167] "rotating wall cloud" "rough seas"
## [169] "rough surf" "saharan dust"
## [171] "seiche" "severe turbulence"
## [173] "sleet" "small stream"
## [175] "small stream and" "smoke"
## [177] "snow" "southeast"
## [179] "storm" "temperature record"
## [181] "tornado" "tsunami"
## [183] "typhoon" "unseasonably cool"
## [185] "unseasonably cool & wet" "unseasonably dry"
## [187] "unseasonably warm" "unseasonably warm & wet"
## [189] "unseasonably warm and dry" "unseasonably warm year"
## [191] "unseasonably warm/wet" "unseasonably wet"
## [193] "unseasonal low temp" "unusual warmth"
## [195] "unusual/record warmth" "unusually warm"
## [197] "urban and small" "urban and small stream"
## [199] "urban small" "urban/small"
## [201] "urban/small stream" "very dry"
## [203] "very warm" "vog"
## [205] "Volcanic Activity" "wall cloud"
## [207] "wall cloud/funnel cloud" "warm dry conditions"
## [209] "warm weather" "water spout"
## [211] "waterspout" "waterspout-"
## [213] "waterspout funnel cloud" "waterspout/"
## [215] "waterspouts" "wayterspout"
## [217] "wet micoburst" "wet microburst"
## [219] "wet month" "wet weather"
## [221] "wet year" "wildfire"
## [223] "winds" "winter mix"
## [225] "winter weather" "winter weather mix"
## [227] "winter weather/mix" "wintery mix"
## [229] "wintry mix"
summary(sorted)
## Length Class Mode
## 229 character character
After this partial cleaning of the event type names, the number of distinct event types dropped from 898 to 229.
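Because each mutate() step sees the output of the previous one, the order of the patterns matters. The late "storm" rule also absorbs the names already set to "thunderstorm", "winter storm" and "tropical storm", which is why none of them appears in the cleaned list. The base R sketch below reproduces the same sequential idea on a few names from the raw list. It uses only a subset of the rules, and `rules` and `recode_events` are hypothetical names.

```r
# Ordered pattern -> label pairs; a subset of the rules used in the analysis
rules <- c("tstm|thund|tunderstorm" = "thunderstorm",
           "wind|wnd"               = "winds",
           "flood|fld"              = "flood",
           "storm"                  = "storm")

# Hypothetical helper: apply the rules one after another
recode_events <- function(x, rules) {
  for (pattern in names(rules)) {
    # Each rule is applied to the result of the previous one
    x[grepl(pattern, x)] <- rules[[pattern]]
  }
  x
}

recoded <- recode_events(
  c(" tstm wind", "coastal storm", "flash flood/ flood", "dust devil"),
  rules
)
print(recoded)
```

Here " tstm wind" first becomes "thunderstorm" and is then folded into "storm" by the last rule. To keep thunderstorms as a separate category, the "storm" rule would need a narrower pattern or an earlier position in the sequence.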
At this point I will select the most important variables for this study. I will keep the event initial date, the State, the Event Type, the number of fatalities, the number of injuries, Property Damage (million dollars) and crop damage (million dollars). I’m also grouping the data by Event Type.
# Data Grouping by Type of Event
storm_tbl$BGN_DATE <- mdy_hms(storm_tbl$BGN_DATE)
storm_flt <- storm_tbl %>% select(BGN_DATE, STATE, EVTYPE, FATALITIES,
INJURIES, Property_Damage, Crop_Damage)
storm_grp <- group_by(storm_flt, EVTYPE)
Now I will sum the fatalities for every event type, sum the injuries as well, and add a column with the sum of fatalities plus injuries.
I sorted this table by number of fatalities.
# Event Sorting based on Casualties and Injuries
storm_health <- storm_grp %>%
summarize(FATALITIES = sum(FATALITIES, na.rm = TRUE),
INJURIES = sum(INJURIES, na.rm = TRUE)) %>%
arrange(desc(FATALITIES), desc(INJURIES))
storm_health <- mutate(storm_health, Total_Incidents = FATALITIES + INJURIES)
I’m repeating the process, now for the economic impact.
# Event sorting based on Economic Impact
storm_dmg <- storm_grp %>%
summarize(Property_Damage = sum(Property_Damage, na.rm = TRUE),
Crop_Damage = sum(Crop_Damage, na.rm = TRUE)) %>%
mutate(Total_Damage = Property_Damage + Crop_Damage) %>%
arrange(desc(Total_Damage), desc(Property_Damage))
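The group, sum and sort steps can be checked by hand on a tiny data frame. The sketch below uses base R's aggregate() instead of dplyr so that it runs without extra packages. The `toy` values are made up and are not NOAA figures.

```r
# Made-up damage records in millions of dollars
toy <- data.frame(
  EVTYPE          = c("flood", "tornado", "flood", "hail"),
  Property_Damage = c(1.5, 20, 0.5, 3),
  Crop_Damage     = c(0.2, 0, 0.3, 1),
  stringsAsFactors = FALSE
)

# Sum both damage columns per event type
totals <- aggregate(cbind(Property_Damage, Crop_Damage) ~ EVTYPE,
                    data = toy, FUN = sum)

# Add the combined damage and sort from most to least expensive
totals$Total_Damage <- totals$Property_Damage + totals$Crop_Damage
totals <- totals[order(-totals$Total_Damage, -totals$Property_Damage), ]
print(totals)
```

The two flood rows collapse into a single row with a total of 2.5, and the tornado row sorts first with a total of 20.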
Now that the data is sorted, I’m selecting the 15 event types with the biggest impact.
storm_health_15 <- head(storm_health, 15)
storm_dmg_15 <- head(storm_dmg, 15)
pander(head(storm_health, 15))
| EVTYPE | FATALITIES | INJURIES | Total_Incidents |
|---|---|---|---|
| tornado | 5636 | 91407 | 97043 |
| hot weather | 3138 | 9224 | 12362 |
| flood | 1552 | 8683 | 10235 |
| storm | 1177 | 13741 | 14918 |
| lightning | 817 | 5231 | 6048 |
| rip current | 577 | 529 | 1106 |
| winds | 473 | 1954 | 2427 |
| cold weather | 451 | 320 | 771 |
| avalanche | 224 | 170 | 394 |
| snow | 143 | 1119 | 1262 |
| hurricane | 135 | 1328 | 1463 |
| rain | 114 | 305 | 419 |
| high surf | 104 | 156 | 260 |
| blizzard | 101 | 806 | 907 |
| wildfire | 90 | 1606 | 1696 |
pander(head(storm_dmg, 15))
| EVTYPE | Property_Damage | Crop_Damage | Total_Damage |
|---|---|---|---|
| flood | 167566 | 12275 | 179841 |
| hurricane | 84756 | 5515 | 90271 |
| storm | 78898 | 7023 | 85920 |
| tornado | 56993 | 415 | 57408 |
| hail | 15975 | 3047 | 19021 |
| drought | 1046 | 13973 | 15019 |
| wildfire | 8492 | 402.8 | 8894 |
| winds | 6146 | 772.8 | 6918 |
| rain | 3265 | 919.3 | 4185 |
| cold weather | 246.6 | 1417 | 1663 |
| snow | 1010 | 134.7 | 1145 |
| frost/freeze | 10.48 | 1094 | 1105 |
| lightning | 938.7 | 12.09 | 950.8 |
| hot weather | 20.33 | 904.5 | 924.8 |
| blizzard | 664.9 | 112.1 | 777 |
health_plot <- ggplot(storm_health_15, aes(x= reorder(EVTYPE, -FATALITIES), FATALITIES)) +
geom_bar(fill="blue", stat="identity") + theme(axis.text.x=element_text(angle = 45, hjust = 1)) +
ggtitle("Top 15 Harmful Event Types") + ylab("Fatalities") + xlab("Event Type")
dmg_plot <- ggplot(storm_dmg_15, aes(x= reorder(EVTYPE, -Total_Damage),
Total_Damage)) + geom_bar(fill="forestgreen",
stat="identity") + theme(axis.text.x=element_text(angle = 45, hjust = 1)) +
ggtitle("Top 15 Expensive Event Types") + ylab("Total Economic Damage (Million Dollars)") +
xlab("Event Type")
grid.arrange(health_plot, dmg_plot, ncol=2,nrow=1)