NOAA’s significant weather event data (also known as Storm Data) for the USA from 1950 through 2011 were analyzed for fatalities, injuries, property damage and crop damage. The weather event types in the source data required tidying. The property damage and crop damage magnitude fields also contained some invalid values, but these affect only a very small percentage of rows.
Results show Tornado, Excessive Heat, Heat, Flash Flood and High Wind are the top five killers. For injuries, the top five events are: Tornado, High Wind, Flood, Excessive Heat and Lightning. In terms of damages to property and crops, Flood, Hurricane (Typhoon), Tornado, Storm Surge/Tide and Hail are the top five events. Please review the analysis and graphs for further details.
The data analysis addresses the following questions:
1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?
The analysis and illustrations are to help a government or a municipal manager who may be responsible for preparing for severe weather events and will need to prioritize resources for different types of events.
Please note this is the first step in the analysis. Further analysis is possible by looking at trends over the years as well as by geography; these are left for subsequent steps.
Ensure all the needed libraries are loaded. The code and results are echoed to output.
library(knitr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(reshape2)
library(ggplot2)
opts_chunk$set(echo = TRUE)
Data is read directly from the bzip2-compressed (“bz2”) file, which was downloaded from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2.
## validate the compressed file exists
dataFileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
filePath <- "./repdata_data_StormData.csv.bz2"
if (!file.exists(filePath))
{
## stop() aborts the run with an error; quit() would terminate the whole R session
stop("File '", filePath, "' does not exist. ",
"Download the dataset from '", dataFileURL, "' to the working directory.")
}
## read the file
stormDF <- read.csv(filePath, stringsAsFactors = FALSE)
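As a minimal sketch of why no explicit decompression step is needed: `read.csv()` opens compressed files transparently, so a `.bz2` path can be passed straight to it. The names `sampleDF`, `tmpPath` and `roundTrip` and the tiny temporary file are hypothetical, used only for illustration.

```r
## Illustrative only: write a tiny bzip2-compressed CSV, then read it back.
## `sampleDF` and `tmpPath` are hypothetical stand-ins for the real Storm Data file.
sampleDF <- data.frame(EVTYPE = c("TORNADO", "HAIL"),
                       FATALITIES = c(3, 0),
                       stringsAsFactors = FALSE)
tmpPath <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmpPath, open = "w")
write.csv(sampleDF, con, row.names = FALSE)
close(con)

## read.csv() detects the compression from the file contents
roundTrip <- read.csv(tmpPath, stringsAsFactors = FALSE)
print(roundTrip)
```

If the file is missing, a call such as `download.file(dataFileURL, filePath, mode = "wb")` could fetch it, although the original code leaves the download to the reader.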
The source data needed some tidying to help answer the questions.
1. NOAA documentation lists 48 event types. However, 985 distinct event types are present in the source data. Of those, 57 event types each account for 10 or more deaths. Those event types are mapped to the 48 standard event types. As we could not find a good mapping for Landslide, it is kept as a new event type.
2. From the source data, 84 event types each account for 10 or more injuries. Those event types are mapped to the existing 49 event types (original 48 plus Landslide).
3. Fields PROPDMGEXP and CROPDMGEXP are supposed to be blank, K for thousands, M for millions or B for billions. However, about 0.04% of PROPDMGEXP values (321 of 902,297 rows) and about 0.003% of CROPDMGEXP values (27 rows) are invalid; for those rows the exponent is ignored and the multiplier is treated as 1. Lowercase “k”, “m” and “b” are mapped to “K”, “M” and “B”; this affects only 29 values across the two fields.
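The share of invalid exponent values can be estimated with a small helper. This is a sketch rather than the author's code: `invalidShare` is a hypothetical function, and the example vector is made up for illustration.

```r
## Hypothetical helper: percentage of values that are not blank, K, M or B
## (case-insensitive), i.e. the values that end up with a multiplier of 1.
invalidShare <- function(exp) {
  cleaned <- toupper(trimws(exp))
  valid <- cleaned %in% c("", "K", "M", "B")
  100 * mean(!valid)
}

## made-up example: 2 invalid values ("?" and "5") out of 8
exampleExp <- c("K", "k", "", "M", "?", "B", "5", "m")
invalidShare(exampleExp)  # 25
```

On the real data this would be called as `invalidShare(stormDF$PROPDMGEXP)` or `invalidShare(stormDF$CROPDMGEXP)`.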
Storm data documentation (https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf) lists 48 event types whereas there are 985 levels in the dataset. We need to map these events in a reasonable way.
## initial peek into data
evtype <- as.factor(stormDF$EVTYPE)
str(evtype)
## Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## create a vector of standard event types
stdEvtype <- c("Astronomical Low Tide", "Avalanche", "Blizzard", "Coastal Flood", "Cold/Wind Chill",
"Debris Flow", "Dense Fog", "Dense Smoke", "Drought", "Dust Devil", "Dust Storm",
"Excessive Heat", "Extreme Cold/Wind Chill", "Flash Flood", "Flood", "Frost/Freeze",
"Funnel Cloud", "Freezing Fog", "Hail", "Heat", "Heavy Rain", "Heavy Snow", "High Surf",
"High Wind", "Hurricane (Typhoon)", "Ice Storm", "Lake-Effect Snow", "Lakeshore Flood",
"Lightning", "Marine Hail", "Marine High Wind", "Marine Strong Wind",
"Marine Thunderstorm Wind", "Rip Current", "Seiche", "Sleet", "Storm Surge/Tide",
"Strong Wind", "Thunderstorm Wind", "Tornado", "Tropical Depression", "Tropical Storm",
"Tsunami", "Volcanic Ash", "Waterspout", "Wildfire", "Winter Storm", "Winter Weather")
## create a mapping from various forms of standard event types to standard event types
## we are using parallel vectors to maintain this mapping
events <- vector()
stdEvents <- vector()
for (idx in 1:length(stdEvtype))
{
## convert everything to upper case for case insensitive comparison
event <- toupper(stdEvtype[idx])
events <- append(events, event)
stdEvents <- append(stdEvents, stdEvtype[idx])
}
## Need to convert:
## "Cold/Wind Chill", "Cold" and "Wind Chill" maps to "Cold/Wind Chill" (use of '/')
## "Lake-Effect Snow", "Lake Effect Snow" to "Lake-Effect Snow" (use of '-')
## "Hurricane (Typhoon)", "Hurricane", "Typhoon" to "Hurricane (Typhoon)" (use of '(' and ')')
## Tried doing the above by code. However, found there are exceptions to these.
## Hence doing these by hand mapping!
additionalEvents <- c("EXTREME COLD", "EXTREME WIND CHILL", "COLD",
"WIND CHILL", "FROST", "FREEZE", "HURRICANE",
"TYPHOON", "LAKE EFFECT SNOW", "STORM SURGE",
"TIDE")
additionalStdEvents <- c("Extreme Cold/Wind Chill", "Extreme Cold/Wind Chill", "Cold/Wind Chill",
"Cold/Wind Chill", "Frost/Freeze", "Frost/Freeze", "Hurricane (Typhoon)",
"Hurricane (Typhoon)", "Lake-Effect Snow", "Storm Surge/Tide",
"Storm Surge/Tide")
events <- c(events, additionalEvents)
stdEvents <- c(stdEvents, additionalStdEvents)
## display top 100 events causing most fatalities
peopleHealthDF <- stormDF %>%
select(EVTYPE, FATALITIES, INJURIES) %>%
group_by(EVTYPE) %>%
summarize (totalFatalities=sum(FATALITIES), totalInjuries=sum(INJURIES)) %>%
arrange(desc(totalFatalities))
head(peopleHealthDF$EVTYPE, n = 100)
## [1] "TORNADO" "EXCESSIVE HEAT"
## [3] "FLASH FLOOD" "HEAT"
## [5] "LIGHTNING" "TSTM WIND"
## [7] "FLOOD" "RIP CURRENT"
## [9] "HIGH WIND" "AVALANCHE"
## [11] "WINTER STORM" "RIP CURRENTS"
## [13] "HEAT WAVE" "EXTREME COLD"
## [15] "THUNDERSTORM WIND" "HEAVY SNOW"
## [17] "EXTREME COLD/WIND CHILL" "STRONG WIND"
## [19] "BLIZZARD" "HIGH SURF"
## [21] "HEAVY RAIN" "EXTREME HEAT"
## [23] "COLD/WIND CHILL" "ICE STORM"
## [25] "WILDFIRE" "HURRICANE/TYPHOON"
## [27] "THUNDERSTORM WINDS" "FOG"
## [29] "HURRICANE" "TROPICAL STORM"
## [31] "HEAVY SURF/HIGH SURF" "LANDSLIDE"
## [33] "COLD" "HIGH WINDS"
## [35] "TSUNAMI" "WINTER WEATHER"
## [37] "UNSEASONABLY WARM AND DRY" "URBAN/SML STREAM FLD"
## [39] "WINTER WEATHER/MIX" "TORNADOES, TSTM WIND, HAIL"
## [41] "WIND" "DUST STORM"
## [43] "FLASH FLOODING" "DENSE FOG"
## [45] "EXTREME WINDCHILL" "FLOOD/FLASH FLOOD"
## [47] "RECORD/EXCESSIVE HEAT" "HAIL"
## [49] "COLD AND SNOW" "FLASH FLOOD/FLOOD"
## [51] "MARINE STRONG WIND" "STORM SURGE"
## [53] "WILD/FOREST FIRE" "STORM SURGE/TIDE"
## [55] "UNSEASONABLY WARM" "MARINE THUNDERSTORM WIND"
## [57] "WINTER STORMS" "MARINE TSTM WIND"
## [59] "ROUGH SEAS" "TROPICAL STORM GORDON"
## [61] "FREEZING RAIN" "GLAZE"
## [63] "HEAVY SURF" "LOW TEMPERATURE"
## [65] "MARINE MISHAP" "STRONG WINDS"
## [67] "FLOODING" "HURRICANE ERIN"
## [69] "ICE" "COLD WEATHER"
## [71] "FLASH FLOODING/FLOOD" "HEAT WAVES"
## [73] "HIGH SEAS" "ICY ROADS"
## [75] "RIP CURRENTS/HEAVY SURF" "SNOW"
## [77] "TSTM WIND/HAIL" "GUSTY WINDS"
## [79] "HEAT WAVE DROUGHT" "HIGH WIND/SEAS"
## [81] "Hypothermia/Exposure" "Mudslide"
## [83] "RAIN/SNOW" "ROUGH SURF"
## [85] "SNOW AND ICE" "COASTAL FLOOD"
## [87] "COASTAL STORM" "Cold"
## [89] "COLD WAVE" "DRY MICROBURST"
## [91] "HEAVY SEAS" "Heavy surf and wind"
## [93] "High Surf" "HIGH WATER"
## [95] "HIGH WIND AND SEAS" "HIGH WINDS/SNOW"
## [97] "HYPOTHERMIA/EXPOSURE" "WATERSPOUT"
## [99] "WATERSPOUT/TORNADO" "WILD FIRES"
head(peopleHealthDF$totalFatalities, n = 100)
## [1] 5633 1903 978 937 816 504 470 368 248 224 206 204 172 160
## [15] 133 127 125 103 101 101 98 96 95 89 75 64 64 62
## [29] 61 58 42 38 35 35 33 33 29 28 28 25 23 22
## [43] 19 18 17 17 17 15 14 14 14 13 12 11 11 10
## [57] 10 9 8 8 7 7 7 7 7 7 6 6 6 5
## [71] 5 5 5 5 5 5 5 4 4 4 4 4 4 4
## [85] 4 3 3 3 3 3 3 3 3 3 3 3 3 3
## [99] 3 3
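The number of event types at or above a fatality threshold can be counted directly from these sorted totals. The sketch below copies the first 60 values from the printed output; `fatalityTotals` is a hypothetical name introduced for illustration.

```r
## First 60 fatality totals as printed in the output (sorted in descending order)
fatalityTotals <- c(5633, 1903, 978, 937, 816, 504, 470, 368, 248, 224, 206, 204, 172, 160,
                    133, 127, 125, 103, 101, 101, 98, 96, 95, 89, 75, 64, 64, 62,
                    61, 58, 42, 38, 35, 35, 33, 33, 29, 28, 28, 25, 23, 22,
                    19, 18, 17, 17, 17, 15, 14, 14, 14, 13, 12, 11, 11, 10,
                    10, 9, 8, 8)

## number of event types with at least 10 fatalities
sum(fatalityTotals >= 10)  # 57
```

On the full summary the same count would be `sum(peopleHealthDF$totalFatalities >= 10)`.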
## there are 57 EVTYPES where fatalities are greater than or equal to 10
## map mislabeled/mistyped events from these 57 to the standard events
additionalEvents <- c("HEAT WAVE", "EXTREME HEAT", "HURRICANE/TYPHOON", "FOG",
"HEAVY SURF/HIGH SURF", "LANDSLIDE", "UNSEASONABLY WARM AND DRY",
"URBAN/SML STREAM FLD", "WINTER WEATHER/MIX", "WIND", "EXTREME WINDCHILL",
"WILD/FOREST FIRE", "UNSEASONABLY WARM")
additionalStdEvents <- c("Heat", "Excessive Heat", "Hurricane (Typhoon)", "Dense Fog",
"High Surf", "Landslide", "Excessive Heat",
"Flash Flood", "Winter Weather", "High Wind", "Extreme Cold/Wind Chill",
"Wildfire", "Excessive Heat")
events <- c(events, additionalEvents)
stdEvents <- c(stdEvents, additionalStdEvents)
## display top 100 events causing most injuries
peopleHealthDF <- peopleHealthDF %>%
arrange(desc(totalInjuries))
head(peopleHealthDF$EVTYPE, n = 100)
## [1] "TORNADO" "TSTM WIND"
## [3] "FLOOD" "EXCESSIVE HEAT"
## [5] "LIGHTNING" "HEAT"
## [7] "ICE STORM" "FLASH FLOOD"
## [9] "THUNDERSTORM WIND" "HAIL"
## [11] "WINTER STORM" "HURRICANE/TYPHOON"
## [13] "HIGH WIND" "HEAVY SNOW"
## [15] "WILDFIRE" "THUNDERSTORM WINDS"
## [17] "BLIZZARD" "FOG"
## [19] "WILD/FOREST FIRE" "DUST STORM"
## [21] "WINTER WEATHER" "DENSE FOG"
## [23] "TROPICAL STORM" "HEAT WAVE"
## [25] "HIGH WINDS" "RIP CURRENTS"
## [27] "STRONG WIND" "HEAVY RAIN"
## [29] "RIP CURRENT" "EXTREME COLD"
## [31] "GLAZE" "AVALANCHE"
## [33] "EXTREME HEAT" "HIGH SURF"
## [35] "WILD FIRES" "ICE"
## [37] "TSUNAMI" "TSTM WIND/HAIL"
## [39] "WIND" "URBAN/SML STREAM FLD"
## [41] "WINTRY MIX" "WINTER WEATHER/MIX"
## [43] "Heat Wave" "WINTER WEATHER MIX"
## [45] "LANDSLIDE" "RECORD HEAT"
## [47] "HEAVY SURF/HIGH SURF" "COLD"
## [49] "HURRICANE" "TROPICAL STORM GORDON"
## [51] "WATERSPOUT/TORNADO" "DUST DEVIL"
## [53] "HEAVY SURF" "STORM SURGE"
## [55] "SNOW/HIGH WINDS" "SNOW SQUALL"
## [57] "ICY ROADS" "SNOW"
## [59] "WATERSPOUT" "DRY MICROBURST"
## [61] "THUNDERSTORMW" "MARINE THUNDERSTORM WIND"
## [63] "MIXED PRECIP" "EXTREME COLD/WIND CHILL"
## [65] "BLACK ICE" "FREEZING RAIN"
## [67] "MARINE STRONG WIND" "STRONG WINDS"
## [69] "EXCESSIVE RAINFALL" "HIGH WIND AND SEAS"
## [71] "UNSEASONABLY WARM" "WINTER STORMS"
## [73] "TORNADO F2" "FLOOD/FLASH FLOOD"
## [75] "HEAT WAVE DROUGHT" "FREEZING DRIZZLE"
## [77] "WINTER STORM HIGH WINDS" "GLAZE/ICE STORM"
## [79] "BLOWING SNOW" "COLD/WIND CHILL"
## [81] "THUNDERSTORM" "HEAVY SNOW/ICE"
## [83] "SMALL HAIL" "THUNDERSTORM WINDS"
## [85] "FLASH FLOODING" "MARINE TSTM WIND"
## [87] "HIGH SEAS" "GUSTY WINDS"
## [89] "NON-SEVERE WIND DAMAGE" "HIGH WINDS/SNOW"
## [91] "EXTREME WINDCHILL" "STORM SURGE/TIDE"
## [93] "ROUGH SEAS" "MARINE MISHAP"
## [95] "COASTAL FLOODING/EROSION" "TYPHOON"
## [97] "High Surf" "DROUGHT"
## [99] "HEAVY RAINS" "HIGH WINDS/COLD"
head(peopleHealthDF$totalInjuries, n = 100)
## [1] 91346 6957 6789 6525 5230 2100 1975 1777 1488 1361 1321
## [12] 1275 1137 1021 911 908 805 734 545 440 398 342
## [23] 340 309 302 297 280 251 232 231 216 170 155
## [34] 152 150 137 129 95 86 79 77 72 70 68
## [45] 52 50 48 48 46 43 42 42 40 38 36
## [56] 35 31 29 29 28 27 26 26 24 24 23
## [67] 22 21 21 20 17 17 16 15 15 15 15
## [78] 15 13 12 12 10 10 10 8 8 8 8
## [89] 7 6 5 5 5 5 5 5 4 4 4
## [100] 4
## there are 84 EVTYPES where injuries are greater than or equal to 10
## map mislabeled/mistyped events from these 84 to the standard events
## No equivalent exists for LANDSLIDE - hence, leaving that as is
additionalEvents <- c("TSTM WIND", "WILD FIRES", "ICE", "TSTM WIND/HAIL", "WINTRY MIX",
"RECORD HEAT", "SNOW/HIGH WINDS", "SNOW SQUALL", "ICY ROADS", "SNOW",
"DRY MICROBURST", "THUNDERSTORMW", "MIXED PRECIP", "BLACK ICE", "FREEZING RAIN",
"EXCESSIVE RAINFALL", "FREEZING DRIZZLE", "BLOWING SNOW", "SMALL HAIL")
additionalStdEvents <- c("Thunderstorm Wind", "Wildfire", "Ice Storm", "Thunderstorm Wind", "Winter Weather",
"Excessive Heat", "High Wind", "Winter Storm", "Frost/Freeze", "Heavy Snow",
"Thunderstorm Wind", "Thunderstorm Wind", "Heavy Rain", "Frost/Freeze", "Heavy Rain",
"Heavy Rain", "Heavy Rain", "Winter Storm", "Hail")
events <- c(events, additionalEvents)
stdEvents <- c(stdEvents, additionalStdEvents)
## the following is based on some observed typos and liberties taken with naming
additionalEvents <- c("AVALANCE", "DUST DEVEL", "DUSTSTORM", "HURRICANE", "LIGHTING",
"LIGNTNING", "TSTM", "THUNDERSTORM")
additionalStdEvents <- c("Avalanche", "Dust Devil", "Dust Storm", "Hurricane (Typhoon)", "Lightning",
"Lightning", "Thunderstorm Wind", "Thunderstorm Wind")
events <- c(events, additionalEvents)
stdEvents <- c(stdEvents, additionalStdEvents)
## create new variable for the updated EVTYPE
stormDF$newEvtype <- apply(stormDF, 1, function(x)
{
eventToMatch <- toupper(x["EVTYPE"])
## literal substring matches first (fixed = TRUE makes grepl treat the pattern as plain text)
for (idx in 1:length(events))
{
if (grepl(events[idx], eventToMatch, fixed = TRUE))
{
return(stdEvents[idx])
}
}
## regular-expression substring matches next
for (idx in 1:length(events))
{
if (grepl(events[idx], eventToMatch))
{
return(stdEvents[idx])
}
}
return("other")
})
stormDF$newEvtype <- as.factor(stormDF$newEvtype)
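Because the loops return at the first pattern found, the order of `events` matters: a short pattern listed early can capture a label that a more specific pattern listed later would also match. The sketch below reproduces that behaviour on a hypothetical three-entry table (`miniEvents`, `miniStdEvents` and `mapEvent` are illustrative names, not part of the analysis).

```r
## Hypothetical miniature of the lookup: parallel vectors, first match wins.
miniEvents    <- c("THUNDERSTORM WIND", "WIND", "TSTM WIND")
miniStdEvents <- c("Thunderstorm Wind", "High Wind", "Thunderstorm Wind")

mapEvent <- function(evtype, events, stdEvents) {
  eventToMatch <- toupper(evtype)
  for (idx in seq_along(events)) {
    ## fixed = TRUE: plain substring test, no regular expressions
    if (grepl(events[idx], eventToMatch, fixed = TRUE)) {
      return(stdEvents[idx])
    }
  }
  "other"
}

## "WIND" is listed before "TSTM WIND", so it wins
mapEvent("Tstm Wind", miniEvents, miniStdEvents)                    # "High Wind"
## same table with the specific pattern moved ahead of the generic one
mapEvent("Tstm Wind", miniEvents[c(1, 3, 2)], miniStdEvents[c(1, 3, 2)])  # "Thunderstorm Wind"
## nothing matches, so the fallback label is returned
mapEvent("Volcanic Eruption", miniEvents, miniStdEvents)            # "other"
```

Listing longer, more specific patterns before shorter ones is one way to make such a lookup less sensitive to ordering.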
The Storm Data documentation (https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf) lists 3 values (K for thousands, M for millions and B for billions), whereas PROPDMGEXP has 19 levels in the dataset. Blanks are fine. Our approach is to keep B/b, M/m and K/k for our calculation; every other value is ignored and the multiplier is treated as 1.
## show the distribution
table(stormDF$PROPDMGEXP)
##
##             -      ?      +      0      1      2      3      4      5
## 465934      1      8      5    216     25     13      4      4     28
##      6      7      8      B      h      H      K      m      M
##      4      5      1     40      1      6 424665      7  11330
## create a multiplier column
valK = 10.0 ^ 3
valM = 10.0 ^ 6
valB = 10.0 ^ 9
stormDF$propDmgMultiplier <- apply(stormDF, 1, function(x)
{
exp <- toupper(trimws(x["PROPDMGEXP"]))
if (exp == "K") return (valK)
if (exp == "M") return (valM)
if (exp == "B") return (valB)
return (1.0)
})
## calculate property damage in millions
stormDF$propDmgMill <- apply(stormDF, 1, function(x)
{
damage <- as.numeric(x["PROPDMG"])
multiplier <- as.numeric(x["propDmgMultiplier"])
return (damage * multiplier / valM)
})
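The two row-wise `apply()` calls can also be expressed as a single vectorised lookup, which avoids converting every row to character and back with `as.numeric()`. The following is a sketch under that assumption, not the author's implementation; `dmgInMillions` is a hypothetical helper and the input values are made up.

```r
## Hypothetical vectorised version: look the exponent up in a named vector.
dmgInMillions <- function(dmg, exp) {
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  ## blank or unrecognised exponents yield NA from the lookup; treat them as 1
  mult <- unname(multipliers[toupper(trimws(exp))])
  mult[is.na(mult)] <- 1
  dmg * mult / 1e6
}

## made-up inputs: 25K, 2.5m, 1.2B, 500 with a blank exponent, 10 with "?"
dmgInMillions(c(25, 2.5, 1.2, 500, 10), c("K", "m", "B", "", "?"))
## 0.025, 2.5, 1200, 0.0005 and 0.00001 (USD millions)
```

On the real data this would be called as `dmgInMillions(stormDF$PROPDMG, stormDF$PROPDMGEXP)`.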
The Storm Data documentation (https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf) lists 3 values (K for thousands, M for millions and B for billions), whereas CROPDMGEXP has 9 levels in the dataset. Blanks are fine. Our approach is to keep B/b, M/m and K/k for our calculation; every other value is ignored and the multiplier is treated as 1.
## show the distribution
table(stormDF$CROPDMGEXP)
##
##             ?      0      2      B      k      K      m      M
## 618413      7     19      1      9     21 281832      1   1994
## create a multiplier column
stormDF$cropDmgMultiplier <- apply(stormDF, 1, function(x)
{
exp <- toupper(trimws(x["CROPDMGEXP"]))
if (exp == "K") return (valK)
if (exp == "M") return (valM)
if (exp == "B") return (valB)
return (1.0)
})
## calculate crop damage in millions
stormDF$cropDmgMill <- apply(stormDF, 1, function(x)
{
damage <- as.numeric(x["CROPDMG"])
multiplier <- as.numeric(x["cropDmgMultiplier"])
return (damage * multiplier / valM)
})
peopleDF <- stormDF %>%
select(FATALITIES, INJURIES, newEvtype) %>%
group_by(newEvtype) %>%
summarize (totalFatalities = sum(FATALITIES), totalInjuries = sum(INJURIES))
## look at fatalities only
topFatalityCutoff <- 100
topFatalities <- peopleDF %>%
select(newEvtype, totalFatalities) %>%
filter(totalFatalities >= topFatalityCutoff) %>%
arrange(desc(totalFatalities))
sortedEvtype <- as.character(topFatalities$newEvtype)
topFatalities <- rename(topFatalities, eventType = newEvtype, fatalities = totalFatalities)
qplot(eventType, data = topFatalities, geom = "bar", weight = fatalities, fill = I("#f03b20"),
xlab = paste0("Event Type (fatalities greater than or equal to ", topFatalityCutoff, ")"),
ylab = "Total Fatalities",
main = "Top Fatalities by Weather Events (US 1950-2011)") +
scale_x_discrete(limits = sortedEvtype) +
theme(axis.text.x = element_text(angle=90, hjust=1, vjust=0.5))
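`qplot()` has been deprecated since ggplot2 3.4.0, so the same bar chart may be easier to maintain when written with `ggplot()` and `geom_col()`. The sketch below is an alternative, not the code that produced the figure in this report; `topFive` is a hypothetical data frame holding the five largest totals from the fatalities table in this report.

```r
library(ggplot2)

## Values copied from the fatalities table in this report (top five rows)
topFive <- data.frame(
  eventType  = c("Tornado", "Excessive Heat", "Heat", "Flash Flood", "High Wind"),
  fatalities = c(5636, 1960, 1212, 1063, 864)
)

## reorder() sorts the bars by descending fatalities,
## which replaces the scale_x_discrete(limits = ...) step
p <- ggplot(topFive, aes(x = reorder(eventType, -fatalities), y = fatalities)) +
  geom_col(fill = "#f03b20") +
  labs(x = "Event Type", y = "Total Fatalities",
       title = "Top Fatalities by Weather Events (US 1950-2011)") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
print(p)
```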
## display all fatalities
allFatalities <- peopleDF %>%
select(newEvtype, totalFatalities) %>%
arrange(desc(totalFatalities))
allFatalities <- rename(allFatalities, event.type = newEvtype, fatalities = totalFatalities)
print.data.frame(allFatalities)
## event.type fatalities
## 1 Tornado 5636
## 2 Excessive Heat 1960
## 3 Heat 1212
## 4 Flash Flood 1063
## 5 High Wind 864
## 6 Lightning 817
## 7 Rip Current 577
## 8 Flood 484
## 9 Cold/Wind Chill 289
## 10 Avalanche 225
## 11 Winter Storm 219
## 12 Thunderstorm Wind 203
## 13 Extreme Cold/Wind Chill 162
## 14 High Surf 146
## 15 Heavy Snow 143
## 16 Hurricane (Typhoon) 133
## 17 Heavy Rain 111
## 18 Strong Wind 111
## 19 Ice Storm 102
## 20 Blizzard 101
## 21 Wildfire 90
## 22 Dense Fog 80
## 23 other 76
## 24 Tropical Storm 66
## 25 Winter Weather 62
## 26 Hail 45
## 27 Landslide 39
## 28 Tsunami 33
## 29 Storm Surge/Tide 24
## 30 Dust Storm 22
## 31 Marine Strong Wind 14
## 32 Marine Thunderstorm Wind 10
## 33 Frost/Freeze 7
## 34 Coastal Flood 6
## 35 Drought 6
## 36 Waterspout 3
## 37 Dust Devil 2
## 38 Sleet 2
## 39 Astronomical Low Tide 0
## 40 Dense Smoke 0
## 41 Freezing Fog 0
## 42 Funnel Cloud 0
## 43 Lake-Effect Snow 0
## 44 Seiche 0
## 45 Tropical Depression 0
## 46 Volcanic Ash 0
## look at injuries only
topInjuryCutoff <- 500
topInjuries <- peopleDF %>%
select(newEvtype, totalInjuries) %>%
filter(totalInjuries >= topInjuryCutoff) %>%
arrange(desc(totalInjuries))
sortedEvtype <- as.character(topInjuries$newEvtype)
topInjuries <- rename(topInjuries, eventType = newEvtype, injuries = totalInjuries)
qplot(eventType, data = topInjuries, geom = "bar", weight = injuries, fill = I("#fc9272"),
xlab = paste0("Event Type (injuries greater than or equal to ", topInjuryCutoff, ")"),
ylab = "Total Injuries",
main = "Top Injuries by Weather Events (US 1950-2011)") +
scale_x_discrete(limits = sortedEvtype) +
theme(axis.text.x = element_text(angle=90, hjust=1, vjust=0.5))
## display all injury values
allInjuries <- peopleDF %>%
select(newEvtype, totalInjuries) %>%
arrange(desc(totalInjuries))
allInjuries <- rename(allInjuries, event.type = newEvtype, injuries = totalInjuries)
print.data.frame(allInjuries)
## event.type injuries
## 1 Tornado 91407
## 2 High Wind 8615
## 3 Flood 6795
## 4 Excessive Heat 6542
## 5 Lightning 5232
## 6 Heat 2684
## 7 Thunderstorm Wind 2468
## 8 Ice Storm 2154
## 9 Flash Flood 1881
## 10 Wildfire 1606
## 11 Hail 1467
## 12 Winter Storm 1373
## 13 Hurricane (Typhoon) 1333
## 14 Heavy Snow 1086
## 15 Dense Fog 1076
## 16 Blizzard 805
## 17 Winter Weather 615
## 18 Rip Current 529
## 19 Dust Storm 440
## 20 Tropical Storm 383
## 21 Heavy Rain 340
## 22 Strong Wind 301
## 23 other 297
## 24 Extreme Cold/Wind Chill 231
## 25 High Surf 204
## 26 Avalanche 171
## 27 Tsunami 129
## 28 Cold/Wind Chill 85
## 29 Landslide 53
## 30 Dust Devil 43
## 31 Storm Surge/Tide 43
## 32 Frost/Freeze 34
## 33 Waterspout 29
## 34 Marine Thunderstorm Wind 26
## 35 Marine Strong Wind 22
## 36 Drought 19
## 37 Coastal Flood 7
## 38 Funnel Cloud 3
## 39 Astronomical Low Tide 0
## 40 Dense Smoke 0
## 41 Freezing Fog 0
## 42 Lake-Effect Snow 0
## 43 Seiche 0
## 44 Sleet 0
## 45 Tropical Depression 0
## 46 Volcanic Ash 0
## create groupings based on event types
## look at combined property damage and crop damage
economicDF <- stormDF %>%
select(propDmgMill, cropDmgMill, newEvtype) %>%
group_by(newEvtype) %>%
summarize (totalPropDmg = sum(propDmgMill), totalCropDmg = sum(cropDmgMill))
economicDF <- economicDF %>%
mutate(totalEconomicDmg = totalPropDmg + totalCropDmg)
topDamagesCutoff <- 500
topDamages <- economicDF %>%
filter(totalEconomicDmg >= topDamagesCutoff) %>%
arrange(desc(totalEconomicDmg))
sortedEvtype <- as.character(topDamages$newEvtype)
## to enable better presentation, we need to melt the dataframe
## using reshape2 library package
temp <- topDamages %>%
select(newEvtype, totalPropDmg, totalCropDmg)
temp <- rename(temp, eventType = newEvtype, property = totalPropDmg, crop = totalCropDmg)
damages <- melt(temp, id=c("eventType"))
damages <- rename(damages, damageType = variable, damages = value)
qplot(eventType, data = damages, geom = "bar", weight = damages, fill = damageType,
xlab = paste0("Event Type (damages greater than or equal to USD ", topDamagesCutoff, "M)"),
ylab = "Total Damages (USD Millions)",
main = "Top Damages by Weather Events (US 1950-2011)") +
scale_x_discrete(limits = sortedEvtype) +
theme(axis.text.x = element_text(angle=90, hjust=1, vjust=0.5)) +
labs(fill="Damage Type") +
guides(fill=guide_legend(reverse=TRUE))
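Melting turns one row per event type, with separate property and crop columns, into one row per event type and damage type, so that the damage type can be mapped to the bar fill. The sketch below shows the same reshaping with base R's `stack()` instead of `reshape2::melt()`; `wide`, `stacked` and `long` are hypothetical names, and the three rows are copied from the damages table in this report.

```r
## Top three rows of the damages table in this report (USD millions)
wide <- data.frame(
  eventType = c("Flood", "Hurricane (Typhoon)", "Tornado"),
  property  = c(150205.21668, 85256.41001, 56993.09803),
  crop      = c(10847.85595, 5506.11780, 414.96152)
)

## Base-R counterpart of melt(wide, id = "eventType"):
## stack() puts the measure columns into one `values` column plus an `ind` label column
stacked <- stack(wide[, c("property", "crop")])
long <- data.frame(
  eventType  = rep(wide$eventType, times = 2),
  damageType = stacked$ind,
  damages    = stacked$values
)
print(long)
```

Each event type now appears twice, once per damage type, and the two values for an event still add up to its total damage.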
## display all the damages
allDamages <- economicDF %>%
arrange(desc(totalEconomicDmg))
allDamages <- rename(allDamages, event.type = newEvtype,
property.damage = totalPropDmg,
crop.damage = totalCropDmg,
total.damage = totalEconomicDmg)
## ALL THE DAMAGES ARE IN USD (MILLIONS)
print.data.frame(allDamages)
## event.type property.damage crop.damage total.damage
## 1 Flood 150205.21668 10847.85595 161053.07263
## 2 Hurricane (Typhoon) 85256.41001 5506.11780 90762.52781
## 3 Tornado 56993.09803 414.96152 57408.05955
## 4 Storm Surge/Tide 47974.14915 0.85500 47975.00415
## 5 Hail 17619.99107 3114.21287 20734.20394
## 6 Flash Flood 16965.21784 1540.68525 18505.90309
## 7 Drought 1046.30600 13972.62178 15018.92778
## 8 High Wind 10666.33960 1273.38025 11939.71985
## 9 Ice Storm 3964.13941 5022.11430 8986.25371
## 10 Wildfire 8491.56350 402.78163 8894.34513
## 11 Tropical Storm 7714.39055 694.89600 8409.28655
## 12 Thunderstorm Wind 6436.56211 652.80039 7089.36250
## 13 Winter Storm 6690.41225 27.44400 6717.85625
## 14 Heavy Rain 3242.60864 795.40980 4038.01844
## 15 Frost/Freeze 19.54120 1997.06100 2016.60220
## 16 Extreme Cold/Wind Chill 67.78740 1312.97300 1380.76040
## 17 Heavy Snow 970.76770 134.66310 1105.43080
## 18 Lightning 933.98495 12.09709 946.08204
## 19 Blizzard 659.91395 112.06000 771.97395
## 20 Excessive Heat 7.75370 492.41200 500.16570
## 21 Heat 12.37205 412.01150 424.38355
## 22 Coastal Flood 417.61606 0.05600 417.67206
## 23 Landslide 324.70100 20.01700 344.71800
## 24 Strong Wind 181.17424 69.95350 251.12774
## 25 other 20.74675 159.28895 180.03570
## 26 Cold/Wind Chill 68.34200 96.79250 165.13450
## 27 Tsunami 144.06200 0.02000 144.08200
## 28 High Surf 100.02500 0.00000 100.02500
## 29 Winter Weather 27.31050 15.00000 42.31050
## 30 Lake-Effect Snow 40.18200 0.00000 40.18200
## 31 Dense Fog 22.82950 0.00000 22.82950
## 32 Waterspout 9.56420 0.00000 9.56420
## 33 Dust Storm 5.59900 3.60000 9.19900
## 34 Avalanche 8.72180 0.00000 8.72180
## 35 Freezing Fog 2.18200 0.00000 2.18200
## 36 Tropical Depression 1.73700 0.00000 1.73700
## 37 Sleet 1.50000 0.00000 1.50000
## 38 Seiche 0.98000 0.00000 0.98000
## 39 Dust Devil 0.71913 0.00000 0.71913
## 40 Volcanic Ash 0.50000 0.00000 0.50000
## 41 Marine Thunderstorm Wind 0.43640 0.05000 0.48640
## 42 Marine Strong Wind 0.41833 0.00000 0.41833
## 43 Astronomical Low Tide 0.32000 0.00000 0.32000
## 44 Funnel Cloud 0.19460 0.00000 0.19460
## 45 Rip Current 0.16300 0.00000 0.16300
## 46 Dense Smoke 0.10000 0.00000 0.10000
## THE END:-)