Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, crop damage and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, crop damage and property damage.
The analysis will address the following questions.
Here is the summary from this analysis (aiming to project objective only):
The cookbook for the data is the storm database documentation, which describes how some of the variables are constructed and defined.
The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size.
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
library(knitr)
library(lubridate)
library(dplyr)
library(ggplot2)
library(gridExtra)
sessionInfo()
## R version 3.1.2 (2014-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] xtable_1.7-4 gridExtra_2.0.0 ggplot2_1.0.1 dplyr_0.4.3
## [5] lubridate_1.3.3 knitr_1.11
##
## loaded via a namespace (and not attached):
## [1] assertthat_0.1 colorspace_1.2-6 DBI_0.3.1 digest_0.6.8
## [5] evaluate_0.8 formatR_1.2.1 grid_3.1.2 gtable_0.1.2
## [9] htmltools_0.2.6 magrittr_1.5 MASS_7.3-44 memoise_0.2.1
## [13] munsell_0.4.2 parallel_3.1.2 plyr_1.8.3 proto_0.3-10
## [17] R6_2.1.1 Rcpp_0.12.1 reshape2_1.4.1 rmarkdown_0.8
## [21] scales_0.3.0 stringi_0.5-5 stringr_1.0.0 tools_3.1.2
## [25] yaml_2.1.13
# Download the compressed data file if it is not already present
if (!file.exists("StormData.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                "StormData.csv.bz2", mode = "wb")  # binary mode keeps the archive intact on Windows
}
# Read the original dataset into memory
stormData <- read.csv(bzfile("StormData.csv.bz2"), header = TRUE, stringsAsFactors = FALSE)
stormData <- tbl_df(data.frame(stormData))
dim_stormData <- dim(stormData)
The dimensions of the original dataset are (902297, 37).
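As a small illustration of why no manual decompression step is needed, the sketch below writes a toy CSV through a `bzfile()` connection and reads it back the same way the report reads the storm data. The temporary file and the two toy rows are assumptions made for the example, not part of the NOAA data.

```r
# Write a tiny CSV through a bzip2 connection, then read it back.
# The file name and contents here are illustrative, not the NOAA data.
tmp <- tempfile(fileext = ".csv.bz2")
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"),
                  FATALITIES = c(3, 0),
                  stringsAsFactors = FALSE)

con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens and closes the unopened bzfile() connection itself
back <- read.csv(bzfile(tmp), header = TRUE, stringsAsFactors = FALSE)
print(dim(back))
unlink(tmp)
```

The same call pattern, `read.csv(bzfile(path))`, is what the report applies to the full StormData.csv.bz2 file.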
Summary records (which have text “summary” or “Summary”) are removed from the original dataset as they are irrelevant.
The crop (CROPDMG) and property (PROPDMG) damage amounts are converted to US dollars using the exponent variables “CROPDMGEXP” and “PROPDMGEXP” respectively. Valid exponents are K (or k), M (or m) and B (or b). Any other entry (+, -, “”, 0, 1, 2, 3, 4, 5, 6, 7, 8, h, H) is treated as invalid: the exponent is set to NA and the corresponding damage is counted as 0.
Event types are cleaned up (see the Map EVTYPE Variable section) based on the provided documentation, which defines only 48 event types; these should be the only event types allowed.
Calculate and plot the top 10 events most harmful to population health, by both mean and total values.
Calculate and plot the top 10 events with the greatest economic consequences, by both mean and total values.
For example, calculate the Harmful data by event in the state of Utah.
For example, calculate the Economic Consequences data by event in the state of Utah.
For example, calculate the Harmful and Economic Consequences data by year in the state of Utah.
“EVTYPE” - The type of weather event
“FATALITIES” - The number of deaths directly associated with the weather event.
“INJURIES” - The number of non-fatal injuries directly associated with the weather event.
“PROPDMG” and “PROPDMGEXP” - The dollar cost of Property Damage associated with the weather event. This cost is divided into two variables, PROPDMG is the numeric estimate of the damage and PROPDMGEXP is the units associated with the numeric estimate. The units may be K-thousands, M-millions or B-billions.
“CROPDMG” and “CROPDMGEXP” - The dollar cost of Crop Damage associated with the weather event. This cost is divided into two variables, CROPDMG is the numeric estimate of the damage and CROPDMGEXP is the units associated with the numeric estimate. The units may be K-thousands, M-millions or B-billions.
# Remove records whose EVTYPE contains "summary" or "Summary"
stormData <- stormData[!grepl("Summary|summary", stormData$EVTYPE), ]
CROPDMGEXP and PROPDMGEXP also contain other entries (+, -, “”, 0, 1, 2, 3, 4, 5, 6, 7, 8, h, H). Because their meaning is uncertain, the corresponding damage is counted as 0.
# Convert damage amounts into US$1 units: K/k means US$1,000,
# M/m means US$1,000,000 and B/b means US$1,000,000,000.
# Also derive the event year and keep only the variables of interest.
stormDataNA <- stormData %>%
mutate(YEAR_EVENT = year(mdy_hms(BGN_DATE))) %>%
mutate(PROPDMGEXP = ifelse(PROPDMGEXP %in% (c("k","K","m","M","b","B")), PROPDMGEXP, NA)) %>%
mutate(CROPDMGEXP = ifelse(CROPDMGEXP %in% (c("k","K","m","M","b","B")), CROPDMGEXP, NA)) %>%
mutate(propDamage = ifelse(PROPDMGEXP %in% (c("k","K")), PROPDMG * 1000,
ifelse(PROPDMGEXP %in% (c("m","M")), PROPDMG * 1000000,
ifelse(PROPDMGEXP %in% (c("b","B")), PROPDMG * 1000000000, 0)))) %>%
mutate(cropDamage = ifelse(CROPDMGEXP %in% (c("k","K")), CROPDMG * 1000,
ifelse(CROPDMGEXP %in% (c("m","M")), CROPDMG * 1000000,
ifelse(CROPDMGEXP %in% (c("b","B")), CROPDMG * 1000000000, 0)))) %>%
mutate(ECOIMPACT = propDamage + cropDamage) %>%
mutate(HARMFUL = FATALITIES + INJURIES) %>%
select(STATE, YEAR_EVENT, COUNTY, EVTYPE, REFNUM, FATALITIES, INJURIES, HARMFUL, ECOIMPACT)
len <- length(unique(stormDataNA$EVTYPE))
dim_stormDataNA <- dim(stormDataNA)
Now, the dimensions of the processed dataset = (902224, 9).
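The exponent conversion can also be written with a named lookup vector instead of nested `ifelse()` calls. The following is a minimal sketch of that alternative, not the code used in this report; `exp_multiplier` and `to_dollars` are hypothetical names, and the toy inputs are made up. As in the pipeline above, any code other than K, M or B (in either case) contributes 0.

```r
# Multipliers for the three exponent codes the report treats as valid.
exp_multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert an amount and its exponent code to US$.
# Codes outside K/M/B (any case) contribute 0, as in the report's pipeline.
to_dollars <- function(amount, exp_code) {
  mult <- unname(exp_multiplier[toupper(exp_code)])
  mult[is.na(mult)] <- 0
  amount * mult
}

toy_amount <- c(25, 2.5, 1, 10, 7)
toy_exp    <- c("K", "m", "B", "+", "")
print(to_dollars(toy_amount, toy_exp))
```

A lookup vector keeps the valid codes and their multipliers in one place, which makes it easier to see (or later change) how unrecognised codes are handled.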
Before the EVTYPE clean-up, there are 921 unique entries in the EVTYPE variable. Analysing the data without first mapping these entries to the defined event types would be unreliable. The documentation states that only 48 event types are valid. See the picture here.
Therefore, a new variable called EVTYPE_Clean is created with the following code to map each record to one of the 48 defined event types.
options(width = 10000)
# Generate a new variable EVTYPE_Clean
stormDataNA$EVTYPE_Clean <- NA
# Generate valid Event Type as described in Cookbook
stormDataNA$EVTYPE_Clean[grepl("Astronomical Low Tide",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Astronomical Low Tide"
stormDataNA$EVTYPE_Clean[grepl("Avalanche",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Avalanche"
stormDataNA$EVTYPE_Clean[grepl("Blizzard",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Blizzard"
stormDataNA$EVTYPE_Clean[grepl("Coastal Flood",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Coastal Flood"
stormDataNA$EVTYPE_Clean[grepl("Cold/Wind Chill|COLD|WIND CHILL",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Cold/Wind Chill"
stormDataNA$EVTYPE_Clean[grepl("Debris Flow|Landslide",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Debris Flow"
stormDataNA$EVTYPE_Clean[grepl("Dense Fog",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Dense Fog"
stormDataNA$EVTYPE_Clean[grepl("Dense Smoke",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Dense Smoke"
stormDataNA$EVTYPE_Clean[grepl("Drought",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Drought"
stormDataNA$EVTYPE_Clean[grepl("Dust Devil",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Dust Devil"
stormDataNA$EVTYPE_Clean[grepl("Dust Storm",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Dust Storm"
stormDataNA$EVTYPE_Clean[grepl("Excessive Heat",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Excessive Heat"
stormDataNA$EVTYPE_Clean[grepl("Extreme Cold/Wind Chill|EXTREME COLD|WIND CHILL|EXTREME WIND CHILL",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Extreme Cold/Wind Chill"
stormDataNA$EVTYPE_Clean[grepl("Flash Flood",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Flash Flood"
stormDataNA$EVTYPE_Clean[grepl("Flood",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Flood"
stormDataNA$EVTYPE_Clean[grepl("Frost/Freeze|FROST|FREEZE",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Frost/Freeze"
stormDataNA$EVTYPE_Clean[grepl("Funnel Cloud|FUNNEL CLOUD",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Funnel Cloud"
stormDataNA$EVTYPE_Clean[grepl("Freezing Fog|FREEZING FOG",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Freezing Fog"
stormDataNA$EVTYPE_Clean[grepl("Hail|HAIL",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Hail"
stormDataNA$EVTYPE_Clean[grepl("Heat|HEAT",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Heat"
stormDataNA$EVTYPE_Clean[grepl("Heavy Rain|HEAVY RAIN",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)== TRUE] <- "Heavy Rain"
stormDataNA$EVTYPE_Clean[grepl("Heavy Snow|HEAVY SNOW",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Heavy Snow"
stormDataNA$EVTYPE_Clean[grepl("High Surf|HIGH SURF",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "High Surf"
stormDataNA$EVTYPE_Clean[grepl("High Wind|HIGH WIND",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "High Wind"
stormDataNA$EVTYPE_Clean[grepl("Hurricane (Typhoon)|HURRICANE|TYPHOON",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Hurricane (Typhoon)"
stormDataNA$EVTYPE_Clean[grepl("Ice Storm",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Ice Storm"
stormDataNA$EVTYPE_Clean[grepl("Lake-Effect Snow|LAKE-EFFECT SNOW",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Lake-Effect Snow"
stormDataNA$EVTYPE_Clean[grepl("Lakeshore Flood",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Lakeshore Flood"
stormDataNA$EVTYPE_Clean[grepl("Lightning",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Lightning"
stormDataNA$EVTYPE_Clean[grepl("Marine Hail",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Marine Hail"
stormDataNA$EVTYPE_Clean[grepl("Marine High Wind",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Marine High Wind"
stormDataNA$EVTYPE_Clean[grepl("Marine Strong Wind",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Marine Strong Wind"
stormDataNA$EVTYPE_Clean[grepl("Marine Thunderstorm Wind|MARINE TSTM WIND",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Marine Thunderstorm Wind"
stormDataNA$EVTYPE_Clean[grepl("Rip Current",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Rip Current"
stormDataNA$EVTYPE_Clean[grepl("Seiche",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Seiche"
stormDataNA$EVTYPE_Clean[grepl("Sleet",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Sleet"
stormDataNA$EVTYPE_Clean[grepl("Storm Surge/Tide|STORM SURGE|TIDE|STORM TIDE",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Storm Surge/Tide"
stormDataNA$EVTYPE_Clean[grepl("Strong Wind",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Strong Wind"
stormDataNA$EVTYPE_Clean[grepl("Thunderstorm Wind|THUNDERSTORM WIND|THUNDERSTORM WINDS|THUNDERSTORM WINDSS|TSTM WIND",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Thunderstorm Wind"
stormDataNA$EVTYPE_Clean[grepl("Tornado",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Tornado"
stormDataNA$EVTYPE_Clean[grepl("Tropical Depression",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Tropical Depression"
stormDataNA$EVTYPE_Clean[grepl("Tropical Storm",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Tropical Storm"
stormDataNA$EVTYPE_Clean[grepl("Tsunami",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Tsunami"
stormDataNA$EVTYPE_Clean[grepl("Volcanic Ash",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Volcanic Ash"
stormDataNA$EVTYPE_Clean[grepl("Waterspout",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Waterspout"
stormDataNA$EVTYPE_Clean[grepl("Wildfire",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Wildfire"
stormDataNA$EVTYPE_Clean[grepl("Winter Storm",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Winter Storm"
stormDataNA$EVTYPE_Clean[grepl("Winter Weather",
stormDataNA$EVTYPE, ignore.case = TRUE)
& is.na(stormDataNA$EVTYPE_Clean)==TRUE] <- "Winter Weather"
# Records with unmapped EVTYPE are removed
stormDataNA <- stormDataNA[!is.na(stormDataNA$EVTYPE_Clean),]
dim_stormDataNA <- dim(stormDataNA)
len2 <- length(unique(stormDataNA$EVTYPE_Clean))
Now, the dimensions of the processed dataset = (892768, 10).
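After the EVTYPE clean-up, there are 44 unique entries in the EVTYPE_Clean variable.
Note that EVTYPE values are categorised in the order presented in the script above: once an EVTYPE has been categorised, it is not categorised again. As a consequence, a broad pattern that appears early absorbs entries that a later, more specific pattern would have matched. “Lakeshore Flood”, “Marine Hail”, “Marine High Wind” and “Extreme Cold/Wind Chill” are never assigned, because “Flood”, “Hail”, “High Wind” and “Cold/Wind Chill” are tested first, which is consistent with 44 rather than 48 categories.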
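Because the first matching rule wins, the order of the rules matters. The sketch below shows the same first-match-wins idea as a loop over an ordered rule table, with more specific patterns listed before broader ones. It is an illustration under assumptions, not the mapping used for the results in this report: `rules` holds only a small subset of categories, and `map_first_match` is a hypothetical helper.

```r
# Ordered rules: earlier entries win. Names are the target categories,
# values are regular expressions (a small illustrative subset only).
rules <- c("Marine Hail" = "Marine Hail",
           "Hail"        = "Hail",
           "Flash Flood" = "Flash Flood",
           "Flood"       = "Flood")

# Hypothetical helper implementing first-match-wins classification.
map_first_match <- function(evtype, rules) {
  clean <- rep(NA_character_, length(evtype))
  for (i in seq_along(rules)) {
    # Only rows that are still unclassified can be assigned
    hit <- is.na(clean) & grepl(rules[[i]], evtype, ignore.case = TRUE)
    clean[hit] <- names(rules)[i]
  }
  clean
}

toy <- c("MARINE HAIL", "hail 075", "FLASH FLOODING", "River Flood", "OTHER")
mapped <- map_first_match(toy, rules)
print(data.frame(EVTYPE = toy, EVTYPE_Clean = mapped))
```

Swapping the first two rules would send "MARINE HAIL" to "Hail" instead, which is the ordering effect described above. Entries that match no rule stay NA and can be dropped afterwards.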
The total values of HARMFUL (fatalities + injuries) grouped by event type are calculated. The plot is shown below.
harmfulByEVTYPE <- stormDataNA %>%
group_by(EVTYPE_Clean) %>%
summarise(total_Harmful = sum(HARMFUL),
total_Fatalities = sum(FATALITIES),
total_Injuries = sum(INJURIES)) %>%
top_n(n = 10, wt = total_Harmful)
ggplot(data=harmfulByEVTYPE,
aes(x = reorder(EVTYPE_Clean, -total_Harmful), y = total_Harmful)) +
geom_bar(col="white", fill="blue", alpha = 0.85, stat="identity") +
labs(x="Weather Event Type", y="Total Harmful") +
ggtitle(expression(atop("Total Harmful (Fatalities + Injuries)",
atop(italic("United States: 1950 - 2011"), "")))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
For reference, the “total fatalities” and “total injuries” are also calculated, grouped by event type. The table is shown below.
dfHealth <- harmfulByEVTYPE %>%
arrange(desc(total_Harmful)) %>%
filter((row_number() <= 10)) %>%
select(EVTYPE_Clean, total_Harmful, total_Fatalities, total_Injuries)
kable(x = dfHealth, row.names = TRUE,
col.names = c("Weather Event", "Total Harmful", "Total Fatalities", "Total Injuries" ))
| | Weather Event | Total Harmful | Total Fatalities | Total Injuries |
|---|---|---|---|---|
| 1 | Tornado | 97043 | 5636 | 91407 |
| 2 | Thunderstorm Wind | 10067 | 704 | 9363 |
| 3 | Excessive Heat | 8445 | 1920 | 6525 |
| 4 | Flood | 7279 | 484 | 6795 |
| 5 | Lightning | 6049 | 817 | 5232 |
| 6 | Heat | 3896 | 1212 | 2684 |
| 7 | Flash Flood | 2837 | 1035 | 1802 |
| 8 | Ice Storm | 2079 | 89 | 1990 |
| 9 | High Wind | 1815 | 297 | 1518 |
| 10 | Winter Storm | 1554 | 216 | 1338 |
The mean values of HARMFUL (fatalities + injuries) per recorded event, grouped by event type, are calculated; the result is shown below.
harmfulByEVTYPE <- stormDataNA %>%
group_by(EVTYPE_Clean) %>%
summarise(mean_Harmful = mean(HARMFUL),
mean_Fatalities = mean(FATALITIES),
mean_Injuries = mean(INJURIES)) %>%
top_n(n = 10, wt = mean_Harmful)
ggplot(data=harmfulByEVTYPE,
aes(x = reorder(EVTYPE_Clean, -mean_Harmful), y = mean_Harmful)) +
geom_bar(col="white", fill="blue", alpha = 0.85, stat="identity") +
labs(x="Weather Event Type", y="Mean Harmful") +
ggtitle(expression(atop("Mean Harmful (Fatalities + Injuries)",
atop(italic("United States: 1950 - 2011"), "")))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
dfHealth2 <- harmfulByEVTYPE %>%
arrange(desc(mean_Harmful)) %>%
filter((row_number() <= 10)) %>%
select(EVTYPE_Clean, mean_Harmful, mean_Fatalities, mean_Injuries)
kable(x = dfHealth2, row.names = TRUE, digits = c(0, 3, 3, 3),
col.names = c("Weather Event", "Mean Harmful", "Mean Fatalities", "Mean Injuries"))
| | Weather Event | Mean Harmful | Mean Fatalities | Mean Injuries |
|---|---|---|---|---|
| 1 | Tsunami | 8.100 | 1.650 | 6.450 |
| 2 | Excessive Heat | 5.024 | 1.142 | 3.882 |
| 3 | Hurricane (Typhoon) | 4.919 | 0.446 | 4.473 |
| 4 | Heat | 4.101 | 1.276 | 2.825 |
| 5 | Tornado | 1.599 | 0.093 | 1.506 |
| 6 | Rip Current | 1.423 | 0.743 | 0.681 |
| 7 | Dust Storm | 1.077 | 0.051 | 1.026 |
| 8 | Ice Storm | 1.026 | 0.044 | 0.982 |
| 9 | Avalanche | 1.021 | 0.579 | 0.442 |
| 10 | Marine Strong Wind | 0.750 | 0.292 | 0.458 |
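The total and mean rankings differ because totals favour frequent event types while means favour rare but severe ones. The toy example below, with made-up numbers and base R's `tapply()` instead of the dplyr pipeline used in this report, sketches that effect.

```r
# Toy events: one frequent, mostly harmless type and one rare, severe type.
toy <- data.frame(
  EVTYPE_Clean = c(rep("Frequent", 8), rep("Rare", 2)),
  HARMFUL      = c(0, 0, 1, 0, 2, 0, 0, 3, 2, 2)
)

# Sum and mean of HARMFUL per event type
totals <- tapply(toy$HARMFUL, toy$EVTYPE_Clean, sum)
means  <- tapply(toy$HARMFUL, toy$EVTYPE_Clean, mean)

print(sort(totals, decreasing = TRUE))
print(sort(means, decreasing = TRUE))
```

Here the frequent type leads on the total while the rare type leads on the mean, which mirrors how Tornado tops the totals table and Tsunami tops the means table.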
The total values of ECOIMPACT (property damage + crop damage, in US$) grouped by event type are calculated; the result is shown below.
ecoimpactByEVTYPE <- stormDataNA %>%
group_by(EVTYPE_Clean) %>%
summarise(total_EcoImpact = sum(ECOIMPACT)/1000000000) %>%
top_n(n = 10, wt = total_EcoImpact)
ggplot(data=ecoimpactByEVTYPE,
aes(x = reorder(EVTYPE_Clean, -total_EcoImpact), y = total_EcoImpact)) +
geom_bar(col="white", fill="blue", alpha = 0.85, stat="identity") +
labs(x="Weather Event Type", y="Total Economic Consequences") +
ggtitle(expression(atop("Total Economic Consequences (in US$ Billions)",
atop(italic("United States: 1950 - 2011"), "")))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
For reference, a table of Total Economic Consequences is also created and shown below.
ecoimpactByEVTYPE <- ecoimpactByEVTYPE %>%
arrange(desc(total_EcoImpact)) %>%
filter((row_number() <= 10)) %>%
select(EVTYPE_Clean, total_EcoImpact)
kable(x = ecoimpactByEVTYPE, row.names = TRUE, digits = c(0, 3),
col.names = c("Weather Event", "Total Economic Consequences (in US$ Billions)"))
| | Weather Event | Total Economic Consequences (in US$ Billions) |
|---|---|---|
| 1 | Flood | 161.053 |
| 2 | Hurricane (Typhoon) | 90.763 |
| 3 | Tornado | 57.408 |
| 4 | Storm Surge/Tide | 47.975 |
| 5 | Hail | 20.734 |
| 6 | Flash Flood | 18.439 |
| 7 | Drought | 15.019 |
| 8 | Thunderstorm Wind | 10.904 |
| 9 | Ice Storm | 8.968 |
| 10 | Tropical Storm | 8.409 |
The mean values of ECOIMPACT (property damage + crop damage, in US$) per recorded event, grouped by event type, are calculated; the result is shown below.
ecoimpactByEVTYPE <- stormDataNA %>%
group_by(EVTYPE_Clean) %>%
summarise(mean_EcoImpact = mean(ECOIMPACT)/1000000) %>%
top_n(n = 10, wt = mean_EcoImpact)
ggplot(data=ecoimpactByEVTYPE,
aes(x = reorder(EVTYPE_Clean, -mean_EcoImpact), y = mean_EcoImpact)) +
geom_bar(col="white", fill="blue", alpha = 0.85, stat="identity") +
labs(x="Weather Event Type", y="Mean Economic Consequences") +
ggtitle(expression(atop("Mean Economic Consequences (in US$ Millions)",
atop(italic("United States: 1950 - 2011"), "")))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
For reference, a table of Mean Economic Consequences is also created and shown below.
ecoimpactByEVTYPE <- ecoimpactByEVTYPE %>%
arrange(desc(mean_EcoImpact)) %>%
filter((row_number() <= 10)) %>%
select(EVTYPE_Clean, mean_EcoImpact)
kable(x = ecoimpactByEVTYPE, row.names = TRUE, digits = c(0, 3),
col.names = c("Weather Event", "Mean Economic Consequences (in US$ Millions)"))
| | Weather Event | Mean Economic Consequences (in US$ Millions) |
|---|---|---|
| 1 | Hurricane (Typhoon) | 304.572 |
| 2 | Storm Surge/Tide | 92.975 |
| 3 | Tropical Storm | 12.065 |
| 4 | Tsunami | 7.204 |
| 5 | Flood | 6.144 |
| 6 | Drought | 5.979 |
| 7 | Ice Storm | 4.424 |
| 8 | Wildfire | 1.864 |
| 9 | Frost/Freeze | 1.341 |
| 10 | Tornado | 0.946 |
stormDataUtah <- stormDataNA[which(stormDataNA$STATE == "UT"), ]
harmfulByEVTYPEUT <- stormDataUtah %>%
group_by(EVTYPE_Clean) %>%
summarise(total_Harmful_UT = sum(HARMFUL)) %>%
top_n(n = 10, wt = total_Harmful_UT)
plot1 <- ggplot(data=harmfulByEVTYPEUT,
aes(x = reorder(EVTYPE_Clean, -total_Harmful_UT), y = total_Harmful_UT)) +
geom_bar(col="white", fill="blue", alpha = 0.85, stat="identity") +
labs(x="Weather Event Type", y="Total Harmful in Utah") +
ggtitle(expression(atop("Total Harmful (Fatalities + Injuries)",
atop(italic("Utah: 1950 - 2011"), "")))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
harmfulByEVTYPEUT <- stormDataUtah %>%
group_by(EVTYPE_Clean) %>%
summarise(mean_Harmful_UT = mean(HARMFUL)) %>%
top_n(n = 10, wt = mean_Harmful_UT)
plot2 <- ggplot(data=harmfulByEVTYPEUT,
aes(x = reorder(EVTYPE_Clean, -mean_Harmful_UT), y = mean_Harmful_UT)) +
geom_bar(col="white", fill="blue", alpha = 0.85, stat="identity") +
labs(x="Weather Event Type", y="Mean Harmful in Utah") +
ggtitle(expression(atop("Mean Harmful (Fatalities + Injuries)",
atop(italic("Utah: 1950 - 2011"), "")))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
grid.arrange(plot1, plot2, ncol = 2, nrow = 1)
ecoimpactByEVTYPEUT <- stormDataUtah %>%
group_by(EVTYPE_Clean) %>%
summarise(total_EcoImpact_UT = sum(ECOIMPACT)/1000000) %>%
top_n(n = 10, wt = total_EcoImpact_UT)
plot3 <- ggplot(data=ecoimpactByEVTYPEUT,
aes(x = reorder(EVTYPE_Clean, -total_EcoImpact_UT), y = total_EcoImpact_UT)) +
geom_bar(col="white", fill="blue", alpha = 0.85, stat="identity") +
labs(x="Weather Event Type", y="Total Economic Consequences") +
ggtitle(expression(atop("Total Eco Consequences in US$M",
atop(italic("Utah: 1950 - 2011"), "")))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
ecoimpactByEVTYPEUT <- stormDataUtah %>%
group_by(EVTYPE_Clean) %>%
summarise(mean_EcoImpact_UT = mean(ECOIMPACT)/1000000) %>%
top_n(n = 10, wt = mean_EcoImpact_UT)
plot4 <- ggplot(data=ecoimpactByEVTYPEUT,
aes(x = reorder(EVTYPE_Clean, -mean_EcoImpact_UT), y = mean_EcoImpact_UT)) +
geom_bar(col="white", fill="blue", alpha = 0.85, stat="identity") +
labs(x="Weather Event Type", y="Mean Economic Consequences") +
ggtitle(expression(atop("Mean Eco Consequences in US$M",
atop(italic("Utah: 1950 - 2011"), "")))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
grid.arrange(plot3, plot4, ncol = 2, nrow = 1)
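The Utah summaries follow one pattern: subset by state, aggregate by event type, keep the top entries. A minimal sketch of that pattern as a reusable function is given below, so the same summary could be produced for another state. `top_events_by_state` is a hypothetical helper written in base R rather than dplyr, and the toy data frame only imitates the STATE, EVTYPE_Clean and HARMFUL columns of stormDataNA.

```r
# Hypothetical helper: top-n event types in one state by a summary of `value`.
# Assumes a data frame with STATE and EVTYPE_Clean columns, as in stormDataNA.
top_events_by_state <- function(df, state, value, n = 10, fun = sum) {
  sub <- df[df$STATE == state, , drop = FALSE]
  agg <- aggregate(sub[[value]],
                   by = list(EVTYPE_Clean = sub$EVTYPE_Clean),
                   FUN = fun)
  names(agg)[2] <- value
  # Largest values first, then keep the first n rows
  agg <- agg[order(-agg[[value]]), , drop = FALSE]
  head(agg, n)
}

toy <- data.frame(
  STATE        = c("UT", "UT", "UT", "CO", "UT"),
  EVTYPE_Clean = c("Avalanche", "Lightning", "Avalanche", "Hail", "Flood"),
  HARMFUL      = c(2, 1, 3, 0, 0),
  stringsAsFactors = FALSE
)
top_ut <- top_events_by_state(toy, "UT", "HARMFUL", n = 2)
print(top_ut)
```

Passing `fun = mean` gives the per-event mean instead of the total. Unlike `top_n()`, `head()` does not keep ties at the cut-off, so the two can return different row counts.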
Because the dataset contains no Utah records for 1950 - 1979, the following charts show only the years 1980 - 2011.
healthUT <- stormDataUtah %>%
group_by(YEAR_EVENT) %>%
summarise(total_health_UT = sum(HARMFUL)) %>%
filter(YEAR_EVENT >= 1980)
plot5 <- ggplot(data=healthUT,
aes(x = YEAR_EVENT, y = total_health_UT)) +
geom_bar(col="white", fill="blue", alpha = 0.85, stat="identity") +
labs(x="Year", y="Total Harmful") +
ggtitle(expression(atop("Total Harmful (Fatalities + Injuries)",
atop(italic("Utah: 1980 - 2011"))))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
ecoimpactUT <- stormDataUtah %>%
group_by(YEAR_EVENT) %>%
summarise(total_ecoimpact_UT = sum(ECOIMPACT)/1000000) %>%
filter(YEAR_EVENT >= 1980)
plot6 <- ggplot(data=ecoimpactUT,
aes(x = YEAR_EVENT, y = total_ecoimpact_UT)) +
geom_bar(col="white", fill="blue", alpha = 0.85, stat="identity") +
labs(x="Year", y="Total Economic Consequences") +
ggtitle(expression(atop("Total Eco Consequences in US$M",
atop(italic("Utah: 1980 - 2011"))))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
grid.arrange(plot5, plot6, ncol = 1, nrow = 2)
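The yearly grouping relies on YEAR_EVENT, which the report derives from BGN_DATE with lubridate's `mdy_hms()`. As a sketch of what that step does, the example below extracts the year with base R only. The date strings are illustrative values in a month/day/year plus time layout, and the explicit format string and UTC time zone are assumptions of this sketch.

```r
# BGN_DATE-style strings (month/day/year plus a time), for illustration only.
bgn_date <- c("4/18/1950 0:00:00", "11/28/2011 0:00:00", "7/4/1980 0:00:00")

# Parse with an explicit format and time zone, then pull out the calendar year.
parsed <- as.POSIXlt(bgn_date, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
year_event <- parsed$year + 1900   # POSIXlt stores years since 1900
print(year_event)

# Keep only the years used in the Utah yearly charts.
print(year_event[year_event >= 1980])
```

Strings that do not match the format come back as NA, so checking `anyNA(year_event)` after parsing is a cheap way to catch unexpected date layouts.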
This document has been published on Rpubs.com.