In this analysis we seek to answer two questions:
1) Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2) Across the United States, which types of events have the greatest economic consequences?
We produce an RMarkdown file and use the knitr package to create an HTML document published on RPubs. This literate statistical programming approach enables others to reproduce our work. We create two tables and one plot that help answer the questions above. Specifically, we see that tornadoes are the largest cause of harm to human health (fatalities and injuries). For economic damage (property plus crop damage), floods are the largest contributor.
First, set your working directory. Then download the data file and use read.csv() to read the data. Finally, load the dplyr, knitr and ggplot2 packages.
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileUrl, destfile = "./storm_data.csv.bz2", mode = "wb")
storm_data <- read.csv("./storm_data.csv.bz2")
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(knitr)
library(ggplot2)
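The download step above fetches the file on every run. A minimal sketch of one way to skip the download when a local copy already exists is shown below. The helper name `download_if_missing` and the placeholder URL are hypothetical and not part of the original analysis. The sketch also illustrates, on a tiny throwaway file, that read.csv() can read a bzip2-compressed file directly.

```r
# Hypothetical helper: download a file only when no local copy exists.
download_if_missing <- function(url, destfile) {
  if (file.exists(destfile)) {
    return(FALSE)  # local copy found, nothing downloaded
  }
  # mode = "wb" keeps the compressed archive byte-for-byte intact on Windows
  download.file(url, destfile = destfile, mode = "wb")
  TRUE
}

# Build a tiny bzip2-compressed CSV as a stand-in for the real data set.
demo_path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(demo_path, open = "wt")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "FLOOD,1"), con)
close(con)

# The file already exists, so the (placeholder) URL is never contacted.
downloaded <- download_if_missing("https://example.invalid/unused.csv.bz2",
                                  demo_path)

# read.csv() detects the compression and decompresses while reading.
demo <- read.csv(demo_path)
```

With the real data, the same helper could be called with fileUrl and the chosen destination path before read.csv().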
We create a new variable called casualties, which is the sum of FATALITIES and INJURIES, and use it as our measure of harm to population health. A summary table shows that tornadoes are by far the largest contributor to casualties. We also create a bar plot of the same summary.
Second, we create a new variable called totalDamage, which is the sum of PROPDMG and CROPDMG (property damage and crop damage, respectively) after scaling the dollar values by the PROPDMGEXP and CROPDMGEXP variables. This is our measure of economic loss. Another summary table shows which types of weather events result in the most property and crop damage.
Since there are many different event types, we keep only the ten most damaging in both tables and in the plot.
# Create table summarizing total casualties by event type
storm_data %>%
  mutate(casualties = FATALITIES + INJURIES) %>%
  group_by(EVTYPE) %>%
  summarize(casualties = sum(casualties)) %>%
  arrange(desc(casualties)) %>%
  head(10) %>%
  kable(digits = 2, caption = "Total casualties by event type.")
| EVTYPE | casualties |
|---|---|
| TORNADO | 96979 |
| EXCESSIVE HEAT | 8428 |
| TSTM WIND | 7461 |
| FLOOD | 7259 |
| LIGHTNING | 6046 |
| HEAT | 3037 |
| FLASH FLOOD | 2755 |
| ICE STORM | 2064 |
| THUNDERSTORM WIND | 1621 |
| WINTER STORM | 1527 |
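The table above lists TSTM WIND and THUNDERSTORM WIND, and also HEAT and EXCESSIVE HEAT, as separate rows. Whether such labels should be merged is a judgment call that the analysis here does not make. The sketch below only illustrates, on made-up toy values, how labels could be normalized before grouping. The cleaning rule and the helper name `clean_evtype` are assumptions for illustration.

```r
# Toy stand-in for the EVTYPE/casualties columns; values are illustrative only.
toy <- data.frame(
  EVTYPE = c("TSTM WIND", "THUNDERSTORM WIND", " tornado", "TORNADO"),
  casualties = c(5, 2, 10, 1),
  stringsAsFactors = FALSE
)

# Hypothetical cleaning rule: trim whitespace, upper-case the label,
# and expand the leading TSTM abbreviation.
clean_evtype <- function(x) {
  x <- toupper(trimws(x))
  sub("^TSTM", "THUNDERSTORM", x)
}

toy$EVTYPE <- clean_evtype(toy$EVTYPE)

# Sum casualties per cleaned label and sort in descending order.
merged <- aggregate(casualties ~ EVTYPE, data = toy, FUN = sum)
merged <- merged[order(-merged$casualties), ]
```

Applying a rule like this to the real data would change the totals reported in the tables, so it would need to be documented alongside the results.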
# Create barplot of the above table
plot_data <- storm_data %>%
  mutate(casualties = FATALITIES + INJURIES) %>%
  group_by(EVTYPE) %>%
  summarize(casualties = sum(casualties)) %>%
  arrange(desc(casualties)) %>%
  head(10)
p <- ggplot(plot_data, aes(EVTYPE, casualties)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  ggtitle("Total casualties by event type")
p
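ggplot2 orders a discrete axis by factor level (alphabetically for character data), so arranging the rows of plot_data does not by itself sort the bars by height. A minimal sketch of one way to sort them, using reorder() on toy values, is shown below. The data frame `toy_plot` and its numbers are made up for illustration.

```r
# Toy summary in the same shape as plot_data; the numbers are illustrative only.
toy_plot <- data.frame(
  EVTYPE = c("FLOOD", "TORNADO", "HEAT"),
  casualties = c(30, 100, 10),
  stringsAsFactors = FALSE
)

# reorder() sets the factor levels by a second variable. Negating that
# variable gives descending order, so the tallest bar is drawn first.
toy_plot$EVTYPE <- reorder(toy_plot$EVTYPE, -toy_plot$casualties)
bar_order <- levels(toy_plot$EVTYPE)

# Draw the plot only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p_sorted <- ggplot(toy_plot, aes(EVTYPE, casualties)) +
    geom_col() +
    labs(x = "Event type", y = "Casualties")
}
```

The same reorder() call could be placed inside aes() instead of modifying the data frame.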
# Create table summarizing total economic damage by event type.
# To do this we must first convert the dollar values using the
# variables ending in EXP, which hold a magnitude code
# (H = hundreds, K = thousands, M = millions, B = billions).
# Only these uppercase codes are matched; any other code leaves
# the value unscaled.
storm_data_damage <- storm_data %>%
  mutate(
    propDamage =
      ifelse(PROPDMGEXP == "B", PROPDMG * 1e9,
      ifelse(PROPDMGEXP == "M", PROPDMG * 1e6,
      ifelse(PROPDMGEXP == "K", PROPDMG * 1e3,
      ifelse(PROPDMGEXP == "H", PROPDMG * 1e2,
             PROPDMG)))),
    cropDamage =
      ifelse(CROPDMGEXP == "B", CROPDMG * 1e9,
      ifelse(CROPDMGEXP == "M", CROPDMG * 1e6,
      ifelse(CROPDMGEXP == "K", CROPDMG * 1e3,
      ifelse(CROPDMGEXP == "H", CROPDMG * 1e2,
             CROPDMG))))
  ) %>%
  # Keep only the columns needed for the damage summary
  transmute(EVTYPE, totalDamage = propDamage + cropDamage) %>%
  group_by(EVTYPE) %>%
  summarize(damage = sum(totalDamage)) %>%
  arrange(desc(damage)) %>%
  head(10) %>%
  kable(digits = 2, caption = "Total damage by event type.")
storm_data_damage
| EVTYPE | damage |
|---|---|
| FLOOD | 150319678257 |
| HURRICANE/TYPHOON | 71913712800 |
| TORNADO | 57340614060 |
| STORM SURGE | 43323541000 |
| HAIL | 18752905438 |
| FLASH FLOOD | 17562129167 |
| DROUGHT | 15018672000 |
| HURRICANE | 14610229010 |
| RIVER FLOOD | 10148404500 |
| ICE STORM | 8967041360 |
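The nested ifelse() calls used for the exponent conversion can also be expressed as a named lookup vector, which may be easier to extend. The sketch below is an alternative formulation on toy values, not the code that produced the table above. The helper `scale_damage` and its `ignore_case` option are hypothetical; treating lowercase codes as equivalent to uppercase ones is an assumption that the original analysis does not make.

```r
# Multipliers for the exponent codes handled in the analysis above.
exp_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: scale `value` by its exponent code. Codes without an
# entry in the lookup (blank, digits, symbols) leave the value unchanged.
scale_damage <- function(value, exp_code, ignore_case = FALSE) {
  code <- as.character(exp_code)
  if (ignore_case) code <- toupper(code)
  mult <- unname(exp_multiplier[code])
  mult[is.na(mult)] <- 1  # unmatched codes: multiply by 1
  value * mult
}

# Toy rows; values are illustrative only.
toy_damage <- data.frame(
  PROPDMG = c(2.5, 10, 3, 7),
  PROPDMGEXP = c("K", "M", "", "m"),
  stringsAsFactors = FALSE
)

# Strict matching mirrors the uppercase-only comparison in the ifelse() chain.
scaled_strict <- scale_damage(toy_damage$PROPDMG, toy_damage$PROPDMGEXP)

# Optional variant that also accepts lowercase codes (an assumption).
scaled_loose <- scale_damage(toy_damage$PROPDMG, toy_damage$PROPDMGEXP,
                             ignore_case = TRUE)
```

In the toy data the two variants differ only for the last row, where the lowercase "m" is scaled to millions by the loose variant and left unscaled by the strict one.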
After summarizing the NOAA data, we see that tornadoes are the largest cause of harm to human health (fatalities and injuries). For economic damage (property plus crop damage), floods are the largest contributor.