Synopsis

In this analysis we seek to answer two questions:
1) Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2) Across the United States, which types of events have the greatest economic consequences?

We produce an RMarkdown file and use the knitr package to create an HTML document published on RPubs. This literate statistical programming approach enables others to reproduce our work. We create two tables and one plot that help answer the questions above. Specifically, we see that tornadoes are the largest cause of harm to human health (fatalities and injuries). For economic damage (property plus crop damage), we see that floods are the largest contributor.

Data Processing

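First, set your working directory. Then download the data file and use read.csv() to read the data; read.csv() can read the bzip2-compressed file directly, so no separate decompression step is needed. Finally, load the dplyr, knitr and ggplot2 packages.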

fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileUrl, destfile = "./storm_data.csv.bz2", mode = "wb")
storm_data <- read.csv("./storm_data.csv.bz2")
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(knitr)
library(ggplot2)
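
Knitting the report again repeats the download every time. The following is a minimal sketch of a guard that skips the download when the file is already on disk. The helper name fetch_if_missing is hypothetical and not part of the original analysis, and mode = "wb" is an assumption that keeps the compressed file intact on Windows.

```r
# Hypothetical helper (not in the original analysis): download a file only
# when it is not already on disk, then return the local path.
fetch_if_missing <- function(url, destfile) {
        if (!file.exists(destfile)) {
                # mode = "wb" writes the download as a binary file
                download.file(url, destfile = destfile, mode = "wb")
        }
        destfile
}

# Possible use with the URL defined in this report (left commented out so
# that sourcing this sketch does not touch the network):
# path <- fetch_if_missing(fileUrl, "./storm_data.csv.bz2")
# storm_data <- read.csv(path)
```

Returning the path lets the call feed straight into read.csv().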

Results

We create a new variable called casualties, which is simply the sum of FATALITIES and INJURIES. This gives a single, straightforward measure of harm to population health. We create a summary table showing that tornadoes are by far the largest contributor to casualties. We also create a bar plot of this same summary.

Second, we create a new variable called totalDamage, which is the sum of PROPDMG and CROPDMG (property damage and crop damage, respectively) after scaling each value by the multiplier code in PROPDMGEXP or CROPDMGEXP (H, K, M and B for hundreds, thousands, millions and billions). This is our method of answering the question about economic loss. We create another summary table showing which types of weather events result in the most property and crop damage.

Since there are many different event types, we isolate the top ten most damaging in both our tables and plots.
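
The conversion of damage values can also be sketched with a lookup vector instead of nested ifelse() calls. This is a minimal base R illustration on made-up rows, not the code used to produce the tables in this report. The names exp_multiplier and scale_damage are hypothetical, and treating lowercase codes like uppercase ones is an assumption that could change the totals slightly if applied to the real data.

```r
# Multiplier codes used in this report: H, K, M and B.
exp_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: scale a damage value by its exponent code.
# Assumption: lowercase codes are treated like uppercase ones; any other
# code (blank, digits, symbols) leaves the value unscaled.
scale_damage <- function(value, exp_code) {
        mult <- exp_multiplier[toupper(as.character(exp_code))]
        mult[is.na(mult)] <- 1
        value * unname(mult)
}

# Toy rows for illustration only (not taken from the storm data set)
toy <- data.frame(PROPDMG = c(25, 2.5, 1, 10, 3),
                  PROPDMGEXP = c("K", "M", "B", "", "h"))
toy$propDamage <- scale_damage(toy$PROPDMG, toy$PROPDMGEXP)
toy
```

A lookup keeps the code-to-multiplier mapping in one place, so the same helper can serve both the property and the crop columns.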

# Create table summarizing total casualties by event type
storm_data %>%
        mutate(casualties = FATALITIES + INJURIES) %>%
        group_by(EVTYPE) %>%
        summarize(casualties = sum(casualties)) %>%
        arrange(desc(casualties)) %>%
        head(10) %>%
        kable(digits = 2, caption =
                      "Total casualties by event type.")
Total casualties by event type.
EVTYPE               casualties
TORNADO                   96979
EXCESSIVE HEAT             8428
TSTM WIND                  7461
FLOOD                      7259
LIGHTNING                  6046
HEAT                       3037
FLASH FLOOD                2755
ICE STORM                  2064
THUNDERSTORM WIND          1621
WINTER STORM               1527
# Create barplot of the above table
plot_data <- storm_data %>%
        mutate(casualties = FATALITIES + INJURIES) %>%
        group_by(EVTYPE) %>%
        summarize(casualties = sum(casualties)) %>%
        arrange(desc(casualties)) %>%
        head(10) 
p <- ggplot(plot_data, aes(EVTYPE, casualties)) +
        geom_bar(stat = "identity") +
        theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
        ggtitle("Total casualties by event type")
p
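
ggplot2 places a character x variable in alphabetical order, so the bars in a plot like this one do not appear in order of casualties. The following is a minimal sketch of one way to sort them, using reorder() on three rows copied from the casualties table. The names toy_plot_data and p_sorted are hypothetical, and the plot is only built if ggplot2 is installed.

```r
# Three rows copied from the casualties table in this report
toy_plot_data <- data.frame(
        EVTYPE = c("EXCESSIVE HEAT", "TORNADO", "TSTM WIND"),
        casualties = c(8428, 96979, 7461)
)

# reorder() sets the factor levels by increasing value of its second
# argument, so negating casualties gives descending order
toy_plot_data$EVTYPE <- reorder(toy_plot_data$EVTYPE,
                                -toy_plot_data$casualties)
levels(toy_plot_data$EVTYPE)

# Build the plot only if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
        library(ggplot2)
        p_sorted <- ggplot(toy_plot_data, aes(EVTYPE, casualties)) +
                geom_bar(stat = "identity") +
                theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
                ggtitle("Total casualties by event type")
}
```

Printing p_sorted displays the bars from the largest to the smallest count.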

# Create table summarizing total economic damage by event type.
# First scale PROPDMG and CROPDMG by the multiplier codes in
# PROPDMGEXP and CROPDMGEXP (B, M, K, H); any other code leaves the value unscaled.
storm_data_damage <- storm_data %>% 
        mutate(propDamage = 
                ifelse(PROPDMGEXP == "B", PROPDMG * 1000000000,
                ifelse(PROPDMGEXP == "M", PROPDMG * 1000000,
                ifelse(PROPDMGEXP == "K", PROPDMG * 1000,
                ifelse(PROPDMGEXP == "H", PROPDMG * 100,
                       PROPDMG)))),
               cropDamage = 
                ifelse(CROPDMGEXP == "B", CROPDMG * 1000000000,
                ifelse(CROPDMGEXP == "M", CROPDMG * 1000000,
                ifelse(CROPDMGEXP == "K", CROPDMG * 1000,
                ifelse(CROPDMGEXP == "H", CROPDMG * 100,
                       CROPDMG))))) %>%
        transmute(EVTYPE, FATALITIES, INJURIES, 
                  totalDamage = propDamage + cropDamage) %>%
        group_by(EVTYPE) %>%
        summarize(damage = sum(totalDamage)) %>%
        arrange(desc(damage)) %>%
        head(10) %>%
        kable(digits = 2, caption =
                      "Total damage by event type.")
storm_data_damage
Total damage by event type.
EVTYPE               damage (USD)
FLOOD                150319678257
HURRICANE/TYPHOON     71913712800
TORNADO               57340614060
STORM SURGE           43323541000
HAIL                  18752905438
FLASH FLOOD           17562129167
DROUGHT               15018672000
HURRICANE             14610229010
RIVER FLOOD           10148404500
ICE STORM              8967041360

Conclusion

After summarizing the NOAA data, we see that tornadoes are the largest cause of harm to human health (fatalities and injuries). For economic damage (property plus crop damage), we see that floods are the largest contributor.