Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

We would try to answer these two questions in this report:

  1. Across the United States, which types of events are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

Loading and processing the data

# load necessary packages
library("dplyr")
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library("ggplot2")

# download files
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileUrl, destfile = "./repdata_data_StormData.csv.bz2")

d <- read.csv("./repdata_data_StormData.csv.bz2", na.strings = "NA")
# clean data
d_sub <- d %>%
        select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
        mutate(
                PROPDMGEXP = toupper(PROPDMGEXP),
                CROPDMGEXP = toupper(CROPDMGEXP),
                PROPDMGEXP=replace(PROPDMGEXP, !PROPDMGEXP %in% c("M","B","K", "H"), "NA"),
                CROPDMGEXP=replace(CROPDMGEXP, !CROPDMGEXP %in% c("M","B","K", "H"), "NA")
        )

# calculate property, crop, and the total economic losses
lev <- c("M","B","K", "H", "NA")
lab <- c(1000000,1000000000,1000, 100, 0)
d_sub$PROPDMGEXP <- as.numeric(as.character(factor(d_sub$PROPDMGEXP, lev, lab)))
d_sub$CROPDMGEXP <- as.numeric(as.character(factor(d_sub$CROPDMGEXP, lev, lab)))

d_sub <- d_sub %>%
        mutate(
                PROPDMG = PROPDMG * PROPDMGEXP / 1000000000,
                CROPDMG = CROPDMG * CROPDMGEXP / 1000000000,
                ECODMG = CROPDMG + PROPDMG 
        ) %>%
        select(EVTYPE, FATALITIES, INJURIES, PROPDMG, CROPDMG, ECODMG)

# encode EVTYPE as a factor
d_sub$EVTYPE <- factor(d_sub$EVTYPE)

# sum FATALITIES, INJURIES, ECODMG by EVTYPE
d_agg <- d_sub %>%
        group_by(EVTYPE) %>%
        summarise(
                FATALITIES = sum(FATALITIES),
                INJURIES = sum(INJURIES),
                PROPDMG = sum(PROPDMG),
                CROPDMG = sum(CROPDMG),
                ECODMG = sum(ECODMG)
        )

Data analysis

In this section, we would try to answer the following questions:

1. Across the United States, which types of events are most harmful with respect to population health?

We would calculate the total number of fatalities and injuries separately, and present the top five harmful weather events.

library("dplyr")
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library("ggplot2")
library("gridExtra")
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
# calculate the top five events that cause most fatalities
d_fatalities_top_5 <- d_agg %>% 
        select(EVTYPE, FATALITIES) %>% 
        slice_max(n = 5, order_by = FATALITIES)

b1 <- ggplot(d_fatalities_top_5, aes(EVTYPE, FATALITIES, fill = EVTYPE)) +
        geom_bar(stat="identity") + 
        labs(x = "Event type", y = "Fatalities", fill = "Event type") +
        theme(legend.position = "none",
              axis.text.x = element_text(angle = 45, hjust = 1)
              )
        

# calculate the top five events that cause most injuries
d_injuries_top_5 <- d_agg %>% 
        select(EVTYPE, INJURIES) %>% 
        slice_max(n = 5, order_by = INJURIES)

b2 <- ggplot(d_injuries_top_5, aes(EVTYPE, INJURIES, fill = EVTYPE)) + 
        geom_bar(stat="identity") +
        labs(x = "Event type", y = "Injuries", fill = "Event type") +
        theme(legend.position = "none",
              axis.text.x = element_text(angle = 45, hjust = 1)
              )

grid.arrange(b1, b2, nrow = 1, top = "Top five weather events with the most fatalities and injuries")

From the result we could conclude that tornadoes caused the most fatalities and injuries in the U.S..

2. Across the United States, which types of events have the greatest economic consequences?

We would calculate property, crop and economic losses in billions of dollars, and present the top five harmful weather events.

# calculate the top five events that cause greatest property damage
d_propdmg_top_5 <- d_agg %>% 
        select(EVTYPE, PROPDMG) %>% 
        slice_max(n = 5, order_by = PROPDMG)

b3<- ggplot(d_propdmg_top_5, aes(EVTYPE, PROPDMG, fill= EVTYPE)) + 
        geom_bar(stat="identity") +
        labs(x = "Event type", y = "Property damage", fill = "Event type") +
        theme(legend.position = "none",
              axis.text.x = element_text(angle = 45, hjust = 1)
              )

# calculate the top five events that cause greatest crop damage
d_cropdmg_top_5 <- d_agg %>% 
        select(EVTYPE, CROPDMG) %>% 
        slice_max(n = 5, order_by = CROPDMG)

b4<- ggplot(d_cropdmg_top_5, aes(EVTYPE, CROPDMG, fill= EVTYPE)) + 
        geom_bar(stat="identity") +
        labs(x = "Event type", y = "Crop damage", fill = "Event type") +
        theme(legend.position = "none",
              axis.text.x = element_text(angle = 45, hjust = 1)
              )

d_ecodmg_top_5 <- d_agg %>% 
        select(EVTYPE, ECODMG) %>% 
        slice_max(n = 5, order_by = ECODMG)

# calculate the top five events that cause greatest economic damage
b5<- ggplot(d_ecodmg_top_5, aes(EVTYPE, ECODMG, fill= EVTYPE)) + 
        geom_bar(stat="identity") +
        labs(x = "Event type", y = "Economic damage", fill = "Event type") +
        theme(legend.position = "none",
              axis.text.x = element_text(angle = 45, hjust = 1)
              )

grid.arrange(b3, b4, b5, nrow = 1, top = "Top five weather events with the greatest property, crop and economic damage") 

From the result we could conclude that in the U.S. floods caused the greatest property damage, droughts caused the greatest crop damage. If we consider property and crop damage together, floods caused the greatest economic damage in the U.S..

Results

tornado_fatalities <- d_fatalities_top_5[d_fatalities_top_5$EVTYPE == "TORNADO", 2]
tornado_injuries <- d_injuries_top_5[d_injuries_top_5$EVTYPE == "TORNADO", 2]
tornado_ecodmg <- format(round(as.numeric(d_ecodmg_top_5[d_ecodmg_top_5$EVTYPE == "FLOOD", 2]), 2), nsmall = 2) 

In this report, we have analyzed the top five weather events with the most fatalities, injuries, property damage and crop damage in the U.S. from 1950 to November 2011. The result shows that tornadoes have caused 5633 fatalities and 9.1346^{4} injuries, which are the most harmful weather to population health.

Floods caused the greatest economic consequences (lost around 150.32 billions dollar) in the U.S..