Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The data are downloaded from the website and analyzed to determine which weather events are most harmful in two respects: impact on population health, measured by the number of injuries and fatalities, and economic consequences, measured by property damage and crop damage.

Data Processing

The storm data file is downloaded and unzipped into the working directory. The raw data is then read into a data frame named Dataset:

Dataset<-read.csv("repdata%2Fdata%2FStormData.csv")
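
As a minimal sketch of the loading step, the snippet below reads a CSV and keeps only the columns used later in this report. The helper name read_storm_columns is hypothetical, and the toy file written to a temporary path (with made-up values and an extra STATE column) only stands in for the real storm data.

```r
# Hypothetical helper: read a storm CSV and keep only the columns analysed here
read_storm_columns <- function(path) {
  stopifnot(file.exists(path))
  needed <- c("EVTYPE", "FATALITIES", "INJURIES",
              "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
  raw <- read.csv(path, stringsAsFactors = FALSE)
  missing_cols <- setdiff(needed, names(raw))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  raw[, needed]
}

# Toy file standing in for the real data (values are made up for illustration)
toy_path <- tempfile(fileext = ".csv")
write.csv(data.frame(
  STATE = c("AL", "TX"),
  EVTYPE = c("TORNADO", "FLOOD"),
  FATALITIES = c(1, 0),
  INJURIES = c(15, 2),
  PROPDMG = c(25, 2.5),
  PROPDMGEXP = c("K", "M"),
  CROPDMG = c(0, 10),
  CROPDMGEXP = c("", "K")
), toy_path, row.names = FALSE)

toy <- read_storm_columns(toy_path)
str(toy)
```

Checking for the expected columns up front makes a wrong or truncated download fail early instead of producing confusing errors in the aggregation steps.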

1. Across the United States, which types of events are most harmful with respect to population health?

Harm to population health is measured by the number of injuries and fatalities caused by each event. The total number of injuries and the total number of fatalities therefore need to be aggregated by event type.

library(dplyr)
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 3.5.2
# Aggregate the number of injuries and fatalities by type of event, respectively
Injuries<-aggregate(INJURIES~EVTYPE, Dataset, sum)
Fatalities<-aggregate(FATALITIES~EVTYPE, Dataset, sum)
# Sort both tables in decreasing order of INJURIES and FATALITIES; the first 6 rows of each are plotted below
InjuryRank<-Injuries[order(Injuries$INJURIES, decreasing = TRUE),]
FatalityRank<-Fatalities[order(Fatalities$FATALITIES, decreasing = TRUE),]
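
To make the aggregate-then-sort step concrete, here is a small sketch on made-up data that uses the same column names as the storm data. The toy values are assumptions chosen only for illustration.

```r
# Toy data (made-up values) with the same column names as the storm data
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  INJURIES = c(10, 2, 5, 7, 1)
)

# Sum INJURIES within each EVTYPE, then sort from largest to smallest
toy_injuries <- aggregate(INJURIES ~ EVTYPE, toy, sum)
toy_rank <- toy_injuries[order(toy_injuries$INJURIES, decreasing = TRUE), ]
print(toy_rank)
```

In the toy table TORNADO totals 15, HEAT 7 and FLOOD 3, so TORNADO ranks first. The same two calls produce InjuryRank and FatalityRank from the full data.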

Next, create a bar chart for each ranking for comparison:

library(ggplot2)
pI<-ggplot(data = head(InjuryRank), aes(EVTYPE, INJURIES))+geom_bar(stat = "identity", width = 0.75, alpha = 0.75, fill = "blue")+ggtitle("Injuries by Event")
pF<-ggplot(data = head(FatalityRank), aes(EVTYPE, FATALITIES))+geom_bar(stat = "identity", width = 0.75, alpha = 0.75, fill = "blue")+ggtitle("Fatalities by Event")
grid.arrange(pI, pF)

The plots above show that TORNADO is the most harmful event type for population health: it caused more injuries and more fatalities than any other event type.
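
ggplot2 draws a discrete x axis in factor-level order, which here is alphabetical, so the tallest bar does not necessarily come first. The following is a minimal sketch, on made-up numbers, of how reorder() from base R can sort event types by count before plotting.

```r
# Toy ranking (made-up values)
toy <- data.frame(
  EVTYPE = c("FLOOD", "HEAT", "TORNADO"),
  INJURIES = c(3, 7, 15)
)

# reorder() sorts factor levels by the second argument;
# negating the counts puts the largest count first
toy$EVTYPE <- reorder(toy$EVTYPE, -toy$INJURIES)
levels(toy$EVTYPE)
```

Mapping x to reorder(EVTYPE, -INJURIES) inside aes() would apply the same ordering to the bars, as the damage chart later in this report does.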

2. Across the United States, which types of events have the greatest economic consequences?

From the dataset we found four variables related to economic consequences: PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP. PROPDMG and CROPDMG hold the damage amounts, and according to the storm data documentation, PROPDMGEXP and CROPDMGEXP give the magnitude of those amounts:

“K” for thousands; “M” for millions; “B” for billions.

The damage amounts must be converted to a common unit before they can be added up.

# Create a new dataset containing all variables related to damage
DMG<-select(Dataset, EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
# Check unique values in column PROPDMGEXP and CROPDMGEXP
unique(DMG$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
unique(DMG$CROPDMGEXP)
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M

The output above shows that, besides the “K”, “M” and “B” codes mentioned in the documentation, these columns also contain lowercase letters, digits, symbols and blank entries. All of them need to be handled so that each damage amount is computed correctly.

# Calculate actual amount of damage based on magnitude for PROPERTY DAMAGE and CROP DAMAGE individually and input values into new columns
DMG$PROPAMT<-case_when(
    DMG$PROPDMGEXP %in% c("h", "H")~ DMG$PROPDMG*10^2,
    DMG$PROPDMGEXP %in% c("k", "K")~ DMG$PROPDMG*10^3,
    DMG$PROPDMGEXP %in% c("m", "M")~ DMG$PROPDMG*10^6,
    DMG$PROPDMGEXP %in% c("b", "B")~ DMG$PROPDMG*10^9,
    !(DMG$PROPDMGEXP %in% c("h", "H", "b", "B","k", "K","m", "M"))~DMG$PROPDMG*1)

DMG$CROPAMT<-case_when(
    DMG$CROPDMGEXP %in% c("k", "K")~ DMG$CROPDMG*10^3,
    DMG$CROPDMGEXP %in% c("m", "M")~ DMG$CROPDMG*10^6,
    DMG$CROPDMGEXP %in% c("b", "B")~ DMG$CROPDMG*10^9,
    # Any other code: keep the recorded amount unchanged, as for property damage
    TRUE ~ DMG$CROPDMG*1)
# Add property damage and crop damage together as total damage and aggregate by event type
TOTALDMG<-setNames(aggregate(PROPAMT+CROPAMT~EVTYPE, DMG, sum),c("EVTYPE", "TOTAL_DAMAGE"))
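
The multiplier logic above can be sanity-checked on a few hand-picked rows. The sketch below uses a named lookup vector instead of case_when; exp_multiplier and to_amount are hypothetical names, and treating every unlisted code as a multiplier of 1 mirrors the property-damage rule used in this report rather than anything stated in the NOAA documentation.

```r
# Hypothetical lookup: exponent code -> multiplier (upper case only)
exp_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

to_amount <- function(value, exp_code) {
  # Normalise case so that "k" and "K" are treated alike
  code <- toupper(as.character(exp_code))
  mult <- unname(exp_multiplier[code])
  # Codes missing from the lookup (blank, digits, symbols) get multiplier 1
  mult[is.na(mult)] <- 1
  value * mult
}

# Toy rows (made-up values) covering listed, lowercase, blank and symbol codes
toy <- data.frame(
  DMG = c(25, 2.5, 1, 3, 40),
  EXP = c("K", "m", "B", "", "?")
)
toy$AMT <- to_amount(toy$DMG, toy$EXP)
print(toy)
```

A lookup vector keeps the code-to-multiplier mapping in one place, so the same function can serve both the property and the crop columns.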

Now we have a new dataset, TOTALDMG, with total damage aggregated by event type. The next step is to find which event type causes the most damage and plot a bar chart of the result.

# Sort total damage in descending order and get the highest 6 values
DMGRank<-head(TOTALDMG[order(TOTALDMG$TOTAL_DAMAGE, decreasing = TRUE),])
# Create bar chart for damage rank
pD<-ggplot(data = DMGRank, aes(x = reorder(EVTYPE, -TOTAL_DAMAGE), y = TOTAL_DAMAGE))+
        geom_bar(stat = "identity",width = 0.75, alpha = 0.75, fill = "blue")+
        labs(x = "Event Type", y = "Total Damage") + 
        ggtitle("Economic Consequences by Event Type")
pD

The figure above shows that FLOOD causes the most damage among all weather event types.

Results

After importing, transforming and analyzing the data, we reach two conclusions:

  1. TORNADO is the most harmful to POPULATION HEALTH;

  2. FLOOD has the greatest ECONOMIC CONSEQUENCES.