1. Synopsis

This analysis is a project for the Coursera Reproducible Research course, part of the Data Science Specialization. The goal is to explore the NOAA Storm Database and assess the impact of severe weather events on both population health and the economy.

The database covers the period from 1950 to November 2011. The analysis investigates which types of severe weather events are most harmful to population health, measured by injuries and fatalities. It also examines the economic consequences by looking at the financial damage to property and crops.

2. Data Processing

The data can be downloaded from the course website: Storm Data. Documentation of the database is available here:

National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ

First, load the packages needed for the analysis.

library(plyr)
library(ggplot2)
library(magrittr)
library(gridExtra)

Load the data.

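StormData <- read.csv("StormData.csv.bz2")
# The step that derives StormClean is not shown; as an assumption, the raw data is used unchanged
StormClean <- StormData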

2.1 Population Health

Fatalities and injuries are summed by event type, then sorted in decreasing order.

TotalFatalities <- aggregate(FATALITIES ~ EVTYPE, StormClean, sum) %>% arrange(desc(FATALITIES))
TotalInjuries <- aggregate(INJURIES ~ EVTYPE, StormClean, sum) %>% arrange(desc(INJURIES))
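
The aggregate-then-sort pattern can be tried on a small example first. The sketch below uses a made-up `toy` data frame (its values are not from the NOAA database) and base R's `order()` in place of `arrange()`, so it runs without any packages.

```r
# Hypothetical toy data; not taken from the NOAA database
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 5, 2, 0)
)

# Sum fatalities per event type
toy_totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

# Sort in decreasing order of total fatalities
toy_totals <- toy_totals[order(-toy_totals$FATALITIES), ]
print(toy_totals)
```

Here TORNADO comes first with a total of 8, followed by HEAT and FLOOD.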

2.2 Economic Consequences

According to the National Weather Service Instruction, the exponents are stored in a separate column: “K” for thousands, “M” for millions, and “B” for billions. Take a look at the distinct values in “PROPDMGEXP” and “CROPDMGEXP”.

unique(StormClean$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
unique(StormClean$CROPDMGEXP)
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M
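
`unique()` shows which codes exist but not how often each one occurs. A frequency count helps judge how much the unusual codes matter. The sketch below uses a made-up vector `toy_exp` rather than the real columns; the same call should work on `StormClean$PROPDMGEXP`.

```r
# Hypothetical exponent codes, including an empty string and a stray symbol
toy_exp <- c("K", "K", "M", "", "k", "?", "K", "B")

# Count occurrences of each code, most frequent first
exp_counts <- sort(table(toy_exp), decreasing = TRUE)
print(exp_counts)
```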

To get the numeric value of the damage, we first need to convert the exponent code to a number. Build a function that converts the codes according to the levels obtained in the previous step.

GetExpValue <- function(x)
{
    x <- as.character(x) # as.numeric() on a factor would return the level code, not the digit
    if(x=='h' || x=='H')
        return(2)
    else if(x=='k' || x=='K')
        return(3)
    else if(x=='m' || x=='M')
        return(6)
    else if(x=='B')
        return(9)
    num <- suppressWarnings(as.numeric(x)) # NA for non-numeric codes such as "", "+", "-", "?"
    if(!is.na(num))
        return(num)
    else return(0)
}
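
As an alternative to calling a function once per row, the same mapping can be sketched as a vectorised lookup. The names `exp_lookup` and `exp_to_power` are hypothetical and not part of the original analysis. The sketch also treats letter codes case-insensitively, so a lowercase "b" would map to 9, which differs from the function above.

```r
# Assumed mapping from letter codes to powers of ten (h/H = 2, k/K = 3, m/M = 6, B = 9)
exp_lookup <- c(H = 2, K = 3, M = 6, B = 9)

# Hypothetical helper: vectorised conversion of exponent codes
exp_to_power <- function(codes) {
  codes <- toupper(as.character(codes))   # also handles factor input
  power <- unname(exp_lookup[codes])      # NA for anything that is not a letter code
  is_digit <- grepl("^[0-9]$", codes)     # a single digit is used as the exponent itself
  power[is_digit] <- as.numeric(codes[is_digit])
  power[is.na(power)] <- 0                # "", "+", "-", "?" and the like fall back to 0
  power
}

print(exp_to_power(c("K", "m", "B", "5", "", "?", "h")))
```

Because the lookup works on the whole vector at once, it avoids the per-element `sapply()` call, which may matter on a table with several hundred thousand rows.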

Then calculate the real damage value.

PropExpValue <- sapply(StormClean$PROPDMGEXP, FUN = GetExpValue)
CropExpValue <- sapply(StormClean$CROPDMGEXP, FUN = GetExpValue)
StormClean$PropDmgVal <- StormClean$PROPDMG * 10^PropExpValue
StormClean$CropDmgVal <- StormClean$CropDMG_PLACEHOLDER
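
As a quick check of the formula, a damage figure of 25 with the code "K" (exponent 3) should come out as 25,000. The values below are made up for illustration.

```r
# Hypothetical records: damage figure plus an already-converted exponent
dmg      <- c(25, 2.5, 1)
exponent <- c(3, 6, 9)     # i.e. K, M, B

# Damage value = figure * 10^exponent
dmg_value <- dmg * 10^exponent
print(dmg_value)
```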

Property damage and crop damage values are summed by event type, then sorted in decreasing order.

PropDmgSorted <- aggregate(PropDmgVal ~ EVTYPE, StormClean, sum) %>% arrange(desc(PropDmgVal))
CropDmgSorted <- aggregate(CropDmgVal ~ EVTYPE, StormClean, sum) %>% arrange(desc(CropDmgVal))

3. Results

List the top 10 events that cause the most fatalities and injuries, respectively, then show a bar chart of each. Tornado is the most harmful event with respect to population health: it has the highest number of both fatalities (5,633) and injuries (91,346), far more than any other event. Excessive Heat also has a severe impact on population health.

head(TotalFatalities,10)
##            EVTYPE FATALITIES
## 1         TORNADO       5633
## 2  EXCESSIVE HEAT       1903
## 3     FLASH FLOOD        978
## 4            HEAT        937
## 5       LIGHTNING        816
## 6       TSTM WIND        504
## 7           FLOOD        470
## 8     RIP CURRENT        368
## 9       HIGH WIND        248
## 10      AVALANCHE        224
head(TotalInjuries,10)
##               EVTYPE INJURIES
## 1            TORNADO    91346
## 2          TSTM WIND     6957
## 3              FLOOD     6789
## 4     EXCESSIVE HEAT     6525
## 5          LIGHTNING     5230
## 6               HEAT     2100
## 7          ICE STORM     1975
## 8        FLASH FLOOD     1777
## 9  THUNDERSTORM WIND     1488
## 10              HAIL     1361
# EVTYPE would otherwise be ordered by name, so reorder it by FATALITIES
g1 <- ggplot(TotalFatalities[1:10,], aes(reorder(EVTYPE, FATALITIES),FATALITIES)) + coord_flip() + geom_col() + xlab("Event Type") + ylab("Total Fatalities") + ggtitle("Top 10 Events of Population Health Impact")

g2 <- ggplot(TotalInjuries[1:10,], aes(reorder(EVTYPE, INJURIES),INJURIES)) + coord_flip() + geom_col() + xlab("Event Type") + ylab("Total Injuries")
grid.arrange(g1, g2, ncol=1)

Also list the top 10 events with the most severe economic consequences, and show the bar plots. Flash Flood causes the most property damage, far more than any other event. Drought causes the most crop damage.

head(PropDmgSorted,10)
##                EVTYPE   PropDmgVal
## 1         FLASH FLOOD 6.820237e+13
## 2  THUNDERSTORM WINDS 2.086532e+13
## 3             TORNADO 1.078951e+12
## 4                HAIL 3.157558e+11
## 5           LIGHTNING 1.729433e+11
## 6               FLOOD 1.446577e+11
## 7   HURRICANE/TYPHOON 6.930584e+10
## 8            FLOODING 5.920826e+10
## 9         STORM SURGE 4.332354e+10
## 10         HEAVY SNOW 1.793259e+10
head(CropDmgSorted,10)
g3 <- ggplot(PropDmgSorted[1:10,], aes(reorder(EVTYPE, PropDmgVal),PropDmgVal)) + coord_flip() + geom_col() + xlab("Event Type") + ylab("Total Property Damage ($)") + ggtitle("Top 10 Events of Economy Consequence")
g4 <- ggplot(CropDmgSorted[1:10,], aes(reorder(EVTYPE, CropDmgVal),CropDmgVal)) + coord_flip() + geom_col() + xlab("Event Type") + ylab("Total Crop Damage ($)")
grid.arrange(g3, g4, ncol=1)