Synopsis

In this report, we analyze the harm to population health and the economic damage caused by storms in the US. The data come from the NOAA storm database, which tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage, starting in 1950. Specifically, we address which types of events cause the greatest harm to population health and the greatest economic damage.

Data processing

First, we read the CSV data file into a data frame.

stormData <- read.csv("~/Notes/reproducibe/repdata-data-StormData.csv")
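The full CSV has many columns, but the analysis below only uses EVTYPE, FATALITIES, INJURIES and the four damage columns. As an optional variation (not what this report ran), read.csv can skip any column whose colClasses entry is "NULL", which reduces memory use on a file this size. The helper readStormColumns() and the vector neededCols are hypothetical names, and the sketch assumes the file has a header row.

```r
# Hypothetical variant: read only the columns used in this analysis.
neededCols <- c("EVTYPE", "FATALITIES", "INJURIES",
                "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

readStormColumns <- function(path, keep = neededCols) {
  # Read the header (plus one row) just to learn the column names
  header <- names(read.csv(path, nrows = 1))
  # NA lets read.csv guess a column's type; "NULL" skips the column entirely
  classes <- ifelse(header %in% keep, NA, "NULL")
  read.csv(path, colClasses = classes)
}
```

With this helper, the call above would become `stormData <- readStormColumns("~/Notes/reproducibe/repdata-data-StormData.csv")`; the later mutate() and select() steps only touch the kept columns.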

Show the first several rows of the data.

head(stormData)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6

To manipulate the data, we need the dplyr package.

library(dplyr)

To study the harm to population health, we first sum the numbers of injuries and fatalities into a new column, “HARM”. Then we create a subset of the full data that includes only the “EVTYPE” and “HARM” columns.

healthData <- stormData %>%
    mutate(HARM = INJURIES + FATALITIES) %>%
    select(EVTYPE, HARM)
head(healthData)
##    EVTYPE HARM
## 1 TORNADO   15
## 2 TORNADO    0
## 3 TORNADO    2
## 4 TORNADO    2
## 5 TORNADO    2
## 6 TORNADO    6

To measure the overall harm, we then sum HARM (injuries plus fatalities) for each event type.

healthSum <- healthData %>% 
    group_by(EVTYPE) %>%
    summarise(tot_harm = sum(HARM, na.rm = TRUE)) %>%
    arrange(desc(tot_harm))
head(healthSum)
## Source: local data frame [6 x 2]
## 
##           EVTYPE tot_harm
## 1        TORNADO    96979
## 2 EXCESSIVE HEAT     8428
## 3      TSTM WIND     7461
## 4          FLOOD     7259
## 5      LIGHTNING     6046
## 6           HEAT     3037

In the last step, we sort the event types in descending order of total harm.
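For readers without dplyr, the same group, sum and sort steps can be sketched in base R. This swaps dplyr for tapply() and order(); the small data frame below is made up for illustration and is not taken from the NOAA data.

```r
# Base-R sketch of the group/sum/sort pipeline on a small made-up table
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL"),
  INJURIES   = c(15, 2, 0, 3, NA),
  FATALITIES = c(1, 0, 2, 0, 1),
  stringsAsFactors = FALSE
)

# Per-record harm, as in the mutate() step; an NA in either column gives NA
toy$HARM <- toy$INJURIES + toy$FATALITIES

# Total harm per event type; na.rm = TRUE drops NA records, like sum(HARM, na.rm = TRUE)
totals <- tapply(toy$HARM, toy$EVTYPE, sum, na.rm = TRUE)

# Sort in descending order, as arrange(desc(tot_harm)) does
toySum <- data.frame(EVTYPE = names(totals), tot_harm = as.vector(totals),
                     stringsAsFactors = FALSE)
toySum <- toySum[order(-toySum$tot_harm), ]
rownames(toySum) <- NULL
print(toySum)
```

tapply() returns its groups in alphabetical order, so the explicit order() call is what produces the ranking (TORNADO 18, FLOOD 3, HAIL 2 for this toy input).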

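To study the economic loss, we sum property damage and crop damage for each record; the remaining steps mirror those used for population health. Note that PROPDMG and CROPDMG are recorded together with magnitude codes in PROPDMGEXP and CROPDMGEXP (for example, K for thousands), and this simple sum does not apply those multipliers, so the totals below mix units.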

econData <- stormData %>%
    mutate(DMG = PROPDMG + CROPDMG) %>%
    select(EVTYPE, DMG)

econSum <- econData %>% 
    group_by(EVTYPE) %>%
    summarise(tot_DMG = sum(DMG, na.rm = TRUE)) %>%
    arrange(desc(tot_DMG))
head(econSum)
## Source: local data frame [6 x 2]
## 
##              EVTYPE   tot_DMG
## 1           TORNADO 3312276.7
## 2       FLASH FLOOD 1599325.1
## 3         TSTM WIND 1445168.2
## 4              HAIL 1268289.7
## 5             FLOOD 1067976.4
## 6 THUNDERSTORM WIND  943635.6
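Because the totals above ignore PROPDMGEXP and CROPDMGEXP, one way to put every record on a dollar scale is to turn each code into a multiplier before summing. This is a sketch under stated assumptions: K, M and B are read as thousands, millions and billions, H as hundreds, a blank code as a multiplier of 1, and every other code (digits, "+", "-", "?") is left as NA. The helper expToMultiplier() and the toy rows are hypothetical, and no results on the real data are claimed here.

```r
# Hypothetical helper: map a damage exponent code to a numeric multiplier.
# Assumptions: H/K/M/B (any case) = 1e2/1e3/1e6/1e9, blank = 1, anything else = NA.
expToMultiplier <- function(code) {
  code <- toupper(trimws(as.character(code)))
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[code])     # unknown codes (and blank) give NA here
  mult[which(code == "")] <- 1     # blank code: take the amount as given
  mult
}

# Made-up records to show the conversion
econToy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  PROPDMG    = c(25, 1.5, 10),
  PROPDMGEXP = c("K", "B", ""),
  CROPDMG    = c(0, 2, 5),
  CROPDMGEXP = c("", "M", "k"),
  stringsAsFactors = FALSE
)
econToy$DMG_USD <- econToy$PROPDMG * expToMultiplier(econToy$PROPDMGEXP) +
  econToy$CROPDMG * expToMultiplier(econToy$CROPDMGEXP)
print(econToy[, c("EVTYPE", "DMG_USD")])
```

The same expression could replace `PROPDMG + CROPDMG` in the econData step; records with an unrecognized code would then have an NA damage and be dropped by `sum(DMG, na.rm = TRUE)`.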

Results

  1. Across the United States, which types of events are most harmful with respect to population health?

As the healthSum data frame shows, tornadoes, with 96979 recorded injuries and fatalities since 1950 (91346 of them injuries), are the most harmful event type with respect to population health. To make the results easier to see, we plot the total number of injuries and fatalities for the top 5 event types.

library(ggplot2)
ggplot(data = healthSum[1:5,], aes(x = EVTYPE, y = tot_harm)) +
    geom_bar(stat = "identity", aes(fill = EVTYPE)) +
    ylab("Total number of reported injuries and fatalities") + 
    xlab("Type of events")+
    ggtitle("Top 5 types of events being harmful to human health")

As the plot shows, the number of reported injuries and fatalities caused by tornadoes is far higher than for any other event type.

  2. Across the United States, which types of events have the greatest economic consequences?

As the econSum data frame shows, tornadoes again cause the greatest total damage since 1950. We again plot the top 5 event types, ranked by total damage.

ggplot(data = econSum[1:5,], aes(x = EVTYPE, y = tot_DMG)) +
    geom_bar(stat = "identity", aes(fill = EVTYPE)) +
    ylab("Total amount of damage to properties and crops") + 
    xlab("Type of events")+
    ggtitle("Top 5 types of events having the greatest economic consequences")

As the plot shows, the gap between tornadoes and the other top event types is smaller for economic damage than for population health.
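One caveat for this comparison: the EVTYPE labels are not fully standardized. The econSum output lists TSTM WIND (1445168.2) and THUNDERSTORM WIND (943635.6) as separate rows; together they come to 2388803.8, which is still below TORNADO (3312276.7). A minimal sketch of normalizing labels before aggregating is shown below; normalizeEvtype() and its single TSTM rule are hypothetical and cover only the variant visible in that output, on made-up totals.

```r
# Hypothetical normalization: trim, upper-case, collapse spaces, expand "TSTM WIND"
normalizeEvtype <- function(x) {
  x <- toupper(trimws(as.character(x)))
  x <- gsub("\\s+", " ", x)                  # collapse repeated whitespace
  sub("^TSTM WIND", "THUNDERSTORM WIND", x)  # merge the abbreviated label
}

# Made-up per-label totals, re-aggregated after normalization
toyTotals <- data.frame(
  EVTYPE  = c("TSTM WIND", "THUNDERSTORM WIND", " tornado", "TORNADO"),
  tot_DMG = c(100, 50, 30, 70),
  stringsAsFactors = FALSE
)
merged <- aggregate(tot_DMG ~ EVTYPE,
                    data = transform(toyTotals, EVTYPE = normalizeEvtype(EVTYPE)),
                    FUN = sum)
merged <- merged[order(-merged$tot_DMG), ]
print(merged)
```

In the real pipeline, the normalization would be applied to EVTYPE before group_by(), so that variant spellings are summed together.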

Summary

In summary, across the US, tornadoes are the event type most harmful to population health and, by the damage totals computed above, the type with the greatest economic consequences.