SYNOPSIS

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This report explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database and addresses the following questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

DATA PROCESSING

The data come as a comma-separated-value (CSV) file compressed with bzip2 (about 47 MB).

Some documentation of the database is also available; it describes how some of the variables are constructed and defined.

The raw data are downloaded from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2 and saved as StormData.csv.bz2.

Loading Data

setwd("C:/Users/irman.zulkeflie/Documents")

if(!file.exists("StormData.csv.bz2")) {
  Original_Data_URL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  download.file(Original_Data_URL, destfile="StormData.csv.bz2")
}

Reading Data & Filter Useful Fields

library(dplyr)

Stormdata <- read.csv("StormData.csv.bz2", stringsAsFactors = FALSE)

# Check the number of rows and variables
dim(Stormdata)
## [1] 902297     37
# Keep only the fields needed: STATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP

subset.storm <- Stormdata %>%
  select(STATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
# Display the first 6 rows of subset.storm
head(subset.storm)
##   STATE  EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1    AL TORNADO          0       15    25.0          K       0           
## 2    AL TORNADO          0        0     2.5          K       0           
## 3    AL TORNADO          0        2    25.0          K       0           
## 4    AL TORNADO          0        2     2.5          K       0           
## 5    AL TORNADO          0        2     2.5          K       0           
## 6    AL TORNADO          0        6     2.5          K       0

RESULT

1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Two measurements in the dataset reflect how harmful a type of event is to population health: fatalities and injuries.
Each is summed over event types to find the most harmful type of event.

# Fatalities Category
fatalData <- aggregate(FATALITIES ~ EVTYPE, data = subset.storm, FUN = sum)
table(fatalData$FATALITIES)
## 
##    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14 
##  817   51   17   15    8    8    3    6    2    1    2    2    1    1    3 
##   15   17   18   19   22   23   25   28   29   33   35   38   42   58   61 
##    1    3    1    1    1    1    1    2    1    2    2    1    1    1    1 
##   62   64   75   89   95   96   98  101  103  125  127  133  160  172  204 
##    1    2    1    1    1    1    1    2    1    1    1    1    1    1    1 
##  206  224  248  368  470  504  816  937  978 1903 5633 
##    1    1    1    1    1    1    1    1    1    1    1
# Injuries Category
InjureData <- aggregate(INJURIES ~ EVTYPE, data = subset.storm, FUN = sum)
table(InjureData$INJURIES)
## 
##     0     1     2     3     4     5     6     7     8    10    12    13 
##   827    34    18     3     7     6     1     1     4     3     2     1 
##    15    16    17    20    21    22    23    24    26    27    28    29 
##     5     1     2     1     2     1     1     2     2     1     1     2 
##    31    35    36    38    40    42    43    46    48    50    52    68 
##     1     1     1     1     1     2     1     1     2     1     1     1 
##    70    72    77    79    86    95   129   137   150   152   155   170 
##     1     1     1     1     1     1     1     1     1     1     1     1 
##   216   231   232   251   280   297   302   309   340   342   398   440 
##     1     1     1     1     1     1     1     1     1     1     1     1 
##   545   734   805   908   911  1021  1137  1275  1321  1361  1488  1777 
##     1     1     1     1     1     1     1     1     1     1     1     1 
##  1975  2100  5230  6525  6789  6957 91346 
##     1     1     1     1     1     1     1
# Using the two aggregated tables above, plot the top 5 most harmful event types for fatalities and for injuries
library(ggplot2)

PlotFatal <- fatalData[order(fatalData$FATALITIES, decreasing = TRUE), ]
PlotInjured <- InjureData[order(InjureData$INJURIES, decreasing = TRUE), ]

# Plot top 5 fatalities per event type
ggplot(PlotFatal[1:5, ], aes(EVTYPE, FATALITIES)) + geom_bar(stat = "identity") + 
  ylab("Number of Fatalities") + xlab("Event") + ggtitle("Number of Fatalities per Event Type Across the U.S.")

# Plot top 5 injuries per event type
ggplot(PlotInjured[1:5, ], aes(EVTYPE, INJURIES)) + geom_bar(stat = "identity") + 
  ylab("Number of Injuries") + xlab("Event") + ggtitle("Number of Injuries per Event Type Across the U.S.")

Tornadoes are the most harmful event type in both bar charts: they caused 5,633 deaths and 91,346 injuries between 1950 and November 2011.
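
As a complement to the two separate rankings, the sketch below ranks event types by fatalities and injuries in one pass. It runs on a small made-up data frame (`toy`) rather than the NOAA data, and `CASUALTIES` is a column name introduced here purely for illustration, so its numbers are not results from this report.

```r
# Toy data for illustration only; not taken from the NOAA database.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 1, 3, 4, 0),
  INJURIES   = c(10, 0, 5, 1, 2)
)

# Sum both health measures per event type in a single aggregate() call.
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# CASUALTIES is a hypothetical combined measure (fatalities + injuries).
health$CASUALTIES <- health$FATALITIES + health$INJURIES

# Sort from most to least harmful and show the top rows.
health <- health[order(health$CASUALTIES, decreasing = TRUE), ]
head(health, 5)
```

Adding the two counts weights a fatality and an injury equally, which is a simplification; keeping the two rankings separate, as done above, avoids that assumption.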

2. Across the United States, which types of events have the greatest economic consequences?

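The dataset records two kinds of damage: property damage (PROPDMG) and crop damage (CROPDMG). Each is stored in two columns, a value and an exponent code (PROPDMGEXP, CROPDMGEXP) that gives its magnitude, such as “K” for thousands, “M” for millions, and “B” for billions of dollars. The code rescales PROPDMG and CROPDMG in place so that both are expressed in millions of dollars.
It does not interpret exponent codes such as “-”, “+” or “?”, or digits such as “1” and “2”; values with those codes, or with no code at all, are treated as plain dollars.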

subset.storm$PROPDMG[subset.storm$PROPDMGEXP %in% "B"] <- subset.storm$PROPDMG[subset.storm$PROPDMGEXP %in% "B"] * 1000
subset.storm$PROPDMG[subset.storm$PROPDMGEXP %in% c("M", "m")] <- subset.storm$PROPDMG[subset.storm$PROPDMGEXP %in% c("M", "m")] * 1
subset.storm$PROPDMG[subset.storm$PROPDMGEXP %in% c("K")] <- subset.storm$PROPDMG[subset.storm$PROPDMGEXP %in% c("K")] * 0.001
subset.storm$PROPDMG[subset.storm$PROPDMGEXP %in% c("H", "h")] <- subset.storm$PROPDMG[subset.storm$PROPDMGEXP %in% c("H", "h")] * 1e-04
subset.storm$PROPDMG[!(subset.storm$PROPDMGEXP %in% c("B", "M", "m", "K", "H", "h"))] <- subset.storm$PROPDMG[!(subset.storm$PROPDMGEXP %in% c("B", "M", "m", "K", "H", "h"))] * 1e-06

subset.storm$CROPDMG[subset.storm$CROPDMGEXP %in% "B"] <- subset.storm$CROPDMG[subset.storm$CROPDMGEXP %in% "B"] * 1000
subset.storm$CROPDMG[subset.storm$CROPDMGEXP %in% c("M", "m")] <- subset.storm$CROPDMG[subset.storm$CROPDMGEXP %in% c("M", "m")] * 1
subset.storm$CROPDMG[subset.storm$CROPDMGEXP %in% c("K", "k")] <- subset.storm$CROPDMG[subset.storm$CROPDMGEXP %in% c("K", "k")] * 0.001
subset.storm$CROPDMG[!(subset.storm$CROPDMGEXP %in% c("B", "M", "m", "K", "k"))] <- subset.storm$CROPDMG[!(subset.storm$CROPDMGEXP %in% c("B", "M", "m", "K", "k"))] * 1e-06
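
The same rescaling can also be written with a lookup table, which keeps every multiplier in one place. This is a minimal sketch on toy values, not the code used for the results in this report; `to_millions` and `scale_damage` are names introduced here for illustration. Unlike the statements above, it applies the same set of codes to both property and crop damage.

```r
# Multipliers that convert a damage figure to millions of dollars.
# Only the letter codes handled in this report are listed.
to_millions <- c(B = 1e3, M = 1, m = 1, K = 1e-3, k = 1e-3, H = 1e-4, h = 1e-4)

# Hypothetical helper: rescale values according to their exponent codes.
scale_damage <- function(value, exp_code) {
  # as.character() guards against factor input; unknown or empty codes give NA.
  mult <- to_millions[as.character(exp_code)]
  # Fall back to plain dollars for blank, "-", "+", "?" and digit codes.
  mult[is.na(mult)] <- 1e-6
  unname(value * mult)
}

# Toy values for illustration only.
dmg  <- c(25, 2.5, 1, 500, 3)
code <- c("K", "M", "B", "", "?")
scale_damage(dmg, code)
```

On the real data this could be applied as `scale_damage(subset.storm$PROPDMG, subset.storm$PROPDMGEXP)` before any in-place rescaling has been done; running both approaches on the same column would scale the values twice.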

The total damage for each event type is calculated by adding its property damage and crop damage.
The top five event types are then visualized.

EcoConsDmg <- subset.storm$PROPDMG + subset.storm$CROPDMG
EcoCons <- aggregate(EcoConsDmg ~ subset.storm$EVTYPE, FUN = sum)
PlotEcoCons <- EcoCons[order(EcoCons$EcoConsDmg, decreasing = TRUE), ]
names(PlotEcoCons)[1] <- "EVTYPE"

ggplot(PlotEcoCons[1:5, ], aes(EVTYPE, EcoConsDmg)) + geom_bar(stat = "identity") + ylab("Economic Damages (million dollars)") + 
  xlab("Event") + ggtitle("Top Five Events Causing Economic Damages Across the U.S")

The chart shows that floods cause the greatest economic damage.
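
Because EVTYPE is a character column, ggplot2 places the bars in alphabetical order rather than by size. The sketch below shows one way to sort the bars by their totals using `reorder()`. The data frame `eco` holds made-up totals for illustration only, and the plot is drawn only if ggplot2 is installed.

```r
# Toy totals for illustration only; not results from this report.
eco <- data.frame(
  EVTYPE     = c("FLOOD", "HAIL", "TORNADO"),
  EcoConsDmg = c(150, 20, 57)
)

# reorder() turns EVTYPE into a factor whose levels follow the damage
# totals; the minus sign puts the largest total first.
eco$EVTYPE <- reorder(eco$EVTYPE, -eco$EcoConsDmg)
levels(eco$EVTYPE)

# Draw the bar chart only if ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(eco, ggplot2::aes(EVTYPE, EcoConsDmg)) +
    ggplot2::geom_bar(stat = "identity") +
    ggplot2::xlab("Event") +
    ggplot2::ylab("Economic Damages (million dollars)")
  print(p)
}
```

The same idea would apply to the fatality and injury charts by reordering EVTYPE on FATALITIES or INJURIES.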

CONCLUSION

The results show that, from 1950 to November 2011, tornadoes were the most harmful events for population health and floods had the greatest economic consequences.