library("dplyr")
library("sqldf")
library("ggplot2")
library("colorspace")
library("plotrix")

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

In this project we analyze the storm dataset to answer the following questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

Data

The data for this analysis come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. We can download the file from the course web site:

The documentation of the database is available at:

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
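The claim about sparser early years can be checked by counting events per year. The following is a minimal sketch on made-up dates, not part of the original analysis; it assumes the begin date is stored as text in month/day/year form (the column name BGN_DATE and this format are assumptions here).

```r
# Toy stand-in for the begin-date column (assumed name: BGN_DATE).
bgn_date <- c("4/18/1950 0:00:00", "6/8/1951 0:00:00",
              "5/25/2011 0:00:00", "5/25/2011 0:00:00",
              "11/30/2011 0:00:00")

# Parse the date part and pull out the four-digit year.
years <- as.integer(format(as.Date(bgn_date, format = "%m/%d/%Y"), "%Y"))

# Count events per year.
events_per_year <- table(years)
print(events_per_year)
```

On the real data the same table could be passed to barplot() to see how the number of records grows over time.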

Data Processing

Data loading and cleaning

#Download the data, save it into the working directory, and set the path below
setwd("G:/vimal/data science/JHU/RepResearch/RR2")

#Read the data and store it into the working data frame
stromdata <- read.csv('repdata-data-StormData.csv.bz2', header = TRUE)
#Extract the data of interest from the data frame and store it in sd_req
sd_req <- stromdata %>% filter(FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 |
                               PROPDMGEXP > 0 | CROPDMG > 0 | 
                               CROPDMGEXP > 0 ) %>% select(STATE, EVTYPE, FATALITIES, 
                                                           INJURIES, PROPDMG, PROPDMGEXP, 
                                                           CROPDMG,  CROPDMGEXP)
summary(sd_req)
##      STATE                      EVTYPE        FATALITIES   
##  TX     : 22144   TSTM WIND        :63234   Min.   :  0.0  
##  IA     : 16093   THUNDERSTORM WIND:43655   1st Qu.:  0.0  
##  OH     : 13337   TORNADO          :39944   Median :  0.0  
##  MS     : 12023   HAIL             :26130   Mean   :  0.1  
##  GA     : 11207   FLASH FLOOD      :20967   3rd Qu.:  0.0  
##  AL     : 11121   LIGHTNING        :13293   Max.   :583.0  
##  (Other):168708   (Other)          :47410                  
##     INJURIES         PROPDMG       PROPDMGEXP        CROPDMG     
##  Min.   :   0.0   Min.   :   0   K      :231428   Min.   :  0.0  
##  1st Qu.:   0.0   1st Qu.:   2          : 11585   1st Qu.:  0.0  
##  Median :   0.0   Median :   5   M      : 11320   Median :  0.0  
##  Mean   :   0.6   Mean   :  43   0      :   210   Mean   :  5.4  
##  3rd Qu.:   0.0   3rd Qu.:  25   B      :    40   3rd Qu.:  0.0  
##  Max.   :1700.0   Max.   :5000   5      :    18   Max.   :990.0  
##                                  (Other):    32                  
##    CROPDMGEXP    
##         :152664  
##  K      : 99932  
##  M      :  1985  
##  k      :    21  
##  0      :    17  
##  B      :     7  
##  (Other):     7

Results

Q1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

To answer this question we consider both injuries and fatalities. We sum them by EVTYPE and draw a bar plot of the top 10 events to find the natural calamity most harmful to population health.
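The group, sum, and rank steps described above can be sketched on a handful of made-up records. This illustration uses base R (aggregate and order) rather than dplyr so that it runs without extra packages; the data frame toy and its values are hypothetical.

```r
# Made-up records; the real data has one row per recorded event.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 1, 0, 4, 0),
  INJURIES   = c(10, 0, 5, 1, 2),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries within each event type.
by_type <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Combined casualties, sorted from most to least harmful.
by_type$tot <- by_type$FATALITIES + by_type$INJURIES
by_type <- by_type[order(-by_type$tot), ]

# Keep the top n rows (n = 2 here; the report uses 10).
top <- head(by_type, n = 2)
print(top)
```

Note that ranking by the combined total weighs one injury the same as one fatality, which is the choice made in this report.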

#Extracting the Fatalities and Injuries
sd_req <- group_by(sd_req, EVTYPE)
sum_human_ef <- summarise(sd_req, 
                          FATALITIES = sum(FATALITIES), 
                          INJURIES = sum(INJURIES),
                          tot = FATALITIES + INJURIES)

sum_human_ef <- arrange(sum_human_ef, desc(tot), desc(FATALITIES),desc(INJURIES))
#extract only top 10
top10_human_ef <- head(sum_human_ef, n = 10)
top_hev <- as.character(top10_human_ef[1,]$EVTYPE)
top_hval <- as.integer(top10_human_ef[1,]$tot)
#print the barplot
options(scipen=999)
par(las=1) # draw axis labels horizontally
par(mar=c(6,10,4,2)) # increase the left margin so the event names fit
bplt <- barplot(top10_human_ef$tot, horiz = T, col  = heat.colors(10), names.arg = top10_human_ef$EVTYPE, 
        cex.names=0.8, xlim = c(0,110000), xlab  = 'Number of Casualties (Fatalities + Injuries)',
        main = 'Total Casualties caused by different Natural Calamities')
text(x=top10_human_ef$tot, y= bplt , labels=as.character(top10_human_ef$tot), pos  = 4)

[Figure: horizontal bar plot of combined fatalities and injuries for the top 10 event types]

From the above plot it is clear that the TORNADO is the most dangerous natural calamity in terms of human health, with 96979 combined fatalities and injuries.

Q2. Across the United States, which types of events have the greatest economic consequences?

To answer this question, we have to munge the data a bit. The original dataset has two types of economic loss: PROPDMG for property damage and CROPDMG for damage to crops. These amounts are not in actual units (USD); their exponents are given separately as PROPDMGEXP and CROPDMGEXP.

So first we calculate the actual damage in USD, then add up PROPDMG and CROPDMG to get the total damage. We use a bar plot of the top 15 natural calamities to display the most damaging natural calamity in terms of economic consequences.
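As a small worked illustration of this conversion, the sketch below uses a named lookup vector with the same K/M/B mapping as this report, treating any other code as a multiplier of 1. The helper to_usd and the sample values are hypothetical and only meant to show the arithmetic.

```r
# Lookup table from exponent code to multiplier; codes not listed count as 1.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

to_usd <- function(amount, exp_code) {
  # Upper-case so "k", "m", "b" are handled like "K", "M", "B".
  m <- multiplier[toupper(as.character(exp_code))]
  m[is.na(m)] <- 1          # blank or unrecognised codes
  unname(amount * m)
}

# Made-up damage figures with their exponent codes.
prop <- to_usd(c(25, 2.5, 10, 7), c("K", "M", "b", ""))
crop <- to_usd(c(0, 1, 0, 3), c("", "k", "", "?"))

# Total damage per record in USD.
total <- prop + crop
print(total)
```

For example, a PROPDMG of 25 with exponent "K" becomes 25 * 1000 = 25000 USD. Because the lookup is vectorised, it avoids calling switch once per row.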

#function to return the multiplier for each exponent code (B = 1e9, M = 1e6, K = 1e3); any other code gives 1
exponent.value <- function (cvec) sapply(cvec, function (c) switch (as.character(c), "B"=1e9, "b" = 1e9, 
                                                                  "M"=1e6, "m" = 1e6, "k" = 1e3, "K"=1e3, 1))

#calculate the actual value
sd_req <- mutate(sd_req, PROPVALNEW  = PROPDMG*exponent.value(PROPDMGEXP),
                 CROPVALNEW=CROPDMG*exponent.value(CROPDMGEXP))

#calculate the sums per event type (sd_req is still grouped by EVTYPE from Q1)
sum_eco_ef <- summarise(sd_req, 
                          PROPVAL = sum(PROPVALNEW), 
                          CROPVAL = sum(CROPVALNEW),
                          tot = PROPVAL + CROPVAL)

sum_eco_ef <- arrange(sum_eco_ef, desc(tot), desc(PROPVAL),desc(CROPVAL))

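#keep the top 15 event types (the variable keeps its top10 name)
top10_eco_ef <- head(sum_eco_ef, n = 15)
#build a 16th row, OTHERS, holding the combined losses of all remaining event types
top16 <- top10_eco_ef[15,]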
top16$EVTYPE = 'OTHERS'
top16$PROPVAL = sum(sum_eco_ef$PROPVAL) - sum(top10_eco_ef$PROPVAL)
top16$CROPVAL = sum(sum_eco_ef$CROPVAL) - sum(top10_eco_ef$CROPVAL)
top16$tot = sum(sum_eco_ef$tot) -sum(top10_eco_ef$tot)
top16_eco_ef <- rbind(top10_eco_ef,top16)
top10_eco_ef <- top16_eco_ef
top10_eco_ef <- mutate(top10_eco_ef , tot_bn  = round(tot/1000000000, 2))

top_eev <- as.character(top10_eco_ef[1,]$EVTYPE)
top_eval <- as.integer(top10_eco_ef[1,]$tot_bn)

#barplot of the economic loss by natural calamity
options(scipen=999)
par(las=1) # draw axis labels horizontally
par(mar=c(6,10,4,2)) # increase the left margin so the event names fit
bplt <- barplot(top10_eco_ef$tot_bn, horiz = T, col  = heat.colors(16), names.arg = top10_eco_ef$EVTYPE, 
                cex.names=0.8, xlab  = 'Quantum of Loss($ in Billion)', 
                main = 'Total Economic Loss caused by different Natural Calamities', xlim = c(0,180))
text(x=top10_eco_ef$tot_bn, y= bplt , labels=as.character(top10_eco_ef$tot_bn), pos  = 4)

[Figure: horizontal bar plot of total economic loss in billions of USD for the top 15 event types plus OTHERS]

From the above plot it is clear that the FLOOD has caused the greatest economic loss, with an amount greater than 150 billion USD.

The same result can also be shown with a pie chart, as follows:

pie3D(top10_eco_ef$tot_bn,labels = top10_eco_ef$EVTYPE, explode=0.1,labelcex = 0.6, start = 180, 
      height = .01, theta = .5, main = 'Pie Chart of Proportional Economic Loss due to natural Calamities', radius = 1.5)

[Figure: 3D pie chart of each event type's share of the total economic loss]
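The proportions that such a pie chart encodes can also be computed directly, which makes them easier to read off than slice angles. The sketch below uses made-up loss figures, not the values from this analysis.

```r
# Made-up losses in billions of USD, for illustration only.
loss_bn <- c(FLOOD = 50, HURRICANE = 30, TORNADO = 15, OTHERS = 5)

# Share of the total, as a percentage rounded to one decimal place.
share_pct <- round(100 * loss_bn / sum(loss_bn), 1)
print(share_pct)
```

Applied to the report's data, the same expression on top10_eco_ef$tot_bn would give the percentage behind each slice.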

Conclusion

With the above analysis we can conclude that:

  1. The TORNADO is the most dangerous natural calamity in terms of human health, with 96979 combined fatalities and injuries.
  2. The FLOOD has caused the greatest economic loss, with an amount greater than 150 billion USD.