Summary:

In this report, we explore the NOAA Storm Database and answer some basic questions about the impact of storms. We look for the event types that have the greatest impact on population health and on the economy.

Synopsis

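We will try to answer the following two questions: which types of events are most harmful to population health, and which types of events cause the greatest economic damage?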

To answer the first question, we compute the total number of fatalities and the total number of injuries for each event type. We then make bar plots of the 10 event types with the most fatalities and the most injuries.

Similarly, to answer the second question, we calculate the total damage caused by each event type and make a bar plot of the 10 event types that are most costly to the economy.

Data Processing

The storm data can be downloaded from the URL stored in fileurl in the code below.

We download the data to a temporary file and read it into a data frame named AllData. Since the data set is large, we use cache = TRUE in the chunk.

fileurl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
temp <- tempfile()
download.file(url = fileurl, destfile = temp, method = 'curl')
AllData <- read.csv(temp, stringsAsFactors = FALSE)
unlink(temp)
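
As an alternative to caching the chunk, the download itself can be skipped when the file is already on disk. The following is a minimal sketch, not part of the original analysis; the helper name fetch_once and the local file name StormData.csv.bz2 are assumptions made for illustration.

```r
# Hypothetical helper: download `url` to `dest` only if `dest`
# does not already exist, then return the local path.
fetch_once <- function(url, dest) {
  if (!file.exists(dest)) {
    download.file(url = url, destfile = dest, mode = "wb")
  }
  dest
}

# Usage (not run here, since it needs network access):
# path <- fetch_once(fileurl, "StormData.csv.bz2")
# AllData <- read.csv(path, stringsAsFactors = FALSE)
```

Because read.csv can read bzip2-compressed files directly, the downloaded file does not need to be unpacked first.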

Now we consider which event types cause the most harm to population health, using the two variables “FATALITIES” and “INJURIES”. We first build a data frame with the total number of fatalities for each event type, sorted in decreasing order so that the top 10 can be plotted later. We then do the same for injuries.

library(dplyr)
library(ggplot2)
Fatalities <- AllData %>% select( EVTYPE, FATALITIES) %>% 
    group_by(EVTYPE) %>%
    summarise(Fatalities = sum(FATALITIES)) %>% 
    mutate(EVTYPE = reorder(EVTYPE, desc(Fatalities))) %>% 
    arrange(desc(Fatalities))
head(Fatalities)
## Source: local data frame [6 x 2]
## 
##           EVTYPE Fatalities
##           (fctr)      (dbl)
## 1        TORNADO       5633
## 2 EXCESSIVE HEAT       1903
## 3    FLASH FLOOD        978
## 4           HEAT        937
## 5      LIGHTNING        816
## 6      TSTM WIND        504
Injuries <- AllData %>% select( EVTYPE, INJURIES) %>% 
    group_by(EVTYPE) %>%
    summarise(Injuries = sum(INJURIES)) %>% 
    mutate(EVTYPE = reorder(EVTYPE, desc(Injuries))) %>% 
    arrange(desc(Injuries))
head(Injuries)
## Source: local data frame [6 x 2]
## 
##           EVTYPE Injuries
##           (fctr)    (dbl)
## 1        TORNADO    91346
## 2      TSTM WIND     6957
## 3          FLOOD     6789
## 4 EXCESSIVE HEAT     6525
## 5      LIGHTNING     5230
## 6           HEAT     2100
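
The two summaries above follow the same group-and-sum pattern, so they could also be computed in a single pass. Here is a small sketch on an invented toy data frame (the rows are made up for illustration and are not taken from the storm data):

```r
library(dplyr)

# Toy stand-in for AllData; the values are invented for illustration only.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(10, 2, 5, 1, 3),
  stringsAsFactors = FALSE
)

# One row per event type, with both totals, sorted by fatalities.
health <- toy %>%
  group_by(EVTYPE) %>%
  summarise(Fatalities = sum(FATALITIES), Injuries = sum(INJURIES)) %>%
  arrange(desc(Fatalities))
health <- as.data.frame(health)
```

Keeping both totals in one table makes it easy to compare the two rankings side by side.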

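To answer our second question, we need to calculate the total cost of the damage caused by each event type. The damage amounts are stored in PROPDMG and CROPDMG, and the codes in PROPDMGEXP and CROPDMGEXP give the multiplier to apply to them (for example, K for thousands, M for millions and B for billions).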

cost <- AllData %>% select(EVTYPE, PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP)
cost$PROPDMGEXP <- as.factor(cost$PROPDMGEXP)
cost$CROPDMGEXP <- as.factor(cost$CROPDMGEXP)

# check the levels of the multiplier codes for damage
levels(cost$PROPDMGEXP)
##  [1] ""  "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"
levels(cost$CROPDMGEXP)
## [1] ""  "?" "0" "2" "B" "k" "K" "m" "M"
# replace each multiplier code with the corresponding number
levels(cost$PROPDMGEXP) <- c(1,1,1,1,1,10,100,1000,10000,100000,1000000,
                             10000000,100000000,1000000000,100,100,1000,
                             1000000,1000000)
levels(cost$CROPDMGEXP) <- c(1,1,1,100,1000000000,1000,1000,1000000,1000000)

#make a data frame with total cost for each event
totalcost <- cost %>% mutate(TotalDamage = PROPDMG * as.numeric(PROPDMGEXP) + 
            CROPDMG * as.numeric(CROPDMGEXP)) %>% 
    select(EVTYPE, TotalDamage) %>% group_by(EVTYPE) %>%
    summarise(Damage = sum(TotalDamage))%>% 
    mutate(EVTYPE = reorder(EVTYPE, desc(Damage))) %>% 
    arrange(desc(Damage))

head(totalcost)
## Source: local data frame [6 x 2]
## 
##              EVTYPE   Damage
##              (fctr)    (dbl)
## 1           TORNADO 13394010
## 2       FLASH FLOOD  6438660
## 3         TSTM WIND  5790410
## 4              HAIL  5114498
## 5             FLOOD  4341956
## 6 THUNDERSTORM WIND  3782310
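
One detail in the conversion above is worth checking. In R, as.numeric() applied to a factor returns the internal integer codes of the levels, not the numbers stored in the level labels; getting the labels as numbers requires going through as.character() first. The sketch below shows the difference on an invented five-element sample. Whether this changes the ranking above would need a re-run on the full data.

```r
# Invented sample of exponent codes, mirroring the idea of PROPDMGEXP.
exp_code <- factor(c("K", "M", "", "B", "K"),
                   levels = c("", "B", "K", "M"))

# Relabel the levels with their multipliers, as done above.
levels(exp_code) <- c(1, 1e9, 1000, 1e6)

# Integer codes of the levels (positions 1..4), not the multipliers.
codes <- as.numeric(exp_code)
# 3 4 1 2 3

# Converting through character recovers the multipliers themselves.
values <- as.numeric(as.character(exp_code))
# 1000, 1e6, 1, 1e9, 1000
```

With this approach the damage would be computed as PROPDMG * as.numeric(as.character(PROPDMGEXP)), and likewise for the crop columns.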

Results

Here is the bar plot for fatalities.

Fatal <- Fatalities[1:10, ] %>% 
    mutate(EVTYPE = reorder(EVTYPE, Fatalities))
ggplot(Fatal, aes(x = EVTYPE, y = Fatalities)) + 
    geom_bar(stat = "identity", fill = "blue") + 
    coord_flip() + ggtitle("Total Fatalities for Each Event") +
    labs(x = "Event Type", y = "Fatalities")

Here is the bar plot for injuries.

Injury <- Injuries[1:10, ] %>% 
    mutate(EVTYPE = reorder(EVTYPE, Injuries))
ggplot(Injury, aes(x = EVTYPE, y = Injuries)) + 
    geom_bar(stat = "identity", fill = "blue") + 
    coord_flip() + ggtitle("Total Injuries for Each Event") +
    labs(x = "Event Type", y = "Injuries")

From this analysis, the events most detrimental to population health are tornadoes, heat (EXCESSIVE HEAT and HEAT), thunderstorm wind (TSTM WIND), and floods.

Now we draw the bar plot of the total cost of damage caused by each event type, restricted to the 10 most costly event types.

TotalCost <- mutate(totalcost[1:10, ], EVTYPE = reorder(EVTYPE, Damage))
ggplot(TotalCost, aes(x = EVTYPE, y = Damage)) + 
    geom_bar(stat = "identity", fill = "blue") + 
    coord_flip() + ggtitle("Total Cost for Each Event") + 
    labs(x = "Event Type", y = "Total Damage ($)")

So the events most destructive to the economy are tornadoes, flash floods, and thunderstorm wind (TSTM WIND).
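
The three plots in this section share the same structure, so the plotting code could be factored into one function. The following is a sketch only; the helper name plot_top and its arguments are hypothetical and not part of the original analysis, and the toy data frame is invented.

```r
library(ggplot2)

# Hypothetical helper: horizontal bar chart of the n largest rows of `df`,
# ranked by the numeric column named in `value`.
plot_top <- function(df, value, title, n = 10) {
  top <- head(df[order(-df[[value]]), ], n)
  # Order the bars by the plotted value so the largest appears on top.
  top$EVTYPE <- reorder(as.character(top$EVTYPE), top[[value]])
  ggplot(top, aes(x = EVTYPE, y = .data[[value]])) +
    geom_col(fill = "blue") +
    coord_flip() +
    labs(title = title, x = "Event Type", y = value)
}

# Toy example with invented values.
toy <- data.frame(EVTYPE = c("A", "B", "C"), Damage = c(5, 20, 10))
p <- plot_top(toy, "Damage", "Total Cost for Each Event", n = 2)
```

With the data frames built earlier, calls such as plot_top(Fatalities, "Fatalities", "Total Fatalities for Each Event") would be expected to produce charts equivalent to the ones above.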