Synopsis

This report analyses the NOAA Storm Database to answer two basic questions about severe weather events:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

The analysis below makes extensive use of the “dplyr” package to split the data by event type and compute the statistics needed to answer these two questions.

Each step is accompanied by a short description, so the document should be largely self-explanatory.
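Splitting the data and computing statistics per group is the split-apply-combine pattern that dplyr's group_by() and summarise() implement. As a minimal base-R sketch of the idea, on a made-up toy data frame (the event names and counts are invented for illustration and are not NOAA values), the three steps look like this:

```r
# Toy data, invented for illustration only (not NOAA values)
toy <- data.frame(
  EVTYPE     = c("EVENT_A", "EVENT_B", "EVENT_A", "EVENT_C", "EVENT_B"),
  FATALITIES = c(2, 1, 3, 0, 0),
  INJURIES   = c(10, 4, 5, 1, 2)
)

# Split: one data frame per event type
pieces <- split(toy, toy$EVTYPE)

# Apply: compute a statistic within each piece
totals <- sapply(pieces, function(d) sum(d$FATALITIES + d$INJURIES))

# Combine: a named vector with one total per event type
print(totals)
```

In dplyr the same computation would read toy %>% group_by(EVTYPE) %>% summarise(total = sum(FATALITIES + INJURIES)).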

Data Processing

Data input

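# Load dplyr, used for all of the data manipulation below
library(dplyr)

# Read the compressed storm data directly from the .bz2 archive
input <- read.csv(bzfile("repdata-data-StormData.csv.bz2"))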

Estimate of damage to health

Tidying data to obtain deaths/injuries in each event

# Keep only the event type and the health-related columns
tidy <- input[, c('EVTYPE', 'FATALITIES', 'INJURIES')]

# Total the deaths and injuries for each event type, rank the event types
# by their combined toll, and keep the ten most harmful
tidy_t <- tidy %>%
  mutate(health = FATALITIES + INJURIES) %>%
  group_by(EVTYPE) %>%
  summarise(deaths = sum(FATALITIES, na.rm = TRUE),
            injuries = sum(INJURIES, na.rm = TRUE),
            total.damage = sum(health, na.rm = TRUE)) %>%
  arrange(desc(total.damage), desc(deaths), desc(injuries)) %>%
  top_n(10, total.damage)
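One caveat with top_n(): it keeps every row tied at the cut-off, so it can return more than the requested number of rows. The base-R sketch below, on invented toy data (not NOAA values), cross-checks the same per-event totals and uses order() with a tie-breaker plus head() to keep exactly n rows; this is an alternative technique, not the one used above.

```r
# Toy data, invented for illustration only (not NOAA values)
toy <- data.frame(
  EVTYPE     = c("EVENT_A", "EVENT_B", "EVENT_A", "EVENT_C", "EVENT_B", "EVENT_D"),
  FATALITIES = c(2, 1, 3, 0, 0, 1),
  INJURIES   = c(10, 4, 5, 1, 2, 0)
)

# Sum deaths and injuries per event type (one row per EVTYPE)
harm <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
harm$total <- harm$FATALITIES + harm$INJURIES

# Sort by combined toll, breaking ties on deaths, then keep exactly n rows
n <- 3
harm <- harm[order(-harm$total, -harm$FATALITIES), ]
top <- head(harm, n)
print(top)
```

Here EVENT_C and EVENT_D tie on the combined toll, so a top_n(3) on the total alone would keep four rows; the deaths tie-breaker makes head() keep EVENT_D.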

Plot of the combined fatalities and injuries for the 10 event types most harmful to human health

library(ggplot2)

# Bar height: fatalities + injuries; bar colour: number of fatalities
ggplot(tidy_t, aes(x = reorder(EVTYPE, -total.damage), y = total.damage, fill = deaths)) + 
    geom_bar(stat = "identity") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    xlab("Event Type") + ylab("Fatalities + injuries") + ggtitle("Most harmful weather events")

From the above plot, tornadoes cause the most harm to population health (fatalities and injuries combined, about 100,000) and also a large number of deaths (approximately 5,000).

Estimate of economic damage

Tidying data to obtain total property and crop damage in each event

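The dollar amounts of damage are stored as a value (PROPDMG, CROPDMG) plus an exponent code (PROPDMGEXP, CROPDMGEXP) giving its denomination: “H” for hundreds, “K” for thousands, “M” for millions and “B” for billions. Each value is therefore multiplied by its denomination before summing, and the result is expressed in billions of dollars.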

# Keep only the event type and the damage columns
tidy_p <- input[, c('EVTYPE', 'PROPDMG', 'PROPDMGEXP', 'CROPDMG', 'CROPDMGEXP')]

# Convert an exponent code to its dollar multiplier. Codes are upper-cased so
# that "k" and "K" are treated alike; any other code maps to 0, as before.
denomination <- function(exp) {
  exp <- toupper(as.character(exp))
  case_when(exp == "H" ~ 100,
            exp == "K" ~ 1e3,
            exp == "M" ~ 1e6,
            exp == "B" ~ 1e9,
            TRUE       ~ 0)
}

# Convert each damage to billions of dollars (the crop value now uses its own
# exponent column CROPDMGEXP), total it per event type, and keep the ten
# event types with the largest combined damage
tidy_tp <- tidy_p %>%
  mutate(value.prop = PROPDMG * denomination(PROPDMGEXP) / 1e9,
         value.crop = CROPDMG * denomination(CROPDMGEXP) / 1e9) %>%
  group_by(EVTYPE) %>%
  summarise(Property.Damage = sum(value.prop, na.rm = TRUE),
            Crop.Damage = sum(value.crop, na.rm = TRUE),
            Total.Damage = Property.Damage + Crop.Damage) %>%
  arrange(desc(Total.Damage)) %>%
  top_n(10, Total.Damage)
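To make the denomination lookup concrete, here is a minimal base-R sketch that uses a named vector as the lookup table. The helper name to_dollars() is hypothetical and the amounts are made up. Upper-casing the code is an assumption about how lowercase codes should be read; unlisted codes (blank, "+", digits, NA) map to 0, matching the fallback used in the analysis above.

```r
# Hypothetical helper: convert value + exponent code to dollars.
# Codes are upper-cased (assumption); unlisted or missing codes give 0.
to_dollars <- function(value, exp_code) {
  multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(multipliers[toupper(as.character(exp_code))])
  mult[is.na(mult)] <- 0
  value * mult
}

# Made-up amounts: 5 thousand, 2.5 million, 1 billion, and a blank code
to_dollars(c(5, 2.5, 1, 3), c("K", "M", "b", ""))
```

A named vector keeps the code-to-multiplier table in one place, so adding or correcting a code means changing a single entry rather than a branch of nested conditionals.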

Plot of the total economic damage for the 10 event types most harmful to crops and property

# Bar height: total damage; bar colour: property damage (billions of dollars)
ggplot(tidy_tp, aes(x = reorder(EVTYPE, -Total.Damage), y = Total.Damage, fill = Property.Damage)) + 
    geom_bar(stat = "identity") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    xlab("Event Type") + ylab("Total damage (billions of dollars)") + ggtitle("Most harmful events, shaded by property damage \n (in billions of dollars)")

From the above plot, floods cause the highest total economic damage and also the highest damage to property.

# Bar height: total damage; bar colour: crop damage (billions of dollars)
ggplot(tidy_tp, aes(x = reorder(EVTYPE, -Total.Damage), y = Total.Damage, fill = Crop.Damage)) + 
    geom_bar(stat = "identity") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    xlab("Event Type") + ylab("Total damage (billions of dollars)") + ggtitle("Most harmful events, shaded by crop damage \n (in billions of dollars)")

This plot shows that, while floods cause the highest total damage, drought causes the most damage to agriculture (crops) among these ten event types.
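Because tidy_tp keeps only the ten event types with the largest total damage, a statement about crop damage read from this plot is strictly about those ten. One way to check it is to rank on crop damage itself, before any top-ten cut. The sketch below does this in base R on an invented per-event summary (the names and figures are not NOAA values) and also computes each event's crop share of its total damage.

```r
# Toy per-event summary (billions of dollars), invented for illustration only
damage <- data.frame(
  EVTYPE          = c("EVENT_A", "EVENT_B", "EVENT_C", "EVENT_D"),
  Property.Damage = c(9.0, 0.5, 3.0, 1.0),
  Crop.Damage     = c(0.2, 1.5, 0.1, 0.8)
)

# Fraction of each event's total damage that falls on crops
damage$Crop.Share <- damage$Crop.Damage /
  (damage$Property.Damage + damage$Crop.Damage)

# Rank on crop damage alone rather than on total damage
by_crop <- damage[order(-damage$Crop.Damage), ]
print(by_crop$EVTYPE[1])
```

In this toy table EVENT_A has the largest total damage, yet EVENT_B leads on crop damage, which is the kind of difference a ranking by total alone can hide.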

Results

The conclusions drawn from each of the plots above are restated here: