Synopsis

The following analysis uses weather data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to determine which weather events are most harmful to population health and to the economy, respectively.
Information about the data set is given by:

Further explanation of the data set can be obtained from the resources available at this link.
The analysis below shows that tornadoes are the biggest danger to public health, while floods have the greatest economic impact.


Data Processing

To find the weather events with the most adverse effect on the economy and on population health, we perform the following steps in order:

  1. Check whether the compressed .csv file is in the working directory; if not, download it from the following location.

  2. Load only those columns which are required for the analysis. These are:
    • EVTYPE: type of event (factor)
    • FATALITIES: number of persons directly killed by the event (numeric)
    • INJURIES: number of persons injured by the event (numeric)
    • PROPDMG: amount of property damage caused by the event (numeric)
    • PROPDMGEXP: multiplier applied to PROPDMG to compute the exact value of property damage (factor)
    • CROPDMG: amount of crop damage caused by the event (numeric)
    • CROPDMGEXP: same as PROPDMGEXP, but multiplied with CROPDMG to compute the exact value of crop damage (factor)
  3. Compute the total property and crop damage caused by each weather event and store the results in the new columns PROPDMGT and CROPDMGT.

  4. Group the data by EVTYPE values using the dplyr package; the original property and crop damage variables can now be dropped.

At this point, the data is clean and ready for further analysis.

library(ggplot2)
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.2.2
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
colClass <- c(rep('NULL', 7), 'character', rep('NULL', 14), rep('numeric', 3), 'character', 'numeric', 'character', rep('NULL', 9))

if(!file.exists('storm_data.csv.bz2')){
  download.file('https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2', 'storm_data.csv.bz2')
}
storm_data <- read.csv(bzfile('storm_data.csv.bz2'), colClasses = colClass)
# Normalise the event type to lower case first, then store it as a factor
# (calling tolower() on a factor would turn it back into a character vector)
storm_data$EVTYPE <- factor(tolower(storm_data$EVTYPE))

storm_data$PROPDMGEXP <- tolower(storm_data$PROPDMGEXP)
storm_data$PROPDMGEXP <- factor(storm_data$PROPDMGEXP, levels = c('h', 'k', 'm', 'b'))

storm_data$CROPDMGEXP <- tolower(storm_data$CROPDMGEXP)
storm_data$CROPDMGEXP <- factor(storm_data$CROPDMGEXP, levels = c('h', 'k', 'm', 'b'))

multiplier <- c('h' = 100, 'k' = 1000, 'm' = 1e+06, 'b' = 1e+09)

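# Look the multiplier up by name (character) rather than by factor code,
# so the result does not depend on the order of the factor levels
storm_data <- mutate(storm_data, PROPDMGT = PROPDMG * unname(multiplier[as.character(PROPDMGEXP)]))
storm_data <- mutate(storm_data, CROPDMGT = CROPDMG * unname(multiplier[as.character(CROPDMGEXP)]))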

storm_data <- select(storm_data, -(PROPDMG:CROPDMGEXP))
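
As a quick check on the multiplier step above, here is a minimal sketch on a few made-up rows (the data frame `toy` and its values are illustrative assumptions, not records from the NOAA file). It shows that recognised codes scale the damage figure, while any code outside h, k, m and b produces NA.

```r
# Same lookup vector as in the processing code above
multiplier <- c('h' = 100, 'k' = 1000, 'm' = 1e+06, 'b' = 1e+09)

# Hypothetical toy rows: two recognised codes, one empty code, one unknown code
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 10, 3),
  PROPDMGEXP = c('K', 'm', '', '+'),
  stringsAsFactors = FALSE
)

# Lower-case the code, then look the multiplier up by name
toy$PROPDMGT <- toy$PROPDMG * unname(multiplier[tolower(toy$PROPDMGEXP)])

print(toy)
```

The first two rows come out as 25000 and 2500000. The last two are NA, which is why the economic analysis later substitutes 0 for missing damage totals.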

Results

The data set is grouped by the event type recorded for each observation. Since there are many distinct events, most without any significant impact on either the economy or population health, we concentrate only on the 10 most adverse weather events.

Which types of events are most harmful with respect to population health?

The harm an event causes to population health is measured here by adding the number of fatalities and injuries, both given equal weight. Since these totals differ starkly between events, we store the base-10 logarithm (log10) of the sum in a new column called CASUALTY.
The barplot below shows the CASUALTY values of the 10 most harmful events.

storm_data <- group_by(storm_data, EVTYPE)
storm_data.health <- summarise(storm_data, CASUALTY = log10(sum(FATALITIES, INJURIES)))
storm_data.health <- arrange(storm_data.health, desc(CASUALTY))
#We will only concentrate on the top 10 casualty causing events
ggplot(storm_data.health[1:10,], aes(x = reorder(EVTYPE, -CASUALTY), y = CASUALTY)) + geom_bar(stat = 'identity', aes(fill = EVTYPE)) +
  labs(title = 'Top 10 most harmful events to population health in US', x = 'WEATHER EVENT', y = 'CASUALTY (log10)') +
  guides(fill = 'none') +
  coord_flip()

rm(storm_data.health)

As the barplot shows, the weather event most harmful to population health is TORNADO.
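
To make the CASUALTY measure concrete, the sketch below applies the same grouping and log10 step to a small made-up table (the `toy` data and the extra `TOTAL` column are assumptions for illustration only). It also shows one side effect of the log transform: an event with no fatalities or injuries gets a CASUALTY of -Inf, so it sorts to the bottom.

```r
library(dplyr)

# Hypothetical toy records, not taken from the NOAA data
toy <- data.frame(
  EVTYPE     = c('tornado', 'tornado', 'flood', 'hail'),
  FATALITIES = c(50, 40, 9, 0),
  INJURIES   = c(900, 10, 1, 0),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries per event, then take log10 of the total
toy.health <- toy %>%
  group_by(EVTYPE) %>%
  summarise(TOTAL = sum(FATALITIES, INJURIES),
            CASUALTY = log10(TOTAL)) %>%
  arrange(desc(CASUALTY))

print(toy.health)
```

Here a total of 1000 casualties maps to a CASUALTY of 3 and a total of 10 maps to 1, so one unit on the plotted axis corresponds to a tenfold difference in people affected.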

Which types of events have greatest economic consequence?

We create two new datasets, each holding only one of PROPDMGT or CROPDMGT, and process them as follows:

  • Substituting 0 for NA values.
  • Adding another column, TYPE, to specify which type of damage the value in the DMG column refers to.
  • Combining the two datasets into one denormalized data set, fit for analysis.
  • Grouping the data by the EVTYPE and TYPE columns and summing DMG within each group.

The end result is a denormalized form of the previous data set, which makes it easier to draw insights from the data.

storm_data.prop <- mutate(storm_data, PROPDMGT = ifelse(is.na(PROPDMGT), 0, PROPDMGT)) %>%
  select(EVTYPE, DMG = PROPDMGT)
storm_data.crop <- mutate(storm_data, CROPDMGT = ifelse(is.na(CROPDMGT), 0, CROPDMGT)) %>%
  select(EVTYPE, DMG = CROPDMGT)

storm_data.prop$TYPE <- 'Property'
storm_data.crop$TYPE <- 'Crop'

storm_data.new <- rbind(storm_data.crop, storm_data.prop)
rm(storm_data.crop, storm_data.prop)

storm_data.new <- group_by(storm_data.new, EVTYPE, TYPE) %>%
  summarise(TDMG = sum(DMG))

ggplot(storm_data.new[which(storm_data.new$TDMG >= 5e+9),], aes(x = EVTYPE, y = TDMG)) + 
  geom_bar(stat = 'identity', aes(fill = TYPE)) +
  labs(title = 'Top 13 most costly weather events in US', x = 'WEATHER EVENT', y = 'TOTAL DAMAGE COST') +
  guides(fill = guide_legend(title = 'Damages in $ made to')) +
  coord_flip()

It is evident from the barplot that the weather event costing the most, in property and crop damage combined, is FLOOD, and that such events cause more damage to property than to crops. The weather event most harmful to crops is DROUGHT, which also features among the 13 most costly weather events.
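
The plot above keeps every (event, damage type) row whose total is at least 5e+9, so an event appears only with the damage types that pass the threshold on their own. A possible alternative, sketched here on invented numbers, ranks events by their combined damage and then keeps both rows of the n costliest events. The names `toy.new`, `toy.rank`, `toy.top` and `n` are hypothetical and not part of the original analysis.

```r
library(dplyr)

# Hypothetical long-format totals: one row per event and damage type
toy.new <- data.frame(
  EVTYPE = rep(c('flood', 'drought', 'hail'), each = 2),
  TYPE   = rep(c('Crop', 'Property'), times = 3),
  TDMG   = c(5e9, 1.4e11, 1.3e10, 1e9, 3e9, 1.5e10),
  stringsAsFactors = FALSE
)

# Combined (crop + property) damage per event, largest first
toy.rank <- toy.new %>%
  group_by(EVTYPE) %>%
  summarise(TOTAL = sum(TDMG)) %>%
  arrange(desc(TOTAL))

# Keep the Crop and Property rows of the n costliest events
n <- 2
toy.top <- filter(toy.new, EVTYPE %in% head(toy.rank$EVTYPE, n))

print(toy.rank)
print(toy.top)
```

With this approach the number of events in the plot is set directly by `n` rather than by a damage threshold, and each selected event keeps both of its stacked bar segments.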