Synopsis

This report analyzes the NOAA Storm Events data. The raw data spans 1950 to November 2011, with 902297 observations and 37 variables, of which only 7 are relevant to the analysis. The report shows how the data was cleaned, re-ordered and analyzed to answer the following questions:

  1. Across the United States, which types of events are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

In both cases, the top 10 events are displayed. The direct conclusions that can be drawn are that tornadoes caused the most fatalities and injuries, and that floods had the greatest economic consequences.

Loading and Processing Raw Data

Reading raw data

The data was downloaded from the website, decompressed and stored in the variable “storm”.

#The following code checks whether the compressed file is present. If not, it is downloaded
if(!file.exists("repdata%2Fdata%2FStormData.csv.bz2")){
        download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                      "repdata%2Fdata%2FStormData.csv.bz2", mode = "wb")
}
#The file is bzip2-compressed, so unzip() does not apply; read.csv decompresses it while reading
#The dataset is then stored in storm
storm <- read.csv("repdata%2Fdata%2FStormData.csv.bz2")
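
The following is a minimal sketch showing that read.csv() can read a bzip2-compressed CSV directly, because it opens the path through file(), which detects the compression on read. The toy data frame and the temporary file are illustrative stand-ins, not part of the storm data.

```r
# Toy stand-in for the storm file (hypothetical values, not real data)
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))

# Write the toy data to a bzip2-compressed CSV in a temporary location
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path via file(), which detects the compression
toy_back <- read.csv(path, stringsAsFactors = FALSE)
unlink(path)
```

The same call works on the downloaded .csv.bz2 file, so no intermediate uncompressed copy has to be kept on disk.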

Data Processing

These are the variables selected for the analysis:

  • EVTYPE as a measure of event type (e.g. tornado, flood, etc.)
  • FATALITIES as a measure of harm to human health
  • INJURIES as a measure of harm to human health
  • PROPDMG as a measure of property damage and hence economic damage in USD
  • PROPDMGEXP as a measure of magnitude of property damage (e.g. thousands, millions USD, etc.)
  • CROPDMG as a measure of crop damage and hence economic damage in USD
  • CROPDMGEXP as a measure of magnitude of crop damage (e.g. thousands, millions USD, etc.)

#Using the dplyr package, the important variables were selected and the dataset was stored in dat
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
dat <- select(storm, EVTYPE, FATALITIES, 
              INJURIES, PROPDMG, PROPDMGEXP, CROPDMG,CROPDMGEXP)

Economic damage is classified by two categories:

  • Property Damage
  • Crop Damage

The damage can be calculated by multiplying PROPDMG by 10^PROPDMGEXP. However, PROPDMGEXP first required a proper numerical assignment. It was done in the following way.

#A dmgexp dataframe was created with two columns. One contained the symbols found in PROPDMGEXP that were given an exponent, including "-" and the empty string. The other contained the corresponding numerical assignment of the exponent. Symbols that are not listed (e.g. "?" and "+") receive NA when matched.
#data.frame() is used instead of cbind() so that the exponents are stored as numbers rather than as text
dmgexp <- data.frame(c("K", "M", "B", "0", "5", "6", "4", "2", "3", "H", "7", "1", "8","-", ""), c(3, 6, 9, 0, 5, 6, 4, 2, 3, 2, 7, 1, 8, 1, 1), stringsAsFactors = FALSE)


#The variables were named accordingly, numerically assigning the PROPDMGEXP to a new variable, PRPEXP
names(dmgexp) <- c("PROPDMGEXP", "PRPEXP")

#Matching dmgexp against the dat dataset created the new variable, PRPEXP, assigned according to PROPDMGEXP in the dat dataset
dat$PRPEXP <- dmgexp$PRPEXP[match(dat$PROPDMGEXP, dmgexp$PROPDMGEXP)]

#A similar operation was carried out for CROPDMGEXP
names(dmgexp) <- c("CROPDMGEXP", "CRPEXP")
dat$CRPEXP <- dmgexp$CRPEXP[match(dat$CROPDMGEXP, dmgexp$CROPDMGEXP)]

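#The new variables were converted to numeric class; as.character is applied first so that factor labels, not factor level codes, are converted
dat$CRPEXP <- as.numeric(as.character(dat$CRPEXP))
dat$PRPEXP <- as.numeric(as.character(dat$PRPEXP))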
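
One detail of the factor-to-numeric conversion is worth illustrating. The sketch below assumes the exponents are held in a factor, which is what as.data.frame() produced from a character matrix by default in R versions before 4.0. The vector exp_factor is a hypothetical stand-in for the exponent column.

```r
# Exponents stored as a factor; the levels are sorted as "0", "3", "6", "9"
exp_factor <- factor(c("3", "6", "9", "0"))

# as.numeric() on a factor returns the internal level codes
codes <- as.numeric(exp_factor)                  # 2 3 4 1

# Going through as.character() recovers the labelled values
values <- as.numeric(as.character(exp_factor))   # 3 6 9 0
```

When the column is already character or numeric, wrapping it in as.character() is harmless, so the two-step conversion is the safer form in either case.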

Total economic damage was calculated by adding Property and Crop damage.

#Total damage was stored under a new variable, TOTALDMG, in the dat dataset
dat$TOTALDMG <-  dat$PROPDMG * 10^dat$PRPEXP + dat$CROPDMG * 10^dat$CRPEXP
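
The arithmetic can be checked on a few hand-made records. This is a minimal sketch with hypothetical values and a shortened lookup (only K, M and B); the names toy, symbols and exponents are illustrative and do not appear in the analysis.

```r
# Hypothetical records; the values are illustrative, not taken from the data
toy <- data.frame(PROPDMG    = c(25, 2.5, 10),
                  PROPDMGEXP = c("K", "M", "?"),
                  CROPDMG    = c(0, 1.5, 0),
                  CROPDMGEXP = c("K", "K", "K"),
                  stringsAsFactors = FALSE)

# Shortened symbol -> power-of-ten lookup
symbols   <- c("K", "M", "B")
exponents <- c(3, 6, 9)

# match() returns NA for symbols that are not in the lookup, such as "?"
toy$PRPEXP <- exponents[match(toy$PROPDMGEXP, symbols)]
toy$CRPEXP <- exponents[match(toy$CROPDMGEXP, symbols)]

# Same formula as above: damage figure times ten to the exponent
toy$TOTALDMG <- toy$PROPDMG * 10^toy$PRPEXP + toy$CROPDMG * 10^toy$CRPEXP
# Row 1: 25 * 10^3 + 0          = 25000
# Row 2: 2.5 * 10^6 + 1.5 * 10^3 = 2501500
# Row 3: NA, because "?" has no exponent
```

Rows whose total is NA are dropped by the default na.action of the formula interface of aggregate(), so records with unmatched symbols do not contribute to totals computed that way.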

Results

Top 10 events that affect health

The following bar plot shows the 10 event types with the highest number of fatalities.

#Data was aggregated based on fatalities and event type
fatalities <- aggregate(FATALITIES ~ EVTYPE, dat, sum)
#The fatalities were arranged in descending order and the top 10 were filtered out
fatalities <- arrange(fatalities, desc(FATALITIES))
top10_fatalities <- fatalities[1:10,]
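
The aggregate-then-rank pattern can be seen on a small hand-made table. This is a sketch with hypothetical values; it uses base R's order() in place of dplyr's arrange() so that it runs without extra packages.

```r
# Hypothetical event records; the values are illustrative only
toy <- data.frame(EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
                  FATALITIES = c(5, 2, 7, 4, 1),
                  stringsAsFactors = FALSE)

# Sum fatalities within each event type
totals <- aggregate(FATALITIES ~ EVTYPE, toy, sum)

# Sort by descending total and keep the first two rows
totals <- totals[order(-totals$FATALITIES), ]
top2 <- head(totals, 2)
# TORNADO 12, HEAT 4
```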

#Using ggplot2 package to plot the barplot
library(ggplot2)
ggplot(top10_fatalities, aes(x = EVTYPE, y = FATALITIES)) + 
        geom_bar(stat = "identity", fill = "blue", col= "black") + 
        theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
        xlab("EVENT TYPE") + ylab("Fatalities") + 
        ggtitle("Number of Fatalities by Top 10 Weather Events")
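
ggplot2 draws a discrete x axis in factor-level order, which is alphabetical for a plain character or default factor column, so the bars in the plot above are not sorted by height. A minimal sketch, assuming bars in descending order are wanted, uses reorder() from the stats package. The toy values are hypothetical.

```r
# Hypothetical totals; the values are illustrative only
toy <- data.frame(EVTYPE     = c("FLOOD", "TORNADO", "HEAT"),
                  FATALITIES = c(10, 50, 30),
                  stringsAsFactors = FALSE)

# Reorder the levels by negative total so the largest total comes first
toy$EVTYPE <- reorder(toy$EVTYPE, -toy$FATALITIES)

# The level order is the order in which the bars would be drawn
bar_order <- levels(toy$EVTYPE)
# "TORNADO" "HEAT" "FLOOD"
```

In the plotting call the same idea can be written inline as aes(x = reorder(EVTYPE, -FATALITIES), y = FATALITIES).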

Similarly, the following bar plot shows the 10 event types with the highest number of injuries.

#Data was aggregated based on injuries and event type
injuries <- aggregate(INJURIES ~ EVTYPE, dat, sum)

#The injuries were arranged in descending order and the top 10 were filtered out
injuries <- arrange(injuries, desc(INJURIES))
top10_injuries <- injuries[1:10,]

#Using ggplot2 package to plot the barplot
ggplot(top10_injuries, aes(x = EVTYPE, y = INJURIES)) + 
        geom_bar(stat = "identity", fill = "red", col= "black") + 
        theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
        xlab("EVENT TYPE") + ylab("Injuries") + 
        ggtitle("Number of Injuries by Top 10 Weather Events")

Tornadoes cause the most injuries and the most fatalities.

Top 10 events that affect the economy

The bar plot below shows the 10 weather events that caused the greatest total economic damage.

economic <- aggregate(TOTALDMG ~ EVTYPE, dat, sum)
economic <- arrange(economic, desc(TOTALDMG))

top10_totaldamage <- economic[1:10,]

ggplot(top10_totaldamage, aes(x = EVTYPE, y = TOTALDMG)) + 
        geom_bar(stat = "identity", fill = "dark green", col= "black") + 
        theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
        xlab("EVENT TYPE") + ylab("Total Cost of Damage") + 
        ggtitle("Adverse Economic Consequence by Top 10 Weather Events")

The plot indicates that floods have the greatest economic consequence with the highest property and crop damage put together.

Conclusions

In summary, the analysis has shown that tornadoes caused the highest number of both fatalities and injuries. Floods had the greatest economic impact, and hurricanes/typhoons caused the second-highest damage.