Synopsis

This report analyses data on storms and other severe weather events to identify which event types are most severe in terms of economic consequences, property damage, and harm to human life. It aims to shed light on the dangers these events pose and to help prepare for them.
The data come from the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database and cover the period from 1950 to November 2011.
The report answers two critical questions:
  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

Data Processing

Load the libraries required for the analysis.

library(dplyr)
## Warning: package 'dplyr' was built under R version 3.5.1
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.5.1
library(knitr)
## Warning: package 'knitr' was built under R version 3.5.1

Set the working directory and read the data from the bz2 file. The compressed file can be read directly into R with read.csv(). Then display the number of rows and columns in the data.

#Set your working directory to the directory where you have saved the raw data zip file.
setwd("C:/Users/anupam.acharya/DataScience/5. Reproducible Research/Week 4 assignment/")
alldata <- read.csv("./repdata%2Fdata%2FStormData.csv.bz2")
dim(alldata)
## [1] 902297     37
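
The direct read works because R's file connections detect bzip2 compression on their own. A minimal sketch of that behaviour, assuming a small made-up file (the temporary file, its two rows, and the name demo are illustrative and not part of the storm data):

```r
# Write a tiny CSV through a bzip2 connection, then read it back.
# The file and its contents are invented for this illustration.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "FLOOD,1"), con)
close(con)

# read.csv() opens the path as a file connection, which detects the
# compression itself; no manual decompression step is needed.
demo <- read.csv(tmp, stringsAsFactors = FALSE)
dim(demo)
unlink(tmp)
```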

Subset the data to the columns needed to answer the two questions. The relevant columns are EVTYPE (type of event); FATALITIES (number of fatalities); INJURIES (number of injuries); PROPDMG (property damage amount, to be scaled by its order of magnitude); PROPDMGEXP (order of magnitude for property damage); CROPDMG (crop damage amount, to be scaled by its order of magnitude); and CROPDMGEXP (order of magnitude for crop damage). They are column 8 and columns 23 to 28.

#Subsetting all data and renaming the new dataframe storms.
storms <- alldata[,c(8, 23:28)]
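
Selecting by position depends on the column order of the raw file. A minimal sketch of selecting the same variables by name instead, assuming a made-up one-row data frame (toy, keep, and storms_demo are illustrative names, not part of the original analysis):

```r
# Toy stand-in for alldata; only the column names matter here.
toy <- data.frame(STATE = "AL", EVTYPE = "TORNADO", FATALITIES = 0,
                  INJURIES = 15, PROPDMG = 25, PROPDMGEXP = "K",
                  CROPDMG = 0, CROPDMGEXP = "", REMARKS = "",
                  stringsAsFactors = FALSE)

# Name the wanted columns explicitly instead of relying on 8 and 23:28.
keep <- c("EVTYPE", "FATALITIES", "INJURIES",
          "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
storms_demo <- toy[, keep]
names(storms_demo)
```

On the real data the equivalent call would be alldata[, keep], which fails with an error if a name is missing rather than silently picking the wrong column.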

1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

For this question we will calculate the total number of fatalities and injuries for each event type. We will then combine the two totals to show which types of events are most harmful with respect to population health.

#Aggregating total fatalities based on event type and arranging them in decreasing order.
total_fatalities <- aggregate(storms$FATALITIES, list(Event_Type = storms$EVTYPE), sum)
total_fatalities <- total_fatalities %>% arrange(desc(x))
total_fatalities <- rename(total_fatalities, Fatalities = x)
#Aggregating total injuries based on event type and arranging them in decreasing order.
total_injuries <- aggregate(storms$INJURIES, list(Event_Type = storms$EVTYPE), sum)
total_injuries <- total_injuries %>% arrange(desc(x))
total_injuries <- rename(total_injuries, Injuries = x)
#Adding total fatalities and total injuries to show the total harm caused to human health due to individual events in decreasing order of impact.
total_harm <- merge(total_fatalities, total_injuries)
total_harm$Harm <- total_harm$Fatalities + total_harm$Injuries
total_harm <- arrange(total_harm, desc(Harm))
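
The same per-event totals can also be computed in a single dplyr pipeline. This is a sketch of an alternative, not the method used above; the data frame toy and its values are made up purely to show the shape of the computation:

```r
library(dplyr)

# Invented records, only to illustrate the grouping and summing.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(2, 1, 3, 4),
  INJURIES   = c(10, 0, 5, 1),
  stringsAsFactors = FALSE
)

harm_demo <- toy %>%
  group_by(Event_Type = EVTYPE) %>%              # one group per event type
  summarise(Fatalities = sum(FATALITIES),
            Injuries   = sum(INJURIES)) %>%      # totals within each group
  mutate(Harm = Fatalities + Injuries) %>%       # combined harm
  arrange(desc(Harm))                            # most harmful first

harm_demo
```

Grouping once avoids building two separate tables and merging them on Event_Type.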

Results

Following is the list showing Event Type, Fatalities, Injuries and Total Harm in decreasing order of total harm to human health:

head(total_harm, 10)
##           Event_Type Fatalities Injuries  Harm
## 1            TORNADO       5633    91346 96979
## 2     EXCESSIVE HEAT       1903     6525  8428
## 3          TSTM WIND        504     6957  7461
## 4              FLOOD        470     6789  7259
## 5          LIGHTNING        816     5230  6046
## 6               HEAT        937     2100  3037
## 7        FLASH FLOOD        978     1777  2755
## 8          ICE STORM         89     1975  2064
## 9  THUNDERSTORM WIND        133     1488  1621
## 10      WINTER STORM        206     1321  1527

Below is a stacked bar plot showing fatalities and injuries for the 10 event types with the highest total harm.

subset_total_harm <- total_harm[1:10, ]
subset2_total_harm <- subset_total_harm[,1:3]
barplot(t(subset2_total_harm[,-1]), names.arg = subset_total_harm$Event_Type, las=2, col = c("steel blue", "red"), main = "Total Harm to Population Health", ylab = "Total Harm", xlab = "Event Type", cex.axis = 0.75, cex.names = 0.5, cex.lab = 0.75)
legend("topright",c("Fatalities","Injuries"),fill=c("steel blue","red"),bty = "n")
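
Since ggplot2 is already loaded, a comparable stacked bar chart could be drawn with it. The following is a sketch under the assumption that the totals are first reshaped to long format (one row per event type and measure); plot_data and p are illustrative names, and the counts are the first three rows of the table above:

```r
library(ggplot2)

# Long format: one row per event type and measure.
plot_data <- data.frame(
  Event_Type = rep(c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND"), times = 2),
  Measure    = rep(c("Fatalities", "Injuries"), each = 3),
  Count      = c(5633, 1903, 504, 91346, 6525, 6957),
  stringsAsFactors = FALSE
)

# reorder() sorts the bars by total harm, largest first.
p <- ggplot(plot_data,
            aes(x = reorder(Event_Type, -Count, FUN = sum),
                y = Count, fill = Measure)) +
  geom_col() +                                   # bars stack by default
  labs(title = "Total Harm to Population Health",
       x = "Event Type", y = "Total Harm") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
p
```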

We can clearly see that both fatalities and injuries are highest for tornadoes. The height of each bar shows the total harm (fatalities + injuries).

2. Across the United States, which types of events have the greatest economic consequences?

As a prerequisite to the analysis, we have to convert the letters in the variables CROPDMGEXP and PROPDMGEXP to numbers. The abbreviations used in these order-of-magnitude variables are: H for hundreds; K for thousands; M for millions; and B for billions.

#Table showing the contents of PROPDMGEXP
table(storms$PROPDMGEXP)
## 
##             -      ?      +      0      1      2      3      4      5 
## 465934      1      8      5    216     25     13      4      4     28 
##      6      7      8      B      h      H      K      m      M 
##      4      5      1     40      1      6 424665      7  11330
#Table showing the contents of CROPDMGEXP
table(storms$CROPDMGEXP)
## 
##             ?      0      2      B      k      K      m      M 
## 618413      7     19      1      9     21 281832      1   1994
#Converting letters to numbers for Property and Crop damage exponents.
storms$PROPDMGEXP <- as.character(storms$PROPDMGEXP)
storms$CROPDMGEXP <- as.character(storms$CROPDMGEXP)
storms[!is.na(storms$PROPDMGEXP) & (storms$PROPDMGEXP=="H" | storms$PROPDMGEXP == "h"),]$PROPDMGEXP <- 2
storms[!is.na(storms$PROPDMGEXP) & (storms$PROPDMGEXP=="K" | storms$PROPDMGEXP == "k"),]$PROPDMGEXP <- 3
storms[!is.na(storms$PROPDMGEXP) & (storms$PROPDMGEXP=="M" | storms$PROPDMGEXP == "m"),]$PROPDMGEXP <- 6
storms[!is.na(storms$PROPDMGEXP) & (storms$PROPDMGEXP=="B" | storms$PROPDMGEXP == "b"),]$PROPDMGEXP <- 9

storms[!is.na(storms$CROPDMGEXP) & (storms$CROPDMGEXP=="K" | storms$CROPDMGEXP == "k"),]$CROPDMGEXP <- 3
storms[!is.na(storms$CROPDMGEXP) & (storms$CROPDMGEXP=="M" | storms$CROPDMGEXP == "m"),]$CROPDMGEXP <- 6
storms[!is.na(storms$CROPDMGEXP) & storms$CROPDMGEXP=="B",]$CROPDMGEXP <- 9
#Table showing the new contents of PROPDMGEXP
table(storms$PROPDMGEXP)
## 
##             -      ?      +      0      1      2      3      4      5 
## 465934      1      8      5    216     25     20 424669      4     28 
##      6      7      8      9 
##  11341      5      1     40
#Table showing the new contents of CROPDMGEXP
table(storms$CROPDMGEXP)
## 
##             ?      0      2      3      6      9 
## 618413      7     19      1 281853   1995      9
#Calculating the net damage by multiplying each damage amount by 10 raised to the power of its exponent.
suppressWarnings(storms$PROPDAMAGE <- storms$PROPDMG*10^as.numeric(storms$PROPDMGEXP))
suppressWarnings(storms$CROPDAMAGE <- storms$CROPDMG*10^as.numeric(storms$CROPDMGEXP))
storms$PROPDAMAGE[is.na(storms$PROPDAMAGE)] <- 0
storms$CROPDAMAGE[is.na(storms$CROPDAMAGE)] <- 0

storms$TOTDAMAGE <- storms$PROPDAMAGE + storms$CROPDAMAGE

Total_Damage <- aggregate(storms$TOTDAMAGE, list(Event_Type = storms$EVTYPE), sum)
Total_Damage <- rename(Total_Damage, Total_Damage = x)
Total_Damage$Total_Damage <- Total_Damage$Total_Damage/10^9
Total_Damage <- Total_Damage %>% arrange(desc(Total_Damage))
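
The letter-to-exponent conversion can also be written as a lookup. This is a sketch rather than the code used above: exp_power and to_dollars are hypothetical names, and it assumes, as the analysis does, that digit codes are powers of ten and that any other code contributes zero damage:

```r
# Map each recognised letter code to a power of ten.
exp_power <- c(H = 2, K = 3, M = 6, B = 9)

to_dollars <- function(amount, code) {
  code  <- toupper(as.character(code))         # treat "k" and "K" alike
  power <- unname(exp_power[code])             # NA for anything not H/K/M/B
  digit <- suppressWarnings(as.numeric(code))  # digit codes such as "5"
  power[is.na(power)] <- digit[is.na(power)]
  value <- amount * 10^power
  value[is.na(value)] <- 0                     # unrecognised codes ("", "?") give 0
  value
}

to_dollars(c(25, 2.5, 1, 3, 7), c("K", "m", "B", "?", ""))
```

For example, an amount of 25 with code K becomes 25 * 10^3 = 25000 dollars.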

Results

The top 5 events that had the greatest economic consequence across the United States are:

head(Total_Damage, 5)
##          Event_Type Total_Damage
## 1             FLOOD    150.31968
## 2 HURRICANE/TYPHOON     71.91371
## 3           TORNADO     57.36233
## 4       STORM SURGE     43.32354
## 5              HAIL     18.76122

Below is a plot showing the economic consequences in Billions of Dollars due to the top 15 events.

subset_total_damage <- Total_Damage[1:15, ]
barplot(t(subset_total_damage[,-1]), names.arg = subset_total_damage$Event_Type, las=2, col = "steel blue", main = "Total Economic Consequence (in Billion USD)", ylab = "Total Economic Consequence (in Billion USD)", xlab = "Event Type", cex.axis = 0.75, cex.names = 0.5, cex.lab = 0.75)

The chart shows that floods have the greatest economic consequences, measured as combined property and crop damage.
