This report analyses data on storms and other severe weather events to identify which are most severe in terms of economic consequences and harm to property and human life. It aims to highlight the dangers these events pose and to support preparation against them.
The data come from the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database and cover the period from 1950 to November 2011.
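The report answers two critical questions: which types of events are most harmful with respect to population health, and which types of events have the greatest economic consequences across the United States?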
Load the libraries required for the analysis.
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.5.1
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.5.1
library(knitr)
## Warning: package 'knitr' was built under R version 3.5.1
Set the working directory and read the data from the bz2 file. The compressed file can be read into R directly with read.csv().
#Set your working directory to the directory where you have saved the raw data zip file.
setwd("C:/Users/anupam.acharya/DataScience/5. Reproducible Research/Week 4 assignment/")
alldata <- read.csv("./repdata%2Fdata%2FStormData.csv.bz2")
dim(alldata)
## [1] 902297 37
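The direct read of a compressed file can be tried out without the full data set. The following is a minimal sketch, assuming a toy data frame and a temporary file (both hypothetical and not part of the analysis); it relies on read.csv() detecting the bzip2 compression when it opens the path.

```r
# Toy data standing in for the storm data (hypothetical values).
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                  FATALITIES = c(3, 1))

# Write the toy data through a bzip2 connection to a temporary file.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() is given only the path; the compression is detected on opening.
roundtrip <- read.csv(tmp, stringsAsFactors = FALSE)
dim(roundtrip)
```

Because no separate decompression step is needed, the raw file can stay compressed on disk.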
Subset the data to the columns needed to answer the two questions. The relevant columns are EVTYPE (type of event); FATALITIES (number of fatalities); INJURIES (number of injuries); PROPDMG (amount of property damage, before applying the order of magnitude); PROPDMGEXP (order of magnitude for property damage); CROPDMG (amount of crop damage, before applying the order of magnitude); and CROPDMGEXP (order of magnitude for crop damage). They appear in column 8 and columns 23 to 28.
#Subsetting all data and renaming the new dataframe storms.
storms <- alldata[,c(8, 23:28)]
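Selecting by position works for this file, but it depends on the column order staying the same. As an alternative sketch, the same columns can be picked by name; the toy data frame below is hypothetical and only mimics the relevant column names.

```r
# Toy data frame standing in for alldata; only the column names matter here.
toy <- data.frame(STATE = "AL", EVTYPE = "TORNADO", FATALITIES = 0,
                  INJURIES = 15, PROPDMG = 25, PROPDMGEXP = "K",
                  CROPDMG = 0, CROPDMGEXP = "", REMARKS = "",
                  stringsAsFactors = FALSE)

wanted <- c("EVTYPE", "FATALITIES", "INJURIES",
            "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Fail early if a column is missing instead of silently picking the wrong one.
stopifnot(all(wanted %in% names(toy)))
toy_storms <- toy[, wanted]
names(toy_storms)
```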
To answer the question on population health, we calculate the total number of fatalities and injuries for each event type, then combine the two to show which types of events are most harmful.
#Aggregating total fatalities based on event type and arranging them in decreasing order.
total_fatalities <- aggregate(storms$FATALITIES, list(Event_Type = storms$EVTYPE), sum)
total_fatalities <- total_fatalities %>% arrange(desc(x))
total_fatalities <- rename(total_fatalities, Fatalities = x)
#Aggregating total injuries based on event type and arranging them in decreasing order.
total_injuries <- aggregate(storms$INJURIES, list(Event_Type = storms$EVTYPE), sum)
total_injuries <- total_injuries %>% arrange(desc(x))
total_injuries <- rename(total_injuries, Injuries = x)
#Adding total fatalities and total injuries to show the total harm caused to human health due to individual events in decreasing order of impact.
total_harm <- merge(total_fatalities, total_injuries)
total_harm$Harm <- total_harm$Fatalities + total_harm$Injuries
total_harm <- arrange(total_harm, desc(Harm))
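The aggregation steps above can also be sketched in a single call using the formula interface of aggregate(). This is an illustrative alternative on hypothetical toy data, not the method used for the reported results. Note that the formula interface drops rows with missing values by default.

```r
# Toy data with a repeated event type (hypothetical values).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(2, 1, 3, 4),
  INJURIES   = c(10, 0, 5, 1),
  stringsAsFactors = FALSE
)

# Sum both columns per event type in one call.
toy_harm <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Total harm is fatalities plus injuries; sort in decreasing order of harm.
toy_harm$Harm <- toy_harm$FATALITIES + toy_harm$INJURIES
toy_harm <- toy_harm[order(-toy_harm$Harm), ]
toy_harm
```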
The ten event types with the greatest total harm to human health, with their fatalities, injuries and total harm, are:
head(total_harm, 10)
##           Event_Type Fatalities Injuries  Harm
## 1            TORNADO       5633    91346 96979
## 2     EXCESSIVE HEAT       1903     6525  8428
## 3          TSTM WIND        504     6957  7461
## 4              FLOOD        470     6789  7259
## 5          LIGHTNING        816     5230  6046
## 6               HEAT        937     2100  3037
## 7        FLASH FLOOD        978     1777  2755
## 8          ICE STORM         89     1975  2064
## 9  THUNDERSTORM WIND        133     1488  1621
## 10      WINTER STORM        206     1321  1527
Below is the barplot showing the top 10 events with the highest Fatalities and Injuries.
subset_total_harm <- total_harm[1:10, ]
subset2_total_harm <- subset_total_harm[,1:3]
barplot(t(subset2_total_harm[,-1]), names.arg = subset_total_harm$Event_Type, las=2, col = c("steel blue", "red"), main = "Total Harm to Population Health", ylab = "Total Harm", xlab = "Event Type", cex.axis = 0.75, cex.names = 0.5, cex.lab = 0.75)
legend("topright",c("Fatalities","Injuries"),fill=c("steel blue","red"),bty = "n")
Tornadoes cause by far the most fatalities and injuries. The height of each stacked bar gives the total harm (fatalities + injuries) for that event type.
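To see why the bar height equals the total harm, note that barplot() stacks the rows of a matrix within each column. A small sketch on hypothetical toy values:

```r
# Toy values for two event types (hypothetical).
toy <- data.frame(Event_Type = c("TORNADO", "FLOOD"),
                  Fatalities = c(5, 1),
                  Injuries   = c(15, 2))

# One column per event type, one row per kind of harm.
heights <- t(as.matrix(toy[, c("Fatalities", "Injuries")]))
colnames(heights) <- toy$Event_Type

# Each stacked bar is as tall as the sum of its column.
bar_totals <- colSums(heights)
bar_totals

# plot = FALSE returns the bar midpoints without drawing anything.
mids <- barplot(heights, plot = FALSE)
```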
Before the analysis, the letter codes in the variables CROPDMGEXP and PROPDMGEXP must be converted to numeric powers of ten. The codes, which appear in both upper and lower case, correspond to: H for hundreds (10^2); K for thousands (10^3); M for millions (10^6); and B for billions (10^9).
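The letter-to-power mapping can be sketched as a lookup vector. This is an illustrative variant on a hypothetical toy vector; the names exp_lookup and toy_exp are assumptions made for the example and do not appear in the analysis code.

```r
# Letter codes and the power of ten each one stands for.
exp_lookup <- c(H = 2, K = 3, M = 6, B = 9)

# Toy exponent values: mixed case, a digit, a blank and a symbol.
toy_exp <- c("K", "m", "B", "h", "5", "", "?")

# Upper-case first so that "k" and "K" are treated alike.
code <- toupper(toy_exp)

power <- ifelse(code %in% names(exp_lookup),
                exp_lookup[code],                     # letter -> power of ten
                suppressWarnings(as.numeric(code)))   # digits kept, the rest become NA
power
```

Values that are neither a known letter nor a digit end up as NA, which makes them easy to find and handle explicitly.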
#Table showing the contents of PROPDMGEXP
table(storms$PROPDMGEXP)
##
##             -      ?      +      0      1      2      3      4      5
## 465934      1      8      5    216     25     13      4      4     28
##      6      7      8      B      h      H      K      m      M
##      4      5      1     40      1      6 424665      7  11330
#Table showing the contents of CROPDMGEXP
table(storms$CROPDMGEXP)
##
##             ?      0      2      B      k      K      m      M
## 618413      7     19      1      9     21 281832      1   1994
#Converting letters to numbers for Property and Crop damage exponents.
storms$PROPDMGEXP <- as.character(storms$PROPDMGEXP)
storms$CROPDMGEXP <- as.character(storms$CROPDMGEXP)
storms[!is.na(storms$PROPDMGEXP) & (storms$PROPDMGEXP=="H" | storms$PROPDMGEXP == "h"),]$PROPDMGEXP <- 2
storms[!is.na(storms$PROPDMGEXP) & (storms$PROPDMGEXP=="K" | storms$PROPDMGEXP == "k"),]$PROPDMGEXP <- 3
storms[!is.na(storms$PROPDMGEXP) & (storms$PROPDMGEXP=="M" | storms$PROPDMGEXP == "m"),]$PROPDMGEXP <- 6
storms[!is.na(storms$PROPDMGEXP) & (storms$PROPDMGEXP=="B" | storms$PROPDMGEXP == "b"),]$PROPDMGEXP <- 9
storms[!is.na(storms$CROPDMGEXP) & (storms$CROPDMGEXP=="K" | storms$CROPDMGEXP == "k"),]$CROPDMGEXP <- 3
storms[!is.na(storms$CROPDMGEXP) & (storms$CROPDMGEXP=="M" | storms$CROPDMGEXP == "m"),]$CROPDMGEXP <- 6
storms[!is.na(storms$CROPDMGEXP) & storms$CROPDMGEXP=="B",]$CROPDMGEXP <- 9
#Table showing the new contents of PROPDMGEXP
table(storms$PROPDMGEXP)
##
##             -      ?      +      0      1      2      3      4      5
## 465934      1      8      5    216     25     20 424669      4     28
##      6      7      8      9
##  11341      5      1     40
#Table showing the new contents of CROPDMGEXP
table(storms$CROPDMGEXP)
##
##             ?      0      2      3      6      9
## 618413      7     19      1 281853   1995      9
#Calculating the net property and crop damage as amount * 10^exponent.
#Exponents that are not numeric (blank, -, ?, +) become NA, and the resulting damage is set to 0 below.
suppressWarnings(storms$PROPDAMAGE <- storms$PROPDMG*10^as.numeric(storms$PROPDMGEXP))
suppressWarnings(storms$CROPDAMAGE <- storms$CROPDMG*10^as.numeric(storms$CROPDMGEXP))
storms$PROPDAMAGE[is.na(storms$PROPDAMAGE)] <- 0
storms$CROPDAMAGE[is.na(storms$CROPDAMAGE)] <- 0
storms$TOTDAMAGE <- storms$PROPDAMAGE + storms$CROPDAMAGE
#Aggregating total damage by event type, converting it to billions of USD and arranging in decreasing order.
Total_Damage <- aggregate(storms$TOTDAMAGE, list(Event_Type = storms$EVTYPE), sum)
Total_Damage <- rename(Total_Damage, Total_Damage = x)
Total_Damage$Total_Damage <- Total_Damage$Total_Damage/10^9
Total_Damage <- Total_Damage %>% arrange(desc(Total_Damage))
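As a worked example of the damage arithmetic, the sketch below applies the same steps to three hypothetical rows: 25 with exponent 3 is 25,000 dollars, 2.5 with exponent 6 is 2,500,000 dollars, and a row with an unparseable exponent counts as 0.

```r
# Toy rows; the exponents are as they look after the letter-to-number step.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 10),
  PROPDMGEXP = c("3", "6", "?"),
  stringsAsFactors = FALSE
)

# amount * 10^exponent; "?" cannot be parsed, so it becomes NA and then 0.
toy$PROPDAMAGE <- toy$PROPDMG * 10^suppressWarnings(as.numeric(toy$PROPDMGEXP))
toy$PROPDAMAGE[is.na(toy$PROPDAMAGE)] <- 0

toy$PROPDAMAGE                               # damage in dollars
total_billions <- sum(toy$PROPDAMAGE) / 1e9  # damage in billions of dollars
total_billions
```

Treating unparseable exponents as zero damage is a simplification; any nonzero amounts on such rows are left out of the totals.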
The five event types with the greatest economic consequences across the United States, with total damage in billions of USD, are:
head(Total_Damage, 5)
##          Event_Type Total_Damage
## 1             FLOOD    150.31968
## 2 HURRICANE/TYPHOON     71.91371
## 3           TORNADO     57.36233
## 4       STORM SURGE     43.32354
## 5              HAIL     18.76122
Below is a plot showing the economic consequences in Billions of Dollars due to the top 15 events.
subset_total_damage <- Total_Damage[1:15, ]
barplot(t(subset_total_damage[,-1]), names.arg = subset_total_damage$Event_Type, las=2, col = "steel blue", main = "Total Economic Consequence (in Billion USD)", ylab = "Total Economic Consequence (in Billion USD)", xlab = "Event Type", cex.axis = 0.75, cex.names = 0.5, cex.lab = 0.75)
The chart shows that floods have the greatest economic consequences for property and crops combined.