Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The events in the database start in the year 1950 and end in November 2011. The basic goal of this analysis is to explore the NOAA Storm Database and answer some basic questions about severe weather events as given below.
The analysis finds that “TORNADO” is the most harmful event type with respect to population health, and that “FLOOD” is the event type with the greatest economic consequences.
The data file was downloaded from the course web site: Storm Data (https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2)
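As a minimal sketch of the download step, assuming the URL above is still reachable: the helper name `fetch_storm_data` and its default `dest` path are hypothetical and not part of the original analysis. Because `read.csv()` can read a bzip2-compressed file directly, this variant skips the separate decompression step.

```r
# Hypothetical helper: download the compressed file once, then read it.
fetch_storm_data <- function(url, dest = "./data/StormData.csv.bz2", ...) {
  if (!file.exists(dest)) {
    # Create the target folder if needed, then download in binary mode
    dir.create(dirname(dest), showWarnings = FALSE, recursive = TRUE)
    download.file(url, destfile = dest, mode = "wb")
  }
  # read.csv() opens the path with file(), which auto-detects bzip2 compression
  read.csv(dest, ...)
}

# Usage (needs network access on the first run only):
# stormdata <- fetch_storm_data(
#   "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2")
```

Skipping the download when the file already exists keeps repeated runs of the report fast.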
# Read the data from "repdata_data_StormData.csv", assuming the file exists in the data folder under the working directory.
stormdata <- read.csv("./data/repdata_data_StormData.csv")
dim(stormdata)
## [1] 902297 37
# Keep only the columns required for the analysis and drop the rest
stormdata <- stormdata[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP",
"CROPDMG", "CROPDMGEXP")]
names(stormdata) <- tolower(names(stormdata))
# Summarise fatalities by event type and sort in descending order of the totals
FatalityEvents <- aggregate(fatalities ~ evtype, data = stormdata, FUN = sum)
FatalityEventsOrder <- FatalityEvents[order(FatalityEvents$fatalities, decreasing = TRUE), ]
# Summarise injuries by event type and sort in descending order of the totals
InjuryEvents <- aggregate(injuries ~ evtype, data = stormdata, FUN = sum)
InjuryEventsOrder <- InjuryEvents[order(InjuryEvents$injuries, decreasing = TRUE), ]
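To illustrate what the aggregate-then-order step does, here is a small sketch on made-up rows. The `toy` data frame and its values are invented for illustration and are not taken from the NOAA data.

```r
# Made-up rows for illustration only; not real storm records
toy <- data.frame(
  evtype     = c("A", "B", "A", "C", "B", "A"),
  fatalities = c(1, 0, 4, 2, 3, 0)
)

# Sum fatalities within each event type
totals <- aggregate(fatalities ~ evtype, data = toy, FUN = sum)

# Sort the per-type totals from largest to smallest
ranked <- totals[order(totals$fatalities, decreasing = TRUE), ]

# Keep the top two rows, the same idea as taking [1:5, ] for the plots
head(ranked, 2)
```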
# plot graph of top 5 events causing death in USA
library(ggplot2)
ggplot(FatalityEventsOrder[1:5, ], aes(evtype, fatalities)) + geom_bar(stat = "identity") +
ylab("Fatalities") + xlab("Event Type") + ggtitle("Top Five Types of Events Causing Deaths Across the U.S")
# plot graph of top 5 events causing injuries in USA
ggplot(InjuryEventsOrder[1:5, ], aes(evtype, injuries)) + geom_bar(stat = "identity") + ylab("Injuries") +
xlab("Event Type") + ggtitle("Top Five Types of Events Causing Injuries Across the U.S")
Economic consequences are assumed to come from the variables “PROPDMG” and “CROPDMG”. Two further variables, “PROPDMGEXP” and “CROPDMGEXP”, hold the multipliers for “PROPDMG” and “CROPDMG” respectively. The variable “PROPDMGEXP” has 28 levels. For it, assume that “B” refers to a billion, (“m” “M”) to a million, “K” to a thousand, (“h” “H”) to a hundred, and (“” “-” “+” “0” “1” “2” “3” “4” “5” “6” “7”) to 1; all other values are considered invalid data. The variable “CROPDMGEXP” has 13 levels. For it, assume that (“” “0”) refers to 1, “B” to a billion, (“K” “k”) to a thousand, and (“m” “M”) to a million; all other values are considered invalid data. Note that the code below does not drop the invalid values: every code outside the billion, million, thousand, and hundred sets is given a multiplier of 1.
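The multiplier assumptions above can also be written as a lookup table. The following is a minimal sketch, not the method used in the analysis code that follows: the helper `exp_multiplier`, the vectors `prop_codes`, `prop_mult`, `crop_codes`, `crop_mult`, and the `toy` rows are all hypothetical names and values introduced for illustration. Unlike the code that follows, this variant marks invalid codes as `NA` instead of giving them a multiplier of 1.

```r
# Hypothetical helper: translate exponent codes into dollar multipliers.
# Codes that do not appear in `codes` are treated as invalid and yield NA.
# match() is used because it also matches the empty string "".
exp_multiplier <- function(exp, codes, mult) {
  mult[match(as.character(exp), codes)]
}

# PROPDMGEXP assumptions taken from the text above
prop_codes <- c("B", "M", "m", "K", "H", "h", "", "-", "+", as.character(0:7))
prop_mult  <- c(1e9, 1e6, 1e6, 1e3, 1e2, 1e2, rep(1, 11))

# CROPDMGEXP assumptions taken from the text above
crop_codes <- c("B", "M", "m", "K", "k", "", "0")
crop_mult  <- c(1e9, 1e6, 1e6, 1e3, 1e3, 1, 1)

# Made-up rows for illustration only; not real storm records
toy <- data.frame(
  propdmg    = c(2.5, 10, 3, 1),
  propdmgexp = c("B", "K", "", "?")
)

# Damage in millions of dollars; the "?" row becomes NA
toy$prop_millions <- toy$propdmg *
  exp_multiplier(toy$propdmgexp, prop_codes, prop_mult) / 1e6
```

With this approach the invalid rows stay visible as `NA`, and the formula interface of `aggregate()` omits `NA` rows by default, so they would not contribute to the totals.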
propdmg <- stormdata$propdmg
propdmgexp <- stormdata$propdmgexp
cropdmg <- stormdata$cropdmg
cropdmgexp <- stormdata$cropdmgexp
## Convert all property damage values to a single unit: millions of dollars
propdmg[propdmgexp %in% "B"] <- propdmg[propdmgexp %in% "B"] * 1000
propdmg[propdmgexp %in% c("M", "m")] <- propdmg[propdmgexp %in% c("M", "m")] * 1
propdmg[propdmgexp %in% c("K")] <- propdmg[propdmgexp %in% c("K")] * 0.001
propdmg[propdmgexp %in% c("H", "h")] <- propdmg[propdmgexp %in% c("H", "h")] * 1e-04
propdmg[!(propdmgexp %in% c("B", "M", "m", "K", "H", "h"))] <- propdmg[!(propdmgexp %in% c("B", "M", "m", "K", "H", "h"))] * 1e-06
## Convert all crop damage values to a single unit: millions of dollars
cropdmg[cropdmgexp %in% "B"] <- cropdmg[cropdmgexp %in% "B"] * 1000
cropdmg[cropdmgexp %in% c("M", "m")] <- cropdmg[cropdmgexp %in% c("M", "m")] * 1
cropdmg[cropdmgexp %in% c("K", "k")] <- cropdmg[cropdmgexp %in% c("K", "k")] * 0.001
cropdmg[!(cropdmgexp %in% c("B", "M", "m", "K", "k"))] <- cropdmg[!(cropdmgexp %in% c("B", "M", "m","K", "k"))] * 1e-06
##Calculate total economic damage by adding property damage and crop damage
economicDamage <- cropdmg + propdmg
ecodmg <- aggregate(economicDamage ~ stormdata$evtype, FUN = sum)
ecodmgorder <- ecodmg[order(ecodmg$economicDamage, decreasing = T), ]
names(ecodmgorder)[1] <- "evtype"
ggplot(ecodmgorder[1:5, ], aes(evtype, economicDamage)) + geom_bar(stat = "identity") + ylab("Economic Damages (million dollars)") +
xlab("Event Type") + ggtitle("Top Five Types of Events Causing Economic Damages Across the U.S")
From the plots, it can be concluded that “TORNADO” is the most harmful event type with respect to population health, and that “FLOOD” is the event type with the greatest economic consequences.