The goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. The project explores the data, tracks the characteristics of storms in the US, and analyzes:
1. What types of events are most harmful with respect to population health
2. What types of events have the greatest economic impact
Data for the analysis can be downloaded from the course website as Storm Data. Some documentation of the database is also available there, describing how some of the variables are constructed and defined.
The first step is to load the necessary packages and read the storm data.
library(plyr)
library(ggplot2)
library(gridExtra)
library(grid)
data <- read.csv(bzfile("repdata-data-StormData.csv.bz2"))
Since the analysis intends to focus on health and economic consequences of the severe weather events, we will limit the dataset to the needed columns for faster processing.
stormData <- data[,c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
We will first summarize the fatalities and injuries by event type, then rank the event types by each measure.
harm2health <- ddply(stormData, .(EVTYPE), summarize,fatalities = sum(FATALITIES),injuries = sum(INJURIES))
fatal <- harm2health[order(harm2health$fatalities, decreasing = TRUE), ]
injury <- harm2health[order(harm2health$injuries, decreasing = TRUE), ]
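As a minimal sketch of the same group-and-rank step, the block below uses base R's aggregate() in place of ddply() on a small made-up data frame. The name toy and its values are assumptions for illustration only and are not taken from the storm data.

```r
# Toy data standing in for stormData (values are made up for illustration)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(10, 0, 5, 2, 1)
)

# Sum fatalities and injuries within each event type
toyHarm <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank event types by total fatalities, highest first
toyFatal <- toyHarm[order(toyHarm$FATALITIES, decreasing = TRUE), ]
print(toyFatal)
```

Sorting the same summary once per measure, as done above for fatal and injury, keeps the two rankings independent of each other.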
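Since the damage exponents are stored in separate columns (PROPDMGEXP and CROPDMGEXP), we will use a function to convert each exponent code to a power of ten.
getExp <- function(e) {
  # Convert to character so factor columns (the read.csv default before R 4.0)
  # are compared by label rather than by internal level code
  e <- as.character(e)
  if (e %in% c("h", "H"))
    return(2)
  else if (e %in% c("k", "K"))
    return(3)
  else if (e %in% c("m", "M"))
    return(6)
  else if (e %in% c("b", "B"))
    return(9)
  # Blank and symbol codes are checked before as.numeric() to avoid coercion warnings
  else if (e %in% c("", "-", "?", "+"))
    return(0)
  # A digit code is used directly as the exponent
  else if (!is.na(suppressWarnings(as.numeric(e))))
    return(as.numeric(e))
  else {
    stop("Invalid value.")
  }
}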
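Calling a scalar function through sapply() on every row can be slow on a dataset of this size. The block below is a hedged sketch of a vectorized lookup that follows the same mapping; the name expLookup is hypothetical and is not part of the original analysis.

```r
# Hypothetical vectorized alternative to getExp(); expLookup is an illustrative name
expLookup <- function(codes) {
  codes <- toupper(as.character(codes))    # also handles factor input
  powers <- c(H = 2, K = 3, M = 6, B = 9)  # letter codes and their powers of ten
  out <- unname(powers[codes])             # NA for codes not in the table
  isDigit <- grepl("^[0-9]$", codes)       # single digits are used as the exponent
  out[isDigit] <- as.numeric(codes[isDigit])
  out[codes %in% c("", "-", "?", "+")] <- 0
  if (anyNA(out)) stop("Invalid value.")
  out
}

# Exponents for a few sample codes
expLookup(c("K", "m", "", "5", "?"))

# A recorded damage of 25 with exponent code "K" corresponds to 25 * 10^3 dollars
25 * 10^expLookup("K")
```

The lookup table keeps the code-to-exponent mapping in one place, so adding or changing a code means editing a single vector.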
We will now calculate the dollar values of property and crop damage by multiplying each recorded amount by ten raised to its exponent.
propExp <- sapply(stormData$PROPDMGEXP, FUN=getExp)
stormData$propDamage <- stormData$PROPDMG * (10 ^ propExp)
cropExp <- sapply(stormData$CROPDMGEXP, FUN=getExp)
stormData$cropDamage <- stormData$CROPDMG * (10 ^ cropExp)
We will now summarize the crop and property damage by event type, excluding the event types that did not cause any financial impact.
economicDamage <- ddply(stormData, .(EVTYPE), summarize,propDamage = sum(propDamage), cropDamage = sum(cropDamage))
economicDamage <- economicDamage[(economicDamage$propDamage > 0 | economicDamage$cropDamage > 0), ]
We will now sort the data in decreasing order, once by property damage and once by crop damage.
propDmgSorted <- economicDamage[order(economicDamage$propDamage, decreasing = TRUE), ]
cropDmgSorted <- economicDamage[order(economicDamage$cropDamage, decreasing = TRUE), ]
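The two sorted tables rank property and crop damage separately. As a hedged sketch, a combined ranking could be built by adding the two columns before sorting; the data frame toyDamage, the column totalDamage, and all values below are made up for illustration.

```r
# Toy data standing in for economicDamage (values are made up for illustration)
toyDamage <- data.frame(
  EVTYPE     = c("FLOOD", "DROUGHT", "HAIL"),
  propDamage = c(5e9, 1e8, 2e9),
  cropDamage = c(1e9, 4e9, 5e8)
)

# Total economic damage per event type
toyDamage$totalDamage <- toyDamage$propDamage + toyDamage$cropDamage

# Rank by total damage, highest first, and keep the top two rows
totalSorted <- toyDamage[order(toyDamage$totalDamage, decreasing = TRUE), ]
head(totalSorted, 2)
```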
The data are now processed and ready for presenting the results.
We will plot, in a grid, the top 5 event types that cause the most harm to population health in terms of injuries and fatalities.
plot1 <- ggplot(data=head(injury,5), aes(x=reorder(EVTYPE, injuries), y=injuries)) +
geom_bar(fill="deepskyblue1",stat="identity") + coord_flip() +
ylab("Number of Injuries") + xlab("Event Type") +
ggtitle("Health Impacts of weather events") +
theme(legend.position="none")
plot2 <- ggplot(data=head(fatal,5), aes(x=reorder(EVTYPE, fatalities), y=fatalities)) +
geom_bar(fill="gold1",stat="identity") + coord_flip() +
ylab("Number of Fatalities") + xlab("Event Type") +
theme(legend.position="none")
grid.arrange(plot1, plot2, nrow =2)
We will plot, in a grid, the top 5 event types that cause the most financial damage in terms of property and crop damage. The damage axis uses a log10 scale for better readability.
plot3 <- ggplot(data=head(propDmgSorted,5), aes(x=reorder(EVTYPE, propDamage), y=log10(propDamage))) +
geom_bar(fill="deepskyblue1", stat="identity") + coord_flip() +
xlab("Event Type") + ylab("Property Damages (in $, log 10)") +
ggtitle("Economic impact of weather events") +
theme(plot.title = element_text(hjust = 0))
plot4 <- ggplot(data=head(cropDmgSorted,5), aes(x=reorder(EVTYPE, cropDamage), y=log10(cropDamage))) +
geom_bar(fill="gold1", stat="identity") + coord_flip() +
xlab("Event Type") + ylab("Crop Damages (in $, log 10)") +
theme(legend.position="none")
grid.arrange(plot3, plot4, ncol=1, nrow =2)