Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
library(knitr)
library(plyr)
library(dplyr)
library(ggplot2)
library(gridExtra)
First, we read the storm data file.
StormData <- read.csv("C:\\Users\\Vadim Katsemba\\Documents\\StormData.csv")
After reading the file, we limit the data to the columns that describe health and economic consequences.
limited.data <- StormData[,c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
We total the fatalities and injuries for each event type, then sort the totals in decreasing order. (NOTE: The fatalities and injuries columns must be converted to numeric to avoid an error.)
limited.data$FATALITIES <- as.numeric(as.character(limited.data$FATALITIES))
## Warning: NAs introduced by coercion
limited.data$INJURIES <- as.numeric(as.character(limited.data$INJURIES))
## Warning: NAs introduced by coercion
health <- ddply(limited.data, .(EVTYPE), summarize, fatalities = sum(FATALITIES, na.rm = TRUE), injuries = sum(INJURIES, na.rm = TRUE))
fatality <- health[order(health$fatalities, decreasing = TRUE), ]
injury <- health[order(health$injuries, decreasing = TRUE), ]
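The note about converting to numeric is easiest to see on a toy column. The sketch below uses made-up values, not rows from the storm database, and assumes the column was read in as a factor (the `read.csv` default before R 4.0). It shows why the conversion goes through `as.character()` and why a single `NA` affects a plain `sum()`.

```r
# Toy column, stored as a factor the way read.csv() did by default before R 4.0.
# The values are made up for illustration.
fatalities <- factor(c("0", "3", "10", "n/a"))

# Pitfall: as.numeric() on a factor returns the internal level codes,
# not the numbers shown in the labels.
codes <- as.numeric(fatalities)

# Going through as.character() first recovers the values; "n/a" becomes NA.
values <- suppressWarnings(as.numeric(as.character(fatalities)))

print(codes)
print(values)

# A plain sum() is NA once any value is NA; na.rm = TRUE skips the NAs.
print(sum(values))
print(sum(values, na.rm = TRUE))
```

Whether missing values should be skipped or investigated depends on the analysis; `na.rm = TRUE` is only one reasonable choice.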
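The damage amounts are split across two columns: a numeric value and a separate magnitude code (hundreds, thousands, millions or billions). We use a function to translate each code into a power-of-ten exponent so that the damage can be calculated in dollars.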
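# Translate a magnitude code into a power-of-ten exponent.
# Note: this definition masks base::exp() for the rest of the session.
exp <- function(x) {
  # Convert first: as.numeric() on a factor would return its level code
  x <- as.character(x)
  if (x %in% c("h", "H"))
    return(2)
  else if (x %in% c("k", "K"))
    return(3)
  else if (x %in% c("m", "M"))
    return(6)
  else if (x %in% c("b", "B"))
    return(9)
  else if (x %in% c("", "-", "?", "+"))
    return(0)
  else if (!is.na(suppressWarnings(as.numeric(x))))
    return(as.numeric(x))
  else {
    stop("Invalid exponent code: ", x)
  }
}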
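As an alternative sketch, the same mapping can be written as a vectorized lookup, which avoids calling a function once per row. The helper name `exponent_of` is hypothetical and not part of the original analysis, and it assumes numeric codes are single digits.

```r
# Hypothetical helper (not in the original report): vectorized lookup of
# the power-of-ten exponent for each magnitude code.
exponent_of <- function(codes) {
  codes <- toupper(as.character(codes))
  lookup <- c(H = 2, K = 3, M = 6, B = 9)
  out <- unname(lookup[codes])            # NA wherever the code is not a letter
  digit <- grepl("^[0-9]$", codes)        # assumption: numeric codes are one digit
  out[digit] <- as.numeric(codes[digit])
  out[codes %in% c("", "-", "?", "+")] <- 0
  if (anyNA(out)) stop("Invalid exponent code.")
  out
}

exponents <- exponent_of(c("K", "m", "", "5", "B"))
print(exponents)

# A recorded value of 25 with code "K" corresponds to 25 * 10^3 dollars.
print(25 * 10^exponent_of("K"))
```

The lookup handles a whole column in one call, so it can replace the `sapply` step on large data if speed matters.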
Now we can compute the property and crop damage in dollars. (NOTE: The property and crop damage columns must be converted to numeric to avoid an error.)
prop <- sapply(limited.data$PROPDMGEXP, FUN = exp)
limited.data$PROPDMG <- as.numeric(as.character(limited.data$PROPDMG))
## Warning: NAs introduced by coercion
limited.data$propdam <- limited.data$PROPDMG * (10 ^ prop)
crop <- sapply(limited.data$CROPDMGEXP, FUN = exp)
limited.data$CROPDMG <- as.numeric(as.character(limited.data$CROPDMG))
## Warning: NAs introduced by coercion
limited.data$cropdam <- limited.data$CROPDMG * (10 ^ crop)
Then, we total the property and crop damage for each event type and remove any events that did not cause any damage. Before we get to the results, we sort the data in descending order.
edamage <- ddply(limited.data, .(EVTYPE), summarize, pdamage = sum(propdam, na.rm = TRUE), cdamage = sum(cropdam, na.rm = TRUE))
edamage <- edamage[(edamage$pdamage > 0 ) | (edamage$cdamage > 0), ]
propdamdes <- edamage[order(edamage$pdamage, decreasing = TRUE), ]
cropdamdes <- edamage[order(edamage$cdamage, decreasing = TRUE), ]
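The summarize, filter and sort steps can be checked on a small made-up table. This is a minimal sketch that uses base R's `aggregate` instead of `ddply`; the event names and dollar amounts are illustrative only.

```r
# Toy damage records in dollars; event names and amounts are illustrative only.
toy <- data.frame(
  EVTYPE  = c("FLOOD", "HAIL", "FLOOD", "FOG", "HAIL"),
  propdam = c(2e6, 5e4, 1e6, 0, 0),
  cropdam = c(0, 3e5, 5e5, 0, 1e5)
)

# Total both damage columns per event type.
totals <- aggregate(cbind(propdam, cropdam) ~ EVTYPE, data = toy, FUN = sum)

# Keep event types with any recorded damage, then rank by property damage.
totals <- totals[totals$propdam > 0 | totals$cropdam > 0, ]
ranked <- totals[order(totals$propdam, decreasing = TRUE), ]
print(ranked)
```

Here FOG is dropped because both of its totals are zero, and FLOOD ranks ahead of HAIL on property damage.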
There are two questions we have to answer: 1) Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
We create two plots for the 5 leading causes of fatalities and injuries.
fplot <- ggplot(data = head(fatality,5), aes(x=reorder(EVTYPE, fatalities), y=fatalities)) + geom_bar(fill = "red", stat="identity") + coord_flip() + xlab("Type of Event") + ylab("Total Fatalities") + ggtitle("The Health Impact of Weather: 5 Leading Causes")
iplot <- ggplot(data = head(injury, 5), aes(x=reorder(EVTYPE, injuries), y=injuries)) + geom_bar(fill = "blue", stat = "identity") + coord_flip() + xlab("Type of Event") + ylab("Total Injuries")
grid.arrange(fplot, iplot, ncol=1, nrow=2)
2) Across the United States, which types of events have the greatest economic consequences? We create two plots for the 5 leading causes of property and crop damage. To keep the plots readable, the damage totals are shown on a log base 10 scale.
propplot <- ggplot(data = head(propdamdes,5), aes(x=reorder(EVTYPE, pdamage), y=log10(pdamage))) + geom_bar(fill="green", stat = "identity") + coord_flip() + xlab("Type of Event") + ylab("Property Damage (log10 of USD)") + ggtitle("The Economic Impact of Weather: 5 Leading Causes")
cropplot <- ggplot(data = head(cropdamdes,5), aes(x=reorder(EVTYPE, cdamage), y=log10(cdamage))) + geom_bar(fill="orange", stat = "identity") + coord_flip() + xlab("Type of Event") + ylab("Crop Damage (log10 of USD)")
grid.arrange(propplot, cropplot, ncol=1, nrow=2)
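To illustrate what the log base 10 scale does to the bar lengths, here is a small sketch with made-up totals (not values from the storm database). A bar that reaches 9 on the log axis stands for 10^9, or one billion dollars.

```r
# Illustrative totals in dollars (made-up values spanning several magnitudes).
damage <- c(A = 5e10, B = 2e9, C = 3e7, D = 4e5)

# On a linear axis the smaller bars would be under 5% of the largest one.
share_of_max <- damage / max(damage)

# log10 turns each factor of ten into one unit of bar length.
log_damage <- log10(damage)

# Back-transform a log10 value to read it in dollars again.
recovered <- 10^log_damage

print(round(share_of_max, 5))
print(round(log_damage, 2))
```

The trade-off is that equal differences in bar length now mean equal ratios, not equal dollar amounts, so small visual gaps can hide large absolute differences.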
In terms of public health, tornadoes caused the most fatalities and injuries over the period covered by the database. Other events, such as heat, flooding and lightning, paled in comparison to tornadoes. In terms of economic impact, flash floods caused the most property damage, with thunderstorms, tornadoes, hail and lightning not far behind. Droughts caused the most crop damage, with flooding, ice storms and hail also causing significant crop damage in dollar terms.