Synopsis
Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
My assessment
Tornadoes are by far the worst weather event for property damage and personal health (both fatalities and injuries). Crops are most affected by water-related events (floods, hurricanes, and hail). There is a very large separation between tornadoes and all other events in fatalities, injuries, and property damage, and that separation remains clear when examining the events as a whole.
Background information
Data for this assignment
National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ
Assignment
The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. You must use the database to answer the questions below and show the code for your entire analysis. Your analysis can consist of tables, figures, or other summaries. You may use any R package you want to support your analysis.
Data Downloading/Fetching
file_loc <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
#download file if we don't have it already
if (!file.exists("StormData.csv.bz2")) {
download.file(file_loc, destfile="StormData.csv.bz2")
message("StormData.csv.bz2 has been downloaded!")
} else message("File already downloaded!")
## File already downloaded!
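#unzip the file if it's not unzipped already
#bunzip2() comes from the R.utils package
library(R.utils)
if (!file.exists("StormData.csv")) {
#remove = FALSE keeps the .bz2 so the download check above still passes
bunzip2("StormData.csv.bz2", remove = FALSE)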
message("StormData has been unzipped!")
} else message("File already unzipped!")
## File already unzipped!
if (!exists("rawData")) {
rawData <- read.csv("StormData.csv")
message("rawData has been loaded!")
} else message("rawData already loaded!")
## rawData has been loaded!
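As a side note on the loading step above, base R can usually read a bzip2-compressed CSV directly, which would make the separate unzip step optional. The following is only a minimal sketch: the tiny data frame and the temporary file are made up for illustration and stand in for StormData.csv.bz2.

```r
# Sketch: round-trip a small CSV through bzip2 compression.
# `demo` and the temp file are hypothetical stand-ins for the storm data.
tmp <- tempfile(fileext = ".csv.bz2")
demo <- data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                   FATALITIES = c(3, 1),
                   stringsAsFactors = FALSE)

# Write the demo data through a bzip2 connection.
con <- bzfile(tmp, open = "w")
write.csv(demo, con, row.names = FALSE)
close(con)

# read.csv() opens the path with file(), which detects the compression
# when reading, so no explicit decompression call is needed here.
roundTrip <- read.csv(tmp, stringsAsFactors = FALSE)
print(roundTrip)
unlink(tmp)
```

If this holds for the real file, `read.csv("StormData.csv.bz2")` would be enough, at the cost of decompressing on every read.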
Here we begin to process & clean the data.
#all we care about are a few of the columns that detail casualties or damage
myData <- subset(rawData,
select = c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG",
"PROPDMGEXP", "CROPDMG", "CROPDMGEXP"))
#delete summary rows (the trailing comma matters: we drop rows, not columns)
myData <- myData[!grepl("SUMMARY", myData$EVTYPE, ignore.case = TRUE), ]
#need to clean up some misspellings and group like items
myData$myEVTYPE[grepl("THUNDERSTORM",myData$EVTYPE)] <- "THUNDERSTORM"
myData$myEVTYPE[grepl("TSTM",myData$EVTYPE)] <- "THUNDERSTORM"
myData$myEVTYPE[grepl("SNOW",myData$EVTYPE)] <- "WINTER STORM"
myData$myEVTYPE[grepl("WINT",myData$EVTYPE)] <- "WINTER STORM"
myData$myEVTYPE[grepl("ICE",myData$EVTYPE)] <- "WINTER STORM"
myData$myEVTYPE[grepl("FREEZ",myData$EVTYPE)] <- "WINTER STORM"
myData$myEVTYPE[grepl("LIGHTNING",myData$EVTYPE)] <- "THUNDERSTORM"
myData$myEVTYPE[grepl("CHILL",myData$EVTYPE)] <- "COLD"
myData$myEVTYPE[grepl("COLD",myData$EVTYPE)] <- "COLD"
myData$myEVTYPE[grepl("WIND",myData$EVTYPE)] <- "HIGH WIND"
myData$myEVTYPE[grepl("FLOOD",myData$EVTYPE)] <- "FLOOD"
myData$myEVTYPE[grepl("URBAN",myData$EVTYPE)] <- "FLOOD"
myData$myEVTYPE[grepl("HURRICANE",myData$EVTYPE)] <- "HURRICANE"
myData$myEVTYPE[grepl("TROPICAL",myData$EVTYPE)] <- "HURRICANE"
myData$myEVTYPE[grepl("TORN",myData$EVTYPE)] <- "TORNADO"
myData$myEVTYPE[grepl("WATERSPROUT",myData$EVTYPE)] <- "TORNADO"
myData$myEVTYPE[grepl("FUNNEL",myData$EVTYPE)] <- "TORNADO"
myData$myEVTYPE[grepl("HAIL",myData$EVTYPE)] <- "HAIL"
myData$myEVTYPE[grepl("RAIN",myData$EVTYPE)] <- "HEAVY RAIN"
myData$myEVTYPE[grepl("FIRE",myData$EVTYPE)] <- "WILDFIRE"
myData$myEVTYPE[grepl("SPOUT",myData$EVTYPE)] <- "TORNADO"
myData$myEVTYPE[grepl("MICRO",myData$EVTYPE)] <- "TORNADO"
myData$myEVTYPE[grepl("DRY",myData$EVTYPE)] <- "DROUGHT"
myData$myEVTYPE[grepl("HEAT",myData$EVTYPE)] <- "HEAT"
myData$myEVTYPE <- as.factor(myData$myEVTYPE)
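The long run of assignments above could also be driven by an ordered table of pattern-to-category rules. This is only a sketch of that idea, not the code used for the results in this report: `rules`, `classify_events`, and `sample_events` are hypothetical names, only a few of the rules are reproduced, and matching is made case-insensitive, which the original calls are not.

```r
# Ordered pattern -> category rules. Later rules overwrite earlier ones,
# mirroring the assignment order used above (WIND comes after THUNDERSTORM).
rules <- c(THUNDERSTORM = "THUNDERSTORM|TSTM|LIGHTNING",
           "HIGH WIND"  = "WIND",
           FLOOD        = "FLOOD|URBAN",
           TORNADO      = "TORN|FUNNEL|SPOUT")

classify_events <- function(evtype, rules) {
  # Unmatched event types stay NA, as in the original approach.
  out <- rep(NA_character_, length(evtype))
  for (category in names(rules)) {
    hit <- grepl(rules[[category]], evtype, ignore.case = TRUE)
    out[hit] <- category
  }
  out
}

# Made-up event labels for illustration.
sample_events <- c("TSTM WIND", "Flash Flood", "TORNADO F2", "LIGHTNING", "FOG")
classified <- classify_events(sample_events, rules)
print(classified)
```

Keeping the rules in one vector makes the precedence explicit: "TSTM WIND" ends up as "HIGH WIND" because the WIND rule runs after the THUNDERSTORM rule.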
We need to adjust the damage figures because StormData.csv stores the magnitude in a separate column: “K” for thousands, “M” for millions, and “B” for billions. For simplicity's sake, we'll express everything in billions of dollars.
#adjust property damage
myData$myPropDmg <- myData$PROPDMG
myData$myPropDmg[myData$PROPDMGEXP == "K"] <-
myData$PROPDMG[myData$PROPDMGEXP == "K"] * 1000
myData$myPropDmg[myData$PROPDMGEXP == "M"] <-
myData$PROPDMG[myData$PROPDMGEXP == "M"] * 1000000
myData$myPropDmg[myData$PROPDMGEXP == "B"] <-
myData$PROPDMG[myData$PROPDMGEXP == "B"] * 1000000000
#adjust crop damage
myData$myCropDmg <- myData$CROPDMG
myData$myCropDmg[myData$CROPDMGEXP == "K"] <-
myData$CROPDMG[myData$CROPDMGEXP == "K"] * 1000
myData$myCropDmg[myData$CROPDMGEXP == "M"] <-
myData$CROPDMG[myData$CROPDMGEXP == "M"] * 1000000
myData$myCropDmg[myData$CROPDMGEXP == "B"] <-
myData$CROPDMG[myData$CROPDMGEXP == "B"] * 1000000000
#express everything in billions
myData$myPropDmg <- myData$myPropDmg / 1000000000
myData$myCropDmg <- myData$myCropDmg / 1000000000
#concentrate on propDmg events
propDmg <- tapply(myData$myPropDmg, myData$myEVTYPE, FUN=sum, na.rm=TRUE)
propDmg <- propDmg[order(propDmg, decreasing=TRUE)]
#concentrate on cropDmg events
cropDmg <- tapply(myData$myCropDmg, myData$myEVTYPE, FUN=sum, na.rm=TRUE)
cropDmg <- cropDmg[order(cropDmg, decreasing=TRUE)]
#pool crop and prop damage totals into one ranked vector (not summed per event)
allDmg <- c(cropDmg, propDmg)
allDmg <- allDmg[order(allDmg, decreasing=TRUE)]
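The K/M/B conversion above can also be written as a single lookup rather than one assignment per code. The sketch below is illustrative only: `exp_multiplier` and `to_dollars` are hypothetical names, lower-case codes are folded to upper case, and treating any other code as a multiplier of 1 is an assumption rather than something the data documentation is quoted as saying.

```r
# Multipliers for the exponent codes discussed above.
exp_multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

to_dollars <- function(amount, exp_code) {
  # Look up each code by name; unknown or empty codes come back as NA.
  mult <- exp_multiplier[toupper(as.character(exp_code))]
  # Assumption: anything that is not K, M, or B leaves the amount unscaled.
  mult[is.na(mult)] <- 1
  unname(amount * mult)
}

# Made-up amounts and codes for illustration.
dollars <- to_dollars(c(25, 2.5, 1.2, 10), c("K", "m", "B", ""))
print(dollars)
```

Dividing the result by 1e9 would then give billions, matching the units used in the rest of the analysis.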
Here we total the fatalities for each event type
#get only fatalities
fatalities <- tapply(myData$FATALITIES, myData$myEVTYPE, FUN=sum, na.rm=TRUE)
#gets all Fatalities
allFatalities <- fatalities[order(fatalities, decreasing=TRUE)]
#takes only top ten fatality producing events
topFatalities <- allFatalities[1:10]
topFatalities
## TORNADO HEAT FLOOD HIGH WIND THUNDERSTORM
## 5639 3138 1550 1413 818
## WINTER STORM COLD HURRICANE HEAVY RAIN WILDFIRE
## 520 222 201 114 90
Let's look at the top ten injury producing events
#get only Injuries
Injuries <- tapply(myData$INJURIES, myData$myEVTYPE, FUN=sum, na.rm=TRUE)
#gets all Injuries
allInjuries <- Injuries[order(Injuries, decreasing=TRUE)]
#takes only top ten injury producing events
topInjuries <- allInjuries[1:10]
topInjuries
## TORNADO HIGH WIND HEAT FLOOD THUNDERSTORM
## 91439 11398 9154 8682 5271
## WINTER STORM HURRICANE WILDFIRE HAIL HEAVY RAIN
## 5236 1709 1608 1467 301
For our purposes, injuries and fatalities together measure how harmful an event is to population health
#Combine Injuries and Fatalities
#(concatenated rather than added: the two top-ten lists are ordered
#differently, so topFatalities + topInjuries would add mismatched events)
topHealth <- c(topInjuries, topFatalities)
topHealth
## TORNADO HIGH WIND HEAT FLOOD THUNDERSTORM
## 91439 11398 9154 8682 5271
## WINTER STORM HURRICANE WILDFIRE HAIL HEAVY RAIN
## 5236 1709 1608 1467 301
## TORNADO HEAT FLOOD HIGH WIND THUNDERSTORM
## 5639 3138 1550 1413 818
## WINTER STORM COLD HURRICANE HEAVY RAIN WILDFIRE
## 520 222 201 114 90
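The output above lists injuries and fatalities one after the other rather than adding them. If a single casualty count per event is wanted, the two named vectors can be aligned by event name before adding. This is a minimal sketch using three events copied from the tables above; `inj_by_event`, `fat_by_event`, and `casualties` are hypothetical names, and the fatality vector is deliberately put in a different order to show why alignment matters.

```r
# Values copied from the printed top-ten tables above (three events only).
inj_by_event <- c(TORNADO = 91439, HEAT = 9154, FLOOD = 8682)
fat_by_event <- c(TORNADO = 5639, FLOOD = 1550, HEAT = 3138)

# Index the fatalities by the injury vector's names so that each event
# is added to its own counterpart, regardless of position.
casualties <- inj_by_event + fat_by_event[names(inj_by_event)]
casualties <- sort(casualties, decreasing = TRUE)
print(casualties)
```

With the full vectors, `allInjuries + allFatalities[names(allInjuries)]` would follow the same pattern, assuming both were built from the same set of event types.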
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
It's easy to see that “tornado” and “high wind” events were the most hazardous to overall population health.
barplot(topHealth[1:5]/1000,
main="Events causing fatalities and injuries",
col=c("yellow"),
xlab="Event",
ylab="Frequency in thousands",
ylim=c(0,100))
Across the United States, which types of events have the greatest economic consequences?
The highest property damage event was “tornado”, with damage at $51.7B.
The highest crop damage event was “flood”, with damage at $7.25B.
The top damage-producing economic event was “tornado”, with damage at $51.7B.
Here we look at the top five economic damage events
barplot(allDmg[1:5],
xlab="Event",
ylab="Damage in Billions",
main="Top 5 Economic Damaging Events",
col=c("purple"),
ylim=c(0,60))
If we compare everything side by side, we can see some insights.
library(lattice)
#couple variables to make life easier when plotting
my_las <- 2
num_events <- 10
par(mfrow=c(2,2))
par(mar=c(9,3,1,1))
barplot(cropDmg[1:num_events],
ylab="Damage in Billions",
main="Top Crop Damage Events",
col=c("red"),
las=my_las,
ylim=c(0,60))
barplot(propDmg[1:num_events],
ylab="Damage in Billions",
main="Top Property Damaging Events",
col=c("purple"),
las=my_las,
ylim=c(0,60))
barplot(allInjuries[1:num_events]/1000,
col=c("yellow"),
ylab="Frequency in thousands",
main="Top Injury Events",
las=my_las,
ylim=c(0,100))
barplot(allFatalities[1:num_events]/1000,
col=c("green"),
ylab="Frequency in thousands",
main="Top Fatality Events",
las=my_las,
ylim=c(0,100))
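The four panels above use fixed axis limits, so the fatality bars (at most about 5.6 thousand) sit very low against a 0 to 100 axis. One option, sketched here under the assumption that comparing panels on a shared scale is not required, is to derive each panel's limits from its own data. `padded_ylim` is a hypothetical helper, and only three fatality counts from the table above are used.

```r
# Hypothetical helper: y-axis limits from the data with some headroom.
padded_ylim <- function(x, headroom = 0.1) {
  c(0, max(x, na.rm = TRUE) * (1 + headroom))
}

# Top three fatality counts from the table above, in thousands.
fatal_k <- c(TORNADO = 5639, HEAT = 3138, FLOOD = 1550) / 1000

barplot(fatal_k,
        col = "green",
        ylab = "Fatalities in thousands",
        main = "Top Fatality Events",
        las = 2,
        ylim = padded_ylim(fatal_k))
```

The trade-off is that bar heights are then no longer comparable across panels, which the fixed limits in the original plots do allow.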
Although we cannot put a cost on human life, we can see that tornadoes cause catastrophic loss of both life and property. Tornadoes do not, however, cause the largest crop losses. If we set tornadoes aside, floods become the highest damage-producing event and heat becomes the top fatality-producing event, even though heat ranks fairly low among the economically damaging events.