Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
The goal of this assignment is to explore the NOAA Storm Database and answer two basic questions about severe weather events. The events in the database start in 1950 and end in November 2011.
1.) Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2.) Across the United States, which types of events have the greatest economic consequences?
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
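The analysis was performed on the Storm Events Database, provided by the National Climatic Data Center. The data come as a bzip2-compressed comma-separated-value file, which the code below downloads from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2. Some documentation of the data is also available.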
The first step is to read the data into a data frame.
# Start from an empty workspace and load the packages for this report
rm(list=ls())
library(knitr)
library(ggplot2)
library(dplyr)
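# Download the bzip2-compressed CSV only if it is not already present;
# read.csv() decompresses .bz2 files transparently
if(!file.exists("./StormData.csv.bz2")) {
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile="StormData.csv.bz2")
}
rawData <- read.csv("StormData.csv.bz2")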
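# Keep only the columns needed for the health and damage questions
stormData <- rawData[,c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
# Drop "Summary" records, which are not individual weather events
stormData <- stormData[!grepl("summary",stormData$EVTYPE,ignore.case=TRUE),]
# List the event labels that use the TSTM (thunderstorm) abbreviation
unique(stormData[grepl("TSTM", stormData$EVTYPE),]$EVTYPE)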
## [1] TSTM WIND TORNADOES, TSTM WIND, HAIL
## [3] TSTM WIND 51 TSTM WIND 50
## [5] TSTM WIND 52 TSTM WIND 55
## [7] TSTM WIND G58 TSTM WIND DAMAGE
## [9] TSTM WINDS TSTMW
## [11] TSTM WIND 65) TSTM WIND/HAIL
## [13] TSTM WIND (G45) TSTM HEAVY RAIN
## [15] TSTM WIND 40 TSTM WIND 45
## [17] TSTM WIND (41) TSTM WIND (G40)
## [19] TSTM WND TSTM WIND
## [21] TSTM WIND AND LIGHTNING TSTM WIND (G45)
## [23] TSTM WIND (G45) TSTM WIND (G35)
## [25] TSTM TSTM WIND G45
## [27] NON-TSTM WIND NON TSTM WIND
## [29] MARINE TSTM WIND
## 985 Levels: HIGH SURF ADVISORY COASTAL FLOOD ... WND
# Merge the exact-match thunderstorm-wind abbreviations into one event type
tstmLabels <- c("TSTM", "TSTMW", "TSTM WIND", " TSTM WIND", "TSTM WINDS", "TSTM WND")
stormData$EVTYPE[stormData$EVTYPE %in% tstmLabels] <- "THUNDERSTORM WIND"
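The relabelling above only catches exact matches, so labels that differ only in case or spacing (the listing shows `TSTM WIND (G45)` several times, apparently with different whitespace) stay separate. A minimal sketch of a more forgiving clean-up, assuming that case, surrounding or repeated whitespace, and the plain `TSTM` abbreviations are the only variations worth merging; `normaliseEvtype` is a hypothetical helper, not part of the original analysis:

```r
# Hypothetical helper: normalise EVTYPE labels before grouping.
# Assumption: case, whitespace, and plain TSTM abbreviations are the only
# variations to merge; qualified labels such as "TSTM WIND (G45)" are kept.
normaliseEvtype <- function(x) {
  x <- toupper(trimws(as.character(x)))  # works for factor or character input
  x <- gsub("\\s+", " ", x)               # collapse repeated internal spaces
  tstm <- c("TSTM", "TSTMW", "TSTM WIND", "TSTM WINDS", "TSTM WND")
  x[x %in% tstm] <- "THUNDERSTORM WIND"
  x
}

labels <- factor(c(" TSTM WIND", "tstm wind", "TSTM  WND", "TORNADO", "TSTM WIND (G45)"))
normaliseEvtype(labels)
```

Applied to the data this would be `stormData$EVTYPE <- normaliseEvtype(stormData$EVTYPE)`; it returns a character vector, and it would change the aggregated totals reported in this document.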
unique(stormData$PROPDMGEXP)
## [1] K M B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
stormData$PROPEXP[stormData$PROPDMGEXP == "H"] <- 100
stormData$PROPEXP[stormData$PROPDMGEXP == "h"] <- 100
stormData$PROPEXP[stormData$PROPDMGEXP == "K"] <- 1000
stormData$PROPEXP[stormData$PROPDMGEXP == "M"] <- 1000000
stormData$PROPEXP[stormData$PROPDMGEXP == "m"] <- 1000000
stormData$PROPEXP[stormData$PROPDMGEXP == "B"] <- 1000000000
stormData$PROPEXP[stormData$PROPDMGEXP == "8"] <- 100000000
stormData$PROPEXP[stormData$PROPDMGEXP == "7"] <- 10000000
stormData$PROPEXP[stormData$PROPDMGEXP == "6"] <- 1000000
stormData$PROPEXP[stormData$PROPDMGEXP == "5"] <- 100000
stormData$PROPEXP[stormData$PROPDMGEXP == "4"] <- 10000
stormData$PROPEXP[stormData$PROPDMGEXP == "3"] <- 1000
stormData$PROPEXP[stormData$PROPDMGEXP == "2"] <- 100
stormData$PROPEXP[stormData$PROPDMGEXP == "1"] <- 10
stormData$PROPEXP[stormData$PROPDMGEXP == "0"] <- 1
stormData$PROPEXP[stormData$PROPDMGEXP == "-"] <- 0
stormData$PROPEXP[stormData$PROPDMGEXP == "?"] <- 0
stormData$PROPEXP[stormData$PROPDMGEXP == "+"] <- 0
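# Multiply by the PROPEXP multiplier, not the PROPDMGEXP code column:
# as.numeric() on that column gives factor level codes (or NA for character
# input), not multipliers. Codes not mapped above leave an NA cost.
stormData$PROPDMGCOST <- stormData$PROPDMG * stormData$PROPEXP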
stormData$CROPEXP[stormData$CROPDMGEXP == "H"] <- 100
stormData$CROPEXP[stormData$CROPDMGEXP == "h"] <- 100
stormData$CROPEXP[stormData$CROPDMGEXP == "K"] <- 1000
stormData$CROPEXP[stormData$CROPDMGEXP == "k"] <- 1000
stormData$CROPEXP[stormData$CROPDMGEXP == "M"] <- 1000000
stormData$CROPEXP[stormData$CROPDMGEXP == "m"] <- 1000000
stormData$CROPEXP[stormData$CROPDMGEXP == "B"] <- 1000000000
stormData$CROPEXP[stormData$CROPDMGEXP == "8"] <- 100000000
stormData$CROPEXP[stormData$CROPDMGEXP == "7"] <- 10000000
stormData$CROPEXP[stormData$CROPDMGEXP == "6"] <- 1000000
stormData$CROPEXP[stormData$CROPDMGEXP == "5"] <- 100000
stormData$CROPEXP[stormData$CROPDMGEXP == "4"] <- 10000
stormData$CROPEXP[stormData$CROPDMGEXP == "3"] <- 1000
stormData$CROPEXP[stormData$CROPDMGEXP == "2"] <- 100
stormData$CROPEXP[stormData$CROPDMGEXP == "1"] <- 10
stormData$CROPEXP[stormData$CROPDMGEXP == "0"] <- 1
stormData$CROPEXP[stormData$CROPDMGEXP == "-"] <- 0
stormData$CROPEXP[stormData$CROPDMGEXP == "?"] <- 0
stormData$CROPEXP[stormData$CROPDMGEXP == "+"] <- 0
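# Multiply by the CROPEXP multiplier, not the CROPDMGEXP code column:
# as.numeric() on that column gives factor level codes, not multipliers
stormData$CROPDMGCOST <- stormData$CROPDMG * stormData$CROPEXP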
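The per-code assignments above can be written more compactly as a lookup table. A minimal sketch, assuming the same code-to-multiplier mapping used above (with lowercase `k` added as an assumption); `expMultiplier` and `toDollars` are hypothetical names. Indexing a named vector by `as.character(code)` also avoids the factor pitfall, since `as.numeric()` on a factor returns its internal level codes rather than the values its labels spell out:

```r
# Hypothetical lookup table: exponent code -> dollar multiplier
# (same mapping as the assignments above, plus lowercase "k").
expMultiplier <- c("H" = 1e2, "h" = 1e2, "K" = 1e3, "k" = 1e3,
                   "M" = 1e6, "m" = 1e6, "B" = 1e9,
                   "0" = 1, "1" = 1e1, "2" = 1e2, "3" = 1e3, "4" = 1e4,
                   "5" = 1e5, "6" = 1e6, "7" = 1e7, "8" = 1e8,
                   "-" = 0, "?" = 0, "+" = 0)

toDollars <- function(amount, expCode) {
  # Index by the code text; an empty or unknown code matches no name -> NA
  multiplier <- unname(expMultiplier[as.character(expCode)])
  amount * multiplier
}

codes <- factor(c("K", "M", "B", "?", ""))
toDollars(c(2.5, 1, 3, 7, 4), codes)  # 2500, 1e6, 3e9, 0, NA
```

In the analysis this could stand in for each assignment block, e.g. `stormData$CROPDMGCOST <- toDollars(stormData$CROPDMG, stormData$CROPDMGEXP)`.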
Tornadoes cause the most deaths and injuries of all event types: more than 5,000 deaths and more than 90,000 injuries in the US over the roughly 60 years the data cover. Excessive heat and flash floods rank second and third for fatalities, while thunderstorm wind and floods rank second and third for injuries.
fatal <- aggregate(FATALITIES ~ EVTYPE, data=stormData, sum)
fatal <- fatal[fatal$FATALITIES>0,]
fatal <- fatal[order(fatal$FATALITIES, decreasing=TRUE),]
fatal <- fatal[1:8,]
head(fatal)
## EVTYPE FATALITIES
## 766 TORNADO 5633
## 128 EXCESSIVE HEAT 1903
## 151 FLASH FLOOD 978
## 273 HEAT 937
## 462 LIGHTNING 816
## 692 THUNDERSTORM WIND 637
As the table shows, tornadoes and excessive heat are the two leading causes of fatalities. Looking at injuries:
injury <- aggregate(INJURIES ~ EVTYPE, data=stormData, sum)
injury <- injury[injury$INJURIES>0,]
injury <- injury[order(injury$INJURIES, decreasing=TRUE),]
injury <- injury[1:5,]
head(injury)
## EVTYPE INJURIES
## 766 TORNADO 91346
## 692 THUNDERSTORM WIND 8445
## 168 FLOOD 6789
## 128 EXCESSIVE HEAT 6525
## 462 LIGHTNING 5230
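The fatality and injury tables above follow the same aggregate, filter, order, and truncate steps. A minimal sketch of a reusable helper, assuming the column names used in this analysis; `topEvents` and the toy data are hypothetical:

```r
# Hypothetical helper: total `column` per EVTYPE, keep positive totals,
# sort in decreasing order, and return the top `n` rows.
topEvents <- function(data, column, n = 5) {
  totals <- aggregate(reformulate("EVTYPE", response = column),
                      data = data, FUN = sum)
  totals <- totals[totals[[column]] > 0, ]
  totals <- totals[order(totals[[column]], decreasing = TRUE), ]
  head(totals, n)
}

# Toy data standing in for stormData (values are made up)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(3, 2, 1, 0, 0),
  INJURIES   = c(10, 4, 5, 6, 0)
)
topEvents(toy, "FATALITIES", n = 2)  # TORNADO (4), then HEAT (2)
topEvents(toy, "INJURIES")           # TORNADO (15), FLOOD (6), HEAT (4)
```

On the real data, `topEvents(stormData, "FATALITIES", n = 8)` should reproduce the fatality table above.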
Next, we plot the top five event types to compare their impact on population health in terms of both fatalities and injuries.
barplot(fatal[1:5, 2], col=terrain.colors(5), legend.text=fatal[1:5, 1], ylab = "# of Fatalities", main = "Fatalities from Natural Events", cex.names=0.75)
barplot(injury[1:5, 2], col=terrain.colors(5), legend.text=injury[1:5, 1] , ylab = "# of Injuries", main = "Injuries from Natural Events", cex.names=0.75)
Across the United States, which types of events have the greatest economic consequences?
Taking a similar approach, we rely on the damage values prepared during the tidying step above. It could be argued that the cost conversion belongs here, in the analysis, rather than in the tidying step, since not every analysis needs it; in this case, we assume it was known in advance that the data would need this processing.
Aggregate property damage by event type. Note that this sums the recorded PROPDMG values, to which the PROPDMGEXP multiplier has not been applied.
property <- aggregate(PROPDMG ~ EVTYPE, data=stormData, sum)
property <- property[property$PROPDMG >0,]
property <- property[order(property$PROPDMG, decreasing=TRUE),]
head(property)
## EVTYPE PROPDMG
## 766 TORNADO 3212258.2
## 692 THUNDERSTORM WIND 2213026.8
## 151 FLASH FLOOD 1420124.6
## 168 FLOOD 899938.5
## 242 HAIL 688693.4
## 462 LIGHTNING 603351.8
crop <- aggregate(CROPDMGCOST ~ EVTYPE, data=stormData, sum)
crop <- crop[crop$CROPDMGCOST >0,]
crop <- crop[order(crop$CROPDMGCOST, decreasing=TRUE),]
head(crop)
## EVTYPE CROPDMGCOST
## 242 HAIL 4061556.6
## 151 FLASH FLOOD 1256889.9
## 692 THUNDERSTORM WIND 1233545.6
## 168 FLOOD 1187264.0
## 766 TORNADO 700120.5
## 93 DROUGHT 262189.6
barplot(property[1:5, 2], col=terrain.colors(5), legend.text=property[1:5, 1], ylab = "Property Damage (PROPDMG, exponent not applied)", main = "Property Damage from Natural Events", cex.names=0.75)
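The economic question concerns property and crop damage together. A minimal sketch of ranking event types by their combined cost, assuming `PROPDMGCOST` and `CROPDMGCOST` hold dollar amounts as computed during tidying, and treating a missing (NA) cost as zero; `totalDamage` and the toy data are hypothetical:

```r
# Hypothetical helper: rank event types by property + crop damage in dollars.
# Assumption: NA costs (unrecognised exponent codes) count as 0.
totalDamage <- function(data, n = 5) {
  data$TOTALCOST <- rowSums(data[, c("PROPDMGCOST", "CROPDMGCOST")], na.rm = TRUE)
  totals <- aggregate(TOTALCOST ~ EVTYPE, data = data, FUN = sum)
  totals <- totals[order(totals$TOTALCOST, decreasing = TRUE), ]
  head(totals, n)
}

# Toy data standing in for stormData (values are made up)
toy <- data.frame(
  EVTYPE      = c("FLOOD", "HAIL", "FLOOD", "DROUGHT"),
  PROPDMGCOST = c(5e6, 1e5, NA, 0),
  CROPDMGCOST = c(1e6, 2e6, 3e6, 4e6)
)
totalDamage(toy)  # FLOOD (9e6), DROUGHT (4e6), HAIL (2.1e6)
```

Summing per row before aggregating keeps rows whose property cost is NA; with `aggregate(PROPDMGCOST ~ EVTYPE, ...)` the default `na.action` would instead drop them.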