Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The analysis of the storm event database shows that tornadoes are the weather events most harmful to population health. Excessive heat is the second most harmful event type, with the second-highest number of fatalities. The economic impact of weather events was also analyzed: between 1950 and 2011, floods, tornadoes, and hurricanes each caused more than $80 billion in property damage, while drought caused the largest crop damage, followed by floods.
This analysis was run on a Windows 10 machine with a 2.4 GHz Intel Core i5 processor and was written in the R programming language. The versions of R and the loaded packages are listed below.
#library(R.utils)
sessionInfo()
## R version 3.3.0 (2016-05-03)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 10586)
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] magrittr_1.5 formatR_1.4 tools_3.3.0 htmltools_0.3.5
## [5] yaml_2.1.13 Rcpp_0.12.5 stringi_1.1.1 rmarkdown_0.9.6
## [9] knitr_1.13 stringr_1.0.0 digest_0.6.9 evaluate_0.9
The data come from NOAA and are available as a bzip2-compressed CSV file at https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2.
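csv_file <- "repdata-data-StormData.csv"
bzip_file <- "repdata-data-StormData.csv.bz2"
if (!file.exists(csv_file)) {
  input_file <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  # mode = "wb" forces a binary transfer so the archive is not corrupted on Windows
  download.file(input_file, destfile = bzip_file, mode = "wb")
  # bunzip2() is provided by the R.utils package, which must be installed
  R.utils::bunzip2(bzip_file)
}
# Keep text columns as character vectors rather than factors
df <- read.csv(csv_file, stringsAsFactors = FALSE)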
dim(df)
## [1] 902297 37
names(df)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
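The download step above decompresses the archive with bunzip2() from the R.utils package. As an alternative that needs no extra package, R's file() connection transparently decompresses bzip2 (as well as gzip and xz) files when reading, so read.csv() can in principle be pointed at the .bz2 archive directly; decompressing on every read may be slower for the full 900,000-row file. The sketch below is self-contained and uses a temporary file with made-up rows rather than the real data set.

```r
# Write a tiny bzip2-compressed CSV to a temporary file (made-up rows)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(2, 0)),
          file = con, row.names = FALSE)
close(con)

# read.csv() opens the path with file(), which detects the bzip2 compression
small <- read.csv(tmp, stringsAsFactors = FALSE)
print(small)
```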
The EVTYPE column holds the storm event type. It contains over 900 distinct values, many of which do not directly match the events listed in the NOAA Storm Data Preparation document (https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf). That document states that “The only events permitted in Storm Data are listed in Table 1 of Section 2.1.1.”
The discrepancies include different abbreviations (for example, TSTM for thunderstorm) and numerous data-entry errors such as misspellings. Some event classifications, such as mud slides, are not included in the NOAA Storm Data Preparation list at all.
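The code segment below attempts to map each recorded event type onto one of the permitted events. Because the replacement names are in mixed case and grep() is case-sensitive, a value that has already been mapped is not matched again by the later upper-case patterns. The first pattern that matches a value therefore determines its category, and a more specific pattern placed after a more general one (such as "MARINE TSTM*" after "TSTM*") has no effect.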
# Normalize event names: strip surrounding whitespace and convert to upper case
df$EVTYPE <- toupper(trimws(df$EVTYPE))
df$EVTYPE[grep("ASTRONOMICAL LOW TIDE*", df$EVTYPE)] <- "Astronomical Low Tide"
df$EVTYPE[grep("AVAL*", df$EVTYPE)] <- "Avalanche"
df$EVTYPE[grep("BLOWING SNOW", df$EVTYPE)] <- "Blizzard"
df$EVTYPE[grep("BLIZZ*", df$EVTYPE)] <- "Blizzard"
df$EVTYPE[grep("COASTAL*", df$EVTYPE)] <- "Coastal Flood"
df$EVTYPE[grep("COLD*", df$EVTYPE)] <- "Cold/Wind Chill"
df$EVTYPE[grep("LOW TEMP*", df$EVTYPE)] <- "Cold/Wind Chill"
df$EVTYPE[grep("DEBRI", df$EVTYPE)] <- "Debris Flow"
df$EVTYPE[grep("MUD*", df$EVTYPE)] <- "Debris Flow"
df$EVTYPE[grep("DENSE FOG", df$EVTYPE)] <- "Dense Fog"
df$EVTYPE[grep("DENSE SMOKE", df$EVTYPE)] <- "Dense Smoke"
df$EVTYPE[grep("DROUGHT", df$EVTYPE)] <- "Drought"
df$EVTYPE[grep("*DRY*", df$EVTYPE)] <- "Drought"
df$EVTYPE[grep("DUST DEV*", df$EVTYPE)] <-"Dust Devil"
df$EVTYPE[grep("DUST ST*", df$EVTYPE)] <- "Dust Storm"
df$EVTYPE[grep("EXCESSIVE HEAT", df$EVTYPE)] <- "Excessive Heat"
df$EVTYPE[grep("RECORD TEMPERATURE", df$EVTYPE)] <- "Excessive Heat"
df$EVTYPE[grep("RECORD HIGH*", df$EVTYPE)] <- "Excessive Heat"
df$EVTYPE[grep("TEMPERATURE RECORD*", df$EVTYPE)] <- "Heat"
df$EVTYPE[grep("FROST*", df$EVTYPE)] <- "Frost/Freeze"
df$EVTYPE[grep("*FREEZ*", df$EVTYPE)] <- "Frost/Freeze"
df$EVTYPE[grep("*FREEZING FOG*", df$EVTYPE)] <- "Freezing Fog"
df$EVTYPE[grep("FUNNEL*", df$EVTYPE)] <- "Funnel Cloud"
df$EVTYPE[grep("HEAVY RAIN", df$EVTYPE)] <- "Heavy Rain"
df$EVTYPE[grep("EXCESSIVE RAIN*", df$EVTYPE)] <- "Heavy Rain"
df$EVTYPE[grep("HVY RAIN", df$EVTYPE)] <- "Heavy Rain"
df$EVTYPE[grep("RECORD RAIN", df$EVTYPE)] <- "Heavy Rain"
df$EVTYPE[grep("*RAIN*", df$EVTYPE)] <- "Heavy Rain"
df$EVTYPE[grep("*SNOW*", df$EVTYPE)] <- "Winter Storm"
df$EVTYPE[grep("*FLOOD*", df$EVTYPE)] <- "Flood"
df$EVTYPE[grep("*FLD*", df$EVTYPE)] <- "Flood"
df$EVTYPE[grep("*FLASH*", df$EVTYPE)] <- "Flash Flood"
df$EVTYPE[grep("LIG*", df$EVTYPE)] <- "Lightning"
df$EVTYPE[grep("WINT*", df$EVTYPE)] <- "Winter Storm"  # regex "WINT*" = "WIN" + zero or more "T", so it also matches wind events such as "TSTM WIND"
df$EVTYPE[grep("EXTREME COLD*", df$EVTYPE)] <- "Extreme Cold/Wind Chill"
df$EVTYPE[grep("THUND*", df$EVTYPE)] <- "Thunderstorm Wind"
df$EVTYPE[grep("TSTM*", df$EVTYPE)] <- "Thunderstorm Wind"
df$EVTYPE[grep("MARINE THUND*", df$EVTYPE)] <- "Marine Thunderstorm Wind"
df$EVTYPE[grep("MARINE TSTM*", df$EVTYPE)] <- "Marine Thunderstorm Wind"
df$EVTYPE[grep("HAIL*", df$EVTYPE)] <- "Hail"
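df$EVTYPE[grep("HEAT*", df$EVTYPE)] <- "Heat"  # regex "HEAT*" = "HEA" + zero or more "T", so any remaining value containing "HEA" (e.g. "HEAVY SURF") also becomes Heat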
df$EVTYPE[grep("WARM*", df$EVTYPE)] <- "Heat"
df$EVTYPE[grep("UNSEASONABLY HOT", df$EVTYPE)] <- "Heat"
df$EVTYPE[grep("HIGH WIND*", df$EVTYPE)] <- "High Wind"
df$EVTYPE[grep("*TURBU*", df$EVTYPE)] <- "High Wind"
df$EVTYPE[grep("HURR*", df$EVTYPE)] <-"Hurricane (Typhoon)"
df$EVTYPE[grep("TYPH*", df$EVTYPE)] <-"Hurricane (Typhoon)"
df$EVTYPE[grep("RIP*", df$EVTYPE)] <- "Rip Current"
df$EVTYPE[grep("STRONG WIN*", df$EVTYPE)] <- "Strong Wind"
df$EVTYPE[grep("GUS*", df$EVTYPE)] <- "Strong Wind"
df$EVTYPE[grep("MARINE STRONG WIN*", df$EVTYPE)] <- "Marine Strong Wind"
df$EVTYPE[grep("ICE*", df$EVTYPE)] <- "Ice Storm"  # regex "ICE*" = "IC" + zero or more "E", so values such as "TROPICAL STORM" also match
df$EVTYPE[grep("WILD*", df$EVTYPE)] <- "Wildfire"
df$EVTYPE[grep("*FIRE*", df$EVTYPE)] <- "Wildfire"
df$EVTYPE[grep("TORN*", df$EVTYPE)] <- "Tornado"
df$EVTYPE[grep("TSUN*", df$EVTYPE)] <- "Tsunami"
df$EVTYPE[grep("SLEET", df$EVTYPE)] <- "Sleet"
df$EVTYPE[grep("MIX*", df$EVTYPE)] <- "Sleet"
df$EVTYPE[grep("*WATERSPOUT*", df$EVTYPE)] <- "Waterspout"
df$EVTYPE[grep("*SPOUT*", df$EVTYPE)] <- "Waterspout"
df$EVTYPE[grep("HIGH SURF*", df$EVTYPE)] <- "High Surf"
df$EVTYPE[grep("ROUGH SURF*", df$EVTYPE)] <- "High Surf"
df$EVTYPE[grep("ROUGH SEA*", df$EVTYPE)] <- "High Surf"
df$EVTYPE[grep("HIGH SEA*", df$EVTYPE)] <- "High Surf"
df$EVTYPE[grep("HIGH TIDE", df$EVTYPE)] <- "High Surf"
df$EVTYPE[grep("HIGH SWELL*", df$EVTYPE)] <- "High Surf"
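The patterns passed to grep() above are regular expressions, not shell-style wildcards: `*` means "zero or more of the preceding character" rather than "anything". A trailing `*` therefore widens a match instead of marking a prefix, so `"WINT*"` matches any value containing `"WIN"` and `"HEAT*"` matches any value containing `"HEA"`. The toy sketch below, on made-up values, shows the effect alongside two narrower alternatives: an anchored regex, and `fixed = TRUE` for a literal substring match.

```r
# Made-up EVTYPE values for illustration only
ev <- c("WINTER STORM", "TSTM WIND", "HIGH WIND", "HEAVY SURF", "HEAT WAVE")

# "WINT*" = "WIN" followed by zero or more "T": matches every value containing "WIN"
wint_regex <- grepl("WINT*", ev)

# "HEAT*" = "HEA" followed by zero or more "T": also matches "HEAVY SURF"
heat_regex <- grepl("HEAT*", ev)

# Anchored regex: the value must start with "WINTER"
wint_anchored <- grepl("^WINTER", ev)

# Literal substring match: the pattern is not interpreted as a regex at all
heat_fixed <- grepl("HEAT", ev, fixed = TRUE)

print(data.frame(ev, wint_regex, heat_regex, wint_anchored, heat_fixed))
```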
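The PROPDMGEXP and CROPDMGEXP fields contain a character code for the exponent (multiplier) to apply to the PROPDMG and CROPDMG amounts, for example "K" for thousands, "M" for millions, and "B" for billions. The code below converts each code to a numeric multiplier (PROPEXP and CROPEXP) and then creates PROPTOT and CROPTOT, the damage amounts in billions of dollars. Codes that cannot be interpreted ("+", "-", "?") get a multiplier of 0, so those amounts are excluded. Storing the actual amounts in their own columns makes it easier to aggregate crop and property damage by EVTYPE.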
df$PROPEXP[df$PROPDMGEXP == "K"] <- 1000
df$PROPEXP[df$PROPDMGEXP == "M"] <- 1e+06
df$PROPEXP[df$PROPDMGEXP == ""] <- 1
df$PROPEXP[df$PROPDMGEXP == "B"] <- 1e+09
df$PROPEXP[df$PROPDMGEXP == "m"] <- 1e+06
df$PROPEXP[df$PROPDMGEXP == "0"] <- 1
df$PROPEXP[df$PROPDMGEXP == "5"] <- 1e+05
df$PROPEXP[df$PROPDMGEXP == "6"] <- 1e+06
df$PROPEXP[df$PROPDMGEXP == "4"] <- 10000
df$PROPEXP[df$PROPDMGEXP == "2"] <- 100
df$PROPEXP[df$PROPDMGEXP == "3"] <- 1000
df$PROPEXP[df$PROPDMGEXP == "h"] <- 100
df$PROPEXP[df$PROPDMGEXP == "7"] <- 1e+07
df$PROPEXP[df$PROPDMGEXP == "H"] <- 100
df$PROPEXP[df$PROPDMGEXP == "1"] <- 10
df$PROPEXP[df$PROPDMGEXP == "8"] <- 1e+08
# Assigning '0' to invalid exponent data
df$PROPEXP[df$PROPDMGEXP == "+"] <- 0
df$PROPEXP[df$PROPDMGEXP == "-"] <- 0
df$PROPEXP[df$PROPDMGEXP == "?"] <- 0
df$PROPTOT <- (df$PROPDMG * df$PROPEXP) / 1000000000
df$CROPEXP[df$CROPDMGEXP == "M"] <- 1e+06
df$CROPEXP[df$CROPDMGEXP == "K"] <- 1000
df$CROPEXP[df$CROPDMGEXP == "m"] <- 1e+06
df$CROPEXP[df$CROPDMGEXP == "B"] <- 1e+09
df$CROPEXP[df$CROPDMGEXP == "0"] <- 1
df$CROPEXP[df$CROPDMGEXP == "k"] <- 1000
df$CROPEXP[df$CROPDMGEXP == "2"] <- 100
df$CROPEXP[df$CROPDMGEXP == ""] <- 1
df$CROPEXP[df$CROPDMGEXP == "?"] <- 0
df$CROPTOT <- (df$CROPDMG * df$CROPEXP) / 1000000000
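The repeated assignments above implement a lookup from exponent code to multiplier. As an illustrative alternative (not the code used for the results), the mapping can be kept in one place as a pair of vectors and applied with match(). The sketch assumes a single table can serve both columns, since the two sets of assignments never give the same code different multipliers; `exp_codes`, `exp_mult`, and `to_multiplier()` are hypothetical names. match() is used rather than indexing by name because R never matches the empty string `""` as a name.

```r
# Hypothetical lookup table: exponent code -> multiplier, mirroring the
# PROPEXP and CROPEXP assignments above ("+", "-", "?" map to 0)
exp_codes <- c("", "0", "1", "2", "3", "4", "5", "6", "7", "8",
               "h", "H", "k", "K", "m", "M", "B", "+", "-", "?")
exp_mult  <- c(1, 1, 10, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8,
               1e2, 1e2, 1e3, 1e3, 1e6, 1e6, 1e9, 0, 0, 0)

# match() handles the empty code "" correctly; codes not in the table give NA,
# as they would with the original assignments
to_multiplier <- function(code) {
  exp_mult[match(as.character(code), exp_codes)]
}

# Usage on made-up values, converting to billions as the report does
sample_dmg <- data.frame(PROPDMG = c(25, 2.5, 10, 5),
                         PROPDMGEXP = c("K", "M", "", "?"),
                         stringsAsFactors = FALSE)
sample_dmg$PROPTOT <- sample_dmg$PROPDMG * to_multiplier(sample_dmg$PROPDMGEXP) / 1e9
print(sample_dmg)
```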
Calculate the property damage by event. Then create a list of property damage for the top 10 events.
p <- aggregate(PROPTOT ~ EVTYPE, data = df, FUN = sum)
p <- p[p$PROPTOT > 0, ]                 # drop event types with no property damage
prop <- p[order(-p$PROPTOT), ]          # sort by total damage, largest first
prop <- prop[1:10, ]                    # keep the top 10
Calculate the crop damage by event. Then create a list of crop damage for the top 10 events.
cr <- aggregate(CROPTOT ~ EVTYPE, data = df, FUN = sum)
cr <- cr[cr$CROPTOT > 0, ]              # drop event types with no crop damage
crop <- cr[order(-cr$CROPTOT), ]        # sort by total damage, largest first
crop <- crop[1:10, ]                    # keep the top 10
Calculate the fatalities by event. Then create a list of the top 10 events for fatalities.
fatalities <- aggregate(FATALITIES ~ EVTYPE, data = df, FUN = sum)
fatalities <- fatalities[fatalities$FATALITIES > 0, ]   # drop event types with no fatalities
fat <- fatalities[order(-fatalities$FATALITIES), ]      # sort, largest first
fat <- fat[1:10, ]                                      # keep the top 10
Calculate injuries by event. Then create a list of the top 10 events for injuries.
injuries <- aggregate(INJURIES ~ EVTYPE, data = df, FUN = sum)
injuries <- injuries[injuries$INJURIES > 0, ]           # drop event types with no injuries
inj <- injuries[order(-injuries$INJURIES), ]            # sort, largest first
inj <- inj[1:10, ]                                      # keep the top 10
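The four blocks above follow the same aggregate, filter, sort, and slice pattern. A toy run of that pattern on made-up values (the data frame `toy` is hypothetical) is sketched below; it uses head() in place of `[1:10, ]` so that it also behaves sensibly when there are fewer groups than requested, instead of padding with NA rows.

```r
# Hypothetical records: event type and fatalities per record
toy <- data.frame(EVTYPE = c("Tornado", "Hail", "Tornado", "Flood", "Hail"),
                  FATALITIES = c(3, 0, 2, 1, 0),
                  stringsAsFactors = FALSE)

# 1. Sum fatalities per event type
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)
# 2. Drop event types with no fatalities
totals <- totals[totals$FATALITIES > 0, ]
# 3. Sort descending and keep the top 2 (the report keeps the top 10)
top <- head(totals[order(-totals$FATALITIES), ], 2)
print(top)
```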
par(mfrow = c(1, 2), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8, las = 3)
barplot(fat$FATALITIES, names.arg = fat$EVTYPE, ylab = "Fatalities",
        main = strwrap("Top 10 events for Fatalities", 30), col = "green")
barplot(inj$INJURIES, names.arg = inj$EVTYPE, ylab = "Injuries",
        main = strwrap("Top 10 events for Injuries", 30), col = "blue")
#### Plot Results for Crop and Property Damage
par(mfrow = c(1, 2), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8, las = 3)
barplot(prop$PROPTOT, names.arg = prop$EVTYPE, ylab = "Property Damages (in Billions)",
        main = strwrap("Top 10 events for Property Damage", 30), col = "green")
barplot(crop$CROPTOT, names.arg = crop$EVTYPE, ylab = "Crop Damages (in Billions)",
        main = strwrap("Top 10 events for Crop Damage", 30), col = "blue")
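The results below also rank event types by property and crop damage combined, while the plots show the two measures separately. One way to compute a combined ranking is to aggregate both columns in a single call and add them. The sketch below uses a hypothetical data frame `dmg` standing in for `df` after PROPTOT and CROPTOT have been computed, with made-up values in billions. Note that the formula interface drops any row with NA in either column (its default `na.action` is `na.omit`).

```r
# Hypothetical stand-in for df (damage in billions, made-up values)
dmg <- data.frame(EVTYPE = c("Flood", "Drought", "Flood", "Tornado"),
                  PROPTOT = c(10, 0.5, 5, 8),
                  CROPTOT = c(1, 6, 2, 0.1),
                  stringsAsFactors = FALSE)

# Sum both damage columns per event type in one call
both <- aggregate(cbind(PROPTOT, CROPTOT) ~ EVTYPE, data = dmg, FUN = sum)

# Combined damage, then sort largest first
both$TOTAL <- both$PROPTOT + both$CROPTOT
both <- both[order(-both$TOTAL), ]
print(both)
```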
Check to see which events occurred most often.
fq <- table(df$EVTYPE)
dfq <- data.frame(fq)
dfq<- dfq[order(-dfq$Freq),]
dfq[1:28,]
## Var1 Freq
## 139 Winter Storm 400553
## 35 Hail 289274
## 30 Flood 85249
## 122 Tornado 61120
## 52 Lightning 16390
## 38 Heavy Rain 11905
## 33 Funnel Cloud 6985
## 138 Wildfire 4237
## 134 Waterspout 3849
## 49 Ice Storm 3056
## 22 Drought 2847
## 9 Blizzard 2774
## 13 Cold/Wind Chill 2473
## 32 Frost/Freeze 1886
## 26 Excessive Heat 1713
## 37 Heat 1656
## 19 Dense Fog 1296
## 12 Coastal Flood 866
## 62 Rip Current 786
## 41 High Surf 761
## 31 FOG 538
## 24 Dust Storm 429
## 5 Avalanche 388
## 48 Hurricane (Typhoon) 296
## 4 Astronomical Low Tide 174
## 23 Dust Devil 151
## 65 Sleet 115
## 121 Thunderstorm Wind 96
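The table(), data.frame(), and order() steps above can also be expressed by sorting the table directly: sort() on a one-dimensional table keeps the event names as the names of the counts. A toy sketch with made-up values:

```r
# Made-up event records
x <- c("Hail", "Tornado", "Hail", "Flood", "Hail", "Tornado")

# Count each event type and sort the counts, most frequent first
counts <- sort(table(x), decreasing = TRUE)
print(counts)
```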
Tornadoes are by far the leading cause of both injuries and deaths. Floods caused the most property and crop damage combined, while drought caused the most crop damage.
Tornadoes caused 5,660 fatalities, followed by excessive heat (1,920), winter storms (1,649), floods (1,546), and heat (1,281).
Tornadoes caused 91,450 injuries, followed by winter storms (14,516), floods (8,676), and excessive heat (6,525).
Floods caused the most property damage by a wide margin, about $167 billion, followed by tornadoes (about $104 billion) and hurricanes (about $85 billion).
Drought and floods caused much more crop damage than any other event type: about $13 billion and $12 billion, respectively. Ice storms and hurricanes were next, at about $5 billion each.
Among the leading causes of casualties and damage, excessive heat (1,713 records) and hurricanes (296 records) occur far less frequently than tornadoes (61,120), floods (85,249), and winter storms (400,553). There are roughly 200 times as many tornado records as hurricane records.
These large differences in how often the most damaging events occur suggest that different event types may call for different emergency-preparedness and response plans.