Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Synopsis

The analysis of the storm event database shows that tornadoes are by far the most harmful weather event for population health, causing the most fatalities and injuries. Excessive heat is the second leading cause of fatalities. The economic impact of weather events was also analyzed. Between 1950 and 2011, floods, tornadoes and hurricanes caused the most property damage, more than $80 billion each. Drought caused the most crop damage, followed closely by floods.

Environment

This analysis was run on a Windows 10 machine with a 2.4 GHz Intel Core i5 processor and was written in the R programming language. The versions of R and the related R packages are listed below.

#library(R.utils)

sessionInfo()
## R version 3.3.0 (2016-05-03)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 10586)
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] magrittr_1.5    formatR_1.4     tools_3.3.0     htmltools_0.3.5
##  [5] yaml_2.1.13     Rcpp_0.12.5     stringi_1.1.1   rmarkdown_0.9.6
##  [9] knitr_1.13      stringr_1.0.0   digest_0.6.9    evaluate_0.9

Data Processing

The data come from the NOAA storm database and can be downloaded as a bzip2-compressed CSV file from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2.

Loading the data

file <- "repdata-data-StormData.csv"
bzip_file <- "repdata-data-StormData.csv.bz2"
if (!file.exists(file)) {
    input_file <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
    # mode = "wb" keeps the compressed file intact on Windows
    if (!file.exists(bzip_file)) download.file(input_file, destfile = bzip_file, mode = "wb")
    # bunzip2() is provided by the R.utils package
    R.utils::bunzip2(bzip_file)
}
df <- read.csv(file)
dim(df)
## [1] 902297     37
names(df)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
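The bunzip2() call depends on the R.utils package. Base R can also read the compressed file directly, because read.csv() opens its path with file(), which detects gzip, bzip2 and xz compression when a file is opened for reading. The self-contained check below writes a toy two-row CSV (not the NOAA data) to a temporary .bz2 file and reads it back without decompressing it first:

```r
# Write a tiny bzip2-compressed CSV (toy data, not the NOAA file).
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(2, 0)),
          file = con, row.names = FALSE)
close(con)

# read.csv() decompresses the file transparently; no separate bunzip2 step is needed.
storm <- read.csv(path, stringsAsFactors = FALSE)
print(storm)
```

In this analysis, the same approach would mean calling read.csv(bzip_file) instead of decompressing the file first.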

Cleaning the Data

Correcting errors in the EVTYPE (event type) column

The EVTYPE column holds the type of storm event. It contains over 900 distinct values, many of which do not directly match the events listed in the NOAA STORM DATA PREPARATION document (https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf). That document states that “The only events permitted in Storm Data are listed in Table 1 of Section 2.1.1.”

The discrepancies include different abbreviations and numerous data-entry errors such as misspellings. Some event classifications, such as mud slides, are not included in the NOAA STORM DATA PREPARATION list.

The code segment below attempts to map the event types in the data onto the events listed in the preparation document by pattern matching on the upper-cased EVTYPE values.

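# Normalize event types: strip surrounding whitespace and convert to upper case
df$EVTYPE <- toupper(trimws(df$EVTYPE))
# Map the variants onto the permitted event names. The assignments run in order,
# so a later pattern can overwrite a label set by an earlier one.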
df$EVTYPE[grep("ASTRONOMICAL LOW TIDE*", df$EVTYPE)] <- "Astronomical Low Tide"
df$EVTYPE[grep("AVAL*", df$EVTYPE)] <- "Avalanche"
df$EVTYPE[grep("BLOWING SNOW", df$EVTYPE)]  <- "Blizzard"
df$EVTYPE[grep("BLIZZ*", df$EVTYPE)]  <- "Blizzard"
df$EVTYPE[grep("COASTAL*", df$EVTYPE)] <- "Coastal Flood"
df$EVTYPE[grep("COLD*", df$EVTYPE)] <- "Cold/Wind Chill"
df$EVTYPE[grep("LOW TEMP*", df$EVTYPE)] <- "Cold/Wind Chill"
df$EVTYPE[grep("DEBRI", df$EVTYPE)] <- "Debris Flow"
df$EVTYPE[grep("MUD*", df$EVTYPE)] <- "Debris Flow"
df$EVTYPE[grep("DENSE FOG", df$EVTYPE)] <- "Dense Fog"
df$EVTYPE[grep("DENSE SMOKE", df$EVTYPE)] <- "Dense Smoke"
df$EVTYPE[grep("DROUGHT", df$EVTYPE)] <- "Drought"
df$EVTYPE[grep("*DRY*", df$EVTYPE)] <- "Drought"
df$EVTYPE[grep("DUST DEV*", df$EVTYPE)] <-"Dust Devil"
df$EVTYPE[grep("DUST ST*", df$EVTYPE)] <- "Dust Storm"
df$EVTYPE[grep("EXCESSIVE HEAT", df$EVTYPE)] <- "Excessive Heat"
df$EVTYPE[grep("RECORD TEMPERATURE", df$EVTYPE)] <- "Excessive Heat"
df$EVTYPE[grep("RECORD HIGH*", df$EVTYPE)] <- "Excessive Heat"
df$EVTYPE[grep("TEMPERATURE RECORD*", df$EVTYPE)] <- "Heat"
df$EVTYPE[grep("FROST*", df$EVTYPE)] <- "Frost/Freeze"
df$EVTYPE[grep("*FREEZ*", df$EVTYPE)] <- "Frost/Freeze"
df$EVTYPE[grep("*FREEZING FOG*", df$EVTYPE)] <- "Freezing Fog"
df$EVTYPE[grep("FUNNEL*", df$EVTYPE)] <- "Funnel Cloud"
df$EVTYPE[grep("HEAVY RAIN", df$EVTYPE)] <- "Heavy Rain"
df$EVTYPE[grep("EXCESSIVE RAIN*", df$EVTYPE)] <- "Heavy Rain"
df$EVTYPE[grep("HVY RAIN", df$EVTYPE)] <- "Heavy Rain"
df$EVTYPE[grep("RECORD RAIN", df$EVTYPE)] <- "Heavy Rain"
df$EVTYPE[grep("*RAIN*", df$EVTYPE)] <- "Heavy Rain"
df$EVTYPE[grep("*SNOW*", df$EVTYPE)] <- "Winter Storm"
df$EVTYPE[grep("*FLOOD*", df$EVTYPE)] <- "Flood"
df$EVTYPE[grep("*FLD*", df$EVTYPE)] <- "Flood"
df$EVTYPE[grep("*FLASH*", df$EVTYPE)] <- "Flash Flood"
df$EVTYPE[grep("LIG*", df$EVTYPE)] <- "Lightning"
df$EVTYPE[grep("WINT*", df$EVTYPE)] <- "Winter Storm"
df$EVTYPE[grep("EXTREME COLD*", df$EVTYPE)] <- "Extreme Cold/Wind Chill"
df$EVTYPE[grep("THUND*", df$EVTYPE)] <- "Thunderstorm Wind"
df$EVTYPE[grep("TSTM*", df$EVTYPE)] <- "Thunderstorm Wind"
df$EVTYPE[grep("MARINE THUND*", df$EVTYPE)] <- "Marine Thunderstorm Wind"
df$EVTYPE[grep("MARINE TSTM*", df$EVTYPE)] <- "Marine Thunderstorm Wind"
df$EVTYPE[grep("HAIL*", df$EVTYPE)] <- "Hail"
df$EVTYPE[grep("HEAT*", df$EVTYPE)] <- "Heat"
df$EVTYPE[grep("WARM*", df$EVTYPE)] <- "Heat"
df$EVTYPE[grep("UNSEASONABLY HOT", df$EVTYPE)] <- "Heat"
df$EVTYPE[grep("HIGH WIND*", df$EVTYPE)] <- "High Wind"
df$EVTYPE[grep("*TURBU*", df$EVTYPE)] <- "High Wind"
df$EVTYPE[grep("HURR*", df$EVTYPE)] <-"Hurricane (Typhoon)"
df$EVTYPE[grep("TYPH*", df$EVTYPE)] <-"Hurricane (Typhoon)"
df$EVTYPE[grep("RIP*", df$EVTYPE)] <- "Rip Current"
df$EVTYPE[grep("STRONG WIN*", df$EVTYPE)] <- "Strong Wind"
df$EVTYPE[grep("GUS*", df$EVTYPE)] <- "Strong Wind"
df$EVTYPE[grep("MARINE STRONG WIN*", df$EVTYPE)] <- "Marine Strong Wind"
df$EVTYPE[grep("ICE*", df$EVTYPE)] <- "Ice Storm"
df$EVTYPE[grep("WILD*", df$EVTYPE)] <- "Wildfire"
df$EVTYPE[grep("*FIRE*", df$EVTYPE)] <- "Wildfire"
df$EVTYPE[grep("TORN*", df$EVTYPE)] <- "Tornado"
df$EVTYPE[grep("TSUN*", df$EVTYPE)] <- "Tsunami"
df$EVTYPE[grep("SLEET", df$EVTYPE)] <- "Sleet"
df$EVTYPE[grep("MIX*", df$EVTYPE)] <- "Sleet"
df$EVTYPE[grep("*WATERSPOUT*", df$EVTYPE)] <- "Waterspout"
df$EVTYPE[grep("*SPOUT*", df$EVTYPE)] <- "Waterspout"
df$EVTYPE[grep("HIGH SURF*", df$EVTYPE)] <- "High Surf"
df$EVTYPE[grep("ROUGH SURF*", df$EVTYPE)] <- "High Surf"
df$EVTYPE[grep("ROUGH SEA*", df$EVTYPE)] <- "High Surf"
df$EVTYPE[grep("HIGH SEA*", df$EVTYPE)] <- "High Surf"
df$EVTYPE[grep("HIGH TIDE", df$EVTYPE)] <- "High Surf"
df$EVTYPE[grep("HIGH SWELL*", df$EVTYPE)] <- "High Surf"
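grep() interprets its pattern as a regular expression, so a trailing * repeats the preceding character zero or more times rather than acting as a filename-style wildcard. Each assignment above can also overwrite the result of an earlier one, so both the patterns and their order shape the final categories. The check below runs two of the patterns on hypothetical event strings and contrasts them with a fixed-string pattern and an anchored pattern:

```r
# Hypothetical raw event strings in the style of EVTYPE.
events <- c("WINTER STORM", "THUNDERSTORM WIND", "HIGH WIND", "TORNADO", "STORM SURGE")

# "WINT*" is "WIN" followed by zero or more "T": every string containing "WIN" matches.
wint <- grepl("WINT*", events)
# "TORN*" is "TOR" followed by zero or more "N": "STORM" contains "TOR", so it matches too.
torn <- grepl("TORN*", events)

# A fixed string or an anchored pattern matches only what was intended.
winter_fixed <- grepl("WINTER", events, fixed = TRUE)
tornado_anchored <- grepl("^TORNADO", events)

print(rbind(wint, torn, winter_fixed, tornado_anchored))
```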

Creating a column for total crop and property damage amounts

The PROPDMGEXP and CROPDMGEXP fields contain a character code for the magnitude of the PROPDMG and CROPDMG amounts (for example, K for thousands, M for millions and B for billions). The code below converts each code into a numeric multiplier, stored in the new PROPEXP and CROPEXP columns. It then computes the damage amounts PROPTOT and CROPTOT in billions of dollars. Having the actual amounts in their own columns makes it easier to aggregate crop and property damage by EVTYPE.

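# Multipliers for the PROPDMGEXP codes: H/h = hundreds, K = thousands,
# M/m = millions, B = billions; a digit d stands for 10^d
df$PROPEXP[df$PROPDMGEXP == "K"] <- 1000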
df$PROPEXP[df$PROPDMGEXP == "M"] <- 1e+06
df$PROPEXP[df$PROPDMGEXP == ""] <- 1
df$PROPEXP[df$PROPDMGEXP == "B"] <- 1e+09
df$PROPEXP[df$PROPDMGEXP == "m"] <- 1e+06
df$PROPEXP[df$PROPDMGEXP == "0"] <- 1
df$PROPEXP[df$PROPDMGEXP == "5"] <- 1e+05
df$PROPEXP[df$PROPDMGEXP == "6"] <- 1e+06
df$PROPEXP[df$PROPDMGEXP == "4"] <- 10000
df$PROPEXP[df$PROPDMGEXP == "2"] <- 100
df$PROPEXP[df$PROPDMGEXP == "3"] <- 1000
df$PROPEXP[df$PROPDMGEXP == "h"] <- 100
df$PROPEXP[df$PROPDMGEXP == "7"] <- 1e+07
df$PROPEXP[df$PROPDMGEXP == "H"] <- 100
df$PROPEXP[df$PROPDMGEXP == "1"] <- 10
df$PROPEXP[df$PROPDMGEXP == "8"] <- 1e+08
# Assigning '0' to invalid exponent data
df$PROPEXP[df$PROPDMGEXP == "+"] <- 0
df$PROPEXP[df$PROPDMGEXP == "-"] <- 0
df$PROPEXP[df$PROPDMGEXP == "?"] <- 0

# Property damage in billions of dollars
df$PROPTOT <- df$PROPDMG * df$PROPEXP / 1e9


# Multipliers for the CROPDMGEXP codes, using the same scheme
df$CROPEXP[df$CROPDMGEXP == "M"] <- 1e+06
df$CROPEXP[df$CROPDMGEXP == "K"] <- 1000
df$CROPEXP[df$CROPDMGEXP == "m"] <- 1e+06
df$CROPEXP[df$CROPDMGEXP == "B"] <- 1e+09
df$CROPEXP[df$CROPDMGEXP == "0"] <- 1
df$CROPEXP[df$CROPDMGEXP == "k"] <- 1000
df$CROPEXP[df$CROPDMGEXP == "2"] <- 100
df$CROPEXP[df$CROPDMGEXP == ""] <- 1
# Treat the invalid code '?' as zero damage
df$CROPEXP[df$CROPDMGEXP == "?"] <- 0

# Crop damage in billions of dollars
df$CROPTOT <- df$CROPDMG * df$CROPEXP / 1e9
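The same conversion can be written as a single lookup. The sketch below uses a hypothetical helper, exp_multiplier(). It upper-cases the code, so h, k and m behave like H, K and M, and it treats a digit d as 10^d. As above, it maps the invalid codes +, - and ? to zero. Unlike the individual assignments above, it also maps any unlisted code to zero instead of leaving NA. The damage values below are toy numbers:

```r
# Hypothetical helper: turn PROPDMGEXP/CROPDMGEXP codes into numeric multipliers.
exp_multiplier <- function(code) {
  code  <- toupper(as.character(code))
  codes <- c("", "H", "K", "M", "B", as.character(0:8))
  mult  <- c(1, 1e2, 1e3, 1e6, 1e9, 10^(0:8))
  out <- mult[match(code, codes)]
  out[is.na(out)] <- 0  # "+", "-", "?" and any other unknown code count as zero
  out
}

# Toy records: 25K, 2.5M, an amount without a code, and an invalid code.
toy <- data.frame(PROPDMG = c(25, 2.5, 10, 3),
                  PROPDMGEXP = c("K", "M", "", "?"))
toy$PROPTOT <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP) / 1e9  # billions of dollars
print(toy)
```

match() handles the empty code "" like any other string, which a lookup by vector names would not do.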

Results

Sum the property damage by event type, drop event types with no damage, and keep the 10 event types with the largest totals.

p <- aggregate(PROPTOT ~ EVTYPE, df, sum)
p <- p[p$PROPTOT > 0, ]
prop <- p[order(-p$PROPTOT), ]
prop <- prop[1:10, ]

Sum the crop damage by event type in the same way and keep the top 10 event types.

cr <- aggregate(CROPTOT ~ EVTYPE, df, sum)
cr <- cr[cr$CROPTOT > 0, ]
crop <- cr[order(-cr$CROPTOT), ]
crop <- crop[1:10, ]

Sum the fatalities by event type and keep the 10 event types with the most fatalities.

fatalities <- aggregate(FATALITIES ~ EVTYPE, df, sum)
fatalities <- fatalities[fatalities$FATALITIES > 0, ]
fat <- fatalities[order(-fatalities$FATALITIES), ]
fat <- fat[1:10, ]

Sum the injuries by event type and keep the 10 event types with the most injuries.

injuries <- aggregate(INJURIES ~ EVTYPE, df, sum)
injuries <- injuries[injuries$INJURIES > 0, ]
inj <- injuries[order(-injuries$INJURIES), ]
inj <- inj[1:10, ]
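The four aggregation steps follow one pattern. Each aggregates a column by EVTYPE, drops zero totals, sorts in decreasing order and keeps the first ten rows. The sketch below expresses that pattern as a single hypothetical helper, top_n_by_event(), and checks it on toy data:

```r
# Hypothetical helper: the n event types with the largest positive totals of `column`.
top_n_by_event <- function(data, column, n = 10) {
  totals <- aggregate(reformulate("EVTYPE", response = column), data, FUN = sum)
  totals <- totals[totals[[column]] > 0, ]
  head(totals[order(-totals[[column]]), ], n)
}

# Toy records, not NOAA figures.
toy <- data.frame(EVTYPE = c("Tornado", "Flood", "Tornado", "Hail", "Heat"),
                  FATALITIES = c(3, 1, 2, 0, 4),
                  stringsAsFactors = FALSE)
res <- top_n_by_event(toy, "FATALITIES", n = 2)
print(res)
```

With the data frame from this analysis, top_n_by_event(df, "FATALITIES") should return the same rows as fat, and likewise for INJURIES, PROPTOT and CROPTOT.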

Displaying Result Data

Plot Results for Fatalities and Injuries

par(mfrow = c(1, 2), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8,las=3)

barplot(fat$FATALITIES, names.arg = fat$EVTYPE, ylab = "Fatalities", main = strwrap("Top 10 events for Fatalities", 30), col = "green")
barplot(inj$INJURIES, names.arg = inj$EVTYPE, ylab = "Injuries", main = strwrap("Top 10 events for Injuries", 30), col = "blue")

Plot Results for Crop and Property Damage

par(mfrow = c(1, 2), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8, las = 3)

barplot(prop$PROPTOT, names.arg = prop$EVTYPE, ylab = "Property Damage (billions of dollars)", main = strwrap("Top 10 events for Property Damage", 30), col = "green")
barplot(crop$CROPTOT, names.arg = crop$EVTYPE, ylab = "Crop Damage (billions of dollars)", main = strwrap("Top 10 events for Crop Damage", 30), col = "blue")

Frequency of Events

Count the number of records for each cleaned event type to see which events occur most often.

fq <- table(df$EVTYPE)
dfq <- data.frame(fq)
dfq<- dfq[order(-dfq$Freq),]
dfq[1:28,]
##                      Var1   Freq
## 139          Winter Storm 400553
## 35                   Hail 289274
## 30                  Flood  85249
## 122               Tornado  61120
## 52              Lightning  16390
## 38             Heavy Rain  11905
## 33           Funnel Cloud   6985
## 138              Wildfire   4237
## 134            Waterspout   3849
## 49              Ice Storm   3056
## 22                Drought   2847
## 9                Blizzard   2774
## 13        Cold/Wind Chill   2473
## 32           Frost/Freeze   1886
## 26         Excessive Heat   1713
## 37                   Heat   1656
## 19              Dense Fog   1296
## 12          Coastal Flood    866
## 62            Rip Current    786
## 41              High Surf    761
## 31                    FOG    538
## 24             Dust Storm    429
## 5               Avalanche    388
## 48    Hurricane (Typhoon)    296
## 4   Astronomical Low Tide    174
## 23             Dust Devil    151
## 65                  Sleet    115
## 121     Thunderstorm Wind     96
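When a table() result is converted to a data frame, its counts are labeled with a generic Var1 column. Naming the dimension keeps the column name meaningful. The sketch below uses a hypothetical helper, event_counts(), on toy event labels:

```r
# Hypothetical helper: record counts per event type, most frequent first.
event_counts <- function(evtype) {
  counts <- as.data.frame(table(EVTYPE = evtype), stringsAsFactors = FALSE)
  counts[order(-counts$Freq), ]
}

toy_events <- c("Hail", "Flood", "Hail", "Tornado", "Hail", "Flood")
res <- event_counts(toy_events)
print(res)
```

Applied to this analysis, event_counts(df$EVTYPE) should give the same counts as dfq, with the event column named EVTYPE instead of Var1.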

Conclusion

Tornadoes are by far the leading cause of injury and death. Floods caused the most property and crop damage combined, while drought caused the most crop damage alone.
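The combined property-and-crop ranking can be checked by adding PROPTOT and CROPTOT before aggregating, a step the code in this report does not perform explicitly. The sketch below uses a hypothetical helper, combined_damage(), on toy values in billions of dollars:

```r
# Hypothetical helper: rank event types by property plus crop damage.
combined_damage <- function(data, n = 10) {
  data$TOTDMG <- data$PROPTOT + data$CROPTOT
  totals <- aggregate(TOTDMG ~ EVTYPE, data, FUN = sum)
  head(totals[order(-totals$TOTDMG), ], n)
}

# Toy values in billions of dollars, not figures from the NOAA data.
toy <- data.frame(EVTYPE  = c("Flood", "Drought", "Flood", "Tornado"),
                  PROPTOT = c(2.0, 0.1, 1.0, 2.5),
                  CROPTOT = c(0.5, 1.5, 0.0, 0.1),
                  stringsAsFactors = FALSE)
res <- combined_damage(toy)
print(res)
```

With the full data, combined_damage(df) should list the event types in order of total damage.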

Fatalities

Tornadoes caused 5,660 fatalities, followed by excessive heat (1,920), winter storms (1,649), floods (1,546) and heat (1,281).

Injuries

Tornadoes caused 91,450 injuries, followed by winter storms (14,516), floods (8,676) and excessive heat (6,525).

Property Damage

Floods caused by far the most property damage, about $167 billion, followed by tornadoes ($104 billion) and hurricanes ($85 billion).

Crop Damage

Drought and floods caused much more crop damage than any other event type: about $13 billion and $12 billion, respectively. Ice storms and hurricanes, at about $5 billion each, were next.

Frequency

Among the major causes of injury, death, and property or crop damage, excessive heat (1,713 records) and hurricanes (296 records) occur much less frequently than tornadoes (61,120 records), winter storms (400,553) and floods (85,249).

The large differences in how often these devastating events occur suggest that different emergency and readiness plans are needed for different event types.