This analysis takes a brief look at the most damaging event types documented in the National Weather Service Storm Data File. First, I classified more than 99.5% of the entries in the database into the 48 weather event types described in the Storm Data Documentation. Following this cleanup, I looked at how often each of the 48 event types occurs. I then charted the most damaging event types in terms of four factors: injuries, fatalities, crop damage, and property damage. Based on these charts, I draw conclusions about which types of events are most damaging overall.
The CSV file was downloaded from the “Storm Data File” link above, placed in the working directory, and read into R with the following code:
stormdata <- read.csv("repdata-data-StormData.csv")
Events in this file are classified by the EVTYPE column. This column was initially very messy: it contains far more unique values than the 48 event types the events should be classified into.
length(unique(stormdata$EVTYPE)) # Number of distinct raw event labels
## [1] 985
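A note on reproducing this count: since R 4.0.0, read.csv() defaults to stringsAsFactors = FALSE, so EVTYPE is read as a character vector and levels() on it returns NULL (length 0). The sketch below uses a tiny hypothetical CSV, not the Storm Data file, to show that counting distinct values with unique() gives the same answer whether or not the column is a factor.

```r
# Hypothetical toy CSV standing in for the Storm Data file
csv_text <- "EVTYPE,FATALITIES\nTSTM WIND,0\nTHUNDERSTORM WIND,1\nTSTM WIND,0\nHAIL,0\n"

as_char   <- read.csv(text = csv_text)                           # character column under the R >= 4.0 default
as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)  # factor column

levels(as_char$EVTYPE)             # NULL when EVTYPE is a character vector
length(unique(as_char$EVTYPE))     # 3 distinct labels
length(levels(as_factor$EVTYPE))   # 3 levels
length(unique(as_factor$EVTYPE))   # 3 distinct labels
```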
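After reading the descriptions of the 48 event types in the Storm Data Documentation, I wrote a regular expression (plus, where needed, an exclusion pattern) for each type and used them to fill a new column, EVTYPE2, with my classifications. Each assignment overwrites any label set by an earlier line, so the order matters: for example, Freezing Fog is assigned after the general Fog pattern, and an entry mentioning both "Hurricane" and "High Wind" ends up as Hurricane(Typhoon) because that pattern is applied later.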
stormdata$EVTYPE2 <- NA #Creates EVTYPE2 column
#Classifying the various weather types
stormdata$EVTYPE2[which(grepl("High Wind", stormdata$EVTYPE, ignore.case=TRUE) & !grepl("Marine", stormdata$EVTYPE, ignore.case=TRUE))] <- "High Wind"
stormdata$EVTYPE2[which(grepl("Strong Wind", stormdata$EVTYPE, ignore.case=TRUE) & !grepl("Marine", stormdata$EVTYPE, ignore.case=TRUE))] <- "Strong Wind"
stormdata$EVTYPE2[which(grepl("Thunderstorm|TSTM", stormdata$EVTYPE, ignore.case=TRUE) & !grepl("Marine", stormdata$EVTYPE, ignore.case=TRUE))] <- "Thunderstorm Wind"
stormdata$EVTYPE2[grep("Astronomical Low Tide", stormdata$EVTYPE, ignore.case=TRUE)] <- "Astronomical Low Tide"
stormdata$EVTYPE2[grep("Avalanche", stormdata$EVTYPE, ignore.case=TRUE)] <- "Avalanche"
stormdata$EVTYPE2[grep("Blizzard", stormdata$EVTYPE, ignore.case=TRUE)] <- "Blizzard"
stormdata$EVTYPE2[which(grepl("Coastal", stormdata$EVTYPE, ignore.case=TRUE) & grepl("Flood", stormdata$EVTYPE, ignore.case=TRUE))] <- "Coastal Flood"
stormdata$EVTYPE2[which(grepl("Wind Chill|Windchill", stormdata$EVTYPE, ignore.case=TRUE) & !grepl("Extreme|Excessive|Record|Severe|Unseasonabl|Unusual", stormdata$EVTYPE, ignore.case=TRUE))] <- "Cold/Wind Chill"
stormdata$EVTYPE2[grep("Debris Flow|Debrisflow|Landslide|Land Slide|Mudslide|Mud Slide", stormdata$EVTYPE, ignore.case=TRUE)] <- "Debris Flow"
stormdata$EVTYPE2[grep("Fog", stormdata$EVTYPE, ignore.case=TRUE)] <- "Dense Fog"
stormdata$EVTYPE2[grep("Smoke", stormdata$EVTYPE, ignore.case=TRUE)] <- "Dense Smoke"
stormdata$EVTYPE2[grep("Drought", stormdata$EVTYPE, ignore.case=TRUE)] <- "Drought"
stormdata$EVTYPE2[grep("Dust Devil|Dust Devel|Dustdevil|Dustdevel", stormdata$EVTYPE, ignore.case=TRUE)] <- "Dust Devil"
stormdata$EVTYPE2[grep("Dust Storm|Duststorm", stormdata$EVTYPE, ignore.case=TRUE)] <- "Dust storm"
stormdata$EVTYPE2[which(grepl("Heat", stormdata$EVTYPE, ignore.case=TRUE) & !grepl("Drought", stormdata$EVTYPE, ignore.case=TRUE) & grepl("Extreme|Excessive|Record|Severe|Unseasonabl|Unusual", stormdata$EVTYPE, ignore.case=TRUE))] <- "Excessive Heat"
stormdata$EVTYPE2[which(grepl("Cold", stormdata$EVTYPE, ignore.case=TRUE) & grepl("Extreme|Excessive|Record|Severe|Unseasonabl|Unusual", stormdata$EVTYPE, ignore.case=TRUE))] <- "Extreme Cold/Wind Chill"
stormdata$EVTYPE2[grep("Flashflood|Flash Flood", stormdata$EVTYPE, ignore.case=TRUE)] <- "Flash Flood"
stormdata$EVTYPE2[which(grepl("Flood|FLD", stormdata$EVTYPE, ignore.case=TRUE) & !grepl("Flash|Coastal|Lakeshore", stormdata$EVTYPE, ignore.case=TRUE))] <- "Flood"
stormdata$EVTYPE2[which(grepl("Frost|Freeze", stormdata$EVTYPE, ignore.case=TRUE) & !grepl("Extreme|Excessive|Record|Severe|Unseasonabl|Unusual", stormdata$EVTYPE, ignore.case=TRUE))] <- "Frost/Freeze"
stormdata$EVTYPE2[grep("Funnel", stormdata$EVTYPE, ignore.case=TRUE)] <- "Funnel Cloud"
stormdata$EVTYPE2[grep("Freezing Fog", stormdata$EVTYPE, ignore.case=TRUE)] <- "Freezing Fog"
stormdata$EVTYPE2[which(grepl("Hail", stormdata$EVTYPE, ignore.case=TRUE) & !grepl("Funnel|Marine|Thunder|Flood|TSTM", stormdata$EVTYPE, ignore.case=TRUE))] <- "Hail"
stormdata$EVTYPE2[which(grepl("Heat", stormdata$EVTYPE, ignore.case=TRUE) & !grepl("Drought", stormdata$EVTYPE, ignore.case=TRUE) & !grepl("Extreme|Excessive|Record|Severe|Unseasonabl|Unusual", stormdata$EVTYPE, ignore.case=TRUE))] <- "Heat"
stormdata$EVTYPE2[which(grepl("Heavy Rain", stormdata$EVTYPE, ignore.case=TRUE) & !grepl("Snow|Lightning", stormdata$EVTYPE, ignore.case=TRUE))] <- "Heavy Rain"
stormdata$EVTYPE2[which(grepl("Heavy", stormdata$EVTYPE, ignore.case=TRUE) & grepl("Snow", stormdata$EVTYPE, ignore.case=TRUE)& !grepl("Lightning", stormdata$EVTYPE, ignore.case=TRUE))] <- "Heavy Snow"
stormdata$EVTYPE2[grep("Surf", stormdata$EVTYPE, ignore.case=TRUE)] <- "High Surf"
stormdata$EVTYPE2[grep("Hurricane|Typhoon", stormdata$EVTYPE, ignore.case=TRUE)] <- "Hurricane(Typhoon)"
stormdata$EVTYPE2[grep("Ice Storm|Icestorm", stormdata$EVTYPE, ignore.case=TRUE)] <- "Ice Storm"
stormdata$EVTYPE2[grep("Lake-Effect Snow|Lake Effect Snow|Lake Snow", stormdata$EVTYPE, ignore.case=TRUE)] <- "Lake-Effect Snow"
stormdata$EVTYPE2[grep("Lakeshore Flood", stormdata$EVTYPE, ignore.case=TRUE)] <- "Lakeshore Flood"
stormdata$EVTYPE2[grep("Lightning", stormdata$EVTYPE, ignore.case=TRUE)] <- "Lightning"
stormdata$EVTYPE2[grep("Marine Hail", stormdata$EVTYPE, ignore.case=TRUE)] <- "Marine Hail"
stormdata$EVTYPE2[grep("Marine High Wind", stormdata$EVTYPE, ignore.case=TRUE)] <- "Marine High Wind"
stormdata$EVTYPE2[grep("Marine Strong Wind", stormdata$EVTYPE, ignore.case=TRUE)] <- "Marine Strong Wind"
stormdata$EVTYPE2[grep("Marine Thunderstorm Wind|Marine TSTM Wind", stormdata$EVTYPE, ignore.case=TRUE)] <- "Marine Thunderstorm Wind"
stormdata$EVTYPE2[grep("Rip Current", stormdata$EVTYPE, ignore.case=TRUE)] <- "Rip Current"
stormdata$EVTYPE2[grep("Seiche", stormdata$EVTYPE, ignore.case=TRUE)] <- "Seiche"
stormdata$EVTYPE2[grep("Sleet", stormdata$EVTYPE, ignore.case=TRUE)] <- "Sleet"
stormdata$EVTYPE2[grep("Storm Surge", stormdata$EVTYPE, ignore.case=TRUE)] <- "Storm Surge/Tide"
stormdata$EVTYPE2[which(grepl("Tornado", stormdata$EVTYPE, ignore.case=TRUE) & !grepl("Waterspout", stormdata$EVTYPE, ignore.case=TRUE))] <- "Tornado"
stormdata$EVTYPE2[grep("Tropical Depression", stormdata$EVTYPE, ignore.case=TRUE)] <- "Tropical Depression"
stormdata$EVTYPE2[grep("Tropical Storm", stormdata$EVTYPE, ignore.case=TRUE)] <- "Tropical Storm"
stormdata$EVTYPE2[grep("Tsunami", stormdata$EVTYPE, ignore.case=TRUE)] <- "Tsunami"
stormdata$EVTYPE2[grep("Volcanic Ash", stormdata$EVTYPE, ignore.case=TRUE)] <- "Volcanic Ash"
stormdata$EVTYPE2[grep("Waterspout", stormdata$EVTYPE, ignore.case=TRUE)] <- "Waterspout"
stormdata$EVTYPE2[which(grepl("Wild", stormdata$EVTYPE, ignore.case=TRUE) & grepl("Fire", stormdata$EVTYPE, ignore.case=TRUE))] <- "Wildfire"
stormdata$EVTYPE2[grep("Winter Storm", stormdata$EVTYPE, ignore.case=TRUE)] <- "Winter Storm"
stormdata$EVTYPE2[grep("Winter Weather|Wintry Mix|Wintery Mix", stormdata$EVTYPE, ignore.case=TRUE)] <- "Winter Weather"
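Most of the assignments above follow one pattern: match a regular expression, optionally exclude a second one, and overwrite EVTYPE2. A hypothetical table-driven helper, classify_events(), makes that pattern and the importance of order explicit. This is a sketch with only four of the 48 rules, not the code used for the results below.

```r
# Hypothetical helper: apply (pattern, exclude, label) rules in order.
# As in the original assignments, a later matching rule overwrites an earlier label.
classify_events <- function(evtype, rules) {
  out <- rep(NA_character_, length(evtype))
  for (i in seq_len(nrow(rules))) {
    hit <- grepl(rules$pattern[i], evtype, ignore.case = TRUE)
    if (!is.na(rules$exclude[i])) {
      hit <- hit & !grepl(rules$exclude[i], evtype, ignore.case = TRUE)
    }
    out[hit] <- rules$label[i]
  }
  out
}

# Four of the 48 rules, in the same relative order as the code above
rules <- data.frame(
  pattern = c("Thunderstorm|TSTM", "Fog", "Freezing Fog",
              "Marine Thunderstorm Wind|Marine TSTM Wind"),
  exclude = c("Marine", NA, NA, NA),
  label   = c("Thunderstorm Wind", "Dense Fog", "Freezing Fog",
              "Marine Thunderstorm Wind"),
  stringsAsFactors = FALSE
)

demo_labels <- classify_events(
  c("TSTM WIND", "freezing fog", "FOG", "MARINE TSTM WIND", "HAIL"), rules)
demo_labels
# "Thunderstorm Wind" "Freezing Fog" "Dense Fog" "Marine Thunderstorm Wind" NA
```

Rules that need two positive matches (such as Coastal plus Flood, or Wild plus Fire) would need an extra column; the helper is deliberately minimal.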
To check that nearly every event was classified into one of the 48 types, I compared the total number of rows with the number of rows whose EVTYPE2 is still NA. Fewer than 0.5% of the entries remained unclassified.
nrow(stormdata) #Total number of entries
## [1] 902297
sum(is.na(stormdata$EVTYPE2)) # Total number of unclassified entries
## [1] 3922
mean(is.na(stormdata$EVTYPE2)) # Proportion (not percent) of all entries that are unclassified
## [1] 0.004347
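A natural next step, not part of the original analysis, is to list which raw EVTYPE strings are left over and how often each occurs, since the most frequent leftovers are the best candidates for a new or widened pattern. A sketch on hypothetical toy vectors:

```r
# Hypothetical toy data: raw EVTYPE strings and the EVTYPE2 labels assigned to them
evtype  <- c("TSTM WIND", "OTHER", "HAIL", "OTHER", "SUMMARY JAN 17", "NONE")
evtype2 <- c("Thunderstorm Wind", NA, "Hail", NA, NA, NA)

share_unclassified <- mean(is.na(evtype2))   # 4 of 6 entries in this toy example
leftovers <- sort(table(evtype[is.na(evtype2)]), decreasing = TRUE)
leftovers                                    # most frequent unclassified strings first
```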
Next, I used the plyr package to build a table with one row per event type, giving its total number of entries in the database and its average number of injuries, average number of fatalities, average crop damage, and average property damage per event.
library(plyr)
# One row per event type: number of events plus per-event averages
stormtable <- ddply(stormdata, .(EVTYPE2), summarize, Events=length(EVTYPE2), Injuries=mean(INJURIES), Fatalities=mean(FATALITIES), CropDamage=mean(CROPDMG), PropertyDamage=mean(PROPDMG))
stormtable <- stormtable[!is.na(stormtable$EVTYPE2),] # Removes the row of unclassified (NA) entries
stormtable
## EVTYPE2 Events Injuries Fatalities CropDamage PropertyDamage
## 1 Astronomical Low Tide 174 0.0000000 0.000e+00 0.000000 1.83908
## 2 Avalanche 386 0.4404145 5.803e-01 0.000000 4.20699
## 3 Blizzard 2735 0.2943327 3.693e-02 0.062888 9.31206
## 4 Coastal Flood 851 0.0082256 7.051e-03 0.065805 21.21150
## 5 Cold/Wind Chill 574 0.0209059 1.655e-01 1.045296 3.46690
## 6 Debris Flow 643 0.0855365 6.843e-02 0.057543 31.17114
## 7 Dense Fog 1837 0.5862820 4.409e-02 0.000000 9.29519
## 8 Dense Smoke 21 0.0000000 0.000e+00 0.000000 4.76190
## 9 Drought 2512 0.0075637 2.389e-03 13.516879 1.71141
## 10 Dust Devil 150 0.2866667 1.333e-02 0.000000 4.79087
## 11 Dust storm 430 1.0232558 5.116e-02 4.887209 11.85930
## 12 Excessive Heat 1786 3.7681971 1.130e+00 0.279619 0.82033
## 13 Extreme Cold/Wind Chill 1770 0.1440678 1.638e-01 3.538836 6.13957
## 14 Flash Flood 55665 0.0323363 1.859e-02 3.350116 26.48605
## 15 Flood 29557 0.2325676 1.729e-02 6.103032 32.75649
## 16 Freezing Fog 46 0.0000000 0.000e+00 0.000000 4.00000
## 17 Frost/Freeze 1510 0.0019868 1.325e-03 6.384377 1.12419
## 18 Funnel Cloud 6990 0.0004292 0.000e+00 0.000000 0.02856
## 19 Hail 288839 0.0047466 5.193e-05 2.013123 2.38828
## 20 Heat 845 2.9337278 1.318e+00 1.086391 1.85533
## 21 Heavy Rain 11812 0.0215882 8.381e-03 1.019793 4.62624
## 22 Heavy Snow 15798 0.0655146 8.166e-03 0.138354 7.91671
## 23 High Surf 1063 0.2314205 1.515e-01 0.001411 6.12758
## 24 High Wind 21782 0.0691397 1.345e-02 0.966294 17.53033
## 25 Hurricane(Typhoon) 299 4.4581940 4.515e-01 38.922375 84.23629
## 26 Ice Storm 2032 0.9803150 4.380e-02 0.831176 32.77592
## 27 Lake-Effect Snow 684 0.0000000 0.000e+00 0.000000 21.50292
## 28 Lakeshore Flood 23 0.0000000 0.000e+00 0.000000 2.06522
## 29 Lightning 15776 0.3316430 5.179e-02 0.227283 38.26764
## 30 Marine Hail 442 0.0000000 0.000e+00 0.000000 0.00905
## 31 Marine High Wind 135 0.0074074 7.407e-03 0.000000 2.20748
## 32 Marine Strong Wind 48 0.4583333 2.917e-01 0.000000 8.71521
## 33 Marine Thunderstorm Wind 11987 0.0028364 1.585e-03 0.004171 0.23863
## 34 Rip Current 777 0.6808237 7.426e-01 0.000000 0.20978
## 35 Seiche 21 0.0000000 0.000e+00 0.000000 46.66667
## 36 Sleet 122 0.0000000 1.639e-02 0.000000 5.74836
## 37 Storm Surge/Tide 409 0.1051345 5.868e-02 2.090465 63.98665
## 38 Strong Wind 3776 0.0797140 2.940e-02 0.429529 17.01041
## 39 Thunderstorm Wind 324794 0.0292770 2.186e-03 0.613636 8.23312
## 40 Tornado 60686 1.5055202 9.323e-02 1.648309 52.96994
## 41 Tropical Depression 60 0.0000000 0.000e+00 0.000000 12.30000
## 42 Tropical Storm 697 0.5494978 9.469e-02 9.275638 71.63943
## 43 Tsunami 20 6.4500000 1.650e+00 1.000000 45.26500
## 44 Volcanic Ash 27 0.0000000 0.000e+00 0.000000 18.51852
## 45 Waterspout 3861 0.0186480 1.554e-03 0.000000 2.79220
## 46 Wildfire 4231 0.3795793 2.127e-02 2.142694 29.57889
## 47 Winter Storm 11441 0.1182589 1.897e-02 0.217113 11.73679
## 48 Winter Weather 8251 0.0745364 7.514e-03 0.001818 2.05077
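The same per-type summary can be sketched in base R without plyr, using aggregate() for the means and table() for the counts. Both drop NA groups by default, which avoids relying on the unclassified row being the last one. The data frame below is hypothetical toy data that only borrows the column names used above.

```r
# Hypothetical toy data with the same column names as stormdata
toy <- data.frame(
  EVTYPE2    = c("Hail", "Hail", "Tornado", "Tornado", "Tornado", NA),
  INJURIES   = c(0, 1, 5, 0, 1, 2),
  FATALITIES = c(0, 0, 1, 0, 0, 0),
  CROPDMG    = c(2, 0, 0, 3, 0, 0),
  PROPDMG    = c(1, 1, 50, 10, 0, 4),
  stringsAsFactors = FALSE
)

# Per-type means; the formula method omits rows with NA (including NA EVTYPE2)
means <- aggregate(cbind(INJURIES, FATALITIES, CROPDMG, PROPDMG) ~ EVTYPE2,
                   data = toy, FUN = mean)
# Per-type event counts; table() also skips NA by default
counts <- as.data.frame(table(EVTYPE2 = toy$EVTYPE2),
                        responseName = "Events", stringsAsFactors = FALSE)

summary_tbl <- merge(counts, means, by = "EVTYPE2")
summary_tbl
```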
Finally, I created five additional data frames, each a copy of stormtable sorted in descending order by a different column.
sortbycount <- stormtable[order(-stormtable$Events),]
sortbyfatalities <- stormtable[order(-stormtable$Fatalities),]
sortbyinjuries <- stormtable[order(-stormtable$Injuries),]
sortbypropdmg <- stormtable[order(-stormtable$PropertyDamage),]
sortbycropdmg <- stormtable[order(-stormtable$CropDamage),]
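Each sorted copy is only used for its first ten rows in the plots. A hypothetical helper, top_by(), captures that step once; order(x, decreasing = TRUE) sorts numeric columns the same way as order(-x) and, unlike negation, also works for character columns. The four fatality averages are copied from stormtable above.

```r
# Hypothetical helper: the n rows of tbl with the largest values in column col
top_by <- function(tbl, col, n = 10) {
  sorted <- tbl[order(tbl[[col]], decreasing = TRUE), ]
  head(sorted, n)
}

# Four rows of stormtable (average fatalities per event)
fatality_tbl <- data.frame(
  EVTYPE2    = c("Hail", "Tornado", "Tsunami", "Heat"),
  Fatalities = c(5.193e-05, 9.323e-02, 1.650, 1.318),
  stringsAsFactors = FALSE
)

top_by(fatality_tbl, "Fatalities", n = 2)   # Tsunami, then Heat
```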
Most events were classified as either Thunderstorm Wind or Hail; together these two types make up about two thirds of all reported events. Tornadoes and flood-related events (Flash Flood and Flood) are also common.
#PLOT 1
par(mar=c(6,10,4,2),cex.axis=0.75)
barplot(sortbycount$Events[10:1],names.arg=sortbycount$EVTYPE2[10:1], horiz=TRUE, col="blue", las=2, cex.names=0.75)
title(main="Top 10 most common weather events",line=2)
title(xlab="Number of recorded events",line=4)
title(ylab="Type of event",line=6)
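The “about two thirds” figure can be checked from the Events column of stormtable. Note that the denominator here is every row in the file, including the 3,922 unclassified entries.

```r
# Event counts copied from stormtable above
thunderstorm_wind <- 324794
hail              <- 288839
total_entries     <- 902297   # nrow(stormdata), including unclassified entries

share <- (thunderstorm_wind + hail) / total_entries
round(share, 2)   # 0.68, i.e. about two thirds
```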
Per event, tsunamis are the most damaging to population health, ranking first in both injuries and fatalities, although these averages rest on only 20 recorded tsunamis. Heat and excessive heat are also highly fatal and cause many injuries. Interestingly, heat that is not classified as excessive has a higher average fatality rate than excessive heat, perhaps because the “Heat” category includes prolonged periods of heat, which are likely to cause more deaths simply by lasting longer. Hurricanes and rip currents rank highly on both the injury and fatality lists. Tornadoes, dust storms, and ice storms cause many injuries but few fatalities. Avalanches, cold/wind chill, and marine strong winds cause many fatalities but comparatively few non-fatal injuries.
#PLOT 2
par(mar=c(4,10,3,2),cex.axis=0.75, mfcol=c(2,1))
barplot(sortbyinjuries$Injuries[10:1],names.arg=sortbyinjuries$EVTYPE2[10:1], horiz=TRUE, col="salmon", las=2, cex.names=0.75)
title(main="Top 10 worst events by injuries",line=1)
title(xlab="Average injuries per event",line=2)
title(ylab="Type of event",line=8)
barplot(sortbyfatalities$Fatalities[10:1],names.arg=sortbyfatalities$EVTYPE2[10:1], horiz=TRUE, col="red", las=2, cex.names=0.75)
title(main="Top 10 worst events by fatalities",line=1)
title(xlab="Average fatalities per event",line=2)
title(ylab="Type of event",line=8)
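Because these rankings use per-event averages, a type with few recorded events can top a list on a small sample: Tsunami leads both lists with only 20 events. One possible check, not used in the plots above, is to rank only types with at least some minimum number of events. The min_events threshold is a hypothetical parameter, and the four rows are copied from stormtable.

```r
# Four rows of stormtable: event counts and average injuries per event
health_tbl <- data.frame(
  EVTYPE2  = c("Tsunami", "Hurricane(Typhoon)", "Excessive Heat", "Heat"),
  Events   = c(20, 299, 1786, 845),
  Injuries = c(6.4500000, 4.4581940, 3.7681971, 2.9337278),
  stringsAsFactors = FALSE
)

# Hypothetical filter: ignore types with fewer than min_events recorded events,
# then return the remaining type names in descending order of col
rank_with_min_events <- function(tbl, col, min_events) {
  kept <- tbl[tbl$Events >= min_events, ]
  kept[order(kept[[col]], decreasing = TRUE), "EVTYPE2"]
}

rank_with_min_events(health_tbl, "Injuries", min_events = 0)    # Tsunami first
rank_with_min_events(health_tbl, "Injuries", min_events = 100)  # Hurricane(Typhoon) first
```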
Hurricanes inflict the most property damage per event and, by far, the most crop damage. Tropical storms are the next most damaging to property, and they also cause high levels of crop damage. Drought inflicts a lot of crop damage but little property damage. Floods, which are among the most common events, cause high levels of both crop and property damage. Finally, storm surges and tornadoes rank highly on the list of event types most damaging to property.
#PLOT 3
par(mar=c(4,10,3,2),cex.axis=0.75, mfcol=c(2,1))
barplot(sortbycropdmg$CropDamage[10:1],names.arg=sortbycropdmg$EVTYPE2[10:1], horiz=TRUE, col="orange", las=2, cex.names=0.75)
title(main="Top 10 worst events by crop damage",line=1)
title(xlab="Average crop damage per event",line=2)
title(ylab="Type of event",line=8)
barplot(sortbypropdmg$PropertyDamage[10:1],names.arg=sortbypropdmg$EVTYPE2[10:1], horiz=TRUE, col="green", las=2, cex.names=0.75)
title(main="Top 10 worst events by property damage",line=1)
title(xlab="Average property damage per event",line=2)
title(ylab="Type of event",line=8)
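The damage averages above use the PROPDMG and CROPDMG values as stored. The Storm Data Documentation describes damage estimates as a number followed by a magnitude letter (“K” for thousands, “M” for millions, “B” for billions), so averages in dollars would need that letter applied first. The sketch below assumes the letter is stored in separate columns (PROPDMGEXP and CROPDMGEXP in this file, an assumption worth checking with names(stormdata)) and treats any other code as a multiplier of 1, which is a simplification.

```r
# Hypothetical helper: convert a damage value plus magnitude code into dollars.
# Assumes codes "K", "M", "B" (either case); any other non-missing code is
# treated as a multiplier of 1, and a missing code gives NA.
to_dollars <- function(value, exp_code) {
  code <- toupper(trimws(as.character(exp_code)))
  multiplier <- ifelse(code == "K", 1e3,
                ifelse(code == "M", 1e6,
                ifelse(code == "B", 1e9, 1)))
  value * multiplier
}

dollars <- to_dollars(c(25, 2.5, 1.55, 10), c("K", "m", "B", ""))
dollars   # 25 thousand, 2.5 million, 1.55 billion, and 10 dollars
```

For example, stormdata$PropDollars <- to_dollars(stormdata$PROPDMG, stormdata$PROPDMGEXP) could then be averaged in the ddply() call in place of PROPDMG.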