Analysis of Weather Events by Population and Economic Damage

Synopsis

This analysis takes a brief look at the most damaging events documented in the National Weather Service Storm Data File. First, 99.5% of the entries in the database are classified into one of the 48 weather event types described in the Storm Data Documentation. Following this cleanup, I look at the frequency of each of the 48 event types. I then chart the most damaging event types in terms of four factors, each averaged per event: injuries, fatalities, crop damage, and property damage. Based on these charts, I draw conclusions about which types of events are most damaging overall.

Loading the data

The CSV file was downloaded from the “Storm Data File” link above and placed in the working directory. It was then read into R using the following code:

stormdata <- read.csv("repdata-data-StormData.csv")

Data Processing

The events in this document were classified according to the EVTYPE column. However, the EVTYPE column was initially very messy: it contains far more unique values than the 48 types into which the events should be classified.

length(unique(stormdata$EVTYPE)) # Number of distinct EVTYPE values (works for factor or character columns)
## [1] 985

So, after reading through the descriptions of the 48 event types in the Storm Data Documentation, I wrote regular expressions for each of the 48 types to use in classification. Using these regular expressions, I created a new column, EVTYPE2, holding the new classifications. The assignments run in sequence, so when an entry matches more than one pattern, the last matching assignment determines its type.

stormdata$EVTYPE2 <- NA #Creates EVTYPE2 column

#Classifying the various weather types

stormdata$EVTYPE2[which(grepl("High Wind", stormdata$EVTYPE, ignore.case=TRUE) & !grepl("Marine", stormdata$EVTYPE, ignore.case=TRUE))] <- "High Wind"
stormdata$EVTYPE2[which(grepl("Strong Wind", stormdata$EVTYPE, ignore.case=TRUE) & !grepl("Marine", stormdata$EVTYPE, ignore.case=TRUE))] <- "Strong Wind"
stormdata$EVTYPE2[which(grepl("Thunderstorm|TSTM", stormdata$EVTYPE, ignore.case=TRUE) & !grepl("Marine", stormdata$EVTYPE, ignore.case=TRUE))] <- "Thunderstorm Wind"
stormdata$EVTYPE2[grep("Astronomical Low Tide", stormdata$EVTYPE, ignore.case=TRUE)] <- "Astronomical Low Tide"
stormdata$EVTYPE2[grep("Avalanche", stormdata$EVTYPE, ignore.case=TRUE)] <- "Avalanche" 
stormdata$EVTYPE2[grep("Blizzard", stormdata$EVTYPE, ignore.case=TRUE)] <- "Blizzard"
stormdata$EVTYPE2[which(grepl("Coastal", stormdata$EVTYPE, ignore.case=TRUE) & grepl("Flood", stormdata$EVTYPE, ignore.case=TRUE))] <- "Coastal Flood"
stormdata$EVTYPE2[which(grepl("Wind Chill|Windchill", stormdata$EVTYPE, ignore.case=TRUE) & !grepl("Extreme|Excessive|Record|Severe|Unseasonabl|Unusual", stormdata$EVTYPE, ignore.case=TRUE))] <- "Cold/Wind Chill"
stormdata$EVTYPE2[grep("Debris Flow|Debrisflow|Landslide|Land Slide|Mudslide|Mud Slide", stormdata$EVTYPE, ignore.case=TRUE)] <- "Debris Flow" 
stormdata$EVTYPE2[grep("Fog", stormdata$EVTYPE, ignore.case=TRUE)] <- "Dense Fog"
stormdata$EVTYPE2[grep("Smoke", stormdata$EVTYPE, ignore.case=TRUE)] <- "Dense Smoke"
stormdata$EVTYPE2[grep("Drought", stormdata$EVTYPE, ignore.case=TRUE)] <- "Drought"
stormdata$EVTYPE2[grep("Dust Devil|Dust Devel|Dustdevil|Dustdevel", stormdata$EVTYPE, ignore.case=TRUE)] <- "Dust Devil"
stormdata$EVTYPE2[grep("Dust Storm|Duststorm", stormdata$EVTYPE, ignore.case=TRUE)] <- "Dust storm"
stormdata$EVTYPE2[which(grepl("Heat", stormdata$EVTYPE, ignore.case=TRUE) & !grepl("Drought", stormdata$EVTYPE, ignore.case=TRUE) & grepl("Extreme|Excessive|Record|Severe|Unseasonabl|Unusual", stormdata$EVTYPE, ignore.case=TRUE))] <- "Excessive Heat"
stormdata$EVTYPE2[which(grepl("Cold", stormdata$EVTYPE, ignore.case=TRUE) & grepl("Extreme|Excessive|Record|Severe|Unseasonabl|Unusual", stormdata$EVTYPE, ignore.case=TRUE))] <- "Extreme Cold/Wind Chill"
stormdata$EVTYPE2[grep("Flashflood|Flash Flood", stormdata$EVTYPE, ignore.case=TRUE)] <- "Flash Flood"
stormdata$EVTYPE2[which(grepl("Flood|FLD", stormdata$EVTYPE, ignore.case=TRUE) & !grepl("Flash|Coastal|Lakeshore", stormdata$EVTYPE, ignore.case=TRUE))] <- "Flood"
stormdata$EVTYPE2[which(grepl("Frost|Freeze", stormdata$EVTYPE, ignore.case=TRUE) & !grepl("Extreme|Excessive|Record|Severe|Unseasonabl|Unusual", stormdata$EVTYPE, ignore.case=TRUE))] <- "Frost/Freeze"
stormdata$EVTYPE2[grep("Funnel", stormdata$EVTYPE, ignore.case=TRUE)] <- "Funnel Cloud"
stormdata$EVTYPE2[grep("Freezing Fog", stormdata$EVTYPE, ignore.case=TRUE)] <- "Freezing Fog"
stormdata$EVTYPE2[which(grepl("Hail", stormdata$EVTYPE, ignore.case=TRUE) & !grepl("Funnel|Marine|Thunder|Flood|TSTM", stormdata$EVTYPE, ignore.case=TRUE))] <- "Hail"
stormdata$EVTYPE2[which(grepl("Heat", stormdata$EVTYPE, ignore.case=TRUE) & !grepl("Drought", stormdata$EVTYPE, ignore.case=TRUE) & !grepl("Extreme|Excessive|Record|Severe|Unseasonabl|Unusual", stormdata$EVTYPE, ignore.case=TRUE))] <- "Heat"
stormdata$EVTYPE2[which(grepl("Heavy Rain", stormdata$EVTYPE, ignore.case=TRUE) & !grepl("Snow|Lightning", stormdata$EVTYPE, ignore.case=TRUE))] <- "Heavy Rain"
stormdata$EVTYPE2[which(grepl("Heavy", stormdata$EVTYPE, ignore.case=TRUE) & grepl("Snow", stormdata$EVTYPE, ignore.case=TRUE)& !grepl("Lightning", stormdata$EVTYPE, ignore.case=TRUE))] <- "Heavy Snow"
stormdata$EVTYPE2[grep("Surf", stormdata$EVTYPE, ignore.case=TRUE)] <- "High Surf"
stormdata$EVTYPE2[grep("Hurricane|Typhoon", stormdata$EVTYPE, ignore.case=TRUE)] <- "Hurricane(Typhoon)"
stormdata$EVTYPE2[grep("Ice Storm|Icestorm", stormdata$EVTYPE, ignore.case=TRUE)] <- "Ice Storm"
stormdata$EVTYPE2[grep("Lake-Effect Snow|Lake Effect Snow|Lake Snow", stormdata$EVTYPE, ignore.case=TRUE)] <- "Lake-Effect Snow"
stormdata$EVTYPE2[grep("Lakeshore Flood", stormdata$EVTYPE, ignore.case=TRUE)] <- "Lakeshore Flood"
stormdata$EVTYPE2[grep("Lightning", stormdata$EVTYPE, ignore.case=TRUE)] <- "Lightning"
stormdata$EVTYPE2[grep("Marine Hail", stormdata$EVTYPE, ignore.case=TRUE)] <- "Marine Hail"
stormdata$EVTYPE2[grep("Marine High Wind", stormdata$EVTYPE, ignore.case=TRUE)] <- "Marine High Wind"
stormdata$EVTYPE2[grep("Marine Strong Wind", stormdata$EVTYPE, ignore.case=TRUE)] <- "Marine Strong Wind"
stormdata$EVTYPE2[grep("Marine Thunderstorm Wind|Marine TSTM Wind", stormdata$EVTYPE, ignore.case=TRUE)] <- "Marine Thunderstorm Wind"
stormdata$EVTYPE2[grep("Rip Current", stormdata$EVTYPE, ignore.case=TRUE)] <- "Rip Current"
stormdata$EVTYPE2[grep("Seiche", stormdata$EVTYPE, ignore.case=TRUE)] <- "Seiche"
stormdata$EVTYPE2[grep("Sleet", stormdata$EVTYPE, ignore.case=TRUE)] <- "Sleet"
stormdata$EVTYPE2[grep("Storm Surge", stormdata$EVTYPE, ignore.case=TRUE)] <- "Storm Surge/Tide"
stormdata$EVTYPE2[which(grepl("Tornado", stormdata$EVTYPE, ignore.case=TRUE) & !grepl("Waterspout", stormdata$EVTYPE, ignore.case=TRUE))] <- "Tornado"
stormdata$EVTYPE2[grep("Tropical Depression", stormdata$EVTYPE, ignore.case=TRUE)] <- "Tropical Depression"
stormdata$EVTYPE2[grep("Tropical Storm", stormdata$EVTYPE, ignore.case=TRUE)] <- "Tropical Storm"
stormdata$EVTYPE2[grep("Tsunami", stormdata$EVTYPE, ignore.case=TRUE)] <- "Tsunami"
stormdata$EVTYPE2[grep("Volcanic Ash", stormdata$EVTYPE, ignore.case=TRUE)] <- "Volcanic Ash"
stormdata$EVTYPE2[grep("Waterspout", stormdata$EVTYPE, ignore.case=TRUE)] <- "Waterspout"
stormdata$EVTYPE2[which(grepl("Wild", stormdata$EVTYPE, ignore.case=TRUE) & grepl("Fire", stormdata$EVTYPE, ignore.case=TRUE))] <- "Wildfire"
stormdata$EVTYPE2[grep("Winter Storm", stormdata$EVTYPE, ignore.case=TRUE)] <- "Winter Storm"
stormdata$EVTYPE2[grep("Winter Weather|Wintry Mix|Wintery Mix", stormdata$EVTYPE, ignore.case=TRUE)] <- "Winter Weather"
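
The same classify-in-order idea can be sketched in a table-driven form. This is only an illustration under stated assumptions: the `rules` table and the `classify` helper are hypothetical names, the rule set is an abbreviated subset of the patterns above (with the Coastal Flood pattern simplified), and the input is a toy vector rather than the real EVTYPE column.

```r
# Toy EVTYPE values standing in for stormdata$EVTYPE (illustrative only)
evtype <- c("TSTM WIND", "MARINE TSTM WIND", "HIGH WINDS", "FLASH FLOOD",
            "COASTAL FLOODING", "River Flood", "SUMMARY OF MAY 10")

# Hypothetical rule table: a pattern that must match, an optional pattern that
# must not match (NA means no exclusion), and the label to assign.
# Rules are applied top to bottom, so later rules override earlier ones.
rules <- data.frame(
  include = c("High Wind", "Thunderstorm|TSTM", "Flashflood|Flash Flood",
              "Flood|FLD", "Coastal.*Flood",
              "Marine Thunderstorm Wind|Marine TSTM Wind"),
  exclude = c("Marine", "Marine", NA, "Flash|Coastal|Lakeshore", NA, NA),
  label   = c("High Wind", "Thunderstorm Wind", "Flash Flood",
              "Flood", "Coastal Flood", "Marine Thunderstorm Wind"),
  stringsAsFactors = FALSE
)

classify <- function(x, rules) {
  out <- rep(NA_character_, length(x))
  for (i in seq_len(nrow(rules))) {
    hit <- grepl(rules$include[i], x, ignore.case = TRUE)
    if (!is.na(rules$exclude[i])) {
      hit <- hit & !grepl(rules$exclude[i], x, ignore.case = TRUE)
    }
    out[hit] <- rules$label[i]
  }
  out
}

evtype2 <- classify(evtype, rules)
data.frame(evtype, evtype2)
```

Keeping the patterns in a table makes the order of the rules explicit, which matters here because an entry that matches several patterns takes the label of the last one.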

To make sure nearly every event was classified into one of the 48 types, I compared the number of rows with NA values in EVTYPE2 to the total number of rows. Fewer than 0.5% of the entries were left unclassified.

nrow(stormdata) #Total number of entries
## [1] 902297
sum(is.na(stormdata$EVTYPE2)) # Total number of unclassified entries
## [1] 3922
mean(is.na(stormdata$EVTYPE2)) # Proportion of all entries that are unclassified
## [1] 0.004347
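
To see what was left unclassified, one option is to tabulate the original labels of the NA rows. The sketch below uses toy vectors in place of `stormdata$EVTYPE` and `stormdata$EVTYPE2`; the values are made up for illustration and are not taken from the file.

```r
# Toy stand-ins for stormdata$EVTYPE and stormdata$EVTYPE2 (illustrative only)
evtype  <- c("HAIL", "OTHER", "SUMMARY OF MAY 10", "OTHER", "TORNADO", "?")
evtype2 <- c("Hail", NA, NA, NA, "Tornado", NA)

# Original labels of the entries that no pattern matched
unclassified <- evtype[is.na(evtype2)]

# Most frequent leftover labels first
leftover <- sort(table(unclassified), decreasing = TRUE)
head(leftover, 10)

# Proportion of entries left unclassified
unclassified_share <- mean(is.na(evtype2))
unclassified_share
```

Scanning the most frequent leftovers is a quick way to decide whether any of them deserve a pattern of their own.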

Next, I used the plyr package to create a table listing each type of event, its total number of entries in the database, and its average injuries, fatalities, crop damage, and property damage per event. The damage averages use the raw CROPDMG and PROPDMG values.

stormdata$count <- 1

library(plyr)

stormtable <- ddply(stormdata, .(EVTYPE2), summarize, Events=sum(count), Injuries=mean(INJURIES), Fatalities=mean(FATALITIES), CropDamage = mean(CROPDMG), PropertyDamage=mean(PROPDMG))

stormtable <- stormtable[!is.na(stormtable$EVTYPE2),] # Removes NA row from table

stormtable
##                     EVTYPE2 Events  Injuries Fatalities CropDamage PropertyDamage
## 1     Astronomical Low Tide    174 0.0000000  0.000e+00   0.000000        1.83908
## 2                 Avalanche    386 0.4404145  5.803e-01   0.000000        4.20699
## 3                  Blizzard   2735 0.2943327  3.693e-02   0.062888        9.31206
## 4             Coastal Flood    851 0.0082256  7.051e-03   0.065805       21.21150
## 5           Cold/Wind Chill    574 0.0209059  1.655e-01   1.045296        3.46690
## 6               Debris Flow    643 0.0855365  6.843e-02   0.057543       31.17114
## 7                 Dense Fog   1837 0.5862820  4.409e-02   0.000000        9.29519
## 8               Dense Smoke     21 0.0000000  0.000e+00   0.000000        4.76190
## 9                   Drought   2512 0.0075637  2.389e-03  13.516879        1.71141
## 10               Dust Devil    150 0.2866667  1.333e-02   0.000000        4.79087
## 11               Dust storm    430 1.0232558  5.116e-02   4.887209       11.85930
## 12           Excessive Heat   1786 3.7681971  1.130e+00   0.279619        0.82033
## 13  Extreme Cold/Wind Chill   1770 0.1440678  1.638e-01   3.538836        6.13957
## 14              Flash Flood  55665 0.0323363  1.859e-02   3.350116       26.48605
## 15                    Flood  29557 0.2325676  1.729e-02   6.103032       32.75649
## 16             Freezing Fog     46 0.0000000  0.000e+00   0.000000        4.00000
## 17             Frost/Freeze   1510 0.0019868  1.325e-03   6.384377        1.12419
## 18             Funnel Cloud   6990 0.0004292  0.000e+00   0.000000        0.02856
## 19                     Hail 288839 0.0047466  5.193e-05   2.013123        2.38828
## 20                     Heat    845 2.9337278  1.318e+00   1.086391        1.85533
## 21               Heavy Rain  11812 0.0215882  8.381e-03   1.019793        4.62624
## 22               Heavy Snow  15798 0.0655146  8.166e-03   0.138354        7.91671
## 23                High Surf   1063 0.2314205  1.515e-01   0.001411        6.12758
## 24                High Wind  21782 0.0691397  1.345e-02   0.966294       17.53033
## 25       Hurricane(Typhoon)    299 4.4581940  4.515e-01  38.922375       84.23629
## 26                Ice Storm   2032 0.9803150  4.380e-02   0.831176       32.77592
## 27         Lake-Effect Snow    684 0.0000000  0.000e+00   0.000000       21.50292
## 28          Lakeshore Flood     23 0.0000000  0.000e+00   0.000000        2.06522
## 29                Lightning  15776 0.3316430  5.179e-02   0.227283       38.26764
## 30              Marine Hail    442 0.0000000  0.000e+00   0.000000        0.00905
## 31         Marine High Wind    135 0.0074074  7.407e-03   0.000000        2.20748
## 32       Marine Strong Wind     48 0.4583333  2.917e-01   0.000000        8.71521
## 33 Marine Thunderstorm Wind  11987 0.0028364  1.585e-03   0.004171        0.23863
## 34              Rip Current    777 0.6808237  7.426e-01   0.000000        0.20978
## 35                   Seiche     21 0.0000000  0.000e+00   0.000000       46.66667
## 36                    Sleet    122 0.0000000  1.639e-02   0.000000        5.74836
## 37         Storm Surge/Tide    409 0.1051345  5.868e-02   2.090465       63.98665
## 38              Strong Wind   3776 0.0797140  2.940e-02   0.429529       17.01041
## 39        Thunderstorm Wind 324794 0.0292770  2.186e-03   0.613636        8.23312
## 40                  Tornado  60686 1.5055202  9.323e-02   1.648309       52.96994
## 41      Tropical Depression     60 0.0000000  0.000e+00   0.000000       12.30000
## 42           Tropical Storm    697 0.5494978  9.469e-02   9.275638       71.63943
## 43                  Tsunami     20 6.4500000  1.650e+00   1.000000       45.26500
## 44             Volcanic Ash     27 0.0000000  0.000e+00   0.000000       18.51852
## 45               Waterspout   3861 0.0186480  1.554e-03   0.000000        2.79220
## 46                 Wildfire   4231 0.3795793  2.127e-02   2.142694       29.57889
## 47             Winter Storm  11441 0.1182589  1.897e-02   0.217113       11.73679
## 48           Winter Weather   8251 0.0745364  7.514e-03   0.001818        2.05077
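
For readers without plyr, a comparable summary table can be sketched in base R with `aggregate()`, `table()`, and `merge()`. This is an alternative, not the method used above, and it runs on a small toy data frame with made-up values rather than on `stormdata`.

```r
# Toy rows standing in for stormdata (illustrative values only)
toy <- data.frame(
  EVTYPE2    = c("Hail", "Hail", "Tornado", NA),
  INJURIES   = c(0, 2, 4, 1),
  FATALITIES = c(0, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Per-type means; rows whose grouping value is NA are dropped by aggregate()
means <- aggregate(toy[c("INJURIES", "FATALITIES")],
                   by = list(EVTYPE2 = toy$EVTYPE2), FUN = mean)

# Per-type counts; table() also ignores NA values by default
counts <- as.data.frame(table(EVTYPE2 = toy$EVTYPE2),
                        responseName = "Events", stringsAsFactors = FALSE)

# One row per event type: count plus the per-event averages
toytable <- merge(counts, means, by = "EVTYPE2")
toytable
```

Because both `aggregate()` and `table()` skip NA groups, this version does not produce an NA row that has to be removed afterwards.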

Finally, I created 5 additional data frames, each sorting the stormtable data frame by a different column.

sortbycount <- stormtable[order(-stormtable$Events),]
sortbyfatalities <- stormtable[order(-stormtable$Fatalities),]
sortbyinjuries <- stormtable[order(-stormtable$Injuries),]
sortbypropdmg <- stormtable[order(-stormtable$PropertyDamage),]
sortbycropdmg <- stormtable[order(-stormtable$CropDamage),]

Results

Total number of events

Most events were classified as either Thunderstorm Wind or Hail. Together, these two types make up about two thirds of all weather events reported. Flood-related events and tornadoes are also common.
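
The two-thirds figure can be checked directly from the counts reported earlier. The numbers below are copied from the stormtable output and from `nrow(stormdata)`; note that the denominator includes the unclassified entries.

```r
# Event counts copied from the stormtable output and nrow(stormdata) above
thunderstorm_wind <- 324794
hail              <- 288839
total_entries     <- 902297

# Share of all entries accounted for by the two most common types
share_top_two <- (thunderstorm_wind + hail) / total_entries
round(share_top_two, 2)
# [1] 0.68
```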

#PLOT 1
par(mar=c(6,10,4,2),cex.axis=0.75)
barplot(sortbycount$Events[10:1],names.arg=sortbycount$EVTYPE2[10:1], horiz=TRUE, col="blue", las=2, cex.names=0.75)
title(main="Top 10 most common weather events",line=2)
title(xlab="Number of recorded events",line=4)
title(ylab="Type of event",line=6)

[Figure: horizontal bar chart, “Top 10 most common weather events”]

Population Damage

Tsunamis have the highest average number of both injuries (6.45) and fatalities (1.65) per event, although those averages rest on only 20 recorded tsunamis. Heat and excessive heat are also highly fatal and cause many injuries. Interestingly, heat that is not classified as excessive has a higher average fatality rate than excessive heat, perhaps because the “Heat” category includes prolonged periods of heat, which are likely to cause more deaths simply by lasting longer. Hurricanes and rip currents rank highly for both injuries and fatalities. Tornadoes, dust storms, and ice storms cause many injuries but relatively few fatalities. Avalanches, Cold/Wind Chill events, and Marine Strong Winds cause many fatalities but fewer non-fatal injuries.
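
Because these rankings use per-event averages, it can help to read them next to approximate totals, which are just the average multiplied by the number of events. The sketch below does this for three event types, using figures copied from the stormtable output; the totals are rounded reconstructions from the printed averages, not values read from the file.

```r
# Event counts and per-event injury averages copied from the stormtable output above
events   <- c(Tsunami = 20, Tornado = 60686, "Excessive Heat" = 1786)
injuries <- c(Tsunami = 6.45, Tornado = 1.5055202, "Excessive Heat" = 3.7681971)

# Approximate total injuries = number of events * average injuries per event
total_injuries <- round(events * injuries)

# Largest total first
total_injuries[order(-total_injuries)]
```

A rare event type can top the per-event ranking while contributing few injuries in total, and a frequent type can do the reverse, so the two views answer different questions.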

#PLOT 2
par(mar=c(4,10,3,2),cex.axis=0.75, mfcol=c(2,1))
barplot(sortbyinjuries$Injuries[10:1],names.arg=sortbyinjuries$EVTYPE2[10:1], horiz=TRUE, col="salmon", las=2, cex.names=0.75)
title(main="Top 10 worst events by injuries",line=1)
title(xlab="Average injuries per event",line=2)
title(ylab="Type of event",line=8)
barplot(sortbyfatalities$Fatalities[10:1],names.arg=sortbyfatalities$EVTYPE2[10:1], horiz=TRUE, col="red", las=2, cex.names=0.75)
title(main="Top 10 worst events by fatalities",line=1)
title(xlab="Average fatalities per event",line=2)
title(ylab="Type of event",line=8)

[Figure: two horizontal bar charts, “Top 10 worst events by injuries” and “Top 10 worst events by fatalities”]

Economic Damage

Hurricanes inflict the most property damage per event and, by a wide margin, the most crop damage. Tropical storms are the next most damaging to property and also cause high levels of crop damage. Drought inflicts a lot of crop damage but little property damage. Floods, which are among the most common events, cause high levels of both crop and property damage. Finally, storm surges and tornadoes rank highly on the list of event types most damaging to property.
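
The averages discussed here are computed from the raw CROPDMG and PROPDMG columns. In the Storm Data file those columns are paired with magnitude columns (PROPDMGEXP and CROPDMGEXP), where the documentation uses K for thousands, M for millions, and B for billions, so the raw averages are not dollar amounts. A minimal sketch of how the values could be scaled is shown below. The helper name `exp_to_multiplier` is hypothetical, the rows are toy data, and treating every other code as a multiplier of 1 is an assumption rather than something the documentation specifies.

```r
# Toy rows standing in for stormdata (values are illustrative, not from the file)
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 10),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: multipliers for the documented magnitude codes.
# Any other code (including blank) is treated as 1, which is an assumption.
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Property damage in dollars under the assumptions above
toy$PropertyDamageUSD <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
toy
```

Applying the same scaling to both damage columns before averaging could change the rankings, since a raw value of 2.5 in millions outweighs a raw value of 25 in thousands.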

#PLOT 3
par(mar=c(4,10,3,2),cex.axis=0.75, mfcol=c(2,1))
barplot(sortbycropdmg$CropDamage[10:1],names.arg=sortbycropdmg$EVTYPE2[10:1], horiz=TRUE, col="orange", las=2, cex.names=0.75)
title(main="Top 10 worst events by crop damage",line=1)
title(xlab="Average crop damage per event",line=2)
title(ylab="Type of event",line=8)
barplot(sortbypropdmg$PropertyDamage[10:1],names.arg=sortbypropdmg$EVTYPE2[10:1], horiz=TRUE, col="green", las=2, cex.names=0.75)
title(main="Top 10 worst events by property damage",line=1)
title(xlab="Average property damage per event",line=2)
title(ylab="Type of event",line=8)

[Figure: two horizontal bar charts, “Top 10 worst events by crop damage” and “Top 10 worst events by property damage”]
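
The plotting code above repeats the same sort-and-barplot steps for each measure. As a sketch of how that repetition could be factored out, the hypothetical helper `plot_top` below sorts a summary table by one column and draws the top rows as horizontal bars. It is shown on a three-row toy table whose values are copied from the stormtable output.

```r
# Hypothetical helper: sort a summary table by one column and plot the top n rows
plot_top <- function(tbl, column, main, xlab, col = "blue", n = 10) {
  top <- head(tbl[order(-tbl[[column]]), ], n)
  # Reverse the order so the largest bar is drawn at the top of the chart
  barplot(rev(top[[column]]), names.arg = rev(as.character(top$EVTYPE2)),
          horiz = TRUE, col = col, las = 2, cex.names = 0.75)
  title(main = main, line = 1)
  title(xlab = xlab, line = 2)
  invisible(top)
}

# Toy table standing in for stormtable (three rows copied from the output above)
toy <- data.frame(
  EVTYPE2  = c("Hail", "Tornado", "Tsunami"),
  Events   = c(288839, 60686, 20),
  Injuries = c(0.0047466, 1.5055202, 6.45),
  stringsAsFactors = FALSE
)

par(mar = c(4, 10, 3, 2), cex.axis = 0.75)
top_injuries <- plot_top(toy, "Injuries", main = "Top events by injuries",
                         xlab = "Average injuries per event", col = "salmon", n = 2)
```

The helper returns the sorted rows invisibly, so the same call can feed both a chart and any follow-up table.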