Summary

Using data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, we classify damaging weather events into 26 categories, such as tornadoes, heat and cold events, and fog. We then compute the average number of fatalities, the average number of injuries, and the average economic damages for each type of event, and compare them. The types of events with the highest average injuries or deaths per reported event are heat, tsunamis, avalanches, cold, surf and coastal events, hurricanes, fog, and dust. The types of events with the highest average economic damages are hurricanes, droughts, surf and coastal events, tropical storms, tsunamis, and ice events. There is little or no relation between the types of events that cause economic damages and the types of events that cause injuries or deaths.

Data Processing

The data for our analysis come from NOAA’s storm database. The database tracks characteristics of major storms and weather events in the United States going back to 1950, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. However, the data are very messy, as they appear to come directly from state reporters. Some variables are missing or have incorrect values, and variations in terminology and spelling are common. Before we can perform our analysis, we need to tidy the data into recognizable categories.

#download data
temp <- tempfile()
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",temp)

F<- "factor"
N<- "numeric"
C<- "character"

colclass<- c(F,C,C,F,C,C,F,F,N,F,C,C,C,C,C,N,F,C,N,N,F,N,N,N,N,F,N,F,C,C,C,N,N,N,N,C,N)
StormData<- read.csv(temp, colClasses=colclass)
StormData$BGN_DATE<-as.Date(StormData$BGN_DATE,"%m/%d/%Y")

After downloading the data, we use the variable “EVTYPE” (event type) to create categories of weather events. In many cases, multiple types of damage occurred in a single event. For instance, a thunderstorm event may include both tornadoes and hail. Because of this, we created 26 variables, each indicating whether a particular type of event was mentioned in the “EVTYPE” variable, so a single event can belong to more than one category. In most cases, similar types of events were collapsed into a single variable. The definitions of the different categories of event types are presented below.
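As a minimal sketch of this flagging approach, the example below applies a few simplified patterns to made-up EVTYPE labels. The labels and the shortened patterns are illustrative assumptions, and logical flags are used here in place of the "YES"/"NO" strings used in the analysis code.

```r
# Toy EVTYPE labels (illustrative, not taken from the database)
evtype <- c("TSTM WIND/HAIL", "EXTREME WINDCHILL", "TORNADO", "DUST DEVIL")

# One logical flag per category; a single label can set several flags
flags <- data.frame(
  EVTYPE       = evtype,
  THUNDERSTORM = grepl("thunderstorm|tstm|lightning", evtype, ignore.case = TRUE),
  HAIL         = grepl("hail", evtype, ignore.case = TRUE),
  # wind, unless the match is really "wind chill"
  WIND         = grepl("wind|wnd", evtype, ignore.case = TRUE) &
                 !grepl("wind( )?ch(ill)?", evtype, ignore.case = TRUE),
  # dust devils are counted with tornadoes
  TORNADO      = grepl("tornado|dust dev", evtype, ignore.case = TRUE)
)
print(flags)
```

The first label sets the thunderstorm, hail, and wind flags at once, while the second is kept out of the wind category by the exclusion pattern.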

It was also necessary to create a summary variable of economic damages. In the original dataset, the values for economic damages could be indicated in hundreds, thousands, millions, or billions of dollars, and were divided into two different types of damages: property damages and crop damages. We put all of the different damages on the same scale, and added property and crop damages together.
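The rescaling can be illustrated with a small lookup table. This is a sketch under stated assumptions: `to_dollars` is a hypothetical helper, not part of the analysis code, and the input values are made up.

```r
# Multipliers for the exponent codes handled in this analysis
mult <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert a damage figure and its exponent code to dollars
to_dollars <- function(dmg, exp_code) {
  scaled <- dmg * unname(mult[toupper(exp_code)])
  # zero damage stays zero even when the exponent code is blank or unrecognized
  ifelse(dmg == 0, 0, scaled)
}

prop  <- to_dollars(c(25, 2.5, 0, 10), c("K", "m", "", "?"))
crop  <- to_dollars(c(0, 1, 0, 5), c("", "K", "", "K"))
total <- prop + crop  # NA propagates when either component is unknown
print(total)
```

A nonzero amount with an unrecognized exponent code becomes NA rather than being guessed, which mirrors how unmatched cases fall through in the analysis code.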

Finally, based on the notes included in the dataset, it is clear that $0 in damages sometimes means that there were no damages (as in a sighting of the Northern Lights), and sometimes means that the damages were not recorded or not calculated. Because of this, all economic analyses exclude events for which no damages were reported. This may overstate the average economic damages of certain types of events.
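To see why dropping zero-damage records can raise an average, consider the made-up figures below (they are illustrative only and do not come from the database).

```r
# Illustrative damages in dollars; a zero may mean "none" or "not recorded"
damages <- c(0, 0, 0, 50000, 150000)

mean_all      <- mean(damages)               # zeros treated as true zeros
mean_reported <- mean(damages[damages > 0])  # zeros dropped, as in this analysis

print(c(all = mean_all, reported = mean_reported))
```

If most of the zeros are true zeros, the second figure overstates typical damages; if most are unrecorded values, the first understates them.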

library(plyr, quietly = TRUE)
library(dplyr, quietly = TRUE)
#create summary variables for use
SmlData<- StormData%>%
      #types of events
            mutate(TORNADO=ifelse(grepl("tornado|torndao|dust dev|landspout",StormData$EVTYPE,ignore.case = TRUE),"YES","NO"),
                 WIND=ifelse(grepl("wind|wnd",StormData$EVTYPE,ignore.case = TRUE)&
                                   !grepl("wind( )?ch(ill)?",StormData$EVTYPE,ignore.case = TRUE),"YES","NO"),
                 COLD=ifelse(grepl("cold|freeze|wind( )?ch(ill)?|low temp|cool|HYPOTHERMIA|frost|record low",StormData$EVTYPE,ignore.case = TRUE)&
                                   !grepl("cold air funnel|cold air tornado",StormData$EVTYPE,ignore.case = TRUE),"YES","NO"),
                 FOG=ifelse(grepl("[vf]og",StormData$EVTYPE,ignore.case = TRUE),"YES","NO"),
                 WINTWEATH=ifelse(grepl("winter|snow|ice|blizzard|sleet|wintry|freezing (rain|drizzle|fog|spray)|icy|glaze|HEAVY MIX|MIXED PRECIP",StormData$EVTYPE,ignore.case = TRUE)&
                                        !grepl("(lack of|unusually late) snow|snow (drought|melt)|ice (jam|floe)",StormData$EVTYPE,ignore.case = TRUE),
                                  "YES","NO"),
                 FLOOD=ifelse(grepl("floo(o)?d|dam |fld|SMALL STREAM|high water|urban|RISING WATER",StormData$EVTYPE,ignore.case = TRUE),"YES","NO"),
                 FIRE=ifelse(grepl("fire|smoke",StormData$EVTYPE,ignore.case = TRUE),"YES","NO"),
                 DROUGHT=ifelse(grepl("dry|drought|lack of snow|unusually late|BELOW NORMAL PRECIPITATION",StormData$EVTYPE,ignore.case = TRUE)&
                                      !grepl("dry micro|dry mirco",StormData$EVTYPE,ignore.case = TRUE),"YES","NO"),
                 THUNDERSTORM=ifelse(grepl("thunderstorm|tstm|lightning|LIGHTING|LIGNTNING",StormData$EVTYPE,ignore.case = TRUE)&
                                           !grepl("non[ -]tstm",StormData$EVTYPE,ignore.case = TRUE),"YES","NO"),
                 HAIL=ifelse(grepl("hail",StormData$EVTYPE,ignore.case = TRUE),"YES","NO"),
                 HEAT=ifelse(grepl("heat|warm|hot|high te|HYPERTHERMIA|record high",StormData$EVTYPE,ignore.case = TRUE),"YES","NO"),
                 MICROB=ifelse(grepl("mic(r)?oburst|downburst",StormData$EVTYPE,ignore.case = TRUE),"YES","NO"),
                 DUST=ifelse(grepl("dust",StormData$EVTYPE,ignore.case = TRUE)&
                                   !grepl("dust dev",StormData$EVTYPE,ignore.case = TRUE),"YES","NO"),
                 SNOW=ifelse(grepl("snow",StormData$EVTYPE,ignore.case = TRUE)&
                                   !grepl("(lack of|unusually late) snow|snow( drought|melt| melt)",StormData$EVTYPE,ignore.case = TRUE)
                             ,"YES","NO"),
                 WATERSP=ifelse(grepl("wa(y)?ter( )?spout",StormData$EVTYPE,ignore.case = TRUE),"YES","NO"),
                 ICE=ifelse(grepl("ice|freezing (rain|drizzle|fog|spray)|icy|glaze",StormData$EVTYPE,ignore.case = TRUE)&
                                  !grepl("ice (melt|jam|floe)",StormData$EVTYPE,ignore.case = TRUE),"YES","NO"),
                 VOLCANO=ifelse(grepl("volcan",StormData$EVTYPE,ignore.case = TRUE),"YES","NO"),
                 HURRICANE=ifelse(grepl("hurricane|typhoon",StormData$EVTYPE,ignore.case = TRUE),"YES","NO"),
                 TROPSTORM=ifelse(grepl("tropical|remnants",StormData$EVTYPE,ignore.case = TRUE),"YES","NO"),
                 RAIN=ifelse(grepl("rain|(unseasonably|abnormally|EXTREMELY|EXCESSIVE) wet|drizzle|(RECORD|heavy|EXCESSIVE) PRECIP[ia]TATION|HEAVY SHOWER|wet (weather|year|month)",StormData$EVTYPE,ignore.case = TRUE)&
                                   !grepl("freezing (rain|drizzle)",StormData$EVTYPE,ignore.case = TRUE),"YES","NO"),
                 SURF=ifelse(grepl("surf|rip current|wave|seas|tide|beach|STORM SURGE|coastal (er|surge)|swell",StormData$EVTYPE,ignore.case = TRUE)&
                                   !grepl("(cold|heat) wave|season",StormData$EVTYPE,ignore.case = TRUE),"YES","NO"),
                 MUDSLIDE=ifelse(grepl("(mud|rock)( )?slide|landsl",StormData$EVTYPE,ignore.case = TRUE),"YES","NO"),
                 FUNNEL=ifelse(grepl("funnel|gustnado",StormData$EVTYPE,ignore.case = TRUE),"YES","NO"),
                 AVALANCHE=ifelse(grepl("avalanche|AVALANCE",StormData$EVTYPE,ignore.case = TRUE),"YES","NO"),
                 TSUNAMI=ifelse(grepl("tsunami",StormData$EVTYPE,ignore.case = TRUE),"YES","NO"),
                 OTHER=ifelse(grepl("SEICHE|WALL CLOUD|other$|Coastal( )?St|ICE (floes|jam)",StormData$EVTYPE,ignore.case = TRUE),"YES","NO"))%>%
      #events not included
      mutate(NONE=ifelse(TORNADO=="NO"& WIND=="NO"& COLD=="NO"& FOG=="NO"
                         & WINTWEATH=="NO"& FLOOD=="NO"& FIRE=="NO"
                         & DROUGHT=="NO"& THUNDERSTORM=="NO"& HAIL=="NO"
                         & HEAT=="NO"& MICROB=="NO"& DUST=="NO"& SNOW=="NO"
                         & WATERSP=="NO"& ICE=="NO"& OTHER=="NO"& TROPSTORM=="NO"
                         & VOLCANO=="NO"& HURRICANE=="NO"& RAIN=="NO"
                         & SURF=="NO"& MUDSLIDE=="NO"& FUNNEL=="NO"& AVALANCHE=="NO"
                         & TSUNAMI=="NO"
                         ,"YES","NO"))%>%
      #economic damages summary variables
      mutate( 
             ECON_PR=case_when(
                   PROPDMG==0 ~ PROPDMG,
                   PROPDMGEXP %in% c("h","H") ~ 100*PROPDMG,
                   PROPDMGEXP %in% c("k","K") ~ 1000*PROPDMG,
                   PROPDMGEXP %in% c("m","M") ~ 1000000*PROPDMG,
                   PROPDMGEXP %in% c("b","B") ~ 1000000000*PROPDMG), 
             ECON_CR=case_when(
                   CROPDMG==0 ~ CROPDMG,
                   CROPDMGEXP %in% c("h","H") ~ 100*CROPDMG,
                   CROPDMGEXP %in% c("k","K") ~ 1000*CROPDMG,
                   CROPDMGEXP %in% c("m","M") ~ 1000000*CROPDMG,
                   CROPDMGEXP %in% c("b","B") ~ 1000000000*CROPDMG), 
             ECON_T=ifelse(!is.na(ECON_PR) & !is.na(ECON_CR), ECON_PR+ECON_CR,NA))%>%
#create smaller analysis dataset with limited variables
      select(REFNUM,TORNADO:NONE,FATALITIES,INJURIES,ECON_PR:ECON_T,EVTYPE)%>%
#only include event with economic damages or human injuries
      subset((FATALITIES>0|INJURIES>0|ECON_T>0))

Event Types

We analyse 26 different types of events.

Two types of events, thunderstorms and winter weather, frequently occur alongside more specific types of damage. Thunderstorms include any event that mentions thunderstorms or lightning. Winter weather includes any event that mentions snow, ice, blizzards, sleet, winter storms, freezing rain or fog, or mixed precipitation. Some events that include these terms are excluded if they indicate spring weather or other types of events, such as the term “snow melt.”
One type of event, wind, frequently occurred with other events, and was often the explanation for damages associated with thunderstorms, rain, or winter weather. It also occurred on its own. Events with the word “windchill” are not included, as the term generally refers to cold rather than high winds.

Other events include:
-Tornadoes. This includes tornadoes, dust devils, and landspouts.
-Waterspouts. This includes any type of waterspout.
-Funnel clouds. This includes any type of funnel or gustnado.
-Microbursts and downbursts. This includes any event that includes microbursts or downbursts.
-Cold. This includes any mention of cold or cool weather, freezes, windchill, low temperatures, frost, hypothermia, or record lows.
-Heat. This includes any mentions of heat or warmth, high temperatures, hyperthermia, or record highs.
-Drought and dry weather. This includes events that include any type of drought or dry weather, late rains or snow, or below normal precipitation.
-Snow. This includes most events that mention snow, including storms, snow pack, blowing snow, and snow accumulation. Snowmelt floods and snow droughts are excluded.
-Ice. This includes ice storms, ice glazes, and freezing precipitation. It excludes ice melts, jams, and floes.
-Rain. This includes any event that mentions rain, excess wetness or precipitation or drizzle. Freezing rain is excluded.
-Tropical storms. This includes any event that mentions tropical storms or remnants of a named storm.
-Hurricanes. This includes both hurricanes and typhoons.
-Hail. This includes any type of hail damage, including floods caused by hail.
-Fog. This includes any type of fog, including freezing fog.
-Flooding. This includes flooding (river, coastal, street, or unspecified), high or rising water, dam breakages, and mentions of streams or “urban.”
-Surf and coastal events. This includes surf, rip currents, waves, high seas, tides, storm surges, and beach erosion. It does not include tsunamis.
-Fire. This includes any event that mentions fire or smoke.
-Dust. This includes dust storms and poor air quality due to dust, but does not include dust devils (which are classified as a type of tornado).
-Volcanoes. This includes damage from volcanic ash or volcanic activity.
-Landslides and mudslides. This includes landslides, rock falls, and mud slides.
-Avalanches. This includes any type of avalanche.
-Tsunamis. This includes any tsunami.
-Other. This includes “other” events, coastal storms, ice floes and jams, and seiches.

Results

Once we had defined our events, we wanted to look at the average reported damages for each type of event. We calculated the average number of fatalities and injuries for each event, as well as the average economic impact, in thousands of dollars. We also wanted to examine separately events that have at least one injury or fatality, to see if the story changes when only looking at “serious” events. It is also important to see the frequency of events. If there were hardly any injuries from the average thunderstorm, but they were much more common than other types of events, then they might still be a concern.
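The same per-type summaries can be sketched with `sapply` on a toy data frame. The data frame `toy` and the helper `summarise_type` are hypothetical stand-ins with made-up values; the analysis itself uses an explicit loop over the columns of SmlData.

```r
# Toy stand-in for SmlData with two event-type flags (values are made up)
toy <- data.frame(
  HEAT       = c("YES", "YES", "NO", "NO"),
  FLOOD      = c("NO", "YES", "YES", "YES"),
  FATALITIES = c(2, 0, 0, 1),
  INJURIES   = c(4, 0, 0, 0)
)
types   <- c("HEAT", "FLOOD")
# "serious" events have at least one injury or fatality
serious <- toy$FATALITIES + toy$INJURIES > 0

# Hypothetical helper: counts and mean fatalities for one event type
summarise_type <- function(type) {
  hit <- toy[[type]] == "YES"
  c(n                  = sum(hit),
    mean_fatal         = mean(toy$FATALITIES[hit]),
    n_serious          = sum(hit & serious),
    mean_fatal_serious = mean(toy$FATALITIES[hit & serious]))
}

# One row per event type, one column per summary
result <- t(sapply(types, summarise_type))
print(result)
```

Because an event can carry several flags, the same row can contribute to more than one type's average, as the second row does here.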

#create empty variable vectors
TYPE<- as.character(names(SmlData[2:27]))
MEANF<- as.numeric(rep(NA,26))
MEANSrF<- as.numeric(rep(NA,26))
MEANI<- as.numeric(rep(NA,26))
MEANSrI<- as.numeric(rep(NA,26))
MEANE<- as.numeric(rep(NA,26))
MEANSrE<- as.numeric(rep(NA,26))
NUM_EVENTS<- as.integer(rep(NA,26))
NUM_EVENTSSr<- as.integer(rep(NA,26))

#fill in means and number of events.
for(i in 2:27){
      MEANF[i-1]<-mean(SmlData$FATALITIES[SmlData[i]=="YES"])
      MEANSrF[i-1]<-mean(SmlData$FATALITIES[SmlData[i]=="YES"&SmlData$INJURIES+SmlData$FATALITIES>0])
      MEANI[i-1]<-mean(SmlData$INJURIES[SmlData[i]=="YES"])
      MEANSrI[i-1]<-mean(SmlData$INJURIES[SmlData[i]=="YES"&SmlData$INJURIES+SmlData$FATALITIES>0])
      MEANE[i-1]<-round(mean(SmlData$ECON_T[SmlData[i]=="YES"],na.rm = TRUE)/1000,0)
      MEANSrE[i-1]<-round(mean(SmlData$ECON_T[SmlData[i]=="YES"&SmlData$INJURIES+SmlData$FATALITIES>0],na.rm = TRUE)/1000,0)
      NUM_EVENTS[i-1]<-nrow(SmlData[SmlData[i]=="YES",])
      NUM_EVENTSSr[i-1]<- nrow(SmlData[SmlData[i]=="YES"&SmlData$INJURIES+SmlData$FATALITIES>0,])
}

#create data frame
MeanDam<-data.frame(TYPE,NUM_EVENTS,MEANF=round(MEANF,2),MEANI=round(MEANI,2),MEANE,
                    NUM_EVENTSSr,MEANSrF=round(MEANSrF,2),MEANSrI=round(MEANSrI,2),MEANSrE)

#print table
library(knitr)
kable(MeanDam[order(-NUM_EVENTS),c(1:4,6:8)],
      col.names=c("Type","Number of Events","Average Fatalities","Average Injuries",
                  "Number of Serious Events","Average Fatalities (serious events)","Average Injuries (serious events)"),
      caption="Health Outcomes by Event Type: 1950-2011",
      row.names=FALSE)
Health Outcomes by Event Type: 1950-2011

| Type | Number of Events | Average Fatalities | Average Injuries | Number of Serious Events | Average Fatalities (serious events) | Average Injuries (serious events) |
|---|---|---|---|---|---|---|
| THUNDERSTORM | 132873 | 0.01 | 0.11 | 7340 | 0.21 | 2.01 |
| WIND | 129446 | 0.01 | 0.09 | 4974 | 0.24 | 2.30 |
| TORNADO | 40055 | 0.14 | 2.28 | 7950 | 0.71 | 11.50 |
| FLOOD | 33152 | 0.05 | 0.26 | 1457 | 1.07 | 5.96 |
| HAIL | 26649 | 0.00 | 0.06 | 321 | 0.14 | 4.57 |
| WINTWEATH | 5108 | 0.13 | 1.25 | 783 | 0.86 | 8.18 |
| SNOW | 1898 | 0.09 | 0.61 | 229 | 0.74 | 5.09 |
| FIRE | 1260 | 0.07 | 1.28 | 333 | 0.27 | 4.83 |
| RAIN | 1174 | 0.09 | 0.24 | 128 | 0.83 | 2.20 |
| SURF | 1129 | 0.70 | 0.76 | 829 | 0.95 | 1.03 |
| HEAT | 990 | 3.21 | 9.34 | 947 | 3.36 | 9.76 |
| ICE | 890 | 0.14 | 2.75 | 160 | 0.78 | 15.32 |
| COLD | 658 | 0.74 | 0.50 | 363 | 1.34 | 0.90 |
| TROPSTORM | 456 | 0.14 | 0.84 | 43 | 1.53 | 8.91 |
| DROUGHT | 278 | 0.13 | 0.07 | 4 | 8.75 | 4.75 |
| AVALANCHE | 270 | 0.83 | 0.63 | 241 | 0.93 | 0.71 |
| HURRICANE | 233 | 0.58 | 5.72 | 71 | 1.90 | 18.77 |
| MUDSLIDE | 213 | 0.21 | 0.26 | 24 | 1.83 | 2.29 |
| FOG | 189 | 0.43 | 5.70 | 127 | 0.64 | 8.48 |
| DUST | 105 | 0.21 | 4.19 | 45 | 0.49 | 9.78 |
| MICROB | 87 | 0.03 | 0.32 | 14 | 0.21 | 2.00 |
| WATERSP | 64 | 0.09 | 1.12 | 8 | 0.75 | 9.00 |
| OTHER | 62 | 0.06 | 0.10 | 6 | 0.67 | 1.00 |
| FUNNEL | 19 | 0.00 | 0.16 | 2 | 0.00 | 1.50 |
| TSUNAMI | 14 | 2.36 | 9.21 | 2 | 16.50 | 64.50 |
| VOLCANO | 2 | 0.00 | 0.00 | 0 | NaN | NaN |

Overall, heat events have the greatest average impact on human health, in terms of both fatalities and injuries, followed by tsunamis. However, when only serious events are counted (those causing at least one injury or fatality), tsunamis have the highest averages, although they are very rare in the United States and only two serious tsunami events appear in the data. Other events with high injury or fatality averages are avalanches, cold events, surf and coastal events, hurricanes, fog, and dust. However, average fatalities for these events are often less than 1, which suggests that many of these events involve no fatalities at all. There were no volcanic events that resulted in any injuries or fatalities, and hail and wind events also had generally small effects on human health.

#print table
kable(MeanDam[order(-NUM_EVENTS),c(1:2,5:6,9)],
      col.names=c("Type","Number of Events","Average Economic Damages","Number of Serious Events","Average Economic Damages (serious events)"),
      caption="Economic Damages in thousands of dollars, by Event Type: 1950-2011",
      row.names=FALSE)
Economic Damages in thousands of dollars, by Event Type: 1950-2011

| Type | Number of Events | Average Economic Damages | Number of Serious Events | Average Economic Damages (serious events) |
|---|---|---|---|---|
| THUNDERSTORM | 132873 | 111 | 7340 | 587 |
| WIND | 129446 | 153 | 4974 | 1667 |
| TORNADO | 40055 | 1472 | 7950 | 5495 |
| FLOOD | 33152 | 5429 | 1457 | 7694 |
| HAIL | 26649 | 777 | 321 | 16525 |
| WINTWEATH | 5108 | 3478 | 783 | 8903 |
| SNOW | 1898 | 613 | 229 | 990 |
| FIRE | 1260 | 7067 | 333 | 14582 |
| RAIN | 1174 | 3676 | 128 | 511 |
| SURF | 1129 | 42600 | 829 | 4919 |
| HEAT | 990 | 934 | 947 | 534 |
| ICE | 890 | 10124 | 160 | 4968 |
| COLD | 658 | 5627 | 363 | 424 |
| TROPSTORM | 456 | 18445 | 43 | 156242 |
| DROUGHT | 278 | 54025 | 4 | 588 |
| AVALANCHE | 270 | 32 | 241 | 24 |
| HURRICANE | 233 | 390011 | 71 | 593514 |
| MUDSLIDE | 213 | 1631 | 24 | 2899 |
| FOG | 189 | 132 | 127 | 130 |
| DUST | 105 | 88 | 45 | 41 |
| MICROB | 87 | 84 | 14 | 2 |
| WATERSP | 64 | 949 | 8 | 6264 |
| OTHER | 62 | 210 | 6 | 8 |
| FUNNEL | 19 | 16 | 2 | 2 |
| TSUNAMI | 14 | 10292 | 2 | 42010 |
| VOLCANO | 2 | 250 | 0 | NaN |

Hurricane events had, by far, the largest average economic impact, at $390 million. Droughts, surf and coastal events, tropical storms, tsunamis, and ice events also averaged over $10 million in damages. At the other end, dust events, microbursts and downbursts, avalanches, and funnel clouds averaged less than $100 thousand in damages.

Events with fatalities and injuries are not always the ones that cause the most economic damage. For almost half of the event types, limiting the analysis to events with at least one injury or fatality lowers the average economic damages, sometimes greatly. In fact, a scatterplot of the log of average fatalities or injuries against the log of average economic damages for each type of event shows basically no relationship between the two.
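One way a reader could put a number on this comparison is a Spearman rank correlation between the per-type averages. The sketch below is an assumption-laden check rather than part of the original analysis: it retypes the rounded averages from the two tables above, in table order. A rank correlation is unaffected by the log transform and tolerates the event types whose average fatalities round to zero.

```r
# Average fatalities per event type, copied from the health table (table order)
meanf <- c(0.01, 0.01, 0.14, 0.05, 0.00, 0.13, 0.09, 0.07, 0.09, 0.70,
           3.21, 0.14, 0.74, 0.14, 0.13, 0.83, 0.58, 0.21, 0.43, 0.21,
           0.03, 0.09, 0.06, 0.00, 2.36, 0.00)

# Average economic damages (thousands of dollars), copied from the damages table
meane <- c(111, 153, 1472, 5429, 777, 3478, 613, 7067, 3676, 42600,
           934, 10124, 5627, 18445, 54025, 32, 390011, 1631, 132, 88,
           84, 949, 210, 16, 10292, 250)

# Rank correlation across the 26 event types
rho <- cor(meanf, meane, method = "spearman")
print(rho)
```

A value near zero would be consistent with the visual impression from the scatterplot; with only 26 event types, any such estimate is rough.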

#"main" is not a graphical parameter for par(); the overall title is added with mtext() below
par(mfrow=c(1,2),mar=c(4,2.1,2.1,2.1),oma=c(2,3,2,0))
#logs are taken of the per-type averages; types with an average of 0 give -Inf and are not drawn
plot(log(MeanDam$MEANF),log(MeanDam$MEANE), ylab="",xlab="Log Mean Fatalities")
plot(log(MeanDam$MEANI),log(MeanDam$MEANE),ylab="",xlab="Log Mean Injuries")
mtext("Average Economic Damages and Fatalities/Injuries by Event Type", side=3, outer = TRUE)
mtext("Mean Economic Damages (log thousands of dollars)", side=2, outer = TRUE)

Conclusions

The types of events that were most likely to cause human injury or death were heat, tsunamis, avalanches, cold, surf and coastal events, hurricanes, fog, and dust. The types of events that were most likely to cause the most economic damage were hurricanes, droughts, surf and coastal events, tropical storms, tsunamis, and ice events. There was little or no relation between the types of events that caused economic damages and the type of events that caused injuries or deaths.

One caution to this analysis is that the events that were reported were those that caused injuries or had calculated economic damages. For instance, most heat waves may be harmless, but when they turn deadly (or when the deaths and injuries are reported), those effects are high. The many generally harmless heatwaves were not included in this analysis, so the deadliness of heatwaves may be overstated.