Data Analyst: Bo Suzow
Date: November 3, 2017

Introduction


This project explores a subset of the weather event dataset available at the U.S. National Oceanic and Atmospheric Administration (NOAA). As severe weather conditions cause fatalities, injuries, and substantial economic damages, ascertaining which weather event types are responsible for these threatening populations well-being leads us to effectively allocate limited resources in coping with future weather events. To this end, the analysis explores the dataset in order to answer the following two questions:

  • Across the United States, which types of events are most harmful with respect to population health?
  • Across the United States, which types of events have the greatest economic consequences?

The dataset is relatively large (900K+ observations). This analysis references Table 1. Storm Data Event Table (SDET) in the storm data preparation instructions and guidelines in determining “permitted” event types in the data set. The data clean-up of the Event Type variable (EVTYPE) is performed according to the SDET (see the Data Processing section for details).

Data Processing


The following code chunk performs:

  • Download the file from the URL provided by the instructor.
  • Read the file and load it as a data table (dtStorm).
  • Display its structure.
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
              "stormDB.csv")

dtStorm = as.data.table(read.csv("stormDB.csv"))
str(dtStorm)
## Classes 'data.table' and 'data.frame':   902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
##  - attr(*, ".internal.selfref")=<externalptr>

This report focuses on seven variables (EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG,and CROPDMGEXP) in order to explore the dataset to address the questions stated in the earlier section of the report. For processing efficiencies:

  • A data table (dtNet) hosting these variables is created.
  • The Event Type (EVTYPE) values are tidied by being ‘standardized’. Slight variations noticed in values get coerced to permitted event type names.
  • The dtNet table gets split to two:
    • Fatalities and injuries
    • Economic impact
      • Property or crop damage estimates were recorded along with 3 different dollar magnitude units (B for billions, M for millions and K for thousands). They are normalized in thousands.
      • Property and crop damage estimates are combined for a total estimate per event type.
      • The “K” was assumed for the estimates not associated with a proper dollar magnitude units.
  • Missing values are filtered out from property or crop damage estimate variables.
dtNet = dtStorm %>% select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
dtNet = dtNet %>% mutate(EVTYPE=toupper(EVTYPE))

# clean up the EVTYPE values

dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^COLD",EVTYPE),"COLD",NA))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^(EXCESSIVE|EXTREME) HEAT",EVTYPE),
                                             "EXCESSIVE HEAT",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^EXTREME (COLD|WINDCHILL)",EVTYPE),
                                             "EXTREME COLD-WINDCHILL",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("(FREEZE|FROST)",EVTYPE),"FROST-FREEZE",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^FLOOD",EVTYPE),"FLOOD",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^FLASH FLOOD",EVTYPE),"FLASH FLOOD",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl(".FLASH FLOOD",EVTYPE),"FLASH FLOOD",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^HAIL",EVTYPE),"HAIL",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^HEAT",EVTYPE),"HEAT",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^HEAVY RAIN",EVTYPE),"HEAVY RAIN",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^HEAVY SNOW",EVTYPE),"HEAVY SNOW",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^(HEAVY|HIGH) SURF",EVTYPE),"HIGH SURF",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^HIGH WIND",EVTYPE),"HIGH WIND",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^HURRICANE",EVTYPE),"HURRICANE",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^LAKE.EFFECT",EVTYPE),"LAKE-EFFECT SNOW",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^LANDSLIDE",EVTYPE),"LANDSLIDE",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^LIGHTNING",EVTYPE),"LIGHTNING",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^(MAJOR|RIVER) FLOOD",EVTYPE),"FLOOD",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^RIP CURRENT",EVTYPE),"RIP CURRENT",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^STORM SURGE",EVTYPE),"STORM TIDE",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^STRONG WIND",EVTYPE),"STRONG WIND",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^TROPICAL STORM",EVTYPE),"TROPICAL STORM",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^WATERSPOUT",EVTYPE),"WATERSPOUT",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^WILD*.+FIRE",EVTYPE),"WILDFIRE",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^(THUNDERSTORM|TSTM)",EVTYPE),
                                                       "THUNDERSTORM",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("THUNDERSTORM$",EVTYPE),
                                                       "THUNDERSTORM",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^TORNADO",EVTYPE),"TORNADO",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^WINTER STORM",EVTYPE),"WINTER STORM",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(grepl("^WINTER WEATHER",EVTYPE),"WINTER WEATHER",EVTYPEtidy))
dtNet = dtNet %>% mutate(EVTYPEtidy = ifelse(is.na(EVTYPEtidy),EVTYPE,EVTYPEtidy))

# set up a data table for population health impact

dtPHI = dtNet %>% filter((FATALITIES >0 & !is.na(FATALITIES)) | (INJURIES >0 & !is.na(INJURIES)))
dtPHI = dtPHI  %>% select(EVTYPEtidy, FATALITIES, INJURIES)

# set up a data table for economic damages

dmgUnit = c("B","b","M","m","K","k")
Bil = 1000000
Mil = 1000

dtEI = dtNet %>% filter((PROPDMG >0 & !is.na(PROPDMG) & !is.na(PROPDMGEXP) & PROPDMGEXP %in% dmgUnit)| 
                        (CROPDMG >0 & !is.na(CROPDMG) & !is.na(CROPDMGEXP) & CROPDMGEXP %in% dmgUnit))
dtEI = dtEI %>% mutate (PDTOT = paste(PROPDMG,PROPDMGEXP,sep=""), 
                        CDTOT = paste(CROPDMG,CROPDMGEXP,sep=""))
dtEI = dtEI %>% mutate (damageTotal = ifelse(PROPDMGEXP %in% c("B","b"),PROPDMG*Bil, 
                                 ifelse(PROPDMGEXP %in% c("M","m"),PROPDMG*Mil,PROPDMG)) +
                                 ifelse(CROPDMGEXP %in% c("B","b"),CROPDMG*Bil, 
                                 ifelse(CROPDMGEXP %in% c("M","m"),CROPDMG*Mil,CROPDMG))
                       )
dtEI = dtEI  %>% select(EVTYPEtidy, PDTOT, CDTOT, damageTotal)

Data Plotting for Population Health Impact


Compute the fatality totals by event type and plot the event types that have caused 100 or more fatalities.

dtPHIsum = dtPHI %>% group_by(EVTYPEtidy) %>% summarize(sum(FATALITIES),sum(INJURIES))
names(dtPHIsum) = c("EVTYPEtidy","FATALITIES","INJURIES")
dtPHIsum = dtPHIsum %>% filter(FATALITIES >=100)

maxlabel = paste("Max=",max(dtPHIsum$FATALITIES))

p1a <- ggplot(dtPHIsum,aes(EVTYPEtidy,FATALITIES)) +
      geom_bar(stat="identity", col="black", fill=I("grey"), alpha=I(.75)) +
      xlab("Event Type (1995 through Nov 2011)") + 
      ylab("Fatality (>=100)") +
      ggtitle("Most Fatal Weather Events - Figure 1a") +
      geom_text(aes("TORNADO",max(FATALITIES)+500,label=maxlabel),col="red") +
      theme(axis.text.x = element_text(angle=90, hjust=1)) +
      theme(plot.title=element_text(hjust=1))

print(p1a)

The event types caused 100 or more fatalities were harmful to populations with injuries as well.

maxlabel = paste("Max=",max(dtPHIsum$INJURIES))

p1b <- ggplot(dtPHIsum,aes(EVTYPEtidy,INJURIES)) +
      geom_bar(stat="identity", col="black", fill=I("grey"), alpha=I(.5)) +
      xlab("Event Type (1995 through Nov 2011)") + 
      ylab("Number of Injuries") +
      ggtitle("Most Injury-induced Weather Events - Figure 1b") +
      geom_text(aes("TORNADO",max(INJURIES)+10000,label=maxlabel),col="red") +
      theme(axis.text.x = element_text(angle=90, hjust=1)) +
      theme(plot.title=element_text(hjust=1))

print(p1b)

Data Plotting for Economic Impact


Repeat the same process above for property/crop damage (estimated) totals. Let’s zero in on the event types accounting for 1 billion dollars or more loss.

dtEIsum = dtEI %>% group_by(EVTYPEtidy) %>% summarize(sum(damageTotal))
names(dtEIsum) = c("EVTYPEtidy","damageTotal")
dtEIsum = dtEIsum %>% filter(damageTotal>=Bil)

maxlabel = paste("Max=",round(max(dtEIsum$damageTotal)/Bil,digits=0))

p2 <- ggplot(dtEIsum,aes(EVTYPEtidy,damageTotal/Bil)) +
      geom_bar(stat="identity", col="black", fill=I("red")) +
      xlab("Event Type (1995 through Nov 2011)") + 
      ylab("Damage Total (in billions)") +
      ggtitle("Most Economically Damaging Weather Events - Figure 2") +
      geom_text(aes("FLOOD",max(damageTotal)/Bil+10,
                    label=maxlabel),col="red") +
      theme(axis.text.x = element_text(angle=90, hjust=1)) +
      theme(plot.title=element_text(hjust=1))
            

print(p2)

Results


The following conclusions are drawn from the dataset:

  • Tornado, Excessive Heat (EH), Flash Flood (FF), Heat and Lightning are the top five weather events accounting for the most fatalities. Of these, Tornado, EH and Lightning make the three of the top five events that resulted in more injuries.
  • Flood, Hurricane, Tornado, Storm Tide and Hail are the top five weather events responsible for most economic damages.