Synopsis

In this report, we attempt to determine which types of severe weather events have the most significant impact on population health and on the economy, using the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. To assess the effect of severe weather events on population health, we use the NOAA storm data to determine which event types have caused the greatest number of injuries and fatalities. To assess economic consequences, we determine which event types have caused the most crop and property damage. We find that tornadoes have caused the largest number of injuries and fatalities of any severe weather event type, and that hurricanes and typhoons are the event types most likely to cause significant amounts of crop and property damage.

Data Processing

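We start by reading the data directly from the bzip2-compressed .csv.bz2 file; read.csv() decompresses it transparently, so there is no need to unpack it first. Ensure that the Storm Data file is in the working directory. The analysis below also uses the dplyr, tidyr, and ggplot2 packages, so we load them first.

library(dplyr)
library(tidyr)
library(ggplot2)

# stringsAsFactors = TRUE keeps EVTYPE and the *EXP columns as factors (the default before R 4.0),
# so summary() reports counts per level as in the output below
StormData <- read.csv("repdata_data_StormData.csv.bz2", na.strings = "", stringsAsFactors = TRUE)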
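To see that read.csv() handles both the compression and the empty-string convention on its own, here is a small self-contained sketch. It writes a made-up three-line CSV (hypothetical values, not taken from the real file) to a temporary bzip2-compressed file with base R’s bzfile() and reads it back with the same na.strings = "" setting.

```r
# Write a tiny, made-up bzip2-compressed CSV to a temporary file
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, "w")
writeLines(c("EVTYPE,FATALITIES,PROPDMGEXP",
             "TORNADO,2,K",
             "HAIL,0,"), con)
close(con)

# read.csv() detects the compression itself;
# na.strings = "" turns the empty PROPDMGEXP field into NA
toy <- read.csv(path, na.strings = "")
print(toy)
```

This is the same convention that produces the NA counts for PROPDMGEXP and CROPDMGEXP reported further down.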

We will be investigating two questions: one about the effects of different types of severe weather events on population health, and a second about the economic consequences of different types of severe weather events. We’ll subset the data to create a separate dataframe for each question, containing only the columns relevant to that question.

PopStormData <- subset(StormData, select = c(EVTYPE, FATALITIES, INJURIES))

Next, we check to see if our subsetted PopStormData dataframe has any missing values.

sum(is.na(PopStormData$EVTYPE))
## [1] 0
sum(is.na(PopStormData$FATALITIES))
## [1] 0
sum(is.na(PopStormData$INJURIES))
## [1] 0
summary(PopStormData$EVTYPE)
##                     HAIL                TSTM WIND        THUNDERSTORM WIND 
##                   288661                   219940                    82563 
##                  TORNADO              FLASH FLOOD                    FLOOD 
##                    60652                    54277                    25326 
##       THUNDERSTORM WINDS                HIGH WIND                LIGHTNING 
##                    20843                    20212                    15754 
##               HEAVY SNOW               HEAVY RAIN             WINTER STORM 
##                    15708                    11723                    11433 
##           WINTER WEATHER             FUNNEL CLOUD         MARINE TSTM WIND 
##                     7026                     6839                     6175 
## MARINE THUNDERSTORM WIND               WATERSPOUT              STRONG WIND 
##                     5812                     3796                     3566 
##     URBAN/SML STREAM FLD                 WILDFIRE                 BLIZZARD 
##                     3392                     2761                     2719 
##                  DROUGHT                ICE STORM           EXCESSIVE HEAT 
##                     2488                     2006                     1678 
##               HIGH WINDS         WILD/FOREST FIRE             FROST/FREEZE 
##                     1533                     1457                     1342 
##                DENSE FOG       WINTER WEATHER/MIX           TSTM WIND/HAIL 
##                     1293                     1104                     1028 
##  EXTREME COLD/WIND CHILL                     HEAT                HIGH SURF 
##                     1002                      767                      725 
##           TROPICAL STORM           FLASH FLOODING             EXTREME COLD 
##                      690                      682                      655 
##            COASTAL FLOOD         LAKE-EFFECT SNOW        FLOOD/FLASH FLOOD 
##                      650                      636                      624 
##                LANDSLIDE                     SNOW          COLD/WIND CHILL 
##                      600                      587                      539 
##                      FOG              RIP CURRENT              MARINE HAIL 
##                      538                      470                      442 
##               DUST STORM                AVALANCHE                     WIND 
##                      427                      386                      340 
##             RIP CURRENTS              STORM SURGE            FREEZING RAIN 
##                      304                      261                      250 
##              URBAN FLOOD     HEAVY SURF/HIGH SURF        EXTREME WINDCHILL 
##                      249                      228                      204 
##             STRONG WINDS           DRY MICROBURST    ASTRONOMICAL LOW TIDE 
##                      196                      186                      174 
##                HURRICANE              RIVER FLOOD               LIGHT SNOW 
##                      174                      173                      154 
##         STORM SURGE/TIDE            RECORD WARMTH         COASTAL FLOODING 
##                      148                      146                      143 
##               DUST DEVIL         MARINE HIGH WIND        UNSEASONABLY WARM 
##                      141                      135                      126 
##                 FLOODING   ASTRONOMICAL HIGH TIDE        MODERATE SNOWFALL 
##                      120                      103                      101 
##           URBAN FLOODING               WINTRY MIX        HURRICANE/TYPHOON 
##                       98                       90                       88 
##            FUNNEL CLOUDS               HEAVY SURF              RECORD HEAT 
##                       87                       84                       81 
##                   FREEZE                HEAT WAVE                     COLD 
##                       74                       74                       72 
##              RECORD COLD                      ICE  THUNDERSTORM WINDS HAIL 
##                       64                       61                       61 
##      TROPICAL DEPRESSION                    SLEET         UNSEASONABLY DRY 
##                       60                       59                       56 
##                    FROST              GUSTY WINDS      THUNDERSTORM WINDSS 
##                       53                       53                       51 
##       MARINE STRONG WIND                    OTHER               SMALL HAIL 
##                       48                       48                       47 
##                   FUNNEL             FREEZING FOG             THUNDERSTORM 
##                       46                       45                       45 
##       Temperature record          TSTM WIND (G45)         Coastal Flooding 
##                       43                       39                       38 
##              WATERSPOUTS    MONTHLY PRECIPITATION                    WINDS 
##                       37                       36                       36 
##                  (Other) 
##                     2940
summary(PopStormData$FATALITIES)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##   0.0000   0.0000   0.0000   0.0168   0.0000 583.0000
summary(PopStormData$INJURIES)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##    0.0000    0.0000    0.0000    0.1557    0.0000 1700.0000

None of the three columns we’ll use to answer the question of which event types are most harmful to population health has any missing values.
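The three separate sum(is.na(...)) calls can be collapsed into one. Here is a minimal base-R sketch on a made-up data frame, where toy is a hypothetical stand-in for PopStormData. Applied to a data frame, is.na() returns a logical matrix, and colSums() counts the TRUE values in each column.

```r
# Hypothetical toy data standing in for PopStormData (one NA in EVTYPE, one in INJURIES)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", NA),
  FATALITIES = c(1, 0, 0),
  INJURIES   = c(5, NA, 0)
)

# is.na() on a data frame gives a logical matrix; colSums() counts NAs per column
na_counts <- colSums(is.na(toy))
print(na_counts)

# colMeans() gives the share of missing values per column instead of the count
na_share <- colMeans(is.na(toy))
print(round(100 * na_share, 1))
```

On the real data, colSums(is.na(PopStormData)) should reproduce the three zero counts above, and 100 * colMeans(is.na(EconStormData)) gives each column’s missing-value percentage directly.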

Next, we subset StormData to create a dataframe containing the columns relevant to answering the question of which severe weather event types have the greatest economic consequences.

EconStormData <- subset(StormData, select = c(EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP))

Next, we check whether our subsetted EconStormData dataframe has any missing values. Note that the damage amounts are stored in two parts, much like scientific notation: PROPDMG and CROPDMG hold a numeric amount, while PROPDMGEXP and CROPDMGEXP hold a code for the order of magnitude of the property damage and crop damage amounts, respectively.

sum(is.na(EconStormData$EVTYPE))
## [1] 0
sum(is.na(EconStormData$PROPDMG))
## [1] 0
sum(is.na(EconStormData$PROPDMGEXP))
## [1] 465934
sum(is.na(EconStormData$CROPDMG))
## [1] 0
sum(is.na(EconStormData$CROPDMGEXP))
## [1] 618413
summary(EconStormData$PROPDMG)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    0.00    0.00   12.06    0.50 5000.00
summary(EconStormData$CROPDMG)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   0.000   1.527   0.000 990.000
summary(EconStormData$PROPDMGEXP)
##      -      ?      +      0      1      2      3      4      5      6 
##      1      8      5    216     25     13      4      4     28      4 
##      7      8      B      h      H      K      m      M   NA's 
##      5      1     40      1      6 424665      7  11330 465934
summary(EconStormData$CROPDMGEXP)
##      ?      0      2      B      k      K      m      M   NA's 
##      7     19      1      9     21 281832      1   1994 618413

The only columns with missing values are PROPDMGEXP and CROPDMGEXP, and a large share of their entries are missing: about 51.6% and 68.5%, respectively. We assume that an entry whose PROPDMGEXP or CROPDMGEXP is missing already has its PROPDMG or CROPDMG amount in dollars (that is, an exponent of 0). We will remove rows whose PROPDMGEXP or CROPDMGEXP is “-”, “?”, or “+”, since these codes cannot be interpreted as a magnitude. We will convert the letter codes (“h” or “H” for “hundred”, “k” or “K” for “thousand”, “m” or “M” for “million”, and “B” for “billion”) to the corresponding base-10 exponents, and use the digit codes (“0” to “8”) as exponents as they stand, so that all entries in each column have the same format. Then we will create new columns containing the dollar amounts of property and crop damage, computed as the amount times 10 raised to the exponent.

EconStormData <- EconStormData %>%
  mutate(
    # Map each property-damage magnitude code to a base-10 exponent (as text for now);
    # codes that cannot be interpreted are flagged for removal
    cleanPROPDMGEXP = case_when(
      is.na(PROPDMGEXP)                ~ "0",       # no code: amount already in dollars
      PROPDMGEXP %in% c("h", "H")      ~ "2",       # hundreds
      PROPDMGEXP %in% c("k", "K")      ~ "3",       # thousands
      PROPDMGEXP %in% c("m", "M")      ~ "6",       # millions
      PROPDMGEXP == "B"                ~ "9",       # billions
      PROPDMGEXP %in% c("-", "?", "+") ~ "remove",  # uninterpretable codes
      TRUE                             ~ as.character(PROPDMGEXP)  # digit codes 0-8
    )
  ) %>%
  filter(cleanPROPDMGEXP != "remove") %>%
  mutate(
    cleanPROPDMGEXP = as.numeric(cleanPROPDMGEXP),
    cleanPROPDMG    = PROPDMG * 10^cleanPROPDMGEXP  # property damage in dollars
  )

EconStormData <- EconStormData %>%
  mutate(
    # Same mapping for the crop-damage magnitude codes
    cleanCROPDMGEXP = case_when(
      is.na(CROPDMGEXP)                ~ "0",       # no code: amount already in dollars
      CROPDMGEXP %in% c("h", "H")      ~ "2",       # hundreds
      CROPDMGEXP %in% c("k", "K")      ~ "3",       # thousands
      CROPDMGEXP %in% c("m", "M")      ~ "6",       # millions
      CROPDMGEXP == "B"                ~ "9",       # billions
      CROPDMGEXP %in% c("-", "?", "+") ~ "remove",  # uninterpretable codes
      TRUE                             ~ as.character(CROPDMGEXP)  # digit codes 0-8
    )
  ) %>%
  filter(cleanCROPDMGEXP != "remove") %>%
  mutate(
    cleanCROPDMGEXP = as.numeric(cleanCROPDMGEXP),
    cleanCROPDMG    = CROPDMG * 10^cleanCROPDMGEXP  # crop damage in dollars
  )
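To make the conversion concrete, here is a minimal base-R sketch of the same rules applied to a handful of made-up values. The helper name exp_code_to_power() is hypothetical and is not used elsewhere in the analysis. It follows the rules described above: a missing code means exponent 0, letters map to powers of ten, digits are used as they stand, and any other code (including “-”, “?”, and “+”) is left as NA so the row can be dropped.

```r
# Hypothetical helper mirroring the cleaning rules described above:
# returns the base-10 exponent for each magnitude code, or NA for codes to drop
exp_code_to_power <- function(code) {
  code <- as.character(code)
  out <- rep(NA_real_, length(code))
  out[is.na(code)] <- 0                        # no code: amount already in dollars
  out[code %in% c("h", "H")] <- 2              # hundreds
  out[code %in% c("k", "K")] <- 3              # thousands
  out[code %in% c("m", "M")] <- 6              # millions
  out[code %in% "B"] <- 9                      # billions
  is_digit <- !is.na(code) & grepl("^[0-9]$", code)
  out[is_digit] <- as.numeric(code[is_digit])  # digit codes are used as exponents
  out                                          # "-", "?", "+" (and anything else) stay NA
}

# Made-up amounts and codes
amount  <- c(25, 2.5, 1.2, 7, 3)
code    <- c("K", "M", "B", NA, "?")
power   <- exp_code_to_power(code)
dollars <- amount * 10^power
print(dollars)
```

For example, an amount of 2.5 with code “M” becomes 2.5 × 10^6 = 2,500,000 dollars, while the “?” row gets an NA exponent and would be filtered out before totals are computed.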

Results

Population Health

In order to assess the effect of severe weather events on population health, we’ll create a new variable that combines the two ways the dataset measures the impact of weather events on the population: number of injuries and number of fatalities. Our new variable will be their sum. We then use this new variable to identify the ten severe weather event types with the largest total sum of injuries and fatalities over all events in the dataset.

Because the number of event types is unmanageably large and includes numerous obvious coding errors (for example, “THUNDERSTORM WINDS” and “THUNDERSTORM WINDSS”, or “COASTAL FLOODING” and “Coastal Flooding”), and because we don’t know enough about the dataset to correct those errors reliably, we will not attempt to combine apparent duplicate event types. Instead, we use the event types as coded in the dataset and focus the analysis on the 10 event types with the largest sum of fatalities and injuries.
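One quick way to gauge how many apparent duplicates differ only in letter case or stray whitespace is to count distinct labels before and after normalizing them with toupper() and trimws(). This is a diagnostic sketch on a few hand-picked labels, not a step used in the analysis; the trailing-space label is hypothetical.

```r
# Toy labels: the first five appear in the EVTYPE summary above;
# the trailing-space variant is a hypothetical example
labels <- c("COASTAL FLOODING", "Coastal Flooding",
            "THUNDERSTORM WINDS", "THUNDERSTORM WINDSS",
            "TSTM WIND", "TSTM WIND ")

raw_n <- length(unique(labels))
# Upper-case and trim surrounding whitespace, then count distinct labels again
norm_n <- length(unique(toupper(trimws(labels))))
cat(raw_n, "labels before normalizing,", norm_n, "after\n")
```

Spelling variants such as “THUNDERSTORM WINDSS” survive this normalization, which illustrates why merging event types reliably would take more knowledge of the dataset than simple string cleanup.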

PopStormData <- PopStormData %>%
  mutate(PopHealth = FATALITIES + INJURIES)

EventsByPopImpact <- PopStormData %>% 
   group_by(EVTYPE) %>% 
   summarize(PopImpact = sum(PopHealth)) 

EventsByPopImpact <- EventsByPopImpact[order(-EventsByPopImpact$PopImpact), ]

Top10PopImpact <- EventsByPopImpact[1:10, ]
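The grouping, summing, sorting, and slicing steps above can also be written in base R. Here is a minimal sketch on made-up data, where toy is a hypothetical stand-in for PopStormData and n = 2 replaces 10 so the toy result is easy to check.

```r
# Hypothetical toy data standing in for PopStormData
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(2, 0, 1, 3, 0),
  INJURIES   = c(10, 1, 4, 0, 3)
)
# Combined population-health measure, as in PopHealth above
toy$PopHealth <- toy$FATALITIES + toy$INJURIES

# Sum PopHealth within each event type
totals <- aggregate(PopHealth ~ EVTYPE, data = toy, FUN = sum)
# Sort in decreasing order and keep the top n rows (n = 2 here, 10 in the report)
top <- head(totals[order(-totals$PopHealth), ], 2)
print(top)
```

In dplyr, arrange(desc(PopImpact)) followed by head(10), or slice_max(PopImpact, n = 10), expresses the same ranking. Note that slice_max() keeps ties by default, so it can return more than 10 rows.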

To show the impact of the ten event types with the largest numbers of fatalities and injuries, we’ll make a bar graph of the total number of fatalities and injuries caused by events of those types, with fatalities drawn to the left of zero (as negative values) and injuries to the right.

# Keep only the rows for the ten most harmful event types
p1 <- ggplot(data = PopStormData[PopStormData$EVTYPE %in% Top10PopImpact$EVTYPE, ], aes(x = EVTYPE)) +
  # Fatalities are plotted as negative values so they extend to the left of zero
  geom_col(aes(y = -FATALITIES, fill = "Fatalities")) +
  geom_col(aes(y = INJURIES, fill = "Injuries")) +
  coord_flip() +
  labs(x = "Event Type", y = "Number of Fatalities and Injuries", fill = "Legend",
       title = "Population Impact of Weather Events") +
  scale_fill_manual(values = c("Fatalities" = "red", "Injuries" = "blue"))

p1

From the bar graph, we can see that tornadoes have caused by far the largest number of fatalities and the largest number of injuries of any type of severe weather event.

Economic Effects

In order to assess the economic effects of severe weather events, we’ll look at the dollar figures of property damage and crop damage caused by the various types of events.

As with population health, we use the event types as coded in the dataset rather than trying to merge apparent duplicates. We pick out the 10 event types with the largest total property damage and the 10 with the largest total crop damage, and focus the analysis on event types that appear in either list.

EventsByPropDmg <- EconStormData %>% 
   group_by(EVTYPE) %>% 
   summarize(TotalPropDmg = sum(cleanPROPDMG)) 

EventsByPropDmg <- EventsByPropDmg[order(-EventsByPropDmg$TotalPropDmg), ]

Top10PropDmg <- EventsByPropDmg[1:10, ]

EventsByCropDmg <- EconStormData %>% 
   group_by(EVTYPE) %>% 
   summarize(TotalCropDmg = sum(cleanCROPDMG)) 

EventsByCropDmg <- EventsByCropDmg[order(-EventsByCropDmg$TotalCropDmg), ]

Top10CropDmg <- EventsByCropDmg[1:10, ]
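The property and crop rankings repeat the same three steps with different columns. A hypothetical helper, top_totals(), could factor that repetition out. This base-R sketch runs on made-up values standing in for EconStormData.

```r
# Hypothetical helper: total `value_col` within each level of `group_col`
# and return the n groups with the largest totals, largest first
top_totals <- function(df, group_col, value_col, n = 10) {
  totals <- vapply(split(df[[value_col]], df[[group_col]]), sum, numeric(1))
  totals <- sort(totals, decreasing = TRUE)
  head(data.frame(group = names(totals), total = unname(totals),
                  stringsAsFactors = FALSE), n)
}

# Made-up values standing in for EconStormData
toy <- data.frame(
  EVTYPE       = c("FLOOD", "HAIL", "FLOOD", "TORNADO"),
  cleanPROPDMG = c(5e6, 1e4, 2e6, 3e6),
  cleanCROPDMG = c(0, 2e5, 1e6, 0)
)
top_prop <- top_totals(toy, "EVTYPE", "cleanPROPDMG", n = 2)
top_crop <- top_totals(toy, "EVTYPE", "cleanCROPDMG", n = 2)
print(top_prop)
print(top_crop)
```

With the real data, top_totals(EconStormData, "EVTYPE", "cleanPROPDMG") and top_totals(EconStormData, "EVTYPE", "cleanCROPDMG") would play the roles of Top10PropDmg and Top10CropDmg, apart from column names and the ordering of any ties.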

We’ll reshape the data to allow us to plot property and crop damage on the same plots.

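longEconStormData <- EconStormData %>%
  # Keep event types in the top ten for either property or crop damage
  filter(EVTYPE %in% Top10PropDmg$EVTYPE | EVTYPE %in% Top10CropDmg$EVTYPE) %>%
  # One row per event and damage type: DmgType names the source column, Dollars holds the value
  pivot_longer(c(cleanCROPDMG, cleanPROPDMG), names_to = "DmgType", values_to = "Dollars")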

In order to show the impact of the event types that carry the highest economic costs, we will make boxplots of the property and crop damage caused by events of the types in the top ten for total amounts of either property or crop damage.

p2 <- ggplot(data=longEconStormData, aes(EVTYPE, log10(Dollars + 1))) + 
          geom_boxplot(aes(color = DmgType)) +
          theme(axis.text.x = element_text(angle=90, hjust=1)) + 
          scale_y_continuous(breaks = c(0, 3, 6, 9), labels = c("0", "1,000", "1,000,000", "1,000,000,000")) + 
          labs(x = "Event Type", y = "Economic cost, in dollars (log scale)", title = "Economic Impact of Weather Events") + 
          scale_color_discrete(name="Legend", labels=c("Crop Damage", "Property Damage"))

p2
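The y axis uses log10(Dollars + 1) because most events have zero damage, and log10(0) is -Inf, which cannot be placed on an axis. Adding one dollar maps $0 to 0 and leaves large amounts essentially unchanged, so the breaks at 0, 3, 6, and 9 correspond to roughly $0, $1,000, $1,000,000, and $1,000,000,000. A short check of that arithmetic:

```r
# log10 of zero is -Inf, which a plot cannot place on an axis
print(log10(0))

dollars <- c(0, 999, 999999, 999999999)
# Shifting by one dollar keeps zero-damage events at 0 on the log scale
y <- log10(dollars + 1)
print(y)

# The transform is invertible: 10^y - 1 recovers the original amounts
back <- 10^y - 1
print(back)
```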

From the box plots, we can see that events coded as HURRICANE or HURRICANE/TYPHOON tend to cause the most economic damage per event, in terms of the dollar costs of property and crop damage. Most events cause no crop damage at all (every event type shown appears to have a median crop damage of $0), but HURRICANE and HURRICANE/TYPHOON are the only event types whose third quartile of crop damage is clearly above $0, and they are also the event types with the highest median property damage.
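The box-plot reading can be checked numerically by tabulating the median and third quartile of Dollars for each combination of event type and damage type. Here is a minimal sketch with base R’s tapply(), on made-up values standing in for longEconStormData; the numbers below are illustrative only and are not results from the NOAA data.

```r
# Made-up long-format data standing in for longEconStormData
toy <- data.frame(
  EVTYPE  = rep(c("HURRICANE", "HAIL"), each = 4),
  DmgType = rep(c("cleanCROPDMG", "cleanPROPDMG"), times = 4),
  Dollars = c(0, 5e6, 2e5, 1e7,   # HURRICANE: crop, prop, crop, prop
              0, 0, 0, 1e3)       # HAIL: crop, prop, crop, prop
)

# Median and third quartile of Dollars for each event type x damage type cell
med <- with(toy, tapply(Dollars, list(EVTYPE, DmgType), median))
q3  <- with(toy, tapply(Dollars, list(EVTYPE, DmgType),
                        quantile, probs = 0.75, names = FALSE))
print(med)
print(q3)
```

On the real data, the same calls with longEconStormData in place of toy would give the numbers behind each box. If EVTYPE is a factor, wrapping it in droplevels() first avoids rows of NA for event types that were filtered out.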