Weather Events Most Harmful to Population Health and with Greatest Economic Consequences

Synopsis

In this report we determine which weather events were most harmful to population health, in terms of fatalities and injuries, and which had the greatest economic consequences, in terms of property damage and crop damage, over the 5 years ending 30 November 2011. Data were obtained from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. We found that tornadoes were the most harmful to population health in terms of both fatalities and injuries, followed by flash floods and rip currents in terms of fatalities, and by thunderstorm wind and lightning in terms of injuries. As for economic consequences, tornadoes, floods and hail (in decreasing order) caused the greatest amounts of property damage, while floods, frost/freeze events and hail caused the greatest amounts of crop damage.

Data Processing

Needed libraries are loaded.

library(dplyr)
library(lattice)

Reading in the data

Data for all available years is loaded.

noaa <- read.csv("repdata-data-StormData.csv.bz2", header=TRUE, na.strings="")

Note that there are comments interspersed in the data file, but comparing the tail of the loaded dataset with the tail of the CSV file in a text editor suggests that the comments are handled correctly. The tail also shows that dates are in M/D/Y format and that there are 902297 rows.

tail(noaa[, c("BGN_DATE", "COUNTYNAME", "EVTYPE")])
##                  BGN_DATE                           COUNTYNAME
## 902292 11/28/2011 0:00:00 TNZ001>004 - 019>021 - 048>055 - 088
## 902293 11/30/2011 0:00:00                         WYZ007 - 017
## 902294 11/10/2011 0:00:00                         MTZ009 - 010
## 902295  11/8/2011 0:00:00                               AKZ213
## 902296  11/9/2011 0:00:00                               AKZ202
## 902297 11/28/2011 0:00:00                               ALZ006
##                EVTYPE
## 902292 WINTER WEATHER
## 902293      HIGH WIND
## 902294      HIGH WIND
## 902295      HIGH WIND
## 902296       BLIZZARD
## 902297     HEAVY SNOW

Since this report is only about:

  1. The weather events that are most harmful to human health.
  2. The weather events that have the greatest economic consequences.

only the following columns are relevant.

relevant.col <- c("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG",
                 "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

The BGN_TIME column is not included since this report summarizes by year and therefore does not go to the level of granularity represented by BGN_TIME. As for events that cross year boundaries, we assume that Rule 2.3.2 is followed, in which events that span months will have an entry for each month [1, pg. 7].

For the relevant columns, the inferred data types are as follows:

str(noaa[, relevant.col])
## 'data.frame':    902297 obs. of  8 variables:
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 18 levels "-","?","+","0",..: 16 16 16 16 16 16 16 16 16 16 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 8 levels "?","0","2","B",..: NA NA NA NA NA NA NA NA NA NA ...

The BGN_DATE column was read in as a factor; it is more appropriately stored as type Date.

noaa$BGN_DATE <- as.Date(noaa$BGN_DATE, "%m/%d/%Y %H:%M:%S")
str(noaa[, "BGN_DATE"])
##  Date[1:902297], format: "1950-04-18" "1950-04-18" "1951-02-20" "1951-06-08" ...
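
As a small illustration of the conversion above, the following sketch applies the same format string to two sample strings in the layout seen in the dataset. The variable names raw.dates, parsed and bad are hypothetical and not part of the analysis. It also shows that a string that does not match the format becomes NA instead of raising an error, which is why checking the result for missing values is a reasonable precaution.

```r
# Sample strings in the "M/D/Y H:M:S" layout seen in the dataset.
raw.dates <- c("11/28/2011 0:00:00", "4/18/1950 0:00:00")

# Same format string as used for noaa$BGN_DATE; the time part is parsed
# and then discarded because as.Date() keeps only the calendar date.
parsed <- as.Date(raw.dates, format = "%m/%d/%Y %H:%M:%S")
print(parsed)

# A string in a different layout does not match the format and yields NA.
bad <- as.Date("2011-11-28", format = "%m/%d/%Y %H:%M:%S")
print(is.na(bad))
```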

A summary of the relevant columns shows that there are no negative values, and that (only) the multiplier exponents for property damage and crop damage (PROPDMGEXP and CROPDMGEXP respectively) have missing values. Also, the last begin date on which there is a recorded weather event is 2011-11-30.

summary(noaa[, relevant.col])
##     BGN_DATE                        EVTYPE         FATALITIES      
##  Min.   :1950-01-03   HAIL             :288661   Min.   :  0.0000  
##  1st Qu.:1995-04-20   TSTM WIND        :219940   1st Qu.:  0.0000  
##  Median :2002-03-18   THUNDERSTORM WIND: 82563   Median :  0.0000  
##  Mean   :1998-12-27   TORNADO          : 60652   Mean   :  0.0168  
##  3rd Qu.:2007-07-28   FLASH FLOOD      : 54277   3rd Qu.:  0.0000  
##  Max.   :2011-11-30   FLOOD            : 25326   Max.   :583.0000  
##                       (Other)          :170878                     
##     INJURIES            PROPDMG          PROPDMGEXP        CROPDMG       
##  Min.   :   0.0000   Min.   :   0.00   K      :424665   Min.   :  0.000  
##  1st Qu.:   0.0000   1st Qu.:   0.00   M      : 11330   1st Qu.:  0.000  
##  Median :   0.0000   Median :   0.00   0      :   216   Median :  0.000  
##  Mean   :   0.1557   Mean   :  12.06   B      :    40   Mean   :  1.527  
##  3rd Qu.:   0.0000   3rd Qu.:   0.50   5      :    28   3rd Qu.:  0.000  
##  Max.   :1700.0000   Max.   :5000.00   (Other):    84   Max.   :990.000  
##                                        NA's   :465934                    
##    CROPDMGEXP    
##  K      :281832  
##  M      :  1994  
##  k      :    21  
##  0      :    19  
##  B      :     9  
##  (Other):     9  
##  NA's   :618413

Converting damage multiplier exponents from character to numeric

To simplify the calculations downstream, the damage multiplier exponents will be converted from character to numeric. The sets of characters used in PROPDMGEXP and CROPDMGEXP are as follows:

levels(noaa$PROPDMGEXP)
##  [1] "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K" "m"
## [18] "M"
levels(noaa$CROPDMGEXP)
## [1] "?" "0" "2" "B" "k" "K" "m" "M"

It is decided, pending further information, that:

  1. “B”/“b” stands for billions i.e. a multiplier exponent of 9.
  2. “M”/“m” stands for millions i.e. a multiplier exponent of 6.
  3. “K”/“k” stands for thousands (kilos) i.e. a multiplier exponent of 3.
  4. “H”/“h” stands for hundreds i.e. a multiplier exponent of 2.
  5. Digits are converted literally i.e. a multiplier exponent of the digit value.
  6. All other characters (including NAs) are converted to a multiplier exponent of 0 i.e. a multiplier of 1.

The above rules are captured in the following function:

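mult.exp.char2num <- function(mult.exp) {
    mult.exp <- as.character(mult.exp)
    # Digits are taken literally. as.integer() warns when given a
    # non-numeric character such as "K", so the warning is suppressed.
    digit <- suppressWarnings(as.integer(mult.exp))
    if (!is.na(digit)) {
        return (digit)
    } else {
        # Each empty alternative (e.g. B=) falls through to the next value.
        return (switch(mult.exp,
                       B=, b=9,
                       M=, m=6,
                       K=, k=3,
                       H=, h=2,
                       0))  # Default multiplier exponent.
    }
}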
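
Since mult.exp.char2num handles one value at a time, it has to be applied element by element to roughly 900,000 rows. The same rules could also be sketched as a vectorized lookup. The function below is a hypothetical alternative, not the one used in this report; the name mult.exp.vec is an assumption, and it assumes that digits only ever appear as single characters, which is the case for the levels listed above.

```r
# Hypothetical vectorized variant of the conversion rules (illustrative name).
mult.exp.vec <- function(mult.exp) {
    mult.exp <- as.character(mult.exp)
    # Lookup table for the letter codes; case is folded before matching.
    letter.map <- c(B = 9L, M = 6L, K = 3L, H = 2L)
    out <- unname(letter.map[toupper(mult.exp)])
    # Single digits are taken literally.
    is.digit <- !is.na(mult.exp) & grepl("^[0-9]$", mult.exp)
    out[is.digit] <- as.integer(mult.exp[is.digit])
    # Everything still unmatched ("-", "?", "+", NA) gets the default of 0.
    out[is.na(out)] <- 0L
    out
}

# One value per rule, in the order the rules are listed.
codes <- c("B", "m", "k", "H", "5", "0", "?", "-", "+", NA)
print(mult.exp.vec(codes))
```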

Using mult.exp.char2num, the damage multiplier exponents are converted. The summary of the associated columns shows that there are no missing values.

noaa$PROPDMGEXP <- sapply(noaa$PROPDMGEXP, mult.exp.char2num)
noaa$CROPDMGEXP <- sapply(noaa$CROPDMGEXP, mult.exp.char2num)
summary(noaa[, c("PROPDMGEXP", "CROPDMGEXP")])
##    PROPDMGEXP      CROPDMGEXP    
##  Min.   :0.000   Min.   :0.0000  
##  1st Qu.:0.000   1st Qu.:0.0000  
##  Median :0.000   Median :0.0000  
##  Mean   :1.488   Mean   :0.9505  
##  3rd Qu.:3.000   3rd Qu.:3.0000  
##  Max.   :9.000   Max.   :9.0000
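
With numeric exponents, the dollar value of a damage record is the recorded amount multiplied by 10 raised to the exponent. The sketch below works this through on a toy data frame whose values are invented purely for illustration; it uses base R only, and the names toy, prop.dollars and totals are hypothetical.

```r
# Toy records; the amounts are invented for illustration only.
toy <- data.frame(
    EVTYPE     = c("TORNADO", "TORNADO", "HAIL"),
    PROPDMG    = c(25, 2.5, 1),
    PROPDMGEXP = c(3, 6, 0)    # thousands, millions, no multiplier
)

# Dollar value = recorded amount * 10^exponent.
toy$prop.dollars <- toy$PROPDMG * 10^toy$PROPDMGEXP

# Total per event type, largest total first.
totals <- aggregate(prop.dollars ~ EVTYPE, data = toy, FUN = sum)
totals <- totals[order(-totals$prop.dollars), ]
print(totals)
```

Here 25 with an exponent of 3 becomes 25,000 and 2.5 with an exponent of 6 becomes 2,500,000, so the toy tornado total is 2,525,000.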

Results

Firstly, the number of recorded events per year is plotted.

# Extract the calendar year as an integer so that it plots on a numeric axis,
# then count the events recorded in each year.
noaa.year <- noaa %>% mutate(year = as.integer(format(BGN_DATE, "%Y"))) %>%
                      group_by(year) %>% summarize(count = n())
plot(x=noaa.year$year, y=noaa.year$count, type="b",
     main="Number of weather events recorded per year", xlab="Year",
     ylab="Event count")

Figure caption: The plot shows a sharp increase in the number of recorded events around 1993-1995. Only the last 5 years (December 2006 to November 2011) are of interest here, for the reasons given below, and a considerable number of events were recorded in that period.

Only the last 5 years are considered because the climate is not static and varies from year to year. Since the last begin date on which there is a recorded event is 2011-11-30, the 5-year window begins on 2006-12-01, the day directly after 2006-11-30.

noaa.last5 <- noaa %>% filter(BGN_DATE > as.Date("2006-11-30"))
with(noaa.last5, c(`Min.` = min(BGN_DATE), `Max.` = max(BGN_DATE)))
##         Min.         Max. 
## "2006-12-01" "2011-11-30"
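
The cutoff date is hard-coded in the filter above. As a hedged alternative, it could be derived from the last recorded date, so that the window follows the data if the dataset is updated. The sketch below uses base R only; last.date, cutoff, dates and kept are hypothetical names, and the three sample dates are chosen to show the boundary behaviour.

```r
# Last begin date in the dataset, as reported by the summary.
last.date <- as.Date("2011-11-30")

# seq() with by = "-5 years" steps back five calendar years;
# the second element of the sequence is the cutoff.
cutoff <- seq(last.date, by = "-5 years", length.out = 2)[2]
print(cutoff)

# A strict ">" excludes the cutoff day itself, so the window
# starts on the following day.
dates <- as.Date(c("2006-11-30", "2006-12-01", "2011-11-30"))
kept <- dates[dates > cutoff]
print(kept)
```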

Weather events most harmful to human health in the last 5 years

To determine the weather events most harmful to human health over the last 5 years, we obtain the total number of fatalities and the total number of injuries for each weather event and sort each in descending order.

(noaa.last5.fatal <- noaa.last5 %>% group_by(EVTYPE) %>%
                     summarize(total = sum(FATALITIES)) %>%
                     arrange(desc(total)))
## Source: local data frame [48 x 2]
## 
##               EVTYPE total
## 1            TORNADO   865
## 2        FLASH FLOOD   296
## 3        RIP CURRENT   208
## 4               HEAT   182
## 5              FLOOD   161
## 6          LIGHTNING   159
## 7  THUNDERSTORM WIND   130
## 8     EXCESSIVE HEAT   119
## 9    COLD/WIND CHILL    94
## 10         AVALANCHE    83
## ..               ...   ...
(noaa.last5.injur <- noaa.last5 %>% group_by(EVTYPE) %>%
                     summarize(total = sum(INJURIES)) %>%
                     arrange(desc(total)))
## Source: local data frame [48 x 2]
## 
##               EVTYPE total
## 1            TORNADO  9666
## 2  THUNDERSTORM WIND  1393
## 3          LIGHTNING   923
## 4     EXCESSIVE HEAT   880
## 5               HEAT   702
## 6           WILDFIRE   425
## 7     WINTER WEATHER   324
## 8        FLASH FLOOD   316
## 9               HAIL   180
## 10             FLOOD   171
## ..               ...   ...

As shown in the output above, tornadoes were the weather event most harmful to human health in the last 5 years, in terms of both fatalities and injuries. In terms of fatalities, the next most harmful events were flash floods and rip currents, while in terms of injuries, the next most harmful events were thunderstorm wind and lightning.

Weather events that have the greatest economic consequences in the last 5 years

To determine the weather events that have the greatest economic consequences over the last 5 years, we obtain the total amount of property damage and the total amount of crop damage for each weather event and sort each in descending order.

(noaa.last5.propdmg <- noaa.last5 %>% group_by(EVTYPE) %>%
                       summarize(total = sum(PROPDMG*(10^PROPDMGEXP))) %>%
                       arrange(desc(total)))
## Source: local data frame [48 x 2]
## 
##               EVTYPE       total
## 1            TORNADO 14699413740
## 2              FLOOD 13969813800
## 3               HAIL  6099123600
## 4        FLASH FLOOD  5041609130
## 5   STORM SURGE/TIDE  4640643000
## 6  THUNDERSTORM WIND  3375990190
## 7          HURRICANE  2467600000
## 8           WILDFIRE  2200413470
## 9          HIGH WIND  1221896040
## 10      WINTER STORM   974440000
## ..               ...         ...
(noaa.last5.cropdmg <- noaa.last5 %>% group_by(EVTYPE) %>%
                       summarize(total = sum(CROPDMG*(10^CROPDMGEXP))) %>%
                       arrange(desc(total)))
## Source: local data frame [48 x 2]
## 
##               EVTYPE      total
## 1              FLOOD 2886110000
## 2       FROST/FREEZE  931801000
## 3               HAIL  868793000
## 4        FLASH FLOOD  711942000
## 5            DROUGHT  426441000
## 6  THUNDERSTORM WIND  398102000
## 7     TROPICAL STORM  180921000
## 8          HURRICANE  180510000
## 9          HIGH WIND  106571000
## 10           TORNADO  103210000
## ..               ...        ...

As shown in the output above, tornadoes had the greatest economic consequences in the last 5 years in terms of property damage, followed by floods and hail. In terms of crop damage, floods had the greatest economic consequences, followed by frost/freeze events and hail.

Problems / Future Work

  • Very little analysis has been done regarding the possibility of systemic errors in the dataset. In particular, there has been no attempt to reconcile the 985 weather event types (EVTYPE) that appear in the whole dataset, since this is not a problem for the last 5 years, as shown below. All 48 event types shown are defined in [1], once Landslide is renamed to Debris Flow [1, pg. 1].
unique(noaa.last5$EVTYPE)
##  [1] DENSE FOG                HIGH WIND               
##  [3] HEAVY SNOW               BLIZZARD                
##  [5] THUNDERSTORM WIND        HEAVY RAIN              
##  [7] FLOOD                    FROST/FREEZE            
##  [9] WILDFIRE                 STRONG WIND             
## [11] HEAT                     DUST STORM              
## [13] FUNNEL CLOUD             WINTER STORM            
## [15] HIGH SURF                WINTER WEATHER          
## [17] HAIL                     LIGHTNING               
## [19] DROUGHT                  AVALANCHE               
## [21] EXTREME COLD/WIND CHILL  FLASH FLOOD             
## [23] TORNADO                  RIP CURRENT             
## [25] LAKE-EFFECT SNOW         ICE STORM               
## [27] SLEET                    SEICHE                  
## [29] COLD/WIND CHILL          WATERSPOUT              
## [31] MARINE THUNDERSTORM WIND MARINE HAIL             
## [33] MARINE STRONG WIND       DUST DEVIL              
## [35] ASTRONOMICAL LOW TIDE    LANDSLIDE               
## [37] COASTAL FLOOD            STORM SURGE/TIDE        
## [39] EXCESSIVE HEAT           DENSE SMOKE             
## [41] TROPICAL STORM           TROPICAL DEPRESSION     
## [43] HURRICANE                FREEZING FOG            
## [45] LAKESHORE FLOOD          MARINE HIGH WIND        
## [47] VOLCANIC ASHFALL         TSUNAMI                 
## 985 Levels:    HIGH SURF ADVISORY  COASTAL FLOOD ... WND
  • Monetary amounts have not been adjusted for inflation.
  • There has been no attempt to determine from the climate literature whether 5 years is actually an appropriate time frame.
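
On the first item above, a possible first step toward reconciling event types across the whole dataset is to normalize whitespace and case before comparing names, since at least one level ("   HIGH SURF ADVISORY") carries leading spaces. This is only a sketch under stated assumptions: the sample vector is illustrative, its mixed-case and trailing-space entries are hypothetical, and mapping the cleaned names onto the event types defined in [1] would still require manual review.

```r
# Illustrative sample. The first and third entries appear in the dataset;
# the mixed-case and trailing-space entries are hypothetical.
raw.evtype <- c("   HIGH SURF ADVISORY", "Tstm Wind", "TSTM WIND", "hail ")

# Strip surrounding whitespace, then fold to upper case.
clean.evtype <- toupper(trimws(raw.evtype))

# Variants that differ only in case or spacing now collapse together.
print(unique(clean.evtype))
```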

References

[1] National Oceanic & Atmospheric Administration (2007), “National Weather Service Instruction (NWSI) 10-1605, dated Aug. 17, 2007”.