US Weather Events - Public health and economic consequences

Author: JayEnAar

Date: 20 June 2014

Context: Part of an assignment for a Coursera ‘Reproducible Research’ online course, run by Prof RD Peng of Johns Hopkins School of Public Health

Synopsis

Weather events such as storms, hurricanes, tornadoes and extremes of temperature can be expected to cause loss of life and limb, as well as economic loss.

This report asks two questions:

  • Is it possible to estimate the extent of these adverse effects on human health and economic welfare?

  • Which types of extreme weather event result in the most serious adverse consequences?

The US Government collects data on weather events systematically, which allows these questions to be answered with a reasonable degree of certainty. The dataset spans roughly 60 years, from 1950 to 2011. Data quality has almost certainly improved in more recent years, so temporal comparisons may not reflect real changes in the effects of extreme weather events. Deaths and injuries caused by weather events are also influenced by the number of people exposed and their degree of exposure, and these too have changed over the last half century. Economic loss is similarly influenced by the growth in Americans' relative wealth: more people live in coastal areas and in areas prone to heat waves and wildfires, and more people now have expensive homes, cars and boats to lose than was the case 30 to 60 years ago.

For these reasons, no temporal analysis was performed.

The analysis shows that between 1950 and 2011 there were 15,145 deaths and 140,528 injuries due to weather events. Property damage amounted to an estimated 10.88 billion USD, and damage to crops totalled 1.38 billion USD. Tornadoes were the single biggest cause of fatalities, resulting in 5,633 deaths. However, if tornadoes, high and strong winds, thunderstorms, heavy rain, hurricanes/typhoons and lightning are grouped together as TWISTR (an acronym for "Tornadoes, wind, storm and rain", covering atmospheric disturbances associated with precipitation), the total deaths over the period come to 7,599 (50.2% of the total).

TWISTR also accounted for about 82% of all injuries.

Tornadoes were also the biggest cause of property damage. The top 10 weather events causing property damage were all part of the TWISTR constellation and accounted for 9.94 billion USD of damage, or 91% of all weather-related property damage. For crop losses the top cause was hail, but the top 10 causes taken together (all of them part of TWISTR except extreme cold) accounted for 1.25 billion USD, or 90.7% of all crop losses due to weather events.

Data processing

The data come from the U.S. National Oceanic and Atmospheric Administration (NOAA), which maintains a storm database. The CSV data file is distributed compressed with bzip2, a free data-compression program.

zipfile <- "repdata-data-StormData.csv.bz2"
stormdata <- read.csv(zipfile)
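The .bz2 file does not need to be decompressed first: given a file name, read.csv() opens it with file(), which detects gzip, bzip2 and xz compression when reading. The sketch below checks this on a small temporary file (made-up toy data, not the NOAA file), and also shows colClasses = "NULL", which skips a column at read time. With the real 37-column file this could load only the variables needed later, although this report subsets the columns after loading instead.

```r
# Toy data standing in for the storm file (values are made up)
toy <- data.frame(STATE      = c("AL", "TX"),
                  EVTYPE     = c("TORNADO", "HAIL"),
                  REMARKS    = c("a", "b"),
                  FATALITIES = c(2, 0))

# Write the toy data as a bzip2-compressed CSV in a temporary file
bz_path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(bz_path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() decompresses transparently; "NULL" drops the REMARKS column
back <- read.csv(bz_path,
                 colClasses = c("character", "character", "NULL", "numeric"))
str(back)
```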

The data file consists of 902,297 observations of 37 variables. The data dictionary that accompanies the file describes each variable.

The key variables of interest for this analysis are as follows:

  • EVTYPE: event type; a factor variable with 985 levels, i.e. 985 different event types as recorded.

  • FATALITIES: the number of deaths recorded for each weather event, presumably directly attributable to it. A numeric variable.

  • INJURIES: as above, but for non-lethal injuries. A numeric variable.

  • PROPDMG: a numeric variable that estimates the cost of property damage in thousands of USD. The estimates are somewhat rough and ready; see Appendix B of the data dictionary for the approximations used.

  • CROPDMG: a numeric variable that estimates the cost of lost or damaged crops. This too is an estimate (see above).

  • STATE: this appears to be the usual two-letter code for US states. However, it is a factor variable with, surprisingly, 72 levels; one would have expected 50 or 51. There may be some mis-recording of data, or other areas of wider North America, Central America or the Caribbean may have been included in the data set. Weather, after all, does not respect state boundaries.
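Some of the 985 EVTYPE levels probably differ only in case or surrounding whitespace; the str() output further on, for instance, shows a first level of "   HIGH SURF ADVISORY", with leading spaces. This report uses the labels as recorded. A minimal sketch of one possible clean-up, trimming whitespace and upper-casing before counting distinct types, is shown on a toy vector of hypothetical labels. It would not merge genuinely different spellings such as "TSTM WIND" and "THUNDERSTORM WIND", which would need a hand-built mapping.

```r
# Hypothetical raw labels differing only in case or surrounding spaces
raw_evtype <- c("TSTM WIND", "Tstm Wind", " TSTM WIND ",
                "   HIGH SURF ADVISORY", "HIGH SURF ADVISORY",
                "THUNDERSTORM WIND")

# Trim leading/trailing whitespace, then convert to upper case
clean_evtype <- toupper(trimws(raw_evtype))

length(unique(raw_evtype))    # distinct labels as recorded
length(unique(clean_evtype))  # distinct labels after clean-up
```

On the real data the equivalent step would be applied to as.character(stormdata$EVTYPE); because it merges categories, it could change the event counts and rankings reported below, so it is shown here only as an option.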

The other variables are not of particular interest for this analysis and so are dropped, creating a smaller data set with just the following variables (variable positions in brackets):

  • STATE(7), EVTYPE(8), FATALITIES(23), INJURIES(24), PROPDMG(25), CROPDMG(27)
stormdata <- stormdata[, c(7,8,23,24,25,27)]

This reduced data set is used for the rest of the analysis.
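One caveat on the damage variables: the columns dropped above include PROPDMGEXP (column 26) and CROPDMGEXP (column 28). The data dictionary describes damage estimates as a number followed by a letter giving its magnitude ("K" for thousands, "M" for millions, "B" for billions), and in the raw file that letter is held in these two columns. The totals in this report treat every PROPDMG and CROPDMG value as thousands of USD. A minimal sketch of applying the magnitude letters instead is below; the helper name exp_multiplier is hypothetical, the records are toy values, and treating any other or missing code as a multiplier of 1 is an assumption, not something taken from the data dictionary.

```r
# Hypothetical helper: map a magnitude code to a dollar multiplier.
# Assumption: only K/M/B (any case) are meaningful; anything else counts as 1.
exp_multiplier <- function(code) {
  code <- toupper(trimws(as.character(code)))
  mult <- rep(1, length(code))
  mult[which(code == "K")] <- 1e3   # which() skips NA codes
  mult[which(code == "M")] <- 1e6
  mult[which(code == "B")] <- 1e9
  mult
}

# Toy records (values made up for illustration)
toy <- data.frame(PROPDMG    = c(25, 2.5, 1.5, 10),
                  PROPDMGEXP = c("K", "M", "B", ""))

# Damage in USD once the magnitude letter is applied
toy$prop_usd <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
toy
```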

Results

The total number of deaths across the United States is 15,145:

total.deaths <- sum(stormdata$FATALITIES)
total.deaths
## [1] 15145

The total number of non-fatal injuries is: 140,528

total.injuries <- sum(stormdata$INJURIES)
total.injuries
## [1] 140528

The total cost of adverse weather in terms of damage to property is (in thousands of dollars) 10,884,500, or about 10.88 billion USD:

total.propdmg <- sum(stormdata$PROPDMG)
total.propdmg
## [1] 10884500

The total cost of adverse weather in terms of damage to crops is (in thousands of dollars) 1,377,827, or about 1.38 billion USD:

total.cropdmg <- sum(stormdata$CROPDMG)
total.cropdmg
## [1] 1377827

A quick examination of the data shows that a large number of records have zero fatalities and zero injuries. It is therefore useful to create subsets of the data with a) one or more fatalities and b) one or more injuries, and to use these cut-down versions of the data set for specific analyses.

require(plyr)
## Loading required package: plyr
fatalevents <- subset(stormdata, FATALITIES > 0)
injuryevents <- subset(stormdata, INJURIES > 0)
save(fatalevents, file="fatalevents.Rda")
save(injuryevents, file="injuryevents.Rda")
str(fatalevents)
## 'data.frame':    6974 obs. of  6 variables:
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ FATALITIES: num  1 1 4 1 6 7 2 5 25 2 ...
##  $ INJURIES  : num  14 26 50 8 195 12 3 20 200 90 ...
##  $ PROPDMG   : num  25 250 25 25 2.5 250 25 2.5 2.5 0.25 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
str(injuryevents)
## 'data.frame':    17604 obs. of  6 variables:
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ FATALITIES: num  0 0 0 0 0 0 1 0 0 1 ...
##  $ INJURIES  : num  15 2 2 2 6 1 14 3 3 26 ...
##  $ PROPDMG   : num  25 25 2.5 2.5 2.5 2.5 25 2.5 2.5 250 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
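Dropping the zero-fatality rows cannot change any death total, because those rows contribute nothing to the sums. The property and crop losses carried along in these subsets, by contrast, cover only events with at least one death (or injury), so they are not totals for each event type. A quick check of this reasoning on made-up toy data:

```r
# Toy storm records (made-up values)
toy <- data.frame(EVTYPE     = c("TORNADO", "TORNADO", "HAIL", "HAIL"),
                  FATALITIES = c(3, 0, 0, 1),
                  PROPDMG    = c(100, 50, 20, 5))

fatal_toy <- subset(toy, FATALITIES > 0)

# Death totals are unchanged by dropping zero-fatality rows...
sum(fatal_toy$FATALITIES) == sum(toy$FATALITIES)   # TRUE
# ...but property damage now covers only the fatal events
sum(fatal_toy$PROPDMG)                             # 105, versus 175 in toy
```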
  • Which weather events have the most adverse effects on public health?

Fatalities and injuries are a reasonable measure of the adverse effect of weather events on human health.

The following code attempts to answer this question. The plan is to summarise the total number of deaths (and, separately, injuries) by event type, along with the associated property and crop losses, and then to sort these tables in decreasing order of deaths or injuries.

deaths.by.event <- ddply(fatalevents, "EVTYPE", summarise, deaths = sum(FATALITIES), 
                   proploss = sum(PROPDMG), croploss = sum(CROPDMG) )
str(deaths.by.event)
## 'data.frame':    168 obs. of  4 variables:
##  $ EVTYPE  : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 18 19 29 30 42 44 54 56 57 60 ...
##  $ deaths  : num  1 224 1 101 1 1 3 2 1 3 ...
##  $ proploss: num  0 660 0 4136 15 ...
##  $ croploss: num  0 0 0 112 0 0 0 0 0 0 ...
injuries.by.event <- ddply(injuryevents, .(EVTYPE), summarise, injuries = sum(INJURIES),
                     proploss = sum(PROPDMG), croploss = sum(CROPDMG))
str(injuries.by.event)
## 'data.frame':    158 obs. of  4 variables:
##  $ EVTYPE  : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 19 29 30 42 44 49 54 58 59 60 ...
##  $ injuries: num  170 24 805 1 13 2 2 5 1 1 ...
##  $ proploss: num  677 0 4208 15 0 ...
##  $ croploss: num  0 0 155 0 0 0 0 0 0 0 ...
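For readers without plyr, the same per-event summary can be sketched in base R with aggregate(), which applies sum() to several columns at once, grouped by event type. The data below are a made-up toy version of fatalevents; the output column names mirror those used above.

```r
# Toy version of fatalevents (made-up values)
toy <- data.frame(EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "FLOOD"),
                  FATALITIES = c(3, 1, 2, 4),
                  PROPDMG    = c(100, 5, 40, 60),
                  CROPDMG    = c(0, 2, 1, 8))

# Sum deaths, property loss and crop loss within each event type
by_event <- aggregate(toy[c("FATALITIES", "PROPDMG", "CROPDMG")],
                      by = list(EVTYPE = toy$EVTYPE), FUN = sum)
names(by_event) <- c("EVTYPE", "deaths", "proploss", "croploss")

# Sort in decreasing order of deaths, as done for deaths.by.topevents
by_event_sorted <- by_event[order(-by_event$deaths), ]
by_event_sorted
```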

The above shows that 168 different types of weather event account for all the fatalities, and 158 for all the injuries. This is still too many to compare usefully, so we take just the top 20.

deaths.by.topevents <- deaths.by.event[with(deaths.by.event, order(-deaths)), ]
deaths.by.top20events <- deaths.by.topevents[1:20, ]
deaths.by.top20events
##                      EVTYPE deaths proploss croploss
## 141                 TORNADO   5633 143960.3  6613.80
## 26           EXCESSIVE HEAT   1903    203.3   492.40
## 35              FLASH FLOOD    978  36411.6  4578.25
## 57                     HEAT    937    135.0   450.80
## 97                LIGHTNING    816   1848.5     0.00
## 145               TSTM WIND    504  10843.9   115.00
## 40                    FLOOD    470  20561.5  6375.30
## 116             RIP CURRENT    368      0.0     0.00
## 75                HIGH WIND    248  12294.3   617.93
## 2                 AVALANCHE    224    660.5     0.00
## 163            WINTER STORM    206   6935.8    30.00
## 117            RIP CURRENTS    204      0.0     0.00
## 58                HEAT WAVE    172    666.8   200.00
## 30             EXTREME COLD    160   2603.4     1.75
## 136       THUNDERSTORM WIND    133   6102.5   600.00
## 63               HEAVY SNOW    127   3944.7    40.00
## 31  EXTREME COLD/WIND CHILL    125      0.0     0.00
## 131             STRONG WIND    103   2014.1    63.40
## 4                  BLIZZARD    101   4136.5   112.00
## 71                HIGH SURF    101    765.0     0.00
barplot(deaths.by.top20events$deaths[1:10], names.arg = deaths.by.top20events$EVTYPE[1:10], cex.axis= 0.8, cex.names=0.35, xlab = "10 weather events that lead to the most deaths", ylab="deaths")

Figure: bar chart of deaths for the 10 weather events that lead to the most deaths.
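Indexing with [1:20, ] assumes the table has at least 20 rows; if it had fewer, R would pad the result with rows of NA. head() returns at most n rows, so a small helper (the name top_n_rows is hypothetical) is a slightly safer way to take the top of a sorted table. Toy data, made-up values:

```r
# Hypothetical helper: the n rows with the largest values of column `col`
top_n_rows <- function(df, col, n) {
  head(df[order(-df[[col]]), , drop = FALSE], n)
}

toy <- data.frame(EVTYPE = c("HAIL", "TORNADO", "FLOOD"),
                  deaths = c(1, 5, 4))

nrow(toy[order(-toy$deaths), ][1:5, ])  # 5 rows, two of them all NA
nrow(top_n_rows(toy, "deaths", 5))      # 3 rows
```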

The table and chart show that tornadoes are by far the weather event that results in the most deaths. However, the event categories overlap a great deal. Tornadoes cause 5,633 deaths, but also including TSTM wind (504 deaths, plus a further 133 classed as thunderstorm wind), high wind (248), strong wind (103), heavy rain (98), hurricane and typhoon (64), and lightning (816) brings the total for a broad category of weather event, which I refer to as 'Tornadoes, wind, storm and rain' (TWISTR, an acronym I just made up), to:

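# tornado + TSTM wind + thunderstorm wind + high wind + strong wind
# + heavy rain + hurricane/typhoon + lightning
t <- 5633+504+133+248+103+98+64+816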
t
## [1] 7599

As a percentage of all weather-related deaths, TWISTR accounts for about 50%:

t*100/total.deaths
## [1] 50.17
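Typing the counts into t by hand is easy to get wrong and has to be redone if the data change. A sketch of computing the TWISTR total by event name instead is below. The labels "HEAVY RAIN" and "HURRICANE/TYPHOON" are assumptions, since those rows do not appear in the top-20 table; the toy table reuses the death counts quoted above, plus one non-TWISTR event to show that it is excluded. On the real data the same lines would run against deaths.by.event.

```r
# Event labels counted as TWISTR in this report
# (assumption: "HEAVY RAIN" and "HURRICANE/TYPHOON" are the labels behind
#  the 98 and 64 deaths quoted in the text)
twistr <- c("TORNADO", "TSTM WIND", "THUNDERSTORM WIND", "HIGH WIND",
            "STRONG WIND", "HEAVY RAIN", "HURRICANE/TYPHOON", "LIGHTNING")

# Toy stand-in for deaths.by.event, using the counts quoted in the text
toy <- data.frame(EVTYPE = c(twistr, "EXCESSIVE HEAT"),
                  deaths = c(5633, 504, 133, 248, 103, 98, 64, 816, 1903))

# Sum deaths only for rows whose event label is in the TWISTR list
twistr.deaths <- sum(toy$deaths[toy$EVTYPE %in% twistr])
twistr.deaths                           # 7599
round(100 * twistr.deaths / 15145, 2)   # share of all deaths, in %
```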
injuries.by.topevents <- injuries.by.event[with(injuries.by.event, order(-injuries)), ]
injuries.by.top20events <- injuries.by.topevents[1:20, ]
injuries.by.top20events
##                 EVTYPE injuries proploss croploss
## 129            TORNADO    91346 851910.7 25101.69
## 135          TSTM WIND     6957 101786.6  3075.75
## 30               FLOOD     6789  11679.2  5741.05
## 20      EXCESSIVE HEAT     6525    207.5   492.40
## 85           LIGHTNING     5230  18819.3    13.55
## 47                HEAT     2100    145.0   485.80
## 79           ICE STORM     1975   5147.1  1015.00
## 28         FLASH FLOOD     1777  33275.7  5414.70
## 121  THUNDERSTORM WIND     1488  36077.6  1405.50
## 45                HAIL     1361  10564.8  3463.00
## 152       WINTER STORM     1321  12151.9   293.00
## 76   HURRICANE/TYPHOON     1275    672.2   301.51
## 63           HIGH WIND     1137  34672.3  1450.59
## 53          HEAVY SNOW     1021   8389.9   170.00
## 149           WILDFIRE      911  17762.0  1068.20
## 122 THUNDERSTORM WINDS      908  29464.3  1291.55
## 3             BLIZZARD      805   4208.1   155.00
## 33                 FOG      734   6680.9     0.00
## 148   WILD/FOREST FIRE      545   8365.9   506.00
## 19          DUST STORM      440   1629.0   100.00
barplot(injuries.by.top20events$injuries[1:10], names.arg = injuries.by.top20events$EVTYPE[1:10], cex.axis= 0.8, cex.names=0.35, xlab = "10 weather events that lead to the most injuries", ylab="injuries")

Figure: bar chart of injuries for the 10 weather events that lead to the most injuries.

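The table and chart above show that tornadoes cause by far the most injuries. Here too, using the TWISTR category, the total number of non-fatal injuries is the sum of tornado (91,346), TSTM wind (6,957), flood (6,789), lightning (5,230), thunderstorm wind (1,488), high wind (1,137) and thunderstorm winds (908), plus three further event types (340, 302 and 280 injuries) that do not appear in the top-20 table: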

i <- 91346+6957+6789+5230+1488+1137+908+340+302+280
i
## [1] 114777

which amounts to the following percentage of all injuries:

i*100/total.injuries
## [1] 81.68

Economic Consequences

It is a reasonable assumption that the weather events that cause the biggest economic damage will be the same events that cause loss of life and limb. On this basis I used the two lists of top 20 weather events constructed above (as recorded in the database; these are not the same as the event types in the weather events table of the NOAA manual), which account for the most deaths and the most injuries respectively.

Using these lists, the plan is to create a subset of records in the original data file that record the property damage and crop damage from these top weather events. It is reasonable to expect that tornadoes will, as in the case of health effects, account for the largest economic loss. There is considerable overlap between the two top-20 lists, with 12 events common to both and a total of 28 unique events in either or both lists.

eventsA <- deaths.by.top20events$EVTYPE
eventsB <- injuries.by.top20events$EVTYPE
eventsA.and.B <- intersect(eventsA, eventsB)
eventsA.or.B <- union(eventsA,eventsB)

Having looked at the output from the above code, I decided to use the union, eventsA.or.B. The 28 events included are:

events <- eventsA.or.B
events
##  [1] "TORNADO"                 "EXCESSIVE HEAT"         
##  [3] "FLASH FLOOD"             "HEAT"                   
##  [5] "LIGHTNING"               "TSTM WIND"              
##  [7] "FLOOD"                   "RIP CURRENT"            
##  [9] "HIGH WIND"               "AVALANCHE"              
## [11] "WINTER STORM"            "RIP CURRENTS"           
## [13] "HEAT WAVE"               "EXTREME COLD"           
## [15] "THUNDERSTORM WIND"       "HEAVY SNOW"             
## [17] "EXTREME COLD/WIND CHILL" "STRONG WIND"            
## [19] "BLIZZARD"                "HIGH SURF"              
## [21] "ICE STORM"               "HAIL"                   
## [23] "HURRICANE/TYPHOON"       "WILDFIRE"               
## [25] "THUNDERSTORM WINDS"      "FOG"                    
## [27] "WILD/FOREST FIRE"        "DUST STORM"
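The overlap figures quoted earlier (12 events in both lists, 28 in either) can be checked directly from the two top-20 tables. The sketch below retypes the event names from those tables rather than reading them from the data; union() keeps the order of first appearance, which is why the list above starts with the deaths ranking.

```r
# Event names retyped from the two top-20 tables above
deaths_top20 <- c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT",
                  "LIGHTNING", "TSTM WIND", "FLOOD", "RIP CURRENT",
                  "HIGH WIND", "AVALANCHE", "WINTER STORM", "RIP CURRENTS",
                  "HEAT WAVE", "EXTREME COLD", "THUNDERSTORM WIND",
                  "HEAVY SNOW", "EXTREME COLD/WIND CHILL", "STRONG WIND",
                  "BLIZZARD", "HIGH SURF")
injuries_top20 <- c("TORNADO", "TSTM WIND", "FLOOD", "EXCESSIVE HEAT",
                    "LIGHTNING", "HEAT", "ICE STORM", "FLASH FLOOD",
                    "THUNDERSTORM WIND", "HAIL", "WINTER STORM",
                    "HURRICANE/TYPHOON", "HIGH WIND", "HEAVY SNOW",
                    "WILDFIRE", "THUNDERSTORM WINDS", "BLIZZARD", "FOG",
                    "WILD/FOREST FIRE", "DUST STORM")

length(intersect(deaths_top20, injuries_top20))  # events in both lists
length(union(deaths_top20, injuries_top20))      # events in either list
```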

Now to select from the main data file those records that have one of these weather events recorded.

econdmg.events <- subset(stormdata, EVTYPE %in% events, select = c(STATE,EVTYPE,PROPDMG,CROPDMG))

str(econdmg.events)
## 'data.frame':    834992 obs. of  4 variables:
##  $ STATE  : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ PROPDMG: num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ CROPDMG: num  0 0 0 0 0 0 0 0 0 0 ...

The total loss (in thousands of USD) due to these selected weather events is, for damage to property:

sum(econdmg.events$PROPDMG)
## [1] 10379197

and, for damage to crops:

sum(econdmg.events$CROPDMG)

Next, sum the property damage and crop damage by event type, creating two data frames, each sorted in decreasing order of damage:

propdmg.events <- aggregate(econdmg.events$PROPDMG, list(Event = econdmg.events$EVTYPE), sum)
str(propdmg.events)
## 'data.frame':    28 obs. of  2 variables:
##  $ Event: Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 19 30 117 130 140 141 153 170 188 244 ...
##  $ x    : num  1624 25318 5050 1460 7658 ...
colnames(propdmg.events)[2] <- "PROPDMG"
propdmg.by.events <- propdmg.events[with(propdmg.events, order(-PROPDMG)), ]
cropdmg.events <- aggregate(econdmg.events$CROPDMG, list(Event = econdmg.events$EVTYPE), sum)
colnames(cropdmg.events)[2] <- "CROPDMG"
cropdmg.by.events <- cropdmg.events[with(cropdmg.events, order(-CROPDMG)), ]
head(propdmg.by.events, 10)
##                 Event PROPDMG
## 24            TORNADO 3212258
## 7         FLASH FLOOD 1420125
## 25          TSTM WIND 1335966
## 8               FLOOD  899938
## 22  THUNDERSTORM WIND  876844
## 10               HAIL  688693
## 18          LIGHTNING  603352
## 23 THUNDERSTORM WINDS  446293
## 15          HIGH WIND  324732
## 28       WINTER STORM  132721
head(cropdmg.by.events, 10)
##                 Event CROPDMG
## 10               HAIL  579596
## 7         FLASH FLOOD  179200
## 8               FLOOD  168038
## 25          TSTM WIND  109203
## 24            TORNADO  100019
## 22  THUNDERSTORM WIND   66791
## 23 THUNDERSTORM WINDS   18685
## 15          HIGH WIND   17283
## 5        EXTREME COLD    6121
## 16  HURRICANE/TYPHOON    4798
barplot(propdmg.by.events$PROPDMG[1:10], names.arg = propdmg.by.events$Event[1:10], cex.axis= 0.8, cex.names=0.35, xlab = "10 weather events that result in the most property damage", ylab ="Thousand USD")

Figure: bar chart of property damage (thousand USD) for the 10 weather events that result in the most property damage.

As the tables and the graph above show, the weather event that causes the most property damage is the tornado. The top 10 weather events, all part of TWISTR, together cause property damage of (in millions of dollars):

proploss <- sum(propdmg.by.events$PROPDMG[1:10])/ 1000
proploss
## [1] 9941

In percentage terms this amounts to

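# proploss is in millions of USD and total.propdmg in thousands,
# so multiply by 1,000 to match units and by 100 for a percentage
proploss*10^5/total.propdmg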
## [1] 91.33
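Keeping both terms in the same unit (thousands of USD) avoids the 10^5 conversion factor, and cumulative shares from cumsum() show how quickly the top events account for most of the damage. A sketch using the ten property-damage totals from the table above and the overall total computed earlier:

```r
# Top-10 property damage totals (thousands of USD), from the table above
top10_propdmg <- c(TORNADO = 3212258, "FLASH FLOOD" = 1420125,
                   "TSTM WIND" = 1335966, FLOOD = 899938,
                   "THUNDERSTORM WIND" = 876844, HAIL = 688693,
                   LIGHTNING = 603352, "THUNDERSTORM WINDS" = 446293,
                   "HIGH WIND" = 324732, "WINTER STORM" = 132721)
total_propdmg <- 10884500  # all events, thousands of USD

# Cumulative share (%) of all property damage; both terms in thousands
cum_share <- round(100 * cumsum(top10_propdmg) / total_propdmg, 2)
cum_share
```

The last element reproduces the 91.33% above; the first shows tornadoes alone account for about 29.5%.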

The weather events that cause the most crop loss are hail, followed by flash floods and floods, thunderstorm winds and tornadoes. The top 10 events for crop losses, all of them part of the TWISTR category except extreme cold, together cause crop losses of (in millions of dollars):

croploss <- sum(cropdmg.by.events$CROPDMG[1:10]) /1000
croploss
## [1] 1250

In percentage terms this amounts to

croploss*10^5/total.cropdmg
## [1] 90.7

End of report