This descriptive analysis summarizes storm data compiled by the National Weather Service on behalf of the U.S. National Oceanic and Atmospheric Administration. The database contains information on events occurring from 1950 to November 2011 in the United States and its territories. This analysis includes only the subset corresponding to the fifty U.S. states and the District of Columbia. The objective of the analysis is to answer two questions: which event types have the greatest effect on public health, and which event types have the most severe economic consequences?
The method for answering the public health question is to examine the number of fatalities and injuries by event type. Similarly, the method for answering the economic consequences question is to examine property damage and crop damage dollar amounts by event type.
Information about the database is available at https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf
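The first steps in this analysis are to download the compressed data file, read it into a data frame called storms, and remove the downloaded file: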
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileUrl,destfile="c:/users/temp/temp.csv.bz2",method="libcurl")
storms <- read.csv("c:/users/temp/temp.csv.bz2")
file.remove("c:/users/temp/temp.csv.bz2")
## [1] TRUE
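The read step needs no explicit decompression call because read.csv() opens bzip2-compressed files transparently. The following minimal sketch illustrates this with a tiny made-up file written to a temporary location; the sample rows, the variable names and the use of tempfile() are assumptions for illustration and are not part of the original analysis.

```r
# Hypothetical two-row sample standing in for the real storm data
sample_rows <- data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                          FATALITIES = c(1, 0))

# Write the sample as a bzip2-compressed CSV in a temporary location
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(sample_rows, con, row.names = FALSE)
close(con)

# read.csv() detects the compression and decompresses while reading
roundtrip <- read.csv(tmp)

# file.remove() returns TRUE when the temporary file was deleted
removed <- file.remove(tmp)
print(roundtrip)
```

Using tempfile() instead of a fixed path such as c:/users/temp keeps the step portable across operating systems.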
The next step is to load the dplyr package, which provides the filter, select, mutate, group_by, summarize and arrange functions used below:
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Create a data frame called injury containing the events that caused fatalities or injuries in the fifty states and DC, with a casualties column equal to fatalities plus injuries:
state_list <- c(state.abb, "DC")
injury <- filter(storms, ( FATALITIES > 0 | INJURIES > 0) & STATE %in% state_list)
injury <- select(injury, EVTYPE, FATALITIES, INJURIES)
names(injury) <- tolower(names(injury))
injury$evtype <- factor(injury$evtype)
injury <- mutate(injury, casualties = fatalities + injuries)
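Create a data frame called damage containing the events that caused property damage or crop damage. The PROPDMGEXP and CROPDMGEXP columns hold a magnitude code (K for thousands, M for millions, B for billions) that is converted to a multiplier to obtain dollar amounts; any other code is treated as a multiplier of 1: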
damage <- filter(storms, ( PROPDMG > 0 | CROPDMG > 0) & STATE %in% state_list)
damage <- select(damage, EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
names(damage) <- tolower(names(damage))
damage$evtype <- factor(damage$evtype)
damage$propdmgexp <- factor(toupper(damage$propdmgexp))
damage$cropdmgexp <- factor(toupper(damage$cropdmgexp))
damage <- mutate(damage,
    # Property damage multiplier: K, M and B scale the amount; any other code counts as 1
    pdmultiple = 1,
    pdmultiple = ifelse(propdmgexp=='K', 1000, pdmultiple),
    pdmultiple = ifelse(propdmgexp=='M', 1000000, pdmultiple),
    pdmultiple = ifelse(propdmgexp=='B', 1000000000, pdmultiple),
    propdmgs = propdmg * pdmultiple,
    # Crop damage multiplier, derived the same way
    cdmultiple = 1,
    cdmultiple = ifelse(cropdmgexp=='K', 1000, cdmultiple),
    cdmultiple = ifelse(cropdmgexp=='M', 1000000, cdmultiple),
    cdmultiple = ifelse(cropdmgexp=='B', 1000000000, cdmultiple),
    cropdmgs = cropdmg * cdmultiple,
    # Total damage in dollars
    damages = propdmgs + cropdmgs)
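The chained ifelse() calls map each exponent code to a multiplier. The same mapping can also be written as a lookup in a named vector, sketched below on made-up values. The helper name exp_multiplier and the sample amounts are hypothetical; the fallback mirrors the code above, where any code other than K, M or B (including a blank) gets a multiplier of 1.

```r
# Hypothetical helper: convert exponent codes to dollar multipliers
exp_multiplier <- function(codes) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  # Match on upper-cased codes; unmatched codes yield NA
  mult <- unname(lookup[toupper(as.character(codes))])
  # Same fallback as the ifelse() chain: unknown codes multiply by 1
  mult[is.na(mult)] <- 1
  mult
}

# Made-up amounts and codes for illustration only
amount <- c(25, 2.5, 1, 300)
code   <- c("K", "m", "B", "")
dollars <- amount * exp_multiplier(code)
print(dollars)
```

A lookup keeps the code-to-multiplier table in one place, which makes it easier to audit than a chain of conditionals.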
First find total injuries, fatalities and casualties for all event types. Then compare those totals against the ones for the top ten event types in terms of casualties, fatalities and injuries. Even though there are some inconsistencies in the coding of event types, the data shows that tornadoes account for the largest number of injuries and fatalities in the database.
summarize(injury,
Injuries = sum(injuries, na.rm = TRUE),
Fatalities = sum(fatalities, na.rm = TRUE),
Casualties = sum(casualties, na.rm = TRUE))
##   Injuries Fatalities Casualties
## 1   139835      14867     154702
Top ten event types based on total casualties (injuries plus fatalities).
by_event <- group_by(injury, evtype)
injury_summary <- summarize(by_event,
Injuries = sum(injuries, na.rm = TRUE),
Fatalities = sum(fatalities, na.rm = TRUE),
Casualties = sum(casualties, na.rm = TRUE))
injury_summary <- arrange(injury_summary, desc(Casualties))
print(as.data.frame(injury_summary[1:10,c(1,4)]))
##               evtype Casualties
## 1            TORNADO      96979
## 2     EXCESSIVE HEAT       8428
## 3          TSTM WIND       7460
## 4              FLOOD       7250
## 5          LIGHTNING       6030
## 6               HEAT       3037
## 7        FLASH FLOOD       2708
## 8          ICE STORM       2064
## 9  THUNDERSTORM WIND       1614
## 10      WINTER STORM       1527
Top ten event types based on fatalities.
injury_summary <- arrange(injury_summary, desc(Fatalities))
print(as.data.frame(injury_summary[1:10,c(1,3)]))
##            evtype Fatalities
## 1         TORNADO       5633
## 2  EXCESSIVE HEAT       1903
## 3     FLASH FLOOD        939
## 4            HEAT        937
## 5       LIGHTNING        806
## 6       TSTM WIND        504
## 7           FLOOD        464
## 8     RIP CURRENT        343
## 9       HIGH WIND        248
## 10      AVALANCHE        224
Top ten event types based on injuries.
injury_summary <- arrange(injury_summary, desc(Injuries))
print(as.data.frame(injury_summary[1:10,1:2]))
##               evtype Injuries
## 1            TORNADO    91346
## 2          TSTM WIND     6956
## 3              FLOOD     6786
## 4     EXCESSIVE HEAT     6525
## 5          LIGHTNING     5224
## 6               HEAT     2100
## 7          ICE STORM     1975
## 8        FLASH FLOOD     1769
## 9  THUNDERSTORM WIND     1481
## 10              HAIL     1361
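The tables above are enough to express the TORNADO totals as shares of all injuries, fatalities and casualties. A small sketch with the figures copied from those tables follows; the variable names are illustrative only.

```r
# Totals over all event types, copied from the summary table above
totals  <- c(Injuries = 139835, Fatalities = 14867, Casualties = 154702)
# TORNADO rows from the three top-ten tables above
tornado <- c(Injuries = 91346, Fatalities = 5633, Casualties = 96979)

# Percentage of each total attributable to the TORNADO event type
tornado_share <- round(100 * tornado / totals, 1)
print(tornado_share)
```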
The TORNADO event type alone accounts for about 63% of the casualties, 38% of the fatalities and 65% of the injuries in the data. However, event types are coded inconsistently: not all tornadoes are coded as ‘TORNADO’, and other kinds of events show the same problem. The following listings show some of these inconsistencies for different kinds of events. Note that for windstorms and floods the lists are limited to the top ten event types, based on total casualties, for brevity.
injury_summary <- arrange(injury_summary, desc(Casualties))
# Tornadoes
as.data.frame(filter(injury_summary, grepl('TORN', evtype)))
##                       evtype Injuries Fatalities Casualties
## 1                    TORNADO    91346       5633      96979
## 2         WATERSPOUT/TORNADO       42          3         45
## 3 TORNADOES, TSTM WIND, HAIL        0         25         25
## 4                 TORNADO F2       16          0         16
## 5                 TORNADO F3        2          0          2
## 6         WATERSPOUT TORNADO        1          0          1
# Heat waves
as.data.frame(filter(injury_summary, grepl('HEAT', evtype)))
##                   evtype Injuries Fatalities Casualties
## 1         EXCESSIVE HEAT     6525       1903       8428
## 2                   HEAT     2100        937       3037
## 3              HEAT WAVE      309        172        481
## 4           EXTREME HEAT      155         96        251
## 5            RECORD HEAT       50          2         52
## 6      HEAT WAVE DROUGHT       15          4         19
## 7  RECORD/EXCESSIVE HEAT        0         17         17
## 8             HEAT WAVES        0          5          5
## 9 DROUGHT/EXCESSIVE HEAT        0          2          2
# Windstorms
as.data.frame(filter(injury_summary, grepl('WIND', evtype)))[1:10,]
##                     evtype Injuries Fatalities Casualties
## 1                TSTM WIND     6956        504       7460
## 2        THUNDERSTORM WIND     1481        133       1614
## 3                HIGH WIND     1134        248       1382
## 4       THUNDERSTORM WINDS      908         64        972
## 5              STRONG WIND      277        103        380
## 6               HIGH WINDS      302         35        337
## 7  EXTREME COLD/WIND CHILL       24        125        149
## 8                     WIND       86         23        109
## 9          COLD/WIND CHILL       12         95        107
## 10          TSTM WIND/HAIL       93          5         98
# Floods
as.data.frame(filter(injury_summary, grepl('FLOO|SURG', evtype)))[1:10,]
##                      evtype Injuries Fatalities Casualties
## 1                     FLOOD     6786        464       7250
## 2               FLASH FLOOD     1769        939       2708
## 3               STORM SURGE       38         13         51
## 4         FLOOD/FLASH FLOOD       15         17         32
## 5            FLASH FLOODING        8         19         27
## 6          STORM SURGE/TIDE        5         11         16
## 7         FLASH FLOOD/FLOOD        0         14         14
## 8                  FLOODING        2          6          8
## 9  COASTAL FLOODING/EROSION        5          0          5
## 10            COASTAL FLOOD        2          3          5
# Winter Storms
as.data.frame(filter(injury_summary, grepl('WINT|BLIZ', evtype)))
##                          evtype Injuries Fatalities Casualties
## 1                  WINTER STORM     1321        206       1527
## 2                      BLIZZARD      805        101        906
## 3                WINTER WEATHER      398         33        431
## 4            WINTER WEATHER/MIX       72         28        100
## 5                    WINTRY MIX       77          1         78
## 6            WINTER WEATHER MIX       68          0         68
## 7                 WINTER STORMS       17         10         27
## 8       WINTER STORM HIGH WINDS       15          1         16
## 9 HEAVY SNOW/BLIZZARD/AVALANCHE        1          0          1
# Hurricanes
as.data.frame(filter(injury_summary, grepl('HURR', evtype)))
##                       evtype Injuries Fatalities Casualties
## 1          HURRICANE/TYPHOON      922         63        985
## 2                  HURRICANE       44         40         84
## 3             HURRICANE ERIN        1          6          7
## 4 HURRICANE-GENERATED SWELLS        2          0          2
## 5             HURRICANE OPAL        1          1          2
## 6  HURRICANE OPAL/HIGH WINDS        0          2          2
## 7            HURRICANE EMILY        1          0          1
## 8            HURRICANE FELIX        0          1          1
Despite inconsistencies in the coding of event types, it is clear that tornadoes account for the largest number of injuries and fatalities in the database, and consequently the largest number of casualties. Thus, tornadoes had the greatest effect on public health among the storms recorded between 1950 and November 2011 in the National Weather Service Storm Database.
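One way to check that the coding inconsistencies do not change this conclusion is to pool the variant spellings before comparing. The sketch below does so for the tornado and heat listings shown earlier, using the casualty counts from those tables. The group labels and the pooling rule are assumptions for illustration and are not part of the original analysis.

```r
# Casualty counts copied from the tornado and heat listings above
variants <- data.frame(
  evtype = c("TORNADO", "WATERSPOUT/TORNADO", "TORNADOES, TSTM WIND, HAIL",
             "TORNADO F2", "TORNADO F3", "WATERSPOUT TORNADO",
             "EXCESSIVE HEAT", "HEAT", "HEAT WAVE", "EXTREME HEAT",
             "RECORD HEAT", "HEAT WAVE DROUGHT", "RECORD/EXCESSIVE HEAT",
             "HEAT WAVES", "DROUGHT/EXCESSIVE HEAT"),
  casualties = c(96979, 45, 25, 16, 2, 1,
                 8428, 3037, 481, 251, 52, 19, 17, 5, 2)
)

# Hypothetical pooling rule: rows matching 'TORN' are tornado, the rest heat
variants$group <- ifelse(grepl("TORN", variants$evtype), "tornado", "heat")

# Total casualties per pooled group
grouped <- tapply(variants$casualties, variants$group, sum)
print(grouped)
```

Pooled this way, tornado-related event types total 97,068 casualties against 12,292 for all heat-related variants, so the ranking is unchanged.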
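The method used in this analysis to find the event type with the most severe economic consequences is to total property damage, crop damage and overall damage, first for all events and then by event type. All amounts are shown in millions of dollars: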
scaling_factor <- 1000000
grand_totals <- summarize(damage,
Property = sum(propdmgs, na.rm = TRUE) / scaling_factor,
Crops = sum(cropdmgs, na.rm = TRUE) / scaling_factor,
Overall = sum(damages, na.rm = TRUE) / scaling_factor)
grand_totals$Property <- format(round(grand_totals$Property, 1), nsmall = 1, big.mark = ',')
grand_totals$Crops <- format(round(grand_totals$Crops, 1), nsmall = 1, big.mark = ',')
grand_totals$Overall <- format(round(grand_totals$Overall, 1), nsmall = 1, big.mark = ',')
row.names(grand_totals) <- 'Grand Totals'
as.data.frame(grand_totals)
##               Property    Crops   Overall
## Grand Totals 423,864.0 48,377.6 472,241.5
by_event <- group_by(damage, evtype)
damage_summary <- summarize(by_event,
Property = sum(propdmgs, na.rm = TRUE) / scaling_factor,
Crops = sum(cropdmgs, na.rm = TRUE) / scaling_factor,
Overall = sum(damages, na.rm = TRUE) / scaling_factor)
damage_summary <- arrange(damage_summary, desc(Overall))
damage_summary$Property <- format(round(damage_summary$Property, 1), nsmall=1, big.mark=",")
damage_summary$Crops <- format(round(damage_summary$Crops, 1), nsmall=1, big.mark = ',')
damage_summary$Overall <- format(round(damage_summary$Overall, 1), nsmall=1, big.mark = ',')
as.data.frame(damage_summary[1:10,])
##               evtype  Property    Crops   Overall
## 1              FLOOD 144,541.3  5,614.0 150,155.3
## 2  HURRICANE/TYPHOON  69,033.1  2,603.5  71,636.6
## 3            TORNADO  56,936.7    415.0  57,351.6
## 4        STORM SURGE  43,323.5      0.0  43,323.5
## 5               HAIL  15,732.3  3,026.0  18,758.2
## 6        FLASH FLOOD  15,884.3  1,406.9  17,291.2
## 7            DROUGHT   1,041.1 13,972.4  15,013.5
## 8          HURRICANE   9,914.0  2,189.9  12,103.9
## 9        RIVER FLOOD   5,118.9  5,029.5  10,148.4
## 10         ICE STORM   3,944.9  5,022.1   8,967.0
Four of the top ten event types above involve flood or storm surge, which is flooding that occurs when wind pushes ocean water onto coastal areas. An example of storm surge is what happened on the New Jersey Shore when Hurricane Sandy (by then a post-tropical cyclone) made landfall. The second-largest event type is hurricane, the third is tornado, and the fifth is hail. A closer look at these types of events follows.
as.data.frame(filter(damage_summary, grepl('FLOO|SURG', evtype)))[1:10,]
##               evtype  Property    Crops   Overall
## 1              FLOOD 144,541.3  5,614.0 150,155.3
## 2        STORM SURGE  43,323.5      0.0  43,323.5
## 3        FLASH FLOOD  15,884.3  1,406.9  17,291.2
## 4        RIVER FLOOD   5,118.9  5,029.5  10,148.4
## 5   STORM SURGE/TIDE   4,640.0      0.0   4,640.0
## 6     FLASH FLOODING     307.3     15.1     322.4
## 7  FLASH FLOOD/FLOOD     272.5      0.6     273.0
## 8  FLOOD/FLASH FLOOD     174.0     95.0     269.1
## 9      COASTAL FLOOD     237.6      0.0     237.6
## 10  COASTAL FLOODING     126.4      0.1     126.4
as.data.frame(filter(damage_summary, grepl('HURR', evtype)))
##                       evtype  Property    Crops   Overall
## 1          HURRICANE/TYPHOON  69,033.1  2,603.5  71,636.6
## 2                  HURRICANE   9,914.0  2,189.9  12,103.9
## 3             HURRICANE OPAL   3,172.8     19.0   3,191.8
## 4             HURRICANE ERIN     258.1    136.0     394.1
## 5  HURRICANE OPAL/HIGH WINDS     100.0     10.0     110.0
## 6            HURRICANE EMILY      50.0      0.0      50.0
## 7            HURRICANE FELIX       0.5      0.5       1.0
## 8           HURRICANE GORDON       0.5      0.0       0.5
## 9 HURRICANE-GENERATED SWELLS       0.1      0.0       0.1
Note: the second event type in the tornado list below (TORNADOES, TSTM WIND, HAIL) also appears in the list of hail-related event types.
as.data.frame(filter(damage_summary, grepl('TORN', evtype)))[1:10,]
##                        evtype  Property    Crops   Overall
## 1                     TORNADO  56,936.7    415.0  57,351.6
## 2  TORNADOES, TSTM WIND, HAIL   1,600.0      2.5   1,602.5
## 3          WATERSPOUT/TORNADO      51.1      0.0      51.1
## 4                  TORNADO F1       2.4      0.0       2.4
## 5                  TORNADO F2       1.6      0.0       1.6
## 6                  TORNADO F3       0.7      0.0       0.7
## 7                  TORNADO F0       0.1      0.0       0.1
## 8          WATERSPOUT TORNADO       0.0      0.0       0.0
## 9          WATERSPOUT-TORNADO       0.0      0.0       0.0
## 10                  TORNADOES       0.0      0.0       0.0
Note: the second event type in the hail list below (TORNADOES, TSTM WIND, HAIL) also appears in the list of tornado-related event types.
as.data.frame(filter(damage_summary, grepl('HAIL', evtype)))[1:10,]
##                        evtype  Property    Crops   Overall
## 1                        HAIL  15,732.3  3,026.0  18,758.2
## 2  TORNADOES, TSTM WIND, HAIL   1,600.0      2.5   1,602.5
## 3                   HAILSTORM     241.0      0.0     241.0
## 4              TSTM WIND/HAIL      44.3     64.7     108.9
## 5                  SMALL HAIL       0.1     20.8      20.9
## 6     THUNDERSTORM WINDS HAIL       0.7      0.0       0.7
## 7                  HAIL/WINDS       0.5      0.1       0.6
## 8     THUNDERSTORM WINDS/HAIL       0.4      0.0       0.4
## 9                    HAIL 275       0.2      0.0       0.2
## 10                   HAIL 450       0.2      0.0       0.2
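The tables above show that flood-related event types, including storm surge, accounted for the largest share of the economic damage recorded in the National Weather Service Storm Database between 1950 and November 2011: the ten largest flood-related event types alone total roughly $227 billion of the $472 billion overall, or about 48%.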
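The flood-related share can be checked directly from the figures in the flood listing above. The following is a minimal sketch limited to the ten event types shown there, so it slightly understates the flood-related total; the variable names are illustrative only.

```r
# Overall damage in millions of dollars, copied from the flood listing above
flood_related <- c(150155.3, 43323.5, 17291.2, 10148.4, 4640.0,
                   322.4, 273.0, 269.1, 237.6, 126.4)
# Grand total in millions of dollars, from the summary table
overall_total <- 472241.5

# Pooled flood-related damage and its percentage of the grand total
flood_total <- sum(flood_related)
flood_share <- round(100 * flood_total / overall_total, 1)
print(c(flood_total = flood_total, flood_share = flood_share))
```

The pooled flood figure of about 226,787 million dollars is well above the largest hurricane and tornado entries in the listings above.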