This descriptive analysis summarizes storm data compiled by the National Weather Service on behalf of the U.S. National Oceanic and Atmospheric Administration. The database contains information on events occurring from 1950 to November 2011 in the United States and its Territories. This analysis only includes the subset corresponding to the fifty US states and the District of Columbia. The objective of the analysis is to answer these questions:

  1. Which types of storms had the most harmful effects on public health?
  2. Which types of storms had the most severe economic consequences?

The method for answering the public health question is to examine the number of fatalities and injuries by event type. Similarly, the method for answering the economic consequences question is to examine property damage and crop damage dollar amounts by event type.

Data Processing

Information about the database is available at https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf

The first steps in this analysis are:

  1. Download the data into a temporary file.
  2. Read the storm data from the temporary file into a data frame called storms.
  3. Delete the temporary file.

fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileUrl,destfile="c:/users/temp/temp.csv.bz2",method="libcurl")
storms <- read.csv("c:/users/temp/temp.csv.bz2")
file.remove("c:/users/temp/temp.csv.bz2")
## [1] TRUE
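
The hard-coded path above only works on a machine where c:/users/temp exists. A more portable variant, sketched here under the assumption that any writable temporary location is acceptable, lets R choose the path with tempfile(). To keep the sketch runnable offline, it writes a small made-up bzip2-compressed CSV locally instead of downloading the real file; the names toy, tmp, toy_read and removed are illustrative only.

```r
# Toy stand-in for the real storm data (values are made up for illustration).
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                  FATALITIES = c(1, 0),
                  stringsAsFactors = FALSE)

# Let R choose a temporary path that is valid on any operating system.
tmp <- tempfile(fileext = ".csv.bz2")

# Write the toy data as a bzip2-compressed CSV (stands in for download.file()).
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() detects the bzip2 compression and decompresses while reading.
toy_read <- read.csv(tmp, stringsAsFactors = FALSE)

# Remove the temporary file; file.remove() returns TRUE on success.
removed <- file.remove(tmp)
```

With the real data, the writing step would be replaced by a call such as download.file(fileUrl, destfile = tmp, mode = "wb"), where mode = "wb" requests a binary transfer.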

The next step is to load the dplyr package.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Create a data frame called injury containing events causing fatalities or injuries

  1. Include only the fifty US states and the District of Columbia
  2. Define casualties as the sum of fatalities and injuries

state_list <- c(state.abb, "DC")
injury <- filter(storms, ( FATALITIES > 0 | INJURIES > 0) & STATE %in% state_list)
injury <- select(injury, EVTYPE, FATALITIES, INJURIES)
names(injury) <- tolower(names(injury))
injury$evtype <- factor(injury$evtype)
injury <- mutate(injury, casualties = fatalities + injuries)
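
The geographic filter relies on state.abb, a vector of the fifty state abbreviations that ships with base R, extended with "DC". The following minimal sketch shows how the %in% test behaves on a few example codes. The vector codes is hypothetical; PR and GU are used only as examples of non-state codes that the filter would drop.

```r
# state.abb ships with base R and holds the 50 two-letter state abbreviations.
state_list <- c(state.abb, "DC")

# Hypothetical STATE codes: Texas, Puerto Rico, District of Columbia, Guam.
codes <- c("TX", "PR", "DC", "GU")

# TRUE marks the rows that the filter would keep.
keep <- codes %in% state_list
print(data.frame(code = codes, keep = keep))
```

Only TX and DC are kept in this example, which mirrors the restriction to the fifty states and the District of Columbia described above.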

Create a data frame called damage containing events causing property damage or crop damage

  1. Include only the fifty US states and the District of Columbia
  2. Convert property damage amounts to dollars (K=thousands, M=millions, B=billions, otherwise assume dollars).
  3. Similarly convert crop damage amounts to dollars.
  4. Define damages as the sum of property damage and crop damage.

damage <- filter(storms, ( PROPDMG > 0 | CROPDMG > 0) & STATE %in% state_list)
damage <- select(damage, EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
names(damage) <- tolower(names(damage))
damage$evtype <- factor(damage$evtype)
damage$propdmgexp <- factor(toupper(damage$propdmgexp))
damage$cropdmgexp <- factor(toupper(damage$cropdmgexp))
damage <- mutate(damage,
                 pdmultiple = 1,
                 pdmultiple = ifelse(propdmgexp=='K', 1000, pdmultiple),
                 pdmultiple = ifelse(propdmgexp=='M', 1000000, pdmultiple),
                 pdmultiple = ifelse(propdmgexp=='B', 1000000000, pdmultiple),
                 propdmgs = propdmg * pdmultiple,
                 cdmultiple = 1,
                 cdmultiple = ifelse(cropdmgexp=='K', 1000, cdmultiple),
                 cdmultiple = ifelse(cropdmgexp=='M', 1000000, cdmultiple),
                 cdmultiple = ifelse(cropdmgexp=='B', 1000000000, cdmultiple),
                 cropdmgs = cropdmg * cdmultiple,
                 damages = propdmgs + cropdmgs)
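
The same K/M/B conversion rule can also be written with a named lookup vector, which may be easier to read than chained ifelse() calls. This is only a sketch of an alternative, not the code used to produce the results in this report; multipliers and to_dollars are hypothetical names, and the input values are made up.

```r
# Named lookup vector: exponent code -> dollar multiplier.
multipliers <- c(K = 1e3, M = 1e6, B = 1e9)

# Convert an amount and its exponent code to dollars.
# Codes other than K, M or B (including blanks) fall back to a multiplier of 1.
to_dollars <- function(amount, exp_code) {
  mult <- multipliers[toupper(as.character(exp_code))]
  mult[is.na(mult)] <- 1
  unname(amount * mult)
}

# Made-up amounts and codes for illustration.
dollars <- to_dollars(c(25, 2.5, 1.2, 500, 10), c("K", "m", "B", "", "5"))
print(dollars)
```

In this example the lower-case "m" is treated as millions, while the blank code and the unrecognized code "5" leave the amount unchanged, matching the "otherwise assume dollars" rule stated above.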

Find the Event Type That Had the Most Severe Public Health Effects

First find total injuries, fatalities and casualties for all event types. Then compare those totals against the ones for the top ten events in terms of casualties, fatalities and injuries. Even though there are some inconsistencies in coding event types, the data shows that tornadoes account for the largest number of injuries and fatalities in the database.

summarize(injury, 
          Injuries = sum(injuries, na.rm = TRUE),
          Fatalities = sum(fatalities, na.rm = TRUE),
          Casualties = sum(casualties, na.rm = TRUE))
##   Injuries Fatalities Casualties
## 1   139835      14867     154702

Top ten event types based on total casualties (injuries plus fatalities).

by_event <- group_by(injury, evtype)
injury_summary <- summarize(by_event, 
                           Injuries = sum(injuries, na.rm = TRUE),
                           Fatalities = sum(fatalities, na.rm = TRUE),
                           Casualties = sum(casualties, na.rm = TRUE))
injury_summary <- arrange(injury_summary, desc(Casualties))
print(as.data.frame(injury_summary[1:10,c(1,4)]))
##               evtype Casualties
## 1            TORNADO      96979
## 2     EXCESSIVE HEAT       8428
## 3          TSTM WIND       7460
## 4              FLOOD       7250
## 5          LIGHTNING       6030
## 6               HEAT       3037
## 7        FLASH FLOOD       2708
## 8          ICE STORM       2064
## 9  THUNDERSTORM WIND       1614
## 10      WINTER STORM       1527

Top ten event types based on fatalities.

injury_summary <- arrange(injury_summary, desc(Fatalities))
print(as.data.frame(injury_summary[1:10,c(1,3)]))
##            evtype Fatalities
## 1         TORNADO       5633
## 2  EXCESSIVE HEAT       1903
## 3     FLASH FLOOD        939
## 4            HEAT        937
## 5       LIGHTNING        806
## 6       TSTM WIND        504
## 7           FLOOD        464
## 8     RIP CURRENT        343
## 9       HIGH WIND        248
## 10      AVALANCHE        224

Top ten event types based on injuries.

injury_summary <- arrange(injury_summary, desc(Injuries)) 
print(as.data.frame(injury_summary[1:10,1:2]))
##               evtype Injuries
## 1            TORNADO    91346
## 2          TSTM WIND     6956
## 3              FLOOD     6786
## 4     EXCESSIVE HEAT     6525
## 5          LIGHTNING     5224
## 6               HEAT     2100
## 7          ICE STORM     1975
## 8        FLASH FLOOD     1769
## 9  THUNDERSTORM WIND     1481
## 10              HAIL     1361

Clearly, the tornado event type accounts for more than half of the casualties, over one third of the fatalities, and more than half of the injuries in the data. However, there are some coding inconsistencies in the event type. Not all tornadoes are coded as ‘TORNADO’. Furthermore, other event types are also coded inconsistently. The following section shows some of these inconsistencies for different types of events. Note that for windstorms and floods the lists were limited to the top ten event types, based on total casualties, for brevity.

injury_summary <- arrange(injury_summary, desc(Casualties))
# Tornadoes
as.data.frame(filter(injury_summary, grepl('TORN', evtype)))
##                       evtype Injuries Fatalities Casualties
## 1                    TORNADO    91346       5633      96979
## 2         WATERSPOUT/TORNADO       42          3         45
## 3 TORNADOES, TSTM WIND, HAIL        0         25         25
## 4                 TORNADO F2       16          0         16
## 5                 TORNADO F3        2          0          2
## 6         WATERSPOUT TORNADO        1          0          1
# Heat waves
as.data.frame(filter(injury_summary, grepl('HEAT', evtype)))
##                   evtype Injuries Fatalities Casualties
## 1         EXCESSIVE HEAT     6525       1903       8428
## 2                   HEAT     2100        937       3037
## 3              HEAT WAVE      309        172        481
## 4           EXTREME HEAT      155         96        251
## 5            RECORD HEAT       50          2         52
## 6      HEAT WAVE DROUGHT       15          4         19
## 7  RECORD/EXCESSIVE HEAT        0         17         17
## 8             HEAT WAVES        0          5          5
## 9 DROUGHT/EXCESSIVE HEAT        0          2          2
# Windstorms
as.data.frame(filter(injury_summary, grepl('WIND', evtype)))[1:10,]
##                     evtype Injuries Fatalities Casualties
## 1                TSTM WIND     6956        504       7460
## 2        THUNDERSTORM WIND     1481        133       1614
## 3                HIGH WIND     1134        248       1382
## 4       THUNDERSTORM WINDS      908         64        972
## 5              STRONG WIND      277        103        380
## 6               HIGH WINDS      302         35        337
## 7  EXTREME COLD/WIND CHILL       24        125        149
## 8                     WIND       86         23        109
## 9          COLD/WIND CHILL       12         95        107
## 10          TSTM WIND/HAIL       93          5         98
# Floods
as.data.frame(filter(injury_summary, grepl('FLOO|SURG', evtype)))[1:10,]
##                      evtype Injuries Fatalities Casualties
## 1                     FLOOD     6786        464       7250
## 2               FLASH FLOOD     1769        939       2708
## 3               STORM SURGE       38         13         51
## 4         FLOOD/FLASH FLOOD       15         17         32
## 5            FLASH FLOODING        8         19         27
## 6          STORM SURGE/TIDE        5         11         16
## 7         FLASH FLOOD/FLOOD        0         14         14
## 8                  FLOODING        2          6          8
## 9  COASTAL FLOODING/EROSION        5          0          5
## 10            COASTAL FLOOD        2          3          5
# Winter Storms
as.data.frame(filter(injury_summary, grepl('WINT|BLIZ', evtype)))
##                          evtype Injuries Fatalities Casualties
## 1                  WINTER STORM     1321        206       1527
## 2                      BLIZZARD      805        101        906
## 3                WINTER WEATHER      398         33        431
## 4            WINTER WEATHER/MIX       72         28        100
## 5                    WINTRY MIX       77          1         78
## 6            WINTER WEATHER MIX       68          0         68
## 7                 WINTER STORMS       17         10         27
## 8       WINTER STORM HIGH WINDS       15          1         16
## 9 HEAVY SNOW/BLIZZARD/AVALANCHE        1          0          1
# Hurricanes
as.data.frame(filter(injury_summary, grepl('HURR', evtype)))
##                       evtype Injuries Fatalities Casualties
## 1          HURRICANE/TYPHOON      922         63        985
## 2                  HURRICANE       44         40         84
## 3             HURRICANE ERIN        1          6          7
## 4 HURRICANE-GENERATED SWELLS        2          0          2
## 5             HURRICANE OPAL        1          1          2
## 6  HURRICANE OPAL/HIGH WINDS        0          2          2
## 7            HURRICANE EMILY        1          0          1
## 8            HURRICANE FELIX        0          1          1
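
One way to deal with inconsistent labels like those listed above is to map each event type to a broader category before summing. The sketch below illustrates the idea on made-up numbers. The patterns vector and the category names are hypothetical choices, not a grouping used elsewhere in this analysis, and a label matching several patterns is assigned to the first one listed.

```r
# Made-up casualty counts for a few event labels (illustration only).
toy <- data.frame(
  evtype = c("TORNADO", "WATERSPOUT/TORNADO", "TORNADO F2",
             "EXCESSIVE HEAT", "HEAT WAVE", "FLASH FLOOD", "STORM SURGE"),
  casualties = c(100, 5, 3, 40, 10, 20, 2),
  stringsAsFactors = FALSE
)

# Hypothetical grouping rules; earlier entries take precedence over later ones.
patterns <- c(Tornado = "TORN", Heat = "HEAT", Flood = "FLOO|SURG")

# Apply the rules in reverse order so that earlier entries overwrite later ones.
toy$category <- "Other"
for (name in rev(names(patterns))) {
  toy$category[grepl(patterns[[name]], toy$evtype)] <- name
}

# Sum casualties per category and sort from largest to smallest.
totals <- aggregate(casualties ~ category, data = toy, FUN = sum)
totals <- totals[order(-totals$casualties), ]
print(totals)
```

On real data the choice of patterns needs care, since a label such as 'TORNADOES, TSTM WIND, HAIL' could reasonably belong to more than one category.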

Conclusion

Despite inconsistencies in coding event types, it is clear that tornadoes account for the largest number of injuries and fatalities in the database, and consequently the largest number of casualties. Thus, tornadoes had the greatest effect on public health for storms recorded between 1950 and November 2011 in the National Weather Service Storm Database.
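
As a quick arithmetic check, the tornado shares can be recomputed from the totals printed earlier in this section. The numbers below are copied from those tables and cover only the event type coded exactly as 'TORNADO'; the variable names are illustrative.

```r
# Totals copied from the summary tables in this section.
overall <- c(Injuries = 139835, Fatalities = 14867, Casualties = 154702)
tornado <- c(Injuries = 91346, Fatalities = 5633, Casualties = 96979)

# Percentage of each total attributed to the 'TORNADO' event type.
share <- round(100 * tornado / overall, 1)
print(share)
```

This gives roughly 65% of injuries, 38% of fatalities and 63% of casualties.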

Find the Event Type with Most Severe Economic Consequences

The method used in this analysis to find the event type with the most severe economic consequences is as follows:

  1. Find grand totals for property damage, crop damage and overall damages (in millions) by event type.
    1. Overall damage is the sum of property damage and crop damage.
  2. Print the top 10 event types based on overall damage to determine what event types caused the most damage.
  3. From the top 10 event types select the most prominent categories, and look for related event types.

Grand Totals

Property, Crop and Overall Damage Amounts Are in Millions of Dollars

scaling_factor <- 1000000
grand_totals <- summarize(damage,
                         Property = sum(propdmgs, na.rm = TRUE) / scaling_factor,
                         Crops = sum(cropdmgs, na.rm = TRUE) / scaling_factor,
                         Overall = sum(damages, na.rm = TRUE) / scaling_factor)
grand_totals$Property <- format(round(grand_totals$Property, 1), nsmall = 1, big.mark = ',')
grand_totals$Crops <- format(round(grand_totals$Crops, 1), nsmall = 1, big.mark = ',')
grand_totals$Overall <- format(round(grand_totals$Overall, 1), nsmall = 1, big.mark = ',')
row.names(grand_totals) <- 'Grand Totals'
as.data.frame(grand_totals)
##               Property    Crops   Overall
## Grand Totals 423,864.0 48,377.6 472,241.5
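
The formatting step is repeated for every column, so it could be wrapped in a small helper. The function fmt_millions below is a hypothetical name introduced for illustration, and the amounts are made up.

```r
# Hypothetical helper: round to one decimal and add thousands separators.
fmt_millions <- function(x) {
  format(round(x, 1), nsmall = 1, big.mark = ",")
}

# Made-up dollar amounts, converted to millions before formatting.
amounts <- c(1234567890, 98765432)
formatted <- fmt_millions(amounts / 1e6)
print(formatted)
```

Note that format() returns character strings padded to a common width, so any sorting should be done on the numeric values before formatting.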

Top Ten Event Types Based on Overall Damage

Property, Crop and Overall Damage Amounts Are in Millions of Dollars

by_event <- group_by(damage, evtype)
damage_summary <- summarize(by_event, 
                            Property = sum(propdmgs, na.rm = TRUE) / scaling_factor, 
                            Crops = sum(cropdmgs, na.rm = TRUE) / scaling_factor,
                            Overall = sum(damages, na.rm = TRUE) / scaling_factor)
damage_summary <- arrange(damage_summary, desc(Overall))
damage_summary$Property <- format(round(damage_summary$Property, 1), nsmall=1, big.mark=",")
damage_summary$Crops <- format(round(damage_summary$Crops, 1), nsmall=1, big.mark = ',')
damage_summary$Overall <- format(round(damage_summary$Overall, 1), nsmall=1, big.mark = ',')
as.data.frame(damage_summary[1:10,])
##               evtype  Property    Crops   Overall
## 1              FLOOD 144,541.3  5,614.0 150,155.3
## 2  HURRICANE/TYPHOON  69,033.1  2,603.5  71,636.6
## 3            TORNADO  56,936.7    415.0  57,351.6
## 4        STORM SURGE  43,323.5      0.0  43,323.5
## 5               HAIL  15,732.3  3,026.0  18,758.2
## 6        FLASH FLOOD  15,884.3  1,406.9  17,291.2
## 7            DROUGHT   1,041.1 13,972.4  15,013.5
## 8          HURRICANE   9,914.0  2,189.9  12,103.9
## 9        RIVER FLOOD   5,118.9  5,029.5  10,148.4
## 10         ICE STORM   3,944.9  5,022.1   8,967.0

Four of the top ten event types above involve flood or storm surge, which is flooding that occurs when the wind pushes ocean water onto coastal areas. An example of storm surge is what happened on the New Jersey Shore when Hurricane Sandy (downgraded to a tropical storm) made landfall. The second largest event type is hurricane, the third is tornado, and the fifth is hail. A closer look at these types of events follows.

Top Ten Flood Event Types (Including Storm Surge)

Property, Crop and Overall Damage Amounts Are in Millions of Dollars

as.data.frame(filter(damage_summary, grepl('FLOO|SURG', evtype)))[1:10,]
##               evtype  Property    Crops   Overall
## 1              FLOOD 144,541.3  5,614.0 150,155.3
## 2        STORM SURGE  43,323.5      0.0  43,323.5
## 3        FLASH FLOOD  15,884.3  1,406.9  17,291.2
## 4        RIVER FLOOD   5,118.9  5,029.5  10,148.4
## 5   STORM SURGE/TIDE   4,640.0      0.0   4,640.0
## 6     FLASH FLOODING     307.3     15.1     322.4
## 7  FLASH FLOOD/FLOOD     272.5      0.6     273.0
## 8  FLOOD/FLASH FLOOD     174.0     95.0     269.1
## 9      COASTAL FLOOD     237.6      0.0     237.6
## 10  COASTAL FLOODING     126.4      0.1     126.4

Hurricane Event Types

Property, Crop and Overall Damage Amounts Are in Millions of Dollars

as.data.frame(filter(damage_summary, grepl('HURR', evtype)))
##                       evtype  Property    Crops   Overall
## 1          HURRICANE/TYPHOON  69,033.1  2,603.5  71,636.6
## 2                  HURRICANE   9,914.0  2,189.9  12,103.9
## 3             HURRICANE OPAL   3,172.8     19.0   3,191.8
## 4             HURRICANE ERIN     258.1    136.0     394.1
## 5  HURRICANE OPAL/HIGH WINDS     100.0     10.0     110.0
## 6            HURRICANE EMILY      50.0      0.0      50.0
## 7            HURRICANE FELIX       0.5      0.5       1.0
## 8           HURRICANE GORDON       0.5      0.0       0.5
## 9 HURRICANE-GENERATED SWELLS       0.1      0.0       0.1

Top Ten Tornado Event Types

Property, Crop and Overall Damage Amounts Are in Millions of Dollars

Note: the second event also appears in the list of hail-related event types.

as.data.frame(filter(damage_summary, grepl('TORN', evtype)))[1:10,]
##                        evtype  Property    Crops   Overall
## 1                     TORNADO  56,936.7    415.0  57,351.6
## 2  TORNADOES, TSTM WIND, HAIL   1,600.0      2.5   1,602.5
## 3          WATERSPOUT/TORNADO      51.1      0.0      51.1
## 4                  TORNADO F1       2.4      0.0       2.4
## 5                  TORNADO F2       1.6      0.0       1.6
## 6                  TORNADO F3       0.7      0.0       0.7
## 7                  TORNADO F0       0.1      0.0       0.1
## 8          WATERSPOUT TORNADO       0.0      0.0       0.0
## 9          WATERSPOUT-TORNADO       0.0      0.0       0.0
## 10                  TORNADOES       0.0      0.0       0.0

Top Ten Hail Event Types

Property, Crop and Overall Damage Amounts Are in Millions of Dollars

Note: the second event also appears in the list of tornado-related event types.

as.data.frame(filter(damage_summary, grepl('HAIL', evtype)))[1:10,]
##                        evtype  Property    Crops   Overall
## 1                        HAIL  15,732.3  3,026.0  18,758.2
## 2  TORNADOES, TSTM WIND, HAIL   1,600.0      2.5   1,602.5
## 3                   HAILSTORM     241.0      0.0     241.0
## 4              TSTM WIND/HAIL      44.3     64.7     108.9
## 5                  SMALL HAIL       0.1     20.8      20.9
## 6     THUNDERSTORM WINDS HAIL       0.7      0.0       0.7
## 7                  HAIL/WINDS       0.5      0.1       0.6
## 8     THUNDERSTORM WINDS/HAIL       0.4      0.0       0.4
## 9                    HAIL 275       0.2      0.0       0.2
## 10                   HAIL 450       0.2      0.0       0.2

Conclusion

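The tables above show that flood-related event types, including storm surge, accounted for the largest share of the economic damage recorded in the National Weather Service Storm Database between 1950 and November 2011. The ten flood event types listed above add up to nearly half of the overall damage total.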
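
The size of the flood-related share can be checked by summing the Overall column of the flood table and dividing by the grand total. The sketch below copies those figures from the tables above; it covers only the ten flood event types listed, so less common flood labels are not included, and the variable names are illustrative.

```r
# Overall damage (millions of dollars) for the ten flood event types listed above.
flood_overall <- c(150155.3, 43323.5, 17291.2, 10148.4, 4640.0,
                   322.4, 273.0, 269.1, 237.6, 126.4)

# Grand total of overall damage (millions of dollars) from the Grand Totals table.
grand_overall <- 472241.5

flood_total <- sum(flood_overall)
flood_share <- round(100 * flood_total / grand_overall, 1)
print(c(total = flood_total, share_percent = flood_share))
```

Under these assumptions the flood-related event types come to about 226,787 million dollars, or roughly 48% of all recorded damage, well ahead of the hurricane and tornado event types shown above.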