Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The basic goal of this analysis is to explore the NOAA Storm Database and answer some basic questions about severe weather events. We will use the database to answer the questions:

This report aims to help a government or municipal manager who might be responsible for preparing for severe weather events and will need to prioritize resources for different types of events. However, this document doesn’t make any specific recommendations on how to prevent these events.

According to our analysis tornadoes and floods are the most harmful events to public health and the economy respectively.

Data Processing

Getting and Cleaning the Data

The data for this analysis comes in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The file can be downloaded here [47Mb].

if(!file.exists("./data")){dir.create("./data")}
if(!file.exists("./data/repdata-data-StormData.csv.bz2")) {
     dataUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
     download.file(dataUrl, "./data/repdata-data-StormData.csv.bz2", method = "curl")
     dateDownloaded <- date()
}

There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.

eventsds <- read.csv("./data/repdata-data-StormData.csv.bz2", stringsAsFactors = FALSE, na.strings = "")
str(eventsds)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  NA NA NA NA ...
##  $ BGN_LOCATI: chr  NA NA NA NA ...
##  $ END_DATE  : chr  NA NA NA NA ...
##  $ END_TIME  : chr  NA NA NA NA ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  NA NA NA NA ...
##  $ END_LOCATI: chr  NA NA NA NA ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  NA NA NA NA ...
##  $ WFO       : chr  NA NA NA NA ...
##  $ STATEOFFIC: chr  NA NA NA NA ...
##  $ ZONENAMES : chr  NA NA NA NA ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  NA NA NA NA ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

The database has 902297 observations with 37 variables each. However, to answer the proposed questions, we will need only the following variables:

  • BGN_DATE: Date when the event began.
  • EVTYPE: Event Type (e.g. Astronomical Low Tide, Avalanche, Blizzard, Coastal Flood, etc.)
  • FATALITIES: Number of fatalities caused by the event.
  • INJURIES: Number of injuries caused by the event.
  • PROPDMG: Most significant digits of the property damage estimates in dollar.
  • PROPDMGEXP: Alphabetic codes to signify magnitude of the PROPDMG variable (“K” for thousands, “M” for millions, and “B” for billions).
  • CROPDMG: Most significant digits of the crop damage estimates in dollar.
  • CROPDMGEXP: Alphabetic codes to signify magnitude of the CROPDMG variable (“K” for thousands, “M” for millions, and “B” for billions).

The other variables can be discarded in this analysis. So we will delete them to free memory space and make the analysis faster.

eventsds <- select(eventsds, BGN_DATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, 
                   CROPDMGEXP)

The database has one string variable called “BGN_DATE” that represents the date when the event began. This is in the format “%m/%d/%Y %H:%M:%S”. This variable needs to be converted to date type to make the analysis easier.

eventsds$BGN_DATE <- as.Date(eventsds$BGN_DATE, "%m/%d/%Y %H:%M:%S")

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database, there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

In order to define which years we will consider, we will list the number of events per year, the percentage of events the year represents comparing with the whole database, and the accumulated percentage of events along the years.

occurrences <- table( Ocurrences = format(eventsds$BGN_DATE, "%Y"))
occurrences <- cbind(occurrences, percentage = round(occurrences/nrow(eventsds)*100, digits = 2))
occurrences <- occurrences[order(row.names(occurrences), decreasing = TRUE),]
accum <- occurrences[1, "percentage"]
for (i in c(2:nrow(occurrences))){
     accum <- rbind(accum, occurrences[i, "percentage"] + accum[i-1])
}
occurrences <- cbind(occurrences, accumulated = round(accum, digits = 2))
kable(occurrences, caption = "TABLE 1: Events per year", col.names = c("Number of events", 
                                                                       "Percentage from total", 
                                                                       "Accumulated percantage"),
      align = c("c", "c", "c"))
TABLE 1: Events per year
Number of events Percentage from total Accumulated percantage
2011 62174 6.89 6.89
2010 48161 5.34 12.23
2009 45817 5.08 17.31
2008 55663 6.17 23.48
2007 43289 4.80 28.28
2006 44034 4.88 33.16
2005 39184 4.34 37.50
2004 39363 4.36 41.86
2003 39752 4.41 46.27
2002 36293 4.02 50.29
2001 34962 3.87 54.16
2000 34471 3.82 57.98
1999 31289 3.47 61.45
1998 38128 4.23 65.68
1997 28680 3.18 68.86
1996 32270 3.58 72.44
1995 27970 3.10 75.54
1994 20631 2.29 77.83
1993 12607 1.40 79.23
1992 13534 1.50 80.73
1991 12522 1.39 82.12
1990 10946 1.21 83.33
1989 10410 1.15 84.48
1988 7257 0.80 85.28
1987 7367 0.82 86.10
1986 8726 0.97 87.07
1985 7979 0.88 87.95
1984 7335 0.81 88.76
1983 8322 0.92 89.68
1982 7132 0.79 90.47
1981 4517 0.50 90.97
1980 6146 0.68 91.65
1979 4279 0.47 92.12
1978 3657 0.41 92.53
1977 3728 0.41 92.94
1976 3768 0.42 93.36
1975 4975 0.55 93.91
1974 5386 0.60 94.51
1973 4463 0.49 95.00
1972 2168 0.24 95.24
1971 3471 0.38 95.62
1970 3215 0.36 95.98
1969 2926 0.32 96.30
1968 3312 0.37 96.67
1967 2688 0.30 96.97
1966 2388 0.26 97.23
1965 2855 0.32 97.55
1964 2348 0.26 97.81
1963 1968 0.22 98.03
1962 2389 0.26 98.29
1961 2246 0.25 98.54
1960 1945 0.22 98.76
1959 1813 0.20 98.96
1958 2213 0.25 99.21
1957 2184 0.24 99.45
1956 1703 0.19 99.64
1955 1413 0.16 99.80
1954 609 0.07 99.87
1953 492 0.05 99.92
1952 272 0.03 99.95
1951 269 0.03 99.98
1950 223 0.02 100.00

If we apply the Pareto principle (states that, for many events, roughly 80% of the effects come from 20% of the causes). We will reduce the number of years to 20%. It means that we will consider the events from 1999 to 2011. Conversely, in this case, the Pareto principle doesn’t seem to apply very well, as the cumulative percentage of events between 1999 and 2011 is 61.45%. Anyway this subset seems to be representative enough.

eventsds <- filter(eventsds, format(eventsds$BGN_DATE, "%Y") >= 1999)

According to the database documentation, the event type variable “EVTYPE” should have 48 variations, as follow:

Astronomical Low Tide Extreme Cold/Wind Chill Hurricane (Typhoon) Storm Surge/Tide
Avalanche Flash Flood Ice Storm Strong Wind
Blizzard Flood Lake-Effect Snow Thunderstorm Wind
Coastal Flood Frost/Freeze Lakeshore Flood Tornado
Cold/Wind Chill Funnel Cloud Lightning Tropical Depression
Debris Flow Freezing Fog Marine Hail Tropical Storm
Dense Fog Hail Marine High Wind Tsunami
Dense Smoke Heat Marine Strong Wind Volcanic Ash
Drought Heavy Rain Marine Thunderstorm Wind Waterspout
Dust Devil Heavy Snow Rip Current Wildfire
Dust Storm High Surf Seiche Winter Storm
Excessive Heat High Wind Sleet Winter Weather

A quick look into the database shows us that this variable has a lot of filler errors such as spaces at the beginning or at the end, double spaces, misspellings, typing errors, abbreviations and so on.

table(eventsds$EVTYPE[order(eventsds$EVTYPE)])
## 
##             HIGH SURF ADVISORY                    FLASH FLOOD 
##                              1                              1 
##                      TSTM WIND                     WATERSPOUT 
##                              3                              1 
##                           WIND                 ABNORMALLY DRY 
##                              1                              2 
##                 ABNORMALLY WET           ACCUMULATED SNOWFALL 
##                              1                              4 
##         ASTRONOMICAL HIGH TIDE          ASTRONOMICAL LOW TIDE 
##                            103                            174 
##                      AVALANCHE                  BEACH EROSION 
##                            351                              2 
##                      BLACK ICE                       BLIZZARD 
##                              7                           2316 
##                   BLOWING DUST                     BRUSH FIRE 
##                              2                              1 
##                  COASTAL FLOOD               COASTAL FLOODING 
##                            563                             67 
##       COASTAL FLOODING/EROSION                           COLD 
##                              2                             19 
##                   COLD WEATHER   COLD WIND CHILL TEMPERATURES 
##                              1                              4 
##                COLD/WIND CHILL                     COOL SPELL 
##                            539                              1 
##          CSTL FLOODING/EROSION                      DAM BREAK 
##                              2                              1 
##                      DENSE FOG                    DENSE SMOKE 
##                           1188                             10 
##                   DRIEST MONTH                        DROUGHT 
##                              1                           2253 
##                       DROWNING                            DRY 
##                              1                              8 
##                 DRY CONDITIONS                 DRY MICROBURST 
##                              6                            100 
##                      DRY SPELL                        DRYNESS 
##                              3                              1 
##                     DUST DEVEL                     DUST DEVIL 
##                              1                            115 
##                     DUST STORM                 EARLY SNOWFALL 
##                            394                              5 
##                 EXCESSIVE HEAT         EXCESSIVE HEAT/DROUGHT 
##                           1395                              1 
##             EXCESSIVE RAINFALL                 EXCESSIVE SNOW 
##                              2                             25 
##                EXCESSIVELY DRY                   EXTREME COLD 
##                              1                            321 
##        EXTREME COLD/WIND CHILL             EXTREME WIND CHILL 
##                           1002                              1 
##              EXTREME WINDCHILL EXTREME WINDCHILL TEMPERATURES 
##                             93                             19 
##                  EXTREMELY WET               FALLING SNOW/ICE 
##                              1                              2 
##                    FIRST FROST                     FIRST SNOW 
##                              1                              2 
##                    FLASH FLOOD              FLASH FLOOD/FLOOD 
##                          42550                              2 
##                 FLASH FLOODING                          FLOOD 
##                              1                          20881 
##              FLOOD/FLASH/FLOOD                            FOG 
##                              1                            372 
##                         FREEZE               FREEZING DRIZZLE 
##                             59                              5 
##                   FREEZING FOG                  FREEZING RAIN 
##                             45                            121 
##            FREEZING RAIN/SLEET                          FROST 
##                              2                             41 
##                   FROST/FREEZE                   FUNNEL CLOUD 
##                           1342                           4959 
##                  FUNNEL CLOUDS                          GLAZE 
##                              1                              4 
##                  GRADIENT WIND                GUSTY LAKE WIND 
##                              2                              1 
##        GUSTY THUNDERSTORM WIND       GUSTY THUNDERSTORM WINDS 
##                              3                              5 
##                     GUSTY WIND                GUSTY WIND/HAIL 
##                             13                              1 
##                    GUSTY WINDS                           HAIL 
##                             32                         175844 
##                    HARD FREEZE                 HAZARDOUS SURF 
##                              5                              1 
##                           HEAT                     HEAVY RAIN 
##                            711                          10146 
##             HEAVY RAIN EFFECTS                 HEAVY RAINFALL 
##                              1                              2 
##                     HEAVY SEAS                     HEAVY SNOW 
##                              1                          11662 
##                     HEAVY SURF           HEAVY SURF/HIGH SURF 
##                             56                            228 
##                      HIGH SEAS                      HIGH SURF 
##                              6                            628 
##           HIGH SURF ADVISORIES             HIGH SURF ADVISORY 
##                              1                              4 
##                     HIGH WATER                      HIGH WIND 
##                              2                          17359 
##                      HURRICANE              HURRICANE/TYPHOON 
##                             91                             88 
##                            ICE                    ICE ON ROAD 
##                              1                              1 
##                      ICE ROADS                      ICE STORM 
##                              1                           1477 
##                       ICE/SNOW                      ICY ROADS 
##                              2                             10 
##               LAKE EFFECT SNOW               LAKE-EFFECT SNOW 
##                              8                            635 
##                LAKESHORE FLOOD                      LANDSLIDE 
##                             23                            588 
##                      LANDSLUMP                      LANDSPOUT 
##                              1                              2 
##               LATE SEASON HAIL               LATE SEASON SNOW 
##                              1                              1 
##            LIGHT FREEZING RAIN                     LIGHT SNOW 
##                             23                            126 
##     LIGHT SNOW/FREEZING PRECIP                      LIGHTNING 
##                              1                          10548 
##             LOCALLY HEAVY RAIN                    MARINE HAIL 
##                              1                            442 
##               MARINE HIGH WIND             MARINE STRONG WIND 
##                            135                             48 
##       MARINE THUNDERSTORM WIND               MARINE TSTM WIND 
##                           5812                           6175 
##            MIXED PRECIPITATION              MODERATE SNOWFALL 
##                             34                             83 
##          MONTHLY PRECIPITATION               MONTHLY SNOWFALL 
##                             36                              1 
##            MONTHLY TEMPERATURE                      MUD SLIDE 
##                              4                              3 
##                       MUDSLIDE                NON SEVERE HAIL 
##                              5                              7 
##                  NON TSTM WIND         NON-SEVERE WIND DAMAGE 
##                              2                              1 
##                  NON-TSTM WIND                NORTHERN LIGHTS 
##                              1                              1 
##                          OTHER                     PATCHY ICE 
##                             12                              1 
##                   PROLONG COLD                 PROLONG WARMTH 
##                              9                              4 
##                           RAIN                   RECORD  COLD 
##                              7                              1 
##                    RECORD COLD                    RECORD COOL 
##                             19                              3 
##                 RECORD DRYNESS                    RECORD HEAT 
##                              2                             27 
##            RECORD LOW RAINFALL           RECORD PRECIPITATION 
##                              2                              1 
##                RECORD RAINFALL                    RECORD SNOW 
##                             11                              2 
##                RECORD SNOWFALL             RECORD TEMPERATURE 
##                              1                              3 
##                  RECORD WARMTH              RED FLAG CRITERIA 
##                             92                              2 
##               RED FLAG FIRE WX              REMNANTS OF FLOYD 
##                              2                              2 
##                    RIP CURRENT                   RIP CURRENTS 
##                            431                            217 
##                     ROGUE WAVE                     ROUGH SEAS 
##                              1                              3 
##                         SEICHE            SEVERE THUNDERSTORM 
##                             13                              1 
##           SEVERE THUNDERSTORMS                          SLEET 
##                              7                             58 
##                    SLEET STORM                     SMALL HAIL 
##                             12                             36 
##                          SMOKE                           SNOW 
##                             11                            281 
##                  SNOW ADVISORY                   SNOW AND ICE 
##                              1                              2 
##                 SNOW AND SLEET                   SNOW DROUGHT 
##                              4                              2 
##                   SNOW SHOWERS                   SNOW SQUALLS 
##                              5                              6 
##              SNOW/BLOWING SNOW             SNOW/FREEZING RAIN 
##                              6                              4 
##                     SNOW/SLEET              SNOWMELT FLOODING 
##                              3                              3 
##                    STORM SURGE               STORM SURGE/TIDE 
##                            152                            148 
##                STREET FLOODING                    STRONG WIND 
##                              2                           3514 
##                   STRONG WINDS                   THUNDERSTORM 
##                             88                             14 
##              THUNDERSTORM WIND        THUNDERSTORM WIND (G40) 
##                          81402                              1 
##                  THUNDERSTORMS                 TIDAL FLOODING 
##                              2                             10 
##                        TORNADO                 TORNADO DEBRIS 
##                          19206                              1 
##            TROPICAL DEPRESSION                 TROPICAL STORM 
##                             59                            621 
##                      TSTM WIND                TSTM WIND (G40) 
##                          95322                              3 
##                TSTM WIND (G45)                  TSTM WIND G45 
##                             11                              1 
##                 TSTM WIND/HAIL                        TSUNAMI 
##                            530                             20 
##              UNSEASONABLY COLD              UNSEASONABLY COOL 
##                             11                              7 
##        UNSEASONABLY COOL & WET               UNSEASONABLY DRY 
##                              2                             26 
##               UNSEASONABLY HOT              UNSEASONABLY WARM 
##                              6                             61 
##        UNSEASONABLY WARM & WET      UNSEASONABLY WARM AND DRY 
##                              1                              4 
##          UNSEASONABLY WARM/WET               UNSEASONABLY WET 
##                              2                             13 
##            UNSEASONAL LOW TEMP                UNSEASONAL RAIN 
##                              2                              1 
##                 UNUSUALLY COLD            UNUSUALLY LATE SNOW 
##                              8                              1 
##                 UNUSUALLY WARM           URBAN/SML STREAM FLD 
##                              4                           1867 
##                       VERY DRY                      VERY WARM 
##                              2                              1 
##                            VOG                   VOLCANIC ASH 
##                              1                             20 
##               VOLCANIC ASHFALL              VOLCANIC ERUPTION 
##                              3                              2 
##                     WALL CLOUD                   WARM WEATHER 
##                              1                              1 
##                     WATERSPOUT                      WHIRLWIND 
##                           2838                              1 
##               WILD/FOREST FIRE                       WILDFIRE 
##                           1231                           2732 
##                           WIND                  WIND ADVISORY 
##                            239                             12 
##                  WIND AND WAVE                     WIND CHILL 
##                              1                              5 
##                    WIND DAMAGE                     WIND GUSTS 
##                              2                              3 
##                          WINDS                     WINTER MIX 
##                              2                              1 
##                   WINTER STORM                 WINTER WEATHER 
##                          10169                           6951 
##             WINTER WEATHER MIX             WINTER WEATHER/MIX 
##                              6                           1104 
##                     WINTRY MIX                            WND 
##                             50                              1

We will try to clean the filler errors in the variable “EVTYPE” and group the different types to the ones listed in the documentation. Unfortunately, there are some event types in the database that are impossible to be matched with the types shown in the documentation. These types won’t be matched.

eventsds$EVTYPE <- toupper(eventsds$EVTYPE)
eventsds$EVTYPE <- trimws(eventsds$EVTYPE, "both")

eventsds$EVTYPE <- gsub(".*(ASTR).*", "ASTRONOMICAL LOW TIDE", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub(".*(AVAL|LAN|SLI).*", "AVALANCHE", eventsds$EVTYPE) 
eventsds$EVTYPE <- gsub(".*(BLIZ).*", "BLIZZARD", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub(".*(COAST|CST|BEA).*", "COASTAL FLOOD", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub("^(?=.*(COLD|COO|CHI|LOW|WND))((?!FUN|TOR|FRO|SNO|EXC|EXTR|HIG|REC|SEV|UNU|TID).)*$", "COLD/WIND CHILL", eventsds$EVTYPE, perl = TRUE)
eventsds$EVTYPE <- gsub("^(?=.*(FOG|VOG))((?!FRE|ICE).)*$", "DENSE FOG", eventsds$EVTYPE, perl = TRUE)
eventsds$EVTYPE <- gsub(".*(SMO).*", "DENSE SMOKE", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub(".*(DROU|DRY|DRI).*", "DROUGHT", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub(".*(DEV).*", "DUST DEVIL", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub("^(?=.*DUS)((?!DEV).)*$", "DUST STORM", eventsds$EVTYPE, perl = TRUE)
eventsds$EVTYPE <- gsub(".*(EXCESSIVE HEAT|EXTREME HEAT|HEATBURST|RECORD HEAT).*",
                       "EXCESSIVE HEAT", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub(".*(WAR).*", "EXCESSIVE HEAT", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub(".*(EXC|EXT|REC|SEV|UNU).*(COLD|CHI|COO).*", "EXTREME COLD/WIND CHILL", 
                        eventsds$EVTYPE)
eventsds$EVTYPE <- gsub(".*(FLAS|FLD|FLDG).*", "FLASH FLOOD", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub("^(?=.*FLOO)((?!COA|BEA|CST|FLA|TID).)*$", "FLOOD", eventsds$EVTYPE, perl = TRUE)
eventsds$EVTYPE <- gsub("^(?=.*(FRO|FRE))((?!FOG).)*$", "FROST/FREEZE", eventsds$EVTYPE, perl = TRUE)
eventsds$EVTYPE <- gsub(".*(FUN).*", "FUNNEL CLOUD", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub("^(?=.*FOG)((?!DEN).)*$", "FREEZING FOG", eventsds$EVTYPE, perl = TRUE)
eventsds$EVTYPE <- gsub("^(?=.*HAI)((?!MAR).)*$", "HAIL", eventsds$EVTYPE, perl = TRUE)
eventsds$EVTYPE <- gsub("^(?=.*HEAT)((?!EXC).)*$", "HEAT", eventsds$EVTYPE, perl = TRUE)
eventsds$EVTYPE <- gsub("^(?=.*(HEAV|RAI|PREC|SHOW))((?!SNO|SUR).)*$", "HEAVY RAIN", 
                        eventsds$EVTYPE, perl = TRUE)
eventsds$EVTYPE <- gsub("^(?=.*HEAV)((?!RAI|SUR|LAK|ICE).)*$", "HEAVY SNOW", eventsds$EVTYPE, 
                        perl = TRUE)
eventsds$EVTYPE <- gsub("^(?=.*SNO)((?!LAK).)*$", "HEAVY SNOW", eventsds$EVTYPE, perl = TRUE)
eventsds$EVTYPE <- gsub("^(?=.*SURF)((?!RIP).)*$", "HIGH SURF", eventsds$EVTYPE, perl = TRUE)
eventsds$EVTYPE <- gsub("^(?=.*HIG)((?!SUR|SWE|SEA|TEM|TID|WAT|WAV|REC|MAR).)*$", "HIGH WIND",
                       eventsds$EVTYPE, perl = TRUE)
eventsds$EVTYPE <- gsub(".*(HUR).*", "HURRICANE (TYPHOON)", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub(".*(ICE).*", "ICE STORM", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub(".*(LAK).*", "LAKE-EFFECT SNOW", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub(".*(LIGHTN).*", "LIGHTNING", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub("^(?=.*MARI)((?!HAI|HIG|THU|TST).)*$", "MARINE STRONG WIND", eventsds$EVTYPE, 
                       perl = TRUE)
eventsds$EVTYPE <- gsub("^(?=.*MARI)((?!HAI|HIG|STR).)*$", "MARINE THUNDERSTORM WIND", eventsds$EVTYPE,
                       perl = TRUE)
eventsds$EVTYPE <- gsub(".*(RIP).*", "RIP CURRENT", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub(".*(SLE).*", "SLEET", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub("^(?=.*(TID|SURG))((?!AST).)*$", "STORM SURGE/TIDE", eventsds$EVTYPE, 
                        perl = TRUE)
eventsds$EVTYPE <- gsub("^(?=.*WIND)((?!CHI|EXT|THU|HIG|MAR|TST|TUN).)*$", "STRONG WIND", 
                        eventsds$EVTYPE, perl = TRUE)
eventsds$EVTYPE <- gsub("^(?=.*(THU|TST))((?!MAR).)*$", "THUNDERSTORM WIND", eventsds$EVTYPE, 
                        perl = TRUE)
eventsds$EVTYPE <- gsub(".*(TORN).*", "TORNADO", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub("^(?=.*TROP)((?!DEP).)*$", "TROPICAL STORM", eventsds$EVTYPE, perl = TRUE)
eventsds$EVTYPE <- gsub(".*(VOL).*", "VOLCANIC ASH", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub(".*(WAT).*", "WATERSPOUT", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub(".*(FIRE).*", "WILDFIRE", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub(".*(WINTER S).*", "WINTER STORM", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub("^(?=.*WINT)((?!STO).)*$", "WINTER WEATHER", eventsds$EVTYPE, perl = TRUE)

One last step to prepare the data is merging the values in PROPDMG with PROPDMGEXP and CROPDMG with CROPDMGEXP. By merging these variables we are able to obtain the actual value in dollar of the loss in properties and crops caused by each event.

eventsds[eventsds$PROPDMGEXP == "K" & !is.na(eventsds$PROPDMGEXP), "PROPDMGEXP"] <- 1000
eventsds[eventsds$PROPDMGEXP == "M" & !is.na(eventsds$PROPDMGEXP), "PROPDMGEXP"] <- 1000000
eventsds[eventsds$PROPDMGEXP == "B" & !is.na(eventsds$PROPDMGEXP), "PROPDMGEXP"] <- 1000000000

eventsds$PROPDMGEXP <- as.numeric(eventsds$PROPDMGEXP)
eventsds$PROPDMG <- eventsds$PROPDMG * eventsds$PROPDMGEXP

eventsds[eventsds$CROPDMGEXP == "K" & !is.na(eventsds$CROPDMGEXP), "CROPDMGEXP"] <- 1000
eventsds[eventsds$CROPDMGEXP == "M" & !is.na(eventsds$CROPDMGEXP), "CROPDMGEXP"] <- 1000000
eventsds[eventsds$CROPDMGEXP == "B" & !is.na(eventsds$CROPDMGEXP), "CROPDMGEXP"] <- 1000000000

eventsds$CROPDMGEXP <- as.numeric(eventsds$CROPDMGEXP)
eventsds$CROPDMG <- eventsds$CROPDMG * eventsds$CROPDMGEXP

Analysing the impact on public health

Firstly we will analyse the consequence of the events in public health. We will start listing the 10 events with the most impact on public health.

eventsds <- group_by(eventsds, EVTYPE)
healthSummary <- summarise(eventsds, totalInjuries = sum(INJURIES, na.rm = TRUE),
                           totalFatalities = sum(FATALITIES, na.rm = TRUE),
                           totalPeopleAffected = sum(INJURIES + FATALITIES, na.rm = TRUE),
                           totalEconomicLoss = currency(sum(PROPDMG + CROPDMG, na.rm = TRUE)/1000000000),
                           frequency = sum(!is.na(EVTYPE)))

healthSummary <- healthSummary[order(-healthSummary$totalPeopleAffected),]
kable(healthSummary[1:10, c(1:4, 6)], caption = "TABLE 2: Top 10 events impacting on population health", 
      col.names = c("Event type", "Total injuries", "Total fatalities", "Total people affected",
                    "Event frequency"), align = c("c", "c", "c", "c"))
TABLE 2: Top 10 events impacting on population health
Event type Total injuries Total fatalities Total people affected Event frequency
TORNADO 17055 1287 18342 19207
EXCESSIVE HEAT 5188 1515 6703 1588
THUNDERSTORM WIND 3478 276 3754 176778
LIGHTNING 3236 512 3748 10548
FLASH FLOOD 1132 664 1796 44422
HEAT 1222 231 1453 711
WILDFIRE 1326 86 1412 3966
HURRICANE (TYPHOON) 1295 84 1379 179
HIGH WIND 802 158 960 17359
RIP CURRENT 414 485 899 648

The table above can be represented as a bubble chart where the x axis represents the number total of fatalities caused by each type of event, while the y axis represents the number of injuries. The diameter of the bubble represents the frequency of each type of event occurred in the 13 years interval.

ggplot(healthSummary[1:10,], aes(x = totalFatalities, y = totalInjuries, col = EVTYPE)) +
     geom_point(aes(size = frequency), alpha = 0.5) +
     scale_fill_manual(values = getPalette(colourCount)) +
     scale_size_area(max_size = 30) +
     ggtitle("FIGURE 1: Top 10 events impacting on population health") +
     xlab("Total fatality") +
     ylab("Total injuries") +
     labs(col = "Event type", size = "Frequency") +
     geom_text(data = healthSummary[1,], aes(x = totalFatalities, y = totalInjuries, label = EVTYPE),
               vjust = 3) +
     geom_text(data = healthSummary[2,], aes(x = totalFatalities, y = totalInjuries, label = EVTYPE),
               vjust = 2, hjust = 0.9) +
     geom_text(data = healthSummary[3,], aes(x = totalFatalities, y = totalInjuries, label = EVTYPE),
               vjust = -4.7) +
     geom_text(data = healthSummary[4,], aes(x = totalFatalities, y = totalInjuries, label = EVTYPE),
               vjust = -1.5, hjust = -0.1) +
     geom_text(data = healthSummary[5,], aes(x = totalFatalities, y = totalInjuries, label = EVTYPE),
               vjust = -1.5, hjust = -0.2) +
     geom_text(data = healthSummary[6,], aes(x = totalFatalities, y = totalInjuries, label = EVTYPE),
               vjust = 0.2, hjust = -0.2) +
     geom_text(data = healthSummary[7,], aes(x = totalFatalities, y = totalInjuries, label = EVTYPE),
               vjust = 4.5, hjust = 0.3) +
     geom_text(data = healthSummary[8,], aes(x = totalFatalities, y = totalInjuries, label = EVTYPE),
               vjust = -1.2, hjust = 0.1) +
     geom_text(data = healthSummary[9,], aes(x = totalFatalities, y = totalInjuries, label = EVTYPE),
               vjust = 0.7, hjust = -0.23) +
     geom_text(data = healthSummary[10,], aes(x = totalFatalities, y = totalInjuries, label = EVTYPE),
               vjust = 1.5, hjust = -0.05)

Analysing the impact on the economy

Following the same approach to analyse the economic loss caused by each event type we will find the top 10 events impacting on the economy:

economicSummary <- summarise(eventsds,
                             totalPropertyLoss = currency(sum(PROPDMG, na.rm = TRUE)/1000000000),
                             totalCropLoss = currency(sum(CROPDMG, na.rm = TRUE)/1000000000),
                             totalPeopleAffected = sum(INJURIES + FATALITIES, na.rm = TRUE),
                             totalEconomicLoss = currency(sum(PROPDMG + CROPDMG, na.rm = TRUE)
                                                          /1000000000),
                             frequency = sum(!is.na(EVTYPE)))

economicSummary <- economicSummary[order(-economicSummary$totalEconomicLoss),]
kable(economicSummary[1:10, c(1:3, 5, 6)],
      caption = "TABLE 3: Top 10 events impacting on economy (billions)",
      col.names = c("Event type", "Total property loss", "Total crop loss", "Total economic loss",
                    "Event frequency"), align = c("c", "c", "c", "c"))
TABLE 3: Top 10 events impacting on economy (billions)
Event type Total property loss Total crop loss Total economic loss Event frequency
FLOOD $135.06 $4.29 $135.39 20909
HURRICANE (TYPHOON) $76.51 $4.44 $38.04 179
TORNADO $21.45 $0.24 $15.72 19207
HAIL $12.45 $1.98 $8.89 176419
FLASH FLOOD $12.94 $1.21 $7.96 44422
STORM SURGE/TIDE $47.82 $0.00 $4.64 310
THUNDERSTORM WIND $5.85 $0.74 $4.32 176778
WILDFIRE $7.25 $0.31 $3.71 3966
HIGH WIND $5.03 $0.50 $2.82 17359
TROPICAL STORM $7.22 $0.45 $1.45 621

A bubble chart can give us again a visual representation of the number shown in the table above.

ggplot(economicSummary[1:10,], aes(x = totalPropertyLoss, y = currency(totalCropLoss), col = EVTYPE)) +
     geom_point(aes(size = frequency), alpha = 0.5) +
     scale_fill_manual(values = getPalette(colourCount)) +
     scale_size_area(max_size = 30) +
     ggtitle("FIGURE 2: Top 10 events impacting on economy") +
     xlab("Total property loss (billions)") +
     ylab("Total crop loss (billions)") +
     labs(col = "Event type", size = "Frequency") +
     geom_text(data = economicSummary[1,], aes(x = totalPropertyLoss, y = totalCropLoss, label = EVTYPE),
               vjust = 3, hjust = 0.7) +
     geom_text(data = economicSummary[2,], aes(x = totalPropertyLoss, y = totalCropLoss, label = EVTYPE),
               vjust = 2, hjust = 0.5) +
     geom_text(data = economicSummary[3,], aes(x = totalPropertyLoss, y = totalCropLoss, label = EVTYPE),
               vjust = 0.3, hjust = -0.25) +
     geom_text(data = economicSummary[4,], aes(x = totalPropertyLoss, y = totalCropLoss, label = EVTYPE),
               vjust = -1.5, hjust = -1.3) +
     geom_text(data = economicSummary[5,], aes(x = totalPropertyLoss, y = totalCropLoss, label = EVTYPE),
               vjust = -0.4, hjust = -0.24) +
     geom_text(data = economicSummary[6,], aes(x = totalPropertyLoss, y = totalCropLoss, label = EVTYPE),
               vjust = 0.2, hjust = -0.04) +
     geom_text(data = economicSummary[7,], aes(x = totalPropertyLoss, y = totalCropLoss, label = EVTYPE),
               vjust = -1.5, hjust = -0.27) +
     geom_text(data = economicSummary[8,], aes(x = totalPropertyLoss, y = totalCropLoss, label = EVTYPE),
               vjust = 3, hjust = 0.3) +
     geom_text(data = economicSummary[9,], aes(x = totalPropertyLoss, y = totalCropLoss, label = EVTYPE),
               vjust = -1.5, hjust = 0.18) +
     geom_text(data = economicSummary[10,], aes(x = totalPropertyLoss, y = totalCropLoss, 
                                                label = EVTYPE), vjust = -0.5, hjust = -0.02)

General analysis

Now we have two lists of the 10 events that cause the most impact on the people’s health and on the national economy. Depending on the focus of the analysis, we can use the charts above or we can join them in a health/economic chart.

We have 6 event types that appear on both lists (Flash Flood, High Wind, Hurricane (Typhoon), Thunderstorm Wind, Tornado and Wildfire). 4 event types appearing only in the top 10 health harassers (Excessive Heat, Heat, Lightning and Rip Current) and other 4 appearing only in the top 10 economy harassers (Flood, Hail, Storm Surge/Tide and Tropical Storm). Merging the two lists we have a single list of 14 events.

uniqueSummary = merge(healthSummary[1:10,], economicSummary[1:10,], all = TRUE)

kable(uniqueSummary[, 1:4], caption = "TABLE 4: Top 14 events impacting on the health and economy", 
      col.names = c("Event type", "Total injuries and fatalities", "Total economic loss (billions)",
                    "Event frequency"), align = c("c", "c", "c", "c"))
TABLE 4: Top 14 events impacting on the health and economy
Event type Total injuries and fatalities Total economic loss (billions) Event frequency
EXCESSIVE HEAT 6703 $0.49 1588
FLASH FLOOD 1796 $7.96 44422
FLOOD 606 $135.39 20909
HAIL 637 $8.89 176419
HEAT 1453 $0.00 711
HIGH WIND 960 $2.82 17359
HURRICANE (TYPHOON) 1379 $38.04 179
LIGHTNING 3748 $0.31 10548
RIP CURRENT 899 $0.00 648
STORM SURGE/TIDE 28 $4.64 310
THUNDERSTORM WIND 3754 $4.32 176778
TORNADO 18342 $15.72 19207
TROPICAL STORM 318 $1.45 621
WILDFIRE 1412 $3.71 3966

We can show the information in the table above on a chart where the x axis represents the total health impact and the y axis represents the total economic impact. The size of the bubble still represents the frequency of the event.

Knowing the growing trend of event types can be a important information in prioritizing their prevention. Events with a fast growth trend may have higher priority over downward trend events.

topEventsds <- filter(eventsds, EVTYPE %in% uniqueSummary$EVTYPE)

bubble <- ggplot(uniqueSummary, aes(x = totalPeopleAffected, y = totalEconomicLoss, col = EVTYPE)) +
     geom_point(aes(size = frequency), alpha = 0.5) +
     scale_fill_manual(values = getPalette(colourCount)) +
     scale_size_area(max_size = 30) +
     xlab("Total health impact") +
     ylab("Total economy loss (billions)") +
     labs(col = "Event type", size = "Frequency")

healthTrend <- ggplot(topEventsds, aes(x = BGN_DATE, y = FATALITIES + INJURIES, 
                                       col = EVTYPE)) + 
     geom_smooth(method = lm, se = FALSE, na.rm = TRUE, alpha = 0.5) +
     ggtitle("Total health impact trend") +
     xlab("Year") +
     ylab("Total health impact") +
     labs(col = "Event type")

economicTrend <- ggplot(topEventsds, aes(x = BGN_DATE, y = (PROPDMG + CROPDMG)/1000000000, col = EVTYPE)) + 
     geom_smooth(method = lm, se = FALSE, na.rm = TRUE, alpha = 0.5) +
     ggtitle("Total economy loss trend") +
     xlab("Year") +
     ylab("Total economy loss (billions)") +
     labs(col = "Event type")
     
fig3 <- ggarrange(bubble, 
          ggarrange(healthTrend, economicTrend, ncol = 2, legend = "none"), 
          nrow = 2, common.legend = TRUE, legend = "right")

annotate_figure(fig3, top = "FIGURE 3: Top 14 events impacting on the health and economy")

Results

Across the United States, which types of events are most harmful with respect to population health?

According to Figure 1, tornadoes are the most harmful event to the population health in relation to either the number of injuries or the number of fatalities. Tornadoes injured 17,055 people and killed 1,287 others between 1999 and 2011.

Tornadoes are destructive events and always make headlines. Conversely, it’s important to highlight that a more silent event has stood out in our analysis. The excessive heat killed 17.72% more people than tornadoes in the same time frame (1,515 fatalities).

The most frequent events, however, are the thunderstorm winds with 176,778 occurrences and 3,478 injuries.

Across the United States, which types of events have the greatest economic consequences?

The Figure 2 clearly shows that floods have the greatest economic consequences to either property or crop loss. Floods caused a property loss of 135.06 billions of dollars and a crop loss of 4.29 billions. The database has 20,909 occurrences of floods in the period between 1999 and 2011.

Hurricanes reach second place in the ranking with $76.51 billion in property loss and $4.44 billions in crop loss (3.38% over floods). Hurricanes, however, are relatively rare as there are only 179 occurrences in the 13 years time frame.

The third in the ranking are tornadoes with a total economic loss of 15.72 billions in the period. Despite the fact that it’s in the third position the difference from the first place is enormous.

General analysis

It seems from the analysis of the Figure 3 that the prioritization of a single event to prevent would force the stakeholders to choose between public health and economy. There are no events with big impacts in both areas. Tornadoes and floods lead the health and economy ranks respectively, with a huge advantage over the second places.

In the case of economic restrictions to invest in the prevention of both events, tornadoes can be the best single choice. Firstly because they are the most harmful event to the public health and the third in the greatest economic consequence. Secondly, because this analysis doesn’t consider the economic impact to treat and recover injured people after the events. Tornadoes can be even more expensive if we include this economic impact in the equation.

As we can see in Figure 3, tornadoes have a rapidly growing trend in their impact on public health compared to other types of events. Regarding the economic impact, tornadoes are the only events that show a slight increase in their trend.