Exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database

Synopsis
Data Processing
Results

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The basic goal of this analysis is to explore the NOAA Storm Database and answer some basic questions about severe weather events. We will use the database to answer the questions:

Across the United States, which types of events are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?

This report aims to help a government or municipal manager who might be responsible for preparing for severe weather events and will need to prioritize resources for different types of events. However, this document doesn’t make any specific recommendations on how to prevent these events.

According to our analysis tornadoes and floods are the most harmful events to public health and the economy respectively.

Data Processing

Getting and Cleaning the Data

The data for this analysis comes in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The file can be downloaded here [47Mb].

if(!file.exists("./data")){dir.create("./data")}
if(!file.exists("./data/repdata-data-StormData.csv.bz2")) {
     dataUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
     download.file(dataUrl, "./data/repdata-data-StormData.csv.bz2", method = "curl")
     dateDownloaded <- date()
}

There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.

National Weather Service Storm Data Documentation.
National Climatic Data Center Storm Events FAQ

eventsds <- read.csv("./data/repdata-data-StormData.csv.bz2", stringsAsFactors = FALSE, na.strings = "")
str(eventsds)

## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  NA NA NA NA ...
##  $ BGN_LOCATI: chr  NA NA NA NA ...
##  $ END_DATE  : chr  NA NA NA NA ...
##  $ END_TIME  : chr  NA NA NA NA ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  NA NA NA NA ...
##  $ END_LOCATI: chr  NA NA NA NA ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  NA NA NA NA ...
##  $ WFO       : chr  NA NA NA NA ...
##  $ STATEOFFIC: chr  NA NA NA NA ...
##  $ ZONENAMES : chr  NA NA NA NA ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  NA NA NA NA ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

The database has 902297 observations with 37 variables each. However, to answer the proposed questions, we will need only the following variables:

BGN_DATE: Date when the event began.
EVTYPE: Event Type (e.g. Astronomical Low Tide, Avalanche, Blizzard, Coastal Flood, etc.)
FATALITIES: Number of fatalities caused by the event.
INJURIES: Number of injuries caused by the event.
PROPDMG: Most significant digits of the property damage estimates in dollar.
PROPDMGEXP: Alphabetic codes to signify magnitude of the PROPDMG variable (“K” for thousands, “M” for millions, and “B” for billions).
CROPDMG: Most significant digits of the crop damage estimates in dollar.
CROPDMGEXP: Alphabetic codes to signify magnitude of the CROPDMG variable (“K” for thousands, “M” for millions, and “B” for billions).

The other variables can be discarded in this analysis. So we will delete them to free memory space and make the analysis faster.

eventsds <- select(eventsds, BGN_DATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, 
                   CROPDMGEXP)

The database has one string variable called “BGN_DATE” that represents the date when the event began. This is in the format “%m/%d/%Y %H:%M:%S”. This variable needs to be converted to date type to make the analysis easier.

eventsds$BGN_DATE <- as.Date(eventsds$BGN_DATE, "%m/%d/%Y %H:%M:%S")

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database, there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

In order to define which years we will consider, we will list the number of events per year, the percentage of events the year represents comparing with the whole database, and the accumulated percentage of events along the years.

occurrences <- table( Ocurrences = format(eventsds$BGN_DATE, "%Y"))
occurrences <- cbind(occurrences, percentage = round(occurrences/nrow(eventsds)*100, digits = 2))
occurrences <- occurrences[order(row.names(occurrences), decreasing = TRUE),]
accum <- occurrences[1, "percentage"]
for (i in c(2:nrow(occurrences))){
     accum <- rbind(accum, occurrences[i, "percentage"] + accum[i-1])
}
occurrences <- cbind(occurrences, accumulated = round(accum, digits = 2))
kable(occurrences, caption = "TABLE 1: Events per year", col.names = c("Number of events", 
                                                                       "Percentage from total", 
                                                                       "Accumulated percantage"),
      align = c("c", "c", "c"))

TABLE 1: Events per year
	Number of events	Percentage from total	Accumulated percantage
2011	62174	6.89	6.89
2010	48161	5.34	12.23
2009	45817	5.08	17.31
2008	55663	6.17	23.48
2007	43289	4.80	28.28
2006	44034	4.88	33.16
2005	39184	4.34	37.50
2004	39363	4.36	41.86
2003	39752	4.41	46.27
2002	36293	4.02	50.29
2001	34962	3.87	54.16
2000	34471	3.82	57.98
1999	31289	3.47	61.45
1998	38128	4.23	65.68
1997	28680	3.18	68.86
1996	32270	3.58	72.44
1995	27970	3.10	75.54
1994	20631	2.29	77.83
1993	12607	1.40	79.23
1992	13534	1.50	80.73
1991	12522	1.39	82.12
1990	10946	1.21	83.33
1989	10410	1.15	84.48
1988	7257	0.80	85.28
1987	7367	0.82	86.10
1986	8726	0.97	87.07
1985	7979	0.88	87.95
1984	7335	0.81	88.76
1983	8322	0.92	89.68
1982	7132	0.79	90.47
1981	4517	0.50	90.97
1980	6146	0.68	91.65
1979	4279	0.47	92.12
1978	3657	0.41	92.53
1977	3728	0.41	92.94
1976	3768	0.42	93.36
1975	4975	0.55	93.91
1974	5386	0.60	94.51
1973	4463	0.49	95.00
1972	2168	0.24	95.24
1971	3471	0.38	95.62
1970	3215	0.36	95.98
1969	2926	0.32	96.30
1968	3312	0.37	96.67
1967	2688	0.30	96.97
1966	2388	0.26	97.23
1965	2855	0.32	97.55
1964	2348	0.26	97.81
1963	1968	0.22	98.03
1962	2389	0.26	98.29
1961	2246	0.25	98.54
1960	1945	0.22	98.76
1959	1813	0.20	98.96
1958	2213	0.25	99.21
1957	2184	0.24	99.45
1956	1703	0.19	99.64
1955	1413	0.16	99.80
1954	609	0.07	99.87
1953	492	0.05	99.92
1952	272	0.03	99.95
1951	269	0.03	99.98
1950	223	0.02	100.00

If we apply the Pareto principle (states that, for many events, roughly 80% of the effects come from 20% of the causes). We will reduce the number of years to 20%. It means that we will consider the events from 1999 to 2011. Conversely, in this case, the Pareto principle doesn’t seem to apply very well, as the cumulative percentage of events between 1999 and 2011 is 61.45%. Anyway this subset seems to be representative enough.

eventsds <- filter(eventsds, format(eventsds$BGN_DATE, "%Y") >= 1999)

According to the database documentation, the event type variable “EVTYPE” should have 48 variations, as follow:


Astronomical Low Tide	Extreme Cold/Wind Chill	Hurricane (Typhoon)	Storm Surge/Tide
Avalanche	Flash Flood	Ice Storm	Strong Wind
Blizzard	Flood	Lake-Effect Snow	Thunderstorm Wind
Coastal Flood	Frost/Freeze	Lakeshore Flood	Tornado
Cold/Wind Chill	Funnel Cloud	Lightning	Tropical Depression
Debris Flow	Freezing Fog	Marine Hail	Tropical Storm
Dense Fog	Hail	Marine High Wind	Tsunami
Dense Smoke	Heat	Marine Strong Wind	Volcanic Ash
Drought	Heavy Rain	Marine Thunderstorm Wind	Waterspout
Dust Devil	Heavy Snow	Rip Current	Wildfire
Dust Storm	High Surf	Seiche	Winter Storm
Excessive Heat	High Wind	Sleet	Winter Weather

A quick look into the database shows us that this variable has a lot of filler errors such as spaces at the beginning or at the end, double spaces, misspellings, typing errors, abbreviations and so on.

table(eventsds$EVTYPE[order(eventsds$EVTYPE)])

## 
##             HIGH SURF ADVISORY                    FLASH FLOOD 
##                              1                              1 
##                      TSTM WIND                     WATERSPOUT 
##                              3                              1 
##                           WIND                 ABNORMALLY DRY 
##                              1                              2 
##                 ABNORMALLY WET           ACCUMULATED SNOWFALL 
##                              1                              4 
##         ASTRONOMICAL HIGH TIDE          ASTRONOMICAL LOW TIDE 
##                            103                            174 
##                      AVALANCHE                  BEACH EROSION 
##                            351                              2 
##                      BLACK ICE                       BLIZZARD 
##                              7                           2316 
##                   BLOWING DUST                     BRUSH FIRE 
##                              2                              1 
##                  COASTAL FLOOD               COASTAL FLOODING 
##                            563                             67 
##       COASTAL FLOODING/EROSION                           COLD 
##                              2                             19 
##                   COLD WEATHER   COLD WIND CHILL TEMPERATURES 
##                              1                              4 
##                COLD/WIND CHILL                     COOL SPELL 
##                            539                              1 
##          CSTL FLOODING/EROSION                      DAM BREAK 
##                              2                              1 
##                      DENSE FOG                    DENSE SMOKE 
##                           1188                             10 
##                   DRIEST MONTH                        DROUGHT 
##                              1                           2253 
##                       DROWNING                            DRY 
##                              1                              8 
##                 DRY CONDITIONS                 DRY MICROBURST 
##                              6                            100 
##                      DRY SPELL                        DRYNESS 
##                              3                              1 
##                     DUST DEVEL                     DUST DEVIL 
##                              1                            115 
##                     DUST STORM                 EARLY SNOWFALL 
##                            394                              5 
##                 EXCESSIVE HEAT         EXCESSIVE HEAT/DROUGHT 
##                           1395                              1 
##             EXCESSIVE RAINFALL                 EXCESSIVE SNOW 
##                              2                             25 
##                EXCESSIVELY DRY                   EXTREME COLD 
##                              1                            321 
##        EXTREME COLD/WIND CHILL             EXTREME WIND CHILL 
##                           1002                              1 
##              EXTREME WINDCHILL EXTREME WINDCHILL TEMPERATURES 
##                             93                             19 
##                  EXTREMELY WET               FALLING SNOW/ICE 
##                              1                              2 
##                    FIRST FROST                     FIRST SNOW 
##                              1                              2 
##                    FLASH FLOOD              FLASH FLOOD/FLOOD 
##                          42550                              2 
##                 FLASH FLOODING                          FLOOD 
##                              1                          20881 
##              FLOOD/FLASH/FLOOD                            FOG 
##                              1                            372 
##                         FREEZE               FREEZING DRIZZLE 
##                             59                              5 
##                   FREEZING FOG                  FREEZING RAIN 
##                             45                            121 
##            FREEZING RAIN/SLEET                          FROST 
##                              2                             41 
##                   FROST/FREEZE                   FUNNEL CLOUD 
##                           1342                           4959 
##                  FUNNEL CLOUDS                          GLAZE 
##                              1                              4 
##                  GRADIENT WIND                GUSTY LAKE WIND 
##                              2                              1 
##        GUSTY THUNDERSTORM WIND       GUSTY THUNDERSTORM WINDS 
##                              3                              5 
##                     GUSTY WIND                GUSTY WIND/HAIL 
##                             13                              1 
##                    GUSTY WINDS                           HAIL 
##                             32                         175844 
##                    HARD FREEZE                 HAZARDOUS SURF 
##                              5                              1 
##                           HEAT                     HEAVY RAIN 
##                            711                          10146 
##             HEAVY RAIN EFFECTS                 HEAVY RAINFALL 
##                              1                              2 
##                     HEAVY SEAS                     HEAVY SNOW 
##                              1                          11662 
##                     HEAVY SURF           HEAVY SURF/HIGH SURF 
##                             56                            228 
##                      HIGH SEAS                      HIGH SURF 
##                              6                            628 
##           HIGH SURF ADVISORIES             HIGH SURF ADVISORY 
##                              1                              4 
##                     HIGH WATER                      HIGH WIND 
##                              2                          17359 
##                      HURRICANE              HURRICANE/TYPHOON 
##                             91                             88 
##                            ICE                    ICE ON ROAD 
##                              1                              1 
##                      ICE ROADS                      ICE STORM 
##                              1                           1477 
##                       ICE/SNOW                      ICY ROADS 
##                              2                             10 
##               LAKE EFFECT SNOW               LAKE-EFFECT SNOW 
##                              8                            635 
##                LAKESHORE FLOOD                      LANDSLIDE 
##                             23                            588 
##                      LANDSLUMP                      LANDSPOUT 
##                              1                              2 
##               LATE SEASON HAIL               LATE SEASON SNOW 
##                              1                              1 
##            LIGHT FREEZING RAIN                     LIGHT SNOW 
##                             23                            126 
##     LIGHT SNOW/FREEZING PRECIP                      LIGHTNING 
##                              1                          10548 
##             LOCALLY HEAVY RAIN                    MARINE HAIL 
##                              1                            442 
##               MARINE HIGH WIND             MARINE STRONG WIND 
##                            135                             48 
##       MARINE THUNDERSTORM WIND               MARINE TSTM WIND 
##                           5812                           6175 
##            MIXED PRECIPITATION              MODERATE SNOWFALL 
##                             34                             83 
##          MONTHLY PRECIPITATION               MONTHLY SNOWFALL 
##                             36                              1 
##            MONTHLY TEMPERATURE                      MUD SLIDE 
##                              4                              3 
##                       MUDSLIDE                NON SEVERE HAIL 
##                              5                              7 
##                  NON TSTM WIND         NON-SEVERE WIND DAMAGE 
##                              2                              1 
##                  NON-TSTM WIND                NORTHERN LIGHTS 
##                              1                              1 
##                          OTHER                     PATCHY ICE 
##                             12                              1 
##                   PROLONG COLD                 PROLONG WARMTH 
##                              9                              4 
##                           RAIN                   RECORD  COLD 
##                              7                              1 
##                    RECORD COLD                    RECORD COOL 
##                             19                              3 
##                 RECORD DRYNESS                    RECORD HEAT 
##                              2                             27 
##            RECORD LOW RAINFALL           RECORD PRECIPITATION 
##                              2                              1 
##                RECORD RAINFALL                    RECORD SNOW 
##                             11                              2 
##                RECORD SNOWFALL             RECORD TEMPERATURE 
##                              1                              3 
##                  RECORD WARMTH              RED FLAG CRITERIA 
##                             92                              2 
##               RED FLAG FIRE WX              REMNANTS OF FLOYD 
##                              2                              2 
##                    RIP CURRENT                   RIP CURRENTS 
##                            431                            217 
##                     ROGUE WAVE                     ROUGH SEAS 
##                              1                              3 
##                         SEICHE            SEVERE THUNDERSTORM 
##                             13                              1 
##           SEVERE THUNDERSTORMS                          SLEET 
##                              7                             58 
##                    SLEET STORM                     SMALL HAIL 
##                             12                             36 
##                          SMOKE                           SNOW 
##                             11                            281 
##                  SNOW ADVISORY                   SNOW AND ICE 
##                              1                              2 
##                 SNOW AND SLEET                   SNOW DROUGHT 
##                              4                              2 
##                   SNOW SHOWERS                   SNOW SQUALLS 
##                              5                              6 
##              SNOW/BLOWING SNOW             SNOW/FREEZING RAIN 
##                              6                              4 
##                     SNOW/SLEET              SNOWMELT FLOODING 
##                              3                              3 
##                    STORM SURGE               STORM SURGE/TIDE 
##                            152                            148 
##                STREET FLOODING                    STRONG WIND 
##                              2                           3514 
##                   STRONG WINDS                   THUNDERSTORM 
##                             88                             14 
##              THUNDERSTORM WIND        THUNDERSTORM WIND (G40) 
##                          81402                              1 
##                  THUNDERSTORMS                 TIDAL FLOODING 
##                              2                             10 
##                        TORNADO                 TORNADO DEBRIS 
##                          19206                              1 
##            TROPICAL DEPRESSION                 TROPICAL STORM 
##                             59                            621 
##                      TSTM WIND                TSTM WIND (G40) 
##                          95322                              3 
##                TSTM WIND (G45)                  TSTM WIND G45 
##                             11                              1 
##                 TSTM WIND/HAIL                        TSUNAMI 
##                            530                             20 
##              UNSEASONABLY COLD              UNSEASONABLY COOL 
##                             11                              7 
##        UNSEASONABLY COOL & WET               UNSEASONABLY DRY 
##                              2                             26 
##               UNSEASONABLY HOT              UNSEASONABLY WARM 
##                              6                             61 
##        UNSEASONABLY WARM & WET      UNSEASONABLY WARM AND DRY 
##                              1                              4 
##          UNSEASONABLY WARM/WET               UNSEASONABLY WET 
##                              2                             13 
##            UNSEASONAL LOW TEMP                UNSEASONAL RAIN 
##                              2                              1 
##                 UNUSUALLY COLD            UNUSUALLY LATE SNOW 
##                              8                              1 
##                 UNUSUALLY WARM           URBAN/SML STREAM FLD 
##                              4                           1867 
##                       VERY DRY                      VERY WARM 
##                              2                              1 
##                            VOG                   VOLCANIC ASH 
##                              1                             20 
##               VOLCANIC ASHFALL              VOLCANIC ERUPTION 
##                              3                              2 
##                     WALL CLOUD                   WARM WEATHER 
##                              1                              1 
##                     WATERSPOUT                      WHIRLWIND 
##                           2838                              1 
##               WILD/FOREST FIRE                       WILDFIRE 
##                           1231                           2732 
##                           WIND                  WIND ADVISORY 
##                            239                             12 
##                  WIND AND WAVE                     WIND CHILL 
##                              1                              5 
##                    WIND DAMAGE                     WIND GUSTS 
##                              2                              3 
##                          WINDS                     WINTER MIX 
##                              2                              1 
##                   WINTER STORM                 WINTER WEATHER 
##                          10169                           6951 
##             WINTER WEATHER MIX             WINTER WEATHER/MIX 
##                              6                           1104 
##                     WINTRY MIX                            WND 
##                             50                              1

We will try to clean the filler errors in the variable “EVTYPE” and group the different types to the ones listed in the documentation. Unfortunately, there are some event types in the database that are impossible to be matched with the types shown in the documentation. These types won’t be matched.

eventsds$EVTYPE <- toupper(eventsds$EVTYPE)
eventsds$EVTYPE <- trimws(eventsds$EVTYPE, "both")

eventsds$EVTYPE <- gsub(".*(ASTR).*", "ASTRONOMICAL LOW TIDE", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub(".*(AVAL|LAN|SLI).*", "AVALANCHE", eventsds$EVTYPE) 
eventsds$EVTYPE <- gsub(".*(BLIZ).*", "BLIZZARD", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub(".*(COAST|CST|BEA).*", "COASTAL FLOOD", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub("^(?=.*(COLD|COO|CHI|LOW|WND))((?!FUN|TOR|FRO|SNO|EXC|EXTR|HIG|REC|SEV|UNU|TID).)*$", "COLD/WIND CHILL", eventsds$EVTYPE, perl = TRUE)
eventsds$EVTYPE <- gsub("^(?=.*(FOG|VOG))((?!FRE|ICE).)*$", "DENSE FOG", eventsds$EVTYPE, perl = TRUE)
eventsds$EVTYPE <- gsub(".*(SMO).*", "DENSE SMOKE", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub(".*(DROU|DRY|DRI).*", "DROUGHT", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub(".*(DEV).*", "DUST DEVIL", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub("^(?=.*DUS)((?!DEV).)*$", "DUST STORM", eventsds$EVTYPE, perl = TRUE)
eventsds$EVTYPE <- gsub(".*(EXCESSIVE HEAT|EXTREME HEAT|HEATBURST|RECORD HEAT).*",
                       "EXCESSIVE HEAT", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub(".*(WAR).*", "EXCESSIVE HEAT", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub(".*(EXC|EXT|REC|SEV|UNU).*(COLD|CHI|COO).*", "EXTREME COLD/WIND CHILL", 
                        eventsds$EVTYPE)
eventsds$EVTYPE <- gsub(".*(FLAS|FLD|FLDG).*", "FLASH FLOOD", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub("^(?=.*FLOO)((?!COA|BEA|CST|FLA|TID).)*$", "FLOOD", eventsds$EVTYPE, perl = TRUE)
eventsds$EVTYPE <- gsub("^(?=.*(FRO|FRE))((?!FOG).)*$", "FROST/FREEZE", eventsds$EVTYPE, perl = TRUE)
eventsds$EVTYPE <- gsub(".*(FUN).*", "FUNNEL CLOUD", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub("^(?=.*FOG)((?!DEN).)*$", "FREEZING FOG", eventsds$EVTYPE, perl = TRUE)
eventsds$EVTYPE <- gsub("^(?=.*HAI)((?!MAR).)*$", "HAIL", eventsds$EVTYPE, perl = TRUE)
eventsds$EVTYPE <- gsub("^(?=.*HEAT)((?!EXC).)*$", "HEAT", eventsds$EVTYPE, perl = TRUE)
eventsds$EVTYPE <- gsub("^(?=.*(HEAV|RAI|PREC|SHOW))((?!SNO|SUR).)*$", "HEAVY RAIN", 
                        eventsds$EVTYPE, perl = TRUE)
eventsds$EVTYPE <- gsub("^(?=.*HEAV)((?!RAI|SUR|LAK|ICE).)*$", "HEAVY SNOW", eventsds$EVTYPE, 
                        perl = TRUE)
eventsds$EVTYPE <- gsub("^(?=.*SNO)((?!LAK).)*$", "HEAVY SNOW", eventsds$EVTYPE, perl = TRUE)
eventsds$EVTYPE <- gsub("^(?=.*SURF)((?!RIP).)*$", "HIGH SURF", eventsds$EVTYPE, perl = TRUE)
eventsds$EVTYPE <- gsub("^(?=.*HIG)((?!SUR|SWE|SEA|TEM|TID|WAT|WAV|REC|MAR).)*$", "HIGH WIND",
                       eventsds$EVTYPE, perl = TRUE)
eventsds$EVTYPE <- gsub(".*(HUR).*", "HURRICANE (TYPHOON)", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub(".*(ICE).*", "ICE STORM", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub(".*(LAK).*", "LAKE-EFFECT SNOW", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub(".*(LIGHTN).*", "LIGHTNING", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub("^(?=.*MARI)((?!HAI|HIG|THU|TST).)*$", "MARINE STRONG WIND", eventsds$EVTYPE, 
                       perl = TRUE)
eventsds$EVTYPE <- gsub("^(?=.*MARI)((?!HAI|HIG|STR).)*$", "MARINE THUNDERSTORM WIND", eventsds$EVTYPE,
                       perl = TRUE)
eventsds$EVTYPE <- gsub(".*(RIP).*", "RIP CURRENT", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub(".*(SLE).*", "SLEET", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub("^(?=.*(TID|SURG))((?!AST).)*$", "STORM SURGE/TIDE", eventsds$EVTYPE, 
                        perl = TRUE)
eventsds$EVTYPE <- gsub("^(?=.*WIND)((?!CHI|EXT|THU|HIG|MAR|TST|TUN).)*$", "STRONG WIND", 
                        eventsds$EVTYPE, perl = TRUE)
eventsds$EVTYPE <- gsub("^(?=.*(THU|TST))((?!MAR).)*$", "THUNDERSTORM WIND", eventsds$EVTYPE, 
                        perl = TRUE)
eventsds$EVTYPE <- gsub(".*(TORN).*", "TORNADO", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub("^(?=.*TROP)((?!DEP).)*$", "TROPICAL STORM", eventsds$EVTYPE, perl = TRUE)
eventsds$EVTYPE <- gsub(".*(VOL).*", "VOLCANIC ASH", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub(".*(WAT).*", "WATERSPOUT", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub(".*(FIRE).*", "WILDFIRE", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub(".*(WINTER S).*", "WINTER STORM", eventsds$EVTYPE)
eventsds$EVTYPE <- gsub("^(?=.*WINT)((?!STO).)*$", "WINTER WEATHER", eventsds$EVTYPE, perl = TRUE)

One last step to prepare the data is merging the values in PROPDMG with PROPDMGEXP and CROPDMG with CROPDMGEXP. By merging these variables we are able to obtain the actual value in dollar of the loss in properties and crops caused by each event.

eventsds[eventsds$PROPDMGEXP == "K" & !is.na(eventsds$PROPDMGEXP), "PROPDMGEXP"] <- 1000
eventsds[eventsds$PROPDMGEXP == "M" & !is.na(eventsds$PROPDMGEXP), "PROPDMGEXP"] <- 1000000
eventsds[eventsds$PROPDMGEXP == "B" & !is.na(eventsds$PROPDMGEXP), "PROPDMGEXP"] <- 1000000000

eventsds$PROPDMGEXP <- as.numeric(eventsds$PROPDMGEXP)
eventsds$PROPDMG <- eventsds$PROPDMG * eventsds$PROPDMGEXP

eventsds[eventsds$CROPDMGEXP == "K" & !is.na(eventsds$CROPDMGEXP), "CROPDMGEXP"] <- 1000
eventsds[eventsds$CROPDMGEXP == "M" & !is.na(eventsds$CROPDMGEXP), "CROPDMGEXP"] <- 1000000
eventsds[eventsds$CROPDMGEXP == "B" & !is.na(eventsds$CROPDMGEXP), "CROPDMGEXP"] <- 1000000000

eventsds$CROPDMGEXP <- as.numeric(eventsds$CROPDMGEXP)
eventsds$CROPDMG <- eventsds$CROPDMG * eventsds$CROPDMGEXP

Analysing the impact on public health

Firstly we will analyse the consequence of the events in public health. We will start listing the 10 events with the most impact on public health.

eventsds <- group_by(eventsds, EVTYPE)
healthSummary <- summarise(eventsds, totalInjuries = sum(INJURIES, na.rm = TRUE),
                           totalFatalities = sum(FATALITIES, na.rm = TRUE),
                           totalPeopleAffected = sum(INJURIES + FATALITIES, na.rm = TRUE),
                           totalEconomicLoss = currency(sum(PROPDMG + CROPDMG, na.rm = TRUE)/1000000000),
                           frequency = sum(!is.na(EVTYPE)))

healthSummary <- healthSummary[order(-healthSummary$totalPeopleAffected),]
kable(healthSummary[1:10, c(1:4, 6)], caption = "TABLE 2: Top 10 events impacting on population health", 
      col.names = c("Event type", "Total injuries", "Total fatalities", "Total people affected",
                    "Event frequency"), align = c("c", "c", "c", "c"))

TABLE 2: Top 10 events impacting on population health
Event type	Total injuries	Total fatalities	Total people affected	Event frequency
TORNADO	17055	1287	18342	19207
EXCESSIVE HEAT	5188	1515	6703	1588
THUNDERSTORM WIND	3478	276	3754	176778
LIGHTNING	3236	512	3748	10548
FLASH FLOOD	1132	664	1796	44422
HEAT	1222	231	1453	711
WILDFIRE	1326	86	1412	3966
HURRICANE (TYPHOON)	1295	84	1379	179
HIGH WIND	802	158	960	17359
RIP CURRENT	414	485	899	648

The table above can be represented as a bubble chart where the x axis represents the number total of fatalities caused by each type of event, while the y axis represents the number of injuries. The diameter of the bubble represents the frequency of each type of event occurred in the 13 years interval.

ggplot(healthSummary[1:10,], aes(x = totalFatalities, y = totalInjuries, col = EVTYPE)) +
     geom_point(aes(size = frequency), alpha = 0.5) +
     scale_fill_manual(values = getPalette(colourCount)) +
     scale_size_area(max_size = 30) +
     ggtitle("FIGURE 1: Top 10 events impacting on population health") +
     xlab("Total fatality") +
     ylab("Total injuries") +
     labs(col = "Event type", size = "Frequency") +
     geom_text(data = healthSummary[1,], aes(x = totalFatalities, y = totalInjuries, label = EVTYPE),
               vjust = 3) +
     geom_text(data = healthSummary[2,], aes(x = totalFatalities, y = totalInjuries, label = EVTYPE),
               vjust = 2, hjust = 0.9) +
     geom_text(data = healthSummary[3,], aes(x = totalFatalities, y = totalInjuries, label = EVTYPE),
               vjust = -4.7) +
     geom_text(data = healthSummary[4,], aes(x = totalFatalities, y = totalInjuries, label = EVTYPE),
               vjust = -1.5, hjust = -0.1) +
     geom_text(data = healthSummary[5,], aes(x = totalFatalities, y = totalInjuries, label = EVTYPE),
               vjust = -1.5, hjust = -0.2) +
     geom_text(data = healthSummary[6,], aes(x = totalFatalities, y = totalInjuries, label = EVTYPE),
               vjust = 0.2, hjust = -0.2) +
     geom_text(data = healthSummary[7,], aes(x = totalFatalities, y = totalInjuries, label = EVTYPE),
               vjust = 4.5, hjust = 0.3) +
     geom_text(data = healthSummary[8,], aes(x = totalFatalities, y = totalInjuries, label = EVTYPE),
               vjust = -1.2, hjust = 0.1) +
     geom_text(data = healthSummary[9,], aes(x = totalFatalities, y = totalInjuries, label = EVTYPE),
               vjust = 0.7, hjust = -0.23) +
     geom_text(data = healthSummary[10,], aes(x = totalFatalities, y = totalInjuries, label = EVTYPE),
               vjust = 1.5, hjust = -0.05)

Analysing the impact on the economy

Following the same approach to analyse the economic loss caused by each event type we will find the top 10 events impacting on the economy:

economicSummary <- summarise(eventsds,
                             totalPropertyLoss = currency(sum(PROPDMG, na.rm = TRUE)/1000000000),
                             totalCropLoss = currency(sum(CROPDMG, na.rm = TRUE)/1000000000),
                             totalPeopleAffected = sum(INJURIES + FATALITIES, na.rm = TRUE),
                             totalEconomicLoss = currency(sum(PROPDMG + CROPDMG, na.rm = TRUE)
                                                          /1000000000),
                             frequency = sum(!is.na(EVTYPE)))

economicSummary <- economicSummary[order(-economicSummary$totalEconomicLoss),]
kable(economicSummary[1:10, c(1:3, 5, 6)],
      caption = "TABLE 3: Top 10 events impacting on economy (billions)",
      col.names = c("Event type", "Total property loss", "Total crop loss", "Total economic loss",
                    "Event frequency"), align = c("c", "c", "c", "c"))

TABLE 3: Top 10 events impacting on economy (billions)
Event type	Total property loss	Total crop loss	Total economic loss	Event frequency
FLOOD	$135.06	$4.29	$135.39	20909
HURRICANE (TYPHOON)	$76.51	$4.44	$38.04	179
TORNADO	$21.45	$0.24	$15.72	19207
HAIL	$12.45	$1.98	$8.89	176419
FLASH FLOOD	$12.94	$1.21	$7.96	44422
STORM SURGE/TIDE	$47.82	$0.00	$4.64	310
THUNDERSTORM WIND	$5.85	$0.74	$4.32	176778
WILDFIRE	$7.25	$0.31	$3.71	3966
HIGH WIND	$5.03	$0.50	$2.82	17359
TROPICAL STORM	$7.22	$0.45	$1.45	621

A bubble chart can give us again a visual representation of the number shown in the table above.

ggplot(economicSummary[1:10,], aes(x = totalPropertyLoss, y = currency(totalCropLoss), col = EVTYPE)) +
     geom_point(aes(size = frequency), alpha = 0.5) +
     scale_fill_manual(values = getPalette(colourCount)) +
     scale_size_area(max_size = 30) +
     ggtitle("FIGURE 2: Top 10 events impacting on economy") +
     xlab("Total property loss (billions)") +
     ylab("Total crop loss (billions)") +
     labs(col = "Event type", size = "Frequency") +
     geom_text(data = economicSummary[1,], aes(x = totalPropertyLoss, y = totalCropLoss, label = EVTYPE),
               vjust = 3, hjust = 0.7) +
     geom_text(data = economicSummary[2,], aes(x = totalPropertyLoss, y = totalCropLoss, label = EVTYPE),
               vjust = 2, hjust = 0.5) +
     geom_text(data = economicSummary[3,], aes(x = totalPropertyLoss, y = totalCropLoss, label = EVTYPE),
               vjust = 0.3, hjust = -0.25) +
     geom_text(data = economicSummary[4,], aes(x = totalPropertyLoss, y = totalCropLoss, label = EVTYPE),
               vjust = -1.5, hjust = -1.3) +
     geom_text(data = economicSummary[5,], aes(x = totalPropertyLoss, y = totalCropLoss, label = EVTYPE),
               vjust = -0.4, hjust = -0.24) +
     geom_text(data = economicSummary[6,], aes(x = totalPropertyLoss, y = totalCropLoss, label = EVTYPE),
               vjust = 0.2, hjust = -0.04) +
     geom_text(data = economicSummary[7,], aes(x = totalPropertyLoss, y = totalCropLoss, label = EVTYPE),
               vjust = -1.5, hjust = -0.27) +
     geom_text(data = economicSummary[8,], aes(x = totalPropertyLoss, y = totalCropLoss, label = EVTYPE),
               vjust = 3, hjust = 0.3) +
     geom_text(data = economicSummary[9,], aes(x = totalPropertyLoss, y = totalCropLoss, label = EVTYPE),
               vjust = -1.5, hjust = 0.18) +
     geom_text(data = economicSummary[10,], aes(x = totalPropertyLoss, y = totalCropLoss, 
                                                label = EVTYPE), vjust = -0.5, hjust = -0.02)

General analysis

Now we have two lists of the 10 events that cause the most impact on the people’s health and on the national economy. Depending on the focus of the analysis, we can use the charts above or we can join them in a health/economic chart.

We have 6 event types that appear on both lists (Flash Flood, High Wind, Hurricane (Typhoon), Thunderstorm Wind, Tornado and Wildfire). 4 event types appearing only in the top 10 health harassers (Excessive Heat, Heat, Lightning and Rip Current) and other 4 appearing only in the top 10 economy harassers (Flood, Hail, Storm Surge/Tide and Tropical Storm). Merging the two lists we have a single list of 14 events.

uniqueSummary = merge(healthSummary[1:10,], economicSummary[1:10,], all = TRUE)

kable(uniqueSummary[, 1:4], caption = "TABLE 4: Top 14 events impacting on the health and economy", 
      col.names = c("Event type", "Total injuries and fatalities", "Total economic loss (billions)",
                    "Event frequency"), align = c("c", "c", "c", "c"))

TABLE 4: Top 14 events impacting on the health and economy
Event type	Total injuries and fatalities	Total economic loss (billions)	Event frequency
EXCESSIVE HEAT	6703	$0.49	1588
FLASH FLOOD	1796	$7.96	44422
FLOOD	606	$135.39	20909
HAIL	637	$8.89	176419
HEAT	1453	$0.00	711
HIGH WIND	960	$2.82	17359
HURRICANE (TYPHOON)	1379	$38.04	179
LIGHTNING	3748	$0.31	10548
RIP CURRENT	899	$0.00	648
STORM SURGE/TIDE	28	$4.64	310
THUNDERSTORM WIND	3754	$4.32	176778
TORNADO	18342	$15.72	19207
TROPICAL STORM	318	$1.45	621
WILDFIRE	1412	$3.71	3966

We can show the information in the table above on a chart where the x axis represents the total health impact and the y axis represents the total economic impact. The size of the bubble still represents the frequency of the event.

Knowing the growing trend of event types can be a important information in prioritizing their prevention. Events with a fast growth trend may have higher priority over downward trend events.

topEventsds <- filter(eventsds, EVTYPE %in% uniqueSummary$EVTYPE)

bubble <- ggplot(uniqueSummary, aes(x = totalPeopleAffected, y = totalEconomicLoss, col = EVTYPE)) +
     geom_point(aes(size = frequency), alpha = 0.5) +
     scale_fill_manual(values = getPalette(colourCount)) +
     scale_size_area(max_size = 30) +
     xlab("Total health impact") +
     ylab("Total economy loss (billions)") +
     labs(col = "Event type", size = "Frequency")

healthTrend <- ggplot(topEventsds, aes(x = BGN_DATE, y = FATALITIES + INJURIES, 
                                       col = EVTYPE)) + 
     geom_smooth(method = lm, se = FALSE, na.rm = TRUE, alpha = 0.5) +
     ggtitle("Total health impact trend") +
     xlab("Year") +
     ylab("Total health impact") +
     labs(col = "Event type")

economicTrend <- ggplot(topEventsds, aes(x = BGN_DATE, y = (PROPDMG + CROPDMG)/1000000000, col = EVTYPE)) + 
     geom_smooth(method = lm, se = FALSE, na.rm = TRUE, alpha = 0.5) +
     ggtitle("Total economy loss trend") +
     xlab("Year") +
     ylab("Total economy loss (billions)") +
     labs(col = "Event type")
     
fig3 <- ggarrange(bubble, 
          ggarrange(healthTrend, economicTrend, ncol = 2, legend = "none"), 
          nrow = 2, common.legend = TRUE, legend = "right")

annotate_figure(fig3, top = "FIGURE 3: Top 14 events impacting on the health and economy")

Results

Across the United States, which types of events are most harmful with respect to population health?

According to Figure 1, tornadoes are the most harmful event to the population health in relation to either the number of injuries or the number of fatalities. Tornadoes injured 17,055 people and killed 1,287 others between 1999 and 2011.

Tornadoes are destructive events and always make headlines. Conversely, it’s important to highlight that a more silent event has stood out in our analysis. The excessive heat killed 17.72% more people than tornadoes in the same time frame (1,515 fatalities).

The most frequent events, however, are the thunderstorm winds with 176,778 occurrences and 3,478 injuries.

Across the United States, which types of events have the greatest economic consequences?

The Figure 2 clearly shows that floods have the greatest economic consequences to either property or crop loss. Floods caused a property loss of 135.06 billions of dollars and a crop loss of 4.29 billions. The database has 20,909 occurrences of floods in the period between 1999 and 2011.

Hurricanes reach second place in the ranking with $76.51 billion in property loss and $4.44 billions in crop loss (3.38% over floods). Hurricanes, however, are relatively rare as there are only 179 occurrences in the 13 years time frame.

The third in the ranking are tornadoes with a total economic loss of 15.72 billions in the period. Despite the fact that it’s in the third position the difference from the first place is enormous.

General analysis

It seems from the analysis of the Figure 3 that the prioritization of a single event to prevent would force the stakeholders to choose between public health and economy. There are no events with big impacts in both areas. Tornadoes and floods lead the health and economy ranks respectively, with a huge advantage over the second places.

In the case of economic restrictions to invest in the prevention of both events, tornadoes can be the best single choice. Firstly because they are the most harmful event to the public health and the third in the greatest economic consequence. Secondly, because this analysis doesn’t consider the economic impact to treat and recover injured people after the events. Tornadoes can be even more expensive if we include this economic impact in the equation.

As we can see in Figure 3, tornadoes have a rapidly growing trend in their impact on public health compared to other types of events. Regarding the economic impact, tornadoes are the only events that show a slight increase in their trend.

Exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database

Vinicios Pereira

21/05/2018

Synopsis

Data Processing

Getting and Cleaning the Data

Analysing the impact on public health

Analysing the impact on the economy

General analysis

Results

Across the United States, which types of events are most harmful with respect to population health?

Across the United States, which types of events have the greatest economic consequences?

General analysis