library(ggplot2)
library(plyr)
library(dplyr)

Consequence analysis of severe meteorological events on the health and economy of the United States - (NOAA).

Synopsis

The objective of this report is to identify the severe meteorological events that are most harmful both in terms of individual health and the effects on the U.S. economy. To do so, we are using the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which covers the period from 1950 through 2011. This CSV archive has registered a large number of these events.

Loading Data and selecting the variables for the analysis

I downloaded the database in question from the URL provided by Coursera for this project. Next, I selected only those variables that would be used for this analysis:

- Evtype: Description of the type of severe meteorological event.

- Fatalities: Number of people who have lost their lives in registered meteorological events.

- Injuries: Number of people who have been injured in registered meteorological events.

- Propdmg: Coefficient of the damages caused to Properties in registered meteorological events. This quantity multiplied by the Propdmgexp variable yields the real dollar amount for analysis.

- Cropdmg: Coefficient of the damages caused to Crops in registered meteorological events. This quantity multiplied by the Cropdmgexp yields the real quantity for analysis.

With regard to the registries/observations, this leaves the subset that I am going to work with, meaning those registries that have a value of greater than zero in the previously described 4 numeric variables.

if (!file.exists("./coursera/repdata_data_StormData.csv.bz2"))
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
                ,"./coursera/repdata_data_StormData.csv.bz2")

Datos_Raw <- read.csv(bzfile("./coursera/repdata_data_StormData.csv.bz2"
                             , "repdata_data_StormData.csv")) %>% 
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
Datos <- Datos_Raw %>% 
  filter(FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0)

Data Processing

To improve the quality of the data and to yield more real results, I observed that many types of events can be grouped together as one. This is why I have standardized and grouped some of them together.

These are the transformations that I’ve made:

1. Evtype - I changed all the data to capital letters and removed the spaces that had no relevance.

2. Evtype - I unified the types of events that, from my point of view, were exactly the same.

3. The variables Propdmgexp and Cropdmgexp have codes that I have interpreted in the following way:

- H,h (10^2)

- K,k (10^3)

- M,m (10^6)

- B,b (10^9)

- + (1)

- -,?,[blank],[empty] (0)

- 0:8 (10)

I converted the codes into numeric values, to make it possible to calculate the exact quantities.

Datos$EVTYPE <- toupper(Datos$EVTYPE)
Datos$EVTYPE <- trimws(Datos$EVTYPE)
Datos$EVTYPE <- gsub("   ", " ", Datos$EVTYPE)
Datos$EVTYPE <- gsub("  ", " ", Datos$EVTYPE)
valuEv <- Datos %>% 
  group_by(EVTYPE) %>% 
  summarize_at(c("FATALITIES", "INJURIES", "PROPDMG", "CROPDMG"), sum)
dim(valuEv)
## [1] 440   5
Datos$EVTYPE[Datos$EVTYPE == "AVALANCE"] <- "AVALANCHE"
Datos$EVTYPE[Datos$EVTYPE == "ASTRONOMICAL HIGH TIDE"] <- "ASTRONOMICAL LOW TIDE"
Datos$EVTYPE[Datos$EVTYPE == "BLIZZARD/WINTER STORM"] <- "BLIZZARD"
Datos$EVTYPE[Datos$EVTYPE == "COASTAL FLOODING"] <- "COASTAL FLOOD"
Datos$EVTYPE[Datos$EVTYPE == "COASTALSTORM"] <- "COASTAL STORM"
Datos$EVTYPE[Datos$EVTYPE == "COLD" | Datos$EVTYPE == "COLD/WINDS"] <- "COLD/WIND CHILL"
Datos$EVTYPE[Datos$EVTYPE == "DUST STORM/HIGH WINDS"] <- "DUST STORM"
Datos$EVTYPE[grepl("^DROUGHT.", Datos$EVTYPE)] <- "DROUGHT"
Datos$EVTYPE[Datos$EVTYPE == "EXTREME HEAT"] <- "EXCESSIVE HEAT"
Datos$EVTYPE[Datos$EVTYPE == "EXTREME COLD"] <- "EXTREME COLD/WIND CHILL"
Datos$EVTYPE[Datos$EVTYPE == "EXTREME WIND CHILL"] <- "EXTREME COLD/WIND CHILL"
Datos$EVTYPE[Datos$EVTYPE == "EXTREME WINDCHILL"] <- "EXTREME COLD/WIND CHILL"
Datos$EVTYPE[grepl("^FLASH FLOOD.", Datos$EVTYPE)] <- "FLASH FLOOD"
Datos$EVTYPE[grepl("^FLOOD.", Datos$EVTYPE)] <- "FLOOD"
Datos$EVTYPE[grepl("^FOG", Datos$EVTYPE)] <- "DENSE FOG"
Datos$EVTYPE[Datos$EVTYPE == "FREEZING FOG"] <- "@"
Datos$EVTYPE[grepl("^FREEZ.|^FROST.|^FROST", Datos$EVTYPE)] <- "FROST/FREEZE"
Datos$EVTYPE[Datos$EVTYPE == "@"] <- "FREEZING FOG"
Datos$EVTYPE[grepl("^HAIL.", Datos$EVTYPE)] <- "HAIL"
Datos$EVTYPE[grepl("^HEAT.", Datos$EVTYPE)] <- "HEAT"
Datos$EVTYPE[grepl("^HEAVY RAIN.|^HVY RAIN", Datos$EVTYPE)] <- "HEAVY RAIN"
Datos$EVTYPE[grepl("^HEAVY SNOW.", Datos$EVTYPE)] <- "HEAVY SNOW"
Datos$EVTYPE[grepl("^HIGH SURF.", Datos$EVTYPE)] <- "HIGH SURF"
Datos$EVTYPE[grepl("^HIGH WIND.", Datos$EVTYPE)] <- "HIGH WIND"
Datos$EVTYPE[grepl("^HURRICANE.|^HURRICANE|^TYPHOON", 
                   Datos$EVTYPE)] <- "HURRICANE (TYPHOON)"
Datos$EVTYPE[grepl("^LAKE EFFECT", Datos$EVTYPE)] <- "LAKE-EFFECT SNOW"
Datos$EVTYPE[grepl("^LAKE FLOOD.|^LAKE FLOOD", Datos$EVTYPE)] <- "LAKESHORE FLOOD"
Datos$EVTYPE[grepl("^LIGHTNING.|^LIGNTNING|^LIGHTING", Datos$EVTYPE)] <- "LIGHTNING"
Datos$EVTYPE[grepl("^MUD SLIDE.|^MUDSLIDE.|^MUDSLIDE", Datos$EVTYPE)] <- "MUD SLIDE"
Datos$EVTYPE[grepl("^MARINE TSTM.", Datos$EVTYPE)] <- "MARINE THUNDERSTORM WIND"
Datos$EVTYPE[grepl("^MIXED PRECI.", Datos$EVTYPE)] <- "MIXED PRECIPITATION"
Datos$EVTYPE[grepl("^RIP CURRENT.", Datos$EVTYPE)] <- "RIP CURRENT"
Datos$EVTYPE[grepl("^SLEET.", Datos$EVTYPE)] <- "SLEET"
Datos$EVTYPE[grepl("^STORM SURGE.", Datos$EVTYPE)] <- "STORM SURGE/TIDE"
Datos$EVTYPE[grepl("^STRONG WIND.", Datos$EVTYPE)] <- "STRONG WIND"
Datos$EVTYPE[grepl("^SEVERE THUNDERSTORM.", Datos$EVTYPE)] <- "SEVERE THUNDERSTORM"
Datos$EVTYPE[grepl("^THUNDERSTORM WIN.|^THUNDERSTORMS WIN.|^THUNDERSTORMW",
                   Datos$EVTYPE)] <- "THUNDERSTORM WINDS"
Datos$EVTYPE[grepl("^THUNDERTORM WIN.|^TSTM WIN.|^TSTMW|^TUNDERSTORM WIN.",
                   Datos$EVTYPE)] <- "THUNDERSTORM WINDS"
Datos$EVTYPE[grepl("^THUNDERSTORMS|^THUNDERSTROM WIND|^THUNERSTORM WINDS",
                   Datos$EVTYPE)] <- "THUNDERSTORM WINDS"
Datos$EVTYPE[grepl("^THUNDEERSTORM WINDS|^THUDERSTORM WINDS|^THUNDERESTORM WINDS", 
                   Datos$EVTYPE)] <- "THUNDERSTORM WINDS"
Datos$EVTYPE[grepl("^TORNADO.|^TORNDAO", Datos$EVTYPE)] <- "TORNADO"
Datos$EVTYPE[grepl("^TROPICAL STORM.", Datos$EVTYPE)] <- "TROPICAL STORM"
Datos$EVTYPE[grepl("^WATERSPOUT.", Datos$EVTYPE)] <- "WATERSPOUT"
Datos$EVTYPE[grepl("^WILD.|^FOREST FIRE.|^BRUSH FIRE|^GRASS FIRE.", 
                   Datos$EVTYPE)] <- "WILDFIRE"
Datos$EVTYPE[grepl("^WINTER STORM.", Datos$EVTYPE)] <- "WINTER STORM"
Datos$EVTYPE[grepl("^WINTER WEATHER.", Datos$EVTYPE)] <- "WINTER WEATHER"

valuEvClean <- Datos %>% 
  group_by(EVTYPE) %>% 
  summarize_at(c("FATALITIES", "INJURIES", "PROPDMG", "CROPDMG"), sum)
dim(valuEvClean)
## [1] 218   5
Datos$PROPDMGEXP <- as.character(Datos$PROPDMGEXP)
Datos$CROPDMGEXP <- as.character(Datos$CROPDMGEXP)
Datos$PROPDMGEXP[Datos$PROPDMGEXP %in% c("H","h")] = 10^2
Datos$CROPDMGEXP[Datos$CROPDMGEXP %in% c("H","h")] = 10^2
Datos$PROPDMGEXP[Datos$PROPDMGEXP %in% c("K","k")] = 10^3
Datos$CROPDMGEXP[Datos$CROPDMGEXP %in% c("K","k")] = 10^3
Datos$PROPDMGEXP[Datos$PROPDMGEXP %in% c("M","m")] = 10^6
Datos$CROPDMGEXP[Datos$CROPDMGEXP %in% c("M","m")] = 10^6
Datos$PROPDMGEXP[Datos$PROPDMGEXP %in% c("B","b")] = 10^9
Datos$CROPDMGEXP[Datos$CROPDMGEXP %in% c("B","b")] = 10^9
Datos$PROPDMGEXP[Datos$PROPDMGEXP %in% c("+")] = 1
Datos$CROPDMGEXP[Datos$CROPDMGEXP %in% c("+")] = 1
Datos$PROPDMGEXP[Datos$PROPDMGEXP %in% c("0","1","2","3","4","5","6","7","8")] = 10
Datos$CROPDMGEXP[Datos$CROPDMGEXP %in% c("0","1","2","3","4","5","6","7","8")] = 10
Datos$PROPDMGEXP[Datos$PROPDMGEXP %in% c("-","?",""," ")] = 0
Datos$CROPDMGEXP[Datos$CROPDMGEXP %in% c("-","?",""," ")] = 0
Datos$PROPDMGEXP <- as.numeric(Datos$PROPDMGEXP)
Datos$CROPDMGEXP <- as.numeric(Datos$CROPDMGEXP)

I grouped the data together by type of event and added the values of the variables that were of interest:

- Total Fatalities by event.

- Total Injuries by event.

- Total dollar amount for property damages.

- Total dollar amount for damages to crops.

- Total overall in dollars for all damages (properties and crops)

evFatalities <- Datos %>% 
  group_by(EVTYPE) %>% 
  summarize(TFatalities = sum(FATALITIES)) %>% 
  filter(TFatalities > 0) %>% 
  arrange(desc(TFatalities))
evInjuries <- Datos %>% 
  group_by(EVTYPE) %>% 
  summarize(TInjuries = sum(INJURIES)) %>% 
  filter(TInjuries > 0) %>% 
  arrange(desc(TInjuries))
evPropdmg <- Datos %>% 
  group_by(EVTYPE) %>% 
  summarize(TPropdmg = sum(PROPDMG * PROPDMGEXP)) %>% 
  filter(TPropdmg > 0) %>% 
  arrange(desc(TPropdmg))
evCropdmg <- Datos %>% 
  group_by(EVTYPE) %>% 
  summarize(TCropdmg = sum(CROPDMG * CROPDMGEXP)) %>% 
  filter(TCropdmg > 0) %>% 
  arrange(desc(TCropdmg))
evTotaldmg <- Datos %>% 
  group_by(EVTYPE) %>% 
  summarize(TTotaldmg = sum((CROPDMG * CROPDMGEXP) + (PROPDMG * PROPDMGEXP))) %>% 
  filter(TTotaldmg > 0) %>% 
  arrange(desc(TTotaldmg))

I chose the values that were above average, the rest were not representative.

evFatalities <- evFatalities[evFatalities$TFatalities > mean(evFatalities$TFatalities), ] 
evInjuries <- evInjuries[evInjuries$TInjuries > mean(evInjuries$TInjuries), ] 
evPropdmg <- evPropdmg[evPropdmg$TPropdmg > mean(evPropdmg$TPropdmg), ] 
evCropdmg <- evCropdmg[evCropdmg$TCropdmg > mean(evCropdmg$TCropdmg), ] 
evTotaldmg <- evTotaldmg[evTotaldmg$TTotaldmg > mean(evTotaldmg$TTotaldmg), ] 

Results

Fatalities

We can see that the type of event with the most fatalities is the “Tornado,” by a wide margin, with nearly 5,700 fatalities. It is followed by “Excessive Heat,” with almost 2,000.

evFatalities
## # A tibble: 12 x 2
##    EVTYPE                  TFatalities
##    <chr>                         <dbl>
##  1 TORNADO                        5658
##  2 EXCESSIVE HEAT                 1999
##  3 HEAT                           1118
##  4 FLASH FLOOD                    1018
##  5 LIGHTNING                       817
##  6 THUNDERSTORM WINDS              710
##  7 RIP CURRENT                     577
##  8 FLOOD                           495
##  9 EXTREME COLD/WIND CHILL         304
## 10 HIGH WIND                       293
## 11 AVALANCHE                       225
## 12 WINTER STORM                    217
ggplot(evFatalities, aes(x=EVTYPE, y=TFatalities)) + 
  geom_bar(stat="identity") + 
  theme(axis.text.x = element_text(angle=30, vjust=1, hjust=1)) + 
  ggtitle("Events with Highest Total Fatalities") +
  labs(x="Event Type", y="Total Fatalities")

Injuries

With regard to injuries, the most devastating event is once again the “Tornado,” with nearly 91,400 injuries. Nothing else even comes close to this figure. The next most devastating event in terms of injuries is “Thunderstorm Winds,” at just below 10,000.

evInjuries
## # A tibble: 9 x 2
##   EVTYPE             TInjuries
##   <chr>                  <dbl>
## 1 TORNADO                91364
## 2 THUNDERSTORM WINDS      9496
## 3 FLOOD                   6806
## 4 EXCESSIVE HEAT          6680
## 5 LIGHTNING               5232
## 6 HEAT                    2494
## 7 ICE STORM               1975
## 8 FLASH FLOOD             1785
## 9 WILDFIRE                1608
ggplot(evInjuries, aes(x=EVTYPE, y=TInjuries)) + 
  geom_bar(stat="identity") + 
  theme(axis.text.x = element_text(angle=30, vjust=1, hjust=1)) + 
  ggtitle("Events with Highest Total Injuries") +
  labs(x="Event Type", y="Total Injuries")

Properties

In terms of giving value to material property losses, the most destructive event is the “Flood,” at nearly 145 billion dollars, followed (not too closely) by “Hurricane (Typhoon)” at just under 85.4 billion dollars and then, once again, by the omnipresent “Tornado,” at 58.5 billion.

evPropdmg
## # A tibble: 15 x 2
##    EVTYPE                  TPropdmg
##    <chr>                      <dbl>
##  1 FLOOD               144957523972
##  2 HURRICANE (TYPHOON)  85356410010
##  3 TORNADO              58541935137
##  4 STORM SURGE          43323536000
##  5 FLASH FLOOD          16732872111
##  6 HAIL                 15974472377
##  7 THUNDERSTORM WINDS    9762051272
##  8 WILDFIRE              8496628500
##  9 TROPICAL STORM        7714390550
## 10 WINTER STORM          6748997260
## 11 HIGH WIND             6003357060
## 12 RIVER FLOOD           5118945500
## 13 STORM SURGE/TIDE      4641188000
## 14 ICE STORM             3944928310
## 15 HEAVY RAIN            3230998140

Crops

In terms of material losses to crops, “Drought” perhaps unsurprisingly tops the list, with nearly 14 billion dollars in damages, followed by “Flood,” with 5.9 billion and “Hurricane (Typhoon)” with 5.5 billion. Both “River Flood” and “Ice Storm,” also come in with more than 5 billion dollars in damages.

evCropdmg
## # A tibble: 11 x 2
##    EVTYPE                     TCropdmg
##    <chr>                         <dbl>
##  1 DROUGHT                 13972571780
##  2 FLOOD                    5878707950
##  3 HURRICANE (TYPHOON)      5516117800
##  4 RIVER FLOOD              5029459000
##  5 ICE STORM                5022113500
##  6 HAIL                     3026094800
##  7 FROST/FREEZE             1616911000
##  8 FLASH FLOOD              1437163150
##  9 EXTREME COLD/WIND CHILL  1330023000
## 10 THUNDERSTORM WINDS       1224414700
## 11 HEAVY RAIN                795755800

Properties & Crops

In order to represent the economic impact, I opted to group the quantities for the property losses and crop damages onto one graph, indicating total values. The most devastating event by far is “Flood,” with more than 150 billion dollars in damages, followed by “Hurricane (Typhoon)” with 90.8 billion. The third is again “Tornado,” which comes in at less than 59 billion dollars.

evTotaldmg
## # A tibble: 16 x 2
##    EVTYPE                 TTotaldmg
##    <chr>                      <dbl>
##  1 FLOOD               150836231922
##  2 HURRICANE (TYPHOON)  90872527810
##  3 TORNADO              58959398047
##  4 STORM SURGE          43323541000
##  5 HAIL                 19000567177
##  6 FLASH FLOOD          18170035261
##  7 DROUGHT              15018677780
##  8 THUNDERSTORM WINDS   10986465972
##  9 RIVER FLOOD          10148404500
## 10 ICE STORM             8967041810
## 11 WILDFIRE              8899910130
## 12 TROPICAL STORM        8409286550
## 13 WINTER STORM          6781441260
## 14 HIGH WIND             6689658960
## 15 STORM SURGE/TIDE      4642038000
## 16 HEAVY RAIN            4026753940
ggplot(evTotaldmg, aes(x=EVTYPE, y=TTotaldmg)) + 
  geom_bar(stat="identity") + 
  theme(axis.text.x = element_text(angle=30, vjust=1, hjust=1)) + 
  ggtitle("Events with Highest Economic Impact") +
  labs(x="Event Type", y="Total $")

NOTE: Before the year 1996, the registry only focused on a very few types of events, with Tornado being one of the most frequently documented. In the data gathered between 1996 and 2011, the number of types of registered events increases substantially.