Synopsis:

Severe climate events can result in injuries, fatalities and property damage. Preventing such outcomes to the extent possible is therefore a key concern for every public entity.

This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database and visually presents, according to the data collected, estimates of fatalities, injuries, and property damage across a wide range of climate events.

The data can be found Here, the National Weather Service Instruction can be found Here, and the FAQ for the National Climatic Data Center Storm Events can be seen Here.

The data cover events from 1950 up to November 2011.

1. Proposed Questions:

Our intent here is to answer two questions: which events are the most harmful to the population, and which are responsible for the biggest economic losses.

2. Data Processing:

2.1 Reading Data:

Our first task is to download our database and read it into R.

if (!file.exists("StormData.csv.bz2"))
  {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                destfile="StormData.csv.bz2")
  }
if (!exists("stormData"))
  {
  stormData <- read.csv("StormData.csv.bz2")
  }
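The full file has many columns we will not use. As a minimal sketch (not part of the original analysis), base R can skip unwanted columns at read time by passing "NULL" in colClasses. The toy file and its column names below are stand-ins chosen for illustration only.

```r
# Toy stand-in for the storm file; the real file has many more columns.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(STATE = c("AL", "TX"),
                     EVTYPE = c("TORNADO", "HAIL"),
                     FATALITIES = c(1, 0),
                     REMARKS = c("a", "b")),
          tmp, row.names = FALSE)

# Columns we want to keep (assumption: chosen for this example only).
keep <- c("EVTYPE", "FATALITIES")

# Read just the header row to learn the column names.
header <- names(read.csv(tmp, nrows = 1))

# NA lets read.csv guess the type; "NULL" drops the column entirely.
classes <- ifelse(header %in% keep, NA, "NULL")
small <- read.csv(tmp, colClasses = classes)
```

The same idea should carry over to the compressed storm file, at the cost of reading the header once more.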

2.2 Initial Exploration:

The collected data, in its current state, can’t properly answer our questions. In this section, we will clean the data, standardize it and arrange it in a useful form. We will create a field for the total damage (the sum of property damage and crop damage, in dollars), retain the event types, fatalities and injuries, and drop the other fields, which aren’t meaningful for the questions we want to answer.

library(dplyr)
stormSmall <- stormData[, c("EVTYPE", "FATALITIES", "INJURIES")]
stormSmall$PROPDMG <- stormData$PROPDMG*
                              case_when(
                                  stormData$PROPDMGEXP == "1" ~ 10^1, 
                                  stormData$PROPDMGEXP == "2" ~ 10^2, 
                                  stormData$PROPDMGEXP == "3" ~ 10^3, 
                                  stormData$PROPDMGEXP == "4" ~ 10^4, 
                                  stormData$PROPDMGEXP == "5" ~ 10^5, 
                                  stormData$PROPDMGEXP == "6" ~ 10^6, 
                                  stormData$PROPDMGEXP == "7" ~ 10^7, 
                                  stormData$PROPDMGEXP == "8" ~ 10^8,
                                  stormData$PROPDMGEXP == "9" ~ 10^9, 
                                  stormData$PROPDMGEXP == "H" ~ 10^2,
                                  stormData$PROPDMGEXP == "h" ~ 10^2, 
                                  stormData$PROPDMGEXP == "K" ~ 10^3, 
                                  stormData$PROPDMGEXP == "k" ~ 10^3, 
                                  stormData$PROPDMGEXP == "M" ~ 10^6,
                                  stormData$PROPDMGEXP == "m" ~ 10^6,
                                  stormData$PROPDMGEXP == "B" ~ 10^9, 
                                  stormData$PROPDMGEXP == "b" ~ 10^9, 
                                  .default = 10^0 )

stormSmall$CROPDMG <- stormData$CROPDMG*
                              case_when(
                                  stormData$CROPDMGEXP == "1" ~ 10^1, 
                                  stormData$CROPDMGEXP == "2" ~ 10^2, 
                                  stormData$CROPDMGEXP == "3" ~ 10^3, 
                                  stormData$CROPDMGEXP == "4" ~ 10^4, 
                                  stormData$CROPDMGEXP == "5" ~ 10^5, 
                                  stormData$CROPDMGEXP == "6" ~ 10^6, 
                                  stormData$CROPDMGEXP == "7" ~ 10^7, 
                                  stormData$CROPDMGEXP == "8" ~ 10^8,
                                  stormData$CROPDMGEXP == "9" ~ 10^9, 
                                  stormData$CROPDMGEXP == "H" ~ 10^2,
                                  stormData$CROPDMGEXP == "h" ~ 10^2, 
                                  stormData$CROPDMGEXP == "K" ~ 10^3, 
                                  stormData$CROPDMGEXP == "k" ~ 10^3, 
                                  stormData$CROPDMGEXP == "M" ~ 10^6,
                                  stormData$CROPDMGEXP == "m" ~ 10^6,
                                  stormData$CROPDMGEXP == "B" ~ 10^9, 
                                  stormData$CROPDMGEXP == "b" ~ 10^9, 
                                  .default = 10^0 )

stormSmall$TOTDMG <- (stormSmall$PROPDMG + stormSmall$CROPDMG)

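To properly convert those values, we adopted the guidelines proposed in the aforementioned National Weather Service Instruction, page 12, where we read: “Alphabetical characters used to signify magnitude include ‘K’ for thousands, ‘M’ for millions, and ‘B’ for billions.” The quoted guideline covers only these three letters; the code above additionally treats “H” as hundreds and a digit n as 10^n, and leaves every other code with a multiplier of 1.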
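The same conversion can also be written once and reused for both damage columns. The following is only a sketch: expPower and expMultiplier are hypothetical names that do not appear in the analysis, and the treatment of digits and unrecognised codes simply mirrors the rules stated above.

```r
# Named lookup table: exponent letter -> power of ten.
expPower <- c(H = 2, K = 3, M = 6, B = 9)

# Hypothetical helper: turn exponent codes into dollar multipliers.
expMultiplier <- function(code) {
  code <- toupper(as.character(code))
  isDigit <- grepl("^[0-9]$", code)
  power <- ifelse(isDigit,
                  suppressWarnings(as.numeric(code)),  # digit n -> 10^n
                  unname(expPower[code]))              # letter -> table lookup
  power[is.na(power)] <- 0  # blank or unrecognised codes -> 10^0
  10^power
}

# Example with made-up values: 2.5K, 1M, 3 with a blank code.
damage <- c(2.5, 1, 3) * expMultiplier(c("K", "m", ""))
```

With such a helper, each of the two case_when blocks would reduce to a single multiplication, for example stormData$PROPDMG * expMultiplier(stormData$PROPDMGEXP).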

Now we need to evaluate the events that exist in our dataset. Our strategy here is to find those that are meaningful enough to help us answer the proposed questions.

uniqueEvents <- length(unique(stormData$EVTYPE))

This shows that we have a total of 985 unique entries in the EVTYPE column. Since this much noise would make the results difficult to read, our strategy will be as follows: 1) We will take the Top 20 events for every analysis that we make; 2) We will manually look through them and search for matching names; 3) If there are any, we will sum them together; 4) In the end, we will reorder the table and extract only the Top 10.
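The four steps above can be sketched on a toy table. The totals are made up, and the canonical mapping is a hypothetical stand-in for the manual inspection; the sections that follow instead merge rows by their position in the Top 20.

```r
# Toy totals standing in for an aggregated table (values are made up).
totals <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "HEAT",
             "TSTM WIND", "THUNDERSTORM WIND", "FLOOD"),
  FATALITIES = c(50, 20, 10, 8, 4, 6),
  stringsAsFactors = FALSE
)

# Hypothetical mapping from variant names to one canonical label.
canonical <- c("EXCESSIVE HEAT" = "HEAT", "TSTM WIND" = "THUNDERSTORM WIND")

# Steps 1-2: order the table and relabel the variants found by inspection.
top <- totals[order(-totals$FATALITIES), ]
hit <- top$EVTYPE %in% names(canonical)
top$EVTYPE[hit] <- unname(canonical[top$EVTYPE[hit]])

# Step 3: sum the rows that now share a label.
merged <- aggregate(FATALITIES ~ EVTYPE, top, sum)

# Step 4: reorder and keep the leading rows (Top 3 here, Top 10 in the report).
merged <- merged[order(-merged$FATALITIES), ][1:3, ]
```

Relabelling by name avoids depending on where each event lands in the ranking, which can shift if the data or the conversion rules change.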

First, we will do this for fatalities.

sumFatalities <- aggregate(FATALITIES ~ EVTYPE, stormSmall,  sum)
topFatalities <- sumFatalities[order(-sumFatalities$FATALITIES), ][1:20, ]

The matching names we managed to find were: TSTM WIND and THUNDERSTORM WIND, EXCESSIVE HEAT and HEAT, EXTREME COLD/WIND CHILL and EXTREME COLD.

topFatalities[2,1] <- "HEAT"
topFatalities[2,2] <- topFatalities[2,2]+topFatalities[4,2]
topFatalities[4,2] <- 0
topFatalities[6,1] <- "THUNDERSTORM WIND"
topFatalities[6,2] <- topFatalities[6,2]+topFatalities[15,2]
topFatalities[15,2] <- 0
topFatalities[14,1] <- "EXTREME COLD"
topFatalities[14,2] <- topFatalities[14,2]+topFatalities[17,2]
topFatalities[17,2] <- 0
topFatalities <- topFatalities[order(-topFatalities$FATALITIES), ][1:10, ]

Now we will repeat the process for injuries.

sumInjuries <- aggregate(INJURIES ~ EVTYPE, stormSmall,  sum)
topInjuries <- sumInjuries[order(-sumInjuries$INJURIES), ][1:20, ]

The matching names we managed to find were: TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS, EXCESSIVE HEAT and HEAT, WILD/FOREST FIRE and WILDFIRE.

topInjuries[2,1] <- "THUNDERSTORM WIND"
topInjuries[2,2] <- topInjuries[2,2]+topInjuries[9,2]+topInjuries[16,2]
topInjuries[9,2] <- 0
topInjuries[16,2] <- 0
topInjuries[4,1] <- "HEAT"
topInjuries[4,2] <- topInjuries[4,2]+topInjuries[6,2]
topInjuries[6,2] <- 0
topInjuries[15,1] <- "WILDFIRE"
topInjuries[15,2] <- topInjuries[15,2]+topInjuries[19,2]
topInjuries[19,2] <- 0
topInjuries <- topInjuries[order(-topInjuries$INJURIES), ][1:10, ]

Lastly, we will do the same for the economic cost of the damage caused.

sumEconomic <- aggregate(TOTDMG ~ EVTYPE, stormSmall, sum)
topEconomic <- sumEconomic[order(-sumEconomic$TOTDMG), ][1:20, ]

The matching names we managed to find were: HURRICANE/TYPHOON and HURRICANE, TSTM WIND and THUNDERSTORM WINDS, WILD/FOREST FIRE and WILDFIRE, STORM SURGE/TIDE and STORM SURGE.

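# Merge matching names by label rather than by row position, so the sums do
# not depend on where each event happens to fall in the Top 20.
mergeEvents <- function(df, label, variants) {
  rows <- which(df$EVTYPE %in% variants)
  if (length(rows) == 0) return(df)
  df$TOTDMG[rows[1]] <- sum(df$TOTDMG[rows])  # first match keeps the total
  df$TOTDMG[rows[-1]] <- 0                    # remaining matches are zeroed
  df$EVTYPE[rows[1]] <- label
  df
}
topEconomic <- mergeEvents(topEconomic, "HURRICANE", c("HURRICANE/TYPHOON", "HURRICANE"))
topEconomic <- mergeEvents(topEconomic, "STORM SURGE", c("STORM SURGE", "STORM SURGE/TIDE"))
topEconomic <- mergeEvents(topEconomic, "THUNDERSTORM WIND",
                           c("TSTM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WINDS"))
topEconomic <- mergeEvents(topEconomic, "WILDFIRE", c("WILDFIRE", "WILD/FOREST FIRE"))
topEconomic <- topEconomic[order(-topEconomic$TOTDMG), ][1:10, ]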

3. Results:

3.1 Which events are more harmful with respect to population health?

The first graph shows the number of lives lost due to climate events.

library(ggplot2)
plotFatalities <- ggplot(topFatalities) +
  geom_bar(aes(x = reorder(EVTYPE, FATALITIES), y = FATALITIES, fill = EVTYPE), 
           position = "stack", stat = "identity") +
    theme(legend.position = "none") +
    coord_flip() +
    scale_fill_brewer(palette = "RdGy") + 
    labs(x = "Event Type",y = "Number of Fatalities", 
         title="Top 10 most lethal events")
print(plotFatalities)

Now, let’s see the graph for the number of people injured due to climate events.

plotInjuries <- ggplot(topInjuries) +
  geom_bar(aes(x = reorder(EVTYPE, INJURIES), y = INJURIES, fill = EVTYPE), 
           position = "stack", stat = "identity") +
    theme(legend.position = "none") +
    coord_flip() +
    scale_fill_brewer(palette = "RdGy") + 
    labs(x = "Event Type",y = "Number of Injuries", 
         title="Top 10 events by number of injuries")
print(plotInjuries)

Since deaths and injuries are hard to compare numerically, we will draw a broader conclusion. The middle and the bottom of the two rankings differ to a degree, but we can safely say that tornadoes, heat and thunderstorm winds are the events that pose the biggest threat to human life according to the collected data.

3.2 Which events have the greatest economic consequences?

As for the economic consequences, let’s see the graph.

plotEconomic <- ggplot(topEconomic) +
  geom_bar(aes(x = reorder(EVTYPE, TOTDMG), y = TOTDMG/1000000000, fill = EVTYPE), 
           position = "stack", stat = "identity") +
    theme(legend.position = "none") +
    coord_flip() +
    scale_fill_brewer(palette = "RdGy") + 
    labs(x = "Event Type",y = "Total loss (in billions of dollars)", 
         title="Top 10 events with the greatest economic impact")
print(plotEconomic)

Since both property damage and crop damage were collected in dollars, we could add them together and give a more definite answer, according to the collected data, about which climate events are the most economically damaging.