Storm Data is an official publication of the National Oceanic and Atmospheric Administration (NOAA) that documents the occurrence of storms and other significant weather phenomena having sufficient intensity to cause loss of life, injuries, significant property damage and/or disruption of commerce. It also documents rare, unusual weather phenomena that generate media attention, as well as other significant meteorological events such as maximum or minimum temperatures or precipitation that occur in connection with another event. The events in the database start in the year 1950 and end in November 2011.

The purpose of this document is to provide an analysis of the NOAA Storm Database and answer some basic questions about the impact of severe weather events.

Data Processing

Loading the data

First we download the storm data from the NOAA website. It is provided as a bzip2-compressed CSV file, which we need to decompress and load into a data frame.

  #Load the libraries we will need for processing 
  library(dplyr)
  library(ggplot2)
  library(knitr)

  #Download the file from the NOAA website
  if(!file.exists("noaa.data.csv.bz2")) {
    download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                  destfile="noaa.data.csv.bz2")
  }

  #Open the file and load the data into a data frame
  if (file.exists("noaa.data.csv.bz2")) 
      noaa.data <- read.csv(bzfile("noaa.data.csv.bz2"), header = TRUE)
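The compressed file does not have to be unpacked by hand: read.csv() can read directly from a bzfile() connection, as the code above does. The following minimal sketch repeats that round trip on a tiny, made-up data frame written to a temporary file. The names toy, tmp and toy.back are illustrative only and are not part of the analysis.

```r
# Hypothetical two-row data frame standing in for the storm data
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(1, 0))

# Write it as a bzip2-compressed CSV in a temporary location
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read it back through a bzfile() connection; decompression happens on the fly
toy.back <- read.csv(bzfile(tmp), header = TRUE, stringsAsFactors = FALSE)
print(toy.back)
```

The same pattern applies to the real download; only the file name changes.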

The full data set contains 902,297 observations of 37 variables.

str(noaa.data)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

Preprocessing

Filtering the data

Since we don’t need all 37 variables for our analysis, we will prepare a data frame with only the columns we will be using.

noaa.analysis  <- select(noaa.data, 
                         EventType = EVTYPE, 
                         Fatalities = FATALITIES, 
                         Injuries = INJURIES, 
                         PropertyDamage = PROPDMG, 
                         PropertyDamageMagnitude = PROPDMGEXP, 
                         CropDamage = CROPDMG, 
                         CropDamageMagnitude = CROPDMGEXP)

We end up with a data frame with the following columns:

EventType: the type of event (tornado, hurricane, flood, etc.).

Fatalities: number of human deaths caused by the event.

Injuries: number of human injuries caused by the event.

PropertyDamage: property damage caused by the event, to be scaled by PropertyDamageMagnitude to obtain US Dollars.

PropertyDamageMagnitude: magnitude of the property damage value (thousands, millions, etc.).

CropDamage: crop damage caused by the event, to be scaled by CropDamageMagnitude to obtain US Dollars.

CropDamageMagnitude: magnitude of the crop damage value (thousands, millions, etc.).

Preparing the data

The PropertyDamageMagnitude and CropDamageMagnitude columns indicate the magnitude of the PropertyDamage and CropDamage columns respectively, so we need to compute two more columns that express the property and crop damage as full dollar amounts.

For the purposes of this analysis I am treating the codes in the PropertyDamageMagnitude and CropDamageMagnitude columns as follows: H for hundreds, K for thousands, M for millions and B for billions. Any other code leaves the value unscaled. The corresponding columns are created with a function that makes the appropriate calculations.

PopulateDamage <- function(damage.value, magnitude) {
  # Normalize the magnitude code so that "k" and "K" are treated alike
  magnitude <- toupper(magnitude)

  # Map each code to a power of ten; any other code leaves the value unscaled
  factor.value <- ifelse(magnitude == 'B', 9,
                         ifelse(magnitude == 'M', 6,
                                ifelse(magnitude == 'K', 3,
                                       ifelse(magnitude == 'H', 2, 0))))

  # Return the damage expressed in US Dollars
  damage.value * (10^factor.value)
}

Now with this function in place, we can add the new columns with the damage amount.

noaa.complete <- mutate(noaa.analysis, 
                        PropertyDamageTotal = PopulateDamage(PropertyDamage, PropertyDamageMagnitude), 
                        CropDamageTotal = PopulateDamage(CropDamage,CropDamageMagnitude))

We now have the data in the format we need for our analysis.
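As a quick sanity check of the magnitude scaling described above, the same mapping can be sketched with a named lookup vector instead of nested ifelse() calls. This is an illustrative alternative under the same assumptions (H, K, M, B map to powers of ten, everything else is left unscaled); the names exponent.lookup and ScaleDamage are hypothetical and do not appear in the analysis.

```r
# Hypothetical lookup table: magnitude code -> power of ten
exponent.lookup <- c(H = 2, K = 3, M = 6, B = 9)

# Hypothetical helper mirroring the scaling rule used in this analysis
ScaleDamage <- function(damage.value, magnitude) {
  code <- toupper(as.character(magnitude))
  exponent <- unname(exponent.lookup[code])
  # Codes that are not H, K, M or B (including "") leave the value unscaled
  exponent[is.na(exponent)] <- 0
  damage.value * 10^exponent
}

# 25 with "K" is 25,000 dollars; 2.5 with "M" is 2,500,000 dollars
scaled <- ScaleDamage(c(25, 2.5, 1, 3), c("K", "M", "", "b"))
print(scaled)
```

A lookup vector keeps the code-to-exponent mapping in one place, which makes it easier to extend if other magnitude codes are to be interpreted later.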

Results

Across the United States, which types of events are most harmful with respect to population health?

To determine which events caused the most harm to population health, we select the events with the highest combined number of fatalities and injuries.
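The idea can be sketched on a small, made-up data frame using only base R: sum fatalities and injuries per event type, add them, and keep the highest totals. The data and the names toy, totals, ranked and top2 are illustrative assumptions, not values from the storm database.

```r
# Hypothetical observations, several per event type
toy <- data.frame(
  EventType  = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  Fatalities = c(2, 1, 3, 4, 0),
  Injuries   = c(10, 5, 20, 1, 2)
)

# Sum both measures within each event type
totals <- aggregate(cbind(Fatalities, Injuries) ~ EventType, data = toy, FUN = sum)

# Combined harm is fatalities plus injuries
totals$TotalHarm <- totals$Fatalities + totals$Injuries

# Sort from most to least harmful and keep the top two
ranked <- totals[order(-totals$TotalHarm), ]
top2 <- head(ranked, 2)
print(top2)
```

The analysis itself performs the same steps with dplyr on the full data set.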

#Group by event
by.event <- group_by (noaa.complete, EventType)

health.harm.top10 <- summarize(by.event, TotalFatalities = sum(Fatalities), TotalInjuries = sum(Injuries),
                               TotalHarm = sum(Fatalities) + sum(Injuries)) %>% 
                      arrange(desc(TotalHarm)) %>% 
                      # Rank explicitly by TotalHarm instead of relying on top_n()'s default column
                      top_n(10, TotalHarm)

These are the events that cause the most harm to population health:

kable(health.harm.top10, caption = "Most Harmful Events to Population Health")

Most Harmful Events to Population Health

|EventType         | TotalFatalities| TotalInjuries| TotalHarm|
|:-----------------|---------------:|-------------:|---------:|
|TORNADO           |            5633|         91346|     96979|
|EXCESSIVE HEAT    |            1903|          6525|      8428|
|TSTM WIND         |             504|          6957|      7461|
|FLOOD             |             470|          6789|      7259|
|LIGHTNING         |             816|          5230|      6046|
|HEAT              |             937|          2100|      3037|
|FLASH FLOOD       |             978|          1777|      2755|
|ICE STORM         |              89|          1975|      2064|
|THUNDERSTORM WIND |             133|          1488|      1621|
|WINTER STORM      |             206|          1321|      1527|

ggplot(health.harm.top10,  aes(x=EventType, y=TotalHarm, fill=EventType))+ 
  geom_bar(colour="black", stat="identity") + coord_flip() +
  labs(title="Most Harmful Events to Population Health", x="Event", y="Total Harm") +
  theme(title=element_text(size=18,face="bold"), 
        axis.text=element_text(size=12),
        axis.title=element_text(size=14,face="bold"))
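With a character or factor variable on the axis, ggplot2 draws the bars in factor-level order, which is alphabetical by default rather than ranked by harm. One possible refinement, not part of the original analysis, is to reorder the event types by their total before plotting. The sketch below shows the effect of reorder() from base R on three rows copied from the table above; harm and ordered.events are illustrative names.

```r
# Three rows copied from the table above
harm <- data.frame(
  EventType = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND"),
  TotalHarm = c(96979, 8428, 7461)
)

# reorder() sorts the factor levels by increasing TotalHarm
ordered.events <- reorder(factor(harm$EventType), harm$TotalHarm)
print(levels(ordered.events))
```

In the plot, the same idea would be written as aes(x = reorder(EventType, TotalHarm), ...), which, combined with coord_flip(), places the most harmful event at the top.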

Across the United States, which types of events have the greatest economic consequences?

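most.damage <- group_by(noaa.complete, EventType)  %>% 
                # Total property plus crop damage per event, converted to billions of US Dollars
                summarize( TotalEconomicDamage = sum(PropertyDamageTotal + CropDamageTotal) / 1e9) %>% 
                arrange(desc(TotalEconomicDamage))  %>% 
                # Rank explicitly by TotalEconomicDamage instead of relying on top_n()'s default column
                top_n(10, TotalEconomicDamage)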

These are the events with the greatest economic consequences, expressed in billions of US Dollars:

ggplot(most.damage,  aes(x=EventType, y=TotalEconomicDamage, fill=EventType))+ 
  geom_bar(colour="black", stat="identity") + coord_flip() +
  labs( x="Event", y="Total damage") +
  ggtitle(expression(atop("Events with the Greatest Economic Consequences", 
                          atop("In billions of US Dollars", "")))) +
  theme(title=element_text(size=18,face="bold"), 
        axis.text=element_text(size=12),
        axis.title=element_text(size=14,face="bold"))  

Conclusion

Tornadoes have caused the most harm to population health, while floods have caused the greatest economic damage.