Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

Our project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. We will evaluate which types of events are the most harmful to population health, as well as which ones have the largest economic consequences.

Data

NCDC receives Storm Data from the National Weather Service. The National Weather Service receives its information from a variety of sources, which include but are not limited to: county, state and federal emergency management officials, local law enforcement officials, skywarn spotters, NWS damage surveys, newspaper clipping services, the insurance industry and the general public.

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:

Storm Data

There is also some documentation of the database available, which explains how some of the variables are constructed and defined:

National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ

Data Processing

Loading Packages and Downloading the data

With a working directory set, we load the required packages and download the data from the URL given above, unless the file is already present.

library(dplyr)
library(ggplot2)
library(data.table)
library(R.utils)
library(tidyr)
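# Download and decompress the data only if the CSV is not already present
if (!file.exists("StormData.csv")) {
  url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  file_name <- "StormData.csv.bz2"
  download.file(url, destfile = file_name, mode = "wb") # binary mode keeps the archive intact on Windows
  bunzip2(file_name) # R.utils: writes StormData.csv and removes the .bz2 file
}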
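
As an aside, decompressing to disk is optional, since base R's read.csv can read a bzip2-compressed file directly. The following is a minimal sketch on a throwaway file with made-up rows (the temporary path and the two toy records are assumptions for illustration, not part of the NOAA data):

```r
# Write a tiny bzip2-compressed CSV to a temporary location (toy rows, not real data)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,0", "HAIL,1"), con)
close(con)

# read.csv detects the compression from the file contents and decompresses on the fly
demo <- read.csv(tmp)
print(demo)

unlink(tmp) # clean up the temporary file
```

For the full file, reading the decompressed CSV with fread, as done in this report, is likely to be faster.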

Loading the data into RStudio

We now load the data with the fread function, reading only the seven columns needed for the analysis and giving them slightly friendlier names for easier management.

col <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP",
         "CROPDMG", "CROPDMGEXP")
col2 <- c("Event.Type", "Fatalities", "Injuries", "PropertyD", "PropertyDexp",
          "CropD", "CropDexp")
Stormdata <- fread("StormData.csv", select = col, col.names = col2)
str(Stormdata)
## Classes 'data.table' and 'data.frame':   902297 obs. of  7 variables:
##  $ Event.Type  : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ Fatalities  : num  0 0 0 0 0 0 0 0 1 0 ...
##  $ Injuries    : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PropertyD   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PropertyDexp: chr  "K" "K" "K" "K" ...
##  $ CropD       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CropDexp    : chr  "" "" "" "" ...
##  - attr(*, ".internal.selfref")=<externalptr>
head(Stormdata)
##    Event.Type Fatalities Injuries PropertyD PropertyDexp CropD CropDexp
## 1:    TORNADO          0       15      25.0            K     0         
## 2:    TORNADO          0        0       2.5            K     0         
## 3:    TORNADO          0        2      25.0            K     0         
## 4:    TORNADO          0        2       2.5            K     0         
## 5:    TORNADO          0        2       2.5            K     0         
## 6:    TORNADO          0        6       2.5            K     0

Preparing the data for analysis - Transforming the cost columns

In this database the damage to both property and crops is coded in two columns. One holds the number and the other holds the exponential modifier, defined by the letters “H”, “K”, “M” and “B” (hundreds, thousands, millions and billions).

# Creating a temporary table that identifies the modifiers
temp <- data.table(Code = c("", "H", "K", "M", "B"),
                   Multiplier = c(0.000001, 0.0001, 0.001, 1, 1000))

# Blanking out exponent codes that are digits or punctuation so they match the "" row of temp.
Stormdata[PropertyDexp %like% "\\d|[[:punct:]]", PropertyDexp:= ""]
Stormdata[CropDexp %like% "\\d|[[:punct:]]", CropDexp:= ""]

# Converting everything to a uniform upper case
Stormdata[, PropertyDexp := toupper(PropertyDexp)]
Stormdata[, CropDexp := toupper(CropDexp)]

# Now we add the multiplier columns previously created matching by the Code. 
Stormdata <- merge(Stormdata, temp, by.x = "PropertyDexp", by.y = "Code")
setnames(Stormdata, "Multiplier", "MultiplierPROP")
Stormdata <- merge(Stormdata, temp, by.x = "CropDexp", by.y = "Code")
setnames(Stormdata, "Multiplier", "MultiplierCROP")

# Now we calculate the damage in millions of dollars (10^6).
# Note: round() is applied per record, so any record under 0.5 million contributes 0.
Stormdata[, c("PropertyD", "CropD") := .(round(PropertyD*MultiplierPROP), round(CropD*MultiplierCROP))]

# Finally we remove the extra columns. 
Stormdata[, c("CropDexp", "PropertyDexp", "MultiplierPROP", "MultiplierCROP") := NULL]
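
To make the conversion concrete, here is a minimal sketch on five made-up records (the toy values, the lookup table name and the Millions column are assumptions for illustration). It applies the same cleaning steps, uses match() as an alternative to merge() so that row order is preserved, and then compares rounding each record against rounding the total:

```r
library(data.table)

# Toy records (invented for illustration, not taken from the NOAA file)
toy <- data.table(PropertyD    = c(400, 300, 2, 1, 5),
                  PropertyDexp = c("K", "k", "m", "B", "+"))

# Same multipliers as in the report, expressed in millions of dollars
lookup <- data.table(Code = c("", "H", "K", "M", "B"),
                     Multiplier = c(0.000001, 0.0001, 0.001, 1, 1000))

# Same cleaning as in the report: blank out digits/punctuation, then upper-case
toy[PropertyDexp %like% "\\d|[[:punct:]]", PropertyDexp := ""]
toy[, PropertyDexp := toupper(PropertyDexp)]

# Look up each multiplier in place; unlike merge(), this does not reorder rows
toy[, Millions := PropertyD * lookup$Multiplier[match(PropertyDexp, lookup$Code)]]
print(toy)

# Rounding per record drops the two sub-million records; rounding the sum keeps them
per_record <- sum(round(toy$Millions))
on_total   <- round(sum(toy$Millions))
print(c(per_record = per_record, on_total = on_total))
```

In this toy case the two totals differ by one million, which suggests that summing first and rounding afterwards would be the safer order if small losses matter.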

Preparing the Data for Analysis - Impact to Population Health

First we check how many distinct event types the data contains, which turns out to be 985.

length(unique(Stormdata[,Event.Type]))
## [1] 985

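With so many labels, we need some transformations to make the event types more uniform. Based on the provided FAQ (see the Data section), we merged roughly the 10 most common event families that could be matched easily by pattern. The patterns are applied in order and are case-sensitive, so each label ends up in the first family it matches, and mixed-case labels such as “Heavy Rain/High Surf” stay unmerged.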

Stormdata[Event.Type %like% ".*TSTM.*|.*THUNDER.*", Event.Type := "THUNDERSTORM"] 
Stormdata[Event.Type %like% ".*TORN.*", Event.Type := "TORNADO"]
Stormdata[Event.Type %like% ".*SNOW.*", Event.Type := "SNOW"]
Stormdata[Event.Type %like% ".*WIND.*", Event.Type := "WIND"]
Stormdata[Event.Type %like% ".*HAIL.*", Event.Type := "HAIL"]
Stormdata[Event.Type %like% ".*FLOOD.*", Event.Type := "FLOOD"] 
Stormdata[Event.Type %like% ".*LIGHT.*", Event.Type := "LIGHTNING"] 
Stormdata[Event.Type %like% ".*RAIN.*", Event.Type := "RAIN"]
Stormdata[Event.Type %like% ".*WINTER.*|.*BLIZZARD.*|.*ICE.*|.*COLD.*", Event.Type := "COLD"] 

Stormdatatidy <- Stormdata[, lapply(.SD, sum), by = Event.Type]
Stormdatatidy
##                Event.Type Fatalities Injuries PropertyD CropD
##   1:         THUNDERSTORM        756     9545     10162  1105
##   2:                 HAIL         15     1371     15418  2554
##   3:                 RAIN        107      299      3183   796
##   4:                 SNOW        166     1160       913   134
##   5:                FLOOD       1523     8603    165628 11958
##  ---                                                         
## 452:    EXCESSIVE WETNESS          0        0         0   142
## 453:               Freeze          0        0         0    11
## 454:    Unseasonable Cold          0        0         0     5
## 455:          Early Frost          0        0         0    42
## 456: Heavy Rain/High Surf          0        0        14     2
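
The mixed-case rows at the end of this table were not merged because the pattern matching is case-sensitive. The sketch below illustrates the effect on four hypothetical labels (chosen to mimic spellings like those in the output, not read from the data) and shows two possible remedies, neither of which was used in this report:

```r
# Hypothetical labels mimicking the kinds of spellings seen in the output above
labels <- c("TSTM WIND", "Heavy Rain/High Surf", "Unseasonable Cold", "LIGHT FREEZING RAIN")

# Case-sensitive matching, which is what %like% does by default
rain_cs <- grepl("RAIN", labels)

# Case-insensitive matching also catches the mixed-case label
rain_ci <- grepl("RAIN", labels, ignore.case = TRUE)

# Upper-casing the labels first has the same effect
rain_up <- grepl("RAIN", toupper(labels))

# A broad pattern such as LIGHT also matches a label that is not lightning
light <- grepl("LIGHT", labels)

print(data.frame(labels, rain_cs, rain_ci, rain_up, light))
```

Because the LIGHT pattern runs before the RAIN pattern in the report, a label like the last one would be relabeled LIGHTNING, so a narrower pattern may be worth considering.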

Results and Graphs

So, what are the events that have the most impact on population health?

Taking both injuries and fatalities as markers of population health impact, we rank the event types by injuries (ties broken by fatalities) and plot the top 10.

setorder(Stormdatatidy, -Injuries, -Fatalities)
Stormfj2 <- Stormdatatidy[1:10, ]
Stormfj2 <- Stormfj2 %>% gather("Fatalities", "Injuries", key = "Outcome", value = "Incidents")

ggplot(Stormfj2, aes(x = reorder(Event.Type, +Incidents), y = Incidents, fill = Outcome)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  scale_y_continuous(labels = scales::comma) +
  labs(y = "Fatalities & Injuries Total", x = "Event Type", title = "Deaths & Injuries vs. Storm Event Type") +
  theme_classic()

Tornadoes have had the greatest impact on population health, followed by thunderstorms and floods.
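
The wide-to-long reshaping used for the stacked bars can also be done without leaving data.table. This is a minimal sketch using melt() on a toy table (the two event names and their counts are made up for illustration); it is an alternative to the gather() call above, not the code that produced the plot:

```r
library(data.table)

# Toy summary table with invented counts
toy <- data.table(Event.Type = c("A", "B"),
                  Fatalities = c(1, 2),
                  Injuries   = c(10, 20),
                  PropertyD  = c(0, 0))

# One row per event type and outcome; columns not listed are dropped
long <- melt(toy,
             id.vars       = "Event.Type",
             measure.vars  = c("Fatalities", "Injuries"),
             variable.name = "Outcome",
             value.name    = "Incidents")
print(long)
```

The resulting Outcome column is a factor, which ggplot2 can use directly for the fill aesthetic.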

And which events have the most impact on the economy?

For a clearer analysis we split this into separate graphs for property and crops. Of note, all these values are in millions (10^6) of dollars.

First, the crops:

setorder(Stormdatatidy, -CropD)
Stormcrop <- Stormdatatidy[1:10,]

ggplot(Stormcrop, aes(x = reorder(Event.Type, +CropD), y = CropD)) +
  geom_bar(stat = "identity", fill = "Royalblue1") +
  coord_flip() +
  scale_y_continuous(labels = scales::dollar) +
  labs(y = "Crop Damage (10^6 Dollars)", x = "Storm Event Type", title = "Crop Damage vs. Storm Event Type") +
  theme_classic()

For crop damage, droughts are the events that cause the most damage.

And for the property damage:

setorder(Stormdatatidy, -PropertyD)
Stormprop <- Stormdatatidy[1:10,]

ggplot(Stormprop, aes(x = reorder(Event.Type, +PropertyD), y = PropertyD)) +
  geom_bar(stat = "identity", fill = "Springgreen2") +
  coord_flip() +
  scale_y_continuous(labels = scales::dollar) +
  labs(y = "Property Damage (10^6 Dollars)", x = "Storm Event Type", title = "Property Damage vs. Storm Event Type") +
  theme_classic()

For property damage Floods cause the most impact.
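
Since property and crop damage are expressed in the same units, they could also be combined into a single economic ranking. The sketch below shows one way to do that on a toy table (the event names, the amounts and the TotalD column are assumptions for illustration, not results from the storm data):

```r
library(data.table)

# Toy damage table in millions of dollars (invented values)
toy <- data.table(Event.Type = c("X", "Y", "Z"),
                  PropertyD  = c(100, 10, 50),
                  CropD      = c(1, 200, 5))

# Add the two damage columns and sort from most to least costly
toy[, TotalD := PropertyD + CropD]
setorder(toy, -TotalD)

# Keep the top two event types by combined damage
top <- head(toy, 2)
print(top)
```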

Conclusions

Based on this database we can conclude that tornadoes are the events with the most impact on population health. They also rank high in property damage, but not in crop damage. Nonetheless, floods cause a major impact on both the economy and population health. Governments should take measures to predict these events and to prevent the major damage and costs they cause.