Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
Our project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. We will evaluate which types of events are the most harmful to population health and which have the largest economic consequences.
The National Climatic Data Center (NCDC) receives Storm Data from the National Weather Service (NWS). The NWS receives its information from a variety of sources, which include but are not limited to: county, state and federal emergency management officials, local law enforcement officials, SKYWARN spotters, NWS damage surveys, newspaper clipping services, the insurance industry and the general public.
The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The file can be downloaded from the course web site; the URL is the one used in the download code below.
There is also some documentation of the database available, which explains how some of the variables are constructed or defined: the National Weather Service Storm Data Documentation and the National Climatic Data Center Storm Events FAQ.
Once a working directory is set, we load the required packages and download the data from the URL below if it is not already present.
library(dplyr)
library(ggplot2)
library(data.table)
library(R.utils)
library(tidyr)
if (!file.exists("StormData.csv")) {
  url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  file_name <- "StormData.csv.bz2"
  download.file(url, destfile = file_name)
  bunzip2(file_name) # decompress to StormData.csv
}
Now we load the data with the fread function, reading only the seven columns we need and giving them slightly friendlier names for easier handling.
col <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP",
"CROPDMG", "CROPDMGEXP")
col2 <- c("Event.Type", "Fatalities", "Injuries", "PropertyD", "PropertyDexp",
"CropD", "CropDexp")
Stormdata <- fread("StormData.csv", select = col, col.names = col2)
str(Stormdata)
## Classes 'data.table' and 'data.frame': 902297 obs. of 7 variables:
## $ Event.Type : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ Fatalities : num 0 0 0 0 0 0 0 0 1 0 ...
## $ Injuries : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PropertyD : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PropertyDexp: chr "K" "K" "K" "K" ...
## $ CropD : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CropDexp : chr "" "" "" "" ...
## - attr(*, ".internal.selfref")=<externalptr>
head(Stormdata)
## Event.Type Fatalities Injuries PropertyD PropertyDexp CropD CropDexp
## 1: TORNADO 0 15 25.0 K 0
## 2: TORNADO 0 0 2.5 K 0
## 3: TORNADO 0 2 25.0 K 0
## 4: TORNADO 0 2 2.5 K 0
## 5: TORNADO 0 2 2.5 K 0
## 6: TORNADO 0 6 2.5 K 0
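As a minimal sketch of how `select` and `col.names` work together in `fread`, the example below writes a tiny made-up CSV to a temporary file and reads back only two of its columns under new names. The file contents and the `toy` object are illustrative assumptions, not part of the NOAA data.

```r
library(data.table)

# Hypothetical miniature file with more columns than we need
tmp <- tempfile(fileext = ".csv")
fwrite(data.table(STATE      = c("AL", "AL"),
                  EVTYPE     = c("TORNADO", "HAIL"),
                  FATALITIES = c(0, 1),
                  REMARKS    = c("a", "b")), tmp)

# Keep two columns and rename them while reading
toy <- fread(tmp,
             select    = c("EVTYPE", "FATALITIES"),
             col.names = c("Event.Type", "Fatalities"))
print(toy)

unlink(tmp) # remove the temporary file
```

Reading only the needed columns keeps memory use down, which matters for a file with roughly 900,000 rows.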
In this database the damage to both property and crops is coded in two columns. One of them holds the number and the other holds the exponent modifier, defined by the letters “H”, “K”, “M”, “B” (hundreds, thousands, millions and billions of dollars).
# Creating a temporary table that identifies the modifiers
temp <- data.table(Code = c("", "H", "K", "M", "B"),
Multiplier = c(0.000001, 0.0001, 0.001, 1, 1000))
# Blanking out exponent codes that contain digits or punctuation.
Stormdata[PropertyDexp %like% "\\d|[[:punct:]]", PropertyDexp:= ""]
Stormdata[CropDexp %like% "\\d|[[:punct:]]", CropDexp:= ""]
# Converting everything to a uniform upper case
Stormdata[, PropertyDexp := toupper(PropertyDexp)]
Stormdata[, CropDexp := toupper(CropDexp)]
# Now we add the multiplier columns previously created matching by the Code.
Stormdata <- merge(Stormdata, temp, by.x = "PropertyDexp", by.y = "Code")
setnames(Stormdata, "Multiplier", "MultiplierPROP")
Stormdata <- merge(Stormdata, temp, by.x = "CropDexp", by.y = "Code")
setnames(Stormdata, "Multiplier", "MultiplierCROP")
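# Now we calculate the damage in millions of dollars (10^6).
# round() is applied to each record, so a record worth less than half a million dollars becomes 0.
Stormdata[, c("PropertyD", "CropD") := .(round(PropertyD * MultiplierPROP),
                                         round(CropD * MultiplierCROP))]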
# Finally we remove the extra columns.
Stormdata[, c("CropDexp", "PropertyDexp", "MultiplierPROP", "MultiplierCROP") := NULL]
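The conversion can be checked by hand on a few made-up records. The sketch below is an assumption-laden alternative to the merge used above: it looks the multiplier up in a named vector (`mult`, a hypothetical name), treats any unrecognised code as plain dollars, and leaves the result unrounded. The `toy` table is invented for illustration.

```r
library(data.table)

# Hypothetical toy records (not taken from the NOAA file)
toy <- data.table(
  PropertyD    = c(25, 2.5, 3, 1.2, 10),
  PropertyDexp = c("K", "k", "M", "B", "+")
)

# Multipliers that express each amount in millions of dollars
mult <- c(H = 1e-4, K = 1e-3, M = 1, B = 1e3)

# Normalise the code to upper case; unrecognised codes count as plain dollars
toy[, code := toupper(PropertyDexp)]
toy[, multiplier := ifelse(code %in% names(mult), mult[code], 1e-6)]
toy[, PropertyD_millions := PropertyD * multiplier]
print(toy)
```

Rounding each record to the nearest million, as in the code above, would turn the first two toy records into 0. Summing first and rounding afterwards is one way to keep small amounts in the totals.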
First we count how many distinct event types the data contain: 985.
length(unique(Stormdata[,Event.Type]))
## [1] 985
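With this in mind, we need some transformations to make the event types more uniform. Based on the provided FAQ (see the documentation mentioned above), we selected roughly the 10 most common events that could be easily merged. The patterns are applied in the order shown and are case-sensitive, so an entry such as "THUNDERSTORM WIND" is assigned to THUNDERSTORM before the WIND rule runs, and mixed-case labels such as "Unseasonable Cold" are left unchanged.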
Stormdata[Event.Type %like% ".*TSTM.*|.*THUNDER.*", Event.Type := "THUNDERSTORM"]
Stormdata[Event.Type %like% ".*TORN.*", Event.Type := "TORNADO"]
Stormdata[Event.Type %like% ".*SNOW.*", Event.Type := "SNOW"]
Stormdata[Event.Type %like% ".*WIND.*", Event.Type := "WIND"]
Stormdata[Event.Type %like% ".*HAIL.*", Event.Type := "HAIL"]
Stormdata[Event.Type %like% ".*FLOOD.*", Event.Type := "FLOOD"]
Stormdata[Event.Type %like% ".*LIGHT.*", Event.Type := "LIGHTNING"]
Stormdata[Event.Type %like% ".*RAIN.*", Event.Type := "RAIN"]
Stormdata[Event.Type %like% ".*WINTER.*|.*BLIZZARD.*|.*ICE.*|.*COLD.*", Event.Type := "COLD"]
Stormdatatidy <- Stormdata[, lapply(.SD, sum), by = Event.Type]
Stormdatatidy
## Event.Type Fatalities Injuries PropertyD CropD
## 1: THUNDERSTORM 756 9545 10162 1105
## 2: HAIL 15 1371 15418 2554
## 3: RAIN 107 299 3183 796
## 4: SNOW 166 1160 913 134
## 5: FLOOD 1523 8603 165628 11958
## ---
## 452: EXCESSIVE WETNESS 0 0 0 142
## 453: Freeze 0 0 0 11
## 454: Unseasonable Cold 0 0 0 5
## 455: Early Frost 0 0 0 42
## 456: Heavy Rain/High Surf 0 0 14 2
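The tail of the table shows mixed-case labels that the case-sensitive patterns did not catch. As a sketch only, the example below shows one way such labels could be folded in, using `grepl()` with `ignore.case = TRUE`. The `relabel()` helper and the `toy` table are hypothetical, and this is not the approach used to produce the results in this report.

```r
library(data.table)

# Hypothetical toy records with inconsistent capitalisation
toy <- data.table(
  Event.Type = c("TSTM WIND", "Thunderstorm Winds", "Unseasonable Cold",
                 "EXTREME COLD", "HAIL"),
  Injuries   = c(1, 2, 0, 4, 3)
)

# Hypothetical helper: relabel every row whose event type matches the pattern,
# ignoring case. The data.table is modified by reference.
relabel <- function(dt, pattern, label) {
  dt[grepl(pattern, Event.Type, ignore.case = TRUE), Event.Type := label]
  invisible(dt)
}

relabel(toy, "TSTM|THUNDER", "THUNDERSTORM")
relabel(toy, "HAIL", "HAIL")
relabel(toy, "WINTER|BLIZZARD|ICE|COLD", "COLD")

# Aggregate after relabelling
toy_summary <- toy[, .(Injuries = sum(Injuries)), by = Event.Type]
print(toy_summary)
```

Applying this to the full data would change the totals reported here, so it should be read as a possible refinement rather than a description of the analysis.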
Taking both injuries and fatalities as markers of population health impact, we keep the 10 event types with the most injuries (ties broken by fatalities) and obtain the following.
setorder(Stormdatatidy, -Injuries, -Fatalities)
Stormfj2 <- Stormdatatidy[1:10, ]
Stormfj2 <- Stormfj2 %>%
  gather("Fatalities", "Injuries", key = "Outcome", value = "Incidents")
ggplot(Stormfj2, aes(x = reorder(Event.Type, +Incidents), y = Incidents, fill = Outcome)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  scale_y_continuous(labels = scales::comma) +
  labs(y = "Fatalities & Injuries Total", x = "Event Type", title = "Deaths & Injuries vs. Storm Event Type") +
  theme_classic()
This shows that tornadoes have had the greatest impact on population health, followed by thunderstorms and floods.
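The stacked bars rely on reshaping the table from one column per outcome to one row per event type and outcome. A minimal sketch of that reshape on made-up numbers is shown below, using `melt()` from data.table instead of `gather()`. The `toy` and `long` names and all the values are illustrative assumptions.

```r
library(data.table)

# Hypothetical wide table: one row per event type
toy <- data.table(
  Event.Type = c("TORNADO", "FLOOD", "HAIL"),
  Fatalities = c(5, 3, 0),
  Injuries   = c(50, 20, 4)
)

# Long format: one row per (event type, outcome) pair
long <- melt(toy,
             id.vars       = "Event.Type",
             measure.vars  = c("Fatalities", "Injuries"),
             variable.name = "Outcome",
             value.name    = "Incidents")
print(long)
```

With the data in this shape, mapping `Outcome` to `fill` lets ggplot2 stack fatalities and injuries within each event type's bar.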
For the economic consequences, we plot property damage and crop damage in separate graphs. Of note, all these values are in millions of dollars (10^6).
For crops:
setorder(Stormdatatidy, -CropD)
Stormcrop <- Stormdatatidy[1:10,]
ggplot(Stormcrop, aes(x = reorder(Event.Type, +CropD), y = CropD)) +
  geom_bar(stat = "identity", fill = "Royalblue1") +
  coord_flip() +
  scale_y_continuous(labels = scales::dollar) +
  labs(y = "Crop Damage (10^6 Dollars)", x = "Storm Event Type", title = "Crop Damage vs. Storm Event Type") +
  theme_classic()
For crop damage, droughts cause the most damage.
And for property damage:
setorder(Stormdatatidy, -PropertyD)
Stormprop <- Stormdatatidy[1:10,]
ggplot(Stormprop, aes(x = reorder(Event.Type, +PropertyD), y = PropertyD)) +
  geom_bar(stat = "identity", fill = "Springgreen2") +
  coord_flip() +
  scale_y_continuous(labels = scales::dollar) +
  labs(y = "Property Damage (10^6 Dollars)", x = "Storm Event Type", title = "Property Damage vs. Storm Event Type") +
  theme_classic()
For property damage, floods have the largest impact.
Based on this database, we can conclude that tornadoes have the greatest impact on population health and also rank high in property damage, but not in crop damage. Nonetheless, floods have a major impact on both the economy and population health. Governments should take measures to predict these events and to prevent or reduce the damage and cost they cause.