Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
Default settings
knitr::opts_chunk$set(echo = TRUE)
The following code is used to download the data:
setwd("C:/Users/junio/Desktop/COURSERA/DATA SCIENCE/COURSE 5 - Reproducible Research/WEEK 4/Project")
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
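# Download and decompress the data only if the CSV file is not already present
if (!file.exists("StormData.csv")) {
  # mode = "wb" keeps the compressed archive intact on Windows
  download.file(url, "StormData.csv.bz2", mode = "wb")
  library(R.utils)
  bunzip2("StormData.csv.bz2", "StormData.csv")
}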
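The explicit decompression step is optional, because base R's `read.csv()` opens bzip2-compressed files transparently. The following minimal sketch demonstrates this on a small temporary file; the file and its values are made up for illustration and are not part of the storm data.

```r
# Write a tiny, made-up table to a bzip2-compressed temporary file.
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() detects the compression and decompresses while reading.
toy_back <- read.csv(tmp)
print(toy_back)

# Remove the temporary file.
unlink(tmp)
```

Reading the archive directly saves disk space, at the cost of decompressing it again on every run.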
To evaluate the health impact, the total fatalities and total injuries for each event type (EVTYPE) are calculated. The code for this calculation is shown below. We use the “data.table” package to handle the data quickly and efficiently.
library(data.table)
# Read the CSV file with fread(), which returns a data.table
dt <- fread("StormData.csv", header = TRUE)
## Warning in fread("StormData.csv", header = T): Read less rows (902297) than
## were allocated (967216). Run again with verbose=TRUE and please report.
# Create "TOT.FATALITIES" and "TOT.INJURIES": the total fatalities and total injuries for each event type.
dt1 <- dt[, .(TOT.FATALITIES = sum(FATALITIES), TOT.INJURIES = sum(INJURIES)), by = EVTYPE]
# Sort by each total in decreasing order and convert EVTYPE to a factor with its levels in that order, so that the bar plots show the events in descending order.
dt11 <- dt1[order(dt1$TOT.FATALITIES, decreasing = TRUE),]
dt12 <- dt1[order(dt1$TOT.INJURIES, decreasing = TRUE),]
dt11$EVTYPE <- factor(dt11$EVTYPE, levels = dt11$EVTYPE)
dt12$EVTYPE <- factor(dt12$EVTYPE, levels = dt12$EVTYPE)
dt11
## EVTYPE TOT.FATALITIES TOT.INJURIES
## 1: TORNADO 5633 91346
## 2: EXCESSIVE HEAT 1903 6525
## 3: FLASH FLOOD 978 1777
## 4: HEAT 937 2100
## 5: LIGHTNING 816 5230
## ---
## 981: SLEET STORM 0 0
## 982: DENSE SMOKE 0 0
## 983: LAKESHORE FLOOD 0 0
## 984: ASTRONOMICAL LOW TIDE 0 0
## 985: VOLCANIC ASHFALL 0 0
dt12
## EVTYPE TOT.FATALITIES TOT.INJURIES
## 1: TORNADO 5633 91346
## 2: TSTM WIND 504 6957
## 3: FLOOD 470 6789
## 4: EXCESSIVE HEAT 1903 6525
## 5: LIGHTNING 816 5230
## ---
## 981: SLEET STORM 0 0
## 982: DENSE SMOKE 0 0
## 983: LAKESHORE FLOOD 0 0
## 984: ASTRONOMICAL LOW TIDE 0 0
## 985: VOLCANIC ASHFALL 0 0
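The aggregate, sort, and re-level steps above can be tried on a toy table. The sketch below uses base R instead of data.table so that it runs without extra packages; the records and counts are made up, and only the column names follow the storm data.

```r
# Toy records with made-up counts (not taken from the storm database).
toy <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO", "FLOOD", "HEAT", "TORNADO"),
  FATALITIES = c(1, 5, 2, 4, 3)
)

# Sum fatalities per event type.
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

# Sort in decreasing order, then set the factor levels in that same order
# so that a bar plot keeps the sorted order instead of the alphabetical one.
totals <- totals[order(totals$FATALITIES, decreasing = TRUE), ]
totals$EVTYPE <- factor(totals$EVTYPE, levels = totals$EVTYPE)

print(totals)
print(levels(totals$EVTYPE))
```

Without the `factor()` call, a plotting function that treats EVTYPE as a discrete variable would typically arrange the bars alphabetically, regardless of the row order.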
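The data provide two types of economic impact: property damage (PROPDMG) and crop damage (CROPDMG). The actual damage in USD is obtained by multiplying each value by the multiplier encoded in PROPDMGEXP and CROPDMGEXP, respectively. In this analysis the codes are interpreted as follows: “H” or “h” = 100; “K” or “k” = 1,000; “M” or “m” = 1,000,000; “B” or “b” = 1,000,000,000; the digits 0 to 8 = 10; “+” = 1; and “-”, “?” or an empty value = 0.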
The total damage caused by each event type is calculated with the following code.
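# Create the scale function "Escala", which converts a PROPDMGEXP or CROPDMGEXP code to a numeric multiplier
Escala <- function(x){
  # "multiplier" is used instead of "exp" to avoid masking base::exp()
  if(x %in% 0:8){
    multiplier <- 10
  }else if(x %in% c("H","h")){
    multiplier <- 100
  }else if(x %in% c("K","k")){
    multiplier <- 1000
  }else if(x %in% c("M","m")){
    multiplier <- 1000000
  }else if(x %in% c("B","b")){
    multiplier <- 1000000000
  }else if(x %in% c("-","?","")){
    multiplier <- 0
  }else if(x == "+"){
    multiplier <- 1
  }else{
    # Fail loudly on a code that none of the branches above handles
    stop("Unknown exponent code: ", x)
  }
  multiplier
}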
# Create the variables "EXP.PROPDMGEXP" and "EXP.CROPDMGEXP", which hold the numeric multipliers for the PROPDMGEXP and CROPDMGEXP codes
dt2 <- dt[, .(EXP.PROPDMGEXP = sapply(PROPDMGEXP, Escala), EXP.CROPDMGEXP = sapply(CROPDMGEXP, Escala), PROPDMG, CROPDMG, EVTYPE)]
# Create the variables "TOT.PROPDMG" and "TOT.CROPDMG", which are the property damage and crop damage in USD for each record.
dt2 <- dt2[, .(TOT.PROPDMG = EXP.PROPDMGEXP*PROPDMG, TOT.CROPDMG = EXP.CROPDMGEXP*CROPDMG, EXP.PROPDMGEXP, EXP.CROPDMGEXP, PROPDMG, CROPDMG, EVTYPE)]
# Create the variables "TOT.PROPDMGxEVTYPE" and "TOT.CROPDMGxEVTYPE" which are the sum of the total Economic Impact by property damage and crop damage respectively by type of event.
dt2 <- dt2[, .(TOT.PROPDMGxEVTYPE = sum(TOT.PROPDMG), TOT.CROPDMGxEVTYPE = sum(TOT.CROPDMG)), by = EVTYPE]
# Create the variable "TOT.DMG" which is total Economic Impact
dt2 <- dt2[, .(TOT.DMG = TOT.PROPDMGxEVTYPE + TOT.CROPDMGxEVTYPE, EVTYPE)]
# Sort by total damage in decreasing order and convert EVTYPE to a factor with its levels in that order.
dt21 <- dt2[order(dt2$TOT.DMG, decreasing = TRUE),]
dt21$EVTYPE <- factor(dt21$EVTYPE, levels = dt21$EVTYPE)
dt21
## TOT.DMG EVTYPE
## 1: 150319678250 FLOOD
## 2: 71913712800 HURRICANE/TYPHOON
## 3: 57352117607 TORNADO
## 4: 43323541000 STORM SURGE
## 5: 18758224527 HAIL
## ---
## 981: 0 DROWNING
## 982: 0 GUSTY THUNDERSTORM WIND
## 983: 0 HIGH SURF ADVISORIES
## 984: 0 SLEET STORM
## 985: 0 VOLCANIC ASHFALL
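The row-by-row `sapply()` calls above can be slow on roughly 900,000 records. As a hedged alternative, the same mapping can be written as a vectorized lookup in a named vector. The names `exp_codes` and `to_multiplier` are hypothetical and not part of the original analysis, and the damage values in the example are made up.

```r
# Hypothetical lookup table mirroring the branches of Escala().
exp_codes <- c(
  "H" = 1e2, "h" = 1e2,
  "K" = 1e3, "k" = 1e3,
  "M" = 1e6, "m" = 1e6,
  "B" = 1e9, "b" = 1e9,
  "+" = 1, "-" = 0, "?" = 0
)
# The digits 0 to 8 all map to a multiplier of 10.
exp_codes[as.character(0:8)] <- 10

# Vectorized conversion. Codes without an entry, including the empty
# string, map to 0. For the empty string this matches Escala(); for any
# other unknown code it is an assumption of this sketch.
to_multiplier <- function(code) {
  out <- unname(exp_codes[as.character(code)])
  out[is.na(out)] <- 0
  out
}

# Made-up example: 2.5 with code "M" is 2,500,000 USD, 10 with "K" is 10,000 USD.
dmg  <- c(2.5, 10, 7, 3)
code <- c("M", "K", "", "5")
print(dmg * to_multiplier(code))
```

Because the lookup handles a whole column at once, it could replace `sapply(PROPDMGEXP, Escala)` with `to_multiplier(PROPDMGEXP)` inside the data.table expression.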
The top 10 events with the highest total fatalities and injuries are shown graphically.
library(ggplot2)
theme_set(theme_bw())
g <- ggplot(dt11[1:10,], aes(EVTYPE, TOT.FATALITIES))
g <- g + geom_bar(stat = "identity", width = 0.5, fill = "red") + labs(x = "Event Type", y = "Fatalities") + theme(axis.text.x = element_text(angle = 90, vjust = 0.8, size = 8), axis.text.y = element_text(size = 8)) + scale_y_continuous(breaks = seq(0, 8000, 1000))
g
f <- ggplot(dt12[1:10,], aes(EVTYPE, TOT.INJURIES))
f <- f + geom_bar(stat = "identity", width = 0.5, fill = "yellow") + labs(x = "Event Type", y = "Injuries") + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, size = 8), axis.text.y = element_text(size = 8)) + scale_y_continuous(breaks = seq(0, 100000, 20000))
f
The top 10 events with the highest total economic damages (property and crop combined) are shown graphically.
h <- ggplot(dt21[1:10,], aes(EVTYPE, TOT.DMG))
h <- h + geom_bar(stat = "identity", width = 0.5, fill = "green") + labs(x = "Event Type", y = "Economic Impact") + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, size = 8), axis.text.y = element_text(size = 8))
h
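The totals on the y axis of this plot reach about 1.5e11 USD, which may be printed in scientific notation and is hard to read. A minimal sketch of rescaling the values to billions of USD, using the five largest totals from the table printed earlier; the names `tot_dmg` and `tot_dmg_bn` are hypothetical.

```r
# Five largest totals as printed in the damage table (USD).
tot_dmg <- c(
  "FLOOD"             = 150319678250,
  "HURRICANE/TYPHOON" = 71913712800,
  "TORNADO"           = 57352117607,
  "STORM SURGE"       = 43323541000,
  "HAIL"              = 18758224527
)

# Express the totals in billions of USD, rounded to one decimal place.
tot_dmg_bn <- round(tot_dmg / 1e9, 1)
print(tot_dmg_bn)
```

In the plot itself, the same rescaling could be applied by mapping `TOT.DMG / 1e9` to the y aesthetic and labelling the axis "Economic Impact (billion USD)".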