This study identifies the weather events that are most harmful to population health and that cause the greatest economic losses in the United States. The U.S. National Oceanic and Atmospheric Administration (NOAA) storm database is the data source. The analysis estimates deaths, injuries, property damage, and crop damage for each type of event (e.g., tornado, flood, storm, hail). The results show that since 1996 (the first year with complete data on all event types), the five events most harmful to population health are tornadoes, excessive heat, floods, lightning, and thunderstorm winds, and the five largest economic losses are caused by floods, hurricanes/typhoons, storm surges, tornadoes, and hail.
setwd("~/Desktop/DST/Reproducible Research - Course Project 2")
library(readr)
library(data.table)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:data.table':
##
## between, first, last
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:data.table':
##
## hour, isoweek, mday, minute, month, quarter, second, wday, week,
## yday, year
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(ggplot2)
ClimateData <- read.csv("repdata-data-StormData.csv", header = TRUE, sep=",")
The file contains 902,297 observations of 37 variables. The analysis uses them to answer two questions:
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
According to NOAA, data recording started in January 1950, when only one event type (tornado) was recorded. More event types were added gradually, and only since January 1996 have all 48 event types been recorded. Because the objective is to compare the effects of different weather events, the dataset is limited to observations between 1996 and 2011, which gives a comparable basis for the analysis.
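A minimal sketch of how this coverage could be checked: count the distinct event types recorded in each year. The data frame `toy` and its values are made up for illustration and are not taken from the NOAA file.

```r
# Toy data: illustrative values only, not NOAA records
toy <- data.frame(
  BGN_DATE = as.Date(c("1950-04-18", "1950-06-08", "1996-01-06",
                       "1996-02-11", "1996-07-04")),
  EVTYPE   = c("TORNADO", "TORNADO", "WINTER STORM", "FLOOD", "LIGHTNING")
)

# Extract the year and count the distinct event types per year
toy$YEAR <- as.integer(format(toy$BGN_DATE, "%Y"))
types_per_year <- tapply(toy$EVTYPE, toy$YEAR, function(x) length(unique(x)))
print(types_per_year)
```

Applied to the full dataset, a sharp rise in the count around 1996 would be consistent with the change in recording practice described above.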
Given these premises, only the following variables are relevant to the analysis:
Event outcome variable:
EVTYPE: type of weather event (e.g., tornado, wind, snow, flood)
Date variable:
BGN_DATE: begin date of the event
Health variables:
FATALITIES: number of deaths resulting from the event
INJURIES: number of injuries resulting from the event
Economic variables:
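PROPDMG: property damage amount in USD, before applying the PROPDMGEXP multiplier
PROPDMGEXP: the unit multiplier for property damage (K = thousands, M = millions, B = billions)
CROPDMG: crop damage amount in USD, before applying the CROPDMGEXP multiplier
CROPDMGEXP: the unit multiplier for crop damage (K = thousands, M = millions, B = billions)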
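As a worked example of how an amount and its multiplier combine, the sketch below converts a recorded value of 25 with unit code K into dollars. The names `amount`, `unit`, `multiplier`, and `damage_usd` are illustrative and do not appear in the analysis code.

```r
# Illustrative values: a recorded amount and its unit code
amount <- 25
unit   <- "K"

# Multipliers for the three unit codes
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Damage in USD = recorded amount x multiplier
damage_usd <- amount * multiplier[[unit]]
print(damage_usd)  # 25000
```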
ClimateData$BGN_DATE <- as.Date(ClimateData$BGN_DATE, "%m/%d/%Y")
NewData<-ClimateData %>% select("EVTYPE", "BGN_DATE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP") %>% filter(BGN_DATE >= as.Date("1996-01-01") & BGN_DATE <= as.Date("2012-01-01"))
Subsetting by date yields a data frame with 653,530 observations and 8 variables.
dim(NewData)
## [1] 653530 8
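The date parsing and filtering step can be illustrated on a few strings. This is a sketch that assumes the raw BGN_DATE values look like "4/18/1950 0:00:00"; the vector `raw` is made up for illustration.

```r
# Assumed raw format: month/day/year followed by a time component
raw <- c("4/18/1950 0:00:00", "1/6/1996 0:00:00", "11/28/2011 0:00:00")

# as.Date() parses the leading month/day/year and ignores the trailing time
d <- as.Date(raw, "%m/%d/%Y")

# Keep only dates inside the analysis window
keep <- d >= as.Date("1996-01-01") & d <= as.Date("2012-01-01")
print(d[keep])
```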
NewData <- as.data.table(NewData)
NewData <- NewData[(EVTYPE != "?" & (INJURIES > 0 | FATALITIES > 0 | PROPDMG > 0 | CROPDMG > 0)), c("EVTYPE", "BGN_DATE","FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
Cleaning and tidying the PROPDMGEXP and CROPDMGEXP columns
The next step is to convert the exponent codes for crop damage (“CROPDMGEXP”) and property damage (“PROPDMGEXP”) into numeric multipliers, as follows:
The names of the two columns, PROPDMGEXP and CROPDMGEXP, are stored in a vector named cols
cols <- c("PROPDMGEXP", "CROPDMGEXP")
The call below converts the codes to upper case with “toupper”. .SDcols restricts .SD (the Subset of the Data) to the columns listed in cols, and the result is assigned back to those same columns
NewData[, (cols) := c(lapply(.SD, toupper)), .SDcols = cols]
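The same pattern on a small table makes the effect easier to see. This is a minimal sketch; the table `dt` and its values are illustrative only.

```r
library(data.table)

# Toy table: illustrative values only
dt <- data.table(id = 1:3,
                 PROPDMGEXP = c("k", "M", "b"),
                 CROPDMGEXP = c("", "k", "m"))

cols <- c("PROPDMGEXP", "CROPDMGEXP")

# .SD holds only the columns named in .SDcols; lapply() applies toupper()
# to each of them, and (cols) := writes the results back by reference
dt[, (cols) := lapply(.SD, toupper), .SDcols = cols]
print(dt)
```

The `id` column is left untouched because it is not listed in `cols`.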
A named vector, ConvPROPDMG, maps each property damage exponent code to its numeric multiplier
ConvPROPDMG <- c("-" = 10^0, "+" = 10^0, "0" = 10^0, "1" = 10^1, "2" = 10^2, "3" = 10^3,"4" = 10^4, "5" = 10^5, "6" = 10^6, "7" = 10^7, "8" = 10^8, "9" = 10^9, "H" = 10^2, "K" = 10^3, "M" = 10^6, "B" = 10^9)
A named vector, ConvCROPDMG, maps each crop damage exponent code to its numeric multiplier
ConvCROPDMG <- c("?" = 10^0, "0" = 10^0, "K" = 10^3, "M" = 10^6, "B" = 10^9)
Each PROPDMGEXP code is replaced by its numeric multiplier from ConvPROPDMG
NewData[, PROPDMGEXP := ConvPROPDMG[as.character(NewData[,PROPDMGEXP])]]
NA values in PROPDMGEXP (codes with no match in ConvPROPDMG) are assigned a multiplier of 10^0
NewData[is.na(PROPDMGEXP), PROPDMGEXP := 10^0 ]
Each CROPDMGEXP code is replaced by its numeric multiplier from ConvCROPDMG
NewData[, CROPDMGEXP := ConvCROPDMG[as.character(NewData[,CROPDMGEXP])] ]
NA values in CROPDMGEXP (codes with no match in ConvCROPDMG) are assigned a multiplier of 10^0
NewData[is.na(CROPDMGEXP), CROPDMGEXP := 10^0 ]
NewData <- NewData[, .(EVTYPE, BGN_DATE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, PROPCOST = PROPDMG * PROPDMGEXP, CROPDMG, CROPDMGEXP, CROPCOST = CROPDMG * CROPDMGEXP)]
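The lookup, the NA fallback, and the cost calculation can be traced on a handful of values. This is a sketch with made-up inputs; `codes`, `amounts`, `conv`, `mult`, and `cost` are illustrative names, and `conv` contains only part of the mapping used in the analysis.

```r
# Illustrative unit codes, including an empty string and an unknown code
codes   <- c("K", "M", "", "X")
amounts <- c(2.5, 1, 10, 3)

# Named lookup vector (a subset of the mapping used in the analysis)
conv <- c("K" = 10^3, "M" = 10^6, "B" = 10^9)

# Indexing by name returns NA for codes with no matching name,
# including the empty string
mult <- unname(conv[codes])

# Fall back to a multiplier of 1 where no code matched
mult[is.na(mult)] <- 10^0

# Cost in USD = recorded amount x multiplier
cost <- amounts * mult
print(cost)
```

With this fallback, amounts that carry a missing or unrecognized code are counted at face value rather than dropped.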
To measure the impact on population health, I estimate the total fatalities and injuries for each event type.
Health <- NewData[, .(FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES), TOTALHEALTH = sum(FATALITIES) + sum(INJURIES)), by = .(EVTYPE)]
Health <- Health[order(-TOTALHEALTH), ]
Health <- Health[1:10, ]
The 10 events most harmful to population health are listed below
head(Health, 10)
## EVTYPE FATALITIES INJURIES TOTALHEALTH
## 1: TORNADO 1511 20667 22178
## 2: EXCESSIVE HEAT 1797 6391 8188
## 3: FLOOD 414 6758 7172
## 4: LIGHTNING 651 4141 4792
## 5: TSTM WIND 241 3629 3870
## 6: FLASH FLOOD 887 1674 2561
## 7: THUNDERSTORM WIND 130 1400 1530
## 8: WINTER STORM 191 1292 1483
## 9: HEAT 237 1222 1459
## 10: HURRICANE/TYPHOON 64 1275 1339
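The table lists TSTM WIND and THUNDERSTORM WIND as separate event types. The sketch below shows how the two rows could be combined before ranking, under the assumption that both labels describe the same kind of event. The tables `wind` and `merged` are illustrative and are not part of the original analysis.

```r
library(data.table)

# Two rows copied from the table above
wind <- data.table(
  EVTYPE     = c("TSTM WIND", "THUNDERSTORM WIND"),
  FATALITIES = c(241, 130),
  INJURIES   = c(3629, 1400)
)

# Assumption: "TSTM WIND" abbreviates "THUNDERSTORM WIND"
wind[EVTYPE == "TSTM WIND", EVTYPE := "THUNDERSTORM WIND"]

# Re-aggregate after relabelling
merged <- wind[, .(FATALITIES = sum(FATALITIES),
                   INJURIES   = sum(INJURIES)),
               by = EVTYPE]
merged[, TOTALHEALTH := FATALITIES + INJURIES]
print(merged)
```

Under that assumption the combined row totals 5,400 casualties, which would place it above LIGHTNING (4,792) and below FLOOD (7,172).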
To measure the economic impact, I estimate the total property cost and crop cost for each event type.
Economic <- NewData[, .(PROPCOST = sum(PROPCOST), CROPCOST = sum(CROPCOST), TOTALECONOMICS = sum(PROPCOST) + sum(CROPCOST)), by = .(EVTYPE)]
Economic <- Economic[order(-TOTALECONOMICS), ]
Economic <- Economic[1:10, ]
The 10 events with the greatest economic cost are listed below
head(Economic, 10)
## EVTYPE PROPCOST CROPCOST TOTALECONOMICS
## 1: FLOOD 143944833550 4974778400 148919611950
## 2: HURRICANE/TYPHOON 69305840000 2607872800 71913712800
## 3: STORM SURGE 43193536000 5000 43193541000
## 4: TORNADO 24616945710 283425010 24900370720
## 5: HAIL 14595143420 2476029450 17071172870
## 6: FLASH FLOOD 15222203910 1334901700 16557105610
## 7: HURRICANE 11812819010 2741410000 14554229010
## 8: DROUGHT 1046101000 13367566000 14413667000
## 9: TROPICAL STORM 7642475550 677711000 8320186550
## 10: HIGH WIND 5247860360 633561300 5881421660
To answer question 1), I made a bar chart of the top 10 weather events that are most harmful to population health.
healthimpact <- melt(Health, id.vars = "EVTYPE", variable.name = "Fatalities_and_Injuries")
g <- ggplot(healthimpact, aes(x = reorder(EVTYPE, -value), y = value)) +
  geom_bar(stat = "identity", aes(fill = Fatalities_and_Injuries), position = "dodge") +
  labs(x = "Type of event", y = expression("Total Injuries/Fatalities"), fill = "Type of health impact") +
  scale_fill_discrete(name = "Type of health impact", labels = c("Fatalities", "Injuries", "Total health impact")) +
  theme(legend.key.size = unit(0.3, "cm"), legend.key.width = unit(0.3, "cm")) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  ggtitle("Climate events most harmful to population health in the USA") +
  theme(plot.title = element_text(hjust = 0.5))
print(g)
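The melt() call used before plotting reshapes the wide summary table into long format, so that each measure becomes its own group of bars. A minimal sketch using the first three rows of the health table; `Health_toy` and `long` are illustrative names.

```r
library(data.table)

# First three rows of the health summary shown earlier
Health_toy <- data.table(
  EVTYPE      = c("TORNADO", "EXCESSIVE HEAT", "FLOOD"),
  FATALITIES  = c(1511, 1797, 414),
  INJURIES    = c(20667, 6391, 6758),
  TOTALHEALTH = c(22178, 8188, 7172)
)

# Wide to long: one row per (event, measure) pair
long <- melt(Health_toy, id.vars = "EVTYPE",
             variable.name = "Fatalities_and_Injuries")
print(long)
```

The long table has one `value` column that ggplot maps to bar height and one factor column that it maps to fill color.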
To answer question 2), I made a bar chart of the top 10 weather events with the largest economic cost.
economimpact <- melt(Economic, id.vars = "EVTYPE", variable.name = "Damage_Type")
g <- ggplot(economimpact, aes(x = reorder(EVTYPE, -value), y = value/1e9)) +
  geom_bar(stat = "identity", aes(fill = Damage_Type), position = "dodge") +
  labs(x = "Event Type", y = expression("Cost/Damage (in billion USD)"), fill = "Type of damage") +
  scale_fill_discrete(name = "Type of damage", labels = c("Cost of property loss", "Cost of crop loss", "Total economic impact")) +
  theme(legend.key.size = unit(0.3, "cm"), legend.key.width = unit(0.3, "cm")) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  ggtitle("Climate events that produce the greatest economic loss in the USA") +
  theme(plot.title = element_text(hjust = 0.5))
print(g)
The results show that since 1996 (the first year with complete data on all event types), the five events most harmful to population health are tornadoes, excessive heat, floods, lightning, and thunderstorm winds. The largest economic losses are caused by floods, with an estimated loss of 148.9 billion US dollars, followed by hurricanes/typhoons (71.9 billion), storm surges (43.2 billion), and tornadoes (24.9 billion).