Title: Analysis of data on climatic events in the United States and their impact on the economy and the health of the population

Synopsis

This study identifies the weather events that are most harmful to population health and that cause the greatest economic loss in the United States. The source data is the U.S. National Oceanic and Atmospheric Administration (NOAA) storm database. The analysis totals deaths, injuries, property damage, and crop damage for each type of event (e.g., tornado, flood, storm, hail). The results show that since 1996 (the first year with complete data on all event types), the five events most harmful to population health are tornadoes, excessive heat, floods, lightning, and thunderstorm winds, and the five events with the largest economic damage are floods, hurricanes/typhoons, storm surges, tornadoes, and hail.

Data processing

Loading libraries and data

setwd("~/Desktop/DST/Reproducible Research - Course Project 2")
library(readr)
library(data.table)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:data.table':
## 
##     between, first, last
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:data.table':
## 
##     hour, isoweek, mday, minute, month, quarter, second, wday, week,
##     yday, year
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(ggplot2)
ClimateData <- read.csv("repdata-data-StormData.csv", header = TRUE, sep=",")

Selecting variables of interest for the analysis of weather impact on health and economy

The file contains 902,297 observations of 37 variables. The analysis addresses two questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

According to NOAA, data recording started in January 1950, when only one event type (tornado) was recorded. More event types were added gradually, and only since January 1996 have all 48 event types been recorded. Because the objective is to compare the effects of different weather events, the dataset is limited to observations between 1996 and 2011 so that the comparison rests on a consistent basis.

Following these premises, only the following variables are relevant to the analysis:

Event outcome variable:

EVTYPE: type of weather event (e.g., tornado, wind, snow, flood)

Date variable:

BGN_DATE: begin date of the event

Health variables:

FATALITIES: number of deaths as result of events
INJURIES: number of injuries as result of events

Economic variables:

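PROPDMG: property damage amount in USD, before applying the PROPDMGEXP multiplier
PROPDMGEXP: the units multiplier for property damage (e.g., K, M, or B)
CROPDMG: crop damage amount in USD, before applying the CROPDMGEXP multiplier
CROPDMGEXP: the units multiplier for crop damage (e.g., K, M, or B)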

Formatting the BGN_DATE variable as a date and selecting only data from January 1996 onward, the period with complete information about event types.

ClimateData$BGN_DATE <- as.Date(ClimateData$BGN_DATE, "%m/%d/%Y")
NewData<-ClimateData %>% select("EVTYPE", "BGN_DATE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP") %>% filter(BGN_DATE >= as.Date("1996-01-01") & BGN_DATE <= as.Date("2012-01-01"))
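The date handling above can be checked on a few rows before running it on the full file. The following is a minimal sketch, assuming BGN_DATE is stored as text in a month/day/year format followed by a time component; the toy rows and the name recent are made up for illustration.

```r
# Toy rows in the text format assumed for BGN_DATE (an assumption, not read from the file)
toy <- data.frame(
  EVTYPE   = c("TORNADO", "FLOOD", "HAIL"),
  BGN_DATE = c("4/18/1950 0:00:00", "1/1/1996 0:00:00", "11/28/2011 0:00:00"),
  stringsAsFactors = FALSE
)

# as.Date() only reads as much of the string as the format describes,
# so the trailing time component is ignored
toy$BGN_DATE <- as.Date(toy$BGN_DATE, format = "%m/%d/%Y")

# Base-R equivalent of the filter step: keep events from 1 January 1996 onward
recent <- toy[toy$BGN_DATE >= as.Date("1996-01-01"), ]
print(recent)
```

Only the 1996 and 2011 rows remain, which is the behavior expected from the filter on the full dataset.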

After subsetting by date, we get a data frame with 653,530 observations and 8 variables.

dim(NewData)
## [1] 653530      8

The next step is to keep only the records where fatalities, injuries, or damage occurred.

NewData <- as.data.table(NewData)
NewData <- NewData[(EVTYPE != "?" & (INJURIES > 0 | FATALITIES > 0 | PROPDMG > 0 | CROPDMG > 0)), c("EVTYPE", "BGN_DATE","FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]

Cleaning and tidying the PROPDMGEXP and CROPDMGEXP columns

The next step is to convert the exponent codes of the crop damage (“CROPDMGEXP”) and property damage (“PROPDMGEXP”) variables into numeric multipliers as follows:

  • H = hundreds = 10^2 
  • K = thousands = 10^3 
  • M = millions = 10^6 
  • B = billions = 10^9 
  • (+) = 10^0 
  • (-) = 10^0 
  • (?) = 10^0 
  • “-” = 10^0 
  • 0 = 10^0 
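As a worked example of the mapping above, the sketch below turns a few made-up damage amounts and exponent codes into dollar values. The vectors multiplier, damage, exponent, and usd are illustrative names, not objects from the analysis.

```r
# Letter codes and their multipliers, as listed above
multiplier <- c("H" = 10^2, "K" = 10^3, "M" = 10^6, "B" = 10^9)

# Made-up damage amounts and the exponent codes recorded with them
damage   <- c(25, 2.5, 1.2)
exponent <- c("K", "M", "B")

# Indexing the named vector by code returns the matching multiplier
usd <- unname(damage * multiplier[exponent])
print(usd)
```

So an entry of 25 with code K stands for 25,000 USD, 2.5 with M for 2.5 million USD, and 1.2 with B for 1.2 billion USD.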

Specifying the columns with variables PROPDMGEXP and CROPDMGEXP in a vector named cols

cols <- c("PROPDMGEXP", "CROPDMGEXP") 

The result of toupper (conversion to upper case) is assigned back to the columns named in cols; .SDcols restricts .SD (the Subset of the Data) to those columns, so the conversion is applied only to them.

NewData[, (cols) := c(lapply(.SD, toupper)), .SDcols = cols]
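The effect of the .SDcols call can be seen on a small table. This is a minimal sketch with made-up values; the table toy is hypothetical and only mirrors the column names used in the analysis.

```r
library(data.table)

# Made-up table with mixed-case exponent codes and one numeric column
toy <- data.table(
  PROPDMG    = c(1, 2, 3),
  PROPDMGEXP = c("k", "M", "b"),
  CROPDMGEXP = c("m", "", "K")
)

cols <- c("PROPDMGEXP", "CROPDMGEXP")

# .SD holds only the columns listed in .SDcols; toupper() is applied to each,
# and the results are written back in place to the same columns
toy[, (cols) := lapply(.SD, toupper), .SDcols = cols]
print(toy)
```

The two exponent columns are upper-cased, while PROPDMG is untouched because it is not listed in cols.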

A named vector, ConvPROPDMG, maps each property damage exponent code to its numeric multiplier

ConvPROPDMG <- c("-" = 10^0, "+" = 10^0, "0" = 10^0, "1" = 10^1, "2" = 10^2, "3" = 10^3,"4" = 10^4, "5" = 10^5, "6" = 10^6, "7" = 10^7, "8" = 10^8, "9" = 10^9, "H" = 10^2, "K" = 10^3, "M" = 10^6, "B" = 10^9)

A named vector, ConvCROPDMG, maps each crop damage exponent code to its numeric multiplier

ConvCROPDMG <- c("?" = 10^0, "0" = 10^0, "K" = 10^3, "M" = 10^6, "B" = 10^9)

The codes in the PROPDMGEXP variable are replaced by the multipliers looked up in ConvPROPDMG

NewData[, PROPDMGEXP := ConvPROPDMG[as.character(NewData[,PROPDMGEXP])]] 

NA values in PROPDMGEXP (codes with no match in ConvPROPDMG, including empty ones) are assigned a multiplier of 10^0

NewData[is.na(PROPDMGEXP), PROPDMGEXP := 10^0 ]

The codes in the CROPDMGEXP variable are replaced by the multipliers looked up in ConvCROPDMG

NewData[, CROPDMGEXP := ConvCROPDMG[as.character(NewData[,CROPDMGEXP])] ]

NA values in CROPDMGEXP (codes with no match in ConvCROPDMG, including empty ones) are assigned a multiplier of 10^0

NewData[is.na(CROPDMGEXP), CROPDMGEXP := 10^0 ]
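The lookup-then-fill pattern used for both exponent columns can be sketched on a toy table. The values below are made up, and the code "X" is a hypothetical unrecognized entry included to show what happens to codes that are missing from the lookup vector.

```r
library(data.table)

# Shortened lookup vector, for illustration only
conv <- c("0" = 10^0, "K" = 10^3, "M" = 10^6, "B" = 10^9)

# Made-up rows: one empty code and one unrecognized code ("X", hypothetical)
toy <- data.table(
  CROPDMG    = c(5, 2, 7, 1),
  CROPDMGEXP = c("K", "", "M", "X")
)

# Codes with no matching name (including "") come back as NA from the lookup
toy[, CROPDMGEXP := unname(conv[as.character(CROPDMGEXP)])]

# Treat every unmatched code as a multiplier of 1
toy[is.na(CROPDMGEXP), CROPDMGEXP := 10^0]
print(toy)
```

Under this approach an unrecognized code leaves the damage amount unchanged rather than dropping the row, which keeps the record in the totals at the risk of understating it.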

Analysis

Creating two new columns for property and crop costs by multiplying PROPDMG by PROPDMGEXP, and CROPDMG by CROPDMGEXP.

NewData <- NewData[, .(EVTYPE, BGN_DATE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, PROPCOST = PROPDMG * PROPDMGEXP, CROPDMG, CROPDMGEXP, CROPCOST = CROPDMG * CROPDMGEXP)]

Health Impact

To measure the impact on population health, I compute the total fatalities and injuries for each event type.

Health <- NewData[, .(FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES), TOTALHEALTH = sum(FATALITIES) + sum(INJURIES)), by = .(EVTYPE)]
Health <- Health[order(-TOTALHEALTH), ]
Health <- Health[1:10, ]

The 10 most harmful events are listed below.

head(Health, 10)
##                EVTYPE FATALITIES INJURIES TOTALHEALTH
##  1:           TORNADO       1511    20667       22178
##  2:    EXCESSIVE HEAT       1797     6391        8188
##  3:             FLOOD        414     6758        7172
##  4:         LIGHTNING        651     4141        4792
##  5:         TSTM WIND        241     3629        3870
##  6:       FLASH FLOOD        887     1674        2561
##  7: THUNDERSTORM WIND        130     1400        1530
##  8:      WINTER STORM        191     1292        1483
##  9:              HEAT        237     1222        1459
## 10: HURRICANE/TYPHOON         64     1275        1339
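The group-and-rank step behind this table can be sketched on a handful of made-up records. The table toy and the result agg are illustrative only; the aggregation expression follows the one used above.

```r
library(data.table)

# Made-up event records
toy <- data.table(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(2, 1, 3, 4),
  INJURIES   = c(10, 0, 5, 1)
)

# Sum each outcome within event type, then add the two sums for a combined total
agg <- toy[, .(FATALITIES  = sum(FATALITIES),
               INJURIES    = sum(INJURIES),
               TOTALHEALTH = sum(FATALITIES) + sum(INJURIES)),
           by = .(EVTYPE)]

# Rank event types from most to least harmful
agg <- agg[order(-TOTALHEALTH)]
print(agg)
```

Here TORNADO ranks first with a combined total of 20, followed by HEAT with 5 and FLOOD with 1.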

Economic Impact

To measure the economic loss, I compute the total property cost and crop cost for each event type.

Economic <- NewData[, .(PROPCOST = sum(PROPCOST), CROPCOST = sum(CROPCOST), TOTALECONOMICS = sum(PROPCOST) + sum(CROPCOST)), by = .(EVTYPE)]
Economic <- Economic[order(-TOTALECONOMICS), ]
Economic <- Economic[1:10, ]

The 10 costliest events are listed below.

head(Economic, 10)
##                EVTYPE     PROPCOST    CROPCOST TOTALECONOMICS
##  1:             FLOOD 143944833550  4974778400   148919611950
##  2: HURRICANE/TYPHOON  69305840000  2607872800    71913712800
##  3:       STORM SURGE  43193536000        5000    43193541000
##  4:           TORNADO  24616945710   283425010    24900370720
##  5:              HAIL  14595143420  2476029450    17071172870
##  6:       FLASH FLOOD  15222203910  1334901700    16557105610
##  7:         HURRICANE  11812819010  2741410000    14554229010
##  8:           DROUGHT   1046101000 13367566000    14413667000
##  9:    TROPICAL STORM   7642475550   677711000     8320186550
## 10:         HIGH WIND   5247860360   633561300     5881421660
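Totals of this size are easier to read in billions of USD. The short sketch below rescales the four largest TOTALECONOMICS values copied from the table above; the vector names total_usd and billions are illustrative.

```r
# Four largest totals, copied from the table above (USD)
total_usd <- c(
  "FLOOD"             = 148919611950,
  "HURRICANE/TYPHOON" = 71913712800,
  "STORM SURGE"       = 43193541000,
  "TORNADO"           = 24900370720
)

# Rescale to billions of USD and round to one decimal place
billions <- round(total_usd / 1e9, 1)
print(billions)
```

This gives roughly 148.9, 71.9, 43.2, and 24.9 billion USD respectively.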

Results

Health Impact

To answer question 1, I made a bar chart of the top 10 weather events that are most harmful to population health.

healthimpact <- melt(Health, id.vars = "EVTYPE", variable.name = "Fatalities_and_Injuries")
g<-ggplot(healthimpact, aes(x = reorder(EVTYPE, -value), y = value)) + geom_bar(stat = "identity", aes(fill = Fatalities_and_Injuries), position = "dodge") + labs(x="Type of event", y=expression("Total Injuries/Fatalities"), fill = "Type of health impact") + scale_fill_discrete(name = "Type of health impact", labels = c("Fatalities", "Injuries", "Total health impact")) + theme (legend.key.size = unit(0.3, "cm"), legend.key.width = unit(0.3,"cm")) + theme(axis.text.x = element_text(angle=90, hjust=1)) + ggtitle("Climate events that are most harmful to population health in the USA") + theme(plot.title = element_text(hjust = 0.5))
print(g)
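The melt call is what lets ggplot draw one bar per measure for each event type: it reshapes the wide table into one row per event type and measure. A minimal sketch on made-up values (the tables wide and long are illustrative):

```r
library(data.table)

# Made-up wide table: one row per event type, one column per measure
wide <- data.table(
  EVTYPE     = c("TORNADO", "FLOOD"),
  FATALITIES = c(5, 1),
  INJURIES   = c(15, 0)
)

# Long format: the measure name moves into one column and its number into `value`
long <- melt(wide, id.vars = "EVTYPE", variable.name = "Fatalities_and_Injuries")
print(long)
```

Each event type now appears once per measure, so the fill aesthetic can map the measure name to a color while value supplies the bar height.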

Economic Impact

To answer question 2, I made a bar chart of the top 10 weather events with the largest cost to the economy.

economimpact <- melt(Economic, id.vars = "EVTYPE", variable.name = "Damage_Type")
g<-ggplot(economimpact, aes(x = reorder(EVTYPE, -value), y = value/1e9)) + geom_bar(stat = "identity", aes(fill = Damage_Type), position = "dodge") + labs(x="Event Type", y=expression("Cost/Damage (in billion USD)"), fill = "Type of damage") + scale_fill_discrete(name = "Type of damage", labels = c("Cost of property lost", "Cost of crop lost", "Total economic impact")) + theme (legend.key.size = unit(0.3, "cm"), legend.key.width = unit(0.3,"cm"))+ theme(axis.text.x = element_text(angle=90, hjust=1)) + ggtitle("Climate events that produce the greatest economic loss in the USA") + theme(plot.title = element_text(hjust = 0.5))
print(g)

Conclusion

The results of the analysis show that since 1996 (the first year with complete data on all event types), the five events most harmful to population health are tornadoes, excessive heat, floods, lightning, and thunderstorm winds. The largest economic damage is caused by floods, with an estimated loss of about 148.9 billion US dollars, followed by hurricanes/typhoons (71.9 billion), storm surges (43.2 billion), and tornadoes (24.9 billion).