Reproducible Research: Course Project 2

C. Saquel
8 November 2018

Synopsis

Storms and other severe weather events can cause public health and economic problems for communities and municipalities. Many severe events result in fatalities, injuries and property damage, and preventing these outcomes as far as possible is a key concern.

This project involves exploring the storm database of the National Oceanic and Atmospheric Administration of the United States (NOAA). This database tracks the characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of deaths, injuries and property damage.

The analysis below identifies the event types that cause the most harm to population health and those with the greatest economic consequences.

Data Processing

The data come from a database of weather events recorded in the United States between 1950 and 2011. The file can be downloaded from the following link:

Storm Data

Documentation of the database is also available; it explains how some of the variables are constructed and defined. The following code downloads and reads the data:

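url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# Download into the current working directory only if the file is not already there;
# mode = "wb" keeps the compressed file intact on Windows
if (!file.exists("StormData.csv.bz2")) download.file(url, "StormData.csv.bz2", mode = "wb")
# read.csv() reads bz2 files directly; stringsAsFactors = TRUE is needed in R >= 4.0
# because the levels() calls further down expect factor columns
StormData <- read.csv("StormData.csv.bz2", stringsAsFactors = TRUE)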

Once the data has been downloaded, we can see the variables it contains:

names(StormData)

##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

From these variables we select the ones relevant to the analysis:

EVTYPE
FATALITIES
INJURIES
PROPDMG
PROPDMGEXP
CROPDMG
CROPDMGEXP

We then store the variables related to population health in DataMostHarmful and those related to economic consequences in DataEconConseq. The variable names are also converted to lowercase.

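library(dplyr)
# Keep the event type plus the health-related columns
DataMostHarmful <- select(StormData, EVTYPE, FATALITIES, INJURIES)
names(DataMostHarmful) <- tolower(names(DataMostHarmful))
# Keep the event type plus the damage amounts and their exponent codes
DataEconConseq <- select(StormData, EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
names(DataEconConseq) <- tolower(names(DataEconConseq))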

Fatalities and injuries are then summed by event type (evtype) and sorted in descending order, once by fatalities and once by injuries.

MostHarmful <- summarise(group_by(DataMostHarmful, evtype), fatalities = sum(fatalities, na.rm = TRUE), injuries = sum(injuries, na.rm = TRUE))
MostHarmfulFat <- arrange(MostHarmful,desc(fatalities))
MostHarmfulInj <- arrange(MostHarmful,desc(injuries))
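The same group, sum and sort pattern can be sketched on a small example. The data frame toy and every value in it are made up for illustration and are not taken from the NOAA file; the sketch assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy records, not real NOAA data
toy <- data.frame(
  evtype     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  fatalities = c(3, 1, 2, 4, NA),
  injuries   = c(10, 0, 5, 2, 1),
  stringsAsFactors = FALSE
)

# Sum both outcomes per event type, ignoring missing values
toy_summary <- toy %>%
  group_by(evtype) %>%
  summarise(
    fatalities = sum(fatalities, na.rm = TRUE),
    injuries   = sum(injuries, na.rm = TRUE)
  )

# One ranking per outcome, largest first
toy_by_fat <- arrange(toy_summary, desc(fatalities))
toy_by_inj <- arrange(toy_summary, desc(injuries))
print(toy_by_fat)
print(toy_by_inj)
```

Keeping two separately sorted copies, as the report does, lets each ranking be truncated to its own top 10 later.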

Economic damage is handled in a similar way: property damage (propdmg) and crop damage (cropdmg) are summed by event type (evtype) and sorted. Before that, the amounts must be scaled by the orders of magnitude encoded in the variables propdmgexp and cropdmgexp.

levels(DataEconConseq$propdmgexp)

##  [1] ""  "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"

levels(DataEconConseq$cropdmgexp)

## [1] ""  "?" "0" "2" "B" "k" "K" "m" "M"

The alphabetic characters used to signify magnitude include “K” for thousands, “M” for millions and “B” for billions. For the remaining symbols we use the values given on the website How to handle the exponent value of PROPDMGEXP and CROPDMGEXP, which provides multipliers for the codes not covered in the official documentation.

valuesMult <- unique(c(levels(DataEconConseq$propdmgexp),levels(DataEconConseq$cropdmgexp)))
Mult <- c(0,0,0,1,10,10,10,10,10,10,10,10,10,10^9,100,100,10^3,10^6,10^6,10^3)
convert <- data.frame( valuesMult = valuesMult, Mult = Mult)
as_tibble(convert)  # replaces tbl_df(), which is deprecated in current dplyr

## # A tibble: 20 x 2
##    valuesMult       Mult
##    <fct>           <dbl>
##  1 ""                  0
##  2 -                   0
##  3 ?                   0
##  4 +                   1
##  5 0                  10
##  6 1                  10
##  7 2                  10
##  8 3                  10
##  9 4                  10
## 10 5                  10
## 11 6                  10
## 12 7                  10
## 13 8                  10
## 14 B          1000000000
## 15 h                 100
## 16 H                 100
## 17 K                1000
## 18 m             1000000
## 19 M             1000000
## 20 k                1000

The resulting lookup table convert is used to apply the corresponding multiplier to each record, after which the data can be grouped and sorted.

DataEconConseq$propdmgMult <- convert$Mult[match(DataEconConseq$propdmgexp, convert$valuesMult)]
DataEconConseq$cropdmgMult <- convert$Mult[match(DataEconConseq$cropdmgexp, convert$valuesMult)]
DataEconConseq$propdmgMult <- DataEconConseq$propdmgMult*DataEconConseq$propdmg
DataEconConseq$cropdmgMult <- DataEconConseq$cropdmgMult*DataEconConseq$cropdmg
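The lookup-and-multiply step can be checked on a small example. The records in toy are made up for illustration, and lookup holds only four of the codes, with the same multipliers as in the table above; the names toy, lookup and damage_usd are hypothetical.

```r
# Hypothetical toy records, not real NOAA data
toy <- data.frame(
  propdmg    = c(25, 2.5, 10, 5),
  propdmgexp = c("K", "M", "", "B"),
  stringsAsFactors = FALSE
)

# Lookup table for a few exponent codes, using the multipliers adopted above
lookup <- data.frame(
  code = c("", "K", "M", "B"),
  mult = c(0, 1e3, 1e6, 1e9),
  stringsAsFactors = FALSE
)

# match() gives, for each record, the row of the lookup table with the same code
idx <- match(toy$propdmgexp, lookup$code)

# Scale the reported amount to US dollars
toy$damage_usd <- toy$propdmg * lookup$mult[idx]
print(toy)
```

Under the adopted mapping an empty exponent has multiplier 0, so the third toy record contributes nothing to the total.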

EconConseq <- summarise(group_by(DataEconConseq, evtype), propdmg = sum(propdmgMult, na.rm = TRUE), cropdmg = sum(cropdmgMult, na.rm = TRUE))
EconConseq <- mutate(EconConseq, total = propdmg + cropdmg)
EconConseqProp <- arrange(EconConseq,desc(propdmg))
EconConseqCrop <- arrange(EconConseq,desc(cropdmg))
EconConseqTotal <- arrange(EconConseq,desc(total))

Results

Damage to the health of the population.

The following table combines the top 10 event types by fatalities with the top 10 by injuries. Most event types appear in both rankings.

arrange(merge(head(MostHarmfulFat,10),head(MostHarmfulInj,10),all = TRUE),desc(fatalities),desc(injuries))

##               evtype fatalities injuries
## 1            TORNADO       5633    91346
## 2     EXCESSIVE HEAT       1903     6525
## 3        FLASH FLOOD        978     1777
## 4               HEAT        937     2100
## 5          LIGHTNING        816     5230
## 6          TSTM WIND        504     6957
## 7              FLOOD        470     6789
## 8        RIP CURRENT        368      232
## 9          HIGH WIND        248     1137
## 10         AVALANCHE        224      170
## 11 THUNDERSTORM WIND        133     1488
## 12         ICE STORM         89     1975
## 13              HAIL         15     1361

The plots below show the 10 event types with the most fatalities and the 10 with the most injuries.

library(ggplot2)
library(gridExtra)
library(grid)
n <- 10
p1 <- ggplot(data=head(MostHarmfulFat,n), aes(x=reorder(tolower(evtype), fatalities), y=fatalities)) +
    geom_bar(fill="royalblue",stat="identity", width = 0.9) + coord_flip() +
    ylab("Total number of fatalities") + xlab("Event type") +
    theme(legend.position="none")

p2 <- ggplot(data=head(MostHarmfulInj,n), aes(x=reorder(tolower(evtype), injuries), y=injuries)) +
    geom_bar(fill="firebrick3",stat="identity") + coord_flip() +
    ylab("Total number of injuries") + xlab("Event type") 
grid.arrange(p1, p2, nrow = 2, top = "Health impact of weather events in the US - Top 10")

Tornadoes are by far the leading cause of both fatalities (5,633) and injuries (91,346), making them the most harmful event type for population health.

Greatest economic consequences.

The following table combines the top 10 event types by property damage, crop damage and total damage (property damage + crop damage). Again, many event types appear in more than one ranking.

arrange(merge(head(EconConseqTotal,10),merge(head(EconConseqProp,10),head(EconConseqCrop,10), all = TRUE), all = TRUE),desc(total),desc(propdmg),desc(cropdmg))

##               evtype      propdmg     cropdmg        total
## 1              FLOOD 144657709800  5661968450 150319678250
## 2  HURRICANE/TYPHOON  69305840000  2607872800  71913712800
## 3            TORNADO  56937162897   414954710  57352117607
## 4        STORM SURGE  43323536000        5000  43323541000
## 5               HAIL  15732269877  3025954650  18758224527
## 6        FLASH FLOOD  16140815011  1421317100  17562132111
## 7            DROUGHT   1046106000 13972566000  15018672000
## 8          HURRICANE  11868319010  2741910000  14610229010
## 9        RIVER FLOOD   5118945500  5029459000  10148404500
## 10         ICE STORM   3944928310  5022113500   8967041810
## 11    TROPICAL STORM   7703890550   678346000   8382236550
## 12      WINTER STORM   6688497260    26944000   6715441260
## 13         HIGH WIND   5270046280   638571300   5908617580
## 14      EXTREME COLD     67737400  1292973000   1360710400
## 15      FROST/FREEZE      9480000  1094086000   1103566000

The plots below show the top 10 event types by property damage, crop damage and total damage.

n <- 10
p1 <- ggplot(data=head(EconConseqProp,n), aes(x=reorder(tolower(evtype), propdmg), y=propdmg/1000000)) +
    geom_bar(fill="royalblue",stat="identity", width = 0.9) + coord_flip() +
    ylab("Property damage (million US$)") + xlab("Event type") +
    theme(legend.position="none")
# p2 and p3 share the same axis limits so crop and total damage can be compared directly
p2 <- ggplot(data=head(EconConseqCrop,n), aes(x=reorder(tolower(evtype), cropdmg), y=cropdmg/1000000)) +
    geom_bar(fill="firebrick3",stat="identity") + coord_flip() +
    ylab("Crop damage (million US$)") + xlab("Event type") +
    scale_y_continuous(limits = c(0,max(EconConseqTotal$total/1000000))) +
    theme(legend.position="none")
p3 <- ggplot(data=head(EconConseqTotal,n), aes(x=reorder(tolower(evtype), total), y=total/1000000)) +
    geom_bar(fill="darkolivegreen3",stat="identity") + coord_flip() +
    ylab("Property + crop damage (million US$)") + xlab("Event type") +
    scale_y_continuous(limits = c(0,max(EconConseqTotal$total/1000000))) +
    theme(legend.position="none")
grid.arrange(p1, p2, p3, nrow = 3, top = "Economic Consequences of weather events in the US - Top 10")

Floods, hurricanes/typhoons, tornadoes and storm surges contribute the most to property damage.
Drought is the largest contributor to crop damage.
Overall, crop damage is small compared with property damage.
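The last statement can be checked with a little arithmetic on the totals printed in the economic results table. The sketch below copies the five largest event types by total damage plus drought from that table; the names econ, crop_share and overall_crop_share are introduced here for illustration only.

```r
# Totals in US$ copied from the results table above
econ <- data.frame(
  evtype  = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "STORM SURGE", "HAIL", "DROUGHT"),
  propdmg = c(144657709800, 69305840000, 56937162897, 43323536000, 15732269877, 1046106000),
  cropdmg = c(5661968450, 2607872800, 414954710, 5000, 3025954650, 13972566000),
  stringsAsFactors = FALSE
)

# Share of each event type's total damage that falls on crops
econ$crop_share <- econ$cropdmg / (econ$propdmg + econ$cropdmg)
print(econ[, c("evtype", "crop_share")])

# Crop share of the combined damage across these six event types
overall_crop_share <- sum(econ$cropdmg) / sum(econ$propdmg + econ$cropdmg)
print(overall_crop_share)
```

For these rows, crops account for less than a tenth of the combined damage, while drought is the exception, with most of its damage falling on crops.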