knitr::opts_chunk$set(echo = TRUE, fig.width = 15, fig.height = 7)

Synopsis

Storms and other violent weather phenomena have always been a real concern for societies, as they cause many deaths and injuries as well as economic damage. For this project, we explored a dataset from the U.S. National Oceanic and Atmospheric Administration (NOAA), which records these kinds of events from 1950 to 2011.

After decompressing the file into a .csv file, we focused on human fatalities and injuries and on economic damage (crops and property). Deaths and injuries were totalled separately, while crop and property damage were summed into a single variable.

Plots of the top 10 event types for each of these three variables show that tornados account for the largest number of human deaths and injuries. In addition, most event types appear in both casualty plots, which suggests a correlation between deaths and injuries. On the other hand, floods (followed by hurricanes and tornados) are the costliest events; we suspect that much of this damage falls on crops, although crop and property damage are not separated in this analysis.

Data processing

Since the dataset used for this assignment is a .bz2 file, we first decompress it with bunzip2() from the R.utils package to obtain a .csv file. To load it into R, we use the standard read.csv() function:

website <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(website, "./Storms.bz2")
library(R.utils)
bunzip2("./Storms.bz2", "./Storms.csv")
NOAA <- read.csv("./Storms.csv", header = TRUE, sep = ",")
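As an aside, the separate decompression step may not be strictly necessary: base R's read.csv() can usually read a bzip2-compressed file directly, because file() detects the compression when a path is opened for reading. A minimal sketch on a throwaway file (the toy data frame and toy_path are invented for illustration and are not part of the analysis):

```r
# Build a tiny bzip2-compressed CSV in a temporary file (illustrative data only)
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))
toy_path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(toy_path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path through file(), which detects bzip2 compression
toy_back <- read.csv(toy_path)
print(toy_back)
```

If this holds for the full dataset, the download could be passed straight to read.csv() without R.utils, at the cost of decompressing on every read.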

Since this project focuses on both human (fatalities and injuries) and economic (property and crop) damages, we’ll subset those columns and save them into another variable:

summed_NOAA <- NOAA[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]

When looking into this simplified dataset, we can see that, whenever there is crop or property damage, the “EXP” value in the adjacent column can be “K” (thousands) or “M” (millions), amongst others. It is therefore essential to take this multiplier into account when computing totals.

Casualties

First we’ll focus on human casualties. The procedure is to total the deaths and the injuries for each event type, in two separate datasets, and to sort each one by its total. Deaths and injuries will be shown on separate plots so that we can see whether there are significant differences. For the sake of simplicity, we’ll only keep the top 10 event types for each dataset:

library(dplyr)
NOAA_fatalities <- summed_NOAA %>%
  select(EVTYPE, FATALITIES) %>%
  group_by(EVTYPE) %>%
  summarise(FATALITIES = sum(FATALITIES)) %>%
  arrange(desc(FATALITIES))
## `summarise()` ungrouping output (override with `.groups` argument)
Meaningful_fatalities <- NOAA_fatalities[1:10, ]
NOAA_injuries <- summed_NOAA %>%
  select(EVTYPE, INJURIES) %>%
  group_by(EVTYPE) %>%
  summarise(INJURIES = sum(INJURIES)) %>%
  arrange(desc(INJURIES))
## `summarise()` ungrouping output (override with `.groups` argument)
Meaningful_injuries <- NOAA_injuries[1:10, ]
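The same "total per event type, sort, keep the top n" procedure is applied to fatalities and to injuries, so it could be wrapped in a small helper. The sketch below uses base R only; top_events() and the toy data are hypothetical names and values introduced for illustration, not part of the original analysis:

```r
# Hypothetical helper: total one numeric column per event type and keep the top n
top_events <- function(data, value_col, n = 10) {
  totals <- aggregate(data[[value_col]],
                      by = list(EVTYPE = data$EVTYPE),
                      FUN = sum)
  names(totals)[2] <- value_col                    # aggregate() names the result "x"
  totals <- totals[order(-totals[[value_col]]), ]  # largest totals first
  head(totals, n)
}

# Toy data, invented for illustration
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(5, 2, 7, 4)
)
top2 <- top_events(toy, "FATALITIES", n = 2)
print(top2)
```

Using head() rather than indexing with 1:10 also avoids rows of NA when fewer than n event types exist.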

Economic damages

table(summed_NOAA$PROPDMGEXP)
## 
##             -      ?      +      0      1      2      3      4      5      6 
## 465934      1      8      5    216     25     13      4      4     28      4 
##      7      8      B      h      H      K      m      M 
##      5      1     40      1      6 424665      7  11330
table(summed_NOAA$CROPDMGEXP)
## 
##             ?      0      2      B      k      K      m      M 
## 618413      7     19      1      9     21 281832      1   1994

As noted earlier, the “EXP” columns can take several different values; the tables above list all of them. We’ll convert the crop and property damages in this way:

- Missing values, as well as “-”, “?”, “+”, and “0” = 0 zeros added.
- “1-8” = matching zeros.
- “H”, “h” = 2 zeros.
- “K”, “k” = 3 zeros.
- “M”, “m” = 6 zeros.
- “B”, “b” = 9 zeros.

NOAA_economic <- summed_NOAA[,c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
NOAA_economic$PROPDMGEXP <- toupper(NOAA_economic$PROPDMGEXP)
NOAA_economic$CROPDMGEXP <- toupper(NOAA_economic$CROPDMGEXP)

NOAA_economic$PROPFACTOR[(NOAA_economic$PROPDMGEXP == "")] <- 10^0
NOAA_economic$PROPFACTOR[(NOAA_economic$PROPDMGEXP == "-")] <- 10^0
NOAA_economic$PROPFACTOR[(NOAA_economic$PROPDMGEXP == "?")] <- 10^0
NOAA_economic$PROPFACTOR[(NOAA_economic$PROPDMGEXP == "+")] <- 10^0
NOAA_economic$PROPFACTOR[(NOAA_economic$PROPDMGEXP == "0")] <- 10^0
NOAA_economic$PROPFACTOR[(NOAA_economic$PROPDMGEXP == "1")] <- 10^1
NOAA_economic$PROPFACTOR[(NOAA_economic$PROPDMGEXP == "2")] <- 10^2
NOAA_economic$PROPFACTOR[(NOAA_economic$PROPDMGEXP == "3")] <- 10^3
NOAA_economic$PROPFACTOR[(NOAA_economic$PROPDMGEXP == "4")] <- 10^4
NOAA_economic$PROPFACTOR[(NOAA_economic$PROPDMGEXP == "5")] <- 10^5
NOAA_economic$PROPFACTOR[(NOAA_economic$PROPDMGEXP == "6")] <- 10^6
NOAA_economic$PROPFACTOR[(NOAA_economic$PROPDMGEXP == "7")] <- 10^7
NOAA_economic$PROPFACTOR[(NOAA_economic$PROPDMGEXP == "8")] <- 10^8
# Lowercase codes ("h", "m") were already upper-cased by toupper() above
NOAA_economic$PROPFACTOR[(NOAA_economic$PROPDMGEXP == "H")] <- 10^2
NOAA_economic$PROPFACTOR[(NOAA_economic$PROPDMGEXP == "K")] <- 10^3
NOAA_economic$PROPFACTOR[(NOAA_economic$PROPDMGEXP == "M")] <- 10^6
NOAA_economic$PROPFACTOR[(NOAA_economic$PROPDMGEXP == "B")] <- 10^9

NOAA_economic$CROPFACTOR[(NOAA_economic$CROPDMGEXP == "")] <- 10^0
NOAA_economic$CROPFACTOR[(NOAA_economic$CROPDMGEXP == "?")] <- 10^0
NOAA_economic$CROPFACTOR[(NOAA_economic$CROPDMGEXP == "0")] <- 10^0
NOAA_economic$CROPFACTOR[(NOAA_economic$CROPDMGEXP == "2")] <- 10^2
# Lowercase codes ("k", "m") were already upper-cased by toupper() above
NOAA_economic$CROPFACTOR[(NOAA_economic$CROPDMGEXP == "K")] <- 10^3
NOAA_economic$CROPFACTOR[(NOAA_economic$CROPDMGEXP == "M")] <- 10^6
NOAA_economic$CROPFACTOR[(NOAA_economic$CROPDMGEXP == "B")] <- 10^9
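The code-to-multiplier mapping above could also be expressed as one reusable function. This is a minimal sketch under stated assumptions: exp_to_factor() is a hypothetical name, and unlike the assignments above it maps any unrecognised code to 10^0 instead of leaving it as NA:

```r
# Hypothetical helper: convert EXP codes to power-of-ten multipliers
exp_to_factor <- function(codes) {
  codes <- toupper(codes)                     # treat "k" like "K", "m" like "M", etc.
  letter_power <- c(H = 2, K = 3, M = 6, B = 9)

  # Default: "", "-", "?", "+", "0" and any unknown code add no zeros
  power <- rep(0, length(codes))

  # Digits "1" to "8" add the matching number of zeros
  is_digit <- codes %in% as.character(1:8)
  power[is_digit] <- as.numeric(codes[is_digit])

  # Letters use the lookup table
  is_letter <- codes %in% names(letter_power)
  power[is_letter] <- letter_power[codes[is_letter]]

  10^power
}

factors <- exp_to_factor(c("", "k", "M", "5", "?", "B"))
print(factors)
```

With such a helper, PROPFACTOR and CROPFACTOR would each be computed in a single call on the corresponding EXP column.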

Once each code has been associated with its matching multiplier, we can calculate the total economic damage for every event type, keeping the top 10 as in the case of human casualties:

NOAA_economic <- mutate(NOAA_economic, ECONOMICDMG = PROPDMG * PROPFACTOR + CROPDMG * CROPFACTOR)
NOAA_bankruptcy <- NOAA_economic %>%
  select(EVTYPE, ECONOMICDMG) %>%
  group_by(EVTYPE) %>%
  summarise(ECONOMICDMG = sum(ECONOMICDMG)) %>%
  arrange(desc(ECONOMICDMG))
## `summarise()` ungrouping output (override with `.groups` argument)
Meaningful_bankruptcy <- NOAA_bankruptcy[1:10,]

Results

library(ggplot2)
g <- ggplot(Meaningful_fatalities, aes(reorder(EVTYPE, desc(FATALITIES)), FATALITIES))
g + geom_bar(stat = "identity") + xlab("Event type") + ylab("Total human deaths") + ggtitle("Top 10 deadliest event types")

g2 <- ggplot(Meaningful_injuries, aes(reorder(EVTYPE, desc(INJURIES)), INJURIES))
g2 + geom_bar(stat = "identity") + xlab("Event type") + ylab("Total human injuries") + ggtitle("Top 10 event types with the most injuries")

g3 <- ggplot(Meaningful_bankruptcy, aes(reorder(EVTYPE, desc(ECONOMICDMG)), ECONOMICDMG))
g3 + geom_bar(stat = "identity") + xlab("Event type") + ylab("Total economic damage (crops and property) in USD") + ggtitle("Top 10 event types with the most economic damages")
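In the three plots, the bars run from largest to smallest because reorder() rearranges the factor levels of EVTYPE by the plotted value. A small base R illustration of what that call does, using invented totals and a plain minus sign in place of dplyr's desc():

```r
# Invented totals, for illustration only
totals <- data.frame(
  EVTYPE     = c("HEAT", "TORNADO", "FLOOD"),
  FATALITIES = c(4, 12, 2)
)

# reorder() sorts levels by ascending score, so negating the values
# puts the event type with the largest total first
ordered_types <- reorder(totals$EVTYPE, -totals$FATALITIES)
print(levels(ordered_types))
```

ggplot2 draws discrete axes in factor-level order, which is why the level order rather than the row order controls the bar positions.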

When comparing the plots for human fatalities and injuries, we can see that tornados are by far the most costly events in human terms. Among the rest of the top 10, the order changes between the two plots, but the 7 deadliest event types also appear in the injuries plot, suggesting some correlation between deaths and injuries.

Things change when we look at the plot of economic damage: floods are the most harmful events in economic terms, followed by hurricanes and tornados. Since floods and hurricanes don’t cause nearly as many deaths as tornados, one possible reading is that much of their damage falls on crops rather than on homes, which would mean that houses are relatively well prepared for floods and hurricanes but not for tornados. Because crop and property damage were summed into one variable, this analysis cannot confirm that reading; if it holds, it could be a key consideration when building new houses.
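The split between crop and property damage can be checked directly by aggregating the two components separately instead of as one sum. The sketch below shows the idea on toy rows; the values, the PROP_USD and CROP_USD columns, and CROP_SHARE are all invented for illustration and say nothing about the real totals:

```r
# Toy rows with multipliers already resolved (invented values)
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "TORNADO"),
  PROPDMG    = c(10, 5, 2),
  PROPFACTOR = c(1e6, 1e3, 1e6),
  CROPDMG    = c(1, 0, 0),
  CROPFACTOR = c(1e6, 1, 1)
)

# Keep the two damage components in separate columns
toy$PROP_USD <- toy$PROPDMG * toy$PROPFACTOR
toy$CROP_USD <- toy$CROPDMG * toy$CROPFACTOR

# Total each component per event type
split_totals <- aggregate(cbind(PROP_USD, CROP_USD) ~ EVTYPE, data = toy, FUN = sum)

# Fraction of each event type's total damage that falls on crops
split_totals$CROP_SHARE <- split_totals$CROP_USD /
  (split_totals$PROP_USD + split_totals$CROP_USD)
print(split_totals)
```

Applied to NOAA_economic, the crop share for floods and hurricanes would show whether the crop-damage interpretation is supported by the data.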