Hello! I hope you and your family are in good health during this pandemic.

SYNOPSIS

In this study we are going to analyze the U.S. National Oceanic and Atmospheric Administration’s (NOAA) Storm Database.

The main objective of this study is to determine which meteorological events are most harmful to population health and which cause the greatest economic damage.

The study concludes that the worst economic losses are caused by floods while the greatest health impacts are caused by tornadoes.

Let’s go!

INTRODUCTION

Over the years and with the development of science, human beings have become more concerned with the study of natural phenomena. Some of these phenomena are more severe than others, which is why it is essential to be able to characterize them and use that information to take planned and informed action.

In this work, we will focus on the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which records, among other things, injuries, fatalities, property damage and event dates.

DATA FOR THE ANALYSIS

The data for the analysis can be downloaded from the website:

Dataset: Storm Data [47Mb]

With this dataset we will try to answer the following questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

DESCRIPTION OF THE DATASET

The dataset covers events from 1950 to 2011. It comprises 902297 observations (rows) and 37 variables (columns). Of these, the variables needed to evaluate the economic and health consequences of weather events are EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP, the seven columns kept in the processing step below.

GENERAL DATA DOWNLOAD

To guarantee reproducibility, which is one of the objectives of the course, we wrote code that creates a data directory inside the working directory and downloads the database into it for later analysis.

# Define the data directory and create it if it doesn't exist
Dir <- "./workf"
if(!dir.exists(Dir)){
    dir.create(Dir)
}

# Define the data file and destfile 
dest_data <- paste(Dir, "StormData.csv.bz2", sep="/")

# Downloading file
if(!file.exists(dest_data)){
    dataUrl<-"https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
    download.file(dataUrl, destfile = dest_data)
}

DATA PROCESSING

With the data downloaded, it is first loaded into R and tidied. To simplify further processing a new dataframe is created containing just the 7 relevant variables. At this stage important libraries are loaded for use with data manipulation and plotting.

Read the storm data, preprocess into a new dataframe and load libraries

library(plyr)
library(reshape2)
## Warning: package 'reshape2' was built under R version 4.0.3
#library(ggplot2)

Readstorm  <- read.csv(dest_data)

# Keep only the seven variables needed for the analysis
keep_cols  <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG",
                "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
stormframe <- Readstorm[, keep_cols]

We’ll explore and tidy the dataset, starting with the values of the property and crop exponent variables, PROPDMGEXP and CROPDMGEXP.

unique(stormframe$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
unique(stormframe$CROPDMGEXP)
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"
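
Since unique() only lists the distinct codes, it can also help to see how often each one occurs before deciding how to recode it. The following is a minimal sketch on a made-up vector; the values in `exp_codes` are illustrative and not taken from the Storm Data file.

```r
# Hypothetical sample of exponent codes (illustrative only, not real data)
exp_codes <- c("K", "K", "M", "", "k", "?", "B", "K", "")

# Count each distinct code after folding lower case into upper case;
# useNA = "ifany" would also report missing values if there were any
code_counts <- table(toupper(exp_codes), useNA = "ifany")

# Put the most frequent codes first
code_counts <- sort(code_counts, decreasing = TRUE)
print(code_counts)
```

On the real data the same call would be applied to stormframe$PROPDMGEXP and stormframe$CROPDMGEXP.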

TRANSFORMING DATA

The exponent columns mix letter designators with plain numeric powers of ten. The lower case letters are converted to upper case, since both denote the same multiplier (H = 100, K = 1,000, M = 1,000,000, B = 1,000,000,000).

We convert lower case into upper case

stormframe$PROPDMGEXP <- toupper(as.character(stormframe$PROPDMGEXP))
stormframe$CROPDMGEXP <- toupper(as.character(stormframe$CROPDMGEXP))

We assign zero (“0”) to empty values, since they carry no associated cost; for the exponent columns, a zero means a multiplier of 10^0 = 1

stormframe$CROPDMG[(stormframe$CROPDMG == "")] <- 0
stormframe$PROPDMGEXP[(stormframe$PROPDMGEXP == "")] <- 0
stormframe$CROPDMGEXP[(stormframe$CROPDMGEXP == "")] <- 0
stormframe$FATALITIES[(stormframe$FATALITIES == "")] <- 0
stormframe$INJURIES[(stormframe$INJURIES == "")] <- 0

We replace the letter codes with the corresponding powers of ten

stormframe$CROPDMGEXP[(stormframe$CROPDMGEXP == "H")] <- 2
stormframe$CROPDMGEXP[(stormframe$CROPDMGEXP == "K")] <- 3
stormframe$CROPDMGEXP[(stormframe$CROPDMGEXP == "M")] <- 6
stormframe$CROPDMGEXP[(stormframe$CROPDMGEXP == "B")] <- 9
stormframe$PROPDMGEXP[(stormframe$PROPDMGEXP == "H")] <- 2
stormframe$PROPDMGEXP[(stormframe$PROPDMGEXP == "K")] <- 3
stormframe$PROPDMGEXP[(stormframe$PROPDMGEXP == "M")] <- 6
stormframe$PROPDMGEXP[(stormframe$PROPDMGEXP == "B")] <- 9

We set the invalid exponents (“+”, “-”, “?”) to “NA” so that they can be eliminated

stormframe$CROPDMGEXP[(stormframe$CROPDMGEXP == "+")] <- "NA"
stormframe$CROPDMGEXP[(stormframe$CROPDMGEXP == "?")] <- "NA"
stormframe$PROPDMGEXP[(stormframe$PROPDMGEXP == "?")] <- "NA"
stormframe$CROPDMGEXP[(stormframe$CROPDMGEXP == "-")] <- "NA"
stormframe$PROPDMGEXP[(stormframe$PROPDMGEXP == "-")] <- "NA"
stormframe$PROPDMGEXP[(stormframe$PROPDMGEXP == "+")] <- "NA"

We convert the exponents to integers so that they can be used in arithmetic. The “NA” strings become missing values, which is what triggers the coercion warnings.

stormframe$PROPDMGEXP <- as.integer(stormframe$PROPDMGEXP)
## Warning: NAs introduced by coercion
stormframe$CROPDMGEXP <- as.integer(stormframe$CROPDMGEXP)
## Warning: NAs introduced by coercion
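
The recoding steps of this section can also be sketched as a single lookup. This is only an illustration under the same rules as the text (H = 2, K = 3, M = 6, B = 9, empty = 0, anything non-numeric = NA); the helper `exp_to_power` and the vector `exp_lookup` are hypothetical names, not part of the original analysis.

```r
# Named lookup: letter code -> power of ten, as described in this section
exp_lookup <- c(H = 2L, K = 3L, M = 6L, B = 9L)

# Hypothetical helper: convert raw exponent codes to integer powers of ten
exp_to_power <- function(codes) {
  codes <- toupper(as.character(codes))
  # Digits such as "5" convert directly; everything else becomes NA for now
  power <- suppressWarnings(as.integer(codes))
  # Letter codes are replaced through the lookup vector
  is_letter <- codes %in% names(exp_lookup)
  power[is_letter] <- exp_lookup[codes[is_letter]]
  # An empty code means a multiplier of 10^0 = 1
  power[codes == ""] <- 0L
  # Invalid codes ("+", "-", "?") remain NA
  power
}

powers <- exp_to_power(c("K", "m", "", "5", "?", "B"))
powers
# 3 6 0 5 NA 9
```

A lookup keeps all the code-to-exponent rules in one place, so adding or changing a code means editing a single vector.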

ANALYSIS OF THE DATA

We will find the total cost of the damage. To do this, we will calculate the costs associated with property and crop damage from the exponent and mantissa values.

We will calculate the crop and property damage cost

stormframe$PROPDMGTOTAL <- stormframe$PROPDMG * 10^stormframe$PROPDMGEXP
stormframe$CROPDMGTOTAL <- stormframe$CROPDMG * 10^stormframe$CROPDMGEXP

Total financial value of the damage

stormframe$TOTALDMG <- stormframe$PROPDMGTOTAL + stormframe$CROPDMGTOTAL
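
To make the arithmetic concrete, here is a small worked example on made-up rows (the data frame `toy` and its values are hypothetical). It shows the cost being computed as mantissa times 10 to the exponent, and how a missing exponent propagates into the total.

```r
# Toy rows with made-up values: mantissa plus already-converted exponent
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 10),
  PROPDMGEXP = c(3L, 6L, NA),   # K, M, and an invalid code recoded to NA
  CROPDMG    = c(0, 1, 5),
  CROPDMGEXP = c(0L, 3L, 3L)    # empty (0), K, K
)

# Cost = mantissa * 10^exponent
toy$PROPDMGTOTAL <- toy$PROPDMG * 10^toy$PROPDMGEXP
toy$CROPDMGTOTAL <- toy$CROPDMG * 10^toy$CROPDMGEXP

# Any NA component makes the sum NA
toy$TOTALDMG <- toy$PROPDMGTOTAL + toy$CROPDMGTOTAL
toy
```

The first row gives 25 * 10^3 = 25,000 and the second 2.5 * 10^6 + 1 * 10^3 = 2,501,000. In the third row the crop damage is known (5,000), but the total is NA because the property exponent is missing.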

We’ll aggregate all the data to find the totals as a function of EVTYPE for each of the summary variables Fatalities, Injuries, Property Damage, Crop Damage and Total Financial Damage.

Aggregate the data by EVTYPE, summing each of the summary variables

fatalities_EVTYPE  <- aggregate(FATALITIES ~ EVTYPE, data = stormframe, FUN=sum)
injuries_EVTYPE    <- aggregate(INJURIES ~ EVTYPE, data = stormframe, FUN=sum)
propdamage_EVTYPE  <- aggregate(PROPDMGTOTAL ~ EVTYPE, data = stormframe, FUN=sum)
cropdamage_EVTYPE  <- aggregate(CROPDMGTOTAL ~ EVTYPE, data = stormframe, FUN=sum)
sumdamage_EVTYPE   <- aggregate(TOTALDMG ~ EVTYPE, data = stormframe, FUN=sum)
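
One detail worth keeping in mind is how the formula interface of aggregate() treats missing values: by default (na.action = na.omit) rows with NA in the aggregated variable are dropped before summing, which is how the invalid exponents end up eliminated. A minimal sketch on made-up data (the data frame `toy` is hypothetical):

```r
# Toy data with one missing damage value (made-up numbers)
toy <- data.frame(
  EVTYPE   = c("FLOOD", "FLOOD", "TORNADO", "TORNADO"),
  TOTALDMG = c(1000, NA, 500, 250)
)

# Default behaviour: the NA row is dropped, FLOOD sums to 1000
by_type <- aggregate(TOTALDMG ~ EVTYPE, data = toy, FUN = sum)

# Keeping NA rows instead makes the FLOOD total NA
by_type_pass <- aggregate(TOTALDMG ~ EVTYPE, data = toy, FUN = sum,
                          na.action = na.pass)
by_type
by_type_pass
```

Dropping the rows silently is convenient here, but it also means an event type whose rows are all NA disappears from that particular summary.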

Merge these into a single dataframe by EVTYPE:

s.sum <- merge(fatalities_EVTYPE, injuries_EVTYPE, by="EVTYPE", all=TRUE)
s.sum <- merge(s.sum, propdamage_EVTYPE, by="EVTYPE", all=TRUE)
s.sum <- merge(s.sum, cropdamage_EVTYPE, by="EVTYPE", all=TRUE)
s.sum <- merge(s.sum, sumdamage_EVTYPE, by="EVTYPE", all=TRUE)
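
The chain of merge() calls can also be written as a fold over a list of data frames. The sketch below uses made-up summaries (`fat`, `inj` and `dmg` are hypothetical) and shows what all=TRUE does when an event type is missing from one of the inputs.

```r
# Toy per-event summaries (made-up values)
fat <- data.frame(EVTYPE = c("FLOOD", "TORNADO"), FATALITIES = c(1, 5))
inj <- data.frame(EVTYPE = c("FLOOD", "TORNADO"), INJURIES = c(3, 40))
dmg <- data.frame(EVTYPE = "FLOOD", TOTALDMG = 1000)  # TORNADO absent here

# Merge the list pairwise; all = TRUE keeps event types missing from any input
merged <- Reduce(function(x, y) merge(x, y, by = "EVTYPE", all = TRUE),
                 list(fat, inj, dmg))
merged
```

TORNADO is kept in the result with TOTALDMG set to NA, so no event type is lost from the health summaries just because its damage total is unavailable.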

Sort the dataframe by each of the summary variables and extract the first 15 rows…

fatalities_EVTYPE  <- s.sum[order(s.sum$FATALITIES, decreasing=TRUE),][1:15,]
injuries_EVTYPE    <- s.sum[order(s.sum$INJURIES, decreasing=TRUE),][1:15,]
propdamage_EVTYPE <- s.sum[order(s.sum$PROPDMGTOTAL, decreasing=TRUE),][1:15,]
cropdamage_EVTYPE  <- s.sum[order(s.sum$CROPDMGTOTAL, decreasing=TRUE),][1:15,]
sumdamage_EVTYPE   <- s.sum[order(s.sum$TOTALDMG, decreasing=TRUE),][1:15,]
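
The same sort-and-truncate pattern can be wrapped in a small helper. This is a sketch only: `top_n_by` is a hypothetical name and the data are made up. It uses head() rather than [1:15,], which avoids padding the result with NA rows when fewer than n rows are available.

```r
# Hypothetical helper: top n rows of df, ordered by column col (descending)
top_n_by <- function(df, col, n = 15) {
  # order() places NA values last by default, so they fall out of the top rows
  ranked <- df[order(df[[col]], decreasing = TRUE), ]
  head(ranked, n)
}

# Made-up summary table
toy <- data.frame(EVTYPE = c("HAIL", "FLOOD", "TORNADO"),
                  FATALITIES = c(2, 10, 50))

top2 <- top_n_by(toy, "FATALITIES", n = 2)
top2
```

With the analysis objects, a call such as top_n_by(s.sum, "FATALITIES") would be the equivalent of the first line above.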

RESULTS

In this section we will present the results of the previous analyses, make some exploratory graphs and observe some interesting conclusions.

Graphing the data shows that tornadoes cause by far the greatest number of both fatalities and injuries.

par(mfrow=c(1,2), mar=c(8,4,3,2), oma=c(4,2,2,2), cex=0.8)
barplot(fatalities_EVTYPE$FATALITIES, names.arg=fatalities_EVTYPE$EVTYPE, las=3,
    cex.names=0.6, xlab="", ylab="TOTAL NUMBER OF FATALITIES", col="magenta",
    main="WEATHER EVENTS WITH HIGHEST INCIDENCE OF FATALITIES")
barplot(injuries_EVTYPE$INJURIES, names.arg=injuries_EVTYPE$EVTYPE, las=3,
    cex.names=0.6, xlab="", ylab="TOTAL NUMBER OF INJURIES", col="orange",
    main="WEATHER EVENTS WITH HIGHEST INCIDENCE OF INJURIES")

Analyzing the data, we realize that the greatest financial impact to properties is caused by flooding, while the greatest financial impact to crops is caused by drought.

par(mfrow=c(1,2), mar=c(8,4,3,2), oma=c(4,2,2,2), cex=0.8)
barplot(propdamage_EVTYPE$PROPDMGTOTAL/10^6, names.arg=propdamage_EVTYPE$EVTYPE, las=3,
    cex.names=0.6, xlab="", ylab="PROPERTY DAMAGE IN USD (Millions)",
    col="blue", main="WEATHER EVENTS WITH HIGHEST COST IN PROPERTY DAMAGE")

barplot(cropdamage_EVTYPE$CROPDMGTOTAL/10^6, names.arg=cropdamage_EVTYPE$EVTYPE, las=3,
    cex.names=0.6, xlab="", ylab="CROP DAMAGE IN USD (Millions)",
    col="yellow", main="WEATHER EVENTS WITH HIGHEST COST IN CROP DAMAGE")