Exploring the U.S. National Oceanic and Atmospheric Administration (NOAA) Storm Database: Health and Economic Impacts

Synopsis

According to the storm database of the U.S. National Oceanic and Atmospheric Administration (NOAA), tornadoes are the leading cause of deaths and injuries from meteorological events in the United States, with 5,636 deaths and 91,407 injuries over the period covered. In this analysis tornadoes also rank first for property damage, with an aggregate value of 3.216, while hail ranks first for crop damage, with 0.586; the Results section shows how these figures are computed.

Packages used

# Helper that installs any missing packages and then loads all of them.
paqs <- function(pkg) {
        new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
        if (length(new.pkg))
                install.packages(new.pkg, dependencies = TRUE)
        sapply(pkg, require, character.only = TRUE)
}

packages <- c("tidyverse", "viridis") # packages used in the project

paqs(packages)

Import data for analysis

# The data file was previously downloaded to the working directory; it is loaded only once.

if (!exists("storm")) {
    storm <- read.csv(bzfile("repdata_data_StormData.csv.bz2"), header = TRUE)
}

# Selecting the variables of interest

variables <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
storm <- storm[, variables]

Exploratory analysis

str(storm)
## 'data.frame':    902297 obs. of  7 variables:
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...

Of the 7 variables, four are numeric and three are character strings. Next, we compute a statistical summary of each variable.

summary(storm)
##     EVTYPE            FATALITIES          INJURIES            PROPDMG       
##  Length:902297      Min.   :  0.0000   Min.   :   0.0000   Min.   :   0.00  
##  Class :character   1st Qu.:  0.0000   1st Qu.:   0.0000   1st Qu.:   0.00  
##  Mode  :character   Median :  0.0000   Median :   0.0000   Median :   0.00  
##                     Mean   :  0.0168   Mean   :   0.1557   Mean   :  12.06  
##                     3rd Qu.:  0.0000   3rd Qu.:   0.0000   3rd Qu.:   0.50  
##                     Max.   :583.0000   Max.   :1700.0000   Max.   :5000.00  
##   PROPDMGEXP           CROPDMG         CROPDMGEXP       
##  Length:902297      Min.   :  0.000   Length:902297     
##  Class :character   1st Qu.:  0.000   Class :character  
##  Mode  :character   Median :  0.000   Mode  :character  
##                     Mean   :  1.527                     
##                     3rd Qu.:  0.000                     
##                     Max.   :990.000

Checking whether there are missing values in the dataset

apply(is.na(storm), 2, sum)
##     EVTYPE FATALITIES   INJURIES    PROPDMG PROPDMGEXP    CROPDMG CROPDMGEXP 
##          0          0          0          0          0          0          0

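There are no missing (NA) values in the dataset. This check counts only NA; empty strings, such as those in CROPDMGEXP, are not counted as missing.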
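Since `is.na()` detects only `NA`, empty strings like the ones visible in CROPDMGEXP in the `str()` output go unnoticed by the check above. The following is a minimal sketch of how both could be counted, using a made-up data frame (the name `toy` and its values are illustrative assumptions, not rows from the storm data):

```r
# Hypothetical toy data, not taken from the storm database
toy <- data.frame(
  CROPDMG    = c(0, 5, 10),
  CROPDMGEXP = c("", "K", NA),
  stringsAsFactors = FALSE
)

# NA values per column (the same quantity the check above reports)
na_counts <- colSums(is.na(toy))

# Empty strings per column; NA entries are not treated as empty
empty_counts <- sapply(toy, function(col) sum(!is.na(col) & col == ""))

na_counts
empty_counts
```

Whether empty strings should be treated as missing depends on the variable; in this report they are later mapped to an exponent of 0.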

Processing data

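First we preprocess the EVTYPE variable. We create a new variable in the data set, EVENTS, that groups the records into the main weather event categories by matching keywords in EVTYPE with the grep() function. Records that match no keyword stay in the OTHER category, and when an event type matches several keywords, the last matching assignment wins.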

# New variable; every record starts in the OTHER category
storm$EVENTS <- "OTHER"

# Assigning event categories
storm$EVENTS[grep("WIND", storm$EVTYPE, ignore.case = TRUE)] <- "WIND"
storm$EVENTS[grep("TORNADO", storm$EVTYPE, ignore.case = TRUE)] <- "TORNADO"
storm$EVENTS[grep("HEAT", storm$EVTYPE, ignore.case = TRUE)] <- "HEAT"
storm$EVENTS[grep("FLOOD", storm$EVTYPE, ignore.case = TRUE)] <- "FLOOD"
storm$EVENTS[grep("SNOW", storm$EVTYPE, ignore.case = TRUE)] <- "SNOW"
storm$EVENTS[grep("STORM", storm$EVTYPE, ignore.case = TRUE)] <- "STORM"
storm$EVENTS[grep("WINTER", storm$EVTYPE, ignore.case = TRUE)] <- "WINTER"
storm$EVENTS[grep("RAIN", storm$EVTYPE, ignore.case = TRUE)] <- "RAIN"
storm$EVENTS[grep("HAIL", storm$EVTYPE, ignore.case = TRUE)] <- "HAIL"

# EVTYPE is no longer needed, so it is dropped from the data set (it is the first column)

storm <- storm[, -1]

# Inspecting the EVENTS variable
table(storm$EVENTS)
## 
##   FLOOD    HAIL    HEAT   OTHER    RAIN    SNOW   STORM TORNADO    WIND  WINTER 
##   82689  290401    2648   48970   12241   17636  113086   60699  254323   19604
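The keyword assignment above is order dependent: an event type that contains several keywords ends up in the category assigned last. The sketch below illustrates this with a few made-up EVTYPE values (the vector `evtype` is a hypothetical example, and only four of the keywords are used, in the same relative order as above):

```r
# Hypothetical event labels chosen so that some match more than one keyword
evtype <- c("THUNDERSTORM WIND", "WINTER STORM", "TORNADO", "FOG")

events <- rep("OTHER", length(evtype))

# Later keywords overwrite earlier ones, as in the sequence of grep() calls above
for (keyword in c("WIND", "TORNADO", "STORM", "WINTER")) {
  events[grepl(keyword, evtype, ignore.case = TRUE)] <- keyword
}

data.frame(evtype, events)
```

Here "THUNDERSTORM WIND" is first labeled WIND and then relabeled STORM, while "WINTER STORM" ends up as WINTER, so the category counts depend on the order of the rules.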

The PROPDMG and CROPDMG amounts are recorded in different monetary units, indicated by PROPDMGEXP and CROPDMGEXP, so we will convert both to a common unit (dollars).

# Number of observations for each symbol of PROPDMGEXP and CROPDMGEXP.

sort(table(storm$PROPDMGEXP), decreasing = TRUE)[1:10]
## 
##             K      M      0      B      5      1      2      ?      m 
## 465934 424665  11330    216     40     28     25     13      8      7
sort(table(storm$CROPDMGEXP), decreasing = TRUE)
## 
##             K      M      k      0      B      ?      2      m 
## 618413 281832   1994     21     19      9      7      1      1

The variables PROPDMGEXP and CROPDMGEXP contain the symbols for the monetary units of PROPDMG and CROPDMG respectively:

  • K = thousands of dollars (10^3)

  • M = millions of dollars (10^6)

  • B = billions of dollars (10^9)

We will treat all other symbols, including the empty string, as dollar units, that is, 10^0.
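The mapping from symbol to power of ten can be sketched with a named lookup vector. This is an alternative formulation for illustration, not the grep-based recoding used in this report, and the vectors `exp_symbol` and `magnitude` are made-up values:

```r
# Hypothetical exponent symbols and damage magnitudes for illustration
exp_symbol <- c("K", "m", "B", "", "?", "5")
magnitude  <- c(25, 2.5, 1, 10, 3, 4)

# Named lookup vector: symbol -> power of ten
powers <- c(K = 3, M = 6, B = 9)

# Look up the upper-cased symbol; anything not found gets exponent 0
exponent <- unname(powers[toupper(exp_symbol)])
exponent[is.na(exponent)] <- 0

# Damage expressed in dollars
dollars <- magnitude * 10^exponent
dollars
```

With this rule, 25 with symbol K becomes 25,000 dollars, and values with an empty or unrecognized symbol keep their recorded magnitude.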

We start by recoding the PROPDMGEXP and CROPDMGEXP variables as numeric exponents according to their symbol. We then multiply PROPDMG and CROPDMG by 10 raised to the corresponding exponent, which puts all amounts in dollars. The results are stored in two new variables, PROPTOTALDMG and CROPTOTALDMG, which hold the total property damage and the total crop damage of each record respectively.

###############PROPDMGEXP##########################################################
storm$PROPDMGEXP <- as.character(storm$PROPDMGEXP) #we first convert this variable to type character.

storm$PROPDMGEXP[is.na(storm$PROPDMGEXP)] <- 0 #If there are missing values, we will convert it to an exponent of 0

storm$PROPDMGEXP[!grepl("K|M|B", storm$PROPDMGEXP, ignore.case = TRUE)] <- 0 
storm$PROPDMGEXP[grep("K", storm$PROPDMGEXP, ignore.case = TRUE)] <- "3"
storm$PROPDMGEXP[grep("M", storm$PROPDMGEXP, ignore.case = TRUE)] <- "6"
storm$PROPDMGEXP[grep("B", storm$PROPDMGEXP, ignore.case = TRUE)] <- "9"
storm$PROPDMGEXP <- as.numeric(as.character(storm$PROPDMGEXP))

storm$PROPTOTALDMG <- storm$PROPDMG * 10^storm$PROPDMGEXP #New variable

###############CROPDMGEXP#########################################################
storm$CROPDMGEXP <- as.character(storm$CROPDMGEXP) #we first convert this variable to type character.

storm$CROPDMGEXP[is.na(storm$CROPDMGEXP)] <- 0 

storm$CROPDMGEXP[!grepl("K|M|B", storm$CROPDMGEXP, ignore.case = TRUE)] <- 0 
storm$CROPDMGEXP[grep("K", storm$CROPDMGEXP, ignore.case = TRUE)] <- "3"
storm$CROPDMGEXP[grep("M", storm$CROPDMGEXP, ignore.case = TRUE)] <- "6"
storm$CROPDMGEXP[grep("B", storm$CROPDMGEXP, ignore.case = TRUE)] <- "9"
storm$CROPDMGEXP <- as.numeric(as.character(storm$CROPDMGEXP))

storm$CROPTOTALDMG <- storm$CROPDMG * 10^storm$CROPDMGEXP #New variable

knitr::kable(storm[1:10, ], caption = "First ten rows of dataset storm", label = "Table 1", align = "c")
First ten rows of dataset storm

FATALITIES  INJURIES  PROPDMG  PROPDMGEXP  CROPDMG  CROPDMGEXP  EVENTS   PROPTOTALDMG  CROPTOTALDMG
0           15        25.0     3           0        0           TORNADO  25000         0
0           0         2.5      3           0        0           TORNADO  2500          0
0           2         25.0     3           0        0           TORNADO  25000         0
0           2         2.5      3           0        0           TORNADO  2500          0
0           2         2.5      3           0        0           TORNADO  2500          0
0           6         2.5      3           0        0           TORNADO  2500          0
0           1         2.5      3           0        0           TORNADO  2500          0
0           0         2.5      3           0        0           TORNADO  2500          0
1           14        25.0     3           0        0           TORNADO  25000         0
0           0         25.0     3           0        0           TORNADO  25000         0

Results

To answer the question of which types of events are most harmful to population health across the United States, we sum the INJURIES and FATALITIES variables by type of event.

DMGPOP <- storm %>% 
        group_by(EVENTS) %>% 
        summarise(INJURIES = sum(INJURIES, na.rm = TRUE),
                  FATALITIES = sum(FATALITIES, na.rm = TRUE),
                  # INJURIES and FATALITIES here are the per-event totals computed just above
                  FATALITIES_AND_INJURIES = INJURIES + FATALITIES
                  ) %>% 
        arrange(desc(FATALITIES_AND_INJURIES))
DMGPOP <- as.data.frame(DMGPOP)
knitr::kable(DMGPOP, caption = " Deaths and injuries by type of meteorological event", label = "Table 2", align = "c")
Deaths and injuries by type of meteorological event

EVENTS   INJURIES  FATALITIES  FATALITIES_AND_INJURIES
TORNADO  91407     5636        97043
OTHER    12224     2626        14850
HEAT     9224      3138        12362
FLOOD    8602      1524        10126
WIND     8906      1204        10110
STORM    5338      416         5754
WINTER   1891      278         2169
HAIL     1467      45          1512
SNOW     1164      164         1328
RAIN     305       114         419

The weather event that caused the most injuries in the United States was the tornado, with 91,407 injuries. Tornadoes also caused the highest number of deaths, with 5,636.
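To put these counts in proportion, the share of each event type can be computed directly from the table above. The sketch below copies the table values by hand, so the vector names are illustrative and the numbers are only as accurate as that table:

```r
# Values copied from the deaths and injuries table above
events     <- c("TORNADO", "OTHER", "HEAT", "FLOOD", "WIND",
                "STORM", "WINTER", "HAIL", "SNOW", "RAIN")
injuries   <- c(91407, 12224, 9224, 8602, 8906, 5338, 1891, 1467, 1164, 305)
fatalities <- c(5636, 2626, 3138, 1524, 1204, 416, 278, 45, 164, 114)

# Percentage share of each event type, rounded to one decimal place
injury_share   <- setNames(round(100 * injuries / sum(injuries), 1), events)
fatality_share <- setNames(round(100 * fatalities / sum(fatalities), 1), events)

injury_share["TORNADO"]
fatality_share["TORNADO"]
```

On these figures tornadoes account for roughly 65% of all recorded injuries and about 37% of all recorded deaths.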

These results can best be seen through a bar graph as shown below.

ggplot(DMGPOP, aes(x = reorder(EVENTS, -INJURIES), y = INJURIES, fill = EVENTS)) +
        geom_col() +
        geom_text(aes(label = INJURIES), vjust = -0.50) +
        labs(x = "EVENTS", y = "INJURIES") +
        ggtitle("Injuries caused by type of weather event in the US") +
        theme_classic() +
        # theme() comes after theme_classic() so the axis text rotation is not overridden
        theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
Figure: Injuries caused by type of weather event in the US

ggplot(DMGPOP, aes(x = reorder(EVENTS, -FATALITIES), y = FATALITIES, fill = EVENTS)) +
        geom_col() +
        geom_text(aes(label = FATALITIES), vjust = -0.50) +
        labs(x = "EVENTS", y = "FATALITIES") +
        ggtitle("Fatalities caused by type of weather event in the US") +
        theme_classic() +
        # theme() comes after theme_classic() so the axis text rotation is not overridden
        theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
Figure: Fatalities caused by type of weather event in the US

It can clearly be seen that tornadoes have been the most damaging to the US population for both deaths and human injuries.

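Now, to answer the question of which types of events have the greatest economic consequences across the United States, we group the PROPDMG and CROPDMG data by type of weather event and add a column with the sum of both. Note that the sums are taken over PROPDMG and CROPDMG as recorded, without the exponent scaling stored in PROPTOTALDMG and CROPTOTALDMG, so the figures labeled as millions of dollars below are sums of recorded magnitudes divided by 10^6.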

ECONOMICDMG <- storm %>% 
  group_by(EVENTS) %>% 
  # Sums of the recorded PROPDMG and CROPDMG magnitudes, divided by 10^6
  summarise(PropertyDMG = round(sum(PROPDMG, na.rm = TRUE) / 1000000, 3),
            HarvestDMG = round(sum(CROPDMG, na.rm = TRUE) / 1000000, 3),
            TOTAL_DMG = PropertyDMG + HarvestDMG
            ) %>% 
  arrange(desc(TOTAL_DMG))

ECONOMICDMG <- as.data.frame(ECONOMICDMG)

knitr::kable(ECONOMICDMG, caption = "Economic consequences by type of weather event in the United States (millions of dollars)", label = "Table 3", align = "c")
Economic consequences by type of weather event in the United States (millions of dollars)

EVENTS   PropertyDMG  HarvestDMG  TOTAL_DMG
TORNADO  3.216        0.100       3.316
FLOOD    2.434        0.364       2.798
WIND     1.797        0.133       1.930
STORM    1.478        0.097       1.575
HAIL     0.699        0.586       1.285
OTHER    0.896        0.079       0.975
SNOW     0.151        0.002       0.153
WINTER   0.151        0.002       0.153
RAIN     0.059        0.013       0.072
HEAT     0.003        0.001       0.004

Again, the tornado is the weather event with the highest value for property damage in the United States, with 3.216 in the table above. For crop damage, the highest value corresponds to hail, with 0.586.
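The summary above sums PROPDMG and CROPDMG, whereas the columns PROPTOTALDMG and CROPTOTALDMG created earlier hold the amounts in dollars. The following sketch, on made-up rows and in base R rather than dplyr, illustrates how the two aggregations can rank event types differently; the data frame `toy` and its values are assumptions for illustration only and say nothing about the actual ranking in the storm data:

```r
# Hypothetical rows; PROPDMGEXP is already recoded to a numeric power of ten
toy <- data.frame(
  EVENTS     = c("FLOOD", "FLOOD", "TORNADO", "TORNADO"),
  PROPDMG    = c(1.5, 200, 25, 500),
  PROPDMGEXP = c(9, 3, 3, 3)
)
toy$PROPTOTALDMG <- toy$PROPDMG * 10^toy$PROPDMGEXP

# Sum of recorded magnitudes versus sum of dollar amounts, per event type
raw_sum    <- tapply(toy$PROPDMG, toy$EVENTS, sum)
dollar_sum <- tapply(toy$PROPTOTALDMG, toy$EVENTS, sum)

raw_sum
dollar_sum / 1e6   # millions of dollars
```

In this toy case TORNADO has the larger sum of recorded magnitudes, while FLOOD has the larger total in dollars because one of its records is expressed in billions.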

Now we use a bar graph for a better appreciation of the data.

# Columns are referenced by name inside aes() instead of through ECONOMICDMG$...
ggplot(ECONOMICDMG, aes(x = reorder(EVENTS, -PropertyDMG), y = PropertyDMG, fill = EVENTS)) +
        geom_col() +
        geom_text(aes(label = PropertyDMG), vjust = -0.50) +
        labs(x = "EVENTS", y = "Millions of dollars (USD$)") +
        ggtitle("Economic damage to property by type of meteorological event in the United States (Millions of dollars USD$)") +
        theme_classic() +
        theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
Figure: Economic damage to property by type of meteorological event in the United States

# Columns are referenced by name inside aes() instead of through ECONOMICDMG$...
ggplot(ECONOMICDMG, aes(x = reorder(EVENTS, -HarvestDMG), y = HarvestDMG, fill = EVENTS)) +
        geom_col() +
        geom_text(aes(label = HarvestDMG), vjust = -0.50) +
        labs(x = "EVENTS", y = "Millions of dollars (USD$)") +
        ggtitle("Economic damage to crop by type of meteorological event in the United States (Millions of dollars USD$)") +
        theme_classic() +
        theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
Figure: Economic damage to crops by type of meteorological event in the United States

The plots corroborate the figures discussed above: tornadoes are the leading cause of property damage, and hail is the leading cause of crop damage.