According to the data supplied by the U.S. National Oceanic and Atmospheric Administration (NOAA) on health and economic impacts, tornadoes are the leading cause of injuries and deaths from meteorological events in the United States, with a total of 5,636 deaths and 91,407 injuries in the period covered. For economic damage, tornadoes also rank first for property damage, with USD 3.216 million in the tally used here, while hail ranks first for crop damage, with USD 0.586 million (586 thousand dollars).
# Helper function that installs any missing packages and then loads them all.
paqs <- function(pkg) {
  new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
  if (length(new.pkg))
    install.packages(new.pkg, dependencies = TRUE)
  sapply(pkg, require, character.only = TRUE)
}
packages <- c("tidyverse", "viridis") # packages used in the project
paqs(packages) # install (if needed) and load the packages
# With the data previously downloaded to the working directory, we load it
if (!exists("storm")) {
  storm <- read.csv(bzfile("repdata_data_StormData.csv.bz2"), header = TRUE)
}
# Select the variables of interest
variables <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
storm <- storm[, variables]
str(storm)
## 'data.frame': 902297 obs. of 7 variables:
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
Of the seven variables, four are numeric and three are character strings. Next, we compute a statistical summary of each variable.
summary(storm)
##     EVTYPE            FATALITIES          INJURIES            PROPDMG
##  Length:902297      Min.   :  0.0000   Min.   :   0.0000   Min.   :   0.00
##  Class :character   1st Qu.:  0.0000   1st Qu.:   0.0000   1st Qu.:   0.00
##  Mode  :character   Median :  0.0000   Median :   0.0000   Median :   0.00
##                     Mean   :  0.0168   Mean   :   0.1557   Mean   :  12.06
##                     3rd Qu.:  0.0000   3rd Qu.:   0.0000   3rd Qu.:   0.50
##                     Max.   :583.0000   Max.   :1700.0000   Max.   :5000.00
##   PROPDMGEXP           CROPDMG         CROPDMGEXP
##  Length:902297      Min.   :  0.000   Length:902297
##  Class :character   1st Qu.:  0.000   Class :character
##  Mode  :character   Median :  0.000   Mode  :character
##                     Mean   :  1.527
##                     3rd Qu.:  0.000
##                     Max.   :990.000
apply(is.na(storm), 2, sum)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 0 0 0 0 0 0 0
There are no missing values in the dataset.
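As a small illustration of this check, the sketch below counts missing values per column on a toy data frame. The data frame `toy` and its values are made up for the example and are not taken from the storm data. It also shows that empty strings are not counted as `NA`, which is why a column such as CROPDMGEXP can contain many blank entries and still report zero missing values.

```r
# Hypothetical toy data frame; the values are made up for illustration.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "FLOOD"),
  INJURIES   = c(15, NA, 2),
  CROPDMGEXP = c("", "K", ""),
  stringsAsFactors = FALSE
)

# Per-column count of missing values (same counts as apply(is.na(toy), 2, sum)).
na_counts <- colSums(is.na(toy))
print(na_counts)

# Empty strings are not NA, so they have to be counted separately.
blank_counts <- sapply(toy, function(col) sum(col == "", na.rm = TRUE))
print(blank_counts)
```

If blank unit codes matter for an analysis, a count like `blank_counts` is one way to see how many there are before deciding how to treat them.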
First we preprocess the EVTYPE variable: we create a new variable in the data set that holds the main weather event categories, matching event names with regular expressions through the grep() function.
# New variable
storm$EVENTS <- "OTHER"
# Assigning event categories; later assignments overwrite earlier ones (e.g. "THUNDERSTORM WIND" ends up as STORM)
storm$EVENTS[grep("WIND", storm$EVTYPE, ignore.case = TRUE)] <- "WIND"
storm$EVENTS[grep("TORNADO", storm$EVTYPE, ignore.case = TRUE)] <- "TORNADO"
storm$EVENTS[grep("HEAT", storm$EVTYPE, ignore.case = TRUE)] <- "HEAT"
storm$EVENTS[grep("FLOOD", storm$EVTYPE, ignore.case = TRUE)] <- "FLOOD"
storm$EVENTS[grep("SNOW", storm$EVTYPE, ignore.case = TRUE)] <- "SNOW"
storm$EVENTS[grep("STORM", storm$EVTYPE, ignore.case = TRUE)] <- "STORM"
storm$EVENTS[grep("WINTER", storm$EVTYPE, ignore.case = TRUE)] <- "WINTER"
storm$EVENTS[grep("RAIN", storm$EVTYPE, ignore.case = TRUE)] <- "RAIN"
storm$EVENTS[grep("HAIL", storm$EVTYPE, ignore.case = TRUE)] <- "HAIL"
# The variable EVTYPE is no longer necessary, so we drop it from the data set
storm <- storm[, -1]
# Inspecting the EVENTS variable
table(storm$EVENTS)
##
## FLOOD HAIL HEAT OTHER RAIN SNOW STORM TORNADO WIND WINTER
## 82689 290401 2648 48970 12241 17636 113086 60699 254323 19604
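Because each `grep()` assignment overwrites the previous ones, the order of the patterns decides the category of names that match more than one pattern. The sketch below packs the same sequence into a loop so the effect can be tried on a few names. The helper `categorise_events()` and the sample names are assumptions made for illustration; the report itself uses the individual assignments shown above.

```r
# Patterns in the same order as the assignments above; later ones win.
event_patterns <- c("WIND", "TORNADO", "HEAT", "FLOOD", "SNOW",
                    "STORM", "WINTER", "RAIN", "HAIL")

# Hypothetical helper: map raw event names to one of the categories.
categorise_events <- function(evtype, patterns = event_patterns) {
  events <- rep("OTHER", length(evtype))
  for (p in patterns) {
    # Every name matching the current pattern is (re)assigned to it.
    events[grepl(p, evtype, ignore.case = TRUE)] <- p
  }
  events
}

# Small hand-picked sample of event names, for illustration only.
sample_types <- c("TSTM WIND", "THUNDERSTORM WINDS", "Winter Storm",
                  "FLASH FLOOD", "HEAVY RAIN", "LIGHTNING")
result <- categorise_events(sample_types)
print(data.frame(EVTYPE = sample_types, EVENTS = result))
```

In this sample, "THUNDERSTORM WINDS" is labelled STORM rather than WIND, and "Winter Storm" is labelled WINTER rather than STORM, purely because of the pattern order.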
The PROPDMGEXP and CROPDMGEXP variables encode the monetary units of the damage figures with several different symbols, so we will convert both to a single monetary scale.
# Number of observations per symbol in PROPDMGEXP and CROPDMGEXP.
sort(table(storm$PROPDMGEXP), decreasing = TRUE)[1:10]
##
##             K      M      0      B      5      1      2      ?      m
## 465934 424665  11330    216     40     28     25     13      8      7
sort(table(storm$CROPDMGEXP), decreasing = TRUE)
##
##             K      M      k      0      B      ?      2      m
## 618413 281832   1994     21     19      9      7      1      1
The variables PROPDMGEXP and CROPDMGEXP contain the symbols for the monetary units of PROPDMG and CROPDMG respectively:
K = thousands of dollars (10^3)
M = millions of dollars (10^6)
B = billions of dollars (10^9)
We will treat all other symbols as plain dollar units, that is, 10^0.
We start by recoding PROPDMGEXP and CROPDMGEXP into their numeric exponents. Once they are recoded, we multiply PROPDMG and CROPDMG by 10 raised to the corresponding exponent, which puts all amounts in dollars. The results are stored in two new variables, PROPTOTALDMG and CROPTOTALDMG, which hold the total property damage and the total crop damage respectively.
###############PROPDMGEXP##########################################################
storm$PROPDMGEXP <- as.character(storm$PROPDMGEXP) #we first convert this variable to type character.
storm$PROPDMGEXP[is.na(storm$PROPDMGEXP)] <- 0 #If there are missing values, we will convert it to an exponent of 0
storm$PROPDMGEXP[!grepl("K|M|B", storm$PROPDMGEXP, ignore.case = TRUE)] <- 0
storm$PROPDMGEXP[grep("K", storm$PROPDMGEXP, ignore.case = TRUE)] <- "3"
storm$PROPDMGEXP[grep("M", storm$PROPDMGEXP, ignore.case = TRUE)] <- "6"
storm$PROPDMGEXP[grep("B", storm$PROPDMGEXP, ignore.case = TRUE)] <- "9"
storm$PROPDMGEXP <- as.numeric(as.character(storm$PROPDMGEXP))
storm$PROPTOTALDMG <- storm$PROPDMG * 10^storm$PROPDMGEXP #New variable
###############CROPDMGEXP#########################################################
storm$CROPDMGEXP <- as.character(storm$CROPDMGEXP) #we first convert this variable to type character.
storm$CROPDMGEXP[is.na(storm$CROPDMGEXP)] <- 0
storm$CROPDMGEXP[!grepl("K|M|B", storm$CROPDMGEXP, ignore.case = TRUE)] <- 0
storm$CROPDMGEXP[grep("K", storm$CROPDMGEXP, ignore.case = TRUE)] <- "3"
storm$CROPDMGEXP[grep("M", storm$CROPDMGEXP, ignore.case = TRUE)] <- "6"
storm$CROPDMGEXP[grep("B", storm$CROPDMGEXP, ignore.case = TRUE)] <- "9"
storm$CROPDMGEXP <- as.numeric(as.character(storm$CROPDMGEXP))
storm$CROPTOTALDMG <- storm$CROPDMG * 10^storm$CROPDMGEXP #New variable
knitr::kable(storm[1:10, ], caption = "First ten rows of dataset storm", label = "Table 1", align = "c")
| FATALITIES | INJURIES | PROPDMG | PROPDMGEXP | CROPDMG | CROPDMGEXP | EVENTS | PROPTOTALDMG | CROPTOTALDMG |
|---|---|---|---|---|---|---|---|---|
| 0 | 15 | 25.0 | 3 | 0 | 0 | TORNADO | 25000 | 0 |
| 0 | 0 | 2.5 | 3 | 0 | 0 | TORNADO | 2500 | 0 |
| 0 | 2 | 25.0 | 3 | 0 | 0 | TORNADO | 25000 | 0 |
| 0 | 2 | 2.5 | 3 | 0 | 0 | TORNADO | 2500 | 0 |
| 0 | 2 | 2.5 | 3 | 0 | 0 | TORNADO | 2500 | 0 |
| 0 | 6 | 2.5 | 3 | 0 | 0 | TORNADO | 2500 | 0 |
| 0 | 1 | 2.5 | 3 | 0 | 0 | TORNADO | 2500 | 0 |
| 0 | 0 | 2.5 | 3 | 0 | 0 | TORNADO | 2500 | 0 |
| 1 | 14 | 25.0 | 3 | 0 | 0 | TORNADO | 25000 | 0 |
| 0 | 0 | 25.0 | 3 | 0 | 0 | TORNADO | 25000 | 0 |
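The recoding of the exponent symbols can also be written as a single lookup. The sketch below is one possible way to do it, under the same convention as the report (K, M and B in either case map to 3, 6 and 9; anything else maps to 0). The helper name `exp_to_power()` and the sample values are hypothetical.

```r
# Hypothetical helper: turn an exponent symbol into a power of ten.
# K/M/B (any case) map to 3/6/9; everything else, including NA, maps to 0.
exp_to_power <- function(symbol) {
  lookup <- c(K = 3, M = 6, B = 9)
  power <- unname(lookup[toupper(as.character(symbol))])
  power[is.na(power)] <- 0
  power
}

# Made-up values for illustration.
dmg     <- c(25, 2.5, 1.2, 10, 7)
dmg_exp <- c("K", "m", "B", "", "5")

# Damage expressed in dollars.
total <- dmg * 10^exp_to_power(dmg_exp)
print(total)
```

With the sample values, 25 with "K" becomes 25000 dollars and 2.5 with "m" becomes 2.5 million dollars, while the entries with "" and "5" stay at their face value.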
To answer the question "Across the United States, which types of events are most harmful to population health?", we sum the INJURIES and FATALITIES variables by type of event.
DMGPOP <- storm %>%
  group_by(EVENTS) %>%
  summarise(INJURIES = sum(INJURIES, na.rm = TRUE),
            FATALITIES = sum(FATALITIES, na.rm = TRUE),
            # the two columns above are already per-event totals at this point
            FATALITIES_AND_INJURIES = INJURIES + FATALITIES
  ) %>%
  arrange(desc(FATALITIES_AND_INJURIES))
DMGPOP <- as.data.frame(DMGPOP)
knitr::kable(DMGPOP, caption = "Deaths and injuries by type of meteorological event", label = "Table 2", align = "c")
| EVENTS | INJURIES | FATALITIES | FATALITIES_AND_INJURIES |
|---|---|---|---|
| TORNADO | 91407 | 5636 | 97043 |
| OTHER | 12224 | 2626 | 14850 |
| HEAT | 9224 | 3138 | 12362 |
| FLOOD | 8602 | 1524 | 10126 |
| WIND | 8906 | 1204 | 10110 |
| STORM | 5338 | 416 | 5754 |
| WINTER | 1891 | 278 | 2169 |
| HAIL | 1467 | 45 | 1512 |
| SNOW | 1164 | 164 | 1328 |
| RAIN | 305 | 114 | 419 |
Tornadoes caused the most injuries to the population of the United States, with 91,407, and also the highest number of deaths, with 5,636.
These results are easier to see in the bar graphs below.
ggplot(DMGPOP, aes(x = reorder(EVENTS, -INJURIES), y = INJURIES, fill = EVENTS)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = INJURIES), vjust = -0.50) +
  labs(x = "EVENTS", y = "INJURIES") +
  ggtitle("Injuries caused by type of weather event in the US") +
  theme_classic() + # complete theme first, so the axis tweak below is not overridden
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
Injuries caused by type of weather event in the US
ggplot(DMGPOP, aes(x = reorder(EVENTS, -FATALITIES), y = FATALITIES, fill = EVENTS)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = FATALITIES), vjust = -0.50) +
  labs(x = "EVENTS", y = "FATALITIES") +
  ggtitle("Fatalities caused by type of weather event in the US") +
  theme_classic() + # complete theme first, so the axis tweak below is not overridden
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
Fatalities caused by type of weather event in the US
It can clearly be seen that tornadoes have been the most damaging to the US population for both deaths and human injuries.
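The grouping step behind Table 2 can be tried on a handful of rows. The sketch below uses base R's `aggregate()` instead of dplyr so that it runs without extra packages; the data frame `toy` and its values are made up and only mirror the column names of the report.

```r
# Made-up records for illustration; not taken from the storm data.
toy <- data.frame(
  EVENTS     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  INJURIES   = c(15, 3, 2, 0, 4),
  FATALITIES = c(1, 2, 0, 1, 0)
)

# Sum both columns within each event category.
health <- aggregate(cbind(INJURIES, FATALITIES) ~ EVENTS, data = toy, FUN = sum)

# Combined count, then order from most to least harmful.
health$FATALITIES_AND_INJURIES <- health$INJURIES + health$FATALITIES
health <- health[order(-health$FATALITIES_AND_INJURIES), ]
print(health)
```

The combined column is computed from the per-event totals, which is the same quantity the `summarise()` call in the report produces.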
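Now, to answer the question "Across the United States, which types of events have the greatest economic consequences?", we group the PROPDMG and CROPDMG data by type of weather event and add a new column with the combined total for each event. Note that the code sums the PROPDMG and CROPDMG columns as recorded, not the exponent-scaled PROPTOTALDMG and CROPTOTALDMG columns created earlier.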
ECONOMICDMG <- storm %>%
  group_by(EVENTS) %>%
  summarise(PropertyDMG = round(sum(PROPDMG, na.rm = TRUE) / 1000000, 3),
            HarvestDMG = round(sum(CROPDMG, na.rm = TRUE) / 1000000, 3),
            TOTAL_DMG = PropertyDMG + HarvestDMG
  ) %>%
  arrange(desc(TOTAL_DMG))
ECONOMICDMG <- as.data.frame(ECONOMICDMG)
knitr::kable(ECONOMICDMG, caption = "Economic consequences by type of weather event in the United States (millions of dollars)", label = "Table 3", align = "c")
| EVENTS | PropertyDMG | HarvestDMG | TOTAL_DMG |
|---|---|---|---|
| TORNADO | 3.216 | 0.100 | 3.316 |
| FLOOD | 2.434 | 0.364 | 2.798 |
| WIND | 1.797 | 0.133 | 1.930 |
| STORM | 1.478 | 0.097 | 1.575 |
| HAIL | 0.699 | 0.586 | 1.285 |
| OTHER | 0.896 | 0.079 | 0.975 |
| SNOW | 0.151 | 0.002 | 0.153 |
| WINTER | 0.151 | 0.002 | 0.153 |
| RAIN | 0.059 | 0.013 | 0.072 |
| HEAT | 0.003 | 0.001 | 0.004 |
Again, tornadoes appear as the weather event with the highest economic impact on property in the United States, with 3.216 million dollars in Table 3. For crops, the greatest damage was caused by hail, with 0.586 million dollars.
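Table 3 is built from the PROPDMG and CROPDMG columns as recorded, while the dollar amounts scaled by their exponents are stored in PROPTOTALDMG and CROPTOTALDMG. The sketch below shows, on made-up rows, how the same kind of summary could be computed from the scaled columns. It is an illustration under that assumption and says nothing about what the ranking would be on the real data.

```r
# Made-up records; the column names mirror the ones created earlier.
toy <- data.frame(
  EVENTS       = c("TORNADO", "FLOOD", "HAIL", "FLOOD"),
  PROPTOTALDMG = c(25000, 2.5e6, 5000, 1e9),
  CROPTOTALDMG = c(0, 5e5, 2e6, 0)
)

# Sum the dollar amounts per event and express them in millions.
econ <- aggregate(cbind(PROPTOTALDMG, CROPTOTALDMG) ~ EVENTS, data = toy, FUN = sum)
econ$PropertyDMG <- round(econ$PROPTOTALDMG / 1e6, 3)
econ$HarvestDMG  <- round(econ$CROPTOTALDMG / 1e6, 3)
econ$TOTAL_DMG   <- econ$PropertyDMG + econ$HarvestDMG

# Keep the report's column layout, ordered by total damage.
econ <- econ[order(-econ$TOTAL_DMG),
             c("EVENTS", "PropertyDMG", "HarvestDMG", "TOTAL_DMG")]
print(econ)
```

In the toy data a single billion-dollar record dominates the totals, which is the kind of effect the exponent columns can have on a ranking.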
Now we use a bar graph for a better appreciation of the data.
ggplot(ECONOMICDMG, aes(x = reorder(EVENTS, -PropertyDMG), y = PropertyDMG, fill = EVENTS)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = PropertyDMG), vjust = -0.50) +
  labs(x = "EVENTS", y = "Millions of dollars (USD$)") +
  ggtitle("Economic damage to property by type of meteorological event in the United States (Millions of dollars USD$)") +
  theme_classic() + # complete theme first, so the axis tweak below is not overridden
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
Economic damage to property by type of meteorological event in the United States
ggplot(ECONOMICDMG, aes(x = reorder(EVENTS, -HarvestDMG), y = HarvestDMG, fill = EVENTS)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = HarvestDMG), vjust = -0.50) +
  labs(x = "EVENTS", y = "Millions of dollars (USD$)") +
  ggtitle("Economic damage to crops by type of meteorological event in the United States (Millions of dollars USD$)") +
  theme_classic() + # complete theme first, so the axis tweak below is not overridden
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
Economic damage to crops by type of meteorological event in the United States
The graphs corroborate the information presented above: for property damage, tornadoes are the leading cause, and for crop damage, hail leads.