Synopsis.

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events.

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

Some documentation of the database is available, describing how some of the variables are constructed or defined. We recommend checking it out:

  1. National Weather Service Storm Data Documentation.

  2. National Climatic Data Center Storm Events FAQ.

The packages used are:

library(knitr)
library(data.table)
library(ggplot2)
library(dplyr)
library(R.utils)
library(ggpubr)
library(ggthemes)
library(ggeasy)

Data Processing.

The data come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the link.

url_1 <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

setwd("C:/Users/aleja/Documents/Cursos/Coursera R pratices/RepData_PeerAssessment2")

ifelse(!dir.exists(file.path(getwd(), "Data")), 
       dir.create(file.path(getwd(), "Data")), FALSE)
## [1] FALSE

If the result of the code above is TRUE, there was no directory called “Data” and it has just been created. If it is FALSE, a directory called “Data” already existed.
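
The same check can also be written with a plain `if`, since `ifelse()` is intended for vectors. A minimal sketch, using a temporary path as a stand-in for the project folder; the helper name `ensure_dir` is hypothetical and not part of the report:

```r
# Hypothetical helper: create a directory only if it is missing.
# Returns TRUE if the directory was created, FALSE if it already existed.
ensure_dir <- function(path) {
  if (dir.exists(path)) {
    return(FALSE)
  }
  dir.create(path, recursive = TRUE)
}

# Demonstration in a temporary location (an assumption for this sketch;
# the report itself uses file.path(getwd(), "Data")).
data_dir <- tempfile("Data_")
first_call  <- ensure_dir(data_dir)  # directory is created
second_call <- ensure_dir(data_dir)  # directory already exists
```

Here `first_call` corresponds to the TRUE case described above and `second_call` to the FALSE case.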

download.file(url = url_1, destfile = file.path("./Data", "StormData.csv.bz2"), 
              method = "curl")
bunzip2("./Data/StormData.csv.bz2", "./Data/StormData.csv");list.files(path = "./Data")
fread("./Data/StormData.csv")-> df
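
As an aside, base R file connections decompress bzip2 transparently, so a `.csv.bz2` file can also be read without a separate `bunzip2()` step. The sketch below shows this on a toy file; the file and its two rows are made up for illustration and are not taken from the storm data. For a file as large as the storm database, `fread()` on the decompressed CSV is likely to be considerably faster.

```r
# Write a tiny CSV through a bzip2 connection (toy data, for illustration only).
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(1, 0))
bz_path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(bz_path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path as a file connection, which detects the
# compression and decompresses on the fly.
toy_back <- read.csv(bz_path, stringsAsFactors = FALSE)
```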

str(df); head(df)
## Classes 'data.table' and 'data.frame':   902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
##  - attr(*, ".internal.selfref")=<externalptr>
##    STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1:       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2:       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3:       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4:       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5:       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6:       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##     EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1: TORNADO         0                                               0         NA
## 2: TORNADO         0                                               0         NA
## 3: TORNADO         0                                               0         NA
## 4: TORNADO         0                                               0         NA
## 5: TORNADO         0                                               0         NA
## 6: TORNADO         0                                               0         NA
##    END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1:         0                      14.0   100 3   0          0       15    25.0
## 2:         0                       2.0   150 2   0          0        0     2.5
## 3:         0                       0.1   123 2   0          0        2    25.0
## 4:         0                       0.0   100 2   0          0        2     2.5
## 5:         0                       0.0   150 2   0          0        2     2.5
## 6:         0                       1.5   177 2   0          0        6     2.5
##    PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1:          K       0                                         3040      8812
## 2:          K       0                                         3042      8755
## 3:          K       0                                         3340      8742
## 4:          K       0                                         3458      8626
## 5:          K       0                                         3412      8642
## 6:          K       0                                         3450      8748
##    LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1:       3051       8806              1
## 2:          0          0              2
## 3:          0          0              3
## 4:          0          0              4
## 5:          0          0              5
## 6:          0          0              6

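The data.table contains many NA values and many columns that are not needed here, so we keep only the variables relevant to each question. For population health, we total fatalities and injuries by event type.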

df %>% 
     select(EVTYPE, FATALITIES) %>% 
     group_by(EVTYPE) %>% 
     summarise(total_fatalities = sum(FATALITIES)) %>% 
     arrange(-total_fatalities)-> Total_fatal_events; head(Total_fatal_events, 10)
## # A tibble: 10 x 2
##    EVTYPE         total_fatalities
##    <chr>                     <dbl>
##  1 TORNADO                    5633
##  2 EXCESSIVE HEAT             1903
##  3 FLASH FLOOD                 978
##  4 HEAT                        937
##  5 LIGHTNING                   816
##  6 TSTM WIND                   504
##  7 FLOOD                       470
##  8 RIP CURRENT                 368
##  9 HIGH WIND                   248
## 10 AVALANCHE                   224
df %>% 
     select(EVTYPE, INJURIES) %>% 
     group_by(EVTYPE) %>% 
     summarise(total_injure = sum(INJURIES)) %>% 
     arrange(-total_injure) -> Total_injure_events; head(Total_injure_events, 10)
## # A tibble: 10 x 2
##    EVTYPE            total_injure
##    <chr>                    <dbl>
##  1 TORNADO                  91346
##  2 TSTM WIND                 6957
##  3 FLOOD                     6789
##  4 EXCESSIVE HEAT            6525
##  5 LIGHTNING                 5230
##  6 HEAT                      2100
##  7 ICE STORM                 1975
##  8 FLASH FLOOD               1777
##  9 THUNDERSTORM WIND         1488
## 10 HAIL                      1361
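
The injuries table lists TSTM WIND and THUNDERSTORM WIND as separate event types, so totals for what appears to be the same kind of event are split across labels. The report keeps the raw EVTYPE values; the sketch below only illustrates how such labels could be merged before summing. The toy rows and the one-entry mapping are assumptions for illustration, and the helper name `clean_evtype` is hypothetical.

```r
# Toy rows (made up) with inconsistent spellings of the same event type.
toy <- data.frame(
  EVTYPE   = c("TSTM WIND", "Thunderstorm Wind", " THUNDERSTORM WIND", "TORNADO"),
  INJURIES = c(3, 2, 1, 10),
  stringsAsFactors = FALSE
)

# Normalise case and whitespace, then apply a small hand-written mapping
# (hypothetical; a real mapping would need to follow the NWS documentation).
clean_evtype <- function(x) {
  x <- toupper(trimws(x))
  x[x == "TSTM WIND"] <- "THUNDERSTORM WIND"
  x
}
toy$EVTYPE <- clean_evtype(toy$EVTYPE)

# Sum injuries per cleaned event type and sort in decreasing order.
totals <- aggregate(INJURIES ~ EVTYPE, data = toy, FUN = sum)
totals <- totals[order(-totals$INJURIES), ]
```

Applied to the real data, a step like this would change the totals and possibly the ranking, so it should be read as a possible refinement rather than as what the tables above show.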

The data record two types of economic impact: property damage (PROPDMG) and crop damage (CROPDMG). Damage costs are in US dollars, and the magnitude of each value is given by an exponent symbol in PROPDMGEXP or CROPDMGEXP, coded as follows:

H, h -> hundreds = x100
K -> thousands = x1,000
M, m -> millions = x1,000,000
B -> billions = x1,000,000,000
0 to 8 -> x10
(+) -> x1
(-) -> x0
(?) -> x0
blank -> x0
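
As a worked illustration of this coding, a damage value of 25 with exponent K stands for 25 x 1,000 = 25,000 dollars. A minimal sketch of the lookup on made-up rows; the names `symbols`, `multipliers` and `PROP_USD` are hypothetical, and only some of the symbols are included:

```r
# Symbol-to-multiplier lookup for a subset of the codes listed above.
symbols     <- c("H", "h", "K", "M", "m", "B", "+", "-", "?", "")
multipliers <- c(1e2, 1e2, 1e3, 1e6, 1e6, 1e9, 1, 0, 0, 0)

# Toy rows (made up for illustration).
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 5),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# match() finds the position of each symbol in the lookup;
# a symbol that is not in the lookup would give NA.
toy$PROP_USD <- toy$PROPDMG * multipliers[match(toy$PROPDMGEXP, symbols)]
```

With these rows, `PROP_USD` is 25,000, 2,500,000, 1,200,000,000 and 0 respectively.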

To learn more about the variables and the structure of the data, see the documentation page.

df %>% select(EVTYPE, PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP)->df_damage 

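# Explicit symbol-to-multiplier lookup for the values found in PROPDMGEXP.
# Listing the symbols by hand avoids relying on sort(), whose order
# depends on the locale. Symbols not listed here get an NA multiplier.
Symbol <- c("", "-", "?", "+", as.character(0:8), "B", "h", "H", "K", "m", "M")
Multiplier <- c(0, 0, 0, 1, rep(10, 9), 10^9, 10^2, 10^2, 10^3, 10^6, 10^6)
convert_Multiplier <- data.frame(Symbol, Multiplier)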

df_damage$Prop_Multiplier <- convert_Multiplier$Multiplier[match(df_damage$PROPDMGEXP, convert_Multiplier$Symbol)]
df_damage$Crop_Multiplier <- convert_Multiplier$Multiplier[match(df_damage$CROPDMGEXP, convert_Multiplier$Symbol)]

df_d <- df_damage %>% 
     mutate(PROPDMG = PROPDMG*Prop_Multiplier) %>% 
     mutate(CROPDMG = CROPDMG*Crop_Multiplier) %>% 
     mutate(TOTAL_DMG = PROPDMG+CROPDMG)

df_d %>% 
     group_by(EVTYPE) %>% 
     summarize(Total_cost_Dmg = sum(TOTAL_DMG, na.rm=T), 
               Property_cost_Dmg = sum(PROPDMG, na.rm=T), 
               Crop_cost_Dmg = sum(CROPDMG, na.rm=T))%>% 
     arrange(-Property_cost_Dmg, -Crop_cost_Dmg, -Total_cost_Dmg)->df_Cost_Damage
head(df_Cost_Damage, 10)
## # A tibble: 10 x 4
##    EVTYPE            Total_cost_Dmg Property_cost_Dmg Crop_cost_Dmg
##    <chr>                      <dbl>             <dbl>         <dbl>
##  1 FLOOD               150319678250      144657709800    5661968450
##  2 HURRICANE/TYPHOON    71913712800       69305840000    2607872800
##  3 TORNADO              57352117607       56937162897     414954710
##  4 STORM SURGE          43323541000       43323536000          5000
##  5 FLASH FLOOD          17562132111       16140815011    1421317100
##  6 HAIL                 18757611527       15732269877    3025537650
##  7 HURRICANE            14610229010       11868319010    2741910000
##  8 TROPICAL STORM        8382236550        7703890550     678346000
##  9 WINTER STORM          6715441260        6688497260      26944000
## 10 HIGH WIND             5908617580        5270046280     638571300
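
Because the table is ordered by property damage first, its rows are not strictly in order of total cost: HAIL has a larger total than FLASH FLOOD but appears below it. The short sketch below re-ranks a few rows copied from the table above by total cost. It uses only these displayed rows, so it says nothing about event types outside the table; the column name `Total_billion_usd` is hypothetical.

```r
# A few rows copied from the table above.
shown <- data.frame(
  EVTYPE = c("STORM SURGE", "FLASH FLOOD", "HAIL", "HURRICANE"),
  Total_cost_Dmg = c(43323541000, 17562132111, 18757611527, 14610229010),
  stringsAsFactors = FALSE
)

# Order by total cost, largest first, and express it in billions of dollars.
by_total <- shown[order(shown$Total_cost_Dmg, decreasing = TRUE), ]
by_total$Total_billion_usd <- round(by_total$Total_cost_Dmg / 1e9, 1)
```

In this ordering HAIL (about 18.8 billion) moves ahead of FLASH FLOOD (about 17.6 billion).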

Results.

g1 <- ggplot(Total_fatal_events[1:10,], 
            aes(x=reorder(EVTYPE, -total_fatalities), y=total_fatalities) )

g1 + 
     geom_bar(stat="identity", fill="brown2")+
     theme_foundation() +
     theme(axis.text.x = element_text(angle=90, vjust=0.5))+
     ggtitle("Top 10 Natural Disasters by Deaths.") +
     labs(x="Natural Disaster Event.", y = "Total Deaths")+
     theme(plot.title = element_text(hjust = 0.5))->g1
g1

g2 <- ggplot(Total_injure_events[1:10,], 
             aes(x=reorder(EVTYPE, -total_injure), y=total_injure) )

g2 + 
     geom_bar(stat="identity", fill="brown2")+
     theme_foundation() +
     theme(axis.text.x = element_text(angle=90, vjust=0.5))+
     ggtitle("Top 10 Natural Disasters by Injuries.") +
     labs(x="Natural Disaster Event.", y = "Total Injuries")+
     theme(plot.title = element_text(hjust = 0.5))->g2
g2

g3<- ggplot(df_Cost_Damage[1:10,], 
            aes(x=reorder(EVTYPE, -Total_cost_Dmg), y=Total_cost_Dmg) )

g3 + 
     geom_bar(stat="identity", fill="brown2")+
     theme_foundation() +
     theme(axis.text.x = element_text(angle=90, vjust=0.5))+
     ggtitle("Top 10 Natural Disasters by Damage Costs.") +
     labs(x="Natural Disaster Event.", y = "Total Damage Cost (USD)")+
     theme(plot.title = element_text(hjust = 0.5))->g3
g3

g11 <- ggplot(Total_fatal_events[1:10,], 
             aes(x=reorder(EVTYPE, -total_fatalities), y=total_fatalities) )

g11 + 
     geom_bar(stat="identity", fill="brown2")+
     theme_tufte() +
     theme(axis.text.x = element_text(angle=45, vjust=0.5))+
     labs(x="Natural Disaster Event.", y = "Total Deaths")+
     ylim(0, 100000) ->g11

g22 <- ggplot(Total_injure_events[1:10,], 
             aes(x=reorder(EVTYPE, -total_injure), y=total_injure) )

g22 + 
     geom_bar(stat="identity", fill="brown2")+
     theme_tufte() +
     theme(axis.text.x = element_text(angle=45, vjust=0.5)) +
     labs(x="Natural Disaster Event.", y = "Total Injuries")+
     ylim(0,100000) ->g22

g33<- ggplot(df_Cost_Damage[1:10,], 
            aes(x=reorder(EVTYPE, -Total_cost_Dmg), y=Total_cost_Dmg) )

g33 + 
     geom_bar(stat="identity", fill="brown2")+
     theme_tufte() +
     theme(axis.text.x = element_text(angle=45, vjust=0.5))+
     ggtitle("Top 10 Natural Disasters by Damage Costs.") +
     labs(x="Natural Disaster Event.", y = "Total Damage Cost (USD)")+
     theme(plot.title = element_text(hjust = 0.5))->g33


fig1<- ggarrange(g33,
                 ggarrange(g11,g22, ncol = 2, labels = c("Deaths", "Injuries"), hjust = -2),
                 nrow = 2, labels = "Total Costs")
fig1

Which types of events are most harmful to population health?

Tornadoes are the most harmful events for population health: they caused more fatalities (5,633) and more injuries (91,346) than any other type of weather event.

Which types of events have the greatest economic consequences?

The events with the greatest economic consequences are, in decreasing order of total cost, floods, hurricanes/typhoons, tornadoes, and storm surges.