Exploring the NOAA storm database: severe weather consequences

Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

More information about the data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA):

Questions to be answered or approached

Across the United States:

- Which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
- Which types of events have the greatest economic consequences?

Libraries

library(dplyr)
## Warning: package 'dplyr' was built under R version 3.6.3
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.6.3

Data processing

if(!file.exists("stormData.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                destfile = "stormData.csv.bz2", method = "curl")
}

# Read the compressed CSV from the working directory

activity <- read.csv(bzfile("stormData.csv.bz2"), sep = ",", header = TRUE)

# Keep only the columns needed for each question
tidy_data_health <- select(activity, EVTYPE, FATALITIES, INJURIES)
tidy_data_property <- select(activity, BGN_DATE, EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

Results

Question 1: Which events are the most dangerous in terms of deaths and injuries in the USA?

The first question is analyzed separately for deaths and injuries.

Which event causes the most deaths?

tidy_fatalities <- aggregate(FATALITIES ~ EVTYPE, data = tidy_data_health, sum)

# Sort by fatalities, descending, and show the first 20 rows
tidy_fatalities <- arrange(tidy_fatalities, desc(FATALITIES))
head(tidy_fatalities,20)
##                     EVTYPE FATALITIES
## 1                  TORNADO       5633
## 2           EXCESSIVE HEAT       1903
## 3              FLASH FLOOD        978
## 4                     HEAT        937
## 5                LIGHTNING        816
## 6                TSTM WIND        504
## 7                    FLOOD        470
## 8              RIP CURRENT        368
## 9                HIGH WIND        248
## 10               AVALANCHE        224
## 11            WINTER STORM        206
## 12            RIP CURRENTS        204
## 13               HEAT WAVE        172
## 14            EXTREME COLD        160
## 15       THUNDERSTORM WIND        133
## 16              HEAVY SNOW        127
## 17 EXTREME COLD/WIND CHILL        125
## 18             STRONG WIND        103
## 19                BLIZZARD        101
## 20               HIGH SURF        101
# Keep only the top 10 rows
tidy_fatalities_selected <- tidy_fatalities[1:10, ]
# Convert EVTYPE to a factor so the bars keep this descending order in the plot
tidy_fatalities_selected$EVTYPE <- factor(tidy_fatalities_selected$EVTYPE, levels = tidy_fatalities_selected$EVTYPE)
ggplot(tidy_fatalities_selected, aes(x = EVTYPE, y = FATALITIES)) +
  geom_bar(stat = "identity", fill = "red") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(x = "Type of event", y = "Fatalities", title = "Fatalities for the 10 deadliest event types, 1950 to 2011")

As seen in the plot, tornadoes cause more deaths in the USA than any other weather event: 5,633 recorded fatalities, roughly three times the 1,903 attributed to excessive heat.
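
The fatalities table above lists some event types under more than one spelling, for example RIP CURRENT and RIP CURRENTS, or TSTM WIND and THUNDERSTORM WIND, and each spelling is summed separately. A minimal sketch of how such labels could be merged before aggregating is shown here. The `canonical` mapping and the `normalize_evtype` helper are hypothetical names introduced for illustration, not part of the original analysis, and the four totals are copied from the table above.

```r
# Totals copied from the fatalities table above (only the rows being merged).
fatalities <- data.frame(
  EVTYPE = c("RIP CURRENT", "RIP CURRENTS", "TSTM WIND", "THUNDERSTORM WIND"),
  FATALITIES = c(368, 204, 504, 133),
  stringsAsFactors = FALSE
)

# Hypothetical mapping from a variant spelling to one canonical label.
canonical <- c("RIP CURRENTS" = "RIP CURRENT",
               "TSTM WIND"    = "THUNDERSTORM WIND")

# Hypothetical helper: upper-case, trim, then replace known variants.
normalize_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))
  hit <- x %in% names(canonical)
  x[hit] <- unname(canonical[x[hit]])
  x
}

fatalities$EVTYPE <- normalize_evtype(fatalities$EVTYPE)

# Re-aggregate so that merged labels are summed together.
merged <- aggregate(FATALITIES ~ EVTYPE, data = fatalities, sum)
merged <- merged[order(-merged$FATALITIES), ]
print(merged)
```

With only these two pairs merged, thunderstorm wind would total 637 fatalities and rip currents 572, which shifts a few middle positions but leaves tornadoes far ahead. A full cleanup would need a mapping that covers every variant in the data, which this sketch does not attempt.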

Which event causes the most injuries?

#Find injuries per event type.
tidy_injuries <- aggregate(INJURIES ~ EVTYPE, data = tidy_data_health, sum)

# Sort by injuries, descending
tidy_injuries <- arrange(tidy_injuries, desc(INJURIES))

# Keep only the top 10 rows
tidy_injuries_selected <- tidy_injuries[1:10, ]
# Convert EVTYPE to a factor so the bars keep this descending order in the plot
tidy_injuries_selected$EVTYPE <- factor(tidy_injuries_selected$EVTYPE, levels = tidy_injuries_selected$EVTYPE)
ggplot(tidy_injuries_selected, aes(x = EVTYPE, y = INJURIES)) +
  geom_bar(stat = "identity", fill = "green") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(x = "Type of event", y = "Number of injuries", title = "Injuries for the 10 most injurious event types, 1950 to 2011")

As seen in the plot, tornadoes cause more injuries in the USA than any other weather event.

Tornadoes are therefore the most dangerous weather events in the USA for population health, ranking first in both deaths and injuries.
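
The two rankings can also be checked side by side in a single table. The sketch below uses base R `aggregate` with `cbind` to sum both outcomes per event type in one pass. The rows in `toy` are made-up values for illustration only, not figures from the NOAA data, and the `rank_*` column names are assumptions.

```r
# Made-up toy rows for illustration; NOT values from the NOAA database.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 3, 0),
  INJURIES   = c(10, 5, 1, 2, 3),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries per event type in one call.
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank each outcome separately (1 = most harmful).
health$rank_fatalities <- rank(-health$FATALITIES, ties.method = "min")
health$rank_injuries   <- rank(-health$INJURIES, ties.method = "min")

health <- health[order(health$rank_fatalities), ]
print(health)
```

Applied to `tidy_data_health` instead of `toy`, an event type with rank 1 in both columns would be the one that leads on both measures.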

Question 2: Which types of events have the greatest economic consequences?

# Analysis for property damage
# Convert the H, K, M, B exponent codes into multipliers; any other code leaves PROPDMG unchanged
tidy_prop_damage <- mutate(tidy_data_property,
  Dam_to_property = ifelse(toupper(PROPDMGEXP) == 'K', PROPDMG * 1000,
                    ifelse(toupper(PROPDMGEXP) == 'M', PROPDMG * 1000000,
                    ifelse(toupper(PROPDMGEXP) == 'B', PROPDMG * 1000000000,
                    ifelse(toupper(PROPDMGEXP) == 'H', PROPDMG * 100, PROPDMG)))))

prop_damage <- subset(tidy_prop_damage, select = c("EVTYPE", "Dam_to_property"))
prop_damage_type <- aggregate( Dam_to_property ~ EVTYPE, data = prop_damage, sum)



# Analysis for crop damage
# Convert the H, K, M, B exponent codes into multipliers; any other code leaves CROPDMG unchanged
tidy_crop_damage <- mutate(tidy_data_property,
  Dam_to_crop = ifelse(toupper(CROPDMGEXP) == 'K', CROPDMG * 1000,
                ifelse(toupper(CROPDMGEXP) == 'M', CROPDMG * 1000000,
                ifelse(toupper(CROPDMGEXP) == 'B', CROPDMG * 1000000000,
                ifelse(toupper(CROPDMGEXP) == 'H', CROPDMG * 100, CROPDMG)))))

crop_damage <- subset(tidy_crop_damage, select = c("EVTYPE", "Dam_to_crop"))
crop_damage_type <- aggregate( Dam_to_crop ~ EVTYPE, data = crop_damage, sum)

#Merge data from property and crop damage

total_damage <- merge(crop_damage_type, prop_damage_type, by="EVTYPE")
total_damage <- mutate(total_damage, Tot_damage = Dam_to_crop + Dam_to_property)

# Sort by total damage, descending, and keep the top 10
total_damage <- arrange(total_damage, desc(Tot_damage))
total_damage_selected <- total_damage[1:10, ]

# Convert EVTYPE to a factor so the bars keep this descending order in the plot
total_damage_selected$EVTYPE <- factor(total_damage_selected$EVTYPE, levels = total_damage_selected$EVTYPE)
ggplot(total_damage_selected, aes(x = EVTYPE, y = Tot_damage)) +
  geom_bar(stat = "identity", fill = "green") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(x = "Type of event", y = "Total damage (USD)", title = "Property and crop damage for the 10 costliest event types, 1950 to 2011")

As seen in the plot, floods cause more combined property and crop damage than any other weather event.
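
The nested `ifelse` calls used for the damage columns map each exponent code to a multiplier. As an alternative, the same mapping can be sketched with a named lookup vector, which keeps the codes in one place and makes it easy to inspect which codes actually occur. The helper name `exp_multiplier` and the toy rows are assumptions for illustration. Like the original code, the helper treats any code other than H, K, M, or B (including a blank) as a multiplier of 1.

```r
# Hypothetical helper: turn exponent codes into numeric multipliers.
# Unknown or blank codes fall back to 1, mirroring the nested ifelse logic.
exp_multiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Made-up toy rows for illustration; NOT values from the NOAA database.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 5, 10),
  PROPDMGEXP = c("K", "m", "B", "", "h"),
  stringsAsFactors = FALSE
)

# See which codes are present before converting.
print(table(toupper(toy$PROPDMGEXP)))

toy$Dam_to_property <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
print(toy)
```

The same helper could be applied to `CROPDMGEXP`, so that both damage columns share one definition of the unit codes.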