Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The purpose of this analysis is to identify the extreme weather events that cause the most damage to population health (fatalities and injuries) and the most material damage (property and crop damage). As will be seen, tornadoes cause the most fatalities, injuries, and property damage in the US, while hail causes the most crop damage.
# Install any missing packages, then load them
for (pkg in c("dplyr", "ggplot2", "reshape2")) {
  if (!requireNamespace(pkg, quietly = TRUE)) install.packages(pkg)
  library(pkg, character.only = TRUE)
}
data <- read.csv("Data/repdata-data-StormData.csv.bz2")
names(data)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
As we can see, the dataset has 37 variables, but we only need a subset of them. Let's therefore create a smaller dataset containing only the variables we need, i.e. EVTYPE, FATALITIES, INJURIES, PROPDMG and CROPDMG, and rename them using the camelCase convention.
data <- data %>% select(eventType = EVTYPE, fatalities = FATALITIES, injuries = INJURIES, propertyDamage = PROPDMG, cropDamage = CROPDMG)
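For readers less familiar with dplyr, `select()` with `new = OLD` pairs keeps only the listed columns and renames them in a single step. The base-R sketch below shows the same operation on a made-up two-row data frame (`raw`, `renames` and `tidy` are illustrative names, and the values are not taken from the NOAA file).

```r
# Toy stand-in for the raw storm data (values are made up for illustration)
raw <- data.frame(
  STATE = c("AL", "AL"),
  EVTYPE = c("TORNADO", "HAIL"),
  FATALITIES = c(0, 1),
  INJURIES = c(15, 0),
  PROPDMG = c(25, 2.5),
  CROPDMG = c(0, 10)
)

# Map the old upper-case names to the new camelCase names
renames <- c(EVTYPE = "eventType", FATALITIES = "fatalities",
             INJURIES = "injuries", PROPDMG = "propertyDamage",
             CROPDMG = "cropDamage")

# Keep only the needed columns (in the order given), then rename them
tidy <- raw[, names(renames)]
names(tidy) <- unname(renames)
print(tidy)
```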
head(data)
## eventType fatalities injuries propertyDamage cropDamage
## 1 TORNADO 0 15 25.0 0
## 2 TORNADO 0 0 2.5 0
## 3 TORNADO 0 2 25.0 0
## 4 TORNADO 0 2 2.5 0
## 5 TORNADO 0 2 2.5 0
## 6 TORNADO 0 6 2.5 0
To simplify both the analysis and the plotting with ggplot2, let's restructure the data from wide to long format using the function *melt*. The four damage columns are really values of a single categorical variable (the type of damage), so the resulting dataset will contain 3 variables:
- eventType
- categoryOfDamage
- valueOfTheDamage
data2 <- melt(data, id=c('eventType'), measure.vars = c('fatalities', 'injuries','cropDamage', 'propertyDamage'), variable.name = "categoryOfDamage", value.name = "valueOfTheDamage")
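To make the wide-to-long step concrete, the sketch below builds by hand, in base R, the same layout that `melt()` produces: one block of rows per measure column, with the id column repeated in each block. The toy `wide` data frame and its values are made up for illustration; in the report itself `melt()` does this work.

```r
# Toy wide data in the same shape as `data` (values are made up)
wide <- data.frame(
  eventType = c("TORNADO", "HAIL"),
  fatalities = c(2, 0),
  injuries = c(15, 1),
  cropDamage = c(0, 10),
  propertyDamage = c(25, 2.5)
)
measures <- c("fatalities", "injuries", "cropDamage", "propertyDamage")

# Stack the measure columns one after another; the id column is repeated
# once per measure, and the category records which column each value came from
long <- data.frame(
  eventType = rep(wide$eventType, times = length(measures)),
  categoryOfDamage = factor(rep(measures, each = nrow(wide)), levels = measures),
  valueOfTheDamage = unlist(wide[measures], use.names = FALSE)
)
print(long)
```

Four measure columns times two event types gives eight rows, each holding a single damage value.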
Before answering the questions, let's compare the damages per event type and damage category. To get an idea of the damage caused across the four damage categories, we first compute the average and the sum of the damage value per event type and per damage category.
- Per Event X Category
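# Average and total damage per event type and damage category (NAs ignored in both)
DamEventCat <- data2 %>% group_by(eventType, categoryOfDamage) %>% summarize(mean = mean(valueOfTheDamage, na.rm = TRUE), sum = sum(valueOfTheDamage, na.rm = TRUE))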
head(DamEventCat)
## # A tibble: 6 x 4
## # Groups: eventType [2]
## eventType categoryOfDamage mean sum
## <fct> <fct> <dbl> <dbl>
## 1 " HIGH SURF ADVISORY" fatalities 0 0
## 2 " HIGH SURF ADVISORY" injuries 0 0
## 3 " HIGH SURF ADVISORY" cropDamage 0 0
## 4 " HIGH SURF ADVISORY" propertyDamage 200 200
## 5 " COASTAL FLOOD" fatalities 0 0
## 6 " COASTAL FLOOD" injuries 0 0
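The summary above shows that some `eventType` labels carry a leading space (`" HIGH SURF ADVISORY"`), so one event type can end up split across several groups. The sketch below is a minimal illustration, assuming that trimming surrounding whitespace and upper-casing the labels is an acceptable normalization (it will not merge differently worded labels for the same phenomenon). It uses base R on made-up values; the `"eventType | categoryOfDamage"` key format is purely illustrative, and `na.rm = TRUE` is applied to both the mean and the sum.

```r
# Toy long-format data with inconsistent labels (values are made up)
long <- data.frame(
  eventType = c(" HIGH SURF ADVISORY", "HIGH SURF ADVISORY", "tornado", "TORNADO"),
  categoryOfDamage = c("propertyDamage", "propertyDamage", "injuries", "injuries"),
  valueOfTheDamage = c(200, 100, 15, NA)
)

# Normalize labels: strip surrounding whitespace, then upper-case
long$eventType <- toupper(trimws(long$eventType))

# One group per event type x damage category (illustrative key format)
key <- paste(long$eventType, long$categoryOfDamage, sep = " | ")
groups <- split(long$valueOfTheDamage, key)

# Mean and sum per group, ignoring missing values in both statistics
groupStats <- data.frame(
  group = names(groups),
  mean = sapply(groups, mean, na.rm = TRUE),
  sum = sapply(groups, sum, na.rm = TRUE),
  row.names = NULL
)
print(groupStats)
```

After normalization the two high-surf rows fall into a single group, and the missing tornado value is ignored by both statistics.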
To identify the events most harmful to population health, we'll select the 10 event types with the most fatalities and the 10 with the most injuries.
First, let's compute the top 10.
topDamEventCat <- DamEventCat %>% filter(categoryOfDamage %in% c("fatalities", "injuries")) %>%
  group_by(categoryOfDamage) %>% top_n(n = 10, wt = sum) %>% data.frame()
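`top_n(n = 10, wt = sum)` keeps, within each damage category, the rows with the ten largest `sum` values. Rows tied at the cutoff are all kept, so a category can return more than ten rows (in recent dplyr versions `top_n()` is superseded by `slice_max()`). The base-R sketch below reproduces that rule on made-up totals with `n = 2` to keep the output short; the helper `topNPerGroup` is a hypothetical name, not part of any package.

```r
# Hypothetical helper: keep the n largest `weightCol` rows within each group,
# ranking ties with the lowest shared rank so rows tied at the cutoff are kept
topNPerGroup <- function(df, groupCol, weightCol, n) {
  rowsByGroup <- split(seq_len(nrow(df)), df[[groupCol]])
  keep <- unlist(lapply(rowsByGroup, function(idx) {
    w <- df[[weightCol]][idx]
    idx[rank(-w, ties.method = "min") <= n]
  }), use.names = FALSE)
  df[sort(keep), , drop = FALSE]
}

# Made-up totals per event type and damage category
totals <- data.frame(
  eventType = c("TORNADO", "HAIL", "FLOOD", "LIGHTNING", "TORNADO", "HAIL", "FLOOD"),
  categoryOfDamage = c("fatalities", "fatalities", "fatalities", "fatalities",
                       "injuries", "injuries", "injuries"),
  sum = c(50, 5, 20, 20, 300, 10, 40)
)
top <- topNPerGroup(totals, "categoryOfDamage", "sum", n = 2)
print(top)
```

Because FLOOD and LIGHTNING tie for second place among fatalities, that category returns three rows.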
ggplot(data = topDamEventCat[topDamEventCat$categoryOfDamage == "fatalities",], aes(reorder(eventType,sum), y=sum)) +
geom_bar(stat ="identity", position = "dodge") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
coord_flip() +
labs(x = "Type of extreme event", y = "Total fatalities", title = "Top 10 extreme weather events by fatalities in the USA")
The bar plot shows that the events causing the most fatalities are, in descending order, tornadoes, excessive heat, and flash floods.
ggplot(data = topDamEventCat[topDamEventCat$categoryOfDamage == "injuries",], aes(reorder(eventType,sum), y=sum)) +
geom_bar(stat ="identity", position = "dodge") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
coord_flip() +
labs(x = "Type of extreme event", y = "Total injuries", title = "Top 10 extreme weather events by injuries in the USA")
The events causing the most injuries are, in descending order, tornadoes, TSTM (thunderstorm) wind, and floods.
To identify the events with the greatest economic consequences, we'll select the 10 event types with the most crop damage and the 10 with the most property damage.
First, let's compute the top 10.
topDamEventCat2 <- DamEventCat %>% filter(categoryOfDamage %in% c("cropDamage", "propertyDamage")) %>%
  group_by(categoryOfDamage) %>% top_n(n = 10, wt = sum) %>% data.frame()
ggplot(data = topDamEventCat2, aes(reorder(eventType,sum), y=sum)) +
geom_bar(stat ="identity", position = "dodge") +
facet_grid(categoryOfDamage~.)+
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
coord_flip() +
labs(x = "Type of extreme event", y = "Sum", title = "Top 10 extreme weather events by crop and property damage in the US")
The bar plots show that the events causing the most crop damage are, in descending order, hail, flash floods, and floods, while the events causing the most property damage are tornadoes, flash floods, and TSTM wind.
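Note that `propertyDamage` and `cropDamage` hold the raw `PROPDMG` and `CROPDMG` figures. In the NOAA file these are paired with the `PROPDMGEXP` and `CROPDMGEXP` columns listed by `names(data)`, which carry a magnitude code. The sketch below converts such pairs to dollar amounts, assuming the documented codes K = thousands, M = millions and B = billions, plus H = hundreds as an extra assumption, and treating any other or missing code as a multiplier of 1 (a simplification, not an official rule). The helper `toDollars` is a hypothetical name and the inputs are made up. Because a "B" entry is a million times larger than a "K" entry, rankings based on the raw columns can change once multipliers are applied.

```r
# Hypothetical helper: scale raw damage amounts by their magnitude codes
toDollars <- function(amount, expCode) {
  # Assumed magnitude codes (case-insensitive): H = 1e2, K = 1e3, M = 1e6, B = 1e9
  multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  code <- toupper(trimws(as.character(expCode)))
  mult <- unname(multipliers[code])
  # Any other code (digits, symbols, blank) or a missing one is treated as 1;
  # this is a simplifying assumption
  mult[is.na(mult)] <- 1
  amount * mult
}

# Made-up example values
propDmg <- c(25, 2.5, 10, 5, 1.5)
propDmgExp <- c("K", "k", "M", "", "B")
dollars <- toDollars(propDmg, propDmgExp)
print(dollars)
```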