This report presents a simple analysis of the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The first goal is to determine which event types are the most harmful to population health. The second goal is to determine which event types cause the greatest economic damage.
This section describes every step of the data analysis. First, the data is downloaded and imported.
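# Download the bzip2-compressed CSV; mode = "wb" keeps the binary file intact on every platform
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "Fstormdata.csv", mode = "wb")
# read.csv() detects the compression and decompresses the file while reading
data <- read.csv("Fstormdata.csv")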
##The documentation for data is available here.
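Because the download is large, it can help to skip it when the file is already on disk. The following is a minimal sketch of that idea, not part of the original analysis: the helper name `fetch_if_missing`, the placeholder URL and the toy file contents are all hypothetical.

```r
# Hypothetical helper: download `url` to `dest` only when `dest` does not exist yet.
fetch_if_missing <- function(url, dest) {
  if (!file.exists(dest)) {
    # mode = "wb" writes the file in binary mode, which compressed files need
    download.file(url, destfile = dest, mode = "wb")
  }
  invisible(dest)
}

# Offline demonstration: the file already exists, so nothing is downloaded.
demo_path <- tempfile(fileext = ".csv")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3"), demo_path)
fetch_if_missing("https://example.invalid/unused.csv", demo_path)
demo <- read.csv(demo_path)
print(demo)
```

For the real data, the same helper would be called with the URL and file name used above.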
I need to address two questions:
1. Which types of events are most harmful with respect to population health across the United States?
2. Which types of events have the greatest economic consequences across the United States?
For the first question, I group the data by event type (column “EVTYPE”) and sum the two casualty columns (“FATALITIES” and “INJURIES”) to obtain the total number of people killed or injured by each event type.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Sum fatalities and injuries per event type
deadliest <- data %>%
  group_by(EVTYPE) %>%
  summarise(total_injuries = sum(FATALITIES, INJURIES, na.rm = TRUE))
# Sort from the most to the least harmful event type
deadliest <- deadliest[order(deadliest$total_injuries, decreasing = TRUE), ]
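To make the grouping step concrete, here is a small sketch on made-up data. It uses base R (`rowSums` and `tapply`) instead of dplyr so that it runs without extra packages. The toy rows and the column name `harmed` are illustrative assumptions, not values from the storm database.

```r
# Made-up rows that mimic the three columns used in the analysis
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(2, 1, 0, 4),
  INJURIES   = c(10, 0, 5, NA)
)

# Casualties per row; a missing value is ignored, as with na.rm = TRUE in sum()
toy$harmed <- rowSums(toy[, c("FATALITIES", "INJURIES")], na.rm = TRUE)

# Total per event type, sorted from the most to the least harmful
totals <- sort(tapply(toy$harmed, toy$EVTYPE, sum), decreasing = TRUE)
print(totals)
```

In this toy table TORNADO ranks first because its two rows are added together before the ranking is made.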
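For the second question, the damage amounts (columns “PROPDMG” and “CROPDMG”) are first scaled by their exponent codes (columns “PROPDMGEXP” and “CROPDMGEXP”), where K, M and B stand for thousands, millions and billions. The scaled property and crop damage are then summed per event type.
# Map the exponent codes to numeric multipliers; any other code becomes NA,
# so the corresponding damage value is dropped by na.rm = TRUE in the sum
data$new_propdmgexp <- as.numeric(as.character(factor(toupper(data$PROPDMGEXP),
                                                      levels = c('K', 'M', 'B'),
                                                      labels = c(1000,
                                                                 1000000,
                                                                 1000000000))))
data$new_cropdmgexp <- as.numeric(as.character(factor(toupper(data$CROPDMGEXP),
                                                      levels = c('K', 'M', 'B'),
                                                      labels = c(1000,
                                                                 1000000,
                                                                 1000000000))))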
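The same mapping can be written as a named lookup vector, which also makes it easy to check how many codes are not recognised. This is only a sketch with made-up exponent codes; the names `multiplier`, `codes`, `scale_factor` and `unrecognised` are hypothetical.

```r
# Multipliers for the three codes the analysis recognises
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Made-up codes: upper case, lower case, blank, and an unrecognised digit
codes <- c("K", "m", "B", "", "5")

# Indexing by name returns NA for every code that is not K, M or B
scale_factor <- unname(multiplier[toupper(codes)])
print(scale_factor)

# Share of codes that were not recognised; worth inspecting on the real data
unrecognised <- mean(is.na(scale_factor))
print(unrecognised)
```

A damage value of 2.5 with code M would then count as 2.5 * 1e6 dollars, while a value with a blank or unrecognised code is left out of the total.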
# Sum the scaled property and crop damage per event type
most_expensive <- data %>%
  group_by(EVTYPE) %>%
  summarise(total_dmg = sum(PROPDMG*new_propdmgexp, CROPDMG*new_cropdmgexp, na.rm = TRUE))
# Sort from the most to the least costly event type
most_expensive <- most_expensive[order(most_expensive$total_dmg, decreasing = TRUE), ]
The figure below shows the 10 event types with the most fatalities and injuries combined across the United States:
library(ggplot2)
# The x-axis labels are hidden; the legend identifies each event type
ggplot() + theme_classic() +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank()) +
  geom_col(aes(y = total_injuries, x = EVTYPE,
               fill = EVTYPE),
           data = deadliest[1:10, ],
           show.legend = TRUE) +
  labs(y = "Total number of fatalities and injuries",
       fill = "Event type")
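ggplot2 draws a discrete x axis in the order of the factor levels, which is alphabetical for a character column, so the bars are not sorted by height. One possible refinement, shown here only as a sketch on made-up totals, is to reorder the event types with base R's `reorder()` before plotting. The data frame `top` and its values are illustrative assumptions.

```r
# Made-up totals standing in for the top rows of the summary table
top <- data.frame(
  EVTYPE         = c("FLOOD", "TORNADO", "HEAT"),
  total_injuries = c(1, 17, 4)
)

# reorder() sorts the factor levels by the given values (ascending),
# so negating the totals yields a descending order
top$EVTYPE <- reorder(top$EVTYPE, -top$total_injuries)
print(levels(top$EVTYPE))
```

Passing a column prepared this way as the `x` and `fill` aesthetics would place the tallest bar first and list the legend entries in the same order.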
The figure below shows the 10 event types that cause the most economic damage across the United States:
ggplot() + theme_classic() +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank()) +
  geom_col(aes(y = total_dmg, x = EVTYPE,
               fill = EVTYPE),
           data = most_expensive[1:10, ],
           show.legend = TRUE) +
  labs(y = "Total amount of damage ($)",
       fill = "Event type")
## Conclusion
The most harmful event type across the United States is the tornado, with nearly 95,000 people killed or injured by this severe weather event. Floods cause the greatest economic damage across the United States.