The goal of this assignment is to explore the NOAA Storm Database and answer two questions about severe weather events: which types of events are most harmful to population health, and which have the greatest economic consequences.
First, the necessary libraries are loaded.
library(readr)
library(tidyverse)
library(reshape2)
The data file is downloaded if it is not already present, then loaded into a data frame.
if (!file.exists("stormData.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                destfile = "stormData.csv.bz2", method = "curl")
}
stormData <- read_csv("stormData.csv.bz2")
We then strip the data frame down to only the columns needed for the analysis.
df <- stormData %>% select('EVTYPE', 'FATALITIES', 'INJURIES', 'PROPDMG', 'CROPDMG')
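Because the column selection fails if any of these names is absent, it can help to check for them first. The following is only a sketch: `check_columns` is a hypothetical helper that is not part of the original analysis, and the one-row `toy` data frame stands in for `stormData`.

```r
# Hypothetical helper (not part of the original analysis):
# stop early with a clear message if expected columns are absent.
check_columns <- function(data, required) {
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

required <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")

# Toy stand-in for stormData; the values are invented for illustration
toy <- data.frame(EVTYPE = "TORNADO", FATALITIES = 0, INJURIES = 0,
                  PROPDMG = 0, CROPDMG = 0)

check_columns(toy, required)
```

In the real analysis the same call would be made on `stormData` before the `select()` step.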
Now we are ready to address the two questions. For the first question we group the data by event type and sum the fatalities and injuries. These are distinct kinds of harm with no common scale, so they are kept as separate totals rather than combined into one figure. We then look at the top ten event types, ranked by total fatalities. As the assignment does not ask for any specific figures, the results are only visualised.
q1 <- df %>%
  select(EVTYPE, FATALITIES, INJURIES) %>%
  group_by(EVTYPE) %>%
  # Name the totals directly in summarise() instead of renaming afterwards
  summarise(TOTAL_FATALITIES = sum(FATALITIES),
            TOTAL_INJURIES = sum(INJURIES)) %>%
  # desc() takes a single vector, so each sort key gets its own call
  arrange(desc(TOTAL_FATALITIES), desc(TOTAL_INJURIES)) %>%
  head(10)
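To make the aggregation step concrete, here is a minimal sketch of the same group, sum, sort and truncate pattern on a small made-up table. The `toy` data and its values are invented for illustration and are not NOAA figures.

```r
library(dplyr)

# Hypothetical toy data; values are invented, not taken from the NOAA database
toy <- tibble(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(10, 0, 5, 2, 1)
)

toy_top <- toy %>%
  group_by(EVTYPE) %>%
  summarise(TOTAL_FATALITIES = sum(FATALITIES),
            TOTAL_INJURIES   = sum(INJURIES)) %>%
  # Sort by fatalities first; injuries only break ties
  arrange(desc(TOTAL_FATALITIES), desc(TOTAL_INJURIES)) %>%
  head(2)

print(toy_top)
```

Here the two rows kept are TORNADO (5 fatalities) and HEAT (4 fatalities); FLOOD, with 1 fatality, is dropped by `head(2)`.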
Next, the data is reshaped into long (tidy) format, with one row per event type and kind of harm, which is suitable for plotting.
melted1 <- q1 %>% melt(id.vars = "EVTYPE")
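As an illustration of what the reshaping does, the sketch below melts a two-row table. The `wide` data frame is made up for this example; only `melt()` from reshape2 and its `id.vars` argument are taken from the analysis.

```r
library(reshape2)

# Hypothetical wide table; values are invented for illustration
wide <- data.frame(
  EVTYPE           = c("TORNADO", "HEAT"),
  TOTAL_FATALITIES = c(5, 4),
  TOTAL_INJURIES   = c(15, 2),
  stringsAsFactors = FALSE
)

# Each measure column becomes rows in a variable/value pair of columns
long <- melt(wide, id.vars = "EVTYPE")

print(long)
```

The result has four rows and the columns `EVTYPE`, `variable` and `value`, which is the shape that the `fill` aesthetic in the later plots relies on.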
The data is now ready to answer the first question (see Results). A similar preparation process is used for the second question. Here I could have simply added damage to property and damage to crops together but, as already noted, no figures are required for this analysis, and showing each contribution separately is more demonstrative in the chart that follows.
q2 <- df %>%
  select(EVTYPE, PROPDMG, CROPDMG) %>%
  group_by(EVTYPE) %>%
  # Name the totals directly in summarise() instead of renaming afterwards
  summarise(DamageToProperty = sum(PROPDMG),
            DamageToCrops = sum(CROPDMG)) %>%
  # desc() takes a single vector, so each sort key gets its own call
  arrange(desc(DamageToProperty), desc(DamageToCrops)) %>%
  head(10)
melted2 <- q2 %>% melt(id.vars = "EVTYPE")
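For comparison, the alternative mentioned above (adding the two kinds of damage into a single total) could be sketched as follows. This is not the approach used in this analysis; `toy_damage`, its values and the `TotalDamage` column are all made up for illustration.

```r
library(dplyr)

# Hypothetical per-event totals; values and units are invented
toy_damage <- tibble(
  EVTYPE           = c("TORNADO", "FLOOD", "HAIL"),
  DamageToProperty = c(300, 90, 70),
  DamageToCrops    = c(10, 17, 58)
)

toy_total <- toy_damage %>%
  # Collapse the two kinds of damage into one figure per event type
  mutate(TotalDamage = DamageToProperty + DamageToCrops) %>%
  arrange(desc(TotalDamage))

print(toy_total)
```

A single total gives one ranking, but it hides how much of each event type's damage falls on property and how much on crops, which is the detail the stacked chart is meant to show.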
For the first question (which types of events are most harmful to population health), tornadoes are by far the most harmful.
plot1 <- ggplot(melted1, aes(x = reorder(EVTYPE, -value), y = value, fill = as.factor(variable))) +
  geom_col() +
  coord_flip()
plot1 +
  labs(x = "Event Type", y = "Count") +
  scale_fill_discrete(name = "Type of Harm", labels = c("Total Fatalities", "Total Injuries"))
For the second question (which types of events have the greatest economic consequences), tornadoes are again in the lead.
plot2 <- ggplot(melted2, aes(x = reorder(EVTYPE, -value), y = value, fill = as.factor(variable))) +
  geom_col() +
  coord_flip()
# The y axis shows summed damage values, not a count; the legend title is set once via scale_fill_discrete()
plot2 +
  labs(x = "Event Type", y = "Damage (sum of recorded PROPDMG and CROPDMG values)") +
  scale_fill_discrete(name = "Type of Damage", labels = c("Damage to Property", "Damage to Crops"))