This study aims to answer two questions:
Across the United States, which types of events are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
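The Storm Data are available at the URL used in the code below, which downloads the compressed CSV file and reads it into R:
# Download the compressed CSV (mode = "wb" keeps the binary file intact on Windows)
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(url, destfile = "data.csv.bz2", mode = "wb")
# read.csv() decompresses .bz2 files transparently
data <- read.csv("data.csv.bz2")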
To answer the first question, we create a new variable, "DMG2PPL" (damage to people), equal to the sum of fatalities and injuries recorded for each event.
data$DMG2PPL <- data$FATALITIES + data$INJURIES
Similarly, to answer the second question, we create a variable called "ECODMG" (economic damage) that sums property and crop damage. This takes more steps, because each damage amount (PROPDMG, CROPDMG) is stored together with a separate exponent code (PROPDMGEXP, CROPDMGEXP) that gives its multiplier. We proceed as follows:
Create a multiplier variable for each damage type, assigning:
H = 100
K = 1000
M = 1000000
B = 1000000000
1 for any other case.
Multiply each damage amount by its multiplier.
Sum the resulting property and crop damage.
# Map each exponent code (case-insensitive) to a multiplier; any other code gets 1
data$PROPDMGMULT <- ifelse(toupper(data$PROPDMGEXP) == "H", 100,
                    ifelse(toupper(data$PROPDMGEXP) == "K", 1000,
                    ifelse(toupper(data$PROPDMGEXP) == "M", 1000000,
                    ifelse(toupper(data$PROPDMGEXP) == "B", 1000000000, 1))))
data$CROPDMGMULT <- ifelse(toupper(data$CROPDMGEXP) == "H", 100,
                    ifelse(toupper(data$CROPDMGEXP) == "K", 1000,
                    ifelse(toupper(data$CROPDMGEXP) == "M", 1000000,
                    ifelse(toupper(data$CROPDMGEXP) == "B", 1000000000, 1))))
# Total economic damage in USD: crop damage plus property damage
data$ECODMG <- (data$CROPDMG * data$CROPDMGMULT) + (data$PROPDMG * data$PROPDMGMULT)
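The nested ifelse() calls above can also be written as a named lookup vector, which keeps the code-to-multiplier table in one place. The sketch below assumes a hypothetical helper, exp_to_multiplier(), and runs it on made-up rows rather than the Storm Data. One difference worth noting: the nested ifelse() returns NA when the code itself is NA, whereas this helper returns 1 for NA as well as for any unrecognized code.

```r
# Hypothetical helper: maps damage-exponent codes to numeric multipliers
# using a named lookup vector instead of nested ifelse() calls.
exp_to_multiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  # Indexing by name returns NA for codes not in the table ("", "+", "0", NA, ...)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Toy example (made-up values, not from the Storm Data)
toy <- data.frame(PROPDMG = c(2.5, 10, 3, 7),
                  PROPDMGEXP = c("K", "m", "", "B"),
                  CROPDMG = c(0, 1, 4, 0),
                  CROPDMGEXP = c("", "k", "H", "?"))

# Same formula as ECODMG: property and crop damage expanded and summed
toy$ECODMG <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP) +
  toy$CROPDMG * exp_to_multiplier(toy$CROPDMGEXP)
print(toy$ECODMG)
```

A lookup vector also makes it easy to extend the mapping (for example, to numeric exponent codes) without adding another level of nesting.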
We are now ready to answer the research questions.
The dataset records 985 distinct event types. For readability, we analyze only the 10 most harmful ones.
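The number of event types corresponds to length(unique(data$EVTYPE)). Because EVTYPE is free text, some labels differ only in letter case or in leading/trailing whitespace, so the raw count overstates the number of genuinely different events. The sketch below uses made-up labels, not the real data, to show how a simple normalization (an assumption of this example, not a step in the analysis above) changes the count:

```r
# Toy EVTYPE values (made up) that mimic inconsistent labelling
evtype <- c("TORNADO", "Tornado", " TORNADO", "FLOOD", "FLOOD", "HAIL")

# Number of distinct labels exactly as recorded
n_raw <- length(unique(evtype))

# Number of distinct labels after trimming whitespace and upper-casing
n_clean <- length(unique(toupper(trimws(evtype))))

cat("Raw event types:", n_raw, "\n")
cat("Normalized event types:", n_clean, "\n")
```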
# Load ggplot2 and scales
library(ggplot2)
library(scales)
# Summarize the data by EVTYPE and filter the top ten
total_dmg <- aggregate(DMG2PPL ~ EVTYPE, data = data, sum)
top10_dmg <- head(total_dmg[order(-total_dmg$DMG2PPL), ], 10)
# Create the bar chart for the top 10 most harmful events for people
ggplot(top10_dmg, aes(x = reorder(EVTYPE, -DMG2PPL), y = DMG2PPL)) +
geom_bar(stat = "identity", fill = "red") + # Blood red color
geom_text(aes(label = scales::label_number(big.mark = ".", decimal.mark = ",")(DMG2PPL)),
vjust = -0.3, color = "black", size = 3) + # Format numbers with points as thousand separators
labs(title = "Top 10 Most Harmful Atmospheric Events for People (1950-2011)",
x = "Event Type",
y = "Total Damage to People (Fatalities + Injuries)") +
theme_minimal(base_size = 10) + # Set base font size for readability
theme(axis.text.y = element_blank(), # Remove y-axis labels
axis.ticks.y = element_blank(), # Optionally remove y-axis ticks
axis.text.x = element_text(angle = 45, hjust = 1, size = 6), # X-axis text size
axis.title.x = element_text(size = 8), # X-axis title size
axis.title.y = element_text(size = 8), # Y-axis title size
plot.title = element_text(size = 10)) # Title size
The chart shows that tornadoes are by far the most harmful event type for people in the U.S. over the analyzed period, having killed or injured 96,979 people.
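Because DMG2PPL adds fatalities and injuries together, a single total can hide how differently event types behave: an event type may rank high mainly because of injuries rather than deaths. One way to inspect the two components is to aggregate them side by side. The sketch below runs on made-up rows with the same column names as the Storm Data; the by_type name is hypothetical, and applying it to data is left as an assumption rather than a result reported here.

```r
# Toy rows (made up) with the same column names as the Storm Data
toy <- data.frame(EVTYPE = c("TORNADO", "TORNADO", "HEAT", "FLOOD"),
                  FATALITIES = c(2, 1, 5, 0),
                  INJURIES = c(10, 4, 1, 3))

# Sum fatalities and injuries separately for each event type
by_type <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Add the combined total and sort from most to least harmful
by_type$DMG2PPL <- by_type$FATALITIES + by_type$INJURIES
by_type <- by_type[order(-by_type$DMG2PPL), ]
print(by_type)
```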
# Summarize the data by EVTYPE
total_econ_dmg <- aggregate(ECODMG ~ EVTYPE, data = data, sum)
# Order the data by ECODMG in descending order and select the top 10
top10_econ_dmg <- head(total_econ_dmg[order(-total_econ_dmg$ECODMG), ], 10)
# Create the bar chart for the top 10 most harmful events for the economy
ggplot(top10_econ_dmg, aes(x = reorder(EVTYPE, -ECODMG), y = ECODMG)) +
geom_bar(stat = "identity", fill = "steelblue") + # Winter blue color
geom_text(aes(label = scales::label_number(scale = 1e-9, suffix = "B")(ECODMG)),
vjust = -0.3, color = "black", size = 3) + # Format as billions, text size 3
labs(title = "Top 10 Most Economically Harmful Atmospheric Events in the U.S. (1950-2011)",
x = "Event Type",
y = "Total Economic Damage ($)") +
theme_minimal(base_size = 10) + # Set base font size to 10
theme(axis.text.y = element_blank(), # Remove y-axis labels
axis.ticks.y = element_blank(), # Optionally remove y-axis ticks
axis.text.x = element_text(angle = 45, hjust = 1, size = 6), # X-axis text size 6
axis.title.x = element_text(size = 8),
axis.title.y = element_text(size = 8),
plot.title = element_text(size = 10))
The chart shows that floods caused the most economic damage in the U.S. over the period considered, with more than USD 150 billion in combined property and crop damage.
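ECODMG likewise merges two components. Splitting it back into property and crop damage shows whether an event type's ranking is driven mainly by damage to property or to crops. The sketch below uses made-up rows whose column names match those created earlier; the PROPUSD, CROPUSD, TOTALUSD and econ names are hypothetical, and running it on the full data set is an assumption, not a result reported here.

```r
# Toy rows (made up); multipliers already expanded as in the steps above
toy <- data.frame(EVTYPE = c("FLOOD", "FLOOD", "DROUGHT", "HAIL"),
                  PROPDMG = c(5, 2, 0, 1),
                  PROPDMGMULT = c(1e9, 1e6, 1, 1e3),
                  CROPDMG = c(1, 0, 3, 2),
                  CROPDMGMULT = c(1e6, 1, 1e9, 1e3))

# Convert both damage columns to USD
toy$PROPUSD <- toy$PROPDMG * toy$PROPDMGMULT
toy$CROPUSD <- toy$CROPDMG * toy$CROPDMGMULT

# Sum property and crop damage separately per event type, then rank by the total
econ <- aggregate(cbind(PROPUSD, CROPUSD) ~ EVTYPE, data = toy, FUN = sum)
econ$TOTALUSD <- econ$PROPUSD + econ$CROPUSD
econ <- econ[order(-econ$TOTALUSD), ]
print(econ)
```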