Abstract

This study aims to answer two questions:

  1. Across the United States, which types of events are most harmful with respect to population health?

    • Tornadoes appear to be the most dangerous event type, having killed or injured 96 979 people between 1950 and 2011.
  2. Across the United States, which types of events have the greatest economic consequences?

    • Floods are the most devastating event for the economy, with a total estimated damage of more than 150B USD in the analysed period.

Data Processing

The Storm Data are available from the link embedded in the text. The following code downloads and reads them:

url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(url, destfile = "data.csv.bz2")
data <- read.csv("data.csv.bz2")
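Because the archive is large, it can help to avoid downloading it again on every run. The following is a minimal sketch of that idea, not part of the original analysis; `fetch_once` is a hypothetical helper name, and the explicit `mode = "wb"` argument is an assumption added to keep the compressed file byte-exact on all platforms.

```r
# Hypothetical helper: download `url` to `destfile` only if the file is not already present.
fetch_once <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # Binary mode so the bz2 archive is written unchanged
    download.file(url, destfile = destfile, mode = "wb")
  }
  invisible(destfile)  # return the path so it can be passed straight to read.csv()
}

# Usage (commented out because it needs network access):
# url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# data <- read.csv(fetch_once(url, "data.csv.bz2"))
```

With this pattern, a second run of the report reuses the local copy instead of fetching the file again.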

To answer the first question, we calculate a new variable, “DMG2PPL” (damage to people), that sums the number of injured and killed people.

data$DMG2PPL <- data$FATALITIES + data$INJURIES

Similarly, to answer our second question, we calculate a variable called “ECODMG” (economic damage) that sums property and crop damage. This is trickier and requires more steps, because each damage value is stored together with a code for its multiplier. We do this in the following steps:

  1. Create the multipliers variable, assigning:

    • H = 100

    • K = 1000

    • M = 1000000

    • B = 1000000000

    • 1 for any other case.

  2. Multiply the corresponding values.

  3. Sum the resulting values.

# Property damage multiplier: H, K, M or B (case-insensitive), 1 for any other code
data$PROPDMGMULT <- ifelse(toupper(data$PROPDMGEXP) == "H", 100,
                    ifelse(toupper(data$PROPDMGEXP) == "K", 1000,
                    ifelse(toupper(data$PROPDMGEXP) == "M", 1000000,
                    ifelse(toupper(data$PROPDMGEXP) == "B", 1000000000, 1))))

# Crop damage multiplier, using the same mapping
data$CROPDMGMULT <- ifelse(toupper(data$CROPDMGEXP) == "H", 100,
                    ifelse(toupper(data$CROPDMGEXP) == "K", 1000,
                    ifelse(toupper(data$CROPDMGEXP) == "M", 1000000,
                    ifelse(toupper(data$CROPDMGEXP) == "B", 1000000000, 1))))

# Total economic damage: crop and property damage, each scaled by its multiplier
data$ECODMG <- (data$CROPDMG * data$CROPDMGMULT) + (data$PROPDMG * data$PROPDMGMULT)
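The multiplier mapping described in the steps above can also be sketched with a named lookup vector, which avoids repeating the nested `ifelse()` calls for each column. This is an illustrative alternative rather than the code used for the results; `exp_to_mult` and the `toy` data frame are hypothetical names and values. Unlike the nested `ifelse()` version, this sketch also maps missing (`NA`) codes to 1.

```r
# Hypothetical helper: convert exponent codes (H, K, M, B) to numeric multipliers.
exp_to_mult <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  # Codes that are not names of `lookup` (blanks, digits, symbols, NA) come back as NA
  mult <- unname(lookup[toupper(as.character(code))])
  mult[is.na(mult)] <- 1  # "1 for any other case"
  mult
}

# Toy data (invented for illustration, not taken from the Storm Data)
toy <- data.frame(PROPDMG    = c(25, 2.5, 10, 3),
                  PROPDMGEXP = c("K", "m", "", "B"))

# 25 K -> 25,000; 2.5 m -> 2,500,000; 10 with a blank code -> 10; 3 B -> 3,000,000,000
toy$PROPDMG * exp_to_mult(toy$PROPDMGEXP)
```

The same helper could be applied to both `PROPDMGEXP` and `CROPDMGEXP`, so the mapping is defined in one place.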

We are now ready to answer the research questions!

Results

Our dataset records 985 kinds of events. For readability, we analyse only the 10 most harmful event types.

# Load ggplot2 and scales
library(ggplot2)
library(scales)

# Summarize the data by EVTYPE and filter the top ten
total_dmg <- aggregate(DMG2PPL ~ EVTYPE, data = data, sum)
top10_dmg <- head(total_dmg[order(-total_dmg$DMG2PPL), ], 10)

# Create the bar chart for the top 10 most harmful events for people
ggplot(top10_dmg, aes(x = reorder(EVTYPE, -DMG2PPL), y = DMG2PPL)) +
  geom_bar(stat = "identity", fill = "red") +  # Blood red color
  geom_text(aes(label = scales::label_number(big.mark = ".", decimal.mark = ",")(DMG2PPL)),
            vjust = -0.3, color = "black", size = 3) +  # Format numbers with points as thousand separators
  labs(title = "Top 10 Most Harmful Atmospheric Events for People (1950-2011)",
       x = "Event Type",
       y = "Total Damage to People (Fatalities + Injuries)") +
  theme_minimal(base_size = 10) +  # Set base font size for readability
  theme(axis.text.y = element_blank(),  # Remove y-axis labels
        axis.ticks.y = element_blank(),  # Remove y-axis ticks
        axis.text.x = element_text(angle = 45, hjust = 1, size = 6),  # X-axis text size
        axis.title.x = element_text(size = 8),  # X-axis title size
        axis.title.y = element_text(size = 8),  # Y-axis title size
        plot.title = element_text(size = 10))  # Title size

The graph above shows that tornadoes were by far the most harmful event type for people in the U.S. during the analysed period, killing or injuring 96 979 people.
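The ranking relies on an aggregate, order, and `head()` pattern. As a small illustration of how that pattern behaves, the sketch below applies it to an invented toy data frame; the values are made up and are not drawn from the Storm Data.

```r
# Toy data (invented for illustration)
toy <- data.frame(
  EVTYPE  = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD", "HEAT"),
  DMG2PPL = c(10, 3, 5, 1, 4, 2)
)

# Sum DMG2PPL within each event type
totals <- aggregate(DMG2PPL ~ EVTYPE, data = toy, sum)

# Sort by total in descending order and keep the first two rows:
# TORNADO (15), then FLOOD (7)
top2 <- head(totals[order(-totals$DMG2PPL), ], 2)
top2
```

Changing the last argument of `head()` controls how many event types are kept, which is how the top 10 is selected in the analysis.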

# Summarize the data by EVTYPE
total_econ_dmg <- aggregate(ECODMG ~ EVTYPE, data = data, sum)

# Order the data by ECODMG in descending order and select the top 10
top10_econ_dmg <- head(total_econ_dmg[order(-total_econ_dmg$ECODMG), ], 10)

# Create the bar chart for the top 10 most harmful events for the economy
ggplot(top10_econ_dmg, aes(x = reorder(EVTYPE, -ECODMG), y = ECODMG)) +
  geom_bar(stat = "identity", fill = "steelblue") +  # Winter blue color
  geom_text(aes(label = scales::label_number(scale = 1e-9, suffix = "B")(ECODMG)), 
            vjust = -0.3, color = "black", size = 3) +  # Format as billions, text size 3
  labs(title = "Top 10 Most Economically Harmful Atmospheric Events in the U.S. (1950-2011)",
       x = "Event Type",
       y = "Total Economic Damage ($)") +
  theme_minimal(base_size = 10) +  # Set base font size to 10
  theme(axis.text.y = element_blank(),  # Remove y-axis labels
        axis.ticks.y = element_blank(),  # Remove y-axis ticks
        axis.text.x = element_text(angle = 45, hjust = 1, size = 6),  # X-axis text size 6
        axis.title.x = element_text(size = 8), 
        axis.title.y = element_text(size = 8),  
        plot.title = element_text(size = 10)) 

The graph above shows that floods caused the most economic damage in the U.S. during the analysed period, with more than 150B USD in damage.