Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This analysis aims to answer questions that help communities prepare for severe weather events and prioritize resources across different types of events. For both questions considered, data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database indicate that tornados are the most harmful weather events.

Data Processing

Source data

This project uses the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States.

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The file can be obtained here.
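
The report does not show how the compressed file was unpacked. As a minimal sketch, assuming only base R, a bzip2-compressed CSV can be read through a `bzfile()` connection. The file contents below are a made-up stand-in, not the real storm data.

```r
# Write a tiny CSV into a bzip2-compressed temp file
# (hypothetical stand-in for the real download)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "FLOOD,1"), con)
close(con)

# read.csv() accepts a connection, so bzfile() decompresses while reading
toy <- read.csv(bzfile(tmp), stringsAsFactors = FALSE)
unlink(tmp)

print(toy)
```

The same pattern should apply to the full file, although reading it this way is likely slower than reading an already decompressed CSV.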

Documentation on the database can be found at these locations:

Analysis

This analysis was conducted in R, using these packages:

  • data.table
  • Hmisc
  • dplyr
  • ggplot2

Loading data

Data was loaded with the data.table package. For ease of use, variable names were converted to lowercase and made unique where needed.

library(data.table)
myData <-
      fread(
            "stormdata.csv",
            sep = ",",
            na.strings = "",
            stringsAsFactors = TRUE
      )
## Read 902297 rows and 37 (of 37) columns from 0.523 GB file in 00:00:06
names(myData) <- make.names(tolower(names(myData)), unique = TRUE)
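
To illustrate what the renaming step does, here is a small sketch on hypothetical column names (not the full set from the storm data). It assumes only base R.

```r
# Hypothetical column names, including a duplicate that only differs by case
raw_names <- c("EVTYPE", "BGN_DATE", "State", "STATE")

# Lowercase first, then let make.names() make the names syntactically
# valid and, with unique = TRUE, de-duplicate them
clean_names <- make.names(tolower(raw_names), unique = TRUE)
print(clean_names)
```

Here the two names that collide after lowercasing come out as `state` and `state.1`, so every column can still be addressed unambiguously.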

Results

Key questions addressed are:

  1. Across the United States, which types of events are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

Health impact of weather events

Health impact was assessed as the total number of fatalities and injuries caused by each type of event. To compute this, the data was first aggregated into a new data frame. For ease of use, the variable names of the new data frame were replaced. The data was then sorted by health impact in descending order using the dplyr package. Finally, the data was plotted as a bar plot using the ggplot2 package. For readability, the plot displays only the 10 most impactful events.

library(dplyr)
library(ggplot2)
health <- aggregate(data = myData, fatalities + injuries ~ evtype, sum)
names(health) <- c("event_type", "health_impact")
healthSort <- arrange(health, desc(health_impact))
plot1 <- ggplot(data = healthSort[1:10, 1:2],
       aes(x = reorder(event_type, health_impact),
           y = health_impact)) +
      geom_bar(stat = "identity", na.rm = TRUE, fill = "steelblue") +
      coord_flip() +
      theme_minimal() +
      theme(plot.margin = unit(c(0.5,2,0.5,1), "cm")) +
      labs(title = "Top 10 health hazards",
           x = "Event type",
           y = "Health impact")
head(healthSort, 10)
##           event_type health_impact
## 1            TORNADO         96979
## 2     EXCESSIVE HEAT          8428
## 3          TSTM WIND          7461
## 4              FLOOD          7259
## 5          LIGHTNING          6046
## 6               HEAT          3037
## 7        FLASH FLOOD          2755
## 8          ICE STORM          2064
## 9  THUNDERSTORM WIND          1621
## 10      WINTER STORM          1527
print(plot1)
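
The aggregation and sorting steps can be tried on a small scale. The sketch below uses a made-up data frame and, to stay dependency-free, base R `order()` in place of `dplyr::arrange()`. The values are hypothetical and unrelated to the real totals.

```r
# Hypothetical toy records: two tornado rows, one flood, one heat event
toy <- data.frame(
  evtype     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  fatalities = c(2, 1, 3, 4),
  injuries   = c(10, 0, 5, 1)
)

# Sum fatalities + injuries per event type, as in the report
toy_health <- aggregate(fatalities + injuries ~ evtype, data = toy, FUN = sum)
names(toy_health) <- c("event_type", "health_impact")

# Sort by health impact, largest first (base R stand-in for arrange(desc()))
toy_sorted <- toy_health[order(-toy_health$health_impact), ]
print(toy_sorted)
```

In this toy case the two tornado rows are combined into one total of 20, which puts TORNADO ahead of HEAT (5) and FLOOD (1).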

Economic impact of weather events

Economic impact was assessed as the sum of the property and crop damage values (propdmg and cropdmg) recorded for each type of event. Data aggregation, sorting and visualization were done in the same manner as for the health impact analysis. For readability, the plot expresses economic impact in thousands.

economics <- aggregate(data = myData, propdmg + cropdmg ~ evtype, sum)
names(economics) <- c("event_type", "economic_impact")
economics$economic_impact <- round(economics$economic_impact)

ecoSort <- arrange(economics, desc(economic_impact))
plot2 <- ggplot(data = ecoSort[1:10, 1:2],
                aes(x = reorder(event_type, economic_impact),
                    y = economic_impact)) +
      scale_y_continuous(labels = function(x)x/1000) +
      geom_bar(stat = "identity", na.rm = TRUE, fill = "indianred3") +
      coord_flip() +
      theme_minimal() +
      theme(plot.margin = unit(c(0.5,2,0.5,1), "cm")) +
      labs(title = "Top 10 economic hazards",
           x = "Event type",
           y = "Economic impact (thousands)")
head(ecoSort, 10)
##            event_type economic_impact
## 1             TORNADO         3312277
## 2         FLASH FLOOD         1599325
## 3           TSTM WIND         1445168
## 4                HAIL         1268290
## 5               FLOOD         1067976
## 6   THUNDERSTORM WIND          943636
## 7           LIGHTNING          606932
## 8  THUNDERSTORM WINDS          464978
## 9           HIGH WIND          342015
## 10       WINTER STORM          134700
print(plot2)
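
The axis scaling in the plot relies on a label function that divides each break by 1000. A minimal sketch of that transformation on its own, with hypothetical break positions and a helper name (`to_thousands`) chosen here for illustration:

```r
# Same transformation as the anonymous function passed to scale_y_continuous()
to_thousands <- function(x) x / 1000

# Hypothetical axis break positions
breaks <- c(0, 1e6, 2e6, 3e6)
axis_labels <- to_thousands(breaks)
print(axis_labels)
```

Only the tick labels are rescaled this way. The underlying values stay unchanged, so the top-ranked total of 3312277 in the table corresponds to roughly 3312 on the plot axis.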