Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

This project addresses two questions: across the United States, which types of events are most harmful with respect to population health, and which types of events have the greatest economic consequences?

Data Processing

To load the data, we use the download.file and read.csv functions. The file is bzip2-compressed, which read.csv can read directly.

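weburl <- 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
# Keep the .bz2 extension so the file name reflects that it is compressed
destfile <- "StormData.csv.bz2"
# Download only if the file is not already present locally
if (!file.exists(destfile)) {
  download.file(weburl, destfile = destfile, method = "curl")
}
# read.csv decompresses the bzip2 file while reading
storm_data <- read.csv(destfile)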
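The compressed download can be read without unpacking it first because R's file connections detect compression when a file is opened for reading. A minimal sketch of that behavior, using an invented toy file rather than the NOAA data (the object names toy and toy_back and all values are made up for illustration):

```r
# Toy data standing in for the storm file (values are invented)
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(2, 0))

# Write it as a bzip2-compressed CSV in a temporary location
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv opens the path through file(), which detects the compression
toy_back <- read.csv(tmp)
print(toy_back)
```

The same call works on an uncompressed CSV, so the reading step does not need to know which form the file is in.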

Next, we load the packages used in the analysis.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(ggrepel)

Results

With the data and packages loaded, we can start the analysis.

Question 1: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

To assess health impact, we use the FATALITIES and INJURIES variables, grouped by event type. To simplify things, we create new columns mean_fatalities and mean_injuries holding the mean number per recorded event for each event type.

storm_type <- select(storm_data, FATALITIES, INJURIES, EVTYPE) %>% 
  group_by(EVTYPE) %>% 
  summarize(mean_fatalities = mean(FATALITIES, na.rm = TRUE), mean_injuries = mean(INJURIES, na.rm = TRUE))

Now, let’s look at the data frame we created.

str(storm_type)
## Classes 'tbl_df', 'tbl' and 'data.frame':    985 obs. of  3 variables:
##  $ EVTYPE         : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ mean_fatalities: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ mean_injuries  : num  0 0 0 0 0 0 0 0 0 0 ...

Most of the values appear to be 0. Because of this, we filter the data to keep only event types with mean fatalities greater than 0 and mean injuries greater than 2.5.

storm_filt_both <- filter(storm_type, mean_fatalities > 0 & mean_injuries > 2.5)
nrow(storm_filt_both)
## [1] 16
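A per-type mean can be dominated by event types that were recorded only once or twice. One way to check for this, sketched here on invented toy data rather than the NOAA records, is to report the number of records and the totals next to the means. The column names n_events, total_fatalities and total_injuries are illustrative and do not appear in the analysis itself.

```r
library(dplyr)

# Invented toy records: one rare, severe event type and one common, mild one
toy <- data.frame(
  EVTYPE     = c("RARE STORM", "COMMON STORM", "COMMON STORM", "COMMON STORM"),
  FATALITIES = c(8, 1, 0, 2),
  INJURIES   = c(40, 3, 0, 6)
)

# Summarize each event type by count, mean, and total
toy_summary <- toy %>%
  group_by(EVTYPE) %>%
  summarize(
    n_events         = n(),
    mean_fatalities  = mean(FATALITIES, na.rm = TRUE),
    total_fatalities = sum(FATALITIES, na.rm = TRUE),
    mean_injuries    = mean(INJURIES, na.rm = TRUE),
    total_injuries   = sum(INJURIES, na.rm = TRUE)
  )

print(toy_summary)
```

In this toy case the rare type has the highest means from a single record, which is the pattern worth looking for before reading too much into a mean-based ranking.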

Now, we can plot the data.

health <- ggplot(storm_filt_both, aes(mean_fatalities, mean_injuries))
health + geom_point(color = 'red') + geom_label_repel(aes(label = EVTYPE),
    box.padding   = 0.35, point.padding = 0.5, segment.color = 'grey50')

As we can see, these 16 event types have the highest per-event health impact under our thresholds. Wild Fire and Tropical Storm Gordon stand out as very destructive.

Now, we move to Question 2: Across the United States, which types of events have the greatest economic consequences?

We again group by event type, but this time we look at property damage (the PROPDMG variable).

storm_damage <- select(storm_data, EVTYPE, PROPDMG) %>% 
  group_by(EVTYPE) %>% 
  summarize(mean_damage = mean(PROPDMG, na.rm = TRUE))

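Now we have a data frame with the mean PROPDMG value for each event type. Note that PROPDMG is used as recorded, without applying the magnitude codes stored in the PROPDMGEXP variable, so these means are not dollar amounts. Let’s filter out the smaller values.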

storm_damage_filt <- filter(storm_damage, mean_damage > 200)
nrow(storm_damage_filt)
## [1] 30

Now, we make our plot.

damage <- ggplot(storm_damage_filt, aes(EVTYPE, mean_damage))
damage + geom_bar(stat = 'identity', color = 'green') + coord_flip()

As we can see, 30 event types have a mean damage value above 200, with Coastal Erosion being an outlier. Many of the named tropical storms and hurricanes are on the list, which makes sense.
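The PROPDMG values summarized above are stored alongside a PROPDMGEXP column that gives the magnitude of each estimate (for example K for thousands, M for millions, B for billions). A possible refinement, sketched on invented toy rows, is to convert each estimate to dollars before averaging. The multiplier table and the choice to treat any other code as a multiplier of 1 are assumptions of this sketch, and the names code, multiplier, damage_usd and mean_damage_usd are illustrative.

```r
library(dplyr)

# Invented toy rows; PROPDMGEXP holds the magnitude code for PROPDMG
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "HAIL", "HAIL"),
  PROPDMG    = c(2.5, 500, 10, 1),
  PROPDMGEXP = c("M", "K", "K", "")
)

# Assumed multipliers; codes not listed here fall back to 1
multipliers <- c(K = 1e3, M = 1e6, B = 1e9)

# Convert each estimate to dollars using its magnitude code
toy_dollars <- toy %>%
  mutate(
    code       = toupper(as.character(PROPDMGEXP)),
    multiplier = ifelse(code %in% names(multipliers),
                        unname(multipliers[code]), 1),
    damage_usd = PROPDMG * multiplier
  )

# Mean damage in dollars per event type
toy_damage <- toy_dollars %>%
  group_by(EVTYPE) %>%
  summarize(mean_damage_usd = mean(damage_usd, na.rm = TRUE))

print(toy_damage)
```

Applied to the real data, a conversion along these lines would likely change which event types rank highest, since a value of 500 with code K is far smaller than a value of 2.5 with code M.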