Synopsis

This report identifies which types of weather events across the United States have the greatest effect on public health and the largest economic consequences.

Questions

This analysis addresses the following questions:

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Across the United States, which types of events have the greatest economic consequences?

Data Processing

First, check whether the data file is already present in the root directory of this project; if it is not, download it. The compressed file is then loaded directly with a read.csv() call, which can read a .bz2 file without a separate decompression step.

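# Download the compressed data set only if it is not already present
if (!file.exists("StormData.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                "StormData.csv.bz2", mode = "wb")  # binary mode keeps the archive intact
}

# read.csv() reads the bzip2-compressed file directly
data <- read.csv("StormData.csv.bz2")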

Next, create a new dataset that keeps only the EVTYPE, FATALITIES, INJURIES, and PROPDMG columns, group it by EVTYPE, and sum each measure within each event type.

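library(dplyr)  # provides select(), group_by(), summarize(), arrange(), and %>%

# Total fatalities, injuries, and property damage for each event type
weather <- select(data, EVTYPE, FATALITIES, INJURIES, PROPDMG) %>%
  group_by(EVTYPE) %>%
  summarize(FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES), PROPDMG = sum(PROPDMG))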
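
One caveat about the PROPDMG totals: in the NOAA storm data, PROPDMG is paired with a companion column, PROPDMGEXP, whose letter code gives the magnitude ("K" for thousands, "M" for millions, "B" for billions), so raw sums of PROPDMG mix units. The sketch below, written in base R on a small made-up data frame, shows one way the values could be converted to dollars before summing. The helper name damage_multiplier and the toy values are hypothetical, and treating every code other than K, M, or B as a multiplier of 1 is an assumption rather than an official rule.

```r
# Hypothetical helper: map a PROPDMGEXP code to a dollar multiplier.
# Assumption: any code other than K, M, or B (blank, digits, symbols) counts as 1.
damage_multiplier <- function(exp_code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(as.character(exp_code))])
  mult[is.na(mult)] <- 1
  mult
}

# Made-up rows for illustration only; not taken from the real data set.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "FLOOD"),
  PROPDMG    = c(900, 800, 2, 1),
  PROPDMGEXP = c("K", "K", "B", "M"),
  stringsAsFactors = FALSE
)

# Scale each row to dollars, then total by event type.
toy$PROPDMG_USD <- toy$PROPDMG * damage_multiplier(toy$PROPDMGEXP)
totals <- aggregate(PROPDMG_USD ~ EVTYPE, data = toy, FUN = sum)
print(totals)
```

In this toy example the raw PROPDMG sums put TORNADO (1700) ahead of FLOOD (3), while the scaled totals reverse the order, which illustrates how rankings based on unscaled sums can differ from rankings based on dollar amounts.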

Here is a sample of the new dataset:

head(weather)
## # A tibble: 6 x 4
##   EVTYPE                  FATALITIES INJURIES PROPDMG
##   <fct>                        <dbl>    <dbl>   <dbl>
## 1 "   HIGH SURF ADVISORY"          0        0     200
## 2 " COASTAL FLOOD"                 0        0       0
## 3 " FLASH FLOOD"                   0        0      50
## 4 " LIGHTNING"                     0        0       0
## 5 " TSTM WIND"                     0        0     108
## 6 " TSTM WIND (G45)"               0        0       8
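
The sample shows that some EVTYPE labels carry leading whitespace, so " TSTM WIND" is grouped separately from "TSTM WIND". A minimal sketch of normalizing the labels before grouping follows, in base R on a made-up data frame. The helper name clean_evtype and the toy counts are hypothetical, and this cleanup is not part of the analysis reported here.

```r
# Hypothetical helper: trim, upper-case, and collapse internal whitespace.
clean_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))
  gsub("[[:space:]]+", " ", x)
}

# Made-up rows for illustration only.
toy <- data.frame(
  EVTYPE   = c(" TSTM WIND", "TSTM WIND", "Tstm  Wind", "   HIGH SURF ADVISORY"),
  INJURIES = c(1, 2, 3, 0),
  stringsAsFactors = FALSE
)

# After cleaning, the three TSTM WIND variants fall into a single group.
toy$EVTYPE <- clean_evtype(toy$EVTYPE)
cleaned <- aggregate(INJURIES ~ EVTYPE, data = toy, FUN = sum)
print(cleaned)
```

Whether to go further and merge synonyms such as "TSTM WIND" and "THUNDERSTORM WIND" is a separate judgment call that this sketch does not make.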

Next, split the weather dataset into three smaller datasets holding the top five event types by property damage, injuries, and fatalities. Each result is also printed so that the returned data can be inspected.

damage <- arrange(weather, desc(PROPDMG)) %>%
  head(5) %>%
  print()
## # A tibble: 5 x 4
##   EVTYPE            FATALITIES INJURIES  PROPDMG
##   <fct>                  <dbl>    <dbl>    <dbl>
## 1 TORNADO                 5633    91346 3212258.
## 2 FLASH FLOOD              978     1777 1420125.
## 3 TSTM WIND                504     6957 1335966.
## 4 FLOOD                    470     6789  899938.
## 5 THUNDERSTORM WIND        133     1488  876844.

injuries <- arrange(weather, desc(INJURIES)) %>%
  head(5) %>%
  print()
## # A tibble: 5 x 4
##   EVTYPE         FATALITIES INJURIES  PROPDMG
##   <fct>               <dbl>    <dbl>    <dbl>
## 1 TORNADO              5633    91346 3212258.
## 2 TSTM WIND             504     6957 1335966.
## 3 FLOOD                 470     6789  899938.
## 4 EXCESSIVE HEAT       1903     6525    1460 
## 5 LIGHTNING             816     5230  603352.

fatalities <- arrange(weather, desc(FATALITIES)) %>%
  head(5) %>%
  print()
## # A tibble: 5 x 4
##   EVTYPE         FATALITIES INJURIES  PROPDMG
##   <fct>               <dbl>    <dbl>    <dbl>
## 1 TORNADO              5633    91346 3212258.
## 2 EXCESSIVE HEAT       1903     6525    1460 
## 3 FLASH FLOOD           978     1777 1420125.
## 4 HEAT                  937     2100     298.
## 5 LIGHTNING             816     5230  603352.
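
The three pipelines above differ only in the column used for sorting, so the pattern could be factored into one small function. The following is a base R sketch rather than the dplyr code used in this report. The name top_by is hypothetical, and the toy data frame copies a few rows from the printed output.

```r
# Hypothetical helper: return the n rows with the largest values in `column`.
top_by <- function(df, column, n = 5) {
  ranked <- df[order(df[[column]], decreasing = TRUE), , drop = FALSE]
  head(ranked, n)
}

# A few rows copied from the printed tables, for illustration only.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "LIGHTNING"),
  FATALITIES = c(5633, 1903, 978, 816),
  INJURIES   = c(91346, 6525, 1777, 5230),
  stringsAsFactors = FALSE
)

top_fatal  <- top_by(toy, "FATALITIES", n = 3)
top_injury <- top_by(toy, "INJURIES", n = 3)
print(top_fatal$EVTYPE)
print(top_injury$EVTYPE)
```

Passing the column name as a string keeps the helper free of package dependencies; a dplyr version would need tidy evaluation to accept the column as an argument.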

Results

The following bar charts show the top five event types in each of the three datasets:

library(ggplot2)

# Bars are ordered from largest to smallest value in each chart
ggplot(damage, aes(x = reorder(EVTYPE, -PROPDMG), y = PROPDMG)) +
  geom_bar(stat = "identity") +
  labs(title = "Damage by Event Type", x = "Event Type", y = "Property Damage")

ggplot(injuries, aes(x = reorder(EVTYPE, -INJURIES), y = INJURIES)) +
  geom_bar(stat = "identity") +
  labs(title = "Injuries by Event Type", x = "Event Type", y = "Number of Injuries")

ggplot(fatalities, aes(x = reorder(EVTYPE, -FATALITIES), y = FATALITIES)) +
  geom_bar(stat = "identity") +
  labs(title = "Fatalities by Event Type", x = "Event Type", y = "Number of Fatalities")

The data show that tornadoes rank first in all three categories, but the second through fifth places differ from one category to the next. Excessive heat ranks second in fatalities, thunderstorm wind (recorded as TSTM WIND) ranks second in injuries, and flash floods rank second in property damage, where property damage is measured as the raw sum of the PROPDMG column.