Synopsis

This report seeks to answer two questions:

  1. Across the United States, which types of events are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

We find that the events most harmful to population health are tornadoes, which cause the most injuries and fatalities, and excessive heat, which causes the second-most fatalities. The events with the greatest economic consequences are hurricanes/typhoons, which cause the most property damage and the second-most crop damage, and river floods, which cause the most crop damage. We conclude that tornadoes are, overall, the most harmful events across the United States, since they cause the most bodily harm and are the most widespread in the country.

Data Processing

The data analyzed in this report come from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. After downloading the data set, we load it into R for processing.

storm <- read.csv('repdata-data-StormData.csv', stringsAsFactors = FALSE)

Since we seek to know about types of events, we will first do some exploratory analysis about the events themselves. First, we seek to know how many unique event types there are. Then, since we are primarily concerned with event type in our analysis, we will group the data according to event type.

library(dplyr)
num_events <- length(unique(storm$EVTYPE))
by_event <- storm %>% group_by(EVTYPE)

We find that there are 985 unique meteorological event types in the database. The variables most relevant to population health are injuries and fatalities. We use the summarise function to total the injuries and fatalities for each type of weather event.

health <- by_event %>% summarise(injuries = sum(INJURIES, na.rm = TRUE), fatalities = sum(FATALITIES, na.rm = TRUE))

We will look only at the rows (events) where the number of injuries or fatalities is greater than 0.

bad_health <- health[which((health$injuries != 0) | (health$fatalities !=0)),]
num_bad <- length(bad_health$injuries)

We see that only 220 types of weather events resulted in either injury or death.

By contrast, the variables most relevant to economic consequences are property damage (PROPDMG and PROPDMGEXP) and crop damage (CROPDMG and CROPDMGEXP). To make damage values easier to compare later, we combine each DMG column with its DMGEXP column into a single column giving the property or crop damage at the correct magnitude.

costs <- c(1000, 1e6, 1e9)
names(costs)<- c("K", "M", "B")
by_event$PROPDMG2 <- by_event$PROPDMG*costs[storm$PROPDMGEXP] 
by_event$CROPDMG2 <- by_event$CROPDMG*costs[storm$CROPDMGEXP]
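
A named-vector lookup like the one above returns NA for any exponent code that is not one of the names in costs, including the empty string. The following is a minimal sketch of that behavior on made-up toy values; the helper name scale_damage is hypothetical, and uppercasing the codes first is shown only as one possible variation, not as the method used in this report.

```r
# Toy data (made up for illustration; not taken from the storm database)
dmg <- c(2.5, 10, 1, 3, 7)
exp_code <- c("K", "M", "B", "m", "")

# Same lookup table as in the report
costs <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: look up each multiplier by name; codes that are
# not names of the table (including "") give NA
scale_damage <- function(value, code, table = costs) {
  unname(value * table[code])
}

as_is <- scale_damage(dmg, exp_code)           # "m" and "" become NA
upper <- scale_damage(dmg, toupper(exp_code))  # "m" is now treated as "M"

print(as_is)
print(upper)
```

Rows that end up as NA are later dropped by na.rm = TRUE, so they contribute to neither the totals nor the averages.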

Now we can summarize the dataframe over the property and crop damage, creating columns for both the total and average amounts of damage monetarily.

econ <- by_event %>% summarise(PropDamTotal = sum(PROPDMG2, na.rm = TRUE), PropDamAvg = mean(PROPDMG2, na.rm = TRUE), CropDamTotal = sum(CROPDMG2, na.rm = TRUE), CropDamAvg = mean(CROPDMG2, na.rm = TRUE)) 
bad_econ <- econ[which((econ$PropDamTotal != 0) | (econ$CropDamTotal !=0)),]
num_bad_econ <- length(bad_econ$PropDamTotal)
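
Because both the sum and the mean are taken with na.rm = TRUE, the average is computed only over records whose damage value is not NA. A small base R sketch on made-up toy data (tapply is used here instead of summarise only to keep the example self-contained) shows the effect:

```r
# Toy data (made up for illustration): event type "A" has one NA record
toy <- data.frame(
  EVTYPE   = c("A", "A", "A", "B"),
  PROPDMG2 = c(1000, NA, 3000, 500),
  stringsAsFactors = FALSE
)

# Per-event totals and averages, ignoring NA values
totals <- tapply(toy$PROPDMG2, toy$EVTYPE, sum,  na.rm = TRUE)
avgs   <- tapply(toy$PROPDMG2, toy$EVTYPE, mean, na.rm = TRUE)

print(totals)  # A = 4000, B = 500
print(avgs)    # A = 2000 (mean of two non-NA records), B = 500
```

Note that for an event type with a single valid record, such as "B" here, the total and the average coincide.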

And so we see that there are 426 types of weather events that resulted in crop or property damage.

Results

Health

First, we will look at how the various weather events affected the overall health of the U.S. population.

We will first sort the weather events in decreasing order in terms of number of injuries, and number of fatalities caused. We then find the weather events that are listed in the top 10 for both.

injury_sort <- bad_health[order(bad_health$injuries, decreasing = TRUE),]
fatality_sort <- bad_health[order(bad_health$fatalities, decreasing = TRUE),]
evn_ids <- intersect(injury_sort$EVTYPE[1:10],fatality_sort$EVTYPE[1:10])
worst_health <- bad_health[bad_health$EVTYPE %in% evn_ids,]
num_events <- length(worst_health$EVTYPE)
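
The "top N in both rankings" step can be illustrated on a small made-up table. This is only a sketch: the helper top_n_events is hypothetical, and it uses head() rather than [1:10] so that it does not return NA entries when fewer than n rows are available.

```r
# Toy totals (made up for illustration)
toy <- data.frame(
  EVTYPE     = c("A", "B", "C", "D", "E"),
  injuries   = c(50, 40, 5, 30, 1),
  fatalities = c(9, 1, 8, 7, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: event types of the n largest rows for one column
top_n_events <- function(df, column, n) {
  head(df$EVTYPE[order(df[[column]], decreasing = TRUE)], n)
}

top_inj <- top_n_events(toy, "injuries", 3)    # "A" "B" "D"
top_fat <- top_n_events(toy, "fatalities", 3)  # "A" "C" "D"

# Keep only the event types that appear in both top-3 lists
both <- intersect(top_inj, top_fat)            # "A" "D"
print(both)
```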

To visually compare the health toll caused by these 7 weather events, we create a stacked bar graph that shows the number of injuries in blue and the number of fatalities in red. Note that, in order to create the plot, the data frame must first be restacked by combining the injury and fatality counts into one column.

worst_health <- data.frame(worst_health[1],stack(worst_health[2:3]))
library(ggplot2)
cbPalette <- c("#0072B2", "#D55E00", "#E69F00", "#56B4E9", "#009E73", "#F0E442",  "#CC79A7")
ggplot(worst_health, aes(x = EVTYPE))+
    geom_col(aes(y = values, fill = ind))+ 
    coord_flip()+
    scale_fill_manual(values=cbPalette)+
    labs(x = "Weather Event") + #add labels (labs() names the aesthetics, so x is still EVTYPE after coord_flip)
    labs(y = 'Totals') +
    labs(title = "Weather Events Causing the Greatest Number of Injuries and Fatalities")
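
The restacking step used before the plot can be seen in isolation on a two-row toy table (made-up values). stack() turns the two count columns into a values column and an ind column naming the source, and data.frame() recycles the event-type column to match:

```r
# Toy wide table (made up for illustration)
wide <- data.frame(
  EVTYPE     = c("A", "B"),
  injuries   = c(10, 20),
  fatalities = c(1, 2),
  stringsAsFactors = FALSE
)

# Wide -> long: one row per (event type, measure) pair
long <- data.frame(wide[1], stack(wide[2:3]))
print(long)
# Columns: EVTYPE (A, B, A, B), values (10, 20, 1, 2),
#          ind (injuries, injuries, fatalities, fatalities)
```

The ind column is what the fill aesthetic maps to a color in the stacked bar graph.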

From this plot, we can clearly see that the overwhelming majority of harm is caused by tornadoes, in terms of both injuries and fatalities. The second greatest number of fatalities appears to be caused by excessive heat.

We suspect that tornadoes are also quite widespread across the U.S. To test this, we examine how many states or territories reported tornadoes.

tornado <- subset(storm,EVTYPE == 'TORNADO', select = c("STATE__", "STATE", "EVTYPE"))
num_tornadoes <- length(unique(tornado$STATE__))

We find that 52 states or territories reported incidents, covering all 50 states, D.C., and Puerto Rico.

Economy

To get an overview of which weather events cause the most damage, we first simply find the weather events that caused the most property damage and the most crop damage, respectively, in terms of monetary cost.

max_prop <- bad_econ$EVTYPE[bad_econ$PropDamTotal == max(bad_econ$PropDamTotal)] 
max_crop <- bad_econ$EVTYPE[bad_econ$CropDamTotal == max(bad_econ$CropDamTotal)]

We note that the highest amount of property damage was caused by the weather event FLOOD; the event type with the most crop damage is stored in max_crop.
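
When looking up the event type with the largest total, the logical condition and the column being indexed need to come from the same data frame; otherwise R silently recycles the shorter logical vector. A minimal sketch on made-up toy data (all names prefixed with toy_ are hypothetical):

```r
# Toy summary table (made up for illustration)
toy_econ <- data.frame(
  EVTYPE       = c("A", "B", "C", "D"),
  CropDamTotal = c(0, 5e6, 2e9, 0),
  stringsAsFactors = FALSE
)
toy_bad <- toy_econ[toy_econ$CropDamTotal != 0, ]  # keeps rows "B" and "C"

# Condition built on the 2-row subset: FALSE TRUE
is_max <- toy_bad$CropDamTotal == max(toy_bad$CropDamTotal)

same_frame  <- toy_bad$EVTYPE[is_max]   # "C", the intended answer
mixed_frame <- toy_econ$EVTYPE[is_max]  # index recycled over 4 rows: "B" "D"

# which.max() is an alternative that returns the first maximum's position
via_which_max <- toy_bad$EVTYPE[which.max(toy_bad$CropDamTotal)]  # "C"

print(same_frame)
print(mixed_frame)
```

A result listing several unrelated event types is a typical symptom of the mixed-frame case.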

We will now visually investigate how the weather events that create the most property and crop damage compare. To do this, we will first sort the dataframe, as we did for the health cases, by total property damage, average property damage, total crop damage, and average crop damage, respectively. Then, we find the top 20 in each category and retain those that appear in both the top total and top average property damage, and those that appear in both the top total and top average crop damage.

prop_total_sort <- bad_econ[order(bad_econ$PropDamTotal, decreasing = TRUE),]
prop_avg_sort <- bad_econ[order(bad_econ$PropDamAvg, decreasing = TRUE),]
crop_total_sort <- bad_econ[order(bad_econ$CropDamTotal, decreasing = TRUE),]
crop_avg_sort <- bad_econ[order(bad_econ$CropDamAvg, decreasing = TRUE),]

prop_evn_ids <- intersect(prop_total_sort$EVTYPE[1:20],prop_avg_sort$EVTYPE[1:20])
crop_evn_ids <- intersect(crop_total_sort$EVTYPE[1:20],crop_avg_sort$EVTYPE[1:20])

We now compare the weather events with the greatest property and crop damage, looking at the total damage caused and the average monetary damage, respectively. Note that the property and crop damage plots have different scales.

#Restrict to weather events causing the most damage
worst_prop <- bad_econ[bad_econ$EVTYPE %in% prop_evn_ids,1:3]
worst_prop <- data.frame(worst_prop[1],stack(worst_prop[2:3])) 

worst_crop <- bad_econ[bad_econ$EVTYPE %in% crop_evn_ids,c(1,4:5)]
worst_crop <- data.frame(worst_crop[1],stack(worst_crop[2:3]))

#plot
ggplot(worst_prop, aes(x = EVTYPE))+
    geom_col(aes(y = values, fill = ind), position = "dodge" )+ 
    #facet_grid(.~ind)+
    coord_flip()+
    labs(x = "Weather Event" ) + #add labels
    labs(y = 'Damage in US Dollars') +
    labs(title = "Property Damage")

ggplot(worst_crop, aes(x = EVTYPE))+
    geom_col(aes(y = values, fill = ind), position = "dodge")+ 
    #facet_grid(.~ind)+
    coord_flip()+
    labs(x = "Weather Event" ) + #add labels
    labs(y = 'Damage in US Dollars') +
    labs(title = "Crop Damage")

We see that, among the events plotted, the one that causes the most overall property damage is a hurricane/typhoon, with storm surge not too far behind. The greatest amount of crop damage is by far due to river floods, with hurricanes/typhoons coming in second.

It is interesting to note that these weather events are very similar in nature. In fact, the most damage to both property and crops is caused, perhaps unsurprisingly, by water-related weather events.

On the other hand, general heavy rain/severe weather causes the most property damage on average. Since these events are much more common than hurricanes, this makes sense, and might make them a greater threat to the economy. River floods still account for the greatest crop damage on average, which, again, is likely to be a more widespread problem across the U.S.

To see just how widespread these events are, we calculate the number of states or territories in which they were observed.

hurricane <- subset(storm,EVTYPE == 'HURRICANE' | EVTYPE == 'HURRICANE/TYPHOON', select = c("STATE", "EVTYPE"))
hurricane_num <- length(unique(hurricane$STATE))

flood <- subset(storm,EVTYPE == 'RIVER FLOOD', select = c("STATE", "EVTYPE"))
flood_num <- length(unique(flood$STATE))

rain <- subset(storm,EVTYPE == 'HEAVY RAIN/SEVERE WEATHER', select = c("STATE", "EVTYPE"))
rain_num <- length(unique(rain$STATE))

storm_surge <- subset(storm,EVTYPE == 'STORM SURGE', select = c("STATE", "EVTYPE"))
storm_surge_num <- length(unique(storm_surge$STATE))
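
The four nearly identical subset-and-count steps above could also be expressed with one small helper. The following is only a sketch on made-up toy data; count_states and the groups list are hypothetical names, not part of the original analysis.

```r
# Toy records (made up for illustration; not taken from the storm database)
toy_storm <- data.frame(
  STATE  = c("TX", "TX", "FL", "LA", "FL", "OK"),
  EVTYPE = c("HURRICANE", "RIVER FLOOD", "HURRICANE/TYPHOON",
             "STORM SURGE", "HURRICANE", "RIVER FLOOD"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of distinct states reporting any of `types`
count_states <- function(df, types) {
  length(unique(df$STATE[df$EVTYPE %in% types]))
}

# Event types grouped the same way as in the report
groups <- list(
  hurricane   = c("HURRICANE", "HURRICANE/TYPHOON"),
  flood       = "RIVER FLOOD",
  storm_surge = "STORM SURGE"
)

state_counts <- sapply(groups, count_states, df = toy_storm)
print(state_counts)  # hurricane = 2, flood = 2, storm_surge = 1
```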

We find that there were 19 states or territories that had recorded hurricanes or typhoons, 21 that recorded storm surges, 16 that experienced river floods, and 2 that experienced heavy rain or severe weather that resulted in damage.

This shows us the surprising result that heavy rain and severe weather did not actually cause damage often. Since there were only two recorded events, this low occurrence would account for why the total and average amounts were the same.

Conclusion

In conclusion, we find that tornadoes are the most widespread harmful weather event in the U.S., causing by far the most injuries and fatalities of any weather event. Additionally, water-related weather events cause the most economic damage. Specifically, hurricanes or typhoons cause the most property damage, while river floods are the most damaging to crops.