Here we consider which severe weather events in the US have the most impact, in terms of fatalities, injuries, property damage, and crop damage. We first find the events with the greatest number of fatalities across the entire US, and then break this down by state. We then do the same for the events with the largest amount of property damage. We find that most fatalities come from tornadoes, excessive heat, flash floods, and heat, but that these are problems only in specific states. We find that most property damage comes from tornadoes, flash floods, thunderstorm winds, and floods, again concentrated in specific states. This could be useful in deciding how resources should be allocated to deal with these extreme weather events.
The data for this analysis were taken from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2, which is a copy of the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. We downloaded it on 12/16/2014.
For making sense of the data, two documents may prove useful: the National Weather Service Storm Data Documentation and the National Climatic Data Center Storm Events FAQ.
The data is a CSV file compressed with bzip2. read.csv() can read it directly, without decompressing it first.
df <- read.csv("c:/josh/repdata-data-StormData.csv.bz2")
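As a small illustration of reading compressed input directly, the sketch below writes a toy CSV through a bzip2 connection and reads it back with a plain read.csv() call. The toy data frame and the temporary file are made up for the example; only the column names EVTYPE and FATALITIES come from the analysis.

```r
# Write a tiny CSV through a bzip2-compressing connection (toy data, not the NOAA file).
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"),
                     FATALITIES = c(3, 0)),
          con, row.names = FALSE)
close(con)

# read.csv() opens the path in read mode, where R detects the compression itself.
toy <- read.csv(tmp)
print(toy)
```
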
We use the dplyr package for manipulating the data and the lattice package for plotting the data.
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(lattice)
We consider the following question: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
We look across the entire US and find the events with the most fatalities, which we shall report in the Results section. Using dplyr, we group by event type and then aggregate the fatalities and injuries.
health <- group_by(df, EVTYPE)
health <- summarise(health, num_fatalities = sum(FATALITIES), num_injuries = sum(INJURIES))
health <- arrange(health, desc(num_fatalities), desc(num_injuries))
most_freq <- health[1:4, ]$EVTYPE
most_fatalities <- most_freq
health_us <- health
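The group, sum, and sort steps above can be tried on a toy table to see what they produce. This is only a sketch: the rows are invented, and it keeps the top two event types rather than four because the toy table is so small.

```r
library(dplyr)

# Invented rows standing in for the storm data.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HEAT", "HAIL", "HEAT"),
  FATALITIES = c(5, 2, 4, 0, 1),
  INJURIES   = c(10, 3, 0, 1, 2),
  stringsAsFactors = FALSE
)

# One row per event type, totals sorted from most to fewest fatalities.
toy_health <- toy %>%
  group_by(EVTYPE) %>%
  summarise(num_fatalities = sum(FATALITIES),
            num_injuries   = sum(INJURIES)) %>%
  arrange(desc(num_fatalities), desc(num_injuries))

# Event types with the most fatalities (top 2 here, top 4 in the analysis).
top_events <- head(toy_health$EVTYPE, 2)
print(toy_health)
```
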
Then, taking only the four most damaging events (in terms of fatalities), we consider how these impact each state. Using dplyr, we group by both state and event type and then aggregate the fatalities and injuries. We keep only the four event types with the most fatalities (stored in most_freq).
health <- group_by(df, STATE, EVTYPE)
health <- summarise(health, num_fatalities = sum(FATALITIES), num_injuries = sum(INJURIES))
health <- health[health$EVTYPE %in% most_freq, ]
health <- arrange(health, EVTYPE)
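The state-level breakdown follows the same pattern with two grouping columns and a membership filter. The sketch below uses invented rows and a hypothetical top_events vector in place of most_freq.

```r
library(dplyr)

# Invented rows; top_events plays the role of most_freq.
toy <- data.frame(
  STATE      = c("TX", "TX", "TX", "IL", "IL"),
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "HEAT", "TORNADO"),
  FATALITIES = c(2, 0, 3, 4, 1),
  stringsAsFactors = FALSE
)
top_events <- c("TORNADO", "HEAT")

# One row per (state, event type), restricted to the selected event types.
toy_by_state <- toy %>%
  group_by(STATE, EVTYPE) %>%
  summarise(num_fatalities = sum(FATALITIES)) %>%
  ungroup() %>%
  filter(EVTYPE %in% top_events) %>%
  arrange(EVTYPE, STATE)

print(toy_by_state)
```

Filtering after the aggregation, as the analysis does, gives the same totals as filtering first, since the filter only removes whole event types.
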
Next, we look across the entire US and find the events with the most property damage, which we shall report in the Results section. Using dplyr, we group by event type and then aggregate the property and crop damage.
damage <- group_by(df, EVTYPE)
damage <- summarise(damage, num_property_damage = sum(PROPDMG), num_crop_damage = sum(CROPDMG))
damage <- arrange(damage, desc(num_property_damage), desc(num_crop_damage))
most_freq <- damage[1:4, ]$EVTYPE
most_property_damage <- most_freq
damage_us <- damage
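The totals above add up the raw PROPDMG and CROPDMG values. In the NOAA file these figures are, as far as we understand, paired with magnitude columns (PROPDMGEXP and CROPDMGEXP, with codes such as K, M, and B), so a dollar-scaled total would multiply before summing. The sketch below shows one way that could look on invented rows; the column name PROPDMGEXP, the code-to-multiplier table, and the choice to treat any other code as 1 are all assumptions, not something this analysis does.

```r
# Invented rows; PROPDMGEXP and its K/M/B codes are assumed, not taken from the analysis.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "FLOOD"),
  PROPDMG    = c(25, 2.5, 500),
  PROPDMGEXP = c("K", "M", "K"),
  stringsAsFactors = FALSE
)

# Assumed mapping from magnitude code to multiplier; unknown codes fall back to 1.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)
mult <- unname(multiplier[toupper(toy$PROPDMGEXP)])
mult[is.na(mult)] <- 1

# Scale each record to dollars, then total by event type.
toy$prop_dollars <- toy$PROPDMG * mult
totals <- aggregate(prop_dollars ~ EVTYPE, data = toy, FUN = sum)
print(totals)
```
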
Then, taking only the four most damaging events (in terms of property damage), we consider how these impact each state. Using dplyr, we group by both state and event type and then aggregate the property and crop damage. We keep only the four event types with the most property damage (stored in most_freq).
damage <- group_by(df, STATE, EVTYPE)
damage <- summarise(damage, num_property_damage = sum(PROPDMG), num_crop_damage = sum(CROPDMG))
damage <- damage[damage$EVTYPE %in% most_freq, ]
damage <- arrange(damage, EVTYPE)
Looking across the entire US, these are the weather events with the most fatalities. We report both fatalities and injuries:
head(health_us)
## Source: local data frame [6 x 3]
##
## EVTYPE num_fatalities num_injuries
## 1 TORNADO 5633 91346
## 2 EXCESSIVE HEAT 1903 6525
## 3 FLASH FLOOD 978 1777
## 4 HEAT 937 2100
## 5 LIGHTNING 816 5230
## 6 TSTM WIND 504 6957
The four deadliest event types are TORNADO, EXCESSIVE HEAT, FLASH FLOOD, and HEAT.
We now break this down by state, showing the fatalities for each of these four weather events:
barchart(EVTYPE ~ num_fatalities | STATE, data = health, color=health$EVTYPE, col=c("green", "red", "blue"), main="Fatalities by state", xlab = "# fatalities")
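For readers unfamiliar with lattice, the sketch below builds the same kind of conditioned bar chart on a toy table: one panel per state, one bar per event type. The rows are invented, and origin = 0 is an addition of the sketch (it anchors the bars at zero), not part of the call above.

```r
library(lattice)

# Invented per-state totals in the same shape as the health table.
toy <- data.frame(
  STATE          = c("TX", "TX", "IL", "IL"),
  EVTYPE         = factor(c("TORNADO", "HEAT", "TORNADO", "HEAT")),
  num_fatalities = c(5, 2, 1, 7)
)

# EVTYPE on the y axis, fatalities on the x axis, one panel per STATE.
p <- barchart(EVTYPE ~ num_fatalities | STATE, data = toy,
              origin = 0,
              main = "Fatalities by state (toy data)",
              xlab = "Number of fatalities")

# A lattice plot is an object; printing it is what draws it.
print(p)
```
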
Looking across the entire US, these are the weather events with the most property damage. We report both property and crop damage:
head(damage_us)
## Source: local data frame [6 x 3]
##
## EVTYPE num_property_damage num_crop_damage
## 1 TORNADO 3212258.2 100018.52
## 2 FLASH FLOOD 1420124.6 179200.46
## 3 TSTM WIND 1335965.6 109202.60
## 4 FLOOD 899938.5 168037.88
## 5 THUNDERSTORM WIND 876844.2 66791.45
## 6 HAIL 688693.4 579596.28
The four event types with the most property damage are TORNADO, FLASH FLOOD, TSTM WIND, and FLOOD. We now break this down by state, showing the property damage for each of these four weather events:
barchart(EVTYPE ~ num_property_damage | STATE, data = damage, color=damage$EVTYPE, col=c("green", "red", "blue"), main="Property damage by state", xlab = "property damage")
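The property damage table lists TSTM WIND and THUNDERSTORM WIND as separate event types, which suggests that some EVTYPE labels are variants of one another. A possible refinement, not applied in this analysis, is to normalize labels before grouping. The sketch below shows the idea on a few invented labels; the single rewrite rule is an assumption chosen for illustration and is far from a complete cleanup.

```r
# Invented labels illustrating case, whitespace, and abbreviation variants.
evtype <- c("TSTM WIND", "THUNDERSTORM WIND", "tornado ", "FLASH FLOOD")

# Normalize case and surrounding whitespace, then expand one known abbreviation.
clean <- toupper(trimws(evtype))
clean <- sub("^TSTM WIND$", "THUNDERSTORM WIND", clean)

# Counts per normalized label.
print(table(clean))
```
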