Using the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, this report identifies the types of weather events that are most harmful to population health and those that have the greatest economic consequences.
Our results show that tornadoes caused the largest number of fatalities and injuries from 1950 to 2011. They also account for the largest recorded total of damage to property and crops over the same period.
The data for this report are taken from the NOAA Storm Database, which can be downloaded in R as follows.
Url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(Url, destfile = "StormData.csv.bz2", method = "curl")
Additional notes from the course project website: The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records.
The data for this report are stored locally as “StormData.csv.bz2”.
storm <- read.csv("StormData.csv.bz2")
#dimension of the data
dim(storm)
## [1] 902297 37
As seen above, the storm data set is large, with 902,297 observations of 37 variables. The variables of interest are EVTYPE, FATALITIES, INJURIES, PROPDMG, and CROPDMG. The dplyr package is used to work with the storm data throughout the report. The data processing is split into two parts: one for analysing the impact on population health and the other for economic consequences.
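Since only five of the 37 variables are used, the data frame could be narrowed before any further processing. The following is a minimal sketch on a toy data frame, not a step taken in the original analysis; `toy`, `slim`, and the REMARKS placeholder column are illustrative names.

```r
library(dplyr)

# Toy stand-in for the storm data; REMARKS represents a column the analysis does not use
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL"),
  FATALITIES = c(1, 0),
  INJURIES   = c(3, 0),
  PROPDMG    = c(25, 0.5),
  CROPDMG    = c(0, 2),
  REMARKS    = c("a", "b"),
  stringsAsFactors = FALSE
)

# Keep only the variables of interest
slim <- toy %>% select(EVTYPE, FATALITIES, INJURIES, PROPDMG, CROPDMG)
names(slim)
```

Applied to the real data, the same `select()` call on `storm` would leave the later `filter()` and `summarize()` steps unchanged.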
The question of interest is which types of events are most harmful with respect to population health. The relevant variables are EVTYPE, FATALITIES, and INJURIES. The following code generates a report of total FATALITIES and INJURIES for each EVTYPE.
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
#Generating a report of FATALITIES and INJURIES for each EVTYPE
report1 <- storm %>%
  filter(FATALITIES > 0 | INJURIES > 0) %>%
  group_by(EVTYPE) %>%
  summarize(fatalities = sum(FATALITIES), injuries = sum(INJURIES))
#Sneak peek at the report
head(report1)
## Source: local data frame [6 x 3]
##
## EVTYPE fatalities injuries
## 1 AVALANCE 1 0
## 2 AVALANCHE 224 170
## 3 BLACK ICE 1 24
## 4 BLIZZARD 101 805
## 5 BLOWING SNOW 1 13
## 6 BRUSH FIRE 0 2
Explanation of the variables:
1. fatalities = the total number of fatalities for that event type from 1950 to 2011.
2. injuries = the total number of injuries for that event type from 1950 to 2011.
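The preview above lists AVALANCE and AVALANCHE as separate event types, which suggests that EVTYPE contains spelling variants that split the totals. One possible clean-up is sketched below on toy data. The `fixes` lookup is hypothetical and hand-written, and this step is not part of the original analysis.

```r
library(dplyr)

# Toy stand-in for the storm data; the counts are illustrative only
toy <- data.frame(
  EVTYPE     = c("AVALANCE", "AVALANCHE", " avalanche", "BLIZZARD"),
  FATALITIES = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: known misspelling -> preferred label
fixes <- c("AVALANCE" = "AVALANCHE")

clean <- toy %>%
  mutate(
    EVTYPE = toupper(trimws(EVTYPE)),  # normalise case and stray whitespace
    EVTYPE = ifelse(EVTYPE %in% names(fixes), fixes[EVTYPE], EVTYPE)
  ) %>%
  group_by(EVTYPE) %>%
  summarize(fatalities = sum(FATALITIES))

clean
```

In this toy case the three avalanche variants collapse into a single AVALANCHE row. A real clean-up would need a much longer lookup built by inspecting `unique(storm$EVTYPE)`.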
The question of interest is which types of events have the greatest economic consequences. The relevant variables are EVTYPE, PROPDMG, and CROPDMG. In the following code, a report of property damage and crop damage for each EVTYPE is generated.
report2 <- storm %>%
  filter(PROPDMG > 0 | CROPDMG > 0) %>%
  group_by(EVTYPE) %>%
  summarize(property.damage = sum(PROPDMG), crop.damage = sum(CROPDMG))
report2 <- mutate(report2, total.damage = property.damage + crop.damage)
#Sneak peek
head(report2)
## Source: local data frame [6 x 4]
##
## EVTYPE property.damage crop.damage total.damage
## 1 HIGH SURF ADVISORY 200 0.00 200.00
## 2 FLASH FLOOD 50 0.00 50.00
## 3 TSTM WIND 108 0.00 108.00
## 4 TSTM WIND (G45) 8 0.00 8.00
## 5 ? 5 0.00 5.00
## 6 AGRICULTURAL FREEZE 0 28.82 28.82
Explanation of the variables:
1. property.damage = sum of the damage to properties for the event type from 1950 to 2011.
2. crop.damage = sum of the damage to crops for the event type from 1950 to 2011.
3. total.damage = sum of property and crop damage for the event type from 1950 to 2011.
In the following, the report is sorted by fatalities and by injuries, each in descending order.
by_fatalities <- arrange(report1, desc(fatalities))
by_injuries <- arrange(report1, desc(injuries))
The six highest-ranked events by fatalities and by injuries are shown below; the top five of each are plotted in Figure 1.
head(by_fatalities)
## Source: local data frame [6 x 3]
##
## EVTYPE fatalities injuries
## 1 TORNADO 5633 91346
## 2 EXCESSIVE HEAT 1903 6525
## 3 FLASH FLOOD 978 1777
## 4 HEAT 937 2100
## 5 LIGHTNING 816 5230
## 6 TSTM WIND 504 6957
head(by_injuries)
## Source: local data frame [6 x 3]
##
## EVTYPE fatalities injuries
## 1 TORNADO 5633 91346
## 2 TSTM WIND 504 6957
## 3 FLOOD 470 6789
## 4 EXCESSIVE HEAT 1903 6525
## 5 LIGHTNING 816 5230
## 6 HEAT 937 2100
#Plotting the 5 most harmful events with respect to population health
par(mfrow = c(2, 1))
with(by_fatalities[1:5,], barplot(fatalities, names.arg = EVTYPE, cex.names = 0.6, ylim = range(by_fatalities$fatalities), main = "Top 5 worst events by fatalities"))
with(by_injuries[1:5,], barplot(injuries, names.arg = EVTYPE, cex.names = 0.8, ylim = range(by_injuries$injuries), main = "Top 5 worst events by injuries"))
Figure 1: The top 5 most harmful events in terms of fatalities and injuries across the United States from 1950 to 2011
From the figure, tornadoes are by far the most harmful event, causing fatalities and injuries that far outweigh those of the other events. The second most harmful event is arguably excessive heat, with nearly twice as many fatalities as flash floods and more than 6,000 injuries.
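The ratios between these totals can be checked directly from the table. The values below are copied from the fatalities ranking above; the variable names are illustrative.

```r
# Fatality totals copied from the top-six table above
fatalities <- c(TORNADO = 5633, `EXCESSIVE HEAT` = 1903, `FLASH FLOOD` = 978,
                HEAT = 937, LIGHTNING = 816, `TSTM WIND` = 504)

# Excessive heat relative to flash flood
ratio_heat_flood <- fatalities[["EXCESSIVE HEAT"]] / fatalities[["FLASH FLOOD"]]

# Tornado relative to the runner-up, excessive heat
ratio_tornado_heat <- fatalities[["TORNADO"]] / fatalities[["EXCESSIVE HEAT"]]

round(c(ratio_heat_flood, ratio_tornado_heat), 2)
## [1] 1.95 2.96
```

Excessive heat therefore accounts for a little under twice the fatalities of flash floods, and tornadoes for roughly three times those of excessive heat.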
By arranging report2 by total damage in descending order, the events with the greatest economic consequences can be identified and shown in a barplot.
report2 <- arrange(report2, desc(total.damage))
head(report2)
## Source: local data frame [6 x 4]
##
## EVTYPE property.damage crop.damage total.damage
## 1 TORNADO 3212258.2 100018.52 3312276.7
## 2 FLASH FLOOD 1420124.6 179200.46 1599325.1
## 3 TSTM WIND 1335965.6 109202.60 1445168.2
## 4 HAIL 688693.4 579596.28 1268289.7
## 5 FLOOD 899938.5 168037.88 1067976.4
## 6 THUNDERSTORM WIND 876844.2 66791.45 943635.6
#Plot the top 5 events that have the greatest economic consequences
with(report2[1:5,], barplot(total.damage, names.arg = EVTYPE, cex.names = 0.8, main = "Top 5 events that caused greatest economic damages"))
Figure 2: The top 5 most harmful events in terms of property and crop damages across the United States from 1950 to 2011
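Tornado is again the worst event in terms of economic damage, with a summed PROPDMG of about 3.2 million and a summed CROPDMG of about 100,000. These totals add the PROPDMG and CROPDMG values as recorded, without applying the magnitude codes in the PROPDMGEXP and CROPDMGEXP columns, so they should not be read as dollar amounts, and the ranking may differ once the codes are applied.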
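One way to express the damage in dollars is to scale each recorded value by the magnitude code that the NOAA data store in PROPDMGEXP and CROPDMGEXP (K for thousands, M for millions, B for billions). The following is a minimal sketch on toy rows, not the method used in this report. The helper `multiplier` is hypothetical, and treating every other code as a multiplier of 1 is an assumption made for illustration.

```r
library(dplyr)

# Toy rows; the real PROPDMGEXP column also holds other codes whose meaning is less clear
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "FLOOD"),
  PROPDMG    = c(25, 2.5, 500, 1),
  PROPDMGEXP = c("K", "M", "K", "B"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: K = thousand, M = million, B = billion;
# any other code (including blank) is assumed to mean a multiplier of 1
multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  m <- lookup[toupper(as.character(code))]
  unname(ifelse(is.na(m), 1, m))
}

scaled <- toy %>%
  mutate(prop_usd = PROPDMG * multiplier(PROPDMGEXP)) %>%
  group_by(EVTYPE) %>%
  summarize(property.damage.usd = sum(prop_usd)) %>%
  arrange(desc(property.damage.usd))

scaled
```

In the toy data FLOOD ranks first after scaling even though its largest raw PROPDMG entry is only 1, which illustrates why sums of unscaled values can rank events differently from dollar totals. The same approach would apply to CROPDMG with CROPDMGEXP.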