Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health? Across the United States, which types of events have the greatest economic consequences?
knitr::opts_chunk$set(echo = TRUE) # Always show the code in the knitted report.
options(scipen = 1) # Turn off scientific notation.
library(ggplot2)
library(plyr)
require(gridExtra)
## Loading required package: gridExtra
require(dplyr)
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:gridExtra':
##
## combine
## The following objects are masked from 'package:plyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
We read the data file, previously downloaded from the Coursera course web site, directly from its bz2-compressed form.
storm <- read.csv("repdata_data_StormData.csv.bz2") # read the csv file
dim(storm)
## [1] 902297 37
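The read step above assumes the compressed file is already in the working directory. A minimal sketch of a guarded download follows; `fetch_if_missing` is a hypothetical helper that is not part of the original analysis, and the URL is left as a placeholder to be filled in with the link given on the course site.

```r
# Hypothetical helper: download `url` to `dest` only when `dest` is missing.
fetch_if_missing <- function(url, dest) {
  if (file.exists(dest)) {
    return(invisible(FALSE))  # file already present, nothing downloaded
  }
  # mode = "wb" keeps the bz2 archive byte-for-byte intact on Windows
  download.file(url, destfile = dest, mode = "wb")
  invisible(TRUE)
}

# Usage (storm_url is a placeholder, not a real address):
# storm_url <- "<link from the course site>"
# fetch_if_missing(storm_url, "repdata_data_StormData.csv.bz2")
# storm <- read.csv("repdata_data_StormData.csv.bz2")  # read.csv reads .bz2 files directly
```

Skipping the download when the file exists keeps repeated knits fast and avoids fetching a large archive more than once.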
In this section, we total the fatalities and injuries caused by severe weather events and identify the 10 event types with the greatest combined impact. The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
health_impact <- summarise(group_by(storm, EVTYPE), total_fatalities = sum(FATALITIES), total_injuries = sum(INJURIES))
health_impact <- subset(health_impact, total_fatalities>0 | total_injuries>0)
health_impact$total_health_impact <- health_impact$total_fatalities + health_impact$total_injuries
# Top 10 most injurious weather events
top10_health <- head(health_impact[with(health_impact, order(-total_health_impact, EVTYPE)),], 10)
print(top10_health)
## # A tibble: 10 x 4
## EVTYPE total_fatalities total_injuries total_health_impact
## <fct> <dbl> <dbl> <dbl>
## 1 TORNADO 5633 91346 96979
## 2 EXCESSIVE HEAT 1903 6525 8428
## 3 TSTM WIND 504 6957 7461
## 4 FLOOD 470 6789 7259
## 5 LIGHTNING 816 5230 6046
## 6 HEAT 937 2100 3037
## 7 FLASH FLOOD 978 1777 2755
## 8 ICE STORM 89 1975 2064
## 9 THUNDERSTORM WIND 133 1488 1621
## 10 WINTER STORM 206 1321 1527
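The aggregation steps above (sum per event type, drop types with no casualties, sort by the combined total) can be checked on a small example. The sketch below uses base R only and a made-up toy table; the values are illustrative and are not taken from the NOAA data.

```r
# Toy data standing in for the storm table (values are made up for illustration).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(3, 1, 2, 4, 0, 0),
  INJURIES   = c(10, 2, 5, 1, 3, 0)
)

# Sum both columns per event type.
per_type <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Combined casualty count, then drop types with no casualties and sort descending.
per_type$total <- per_type$FATALITIES + per_type$INJURIES
per_type <- per_type[per_type$total > 0, ]
per_type <- per_type[order(-per_type$total, per_type$EVTYPE), ]
print(per_type)
```

Note that adding fatalities and injuries weights them equally, so the ranking is dominated by whichever count is larger for a given event type.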
We calculate total crop and property damage from the PROPDMG and CROPDMG variables, scaling each value by the multiplier encoded in the PROPDMGEXP and CROPDMGEXP variables where one applies.
summary(storm$PROPDMGEXP)
## - ? + 0 1 2 3 4 5 6
## 465934 1 8 5 216 25 13 4 4 28 4
## 7 8 B h H K m M
## 5 1 40 1 6 424665 7 11330
summary(storm$CROPDMGEXP)
## ? 0 2 B k K m M
## 618413 7 19 1 9 21 281832 1 1994
Both the PROPDMGEXP and CROPDMGEXP columns record a multiplier for each observation: hundred (H), thousand (K), million (M), and billion (B), in either upper or lower case. In addition, we interpret a digit N from 1 to 9 as a multiplier of 10^N; any other code leaves the value unchanged.
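The multiplier mapping described above can be sketched as a single lookup function and checked on a few hand-picked codes. `exp_to_multiplier` is a hypothetical helper introduced here for illustration, and the sample codes and values are made up.

```r
# Hypothetical helper: translate an exponent code into a numeric multiplier.
exp_to_multiplier <- function(code) {
  code <- toupper(as.character(code))   # accepts factor or character input
  mult <- rep(1, length(code))          # blank, "0", "?", "+", "-" leave the value unchanged
  letters_map <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  is_letter <- code %in% names(letters_map)
  mult[is_letter] <- letters_map[code[is_letter]]
  is_digit <- grepl("^[1-9]$", code)    # a digit N means 10^N
  mult[is_digit] <- 10^as.numeric(code[is_digit])
  mult
}

codes   <- c("K", "m", "B", "h", "3", "", "?")
values  <- c(2.5, 1, 0.5, 4, 7, 10, 10)
damages <- values * exp_to_multiplier(codes)
print(damages)
```

Converting the codes to character before matching matters because the columns may be read as factors, in which case numeric conversion would return level indices instead of the digits themselves.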
storm$PROPDMG_ADJ <- storm$PROPDMG
B <- with(storm, PROPDMGEXP=="B" | PROPDMGEXP=="b")
M <- with(storm, PROPDMGEXP=="M" | PROPDMGEXP=="m")
K <- with(storm, PROPDMGEXP=="K" | PROPDMGEXP=="k")
H <- with(storm, PROPDMGEXP=="H" | PROPDMGEXP=="h")
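# PROPDMGEXP is not a numeric column, so digit codes are matched as text ("0" means x1 and is skipped).
N <- with(storm, PROPDMGEXP %in% as.character(1:9))
storm[B,]$PROPDMG_ADJ <- storm[B,]$PROPDMG_ADJ * 1000000000
storm[M,]$PROPDMG_ADJ <- storm[M,]$PROPDMG_ADJ * 1000000
storm[K,]$PROPDMG_ADJ <- storm[K,]$PROPDMG_ADJ * 1000
storm[H,]$PROPDMG_ADJ <- storm[H,]$PROPDMG_ADJ * 100
# as.character() first, so a factor yields the digit itself and not its level index.
storm[N,]$PROPDMG_ADJ <- storm[N,]$PROPDMG_ADJ * (10^as.numeric(as.character(storm[N,]$PROPDMGEXP)))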
storm$CROPDMG_ADJ <- storm$CROPDMG
B <- with(storm, CROPDMGEXP=="B" | CROPDMGEXP=="b")
M <- with(storm, CROPDMGEXP=="M" | CROPDMGEXP=="m")
K <- with(storm, CROPDMGEXP=="K" | CROPDMGEXP=="k")
H <- with(storm, CROPDMGEXP=="H" | CROPDMGEXP=="h")
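# CROPDMGEXP is not a numeric column, so digit codes are matched as text ("0" means x1 and is skipped).
N <- with(storm, CROPDMGEXP %in% as.character(1:9))
storm[B,]$CROPDMG_ADJ <- storm[B,]$CROPDMG_ADJ * 1000000000
storm[M,]$CROPDMG_ADJ <- storm[M,]$CROPDMG_ADJ * 1000000
storm[K,]$CROPDMG_ADJ <- storm[K,]$CROPDMG_ADJ * 1000
storm[H,]$CROPDMG_ADJ <- storm[H,]$CROPDMG_ADJ * 100
# as.character() first, so a factor yields the digit itself and not its level index.
storm[N,]$CROPDMG_ADJ <- storm[N,]$CROPDMG_ADJ * (10^as.numeric(as.character(storm[N,]$CROPDMGEXP)))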
storm$TOTAL_DMG <- storm$PROPDMG_ADJ + storm$CROPDMG_ADJ
economical_impact <- summarise(group_by(storm, EVTYPE), total_damage = sum(TOTAL_DMG))
# Top 10 most costly weather events
top10_dmg <- head(economical_impact[with(economical_impact, order(-total_damage, EVTYPE)),], 10)
print(top10_dmg)
## # A tibble: 10 x 2
## EVTYPE total_damage
## <fct> <dbl>
## 1 FLOOD 150319678257
## 2 HURRICANE/TYPHOON 71913712800
## 3 TORNADO 57352114049.
## 4 STORM SURGE 43323541000
## 5 HAIL 18758222016.
## 6 FLASH FLOOD 17562129167.
## 7 DROUGHT 15018672000
## 8 HURRICANE 14610229010
## 9 RIVER FLOOD 10148404500
## 10 ICE STORM 8967041360
The bar plot below shows the 10 weather event types with the greatest health impact.
par(mar=c(5,11,4,2))
barplot(top10_health$total_health_impact / 1000, names.arg=top10_health$EVTYPE, col=heat.colors(10), horiz = TRUE, las=1, main="Fatalities and Injuries", xlab="x 1000 people")
The bar plot below shows the 10 weather event types with the greatest economic impact.
par(mar=c(5,11,4,2))
barplot(top10_dmg$total_damage / 1000000000, names.arg=top10_dmg$EVTYPE, col=heat.colors(10), horiz = TRUE, las=1, main="Crop and Property Damage", xlab="Billion $")