Coursera - Reproducible Research: Peer Assessment 2

Impact of Severe Weather Events on Public Health and Economy in the United States

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Questions explored in this report

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Across the United States, which types of events have the greatest economic consequences?

Configuration and Libraries

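knitr::opts_chunk$set(echo = TRUE)  # Always show code in the rendered report.
options(scipen = 1)  # Bias number printing toward fixed rather than scientific notation.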
library(ggplot2)
library(plyr)
require(gridExtra)
## Loading required package: gridExtra
require(dplyr) 
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:gridExtra':
## 
##     combine
## The following objects are masked from 'package:plyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Data Processing

The data file was downloaded from the Coursera course web site. read.csv reads the bz2-compressed file directly, so no manual decompression is needed.

storm <- read.csv("repdata_data_StormData.csv.bz2")  # read the csv file
dim(storm)
## [1] 902297     37
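
The download step itself is not shown above. A minimal sketch of one way to make it reproducible follows; fetch_if_missing is a hypothetical helper, and the URL is left as a placeholder because the report does not state it.

```r
# Hypothetical helper: download the data file only when it is not already present.
fetch_if_missing <- function(url, dest) {
  if (!file.exists(dest)) {
    # mode = "wb" keeps the bz2 archive intact on Windows
    download.file(url, destfile = dest, mode = "wb")
  }
  invisible(dest)
}

# Usage (placeholder URL; substitute the link given on the course page):
# fetch_if_missing("<course data URL>", "repdata_data_StormData.csv.bz2")
# storm <- read.csv("repdata_data_StormData.csv.bz2")
```

Skipping the download when the file exists keeps repeated knits fast and avoids depending on the network each time.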

Question: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

In this section, we total the fatalities and injuries caused by each type of severe weather event and identify the 10 most harmful event types. The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
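
The note about sparser early records can be checked by counting events per year. The sketch below assumes the event date is stored in a column named BGN_DATE as month/day/year text followed by a time. Neither the column name nor the format is stated in this report, and events_per_year is a hypothetical helper, so treat this as an illustration on toy input.

```r
# Count recorded events per year from date strings such as "4/18/1950 0:00:00".
events_per_year <- function(bgn_date) {
  # as.Date() parses the leading month/day/year and ignores the trailing time
  years <- format(as.Date(as.character(bgn_date), format = "%m/%d/%Y"), "%Y")
  table(year = as.integer(years))
}

# Toy input standing in for storm$BGN_DATE (assumed column name and format)
toy_dates <- c("4/18/1950 0:00:00", "6/8/1951 0:00:00", "11/15/1951 0:00:00")
events_per_year(toy_dates)
```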

health_impact <- summarise(group_by(storm, EVTYPE), total_fatalities = sum(FATALITIES), total_injuries = sum(INJURIES))
health_impact <- subset(health_impact, total_fatalities>0 | total_injuries>0)
health_impact$total_health_impact <- health_impact$total_fatalities + health_impact$total_injuries

# Top 10 weather events by combined fatalities and injuries
top10_health <- head(health_impact[with(health_impact, order(-total_health_impact, EVTYPE)),], 10)
print(top10_health)
## # A tibble: 10 x 4
##    EVTYPE            total_fatalities total_injuries total_health_impact
##    <fct>                        <dbl>          <dbl>               <dbl>
##  1 TORNADO                       5633          91346               96979
##  2 EXCESSIVE HEAT                1903           6525                8428
##  3 TSTM WIND                      504           6957                7461
##  4 FLOOD                          470           6789                7259
##  5 LIGHTNING                      816           5230                6046
##  6 HEAT                           937           2100                3037
##  7 FLASH FLOOD                    978           1777                2755
##  8 ICE STORM                       89           1975                2064
##  9 THUNDERSTORM WIND              133           1488                1621
## 10 WINTER STORM                   206           1321                1527
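
The table lists TSTM WIND and THUNDERSTORM WIND as separate event types, which suggests that spelling variants split some totals. A minimal sketch of harmonising labels before grouping follows. The helper normalise_evtype and its rules are illustrative assumptions rather than an official NOAA mapping, and the toy rows are made up for the example.

```r
# Sketch: harmonise a few EVTYPE spellings before grouping.
normalise_evtype <- function(evtype) {
  x <- toupper(trimws(as.character(evtype)))                # uniform case, no outer spaces
  x <- gsub("\\s+", " ", x, perl = TRUE)                    # collapse repeated whitespace
  x <- gsub("\\bTSTM\\b", "THUNDERSTORM", x, perl = TRUE)   # expand the abbreviation
  x
}

# Toy data, not taken from the storm database
toy <- data.frame(
  EVTYPE = c("TSTM WIND", "THUNDERSTORM WIND", " tstm wind", "FLOOD"),
  INJURIES = c(6957, 1488, 1, 6789)
)
toy$EVTYPE <- normalise_evtype(toy$EVTYPE)

# After normalisation the three wind rows fall into one group
aggregate(INJURIES ~ EVTYPE, data = toy, FUN = sum)
```

Applying such a step to storm$EVTYPE before summarising would change the ranking, so the rules should be reviewed against the full list of event types first.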

Question: Across the United States, which types of events have the greatest economic consequences?

We calculate total property and crop damage from the PROPDMG and CROPDMG variables, scaling each value by the magnitude code stored in PROPDMGEXP and CROPDMGEXP respectively.

summary(storm$PROPDMGEXP)
##             -      ?      +      0      1      2      3      4      5      6 
## 465934      1      8      5    216     25     13      4      4     28      4 
##      7      8      B      h      H      K      m      M 
##      5      1     40      1      6 424665      7  11330
summary(storm$CROPDMGEXP)
##             ?      0      2      B      k      K      m      M 
## 618413      7     19      1      9     21 281832      1   1994

Both the PROPDMGEXP and CROPDMGEXP columns record a multiplier for each observation: hundred (H), thousand (K), million (M), and billion (B), in upper or lower case. We interpret H as x100 and a digit N as a multiplier of 10^N. Any other code leaves the value unchanged.

storm$PROPDMG_ADJ <- storm$PROPDMG
B <- with(storm, PROPDMGEXP=="B" | PROPDMGEXP=="b")
M <- with(storm, PROPDMGEXP=="M" | PROPDMGEXP=="m")
K <- with(storm, PROPDMGEXP=="K" | PROPDMGEXP=="k")
H <- with(storm, PROPDMGEXP=="H" | PROPDMGEXP=="h")
N <- with(storm, PROPDMGEXP %in% as.character(1:9))  # digit codes 1-9 (0 would multiply by 1)
storm[B,]$PROPDMG_ADJ <- storm[B,]$PROPDMG_ADJ * 1000000000
storm[M,]$PROPDMG_ADJ <- storm[M,]$PROPDMG_ADJ * 1000000
storm[K,]$PROPDMG_ADJ <- storm[K,]$PROPDMG_ADJ * 1000
storm[H,]$PROPDMG_ADJ <- storm[H,]$PROPDMG_ADJ * 100
# Convert via as.character() so a factor yields the digit itself, not its level index
storm[N,]$PROPDMG_ADJ <- storm[N,]$PROPDMG_ADJ * (10^as.numeric(as.character(storm[N,]$PROPDMGEXP)))

storm$CROPDMG_ADJ <- storm$CROPDMG
B <- with(storm, CROPDMGEXP=="B" | CROPDMGEXP=="b")
M <- with(storm, CROPDMGEXP=="M" | CROPDMGEXP=="m")
K <- with(storm, CROPDMGEXP=="K" | CROPDMGEXP=="k")
H <- with(storm, CROPDMGEXP=="H" | CROPDMGEXP=="h")
N <- with(storm, CROPDMGEXP %in% as.character(1:9))  # digit codes 1-9 (0 would multiply by 1)
storm[B,]$CROPDMG_ADJ <- storm[B,]$CROPDMG_ADJ * 1000000000
storm[M,]$CROPDMG_ADJ <- storm[M,]$CROPDMG_ADJ * 1000000
storm[K,]$CROPDMG_ADJ <- storm[K,]$CROPDMG_ADJ * 1000
storm[H,]$CROPDMG_ADJ <- storm[H,]$CROPDMG_ADJ * 100
# Convert via as.character() so a factor yields the digit itself, not its level index
storm[N,]$CROPDMG_ADJ <- storm[N,]$CROPDMG_ADJ * (10^as.numeric(as.character(storm[N,]$CROPDMGEXP)))

storm$TOTAL_DMG <- storm$PROPDMG_ADJ + storm$CROPDMG_ADJ
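
The multiplier rules described above (H, K, M, B in either case, a digit N as 10^N, anything else unchanged) can also be written as one vectorised lookup. This is a sketch of an alternative formulation, not the code used to produce the results in this report, and exp_multiplier is a hypothetical helper name.

```r
# Sketch: map exponent codes to multipliers in one vectorised step.
exp_multiplier <- function(code) {
  code <- toupper(as.character(code))          # works for factor or character input
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[code])                 # NA for codes not in the lookup
  digit <- code %in% as.character(1:9)
  mult[digit] <- 10^as.numeric(code[digit])    # digit N means 10^N
  mult[is.na(mult)] <- 1                       # "", "-", "?", "+", "0": unchanged
  mult
}

# Toy codes covering each rule
exp_multiplier(c("K", "m", "B", "h", "5", "", "?", "0"))
```

With such a helper, the adjusted property damage would be storm$PROPDMG * exp_multiplier(storm$PROPDMGEXP), and likewise for crops, which avoids repeating the row-subset assignments for each code.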

economical_impact <- summarise(group_by(storm, EVTYPE), total_damage = sum(TOTAL_DMG))

# Top 10 most costly weather events
top10_dmg <- head(economical_impact[with(economical_impact, order(-total_damage, EVTYPE)),], 10)
print(top10_dmg)
## # A tibble: 10 x 2
##    EVTYPE             total_damage
##    <fct>                     <dbl>
##  1 FLOOD             150319678257 
##  2 HURRICANE/TYPHOON  71913712800 
##  3 TORNADO            57352114049.
##  4 STORM SURGE        43323541000 
##  5 HAIL               18758222016.
##  6 FLASH FLOOD        17562129167.
##  7 DROUGHT            15018672000 
##  8 HURRICANE          14610229010 
##  9 RIVER FLOOD        10148404500 
## 10 ICE STORM           8967041360

Results

Impact on Public Health

The bar plot below shows the 10 weather event types with the greatest combined number of fatalities and injuries.

par(mar=c(5,11,4,2))

barplot(top10_health$total_health_impact / 1000, names.arg=top10_health$EVTYPE, col=heat.colors(10), horiz = TRUE, las=1, main="Fatalities and Injuries", xlab="x 1000 people")
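
Because the combined total is dominated by injuries, it can help to show fatalities and injuries as separate segments of each bar. The sketch below is one possible variant using base graphics. It uses the first three rows of the top-10 table as sample data, and the colours and legend labels are arbitrary choices.

```r
# Sample rows copied from the top-10 table; in the report they come from top10_health.
health <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND"),
  total_fatalities = c(5633, 1903, 504),
  total_injuries = c(91346, 6525, 6957),
  stringsAsFactors = FALSE
)

# barplot() stacks the rows of a matrix, drawing one bar per column
counts <- t(as.matrix(health[, c("total_fatalities", "total_injuries")])) / 1000
colnames(counts) <- health$EVTYPE

par(mar = c(5, 11, 4, 2))
barplot(counts, horiz = TRUE, las = 1, col = c("firebrick", "orange"),
        legend.text = c("Fatalities", "Injuries"),
        main = "Fatalities and Injuries", xlab = "x 1000 people")
```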

Impact on Economy

The bar plot below shows the 10 weather event types with the greatest economic impact, measured as combined property and crop damage.

par(mar=c(5,11,4,2))

barplot(top10_dmg$total_damage / 1000000000, names.arg=top10_dmg$EVTYPE, col=heat.colors(10), horiz = TRUE, las=1, main="Crop and Property Damage", xlab="Billion $")