Synopsis

Using the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, this report identifies:

  1. The severe weather events that are most harmful with respect to population health
  2. The events that have the greatest economic consequences.

Our results show that tornadoes caused the largest number of fatalities and injuries from 1950 to 2011. They also account for the largest combined damage to property and crops during the same period.

Data Processing

The data for this report are taken from the NOAA Storm Database, which can be downloaded in R as follows.

Url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(Url, destfile = "StormData.csv.bz2", method = "curl")
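Re-running the report repeats the download each time. A minimal sketch of a guard that skips the download when the file is already present is shown here; `fetch_if_missing` is a hypothetical helper name, not part of the original analysis, and `mode = "wb"` is an assumption made so that the compressed file is written in binary mode on all platforms.

```r
# Hypothetical helper (not part of the original report): download a file
# only when it is not already present locally.
fetch_if_missing <- function(url, destfile, ...) {
  if (file.exists(destfile)) {
    return(invisible(FALSE))  # file already there, nothing downloaded
  }
  # mode = "wb" writes the compressed file in binary mode
  download.file(url, destfile = destfile, mode = "wb", ...)
  invisible(TRUE)             # a download was attempted
}
```

Under these assumptions, `fetch_if_missing(Url, "StormData.csv.bz2")` would download the file on the first run and do nothing on later runs.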

Additional notes from the course project website: The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records.

The data for this report are stored locally as “StormData.csv.bz2” and read with read.csv, which decompresses the file on the fly.

storm <- read.csv("StormData.csv.bz2")

#dimension of the data
dim(storm)
## [1] 902297     37

As seen above, the storm data is large, with 902,297 rows and 37 variables. The variables of interest to us are EVTYPE, FATALITIES, INJURIES, PROPDMG, and CROPDMG. The dplyr package is used to work on the storm data throughout the report. We’ll break the data processing into two parts: one for analysing the impact on population health and the other for economic consequences.
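Since only five of the 37 variables are used, the data frame can be narrowed before any further work. The following is a small sketch in base R on a toy stand-in for `storm`; the object names `storm_toy`, `keep`, and `storm_small` and all values are made up for illustration.

```r
# Toy stand-in for the full storm data frame; the values are made up.
storm_toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL"),
  FATALITIES = c(2, 0),
  INJURIES   = c(10, 0),
  PROPDMG    = c(25, 5),
  CROPDMG    = c(0, 1.5),
  REMARKS    = c("free text", "free text"),  # a column the analysis never uses
  stringsAsFactors = FALSE
)

# Keep only the variables of interest named in the text
keep <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")
storm_small <- storm_toy[, keep]
dim(storm_small)
```

The same indexing applied to the real `storm` object would keep all rows and drop the 32 unused columns.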

Question 1: Impact on Population Health

The question of interest is which types of events are most harmful with respect to population health. The relevant variables are EVTYPE, FATALITIES, and INJURIES. The following code generates a report of total FATALITIES and INJURIES for each EVTYPE.

library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
#Generating a report of FATALITIES and INJURIES for each EVTYPE
report1 <- storm %>%
  filter(FATALITIES > 0 | INJURIES > 0) %>%
  group_by(EVTYPE) %>%
  summarize(fatalities = sum(FATALITIES), injuries = sum(INJURIES))
#Sneak peek at the report
head(report1)
## Source: local data frame [6 x 3]
## 
##         EVTYPE fatalities injuries
## 1     AVALANCE          1        0
## 2    AVALANCHE        224      170
## 3    BLACK ICE          1       24
## 4     BLIZZARD        101      805
## 5 BLOWING SNOW          1       13
## 6   BRUSH FIRE          0        2

Explanation of the variables:
1. fatalities = the total number of fatalities for that event type from 1950 to 2011.
2. injuries = the total number of injuries for that event type from 1950 to 2011.
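The output above lists AVALANCE and AVALANCHE as separate event types, so inconsistent labels split the totals of what is probably one kind of event. A minimal sketch of label normalisation in base R is given here. The report itself does not perform this step; the toy labels and the `fixes` lookup are hypothetical and would need to be built by inspecting the real EVTYPE values.

```r
# Toy event labels; "AVALANCE" mirrors the misspelling visible in the output above.
evtype <- c("AVALANCE", "AVALANCHE", " avalanche ", "BLIZZARD")

# Step 1: strip surrounding whitespace and unify the case
clean <- toupper(trimws(evtype))

# Step 2: apply a hand-written table of corrections (hypothetical, one entry only)
fixes <- c("AVALANCE" = "AVALANCHE")
hit <- clean %in% names(fixes)
clean[hit] <- unname(fixes[clean[hit]])

table(clean)
```

Grouping by the cleaned labels instead of the raw EVTYPE would merge such variants before summing.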

Question 2: Economic Consequences

The question of interest is which types of events have the greatest economic consequences. The relevant variables are EVTYPE, PROPDMG, and CROPDMG. In the following code, a report of property damage and crop damage for each EVTYPE is generated.

#Generating a report of PROPDMG and CROPDMG for each EVTYPE
report2 <- storm %>%
  filter(PROPDMG > 0 | CROPDMG > 0) %>%
  group_by(EVTYPE) %>%
  summarize(property.damage = sum(PROPDMG), crop.damage = sum(CROPDMG))
report2 <- mutate(report2, total.damage = property.damage + crop.damage)
#Sneak peek
head(report2)
## Source: local data frame [6 x 4]
## 
##                  EVTYPE property.damage crop.damage total.damage
## 1    HIGH SURF ADVISORY             200        0.00       200.00
## 2           FLASH FLOOD              50        0.00        50.00
## 3             TSTM WIND             108        0.00       108.00
## 4       TSTM WIND (G45)               8        0.00         8.00
## 5                     ?               5        0.00         5.00
## 6   AGRICULTURAL FREEZE               0       28.82        28.82

Explanation of the variables:
1. property.damage = sum of the raw PROPDMG values for the event type from 1950 to 2011.
2. crop.damage = sum of the raw CROPDMG values for the event type from 1950 to 2011.
3. total.damage = property.damage + crop.damage for the event type from 1950 to 2011.
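The NOAA file also carries magnitude codes for the damage figures in the columns PROPDMGEXP and CROPDMGEXP, which the code above does not use. A sketch of how such codes could be turned into dollar amounts is shown here on toy rows. It assumes that K, M, and B stand for thousands, millions, and billions, and it treats any other code (blank, digits, symbols) as a multiplier of 1, which is a simplifying assumption rather than a documented rule.

```r
# Toy rows; the column names match the NOAA file, the values are made up.
dmg <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Assumed code table: K = thousands, M = millions, B = billions
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Look up each code; codes outside the table come back as NA
m <- unname(multiplier[toupper(dmg$PROPDMGEXP)])
m[is.na(m)] <- 1  # assumption: unknown or blank code means "no scaling"

dmg$prop_usd <- dmg$PROPDMG * m
dmg
```

The same lookup would apply to CROPDMG with CROPDMGEXP, after which the scaled columns could be summed per EVTYPE in place of the raw ones.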

Results

Question 1: Impact on Population Health

In the following, report1 is sorted in descending order twice: once by fatalities and once by injuries.

by_fatalities <- arrange(report1, desc(fatalities))
by_injuries <- arrange(report1, desc(injuries))

The following are the six highest-ranked events by fatalities and by injuries; the barplots that follow show the top five of each.

head(by_fatalities)
## Source: local data frame [6 x 3]
## 
##           EVTYPE fatalities injuries
## 1        TORNADO       5633    91346
## 2 EXCESSIVE HEAT       1903     6525
## 3    FLASH FLOOD        978     1777
## 4           HEAT        937     2100
## 5      LIGHTNING        816     5230
## 6      TSTM WIND        504     6957
head(by_injuries)
## Source: local data frame [6 x 3]
## 
##           EVTYPE fatalities injuries
## 1        TORNADO       5633    91346
## 2      TSTM WIND        504     6957
## 3          FLOOD        470     6789
## 4 EXCESSIVE HEAT       1903     6525
## 5      LIGHTNING        816     5230
## 6           HEAT        937     2100
#Plotting the 5 most harmful events with respect to population health
par(mfrow = c(2, 1))
with(by_fatalities[1:5,], barplot(fatalities, names.arg = EVTYPE, cex.names = 0.6, ylim = range(by_fatalities$fatalities), main = "Top 5 worst events by fatalities"))
with(by_injuries[1:5,], barplot(injuries, names.arg = EVTYPE, cex.names = 0.8, ylim = range(by_injuries$injuries), main = "Top 5 worst events by injuries"))

Figure 1: The top 5 most harmful events in terms of fatalities and injuries across the United States from 1950 to 2011

From the figure, tornado is by far the most harmful event, with fatalities and injuries that far outweigh those of the other events. The second most harmful event is arguably excessive heat, with nearly twice as many fatalities as flash flood (1,903 versus 978) and more than 6,000 injuries.
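The ratios behind this comparison can be checked directly from the fatality totals in the table above. The short sketch below retypes three of those totals by hand; `fatalities` and `ratio` are illustrative names.

```r
# Fatality totals copied from the by_fatalities output above
fatalities <- c(TORNADO = 5633, `EXCESSIVE HEAT` = 1903, `FLASH FLOOD` = 978)

# Each total expressed as a multiple of the flash flood total
ratio <- round(fatalities / fatalities["FLASH FLOOD"], 2)
ratio
```

Tornado comes out at roughly 5.76 times the flash flood total and excessive heat at roughly 1.95 times.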

Question 2: Economic Consequences

Arranging report2 by total damage in descending order identifies the events with the greatest economic consequences, which are then shown in a barplot.

report2 <- arrange(report2, desc(total.damage))
head(report2)
## Source: local data frame [6 x 4]
## 
##              EVTYPE property.damage crop.damage total.damage
## 1           TORNADO       3212258.2   100018.52    3312276.7
## 2       FLASH FLOOD       1420124.6   179200.46    1599325.1
## 3         TSTM WIND       1335965.6   109202.60    1445168.2
## 4              HAIL        688693.4   579596.28    1268289.7
## 5             FLOOD        899938.5   168037.88    1067976.4
## 6 THUNDERSTORM WIND        876844.2    66791.45     943635.6
#Plot the top 5 events that have the greatest economic consequences
with(report2[1:5,], barplot(total.damage, names.arg = EVTYPE, cex.names = 0.8, main = "Top 5 events that caused greatest economic damages"))

Figure 2: The top 5 most harmful events in terms of property and crop damages across the United States from 1950 to 2011

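Tornado is again the worst event in terms of economic damage, with raw totals of more than 3.2 million for property damage and about 100,000 for crop damage. These figures are sums of the raw PROPDMG and CROPDMG values; the magnitude codes in PROPDMGEXP and CROPDMGEXP were not applied, so they should not be read as dollar amounts.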