Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The analysis of the storm event database showed that tornadoes were the most dangerous events for population health, followed by excessive heat. The economic impact of weather events was also analyzed. Flash floods and thunderstorm winds caused billions of dollars in property damage between 1950 and 2011. The largest crop damage was caused by drought, followed by floods and hail.
Download NOAA database
##install.packages("magrittr") # package installations are only needed the first time you use it
##install.packages("dplyr") # alternative installation of the %>%
require(ggplot2)
## Loading required package: ggplot2
require(reshape2)
## Loading required package: reshape2
require(plyr)
## Loading required package: plyr
library(magrittr) # needs to be run every time you start R and want to use %>%
library(dplyr) # alternatively, this also loads %>%
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:plyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
data <- read.csv(bzfile("stormData.csv.bz2"))
# The number of unique event types
length(unique(data$EVTYPE))
## [1] 985
To find the event types that are most harmful to population health, the number of casualties is aggregated by event type.
##install.packages('plyr', repos = "http://cran.us.r-project.org") # only needed the first time; plyr is already loaded above
library(plyr)
casualties <- ddply(data, .(EVTYPE), summarize,
                    fatalities = sum(FATALITIES),
                    injuries = sum(INJURIES))
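As a small illustration of what this aggregation produces, the sketch below sums casualties per event type on a made-up data frame. The `toy_events` values are invented for the example and are not taken from the NOAA data, and base R's `aggregate()` is swapped in for `ddply` only to keep the example free of package dependencies.

```r
# Toy data: invented values, for illustration only
toy_events <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(3, 1, 2, 0, 3),
  INJURIES   = c(10, 0, 5, 2, 1)
)

# Sum both casualty columns within each event type
toy_casualties <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE,
                            data = toy_events, FUN = sum)

# Rank event types by total fatalities, highest first
toy_casualties <- toy_casualties[order(toy_casualties$FATALITIES,
                                       decreasing = TRUE), ]
print(toy_casualties)
```

Each event type ends up on a single row holding its summed fatalities and injuries, which is the shape the ranking step relies on.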
##Dangerous Events with respect to Population Health
# Find events that caused most death and injury
fatal_events <- head(casualties[order(casualties$fatalities, decreasing = TRUE), ], 10)
injury_events <- head(casualties[order(casualties$injuries, decreasing = TRUE), ], 10)
fatal_events[, c("EVTYPE", "fatalities")]
##             EVTYPE fatalities
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816
## 856      TSTM WIND        504
## 170          FLOOD        470
## 585    RIP CURRENT        368
## 359      HIGH WIND        248
## 19       AVALANCHE        224
injury_events[, c("EVTYPE", "injuries")]
##                EVTYPE injuries
## 834           TORNADO    91346
## 856         TSTM WIND     6957
## 170             FLOOD     6789
## 130    EXCESSIVE HEAT     6525
## 464         LIGHTNING     5230
## 275              HEAT     2100
## 427         ICE STORM     1975
## 153       FLASH FLOOD     1777
## 760 THUNDERSTORM WIND     1488
## 244              HAIL     1361
Results
The following plot shows the most dangerous weather event types and their impact in terms of the number of fatalities and injuries.
##install.packages('gridExtra') # only needed the first time you use it
library(ggplot2)
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
# Set the levels in order
p1 <- ggplot(data=fatal_events,
             aes(x=reorder(EVTYPE, fatalities), y=fatalities, fill=fatalities)) +
  geom_bar(stat="identity") +
  coord_flip() +
  ylab("Total number of fatalities") +
  xlab("Event type") +
  theme(legend.position="none")
p2 <- ggplot(data=injury_events,
             aes(x=reorder(EVTYPE, injuries), y=injuries, fill=injuries)) +
  geom_bar(stat="identity") +
  coord_flip() +
  ylab("Total number of injuries") +
  xlab("Event type") +
  theme(legend.position="none")
grid.arrange(p1, p2)
Events with the greatest impact on the economy
To analyze the impact of bad weather on the economy, property damage and crop damage estimates were used.
In the raw data, property damage is stored in two fields: PROPDMG, a numeric value, and PROPDMGEXP, an exponent code (for example, K for thousands or M for millions) that scales it to dollars.
Similarly, the crop damage is represented using two fields, CROPDMG and CROPDMGEXP. The first step is to calculate the property and crop damage for each event.
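To make the encoding concrete, here is a minimal worked example on three invented records. It assumes the usual reading of the letter codes (K for thousand, M for million, B for billion); the names `mantissa`, `exp_code`, and `power` are illustrative and the values are not taken from the database.

```r
# Invented records: a numeric value and an exponent code per event
mantissa <- c(25, 2.5, 1)
exp_code <- c("K", "M", "B")

# Power of ten assumed for each letter code
power <- c(K = 3, M = 6, B = 9)

# Damage in dollars = value * 10^power
damage <- unname(mantissa * 10^power[exp_code])
print(damage)
```

Under these assumptions, a record with a value of 25 and the code K corresponds to 25,000 dollars.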
Compute the economic loss by event type
library(ggplot2)
library(gridExtra)
exp_transform <- function(e) {
  # Map an exponent code to a power of ten:
  # h -> hundred, k -> thousand, m -> million, b -> billion
  e <- as.character(e)  # guard against factor input
  if (e %in% c('h', 'H'))
    return(2)
  else if (e %in% c('k', 'K'))
    return(3)
  else if (e %in% c('m', 'M'))
    return(6)
  else if (e %in% c('b', 'B'))
    return(9)
  else if (grepl("^[0-9]+$", e)) # if a digit; the regex check avoids coercion warnings
    return(as.numeric(e))
  else if (e %in% c('', '-', '?', '+')) # missing or ambiguous code: multiplier of 1
    return(0)
  else {
    stop("Invalid exponent value.")
  }
}
# Apply the transform to every record; USE.NAMES = FALSE avoids a named result
prop_dmg_exp <- sapply(data$PROPDMGEXP, FUN=exp_transform, USE.NAMES=FALSE)
data$prop_dmg <- data$PROPDMG * (10 ^ prop_dmg_exp)
crop_dmg_exp <- sapply(data$CROPDMGEXP, FUN=exp_transform, USE.NAMES=FALSE)
data$crop_dmg <- data$CROPDMG * (10 ^ crop_dmg_exp)
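Calling a function once per record can be slow on a table of this size. As one possible alternative, the sketch below does the same mapping with a vectorized lookup. The helper name `exp_lookup` is hypothetical, and it assumes the same code-to-exponent rules as `exp_transform`, including the choice to treat blank and symbol codes as an exponent of 0.

```r
# Hypothetical helper: vectorized counterpart of exp_transform()
exp_lookup <- function(codes) {
  codes <- toupper(as.character(codes))
  # Named vector mapping letter codes to powers of ten
  letter_map <- c(H = 2, K = 3, M = 6, B = 9)
  power <- unname(letter_map[codes])   # NA wherever the code is not a letter
  # Digit codes are used as the exponent itself
  is_digit <- grepl("^[0-9]+$", codes)
  power[is_digit] <- as.numeric(codes[is_digit])
  # Blank and symbol codes are assumed to mean a multiplier of 1
  power[codes %in% c("", "-", "?", "+")] <- 0
  if (anyNA(power)) stop("Invalid exponent value.")
  power
}

print(exp_lookup(c("K", "m", "5", "", "?")))
```

With a helper like this, the whole exponent column could be converted in one call, for example `exp_lookup(data$PROPDMGEXP)`.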
library(plyr)
econ_loss <- ddply(data, .(EVTYPE), summarize,
                   prop_dmg = sum(prop_dmg),
                   crop_dmg = sum(crop_dmg))
# Filter out events that caused no economic loss
econ_loss <- econ_loss[(econ_loss$prop_dmg > 0 | econ_loss$crop_dmg > 0), ]
# Keep the ten event types with the largest property and crop damage
prop_dmg_events <- head(econ_loss[order(econ_loss$prop_dmg, decreasing = TRUE), ], 10)
crop_dmg_events <- head(econ_loss[order(econ_loss$crop_dmg, decreasing = TRUE), ], 10)
Results: the weather events with the greatest economic impact
p1 <- ggplot(data=prop_dmg_events,
             aes(x=reorder(EVTYPE, prop_dmg), y=log10(prop_dmg), fill=prop_dmg)) +
  geom_bar(stat="identity") +
  coord_flip() +
  xlab("Event type") +
  ylab("Property damage in dollars (log-scale)") +
  theme(legend.position="none")
p2 <- ggplot(data=crop_dmg_events,
             aes(x=reorder(EVTYPE, crop_dmg), y=crop_dmg, fill=crop_dmg)) +
  geom_bar(stat="identity") +
  coord_flip() +
  xlab("Event type") +
  ylab("Crop damage in dollars") +
  theme(legend.position="none")
grid.arrange(p1, p2)