To inform policy on preventive measures against harmful and damaging weather events in the United States, this analysis attempts to determine which weather event types are most harmful to population health and which are most damaging to the country’s economy. It uses data provided by the National Oceanic and Atmospheric Administration (NOAA) and follows a simple approach to produce two rankings: one of the most harmful and one of the most damaging weather event types observed in the United States after 1996 and through 2011.
Weather events often have negative consequences for the health of a population, as well as for the economy of a country. Policy makers need to make informed decisions on allocating resources to counteract these weather events. This report contributes by attempting to answer two questions: which weather event types are most harmful to population health, and which have the greatest economic consequences?
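The analysis employs a simple ranking approach to answer the questions stated above. It proceeds in the following steps: load the data, restrict it to the relevant period and variables, total the health impact and the economic impact per event type, and rank the event types by each total.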
The data set used in this analysis is provided by the National Oceanic and Atmospheric Administration (NOAA).
The NOAA data set covers observations from 1950 to 2011. However, only events that began after 1996 are used: the year filter in the code keeps years strictly greater than 1996, that is, 1997 through 2011.
packages <- c("R.utils", "plyr", "ggplot2")
# Compare against installed package names only (the row names of the matrix).
installed <- packages %in% rownames(installed.packages())
if (any(!installed)) {
  install.packages(packages[!installed])
}
lapply(packages, library, character.only = TRUE)
## [[1]]
## [1] "R.utils" "R.oo" "R.methodsS3" "stats" "graphics"
## [6] "grDevices" "utils" "datasets" "methods" "base"
##
## [[2]]
## [1] "plyr" "R.utils" "R.oo" "R.methodsS3" "stats"
## [6] "graphics" "grDevices" "utils" "datasets" "methods"
## [11] "base"
##
## [[3]]
## [1] "ggplot2" "plyr" "R.utils" "R.oo" "R.methodsS3"
## [6] "stats" "graphics" "grDevices" "utils" "datasets"
## [11] "methods" "base"
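# Download and decompress the data only if the CSV is not already present.
if (!file.exists("storm_data.csv")) {
  file_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  file_name <- "storm_data.csv.bz2"
  # mode = "wb" keeps the compressed file intact on Windows.
  download.file(file_url, file_name, method = "auto", mode = "wb")
  # bunzip2() comes from R.utils and decompresses to storm_data.csv.
  bunzip2(file_name, overwrite = TRUE)
}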
original_storm_data <- read.csv("storm_data.csv")
nrow(original_storm_data)
## [1] 902297
storm_data <- original_storm_data[, c("BGN_DATE","EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
storm_data$BGN_DATE <- as.Date(as.character(storm_data$BGN_DATE), "%m/%d/%Y %H:%M:%S")
storm_data <- subset(storm_data, as.numeric(format(storm_data$BGN_DATE,"%Y")) > 1996)
nrow(storm_data)
## [1] 621260
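The date parsing and year filter above can be illustrated on a few made-up rows. This is a minimal sketch, assuming dates in the same `%m/%d/%Y %H:%M:%S` layout as BGN_DATE; the `toy` data frame and its values are hypothetical. It shows that a strict `> 1996` comparison drops an event from 31 December 1996 and keeps events from 1997 onward.

```r
# Made-up rows in the same date layout as BGN_DATE (illustrative only).
toy <- data.frame(
  BGN_DATE = c("12/31/1996 0:00:00", "1/1/1997 0:00:00", "4/27/2011 0:00:00"),
  EVTYPE   = c("FLOOD", "HAIL", "TORNADO"),
  stringsAsFactors = FALSE
)

# The time of day is parsed and then dropped when converting to Date.
toy$BGN_DATE <- as.Date(toy$BGN_DATE, format = "%m/%d/%Y %H:%M:%S")

# Extract the year as a number and keep years strictly later than 1996.
toy$YEAR <- as.numeric(format(toy$BGN_DATE, "%Y"))
kept <- subset(toy, YEAR > 1996)
kept$EVTYPE
```

If events from 1996 itself were meant to be included, the comparison would need to be `>= 1996`, which would change the row counts reported in this report.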
For an observation to be valuable for determining negative consequences on population health, at least one of its FATALITIES and INJURIES values must be greater than 0.
storm_data_for_harmfulness <- subset(storm_data, FATALITIES > 0 | INJURIES > 0)
For analysing consequences on population health, only EVTYPE, FATALITIES, and INJURIES are considered.
storm_data_for_harmfulness <- storm_data_for_harmfulness[,c("EVTYPE","FATALITIES","INJURIES")]
nrow(storm_data_for_harmfulness)
## [1] 11851
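To approximate the total effect on population health, fatalities and injuries are summed into a single HARMFULNESS score per observation, and the scores are then totalled per event type.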
storm_data_for_harmfulness$HARMFULNESS <- storm_data_for_harmfulness$FATALITIES + storm_data_for_harmfulness$INJURIES
storm_data_for_harmfulness_grouped_per_event_type <- ddply(storm_data_for_harmfulness, .(EVTYPE), numcolwise(sum))
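The per-event-type totals can also be sketched without plyr. The example below is an illustration on made-up numbers, not the report's own method: it uses base R's `aggregate()` in place of `ddply()` with `numcolwise(sum)`, and the `toy` data frame is hypothetical.

```r
# Made-up observations (illustrative only).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(2, 0, 1, 3),
  INJURIES   = c(10, 4, 0, 0),
  stringsAsFactors = FALSE
)

# Combined score per observation, as in the report.
toy$HARMFULNESS <- toy$FATALITIES + toy$INJURIES

# Sum the three listed columns within each event type.
per_type <- aggregate(cbind(FATALITIES, INJURIES, HARMFULNESS) ~ EVTYPE,
                      data = toy, FUN = sum)
per_type
```

The result has one row per event type, so the two TORNADO observations collapse into a single row holding their summed fatalities, injuries and combined score.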
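For economic impact, the analysis keeps only observations in which both damage exponent fields (PROPDMGEXP and CROPDMGEXP) are non-empty and at least one of PROPDMG and CROPDMG is greater than 0. Observations that record only one kind of damage and leave the other exponent field blank are therefore excluded.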
storm_data_for_economy <- subset(storm_data, PROPDMGEXP != "" & CROPDMGEXP != "")
storm_data_for_economy <- subset(storm_data_for_economy, PROPDMG > 0 | CROPDMG > 0)
storm_data_for_economy <- storm_data_for_economy[,c("EVTYPE","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
nrow(storm_data_for_economy)
## [1] 91502
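The effect of the two filtering steps above can be seen on a small made-up example. This is a sketch under the assumption that blank exponent fields are stored as empty strings, as the filter implies; the `toy` data frame and its values are hypothetical.

```r
# Made-up observations (illustrative only).
toy <- data.frame(
  EVTYPE     = c("FLOOD", "HAIL", "TORNADO"),
  PROPDMG    = c(5, 0, 2),
  PROPDMGEXP = c("M", "K", "K"),
  CROPDMG    = c(1, 0, 0),
  CROPDMGEXP = c("K", "K", ""),
  stringsAsFactors = FALSE
)

# Step 1: both exponent fields must be non-empty.
step1 <- subset(toy, PROPDMGEXP != "" & CROPDMGEXP != "")

# Step 2: at least one damage amount must be positive.
step2 <- subset(step1, PROPDMG > 0 | CROPDMG > 0)
step2$EVTYPE
```

Only the FLOOD row survives. The HAIL row has no positive amount, and the TORNADO row is dropped in step 1 because its crop exponent is blank, even though it records property damage.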
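The exponent codes act as multipliers for the recorded amounts: K stands for thousands, M for millions and B for billions. They are applied to obtain full damage values; any other code is treated as a multiplier of 0.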
storm_data_for_economy$PROPDMGEXP <- toupper(storm_data_for_economy$PROPDMGEXP)
storm_data_for_economy$CROPDMGEXP <- toupper(storm_data_for_economy$CROPDMGEXP)
storm_data_for_economy$PROPDMGFULL <- ifelse(storm_data_for_economy$PROPDMGEXP == "K", storm_data_for_economy$PROPDMG * 1000,
  ifelse(storm_data_for_economy$PROPDMGEXP == "M", storm_data_for_economy$PROPDMG * 1000000,
    ifelse(storm_data_for_economy$PROPDMGEXP == "B", storm_data_for_economy$PROPDMG * 1000000000, 0)))
storm_data_for_economy$CROPDMGFULL <- ifelse(storm_data_for_economy$CROPDMGEXP == "K", storm_data_for_economy$CROPDMG * 1000,
  ifelse(storm_data_for_economy$CROPDMGEXP == "M", storm_data_for_economy$CROPDMG * 1000000,
    ifelse(storm_data_for_economy$CROPDMGEXP == "B", storm_data_for_economy$CROPDMG * 1000000000, 0)))
storm_data_for_economy$DAMAGE <- storm_data_for_economy$PROPDMGFULL + storm_data_for_economy$CROPDMGFULL
storm_data_for_economy_grouped_per_event_type <- ddply(storm_data_for_economy, .(EVTYPE), numcolwise(sum))
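The multiplier step in the nested `ifelse()` calls above can also be written as a lookup. The following is a minimal sketch of that alternative, not the code the report ran: `multipliers` and `full_damage()` are hypothetical names introduced here, and unknown codes are mapped to 0 to mirror the fallback in the original.

```r
# Named lookup vector: exponent code -> multiplier.
multipliers <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: scale an amount by the multiplier for its exponent code.
# Codes outside K/M/B (including the empty string) yield 0.
full_damage <- function(amount, exp_code) {
  m <- unname(multipliers[toupper(as.character(exp_code))])
  m[is.na(m)] <- 0
  amount * m
}

full_damage(c(25, 1.5, 2, 7), c("K", "m", "B", "H"))
```

A lookup keeps the code-to-multiplier mapping in one place, so a further code could be supported by adding an entry rather than another level of nesting.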
top_10_events_for_harmfulness <- storm_data_for_harmfulness_grouped_per_event_type[
  order(storm_data_for_harmfulness_grouped_per_event_type$HARMFULNESS, decreasing = TRUE),
][1:10, ]
top_10_events_for_harmfulness[, c("EVTYPE", "HARMFULNESS")]
##                EVTYPE HARMFULNESS
## 96            TORNADO       21447
## 22     EXCESSIVE HEAT        8093
## 29              FLOOD        7121
## 66          LIGHTNING        4424
## 98          TSTM WIND        3524
## 28        FLASH FLOOD        2425
## 94  THUNDERSTORM WIND        1530
## 42               HEAT        1459
## 55  HURRICANE/TYPHOON        1339
## 112      WINTER STORM        1209
top_10_events_for_damage <- storm_data_for_economy_grouped_per_event_type[
  order(storm_data_for_economy_grouped_per_event_type$DAMAGE, decreasing = TRUE),
][1:10, ]
top_10_events_for_damage[, c("EVTYPE", "DAMAGE")]
##               EVTYPE       DAMAGE
## 17             FLOOD 136877233900
## 32 HURRICANE/TYPHOON  29348167800
## 51           TORNADO  16203902150
## 31         HURRICANE  11474663000
## 25              HAIL   9172124220
## 16       FLASH FLOOD   8246133530
## 48  STORM SURGE/TIDE   4641493000
## 50 THUNDERSTORM WIND   3780985440
## 61          WILDFIRE   3684468370
## 30         HIGH WIND   2873328540
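Both rankings follow the same pattern: sort by one column in decreasing order, then take the first ten rows. That pattern could be factored into a small helper, sketched here on made-up values. `top_n_by()` is a hypothetical name and not part of the original analysis.

```r
# Hypothetical helper: return the n rows with the largest values in `column`.
top_n_by <- function(df, column, n = 10) {
  ranked <- df[order(df[[column]], decreasing = TRUE), , drop = FALSE]
  head(ranked, n)
}

# Made-up totals (illustrative only).
toy <- data.frame(
  EVTYPE = c("HAIL", "FLOOD", "TORNADO", "WILDFIRE"),
  DAMAGE = c(9, 137, 16, 4),
  stringsAsFactors = FALSE
)
top_n_by(toy, "DAMAGE", n = 3)$EVTYPE
```

Using `head()` rather than `[1:10, ]` avoids rows of `NA` when the input has fewer than `n` rows.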
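This analysis shows that, in the period analysed, tornadoes are by far the most harmful weather event type to population health (21,447 combined fatalities and injuries), followed by excessive heat, floods and lightning. Floods cause by far the greatest economic damage (about 137 billion US dollars), followed by hurricanes/typhoons, tornadoes and hail.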