Synopsis: This report analyzes a dataset from the U.S. National Oceanic and Atmospheric Administration (NOAA) on the impact of storms and other extreme weather events on public health and the economy.
The first section, Data Processing, describes the steps taken to process the data and to investigate two questions:
1. Across the United States, which types of events are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?
The second section, Results, presents the answers to these questions.
Historically, wildfires and tornadoes have been the most harmful and deadly, while frost and hurricanes have had the greatest economic consequences.
The data used in this analysis can be downloaded from the Coursera repdata course website:
if (!file.exists("NOAA.storm.csv.bz2")) {
  file.URL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  download.file(file.URL, destfile = "NOAA.storm.csv.bz2", method = "curl")
}
The loaded dataset is cached, and the rest of this report depends on the cached value:
NOAA.storm <- read.csv("NOAA.storm.csv.bz2")
Of all the variables, the most relevant to the questions asked are FATALITIES, INJURIES, PROPDMG and CROPDMG. Not all events have non-zero values for these variables, so it is reasonable to split the dataset in two by impact type (public health in NOAA.health, economy in NOAA.economy):
NOAA.health <- NOAA.storm[NOAA.storm$FATALITIES > 0 | NOAA.storm$INJURIES > 0, ]
NOAA.economy <- NOAA.storm[NOAA.storm$PROPDMG > 0 | NOAA.storm$CROPDMG > 0, ]
We have 21929 health-related events in the dataset and 245031 events with economic consequences.
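Counts like these are the row counts of the two subsets. As a minimal sketch on a made-up four-row data frame (the `toy` object and all of its values are hypothetical, not taken from the NOAA data), the same filters and counts look like this:

```r
# Hypothetical toy data standing in for NOAA.storm (all values are invented).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "FLOOD", "HEAT"),
  FATALITIES = c(2, 0, 0, 1),
  INJURIES   = c(10, 0, 0, 0),
  PROPDMG    = c(25, 5, 0, 0),
  CROPDMG    = c(0, 1, 0, 0)
)

# Same filters as in the report: keep rows with any non-zero impact.
toy.health  <- toy[toy$FATALITIES > 0 | toy$INJURIES > 0, ]
toy.economy <- toy[toy$PROPDMG > 0 | toy$CROPDMG > 0, ]

# The number of rows is the number of events in each subset.
n.health  <- nrow(toy.health)
n.economy <- nrow(toy.economy)
```

In this toy example the TORNADO row lands in both subsets and the FLOOD row in neither, so the two subsets are neither disjoint nor exhaustive.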
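For each consequence in both categories we can compute the average or the total impact per event type. We can then compare the summed and mean impact of each event type and find the most dangerous ones. Note that the means are computed over the filtered subsets, that is, only over events with a non-zero impact in that category: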
For health-related events:
NOAA.health.fatal.mean <- aggregate(list(fatal.mean = NOAA.health$FATALITIES), by = list(event.type = NOAA.health$EVTYPE), mean)
NOAA.health.fatal.sum <- aggregate(list(fatal.sum = NOAA.health$FATALITIES), by = list(event.type = NOAA.health$EVTYPE), sum)
NOAA.health.injury.mean <- aggregate(list(injury.mean = NOAA.health$INJURIES), by = list(event.type = NOAA.health$EVTYPE), mean)
NOAA.health.injury.sum <- aggregate(list(injury.sum = NOAA.health$INJURIES), by = list(event.type = NOAA.health$EVTYPE), sum)
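To illustrate why both the total and the mean are worth computing, here is a small sketch that uses the same `aggregate()` pattern on invented data. The `toy` data frame and the merged `fatal` table are assumptions made for illustration only.

```r
# Hypothetical toy data: two event types with different numbers of events.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HEAT", "HEAT", "HEAT"),
  FATALITIES = c(4, 0, 1, 2, 3)
)

# Total and mean fatalities per event type, as in the report.
fatal.sum <- aggregate(list(fatal.sum = toy$FATALITIES),
                       by = list(event.type = toy$EVTYPE), sum)
fatal.mean <- aggregate(list(fatal.mean = toy$FATALITIES),
                        by = list(event.type = toy$EVTYPE), mean)

# Put both statistics side by side for comparison.
fatal <- merge(fatal.sum, fatal.mean, by = "event.type")
```

Here HEAT has the larger total only because it has more events, while the mean per event is the same for both types. Rankings by sum and by mean can therefore differ.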
Economic impact is harder to compute, because crop and property damage each come with an additional column (PROPDMGEXP and CROPDMGEXP) that holds a multiplier. According to the code book, the multipliers are Hundreds (H), Thousands (K), Millions (M) and Billions (B), applied to the base values in the main variables. Mapping these letters to numbers lets us convert the chosen columns to actual cost values:
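# Multiplier for each exponent code; "?" keeps the base value, " " zeroes it.
NOAA.economy.map <- c("H"=100, "K"=1000, "M"=1000000, "B"=1000000000, "?"=1, " "=0)
#head(NOAA.economy$PROPDMG) #track head for testing
# Codes that are not names in the map (including an empty string) index to NA,
# so the damage value for those rows becomes NA.
NOAA.economy$PROPDMG <- NOAA.economy$PROPDMG * as.integer(NOAA.economy.map[as.character(NOAA.economy$PROPDMGEXP)])
NOAA.economy$CROPDMG <- NOAA.economy$CROPDMG * as.integer(NOAA.economy.map[as.character(NOAA.economy$CROPDMGEXP)])
#head(NOAA.economy$PROPDMG)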
NOAA.economy.prop.mean <- aggregate(list(prop.mean = NOAA.economy$PROPDMG), by = list(event.type = NOAA.economy$EVTYPE), mean)
NOAA.economy.prop.sum <- aggregate(list(prop.sum = NOAA.economy$PROPDMG), by = list(event.type = NOAA.economy$EVTYPE), sum)
NOAA.economy.crop.mean <- aggregate(list(crop.mean = NOAA.economy$CROPDMG), by = list(event.type = NOAA.economy$EVTYPE), mean)
NOAA.economy.crop.sum <- aggregate(list(crop.sum = NOAA.economy$CROPDMG), by = list(event.type = NOAA.economy$EVTYPE), sum)
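If the exponent columns contain codes outside the mapping (for example lowercase letters or an empty string), a lookup in a named vector returns NA for them, and the NA propagates into the sums and means. The following is a minimal sketch of a more defensive lookup. The helper `exp.to.multiplier` is hypothetical and not part of the original analysis, and treating every unrecognized code as a multiplier of 1 is an assumption, not something the code book states.

```r
# Hypothetical helper, not part of the original analysis.
# Assumption: any code outside H/K/M/B (after upper-casing) means "no multiplier".
exp.to.multiplier <- function(exp.code, default = 1) {
  map <- c("H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9)
  # Normalize: factor -> character, strip whitespace, ignore case.
  code <- toupper(trimws(as.character(exp.code)))
  mult <- unname(map[code])
  # Unmatched codes come back as NA; replace them with the default.
  mult[is.na(mult)] <- default
  mult
}

# Invented example codes, including ones the map does not cover.
codes <- c("K", "m", "", "B", "?", "5")
multipliers <- exp.to.multiplier(codes)
```

Whether a default of 1, 0 or NA is appropriate depends on how the unrecognized codes should be interpreted, so the choice should be checked against the dataset documentation before the results are relied on.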
As a dependency for the plots in the next section, we load ggplot2, along with grid and gridExtra for arranging the plots.
library(ggplot2)
library(grid)
library(gridExtra)
The consequences of extreme weather events for public health can be fatalities or injuries. We can look at both in total or on average for each event type. Taking the six event types with the greatest impact (six is the default number of rows returned by head()) gives plots that are illustrative and conclusive:
max.fatal.sum <- head(NOAA.health.fatal.sum[order(NOAA.health.fatal.sum$fatal.sum, decreasing = TRUE),])
max.fatal.mean <- head(NOAA.health.fatal.mean[order(NOAA.health.fatal.mean$fatal.mean, decreasing = TRUE),])
max.injury.sum <- head(NOAA.health.injury.sum[order(NOAA.health.injury.sum$injury.sum, decreasing = TRUE),])
max.injury.mean <- head(NOAA.health.injury.mean[order(NOAA.health.injury.mean$injury.mean, decreasing = TRUE),])
fatal.sum <- ggplot(max.fatal.sum, aes(event.type, fatal.sum, fill = event.type)) +
  geom_bar(stat="identity") +
  xlab("") +
  theme(axis.text.x = element_text(angle = 25, hjust = 1)) +
  ylab("Total fatalities") + theme(legend.position="none")
fatal.mean <- ggplot(max.fatal.mean, aes(event.type, fatal.mean, fill = event.type)) +
  geom_bar(stat="identity") +
  xlab("") +
  theme(axis.text.x = element_text(angle = 25, hjust = 1)) +
  ylab("Mean fatalities") + theme(legend.position="none")
injury.sum <- ggplot(max.injury.sum, aes(event.type, injury.sum, fill = event.type)) +
  geom_bar(stat="identity") +
  xlab("Event type") +
  theme(axis.text.x = element_text(angle = 25, hjust = 1)) +
  ylab("Total injuries") + theme(legend.position="none")
injury.mean <- ggplot(max.injury.mean, aes(event.type, injury.mean, fill = event.type)) +
  geom_bar(stat="identity") +
  xlab("Event type") +
  theme(axis.text.x = element_text(angle = 25, hjust = 1)) +
  ylab("Mean injuries") + theme(legend.position="none")
grid.arrange(fatal.sum, fatal.mean, injury.sum, injury.mean, top=textGrob("Total and mean impact on public health, by event type\n"))
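ggplot2 draws a discrete x axis in the order of the factor levels, not in the row order of the data frame, so sorting the rows with order() does not by itself sort the bars by height. As a hedged sketch of one way to control the bar order, the base R function `reorder()` can set the levels by value. The `toy` data frame below is invented for illustration.

```r
# Hypothetical toy summary table (values are invented).
toy <- data.frame(
  event.type = c("FLOOD", "HEAT", "TORNADO"),
  fatal.sum  = c(10, 30, 20)
)

# reorder() sorts the levels by the given values in increasing order;
# negating the values therefore gives decreasing order.
toy$event.type <- reorder(toy$event.type, -toy$fatal.sum)
bar.order <- levels(toy$event.type)
```

The same expression can be used directly inside a plot mapping, for example `aes(reorder(event.type, -fatal.sum), fatal.sum)`, if bars sorted by height are preferred.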
The following graphs show how total and average cost differ by event type, for both property damage and crop damage:
max.prop.sum <- head(NOAA.economy.prop.sum[order(NOAA.economy.prop.sum$prop.sum, decreasing = TRUE),])
max.prop.mean <- head(NOAA.economy.prop.mean[order(NOAA.economy.prop.mean$prop.mean, decreasing = TRUE),])
max.crop.sum <- head(NOAA.economy.crop.sum[order(NOAA.economy.crop.sum$crop.sum, decreasing = TRUE),])
max.crop.mean <- head(NOAA.economy.crop.mean[order(NOAA.economy.crop.mean$crop.mean, decreasing = TRUE),])
prop.sum <- ggplot(max.prop.sum, aes(event.type, prop.sum, fill = event.type)) +
  geom_bar(stat="identity") +
  xlab("") +
  theme(axis.text.x = element_text(angle = 25, hjust = 1)) +
  ylab("Total property cost") + theme(legend.position="none")
prop.mean <- ggplot(max.prop.mean, aes(event.type, prop.mean, fill = event.type)) +
  geom_bar(stat="identity") +
  xlab("") +
  theme(axis.text.x = element_text(angle = 25, hjust = 1)) +
  ylab("Mean property cost") + theme(legend.position="none")
crop.sum <- ggplot(max.crop.sum, aes(event.type, crop.sum, fill = event.type)) +
  geom_bar(stat="identity") +
  xlab("Event type") +
  theme(axis.text.x = element_text(angle = 25, hjust = 1)) +
  ylab("Total crop cost") + theme(legend.position="none")
crop.mean <- ggplot(max.crop.mean, aes(event.type, crop.mean, fill = event.type)) +
  geom_bar(stat="identity") +
  xlab("Event type") +
  theme(axis.text.x = element_text(angle = 25, hjust = 1)) +
  ylab("Mean crop cost") + theme(legend.position="none")
grid.arrange(prop.sum, prop.mean, crop.sum, crop.mean, top=textGrob("Total and mean impact on economy, by event type\n"))
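The ranking step is repeated eight times in this report with only the table and column names changing. A small helper could express it once. The sketch below is an illustration only: the function `top.events` and the `toy` table are hypothetical, and the default of `n = 6` is chosen to match the default of head().

```r
# Hypothetical helper: return the n rows with the largest values in `column`.
top.events <- function(df, column, n = 6) {
  # order() places NA values last by default, so they never reach the top.
  ranked <- df[order(df[[column]], decreasing = TRUE), ]
  head(ranked, n)
}

# Invented summary table, including one NA total.
toy <- data.frame(
  event.type = c("A", "B", "C", "D"),
  prop.sum   = c(5, 50, NA, 20)
)

top2 <- top.events(toy, "prop.sum", n = 2)
```

With such a helper, a call like `top.events(NOAA.economy.prop.sum, "prop.sum")` would be equivalent to the corresponding head(order(...)) line, and the number of event types shown could be changed in one place.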