In this report we describe how harmful extreme weather events are for both population health and the economy. The analysis explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. Harm to population health was operationalised as the sum of fatalities and injuries; harm to the economy as the sum of property damage and crop damage. The outcomes reveal heat waves as the most harmful weather event for population health, and tornadoes (highest potential damage) and tropical storms (highest median damage) as the most harmful for the economy.
From the NOAA website we downloaded the data on severe weather events. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The events in the database start in the year 1950 and end in November 2011.
Further background is available in the database documentation and in the National Climatic Data Center Storm Events FAQ.
This data is first stored in a separate data folder and then loaded into R (takes a while).
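# Create the data folder if it does not exist yet, then download in binary mode
if (!dir.exists("data")) dir.create("data")
download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "data/stormdata.bz2", mode = "wb")
stormdata <- read.csv(bzfile("data/stormdata.bz2"))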
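Because the download and the read both take a while, it can help to skip the download when the file is already on disk. The following is a minimal sketch of that idea; `fetch_once` is a hypothetical helper name that does not appear in the original analysis.

```r
# Hypothetical helper: download a file only if it is not already on disk.
fetch_once <- function(url, destfile) {
  # Make sure the target folder exists (no warning if it already does)
  dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)
  if (!file.exists(destfile)) {
    # mode = "wb" keeps the compressed archive intact on Windows
    download.file(url, destfile, mode = "wb")
  }
  invisible(destfile)
}

# Usage with the URL and path from this report (not run here):
# fetch_once("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
#            "data/stormdata.bz2")
# stormdata <- read.csv(bzfile("data/stormdata.bz2"))
```

With this pattern, re-running the report only repeats the slow `read.csv` step.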
For the research questions in this report, we need (1) a description of event types and (2) a conceptualisation and operationalisation of ‘harmfulness to population health’ and ‘economic consequences’. Harmfulness to population health is conceptualised as the sum of injuries and fatalities; economic consequences as the sum of damage to property and damage to crops. Although fatalities and injuries could be given different weights, this report treats them as equal. Similarly, damage to property and damage to crops are treated as equal. This leads us to the following operationalisation: event type is EVTYPE, population harm is FATALITIES + INJURIES, and economic harm is PROPDMG + CROPDMG.
This allows us to create a processed dataset [stormdata_processed] containing the variables described above.
stormdata_processed <- data.frame(
  EVTYPE = stormdata$EVTYPE,
  populationharm = stormdata$FATALITIES + stormdata$INJURIES,
  economicharm = stormdata$PROPDMG + stormdata$CROPDMG)
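The economic harm above adds the raw PROPDMG and CROPDMG columns. In the NOAA data these columns are accompanied by magnitude codes in PROPDMGEXP and CROPDMGEXP (for example K, M and B for thousands, millions and billions), which this report does not use. The sketch below shows one way such codes could be turned into multipliers. The helper `exp_multiplier`, the toy data and the choice to treat every other code as 1 are assumptions made for illustration.

```r
# Hypothetical helper: map a magnitude code ("K", "M", "B") to a multiplier.
# Assumption: any other code, including a blank, is treated as a multiplier of 1.
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Toy rows (not taken from the storm database)
toy <- data.frame(PROPDMG = c(25, 2.5, 1), PROPDMGEXP = c("K", "M", ""))
toy$propdollars <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
toy$propdollars
```

Scaling damage this way would change the sums, medians and maxima reported below, so the figures in this report should be read as unscaled values.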
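In this section we describe the preprocessing of the data to answer two central questions: which types of events are most harmful with respect to population health, and which types of events have the greatest economic consequences.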
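When aggregating the data for each type of weather event, harmfulness can be interpreted in three ways: as the typical harm of a single event (median), as the total harm over all events (sum), or as the harm of the worst single event (maximum).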
These values are calculated for both research questions. For each research question a data.frame is created, which is then filtered to exclude less severe weather events (only event types with a median value greater than 1 are kept).
First, this is done for the harm to the population health.
# Median, sum and maximum of population harm per event type
harmforhealth_median <- tapply(stormdata_processed$populationharm, stormdata_processed$EVTYPE, median)
harmforhealth_sum <- tapply(stormdata_processed$populationharm, stormdata_processed$EVTYPE, sum)
harmforhealth_max <- tapply(stormdata_processed$populationharm, stormdata_processed$EVTYPE, max)
harmforhealth <- data.frame(
  EVTYPE = names(harmforhealth_median),
  median = as.numeric(harmforhealth_median),
  sum = as.numeric(harmforhealth_sum),
  max = as.numeric(harmforhealth_max))
# Keep only event types with a median above 1
harmforhealth_clean <- harmforhealth[(harmforhealth$median > 1), ]
Next, this is done for the economic consequences.
# Median, sum and maximum of economic harm per event type
economicharm_median <- tapply(stormdata_processed$economicharm, stormdata_processed$EVTYPE, median)
economicharm_sum <- tapply(stormdata_processed$economicharm, stormdata_processed$EVTYPE, sum)
economicharm_max <- tapply(stormdata_processed$economicharm, stormdata_processed$EVTYPE, max)
harmforeconomy <- data.frame(
  EVTYPE = names(economicharm_median),
  median = as.numeric(economicharm_median),
  sum = as.numeric(economicharm_sum),
  max = as.numeric(economicharm_max))
# Keep only event types with a median above 1
harmforeconomy_clean <- harmforeconomy[(harmforeconomy$median > 1), ]
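The same three summaries are computed twice above, once per outcome. As an illustration only, they could be wrapped in one function. The name `summarise_harm` and the toy data are hypothetical; the filter is the same median > 1 rule used in this report.

```r
# Hypothetical helper: median, sum and max of `values` within each group.
summarise_harm <- function(values, groups) {
  # One numeric vector per event type; unused factor levels are dropped
  parts <- split(as.numeric(values), groups, drop = TRUE)
  stats <- vapply(
    parts,
    function(x) c(median = median(x), sum = sum(x), max = max(x)),
    numeric(3))
  # `stats` has one column per event type, so transpose to one row per type
  data.frame(EVTYPE = names(parts), t(stats), row.names = NULL)
}

# Toy data (not taken from the storm database)
toy_values <- c(0, 2, 70, 1, 3, 5)
toy_groups <- c("A", "A", "B", "C", "C", "C")
toy_summary <- summarise_harm(toy_values, toy_groups)
toy_clean <- toy_summary[toy_summary$median > 1, ]
toy_clean
```

In the toy data, type A has a median of exactly 1 and is dropped, while B and C are kept.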
RQ1: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
To answer this question, the median (x-axis) is plotted against the sum (y-axis). The size of each point reflects the maximum value (labels are scaled by the same variable).
This provides an overview of the harmfulness of the different weather events for the health of the population (injuries + fatalities).
library(ggplot2)
# Median (x) against sum (y); point and label sizes follow the maximum value
harmforhealth_graph <- ggplot(data = harmforhealth_clean, aes(median, sum))
harmforhealth_plot <- harmforhealth_graph +
  geom_point(
    size = harmforhealth_clean$max,
    color = "steelblue",
    alpha = 3/4) +
  labs(title = "Harmfulness of different event types for population health") +
  labs(x = "Median amount of harm for the population health",
       y = "Total sum of harm for the population health") +
  geom_text(aes(
    label = EVTYPE),
    hjust = 1, vjust = 0,
    size = (harmforhealth_clean$max / 10))
harmforhealth_plot
To support the plot above, the weather events with the highest median, sum and maximum value are explicitly described below.
Highest median:
harmforhealth_clean[harmforhealth_clean$median==max(harmforhealth_clean$median),1:2]
## EVTYPE median
## 277 Heat Wave 70
Highest sum:
harmforhealth_clean[harmforhealth_clean$sum==max(harmforhealth_clean$sum),c(1,3)]
## EVTYPE sum
## 277 Heat Wave 70
Highest max:
harmforhealth_clean[harmforhealth_clean$max==max(harmforhealth_clean$max),c(1,4)]
## EVTYPE max
## 277 Heat Wave 70
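These data show that, among the event types retained after filtering, heat waves have the highest impact on the health of the population: they score highest on the sum, the median and the maximum alike. Because all three values are identical (70), this result rests on a single recorded event of type Heat Wave.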
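When the median, sum and maximum of an event type coincide, as in the output above, that type has only one record. A quick way to check how many records lie behind each summary is to count rows per event type. The sketch below uses hypothetical toy data rather than the storm database.

```r
# Toy data (not taken from the storm database)
toy <- data.frame(
  EVTYPE = c("Heat Wave", "TORNADO", "TORNADO", "TORNADO"),
  populationharm = c(70, 0, 1, 40)
)

# Number of records per event type
n_records <- table(toy$EVTYPE)

# Event types whose summary rests on a single record
single_record_types <- names(n_records)[n_records == 1]
single_record_types
```

Applied to `stormdata_processed$EVTYPE`, the same count would show which of the filtered event types are based on very few observations.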
RQ2: Across the United States, which types of events have the greatest economic consequences?
To answer this question, the log of the median (x-axis) is plotted against the log of the sum (y-axis). The logarithmic transformations make the plot easier to read, because the values span several orders of magnitude. The size of each point reflects the maximum value (labels are scaled by the same variable).
This provides an overview of the harmfulness of the different weather events for the economy (properties + crops).
# log(median) (x) against log(sum) (y); point and label sizes follow the maximum value
harmforeconomy_graph <- ggplot(harmforeconomy_clean, aes(log(median), log(sum)))
harmforeconomy_plot <- harmforeconomy_graph +
  geom_point(size = harmforeconomy_clean$max / 100,
    color = "steelblue",
    alpha = 3/4) +
  labs(title = "Harmfulness of different event types for the economy") +
  labs(x = "log(median amount of harm for the economy)",
       y = "log(total sum of harm for the economy)") +
  geom_text(aes(
    label = EVTYPE),
    hjust = 0, vjust = 0.7,
    size = (harmforeconomy_clean$max / 200))
harmforeconomy_plot
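The logarithmic axes in this plot rely on the earlier filter on the median. R's `log()` is the natural logarithm, which gives `-Inf` for 0 and 0 for 1, so only values above 1 map to positive coordinates. A small sketch with hypothetical toy values:

```r
# Toy medians (not taken from the storm database)
toy_median <- c(0, 1, 2.5, 1000)

# Natural log: -Inf for 0, 0 for 1, positive for values above 1
log(toy_median)

# The median > 1 filter used in this report leaves only positive logs
keep <- toy_median > 1
log(toy_median[keep])
```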
To support the plot above, the weather events with the highest median, sum and maximum value are explicitly described below.
Highest median:
harmforeconomy_clean[harmforeconomy_clean$median==max(harmforeconomy_clean$median),]
## EVTYPE median sum max
## 851 TROPICAL STORM GORDON 1000 1000 1000
Highest sum:
harmforeconomy_clean[harmforeconomy_clean$sum==max(harmforeconomy_clean$sum),]
## EVTYPE median sum max
## 834 TORNADO 2.5 3312277 4410
Highest max:
harmforeconomy_clean[harmforeconomy_clean$max==max(harmforeconomy_clean$max),]
## EVTYPE median sum max
## 834 TORNADO 2.5 3312277 4410
These data reveal tornadoes as the weather events that can cause the highest damage to the economy, which is also reflected in their high sum value. However, tornadoes are not always this destructive: their median value (2.5) is far below their maximum (4410), so a typical tornado causes much less damage than the few very destructive ones. When it comes to the highest median value, tropical storms are more harmful, although the top entry, TROPICAL STORM GORDON, has identical median, sum and maximum (1000) and therefore reflects a single recorded event.
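The gap between a low median and a high sum or maximum is typical of skewed data, where a few extreme events dominate the total. The sketch below illustrates this with a hypothetical toy vector. It borrows the tornado median and maximum from the output above for scale, but the other values are invented.

```r
# Toy damage values (invented; only 2.5 and 4410 echo the tornado output)
toy_damage <- c(0, 0, 1, 2, 2.5, 3, 5, 10, 4410)

median(toy_damage)  # middle value: 2.5
max(toy_damage)     # largest single event: 4410
sum(toy_damage)     # total over all events: 4433.5

# Share of the total contributed by the single largest event
share_of_max <- max(toy_damage) / sum(toy_damage)
share_of_max
```

In this toy case one event accounts for more than 99% of the total, which is why the median and the sum can rank event types very differently.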