The following analysis uses data provided by the US NOAA Storm Database to assess the link between types of storm and their impact on population health and the economy.
The analysis uses data recorded in the Storm Database from 1950 to 2011 to assess the effect on population health (as measured by fatalities and injuries) and on the economy (as measured by total damage to both property and crops).
The analysis highlights that tornados have the greatest impact on population health, accounting for the greatest percentage of deaths and injuries from storms, whereas floods have had the greatest economic impact, with c.$150bn in property and crop damage over the period.
Data processing has been split into two parts: first the loading of the raw data, and second the pre-processing needed to address each question. The data was also processed to tidy up anomalies and make it more consistent.
Data has been sourced from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
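Once downloaded, the file is loaded into R using the read.csv function. The download is bzip2-compressed, but read.csv can read it directly without a separate decompression step.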
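# Download the raw data once; the file is bzip2-compressed despite the .csv name
if (!file.exists("STORM_Data.csv")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                "STORM_Data.csv", mode = "wb")  # binary mode keeps the archive intact on Windows
}
raw_data <- read.csv("STORM_Data.csv")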
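The loading step relies on R detecting compression automatically when a file is opened for reading. A minimal sketch of that behaviour on a toy file (the file contents and the names `tmp` and `toy` are made up for illustration and are not part of the storm data):

```r
# Write a tiny bzip2-compressed CSV to a temporary file (toy data, not NOAA data)
tmp <- tempfile(fileext = ".csv")
con <- bzfile(tmp, open = "wt")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "FLOOD,1"), con)
close(con)

# read.csv opens the file through a connection that detects the compression,
# so no explicit decompression is needed even though the extension is .csv
toy <- read.csv(tmp)
str(toy)
```

The same mechanism is what allows the downloaded archive to be read under the name STORM_Data.csv.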
Question 1 concerns the relationship between event type (EVTYPE) and population health.
The pre-processing creates a data.frame with the relevant variables. Event type names have been converted to lower case and trimmed of surrounding whitespace, which removes variability due to the way they were entered in the database.
The data is then summarised to give the total fatalities and injuries by event type over the complete history available.
The data was filtered to include only the relevant columns, and the percentage of total injuries and fatalities was then calculated for each event type. Finally, the data was transformed into long form and limited to the top 10 event types per measure for plotting in the results section.
library(dplyr)
library(tidyr)
eventtype.health.data <- raw_data %>%
  select(EVTYPE, FATALITIES, INJURIES) %>%
  mutate(FAT_PERCENT = FATALITIES / sum(FATALITIES) * 100,
         INJ_PERCENT = INJURIES / sum(INJURIES) * 100,
         EVTYPE = trimws(tolower(EVTYPE))) %>%
  group_by(EVTYPE) %>%
  summarise(FAT_PERCENT = sum(FAT_PERCENT), INJ_PERCENT = sum(INJ_PERCENT)) %>%
  pivot_longer(cols = -EVTYPE, names_to = "TYPE", values_to = "PERCENT") %>%
  group_by(TYPE) %>%
  top_n(10, PERCENT) %>%
  ungroup() %>%
  arrange(TYPE, desc(PERCENT))
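To illustrate why the event type names are normalised before grouping, the following base R sketch applies the same lower-casing, trimming and percentage calculation to a toy data frame. The values and the names `toy` and `shares` are invented for illustration and do not come from the storm database.

```r
# Toy records: the same event entered three different ways, plus one other event
toy <- data.frame(
  EVTYPE = c("TORNADO", " Tornado", "tornado ", "FLOOD"),
  FATALITIES = c(5, 3, 2, 10),
  stringsAsFactors = FALSE
)

# Normalise the labels so that spelling variants collapse into one category
toy$EVTYPE <- trimws(tolower(toy$EVTYPE))

# Each row's share of all fatalities, then summed per event type
toy$FAT_PERCENT <- toy$FATALITIES / sum(toy$FATALITIES) * 100
shares <- aggregate(FAT_PERCENT ~ EVTYPE, data = toy, FUN = sum)
print(shares)
```

Without the normalisation step the three tornado spellings would be reported as three separate event types, each with a smaller share.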
Pre-processing for the question of event type and economic effect selects the variables that relate to event type and economic damage into a single dataset, from which the analysis is undertaken.
Again, event types have been made lowercase and trimmed of whitespace to remove inconsistencies caused by data entry.
Property and Crop damage in dollar amounts are used as a proxy for total economic damage by each Storm event.
Values are then converted to actual dollar amounts based on the exponent codes in PROPDMGEXP and CROPDMGEXP: B for billions, M for millions, K for thousands and H for hundreds.
Entries with no unit provided are assumed to be in dollar amounts.
Any entries with other codes have been removed from the dataset, as the correct units cannot be confirmed; these are classed as data entry errors.
Data transforms: The raw data was first condensed to include only the columns relevant to the analysis. Event types and exponent codes were then converted to lowercase and trimmed of whitespace.
The PROPDMGEXP and CROPDMGEXP descriptors were then used to convert the damage figures into common dollar units. The final output combines property and crop damage to give the total dollar amount of damage, summarised for the top 20 event types.
eventtype.economic.data <- raw_data %>%
  select(EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%  # Relevant columns
  mutate(EVTYPE = trimws(tolower(EVTYPE)), PROPDMGEXP = trimws(tolower(PROPDMGEXP)),
         CROPDMGEXP = trimws(tolower(CROPDMGEXP))) %>%  # Format lowercase and whitespace
  filter(CROPDMGEXP %in% c("m", "k", "b", "h", "") & PROPDMGEXP %in% c("m", "k", "b", "h", "")) %>%
  mutate(propdmg.dollar =  # Calculate dollar amounts
           ifelse(PROPDMGEXP == "h", PROPDMG * 100,
           ifelse(PROPDMGEXP == "k", PROPDMG * 1000,
           ifelse(PROPDMGEXP == "m", PROPDMG * 1000000,
           ifelse(PROPDMGEXP == "b", PROPDMG * 1000000000,
                  PROPDMG
           ))))) %>%
  mutate(cropdmg.dollar =
           ifelse(CROPDMGEXP == "h", CROPDMG * 100,
           ifelse(CROPDMGEXP == "k", CROPDMG * 1000,
           ifelse(CROPDMGEXP == "m", CROPDMG * 1000000,
           ifelse(CROPDMGEXP == "b", CROPDMG * 1000000000,
                  CROPDMG
           ))))) %>%
  mutate(total.dollar.damage = propdmg.dollar + cropdmg.dollar) %>%
  group_by(EVTYPE) %>%
  summarise(total.dollar.damage = sum(total.dollar.damage)) %>%
  top_n(20, total.dollar.damage)  # Summarise the top event types by dollar value
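The unit conversion can also be expressed as a lookup table rather than nested ifelse calls. The sketch below is an alternative formulation, not the method used in the analysis above; the names `multiplier` and `to_dollars` are hypothetical, and unlike the pipeline (which drops rows with unrecognised codes) it returns NA for them.

```r
# Multipliers for the recognised exponent codes (hypothetical helper table)
multiplier <- c(h = 1e2, k = 1e3, m = 1e6, b = 1e9)

# Convert a damage figure and its exponent code to dollars.
# Blank codes are treated as plain dollars; unknown codes give NA.
to_dollars <- function(amount, exp_code) {
  code <- trimws(tolower(exp_code))
  mult <- unname(multiplier[code])  # NA where the code is blank or unrecognised
  mult[code == ""] <- 1             # no unit given: already in dollars
  amount * mult
}

# Toy inputs, not taken from the storm database
converted <- to_dollars(c(25, 2.5, 10, 7, 3), c("K", "m", "", "B", "?"))
print(converted)
```

A lookup of this kind keeps the code-to-multiplier mapping in one place, which makes it easier to check against the assumptions stated above.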
The impact on population health has been measured using the number of injuries and fatalities caused by each storm event recorded in the database.
library(ggplot2)
labs <- c("Percent of Total Fatalities", "Percent of Total Injuries")
names(labs) <- c("FAT_PERCENT", "INJ_PERCENT")
g <- ggplot(data = eventtype.health.data, aes(x = reorder(EVTYPE, -PERCENT), y = PERCENT))
g + geom_bar(stat = "identity", aes(colour = TYPE), fill = "white") +
  facet_wrap(vars(TYPE), nrow = 2, ncol = 1,
             labeller = labeller(TYPE = labs)) +
  theme(legend.position = "none", axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  labs(title = "Percentage of Injuries and Fatalities by Storm Event Type",
       x = "Event", y = "Percent of Total Injuries / Fatalities",
       subtitle = "Top 10 events by percent")
As we can see, the largest impact on population health across the entire period and across the US has come from tornados. These account for over 60% of all injuries and 35% of all fatalities caused by storms over the period. Excessive heat and lightning follow as the events with the next largest effect on population health.
The economic impact has been measured by analysing the approximate cost of damage to both property and crops over the USA from all storms from 1950 to 2011.
p <- ggplot(data = eventtype.economic.data,
            aes(x = reorder(EVTYPE, -total.dollar.damage), y = total.dollar.damage / 1e9))  # dollars to billions
p + geom_bar(stat = "identity", colour = "navy", fill = "white") +
  theme(legend.position = "none", axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  labs(title = "Total Dollar Impact (Bn.) by Storm Event Type",
       x = "Event", y = "Total Dollar Value of Storm Damage ($bn)",
       subtitle = "Top 20 events by value")
In terms of economic impact, floods have had by far the largest effect, causing c.$150bn of damage to property and crops over the period 1950 to 2011. They are followed by hurricanes and then tornados.