In this report, we look at the effects of severe weather events on public health and the economy. We use data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to determine which types of events are most harmful to public health and which types have the greatest economic consequences. The effect on public health is measured by the number of fatalities and injuries attributed to the event, while the economic consequences are measured in dollars of property and crop damage. To determine which events are most harmful to public health, we look at the average number of fatalities and injuries per year and examine how these totals have changed over time for the five events with the highest overall average number of fatalities and injuries. We do a similar analysis for economic consequences. Overall, we find that tornadoes tend to cause the most injuries and fatalities, though heat-related events have had significant public health consequences in recent years. We also find that floods tend to have the greatest economic consequences, though the consequences of hurricanes can also be significant, as was the case for Hurricane Opal in 1995.
library(plyr)
library(dplyr)
library(reshape2)
library(ggplot2)
First, we load the raw data into R using ‘read.csv’. After loading the data, we convert the property and crop damage figures into dollar amounts using the variables PROPDMGEXP and CROPDMGEXP, respectively, where “h” or “H” denotes hundreds of dollars, “k” or “K” denotes thousands, “m” or “M” denotes millions, and “B” denotes billions. All other designations are given a multiplier of zero, so the corresponding damage amounts are counted as zero dollars.
We also extract the year in which each weather event began and store it in the variable YEAR for our analysis of consequences by year.
After the conversion, we select the variables pertinent to our analysis: event type (EVTYPE), year in which the event began (YEAR), fatalities (FATALITIES), injuries (INJURIES), property damage in dollars (PROPDMG), and crop damage in dollars (CROPDMG).
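# Load the raw data and convert the damage figures into dollar amounts.
storm <- read.csv("repdata-data-StormData.csv.bz2") %>%
  # Map each property-damage exponent code to a dollar multiplier (unrecognized codes become 0).
  mutate(PROPDMGEXP = ifelse(PROPDMGEXP == "h" | PROPDMGEXP == "H", 100,
                      ifelse(PROPDMGEXP == "k" | PROPDMGEXP == "K", 1000,
                      ifelse(PROPDMGEXP == "m" | PROPDMGEXP == "M", 10^6,
                      ifelse(PROPDMGEXP == "B", 10^9, 0))))) %>%
  # Do the same for the crop-damage exponent code.
  mutate(CROPDMGEXP = ifelse(CROPDMGEXP == "h" | CROPDMGEXP == "H", 100,
                      ifelse(CROPDMGEXP == "k" | CROPDMGEXP == "K", 1000,
                      ifelse(CROPDMGEXP == "m" | CROPDMGEXP == "M", 10^6,
                      ifelse(CROPDMGEXP == "B", 10^9, 0))))) %>%
  # Scale the damage amounts and take the four characters of BGN_DATE that hold the year.
  mutate(PROPDMG = PROPDMG * PROPDMGEXP,
         CROPDMG = CROPDMG * CROPDMGEXP,
         YEAR = substr(BGN_DATE, nchar(as.character(BGN_DATE)) - 11, nchar(as.character(BGN_DATE)) - 8)) %>%
  # Keep only the variables used in the analysis.
  select(EVTYPE, YEAR, FATALITIES, INJURIES, PROPDMG, CROPDMG)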
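The exponent conversion and year extraction described above can also be sketched with a named lookup vector and a date parse. This is only an illustrative alternative, not the code used in this report: the data frame `toy`, its values, and the column name `PROPDMG_USD` are made up for the example, and unlike the chain of `ifelse` calls it also accepts a lowercase "b".

```r
# Hypothetical toy records; the values are illustrative, not NOAA data.
toy <- data.frame(
  BGN_DATE   = c("4/18/1950 0:00:00", "10/4/1995 0:00:00", "6/1/2011 0:00:00"),
  PROPDMG    = c(25, 2.1, 5),
  PROPDMGEXP = c("K", "B", "?"),
  stringsAsFactors = FALSE
)

# Named lookup vector: exponent code -> dollar multiplier.
multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Codes missing from the lookup index to NA; set them to 0 to mirror the rule above.
mult <- unname(multipliers[toupper(toy$PROPDMGEXP)])
mult[is.na(mult)] <- 0
toy$PROPDMG_USD <- toy$PROPDMG * mult

# Parse the date part of BGN_DATE and keep the year as an integer.
toy$YEAR <- as.integer(format(as.Date(toy$BGN_DATE, format = "%m/%d/%Y"), "%Y"))
```

Storing the year as an integer rather than as text would also let a plotting library treat it as a continuous axis.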
Next, we look at the types of events that are most harmful to public health, that is, the events that cause the most fatalities and injuries. We find the total number of injuries and fatalities for each event type by year and then compute the mean of these yearly totals for each event type over all years. For the event types that rank in the top 10 by both mean fatalities and mean injuries, we create a two-panel time-series plot, with one panel for total injuries and one for total fatalities, to compare how they change over time.
# Total fatalities and injuries for each event type and year.
public_health <- ddply(storm,
                       .(EVTYPE, YEAR),
                       summarise,
                       fatalities = sum(FATALITIES),
                       injuries = sum(INJURIES)
)
# Mean of the yearly totals for each event type.
public_health_mean <- ddply(public_health,
                            .(EVTYPE),
                            summarise,
                            mean_fatalities = mean(fatalities),
                            mean_injuries = mean(injuries)
)
# Top 10 event types by mean fatalities and by mean injuries; keep those in both lists.
top_fatalities <- arrange(public_health_mean, desc(mean_fatalities), desc(mean_injuries))[1:10, 1]
top_injuries <- arrange(public_health_mean, desc(mean_injuries), desc(mean_fatalities))[1:10, 1]
most_public_harm <- intersect(top_fatalities, top_injuries)
most_harm_public_health <- public_health[public_health$EVTYPE %in% most_public_harm, ]
# Reshape to long format so that fatalities and injuries can be drawn in separate panels.
most_harm_public_health <- melt(most_harm_public_health, id.vars = c("EVTYPE", "YEAR"),
                                value.name = "Number",
                                variable.name = "Type")
most_harm_public_health$EVTYPE <- factor(most_harm_public_health$EVTYPE)
# One line per event type, one panel per measure; YEAR is stored as text, so the tick labels are hidden.
ggplot(most_harm_public_health, aes(YEAR, Number, colour = EVTYPE)) + geom_line(aes(group = EVTYPE)) +
  facet_grid(Type ~ ., scales = "free_y") +
  xlab("Year") + ylab("Total Number of Lives") +
  theme(axis.text.x = element_blank()) +
  labs(title = "Public Health Consequences of Weather Events over Time, 1950-2011")
From the plot, we see that tornadoes consistently cause the most injuries and often cause the most fatalities. Heat and excessive heat have caused significant injuries and fatalities in recent years, so they may be severe events to watch.
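The ranking procedure behind this plot (yearly totals, then the mean of those totals, then the event types common to both top-N lists) can be illustrated on a small made-up data set. The sketch below uses base R's `aggregate` rather than `ddply`, and the data frame `events`, the helper `top_n_events`, and the cutoff `n` are hypothetical names introduced only for this example.

```r
# Hypothetical toy data: one row per event; values are illustrative only.
events <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "TORNADO", "HEAT", "HEAT", "HAIL"),
  YEAR       = c(2010, 2010, 2011, 2010, 2011, 2011),
  FATALITIES = c(3, 1, 6, 2, 9, 0),
  INJURIES   = c(40, 10, 30, 5, 15, 2)
)

# Stage 1: total fatalities and injuries per event type and year.
yearly <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE + YEAR, data = events, FUN = sum)

# Stage 2: mean of the yearly totals per event type.
means <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = yearly, FUN = mean)

# Hypothetical helper: names of the n event types with the largest value in `column`.
top_n_events <- function(df, column, n) {
  ranked <- df[order(df[[column]], decreasing = TRUE), ]
  head(as.character(ranked$EVTYPE), n)
}

# Keep only the event types that appear in both top-n lists.
n <- 2
most_harmful <- intersect(top_n_events(means, "FATALITIES", n),
                          top_n_events(means, "INJURIES", n))
```

Note that the mean is taken only over years in which an event type has at least one record, so a type that appears in a single severe year can rank highly.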
Now, we look at the types of events that have the greatest economic consequences, that is, the events that cause the greatest property and crop damage in dollars. The analysis parallels the one for public health consequences. We find the total dollar amount of property and crop damage for each event type by year and then compute the mean of these yearly totals for each event type over all years. For the event types that rank in the top 10 by both mean property damage and mean crop damage, we create a two-panel time-series plot, with one panel for total property damage and one for total crop damage, to compare how they change over time.
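# Total property and crop damage (in dollars) for each event type and year.
economy <- ddply(storm,
                 .(EVTYPE, YEAR),
                 summarise,
                 Property_Damage = sum(PROPDMG),
                 Crop_Damage = sum(CROPDMG)
)
# Mean of the yearly totals for each event type.
economy_mean <- ddply(economy,
                      .(EVTYPE),
                      summarise,
                      mean_propdmg = mean(Property_Damage),
                      mean_cropdmg = mean(Crop_Damage)
)
# Top 10 event types by mean property damage and by mean crop damage; keep those in both lists.
top_propdmg <- arrange(economy_mean, desc(mean_propdmg), desc(mean_cropdmg))[1:10, 1]
# Rank by crop damage first here (the original line repeated the property-damage ordering).
top_cropdmg <- arrange(economy_mean, desc(mean_cropdmg), desc(mean_propdmg))[1:10, 1]
most_dmg <- intersect(top_propdmg, top_cropdmg)
most_econ_conseq <- economy[economy$EVTYPE %in% most_dmg, ]
# Reshape to long format so that property and crop damage can be drawn in separate panels.
most_econ_conseq <- melt(most_econ_conseq, id.vars = c("EVTYPE", "YEAR"),
                         value.name = "Dollars",
                         variable.name = "Type")
most_econ_conseq$EVTYPE <- factor(most_econ_conseq$EVTYPE)
# One line per event type, one panel per damage type; YEAR is stored as text, so the tick labels are hidden.
ggplot(most_econ_conseq, aes(YEAR, Dollars, colour = EVTYPE)) + geom_line(aes(group = EVTYPE)) +
  facet_grid(Type ~ ., scales = "free_y") +
  xlab("Year") + ylab("Total Amount of Damage ($)") +
  theme(axis.text.x = element_blank()) +
  labs(title = "Economic Consequences of Weather Events over Time, 1950-2011")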
From the plot, we see that floods appear to have the greatest economic consequences. Also of note is that hurricanes can cause very significant damage, as in the case of Hurricane Opal in 1995.
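The two-panel layout depends on reshaping the yearly totals from one column per damage type into one row per damage type, so that the panels can be split on a single variable. A minimal sketch of that reshaping step with `melt` from reshape2 follows; the data frame `wide` and its dollar values are made up for illustration.

```r
library(reshape2)

# Hypothetical yearly totals in wide form; the dollar values are illustrative only.
wide <- data.frame(
  EVTYPE          = c("FLOOD", "FLOOD"),
  YEAR            = c(2010, 2011),
  Property_Damage = c(1e6, 3e6),
  Crop_Damage     = c(2e5, 5e5)
)

# Long form: one row per event type, year, and damage type.
long <- melt(wide, id.vars = c("EVTYPE", "YEAR"),
             variable.name = "Type", value.name = "Dollars")
```

Each original row yields two rows in `long`, one for each damage type, and the `Type` column is what the panels are split on.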