Analysing the Impacts of Major Weather Events in the US

Abstract

In this report we will explore the impact of major weather events on both health and economic outcomes across the US from 1950 to November 2011, based on data available from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. We consider here the total economic impact including crop damage, and the total health impact including both fatalities and injuries. Tornados will be shown to have the most impact to human health, while floods have the greatest economic impact.

Data Processing

## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(dplyr)
library(ggplot2)

First we’ll download the dataset and read in the compressed .bv2 file to our “data” dataframe.

file_url<-"https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file (file_url, destfile = "data.csv.bz2", method="curl",quiet=TRUE)

data <- read.csv ("data.csv.bz2", header=TRUE, stringsAsFactors=FALSE)

First We’ll create new fields in the data frame to account for the prop damage and the crop damage. This invloves replacing the symbol in PROPDMGEXP with the value we’ll use to scale the PROPDMG field. After this will create a new data frame aggregating by event type. I borrowed the method for replacing the symbols from https://rpubs.com/sararafi/82311.

unique(data$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
data$PROPEXP[data$PROPDMGEXP ==  "K"]  <-    1000
data$PROPEXP[data$PROPDMGEXP == "M"]   <-  10^6
data$PROPEXP[data$PROPDMGEXP == ""]   <-  1
data$PROPEXP[data$PROPDMGEXP == "B"]   <-  10^9
data$PROPEXP[data$PROPDMGEXP == "m"]   <-  10^6
data$PROPEXP[data$PROPDMGEXP == "+"]   <-  0
data$PROPEXP[data$PROPDMGEXP == "0"]   <-  1
data$PROPEXP[data$PROPDMGEXP == "5"]   <-  10^5
data$PROPEXP[data$PROPDMGEXP == "6"]   <-  10^6
data$PROPEXP[data$PROPDMGEXP == "?"]   <-  0
data$PROPEXP[data$PROPDMGEXP == "4"]   <-  10000
data$PROPEXP[data$PROPDMGEXP == "2"]   <-  100
data$PROPEXP[data$PROPDMGEXP == "3"]   <-  1000
data$PROPEXP[data$PROPDMGEXP == "h"]   <-  100
data$PROPEXP[data$PROPDMGEXP == "7"]   <-  10^7
data$PROPEXP[data$PROPDMGEXP == "H"]   <-  100
data$PROPEXP[data$PROPDMGEXP == "-"]   <-  0
data$PROPEXP[data$PROPDMGEXP == "1"]   <-  10
data$PROPEXP[data$PROPDMGEXP == "8"]   <-  10^8


data$PROPDMGVAL <- round(data$PROPDMG * data$PROPEXP/1000000000,3)


data$CROPEXP[data$CROPDMGEXP ==  ""]   <-  1
data$CROPEXP[data$CROPDMGEXP == "M"]   <-  10^6
data$CROPEXP[data$CROPDMGEXP == "K"]   <-  1000
data$CROPEXP[data$CROPDMGEXP == "m"]   <-  10^9
data$CROPEXP[data$CROPDMGEXP == "B"]   <-  10^6
data$CROPEXP[data$CROPDMGEXP == "?"]   <-  0
data$CROPEXP[data$CROPDMGEXP == "0"]   <-  1
data$CROPEXP[data$CROPDMGEXP == "k"]   <-  1000
data$CROPEXP[data$CROPDMGEXP == "2" ]   <-  100

data$CROPDMGVAL <- round(data$CROPDMG * data$CROPEXP/1000000000,3)

Summarize by event type and see if we can trim the dataset by removing events with low occurence. There appears to be naming inconsistencies in the event types so this will help.

event_summary<-data %>% group_by(EVTYPE) %>% summarize(count_events = length(EVTYPE), 
                                                        total_fatalities = sum(FATALITIES),
                                                        total_injuries = sum(INJURIES),
                                                        total_propdmg = sum(PROPDMGVAL),
                                                        total_cropdmg = sum(CROPDMGVAL))
lowcount<-event_summary %>% filter(count_events<=10)
sum(lowcount$total_fatalities)+sum(lowcount$total_fatalities)
## [1] 476
sum(event_summary$total_fatalities)+sum(event_summary$total_fatalities)
## [1] 30290

The events with frequency < 10 account for only 1.6% of total fatalities and injuries, as such we can omit from the analysis.

We get rid of the low frequency stuff. then we create a new field summing injuries and fatalities with equal weight.

trimmed_data<-event_summary %>% filter(count_events>10)
trimmed_data<-trimmed_data %>% mutate(health_impact = total_fatalities+total_injuries)

I think now we’re ready to look at some results.

Results

Starting with health impact. First we’ll order by decreasing health impact and take the top 10 records, which will be plotted in a barplot.

trimmed_data<-trimmed_data[order(trimmed_data$health_impact, decreasing=TRUE),]
trimmed_data_plot<-head(trimmed_data,10)

ggplot(data=trimmed_data_plot, aes(x=reorder(EVTYPE, -health_impact), y=health_impact)) +
  geom_bar(stat="identity", fill="steelblue")+
  geom_text(aes(label=health_impact), vjust=-0.3, size=3.5)+
  theme_minimal()+
  ggtitle("Health Impact of Major Weather Events")+
  xlab("Event Type") + ylab("Fatalities and Injuries")+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

By a massive margin, tornados appear to cause the most substantial health impact.

Now we can look at the economic impacts.

trimmed_data<-trimmed_data[order(trimmed_data$total_propdmg, decreasing=TRUE),]
trimmed_data_plot<-head(trimmed_data,10)

ggplot(data=trimmed_data_plot, aes(x=reorder(EVTYPE, -total_propdmg), y=total_propdmg)) +
  geom_bar(stat="identity", fill="steelblue")+
  geom_text(aes(label=total_propdmg), vjust=-0.3, size=3.5)+
  theme_minimal()+
  ggtitle("Economic Impact of Major Weather Events")+
  xlab("Event Type") + ylab("Property and Crop Damage ($B)")+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

In this case we cleary see that Flooding has the greatest economic impact. We now conclude that the greatest health impacts come from Tornados, while the greatest economic impacts come from flooding.