Impact of Severe Weather Events

Synopsis

The analysis below examines severe weather data collected by the National Weather Service from 1950 to 2011. The goal of our investigation is to answer two essential questions:

  1. Which types of events are most harmful to population health?
  2. Which types of events have the greatest economic consequences?

In the investigation we aimed to answer the health component by finding the totals of injuries and fatalities resulting from severe weather events. We then found totals for property and crop damage to address the economic consequences.

The results were interpreted by ordering the sums and plotting them on bar graphs that show the amount of impact each type of event has.

Data Processing

Loading in Data

The data is downloaded from its URL and originally came from the National Weather Service. The function bzfile is used to decompress the bzip2-compressed csv file as it is read.

setwd("F:/Coursera/Course 5 Reproducible Research/Project 2")

download.file(url = "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",  
              destfile = "Storm_Data.csv.bz2")
Storm_Data <- read.csv(bzfile("Storm_Data.csv.bz2"))
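To illustrate how bzfile behaves, here is a minimal sketch that writes a tiny csv through a bzip2 connection and reads it back. The data frame `toy` and the temporary file `tmp` are made-up names for this illustration and are not part of the analysis.

```r
# Hypothetical toy data; not taken from the storm dataset
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(3, 0))
tmp <- tempfile(fileext = ".csv.bz2")

# Write the csv through a bzip2-compressing connection
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the bzfile() connection, decompresses, and closes it
round_trip <- read.csv(bzfile(tmp), stringsAsFactors = FALSE)
print(round_trip)

unlink(tmp)
```

The same read pattern is what the loading step above applies to the full download.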

Trimming Data to Parameters of Interest

Observations that do not have a valid severe weather event type are removed. If the parameters of interest relating to health and financial damage are all zero then those observations are removed.

Storm_Data <- subset(Storm_Data, EVTYPE != '?')
Storm_Data <- subset(Storm_Data, FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0)
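As a small illustration of what these two filters keep and drop, the sketch below applies them to a made-up four-row data frame (`toy` and `kept` are hypothetical names; the values are invented).

```r
# Hypothetical rows: one invalid event type, one row with no impact at all
toy <- data.frame(
  EVTYPE     = c("TORNADO", "?", "HAIL", "FLOOD"),
  FATALITIES = c(2, 1, 0, 0),
  INJURIES   = c(5, 0, 0, 0),
  PROPDMG    = c(10, 3, 0, 25),
  CROPDMG    = c(0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Drop the row with the invalid event type "?"
kept <- subset(toy, EVTYPE != "?")
# Keep only rows where at least one impact measure is non-zero
kept <- subset(kept, FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0)

print(kept$EVTYPE)
```

The "?" row is removed even though it records a fatality, and the HAIL row is removed because all four measures are zero.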

Scaling Property and Crop Damage into Dollar Amounts

The economic data needs further processing, because the dollar amounts for property and crop damage are stored in 4 different columns instead of just 2. Each type of loss has a column for the magnitude of the loss (stored in PROPDMG & CROPDMG respectively) and a column with a character representing the scale of that value (PROPDMGEXP & CROPDMGEXP).

PROPDMGEXP and CROPDMGEXP contain a mix of letters and digits that represent the exponents used to scale the damage columns. To scale the property and crop damage amounts into actual dollars, we replaced these symbols with powers of 10 (using the lookup vectors prop_dollars_key & crop_dollars_key) and then multiplied the PROPDMG & CROPDMG variables by these scalars. Symbols with no entry in the lookup vectors, including blanks, are treated as a multiplier of 1. The new dollar amounts are stored in the PROPLOSSDOLL and CROPLOSSDOLL columns of the dataset.

# examining all property damage exponents
unique(Storm_Data$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 4 h 2 7 3 H -
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
# examining all crop damage exponents
unique(Storm_Data$CROPDMGEXP)
## [1]   M K m B ? 0 k
## Levels:  ? 0 2 B k K m M
# making all exponents uppercase
Storm_Data$PROPDMGEXP <- toupper(Storm_Data$PROPDMGEXP)
Storm_Data$CROPDMGEXP <- toupper(Storm_Data$CROPDMGEXP)

######### Property Damage ############
# Making conversion key to convert exponents to dollar amounts
prop_dollars_key <- c("0" = 10^0, "1" = 10, "2" = 10^2, "3" = 10^3, "4" = 10^4, "5" = 10^5,
                      "6" = 10^6, "7" = 10^7, "8" = 10^8, "9" = 10^9, 
                      "H" = 10^2, "K" = 10^3, "M" = 10^6, "B" = 10^9)

# Replacing symbols for exponents of property damage with numeric values
Storm_Data$PROPDMGEXP <- prop_dollars_key[Storm_Data$PROPDMGEXP]
# if no symbol is present we will multiply by 1
Storm_Data$PROPDMGEXP[is.na(Storm_Data$PROPDMGEXP)] <- 1

# Calculating property damage in dollars and storing it in a new column
Storm_Data$PROPLOSSDOLL <- (Storm_Data$PROPDMG * Storm_Data$PROPDMGEXP)

######### Crop Damage ############
# Making conversion key to convert exponents to dollar amounts
crop_dollars_key <- c("?" = 1, "0" = 1, "K" = 10^3, "M" = 10^6, "B" = 10^9)

# Replacing symbols for exponents of crop damage with numeric values
Storm_Data$CROPDMGEXP <- crop_dollars_key[Storm_Data$CROPDMGEXP]
# if no symbol is present we will multiply by 1
Storm_Data$CROPDMGEXP[is.na(Storm_Data$CROPDMGEXP)] <- 1

# Calculating crop damage in dollars and storing it in a new column
Storm_Data$CROPLOSSDOLL <- (Storm_Data$CROPDMG * Storm_Data$CROPDMGEXP)
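The lookup-and-multiply step can be seen on a handful of made-up values. This is a minimal sketch, assuming a shortened key; `key`, `dmg`, `exps`, `multiplier`, and `dollars` are hypothetical names used only here.

```r
# Shortened, hypothetical version of the conversion key
key <- c("H" = 10^2, "K" = 10^3, "M" = 10^6, "B" = 10^9)

# Invented damage values and exponent symbols (one blank, one unrecognised)
dmg  <- c(2.5, 10, 1, 7)
exps <- toupper(c("k", "M", "", "+"))

# Indexing a named vector by character returns NA for names it does not have
multiplier <- unname(key[exps])
# Blank or unrecognised symbols fall back to a multiplier of 1
multiplier[is.na(multiplier)] <- 1

dollars <- dmg * multiplier
print(dollars)
```

Here 2.5 with "k" becomes 2500 and 10 with "M" becomes 10 million, while the blank and "+" entries leave their values unchanged.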

Which types of events are most harmful to population health?

The data needs to be further processed in order to get injury and fatality totals. This will allow us to answer the question of which events are most harmful to population health.

This process involves aggregating the data by event type for both fatality and injury data, and summing the corresponding observations. Once we have sums for each event type, a new variable is created to give us the combined total of injuries and fatalities. This new information helps us determine which event type is the most dangerous.

library(plyr)
# Fatalities
total_event_fatality <- aggregate(FATALITIES ~ EVTYPE, data = Storm_Data, sum)
total_event_fatality <- arrange(total_event_fatality, desc(FATALITIES))

# Injuries
total_event_injuries <- aggregate(INJURIES ~ EVTYPE, data = Storm_Data, sum)
total_event_injuries <- arrange(total_event_injuries, desc(INJURIES))

# creating a dataset with fatalities and  injuries
Population_Health <- merge(total_event_fatality, total_event_injuries, by = "EVTYPE")
# create a variable with the total number of injuries and fatalities for each event
Population_Health$TOTAL <- Population_Health$FATALITIES + Population_Health$INJURIES

# Ordering health data by total of injury and fatality
Population_Health <- arrange(Population_Health, desc(TOTAL))
head(Population_Health)
##           EVTYPE FATALITIES INJURIES TOTAL
## 1        TORNADO       5633    91346 96979
## 2 EXCESSIVE HEAT       1903     6525  8428
## 3      TSTM WIND        504     6957  7461
## 4          FLOOD        470     6789  7259
## 5      LIGHTNING        816     5230  6046
## 6           HEAT        937     2100  3037
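The aggregate, merge, and sort pattern used above can be traced on a toy example. This is only a sketch on invented numbers: `toy`, `fat`, `inj`, and `health` are hypothetical names, and base `order()` stands in for `plyr::arrange()` so the example has no package dependencies.

```r
# Invented observations: several rows per event type
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(10, 2, 5, 1, 3),
  stringsAsFactors = FALSE
)

# Sum each measure within event type
fat <- aggregate(FATALITIES ~ EVTYPE, data = toy, sum)
inj <- aggregate(INJURIES ~ EVTYPE, data = toy, sum)

# Join the two summaries on event type and add a combined total
health <- merge(fat, inj, by = "EVTYPE")
health$TOTAL <- health$FATALITIES + health$INJURIES

# Sort by combined total, largest first (base R stand-in for arrange/desc)
health <- health[order(health$TOTAL, decreasing = TRUE), ]
print(health)
```

Note that ranking by TOTAL can differ from ranking by fatalities alone: in this toy data HEAT has more fatalities than FLOOD but a smaller combined total.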

The sorted totals indicate that tornadoes, excessive heat, and thunderstorm winds (TSTM WIND) are the most harmful to human health, with tornadoes far ahead at 96,979 combined injuries and fatalities.

Which types of events have the greatest economic consequences?

As with the health information, the data needs to be further processed in order to answer the question of which events have the greatest economic consequences.

This process involves aggregating the data by event type for both crop damage and property damage, and summing the corresponding observations. Once we have sums for each event type, a new variable is created to give us the total of crop damage and property damage together.
This new information helps us determine which event type is the most costly.

# Economic Consequences
#property
total_prop_dmg <- aggregate(PROPLOSSDOLL ~ EVTYPE, data = Storm_Data, sum)
total_prop_dmg <- arrange(total_prop_dmg, desc(PROPLOSSDOLL))

# crops
total_crop_dmg <- aggregate(CROPLOSSDOLL ~ EVTYPE, data = Storm_Data, sum)
total_crop_dmg <- arrange(total_crop_dmg, desc(CROPLOSSDOLL))

# creating a dataset with crop losses and property losses
Money_Loss <- merge(total_prop_dmg, total_crop_dmg, by = "EVTYPE")
# create a variable with the total amount of property and crop losses
Money_Loss$ECONOMICLOSSES <- Money_Loss$PROPLOSSDOLL + Money_Loss$CROPLOSSDOLL

# Ordering economic data
Money_Loss <- arrange(Money_Loss, desc(ECONOMICLOSSES))
head(Money_Loss)
##              EVTYPE PROPLOSSDOLL CROPLOSSDOLL ECONOMICLOSSES
## 1             FLOOD 144657709807   5661968450   150319678257
## 2 HURRICANE/TYPHOON  69305840000   2607872800    71913712800
## 3           TORNADO  56947380677    414953270    57362333947
## 4       STORM SURGE  43323536000         5000    43323541000
## 5              HAIL  15735267513   3025954473    18761221986
## 6       FLASH FLOOD  16822673979   1421317100    18243991079
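The totals above are easier to compare when expressed in billions of dollars. The sketch below converts the top three ECONOMICLOSSES values from the table; the vector names `losses` and `billions` are hypothetical and not part of the original analysis.

```r
# Top three combined losses, copied from the table above (in dollars)
losses <- c("FLOOD"             = 150319678257,
            "HURRICANE/TYPHOON" = 71913712800,
            "TORNADO"           = 57362333947)

# Convert to billions of dollars, rounded to one decimal place
billions <- round(losses / 1e9, 1)
print(billions)
```

On this scale floods account for roughly 150 billion dollars, about twice the figure for hurricanes/typhoons.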

The sorted totals indicate that floods, hurricanes/typhoons, and tornadoes cause the greatest combined property and crop losses.

Results

Events Most Harmful to Population Health

The graph created below helps us determine which events are most harmful to population health. To create the graph, the top 10 event types by total health impact were used. The injury totals, fatality totals, and overall totals were graphed for each event.

library(ggplot2)
library(reshape)
## 
## Attaching package: 'reshape'
## The following objects are masked from 'package:plyr':
## 
##     rename, round_any
#subsetting top 10 health damaging weather events
Top_Health_Affects <- Population_Health[1:10,]

# specify id variables and measurement variables
Top_Health_Affects <- melt(Top_Health_Affects, id.vars="EVTYPE", variable_name = "IMPACT", 
                           measure.vars = c("FATALITIES", "INJURIES", "TOTAL"))


# plotting Injury, Fatality, and both totals for 10 health concerns
Health_Plot <- ggplot(Top_Health_Affects, aes(x = reorder(EVTYPE, desc(value)), y = value, fill = IMPACT))

# stat = "identity" makes bar heights use the values in the data rather than row counts
Health_Plot + geom_bar(stat = "identity", position = "dodge") +
        theme(axis.text.x = element_text(angle = 45, vjust = 0.5)) + 
        xlab("Event Type") + ylab("Number of People") +
        ggtitle("Top 10 US Storm Health Impacts")
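The plot relies on the data being in long format, with one row per event type and measure. As a rough illustration of what that reshaping produces, the sketch below builds the same layout in base R on invented numbers. It is a stand-in for the `melt` call, not the function the analysis uses, and `wide`, `measures`, and `long` are hypothetical names.

```r
# Invented wide-format summary: one row per event type
wide <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD"),
  FATALITIES = c(5, 1),
  INJURIES   = c(15, 5),
  stringsAsFactors = FALSE
)
measures <- c("FATALITIES", "INJURIES")

# Long format: stack the measure columns, repeating the id column for each
long <- data.frame(
  EVTYPE = rep(wide$EVTYPE, times = length(measures)),
  IMPACT = rep(measures, each = nrow(wide)),
  value  = unlist(wide[measures], use.names = FALSE),
  stringsAsFactors = FALSE
)
print(long)
```

Each event type now appears once per measure, which is what lets `fill = IMPACT` draw one dodged bar per measure.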

The severe weather events depicted above have the greatest impact on human health.
Their impact decreases from left to right. For the most part, all three graphed parameters decrease from one event to the next. It is safe to conclude that tornadoes, excessive heat, and thunderstorm winds have the largest impact on human health.

Events with Most Economic Impact

The graph created below is similar to the one created for human health impact. Its goal is to help us determine which severe weather events have the greatest economic impact. To create the graph, the top 10 event types by total monetary damage were used. The property damage, crop damage, and overall damage totals were graphed for each event.

# property and crop damage
Top_Money_Loss <- Money_Loss[1:10,]

# specify id and measurement variables
Top_Money_Loss <- melt(Top_Money_Loss, id.vars="EVTYPE", variable_name = "LOSSTYPE", 
                           measure.vars = c("PROPLOSSDOLL", "CROPLOSSDOLL", "ECONOMICLOSSES"))

# plotting the sum of property loss crop loss and total losses for each of top 10 event types
Money_Plot <- ggplot(Top_Money_Loss, aes(x = reorder(EVTYPE, desc(value)), y = value, 
                fill = factor(LOSSTYPE, labels = c("Property", "Crops", "Total"))))

# stat = "identity" makes bar heights use the values in the data rather than row counts; this draws the plot
Money_Plot + geom_bar(stat = "identity", position = "dodge") +
        theme(axis.text.x = element_text(angle = 45, vjust = 0.5)) + 
        xlab("Event Type") + ylab("Dollars") +
        ggtitle("Top 10 US Storm Financial Impacts") +
        guides(fill = guide_legend(title = "Type of Loss"))

The 10 severe weather events graphed above have the greatest financial impacts. Their impact decreases from left to right. According to our graph, floods, hurricanes/typhoons, and tornadoes have the greatest total impact. Property loss follows nearly the same ordering as the totals. Crop losses contribute comparatively little to the totals and follow a different pattern: for crops specifically, events such as droughts, floods, and ice storms appear to be the top contenders.