This assignment involves using R to analyze the U.S. National Oceanic and Atmosphere Administration’s (NOAA) storm database to answer some basic questions about severe weather events. The events in the dataset occured between the years 1950 and 2011. The data analysis aims to answer the following questions:
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
The data for this assignment came in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The data can be downloaded at the course website: Storm Data [47Mb]
There is also some documentation of the database available. In these, you will find how some of the variables are constructed/defined.
# Packages used for analysis
library(R.utils)
library(ggplot2)
library(dplyr)
Download & Load the data into R
# The location where the data is to be downloaded from
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# Download the data in your working directory
download.file(url, destfile="storm_data.csv.bz2", method="curl")
# To unzip the bz2 file you'll need to have `R.utils` package installed
# Unzip the bz2 file to get a CSV file
bunzip2("storm_data.csv.bz2", "storm_data.csv", remove = FALSE, overwrite=TRUE)
# Read the CSV file into R
storm_data <- read.csv("storm_data.csv")
# To see how many observations and variables are in the dataset
dim(storm_data)
## [1] 902297 37
# Reducing the dataset to only the needed variables
data <- storm_data[,c("STATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
First, we analyze which types of events (as indicated in the EVTYPE variable) across the United States are most harmful with respect to population health?
# Find the sum of fatalities per event
fatalities <- aggregate(FATALITIES ~ EVTYPE, data, sum)
# Take the top 10 events with the most fatalities
topFatalities <- arrange(fatalities, desc(FATALITIES)) [1:10,]
# Find the sum of injuries per event
injuries <- aggregate(INJURIES ~ EVTYPE, data, sum)
# Take the top 10 events with the most injuries
topInjuries <- arrange(injuries, desc(INJURIES)) [1:10,]
Top 10 Events with the most Fatalities
print(topFatalities)
## EVTYPE FATALITIES
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
## 7 FLOOD 470
## 8 RIP CURRENT 368
## 9 HIGH WIND 248
## 10 AVALANCHE 224
topFatalities <- transform(topFatalities,EVTYPE=reorder(EVTYPE, order(FATALITIES, decreasing=TRUE)))
# Order by decreasing number of fatalities for graphing
ggplot(aes(x=EVTYPE, y=FATALITIES),data=topFatalities) +
geom_bar(stat="identity", fill= "deepskyblue") +
xlab("Event Type") +
ylab("Fatalities Recorded") +
ggtitle("Top 10 Events causing Fatalities") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
Top 10 Events with the most Injuries
print(topInjuries)
## EVTYPE INJURIES
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
## 7 ICE STORM 1975
## 8 FLASH FLOOD 1777
## 9 THUNDERSTORM WIND 1488
## 10 HAIL 1361
# Order by decreasing number of injuries for graphing
topInjuries <- transform(topInjuries,EVTYPE=reorder(EVTYPE, order(INJURIES, decreasing=TRUE)))
ggplot(aes(x=EVTYPE, y=INJURIES),data=topInjuries) +
geom_bar(stat="identity", fill= "deepskyblue") +
xlab("Event Type") +
ylab("Injuies Recorded") +
ggtitle("Top 10 Events causing Injuries") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
The top weather event to cause both the most fatalities and injuries was tornados.
Secondly, we analysis which types of events across the United States have the greatest economic consequences?
The property damage estimates reported in the variable PROPDMG are rounded to three significant digits, followed by an alphabetical character signifying the magnitude of the number, i.e., 1.55B for $1,550,000,000. Alphabetical characters used to signify magnitude include “K” for thousands, “M” for millions, and “B” for billions are reported in the variable PROPDMGEXP.
# Multiply the damage amounts based on the amount indicated by the alphabetical character in `PROPDMGEXP`.
data$PROP_Total <- 0
data[data$PROPDMGEXP %in% c("H","h"),"PROP_Total"] <- data[data$PROPDMGEXP %in% c("H","h"),"PROPDMG"] * 100
data[data$PROPDMGEXP %in% c("K","k"),"PROP_Total"] <- data[data$PROPDMGEXP %in% c("K","k"),"PROPDMG"] * 1000
data[data$PROPDMGEXP %in% c("M","m"),"PROP_Total"] <- data[data$PROPDMGEXP %in% c("M","m"),"PROPDMG"] * 1000000
data[data$PROPDMGEXP %in% c("B","b"),"PROP_Total"] <- data[data$PROPDMGEXP %in% c("B","b"),"PROPDMG"] * 1000000000
# Multiply the damage amounts based on the amount indicated by the alphabetical character in `CROPDMGEXP`.
data$CROP_Total <- 0
data[data$CROPDMGEXP %in% c("H","h"),"CROP_Total"] <- data[data$CROPDMGEXP %in% c("H","h"),"CROPDMG"] * 100
data[data$CROPDMGEXP %in% c("K","k"),"CROP_Total"] <- data[data$CROPDMGEXP %in% c("K","k"),"CROPDMG"] * 1000
data[data$CROPDMGEXP %in% c("M","m"),"CROP_Total"] <- data[data$CROPDMGEXP %in% c("M","m"),"CROPDMG"] * 1000000
data[data$CROPDMGEXP %in% c("B","b"),"CROP_Total"] <- data[data$CROPDMGEXP %in% c("B","b"),"CROPDMG"] * 1000000000
# Property and crop damage costs are summed together to investigate the total economic damage produced.
data$damage <- data$PROP_Total + data$CROP_Total
# Find the sum of property damage per event
damage_total <- aggregate(damage ~ EVTYPE, data, sum)
# Take the top 10 events with the largest monetary amount of damage
top_damage <- arrange(damage_total, desc(damage)) [1:10,]
top_damage$damage <-top_damage$damage*1e-9
Top 10 Events with the most economic damage
print(top_damage)
## EVTYPE damage
## 1 FLOOD 150.319678
## 2 HURRICANE/TYPHOON 71.913713
## 3 TORNADO 57.352114
## 4 STORM SURGE 43.323541
## 5 HAIL 18.758222
## 6 FLASH FLOOD 17.562129
## 7 DROUGHT 15.018672
## 8 HURRICANE 14.610229
## 9 RIVER FLOOD 10.148404
## 10 ICE STORM 8.967041
# Order by decreasing amount of property damage for graphing
top_damage <- transform(top_damage,EVTYPE=reorder(EVTYPE, order(damage, decreasing=TRUE)))
ggplot(aes(x=EVTYPE, y=damage),data=top_damage) +
geom_bar(stat="identity", fill= "red") +
xlab("Event Type") +
ylab("Total Cost of Property Damage (Billions of USD)") +
ggtitle("Top 10 Events Causing the Most Economic Damage") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
The top weather event to cause the most economic damage was floods.
Tornados caused significantly more injuries and fatalities than any other severe weather events. Economic damaged was analyzed by looking at both property and crop damage. Floods caused significantly more economic damage that other severe weather events.