Martin Livingstone 2016-10-24
Data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database has been analyzed to identify (a) the weather events that are most harmful to the population’s health (as indicated by the number of fatalities and injuries), and (b) the weather events that have the greatest economic consequences (as indicated by the value of property and crop damage). The data covers Jan 1950 to Nov 2011. Over this period there were 15,145 fatalities, 140,528 injuries, and $476.4 billion of damage ($427.3 billion to property and $49.1 billion to crops). The analysis found that Tornados caused by far the most fatalities (5,633) and injuries (91,346). Floods caused the greatest damage ($150.3 billion).
library(ggplot2)
library(gridExtra)
library(dplyr)
library(scales)
library(reshape2)
Download the NOAA storm data via the link provided by Coursera, read in the data from the bzip2-compressed CSV file, then inspect it.
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2","StormData.csv.bz2")
dat <- read.csv("StormData.csv.bz2")
# While developing the R Markdown file it can be useful to save the data to a local RDS file and read from that instead (it's much quicker than re-reading the CSV file every time)
# saveRDS(dat,"storm_data.RDS")
# dat <- readRDS('storm_data.RDS')
dim(dat)
## [1] 902297 37
colnames(dat)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
The dataset contains 37 columns but we are only interested in the following columns:
colnames(dat)[c(8,23:28)]
## [1] "EVTYPE" "FATALITIES" "INJURIES" "PROPDMG" "PROPDMGEXP"
## [6] "CROPDMG" "CROPDMGEXP"
In the columns PROPDMGEXP and CROPDMGEXP, a letter gives the magnitude of the damage figure held in PROPDMG and CROPDMG respectively: “K” for thousands, “M” for millions, and “B” for billions.
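The letter-to-multiplier mapping described above can be sketched as a small lookup. The helper name `damage_multiplier` and the sample values are hypothetical and not part of the original analysis. Like the analysis code further down, the sketch treats any code other than an uppercase K, M or B as a multiplier of 1, which is an assumption about how other codes should be handled.

```r
# Hypothetical helper: translate exponent codes into numeric multipliers.
damage_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  # Codes that are not in the lookup (including "" and lowercase letters) come back as NA
  mult <- unname(lookup[as.character(code)])
  # Assumption: unrecognised codes leave the damage figure unscaled
  mult[is.na(mult)] <- 1
  mult
}

# Made-up damage figures and codes, for illustration only
codes  <- c("K", "M", "B", "", "k")
amount <- c(25, 2.5, 1, 10, 3)
toy_damage <- amount * damage_multiplier(codes)
print(toy_damage)
```

A named-vector lookup keeps the mapping in one place, so supporting a further code would mean adding one entry rather than another nested `ifelse`.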
What period does the data cover?
min(as.Date(dat$BGN_DATE,"%m/%d/%Y"))
## [1] "1950-01-03"
max(as.Date(dat$BGN_DATE,"%m/%d/%Y"))
## [1] "2011-11-30"
The data covers the period 1950-01-03 to 2011-11-30.
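The date handling above can be tried on a few made-up strings. The "month/day/year followed by a time" layout used here is an assumption about how BGN_DATE is stored. With the format `"%m/%d/%Y"`, `as.Date` parses the leading date and ignores whatever follows it.

```r
# Made-up date strings; the layout is an assumption about BGN_DATE
bgn <- c("4/18/1950 0:00:00", "1/3/1950 0:00:00", "11/30/2011 0:00:00")

# Parse the date part; the trailing time is ignored
parsed <- as.Date(bgn, "%m/%d/%Y")

# range() returns the earliest and latest date in a single call
period <- range(parsed)
print(format(period))
```

Using `range` gives both ends of the period at once, so the column only needs to be parsed one time.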
Extract the relevant data relating to injuries and fatalities and create a corresponding tidy dataset. Rows where there are no injuries or fatalities can be excluded.
i <- dat %>% filter(FATALITIES + INJURIES > 0) %>% select(EVTYPE,FATALITIES,INJURIES)
mi <- melt(i,id=c("EVTYPE"))
smi <- mi %>% group_by(EVTYPE,variable) %>% summarize(ival=sum(value)) %>% arrange(desc(ival))
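To make the shape of the tidy dataset concrete, here is a minimal sketch of the same reshape-then-sum idea on a made-up table. It uses only base R, so it is an equivalent rather than the code used in the analysis, and the data frame `toy` and its values are invented for illustration.

```r
# Made-up wide table: one row per recorded event
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(2, 0, 1, 3),
  INJURIES   = c(10, 4, 0, 0)
)

# Long format: one row per (event, measure), the layout that melt() produces
long <- data.frame(
  EVTYPE   = rep(toy$EVTYPE, times = 2),
  variable = rep(c("FATALITIES", "INJURIES"), each = nrow(toy)),
  value    = c(toy$FATALITIES, toy$INJURIES)
)

# Sum per event type and measure, then sort with the largest totals first
totals <- aggregate(value ~ EVTYPE + variable, data = long, FUN = sum)
totals <- totals[order(-totals$value), ]
print(totals)
```

In the long layout, fatalities and injuries share one value column, so a single grouped sum covers both measures.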
Extract the relevant data relating to economic impact (property and crop damage) and create a corresponding tidy dataset.
e <- dat %>%
  filter(PROPDMG + CROPDMG > 0) %>%
  # Scale each damage figure by its exponent code; any code other than B, M or K leaves the value unscaled
  mutate(propd = PROPDMG *
           ifelse(PROPDMGEXP == "B", 1e9,
           ifelse(PROPDMGEXP == "M", 1e6,
           ifelse(PROPDMGEXP == "K", 1e3, 1)))) %>%
  mutate(cropd = CROPDMG *
           ifelse(CROPDMGEXP == "B", 1e9,
           ifelse(CROPDMGEXP == "M", 1e6,
           ifelse(CROPDMGEXP == "K", 1e3, 1)))) %>%
  select(EVTYPE, propd, cropd)
me <- melt(e, id=c("EVTYPE"))
sme <- me %>% group_by(EVTYPE,variable) %>% summarize(eval=sum(value)) %>% arrange(desc(eval))
We now have two tidy datasets: one for fatalities and injuries (smi), the other for property and crop damage (sme). These datasets are used to answer the two questions posed at the outset.
fatalities <- sum(subset(smi,variable=="FATALITIES")$ival)
injuries <- sum(subset(smi,variable=="INJURIES")$ival)
Over the period 1950-01-03 to 2011-11-30, storms and severe weather events caused a total of 15,145 fatalities and 140,528 injuries.
To show the weather events that are most harmful to the population’s health we plot, by weather event, the number of fatalities and injuries caused by each (top 20 only).
f <- ggplot(subset(smi, variable == "FATALITIES")[1:20, ], aes(x = reorder(EVTYPE, ival), y = ival))
f + geom_bar(stat = "identity") + coord_flip() +
  labs(title = "US Fatalities due to Weather Events from Jan 1950 to Nov 2011") +
  labs(y = "Fatalities", x = "Weather Event") +
  scale_y_continuous(breaks = seq(0, 6000, 500), labels = comma) + theme_bw()
i <- ggplot(subset(smi, variable == "INJURIES")[1:20, ], aes(x = reorder(EVTYPE, ival), y = ival))
i + geom_bar(stat = "identity") + coord_flip() +
  labs(title = "US Injuries due to Weather Events from Jan 1950 to Nov 2011") +
  labs(y = "Injuries", x = "Weather Event") +
  scale_y_continuous(breaks = seq(0, 1e5, 1e4), labels = comma) + theme_bw()
As shown in the graphs above, Tornados are by far the most harmful weather events with respect to human health - they caused both the most fatalities (5,633) and the most injuries (91,346).
propd <- sum(subset(sme,variable=="propd")$eval)
cropd <- sum(subset(sme,variable=="cropd")$eval)
Over the period 1950-01-03 to 2011-11-30, storms and severe weather events caused $427.3 billion of property damage and $49.1 billion of crop damage ($476.4 billion in total).
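The total quoted above is the sum of the two components. As a quick check, the sketch below adds the rounded figures from the text (not the unrounded `propd` and `cropd` values) and formats the result to one decimal place. The variable names are introduced here for illustration.

```r
# Rounded figures quoted in the text, in billions of dollars (not recomputed from the data)
prop_bn <- 427.3
crop_bn <- 49.1

# Total damage is property damage plus crop damage
total_bn <- prop_bn + crop_bn

# Format to one decimal place, matching the precision used in the text
total_label <- sprintf("$%.1f billion", total_bn)
print(total_label)
```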
To show the weather events that have the greatest economic consequences we plot, by weather event, the total damage (property damage + crop damage) caused by each (top 20 only); the greater the damage the greater the economic consequence.
totd <- aggregate(sme$eval,list(sme$EVTYPE),sum)
colnames(totd)<-c("EVTYPE","damage")
totd <- head((totd[order(-totd$damage),]),n=20)
d <- ggplot(totd, aes(x = reorder(EVTYPE, damage), y = damage / 1e9))
d + geom_bar(stat = "identity") + coord_flip() +
  labs(title = "US Damage due to Weather Events from Jan 1950 to Nov 2011") +
  labs(y = "Damage $ Billions", x = "Weather Event") +
  scale_y_continuous(breaks = seq(0, 160, 10), labels = comma) + theme_bw()
As shown in the graph above, Floods caused the greatest economic damage ($150.3 billion).
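The ranking behind this plot (sum property and crop damage per event type, sort in descending order, keep the top rows) can be checked on a small made-up table. The data frame `toy_sme` and its values are invented for illustration. It mirrors the column layout of `sme`, and the sketch keeps the top 2 rows rather than the top 20.

```r
# Made-up per-event totals in the same shape as `sme`; values are illustrative only
toy_sme <- data.frame(
  EVTYPE   = c("FLOOD", "FLOOD", "HAIL", "HAIL", "DROUGHT", "DROUGHT"),
  variable = rep(c("propd", "cropd"), times = 3),
  eval     = c(100, 5, 15, 3, 1, 14)
)

# Combine property and crop damage into one total per event type
toy_totd <- aggregate(toy_sme$eval, list(toy_sme$EVTYPE), sum)
colnames(toy_totd) <- c("EVTYPE", "damage")

# Sort by total damage, largest first, and keep the top 2 rows
toy_top <- head(toy_totd[order(-toy_totd$damage), ], n = 2)
print(toy_top)
```

In this toy data DROUGHT has more crop damage than HAIL but less damage overall, so it drops out of the top 2. This shows that the ranking is driven by the combined figure.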
An analysis of the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database shows that between Jan 1950 and Nov 2011 there were 15,145 fatalities and 140,528 injuries, and a total of $476.4 billion in damage.
The weather event that caused the most harm to the population’s health was Tornados (5,633 fatalities and 91,346 injuries).
The weather event that had the greatest economic consequences was Floods ($150.3 billion).