Weather events are responsible for significant economic damage and human casualties in the United States. The analysis below uses NOAA storm data from 1994 to 2011 to show the cumulative damage, both in dollars and in human lives.
The U.S. National Oceanic and Atmospheric Administration (NOAA) makes its storm database available to the public. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. Reports are made approximately 60 to 90 days after the event.
From the National Weather Service FAQ:
NCDC receives Storm Data from the National Weather Service. The National Weather service receives their information from a variety of sources, which include but are not limited to: county, state and federal emergency management officials, local law enforcement officials, skywarn spotters, NWS damage surveys, newspaper clipping services, the insurance industry and the general public.
Storm Data Disclaimer:
…Some information appearing in Storm Data may be provided by or gathered from sources outside the National Weather Service (NWS), such as the media, law enforcement and/or other government agencies, private companies, individuals, etc…
The data is inconsistent at best and requires a great deal of tidying.
setwd("~/Assignments/RepData_Storm")
library(knitr)
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(lubridate)
library(ggplot2)
library(stringi)
library(gridExtra)
library(RColorBrewer)
Download the data file:
if(!file.exists("StormData.csv.bz2")){
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "StormData.csv.bz2")}
Read in the data to storm and extract only those columns related to date, event type, economic damage and human harm:
Columns 2, 8 and 23 through 28
#get the data and hold onto it for dear life
if(!file.exists("temp.RDS")){
temp<-read.csv("StormData.csv.bz2", header=TRUE, na.strings = "", strip.white = TRUE)
saveRDS(temp, "temp.RDS")
}
#rm("temp")
storm<-readRDS("temp.RDS")
storm<-storm[,c(2,8,23:28)]
saveRDS(storm, "storm.RDS")
storm<-readRDS("storm.RDS")
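Selecting columns by position works only as long as the file layout never changes. As a hedged alternative, the sketch below selects the same eight columns by name and stops if any are missing. The helper `select_storm_columns` is a hypothetical name, not part of the original analysis; the column names are the ones used later in this report.

```r
# Hypothetical helper: pick the analysis columns by name instead of position.
wanted_columns <- c("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES",
                    "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

select_storm_columns <- function(raw, columns = wanted_columns) {
  # Fail early with a readable message if the layout is not what we expect
  missing_columns <- setdiff(columns, names(raw))
  if (length(missing_columns) > 0) {
    stop("Missing columns: ", paste(missing_columns, collapse = ", "))
  }
  raw[, columns, drop = FALSE]
}
```

With the raw data frame loaded, `storm <- select_storm_columns(temp)` would take the place of the positional subset.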
Extract the year from the begin date, rename the columns to more intuitive, human-readable names, and save the result in a form that can be easily read.
storm$year<-year(as.Date(storm$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"))
storm$year<-as.numeric(storm$year)
storm<-storm[storm$year >=1994,]
storm<-select(storm, year=year, event=EVTYPE, fatalities = FATALITIES, injuries = INJURIES, prop=PROPDMG, pexp=PROPDMGEXP, crop=CROPDMG, cexp=CROPDMGEXP)
saveRDS(object = storm, "storm.RDS")
storm$pexp<-tolower(as.character(storm$pexp))
storm$cexp<-tolower(as.character(storm$cexp))
Reporting increases dramatically in 1994, so we’ll use the data from 1994 to 2011 for our analysis.
storm$event<-as.character(storm$event)
storm<-filter(storm, year>=1994)
str(storm)
## 'data.frame': 702131 obs. of 8 variables:
## $ year : num 1995 1995 1994 1995 1995 ...
## $ event : chr "FREEZING RAIN" "SNOW" "ICE STORM/FLASH FLOOD" "SNOW/ICE" ...
## $ fatalities: num 0 0 0 0 0 2 0 0 0 0 ...
## $ injuries : num 0 0 2 0 0 0 0 0 0 0 ...
## $ prop : num 0 0 0 0 0 0.1 50 5 500 0 ...
## $ pexp : chr NA NA NA NA ...
## $ crop : num 0 0 0 0 0 10 0 500 0 0 ...
## $ cexp : chr NA NA NA NA ...
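The choice of 1994 as the cutoff can be checked by counting records per year before filtering. The following is a minimal sketch, not part of the original analysis: `count_by_year` and `sample_dates` are hypothetical names, and the dates are made-up values in the same format as BGN_DATE.

```r
# Hypothetical helper: count records per calendar year from BGN_DATE-style strings.
count_by_year <- function(dates) {
  # Only the date part is parsed; the trailing time text is ignored
  parsed <- as.Date(as.character(dates), format = "%m/%d/%Y")
  table(as.integer(format(parsed, "%Y")))
}

# Made-up sample values, for illustration only
sample_dates <- c("4/18/1950 0:00:00", "1/6/1995 0:00:00", "3/2/1995 0:00:00")
counts <- count_by_year(sample_dates)
```

Applied to the unfiltered BGN_DATE column, the resulting table could be plotted with `barplot(counts)` to see where reporting volume changes.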
Each value in the damage columns, crop and prop, must be multiplied by a power of ten that is stored as a character code (h, k, m or b) in a separate exponent column. In order to perform a like-for-like analysis, we convert each code to its exponent and scale the damage columns so that they reflect the same unit of measure.
storm$cexp<-gsub("h", 2, storm$cexp)
storm$cexp<-gsub("k", 3, storm$cexp)
storm$cexp<-gsub("m", 6, storm$cexp)
storm$cexp<-gsub("b", 9, storm$cexp)
storm$pexp<-gsub("h", 2, storm$pexp)
storm$pexp<-gsub("k", 3, storm$pexp)
storm$pexp<-gsub("m", 6, storm$pexp)
storm$pexp<-gsub("b", 9, storm$pexp)
storm$cexp<-as.numeric(storm$cexp)
## Warning: NAs introduced by coercion
storm$pexp<-as.numeric(storm$pexp)
## Warning: NAs introduced by coercion
storm$cfact<-10^storm$cexp
storm$pfact<-10^storm$pexp
storm$crop<-storm$cfact*storm$crop
storm$prop<-storm$pfact*storm$prop
storm<-select(storm, -c(pfact, cfact))
storm$event<-toupper(trimws(storm$event))
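The exponent conversion above can also be sketched with a named lookup vector, which keeps the mapping in one place. This is an illustrative alternative under stated assumptions, not the code used for the results in this report: `exp_to_multiplier` is a hypothetical helper, single-digit codes are treated as powers of ten, and any other code yields NA.

```r
# Hypothetical helper: turn an exponent code into a power-of-ten multiplier.
exp_to_multiplier <- function(code) {
  lookup <- c(h = 2, k = 3, m = 6, b = 9)    # hundred, thousand, million, billion
  code <- tolower(as.character(code))
  power <- unname(lookup[code])              # NA for anything not in the lookup
  is_digit <- grepl("^[0-9]$", code)         # assumption: "5" means 10^5
  power[is_digit] <- as.numeric(code[is_digit])
  10^power
}

# Made-up values, for illustration only
damage <- data.frame(amount = c(25, 1.5, 2), code = c("K", "M", "B"))
damage$usd <- damage$amount * exp_to_multiplier(damage$code)
```

As in the original code, a missing or unrecognized code produces NA, so rows without an exponent drop out of the damage totals.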
We’ll remove any rows where the event is missing, as that is the most important element, and divide the dollar amounts by \(10^6\) to make them friendlier for plotting.
storm<-select(storm, -c(pexp, cexp))
storm<-filter(storm, !is.na(event) & event !="")
storm<-as.data.frame(storm)
storm$crop<-as.numeric(storm$crop/(10^6))
storm$prop<-as.numeric(storm$prop/(10^6))
storm$event<-as.factor(storm$event)
str(storm)
## 'data.frame': 702131 obs. of 6 variables:
## $ year : num 1995 1995 1994 1995 1995 ...
## $ event : Factor w/ 837 levels "?","ABNORMAL WARMTH",..: 158 521 355 539 539 339 658 658 703 191 ...
## $ fatalities: num 0 0 0 0 0 2 0 0 0 0 ...
## $ injuries : num 0 0 2 0 0 0 0 0 0 0 ...
## $ prop : num NA NA NA NA NA 1e+02 5e-02 5e+00 5e-01 NA ...
## $ crop : num NA NA NA NA NA 10 NA 0.5 NA NA ...
For each variable, Property Damage (prop), Crop Damage (crop), Fatalities (fatalities) and Injuries (injuries), we use dplyr and piping to create a separate table that sums the variable by event type. Each table is arranged in descending order by amount of damage or number of people affected, and the first ten rows are extracted. Damage totals are in millions of USD.
#for each variable: sum by event, arrange in descending order, then keep the first ten rows
prop<-select(storm, event, prop)%>%
filter(!is.na(prop))%>%
group_by(event)%>%
summarise(damage=sum(prop, na.rm=T))%>%
arrange(desc(damage))
prop$event<-as.character(prop$event)
prop<-prop[1:10,]
prop$event<-as.factor(prop$event)
prop<-transform(prop, event=reorder(event, desc(damage)))
prop$ID<-seq.int(nrow(prop))
kable(prop)
event | damage | ID |
---|---|---|
FLOOD | 144179.609 | 1 |
HURRICANE/TYPHOON | 69305.840 | 2 |
STORM SURGE | 43193.536 | 3 |
TORNADO | 25630.588 | 4 |
FLASH FLOOD | 16398.306 | 5 |
HAIL | 15338.044 | 6 |
HURRICANE | 11862.819 | 7 |
TROPICAL STORM | 7703.386 | 8 |
HIGH WIND | 5266.939 | 9 |
WILDFIRE | 4765.114 | 10 |
crop<-select(storm, event, crop)%>%
filter(!is.na(crop))%>%
group_by(event)%>%
summarise(damage=sum(crop, na.rm=T))%>%
arrange(desc(damage))
crop$event<-as.character(crop$event)
crop<-crop[1:10,]
crop$event<-as.factor(crop$event)
crop<-transform(crop, event=reorder(event, desc(damage)))
kable(crop)
event | damage |
---|---|
DROUGHT | 13922.0660 |
FLOOD | 5506.9424 |
ICE STORM | 5022.1135 |
HAIL | 2982.6991 |
HURRICANE | 2741.4100 |
HURRICANE/TYPHOON | 2607.8728 |
FLASH FLOOD | 1402.6615 |
EXTREME COLD | 1312.9730 |
FROST/FREEZE | 1094.1860 |
HEAVY RAIN | 733.3998 |
death<-select(storm, event, fatalities)%>%
filter(fatalities>0)%>%
group_by(event)%>%
summarise(deaths=sum(fatalities, na.rm=T))%>%
arrange(desc(deaths))
death$event<-as.character(death$event)
death<-death[1:10,]
death$event<-as.factor(death$event)
death<-transform(death, event=reorder(event, desc(deaths)))
kable(death)
event | deaths |
---|---|
EXCESSIVE HEAT | 1903 |
TORNADO | 1593 |
FLASH FLOOD | 951 |
HEAT | 930 |
LIGHTNING | 794 |
FLOOD | 450 |
RIP CURRENT | 368 |
HIGH WIND | 242 |
TSTM WIND | 241 |
AVALANCHE | 224 |
injury<-select(storm, event, injuries)%>%
filter(injuries>0)%>%
group_by(event)%>%
summarise(injured=sum(injuries, na.rm=T))%>%
arrange(desc(injured))
injury$event<-as.character(injury$event)
injury<-injury[1:10,]
injury<-transform(injury, event=reorder(event, desc(injured)))
kable(injury)
event | injured |
---|---|
TORNADO | 22571 |
FLOOD | 6778 |
EXCESSIVE HEAT | 6525 |
LIGHTNING | 5116 |
TSTM WIND | 3631 |
HEAT | 2095 |
ICE STORM | 1971 |
FLASH FLOOD | 1754 |
THUNDERSTORM WIND | 1476 |
WINTER STORM | 1298 |
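The four summaries above follow the same pattern: sum one column by event, sort in descending order, keep the top rows. As a hedged sketch, that pattern can be written once in base R. The helper `top_events` and the toy data are hypothetical, and the behaviour differs slightly from the pipelines above: an event whose values are all missing gets a total of 0 rather than being dropped.

```r
# Hypothetical helper: total one column per event and keep the n largest totals.
top_events <- function(data, column, n = 10) {
  totals <- tapply(data[[column]], data$event, sum, na.rm = TRUE)
  totals <- sort(totals, decreasing = TRUE)   # sort() also drops NA totals
  ranked <- data.frame(event = names(totals),
                       total = as.numeric(totals),
                       stringsAsFactors = FALSE)
  head(ranked, n)
}

# Made-up values, for illustration only
toy <- data.frame(event  = c("FLOOD", "HAIL", "FLOOD", "HEAT", "HAIL"),
                  deaths = c(2, 0, 3, 10, 1))
top_two <- top_events(toy, "deaths", n = 2)
```

With the cleaned data, calls such as `top_events(storm, "prop")` and `top_events(storm, "fatalities")` would produce tables comparable to the ones above.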
Figure 1 shows economic damage:
figProp<-ggplot(prop, aes(event, damage, fill=event))+
geom_bar(stat="identity")+
ggtitle("Property Damage")+
theme_classic()+
theme(axis.text.x = element_text(angle = 90, hjust = 1), axis.title.x=element_blank(), axis.title.y=element_blank()) +
theme(legend.position="none")
figCrop<-ggplot(crop, aes(event, damage, fill=event))+
geom_bar(stat="identity")+
ggtitle("Crop Damage")+
theme_classic()+
theme(axis.text.x = element_text(angle = 90, hjust = 1), axis.title.x=element_blank(), axis.title.y=element_blank()) +
theme(legend.position="none")
grid.arrange(figCrop, figProp, ncol=2, top="Economic Damage", left="USD (millions)")
Figure 2 compares deaths and injuries:
figDeath<-ggplot(death, aes(event, deaths, fill=event))+
geom_bar(stat="identity")+
ggtitle("Fatalities")+
theme_classic()+
theme(axis.text.x = element_text(angle = 90, hjust = 1), axis.title.x=element_blank(), axis.title.y=element_blank()) +
theme(legend.position="none")
figInjured<-ggplot(injury, aes(event, injured, fill=event))+
geom_bar(stat="identity")+
ggtitle("Injuries")+
theme_classic()+
theme(axis.text.x = element_text(angle = 90, hjust = 1), axis.title.x=element_blank(), axis.title.y=element_blank()) +
theme(legend.position="none")
grid.arrange(figDeath, figInjured, ncol=2, top="Human Harm", left="Number of Persons")
Water, either too much (flood) or too little (drought), accounts for the bulk of economic damage. Drought wreaks the most damage in agriculture, while property is most severely affected by flood.
While not as televisually dramatic as other types of events such as hurricanes and floods, heat poses the greatest risk of death of all weather events in the United States, followed closely by tornadoes. Tornadoes, however, cause by far the most injuries.