Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This analysis involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
This analysis addresses two questions:
1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?
I used the dplyr package to wrangle the data and the ggplot2 package to visualize it. The comments in the code explain how the data was processed. The processing is split into two parts: part 1 prepares the data on public health impact, and part 2 prepares the data on economic impact. The data was also filtered and rescaled for plotting where needed.
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# download storm data from U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database - uncomment below code to download the storm data from the website
# download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile='stormdata.csv.bz2', mode = "wb")
# read the compressed .csv file into a data frame named sd
sd <- read.csv("stormdata.csv.bz2", sep=",")
# 1. select public health data, ph = public health
ph <- select(sd, EVTYPE, FATALITIES, INJURIES )
# 2. select economic impact data, econ = economic
econ <- select(sd, EVTYPE, PROPDMG, CROPDMG )
# 1. group by event type and sum public health data
phdata <- ph %>%
  group_by(EVTYPE) %>%
  summarise(
    FATALITIES = sum(FATALITIES, na.rm = TRUE),
    INJURIES = sum(INJURIES, na.rm = TRUE)
  )
# 2. group by event type and sum economic data
econdata <- econ %>%
  group_by(EVTYPE) %>%
  summarise(
    PROPDMG = sum(PROPDMG, na.rm = TRUE),
    CROPDMG = sum(CROPDMG, na.rm = TRUE)
  )
# 1. add FATALITIES and INJURIES to get a combined TOTAL per event type
phdata <- mutate(phdata,
                 TOTAL = FATALITIES + INJURIES)
# 2. add PROPDMG and CROPDMG to get a combined TOTAL per event type
econdata <- mutate(econdata,
                   TOTAL = PROPDMG + CROPDMG)
# 1. order by TOTAL on phdata
phdata <- arrange(phdata, desc(TOTAL))
# 2. order by TOTAL on econdata
econdata <- arrange(econdata, desc(TOTAL))
# 1. keep event types with a TOTAL (FATALITIES + INJURIES) greater than 99
phdataTotal <- filter(phdata, TOTAL > 99)
# 2. keep event types with a TOTAL (PROPDMG + CROPDMG) greater than 49999
econdataTotal <- filter(econdataTotal <- econdata, TOTAL > 49999)
# 1. Set EVTYPE factor levels in ascending order of TOTAL so the largest bar is drawn at the top after coord_flip()
phdataTotal$EVTYPE <- factor(phdataTotal$EVTYPE, levels = phdataTotal$EVTYPE[order(phdataTotal$TOTAL)])
# 2. Set EVTYPE factor levels in ascending order of TOTAL so the largest bar is drawn at the top after coord_flip()
econdataTotal$EVTYPE <- factor(econdataTotal$EVTYPE, levels = econdataTotal$EVTYPE[order(econdataTotal$TOTAL)])
# 1. find the 'TORNADO' EVTYPE and divide its TOTAL by 10 to scale the data for the plot
phdataTotal <- mutate(phdataTotal, TOTAL = ifelse(EVTYPE == 'TORNADO', TOTAL / 10, TOTAL))
# 2. divide TOTAL by 100,000 to scale the data for the plot
econdataTotal <- mutate(econdataTotal, TOTAL100K = TOTAL / 100000)
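The grouping, summing, totalling, ordering, and filtering steps above can be checked on a small data frame before running them on the full storm data. The sketch below is only an illustration: the rows in `toy` are made up and are not NOAA records, and the threshold of 5 is an arbitrary stand-in for the thresholds used in the analysis.

```r
library(dplyr)

# Hypothetical toy rows standing in for the storm data (not real NOAA records)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 1, 3, 4, 0),
  INJURIES   = c(10, 5, 20, 1, 2)
)

toy_totals <- toy %>%
  group_by(EVTYPE) %>%                        # one group per event type
  summarise(
    FATALITIES = sum(FATALITIES, na.rm = TRUE),
    INJURIES   = sum(INJURIES, na.rm = TRUE)
  ) %>%
  mutate(TOTAL = FATALITIES + INJURIES) %>%   # combined harm per event type
  arrange(desc(TOTAL)) %>%                    # largest total first
  filter(TOTAL > 5)                           # arbitrary cutoff for this toy example

print(toy_totals)
```

In this toy data, TORNADO sums to 35 and FLOOD to 8, while HEAT sums to 5 and is dropped by the filter.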
The table below lists the ten event types with the greatest impact on public health. *TORNADO is the most harmful event type with respect to population health, with a total of 96,979 fatalities and injuries between 1950 and November 2011, followed by EXCESSIVE HEAT and TSTM WIND with 8,428 and 7,461, respectively.
*Note: the TORNADO total was divided by 10 to scale the plot, so the TOTAL column in the table shows 9,697.9 for TORNADO.
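Dividing the TORNADO total by 10 in place means the table no longer shows the true total. One possible alternative, sketched here as an assumption rather than as the method used in this analysis, is to leave TOTAL untouched and store the scaled value in a separate column. The name `PLOT_TOTAL` is hypothetical, and the three rows are copied from the top-ten table with the unscaled TORNADO total.

```r
library(dplyr)

# Three rows copied from the top-ten table (true totals before any scaling)
health <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND"),
  TOTAL  = c(96979, 8428, 7461)
)

# PLOT_TOTAL is a hypothetical column used only for drawing the bars;
# TOTAL keeps the real count of fatalities and injuries
health <- mutate(
  health,
  PLOT_TOTAL = ifelse(EVTYPE == "TORNADO", TOTAL / 10, TOTAL)
)

print(health)
```

With this layout the table can report TOTAL while the plot maps PLOT_TOTAL to the bar height. Another option, under the same caveat, would be to plot the unscaled totals on a logarithmic axis with ggplot2's `scale_y_log10()`.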
library(ggplot2)
# 1. Top 10 event types that cause the most FATALITIES + INJURIES combined
top_n(phdataTotal,10,TOTAL)
## Source: local data frame [10 x 4]
##
##               EVTYPE FATALITIES INJURIES  TOTAL
## 1            TORNADO       5633    91346 9697.9
## 2     EXCESSIVE HEAT       1903     6525 8428.0
## 3          TSTM WIND        504     6957 7461.0
## 4              FLOOD        470     6789 7259.0
## 5          LIGHTNING        816     5230 6046.0
## 6               HEAT        937     2100 3037.0
## 7        FLASH FLOOD        978     1777 2755.0
## 8          ICE STORM         89     1975 2064.0
## 9  THUNDERSTORM WIND        133     1488 1621.0
## 10      WINTER STORM        206     1321 1527.0
# 1. plot event type impact to public health
phTotal <- ggplot(phdataTotal, aes(EVTYPE, TOTAL, fill = TOTAL)) +
  geom_bar(stat = "identity") +
  labs(title = "Total Fatalities & Injuries by Weather Events (1950-November 2011)", x = "", y = "") +
  coord_flip()
phTotal
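The factor releveling done before plotting can also be written with base R's `reorder()`, which sorts the levels of a variable by the values of another. This is a minimal sketch on three rows copied from the top-ten table (with the unscaled TORNADO total), not the code used to produce the plot above.

```r
# Values copied from three rows of the top-ten table
health <- data.frame(
  EVTYPE = c("EXCESSIVE HEAT", "TORNADO", "TSTM WIND"),
  TOTAL  = c(8428, 96979, 7461)
)

# reorder() returns a factor whose levels are sorted by TOTAL, ascending
health$EVTYPE <- reorder(health$EVTYPE, health$TOTAL)

levels(health$EVTYPE)
```

The levels come out as TSTM WIND, EXCESSIVE HEAT, TORNADO. Because the levels are ascending, the event type with the largest total ends up as the top bar once `coord_flip()` is applied.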
The table below lists the ten event types with the greatest economic impact. TORNADO causes the greatest damage, with a total cost of $3,312,277 (the sum of PROPDMG and CROPDMG) between 1950 and November 2011, followed by FLASH FLOOD and TSTM WIND with costs of $1,599,325 and $1,445,168, respectively.
library(ggplot2)
# 2. Top 10 event types that cause the most damage
top_n(econdataTotal,10,TOTAL)
## Source: local data frame [10 x 5]
##
##                EVTYPE   PROPDMG   CROPDMG     TOTAL TOTAL100K
## 1             TORNADO 3212258.2 100018.52 3312276.7 33.122767
## 2         FLASH FLOOD 1420124.6 179200.46 1599325.1 15.993251
## 3           TSTM WIND 1335965.6 109202.60 1445168.2 14.451682
## 4                HAIL  688693.4 579596.28 1268289.7 12.682897
## 5               FLOOD  899938.5 168037.88 1067976.4 10.679764
## 6   THUNDERSTORM WIND  876844.2  66791.45  943635.6  9.436356
## 7           LIGHTNING  603351.8   3580.61  606932.4  6.069324
## 8  THUNDERSTORM WINDS  446293.2  18684.93  464978.1  4.649781
## 9           HIGH WIND  324731.6  17283.21  342014.8  3.420148
## 10       WINTER STORM  132720.6   1978.99  134699.6  1.346996
# 2. plot event type impact to economy
econTotal <- ggplot(econdataTotal, aes(EVTYPE, TOTAL100K, fill = TOTAL100K)) +
  geom_bar(stat = "identity") +
  labs(title = "Total Cost to Property & Crop Damage by Weather Events (1950-November 2011)", x = "", y = "") +
  coord_flip()
econTotal
Note: the TOTAL100K variable (TOTAL divided by 100,000) is used to scale the values for the plot.
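The top-ten selection and the TOTAL100K scaling can be combined in one short pipeline. The sketch below assumes dplyr 1.0.0 or later, where `slice_max()` is available as the successor to `top_n()`. It uses four rows copied from the table above and keeps the top three only to keep the example small.

```r
library(dplyr)

# Four rows copied from the economic impact table
econ_toy <- data.frame(
  EVTYPE = c("TORNADO", "FLASH FLOOD", "TSTM WIND", "HAIL"),
  TOTAL  = c(3312276.7, 1599325.1, 1445168.2, 1268289.7)
)

top3 <- econ_toy %>%
  slice_max(order_by = TOTAL, n = 3) %>%    # keep the 3 largest totals (assumes dplyr >= 1.0.0)
  mutate(TOTAL100K = TOTAL / 100000)        # same scaling as in the analysis

print(top3)
```

Unlike `top_n()`, `slice_max()` returns the rows already sorted from largest to smallest, so no separate `arrange()` call is needed for the table.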