Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The analysis of the storm event database shows that tornadoes are the most dangerous weather event type in the US in terms of both injuries and fatalities. In terms of economic consequences, tornadoes caused the most property damage over the 61 years analyzed, while hail caused the most crop damage.
The data for this report come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site: Storm Data.
There is also some documentation of the database available, which describes how some of the variables are constructed or defined:
National Weather Service Storm Data Documentation.
National Climatic Data Center Storm Events FAQ.
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
Because the data is a bzip2-compressed CSV file, the first step is to read it into a data frame through a bzfile connection. A summary of the data frame can then be displayed (the call is left commented out here).
storm <- read.csv(bzfile("/Users/isabelmendez/Documents/R/ExploratoryDataAnalysis/project2/repdata-data-StormData.csv.bz2"), na.strings = "NA")
#summary(storm)
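The loading step can be tried without the full NOAA file. The sketch below is only an illustration: it writes a tiny made-up data frame (hypothetical values, not NOAA data) to a temporary bzip2-compressed CSV and reads it back with the same `read.csv(bzfile(...))` pattern used above.

```r
# Tiny stand-in for the storm file (hypothetical values, not NOAA data)
toy <- data.frame(
  EVTYPE = c("TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(2, 0, 1),
  stringsAsFactors = FALSE
)

# Write it as a bzip2-compressed CSV in a temporary location
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read it back the same way the report reads the real file
toy_read <- read.csv(bzfile(tmp), na.strings = "NA", stringsAsFactors = FALSE)

# Inspect structure and size before any heavy processing
str(toy_read)
dim(toy_read)
```

Checking `str()` and `dim()` right after loading is a cheap way to confirm that the columns were parsed with the expected types.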
The weather event type labels were normalized so that variant spellings refer to the same type of event.
# number of unique event types
length(unique(storm$EVTYPE))
## [1] 985
# translate all letters to lowercase
event_types <- tolower(storm$EVTYPE)
# replace each blank or punctuation character with a space
event_types <- gsub("[[:blank:][:punct:]+]", " ", event_types)
length(unique(event_types))
## [1] 874
# update the data frame
storm$EVTYPE <- event_types
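The normalization above still leaves near-duplicate labels such as "tstm wind", "thunderstorm wind" and "thunderstorm winds". The sketch below shows one possible way to go further on a made-up vector. It differs from the code above in two labeled ways: the `+` quantifier is placed outside the character class so runs of blanks or punctuation collapse to a single space, and the synonym rules (`tstm` to `thunderstorm`, trailing `winds` to `wind`) are hypothetical choices for illustration, not part of the original analysis.

```r
# Helper name is hypothetical; it lowercases, collapses runs of
# blanks/punctuation into one space, and trims the ends
normalize_event <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:blank:][:punct:]]+", " ", x)
  trimws(x)
}

# Made-up labels that mimic the variants found in EVTYPE
raw <- c("TSTM WIND", "Tstm Wind/Hail", "THUNDERSTORM WINDS.", "  Flash Flood")
clean <- normalize_event(raw)

# Hypothetical synonym rules: expand the "tstm" abbreviation and
# fold the plural "winds" into "wind"
merged <- sub("^tstm", "thunderstorm", clean)
merged <- sub("winds$", "wind", merged)

unique(merged)
```

Applying rules like these to the real data would change the counts of unique event types and the rankings reported below, so they should be checked against the documentation before use.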
To find the event types that are most harmful to population health, the numbers of fatalities and injuries are aggregated by event type.
library(plyr)
casualties <- ddply(storm, .(EVTYPE), summarize,
fatalities = sum(FATALITIES),
injuries = sum(INJURIES))
# Find events that caused most death and injury
fatal_events <- head(casualties[order(casualties$fatalities, decreasing = TRUE), ], 10)
injury_events <- head(casualties[order(casualties$injuries, decreasing = TRUE), ], 10)
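The same group-and-sum step can be sketched with base R's `aggregate()` instead of `ddply`, which may be convenient when plyr is not installed. The data frame below is made up for illustration. Note that the formula interface of `aggregate()` drops rows with missing values by default, which is an assumption to verify on the real data.

```r
# Made-up records for illustration (not NOAA data)
toy <- data.frame(
  EVTYPE = c("tornado", "flood", "tornado", "heat", "flood"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES = c(10, 0, 5, 1, 2),
  stringsAsFactors = FALSE
)

# Sum both casualty columns within each event type
casualties_toy <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE,
                            data = toy, FUN = sum)

# Keep the two deadliest event types
top_fatal <- head(
  casualties_toy[order(casualties_toy$FATALITIES, decreasing = TRUE), ], 2)
top_fatal
```

On this toy input the ranking puts tornado first (5 fatalities) and heat second (4 fatalities).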
The 10 events that caused the largest number of deaths are shown below. Tornadoes are the leading cause of death.
fatal_events[, c("EVTYPE", "fatalities")]
##             EVTYPE fatalities
## 741        tornado       5633
## 116 excessive heat       1903
## 138    flash flood        978
## 240           heat        937
## 410      lightning        816
## 762      tstm wind        504
## 154          flood        470
## 515    rip current        368
## 314      high wind        248
## 19       avalanche        224
Next, the 10 events that caused the largest number of injuries are shown. Tornadoes are again the most harmful.
injury_events[, c("EVTYPE", "injuries")]
##                EVTYPE injuries
## 741           tornado    91346
## 762         tstm wind     6957
## 154             flood     6789
## 116    excessive heat     6525
## 410         lightning     5230
## 240              heat     2100
## 382         ice storm     1975
## 138       flash flood     1777
## 671 thunderstorm wind     1488
## 209              hail     1361
library(plyr)
econ_loss <- ddply(storm, .(EVTYPE), summarize,
prop_dmg = sum(PROPDMG),
crop_dmg = sum(CROPDMG))
# filter out events that caused no economic loss
econ_loss <- econ_loss[(econ_loss$prop_dmg > 0 | econ_loss$crop_dmg > 0), ]
prop_dmg_events <- head(econ_loss[order(econ_loss$prop_dmg, decreasing = TRUE), ], 10)
crop_dmg_events <- head(econ_loss[order(econ_loss$crop_dmg, decreasing = TRUE), ], 10)
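The totals computed here sum the PROPDMG and CROPDMG columns as recorded. In the NOAA storm data these amounts are paired with magnitude codes in the PROPDMGEXP and CROPDMGEXP columns, so converting to dollars means scaling each amount before summing. The sketch below illustrates this on made-up rows. It assumes only the codes K, M and B (thousands, millions, billions) and treats every other code, including blanks, as a multiplier of 1; that fallback and the helper name `exp_to_multiplier` are assumptions for illustration, not part of the original analysis.

```r
# Made-up rows for illustration (not NOAA data)
toy <- data.frame(
  EVTYPE = c("tornado", "flood", "hail", "flood"),
  PROPDMG = c(25, 1.5, 500, 2),
  PROPDMGEXP = c("K", "M", "", "B"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map a magnitude code to a numeric multiplier.
# Assumption: codes other than K/M/B (including blanks) count as 1.
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(code)]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Scale each recorded amount to dollars, then sum by event type
toy$prop_dollars <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
prop_by_event <- aggregate(prop_dollars ~ EVTYPE, data = toy, FUN = sum)
prop_by_event[order(prop_by_event$prop_dollars, decreasing = TRUE), ]
```

Applying such a scaling to the real data could change the rankings reported below, since a single event recorded in billions outweighs many events recorded in thousands.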
The 10 events that caused the most property damage, measured as the sum of the recorded PROPDMG values, are listed below. Tornadoes rank first.
library(plyr)
prop_dmg_events[, c("EVTYPE", "prop_dmg")]
##                 EVTYPE  prop_dmg
## 741            tornado 3212258.2
## 138        flash flood 1420124.6
## 762          tstm wind 1335995.6
## 154              flood  899938.5
## 671  thunderstorm wind  876844.2
## 209               hail  688693.4
## 410          lightning  603351.8
## 697 thunderstorm winds  446293.2
## 314          high wind  324731.6
## 866       winter storm  132720.6
For crop damage, hail is the leading cause among the top 10. Notably, tornadoes rank only fifth in this case.
crop_dmg_events[, c("EVTYPE", "crop_dmg")]
##                 EVTYPE  crop_dmg
## 209               hail 579596.28
## 138        flash flood 179200.46
## 154              flood 168037.88
## 762          tstm wind 109202.60
## 741            tornado 100018.52
## 671  thunderstorm wind  66791.45
## 84             drought  33898.62
## 697 thunderstorm winds  18684.93
## 314          high wind  17283.21
## 250         heavy rain  11122.80
The following plot shows the most dangerous weather event types. Tornadoes were the leading cause of death over those 61 years, accounting for more than 5,600 deaths and more than 91,000 injuries in the US. The next two deadliest event types were excessive heat and flash floods.
library(ggplot2)
library(gridExtra)
# Set the levels in order
p1 <- ggplot(data=fatal_events,
aes(x=reorder(EVTYPE, fatalities), y=fatalities, fill=fatalities)) +
geom_bar(stat="identity") +
coord_flip() + xlab("Weather Event Type") + ylab("Total Number of Fatalities") + theme(legend.position="none")
p2 <- ggplot(data=injury_events,
aes(x=reorder(EVTYPE, injuries), y=injuries, fill=injuries)) +
geom_bar(stat="identity") +
coord_flip() + xlab("Weather Event Type") + ylab("Total Number of Injuries") + theme(legend.position="none")
grid.arrange(p1, p2, top="TOP FATALITIES AND INJURIES IN THE US FROM 1950 TO 2011")
The following plot shows the weather events with the greatest economic consequences. Tornadoes caused the most property damage. For crop damage, hail ranks first and tornadoes rank fifth. Note that the property damage panel is drawn on a log10 scale, while the crop damage panel is not.
library(ggplot2)
#library(gridExtra)
# Set the levels in order
p1 <- ggplot(data=prop_dmg_events,
aes(x=reorder(EVTYPE, prop_dmg), y=log10(prop_dmg), fill=prop_dmg )) +
geom_bar(stat="identity") +
coord_flip() +
xlab("Weather Event Type") +
ylab("Property Damage in Dollars (log-scale)") +
theme(legend.position="none")
p2 <- ggplot(data=crop_dmg_events,
aes(x=reorder(EVTYPE, crop_dmg), y=crop_dmg, fill=crop_dmg)) +
geom_bar(stat="identity") +
coord_flip() +
xlab("Weather Event Type") +
ylab("Crop Damage in Dollars") +
theme(legend.position="none")
grid.arrange(p1, p2, top="ECONOMIC CONSEQUENCES FOR THE US ECONOMY FROM 1950 TO 2011")