Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:
Storm Data [47 MB]
National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
library(rmarkdown)
library(dplyr)
library(ggplot2)
library(readr)
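# Download the compressed data set only if it is not already present
if (!file.exists("repdata-data-StormData.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                destfile = "repdata-data-StormData.csv.bz2")
}
# Decompress once, keeping the archive; bunzip2() comes from the R.utils package
if (!file.exists("repdata-data-StormData.csv")) {
  R.utils::bunzip2("repdata-data-StormData.csv.bz2", overwrite = TRUE, remove = FALSE)
}
# Read the CSV only if rawData is not already in the workspace
if (!"rawData" %in% ls()) {
  rawData = read_csv("repdata-data-StormData.csv")
}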
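The separate decompression step is optional, because R's file connections read bzip2-compressed text directly, and readr's `read_csv()` likewise decompresses files ending in `.bz2` automatically. The following is a minimal sketch in base R under stated assumptions: the temporary file and the `toy` data frame are illustrative stand-ins, not part of the storm data set.

```r
# Write a tiny CSV through a bzip2 connection (stand-in for the real archive)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,0", "FLOOD,2"), con)
close(con)

# Read it back without creating an uncompressed copy on disk;
# read.csv() opens and closes the bzfile connection itself
toy <- read.csv(bzfile(tmp), stringsAsFactors = FALSE)
print(toy)
```

Reading the archive directly trades disk space for decompression time on every read, so keeping the decompressed copy, as done above, can still be the better choice when the file is read repeatedly.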
Based on the National Weather Service Codebook [https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf], the following variables are used in this analysis:
- EVTYPE: event type (e.g. tornado, flood)
- FATALITIES: number of fatalities, a measure of harm to human health
- INJURIES: number of injuries, a measure of harm to human health
- PROPDMG: property damage estimate in USD, to be scaled by PROPDMGEXP
- PROPDMGEXP: magnitude of the property damage figure (e.g. K for thousands, M for millions of USD)
- CROPDMG: crop damage estimate in USD, to be scaled by CROPDMGEXP
- CROPDMGEXP: magnitude of the crop damage figure (e.g. K for thousands, M for millions of USD)
damaged_data = rawData %>% select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
knitr::kable(head(damaged_data, 5))
| EVTYPE | FATALITIES | INJURIES | PROPDMG | PROPDMGEXP | CROPDMG | CROPDMGEXP |
|---|---|---|---|---|---|---|
| TORNADO | 0 | 15 | 25.0 | K | 0 | NA |
| TORNADO | 0 | 0 | 2.5 | K | 0 | NA |
| TORNADO | 0 | 2 | 25.0 | K | 0 | NA |
| TORNADO | 0 | 2 | 2.5 | K | 0 | NA |
| TORNADO | 0 | 2 | 2.5 | K | 0 | NA |
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
fatalities = damaged_data %>% group_by(EVTYPE) %>% summarise(sum_fatalities = sum(FATALITIES)) %>% arrange(desc(sum_fatalities))
injuries = damaged_data %>% group_by(EVTYPE) %>% summarise(sum_injuries = sum(INJURIES)) %>% arrange(desc(sum_injuries))
knitr::kable(head(fatalities, 5))
| EVTYPE | sum_fatalities |
|---|---|
| TORNADO | 5633 |
| EXCESSIVE HEAT | 1903 |
| FLASH FLOOD | 978 |
| HEAT | 937 |
| LIGHTNING | 816 |
knitr::kable(head(injuries, 5))
| EVTYPE | sum_injuries |
|---|---|
| TORNADO | 91346 |
| TSTM WIND | 6957 |
| FLOOD | 6789 |
| EXCESSIVE HEAT | 6525 |
| LIGHTNING | 5230 |
# Two panels side by side: top five event types by fatalities and by injuries
par(mfrow = c(1, 2))
barplot(head(fatalities, 5)$sum_fatalities,
        las = 2,
        names.arg = head(fatalities, 5)$EVTYPE,
        main = "Top 5 Causes for Fatalities",
        ylab = "count",
        col = rainbow(5),
        cex.names = 0.5)
barplot(head(injuries, 5)$sum_injuries,
        las = 2,
        names.arg = head(injuries, 5)$EVTYPE,
        main = "Top 5 Causes for Injuries",
        ylab = "count",
        col = rainbow(5),
        cex.names = 0.5)
Across the United States, which types of events have the greatest economic consequences?
prop_data = damaged_data %>% select(EVTYPE, PROPDMG, PROPDMGEXP)
unique(prop_data$PROPDMGEXP)
## [1] "K" "M" NA "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-"
## [18] "1" "8"
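# Map letter codes to powers of ten: K = 3, M = 6, B = 9, H = 2
prop_data$PROPDMGEXP = gsub("K|k", "3", prop_data$PROPDMGEXP)
prop_data$PROPDMGEXP = gsub("M|m", "6", prop_data$PROPDMGEXP)
prop_data$PROPDMGEXP = gsub("B", "9", prop_data$PROPDMGEXP)
prop_data$PROPDMGEXP = gsub("H|h", "2", prop_data$PROPDMGEXP)
# Treat the symbols -, + and ? as exponent 0
prop_data$PROPDMGEXP = gsub("\\-|\\+|\\?", "0", prop_data$PROPDMGEXP)
prop_data$PROPDMGEXP = as.numeric(prop_data$PROPDMGEXP)
# Damage = PROPDMG * 10^PROPDMGEXP. A missing exponent yields NA, so sum() is NA
# for any event type with such a row, and arrange(desc()) places those types last.
prop_data = prop_data %>% mutate(property_damage = PROPDMG * 10^PROPDMGEXP) %>%
  group_by(EVTYPE) %>%
  summarise(sum_prop = sum(property_damage)) %>%
  arrange(desc(sum_prop))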
crop_data = damaged_data %>% select(EVTYPE, CROPDMG, CROPDMGEXP)
unique(crop_data$CROPDMGEXP)
## [1] NA "M" "K" "m" "B" "?" "0" "k" "2"
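# Map letter codes to powers of ten: K = 3, M = 6, B = 9
crop_data$CROPDMGEXP = gsub("K|k", "3", crop_data$CROPDMGEXP)
crop_data$CROPDMGEXP = gsub("M|m", "6", crop_data$CROPDMGEXP)
crop_data$CROPDMGEXP = gsub("B", "9", crop_data$CROPDMGEXP)
# Meant to map ? to 0. The empty alternative before | also matches the empty string,
# so extra "0" characters are inserted into the non-missing codes as well.
crop_data$CROPDMGEXP = gsub("|\\?", "0", crop_data$CROPDMGEXP)
crop_data$CROPDMGEXP = as.numeric(crop_data$CROPDMGEXP)
# Damage = CROPDMG * 10^CROPDMGEXP; as with property damage, a missing exponent yields NA
crop_data = crop_data %>% mutate(crop_damage = CROPDMG * 10^CROPDMGEXP) %>%
  group_by(EVTYPE) %>%
  summarise(sum_crop = sum(crop_damage)) %>%
  arrange(desc(sum_crop))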
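Two details of the conversion above affect the totals: `10^NA` is `NA`, so a single record without an exponent code turns the whole sum for its event type into `NA`, and chained `gsub()` calls can alter codes that an earlier call already converted. One way to avoid both is a lookup-based conversion. The sketch below is illustrative only: `exp_to_multiplier()` and the `toy` data are hypothetical, and treating missing or unrecognised codes as a multiplier of 1 is an assumption, not something the codebook states.

```r
# Hypothetical helper: translate an exponent code into a numeric multiplier
exp_to_multiplier <- function(code) {
  code <- toupper(as.character(code))
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[code])              # NA for codes not in the lookup
  digit <- grepl("^[0-9]$", code)           # single digits read as powers of ten
  mult[digit] <- 10^as.numeric(code[digit])
  mult[is.na(mult)] <- 1                    # ASSUMPTION: missing/unknown code = no scaling
  mult
}

# Toy records (made up for illustration, not taken from the storm database)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "FLOOD"),
  PROPDMG    = c(25, 2.5, 1.2, 10),
  PROPDMGEXP = c("K", "m", "B", NA),
  stringsAsFactors = FALSE
)
toy$damage_usd <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)

# Total per event type, largest first
totals <- sort(tapply(toy$damage_usd, toy$EVTYPE, sum), decreasing = TRUE)
print(totals)
```

In the toy data the first tornado record works out to 25 × 10^3 = 25,000 USD, and the flood record without a code contributes its raw value of 10 rather than turning the flood total into `NA`. Whether unknown codes should count as 1, as 0, or be dropped is a modelling decision that changes the ranking, so it is worth stating explicitly in any analysis that uses this approach.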
knitr::kable(head(prop_data, 5))
| EVTYPE | sum_prop |
|---|---|
| TORNADOES, TSTM WIND, HAIL | 1600000000 |
| WILD FIRES | 624100000 |
| HAILSTORM | 241000000 |
| HIGH WINDS/COLD | 110500000 |
| River Flooding | 106155000 |
knitr::kable(head(crop_data, 5))
| EVTYPE | sum_crop |
|---|---|
| EXCESSIVE WETNESS | 1.42e+62 |
| COLD AND WET CONDITIONS | 6.60e+61 |
| Early Frost | 4.20e+61 |
| Damaging Freeze | 3.41e+61 |
| Freeze | 1.05e+61 |
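The labels in these tables mix upper case ("WILD FIRES") and title case ("River Flooding", "Early Frost"). Because `group_by(EVTYPE)` compares labels exactly, variants that differ only in case or spacing would be summed as separate event types. A minimal sketch of a normalisation step follows; `normalize_evtype()` and the example labels are hypothetical, and whether the database actually contains such duplicate variants is an assumption to verify against the data.

```r
# Hypothetical helper: make event labels comparable before grouping
normalize_evtype <- function(x) {
  x <- toupper(trimws(x))          # ignore case and surrounding whitespace
  gsub("[[:space:]]+", " ", x)     # collapse repeated internal whitespace
}

# Made-up label variants for illustration
labels <- c("Freeze", "FREEZE", " River  Flooding", "RIVER FLOODING")
print(unique(normalize_evtype(labels)))
```

This only merges spelling variants of the same label. Deciding whether related categories, such as "HEAT" and "EXCESSIVE HEAT", should be combined is a separate judgment that the sketch does not attempt.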
# Two panels side by side: top five event types by property damage and by crop damage
par(mfrow = c(1, 2))
barplot(head(prop_data, 5)$sum_prop,
        las = 2,
        names.arg = head(prop_data, 5)$EVTYPE,
        ylab = "Property Damage",
        main = "Top 5 Causes for Property Damage",
        col = rainbow(5),
        cex.names = 0.5,
        cex.axis = 0.5)
barplot(head(crop_data, 5)$sum_crop,
        las = 2,
        names.arg = head(crop_data, 5)$EVTYPE,
        ylab = "Crop Damage",
        main = "Top 5 Causes for Crop Damage",
        col = rainbow(5),
        cex.names = 0.5,
        cex.axis = 0.5)
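According to the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, the event type most harmful to population health is ‘Tornado’, which accounts for 5,633 fatalities and 91,346 injuries. In the economic tables above, ‘Tornadoes, Thunderstorm wind, and Hail’ ranks first for property damage and ‘Excessive wetness’ ranks first for crop damage. These economic rankings should be read with caution: any event type with a missing exponent code sums to NA and drops to the bottom of the ranking, and crop totals on the order of 1e+62 USD are an artifact of the exponent conversion rather than plausible damage figures.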