Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The analysis shows that tornadoes are the most dangerous event type, followed by excessive heat. Flash floods and thunderstorm winds caused billions of dollars in damage between 1950 and 2011. Drought caused the largest crop damage, followed by floods and hail.
The database used can be downloaded here, and its description can be found here.
library(ggplot2)
library(gridExtra)
library(plyr)
data <- read.csv(bzfile("repdata_data_StormData.csv.bz2"))
Before starting, the EVTYPE (event type) variable needs to be normalized so that the same event falls into the same category regardless of how it was logged (for example, uppercase versus lowercase, or differing punctuation).
eventTypes <- tolower(data$EVTYPE)
eventTypes <- gsub("[[:blank:][:punct:]+]", " ", eventTypes)
data$EVTYPE <- eventTypes
length(unique(data$EVTYPE))
## [1] 874
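To illustrate what the normalization above does, here is a minimal sketch on a few made-up labels (the `rawLabels` vector is hypothetical and not taken from the database). It applies the same two steps, lowercasing and replacing blanks and punctuation with spaces, and shows differently logged spellings collapsing into one category.

```r
# Hypothetical sample labels, invented for illustration only
rawLabels <- c("TSTM WIND", "Tstm Wind", "tstm wind", "TSTM WIND/HAIL")

# Same two steps as above: lowercase, then replace each blank or
# punctuation character with a single space
normalized <- tolower(rawLabels)
normalized <- gsub("[[:blank:][:punct:]+]", " ", normalized)

# Four raw spellings reduce to two categories
uniqueLabels <- unique(normalized)
print(uniqueLabels)
```

Because the `+` sits inside the brackets, each character is replaced individually, so a run such as `" / "` becomes several spaces. Collapsing runs with `[[:blank:][:punct:]]+` and trimming with `trimws()` would be one possible refinement, but it would change the category count reported above.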
To find the most harmful events, we sum the fatalities and injuries grouped by event type.
casualtiesByEvent <- ddply(data, .(EVTYPE), summarize, fatalities = sum(FATALITIES), injuries = sum(INJURIES))
fatalEvents <- head(casualtiesByEvent[order(casualtiesByEvent$fatalities, decreasing = TRUE), ], 10)
injuryEvents <- head(casualtiesByEvent[order(casualtiesByEvent$injuries, decreasing = TRUE), ], 10)
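The group, sum, and rank pattern used above can be checked on a small example. The sketch below uses made-up records (the `toy` data frame is hypothetical) and base R's `aggregate()` in place of `ddply()`, so it runs without plyr. It is meant as an illustration of the technique, not a replacement for the code above.

```r
# Hypothetical toy records, invented for illustration only
toy <- data.frame(
  EVTYPE     = c("tornado", "flood", "tornado", "heat", "flood"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(10, 0, 5, 1, 2)
)

# Sum both casualty columns within each event type
toyTotals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank by fatalities, highest first, and keep the top 2
toyTop <- head(toyTotals[order(toyTotals$FATALITIES, decreasing = TRUE), ], 2)
print(toyTop)
```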
Top 10 events that caused the largest number of deaths:
fatalEvents[, c("EVTYPE", "fatalities")]
## EVTYPE fatalities
## 741 tornado 5633
## 116 excessive heat 1903
## 138 flash flood 978
## 240 heat 937
## 410 lightning 816
## 762 tstm wind 504
## 154 flood 470
## 515 rip current 368
## 314 high wind 248
## 19 avalanche 224
Top 10 events that caused the largest number of injuries:
injuryEvents[, c("EVTYPE", "injuries")]
## EVTYPE injuries
## 741 tornado 91346
## 762 tstm wind 6957
## 154 flood 6789
## 116 excessive heat 6525
## 410 lightning 5230
## 240 heat 2100
## 382 ice storm 1975
## 138 flash flood 1777
## 671 thunderstorm wind 1488
## 209 hail 1361
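The injuries table lists `tstm wind` and `thunderstorm wind` as separate rows, which suggests that some labels describing the same phenomenon remain split after normalization. One possible way to merge them is to recode labels before aggregating. The sketch below uses three totals copied from the table above and a hypothetical `synonyms` lookup; which labels should be merged is an assumption.

```r
# Injury totals copied from the table above
injuryTotals <- data.frame(
  EVTYPE   = c("tstm wind", "thunderstorm wind", "hail"),
  injuries = c(6957, 1488, 1361),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: names are labels to replace, values are the target label
synonyms <- c("tstm wind" = "thunderstorm wind")

# Recode only the labels present in the lookup, leave the rest unchanged
hit <- injuryTotals$EVTYPE %in% names(synonyms)
injuryTotals$EVTYPE[hit] <- unname(synonyms[injuryTotals$EVTYPE[hit]])

# Re-aggregate so the merged label gets a single total
merged <- aggregate(injuries ~ EVTYPE, data = injuryTotals, FUN = sum)
print(merged)
```

Applied to the full data set, the same recoding would go right after the EVTYPE normalization step, before any sums are computed.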
p1 <- ggplot(data = fatalEvents,
             aes(x = reorder(EVTYPE, fatalities), y = fatalities, fill = fatalities)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  ylab("Total number of fatalities") +
  xlab("Event")
p2 <- ggplot(data = injuryEvents,
             aes(x = reorder(EVTYPE, injuries), y = injuries, fill = injuries)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  ylab("Total number of injuries") +
  xlab("Event")
grid.arrange(p1, p2, top = "Top deadly weather events")
Property Damage
propertyDamageData <- aggregate(PROPDMG ~ EVTYPE, data = data, FUN=sum)
propertyDamageData <- arrange(propertyDamageData, desc(propertyDamageData[, 2]))
top10PropertyDamageData <- propertyDamageData[1:10,]
ggplot(top10PropertyDamageData, aes(x = reorder(EVTYPE, -PROPDMG), y = PROPDMG)) +
  geom_bar(stat = "identity") +
  xlab("Weather Event Type") +
  ylab("Property Damage") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  ggtitle('Top 10 Property Damage')
Crop Damage
cropDamageData <- aggregate(CROPDMG ~ EVTYPE, data = data, FUN=sum)
cropDamageData <- arrange(cropDamageData, desc(cropDamageData[, 2]))
top10CropDamageData <- cropDamageData[1:10,]
ggplot(top10CropDamageData, aes(x = reorder(EVTYPE, -CROPDMG), y = CROPDMG)) +
  geom_bar(stat = "identity") +
  xlab("Weather Event Type") +
  ylab("Crop Damage") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  ggtitle('Top 10 Crop Damage')
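The two sums above add up the raw `PROPDMG` and `CROPDMG` values. In the NOAA storm data these figures are paired with magnitude codes in the `PROPDMGEXP` and `CROPDMGEXP` columns (for example `K` for thousands, `M` for millions, `B` for billions), so dollar totals require scaling each value before summing. A minimal sketch on made-up rows follows. The helper `expMultiplier` is hypothetical, and treating any unrecognized or empty code as a multiplier of 1 is an assumption rather than a documented rule.

```r
# Hypothetical helper: map a magnitude code to a numeric multiplier.
# K, M, and B follow the documented meaning; anything else (empty,
# digits, symbols) falls back to 1, which is an assumption.
expMultiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(as.character(code))])
  mult[is.na(mult)] <- 1
  mult
}

# Made-up rows for illustration only
toyDamage <- data.frame(
  EVTYPE     = c("flood", "flood", "hail"),
  PROPDMG    = c(25, 2.5, 10),
  PROPDMGEXP = c("K", "M", "")
)

# Scale each record to dollars, then sum per event type
toyDamage$propDollars <- toyDamage$PROPDMG * expMultiplier(toyDamage$PROPDMGEXP)
toyDollarTotals <- aggregate(propDollars ~ EVTYPE, data = toyDamage, FUN = sum)
print(toyDollarTotals)
```

The same scaling could be applied to `CROPDMG` with `CROPDMGEXP`. Rankings based on scaled dollar amounts may differ from those based on the raw sums.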