Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
Load the packages that will be used throughout the study
library(plyr)
library(ggplot2)
Load the data from the Coursera website; download it into your working directory, unzip the file, and import the dataset into R.
path <- getwd()
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
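destfile <- file.path(path, "repdata-data-StormData.csv.bz2")
download.file(url, destfile, mode = "wb")  # binary mode keeps the .bz2 archive intact on Windows
Storm_Data <- read.csv(bzfile(destfile))   # bzfile() decompresses while reading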
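The download step above fetches the archive every time the analysis is run. A minimal sketch of a guard that skips the download when the file is already present is shown here; `fetch_if_missing` is a hypothetical helper name, not part of the original analysis.

```r
# Hypothetical helper: download `url` to `dest` only when `dest` does not exist yet.
fetch_if_missing <- function(url, dest) {
  if (!file.exists(dest)) {
    # mode = "wb" writes the compressed file in binary mode
    download.file(url, dest, mode = "wb")
  }
  invisible(dest)  # return the path so it can be passed straight to bzfile()
}

# Usage with the names defined above (not run here):
# fetch_if_missing(url, file.path(path, "repdata-data-StormData.csv.bz2"))
```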
Take a first glance at the data
names(Storm_Data)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
Find the deadliest weather events
## Find the deadliest weather event
fatalityData <- aggregate(FATALITIES ~ EVTYPE, data = Storm_Data, FUN ="sum")
most_fatal <- which.max(fatalityData$FATALITIES)
fatalityData[most_fatal,]
## EVTYPE FATALITIES
## 834 TORNADO 5633
## Find the 10 deadliest events
fatalityData <- arrange(fatalityData, desc(fatalityData[,2]))
top10fatData <- fatalityData[1:10,]
top10fatData
## EVTYPE FATALITIES
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
## 7 FLOOD 470
## 8 RIP CURRENT 368
## 9 HIGH WIND 248
## 10 AVALANCHE 224
Plot the 10 deadliest weather events
ggplot(top10fatData, aes(x = reorder(EVTYPE, -FATALITIES), y = FATALITIES)) +
geom_bar(stat = "identity") +
ggtitle("10 deadliest events") +
xlab("Event Type") +
theme(axis.text.x = element_text(angle = 45, hjust =1))
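The aggregate-then-sort steps used above for fatalities could be wrapped in a small helper so the same logic can be reused for other columns such as INJURIES. The sketch below uses base R only; `top_events` and the toy data frame are hypothetical and only illustrate the idea.

```r
# Hypothetical helper: total `column` per EVTYPE and keep the n largest totals.
top_events <- function(data, column, n = 10) {
  totals <- aggregate(data[[column]], by = list(EVTYPE = data$EVTYPE), FUN = sum)
  names(totals)[2] <- column                      # restore the measure's name
  totals <- totals[order(-totals[[column]]), ]    # largest total first
  rownames(totals) <- NULL
  head(totals, n)
}

# Toy data standing in for Storm_Data (values are made up for illustration)
toy <- data.frame(
  EVTYPE = c("TORNADO", "HEAT", "TORNADO", "FLOOD"),
  FATALITIES = c(3, 2, 4, 1)
)
top_toy <- top_events(toy, "FATALITIES", n = 2)
print(top_toy)
```

On the real data, a call such as `top_events(Storm_Data, "FATALITIES")` would be expected to give the same ranking as the table above.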
Find the weather events that lead to the most injured people
## Find the event that leads to the most injured people
injuriesData <- aggregate(INJURIES ~ EVTYPE, data=Storm_Data, FUN = "sum")
most_injuries <- which.max(injuriesData$INJURIES)
injuriesData[most_injuries,]
## EVTYPE INJURIES
## 834 TORNADO 91346
## Find the 10 events that lead to the most injured people
injuriesData <- arrange(injuriesData, desc(injuriesData[,2]))
top10injData <- injuriesData[1:10,]
top10injData
## EVTYPE INJURIES
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
## 7 ICE STORM 1975
## 8 FLASH FLOOD 1777
## 9 THUNDERSTORM WIND 1488
## 10 HAIL 1361
Plot the 10 events that lead to the most injured people
ggplot(top10injData, aes(x = reorder(EVTYPE, -INJURIES), y = INJURIES)) +
geom_bar(stat = "identity") +
ggtitle("10 events leading to the most injuries") +
xlab("Event Type") +
theme(axis.text.x = element_text(angle = 45, hjust =1))
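The injuries table above lists TSTM WIND and THUNDERSTORM WIND as separate event types, although they describe the same phenomenon. One way to handle this, sketched below, is to normalize EVTYPE labels before aggregating. `normalize_evtype` and its patterns are illustrative assumptions, not an official NOAA mapping, and the full dataset likely contains more variants than the ones handled here.

```r
# Hypothetical helper: map spelling variants of an event type onto one label.
# The patterns below are illustrative assumptions, not an official mapping.
normalize_evtype <- function(evtype) {
  x <- toupper(trimws(as.character(evtype)))
  x <- sub("^TSTM WINDS?$", "THUNDERSTORM WIND", x)
  x <- sub("^THUNDERSTORM WINDS$", "THUNDERSTORM WIND", x)
  x
}

# Injury totals copied from three rows of the table above
inj <- data.frame(
  EVTYPE = c("TSTM WIND", "THUNDERSTORM WIND", "FLOOD"),
  INJURIES = c(6957, 1488, 6789)
)
inj$EVTYPE <- normalize_evtype(inj$EVTYPE)
merged <- aggregate(INJURIES ~ EVTYPE, data = inj, FUN = sum)
print(merged)
```

On these three rows alone, the merged thunderstorm-wind total is 6957 + 1488 = 8445, which is larger than the FLOOD total of 6789. The rankings can therefore depend on how event labels are grouped.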
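Find the weather events that have the biggest negative impact on the economy (cost the most). Impact on the economy is characterized in this study by the sum of PROPDMG (property damage) and CROPDMG (crop damage). These columns are summed as recorded, without applying the magnitude codes in PROPDMGEXP and CROPDMGEXP, so the totals below are raw figures rather than dollar amounts.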
EconomicImpactData <- aggregate(PROPDMG + CROPDMG ~ EVTYPE, data = Storm_Data, FUN=sum)
names(EconomicImpactData) <- c("EVTYPE", "Damages")
EcoImpact <- arrange(EconomicImpactData, desc(EconomicImpactData[, 2]))
top10EcoData <- EcoImpact[1:10,]
top10EcoData
## EVTYPE Damages
## 1 TORNADO 3312276.7
## 2 FLASH FLOOD 1599325.1
## 3 TSTM WIND 1445168.2
## 4 HAIL 1268289.7
## 5 FLOOD 1067976.4
## 6 THUNDERSTORM WIND 943635.6
## 7 LIGHTNING 606932.4
## 8 THUNDERSTORM WINDS 464978.1
## 9 HIGH WIND 342014.8
## 10 WINTER STORM 134699.6
Plot the 10 weather events that have the biggest negative impact on the economy (cost the most).
ggplot(top10EcoData, aes(x = reorder(EVTYPE, -Damages), y = Damages)) +
geom_bar(stat = "identity") +
ggtitle("10 events having the greatest economic damages") +
xlab("Event Type") +
theme(axis.text.x = element_text(angle = 90))
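The economic totals above add PROPDMG and CROPDMG directly. The dataset also carries PROPDMGEXP and CROPDMGEXP columns, which appear to encode the magnitude of each figure. The sketch below shows one way the damages could be converted to dollars before aggregating. It assumes that H, K, M, and B stand for hundreds, thousands, millions, and billions, and it treats every other code as a multiplier of 1. The helper name `exp_to_multiplier` and the toy data are hypothetical.

```r
# Hypothetical helper: convert an exponent code to a dollar multiplier.
# Assumption: H = 1e2, K = 1e3, M = 1e6, B = 1e9; any other code is treated as 1.
exp_to_multiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1   # blank or unrecognized codes leave the value unchanged
  unname(mult)
}

# Toy rows standing in for Storm_Data (values are made up for illustration)
toy <- data.frame(
  EVTYPE = c("FLOOD", "TORNADO", "TORNADO"),
  PROPDMG = c(2.5, 25, 500),
  PROPDMGEXP = c("B", "M", "K"),
  CROPDMG = c(0, 1, 0),
  CROPDMGEXP = c("", "K", "")
)
toy$DamageUSD <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP) +
  toy$CROPDMG * exp_to_multiplier(toy$CROPDMGEXP)
usd_totals <- aggregate(DamageUSD ~ EVTYPE, data = toy, FUN = sum)
print(usd_totals)
```

In this toy example the raw sums would rank TORNADO (526) above FLOOD (2.5), while the dollar totals rank FLOOD first. The ranking on the real data may change in a similar way once the multipliers are applied.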
This study seems to show that tornadoes were the deadliest weather events in the US over the period studied (1950 to 2011); they were also the events that caused the most injuries. Tornadoes also appear to be responsible for the largest economic impact, as measured here by combined property and crop damage.
Excessive heat and flash floods follow tornadoes in the number of fatalities, with about 1,900 and 1,000 respectively over the studied period.
Thunderstorm wind (TSTM WIND) and floods follow tornadoes in the number of injuries.
As for the economic impact of the weather events, flash floods and thunderstorm wind (TSTM WIND) follow tornadoes.
To sum up, tornadoes should be monitored very carefully to limit both the human and the economic cost (though property and crop damage might not be avoidable); floods, flash floods, and wind should also be watched carefully.