May 27, 2019
Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
This section explains how the data were loaded into R and processed for analysis. Processing is necessary because some event types in the data are misspelled and would otherwise be counted as separate event types.
Read the data (from working directory) and extract relevant columns:
stormData <- read.csv("repdata_data_StormData.csv.bz2")
processedData <- stormData[,c("EVTYPE","FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")]
Capitalize all event types (EVTYPE) and remove leading and trailing whitespace
library(stringr)
processedData$EVTYPE <- toupper(processedData$EVTYPE)
processedData$EVTYPE <- str_trim(processedData$EVTYPE, side = "both")
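As a minimal sketch of why this normalization matters, the toy example below uses made-up labels (not taken from the storm database) and base R's trimws() as a stand-in for str_trim(). Labels that differ only in case or surrounding whitespace collapse into a single event type.

```r
# Hypothetical raw labels, invented for illustration (not from the dataset)
rawTypes <- c("  Tornado", "tornado ", "TORNADO", "Hail")

# Same normalization as above, with base R's trimws() standing in for str_trim()
cleanTypes <- trimws(toupper(rawTypes), which = "both")

# Compare the number of distinct labels before and after cleaning
length(unique(rawTypes))
length(unique(cleanTypes))
```

Here the four raw spellings reduce to two event types, TORNADO and HAIL.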
Spelling corrections
processedData$EVTYPE <- gsub("^THUNDEERSTORM WINDS$","THUNDERSTORM WINDS", processedData$EVTYPE)
processedData$EVTYPE <- gsub("^THUNDERESTORM WINDS$","THUNDERSTORM WINDS", processedData$EVTYPE)
processedData$EVTYPE <- gsub("^THUNDERSTROM WIND.*","THUNDERSTORM WINDS", processedData$EVTYPE)
processedData$EVTYPE <- gsub("^THUNDERTORM.*","THUNDERSTORM WINDS", processedData$EVTYPE)
processedData$EVTYPE <- gsub("^THUNDERSTORMW","THUNDERSTORMS", processedData$EVTYPE)
processedData$EVTYPE <- gsub(".*TSTM","THUNDERSTORM", processedData$EVTYPE)
processedData$EVTYPE <- gsub("^THUNERSTORM","THUNDERSTORM", processedData$EVTYPE)
processedData$EVTYPE <- gsub("^TUNDERSTORM","THUNDERSTORM", processedData$EVTYPE)
processedData$EVTYPE <- gsub("^THUNDESTORM","THUNDERSTORM", processedData$EVTYPE)
processedData$EVTYPE <- gsub("^THUDERSTORM WINDS$","THUNDERSTORM WINDS", processedData$EVTYPE)
processedData$EVTYPE <- gsub("^THUNDERTSORM.*","THUNDERSTORM WINDS", processedData$EVTYPE)
processedData$EVTYPE <- gsub(".*THUNDERSTORM.*","THUNDERSTORM WINDS", processedData$EVTYPE)
processedData$EVTYPE <- gsub("LIGHTNING.*","LIGHTNING", processedData$EVTYPE)
# Anchor the pattern so that only the misspelled label "AVALANCE" is rewritten
processedData$EVTYPE <- gsub("^AVALANCE$","AVALANCHE", processedData$EVTYPE)
processedData$EVTYPE <- gsub(".*HURRICANE.*","HURRICANE", processedData$EVTYPE)
processedData$EVTYPE <- gsub(".*FLASH.*","FLASH FLOOD", processedData$EVTYPE)
processedData$EVTYPE <- gsub("^FLOOD.*","FLOOD", processedData$EVTYPE)
processedDataDim <- dim(processedData)
uniqueProcessedData <- length(unique(processedData$EVTYPE))
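The corrections above are applied one after another, so the order of the patterns matters. The sketch below shows one possible way to express the same idea as a table of rules applied in sequence. The helper name applyCorrections, the subset of rules, and the sample labels are all hypothetical and chosen only for illustration.

```r
# A few of the pattern -> replacement rules from above, kept in application order
corrections <- c(
  ".*TSTM"           = "THUNDERSTORM",
  ".*THUNDERSTORM.*" = "THUNDERSTORM WINDS",
  "LIGHTNING.*"      = "LIGHTNING",
  "^FLOOD.*"         = "FLOOD"
)

# Hypothetical helper: apply each rule in turn to a character vector
applyCorrections <- function(x, rules) {
  for (pattern in names(rules)) {
    x <- gsub(pattern, rules[[pattern]], x)
  }
  x
}

# Made-up labels for illustration
sampleTypes <- c("TSTM WIND", "THUNDERSTORM WIND/HAIL", "LIGHTNING INJURY",
                 "FLOODING", "HAIL")
cleaned <- applyCorrections(sampleTypes, corrections)
table(cleaned)
```

In this sketch "TSTM WIND" is first expanded to "THUNDERSTORM WIND" and then caught by the broader THUNDERSTORM rule, which is why the TSTM rule has to come first.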
The next section shows the R code used to compute the total fatalities, injuries, property damage, and crop damage for each event type.
Pipeline:
Convert the event type (EVTYPE) column from string to factor, then sum the fatalities per event type to get the 10 event types with the most fatalities.
processedData$EVTYPE <- as.factor(processedData$EVTYPE)
fatalitiesByEventType <- aggregate(FATALITIES ~ EVTYPE, processedData, FUN = sum)
top10FatalEvent <- head(fatalitiesByEventType[order(-fatalitiesByEventType$FATALITIES), ], 10)
top10FatalEvent$EVTYPE <- factor(top10FatalEvent$EVTYPE, levels = top10FatalEvent$EVTYPE)
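The aggregate, sort, and re-level steps can be illustrated on a small made-up data frame. This is only a sketch: the values below are invented, and it keeps the top 2 rather than the top 10. Resetting the factor levels to the sorted order is what makes a later bar chart show bars from largest to smallest instead of alphabetically.

```r
# Made-up records for illustration (not from the storm database)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL", "TORNADO"),
  FATALITIES = c(3, 0, 5, 2, 1, 4)
)

# Sum fatalities per event type
totals <- aggregate(FATALITIES ~ EVTYPE, toy, FUN = sum)

# Keep the two event types with the largest totals
top2 <- head(totals[order(-totals$FATALITIES), ], 2)

# Re-level the factor so its level order follows the sorted totals
top2$EVTYPE <- factor(top2$EVTYPE, levels = top2$EVTYPE)
levels(top2$EVTYPE)
```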
Repeat for number of injuries per Event Type
injuriesByEventType <- aggregate(INJURIES ~ EVTYPE, processedData, FUN = sum)
top10InjuryEvent <- head(injuriesByEventType[order(-injuriesByEventType$INJURIES), ], 10)
top10InjuryEvent$EVTYPE <- factor(top10InjuryEvent$EVTYPE, levels = top10InjuryEvent$EVTYPE)
Next, compute the total property damage and the total crop damage per event type, take the top 10 of each, and merge the two results.
library(tidyr)
propertyDamage <- aggregate(PROPDMG ~ EVTYPE, processedData, FUN = sum)
top10PropertyDamage <- head(propertyDamage[order(-propertyDamage$PROPDMG), ], 10)
cropDamage <- aggregate(CROPDMG ~ EVTYPE, processedData, FUN = sum)
top10CropDamage <- head(cropDamage[order(-cropDamage$CROPDMG), ], 10)
economicDamage <- merge(top10PropertyDamage, top10CropDamage, by.x = "EVTYPE", by.y = "EVTYPE", all = FALSE)
economicDamage <- gather(economicDamage, DamageType, Damage, -EVTYPE)
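Because merge() is called with all = FALSE, only event types that appear in both top-10 lists survive the merge. The toy sketch below, with invented numbers, shows that behavior and then builds the long format by hand in base R. The manual reshaping stands in for gather() here only so that the example has no package dependencies.

```r
# Made-up rankings for illustration (not from the storm database)
topProperty <- data.frame(EVTYPE = c("TORNADO", "FLOOD", "HAIL"),
                          PROPDMG = c(300, 200, 100))
topCrop <- data.frame(EVTYPE = c("HAIL", "FLOOD", "DROUGHT"),
                      CROPDMG = c(50, 40, 30))

# Inner join: TORNADO and DROUGHT are dropped because each is in only one list
both <- merge(topProperty, topCrop, by = "EVTYPE", all = FALSE)

# Long format: one row per (event type, damage type) pair
long <- data.frame(
  EVTYPE     = rep(both$EVTYPE, times = 2),
  DamageType = rep(c("PROPDMG", "CROPDMG"), each = nrow(both)),
  Damage     = c(both$PROPDMG, both$CROPDMG)
)
long
```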
Results
Types of events that are most harmful to population health.
library(ggplot2)
ggplot(top10FatalEvent, aes(EVTYPE, FATALITIES, fill = EVTYPE)) +
  geom_bar(stat = "identity") +
  guides(fill = "none") +
  xlab("EVENT TYPE") + ylab("# OF FATALITIES") +
  ggtitle("Top 10 Causes of Fatality in the U.S.") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
The figure above shows the 10 severe weather event types that cause the most fatalities across the United States. TORNADO accounts for the most fatalities.
ggplot(top10InjuryEvent, aes(EVTYPE, INJURIES, fill = EVTYPE)) +
  geom_bar(stat = "identity") +
  guides(fill = "none") +
  xlab("EVENT TYPE") + ylab("# OF INJURIES") +
  ggtitle("Top 10 Causes of Injuries in the U.S.") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
The figure above shows the 10 severe weather event types that cause the most injuries across the United States. TORNADO also accounts for the most injuries.
Types of events that have the greatest economic consequences.
options(scipen = 5)
ggplot(economicDamage, aes(EVTYPE, Damage, fill = EVTYPE)) +
  geom_bar(stat = "identity") +
  guides(fill = "none") +
  xlab("EVENT TYPE") + ylab("Damage") +
  ggtitle("Top 5 Events with Greatest Economic Consequences in the U.S.") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  facet_wrap(~DamageType, ncol = 1)
The figure above shows the top 5 severe weather event types with the greatest economic consequences across the United States, that is, the event types that rank in the top 10 for both property damage and crop damage. HAIL has the highest crop damage, while TORNADO has the highest property damage.