The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
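Our focus is on two questions: which types of events are most harmful to population health, and which types of events have the greatest economic consequences?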
Our analysis indicates:
Tornadoes are the event type most harmful to population health in terms of fatalities.
Tornadoes are the event type most harmful to population health in terms of injuries.
Floods are the event type with the greatest economic consequences.
Preprocessing: load the data and create the data frames
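# Load ggplot2, which the plots below require
library(ggplot2)
# Download the data file only if it is not already in the working directory
if (!file.exists("datafile.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                "datafile.csv.bz2", mode = "wb")
}
# Read the compressed CSV only if it is not already loaded in this session
if (!exists("weatherdata")) {
  weatherdata <- read.csv("datafile.csv.bz2")
}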
# Data frame for event type, fatalities and injuries
weatherdataclean <- weatherdata[, c("EVTYPE", "FATALITIES", "INJURIES")]
# Data frame for event type, property damage and crop damage
damagedataclean <- weatherdata[, c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
# Convert the exponent codes to multipliers: K = thousands, M = millions, B = billions.
# Any other code (blank, lowercase letters, digits, symbols) gets a multiplier of 0.
damagedataclean$PROPDMGMult <- ifelse(damagedataclean$PROPDMGEXP == "K", 1e3,
                               ifelse(damagedataclean$PROPDMGEXP == "M", 1e6,
                               ifelse(damagedataclean$PROPDMGEXP == "B", 1e9, 0)))
damagedataclean$CROPDMGMult <- ifelse(damagedataclean$CROPDMGEXP == "K", 1e3,
                               ifelse(damagedataclean$CROPDMGEXP == "M", 1e6,
                               ifelse(damagedataclean$CROPDMGEXP == "B", 1e9, 0)))
# Derive the dollar amounts, then a combined property + crop damage metric
damagedataclean$PROPDMGAMT <- damagedataclean$PROPDMG * damagedataclean$PROPDMGMult
damagedataclean$CROPDMGAMT <- damagedataclean$CROPDMG * damagedataclean$CROPDMGMult
damagedataclean$TOTALDMGAMT <- damagedataclean$PROPDMGAMT + damagedataclean$CROPDMGAMT
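The exponent-to-multiplier step can also be written as a lookup table, which is easier to read and extend than nested ifelse() calls. The following is a minimal sketch on made-up data, not the method used in this report: the names `exp_lookup`, `dmg_multiplier` and `toy` are hypothetical, and the sketch upper-cases the codes first, so "k" counts as thousands (the code above treats lowercase codes as 0).

```r
# Hypothetical lookup table: exponent code -> multiplier
exp_lookup <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: returns the multiplier for each exponent code.
# Assumption: codes are matched case-insensitively; anything not in the
# table (blank, digits, symbols) maps to 0.
dmg_multiplier <- function(exp_code) {
  mult <- exp_lookup[toupper(as.character(exp_code))]
  mult[is.na(mult)] <- 0
  unname(mult)
}

# Made-up rows for illustration only (not taken from the NOAA file)
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 10),
  PROPDMGEXP = c("K", "M", ""),
  CROPDMG    = c(0, 1, 5),
  CROPDMGEXP = c("", "k", "B")
)

# Combined property + crop damage in dollars
toy$TOTALDMGAMT <- toy$PROPDMG * dmg_multiplier(toy$PROPDMGEXP) +
  toy$CROPDMG * dmg_multiplier(toy$CROPDMGEXP)
print(toy)
```

Whether lowercase codes should count is a judgment call about the raw data, so the totals from this variant may differ slightly from those reported here.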
Harm to population health is measured here by fatalities (FATALITIES in the data set) and injuries (INJURIES in the data set).
# Total fatalities by event type, sorted in decreasing order; keep the top 10
weatherfatalities <- aggregate(weatherdataclean$FATALITIES, by = list(weatherdataclean$EVTYPE), FUN = sum, na.rm = TRUE)
colnames(weatherfatalities) <- c("EVTYPE", "FATALITIES")
weatherfatalities <- weatherfatalities[order(-weatherfatalities$FATALITIES), ]
topweatherfatalities <- weatherfatalities[1:10, ]
# Bar chart of the top 10 event types by fatalities
p <- ggplot(topweatherfatalities, aes(x = reorder(EVTYPE, FATALITIES), y = FATALITIES))
p + geom_bar(stat = "identity", fill = "grey") +
  ggtitle("Weather Events by # Fatalities") +
  labs(x = "Event Type", y = "# Fatalities") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Fig 1: The graph shows that tornadoes are the most dangerous events for health in the U.S. in terms of fatalities.
The event type with the most total fatalities was tornado (5633), followed by excessive heat (1903) and flash flood (978). When looking at individual weather events, however, other event types appear more deadly. The event type with the most fatalities per event is "tornadoes, TSTM wind, hail" (25), followed by cold and snow (14) and tropical storm Gordon (8).
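Per-event figures like these come from averaging within each event type instead of summing. One way this could be done is sketched below on made-up data. The `toy` data frame, its values and the `summary_tbl` name are illustrative assumptions, not taken from the NOAA file or from this report's code.

```r
# Made-up events for illustration only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "TORNADO", "TORNADO",
                 "HEAT WAVE", "FLOOD", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 5, 0, 2)
)

# Total, count and mean per event type; aggregate() returns the groups
# in the same sorted order each time, so the columns line up
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)
counts <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = length)
means  <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = mean)

summary_tbl <- data.frame(
  EVTYPE    = as.character(totals$EVTYPE),
  TOTAL     = totals$FATALITIES,
  EVENTS    = counts$FATALITIES,
  PER_EVENT = means$FATALITIES
)

# Rank by fatalities per event rather than by total
summary_tbl <- summary_tbl[order(-summary_tbl$PER_EVENT), ]
print(summary_tbl)
```

In this toy data the tornado rows have the largest total, while the single heat wave has the largest per-event mean. An event type recorded only once or twice can therefore top a per-event ranking, so the EVENTS column is worth reading alongside PER_EVENT.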
# Total injuries by event type, sorted in decreasing order; keep the top 10
weatherinjury <- aggregate(weatherdataclean$INJURIES, by = list(weatherdataclean$EVTYPE), FUN = sum, na.rm = TRUE)
colnames(weatherinjury) <- c("EVTYPE", "INJURIES")
weatherinjury <- weatherinjury[order(-weatherinjury$INJURIES), ]
topweatherinjury <- weatherinjury[1:10, ]
# Bar chart of the top 10 event types by injuries
q <- ggplot(topweatherinjury, aes(x = reorder(EVTYPE, INJURIES), y = INJURIES))
q + geom_bar(stat = "identity", fill = "grey") +
  ggtitle("Weather Events by # Injuries") +
  labs(x = "Event Type", y = "# Injuries") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Fig 2: The graph shows that tornadoes are the most dangerous events for health in the U.S. in terms of injuries.
Tornadoes were also the event type with the most injuries (91346), followed by TSTM wind (6957) and flood (6789). When considering individual events, the average number of injuries per event was highest for heat wave (70), followed by tropical storm Gordon (43) and wild fires (37.5).
# Total damage ($) by event type, sorted in decreasing order; keep the top 10
TOTALDMGAMT <- aggregate(damagedataclean$TOTALDMGAMT, by = list(damagedataclean$EVTYPE), FUN = sum, na.rm = TRUE)
colnames(TOTALDMGAMT) <- c("EVTYPE", "TOTALDMGAMT")
TOTALDMGAMT <- TOTALDMGAMT[order(-TOTALDMGAMT$TOTALDMGAMT), ]
TOPTOTALDMGAMT <- TOTALDMGAMT[1:10, ]
# Bar chart of the top 10 event types by total damage, scaled to billions of dollars
r <- ggplot(TOPTOTALDMGAMT, aes(x = reorder(EVTYPE, TOTALDMGAMT / 1e9), y = TOTALDMGAMT / 1e9))
r + geom_bar(stat = "identity", fill = "grey") +
  ggtitle("Weather Events by Total Damage (in $ Billions)") +
  labs(x = "Event Type", y = "Total Damage (in $ Billions)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Fig 3: The graph shows that floods have the greatest economic consequences.
The damage analysis combines property damage and crop damage. The weather event with the highest total cost of damages was flood ($15 billion), followed by hurricane/typhoon ($7.2 billion) and tornado ($5.7 billion). For individual weather events, the highest cost of damages per event was for "tornadoes, TSTM wind, hail" ($160.2 million), followed by heavy rain/severe weather ($125 million) and hurricane/typhoon ($81.7 million).
Tornadoes are the most harmful weather events in the U.S. with respect to population health.
Floods have the greatest economic consequences in the U.S.