The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events.
This analysis addresses two main questions:
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
The data for this assignment can be downloaded from the course web site:
Dataset: Weather Data (URL: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2)
Definitions are available at https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf as published in the following document: NATIONAL WEATHER SERVICE INSTRUCTION 10-1605, AUGUST 17, 2007, Operations and Services Performance, NWSPD 10-16, STORM DATA PREPARATION
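The variables from this dataset that were selected for this analysis include:
EVTYPE: Event type
FATALITIES: Number of fatalities
INJURIES: Number of injuries
PROPDMG and PROPDMGEXP: Property damage amount and its magnitude code
CROPDMG and CROPDMGEXP: Crop damage amount and its magnitude code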
The dataset contains a total of 902,297 observations.
Download the file if it is not already present, then load the data into a new variable.
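# Download the compressed data file only if it is not already in the working directory.
if (!file.exists("datafile.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "datafile.csv.bz2", mode = "wb")
}
# Read the data only if it has not already been loaded in this session.
if (!exists("weatherdata")) {
  weatherdata <- read.csv("datafile.csv.bz2")
}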
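Only a handful of the columns are used in this analysis, so one possible refinement is to skip the unused columns while reading. The sketch below is an illustration under assumptions, not part of the original analysis: `read_selected_columns` is a hypothetical helper, and the toy file only stands in for the real data. It relies on `read.csv` dropping any column whose `colClasses` entry is "NULL".

```r
# Hypothetical helper: read only the named columns of a CSV file.
read_selected_columns <- function(path, wanted) {
  # Read a single row just to learn the column names.
  header <- names(read.csv(path, nrows = 1))
  # "NULL" tells read.csv to skip a column; NA lets it guess the column type.
  classes <- rep("NULL", length(header))
  classes[header %in% wanted] <- NA
  read.csv(path, colClasses = classes)
}

# Toy file standing in for the storm data (assumed layout, not the real file).
toy_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(STATE = c("AL", "TX"),
             EVTYPE = c("TORNADO", "FLOOD"),
             FATALITIES = c(2, 0)),
  toy_path, row.names = FALSE
)
toy <- read_selected_columns(toy_path, c("EVTYPE", "FATALITIES"))
print(toy)
```

Under the same assumptions, the helper could be pointed at "datafile.csv.bz2" with the seven variable names used in this analysis.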
Load Required Libraries
library(ggplot2)
Create Data Frame for event type, fatalities and injuries
weatherdataclean <- data.frame(weatherdata$EVTYPE,weatherdata$FATALITIES, weatherdata$INJURIES)
colnames(weatherdataclean) = c("EVTYPE", "FATALITIES", "INJURIES")
Create Data Frame for event type, property damage and crop damage
damagedataclean <- data.frame(weatherdata$EVTYPE,weatherdata$PROPDMG, weatherdata$PROPDMGEXP, weatherdata$CROPDMG, weatherdata$CROPDMGEXP)
colnames(damagedataclean) = c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
Derive the damage amounts from the magnitude codes (K = 1,000, M = 1,000,000, B = 1,000,000,000); any other code is treated as a multiplier of 0. Create a new metric for combined property and crop damage.
damagedataclean$PROPDMGMult <- ifelse (damagedataclean$PROPDMGEXP == "K", 1000, ifelse (damagedataclean$PROPDMGEXP == "M", 1000000, ifelse (damagedataclean$PROPDMGEXP == "B", 1000000000, 0)))
damagedataclean$PROPDMGAMT <- damagedataclean$PROPDMG*damagedataclean$PROPDMGMult
damagedataclean$CROPDMGMult <- ifelse (damagedataclean$CROPDMGEXP == "K", 1000, ifelse (damagedataclean$CROPDMGEXP == "M", 1000000, ifelse (damagedataclean$CROPDMGEXP == "B", 1000000000, 0)))
damagedataclean$CROPDMGAMT <- damagedataclean$CROPDMG*damagedataclean$CROPDMGMult
damagedataclean$TOTALDMGAMT <- damagedataclean$PROPDMGAMT+damagedataclean$CROPDMGAMT
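The nested ifelse calls above can also be written as a lookup in a named vector, which may be easier to read and extend. The sketch below is an alternative under stated assumptions, not the code used for the results in this report: `exp_to_multiplier` is a hypothetical helper, the toy data frame is made up, and lowercase codes are assumed to mean the same as uppercase ones (the code above maps them to 0 instead).

```r
# Lookup table for the magnitude codes named in the text (K, M, B).
multipliers <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: map magnitude codes to numeric multipliers.
# Assumption: lowercase codes are treated like uppercase ones.
# Any other code (blank, digits, symbols) maps to 0.
exp_to_multiplier <- function(code) {
  mult <- multipliers[toupper(as.character(code))]
  mult[is.na(mult)] <- 0
  unname(mult)
}

# Made-up rows for illustration only.
toy <- data.frame(
  PROPDMG = c(25, 2.5, 1, 10, 3),
  PROPDMGEXP = c("K", "M", "B", "", "k")
)
toy$PROPDMGAMT <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
print(toy)
```

Whether lowercase codes should count is a judgment call about the data; dropping the toupper call restores the stricter matching used above.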
For the purpose of this analysis, we will interpret “harmful” as having the most fatalities or the most injuries, so two outputs are shown below. In terms of “types of events”, we will examine individual event types and not groups of event types.
Below is a summary of events based on total number of fatalities by event type. Only the top 10 events are shown.
weatherfatalities <- aggregate(weatherdataclean$FATALITIES, by = list(weatherdataclean$EVTYPE), FUN = sum, na.rm = TRUE)
colnames(weatherfatalities) = c("EVTYPE", "FATALITIES")
weatherfatalities <- weatherfatalities[order(-weatherfatalities$FATALITIES),]
topweatherfatalities <- weatherfatalities[1: 10, ]
p<- ggplot(topweatherfatalities, aes(x=reorder(EVTYPE, FATALITIES), y=FATALITIES))
p+geom_bar(stat = "identity", fill = "red")+ ggtitle("Top 10 Weather Events by # Fatalities")+labs(x = "Event Type", y="#Fatalities") +theme(axis.text.x = element_text(angle=45, hjust=1))
Based on the information shown above, tornadoes are the most harmful events to population health in terms of total number of fatalities.
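The aggregate, sort, and truncate steps used for fatalities are repeated for injuries and for total damage. A small helper could factor them out; the sketch below assumes a hypothetical function name, `top_events`, and uses made-up rows rather than the storm data.

```r
# Hypothetical helper: sum `value` within each event type and
# return the n event types with the largest totals.
top_events <- function(data, value, n = 10) {
  totals <- aggregate(data[[value]], by = list(EVTYPE = data$EVTYPE),
                      FUN = sum, na.rm = TRUE)
  names(totals)[2] <- value
  totals <- totals[order(-totals[[value]]), ]
  head(totals, n)
}

# Made-up rows for illustration only.
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(5, 1, 3, 4, 2)
)
top_toy <- top_events(toy, "FATALITIES", n = 2)
print(top_toy)
```

With the real data, a call such as top_events(weatherdataclean, "INJURIES") would be expected to produce the same kind of table as the step-by-step code.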
Below is a summary of events based on total number of injuries by event type. Only the top 10 events are shown.
weatherinjury <- aggregate(weatherdataclean$INJURIES, by = list(weatherdataclean$EVTYPE), FUN = sum, na.rm = TRUE)
colnames(weatherinjury) = c("EVTYPE", "INJURIES")
weatherinjury <- weatherinjury[order(-weatherinjury$INJURIES),]
topweatherinjury <- weatherinjury[1: 10, ]
q<- ggplot(topweatherinjury, aes(x=reorder(EVTYPE, INJURIES), y=INJURIES))
q+geom_bar(stat = "identity", fill = "blue")+ ggtitle("Top 10 Weather Events by # Injuries")+labs(x = "Event Type", y="#Injuries") +theme(axis.text.x = element_text(angle=45, hjust=1))
Based on the information shown above, tornadoes are the most harmful events to population health in terms of total number of injuries.
For the purpose of this analysis, we will interpret “economic consequence” as the total dollar amount of property and crop damage. In terms of “types of events”, we will examine individual event types and not groups of event types.
Below is a summary of events based on total damage by event type. Only the top 10 events are shown.
TOTALDMGAMT <- aggregate(damagedataclean$TOTALDMGAMT, by = list(damagedataclean$EVTYPE), FUN = sum, na.rm = TRUE)
colnames(TOTALDMGAMT) = c("EVTYPE", "TOTALDMGAMT")
TOTALDMGAMT <- TOTALDMGAMT[order(-TOTALDMGAMT$TOTALDMGAMT),]
TOPTOTALDMGAMT <- TOTALDMGAMT[1: 10, ]
r<- ggplot(TOPTOTALDMGAMT, aes(x=reorder(EVTYPE, TOTALDMGAMT/1000000000), y=TOTALDMGAMT/1000000000))
r+geom_bar(stat = "identity", fill = "green")+ ggtitle("Top 10 Weather Events by Total Damage (in $ Billions)")+labs(x = "Event Type", y="Total Damage (in $ Billions)") +theme(axis.text.x = element_text(angle=45, hjust=1))
Based on the information shown above, Floods have the greatest economic consequences based on total dollars of property and crop damage.
Tornadoes are the most harmful events to population health, both in terms of fatalities and injuries.
Floods have the greatest economic consequences based on total dollars of damage.