The data in this project is taken from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks major weather events in the United States and records estimates of the resulting fatalities, injuries, and property damage.
The data is downloaded from
https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2
The documentation of the data is at
https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf
The data is loaded into R directly from the bzip2 file. Totals are then calculated for each event type to address the following questions:
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
Two figures are plotted to answer the questions.
# load data to stormData
stormData <- read.csv("StormData.csv.bz2")
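As a small check of the loading step, the sketch below writes a made-up data frame to a temporary bzip2-compressed CSV and reads it back with read.csv, which decompresses the file when given its path. The toy values and the names toy, path, and toyBack are illustrative assumptions and are not part of the storm data.

```r
# Toy stand-in for the storm data (values are made up for illustration).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO"),
  FATALITIES = c(1, 0, 2),
  INJURIES   = c(10, 0, 5)
)

# Write the toy data as a bzip2-compressed CSV in a temporary location.
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() is given the compressed file's path, as in the call above;
# no separate decompression step is needed.
toyBack <- read.csv(path)
print(toyBack)
```

The same call pattern is what loads StormData.csv.bz2 above; the full file is much larger, so that read takes noticeably longer.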
The harm an event does to population health is defined as the sum of the fatalities and injuries it causes. The EVTYPE column indicates the event type. The total and average harm for each event type are calculated by grouping on EVTYPE with group_by and then applying summarize. Finally, the resulting data frame is sorted in descending order of total harm.
# calculate the total and average harm to population health caused by each type of event
suppressMessages(library(dplyr))
injury_event <- stormData %>%
  mutate(fat_inj = FATALITIES + INJURIES) %>%
  group_by(EVTYPE) %>%
  summarize(total = sum(fat_inj, na.rm = TRUE),
            average = mean(fat_inj, na.rm = TRUE),
            n = n()) %>%  # name the count column explicitly
  arrange(desc(total))
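To make the aggregation concrete, here is a minimal base R sketch of the same total, average, and count per event type on a made-up data frame. It uses tapply instead of dplyr purely as a cross-check; the toy values and the names toy and harm are assumptions for illustration.

```r
# Toy events (made-up values): two tornadoes, one hail storm, one flood.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "FLOOD"),
  FATALITIES = c(1, 0, 2, 1),
  INJURIES   = c(10, 0, 5, 0)
)

# Harm per event = fatalities + injuries.
toy$fat_inj <- toy$FATALITIES + toy$INJURIES

# Per-type total, average, and number of events.
total   <- tapply(toy$fat_inj, toy$EVTYPE, sum)
average <- tapply(toy$fat_inj, toy$EVTYPE, mean)
n       <- tapply(toy$fat_inj, toy$EVTYPE, length)

harm <- data.frame(
  EVTYPE  = names(total),
  total   = as.vector(total),
  average = as.vector(average),
  n       = as.vector(n)
)

# Sort by total harm, largest first.
harm <- harm[order(-harm$total), ]
print(harm)
```

In this toy data a type with many events can lead on total while a rare but severe type leads on average, which is why both columns are kept.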
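The economic damage of an event is defined as the sum of its PROPDMG (property damage) and CROPDMG (crop damage) values. The raw values are added as recorded, without applying the PROPDMGEXP and CROPDMGEXP magnitude codes. As with population health, the data frame of economic damage by event type is built with group_by, summarize, and arrange.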
# calculate the total and average damage caused by each type of event
damage_event <- stormData %>%
  mutate(prop_crop = PROPDMG + CROPDMG) %>%
  group_by(EVTYPE) %>%
  summarize(total = sum(prop_crop, na.rm = TRUE),
            average = mean(prop_crop, na.rm = TRUE),
            n = n()) %>%  # name the count column explicitly
  arrange(desc(total))
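The storm data documentation describes magnitude characters stored alongside the damage figures (in PROPDMGEXP and CROPDMGEXP): "K" for thousands, "M" for millions, and "B" for billions. The calculation above adds the raw figures without these codes, so the following is a hedged sketch of how a dollar amount could be derived instead. The helper exp_to_multiplier is hypothetical, the toy values are made up, and treating any other code as a multiplier of 1 is an assumption rather than something the documentation specifies.

```r
# Hypothetical helper: map a magnitude code to a numeric multiplier.
# Assumption: codes other than K, M, B (including blanks) count as 1.
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Toy events (made-up values).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  PROPDMG    = c(25, 5, 500),
  PROPDMGEXP = c("K", "B", ""),
  CROPDMG    = c(0, 10, 2),
  CROPDMGEXP = c("", "M", "K")
)

# Raw sum, as in the calculation above.
toy$prop_crop <- toy$PROPDMG + toy$CROPDMG

# Dollar amount with the magnitude codes applied.
toy$damage_usd <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP) +
  toy$CROPDMG * exp_to_multiplier(toy$CROPDMGEXP)

print(toy[, c("EVTYPE", "prop_crop", "damage_usd")])
```

In this toy data the hail event has the largest raw sum while the flood has by far the largest dollar amount, so the two definitions can rank event types differently.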
library(ggplot2)
# plot the top 5 event types by total fatalities and injuries,
# with bars ordered from largest to smallest total
ggplot(injury_event[1:5, ], aes(x = reorder(EVTYPE, -total), y = total)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  xlab("Severe weather event") +
  ylab("Fatalities and injuries") +
  labs(title = "Top 5 weather event types by total fatalities and injuries")
The figure shows that tornadoes cause the most fatalities and injuries.
# plot the top 5 event types by property and crop damage,
# with bars ordered from largest to smallest total
ggplot(damage_event[1:5, ], aes(x = reorder(EVTYPE, -total), y = total)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  xlab("Severe weather event") +
  ylab("Property and crop damage (PROPDMG + CROPDMG)") +
  labs(title = "Top 5 weather event types by property and crop damage")
The figure shows that tornadoes cause the most damage to property and crops, as measured by the summed PROPDMG and CROPDMG values.