This report analyzes the NOAA Storm Database to answer two basic questions about severe weather events:
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
The analysis below relies heavily on the dplyr package to split the data by event type and compute the statistics needed to answer these two questions.
Each step is accompanied by a short description, so the document should be self-explanatory.
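To illustrate the split-apply-combine pattern that dplyr provides, here is a minimal sketch on a small made-up data frame (the event types and counts are hypothetical, not taken from the NOAA data): group_by() splits the rows by event type, and summarise() returns one row of totals per group.

```r
library(dplyr)

# Hypothetical toy data: three records covering two event types
toy <- data.frame(EVTYPE     = c("TORNADO", "FLOOD", "TORNADO"),
                  FATALITIES = c(2, 1, 3),
                  INJURIES   = c(10, 0, 5))

# Split by event type, then sum each casualty column within each group
toy_summary <- toy %>%
  group_by(EVTYPE) %>%
  summarise(deaths   = sum(FATALITIES),
            injuries = sum(INJURIES))

print(toy_summary)
```

With a single grouping variable, summarise() drops the grouping, so the result is an ordinary table with one row per event type.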
# Read the bzip2-compressed storm data directly, then load dplyr
input <- read.csv(bzfile("repdata-data-StormData.csv.bz2"))
library(dplyr)
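# Keep only the event type and the casualty columns
tidy <- input[, c('EVTYPE', 'FATALITIES', 'INJURIES')]
tidy_t <- tidy %>%
  # Casualties per record: fatalities plus injuries
  mutate(health = FATALITIES + INJURIES) %>%
  # Total deaths, injuries and combined casualties per event type
  group_by(EVTYPE) %>%
  summarise(deaths = sum(FATALITIES, na.rm = TRUE),
            injuries = sum(INJURIES, na.rm = TRUE),
            total.damage = sum(health, na.rm = TRUE)) %>%
  # Rank event types and keep the ten with the most casualties
  arrange(desc(total.damage), desc(deaths), desc(injuries)) %>%
  top_n(10, total.damage)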
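When its wt argument is omitted, top_n() ranks by the last column of the table, and recent dplyr releases describe top_n() as superseded by slice_max(). The sketch below shows the same kind of top-k step with slice_max() and the ranking column named explicitly; the event names and totals are hypothetical toy values, not NOAA figures.

```r
library(dplyr)

# Hypothetical per-event totals (illustrative only)
totals <- data.frame(EVTYPE       = c("A", "B", "C", "D"),
                     total.damage = c(40, 90, 10, 60))

# Keep the 2 event types with the largest total.damage; slice_max()
# returns the selected rows ordered from largest to smallest
top2 <- totals %>% slice_max(total.damage, n = 2)

print(top2)
```

Note that slice_max() keeps ties by default (with_ties = TRUE), so it can return more than n rows when several events share the cut-off value.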
library(ggplot2)
# Bar chart of total casualties per event type, shaded by number of deaths
ggplot(tidy_t, aes(x = reorder(EVTYPE, -total.damage), y = total.damage, fill = deaths)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("Event Type") + ylab("Fatalities + injuries") + ggtitle("Most harmful weather events to population health")
From the above plot, “Tornadoes” cause the most harm to population health (about 100,000 deaths and injuries combined), as well as a large number of deaths (approximately 5,000).
First, the dollar damage values (PROPDMG and CROPDMG) are stored alongside magnitude codes (PROPDMGEXP and CROPDMGEXP) denoting hundreds (H), thousands (K), millions (M) and billions (B). We therefore multiply each value by the multiplier its code denotes.
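A compact alternative to chained conditionals (whether ifelse() or case_when()) is a named lookup vector indexed by the upper-cased code. Below is a minimal base-R sketch; the helper name exp_to_multiplier is hypothetical, the damage values are made up, and, as in the analysis, any code other than H, K, M or B is mapped to 0.

```r
# Hypothetical helper: map magnitude codes to numeric multipliers.
# Codes are matched case-insensitively; unknown, empty or NA codes become 0.
exp_to_multiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  # Indexing by a name that is not in the vector yields NA
  mult <- unname(lookup[toupper(as.character(code))])
  mult[is.na(mult)] <- 0
  mult
}

# Example: 25 with code "K" is 25,000 dollars; 1.5 with "b" is 1.5 billion;
# the unrecognized code "?" contributes nothing
damage  <- c(25, 1.5, 3)
codes   <- c("K", "b", "?")
dollars <- damage * exp_to_multiplier(codes)
print(dollars)
```

Using as.character() first means the helper also works when the code column was read in as a factor.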
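# Keep only the event type and the damage value/magnitude-code columns
tidy_p <- input[, c('EVTYPE', 'PROPDMG', 'PROPDMGEXP', 'CROPDMG', 'CROPDMGEXP')]
tidy_tp <- tidy_p %>%
  # Translate the magnitude codes (H, K, M, B, matched case-insensitively)
  # into numeric multipliers; any other code is treated as 0
  mutate(denom.prop = case_when(toupper(PROPDMGEXP) == "H" ~ 100,
                                toupper(PROPDMGEXP) == "K" ~ 1000,
                                toupper(PROPDMGEXP) == "M" ~ 10^6,
                                toupper(PROPDMGEXP) == "B" ~ 10^9,
                                TRUE ~ 0),
         # Crop multipliers must come from CROPDMGEXP, not PROPDMGEXP
         denom.crop = case_when(toupper(CROPDMGEXP) == "H" ~ 100,
                                toupper(CROPDMGEXP) == "K" ~ 1000,
                                toupper(CROPDMGEXP) == "M" ~ 10^6,
                                toupper(CROPDMGEXP) == "B" ~ 10^9,
                                TRUE ~ 0)) %>%
  # Convert to dollar amounts, expressed in billions of dollars
  mutate(value.prop = (PROPDMG * denom.prop) / 10^9,
         value.crop = (CROPDMG * denom.crop) / 10^9) %>%
  # Aggregate per event type
  group_by(EVTYPE) %>%
  summarise(Property.Damage = sum(value.prop, na.rm = TRUE),
            Crop.Damage = sum(value.crop, na.rm = TRUE),
            Total.Damage = Property.Damage + Crop.Damage) %>%
  # Keep the ten most damaging event types, largest first
  arrange(desc(Total.Damage)) %>%
  top_n(10, Total.Damage)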
# Total damage per event type, shaded by the property-damage component
ggplot(tidy_tp, aes(x = reorder(EVTYPE, -Total.Damage), y = Total.Damage, fill = Property.Damage)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("Event Type") + ylab("Total damage (billions of dollars)") + ggtitle("Most economically damaging events, shaded by property damage\n(in billions of dollars)")
From the above plot, “FLOODS” cause the highest net (total) economic damage, as well as the highest damage to property.
# Total damage per event type, shaded by the crop-damage component
ggplot(tidy_tp, aes(x = reorder(EVTYPE, -Total.Damage), y = Total.Damage, fill = Crop.Damage)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("Event Type") + ylab("Total damage (billions of dollars)") + ggtitle("Most economically damaging events, shaded by crop damage\n(in billions of dollars)")
This plot shows that, while “FLOODS” cause the highest net damage, the greatest damage to agriculture (crops) is caused by “DROUGHT”.
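One way to make this contrast explicit is to compute each event's crop share of its total damage. The sketch below uses hypothetical figures in billions of dollars (illustrative only, not the NOAA results) to show how an event can lead in total damage while contributing comparatively little to crop damage.

```r
library(dplyr)

# Hypothetical per-event damage in billions of dollars (illustrative only)
dmg <- data.frame(EVTYPE          = c("FLOOD", "DROUGHT"),
                  Property.Damage = c(100, 1),
                  Crop.Damage     = c(5, 10))

# Total damage and the fraction of it that falls on crops
shares <- dmg %>%
  mutate(Total.Damage = Property.Damage + Crop.Damage,
         Crop.Share   = Crop.Damage / Total.Damage)

print(shares)
```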
The conclusions drawn from each of the plots above are restated here:
“Tornadoes” cause the most harm to population health (about 100,000 deaths and injuries combined), as well as a large number of deaths (approximately 5,000).
“FLOODS” cause the highest net (total) economic damage, as well as the highest damage to property.
While “FLOODS” cause the highest net damage, the greatest damage to agriculture (crops) is caused by “DROUGHT”.