This analysis examines the impact of natural disasters on health and property across the US. First, the data is prepared by aggregating the impact categories for each type of disaster. Next, the top 10 event types by fatalities are selected for further analysis. Then, the rank of each type of damage (fatalities, injuries and propdmg (property damage)) is added to the table. Finally, the ranks are plotted so the three measures can be compared.
First, read the data and convert event types to factors:
stormData <- read.csv('repdata-data-StormData.csv')
stormData$EVTYPE <- as.factor(stormData$EVTYPE)
Second, aggregate the data by event type for the three categories of impact we want to analyze: fatalities, injuries and propdmg.
harmfulData <- aggregate(FATALITIES ~ EVTYPE, stormData, sum)
harmfulData$INJURIES <- aggregate(INJURIES ~ EVTYPE, stormData, sum)$INJURIES
harmfulData$PROPDMG <- aggregate(PROPDMG ~ EVTYPE, stormData, sum)$PROPDMG
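The three separate aggregate() calls above rely on each result listing the event types in the same order. As a hedged alternative, the sums can be computed in a single call by putting cbind() on the left side of the formula. The toy data frame and its values below are made up for illustration and are not taken from the storm data set.

```r
# Toy data (hypothetical values) standing in for stormData
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(3, 1, 2, 0, 4),
  INJURIES   = c(10, 0, 5, 1, 2),
  PROPDMG    = c(25, 50, 5, 2.5, 10)
)

# One aggregate() call sums all three columns per event type,
# so the rows cannot get out of step between columns
toySums <- aggregate(cbind(FATALITIES, INJURIES, PROPDMG) ~ EVTYPE, toy, sum)
print(toySums)
```

One difference to keep in mind: with the formula interface, a row that has a missing value in any of the three columns is dropped from all three sums, whereas the separate calls drop it only from the affected column.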
Third, make sure the three impact columns hold numeric values:
harmfulData$FATALITIES <- as.numeric(harmfulData$FATALITIES)
harmfulData$INJURIES <- as.numeric(harmfulData$INJURIES)
harmfulData$PROPDMG <- as.numeric(harmfulData$PROPDMG)
Fourth, retain only the top 10 disaster categories, selected by total fatalities, for further analysis:
library(dplyr)
fatTop <- top_n(harmfulData, n=10, FATALITIES)
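Note that top_n() leaves the selected rows in their original order and can return more than 10 rows when values tie at the cutoff. A minimal base R sketch of a comparable selection that also sorts the result is shown here on made-up data; the names toyTotals, keep and toyTop are hypothetical.

```r
# Toy totals (hypothetical values) standing in for harmfulData
toyTotals <- data.frame(
  EVTYPE     = c("A", "B", "C", "D", "E"),
  FATALITIES = c(5, 40, 12, 7, 30)
)

# Order rows by decreasing fatalities, then keep the first `keep` rows
keep <- 3
toyTop <- head(toyTotals[order(toyTotals$FATALITIES, decreasing = TRUE), ], keep)
print(toyTop)
```

Unlike top_n(), this always returns exactly `keep` rows, so a tie at the cutoff is broken by row order rather than kept.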
Fifth, add the rank of each category within this top 10 selection back into the table:
fatTop$FATALITIES_rank <- rank(fatTop$FATALITIES)
fatTop$INJURIES_rank <- rank(fatTop$INJURIES)
fatTop$PROPDMG_rank <- rank(fatTop$PROPDMG)
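For reference, rank() assigns 1 to the smallest value, so the largest total receives the highest rank, and tied values share the average of their ranks by default. A small illustration with made-up numbers (not values from the storm data):

```r
# Hypothetical fatality totals for four event types
deaths <- c(TORNADO = 50, FLOOD = 20, HAIL = 20, HEAT = 35)

# Ascending rank: the largest value gets the highest rank,
# and the two tied values share rank 1.5
deathRank <- rank(deaths)
print(deathRank)

# Negating the values reverses the scale, so 1 marks the largest value
deathRankDesc <- rank(-deaths)
print(deathRankDesc)
```

With ten distinct totals the ranks run from 1 to 10, which is the scale used on the plot's y-axis.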
Sixth, melt the rank columns into long format, keyed by event type, so that all three can be plotted together:
library(reshape2)
meltdf <- melt(fatTop[-(2:4)],id="EVTYPE")
The plot below shows how the top 10 event types rank on each of the three damage measures: fatalities, injuries and propdmg. Within this selection, rank 10 marks the event type with the highest total for a measure and rank 1 the lowest. Only 10 event types are included for readability. All three measures agree that tornadoes were the most damaging; for the other event types, the ranks vary from one measure to another. The plot thus illustrates how the various disasters affect life and property, and where the measures agree.
library(ggplot2)
f <- which.max(harmfulData$FATALITIES)
i <- which.max(harmfulData$INJURIES)
harmfulData[f,]
## EVTYPE FATALITIES INJURIES PROPDMG
## 830 TORNADO 5633 91346 3212258
harmfulData[i,]
## EVTYPE FATALITIES INJURIES PROPDMG
## 830 TORNADO 5633 91346 3212258
g <- ggplot(meltdf, aes(x = EVTYPE, y = value, color = variable, group = variable)) +
  geom_line() +
  scale_y_continuous(breaks = seq(0, 10, 1)) +
  labs(title = "How disaster categories rank") +
  ylab('Impact rank (10=high)') +
  xlab('Disaster category')
g