Synopsis
The U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. In this project, we analyze the data to address two questions: 1) which types of events are most harmful with respect to population health, and 2) which types of events have the greatest economic consequences.
Data Processing
Loading the data
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
destfile="stormdata.bz2", mode="wb")
stormdata <- read.csv(bzfile("stormdata.bz2"), header=TRUE, sep=",", stringsAsFactors=FALSE)
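The download above runs every time the report is knitted. A minimal sketch of a guard that skips the download when the archive is already on disk is shown below. The helper name `load_storm`, the toy file, and the placeholder URL are all hypothetical and exist only for illustration; the demonstration deliberately uses a tiny made-up file so that it runs without network access.

```r
# Hypothetical helper: download the archive only when it is not already on disk,
# then read the bzip2-compressed CSV.
load_storm <- function(url, dest) {
  if (!file.exists(dest)) {
    download.file(url, destfile = dest, mode = "wb")
  }
  read.csv(bzfile(dest), header = TRUE, stringsAsFactors = FALSE)
}

# Offline demonstration on a tiny made-up file (not the NOAA data).
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(2, 0))
toy_path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(toy_path, "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# The file already exists, so no download is attempted and the URL is never contacted.
toy_back <- load_storm("https://example.invalid/unused.csv.bz2", toy_path)
print(toy_back)
```

With the real data, the call would take the NOAA URL and "stormdata.bz2" as its two arguments.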
Data subsetting
healthdata<-subset(stormdata,stormdata$FATALITIES>0|stormdata$INJURIES>0,
select=c("EVTYPE","FATALITIES","INJURIES"))
DMGdata<-subset(stormdata,stormdata$PROPDMG>0|stormdata$CROPDMG>0,
select=c("EVTYPE","PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP"))
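“PROPDMG”: property damage; “CROPDMG”: crop damage; “PROPDMGEXP” and “CROPDMGEXP”: the unit (magnitude) of the corresponding amount, e.g. K = ×1,000, M = ×1,000,000, B = ×1,000,000,000.
Check the exponent units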
unique(DMGdata$PROPDMGEXP)
## [1] "K" "M" "B" "m" "" "+" "0" "5" "6" "4" "h" "2" "7" "3" "H" "-"
unique(DMGdata$CROPDMGEXP)
## [1] "" "M" "K" "m" "B" "?" "0" "k"
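Besides K, M and B, the listings above contain lower-case variants, digits and symbols. Before deciding how to treat them, it can help to count how often each code occurs. The sketch below does this on a made-up vector of codes (not the NOAA column); the variable names are illustrative only.

```r
# Made-up exponent codes standing in for DMGdata$PROPDMGEXP.
codes <- c("K", "K", "M", "m", "", "B", "+", "5", "k", "K")

# Fold case so that "m"/"M" and "k"/"K" are counted together.
code_counts <- table(toupper(codes))
print(code_counts)

# Share of rows whose code is one of the letters K, M or B.
regular_share <- mean(toupper(codes) %in% c("K", "M", "B"))
print(regular_share)
```

If the share of K/M/B codes in the real columns is close to 1, the handling of the remaining codes has little influence on the totals.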
Results
The most harmful event for population health
Sorting the data by “FATALITIES” and plotting the top 10
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.4.1
SortedData<-aggregate(FATALITIES ~ EVTYPE, healthdata, FUN=sum)
SortedData<-SortedData[order(SortedData$FATALITIES,decreasing = TRUE),]
SortedData$EVTYPE<-factor(SortedData$EVTYPE, levels = SortedData$EVTYPE[order(SortedData$FATALITIES,decreasing = TRUE)])
ggplot(SortedData[1:10,], aes(x=EVTYPE, y=FATALITIES))+
geom_bar(stat="identity") + theme(axis.text.x = element_text(angle=45, hjust=1))+
ggtitle("Top 10 Events with Highest Total Fatalities") +labs(x="EVENT TYPE", y="Total Fatalities")

Sorting the data by “INJURIES” and plotting the top 10
SortedData<-aggregate(INJURIES ~ EVTYPE, healthdata, FUN=sum)
SortedData<-SortedData[order(SortedData$INJURIES,decreasing = TRUE),]
SortedData$EVTYPE<-factor(SortedData$EVTYPE, levels = SortedData$EVTYPE[order(SortedData$INJURIES,decreasing = TRUE)])
ggplot(SortedData[1:10,], aes(x=EVTYPE, y=INJURIES))+
geom_bar(stat="identity") + theme(axis.text.x = element_text(angle=45, hjust=1))+
ggtitle("Top 10 Events with Highest Total Injuries") +labs(x="EVENT TYPE", y="Total Injuries")
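The fatalities and injuries steps repeat the same pattern: sum per event type, sort in decreasing order, and reset the factor levels so that ggplot2 keeps the sorted order instead of arranging the bars alphabetically. A minimal sketch of that pattern as a single function follows. The name `top_events` and the toy records are hypothetical; this is one possible refactoring, not the code used to produce the plots.

```r
# Hypothetical helper: total `value` per event type, keep the n largest,
# and fix the factor levels so a plot keeps the descending order.
top_events <- function(data, value, n = 10) {
  totals <- aggregate(data[[value]], by = list(EVTYPE = data$EVTYPE), FUN = sum)
  names(totals)[2] <- value
  totals <- totals[order(totals[[value]], decreasing = TRUE), ]
  totals <- head(totals, n)
  totals$EVTYPE <- factor(totals$EVTYPE, levels = totals$EVTYPE)
  totals
}

# Made-up records, not NOAA figures.
toy <- data.frame(
  EVTYPE = c("HEAT", "TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 5, 1, 4, 2, 0),
  INJURIES = c(0, 20, 2, 15, 1, 3),
  stringsAsFactors = FALSE
)

top_fatal <- top_events(toy, "FATALITIES", n = 2)
top_injury <- top_events(toy, "INJURIES", n = 2)
print(top_fatal)
print(top_injury)
```

The returned data frame can be passed to ggplot() in the same way as SortedData[1:10,].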

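Events that have the greatest economic consequences
#### Convert damages to dollars: K/M/B = 1e3/1e6/1e9; assumption: H = 1e2 and any other code = 1
mult <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
expmult <- function(e) { m <- unname(mult[toupper(e)]); ifelse(is.na(m), 1, m) }
DMGdata$PROPDMG <- DMGdata$PROPDMG * expmult(DMGdata$PROPDMGEXP)
DMGdata$CROPDMG <- DMGdata$CROPDMG * expmult(DMGdata$CROPDMGEXP)
#### Combine the data from the same event type
PROPDMG10<-aggregate(PROPDMG ~ EVTYPE, DMGdata, FUN=sum)
CROPDMG10<-aggregate(CROPDMG ~ EVTYPE, DMGdata, FUN=sum)
#### Get the top 10 events with the highest property damage
PROPDMG10 <- PROPDMG10[order(-PROPDMG10$PROPDMG), ][1:10, ]
#### Get the top 10 events with the highest crop damage
CROPDMG10 <- CROPDMG10[order(-CROPDMG10$CROPDMG), ][1:10, ]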
par(mfrow = c(1, 2), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8)
barplot(PROPDMG10$PROPDMG, las = 3, names.arg = PROPDMG10$EVTYPE,
main = "Top 10 Events-Property Damages", ylab = "Damages ($)",
col = "blue")
barplot(CROPDMG10$CROPDMG, las = 3, names.arg = CROPDMG10$EVTYPE,
main = "Top 10 Events-Crop Damages", ylab = "Damages ($)",
col = "green")
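The two panels rank property and crop damage separately, while an overall ranking of economic consequences needs the sum of both per event type. A minimal sketch of that step is given below, using made-up dollar amounts rather than NOAA figures; the column name `TOTALDMG` is an assumption introduced for illustration.

```r
# Made-up dollar amounts, assumed to be already converted from the exponent codes.
toy <- data.frame(
  EVTYPE = c("FLOOD", "DROUGHT", "FLOOD", "HAIL", "DROUGHT"),
  PROPDMG = c(5e6, 1e5, 2e6, 3e5, 0),
  CROPDMG = c(1e5, 4e6, 0, 2e5, 1e6),
  stringsAsFactors = FALSE
)

# Sum both damage columns per event type in one call.
totals <- aggregate(cbind(PROPDMG, CROPDMG) ~ EVTYPE, toy, FUN = sum)

# Total economic damage = property damage + crop damage.
totals$TOTALDMG <- totals$PROPDMG + totals$CROPDMG
totals <- totals[order(-totals$TOTALDMG), ]
print(totals)
```

Ranking on the combined column is what allows a single event type to be named as having the greatest economic consequences overall.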

Conclusions
1.) Across the United States, TORNADO is the most harmful event type with respect to population health.
2.) Across the United States, FLOOD causes the greatest property damage and DROUGHT causes the most crop damage. Overall, FLOOD has the greatest economic consequences, at 150 billion dollars.