This research analyzes the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm data to answer two questions: across the US, which types of events are most harmful to population health, and which types of events have the greatest economic consequences?
There are 902,297 rows of storm data and 37 variables. You can download the data and have a look at it at Storm Data. For more detail, please refer to the Storm Data Document.
This research analyzes the storm data from 1996 onward, because before 1996 only a few event types (tornado, thunderstorm wind and hail) were recorded in the database. Restricting the analysis to 1996 and later makes it fairer to compare all the storm types.
The sum of injuries and fatalities is the measure used to judge how harmful an event type is to population health.
The sum of crop damage and property damage is the measure used to judge which event types have the greatest economic consequences in the US.
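Written out per event type $e$, over the records from 1996 onward, the two measures can be stated as below. The notation ($H$, $D$, $m$) is ours, not NOAA's: $m(\cdot)$ stands for the dollar multiplier implied by an exponent code (K = $10^3$, M = $10^6$, B = $10^9$, and a blank code treated as 0, following the conversion in step 4.1).

```latex
\[
H(e) = \sum_{i \,:\, \mathrm{EVTYPE}_i = e} \left( \mathrm{INJURIES}_i + \mathrm{FATALITIES}_i \right)
\]
\[
D(e) = \sum_{i \,:\, \mathrm{EVTYPE}_i = e} \left( \mathrm{PROPDMG}_i \, m(\mathrm{PROPDMGEXP}_i) + \mathrm{CROPDMG}_i \, m(\mathrm{CROPDMGEXP}_i) \right)
\]
```

The event type with the largest $H(e)$ answers the first question, and the one with the largest $D(e)$ answers the second.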
# 1. Read the data
stormData <- read.csv("repdata-data-StormData.csv.bz2", header=TRUE, sep=",")
library(dplyr)
# 2. Filter the data, 1996 and after
stormData$date <- as.Date( as.character(stormData$BGN_DATE), "%m/%d/%Y")
after_1995_stromData <- subset(stormData, date > "1995-12-31")
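The date filter relies on two behaviours of base R that are easy to check in isolation. As an assumption about the raw file, BGN_DATE values look like "4/18/1950 0:00:00"; `as.Date()` with format "%m/%d/%Y" parses the date part and ignores the trailing time, because `strptime()` ignores characters left over once the format is satisfied. The minimal sketch below uses two hand-written strings and applies the same cutoff with an explicit `Date`, rather than relying on R coercing the string "1995-12-31" for the comparison.

```r
# Two sample BGN_DATE strings (hand-written, assumed to match the raw format)
raw_dates <- c("4/18/1950 0:00:00", "1/6/1996 0:00:00")

# Parsing stops once "%m/%d/%Y" is satisfied, so " 0:00:00" is ignored
parsed <- as.Date(raw_dates, format = "%m/%d/%Y")
print(parsed)

# Same cutoff as the report, written with an explicit Date
cutoff <- as.Date("1995-12-31")
keep <- parsed > cutoff
print(keep)
```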
# 3. Get injuries + fatalities
# 3.1 sum by EVTYPE
after_1965_by_type1 <- group_by(after_1995_stromData, EVTYPE)
after_1965_sum <- summarize(after_1965_by_type1, INJURIES=sum(INJURIES), FATALITIES=sum(FATALITIES))
# 3.2 remove event types with no injuries and no fatalities
after_1965_sum <- subset(after_1965_sum, INJURIES + FATALITIES > 0)
# 3.3 Calculate the total and sort
after_1965_sum$total <- (after_1965_sum$INJURIES + after_1965_sum$FATALITIES)
after_1965_sum <- arrange(after_1965_sum, desc(total))
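When removing rows with `subset()`, note that its third positional argument is `select` (which columns to keep), not a second row condition. In a call of the form `subset(x, INJURIES > 0, FATALITIES > 0)`, the expression `FATALITIES > 0` is evaluated against column positions and comes out `TRUE`, so every column is kept and rows are filtered on `INJURIES > 0` alone; an event type with fatalities but no injuries is silently dropped. The sketch below shows this on a small made-up data frame (the values are illustrative, not from the storm data).

```r
# Made-up per-event-type totals, for illustration only
sums <- data.frame(
  EVTYPE     = c("A", "B", "C"),
  INJURIES   = c(5, 0, 0),
  FATALITIES = c(1, 3, 0),
  stringsAsFactors = FALSE
)

# Third argument is 'select': FATALITIES > 0 becomes 3 > 0, i.e. TRUE,
# so all columns are kept and only INJURIES > 0 filters the rows
pitfall <- subset(sums, INJURIES > 0, FATALITIES > 0)

# Keep every event type with at least one injury or fatality
intended <- subset(sums, INJURIES + FATALITIES > 0)

print(pitfall$EVTYPE)   # "A"
print(intended$EVTYPE)  # "A" "B"
```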
# 4. Get property damage + crop damage
# 4.1 replace the exponent codes K, M, B with 1000, 1000000, 1000000000, and a blank code with 0
after_1995_stromData$PROPDMGEXP <- gsub("k", 1000, after_1995_stromData$PROPDMGEXP,ignore.case = TRUE)
after_1995_stromData$PROPDMGEXP <- gsub("m", 1000000, after_1995_stromData$PROPDMGEXP,ignore.case = TRUE)
after_1995_stromData$PROPDMGEXP <- gsub("b", 1000000000, after_1995_stromData$PROPDMGEXP,ignore.case = TRUE)
after_1995_stromData$PROPDMGEXP <- as.numeric(gsub("^$", 0, after_1995_stromData$PROPDMGEXP))
after_1995_stromData$CROPDMGEXP <- gsub("k", 1000, after_1995_stromData$CROPDMGEXP,ignore.case = TRUE)
after_1995_stromData$CROPDMGEXP <- gsub("m", 1000000, after_1995_stromData$CROPDMGEXP,ignore.case = TRUE)
after_1995_stromData$CROPDMGEXP <- gsub("b", 1000000000, after_1995_stromData$CROPDMGEXP,ignore.case = TRUE)
after_1995_stromData$CROPDMGEXP <- as.numeric(gsub("^$", 0, after_1995_stromData$CROPDMGEXP))
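A lookup table is one alternative to chaining `gsub()` calls: each exponent code is matched against a named vector of multipliers. The helper `exp_to_multiplier()` below is hypothetical (it is not part of any package). It is a minimal sketch that keeps the same convention as the `gsub()` steps, where a blank code and the code "0" both end up as 0, and it returns `NA` for any code it does not recognise, so unexpected codes surface instead of being converted silently.

```r
# Hypothetical helper: convert NOAA damage exponent codes to dollar multipliers.
# Follows the report's convention that "" and "0" mean a multiplier of 0.
exp_to_multiplier <- function(code) {
  code <- toupper(trimws(as.character(code)))
  multipliers <- c("0" = 0, "K" = 1e3, "M" = 1e6, "B" = 1e9)
  out <- unname(multipliers[code])   # unmatched codes become NA
  out[code == ""] <- 0               # "" never matches a name, so set it explicitly
  out
}

exp_to_multiplier(c("K", "m", "", "B", "0"))
```

It would be applied to the raw PROPDMGEXP and CROPDMGEXP columns, before any `gsub()` conversion has turned them into numbers.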
# 4.2 total = property damage * unit + crop damage * unit
after_1995_stromData$total <- after_1995_stromData$PROPDMG*as.numeric(after_1995_stromData$PROPDMGEXP) + after_1995_stromData$CROPDMG*as.numeric(after_1995_stromData$CROPDMGEXP)
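As a worked example with made-up values, a record with PROPDMG = 2.5, PROPDMGEXP = "K", CROPDMG = 1 and CROPDMGEXP = "M" contributes:

```latex
\[
2.5 \times 10^{3} + 1 \times 10^{6} = 2{,}500 + 1{,}000{,}000 = 1{,}002{,}500 \text{ dollars}
\]
```

Because a blank exponent is converted to 0, a record whose PROPDMGEXP or CROPDMGEXP is blank contributes nothing from that column, whatever its PROPDMG or CROPDMG value.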
# 4.3 group by and summarize
after_1965_by_type2 <- group_by(after_1995_stromData, EVTYPE)
after_1995_stromData_dmage<- summarize(after_1965_by_type2, total=sum(total))
# 4.4 remove 0 and sort
after_1995_stromData_dmage<- subset(after_1995_stromData_dmage, total > 0)
after_1995_stromData_dmage<- arrange(after_1995_stromData_dmage, desc(total))
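The group, sum, filter and sort steps can also be written in base R with `aggregate()` and `order()`, which can be useful for cross-checking the dplyr result. This is a minimal sketch on a small made-up data frame with the same EVTYPE and total columns. One difference to keep in mind: the formula interface of `aggregate()` drops rows whose total is `NA` (its default `na.action` is `na.omit`), whereas `sum()` inside `summarize()` returns `NA` for any group containing one.

```r
# Made-up records, for illustration only
records <- data.frame(
  EVTYPE = c("FLOOD", "HAIL", "FLOOD", "TORNADO", "FOG"),
  total  = c(5e6, 1e4, 2e6, 3e6, 0),
  stringsAsFactors = FALSE
)

# Sum damage per event type
by_type <- aggregate(total ~ EVTYPE, data = records, FUN = sum)

# Drop event types with no damage, then sort from largest to smallest
by_type <- by_type[by_type$total > 0, ]
by_type <- by_type[order(by_type$total, decreasing = TRUE), ]

head(by_type)
```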
library(ggplot2)
sp <- ggplot(data=head(after_1965_sum), aes(x=reorder(EVTYPE, -total), y=total))
sp + geom_bar(stat="identity") + ggtitle("Top 6 event types most harmful to population health") + xlab("Event type") + ylab("Total injuries and fatalities")
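ggplot2 places the bars of a discrete x-axis in the order of the factor's levels, which are alphabetical by default, so the ranking produced by `arrange()` is not what the x-axis shows on its own. One common way to put the bars in rank order is `reorder()` from the stats package, which re-levels a factor by the mean of another variable within each level (ascending); negating the values gives descending order. A minimal check on made-up values:

```r
# Made-up totals for three event types, for illustration only
evtype <- c("B", "A", "C")
total  <- c(1, 5, 3)

# Levels are sorted by mean(-total) within each level, ascending,
# so the largest total comes first
ranked <- reorder(evtype, -total)
levels(ranked)  # "A" "C" "B"
```

Inside `aes()`, the same idea is written as `x = reorder(EVTYPE, -total)`; an explicit `xlab()` then keeps the axis title readable.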
The figure above shows that TORNADO is the event type most harmful to population health.
sp <- ggplot(data=head(after_1995_stromData_dmage), aes(x=reorder(EVTYPE, -total), y=total))
sp + geom_bar(stat="identity") + ggtitle("Top 6 event types with the greatest economic consequences in the US") + xlab("Event type") + ylab("Total damage in dollars")
The figure above shows that FLOOD has the greatest economic consequences in the US.