Puxin Xu
Thursday, December 23, 2015
In this analysis, we explore the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to answer two questions about severe weather across the United States: which types of events are most harmful with respect to population health, and which types of events have the greatest economic consequences? We found that tornadoes are the most harmful events with respect to population health, and that floods have the greatest economic consequences.
if (!file.exists("./tempdata.csv.bz2")){
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileUrl, destfile = "tempdata.csv.bz2", method = "curl")}
data <- read.csv("./tempdata.csv.bz2")
After reading the Storm Data documentation and the assignment requirements, we identified the relevant columns: BGN_DATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP. We subset the original dataset to these columns. Because only the year in BGN_DATE is needed, we extract it into a new column first.
library(lubridate)
year <- year(mdy_hms(data$BGN_DATE))
data <- cbind(data,year)
cleaned_data <- subset(data,select = c(year,EVTYPE,FATALITIES,INJURIES,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP))
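As a quick sanity check on the year extraction, the same parsing can be sketched in base R on a single value. The sample string is an assumed example of the BGN_DATE format (month/day/year followed by a time), inferred from the use of mdy_hms() and not quoted from the dataset.

```r
# Assumed example of a BGN_DATE value (hypothetical, for illustration only).
sample_date <- "4/18/1950 0:00:00"

# Base R counterpart of year(mdy_hms(x)): parse the string, then read the year field.
parsed <- as.POSIXlt(sample_date, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
sample_year <- parsed$year + 1900  # POSIXlt stores years as an offset from 1900

print(sample_year)
```

If the parse returns NA for real rows, the format string does not match the data and should be adjusted before the year is trusted.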
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete. A histogram of the years in the cleaned data shows this.
with(cleaned_data,hist(year,breaks=60))
The histogram shows a sharp increase in the number of recorded events around 1995, so we keep only the data from 1995 to 2011.
completed_data <- subset(cleaned_data,year >= 1995)
FATALITIES and INJURIES are the columns most directly related to population health. We sum each of them by EVTYPE and use the top 10 event types to show which cause the most harm.
library(data.table)
##
## Attaching package: 'data.table'
##
## The following objects are masked from 'package:lubridate':
##
## hour, mday, month, quarter, wday, week, yday, year
dt_data <- as.data.table(completed_data)
aspect_data <- as.data.table(aggregate(FATALITIES~EVTYPE,dt_data,sum))
setorder(aspect_data,-FATALITIES)
Fatalities <- aspect_data
aspect_data <- as.data.table(aggregate(INJURIES~EVTYPE,dt_data,sum))
setorder(aspect_data,-INJURIES)
Injuries <- aspect_data
total_effect_health <- merge(Fatalities,Injuries,by = "EVTYPE")
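total_effect_health <- with(total_effect_health,cbind(total_effect_health,c(FATALITIES+INJURIES)))
# Rank by combined fatalities and injuries (column V2) before keeping the top 10
setorder(total_effect_health,-V2)
total_effect_health <- total_effect_health[1:10,]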
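The per-event-type summing and ranking idea can be checked on a toy table. This is a minimal base R sketch, not the report's pipeline: the records and the names toy, toy_health and TOTAL are hypothetical, and the values are made up for illustration.

```r
# Hypothetical toy records; the values are made up for illustration.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 1, 3, 4, 0),
  INJURIES   = c(10, 0, 5, 1, 2)
)

# Sum both measures per event type in a single aggregate() call.
toy_health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Combined casualty count, then rank from most to least harmful.
toy_health$TOTAL <- toy_health$FATALITIES + toy_health$INJURIES
toy_health <- toy_health[order(-toy_health$TOTAL), ]

print(toy_health)
```

Ranking on the combined total rather than on fatalities alone matters when an event type causes many injuries but comparatively few deaths.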
Crop damage and property damage are the measures relevant to economic consequences.
symbol_corp <- data.frame(CROPDMGEXP=c("M","m","K","k","B"),exp = c(6,6,3,3,9))
symbol_prop <- data.frame(PROPDMGEXP=c("M","m","K","H","B"),exp = c(6,6,3,2,9))
getfull_num <- function(type,symbol,type_exp){
test_data <- completed_data
test <- merge(test_data,symbol,by = type_exp)
not_zero_data <- subset(test,test[[type]] != 0)
full_num <- not_zero_data[[type]] * 10^not_zero_data[["exp"]]
not_zero_data <- cbind(not_zero_data,full_num)
data_df <- as.data.table(aggregate(not_zero_data$full_num,by = list(not_zero_data$EVTYPE),FUN = "sum"))
setorder(data_df,-x)
# Note: despite the variable name, this keeps the 30 largest event types
top_10 <- data_df[1:30,]
setnames(top_10,"Group.1",type)
return (top_10)
}
Crop_effect <- getfull_num("CROPDMG",symbol_corp,"CROPDMGEXP")
Prop_effect <- getfull_num("PROPDMG",symbol_prop,"PROPDMGEXP")
setnames(Crop_effect,"CROPDMG","type")
setnames(Prop_effect,"PROPDMG","type")
total_effect_ecnomic <- merge(Crop_effect,Prop_effect,by = "type")
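total_effect_ecnomic <- with(total_effect_ecnomic,cbind(total_effect_ecnomic,
c(x.x+x.y)))
# merge() returns rows sorted by "type", so order by total loss (V2) before keeping the top 10
setorder(total_effect_ecnomic,-V2)
total_effect_ecnomic <- head(total_effect_ecnomic,10)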
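The exponent conversion inside getfull_num can be illustrated on toy values. This is a minimal base R sketch under stated assumptions: the records are hypothetical, and lookup is an illustrative stand-in that mirrors the idea of symbol_prop (exponent code mapped to a power of ten), not a replacement for it.

```r
# Hypothetical toy records; the values are made up for illustration.
toy <- data.frame(
  EVTYPE     = c("FLOOD", "HAIL", "FLOOD"),
  PROPDMG    = c(2.5, 10, 1),
  PROPDMGEXP = c("M", "K", "B")
)

# Exponent code -> power of ten (H = hundreds, K = thousands, M = millions, B = billions).
lookup <- data.frame(PROPDMGEXP = c("H", "K", "M", "B"), exp = c(2, 3, 6, 9))

# merge() is an inner join here: rows with an unrecognized code are dropped.
toy <- merge(toy, lookup, by = "PROPDMGEXP")
toy$dollars <- toy$PROPDMG * 10^toy$exp

# Total damage in dollars per event type.
toy_total <- aggregate(dollars ~ EVTYPE, data = toy, FUN = sum)
print(toy_total)
```

Because the join is an inner join, any record whose exponent code is missing from the lookup table does not contribute to the totals, which is worth keeping in mind when reading the economic results.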
library(ggplot2)
ggplot(total_effect_health, aes(x = reorder(EVTYPE, -V2), y = V2, fill = EVTYPE)) +
geom_bar(stat = "identity") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
xlab("Types of events") +
ylab("Total num") +
ggtitle("Population health by types of events(Year:1995-2011)")
The plot shows that tornadoes are the most harmful weather events with respect to population health.
ggplot(total_effect_ecnomic, aes(x = reorder(type, -V2), y = V2/10^6, fill = type)) +
geom_bar(stat = "identity") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
xlab("Types of events") +
ylab("Economic losses(Million dollars)") +
ggtitle("Economic losses by types of events(Year:1995-2011)")
As the plot shows, floods have the greatest economic consequences.