Before processing the data, see the documentation on how the data were collected and structured:
Storm Events Database
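rm(list=ls())
# Packages used below: dplyr (%>%, group_by, summarize), tidyr (gather), ggplot2 (plots)
library(dplyr)
library(tidyr)
library(ggplot2)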
folder <- 'C:/Personal/CourseRA/Reproducible-Research/PA2'
# Create the working folder if it does not exist yet, then switch to it
if(!file.exists(folder)){
  dir.create(folder)
}
setwd(folder)
file_url <- 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
if(!file.exists('PA2-data.csv.bz2')){
download.file(file_url,'PA2-data.csv.bz2')
}
data_handle <- bzfile('PA2-data.csv.bz2','r')
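# Read text columns as factors (not the default since R 4.0); the levels() recoding of PROPDMGEXP/CROPDMGEXP relies on it
data <- read.csv(data_handle, stringsAsFactors = TRUE)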
close(data_handle)
data$BGN_DATE <- as.Date(as.character(data$BGN_DATE),"%m/%d/%Y %H:%M:%S")
data$END_DATE <- as.Date(data$END_DATE,"%m/%d/%Y %H:%M:%S")
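To illustrate the date conversion, here is a minimal sketch on a made-up timestamp in the same month/day/year layout. The sample string is an assumption for illustration, not a row quoted from the dataset. Note that as.Date() keeps only the calendar date and drops the time of day.

```r
# Hypothetical sample value in the "%m/%d/%Y %H:%M:%S" layout (not taken from the data)
sample_stamp <- "4/18/1950 0:00:00"
parsed <- as.Date(sample_stamp, "%m/%d/%Y %H:%M:%S")
print(parsed)  # 1950-04-18; the time of day is discarded

# An empty string (e.g. a missing end date) becomes NA when a format is given
missing_date <- as.Date("", "%m/%d/%Y %H:%M:%S")
print(is.na(missing_date))  # TRUE
```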
Across the United States, which types of events are most harmful with respect to population health?
We counted the total number of fatalities and injuries per event type as follows. Specifically, severity is ranked primarily by fatalities, with injuries used to break ties:
stats_evtype <- data %>% group_by(EVTYPE) %>% summarize(Fatal_rate=sum(FATALITIES), Injury_rate=sum(INJURIES)) %>% arrange(desc(Fatal_rate), desc(Injury_rate))
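The ranking logic can be checked on a small made-up table. The sketch below uses base R (aggregate and order) rather than dplyr so that it runs without extra packages; the toy rows and the names toy, totals and ranked are assumptions for illustration only.

```r
# Toy records (hypothetical values, not from the Storm Events Database)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(10, 0, 5, 2, 1)
)

# Sum fatalities and injuries within each event type
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank by fatalities first, then by injuries to break ties
ranked <- totals[order(-totals$FATALITIES, -totals$INJURIES), ]
print(ranked)  # TORNADO (5, 15), then HEAT (4, 2), then FLOOD (1, 1)
```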
According to this ranking, the most harmful event type is TORNADO, which caused 5633 deaths and 91346 injuries.
plot_num <- 10 # Only the top 10 event types are plotted
plot_data <- stats_evtype[1:plot_num,] %>% gather(Type, Value, -EVTYPE) %>% arrange(Type, desc(Value)) %>% mutate(EVTYPE=factor(EVTYPE, levels=unique(EVTYPE), ordered=TRUE))
p <- ggplot(plot_data, aes(EVTYPE, Value)) +
  geom_bar(aes(fill=Type), position = "dodge", stat="identity") + facet_grid(.~Type) +
  xlab("Event Type") + ylab("Total Fatal/Injury Numbers") +
  scale_fill_manual(values = c("red", "blue")) +
  theme(axis.text.x = element_text(angle=90))
p
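The factor() call in the plotting code fixes the left-to-right order of the bars. A minimal sketch on made-up values shows the difference between the default (alphabetical) levels and levels taken in order of first appearance; the vector ev is hypothetical and stands in for an EVTYPE column that is already sorted by descending count.

```r
# Toy event types (hypothetical), already sorted by descending count and
# repeated once per measure, as in the long-format plotting table
ev <- c("TORNADO", "HEAT", "FLOOD", "TORNADO", "HEAT", "FLOOD")

f_default <- factor(ev)                                      # levels sorted alphabetically
f_ordered <- factor(ev, levels = unique(ev), ordered = TRUE) # levels in order of first appearance

print(levels(f_default))  # "FLOOD" "HEAT" "TORNADO"
print(levels(f_ordered))  # "TORNADO" "HEAT" "FLOOD"
```

Because ggplot2 lays out a discrete axis in level order, the second form keeps the bars in ranking order rather than alphabetical order.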
Across the United States, which types of events have the greatest economic consequences?
Next, we turn to the economic consequences of harmful events across the U.S.
Similarly, we summed the total damage to both property and crops, and ranked event types primarily by property damage.
According to the column documentation, only the letters “h”, “H”, “k”, “K”, “m”, “M” and “B” (hundreds, thousands, millions and billions) can be interpreted in columns PROPDMGEXP and CROPDMGEXP. A numeric level x is treated as a multiplier of 10^x. The other levels (+, -, ?) and the empty level are ignored in this analysis and treated as exponent 0, i.e. a multiplier of 1.
In detail, we recoded the factor levels as exponents of 10 as follows:
levels(data$PROPDMGEXP) <- c(0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 2, 2, 3, 6, 6 )
levels(data$CROPDMGEXP) <- c(0, 0, 0, 2, 9, 3, 3, 6, 6)
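Assigning levels by position depends on the order in which R sorts the factor levels, which can differ between locales. As a hedged alternative, the sketch below maps each symbol to its exponent by name, following the rules stated above. The names exp_lookup and to_exponent are hypothetical helpers, not part of the original analysis, and the damage figures are made up.

```r
# Hypothetical helper: named lookup from exponent symbol to power of ten
exp_lookup <- c(h = 2, H = 2, k = 3, K = 3, m = 6, M = 6, B = 9)

to_exponent <- function(sym) {
  sym <- as.character(sym)
  out <- rep(0, length(sym))                    # "", "+", "-", "?" and unknown symbols -> 0
  is_digit <- grepl("^[0-9]$", sym)
  out[is_digit] <- as.numeric(sym[is_digit])    # numeric level x -> exponent x
  is_letter <- sym %in% names(exp_lookup)
  out[is_letter] <- exp_lookup[sym[is_letter]]  # letter -> 2, 3, 6 or 9
  out
}

# Toy check (made-up damage figures)
dmg  <- c(25, 2.5, 1, 10, 7)
symb <- c("K", "M", "B", "", "?")
total <- dmg * 10^to_exponent(symb)
print(total)  # 25000, 2500000, 1e+09, 10, 7
```

The lookup gives the same exponents whatever the level order, at the cost of a few more lines than the positional assignment.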
We then built a new data table containing the total property and crop damage values:
data_new <- data %>% select(EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>% mutate(Prop_tot = PROPDMG * 10^(as.integer(as.matrix(PROPDMGEXP)))) %>% mutate(Crop_tot = CROPDMG * 10^(as.integer(as.matrix(CROPDMGEXP))))
Then we summed the damage for each event type using the updated damage values:
Eco_evtype <- data_new %>% group_by(EVTYPE) %>% summarize(PROP_rate=sum(Prop_tot), CROP_rate=sum(Crop_tot)) %>% arrange(desc(PROP_rate), desc(CROP_rate))
According to this ranking, the event type with the greatest economic consequences is FLOOD, which caused a total of 1.4465771 × 10^11 US dollars in property damage and 5.6619684 × 10^9 US dollars in crop damage.
plot_num <- 10 # Only the top 10 event types are plotted
plot_data <- Eco_evtype[1:plot_num,] %>% gather(Type, Value, -EVTYPE) %>% arrange(desc(Type), desc(Value)) %>% mutate(EVTYPE=factor(EVTYPE, levels=unique(EVTYPE), ordered=TRUE))
p <- ggplot(plot_data, aes(EVTYPE, Value)) +
  geom_bar(aes(fill=Type), position = "dodge", stat="identity") + facet_grid(.~Type) +
  xlab("Event Type") + ylab("Total Crop/Property Damage value") +
  scale_fill_manual(values = c("green", "orange")) +
  theme(axis.text.x = element_text(angle=90))
p