Peer Assignment 2 for the Reproducible Research Course

Before processing the data, here is how the data were collected and structured:
Storm Events Database

Event Types Available:

  1. Tornado: From 1950 through 1954, only tornado events were recorded.
  2. Tornado, Thunderstorm Wind and Hail: From 1955 through 1992, only tornado, thunderstorm wind and hail events were keyed from the paper publications into digital data. From 1993 to 1995, only tornado, thunderstorm wind and hail events have been extracted from the Unformatted Text Files.
  3. All Event Types (48 from Directive 10-1605): From 1996 to present, 48 event types are recorded as defined in NWS Directive 10-1605.
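Because the set of recorded event types changed over time, it can help to tag each record with its recording era before comparing event types. A minimal sketch, assuming the year is taken from the parsed BGN_DATE (for example as.integer(format(data$BGN_DATE, "%Y"))); recording_era() is a hypothetical helper, not part of the original analysis:

```r
# Hypothetical helper: label each begin year with the recording era listed above.
# Years before 1950 fall outside every interval and come back as NA.
recording_era <- function(year) {
  cut(year,
      breaks = c(1949, 1954, 1995, Inf),  # intervals (1949,1954], (1954,1995], (1995,Inf)
      labels = c("Tornado only (1950-1954)",
                 "Tornado, thunderstorm wind and hail (1955-1995)",
                 "All 48 event types (1996-present)"))
}

recording_era(c(1950, 1954, 1955, 1995, 1996, 2011))
```

For example, table(recording_era(as.integer(format(data$BGN_DATE, "%Y")))) would count the records in each era.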

Detailed information about the fields (columns) is given in the column information documentation.

Data Processing

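  • Load the required packages, then create and set the working directory
library(dplyr)    # %>%, select(), mutate(), group_by(), summarize(), arrange()
library(tidyr)    # gather()
library(ggplot2)  # ggplot(), geom_bar(), facet_grid()
rm(list=ls())
folder <- 'C:/Personal/CourseRA/Reproducible-Research/PA2'
if(!dir.exists(folder)){
  dir.create(folder, recursive = TRUE)  # also creates missing parent folders
}
setwd(folder)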
  • Download raw data
file_url <- 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
if(!file.exists('PA2-data.csv.bz2')){
  download.file(file_url, 'PA2-data.csv.bz2', mode = 'wb')  # binary mode so the archive is not corrupted on Windows
}
  • Load data from bz2 file
data_handle <- bzfile('PA2-data.csv.bz2','r')
data <- read.csv(data_handle, stringsAsFactors = TRUE)  # factors are needed for the levels() relabelling below
close(data_handle)
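For reference, read.csv() can also be given the .bz2 file name directly, because R's file() detects bzip2 compression when opening a file for reading. The sketch below checks this on a tiny temporary file (its columns and values are made up), so it does not touch PA2-data.csv.bz2:

```r
# Write a tiny bzip2-compressed CSV to a temporary file, standing in for PA2-data.csv.bz2
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,2", "HAIL,0"), con)
close(con)

# Passing the file name: file() recognises the bzip2 data and decompresses it
direct <- read.csv(tmp)

# Passing an explicit bzfile() connection, as in the step above
con <- bzfile(tmp, "r")
via_connection <- read.csv(con)
close(con)

identical(direct, via_connection)
```

The explicit connection is still useful when you want control over opening and closing, but for a one-off read the file name alone is enough.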
  • Data processing and cleaning
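# Convert the "month/day/year hour:min:sec" strings to Date (the time of day is dropped)
data$BGN_DATE <- as.Date(as.character(data$BGN_DATE),"%m/%d/%Y %H:%M:%S")
data$END_DATE <- as.Date(as.character(data$END_DATE),"%m/%d/%Y %H:%M:%S")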
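To see what the format string does, here is a quick check on strings written in the same "month/day/year hour:minute:second" layout (the sample values are illustrative, not taken from the dataset). as.Date() parses the time part but keeps only the calendar date, and an empty string, such as a missing END_DATE, becomes NA:

```r
# Illustrative values in the BGN_DATE/END_DATE layout; the last one is missing
raw <- c("4/18/1950 0:00:00", "11/15/1951 0:00:00", "")

# %m and %d accept one- or two-digit values; the time of day is parsed, then dropped
parsed <- as.Date(raw, "%m/%d/%Y %H:%M:%S")
print(parsed)
```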

Results

Question 1: Most Harmful Events

Across the United States, which types of events are most harmful with respect to population health?

  • Harm to population health is recorded in two columns of the data:
  1. FATALITIES: the number of people killed;
  2. INJURIES: the number of people injured.

We summed the fatalities and injuries for each event type (EVTYPE) and ranked event types primarily by total fatalities, breaking ties by total injuries:

stats_evtype <- data %>% group_by(EVTYPE) %>% summarize(Fatal_rate=sum(FATALITIES),  Injury_rate=sum(INJURIES)) %>% arrange(desc(Fatal_rate), desc(Injury_rate))
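The pipeline above sums both columns per EVTYPE and sorts by fatalities, then injuries. The same logic can be sketched in base R on a toy data frame (the event counts below are invented for illustration), which also shows how ties on fatalities are broken by injuries:

```r
# Invented toy records; the real data has one row per storm event
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT", "FLOOD"),
  FATALITIES = c(3, 2, 1, 0, 2, 6),
  INJURIES   = c(10, 5, 7, 1, 0, 2),
  stringsAsFactors = FALSE
)

# Sum both columns per event type (the role of group_by() + summarize())
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Sort by total fatalities, then by total injuries, both descending
# (the role of arrange(desc(Fatal_rate), desc(Injury_rate)))
totals <- totals[order(-totals$FATALITIES, -totals$INJURIES), ]
print(totals)
```

HEAT and TORNADO tie at 4 fatalities each, so TORNADO (17 injuries) is ranked above HEAT (5 injuries).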

According to the stats_evtype summary, the most harmful event type is TORNADO, which caused 5633 deaths and 91346 injuries in total.

  • The top 10 most harmful event types in the U.S.:
plot_num <- 10 # Only the top 10 event types are plotted
plot_data <-  stats_evtype[1:plot_num,] %>% gather(Type, Value, -EVTYPE) %>% arrange(Type, desc(Value)) %>% mutate(EVTYPE=factor(EVTYPE, levels=unique(EVTYPE), ordered=TRUE))

p <- ggplot(plot_data, aes(EVTYPE, Value))+
  geom_bar(aes(fill=Type), position = "dodge", stat="identity")+facet_grid(.~Type)+
  xlab("Event Type") + ylab("Total Fatal/Injury Numbers")+
  scale_fill_manual(values = c("red", "blue"))+
  theme(axis.text.x = element_text(angle=90))
p
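Two steps in the plotting code deserve a note: gather() turns the two total columns into (Type, Value) rows, and factor(..., levels = unique(EVTYPE)) fixes the x-axis order to the order of first appearance after sorting, instead of ggplot2's default alphabetical order. A base R sketch of both steps on invented numbers (the event names are placeholders):

```r
# Invented totals, already ranked by Fatal_rate as stats_evtype is
ranked <- data.frame(EVTYPE      = c("B_EVENT", "A_EVENT", "C_EVENT"),
                     Fatal_rate  = c(30, 20, 10),
                     Injury_rate = c(5, 50, 1),
                     stringsAsFactors = FALSE)

# Wide to long: one row per (event type, measure), the role of gather(Type, Value, -EVTYPE)
long <- data.frame(EVTYPE = rep(ranked$EVTYPE, times = 2),
                   Type   = rep(c("Fatal_rate", "Injury_rate"), each = nrow(ranked)),
                   Value  = c(ranked$Fatal_rate, ranked$Injury_rate),
                   stringsAsFactors = FALSE)

# Sort by measure, then by value descending: the role of arrange(Type, desc(Value))
long <- long[order(long$Type, -long$Value), ]

# Levels in order of first appearance, so the x axis follows the fatality ranking
long$EVTYPE <- factor(long$EVTYPE, levels = unique(long$EVTYPE))
levels(long$EVTYPE)
```

Without the factor() call, ggplot2 would sort the character EVTYPE values alphabetically and put A_EVENT first.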

Question 2: Events with the Greatest Economic Consequences

Across the United States, which types of events have the greatest economic consequences?

Here we turn to the economic consequences of these events across the U.S.
As before, we summed the total property damage and crop damage per event type, and ranked event types primarily by property damage, breaking ties by crop damage.

According to the column information, only the letter codes "h"/"H" (hundreds), "k"/"K" (thousands), "m"/"M" (millions) and "B" (billions) in columns PROPDMGEXP and CROPDMGEXP can be interpreted. A numeric code x is treated as a multiplier of 10^x. The remaining codes (empty, +, -, ?) are ignored in this analysis and treated the same as 0, so the damage value is left unscaled.

Concretely, we replaced each level of the two exponent columns with its exponent, as follows:

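# Assumed PROPDMGEXP level order: the 4 symbol codes ("", "-", "?", "+"), "0" to "8", "B", "h", "H", "K", "m", "M"
levels(data$PROPDMGEXP) <- c(0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 2, 2, 3, 6, 6)
# Assumed CROPDMGEXP level order: "", "?", "0", "2", "B", "k", "K", "m", "M"
levels(data$CROPDMGEXP) <- c(0, 0, 0, 2, 9, 3, 3, 6, 6)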
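The positional relabelling above depends on the factor levels appearing in exactly the assumed order, which is not guaranteed: the sort order of symbols and of upper/lower-case letters can vary with the locale, and since R 4.0.0 read.csv() returns character columns unless stringsAsFactors = TRUE. A label-based lookup sidesteps both issues. The sketch below is an alternative, not the code used for the results in this report; exp_code and damage_usd() are hypothetical names:

```r
# Exponent for each code, following the rules described above (hypothetical table):
# digits x -> x, h/H -> 2, k/K -> 3, m/M -> 6, B -> 9
exp_code <- c(setNames(0:9, 0:9), h = 2, H = 2, k = 3, K = 3, m = 6, M = 6, B = 9)

# Hypothetical helper: scale damage values by 10^exponent, looked up by label.
# Codes not in the table ("", "+", "-", "?") get exponent 0, i.e. no scaling.
damage_usd <- function(value, code) {
  e <- unname(exp_code[as.character(code)])
  e[is.na(e)] <- 0
  value * 10^e
}

# Works the same whether the code column is a factor or a character vector
damage_usd(c(25, 2.5, 1.5, 3, 4), factor(c("K", "M", "B", "+", "5")))
```

Used this way, the relabelling step would be skipped and the pipeline would call, for example, mutate(Prop_tot = damage_usd(PROPDMG, PROPDMGEXP)) on the original exponent columns.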

We then built a new table with the scaled property and crop damage values:

data_new <- data %>%
  select(EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
  mutate(Prop_tot = PROPDMG * 10^as.integer(as.character(PROPDMGEXP)),  # scaled property damage
         Crop_tot = CROPDMG * 10^as.integer(as.character(CROPDMGEXP)))  # scaled crop damage

Then we summed the scaled damage values for each event type:

Eco_evtype <- data_new %>% group_by(EVTYPE) %>% summarize(PROP_rate=sum(Prop_tot), CROP_rate=sum(Crop_tot)) %>% arrange(desc(PROP_rate), desc(CROP_rate))

According to the Eco_evtype summary, the event type with the greatest economic consequences is FLOOD, which caused 1.4465771 × 10^11 dollars (about 144.7 billion) of property damage and 5.6619684 × 10^9 dollars (about 5.7 billion) of crop damage in total.

  • The top 10 event types with the greatest economic consequences in the U.S.:
plot_num <- 10 # Only the top 10 event types are plotted
plot_data <-  Eco_evtype[1:plot_num,] %>% gather(Type, Value, -EVTYPE) %>% arrange(desc(Type), desc(Value)) %>% mutate(EVTYPE=factor(EVTYPE, levels=unique(EVTYPE), ordered=TRUE))

p <- ggplot(plot_data, aes(EVTYPE, Value))+
  geom_bar(aes(fill=Type), position = "dodge", stat="identity")+facet_grid(.~Type)+
  xlab("Event Type") + ylab("Total Crop/Property Damage value")+
  scale_fill_manual(values = c("green", "orange"))+
  theme(axis.text.x = element_text(angle=90))
p