Brief

This R Markdown document answers the two questions below. The data can be downloaded from the URL given in the Data Processing section.

Synopsis

This analysis is an exploratory study of the impact of storm events in the United States. It focuses on the records from 1993 to 2011. For property damage, it uses only the data where PROPDMGEXP is K (thousand), M (million) or B (billion). Because the other values account for less than 1%, excluding them should not materially affect the result. The figures illustrate the thought process behind the analysis rather than the results, which are presented in the tables.

The description and justification of the data handling are given in the comments in the R code.

Data Processing: Download and Read.

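library(dplyr)      # select, group_by, summarize, arrange, filter, left_join
library(ggplot2)    # ggplot, qplot
library(lubridate)  # mdy_hms, year

url<-"https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

destfolder<-"C:/Users/Ms. Kui/Desktop/Coursera/Course5/Project2"
dir.create(destfolder, showWarnings = TRUE)
setwd(destfolder)
destfile<-paste(destfolder,"/stormdata.csv", sep="")
## The download is bzip2-compressed, so fetch it in binary mode.
if(!file.exists(destfile))
  download.file(url, destfile = destfile, method = "auto", mode = "wb")
## Read the data; read.csv decompresses the bzip2 file on the fly.
stdata<-read.csv(destfile)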
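
Because the URL points to a bzip2-compressed file that is saved under a plain .csv name, it may help to see that read.csv() can read such a file directly. The following is a minimal sketch, not the analysis code: the toy data frame and the temporary file are assumptions made purely for illustration.

```r
# Toy data standing in for the storm data (illustrative values only).
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), INJURIES = c(3, 1))

# Write the toy data as a bzip2-compressed CSV under a plain .csv name,
# mimicking how the download is saved.
tmp <- tempfile(fileext = ".csv")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path with file(), which detects the compression
# from the file contents rather than from the extension.
toy_back <- read.csv(tmp, stringsAsFactors = FALSE)
print(toy_back)
```

The same call pattern should apply to the downloaded file, although reading the full data set takes noticeably longer than this toy example.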

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

## Ignoring for now how often each event type was recorded, select only the columns that pertain to population health, total them by event type, and sort by injuries and then fatalities.
ehh<-stdata %>% 
  select(EVTYPE, INJURIES, FATALITIES) %>% 
  group_by(EVTYPE) %>% 
  summarize(TOTAL_INJURIES = sum(INJURIES), TOTAL_FATALITIES = sum(FATALITIES)) %>% 
  arrange(desc(TOTAL_INJURIES), desc(TOTAL_FATALITIES))
# Explore the combined effect of injuries and fatalities with a scatter plot.
g<-ggplot(ehh, aes(TOTAL_INJURIES, TOTAL_FATALITIES))
g+geom_point(size = 4)+ labs(x = "Total Injuries", y = "Total Fatalities", title="Explore the Range of Damage to Population Health by Events")

# One event type clearly stands out from the others, with more than 75000 injuries.
top_ehh<-filter(ehh, TOTAL_INJURIES >75000)
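
To make the grouping step above concrete, here is a small base R sketch of the same idea: total the injuries and fatalities per event type, sort, and keep the rows above a threshold. The toy values and the threshold of 10 are made up for illustration, and aggregate() is used in place of the dplyr pipeline only so that the sketch runs without extra packages.

```r
# Toy records (illustrative values, not taken from the storm data).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  INJURIES   = c(10, 2, 5, 0, 1),
  FATALITIES = c(1, 0, 2, 3, 0)
)

# Sum both columns within each event type.
totals <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = toy, FUN = sum)

# Sort by injuries, then fatalities, both descending.
totals <- totals[order(-totals$INJURIES, -totals$FATALITIES), ]
print(totals)

# Keep only the event types above an (arbitrary) injury threshold.
top <- totals[totals$INJURIES > 10, ]
print(top)
```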

The event type that is the most harmful to population health is:

## # A tibble: 1 x 3
##   EVTYPE  TOTAL_INJURIES TOTAL_FATALITIES
##   <fct>            <dbl>            <dbl>
## 1 TORNADO          91346             5633

## To account for which event types were recorded in different years, include the date variable, convert it to a date-time format, and extract the year.
ehh_year<-stdata %>%
  select(BGN_DATE, EVTYPE, INJURIES, FATALITIES) %>%
  mutate(BGN_DATE = mdy_hms(BGN_DATE), year = year(BGN_DATE))
## Count how many event types were recorded in each year.
ehh_record_by_year<-ehh_year %>% group_by(year) %>% summarize(unique_event = n_distinct(EVTYPE))
qplot(year, unique_event, data = ehh_record_by_year, main = "Explore the Range of Years When More Event Types Collected")

## It turns out that only a few event types were recorded before 1993, so keep only the records from 1993 onward.
ehh_consider_year<-filter(ehh_year, year>=1993)
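
The year extraction and the per-year count of event types can be sketched in base R as follows. This is an illustration under assumptions: the toy dates follow the month/day/year hour:minute:second layout implied by mdy_hms(), and the rows themselves are invented.

```r
# Toy records; the date layout is assumed to match what mdy_hms() expects.
toy <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "6/1/1993 0:00:00",
               "7/4/1993 0:00:00", "7/9/1993 0:00:00"),
  EVTYPE   = c("TORNADO", "FLOOD", "HAIL", "FLOOD"),
  stringsAsFactors = FALSE
)

# Parse the text into date-times, then pull out the year as an integer.
parsed   <- as.POSIXct(toy$BGN_DATE, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
toy$year <- as.integer(format(parsed, "%Y"))

# Number of distinct event types recorded in each year.
types_per_year <- tapply(toy$EVTYPE, toy$year, function(x) length(unique(x)))
print(types_per_year)

# Keep only the records from 1993 onward.
recent <- toy[toy$year >= 1993, ]
print(recent)
```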

## Summarize the injuries and fatalities by event type, then rank by the product of the two totals

ehh_result <- ehh_consider_year %>% 
  group_by(EVTYPE) %>% 
  summarize(Total_INJ = sum(INJURIES), Total_FAT = sum(FATALITIES)) %>% 
  mutate(combination = Total_INJ*Total_FAT) %>% 
  arrange(desc(combination))
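
The ranking above uses the product of total injuries and total fatalities. The sketch below, on invented totals for three hypothetical event types A, B and C, shows one property of that score: an event type with zero fatalities scores zero no matter how many injuries it caused. The sum-based score is a hypothetical alternative included only for comparison, not something the analysis uses.

```r
# Invented totals for three hypothetical event types.
totals <- data.frame(
  EVTYPE    = c("A", "B", "C"),
  Total_INJ = c(100, 10, 500),
  Total_FAT = c(10, 50, 0),
  stringsAsFactors = FALSE
)

# Product score, as in the analysis.
totals$combination <- totals$Total_INJ * totals$Total_FAT

# Hypothetical alternative for comparison: a simple sum.
totals$sum_score <- totals$Total_INJ + totals$Total_FAT

# Event types ordered by each score, highest first.
by_product <- totals$EVTYPE[order(-totals$combination)]
by_sum     <- totals$EVTYPE[order(-totals$sum_score)]
print(by_product)
print(by_sum)
```

With these toy numbers, C ranks last under the product but first under the sum, so the choice of score can change the ordering.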

Result: The top ten event types affecting population health are:

## # A tibble: 10 x 4
##    EVTYPE            Total_INJ Total_FAT combination
##    <fct>                 <dbl>     <dbl>       <dbl>
##  1 TORNADO               23310      1621    37785510
##  2 EXCESSIVE HEAT         6525      1903    12417075
##  3 LIGHTNING              5230       816     4267680
##  4 FLOOD                  6789       470     3190830
##  5 HEAT                   2100       937     1967700
##  6 FLASH FLOOD            1777       978     1737906
##  7 TSTM WIND              3631       241      875071
##  8 HIGH WIND              1137       248      281976
##  9 WINTER STORM           1321       206      272126
## 10 THUNDERSTORM WIND      1488       133      197904

Across the United States, which types of events have the greatest economic consequences?

## Select all data from 1993 onward, since those years record more event types. Keep only the records with a valid PROPDMGEXP (K, M or B); judging by the number of records, the others make up only a small percentage.
gec <-stdata %>% 
  select(EVTYPE, PROPDMG, PROPDMGEXP, BGN_DATE) %>% 
  mutate(year = year(mdy_hms(BGN_DATE)))
## Convert PROPDMGEXP into character for filtering.
gec$PROPDMGEXP<-as.character(gec$PROPDMGEXP)
gec <-gec %>%
    filter(year >=1993, PROPDMGEXP == "K" | PROPDMGEXP== "M"| PROPDMGEXP =="B")
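
One way to check how much data a filter like this drops is to compute the share of records whose exponent code is not K, M or B. The sketch below does this on a made-up vector of codes; the blank and "5" entries are assumed examples of other codes, and the resulting share says nothing about the real data.

```r
# Made-up exponent codes (illustrative only).
exp_codes <- c("K", "K", "M", "", "B", "K", "5", "K", "M", "K")

# TRUE where the code is one of the three accepted units.
valid <- exp_codes %in% c("K", "M", "B")

# Share of records the filter would drop.
share_excluded <- mean(!valid)
print(share_excluded)

# Frequency of every code, to see what is being dropped.
print(table(exp_codes))
```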
## Create a lookup data frame that maps each unit to its multiplier.
conunit<-data.frame(c("B", "M", "K"), c(1000000000, 1000000, 1000))
colnames(conunit)<-c("unit", "convert_amount")
conunit$unit<-as.character(conunit$unit)
gec_value<-left_join(gec, conunit, by = c("PROPDMGEXP" = "unit"))
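
The unit lookup can also be written with a named vector instead of a join. The following is a minimal base R sketch on invented damage figures; the names multiplier, dollars and millions are hypothetical and not part of the analysis.

```r
# Multiplier for each unit code.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Invented damage records (illustrative values only).
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1),
  PROPDMGEXP = c("K", "M", "B"),
  stringsAsFactors = FALSE
)

# Look up each row's multiplier by name and convert to dollars.
toy$dollars <- toy$PROPDMG * unname(multiplier[toy$PROPDMGEXP])

# Dividing by 1e6 expresses the amount in millions of dollars.
toy$millions <- toy$dollars / 1e6
print(toy)
```

The lookup relies on PROPDMGEXP being character; indexing with a factor would use the underlying integer codes instead of the names.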

## Convert the property damage to dollars, then express it in millions of dollars.
gec_value<-gec_value %>% 
  mutate(DMG_AMT_Million = PROPDMG * convert_amount/1000000)
gec_result<-gec_value %>% 
  group_by(EVTYPE) %>% 
  summarize(Total_DMG_Amount_Million = sum(DMG_AMT_Million))%>%
  arrange(desc(Total_DMG_Amount_Million))

Result: The top ten event types with the greatest economic consequences (property damage in millions of dollars) are:

## # A tibble: 10 x 2
##    EVTYPE            Total_DMG_Amount_Million
##    <fct>                                <dbl>
##  1 FLOOD                              144658.
##  2 HURRICANE/TYPHOON                   69306.
##  3 STORM SURGE                         43324.
##  4 TORNADO                             26327.
##  5 FLASH FLOOD                         16141.
##  6 HAIL                                15727.
##  7 HURRICANE                           11868.
##  8 TROPICAL STORM                       7704.
##  9 WINTER STORM                         6688.
## 10 HIGH WIND                            5270.

Summary:

The event type most damaging to population health is tornado. The event type with the greatest economic consequences is flood. However, the results may not be accurate because not all event types were recorded in every year. Further analysis is needed for verification.