Synopsis
This analysis is an exploratory study of the impact of storm events in the United States. It covers the records from 1993 to 2011 and uses only the data where PROPDMGEXP is K (thousand), M (million) or B (billion). Because records with other PROPDMGEXP values account for less than 1% of the data, excluding them should not materially affect the results. The figures illustrate the thought process behind the analysis rather than the results; the results are presented in the tables.
The description and justification of the data are given in the comments in the R code.
Data Processing: Download and Read.
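library(dplyr)      # select, group_by, summarize, arrange, filter, left_join
library(ggplot2)    # ggplot, qplot
library(lubridate)  # mdy_hms, year
url<-"https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destfolder<-"C:/Users/Ms. Kui/Desktop/Coursera/Course5/Project2"
dir.create(destfolder, showWarnings = TRUE)
setwd(destfolder)
destfile<-paste(destfolder,"/stormdata.csv", sep="")
## Download only if the file is not already present; mode = "wb" keeps the bzip2 archive intact on Windows.
if(!file.exists("stormdata.csv"))
download.file(url, destfile = destfile, method = "auto", mode = "wb")
## Read step assumed here: stdata is the name used by the rest of the analysis. read.csv detects the bzip2 compression while reading.
stdata<-read.csv(destfile, stringsAsFactors = TRUE)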
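Because the download is a bzip2-compressed CSV, it can be worth confirming that read.csv reads such a file directly. The following is a minimal sketch on a hypothetical toy file written to a temporary directory; the file name toy_storm.csv.bz2 and its contents are made up for illustration and are not part of the storm data.

```r
# Write a tiny, made-up data frame to a bzip2-compressed CSV in a temp folder.
tmp <- file.path(tempdir(), "toy_storm.csv.bz2")
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), INJURIES = c(3, 0))

con <- bzfile(tmp, "w")               # open a compressed connection for writing
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv opens the path in read mode, where the compression is detected
# automatically, so no explicit decompression step is needed.
toy_back <- read.csv(tmp, stringsAsFactors = FALSE)
print(toy_back)
```

The same call pattern is what the read step on the real file relies on, so a separate bunzip2 step should not be required.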
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
## Ignoring how often each event type occurs, select only the columns that pertain to population health, total them by event type, and sort by injuries and then fatalities.
ehh<-stdata %>%
select(EVTYPE, INJURIES, FATALITIES) %>%
group_by(EVTYPE) %>%
summarize(TOTAL_INJURIES = sum(INJURIES), TOTAL_FATALITIES = sum(FATALITIES)) %>%
arrange(desc(TOTAL_INJURIES), desc(TOTAL_FATALITIES))
# Explore the data with a scatter plot to see the combined effect of injuries and fatalities.
g<-ggplot(ehh, aes(TOTAL_INJURIES, TOTAL_FATALITIES))
g+geom_point(size = 4)+ labs(x = "Total Injuries", y = "Total Fatalities", title="Explore the Range of Damage to Population Health by Events")

# One event type clearly stands out from the others, with more than 75000 total injuries.
top_ehh<-filter(ehh, TOTAL_INJURIES >75000)
The event type that is the most harmful to population health is:
## # A tibble: 1 x 3
## EVTYPE TOTAL_INJURIES TOTAL_FATALITIES
## <fct> <dbl> <dbl>
## 1 TORNADO 91346 5633
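The 75000 cutoff above was read off the plot. As a hedged alternative, the top event type can be selected without a hard-coded threshold. This sketch uses a hypothetical toy data frame in place of stdata and assumes dplyr 1.0.0 or later, where slice_max() is available.

```r
library(dplyr)

# Made-up records standing in for stdata (illustration only).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  INJURIES   = c(10, 2, 5, 0, 1),
  FATALITIES = c(1, 0, 2, 3, 0)
)

# Same summary as in the analysis: totals per event type.
toy_summary <- toy %>%
  group_by(EVTYPE) %>%
  summarize(TOTAL_INJURIES = sum(INJURIES),
            TOTAL_FATALITIES = sum(FATALITIES))

# Pick the event type with the largest injury total instead of filtering
# on a fixed number.
top_toy <- slice_max(toy_summary, TOTAL_INJURIES, n = 1)
print(top_toy)
```

With these toy numbers the selected row is TORNADO, with 15 injuries and 3 fatalities.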
## To take into account how event types were recorded in different years, include the date variable, convert it to date-time format, and extract the year.
ehh_year<-stdata %>%
select(BGN_DATE, EVTYPE, INJURIES, FATALITIES) %>%
mutate(BGN_DATE = mdy_hms(BGN_DATE), year = year(BGN_DATE))
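To make the date handling concrete, here is a small sketch of the same two lubridate calls on a few sample strings. The strings are assumptions about how BGN_DATE is laid out (month/day/year followed by a time) and are not copied from the data.

```r
library(lubridate)

# Assumed examples of the BGN_DATE layout: month/day/year hour:minute:second.
bgn <- c("4/18/1950 0:00:00", "1/6/1993 0:00:00", "11/28/2011 0:00:00")

parsed <- mdy_hms(bgn)   # parse to date-time (UTC by default)
yrs    <- year(parsed)   # extract the calendar year as a number
print(yrs)
```

If the real column used a different layout, mdy_hms would return NA with a warning, so checking sum(is.na(...)) after parsing is a cheap safeguard.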
## Count how many distinct event types were recorded in each year.
ehh_record_by_year<-ehh_year %>% group_by(year) %>% summarize(unique_event = n_distinct(EVTYPE))
qplot(year, unique_event, data = ehh_record_by_year, main = "Explore the Range of Years When More Event Types Collected")

## It turns out that before 1993 only a few event types were recorded, so keep only the records from 1993 onward.
ehh_consider_year<-filter(ehh_year, year>=1993)
## Summarize the injuries and fatalities by event type
ehh_result <- ehh_consider_year %>%
group_by(EVTYPE) %>%
summarize(Total_INJ = sum(INJURIES), Total_FAT = sum(FATALITIES)) %>%
mutate(combination = Total_INJ*Total_FAT) %>%
arrange(desc(combination))
Result: The top ten events affecting population health are:
## # A tibble: 10 x 4
## EVTYPE Total_INJ Total_FAT combination
## <fct> <dbl> <dbl> <dbl>
## 1 TORNADO 23310 1621 37785510
## 2 EXCESSIVE HEAT 6525 1903 12417075
## 3 LIGHTNING 5230 816 4267680
## 4 FLOOD 6789 470 3190830
## 5 HEAT 2100 937 1967700
## 6 FLASH FLOOD 1777 978 1737906
## 7 TSTM WIND 3631 241 875071
## 8 HIGH WIND 1137 248 281976
## 9 WINTER STORM 1321 206 272126
## 10 THUNDERSTORM WIND 1488 133 197904
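The ranking above uses the product of total injuries and total fatalities. A short sketch of how that score behaves may help when reading the table; the event names and numbers below are hypothetical and chosen only to show the effect.

```r
# Made-up totals for three hypothetical event types.
toy <- data.frame(
  EVTYPE    = c("EVENT A", "EVENT B", "EVENT C"),
  Total_INJ = c(500, 40, 30),
  Total_FAT = c(0, 10, 20),
  stringsAsFactors = FALSE
)

# Same score as in the analysis: injuries multiplied by fatalities.
toy$combination <- toy$Total_INJ * toy$Total_FAT

# Sort from highest to lowest score.
toy_ranked <- toy[order(-toy$combination), ]
print(toy_ranked)
```

In this toy case EVENT A has by far the most injuries but ranks last, because a zero in either column makes the product zero. The product rewards event types that are high on both measures at once.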
Across the United States, which types of events have the greatest economic consequences?
## Select the data from 1993 onward, since those years cover more event types. Keep only the records with a valid PROPDMGEXP (K, M or B); judging by the record counts, records with any other PROPDMGEXP value make up only a small percentage.
gec <-stdata %>%
select(EVTYPE, PROPDMG, PROPDMGEXP, BGN_DATE) %>%
mutate(year = year(mdy_hms(BGN_DATE)))
## Convert PROPDMGEXP to character for filtering.
gec$PROPDMGEXP<-as.character(gec$PROPDMGEXP)
gec <-gec %>%
filter(year >=1993, PROPDMGEXP == "K" | PROPDMGEXP== "M"| PROPDMGEXP =="B")
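The share of records kept by this filter can be checked directly. The sketch below runs on a made-up vector of exponent codes, so its numbers say nothing about the real data; on the real data the same two lines would be applied to the PROPDMGEXP column.

```r
# Hypothetical exponent codes (illustration only, not from the storm data).
exp_codes <- c("K", "K", "M", "", "K", "B", "5", "K", "M", "K")

# Breakdown of all codes as proportions.
code_share <- prop.table(table(exp_codes))
print(code_share)

# Fraction of records that survive the K/M/B filter.
kept_share <- mean(exp_codes %in% c("K", "M", "B"))
print(kept_share)
```

Running this on the filtered years, and separately on the records with non-zero PROPDMG, would show how much damage information the K/M/B restriction leaves out.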
## Create a lookup data frame that maps each unit to its dollar multiplier.
conunit<-data.frame(c("B", "M", "K"), c(1000000000, 1000000, 1000))
colnames(conunit)<-c("unit", "convert_amount")
conunit$unit<-as.character(conunit$unit)
gec_value<-left_join(gec, conunit, by = c("PROPDMGEXP" = "unit"))
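The join above attaches a multiplier to each record. As an alternative sketch of the same lookup, a named numeric vector can be indexed by the unit code. The toy rows and the names unit_value, damage_usd and damage_million are introduced here for illustration only.

```r
# Multipliers keyed by unit code, matching the lookup data frame.
unit_value <- c(K = 1e3, M = 1e6, B = 1e9)

# Made-up damage records (illustration only).
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2),
  PROPDMGEXP = c("K", "M", "B"),
  stringsAsFactors = FALSE
)

# Index the named vector by the code to get each row's multiplier.
toy$damage_usd <- toy$PROPDMG * unname(unit_value[toy$PROPDMGEXP])

# Dividing dollars by 1e6 expresses the amount in millions of dollars.
toy$damage_million <- toy$damage_usd / 1e6
print(toy)
```

For these rows the amounts come out as 0.025, 2.5 and 1200 million dollars, which also shows that dividing by 1,000,000 yields millions, while billions would need a divisor of 1,000,000,000.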
## Calculate the property damage in dollars, then express it in millions of dollars (dividing by 1000000).
gec_value<-gec_value %>%
mutate(DMG_AMT_Million = PROPDMG * convert_amount/1000000)
gec_result<-gec_value %>%
group_by(EVTYPE) %>%
summarize(Total_DMG_Amount_Million = sum(DMG_AMT_Million))%>%
arrange(desc(Total_DMG_Amount_Million))
Result: The top ten event types with the greatest economic consequences (property damage in millions of US dollars) are:
## # A tibble: 10 x 2
## EVTYPE Total_DMG_Amount_Million
## <fct> <dbl>
## 1 FLOOD 144658.
## 2 HURRICANE/TYPHOON 69306.
## 3 STORM SURGE 43324.
## 4 TORNADO 26327.
## 5 FLASH FLOOD 16141.
## 6 HAIL 15727.
## 7 HURRICANE 11868.
## 8 TROPICAL STORM 7704.
## 9 WINTER STORM 6688.
## 10 HIGH WIND 5270.
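Both result tables list closely related labels as separate rows (for example TSTM WIND and THUNDERSTORM WIND, or HURRICANE and HURRICANE/TYPHOON). A possible refinement, sketched here under stated assumptions, is to normalize EVTYPE before grouping. The mapping in canonical and the helper clean_evtype are hypothetical; which labels to merge is a judgment call that this analysis does not make.

```r
# Hypothetical mapping from variant labels to one canonical label.
canonical <- c("TSTM WIND" = "THUNDERSTORM WIND",
               "HURRICANE" = "HURRICANE/TYPHOON")

# Hypothetical helper: trim spaces, upper-case, then apply the mapping.
clean_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))
  hit <- x %in% names(canonical)
  x[hit] <- unname(canonical[x[hit]])
  x
}

# Made-up records (illustration only).
toy <- data.frame(
  EVTYPE = c("TSTM WIND", "THUNDERSTORM WIND", " tstm wind",
             "HURRICANE", "HURRICANE/TYPHOON"),
  damage = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

toy$EVTYPE <- clean_evtype(toy$EVTYPE)
totals <- aggregate(damage ~ EVTYPE, data = toy, FUN = sum)
print(totals)
```

After cleaning, the five toy rows collapse into two event types, so their totals are no longer split across near-duplicate labels.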