Based on historical data on major storms and weather events in the United States from the U.S. National Oceanic and Atmospheric Administration (NOAA), we find that Tornado has the greatest impact on the health of the American population, while Flood has the greatest economic consequences. In this study, we measure the impact on population health by the number of people injured in each event type. We measure economic consequences by the sum of the property and crop damage values, without adjustment to present value.
This part downloads the data from the link provided by Coursera and reads it into R. Finally, we convert the columns “BGN_DATE” (begin date of the event) and “END_DATE” (end date of the event) to the Date format.
fileurl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if(!file.exists("repdata-data-StormData.csv.bz2")){
download.file(fileurl,destfile = "./repdata-data-StormData.csv.bz2", method = "curl")
}
df <- read.csv(bzfile("./repdata-data-StormData.csv.bz2"), stringsAsFactors = FALSE)
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
df$BGN_DATE <- as.Date(df$BGN_DATE, "%m/%d/%Y")
df$END_DATE <- as.Date(df$END_DATE, "%m/%d/%Y")
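The date conversion above can be illustrated on a couple of sample strings. This is a minimal sketch that assumes the raw values use a month/day/year layout followed by a time component; the sample strings and the names `raw_dates` and `parsed` are hypothetical. `as.Date()` reads only as much of each string as the format needs, so the trailing time is ignored.

```r
# Hypothetical sample strings in the month/day/year layout assumed above.
raw_dates <- c("4/18/1950 0:00:00", "11/28/2011 0:00:00")

# Parse only the date part; characters after the matched format are ignored.
parsed <- as.Date(raw_dates, format = "%m/%d/%Y")

print(parsed)
print(class(parsed))
```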
The next step is to find the event type that is most harmful to population health. We measure harm to population health by the number of people injured in each event. This figure is available in the column “INJURIES”.
library(dplyr)
df1 <- df %>% group_by(EVTYPE) %>% summarize(sum(INJURIES))
df1 <- data.frame(df1)
df1_sort <- df1[order(df1$sum.INJURIES., decreasing = TRUE),]
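The group, sum, and sort steps above can be tried on a small table. This is a sketch under stated assumptions: the toy records are hypothetical, and the summary column is given an explicit name (`total_injuries`, not a name used in this study) so that it does not depend on the automatically generated `sum.INJURIES.` name.

```r
library(dplyr)

# Hypothetical toy records; only the column names mirror the dataset.
toy <- data.frame(
  EVTYPE   = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  INJURIES = c(10, 2, 5, 7, 1),
  stringsAsFactors = FALSE
)

# Total injuries per event type, largest total first.
toy_totals <- toy %>%
  group_by(EVTYPE) %>%
  summarize(total_injuries = sum(INJURIES)) %>%
  arrange(desc(total_injuries)) %>%
  as.data.frame()

print(toy_totals)
```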
The bar chart below shows that the event type that caused the most injuries is Tornado.
library(ggplot2)
df1_sort$EVTYPE <- factor(df1_sort$EVTYPE, levels = df1_sort[order(df1_sort$sum.INJURIES., decreasing = TRUE),]$EVTYPE)
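# The stat argument of qplot() is deprecated in current ggplot2, so the chart is built with ggplot() and geom_bar(stat = "identity").
ggplot(df1_sort[1:10,], aes(x = EVTYPE, y = sum.INJURIES.)) +
  geom_bar(stat = "identity") +
  labs(title = "Top 10 event types with the greatest effect on American population health", x = "Event Type", y = "Number of People Injured")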
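The factor step before the plot is what puts the bars in descending order, because ggplot2 draws a discrete axis in the order of the factor levels. A minimal sketch with hypothetical toy totals (the names `toy`, `total`, `alphabetical`, and `ordered_levels` are for illustration only):

```r
# Hypothetical toy totals; only the EVTYPE column name mirrors the dataset.
toy <- data.frame(
  EVTYPE = c("FLOOD", "TORNADO", "HEAT"),
  total  = c(3, 15, 7),
  stringsAsFactors = FALSE
)

# Without explicit levels, a factor sorts its levels alphabetically.
alphabetical <- levels(factor(toy$EVTYPE))

# Supplying the levels in descending order of the total fixes the bar order.
toy$EVTYPE <- factor(
  toy$EVTYPE,
  levels = toy$EVTYPE[order(toy$total, decreasing = TRUE)]
)
ordered_levels <- levels(toy$EVTYPE)

print(alphabetical)
print(ordered_levels)
```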
Next, we find the event type that caused the most economic damage. We measure “economic damage” from 2 main variables: property damage (columns “PROPDMG” and “PROPDMGEXP”) and crop damage (columns “CROPDMG” and “CROPDMGEXP”). However, given the scope of this research, we do not convert the values in the data into present value.
Because each kind of damage is stored in 2 separate columns (a value and an exponent code), we need to calculate the actual amount of each damage before we find the event type that caused the most damage.
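# Collect every exponent code that appears in either exponent column.
explist <- unique(c(unique(df$PROPDMGEXP),unique(df$CROPDMGEXP)))
# Multiplier for each code, listed in the same order as explist (the order in which the codes first appear in the data).
convlist <- c(10^3,10^6,1,10^9,10^6,1,1,10^5,10^6,1,10^4,10^2,10^3,10^2,10^7,10^2,1,1,10^8,10^3)
# Lookup table that pairs each code with its multiplier.
dfconvprop <- data.frame(explist,convlist)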
df2 <- merge(df,dfconvprop,by.x = "PROPDMGEXP", by.y = "explist", all.x = TRUE, all.y = FALSE)
names(df2)[38] <- "propconvlist"
df2 <- merge(df2,dfconvprop,by.x = "CROPDMGEXP", by.y = "explist", all.x = TRUE, all.y = FALSE)
names(df2)[39] <- "cropconvlist"
df2$ECONDMG <- (df2$PROPDMG * df2$propconvlist) + (df2$CROPDMG * df2$cropconvlist)
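The conversion above multiplies each damage value by the multiplier for its exponent code. As an illustration, the same arithmetic can be sketched with a named lookup vector, which does not depend on the order in which codes appear in the data. This is not the method used in this study: the toy records and the names `exp_multiplier` and `to_multiplier` are hypothetical, only the letter codes are covered, and every other code (including digits, which the conversion list above treats differently) defaults to 1 here.

```r
# Multipliers for the letter codes only (K = 10^3, M = 10^6, B = 10^9).
exp_multiplier <- c("K" = 1e3, "k" = 1e3, "M" = 1e6, "m" = 1e6, "B" = 1e9)

# Hypothetical toy records; only the column names mirror the dataset.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1),
  PROPDMGEXP = c("K", "M", ""),
  CROPDMG    = c(0, 10, 5),
  CROPDMGEXP = c("", "k", "B"),
  stringsAsFactors = FALSE
)

# Look up the multiplier for each code; codes missing from the table,
# including the blank code, fall back to 1 (an assumption of this sketch).
to_multiplier <- function(codes) {
  mult <- unname(exp_multiplier[codes])
  mult[is.na(mult)] <- 1
  mult
}

# Economic damage = property damage + crop damage, both in dollars.
toy$ECONDMG <- toy$PROPDMG * to_multiplier(toy$PROPDMGEXP) +
  toy$CROPDMG * to_multiplier(toy$CROPDMGEXP)

print(toy)
```

A named vector keeps each code next to its multiplier, so adding or reordering codes cannot silently shift the pairing.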
The bar chart below shows that the event type that caused the most economic damage is Flood.
df2a <- df2 %>% group_by(EVTYPE) %>% summarize(sum(ECONDMG))
df2a <- data.frame(df2a)
df2a_sort <- df2a[order(df2a$sum.ECONDMG., decreasing = TRUE),]
df2a_sort$EVTYPE <- factor(df2a_sort$EVTYPE, levels = df2a_sort[order(df2a_sort$sum.ECONDMG., decreasing = TRUE),]$EVTYPE)
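# The stat argument of qplot() is deprecated in current ggplot2, so the chart is built with ggplot() and geom_bar(stat = "identity").
ggplot(df2a_sort[1:10,], aes(x = EVTYPE, y = sum.ECONDMG.)) +
  geom_bar(stat = "identity") +
  labs(title = "Top 10 event types that caused the most economic damage in the US", x = "Event Type", y = "Amount of Economic Damage from both Properties and Crops (in USD)")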
Based on the analysis above, Tornado caused the most injuries in the US, with the total number of people injured shown below.
df1_sort$EVTYPE[1]
## [1] TORNADO
## 985 Levels: TORNADO TSTM WIND FLOOD EXCESSIVE HEAT LIGHTNING ... wet micoburst
format(df1_sort$sum.INJURIES.[1], big.mark = ",")
## [1] "91,346"
Flood caused the greatest economic damage in the US, with the total damage shown below.
df2a_sort$EVTYPE[1]
## [1] FLOOD
## 985 Levels: FLOOD HURRICANE/TYPHOON TORNADO STORM SURGE ... wet micoburst
format(df2a_sort$sum.ECONDMG.[1], big.mark = ",")
## [1] "150,319,678,257"