Gilbert T. Tarus
This project identifies the weather events most harmful to public health and the economy, based on the storm database collected by the U.S. National Oceanic and Atmospheric Administration (NOAA) from 1950 to 2011. To investigate these events, we use the recorded estimates of fatalities, injuries, property damage, and crop damage for each event type. From these data, we found that tornadoes and excessive heat are the most harmful to population health, while floods, hurricanes/typhoons, tornadoes, and storm surges have the greatest economic consequences.
The data for this project was obtained from the course project’s website. It comes in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The data can be downloaded using this link.
library(dplyr)
library(ggplot2)
library(R.utils)
The data has been downloaded into the local disk.
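# Reuse the decompressed CSV if it is already in the working directory
if(file.exists("stormdata.csv")){
  stormdata <- read.table("stormdata.csv",sep = ",",header = TRUE)
}else{
  # bunzip2() (from R.utils) decompresses the archive and returns the output path
  stormdata <- read.table(bunzip2("repdata_data_StormData.csv.bz2","stormdata.csv"),sep = ",",header = TRUE)
}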
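R's file readers can also open bzip2-compressed text directly, so the explicit decompression step is optional. The following is a minimal sketch on a throwaway file; the path and the toy columns are made up for illustration and are not part of the storm data.

```r
# Write a tiny CSV through a bzip2 connection (hypothetical file in tempdir()).
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), INJURIES = c(3, 1))
path <- file.path(tempdir(), "toy_storm.csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() detects the compression from the file contents,
# so no call to bunzip2() is needed before reading.
toy_back <- read.csv(path, stringsAsFactors = FALSE)
toy_back
```

Under the same assumption, the compressed storm file could be passed straight to read.csv(), at the cost of decompressing it on every run.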
dim(stormdata)
## [1] 902297 37
names(stormdata)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
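In order to assess the health and economic effects of weather events, we select the variables required for the analysis from the storm data:
EVTYPE: weather event type.
For economic consequences, we select:
PROPDMG and PROPDMGEXP: property damage and its magnitude code.
CROPDMG and CROPDMGEXP: crop damage and its magnitude code.
For health effects, we select:
FATALITIES: number of fatalities.
INJURIES: number of injuries.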
working_data <- stormdata %>%
select(EVTYPE,PROPDMGEXP,PROPDMG,CROPDMG,CROPDMGEXP,FATALITIES,INJURIES)
check <- NULL
for (i in 1:ncol(working_data)) {
check[i] <- mean(is.na(working_data[,i]))
}
check
## [1] 0 0 0 0 0 0 0
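The loop above computes the share of missing values in each column. A vectorised equivalent, sketched here on a small made-up data frame rather than on working_data, is colMeans(is.na(...)), which also keeps the column names in the result.

```r
# Toy data frame standing in for working_data (values are illustrative only).
toy <- data.frame(
  EVTYPE = c("HAIL", "FLOOD", NA, "TORNADO"),
  INJURIES = c(0, NA, 2, NA),
  FATALITIES = c(0, 1, 0, 0)
)

# is.na() returns a logical matrix; colMeans() averages each column,
# giving the fraction of missing entries per variable.
na_share <- colMeans(is.na(toy))
na_share
```

A result of all zeros, as in the report, means that no selected column contains missing values.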
1. Population health
Injuries
injuries <- working_data %>%
select(EVTYPE,INJURIES) %>%
group_by(EVTYPE) %>%
summarise(Total.injuries = sum(INJURIES,na.rm = TRUE)) %>%
arrange(desc(Total.injuries))
## `summarise()` ungrouping output (override with `.groups` argument)
head(injuries,10)
## Warning: `...` is not empty.
##
## We detected these problematic arguments:
## * `needs_dots`
##
## These dots only exist to allow future extensions and should be empty.
## Did you misspecify an argument?
## # A tibble: 10 x 2
## EVTYPE Total.injuries
## <fct> <dbl>
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
## 7 ICE STORM 1975
## 8 FLASH FLOOD 1777
## 9 THUNDERSTORM WIND 1488
## 10 HAIL 1361
Fatalities
fatalities <- working_data %>%
select(EVTYPE,FATALITIES) %>%
group_by(EVTYPE) %>%
summarise(Total.fatalities = sum(FATALITIES,na.rm = TRUE)) %>%
arrange(desc(Total.fatalities))
## `summarise()` ungrouping output (override with `.groups` argument)
head(fatalities,10)
## Warning: `...` is not empty.
##
## We detected these problematic arguments:
## * `needs_dots`
##
## These dots only exist to allow future extensions and should be empty.
## Did you misspecify an argument?
## # A tibble: 10 x 2
## EVTYPE Total.fatalities
## <fct> <dbl>
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
## 7 FLOOD 470
## 8 RIP CURRENT 368
## 9 HIGH WIND 248
## 10 AVALANCHE 224
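The two summaries above follow the same pattern, so both totals can be computed in one pass. The sketch below uses base R's aggregate() on invented numbers to stay free of package dependencies; a dplyr version would simply put both sums in a single summarise() call.

```r
# Toy records (invented values, not taken from the storm database).
toy <- data.frame(
  EVTYPE = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(2, 5, 1, 0, 3),
  INJURIES = c(10, 0, 4, 2, 1)
)

# Sum both outcome columns per event type in a single call.
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank by fatalities first, then by injuries, largest first.
health <- health[order(-health$FATALITIES, -health$INJURIES), ]
health
```

Keeping both totals in one table makes it easier to see events, such as tornadoes in the report, that rank highly on both measures.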
2. Economic consequences
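Property damage (PROPDMG) and crop damage (CROPDMG) are two features of the economic effects of weather events. The magnitude of each recorded amount is encoded in the PROPDMGEXP and CROPDMGEXP variables, which must be applied to obtain a value in USD. According to this link;
The possible values of CROPDMGEXP and PROPDMGEXP are:
H, h, K, k, M, m, B, +, -, ?, 0, 1, 2, 3, 4, 5, 6, 7, 8, and blank.
These levels can be interpreted as:
H, h: hundreds (100)
K, k: thousands (1,000)
M, m: millions (1,000,000)
B: billions (1,000,000,000)
+: 1
-, ?, blank: 0
0 to 8: 10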
These two variables have to be transformed into one unit (USD) using the information above.
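One way to express this conversion without depending on the order of factor levels is a lookup by code. The sketch below is an illustration, not the method used in this report: the function name exp_multiplier is hypothetical, and the multipliers are the ones that appear in the report's own code (blank, "-" and "?" as 0, "+" as 1, digits as 10, H/K/M/B as hundreds, thousands, millions, and billions).

```r
# Hypothetical helper: map an exponent code to its numeric multiplier.
exp_multiplier <- function(code) {
  code <- toupper(as.character(code))      # accept factors and lower case
  lookup <- c("+" = 1, H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  out <- unname(lookup[code])              # NA for codes not in the table
  out[grepl("^[0-9]$", code)] <- 10        # single digits
  out[is.na(out)] <- 0                     # blank, "-", "?" and anything else
  out
}

# Toy rows (invented values): 25 K, 2.5 m, 1 B, and a blank code.
toy <- data.frame(PROPDMG = c(25, 2.5, 1, 7),
                  PROPDMGEXP = c("K", "m", "B", ""))
toy$PropdmgV <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
toy
```

Because the lookup works on the code itself, it behaves the same whether the column was read as a factor or as character strings.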
# PROPDMGEXP: pair each factor level with its multiplier.
# The multipliers are listed in the order of levels(), so the column must be a factor.
df <- working_data
symbolsP <- levels(df$PROPDMGEXP)
PropdmgexpSV <- c(0,0,0,1,rep(10,9),10^9,100,100,1000,10^6,10^6)
value.data <- data.frame(PROPDMGEXP=symbolsP,PropdmgexpSV)
# Merge the property multipliers into df
df <- merge(df,value.data)
# CROPDMGEXP: same approach for crop damage
symbolsC <- levels(df$CROPDMGEXP)
CropdmgexpSV <- c(0,0,10,10,10^9,10^3,10^3,10^6,10^6)
value.data2 <- data.frame(CROPDMGEXP = symbolsC,CropdmgexpSV)
# Merge the crop multipliers into df
df <- merge(df,value.data2)
# Create a data frame with total value of economic damage
df <- df %>%
mutate(
PropdmgV = PROPDMG*PropdmgexpSV,
CropdmgV = CROPDMG*CropdmgexpSV,
Total.damage = PropdmgV+CropdmgV
)
head(df)
## CROPDMGEXP PROPDMGEXP EVTYPE PROPDMG CROPDMG FATALITIES INJURIES
## 1 TSTM WIND 0 0 0 0
## 2 TSTM WIND 0 0 0 0
## 3 TSTM WIND 0 0 0 0
## 4 TSTM WIND 0 0 0 0
## 5 TSTM WIND 0 0 0 0
## 6 HAIL 0 0 0 0
## PropdmgexpSV CropdmgexpSV PropdmgV CropdmgV Total.damage
## 1 0 0 0 0 0
## 2 0 0 0 0 0
## 3 0 0 0 0 0
## 4 0 0 0 0 0
## 5 0 0 0 0 0
## 6 0 0 0 0 0
Economic.DMG <- df %>%
group_by(EVTYPE) %>%
summarise(total.DMG = sum(Total.damage)) %>%
arrange(desc(total.DMG),EVTYPE)
## `summarise()` ungrouping output (override with `.groups` argument)
head(Economic.DMG)
## Warning: `...` is not empty.
##
## We detected these problematic arguments:
## * `needs_dots`
##
## These dots only exist to allow future extensions and should be empty.
## Did you misspecify an argument?
## # A tibble: 6 x 2
## EVTYPE total.DMG
## <fct> <dbl>
## 1 FLOOD 150319678250
## 2 HURRICANE/TYPHOON 71913712800
## 3 TORNADO 57352117607
## 4 STORM SURGE 43323541000
## 5 HAIL 18758224527
## 6 FLASH FLOOD 17562132111
Population Health
ggplot(fatalities[1:10,],aes(reorder(EVTYPE,-Total.fatalities),Total.fatalities))+
geom_bar(stat = "identity",fill = "red")+
coord_cartesian(ylim = c(0,6000))+
theme(axis.text.y = element_text(color = "red"),axis.text.x = element_text(angle = 90,hjust = 1,vjust = 0.5),panel.background = element_rect(fill = "wheat"),axis.line = element_line(arrow = arrow()),axis.text = element_text(color = "navyblue"))+
labs(x = "Event Type", y = "Total Fatalities",title = "Top 10 Events Causing Highest Fatalities")
ggplot(injuries[1:10,],aes(reorder(EVTYPE,-Total.injuries),Total.injuries))+
geom_bar(stat = "identity",fill = "navyblue")+
coord_cartesian(ylim = c(0,100000))+
theme(axis.text.y = element_text(color = "red"),axis.text.x = element_text(angle = 90,hjust = 1,vjust = 0.5),panel.background = element_rect(fill = "pink"),axis.line = element_line(arrow = arrow()),axis.text = element_text(color = "navyblue"))+
labs(x = "Event Type", y = "Total Injuries",title = "Top 10 Events Causing Highest Injuries")
Economic Consequences
ggplot(Economic.DMG[1:10,],aes(x = reorder(EVTYPE,-total.DMG), y = total.DMG/10^9))+
geom_bar(stat = "identity",fill = rgb(0.5,0,0.5))+
coord_cartesian(ylim = c(0,200))+
theme(axis.text.y = element_text(color = "red"),axis.text.x = element_text(angle = 90),panel.background = element_rect(fill = "wheat"),axis.line = element_line(arrow = arrow()),axis.text = element_text(color = "navyblue"),axis.title = element_text(color = "blue"))+
labs(x = "Event Type",y = "Total Economic Damage (billion USD)",title = "Top 10 Events With Highest Economic Impacts")
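The three plots above share the same structure, so the common parts could be wrapped in a small function. The following is a sketch under stated assumptions: plot_top_events and its arguments are hypothetical names, the styling is reduced to the rotated axis labels, and the data are invented.

```r
library(ggplot2)

# Hypothetical helper: bar chart of the n largest totals in a summary table
# that has an EVTYPE column and one numeric column named by `value`.
plot_top_events <- function(data, value, n = 10, title = NULL, ylab = value) {
  top <- head(data[order(-data[[value]]), ], n)   # n rows with largest totals
  ggplot(top, aes(x = reorder(EVTYPE, -.data[[value]]), y = .data[[value]])) +
    geom_col() +                                  # same as geom_bar(stat = "identity")
    theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
    labs(x = "Event Type", y = ylab, title = title)
}

# Toy summary table (invented values).
toy <- data.frame(EVTYPE = c("FLOOD", "HAIL", "TORNADO"), total = c(5, 1, 9))
p <- plot_top_events(toy, "total", n = 2, title = "Top 2 events (toy data)")
```

With the report's objects, a call such as plot_top_events(fatalities, "Total.fatalities") would be the analogous usage, assuming the column names shown in the summaries above.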