This document is authored in RMarkdown language. The codes used are in R programming language and the transformation from the RMarkdown to the current document format is via R-package knitr. You will need the necessary R programming setup to rerun the RMarkdown file.
This document is to fulfill the requirement for the Reproductible Research - Peer Assignment 2. It is a course offered by John Hopkins University Bloomberg School of Public Health on Coursera.
The following codes initialize the session. It loads the required library, set the number of events to be displayed and define a simple function to convert suffix into number.
## Initialization process ...
# loading libraries
require(dplyr)
require(knitr)
## basic configuration setting
nb_top_event = 20
## Simple function to convert the exponential code (K,M,B) to equivalent decimal.
## Note : This function return 1 to any other values as it is not defined in the
## NATIONAL WEATHER SERVICE INSTRUCTION. This is because the value returned by this function will be multipled by something else. 1 is multiplication neutral.
#
## Extract from the NATIONAL WEATHER SERVICE INSTRUCTION 10-1605 [pd01016005curr.pdf | pg 12/97]
# ----------------------------------------------------
# Estimates should be rounded to three significant digits, followed by an alphabetical
# character signifying the magnitude of the number, i.e., 1.55B for $1,550,000,000.
# Alphabetical characters used to signify magnitude include
# "K" for thousands, "M" for millions, and "B" for billions.
# If additional precision is available, it may be provided in the narrative part of the entry.
# ----------------------------------------------------
multipler <- function(val) {
if(val=="" ) { # to speedup the function since most values are empty
mcoef <- 1
} else if(val=="k" | val=="K") {
mcoef <- 1000
} else if (val=="m" | val =="M") {
mcoef <- 1000000
} else if (val=="b" | val =="B") {
mcoef <- 1000000000
} else {
mcoef <- 1
}
mcoef
}
We aim to understand and answer some basic questions about severe weather events.
Storm Data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database is used to understand the impact to each storm type to the human health together with the economic consequences.
This report load the full database but uses only the top 20 events to understand the harm inflicted to the population health and top 20 events to show the economic consequenses.
The top events for the harm caused are aggregated and sorted in descending order sequenced by total fatalities, total injuries, total fatal events and total injury events.
The top events for the economic damages are aggregated and sorted in descending order sequenced by total damages to property and crops, total event with property damages and total events with crop damages
We observe that tornado and excessive heat harm the population the most whereas flood and hurricane/typhoon cause the most economic damages to the country.
Tornado causes far more human loss compared to other events.
Economic lost due to property damages is much higher compared to crop damages.
We also show the average loss per event as additional information to be potentially considered in the decision making.
This analysis is made to address the following questions:
This report is intended for a government or municipal manager who might be responsible for preparing for severe weather events and will need to prioritize resources for different types of events. However, this document does not make any specific recommendations.
The datafile is loaded directly from the website. The bzip2 compressed file is read directly using read.csv. The dataset will need much more cleansing if more fields are to be used.
bzstormfile <- "stormdata.bz2" # just for file name consistency
if(!file.exists(bzstormfile)) { # no need to download everytime we run
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "bzstormfile")
}
#read.csv can read bzfiles directly. Hooray!
storm <- read.csv(bzstormfile)
dstorm <- tbl_df(storm)
warning("BGN_TIME and END_TIME format are not consistent across observation. Needs cleanup if using it")
## Warning: BGN_TIME and END_TIME format are not consistent across
## observation. Needs cleanup if using it
# if need to to convert ABCD format into AB:CD format, use the commands below
# xtime = gsub('^([0-9]{2})([0-9]+)$', '\\1:\\2',xtime)
# dstorm <- mutate(dstorm, start_date = mdy_hms(BGN_DATE), stop_date = mdy_hms(END_DATE), ....)
str(storm)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 35 levels ""," N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_DATE : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_TIME : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WFO : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ZONENAMES : Factor w/ 25112 levels ""," "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
The data within PROPDMGEXP and CROPDMGEXP are dirty as it has more than the (K,M,B) values as shown below. We will use the function multipler defined above to translate the exponential code into numeric value and use 1 for the invalid values
table(storm$PROPDMGEXP)
##
## - ? + 0 1 2 3 4 5
## 465934 1 8 5 216 25 13 4 4 28
## 6 7 8 B h H K m M
## 4 5 1 40 1 6 424665 7 11330
table(storm$CROPDMGEXP)
##
## ? 0 2 B k K m M
## 618413 7 19 1 9 21 281832 1 1994
Only the relevant fields are extracted from the full dataset.Damage Exponent values are transformed accordingly using the function multiplier defined above. The data is grouped by Event Type (EVTYPE)
# Selecting necessary columns only and
# insert real value of damages to properties and crop
# add more needed colum into storm_damage if more detailed analysis is needed
storm_damage <- dstorm %>%
select(STATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
mutate(PROP_MULTIPLER = sapply(PROPDMGEXP, multipler),
PROP_DAMAGE_VALUE=PROPDMG*PROP_MULTIPLER,
CROP_MULTIPLER = sapply(CROPDMGEXP, multipler),
CROP_DAMAGE_VALUE=CROPDMG*CROP_MULTIPLER
) %>%
group_by(EVTYPE)
storm_damage
## Source: local data frame [902,297 x 12]
## Groups: EVTYPE
##
## STATE EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 AL TORNADO 0 15 25.0 K 0
## 2 AL TORNADO 0 0 2.5 K 0
## 3 AL TORNADO 0 2 25.0 K 0
## 4 AL TORNADO 0 2 2.5 K 0
## 5 AL TORNADO 0 2 2.5 K 0
## 6 AL TORNADO 0 6 2.5 K 0
## 7 AL TORNADO 0 1 2.5 K 0
## 8 AL TORNADO 0 0 2.5 K 0
## 9 AL TORNADO 1 14 25.0 K 0
## 10 AL TORNADO 0 0 25.0 K 0
## .. ... ... ... ... ... ... ... ...
## Variables not shown: PROP_MULTIPLER (dbl), PROP_DAMAGE_VALUE (dbl),
## CROP_MULTIPLER (dbl), CROP_DAMAGE_VALUE (dbl)
From the reduced dataset storm_damage, we summarize the harm inflicted by each of the event type based on
Only the top 20 events are selected based on the order enumerated above.
harmful_event <- storm_damage %>%
summarise(total_fatalities = sum(FATALITIES),
total_fatal_event=sum(FATALITIES>0),
total_injuries = sum(INJURIES),
total_injury_event=sum(INJURIES>0),
average_fatalities_per_event = total_fatalities/total_fatal_event,
average_injuries_per_event = total_injuries/total_injury_event
) %>%
arrange(desc(total_fatalities),
desc(total_injuries),
desc(total_fatal_event),
desc(total_injury_event)
)
top_harmful_event <- harmful_event[1:nb_top_event,]
The fatalities is extremely high for TORNADO followed by EXCESSIVE HEAT, FLASH FLOOD and HEAT.
par(mar=c(4,14,8,4), cex=0.75, las=1)
x_max = max(top_harmful_event$total_fatalities);
barplot(top_harmful_event$total_fatalities,
horiz=TRUE,
names.arg = top_harmful_event$EVTYPE,
xlim = c(0,ceiling(x_max/10^floor(log10(x_max)))*10^floor(log10(x_max))),
ylim=c(0,nb_top_event),
main=paste0("Total Fatalities of the top ", nb_top_event, " Event Type"),
xlab ="Total Fatalities",
ylab = ""
)
TORNADO causes the most injuries. The other events causes much less injuries compared to TORNADO.
x_max = max(top_harmful_event$total_fatalities);
par(mar=c(4,14,8,4), cex=0.75, las=1)
x_max = max(top_harmful_event$total_injuries);
barplot(top_harmful_event$total_injuries,
horiz=TRUE,
names.arg = top_harmful_event$EVTYPE,
xlim = c(0, ceiling(x_max/10^floor(log10(x_max)))*10^floor(log10(x_max))),
ylim=c(0,nb_top_event),
main=paste0("Total Injuries of the top ", nb_top_event, " Event Type"),
xlab ="Total Injuries",
ylab = ""
)
The table below shows
Only the top 20 events are selected based on the order enumerated above.
# rearrange harmful_event
harmful_event <- harmful_event %>%
arrange(desc(average_fatalities_per_event),
desc(average_injuries_per_event))
top_harm_per_event <- harmful_event[1:nb_top_event, ] %>%
select(EVTYPE,
total_fatal_event,
total_fatalities,
average_fatalities_per_event,
total_injury_event,
total_injuries,
average_injuries_per_event)
top_harm_per_event$average_fatalities_per_event <- round(top_harm_per_event$average_fatalities_per_event, 3)
top_harm_per_event$average_injuries_per_event <- round(top_harm_per_event$average_injuries_per_event, 3)
top_harm_per_event$average_injuries_per_event[is.nan(top_harm_per_event$average_injuries_per_event)]<-"_Not Applicable_"
kable(top_harm_per_event)
| EVTYPE | total_fatal_event | total_fatalities | average_fatalities_per_event | total_injury_event | total_injuries | average_injuries_per_event |
|---|---|---|---|---|---|---|
| UNSEASONABLY WARM AND DRY | 1 | 29 | 29.000 | 0 | 0 | Not Applicable |
| TORNADOES, TSTM WIND, HAIL | 1 | 25 | 25.000 | 0 | 0 | Not Applicable |
| RECORD/EXCESSIVE HEAT | 1 | 17 | 17.000 | 0 | 0 | Not Applicable |
| TSUNAMI | 2 | 33 | 16.500 | 1 | 129 | 129 |
| COLD AND SNOW | 1 | 14 | 14.000 | 0 | 0 | Not Applicable |
| STORM SURGE/TIDE | 1 | 11 | 11.000 | 1 | 5 | 5 |
| EXTREME HEAT | 9 | 96 | 10.667 | 5 | 155 | 31 |
| WINTER STORMS | 1 | 10 | 10.000 | 1 | 17 | 17 |
| TROPICAL STORM GORDON | 1 | 8 | 8.000 | 1 | 43 | 43 |
| HEAT WAVE | 26 | 172 | 6.615 | 9 | 309 | 34.333 |
| UNSEASONABLY WARM | 2 | 11 | 5.500 | 4 | 17 | 4.25 |
| HEAT | 178 | 937 | 5.264 | 46 | 2100 | 45.652 |
| HEAT WAVES | 1 | 5 | 5.000 | 0 | 0 | Not Applicable |
| STORM SURGE | 3 | 13 | 4.333 | 9 | 38 | 4.222 |
| HEAT WAVE DROUGHT | 1 | 4 | 4.000 | 1 | 15 | 15 |
| Mudslide | 1 | 4 | 4.000 | 1 | 2 | 2 |
| HIGH WIND/SEAS | 1 | 4 | 4.000 | 0 | 0 | Not Applicable |
| TORNADO | 1602 | 5633 | 3.516 | 7704 | 91346 | 11.857 |
| MARINE MISHAP | 2 | 7 | 3.500 | 1 | 5 | 5 |
| HURRICANE/TYPHOON | 19 | 64 | 3.368 | 12 | 1275 | 106.25 |
From the reduced dataset storm_damage, we summarize the economic damages by each of the event type based on
Only the top 20 events are selected based on the order enumarated above.
economic_damage <- storm_damage %>%
summarise(total_prop_damage = sum(PROP_DAMAGE_VALUE),
total_prop_event=sum(PROPDMG>0),
total_crop_damage = sum(CROP_DAMAGE_VALUE),
total_crop_event=sum(CROPDMG>0),
average_prop_damage_per_event = total_prop_damage/total_prop_event,
average_crop_damage_per_event = total_crop_damage/total_crop_event
) %>%
arrange(desc(total_prop_damage+total_crop_damage),
desc(total_prop_event),
desc(total_crop_event)
)
top_economic_damage <- economic_damage[1:nb_top_event,]
## prepare to draw stacked barplot
bplot_data <- rbind(property_damage = top_economic_damage$total_prop_damage,
crop_damage = top_economic_damage$total_crop_damage)
The total economic loss due to FLOOD is highest followed by HURRICANE/TYPHOON and TORNADO.
par(mar=c(4,14,8,4), cex=0.75, las=1)
abs_max <- max(top_economic_damage$total_prop_damage+top_economic_damage$total_crop_damage)
x_scale <- 10^floor(log10(abs_max))
barplot(bplot_data,
horiz=TRUE,
names.arg = top_economic_damage$EVTYPE,
xlim = c(0, x_max = ceiling(abs_max/x_scale))*x_scale,
ylim=c(0,nb_top_event),
main=paste0("Total Damage of the top ", nb_top_event, " Event Type"),
xlab ="Total Prop Damage Value, USD",
ylab = "",
col=c("red","green")
)
legend("topright", legend = c("Property Damage", "Crop Damage"), fill=c("red","green"))
The table below shows the top 20 events based on the average property damages per event type.
# rearrange economic_damage
economic_damage <- economic_damage %>%
arrange(desc(average_prop_damage_per_event))
top_prop_damage_per_event <- economic_damage[1:nb_top_event, ] %>%
select(EVTYPE,
total_prop_event,
total_prop_damage,
average_prop_damage_per_event)
kable(top_prop_damage_per_event)
| EVTYPE | total_prop_event | total_prop_damage | average_prop_damage_per_event |
|---|---|---|---|
| HEAVY RAIN/SEVERE WEATHER | 1 | 2500000000 | 2500000000 |
| TORNADOES, TSTM WIND, HAIL | 1 | 1600000000 | 1600000000 |
| HURRICANE/TYPHOON | 69 | 69305840000 | 1004432464 |
| HURRICANE OPAL | 8 | 3172846000 | 396605750 |
| STORM SURGE | 173 | 43323536000 | 250425064 |
| SEVERE THUNDERSTORM | 6 | 1205360000 | 200893333 |
| WILD FIRES | 4 | 624100000 | 156025000 |
| HURRICANE OPAL/HIGH WINDS | 1 | 100000000 | 100000000 |
| STORM SURGE/TIDE | 47 | 4641188000 | 98748681 |
| HURRICANE | 121 | 11868319010 | 98085281 |
| HAILSTORM | 3 | 241000000 | 80333333 |
| TYPHOON | 9 | 600230000 | 66692222 |
| WINTER STORM HIGH WINDS | 1 | 60000000 | 60000000 |
| HURRICANE ERIN | 5 | 258100000 | 51620000 |
| RIVER FLOOD | 102 | 5118945500 | 50185740 |
| HURRICANE EMILY | 1 | 50000000 | 50000000 |
| MAJOR FLOOD | 3 | 105000000 | 35000000 |
| WILDFIRES | 3 | 100500000 | 33500000 |
| HIGH WINDS/COLD | 5 | 110500000 | 22100000 |
| River Flooding | 5 | 106155000 | 21231000 |
The table below shows the top 20 events based on the average crop damages per event type.
# rearrange economic_damage
economic_damage <- economic_damage %>%
arrange(desc(average_crop_damage_per_event))
top_crop_damage_per_event <- economic_damage[1:nb_top_event, ] %>%
select(EVTYPE,
total_crop_event,
total_crop_damage,
average_crop_damage_per_event)
kable(top_crop_damage_per_event)
| EVTYPE | total_crop_event | total_crop_damage | average_crop_damage_per_event |
|---|---|---|---|
| RIVER FLOOD | 20 | 5029459000 | 251472950 |
| EXCESSIVE HEAT | 2 | 492402000 | 246201000 |
| ICE STORM | 24 | 5022113500 | 209254729 |
| EXCESSIVE WETNESS | 1 | 142000000 | 142000000 |
| DAMAGING FREEZE | 3 | 262100000 | 87366667 |
| HURRICANE/TYPHOON | 33 | 2607872800 | 79026448 |
| COLD AND WET CONDITIONS | 1 | 66000000 | 66000000 |
| HURRICANE | 48 | 2741910000 | 57123125 |
| DROUGHT | 251 | 13972566000 | 55667594 |
| Early Frost | 1 | 42000000 | 42000000 |
| HEAT | 10 | 401461500 | 40146150 |
| HURRICANE ERIN | 4 | 136010000 | 34002500 |
| FREEZE | 15 | 446225000 | 29748333 |
| EXTREME COLD | 49 | 1292973000 | 26387204 |
| FLOOD/RAIN/WINDS | 5 | 112800000 | 22560000 |
| Extreme Cold | 1 | 20000000 | 20000000 |
| BLIZZARD | 6 | 112060000 | 18676667 |
| Damaging Freeze | 2 | 34130000 | 17065000 |
| SEVERE THUNDERSTORMS | 1 | 17000000 | 17000000 |
| HEAVY RAINS | 4 | 60500000 | 15125000 |