Identify the weather events that most affect a country’s population health, as well as those that cause the most damage.
After downloading the storm data from the specified site, read in the data file and identify the weather events that have fatalities, property damage, or crop damage associated with them. Because the event descriptions are free-form text, they are likely to be inconsistent, duplicated, or misspelled. For that reason, group the weather events into more consistent categories before identifying the top 25 events by fatalities, property damage, and crop damage.
Retrieve the storm data file from the specified source, e.g. https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2. Save the file as “repdata_data_StormData.csv.bz2”. The program reads the bz2 (Bzip2 compressed) formatted file directly as the initial processing step.
# Source file obtained from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2
# bzfile() takes the archive path; its second argument is an open mode, not an output file name
storm_data <- read.csv(bzfile("repdata_data_StormData.csv.bz2")
, header = TRUE
, sep = ",")
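If the archive is not already present locally, the retrieval step can be scripted as well. The following is a minimal sketch, assuming network access and write permission in the working directory; the helper name `fetch_storm_data` is hypothetical and not part of the original analysis.

```r
# Hypothetical helper: download the archive only when it is missing locally.
fetch_storm_data <- function(url, dest) {
  if (!file.exists(dest)) {
    # mode = "wb" keeps the compressed file intact on Windows
    download.file(url, destfile = dest, mode = "wb")
  }
  dest
}

storm_url  <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
storm_file <- "repdata_data_StormData.csv.bz2"

# Uncomment to fetch the archive (if needed) and read it in one step.
# storm_data <- read.csv(bzfile(fetch_storm_data(storm_url, storm_file)))
```

Because the helper returns the local path, repeated runs skip the download once the file exists.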
A preliminary analysis of the file revealed over 150 unique weather events. Many were duplicates or near-duplicates of one another, and others were misspelled. To obtain more meaningful results, the weather events were evaluated and regrouped into more general categories. The function assignGroup takes a weather event as its input parameter and returns a consistent, standardized group name in all capital letters.
# group weather events into consistent categories
assignGroup <- function(p_type) {
if (!is.null(p_type)) {
p_type <- toupper(p_type)
# manually investigate and re-group items
if (p_type == "AVALANCE") {
p_type <- "AVALANCHE"
}
else if (p_type == "COASTAL FLOOD") {
p_type <- "COASTAL FLOODING"
}
else if (p_type == "COASTALSTORM") {
p_type <- "COASTAL STORM"
}
else if (p_type %in% c("COLD"
,"COLD AND SNOW"
,"COLD TEMPERATURE"
,"COLD WAVE"
,"COLD WEATHER"
,"COLD/WINDS")) {
p_type <- "COLD/WIND CHILL"
}
else if (p_type %in% c("DROUGHT/EXCESSIVE HEAT"
, "EXCESSIVE HEAT"
, "EXTREME HEAT"
, "RECORD/EXCESSIVE HEAT"
, "RECORD HEAT")
|| (regexpr("^UNSEASONABLY WARM", p_type) > 0)
) {
p_type <- "DROUGHT/EXCESSIVE HEAT"
}
else if (p_type %in% c("EXTENDED COLD"
, "EXTREME COLD/WIND CHILL"
, "LOW TEMPERATURE"
, "RECORD COLD"
, "UNSEASONABLY COLD")) {
p_type <- "EXTREME COLD"
}
else if (p_type == "FALLING SNOW/ICE") {
p_type <- "SNOW AND ICE"
}
else if (p_type %in% c("FLASH FLOOD"
, "FLASH FLOOD/FLOOD"
, "FLASH FLOODING"
, "FLASH FLOODING/FLOOD"
, "FLASH FLOODS"
, "FLOOD"
, "FLOOD & HEAVY RAIN"
, "FLOODING"
, "FLOOD/RIVER FLOOD"
, "MINOR FLOODING"
, "RAPIDLY RISING WATER"
, "URBAN/SML STREAM FLD"
, "URBAN AND SMALL STREAM FLOODIN")
|| (regexpr("^RIVER FLOOD", p_type) > 0)) {
p_type <- "FLOOD/FLASH FLOOD"
}
else if (p_type == "FOG AND COLD TEMPERATURES") {
p_type <- "FOG"
}
else if (regexpr("^FREEZING", p_type) > 0) {
p_type <- "FREEZE"
}
else if (p_type == "GLAZE") {
p_type <- "FROST"
}
else if (p_type == "GUSTY WIND") {
p_type <- "GUSTY WINDS"
}
else if (regexpr("^HEAT WAVE", p_type) > 0) {
p_type <- "HEAT"
}
else if (p_type == "HEAVY SNOW AND HIGH WINDS") {
p_type <- "HEAVY SNOW"
}
else if (p_type %in% c("HEAVY SURF AND WIND"
, "HEAVY SURF/HIGH SURF"
, "ROUGH SURF")) {
p_type <- "HEAVY SURF"
}
else if (p_type %in% c("HIGH SWELLS"
, "HIGH WATER"
, "HIGH WAVES")) {
p_type <- "HIGH SURF"
}
else if (regexpr("^HIGH WIND", p_type) > 0) {
p_type <- "HIGH WINDS"
}
else if (regexpr("^HURRICANE", p_type) > 0) {
p_type <- "HURRICANE/TYPHOON"
}
else if (regexpr("^(HYPOTNERMIA|HYPOTHERMIA|HYPTHERMIA)", p_type) > 0) {
p_type <- "HYPOTHERMIA/EXPOSURE"
}
else if (regexpr("^(ICE|ICY)", p_type) > 0) {
p_type <- "ICE"
}
else if (regexpr("^LANDSLIDE", p_type) > 0) {
p_type <- "LANDSLIDES"
}
else if (p_type %in% c("LIGHT SNOW"
, "HEAVY SNOW")) {
p_type <- "SNOW"
}
else if (p_type == "LIGHTNING.") {
p_type <- "LIGHTNING"
}
else if (p_type == "MARINE MISHAP") {
p_type <- "MARINE ACCIDENT"
}
else if (regexpr("^MUDSLIDE", p_type) > 0) {
p_type <- "MUDSLIDES"
}
else if (regexpr("^RAIN/", p_type) > 0) {
p_type <- "MIXED PRECIP"
}
else if (regexpr("^RIP CURRENT", p_type) > 0) {
p_type <- "RIP CURRENTS"
}
else if (p_type == "SNOW/ BITTER COLD") {
p_type <- "SNOW"
}
else if (regexpr("^STRONG WIND", p_type) > 0) {
p_type <- "STRONG WINDS"
}
else if (regexpr("^(THUNDERSTORM WIND|THUNDERTORM)", p_type) > 0) {
p_type <- "THUNDERSTORM WINDS"
}
else if (regexpr("^(TORNADO|TSTM WIND|MARINE TSTM WIND)", p_type) > 0) {
# note: TSTM WIND (thunderstorm wind) variants are grouped with tornadoes here
p_type <- "TORNADOS"
}
else if (regexpr("^TROPICAL STORM", p_type) > 0) {
p_type <- "TROPICAL STORMS"
}
else if (regexpr("^WATERSPOUT", p_type) > 0) {
p_type <- "WATERSPOUTS"
}
else if (regexpr("^WILD", p_type) > 0) {
p_type <- "WILDFIRES"
}
else if (regexpr("^WIND", p_type) > 0) {
p_type <- "WINDS"
}
else if (regexpr("^WINTER STORM", p_type) > 0) {
p_type <- "WINTER STORMS"
}
else if (regexpr("^(WINTER WEATHER|WINTRY MIX)", p_type) > 0) {
p_type <- "WINTER WEATHER/MIX"
}
}
return (p_type)
}
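The same regrouping idea can also be expressed as a lookup table of patterns instead of a long if/else chain. The sketch below is illustrative only: `group_rules` and `regroup_events` are hypothetical names, and the rules are an abbreviated subset that is not a faithful port of assignGroup (for example, the flood pattern here is broader than the exact matches used above).

```r
# Sketch of a table-driven regrouping (hypothetical helper, abbreviated rules).
# Each element name is the target group; each value is a regular expression
# matched against the upper-cased event type. The first matching rule wins.
group_rules <- c(
  "AVALANCHE"         = "^AVALANCE$",
  "FLOOD/FLASH FLOOD" = "^(FLASH FLOOD|FLOOD|RIVER FLOOD)",
  "HURRICANE/TYPHOON" = "^HURRICANE",
  "RIP CURRENTS"      = "^RIP CURRENT",
  "WILDFIRES"         = "^WILD"
)

regroup_events <- function(types, rules = group_rules) {
  types <- toupper(as.character(types))
  result <- types                      # unmatched types are returned upper-cased
  assigned <- rep(FALSE, length(types))
  for (group in names(rules)) {
    hit <- !assigned & grepl(rules[[group]], types)
    result[hit] <- group
    assigned <- assigned | hit
  }
  result
}

regroup_events(c("Avalance", "FLASH FLOODING", "Hurricane Opal", "HAIL"))
# returns "AVALANCHE" "FLOOD/FLASH FLOOD" "HURRICANE/TYPHOON" "HAIL"
```

Because the function works on whole vectors, it can be applied to an entire column at once rather than one row at a time.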
When calculating property damage (PROPDMG) and crop damage (CROPDMG) estimates, the accompanying unit columns must be taken into account. The PROPDMGEXP and CROPDMGEXP columns give the magnitude of the dollar amount: “K” for thousands, “M” for millions, and “B” for billions of dollars. Any other value is treated as noise, and the amount is used as recorded, without scaling. The function getDamageInDollars takes two input parameters, a dollar amount and its magnitude code, and returns the actual dollar amount.
getDamageInDollars <- function(p_amt, p_unit) {
p_unit <- toupper(p_unit)
# default: leave the amount unscaled when the unit is not K, M, or B
totalAmt <- p_amt
if (p_unit %in% c("K", "M", "B")) {
if (p_unit == "K") {
totalAmt <- p_amt * 1000
}
else if (p_unit == "M") {
totalAmt <- p_amt * 10^6
}
else if (p_unit == "B") {
totalAmt <- p_amt * 10^9
}
}
return (totalAmt)
}
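The same conversion can be written to operate on whole columns at once, which avoids calling the function once per row. This is a minimal sketch under the same rules as above (K, M, B scale the amount; anything else leaves it unchanged); the name `damage_in_dollars` is hypothetical.

```r
# Vectorized sketch: scale damage amounts by their magnitude code.
damage_in_dollars <- function(amt, unit) {
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  # Look up each unit by name; unrecognized or empty units yield NA
  scale <- unname(multipliers[toupper(as.character(unit))])
  # Treat unrecognized units as noise: leave the amount unscaled
  scale[is.na(scale)] <- 1
  amt * scale
}

damage_in_dollars(c(25, 2.5, 1, 10), c("K", "m", "B", ""))
# returns 25000, 2500000, 1e+09, 10
```

With a helper like this, a whole column could be computed in one call, for example `damage_in_dollars(damages$PROPDMG, damages$PROPDMGEXP)`.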
To identify the weather events that most affect the population, as well as those that cause the greatest amount of damage, create aggregate datasets based on the following columns: FATALITIES, PROPDMG, and CROPDMG.
After grouping the data, graph the top 25 weather events that result in the greatest number of fatalities and those that result in the greatest amount of property and crop damage.
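As a small illustration of the aggregate-then-rank pattern described above, the following sketch runs the same steps on a toy data frame. The values are made up for the example and are not taken from the storm data.

```r
# Toy data (hypothetical values) standing in for the storm records.
events <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD", "HEAT"),
  FATALITIES = c(3, 1, 2, 0, 2, 9)
)

# Sum fatalities per event type, drop zero totals, then sort descending.
totals <- aggregate(FATALITIES ~ EVTYPE, data = events, FUN = sum)
totals <- totals[totals$FATALITIES > 0, ]
totals <- totals[order(-totals$FATALITIES), ]

# Keep the top n rows (n = 25 in the analysis; 2 here for brevity).
top_events <- head(totals, 2)
top_events
```

Here HEAT (9) and TORNADO (5) are kept, FLOOD (3) falls below the cut, and HAIL is dropped because its total is zero.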
# -----------------------------------------------------------------------------
# aggregate by fatalities
# -----------------------------------------------------------------------------
storm_data.fatalities <- aggregate(storm_data$FATALITIES ~ storm_data$EVTYPE
, data = storm_data
, FUN = sum)
colnames(storm_data.fatalities) <- c("type", "fatalities")
# only display events where fatalities > 0
fatalities_above_zero <- subset(storm_data.fatalities, fatalities > 0)
fatalities_above_zero$new_group <- NULL
fatalities_above_zero$new_group <- toupper(as.character(fatalities_above_zero$type))
for (i in seq(from=1, to=nrow(fatalities_above_zero)))
fatalities_above_zero$new_group[i] <- assignGroup(as.character(fatalities_above_zero$type[i]))
# regroup data
fatalities_above_zero.regrouped <- aggregate(fatalities_above_zero$fatalities
~ fatalities_above_zero$new_group
, data = fatalities_above_zero
, FUN = sum)
colnames(fatalities_above_zero.regrouped) <- c("type", "fatalities")
t1 <- fatalities_above_zero.regrouped[order(-fatalities_above_zero.regrouped$fatalities),]
# retrieve top 25 events
t1 <- head(t1, 25)
# -----------------------------------------------------------------------------
# create a subset of storm_data data representing only those records whose
# property and crop damage amounts are greater than 0
# -----------------------------------------------------------------------------
damages <- storm_data[storm_data$PROPDMG+storm_data$CROPDMG > 0
, c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
# create columns to hold total property and crop damage amounts
damages$totalpropdmg <- NULL
damages$totalcropdmg <- NULL
# Note: this is a very time-consuming operation
for (i in seq(from=1, to=nrow(damages))) {
damages$totalpropdmg[i] <- getDamageInDollars(damages$PROPDMG[i], damages$PROPDMGEXP[i])
damages$totalcropdmg[i] <- getDamageInDollars(damages$CROPDMG[i], damages$CROPDMGEXP[i])
}
# create a subset of the property and crop damage data
# and only on those records whose value > 0
damages_prop <- damages[damages$totalpropdmg > 0,]
damages_crop <- damages[damages$totalcropdmg > 0,]
# -----------------------------------------------------------------------------
# aggregate by property damage
# -----------------------------------------------------------------------------
storm_data.propdmg <- aggregate(damages_prop$totalpropdmg ~ damages_prop$EVTYPE
, data = damages_prop
, FUN = sum)
colnames(storm_data.propdmg) <- c("type", "propdmg")
storm_data.propdmg$new_group <- NULL
storm_data.propdmg$new_group <- toupper(as.character(storm_data.propdmg$type))
for (i in seq(from=1, to=nrow(storm_data.propdmg)))
storm_data.propdmg$new_group[i] <- assignGroup(as.character(storm_data.propdmg$type[i]))
# regroup data
storm_data.propdmg.regrouped <- aggregate(storm_data.propdmg$propdmg
~ storm_data.propdmg$new_group
, data = storm_data.propdmg
, FUN = sum)
colnames(storm_data.propdmg.regrouped) <- c("type", "propdmg")
t2 <- storm_data.propdmg.regrouped[order(-storm_data.propdmg.regrouped$propdmg),]
# retrieve top 25 events
t2 <- head(t2, 25)
# set property damage value in millions of dollars
t2$propdmg <- t2$propdmg / 10^6
# -----------------------------------------------------------------------------
# aggregate by crop damage
# -----------------------------------------------------------------------------
storm_data.cropdmg <- aggregate(damages_crop$totalcropdmg ~ damages_crop$EVTYPE
, data = damages_crop
, FUN = sum)
colnames(storm_data.cropdmg) <- c("type", "cropdmg")
storm_data.cropdmg$new_group <- NULL
storm_data.cropdmg$new_group <- toupper(as.character(storm_data.cropdmg$type))
for (i in seq(from=1, to=nrow(storm_data.cropdmg)))
storm_data.cropdmg$new_group[i] <- assignGroup(as.character(storm_data.cropdmg$type[i]))
# regroup data
storm_data.cropdmg.regrouped <- aggregate(storm_data.cropdmg$cropdmg
~ storm_data.cropdmg$new_group
, data = storm_data.cropdmg
, FUN = sum)
colnames(storm_data.cropdmg.regrouped) <- c("type", "cropdmg")
t3 <- storm_data.cropdmg.regrouped[order(-storm_data.cropdmg.regrouped$cropdmg),]
# retrieve top 25 events
t3 <- head(t3, 25)
# set crop damage value in millions of dollars
t3$cropdmg <- t3$cropdmg / 10^6
##                      type fatalities
## 56               TORNADOS       6177
## 9  DROUGHT/EXCESSIVE HEAT       2060
## 17      FLOOD/FLASH FLOOD       1548
## 23                   HEAT       1118
## 36              LIGHTNING        817
## 43           RIP CURRENTS        577
## 15           EXTREME COLD        298
## 30             HIGH WINDS        293
## 1               AVALANCHE        225
## 63          WINTER STORMS        217
## 55     THUNDERSTORM WINDS        200
## 7         COLD/WIND CHILL        158
## 31      HURRICANE/TYPHOON        135
## 46                   SNOW        134
## 52           STRONG WINDS        111
## 29              HIGH SURF        109
## 3                BLIZZARD        101
## 34                    ICE        101
## 24             HEAVY RAIN         98
## 61              WILDFIRES         90
## 57        TROPICAL STORMS         66
## 18                    FOG         63
## 64     WINTER WEATHER/MIX         62
## 27             HEAVY SURF         57
## 35             LANDSLIDES         39
##                          type  propdmg
## 52          FLOOD/FLASH FLOOD 166975.6
## 119         HURRICANE/TYPHOON  84756.2
## 211                  TORNADOS  63077.3
## 190               STORM SURGE  43323.5
## 73                       HAIL  15732.3
## 230                 WILDFIRES   8491.6
## 214           TROPICAL STORMS   7714.4
## 232             WINTER STORMS   6749.0
## 118                HIGH WINDS   6003.4
## 201        THUNDERSTORM WINDS   5223.1
## 191          STORM SURGE/TIDE   4641.2
## 120                       ICE   3972.1
## 92  HEAVY RAIN/SEVERE WEATHER   2500.0
## 164       SEVERE THUNDERSTORM   1205.4
## 30                    DROUGHT   1046.1
## 170                      SNOW    950.0
## 132                 LIGHTNING    928.7
## 88                 HEAVY RAIN    694.2
## 11                   BLIZZARD    659.2
## 218                   TYPHOON    600.2
## 19           COASTAL FLOODING    392.5
## 125                LANDSLIDES    324.7
## 83                  HAILSTORM    241.0
## 192              STRONG WINDS    177.7
## 216                   TSUNAMI    144.1
##                       type  cropdmg
## 9                  DROUGHT 13972.57
## 18       FLOOD/FLASH FLOOD 12268.99
## 44       HURRICANE/TYPHOON  5515.29
## 46                     ICE  5027.11
## 28                    HAIL  3025.95
## 16            EXTREME COLD  1338.07
## 24            FROST/FREEZE  1094.19
## 68                TORNADOS  1036.17
## 38              HEAVY RAIN   733.40
## 69         TROPICAL STORMS   694.90
## 43              HIGH WINDS   686.30
## 64      THUNDERSTORM WINDS   605.69
## 10  DROUGHT/EXCESSIVE HEAT   497.42
## 22                  FREEZE   456.73
## 37                    HEAT   407.06
## 76               WILDFIRES   402.78
## 8          DAMAGING FREEZE   296.23
## 15       EXCESSIVE WETNESS   142.00
## 57                    SNOW   134.66
## 19        FLOOD/RAIN/WINDS   112.80
## 2                 BLIZZARD   112.06
## 60            STRONG WINDS    69.95
## 5  COLD AND WET CONDITIONS    66.00
## 23                   FROST    66.00
## 40             HEAVY RAINS    60.50
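The graphing step mentioned earlier is not shown in the code above. One possible approach is a horizontal bar chart in base R graphics; the sketch below is an assumption about how such a chart might be drawn, not the original plotting code. The helper name `plot_top_events` is hypothetical, and the sample rows are the top five fatality totals copied from the first table.

```r
# Top five fatality totals copied from the first table above.
top_fatal <- data.frame(
  type = c("TORNADOS", "DROUGHT/EXCESSIVE HEAT", "FLOOD/FLASH FLOOD",
           "HEAT", "LIGHTNING"),
  fatalities = c(6177, 2060, 1548, 1118, 817)
)

# Hypothetical helper: horizontal bar chart with the largest value on top.
plot_top_events <- function(df, value_col, main, xlab) {
  # Sort ascending so the largest bar is drawn last, i.e. at the top
  df <- df[order(df[[value_col]]), ]
  # Widen the left margin so long event names fit; restore settings on exit
  old_par <- par(mar = c(5, 12, 4, 2))
  on.exit(par(old_par))
  barplot(df[[value_col]], names.arg = df$type, horiz = TRUE,
          las = 1, cex.names = 0.7, main = main, xlab = xlab)
}

plot_top_events(top_fatal, "fatalities",
                main = "Fatalities by weather event",
                xlab = "Fatalities")
```

The same helper could be called with `t2` and `"propdmg"` or `t3` and `"cropdmg"`, with the axis label noting that those values are in millions of dollars.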