Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Data Processing

Load the data using the following code.

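# download the archive only if it is not already in the working directory
if(!file.exists('StormData.csv.bz2')){download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2","StormData.csv.bz2")}
# read the data only if it has not already been loaded in this session
if(!exists('storm.data')){storm.data<-read.csv("StormData.csv.bz2",header = TRUE)}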

After checking the dataset, I noticed that the EVTYPE variable needs to be normalized to a consistent format. The following code does this processing.

# number of unique event types
length(unique(storm.data$EVTYPE))
# translate all letters to lowercase
event_types <- tolower(storm.data$EVTYPE)
# replace each blank or punctuation character with a space
event_types <- gsub("[[:blank:][:punct:]]", " ", event_types)
length(unique(event_types))
# update the data frame
storm.data$EVTYPE <- event_types
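
To illustrate what this normalization does, here is a minimal sketch on a small made-up vector of labels. The values in `raw_types` are invented for illustration and are not taken from the database; the two steps are the same as above, lowercasing and then replacing blank and punctuation characters with spaces.

```r
# Hypothetical event labels, invented for illustration only
raw_types <- c("TSTM WIND", "Tstm Wind", "tstm-wind", "HAIL", "Hail")

# Step 1: translate all letters to lowercase
clean_types <- tolower(raw_types)
# Step 2: replace each blank or punctuation character with a space
clean_types <- gsub("[[:blank:][:punct:]]", " ", clean_types)

length(unique(raw_types))    # distinct labels before cleaning
length(unique(clean_types))  # distinct labels after cleaning
```

Labels that differ only in case or punctuation collapse to a single type, which is why the number of unique values drops after cleaning.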

Analysis and Results

Severe Weather Impacts on Population Health

First, the fatalities (FATALITIES) and injuries (INJURIES) data have to be aggregated by severe weather types (EVTYPE).

storm.death <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = storm.data, sum, na.rm = T)
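
As a small illustration of how this formula-based `aggregate` call works, the sketch below applies it to a toy data frame. The data frame `toy` and its values are invented for illustration and do not come from the NOAA database.

```r
# Toy data, invented for illustration; not taken from the NOAA database
toy <- data.frame(
  EVTYPE     = c("tornado", "flood", "tornado", "heat"),
  FATALITIES = c(2, 1, 3, 4),
  INJURIES   = c(10, 0, 5, 7)
)

# Sum both outcome columns within each event type
toy_health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
toy_health
```

The result has one row per event type, with the two "tornado" rows combined into a single row of totals.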

Then, a scatter plot is created using aggregated fatalities and injuries.

plot(storm.death$FATALITIES,storm.death$INJURIES,log="xy", main="Severe Weather Event Impacts on Population Health", xlab = "FATALITIES", ylab = "INJURIES")
text(storm.death$FATALITIES, storm.death$INJURIES, labels=storm.death$EVTYPE, cex= 0.8,pos=1)

From the scatter plot, we can conclude that tornadoes are the most harmful event type with respect to population health across the U.S.
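
A ranked table can complement a visual reading of the plot. The sketch below is one possible way to do this, assuming a data frame shaped like `storm.death`. The numbers are made up, and the combined `total` column (fatalities plus injuries) is an assumption of this sketch, not a measure defined in the analysis above.

```r
# Toy aggregated data, invented for illustration only
toy_health <- data.frame(
  EVTYPE     = c("flood", "heat", "tornado"),
  FATALITIES = c(1, 4, 5),
  INJURIES   = c(0, 7, 15)
)

# Hypothetical combined measure: fatalities plus injuries
toy_health$total <- toy_health$FATALITIES + toy_health$INJURIES

# Sort from most to least harmful and show the top rows
ranked <- toy_health[order(toy_health$total, decreasing = TRUE), ]
head(ranked, 2)
```

Applied to the real aggregate, the first row of such a table would name the event type with the largest combined count.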

Severe Weather Impacts on Economy

PROPDMGEXP, PROPDMG, CROPDMGEXP, and CROPDMG are used to evaluate the economic impact. Because PROPDMGEXP and CROPDMGEXP contain non-numeric codes, we first need to transform them into numeric exponents.

exp_transform <- function(e) {
  # h -> hundred, k -> thousand, m -> million, b -> billion
  if (e %in% c('h', 'H'))
    return(2)
  else if (e %in% c('k', 'K'))
    return(3)
  else if (e %in% c('m', 'M'))
    return(6)
  else if (e %in% c('b', 'B'))
    return(9)
  else if (e %in% c('', '-', '?', '+')) # missing or unknown exponent
    return(0)
  else if (grepl("^[0-9]+$", e)) # digits are used directly as the exponent
    return(as.numeric(as.character(e)))
  else {
    stop("Invalid exponent value.")
  }
}

prop_dmg_exp <- sapply(storm.data$PROPDMGEXP, FUN=exp_transform)
storm.data$prop_dmg <- storm.data$PROPDMG * (10 ** prop_dmg_exp)
crop_dmg_exp <- sapply(storm.data$CROPDMGEXP, FUN=exp_transform)
storm.data$crop_dmg <- storm.data$CROPDMG * (10 ** crop_dmg_exp)
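
Because `sapply` calls `exp_transform` once per row, this step can be slow on a large table. One possible alternative is a vectorized lookup, sketched below on a made-up vector of codes. The names `exp_codes` and `exp_lookup` are hypothetical. Note one difference in behavior: this sketch treats every unrecognized code as 0 instead of stopping with an error.

```r
# Hypothetical sample of exponent codes, invented for illustration
exp_codes <- c("K", "m", "", "B", "5", "?", "h")

# Map each recognized letter to its power of ten
exp_lookup <- c(h = 2, k = 3, m = 6, b = 9)

codes_lower <- tolower(exp_codes)
# Look up letters; codes without a matching name become NA
exponent <- unname(exp_lookup[codes_lower])
# Digits are used directly as the exponent
is_digit <- grepl("^[0-9]+$", codes_lower)
exponent[is_digit] <- as.numeric(codes_lower[is_digit])
# Assumption of this sketch: anything else ('', '-', '?', '+', ...) counts as 0
exponent[is.na(exponent)] <- 0

exponent

# Example: a recorded value of 25 with code "K" is 25 * 10^3
25 * 10^exponent[1]
```

The resulting vector can be used in place of the `sapply` output when multiplying by the recorded damage values.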

After the transformation, property damage and crop damage need to be aggregated by severe weather type.

library(plyr)
econ_loss <- ddply(storm.data, .(EVTYPE), summarize,
                   prop_dmg = sum(prop_dmg),
                   crop_dmg = sum(crop_dmg))

# filter out events that caused no economic loss
econ_loss <- econ_loss[(econ_loss$prop_dmg > 0 | econ_loss$crop_dmg > 0), ]
prop_dmg_events <- head(econ_loss[order(econ_loss$prop_dmg, decreasing = T), ], 10)
crop_dmg_events <- head(econ_loss[order(econ_loss$crop_dmg, decreasing = T), ], 10)

prop_dmg_events[, c("EVTYPE", "prop_dmg")]

crop_dmg_events[, c("EVTYPE", "crop_dmg")]

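# keep the events that rank in the top 10 for either property or crop damage
# (cbind() has no 'by' argument, so select the rows from econ_loss instead)
storm.econ <- econ_loss[econ_loss$EVTYPE %in% union(prop_dmg_events$EVTYPE, crop_dmg_events$EVTYPE), ]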

Finally, a scatter plot is created using the aggregated property damage and crop damage.

plot(storm.econ$prop_dmg,storm.econ$crop_dmg,log="xy", main="Severe Weather Event Impacts on Economy",
     xlab = "Property Damage ($)", ylab = "Crop Damage ($)")
text(storm.econ$prop_dmg, storm.econ$crop_dmg, labels=storm.econ$EVTYPE, cex= 0.8,pos=1)

From the scatter plot, we can conclude that flash floods are the most harmful event type with respect to economic consequences across the U.S.