Synopsis

Extreme weather events affect public health and the economy of the United States. Severe events can cause injuries or, worse, fatalities, and they can also damage property and agricultural products.

Communities, municipalities, and governments can take preventive measures to reduce these impacts. Because it is not feasible to fund such measures at every level, a national prioritization is needed so that, with economies of scale, preventive actions target the event types with the highest impact.

The U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database records major weather events in the United States, including their public health and economic impact. This report presents the results of an exploratory analysis intended to suggest priorities for planning preventive activities and rapid responses to severe weather event types.

References

A. Storm Data

B. [National Weather Service Storm Data Documentation](https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf)

C. [Reproducible Research Peer Assignment 2 text](https://class.coursera.org/repdata-031/human_grading/view/courses/975144/assessments/4/submissions); Johns Hopkins Bloomberg School of Public Health

Data Processing

Getting the Data Package

The data should be available in a file named “repdata-data-StormData.csv”. If it is not, the compressed archive is downloaded from the course site and extracted.

# If the data file does not exist, extract it from the downloaded archive
if (!file.exists("repdata-data-StormData.csv")) {
  # If the archive does not exist either, download it from the course source
  if (!file.exists("repdata-data-StormData.csv.bz2")) {
    download.file(
      "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
      "repdata-data-StormData.csv.bz2",
      mode = "wb")  # binary mode keeps the archive intact on Windows
  }
  library(R.utils)  # provides bunzip2()
  bunzip2("repdata-data-StormData.csv.bz2", "repdata-data-StormData.csv", remove = FALSE)
}

Preparing the Workspace

The data are read into the variable stormDB for use in the analysis.

stormDB <- read.csv("repdata-data-StormData.csv")

Selection of data

The end of the “Data” section of Reference C notes that “… recent years should be considered more complete”, so only events that began on or after 1 January 2000 are used in this study. Restricting the period also makes events more comparable to each other, because it limits the differences in population and in the economy between decades.

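# Keep only the date part of BGN_DATE (drop everything after the first space)
extractDate <- function(x) sub(" .*$", "", x)
# as.Date gives a plain date column, which is safer in a data frame than POSIXlt
stormDB$BGN_DATE <- as.Date(extractDate(stormDB$BGN_DATE), format = "%m/%d/%Y")
thresholdDate <- as.Date("1/1/2000", format = "%m/%d/%Y")
stormDB <- stormDB[stormDB$BGN_DATE >= thresholdDate, ]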
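
As a quick sanity check on the date handling, the sketch below applies the same idea to a small made-up vector, using `as.Date` for parsing. The sample strings and the names `sampleDates`, `parsed`, `cutoff`, and `keep` are illustrative assumptions and are not taken from the storm database.

```r
# Hypothetical sample in the "m/d/Y H:M:S" form used by BGN_DATE
sampleDates <- c("12/31/1999 0:00:00", "1/1/2000 0:00:00", "4/18/2005 0:00:00")

# Drop the time part, then parse the remaining date
parsed <- as.Date(sub(" .*$", "", sampleDates), format = "%m/%d/%Y")
cutoff <- as.Date("2000-01-01")

# TRUE marks the rows that the filter would keep
keep <- parsed >= cutoff
print(data.frame(raw = sampleDates, parsed = parsed, keep = keep))
```

Only the last two sample dates pass the filter, which shows that the comparison includes 1 January 2000 itself.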

Exploring and Clearing Data

colnames(stormDB)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

Of these columns, the ones needed to answer the required questions are:

* EVTYPE
* FATALITIES
* INJURIES
* PROPDMG & PROPDMGEXP
* CROPDMG & CROPDMGEXP

stormDB <- stormDB[c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]

According to Section 2.1.1 of Reference B, the valid event types are as follows:

NOAAEVTYPE <- toupper(
  c("Astronomical Low Tide", "Avalanche", "Blizzard", 
    "Coastal Flood", "Cold/Wind Chill", "Debris Flow", 
    "Dense Fog", "Dense Smoke", "Drought", 
    "Dust Devil", "Dust Storm", "Excessive Heat", 
    "Extreme Cold/Wind Chill", "Flash Flood", "Flood", 
    "Frost/Freeze", "Funnel Cloud", "Freezing Fog", 
    "Hail", "Heat", "Heavy Rain", 
    "Heavy Snow", "High Surf", "High Wind", 
    "Hurricane (Typhoon)", "Ice Storm", "Lake-Effect Snow", 
    "Lakeshore Flood", "Lightning", "Marine Hail", 
    "Marine High Wind", "Marine Strong Wind", "Marine Thunderstorm Wind", 
    "Rip Current", "Seiche", "Sleet",
    "Storm Surge/Tide", "Strong Wind", "Thunderstorm Wind", 
    "Tornado", "Tropical Depression", "Tropical Storm", 
    "Tsunami", "Volcanic Ash", "Waterspout", 
    "Wildfire", "Winter Storm", "Winter Weather"))

Only the events in the storm data whose type matches one of these valid types are kept.

# Remove leading and trailing whitespace from the event type and
# convert it to upper case for standardisation
RemWhiteSpace <- function(x) gsub("^\\s+|\\s+$", "", x)
stormDB$EVTYPE <- toupper(RemWhiteSpace(stormDB$EVTYPE))
stormDB <- stormDB[stormDB$EVTYPE %in% NOAAEVTYPE, ]
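
Exact matching discards any record whose label is abbreviated or misspelled, so it can be worth checking how many rows are lost. The sketch below illustrates the effect on a made-up vector. The names `validTypes`, `rawTypes`, `cleaned`, and `matched` are assumptions for illustration, and `trimws` (base R 3.2.0 or later) stands in for the `RemWhiteSpace` helper.

```r
# Small subset of the valid types, for illustration only
validTypes <- toupper(c("Flood", "Thunderstorm Wind", "Tornado"))

# Hypothetical raw labels with stray whitespace, mixed case, and an abbreviation
rawTypes <- c("  Tornado", "flood ", "TSTM WIND", "Thunderstorm Wind")

# Same standardisation as above: trim, then upper-case
cleaned <- toupper(trimws(rawTypes))
matched <- cleaned %in% validTypes
print(data.frame(raw = rawTypes, cleaned = cleaned, matched = matched))

# Share of records that survive the exact-match filter
print(mean(matched))
```

In this sample the abbreviated label does not match any valid type and would be dropped, even though it plausibly refers to one of them.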

The impact columns must be converted to numeric format, which includes applying the exponent codes for property damage and crop damage according to Section 2.7 of Reference B.

stormDB$FATALITIES <- as.numeric(stormDB$FATALITIES)
stormDB$INJURIES <- as.numeric(stormDB$INJURIES)

# Property damage normalization: K = thousands, M = millions, B = billions
stormDB$PROPDMG <- as.numeric(stormDB$PROPDMG)
# Upper-case the codes so that "k", "m", and "b" are recognised as well
stormDB$PROPDMGEXP <- toupper(as.character(stormDB$PROPDMGEXP))
stormDB$PROPDMGEXP[grep("K", stormDB$PROPDMGEXP)] <- 1000
stormDB$PROPDMGEXP[grep("M", stormDB$PROPDMGEXP)] <- 1000000
stormDB$PROPDMGEXP[grep("B", stormDB$PROPDMGEXP)] <- 1000000000
# Any remaining non-numeric code (including blank) becomes NA, then a multiplier of 1
suppressWarnings(stormDB$PROPDMGEXP <- as.numeric(stormDB$PROPDMGEXP))
stormDB$PROPDMGEXP[is.na(stormDB$PROPDMGEXP)] <- 1
stormDB$PROPDMG <- stormDB$PROPDMG * stormDB$PROPDMGEXP

# Crop damage normalization, using the same rules
stormDB$CROPDMG <- as.numeric(stormDB$CROPDMG)
stormDB$CROPDMGEXP <- toupper(as.character(stormDB$CROPDMGEXP))
stormDB$CROPDMGEXP[grep("K", stormDB$CROPDMGEXP)] <- 1000
stormDB$CROPDMGEXP[grep("M", stormDB$CROPDMGEXP)] <- 1000000
stormDB$CROPDMGEXP[grep("B", stormDB$CROPDMGEXP)] <- 1000000000
suppressWarnings(stormDB$CROPDMGEXP <- as.numeric(stormDB$CROPDMGEXP))
stormDB$CROPDMGEXP[is.na(stormDB$CROPDMGEXP)] <- 1
stormDB$CROPDMG <- stormDB$CROPDMG * stormDB$CROPDMGEXP

# Remove the exponent columns, which are no longer needed
stormDB <- stormDB[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")]
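
The property and crop conversions repeat the same steps, so one possible refactoring is a small lookup helper. The sketch below is an alternative, not the code used in this report: the function name `expToMultiplier` is hypothetical, and it assumes that only K, M, and B (in either case) carry meaning and that every other code, including numeric ones, should count as a multiplier of 1.

```r
# Hypothetical helper: map exponent codes to numeric multipliers
expToMultiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  # Normalise the codes, then look each one up by name
  mult <- unname(lookup[toupper(trimws(as.character(code)))])
  # Assumption: unrecognised or blank codes leave the amount unchanged
  mult[is.na(mult)] <- 1
  mult
}

# Made-up amounts and codes for illustration
dmg <- c(25, 2.5, 1, 10)
codes <- c("K", "m", "B", "")
damageUSD <- dmg * expToMultiplier(codes)
print(damageUSD)
```

With such a helper, each damage column could be converted in one line, for example `stormDB$PROPDMG * expToMultiplier(stormDB$PROPDMGEXP)`, applied before the exponent columns are dropped.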

Prioritization of Events

To measure the impact on population health, fatalities and injuries are counted together; similarly, for economic damage, property damage and agricultural (crop) damage are summed.

stormDB$POPULATIONIMPACT <- stormDB$FATALITIES + stormDB$INJURIES
stormDB$ECONOMICIMPACT <- stormDB$PROPDMG + stormDB$CROPDMG

The impact of the different event types is compared by summing the impact of all events of each type, because a low-impact event type can be as harmful overall as a high-impact one if it occurs frequently.

population <- aggregate(POPULATIONIMPACT ~ EVTYPE, data = stormDB, FUN = sum)
population <- population[order(-population$POPULATIONIMPACT), ]

economic <- aggregate(ECONOMICIMPACT ~ EVTYPE, data = stormDB, FUN = sum)
economic <- economic[order(-economic$ECONOMICIMPACT), ]
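
The toy example below sketches why totals are used for the comparison: a frequent event type with a small impact per event can outrank a rare one with a large impact. The data frame `toy` and its values are made up for illustration.

```r
# Made-up data: four mild HEAT events and one severe TORNADO event
toy <- data.frame(
  EVTYPE = c("HEAT", "HEAT", "HEAT", "HEAT", "TORNADO"),
  POPULATIONIMPACT = c(3, 2, 4, 3, 10)
)

# Sum the impact per event type, then sort from highest to lowest total
totals <- aggregate(POPULATIONIMPACT ~ EVTYPE, data = toy, FUN = sum)
totals <- totals[order(-totals$POPULATIONIMPACT), ]
print(totals)
```

Here HEAT totals 12 and TORNADO totals 10, so HEAT ranks first even though no single HEAT event is as severe as the TORNADO event.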

With this many NOAA event types, drawing all results in a single figure for comparison is not practical.

For this reason, the analysis focuses on the top 5 event types, which should give a good indication of which types of events deserve the most attention.

populationTop5 <- population[1:5, ]
economicTop5 <- economic[1:5, ]
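
One way to check whether five event types are enough is to compute the share of the total impact they account for. The sketch below is only an illustration: the helper name `topShare` and the numbers in `exampleTotals` are assumptions, not results from the storm data.

```r
# Hypothetical helper: fraction of the overall total held by the n largest values
topShare <- function(totals, n = 5) {
  sorted <- sort(totals, decreasing = TRUE)
  sum(head(sorted, n)) / sum(sorted)
}

# Made-up per-type totals for illustration
exampleTotals <- c(500, 300, 100, 50, 30, 10, 5, 5)
share <- topShare(exampleTotals)
print(share)
```

Applied to the real totals, for example `topShare(population$POPULATIONIMPACT)`, a value close to 1 would suggest that little is lost by plotting only the top 5.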

Results

The top 5 NOAA event types affecting US population health and the US economy since the beginning of 2000 are presented as bar charts below:

Top 5 NOAA Event Types Impacting Health of US Population

par(mfrow = c(1, 1), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8)
barplot(populationTop5$POPULATIONIMPACT, las = 3, names.arg = populationTop5$EVTYPE, main = "Across the United States, top 5 events which are most harmful with respect to population health", ylab = "number of injuries or fatalities", col = "red")

Top 5 NOAA Event Types Impacting US Economy

par(mfrow = c(1, 1), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8)
barplot(economicTop5$ECONOMICIMPACT/1000000000, las = 3, names.arg = economicTop5$EVTYPE, main = "Across the United States, which types of events have the greatest economic consequences?", ylab = "Billions of USD", col = "green")