Synopsis:

  • Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
  • This is an analysis of extreme weather events from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
  • The analysis shows tornadoes to be the main cause of both economic damage (measured as property damage) and fatalities, followed by floods (including flash floods). Excessive heat and heat waves are another major cause of health-related fatalities.

Data preprocessing

We set the global chunk option echo=TRUE so that the source code of every chunk appears, alongside its output, in a single rendered Markdown file. Individual chunks can still hide their code by setting echo=FALSE.

knitr::opts_chunk$set(echo = TRUE)

Required R libraries in Preamble

library(knitr)
## Warning: package 'knitr' was built under R version 3.1.2
require(data.table)       # read the data
## Loading required package: data.table
require(plyr)             
## Loading required package: plyr
require(dplyr)            # summarize the data
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:plyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
## 
## The following objects are masked from 'package:data.table':
## 
##     between, last
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
require(ggplot2)
## Loading required package: ggplot2
options(scipen = 1)

Set the working directory to the location where the input file should be downloaded. We check whether the file repdata-data-StormData.csv.bz2 is in the working directory; if it has not already been downloaded, the following code downloads it.

if (!file.exists("repdata-data-StormData.csv.bz2")) {
    print("The data does not exist... downloading...")
    # Save under the same .bz2 name checked above; read.table() reads
    # bzip2-compressed files directly, so no separate bunzip2 step is needed
    download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", 
                  destfile = "./repdata-data-StormData.csv.bz2", mode = "wb")
}

#dir()

If the data has not already been read into memory, the following code reads it into a data table.

if (!exists('stormdata')) {
    stormdata <- data.table(read.table('repdata-data-StormData.csv.bz2', header=TRUE, sep=','))
}

We check the dimensions of the data and the variable names.

dim(stormdata)
## [1] 902297     37
names(stormdata)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

Subsetting the relevant data

The variables of interest are EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP.
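The code in this section does not actually drop any of the 37 columns. A minimal sketch of restricting the data to the variables listed above, assuming the data.table package loaded in the preamble; the helper `select_relevant` and the small example table are hypothetical.

```r
library(data.table)

# Variables of interest, as listed in the text above
cols_of_interest <- c("EVTYPE", "FATALITIES", "INJURIES",
                      "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Hypothetical helper: return a new data.table holding only `cols`;
# with = FALSE makes data.table treat `cols` as a vector of column names
select_relevant <- function(dt, cols = cols_of_interest) {
  dt[, cols, with = FALSE]
}

# Small made-up table standing in for stormdata
toy <- data.table(STATE = c("AL", "TX"),
                  EVTYPE = c("TORNADO", "HAIL"),
                  FATALITIES = c(1, 0), INJURIES = c(15, 0),
                  PROPDMG = c(25, 0), PROPDMGEXP = c("K", ""),
                  CROPDMG = c(0, 0), CROPDMGEXP = c("", ""),
                  REMARKS = c("", ""))
relevant <- select_relevant(toy)
names(relevant)
```

On the full data this would be `stormdata <- select_relevant(stormdata)`, which also reduces the memory used by the grouping steps that follow.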

head(stormdata$EVTYPE)
## [1] TORNADO TORNADO TORNADO TORNADO TORNADO TORNADO
## 985 Levels:    HIGH SURF ADVISORY  COASTAL FLOOD ... WND
length(unique(stormdata$EVTYPE))
## [1] 985

There are 985 unique values in the variable EVTYPE.

stormdata <- stormdata[,EVTYPE := tolower(EVTYPE)]

After the event types are converted to lower case, the number of unique values is

length(unique(stormdata$EVTYPE))
## [1] 898
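Lower-casing removes only differences in letter case; the factor levels printed earlier also include labels with leading spaces (for example "   HIGH SURF ADVISORY"). A minimal sketch of a slightly stronger normalization in base R, assuming that whitespace differences never distinguish genuinely different event types; `normalize_evtype` is a hypothetical helper, and it does not merge abbreviated and spelled-out forms of the same event.

```r
# Hypothetical helper: lower-case event labels, trim leading/trailing
# whitespace and collapse internal runs of whitespace to a single space
normalize_evtype <- function(x) {
  x <- tolower(as.character(x))
  x <- gsub("^[[:space:]]+|[[:space:]]+$", "", x)
  gsub("[[:space:]]+", " ", x)
}

examples <- c("TORNADO", "Tornado", "   HIGH SURF ADVISORY", "high  surf advisory")
normalized <- normalize_evtype(examples)
normalized
length(unique(normalized))  # 2 distinct labels remain
```

Applied to the full data, this would be `stormdata[, EVTYPE := normalize_evtype(EVTYPE)]`, after which the unique count can be checked again with `length(unique(stormdata$EVTYPE))`.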

We summarise the data by event type: the median and total fatalities, and the total property damage (PROPDMG).

summarizedData <- stormdata %>% group_by(EVTYPE) %>% 
    summarise(median_fatalities = median(FATALITIES), 
              total_fatalities = sum(FATALITIES),
              total_damages = sum(PROPDMG)) %>%
    as.data.table()   # keep the data.table syntax used below working

Exploratory Data Analysis

Which types of events are most deadly with respect to population health?

Analysis 1

summarizedData <- summarizedData[order(median_fatalities, decreasing = T)]
summarizedData <- summarizedData[,EVTYPE:=factor(EVTYPE, levels=EVTYPE)]

fatalities <- ggplot(summarizedData[1:10]) + geom_bar(aes(EVTYPE, median_fatalities), stat='identity') + 
    labs(x='Event Type', y='Median Fatalities per Event') + theme_bw() + theme(axis.text.x = element_text(angle=90)) 

plot(fatalities)

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  • Tornado and cold spell seem to have the highest median fatalities per event.
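A median computed over only one or two records is just that record's value, so rare event types can top a ranking by median. A minimal sketch of ranking by median fatalities only among event types with enough records, assuming data.table; the helper `rank_by_median` and the threshold `min_events` (30 by default) are hypothetical choices, not part of the original analysis.

```r
library(data.table)

# Hypothetical helper: per-event-type record count, median and mean
# fatalities, keeping only types with at least `min_events` records
rank_by_median <- function(dt, min_events = 30) {
  stats <- dt[, list(n_events = .N,
                     median_fatalities = median(as.numeric(FATALITIES)),
                     mean_fatalities = mean(as.numeric(FATALITIES))),
              by = EVTYPE]
  stats <- stats[n_events >= min_events]
  # Highest median first; the mean breaks ties (many medians are 0)
  stats[order(-median_fatalities, -mean_fatalities)]
}

# Made-up example: a single record with 14 fatalities would otherwise
# outrank four records of a more common event type
toy <- data.table(EVTYPE = c(rep("tornado", 4), "cold and snow"),
                  FATALITIES = c(0, 0, 1, 5, 14))
ranked <- rank_by_median(toy, min_events = 2)
ranked
```

The threshold trades coverage for stability; `ranked[, n_events]` shows how many records stand behind each median.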

Analysis 2

summarizedData <- summarizedData[order(total_fatalities, decreasing = T)]
summarizedData <- summarizedData[,EVTYPE:=factor(EVTYPE, levels=EVTYPE)]

TotalFatalities <- ggplot(summarizedData[1:10]) + geom_bar(aes(EVTYPE, total_fatalities), stat='identity') + 
    labs(x='Event Type', y='Total Fatalities') + theme_bw() + theme(axis.text.x = element_text(angle=90)) 

plot(TotalFatalities)

  • Total fatalities are dominated by tornadoes, followed by excessive heat and heat waves.
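The health question covers more than deaths, but the totals above count only fatalities; the INJURIES column listed among the variables of interest is never summarised. A minimal sketch of ranking event types by total injuries alongside total fatalities, assuming data.table; the helper `injury_totals` and the example values are hypothetical.

```r
library(data.table)

# Hypothetical helper: total fatalities and injuries per event type,
# returning the `top_n` types with the most injuries
injury_totals <- function(dt, top_n = 10) {
  totals <- dt[, list(total_fatalities = sum(FATALITIES),
                      total_injuries = sum(INJURIES)),
               by = EVTYPE]
  head(totals[order(-total_injuries)], top_n)
}

# Made-up example records
toy <- data.table(EVTYPE = c("tornado", "tornado", "excessive heat", "flood"),
                  FATALITIES = c(1, 2, 3, 0),
                  INJURIES = c(10, 5, 4, 7))
top_injuries <- injury_totals(toy, top_n = 2)
top_injuries
```

The result can be plotted with the same `ggplot()` pattern used above, with `total_injuries` on the y axis.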

Which types of events have the greatest economic consequences?

summarizedData <- summarizedData[order(total_damages, decreasing = T)]
summarizedData <- summarizedData[,EVTYPE:=factor(EVTYPE, levels=EVTYPE)]

TotalDamages <- ggplot(summarizedData[1:10]) + geom_bar(aes(EVTYPE, total_damages), stat='identity') + 
    labs(x='Event Type', y='Total Property Damage (PROPDMG, not scaled by PROPDMGEXP)') + theme_bw() + theme(axis.text.x = element_text(angle=90)) 

plot(TotalDamages)

Measured by property damage (PROPDMG), the greatest economic consequences come from tornadoes, which are common in the US, followed by flash floods, which are often an aftereffect of the storms that produce tornadoes or hurricanes.
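The totals above sum PROPDMG directly, but in the NOAA data PROPDMG and CROPDMG are recorded together with the magnitude codes PROPDMGEXP and CROPDMGEXP (K for thousands, M for millions, B for billions), so unscaled sums mix units and leave out crop damage. A minimal sketch of converting both to dollars before ranking, assuming data.table; the multiplier table, treating H as hundreds, treating any blank or unrecognised code as a multiplier of 1, and the helper names are all assumptions rather than part of the original analysis.

```r
library(data.table)

# Assumed multipliers: K, M and B as documented by NOAA; treating H as
# hundreds, and any other or blank code as 1, are assumptions
exp_multiplier <- function(code) {
  mult <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  out <- unname(mult[toupper(as.character(code))])
  out[is.na(out)] <- 1
  out
}

# Hypothetical helper: dollar-scaled property and crop damage per event
# type, ordered by their sum
damage_totals <- function(dt) {
  dt[, list(property_usd = sum(PROPDMG * exp_multiplier(PROPDMGEXP)),
            crop_usd = sum(CROPDMG * exp_multiplier(CROPDMGEXP))),
     by = EVTYPE][, total_usd := property_usd + crop_usd][order(-total_usd)]
}

# Made-up example records
toy <- data.table(EVTYPE = c("tornado", "tornado", "flood"),
                  PROPDMG = c(2.5, 1, 3), PROPDMGEXP = c("K", "m", "B"),
                  CROPDMG = c(0, 5, 2), CROPDMGEXP = c("", "K", "M"))
damage <- damage_totals(toy)
damage
```

Because the chart above uses unscaled PROPDMG, ranking by `total_usd` may change the order of event types.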

Conclusion

The analysis of historical storm database records from 1950 to 2011 suggests that tornadoes are the deadliest event type: they are a common cause of health-related fatalities and injuries, and they also cause severe economic damage, mainly to property. Flash floods and floods are another major factor in economic damage. The two are somewhat interlinked, since flash floods can follow strong, gusty storms coming in from the coast, such as hurricanes, which damage property and also harm health and cause fatalities.