Synopsis

Storms and other severe weather events can affect both public health and the economy. Many such events result in fatalities, injuries, and property damage. The purpose of this analysis is to identify which types of events had the greatest impact on public health and on economic conditions. The analysis is based on the U.S. National Oceanic and Atmospheric Administration's (NOAA) Storm Database of severe weather events. Data is available from 1950 to November 2011.

Data Processing

The storm data is available as a bzip2-compressed CSV file: Storm Data. Documentation of the database is available in the Storm Data Documentation and FAQ.

Data Loading

## set working directory
setwd("C:/Swapnil/Docs/Data Science/reproducible-research/PeerAssignment2/")

## load required packages
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.1.3
library("plyr")
## Warning: package 'plyr' was built under R version 3.1.3
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 3.1.3
library(reshape2)
## Warning: package 'reshape2' was built under R version 3.1.3
## download the data and load into system
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if(!file.exists("stormdata.bz2")){
     download.file(url, destfile = "stormdata.bz2", quiet = TRUE)     
}
storm <- read.csv(bzfile("stormdata.bz2"))
dim(storm)
## [1] 902297     37

The source contains a total of 902297 observations.

colnames(storm)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
head(storm, 3)
##   STATE__          BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1 4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1 4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1 2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3

The dataset contains a lot of information, and many of its fields are not needed for this analysis, so we extract only the required fields.

We are interested in fatalities, injuries, property damage, and crop damage, so we keep only the records where at least one of these values is greater than zero.

fields <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", 
           "CROPDMGEXP")
storm <- storm[,fields]
req_records <- (storm$FATALITIES>0 | storm$INJURIES>0 | storm$PROPDMG>0 | storm$CROPDMG>0)
storm <- storm[req_records,]
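
As an alternative to subsetting after the load, the unused columns could be skipped while the file is read. The sketch below is only an illustration on a tiny inline CSV, not the approach used in this report: `csv_text`, `keep`, and `small` are made-up names, and it relies on `read.csv` dropping any column whose `colClasses` entry is `"NULL"`.

```r
# Tiny inline CSV standing in for the real file (illustrative data only).
csv_text <- "EVTYPE,STATE,FATALITIES,REMARKS\nTORNADO,AL,0,none\nFLOOD,TX,2,none\n"

# Read just the header row to learn the column names.
header <- names(read.csv(text = csv_text, nrows = 1))

# Columns we want to keep; every other column is marked "NULL" and skipped.
keep <- c("EVTYPE", "FATALITIES")
classes <- ifelse(header %in% keep, NA, "NULL")  # NA lets R guess the type

small <- read.csv(text = csv_text, colClasses = classes)
names(small)
```

With the real file, the same `colClasses` vector could be passed to `read.csv(bzfile("stormdata.bz2"), ...)`, which may reduce memory use on a table of this size.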

Data Cleaning

Event Type Transformations

The EVTYPE variable contains the event names. These were entered manually, which makes the events very difficult to categorize. The reduced subset of the data contains around 488 unique event names. We will try to categorize them by looking for common words and abbreviations. The rules below are applied in order, so when a name matches more than one pattern, the last matching rule determines its category.

storm$SourceType <- NA
storm$EVTYPE <- tolower(storm$EVTYPE)
storm[grepl("precipitation|rain|hail|drizzle|wet|percip|burst|depression|fog|wall cloud|mixed precip", 
                     storm$EVTYPE), "SourceType"] <- "Precipitation & Fog"

storm[grepl("wind|storm|wnd|hurricane|typhoon", 
                     storm$EVTYPE), "SourceType"] <- "Wind & Storm"

storm[grepl("slide|erosion|slump", 
                     storm$EVTYPE), "SourceType"] <- "Landslide & Erosion"

storm[grepl("warmth|warm|heat|dry|hot|drought|thermia|temperature record|record temperature|record high",storm$EVTYPE), "SourceType"] <- "Heat & Drought"

storm[grepl("cold|cool|ice|icy|frost|freeze|snow|winter|wintry|wintery|blizzard|chill|freezing|avalanche|glaze|sleet|avalance",storm$EVTYPE), "SourceType"] <- "Snow & Ice"

storm[grepl("flood|surf|blow-out|swells|fld|dam break|heavy shower", 
                     storm$EVTYPE), "SourceType"] <- "Flooding & High Surf"

storm[grepl("seas|high water|tide|tsunami|wave|current|marine|drowning|rapidly rising water|coastal surge|high", 
                     storm$EVTYPE), "SourceType"] <- "High seas"

storm[grepl("dust|saharan", 
                     storm$EVTYPE), "SourceType"] <- "Dust & Saharan winds"  

storm[grepl("tstm|thunderstorm|lightning", 
                     storm$EVTYPE), "SourceType"] <- "Thunderstorm & Lightning"

storm[grepl("tornado|spout|funnel|whirlwind", 
                     storm$EVTYPE), "SourceType"] <- "Tornado"

storm[grepl("fire|smoke|volcanic", 
                     storm$EVTYPE), "SourceType"] <- "Fire & Volcanic activity"


storm[grepl("torndao", storm$EVTYPE), "SourceType"] <- "Tornado"
storm[grepl("ligntning|lighting", storm$EVTYPE), "SourceType"] <- "Thunderstorm & Lightning"
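
Because the order of the rules decides the category of names that match several patterns, it can help to keep the rules in one table and apply them in a loop. The following is a minimal sketch on a few made-up event names; the `rules` table holds only a handful of the patterns above, and `classify_events` is a hypothetical helper, not part of the original analysis.

```r
# Hypothetical rule table: each pattern maps to a category, applied top to bottom.
rules <- data.frame(
  pattern  = c("wind|storm", "flood|fld",
               "tstm|thunderstorm|lightning", "tornado|spout|funnel"),
  category = c("Wind & Storm", "Flooding & High Surf",
               "Thunderstorm & Lightning", "Tornado"),
  stringsAsFactors = FALSE
)

classify_events <- function(evtype, rules) {
  evtype <- tolower(evtype)
  result <- rep(NA_character_, length(evtype))
  for (i in seq_len(nrow(rules))) {
    hit <- grepl(rules$pattern[i], evtype)
    result[hit] <- rules$category[i]  # a later rule overwrites an earlier one
  }
  result
}

events <- c("TSTM WIND", "Flash Flood", "TORNADO", "Other")
classify_events(events, rules)
```

In this sketch "TSTM WIND" first matches the wind rule and is then reassigned by the thunderstorm rule, while a name that matches nothing stays `NA`.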

Property and Crop damages calculation

Correct property and crop damage values are needed for the rest of the analysis. Symbols in the PROPDMGEXP and CROPDMGEXP columns are treated as powers of 10 applied to the matching PROPDMG or CROPDMG value; symbols with no clear meaning (-, +, ?) and empty values are set to NA. We obtain the final values after cleaning these exponent columns.

Based on the summary values, the flood observation with 115 billion dollars of property damage looks like an outlier, so we remove it from the analysis.

## find property damage exponents and assign the proper values
unique(storm$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 4 h 2 7 3 H -
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
storm$PROPDMGEXP <- revalue(tolower(storm$PROPDMGEXP), c("-"=NA, "+"=NA, "b"=9, "k"=3, "m"=6, "h"=2))
storm[which(storm$PROPDMGEXP==""),]$PROPDMGEXP <- NA

## find crop damage exponents and assign the proper values
unique(storm$CROPDMGEXP)
## [1]   M K m B ? 0 k
## Levels:  ? 0 2 B k K m M
storm$CROPDMGEXP <- revalue(tolower(storm$CROPDMGEXP), c("b"=9, "k"=3, "m"=6, "?"=NA))
storm[which(storm$CROPDMGEXP==""),]$CROPDMGEXP <- NA

storm$PROPDMG_Clean <- storm$PROPDMG * (10^as.numeric(storm$PROPDMGEXP))
storm$CROPDMG_Clean <- storm$CROPDMG * (10^as.numeric(storm$CROPDMGEXP))

## summary of damages
summary(storm$PROPDMG_Clean)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.      NA's 
## 0.000e+00 2.500e+03 1.000e+04 1.762e+06 4.200e+04 1.150e+11     11591
summary(storm$CROPDMG_Clean)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.      NA's 
## 0.000e+00 0.000e+00 0.000e+00 4.816e+05 0.000e+00 5.000e+09    152670
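## remove the outlier record
## note: this comparison is NA wherever PROPDMG_Clean is NA, and
## indexing with NA returns all-NA rows for those records
storm <- storm[(storm$PROPDMG_Clean!=115000000000),]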
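
The exponent handling can also be written as a plain lookup without `revalue`. This is a minimal sketch under the same assumptions as above (letters h, k, m, b mean powers 2, 3, 6, 9, single digits are used as the power directly, anything else becomes NA); `exp_lookup`, `damage_value`, and the `toy` data frame are illustrative names and values only.

```r
# Letter codes and the power of 10 each one stands for (compared in lower case).
exp_lookup <- c(h = 2, k = 3, m = 6, b = 9)

damage_value <- function(dmg, dmgexp) {
  code  <- tolower(as.character(dmgexp))
  power <- unname(exp_lookup[code])        # NA for codes not in the lookup
  digit <- grepl("^[0-9]$", code)
  power[digit] <- as.numeric(code[digit])  # a digit is used as the power itself
  dmg * 10^power
}

toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 3, 10),
  PROPDMGEXP = c("K", "M", "B", "5", "")
)
damage_value(toy$PROPDMG, toy$PROPDMGEXP)
```

The empty exponent in the last toy row yields NA, which matches how empty values are treated in the cleaning step.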

Results

Impact of Harmful Events on Population Health

Harm to population health is measured in two ways: fatalities and injuries. The plots below show the impact of harmful events on population health for each of these measures.

fatalityplot <- ggplot(storm[!is.na(storm$FATALITIES),], aes(x = SourceType,y = FATALITIES,fill=SourceType))+geom_bar(stat = "identity", show.legend = F)
fatalityplot <- fatalityplot +labs(x="Event Type", y="Total Fatalities")
fatalityplot <- fatalityplot + ggtitle("Most Fatal Weather Events")+ theme(axis.text.x = element_text(angle = 90, hjust = 1))

injuriesplot <- ggplot(storm[!is.na(storm$INJURIES),], aes(x = SourceType,y = INJURIES,fill=SourceType))+geom_bar(stat = "identity", show.legend = F)
injuriesplot <- injuriesplot +labs(x="Event Type", y="Total Injuries")
injuriesplot <- injuriesplot + ggtitle("Most Injurious Weather Events")+ theme(axis.text.x = element_text(angle = 90, hjust = 1))

grid.arrange(fatalityplot, injuriesplot, ncol=2)
## Warning: Removed 31 rows containing missing values (position_stack).

## Warning: Removed 31 rows containing missing values (position_stack).

Based on the plots, tornadoes clearly cause the most deaths and injuries of all event types. Tornadoes caused more than 5,000 deaths and 10,000 injuries in the US over the last 60 years.
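
The stacked bars show totals only visually, so a small table of sums per event type makes the ranking easier to read. The sketch below uses a made-up `toy` data frame with the same column names as `storm`; the numbers are illustrative and are not results from the NOAA data.

```r
# Illustrative records only; the real analysis would use the storm data frame.
toy <- data.frame(
  SourceType = c("Tornado", "Tornado", "Heat & Drought",
                 "Flooding & High Surf", NA),
  FATALITIES = c(3, 1, 2, 0, 5),
  INJURIES   = c(10, 4, 0, 1, 0)
)

# Sum both measures per event type; rows with a missing SourceType are dropped.
health <- aggregate(cbind(FATALITIES, INJURIES) ~ SourceType, data = toy, FUN = sum)

# Put the event type with the most fatalities first.
health <- health[order(-health$FATALITIES), ]
health
```

Running the same two calls on `storm` would give the exact totals behind the bars.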

Economic Impact of Harmful Events

The economic impact of weather events is measured by property and crop damage. The plot below shows the total damage, in US dollars, caused by the harmful events.

prop_damage <- aggregate(PROPDMG_Clean~SourceType, storm, sum)
crop_damage <- aggregate(CROPDMG_Clean~SourceType, storm, sum)
total_damage <- merge(prop_damage, crop_damage)
colnames(total_damage) <- c("EventType", "PropertyDamage", "CropDamage")
total_damage <- melt(total_damage, id.vars = c("EventType"), measure.vars = c("PropertyDamage", "CropDamage"))
colnames(total_damage) <- c("EventType", "DamageType", "Value")
econo_plt <- ggplot(total_damage, aes(x=EventType, y=Value, fill=DamageType)) + geom_bar(stat="identity")
econo_plt <- econo_plt + labs(x="Event Type", y="Total Damage (US dollars)")
econo_plt <- econo_plt + ggtitle("Most Expensive Weather Events")+ theme(axis.text.x = element_text(angle = 90, hjust = 1))
econo_plt

Based on the plot, we can conclude that “Wind & Storm” events cause the worst economic consequences.
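
One point to keep in mind when reading the totals: `aggregate` with a formula drops rows whose damage value is NA, and `merge` keeps only event types present in both tables, so a type with only crop damage or only property damage can disappear. The sketch below shows one possible alternative that sums with `na.rm = TRUE` instead. It uses made-up numbers, and `sum_by_type`, `totals`, and `ranked` are hypothetical names.

```r
# Illustrative values only; column names follow the storm data frame.
toy <- data.frame(
  SourceType    = c("Wind & Storm", "Wind & Storm", "Snow & Ice", "Heat & Drought"),
  PROPDMG_Clean = c(1e6, NA, 5e4, NA),
  CROPDMG_Clean = c(NA, 2e5, NA, 3e6)
)

# Sum per event type, counting missing damage values as zero.
sum_by_type <- function(x, g) vapply(split(x, g), sum, numeric(1), na.rm = TRUE)

totals <- data.frame(
  PropertyDamage = sum_by_type(toy$PROPDMG_Clean, toy$SourceType),
  CropDamage     = sum_by_type(toy$CROPDMG_Clean, toy$SourceType)
)
totals$Total <- totals$PropertyDamage + totals$CropDamage

# Rank event types by combined damage, largest first.
ranked <- totals[order(-totals$Total), ]
ranked
```

Here "Heat & Drought" has no property damage value at all, yet it still appears in the ranking because its crop damage is kept.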