Analysis of Severe Weather Events using the NOAA Storm Database

 

Synopsis

This document analyzes the impact of severe weather events on public health and the economy. Weather events such as storms, tornadoes, rain, floods, hail, wind, and heat cause fatalities, injuries, and damage to crops and property. Understanding and preventing this damage is therefore important for government authorities at multiple levels.

One way to understand the impact of severe weather on public health and the economy is to analyze data on historic events. For this analysis I am using the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including estimates of fatalities, injuries, and crop and property damage, along with other attributes of each event.

The data show that every category of severe weather event considered here caused fatalities, injuries, and property damage. Among these categories, tornadoes had the greatest combined health impact (fatalities plus injuries), driven mainly by injuries, while heat caused the most fatalities. Floods had the greatest economic impact in terms of property and crop damage.

 

Assignment

This assignment is part of Course Project 2 of the Reproducible Research course. The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events.

  1. Across the United States, which types of events are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

 

Data Processing

 

Load Libraries

Load the R libraries used for data manipulation and plotting.

library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(plyr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:plyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)

 

Load File

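Load the file that contains the data on historic storm events. The file is read only if the `storm_data_all` object does not already exist in the session.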

if(!exists("storm_data_all")) {
  storm_data_all <- read.csv(bzfile("repdata_data_StormData.csv.bz2"),header=TRUE)
}

 

Check Basic Data Attributes

The code below runs some basic checks to confirm that the file loaded properly. These R commands check the size of the data, the column headers, and the internal structure of the R object.

dim(storm_data_all)
## [1] 902297     37
colnames(storm_data_all)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
str(storm_data_all)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

The input data set has 902,297 rows and 37 columns.

 

Data Pre-Processing

Once the data are loaded, the next step is pre-processing, which is an important step before beginning any analysis.

For this analysis I am keeping only the columns from the raw data that relate to health and economic impacts, plus the begin date and event type.

vars <- c( "BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
storm_data <- storm_data_all[, vars]

The code below removes records with incomplete information. It drops records whose event type is unknown ("?") and records that report no fatalities, no injuries, no property damage, and no crop damage.

storm_data <- filter(storm_data, (EVTYPE != "?" & (FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0)))
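The effect of this filter can be checked on a few rows. The sketch below is an illustration only: the `toy` data frame and its values are made up, and it uses base R subsetting instead of `dplyr::filter` so that it runs without extra packages.

```r
# Hypothetical toy rows; the values are invented for illustration only.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "?", "HAIL", "FLOOD"),
  FATALITIES = c(0, 1, 0, 0),
  INJURIES   = c(2, 0, 0, 0),
  PROPDMG    = c(0, 5, 0, 10),
  CROPDMG    = c(0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Keep rows with a known event type and at least one non-zero impact measure.
keep <- toy$EVTYPE != "?" &
  (toy$FATALITIES > 0 | toy$INJURIES > 0 | toy$PROPDMG > 0 | toy$CROPDMG > 0)
toy_kept <- toy[keep, ]

print(toy_kept$EVTYPE)
```

The "?" row is dropped even though it reports a fatality, and the "HAIL" row is dropped because all four impact measures are zero.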

The data in the input database start in 1950. In the earlier years very few events were recorded, most likely because of a lack of good records. Hence, I am considering only records from 1991 onward for this analysis. To restrict the data by year, I parse BGN_DATE (stored as text in m/d/yyyy format, followed by a time) into an R Date, extract the year of the begin date, and filter on that year.

storm_data$BGN_DATE <- as.Date(storm_data$BGN_DATE, "%m/%d/%Y")
storm_data$YR <- year(storm_data$BGN_DATE)
storm_data <- filter(storm_data, YR >= 1991)

sort(table(storm_data$YR))
## 
##  1991  1992  1993  1994  2005  1996  2001  1997  2002  1995  2004  1999  2003 
##   879   990  5838  9643 10014 10040 10298 10322 10432 10457 10484 10609 11015 
##  2000  2007  2006  1998  2009  2010  2008  2011 
## 11508 11953 11974 14013 14434 16019 17633 20570
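The date handling can be tried in isolation. This is a minimal sketch, assuming sample strings in the same layout as BGN_DATE (only the first one is taken from the `str()` output above; the other two are made up). It uses base R `format()` to extract the year as a stand-in for `lubridate::year()`.

```r
# Sample strings in the same "m/d/Y H:M:S" layout as BGN_DATE.
raw_dates <- c("4/18/1950 0:00:00", "2/20/1991 0:00:00", "6/8/2011 0:00:00")

# as.Date() parses the leading date; the trailing time text is ignored.
parsed <- as.Date(raw_dates, format = "%m/%d/%Y")

# Base R stand-in for lubridate::year().
yr <- as.integer(format(parsed, "%Y"))

# Apply the same cutoff as the analysis.
print(yr[yr >= 1991])
```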

Let’s take a quick look at the filtered data set by checking its basic parameters.

dim(storm_data)
## [1] 229125      9
colnames(storm_data)
## [1] "BGN_DATE"   "EVTYPE"     "FATALITIES" "INJURIES"   "PROPDMG"   
## [6] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "YR"

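The next pre-processing step standardizes the storm event names. The following code groups the data into the main weather events Hail, Heat, Flood, Wind, Storm, Snow, Tornado, Winter, and Rain. All remaining weather events are grouped as ‘Other’. The assignments run in sequence, so when an event name matches more than one keyword the last matching keyword wins; for example, a name containing both WIND and STORM ends up in Storm. Note that the code spells the tornado label "Torando", so that spelling appears in the output tables.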

storm_data$EVENT_TYPE <- "Other"
storm_data$EVENT_TYPE[grep("HAIL", storm_data$EVTYPE, ignore.case = TRUE)] <- "Hail"
storm_data$EVENT_TYPE[grep("HEAT", storm_data$EVTYPE, ignore.case = TRUE)] <- "Heat"
storm_data$EVENT_TYPE[grep("FLOOD", storm_data$EVTYPE, ignore.case = TRUE)] <- "Flood"
storm_data$EVENT_TYPE[grep("WIND", storm_data$EVTYPE, ignore.case = TRUE)] <- "Wind"
storm_data$EVENT_TYPE[grep("STORM", storm_data$EVTYPE, ignore.case = TRUE)] <- "Storm"
storm_data$EVENT_TYPE[grep("SNOW", storm_data$EVTYPE, ignore.case = TRUE)] <- "Snow"
storm_data$EVENT_TYPE[grep("TORNADO", storm_data$EVTYPE, ignore.case = TRUE)] <- "Torando"
storm_data$EVENT_TYPE[grep("WINTER", storm_data$EVTYPE, ignore.case = TRUE)] <- "Winter"
storm_data$EVENT_TYPE[grep("RAIN", storm_data$EVTYPE, ignore.case = TRUE)] <- "Rain"

sort(table(storm_data$EVENT_TYPE), decreasing = TRUE)
## 
##    Wind   Storm   Flood    Hail   Other Torando  Winter    Snow    Rain    Heat 
##   72935   57434   32455   26102   18507   15524    2060    1878    1250     980
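The order-dependence of this grouping is easy to miss, so here is a small sketch of the same logic as a reusable function. `classify_events` and the `patterns` vector are hypothetical names introduced for illustration, not part of the original analysis; the group labels follow the list in the text above.

```r
# Keywords in the same order as the grep() assignments above.
# Later entries overwrite earlier ones when a name matches several keywords.
patterns <- c(Hail = "HAIL", Heat = "HEAT", Flood = "FLOOD", Wind = "WIND",
              Storm = "STORM", Snow = "SNOW", Tornado = "TORNADO",
              Winter = "WINTER", Rain = "RAIN")

# Hypothetical helper: returns one group label per event name.
classify_events <- function(evtype) {
  group <- rep("Other", length(evtype))
  for (label in names(patterns)) {
    group[grepl(patterns[[label]], evtype, ignore.case = TRUE)] <- label
  }
  group
}

examples <- c("TSTM WIND/HAIL", "THUNDERSTORM WIND", "WINTER STORM",
              "Heavy Rain", "WILDFIRE")
result <- classify_events(examples)
print(data.frame(EVTYPE = examples, EVENT_TYPE = result))
```

In this sketch "TSTM WIND/HAIL" lands in Wind rather than Hail, and "WINTER STORM" lands in Winter rather than Storm, purely because of keyword order. Reordering `patterns` changes the grouping, which is worth keeping in mind when reading the totals.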

Property damage is recorded as a number (PROPDMG) together with a magnitude code (PROPDMGEXP). To analyze the economic impact, I convert each pair to an actual dollar amount: K means thousands (10^3), M means millions (10^6), and B means billions (10^9). Any other code, including a blank, is treated as an exponent of 0, so the recorded number is used as is.

storm_data$PROPDMGEXP <- as.character(storm_data$PROPDMGEXP)

storm_data$PROPDMGEXP[is.na(storm_data$PROPDMGEXP)] <- 0 
storm_data$PROPDMGEXP[!grepl("K|M|B", storm_data$PROPDMGEXP, ignore.case = TRUE)] <- 0 

storm_data$PROPDMGEXP[grep("K", storm_data$PROPDMGEXP, ignore.case = TRUE)] <- "3"
storm_data$PROPDMGEXP[grep("M", storm_data$PROPDMGEXP, ignore.case = TRUE)] <- "6"
storm_data$PROPDMGEXP[grep("B", storm_data$PROPDMGEXP, ignore.case = TRUE)] <- "9"

storm_data$PROPDMGEXP <- as.numeric(as.character(storm_data$PROPDMGEXP))
storm_data$PROPERTY_DAMAGE <- storm_data$PROPDMG * 10^storm_data$PROPDMGEXP

sort(table(storm_data$PROPERTY_DAMAGE), decreasing = TRUE)[1:10]
## 
##  5000 10000  1000  2000     0 50000  3000 20000 25000 15000 
## 31730 21787 17544 17186 14066 13596 10364  9179  8919  8617
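The conversion can also be written as a lookup. This is a minimal sketch, assuming the magnitude codes are single characters; `exp_to_multiplier` is a hypothetical helper and the sample values are made up for illustration.

```r
# Hypothetical helper: map a magnitude code to a dollar multiplier.
# K/M/B (any case) map to 1e3/1e6/1e9; every other code, blank, or NA maps to 1.
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(code)])
  mult[is.na(mult)] <- 1
  mult
}

# Made-up sample values, for illustration only.
dmg  <- c(25, 2.5, 1, 5, 3)
code <- c("K", "M", "B", "", "k")

dollars <- dmg * exp_to_multiplier(code)
print(dollars)
```

A lookup keeps the original code column untouched, which makes it easier to re-check how unusual codes were handled.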

Crop damage is recorded the same way, as a number (CROPDMG) together with a magnitude code (CROPDMGEXP), and I apply the same conversion to dollar amounts: K means thousands (10^3), M means millions (10^6), and B means billions (10^9), with any other code treated as an exponent of 0.

storm_data$CROPDMGEXP <- as.character(storm_data$CROPDMGEXP)

storm_data$CROPDMGEXP[is.na(storm_data$CROPDMGEXP)] <- 0 
storm_data$CROPDMGEXP[!grepl("K|M|B", storm_data$CROPDMGEXP, ignore.case = TRUE)] <- 0 

storm_data$CROPDMGEXP[grep("K", storm_data$CROPDMGEXP, ignore.case = TRUE)] <- "3"
storm_data$CROPDMGEXP[grep("M", storm_data$CROPDMGEXP, ignore.case = TRUE)] <- "6"
storm_data$CROPDMGEXP[grep("B", storm_data$CROPDMGEXP, ignore.case = TRUE)] <- "9"

storm_data$CROPDMGEXP <- as.numeric(as.character(storm_data$CROPDMGEXP))
storm_data$CROP_DAMAGE <- storm_data$CROPDMG * 10^storm_data$CROPDMGEXP

sort(table(storm_data$CROP_DAMAGE), decreasing = TRUE)[1:10]
## 
##      0   5000  10000  50000  1e+05   1000   2000  25000  20000  5e+05 
## 207026   4097   2349   1984   1233    956    951    830    758    721

 

Analysis & Results

 

1. Across the United States, which types of events are most harmful with respect to population health?

For analyzing health impacts related to weather events, I am taking into consideration the number of fatalities and injuries by weather event type.

agg.fatalities_injuries <- ddply(storm_data, .(EVENT_TYPE), summarize,
                                 Total=sum(FATALITIES + INJURIES, na.rm=TRUE))
agg.fatalities_injuries$Type <- "Fatalities and Injuries"

agg.fatalities <- ddply(storm_data, .(EVENT_TYPE), summarize, Total = sum(FATALITIES, na.rm = TRUE))
agg.fatalities$Type <- "Fatalities"

agg.injuries <- ddply(storm_data, .(EVENT_TYPE), summarize, Total = sum(INJURIES, na.rm = TRUE))
agg.injuries$Type <- "Injuries"

agg.health_impact <- rbind(agg.fatalities, agg.injuries)

health_impact <- join (agg.fatalities, agg.injuries, by="EVENT_TYPE", type="inner")

health_impact
##    EVENT_TYPE Total       Type Total     Type
## 1       Flood  1524 Fatalities  8602 Injuries
## 2        Hail    15 Fatalities  1082 Injuries
## 3        Heat  3138 Fatalities  9224 Injuries
## 4       Other  2626 Fatalities 12224 Injuries
## 5        Rain   114 Fatalities   305 Injuries
## 6        Snow   164 Fatalities  1164 Injuries
## 7       Storm   416 Fatalities  5339 Injuries
## 8     Torando  1727 Fatalities 25558 Injuries
## 9        Wind   990 Fatalities  6485 Injuries
## 10     Winter   278 Fatalities  1891 Injuries
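Because the joined table repeats the column names `Total` and `Type`, the ranking is easier to read with one column per measure. The sketch below re-enters the totals printed above by hand (event labels as printed) and adds a combined column; the `health` data frame and its column names are introduced here for illustration.

```r
# Totals copied from the health_impact table above.
health <- data.frame(
  EVENT_TYPE = c("Flood", "Hail", "Heat", "Other", "Rain",
                 "Snow", "Storm", "Torando", "Wind", "Winter"),
  Fatalities = c(1524, 15, 3138, 2626, 114, 164, 416, 1727, 990, 278),
  Injuries   = c(8602, 1082, 9224, 12224, 305, 1164, 5339, 25558, 6485, 1891),
  stringsAsFactors = FALSE
)

# Combined health impact per event group.
health$Combined <- health$Fatalities + health$Injuries

top_fatalities <- health$EVENT_TYPE[which.max(health$Fatalities)]
top_injuries   <- health$EVENT_TYPE[which.max(health$Injuries)]

# Rank event groups by combined fatalities and injuries.
health_ranked <- health[order(-health$Combined), ]
print(health_ranked)
```

Ranking each measure separately shows that the group with the most fatalities is not the same as the group with the most injuries.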

Plot the results on a chart for quick analysis.

agg.health_impact$EVENT_TYPE <- as.factor(agg.health_impact$EVENT_TYPE)

health_impact_plot <- ggplot(agg.health_impact, aes(x = reorder(EVENT_TYPE, -Total), y = Total, fill = Type)) +
  theme_classic() + 
  geom_bar(stat = "identity", position = 'dodge', alpha=0.75) +
  xlab("Weather Event") +
  ylab("Number of Fatalities and Injuries") +
  ggtitle("Impact of Severe Weather Events On Public Health 1991-2011") +
  theme(axis.text = element_text(face="bold")) + 
  theme(axis.text.x = element_text(angle=90)) +
  theme(plot.title = element_text(hjust = 0.5))
print(health_impact_plot)

The graph shows that tornadoes had the highest combined impact on public health, with by far the most injuries (25,558). Heat caused the most fatalities (3,138), ahead of tornadoes (1,727) and floods (1,524). Among the named weather events, heat and floods follow tornadoes in combined fatalities and injuries.

 

2. Across the United States, which types of events have the greatest economic consequences?

For analyzing economic impacts related to weather events, I am taking into consideration the cost of property damage and crop damage by weather event type.

agg.propdmg_cropdmg <- ddply(storm_data, .(EVENT_TYPE), summarize, Total = sum(PROPERTY_DAMAGE + CROP_DAMAGE,  na.rm = TRUE))
agg.propdmg_cropdmg$Type <- "Property and Crop Damage"

agg.prop <- ddply(storm_data, .(EVENT_TYPE), summarize, Total = sum(PROPERTY_DAMAGE, na.rm = TRUE))
agg.prop$Type <- "Property"

agg.crop <- ddply(storm_data, .(EVENT_TYPE), summarize, Total = sum(CROP_DAMAGE, na.rm = TRUE))
agg.crop$Type <- "Crop"

agg.economic_impact <- rbind(agg.prop, agg.crop)

economic_impact <- join (agg.prop, agg.crop, by="EVENT_TYPE", type="inner")

economic_impact
##    EVENT_TYPE        Total     Type       Total Type
## 1       Flood 167502193929 Property 12266906100 Crop
## 2        Hail  15733043048 Property  3046837473 Crop
## 3        Heat     20325750 Property   904469280 Crop
## 4       Other  97246707337 Property 23588880870 Crop
## 5        Rain   3270230192 Property   919315800 Crop
## 6        Snow   1024169752 Property   134683100 Crop
## 7       Storm  66304415393 Property  6374474888 Crop
## 8     Torando  30553884789 Property   417461520 Crop
## 9        Wind  10847166618 Property  1403719150 Crop
## 10     Winter   6777295251 Property    47444000 Crop
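To compare event groups on total economic damage, the two columns can be added and expressed in billions of dollars. The sketch below re-enters the totals printed above by hand (event labels as printed); the `econ` data frame and its column names are introduced here for illustration.

```r
# Totals (in USD) copied from the economic_impact table above.
econ <- data.frame(
  EVENT_TYPE = c("Flood", "Hail", "Heat", "Other", "Rain",
                 "Snow", "Storm", "Torando", "Wind", "Winter"),
  Property = c(167502193929, 15733043048, 20325750, 97246707337, 3270230192,
               1024169752, 66304415393, 30553884789, 10847166618, 6777295251),
  Crop     = c(12266906100, 3046837473, 904469280, 23588880870, 919315800,
               134683100, 6374474888, 417461520, 1403719150, 47444000),
  stringsAsFactors = FALSE
)

# Combined damage in billions of USD, rounded for display.
econ$Total_bn <- round((econ$Property + econ$Crop) / 1e9, 1)

# Rank event groups by combined property and crop damage.
econ_ranked <- econ[order(-econ$Total_bn), c("EVENT_TYPE", "Total_bn")]
print(econ_ranked)
```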

Plot the results on a chart for quick analysis.

agg.economic_impact$EVENT_TYPE <- as.factor(agg.economic_impact$EVENT_TYPE)

economic_impact_plot <- ggplot(agg.economic_impact, aes(x = reorder(EVENT_TYPE, -Total), y = Total/1e9, fill = Type)) +
  theme_classic() + 
  geom_bar(stat = "identity", position = 'dodge', alpha=0.75) +
  xlab("Weather Event") +
  ylab("Property and Crop Damage (in billion USD)") +
  ggtitle("Impact of Severe Weather Events On Economy 1991-2011") +
  theme(axis.text = element_text(face="bold")) + 
  theme(axis.text.x = element_text(angle=90)) +
  theme(plot.title = element_text(hjust = 0.5))

print(economic_impact_plot)

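The graph shows that floods caused the highest economic damage. Floods account for the most property damage (about 167.5 billion USD) and, among the named weather events, the most crop damage (about 12.3 billion USD); only the mixed ‘Other’ group has higher crop damage. Among the named weather events, storms and tornadoes caused the most combined property and crop damage after floods.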