Storm Data Analysis for Population Health and Economic Consequences

Reproducible Research Peer Assignment II

Synopsis

The storm data from the U.S. National Oceanic and Atmospheric Administration (NOAA) is extracted and analyzed for two major attributes: the population health hazard and the economic consequences of storm events. The data records the fatalities, injuries, property damage and crop damage caused by different storm events. This report identifies which storm events are most responsible for this damage and these health hazards across the United States.

Data Processing

First, the required R packages are loaded and the working directory is set to the folder that contains the data.

library(plyr)
library(xtable)
library(knitr)
setwd("~/Documents/Courses/Coursera/reproducible_research/assignment2")

The storm data is read with the 'read.csv' command through a 'bzfile' connection, which decompresses the bz2 archive on the fly. Some commands that take long processing times are 'cached' in the R code.

storm <- read.csv(bzfile("repdata-data-StormData.csv.bz2"), header=TRUE)
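
As a small self-contained illustration of reading a bzip2-compressed CSV through a 'bzfile' connection, the sketch below writes a toy table to a temporary .bz2 file and reads it back. The toy data and the temporary file are invented for the example; they are not part of the storm dataset.

```r
# Toy data standing in for the storm file (values are invented).
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                  FATALITIES = c(3, 1),
                  stringsAsFactors = FALSE)

# Write the toy data to a temporary bzip2-compressed CSV.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv opens the unopened bzfile connection, decompresses while
# reading, and closes it afterwards; no separate unzip step is needed.
toy_back <- read.csv(bzfile(tmp), stringsAsFactors = FALSE)
unlink(tmp)
```

Note that the second argument of 'bzfile' is the open mode (for example "r" or "w"), not an output file name.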

In this exercise, the storm data is analyzed to see which types of events are most harmful to population health, and which events have the greatest economic consequences.

First, the variables of the dataset are examined.

colnames(storm)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

To structure the analysis, the main dataset is subset into three parts: the first consists of the variables relevant to population health, the second of the variables relevant to property damage, and the third of the variables relevant to crop damage. The second and third datasets are used together to determine which events have the greatest economic consequences.

pop_storm <- storm[,c("EVTYPE", "FATALITIES", "INJURIES")]
prop_damage <- storm[,c("EVTYPE","PROPDMG","PROPDMGEXP")]
crop_damage <- storm[,c("EVTYPE","CROPDMG","CROPDMGEXP")]

Here, the 'EVTYPE' variable identifies the storm event type, and the 'FATALITIES' and 'INJURIES' variables relate to population health. The 'PROPDMG' and 'PROPDMGEXP' variables describe property damage, and 'CROPDMG' and 'CROPDMGEXP' describe crop damage; together they capture the economic consequences. 'PROPDMGEXP' and 'CROPDMGEXP' hold the magnitude of the corresponding damage figure: 'K' means thousands of dollars, 'M' millions of dollars, and 'B' billions of dollars.
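
The way a damage figure and its magnitude code combine into a dollar amount can be sketched with a lookup vector. This is only an illustration on invented rows: the name 'exp_multiplier' is hypothetical, and treating any code other than K, M or B as missing is an assumption of the sketch.

```r
# Multipliers for the three magnitude codes described above, in dollars.
exp_multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Toy rows (invented values), mixing upper- and lower-case codes.
toy <- data.frame(PROPDMG    = c(25, 2.5, 1.2, 10),
                  PROPDMGEXP = c("K", "m", "B", "?"),
                  stringsAsFactors = FALSE)

# Upper-case each code and look it up; codes outside K/M/B yield NA
# (an assumption of this sketch, not a rule taken from the data source).
toy$dollars <- toy$PROPDMG * unname(exp_multiplier[toupper(toy$PROPDMGEXP)])
toy
```

So a value of 25 with code 'K' stands for 25,000 dollars, and 2.5 with code 'm' stands for 2,500,000 dollars.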

Data processing for Population Health Analysis

The events are stored as 'factors' in the storm data, so first, they are converted to character class for easy grouping analysis.

pop_storm[,1] <- as.character(pop_storm[,1])

Then, using the 'ddply' command from the 'plyr' package, the total number of fatalities is calculated for each storm event type. The totals are then sorted in descending order to see which event has caused the most fatalities.

sum_fatalities_storm <- ddply(pop_storm, ~EVTYPE, summarize, sum=sum(FATALITIES))
colnames(sum_fatalities_storm) <- c("Disaster", "Fatalities")
ordered_fatality <- sum_fatalities_storm[order(-sum_fatalities_storm$Fatalities),]
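
The group-and-sort step can be checked on a small invented dataset. The sketch below uses base R's 'aggregate' in place of 'ddply' (a swapped-in technique, chosen only so the example runs without extra packages); the toy values are made up.

```r
# Toy data (invented values) with repeated event types.
toy <- data.frame(EVTYPE = c("FLOOD", "TORNADO", "FLOOD", "HEAT", "TORNADO"),
                  FATALITIES = c(1, 4, 2, 5, 6),
                  stringsAsFactors = FALSE)

# Sum fatalities within each event type.
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

# Sort from most to fewest fatalities.
totals <- totals[order(-totals$FATALITIES), ]
totals
```

Here TORNADO (4 + 6) comes first, followed by HEAT (5) and FLOOD (1 + 2).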

A similar approach is used to determine the total number of injuries for each storm event type, and the output is ordered from the highest to the lowest number of injuries.

sum_injuries_storm <- ddply(pop_storm, ~EVTYPE, summarize, injuries=sum(INJURIES))
colnames(sum_injuries_storm) <- c("Disaster", "Injuries")
ordered_injury <- sum_injuries_storm[order(-sum_injuries_storm$Injuries),]

These processed datasets are used to produce the results presented in the Results section.

Data processing for determining Economic Consequences

This section covers the data processing done on the property and crop damage datasets in order to determine the economic consequences of the storm events.

First, as with the event column in the population dataset, the magnitude columns ('PROPDMGEXP' and 'CROPDMGEXP', the third column of each dataset) are converted from factor class to character class.

prop_damage[,3] <- as.character(prop_damage[,3])
crop_damage[,3] <- as.character(crop_damage[,3])

Unlike the population dataset, these datasets store each damage figure in two columns: one holds the number, and the other holds its magnitude, which can be thousands, millions or even billions of dollars. The numbers therefore cannot simply be added up without taking the magnitudes into account.

So, first, the rows are split into separate datasets according to their magnitude.

prop_damage_billion <- subset(prop_damage, prop_damage[,3] == "B" | prop_damage[,3]=="b")
prop_damage_million <- subset(prop_damage, prop_damage[,3] == "M" | prop_damage[,3]=="m")
prop_damage_thousand <- subset(prop_damage, prop_damage[,3] == "K" | prop_damage[,3]=="k")

Each subset is then summed by event type, giving totals in its own unit (billions, millions or thousands of dollars).

sum_prop_damage_billion <- ddply(prop_damage_billion, ~EVTYPE, summarize, damage = sum(PROPDMG))
ordered_prop <- sum_prop_damage_billion[order(-sum_prop_damage_billion$damage),]

sum_prop_damage_million <- ddply(prop_damage_million, ~EVTYPE, summarize, damage = sum(PROPDMG))
ordered_prop_m <- sum_prop_damage_million[order(-sum_prop_damage_million$damage),]

sum_prop_damage_thousand <- ddply(prop_damage_thousand, ~EVTYPE, summarize, damage = sum(PROPDMG))
ordered_prop_t <- sum_prop_damage_thousand[order(-sum_prop_damage_thousand$damage),]

Then all the values are converted to a common unit, billions of dollars: the millions are divided by 1,000 and the thousands by 1,000,000. The converted datasets are combined and summed again by event type.

ordered_prop_m_converted <- ordered_prop_m
ordered_prop_m_converted[,2] <- ordered_prop_m_converted[,2] / 1000

ordered_prop_t_converted <- ordered_prop_t
ordered_prop_t_converted[,2] <- ordered_prop_t_converted[,2] / 1000000

prop_damage_final <- rbind(ordered_prop, ordered_prop_m_converted, ordered_prop_t_converted)

sum_prop_damage_final <- ddply(prop_damage_final, ~EVTYPE, summarize, sumdamage=sum(damage))

These results are then ordered by event type from the highest to the lowest economic consequence.

ordered_prop_final <- sum_prop_damage_final[order(-sum_prop_damage_final$sumdamage),]

A similar approach is followed to determine the crop damage, as shown below.

crop_damage_million <- subset(crop_damage, crop_damage[,3] == "M" | crop_damage[,3]=="m")
crop_damage_billion <- subset(crop_damage, crop_damage[,3] == "B" | crop_damage[,3]=="b")
crop_damage_thousand <- subset(crop_damage, crop_damage[,3] == "K" | crop_damage[,3]=="k")

sum_crop_damage_billion <- ddply(crop_damage_billion, ~EVTYPE, summarize, damage = sum(CROPDMG))
ordered_crop <- sum_crop_damage_billion[order(-sum_crop_damage_billion$damage),]

sum_crop_damage_million <- ddply(crop_damage_million, ~EVTYPE, summarize, damage = sum(CROPDMG))
ordered_crop_m <- sum_crop_damage_million[order(-sum_crop_damage_million$damage),]

sum_crop_damage_thousand <- ddply(crop_damage_thousand, ~EVTYPE, summarize, damage = sum(CROPDMG))
ordered_crop_t <- sum_crop_damage_thousand[order(-sum_crop_damage_thousand$damage),]

ordered_crop_m_converted <- ordered_crop_m
ordered_crop_m_converted[,2] <- ordered_crop_m_converted[,2] / 1000

ordered_crop_t_converted <- ordered_crop_t
ordered_crop_t_converted[,2] <- ordered_crop_t_converted[,2] / 1000000

crop_damage_final <- rbind(ordered_crop, ordered_crop_m_converted, ordered_crop_t_converted)

sum_crop_damage_final <- ddply(crop_damage_final, ~EVTYPE, summarize, sumdamage=sum(damage))

ordered_crop_final <- sum_crop_damage_final[order(-sum_crop_damage_final$sumdamage),]
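
Since the property and crop pipelines differ only in their column names, the same steps could be wrapped in one function. The sketch below is a hypothetical helper, not the code used for this report: the name 'damage_in_billions' and its arguments are invented, it uses base R's 'aggregate' rather than 'ddply', and it is shown on made-up rows. Like the code above, it keeps only rows whose magnitude code is K, M or B in either case.

```r
# Hypothetical helper: total damage per event type in billions of dollars.
damage_in_billions <- function(data, value_col, exp_col) {
  # Factor that converts each recognised magnitude code to billions.
  to_billions <- c(K = 1e-6, M = 1e-3, B = 1)
  mult <- unname(to_billions[toupper(as.character(data[[exp_col]]))])
  keep <- !is.na(mult)  # drop rows whose code is not K, M or B
  totals <- aggregate(
    data.frame(sumdamage = data[[value_col]][keep] * mult[keep]),
    by = list(EVTYPE = as.character(data$EVTYPE[keep])),
    FUN = sum
  )
  # Highest total first.
  totals[order(-totals$sumdamage), ]
}

# Toy data with invented values; the last row has no magnitude code.
toy <- data.frame(EVTYPE = c("FLOOD", "FLOOD", "HAIL", "HAIL", "HAIL"),
                  PROPDMG = c(2, 500, 250, 10, 7),
                  PROPDMGEXP = c("B", "M", "k", "M", ""),
                  stringsAsFactors = FALSE)
res <- damage_in_billions(toy, "PROPDMG", "PROPDMGEXP")
res
```

With the real data, the calls would presumably be damage_in_billions(prop_damage, "PROPDMG", "PROPDMGEXP") and damage_in_billions(crop_damage, "CROPDMG", "CROPDMGEXP").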

The final ordered property damage and crop damage datasets, with damage values expressed in billions of dollars, are used to produce the results.

Results

Population Health Results

The analysis of fatalities and injuries shows that tornadoes are the biggest population health hazard, in terms of both fatalities and injuries. The top 10 storm event causes of fatalities and injuries are shown in Tables 1 and 2.

Table 1. Top 10 Fatality Numbers of Storm Events

out_fatality <- ordered_fatality[1:10,]
rownames(out_fatality) <- NULL
out_tab_fatality <- xtable(out_fatality, digits=0)
print(out_tab_fatality, type="html")
Disaster Fatalities
1 TORNADO 5633
2 EXCESSIVE HEAT 1903
3 FLASH FLOOD 978
4 HEAT 937
5 LIGHTNING 816
6 TSTM WIND 504
7 FLOOD 470
8 RIP CURRENT 368
9 HIGH WIND 248
10 AVALANCHE 224

The fatality numbers are led by tornadoes, followed by excessive heat and flash flood events.

Table 2. Top 10 Injury Numbers of Storm Events

out_injury <- ordered_injury[1:10,]
rownames(out_injury) <- NULL
out_tab_injury <- xtable(out_injury, digits=0)
print(out_tab_injury, type="html")
Disaster Injuries
1 TORNADO 91346
2 TSTM WIND 6957
3 FLOOD 6789
4 EXCESSIVE HEAT 6525
5 LIGHTNING 5230
6 HEAT 2100
7 ICE STORM 1975
8 FLASH FLOOD 1777
9 THUNDERSTORM WIND 1488
10 HAIL 1361

The injury numbers are led by tornadoes, followed by thunderstorm winds and floods.

The following code calculates the total number of fatalities and the total number of injuries across all storm events, so that their shares can be compared.

fat <- sum(ordered_fatality[,2])
inj <- sum(ordered_injury[,2])

Figure 1 shows the share of fatalities and injuries of all the storm events.

s <- c(fat, inj)
lbls <- c("Fatalities", "Injuries")
pie(s, labels=lbls, main="Figure 1. Share of Fatalities and Injuries for ALL Storm Events", col=c("blue", "red"))

[Figure 1: pie chart of the share of fatalities and injuries for all storm events]

Fatalities make up about 10% of all casualties (fatalities plus injuries) recorded for the storm events.
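
The same share calculation can be applied to a single event type. As a worked example, the sketch below uses the tornado totals from Tables 1 and 2; the variable names are chosen only for illustration.

```r
# Tornado totals taken from Tables 1 and 2.
tornado_fatalities <- 5633
tornado_injuries   <- 91346

# Fatalities as a percentage of all tornado casualties.
fatality_share <- 100 * tornado_fatalities /
  (tornado_fatalities + tornado_injuries)
round(fatality_share, 1)
```

For tornadoes the share comes out a little under 6%, lower than the share across all events.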

Economic Consequence Results

Property damage and crop damage values are analyzed to determine which events have the greatest economic consequences. Tables 3 and 4 show the top 10 events with the highest damage, in billions of dollars, for property and crops respectively.

Table 3. Top 10 Property Damage Values of Storm Events

out_prop <- ordered_prop_final[1:10,]
rownames(out_prop) <- NULL
colnames(out_prop) <- c("Disaster", "Property Damage (Billions of Dollars)")
out_tab_prop <- xtable(out_prop, digits=2)
print(out_tab_prop, type="html")
Disaster Property Damage (Billions of Dollars)
1 FLOOD 144.66
2 HURRICANE/TYPHOON 69.31
3 TORNADO 56.94
4 STORM SURGE 43.32
5 FLASH FLOOD 16.14
6 HAIL 15.73
7 HURRICANE 11.87
8 TROPICAL STORM 7.70
9 WINTER STORM 6.69
10 HIGH WIND 5.27

Table 4. Top 10 Crop Damage Values of Storm Events

out_crop <- ordered_crop_final[1:10,]
rownames(out_crop) <- NULL
colnames(out_crop) <- c("Disaster", "Crop Damage (Billions of Dollars)")
out_tab_crop <- xtable(out_crop, digits=2)
print(out_tab_crop, type="html")
Disaster Crop Damage (Billions of Dollars)
1 DROUGHT 13.97
2 FLOOD 5.66
3 RIVER FLOOD 5.03
4 ICE STORM 5.02
5 HAIL 3.03
6 HURRICANE 2.74
7 HURRICANE/TYPHOON 2.61
8 FLASH FLOOD 1.42
9 EXTREME COLD 1.29
10 FROST/FREEZE 1.09

Flood events are the leading cause of property damage, while droughts cause the most crop damage. However, because the property damage figures are far larger than the crop damage figures, floods are the largest source of economic consequences among all the storm events. Floods are also the second most destructive event for crops.
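
The combined ranking can be checked by adding the two top-10 tables together. The sketch below copies the values from the property and crop damage tables above; counting an event that is missing from one table as 0 there is an assumption of the sketch and understates that event's true total.

```r
# Top-10 values copied from the two tables above (billions of dollars).
prop <- data.frame(
  Disaster = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "STORM SURGE",
               "FLASH FLOOD", "HAIL", "HURRICANE", "TROPICAL STORM",
               "WINTER STORM", "HIGH WIND"),
  Property = c(144.66, 69.31, 56.94, 43.32, 16.14,
               15.73, 11.87, 7.70, 6.69, 5.27),
  stringsAsFactors = FALSE)
crop <- data.frame(
  Disaster = c("DROUGHT", "FLOOD", "RIVER FLOOD", "ICE STORM", "HAIL",
               "HURRICANE", "HURRICANE/TYPHOON", "FLASH FLOOD",
               "EXTREME COLD", "FROST/FREEZE"),
  Crop = c(13.97, 5.66, 5.03, 5.02, 3.03, 2.74, 2.61, 1.42, 1.29, 1.09),
  stringsAsFactors = FALSE)

# Full outer join on the event name; fill the gaps with 0.
both <- merge(prop, crop, by = "Disaster", all = TRUE)
both[is.na(both)] <- 0

# Combined damage, highest first.
both$Total <- both$Property + both$Crop
both <- both[order(-both$Total), ]
head(both, 3)
```

On these figures floods remain first by a wide margin, while drought, the top crop event, stays well below the leading property damage events.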

The following code calculates the total expenses of property and crop damages, to see their respective shares.

property <- sum(ordered_prop_final[,2])
crop <- sum(ordered_crop_final[,2])

Figure 2 shows the share of the property and crop damages.

s2 <- c(property, crop)
lbls <- c("Property", "Crops")
pie(s2, labels=lbls, main="Figure 2. Share of Property and Crop damage expenses for ALL Storm Events", col=c("yellow", "dark green"))

[Figure 2: pie chart of the share of property and crop damage for all storm events]

This supports the earlier observation that property damage is far higher than crop damage in economic value, and hence that floods are the most economically destructive storm events.

Summary

To summarize, tornadoes are the most destructive storm events as far as population health is concerned, and floods are the most destructive storm events as far as economic consequences are concerned.