The storm data from the U.S. National Oceanic and Atmospheric Administration (NOAA) is extracted and analyzed for two major attributes: the population health hazard and the economic consequence of storm events. The data includes fatalities, injuries, property damage and crop damage values caused by different storm events. This report identifies which storm events are most responsible for these damages and health hazards across the United States.
First, the R session is initialized with the required packages and the working directory is set to the project folder.
library(plyr)
library(xtable)
library(knitr)
setwd("~/Documents/Courses/Coursera/reproducible_research/assignment2")
The storm data is read with the 'read.table' command. The 'bzfile' command opens the compressed file as a connection, so the csv content is decompressed while it is being read. Some commands that take long processing times are 'cached' in the R code.
storm <- read.table(bzfile("repdata-data-StormData.csv.bz2"), header=TRUE, sep=",")
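As a minimal sketch of how reading through a bzip2 connection works, the following writes a small made-up table to a temporary compressed file and reads it back. The temporary file and the toy values are assumptions for illustration only and are not part of the NOAA data.

```r
# Toy data frame standing in for the storm data (values are made up).
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                  FATALITIES = c(3, 1))

# Write it as comma-separated text through a bzip2-compressing connection.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.table() opens the connection, decompresses on the fly, and closes it.
toy_back <- read.table(bzfile(tmp), header = TRUE, sep = ",")
print(toy_back)
```

No intermediate csv file is created on disk in this sketch; the decompressed text only passes through the connection.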
In this exercise, the storm data is analyzed to see which types of events are most harmful to population health, and which events have the greatest economic consequences.
First, the variables of the dataset are examined.
colnames(storm)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
To address these two questions, the main dataset is subset into three parts: the first consists of the variables relevant to population health, the second of the variables relevant to property damage, and the third of the variables relevant to crop damage. The second and third datasets are used together to determine which event has the greatest economic consequence.
pop_storm <- storm[,c("EVTYPE", "FATALITIES", "INJURIES")]
prop_damage <- storm[,c("EVTYPE","PROPDMG","PROPDMGEXP")]
crop_damage <- storm[,c("EVTYPE","CROPDMG","CROPDMGEXP")]
Here, the 'EVTYPE' variable refers to the storm event type, and the 'FATALITIES' and 'INJURIES' variables relate to population health. The 'PROPDMG' and 'PROPDMGEXP' variables describe property damage, and the 'CROPDMG' and 'CROPDMGEXP' variables describe crop damage; together they capture the economic consequences. 'PROPDMGEXP' and 'CROPDMGEXP' hold the denomination of the corresponding damage amount: 'K' for thousands of dollars, 'M' for millions of dollars, and 'B' for billions of dollars.
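Before relying on the 'K', 'M' and 'B' codes, it can help to tabulate which codes are actually present. The sketch below does this on a made-up vector of codes; the vector `exp_codes` and its contents are assumptions for illustration, not values taken from the storm data.

```r
# Toy denomination codes; the mix of values here is made up for illustration.
exp_codes <- c("K", "k", "M", "", "B", "m", "5", "K")

# Fold case so that "k" and "K" are counted together, then tabulate.
code_counts <- table(toupper(exp_codes))
print(code_counts)

# Codes other than K, M or B would be left out by a K/M/B-only filter.
other_codes <- setdiff(names(code_counts), c("K", "M", "B"))
print(other_codes)
```

On the real data, the same check could be run on `storm$PROPDMGEXP` and `storm$CROPDMGEXP` to see how many records fall outside the three documented codes.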
The events are stored as 'factors' in the storm data, so first, they are converted to character class for easy grouping analysis.
pop_storm[,1] <- as.character(pop_storm[,1])
Then, using the 'ddply' command from the 'plyr' package, the total number of fatalities is calculated for each storm event type. The totals are then ordered from highest to lowest to see which event has the most fatalities.
sum_fatalities_storm <- ddply(pop_storm, ~EVTYPE, summarize, sum=sum(FATALITIES))
colnames(sum_fatalities_storm) <- c("Disaster", "Fatalities")
ordered_fatality <- sum_fatalities_storm[order(-sum_fatalities_storm$Fatalities),]
A similar approach is used to determine the total number of injuries for each storm event type, and the output is ordered from the highest to the lowest number of injuries.
sum_injuries_storm <- ddply(pop_storm, ~EVTYPE, summarize, injuries=sum(INJURIES))
colnames(sum_injuries_storm) <- c("Disaster", "Injuries")
ordered_injury <- sum_injuries_storm[order(-sum_injuries_storm$Injuries),]
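For readers without the 'plyr' package, the same group-and-sum step can be sketched in base R with 'aggregate'. This is an alternative illustration rather than the method used in this report, and the toy data frame `toy_pop` with its values is made up.

```r
# Toy records (made-up values) with the same column names as pop_storm.
toy_pop <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(2, 1, 3, 4),
  stringsAsFactors = FALSE
)

# Sum fatalities per event type, one row per distinct EVTYPE.
toy_totals <- aggregate(FATALITIES ~ EVTYPE, data = toy_pop, FUN = sum)

# Order from the highest to the lowest total.
toy_ordered <- toy_totals[order(-toy_totals$FATALITIES), ]
print(toy_ordered)
```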
These processed datasets are used to produce the results presented in the results section.
This section covers the data processing done for the property and crop damage datasets in order to determine the economic consequences of the storm events.
First, as was done for the event column of the population dataset, the denomination columns ('PROPDMGEXP' and 'CROPDMGEXP', the third column of each dataset) are converted from factor class to character class.
prop_damage[,3] <- as.character(prop_damage[,3])
crop_damage[,3] <- as.character(crop_damage[,3])
Unlike the population dataset, these datasets store each damage amount in two columns: one holds the number and the other holds its denomination, which can be thousands, millions or even billions of dollars. The numbers therefore cannot simply be added up without taking the denominations into account.
So, first, the records are subset into separate datasets according to their denominations.
prop_damage_billion <- subset(prop_damage, prop_damage[,3] == "B" | prop_damage[,3]=="b")
prop_damage_million <- subset(prop_damage, prop_damage[,3] == "M" | prop_damage[,3]=="m")
prop_damage_thousand <- subset(prop_damage, prop_damage[,3] == "K" | prop_damage[,3]=="k")
These are then summed up to estimate the total amount of dollars (in their respective denominations) according to the event type.
sum_prop_damage_billion <- ddply(prop_damage_billion, ~EVTYPE, summarize, damage = sum(PROPDMG))
ordered_prop <- sum_prop_damage_billion[order(-sum_prop_damage_billion$damage),]
sum_prop_damage_million <- ddply(prop_damage_million, ~EVTYPE, summarize, damage = sum(PROPDMG))
ordered_prop_m <- sum_prop_damage_million[order(-sum_prop_damage_million$damage),]
sum_prop_damage_thousand <- ddply(prop_damage_thousand, ~EVTYPE, summarize, damage = sum(PROPDMG))
ordered_prop_t <- sum_prop_damage_thousand[order(-sum_prop_damage_thousand$damage),]
Then, all the values are converted to a common unit, billions of dollars: millions are divided by 1,000 and thousands by 1,000,000. The datasets are then combined and summed again by event type.
ordered_prop_m_converted <- ordered_prop_m
ordered_prop_m_converted[,2] <- ordered_prop_m_converted[,2] / 1000
ordered_prop_t_converted <- ordered_prop_t
ordered_prop_t_converted[,2] <- ordered_prop_t_converted[,2] / 1000000
prop_damage_final <- rbind(ordered_prop, ordered_prop_m_converted, ordered_prop_t_converted)
sum_prop_damage_final <- ddply(prop_damage_final, ~EVTYPE, summarize, sumdamage=sum(damage))
These results are then ordered by storm event type from the highest to the lowest economic consequence.
ordered_prop_final <- sum_prop_damage_final[order(-sum_prop_damage_final$sumdamage),]
A similar approach is followed to determine the crop damage, as shown below.
crop_damage_million <- subset(crop_damage, crop_damage[,3] == "M" | crop_damage[,3]=="m")
crop_damage_billion <- subset(crop_damage, crop_damage[,3] == "B" | crop_damage[,3]=="b")
crop_damage_thousand <- subset(crop_damage, crop_damage[,3] == "K" | crop_damage[,3]=="k")
sum_crop_damage_billion <- ddply(crop_damage_billion, ~EVTYPE, summarize, damage = sum(CROPDMG))
ordered_crop <- sum_crop_damage_billion[order(-sum_crop_damage_billion$damage),]
sum_crop_damage_million <- ddply(crop_damage_million, ~EVTYPE, summarize, damage = sum(CROPDMG))
ordered_crop_m <- sum_crop_damage_million[order(-sum_crop_damage_million$damage),]
sum_crop_damage_thousand <- ddply(crop_damage_thousand, ~EVTYPE, summarize, damage = sum(CROPDMG))
ordered_crop_t <- sum_crop_damage_thousand[order(-sum_crop_damage_thousand$damage),]
ordered_crop_m_converted <- ordered_crop_m
ordered_crop_m_converted[,2] <- ordered_crop_m_converted[,2] / 1000
ordered_crop_t_converted <- ordered_crop_t
ordered_crop_t_converted[,2] <- ordered_crop_t_converted[,2] / 1000000
crop_damage_final <- rbind(ordered_crop, ordered_crop_m_converted, ordered_crop_t_converted)
sum_crop_damage_final <- ddply(crop_damage_final, ~EVTYPE, summarize, sumdamage=sum(damage))
ordered_crop_final <- sum_crop_damage_final[order(-sum_crop_damage_final$sumdamage),]
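Because the property and crop steps are identical apart from the column names, they could be folded into one function. The sketch below is one possible way to do that in base R; the helper name `damage_in_billions`, its arguments, and the toy input are all hypothetical and do not appear in the original analysis.

```r
# Hypothetical helper: total damage per event type, expressed in billions.
# 'value_col' and 'exp_col' name the amount and denomination columns.
damage_in_billions <- function(df, value_col, exp_col) {
  # Multipliers that convert each denomination to billions of dollars.
  to_billions <- c(K = 1e-6, M = 1e-3, B = 1)
  codes <- toupper(as.character(df[[exp_col]]))
  keep <- codes %in% names(to_billions)   # keep only K/M/B records
  out <- data.frame(
    EVTYPE = as.character(df$EVTYPE[keep]),
    damage = df[[value_col]][keep] * unname(to_billions[codes[keep]]),
    stringsAsFactors = FALSE
  )
  totals <- aggregate(damage ~ EVTYPE, data = out, FUN = sum)
  totals[order(-totals$damage), ]
}

# Toy input (made-up values) with the same column names as prop_damage.
toy_prop <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "HAIL", "HAIL"),
  PROPDMG    = c(2, 500, 250, 10),
  PROPDMGEXP = c("B", "M", "K", ""),
  stringsAsFactors = FALSE
)
toy_result <- damage_in_billions(toy_prop, "PROPDMG", "PROPDMGEXP")
print(toy_result)
```

Under these assumptions, a call such as `damage_in_billions(crop_damage, "CROPDMG", "CROPDMGEXP")` would play the role of the crop damage steps above. As in the original code, records whose denomination is not K, M or B are left out.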
The final ordered property damage and crop damage datasets, with damage values in billions of dollars, are used to produce the results.
From the analysis of fatalities and injuries, it is observed that tornadoes are the biggest cause of population health hazard, in terms of both fatalities and injuries. The top 10 storm event causes of fatalities and injuries are shown in Tables 1 and 2.
out_fatality <- ordered_fatality[1:10,]
rownames(out_fatality) <- NULL
out_tab_fatality <- xtable(out_fatality, digits=0)
print(out_tab_fatality, type="html")
| | Disaster | Fatalities |
|---|---|---|
| 1 | TORNADO | 5633 |
| 2 | EXCESSIVE HEAT | 1903 |
| 3 | FLASH FLOOD | 978 |
| 4 | HEAT | 937 |
| 5 | LIGHTNING | 816 |
| 6 | TSTM WIND | 504 |
| 7 | FLOOD | 470 |
| 8 | RIP CURRENT | 368 |
| 9 | HIGH WIND | 248 |
| 10 | AVALANCHE | 224 |
The fatality numbers are led by tornadoes, followed by excessive heat and flash flood events.
out_injury <- ordered_injury[1:10,]
rownames(out_injury) <- NULL
out_tab_injury <- xtable(out_injury, digits=0)
print(out_tab_injury, type="html")
| | Disaster | Injuries |
|---|---|---|
| 1 | TORNADO | 91346 |
| 2 | TSTM WIND | 6957 |
| 3 | FLOOD | 6789 |
| 4 | EXCESSIVE HEAT | 6525 |
| 5 | LIGHTNING | 5230 |
| 6 | HEAT | 2100 |
| 7 | ICE STORM | 1975 |
| 8 | FLASH FLOOD | 1777 |
| 9 | THUNDERSTORM WIND | 1488 |
| 10 | HAIL | 1361 |
The injury numbers are led by tornadoes, followed by thunderstorm winds and floods.
The following code calculates the total number of fatalities and the total number of injuries across all storm events, so that their shares can be compared.
fat <- sum(ordered_fatality[,2])
inj <- sum(ordered_injury[,2])
Figure 1 shows the share of fatalities and injuries of all the storm events.
s <- c(fat, inj)
lbls <- c("Fatalities", "Injuries")
pie(s, labels=lbls, main="Figure 1. Share of Fatalities and Injuries for ALL Storm Events", col=c("blue", "red"))
It is observed that fatalities account for about 10% of the combined total of fatalities and injuries across all storm events.
Property damage and crop damage expenses are analyzed to determine which events have the greatest economic consequences. Tables 3 and 4 show the top 10 events with the highest damage, in billions of dollars, for property and crops respectively.
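The share read off the pie chart can also be expressed as a percentage. The sketch below uses made-up totals in place of 'fat' and 'inj', so the printed numbers are illustrative only and are not results from the storm data.

```r
# Made-up totals standing in for 'fat' and 'inj' from the report.
toy_counts <- c(Fatalities = 10, Injuries = 90)

# prop.table() divides each entry by the sum of all entries.
toy_share <- round(100 * prop.table(toy_counts), 1)
print(toy_share)
```

With the real totals, `prop.table(c(Fatalities = fat, Injuries = inj))` would give the two shares plotted in Figure 1.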
out_prop <- ordered_prop_final[1:10,]
rownames(out_prop) <- NULL
colnames(out_prop) <- c("Disaster", "Property Damage (Billions of Dollars)")
out_tab_prop <- xtable(out_prop, digits=2)
print(out_tab_prop, type="html")
| | Disaster | Property Damage (Billions of Dollars) |
|---|---|---|
| 1 | FLOOD | 144.66 |
| 2 | HURRICANE/TYPHOON | 69.31 |
| 3 | TORNADO | 56.94 |
| 4 | STORM SURGE | 43.32 |
| 5 | FLASH FLOOD | 16.14 |
| 6 | HAIL | 15.73 |
| 7 | HURRICANE | 11.87 |
| 8 | TROPICAL STORM | 7.70 |
| 9 | WINTER STORM | 6.69 |
| 10 | HIGH WIND | 5.27 |
out_crop <- ordered_crop_final[1:10,]
rownames(out_crop) <- NULL
colnames(out_crop) <- c("Disaster", "Crop Damage (Billions of Dollars)")
out_tab_crop <- xtable(out_crop, digits=2)
print(out_tab_crop, type="html")
| | Disaster | Crop Damage (Billions of Dollars) |
|---|---|---|
| 1 | DROUGHT | 13.97 |
| 2 | FLOOD | 5.66 |
| 3 | RIVER FLOOD | 5.03 |
| 4 | ICE STORM | 5.02 |
| 5 | HAIL | 3.03 |
| 6 | HURRICANE | 2.74 |
| 7 | HURRICANE/TYPHOON | 2.61 |
| 8 | FLASH FLOOD | 1.42 |
| 9 | EXTREME COLD | 1.29 |
| 10 | FROST/FREEZE | 1.09 |
It is observed that while floods are the major cause of property damage, droughts are the leading cause of crop damage. However, because property damage is far larger than crop damage in dollar terms, it can be concluded that floods are the major source of economic consequence among all the storm events. Floods are also the second most destructive event for crops.
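One way to check this reasoning is to add the property and crop figures per event. The sketch below does so for a few rows copied from Tables 3 and 4. It is an illustration, not part of the original analysis: events that do not appear among the copied rows of one table are treated as 0 there, which is an assumption that understates their true totals.

```r
# A few rows copied from Tables 3 and 4 (billions of dollars).
prop_rows <- data.frame(
  Disaster = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "HAIL"),
  Property = c(144.66, 69.31, 56.94, 15.73),
  stringsAsFactors = FALSE
)
crop_rows <- data.frame(
  Disaster = c("DROUGHT", "FLOOD", "HAIL", "HURRICANE/TYPHOON"),
  Crop     = c(13.97, 5.66, 3.03, 2.61),
  stringsAsFactors = FALSE
)

# Full outer join; an event missing from one table gets NA in that column.
# Assumption: NA only means "not among the copied rows" and is set to 0.
combined <- merge(prop_rows, crop_rows, by = "Disaster", all = TRUE)
combined[is.na(combined)] <- 0
combined$Total <- combined$Property + combined$Crop
combined <- combined[order(-combined$Total), ]
print(combined)
```

On the full data, the same merge could be applied to `ordered_prop_final` and `ordered_crop_final` (joined on 'EVTYPE') to rank events by combined damage.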
The following code calculates the total expenses of property and crop damages, to see their respective shares.
property <- sum(ordered_prop_final[,2])
crop <- sum(ordered_crop_final[,2])
Figure 2 shows the share of the property and crop damages.
s2 <- c(property, crop)
lbls <- c("Property", "Crops")
pie(s2, labels=lbls, main="Figure 2. Share of Property and Crop damage expenses for ALL Storm Events", col=c("yellow", "dark green"))
This reinforces the earlier observation that property damage is significantly higher than crop damage in economic value; hence, floods are the most destructive storm events in economic terms.
To summarize the analysis, tornadoes are the most destructive storm events as far as population health is concerned, and floods are the most destructive storm events as far as economic consequence is concerned.