Summary

This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The database tracks major storms and severe weather events in the United States from 1950 to 2011. It records where and when the events occurred, as well as estimates of fatalities, injuries, and crop and property damage. This report studies the consequences of these severe weather events for population health and the economic damage to property and crops across the USA.

A detailed description of the NOAA database can be found in the code book: https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf.

Data Processing

library(R.utils)  # provides bunzip2()
# download and decompress the raw data only if the archive is not already present
if (!"repdata-data-StormData.csv.bz2" %in% dir()) {
     download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile="repdata-data-StormData.csv.bz2")
     bunzip2("repdata-data-StormData.csv.bz2", overwrite=TRUE, remove=FALSE)
}
data = read.csv("repdata-data-StormData.csv")
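
As a possible simplification, read.csv() can read a bzip2-compressed file directly, so the separate decompression step is optional. The sketch below demonstrates this on a small temporary file with made-up rows rather than on the real archive; the file name and the values are illustrative assumptions only.

```r
# Write a tiny bzip2-compressed CSV to a temporary location (made-up rows)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES,INJURIES", "TORNADO,1,2", "HEAT,0,3"), con)
close(con)

# read.csv() detects the compression and decompresses while reading
toy <- read.csv(tmp, stringsAsFactors = FALSE)
unlink(tmp)
```

Applied to the real archive, this would amount to calling read.csv("repdata-data-StormData.csv.bz2"), at the cost of decompressing the file on every read.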

Population Health Consequences

Preprocessing keeps only the data relevant to each analysis.
Two variables are related to the population’s health in this dataset:
- Fatalities
- Injuries
Therefore, the analysis will focus only on these two variables to determine which events are most harmful with respect to population health.

library(plyr)
pop_health = data[,c("EVTYPE","FATALITIES","INJURIES")]
pop_health <- ddply(pop_health, .(EVTYPE), summarise, fatalities=sum(FATALITIES),injuries=sum(INJURIES))
pop_health$total <- pop_health$fatalities + pop_health$injuries
pop_health <- pop_health[order(pop_health$total, decreasing = TRUE),]
head(pop_health, n=10)
##                EVTYPE fatalities injuries total
## 830           TORNADO       5633    91346 96979
## 123    EXCESSIVE HEAT       1903     6525  8428
## 854         TSTM WIND        504     6957  7461
## 164             FLOOD        470     6789  7259
## 452         LIGHTNING        816     5230  6046
## 269              HEAT        937     2100  3037
## 147       FLASH FLOOD        978     1777  2755
## 424         ICE STORM         89     1975  2064
## 759 THUNDERSTORM WIND        133     1488  1621
## 972      WINTER STORM        206     1321  1527

Note that the same weather event often appears under several different names, such as TSTM WIND and THUNDERSTORM WIND, or HEAT and EXCESSIVE HEAT. Here, we clean up those occurrences by combining event types that are essentially the same.

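# "THUNDER*" matches "THUNDE" followed by zero or more "R"; "TSTM*" matches "TST" followed by zero or more "M"
pop_health$EVTYPE[grepl("THUNDER*", pop_health$EVTYPE, ignore.case = TRUE)] = "THUNDERSTORM WIND"
pop_health$EVTYPE[grepl("TSTM*", pop_health$EVTYPE, ignore.case = TRUE)] = "THUNDERSTORM WIND"
# every event type containing the pattern is relabelled, so FLASH FLOOD is merged into FLOOD
pop_health$EVTYPE[grepl("FLOOD", pop_health$EVTYPE, ignore.case = TRUE)] = "FLOOD"
# likewise, EXCESSIVE HEAT is merged into HEAT
pop_health$EVTYPE[grepl("HEAT", pop_health$EVTYPE, ignore.case = TRUE)] = "HEAT"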
pop_health <- ddply(pop_health, .(EVTYPE), summarise, fatalities=sum(fatalities),injuries=sum(injuries), total = sum(total))
pop_health <- pop_health[order(pop_health$total, decreasing = TRUE),]
head(pop_health, n=10)
##                EVTYPE fatalities injuries total
## 630           TORNADO       5633    91346 96979
## 212              HEAT       3138     9224 12362
## 628 THUNDERSTORM WIND        756     9545 10301
## 127             FLOOD       1525     8604 10129
## 371         LIGHTNING        816     5230  6046
## 347         ICE STORM         89     1975  2064
## 730      WINTER STORM        206     1321  1527
## 281         HIGH WIND        248     1137  1385
## 182              HAIL         15     1361  1376
## 331 HURRICANE/TYPHOON         64     1275  1339
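
The consolidation step above can be illustrated on a toy example. This is a minimal sketch with made-up counts, not the report's actual pipeline: it uses base R's aggregate() instead of ddply, and the combined pattern "THUNDER|TSTM" is a simplifying assumption standing in for the two separate patterns used in the report.

```r
# Toy data with made-up totals; the labels are illustrative only
toy <- data.frame(
  EVTYPE = c("TSTM WIND", "THUNDERSTORM WIND", "HEAT",
             "EXCESSIVE HEAT", "FLASH FLOOD", "FLOOD"),
  total  = c(10, 5, 3, 7, 2, 4),
  stringsAsFactors = FALSE
)

# Pattern -> canonical label (hypothetical helper table)
patterns <- c("THUNDER|TSTM" = "THUNDERSTORM WIND",
              "FLOOD"        = "FLOOD",
              "HEAT"         = "HEAT")

# Relabel every event type that contains one of the patterns
for (p in names(patterns)) {
  toy$EVTYPE[grepl(p, toy$EVTYPE, ignore.case = TRUE)] <- patterns[[p]]
}

# Re-aggregate after relabelling and sort by total, largest first
toy_sum <- aggregate(total ~ EVTYPE, data = toy, FUN = sum)
toy_sum <- toy_sum[order(toy_sum$total, decreasing = TRUE), ]
```

Because the match is a substring match, any label containing the pattern is absorbed into the canonical category, which is convenient but can also merge event types that an analyst might prefer to keep separate.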

Economic Consequences

Preprocessing keeps only the data relevant to this analysis.
The following variables are related to the analysis of economic consequences in this dataset:
- Property damage - PROPDMG
- Property damage exponent - PROPDMGEXP
- Crop damage - CROPDMG
- Crop damage exponent - CROPDMGEXP
Therefore, we select only those variables and perform our analysis on them. Furthermore, the dataset stores an exponent code for property and crop damage (PROPDMGEXP, CROPDMGEXP), which must be converted to a numeric power of ten and then used to scale the corresponding base value (PROPDMG, CROPDMG).

econ_cons = data[,c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
econ_cons$PROPDMGEXP = as.character(econ_cons$PROPDMGEXP)
econ_cons$PROPDMGEXP = gsub("H|h","2",econ_cons$PROPDMGEXP)
econ_cons$PROPDMGEXP = gsub("k|K","3",econ_cons$PROPDMGEXP)
econ_cons$PROPDMGEXP = gsub("m|M","6",econ_cons$PROPDMGEXP)
econ_cons$PROPDMGEXP = gsub("b|B","9",econ_cons$PROPDMGEXP)
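# "-", "+", "?" and "0" are all mapped to exponent 1 (a factor of 10)
econ_cons$PROPDMGEXP = gsub("\\-|\\+|\\?|0","1",econ_cons$PROPDMGEXP)
econ_cons$PROPDMGEXP = as.numeric(econ_cons$PROPDMGEXP)
# blank codes become NA and are treated as exponent 0 (a factor of 1)
econ_cons$PROPDMGEXP[is.na(econ_cons$PROPDMGEXP)] = 0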

econ_cons$CROPDMGEXP=as.character(econ_cons$CROPDMGEXP)
econ_cons$CROPDMGEXP=gsub("H|h","2",econ_cons$CROPDMGEXP)
econ_cons$CROPDMGEXP=gsub("k|K","3",econ_cons$CROPDMGEXP)
econ_cons$CROPDMGEXP=gsub("m|M","6",econ_cons$CROPDMGEXP)
econ_cons$CROPDMGEXP=gsub("b|B","9",econ_cons$CROPDMGEXP)
econ_cons$CROPDMGEXP=gsub("\\-|\\+|\\?|0","1",econ_cons$CROPDMGEXP)
econ_cons$CROPDMGEXP=as.numeric(econ_cons$CROPDMGEXP)
econ_cons$CROPDMGEXP[is.na(econ_cons$CROPDMGEXP)] = 0
# scale each base value by 10 raised to its exponent
econ_cons$PROPDMG = econ_cons$PROPDMG * 10^econ_cons$PROPDMGEXP
econ_cons$CROPDMG = econ_cons$CROPDMG * 10^econ_cons$CROPDMGEXP
# group by event types
econ_cons_sum <- ddply(econ_cons, .(EVTYPE), summarise, prop_damage=sum(PROPDMG),crop_damage=sum(CROPDMG))
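
The exponent conversion above can also be written as a single lookup function. This is a sketch only, not the code used to produce the results in this report: the function name to_exponent and the toy inputs are hypothetical, and the mapping is an attempt to mirror the gsub steps (letters to powers of ten, digits kept as they are, "-", "+", "?" and "0" mapped to 1, everything else to 0).

```r
# Hypothetical helper: convert damage exponent codes to numeric powers of ten
to_exponent <- function(code) {
  code <- toupper(as.character(code))
  letters_map <- c(H = 2, K = 3, M = 6, B = 9)
  out <- unname(letters_map[code])            # NA where the code is not a letter suffix
  is_digit <- grepl("^[1-9]$", code)
  out[is_digit] <- as.numeric(code[is_digit]) # digits 1-9 are used as the exponent itself
  out[code %in% c("-", "+", "?", "0")] <- 1   # same choice as the gsub step above
  out[is.na(out)] <- 0                        # blank or unrecognised codes
  out
}

# Toy usage with made-up values: 25 with code "K" is 25 * 10^3
codes  <- c("K", "m", "", "5", "+", "0", "B")
exps   <- to_exponent(codes)
damage <- 25 * 10^to_exponent("K")
```

Keeping the mapping in one function would let the same rule be applied to both PROPDMGEXP and CROPDMGEXP without repeating the substitution steps.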

Results

Population Health Consequences

head(pop_health, n=10)
##                EVTYPE fatalities injuries total
## 630           TORNADO       5633    91346 96979
## 212              HEAT       3138     9224 12362
## 628 THUNDERSTORM WIND        756     9545 10301
## 127             FLOOD       1525     8604 10129
## 371         LIGHTNING        816     5230  6046
## 347         ICE STORM         89     1975  2064
## 730      WINTER STORM        206     1321  1527
## 281         HIGH WIND        248     1137  1385
## 182              HAIL         15     1361  1376
## 331 HURRICANE/TYPHOON         64     1275  1339

To keep the results focused, only the 10 event types that caused the most fatalities and injuries combined are analysed further.

Fatalities and injuries are plotted separately to show which weather events are most harmful to population health.

library(ggplot2)
library(grid)
plot1 = ggplot(head(pop_health, 10), aes(y=fatalities, x=EVTYPE)) + geom_bar(width=0.5, stat = "identity", fill="red") + labs(x="Event Type", y="Number of\n fatalities per event") + geom_text(aes(label = fatalities), size = 2.5) + theme(axis.text.x = element_text(angle = 30, hjust = 1))

plot2 = ggplot(head(pop_health, 10), aes(y=injuries, x=EVTYPE)) + geom_bar(width=0.5, stat = "identity",fill="olivedrab") + labs(x="Event Type", y="Number of\n injuries per event") + geom_text(aes(label = injuries), size = 2.5) + theme(axis.text.x = element_text(angle = 30, hjust = 1))

pushViewport(viewport(layout = grid.layout(1, 2)))
print(plot1, vp = viewport(layout.pos.row = 1, layout.pos.col = 1))
print(plot2, vp = viewport(layout.pos.row = 1, layout.pos.col = 2))

Figure: number of fatalities (left) and injuries (right) for the 10 most harmful event types.

The preliminary analysis shows that TORNADO is by far the most harmful event type for population health, with 5,633 fatalities and 91,346 injuries. HEAT follows with the second-highest number of fatalities (3,138) and 9,224 injuries; its injuries still outnumber its fatalities, but its ratio of fatalities to injuries is much higher than for tornadoes. THUNDERSTORM WIND and FLOOD follow closely behind HEAT.

Economic Consequences

# add a new feature that shows the total damage done
econ_cons_sum$total <- econ_cons_sum$prop_damage + econ_cons_sum$crop_damage
econ_cons_sum <- econ_cons_sum[order(econ_cons_sum$total, decreasing = TRUE),]
head(econ_cons_sum, n=10)
##                EVTYPE prop_damage crop_damage     total
## 164             FLOOD   1.447e+11   5.662e+09 1.503e+11
## 406 HURRICANE/TYPHOON   6.931e+10   2.608e+09 7.191e+10
## 830           TORNADO   5.695e+10   4.150e+08 5.736e+10
## 666       STORM SURGE   4.332e+10   5.000e+03 4.332e+10
## 238              HAIL   1.574e+10   3.026e+09 1.876e+10
## 147       FLASH FLOOD   1.682e+10   1.421e+09 1.824e+10
## 88            DROUGHT   1.046e+09   1.397e+10 1.502e+10
## 397         HURRICANE   1.187e+10   2.742e+09 1.461e+10
## 586       RIVER FLOOD   5.119e+09   5.029e+09 1.015e+10
## 424         ICE STORM   3.945e+09   5.022e+09 8.967e+09

After sorting the events by combined property and crop damage, we can conclude that FLOOD has the greatest economic consequences, with about 145 billion USD in property damage and about 5.7 billion USD in crop damage, roughly 150 billion USD in total. It is followed by HURRICANE/TYPHOON and TORNADO.

The plots below show property and crop damage for the 10 costliest event types.

plot1 = ggplot(head(econ_cons_sum, 10), aes(y=prop_damage, x=EVTYPE)) + geom_bar(width=0.5, stat = "identity", fill="red") + labs(x="Event Type", y="Property damage per event [USD]") + geom_text(aes(label = round(prop_damage/10^9,1)), size = 2.5) + theme(axis.text.x = element_text(angle = 30, hjust = 1))
plot2 = ggplot(head(econ_cons_sum, 10), aes(y=crop_damage, x=EVTYPE)) + geom_bar(width=0.5, stat = "identity",fill="olivedrab") + labs(x="Event Type", y="Crop damage per event [USD]") + geom_text(aes(label = round(crop_damage/10^9,1)), size = 2.5) + theme(axis.text.x = element_text(angle = 30, hjust = 1))
pushViewport(viewport(layout = grid.layout(1, 2)))
print(plot1, vp = viewport(layout.pos.row = 1, layout.pos.col = 1))
print(plot2, vp = viewport(layout.pos.row = 1, layout.pos.col = 2))

Figure: property damage (left) and crop damage (right) for the 10 costliest event types; bar labels are in billions of USD.

These plots show the severity of the damage and give a clear picture of the most harmful event types. FLOOD has the greatest impact on property, while DROUGHT is the most expensive event type for crops.
Authorities should be aware of these weather disasters and act accordingly to reduce the economic damage.