This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The NOAA database tracks major storms and severe weather events in the United States from 1950 to 2011. It records where and when each event occurred, as well as estimates of fatalities, injuries, and crop and property damage. This report studies the consequences of these severe weather events for population health and the economic damage to property and crops across the USA.
A detailed description of the NOAA database can be found in the code book: https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf
library(R.utils)  # provides bunzip2()
# download and decompress the raw data only if the archive is not already present
if (!"repdata-data-StormData.csv.bz2" %in% dir()) {
  download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile="repdata-data-StormData.csv.bz2")
  bunzip2("repdata-data-StormData.csv.bz2", overwrite=TRUE, remove=FALSE)
}
data = read.csv("repdata-data-StormData.csv")
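As an aside, R can read a bzip2-compressed CSV through a `bzfile` connection, so the explicit decompression step is optional. The following is a minimal sketch on a throwaway file; the temporary file and the toy columns are assumptions made for illustration and are not part of the storm data.

```r
# Write a tiny CSV through a bzip2 connection, then read it back
# without first decompressing it to disk.
tmp <- tempfile(fileext = ".csv.bz2")   # hypothetical toy file
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))

con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() accepts a connection; bzfile() decompresses on the fly
toy_back <- read.csv(bzfile(tmp), stringsAsFactors = FALSE)
print(toy_back)
```

Applied to this report, the same idea would be `read.csv(bzfile("repdata-data-StormData.csv.bz2"))`, at the cost of decompressing on every read.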
Preprocessing consists of keeping only the data relevant to each analysis.
Two variables are related to the population’s health in this dataset:
- Fatalities
- Injuries
Therefore, the analysis will focus only on these two variables to determine which events are most harmful with respect to population health.
library(plyr)
pop_health = data[,c("EVTYPE","FATALITIES","INJURIES")]
pop_health <- ddply(pop_health, .(EVTYPE), summarise, fatalities=sum(FATALITIES),injuries=sum(INJURIES))
pop_health$total <- pop_health$fatalities + pop_health$injuries
pop_health <- pop_health[order(pop_health$total, decreasing = TRUE),]
head(pop_health, n=10)
## EVTYPE fatalities injuries total
## 830 TORNADO 5633 91346 96979
## 123 EXCESSIVE HEAT 1903 6525 8428
## 854 TSTM WIND 504 6957 7461
## 164 FLOOD 470 6789 7259
## 452 LIGHTNING 816 5230 6046
## 269 HEAT 937 2100 3037
## 147 FLASH FLOOD 978 1777 2755
## 424 ICE STORM 89 1975 2064
## 759 THUNDERSTORM WIND 133 1488 1621
## 972 WINTER STORM 206 1321 1527
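The grouping step above can also be written in base R. This is a minimal sketch on invented toy rows; `aggregate` is swapped in for `ddply` purely for illustration and is not the method used in this report.

```r
# Toy data (invented values) with the same column names as the storm data
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(2, 1, 3, 4),
  INJURIES   = c(10, 0, 5, 1)
)

# Sum both health variables per event type
sums <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Combined harm, sorted from most to least harmful
sums$total <- sums$FATALITIES + sums$INJURIES
sums <- sums[order(sums$total, decreasing = TRUE), ]
print(sums)
```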
Note that several different terms are used for the same weather events, such as TSTM WIND and THUNDERSTORM WIND, or HEAT and EXCESSIVE HEAT. Here we clean up those occurrences by combining event types that are essentially the same.
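# The patterns are regular expressions, not globs: "THUNDER*" matches any name
# containing "THUNDE" (the final R is optional) and "TSTM*" any name containing "TST".
pop_health$EVTYPE[grepl("THUNDER*", pop_health$EVTYPE, ignore.case = TRUE)] = "THUNDERSTORM WIND"
pop_health$EVTYPE[grepl("TSTM*", pop_health$EVTYPE, ignore.case = TRUE)] = "THUNDERSTORM WIND"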
pop_health$EVTYPE[grepl("FLOOD", pop_health$EVTYPE, ignore.case = TRUE)] = "FLOOD"
pop_health$EVTYPE[grepl("HEAT", pop_health$EVTYPE, ignore.case = TRUE)] = "HEAT"
pop_health <- ddply(pop_health, .(EVTYPE), summarise, fatalities=sum(fatalities),injuries=sum(injuries), total = sum(total))
pop_health <- pop_health[order(pop_health$total, decreasing = TRUE),]
head(pop_health, n=10)
## EVTYPE fatalities injuries total
## 630 TORNADO 5633 91346 96979
## 212 HEAT 3138 9224 12362
## 628 THUNDERSTORM WIND 756 9545 10301
## 127 FLOOD 1525 8604 10129
## 371 LIGHTNING 816 5230 6046
## 347 ICE STORM 89 1975 2064
## 730 WINTER STORM 206 1321 1527
## 281 HIGH WIND 248 1137 1385
## 182 HAIL 15 1361 1376
## 331 HURRICANE/TYPHOON 64 1275 1339
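The merging of event names can be sketched on a few toy rows. The helper name `merge_event_names` and the toy values are hypothetical; the sketch works on character vectors (rather than factors) and uses a single alternation pattern, which is a simplification of the code above, not a replacement for it.

```r
# Toy data (invented totals) with deliberately inconsistent event names
toy <- data.frame(
  EVTYPE = c("TSTM WIND", "THUNDERSTORM WINDS", "FLASH FLOOD", "Heat Wave", "TORNADO"),
  total  = c(5, 7, 3, 2, 9),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse name variants into one label per event family
merge_event_names <- function(x) {
  x <- toupper(trimws(x))                                    # normalise case and spacing
  x[grepl("THUNDER|TSTM", x)] <- "THUNDERSTORM WIND"
  x[grepl("FLOOD", x)]        <- "FLOOD"
  x[grepl("HEAT", x)]         <- "HEAT"
  x
}

toy$EVTYPE <- merge_event_names(toy$EVTYPE)

# Re-aggregate so that each merged label appears once
merged <- aggregate(total ~ EVTYPE, data = toy, FUN = sum)
print(merged)
```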
Preprocessing consists of keeping only the data relevant to this specific analysis.
The following variables are related to the analysis of economic consequences in this dataset:
- Property damage - PROPDMG
- Property damage exponent - PROPDMGEXP
- Crop damage - CROPDMG
- Crop damage exponent - CROPDMGEXP
Therefore, we select only those variables and perform the analysis on them. Furthermore, the dataset stores an exponent code for property and crop damage (PROPDMGEXP, CROPDMGEXP), which must be converted to a numeric power of ten and applied to the corresponding base value (PROPDMG, CROPDMG).
econ_cons = data[,c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
econ_cons$PROPDMGEXP = as.character(econ_cons$PROPDMGEXP)
econ_cons$PROPDMGEXP = gsub("H|h","2",econ_cons$PROPDMGEXP)
econ_cons$PROPDMGEXP = gsub("k|K","3",econ_cons$PROPDMGEXP)
econ_cons$PROPDMGEXP = gsub("m|M","6",econ_cons$PROPDMGEXP)
econ_cons$PROPDMGEXP = gsub("b|B","9",econ_cons$PROPDMGEXP)
econ_cons$PROPDMGEXP = gsub("\\-|\\+|\\?|0","1",econ_cons$PROPDMGEXP)
econ_cons$PROPDMGEXP = as.numeric(econ_cons$PROPDMGEXP)
econ_cons$PROPDMGEXP[is.na(econ_cons$PROPDMGEXP)] = 0
econ_cons$CROPDMGEXP=as.character(econ_cons$CROPDMGEXP)
econ_cons$CROPDMGEXP=gsub("H|h","2",econ_cons$CROPDMGEXP)
econ_cons$CROPDMGEXP=gsub("k|K","3",econ_cons$CROPDMGEXP)
econ_cons$CROPDMGEXP=gsub("m|M","6",econ_cons$CROPDMGEXP)
econ_cons$CROPDMGEXP=gsub("b|B","9",econ_cons$CROPDMGEXP)
econ_cons$CROPDMGEXP=gsub("\\-|\\+|\\?|0","1",econ_cons$CROPDMGEXP)
econ_cons$CROPDMGEXP=as.numeric(econ_cons$CROPDMGEXP)
econ_cons$CROPDMGEXP[is.na(econ_cons$CROPDMGEXP)] = 0
# scale each base value by ten raised to its exponent
econ_cons$PROPDMG = econ_cons$PROPDMG * 10^econ_cons$PROPDMGEXP
econ_cons$CROPDMG = econ_cons$CROPDMG * 10^econ_cons$CROPDMGEXP
# group by event types
econ_cons_sum <- ddply(econ_cons, .(EVTYPE), summarise, prop_damage=sum(PROPDMG),crop_damage=sum(CROPDMG))
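The exponent conversion can be illustrated with a small lookup table. This is a sketch under stated assumptions, not the code used above: the helper `exp_to_power` is hypothetical, it takes single digits literally, and it maps blanks and unrecognised symbols to 0, whereas the report's code maps "-", "+", "?" and "0" to 1.

```r
# Hypothetical helper: translate a damage-exponent code into a power of ten
exp_to_power <- function(code) {
  lookup <- c(H = 2, K = 3, M = 6, B = 9)       # hundred, thousand, million, billion
  code   <- toupper(as.character(code))
  power  <- unname(lookup[code])                # NA for anything not in the table

  # Assumption: a single digit is used as the exponent itself
  is_digit <- grepl("^[0-9]$", code)
  power[is_digit] <- as.numeric(code[is_digit])

  # Assumption: blanks and unknown symbols leave the base value unscaled
  power[is.na(power)] <- 0
  power
}

# Worked example: 25 with "K", 2.5 with "M", and 10 with a blank exponent
base_value <- c(25, 2.5, 10)
exp_code   <- c("K", "M", "")
damage     <- base_value * 10^exp_to_power(exp_code)
print(damage)
```

The three toy entries come out as 25 thousand, 2.5 million and 10 USD respectively.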
head(pop_health, n=10)
## EVTYPE fatalities injuries total
## 630 TORNADO 5633 91346 96979
## 212 HEAT 3138 9224 12362
## 628 THUNDERSTORM WIND 756 9545 10301
## 127 FLOOD 1525 8604 10129
## 371 LIGHTNING 816 5230 6046
## 347 ICE STORM 89 1975 2064
## 730 WINTER STORM 206 1321 1527
## 281 HIGH WIND 248 1137 1385
## 182 HAIL 15 1361 1376
## 331 HURRICANE/TYPHOON 64 1275 1339
To keep the results focused, only the 10 event types that caused the most injuries and fatalities combined are analysed further.
The plots below show fatalities and injuries for these top event types, i.e. the weather events most harmful to population health.
library(ggplot2)
library(grid)
plot1 = ggplot(head(pop_health, 10), aes(y=fatalities, x=EVTYPE)) + geom_bar(width=0.5, stat = "identity", fill="red") + labs(x="Event Type", y="Number of\n fatalities per event") + geom_text(aes(label = fatalities), size = 2.5) + theme(axis.text.x = element_text(angle = 30, hjust = 1))
plot2 = ggplot(head(pop_health, 10), aes(y=injuries, x=EVTYPE)) + geom_bar(width=0.5, stat = "identity",fill="olivedrab") + labs(x="Event Type", y="Number of\n injuries per event") + geom_text(aes(label = injuries), size = 2.5) + theme(axis.text.x = element_text(angle = 30, hjust = 1))
grid.newpage()  # start from a clean page before laying out the two panels
pushViewport(viewport(layout = grid.layout(1, 2)))
print(plot1, vp = viewport(layout.pos.row = 1, layout.pos.col = 1))
print(plot2, vp = viewport(layout.pos.row = 1, layout.pos.col = 2))
This preliminary analysis of the effects of severe weather events on population health shows that TORNADO is by far the most harmful event type, with 5633 fatalities and 91346 injuries. It is followed by HEAT, which ranks second in fatalities (3138) and in combined casualties (12362), although its injuries (9224) still outnumber its fatalities. These two event types are closely followed by THUNDERSTORM WIND and FLOOD.
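ggplot2 places a discrete x axis in factor-level order, which is alphabetical for plain character data, so the bars do not appear in ranking order by default. A minimal sketch of one way to sort them, assuming `reorder()` from base R and using the fatality counts of the four leading event types from the table above:

```r
# Fatality counts copied from the merged table above
top <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "THUNDERSTORM WIND", "FLOOD"),
  fatalities = c(5633, 3138, 756, 1525)
)

# reorder() sorts factor levels by a numeric key; negating the key
# puts the event type with the most fatalities first
top$EVTYPE <- reorder(top$EVTYPE, -top$fatalities)
print(levels(top$EVTYPE))

# The reordered factor can be passed to ggplot2 unchanged (built only if installed)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(top, ggplot2::aes(x = EVTYPE, y = fatalities)) +
    ggplot2::geom_bar(stat = "identity", width = 0.5)
}
```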
# add a new feature that shows the total damage done
econ_cons_sum$total <- econ_cons_sum$prop_damage + econ_cons_sum$crop_damage
econ_cons_sum <- econ_cons_sum[order(econ_cons_sum$total, decreasing = TRUE),]
head(econ_cons_sum, n=10)
## EVTYPE prop_damage crop_damage total
## 164 FLOOD 1.447e+11 5.662e+09 1.503e+11
## 406 HURRICANE/TYPHOON 6.931e+10 2.608e+09 7.191e+10
## 830 TORNADO 5.695e+10 4.150e+08 5.736e+10
## 666 STORM SURGE 4.332e+10 5.000e+03 4.332e+10
## 238 HAIL 1.574e+10 3.026e+09 1.876e+10
## 147 FLASH FLOOD 1.682e+10 1.421e+09 1.824e+10
## 88 DROUGHT 1.046e+09 1.397e+10 1.502e+10
## 397 HURRICANE 1.187e+10 2.742e+09 1.461e+10
## 586 RIVER FLOOD 5.119e+09 5.029e+09 1.015e+10
## 424 ICE STORM 3.945e+09 5.022e+09 8.967e+09
After sorting the events by the combined damage they have caused to property and crops, we can conclude that FLOOD has the greatest economic consequences, causing about 145 billion USD in property damage and about 5.7 billion USD in crop damage, roughly 150 billion USD in total. FLOOD is followed by HURRICANE/TYPHOON and TORNADO.
The plots below show the 10 event types with the highest combined damage.
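The dollar figures can be read off the table by dividing by 10^9. A small sketch using the FLOOD row exactly as printed above (values already rounded to four significant digits in the output):

```r
# FLOOD row from the table above, in USD
flood <- c(prop_damage = 1.447e11, crop_damage = 5.662e9)

# Convert to billions of USD, one decimal place
flood_bn <- round(flood / 1e9, 1)
print(flood_bn)
```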
plot1 = ggplot(head(econ_cons_sum, 10), aes(y=prop_damage, x=EVTYPE)) + geom_bar(width=0.5, stat = "identity", fill="red") + labs(x="Event Type", y="Property damage per event [USD]") + geom_text(aes(label = round(prop_damage/10^9,1)), size = 2.5) + theme(axis.text.x = element_text(angle = 30, hjust = 1))
plot2 = ggplot(head(econ_cons_sum, 10), aes(y=crop_damage, x=EVTYPE)) + geom_bar(width=0.5, stat = "identity",fill="olivedrab") + labs(x="Event Type", y="Crop damage per event [USD]") + geom_text(aes(label = round(crop_damage/10^9,1)), size = 2.5) + theme(axis.text.x = element_text(angle = 30, hjust = 1))
grid.newpage()  # start from a clean page before laying out the two panels
pushViewport(viewport(layout = grid.layout(1, 2)))
print(plot1, vp = viewport(layout.pos.row = 1, layout.pos.col = 1))
print(plot2, vp = viewport(layout.pos.row = 1, layout.pos.col = 2))
These plots show the scale of the damage and give a clear picture of the most harmful event types; the bar labels give the damage in billions of USD. FLOOD has the greatest impact on property, while DROUGHT is the most expensive event type for crops, at about 14 billion USD.
Authorities should be aware of these weather hazards and plan accordingly to reduce economic damage.