This analysis is a project for the Coursera Reproducible Research course, part of the Data Science Specialization. The goal is to explore the NOAA Storm Database and assess the effects of severe weather events on both population health and the economy.
The database covers the period from 1950 to November 2011. The analysis investigates which types of severe weather events are most harmful to population health, measured by injuries and fatalities. The economic consequences are then analyzed by examining the financial damage to property and crops.
The data can be downloaded from the course website: Storm Data. Documentation of the database is available in the National Weather Service Storm Data Documentation and the National Climatic Data Center Storm Events FAQ.
First, load the packages needed for the analysis.
library(plyr)
library(ggplot2)
library(magrittr)
library(gridExtra)
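# read.csv reads the bzip2-compressed file directly.
StormData <- read.csv("StormData.csv.bz2")
# Assumption: StormClean is the subset of columns used in the rest of the analysis.
StormClean <- StormData[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]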
The fatalities and injuries are summed by event type, then sorted in decreasing order.
TotalFatalities <- aggregate(FATALITIES ~ EVTYPE, StormClean, sum) %>% arrange(desc(FATALITIES))
TotalInjuries <- aggregate(INJURIES ~ EVTYPE, StormClean, sum) %>% arrange(desc(INJURIES))
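The sum-then-sort pattern used here can be checked on a few rows before running it on the full database. The sketch below uses invented toy values (not taken from the storm data) and base R's `order()` in place of `arrange(desc(...))`, so it runs without any packages.

```r
# Toy data standing in for the storm records (values are invented for illustration).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0)
)

# Sum fatalities within each event type.
toy_totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

# Sort by the summed column, largest first (base-R counterpart of arrange(desc(...))).
toy_totals <- toy_totals[order(-toy_totals$FATALITIES), ]
rownames(toy_totals) <- NULL
print(toy_totals)
```

TORNADO comes first with 5 fatalities, followed by HEAT with 4 and FLOOD with 1.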
According to the National Weather Service Instruction, the exponents are stored in a separate column: “K” for thousands, “M” for millions, and “B” for billions. Take a look at the distinct values in “PROPDMGEXP” and “CROPDMGEXP”.
unique(StormClean$PROPDMGEXP)
## [1] K M B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
unique(StormClean$CROPDMGEXP)
## [1] M K m B ? 0 k 2
## Levels: ? 0 2 B k K m M
To get the numeric value of the damage, the exponent symbol must first be converted to a number. Build a function that converts the exponents according to the levels found in the previous step.
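# Convert a damage exponent symbol into a power of ten.
GetExpValue <- function(x)
{
  # read.csv may return these columns as factors; as.numeric() on a factor
  # returns the level code, so convert to character first.
  x <- as.character(x)
  if(x=='h' || x=='H')
    return(2)
  else if(x=='k' || x=='K')
    return(3)
  else if(x=='m' || x=='M')
    return(6)
  else if(x=='B')
    return(9)
  # Digits are used as-is; any other symbol gives NA and is treated as 0.
  num <- suppressWarnings(as.numeric(x))
  if(!is.na(num))
    return(num)
  else return(0)
}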
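The same symbol-to-exponent mapping can also be written as a vectorized lookup, which avoids calling a function once per row. This is a minimal sketch, not the method used in this report: `exp_lookup` and `GetExpValueVec` are hypothetical names, and treating every unrecognized symbol as exponent 0 is an assumption carried over from the function above.

```r
# Hypothetical lookup table: exponent symbol -> power of ten.
exp_lookup <- c(h = 2, H = 2, k = 3, K = 3, m = 6, M = 6, B = 9)

GetExpValueVec <- function(x) {
  x <- as.character(x)                  # accept factor or character input
  out <- unname(exp_lookup[x])          # NA where the symbol is not in the table
  digits <- grepl("^[0-9]$", x)         # single-digit exponents are used as-is
  out[digits] <- as.numeric(x[digits])
  out[is.na(out)] <- 0                  # "", "+", "-", "?" and anything else -> 0
  out
}

GetExpValueVec(c("K", "m", "B", "5", "?", ""))
# 3 6 9 5 0 0
```

Because the function takes the whole column at once, it could be called as `GetExpValueVec(StormClean$PROPDMGEXP)` without `sapply`.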
Then calculate the actual damage value in dollars by multiplying each damage figure by ten raised to its exponent.
PropExpValue <- sapply(StormClean$PROPDMGEXP, FUN = GetExpValue)
CropExpValue <- sapply(StormClean$CROPDMGEXP, FUN = GetExpValue)
StormClean$PropDmgVal <- StormClean$PROPDMG * 10^PropExpValue
StormClean$CropDmgVal <- StormClean$CROPDMG * 10^CropExpValue
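As a quick arithmetic check of this step, the sketch below applies the same multiplication to three invented damage figures whose exponents correspond to “K”, “M”, and “B”. The vectors `dmg` and `expo` are hypothetical and not taken from the database.

```r
# Invented example rows: damage figure and its exponent as a power of ten.
dmg  <- c(25, 2.5, 1.2)
expo <- c(3, 6, 9)            # "K", "M", "B"

# Element-wise: 25 * 10^3, 2.5 * 10^6, 1.2 * 10^9
dmg_val <- dmg * 10^expo
print(dmg_val)
```

So a record of 25 with exponent “K” is 25,000 dollars, 2.5 with “M” is 2.5 million dollars, and 1.2 with “B” is 1.2 billion dollars.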
The property damage and crop damage values are summed by event type, then sorted in decreasing order.
PropDmgSorted <- aggregate(PropDmgVal ~ EVTYPE, StormClean, sum) %>% arrange(desc(PropDmgVal))
CropDmgSorted <- aggregate(CropDmgVal ~ EVTYPE, StormClean, sum) %>% arrange(desc(CropDmgVal))
List the top 10 events that cause the most fatalities and the most injuries, then show bar charts of those top 10 events. Tornado is the most harmful event with respect to population health: it has the highest number of both fatalities and injuries, far above any other event. Excessive Heat also has a severe impact on population health, ranking second in fatalities and fourth in injuries.
head(TotalFatalities,10)
## EVTYPE FATALITIES
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
## 7 FLOOD 470
## 8 RIP CURRENT 368
## 9 HIGH WIND 248
## 10 AVALANCHE 224
head(TotalInjuries,10)
## EVTYPE INJURIES
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
## 7 ICE STORM 1975
## 8 FLASH FLOOD 1777
## 9 THUNDERSTORM WIND 1488
## 10 HAIL 1361
# Event types are ordered by name by default, so reorder them by the plotted value.
g1 <- ggplot(TotalFatalities[1:10,], aes(reorder(EVTYPE, FATALITIES),FATALITIES)) + coord_flip() + geom_col() + xlab("Event Type") + ylab("Total Fatalities") + ggtitle("Top 10 Events of Population Health Impact")
g2 <- ggplot(TotalInjuries[1:10,], aes(reorder(EVTYPE, INJURIES),INJURIES)) + coord_flip() + geom_col() + xlab("Event Type") + ylab("Total Injuries")
grid.arrange(g1, g2, ncol=1)
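The effect of `reorder()` in the plotting calls can be seen on a small example. The sketch below uses invented values and only base R (the `stats` package that ships with R), comparing the default alphabetical level order with the order obtained from the plotted value.

```r
# Invented toy totals, for illustration only.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "AVALANCHE", "FLOOD"),
  FATALITIES = c(9, 5, 2)
)

# Default factor levels are alphabetical.
by_name <- factor(toy$EVTYPE)
levels(by_name)

# reorder() sorts the levels by the value, in ascending order.
by_value <- reorder(toy$EVTYPE, toy$FATALITIES)
levels(by_value)
```

The first call gives AVALANCHE, FLOOD, TORNADO and the second gives FLOOD, AVALANCHE, TORNADO. With `coord_flip()`, the first level is drawn at the bottom, so the ascending order places the largest bar at the top of the chart.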
Also list the top 10 events with the most severe economic consequences, and show the bar plots. Flash Flood has the most property damage, far more than any other event. Drought has the most crop damage.
head(PropDmgSorted,10)
## EVTYPE PropDmgVal
## 1 FLASH FLOOD 6.820237e+13
## 2 THUNDERSTORM WINDS 2.086532e+13
## 3 TORNADO 1.078951e+12
## 4 HAIL 3.157558e+11
## 5 LIGHTNING 1.729433e+11
## 6 FLOOD 1.446577e+11
## 7 HURRICANE/TYPHOON 6.930584e+10
## 8 FLOODING 5.920826e+10
## 9 STORM SURGE 4.332354e+10
## 10 HEAVY SNOW 1.793259e+10
head(CropDmgSorted,10)
g3 <- ggplot(PropDmgSorted[1:10,], aes(reorder(EVTYPE, PropDmgVal),PropDmgVal)) + coord_flip() + geom_col() + xlab("Event Type") + ylab("Total Property Damage ($)") + ggtitle("Top 10 Events of Economy Consequence")
g4 <- ggplot(CropDmgSorted[1:10,], aes(reorder(EVTYPE, CropDmgVal),CropDmgVal)) + coord_flip() + geom_col() + xlab("Event Type") + ylab("Total Crop Damage ($)")
grid.arrange(g3, g4, ncol=1)