In this report, we explore the NOAA Storm Database to answer some basic questions about the impact of storms. We look for the event types that have the greatest impact on population health and on the economy.
We address the following two questions:
Which event types are most harmful to population health?
Which event types have the greatest economic consequences?
To answer the first question, we compute the total number of fatalities and injuries for each event type and make bar plots of the 10 event types with the highest totals.
Similarly, to answer the second question, we compute the total damage caused by each event type and make a bar plot of the 10 event types that are most costly for the economy.
The storm data is available as a bzip2-compressed CSV file at the URL assigned to fileurl in the code below.
We download the file to a temporary location, read it into a data frame named AllData, and then delete the temporary file. Because the data set is large, the chunk uses cache = TRUE.
fileurl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
temp <- tempfile()
download.file(url = fileurl, destfile = temp, method = 'curl')
AllData <- read.csv(temp, stringsAsFactors = FALSE)
unlink(temp)
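The loading step relies on read.csv() detecting bzip2 compression from the file contents, since the temporary file has no .bz2 extension. The following minimal sketch illustrates that behaviour without a download; the toy data frame and the name tmp are made up for illustration and are not part of the storm data.

```r
# Toy data standing in for the storm file (hypothetical values)
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"),
                  FATALITIES = c(2, 0),
                  stringsAsFactors = FALSE)

# Write a bzip2-compressed CSV to a temporary file without a .bz2 extension
tmp <- tempfile()
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path with file(), which detects the compression itself
back <- read.csv(tmp, stringsAsFactors = FALSE)
unlink(tmp)
back
```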
We first consider which event types cause the most harm to population health, using the two variables “FATALITIES” and “INJURIES”. We build a data frame with the total number of fatalities per event type, sorted in decreasing order, and later make a bar plot of the 10 event types with the most fatalities. We do the same for injuries.
library(dplyr)
library(ggplot2)
# total fatalities per event type, largest first
Fatalities <- AllData %>%
    select(EVTYPE, FATALITIES) %>%
    group_by(EVTYPE) %>%
    summarise(Fatalities = sum(FATALITIES)) %>%
    mutate(EVTYPE = reorder(EVTYPE, desc(Fatalities))) %>%
    arrange(desc(Fatalities))
head(Fatalities)
## Source: local data frame [6 x 2]
##
## EVTYPE Fatalities
## (fctr) (dbl)
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
# total injuries per event type, largest first
Injuries <- AllData %>%
    select(EVTYPE, INJURIES) %>%
    group_by(EVTYPE) %>%
    summarise(Injuries = sum(INJURIES)) %>%
    mutate(EVTYPE = reorder(EVTYPE, desc(Injuries))) %>%
    arrange(desc(Injuries))
head(Injuries)
## Source: local data frame [6 x 2]
##
## EVTYPE Injuries
## (fctr) (dbl)
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
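The group-and-sum pattern used for both summaries can be cross-checked in base R. The sketch below is only an illustration on made-up numbers, not the storm data; toy and health are hypothetical names. It computes both totals in one call to aggregate() and sorts by fatalities.

```r
# Toy records (hypothetical values), one row per storm event
toy <- data.frame(
    EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
    FATALITIES = c(3, 1, 2, 0, 1),
    INJURIES   = c(10, 0, 5, 2, 1),
    stringsAsFactors = FALSE
)

# Sum both health variables per event type in a single step
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Sort with the deadliest event type first
health <- health[order(-health$FATALITIES), ]
health
```

On the real data, the same call with data = AllData should reproduce the totals shown in the two tables above.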
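To answer our second question, we need to calculate the total cost of the property and crop damage caused by each event type. The damage amounts (PROPDMG, CROPDMG) are stored together with exponent codes (PROPDMGEXP, CROPDMGEXP) that give the multiplier for each amount.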
cost <- AllData %>% select(EVTYPE, PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP)
cost$PROPDMGEXP <- as.factor(cost$PROPDMGEXP)
cost$CROPDMGEXP <- as.factor(cost$CROPDMGEXP)
# check the levels of the damage multiplier codes
levels(cost$PROPDMGEXP)
## [1] "" "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"
levels(cost$CROPDMGEXP)
## [1] "" "?" "0" "2" "B" "k" "K" "m" "M"
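The exponent codes listed above can also be translated with a small lookup function instead of relabelling factor levels. The sketch below is one possible approach, not the method used in this report. The helper exp_to_multiplier is a hypothetical name, and it assumes the same mapping that this report applies: the letters H, K, M and B (in either case) stand for 1e2, 1e3, 1e6 and 1e9, a single digit d stands for 10^d, and any other code counts as 1.

```r
# Hypothetical helper: translate exponent codes into numeric multipliers
exp_to_multiplier <- function(code) {
    code <- toupper(as.character(code))
    letter_map <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

    # Default of 1 covers "", "-", "?" and "+"
    out <- rep(1, length(code))

    # A single digit d is read as 10^d, so "0" stays at 1
    is_digit <- grepl("^[0-9]$", code)
    out[is_digit] <- 10^as.numeric(code[is_digit])

    # Letters are looked up by name
    is_letter <- code %in% names(letter_map)
    out[is_letter] <- letter_map[code[is_letter]]

    out
}

exp_to_multiplier(c("K", "m", "", "B", "5", "?", "h", "0"))
```

Because the function returns plain numbers, the result can be multiplied with PROPDMG or CROPDMG directly, with no factor conversion involved.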
# replace each multiplier code with the corresponding number
levels(cost$PROPDMGEXP) <- c(1,1,1,1,1,10,100,1000,10000,100000,1000000,
10000000,100000000,1000000000,100,100,1000,
1000000,1000000)
levels(cost$CROPDMGEXP) <- c(1,1,1,100,1000000000,1000,1000,1000000,1000000)
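# make a data frame with the total cost for each event type
# (convert the factor labels with as.character() first: as.numeric() on a
#  factor returns the integer level codes, not the multipliers)
totalcost <- cost %>%
    mutate(TotalDamage = PROPDMG * as.numeric(as.character(PROPDMGEXP)) +
               CROPDMG * as.numeric(as.character(CROPDMGEXP))) %>%
    select(EVTYPE, TotalDamage) %>%
    group_by(EVTYPE) %>%
    summarise(Damage = sum(TotalDamage)) %>%
    mutate(EVTYPE = reorder(EVTYPE, desc(Damage))) %>%
    arrange(desc(Damage))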
head(totalcost)
## Source: local data frame [6 x 2]
##
## EVTYPE Damage
## (fctr) (dbl)
## 1 TORNADO 13394010
## 2 FLASH FLOOD 6438660
## 3 TSTM WIND 5790410
## 4 HAIL 5114498
## 5 FLOOD 4341956
## 6 THUNDERSTORM WIND 3782310
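Converting a factor whose labels are numbers, as in the multiplier step above, deserves care. In R, as.numeric() applied to a factor returns the integer level codes, not the numbers written in the labels; going through as.character() first recovers the label values. The following toy sketch (made-up values, not the storm data) shows the difference.

```r
# Toy factor whose labels are numbers (hypothetical values)
mult <- factor(c("1000", "1e+06", "1000"))

# Level codes: the position of each label in levels(mult)
codes <- as.numeric(mult)

# Label values: convert to character first, then to numeric
values <- as.numeric(as.character(mult))

codes    # 1 2 1
values   # 1e+03 1e+06 1e+03
```

A damage total computed from level codes would be many orders of magnitude smaller than one computed from the label values, so it is worth checking which of the two a calculation uses.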
Here is the bar plot for fatalities.
Fatal <- Fatalities[1:10,] %>%
mutate(EVTYPE = reorder(EVTYPE, Fatalities))
ggplot(Fatal, aes(x = EVTYPE, y = Fatalities)) +
    geom_bar(stat = "identity", fill = "blue") +
    coord_flip() +
    ggtitle("Total Fatalities for Each Event") +
    labs(x = "Event Type", y = "Fatalities")
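The bars come out sorted because reorder() sets the factor levels of EVTYPE in increasing order of the totals, and ggplot2 draws a discrete axis in level order; after coord_flip() the last level, which is the largest total, ends up at the top. A minimal base R sketch of what reorder() does, using three of the fatality totals reported above (events and ordered_events are illustrative names):

```r
# Three event types with their fatality totals from the table above
events     <- c("TORNADO", "HEAT", "FLASH FLOOD")
fatalities <- c(5633, 937, 978)

# reorder() returns a factor whose levels are sorted by the second argument
ordered_events <- reorder(events, fatalities)

levels(ordered_events)   # "HEAT" "FLASH FLOOD" "TORNADO"
```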
Here is the bar plot for injuries.
Injury <- Injuries[1:10,] %>%
mutate(EVTYPE = reorder(EVTYPE, Injuries))
ggplot(Injury, aes(x = EVTYPE, y = Injuries)) +
    geom_bar(stat = "identity", fill = "blue") +
    coord_flip() +
    ggtitle("Total Injuries for Each Event") +
    labs(x = "Event Type", y = "Injuries")
Based on these totals, the event types most harmful to population health are tornadoes, which lead both rankings by a wide margin, followed by heat (HEAT and EXCESSIVE HEAT), thunderstorm wind (TSTM WIND) and floods.
Next we draw the bar plot of the total cost of damage caused by each event type, restricted to the 10 most costly event types.
TotalCost <- mutate(totalcost[1:10,], EVTYPE = reorder(EVTYPE, Damage))
ggplot(TotalCost, aes(x = EVTYPE, y = Damage)) +
    geom_bar(stat = "identity", fill = "blue") +
    coord_flip() +
    ggtitle("Total Cost for Each Event") +
    labs(x = "Event Type", y = "Total Damage ($)")
So the event types most destructive to the economy are tornadoes, flash floods and thunderstorm wind (TSTM WIND).