Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database that tracks characteristics of major storms and weather events in the United States between 1950 and November 2011, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The goal of this analysis is to answer the following questions about the effects of severe weather events:
1. Across the United States, which types of events are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?
The data for this assignment can be downloaded from the course web site: Storm Data
Database documentation is also available: National Weather Service Storm Data Documentation and National Climatic Data Center Storm Events FAQ.
The following packages were used for this analysis:
library(dplyr)
library(ggplot2)
library(gridExtra)
library(grid)
Download the data set into the current working directory and read it into R
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileUrl, "./stormData.csv.bz2")
stormData <- read.csv(bzfile("stormData.csv.bz2"))
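Because the compressed file is large, re-running the report downloads it again each time. A minimal sketch of one way to skip the download when the file is already present is shown below; the helper name `fetch_if_missing` is hypothetical and not part of the original analysis.

```r
# Hypothetical helper (an assumption, not from the original report):
# download a file only if it is not already on disk.
fetch_if_missing <- function(url, dest) {
  if (!file.exists(dest)) {
    # mode = "wb" keeps the compressed .bz2 file intact on Windows
    download.file(url, dest, mode = "wb")
  }
  invisible(dest)
}
```

It could be called as `fetch_if_missing(fileUrl, "stormData.csv.bz2")` in place of the plain `download.file()` call.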
Subset data for columns pertaining to health and economic consequences of severe weather events
stormDatasub <- stormData[,c("EVTYPE","FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
Next the effects on population health and economic consequences are investigated.
Fatalities summarized by event type in descending order
fatalityData <- stormDatasub %>%
  group_by(EVTYPE) %>%
  summarize(Fatalities = sum(FATALITIES, na.rm = TRUE)) %>%
  arrange(desc(Fatalities))
Injuries summarized by event type in descending order
injuryData <- stormDatasub %>%
  group_by(EVTYPE) %>%
  summarize(Injuries = sum(INJURIES, na.rm = TRUE)) %>%
  arrange(desc(Injuries))
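To make the grouped sum concrete, the sketch below repeats the same idea on a made-up four-row data frame. It uses base R `tapply()` instead of dplyr so that it runs without extra packages; the `toy` data and its values are invented for illustration only.

```r
# Toy data (invented for illustration, not taken from the NOAA database)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(3, 1, 2, NA)
)

# Sum fatalities per event type, ignoring missing values,
# then sort from largest to smallest total
totals <- tapply(toy$FATALITIES, toy$EVTYPE, sum, na.rm = TRUE)
totals <- sort(totals, decreasing = TRUE)
print(totals)
```

Here TORNADO totals 5, FLOOD totals 1, and HEAT totals 0 because its only value is missing and `na.rm = TRUE` drops it.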
To calculate financial damage, a function must be created to convert the letter codes stored in a separate column into usable power-of-ten exponents
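getExp <- function(e) {
  # Coerce to character first: on a factor, as.numeric() would return the
  # internal level code instead of the digit stored in the column
  e <- as.character(e)
  if (e %in% c("h", "H"))
    return(2)
  else if (e %in% c("k", "K"))
    return(3)
  else if (e %in% c("m", "M"))
    return(6)
  else if (e %in% c("b", "B"))
    return(9)
  else if (e %in% c("", "-", "?", "+"))
    return(0)
  else if (!is.na(suppressWarnings(as.numeric(e))))
    return(as.numeric(e))
  else {
    stop("Invalid value.")
  }
}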
The function is then called to calculate property and crop damage
propExp <- sapply(stormDatasub$PROPDMGEXP, FUN = getExp)
stormDatasub$propDamage <- stormDatasub$PROPDMG * (10^propExp)
cropExp <- sapply(stormDatasub$CROPDMGEXP, FUN = getExp)
stormDatasub$cropDamage <- stormDatasub$CROPDMG * (10^cropExp)
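The conversion can also be written as a vectorized lookup, which avoids calling a function once per row. The sketch below is an alternative under stated assumptions, not the method used in this report: the helper name `exponent_of` and the `toy` data are hypothetical, and unlike `getExp` it returns `NA` for unrecognized codes instead of stopping.

```r
# Hypothetical vectorized alternative to getExp (an assumption, not the
# report's method). Unknown codes yield NA rather than an error.
exponent_of <- function(codes) {
  codes  <- toupper(as.character(codes))    # handles factors and lower case
  lookup <- c(H = 2, K = 3, M = 6, B = 9)
  out    <- unname(lookup[codes])           # NA for anything not in the lookup
  is_digit <- grepl("^[0-9]$", codes)       # single digits are used as-is
  out[is_digit] <- as.numeric(codes[is_digit])
  out[codes %in% c("", "-", "?", "+")] <- 0 # blanks and symbols mean 10^0
  out
}

# Toy rows (invented): 25 with "K", 2.5 with "M", 1 with "", 10 with "2"
toy <- data.frame(dmg = c(25, 2.5, 1, 10), exp = c("K", "M", "", "2"))
toy$dollars <- toy$dmg * 10^exponent_of(toy$exp)
print(toy)
```

For the toy rows this gives 25000, 2500000, 1, and 1000 dollars respectively, for example 25 with code "K" becomes 25 × 10^3.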
Financial damage to property and crops is then summarized by event type
econDamage <- stormDatasub %>%
  group_by(EVTYPE) %>%
  summarize(propDamage = sum(propDamage), cropDamage = sum(cropDamage))
Events that caused no financial damage are omitted
econDamage <- econDamage[econDamage$propDamage > 0 | econDamage$cropDamage > 0, ]
The data are then sorted in decreasing order of property damage and of crop damage
propDmgSorted <- econDamage[order(econDamage$propDamage, decreasing = TRUE), ]
cropDmgSorted <- econDamage[order(econDamage$cropDamage, decreasing = TRUE), ]
Effects on population health
The top 5 weather event types by injuries and by fatalities are as follows:
head(injuryData,5)
## # A tibble: 5 × 2
## EVTYPE Injuries
## <fctr> <dbl>
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
head(fatalityData,5)
## # A tibble: 5 × 2
## EVTYPE Fatalities
## <fctr> <dbl>
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
Plot of Top 10 Events
p1 <- ggplot(head(injuryData, 10), aes(x = reorder(EVTYPE, Injuries), y = Injuries)) +
  geom_bar(fill = "darkolivegreen", stat = "identity") +
  coord_flip() +
  xlab("Event Type") +
  ylab("Total Number of Injuries") +
  ggtitle("Health Impact of Top 10 Weather Events in the US")
p2 <- ggplot(head(fatalityData, 10), aes(x = reorder(EVTYPE, Fatalities), y = Fatalities)) +
  geom_bar(fill = "goldenrod", stat = "identity") +
  coord_flip() +
  xlab("Event Type") +
  ylab("Total Number of Fatalities")
grid.arrange(p1, p2, nrow = 2)
Tornadoes are the most harmful events for population health, causing by far the most injuries and fatalities, as indicated by the tables and plots above.
The top 5 weather event types causing financial damage to property and to crops are as follows:
head(propDmgSorted[ ,c("EVTYPE","propDamage")],5)
## # A tibble: 5 × 2
## EVTYPE propDamage
## <fctr> <dbl>
## 1 FLASH FLOOD 6.820237e+13
## 2 THUNDERSTORM WINDS 2.086532e+13
## 3 TORNADO 1.078951e+12
## 4 HAIL 3.157558e+11
## 5 LIGHTNING 1.729433e+11
head(cropDmgSorted[ ,c("EVTYPE","cropDamage")],5)
## # A tibble: 5 × 2
## EVTYPE cropDamage
## <fctr> <dbl>
## 1 DROUGHT 13972566000
## 2 FLOOD 5661968450
## 3 RIVER FLOOD 5029459000
## 4 ICE STORM 5022113500
## 5 HAIL 3025974480
Flash floods, thunderstorm winds, and tornadoes cause the most property damage, while drought causes the most crop damage.
To confirm the findings above, plots of the Top 10 events for property and crop damage are shown below:
p1 <- ggplot(data = head(propDmgSorted, 10), aes(x = reorder(EVTYPE, propDamage), y = log10(propDamage))) +
  geom_bar(fill = "darkblue", stat = "identity") +
  coord_flip() +
  xlab("Event type") +
  ylab("Property damage in dollars (log10)") +
  ggtitle("Economic impact of weather events in the US - Top 10") +
  theme(plot.title = element_text(hjust = 0))
p2 <- ggplot(data = head(cropDmgSorted, 10), aes(x = reorder(EVTYPE, cropDamage), y = cropDamage)) +
  geom_bar(fill = "goldenrod", stat = "identity") +
  coord_flip() +
  xlab("Event type") +
  ylab("Crop damage in dollars") +
  theme(legend.position = "none")
grid.arrange(p1, p2, ncol = 1, nrow = 2)