Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The analysis identifies the event types that are most harmful to population health and those with the greatest economic consequences.
## Set working directory, download dataset & read it into R
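library(dplyr)    # provides %>%, select(), group_by(), summarize(), arrange()
library(knitr)    # provides kable()
library(ggplot2)  # provides qplot()
setwd("~/R Directory/C5W4")
fileurl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists("storm.csv.bz2")) download.file(fileurl, "storm.csv.bz2")
if (!exists("storm")) storm <- read.csv("storm.csv.bz2", stringsAsFactors = TRUE)  # factors are needed for the levels() recoding of the *EXP columns; not the default since R 4.0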
## Select the variables relevant to the analysis, then remove the original dataset & cache the results
db <- storm %>% select(c("EVTYPE", "INJURIES", "FATALITIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP"))
rm("storm")
Fatalities are the gravest outcome, but I read the question as being about the health of the surviving population: injuries can leave people with chronic conditions for the rest of their lives. I therefore ranked event types by total number of injuries and report total fatalities alongside for reference.
## Calculate total number of injuries & fatalities
db1 <- db %>% group_by(EVTYPE) %>% summarize(suminj = sum(INJURIES, na.rm = TRUE), sumfat = sum(FATALITIES, na.rm = TRUE)) %>% arrange(desc(suminj))
## Create table with obtained results
kable(db1[1:5, ], caption = "Top five event types sorted by total number of injuries", align = "c", col.names = c("Event", "Injuries", "Fatalities"))
| Event | Injuries | Fatalities |
|---|---|---|
| TORNADO | 91346 | 5633 |
| TSTM WIND | 6957 | 504 |
| FLOOD | 6789 | 470 |
| EXCESSIVE HEAT | 6525 | 1903 |
| LIGHTNING | 5230 | 816 |
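According to the NOAA storm database, the most harmful event type with respect to population health in the United States is TORNADO, with 91,346 recorded injuries and 5,633 fatalities, far more than any other event type in the table.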
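As a quick check on the choice of injuries as the ranking metric, the five rows of the table can be re-ranked by fatalities. The sketch below copies the values from the table; the `top5` and `by_fatalities` names are introduced here for illustration only. It covers only these five event types, so event types outside the table could still rank above some of them on fatalities.

```r
# Values copied from the top-five table above
top5 <- data.frame(
  Event      = c("TORNADO", "TSTM WIND", "FLOOD", "EXCESSIVE HEAT", "LIGHTNING"),
  Injuries   = c(91346, 6957, 6789, 6525, 5230),
  Fatalities = c(5633, 504, 470, 1903, 816)
)

# Re-rank the same five events by fatalities instead of injuries
by_fatalities <- top5[order(-top5$Fatalities), ]
print(by_fatalities)
```

Within these five rows TORNADO stays in first place under either metric, while EXCESSIVE HEAT moves up to second when ranking by fatalities.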
## Calculate the cost of property & crop damage
levels(db$CROPDMGEXP) <- c(1, 1, 1, 1, 1e9, 1e3, 1e3, 1e6, 1e6)
levels(db$PROPDMGEXP) <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1e9, 1e2, 1e2, 1e3, 1e6, 1e6)
db$CROPDMGEXP <- as.numeric(as.character(db$CROPDMGEXP))
db$PROPDMGEXP <- as.numeric(as.character(db$PROPDMGEXP))
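The recoding above assigns multipliers by position, so it depends on the columns being factors and on the order in which R sorts their levels, which can vary with the locale. A minimal sketch of an alternative is a lookup keyed on the code itself. The helper name `exp_to_multiplier` and the toy rows are made up for illustration. The mapping assumes the same convention as the code above (B = 1e9, M/m = 1e6, K/k = 1e3, H/h = 1e2, anything else = 1), with one difference: because of `toupper()`, a lowercase `b` would also be read as 1e9.

```r
# Hypothetical helper: map a damage-exponent code to a numeric multiplier.
# Assumed convention (same as the positional recoding above):
# B = 1e9, M/m = 1e6, K/k = 1e3, H/h = 1e2, every other code = 1.
exp_to_multiplier <- function(code) {
  lookup <- c(B = 1e9, M = 1e6, K = 1e3, H = 1e2)
  # as.character() lets this work for factor and character columns alike
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1  # blank, digit, and symbol codes fall back to 1
  unname(mult)
}

# Toy rows (made-up values, not taken from the NOAA data)
toy <- data.frame(PROPDMG    = c(25, 2.5, 10, 3),
                  PROPDMGEXP = c("K", "M", "", "B"))
toy$cost <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
print(toy)
```

The same helper could be applied to both `PROPDMGEXP` and `CROPDMGEXP` in place of the `levels()` assignments, without relying on `stringsAsFactors`.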
## Calculate total economic loss
db2 <- db %>% group_by(EVTYPE) %>% summarize(economic = sum(PROPDMG * PROPDMGEXP + CROPDMG * CROPDMGEXP, na.rm = TRUE)) %>% arrange(desc(economic))
## Plot results on bar chart
qplot(data = db2[1:5, ], x = reorder(EVTYPE, -economic), weight = economic, geom = "bar", xlab = "Event type", ylab = "Total cost (USD)", main = "Top five event types sorted by total damage cost", col = I("red"), show.legend = FALSE) + theme_bw()
According to the NOAA storm database, the most harmful event type with respect to economic consequences in the United States is FLOOD.