Analysis of storm harmfulness by type of events on population health and economy based on Data from the National Weather Service.
Storms and other severe weather events can cause both public health and economic problems for communities and municipalities.
Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
Exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, the objective of the following study is to identify which types of events are most harmful to population health and have the greatest economic consequences.
The database for this study tracks characteristics of major storms and weather events in the United States from 1950 to 2011, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
Storm Data are available downloading file here.
Downloading is done in project working folder saving date of download with following code:
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileUrl, destfile = paste(getwd(),"/Storm_file",sep=""))
print(Sys.Date())
## [1] "2021-03-05"
Reading Storm data and summary.
data <- read.csv("Storm_file")
dim(data)
## [1] 902297 37
str(data)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
Harmfulness to population health is analyzed using event type (“EVTYPE”), fatalities (“FATALITIES”) and injuries (“INJURIES”).
Economic consequences is analyzed using event type (“EVTYPE”), property damages (“PROPDMG” and cost magnitude “PROPDMGEXP”) and crop damages (“CROPDMG” and cost magnitude “CROPDMGEXP”).
Direct and indirect fatalities, injuries or damages (properties and crop) are counted.
To take in account over time changes in data collection, the analysis is focused on data collected after year 2000 (with beginning of events date:“BGN_DATE”).
data2 <- data[as.Date(data$BGN_DATE, "%m/%d/%Y %T") > as.Date("12/31/1999", "%m/%d/%Y"),]
To keep maximum of usable data for each analysis, health and economic data are cleaned in two separate sets.
health <- data2[,c(2,8,23,24)]
str(health)
## 'data.frame': 523163 obs. of 4 variables:
## $ BGN_DATE : chr "2/13/2000 0:00:00" "2/13/2000 0:00:00" "2/13/2000 0:00:00" "2/13/2000 0:00:00" ...
## $ EVTYPE : chr "TSTM WIND" "TSTM WIND" "TSTM WIND" "TSTM WIND" ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 0 0 ...
## $ INJURIES : num 0 0 0 0 0 0 0 0 0 0 ...
economic <- data2[,c(2,8,25,26,27,28)]
str(economic)
## 'data.frame': 523163 obs. of 6 variables:
## $ BGN_DATE : chr "2/13/2000 0:00:00" "2/13/2000 0:00:00" "2/13/2000 0:00:00" "2/13/2000 0:00:00" ...
## $ EVTYPE : chr "TSTM WIND" "TSTM WIND" "TSTM WIND" "TSTM WIND" ...
## $ PROPDMG : num 20 20 2 5 2 5 4 30 20 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "K" "K" "K" "K" ...
According to preparation guide, magnitude for damages cost is defined with an exponential abbreviation (empty if no damage, “K” for thousands, “M” for Millions and “B” for Billions).
For accuracy purpose, data are discarded from economic analysis if incorrect abbreviation or number is used either in property and crop damages.
nb: there is much less inaccurate values with data above year 2000
economic <- economic[economic$PROPDMGEXP=="K"|economic$PROPDMGEXP=="M"|economic$PROPDMGEXP=="B"|economic$PROPDMGEXP=="",]
economic <- economic[economic$CROPDMGEXP=="K"|economic$CROPDMGEXP=="M"|economic$CROPDMGEXP=="B"|economic$CROPDMGEXP=="",]
For the analysis, all damages cost are converted in Million $
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.0.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
economic <-
economic %>%
mutate(propcosts = ifelse(PROPDMGEXP == "K", PROPDMG/1000,
ifelse(PROPDMGEXP == "M",PROPDMG,
ifelse(PROPDMGEXP == "B",PROPDMG*1000,0))),
cropcosts = ifelse(CROPDMGEXP == "K", CROPDMG/1000,
ifelse(CROPDMGEXP == "M",CROPDMG,
ifelse(CROPDMGEXP == "B",CROPDMG*1000,0))))
Health data are grouped by event type to calculate sum of fatalities and injuries for each type of event.
Results are merged in one set by event type, ten most harmful events are sorted based on fatalities count.
Count of injuries for most harmful events are showed in Table 1. Injuries are not represented alone as it doesn’t reflect event harmfulness separated from fatalities.
Table 1
#data with sum of fatalities by type of event
mostharmful_fatalities <- health %>%
group_by(EVTYPE) %>%
summarise(Fatalities = sum(FATALITIES))
#data with sum of injuries by type of event
mostharmful_injuries <- health %>%
group_by(EVTYPE) %>%
summarise(Injuries = sum(INJURIES))
#merged data
mostharmful_health <- merge(mostharmful_fatalities, mostharmful_injuries, by = "EVTYPE")
mostharmful_health <- mostharmful_health %>%
arrange(desc(Fatalities), desc(Injuries))
head(mostharmful_health, 10)
## EVTYPE Fatalities Injuries
## 1 TORNADO 1193 15213
## 2 EXCESSIVE HEAT 1013 3708
## 3 FLASH FLOOD 600 812
## 4 LIGHTNING 466 2993
## 5 RIP CURRENT 340 208
## 6 FLOOD 266 315
## 7 HEAT 231 1222
## 8 AVALANCHE 179 126
## 9 HIGH WIND 131 677
## 10 THUNDERSTORM WIND 130 1400
Figure 1
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.0.3
g <- ggplot(data = mostharmful_health[1:10,],
aes(x = Fatalities, y = reorder(EVTYPE, Fatalities),
fill = reorder(EVTYPE, -Fatalities)))
g +
geom_col(color = "black") +
scale_fill_brewer(palette = "Spectral") +
theme_minimal() +
theme(legend.position = "none",
axis.title.y = element_blank(),
plot.margin = margin(0.5, 1, 0.5, 0.5, "cm")) +
ggtitle("Most harmful events to population health\nUnited States: 2000-2011") +
xlab("Number of Fatalities")
Economic data are grouped by event type to calculate sum of property and crop damages costs for each type of event.
Results are analyzed separately as cost magnitude isn’t comparable. Ten most harmful events are sorted based on properties damage costs (Table and Figure 2) and crop damage costs (Table and Figure 3).
Table 2
#data with sum of properties costs by type of event
mostharmful_prop <-
economic %>%
group_by(EVTYPE) %>%
summarise(Costs = sum(propcosts)) %>%
ungroup() %>%
arrange(desc(Costs))
head(mostharmful_prop, 10)
## # A tibble: 10 x 2
## EVTYPE Costs
## <chr> <dbl>
## 1 FLOOD 134691.
## 2 HURRICANE/TYPHOON 69306.
## 3 STORM SURGE 43171.
## 4 TORNADO 19461.
## 5 HAIL 11989.
## 6 FLASH FLOOD 11877.
## 7 TROPICAL STORM 7195.
## 8 HIGH WIND 4946.
## 9 WILDFIRE 4759.
## 10 STORM SURGE/TIDE 4641.
Figure 2
g <- ggplot(data = mostharmful_prop[1:10,],
aes(x = Costs, y = reorder(EVTYPE, Costs),
fill = reorder(EVTYPE, Costs)))
g +
geom_col(color = "black") +
scale_fill_brewer(palette = "PuOr") +
theme_minimal() +
theme(legend.position = "none",
axis.title.y = element_blank(),
plot.margin = margin(0.5, 1, 0.5, 0.5, "cm")) +
ggtitle("Events with highest properties damage costs\nUnited States: 2000-2011") +
xlab("Cost in M$")
Table 3
#data with sum of crop costs by type of event
mostharmful_crop <-
economic %>%
group_by(EVTYPE) %>%
summarise(Costs = sum(cropcosts)) %>%
ungroup() %>%
arrange(desc(Costs))
head(mostharmful_crop, 10)
## # A tibble: 10 x 2
## EVTYPE Costs
## <chr> <dbl>
## 1 DROUGHT 9136.
## 2 FLOOD 4222.
## 3 HURRICANE/TYPHOON 2608.
## 4 HAIL 1785.
## 5 FROST/FREEZE 1094.
## 6 FLASH FLOOD 905.
## 7 HIGH WIND 498.
## 8 EXCESSIVE HEAT 492.
## 9 HURRICANE 449.
## 10 TROPICAL STORM 412.
Figure 3
g <- ggplot(data = mostharmful_crop[1:10,],
aes(x = Costs, y = reorder(EVTYPE, Costs),
fill = reorder(EVTYPE, -Costs)))
g +
geom_col(color = "black") +
scale_fill_brewer(palette = "BrBG") +
theme_minimal() +
theme(legend.position = "none",
axis.title.y = element_blank(),
plot.margin = margin(0.5, 1, 0.5, 0.5, "cm")) +
ggtitle("Events with highest crop damage costs\nUnited States: 2000-2011") +
xlab("Cost in M$")
National Weather Service [Storm Data Documentation] https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf National Weather Service [FAQ] https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2FNCDC%20Storm%20Events-FAQ%20Page.pdf