Severe weather events can cause significant damage. In this study, we analyzed storm data from the U.S. National Oceanic and Atmospheric Administration (NOAA) to identify which weather events are most harmful to population health and which cause the greatest economic damage. Our findings indicate that tornadoes are the most harmful to population health, measured as combined fatalities and injuries. Floods caused the greatest economic damage, measured as combined property and crop damage.
Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The file can be downloaded from the course web site; the URL is given in the download code below.
There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
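One way to check how sparsely the early years are recorded is to count events per year. The following is a minimal sketch, assuming BGN_DATE keeps the month/day/year format seen in the raw file. The toy rows are made up for illustration; once the data are loaded, the same two steps could be applied to the full data frame.

```r
# Toy rows mimicking the BGN_DATE format of the raw file (illustrative only).
toy <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "2/20/1951 0:00:00",
               "6/8/1951 0:00:00", "11/15/2011 0:00:00"),
  stringsAsFactors = FALSE
)

# as.Date() parses as far as the format requires and ignores the trailing time.
toy$YEAR <- as.integer(format(as.Date(toy$BGN_DATE, format = "%m/%d/%Y"), "%Y"))

# Number of recorded events per year.
events_per_year <- table(toy$YEAR)
print(events_per_year)
```

On the real data, a sharp rise in the yearly counts would mark the point from which the records can be treated as more complete.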
library(R.utils)
library(tidyverse)
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
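# Download and decompress only once so that the report can be re-run safely.
if (!file.exists("repdata_data_StormData.csv")) {
  download.file(url, destfile = "repdata_data_StormData.csv.bz2", mode = "wb")
  bunzip2("repdata_data_StormData.csv.bz2")
}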
storm <- read.csv("repdata_data_StormData.csv")
str(storm)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
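As an alternative to decompressing the file on disk, base R's read.csv() can read a bzip2-compressed file directly, because the underlying file() connection detects the compression when opened for reading. The sketch below demonstrates this on a small made-up data frame written to a temporary file, not on the real download.

```r
# Small made-up data frame standing in for the storm data (illustrative only).
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(1, 0))

# Write it as a bzip2-compressed CSV in a temporary location.
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() decompresses transparently; no separate bunzip2 step is needed.
roundtrip <- read.csv(path)
print(roundtrip)
```

Reading the compressed file directly avoids keeping a second, much larger uncompressed copy on disk.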
To assess the effects on population health and the economy, we need only the EVTYPE column together with FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP.
storm2 <- storm %>%
  select("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
summary(storm2)
## EVTYPE FATALITIES INJURIES PROPDMG
## Length:902297 Min. : 0.0000 Min. : 0.0000 Min. : 0.00
## Class :character 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.: 0.00
## Mode :character Median : 0.0000 Median : 0.0000 Median : 0.00
## Mean : 0.0168 Mean : 0.1557 Mean : 12.06
## 3rd Qu.: 0.0000 3rd Qu.: 0.0000 3rd Qu.: 0.50
## Max. :583.0000 Max. :1700.0000 Max. :5000.00
## PROPDMGEXP CROPDMG CROPDMGEXP
## Length:902297 Min. : 0.000 Length:902297
## Class :character 1st Qu.: 0.000 Class :character
## Mode :character Median : 0.000 Mode :character
## Mean : 1.527
## 3rd Qu.: 0.000
## Max. :990.000
storm2 <- storm2 %>%
  mutate(TOTAL_HEALTH = FATALITIES + INJURIES)
health10 <- storm2 %>%
  select(EVTYPE, FATALITIES, INJURIES, TOTAL_HEALTH) %>%
  group_by(EVTYPE) %>%
  summarize(
    TOTAL_HEALTH = sum(TOTAL_HEALTH, na.rm = TRUE),
    FATALITIES = sum(FATALITIES, na.rm = TRUE),
    INJURIES = sum(INJURIES, na.rm = TRUE)
  ) %>%
  arrange(desc(TOTAL_HEALTH)) %>%
  slice(1:10)
health10
## # A tibble: 10 × 4
## EVTYPE TOTAL_HEALTH FATALITIES INJURIES
## <chr> <dbl> <dbl> <dbl>
## 1 TORNADO 96979 5633 91346
## 2 EXCESSIVE HEAT 8428 1903 6525
## 3 TSTM WIND 7461 504 6957
## 4 FLOOD 7259 470 6789
## 5 LIGHTNING 6046 816 5230
## 6 HEAT 3037 937 2100
## 7 FLASH FLOOD 2755 978 1777
## 8 ICE STORM 2064 89 1975
## 9 THUNDERSTORM WIND 1621 133 1488
## 10 WINTER STORM 1527 206 1321
unique(storm2$PROPDMGEXP)
## [1] "K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
unique(storm2$CROPDMGEXP)
## [1] "" "M" "K" "m" "B" "?" "0" "k" "2"
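# Convert a damage exponent code (PROPDMGEXP / CROPDMGEXP) to a numeric multiplier.
convert_exp <- function(exp) {
  case_when(
    # Symbols and blanks carry no magnitude information.
    exp %in% c("+", "-", "", " ", "?") ~ 10^0,
    # A digit d is read as 10^d; case_when() evaluates this for every element,
    # so the coercion warnings for non-digit codes are suppressed.
    exp %in% c("0", "1", "2", "3", "4", "5", "6", "7", "8") ~ 10^suppressWarnings(as.numeric(exp)),
    exp %in% c("K", "k") ~ 10^3,
    exp %in% c("M", "m") ~ 10^6,
    exp %in% c("B") ~ 10^9,
    exp %in% c("H", "h") ~ 10^2,
    # Any other code is treated as a multiplier of 1.
    TRUE ~ 10^0
  )
}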
storm2 <- storm2 %>%
  mutate(cPROPDMG = PROPDMG * convert_exp(PROPDMGEXP),
         cCROPDMG = CROPDMG * convert_exp(CROPDMGEXP)) %>%
  select(-PROPDMG, -PROPDMGEXP, -CROPDMG, -CROPDMGEXP)
storm2 <- storm2 %>%
  mutate(TOTAL_ECONOMIC_IMPACT = cPROPDMG + cCROPDMG)
head(storm2)
## EVTYPE FATALITIES INJURIES TOTAL_HEALTH cPROPDMG cCROPDMG
## 1 TORNADO 0 15 15 25000 0
## 2 TORNADO 0 0 0 2500 0
## 3 TORNADO 0 2 2 25000 0
## 4 TORNADO 0 2 2 2500 0
## 5 TORNADO 0 2 2 2500 0
## 6 TORNADO 0 6 6 2500 0
## TOTAL_ECONOMIC_IMPACT
## 1 25000
## 2 2500
## 3 25000
## 4 2500
## 5 2500
## 6 2500
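As a quick check of the multiplier logic, the sketch below re-implements the same mapping in base R with a lookup vector and applies it to a few exponent codes. The function name exp_multiplier is hypothetical and not part of the original analysis. It assumes the same interpretation used above: a digit d means 10^d, and unrecognized codes mean 1.

```r
# Hypothetical base-R equivalent of the exponent mapping (illustrative only).
exp_multiplier <- function(exp) {
  lookup <- c(H = 1e2, h = 1e2, K = 1e3, k = 1e3, M = 1e6, m = 1e6, B = 1e9)
  out <- rep(1, length(exp))                    # default: multiplier of 1
  is_digit <- grepl("^[0-8]$", exp)
  out[is_digit] <- 10^as.numeric(exp[is_digit]) # digit d -> 10^d
  is_letter <- exp %in% names(lookup)
  out[is_letter] <- unname(lookup[exp[is_letter]])
  out
}

codes <- c("K", "m", "", "5", "?", "B", "h")
multipliers <- exp_multiplier(codes)
print(data.frame(code = codes, multiplier = multipliers))

# First row of the data: PROPDMG = 25 with PROPDMGEXP = "K".
first_row_damage <- 25 * exp_multiplier("K")
print(first_row_damage)
```

The last value should agree with the cPROPDMG of 25000 shown for the first row of the head() output.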
economic10 <- storm2 %>%
  select(EVTYPE, cPROPDMG, cCROPDMG, TOTAL_ECONOMIC_IMPACT) %>%
  group_by(EVTYPE) %>%
  summarize(
    TOTAL_ECONOMIC_IMPACT = sum(TOTAL_ECONOMIC_IMPACT, na.rm = TRUE),
    cPROPDMG = sum(cPROPDMG, na.rm = TRUE),
    cCROPDMG = sum(cCROPDMG, na.rm = TRUE)
  ) %>%
  arrange(desc(TOTAL_ECONOMIC_IMPACT)) %>%
  slice(1:10)
economic10
## # A tibble: 10 × 4
## EVTYPE TOTAL_ECONOMIC_IMPACT cPROPDMG cCROPDMG
## <chr> <dbl> <dbl> <dbl>
## 1 FLOOD 150319678257 144657709807 5661968450
## 2 HURRICANE/TYPHOON 71913712800 69305840000 2607872800
## 3 TORNADO 57362333946. 56947380676. 414953270
## 4 STORM SURGE 43323541000 43323536000 5000
## 5 HAIL 18761221986. 15735267513. 3025954473
## 6 FLASH FLOOD 18243991078. 16822673978. 1421317100
## 7 DROUGHT 15018672000 1046106000 13972566000
## 8 HURRICANE 14610229010 11868319010 2741910000
## 9 RIVER FLOOD 10148404500 5118945500 5029459000
## 10 ICE STORM 8967041360 3944927860 5022113500
health10_tidy <- health10 %>%
  pivot_longer(cols = c(FATALITIES, INJURIES),
               names_to = "variable",
               values_to = "value")
ggplot(health10_tidy, aes(x = reorder(EVTYPE, -value), y = value, fill = variable)) +
  geom_bar(stat = "identity") +
  labs(title = "TOP 10 Most Harmful Weather Events on Health",
       x = "Event Type", y = "Health Impact") +
  theme_classic() +
  guides(fill = guide_legend(title = NULL)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Fig. 1. Top 10 most harmful weather events for population health. The stacked bar plot shows the 10 event types with the largest combined number of fatalities and injuries. Pink indicates fatalities; teal indicates injuries.
According to Fig. 1, the most harmful event types, in descending order of combined fatalities and injuries, are tornadoes, excessive heat, thunderstorm winds (recorded as TSTM WIND), floods, lightning, heat, flash floods, ice storms, thunderstorm winds (recorded separately as THUNDERSTORM WIND), and winter storms. Tornadoes are by far the most harmful, with 5,633 fatalities and 91,346 injuries, more than ten times the combined total of any other event type.
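Because TSTM WIND and THUNDERSTORM WIND appear as separate rows in the top-10 table, the ranking depends on how event labels are spelled. The sketch below shows one possible sensitivity check: it merges the two labels and re-ranks, using totals copied from the table above. This merging step is an assumption added for illustration and was not part of the original analysis.

```r
# Totals copied from the top-10 health table (subset of rows).
top <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "THUNDERSTORM WIND"),
  TOTAL_HEALTH = c(96979, 8428, 7461, 1621),
  stringsAsFactors = FALSE
)

# Map the abbreviated label onto the full one, then re-aggregate.
top$EVTYPE_CLEAN <- sub("^TSTM WIND$", "THUNDERSTORM WIND", top$EVTYPE)
merged <- aggregate(TOTAL_HEALTH ~ EVTYPE_CLEAN, data = top, FUN = sum)

# Re-rank in descending order of combined fatalities and injuries.
merged <- merged[order(-merged$TOTAL_HEALTH), ]
print(merged)
```

With the two labels combined, thunderstorm wind would move ahead of excessive heat in this subset, while tornadoes remain first by a wide margin.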
economic10_tidy <- economic10 %>%
  pivot_longer(cols = c(cPROPDMG, cCROPDMG),
               names_to = "variable",
               values_to = "value")
# Damage is converted from dollars to billions of dollars for the y axis.
ggplot(economic10_tidy, aes(x = reorder(EVTYPE, -value), y = value / 10^9, fill = variable)) +
  geom_bar(stat = "identity") +
  labs(title = "Top 10 weather events with the greatest economic impacts",
       x = "Event Type", y = "Economic Impact (billion USD)") +
  theme_classic() +
  guides(fill = guide_legend(title = NULL)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Fig. 2. Top 10 weather events with the greatest economic impact. The stacked bar plot shows the 10 event types with the largest combined property and crop damage. Pink indicates crop damage; teal indicates property damage.
Based on Fig. 2, the event types with the greatest combined property and crop damage, in descending order, are floods, hurricanes/typhoons, tornadoes, storm surges, hail, flash floods, droughts, hurricanes (recorded separately as HURRICANE), river floods, and ice storms. Floods have the greatest economic impact, at roughly 150 billion dollars. Most event types cause damage primarily to property, but drought damage falls almost entirely on crops. River floods and ice storms also cause substantial crop damage, about half or more of their total.
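The crop-versus-property comparison can be made concrete by computing the crop share of total damage for each event type. The sketch below does this for four rows, with values copied from the top-10 economic table. The CROP_SHARE column is a hypothetical name introduced here for illustration.

```r
# Values copied from the top-10 economic table (subset of rows, in dollars).
econ <- data.frame(
  EVTYPE   = c("FLOOD", "DROUGHT", "RIVER FLOOD", "ICE STORM"),
  cPROPDMG = c(144657709807, 1046106000, 5118945500, 3944927860),
  cCROPDMG = c(5661968450, 13972566000, 5029459000, 5022113500),
  stringsAsFactors = FALSE
)

# Fraction of total damage attributable to crops, rounded to two decimals.
econ$CROP_SHARE <- round(econ$cCROPDMG / (econ$cPROPDMG + econ$cCROPDMG), 2)

# Show event types from most to least crop-dominated.
print(econ[order(-econ$CROP_SHARE), c("EVTYPE", "CROP_SHARE")])
```

A share near 1 means the damage is almost entirely to crops, a share near 0 means almost entirely to property.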