This report analyzes the NOAA United States weather event data from 1950 to 2011, focusing on the event types recorded as having the most severe consequences for population health and the national economy. Health impacts are measured by deaths and injuries, both in total and on average per recorded event, while economic impacts are measured by the dollar cost of damage to property and crops. In total terms, tornadoes are the most harmful events for population health, whereas floods and droughts pose the biggest threats to the economy. Since 2000, hurricanes/typhoons have become the most disastrous event type per occurrence, causing the greatest average injuries and economic loss.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(lubridate)
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
library(ggplot2)
library(RColorBrewer)
data <- read.csv('original.csv.bz2')
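# Convert damage exponent codes (e.g. K, M, B) into numeric multipliers.
clean_units <- function(col) {
  # Digit codes 0-8 are all treated as a multiplier of 10
  units <- gsub("[0-8]", "10", as.character(toupper(col)))
  # An empty code becomes 0, so the damage for that record counts as 0
  units <- gsub("^$", "0", units)
  units <- gsub('K', '1000', units)        # thousands
  units <- gsub('M', '1000000', units)     # millions
  units <- gsub('B', '1000000000', units)  # billions
  units <- gsub('H', '100', units)         # hundreds
  # Symbol codes: '+' keeps the raw value, '-' and '?' zero it out
  units <- gsub('\\+', '1', units)
  units <- gsub('-', '0', units)
  units <- gsub("\\?", '0', units)
  units
}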
data$PROPDMGEXP <- as.numeric(clean_units(data$PROPDMGEXP))
data$CROPDMGEXP <- as.numeric(clean_units(data$CROPDMGEXP))
data$PD <- data$PROPDMG * data$PROPDMGEXP
data$CD <- data$CROPDMG * data$CROPDMGEXP
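The exponent cleanup above can also be written as a lookup table, which makes the code-to-multiplier mapping easier to audit. The following is a minimal sketch rather than the method used in this report: `exp_to_multiplier` is a hypothetical helper name, and it assumes each exponent code is a single character. It mirrors the mapping above (digits 0-8 become 10, an empty code becomes 0), but returns `NA` for any code outside the table.

```r
# Hypothetical helper: map damage exponent codes to numeric multipliers
# using a named lookup vector instead of a chain of gsub() calls.
exp_to_multiplier <- function(col) {
  code <- toupper(as.character(col))
  lookup <- c("H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9,
              "+" = 1, "-" = 0, "?" = 0)
  mult <- unname(lookup[code])             # NA for codes not in the table
  mult[code %in% ""] <- 0                  # empty code: damage counted as 0
  mult[grepl("^[0-8]$", code)] <- 10       # digit codes 0-8: multiplier 10
  mult
}

# Toy input (not taken from the NOAA file)
exp_to_multiplier(c("K", "m", "", "5", "+", "?", "B", "h"))
```

Returning `NA` for unexpected codes is a design choice of this sketch: it surfaces unhandled codes instead of silently passing them through.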
num_type <- length(unique(data$EVTYPE))
There are 985 distinct event types in total; the 10 most frequent are listed below.
tab <- sort(table(data$EVTYPE), decreasing=TRUE)
head(tab, 10)
##
##               HAIL          TSTM WIND  THUNDERSTORM WIND
##             288661             219940              82563
##            TORNADO        FLASH FLOOD              FLOOD
##              60652              54277              25326
## THUNDERSTORM WINDS          HIGH WIND          LIGHTNING
##              20843              20212              15754
##         HEAVY SNOW
##              15708
According to the records, the most frequent event type is hail, followed by TSTM wind, thunderstorm wind and tornado.
However, the distribution is heavily skewed. As the quantiles below show, 50% of the event types were recorded no more than twice, and only 1% of the event types have more than about 12,360 records.
quantile(tab, probs=c(0.25, 0.5, 0.75, 0.90, 0.99, 1))
## 25% 50% 75% 90% 99% 100%
## 1.0 2.0 5.0 35.2 12360.6 288661.0
max_death_event <- as.character(data[which.max(data$FATALITIES), 'EVTYPE'])
The average number of deaths per recorded event is 0.0167849, and the maximum for a single record is 583 deaths, caused by HEAT.
summary(data$FATALITIES)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.0168 0.0000 583.0000
max_death_event
## [1] "HEAT"
Also, at least 99% of the recorded events did not lead to any deaths.
quantile(data$FATALITIES, probs=c(0.25, 0.5, 0.75, 0.99, 1))
## 25% 50% 75% 99% 100%
## 0 0 0 0 583
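The quantiles only give a lower bound ("at least 99%") on the share of records without deaths. The exact share can be computed directly as the mean of a logical vector. The sketch below uses a toy vector in place of `data$FATALITIES`; `zero_share` is a hypothetical helper name.

```r
# Hypothetical helper: fraction of records with a value of exactly zero.
zero_share <- function(x) {
  mean(x == 0, na.rm = TRUE)
}

# Toy stand-in for data$FATALITIES (8 of the 10 values are zero)
fatalities <- c(0, 0, 0, 0, 0, 0, 0, 2, 0, 1)
zero_share(fatalities)   # 0.8
```

The same helper applies unchanged to `data$INJURIES`.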
max_injury_event <- as.character(data[which.max(data$INJURIES), 'EVTYPE'])
For injuries, the mean per recorded event is 0.1557447, and the maximum for a single record is 1700 injuries, caused by TORNADO.
summary(data$INJURIES)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.1557 0.0000 1700.0000
max_injury_event
## [1] "TORNADO"
Also, at least 98% of the recorded events did not lead to any injuries.
quantile(data$INJURIES, probs=c(0.25, 0.5, 0.75, 0.98, 1))
## 25% 50% 75% 98% 100%
## 0 0 0 0 1700
max_pd_event <- as.character(data[which.max(data$PD), 'EVTYPE'])
max_cd_event <- as.character(data[which.max(data$CD), 'EVTYPE'])
The average property damage per recorded event is about 4.7358987 × 10^5 dollars, and the maximum for a single record is 1.15 × 10^11 dollars, caused by FLOOD.
summary(data$PD)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000e+00 0.000e+00 0.000e+00 4.736e+05 5.000e+02 1.150e+11
max_pd_event
## [1] "FLOOD"
The average crop damage per recorded event is about 5.4421321 × 10^4 dollars, and the maximum for a single record is 5 × 10^9 dollars, caused by RIVER FLOOD.
summary(data$CD)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000e+00 0.000e+00 0.000e+00 5.442e+04 0.000e+00 5.000e+09
max_cd_event
## [1] "RIVER FLOOD"
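Damage values in scientific notation are hard to compare at a glance. As a small illustration, base R's `format()` can print them as whole dollars with thousands separators. `fmt_dollars` is a hypothetical helper name, and the two inputs are the rounded mean and maximum property damage reported above.

```r
# Hypothetical helper: print dollar amounts with thousands separators.
fmt_dollars <- function(x) {
  format(round(x), big.mark = ",", scientific = FALSE, trim = TRUE)
}

# Rounded mean and maximum property damage from the summary above
fmt_dollars(c(473589.87, 1.15e11))
```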
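# Return the event type with the highest value of a given health indicator.
get_event <- function(type, data) {
  # Per-event-type totals and per-record averages of deaths and injuries
  stats <- data %>% group_by(EVTYPE) %>%
    summarise(total_death = sum(FATALITIES, na.rm=TRUE), avg_death = mean(FATALITIES, na.rm=TRUE),
              total_inj = sum(INJURIES, na.rm=TRUE), avg_inj = mean(INJURIES, na.rm=TRUE))
  # Keep only the row where the requested indicator is largest
  event <- cbind(type, stats[which.max(stats[[type]]), c('EVTYPE', type)])
  names(event) <- c('Indicator','Event_type', 'Number')
  event
}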
table_health <- rbind(get_event('total_death', data), get_event('total_inj', data),
get_event('avg_death', data), get_event('avg_inj', data))
If total deaths and injuries are considered, the tornado appears to be the most harmful event type to population health across the United States. However, since the number of records differs widely across event types, these high totals may be partly due to its high frequency (ranked 4th), as addressed in section 2-1. The highest average number of deaths per record comes from the combined type TORNADOES, TSTM WIND, HAIL; averages like this should be read with care, because half of all event types have at most two records. Still, the combination of high death tolls and high frequency makes tornadoes a concern in any preventive decision-making process.
table_health
##     Indicator                 Event_type Number
## 1 total_death                    TORNADO   5633
## 2   total_inj                    TORNADO  91346
## 3   avg_death TORNADOES, TSTM WIND, HAIL     25
## 4     avg_inj                  Heat Wave     70
Also, the highest average number of injuries per record was caused by heat waves, which the relevant authorities should also take into account.
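Per-record averages can be dominated by event types with very few records. One way to check this is to report the record count next to each average and, optionally, to require a minimum count. The sketch below uses a toy data frame; the threshold of 3 records is an arbitrary assumption for illustration and was not used in this analysis.

```r
library(dplyr)

# Toy data (not taken from the NOAA file): one rare type with a single
# very deadly record, and two more common types.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "TORNADO", "RARE EVENT", "HAIL", "HAIL"),
  FATALITIES = c(1, 0, 2, 25, 0, 0)
)

# Average deaths per record, with the number of records behind each average
avg_deaths <- toy %>%
  group_by(EVTYPE) %>%
  summarise(n = n(), avg_death = mean(FATALITIES, na.rm = TRUE)) %>%
  arrange(desc(avg_death))

# Keep only event types with at least 3 records (assumed threshold)
robust <- filter(avg_deaths, n >= 3)

avg_deaths
robust
```

In the toy data the rare type ranks first on the raw average but drops out once the minimum count is applied, leaving only the type with three records.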
More recent data should be more relevant to present-day policy-making. If we use only the data collected since 2000, the same conclusions hold for tornadoes (highest total deaths and injuries, combined with high frequency); however, rough seas and hurricanes/typhoons now lead to the highest average deaths and injuries, respectively.
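Restricting the data to recent years depends on extracting a year from the begin-date strings. The sketch below illustrates that step on a toy data frame. The date format shown (month/day/year followed by a time) is an assumption about how `BGN_DATE` is stored, and the two rows are made up.

```r
library(lubridate)

# Toy rows with the assumed "m/d/Y H:M:S" date format
toy <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "5/22/2011 0:00:00"),
  EVTYPE   = c("TORNADO", "TORNADO")
)

# Parse the date-time strings, then pull out the calendar year
toy$Year <- year(mdy_hms(toy$BGN_DATE))

# Keep only records from 2000 onwards
recent <- toy[toy$Year >= 2000, ]
recent
```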
data$Year <- year(mdy_hms(data$BGN_DATE))
data_2000 <- filter(data, Year >= 2000)
tab <- sort(table(data_2000$EVTYPE), decreasing=TRUE)
head(tab, 10)
##
##              HAIL         TSTM WIND THUNDERSTORM WIND       FLASH FLOOD
##            165719             85007             81402             40585
##             FLOOD           TORNADO         HIGH WIND        HEAVY SNOW
##             19961             17687             16411             10901
##      WINTER STORM         LIGHTNING
##              9774              9686
table_health_2000 <- rbind(get_event('total_death', data_2000), get_event('total_inj', data_2000),
get_event('avg_death', data_2000), get_event('avg_inj', data_2000))
table_health_2000
##     Indicator        Event_type       Number
## 1 total_death           TORNADO  1193.000000
## 2   total_inj           TORNADO 15213.000000
## 3   avg_death        ROUGH SEAS     2.666667
## 4     avg_inj HURRICANE/TYPHOON    14.488636
table_health$Year <- "1950-2011"
table_health_2000$Year <- "2000-2011"
t <- rbind(table_health, table_health_2000)
To wrap up, the most harmful event types for population health (as measured by deaths and injuries) over the two time periods are summarized in the four bar charts below, each representing a different indicator of harm.
par(mfcol=c(2,2), mai=c(0.5, 0.8, 0.6, 0.2))
barplot(t[t$Indicator=='total_death', ]$Number, names.arg=c('Tornado\n1950-2011','Tornado\n2000-2011'),
ylab='incidents', col=c(brewer.pal(4,'Reds')[4:3]), main='Total Deaths')
barplot(t[t$Indicator=='avg_death', ]$Number,
names.arg=c('Tornadoes, Tstm Wind, Hail\n1950-2011', 'Rough seas\n2000-2011'),
ylab='incidents', col=c(brewer.pal(4,'Reds')[2:1]), main="Average Deaths")
barplot(t[t$Indicator=='total_inj', ]$Number, names.arg=c('Tornado\n1950-2011','Tornado\n2000-2011'),
ylab='incidents', col=c(brewer.pal(4,'Blues')[4:3]), main='Total Injuries')
barplot(t[t$Indicator=='avg_inj', ]$Number, names.arg=c('Heat Wave\n1950-2011', 'Hurricane/Typhoon\n2000-2011'),
ylab='incidents', col=c(brewer.pal(4,'Blues')[2:1]), main="Average Injuries")
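# Redefine get_event for the economic indicators (replaces the health version).
get_event <- function(type, data) {
  # Per-event-type totals and per-record averages of property and crop damage
  stats <- data %>% group_by(EVTYPE) %>%
    summarise(total_pd = sum(PD, na.rm=TRUE), avg_pd = mean(PD, na.rm=TRUE),
              total_cd = sum(CD, na.rm=TRUE), avg_cd = mean(CD, na.rm=TRUE))
  # Keep only the row where the requested indicator is largest
  event <- cbind(type, stats[which.max(stats[[type]]), c('EVTYPE', type)])
  names(event) <- c('Indicator','Event_type', 'Number')
  event
}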
table_econ <- rbind(get_event('total_pd', data), get_event('total_cd', data),
get_event('avg_pd', data), get_event('avg_cd', data))
table_econ$Year <- "1950-2011"
table_econ_2000 <- rbind(get_event('total_pd', data_2000), get_event('total_cd', data_2000),
get_event('avg_pd', data_2000), get_event('avg_cd', data_2000))
table_econ_2000$Year <- "2000-2011"
t_econ <- rbind(table_econ, table_econ_2000)
table_econ
##   Indicator                 Event_type       Number      Year
## 1  total_pd                      FLOOD 144657709800 1950-2011
## 2  total_cd                    DROUGHT  13972566000 1950-2011
## 3    avg_pd TORNADOES, TSTM WIND, HAIL   1600000000 1950-2011
## 4    avg_cd          EXCESSIVE WETNESS    142000000 1950-2011
table_econ_2000
##   Indicator        Event_type       Number      Year
## 1  total_pd             FLOOD 134691074080 2000-2011
## 2  total_cd           DROUGHT   9135585000 2000-2011
## 3    avg_pd HURRICANE/TYPHOON    787566364 2000-2011
## 4    avg_cd HURRICANE/TYPHOON     29634918 2000-2011
Floods caused the most total property damage from both a historical (1950-2011) and a contemporary (2000-2011) point of view, with damage of over 144 billion and 134 billion dollars, respectively.
Similarly, drought has persistently caused the most total crop damage, with losses of almost 14 billion dollars over 1950-2011 and over 9 billion dollars over 2000-2011.
In terms of average damage per record over the full history (1950-2011), property suffers most from the combined type of tornadoes, TSTM wind and hail, with a mean loss of 1.6 billion dollars, whereas crops suffer most from excessive wetness, with a mean loss of 0.142 billion dollars.
Since 2000, hurricanes/typhoons have caused the greatest average loss for both property and crops, at almost 788 million and 30 million dollars per record, respectively.
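The totals reported in the two tables can be put in proportion with a few lines of arithmetic. The sketch below uses only figures already printed in this report: the flood and drought totals for both periods, and the largest single property damage record (1.15 × 10^11, a FLOOD record, taken as the rounded value from the summary). The variable names are chosen for illustration.

```r
# Totals copied from table_econ and table_econ_2000
flood_total_all      <- 144657709800
flood_total_recent   <- 134691074080
drought_total_all    <- 13972566000
drought_total_recent <- 9135585000

# Largest single property damage record (rounded), a FLOOD record
largest_flood_record <- 1.15e11

# Share of the 1950-2011 totals that falls in 2000-2011
flood_recent_share   <- flood_total_recent / flood_total_all
drought_recent_share <- drought_total_recent / drought_total_all

# Share of all flood property damage due to the single largest record
single_record_share  <- largest_flood_record / flood_total_all

round(c(flood_recent   = flood_recent_share,
        drought_recent = drought_recent_share,
        single_record  = single_record_share), 3)
```

The three shares come out at roughly 0.93, 0.65 and 0.79. This suggests that the flood total is concentrated in the recent period and is driven largely by one record, which may deserve a closer look before the ranking is used for policy decisions.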
The summary figure is shown as follows:
par(mfcol=c(2,2), mai=c(0.5, 0.8, 0.6, 0.2))
barplot(t_econ[t_econ$Indicator=='total_pd', ]$Number, names.arg=c('Flood\n1950-2011','Flood\n2000-2011'),
ylab='value($)', col=c(brewer.pal(4,'Reds')[4:3]), main='Total Property Damage')
barplot(t_econ[t_econ$Indicator=='avg_pd', ]$Number,
names.arg=c('Tornadoes, Tstm Wind, Hail\n1950-2011', 'Hurricane/Typhoon\n2000-2011'),
ylab='value($)', col=c(brewer.pal(4,'Reds')[2:1]), main="Average Property Damage")
barplot(t_econ[t_econ$Indicator=='total_cd', ]$Number,
names.arg=c('Drought\n1950-2011','Drought\n2000-2011'),
ylab='value($)', col=c(brewer.pal(4,'Blues')[4:3]), main='Total Crop Damage')
barplot(t_econ[t_econ$Indicator=='avg_cd', ]$Number,
names.arg=c('Excessive Wetness\n1950-2011', 'Hurricane/Typhoon\n2000-2011'),
ylab='value($)', col=c(brewer.pal(4,'Blues')[2:1]), main="Average Crop Damage")
The figure shows the most harmful event types for the national economy (as measured by property damage and crop damage) over the two time periods in four bar charts, each representing a different indicator of economic loss.
To conclude, tornadoes have consistently been the biggest concern in terms of total deaths and injuries, whereas floods and droughts pose the biggest threats to the economy.
On average, event types such as tornadoes, TSTM wind and hail, heat waves, and excessive wetness used to be the most concerning given the deaths, injuries and economic loss per record; however, since 2000 hurricanes/typhoons have taken their place as the most disastrous event type, leading to the greatest average injuries and economic loss.