Title: Most Impactful Weather Events in the United States: Health and Economic Consequences

Synopsis

This report analyzes the NOAA data on United States weather events from 1950 to 2011. It focuses on the event types with the most severe consequences for population health and the national economy. Health impacts are measured by the numbers of deaths and injuries, both in total and on average per record. Economic impacts are measured by the financial cost of damage to property and crops. The report finds that tornadoes are the most harmful events to health overall, whereas floods and droughts pose the biggest threats to the economy. Since 2000, however, hurricanes/typhoons have become the most disastrous event type per occurrence, causing the greatest average injuries and economic losses.

1. Data Processing

1-1. Load the Packages

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
library(ggplot2)
library(RColorBrewer)

1-2. Read the Data

data <- read.csv('original.csv.bz2')

1-3. Clean the Damage Exponent Columns (PROPDMGEXP, CROPDMGEXP)

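clean_units <- function(col){
    # Convert exponent codes into numeric multipliers (still as strings).
    # Digits must be replaced first, before K/M/B/H introduce new digits.
    units <- toupper(as.character(col))
    units <- gsub("[0-8]", "10", units)        # digits 0-8: multiplier of 10
    units <- gsub("^$", "0", units)            # blank code: damage counted as 0
    units <- gsub('K', '1000', units)          # thousands
    units <- gsub('M', '1000000', units)       # millions
    units <- gsub('B', '1000000000', units)    # billions
    units <- gsub('H', '100', units)           # hundreds
    units <- gsub('\\+', '1', units)           # "+": multiplier of 1
    units <- gsub('-', '0', units)             # "-": damage counted as 0
    units <- gsub("\\?", '0', units)           # "?": damage counted as 0
    units
}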

data$PROPDMGEXP <- as.numeric(clean_units(data$PROPDMGEXP))
data$CROPDMGEXP <- as.numeric(clean_units(data$CROPDMGEXP))
data$PD <- data$PROPDMG * data$PROPDMGEXP
data$CD <- data$CROPDMG * data$CROPDMGEXP
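The same exponent mapping can also be written as an explicit lookup table, which makes each convention visible at a glance. The sketch below is a hypothetical alternative (`exp_multiplier` is not part of the original analysis). It assumes single-character exponent codes and follows the conventions of `clean_units()`: K/M/B/H as powers of ten, digits 0-8 as a multiplier of 10, `+` as 1, and blank, `-` and `?` as 0.

```r
# Hypothetical table-driven alternative to clean_units(); assumes
# single-character exponent codes such as "K", "m", "5" or "".
exp_multiplier <- function(col) {
  codes <- toupper(trimws(as.character(col)))
  lookup <- c(K = 1e3, M = 1e6, B = 1e9, H = 1e2, "+" = 1, "-" = 0, "?" = 0)
  out <- unname(lookup[codes])        # NA for codes absent from the table
  out[codes == ""] <- 0               # blank exponent: damage counted as 0
  out[grepl("^[0-8]$", codes)] <- 10  # digits 0-8: multiplier of 10
  out
}

# Usage: a damage value of 2.5 with exponent "K" becomes 2500 dollars
exp_multiplier(c("K", "m", "", "5", "+", "?", "B", "h", "-"))
2.5 * exp_multiplier("K")
```

Unlike `clean_units()`, unknown codes come back as `NA` directly rather than through a failed `as.numeric()` conversion.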

2. Descriptive Statistics

2-1. Check the Distribution of the Event Types

num_type <- length(unique(data$EVTYPE))

The data contain 985 distinct event types (EVTYPE labels) in total; the 10 most frequent are listed below.

tab <- sort(table(data$EVTYPE), decreasing=TRUE)
head(tab, 10)
## 
##               HAIL          TSTM WIND  THUNDERSTORM WIND 
##             288661             219940              82563 
##            TORNADO        FLASH FLOOD              FLOOD 
##              60652              54277              25326 
## THUNDERSTORM WINDS          HIGH WIND          LIGHTNING 
##              20843              20212              15754 
##         HEAVY SNOW 
##              15708

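According to the records, the most frequent event type is HAIL, followed by TSTM WIND, THUNDERSTORM WIND and TORNADO. Note that TSTM is an abbreviation of thunderstorm, so TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS describe the same kind of event under different labels.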

However, the distribution is highly skewed. As the quantiles below show, 50% of the event types occur no more than twice, and only 1% of them have more than about 12,361 records.

quantile(tab, probs=c(0.25, 0.5, 0.75, 0.90, 0.99, 1))
##      25%      50%      75%      90%      99%     100% 
##      1.0      2.0      5.0     35.2  12360.6 288661.0
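The quantile statements can also be checked directly by computing the share of event types at or below a given count. The counts below are made up for illustration; on the real data the same expressions would be applied to `tab`.

```r
# Made-up record counts per event type (illustration only)
counts <- c(A = 1, B = 1, C = 2, D = 2, E = 3, F = 40, G = 5000)

# Share of event types recorded no more than twice
mean(counts <= 2)

# Median number of records per event type
quantile(counts, probs = 0.5)
```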

2-2. Distribution of Fatalities & Injuries

max_death_event <- as.character(data[which.max(data$FATALITIES), 'EVTYPE'])

The mean number of deaths per record is 0.0167849. The maximum, 583 deaths in a single record, was caused by HEAT.

summary(data$FATALITIES)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##   0.0000   0.0000   0.0000   0.0168   0.0000 583.0000
max_death_event
## [1] "HEAT"
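To find the event behind a maximum without hard-coding its value, `which.max()` can be used. Note that it returns only the first maximum; if several records tie, `which(x == max(x))` returns all of them. The vectors below are made up for illustration.

```r
# Made-up fatality counts and their event labels
fatalities <- c(0, 583, 2, 583)
evtype <- c("HAIL", "HEAT", "FLOOD", "TORNADO")

evtype[which.max(fatalities)]                 # first maximum only
evtype[which(fatalities == max(fatalities))]  # every tied maximum
```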

Moreover, at least 99% of the records report no deaths at all.

quantile(data$FATALITIES, probs=c(0.25, 0.5, 0.75, 0.99, 1))
##  25%  50%  75%  99% 100% 
##    0    0    0    0  583
max_injury_event <- as.character(data[which.max(data$INJURIES), 'EVTYPE'])

For injuries, the mean per record is 0.1557447. The maximum, 1,700 injuries in a single record, was caused by TORNADO.

summary(data$INJURIES)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##    0.0000    0.0000    0.0000    0.1557    0.0000 1700.0000
max_injury_event
## [1] "TORNADO"

Likewise, at least 98% of the records report no injuries.

quantile(data$INJURIES, probs=c(0.25, 0.5, 0.75, 0.98, 1))
##  25%  50%  75%  98% 100% 
##    0    0    0    0 1700
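A more direct check than reading quantiles is the proportion of records whose count is exactly zero. The helper `share_zero` below is hypothetical, and the vector is made up. With the real data, one would call `share_zero(data$FATALITIES)` or `share_zero(data$INJURIES)`.

```r
# Hypothetical helper: proportion of records whose value is exactly zero
share_zero <- function(x) mean(x == 0, na.rm = TRUE)

# Made-up counts: nine records without casualties and one with three
toy_counts <- c(rep(0, 9), 3)
share_zero(toy_counts)  # 0.9
```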

2-3. Distribution of Property & Crop Damage

max_pd_event <- as.character(data[which.max(data$PD), 'EVTYPE'])
max_cd_event <- as.character(data[which.max(data$CD), 'EVTYPE'])

The average property damage per record is about $4.7358987 × 10^5 (roughly $473,590). The maximum, $1.15 × 10^11 in a single record, was caused by FLOOD.

summary(data$PD)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.000e+00 0.000e+00 0.000e+00 4.736e+05 5.000e+02 1.150e+11
max_pd_event
## [1] "FLOOD"

The average crop damage per record is about $5.4421321 × 10^4 (roughly $54,421). The maximum, $5 × 10^9 in a single record, was caused by RIVER FLOOD.

summary(data$CD)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.000e+00 0.000e+00 0.000e+00 5.442e+04 0.000e+00 5.000e+09
max_cd_event
## [1] "RIVER FLOOD"
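Dollar amounts like these read more easily with thousands separators than in scientific notation. One way to produce them in base R is `format()` with `big.mark`; the helper `dollars` below is hypothetical.

```r
# Hypothetical helper: whole-dollar amounts with thousands separators
dollars <- function(x) {
  paste0("$", format(round(x), big.mark = ",", scientific = FALSE, trim = TRUE))
}

dollars(4.7358987e5)  # "$473,590"
dollars(1.15e11)      # "$115,000,000,000"
```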

3. Results

3-1. Most Harmful Disasters to Population Health: 1950-2011

get_event <- function(type, data){
    # Per event type: total and mean deaths and injuries
    summ <- data %>% group_by(EVTYPE) %>% 
        summarise(total_death = sum(FATALITIES, na.rm=TRUE), avg_death = mean(FATALITIES, na.rm=TRUE), 
                  total_inj = sum(INJURIES, na.rm=TRUE), avg_inj = mean(INJURIES, na.rm=TRUE))

    # Keep the event type with the largest value of the requested indicator
    event <- cbind(type, summ[which.max(summ[[type]]), c('EVTYPE', type)])
    names(event) <- c('Indicator', 'Event_type', 'Number')
    event
}

table_health <- rbind(get_event('total_death', data), get_event('total_inj', data), 
               get_event('avg_death', data), get_event('avg_inj', data))

Taking total deaths and injuries into account, tornadoes are the most harmful event type to population health across the United States. However, the number of records differs widely across event types, so these high totals may partly reflect how often tornadoes occur (4th most frequent, as shown in section 2-1). The highest average number of deaths per record belongs to the composite label "TORNADOES, TSTM WIND, HAIL". Section 2-1 showed that half of all event labels have at most two records, so averages for such rare labels may rest on very few events and should be read with caution. Overall, their high totals combined with high frequency make tornadoes a priority in any preventive decision-making process.

table_health
##     Indicator                 Event_type Number
## 1 total_death                    TORNADO   5633
## 2   total_inj                    TORNADO  91346
## 3   avg_death TORNADOES, TSTM WIND, HAIL     25
## 4     avg_inj                  Heat Wave     70

In addition, heat waves caused the highest average number of injuries per record (70), which the relevant authorities should also take into account.
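Many event labels have only one or two records (section 2-1), so their averages can be dominated by a single event. One possible safeguard, not used in the original analysis, is to rank averages only among labels with at least `min_n` records. In the sketch below, the threshold, the helper `top_average` and the data are all hypothetical. It uses base R rather than dplyr.

```r
# Made-up records: one rare label with a single severe event
toy <- data.frame(
  EVTYPE = c("TORNADO", "TORNADO", "TORNADO", "RARE LABEL", "HAIL", "HAIL"),
  FATALITIES = c(2, 0, 4, 25, 0, 0)
)

# Hypothetical helper: label with the highest mean among labels with >= min_n records
top_average <- function(df, col, min_n = 1) {
  groups <- split(df[[col]], df$EVTYPE)               # values grouped by label
  n <- lengths(groups)                                # records per label
  avg <- vapply(groups, mean, numeric(1), na.rm = TRUE)
  avg <- avg[n >= min_n]                              # drop labels with too few records
  names(avg)[which.max(avg)]
}

top_average(toy, "FATALITIES")             # "RARE LABEL"
top_average(toy, "FATALITIES", min_n = 3)  # "TORNADO"
```

The choice of `min_n` is a judgment call; the point is only that the ranking can change once rare labels are excluded.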

3-2. Most Harmful Disasters to Population Health: 2000-2011

More recent data are likely to be more relevant to present-day policy-making. Using only records from 2000 onward, the same conclusions hold for tornadoes: they cause the highest total deaths and injuries and remain frequent. However, rough seas and hurricanes/typhoons now lead to the highest average deaths and injuries per record, respectively.

data$Year <- year(mdy_hms(data$BGN_DATE))
data_2000 <- filter(data, Year >= 2000)
tab <- sort(table(data_2000$EVTYPE), decreasing=TRUE)
head(tab, 10)
## 
##              HAIL         TSTM WIND THUNDERSTORM WIND       FLASH FLOOD 
##            165719             85007             81402             40585 
##             FLOOD           TORNADO         HIGH WIND        HEAVY SNOW 
##             19961             17687             16411             10901 
##      WINTER STORM         LIGHTNING 
##              9774              9686
table_health_2000 <- rbind(get_event('total_death', data_2000), get_event('total_inj', data_2000), 
               get_event('avg_death', data_2000), get_event('avg_inj', data_2000))

table_health_2000
##     Indicator        Event_type       Number
## 1 total_death           TORNADO  1193.000000
## 2   total_inj           TORNADO 15213.000000
## 3   avg_death        ROUGH SEAS     2.666667
## 4     avg_inj HURRICANE/TYPHOON    14.488636
table_health$Year <- "1950-2011"
table_health_2000$Year <- "2000-2011"
t <- rbind(table_health, table_health_2000)
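The year filter above relies on lubridate's `mdy_hms()`. A base-R cross-check is possible, assuming `BGN_DATE` strings look like `"4/18/1950 0:00:00"` (month/day/year followed by a time of day). The year can then be extracted with `as.Date()`, whose underlying `strptime()` parsing ignores the trailing time. The sample strings are made up.

```r
# Assumed BGN_DATE format: month/day/year followed by a time of day
bgn_date <- c("4/18/1950 0:00:00", "11/15/2005 0:00:00")

# Parse only the date part; trailing characters are ignored by strptime()
yr <- as.integer(format(as.Date(bgn_date, format = "%m/%d/%Y"), "%Y"))
yr          # 1950 2005
yr >= 2000  # FALSE TRUE
```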

To wrap up, the four bar charts below summarize the most harmful event types for population health (as measured by deaths and injuries) in the two periods, one chart per indicator.

par(mfcol=c(2,2), mai=c(0.5, 0.8, 0.6, 0.2))
barplot(t[t$Indicator=='total_death', ]$Number, names.arg=c('Tornado\n1950-2011','Tornado\n2000-2011'), 
        ylab='deaths', col=c(brewer.pal(4,'Reds')[4:3]), main='Total Deaths')

barplot(t[t$Indicator=='avg_death', ]$Number, 
        names.arg=c('Tornadoes, Tstm Wind, Hail\n1950-2011', 'Rough seas\n2000-2011'), 
        ylab='deaths per record', col=c(brewer.pal(4,'Reds')[2:1]), main="Average Deaths")

barplot(t[t$Indicator=='total_inj', ]$Number, names.arg=c('Tornado\n1950-2011','Tornado\n2000-2011'), 
        ylab='injuries', col=c(brewer.pal(4,'Blues')[4:3]), main='Total Injuries')

barplot(t[t$Indicator=='avg_inj', ]$Number, names.arg=c('Heat Wave\n1950-2011', 'Hurricane/Typhoon\n2000-2011'), 
        ylab='injuries per record', col=c(brewer.pal(4,'Blues')[2:1]), main="Average Injuries")
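The bar labels above are typed by hand. They could instead be built from the summary table, so they stay in sync if the results change. In the sketch below, `res` is a toy data frame that mirrors the `Event_type` and `Year` columns of `t`, and `bar_labels` is a hypothetical helper that title-cases each event type.

```r
# Toy rows mirroring the Event_type and Year columns of the combined table
res <- data.frame(
  Event_type = c("Heat Wave", "HURRICANE/TYPHOON"),
  Year = c("1950-2011", "2000-2011")
)

# Hypothetical helper: "Event Type\nperiod" labels for barplot(names.arg = ...)
bar_labels <- function(df) {
  # Upper-case the first letter of every word, after lower-casing everything
  title <- gsub("\\b([a-z])", "\\U\\1", tolower(df$Event_type), perl = TRUE)
  paste0(title, "\n", df$Year)
}

bar_labels(res)
```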

3-3. Most Harmful Disasters to the Economy

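# Redefine get_event() for the damage columns PD and CD
get_event <- function(type, data){
    # Per event type: total and mean property (PD) and crop (CD) damage
    summ <- data %>% group_by(EVTYPE) %>% 
        summarise(total_pd = sum(PD, na.rm=TRUE), avg_pd = mean(PD, na.rm=TRUE), 
                  total_cd = sum(CD, na.rm=TRUE), avg_cd = mean(CD, na.rm=TRUE))

    # Keep the event type with the largest value of the requested indicator
    event <- cbind(type, summ[which.max(summ[[type]]), c('EVTYPE', type)])
    names(event) <- c('Indicator', 'Event_type', 'Number')
    event
}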

table_econ <- rbind(get_event('total_pd', data), get_event('total_cd', data), 
               get_event('avg_pd', data), get_event('avg_cd', data))

table_econ$Year <- "1950-2011"

table_econ_2000 <- rbind(get_event('total_pd', data_2000), get_event('total_cd', data_2000), 
               get_event('avg_pd', data_2000), get_event('avg_cd', data_2000))

table_econ_2000$Year <- "2000-2011"
t_econ <- rbind(table_econ, table_econ_2000)
table_econ
##   Indicator                 Event_type       Number      Year
## 1  total_pd                      FLOOD 144657709800 1950-2011
## 2  total_cd                    DROUGHT  13972566000 1950-2011
## 3    avg_pd TORNADOES, TSTM WIND, HAIL   1600000000 1950-2011
## 4    avg_cd          EXCESSIVE WETNESS    142000000 1950-2011
table_econ_2000
##   Indicator        Event_type       Number      Year
## 1  total_pd             FLOOD 134691074080 2000-2011
## 2  total_cd           DROUGHT   9135585000 2000-2011
## 3    avg_pd HURRICANE/TYPHOON    787566364 2000-2011
## 4    avg_cd HURRICANE/TYPHOON     29634918 2000-2011

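Floods have caused the greatest total property damage from both a historical (1950-2011) and a contemporary (2000-2011) point of view. The totals are over $144 billion and $134 billion, respectively, in nominal dollars, since no inflation adjustment is applied. Much of this comes from a single FLOOD record of about $1.15 × 10^11 (section 2-3), roughly 80% of the 1950-2011 total. Because that record is larger than the gap between the two totals, it must date from 2000-2011.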
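The figures already printed in this report are enough to place the largest single property-damage record in time and to gauge its weight in the FLOOD totals. The sketch below only reuses those printed numbers. The maximum record is taken as 1.15e11, as displayed in section 2-3.

```r
max_single_pd    <- 1.15e11       # largest single property-damage record (FLOOD), section 2-3
flood_total_1950 <- 144657709800  # FLOOD total property damage, 1950-2011
flood_total_2000 <- 134691074080  # FLOOD total property damage, 2000-2011

# If the record pre-dated 2000, the 1950-2011 total would be at least this large
max_single_pd + flood_total_2000 > flood_total_1950  # TRUE, so the record falls in 2000-2011

# Share of each period's FLOOD total contributed by that one record
round(max_single_pd / c(flood_total_1950, flood_total_2000), 3)  # 0.795 0.854
```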

Similarly, droughts have consistently caused the greatest total crop damage, with losses of almost $14 billion over 1950-2011 and over $9 billion over 2000-2011.

In terms of average damage per record over 1950-2011, property damage is highest for the composite label "tornadoes, TSTM wind, hail", with a mean loss of $1.6 billion. Crop damage is highest for excessive wetness, with a mean loss of $0.142 billion ($142 million).

Since 2000, hurricanes/typhoons have caused the greatest average losses for both property and crops, at almost $788 million and $30 million per record, respectively.

The summary figure is shown as follows:

par(mfcol=c(2,2), mai=c(0.5, 0.8, 0.6, 0.2))


barplot(t_econ[t_econ$Indicator=='total_pd', ]$Number, names.arg=c('Flood\n1950-2011','Flood\n2000-2011'), 
        ylab='US dollars', col=c(brewer.pal(4,'Reds')[4:3]), main='Total Property Damage')

barplot(t_econ[t_econ$Indicator=='avg_pd', ]$Number, 
        names.arg=c('Tornadoes, Tstm Wind, Hail\n1950-2011', 'Hurricane/Typhoon\n2000-2011'), 
        ylab='US dollars per record', col=c(brewer.pal(4,'Reds')[2:1]), main="Average Property Damage")

barplot(t_econ[t_econ$Indicator=='total_cd', ]$Number, 
        names.arg=c('Drought\n1950-2011','Drought\n2000-2011'), 
        ylab='US dollars', col=c(brewer.pal(4,'Blues')[4:3]), main='Total Crop Damage')

barplot(t_econ[t_econ$Indicator=='avg_cd', ]$Number, 
        names.arg=c('Excessive Wetness\n1950-2011', 'Hurricane/Typhoon\n2000-2011'), 
        ylab='US dollars per record', col=c(brewer.pal(4,'Blues')[2:1]), main="Average Crop Damage")

The four bar charts show the most harmful event types for the national economy (as measured by property and crop damage) in the two periods, one chart per indicator.

4. Conclusion

To conclude, tornadoes have consistently caused the most deaths and injuries in total, whereas floods and droughts pose the biggest threats to the economy through total property and crop damage.

On a per-record basis, over 1950-2011, the composite label "tornadoes, TSTM wind & hail", heat waves and excessive wetness stood out for the deaths, injuries and economic losses they caused. Since 2000, however, hurricanes/typhoons appear to have taken their place as the most disastrous event type, causing the greatest average injuries and economic losses.