Consequences of natural disasters for population health and the economy

Synopsis

The purpose of this analysis is to deepen our understanding of the harmful effects of natural disasters. These effects range from public health consequences, such as fatalities and injuries, to economic consequences, such as property and crop damage. The analysis explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, whose events span from 1950 to November 2011.

Data Processing

Preparing for the analysis, reading and loading the data:

First we load the dplyr library:

library(dplyr)

Downloading and reading the data. The read.csv function can read the bz2-compressed file directly, so no separate decompression step is needed.

download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2","./stormdata.csv.bz2")
data <- read.csv("./stormdata.csv.bz2")
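
As a minimal sketch of why no separate decompression step is needed, the following toy example writes a small bzip2-compressed CSV and reads it back with read.csv. The data frame toy, the temporary file and the argument stringsAsFactors = TRUE are assumptions made for illustration. The last one reflects that the factor output shown in this report suggests an R version in which strings were converted to factors by default.

```r
# Toy data standing in for the storm database (hypothetical values)
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(3, 0))

# Write the toy data through a bzip2-compressing connection
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv opens the path with file(), which detects the compression itself
roundtrip <- read.csv(tmp, stringsAsFactors = TRUE)
str(roundtrip)

unlink(tmp)  # remove the temporary file
```

In recent R versions read.csv keeps strings as character vectors unless stringsAsFactors = TRUE is passed, in which case str and summary would print EVTYPE differently from the output shown here.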

Using dim we can see the number of observations and variables.

dim(data)
## [1] 902297     37

Exploring and cleaning the EVTYPE variable:

The main variable for the disaster type is EVTYPE. It is inconsistently coded: it has 985 distinct levels, many of which are spelling or wording variants of the same event (for example TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS). We therefore clean it by grouping similar events together, and first explore the data to get an idea of the different categories.

str(data$EVTYPE)
##  Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
summary(data$EVTYPE)
##                     HAIL                TSTM WIND        THUNDERSTORM WIND 
##                   288661                   219940                    82563 
##                  TORNADO              FLASH FLOOD                    FLOOD 
##                    60652                    54277                    25326 
##       THUNDERSTORM WINDS                HIGH WIND                LIGHTNING 
##                    20843                    20212                    15754 
##               HEAVY SNOW               HEAVY RAIN             WINTER STORM 
##                    15708                    11723                    11433 
##           WINTER WEATHER             FUNNEL CLOUD         MARINE TSTM WIND 
##                     7026                     6839                     6175 
## MARINE THUNDERSTORM WIND               WATERSPOUT              STRONG WIND 
##                     5812                     3796                     3566 
##     URBAN/SML STREAM FLD                 WILDFIRE                 BLIZZARD 
##                     3392                     2761                     2719 
##                  DROUGHT                ICE STORM           EXCESSIVE HEAT 
##                     2488                     2006                     1678 
##               HIGH WINDS         WILD/FOREST FIRE             FROST/FREEZE 
##                     1533                     1457                     1342 
##                DENSE FOG       WINTER WEATHER/MIX           TSTM WIND/HAIL 
##                     1293                     1104                     1028 
##  EXTREME COLD/WIND CHILL                     HEAT                HIGH SURF 
##                     1002                      767                      725 
##           TROPICAL STORM           FLASH FLOODING             EXTREME COLD 
##                      690                      682                      655 
##            COASTAL FLOOD         LAKE-EFFECT SNOW        FLOOD/FLASH FLOOD 
##                      650                      636                      624 
##                LANDSLIDE                     SNOW          COLD/WIND CHILL 
##                      600                      587                      539 
##                      FOG              RIP CURRENT              MARINE HAIL 
##                      538                      470                      442 
##               DUST STORM                AVALANCHE                     WIND 
##                      427                      386                      340 
##             RIP CURRENTS              STORM SURGE            FREEZING RAIN 
##                      304                      261                      250 
##              URBAN FLOOD     HEAVY SURF/HIGH SURF        EXTREME WINDCHILL 
##                      249                      228                      204 
##             STRONG WINDS           DRY MICROBURST    ASTRONOMICAL LOW TIDE 
##                      196                      186                      174 
##                HURRICANE              RIVER FLOOD               LIGHT SNOW 
##                      174                      173                      154 
##         STORM SURGE/TIDE            RECORD WARMTH         COASTAL FLOODING 
##                      148                      146                      143 
##               DUST DEVIL         MARINE HIGH WIND        UNSEASONABLY WARM 
##                      141                      135                      126 
##                 FLOODING   ASTRONOMICAL HIGH TIDE        MODERATE SNOWFALL 
##                      120                      103                      101 
##           URBAN FLOODING               WINTRY MIX        HURRICANE/TYPHOON 
##                       98                       90                       88 
##            FUNNEL CLOUDS               HEAVY SURF              RECORD HEAT 
##                       87                       84                       81 
##                   FREEZE                HEAT WAVE                     COLD 
##                       74                       74                       72 
##              RECORD COLD                      ICE  THUNDERSTORM WINDS HAIL 
##                       64                       61                       61 
##      TROPICAL DEPRESSION                    SLEET         UNSEASONABLY DRY 
##                       60                       59                       56 
##                    FROST              GUSTY WINDS      THUNDERSTORM WINDSS 
##                       53                       53                       51 
##       MARINE STRONG WIND                    OTHER               SMALL HAIL 
##                       48                       48                       47 
##                   FUNNEL             FREEZING FOG             THUNDERSTORM 
##                       46                       45                       45 
##       Temperature record          TSTM WIND (G45)         Coastal Flooding 
##                       43                       39                       38 
##              WATERSPOUTS    MONTHLY PRECIPITATION                    WINDS 
##                       37                       36                       36 
##                  (Other) 
##                     2940

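After exploring the data, we match keywords with grepl and assign every matching event to a common category. The rules are applied in sequence and each one overwrites the label, so when an event name contains several keywords the earliest rule decides its category: THUNDERSTORM WIND and TSTM WIND, for example, end up under "strong wind" and not under "thunderstorm".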

# Work on a lower-case copy so that keyword matching ignores case
evtype <- tolower(data$EVTYPE)

# Each rule relabels every event whose name contains the keyword.
# Rules run top to bottom, so the first matching keyword decides the category.
evtype[grepl("wind", evtype)] <- "strong wind"
evtype[grepl("winter|wintry", evtype)] <- "winter weather"
evtype[grepl("snow", evtype)] <- "snow"
evtype[grepl("thunderstorm", evtype)] <- "thunderstorm"
evtype[grepl("flood", evtype)] <- "flooding"
evtype[grepl("current", evtype)] <- "rip current"
evtype[grepl("hurricane", evtype)] <- "hurricane"
evtype[grepl("tornado", evtype)] <- "tornado"
evtype[grepl("heat", evtype)] <- "heat"
evtype[grepl("surf", evtype)] <- "high surf"

# Write the cleaned labels back to the EVTYPE column (column 8)
data$EVTYPE <- as.factor(evtype)
str(data$EVTYPE)
##  Factor w/ 427 levels " lightning"," waterspout",..: 356 356 356 356 356 356 356 356 356 356 ...
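
The keyword rules can also be kept in a single lookup table and applied in a loop, which makes their order explicit. The following is a minimal sketch on a handful of made-up labels, not the code used for this report; the vector rules and the toy labels are hypothetical names introduced for illustration.

```r
# Toy event labels, already lower-cased (not the NOAA data)
evtype <- c("tstm wind", "thunderstorm winds", "winter storm high winds",
            "flash flood", "heat wave", "hail")

# Keyword pattern -> category, listed in the order the rules are applied
rules <- c(
  "wind"          = "strong wind",
  "winter|wintry" = "winter weather",
  "thunderstorm"  = "thunderstorm",
  "flood"         = "flooding",
  "heat"          = "heat"
)

# Apply each rule in turn; a relabelled event no longer contains the
# later keywords, so the first matching rule decides its category
for (pattern in names(rules)) {
  hit <- grepl(pattern, evtype)
  evtype[hit] <- rules[[pattern]]
}

print(evtype)
```

In this sketch all three wind-related labels end up as "strong wind", including the thunderstorm and winter storm entries, while "hail" matches no rule and is left unchanged.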

Many near-duplicate categories remain among the 427 levels that are left, and they are hard to fix by hand. This much cleaning should nevertheless suffice for our analysis, especially if the results show large margins between the leading categories.

Cleaning the PROPDMGEXP and CROPDMGEXP variables:

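The property and crop damage figures also need fixing. The amounts in PROPDMG and CROPDMG are stored together with a magnitude code in PROPDMGEXP and CROPDMGEXP, which indicates whether the number is in thousands (K), millions (M) or billions (B) of dollars. We multiply each amount by the corresponding factor; rows with any other code are left unscaled.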

# Scale property damage (PROPDMG, column 25) to dollars using PROPDMGEXP
grepped <- grepl("k", data$PROPDMGEXP, ignore.case = TRUE)  # thousands
data$PROPDMG[grepped] <- data$PROPDMG[grepped] * 1e3
grepped <- grepl("m", data$PROPDMGEXP, ignore.case = TRUE)  # millions
data$PROPDMG[grepped] <- data$PROPDMG[grepped] * 1e6
grepped <- grepl("b", data$PROPDMGEXP, ignore.case = TRUE)  # billions
data$PROPDMG[grepped] <- data$PROPDMG[grepped] * 1e9

# Scale crop damage (CROPDMG, column 27) to dollars using CROPDMGEXP
grepped <- grepl("k", data$CROPDMGEXP, ignore.case = TRUE)  # thousands
data$CROPDMG[grepped] <- data$CROPDMG[grepped] * 1e3
grepped <- grepl("m", data$CROPDMGEXP, ignore.case = TRUE)  # millions
data$CROPDMG[grepped] <- data$CROPDMG[grepped] * 1e6
grepped <- grepl("b", data$CROPDMGEXP, ignore.case = TRUE)  # billions
data$CROPDMG[grepped] <- data$CROPDMG[grepped] * 1e9
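
The same scaling can be written as one lookup from magnitude code to multiplier. The sketch below illustrates the idea on made-up records and is not the code used for this report; the lookup vector multiplier and the column PROPDMG_USD are hypothetical names.

```r
# Toy damage records; column names mirror the storm data, values are made up
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "m", "B", ""),
  stringsAsFactors = FALSE
)

# Multiplier for each magnitude code
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Look up each code case-insensitively; codes missing from the table
# yield NA and get a multiplier of 1, so those rows stay unscaled
mult <- unname(multiplier[toupper(toy$PROPDMGEXP)])
mult[is.na(mult)] <- 1

toy$PROPDMG_USD <- toy$PROPDMG * mult
print(toy)
```

Keeping the original amount and writing the result to a new column also makes the step safe to run twice, whereas multiplying a column in place scales it again on every rerun.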

Results

Effects on public health

First we total the injuries and the fatalities for each event type and keep only the event types with a nonzero total:

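# Total injuries per event type; which() keeps only the nonzero totals
injs <- tapply(data$INJURIES, data$EVTYPE, sum)
injs <- injs[which(injs != 0)]

# Total fatalities per event type; which() keeps only the nonzero totals
fatlts <- tapply(data$FATALITIES, data$EVTYPE, sum)
fatlts <- fatlts[which(fatlts != 0)]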
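
To make the aggregation step concrete, here is a minimal sketch on a toy data frame. The values are made up for illustration and the name top is hypothetical; only the calls mirror the ones used in this report.

```r
# Toy records: five events across three event types (made-up numbers)
toy <- data.frame(
  EVTYPE   = factor(c("tornado", "heat", "tornado", "hail", "heat")),
  INJURIES = c(10, 4, 5, 0, 1)
)

# Sum injuries within each event type; the result is named by event type
injs <- tapply(toy$INJURIES, toy$EVTYPE, sum)

# Drop event types whose total is zero (the toy data has no NA values,
# so plain logical indexing is enough here)
injs <- injs[injs != 0]

# Largest totals first, then keep the two leading event types
top <- sort(injs, decreasing = TRUE)[1:2]
print(top)
```

Here "hail" is dropped because its total is zero, and the remaining types are ranked by their summed injuries.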

Then we draw bar plots of the six event types with the highest totals, side by side:

# Largest totals first
injs <- sort(injs, decreasing = TRUE)
fatlts <- sort(fatlts, decreasing = TRUE)

# Two panels side by side; las = 2 draws axis labels perpendicular to the axis
par(mfrow = c(1, 2), las = 2, mar = c(6, 4, 2, 2))
bar_cols <- c("red", "lightblue", "yellow", "grey", "blue", "white")
barplot(injs[1:6], col = bar_cols, main = "Injuries by Disaster")
barplot(fatlts[1:6], col = bar_cols, main = "Deaths by Disaster")

The plots show that tornadoes are by far the most harmful disaster type for public health.

Effects on properties and crops

We apply the same method to property and crop damage:

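# Total property damage per event type; which() keeps only the nonzero totals
props <- tapply(data$PROPDMG, data$EVTYPE, sum)
props <- props[which(props != 0)]

# Total crop damage per event type; which() keeps only the nonzero totals
crops <- tapply(data$CROPDMG, data$EVTYPE, sum)
crops <- crops[which(crops != 0)]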

Again we draw bar plots of the six event types with the highest totals, side by side:

# Largest totals first
props <- sort(props, decreasing = TRUE)
crops <- sort(crops, decreasing = TRUE)

# Two panels side by side; las = 2 draws axis labels perpendicular to the axis
par(mfrow = c(1, 2), las = 2, mar = c(6, 4, 2, 2))
bar_cols <- c("red", "lightblue", "yellow", "grey", "blue", "white")
barplot(props[1:6], col = bar_cols, main = "Property Damage by Disaster")
barplot(crops[1:6], col = bar_cols, main = "Crop Damage by Disaster")

The plots show that flooding causes by far the most property damage, followed by hurricanes, while drought causes the most crop damage, followed by flooding.

Conclusion

Tornadoes are by far the most harmful disaster type for public health. Flooding causes the most property damage, followed by hurricanes, and drought causes the most crop damage, followed by flooding.

These results can help indicate where resources for the prevention of, and protection from, these natural disasters should be allocated.