HEALTH AND ECONOMIC IMPACT OF DIFFERENT EVENTS IN THE US

Synopsis

I did a simple analysis and found that tornadoes are the events that kill the most people, and floods are the events that cost the most.

Data Processing

  1. Install the “R.utils” package, then download the data, decompress it (bunzip2) and load it into a storm variable
## Loading required package: R.oo
## Loading required package: R.methodsS3
## R.methodsS3 v1.6.1 (2014-01-04) successfully loaded. See ?R.methodsS3 for help.
## R.oo v1.18.0 (2014-02-22) successfully loaded. See ?R.oo for help.
## 
## Attaching package: 'R.oo'
## 
## The following objects are masked from 'package:methods':
## 
##     getClasses, getMethods
## 
## The following objects are masked from 'package:base':
## 
##     attach, detach, gc, load, save
## 
## R.utils v1.34.0 (2014-10-07) successfully loaded. See ?R.utils for help.
## 
## Attaching package: 'R.utils'
## 
## The following object is masked from 'package:utils':
## 
##     timestamp
## 
## The following objects are masked from 'package:base':
## 
##     cat, commandArgs, getOption, inherits, isOpen, parse, warnings
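
The code for this step is not shown above. A minimal sketch of one way to do it, assuming the compressed file has already been downloaded: base R's read.csv can read a .bz2 file through a bzfile() connection, so the decompression step may not need a separate package. The helper name load_storm and the tiny demo file are hypothetical and only illustrate the idea.

```r
# Hypothetical helper: read a bzip2-compressed CSV into a data frame.
load_storm <- function(path) {
  read.csv(bzfile(path), stringsAsFactors = FALSE)
}

# Demo on a tiny made-up file (not the real storm data).
demo_path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(demo_path, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,1", "FLOOD,0"), con)
close(con)

storm_demo <- load_storm(demo_path)
```

With the real file, the call would be storm <- load_storm(<path to the downloaded file>).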

After a quick look at the data and reading the codebook, the variables needed to answer the given questions are:
- Types of events: EVTYPE
- Population health: FATALITIES, INJURIES
- Economic consequences: PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP

For that reason, I will make a new data frame with only the desired columns.

storm2 <- data.frame(EVTYPE=storm$EVTYPE,FATALITIES=storm$FATALITIES, INJURIES=storm$INJURIES, PROPDMG=storm$PROPDMG, PROPDMGEXP=storm$PROPDMGEXP, CROPDMG=storm$CROPDMG, CROPDMGEXP=storm$CROPDMGEXP )
rm(storm)

The important step is to convert the economic variables to full numbers. By convention, B = billions, M = millions, K = thousands and H = hundreds. The following code replaces each letter with its corresponding power-of-10 exponent (B = 9, M = 6, K = 3, H = 2). All other digits and symbols are changed to an exponent of zero.

storm2$PROPDMGEXP <- as.character(storm2$PROPDMGEXP)

storm2$PROPDMGEXP[storm2$PROPDMGEXP=="-"] <- "0"
storm2$PROPDMGEXP[storm2$PROPDMGEXP=="?"] <- "0"
storm2$PROPDMGEXP[storm2$PROPDMGEXP=="+"] <- "0"
storm2$PROPDMGEXP[storm2$PROPDMGEXP=="0"] <- "0"
storm2$PROPDMGEXP[storm2$PROPDMGEXP=="1"] <- "0"
storm2$PROPDMGEXP[storm2$PROPDMGEXP=="2"] <- "0"
storm2$PROPDMGEXP[storm2$PROPDMGEXP=="3"] <- "0"
storm2$PROPDMGEXP[storm2$PROPDMGEXP=="4"] <- "0"
storm2$PROPDMGEXP[storm2$PROPDMGEXP=="5"] <- "0"
storm2$PROPDMGEXP[storm2$PROPDMGEXP=="6"] <- "0"
storm2$PROPDMGEXP[storm2$PROPDMGEXP=="7"] <- "0"
storm2$PROPDMGEXP[storm2$PROPDMGEXP=="8"] <- "0"

storm2$PROPDMGEXP[storm2$PROPDMGEXP=="K"] <- "3"
storm2$PROPDMGEXP[storm2$PROPDMGEXP=="B"] <- "9"
storm2$PROPDMGEXP[storm2$PROPDMGEXP=="b"] <- "9"
storm2$PROPDMGEXP[storm2$PROPDMGEXP=="M"] <- "6"
storm2$PROPDMGEXP[storm2$PROPDMGEXP=="m"] <- "6"
storm2$PROPDMGEXP[storm2$PROPDMGEXP=="H"] <- "2"
storm2$PROPDMGEXP[storm2$PROPDMGEXP=="h"] <- "2"

storm2$CROPDMGEXP<- as.character(storm2$CROPDMGEXP)
storm2$CROPDMGEXP[storm2$CROPDMGEXP=="-"] <- "0"
storm2$CROPDMGEXP[storm2$CROPDMGEXP=="?"] <- "0"
storm2$CROPDMGEXP[storm2$CROPDMGEXP=="+"] <- "0"
storm2$CROPDMGEXP[storm2$CROPDMGEXP=="0"] <- "0"
storm2$CROPDMGEXP[storm2$CROPDMGEXP=="1"] <- "0"
storm2$CROPDMGEXP[storm2$CROPDMGEXP=="2"] <- "0"
storm2$CROPDMGEXP[storm2$CROPDMGEXP=="3"] <- "0"
storm2$CROPDMGEXP[storm2$CROPDMGEXP=="4"] <- "0"
storm2$CROPDMGEXP[storm2$CROPDMGEXP=="5"] <- "0"
storm2$CROPDMGEXP[storm2$CROPDMGEXP=="6"] <- "0"
storm2$CROPDMGEXP[storm2$CROPDMGEXP=="7"] <- "0"
storm2$CROPDMGEXP[storm2$CROPDMGEXP=="8"] <- "0"

storm2$CROPDMGEXP[storm2$CROPDMGEXP=="K"] <- "3"
storm2$CROPDMGEXP[storm2$CROPDMGEXP=="B"] <- "9"
storm2$CROPDMGEXP[storm2$CROPDMGEXP=="b"] <- "9"
storm2$CROPDMGEXP[storm2$CROPDMGEXP=="M"] <- "6"
storm2$CROPDMGEXP[storm2$CROPDMGEXP=="m"] <- "6"
storm2$CROPDMGEXP[storm2$CROPDMGEXP=="H"] <- "2"
storm2$CROPDMGEXP[storm2$CROPDMGEXP=="h"] <- "2"

storm2$PROPDMGEXP <- as.numeric(storm2$PROPDMGEXP)
storm2$CROPDMGEXP<- as.numeric(storm2$CROPDMGEXP)
## Warning: NAs introduced by coercion
storm2$CROPDMGEXP[is.na(storm2$CROPDMGEXP)] <- 0
storm2$PROPDMGEXP[is.na(storm2$PROPDMGEXP)] <- 0
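
The same mapping can be written more compactly with a named lookup vector. This is a minimal sketch, not the code used for the results in this report; the helper name exp_to_power is hypothetical. One difference to note: because it applies toupper(), a lowercase "k" (if present in the data) maps to 3 here, whereas the code above has no rule for it and it ends up as 0.

```r
# Hypothetical helper: map exponent codes to powers of 10.
exp_to_power <- function(code) {
  lookup <- c(H = 2, K = 3, M = 6, B = 9)
  power <- lookup[toupper(as.character(code))]
  power[is.na(power)] <- 0   # digits, symbols and blanks become 0
  unname(power)
}

exp_to_power(c("K", "m", "B", "h", "?", "5", ""))
# 3 6 9 2 0 0 0
```

It could then be applied as storm2$PROPDMGEXP <- exp_to_power(storm2$PROPDMGEXP), and likewise for CROPDMGEXP.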

After this, I will make a new column, NETDMG, by multiplying each DMG column by 10 raised to its DMGEXP exponent and adding the property and crop amounts together.

storm2$NETDMG <- (storm2$PROPDMG*10^storm2$PROPDMGEXP)+(storm2$CROPDMG*10^storm2$CROPDMGEXP)
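
As a worked example of this formula, here is a single record with made-up values (for illustration only, not taken from the data):

```r
# Made-up values for a single record, for illustration only.
PROPDMG <- 25;  PROPDMGEXP <- 3   # 25 with "K" -> 25 * 10^3
CROPDMG <- 1.5; CROPDMGEXP <- 6   # 1.5 with "M" -> 1.5 * 10^6

NETDMG <- (PROPDMG * 10^PROPDMGEXP) + (CROPDMG * 10^CROPDMGEXP)
NETDMG  # 1525000
```

Note that an exponent of 0 gives 10^0 = 1, so a record with an unrecognised code keeps its DMG value at face value instead of being dropped.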

I will make a simpler data frame with the event types, fatalities, injuries and total economic cost.

storm3 <- data.frame(EVTYPE=storm2$EVTYPE,FATALITIES=storm2$FATALITIES, INJURIES=storm2$INJURIES,NETDMG=storm2$NETDMG)

To finish this part, I will aggregate the data by event type and create a separate data frame for each of injuries, fatalities and net cost.

storm_conc <-aggregate(. ~ EVTYPE, storm3, sum)
str(storm_conc)
## 'data.frame':    985 obs. of  4 variables:
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ INJURIES  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ NETDMG    : num  200000 0 50000 0 8100000 8000 0 0 5000 0 ...
# health_fatalities and health_injuries: self-explanatory names for the new data frames, sorted in descending order
health_injuries <- storm_conc[order(-storm_conc$INJURIES),]
health_injuries <- data.frame(EVTYPE=as.character(health_injuries$EVTYPE), INJURIES=health_injuries$INJURIES)
health_fatalities <- storm_conc[order(-storm_conc$FATALITIES),]
health_fatalities <- data.frame(EVTYPE=as.character(health_fatalities$EVTYPE), FATALITIES=health_fatalities$FATALITIES)

net_costs <- storm_conc[order(-storm_conc$NETDMG),]
net_costs <- data.frame(EVTYPE=net_costs$EVTYPE, NETDMG=net_costs$NETDMG)

rm(storm2,storm3,storm_conc)

Results

Now that our data set is tidy, I can proceed to answer the questions.

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

After analysing the tidy data frame health_fatalities, I see that only the first 10 event types have a large number of fatalities, and only the first 20 have more than 100 deaths.

head(health_fatalities, n=22)
##                     EVTYPE FATALITIES
## 1                  TORNADO       5633
## 2           EXCESSIVE HEAT       1903
## 3              FLASH FLOOD        978
## 4                     HEAT        937
## 5                LIGHTNING        816
## 6                TSTM WIND        504
## 7                    FLOOD        470
## 8              RIP CURRENT        368
## 9                HIGH WIND        248
## 10               AVALANCHE        224
## 11            WINTER STORM        206
## 12            RIP CURRENTS        204
## 13               HEAT WAVE        172
## 14            EXTREME COLD        160
## 15       THUNDERSTORM WIND        133
## 16              HEAVY SNOW        127
## 17 EXTREME COLD/WIND CHILL        125
## 18             STRONG WIND        103
## 19                BLIZZARD        101
## 20               HIGH SURF        101
## 21              HEAVY RAIN         98
## 22            EXTREME HEAT         96

As can be seen, tornadoes and excessive heat are by far the two event types that killed the most people.

For the total number of injuries, I will consider only the first 14 causes, because each of them injured more than 1000 people.

head(health_injuries, n=22) 
##                EVTYPE INJURIES
## 1             TORNADO    91346
## 2           TSTM WIND     6957
## 3               FLOOD     6789
## 4      EXCESSIVE HEAT     6525
## 5           LIGHTNING     5230
## 6                HEAT     2100
## 7           ICE STORM     1975
## 8         FLASH FLOOD     1777
## 9   THUNDERSTORM WIND     1488
## 10               HAIL     1361
## 11       WINTER STORM     1321
## 12  HURRICANE/TYPHOON     1275
## 13          HIGH WIND     1137
## 14         HEAVY SNOW     1021
## 15           WILDFIRE      911
## 16 THUNDERSTORM WINDS      908
## 17           BLIZZARD      805
## 18                FOG      734
## 19   WILD/FOREST FIRE      545
## 20         DUST STORM      440
## 21     WINTER WEATHER      398
## 22          DENSE FOG      342

As can be seen, tornado is again the event type with the most injuries, followed by four others (TSTM wind, flood, excessive heat and lightning).

Across the United States, which types of events have the greatest economic consequences?

As you can see, flood is by far the most expensive type of event.

head(net_costs, n=10)
##               EVTYPE       NETDMG
## 1              FLOOD 150319678257
## 2  HURRICANE/TYPHOON  71913712800
## 3            TORNADO  57352114049
## 4        STORM SURGE  43323541000
## 5               HAIL  18757805433
## 6        FLASH FLOOD  17562129167
## 7            DROUGHT  15018672000
## 8          HURRICANE  14610229010
## 9        RIVER FLOOD  10148404500
## 10         ICE STORM   8967041360
plot(NETDMG[1:5]~factor(EVTYPE[1:5]), las=2, net_costs,xlab="", main="Total costs per event", type="l")

dev.off()
## null device 
##           1
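
A bar chart may show this ranking more clearly, since plot() with a factor on the x-axis typically draws box plots rather than a line. The following is a minimal sketch, not the plot produced for this report; it rebuilds the top five rows from the table above so that it runs on its own, and the names top5 and mids are hypothetical.

```r
# Top five rows copied from the head(net_costs) output above.
top5 <- data.frame(
  EVTYPE = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "STORM SURGE", "HAIL"),
  NETDMG = c(150319678257, 71913712800, 57352114049, 43323541000, 18757805433)
)

par(mar = c(10, 5, 3, 1))            # extra bottom margin for rotated labels
mids <- barplot(top5$NETDMG / 1e9,   # express costs in billions of dollars
                names.arg = top5$EVTYPE, las = 2,
                ylab = "Total cost (billions of dollars)",
                main = "Total costs per event")
```

With the full data, top5 could be replaced by head(net_costs, n=5).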

Discussion

Several points limit the accuracy of this analysis. First, tornadoes are the longest-documented type of event; the others started being documented years later, which gives tornadoes an advantage in the totals. Also, total economic cost is not comparable across years, because of changes in total population and in the real value of money. For example, 25k dollars in 1950 is not worth the same as 25k dollars in 1995. For that reason it is very difficult to draw firm conclusions. I would suggest correcting for those variables: the economic and health impact could be adjusted per 100 000 people, and event types could be compared per decade. That work far exceeds what was asked in this exercise, but I wanted to explain it briefly.
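
The two corrections suggested above could be sketched as follows. This is only an illustration under stated assumptions: the data frame toy, its YEAR and POPULATION columns and all of its numbers are made up, and none of them come from the storm data used in this report.

```r
# Made-up yearly records, for illustration only (not from the storm data).
toy <- data.frame(
  YEAR       = c(1953, 1957, 1994, 1998),
  FATALITIES = c(50, 30, 20, 40),
  POPULATION = c(160e6, 172e6, 263e6, 276e6)   # hypothetical population figures
)

toy$DECADE   <- (toy$YEAR %/% 10) * 10                  # e.g. 1953 -> 1950
toy$PER_100K <- toy$FATALITIES / toy$POPULATION * 1e5   # rate per 100 000 people

# Sum the raw counts and the yearly rates within each decade.
by_decade <- aggregate(cbind(FATALITIES, PER_100K) ~ DECADE, data = toy, FUN = sum)
```

Comparing by_decade$PER_100K across decades, rather than raw totals, would reduce the effect of population growth on the comparison.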

Conclusions

I conclude that tornadoes are the events that kill the most people and floods are the events that cost the most.

I'm sorry I ran out of time for the plots.