Impact from severe weather events on public health and economy

author : @StefMT2970

Ref : R Core Team (2014). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage.
This report involves exploring the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database.It tracks the impact of major storms and weather events in the United States, with regards to estimates of any fatalities, injuries, and property and crop damage. Data is reported as-is, records have not been removed or updated. The goal of the analysis is to detect the most important events.
It is direclty visible that tornados are the events which cause most human damage whilst flooding is responsible for most economic impact.

Data processing

The dataset under examination is read from the web and loaded into the storm_tbl data table.

library(ggplot2)
library(data.table)
library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:data.table':
## 
##     between, last
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(lubridate)
## 
## Attaching package: 'lubridate'
## 
## The following objects are masked from 'package:data.table':
## 
##     hour, mday, month, quarter, wday, week, yday, year
s <-sessionInfo()
s[1]$R.version$version.string
## [1] "R version 3.1.2 (2014-10-31)"
s[1]$R.version$nickname
## [1] "Pumpkin Helmet"
s[1]$R.version$platform
## [1] "x86_64-w64-mingw32"
oldwd <- getwd()
setwd("d:/dev/app/gitrepo/RepData_PeerAssessment2")
# dir()
if(!file.exists("stormData.csv.bz2")) {
  file_url <- 
     "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  download.file(url = file_url, destfile = "stormData.csv.bz2")
}

storm_tbl <- data.table(read.csv(bzfile("stormData.csv.bz2")))
# str(storm_tbl)
# storm data is a large dataset, the next line of code gives an overview
# of memory usage.
# sort( sapply(ls(),function(x){object.size(get(x))}))
# tables()

Certain variables are not in the best format to work with. Only few variables are required in this analysis. For the impact on health, 2 variables (fatalities and injuries) are added together. For the impact on economy, asset property and crop damage are counted together. Dollar amounts have to be calculated by multiplying the estimate (rounded to three significant digits), with an alphabetical character signifying the magnitude of the number, i.e., 1.55B for $1,550,000,000. Alphabetical characters used to signify magnitude include “K” for thousands, “M” for millions, and “B” for billions. Details are in National Climatic Data Center Storm Events

workset <-  storm_tbl %>%
            select(BGN_DATE,
                   REFNUM,
                   STATE,
                   EVTYPE,
                   FATALITIES,
                   INJURIES,
                   PROPDMG,
                   PROPDMGEXP,
                   CROPDMG,
                   CROPDMGEXP)

workset$BGN_DATE <- as.Date(workset$BGN_DATE, "%m/%d/%Y")
workset$BGN_YEAR <- year(workset$BGN_DATE)
workset$HEALTH <- workset$FATALITIES + workset$INJURIES

unitConversion <- function(uom){
   multiplicator <- vector(mode="numeric", length=length(uom))
   for ( i in 1:length(uom)){
      if      (uom[i] == "K") multiplicator[i] = 1000
      else if (uom[i] == "M") multiplicator[i] = 1000000
      else if (uom[i] == "B") multiplicator[i] = 1000000000
      else multiplicator[i] = 0
      }
   multiplicator
   }

workset[ , ECONOMICAL:=PROPDMG * unitConversion(PROPDMGEXP) +
                       CROPDMG * unitConversion(CROPDMGEXP)]
##           BGN_DATE REFNUM STATE     EVTYPE FATALITIES INJURIES PROPDMG
##      1: 1950-04-18      1    AL    TORNADO          0       15    25.0
##      2: 1950-04-18      2    AL    TORNADO          0        0     2.5
##      3: 1951-02-20      3    AL    TORNADO          0        2    25.0
##      4: 1951-06-08      4    AL    TORNADO          0        2     2.5
##      5: 1951-11-15      5    AL    TORNADO          0        2     2.5
##     ---                                                               
## 902293: 2011-11-30 902293    WY  HIGH WIND          0        0     0.0
## 902294: 2011-11-10 902294    MT  HIGH WIND          0        0     0.0
## 902295: 2011-11-08 902295    AK  HIGH WIND          0        0     0.0
## 902296: 2011-11-09 902296    AK   BLIZZARD          0        0     0.0
## 902297: 2011-11-28 902297    AL HEAVY SNOW          0        0     0.0
##         PROPDMGEXP CROPDMG CROPDMGEXP BGN_YEAR HEALTH ECONOMICAL
##      1:          K       0                1950     15      25000
##      2:          K       0                1950      0       2500
##      3:          K       0                1951      2      25000
##      4:          K       0                1951      2       2500
##      5:          K       0                1951      2       2500
##     ---                                                         
## 902293:          K       0          K     2011      0          0
## 902294:          K       0          K     2011      0          0
## 902295:          K       0          K     2011      0          0
## 902296:          K       0          K     2011      0          0
## 902297:          K       0          K     2011      0          0

We are mainly interested in health and economical impact so let's investigate for any outliers in those 2 calculated variables. The data remains untouched, the tables show the largest values for both impact categories. For the oulier (refnum 605943) the orginal remark is shown. It is not clear from this remark if the reported economical damage has a data entry error or not. To validate the large events, historical research would be required which is outside the scope of this report.

summary(workset$HEALTH)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##    0.0000    0.0000    0.0000    0.1725    0.0000 1742.0000
summary(workset$ECONOMICAL)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.00e+00 0.00e+00 0.00e+00 5.28e+05 1.00e+03 1.15e+11
workset[HEALTH > 1000] %>% 
   select(BGN_DATE, REFNUM, EVTYPE, HEALTH, FATALITIES, INJURIES, ECONOMICAL)
##      BGN_DATE REFNUM    EVTYPE HEALTH FATALITIES INJURIES ECONOMICAL
## 1: 1953-06-09  67884   TORNADO   1318         90     1228    2.5e+08
## 2: 1974-04-03 116011   TORNADO   1186         36     1150    2.5e+08
## 3: 1979-04-10 157885   TORNADO   1742         42     1700    2.5e+08
## 4: 1994-02-08 223436 ICE STORM   1569          1     1568    5.5e+07
## 5: 2011-05-22 862563   TORNADO   1308        158     1150    2.8e+09
workset[ECONOMICAL > 9000000000] %>% 
   select(BGN_DATE, REFNUM, EVTYPE, ECONOMICAL, HEALTH)
##      BGN_DATE REFNUM            EVTYPE   ECONOMICAL HEALTH
## 1: 1993-08-31 198375       RIVER FLOOD  10000000000      0
## 2: 2005-10-24 569288 HURRICANE/TYPHOON  10000000000      5
## 3: 2005-08-28 577615 HURRICANE/TYPHOON  16930000000      0
## 4: 2005-08-29 577616       STORM SURGE  31300000000      0
## 5: 2005-08-29 581535       STORM SURGE  11260000000      0
## 6: 2006-01-01 605943             FLOOD 115032500000      0
storm_tbl [REFNUM== 605943] %>% select(REFNUM, EVTYPE, REMARKS)
##    REFNUM EVTYPE
## 1: 605943  FLOOD
##                                                                                                                                                                                                                                                                                                                                                                                           REMARKS
## 1: Major flooding continued into the early hours of January 1st, before the Napa River finally fell below flood stage and the water receeded. Flooding was severe in Downtown Napa from the Napa Creek and the City and Parks Department was hit with $6 million in damage alone. The City of Napa had 600 homes with moderate damage, 150 damaged businesses with costs of at least $70 million.

Although the official guidelines only make notice of a small number of event types, the dataset contains 985 types. However, some quick calculations show that top 10 event types account for 89% of all registered events. The purpose of this analysis is to detect the most important event, therefore cleaning the last 10% does not seem to be necessary.

evtype <- table(workset$EVTYPE)
evtype <- evtype[order(evtype, decreasing=T)]
evtype[1:10]
## 
##               HAIL          TSTM WIND  THUNDERSTORM WIND 
##             288661             219940              82563 
##            TORNADO        FLASH FLOOD              FLOOD 
##              60652              54277              25326 
## THUNDERSTORM WINDS          HIGH WIND          LIGHTNING 
##              20843              20212              15754 
##         HEAVY SNOW 
##              15708
sum(evtype[1:10])/nrow(workset)
## [1] 0.8909882
# barplot(evtype[1:10], las=3, cex.names = 0.5)

Observations have started as early as 1950, but the histogram shows that the mass of observations only comes in after 1990. Again no data has been removed as we already noticed that major events took place prior 1990.

hist(workset$BGN_YEAR, breaks=50, col = "blue",
     main = "number of registered extreme weather events per year",
     xlab = "year of observation",
     ylab = "number of events")

plot of chunk unnamed-chunk-3

Results

Healt impact of events

The HEALTH variable contains the sum of fatalities and injuries for all measurements in the data set (from 1950). This variable is examined per EVTYPE (for the entire dataset, so spanning all years).The table output shows for the top event types:

Tornado's by far outnumber any other event type with regards to impact on public health.

health <- workset %>%
          select(EVTYPE, HEALTH) %>%
          group_by(EVTYPE) %>%
          summarise(totalImpactHealth = sum(HEALTH),
                    avgImpactHealth = round(mean(HEALTH),2),
                    freqOfEvents = n())
health <- health[order(totalImpactHealth, decreasing=T)]

health$EVTYPE <- factor(health$EVTYPE, levels = health$EVTYPE)
health[totalImpactHealth > 999]
## Source: local data table [14 x 4]
## 
##               EVTYPE totalImpactHealth avgImpactHealth freqOfEvents
## 1            TORNADO             96979            1.60        60652
## 2     EXCESSIVE HEAT              8428            5.02         1678
## 3          TSTM WIND              7461            0.03       219940
## 4              FLOOD              7259            0.29        25326
## 5          LIGHTNING              6046            0.38        15754
## 6               HEAT              3037            3.96          767
## 7        FLASH FLOOD              2755            0.05        54277
## 8          ICE STORM              2064            1.03         2006
## 9  THUNDERSTORM WIND              1621            0.02        82563
## 10      WINTER STORM              1527            0.13        11433
## 11         HIGH WIND              1385            0.07        20212
## 12              HAIL              1376            0.00       288661
## 13 HURRICANE/TYPHOON              1339           15.22           88
## 14        HEAVY SNOW              1148            0.07        15708
glp <- ggplot(health[1:10], aes(EVTYPE, totalImpactHealth))
glp2 <- glp + geom_bar(stat= 'identity')+ 
      labs(x = "event", 
           y = "Total number of fatalities and injuries") +
      theme(axis.text.x = element_text(angle=90))
glp2

plot of chunk health

Economical impact of events

The analysis for the econonical impact shows a different picture. Again a table is constructed showing per event (spanning all events registered from 1950 onwards) :

Flooding has the biggest negative impact on economy, tornado's are a distant third. Hurricanes and typhoons occur rarely but are devastating.

econ <-   workset %>%
          select(EVTYPE, ECONOMICAL) %>%
          group_by(EVTYPE) %>%
          summarise(totalImpactEconMD = round(sum(ECONOMICAL)/1000000, 0),
                    avgImpactEconMD = round(mean(ECONOMICAL)/1000000,2),
                    freqOfEvents = n())
econ <- econ[order(totalImpactEconMD, decreasing=T)]

econ$EVTYPE <- factor(econ$EVTYPE, levels = econ$EVTYPE)
econ[totalImpactEconMD > 4999]
## Source: local data table [15 x 4]
## 
##               EVTYPE totalImpactEconMD avgImpactEconMD freqOfEvents
## 1              FLOOD            150320            5.94        25326
## 2  HURRICANE/TYPHOON             71914          817.20           88
## 3            TORNADO             57341            0.95        60652
## 4        STORM SURGE             43324          165.99          261
## 5               HAIL             18753            0.06       288661
## 6        FLASH FLOOD             17562            0.32        54277
## 7            DROUGHT             15019            6.04         2488
## 8          HURRICANE             14610           83.97          174
## 9        RIVER FLOOD             10148           58.66          173
## 10         ICE STORM              8967            4.47         2006
## 11    TROPICAL STORM              8382           12.15          690
## 12      WINTER STORM              6715            0.59        11433
## 13         HIGH WIND              5909            0.29        20212
## 14          WILDFIRE              5061            1.83         2761
## 15         TSTM WIND              5039            0.02       219940
glpe <- ggplot(econ[1:10], aes(EVTYPE, totalImpactEconMD))
glpe2 <- glpe + geom_bar(stat= 'identity')+ 
      labs(x = "event", 
           y = "Total amount of economical damage in million dollars") +
      theme(axis.text.x = element_text(angle=90))
glpe2

plot of chunk economics