Most costly events

Synopsis

In this note we show that tornado, thunderstorm wind and heat are the most harmful with respect to population health in the United State. We also show that flood, hurricane and tornado cause most property damage while drought and flood cause most crop damage. We use U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur. It also estimates of any fatalities, injuries, and property damage.

Data Processing

In this section we present code we used for data processing. We are also going to use local cache, so we need to create a file for this.

Load required libraries, set directories and file hash

First we load packages needed and set cashier.

require(knitr)
## Loading required package: knitr
require(data.table)
## Loading required package: data.table
require(ggplot2)
## Loading required package: ggplot2
require(xtable)
## Loading required package: xtable
require(filehash)
## Loading required package: filehash
## filehash: Simple key-value database (2.2-2 2013-12-16)
if (!file.exists('data')) {dir.create('data')}
dbPath <- 'data/DB'
if (!file.exists(dbPath)) {dbCreate(dbPath)}
db <-  dbInit(dbPath)

Download and read data

Now we download data and chache data.table created. Note that we save cache localy, so analysis start from the downloaded raw data file.

download.storm.dt <- function() {
  fileURI <- 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
  fileBzPath <- 'data/repdata-data-StormData.csv.bz2'
  if (!file.exists(fileBzPath)) {
    download.file(fileURI, destfile=fileBzPath, method="curl")
  }
  storm.df <- read.csv(bzfile(fileBzPath))
  as.data.table(storm.df)
}


get.or.cache <- function(fun, svar) {
  if (dbExists(db, svar)) {
    #We check if svar is cached. If not we exectute function and cache
    #its value.

    #print("Reading data.table form cache...")
    dbFetch(db, svar)
  } else {
    var <- fun()
    dbInsert(db, svar, var)
    var
  }
}

storm.dt <- get.or.cache(download.storm.dt, "storm.dt")

Cleaning data

Now, we preper new data.table with selected column, that we use for analysis.

clean.dt <- storm.dt[,.(FATALITIES, INJURIES, PROPDMG, PROPDMGEXP,
                        CROPDMG, CROPDMGEXP)]
clean.dt[,Evtype:=tolower(storm.dt[,EVTYPE])]
##         FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
##      1:          0       15    25.0          K       0           
##      2:          0        0     2.5          K       0           
##      3:          0        2    25.0          K       0           
##      4:          0        2     2.5          K       0           
##      5:          0        2     2.5          K       0           
##     ---                                                          
## 902293:          0        0     0.0          K       0          K
## 902294:          0        0     0.0          K       0          K
## 902295:          0        0     0.0          K       0          K
## 902296:          0        0     0.0          K       0          K
## 902297:          0        0     0.0          K       0          K
##             Evtype
##      1:    tornado
##      2:    tornado
##      3:    tornado
##      4:    tornado
##      5:    tornado
##     ---           
## 902293:  high wind
## 902294:  high wind
## 902295:  high wind
## 902296:   blizzard
## 902297: heavy snow

We clean some names in the evtype column. For example tstm wind is clearly thunderstorm wind, raining is the same as rain etc. We follow events in table 2.1.1

https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf

therefore we unified entries like hurricane, wildfire. Probably there are more problems there, but later we will show that they are insignificant in the analysis we do.

clean.dt[,Evtype:=gsub('winds','wind',Evtype)]
##         FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
##      1:          0       15    25.0          K       0           
##      2:          0        0     2.5          K       0           
##      3:          0        2    25.0          K       0           
##      4:          0        2     2.5          K       0           
##      5:          0        2     2.5          K       0           
##     ---                                                          
## 902293:          0        0     0.0          K       0          K
## 902294:          0        0     0.0          K       0          K
## 902295:          0        0     0.0          K       0          K
## 902296:          0        0     0.0          K       0          K
## 902297:          0        0     0.0          K       0          K
##             Evtype
##      1:    tornado
##      2:    tornado
##      3:    tornado
##      4:    tornado
##      5:    tornado
##     ---           
## 902293:  high wind
## 902294:  high wind
## 902295:  high wind
## 902296:   blizzard
## 902297: heavy snow
clean.dt[grepl('tstm wind|thunderstorm wind', Evtype),Evtype:='thunderstorm wind']
##         FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
##      1:          0       15    25.0          K       0           
##      2:          0        0     2.5          K       0           
##      3:          0        2    25.0          K       0           
##      4:          0        2     2.5          K       0           
##      5:          0        2     2.5          K       0           
##     ---                                                          
## 902293:          0        0     0.0          K       0          K
## 902294:          0        0     0.0          K       0          K
## 902295:          0        0     0.0          K       0          K
## 902296:          0        0     0.0          K       0          K
## 902297:          0        0     0.0          K       0          K
##             Evtype
##      1:    tornado
##      2:    tornado
##      3:    tornado
##      4:    tornado
##      5:    tornado
##     ---           
## 902293:  high wind
## 902294:  high wind
## 902295:  high wind
## 902296:   blizzard
## 902297: heavy snow
clean.dt[grepl('heavy rain', Evtype),Evtype:='heavy rain']
##         FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
##      1:          0       15    25.0          K       0           
##      2:          0        0     2.5          K       0           
##      3:          0        2    25.0          K       0           
##      4:          0        2     2.5          K       0           
##      5:          0        2     2.5          K       0           
##     ---                                                          
## 902293:          0        0     0.0          K       0          K
## 902294:          0        0     0.0          K       0          K
## 902295:          0        0     0.0          K       0          K
## 902296:          0        0     0.0          K       0          K
## 902297:          0        0     0.0          K       0          K
##             Evtype
##      1:    tornado
##      2:    tornado
##      3:    tornado
##      4:    tornado
##      5:    tornado
##     ---           
## 902293:  high wind
## 902294:  high wind
## 902295:  high wind
## 902296:   blizzard
## 902297: heavy snow
clean.dt[grepl('flash flood', Evtype),Evtype:='flash flood']
##         FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
##      1:          0       15    25.0          K       0           
##      2:          0        0     2.5          K       0           
##      3:          0        2    25.0          K       0           
##      4:          0        2     2.5          K       0           
##      5:          0        2     2.5          K       0           
##     ---                                                          
## 902293:          0        0     0.0          K       0          K
## 902294:          0        0     0.0          K       0          K
## 902295:          0        0     0.0          K       0          K
## 902296:          0        0     0.0          K       0          K
## 902297:          0        0     0.0          K       0          K
##             Evtype
##      1:    tornado
##      2:    tornado
##      3:    tornado
##      4:    tornado
##      5:    tornado
##     ---           
## 902293:  high wind
## 902294:  high wind
## 902295:  high wind
## 902296:   blizzard
## 902297: heavy snow
clean.dt[grepl("river flood|river flooding|^flooding", Evtype),Evtype:='flood']
##         FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
##      1:          0       15    25.0          K       0           
##      2:          0        0     2.5          K       0           
##      3:          0        2    25.0          K       0           
##      4:          0        2     2.5          K       0           
##      5:          0        2     2.5          K       0           
##     ---                                                          
## 902293:          0        0     0.0          K       0          K
## 902294:          0        0     0.0          K       0          K
## 902295:          0        0     0.0          K       0          K
## 902296:          0        0     0.0          K       0          K
## 902297:          0        0     0.0          K       0          K
##             Evtype
##      1:    tornado
##      2:    tornado
##      3:    tornado
##      4:    tornado
##      5:    tornado
##     ---           
## 902293:  high wind
## 902294:  high wind
## 902295:  high wind
## 902296:   blizzard
## 902297: heavy snow
clean.dt[grepl("coastal flood", Evtype),Evtype:='coastal flood']
##         FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
##      1:          0       15    25.0          K       0           
##      2:          0        0     2.5          K       0           
##      3:          0        2    25.0          K       0           
##      4:          0        2     2.5          K       0           
##      5:          0        2     2.5          K       0           
##     ---                                                          
## 902293:          0        0     0.0          K       0          K
## 902294:          0        0     0.0          K       0          K
## 902295:          0        0     0.0          K       0          K
## 902296:          0        0     0.0          K       0          K
## 902297:          0        0     0.0          K       0          K
##             Evtype
##      1:    tornado
##      2:    tornado
##      3:    tornado
##      4:    tornado
##      5:    tornado
##     ---           
## 902293:  high wind
## 902294:  high wind
## 902295:  high wind
## 902296:   blizzard
## 902297: heavy snow
clean.dt[grepl('hurricane', Evtype),Evtype:='hurricane']
##         FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
##      1:          0       15    25.0          K       0           
##      2:          0        0     2.5          K       0           
##      3:          0        2    25.0          K       0           
##      4:          0        2     2.5          K       0           
##      5:          0        2     2.5          K       0           
##     ---                                                          
## 902293:          0        0     0.0          K       0          K
## 902294:          0        0     0.0          K       0          K
## 902295:          0        0     0.0          K       0          K
## 902296:          0        0     0.0          K       0          K
## 902297:          0        0     0.0          K       0          K
##             Evtype
##      1:    tornado
##      2:    tornado
##      3:    tornado
##      4:    tornado
##      5:    tornado
##     ---           
## 902293:  high wind
## 902294:  high wind
## 902295:  high wind
## 902296:   blizzard
## 902297: heavy snow
clean.dt[grepl('storm surge', Evtype),Evtype:='storm surge/tide']
##         FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
##      1:          0       15    25.0          K       0           
##      2:          0        0     2.5          K       0           
##      3:          0        2    25.0          K       0           
##      4:          0        2     2.5          K       0           
##      5:          0        2     2.5          K       0           
##     ---                                                          
## 902293:          0        0     0.0          K       0          K
## 902294:          0        0     0.0          K       0          K
## 902295:          0        0     0.0          K       0          K
## 902296:          0        0     0.0          K       0          K
## 902297:          0        0     0.0          K       0          K
##             Evtype
##      1:    tornado
##      2:    tornado
##      3:    tornado
##      4:    tornado
##      5:    tornado
##     ---           
## 902293:  high wind
## 902294:  high wind
## 902295:  high wind
## 902296:   blizzard
## 902297: heavy snow
clean.dt[grepl('wildfire', Evtype),Evtype:='wildfire']
##         FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
##      1:          0       15    25.0          K       0           
##      2:          0        0     2.5          K       0           
##      3:          0        2    25.0          K       0           
##      4:          0        2     2.5          K       0           
##      5:          0        2     2.5          K       0           
##     ---                                                          
## 902293:          0        0     0.0          K       0          K
## 902294:          0        0     0.0          K       0          K
## 902295:          0        0     0.0          K       0          K
## 902296:          0        0     0.0          K       0          K
## 902297:          0        0     0.0          K       0          K
##             Evtype
##      1:    tornado
##      2:    tornado
##      3:    tornado
##      4:    tornado
##      5:    tornado
##     ---           
## 902293:  high wind
## 902294:  high wind
## 902295:  high wind
## 902296:   blizzard
## 902297: heavy snow

Results

Events most harmful with respect to population health

Fatalities

events <- unique(tolower(clean.dt[,Evtype]))

strom.col.ordered <- function(cname) {
  sco <- clean.dt[,sum(get(cname)),by=Evtype]
  sco[order(sco$V1, decreasing=T)]
}

fatalities<-strom.col.ordered("FATALITIES")
cnum <- dim(fatalities)[1]
fatalities.40 <- xtable(fatalities[1:40])
print(fatalities.40, type="html")
Evtype V1
1 tornado 5633.00
2 excessive heat 1903.00
3 flash flood 1035.00
4 heat 937.00
5 lightning 816.00
6 thunderstorm wind 753.00
7 flood 481.00
8 rip current 368.00
9 high wind 283.00
10 avalanche 224.00
11 winter storm 206.00
12 rip currents 204.00
13 heat wave 172.00
14 extreme cold 162.00
15 hurricane 135.00
16 heavy snow 127.00
17 extreme cold/wind chill 125.00
18 strong wind 111.00
19 high surf 104.00
20 blizzard 101.00
21 heavy rain 99.00
22 extreme heat 96.00
23 cold/wind chill 95.00
24 ice storm 89.00
25 wildfire 75.00
26 fog 62.00
27 tropical storm 58.00
28 heavy surf/high surf 42.00
29 cold 38.00
30 landslide 38.00
31 winter weather 33.00
32 tsunami 33.00
33 unseasonably warm and dry 29.00
34 urban/sml stream fld 28.00
35 winter weather/mix 28.00
36 wind 24.00
37 storm surge/tide 24.00
38 dust storm 22.00
39 dense fog 18.00
40 record/excessive heat 17.00

There are total of 15145 fatalities and only 317 fatalities not included in Top 40.

Figure 1. Top 5 events that produce most fatalities.
ggplot(fatalities[1:5], aes(x=Evtype, y=V1))+geom_bar(stat = "identity")

Injuries

injuries<-strom.col.ordered("INJURIES")
cnum <- dim(injuries)[1]
injuries.40 <- xtable(injuries[1:40])
print(injuries.40, type="html")
Evtype V1
1 tornado 91346.00
2 thunderstorm wind 9493.00
3 flood 6794.00
4 excessive heat 6525.00
5 lightning 5230.00
6 heat 2100.00
7 ice storm 1975.00
8 flash flood 1802.00
9 high wind 1439.00
10 hail 1361.00
11 hurricane 1328.00
12 winter storm 1321.00
13 heavy snow 1021.00
14 wildfire 911.00
15 blizzard 805.00
16 fog 734.00
17 wild/forest fire 545.00
18 dust storm 440.00
19 winter weather 398.00
20 heat wave 379.00
21 dense fog 342.00
22 tropical storm 340.00
23 strong wind 301.00
24 rip currents 297.00
25 heavy rain 255.00
26 rip current 232.00
27 extreme cold 231.00
28 glaze 216.00
29 avalanche 170.00
30 high surf 156.00
31 extreme heat 155.00
32 wild fires 150.00
33 ice 137.00
34 tsunami 129.00
35 wind 87.00
36 urban/sml stream fld 79.00
37 wintry mix 77.00
38 winter weather/mix 72.00
39 winter weather mix 68.00
40 landslide 52.00

There are total of 140528 injuries and only 1035 injuries not included in Top 40.

Figure 2. Top 5 events that produce most injuries.

ggplot(injuries[1:5], aes(x=Evtype, y=V1))+geom_bar(stat = "identity")

Events that have the greatest economic consequences

Property damage

propexps <- clean.dt[,PROPDMGEXP]

tonum <- function(ch) {
  ch <- as.character(ch)
  if (ch=='h' || ch =='H') {
    1
  } else if (ch == 'K'||ch == 'k') {
    3
  } else if (ch=='m'|| ch=='M') {
    6
  }
  else if (ch=='B') {
    9
  } else if (ch %in% c("",  "-", "?", "+")) {
    0
  } else {
    as.integer(ch)
  }
}

propexps.new <- get.or.cache(function() {sapply(propexps, tonum)}, "propexps.new")

clean.dt[, PropExp:=propexps.new]
##         FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
##      1:          0       15    25.0          K       0           
##      2:          0        0     2.5          K       0           
##      3:          0        2    25.0          K       0           
##      4:          0        2     2.5          K       0           
##      5:          0        2     2.5          K       0           
##     ---                                                          
## 902293:          0        0     0.0          K       0          K
## 902294:          0        0     0.0          K       0          K
## 902295:          0        0     0.0          K       0          K
## 902296:          0        0     0.0          K       0          K
## 902297:          0        0     0.0          K       0          K
##             Evtype PropExp
##      1:    tornado       3
##      2:    tornado       3
##      3:    tornado       3
##      4:    tornado       3
##      5:    tornado       3
##     ---                   
## 902293:  high wind       3
## 902294:  high wind       3
## 902295:  high wind       3
## 902296:   blizzard       3
## 902297: heavy snow       3
clean.dt[, PropD:=PROPDMG*(10)^PropExp]
##         FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
##      1:          0       15    25.0          K       0           
##      2:          0        0     2.5          K       0           
##      3:          0        2    25.0          K       0           
##      4:          0        2     2.5          K       0           
##      5:          0        2     2.5          K       0           
##     ---                                                          
## 902293:          0        0     0.0          K       0          K
## 902294:          0        0     0.0          K       0          K
## 902295:          0        0     0.0          K       0          K
## 902296:          0        0     0.0          K       0          K
## 902297:          0        0     0.0          K       0          K
##             Evtype PropExp PropD
##      1:    tornado       3 25000
##      2:    tornado       3  2500
##      3:    tornado       3 25000
##      4:    tornado       3  2500
##      5:    tornado       3  2500
##     ---                         
## 902293:  high wind       3     0
## 902294:  high wind       3     0
## 902295:  high wind       3     0
## 902296:   blizzard       3     0
## 902297: heavy snow       3     0
prop.dmg<-strom.col.ordered("PropD")
cnum <- dim(prop.dmg)[1]
prop.40 <- xtable(prop.dmg[1:40])
print(prop.40, type="html")
Evtype V1
1 flood 150002295307.00
2 hurricane 84756180010.00
3 tornado 56947380676.50
4 storm surge/tide 47964724000.00
5 flash flood 17588780095.91
6 hail 15735267062.70
7 thunderstorm wind 11575229120.10
8 tropical storm 7703890550.00
9 winter storm 6688497251.00
10 high wind 5878370043.00
11 wildfire 4865614000.00
12 ice storm 3944927860.00
13 heavy rain 3248548142.00
14 wild/forest fire 3001829500.00
15 severe thunderstorm 1205360000.00
16 drought 1046106000.00
17 heavy snow 932759140.00
18 lightning 930379429.50
19 blizzard 659213950.00
20 wild fires 624100000.00
21 typhoon 600230000.00
22 coastal flood 417616060.00
23 landslide 324596000.00
24 hailstorm 241000000.00
25 strong wind 177674240.00
26 tsunami 144062000.00
27 high wind/cold 110500000.00
28 major flood 105000000.00
29 high surf 89955000.00
30 extreme cold 67737400.00
31 winter storm high wind 60000000.00
32 urban/sml stream fld 58309650.00
33 record cold 56000000.00
34 waterspout/tornado 51110000.00
35 lake-effect snow 40115000.00
36 winter weather 20866000.00
37 urban flood 18288000.00
38 erosion/cstl flood 16200000.00
39 coastal flooding/erosion 15000000.00
40 snow 14827550.00

There are total of 428224866338 USD of property damage and only 296327300 USD not included in Top 40.

Figure 3. Top 5 events that produce most property damage.

ggplot(prop.dmg[1:5], aes(x=Evtype, y=V1))+geom_bar(stat = "identity")

Crop damage

cropexps <- clean.dt[,CROPDMGEXP]
#cropexps.unique <- sort(as.character(unique(cropexps)))
cropexps.new <- get.or.cache(function() {sapply(cropexps, tonum)}, "cropexps.new")

clean.dt[, CropExp:=cropexps.new]
##         FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
##      1:          0       15    25.0          K       0           
##      2:          0        0     2.5          K       0           
##      3:          0        2    25.0          K       0           
##      4:          0        2     2.5          K       0           
##      5:          0        2     2.5          K       0           
##     ---                                                          
## 902293:          0        0     0.0          K       0          K
## 902294:          0        0     0.0          K       0          K
## 902295:          0        0     0.0          K       0          K
## 902296:          0        0     0.0          K       0          K
## 902297:          0        0     0.0          K       0          K
##             Evtype PropExp PropD CropExp
##      1:    tornado       3 25000       0
##      2:    tornado       3  2500       0
##      3:    tornado       3 25000       0
##      4:    tornado       3  2500       0
##      5:    tornado       3  2500       0
##     ---                                 
## 902293:  high wind       3     0       3
## 902294:  high wind       3     0       3
## 902295:  high wind       3     0       3
## 902296:   blizzard       3     0       3
## 902297: heavy snow       3     0       3
clean.dt[, CropD:=CROPDMG*(10)^CropExp]
##         FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
##      1:          0       15    25.0          K       0           
##      2:          0        0     2.5          K       0           
##      3:          0        2    25.0          K       0           
##      4:          0        2     2.5          K       0           
##      5:          0        2     2.5          K       0           
##     ---                                                          
## 902293:          0        0     0.0          K       0          K
## 902294:          0        0     0.0          K       0          K
## 902295:          0        0     0.0          K       0          K
## 902296:          0        0     0.0          K       0          K
## 902297:          0        0     0.0          K       0          K
##             Evtype PropExp PropD CropExp CropD
##      1:    tornado       3 25000       0     0
##      2:    tornado       3  2500       0     0
##      3:    tornado       3 25000       0     0
##      4:    tornado       3  2500       0     0
##      5:    tornado       3  2500       0     0
##     ---                                       
## 902293:  high wind       3     0       3     0
## 902294:  high wind       3     0       3     0
## 902295:  high wind       3     0       3     0
## 902296:   blizzard       3     0       3     0
## 902297: heavy snow       3     0       3     0
crop.dmg<-strom.col.ordered("CropD")
cnum <- dim(crop.dmg)[1]
crop.dmg41 <- sum(crop.dmg[41:cnum]$V1)
crop.dmg1 <- sum(crop.dmg$V1)
crop.40 <- xtable(crop.dmg[1:40])
print(crop.40, type="html")
Evtype V1
1 drought 13972566000.00
2 flood 10728307950.00
3 hurricane 5515292800.00
4 ice storm 5022113500.00
5 hail 3025954473.00
6 flash flood 1532197150.00
7 extreme cold 1312973000.00
8 thunderstorm wind 1255947988.00
9 frost/freeze 1094186000.00
10 heavy rain 795762800.00
11 high wind 679291900.00
12 tropical storm 678346000.00
13 excessive heat 492402000.00
14 freeze 456725000.00
15 tornado 414953270.00
16 heat 401461500.00
17 damaging freeze 296230000.00
18 wildfire 295972800.00
19 excessive wetness 142000000.00
20 heavy snow 134653100.00
21 flood/rain/wind 112800000.00
22 blizzard 112060000.00
23 wild/forest fire 106796830.00
24 strong wind 69953500.00
25 frost 66000000.00
26 cold and wet conditions 66000000.00
27 early frost 42000000.00
28 agricultural freeze 28820000.00
29 winter storm 26944000.00
30 unseasonably cold 25042500.00
31 small hail 20793000.00
32 landslide 20017000.00
33 severe thunderstorms 17000000.00
34 extreme windchill 17000000.00
35 tropical storm jerry 16000000.00
36 winter weather 15000000.00
37 hard freeze 13100000.00
38 lightning 12092090.00
39 unseasonal rain 10000000.00
40 urban/sml stream fld 8488100.00

There are total of 49104192181 USD of crop damage and only 50947930 USD not included in Top 40.