ReproRsrchPeerAssmemt2

Weather events: Casualties and Damage in the United States

Synopsis

This report is a preliminary survey of the NOAA Storm Data collection. It identifies the main types of events resulting in casualties (injuries or deaths) and damage (to property or crops). Summary tables and two summary figures are presented.

Data Processing

Preliminary notes

Official event types
We first fix a granularity of events, called “official event types”, as a basis for summary. These are adopted from Table 1 of NOAA’s Storm Data Preparation document, referred to below as NSDP (note 1). The table has 48 categories (note 2).

Reported events
Unfortunately, the reported event types in the EVTYPE field of the data have a very heterogeneous format, with 690 distinct labels. The bulk of the work in mining this data involved regularizing those categories and reconciling them with the official event types. The groupings chosen are undoubtedly suboptimal. They should serve as an example of the kinds of reports that can be produced. More input from subject matter experts, and a clearer statement of the supervisory team's goals, will be needed to make this work well. The programming used here can be modified relatively easily once the goals are clarified.

Download file and load into data.table object

dataURL = "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2" 


dataZipFile="./repdata-data-StormData.csv.bz2" 
csvFile="./repdata-data-StormData.csv" 
# Download file and unzip file if necessary 
library(data.table); library(qdap); library(SnowballC); library(stringdist); 
## Loading required package: qdapDictionaries
## Loading required package: qdapRegex
## Loading required package: qdapTools
## 
## Attaching package: 'qdapTools'
## 
## The following object is masked from 'package:data.table':
## 
##     shift
## 
## Loading required package: RColorBrewer
## 
## Attaching package: 'qdap'
## 
## The following object is masked from 'package:base':
## 
##     Filter
if (!exists('DT') || !is.data.table(DT)) { 
  DT=NULL 
  catln = function(...) cat(...,'\n') 
  #` The following function will download and preprocess the data  into a  {data.table} object 
  loadRawData=function(){ 
    # prepare (if necessary) a csv file and read it into the data table DT in parent 
    if (!file.exists(csvFile)){ 
      catln("Downloading bz2file") 
      if (!file.exists(dataZipFile)){ 
        # Adapted from http://stackoverflow.com/a/23899792/1795127 
        download.file(dataURL,dataZipFile, mode="wb") 
      } 
      catln('Unarchiving file to .csv') 
      # Decompress the bz2 archive in 1 MB chunks using base R connections 
      inCon=bzfile(dataZipFile,'rb'); outCon=file(csvFile,'wb') 
      while (length(buf <- readBin(inCon,raw(),1e6))>0) writeBin(buf,outCon) 
      close(inCon); close(outCon) 
    } 
    catln('Using fread() on .csv file') 
    DT <<- fread(csvFile) 
  } 
  loadRawData() 
} 
## Using fread() on .csv file 
## 
Read 902297 rows and 37 (of 37) columns from 0.523 GB file in 00:00:13
## Warning in fread(csvFile): Read less rows (902297) than were allocated
## (967216). Run again with verbose=TRUE and please report.

We need to do some initial cleaning of the text. See the comments in cleanText().

cleanText=function(x){ 
  #Clean up the labels in reported EVENT Type.  
  # Remove leading/trailing spaces, condense multispace to single space 
  # remove non alphabetic characters except / and space 
  # remove word final -ing , -s, -y   
  # replace complete words 'and' and 'from' with phrase break character /  
  y=tolower(gsub(' +',' ', 
                 gsub('^\\s*|\\s*$','',x,perl=TRUE), 
                 perl=TRUE )) 
  y=gsub('[^A-Za-z /]','',y,perl=TRUE) 
  y=gsub('(y|ing|s)\\b','',y,perl=TRUE) 
  y=gsub('\\band\\b|\\bfrom\\b','/',y,perl=TRUE) 
  return(y) 
} 
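To make the effect of these rules concrete, here is a small sketch that applies the same cleaning steps to a few labels. The example labels are illustrative only and are not drawn from the report's own checks; the function body repeats the definition above so that the snippet runs on its own.

```r
# Same definition as cleanText() above, repeated so the example is self-contained
cleanText=function(x){
  y=tolower(gsub(' +',' ',
                 gsub('^\\s*|\\s*$','',x,perl=TRUE),
                 perl=TRUE ))
  y=gsub('[^A-Za-z /]','',y,perl=TRUE)
  y=gsub('(y|ing|s)\\b','',y,perl=TRUE)
  y=gsub('\\band\\b|\\bfrom\\b','/',y,perl=TRUE)
  return(y)
}

# Illustrative labels (assumed for this example)
examples=c('  Heavy   Snow ','TSTM WINDS','Lightning','Frost/Freeze')
cleaned=cleanText(examples)
print(cleaned)
# "heav snow" "tstm wind" "lightn" "frost/freeze"
```

Note that the suffix stripping is deliberately crude: it turns 'heavy' into 'heav' and 'lightning' into 'lightn', which is why those truncated forms appear in the tables later in the report.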
# First, we save the original reported events just in case. 

Then we do a preliminary cleaning of the EVTYPE field and correct a few obvious misspellings.

origReportedEvents=DT$EVTYPE  
DT[,EVTYPE:=cleanText(DT$EVTYPE)] 
# Replace common abbreviation tstm with thunderstorm 
DT[,EVTYPE:=gsub('\\btstm\\b','thunderstorm',EVTYPE,perl=TRUE)] 
# One event listed as tstmw-- guessing it means 'thunderstorm wind' 
DT[,EVTYPE:=gsub('\\btstmw\\b','thunderstorm wind',EVTYPE,perl=TRUE)] 

Preliminary selection and editing of raw data

Given the goals of the report, we first set up boolean vectors that flag events which:
* do not have an EVTYPE starting with ‘summary’ (such entries are not related to the official event types)
* indicate casualties, that is, have non-zero entries in FATALITIES or INJURIES
* indicate damage, that is, have non-zero entries in PROPDMG or CROPDMG

binxEvent=!grepl('^summary',DT$EVTYPE,ignore.case=TRUE,perl=TRUE) 
# Filter on casualties: injuries and fatalities 
binxCasualty=DT$INJURIES>0 | DT$FATALITIES>0 
# Filter on damage (X_DMGEXP doesn't need to be looked at when X_DMG==0) 
binxDamage=DT$CROPDMG>0|DT$PROPDMG>0 
binxRelevant=binxEvent&(binxCasualty|binxDamage) 

Preliminary editing of reported event types

The variables above will be useful for selecting data later in the report. We can now subset the data to include only the relevant cases (rows) and relevant variables.

keyFields=c("EVTYPE","FATALITIES", "INJURIES","PROPDMG",  "PROPDMGEXP", 
            "CROPDMG","CROPDMGEXP") 
SDT=DT[binxRelevant,c(keyFields),with=FALSE] 

We also need a list of ‘Official Event Types’.

officialEventTypesString='Astronomical Low Tide, Avalanche, Blizzard, Coastal Flood, Cold/Wind Chill,
Debris Flow, Dense Fog, Dense Smoke, Drought, Dust Devil, Dust Storm, Excessive Heat,
Extreme Cold/Wind Chill, Flash Flood, Flood, Frost/Freeze, Funnel Cloud, Freezing Fog,
Hail, Heat, Heavy Rain, Heavy Snow, High Surf, High Wind, Hurricane, Ice Storm,
Lake-Effect Snow, Lakeshore Flood, Lightning, Marine Hail, Marine High Wind,
Marine Strong Wind, Marine Thunderstorm Wind, Rip Current, Seiche, Sleet,
Storm Surge/Tide, Strong Wind,Thunderstorm Wind, Tornado, Tropical Depression,
Tropical Storm, Tsunami, Volcanic Ash, Waterspout, Wildfire, Winter Storm,
Winter Weather'
# These will be assigned in caller or global environment ( preparation for possible functionalization) 
officialEventsOrig <<- strsplit(officialEventTypesString,' *[\n,]+ *')[[1]]
officialEvents <<- cleanText(officialEventsOrig)

More elaborate ad hoc editing of reported events

A text substitution function editReportedEvents() was developed to map the ragged reportedEvents into editedEvents that bear a closer resemblance to terms in the main officialEvents list. This was the result of a complex set of interactive stages involving ad hoc R scripts, an evolving understanding of the relevant terminology, suggestions from internet fora (including the Reproducible Research forum) and repeated reference to the NSDP document. This is a long chunk of code, but it is useful to show its form here. It should be relatively easy to change with additional clarification of the key event types of interest to our client.

# editReportedEvents.R 
# IterativelyReprocessReportedEvents.R 
# This is a hand-edited script for preprocessing substitution lists to feed to 
# the script createReportedEventsToEditedCode.R 
editReportedEvents=function(reportedEvents){ 
# The following words seemed to bear no relation to the officialEvents list and will be deleted 
rawDeleteStr = 
'accident accumulation advisory agricultural apache alberto awning black 
breakup dam break clou county condition  beach erosion brush fire damage 
damaging drowning dry dr  unseasonably edouard emily erin exposure 
extended felix fire floe forest gordon gra gradient ground hypothermia  
road injur jam jerr landslide landslump  light mph major minor mircoburst 
mishap nonsevere nonthunderstorm on opal other rural urban small sml snowmelt 
squall stream street to wauseon wild' 
deleteWordPatCVec=sort(unique(cleanText(strsplit(rawDeleteStr,'\\s+',perl=TRUE)[[1]]))) 

# A few of these (downburst, gustnado ) were inferred from table of contents in the NOAA document 
#  microburst is assumed to be a kind of downburst 
# The following EVTYPE phrases (left hand column) will be replaced with 
# terms in the officialEvents 
# This can be edited. 
rawSubsStr= 
'cold air tornado > tornado 
avalance > avalanche 
bitter > extreme 
blowing > wind 
flooding  > flood 
costalstorm> coastal storm 
cool > cold 
cstl > coastal 
currents>current 
downburst> thunderstorm wind 
freezing drizzle>freezing fog 
unseasonably warm > heat 
flashflood> flash flood 
fld >flood 
floodin > flood 
freez>freeze 
storm force > high 
frostfreeze > frost freeze 
wind gust >  strong wind 
glaze > ice go 
gust >  strong wind 
gustnado > thunderstorm wind 
hailstorm > hail 
hard freeze > freeze 
hazardous surf > high surf 
hv>heav 
hurricanegenerated swell > storm surge 
ic > ice 
lightn>lightning 
landspout > water spout 
late season snow > heavy snow 
microburst > thunderstorm wind 
mixed precip > sleet 
mixed precipitation > sleet 
freez rain> sleet 
lake effect> lakeeffect 
ligntn> lightn 
wet > rain 
mud slide > % 
non thunderstorm wind> high wind 
heav precipitation > heav rain 
excessive rainfall> heav rain 
record rainfall > heav rain 
torrential rainfall > heav rain 
rapidl ri water > flood 
rainfall>rain 
record cold > extreme cold 
record heat >excessive heat 
record rainfall > heav rain 
record snow >heav snow 
record/excessive heat >  excessive heat 
flood/river flood > flood 
river/stream flood > flood 
river flood > flood 
ic road > ice 
ice on road > ice 
ice road > ice 
rock slide>% 
rogue wave > high surf 
rough sea > high surf 
rough surf>high surf 
heav sea>high surf 
high sea>high surf 
rough sea>high surf 
high wind/sea > high wind/rough surf 
high wind/sea > high wind/rough surf 
late season snow > heav snow 
heav rain/severe weather >  heav rain/high wind 
severe thunderstorm >thunderstorm wind 
severe thunderstorm wind > thunderstorm wind 
severe turbulence> high wind 
heav shower > heav rain 
heav snow shower > heav snow 
mud slide > % 
snowpack >snow 
snowsquall>snow 
freez spra >  freezing fog 
snow squall > heavy snow 
cold temperature > cold/wind chill 
low temperature> cold/wind chill 
thuderstorm>thunderstorm 
thunderestorm>thunderstorm 
thundersnow> thunderstorm wind 
thunderstormw > thunderstorm wind 
thunderstormwind > thunderstorm wind 
thunderstrom >thunderstorm wind 
tunderstorm > thunderstorm 
tidal flood> storm surge 
torndao > tornado 
torrential>heavy 
severe turbulence > high wind 
hurricane/typhoon > hurricane 
typhoon > hurricane 
unseasonabl cold > cold 
unseasonabl warm > heat 
unseasonabl warm/dr > heat 
unseasonal rain > heav rain 
warm weather > heat 
waterspouttornado > waterspout 
cold wave > cold 
heat wave > heat 
heat wave drought > heat 
high wave > high surf 
rogue wave> high surf 
wind/wave > wind/high surf 
cool>cold 
excessive wetne > heavy rain 
whirlwind> dust devil 
thunderstorm wi > thunderstorm wind 
extreme windchill > extreme cold/wind chill 
thunderstorm windshail > thunderstorm winds/hail 
wintr mix > sleet' 

# Produce the vector of patterns to be replaced 
rawSubsCvec=strsplit( 
gsub(' *> *','>',rawSubsStr,perl=TRUE), 
'\n')[[1]] 
# These are applied to cleaned up names 
subsPat=c() 
subsRepl=c() 
rawSubsList=strsplit(rawSubsCvec,'>') 
subsPat=cleanText(sapply(rawSubsList,FUN= function(s) s[1])) 
subsRepl=cleanText(sapply(rawSubsList,FUN= function(s) s[2])) 

# Sort these out for screening after http://stackoverflow.com/a/8920256/1795127  
# Sort out by number of words so larger changes occur first 
patWC=sapply(gregexpr("\\W+", subsPat), length) + 1 
patNchar=sapply(subsPat,nchar) 
ordSubs=order(patWC,patNchar,decreasing=TRUE) 
subsPat=subsPat[ordSubs] 
subsRepl=subsRepl[ordSubs] 

# Add on single word deletions 
subsPat=c(subsPat,deleteWordPatCVec) 
subsRepl=c(subsRepl,rep('',length(deleteWordPatCVec))) 

# Generate as markdown appendix -- this is commented out in production 
# catln('Reported phrase or word (cleaned)|Edited phrase or word   ') 
# catln("#' ------------------- | -------------------  ") 
# for (i in 1:length(subsPat)){ 
#   trepl=subsRepl[i] 
#   if(trepl=='') trepl= '[]' 
#   catln("#'",subsPat[i],' | ',trepl, '  ') 
# } 
# catln('  ') 
editedEvents = reportedEvents 
for (i in seq_along(subsPat)) { 
# Apply each substitution cumulatively, matching whole words or phrases only 
editedEvents = gsub(paste0('\\b',subsPat[i],'\\b'), subsRepl[i],editedEvents, perl = TRUE) 
} 
return(editedEvents) 
} 


# A new field editedEvents is added to the SDT summary database. 
SDT[, editedEvents :=editReportedEvents(EVTYPE)]
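The comments in editReportedEvents() describe its entries as complete words and phrases. The following minimal sketch, on toy data, shows why whole-word matching matters when short patterns are applied one after another. The vectors events, pats and repl are hypothetical and chosen only for illustration.

```r
# Toy inputs (hypothetical, for illustration only)
events=c('tornado','ic road','storm')
pats=c('ic road','to')   # longer phrase first, then a short word to delete
repl=c('ice','')

# Substring matching: the short pattern 'to' also hits inside longer words
substrEdit=events
for (i in seq_along(pats)) {
  substrEdit=gsub(pats[i],repl[i],substrEdit,fixed=TRUE)
}

# Whole-word matching: \\b anchors the pattern at word boundaries
wordEdit=events
for (i in seq_along(pats)) {
  wordEdit=gsub(paste0('\\b',pats[i],'\\b'),repl[i],wordEdit,perl=TRUE)
}

print(substrEdit)  # "rnado" "ice" "srm"
print(wordEdit)    # "tornado" "ice" "storm"
```

Applying the longer phrase first means 'ic road' is rewritten as a unit before any single-word rule can touch its parts, which is the reason the patterns are sorted by word count and length.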

Mapping edited reported events to official event types

The edited event names can be compared to the officialEvents list, which is generated by this code:

officialEventTypesString='Astronomical Low Tide, Avalanche, Blizzard, Coastal Flood, Cold/Wind Chill, 
Debris Flow, Dense Fog, Dense Smoke, Drought, Dust Devil, Dust Storm, Excessive Heat, 
Extreme Cold/Wind Chill, Flash Flood, Flood, Frost/Freeze, Funnel Cloud, Freezing Fog, 
Hail, Heat, Heavy Rain, Heavy Snow, High Surf, High Wind, Hurricane, Ice Storm, 
Lake-Effect Snow, Lakeshore Flood, Lightning, Marine Hail, Marine High Wind, 
Marine Strong Wind, Marine Thunderstorm Wind, Rip Current, Seiche, Sleet, 
Storm Surge/Tide, Strong Wind,Thunderstorm Wind, Tornado, Tropical Depression, 
Tropical Storm, Tsunami, Volcanic Ash, Waterspout, Wildfire, Winter Storm, 
Winter Weather' 
# Original names (for printing, eg) 
officialEventsOrig = strsplit(officialEventTypesString,'\\s*,\\s*',perl=TRUE)[[1]] 
# Cleaned up names (for comparison with editedEvents) 
officialEvents = cleanText(officialEventsOrig) 

Using Jaccard indices on ‘bags of words’ to slot reported events

The preliminary editing above is only the first step. We now have to compare the sets of words in the edited reported events (editedEvents) with the official event types and try to assign them. This is a very preliminary and crude ‘document classification task’. Basically, all the words in each (pre-edited) reported event are compared with all the words for each official event type. For each edited event, the official event whose words overlap most with those of the reported event in question is substituted for that reported event. The degree of overlap is taken as the Jaccard index (note 3), that is, the number of words the two sets share divided by the number of distinct words in both sets combined. This is undoubtedly too crude a procedure. More subject knowledge is needed to come up with a rational way to do this.

# Collapse the slashes in event types 
officialEventsNoSlash=gsub(pattern='/',replacement=' ', 
officialEvents,fixed=TRUE) 
editedEventsVec=unique(SDT$editedEvents) 
editedEventsNoSlash=gsub(pattern='/',replacement=' ', 
editedEventsVec,fixed=TRUE) 
# Simple bag of words function 
bow=function(x) unique(strsplit(x,' +',perl=TRUE)[[1]]) 
# lapply always returns a list, one bag of words per event 
editedBow=lapply(editedEventsNoSlash,bow) 
officialBow=lapply(officialEventsNoSlash,bow) 
# Jaccard index function--- relative match of bags of words 
fJaccard=function(x,y) 
return(length(intersect(x,y))/length(union(x,y))) 
# Generate proximity vector showing Jaccard index for each input bow with all the officialBow entries 
fProxVec=function(x){ 
# return vector of proximity for set x to each officialBow 
tvec=sapply(officialBow,function(y) fJaccard(x,y)) 
return(tvec) 
} 
# Find the best approximate match for each item 
jBestVec=NULL 

for (i in 1:length(editedBow)){ 
pv=fProxVec(editedBow[[i]]) 
# catln(editedBow[[i]],sort(pv,decreasin=TRUE)[1:5]) 
jBest=which.max(pv) 
pBest=pv[jBest] 
# catln(editedEventsVec[i],'<', officialEvents[jBest],pBest) 
jBestVec=c(jBestVec,jBest) 
}

# Add the approximateClassification to SDT database 
#  Get an index number in the editedEventsVec list 
findEditedEventsInx=function(x) which(x==editedEventsVec) 
# Look up the name of the best matching officialEvent 
substituteApproxOfficialEvent = function(x){ 
  return(officialEvents[jBestVec[findEditedEventsInx(x)]]) 
} 

substituteAllApproxEvents=function(x){ 
  sapply(x,substituteApproxOfficialEvent) 
} 
# Now add the new column 
SDT[,approxType:=substituteAllApproxEvents(editedEvents)] 
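The matching step can be illustrated on a toy case. This is only a sketch: the four official types are a small subset of the cleaned list, and the edited event 'heav snow shower' is an assumed input chosen for illustration.

```r
# Jaccard index and bag-of-words helpers, as defined above
fJaccard=function(x,y) length(intersect(x,y))/length(union(x,y))
bow=function(x) unique(strsplit(x,' +',perl=TRUE)[[1]])

# A small subset of the cleaned official types (for illustration)
official=c('astronomical low tide','heav snow','heav rain','flash flood')
edited='heav snow shower'   # assumed example of an edited reported event

# Proximity of the edited event to each official type
prox=sapply(official,function(o) fJaccard(bow(edited),bow(o)))
print(round(prox,3))
# 'heav snow' shares 2 of 3 distinct words (0.667), 'heav rain' 1 of 4 (0.25),
# the other two share none (0)

best=official[which.max(prox)]
print(best)  # "heav snow"
```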

Reporting damage and casualties

We can use the reclassified EVTYPEs as a basis for the casualty and damage calculations that are the object of this enterprise. We can tabulate and plot some basic results. Although casualties (injuries and fatalities) are straightforward, the reporting of damage is rather baroque, involving a base number and a power-of-10 exponent. The latter needs to be decoded. See the comments in the fDamage() function below.

fDamage=function(dmg,dmgexp){ 
  # Only the digits listed in expLookup and the letters h, k, m, b are interpretable 
  # Powers of 10: b(illion)=10^9, m(illion)=10^6, k(ilo == thousand)=10^3, h(undred)=10^2 
  # Test x=c('b','k','m','2','7','2','','+') 
  # expLookup[x] 
  #   b    k    m    2    7    2 <NA>    +  
  #   9    3    6    2    7    2   NA   NA  
  expLookup=c("0" =0 ,"2"=2, "3"=3, "4"=4 ,"5"=5 ,"6"=6, "7"=7, 
              "b"=9, "h"=2 ,"k"=3 ,"m"=6) 
  return(dmg*10^(expLookup[tolower(dmgexp)])) 
} 

# Calculate damage in dollars
SDT[,propDamage:=fDamage(PROPDMG,PROPDMGEXP)] 
SDT[,cropDamage:=fDamage(CROPDMG,CROPDMGEXP)] 

# Summaries 

ByTypeDT=SDT[ , .(injuries=sum(INJURIES,na.rm=TRUE), 
                  fatalities=sum(FATALITIES,na.rm=TRUE), 
                  property=sum(propDamage,na.rm=TRUE), 
                  crops=sum(cropDamage,na.rm=TRUE) 
), by=approxType 
] 

# Printing dollar amounts on data frames (and datables) 
# Adapted from http://stackoverflow.com/a/22071182/1795127 ("Roland") 
class(SDT$propDamage) <- c("money", class(SDT$propDamage)) 
class(SDT$cropDamage) <- c("money", class(SDT$cropDamage)) 

print.money <- function(x, ...) { 
  print.default(paste0("$", formatC(as.numeric(x), format="f", digits=2, big.mark=","))) 
} 

format.money  <- function(x, ...) { 
  paste0("$", formatC(as.numeric(x), format="f", digits=2, big.mark=",")) 
} 


ByTypeDT[, casualties:=injuries+fatalities] 
ByTypeDT[, damage:=property+crops] 

class(ByTypeDT$property) <- c("money", class(ByTypeDT$property)) 
class(ByTypeDT$crops) <- c("money", class(ByTypeDT$crops)) 
class(ByTypeDT$damage) <- c("money", class(ByTypeDT$damage)) 
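As a worked example of the decoding and formatting steps above, the sketch below decodes a few base/exponent pairs and formats the results as dollar amounts. The input values are invented for illustration; the two functions repeat the definitions above so that the snippet runs on its own.

```r
# Same lookup logic as fDamage() above
fDamage=function(dmg,dmgexp){
  expLookup=c("0"=0,"2"=2,"3"=3,"4"=4,"5"=5,"6"=6,"7"=7,
              "b"=9,"h"=2,"k"=3,"m"=6)
  return(dmg*10^(expLookup[tolower(dmgexp)]))
}
# Same formatter as format.money() above
format.money=function(x,...) {
  paste0("$",formatC(as.numeric(x),format="f",digits=2,big.mark=","))
}

# Invented inputs: 2.5 K, 10 m, 1 B, and an uninterpretable '+' code
decoded=unname(fDamage(c(2.5,10,1,5),c('K','m','B','+')))
print(decoded)   # 2500, 1e+07, 1e+09, NA  (the '+' code is not in the lookup)

# Tag the interpretable values as money and format them
money=structure(decoded[1:3],class=c("money","numeric"))
formatted=format(money)
print(formatted)
# "$2,500.00" "$10,000,000.00" "$1,000,000,000.00"
```

Codes that are missing from the lookup decode to NA, and such rows are then silently dropped from the totals by the na.rm=TRUE arguments in the summaries.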

The resulting casualty counts are shown in the following table for all the official events.

setorder(ByTypeDT,-casualties) 
print(ByTypeDT[,.(approxType,casualties,fatalities,injuries)]) 
##                   approxType casualties fatalities injuries
##  1:                  tornado      97043       5636    91407
##  2:        thunderstorm wind      10217        735     9482
##  3:           excessive heat       8472       1924     6548
##  4:                    flood       7277        482     6795
##  5:                   lightn       6049        817     5232
##  6:                     heat       3896       1212     2684
##  7:              flash flood       2837       1035     1802
##  8:    astronomical low tide       2438        251     2187
##  9:                ice storm       2249         97     2152
## 10:                high wind       1954        326     1628
## 11:             winter storm       1554        216     1338
## 12:                hurricane       1459        133     1326
## 13:                     hail       1386         15     1371
## 14:                heav snow       1279        160     1119
## 15:                dense fog       1158         81     1077
## 16:              rip current       1106        577      529
## 17:                 blizzard        906        101      805
## 18:           winter weather        606         66      540
## 19:  extreme cold/wind chill        564        304      260
## 20:               dust storm        463         23      440
## 21:           tropical storm        449         66      383
## 22:                heav rain        415        118      297
## 23:              strong wind        412        111      301
## 24:                avalanche        394        224      170
## 25:                high surf        375        160      215
## 26:          cold/wind chill        203        143       60
## 27:                  tsunami        162         33      129
## 28:         storm surge/tide         67         24       43
## 29: marine thunderstorm wind         53         19       34
## 30:                freez fog         49         11       38
## 31:               dust devil         45          2       43
## 32:       marine strong wind         36         14       22
## 33:               waterspout         32          3       29
## 34:                  drought         23          4       19
## 35:            coastal flood         18          9        9
## 36:              marine hail         15          8        7
## 37:             frost/freeze          5          2        3
## 38:             funnel cloud          3          0        3
## 39:                    sleet          2          2        0
## 40:         marine high wind          2          1        1
## 41:                   seiche          0          0        0
## 42:             volcanic ash          0          0        0
## 43:      tropical depression          0          0        0
## 44:          lakeeffect snow          0          0        0
## 45:          lakeshore flood          0          0        0
## 46:              dense smoke          0          0        0
##                   approxType casualties fatalities injuries

A bar plot of the 10 categories with the highest casualty counts is shown here:

# Barplot code adapted from  
# http://stackoverflow.com/a/10286695/1795127 and 
# from http://www.statmethods.net/graphs/bar.html   
# Stacked Bar Plot with Colors and Legend 
top10Cas=as.matrix(ByTypeDT[1:10,.(fatalities,injuries)]) 
propTop10Cas=sum(ByTypeDT$casualties[1:10]/sum (ByTypeDT$casualties)) 
bpTitle=paste0('10 types causing ', round(1000*propTop10Cas)/10,'% of casualties') 
bp=barplot(t(top10Cas),xaxt="n", main=bpTitle,legend=colnames(top10Cas)) 
labs=ByTypeDT$approxType[1:10] 
text(cex=.8, x=bp+.2,y=-1.25,labs,xpd=TRUE,srt=60,pos=2) 

The total damage costs are shown in the following table for all the official events.

setorder(ByTypeDT,-damage) 
print(ByTypeDT[,.(approxType,damage,property,crops)]) 
##                   approxType              damage            property
##  1:                    flood $161,012,730,600.00 $150,165,277,650.00
##  2:                hurricane  $90,161,397,810.00  $84,656,105,010.00
##  3:                  tornado  $57,418,263,383.50  $57,003,302,863.50
##  4:         storm surge/tide  $47,965,579,000.00  $47,964,724,000.00
##  5:              flash flood  $19,121,994,028.50  $17,589,796,878.50
##  6:                     hail  $18,783,503,075.70  $15,736,565,455.70
##  7:                  drought  $15,018,922,000.00   $1,046,306,000.00
##  8:        thunderstorm wind  $14,052,334,553.10  $12,778,225,573.10
##  9:    astronomical low tide  $10,183,236,330.00   $9,744,562,650.00
## 10:                ice storm   $8,981,218,660.00   $3,959,104,360.00
## 11:           tropical storm   $8,409,286,550.00   $7,714,390,550.00
## 12:                high wind   $6,866,611,390.00   $6,164,244,490.00
## 13:             winter storm   $6,716,941,251.00   $6,689,497,251.00
## 14:                heav rain   $4,073,804,490.00   $3,267,288,690.00
## 15:             frost/freeze   $2,015,761,000.00      $18,700,000.00
## 16:  extreme cold/wind chill   $1,407,213,400.00      $77,190,400.00
## 17:                heav snow   $1,104,999,840.00     $970,316,740.00
## 18:                   lightn     $947,501,516.50     $935,409,426.50
## 19:                 blizzard     $771,373,950.00     $659,313,950.00
## 20:           excessive heat     $644,096,480.00       $9,688,700.00
## 21:            coastal flood     $433,988,060.00     $433,932,060.00
## 22:                     heat     $424,383,550.00      $12,372,050.00
## 23:              strong wind     $251,127,740.00     $181,174,240.00
## 24:          cold/wind chill     $155,386,500.00      $58,644,000.00
## 25:                  tsunami     $144,082,000.00     $144,062,000.00
## 26:                high surf     $100,560,650.00     $100,560,650.00
## 27:           winter weather      $42,298,000.00      $27,298,000.00
## 28:          lakeeffect snow      $40,115,000.00      $40,115,000.00
## 29:                dense fog      $22,829,500.00      $22,829,500.00
## 30:                freez fog      $13,504,500.00      $13,504,500.00
## 31:               dust storm       $9,699,000.00       $6,099,000.00
## 32:               waterspout       $9,564,200.00       $9,564,200.00
## 33:          lakeshore flood       $7,540,000.00       $7,540,000.00
## 34: marine thunderstorm wind       $5,907,400.00       $5,857,400.00
## 35:                avalanche       $3,721,800.00       $3,721,800.00
## 36:      tropical depression       $1,737,000.00       $1,737,000.00
## 37:                    sleet       $1,400,000.00       $1,400,000.00
## 38:         marine high wind       $1,297,010.00       $1,297,010.00
## 39:                   seiche         $980,000.00         $980,000.00
## 40:               dust devil         $739,130.00         $739,130.00
## 41:             volcanic ash         $500,000.00         $500,000.00
## 42:       marine strong wind         $418,330.00         $418,330.00
## 43:             funnel cloud         $194,600.00         $194,600.00
## 44:              rip current         $163,000.00         $163,000.00
## 45:              dense smoke         $100,000.00         $100,000.00
## 46:              marine hail          $54,000.00          $54,000.00
##                   approxType              damage            property
##                  crops
##  1: $10,847,452,950.00
##  2:  $5,505,292,800.00
##  3:    $414,960,520.00
##  4:        $855,000.00
##  5:  $1,532,197,150.00
##  6:  $3,046,937,620.00
##  7: $13,972,616,000.00
##  8:  $1,274,108,980.00
##  9:    $438,673,680.00
## 10:  $5,022,114,300.00
## 11:    $694,896,000.00
## 12:    $702,366,900.00
## 13:     $27,444,000.00
## 14:    $806,515,800.00
## 15:  $1,997,061,000.00
## 16:  $1,330,023,000.00
## 17:    $134,683,100.00
## 18:     $12,092,090.00
## 19:    $112,060,000.00
## 20:    $634,407,780.00
## 21:         $56,000.00
## 22:    $412,011,500.00
## 23:     $69,953,500.00
## 24:     $96,742,500.00
## 25:         $20,000.00
## 26:              $0.00
## 27:     $15,000,000.00
## 28:              $0.00
## 29:              $0.00
## 30:              $0.00
## 31:      $3,600,000.00
## 32:              $0.00
## 33:              $0.00
## 34:         $50,000.00
## 35:              $0.00
## 36:              $0.00
## 37:              $0.00
## 38:              $0.00
## 39:              $0.00
## 40:              $0.00
## 41:              $0.00
## 42:              $0.00
## 43:              $0.00
## 44:              $0.00
## 45:              $0.00
## 46:              $0.00
##                  crops

A bar plot of the 10 categories with the highest damage costs is shown here:

top10Dmg=as.matrix(ByTypeDT[1:10,.(property,crops)]) 
propTop10Dmg=sum(ByTypeDT$damage[1:10]/sum (ByTypeDT$damage)) 
bpTitle=paste0('10 types causing ', round(1000*propTop10Dmg)/10,'% of damage') 
bp=barplot(t(top10Dmg)/1e9,xaxt="n", main=bpTitle,legend=colnames(top10Dmg), ylab='Billions of dollars') 
labs=ByTypeDT$approxType[1:10] 
text(cex=.8, x=bp+.2,y=-1.25,labs,xpd=TRUE,srt=60,pos=2) 

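Overall, most of the results seem reasonable. However, the category “astronomical low tide” seems likely to have inflated numbers, perhaps due to flaws in the Jaccard indexing strategy. One probable contributor is that which.max() returns the first position when all values tie, so an edited event that shares no words with any official type (every Jaccard index zero) is assigned to the first entry in the official list, which is astronomical low tide.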
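One possible guard against this kind of misassignment is sketched below. It is not part of the analysis that produced the tables above: the helper bestMatch() and the three-entry officialBow list are hypothetical, and returning NA for unmatched events is only one of several reasonable choices.

```r
fJaccard=function(x,y) length(intersect(x,y))/length(union(x,y))

# Hypothetical three-entry stand-in for the official bags of words
officialBow=list(c('astronomical','low','tide'),
                 c('flash','flood'),
                 c('heav','snow'))

# Return the index of the best match, or NA when nothing overlaps at all
bestMatch=function(x){
  pv=sapply(officialBow,function(y) fJaccard(x,y))
  if (max(pv)==0) return(NA_integer_)  # no shared words: leave unclassified
  which.max(pv)
}

# Without the guard, a zero-overlap event falls into the first category
unguarded=which.max(sapply(officialBow,function(y) fJaccard(c('fire'),y)))
guarded=bestMatch(c('fire'))
matched=bestMatch(c('heav','snow','shower'))
print(c(unguarded=unguarded,guarded=guarded,matched=matched))
# unguarded is 1, guarded is NA, matched is 3
```

Events left as NA could then be reported as a separate 'unclassified' group, which would make the size of the problem visible rather than folding it into an unrelated category.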

Appendix A Substitutions/deletions for reported events

This table shows the preliminary substitutions that were made for words and phrases in the EVTYPE field. The symbol [] indicates a deletion.

Reported phrase or word (cleaned) | Edited phrase or word
------------------- | -------------------
heav rain/severe weather | heav rain/high wind
severe thunderstorm wind | thunderstorm wind
non thunderstorm wind | high wind
record/excessive heat | excessive heat
unseasonabl warm/dr | heat
river/stream flood | flood
flood/river flood | flood
heat wave drought | heat
cold air tornado | tornado
late season snow | heav snow
late season snow | heav snow
heav snow shower | heav snow
rapidl ri water | flood
high wind/sea | high wind/rough surf
high wind/sea | high wind/rough surf
ice on road | ice
hurricanegenerated swell | storm surge
thunderstorm windshail | thunderstorm wind/hail
mixed precipitation | sleet
torrential rainfall | heav rain
severe thunderstorm | thunderstorm wind
heav precipitation | heav rain
excessive rainfall | heav rain
severe turbulence | high wind
severe turbulence | high wind
hurricane/typhoon | hurricane
waterspouttornado | waterspout
extreme windchill | extreme cold/wind chill
unseasonabl warm | heat
cold temperature | cold/wind chill
thunderstormwind | thunderstorm wind
unseasonabl cold | cold
unseasonabl warm | heat
record rainfall | heav rain
record rainfall | heav rain
low temperature | cold/wind chill
unseasonal rain | heav rain
excessive wetne | heav rain
thunderstorm wi | thunderstorm wind
freez drizzle | freez fog
hazardou surf | high surf
thunderestorm | thunderstorm
thunderstormw | thunderstorm wind
mixed precip | sleet
thunderstrom | thunderstorm wind
warm weather | heat
costalstorm | coastal storm
storm force | high
frostfreeze | frost freeze
hard freeze | freeze
lake effect | lakeeffect
record cold | extreme cold
record heat | excessive heat
record snow | heav snow
river flood | flood
heav shower | heav rain
snow squall | heav snow
thuderstorm | thunderstorm
thundersnow | thunderstorm wind
tunderstorm | thunderstorm
tidal flood | storm surge
flashflood | flash flood
microburst | thunderstorm wind
freez rain | sleet
rock slide | []
rogue wave | high surf
rough surf | high surf
snowsquall | snow
freez spra | freez fog
torrential | heav
rogue wave | high surf
downburst | thunderstorm wind
wind gust | strong wind
hailstorm | hail
landspout | water spout
mud slide | []
rough sea | high surf
rough sea | high surf
mud slide | []
cold wave | cold
heat wave | heat
high wave | high surf
wind/wave | wind/high surf
whirlwind | dust devil
wintr mix | sleet
avalance | avalanche
gustnado | thunderstorm wind
rainfall | rain
ice road | ice
heav sea | high surf
high sea | high surf
snowpack | snow
current | current
floodin | flood
ic road | ice
torndao | tornado
typhoon | hurricane
bitter | extreme
lightn | lightn
ligntn | lightn
flood | flood
freez | freeze
glaze | ice go
blow | wind
cool | cold
cstl | coastal
gust | strong wind
cool | cold
fld | flood
wet | rain
hv | heav
ic | ice
accident | []
accumulation | []
advisor | []
agricultural | []
alberto | []
apache | []
awn | []
beach | []
black | []
break | []
breakup | []
brush | []
clou | []
condition | []
count | []
dam | []
damag | []
damage | []
dr | []
drown | []
edouard | []
emil | []
erin | []
erosion | []
exposure | []
extended | []
felix | []
fire | []
floe | []
forest | []
gordon | []
gra | []
gradient | []
ground | []
hypothermia | []
injur | []
jam | []
jerr | []
landslide | []
landslump | []
light | []
major | []
minor | []
mircoburst | []
mishap | []
mph | []
nonsevere | []
nonthunderstorm | []
on | []
opal | []
other | []
road | []
rural | []
small | []
sml | []
snowmelt | []
squall | []
stream | []
street | []
to | []
unseasonabl | []
urban | []
wauseon | []
wild | []

  1. https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf, Table 1, page 6.

  2. All the original categories are kept, except that the term typhoon is eliminated. The word typhoon in the raw reported events is replaced by hurricane.

  3. https://en.wikipedia.org/wiki/Jaccard_index