ReproRsrchPeerAssmemt2
This is a report on the NOAA Storm Data collection. It is a preliminary survey of the main types of events resulting in casualties (injuries or deaths) and damage (to property or crops). Summary tables and two summary figures are presented.
Official event types
We first fix a granularity of events, the “official event types”, as a basis for the summaries. These are adopted from Table 1 of NOAA’s Storm Data Preparation document, referred to below as NSDP [1]. This table has 48 categories [2].
Reported events.
Unfortunately, the reported event types in the field EVTYPE of the data are in a very heterogeneous format, with 690 distinct labels. The bulk of the work in mining this data involved regularizing those labels and reconciling them with the official event types. The groupings chosen are undoubtedly suboptimal. They should serve as an example of what kinds of reports can be done. More input from subject matter experts, and a clearer statement of the goals of the supervisory team, will be needed to make this work well. The programming used here can be modified relatively easily once the goals are clarified.
dataURL = "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
dataZipFile="./repdata-data-StormData.csv.bz2"
csvFile="./repdata-data-StormData.csv"
# Download file and unzip file if necessary
library(data.table); library(qdap); library(SnowballC); library(stringdist);
## Loading required package: qdapDictionaries
## Loading required package: qdapRegex
## Loading required package: qdapTools
##
## Attaching package: 'qdapTools'
##
## The following object is masked from 'package:data.table':
##
## shift
##
## Loading required package: RColorBrewer
##
## Attaching package: 'qdap'
##
## The following object is masked from 'package:base':
##
## Filter
if (!exists('DT') || !is.data.table(DT)) {
DT=NULL
catln = function(...) cat(...,'\n')
#` The following function will download and preprocess the data into a {data.table} object
loadRawData=function(){
# prepare (if necessary) a csv file and read it into the data table DT in parent
if (!file.exists(csvFile)){
catln("Downloading bz2file")
if (!file.exists(dataZipFile)){
# Adapted from http://stackoverflow.com/a/23899792/1795127
download.file(dataURL,dataZipFile, mode="wb")
}
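catln('Unarchiving file to .csv')
# unzip() cannot read bzip2 archives, so decompress with base R connections
inCon=bzfile(dataZipFile,'rb'); outCon=file(csvFile,'wb')
while (length(chunk <- readBin(inCon,raw(),n=1e7))>0) writeBin(chunk,outCon)
close(inCon); close(outCon)
}
catln('Using fread() on .csv file')
DT <<- fread(csvFile)
}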
loadRawData()
}
## Using fread() on .csv file
##
Read 902297 rows and 37 (of 37) columns from 0.523 GB file in 00:00:13
## Warning in fread(csvFile): Read less rows (902297) than were allocated
## (967216). Run again with verbose=TRUE and please report.
We need to do some initial cleaning of the text. See the comments in cleanText().
cleanText=function(x){
#Clean up the labels in reported EVENT Type.
# Remove leading/trailing spaces, condense multispace to single space
# remove non alphabetic characters except / and space
# remove word final -ing , -s, -y
# replace complete words 'and' and 'from' with phrase break character /
y=tolower(gsub(' +',' ',
gsub('^\\s*|\\s*$','',x,perl=TRUE),
perl=TRUE ))
y=gsub('[^A-Za-z /]','',y,perl=TRUE)
y=gsub('(y|ing|s)\\b','',y,perl=TRUE)
y=gsub('\\band\\b|\\bfrom\\b','/',y,perl=TRUE)
return(y)
}
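As a quick illustration of what this cleaning does, the sketch below applies a copy of cleanText() to a few made-up labels written in the style of EVTYPE. The labels are assumptions chosen for illustration, not a sample of the actual data.

```r
# Copy of the report's cleanText(), repeated so the example is self-contained
cleanText <- function(x) {
  y <- tolower(gsub(' +', ' ', gsub('^\\s*|\\s*$', '', x, perl = TRUE), perl = TRUE))
  y <- gsub('[^A-Za-z /]', '', y, perl = TRUE)            # keep letters, spaces and /
  y <- gsub('(y|ing|s)\\b', '', y, perl = TRUE)           # crude stemming of word endings
  y <- gsub('\\band\\b|\\bfrom\\b', '/', y, perl = TRUE)  # 'and'/'from' become phrase breaks
  y
}

# Hypothetical raw labels (assumed for illustration only)
rawLabels <- c('  TSTM WIND ', 'Heavy Rains', 'FLOODING', 'Lake-Effect Snow')
cleaned <- cleanText(rawLabels)
print(data.frame(raw = rawLabels, cleaned = cleaned))
```

The stemming is deliberately crude: it shortens "heavy" to "heav" and "lightning" to "lightn", which is why those truncated forms appear in the tables later in the report.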
First, we save the original reported events just in case. Then we do a preliminary cleaning of the EVTYPE field and correct a few obvious misspellings.
origReportedEvents=DT$EVTYPE
DT[,EVTYPE:=cleanText(DT$EVTYPE)]
# Replace common abbreviation tstm with thunderstorm
DT[,EVTYPE:=gsub('\\btstm\\b','thunderstorm',EVTYPE,perl=TRUE)]
# One event listed as tstmw-- guessing it means 'thunderstorm wind'
DT[,EVTYPE:=gsub('\\btstmw\\b','thunderstorm wind',EVTYPE,perl=TRUE)]
Given the goals of the report, we first set up boolean vectors that flag events which:
* have an “EVTYPE” that does not start with ‘summary’ (summary entries are not related to the official event types)
* indicate casualties, that is, have non-zero entries in “FATALITIES” or “INJURIES”
* indicate damage, that is, have non-zero entries in “PROPDMG” or “CROPDMG”
binxEvent=!grepl('^summary',DT$EVTYPE,ignore.case=TRUE,perl=TRUE)
# Filter on casualties: injuries and fatalities
binxCasualty=DT$INJURIES>0 | DT$FATALITIES>0
# Filter on damage (X_DMGEXP doesn't need to be looked at when X_DMG==0)
binxDamage=DT$CROPDMG>0|DT$PROPDMG>0
binxRelevant=binxEvent&(binxCasualty|binxDamage)
The variables above will be useful for selecting data later in the report. We can now subset the data to include only the relevant cases (rows) and the relevant variables.
keyFields=c("EVTYPE","FATALITIES", "INJURIES","PROPDMG", "PROPDMGEXP",
"CROPDMG","CROPDMGEXP")
SDT=DT[binxRelevant,c(keyFields),with=FALSE]
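The filtering logic can be checked on a small example. The sketch below uses a base R data.frame with invented rows (an assumption, used instead of the data.table so that the example needs no packages) and the same three conditions.

```r
# Hypothetical rows using the column names of the storm data
toy <- data.frame(
  EVTYPE     = c('tornado', 'summary of june', 'hail', 'flood'),
  FATALITIES = c(1, 0, 0, 0),
  INJURIES   = c(3, 0, 0, 0),
  PROPDMG    = c(25, 0, 0, 10),
  CROPDMG    = c(0, 0, 0, 0),
  stringsAsFactors = FALSE
)

isEvent    <- !grepl('^summary', toy$EVTYPE, ignore.case = TRUE)  # drop summary rows
isCasualty <- toy$INJURIES > 0 | toy$FATALITIES > 0
isDamage   <- toy$CROPDMG > 0 | toy$PROPDMG > 0
isRelevant <- isEvent & (isCasualty | isDamage)

print(toy[isRelevant, ])
```

Only the tornado and flood rows survive: the summary row is excluded by its label, and the hail row reports neither casualties nor damage.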
We also need a list of ‘Official Event Types’.
officialEventTypesString='Astronomical Low Tide, Avalanche, Blizzard, Coastal Flood, Cold/Wind Chill,
Debris Flow, Dense Fog, Dense Smoke, Drought, Dust Devil, Dust Storm, Excessive Heat,
Extreme Cold/Wind Chill, Flash Flood, Flood, Frost/Freeze, Funnel Cloud, Freezing Fog,
Hail, Heat, Heavy Rain, Heavy Snow, High Surf, High Wind, Hurricane, Ice Storm,
Lake-Effect Snow, Lakeshore Flood, Lightning, Marine Hail, Marine High Wind,
Marine Strong Wind, Marine Thunderstorm Wind, Rip Current, Seiche, Sleet,
Storm Surge/Tide, Strong Wind,Thunderstorm Wind, Tornado, Tropical Depression,
Tropical Storm, Tsunami, Volcanic Ash, Waterspout, Wildfire, Winter Storm,
Winter Weather'
# These will be assigned in caller or global environment ( preparation for possible functionalization)
officialEventsOrig <<- strsplit(officialEventTypesString,' *[\n,]+ *')[[1]]
officialEvents <<- cleanText(officialEventsOrig)
A text substitution function editReportedEvents() was developed to map the ragged reportedEvents into editedEvents that bear a closer resemblance to terms in the main officialEvents list. It was the result of a complex set of interactive stages involving ad hoc R scripts, an evolving understanding of the relevant terminology, suggestions from internet fora (including the Reproducible Research forum) and repeated reference to the NSDP document. This is a long chunk of code, but it is useful to show its form here. It should be relatively easy to change with additional clarification of the key event types of interest to our client.
# editReportedEvents.R
# IterativelyReprocessReportedEvents.R
# This is a hand-edited script for preprocessing the substitution lists that feed
# the script createReportedEventsToEditedCode.R
editReportedEvents=function(reportedEvents){
# The following words seemed to bear no relation to the officialEvents list and will be deleted
rawDeleteStr =
'accident accumulation advisory agricultural apache alberto awning black
breakup dam break clou county condition beach erosion brush fire damage
damaging drowning dry dr unseasonably edouard emily erin exposure
extended felix fire floe forest gordon gra gradient ground hypothermia
road injur jam jerr landslide landslump light mph major minor mircoburst
mishap nonsevere nonthunderstorm on opal other rural urban small sml snowmelt
squall stream street to wauseon wild'
deleteWordPatCVec=sort(unique(cleanText(strsplit(rawDeleteStr,'\\s+',perl=TRUE)[[1]])))
# A few of these (downburst, gustnado ) were inferred from table of contents in the NOAA document
# microburst is assumed to be a kind of downburst
# The following EVTYPE phrases (left-hand column) will be replaced with
# terms from officialEvents (right-hand column); % marks a deletion.
# This can be edited.
rawSubsStr=
'cold air tornado > tornado
avalance > avalanche
bitter > extreme
blowing > wind
flooding > flood
costalstorm> coastal storm
cool > cold
cstl > coastal
currents>current
downburst> thunderstorm wind
freezing drizzle>freezing fog
unseasonably warm > heat
flashflood> flash flood
fld >flood
floodin > flood
freez>freeze
storm force > high
frostfreeze > frost freeze
wind gust > strong wind
glaze > ice go
gust > strong wind
gustnado > thunderstorm wind
hailstorm > hail
hard freeze > freeze
hazardous surf > high surf
hv>heav
hurricanegenerated swell > storm surge
ic > ice
lightn>lightning
landspout > water spout
late season snow > heavy snow
microburst > thunderstorm wind
mixed precip > sleet
mixed precipitation > sleet
freez rain> sleet
lake effect> lakeeffect
ligntn> lightn
wet > rain
mud slide > %
non thunderstorm wind> high wind
heav precipitation > heav rain
excessive rainfall> heav rain
record rainfall > heav rain
torrential rainfall > heav rain
rapidl ri water > flood
rainfall>rain
record cold > extreme cold
record heat >excessive heat
record rainfall > heav rain
record snow >heav snow
record/excessive heat > excessive heat
flood/river flood > flood
river/stream flood > flood
river flood > flood
ic road > ice
ice on road > ice
ice road > ice
rock slide>%
rogue wave > high surf
rough sea > high surf
rough surf>high surf
heav sea>high surf
high sea>high surf
rough sea>high surf
high wind/sea > high wind/rough surf
high wind/sea > high wind/rough surf
late season snow > heav snow
heav rain/severe weather > heav rain/high wind
severe thunderstorm >thunderstorm wind
severe thunderstorm wind > thunderstorm wind
severe turbulence> high wind
heav shower > heav rain
heav snow shower > heav snow
mud slide > %
snowpack >snow
snowsquall>snow
freez spra > freezing fog
snow squall > heavy snow
cold temperature > cold/wind chill
low temperature> cold/wind chill
thuderstorm>thunderstorm
thunderestorm>thunderstorm
thundersnow> thunderstorm wind
thunderstormw > thunderstorm wind
thunderstormwind > thunderstorm wind
thunderstrom >thunderstorm wind
tunderstorm > thunderstorm
tidal flood> storm surge
torndao > tornado
torrential>heavy
severe turbulence > high wind
hurricane/typhoon > hurricane
typhoon > hurricane
unseasonabl cold > cold
unseasonabl warm > heat
unseasonabl warm/dr > heat
unseasonal rain > heav rain
warm weather > heat
waterspouttornado > waterspout
cold wave > cold
heat wave > heat
heat wave drought > heat
high wave > high surf
rogue wave> high surf
wind/wave > wind/high surf
cool>cold
excessive wetne > heavy rain
whirlwind> dust devil
thunderstorm wi > thunderstorm wind
extreme windchill > extreme cold/wind chill
thunderstorm windshail > thunderstorm winds/hail
wintr mix > sleet'
# Produce the vector of patterns to be replaced
rawSubsCvec=strsplit(
gsub(' *> *','>',rawSubsStr,perl=TRUE),
'\n')[[1]]
# These are applied to cleaned up names
subsPat=c()
subsRepl=c()
rawSubsList=strsplit(rawSubsCvec,'>')
subsPat=cleanText(sapply(rawSubsList,FUN= function(s) s[1]))
subsRepl=cleanText(sapply(rawSubsList,FUN= function(s) s[2]))
# Sort these out for screening after http://stackoverflow.com/a/8920256/1795127
# Sort out by number of words so larger changes occur first
patWC=sapply(gregexpr("\\W+", subsPat), length) + 1
patNchar=sapply(subsPat,nchar)
ordSubs=order(patWC,patNchar,decreasing=TRUE)
subsPat=subsPat[ordSubs]
subsRepl=subsRepl[ordSubs]
# Add on single word deletions
subsPat=c(subsPat,deleteWordPatCVec)
subsRepl=c(subsRepl,rep('',length(deleteWordPatCVec)))
# Generate as markdown appendix -- this is commented out in production
# catln('Reported phrase or word (cleaned)|Edited phrase or word ')
# catln("#' ------------------- | ------------------- ")
# for (i in 1:length(subsPat)){
# trepl=subsRepl[i]
# if(trepl=='') trepl= '[]'
# catln("#'",subsPat[i],' | ',trepl, ' ')
# }
# catln(' ')
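editedEvents = reportedEvents
for (i in seq_along(subsPat)) {
# Apply each substitution to the running result so that edits accumulate;
# the \\b anchors restrict matches to whole words or phrases
editedEvents = gsub(paste0('\\b',subsPat[i],'\\b'), subsRepl[i],editedEvents, perl = TRUE)
}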
return(editedEvents)
}
# A new field editedEvents is added to the SDT summary database.
SDT[, editedEvents :=editReportedEvents(EVTYPE)]
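The reason for sorting the patterns so that longer phrases come first can be seen in a small sketch. The four patterns below are taken from the substitution and deletion lists above. The helper applySubs() and the test labels are hypothetical and only illustrate the idea of ordered, cumulative, whole-word substitution; they are not the report's own function.

```r
# A much-shortened substitution table (pattern -> replacement); '' means delete
pats  <- c('severe thunderstorm', 'record heat', 'severe thunderstorm wind', 'minor')
repls <- c('thunderstorm wind', 'excessive heat', 'thunderstorm wind', '')

# Longest patterns first (by word count, then character count)
wordCount <- lengths(strsplit(pats, ' +'))
ord <- order(wordCount, nchar(pats), decreasing = TRUE)
pats <- pats[ord]
repls <- repls[ord]

# Hypothetical helper: apply every substitution in turn to the running result
applySubs <- function(events, pats, repls) {
  for (i in seq_along(pats)) {
    # \\b anchors keep short patterns from matching inside longer words
    events <- gsub(paste0('\\b', pats[i], '\\b'), repls[i], events, perl = TRUE)
  }
  # Tidy the spaces left behind by deletions
  gsub('^ +| +$', '', gsub(' +', ' ', events))
}

events <- c('severe thunderstorm wind', 'severe thunderstorm', 'record heat', 'minor flood')
edited <- applySubs(events, pats, repls)
print(data.frame(events, edited))
```

If 'severe thunderstorm' were applied before 'severe thunderstorm wind', the first label would become 'thunderstorm wind wind'. Applying the longer phrase first avoids that.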
The edited event names can be compared to the officialEvents list, which is generated by this code:
officialEventTypesString='Astronomical Low Tide, Avalanche, Blizzard, Coastal Flood, Cold/Wind Chill,
Debris Flow, Dense Fog, Dense Smoke, Drought, Dust Devil, Dust Storm, Excessive Heat,
Extreme Cold/Wind Chill, Flash Flood, Flood, Frost/Freeze, Funnel Cloud, Freezing Fog,
Hail, Heat, Heavy Rain, Heavy Snow, High Surf, High Wind, Hurricane, Ice Storm,
Lake-Effect Snow, Lakeshore Flood, Lightning, Marine Hail, Marine High Wind,
Marine Strong Wind, Marine Thunderstorm Wind, Rip Current, Seiche, Sleet,
Storm Surge/Tide, Strong Wind,Thunderstorm Wind, Tornado, Tropical Depression,
Tropical Storm, Tsunami, Volcanic Ash, Waterspout, Wildfire, Winter Storm,
Winter Weather'
# Original names (for printing, eg)
officialEventsOrig = strsplit(officialEventTypesString,' *[\n,]+ *')[[1]]
# Cleaned up names (for comparison with editedEvents)
officialEvents = cleanText(officialEventsOrig)
The preliminary editing above is only the first step. We now have to compare the sets of words in the edited reported events (editedEvents) with the official event types and try to assign each edited event to one of them. This is a very preliminary and crude ‘document classification task’. All the words in each (pre-edited) reported event are compared with all the words of each official event type. Each edited event is then replaced by the official event type whose words overlap most with its own. The degree of overlap is taken to be the Jaccard index [3], the number of words shared by the two sets divided by the number of words in their union. This is undoubtedly too crude a procedure. More subject knowledge is needed to come up with a rational way to do this.
# Collapse the slashes in event types
officialEventsNoSlash=sub(pattern='/',replacement=' ',
officialEvents,fixed=TRUE)
editedEventsVec=unique(SDT$editedEvents)
editedEventsNoSlash=sub(pattern='/',replacement=' ',
editedEventsVec,fixed=TRUE)
# Simple bag of words function
bow=function(x) unique(strsplit(x,' +',perl=TRUE)[[1]])
editedBow=sapply(editedEventsNoSlash,bow)
officialBow=sapply(officialEventsNoSlash,bow)
# Jaccard index function--- relative match of bags of words
fJaccard=function(x,y)
return(length(intersect(x,y))/length(union(x,y)))
# Generate proximity vector showing Jaccard index for each input bow with all the officialBow entries
fProxVec=function(x){
# return vector of proximity for set x to each officialBow
tvec=sapply(officialBow,function(y) fJaccard(x,y))
return(tvec)
}
# Find the best approximate match for each item
jBestVec=NULL
for (i in 1:length(editedBow)){
pv=fProxVec(editedBow[[i]])
# catln(editedBow[[i]],sort(pv,decreasin=TRUE)[1:5])
jBest=which.max(pv)
pBest=pv[jBest]
# catln(editedEventsVec[i],'<', officialEvents[jBest],pBest)
jBestVec=c(jBestVec,jBest)
}
# Add the approximate classification to the SDT database
# Get an index number in the editedEventsVec list
findEditedEventsInx=function(x) which(x==editedEventsVec)
# Look up the name of the best matching officialEvent
substituteApproxOfficialEvent = function(x){
return(officialEvents[jBestVec[findEditedEventsInx(x)]])
}
substituteAllApproxEvents=function(x){
sapply(x,substituteApproxOfficialEvent)
}
# Now add the new column
SDT[,approxType:=substituteAllApproxEvents(editedEvents)]
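The matching rule can be illustrated on a toy example. The sketch below scores three hypothetical edited labels against a shortened list of four official types (both are assumptions for illustration). The helper names jaccard() and bestMatch() are not part of the report's code.

```r
# Bag of words and Jaccard index, as in the report
bow <- function(x) unique(strsplit(x, ' +', perl = TRUE)[[1]])
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Shortened list of (cleaned) official types, for illustration only
official <- c('astronomical low tide', 'flash flood', 'flood', 'thunderstorm wind')
officialBow <- lapply(official, bow)

# Hypothetical helper: score one label against every official type
bestMatch <- function(label) {
  scores <- sapply(officialBow, function(b) jaccard(bow(label), b))
  names(scores) <- official
  list(scores = scores, best = official[which.max(scores)])
}

m1 <- bestMatch('flash flood')             # exact match beats the partial match 'flood'
m2 <- bestMatch('thunderstorm wind hail')  # shares 2 of 3 words with 'thunderstorm wind'
m3 <- bestMatch('fire')                    # shares no word with any official type

print(m1$scores)
print(c(m1$best, m2$best, m3$best))
```

Note the behaviour on the last label: every score is 0, and which.max() returns the first index when there are ties, so the label is assigned to the first entry of the list.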
We can use the reclassified EVTYPES as a basis for the casualty and damage calculations that are the object of this enterprise. We can tabulate and plot some basic results. Although casualties (injuries and fatalities) are straightforward, the reporting of damage is rather baroque, involving a base number and a power-of-10 exponent. The latter needs to be decoded. See the comments in the fDamage() function below.
fDamage=function(dmg,dmgexp){
# Only the digits 0 and 2-7 and the letters h, k, m, b are interpretable
# Powers of 10: b(illion)=10^9, m(illion)=10^6, k(ilo == thousand)=10^3, h(undred)=10^2
# Test x=c('b','k','m','2','7','2','','+')
# expLookup[x]
# b k m 2 7 2 <NA> +
# 9 3 6 2 7 2 NA NA
expLookup=c("0" =0 ,"2"=2, "3"=3, "4"=4 ,"5"=5 ,"6"=6, "7"=7,
"b"=9, "h"=2 ,"k"=3 ,"m"=6)
return(dmg*10^(expLookup[tolower(dmgexp)]))
}
# Calculate damage in dollars
SDT[,propDamage:=fDamage(PROPDMG,PROPDMGEXP)]
SDT[,cropDamage:=fDamage(CROPDMG,CROPDMGEXP)]
# Summaries
ByTypeDT=SDT[ , .(injuries=sum(INJURIES,na.rm=TRUE),
fatalities=sum(FATALITIES,na.rm=TRUE),
property=sum(propDamage,na.rm=TRUE),
crops=sum(cropDamage,na.rm=TRUE)
), by=approxType
]
# Printing dollar amounts in data frames (and data.tables)
# Adapted from http://stackoverflow.com/a/22071182/1795127 ("Roland")
class(SDT$propDamage) <- c("money", class(SDT$propDamage))
class(SDT$cropDamage) <- c("money", class(SDT$cropDamage))
print.money <- function(x, ...) {
print.default(paste0("$", formatC(as.numeric(x), format="f", digits=2, big.mark=",")))
}
format.money <- function(x, ...) {
paste0("$", formatC(as.numeric(x), format="f", digits=2, big.mark=","))
}
ByTypeDT[, casualties:=injuries+fatalities]
ByTypeDT[, damage:=property+crops]
class(ByTypeDT$property) <- c("money", class(ByTypeDT$property))
class(ByTypeDT$crops) <- c("money", class(ByTypeDT$crops))
class(ByTypeDT$damage) <- c("money", class(ByTypeDT$damage))
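To make the exponent decoding concrete, the sketch below applies the same lookup table to a few invented base/exponent pairs. The function name decodeDamage() and the sample values are assumptions for illustration; the lookup table is the one used in fDamage().

```r
# Same exponent lookup as in fDamage(): letters and digits map to powers of 10
expLookup <- c('0' = 0, '2' = 2, '3' = 3, '4' = 4, '5' = 5, '6' = 6, '7' = 7,
               'b' = 9, 'h' = 2, 'k' = 3, 'm' = 6)

# Hypothetical wrapper that drops the names carried over from the lookup
decodeDamage <- function(dmg, dmgexp) unname(dmg * 10^expLookup[tolower(dmgexp)])

# Invented base numbers and exponent codes
dmg    <- c(2.5, 10, 1.5, 4, 7)
dmgexp <- c('K', 'm', 'B', '', '+')

dollars <- decodeDamage(dmg, dmgexp)
print(data.frame(dmg, dmgexp, dollars))

# Codes that are not in the lookup ('' and '+') decode to NA and are
# skipped when totals are computed with na.rm = TRUE
total <- sum(dollars, na.rm = TRUE)
print(paste0('$', formatC(total, format = 'f', digits = 2, big.mark = ',')))
```

One consequence of this design is that a record with a positive base number but an empty or unrecognised exponent code contributes nothing to the totals.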
The resulting casualty counts are shown in the following table for all the official events.
setorder(ByTypeDT,-casualties)
print(ByTypeDT[,.(approxType,casualties,fatalities,injuries)])
## approxType casualties fatalities injuries
## 1: tornado 97043 5636 91407
## 2: thunderstorm wind 10217 735 9482
## 3: excessive heat 8472 1924 6548
## 4: flood 7277 482 6795
## 5: lightn 6049 817 5232
## 6: heat 3896 1212 2684
## 7: flash flood 2837 1035 1802
## 8: astronomical low tide 2438 251 2187
## 9: ice storm 2249 97 2152
## 10: high wind 1954 326 1628
## 11: winter storm 1554 216 1338
## 12: hurricane 1459 133 1326
## 13: hail 1386 15 1371
## 14: heav snow 1279 160 1119
## 15: dense fog 1158 81 1077
## 16: rip current 1106 577 529
## 17: blizzard 906 101 805
## 18: winter weather 606 66 540
## 19: extreme cold/wind chill 564 304 260
## 20: dust storm 463 23 440
## 21: tropical storm 449 66 383
## 22: heav rain 415 118 297
## 23: strong wind 412 111 301
## 24: avalanche 394 224 170
## 25: high surf 375 160 215
## 26: cold/wind chill 203 143 60
## 27: tsunami 162 33 129
## 28: storm surge/tide 67 24 43
## 29: marine thunderstorm wind 53 19 34
## 30: freez fog 49 11 38
## 31: dust devil 45 2 43
## 32: marine strong wind 36 14 22
## 33: waterspout 32 3 29
## 34: drought 23 4 19
## 35: coastal flood 18 9 9
## 36: marine hail 15 8 7
## 37: frost/freeze 5 2 3
## 38: funnel cloud 3 0 3
## 39: sleet 2 2 0
## 40: marine high wind 2 1 1
## 41: seiche 0 0 0
## 42: volcanic ash 0 0 0
## 43: tropical depression 0 0 0
## 44: lakeeffect snow 0 0 0
## 45: lakeshore flood 0 0 0
## 46: dense smoke 0 0 0
## approxType casualties fatalities injuries
A bar plot of the 10 categories with the highest casualty counts is shown here:
# Barplot code adapted from
# http://stackoverflow.com/a/10286695/1795127 and
# from http://www.statmethods.net/graphs/bar.html
# Stacked Bar Plot with Colors and Legend
top10Cas=as.matrix(ByTypeDT[1:10,.(fatalities,injuries)])
propTop10Cas=sum(ByTypeDT$casualties[1:10]/sum (ByTypeDT$casualties))
bpTitle=paste0('10 types causing ', round(1000*propTop10Cas)/10,'% of casualties')
bp=barplot(t(top10Cas),xaxt="n", main=bpTitle,legend=colnames(top10Cas))
labs=ByTypeDT$approxType[1:10]
text(cex=.8, x=bp+.2,y=-1.25,labs,xpd=TRUE,srt=60,pos=2)
The total damage costs are shown in the following table for all the official events.
setorder(ByTypeDT,-damage)
print(ByTypeDT[,.(approxType,damage,property,crops)])
## approxType damage property
## 1: flood $161,012,730,600.00 $150,165,277,650.00
## 2: hurricane $90,161,397,810.00 $84,656,105,010.00
## 3: tornado $57,418,263,383.50 $57,003,302,863.50
## 4: storm surge/tide $47,965,579,000.00 $47,964,724,000.00
## 5: flash flood $19,121,994,028.50 $17,589,796,878.50
## 6: hail $18,783,503,075.70 $15,736,565,455.70
## 7: drought $15,018,922,000.00 $1,046,306,000.00
## 8: thunderstorm wind $14,052,334,553.10 $12,778,225,573.10
## 9: astronomical low tide $10,183,236,330.00 $9,744,562,650.00
## 10: ice storm $8,981,218,660.00 $3,959,104,360.00
## 11: tropical storm $8,409,286,550.00 $7,714,390,550.00
## 12: high wind $6,866,611,390.00 $6,164,244,490.00
## 13: winter storm $6,716,941,251.00 $6,689,497,251.00
## 14: heav rain $4,073,804,490.00 $3,267,288,690.00
## 15: frost/freeze $2,015,761,000.00 $18,700,000.00
## 16: extreme cold/wind chill $1,407,213,400.00 $77,190,400.00
## 17: heav snow $1,104,999,840.00 $970,316,740.00
## 18: lightn $947,501,516.50 $935,409,426.50
## 19: blizzard $771,373,950.00 $659,313,950.00
## 20: excessive heat $644,096,480.00 $9,688,700.00
## 21: coastal flood $433,988,060.00 $433,932,060.00
## 22: heat $424,383,550.00 $12,372,050.00
## 23: strong wind $251,127,740.00 $181,174,240.00
## 24: cold/wind chill $155,386,500.00 $58,644,000.00
## 25: tsunami $144,082,000.00 $144,062,000.00
## 26: high surf $100,560,650.00 $100,560,650.00
## 27: winter weather $42,298,000.00 $27,298,000.00
## 28: lakeeffect snow $40,115,000.00 $40,115,000.00
## 29: dense fog $22,829,500.00 $22,829,500.00
## 30: freez fog $13,504,500.00 $13,504,500.00
## 31: dust storm $9,699,000.00 $6,099,000.00
## 32: waterspout $9,564,200.00 $9,564,200.00
## 33: lakeshore flood $7,540,000.00 $7,540,000.00
## 34: marine thunderstorm wind $5,907,400.00 $5,857,400.00
## 35: avalanche $3,721,800.00 $3,721,800.00
## 36: tropical depression $1,737,000.00 $1,737,000.00
## 37: sleet $1,400,000.00 $1,400,000.00
## 38: marine high wind $1,297,010.00 $1,297,010.00
## 39: seiche $980,000.00 $980,000.00
## 40: dust devil $739,130.00 $739,130.00
## 41: volcanic ash $500,000.00 $500,000.00
## 42: marine strong wind $418,330.00 $418,330.00
## 43: funnel cloud $194,600.00 $194,600.00
## 44: rip current $163,000.00 $163,000.00
## 45: dense smoke $100,000.00 $100,000.00
## 46: marine hail $54,000.00 $54,000.00
## approxType damage property
## crops
## 1: $10,847,452,950.00
## 2: $5,505,292,800.00
## 3: $414,960,520.00
## 4: $855,000.00
## 5: $1,532,197,150.00
## 6: $3,046,937,620.00
## 7: $13,972,616,000.00
## 8: $1,274,108,980.00
## 9: $438,673,680.00
## 10: $5,022,114,300.00
## 11: $694,896,000.00
## 12: $702,366,900.00
## 13: $27,444,000.00
## 14: $806,515,800.00
## 15: $1,997,061,000.00
## 16: $1,330,023,000.00
## 17: $134,683,100.00
## 18: $12,092,090.00
## 19: $112,060,000.00
## 20: $634,407,780.00
## 21: $56,000.00
## 22: $412,011,500.00
## 23: $69,953,500.00
## 24: $96,742,500.00
## 25: $20,000.00
## 26: $0.00
## 27: $15,000,000.00
## 28: $0.00
## 29: $0.00
## 30: $0.00
## 31: $3,600,000.00
## 32: $0.00
## 33: $0.00
## 34: $50,000.00
## 35: $0.00
## 36: $0.00
## 37: $0.00
## 38: $0.00
## 39: $0.00
## 40: $0.00
## 41: $0.00
## 42: $0.00
## 43: $0.00
## 44: $0.00
## 45: $0.00
## 46: $0.00
## crops
A bar plot of the 10 categories with the highest damage costs is shown here:
top10Dmg=as.matrix(ByTypeDT[1:10,.(property,crops)])
propTop10Dmg=sum(ByTypeDT$damage[1:10]/sum (ByTypeDT$damage))
bpTitle=paste0('10 types causing ', round(1000*propTop10Dmg)/10,'% of damage')
bp=barplot(t(top10Dmg)/1e9,xaxt="n", main=bpTitle,legend=colnames(top10Dmg), ylab='Billions of dollars')
labs=ByTypeDT$approxType[1:10]
text(cex=.8, x=bp+.2,y=-1.25,labs,xpd=TRUE,srt=60,pos=2)
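Overall, most of the results seem reasonable. However, the category “astronomical low tide” seems likely to have inflated numbers, perhaps due to flaws in the Jaccard indexing strategy with long and unusual terms. One likely contributor is that which.max() returns the first index when there are ties, so an edited event that shares no words with any official type (all Jaccard indices equal to 0) is assigned to the first entry of officialEvents, which is “astronomical low tide”.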
This table shows the preliminary substitutions that were made for words and phrases in the EVTYPE field. The symbol [] indicates a deletion.
| Reported phrase or word (cleaned) | Edited phrase or word |
|---|---|
| heav rain/severe weather | heav rain/high wind |
| severe thunderstorm wind | thunderstorm wind |
| non thunderstorm wind | high wind |
| record/excessive heat | excessive heat |
| unseasonabl warm/dr | heat |
| river/stream flood | flood |
| flood/river flood | flood |
| heat wave drought | heat |
| cold air tornado | tornado |
| late season snow | heav snow |
| late season snow | heav snow |
| heav snow shower | heav snow |
| rapidl ri water | flood |
| high wind/sea | high wind/rough surf |
| high wind/sea | high wind/rough surf |
| ice on road | ice |
| hurricanegenerated swell | storm surge |
| thunderstorm windshail | thunderstorm wind/hail |
| mixed precipitation | sleet |
| torrential rainfall | heav rain |
| severe thunderstorm | thunderstorm wind |
| heav precipitation | heav rain |
| excessive rainfall | heav rain |
| severe turbulence | high wind |
| severe turbulence | high wind |
| hurricane/typhoon | hurricane |
| waterspouttornado | waterspout |
| extreme windchill | extreme cold/wind chill |
| unseasonabl warm | heat |
| cold temperature | cold/wind chill |
| thunderstormwind | thunderstorm wind |
| unseasonabl cold | cold |
| unseasonabl warm | heat |
| record rainfall | heav rain |
| record rainfall | heav rain |
| low temperature | cold/wind chill |
| unseasonal rain | heav rain |
| excessive wetne | heav rain |
| thunderstorm wi | thunderstorm wind |
| freez drizzle | freez fog |
| hazardou surf | high surf |
| thunderestorm | thunderstorm |
| thunderstormw | thunderstorm wind |
| mixed precip | sleet |
| thunderstrom | thunderstorm wind |
| warm weather | heat |
| costalstorm | coastal storm |
| storm force | high |
| frostfreeze | frost freeze |
| hard freeze | freeze |
| lake effect | lakeeffect |
| record cold | extreme cold |
| record heat | excessive heat |
| record snow | heav snow |
| river flood | flood |
| heav shower | heav rain |
| snow squall | heav snow |
| thuderstorm | thunderstorm |
| thundersnow | thunderstorm wind |
| tunderstorm | thunderstorm |
| tidal flood | storm surge |
| flashflood | flash flood |
| microburst | thunderstorm wind |
| freez rain | sleet |
| rock slide | [] |
| rogue wave | high surf |
| rough surf | high surf |
| snowsquall | snow |
| freez spra | freez fog |
| torrential | heav |
| rogue wave | high surf |
| downburst | thunderstorm wind |
| wind gust | strong wind |
| hailstorm | hail |
| landspout | water spout |
| mud slide | [] |
| rough sea | high surf |
| rough sea | high surf |
| mud slide | [] |
| cold wave | cold |
| heat wave | heat |
| high wave | high surf |
| wind/wave | wind/high surf |
| whirlwind | dust devil |
| wintr mix | sleet |
| avalance | avalanche |
| gustnado | thunderstorm wind |
| rainfall | rain |
| ice road | ice |
| heav sea | high surf |
| high sea | high surf |
| snowpack | snow |
| current | current |
| floodin | flood |
| ic road | ice |
| torndao | tornado |
| typhoon | hurricane |
| bitter | extreme |
| lightn | lightn |
| ligntn | lightn |
| flood | flood |
| freez | freeze |
| glaze | ice go |
| blow | wind |
| cool | cold |
| cstl | coastal |
| gust | strong wind |
| cool | cold |
| fld | flood |
| wet | rain |
| hv | heav |
| ic | ice |
| accident | [] |
| accumulation | [] |
| advisor | [] |
| agricultural | [] |
| alberto | [] |
| apache | [] |
| awn | [] |
| beach | [] |
| black | [] |
| break | [] |
| breakup | [] |
| brush | [] |
| clou | [] |
| condition | [] |
| count | [] |
| dam | [] |
| damag | [] |
| damage | [] |
| dr | [] |
| drown | [] |
| edouard | [] |
| emil | [] |
| erin | [] |
| erosion | [] |
| exposure | [] |
| extended | [] |
| felix | [] |
| fire | [] |
| floe | [] |
| forest | [] |
| gordon | [] |
| gra | [] |
| gradient | [] |
| ground | [] |
| hypothermia | [] |
| injur | [] |
| jam | [] |
| jerr | [] |
| landslide | [] |
| landslump | [] |
| light | [] |
| major | [] |
| minor | [] |
| mircoburst | [] |
| mishap | [] |
| mph | [] |
| nonsevere | [] |
| nonthunderstorm | [] |
| on | [] |
| opal | [] |
| other | [] |
| road | [] |
| rural | [] |
| small | [] |
| sml | [] |
| snowmelt | [] |
| squall | [] |
| stream | [] |
| street | [] |
| to | [] |
| unseasonabl | [] |
| urban | [] |
| wauseon | [] |
| wild | [] |
[1] https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf, Table 1, page 6.
[2] All the original categories are kept, except that the term typhoon is eliminated. The word typhoon in the raw reported events is replaced by hurricane.