This report presents the analysis of casualty and economical lost caused by many types of disasters and summarize which type of disaster gave the US the heaviest damage. The raw data are retrived from Storm Events Database. The available dates are between 1951 and 2012. My hypothesis is that the severity of a disaster type can be evaluated by the total number of casualty (death and injury) and the lost dollors in the summary of the property and crop damage. I summarized these data by the event type and the year the event recorded. I used the 48 types of events in the documentation proposed by National Weather Service and found most available data recoded after 1990s. In the final summaries of top 10 types, both tornado and flood are listed. Among the damages casued by the disasters, tornado takes the most Americans’ life out, and flood causes the greatest damage to the states economy.
The raw data is a csv file with 37 columns. The column EVTYPE stores the recorded names since the establish of Storme Events Database.
FileName <- dir()[grep(".bz2",dir())]
con <- bzfile(paste0(getwd(),"/",FileName))
RAW_CSV <- read.csv(con)
However, there are 37 names recorded. Because the documentation lists 48 types of events, it is necessary the clean this column at first.
EVTYPEI created EVTYPE_Label_raw as the storage of raw labels in EVTYPE and create EVTYPE_Label_rep as the storage of cleaned labels. The cleaned labels of 48 event types are transfered to the regular expressions. They are listed after the code of cleaning the data.
EVTYPE_Label_raw = RAW_CSV$EVTYPE
EV_Table <- read.table("EVT_table.txt",sep=",")
EVTYPE_Label_rep = as.character(EVTYPE_Label_raw)
for(i in 1:dim(EV_Table)[1])
{
EVTYPE_Label_rep[unlist( regexpr(EV_Table$V1[i],ignore.case = TRUE, EVTYPE_Label_raw) ) != -1 ] = as.character( EV_Table$V2[i] )
}
## V1 V2
## 1 Astronomical Low Tide Astronomical Low Tide
## 2 Avalanche Avalanche
## 3 Blizzard|Ice Blizzard
## 4 Coastal|[^Flood] Coastal Flood
## 5 Cold Cold/Wind Chill
## 6 Debris Flow Debris Flow
## 7 Dense Fog Dense Fog
## 8 Dense Smoke Dense Smoke
## 9 ^Dr[ought|y] Drought
## 10 Dust Devil Dust Devil
## 11 Dust Storm Dust Storm
## 12 Excessive Heat Excessive Heat
## 13 Extreme Cold|Wind Chill Extreme Cold/Wind Chill
## 14 Flash Flood Flash Flood
## 15 Fl[^oo]d|Flood Flood
## 16 Frost|^Freez[e]|Freez[e]$ Frost/Freeze
## 17 Funnel Cloud Funnel Cloud
## 18 Freezing Fog Freezing Fog
## 19 Hail Hail
## 20 Heat Heat
## 21 Heavy Rain Heavy Rain
## 22 Heavy Snow Heavy Snow
## 23 High Surf High Surf
## 24 High Wind High Wind
## 25 Hurricane|Typhoon Hurricane (Typhoon)
## 26 Ice Storm Ice Storm
## 27 Lake-Effect Snow Lake-Effect Snow
## 28 Lakeshore Flood Lakeshore Flood
## 29 Lightning Lightning
## 30 Marine Hail Marine Hail
## 31 Marine High Wind Marine High Wind
## 32 Marine Strong Wind Marine Strong Wind
## 33 Marine Thunderstorm Wind Marine Thunderstorm Wind
## 34 Rip Current Rip Current
## 35 Seiche Seiche
## 36 Sleet Sleet
## 37 Storm Surge|Tide Storm Surge/Tide
## 38 Strong Wind Strong Wind
## 39 Thunderstorm Thunderstorm Wind
## 40 Tornado Tornado
## 41 Tropical Depression Tropical Depression
## 42 Tropical Storm Tropical Storm
## 43 ^Ts[unami] Tsunami
## 44 ^Volcanic|[Ash]$ Volcanic Ash
## 45 ^Waterspout|Waterspout$|spout[.|\\n]$|^WAYT Waterspout
## 46 ^Wild|fire[sS]$ Wildfire
## 47 Winter Storm Winter Storm
## 48 ^Wint[er] Winter Weather
At the end of cleaning process, I got 40 cleaned event types as shown below.
levels(as.factor(EVTYPE_Label_rep))
## [1] "Coastal Flood" "Cold/Wind Chill"
## [3] "Dense Fog" "Dense Smoke"
## [5] "Drought" "Dust Devil"
## [7] "Dust Storm" "Extreme Cold/Wind Chill"
## [9] "Flood" "Freezing Fog"
## [11] "Frost/Freeze" "Funnel Cloud"
## [13] "Hail" "Heat"
## [15] "Heavy Rain" "Heavy Snow"
## [17] "High Surf" "High Wind"
## [19] "Hurricane (Typhoon)" "Ice Storm"
## [21] "Lake-Effect Snow" "Lakeshore Flood"
## [23] "Lightning" "Marine Hail"
## [25] "Marine High Wind" "Rip Current"
## [27] "Seiche" "Sleet"
## [29] "Storm Surge/Tide" "Strong Wind"
## [31] "Thunderstorm Wind" "Tornado"
## [33] "Tropical Depression" "Tropical Storm"
## [35] "Tsunami" "Volcanic Ash"
## [37] "Waterspout" "Wildfire"
## [39] "Winter Storm" "Winter Weather"
Because the following process will summarize the raw data by year, I retrived the year of each data from BGN_DATA and stored in Year_ind. In consideration of event types had two primary changes in the past versions, I set 1955 and 1996 as the index to separate the raw data in the coming analysis. Note: I firstly deleted the final characters ‘0:00:00’ for every ‘BGN_DATE’ label.
RAW_CSV$BGN_DATE = gsub(" 0:00:00", "", RAW_CSV$BGN_DATE)
Year_ind <- as.numeric( substring(RAW_CSV$BGN_DATE, (nchar(RAW_CSV$BGN_DATE)-3), nchar(RAW_CSV$BGN_DATE)) )
Y1 <- 1955
Y2 <- 1996
To analyze the caualty of each event type by year, I created a data table CASUALTY.DT merged cleaned EVTYPE, YEAR of event recorded, and FATALITY and INJURY from raw data.
CASUALTY.DT <- data.table(EVTYPE = EVTYPE_Label_rep, YEAR = Year_ind, FATALITY = RAW_CSV$FATALITIES, INJURY = RAW_CSV$INJURIES)
After check the varialbes FATALITY and INJURY, I found they are the number of people recorded in the event.
CASUALTY.DT[,c(class(FATALITY), class(INJURY) )]
## [1] "numeric" "numeric"
CASUALTY.DT[,c( head(table(FATALITY)),tail(table(FATALITY)) )]
## 0 1 2 3 4 5 90 99 114 116
## 895323 5010 996 314 166 114 1 1 1 1
## 158 583
## 1 1
CASUALTY.DT[,c( head(table(INJURY)),tail(table(INJURY)) )]
## 0 1 2 3 4 5 785 800 1150 1228
## 884693 7756 3134 1552 931 709 1 2 2 1
## 1568 1700
## 1 1
To analyze the economical lost of each event type by year, I created two data table merged cleaned EVTYPE, YEAR of event recorded, and ???DMG and ???DMGEXP from raw data. One data table PROPDMG.DT stores the raw data for property damage, and the other data table CROPDMG.DT stores the raw data for crop damage. ???DMGEXP(PROPDMGEXP, CORPDMGEXP) are vectors of characters. According to the documentation, these columns store the units of dollors including “K” for thousands, “M” for millions, and “B” for billions. However, these columns have more than these alphabetics. I created the vectors PROP.ind and CROP.ind to filter the data for the coming analysis.
table(RAW_CSV$PROPDMGEXP)
##
## - ? + 0 1 2 3 4 5
## 465934 1 8 5 216 25 13 4 4 28
## 6 7 8 B h H K m M
## 4 5 1 40 1 6 424665 7 11330
table(RAW_CSV$CROPDMGEXP)
##
## ? 0 2 B k K m M
## 618413 7 19 1 9 21 281832 1 1994
PROP.ind <- grep("[BKM]", RAW_CSV$PROPDMGEXP, ignore.case = TRUE)
PROPDMG.DT <- data.table(EVTYPE = EVTYPE_Label_rep, YEAR = Year_ind, PROPDMG = RAW_CSV$PROPDMG, PROPDMGEXP = as.character(RAW_CSV$PROPDMGEXP))[PROP.ind,]
CROP.ind <- grep("[BKM]", RAW_CSV$CROPDMGEXP, ignore.case = TRUE)
CROPDMG.DT <- data.table(EVTYPE = EVTYPE_Label_rep, YEAR = Year_ind, CROPDMG = RAW_CSV$CROPDMG, CROPDMGEXP = as.character(RAW_CSV$CROPDMGEXP))[CROP.ind,]
The total US dollars lost is the index to evaluate the serverity of each disaster. I created a variable COST stored the values. Then the tables PROPDMG.SET and CROPDMG.SET store the total lost of each disaster was sum up by year.
PROPDMG.DT$PROPDMGEXP[PROPDMG.DT[,grep("[Kk]",PROPDMGEXP)]] = 1000
PROPDMG.DT$PROPDMGEXP[PROPDMG.DT[,grep("[Mm]",PROPDMGEXP)]] = 1000000
PROPDMG.DT$PROPDMGEXP[PROPDMG.DT[,grep("[Bb]",PROPDMGEXP)]] = 1000000000
PROPDMG.DT[,COST:=as.numeric(PROPDMG)*as.numeric(PROPDMGEXP)]
PROPDMG.SET <- PROPDMG.DT[,tapply(COST, paste(YEAR, EVTYPE), sum)] # Cost per type and per year
PROPDMG.SET = data.table(EVTYPE = substring(names(PROPDMG.SET), 6), YEAR = as.numeric( substr(names(PROPDMG.SET), 1, 4) ), COST = PROPDMG.SET)
CROPDMG.DT$CROPDMGEXP[CROPDMG.DT[,grep("[Kk]",CROPDMGEXP)]] = 1000
CROPDMG.DT$CROPDMGEXP[CROPDMG.DT[,grep("[Mm]",CROPDMGEXP)]] = 1000000
CROPDMG.DT$CROPDMGEXP[CROPDMG.DT[,grep("[Bb]",CROPDMGEXP)]] = 1000000000
CROPDMG.DT[,COST:=as.numeric(CROPDMG)*as.numeric(CROPDMGEXP)]
CROPDMG.SET <- CROPDMG.DT[,tapply(COST, paste(YEAR, EVTYPE), sum)] # Cost per type and per year
CROPDMG.SET = data.table(EVTYPE = substring(names(CROPDMG.SET), 6), YEAR = as.numeric( substr(names(CROPDMG.SET), 1, 4) ), COST = CROPDMG.SET)
Before 1955 Storm Events Database only collects the records of torando. Firstly I exclude the data before 1995 from this analysis. I found 1.514510^{4} people lost life and 1.4052810^{5} people injured in these events since 1955. To compare the damage of every disaster type to American people, I calculated the death rates (in percentile) of each type from the data table.
I summarized the death rates of each disaster type and present the top 10 event type in the order. Tornado got the heaviest disaster among the event types.
sort(CASUALTY.DT[YEAR > Y1,100*tapply(FATALITY, EVTYPE, sum)/sum(FATALITY)], decreasing = TRUE)[1:10]
## Tornado Heat Flood
## 32.844907 22.177391 10.773696
## Coastal Flood Lightning Extreme Cold/Wind Chill
## 8.154598 5.783252 2.704042
## Rip Current Volcanic Ash Winter Weather
## 2.640334 2.576626 1.974942
## High Wind
## 1.776740
Here are the death rates of top 10 disasters before 1996 and after 1996. Before 1996, tornado is the heaviest disaster and flood (coastal flood and flood) take the secondary. After 1996, heat jumed to top 1, but torando and flood take the secondary and third positions.
sort(CASUALTY.DT[YEAR > Y1 & YEAR < Y2,100*tapply(FATALITY, EVTYPE, sum)/sum(FATALITY)], decreasing = TRUE)[1:10] # death rates of top 10 between 1954 and 1996
## Tornado Heat Coastal Flood
## 57.9981464 20.3336423 7.2474513
## Flood Lightning Volcanic Ash
## 3.9481001 3.0769231 2.4281742
## Extreme Cold/Wind Chill Rip Current Cold/Wind Chill
## 0.8711770 0.6116775 0.5560704
## Winter Weather
## 0.4819277
sort(CASUALTY.DT[YEAR > Y2,100*tapply(FATALITY, EVTYPE, sum)/sum(FATALITY)], decreasing = TRUE)[1:10] # death rates of top 10 after 1996
## Heat Tornado Flood
## 24.420024 18.131868 14.432234
## Coastal Flood Lightning Rip Current
## 8.376068 7.301587 4.151404
## Extreme Cold/Wind Chill Winter Weather High Wind
## 3.479853 2.796093 2.576313
## Volcanic Ash
## 2.503053
Here I merged FATALITY and INJURY and calcuated the casualty rate of each EVTYPE. All the data show tornado the heaviest disaster type. Heat and flood comes after in a casualty rate lower than 10%.
sort(CASUALTY.DT[YEAR > Y1,100*tapply( (FATALITY+INJURY), EVTYPE, sum)/sum((FATALITY+INJURY))], decreasing = TRUE)[1:10]
## Tornado Heat Coastal Flood Flood
## 59.490969 8.534724 7.387506 6.992437
## Lightning Winter Weather Ice Storm Volcanic Ash
## 4.177228 1.551956 1.437304 1.402079
## Thunderstorm Wind Wildfire
## 1.175536 1.171392
As with the analysis of death rate, I calcuated the casualty rates before 1996 and after 1996. Tornado and flood/coastal flood take the first and secondary heaviest disasters. However, the third heaviest disaster before 1996 is ice storme, but heat takes this position after 1996.
sort(CASUALTY.DT[YEAR > Y1 & YEAR < Y2,100*tapply( (FATALITY+INJURY), EVTYPE, sum)/sum((FATALITY+INJURY))], decreasing = TRUE)[1:10] # casualty rates between 1954 and 1996
## Tornado Coastal Flood Heat Ice Storm Volcanic Ash
## 81.9129588 5.6392326 3.3786726 2.1529752 1.8379057
## Lightning Hail Flood Heavy Snow Dense Fog
## 1.6086478 0.8401855 0.4790082 0.4533928 0.2664003
sort(CASUALTY.DT[YEAR > Y2,100*tapply( (FATALITY+INJURY), EVTYPE, sum)/sum((FATALITY+INJURY))], decreasing = TRUE)[1:10] # casualty rates after 1996
## Tornado Flood Heat
## 33.802484 15.065881 15.058000
## Coastal Flood Lightning Winter Weather
## 8.837158 6.972639 2.885828
## Thunderstorm Wind Wildfire Hurricane (Typhoon)
## 2.472891 2.379902 2.203379
## High Wind
## 1.861367
Because 48 lables are defined since 1996, Storm Events Database should have the precise records of econmoic lost since that year. The economical lost of each disaster is estimated by the total US dollars recorded since 1996. Firstly I summarized the property damage caused by each disaster. Here are the names of first 10 disasters according to the total dollars lost they caused since 1996.
PROP.TOP10 <- names( sort(PROPDMG.SET[YEAR > Y2,tapply(COST, EVTYPE, sum)], decreasing = TRUE) )[1:10]
## [1] "Flood" "Hurricane (Typhoon)" "Storm Surge/Tide"
## [4] "Tornado" "Hail" "Wildfire"
## [7] "Tropical Storm" "High Wind" "Coastal Flood"
## [10] "Ice Storm"
From 1996 to 2011, the property damage of top 10 disasters is summarized in the histgram below. This figure shows the records between 2005 and 2006 make flood and hurricane occupy the first and second heaviest disaster since 1996. The extreme climate changing in past decades is the primary cause increasing the property damage in the US.
barplot2(
as.matrix( PROPDMG.SET[EVTYPE == PROP.TOP10,][YEAR > Y2,tapply(COST/1000000000, YEAR, list)] ),
space = 0,
main = paste("Property Damage by Top 10 disasters in USA since", (Y2+1)),
xlab = "YEAR",
ylab = "Total Lost a Year in US dollors(billion)",
col = brewer.pal(10, "Set3"),
legend.text = unlist(PROPDMG.SET[EVTYPE == PROP.TOP10,][YEAR == Y2,tapply(EVTYPE, YEAR, list)])
)
I summarized the crop damage as the way I summarized the property damage. Here are the names of first 10 disasters according to the total dollars lost they caused since 1996.
# filter the top 10 events
CROP.TOP10 <- names( sort(CROPDMG.SET[YEAR > Y2,tapply(COST, EVTYPE, sum)], decreasing = TRUE) )[1:10]
## [1] "Drought" "Flood"
## [3] "Hurricane (Typhoon)" "Hail"
## [5] "Frost/Freeze" "Extreme Cold/Wind Chill"
## [7] "Heavy Rain" "Tropical Storm"
## [9] "Coastal Flood" "High Wind"
Flood did not cause the heaviest crop damage since the establish of Storm Database. At least before 2007, drought occupied the most of lost per year. Flood has became severe to the agricultural produce since 2007. Is this associated the extreme property damages happened at 2005 and 2006? This might be the trend we have to trace in the following years.
barplot2(
as.matrix( CROPDMG.SET[EVTYPE == CROP.TOP10,][YEAR > Y2,tapply(COST/1000000000, YEAR, list)] ),
main = paste("Crop Damage by Top 10 disasters in USA since", (Y2+1) ),
space = 0,
xlab = "YEAR",
ylab = "Total Lost a Year in US dollors(billion)",
col = brewer.pal(10, "Set3"),
legend.text = unlist(CROPDMG.SET[EVTYPE == CROP.TOP10,][YEAR == (Y2),tapply(EVTYPE, YEAR, list)])
)