Synopsis

This report presents the analysis of casualty and economical lost caused by many types of disasters and summarize which type of disaster gave the US the heaviest damage. The raw data are retrived from Storm Events Database. The available dates are between 1951 and 2012. My hypothesis is that the severity of a disaster type can be evaluated by the total number of casualty (death and injury) and the lost dollors in the summary of the property and crop damage. I summarized these data by the event type and the year the event recorded. I used the 48 types of events in the documentation proposed by National Weather Service and found most available data recoded after 1990s. In the final summaries of top 10 types, both tornado and flood are listed. Among the damages casued by the disasters, tornado takes the most Americans’ life out, and flood causes the greatest damage to the states economy.

Data Processing

The raw data is a csv file with 37 columns. The column EVTYPE stores the recorded names since the establish of Storme Events Database.

FileName <- dir()[grep(".bz2",dir())]
con <- bzfile(paste0(getwd(),"/",FileName))
RAW_CSV <- read.csv(con)

However, there are 37 names recorded. Because the documentation lists 48 types of events, it is necessary the clean this column at first.

1.Cleaning EVTYPE

I created EVTYPE_Label_raw as the storage of raw labels in EVTYPE and create EVTYPE_Label_rep as the storage of cleaned labels. The cleaned labels of 48 event types are transfered to the regular expressions. They are listed after the code of cleaning the data.

EVTYPE_Label_raw = RAW_CSV$EVTYPE
EV_Table <-  read.table("EVT_table.txt",sep=",") 
EVTYPE_Label_rep = as.character(EVTYPE_Label_raw)
for(i in 1:dim(EV_Table)[1])
{
 EVTYPE_Label_rep[unlist( regexpr(EV_Table$V1[i],ignore.case = TRUE, EVTYPE_Label_raw) ) != -1 ] =  as.character( EV_Table$V2[i] )
}
##                                             V1                       V2
## 1                        Astronomical Low Tide    Astronomical Low Tide
## 2                                    Avalanche                Avalanche
## 3                                 Blizzard|Ice                 Blizzard
## 4                             Coastal|[^Flood]            Coastal Flood
## 5                                         Cold          Cold/Wind Chill
## 6                                  Debris Flow              Debris Flow
## 7                                    Dense Fog                Dense Fog
## 8                                  Dense Smoke              Dense Smoke
## 9                                 ^Dr[ought|y]                  Drought
## 10                                  Dust Devil               Dust Devil
## 11                                  Dust Storm               Dust Storm
## 12                              Excessive Heat           Excessive Heat
## 13                     Extreme Cold|Wind Chill  Extreme Cold/Wind Chill
## 14                                 Flash Flood              Flash Flood
## 15                              Fl[^oo]d|Flood                    Flood
## 16                   Frost|^Freez[e]|Freez[e]$             Frost/Freeze
## 17                                Funnel Cloud             Funnel Cloud
## 18                                Freezing Fog             Freezing Fog
## 19                                        Hail                     Hail
## 20                                        Heat                     Heat
## 21                                  Heavy Rain               Heavy Rain
## 22                                  Heavy Snow               Heavy Snow
## 23                                   High Surf                High Surf
## 24                                   High Wind                High Wind
## 25                           Hurricane|Typhoon      Hurricane (Typhoon)
## 26                                   Ice Storm                Ice Storm
## 27                            Lake-Effect Snow         Lake-Effect Snow
## 28                             Lakeshore Flood          Lakeshore Flood
## 29                                   Lightning                Lightning
## 30                                 Marine Hail              Marine Hail
## 31                            Marine High Wind         Marine High Wind
## 32                          Marine Strong Wind       Marine Strong Wind
## 33                    Marine Thunderstorm Wind Marine Thunderstorm Wind
## 34                                 Rip Current              Rip Current
## 35                                      Seiche                   Seiche
## 36                                       Sleet                    Sleet
## 37                            Storm Surge|Tide         Storm Surge/Tide
## 38                                 Strong Wind              Strong Wind
## 39                                Thunderstorm        Thunderstorm Wind
## 40                                     Tornado                  Tornado
## 41                         Tropical Depression      Tropical Depression
## 42                              Tropical Storm           Tropical Storm
## 43                                  ^Ts[unami]                  Tsunami
## 44                            ^Volcanic|[Ash]$             Volcanic Ash
## 45 ^Waterspout|Waterspout$|spout[.|\\n]$|^WAYT               Waterspout
## 46                             ^Wild|fire[sS]$                 Wildfire
## 47                                Winter Storm             Winter Storm
## 48                                   ^Wint[er]           Winter Weather

At the end of cleaning process, I got 40 cleaned event types as shown below.

levels(as.factor(EVTYPE_Label_rep))
##  [1] "Coastal Flood"           "Cold/Wind Chill"        
##  [3] "Dense Fog"               "Dense Smoke"            
##  [5] "Drought"                 "Dust Devil"             
##  [7] "Dust Storm"              "Extreme Cold/Wind Chill"
##  [9] "Flood"                   "Freezing Fog"           
## [11] "Frost/Freeze"            "Funnel Cloud"           
## [13] "Hail"                    "Heat"                   
## [15] "Heavy Rain"              "Heavy Snow"             
## [17] "High Surf"               "High Wind"              
## [19] "Hurricane (Typhoon)"     "Ice Storm"              
## [21] "Lake-Effect Snow"        "Lakeshore Flood"        
## [23] "Lightning"               "Marine Hail"            
## [25] "Marine High Wind"        "Rip Current"            
## [27] "Seiche"                  "Sleet"                  
## [29] "Storm Surge/Tide"        "Strong Wind"            
## [31] "Thunderstorm Wind"       "Tornado"                
## [33] "Tropical Depression"     "Tropical Storm"         
## [35] "Tsunami"                 "Volcanic Ash"           
## [37] "Waterspout"              "Wildfire"               
## [39] "Winter Storm"            "Winter Weather"

Because the following process will summarize the raw data by year, I retrived the year of each data from BGN_DATA and stored in Year_ind. In consideration of event types had two primary changes in the past versions, I set 1955 and 1996 as the index to separate the raw data in the coming analysis. Note: I firstly deleted the final characters ‘0:00:00’ for every ‘BGN_DATE’ label.

RAW_CSV$BGN_DATE = gsub(" 0:00:00", "", RAW_CSV$BGN_DATE)
Year_ind <- as.numeric( substring(RAW_CSV$BGN_DATE, (nchar(RAW_CSV$BGN_DATE)-3), nchar(RAW_CSV$BGN_DATE)) )
Y1 <- 1955
Y2 <- 1996

2.Summarizing casualty

To analyze the caualty of each event type by year, I created a data table CASUALTY.DT merged cleaned EVTYPE, YEAR of event recorded, and FATALITY and INJURY from raw data.

CASUALTY.DT <- data.table(EVTYPE = EVTYPE_Label_rep, YEAR = Year_ind, FATALITY = RAW_CSV$FATALITIES, INJURY = RAW_CSV$INJURIES)

After check the varialbes FATALITY and INJURY, I found they are the number of people recorded in the event.

CASUALTY.DT[,c(class(FATALITY), class(INJURY) )]
## [1] "numeric" "numeric"
CASUALTY.DT[,c( head(table(FATALITY)),tail(table(FATALITY)) )]
##      0      1      2      3      4      5     90     99    114    116 
## 895323   5010    996    314    166    114      1      1      1      1 
##    158    583 
##      1      1
CASUALTY.DT[,c( head(table(INJURY)),tail(table(INJURY)) )]
##      0      1      2      3      4      5    785    800   1150   1228 
## 884693   7756   3134   1552    931    709      1      2      2      1 
##   1568   1700 
##      1      1

3.Summarizing economical lost

To analyze the economical lost of each event type by year, I created two data table merged cleaned EVTYPE, YEAR of event recorded, and ???DMG and ???DMGEXP from raw data. One data table PROPDMG.DT stores the raw data for property damage, and the other data table CROPDMG.DT stores the raw data for crop damage. ???DMGEXP(PROPDMGEXP, CORPDMGEXP) are vectors of characters. According to the documentation, these columns store the units of dollors including “K” for thousands, “M” for millions, and “B” for billions. However, these columns have more than these alphabetics. I created the vectors PROP.ind and CROP.ind to filter the data for the coming analysis.

table(RAW_CSV$PROPDMGEXP)
## 
##             -      ?      +      0      1      2      3      4      5 
## 465934      1      8      5    216     25     13      4      4     28 
##      6      7      8      B      h      H      K      m      M 
##      4      5      1     40      1      6 424665      7  11330
table(RAW_CSV$CROPDMGEXP)
## 
##             ?      0      2      B      k      K      m      M 
## 618413      7     19      1      9     21 281832      1   1994
PROP.ind <- grep("[BKM]", RAW_CSV$PROPDMGEXP, ignore.case = TRUE)
PROPDMG.DT <- data.table(EVTYPE = EVTYPE_Label_rep, YEAR = Year_ind, PROPDMG = RAW_CSV$PROPDMG, PROPDMGEXP = as.character(RAW_CSV$PROPDMGEXP))[PROP.ind,]
CROP.ind <- grep("[BKM]", RAW_CSV$CROPDMGEXP, ignore.case = TRUE)
CROPDMG.DT <- data.table(EVTYPE = EVTYPE_Label_rep, YEAR = Year_ind, CROPDMG = RAW_CSV$CROPDMG, CROPDMGEXP = as.character(RAW_CSV$CROPDMGEXP))[CROP.ind,]

The total US dollars lost is the index to evaluate the serverity of each disaster. I created a variable COST stored the values. Then the tables PROPDMG.SET and CROPDMG.SET store the total lost of each disaster was sum up by year.

PROPDMG.DT$PROPDMGEXP[PROPDMG.DT[,grep("[Kk]",PROPDMGEXP)]] = 1000
PROPDMG.DT$PROPDMGEXP[PROPDMG.DT[,grep("[Mm]",PROPDMGEXP)]] = 1000000
PROPDMG.DT$PROPDMGEXP[PROPDMG.DT[,grep("[Bb]",PROPDMGEXP)]] = 1000000000
PROPDMG.DT[,COST:=as.numeric(PROPDMG)*as.numeric(PROPDMGEXP)]
PROPDMG.SET <- PROPDMG.DT[,tapply(COST, paste(YEAR, EVTYPE), sum)]  # Cost per type and per year
PROPDMG.SET = data.table(EVTYPE = substring(names(PROPDMG.SET), 6), YEAR = as.numeric( substr(names(PROPDMG.SET), 1, 4) ), COST = PROPDMG.SET)

CROPDMG.DT$CROPDMGEXP[CROPDMG.DT[,grep("[Kk]",CROPDMGEXP)]] = 1000
CROPDMG.DT$CROPDMGEXP[CROPDMG.DT[,grep("[Mm]",CROPDMGEXP)]] = 1000000
CROPDMG.DT$CROPDMGEXP[CROPDMG.DT[,grep("[Bb]",CROPDMGEXP)]] = 1000000000
CROPDMG.DT[,COST:=as.numeric(CROPDMG)*as.numeric(CROPDMGEXP)]
CROPDMG.SET <- CROPDMG.DT[,tapply(COST, paste(YEAR, EVTYPE), sum)]  # Cost per type and per year
CROPDMG.SET = data.table(EVTYPE = substring(names(CROPDMG.SET), 6), YEAR = as.numeric( substr(names(CROPDMG.SET), 1, 4) ), COST = CROPDMG.SET)

Results

1.Tornado take the most Americans’ life out

Before 1955 Storm Events Database only collects the records of torando. Firstly I exclude the data before 1995 from this analysis. I found 1.514510^{4} people lost life and 1.4052810^{5} people injured in these events since 1955. To compare the damage of every disaster type to American people, I calculated the death rates (in percentile) of each type from the data table.

I summarized the death rates of each disaster type and present the top 10 event type in the order. Tornado got the heaviest disaster among the event types.

sort(CASUALTY.DT[YEAR > Y1,100*tapply(FATALITY, EVTYPE, sum)/sum(FATALITY)], decreasing =  TRUE)[1:10]  
##                 Tornado                    Heat                   Flood 
##               32.844907               22.177391               10.773696 
##           Coastal Flood               Lightning Extreme Cold/Wind Chill 
##                8.154598                5.783252                2.704042 
##             Rip Current            Volcanic Ash          Winter Weather 
##                2.640334                2.576626                1.974942 
##               High Wind 
##                1.776740

Here are the death rates of top 10 disasters before 1996 and after 1996. Before 1996, tornado is the heaviest disaster and flood (coastal flood and flood) take the secondary. After 1996, heat jumed to top 1, but torando and flood take the secondary and third positions.

sort(CASUALTY.DT[YEAR > Y1 & YEAR < Y2,100*tapply(FATALITY, EVTYPE, sum)/sum(FATALITY)], decreasing =  TRUE)[1:10]  # death rates of top 10 between 1954 and 1996
##                 Tornado                    Heat           Coastal Flood 
##              57.9981464              20.3336423               7.2474513 
##                   Flood               Lightning            Volcanic Ash 
##               3.9481001               3.0769231               2.4281742 
## Extreme Cold/Wind Chill             Rip Current         Cold/Wind Chill 
##               0.8711770               0.6116775               0.5560704 
##          Winter Weather 
##               0.4819277
sort(CASUALTY.DT[YEAR > Y2,100*tapply(FATALITY, EVTYPE, sum)/sum(FATALITY)], decreasing =  TRUE)[1:10]  # death rates of top 10 after 1996
##                    Heat                 Tornado                   Flood 
##               24.420024               18.131868               14.432234 
##           Coastal Flood               Lightning             Rip Current 
##                8.376068                7.301587                4.151404 
## Extreme Cold/Wind Chill          Winter Weather               High Wind 
##                3.479853                2.796093                2.576313 
##            Volcanic Ash 
##                2.503053

Here I merged FATALITY and INJURY and calcuated the casualty rate of each EVTYPE. All the data show tornado the heaviest disaster type. Heat and flood comes after in a casualty rate lower than 10%.

sort(CASUALTY.DT[YEAR > Y1,100*tapply( (FATALITY+INJURY), EVTYPE, sum)/sum((FATALITY+INJURY))], decreasing =  TRUE)[1:10]
##           Tornado              Heat     Coastal Flood             Flood 
##         59.490969          8.534724          7.387506          6.992437 
##         Lightning    Winter Weather         Ice Storm      Volcanic Ash 
##          4.177228          1.551956          1.437304          1.402079 
## Thunderstorm Wind          Wildfire 
##          1.175536          1.171392

As with the analysis of death rate, I calcuated the casualty rates before 1996 and after 1996. Tornado and flood/coastal flood take the first and secondary heaviest disasters. However, the third heaviest disaster before 1996 is ice storme, but heat takes this position after 1996.

sort(CASUALTY.DT[YEAR > Y1 & YEAR < Y2,100*tapply( (FATALITY+INJURY), EVTYPE, sum)/sum((FATALITY+INJURY))], decreasing =  TRUE)[1:10]  # casualty rates between 1954 and 1996
##       Tornado Coastal Flood          Heat     Ice Storm  Volcanic Ash 
##    81.9129588     5.6392326     3.3786726     2.1529752     1.8379057 
##     Lightning          Hail         Flood    Heavy Snow     Dense Fog 
##     1.6086478     0.8401855     0.4790082     0.4533928     0.2664003
sort(CASUALTY.DT[YEAR > Y2,100*tapply( (FATALITY+INJURY), EVTYPE, sum)/sum((FATALITY+INJURY))], decreasing =  TRUE)[1:10]  # casualty rates after 1996
##             Tornado               Flood                Heat 
##           33.802484           15.065881           15.058000 
##       Coastal Flood           Lightning      Winter Weather 
##            8.837158            6.972639            2.885828 
##   Thunderstorm Wind            Wildfire Hurricane (Typhoon) 
##            2.472891            2.379902            2.203379 
##           High Wind 
##            1.861367

2.Flood and Hurricane (Typhoon) casued the severiest economical lost between 1996 and 2011

Because 48 lables are defined since 1996, Storm Events Database should have the precise records of econmoic lost since that year. The economical lost of each disaster is estimated by the total US dollars recorded since 1996. Firstly I summarized the property damage caused by each disaster. Here are the names of first 10 disasters according to the total dollars lost they caused since 1996.

PROP.TOP10 <- names( sort(PROPDMG.SET[YEAR > Y2,tapply(COST, EVTYPE, sum)], decreasing =  TRUE) )[1:10]
##  [1] "Flood"               "Hurricane (Typhoon)" "Storm Surge/Tide"   
##  [4] "Tornado"             "Hail"                "Wildfire"           
##  [7] "Tropical Storm"      "High Wind"           "Coastal Flood"      
## [10] "Ice Storm"

From 1996 to 2011, the property damage of top 10 disasters is summarized in the histgram below. This figure shows the records between 2005 and 2006 make flood and hurricane occupy the first and second heaviest disaster since 1996. The extreme climate changing in past decades is the primary cause increasing the property damage in the US.

barplot2(
  as.matrix( PROPDMG.SET[EVTYPE == PROP.TOP10,][YEAR > Y2,tapply(COST/1000000000, YEAR, list)] ),
  space = 0,
  main = paste("Property Damage by Top 10 disasters in USA since", (Y2+1)),
  xlab = "YEAR",
  ylab = "Total Lost a Year in US dollors(billion)",
  col = brewer.pal(10, "Set3"), 
  legend.text = unlist(PROPDMG.SET[EVTYPE == PROP.TOP10,][YEAR == Y2,tapply(EVTYPE, YEAR, list)])
)

I summarized the crop damage as the way I summarized the property damage. Here are the names of first 10 disasters according to the total dollars lost they caused since 1996.

# filter the top 10 events
CROP.TOP10 <- names( sort(CROPDMG.SET[YEAR > Y2,tapply(COST, EVTYPE, sum)], decreasing =  TRUE) )[1:10]
##  [1] "Drought"                 "Flood"                  
##  [3] "Hurricane (Typhoon)"     "Hail"                   
##  [5] "Frost/Freeze"            "Extreme Cold/Wind Chill"
##  [7] "Heavy Rain"              "Tropical Storm"         
##  [9] "Coastal Flood"           "High Wind"

Flood did not cause the heaviest crop damage since the establish of Storm Database. At least before 2007, drought occupied the most of lost per year. Flood has became severe to the agricultural produce since 2007. Is this associated the extreme property damages happened at 2005 and 2006? This might be the trend we have to trace in the following years.

barplot2(
  as.matrix( CROPDMG.SET[EVTYPE == CROP.TOP10,][YEAR > Y2,tapply(COST/1000000000, YEAR, list)] ),
  main = paste("Crop Damage by Top 10 disasters in USA since", (Y2+1) ),
  space = 0,
  xlab = "YEAR",
  ylab = "Total Lost a Year in US dollors(billion)",
    col = brewer.pal(10, "Set3"), 
  legend.text = unlist(CROPDMG.SET[EVTYPE == CROP.TOP10,][YEAR == (Y2),tapply(EVTYPE, YEAR, list)])
)