Beware of Tornados: an analysis of economic and human damage caused by weather events in the USA (years 2000-2011)

Peer Assessment 2 - B. A. Benayoun

Synopsis

We took advantage of the weather event tracking carried out by NOAA and the NCDC since 1950. We concentrated on the years from 2000 onward, which are expected to have the best record keeping. We processed the data to identify the weather events that had the greatest impact on people (deaths or injuries), as well as the events that caused the largest financial losses (property or crop damage). We found that tornados have had the greatest human impact, in terms of both fatalities and injuries. Droughts have been responsible for the most crop damage, whereas floods have been responsible for the most property damage and have been the second leading cause of crop damage. Tornados are also among the top 15 events most responsible for economic losses. Because of their strong negative impact on both human health and the economy, we investigated the frequency of tornados since 1950 and found an increase in the number of reported events. This increase could come from better record keeping or from a bona fide increase in events.

Obtaining the data

The storm dataset file repdata-data-StormData.csv.bz2 was downloaded from the course website. All analyses were conducted on a Macintosh desktop computer running Mac OS X (Snow Leopard).

Data Processing

First, the csv file is loaded into an R object for processing, clean-up and analysis.

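## stringsAsFactors=TRUE is the default in R 3.x; it is set explicitly because the levels() calls below rely on factors
my.storm.data <- read.csv(bzfile('repdata-data-StormData.csv.bz2'),header=TRUE,stringsAsFactors=TRUE)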

Then, we look at the information contained in the dataframe:

## examine the entered data
colnames(my.storm.data)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
## keep only the columns needed for the analysis
my.data.subset <- subset(my.storm.data, select=c(EVTYPE,FATALITIES,INJURIES,PROPDMG,CROPDMG))

## extract the year from the BGN_DATE field
get_year <- function(my.string) {
  my.imd <- strsplit(as.character(my.string)," ")[[1]][1]
  strsplit(my.imd,"/")[[1]][3]
}
my.evt.year <- sapply(my.storm.data$BGN_DATE,get_year)
my.data.subset$YEAR <- as.factor(my.evt.year)

## examine encoding of exponents
unique(levels(my.storm.data$PROPDMGEXP))
##  [1] ""  "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"
unique(levels(my.storm.data$CROPDMGEXP))
## [1] ""  "?" "0" "2" "B" "k" "K" "m" "M"

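Because the exponents for crop and property damage are encoded mostly as letters, we convert them to numeric multipliers ("K" for thousands, "M" for millions, "B" for billions, in either case). Any other code (including "H", digits, and symbols such as "?", "+" or "-") is assigned a multiplying factor of one in the subsequent analysis.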

## convert letter encoded exponents for crop and property to numbers and applying them
get_exponent <- function(my.exp){
  my.exp <- toupper(my.exp)
  my.factor <- 1
  if (my.exp %in% "K") {
    my.factor <- 1000
  } else if (my.exp %in% "M") {
    my.factor <- 1e6
  } else if (my.exp %in% "B") {
    my.factor <- 1e9
  }
  my.factor
}

my.exp.factors.prop <- sapply(my.storm.data$PROPDMGEXP,get_exponent)
my.exp.factors.crop <- sapply(my.storm.data$CROPDMGEXP,get_exponent)

my.data.subset$PROPDMG <- my.data.subset$PROPDMG * my.exp.factors.prop
my.data.subset$CROPDMG <- my.data.subset$CROPDMG * my.exp.factors.crop
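
As an aside, the same conversion can be sketched in vectorized form with a named lookup vector, which avoids calling a function once per row. This is only an illustration under the same rules as above (K, M, B in either case; anything else counts as a factor of one); the helper name exp_to_factor and the example codes are assumptions, not part of the analysis run for this report.

```r
## hypothetical helper (not part of the original analysis):
## map exponent codes to multipliers using a named lookup vector
exp_to_factor <- function(exp.codes) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  codes <- toupper(as.character(exp.codes))
  factors <- unname(lookup[codes])   # unknown codes give NA
  factors[is.na(factors)] <- 1       # unknown codes count as a factor of one
  factors
}

## made-up example codes covering letters, an empty string, a symbol and a digit
example.codes <- c("K", "m", "B", "", "?", "5")
example.factors <- exp_to_factor(example.codes)
```

Used on the real columns, this would read exp_to_factor(my.storm.data$PROPDMGEXP), and the result could be multiplied with PROPDMG exactly as above.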

We restrict most of the analysis to recent years (2000 onward) to limit the bias from less comprehensive record keeping before 2000.

## subsetting to years from 2000 onward (compare as numbers, not as strings)
my.2k.lines <- which(as.numeric(my.evt.year) >= 2000)
my.data.subset <- my.data.subset[my.2k.lines,]
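
As an aside, the year extraction and filtering above can also be sketched in vectorized form. This is only an illustration, not the code used for this report: the helper name extract_year and the example date strings are assumptions, and the sketch relies on as.Date() ignoring the trailing time once the date format has been matched.

```r
## hypothetical helper (not part of the original analysis):
## parse the date part of strings laid out like "4/18/1950 0:00:00"
extract_year <- function(date.strings) {
  parsed <- as.Date(as.character(date.strings), format = "%m/%d/%Y")
  as.integer(format(parsed, "%Y"))
}

## made-up example values in the same layout as BGN_DATE
example.dates <- c("4/18/1950 0:00:00", "1/1/2000 0:00:00", "11/15/2011 0:00:00")
example.years <- extract_year(example.dates)

## numeric comparison keeps only the years from 2000 onward
recent.dates <- example.dates[example.years >= 2000]
```

Returning integers rather than strings means the comparison with 2000 is numeric and does not depend on how character strings happen to sort.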

## get an overview of the event type labels (first 75 factor levels)
unique(levels(my.data.subset$EVTYPE))[1:75]
##  [1] "   HIGH SURF ADVISORY"          " COASTAL FLOOD"                
##  [3] " FLASH FLOOD"                   " LIGHTNING"                    
##  [5] " TSTM WIND"                     " TSTM WIND (G45)"              
##  [7] " WATERSPOUT"                    " WIND"                         
##  [9] "?"                              "ABNORMAL WARMTH"               
## [11] "ABNORMALLY DRY"                 "ABNORMALLY WET"                
## [13] "ACCUMULATED SNOWFALL"           "AGRICULTURAL FREEZE"           
## [15] "APACHE COUNTY"                  "ASTRONOMICAL HIGH TIDE"        
## [17] "ASTRONOMICAL LOW TIDE"          "AVALANCE"                      
## [19] "AVALANCHE"                      "BEACH EROSIN"                  
## [21] "Beach Erosion"                  "BEACH EROSION"                 
## [23] "BEACH EROSION/COASTAL FLOOD"    "BEACH FLOOD"                   
## [25] "BELOW NORMAL PRECIPITATION"     "BITTER WIND CHILL"             
## [27] "BITTER WIND CHILL TEMPERATURES" "Black Ice"                     
## [29] "BLACK ICE"                      "BLIZZARD"                      
## [31] "BLIZZARD AND EXTREME WIND CHIL" "BLIZZARD AND HEAVY SNOW"       
## [33] "Blizzard Summary"               "BLIZZARD WEATHER"              
## [35] "BLIZZARD/FREEZING RAIN"         "BLIZZARD/HEAVY SNOW"           
## [37] "BLIZZARD/HIGH WIND"             "BLIZZARD/WINTER STORM"         
## [39] "BLOW-OUT TIDE"                  "BLOW-OUT TIDES"                
## [41] "BLOWING DUST"                   "blowing snow"                  
## [43] "Blowing Snow"                   "BLOWING SNOW"                  
## [45] "BLOWING SNOW & EXTREME WIND CH" "BLOWING SNOW- EXTREME WIND CHI"
## [47] "BLOWING SNOW/EXTREME WIND CHIL" "BREAKUP FLOODING"              
## [49] "BRUSH FIRE"                     "BRUSH FIRES"                   
## [51] "COASTAL  FLOODING/EROSION"      "COASTAL EROSION"               
## [53] "Coastal Flood"                  "COASTAL FLOOD"                 
## [55] "coastal flooding"               "Coastal Flooding"              
## [57] "COASTAL FLOODING"               "COASTAL FLOODING/EROSION"      
## [59] "Coastal Storm"                  "COASTAL STORM"                 
## [61] "COASTAL SURGE"                  "COASTAL/TIDAL FLOOD"           
## [63] "COASTALFLOOD"                   "COASTALSTORM"                  
## [65] "Cold"                           "COLD"                          
## [67] "COLD AIR FUNNEL"                "COLD AIR FUNNELS"              
## [69] "COLD AIR TORNADO"               "Cold and Frost"                
## [71] "COLD AND FROST"                 "COLD AND SNOW"                 
## [73] "COLD AND WET CONDITIONS"        "Cold Temperature"              
## [75] "COLD TEMPERATURES"
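
One caveat when reading this listing: subsetting a data frame does not drop unused factor levels, so levels() can still report labels that no longer occur in the retained rows. A minimal sketch with made-up toy data (the object names toy and recent are assumptions) shows the effect and how droplevels() removes the unused labels:

```r
## toy data, not taken from the storm dataset
toy <- data.frame(
  EVTYPE = factor(c("TORNADO", "FLOOD", "HAIL")),
  YEAR   = c(1995, 2003, 2008)
)

## keep the rows from 2000 onward; the factor still carries all three levels
recent <- toy[toy$YEAR >= 2000, ]
levels.before <- levels(recent$EVTYPE)

## droplevels() keeps only the levels that actually occur
recent$EVTYPE <- droplevels(recent$EVTYPE)
levels.after <- levels(recent$EVTYPE)
```

This is why counting distinct labels with unique() on the values, as done below, reflects the subset, whereas levels() reflects the full dataset.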

We noticed that some events were reported under separate labels that differ only in case (e.g. “BLACK ICE” vs. “Black Ice”), in singular vs. plural form (e.g. “BRUSH FIRE” vs. “BRUSH FIRES”), or in both (e.g. “Cold Temperature” vs. “COLD TEMPERATURES”).

We then cleaned up those labeling differences in the dataset that could readily be resolved by scripting.

length(unique(my.data.subset$EVTYPE))
## [1] 196
## plural vs. singular problem: relabel plural forms with their singular form
my.res <- toupper(my.data.subset$EVTYPE)
my.evt.types <- unique(my.res)

for (i in seq_along(my.evt.types)) {
  my.plural <- paste(my.evt.types[i],"S",sep="")

  if (my.plural %in% my.evt.types) {
    my.pl.idx <- which(my.res %in% my.plural)
    my.res[my.pl.idx] <- my.evt.types[i]
  }
}

my.evt.types <- unique(my.res)
length(my.evt.types)
## [1] 190
## update event types
my.data.subset$EVTYPE <- as.factor(my.res)
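
As an aside, the label clean-up can be sketched as a single vectorized helper. This is a sketch under stated assumptions rather than the code used for this report: the name normalize_evtype and the toy labels are hypothetical, and the sketch additionally trims and collapses whitespace, which the code above does not do.

```r
## hypothetical helper (not part of the original analysis)
normalize_evtype <- function(labels) {
  labels <- toupper(as.character(labels))
  labels <- gsub("^\\s+|\\s+$", "", labels)   # trim leading/trailing whitespace
  labels <- gsub("\\s+", " ", labels)          # collapse repeated spaces
  ## relabel a plural only when its singular form is also present
  singular <- sub("S$", "", labels)
  is.plural <- singular != labels & singular %in% labels
  labels[is.plural] <- singular[is.plural]
  labels
}

## toy labels modeled on the kinds of variants listed above
toy.labels <- c(" COASTAL FLOOD", "Coastal Flood", "BRUSH FIRE", "BRUSH FIRES",
                "COASTAL  FLOODING/EROSION", "FLOODS")
clean.labels <- normalize_evtype(toy.labels)
```

In this toy example "FLOODS" is left unchanged because no plain "FLOOD" label is present, which mirrors the conservative rule used in the loop above.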

Results

To obtain the most complete possible view of the phenomena under study, we decided to focus on events that occurred from 2000 onward (years 2000-2011). This also has the added benefit of providing the most up-to-date view of the impact of abnormal weather events on people and on the economy.

Weather events in the USA and impact on persons (2000-2011)

To identify the weather events with the greatest impact on population health in the USA, we computed the percentage of all fatalities and injuries reported in conjunction with each type of weather event.

tot.fatalities <- sum(my.data.subset$FATALITIES)
tot.injuries <- sum(my.data.subset$INJURIES)

ev.fatalities <- aggregate(FATALITIES ~ EVTYPE, my.data.subset, sum)
ev.injuries <- aggregate(INJURIES ~ EVTYPE, my.data.subset, sum)

rat.ev.fatalities <- 100 * ev.fatalities$FATALITIES/tot.fatalities
rat.ev.injuries <- 100 * ev.injuries$INJURIES/tot.injuries

# obtain sorting indices for impact, with largest first
sort.fat <- sort(rat.ev.fatalities,index.return = T, decreasing = T)
sort.inj <- sort(rat.ev.injuries,index.return = T, decreasing = T)
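
As an aside, the percentage computation can be sketched more compactly with tapply(). The toy data frame below is made up for illustration (its values are not from the storm dataset), and the object names are assumptions.

```r
## toy data, not taken from the storm dataset
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(6, 1, 2, 0, 1)
)

## total per event type, expressed as a percentage of the overall total
totals <- tapply(toy$FATALITIES, toy$EVTYPE, sum)
shares <- sort(100 * totals / sum(totals), decreasing = TRUE)

## keep the two largest contributors (the report keeps the top 15)
top.shares <- head(shares, 2)
```

Because the result is a named vector, the event labels travel with the percentages, so no separate sorting index is needed when plotting.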

We then plot the top 15 types of weather events accounting for the most fatalities or injuries:

par(mfrow=c(1,1))
par(mfrow=c(1,2))
par(mar=c(4,10, 2, 2), oma=c(0.5,6,0.5,0.5))
barplot(rev(rat.ev.fatalities[sort.fat$ix[1:15]]),
        names=tolower(rev(ev.fatalities$EVTYPE[sort.fat$ix[1:15]])),
        xlab = "Percent total deaths (years 2000-2011)",
        horiz=T,las=2,col=rainbow(15), xlim=c(0,25), main = "Deaths", cex.lab= 0.8)
box(col = "black")

barplot(rev(rat.ev.injuries[sort.inj$ix[1:15]]),
        names=tolower(rev(ev.injuries$EVTYPE[sort.inj$ix[1:15]])),
        xlab = "Percent total injured (years 2000-2011)",
        horiz=T,las=2,col=rainbow(15), xlim=c(0,50), main = "Injuries", cex.lab= 0.8)
box(col = "black")

Fig. 1: Top 15 events most responsible for deaths and injuries in the USA (2000-2011)

Based on this analysis, we notice that tornados and excessive heat have been the two types of weather events that caused the most fatalities and the most injuries in the USA from 2000 to 2011.

Weather events in the United States and economic consequences (2000-2011)

To identify the weather events with the greatest impact on the economy, we computed the percentage of all property and crop damage reported in conjunction with each type of weather event.

tot.propdam <- sum(my.data.subset$PROPDMG)
tot.cropdam <- sum(my.data.subset$CROPDMG)

ev.propdam <- aggregate(PROPDMG ~ EVTYPE, my.data.subset, sum)
ev.cropdam <- aggregate(CROPDMG ~ EVTYPE, my.data.subset, sum)

rat.ev.propdam <- 100 * ev.propdam$PROPDMG/tot.propdam
rat.ev.cropdam <- 100 * ev.cropdam$CROPDMG/tot.cropdam

# obtain sorting indices for impact, with largest first
sort.prop <- sort(rat.ev.propdam,index.return = T, decreasing = T)
sort.crop <- sort(rat.ev.cropdam ,index.return = T, decreasing = T)

We then plot the top 15 types of weather events responsible for the most property and crop damage.

par(mfrow=c(1,1))
par(mfrow=c(1,2))
par(mar=c(4,10, 2, 2), oma=c(0.5,6,0.5,0.5))
barplot(rev(rat.ev.propdam[sort.prop$ix[1:15]]),
        names=tolower(rev(ev.propdam$EVTYPE[sort.prop$ix[1:15]])),
        xlab = "Percent total damage (years 2000-2011)",
        horiz=T,las=2,col=heat.colors(15), xlim=c(0,50), main = "Property damage", cex.lab= 0.8)
box(col = "black")

barplot(rev(rat.ev.cropdam[sort.crop$ix[1:15]]),
        names=tolower(rev(ev.cropdam$EVTYPE[sort.crop$ix[1:15]])),
        xlab = "Percent total damage (years 2000-2011)",
        horiz=T,las=2,col=heat.colors(15), xlim=c(0,50), main = "Crop damage", cex.lab= 0.8)
box(col = "black")

Fig. 2: Top 15 events most responsible for property and crop damage in the USA (2000-2011)

Based on this analysis, we notice that floods have major economic consequences. Indeed, they have been the leading cause of property damage and the second leading cause of crop damage in the USA from 2000 to 2011. In addition, drought has been the leading cause of crop damage over these years.

#### count and identify common events in the top 15 that have been causing property and crop damage
ev.type.damage <- intersect(rev(ev.propdam$EVTYPE[sort.prop$ix[1:15]]),
                            rev(ev.cropdam$EVTYPE[sort.crop$ix[1:15]]))
ev.type.damage.nb <- length(ev.type.damage)

Of the top 15 event types that caused the most property damage and the top 15 that caused the most crop damage in the USA from 2000 to 2011, 11 are common to both lists. These costly events are listed below:

ev.type.damage
##  [1] "TSTM WIND"         "HURRICANE"         "THUNDERSTORM WIND"
##  [4] "WILDFIRE"          "HIGH WIND"         "TROPICAL STORM"   
##  [7] "FLASH FLOOD"       "HAIL"              "TORNADO"          
## [10] "HURRICANE/TYPHOON" "FLOOD"

Interestingly, tornados, which were the undisputed leading cause of fatalities and injuries in the USA from 2000 to 2011 (see above), are also found among the top 15 events responsible for major property and crop damage. Tornados rank 4th in terms of property damage and 15th in terms of crop damage.
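
As an aside, the overlap between the two top lists and the rank of a given event can be sketched as follows. The percentages below are made up for illustration and do not reproduce the values in Fig. 2; all object names are assumptions.

```r
## made-up shares (in percent), not the values from this report
toy.prop <- c(FLOOD = 40, HURRICANE = 30, TORNADO = 20, HAIL = 10)
toy.crop <- c(DROUGHT = 50, FLOOD = 25, HAIL = 15, TORNADO = 10)

## event labels ordered from largest to smallest share
ranked.prop <- names(sort(toy.prop, decreasing = TRUE))
ranked.crop <- names(sort(toy.crop, decreasing = TRUE))

## events present in both top-3 lists (the report uses the top 15)
common <- intersect(ranked.prop[1:3], ranked.crop[1:3])

## position of tornados in each ranking
rank.prop <- match("TORNADO", ranked.prop)
rank.crop <- match("TORNADO", ranked.crop)
```

match() returns NA when the label is absent from a ranking, which makes a missing event easy to detect.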

Pattern of reported tornado frequency in the USA (1950-2011)

To evaluate the impact of tornados, which have been a major source of human and property damage between 2000 and 2011, we next investigated the frequency of reported weather events and of reported tornados over the entire dataset.

my.storm.data$YEAR <- my.evt.year

## sort the years so that they line up with the (sorted) output of rowsum() below
my.years.all <- sort(unique(my.evt.year))
my.tornado.per.year.all <- rep(0,length(my.years.all))
for (i in seq_along(my.years.all)) {
  my.tornados.of.the.year <- which(my.storm.data$EVTYPE %in% "TORNADO" &
                                   my.evt.year %in% my.years.all[i])
  my.tornado.per.year.all[i] <- length(my.tornados.of.the.year)
}

## number of reported events (all types) per year
my.evs.per.year <- rowsum(rep(1,length(my.storm.data$YEAR)),my.storm.data$YEAR)

par(mfrow=c(1,2))
par(mar=c(4,4,2, 2), oma=c(2,2,2,2))
plot(as.numeric(my.years.all),as.numeric(my.evs.per.year),ylab= "Number of reported events per year",
      xlab= "Year", pch=16,col="slateblue",type='b',lwd=0.5, cex.axis = 0.6,
     main = "Weather events")

plot(as.numeric(my.years.all),my.tornado.per.year.all,ylab= "Number of reported Tornados per year",
      xlab= "Year", pch=16,col="deeppink",type='b',lwd=0.5, cex.axis = 0.6,
     main = "Tornado events")

Fig. 3: Evolution of reported weather events and tornado events in the USA (1950-2011)
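
As an aside, the per-year counts shown in Fig. 3 can also be sketched with table(), which returns the counts already ordered by year. The toy data frame is made up for illustration, and the object names are assumptions.

```r
## toy data, not taken from the storm dataset
toy.events <- data.frame(
  EVTYPE = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "TORNADO"),
  YEAR   = c(1950, 1950, 1951, 1951, 1951)
)

## all reported events per year
events.per.year <- table(toy.events$YEAR)

## tornados per year, using the same set of years so that empty years count as zero
tornado.years <- toy.events$YEAR[toy.events$EVTYPE == "TORNADO"]
tornado.per.year <- table(factor(tornado.years, levels = names(events.per.year)))

## numeric years for the x axis of a plot
years <- as.numeric(names(events.per.year))
```

Fixing the factor levels to the full set of years keeps both count vectors the same length, so they can be plotted against the same x axis.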

We can see that both the number of reported weather events and the number of reported tornado events increased steadily between 1950 and 2011. The increase in reported events is probably due to: i) better record keeping over the years and ii) to some extent, possible climate change inducing more catastrophic events. Thus, although reported tornado events have increased over the years, it is difficult to say whether this reflects a real upward trend in tornado occurrence over the entire time period. However, since we do see an upward trend in the number of reported tornados even in the most recent years, when event tracking is most accurate, it is a possibility to consider and potentially plan for, given our finding that tornados have been a major source of human and property damage.

R session information

sessionInfo()
## R version 3.1.1 (2014-07-10)
## Platform: x86_64-apple-darwin10.8.0 (64-bit)
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] knitr_1.7
## 
## loaded via a namespace (and not attached):
## [1] digest_0.6.4   evaluate_0.5.5 formatR_1.0    stringr_0.6.2 
## [5] tools_3.1.1