Tornadoes, costly in lives and injuries; floods, costly in property damage

Synopsis

Data from the National Weather Service of the United States were used to determine which kinds of meteorological events are the most harmful in terms of lives and injuries, and which are the most costly from an economic point of view.

The data come from the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States. However, the data have some quality issues that must be addressed before drawing conclusions from them.

One of the key variables is EVTYPE, which classifies the type of weather event. It is recorded as reported by the meteorological offices across the country and is not normalized: the labels are taken “as is”. Therefore, this variable has to be analyzed and normalized before drawing conclusions from it.

We use the EVTYPE variable to identify which meteorological events are the most costly in injuries and fatalities (reported in the dataset as INJURIES and FATALITIES, respectively) and from the economic point of view (using the variables PROPDMG and CROPDMG).

In each case, a bar plot ranks the event types by the magnitude of their totals.

Data Processing

First, we load the data to be analyzed:

data <- read.csv(bzfile("repdata_data_StormData.csv.bz2"))

The same type of event is sometimes recorded under different labels, so the data have to be regrouped and some labels corrected. For example, FLASH FLOOD, FLASH FLOOD/FLOOD, FLASH FLOODING, FLASH FLOODING/FLOOD and FLASH FLOODS all represent the same type of event.

First, to remove differences in letter case, all labels are converted to capital letters. The result is stored in a new variable, EVTYPE2, and the original variable is kept for comparison.

I then made the (apparently) obvious spelling corrections and grouped the most significant types of events.
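A minimal sketch of these two steps on a toy vector of labels: the first five are the flash-flood variants quoted above, while the mixed-case "Flash Flood" and the "TORNADO" entry are made up for illustration. Upper-casing merges labels that differ only in case, and a single grep() pattern then collapses the spelling variants into one group.

```r
# Toy labels: the five flash-flood variants quoted above, plus a
# hypothetical mixed-case entry and an unrelated event
raw_labels <- c("FLASH FLOOD", "FLASH FLOOD/FLOOD", "FLASH FLOODING",
                "FLASH FLOODING/FLOOD", "FLASH FLOODS", "Flash Flood", "TORNADO")

# Step 1: a single letter case, so "Flash Flood" equals "FLASH FLOOD"
norm_labels <- toupper(raw_labels)
length(unique(raw_labels))   # 7
length(unique(norm_labels))  # 6

# Step 2: one pattern maps every flood variant to a single group
norm_labels[grep("FLOOD", norm_labels)] <- "FLOOD"
table(norm_labels)           # FLOOD 6, TORNADO 1
```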

data$EVTYPE2 <- toupper(as.character(data$EVTYPE))

# Fix (apparently) obvious misspellings
data$EVTYPE2[which(data$EVTYPE2 == "AVALANCE")] <- "AVALANCHE"
data$EVTYPE2[which(data$EVTYPE2 %in% c("BEACH EROSIN","BEACH EROSION"))] <- "BEACH EROSION" 
data$EVTYPE2[which(data$EVTYPE2 == "BLOW-OUT TIDE")] <- "BLOW-OUT TIDES"
data$EVTYPE2[which(data$EVTYPE2 == "BRUSH FIRE")] <- "BRUSH FIRES"
data$EVTYPE2[which(data$EVTYPE2 == "COASTALSTORM")] <- "COASTAL STORM"
data$EVTYPE2[which(data$EVTYPE2 == "COLD AIR FUNNEL")] <- "COLD AIR FUNNELS"
data$EVTYPE2[which(data$EVTYPE2 %in% c("GUSTNADO", "GUSTNADO AND"))] <- "GUSTNADO"
data$EVTYPE2[which(data$EVTYPE2 == "DUSTSTORM")] <- "DUST STORM"
data$EVTYPE2[which(data$EVTYPE2 %in% c("DUST DEVEL", "DUST DEVIL", "DUST DEVIL WATERSPOUT"))] <- "DUST DEVIL"
data$EVTYPE2[which(data$EVTYPE2 == "TORNDAO")] <- "TORNADO"
data$EVTYPE2[which(data$EVTYPE2 == "WET MICOBURST")] <- "WET MICROBURST"

# Redefine groups. Each rule is applied to the labels produced by the rules
# before it, so the order of the rules matters.

data$EVTYPE2[grep("THUND|TSTM", data$EVTYPE2)] <- "THUNDERSTORM WINDS"
# Coastal floods are labelled before the generic FLOOD rule, which would
# otherwise absorb them
data$EVTYPE2[grep("COASTAL *FLOOD|CSTL FLOOD", data$EVTYPE2)] <- "COASTAL FLOOD"
data$EVTYPE2[grepl("FLOOD", data$EVTYPE2) & data$EVTYPE2 != "COASTAL FLOOD"] <- "FLOOD"
data$EVTYPE2[grep("LIGHTNING", data$EVTYPE2)] <- "LIGHTNING"
data$EVTYPE2[grep("HIGH +WIND", data$EVTYPE2)] <- "HIGH WINDS"
data$EVTYPE2[grep("HAIL", data$EVTYPE2)] <- "HAIL"
data$EVTYPE2[grep("HEAVY SNOW", data$EVTYPE2)] <- "HEAVY SNOW"
data$EVTYPE2[grep("BLOWING +SNOW", data$EVTYPE2)] <- "BLOWING SNOW"
data$EVTYPE2[grep("WIND *CHILL", data$EVTYPE2)] <- "WIND CHILL"
data$EVTYPE2[grep("FREEZ|FROST", data$EVTYPE2)] <- "FREEZING"
data$EVTYPE2[grep("FUNNEL", data$EVTYPE2)] <- "FUNNEL"
data$EVTYPE2[grep("GUSTY", data$EVTYPE2)] <- "GUSTY"
data$EVTYPE2[grep("COLD", data$EVTYPE2)] <- "COLD"
data$EVTYPE2[grep("DRY", data$EVTYPE2)] <- "DRY"
data$EVTYPE2[grep("TORNADO", data$EVTYPE2)] <- "TORNADO"
data$EVTYPE2[grep("SUMMARY", data$EVTYPE2)] <- "SUMMARY"
data$EVTYPE2[grep("HIGH SURF", data$EVTYPE2)] <- "HIGH SURF"
# Remaining wind events; the wind groups defined above are left untouched
wind.groups <- c("THUNDERSTORM WINDS", "HIGH WINDS", "WIND CHILL")
data$EVTYPE2[grepl("WND|WIND", data$EVTYPE2) & !(data$EVTYPE2 %in% wind.groups)] <- "WIND"
data$EVTYPE2[grep("WATER *SPOUT", data$EVTYPE2)] <- "WATER SPOUT"
data$EVTYPE2[grep("MUD *SLIDE", data$EVTYPE2)] <- "MUD SLIDE"
data$EVTYPE2[grep("HURRICANE", data$EVTYPE2)] <- "HURRICANE"
data$EVTYPE2[grep("SLEET", data$EVTYPE2)] <- "SLEET"
data$EVTYPE2[grep("HEAT", data$EVTYPE2)] <- "HEAT"
data$EVTYPE2[grep("TROPICAL STORM", data$EVTYPE2)] <- "TROPICAL STORM"
data$EVTYPE2[grep("WINTER", data$EVTYPE2)] <- "WINTER WEATHER"
data$EVTYPE2[grep("WILD *FIRE|WILD/FOREST FIRE", data$EVTYPE2)] <- "WILD FIRES"
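Because each rule above is applied to the labels produced by the rules before it, their order matters: a broad pattern can also match a group label created by an earlier rule. The sketch below uses hypothetical labels to show a "WIND" rule absorbing a "HIGH WINDS" group, and one way to protect a group that is already defined by excluding its label from the broad rule.

```r
# Hypothetical raw labels
x <- c("HIGH WIND", "HIGH WINDS 63", "STRONG WIND", "WND")

# A specific rule first ...
x[grep("HIGH +WIND", x)] <- "HIGH WINDS"

# ... then a broad rule applied naively: "WIND" also matches "HIGH WINDS"
naive <- x
naive[grep("WND|WIND", naive)] <- "WIND"
unique(naive)       # "WIND": the HIGH WINDS group has been absorbed

# Broad rule that skips labels already naming a group
protected <- x
hit <- grepl("WND|WIND", protected) & !(protected %in% "HIGH WINDS")
protected[hit] <- "WIND"
table(protected)    # HIGH WINDS 2, WIND 2
```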

The next step is to estimate the cost of property damage and of crop damage. The variable PROPDMGEXP gives the scale of the amount reported in PROPDMG: B stands for billions (\(10^9\)), M/m for millions (\(10^6\)), K/k for thousands (\(10^3\)), and H/h for hundreds (\(10^2\)). Other codes (digits, +, -, ?) are ignored. The same procedure is applied to crop damage, using CROPDMG and CROPDMGEXP.

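# Scale factor for each exponent code. Codes not listed here (digits, "+",
# "-", "?" and the empty string) contribute nothing to the totals.
multiplier <- c(B = 1e9, M = 1e6, m = 1e6, K = 1e3, k = 1e3, H = 100, h = 100)

# Total damage in US dollars per event type: tabulate the raw amounts by event
# type and exponent code, then weight each column by its scale factor.
# Columns are selected by name, so the result does not depend on the order
# in which R sorts the exponent codes.
dollar_damage <- function(amount, exponent, event) {
    tab <- tapply(amount, list(event, exponent), sum, na.rm = TRUE)
    tab[is.na(tab)] <- 0
    codes <- intersect(colnames(tab), names(multiplier))
    drop(tab[, codes, drop = FALSE] %*% multiplier[codes])
}

totalpropdmg <- dollar_damage(data$PROPDMG, data$PROPDMGEXP, data$EVTYPE2)
totalcropdmg <- dollar_damage(data$CROPDMG, data$CROPDMGEXP, data$EVTYPE2)
totaldmg <- totalcropdmg + totalpropdmg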
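A worked toy example of the scaling rule, with amounts and codes made up for illustration: each amount is multiplied by the factor its exponent code stands for, and an unrecognised code contributes nothing, as in the computation above.

```r
# Scale factors for the exponent codes described in the text
scale_factor <- c(B = 1e9, M = 1e6, m = 1e6, K = 1e3, k = 1e3, H = 100, h = 100)

# Hypothetical reports: amounts and their exponent codes
value    <- c(2.5, 10, 3, 7)
exponent <- c("B", "K", "m", "?")   # "?" is not a recognised code

mult <- unname(scale_factor[exponent])  # NA for the unrecognised code
mult[is.na(mult)] <- 0                  # unrecognised codes contribute nothing
dollars <- value * mult
dollars       # 2.5e9, 1e4, 3e6 and 0 dollars
sum(dollars)  # equals 2503010000
```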

Results

The following graphs show the human cost of each type of event, considering only the 20 event types with the largest totals. Tornadoes are clearly the most dangerous events in terms of human life, and they stand out in both injuries and fatalities.

# One wide panel on top (combined total), two panels below
layout(matrix(c(1,1,2,3),nrow=2,byrow=TRUE))
barplot(sort(tapply(data$INJURIES + data$FATALITIES,data$EVTYPE2,sum),decreasing = TRUE)[1:20],las=2, main="Total human damage")
barplot(sort(tapply(data$INJURIES,data$EVTYPE2,sum),decreasing = TRUE)[1:20],las=2, main= "Total Injuries by event")
barplot(sort(tapply(data$FATALITIES,data$EVTYPE2,sum),decreasing = TRUE)[1:20],las=2, main="Total Fatalities by event")

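Figure: total human damage (top), total injuries and total fatalities by event (bottom), for the 20 event types with the largest totals.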
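The ranking behind these plots can be sketched on a small made-up data set: sum the casualties (injuries plus fatalities) per event type with tapply(), sort in decreasing order, and keep the first n entries. This sketch uses head() instead of [1:20]; head() simply returns every entry when there are fewer than n event types, whereas [1:20] would pad the result with NA.

```r
# Hypothetical event reports (one row per report)
toy <- data.frame(
  EVTYPE2    = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  INJURIES   = c(10, 2, 5, 0, 1),
  FATALITIES = c(1, 0, 2, 3, 0)
)

# Total casualties (injuries + fatalities) per event type
casualties <- tapply(toy$INJURIES + toy$FATALITIES, toy$EVTYPE2, sum)

# Largest first; head() returns all entries when there are fewer than n
top <- head(sort(casualties, decreasing = TRUE), 20)
top   # TORNADO 18 first, then FLOOD and HEAT with 3 each
```

The resulting named vector can be passed to barplot() exactly as in the code above.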

From the economic point of view, floods account for a large share of property damage and contribute significantly to total material damage. For crop damage, however, floods rank second, behind drought.

# One wide panel on top (combined total), two panels below
layout(matrix(c(1,1,2,3),nrow=2,byrow=TRUE))
barplot(sort(totaldmg,decreasing = TRUE)[1:20],las=2, main = "Total material damage")
barplot(sort(totalpropdmg,decreasing = TRUE)[1:20],las=2, main ="Total property damage")
barplot(sort(totalcropdmg,decreasing = TRUE)[1:20],las=2, main = "Total crop damage")

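Figure: total material damage (top), total property damage and total crop damage (bottom), for the 20 event types with the largest totals.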
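To put a number on statements such as "floods take much of the property damage", each event type's share of the total can be computed from totalpropdmg, totalcropdmg or totaldmg. A sketch with made-up figures (these are not results from the storm data):

```r
# Hypothetical damage totals in US dollars, named by event type
damage <- c(HAIL = 5e9, TORNADO = 10e9, FLOOD = 60e9, HURRICANE = 25e9)

# Percentage of the overall total contributed by each event type
share <- round(100 * damage / sum(damage), 1)
sort(share, decreasing = TRUE)   # FLOOD 60, HURRICANE 25, TORNADO 10, HAIL 5
```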