Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This report analyzes data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

From this analysis, I found that the weather event type causing the most fatalities across the United States from 1996 to 2011 was EXCESSIVE HEAT. Another notable finding regarding population health is that injuries were caused predominantly by TORNADO. On the economic side, the event type with the greatest consequences over the same period was HURRICANE/TYPHOON. I also found that crop damage plays a minor role compared to property damage in terms of total cost.

Data Processing

For this analysis I downloaded the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database (last accessed on Nov 15, 2014). I read the CSV data directly through a bzfile() connection, so the compressed source file does not need to be unzipped first.

data <- read.csv(bzfile("repdata-data-StormData.csv.bz2"), stringsAsFactors=FALSE)
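The bzfile() approach above can be tried on a tiny file first. This sketch writes a hypothetical three-line CSV through a bzip2-compressing connection to a temporary file and reads it back with read.csv(); the column values are made up for illustration only.

```r
# Hypothetical toy data standing in for the real NOAA file.
tmp <- tempfile(fileext = ".csv.bz2")

# Write a tiny CSV through a bzip2-compressing connection.
con <- bzfile(tmp, "w")
writeLines(c("BGN_DATE,EVTYPE,FATALITIES",
             "4/18/1950 0:00:00,TORNADO,0",
             "1/6/1996 0:00:00,FLOOD,1"), con)
close(con)

# read.csv() opens the unopened connection itself, decompresses on the fly,
# and closes it when done.
toy <- read.csv(bzfile(tmp), stringsAsFactors = FALSE)
str(toy)
unlink(tmp)
```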

I create the data2 subset from the original data by dropping unused columns, then remove the original data frame to save memory.

data2 <- data[c('BGN_DATE','EVTYPE','FATALITIES','INJURIES','PROPDMG','PROPDMGEXP','CROPDMG','CROPDMGEXP')]

rm(data)

I convert the BGN_DATE string column into a Date object, then extract the year as an integer and save it back in the BGN_DATE column.

data2$BGN_DATE <- as.Date(data2$BGN_DATE, "%m/%d/%Y")
data2$BGN_DATE <- as.integer(format(data2$BGN_DATE, "%Y"))
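The two-step conversion above can be checked on a few strings. This sketch assumes BGN_DATE values look like "4/18/1950 0:00:00" (a date followed by a time); the sample values are hypothetical. as.Date() parses the leading "%m/%d/%Y" part and ignores the trailing time.

```r
# Hypothetical BGN_DATE strings (date followed by a time of day).
bgn <- c("4/18/1950 0:00:00", "1/6/1996 0:00:00", "11/30/2011 0:00:00")

# Parse the date part; characters after the format are ignored.
dates <- as.Date(bgn, "%m/%d/%Y")

# format(..., "%Y") yields the year as a string; as.integer() makes it numeric.
years <- as.integer(format(dates, "%Y"))
years   # 1950 1996 2011
```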

My approach to cleaning up EVTYPE starts with simple rules, such as converting strings to upper case. I did notice many typos and non-standard variants (e.g. “STRONG WIND” vs “STRONG WINDS”, “MUD SLIDE” vs “MUDSLIDE”, “ICE ROADS” vs “ICY ROADS”, “WINTER WEATHER” vs “WINTER WEATHER/MIX”, “TSTM WIND” vs “TSTM WIND (G45)”, and combined event types such as “HEAVY SURF/HIGH SURF”). However, they do not significantly affect the main goal of finding the event types most harmful to population health and with the greatest economic consequences, so I did not spend further effort on them.
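To illustrate why upper-casing alone leaves such variants as separate event types, here is a small sketch on hypothetical values. The normalize() helper is hypothetical and is not part of this analysis; it only trims and collapses whitespace, so plural forms and suffixes such as "(G45)" still stay distinct.

```r
# Hypothetical raw EVTYPE values.
raw <- c("Strong Wind", "STRONG WIND ", "STRONG WINDS", "  tstm wind", "TSTM WIND (G45)")

# The only rule applied in this report: upper-case.
length(unique(toupper(raw)))   # 5

# Hypothetical slightly stronger rule: upper-case, trim, collapse inner whitespace.
normalize <- function(x) {
  x <- trimws(toupper(x))
  gsub("[[:space:]]+", " ", x)
}
length(unique(normalize(raw)))  # 4: "STRONG WINDS" and "TSTM WIND (G45)" remain separate
```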

NOTE: I found one outlier, record 605953, that dramatically affects the economic damage estimates. After checking its associated remarks, I concluded that its magnitude letter should most likely be “M” (millions) rather than “B” (billions).
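One way to surface such an outlier is to compute each record's dollar amount and list the largest ones, so they can be read against their remarks. The sketch below uses three hypothetical records; the `mult` lookup vector is an assumption of the sketch, not part of the original code.

```r
# Hypothetical records; the first has a magnitude letter that looks wrong.
toy <- data.frame(
  PROPDMG    = c(100, 250, 5),
  PROPDMGEXP = c("B", "M", "K"),
  stringsAsFactors = FALSE
)
mult <- c(K = 1e3, M = 1e6, B = 1e9)   # assumed magnitude multipliers

# Dollar amount per record, then records ordered from largest to smallest.
toy$COST <- toy$PROPDMG * unname(mult[toy$PROPDMGEXP])
toy[order(-toy$COST), ]

# After reading the remarks, fix the letter by row and column, then recompute.
toy[1, "PROPDMGEXP"] <- "M"
toy$COST <- toy$PROPDMG * unname(mult[toy$PROPDMGEXP])
which.max(toy$COST)   # 2
```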

data2$EVTYPE <- toupper(data2$EVTYPE)
data2[605953, "PROPDMGEXP"] <- "M"

Then I restrict the data used in this analysis to events from 1996 through 2011, the most recent 16 years in the database. This gives enough reliable and complete data to plan for the future.

I also filter out all events that caused no deaths or injuries and save the result in a new health data frame. In the same way, I create an economic data frame that includes only records with non-zero damage.

NOTE: The data2 data frame is not needed after this point of the analysis, so I remove it to save more memory.

health <- data2[with(data2, which(BGN_DATE>=1996 & (INJURIES>0 | FATALITIES>0))),]
economic <- data2[with(data2, which(BGN_DATE>=1996 & (PROPDMG>0 | CROPDMG>0))),]

rm(data2)
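The same year-and-outcome filters can also be written with base R's subset(), which evaluates the condition inside the data frame, so no attach() is needed. This sketch runs on hypothetical toy records in which BGN_DATE already holds the year; like which(), subset() drops rows where the condition is NA.

```r
# Hypothetical toy records; BGN_DATE already holds the year as an integer.
toy <- data.frame(
  BGN_DATE   = c(1995L, 1996L, 2005L, 2011L),
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL", "EXCESSIVE HEAT"),
  FATALITIES = c(2, 0, 0, 3),
  INJURIES   = c(5, 0, 1, 0),
  PROPDMG    = c(10, 20, 0, 0),
  CROPDMG    = c(0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Casualty events and damage events from 1996 onward.
health.toy   <- subset(toy, BGN_DATE >= 1996 & (INJURIES > 0 | FATALITIES > 0))
economic.toy <- subset(toy, BGN_DATE >= 1996 & (PROPDMG > 0 | CROPDMG > 0))

health.toy$EVTYPE    # "HAIL" "EXCESSIVE HEAT"
economic.toy$EVTYPE  # "FLOOD"
```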

I write a function that calculates the damage cost from the value (rounded to three significant digits) and the alphabetical character that signifies its magnitude: “K” for thousands, “M” for millions, and “B” for billions.

I then apply this function to both the property damage and the crop damage columns, and sum the two results into a new column named COST.

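# Convert a damage value plus its magnitude letter into dollars.
# apply() passes each row as a named character vector, hence as.numeric().
getCost <- function(x, i, j){
    value <- as.numeric(x[i])
    scale <- toupper(x[j])    # treat "k"/"m"/"b" like "K"/"M"/"B"
    
    if (scale == "K") value <- value * 1000
    else if (scale == "M") value <- value * 1000000
    else if (scale == "B") value <- value * 1000000000
    
    return(value)
}

# Refer to columns by name rather than by position.
economic$PROPDMGCOST <- apply(economic, 1, getCost, "PROPDMG", "PROPDMGEXP")
economic$CROPDMGCOST <- apply(economic, 1, getCost, "CROPDMG", "CROPDMGEXP")
economic$COST <- economic$PROPDMGCOST + economic$CROPDMGCOST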
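Because apply() converts the whole data frame to a character matrix row by row, a vectorised lookup is a common alternative. The sketch below is a hypothetical toCost() helper, not the function used above: it looks each magnitude letter up in a named vector of multipliers and, like getCost(), leaves the value unscaled when the letter is blank or unrecognised.

```r
# Hypothetical vectorised alternative to getCost().
toCost <- function(value, scale) {
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  m <- unname(multipliers[toupper(scale)])  # NA for "" or unknown letters
  m[is.na(m)] <- 1                          # leave such values unscaled
  value * m
}

# Worked example: 25K, 1.5M, 3B and an unscaled value.
toCost(c(25, 1.5, 3, 10), c("K", "M", "B", ""))
```

Used on the real data it would read, for instance, `toCost(economic$PROPDMG, economic$PROPDMGEXP)`.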

Analysis

I split the health data frame into subsets by event type and compute the total fatalities and injuries for each subset. I then sort these totals in descending order of fatalities, using injuries to break ties. Finally, I keep the top 5 records and summarise all remaining events in one extra record tagged OTHERS.

health.totals <- aggregate(cbind(FATALITIES, INJURIES)~EVTYPE, data=health,  FUN=sum)

top5.health <- head(health.totals[order(-health.totals$FATALITIES, -health.totals$INJURIES),], 5)

newRow <- data.frame(EVTYPE="OTHERS", 
                     FATALITIES=sum(health$FATALITIES)-sum(top5.health$FATALITIES), 
                     INJURIES=sum(health$INJURIES)-sum(top5.health$INJURIES),
                     stringsAsFactors=FALSE)

top5.health <- rbind(top5.health, newRow)
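The "top N plus OTHERS" step is used twice in this report, so it can be wrapped in a small function. This is a hypothetical helper, not part of the original analysis: it assumes the event-type column is named EVTYPE, sorts on one numeric column only (ties keep their input order, unlike the fatalities-then-injuries order above), and fills every numeric column of the OTHERS row with the total minus the top-N sum.

```r
# Hypothetical helper: keep the n largest groups and fold the rest into OTHERS.
topWithOthers <- function(totals, by, n = 5) {
  top <- head(totals[order(-totals[[by]]), ], n)
  numCols <- names(totals)[sapply(totals, is.numeric)]
  others <- data.frame(EVTYPE = "OTHERS", stringsAsFactors = FALSE)
  for (col in numCols) {
    others[[col]] <- sum(totals[[col]]) - sum(top[[col]])
  }
  rbind(top, others)   # columns are matched by name
}

# Hypothetical per-event totals.
toy <- data.frame(EVTYPE     = c("A", "B", "C", "D"),
                  FATALITIES = c(5, 20, 1, 10),
                  INJURIES   = c(50, 100, 3, 7),
                  stringsAsFactors = FALSE)
res <- topWithOthers(toy, "FATALITIES", n = 2)
res   # B, D, then OTHERS with 6 fatalities and 53 injuries
```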

Here I calculate some ratios used in the caption of the population health plot to highlight a few interesting points:

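# Share of all injuries caused by TORNADO (selected by name, not by row position)
tornado.injuries.ratio <- round(100*health.totals[health.totals$EVTYPE=="TORNADO",]$INJURIES/sum(health$INJURIES), 1)
# Share of the health records that are TORNADO events
tornado.count.ratio <- round(100*sum(health$EVTYPE=="TORNADO")/nrow(health), 1)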
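Both ratios are simple percentages over the health records. This sketch computes them on four hypothetical records to make the arithmetic explicit: a share of injuries (a sum over the tornado rows divided by the total) and a share of records (the mean of a logical vector).

```r
# Hypothetical health-style records, one row per event.
toy <- data.frame(EVTYPE   = c("TORNADO", "FLOOD", "HEAT", "LIGHTNING"),
                  INJURIES = c(45, 5, 40, 10),
                  stringsAsFactors = FALSE)

is.tornado <- toy$EVTYPE == "TORNADO"

# Percent of all injuries caused by tornadoes: 45 / 100.
injury.share <- round(100 * sum(toy$INJURIES[is.tornado]) / sum(toy$INJURIES), 1)

# Percent of records that are tornadoes: 1 / 4.
count.share <- round(100 * mean(is.tornado), 1)

c(injury.share, count.share)   # 45 25
```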

Next I split the economic data frame into subsets by event type and compute the total cost for each subset. I then sort these totals in descending order, keep the top 5 records, and summarise all remaining events in one extra record tagged OTHERS.

economic.totals <- aggregate(cbind(PROPDMGCOST,CROPDMGCOST,COST)~EVTYPE, data=economic, FUN=sum)

top5.economic <- head(economic.totals[order(-economic.totals$COST),], 5)

newRow <- data.frame(EVTYPE="OTHERS", 
                     PROPDMGCOST=sum(economic$PROPDMGCOST)-sum(top5.economic$PROPDMGCOST), 
                     CROPDMGCOST=sum(economic$CROPDMGCOST)-sum(top5.economic$CROPDMGCOST), 
                     COST=sum(economic$COST)-sum(top5.economic$COST),
                     stringsAsFactors=FALSE)

top5.economic <- rbind(top5.economic, newRow)

Here I calculate some ratios used in the caption of the economic consequences plot to highlight a few interesting points:

crop.ratio <- round(100*sum(economic$CROPDMGCOST)/sum(economic$COST), 1)
top5.ratio <- 100 - round(100*top5.economic[6,]$COST/sum(economic$COST), 1)
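The two economic ratios can be traced on a hypothetical table that already contains an OTHERS row, so its column sums equal the grand totals. The sketch selects OTHERS by name rather than by row position; the crop share is crop cost over total cost, and the top-N share is 100 minus the OTHERS percentage.

```r
# Hypothetical totals table that already includes the OTHERS row.
toy <- data.frame(EVTYPE      = c("A", "B", "OTHERS"),
                  PROPDMGCOST = c(60, 20, 10),
                  CROPDMGCOST = c(5, 0, 5),
                  stringsAsFactors = FALSE)
toy$COST <- toy$PROPDMGCOST + toy$CROPDMGCOST

total <- sum(toy$COST)                                       # 100
crop.share <- round(100 * sum(toy$CROPDMGCOST) / total, 1)   # 10
others <- toy$COST[toy$EVTYPE == "OTHERS"]                   # 15
top.share <- 100 - round(100 * others / total, 1)            # 85
```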

Results

The following plot presents the results calculated above for population health:

par(mfrow = c(1, 2))

barplot(top5.health$FATALITIES, xaxt="n", space=1, las=2, ylim=c(0,3500),
        ylab="Number of Fatalities", xlab="Severe Weather Events",
        main="Fatalities Caused By Severe Weather Events")

grid(nx=NA, ny=NULL)

text(seq(1.75, 2*nrow(top5.health), by=2), par("usr")[3]-50, 
     srt=30, adj=1, xpd=TRUE, labels=top5.health$EVTYPE, cex=0.65)

barplot(top5.health$INJURIES/10^3, xaxt="n", space=1, las=2, ylim=c(0,60),
        ylab="Thousands of Injuries", xlab="Severe Weather Events",
        main="Injuries Caused By Severe Weather Events")

grid(nx=NA, ny=NULL)

text(seq(1.75, 2*nrow(top5.health), by=2), par("usr")[3]-1, 
     srt=30, adj=1, xpd=TRUE, labels=top5.health$EVTYPE, cex=0.65)
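The label positions seq(1.75, ..., by=2) used above follow from how barplot() lays out bars: with space=1 and the default width of 1, bar k spans from 2k-1 to 2k, so its centre is at 2k-0.5 and each label sits 0.25 to the right of it. barplot() returns these midpoints, so they can also be used directly. The sketch draws hypothetical heights on a null device, so nothing is written to disk.

```r
pdf(NULL)   # null device: drawing is discarded
mids <- barplot(c(30, 20, 10), xaxt = "n", space = 1)

# Same label placement as above: 0.25 to the right of each bar centre.
text(mids + 0.25, par("usr")[3], labels = c("A", "B", "C"),
     srt = 30, adj = 1, xpd = TRUE, cex = 0.65)
invisible(dev.off())

as.vector(mids)   # 1.5 3.5 5.5
```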

[Figure: Fatalities Caused By Severe Weather Events (left) and Injuries Caused By Severe Weather Events (right)]

The top 5 severe weather events most harmful to population health across the United States, 1996-2011. EXCESSIVE HEAT caused the most recorded fatalities, while TORNADO alone caused 35.6% of all recorded injuries, even though tornadoes account for no more than 21% of the events in the health data frame.

The following plot presents the results calculated above for economic consequences:

barplot(top5.economic$COST/10^9, xaxt="n", space=1, las=2, ylim=c(0,100),
        ylab="Billions of Dollars", xlab="Severe Weather Events",
        main="Damage Caused By Severe Weather Events")

grid(nx=NA, ny=NULL)

text(seq(1.75, 2*nrow(top5.economic), by=2), par("usr")[3]-2, 
     srt=30, adj=1, xpd=TRUE, labels=top5.economic$EVTYPE, cex=0.65)

[Figure: Damage Caused By Severe Weather Events]

The top 5 severe weather events with the greatest economic consequences across the United States, 1996-2011. Crop damage estimates have a minor effect compared to property damage estimates, accounting for only 12.1% of the total cost. The top 5 events together cause 66.7% of the total economic damage.