Most Severe Weather Events

Synopsis

In this report, we aim to answer a few questions based on the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database which can be downloaded here

This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The events in the database start in the year 1950 and end in November 2011. The questions are: 1) Across the United States, which types of events are most harmful with respect to population health? 2) Across the United States, which types of events have the greatest economic consequences?

Loading the Data & Data Processing

We first read the data from the CSV file.

sdata1 <- read.csv("repdata-data-StormData.csv.bz2")

After reading the data, we check the dimension of the dataset.

dim(sdata1)
## [1] 902297     37

We will also have a look at the names of the variables and the summary of the data set.

names(sdata1)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

Results

To answer our questions, 7 variables of concern, i.e. EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP.

First question: Which type of events are most harmful to population health?

First we load the plyr package, and we put a higher weight on the number of fatalities based on the mean of injuries over mean of fatalities.

library(plyr)

weightf <- mean(sdata1[,"INJURIES"], na.rm = TRUE)/mean(sdata1[,"FATALITIES"], na.rm =TRUE)
print(weightf)
## [1] 9.278838

Then, we do a transformation on the fatalities, by multiplying the fatalities by the weight.

sdata1[,"fw"] <- sdata1[,"FATALITIES"]*weightf

Next, we sum both weighted fatalities and injuries as a measure of severity to population health.

sdata1[,"sev"] <- sdata1[,"fw"]+sdata1[,"INJURIES"]

Next, we sum the severity across time based on each event and arrange the severity in descending order.

eventsum <- aggregate(sdata1[,"sev"], list(events = sdata1[,"EVTYPE"]), FUN = sum)
eventsum <- arrange(eventsum, -x)

Finally, we answer our question by plotting the bar chart of top 5 events which are most harmful to population health.

par(mar=c(11,7,4,2) , mgp = c(6, 1, 0))
barplot(eventsum$x[1:5], main="Barplot of severity to population health against events", xlab="Events", ylab="Severity", names.arg=eventsum$events[1:5], las=2)

Second question: Which events have the greatest economic consequences?

First we look at the possible property and crop damage exponents.

summary(sdata1[,"PROPDMGEXP"])
##             -      ?      +      0      1      2      3      4      5 
## 465934      1      8      5    216     25     13      4      4     28 
##      6      7      8      B      h      H      K      m      M 
##      4      5      1     40      1      6 424665      7  11330
summary(sdata1[,"CROPDMGEXP"])
##             ?      0      2      B      k      K      m      M 
## 618413      7     19      1      9     21 281832      1   1994

Then, we make the necessary adjustment. For example, “B” stands for billion, and we replace “B” with 9, so that we can use exp(multiplier) later. Anything that is not known will be assigned 0.

multiplier <- function(x){sapply(x, function(x){
                                if(x %in% "B"){9}
                                else if(x %in% c("M","m")){6}
                                else if(x %in% c("K","k")){3}
                                else if(x %in% c("H","h")){2}
                                else if(is.numeric(x)){x}
                                else{0}
})}
                     
sdata1[,"newpde"] <- multiplier(sdata1[,"PROPDMGEXP"])
sdata1[,"newcde"] <- multiplier(sdata1[,"CROPDMGEXP"])

Now we apply the transformation totaldmg = PROPDMG*exp(PROPDMGEXP) + CROPDMG(CROPDMGEXP)

sdata1[,"totaldmg"] <- sdata1[,"PROPDMG"]*exp(sdata1[,"newpde"]) + sdata1[,"CROPDMG"]*exp(sdata1[,"newcde"])

Now we sum the total economic damage based on each event and arrange the total economic damage in descending order.

evdmg <- aggregate(sdata1[,"totaldmg"],list(events = sdata1[,"EVTYPE"]), FUN = sum)
evdmg <- arrange(evdmg, -x)

Finally, we answer our question by plotting a bar chart of the top 5 events which have the greatest economic consequences.

par(mar=c(11,7,4,2) , mgp = c(6, 1, 0))
barplot(evdmg$x[1:5], main="Barplot of total economic cost against events", xlab="Events", ylab="Total Economic Cost", names.arg=evdmg$events[1:5], las=2,)