Synopsis:

Using data from the NOAA (National Oceanic and Atmospheric Administration) storm database, this report identifies the weather event types associated with the greatest harm to human health and to the economy. Harm to human health is measured by the recorded number of fatalities and injuries associated with a weather event. Economic damage is measured as the sum of property damage and crop damage. Fatalities, injuries, and economic damage are reported separately. The results show that tornadoes are associated with the highest number of fatalities and also with the highest number of injuries. Floods are associated with the highest economic damage.

Data Processing:

Downloading and importing the data:

dataURL <-"https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
      
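# Create the working directory only if it does not exist yet
if (!dir.exists("./RRA2")) dir.create("./RRA2")

# Download the compressed file only once
if (!file.exists("./RRA2/data.bz2")) download.file(dataURL, "./RRA2/data.bz2")

# read.csv() reads .bz2 files directly and already returns a data frame
data <- read.csv("./RRA2/data.bz2")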

Preparing and Subsetting the Data

Since we only need fatalities, injuries, property damage, and crop damage, we first look at the variables contained in this data set and then subset it.

Let’s look at the names of the columns we have:

names(data)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

Reading the output and using the associated documentation, the columns we require are:

* EVTYPE - the type of event
* FATALITIES - the number of human fatalities for that event
* INJURIES - the number of human injuries for that event
* PROPDMG & PROPDMGEXP - the property damage figure and its magnitude code; once the code is converted to a multiplier, their product is the estimated property damage in USD
* CROPDMG & CROPDMGEXP - the crop damage figure and its magnitude code; once the code is converted to a multiplier, their product is the estimated crop damage in USD

Let's subset the data into a smaller data frame using only the variables we need:

req.variables <-c("EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG",
                  "CROPDMGEXP")
data <- data[req.variables]

The variables with the suffix “EXP” hold character codes for the magnitude of the damage (for example, "K" for thousands and "M" for millions). They need to be converted to numeric multipliers so that they can be multiplied by the damage figures (PROPDMG and CROPDMG) to get the total amount of damage.

data$PROPEXP[data$PROPDMGEXP ==  "K"] <- 1000
data$PROPEXP[data$PROPDMGEXP == "M"] <- 10^6
data$PROPEXP[data$PROPDMGEXP == ""] <- 1
data$PROPEXP[data$PROPDMGEXP == "B"] <- 10^9
data$PROPEXP[data$PROPDMGEXP == "m"] <- 10^6
data$PROPEXP[data$PROPDMGEXP == "+"] <- 0
data$PROPEXP[data$PROPDMGEXP == "0"] <- 1
data$PROPEXP[data$PROPDMGEXP == "5"] <- 10^5
data$PROPEXP[data$PROPDMGEXP == "6"] <- 10^6
data$PROPEXP[data$PROPDMGEXP == "?"] <- 0
data$PROPEXP[data$PROPDMGEXP == "4"] <- 10000
data$PROPEXP[data$PROPDMGEXP == "2"] <- 100
data$PROPEXP[data$PROPDMGEXP == "3"] <- 1000
data$PROPEXP[data$PROPDMGEXP == "h"] <- 100
data$PROPEXP[data$PROPDMGEXP == "7"] <- 10^7
data$PROPEXP[data$PROPDMGEXP == "H"] <- 100
data$PROPEXP[data$PROPDMGEXP == "-"] <- 0
data$PROPEXP[data$PROPDMGEXP == "1"] <- 10
data$PROPEXP[data$PROPDMGEXP == "8"] <- 10^8
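The assignments above can also be expressed as a single lookup. The following is only a sketch on a small made-up data frame, not the code used for this report; `toy` and `exp_lookup` are hypothetical names, and the multipliers simply repeat the ones chosen above.

```r
# Toy data frame standing in for the storm data (values are made up).
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 10, 3, 7),
  PROPDMGEXP = c("K", "M", "", "5", "+"),
  stringsAsFactors = FALSE
)

# Named lookup vector: magnitude code -> multiplier (same values as above).
exp_lookup <- c(
  "H" = 1e2, "h" = 1e2, "K" = 1e3, "M" = 1e6, "m" = 1e6, "B" = 1e9,
  "0" = 1,   "1" = 1e1, "2" = 1e2, "3" = 1e3, "4" = 1e4,
  "5" = 1e5, "6" = 1e6, "7" = 1e7, "8" = 1e8,
  "+" = 0,   "-" = 0,   "?" = 0
)

# Index the lookup by code. An empty string never matches a name,
# so the blank code is handled in a separate step.
codes <- as.character(toy$PROPDMGEXP)
toy$PROPEXP <- unname(exp_lookup[codes])
toy$PROPEXP[codes == ""] <- 1

# Damage value = damage figure * multiplier
toy$PROPDMGVAL <- toy$PROPDMG * toy$PROPEXP
print(toy)
```

With this layout, any code missing from the lookup shows up as NA in PROPEXP, which makes unmapped codes easy to spot.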

Let's create a new variable that multiplies the property damage figure by its magnitude:

data$PROPDMGVAL <- data$PROPDMG*data$PROPEXP

The same is done for the crop data: the magnitude codes are converted to multipliers, which are then multiplied by the crop damage figure to create a new value.

data$CROPEXP[data$CROPDMGEXP == ""] <- 1
data$CROPEXP[data$CROPDMGEXP == "M"] <- 10^6
data$CROPEXP[data$CROPDMGEXP == "K"] <- 1000
data$CROPEXP[data$CROPDMGEXP == "m"] <- 10^6
data$CROPEXP[data$CROPDMGEXP == "B"] <- 10^9
data$CROPEXP[data$CROPDMGEXP == "?"] <- 0
data$CROPEXP[data$CROPDMGEXP == "0"] <- 1
data$CROPEXP[data$CROPDMGEXP == "k"] <- 1000
data$CROPEXP[data$CROPDMGEXP == "2"] <- 100

data$CROPDMGVAL <- data$CROPDMG*data$CROPEXP

Since I want at most three figures in this report, we will plot fatalities, injuries, and economic damage separately, with property and crop damage combined into a single measure. Economic damage is defined as the sum of property damage and crop damage.

Create new values for economic damage:

data$ECDAMAGE <- data$PROPDMGVAL+data$CROPDMGVAL
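Because the total is a plain sum, a missing value in either component makes the total missing as well. The sketch below uses a made-up data frame (`toy` and `n_missing` are hypothetical names) to show one way of checking for this, under the assumption that a magnitude code might have been left without a multiplier.

```r
# Toy data: the NA mimics a magnitude code that received no multiplier.
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "HAIL"),
  PROPDMGVAL = c(1000, NA, 500),
  CROPDMGVAL = c(200, 50, 0)
)
toy$ECDAMAGE <- toy$PROPDMGVAL + toy$CROPDMGVAL

# Count rows whose total is missing before summarising.
n_missing <- sum(is.na(toy$ECDAMAGE))
print(n_missing)

# The formula interface of aggregate() uses na.action = na.omit by default,
# so the row with the missing total is dropped from the sum without a warning.
totals <- aggregate(ECDAMAGE ~ EVTYPE, data = toy, FUN = sum)
print(totals)
```

A count of zero from a check like this indicates that every row contributes to the totals.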

Aggregate the totals by event type, order the event types from highest to lowest total, and keep the top five:

fatal <- aggregate(FATALITIES ~ EVTYPE, data = data, FUN = sum)
injury <- aggregate(INJURIES ~ EVTYPE, data = data, FUN = sum)
ecdamage <- aggregate(ECDAMAGE ~ EVTYPE, data = data, FUN = sum)

fatal5 <- fatal[order(-fatal$FATALITIES),][1:5,]
injury5 <- injury[order(-injury$INJURIES),][1:5,]
ecdamage5 <- ecdamage[order(-ecdamage$ECDAMAGE),][1:5,]
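To make the aggregate-then-rank step concrete, here is a minimal sketch on a made-up data frame. The names `toy`, `by_type`, `ranked`, and `top2` are hypothetical, and `head()` is used here in place of the `[1:5,]` indexing above as an illustration only.

```r
# Toy data: several records per event type (values are made up).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(3, 1, 4, 2, 0, 0)
)

# Sum fatalities within each event type.
by_type <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

# Order from highest to lowest total, then keep the first two rows.
ranked <- by_type[order(-by_type$FATALITIES), ]
top2 <- head(ranked, 2)
print(top2)
```

One difference worth noting: `head(x, 5)` returns fewer rows when fewer than five event types exist, whereas `x[1:5, ]` pads the result with NA rows.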

Plot three panels showing fatalities, injuries, and economic damage:

par(mfrow=c(1, 3), mar=c(10, 4, 3, 2), mgp=c(3, 1, 0), las=3,cex=0.8, oma=c(0,0,2,0))
barplot(fatal5$FATALITIES, names.arg=fatal5$EVTYPE,ylim= c(0,7000),
            col=heat.colors(5),ylab="Total Fatalities",
            main="Fatalities")
barplot((injury5$INJURIES)/(1000), names.arg=injury5$EVTYPE,ylim=c(0,100),
        col=heat.colors(5), ylab="Total Injuries in Thousands",
        main="Injuries")
barplot((ecdamage5$ECDAMAGE)/(10^9), names.arg=ecdamage5$EVTYPE, ylim=c(0,170),
        col=heat.colors(5), ylab="USD in Billions",
        main="Economic Damage")
title("Top 5 Events with Highest Fatalities, Injuries, and Economic Damage",
      outer=TRUE)

Results: