Data Processing

To load the data into R, the source .bz2 file was downloaded to the working directory. The following line of code reads it into the dataTable variable:

dataTable <- read.csv("repdata_data_StormData.csv.bz2")
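As a quick check that read.csv decompresses bzip2 files transparently, the sketch below writes a tiny made-up table to a temporary .bz2 file and reads it back. The file contents and the names tmp, toy and roundTrip are illustrative assumptions, not part of the storm data.

```r
# Write a small CSV through a bzip2 connection, then read it back.
tmp <- tempfile(fileext = ".csv.bz2")
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(1, 0))

con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path as a file connection, which detects
# bzip2 compression from the file contents when reading.
roundTrip <- read.csv(tmp)
unlink(tmp)
```

No separate decompression step is needed, which is why the compressed file can be passed to read.csv directly.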

Records that report no health or economic impact (no fatalities, injuries, property damage or crop damage) can be removed with the following line of code:

dataTable <- dataTable[(dataTable$FATALITIES != 0 | dataTable$INJURIES != 0 | dataTable$PROPDMG != 0 | dataTable$CROPDMG != 0),]
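In R, & binds more tightly than |, so a condition that mixes the two groups the & terms first. The sketch below uses four made-up rows (the toy data frame is an assumption for illustration) to show a filter that keeps a record when any of the four impact columns is nonzero, and how a mixed condition is parsed.

```r
toy <- data.frame(
  FATALITIES = c(0, 2, 0, 0),
  INJURIES   = c(0, 0, 0, 5),
  PROPDMG    = c(0, 0, 0, 0),
  CROPDMG    = c(0, 0, 3, 0)
)

# Keep a row when at least one impact column is nonzero.
hasImpact <- toy$FATALITIES != 0 | toy$INJURIES != 0 |
  toy$PROPDMG != 0 | toy$CROPDMG != 0
kept <- toy[hasImpact, ]

# For comparison: `a & b | c` is parsed as `(a & b) | c`.
mixed <- toy$FATALITIES != 0 & toy$INJURIES != 0 | toy$CROPDMG != 0
```

Here only the first row, which has no impact at all, is dropped by hasImpact, while the mixed condition keeps only the row with crop damage.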

To obtain the correct monetary values for property and crop damage, the values in the PROPDMG and CROPDMG columns must be multiplied by the factors encoded in the PROPDMGEXP and CROPDMGEXP columns. Only the exponents K (thousands), M (millions) and B (billions) are considered; values with any other exponent code are left unscaled.

dataTable[dataTable$CROPDMGEXP %in% c("K","k"),"CROPDMG"] <- dataTable[dataTable$CROPDMGEXP %in% c("K","k"),"CROPDMG"] * 1000
dataTable[dataTable$PROPDMGEXP %in% c("K","k"),"PROPDMG"] <- dataTable[dataTable$PROPDMGEXP %in% c("K","k"),"PROPDMG"] * 1000
dataTable[dataTable$CROPDMGEXP %in% c("M","m"),"CROPDMG"] <- dataTable[dataTable$CROPDMGEXP %in% c("m","M"),"CROPDMG"] * 1000000
dataTable[dataTable$PROPDMGEXP %in% c("M","m"),"PROPDMG"] <- dataTable[dataTable$PROPDMGEXP %in% c("m","M"),"PROPDMG"] * 1000000
dataTable[dataTable$CROPDMGEXP %in% c("b","B"),"CROPDMG"] <- dataTable[dataTable$CROPDMGEXP %in% c("b","B"),"CROPDMG"] * 1000000000
dataTable[dataTable$PROPDMGEXP %in% c("b","B"),"PROPDMG"] <- dataTable[dataTable$PROPDMGEXP %in% c("b","B"),"PROPDMG"] * 1000000000
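The six assignments above share one pattern, so they could also be written with a lookup of multipliers. This is a sketch on made-up rows, not the code used for the analysis; the multiplier helper and the toy data frame are hypothetical names. Exponent codes other than K, M and B leave the value unchanged, as in the assignments above.

```r
# Hypothetical helper: map an exponent code to its numeric factor.
multiplier <- function(exp) {
  factors <- c(K = 1e3, M = 1e6, B = 1e9)
  out <- unname(factors[toupper(as.character(exp))])
  out[is.na(out)] <- 1   # unknown or empty codes: leave value unscaled
  out
}

toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "m", "B", "")
)
toy$PROPDMG <- toy$PROPDMG * multiplier(toy$PROPDMGEXP)
```

The same helper would apply unchanged to the CROPDMG and CROPDMGEXP pair.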

To perform the required analysis, the data must be grouped by the EVTYPE (event type) column; this can be done with the functions in the dplyr package.

library(dplyr)

To simplify the analysis, fatalities and injuries are combined into a single column named HEALTH_EVENT.

dataTable <- transform(dataTable, HEALTH_EVENT = dataTable$FATALITIES + dataTable$INJURIES)

Additionally, crop and property damage are combined into a single column named ECON_LOSS.

dataTable <- transform(dataTable, ECON_LOSS = dataTable$CROPDMG + dataTable$PROPDMG)

As discussed previously, data must be grouped by the EVTYPE column.

dataTable <- group_by(dataTable, EVTYPE)

The data can then be summarized by event type: the sum of health events, the sum of economic losses and the count of records per event type are stored in a table named sumTable. Two ordered copies are then created, healthTable sorted by health events and econTable sorted by economic losses (both descending). Economic losses in econTable are converted to billions of dollars.

sumTable <- summarise(dataTable, HEALTH_EVENTS = sum(HEALTH_EVENT), ECON_LOSSES = sum(ECON_LOSS), EV_COUNT = n())
healthTable <- arrange(sumTable, desc(HEALTH_EVENTS))
econTable <- arrange(sumTable, desc(ECON_LOSSES))
econTable$ECON_LOSSES <- econTable$ECON_LOSSES/1000000000
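To see what the summary step produces, the sketch below applies the same group_by, summarise and arrange sequence to a made-up table. The toy rows and their values are assumptions for illustration only, and the conversion to billions is folded into the summarise call for brevity.

```r
library(dplyr)

toy <- data.frame(
  EVTYPE       = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
  HEALTH_EVENT = c(10, 2, 5, 0),
  ECON_LOSS    = c(1e9, 5e9, 2e9, 5e8)
)

# One row per event type: totals plus the number of records.
toySum <- summarise(group_by(toy, EVTYPE),
                    HEALTH_EVENTS = sum(HEALTH_EVENT),
                    ECON_LOSSES   = sum(ECON_LOSS) / 1e9,  # billions of dollars
                    EV_COUNT      = n())

# Two orderings of the same summary.
toyHealth <- arrange(toySum, desc(HEALTH_EVENTS))
toyEcon   <- arrange(toySum, desc(ECON_LOSSES))
```

In this toy table TORNADO ranks first by health events while FLOOD ranks first by economic loss, which is why the two orderings are kept as separate tables.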

To make the data easier to visualize, only the top five event types are considered.

healthTable <- healthTable[1:5,]
econTable <- econTable[1:5,]

Results

The following graph displays the top 5 event types by number of health events (injuries or deaths) attributed to each event type:

par(cex = 0.7)
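# The y-axis limit is derived from the data so the tallest bar always fits
barplot(names.arg = healthTable$EVTYPE, height = healthTable$HEALTH_EVENTS, xlab = "Event Type", 
        ylab = "Health Events", main = "Weather Events Most Harmful to Population Health",
        ylim = c(0, max(healthTable$HEALTH_EVENTS) * 1.1))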

The following graph displays the top 5 event types by economic cost (property or agricultural) due to the corresponding event:

par(cex = 0.7)
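# ECON_LOSSES is already expressed in billions of dollars; the y-axis limit is derived from the data
barplot(names.arg = econTable$EVTYPE, height = econTable$ECON_LOSSES, xlab = "Event Type", 
        ylab = "Cost (Billion $)", main = "Weather Events with Greatest Economic Consequences",
        ylim = c(0, max(econTable$ECON_LOSSES) * 1.1))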