1) Synopsis

Storms in the U.S. cause significant damage in both human and economic terms. Identifying the most damaging events could help direct our efforts to minimize the human and economic damage that results from these natural phenomena. The objective of this study was to determine which weather events warrant that focus. To that end, publicly available storm data from the National Climatic Data Center were downloaded, imported, and processed. Then, for each event type, the total number of fatalities and injuries and the economic cost in terms of property and crops (as well as the total cost) were calculated. The results indicate that tornadoes have the highest human cost, while flooding has the highest economic cost. Therefore, efforts to minimize damage should focus on these two event types.

2) Data Processing

2.1) Download and import data

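The following code downloads the bz2-compressed data file if it is not already present in the working directory and then reads it into R. No separate decompression step is needed, because read.csv can read bz2-compressed files directly.

fileName <- "data.csv"
zippedFileName <- paste(fileName, "bz2", sep = ".")
dataURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# Download the compressed file only if it is not already in the working directory
if (!file.exists(zippedFileName)) {
        download.file(dataURL, zippedFileName, mode = "wb")
}
# Read the data only if it is not already loaded in the current R session
if (!exists("rawData")) {
        rawData <- read.csv(zippedFileName, header = TRUE)
}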
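To illustrate that read.csv handles the bz2 file without a separate unzip step, here is a minimal sketch that writes a tiny, made-up data frame to a bzip2-compressed temporary file and reads it back. The toy rows and the temporary file are hypothetical stand-ins for StormData.csv.bz2, not part of the report's data.

```r
# Hypothetical two-row stand-in for the storm data
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"),
                  FATALITIES = c(2, 0),
                  stringsAsFactors = FALSE)

# Write it to a bzip2-compressed temporary file
bz2File <- tempfile(fileext = ".csv.bz2")
con <- bzfile(bz2File, "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv opens the file via file(), which detects bzip2 compression itself
roundTrip <- read.csv(bz2File, header = TRUE, stringsAsFactors = FALSE)
print(roundTrip)
```

The same mechanism is what lets the report pass zippedFileName straight to read.csv.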

2.2) Subsetting the relevant data

The imported data set has 37 columns. However, only 7 of these are relevant to the questions I seek to address. The relevant columns are listed below:

  • EVTYPE: The event type. This column contains 985 distinct values (factor levels).

  • FATALITIES: The number of deaths due to the event.
  • INJURIES: The number of injuries due to the event.

  • PROPDMG: The amount of property damage, to be scaled by the multiplier given in PROPDMGEXP.
  • PROPDMGEXP: A code giving the magnitude of PROPDMG (e.g. K = thousands, M = millions, B = billions of dollars).

  • CROPDMG: The amount of crop damage, to be scaled by the multiplier given in CROPDMGEXP.
  • CROPDMGEXP: A code giving the magnitude of CROPDMG, using the same convention as PROPDMGEXP.

The following code subsets the raw data, extracting only the columns listed above.

relevantColumns <- c("EVTYPE", 
                     "FATALITIES", 
                     "INJURIES", 
                     "PROPDMG", 
                     "PROPDMGEXP", 
                     "CROPDMG", 
                     "CROPDMGEXP")

relevantData <- rawData[relevantColumns]

2.3) Calculate total property and crop damage

The two code chunks below map each exponent code in the PROPDMGEXP and CROPDMGEXP columns (for example, “K” or “M”) to a numeric multiplier and multiply it by the corresponding damage amount (PROPDMG or CROPDMG) to obtain the total damage in dollars. The remapping uses the function “mapvalues” from the “plyr” package.

Total property damage (PROPDMGTOTAL, units = dollars) was calculated as follows:

library(plyr)
# Map each exponent code to a numeric multiplier (e.g. "K" -> 10^3); "+", "-" and "?" map to 0
relevantData$PROPDMGEXP <- mapvalues(rawData$PROPDMGEXP, 
     from = c("K", "M","", "B", "m", "+", "0", "5", "6", "?", "4", "2", "3", "h", "7", "H", "-", "1", "8"), 
     to = c(10^3, 10^6, 1, 10^9, 10^6, 0,1,10^5, 10^6, 0, 10^4, 10^2, 10^3, 10^2, 10^7, 10^2, 0, 10, 10^8))

relevantData$PROPDMGTOTAL <- as.numeric(as.character(relevantData$PROPDMGEXP)) * relevantData$PROPDMG
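The same lookup can also be written in base R without plyr. The sketch below is an alternative to mapvalues, not the report's method: it finds each code's multiplier with match(), which, unlike indexing by names, also handles the empty-string code. The four toy rows are invented for illustration; the code and multiplier vectors are copied from the mapvalues call above.

```r
# Hypothetical toy rows standing in for the property-damage columns
toy <- data.frame(PROPDMG = c(25, 2.5, 10, 5),
                  PROPDMGEXP = c("K", "M", "", "B"),
                  stringsAsFactors = FALSE)

# Exponent codes and the multipliers the report assigns to them
expCodes   <- c("K", "M", "", "B", "m", "+", "0", "5", "6", "?", "4",
                "2", "3", "h", "7", "H", "-", "1", "8")
multiplier <- c(10^3, 10^6, 1, 10^9, 10^6, 0, 1, 10^5, 10^6, 0, 10^4,
                10^2, 10^3, 10^2, 10^7, 10^2, 0, 10, 10^8)

# match() returns the position of each code in expCodes (NA if absent)
toy$PROPDMGTOTAL <- multiplier[match(toy$PROPDMGEXP, expCodes)] * toy$PROPDMG
print(toy)
```

A code missing from expCodes yields an NA total, which makes gaps in the lookup table easy to spot.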

Total crop damage (CROPDMGTOTAL, units = dollars) was calculated as follows:

# Map each crop-damage exponent code to a numeric multiplier; "?" maps to 0
relevantData$CROPDMGEXP <- mapvalues(rawData$CROPDMGEXP, 
     from = c("","M", "K", "m", "B", "?", "0", "k","2"), 
     to = c(1,10^6, 10^3, 10^6, 10^9, 0, 1, 10^3, 10^2))

relevantData$CROPDMGTOTAL <- as.numeric(as.character(relevantData$CROPDMGEXP)) * relevantData$CROPDMG
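mapvalues passes any code missing from its from vector through unchanged, so an unmapped code would later become NA (or, for a digit such as "5", a wrong multiplier) after as.numeric. A minimal sketch of a coverage check follows; observedCodes is a hypothetical vector that includes an invented code "X", and on the real data it would be replaced by unique(rawData$CROPDMGEXP).

```r
# Crop-damage exponent codes used in the report's mapvalues call
cropCodes <- c("", "M", "K", "m", "B", "?", "0", "k", "2")

# Hypothetical codes as they might appear in a data column; "X" is invented
observedCodes <- c("K", "k", "M", "", "K", "X")

# Codes present in the data but absent from the lookup table
unmapped <- setdiff(unique(observedCodes), cropCodes)
if (length(unmapped) > 0) {
  message("Unmapped exponent codes: ", paste0('"', unmapped, '"', collapse = ", "))
}
```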

The following code subsets the data once again because the columns used to calculate total damages are no longer needed.

processedColumns <- c("EVTYPE", 
                      "FATALITIES", 
                      "INJURIES", 
                      "PROPDMGTOTAL", 
                      "CROPDMGTOTAL")
processedData <- relevantData[processedColumns]

3) Calculations

3.1) Which events are the most harmful to human health?

3.1.1) Which events result in the most fatalities?

The following code aggregates fatalities by event type and sorts based on the number of fatalities.

fatalitiesByEvent <- aggregate(FATALITIES ~ EVTYPE, data = processedData, FUN = sum)
orderedFatalities <- fatalitiesByEvent[order(-fatalitiesByEvent$FATALITIES), ]
print(head(orderedFatalities), row.names = FALSE)
##          EVTYPE FATALITIES
##         TORNADO       5633
##  EXCESSIVE HEAT       1903
##     FLASH FLOOD        978
##            HEAT        937
##       LIGHTNING        816
##       TSTM WIND        504
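To make the aggregate-then-order pattern concrete, here is a sketch on a few invented rows. aggregate() sums FATALITIES within each EVTYPE (returning the groups in alphabetical order), and order() applied to the negated column sorts the sums from largest to smallest.

```r
# Invented rows standing in for processedData
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL"),
                  FATALITIES = c(3, 0, 2, 1, 1),
                  stringsAsFactors = FALSE)

# Sum fatalities within each event type
byEvent <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

# Negating the column makes order() sort from largest to smallest
orderedToy <- byEvent[order(-byEvent$FATALITIES), ]
print(orderedToy, row.names = FALSE)
```

For a numeric column, order(byEvent$FATALITIES, decreasing = TRUE) is an equivalent way to write the sort.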

3.1.2) Which events result in the most injuries?

The following code aggregates injuries by event type and sorts based on the number of injuries.

injuriesByEvent <- aggregate(INJURIES ~ EVTYPE, data = processedData, FUN = sum)
orderedInjuries <- injuriesByEvent[order(-injuriesByEvent$INJURIES), ]
print(head(orderedInjuries), row.names = FALSE)
##          EVTYPE INJURIES
##         TORNADO    91346
##       TSTM WIND     6957
##           FLOOD     6789
##  EXCESSIVE HEAT     6525
##       LIGHTNING     5230
##            HEAT     2100
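Fatalities and injuries can also be summed in a single aggregate() call by placing both columns inside cbind() on the left-hand side of the formula. The sketch below does this on invented rows and then ranks event types by injuries; it is an alternative formulation, not the code used to produce the tables above.

```r
# Invented rows standing in for processedData
toy <- data.frame(EVTYPE = c("TORNADO", "LIGHTNING", "TORNADO", "FLOOD"),
                  FATALITIES = c(1, 1, 0, 2),
                  INJURIES = c(10, 3, 4, 5),
                  stringsAsFactors = FALSE)

# Both columns are summed within each event type in one call
harmByEvent <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank by injuries, largest first
orderedHarm <- harmByEvent[order(-harmByEvent$INJURIES), ]
print(orderedHarm, row.names = FALSE)
```

With the formula interface, the default na.action (na.omit) drops any row with an NA in either column, so this matches two separate calls only when neither column has missing values.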

3.2) Which events are the most economically damaging with respect to property and crops?

3.2.1) Which events result in the most property damage?

The following code aggregates property damage by event type and sorts based on the total cost.

propCostByEvent <- aggregate(PROPDMGTOTAL ~ EVTYPE, data = processedData, FUN = sum)
orderedPropCost <- propCostByEvent[order(-propCostByEvent$PROPDMGTOTAL), ]
print(head(orderedPropCost), row.names = FALSE)
##             EVTYPE PROPDMGTOTAL
##              FLOOD 144657709807
##  HURRICANE/TYPHOON  69305840000
##            TORNADO  56947380617
##        STORM SURGE  43323536000
##        FLASH FLOOD  16822673979
##               HAIL  15735267513

The following code plots the property costs associated with the 5 event types that cause the greatest property damage. This requires the function ggplot in the package ggplot2.

library(ggplot2)

# reorder() sorts the bars by damage (largest first) instead of alphabetically
output <- ggplot(data = orderedPropCost[1:5, ], aes(x = reorder(EVTYPE, -PROPDMGTOTAL), y = PROPDMGTOTAL))
output + geom_bar(stat = "identity") + xlab("Event type") + ylab("Economic Damage ($)") + labs(title = "Top 5 events causing property damage")

3.2.2) Which events result in the most crop damage?

The following code aggregates crop damage by event type and sorts based on the total cost.

cropCostByEvent <- aggregate(CROPDMGTOTAL ~ EVTYPE, data = processedData, FUN = sum)
orderedCropCost <- cropCostByEvent[order(-cropCostByEvent$CROPDMGTOTAL), ]
print(head(orderedCropCost), row.names = FALSE)
##       EVTYPE CROPDMGTOTAL
##      DROUGHT  13972566000
##        FLOOD   5661968450
##  RIVER FLOOD   5029459000
##    ICE STORM   5022113500
##         HAIL   3025954473
##    HURRICANE   2741910000

The following code plots the crop costs associated with the 5 event types that cause the greatest crop damage.

# reorder() sorts the bars by damage (largest first) instead of alphabetically
output <- ggplot(data = orderedCropCost[1:5, ], aes(x = reorder(EVTYPE, -CROPDMGTOTAL), y = CROPDMGTOTAL))
output + geom_bar(stat = "identity") + xlab("Event type") + ylab("Economic Damage ($)") + labs(title = "Top 5 events causing crop damage")

3.2.3) Which events result in the largest total economic damage?

The following code sums the total property and crop damage to create the variable ECONDMGTOTAL (unit = dollars). It then aggregates that variable based on event type (EVTYPE).

processedData$ECONDMGTOTAL <- processedData$PROPDMGTOTAL + processedData$CROPDMGTOTAL

totalCostByEvent <- aggregate(ECONDMGTOTAL ~ EVTYPE, data = processedData, FUN = sum)
orderedTotalCost <- totalCostByEvent[order(-totalCostByEvent$ECONDMGTOTAL),]
print(head(orderedTotalCost), row.names = FALSE)
##             EVTYPE ECONDMGTOTAL
##              FLOOD 150319678257
##  HURRICANE/TYPHOON  71913712800
##            TORNADO  57362333887
##        STORM SURGE  43323541000
##               HAIL  18761221986
##        FLASH FLOOD  18243991079
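Because summation is additive, each event type's summed ECONDMGTOTAL equals its summed property damage plus its summed crop damage. The sketch below aggregates all three cost columns at once on invented rows and checks that identity; the rows and dollar amounts are hypothetical.

```r
# Invented rows standing in for processedData (costs in dollars)
toy <- data.frame(EVTYPE = c("FLOOD", "DROUGHT", "FLOOD", "HAIL"),
                  PROPDMGTOTAL = c(5e9, 0, 2e9, 1e6),
                  CROPDMGTOTAL = c(1e8, 3e9, 0, 5e5),
                  stringsAsFactors = FALSE)

# Row-level total, as in the report
toy$ECONDMGTOTAL <- toy$PROPDMGTOTAL + toy$CROPDMGTOTAL

# Aggregate all three cost columns by event type in one call
costByEvent <- aggregate(cbind(PROPDMGTOTAL, CROPDMGTOTAL, ECONDMGTOTAL) ~ EVTYPE,
                         data = toy, FUN = sum)
costByEvent <- costByEvent[order(-costByEvent$ECONDMGTOTAL), ]
print(costByEvent, row.names = FALSE)

# Summing before or after aggregation gives the same per-event totals
stopifnot(isTRUE(all.equal(costByEvent$ECONDMGTOTAL,
                           costByEvent$PROPDMGTOTAL + costByEvent$CROPDMGTOTAL)))
```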

The following code plots the total economic costs associated with the 5 event types that cause the greatest total economic damage.

# reorder() sorts the bars by damage (largest first) instead of alphabetically
output <- ggplot(data = orderedTotalCost[1:5, ], aes(x = reorder(EVTYPE, -ECONDMGTOTAL), y = ECONDMGTOTAL))
output + geom_bar(stat = "identity") + xlab("Event type") + ylab("Economic Damage ($)") + labs(title = "Top 5 events causing economic damage")

4) Results and Conclusion

4.1) Question 1

“Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?”

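Tornadoes cause more fatalities (5,633) and more injuries (91,346) than any other event type. Tornado fatalities are roughly three times those of the next event type, EXCESSIVE HEAT (1,903), and tornado injuries are roughly 13 times those of the next event type, TSTM WIND (6,957).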

4.2) Question 2

“Across the United States, which types of events have the greatest economic consequences?”

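Flooding causes the most property damage (about $145 billion) and the second most crop damage (about $5.7 billion, after drought at about $14 billion). Because almost all flood damage is to property, flooding is clearly the most damaging event type when the two are summed: its total economic cost of about $150 billion is roughly twice that of the next event type, HURRICANE/TYPHOON (about $72 billion).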

4.3) Conclusion

Based on the data, flooding causes by far the most economic damage of any of the event types analyzed. Consequently, we should put effort into flood-proofing buildings. Likewise, the data indicate that tornadoes cause more injuries and fatalities than any other event type analyzed. Therefore, efforts to improve human wellbeing should focus on warning people about tornadoes earlier.