=====================================================================================================

Synopsis

In this document I attempt to identify the types of storm events from 1950 to 2011 that had the greatest impact on population health and on the economy. The data come from the National Weather Service.

I will determine which events cause the largest numbers of injuries and fatalities, as well as the most property and crop damage. With this knowledge in hand, it will be easier to determine which types of natural events are the most dangerous and most worth protecting against.

Question 1: Which types of events are most harmful with respect to Population Health?

Data Processing

First I download the compressed data file into the current working directory and decompress it with the R.utils package’s bunzip2 function. I cache this step because downloading and reading the file takes a very long time on every run.

library(R.utils)
fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destfile <- "./StormData.csv.bz2"
# mode = "wb" keeps the binary .bz2 archive intact on Windows
download.file(fileURL, destfile, mode = "wb")
bunzip2("StormData.csv.bz2", "StormData.csv", remove = FALSE, skip = TRUE)
## [1] "StormData.csv"
## attr(,"temporary")
## [1] FALSE
stormdata <- read.csv("./StormData.csv", header=TRUE)
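
Because the download and decompression are slow, one alternative is to skip the download when the archive is already on disk and let read.csv read the .bz2 file directly, since base R file connections detect bzip2 compression when opened for reading. The sketch below is not part of the original analysis: the helper name `load_cached` is hypothetical, and the demonstration uses a small temporary file rather than the real data set.

```r
# Hypothetical helper: download the archive only if it is not already on disk,
# then read it directly (read.csv decompresses .bz2 files transparently).
load_cached <- function(url, destfile) {
  if (!file.exists(destfile)) {
    download.file(url, destfile, mode = "wb")  # binary mode keeps the archive intact
  }
  read.csv(destfile, header = TRUE)
}

# Offline demonstration with a small bzip2-compressed CSV.
demo_file <- tempfile(fileext = ".csv.bz2")
con <- bzfile(demo_file, open = "w")
writeLines(c("EVTYPE,INJURIES", "TORNADO,3", "FLOOD,1"), con)
close(con)

# The file already exists, so no download is attempted and the URL is unused.
demo <- load_cached("unused", demo_file)
print(demo)
```

With the real data, the call would presumably be `load_cached(fileURL, destfile)`, which also makes the separate bunzip2 step optional.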

Next I load the “dplyr” library because it has very useful functions for manipulating data frames.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

There are only a few relevant columns for this analysis: the Event Type, Fatalities, Injuries, Property Damage, Property Damage Multiplier, Crop Damage, and Crop Damage Multiplier. I subset the data to just these columns to make the work neater and easier. I should note that the data are very untidy, and in many cases the same kind of event shows up under different names depending on how each regional reporting station decided to record it. This may affect the analysis later on, but for the purposes of this report I will ignore it.

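# select by name (columns 8 and 23:28) so the code does not depend on column order
stormdata <- stormdata[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]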
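
The untidy event names mentioned above could be partly reduced before aggregating. The following is only a sketch of one possible approach, which this report does not apply: the helper `normalize_evtype` and its small table of variants are hypothetical and illustrative, not an official mapping of event types.

```r
# Hypothetical helper that folds a few obvious EVTYPE variants together.
# The table of variants is an illustrative assumption, not an official mapping.
normalize_evtype <- function(evtype) {
  x <- toupper(trimws(as.character(evtype)))
  variants <- c(
    "TSTM WIND"    = "THUNDERSTORM WIND",
    "RIP CURRENTS" = "RIP CURRENT"
  )
  hit <- x %in% names(variants)
  x[hit] <- unname(variants[x[hit]])
  x
}

example <- c("TSTM WIND", " thunderstorm wind", "RIP CURRENTS", "TORNADO")
cleaned <- normalize_evtype(example)
print(cleaned)
```

Applying such a function to the EVTYPE column before aggregating would merge the listed variants into single categories, which would change the rankings reported below.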

The first step is to find the total number of injuries and fatalities for each Event Type. To do this, I use base R’s aggregate function and dplyr’s arrange function, and return the top 15 results for each grouping.

injuries <- aggregate(INJURIES~EVTYPE, stormdata, sum)
injuries <- arrange(injuries, desc(INJURIES))
injuries <- injuries[1:15, ]
injuries
##               EVTYPE INJURIES
## 1            TORNADO    91346
## 2          TSTM WIND     6957
## 3              FLOOD     6789
## 4     EXCESSIVE HEAT     6525
## 5          LIGHTNING     5230
## 6               HEAT     2100
## 7          ICE STORM     1975
## 8        FLASH FLOOD     1777
## 9  THUNDERSTORM WIND     1488
## 10              HAIL     1361
## 11      WINTER STORM     1321
## 12 HURRICANE/TYPHOON     1275
## 13         HIGH WIND     1137
## 14        HEAVY SNOW     1021
## 15          WILDFIRE      911
fatalities <- aggregate(FATALITIES~EVTYPE, stormdata, sum)
fatalities <- arrange(fatalities, desc(FATALITIES))
fatalities <- fatalities[1:15, ]
fatalities
##               EVTYPE FATALITIES
## 1            TORNADO       5633
## 2     EXCESSIVE HEAT       1903
## 3        FLASH FLOOD        978
## 4               HEAT        937
## 5          LIGHTNING        816
## 6          TSTM WIND        504
## 7              FLOOD        470
## 8        RIP CURRENT        368
## 9          HIGH WIND        248
## 10         AVALANCHE        224
## 11      WINTER STORM        206
## 12      RIP CURRENTS        204
## 13         HEAT WAVE        172
## 14      EXTREME COLD        160
## 15 THUNDERSTORM WIND        133
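
The same aggregate, sort, and top-15 steps can also be written as a single dplyr pipeline. This is a minimal sketch on made-up toy data; the helper name `top_events` is hypothetical, and it assumes dplyr 1.0.0 or later for the `.groups` argument.

```r
library(dplyr)

# Hypothetical helper: total one numeric column per event type and keep the top n.
top_events <- function(data, column, n = 15) {
  data %>%
    group_by(EVTYPE) %>%
    summarise(total = sum(.data[[column]]), .groups = "drop") %>%
    arrange(desc(total)) %>%
    head(n)
}

# Toy data standing in for stormdata; the values are made up.
toy <- data.frame(
  EVTYPE   = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
  INJURIES = c(5, 2, 7, 1)
)
top <- top_events(toy, "INJURIES", n = 2)
print(top)
```

On the real data, `top_events(stormdata, "INJURIES")` and `top_events(stormdata, "FATALITIES")` should give the same rankings as the aggregate calls above, with the summed column named `total`.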

Results

Now that the data are sorted by total injuries and total fatalities, I plot the two rankings as bar plots. These plots should give us an idea of which kinds of events are worst for public health.

par(mfrow=c(1,2), mar=c(12,4,4,2), cex=.75)
barplot(injuries$INJURIES, names.arg = injuries$EVTYPE, las = 3, ylab = "Number of Injuries", main = "Top 15 Causes of Injuries", col="red")
barplot(fatalities$FATALITIES, names.arg=fatalities$EVTYPE, las=3, ylab="Number of Fatalities", main = "Top 15 Causes of Fatalities", col="blue")

Tornadoes are by far the most dangerous event with respect to public health, causing 91,346 injuries and 5,633 fatalities. Excessive heat, though not on the scale of tornadoes, also causes a lot of harm. Flooding (FLOOD for injuries, FLASH FLOOD for fatalities) and lightning also appear in the top 5 of both rankings, so it is important to be cautious during any of these kinds of storms and to follow the appropriate safety guidelines.
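
To put a number on how dominant tornadoes are, the totals from the two tables can be turned into shares. The sketch below copies the figures from the tables in this report; note that the shares are relative to the top 15 event types only, not to all events in the data set.

```r
# Totals copied from the top-15 tables above, in table order.
injuries_top15 <- c(91346, 6957, 6789, 6525, 5230, 2100, 1975, 1777,
                    1488, 1361, 1321, 1275, 1137, 1021, 911)
fatalities_top15 <- c(5633, 1903, 978, 937, 816, 504, 470, 368,
                      248, 224, 206, 204, 172, 160, 133)

# Tornadoes are the first entry in both tables.
tornado_injury_share   <- injuries_top15[1] / sum(injuries_top15)
tornado_fatality_share <- fatalities_top15[1] / sum(fatalities_top15)

# Shares as percentages of the top-15 totals.
print(round(100 * c(injuries = tornado_injury_share,
                    fatalities = tornado_fatality_share), 1))
```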

Question 2: Which types of events have the greatest economic consequences?

Data Processing

This dataset stores property and crop damage in an unusual way. Each kind of damage has two columns. The first (PROPDMG or CROPDMG) contains a number of one to four digits. The second (PROPDMGEXP or CROPDMGEXP) contains a multiplier code of “H,” “K,” “M,” or “B,” which corresponds to hundred, thousand, million, or billion, respectively. To get dollar amounts, we therefore multiply each damage column by the number that its multiplier code stands for. Rows with any other multiplier code are left at 0.

# initialize a new column with the total property damage
stormdata$PROPERTY = 0

# multiply the values in the PROPDMG column by the appropriate multiplier and add it to the new column
stormdata[stormdata$PROPDMGEXP == "H", ]$PROPERTY = stormdata[stormdata$PROPDMGEXP == "H", ]$PROPDMG * 10^2
stormdata[stormdata$PROPDMGEXP == "K", ]$PROPERTY = stormdata[stormdata$PROPDMGEXP == "K", ]$PROPDMG * 10^3
stormdata[stormdata$PROPDMGEXP == "M", ]$PROPERTY = stormdata[stormdata$PROPDMGEXP == "M", ]$PROPDMG * 10^6
stormdata[stormdata$PROPDMGEXP == "B", ]$PROPERTY = stormdata[stormdata$PROPDMGEXP == "B", ]$PROPDMG * 10^9

# initialize a new column with the total crop damage
stormdata$CROP = 0

#multiply the values in the CROPDMG column by the appropriate multiplier and add it to the new column
stormdata[stormdata$CROPDMGEXP == "H", ]$CROP = stormdata[stormdata$CROPDMGEXP == "H", ]$CROPDMG * 10^2
stormdata[stormdata$CROPDMGEXP == "K", ]$CROP = stormdata[stormdata$CROPDMGEXP == "K", ]$CROPDMG * 10^3
stormdata[stormdata$CROPDMGEXP == "M", ]$CROP = stormdata[stormdata$CROPDMGEXP == "M", ]$CROPDMG * 10^6
stormdata[stormdata$CROPDMGEXP == "B", ]$CROP = stormdata[stormdata$CROPDMGEXP == "B", ]$CROPDMG * 10^9
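
The eight assignments above all follow the same pattern, so the conversion could also be expressed with a named lookup vector. This is a sketch rather than the code used for the results in this report; the names `multiplier` and `to_dollars` are hypothetical. Like the code above, it matches only the upper-case codes and treats every other code as 0.

```r
# Lookup table from multiplier code to numeric factor.
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert an amount and its multiplier code to dollars.
to_dollars <- function(amount, exp_code) {
  m <- multiplier[as.character(exp_code)]  # as.character also handles factors
  m[is.na(m)] <- 0                         # unrecognized or empty codes contribute 0
  unname(amount * m)
}

# Toy values for illustration.
dollars <- to_dollars(c(2.5, 10, 1, 3, 7), c("K", "M", "B", "H", ""))
print(dollars)
```

With this helper, the two new columns could be filled with `stormdata$PROPERTY <- to_dollars(stormdata$PROPDMG, stormdata$PROPDMGEXP)` and the equivalent call for crops.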

Now that the dollar values have been calculated, we can aggregate and sort the data as we did for injuries and fatalities above. I add property and crop damage together into a single total for each type of event rather than reporting the two separately.

economictoll <- aggregate(PROPERTY+CROP ~ EVTYPE, stormdata, sum)
names(economictoll) <- c("Event", "TotalEconomicToll")
economictoll <- arrange(economictoll, desc(TotalEconomicToll))
economictoll <- economictoll[1:15,]
#Divide by 1 billion because the results are in the billions and it makes the scale of the ensuing plot cleaner
economictoll$TotalEconomicToll <- economictoll$TotalEconomicToll/10^9

Results

With the data sorted by economic toll, I create a bar plot to see which events cause the greatest economic damage.

par(mar=c(12,4,4,2))
barplot(economictoll$TotalEconomicToll, names.arg = economictoll$Event, las = 3, ylab = "Total Economic Toll (Billions of $)", main = "Top 15 Events by Total Economic Toll", col = "blue")

Based on this plot, the storm events with the highest economic toll involve precipitation and wind. Flooding is by far the costliest, followed by Hurricane/Typhoon and Tornado. These results make sense: flooding damages both buildings and crops, so it should rank high in both property and crop damage. Hurricanes and typhoons strike mostly along the coasts, and they bring high winds and heavy rain, so most of their damage is probably to property. Tornadoes occur mainly in the middle of the country, where much of the land is cropland, but a tornado’s high winds can also devastate a city, so tornadoes have the potential to cause a lot of damage.
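
The guesses above about how the damage splits between property and crops could be checked by aggregating the two columns separately instead of summing them first. The sketch below shows the idea on made-up toy rows, not on the real data, so its numbers say nothing about the actual split.

```r
# Toy rows standing in for stormdata after PROPERTY and CROP have been computed;
# the values are made up for illustration.
toy <- data.frame(
  EVTYPE   = c("FLOOD", "FLOOD", "TORNADO", "DROUGHT"),
  PROPERTY = c(5e9, 1e9, 3e9, 1e8),
  CROP     = c(2e9, 0, 1e8, 4e9)
)

# cbind() on the left-hand side sums each damage column separately per event type.
split_toll <- aggregate(cbind(PROPERTY, CROP) ~ EVTYPE, toy, sum)

# Fraction of each event type's total damage that is crop damage.
split_toll$CropShare <- split_toll$CROP / (split_toll$PROPERTY + split_toll$CROP)

# Show the event types from highest to lowest combined damage.
print(split_toll[order(-(split_toll$PROPERTY + split_toll$CROP)), ])
```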