Reproducible Research: Storm Data

Measuring the costs of extreme weather events

Synopsis

Considering both severity and frequency, heat-related events are the most critical in terms of population health.

Heat-related events, however, tend not to cause large economic losses.

In terms of total casualties, tornadoes are by far the most relevant event type to address. Tornadoes are also costly.

The most severe events in terms of health are hurricanes, which have the highest number of casualties per event by a substantial margin.

Hurricanes are one of the top events in terms of economic cost, along with floods, among others.

Additionally, floods and droughts are particularly dangerous for crops.

Relative figures are influenced by Hurricane Katrina, the costliest natural disaster and one of the deadliest hurricanes in the history of the US.

Numbers confirm Katrina as the most extreme single event of the period in terms of population health.

1. Data processing

To download the data into our system we use the link provided in the assignment's page, and pass it to the download.file() function.

url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(url, destfile="stormData.csv.bz2", method="curl")
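Since the archive is large, it can help to skip the download when a local copy already exists. The sketch below is one possible way to do that; fetchIfMissing and its fetch argument are hypothetical names introduced here, not part of the original analysis.

```r
# Hypothetical helper: download only when the local copy is missing.
# `fetch` defaults to download.file; extra arguments (e.g. method="curl")
# are forwarded unchanged.
fetchIfMissing <- function(url, destfile, fetch = download.file, ...) {
    if (!file.exists(destfile)) {
        fetch(url, destfile = destfile, ...)
    }
    invisible(destfile)
}

# Intended usage (not run here, it needs network access):
# fetchIfMissing(url, "stormData.csv.bz2", method = "curl")
```

Keeping the downloader as an argument makes the helper easy to try out without touching the network.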

Although the file is compressed with the bzip2 algorithm, we can read it into R directly with the read.csv() function, which decompresses it on the fly.

stormData <- read.csv("stormData.csv.bz2")
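As a quick check of this behaviour, here is a minimal sketch that writes a tiny bzip2-compressed CSV and reads it back with plain read.csv(). The toy values and the names toy, tmpFile and roundTrip are made up for illustration.

```r
# Toy data (invented values, not taken from the NOAA file)
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                  FATALITIES = c(1, 0),
                  stringsAsFactors = FALSE)

# Write the data through a bzip2 connection
tmpFile <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmpFile, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() is given only the file name; no explicit decompression step
roundTrip <- read.csv(tmpFile, stringsAsFactors = FALSE)
print(roundTrip)
```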

2. Results

2.1. Which types of events are most harmful with respect to population health?

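We will use the dplyr package to manipulate data throughout the analysis, and kable() from the knitr package to format the summary tables.

# dplyr for data manipulation; knitr provides kable() for the tables
library(dplyr)
library(knitr)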

First, we group our data by event type so we can aggregate it later on.

stormDataByEvent <- stormData %>%
                        group_by(EVTYPE)

Then, we compute the total number of fatalities and injuries caused by each event type over the years, along with the number of recorded events.

eventHealthImpact <- stormDataByEvent %>%                       
                         summarize(totalFatalities=sum(FATALITIES), 
                                   totalInjuries=sum(INJURIES),
                                   countOfEvents=n()) %>%
                             mutate(totalCasualties=totalFatalities+totalInjuries)
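To make the grouping and aggregation concrete, the same pipeline can be tried on a handful of made-up records. This is only a sketch: the toy values are invented, and only the column names follow the storm data.

```r
library(dplyr)

# Toy records (invented values)
toy <- data.frame(
    EVTYPE     = c("TORNADO", "TORNADO", "HEAT", "HEAT", "HEAT"),
    FATALITIES = c(2, 0, 1, 3, 0),
    INJURIES   = c(10, 5, 0, 4, 1),
    stringsAsFactors = FALSE
)

# One row per event type: totals, number of records, and combined casualties
toySummary <- toy %>%
    group_by(EVTYPE) %>%
    summarize(totalFatalities = sum(FATALITIES),
              totalInjuries   = sum(INJURIES),
              countOfEvents   = n()) %>%
    mutate(totalCasualties = totalFatalities + totalInjuries)

print(toySummary)
```

Here HEAT ends up with 3 events and 9 casualties, and TORNADO with 2 events and 17 casualties.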

Now we can rank the event types by their absolute number of casualties over the whole period.

eventHealthImpact <- eventHealthImpact %>% 
                         arrange(desc(totalCasualties))
totalCasualtiesTable <- kable(head(eventHealthImpact, n=10),
                        format="html",
                        digits=0,
                        row.names=TRUE,
                        caption="Top-10 Events by Total Casualties",
                        col.names=c("Event Type",
                                    "Total Fatalities", 
                                    "Total Injuries", 
                                    "Number of Events", 
                                    "Total Casualties"))

The top-10 event types by total casualties over the period are shown below.

Top-10 Events by Total Casualties
    Event Type         Total Fatalities  Total Injuries  Number of Events  Total Casualties
 1  TORNADO                        5633           91346             60652             96979
 2  EXCESSIVE HEAT                 1903            6525              1678              8428
 3  TSTM WIND                       504            6957            219940              7461
 4  FLOOD                           470            6789             25326              7259
 5  LIGHTNING                       816            5230             15754              6046
 6  HEAT                            937            2100               767              3037
 7  FLASH FLOOD                     978            1777             54277              2755
 8  ICE STORM                        89            1975              2006              2064
 9  THUNDERSTORM WIND               133            1488             82563              1621
10  WINTER STORM                    206            1321             11433              1527

However, the total number of casualties may not be the best indicator for the severity of each event.

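To study the severity of each type of event, we have to look at relative rather than absolute values, that is, casualties per recorded event. We keep only event types recorded more than 10 times, so that a handful of observations cannot dominate the ranking.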

eventHealthImpact <- eventHealthImpact %>%
                         mutate(casualtiesPerEvent=totalCasualties/countOfEvents) %>%
                             filter(countOfEvents>10) %>%
                                 arrange(desc(casualtiesPerEvent))
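A small made-up example shows why the count filter matters. In this sketch the event names and figures are invented; without the filter, the type with only two records would top the ranking.

```r
library(dplyr)

# Invented summary rows: one frequent type, one moderate, one very rare
toyImpact <- data.frame(
    EVTYPE          = c("FREQUENT", "MODERATE", "RARE"),
    totalCasualties = c(100, 30, 50),
    countOfEvents   = c(1000, 12, 2),
    stringsAsFactors = FALSE
)

ranked <- toyImpact %>%
    mutate(casualtiesPerEvent = totalCasualties / countOfEvents) %>%
    filter(countOfEvents > 10) %>%          # drops RARE (only 2 records)
    arrange(desc(casualtiesPerEvent))

print(ranked)
```

RARE would have 25 casualties per event, but it is excluded; MODERATE (2.5) then ranks ahead of FREQUENT (0.1).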

The code below builds the table summarizing the results.

relativeCasualtiesTable <- kable(head(eventHealthImpact, n=10),
                                 format="html",
                                 digits=0,
                                 row.names=TRUE,
                                 caption="Top-10 Events by Casualties per Event",
                                 col.names=c("Event Type",
                                             "Total Fatalities", 
                                             "Total Injuries", 
                                             "Number of Events", 
                                             "Total Casualties",
                                             "Casualties per Event"))

The top-10 event types by casualties per event are shown below.

Top-10 Events by Casualties per Event
    Event Type                 Total Fatalities  Total Injuries  Number of Events  Total Casualties  Casualties per Event
 1  HURRICANE/TYPHOON                        64            1275                88              1339                    15
 2  EXTREME HEAT                             96             155                22               251                    11
 3  TSUNAMI                                  33             129                20               162                     8
 4  GLAZE                                     7             216                32               223                     7
 5  HEAT WAVE                               172             309                74               481                     6
 6  EXCESSIVE HEAT                         1903            6525              1678              8428                     5
 7  HEAT                                    937            2100               767              3037                     4
 8  ICE                                       6             137                61               143                     2
 9  UNSEASONABLY WARM AND DRY                29               0                13                29                     2
10  SNOW SQUALL                               2              35                19                37                     2

Remember that a small number of observations can be a consequence of missing data and does not necessarily imply low frequency.

This is extremely relevant when evaluating risk.

Let's take a look at the distribution of these critical events over time.

library(lubridate)

stormData$BGN_DATE <- mdy_hms(stormData$BGN_DATE)
min(stormData$BGN_DATE)
## [1] "1950-01-03 UTC"
criticalEventsList <- eventHealthImpact[1:10, 1]$EVTYPE
criticalEventsData <- stormData %>%
                          filter(EVTYPE %in% criticalEventsList) %>%
                              group_by(EVTYPE, BGN_DATE) %>%
                                 summarize(countOfEvents=n())
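The date conversion can be checked on a couple of sample strings. This sketch assumes the raw BGN_DATE values are shaped like month/day/year followed by a time; the two sample strings are illustrative.

```r
library(lubridate)

# Illustrative raw values in month/day/year hour:minute:second form
rawDates <- c("4/18/1950 0:00:00", "8/29/2005 0:00:00")

# mdy_hms() returns POSIXct date-times in UTC
parsed <- mdy_hms(rawDates)

print(parsed)
print(year(parsed))
```

Once the column is a proper date-time, functions such as min() and year() work on it directly.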

As can be seen in the plot below, apart from those related to excessive heat, most critical events appear to be rather sporadic.

library(ggplot2)

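# ggplot() equivalent of the deprecated qplot(): points, one panel per event type
ggplot(criticalEventsData,
       aes(x=BGN_DATE, y=countOfEvents, colour=EVTYPE)) +
    geom_point() +
    facet_grid(EVTYPE ~ .) +
    xlab("\nBegin Date") + ylab("Number of Events\n")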

[Figure: number of events by begin date, one panel per critical event type]

The concentration of hurricane-type events around 2005 suggests that these observations are related to Hurricane Katrina.

2.2. Which types of events have the greatest economic consequences?

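Before starting, we need to make the damage figures comparable. PROPDMG and CROPDMG hold a base amount, while PROPDMGEXP and CROPDMGEXP hold a magnitude code; we scale the amounts for the codes H, K, M and B (hundreds, thousands, millions and billions). Rows with any other code are left as they are.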

damageData <- stormData
# Multipliers for the magnitude codes handled here (upper-case only)
multipliers <- c(B=10^9, M=10^6, K=10^3, H=10^2)
for (code in names(multipliers)) {
    # which() drops NA comparisons, so only matching rows are rescaled
    propRows <- which(damageData$PROPDMGEXP==code)
    cropRows <- which(damageData$CROPDMGEXP==code)
    damageData$PROPDMG[propRows] <- damageData$PROPDMG[propRows]*multipliers[[code]]
    damageData$CROPDMG[cropRows] <- damageData$CROPDMG[cropRows]*multipliers[[code]]
}
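The scaling can also be sketched as a vectorized lookup on a toy data frame. The values below are invented, and treating an empty code as a multiplier of 1 mirrors the choice above of leaving unrecognized codes unscaled.

```r
# Invented amounts with their magnitude codes ("" means no code)
toy <- data.frame(PROPDMG    = c(25, 2.5, 1, 7),
                  PROPDMGEXP = c("K", "M", "B", ""),
                  stringsAsFactors = FALSE)

multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Look up each code by name; as.character() guards against factor columns,
# which would otherwise be indexed by their integer level codes
scale <- unname(multipliers[as.character(toy$PROPDMGEXP)])
scale[is.na(scale)] <- 1   # unrecognized or empty codes: leave amount as is

toy$propDollars <- toy$PROPDMG * scale
print(toy)
```

The resulting amounts are 25000, 2500000, 1000000000 and 7.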

Once this is done, we can proceed to the aggregation.

damageData <- damageData %>%
                  group_by(EVTYPE) %>%
                      summarize(propDmg=sum(PROPDMG),
                                cropDmg=sum(CROPDMG)) %>%
                          mutate(totalDmg=propDmg+cropDmg) %>%
                              arrange(desc(totalDmg))

totalDamageTable <- kable(head(damageData, n=10),
                          format="html",
                          digits=0,
                          row.names=TRUE,
                          caption="Top-10 Events by Total Damage",
                          col.names=c("Event Type",
                                    "Property Damage", 
                                    "Crop Damage", 
                                    "Total Damage"))

The top-10 most damaging event types are shown below, with amounts in US dollars.

Top-10 Events by Total Damage
    Event Type         Property Damage  Crop Damage  Total Damage
 1  FLOOD                 144657709807   5661968450  150319678257
 2  HURRICANE/TYPHOON      69305840000   2607872800   71913712800
 3  TORNADO                56925660790    414953270   57340614060
 4  STORM SURGE            43323536000         5000   43323541000
 5  HAIL                   15727367548   3025537890   18752905438
 6  FLASH FLOOD            16140812067   1421317100   17562129167
 7  DROUGHT                 1046106000  13972566000   15018672000
 8  HURRICANE              11868319010   2741910000   14610229010
 9  RIVER FLOOD             5118945500   5029459000   10148404500
10  ICE STORM               3944927860   5022113500    8967041360