Reproducible Research: Storm Data

Measuring the costs of extreme weather events

Synopsis

Considering both severity and frequency, heat-related events are the most critical in terms of population health.

Heat-related events, however, are not prone to cause economic losses.

In terms of the total number of casualties and people affected, tornadoes are by far the most relevant event type to address. Tornadoes are also costly.

The most severe events in terms of health are hurricanes, which have the highest number of casualties per event by a substantial margin.

Hurricanes are one of the top events in terms of economic cost, along with floods, among others.

Additionally, floods and droughts are particularly dangerous for crops.

Relative figures are influenced by Hurricane Katrina, the costliest natural disaster and one of the deadliest hurricanes in the history of the US.

Numbers confirm Katrina as the most extreme single event of the period in terms of population health.

1. Data processing

To download the data, we pass the link provided on the assignment page to the download.file() function.

url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(url, destfile="stormData.csv.bz2", method="curl")

Although the file is compressed with the bzip2 algorithm, we can read it into R simply by using the read.csv() function.

stormData <- read.csv("stormData.csv.bz2")
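The transparent decompression can be checked on a throwaway file. The sketch below is only an illustration: the toy data frame and the temporary path are invented for the example, and the compressed file is written with base R's bzfile() connection.

```r
# Write a tiny CSV through a bzip2 connection, then read it back with read.csv().
toy <- data.frame(EVTYPE = c("TORNADO", "HEAT"), FATALITIES = c(3, 1))
path <- tempfile(fileext = ".csv.bz2")

con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# No explicit decompression step: the compression is detected when the file is opened.
roundTrip <- read.csv(path)
print(roundTrip)
```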

2. Results

2.1. Which types of events are most harmful with respect to population health?

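We will use the dplyr package to manipulate data throughout the analysis, and the kable() function from the knitr package to format the result tables.

library(dplyr)
library(knitr)  # provides kable(), used for the tables below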

First, we group the data by event type so that we can aggregate it later on.

stormDataByEvent <- stormData %>%
                        group_by(EVTYPE)

Then, we compute the total number of fatalities and injuries caused by each event type over the years.

eventHealthImpact <- stormDataByEvent %>%                       
                         summarize(totalFatalities=sum(FATALITIES), 
                                   totalInjuries=sum(INJURIES),
                                   countOfEvents=n()) %>%
                             mutate(totalCasualties=totalFatalities+totalInjuries)
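To see what this aggregation produces, the same pipeline can be run on a handful of made-up records. This is a minimal sketch: the toyStorms data frame is hypothetical and only its column names match the storm data.

```r
library(dplyr)

# Hypothetical toy records; only the column names match the storm data.
toyStorms <- data.frame(
    EVTYPE     = c("TORNADO", "TORNADO", "HEAT", "HEAT", "HEAT"),
    FATALITIES = c(2, 0, 1, 3, 0),
    INJURIES   = c(10, 4, 0, 5, 1),
    stringsAsFactors = FALSE
)

# One row per event type: summed fatalities and injuries, plus the record count.
toyImpact <- toyStorms %>%
    group_by(EVTYPE) %>%
    summarize(totalFatalities = sum(FATALITIES),
              totalInjuries   = sum(INJURIES),
              countOfEvents   = n()) %>%
    mutate(totalCasualties = totalFatalities + totalInjuries)

print(toyImpact)
```

Each event type collapses to a single row, which is the shape the ranking steps that follow rely on.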

Now we can rank the event types by total casualties, in absolute terms.

eventHealthImpact <- eventHealthImpact %>% 
                         arrange(desc(totalCasualties))
totalCasualtiesTable <- kable(head(eventHealthImpact, n=10),
                        format="html",
                        digits=0,
                        row.names=TRUE,
                        caption="Top-10 Events by Total Casualties",
                        col.names=c("Event Type",
                                    "Total Fatalities", 
                                    "Total Injuries", 
                                    "Number of Events", 
                                    "Total Casualties"))

The table below shows the ten event types with the most casualties over the period.

Top-10 Events by Total Casualties
	Event Type	Total Fatalities	Total Injuries	Number of Events	Total Casualties
1	TORNADO	5633	91346	60652	96979
2	EXCESSIVE HEAT	1903	6525	1678	8428
3	TSTM WIND	504	6957	219940	7461
4	FLOOD	470	6789	25326	7259
5	LIGHTNING	816	5230	15754	6046
6	HEAT	937	2100	767	3037
7	FLASH FLOOD	978	1777	54277	2755
8	ICE STORM	89	1975	2006	2064
9	THUNDERSTORM WIND	133	1488	82563	1621
10	WINTER STORM	206	1321	11433	1527

However, the total number of casualties may not be the best indicator for the severity of each event.

In order to study the severity of each type of event, we have to look at relative instead of absolute values.

eventHealthImpact <- eventHealthImpact %>%
                         mutate(casualtiesPerEvent=totalCasualties/countOfEvents) %>%
                             filter(countOfEvents>10) %>%
                                 arrange(desc(casualtiesPerEvent))
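The reason for filtering on the number of events can be illustrated with invented figures. In the sketch below, all three rows are hypothetical; they only show how a type observed once can dominate a per-event ranking.

```r
library(dplyr)

# Hypothetical aggregated figures, invented for illustration only.
toyImpact <- data.frame(
    EVTYPE          = c("RARE EVENT", "COMMON EVENT", "FREQUENT EVENT"),
    totalCasualties = c(50, 300, 1000),
    countOfEvents   = c(1, 20, 2000),
    stringsAsFactors = FALSE
)

ranked <- toyImpact %>%
    mutate(casualtiesPerEvent = totalCasualties / countOfEvents) %>%
    arrange(desc(casualtiesPerEvent))

# Without the filter, the type with a single observation ranks first.
print(ranked)

# With the same threshold as above, it drops out of the ranking.
filtered <- filter(ranked, countOfEvents > 10)
print(filtered)
```

The threshold of 10 is a judgment call: a higher value gives more stable ratios but hides genuinely rare event types.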

The following code builds the table summarizing the results.

relativeCasualtiesTable <- kable(head(eventHealthImpact, n=10),
                                 format="html",
                                 digits=0,
                                 row.names=TRUE,
                                 caption="Top-10 Events by Casualties per Event",
                                 col.names=c("Event Type",
                                             "Total Fatalities", 
                                             "Total Injuries", 
                                             "Number of Events", 
                                             "Total Casualties",
                                             "Casualties per Event"))

The ten event types with the most casualties per event are shown below.

Top-10 Events by Casualties per Event
	Event Type	Total Fatalities	Total Injuries	Number of Events	Total Casualties	Casualties per Event
1	HURRICANE/TYPHOON	64	1275	88	1339	15
2	EXTREME HEAT	96	155	22	251	11
3	TSUNAMI	33	129	20	162	8
4	GLAZE	7	216	32	223	7
5	HEAT WAVE	172	309	74	481	6
6	EXCESSIVE HEAT	1903	6525	1678	8428	5
7	HEAT	937	2100	767	3037	4
8	ICE	6	137	61	143	2
9	UNSEASONABLY WARM AND DRY	29	0	13	29	2
10	SNOW SQUALL	2	35	19	37	2
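The contrast between absolute and relative figures can be reproduced by hand from the two tables above. The short sketch below copies three rows from them and recomputes the ratio; the vector names are chosen for the example.

```r
# Figures copied from the two tables above.
casualties <- c("TORNADO" = 96979, "EXCESSIVE HEAT" = 8428, "HURRICANE/TYPHOON" = 1339)
events     <- c("TORNADO" = 60652, "EXCESSIVE HEAT" = 1678, "HURRICANE/TYPHOON" = 88)

# Casualties per recorded event, rounded to one decimal place.
casualtiesPerEvent <- casualties / events
print(round(casualtiesPerEvent, 1))
```

Tornadoes lead by a wide margin in total casualties but average fewer than two casualties per recorded event, while HURRICANE/TYPHOON has far fewer records and a much higher ratio.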

Keep in mind that a small number of observations may be a consequence of missing data and does not necessarily imply low frequency.

This is extremely relevant when evaluating risk.

Let's take a look at the distribution of these critical events over time.

library(lubridate)

stormData$BGN_DATE <- mdy_hms(stormData$BGN_DATE)
min(stormData$BGN_DATE)

## [1] "1950-01-03 UTC"

criticalEventsList <- eventHealthImpact[1:10, 1]$EVTYPE
criticalEventsData <- stormData %>%
                          filter(EVTYPE %in% criticalEventsList) %>%
                              group_by(EVTYPE, BGN_DATE) %>%
                                 summarize(countOfEvents=n())
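Grouping by the exact begin date tends to yield many small counts. One alternative, shown here as an assumption rather than the approach used in this analysis, is to count events per calendar year. The records below are hypothetical and use a month/day/year hour:minute:second format that mdy_hms() accepts.

```r
library(dplyr)
library(lubridate)

# Hypothetical records; the dates are invented for illustration.
toyStorms <- data.frame(
    EVTYPE   = c("HEAT", "HEAT", "HEAT", "TSUNAMI"),
    BGN_DATE = c("7/1/1995 0:00:00", "7/14/1995 0:00:00",
                 "8/2/2006 0:00:00", "9/29/2009 0:00:00"),
    stringsAsFactors = FALSE
)

# Parse the date, keep only the year, and count records per type and year.
eventsPerYear <- toyStorms %>%
    mutate(beginYear = year(mdy_hms(BGN_DATE))) %>%
    count(EVTYPE, beginYear, name = "countOfEvents")

print(eventsPerYear)
```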

As can be seen in the plot below, apart from those related to excessive heat, most critical events appear to be rather sporadic.

library(ggplot2)

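# qplot() is deprecated in recent ggplot2 releases; ggplot() draws the same faceted scatter plot.
ggplot(criticalEventsData,
       aes(x=BGN_DATE, y=countOfEvents, colour=EVTYPE)) +
    geom_point() +
    facet_grid(EVTYPE ~ .) +
    xlab("\nBegin Date") + ylab("Number of Events\n")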

Figure: number of events by begin date, one panel per critical event type.

The concentration of hurricane-type events around 2005 suggests that these observations are related to Hurricane Katrina.

2.2. Which types of events have the greatest economic consequences?

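Before starting, we need to process the data to make the damage figures comparable. The PROPDMGEXP and CROPDMGEXP columns hold a magnitude code for PROPDMG and CROPDMG: H for hundreds, K for thousands, M for millions, and B for billions.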

damageData <- stormData

# Multiplier for each exponent code; values with any other code are left unscaled.
multipliers <- c(H=10^2, K=10^3, M=10^6, B=10^9)
scaleFor <- function(exponent) {
    m <- unname(multipliers[as.character(exponent)])
    ifelse(is.na(m), 1, m)
}

damageData$PROPDMG <- damageData$PROPDMG * scaleFor(damageData$PROPDMGEXP)
damageData$CROPDMG <- damageData$CROPDMG * scaleFor(damageData$CROPDMGEXP)
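The scaling above matches only the uppercase codes H, K, M and B. If the exponent columns also contain lowercase or empty codes (an assumption worth checking with table() on the real data), the affected values stay unscaled. The sketch below uses hypothetical rows to show one way to inspect the codes and to treat lowercase letters as the same magnitudes. That treatment is an assumption, not part of the original analysis, and applying it would change the totals reported here.

```r
# Hypothetical rows; the lowercase "k" and the empty code are assumptions
# about what the exponent column might contain.
toyDamage <- data.frame(
    PROPDMG    = c(2.5, 10, 3, 7),
    PROPDMGEXP = c("B", "K", "k", ""),
    stringsAsFactors = FALSE
)

# Tabulate the codes first to see which ones need handling.
print(table(toyDamage$PROPDMGEXP))

multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Look up each code after upper-casing; unmatched codes fall back to 1.
mult <- unname(multipliers[toupper(toyDamage$PROPDMGEXP)])
mult[is.na(mult)] <- 1

toyDamage$propDmgValue <- toyDamage$PROPDMG * mult
print(toyDamage)
```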

Once we have done this, we can proceed to the aggregation.

damageData <- damageData %>%
                  group_by(EVTYPE) %>%
                      summarize(propDmg=sum(PROPDMG),
                                cropDmg=sum(CROPDMG)) %>%
                          mutate(totalDmg=propDmg+cropDmg) %>%
                              arrange(desc(totalDmg))

totalDamageTable <- kable(head(damageData, n=10),
                          format="html",
                          digits=0,
                          row.names=TRUE,
                          caption="Top-10 Events by Total Damage",
                          col.names=c("Event Type",
                                    "Property Damage", 
                                    "Crop Damage", 
                                    "Total Damage"))

The ten most damaging event types are shown below.

Top-10 Events by Total Damage
	Event Type	Property Damage	Crop Damage	Total Damage
1	FLOOD	144657709807	5661968450	150319678257
2	HURRICANE/TYPHOON	69305840000	2607872800	71913712800
3	TORNADO	56925660790	414953270	57340614060
4	STORM SURGE	43323536000	5000	43323541000
5	HAIL	15727367548	3025537890	18752905438
6	FLASH FLOOD	16140812067	1421317100	17562129167
7	DROUGHT	1046106000	13972566000	15018672000
8	HURRICANE	11868319010	2741910000	14610229010
9	RIVER FLOOD	5118945500	5029459000	10148404500
10	ICE STORM	3944927860	5022113500	8967041360
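To relate the table to crop losses specifically, its rows can be re-ranked by crop damage. The sketch below copies the figures from the table above. It covers only these ten event types, so types outside the top 10 by total damage could still have larger crop losses. The cropShare column is introduced for the example.

```r
# Figures copied from the table above (top 10 by total damage only).
topDamage <- data.frame(
    eventType = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "STORM SURGE", "HAIL",
                  "FLASH FLOOD", "DROUGHT", "HURRICANE", "RIVER FLOOD", "ICE STORM"),
    cropDmg   = c(5661968450, 2607872800, 414953270, 5000, 3025537890,
                  1421317100, 13972566000, 2741910000, 5029459000, 5022113500),
    totalDmg  = c(150319678257, 71913712800, 57340614060, 43323541000, 18752905438,
                  17562129167, 15018672000, 14610229010, 10148404500, 8967041360),
    stringsAsFactors = FALSE
)

# Share of each type's total damage that falls on crops.
topDamage$cropShare <- topDamage$cropDmg / topDamage$totalDmg

# Re-rank the same ten rows by crop damage instead of total damage.
byCrop <- topDamage[order(-topDamage$cropDmg), ]
print(byCrop, row.names = FALSE)
```

Among these ten types, DROUGHT and FLOOD have the largest crop damage, and crop losses make up most of the total for DROUGHT but only a small fraction for FLOOD.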