Considering both severity and frequency, heat-related events are the most critical in terms of population health.
Heat-related events, however, rarely cause large economic losses.
In terms of the total number of casualties and people affected, tornadoes are by far the most relevant event type to address. Tornadoes are also costly.
The most severe events in terms of health are hurricanes, which have the highest number of casualties per event by a substantial margin.
Hurricanes are one of the top events in terms of economic cost, along with floods, among others.
Additionally, floods and droughts are particularly dangerous for crops.
Relative figures are influenced by Hurricane Katrina, the costliest natural disaster and one of the deadliest hurricanes in US history.
The numbers confirm Katrina as the most extreme single event of the period in terms of population health.
To download the data, we pass the link provided on the assignment page to the download.file() function.
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(url, destfile="stormData.csv.bz2", method="curl")
Although the file is compressed with the bzip2 algorithm, we can read it into R directly with the read.csv() function.
stormData <- read.csv("stormData.csv.bz2")
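The direct read works because read.csv() opens the path through R's file() connection, which detects bzip2 compression from the file contents. A minimal sketch of that round trip, using a small made-up data frame (the `toy` object and the temporary file are illustrative assumptions, not part of the storm data):

```r
# Toy data standing in for the storm data (hypothetical values)
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))

# Write the CSV through a bzip2-compressing connection
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() needs no extra arguments to read the compressed file back
roundTrip <- read.csv(path)
print(roundTrip)
```

The same call pattern applies to the downloaded file, which is simply much larger and therefore slower to read.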
We will use the dplyr package to manipulate data throughout the analysis.
library(dplyr)
The first thing we need to do is group our data by event type so we can aggregate it later on.
stormDataByEvent <- stormData %>%
group_by(EVTYPE)
Then we compute the total number of fatalities and injuries caused by each event type over the years.
eventHealthImpact <- stormDataByEvent %>%
summarize(totalFatalities=sum(FATALITIES),
totalInjuries=sum(INJURIES),
countOfEvents=n()) %>%
mutate(totalCasualties=totalFatalities+totalInjuries)
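To make the aggregation step concrete, here is a small sketch on made-up records (the `toy` data frame and its values are hypothetical). It applies the same group, summarize and mutate sequence, so the result can be checked by hand.

```r
library(dplyr)

# Hypothetical records: two tornado reports and one flood report
toy <- data.frame(EVTYPE = c("TORNADO", "TORNADO", "FLOOD"),
                  FATALITIES = c(2, 0, 1),
                  INJURIES = c(10, 5, 0))

toySummary <- toy %>%
  group_by(EVTYPE) %>%
  summarize(totalFatalities = sum(FATALITIES),
            totalInjuries = sum(INJURIES),
            countOfEvents = n()) %>%
  mutate(totalCasualties = totalFatalities + totalInjuries)

# One row per event type: FLOOD has 1 casualty, TORNADO has 2 + 15 = 17
print(toySummary)
```

Note that sum() returns NA if a column contains missing values, unless na.rm=TRUE is passed.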
Now we can rank the events by their total number of casualties, in absolute terms.
eventHealthImpact <- eventHealthImpact %>%
arrange(desc(totalCasualties))
library(knitr)  # provides kable()
totalCasualtiesTable <- kable(head(eventHealthImpact, n=10),
format="html",
digits=0,
row.names=TRUE,
caption="Top-10 Events by Total Casualties",
col.names=c("Event Type",
"Total Fatalities",
"Total Injuries",
"Number of Events",
"Total Casualties"))
The table below shows the top-10 events by total casualties over the period.
| | Event Type | Total Fatalities | Total Injuries | Number of Events | Total Casualties |
|---|---|---|---|---|---|
| 1 | TORNADO | 5633 | 91346 | 60652 | 96979 |
| 2 | EXCESSIVE HEAT | 1903 | 6525 | 1678 | 8428 |
| 3 | TSTM WIND | 504 | 6957 | 219940 | 7461 |
| 4 | FLOOD | 470 | 6789 | 25326 | 7259 |
| 5 | LIGHTNING | 816 | 5230 | 15754 | 6046 |
| 6 | HEAT | 937 | 2100 | 767 | 3037 |
| 7 | FLASH FLOOD | 978 | 1777 | 54277 | 2755 |
| 8 | ICE STORM | 89 | 1975 | 2006 | 2064 |
| 9 | THUNDERSTORM WIND | 133 | 1488 | 82563 | 1621 |
| 10 | WINTER STORM | 206 | 1321 | 11433 | 1527 |
However, the total number of casualties may not be the best indicator for the severity of each event.
In order to study the severity of each type of event, we have to look at relative instead of absolute values.
eventHealthImpact <- eventHealthImpact %>%
mutate(casualtiesPerEvent=totalCasualties/countOfEvents) %>%
filter(countOfEvents>10) %>%
arrange(desc(casualtiesPerEvent))
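The effect of the relative measure and of the minimum-count filter can be seen in a small sketch with made-up summary rows (the event types "A", "B" and "C" and their figures are hypothetical):

```r
library(dplyr)

# Hypothetical summary rows: a rare but severe type (A), a frequent mild
# type (B), and a type reported only once (C)
toySummary <- data.frame(EVTYPE = c("A", "B", "C"),
                         totalCasualties = c(300, 1000, 50),
                         countOfEvents = c(20, 5000, 1))

toyRanking <- toySummary %>%
  mutate(casualtiesPerEvent = totalCasualties / countOfEvents) %>%
  filter(countOfEvents > 10) %>%
  arrange(desc(casualtiesPerEvent))

# A (15 per event) now ranks above B (0.2 per event); C is dropped
print(toyRanking)
```

Type B leads in absolute terms, but type A leads once casualties are divided by the number of events. Type C would top the ranking with 50 casualties per event, yet a single report is too little evidence, which is why types with 10 or fewer events are filtered out.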
The following code builds the table summarizing the results.
relativeCasualtiesTable <- kable(head(eventHealthImpact, n=10),
format="html",
digits=0,
row.names=TRUE,
caption="Top-10 Events by Casualties per Event",
col.names=c("Event Type",
"Total Fatalities",
"Total Injuries",
"Number of Events",
"Total Casualties",
"Casualties per Event"))
Below are the top-10 events by casualties per event.
| | Event Type | Total Fatalities | Total Injuries | Number of Events | Total Casualties | Casualties per Event |
|---|---|---|---|---|---|---|
| 1 | HURRICANE/TYPHOON | 64 | 1275 | 88 | 1339 | 15 |
| 2 | EXTREME HEAT | 96 | 155 | 22 | 251 | 11 |
| 3 | TSUNAMI | 33 | 129 | 20 | 162 | 8 |
| 4 | GLAZE | 7 | 216 | 32 | 223 | 7 |
| 5 | HEAT WAVE | 172 | 309 | 74 | 481 | 6 |
| 6 | EXCESSIVE HEAT | 1903 | 6525 | 1678 | 8428 | 5 |
| 7 | HEAT | 937 | 2100 | 767 | 3037 | 4 |
| 8 | ICE | 6 | 137 | 61 | 143 | 2 |
| 9 | UNSEASONABLY WARM AND DRY | 29 | 0 | 13 | 29 | 2 |
| 10 | SNOW SQUALL | 2 | 35 | 19 | 37 | 2 |
Remember that a small number of observations can be a consequence of missing data and does not necessarily imply low frequency.
This is extremely relevant when evaluating risk.
Let's take a look at the distribution of these critical events over time.
library(lubridate)
stormData$BGN_DATE <- mdy_hms(stormData$BGN_DATE)
min(stormData$BGN_DATE)
## [1] "1950-01-03 UTC"
criticalEventsList <- eventHealthImpact[1:10, 1]$EVTYPE
criticalEventsData <- stormData %>%
filter(EVTYPE %in% criticalEventsList) %>%
group_by(EVTYPE, BGN_DATE) %>%
summarize(countOfEvents=n())
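As a rough illustration of the date parsing and the per-date counting, the sketch below uses made-up records (the `toy` data frame is hypothetical). It assumes the raw dates use a month/day/year layout with a time part, which is the layout mdy_hms() expects.

```r
library(dplyr)
library(lubridate)

# Hypothetical records in month/day/year hour:minute:second text form
toy <- data.frame(EVTYPE = c("HEAT", "HEAT", "TSUNAMI"),
                  BGN_DATE = c("7/12/1995 0:00:00",
                               "7/12/1995 0:00:00",
                               "3/11/2011 0:00:00"))

# Convert the text dates to date-time values
toy$BGN_DATE <- mdy_hms(toy$BGN_DATE)

# Count the reports per event type and begin date
toyCounts <- toy %>%
  group_by(EVTYPE, BGN_DATE) %>%
  summarize(countOfEvents = n(), .groups = "drop")

print(toyCounts)
```

Each row of the result corresponds to one point in a plot of counts over time. Here that would be one point for the two HEAT reports on the same day and one for the TSUNAMI report.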
As can be seen in the plot below, apart from those related to excessive heat, most critical events appear to be rather sporadic.
library(ggplot2)
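# qplot() is deprecated as of ggplot2 3.4.0; this is the equivalent ggplot() call
ggplot(criticalEventsData,
       aes(x=BGN_DATE, y=countOfEvents, colour=EVTYPE)) +
  geom_point() +
  facet_grid(EVTYPE ~ .) +
  xlab("\nBegin Date") + ylab("Number of Events\n")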
The concentration of hurricane-type events around 2005 suggests that these observations are related to Hurricane Katrina.
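Before starting, we need to process the damage data to make it comparable: the PROPDMG and CROPDMG figures must be scaled by the magnitude codes in PROPDMGEXP and CROPDMGEXP (H for hundreds, K for thousands, M for millions, B for billions).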
damageData <- stormData
# Multipliers for the magnitude codes handled here; rows with any other code are left unscaled
multipliers <- c(B=10^9, M=10^6, K=10^3, H=10^2)
for (code in names(multipliers)) {
  # Scale property damage for rows carrying this code
  propRows <- which(damageData$PROPDMGEXP==code)
  damageData$PROPDMG[propRows] <- damageData$PROPDMG[propRows]*multipliers[[code]]
  # Scale crop damage for rows carrying this code
  cropRows <- which(damageData$CROPDMGEXP==code)
  damageData$CROPDMG[cropRows] <- damageData$CROPDMG[cropRows]*multipliers[[code]]
}
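The scaling arithmetic can be checked on a few made-up values. The sketch below uses a named lookup vector, which is one possible way to express the same rule. The `dmg` and `dmgExp` vectors are hypothetical, and codes outside H, K, M and B are assumed to leave the value unscaled, as in the processing step above.

```r
# Multipliers for the four magnitude codes handled in the analysis
multipliers <- c(H = 10^2, K = 10^3, M = 10^6, B = 10^9)

# Hypothetical damage figures and their magnitude codes
dmg <- c(2.5, 10, 1.2, 7)
dmgExp <- c("M", "K", "B", "")

# Look up each code; unmatched codes (here the empty string) give NA,
# which is replaced by a factor of 1 so the value stays unscaled
factorByRow <- unname(multipliers[dmgExp])
factorByRow[is.na(factorByRow)] <- 1

dmgScaled <- dmg * factorByRow
print(dmgScaled)
```

For instance, a figure of 2.5 with code "M" becomes 2,500,000, while the figure with an empty code stays at 7.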
Once this is done, we can proceed to the aggregation.
damageData <- damageData %>%
group_by(EVTYPE) %>%
summarize(propDmg=sum(PROPDMG),
cropDmg=sum(CROPDMG)) %>%
mutate(totalDmg=propDmg+cropDmg) %>%
arrange(desc(totalDmg))
totalDamageTable <- kable(head(damageData, n=10),
format="html",
digits=0,
row.names=TRUE,
caption="Top-10 Events by Total Damage",
col.names=c("Event Type",
"Property Damage",
"Crop Damage",
"Total Damage"))
Below are the top-10 most damaging types of events.
| | Event Type | Property Damage | Crop Damage | Total Damage |
|---|---|---|---|---|
| 1 | FLOOD | 144657709807 | 5661968450 | 150319678257 |
| 2 | HURRICANE/TYPHOON | 69305840000 | 2607872800 | 71913712800 |
| 3 | TORNADO | 56925660790 | 414953270 | 57340614060 |
| 4 | STORM SURGE | 43323536000 | 5000 | 43323541000 |
| 5 | HAIL | 15727367548 | 3025537890 | 18752905438 |
| 6 | FLASH FLOOD | 16140812067 | 1421317100 | 17562129167 |
| 7 | DROUGHT | 1046106000 | 13972566000 | 15018672000 |
| 8 | HURRICANE | 11868319010 | 2741910000 | 14610229010 |
| 9 | RIVER FLOOD | 5118945500 | 5029459000 | 10148404500 |
| 10 | ICE STORM | 3944927860 | 5022113500 | 8967041360 |