Mt. Everest, the highest mountain on the planet at 8,849 meters above sea level, attracts many climbers. The first people to conquer the mountain were Sir Edmund Percival Hillary and Tenzing Norgay, on 29th May 1953. Since then, thousands have successfully climbed it. However, a substantial number of climbers succumb to accidents and altitude sickness. According to the records, over 330 climbers have died so far. In the recent past, only 1977 and 2020 (the latter due to the closure occasioned by the COVID-19 pandemic) have passed without a climber dying (R Core Team 2022).
In this analysis, I explore data from Wikipedia on the recorded number of deaths among climbers of Mt. Everest. In particular, I seek answers to the following questions.
I start by reading in the data. I also fill in the one missing date of death. This record concerns Maurice Wilson, who attempted to climb the mountain in 1934. His last diary entry was dated 31st May 1934, so we can reasonably presume that he died in early June. In this case, I set the date of death to June 1, 1934, so that the year of death is recorded as 1934. The other details remain as NA (2024).
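## Load the packages used in this analysis
library(tidyverse)   # dplyr, ggplot2, readr, forcats
library(rvest)       # read_html(), html_nodes(), html_table()
library(janitor)     # clean_names()
library(lubridate)   # mdy(), wday(), month(), year()
library(kableExtra)  # kbl(), kable_classic()

## Scrape the table, clean it, and save it as a CSV
read_html("https://en.wikipedia.org/wiki/List_of_people_who_died_climbing_Mount_Everest") %>%
  html_nodes("table") %>%
  html_table() %>%
  .[[2]] %>%                      # keep the second table on the page
  clean_names() %>%
  # Fill in the presumed date of death for Maurice Wilson
  mutate(date = case_when(
    name == "Maurice Wilson" ~ "June 1, 1934",
    TRUE ~ date
  )) %>%
  mutate(date = mdy(date)) %>%    # parse "Month day, year" strings into dates
  mutate(day_of_week = lubridate::wday(date, label = TRUE),
         month = lubridate::month(date, label = TRUE),
         year_d = lubridate::year(date)) %>%
  write_csv("everest.csv")
#### ============
## Read in the CSV
everest_data <- read_csv("everest.csv", na = "")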
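The date handling above can be tried on a small, made-up table before running it on the scraped data. The sketch below is a rough base R equivalent of the fill-in and date-parsing steps, written in base R so that it runs without the packages loaded above. The second row and the `%B %d, %Y` format string are assumptions for illustration and do not come from the Wikipedia table.

```r
# Toy stand-in for the scraped table (the second row is invented).
toy <- data.frame(
  name = c("Maurice Wilson", "Example Climber"),
  date = c(NA, "May 10, 1996"),
  stringsAsFactors = FALSE
)

# Fill in the presumed date of death for the one incomplete record.
toy$date[toy$name == "Maurice Wilson"] <- "June 1, 1934"

# Parse "Month day, year" strings; %B assumes English month names in the locale.
toy$date <- as.Date(toy$date, format = "%B %d, %Y")

# Derive the year of death from the parsed date.
toy$year_d <- as.integer(format(toy$date, "%Y"))

print(toy)
```

If any date fails to parse, it shows up as NA in the date column, which makes a quick `anyNA(toy$date)` check a cheap safeguard before writing the CSV.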
The data consists of 332 rows and 12 variables. The main variables contained in the dataset are:
Variables Description
Variable | Data type | Description |
---|---|---|
Name | Character | Name of the victim. |
Date | Date/Time | Date of death. |
Age | Integer | Age of the victim in years at the time of death. |
Expedition | Character | The climbing expedition that the victim belonged to, if any. |
Nationality | Character | Nationality of the victim. |
Cause of death | Character | The victim’s cause of death. |
Location | Character | Approximate location of death. |
Day of Week | Character | Day of the week on which the victim died. |
Month | Character | The month in which the victim died. |
Year_d | Integer | Year of the victim’s death. |
The dataset has a substantial number of missing data points, especially for the status of the remains, the age of the climbers, and their expeditions.
## Count the missing values in each column
everest_data %>%
  sapply(., is.na) %>%
  colSums() %>%
  tibble(variables = names(everest_data), missing = .) %>%
  arrange(desc(missing)) %>%
  kbl(booktabs = TRUE,
      caption = "Missing Data") %>%
  kable_classic(full_width = FALSE)
Missing Data
variables | missing |
---|---|
remains_status | 256 |
age | 128 |
expedition | 35 |
location | 17 |
cause_of_death | 9 |
nationality | 3 |
name | 0 |
date | 0 |
refs | 0 |
day_of_week | 0 |
month | 0 |
year_d | 0 |
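The same kind of count can be sketched in base R, since `is.na()` on a data frame returns a logical matrix whose column sums are the missing counts. The three-row table below is invented purely to show the mechanics; it is not a sample of the Everest data.

```r
# Invented table with a few missing values.
toy <- data.frame(
  name = c("A", "B", "C"),
  age = c(34L, NA, NA),
  expedition = c(NA, "X", "Y"),
  stringsAsFactors = FALSE
)

# Column-wise count of NA values, largest first.
missing <- sort(colSums(is.na(toy)), decreasing = TRUE)

# Arrange the result in the same two-column layout as the table above.
missing_tbl <- data.frame(variables = names(missing),
                          missing = unname(missing))

print(missing_tbl)
```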
Certain dates had high casualties due to extreme events. In this section, I examine these dates.
## Ten dates with the most deaths, with the first recorded cause on each date
everest_data %>%
  group_by(date) %>%
  summarise(casualty_count = n(),
            cause = first(cause_of_death)) %>%
  arrange(desc(casualty_count)) %>%
  head(10) %>%
  kbl(booktabs = TRUE,
      caption = "Deadly Days") %>%
  kable_classic(full_width = FALSE)
date | casualty_count | cause |
---|---|---|
2015-04-25 | 17 | Base Camp avalanche following the April 2015 Nepal earthquake |
2014-04-18 | 15 | 2014 Mount Everest Avalanche |
1996-05-11 | 8 | Suspected HACE (high-altitude cerebral edema), exhaustion, frostbite and exposure. |
1922-06-07 | 7 | Avalanche |
1974-09-09 | 6 | Avalanche |
1970-04-05 | 5 | Avalanche |
2007-05-17 | 5 | NA |
1985-10-11 | 4 | Exposure |
1988-10-17 | 4 | Disappearance (likely accidental death during descent after reaching South Summit with Jozef Just rejoining group after he summited Everest solo)[76][77] |
1989-05-27 | 4 | Avalanche |
Here, avalanches appear to be notorious for killing climbers. But are avalanches, like earthquakes, unpredictable?
In the rest of the analysis, I take these extreme events into account, as they may skew our observations.
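One way to keep track of such events in later steps is to tag every record with the number of deaths recorded on its date. The base R sketch below does this on invented records; the dates, the causes, and the threshold of three deaths are all hypothetical choices made for illustration.

```r
# Invented records: one row per death.
toy <- data.frame(
  date = c("2001-05-01", "2001-05-01", "2001-05-01", "2002-04-10", "2003-10-20"),
  cause_of_death = c("Avalanche", "Avalanche", "Avalanche", "Fall", "Exposure"),
  stringsAsFactors = FALSE
)

# Attach the number of deaths recorded on each row's date.
toy$deaths_on_date <- ave(rep(1L, nrow(toy)), toy$date, FUN = sum)

# Flag dates at or above a hypothetical threshold for an "extreme event".
extreme_threshold <- 3L
toy$extreme_event <- toy$deaths_on_date >= extreme_threshold

print(toy)
```

With a flag like this in place, any later summary can be run twice, once on all rows and once on the rows where the flag is FALSE, to see how much the extreme events drive the result.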
Nationals of Nepal account for the highest number of casualties. This observation is not surprising given that most of the mountain climbing guides are from the country. Outside Nepal, India, Japan, the United Kingdom, the United States, and South Korea have the highest fatalities.
## Deaths by nationality, ordered by count
everest_data %>%
  count(nationality) %>%
  mutate(nationality = factor(nationality)) %>%
  mutate(nationality = fct_reorder(nationality, n, max)) %>%
  ggplot(aes(x = nationality, y = n)) +
  geom_col() +
  coord_flip()
However, there are several cases where a large number of climbers died due to an extreme event.
The deadliest months for climbers on Mt. Everest are May, April, September, October, and June, in that order. These months also correspond to the peak season for climbing the mountain, owing to extreme weather during the rest of the year. Because the data does not include the total number of climbers in a given month, it is not possible to compute the rate of death. However, it would appear that any attempt to climb the mountain outside of these months would result in a higher casualty rate.
## Deaths by month, ordered by count
everest_data %>%
  count(month) %>%
  mutate(month = fct_reorder(month, n, max)) %>%
  ggplot(mapping = aes(x = month, y = n)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL,
       y = NULL,
       title = "Deaths Among Climbers by Month")
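To make the distinction between death counts and death rates concrete, here is a minimal base R sketch of how a monthly rate could be computed if climber totals were available. Both the `deaths` and the `climbers` figures below are invented; the dataset used in this article contains no climber totals.

```r
# Hypothetical monthly figures (not taken from the Everest data).
monthly <- data.frame(
  month = c("Apr", "May", "Oct"),
  deaths = c(10, 40, 5),
  climbers = c(500, 4000, 100)
)

# Deaths per 100 climbers in each month.
monthly$deaths_per_100 <- 100 * monthly$deaths / monthly$climbers

# Order months from the highest to the lowest rate.
monthly <- monthly[order(-monthly$deaths_per_100), ]

print(monthly)
```

In this made-up example the month with the most deaths has the lowest rate, which is why counts alone cannot settle which month is the most dangerous.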
Saturdays, followed by Fridays, have the highest fatalities, which could be because these are the most popular climbing days among climbers.
## Deaths by day of the week
everest_data %>%
  count(day_of_week) %>%
  mutate(day_of_week = fct_reorder(day_of_week, n, max)) %>%
  ggplot(mapping = aes(x = day_of_week, y = n)) +
  geom_col()
Moreover, it is possible that extreme weather and other natural events like earthquakes could skew these numbers. Hence, I eliminate those days that had more than three deaths and plot the figures again.
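## Deaths by day of the week, excluding dates with more than three deaths
everest_data %>%
  group_by(date) %>%
  filter(n() <= 3) %>%
  ungroup() %>%   # drop the date grouping so counts are per day of the week
  count(day_of_week) %>%
  mutate(day_of_week = fct_reorder(day_of_week, n, max)) %>%
  ggplot(mapping = aes(x = day_of_week, y = n)) +
  geom_col()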
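Whether the differences between days are larger than chance alone would produce can be checked with a chi-squared goodness-of-fit test. The sketch below uses `chisq.test()` from base R on invented counts, not the figures from this dataset, and assumes as the null hypothesis that a death is equally likely on any day of the week.

```r
# Invented death counts per day of the week.
day_counts <- c(Sun = 40, Mon = 42, Tue = 38, Wed = 41,
                Thu = 39, Fri = 55, Sat = 60)

# Goodness-of-fit test; with a single vector of counts, chisq.test()
# compares them against equal probabilities (1/7 for each day).
test <- chisq.test(day_counts)

print(test$statistic)  # chi-squared statistic
print(test$parameter)  # degrees of freedom (number of days minus one)
print(test$p.value)    # a small value suggests the days differ
```

The test treats each death as an independent observation. Deaths from a single avalanche are not independent, which is one more reason to drop the mass-casualty dates before running a test like this.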
In this article, I have explored the deaths of people attempting to scale Mt. Everest. Avalanches are the riskiest events, contributing to the bulk of deaths. Citizens of Nepal form the bulk of casualties, followed by Indians. The month of May has the highest casualties, probably because it is the month in which most people attempt to climb. Fridays and Saturdays record more deaths among climbers than the other days of the week. With data on the total number of climbers, we could examine the rates of death more deeply.