The Himalayan Database is a compilation of records for all expeditions that have climbed in the Nepal Himalaya. The data covers all expeditions from 1905 through Spring 2019 to more than 465 significant peaks in Nepal. Also included are expeditions to both sides of border peaks such as Everest, Cho Oyu, Makalu and Kangchenjunga, as well as to some smaller border peaks. Data on expeditions to trekking peaks are included for early attempts, first ascents and major accidents.

Load the weekly Data

Download the weekly data and make available in the tt object.

tt <- tt_load("2020-09-22")

## 
##  Downloading file 1 of 3: `peaks.csv`
##  Downloading file 2 of 3: `members.csv`
##  Downloading file 3 of 3: `expeditions.csv`
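
The call above, and the code in the later sections, uses functions from several packages without showing where they are attached. The setup below is a sketch inferred from the function names that appear in this analysis (`tt_load()`, `fish()`, `scale_fill_fish()`, `label_percent()`, `tidy()`, `add_ebb_estimate()`). The original setup chunk is not shown, so treat the exact list as an assumption.

```r
# Assumed setup, inferred from the functions called in this analysis.
library(tidyverse)     # dplyr, ggplot2, tidyr, stringr, forcats
library(tidytuesdayR)  # tt_load()
library(scales)        # label_percent(), label_number(), percent_format()
library(fishualize)    # fish(), scale_fill_fish()
library(broom)         # tidy()
library(ebbr)          # add_ebb_estimate(); typically installed from GitHub (dgrtwo/ebbr)
```

`lubridate` is called later with the `lubridate::` prefix, so it only needs to be installed, not attached.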

Describe: The peaks data records the height (in metres) and climbing status of each of the more than 465 peaks.

Expeditions

na_reasons <- c("Unknown", "Attempt rumoured", "Did not attempt climb", "Did not reach base camp")

expeditions <- tt$expeditions %>%
  mutate(success = case_when(str_detect(termination_reason, "Success") ~ "Success",
                             termination_reason %in% na_reasons ~ "Other",
                             TRUE ~ "Failure")) %>%
  mutate(days_to_highpoint = as.integer(highpoint_date - basecamp_date))
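
To make the two derived columns concrete, here is a small base-R sketch of the same logic. The dates are those of one Everest expedition listed later in this analysis; the three termination reasons are picked from the table of reasons for illustration. The nested `ifelse()` is a stand-in for `case_when()`, not the code used above.

```r
# Subtracting two Dates gives a "difftime" in days; as.integer() keeps the day count.
basecamp_date  <- as.Date("2019-05-19")
highpoint_date <- as.Date("2019-05-22")
days_to_highpoint <- as.integer(highpoint_date - basecamp_date)
days_to_highpoint

# Same three-way classification as the case_when() call, written with ifelse().
na_reasons <- c("Unknown", "Attempt rumoured", "Did not attempt climb", "Did not reach base camp")
reasons <- c("Success (main peak)", "Unknown", "Lack of time", "Other")
outcome <- ifelse(grepl("Success", reasons), "Success",
                  ifelse(reasons %in% na_reasons, "Other", "Failure"))
outcome
```

Note that the termination reason literally named "Other" is not in `na_reasons`, so this logic labels it "Failure".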

Describe: With this information we can explore the distribution of climb duration by height or over time; the fraction of successful climbs per month or year; the death rate over time per mountain (for all members or for hired staff); whether oxygen was used; death rates by mountain, age and cause; and the rate of injury. We can also look at the correlation between the frequency of expeditions and the death rate.

Tallest peaks in the Himalayas (arranging and mutating)

peaks <- tt$peaks
peaks %>%
  arrange(desc(height_metres)) %>%
  head(25) %>%
  mutate(peak_name = fct_reorder(peak_name, height_metres)) %>%
  ggplot(aes(height_metres, peak_name, fill = climbing_status)) +
  geom_col() +
  labs(title = "Tallest peaks in the Himalayas", x = "Height (meters)", y = "", fill = "")

Describe: Here we see that Mount Everest is the tallest of them all (8,848 m), followed by Kangchenjunga (8,586 m) and Lhotse (8,516 m).

Another way to look at peak heights is with a histogram:

peaks %>%
  ggplot(aes(height_metres)) +
  geom_histogram(binwidth = 200,  fill = fish(1, option = "Centropyge_loricula", begin = 0, end = 1,  direction = -1), alpha = 0.8) +
  annotate("text", 8450, 17, label = "Mount Everest", family = "Bahnschrift") +
  annotate("curve", x = 8500, y = 15, xend = 8775, yend = 2, curvature = -0.25, 
           arrow = arrow(length = unit(2, "mm"))) +
  labs(title = "How tall are Himalayan peaks?",
    caption = "Source: The Himalayan Database",
    x = "Height (m)",
    y = "Number of peaks") +
  theme(text = element_text(family = "Bahnschrift"))

Answer: Mount Everest is the tallest at almost 9,000 metres. Most of the other peaks are between 6,000 and 7,000 metres.

View expeditions by members

Describe: This shows us the distribution of members across categories such as gender, citizenship, season and expedition role.

Time to get to the highpoint

expeditions %>%
  count(termination_reason, sort = TRUE)

## # A tibble: 15 x 2
##    termination_reason                                                          n
##    <chr>                                                                   <int>
##  1 Success (main peak)                                                      5581
##  2 Bad weather (storms, high winds)                                         1307
##  3 Bad conditions (deep snow, avalanching, falling ice, or rock)            1097
##  4 Illness, AMS, exhaustion, or frostbite                                    458
##  5 Route technically too difficult, lack of experience, strength, or moti~   438
##  6 Other                                                                     320
##  7 Accident (death or serious injury)                                        299
##  8 Did not attempt climb                                                     233
##  9 Lack (or loss) of supplies or equipment                                   220
## 10 Success (subpeak)                                                         126
## 11 Unknown                                                                    96
## 12 Lack of time                                                               93
## 13 Did not reach base camp                                                    64
## 14 Success (claimed)                                                          20
## 15 Attempt rumoured                                                           12

expeditions %>%
  filter(!is.na(days_to_highpoint), !is.na(peak_name)) %>%
  filter(success == "Success") %>%
  mutate(peak_name = fct_lump(peak_name, 10),
         peak_name = fct_reorder(peak_name, days_to_highpoint)) %>%
  ggplot(aes(days_to_highpoint, peak_name)) +
  geom_boxplot() +
  labs(x = "Days from basecamp to highpoint",
       y = "",
       title = "How long does it take to get to the high point?",
       subtitle = "Successful climbs only")

Answer: It takes 75 days from basecamp to the highpoint of Mount Everest.

Summarizing expeditions with a helper function

# Summarize a (possibly grouped) table of expeditions:
# counts, success rate, member and staff totals, and death rates.
summarize_expeditions <- function(tbl) {
  tbl %>%
    summarize(n_climbs = n(),
              pct_success = mean(success == "Success"),
              across(members:hired_staff_deaths, sum),
              first_climb = min(year)) %>%
    mutate(pct_death = member_deaths / members,
           pct_hired_staff_deaths = hired_staff_deaths / hired_staff)
}
peaks_summarized <- expeditions %>%
  group_by(peak_id, peak_name) %>%
  summarize_expeditions() %>%
  ungroup() %>%
  arrange(desc(n_climbs)) %>%
  inner_join(peaks %>% select(peak_id, height_metres), by = "peak_id")
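
Because `summarize_expeditions()` sums members and deaths within each group before dividing, `pct_death` is a pooled rate (total deaths over total members), not an average of per-expedition rates. The sketch below shows the difference with two invented expeditions; the numbers are hypothetical and not taken from the data.

```r
# Two hypothetical expeditions to the same peak (invented numbers).
members       <- c(10, 2)
member_deaths <- c(0, 1)

# What pct_death computes: total deaths divided by total members.
pooled_rate <- sum(member_deaths) / sum(members)

# A different quantity: the mean of each expedition's own death rate.
mean_of_rates <- mean(member_deaths / members)

c(pooled_rate = pooled_rate, mean_of_rates = mean_of_rates)
```

The pooled rate weights every climber equally, so a tiny expedition with one death does not dominate the estimate.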

Visualize

Histogram showing the death rates

expeditions %>%
  # days_to_highpoint was already computed as an integer when expeditions was built
  filter(!is.na(days_to_highpoint)) %>%
  ggplot(aes(days_to_highpoint)) +
  geom_histogram()

Answer: As the high point increases, the death rate increases, which suggests that only a few members are likely to reach the top.

Top five causes of death while climbing.

tt$members %>%
  filter(died, year >= 1970) %>%
  mutate(death_cause = fct_lump(death_cause, 5)) %>%
  count(decade = year %/% 10 * 10, death_cause) %>%
  complete(decade, death_cause, fill = list(n = 0)) %>%
  group_by(decade) %>%
  mutate(pct = n / sum(n)) %>%
  ungroup() %>%
  mutate(death_cause = fct_reorder(death_cause, n, sum)) %>%
  ggplot(aes(decade, pct, fill = death_cause)) +
  geom_area() +
  scale_x_continuous(labels = label_number(big.mark = "", suffix = "s")) +
  scale_y_continuous(labels = label_percent()) +
  scale_fill_fish(option = "Pseudocheilinus_tetrataenia", discrete = TRUE) +
  labs(title = "Avalanches and falls can be particularly deadly",
    subtitle = "Percent of deaths by cause of death (1970-2019)",
    caption = "Source: The Himalayan Database",
    x = "", y = "",
    fill = "Cause of death") +
  theme_minimal() +
  theme(text = element_text(family = "Bahnschrift"),
        axis.text = element_text(size = 12),
        panel.grid.minor = element_blank())

Answer: Weather and conditions rank highest among the causes of death, with falls and avalanches being particularly deadly. Exhaustion and AMS (Acute Mountain Sickness) have become more prevalent in the last 20 years.
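
The chart above relies on two small computations: `year %/% 10 * 10` rounds a year down to its decade, and `n / sum(n)` within each decade turns counts into shares. A base-R sketch of both follows; the death counts are invented for illustration, and `ave()` stands in for `group_by()` plus `mutate()`.

```r
# Integer division by 10, then multiplying by 10, rounds a year down to its decade.
years  <- c(1978, 1979, 1996, 2014)
decade <- years %/% 10 * 10
decade

# Hypothetical counts of deaths by decade and cause.
deaths <- data.frame(decade = c(1990, 1990, 2010, 2010),
                     cause  = c("Avalanche", "Fall", "Avalanche", "Fall"),
                     n      = c(3, 1, 6, 2))

# Share of each cause within its decade: n divided by the decade total.
deaths$pct <- deaths$n / ave(deaths$n, deaths$decade, FUN = sum)
deaths
```

Expressing deaths as a share of each decade's total keeps decades with few recorded deaths comparable to busier ones.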

How many Himalayan peaks remain unclimbed?

peaks %>%
  ggplot(aes(climbing_status, fill = climbing_status)) +
  geom_bar() +
  scale_fill_fish(option = "Centropyge_loricula", begin = 0,
  end = 1, discrete = TRUE, direction = -1, alpha = 0.8) +
  labs(title = "More than a quarter of Himalayan peaks remain unclimbed",
    caption = "Source: The Himalayan Database",
    x = "", y = "Number of peaks") +
  theme(legend.position = "none", text = element_text(family = "Bahnschrift"))

Answer: Approximately 150 peaks, more than a quarter of the total, remain unclimbed, most likely because the routes are technically very challenging, the weather is very bad, or the peaks are very remote.

Top termination reasons for expeditions to unclimbed peaks

peaks %>%
  filter(climbing_status == "Unclimbed") %>%
  inner_join(expeditions, by = "peak_id") %>%
  count(termination_reason) %>%
  mutate(termination_reason = case_when(
           str_detect(termination_reason, "Route technically") ~ "Route technically too difficult",
           str_detect(termination_reason, "Bad conditions") ~ "Bad conditions",
           TRUE ~ termination_reason),
         termination_reason = fct_reorder(termination_reason, n)) %>%
  ggplot(aes(n, termination_reason, fill = termination_reason)) +
  geom_col() +
  scale_fill_fish(option = "Centropyge_loricula", begin = 0, end = 1, discrete = TRUE) +
  labs(title = "Technical routes a challenge for unclimbed peaks",
    subtitle = "Termination reasons for expeditions to unclimbed peaks",
    caption = "Source: The Himalayan Database",
    x = "Number of expeditions",
    y = "") +
  theme(legend.position = "none",
        text = element_text(family = "Bahnschrift"))

Answer: The top reason that expeditions to unclimbed peaks end is that the route is technically too difficult, followed by bad conditions and bad weather. Some expeditions are recorded as successes, which raises a question: how can an expedition be successful while the peak is still considered unclimbed?

When were Himalayan peaks first climbed?

peaks %>%
  ggplot(aes(first_ascent_year)) +
  geom_histogram(fill = fish(1, option = "Centropyge_loricula", begin = 0, end = 1, direction = -1), alpha = 0.8) +
  theme(text = element_text(family = "Bahnschrift"))

Since the table has a first_ascent_expedition_id field, we can look up the date of that expedition.

peaks %>%
  filter(first_ascent_year < 1000) %>%
  left_join(expeditions %>% select(expedition_id, year),
    by = c("first_ascent_expedition_id" = "expedition_id")) %>%
  select(peak_name, first_ascent_year, first_ascent_expedition_id, year)

## # A tibble: 1 x 4
##   peak_name  first_ascent_year first_ascent_expedition_id  year
##   <chr>                  <dbl> <chr>                      <dbl>
## 1 Sharphu II               201 SPH218301                   2018

Describe: The outlier on the histogram is Sharphu II, whose first ascent year is miscoded as 201. The expedition is from 2018.

Correcting the data

peaks_new <- peaks %>%
  mutate(first_ascent_year = ifelse(peak_id == "SPH2", 2018, first_ascent_year))
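
After a manual correction like this, it can be worth checking that no implausible years remain. The sketch below reproduces the fix on a two-row stand-in table in base R. The Sharphu II values come from the lookup above, the Everest row is added for contrast, and the 1900 cut-off is an arbitrary threshold chosen for illustration.

```r
# Two-row stand-in for the peaks table.
peaks_demo <- data.frame(
  peak_id           = c("SPH2", "EVER"),
  first_ascent_year = c(201, 1953)
)

# Same correction as above: overwrite the year only for the miscoded peak.
peaks_demo$first_ascent_year <- ifelse(peaks_demo$peak_id == "SPH2", 2018,
                                       peaks_demo$first_ascent_year)

# Sanity check: no first ascent should now be dated before 1900 (assumed cut-off).
any(peaks_demo$first_ascent_year < 1900, na.rm = TRUE)
```

Keying the fix on `peak_id` rather than on the bad value means other rows cannot be changed by accident.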

Calculating the first ascents.

peaks_new %>%
  ggplot(aes(first_ascent_year)) +
  geom_histogram(binwidth = 5, fill = fish(1, option = "Centropyge_loricula", direction = -1),
    alpha = 0.8) +
  scale_x_continuous(breaks = seq(1910, 2020, 10)) +
  labs(title = "Climbers summitting peaks for the first time",
       subtitle = "Year of first ascent for Himalayan peaks",
       caption = "Source: The Himalayan Database",
       x = "Year of first ascent (5-year bins)",
       y = "Number of first ascents") +
  theme(text = element_text(family = "Bahnschrift"),
        panel.grid.minor = element_blank())

Answer: Very few first ascents were recorded before 1950, but there was a steep rise after that year. There have also been many first ascents since 2000.

Which countries were involved in first ascents?

top_20_countries <- peaks_new %>%
  filter(!is.na(first_ascent_country)) %>%
  separate_rows(first_ascent_country, sep = ",") %>%
  mutate(first_ascent_country = str_squish(first_ascent_country), 
         first_ascent_country = ifelse(first_ascent_country == "W Germany", "Germany",
      first_ascent_country)) %>%
  count(first_ascent_country, name = "first_ascents", sort = TRUE) %>%
  mutate(first_ascent_country = fct_reorder(first_ascent_country, first_ascents)) %>%
  top_n(20, wt = first_ascents)

top_20_countries %>%
  ggplot(aes(first_ascents, first_ascent_country, fill = first_ascent_country)) +
  geom_col() +
  scale_x_continuous(breaks = seq(0, 150, 25)) +
  scale_fill_fish(option = "Centropyge_loricula", begin = 0, end = 1, discrete = TRUE, direction = -1, alpha = 0.8) +
  labs(title = "Countries Involved",
    subtitle = "First ascents a country's citizen was involved in",
    caption = "Source: The Himalayan Database",
    x = "Number of first ascents",
    y = "") +
  theme(legend.position = "none",
    text = element_text(family = "Bahnschrift"),
    panel.grid.minor = element_blank())

Answer: Nepal and Japan lead the way in first ascents, followed by the UK and France.
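
Since one first ascent can list several countries, the counts above credit each ascent once per country involved, so the bars add up to more than the number of peaks. A base-R sketch of that splitting step follows; the three country strings are made up, and `strsplit()` with `trimws()` stands in for `separate_rows()` and `str_squish()`.

```r
# Made-up values in the style of the first_ascent_country column.
first_ascent_country <- c("Japan, Nepal", "W Germany, Nepal", "UK")

# Split on commas and trim surrounding spaces
# (str_squish() would also collapse repeated internal spaces).
countries <- trimws(unlist(strsplit(first_ascent_country, ",")))

# Same recoding as above: fold "W Germany" into "Germany".
countries[countries == "W Germany"] <- "Germany"

counts <- table(countries)
sort(counts, decreasing = TRUE)
```

Three ascents yield five country credits here, which is why the totals should be read as "ascents a country was involved in" rather than as a partition of all ascents.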

Using percentages lets us compare decades that have different numbers of ascents.

countries_by_decade <- peaks_new %>%
  filter(!is.na(first_ascent_country), first_ascent_year >= 1910) %>%
  separate_rows(first_ascent_country, sep = ",") %>%
  mutate(first_ascent_country = str_squish(first_ascent_country),
    first_ascent_country = ifelse(first_ascent_country == "W Germany", "Germany", first_ascent_country),
    first_ascent_decade = first_ascent_year %/% 10 * 10,
    first_ascent_country = fct_lump(first_ascent_country, 8)) %>%
  count(first_ascent_country, first_ascent_decade, name = "first_ascents") %>%
  group_by(first_ascent_decade) %>%
  mutate(pct_of_ascents = first_ascents / sum(first_ascents)) %>%
  ungroup() %>%
  mutate(first_ascent_country = fct_reorder(first_ascent_country, -first_ascents, sum),
    first_ascent_country = fct_relevel(first_ascent_country, "Other", after = Inf))

countries_by_decade %>%
  ggplot(aes(first_ascent_decade, pct_of_ascents, fill = first_ascent_country)) +
  geom_col() +
  scale_x_continuous(breaks = seq(1930, 2010, 20)) +
  scale_y_continuous(labels = percent_format(accuracy = 1)) +
  scale_fill_fish(option = "Centropyge_loricula", begin = 0, end = 1, discrete = TRUE, 
                  direction = -1) +
  facet_wrap( ~ first_ascent_country) +
  labs(title = "Percent of first ascents involving a country's citizens",
       caption = "Source: The Himalayan Database",
       x = "Decade of first ascent",
       y = "") +
  theme(text = element_text(family = "Bahnschrift"),
        legend.position = "none",
        panel.grid.major.x = element_blank(),
        panel.grid.minor = element_blank(),
        strip.text = element_text(colour = "black"),
        strip.background = element_blank())

Answer: We can infer that Nepal has been involved in first ascents consistently. Japan has also been involved in first ascents, though somewhat less so recently. The UK was involved before 1960. US involvement has been increasing lately. Swiss climbers accounted for a large share of first ascents in the 1940s.

In which season do people climb the most?

expeditions %>%
  mutate(decade = year %/% 10 * 10,
    season = fct_relevel(season, "Spring", "Summer", "Autumn", "Winter")) %>%
  count(decade, season, name = "expeditions") %>%
  ggplot(aes(decade, expeditions, fill = season)) +
  geom_col() +
  scale_x_continuous(breaks = seq(1920, 2010, 10)) +
  scale_fill_fish(option = "Pseudocheilinus_tetrataenia",
                  discrete = TRUE,
                  direction = -1) +
  facet_wrap(~ season) +
  labs(title = "Time of year people climb",
    subtitle = "Number of expeditions (1921-2019)",
    caption = "Source: The Himalayan Database",
    x = "",
    y = "") +
  theme(legend.position = "none",
    text = element_text(family = "Bahnschrift"),
    panel.grid.minor = element_blank(),
    strip.text = element_text(size = 12, colour = "black"),
    strip.background = element_blank())

Answer: Most expeditions take place in spring and autumn. There are almost no expeditions during the summer and very few during the winter.

Expeditions by the month in which they reached their highpoint

expeditions %>%
  filter(!is.na(highpoint_date)) %>%
  mutate(highpoint_month = lubridate::month(highpoint_date, label = TRUE)) %>%
  ggplot(aes(highpoint_month, y = after_stat(prop), group = 1)) +
  geom_bar(stat = "count", fill = "#373142FF", alpha = 0.7) +
  scale_y_continuous(breaks = seq(0, 0.8, 0.1), labels = label_percent(accuracy = 1)) +
  labs(title = "Percent of expeditions by month of reaching their highpoint (1921-2019)",
    caption = "Source: The Himalayan Database",
    x = "Month",
    y = "") +
  theme(text = element_text(family = "Bahnschrift"),
        panel.grid.minor = element_blank())

Answer: The majority of expeditions took place during the month of May.

Share of expeditions by season within each decade

expeditions %>%
  filter(year >= 1960) %>%
  mutate(decade = paste0(year %/% 10 * 10, "s"),
    season = fct_relevel(season, "Spring", "Summer", "Autumn", "Winter")) %>%
  count(decade, season, name = "expeditions") %>%
  group_by(decade) %>%
  mutate(pct_expeditions = expeditions / sum(expeditions)) %>%
  ungroup() %>%
  ggplot(aes(season, pct_expeditions, fill = season)) +
  geom_col() +
  scale_y_continuous(breaks = seq(0, 1, 0.2),
                     labels = label_percent(accuracy = 1)) +
  scale_fill_fish(option = "Pseudocheilinus_tetrataenia",
                  discrete = TRUE,
                  direction = -1) +
  facet_wrap(~ decade) +
  labs(title = "Percent of expeditions by season and decade",
    caption = "Source: The Himalayan Database",
    x = "",
    y = "") +
  theme(legend.position = "none",
    text = element_text(family = "Bahnschrift"),
    panel.grid.minor = element_blank(),
    strip.text = element_text(size = 12, colour = "black"),
    strip.background = element_blank())

Examining death probability per member.

tt$members

## # A tibble: 76,519 x 21
##    expedition_id member_id peak_id peak_name  year season sex     age
##    <chr>         <chr>     <chr>   <chr>     <dbl> <chr>  <chr> <dbl>
##  1 AMAD78301     AMAD7830~ AMAD    Ama Dabl~  1978 Autumn M        40
##  2 AMAD78301     AMAD7830~ AMAD    Ama Dabl~  1978 Autumn M        41
##  3 AMAD78301     AMAD7830~ AMAD    Ama Dabl~  1978 Autumn M        27
##  4 AMAD78301     AMAD7830~ AMAD    Ama Dabl~  1978 Autumn M        40
##  5 AMAD78301     AMAD7830~ AMAD    Ama Dabl~  1978 Autumn M        34
##  6 AMAD78301     AMAD7830~ AMAD    Ama Dabl~  1978 Autumn M        25
##  7 AMAD78301     AMAD7830~ AMAD    Ama Dabl~  1978 Autumn M        41
##  8 AMAD78301     AMAD7830~ AMAD    Ama Dabl~  1978 Autumn M        29
##  9 AMAD79101     AMAD7910~ AMAD    Ama Dabl~  1979 Spring M        35
## 10 AMAD79101     AMAD7910~ AMAD    Ama Dabl~  1979 Spring M        37
## # ... with 76,509 more rows, and 13 more variables: citizenship <chr>,
## #   expedition_role <chr>, hired <lgl>, highpoint_metres <dbl>, success <lgl>,
## #   solo <lgl>, oxygen_used <lgl>, died <lgl>, death_cause <chr>,
## #   death_height_metres <dbl>, injured <lgl>, injury_type <chr>,
## #   injury_height_metres <dbl>

everest <- tt$members %>%
  filter(peak_name == "Everest")
everest %>%
  group_by(age = 10 * (age %/% 10)) %>%
  summarize(n_climbers = n(),
            pct_death = mean(died))

## # A tibble: 9 x 3
##     age n_climbers pct_death
##   <dbl>      <int>     <dbl>
## 1    10        263   0.00760
## 2    20       5258   0.0126 
## 3    30       8079   0.0120 
## 4    40       5001   0.0116 
## 5    50       1897   0.0148 
## 6    60        446   0.0269 
## 7    70         48   0      
## 8    80          5   0.4    
## 9    NA        816   0.0502

everest %>%
  group_by(hired) %>%
  summarize(n_climbers = n(),
            pct_death = mean(died))

## # A tibble: 2 x 3
##   hired n_climbers pct_death
##   <lgl>      <int>     <dbl>
## 1 FALSE      15083    0.0123
## 2 TRUE        6730    0.0178

model <- everest %>%
  mutate(leader = expedition_role == "Leader") %>%
  glm(died ~ year + age + sex + leader + hired + oxygen_used, data = ., family = "binomial")
tidied <- model %>%
  tidy(conf.int = TRUE, exponentiate = TRUE)
tidied %>%
  filter(term != "(Intercept)") %>%
  mutate(term = reorder(term, estimate)) %>%
  ggplot(aes(estimate, term)) +
  geom_point() +
  geom_errorbarh(aes(xmin = conf.low, xmax = conf.high))
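
Because `tidy()` is called with `exponentiate = TRUE`, the plotted estimates are odds ratios rather than log-odds: a value above 1 means higher odds of death, and an interval that crosses 1 is compatible with no effect. The sketch below shows the same idea on simulated data using only base R. The data, the single `hired` predictor and the true odds ratio of 2 are all invented, and `confint.default()` gives Wald intervals, which may differ slightly from the intervals `tidy()` reports.

```r
set.seed(42)

# Simulated climbers: about 30% are hired staff (invented data).
n   <- 2000
sim <- data.frame(hired = rbinom(n, 1, 0.3) == 1)

# True model: baseline log-odds of -3; being hired multiplies the odds by 2.
sim$died <- rbinom(n, 1, plogis(-3 + log(2) * sim$hired)) == 1

fit <- glm(died ~ hired, data = sim, family = "binomial")

# Exponentiating log-odds coefficients and their limits gives odds ratios.
odds_ratios <- exp(coef(fit))
wald_ci     <- exp(confint.default(fit))
round(cbind(odds_ratio = odds_ratios, wald_ci), 2)
```

The row for `hiredTRUE` is the one that corresponds to a term in the plot above; the intercept is dropped there because it is a baseline odds, not a ratio.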

What are the deadliest mountains?

peaks_eb <- peaks_summarized %>%
  filter(members >= 20) %>%
  arrange(desc(pct_death)) %>%
  add_ebb_estimate(member_deaths, members)
peaks_eb %>%
  ggplot(aes(pct_death, .fitted)) +
  geom_point(aes(size = members, color = members)) +
  geom_abline(color = "red") +
  scale_x_continuous(labels = percent) +
  scale_y_continuous(labels = percent) +
  scale_color_continuous(trans = "log10") +
  labs(x = "Death rate (raw)",
       y = "Death rate (empirical Bayes adjusted)")

peaks_eb %>%
  filter(members >= 200) %>%
  arrange(desc(.fitted)) %>%
  mutate(peak_name = fct_reorder(peak_name, .fitted)) %>%
  ggplot(aes(.fitted, peak_name)) +
  geom_point(aes(size = members)) +
  geom_errorbarh(aes(xmin = .low, xmax = .high)) +
  expand_limits(x = 0) +
  scale_x_continuous(labels = percent) +
  labs(x = "Death rate (empirical Bayes adjusted + 95% credible interval)",
       y = "",
       title = "How deadly is each peak in the Himalayas?",
       subtitle = "Only peaks that at least 200 climbers have attempted")

Answer: With about 15,000 climbers, Mount Everest has a death rate of at least 1%. However, with around 5,000 climbers, the death rate rises to more than 6% for Himalchuli East.
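
The adjusted rates come from empirical Bayes shrinkage: peaks with few climbers are pulled toward an overall prior rate, while peaks with many climbers barely move. As far as I understand `add_ebb_estimate()`, it fits a Beta prior to the data and reports the posterior mean and a credible interval. The sketch below hard-codes a hypothetical prior and two invented peaks to show the arithmetic; none of these numbers come from the data.

```r
# Hypothetical Beta prior; ebbr would estimate these two values from the data.
alpha0 <- 2
beta0  <- 98   # prior mean = 2 / (2 + 98) = 2%

# Two invented peaks: one rarely climbed, one heavily climbed.
deaths  <- c(small_peak = 1,  big_peak = 150)
members <- c(small_peak = 20, big_peak = 15000)

raw    <- deaths / members
# Posterior mean of a Beta-Binomial model: add the prior counts to the observed ones.
fitted <- (deaths + alpha0) / (members + alpha0 + beta0)
# 95% credible interval from the posterior Beta distribution.
low    <- qbeta(0.025, deaths + alpha0, members - deaths + beta0)
high   <- qbeta(0.975, deaths + alpha0, members - deaths + beta0)

round(cbind(raw, fitted, low, high), 4)
```

The small peak's raw 5% rate is pulled halfway toward the 2% prior, while the big peak stays close to its raw 1%. This is why the scatter plot above shows low-membership peaks far from the red diagonal.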

Adjusted death rate (.fitted) versus peak height

peaks_eb %>%
  filter(members >= 100) %>%
  ggplot(aes(height_metres, .fitted)) +
  geom_point(aes(size = members))

Exploring Everest in particular

How deadly has Everest been getting over time?

expeditions %>%
  filter(peak_name == "Everest") %>%
  ggplot(aes(days_to_highpoint, fill = success)) +
  geom_density(alpha = .5)

expeditions %>%
  filter(peak_name == "Everest") %>%
  filter(success == "Success") %>%
  arrange(days_to_highpoint)

## # A tibble: 1,326 x 18
##    expedition_id peak_id peak_name  year season basecamp_date highpoint_date
##    <chr>         <chr>   <chr>     <dbl> <chr>  <date>        <date>        
##  1 EVER18151     EVER    Everest    2018 Spring 2018-04-07    2018-04-07    
##  2 EVER19157     EVER    Everest    2019 Spring 2019-05-19    2019-05-22    
##  3 EVER19188     EVER    Everest    2019 Spring 2019-05-19    2019-05-22    
##  4 EVER17105     EVER    Everest    2017 Spring 2017-05-23    2017-05-27    
##  5 EVER18148     EVER    Everest    2018 Spring 2018-05-14    2018-05-18    
##  6 EVER13104     EVER    Everest    2013 Spring 2013-05-13    2013-05-19    
##  7 EVER19129     EVER    Everest    2019 Spring 2019-05-10    2019-05-16    
##  8 EVER96130     EVER    Everest    1996 Spring 1996-05-17    1996-05-24    
##  9 EVER16163     EVER    Everest    2016 Spring 2016-05-16    2016-05-23    
## 10 EVER11102     EVER    Everest    2011 Spring 2011-05-12    2011-05-20    
## # ... with 1,316 more rows, and 11 more variables: termination_date <date>,
## #   termination_reason <chr>, highpoint_metres <dbl>, members <dbl>,
## #   member_deaths <dbl>, hired_staff <dbl>, hired_staff_deaths <dbl>,
## #   oxygen_used <lgl>, trekking_agency <chr>, success <chr>,
## #   days_to_highpoint <int>

expeditions %>%
  filter(success == "Success") %>%
  ggplot(aes(year)) +
  geom_histogram()

everest_by_decade <- expeditions %>%
  filter(peak_name == "Everest") %>%
  mutate(decade = pmax(10 * (year %/% 10), 1970)) %>%
  group_by(decade) %>%
  summarize_expeditions()
everest_by_decade %>%
  ggplot(aes(decade, pct_death)) +
  geom_line(aes(color = "All climbers")) +
  geom_line(aes(y = pct_hired_staff_deaths, color = "Hired staff")) +
  geom_point(aes(color = "All climbers", size = members)) +
  geom_point(aes(y = pct_hired_staff_deaths, size = hired_staff, color = "Hired staff")) +
  scale_x_continuous(breaks = seq(1970, 2010, 10),
                     labels = c("< 1980", seq(1980, 2010, 10))) +
  scale_y_continuous(labels = percent) +
  expand_limits(y = 0) +
  labs(x = "Decade",
       y = "Death rate",
       title = "Everest has been getting less deadly over time",
       subtitle = "Though trends have recently reversed for hired staff",
       size = "# of climbers",
       color = "")

Answer: The death rate was about 2% in the 1980s, when almost 1,000 climbers attempted the peak, and about 1% in the 2000s and 2010s. We can therefore infer that Mount Everest has been getting less deadly over time, possibly because climbers are now more experienced, better trained and take extra precautions.
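
The "< 1980" label on the x-axis comes from the `pmax()` call in `everest_by_decade`, which lumps every decade before the 1970s into the 1970 bucket so that the sparse early years form a single point. A short sketch with a few example years:

```r
# Round each year down to its decade, then raise anything below 1970 to 1970.
years  <- c(1953, 1975, 1988, 2014)
decade <- pmax(10 * (years %/% 10), 1970)
decade
```

Here 1953 and 1975 both land in the 1970 bucket, which the plot labels "< 1980".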

Deaths while climbing Mount Everest

# everest (defined above) holds only the members of Everest expeditions
everest %>%
  count(year, wt = died, name = "died") %>%
  complete(year = 1921:2019, fill = list(died = 0)) %>%
  ggplot(aes(year, died)) +
  geom_line(size = 0.8, col = "#FF9A41") +
  geom_point(size = 2, col = "#FF9A41") +
  geom_label(aes(1931, 7),label = "1922 avalanche", fill = "white", label.size = NA,
    family = "Bahnschrift") +
  geom_label(aes(1988, 15), label = "1996 blizzard", fill = "white", label.size = NA,
    family = "Bahnschrift") +
  geom_label(aes(2005, 17),label = "2014 avalanche", fill = "white", label.size = NA,
    family = "Bahnschrift") +
  scale_x_continuous(breaks = seq(1920, 2020, 10)) +
  labs(title = "Number of deaths from Everest expeditions",
    caption = "Source: The Himalayan Database",
    x = "",
    y = "") +
  theme(text = element_text("Bahnschrift"),
    panel.grid.minor.y = element_blank(),
    axis.text = element_text(size = 12))

Answer: Until recently, 11 deaths in a single year was the highest. Two of the previous five years had more deaths, including 2014, the year with 17 deaths, the most ever in a single year.

Analysing Himalayan climbers

2020-09-22
