Introduction

This report uses the Himalayan Mountaineering Expeditions dataset from the TidyTuesday project (2025, Week 3).

The data comes from the Himalayan Database, an archive of climbing expeditions in the Nepal Himalaya originally compiled by journalist Elizabeth Hawley.

Two tables are used: the expeditions table (exped_tidy.csv) and the peaks table (peaks_tidy.csv).

Questions explored: how has climbing activity changed over time, which peaks are most dangerous, and what factors relate to expedition outcomes.

Setup and Data Loading

library(data.table)
library(ggplot2)
library(RColorBrewer)
exped <- fread('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-01-21/exped_tidy.csv')
peaks <- fread('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-01-21/peaks_tidy.csv')

Quick look at both tables:

str(exped)
## Classes 'data.table' and 'data.frame':   882 obs. of  69 variables:
##  $ EXPID            : chr  "EVER20101" "EVER20102" "EVER20103" "AMAD20301" ...
##  $ PEAKID           : chr  "EVER" "EVER" "EVER" "AMAD" ...
##  $ YEAR             : int  2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
##  $ SEASON           : int  1 1 1 3 3 3 3 3 3 3 ...
##  $ SEASON_FACTOR    : chr  "Spring" "Spring" "Spring" "Autumn" ...
##  $ HOST             : int  2 2 2 1 1 1 1 1 1 1 ...
##  $ HOST_FACTOR      : chr  "China" "China" "China" "Nepal" ...
##  $ ROUTE1           : chr  "N Col-NE Ridge" "N Col-NE Ridge" "N Col-NE Ridge" "SW Ridge" ...
##  $ ROUTE2           : chr  NA NA NA NA ...
##  $ ROUTE3           : logi  NA NA NA NA NA NA ...
##  $ ROUTE4           : logi  NA NA NA NA NA NA ...
##  $ NATION           : chr  "China" "China" "China" "Nepal" ...
##  $ LEADERS          : chr  "Tibetan Rope-Fixing" "Ci Luo (Tselo)" "Tsering Samdrup" "Chhang Dawa Sherpa" ...
##  $ SPONSOR          : chr  "Tibetan Rope-Fixing Everest North 2020" "Chinese Mount Everest Survey Team" "Holy Mountain Adventure Everest Expedition 2020" "Seven Summit Treks Ama Dablam Expedition 2020" ...
##  $ SUCCESS1         : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ SUCCESS2         : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ SUCCESS3         : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ SUCCESS4         : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ ASCENT1          : chr  NA NA NA NA ...
##  $ ASCENT2          : chr  NA NA NA NA ...
##  $ ASCENT3          : logi  NA NA NA NA NA NA ...
##  $ ASCENT4          : logi  NA NA NA NA NA NA ...
##  $ CLAIMED          : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ DISPUTED         : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ COUNTRIES        : chr  NA NA NA "Canada, Czech Republic, France, Poland, Russia, Switzerland, Ukraine, USA" ...
##  $ APPROACH         : chr  "Lhasa->Tingri->Everest BC" NA "Lhasa->Tingri->Everest BC" NA ...
##  $ BCDATE           : IDate, format: NA NA ...
##  $ SMTDATE          : IDate, format: "2020-05-26" "2020-05-27" ...
##  $ SMTTIME          : int  1515 945 545 1300 1300 1243 NA 930 NA 615 ...
##  $ SMTDAYS          : int  0 0 35 1 9 16 11 3 0 10 ...
##  $ TOTDAYS          : int  0 0 38 0 11 17 13 4 0 0 ...
##  $ TERMDATE         : IDate, format: NA NA ...
##  $ TERMREASON       : int  1 1 1 1 1 1 4 1 12 1 ...
##  $ TERMREASON_FACTOR: chr  "Success (main peak)" "Success (main peak)" "Success (main peak)" "Success (main peak)" ...
##  $ TERMNOTE         : chr  NA NA NA NA ...
##  $ HIGHPOINT        : int  8849 8849 8849 6814 6814 6814 6650 6814 0 6814 ...
##  $ TRAVERSE         : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ SKI              : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ PARAPENTE        : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ CAMPS            : int  3 3 3 2 2 2 2 2 0 2 ...
##  $ ROPE             : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ TOTMEMBERS       : int  0 12 20 14 6 2 4 1 1 6 ...
##  $ SMTMEMBERS       : int  0 8 14 9 6 2 0 1 0 1 ...
##  $ MDEATHS          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ TOTHIRED         : int  6 0 22 19 8 1 2 1 0 6 ...
##  $ SMTHIRED         : int  6 0 21 14 8 1 0 1 0 3 ...
##  $ HDEATHS          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ NOHIRED          : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ O2USED           : logi  TRUE TRUE TRUE FALSE FALSE FALSE ...
##  $ O2NONE           : logi  FALSE FALSE FALSE TRUE TRUE TRUE ...
##  $ O2CLIMB          : logi  TRUE TRUE TRUE FALSE FALSE FALSE ...
##  $ O2DESCENT        : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ O2SLEEP          : logi  TRUE TRUE TRUE FALSE FALSE FALSE ...
##  $ O2MEDICAL        : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ O2TAKEN          : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ O2UNKWN          : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ OTHERSMTS        : chr  NA NA NA NA ...
##  $ CAMPSITES        : chr  "BC,ABC,C1,C2,C3,Smt(26/05)" "BC,ABC,C1,C2,C3,Smt(27/05)" "BC(23/04,5200m),IC(26/04,5800m),ABC(05/01,6500m),C1(25/05,7028m),C2(26/05,7790m),C3(27/05,8300m),Smt(28/05)" "BC(09/11,4450m),C1(5600m),C2(5900m),Smt(10,12-13,15/11)" ...
##  $ ROUTEMEMO        : int  NA 221011 203869 NA 29755 107752 29661 17154 NA 69150 ...
##  $ ACCIDENTS        : chr  NA NA NA NA ...
##  $ ACHIEVMENT       : chr  NA NA NA NA ...
##  $ AGENCY           : chr  "Holy Mountain Adventure" NA "Holy Mountain Adventure" "Seven Summit Treks" ...
##  $ COMRTE           : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ STDRTE           : logi  TRUE TRUE TRUE FALSE FALSE FALSE ...
##  $ PRIMRTE          : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ PRIMMEM          : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ PRIMREF          : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ PRIMID           : chr  NA NA NA NA ...
##  $ CHKSUM           : int  2465291 2465292 2465293 2463299 2463299 2463320 2463316 2463318 4135 2463318 ...
##  - attr(*, ".internal.selfref")=<pointer: 0x105f28080>
str(peaks)
## Classes 'data.table' and 'data.frame':   480 obs. of  29 variables:
##  $ PEAKID        : chr  "AMAD" "AMPG" "ANN1" "ANN2" ...
##  $ PKNAME        : chr  "Ama Dablam" "Amphu Gyabjen" "Annapurna I" "Annapurna II" ...
##  $ PKNAME2       : chr  "Amai Dablang" "Amphu Gyabien" NA NA ...
##  $ LOCATION      : chr  "Khumbu Himal" "Khumbu Himal (N of Ama Dablam)" "Annapurna Himal" "Annapurna Himal" ...
##  $ HEIGHTM       : int  6814 5630 8091 7937 7555 7525 8026 8051 7219 7132 ...
##  $ HEIGHTF       : int  22356 18471 26545 26040 24787 24688 26332 26414 23684 23399 ...
##  $ HIMAL         : int  12 12 1 1 1 1 1 1 1 2 ...
##  $ HIMAL_FACTOR  : chr  "Khumbu" "Khumbu" "Annapurna" "Annapurna" ...
##  $ REGION        : int  2 2 5 5 5 5 5 5 5 7 ...
##  $ REGION_FACTOR : chr  "Khumbu-Rolwaling-Makalu" "Khumbu-Rolwaling-Makalu" "Annapurna-Damodar-Peri" "Annapurna-Damodar-Peri" ...
##  $ OPEN          : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ UNLISTED      : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ TREKKING      : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ TREKYEAR      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ RESTRICT      : chr  NA "Opened in 2002" NA NA ...
##  $ PHOST         : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ PHOST_FACTOR  : chr  "Nepal only" "Nepal only" "Nepal only" "Nepal only" ...
##  $ PSTATUS       : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ PSTATUS_FACTOR: chr  "Climbed" "Climbed" "Climbed" "Climbed" ...
##  $ PEAKMEMO      : int  8 20 23 31 35 40 44 46 48 54 ...
##  $ PYEAR         : int  1961 1953 1950 1960 1961 1955 1974 1980 1964 1960 ...
##  $ PSEASON       : int  1 1 1 1 1 1 1 3 3 1 ...
##  $ PEXPID        : chr  "AMAD61101" "AMPG53101" "ANN150101" "ANN260101" ...
##  $ PSMTDATE      : chr  "Mar 13" "Apr 11" "Jun 03" "May 17" ...
##  $ PCOUNTRY      : chr  "New Zealand, USA, UK" "UK" "France" "UK, Nepal" ...
##  $ PSUMMITERS    : chr  "Mike Gill, Wally Romanes, Barry Bishop, Michael Ward" "John Hunt, Tom Bourdillon" "Maurice Herzog, Louis Lachenal" "Richard Grant, Chris Bonington, Ang Nyima Sherpa" ...
##  $ PSMTNOTE      : chr  NA NA NA NA ...
##  $ REFERMEMO     : int  NA NA 25 33 NA NA NA NA NA NA ...
##  $ PHOTOMEMO     : int  13 NA 26 34 37 42 45 NA 50 57 ...
##  - attr(*, ".internal.selfref")=<pointer: 0x105f28080>

The expeditions table has 882 rows and 69 columns. The peaks table has 480 rows and 29 columns.

Merging the Datasets

Both tables share PEAKID, so merging on that connects expedition outcomes with peak info like height and region.

dt <- merge(exped, peaks, by = 'PEAKID', suffixes = c('_exp', '_peak'))
dim(dt)
## [1] 882  97

After merging: 882 expedition records with peak-level information attached.
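The matching row count suggests every expedition found its peak, but merge() performs an inner join by default, so expeditions with an unknown PEAKID would be dropped silently. A minimal sketch of a pre-merge check follows. The tables exped_toy and peaks_toy are made-up stand-ins, not the real data:

```r
library(data.table)

# Toy stand-ins for the real tables; all values are illustrative only.
exped_toy <- data.table(EXPID  = c("E1", "E2", "E3"),
                        PEAKID = c("AAAA", "BBBB", "ZZZZ"))
peaks_toy <- data.table(PEAKID  = c("AAAA", "BBBB", "CCCC"),
                        HEIGHTM = c(8000L, 7000L, 6000L))

# Expedition PEAKIDs that have no match in the peaks table
unmatched <- setdiff(exped_toy$PEAKID, peaks_toy$PEAKID)
print(unmatched)

# The default inner join drops the unmatched expedition ...
inner <- merge(exped_toy, peaks_toy, by = "PEAKID")
# ... while all.x = TRUE keeps it, with NA in the peak columns
left <- merge(exped_toy, peaks_toy, by = "PEAKID", all.x = TRUE)

print(nrow(inner))
print(nrow(left))
```

On the real tables, an empty result from setdiff(exped$PEAKID, peaks$PEAKID) would confirm that the inner join loses nothing.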

Data Transformations

Filtering

Filter for expeditions to peaks of 8,000 meters or higher (the “eight-thousanders”):

eight_thousanders <- dt[HEIGHTM >= 8000]
nrow(eight_thousanders)
## [1] 483
# Expedition count per peak
eight_thousanders[, .N, by = PKNAME][order(-N)]
##           PKNAME     N
##           <char> <int>
## 1:       Everest   189
## 2:       Manaslu    96
## 3:        Lhotse    79
## 4:  Dhaulagiri I    35
## 5:        Makalu    26
## 6:   Annapurna I    22
## 7: Kangchenjunga    20
## 8:       Cho Oyu    16

483 expeditions targeted the eight-thousanders. Everest accounts for 189 of them (about 39%), roughly twice as many as Manaslu (96).

Filter for expeditions where at least one person died:

deadly <- dt[MDEATHS > 0 | HDEATHS > 0]
deadly[, total_deaths := MDEATHS + HDEATHS]
nrow(deadly)
## [1] 38

38 expeditions had at least one fatality.

Aggregation

Summary stats by season:

dt[, .(
  expeditions   = .N,
  avg_team_size = round(mean(TOTMEMBERS, na.rm = TRUE), 1),
  avg_hired     = round(mean(TOTHIRED, na.rm = TRUE), 1),
  total_deaths  = sum(MDEATHS + HDEATHS, na.rm = TRUE)
), by = SEASON_FACTOR][order(-expeditions)]
##    SEASON_FACTOR expeditions avg_team_size avg_hired total_deaths
##           <char>       <int>         <num>     <num>        <int>
## 1:        Spring         462           8.2       9.4           42
## 2:        Autumn         394           8.5       5.9           11
## 3:        Winter          21           4.5       2.3            0
## 4:        Summer           5           2.2       0.8            0

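Spring is the busiest season (462 expeditions versus 394 in Autumn) and accounts for most of the deaths (42 of 53). That makes sense, since the best weather windows on the big peaks fall in April and May. Winter and Summer together account for only 26 expeditions.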

Summary stats by region:

dt[, .(
  expeditions  = .N,
  total_deaths = sum(MDEATHS + HDEATHS, na.rm = TRUE),
  avg_height   = round(mean(HEIGHTM, na.rm = TRUE), 0)
), by = REGION_FACTOR][order(-expeditions)]
##              REGION_FACTOR expeditions total_deaths avg_height
##                     <char>       <int>        <int>      <num>
## 1: Khumbu-Rolwaling-Makalu         553           44       7876
## 2:  Annapurna-Damodar-Peri         109            1       7234
## 3:          Manaslu-Ganesh          99            4       8106
## 4:        Dhaulagiri-Mukut          50            2       7848
## 5:     Kangchenjunga-Janak          37            2       7848
## 6:          Langtang-Jugal          21            0       6442
## 7:      Kanjiroba-Far West          13            0       6455

Plots

Plot 1: Distribution of Peak Heights (Histogram)

ggplot(peaks, aes(x = HEIGHTM)) +
  geom_histogram(binwidth = 200, fill = brewer.pal(9, 'Blues')[6], color = 'white') +
  labs(title = 'Distribution of Himalayan Peak Heights',
       x = 'Height (meters)', y = 'Number of Peaks') +
  theme_minimal()

Right-skewed distribution. Most peaks are in the 6,000-7,500m range, with only a handful above 8,000m.

Plot 2: Expeditions Over Time by Season (Line + Point)

yearly <- dt[, .N, by = .(YEAR, SEASON_FACTOR)]

ggplot(yearly[SEASON_FACTOR %in% c('Spring', 'Autumn')],
       aes(x = YEAR, y = N, color = SEASON_FACTOR)) +
  geom_line(linewidth = 0.8) +
  geom_point(size = 1.5) +
  scale_color_brewer(palette = 'Set1') +
  labs(title = 'Himalayan Expeditions Over Time',
       x = 'Year', y = 'Number of Expeditions', color = 'Season') +
  theme_minimal()

Spring consistently has more expeditions than Autumn across 2020-2024. Both seasons saw a dip in 2024.

Plot 3: Top 10 Deadliest Peaks (Bar Chart)

deaths_by_peak <- dt[, .(total_deaths = sum(MDEATHS + HDEATHS, na.rm = TRUE)),
                     by = PKNAME][order(-total_deaths)][1:10]

ggplot(deaths_by_peak, aes(x = reorder(PKNAME, total_deaths), y = total_deaths)) +
  geom_col(fill = brewer.pal(9, 'Reds')[6]) +
  coord_flip() +
  labs(title = 'Top 10 Deadliest Himalayan Peaks',
       subtitle = 'Total deaths (members + hired) across all expeditions',
       x = '', y = 'Total Deaths') +
  theme_minimal()

Everest leads in absolute deaths, partly because it has the most expeditions. Manaslu and Omoga Ri Chang also rank high. Annapurna I is arguably more dangerous per attempt given fewer expeditions.
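The per-attempt comparison can be made explicit by dividing deaths by expedition count for each peak. A minimal sketch on made-up rows follows. The peak names, counts, and the min_expeditions threshold are all hypothetical, and the threshold is there only to keep peaks with one or two expeditions from dominating the ranking:

```r
library(data.table)

# Toy expedition-level rows; peak names and counts are made up.
toy <- data.table(
  PKNAME  = c(rep("Peak A", 10), rep("Peak B", 4), "Peak C"),
  MDEATHS = c(1, rep(0, 9), 1, 0, 0, 0, 1),
  HDEATHS = c(1, rep(0, 9), 0, 0, 0, 0, 0)
)

min_expeditions <- 3L  # hypothetical cutoff for a stable rate

# Aggregate to one row per peak
peak_rate <- toy[, .(
  expeditions  = .N,
  total_deaths = sum(MDEATHS + HDEATHS, na.rm = TRUE)
), by = PKNAME]

# Drop rarely attempted peaks, then compute and rank the rate
peak_rate <- peak_rate[expeditions >= min_expeditions]
peak_rate[, deaths_per_exped := total_deaths / expeditions]
peak_rate <- peak_rate[order(-deaths_per_exped)]
print(peak_rate)
```

In the toy data, Peak A has the most deaths but Peak B has the higher rate, which is the same pattern the text suggests for Everest and Annapurna I.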

Plot 4: Team Size vs Peak Height (Scatterplot + Regression)

This plot uses two geom layers: geom_point and geom_smooth.

ggplot(dt, aes(x = HEIGHTM, y = TOTMEMBERS)) +
  geom_point(alpha = 0.3, color = brewer.pal(9, 'Set1')[2], size = 1.5) +
  geom_smooth(method = 'lm', se = TRUE, color = brewer.pal(9, 'Set1')[1]) +
  labs(title = 'Team Size vs Peak Height',
       x = 'Peak Height (meters)', y = 'Total Team Members') +
  theme_minimal()

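The fitted line slopes upward: expeditions to taller peaks tend to have more members. A plausible explanation is that taller peaks demand more logistics (more camps, fixed ropes, longer acclimatization), though hired staff such as porters are counted separately in TOTHIRED rather than in TOTMEMBERS.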
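The line drawn by geom_smooth(method = 'lm') is an ordinary least-squares fit, so the same model can be fitted with lm() to put a number on the slope. A minimal sketch on made-up values follows. The toy data frame is not the real expedition table, and the Spearman correlation is added here as a rank-based cross-check because team sizes are skewed counts:

```r
# Toy data; heights and team sizes are illustrative only.
toy <- data.frame(
  HEIGHTM    = c(6000, 6500, 7000, 7500, 8000, 8500),
  TOTMEMBERS = c(3, 6, 5, 8, 7, 10)
)

# Same model that geom_smooth(method = 'lm') fits
fit <- lm(TOTMEMBERS ~ HEIGHTM, data = toy)

# Slope: change in team size per extra meter of peak height
slope <- coef(fit)[["HEIGHTM"]]
print(slope * 1000)  # members per extra 1,000 m

# Share of variance explained by the straight line
print(summary(fit)$r.squared)

# Rank-based correlation, less sensitive to a few very large teams
rho <- cor(toy$HEIGHTM, toy$TOTMEMBERS, method = "spearman")
print(rho)
```

On the merged table, replacing toy with dt would give the slope and R-squared behind the plotted line.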

Plot 5: Termination Reasons (Bar Chart)

term_counts <- dt[!is.na(TERMREASON_FACTOR) & TERMREASON_FACTOR != '',
                  .N, by = TERMREASON_FACTOR][order(-N)]

ggplot(term_counts, aes(x = reorder(TERMREASON_FACTOR, N), y = N)) +
  geom_col(fill = brewer.pal(9, 'Purples')[6]) +
  coord_flip() +
  labs(title = 'Why Do Expeditions End?',
       x = '', y = 'Number of Expeditions') +
  theme_minimal()

Most expeditions end with a successful summit. Among failures, bad weather and conditions are the top reasons.

Plot 6: Expedition Duration by Season (Boxplot)

ggplot(dt[!is.na(TOTDAYS) & TOTDAYS > 0 & TOTDAYS < 200 &
            SEASON_FACTOR %in% c('Spring', 'Autumn', 'Winter', 'Summer')],
       aes(x = SEASON_FACTOR, y = TOTDAYS, fill = SEASON_FACTOR)) +
  geom_boxplot(outlier.alpha = 0.3) +
  scale_fill_brewer(palette = 'Pastel1') +
  labs(title = 'Expedition Duration by Season',
       x = 'Season', y = 'Total Days') +
  theme_minimal() +
  theme(legend.position = 'none')

Spring has the widest spread and highest median. That’s the season when the big 8,000m peaks are attempted, and those require weeks of acclimatization.

Plot 7: Peak Height vs Deaths (Bubble Chart)

This plot uses two geom layers: geom_point and geom_text.

peak_danger <- dt[, .(
  total_deaths = sum(MDEATHS + HDEATHS, na.rm = TRUE),
  expeditions  = .N,
  height       = first(HEIGHTM),
  peak_name    = first(PKNAME)
), by = PEAKID][total_deaths > 0][order(-total_deaths)]

# head() avoids NA padding rows when fewer than 15 peaks have recorded deaths
top_peaks <- head(peak_danger, 15)

ggplot(top_peaks, aes(x = height, y = total_deaths)) +
  geom_point(aes(size = expeditions), color = brewer.pal(8, 'Dark2')[1], alpha = 0.7) +
  geom_text(aes(label = peak_name), hjust = -0.1, vjust = 0.5,
            size = 3, check_overlap = TRUE) +
  scale_size_continuous(range = c(2, 10), name = 'Total Expeditions') +
  labs(title = 'Peak Height vs Total Deaths (Top 15)',
       subtitle = 'Bubble size = number of expeditions',
       x = 'Peak Height (meters)', y = 'Total Deaths') +
  theme_minimal() +
  theme(legend.position = 'bottom') +
  xlim(NA, max(top_peaks$height) + 400)

Everest has the most deaths and most expeditions. Annapurna I stands out as particularly deadly relative to how few expeditions attempt it.

Plot 8: Member vs Hired Deaths Over Time (Area Chart)

deaths_time <- dt[YEAR >= 1960, .(
  member_deaths = sum(MDEATHS, na.rm = TRUE),
  hired_deaths  = sum(HDEATHS, na.rm = TRUE)
), by = YEAR]

deaths_long <- melt(deaths_time, id.vars = 'YEAR',
                    variable.name = 'type', value.name = 'deaths')
deaths_long[, type := ifelse(type == 'member_deaths', 'Expedition Members', 'Hired Personnel')]

ggplot(deaths_long, aes(x = YEAR, y = deaths, fill = type)) +
  geom_area(alpha = 0.7) +
  scale_fill_brewer(palette = 'Set2') +
  labs(title = 'Climbing Deaths Over Time',
       subtitle = 'Member and hired personnel deaths (2020-2024)',
       x = 'Year', y = 'Deaths', fill = 'Category') +
  theme_minimal()

Deaths peaked around 2022-2023. Hired personnel (Sherpas, local guides) make up a significant share of fatalities, which is an ongoing ethical concern.

Conclusion

Main takeaways from this analysis: