Import two related datasets from the TidyTuesday project.
exped_tidy <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-01-21/exped_tidy.csv')
## Rows: 882 Columns: 69
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (22): EXPID, PEAKID, SEASON_FACTOR, HOST_FACTOR, ROUTE1, ROUTE2, NATION...
## dbl (17): YEAR, SEASON, HOST, SMTDAYS, TOTDAYS, TERMREASON, HIGHPOINT, CAMP...
## lgl (27): ROUTE3, ROUTE4, SUCCESS1, SUCCESS2, SUCCESS3, SUCCESS4, ASCENT3, ...
## date (3): BCDATE, SMTDATE, TERMDATE
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
peaks_tidy <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-01-21/peaks_tidy.csv')
## Rows: 480 Columns: 29
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (14): PEAKID, PKNAME, PKNAME2, LOCATION, HIMAL_FACTOR, REGION_FACTOR, RE...
## dbl (12): HEIGHTM, HEIGHTF, HIMAL, REGION, TREKYEAR, PHOST, PSTATUS, PEAKMEM...
## lgl (3): OPEN, UNLISTED, TREKKING
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Describe the two datasets:
Data 1: exped_tidy (882 rows, 69 columns)
Data 2: peaks_tidy (480 rows, 29 columns)
exped_tidy_small <- exped_tidy %>% select(PEAKID, YEAR, HOST) %>% sample_n(10)
peaks_tidy_small <- peaks_tidy %>% select(PKNAME, PEAKID, REGION) %>% sample_n(10)
exped_tidy_small
## # A tibble: 10 × 3
## PEAKID YEAR HOST
## <chr> <dbl> <dbl>
## 1 LHOT 2021 1
## 2 EVER 2023 1
## 3 LHOT 2021 1
## 4 HONK 2022 1
## 5 NEMJ 2023 1
## 6 TKPO 2021 1
## 7 EVER 2023 1
## 8 HIML 2023 1
## 9 LHOT 2022 1
## 10 AMAD 2021 1
peaks_tidy_small
## # A tibble: 10 × 3
## PKNAME PEAKID REGION
## <chr> <chr> <dbl>
## 1 Naulekh NAUL 2
## 2 Himjung HIMJ 5
## 3 Peak 41 PK41 2
## 4 Nampa II NAM2 7
## 5 Karyolung KARY 2
## 6 Pimu PIMU 2
## 7 Amphu I AMPH 2
## 8 Tankya I TANK 7
## 9 Mariyang MARI 7
## 10 Cho Oyu CHOY 2
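Because `sample_n()` draws rows at random, the two small tables change on every run, and so do the results of the joins that follow. A minimal sketch of one way to make the draw repeatable, assuming an arbitrary seed (123) and a small made-up tibble, `toy_exped`, standing in for the real data:

```r
library(dplyr)

# Hypothetical stand-in for exped_tidy; the values are illustrative only.
toy_exped <- tibble(
  PEAKID = c("EVER", "LHOT", "AMAD", "CHOY", "MANA", "EVER"),
  YEAR   = c(2021, 2021, 2022, 2022, 2023, 2023)
)

# Setting the seed immediately before sampling makes the draw repeatable.
set.seed(123)  # arbitrary seed, an assumption of this sketch
first_draw <- toy_exped %>% slice_sample(n = 3)

set.seed(123)
second_draw <- toy_exped %>% slice_sample(n = 3)

# The two draws select the same rows in the same order.
identical(first_draw, second_draw)
```

`slice_sample()` is the current dplyr replacement for `sample_n()`; either one responds to `set.seed()` in the same way.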
Describe the resulting data:
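How is it different from the original two datasets? 0 rows compared to 10 rows in each original dataset, because none of the sampled PEAKID values appear in both samples. It has 5 columns: all columns from the two datasets, with the shared PEAKID key appearing once.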
exped_tidy_small %>% inner_join(peaks_tidy_small)
## Joining with `by = join_by(PEAKID)`
## # A tibble: 0 × 5
## # ℹ 5 variables: PEAKID <chr>, YEAR <dbl>, HOST <dbl>, PKNAME <chr>,
## # REGION <dbl>
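An empty inner join means the two tables share no key values. One way to check this before joining is to intersect the key columns. The sketch below uses made-up tibbles (the names `x` and `y` and their values are hypothetical) in which exactly one key overlaps:

```r
library(dplyr)

# Illustrative data only; not taken from the sampled tables above.
x <- tibble(PEAKID = c("EVER", "LHOT", "AMAD"), YEAR = c(2021, 2022, 2023))
y <- tibble(PEAKID = c("EVER", "CHOY"), PKNAME = c("Everest", "Cho Oyu"))

# Keys present in both tables; an empty result predicts an empty inner join.
shared_keys <- intersect(x$PEAKID, y$PEAKID)
shared_keys

# The inner join keeps only rows whose key appears in both tables.
joined <- inner_join(x, y, by = "PEAKID")
nrow(joined)
```

Passing `by = "PEAKID"` explicitly also silences the "Joining with" message and guards against joining on an unintended shared column.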
Describe the resulting data:
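How is it different from the original two datasets? It is 10 × 5, not 10 × 3 like the originals. Every row of exped_tidy_small is kept, and PKNAME and REGION are added but are NA in every row because no PEAKID matched.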
exped_tidy_small %>% left_join(peaks_tidy_small)
## Joining with `by = join_by(PEAKID)`
## # A tibble: 10 × 5
## PEAKID YEAR HOST PKNAME REGION
## <chr> <dbl> <dbl> <chr> <dbl>
## 1 LHOT 2021 1 <NA> NA
## 2 EVER 2023 1 <NA> NA
## 3 LHOT 2021 1 <NA> NA
## 4 HONK 2022 1 <NA> NA
## 5 NEMJ 2023 1 <NA> NA
## 6 TKPO 2021 1 <NA> NA
## 7 EVER 2023 1 <NA> NA
## 8 HIML 2023 1 <NA> NA
## 9 LHOT 2022 1 <NA> NA
## 10 AMAD 2021 1 <NA> NA
Describe the resulting data:
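How is it different from the original two datasets? 10 rows and 5 columns. Every row of peaks_tidy_small is kept, and YEAR and HOST are added but are NA in every row because no expedition in the sample matched.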
exped_tidy_small %>% right_join(peaks_tidy_small)
## Joining with `by = join_by(PEAKID)`
## # A tibble: 10 × 5
## PEAKID YEAR HOST PKNAME REGION
## <chr> <dbl> <dbl> <chr> <dbl>
## 1 NAUL NA NA Naulekh 2
## 2 HIMJ NA NA Himjung 5
## 3 PK41 NA NA Peak 41 2
## 4 NAM2 NA NA Nampa II 7
## 5 KARY NA NA Karyolung 2
## 6 PIMU NA NA Pimu 2
## 7 AMPH NA NA Amphu I 2
## 8 TANK NA NA Tankya I 7
## 9 MARI NA NA Mariyang 7
## 10 CHOY NA NA Cho Oyu 2
Describe the resulting data:
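How is it different from the original two datasets? 20 rows, not the original 10, and 5 columns. With no matching PEAKID, all 10 rows from each dataset are kept separately, with NA in the columns that come from the other dataset.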
exped_tidy_small %>% full_join(peaks_tidy_small)
## Joining with `by = join_by(PEAKID)`
## # A tibble: 20 × 5
## PEAKID YEAR HOST PKNAME REGION
## <chr> <dbl> <dbl> <chr> <dbl>
## 1 LHOT 2021 1 <NA> NA
## 2 EVER 2023 1 <NA> NA
## 3 LHOT 2021 1 <NA> NA
## 4 HONK 2022 1 <NA> NA
## 5 NEMJ 2023 1 <NA> NA
## 6 TKPO 2021 1 <NA> NA
## 7 EVER 2023 1 <NA> NA
## 8 HIML 2023 1 <NA> NA
## 9 LHOT 2022 1 <NA> NA
## 10 AMAD 2021 1 <NA> NA
## 11 NAUL NA NA Naulekh 2
## 12 HIMJ NA NA Himjung 5
## 13 PK41 NA NA Peak 41 2
## 14 NAM2 NA NA Nampa II 7
## 15 KARY NA NA Karyolung 2
## 16 PIMU NA NA Pimu 2
## 17 AMPH NA NA Amphu I 2
## 18 TANK NA NA Tankya I 7
## 19 MARI NA NA Mariyang 7
## 20 CHOY NA NA Cho Oyu 2
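The row count of a full join can be broken down into matched rows plus the unmatched rows from each side. A small sketch of that bookkeeping, using made-up tibbles (`x`, `y`, and their values are hypothetical) where some keys do match:

```r
library(dplyr)

# Illustrative data only; PEAKID is unique in y, as in a lookup table.
x <- tibble(PEAKID = c("EVER", "LHOT", "AMAD", "EVER"),
            YEAR   = c(2021, 2021, 2022, 2023))
y <- tibble(PEAKID = c("EVER", "CHOY"), PKNAME = c("Everest", "Cho Oyu"))

full       <- full_join(x, y, by = "PEAKID")
both       <- inner_join(x, y, by = "PEAKID")  # rows with a match
left_only  <- anti_join(x, y, by = "PEAKID")   # rows of x with no match in y
right_only <- anti_join(y, x, by = "PEAKID")   # rows of y with no match in x

# The three pieces account for every row of the full join.
nrow(full) == nrow(both) + nrow(left_only) + nrow(right_only)
```

When no keys match at all, `both` is empty and the full join simply stacks the two tables, which is the situation in the output above.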
Describe the resulting data:
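How is it different from the original two datasets? 0 rows instead of the original 10, because no PEAKID in exped_tidy_small has a match in peaks_tidy_small. Only the 3 columns of exped_tidy_small are returned.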
exped_tidy_small %>% semi_join(peaks_tidy_small)
## Joining with `by = join_by(PEAKID)`
## # A tibble: 0 × 3
## # ℹ 3 variables: PEAKID <chr>, YEAR <dbl>, HOST <dbl>
Describe the resulting data:
How is it different from the original two datasets? 10 rows and 3 columns, identical to exped_tidy_small, because none of its rows has a matching PEAKID in peaks_tidy_small.
exped_tidy_small %>% anti_join(peaks_tidy_small)
## Joining with `by = join_by(PEAKID)`
## # A tibble: 10 × 3
## PEAKID YEAR HOST
## <chr> <dbl> <dbl>
## 1 LHOT 2021 1
## 2 EVER 2023 1
## 3 LHOT 2021 1
## 4 HONK 2022 1
## 5 NEMJ 2023 1
## 6 TKPO 2021 1
## 7 EVER 2023 1
## 8 HIML 2023 1
## 9 LHOT 2022 1
## 10 AMAD 2021 1
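`semi_join()` and `anti_join()` are filtering joins: together they split the left table into rows with a match and rows without one, and neither adds columns from the right table. A minimal sketch with made-up tibbles (`x`, `y`, and their values are hypothetical):

```r
library(dplyr)

# Illustrative data only.
x <- tibble(PEAKID = c("EVER", "LHOT", "AMAD", "EVER"),
            YEAR   = c(2021, 2021, 2022, 2023))
y <- tibble(PEAKID = c("EVER", "CHOY"), PKNAME = c("Everest", "Cho Oyu"))

matched   <- semi_join(x, y, by = "PEAKID")  # rows of x with a match in y
unmatched <- anti_join(x, y, by = "PEAKID")  # rows of x without a match in y

# Every row of x lands in exactly one of the two results.
nrow(matched) + nrow(unmatched) == nrow(x)

# Only the columns of x are returned.
names(matched)
```

This is why the semi join above returns 0 rows while the anti join returns all 10: with no shared PEAKID, every row falls on the unmatched side.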