Week 9: Apply it to your data 8

1. Import your data

Import two related datasets from the TidyTuesday project: expedition records (exped_tidy) and peak records (peaks_tidy), which share the PEAKID column.

library(tidyverse)  # provides %>%, select(), sample_n() and the *_join() verbs used below

exped_tidy <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-01-21/exped_tidy.csv')

## Rows: 882 Columns: 69
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (22): EXPID, PEAKID, SEASON_FACTOR, HOST_FACTOR, ROUTE1, ROUTE2, NATION...
## dbl  (17): YEAR, SEASON, HOST, SMTDAYS, TOTDAYS, TERMREASON, HIGHPOINT, CAMP...
## lgl  (27): ROUTE3, ROUTE4, SUCCESS1, SUCCESS2, SUCCESS3, SUCCESS4, ASCENT3, ...
## date  (3): BCDATE, SMTDATE, TERMDATE
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

peaks_tidy <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-01-21/peaks_tidy.csv')

## Rows: 480 Columns: 29
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (14): PEAKID, PKNAME, PKNAME2, LOCATION, HIMAL_FACTOR, REGION_FACTOR, RE...
## dbl (12): HEIGHTM, HEIGHTF, HIMAL, REGION, TREKYEAR, PHOST, PSTATUS, PEAKMEM...
## lgl  (3): OPEN, UNLISTED, TREKKING
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

2. Make data small

Describe the two datasets:

Data 1: exped_tidy

Columns: EXPID, PEAKID, YEAR
Rows: 10

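set.seed(1234)  # make the random samples reproducible
# Keep three columns from each table and draw 10 random rows.
# The two samples are drawn independently, so their PEAKID values need not overlap.
exped_tidy_small <- exped_tidy %>% select(EXPID, PEAKID, YEAR) %>% sample_n(10)
peaks_tidy_small <- peaks_tidy %>% select(PEAKID, PKNAME, PKNAME2) %>% sample_n(10)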

exped_tidy_small

## # A tibble: 10 × 3
##    EXPID     PEAKID  YEAR
##    <chr>     <chr>  <dbl>
##  1 EVER22132 EVER    2022
##  2 JUG324201 JUG3    2024
##  3 LHOT21105 LHOT    2021
##  4 MAKA23106 MAKA    2023
##  5 AMAD23303 AMAD    2023
##  6 CHND22301 CHND    2022
##  7 LHOT21102 LHOT    2021
##  8 LHOT21107 LHOT    2021
##  9 HIML23314 HIML    2023
## 10 LHOT23107 LHOT    2023

peaks_tidy_small

## # A tibble: 10 × 3
##    PEAKID PKNAME               PKNAME2                     
##    <chr>  <chr>                <chr>                       
##  1 DINS   Dingjung Ri          Dingjung Ri South           
##  2 JAGD   Jagdula              <NA>                        
##  3 SANK   Sano Kailash         <NA>                        
##  4 YNGS   Yangra Kangri South  Yangra South, Ganesh I South
##  5 DING   Dingjung North       Kangkuru, Rima Mancho       
##  6 YANG   Yangri               Jugal                       
##  7 TUKU   Tukuche              <NA>                        
##  8 GIME   Gimmigela Chuli East Twins                       
##  9 ANN2   Annapurna II         <NA>                        
## 10 PHUN   Phu Kang North       Phu Khang North

Data 2: peaks_tidy

Columns: PEAKID, PKNAME, PKNAME2
Rows: 10

3. inner_join

Describe the resulting data:

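Columns: EXPID, PEAKID, YEAR, PKNAME, PKNAME2
Rows: 0

How is it different from the original two datasets? It has all five columns from the two datasets (PEAKID, the join key, appears once), but 0 rows instead of 10, because none of the PEAKID values in the expedition sample appear in the peaks sample.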

exped_tidy_small %>% inner_join(peaks_tidy_small)

## Joining with `by = join_by(PEAKID)`

## # A tibble: 0 × 5
## # ℹ 5 variables: EXPID <chr>, PEAKID <chr>, YEAR <dbl>, PKNAME <chr>,
## #   PKNAME2 <chr>
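The empty result comes from the sampling, not from the join itself. A quick check, sketched here with the PEAKID values copied from the step-2 output, lists the keys the two samples share; an inner join can only keep rows whose key is in that set.

```r
# PEAKID values printed in step 2 (copied from the output above)
exped_ids <- c("EVER", "JUG3", "LHOT", "MAKA", "AMAD", "CHND", "LHOT", "LHOT", "HIML", "LHOT")
peak_ids  <- c("DINS", "JAGD", "SANK", "YNGS", "DING", "YANG", "TUKU", "GIME", "ANN2", "PHUN")

# Keys present in both samples; an empty result means inner_join() returns 0 rows
shared <- intersect(exped_ids, peak_ids)
shared
```

Inside this document the same check can be run directly as `intersect(exped_tidy_small$PEAKID, peaks_tidy_small$PEAKID)`.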

4. left_join

Describe the resulting data:

Columns: EXPID, PEAKID, YEAR, PKNAME, PKNAME2
Rows: 10

How is it different from the original two datasets? It keeps all 10 rows of exped_tidy_small and adds PKNAME and PKNAME2 from peaks_tidy_small. Because no PEAKID values match, PKNAME and PKNAME2 are NA in every row.

exped_tidy_small %>% left_join(peaks_tidy_small)

## Joining with `by = join_by(PEAKID)`

5. right_join

Describe the resulting data:

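Columns: EXPID, PEAKID, YEAR, PKNAME, PKNAME2
Rows: 10

How is it different from the original two datasets? It keeps all 10 rows of peaks_tidy_small (the right-hand table) and adds EXPID and YEAR from exped_tidy_small. Because no PEAKID values match, EXPID and YEAR are NA in every row.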
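A minimal sketch of this join, assuming exped_tidy_small is the left-hand table as in step 3. If the step-2 samples are not already in the session, the chunk rebuilds them from the values printed in step 2 so it can run on its own.

```r
library(dplyr)

# Reuse the step-2 samples if they exist; otherwise rebuild them from the printed output
if (!exists("exped_tidy_small")) {
  exped_tidy_small <- tibble(
    EXPID  = c("EVER22132", "JUG324201", "LHOT21105", "MAKA23106", "AMAD23303",
               "CHND22301", "LHOT21102", "LHOT21107", "HIML23314", "LHOT23107"),
    PEAKID = c("EVER", "JUG3", "LHOT", "MAKA", "AMAD", "CHND", "LHOT", "LHOT", "HIML", "LHOT"),
    YEAR   = c(2022, 2024, 2021, 2023, 2023, 2022, 2021, 2021, 2023, 2023)
  )
}
if (!exists("peaks_tidy_small")) {
  peaks_tidy_small <- tibble(
    PEAKID  = c("DINS", "JAGD", "SANK", "YNGS", "DING", "YANG", "TUKU", "GIME", "ANN2", "PHUN"),
    PKNAME  = c("Dingjung Ri", "Jagdula", "Sano Kailash", "Yangra Kangri South",
                "Dingjung North", "Yangri", "Tukuche", "Gimmigela Chuli East",
                "Annapurna II", "Phu Kang North"),
    PKNAME2 = c("Dingjung Ri South", NA, NA, "Yangra South, Ganesh I South",
                "Kangkuru, Rima Mancho", "Jugal", NA, "Twins", NA, "Phu Khang North")
  )
}

# right_join() keeps every row of the right-hand table (peaks_tidy_small);
# columns from the left table are filled with NA where no PEAKID matches
right_res <- exped_tidy_small %>% right_join(peaks_tidy_small, by = "PEAKID")
right_res
```

Passing `by = "PEAKID"` explicitly also silences the "Joining with" message seen in step 3.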

6. full_join

Describe the resulting data:

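Columns: EXPID, PEAKID, YEAR, PKNAME, PKNAME2
Rows: 20

How is it different from the original two datasets? It keeps every row from both datasets. Because no PEAKID values match, the 10 expedition rows have NA for PKNAME and PKNAME2, and the 10 peak rows have NA for EXPID and YEAR.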
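A minimal sketch of the full join under the same assumptions (exped_tidy_small on the left; the samples are rebuilt from the step-2 output only if they are missing from the session).

```r
library(dplyr)

# Reuse the step-2 samples if they exist; otherwise rebuild them from the printed output
if (!exists("exped_tidy_small")) {
  exped_tidy_small <- tibble(
    EXPID  = c("EVER22132", "JUG324201", "LHOT21105", "MAKA23106", "AMAD23303",
               "CHND22301", "LHOT21102", "LHOT21107", "HIML23314", "LHOT23107"),
    PEAKID = c("EVER", "JUG3", "LHOT", "MAKA", "AMAD", "CHND", "LHOT", "LHOT", "HIML", "LHOT"),
    YEAR   = c(2022, 2024, 2021, 2023, 2023, 2022, 2021, 2021, 2023, 2023)
  )
}
if (!exists("peaks_tidy_small")) {
  peaks_tidy_small <- tibble(
    PEAKID  = c("DINS", "JAGD", "SANK", "YNGS", "DING", "YANG", "TUKU", "GIME", "ANN2", "PHUN"),
    PKNAME  = c("Dingjung Ri", "Jagdula", "Sano Kailash", "Yangra Kangri South",
                "Dingjung North", "Yangri", "Tukuche", "Gimmigela Chuli East",
                "Annapurna II", "Phu Kang North"),
    PKNAME2 = c("Dingjung Ri South", NA, NA, "Yangra South, Ganesh I South",
                "Kangkuru, Rima Mancho", "Jugal", NA, "Twins", NA, "Phu Khang North")
  )
}

# full_join() keeps all rows from both tables, matched or not
full_res <- exped_tidy_small %>% full_join(peaks_tidy_small, by = "PEAKID")
full_res
```

The row count is 10 + 10 = 20 only because no keys match; any shared PEAKID would merge rows and lower the total.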

7. semi_join

Describe the resulting data:

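Columns: EXPID, PEAKID, YEAR
Rows: 0

How is it different from the original two datasets? A semi join keeps only the rows of exped_tidy_small that have a matching PEAKID in peaks_tidy_small and adds no columns from it. With no matching PEAKID values, no rows remain.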
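A minimal sketch of the semi join, again assuming exped_tidy_small on the left. A filtering join only compares keys, so when the chunk has to rebuild peaks_tidy_small from the step-2 output it keeps just the PEAKID column.

```r
library(dplyr)

# Reuse the step-2 samples if they exist; otherwise rebuild them from the printed output
if (!exists("exped_tidy_small")) {
  exped_tidy_small <- tibble(
    EXPID  = c("EVER22132", "JUG324201", "LHOT21105", "MAKA23106", "AMAD23303",
               "CHND22301", "LHOT21102", "LHOT21107", "HIML23314", "LHOT23107"),
    PEAKID = c("EVER", "JUG3", "LHOT", "MAKA", "AMAD", "CHND", "LHOT", "LHOT", "HIML", "LHOT"),
    YEAR   = c(2022, 2024, 2021, 2023, 2023, 2022, 2021, 2021, 2023, 2023)
  )
}
if (!exists("peaks_tidy_small")) {
  # Key column only: semi_join() never copies columns from the right-hand table
  peaks_tidy_small <- tibble(
    PEAKID = c("DINS", "JAGD", "SANK", "YNGS", "DING", "YANG", "TUKU", "GIME", "ANN2", "PHUN")
  )
}

# semi_join() keeps rows of exped_tidy_small whose PEAKID appears in peaks_tidy_small
semi_res <- exped_tidy_small %>% semi_join(peaks_tidy_small, by = "PEAKID")
semi_res
```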

8. anti_join

Describe the resulting data:

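Columns: EXPID, PEAKID, YEAR
Rows: 10

How is it different from the original two datasets? An anti join keeps the rows of exped_tidy_small that have no matching PEAKID in peaks_tidy_small, again without adding columns. Since none of the PEAKID values match, all 10 rows are returned unchanged.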
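A minimal sketch of the anti join under the same assumptions as the semi join: exped_tidy_small on the left, and a key-only peaks_tidy_small rebuilt from the step-2 output if the real one is missing.

```r
library(dplyr)

# Reuse the step-2 samples if they exist; otherwise rebuild them from the printed output
if (!exists("exped_tidy_small")) {
  exped_tidy_small <- tibble(
    EXPID  = c("EVER22132", "JUG324201", "LHOT21105", "MAKA23106", "AMAD23303",
               "CHND22301", "LHOT21102", "LHOT21107", "HIML23314", "LHOT23107"),
    PEAKID = c("EVER", "JUG3", "LHOT", "MAKA", "AMAD", "CHND", "LHOT", "LHOT", "HIML", "LHOT"),
    YEAR   = c(2022, 2024, 2021, 2023, 2023, 2022, 2021, 2021, 2023, 2023)
  )
}
if (!exists("peaks_tidy_small")) {
  # Key column only: anti_join() never copies columns from the right-hand table
  peaks_tidy_small <- tibble(
    PEAKID = c("DINS", "JAGD", "SANK", "YNGS", "DING", "YANG", "TUKU", "GIME", "ANN2", "PHUN")
  )
}

# anti_join() keeps rows of exped_tidy_small whose PEAKID does NOT appear in peaks_tidy_small
anti_res <- exped_tidy_small %>% anti_join(peaks_tidy_small, by = "PEAKID")
anti_res
```

Together, semi_join() and anti_join() split exped_tidy_small into matched and unmatched rows, so their row counts (0 and 10 here) add up to the original 10.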

Daniel Lee

2022-10-05
