1. Import your data

Import two related datasets from the TidyTuesday project.

penguins <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-04-15/penguins.csv')
## Rows: 344 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): species, island, sex
## dbl (5): bill_len, bill_dep, flipper_len, body_mass, year
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
penguins_raw <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-04-15/penguins_raw.csv')
## Rows: 344 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (9): studyName, Species, Region, Island, Stage, Individual ID, Clutch C...
## dbl  (7): Sample Number, Culmen Length (mm), Culmen Depth (mm), Flipper Leng...
## date (1): Date Egg
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

2. Make data small

Describe the two datasets:

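Data 1: penguins (344 rows, 8 columns, with short lower-case column names such as species, island and sex)

Data 2: penguins_raw (344 rows, 17 columns, with the original column names such as Species, Island and Date Egg)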

library(dplyr)  # provides %>%, select(), sample_n() and the join functions

set.seed(1234)
penguins_small <- penguins %>% select(sex, species, year) %>% sample_n(10)

penguins_raw_small <- penguins_raw %>% select(Species, Sex, Island) %>% sample_n(10)

penguins_small
## # A tibble: 10 × 3
##    sex    species    year
##    <chr>  <chr>     <dbl>
##  1 male   Chinstrap  2007
##  2 female Chinstrap  2009
##  3 female Adelie     2009
##  4 female Adelie     2009
##  5 female Adelie     2009
##  6 male   Adelie     2008
##  7 female Adelie     2009
##  8 male   Gentoo     2008
##  9 female Adelie     2008
## 10 female Chinstrap  2009
penguins_raw_small
## # A tibble: 10 × 3
##    Species                             Sex    Island   
##    <chr>                               <chr>  <chr>    
##  1 Adelie Penguin (Pygoscelis adeliae) FEMALE Torgersen
##  2 Gentoo penguin (Pygoscelis papua)   MALE   Biscoe   
##  3 Gentoo penguin (Pygoscelis papua)   FEMALE Biscoe   
##  4 Adelie Penguin (Pygoscelis adeliae) MALE   Biscoe   
##  5 Adelie Penguin (Pygoscelis adeliae) <NA>   Torgersen
##  6 Adelie Penguin (Pygoscelis adeliae) FEMALE Dream    
##  7 Adelie Penguin (Pygoscelis adeliae) MALE   Dream    
##  8 Gentoo penguin (Pygoscelis papua)   MALE   Biscoe   
##  9 Gentoo penguin (Pygoscelis papua)   FEMALE Biscoe   
## 10 Adelie Penguin (Pygoscelis adeliae) FEMALE Dream

3. inner_join

Describe the resulting data:
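The join returns an empty tibble.
* Columns: Sex, Species, year, Island
* Rows: 0

How is it different from the original two datasets? There are no rows of data anymore. No row matches on both Sex and Species, because the two samples code the values differently ("male" vs. "MALE", "Adelie" vs. "Adelie Penguin (Pygoscelis adeliae)"). The result still has the columns of both inputs: year from penguins_small and Island from penguins_raw_small.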

names(penguins_small)[names(penguins_small) == "sex"] <- "Sex"
names(penguins_small)[names(penguins_small) == "species"] <- "Species"

penguins_small %>% inner_join(penguins_raw_small)
## Joining with `by = join_by(Sex, Species)`
## # A tibble: 0 × 4
## # ℹ 4 variables: Sex <chr>, Species <chr>, year <dbl>, Island <chr>
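
The empty result above comes from key values that are spelled differently in the two samples. One possible fix, sketched here on small made-up tibbles rather than the real samples, is to recode the raw keys before joining. The object names `small`, `raw_small` and `raw_clean` are hypothetical, and the recoding rule (lower-case sex, first word of the species name) is an assumption that fits the values printed in step 2.

```r
library(dplyr)

# Toy stand-ins for the two samples (hypothetical rows, not the real data)
small <- tibble(
  Sex     = c("male", "female"),
  Species = c("Gentoo", "Adelie"),
  year    = c(2008, 2009)
)
raw_small <- tibble(
  Species = c("Gentoo penguin (Pygoscelis papua)",
              "Adelie Penguin (Pygoscelis adeliae)"),
  Sex     = c("MALE", "FEMALE"),
  Island  = c("Biscoe", "Dream")
)

# Recode the raw keys so they use the same spelling as the cleaned data:
# lower-case sex, and only the first word of the species name
raw_clean <- raw_small %>%
  mutate(
    Sex     = tolower(Sex),
    Species = sub(" .*", "", Species)
  )

# With matching spellings, both toy rows find a partner
joined <- small %>% inner_join(raw_clean, by = c("Sex", "Species"))
joined
```

Naming the keys in `by` also avoids the "Joining with ..." message and makes the intended match explicit.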

4. left_join

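Describe the resulting data:
Every row of penguins_small is paired with every row of penguins from the same year.
* Columns: Sex, Species, year, species, island, bill_len, bill_dep, flipper_len, body_mass, sex
* Rows: 1,172

How is it different from the original two datasets? There are a lot more rows (1,172 versus 10 and 344) and a lot more columns (10 versus 3 and 8). The row count grows because year is not a unique key, so each row of penguins_small matches many rows of penguins.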

penguins_small %>% left_join(penguins, by = "year")
## Warning in left_join(., penguins, by = "year"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 1 of `x` matches multiple rows in `y`.
## ℹ Row 101 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.
## # A tibble: 1,172 × 10
##    Sex   Species    year species island  bill_len bill_dep flipper_len body_mass
##    <chr> <chr>     <dbl> <chr>   <chr>      <dbl>    <dbl>       <dbl>     <dbl>
##  1 male  Chinstrap  2007 Adelie  Torger…     39.1     18.7         181      3750
##  2 male  Chinstrap  2007 Adelie  Torger…     39.5     17.4         186      3800
##  3 male  Chinstrap  2007 Adelie  Torger…     40.3     18           195      3250
##  4 male  Chinstrap  2007 Adelie  Torger…     NA       NA            NA        NA
##  5 male  Chinstrap  2007 Adelie  Torger…     36.7     19.3         193      3450
##  6 male  Chinstrap  2007 Adelie  Torger…     39.3     20.6         190      3650
##  7 male  Chinstrap  2007 Adelie  Torger…     38.9     17.8         181      3625
##  8 male  Chinstrap  2007 Adelie  Torger…     39.2     19.6         195      4675
##  9 male  Chinstrap  2007 Adelie  Torger…     34.1     18.1         193      3475
## 10 male  Chinstrap  2007 Adelie  Torger…     42       20.2         190      4250
## # ℹ 1,162 more rows
## # ℹ 1 more variable: sex <chr>
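
The many-to-many warning and the large row count follow from simple arithmetic: each row of `x` is repeated once for every row of `y` with the same key. A minimal sketch of that calculation on made-up data is shown below; `x`, `y`, `y_counts` and `expected_rows` are hypothetical names, not objects from this report.

```r
library(dplyr)

# Toy data (hypothetical): x has 3 rows, y has 5 rows, both keyed by year
x <- tibble(year = c(2007, 2008, 2008), id_x = 1:3)
y <- tibble(year = c(2007, 2007, 2008, 2008, 2008), id_y = 1:5)

# Number of rows in y for each key value
y_counts <- y %>% count(year, name = "n_y")

# Each x row is repeated once per matching y row, so the expected size of
# the join is the sum of n_y over the rows of x (unmatched rows count once)
expected_rows <- x %>%
  left_join(y_counts, by = "year") %>%
  summarise(total = sum(coalesce(n_y, 1L))) %>%
  pull(total)

# Declaring the relationship silences the warning when it is intended
joined <- x %>% left_join(y, by = "year", relationship = "many-to-many")

expected_rows
nrow(joined)
```

The same reasoning applies to the real data: the 1,172 rows are the sum, over the 10 rows of penguins_small, of the number of penguins rows that share each row's year.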

5. right_join

Describe the resulting data:
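There are a lot more rows and columns. Every row of penguins is kept and paired with each row of penguins_raw_small from the same island.
* Columns: Species, Sex, Island, species, bill_len, bill_dep, flipper_len, body_mass, sex, year
* Rows: 1,316

How is it different from the original two datasets? It has far more rows than either input, but each row gives a lot more information than the 3 columns of penguins_raw_small alone.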

names(penguins)[names(penguins) == "island"] <- "Island"
penguins_raw_small %>% right_join(penguins, by = "Island")
## Warning in right_join(., penguins, by = "Island"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 1 of `x` matches multiple rows in `y`.
## ℹ Row 21 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.
## # A tibble: 1,316 × 10
##    Species    Sex   Island species bill_len bill_dep flipper_len body_mass sex  
##    <chr>      <chr> <chr>  <chr>      <dbl>    <dbl>       <dbl>     <dbl> <chr>
##  1 Adelie Pe… FEMA… Torge… Adelie      39.1     18.7         181      3750 male 
##  2 Adelie Pe… FEMA… Torge… Adelie      39.5     17.4         186      3800 fema…
##  3 Adelie Pe… FEMA… Torge… Adelie      40.3     18           195      3250 fema…
##  4 Adelie Pe… FEMA… Torge… Adelie      NA       NA            NA        NA <NA> 
##  5 Adelie Pe… FEMA… Torge… Adelie      36.7     19.3         193      3450 fema…
##  6 Adelie Pe… FEMA… Torge… Adelie      39.3     20.6         190      3650 male 
##  7 Adelie Pe… FEMA… Torge… Adelie      38.9     17.8         181      3625 fema…
##  8 Adelie Pe… FEMA… Torge… Adelie      39.2     19.6         195      4675 male 
##  9 Adelie Pe… FEMA… Torge… Adelie      34.1     18.1         193      3475 <NA> 
## 10 Adelie Pe… FEMA… Torge… Adelie      42       20.2         190      4250 <NA> 
## # ℹ 1,306 more rows
## # ℹ 1 more variable: year <dbl>
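
In the output above every island in penguins has a match in the sample, so no missing values are introduced. The sketch below uses made-up data to show what right_join does when a key exists only in the right-hand table, and how it relates to left_join with the inputs swapped. All names and values are hypothetical.

```r
library(dplyr)

# Toy data (hypothetical values): "Torgersen" appears only in y
x <- tibble(Island = c("Biscoe", "Dream"), n_sampled = c(5L, 3L))
y <- tibble(Island = c("Biscoe", "Dream", "Torgersen"),
            body_mass = c(5000, 3700, 3750))

# right_join keeps every row of y; x's columns are NA where x has no match
right <- x %>% right_join(y, by = "Island")

# Swapping the inputs and using left_join keeps the same rows,
# but puts y's columns first
left_swapped <- y %>% left_join(x, by = "Island")

right
left_swapped
```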

6. full_join

Describe the resulting data:
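It has more rows and more columns than either input. The row count is the same as for the left_join in step 4, because every year in penguins also appears in penguins_small, so there are no unmatched rows to add.
* Columns: Sex, Species, year, species, Island, bill_len, bill_dep, flipper_len, body_mass, sex
* Rows: 1,172

How is it different from the original two datasets? There is far more data, which makes the result a bit overwhelming and less organized than either input.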

penguins_small %>% full_join(penguins, by = "year")
## Warning in full_join(., penguins, by = "year"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 1 of `x` matches multiple rows in `y`.
## ℹ Row 101 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.
## # A tibble: 1,172 × 10
##    Sex   Species    year species Island  bill_len bill_dep flipper_len body_mass
##    <chr> <chr>     <dbl> <chr>   <chr>      <dbl>    <dbl>       <dbl>     <dbl>
##  1 male  Chinstrap  2007 Adelie  Torger…     39.1     18.7         181      3750
##  2 male  Chinstrap  2007 Adelie  Torger…     39.5     17.4         186      3800
##  3 male  Chinstrap  2007 Adelie  Torger…     40.3     18           195      3250
##  4 male  Chinstrap  2007 Adelie  Torger…     NA       NA            NA        NA
##  5 male  Chinstrap  2007 Adelie  Torger…     36.7     19.3         193      3450
##  6 male  Chinstrap  2007 Adelie  Torger…     39.3     20.6         190      3650
##  7 male  Chinstrap  2007 Adelie  Torger…     38.9     17.8         181      3625
##  8 male  Chinstrap  2007 Adelie  Torger…     39.2     19.6         195      4675
##  9 male  Chinstrap  2007 Adelie  Torger…     34.1     18.1         193      3475
## 10 male  Chinstrap  2007 Adelie  Torger…     42       20.2         190      4250
## # ℹ 1,162 more rows
## # ℹ 1 more variable: sex <chr>
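
full_join differs from left_join only by the rows of `y` whose key never occurs in `x`. A rough way to check this, sketched on made-up data with hypothetical names, is to list those extra keys and compare the row counts of the two joins.

```r
library(dplyr)

# Toy data (hypothetical values)
x <- tibble(year = c(2007, 2008), id_x = 1:2)
y <- tibble(year = c(2007, 2008, 2009), id_y = 1:3)

# Keys that occur in y but not in x: these are the rows full_join adds
extra_keys <- setdiff(y$year, x$year)
extra_keys

n_left <- nrow(left_join(x, y, by = "year"))  # only x's years
n_full <- nrow(full_join(x, y, by = "year"))  # also the unmatched 2009 row

# When y has no extra keys, the two joins return the same number of rows
y_subset   <- filter(y, year %in% x$year)
n_left_sub <- nrow(left_join(x, y_subset, by = "year"))
n_full_sub <- nrow(full_join(x, y_subset, by = "year"))

c(n_left = n_left, n_full = n_full,
  n_left_sub = n_left_sub, n_full_sub = n_full_sub)
```

This is consistent with the report's own output, where full_join and left_join by year both return 1,172 rows.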

7. semi_join

Describe the resulting data:
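The result stays short: it keeps the 10 rows and 3 columns of penguins_raw_small.
* Columns: Species, Sex, Island
* Rows: 10

How is it different from the original two datasets? It is identical to penguins_raw_small. Every Island value in the sample also occurs in penguins, so no rows are dropped, and semi_join does not add any columns from penguins.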

penguins_raw_small %>% semi_join(penguins, by = "Island")
## # A tibble: 10 × 3
##    Species                             Sex    Island   
##    <chr>                               <chr>  <chr>    
##  1 Adelie Penguin (Pygoscelis adeliae) FEMALE Torgersen
##  2 Gentoo penguin (Pygoscelis papua)   MALE   Biscoe   
##  3 Gentoo penguin (Pygoscelis papua)   FEMALE Biscoe   
##  4 Adelie Penguin (Pygoscelis adeliae) MALE   Biscoe   
##  5 Adelie Penguin (Pygoscelis adeliae) <NA>   Torgersen
##  6 Adelie Penguin (Pygoscelis adeliae) FEMALE Dream    
##  7 Adelie Penguin (Pygoscelis adeliae) MALE   Dream    
##  8 Gentoo penguin (Pygoscelis papua)   MALE   Biscoe   
##  9 Gentoo penguin (Pygoscelis papua)   FEMALE Biscoe   
## 10 Adelie Penguin (Pygoscelis adeliae) FEMALE Dream

8. anti_join

Describe the resulting data:
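The result is empty, and the columns are the same as in penguins_small.
* Columns: Sex, Species, year
* Rows: 0

How is it different from the original two datasets? It has the same columns as penguins_small but no rows of data, because every year in penguins_small also appears in penguins, so anti_join drops all of them.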

penguins_small %>% anti_join(penguins, by = "year")
## # A tibble: 0 × 3
## # ℹ 3 variables: Sex <chr>, Species <chr>, year <dbl>
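
semi_join and anti_join are filtering joins: they only decide which rows of the left table to keep and never add columns or duplicate rows. Because every key matched in steps 7 and 8, the outputs there are all-or-nothing. The sketch below uses made-up data with one non-matching key to show both outcomes side by side; all names are hypothetical.

```r
library(dplyr)

# Toy data (hypothetical values): year 2010 has no match in y
x <- tibble(year = c(2007, 2008, 2010), id_x = 1:3)
y <- tibble(year = c(2007, 2007, 2008), id_y = 1:3)

# semi_join: rows of x with at least one match in y (never duplicated)
kept <- x %>% semi_join(y, by = "year")

# anti_join: rows of x with no match in y
dropped <- x %>% anti_join(y, by = "year")

kept
dropped

# Together the two results partition x, and neither gains columns from y
nrow(kept) + nrow(dropped) == nrow(x)
```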