Import two related datasets from the TidyTuesday project.
penguins <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-04-15/penguins.csv')
## Rows: 344 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): species, island, sex
## dbl (5): bill_len, bill_dep, flipper_len, body_mass, year
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
penguins_raw <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-04-15/penguins_raw.csv')
## Rows: 344 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (9): studyName, Species, Region, Island, Stage, Individual ID, Clutch C...
## dbl (7): Sample Number, Culmen Length (mm), Culmen Depth (mm), Flipper Leng...
## date (1): Date Egg
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
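The message printed after each import suggests two ways to quiet it. The sketch below tries both on a small inline CSV, so it runs without a network connection. The inline text and the column types are assumptions for illustration, not the real TidyTuesday files.

```r
library(readr)

# Inline CSV text wrapped in I() so readr treats it as literal data, not a path.
# The three rows here are made up for illustration.
csv_text <- I("species,island,year\nAdelie,Torgersen,2007\nGentoo,Biscoe,2008\n")

# Option 1: keep type guessing but suppress the column specification message.
quiet <- read_csv(csv_text, show_col_types = FALSE)

# Option 2: declare the column types explicitly, which also silences the message.
typed <- read_csv(
  csv_text,
  col_types = cols(
    species = col_character(),
    island  = col_character(),
    year    = col_double()
  )
)
```

Declaring the types is more typing, but it makes the import fail loudly if a column ever changes type.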
Describe the two datasets:
Data 1: penguins (344 rows, 8 columns)
Data 2: penguins_raw (344 rows, 17 columns)
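library(dplyr)  # provides %>%, select(), sample_n(), and the join functions
set.seed(1234)  # make the random samples reproducible
penguins_small <- penguins %>% select(sex, species, year) %>% sample_n(10)
penguins_raw_small <- penguins_raw %>% select(Species, Sex, Island) %>% sample_n(10)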
penguins_small
## # A tibble: 10 × 3
## sex species year
## <chr> <chr> <dbl>
## 1 male Chinstrap 2007
## 2 female Chinstrap 2009
## 3 female Adelie 2009
## 4 female Adelie 2009
## 5 female Adelie 2009
## 6 male Adelie 2008
## 7 female Adelie 2009
## 8 male Gentoo 2008
## 9 female Adelie 2008
## 10 female Chinstrap 2009
penguins_raw_small
## # A tibble: 10 × 3
## Species Sex Island
## <chr> <chr> <chr>
## 1 Adelie Penguin (Pygoscelis adeliae) FEMALE Torgersen
## 2 Gentoo penguin (Pygoscelis papua) MALE Biscoe
## 3 Gentoo penguin (Pygoscelis papua) FEMALE Biscoe
## 4 Adelie Penguin (Pygoscelis adeliae) MALE Biscoe
## 5 Adelie Penguin (Pygoscelis adeliae) <NA> Torgersen
## 6 Adelie Penguin (Pygoscelis adeliae) FEMALE Dream
## 7 Adelie Penguin (Pygoscelis adeliae) MALE Dream
## 8 Gentoo penguin (Pygoscelis papua) MALE Biscoe
## 9 Gentoo penguin (Pygoscelis papua) FEMALE Biscoe
## 10 Adelie Penguin (Pygoscelis adeliae) FEMALE Dream
Describe the resulting data:
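The inner join returns no rows, because no Sex and Species pair appears in both samples: penguins_small stores values such as "male" and "Adelie", while penguins_raw_small stores "MALE" and "Adelie Penguin (Pygoscelis adeliae)".
* Columns: Sex, Species, year, Island
* Rows: 0
How is it different from the original two datasets? It combines the columns of both samples, but it has no rows of data.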
names(penguins_small)[names(penguins_small) == "sex"] <- "Sex"
names(penguins_small)[names(penguins_small) == "species"] <- "Species"
penguins_small %>% inner_join(penguins_raw_small)
## Joining with `by = join_by(Sex, Species)`
## # A tibble: 0 × 4
## # ℹ 4 variables: Sex <chr>, Species <chr>, year <dbl>, Island <chr>
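An inner join comes back empty when the key values are spelled differently in the two tables. The sketch below shows one possible way to normalise the keys first. The toy tibbles and the cleaning rules (lower-casing Sex, keeping the first word of Species) are assumptions for illustration, not part of the original exercise.

```r
library(dplyr)

# Toy stand-ins for penguins_small and penguins_raw_small (values are illustrative).
small <- tibble(
  Sex     = c("male", "female", "female"),
  Species = c("Chinstrap", "Adelie", "Gentoo"),
  year    = c(2007, 2009, 2008)
)
raw_small <- tibble(
  Species = c("Adelie Penguin (Pygoscelis adeliae)", "Gentoo penguin (Pygoscelis papua)"),
  Sex     = c("FEMALE", "MALE"),
  Island  = c("Torgersen", "Biscoe")
)

# As written, no key pair matches, so the inner join is empty.
unmatched <- inner_join(small, raw_small, by = c("Sex", "Species"))

# Normalise the raw keys: lower-case Sex, keep only the first word of Species.
raw_clean <- raw_small %>%
  mutate(
    Sex     = tolower(Sex),
    Species = sub(" .*", "", Species)
  )

# Now the female Adelie row matches and picks up its Island.
matched <- inner_join(small, raw_clean, by = c("Sex", "Species"))
```

Naming the keys in `by` also removes the "Joining with" message and makes the join condition explicit.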
Describe the resulting data:
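How is it different from the original two datasets? There are a lot more rows and columns: 1,172 rows and 10 columns, because each row of penguins_small is paired with every row of penguins from the same year.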
penguins_small %>% left_join(penguins, by = "year")
## Warning in left_join(., penguins, by = "year"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 1 of `x` matches multiple rows in `y`.
## ℹ Row 101 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
## # A tibble: 1,172 × 10
## Sex Species year species island bill_len bill_dep flipper_len body_mass
## <chr> <chr> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 male Chinstrap 2007 Adelie Torger… 39.1 18.7 181 3750
## 2 male Chinstrap 2007 Adelie Torger… 39.5 17.4 186 3800
## 3 male Chinstrap 2007 Adelie Torger… 40.3 18 195 3250
## 4 male Chinstrap 2007 Adelie Torger… NA NA NA NA
## 5 male Chinstrap 2007 Adelie Torger… 36.7 19.3 193 3450
## 6 male Chinstrap 2007 Adelie Torger… 39.3 20.6 190 3650
## 7 male Chinstrap 2007 Adelie Torger… 38.9 17.8 181 3625
## 8 male Chinstrap 2007 Adelie Torger… 39.2 19.6 195 4675
## 9 male Chinstrap 2007 Adelie Torger… 34.1 18.1 193 3475
## 10 male Chinstrap 2007 Adelie Torger… 42 20.2 190 4250
## # ℹ 1,162 more rows
## # ℹ 1 more variable: sex <chr>
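The many-to-many warning explains the jump in row count: each row on the left is repeated once for every matching row on the right. The sketch below predicts that count on toy data and then runs the join with `relationship = "many-to-many"`, the argument named in the warning. The tibbles `x` and `y` are hypothetical.

```r
library(dplyr)

# Toy data: two x rows share the year 2007, and y has three rows for 2007.
x <- tibble(id = c("a", "b", "c"), year = c(2007, 2007, 2008))
y <- tibble(year = c(2007, 2007, 2007, 2008, 2009), mass = c(10, 20, 30, 40, 50))

# Number of y rows available for each year.
y_counts <- count(y, year, name = "n_matches")

# Each x row is repeated once per matching y row (at least once in a left join).
expected_rows <- x %>%
  left_join(y_counts, by = "year") %>%
  summarise(total = sum(coalesce(n_matches, 1L))) %>%
  pull(total)

# Declaring the relationship silences the warning; the result is unchanged.
joined <- left_join(x, y, by = "year", relationship = "many-to-many")
```

Here `expected_rows` and `nrow(joined)` are both 7: three matches each for the two 2007 rows, plus one for the 2008 row.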
Describe the resulting data:
The result has many more rows and columns than either input.
* Columns: Species, Sex, Island, species, bill_len, bill_dep, flipper_len, body_mass, sex, year
* Rows: 1,316
How is it different from the original two datasets? It is much larger, but each row carries far more information than the 3 columns of penguins_raw_small.
names(penguins)[names(penguins) == "island"] <- "Island"
penguins_raw_small %>% right_join(penguins, by = "Island")
## Warning in right_join(., penguins, by = "Island"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 1 of `x` matches multiple rows in `y`.
## ℹ Row 21 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
## # A tibble: 1,316 × 10
## Species Sex Island species bill_len bill_dep flipper_len body_mass sex
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 Adelie Pe… FEMA… Torge… Adelie 39.1 18.7 181 3750 male
## 2 Adelie Pe… FEMA… Torge… Adelie 39.5 17.4 186 3800 fema…
## 3 Adelie Pe… FEMA… Torge… Adelie 40.3 18 195 3250 fema…
## 4 Adelie Pe… FEMA… Torge… Adelie NA NA NA NA <NA>
## 5 Adelie Pe… FEMA… Torge… Adelie 36.7 19.3 193 3450 fema…
## 6 Adelie Pe… FEMA… Torge… Adelie 39.3 20.6 190 3650 male
## 7 Adelie Pe… FEMA… Torge… Adelie 38.9 17.8 181 3625 fema…
## 8 Adelie Pe… FEMA… Torge… Adelie 39.2 19.6 195 4675 male
## 9 Adelie Pe… FEMA… Torge… Adelie 34.1 18.1 193 3475 <NA>
## 10 Adelie Pe… FEMA… Torge… Adelie 42 20.2 190 4250 <NA>
## # ℹ 1,306 more rows
## # ℹ 1 more variable: year <dbl>
Describe the resulting data:
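The result simply has more rows and more columns.
* Columns: Sex, Species, year, species, Island, bill_len, bill_dep, flipper_len, body_mass, sex
* Rows: 1,172
How is it different from the original two datasets? There is far more data, which is a bit overwhelming and less organized, because joining on year alone pairs each sampled penguin with unrelated penguins from the same year.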
penguins_small %>% full_join(penguins, by = "year")
## Warning in full_join(., penguins, by = "year"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 1 of `x` matches multiple rows in `y`.
## ℹ Row 101 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
## # A tibble: 1,172 × 10
## Sex Species year species Island bill_len bill_dep flipper_len body_mass
## <chr> <chr> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 male Chinstrap 2007 Adelie Torger… 39.1 18.7 181 3750
## 2 male Chinstrap 2007 Adelie Torger… 39.5 17.4 186 3800
## 3 male Chinstrap 2007 Adelie Torger… 40.3 18 195 3250
## 4 male Chinstrap 2007 Adelie Torger… NA NA NA NA
## 5 male Chinstrap 2007 Adelie Torger… 36.7 19.3 193 3450
## 6 male Chinstrap 2007 Adelie Torger… 39.3 20.6 190 3650
## 7 male Chinstrap 2007 Adelie Torger… 38.9 17.8 181 3625
## 8 male Chinstrap 2007 Adelie Torger… 39.2 19.6 195 4675
## 9 male Chinstrap 2007 Adelie Torger… 34.1 18.1 193 3475
## 10 male Chinstrap 2007 Adelie Torger… 42 20.2 190 4250
## # ℹ 1,162 more rows
## # ℹ 1 more variable: sex <chr>
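A full join returns the same number of rows as a left join whenever the right table has no keys that are missing from the left table. The sketch below uses toy data where each side has one unmatched key, so the three mutating joins give different results. The tibbles are hypothetical.

```r
library(dplyr)

# x has a year (2007) missing from y; y has a year (2009) missing from x.
x <- tibble(year = c(2007, 2008), label = c("x1", "x2"))
y <- tibble(year = c(2008, 2009), value = c(10, 20))

left  <- left_join(x, y, by = "year")   # keeps all x rows: 2007, 2008
right <- right_join(x, y, by = "year")  # keeps all y rows: 2008, 2009
full  <- full_join(x, y, by = "year")   # keeps every key: 2007, 2008, 2009
```

The unmatched rows are filled with `NA`: `value` is `NA` for 2007 and `label` is `NA` for 2009.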
Describe the resulting data:
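The result stays short: 10 rows and 3 columns.
* Columns: Species, Sex, Island
* Rows: 10
How is it different from the original two datasets? It is identical to penguins_raw_small, because every Island value in that sample also appears in penguins; no columns from penguins are added.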
penguins_raw_small %>% semi_join(penguins, by = "Island")
## # A tibble: 10 × 3
## Species Sex Island
## <chr> <chr> <chr>
## 1 Adelie Penguin (Pygoscelis adeliae) FEMALE Torgersen
## 2 Gentoo penguin (Pygoscelis papua) MALE Biscoe
## 3 Gentoo penguin (Pygoscelis papua) FEMALE Biscoe
## 4 Adelie Penguin (Pygoscelis adeliae) MALE Biscoe
## 5 Adelie Penguin (Pygoscelis adeliae) <NA> Torgersen
## 6 Adelie Penguin (Pygoscelis adeliae) FEMALE Dream
## 7 Adelie Penguin (Pygoscelis adeliae) MALE Dream
## 8 Gentoo penguin (Pygoscelis papua) MALE Biscoe
## 9 Gentoo penguin (Pygoscelis papua) FEMALE Biscoe
## 10 Adelie Penguin (Pygoscelis adeliae) FEMALE Dream
Describe the resulting data:
The result has no rows, and the columns are the same as in penguins_small.
* Columns: Sex, Species, year
* Rows: 0
How is it different from the original two datasets? It keeps the columns of penguins_small but has no rows, because every year in penguins_small also appears in penguins.
penguins_small %>% anti_join(penguins, by = "year")
## # A tibble: 0 × 3
## # ℹ 3 variables: Sex <chr>, Species <chr>, year <dbl>
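The two filtering joins never add columns or duplicate rows; they only decide which rows of the first table to keep. The sketch below shows them on toy data, with one Island value ("Elsewhere", made up for illustration) that has no match.

```r
library(dplyr)

# x is the table being filtered; y only supplies the keys to match against.
x <- tibble(Island = c("Biscoe", "Dream", "Elsewhere"), n = 1:3)
y <- tibble(Island = c("Biscoe", "Biscoe", "Dream"), mass = c(4000, 4200, 3700))

# semi_join keeps x rows that have a match in y (Biscoe once, despite two matches).
kept <- semi_join(x, y, by = "Island")

# anti_join keeps x rows that have no match in y.
dropped <- anti_join(x, y, by = "Island")
```

Together `kept` and `dropped` always partition `x`, which is why an empty anti join goes hand in hand with a semi join that returns the whole table.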