Import two related datasets from the TidyTuesday project.
episodes <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-08-08/episodes.csv')
## Rows: 300 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): title, guest
## dbl (4): season, episode_overall, episode_season, guest_appearance_number
## lgl (1): finished
## date (1): original_release
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
seasons <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-08-08/seasons.csv')
## Rows: 21 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): note
## dbl (2): season, episodes
## date (2): original_release, last_release
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(dplyr) # provides %>%, select(), sample_n() and the join functions
set.seed(1235)
episodes_small <- episodes %>% select(season, original_release, finished) %>% sample_n(10, replace = TRUE)
seasons_small <- seasons %>% select(season, original_release, note) %>% sample_n(10, replace = TRUE)
episodes_small
## # A tibble: 10 × 3
## season original_release finished
## <dbl> <date> <lgl>
## 1 2 2016-07-29 TRUE
## 2 13 2020-12-10 TRUE
## 3 13 2020-10-08 TRUE
## 4 5 2018-04-26 TRUE
## 5 14 2021-02-04 TRUE
## 6 4 2017-11-30 TRUE
## 7 8 2019-02-14 TRUE
## 8 17 2022-03-24 FALSE
## 9 15 2021-08-05 TRUE
## 10 19 2022-10-06 TRUE
seasons_small
## # A tibble: 10 × 3
## season original_release note
## <dbl> <date> <chr>
## 1 5 2018-01-18 <NA>
## 2 12 2020-06-25 <NA>
## 3 2 2015-12-10 <NA>
## 4 8 2019-01-24 <NA>
## 5 8 2019-01-24 <NA>
## 6 12 2020-06-25 <NA>
## 7 13 2020-10-01 <NA>
## 8 4 2017-07-20 <NA>
## 9 2 2015-12-10 <NA>
## 10 15 2021-05-27 <NA>
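Describe the two datasets:
Data 1: episodes_small has 10 rows and 3 columns (season, original_release, finished). Each row is one sampled episode, and original_release appears to be that episode's own release date.
Data 2: seasons_small has 10 rows and 3 columns (season, original_release, note). Each row is one sampled season, original_release appears to be the date the season started, and note is NA in every sampled row. Because sampling was done with replacement, seasons 2, 8 and 12 each appear twice.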
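Because both samples are drawn with replace = TRUE, the same row can be picked more than once, and repeated keys change how many rows a join returns. A minimal sketch of one way to check for repeats and drop them, using a hand-built stand-in tibble (seasons_demo, repeated and seasons_unique are illustrative names, not objects from the original code):

```r
library(dplyr)

# Hypothetical stand-in for a sample drawn with replacement
seasons_demo <- tibble(
  season = c(5, 12, 2, 8, 8, 12),
  original_release = as.Date(c(
    "2018-01-18", "2020-06-25", "2015-12-10",
    "2019-01-24", "2019-01-24", "2020-06-25"
  ))
)

# Seasons that occur more than once in the sample
repeated <- seasons_demo %>%
  count(season) %>%
  filter(n > 1)

# Keep one row per distinct season/date combination
seasons_unique <- distinct(seasons_demo)
```

Here repeated lists the seasons that were sampled more than once, and seasons_unique is a de-duplicated copy that could be used as the right-hand table of a join.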
Describe the resulting data:
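How is it different from the original two datasets? It has 0 rows and 4 columns: the three columns of seasons_small plus finished from episodes_small. No rows survive because the join matches on both season and original_release, and no season/date pair appears in both samples.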
seasons_small %>% inner_join(episodes_small)
## Joining with `by = join_by(season, original_release)`
## # A tibble: 0 × 4
## # ℹ 4 variables: season <dbl>, original_release <date>, note <chr>,
## # finished <lgl>
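The join message above shows that, with no by argument, dplyr matches on every column the two tables share, here season and original_release. Since the two date columns seem to mean different things (episode date versus season start), one option is to name the key explicitly. A minimal sketch with hand-built stand-in tibbles (episodes_demo, seasons_demo and the suffixes are illustrative assumptions, not part of the original code):

```r
library(dplyr)

# Hand-built stand-ins modelled on a few rows of each sample
episodes_demo <- tibble(
  season = c(2, 13, 17),
  original_release = as.Date(c("2016-07-29", "2020-12-10", "2022-03-24")),
  finished = c(TRUE, TRUE, FALSE)
)
seasons_demo <- tibble(
  season = c(2, 13, 12),
  original_release = as.Date(c("2015-12-10", "2020-10-01", "2020-06-25")),
  note = NA_character_
)

# Natural join: keys are season AND original_release, so nothing matches
by_both <- inner_join(episodes_demo, seasons_demo)

# Explicit key: match on season only; the two date columns are kept
# side by side and told apart by the suffixes
by_season <- inner_join(
  episodes_demo, seasons_demo,
  by = "season",
  suffix = c("_episode", "_season")
)
```

In this sketch by_both is empty, while by_season keeps the episodes whose season also appears in seasons_demo and carries both dates.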
Describe the resulting data:
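How is it different from the original two datasets? It has 10 rows and 4 columns. Every row of episodes_small is kept and a note column is added from seasons_small; note is NA in every row because no episode found a match (and note is NA throughout seasons_small in any case).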
episodes_small %>% left_join(seasons_small)
## Joining with `by = join_by(season, original_release)`
## # A tibble: 10 × 4
## season original_release finished note
## <dbl> <date> <lgl> <chr>
## 1 2 2016-07-29 TRUE <NA>
## 2 13 2020-12-10 TRUE <NA>
## 3 13 2020-10-08 TRUE <NA>
## 4 5 2018-04-26 TRUE <NA>
## 5 14 2021-02-04 TRUE <NA>
## 6 4 2017-11-30 TRUE <NA>
## 7 8 2019-02-14 TRUE <NA>
## 8 17 2022-03-24 FALSE <NA>
## 9 15 2021-08-05 TRUE <NA>
## 10 19 2022-10-06 TRUE <NA>
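If the aim is to attach season-level information to each episode, one possible approach is to rename the season's date before joining and match on season alone, so that unmatched episodes are the only rows that receive NA. A minimal sketch with hand-built stand-in tibbles (episodes_demo, season_starts and the season_start column name are illustrative assumptions):

```r
library(dplyr)

episodes_demo <- tibble(
  season = c(2, 13, 17),
  original_release = as.Date(c("2016-07-29", "2020-12-10", "2022-03-24")),
  finished = c(TRUE, TRUE, FALSE)
)

# Rename the season-level date so it no longer clashes with the episode date
season_starts <- tibble(
  season = c(2, 13),
  original_release = as.Date(c("2015-12-10", "2020-10-01"))
) %>%
  rename(season_start = original_release)

# Every episode is kept; season 17 has no match, so its season_start is NA
with_start <- left_join(episodes_demo, season_starts, by = "season")
```

Renaming first avoids the automatic .x/.y suffixes and makes it clear which date belongs to the episode and which to the season.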
Describe the resulting data:
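How is it different from the original two datasets? It has 10 rows and 4 columns. The rows are those of seasons_small, so season and original_release hold the season-level values rather than the episode values; finished is NA in every row because no episode matched.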
episodes_small %>% right_join(seasons_small)
## Joining with `by = join_by(season, original_release)`
## # A tibble: 10 × 4
## season original_release finished note
## <dbl> <date> <lgl> <chr>
## 1 5 2018-01-18 NA <NA>
## 2 12 2020-06-25 NA <NA>
## 3 2 2015-12-10 NA <NA>
## 4 8 2019-01-24 NA <NA>
## 5 8 2019-01-24 NA <NA>
## 6 12 2020-06-25 NA <NA>
## 7 13 2020-10-01 NA <NA>
## 8 4 2017-07-20 NA <NA>
## 9 2 2015-12-10 NA <NA>
## 10 15 2021-05-27 NA <NA>
Describe the resulting data:
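How is it different from the original two datasets? It has 20 rows and 4 columns: all 10 rows of episodes_small followed by all 10 rows of seasons_small, since nothing matched. finished is TRUE or FALSE for the episode rows and NA for the season rows.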
episodes_small %>% full_join(seasons_small)
## Joining with `by = join_by(season, original_release)`
## # A tibble: 20 × 4
## season original_release finished note
## <dbl> <date> <lgl> <chr>
## 1 2 2016-07-29 TRUE <NA>
## 2 13 2020-12-10 TRUE <NA>
## 3 13 2020-10-08 TRUE <NA>
## 4 5 2018-04-26 TRUE <NA>
## 5 14 2021-02-04 TRUE <NA>
## 6 4 2017-11-30 TRUE <NA>
## 7 8 2019-02-14 TRUE <NA>
## 8 17 2022-03-24 FALSE <NA>
## 9 15 2021-08-05 TRUE <NA>
## 10 19 2022-10-06 TRUE <NA>
## 11 5 2018-01-18 NA <NA>
## 12 12 2020-06-25 NA <NA>
## 13 2 2015-12-10 NA <NA>
## 14 8 2019-01-24 NA <NA>
## 15 8 2019-01-24 NA <NA>
## 16 12 2020-06-25 NA <NA>
## 17 13 2020-10-01 NA <NA>
## 18 4 2017-07-20 NA <NA>
## 19 2 2015-12-10 NA <NA>
## 20 15 2021-05-27 NA <NA>
Describe the resulting data:
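How is it different from the original two datasets? It has 0 rows and 3 columns. semi_join() keeps only the rows of episodes_small that have a match in seasons_small and never adds columns; nothing matched, so no rows remain.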
episodes_small %>% semi_join(seasons_small)
## Joining with `by = join_by(season, original_release)`
## # A tibble: 0 × 3
## # ℹ 3 variables: season <dbl>, original_release <date>, finished <lgl>
Describe the resulting data:
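How is it different from the original two datasets? It has 10 rows and 3 columns and is identical to episodes_small. anti_join() keeps the rows of episodes_small that have no match in seasons_small, which here is all of them, and adds no columns, so note is absent.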
episodes_small %>% anti_join(seasons_small)
## Joining with `by = join_by(season, original_release)`
## # A tibble: 10 × 3
## season original_release finished
## <dbl> <date> <lgl>
## 1 2 2016-07-29 TRUE
## 2 13 2020-12-10 TRUE
## 3 13 2020-10-08 TRUE
## 4 5 2018-04-26 TRUE
## 5 14 2021-02-04 TRUE
## 6 4 2017-11-30 TRUE
## 7 8 2019-02-14 TRUE
## 8 17 2022-03-24 FALSE
## 9 15 2021-08-05 TRUE
## 10 19 2022-10-06 TRUE
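The filtering joins become more informative when matched on season alone: semi_join() and anti_join() then split the episodes into those whose season was sampled and those whose season was not. A minimal sketch with hand-built stand-in tibbles (episodes_demo, seasons_demo, kept and dropped are illustrative names, not objects from the original code):

```r
library(dplyr)

episodes_demo <- tibble(
  season = c(2, 13, 17, 19),
  original_release = as.Date(c(
    "2016-07-29", "2020-12-10", "2022-03-24", "2022-10-06"
  )),
  finished = c(TRUE, TRUE, FALSE, TRUE)
)
# Season 2 appears twice, as can happen when sampling with replacement
seasons_demo <- tibble(
  season = c(2, 2, 13),
  original_release = as.Date(c("2015-12-10", "2015-12-10", "2020-10-01")),
  note = NA_character_
)

# Episodes whose season appears in seasons_demo (rows are never duplicated)
kept <- semi_join(episodes_demo, seasons_demo, by = "season")

# Episodes whose season does not appear in seasons_demo
dropped <- anti_join(episodes_demo, seasons_demo, by = "season")
```

The two results are complementary: together they contain every row of episodes_demo exactly once, and neither gains any column from seasons_demo.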