Import two related datasets from TidyTuesday Project.
taylor_album_songs <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-10-17/taylor_album_songs.csv')
## Rows: 194 Columns: 29
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): album_name, track_name, artist, featuring, key_name, mode_name, k...
## dbl (14): track_number, danceability, energy, key, loudness, mode, speechin...
## lgl (4): ep, bonus_track, explicit, lyrics
## date (4): album_release, promotional_release, single_release, track_release
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
taylor_all_songs <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-10-17/taylor_all_songs.csv')
## Rows: 274 Columns: 29
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): album_name, track_name, artist, featuring, key_name, mode_name, k...
## dbl (14): track_number, danceability, energy, key, loudness, mode, speechin...
## lgl (4): ep, bonus_track, explicit, lyrics
## date (4): album_release, promotional_release, single_release, track_release
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Describe the two datasets:
Data1: Taylor Swift Album songs
Data 2: Taylor Swift All songs
set.seed(1234)
taylor_album_songs_small <- taylor_album_songs %>% select(album_name, single_release, track_name) %>% sample_n(10)
taylor_all_songs_small <- taylor_all_songs %>% select(album_name, single_release, bonus_track) %>% sample_n(10)
taylor_album_songs_small
## # A tibble: 10 × 3
## album_name single_release track_name
## <chr> <date> <chr>
## 1 Fearless (Taylor's Version) NA Change (Taylor's Version)
## 2 Red (Taylor's Version) NA Better Man (Taylor's Version) [Fr…
## 3 folklore NA epiphany
## 4 1989 NA Clean
## 5 reputation NA So It Goes...
## 6 Lover NA Daylight
## 7 Lover 2019-06-14 You Need To Calm Down
## 8 evermore NA long story short
## 9 folklore NA seven
## 10 Lover NA False God
taylor_all_songs_small
## # A tibble: 10 × 3
## album_name single_release bonus_track
## <chr> <date> <lgl>
## 1 Red NA TRUE
## 2 Red (Taylor's Version) NA FALSE
## 3 Midnights NA FALSE
## 4 Red NA FALSE
## 5 Red 2013-06-21 FALSE
## 6 folklore NA FALSE
## 7 Speak Now 2011-07-18 FALSE
## 8 Taylor Swift NA FALSE
## 9 reputation NA FALSE
## 10 Fearless (Taylor's Version) NA FALSE
Describe the resulting data:
How is it different from the original two datasets? * 7 Rows compared to 10 rows in the original dataset * all columns from the two datasets
*many-to-many relationship in this set
taylor_album_songs_small %>% inner_join(taylor_all_songs_small)
## Joining with `by = join_by(album_name, single_release)`
## # A tibble: 5 × 4
## album_name single_release track_name bonus_track
## <chr> <date> <chr> <lgl>
## 1 Fearless (Taylor's Version) NA Change (Taylor's Versi… FALSE
## 2 Red (Taylor's Version) NA Better Man (Taylor's V… FALSE
## 3 folklore NA epiphany FALSE
## 4 reputation NA So It Goes... FALSE
## 5 folklore NA seven FALSE
Describe the resulting data:
How is it different from the original two datasets? * keeps all observations in album_name * preserves the original observations in album_name even when there isn’t a match * still has many-to-many relationship
left_join(taylor_album_songs_small, taylor_all_songs_small, by = "album_name")
## # A tibble: 10 × 5
## album_name single_release.x track_name single_release.y bonus_track
## <chr> <date> <chr> <date> <lgl>
## 1 Fearless (Taylor's … NA Change (T… NA FALSE
## 2 Red (Taylor's Versi… NA Better Ma… NA FALSE
## 3 folklore NA epiphany NA FALSE
## 4 1989 NA Clean NA NA
## 5 reputation NA So It Goe… NA FALSE
## 6 Lover NA Daylight NA NA
## 7 Lover 2019-06-14 You Need … NA NA
## 8 evermore NA long stor… NA NA
## 9 folklore NA seven NA FALSE
## 10 Lover NA False God NA NA
Describe the resulting data:
How is it different from the original two datasets? * joined by album_name and single_release * still has many-to-many relationship
right_join(taylor_album_songs_small, taylor_all_songs_small)
## Joining with `by = join_by(album_name, single_release)`
## # A tibble: 11 × 4
## album_name single_release track_name bonus_track
## <chr> <date> <chr> <lgl>
## 1 Fearless (Taylor's Version) NA Change (Taylor's Vers… FALSE
## 2 Red (Taylor's Version) NA Better Man (Taylor's … FALSE
## 3 folklore NA epiphany FALSE
## 4 reputation NA So It Goes... FALSE
## 5 folklore NA seven FALSE
## 6 Red NA <NA> TRUE
## 7 Midnights NA <NA> FALSE
## 8 Red NA <NA> FALSE
## 9 Red 2013-06-21 <NA> FALSE
## 10 Speak Now 2011-07-18 <NA> FALSE
## 11 Taylor Swift NA <NA> FALSE
Describe the resulting data:
How is it different from the original two datasets? * keeps all observations from both data sets * still many-to-many relationship
full_join(taylor_album_songs_small, taylor_all_songs_small)
## Joining with `by = join_by(album_name, single_release)`
## # A tibble: 16 × 4
## album_name single_release track_name bonus_track
## <chr> <date> <chr> <lgl>
## 1 Fearless (Taylor's Version) NA Change (Taylor's Vers… FALSE
## 2 Red (Taylor's Version) NA Better Man (Taylor's … FALSE
## 3 folklore NA epiphany FALSE
## 4 1989 NA Clean NA
## 5 reputation NA So It Goes... FALSE
## 6 Lover NA Daylight NA
## 7 Lover 2019-06-14 You Need To Calm Down NA
## 8 evermore NA long story short NA
## 9 folklore NA seven FALSE
## 10 Lover NA False God NA
## 11 Red NA <NA> TRUE
## 12 Midnights NA <NA> FALSE
## 13 Red NA <NA> FALSE
## 14 Red 2013-06-21 <NA> FALSE
## 15 Speak Now 2011-07-18 <NA> FALSE
## 16 Taylor Swift NA <NA> FALSE
Describe the resulting data:
How is it different from the original two datasets? * only five rows compared to ten in the original * smaller tibble 5X3, and kept all observations in taylor_album_songs_small that have a match in taylor_all_songs_small
semi_join(taylor_album_songs_small, taylor_all_songs_small)
## Joining with `by = join_by(album_name, single_release)`
## # A tibble: 5 × 3
## album_name single_release track_name
## <chr> <date> <chr>
## 1 Fearless (Taylor's Version) NA Change (Taylor's Version)
## 2 Red (Taylor's Version) NA Better Man (Taylor's Version) [Fro…
## 3 folklore NA epiphany
## 4 reputation NA So It Goes...
## 5 folklore NA seven
Describe the resulting data:
How is it different from the original two datasets? * dropped all observations in taylor_album_songs_small that have a match in taylor_all_songs_small * only five rows compared to ten in original
anti_join(taylor_album_songs_small, taylor_all_songs_small)
## Joining with `by = join_by(album_name, single_release)`
## # A tibble: 5 × 3
## album_name single_release track_name
## <chr> <date> <chr>
## 1 1989 NA Clean
## 2 Lover NA Daylight
## 3 Lover 2019-06-14 You Need To Calm Down
## 4 evermore NA long story short
## 5 Lover NA False God