This was probably my favorite assignment we’ve done yet. Just thought you should know.
Import two related datasets from TidyTuesday Project.
details <- read_excel("../00_data/BGDetails.xlsx")
ratings <- read_excel("../00_data/BGRatings.xlsx")
Describe the two datasets:
Data 1 Details
Data 2 Ratings
set.seed(2020)
details_small <- details %>% select(year, name, playingtime) %>%
filter(year %in% c("2022")) %>% sample_n(10)
ratings_small <- ratings %>% select(year, name, average) %>%
filter(year %in% c("2022")) %>% sample_n(10)
details_small
## # A tibble: 10 × 3
## year name playingtime
## <dbl> <chr> <dbl>
## 1 2022 Stationfall 120
## 2 2022 Sheep in Disguise 45
## 3 2022 Reload 0
## 4 2022 Guards of Atlantis II: Tabletop MOBA 120
## 5 2022 Mythic Mischief 90
## 6 2022 Shogun no Katana 180
## 7 2022 Foundations of Rome 90
## 8 2022 Caesar!: Seize Rome in 20 Minutes! 20
## 9 2022 Vagrantsong 120
## 10 2022 Trinidad 150
ratings_small
## # A tibble: 10 × 3
## year name average
## <dbl> <chr> <dbl>
## 1 2022 The Age of Atlantis 7.66
## 2 2022 Bardsung 7.22
## 3 2022 The Warp 8.99
## 4 2022 Trinidad 7.24
## 5 2022 Vagrantsong 8.15
## 6 2022 Dog Lover 7.09
## 7 2022 John Company: Second Edition 8.2
## 8 2022 Neuroriders 8.36
## 9 2022 Darwin's Journey 7.97
## 10 2022 Agemonia 8.69
Describe the resulting data:
How is it different from the original two datasets?
2 rows compared to the 10 rows
all columns from the 2 datasets
details_small %>% inner_join(ratings_small, by = c ("year", "name"))
## # A tibble: 2 × 4
## year name playingtime average
## <dbl> <chr> <dbl> <dbl>
## 1 2022 Vagrantsong 120 8.15
## 2 2022 Trinidad 150 7.24
Describe the resulting data:
How is it different from the original two datasets?
details_small %>% left_join(ratings_small, by = c ("year", "name"))
## # A tibble: 10 × 4
## year name playingtime average
## <dbl> <chr> <dbl> <dbl>
## 1 2022 Stationfall 120 NA
## 2 2022 Sheep in Disguise 45 NA
## 3 2022 Reload 0 NA
## 4 2022 Guards of Atlantis II: Tabletop MOBA 120 NA
## 5 2022 Mythic Mischief 90 NA
## 6 2022 Shogun no Katana 180 NA
## 7 2022 Foundations of Rome 90 NA
## 8 2022 Caesar!: Seize Rome in 20 Minutes! 20 NA
## 9 2022 Vagrantsong 120 8.15
## 10 2022 Trinidad 150 7.24
Describe the resulting data:
How is it different from the original two datasets?
*It includes all rows from the ratings_small table, even when not applicable
details_small %>% right_join(ratings_small, by = c ("year", "name"))
## # A tibble: 10 × 4
## year name playingtime average
## <dbl> <chr> <dbl> <dbl>
## 1 2022 Vagrantsong 120 8.15
## 2 2022 Trinidad 150 7.24
## 3 2022 The Age of Atlantis NA 7.66
## 4 2022 Bardsung NA 7.22
## 5 2022 The Warp NA 8.99
## 6 2022 Dog Lover NA 7.09
## 7 2022 John Company: Second Edition NA 8.2
## 8 2022 Neuroriders NA 8.36
## 9 2022 Darwin's Journey NA 7.97
## 10 2022 Agemonia NA 8.69
Describe the resulting data:
How is it different from the original two datasets?
*It used all the rows from both datasets, even when not applicable
details_small %>% full_join(ratings_small, by = c ("year", "name"))
## # A tibble: 18 × 4
## year name playingtime average
## <dbl> <chr> <dbl> <dbl>
## 1 2022 Stationfall 120 NA
## 2 2022 Sheep in Disguise 45 NA
## 3 2022 Reload 0 NA
## 4 2022 Guards of Atlantis II: Tabletop MOBA 120 NA
## 5 2022 Mythic Mischief 90 NA
## 6 2022 Shogun no Katana 180 NA
## 7 2022 Foundations of Rome 90 NA
## 8 2022 Caesar!: Seize Rome in 20 Minutes! 20 NA
## 9 2022 Vagrantsong 120 8.15
## 10 2022 Trinidad 150 7.24
## 11 2022 The Age of Atlantis NA 7.66
## 12 2022 Bardsung NA 7.22
## 13 2022 The Warp NA 8.99
## 14 2022 Dog Lover NA 7.09
## 15 2022 John Company: Second Edition NA 8.2
## 16 2022 Neuroriders NA 8.36
## 17 2022 Darwin's Journey NA 7.97
## 18 2022 Agemonia NA 8.69
Describe the resulting data:
How is it different from the original two datasets?
*It only shows the columns from the details_small dataset but only use rows that appear in both
details_small %>% semi_join(ratings_small, by = c ("year", "name"))
## # A tibble: 2 × 3
## year name playingtime
## <dbl> <chr> <dbl>
## 1 2022 Vagrantsong 120
## 2 2022 Trinidad 150
Describe the resulting data:
How is it different from the original two datasets?
details_small %>% anti_join(ratings_small, by = c ("year", "name"))
## # A tibble: 8 × 3
## year name playingtime
## <dbl> <chr> <dbl>
## 1 2022 Stationfall 120
## 2 2022 Sheep in Disguise 45
## 3 2022 Reload 0
## 4 2022 Guards of Atlantis II: Tabletop MOBA 120
## 5 2022 Mythic Mischief 90
## 6 2022 Shogun no Katana 180
## 7 2022 Foundations of Rome 90
## 8 2022 Caesar!: Seize Rome in 20 Minutes! 20