Week 9: Apply it to your data 8

This was probably my favorite assignment we’ve done yet. Just thought you should know.

1. Import your data

Import two related datasets from TidyTuesday Project.

details <- read_excel("../00_data/BGDetails.xlsx")

ratings <- read_excel("../00_data/BGRatings.xlsx")

2. Make data small

Describe the two datasets:

Data 1 Details

Columns: year, name, playingtime
Rows: 10 rows

Data 2 Ratings

Columns: year, name, average
Rows: 10 rows

set.seed(2020)
details_small <- details %>% select(year, name, playingtime) %>%
filter(year %in% c("2022")) %>% sample_n(10)
ratings_small <- ratings %>% select(year, name, average) %>% 
filter(year %in% c("2022")) %>% sample_n(10)


details_small

## # A tibble: 10 × 3
##     year name                                 playingtime
##    <dbl> <chr>                                      <dbl>
##  1  2022 Stationfall                                  120
##  2  2022 Sheep in Disguise                             45
##  3  2022 Reload                                         0
##  4  2022 Guards of Atlantis II: Tabletop MOBA         120
##  5  2022 Mythic Mischief                               90
##  6  2022 Shogun no Katana                             180
##  7  2022 Foundations of Rome                           90
##  8  2022 Caesar!: Seize Rome in 20 Minutes!            20
##  9  2022 Vagrantsong                                  120
## 10  2022 Trinidad                                     150

ratings_small

## # A tibble: 10 × 3
##     year name                         average
##    <dbl> <chr>                          <dbl>
##  1  2022 The Age of Atlantis             7.66
##  2  2022 Bardsung                        7.22
##  3  2022 The Warp                        8.99
##  4  2022 Trinidad                        7.24
##  5  2022 Vagrantsong                     8.15
##  6  2022 Dog Lover                       7.09
##  7  2022 John Company: Second Edition    8.2 
##  8  2022 Neuroriders                     8.36
##  9  2022 Darwin's Journey                7.97
## 10  2022 Agemonia                        8.69

3. inner_join

Describe the resulting data:

Columns: year, name, playingtime, average
Rows: 2

How is it different from the original two datasets?

2 rows compared to the 10 rows
all columns from the 2 datasets

details_small %>% inner_join(ratings_small, by = c ("year", "name"))

## # A tibble: 2 × 4
##    year name        playingtime average
##   <dbl> <chr>             <dbl>   <dbl>
## 1  2022 Vagrantsong         120    8.15
## 2  2022 Trinidad            150    7.24

4. left_join

Describe the resulting data:

Columns: year, name, playingtime, average
Rows: 10

How is it different from the original two datasets?

includes all names from the details_small table, even when not applicable

details_small %>% left_join(ratings_small, by = c ("year", "name"))

## # A tibble: 10 × 4
##     year name                                 playingtime average
##    <dbl> <chr>                                      <dbl>   <dbl>
##  1  2022 Stationfall                                  120   NA   
##  2  2022 Sheep in Disguise                             45   NA   
##  3  2022 Reload                                         0   NA   
##  4  2022 Guards of Atlantis II: Tabletop MOBA         120   NA   
##  5  2022 Mythic Mischief                               90   NA   
##  6  2022 Shogun no Katana                             180   NA   
##  7  2022 Foundations of Rome                           90   NA   
##  8  2022 Caesar!: Seize Rome in 20 Minutes!            20   NA   
##  9  2022 Vagrantsong                                  120    8.15
## 10  2022 Trinidad                                     150    7.24

5. right_join

Describe the resulting data:

Columns: year, name, playingtime, average
Rows: 10

How is it different from the original two datasets?

*It includes all rows from the ratings_small table, even when not applicable

details_small %>% right_join(ratings_small, by = c ("year", "name"))

## # A tibble: 10 × 4
##     year name                         playingtime average
##    <dbl> <chr>                              <dbl>   <dbl>
##  1  2022 Vagrantsong                          120    8.15
##  2  2022 Trinidad                             150    7.24
##  3  2022 The Age of Atlantis                   NA    7.66
##  4  2022 Bardsung                              NA    7.22
##  5  2022 The Warp                              NA    8.99
##  6  2022 Dog Lover                             NA    7.09
##  7  2022 John Company: Second Edition          NA    8.2 
##  8  2022 Neuroriders                           NA    8.36
##  9  2022 Darwin's Journey                      NA    7.97
## 10  2022 Agemonia                              NA    8.69

6. full_join

Describe the resulting data:

Columns: year, name, playingtime, average
Rows: 18

How is it different from the original two datasets?

*It used all the rows from both datasets, even when not applicable

details_small %>% full_join(ratings_small, by = c ("year", "name"))

## # A tibble: 18 × 4
##     year name                                 playingtime average
##    <dbl> <chr>                                      <dbl>   <dbl>
##  1  2022 Stationfall                                  120   NA   
##  2  2022 Sheep in Disguise                             45   NA   
##  3  2022 Reload                                         0   NA   
##  4  2022 Guards of Atlantis II: Tabletop MOBA         120   NA   
##  5  2022 Mythic Mischief                               90   NA   
##  6  2022 Shogun no Katana                             180   NA   
##  7  2022 Foundations of Rome                           90   NA   
##  8  2022 Caesar!: Seize Rome in 20 Minutes!            20   NA   
##  9  2022 Vagrantsong                                  120    8.15
## 10  2022 Trinidad                                     150    7.24
## 11  2022 The Age of Atlantis                           NA    7.66
## 12  2022 Bardsung                                      NA    7.22
## 13  2022 The Warp                                      NA    8.99
## 14  2022 Dog Lover                                     NA    7.09
## 15  2022 John Company: Second Edition                  NA    8.2 
## 16  2022 Neuroriders                                   NA    8.36
## 17  2022 Darwin's Journey                              NA    7.97
## 18  2022 Agemonia                                      NA    8.69

7. semi_join

Describe the resulting data:

Columns: year, name, playingtime
Rows: 2

How is it different from the original two datasets?

*It only shows the columns from the details_small dataset but only use rows that appear in both

details_small %>% semi_join(ratings_small, by = c ("year", "name"))

## # A tibble: 2 × 3
##    year name        playingtime
##   <dbl> <chr>             <dbl>
## 1  2022 Vagrantsong         120
## 2  2022 Trinidad            150

8. anti_join

Describe the resulting data:

Columns: year, name, playingtime
Rows: 8

How is it different from the original two datasets?

It shows the rows from details_small but without the rows that crossover with ratings_small

details_small %>% anti_join(ratings_small, by = c ("year", "name"))

## # A tibble: 8 × 3
##    year name                                 playingtime
##   <dbl> <chr>                                      <dbl>
## 1  2022 Stationfall                                  120
## 2  2022 Sheep in Disguise                             45
## 3  2022 Reload                                         0
## 4  2022 Guards of Atlantis II: Tabletop MOBA         120
## 5  2022 Mythic Mischief                               90
## 6  2022 Shogun no Katana                             180
## 7  2022 Foundations of Rome                           90
## 8  2022 Caesar!: Seize Rome in 20 Minutes!            20

Week 9: Apply it to your data 8

Paige Biester

2024-10-23

1. Import your data

2. Make data small

3. inner_join

4. left_join

5. right_join

6. full_join

7. semi_join

8. anti_join