1. Import your data

Import two related datasets from TidyTuesday Project.

breed_traits <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-02-01/breed_traits.csv')
## Rows: 195 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (3): Breed, Coat Type, Coat Length
## dbl (14): Affectionate With Family, Good With Young Children, Good With Othe...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
breed_rank_all <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-02-01/breed_rank.csv')
## Rows: 195 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Breed, links, Image
## dbl (8): 2013 Rank, 2014 Rank, 2015 Rank, 2016 Rank, 2017 Rank, 2018 Rank, 2...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

2. Make data small

Describe the two datasets: The first dataset shows different dog breeds and where they rank. The second data set shows the traits that each breed has.

Data1: breed rank

Data 2: breed traits

# Make breed_rank_all small with 20 random rows
breed_rank_all_small <- breed_rank_all %>%
    sample_n(20) %>%
    select(Breed, `2015 Rank`, `2020 Rank`)

# Make breed_traits small with 20 random rows
breed_traits_small <- breed_traits %>%
    sample_n(20) %>%
    select(Breed, `Playfulness Level`, `Barking Level`)
# Display the first few rows of each small dataset
breed_rank_all_small
## # A tibble: 20 × 3
##    Breed                         `2015 Rank` `2020 Rank`
##    <chr>                               <dbl>       <dbl>
##  1 Chinese Shar-Pei                       59          61
##  2 Norwich Terriers                      103         109
##  3 Scottish Terriers                      58          57
##  4 Pumik                                  NA         156
##  5 Norfolk Terriers                      131         130
##  6 Treeing Walker Coonhounds             117         153
##  7 Norwegian Buhunds                     165         168
##  8 Salukis                               132         144
##  9 American Eskimo Dogs                  118         119
## 10 Leonbergers                            93          98
## 11 Kerry Blue Terriers                   124         133
## 12 Mastiffs                               25          33
## 13 Redbone Coonhounds                    139         146
## 14 Maltese                                31          38
## 15 Manchester Terriers                   137         132
## 16 Rat Terriers                           NA          83
## 17 Grand Basset Griffon Vendeens          NA         188
## 18 Tibetan Spaniels                      115         118
## 19 Bergamasco Sheepdogs                   NA         173
## 20 Bichons Frises                         44          45
breed_traits_small
## # A tibble: 20 × 3
##    Breed                        `Playfulness Level` `Barking Level`
##    <chr>                                      <dbl>           <dbl>
##  1 Schipperkes                                    4               4
##  2 Kuvaszok                                       3               3
##  3 Belgian Sheepdogs                              3               3
##  4 Bulldogs                                       4               2
##  5 Samoyeds                                       5               5
##  6 Standard Schnauzers                            4               3
##  7 Wirehaired Pointing Griffons                   4               3
##  8 Keeshonden                                     5               4
##  9 Shiba Inu                                      3               3
## 10 Finnish Spitz                                  3               5
## 11 Bearded Collies                                4               5
## 12 Greyhounds                                     3               3
## 13 Skye Terriers                                  3               3
## 14 Staffordshire Bull Terriers                    4               3
## 15 Chinese Crested                                3               3
## 16 Icelandic Sheepdogs                            3               4
## 17 Pointers (German Wirehaired)                   4               3
## 18 Pulik                                          3               3
## 19 Setters (Irish)                                5               3
## 20 Wirehaired Vizslas                             5               3

3. inner_join

Describe the resulting data: The resulting dataset will include all columns from both datasets.

How is it different from the original two datasets? The inner join only keeps the intersection of both datasets. This means the resulting data has fewer rows than either dataset if not all breeds match.

inner_joined_data <- inner_join(breed_rank_all_small, breed_traits_small, by = "Breed")

inner_joined_data
## # A tibble: 0 × 5
## # ℹ 5 variables: Breed <chr>, 2015 Rank <dbl>, 2020 Rank <dbl>,
## #   Playfulness Level <dbl>, Barking Level <dbl>

4. left_join

Describe the resulting data: The resulting dataset will include all columns from both datasets.

How is it different from the original two datasets? This retains all rows from breed_rank_all_small. If a Breed in breed_rank_all_small has no match in breed_traits_small, it will still be included with NA values for the missing columns.

left_joined_data <- left_join(breed_rank_all_small, breed_traits_small, by = "Breed")

left_joined_data
## # A tibble: 20 × 5
##    Breed             `2015 Rank` `2020 Rank` `Playfulness Level` `Barking Level`
##    <chr>                   <dbl>       <dbl>               <dbl>           <dbl>
##  1 Chinese Shar-Pei           59          61                  NA              NA
##  2 Norwich Terriers          103         109                  NA              NA
##  3 Scottish Terriers          58          57                  NA              NA
##  4 Pumik                      NA         156                  NA              NA
##  5 Norfolk Terriers          131         130                  NA              NA
##  6 Treeing Walker C…         117         153                  NA              NA
##  7 Norwegian Buhunds         165         168                  NA              NA
##  8 Salukis                   132         144                  NA              NA
##  9 American Eskimo …         118         119                  NA              NA
## 10 Leonbergers                93          98                  NA              NA
## 11 Kerry Blue Terri…         124         133                  NA              NA
## 12 Mastiffs                   25          33                  NA              NA
## 13 Redbone Coonhoun…         139         146                  NA              NA
## 14 Maltese                    31          38                  NA              NA
## 15 Manchester Terri…         137         132                  NA              NA
## 16 Rat Terriers               NA          83                  NA              NA
## 17 Grand Basset Gri…          NA         188                  NA              NA
## 18 Tibetan Spaniels          115         118                  NA              NA
## 19 Bergamasco Sheep…          NA         173                  NA              NA
## 20 Bichons Frises             44          45                  NA              NA

5. right_join

Describe the resulting data: The resulting dataset will include all columns from both datasets.

How is it different from the original two datasets? This retains all rows from breed_traits_small. If a Breed in breed_traits_small has no match in breed_rank_all_small, it will still be included with NA values for the missing columns.

right_joined_data <- right_join(breed_rank_all_small, breed_traits_small, by = "Breed")

right_joined_data
## # A tibble: 20 × 5
##    Breed             `2015 Rank` `2020 Rank` `Playfulness Level` `Barking Level`
##    <chr>                   <dbl>       <dbl>               <dbl>           <dbl>
##  1 Schipperkes                NA          NA                   4               4
##  2 Kuvaszok                   NA          NA                   3               3
##  3 Belgian Sheepdogs          NA          NA                   3               3
##  4 Bulldogs                   NA          NA                   4               2
##  5 Samoyeds                   NA          NA                   5               5
##  6 Standard Schnauz…          NA          NA                   4               3
##  7 Wirehaired Point…          NA          NA                   4               3
##  8 Keeshonden                 NA          NA                   5               4
##  9 Shiba Inu                  NA          NA                   3               3
## 10 Finnish Spitz              NA          NA                   3               5
## 11 Bearded Collies            NA          NA                   4               5
## 12 Greyhounds                 NA          NA                   3               3
## 13 Skye Terriers              NA          NA                   3               3
## 14 Staffordshire Bu…          NA          NA                   4               3
## 15 Chinese Crested            NA          NA                   3               3
## 16 Icelandic Sheepd…          NA          NA                   3               4
## 17 Pointers (German…          NA          NA                   4               3
## 18 Pulik                      NA          NA                   3               3
## 19 Setters (Irish)            NA          NA                   5               3
## 20 Wirehaired Vizsl…          NA          NA                   5               3

6. full_join

Describe the resulting data: All columns from both datasets. Rows: All rows from both datasets are included. Breeds that don’t match will have NA values for the columns from the other dataset.

full_joined_data <- full_join(breed_rank_all_small, breed_traits_small, by = "Breed")

full_joined_data
## # A tibble: 40 × 5
##    Breed             `2015 Rank` `2020 Rank` `Playfulness Level` `Barking Level`
##    <chr>                   <dbl>       <dbl>               <dbl>           <dbl>
##  1 Chinese Shar-Pei           59          61                  NA              NA
##  2 Norwich Terriers          103         109                  NA              NA
##  3 Scottish Terriers          58          57                  NA              NA
##  4 Pumik                      NA         156                  NA              NA
##  5 Norfolk Terriers          131         130                  NA              NA
##  6 Treeing Walker C…         117         153                  NA              NA
##  7 Norwegian Buhunds         165         168                  NA              NA
##  8 Salukis                   132         144                  NA              NA
##  9 American Eskimo …         118         119                  NA              NA
## 10 Leonbergers                93          98                  NA              NA
## # ℹ 30 more rows

7. semi_join

Describe the resulting data: Only columns from breed_rank_all_small. Rows: Only rows where Breed exists in both datasets are included.

How is it different from the original two datasets? A semi join returns only the rows from breed_rank_all_small that have matches in breed_traits_small, without adding any columns from breed_traits_small.

semi_joined_data <- semi_join(breed_rank_all_small, breed_traits_small, by = "Breed")

semi_joined_data
## # A tibble: 0 × 3
## # ℹ 3 variables: Breed <chr>, 2015 Rank <dbl>, 2020 Rank <dbl>

8. anti_join

Describe the resulting data:
Only columns from breed_rank_all_small. Rows: Only rows from breed_rank_all_small that do not have matches in breed_traits_small.

How is it different from the original two datasets? The anti join returns rows from breed_rank_all_small that have no match in breed_traits_small. This shows the Breeds that are missing from the other dataset.

anti_joined_data <- anti_join(breed_rank_all_small, breed_traits_small, by = "Breed")

anti_joined_data
## # A tibble: 20 × 3
##    Breed                         `2015 Rank` `2020 Rank`
##    <chr>                               <dbl>       <dbl>
##  1 Chinese Shar-Pei                       59          61
##  2 Norwich Terriers                      103         109
##  3 Scottish Terriers                      58          57
##  4 Pumik                                  NA         156
##  5 Norfolk Terriers                      131         130
##  6 Treeing Walker Coonhounds             117         153
##  7 Norwegian Buhunds                     165         168
##  8 Salukis                               132         144
##  9 American Eskimo Dogs                  118         119
## 10 Leonbergers                            93          98
## 11 Kerry Blue Terriers                   124         133
## 12 Mastiffs                               25          33
## 13 Redbone Coonhounds                    139         146
## 14 Maltese                                31          38
## 15 Manchester Terriers                   137         132
## 16 Rat Terriers                           NA          83
## 17 Grand Basset Griffon Vendeens          NA         188
## 18 Tibetan Spaniels                      115         118
## 19 Bergamasco Sheepdogs                   NA         173
## 20 Bichons Frises                         44          45