Import two related datasets from TidyTuesday Project.
breed_traits <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-02-01/breed_traits.csv')
## Rows: 195 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Breed, Coat Type, Coat Length
## dbl (14): Affectionate With Family, Good With Young Children, Good With Othe...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
breed_rank_all <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-02-01/breed_rank.csv')
## Rows: 195 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Breed, links, Image
## dbl (8): 2013 Rank, 2014 Rank, 2015 Rank, 2016 Rank, 2017 Rank, 2018 Rank, 2...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Describe the two datasets: The first dataset shows different dog breeds and where they rank. The second data set shows the traits that each breed has.
Data1: breed rank
Data 2: breed traits
# Make breed_rank_all small with 20 random rows
breed_rank_all_small <- breed_rank_all %>%
sample_n(20) %>%
select(Breed, `2015 Rank`, `2020 Rank`)
# Make breed_traits small with 20 random rows
breed_traits_small <- breed_traits %>%
sample_n(20) %>%
select(Breed, `Playfulness Level`, `Barking Level`)
# Display the first few rows of each small dataset
breed_rank_all_small
## # A tibble: 20 × 3
## Breed `2015 Rank` `2020 Rank`
## <chr> <dbl> <dbl>
## 1 Chinese Shar-Pei 59 61
## 2 Norwich Terriers 103 109
## 3 Scottish Terriers 58 57
## 4 Pumik NA 156
## 5 Norfolk Terriers 131 130
## 6 Treeing Walker Coonhounds 117 153
## 7 Norwegian Buhunds 165 168
## 8 Salukis 132 144
## 9 American Eskimo Dogs 118 119
## 10 Leonbergers 93 98
## 11 Kerry Blue Terriers 124 133
## 12 Mastiffs 25 33
## 13 Redbone Coonhounds 139 146
## 14 Maltese 31 38
## 15 Manchester Terriers 137 132
## 16 Rat Terriers NA 83
## 17 Grand Basset Griffon Vendeens NA 188
## 18 Tibetan Spaniels 115 118
## 19 Bergamasco Sheepdogs NA 173
## 20 Bichons Frises 44 45
breed_traits_small
## # A tibble: 20 × 3
## Breed `Playfulness Level` `Barking Level`
## <chr> <dbl> <dbl>
## 1 Schipperkes 4 4
## 2 Kuvaszok 3 3
## 3 Belgian Sheepdogs 3 3
## 4 Bulldogs 4 2
## 5 Samoyeds 5 5
## 6 Standard Schnauzers 4 3
## 7 Wirehaired Pointing Griffons 4 3
## 8 Keeshonden 5 4
## 9 Shiba Inu 3 3
## 10 Finnish Spitz 3 5
## 11 Bearded Collies 4 5
## 12 Greyhounds 3 3
## 13 Skye Terriers 3 3
## 14 Staffordshire Bull Terriers 4 3
## 15 Chinese Crested 3 3
## 16 Icelandic Sheepdogs 3 4
## 17 Pointers (German Wirehaired) 4 3
## 18 Pulik 3 3
## 19 Setters (Irish) 5 3
## 20 Wirehaired Vizslas 5 3
Describe the resulting data: The resulting dataset will include all columns from both datasets.
How is it different from the original two datasets? The inner join only keeps the intersection of both datasets. This means the resulting data has fewer rows than either dataset if not all breeds match.
inner_joined_data <- inner_join(breed_rank_all_small, breed_traits_small, by = "Breed")
inner_joined_data
## # A tibble: 0 × 5
## # ℹ 5 variables: Breed <chr>, 2015 Rank <dbl>, 2020 Rank <dbl>,
## # Playfulness Level <dbl>, Barking Level <dbl>
Describe the resulting data: The resulting dataset will include all columns from both datasets.
Columns:from dataset breed_rank_all_small (Breed, 2015 Rank, 2020 Rank). from dataset breed_traits_small (Breed, playfulness level, barking level)
Rows: All rows from breed_rank_all_small. if a breed in breed_trait_small has non-matching rows, the data will be presented as NA
How is it different from the original two datasets? This retains all rows from breed_rank_all_small. If a Breed in breed_rank_all_small has no match in breed_traits_small, it will still be included with NA values for the missing columns.
left_joined_data <- left_join(breed_rank_all_small, breed_traits_small, by = "Breed")
left_joined_data
## # A tibble: 20 × 5
## Breed `2015 Rank` `2020 Rank` `Playfulness Level` `Barking Level`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Chinese Shar-Pei 59 61 NA NA
## 2 Norwich Terriers 103 109 NA NA
## 3 Scottish Terriers 58 57 NA NA
## 4 Pumik NA 156 NA NA
## 5 Norfolk Terriers 131 130 NA NA
## 6 Treeing Walker C… 117 153 NA NA
## 7 Norwegian Buhunds 165 168 NA NA
## 8 Salukis 132 144 NA NA
## 9 American Eskimo … 118 119 NA NA
## 10 Leonbergers 93 98 NA NA
## 11 Kerry Blue Terri… 124 133 NA NA
## 12 Mastiffs 25 33 NA NA
## 13 Redbone Coonhoun… 139 146 NA NA
## 14 Maltese 31 38 NA NA
## 15 Manchester Terri… 137 132 NA NA
## 16 Rat Terriers NA 83 NA NA
## 17 Grand Basset Gri… NA 188 NA NA
## 18 Tibetan Spaniels 115 118 NA NA
## 19 Bergamasco Sheep… NA 173 NA NA
## 20 Bichons Frises 44 45 NA NA
Describe the resulting data: The resulting dataset will include all columns from both datasets.
How is it different from the original two datasets? This retains all rows from breed_traits_small. If a Breed in breed_traits_small has no match in breed_rank_all_small, it will still be included with NA values for the missing columns.
right_joined_data <- right_join(breed_rank_all_small, breed_traits_small, by = "Breed")
right_joined_data
## # A tibble: 20 × 5
## Breed `2015 Rank` `2020 Rank` `Playfulness Level` `Barking Level`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Schipperkes NA NA 4 4
## 2 Kuvaszok NA NA 3 3
## 3 Belgian Sheepdogs NA NA 3 3
## 4 Bulldogs NA NA 4 2
## 5 Samoyeds NA NA 5 5
## 6 Standard Schnauz… NA NA 4 3
## 7 Wirehaired Point… NA NA 4 3
## 8 Keeshonden NA NA 5 4
## 9 Shiba Inu NA NA 3 3
## 10 Finnish Spitz NA NA 3 5
## 11 Bearded Collies NA NA 4 5
## 12 Greyhounds NA NA 3 3
## 13 Skye Terriers NA NA 3 3
## 14 Staffordshire Bu… NA NA 4 3
## 15 Chinese Crested NA NA 3 3
## 16 Icelandic Sheepd… NA NA 3 4
## 17 Pointers (German… NA NA 4 3
## 18 Pulik NA NA 3 3
## 19 Setters (Irish) NA NA 5 3
## 20 Wirehaired Vizsl… NA NA 5 3
Describe the resulting data: All columns from both datasets. Rows: All rows from both datasets are included. Breeds that don’t match will have NA values for the columns from the other dataset.
full_joined_data <- full_join(breed_rank_all_small, breed_traits_small, by = "Breed")
full_joined_data
## # A tibble: 40 × 5
## Breed `2015 Rank` `2020 Rank` `Playfulness Level` `Barking Level`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Chinese Shar-Pei 59 61 NA NA
## 2 Norwich Terriers 103 109 NA NA
## 3 Scottish Terriers 58 57 NA NA
## 4 Pumik NA 156 NA NA
## 5 Norfolk Terriers 131 130 NA NA
## 6 Treeing Walker C… 117 153 NA NA
## 7 Norwegian Buhunds 165 168 NA NA
## 8 Salukis 132 144 NA NA
## 9 American Eskimo … 118 119 NA NA
## 10 Leonbergers 93 98 NA NA
## # ℹ 30 more rows
Describe the resulting data: Only columns from breed_rank_all_small. Rows: Only rows where Breed exists in both datasets are included.
How is it different from the original two datasets? A semi join returns only the rows from breed_rank_all_small that have matches in breed_traits_small, without adding any columns from breed_traits_small.
semi_joined_data <- semi_join(breed_rank_all_small, breed_traits_small, by = "Breed")
semi_joined_data
## # A tibble: 0 × 3
## # ℹ 3 variables: Breed <chr>, 2015 Rank <dbl>, 2020 Rank <dbl>
Describe the resulting data:
Only columns from breed_rank_all_small. Rows: Only rows from
breed_rank_all_small that do not have matches in breed_traits_small.
How is it different from the original two datasets? The anti join returns rows from breed_rank_all_small that have no match in breed_traits_small. This shows the Breeds that are missing from the other dataset.
anti_joined_data <- anti_join(breed_rank_all_small, breed_traits_small, by = "Breed")
anti_joined_data
## # A tibble: 20 × 3
## Breed `2015 Rank` `2020 Rank`
## <chr> <dbl> <dbl>
## 1 Chinese Shar-Pei 59 61
## 2 Norwich Terriers 103 109
## 3 Scottish Terriers 58 57
## 4 Pumik NA 156
## 5 Norfolk Terriers 131 130
## 6 Treeing Walker Coonhounds 117 153
## 7 Norwegian Buhunds 165 168
## 8 Salukis 132 144
## 9 American Eskimo Dogs 118 119
## 10 Leonbergers 93 98
## 11 Kerry Blue Terriers 124 133
## 12 Mastiffs 25 33
## 13 Redbone Coonhounds 139 146
## 14 Maltese 31 38
## 15 Manchester Terriers 137 132
## 16 Rat Terriers NA 83
## 17 Grand Basset Griffon Vendeens NA 188
## 18 Tibetan Spaniels 115 118
## 19 Bergamasco Sheepdogs NA 173
## 20 Bichons Frises 44 45