Import two related datasets from the TidyTuesday project.
longbeach <- readr::read_csv("../00_data/longbeach.csv")
## Rows: 29787 Columns: 22
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (18): animal_id, animal_name, animal_type, primary_color, secondary_colo...
## dbl (2): latitude, longitude
## lgl (2): outcome_is_dead, was_outcome_alive
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
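library(dplyr)  # %>%, mutate(), across(), and the join verbs used below
library(readxl) # read_excel()
dallas <- read_excel("../00_data/week18_dallas_animals.xlsx") %>%
  # lowercase every character column so join keys compare consistently with longbeach
  mutate(across(where(is.character), tolower))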
longbeach
## # A tibble: 29,787 × 22
## animal_id animal_name animal_type primary_color secondary_color sex dob
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 A693708 *charlien dog white <NA> Female 2/21…
## 2 A708149 <NA> reptile brown green Unknown <NA>
## 3 A638068 <NA> bird green red Unknown <NA>
## 4 A639310 <NA> bird white gray Unknown <NA>
## 5 A618968 *morgan cat black white Female 12/1…
## 6 A730385 *brandon rabbit black white Neuter… 4/19…
## 7 A646202 <NA> bird black <NA> Unknown <NA>
## 8 A628138 <NA> other gray black Unknown 4/12…
## 9 A597464 <NA> cat black <NA> Unknown 8/21…
## 10 A734321 sophie dog cream <NA> Spayed 12/1…
## # ℹ 29,777 more rows
## # ℹ 15 more variables: intake_date <chr>, intake_condition <chr>,
## # intake_type <chr>, intake_subtype <chr>, reason_for_intake <chr>,
## # outcome_date <chr>, crossing <chr>, jurisdiction <chr>, outcome_type <chr>,
## # outcome_subtype <chr>, latitude <dbl>, longitude <dbl>,
## # outcome_is_dead <lgl>, was_outcome_alive <lgl>, geopoint <chr>
dallas
## # A tibble: 34,819 × 33
## animal_id animal_type animal_breed kennel_number kennel_status tag_type
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 a0979593 dog rhod ridgeback freezer unavailable na
## 2 a0743013 dog yorkshire terr receiving impounded na
## 3 a1004433 bird chicken bay 31 impounded na
## 4 a0969724 dog germ shepherd dc 15 unavailable na
## 5 a0981479 dog germ shepherd psdog 01 unavailable na
## 6 a0958138 dog basset hound lost lost report na
## 7 a1008940 cat domestic sh foster unavailable na
## 8 a1008867 dog germ shepherd lfd 069 impounded na
## 9 a1003731 cat domestic sh foster impounded na
## 10 a0957888 cat domestic sh cc 25 unavailable na
## # ℹ 34,809 more rows
## # ℹ 27 more variables: activity_number <chr>, activity_sequence <dbl>,
## # source_id <chr>, census_tract <chr>, council_district <chr>,
## # intake_type <chr>, intake_subtype <chr>, reason <chr>, staff_id <chr>,
## # intake_date <dttm>, intake_time <chr>, due_out <dttm>,
## # intake_condition <chr>, hold_request <chr>, outcome_type <chr>,
## # outcome_date <chr>, outcome_time <chr>, receipt_number <chr>, …
Describe the two datasets:
Data 1: longbeach (29,787 rows, 22 columns)
Data 2: dallas (34,819 rows, 33 columns)
set.seed(1234) # fix the seed so both random samples are reproducible
longbeach_small <- longbeach %>%
  select(animal_type, intake_type, primary_color) %>% sample_n(10)
dallas_small <- dallas %>%
  select(animal_type, intake_type, outcome_type) %>% sample_n(10)
longbeach_small
## # A tibble: 10 × 3
## animal_type intake_type primary_color
## <chr> <chr> <chr>
## 1 other wildlife brown
## 2 dog stray tricolor
## 3 other wildlife gray
## 4 dog stray tan
## 5 dog welfare seized brown brindle
## 6 cat stray black
## 7 cat stray brown tabby
## 8 dog stray tricolor
## 9 dog owner surrender blonde
## 10 cat stray brown tabby
dallas_small
## # A tibble: 10 × 3
## animal_type intake_type outcome_type
## <chr> <chr> <chr>
## 1 dog confiscated adoption
## 2 dog owner surrender euthanized
## 3 cat owner surrender adoption
## 4 dog stray euthanized
## 5 cat stray adoption
## 6 cat stray transfer
## 7 dog owner surrender euthanized
## 8 dog lost report lost report
## 9 dog stray euthanized
## 10 wildlife stray euthanized
Describe the resulting data:
How is it different from the original two datasets?
longbeach_small %>% inner_join(dallas_small, by = c("animal_type", "intake_type"), relationship = "many-to-many")
## # A tibble: 14 × 4
## animal_type intake_type primary_color outcome_type
## <chr> <chr> <chr> <chr>
## 1 dog stray tricolor euthanized
## 2 dog stray tricolor euthanized
## 3 dog stray tan euthanized
## 4 dog stray tan euthanized
## 5 cat stray black adoption
## 6 cat stray black transfer
## 7 cat stray brown tabby adoption
## 8 cat stray brown tabby transfer
## 9 dog stray tricolor euthanized
## 10 dog stray tricolor euthanized
## 11 dog owner surrender blonde euthanized
## 12 dog owner surrender blonde euthanized
## 13 cat stray brown tabby adoption
## 14 cat stray brown tabby transfer
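The inner join returns 14 rows even though each sample has only 10, because several key pairs repeat on both sides. In the samples printed above, dog/stray occurs 3 times in longbeach_small and 2 times in dallas_small, cat/stray 3 and 2 times, and dog/owner surrender 1 and 2 times, so the join yields 3×2 + 3×2 + 1×2 = 14 rows. The sketch below checks the same arithmetic on small made-up tables; `left_tbl`, `right_tbl`, and their values are hypothetical and are not the shelter data.

```r
library(dplyr)

# Hypothetical toy tables (not the shelter data) with repeated key pairs
left_tbl <- tibble(
  animal_type   = c("dog", "dog", "cat"),
  intake_type   = c("stray", "stray", "stray"),
  primary_color = c("tan", "black", "white")
)
right_tbl <- tibble(
  animal_type  = c("dog", "dog", "cat", "bird"),
  intake_type  = c("stray", "stray", "stray", "stray"),
  outcome_type = c("adoption", "transfer", "adoption", "transfer")
)

keys <- c("animal_type", "intake_type")

# Count how often each key pair occurs on each side
left_n  <- count(left_tbl, animal_type, intake_type, name = "n_left")
right_n <- count(right_tbl, animal_type, intake_type, name = "n_right")

# Each shared key pair contributes n_left * n_right rows to the inner join
predicted_rows <- inner_join(left_n, right_n, by = keys) %>%
  summarise(rows = sum(n_left * n_right)) %>%
  pull(rows)

# Row count of the actual many-to-many inner join
actual_rows <- left_tbl %>%
  inner_join(right_tbl, by = keys, relationship = "many-to-many") %>%
  nrow()

predicted_rows # 2 * 2 (dog/stray) + 1 * 1 (cat/stray)
actual_rows
```

Running the same two `count()` calls on longbeach_small and dallas_small should reproduce the 14-row total.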
Describe the resulting data:
How is it different from the original two datasets?
longbeach_small %>% left_join(dallas_small, by = c("animal_type", "intake_type"), relationship = "many-to-many")
## # A tibble: 17 × 4
## animal_type intake_type primary_color outcome_type
## <chr> <chr> <chr> <chr>
## 1 other wildlife brown <NA>
## 2 dog stray tricolor euthanized
## 3 dog stray tricolor euthanized
## 4 other wildlife gray <NA>
## 5 dog stray tan euthanized
## 6 dog stray tan euthanized
## 7 dog welfare seized brown brindle <NA>
## 8 cat stray black adoption
## 9 cat stray black transfer
## 10 cat stray brown tabby adoption
## 11 cat stray brown tabby transfer
## 12 dog stray tricolor euthanized
## 13 dog stray tricolor euthanized
## 14 dog owner surrender blonde euthanized
## 15 dog owner surrender blonde euthanized
## 16 cat stray brown tabby adoption
## 17 cat stray brown tabby transfer
Describe the resulting data:
How is it different from the original two datasets?
longbeach_small %>% right_join(dallas_small, by = c("animal_type", "intake_type"), relationship = "many-to-many")
## # A tibble: 18 × 4
## animal_type intake_type primary_color outcome_type
## <chr> <chr> <chr> <chr>
## 1 dog stray tricolor euthanized
## 2 dog stray tricolor euthanized
## 3 dog stray tan euthanized
## 4 dog stray tan euthanized
## 5 cat stray black adoption
## 6 cat stray black transfer
## 7 cat stray brown tabby adoption
## 8 cat stray brown tabby transfer
## 9 dog stray tricolor euthanized
## 10 dog stray tricolor euthanized
## 11 dog owner surrender blonde euthanized
## 12 dog owner surrender blonde euthanized
## 13 cat stray brown tabby adoption
## 14 cat stray brown tabby transfer
## 15 dog confiscated <NA> adoption
## 16 cat owner surrender <NA> adoption
## 17 dog lost report <NA> lost report
## 18 wildlife stray <NA> euthanized
Describe the resulting data:
How is it different from the original two datasets?
longbeach_small %>% full_join(dallas_small, by = c("animal_type", "intake_type"), relationship = "many-to-many")
## # A tibble: 21 × 4
## animal_type intake_type primary_color outcome_type
## <chr> <chr> <chr> <chr>
## 1 other wildlife brown <NA>
## 2 dog stray tricolor euthanized
## 3 dog stray tricolor euthanized
## 4 other wildlife gray <NA>
## 5 dog stray tan euthanized
## 6 dog stray tan euthanized
## 7 dog welfare seized brown brindle <NA>
## 8 cat stray black adoption
## 9 cat stray black transfer
## 10 cat stray brown tabby adoption
## # ℹ 11 more rows
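Only 10 of the 21 rows are printed here. The total is consistent with the earlier outputs: 14 matched rows, plus 3 longbeach_small rows with no match, plus 4 dallas_small rows with no match. The sketch below shows one possible way to label where each row of a full join came from and to print every row. It uses made-up tables, and the helper columns `in_left`, `in_right`, and `origin` are names introduced here for illustration.

```r
library(dplyr)

# Hypothetical toy tables (not the shelter data)
left_tbl <- tibble(
  animal_type   = c("dog", "dog", "other"),
  intake_type   = c("stray", "stray", "wildlife"),
  primary_color = c("tan", "black", "gray")
)
right_tbl <- tibble(
  animal_type  = c("dog", "cat"),
  intake_type  = c("stray", "stray"),
  outcome_type = c("adoption", "transfer")
)

# Add a marker column to each side before joining; after the join,
# a missing marker means the row had no partner on that side
tagged <- left_tbl %>%
  mutate(in_left = TRUE) %>%
  full_join(
    mutate(right_tbl, in_right = TRUE),
    by = c("animal_type", "intake_type"),
    relationship = "many-to-many"
  ) %>%
  mutate(origin = case_when(
    !is.na(in_left) & !is.na(in_right) ~ "both",
    !is.na(in_left)                    ~ "left only",
    TRUE                               ~ "right only"
  )) %>%
  select(-in_left, -in_right)

# Rows per origin, then the full table without truncation
origin_counts <- count(tagged, origin)
origin_counts
print(tagged, n = Inf)
```

Marker columns are used instead of testing `primary_color` or `outcome_type` for `NA`, since those columns could contain genuine missing values.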
Describe the resulting data:
How is it different from the original two datasets?
longbeach_small %>% semi_join(dallas_small, by = c("animal_type", "intake_type"))
## # A tibble: 7 × 3
## animal_type intake_type primary_color
## <chr> <chr> <chr>
## 1 dog stray tricolor
## 2 dog stray tan
## 3 cat stray black
## 4 cat stray brown tabby
## 5 dog stray tricolor
## 6 dog owner surrender blonde
## 7 cat stray brown tabby
Describe the resulting data:
How is it different from the original two datasets?
longbeach_small %>% anti_join(dallas_small, by = c("animal_type", "intake_type"))
## # A tibble: 3 × 3
## animal_type intake_type primary_color
## <chr> <chr> <chr>
## 1 other wildlife brown
## 2 other wildlife gray
## 3 dog welfare seized brown brindle
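Taken together, the semi join (7 rows) and the anti join (3 rows) split the 10 rows of longbeach_small into those with and those without a matching key pair in dallas_small. Unlike the inner join, neither one adds columns or duplicates rows. A minimal sketch of that relationship, again on hypothetical toy tables rather than the shelter data:

```r
library(dplyr)

# Hypothetical toy tables (not the shelter data)
left_tbl <- tibble(
  animal_type   = c("dog", "dog", "other"),
  intake_type   = c("stray", "stray", "wildlife"),
  primary_color = c("tan", "black", "gray")
)
right_tbl <- tibble(
  animal_type  = c("dog", "dog"),
  intake_type  = c("stray", "stray"),
  outcome_type = c("adoption", "transfer")
)

keys <- c("animal_type", "intake_type")

matched   <- semi_join(left_tbl, right_tbl, by = keys) # left rows with a match
unmatched <- anti_join(left_tbl, right_tbl, by = keys) # left rows without one
inner     <- inner_join(left_tbl, right_tbl, by = keys,
                        relationship = "many-to-many")

# The two filtering joins partition the left table
nrow(matched) + nrow(unmatched) == nrow(left_tbl)

# semi_join keeps each matching left row once; inner_join repeats it per match
nrow(matched)
nrow(inner)

# Filtering joins return only the left table's columns
names(matched)
```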