Week 9: Apply it to your data 8

1. Import your data

Import two related datasets from the TidyTuesday project.

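# dplyr supplies %>%, mutate(), select() and the join verbs; readxl supplies read_excel()
library(dplyr)
library(readxl)

longbeach <- readr::read_csv("../00_data/longbeach.csv")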

## Rows: 29787 Columns: 22
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (18): animal_id, animal_name, animal_type, primary_color, secondary_colo...
## dbl  (2): latitude, longitude
## lgl  (2): outcome_is_dead, was_outcome_alive
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

dallas <- read_excel("../00_data/week18_dallas_animals.xlsx")

dallas <- dallas %>%
  mutate(across(where(is.character), tolower))
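The lower-casing step matters for the joins that follow because join keys are compared as exact strings. The sketch below illustrates this with two hypothetical two-row tables (`lb_demo` and `dl_demo` are made-up names, and the upper-case values are only an assumption about how the raw Dallas file might look):

```r
library(dplyr)

# Hypothetical toy tables, not taken from the real files
lb_demo <- data.frame(animal_type = c("dog", "cat"),
                      primary_color = c("tan", "black"),
                      stringsAsFactors = FALSE)
dl_demo <- data.frame(animal_type = c("DOG", "CAT"),
                      outcome_type = c("ADOPTION", "TRANSFER"),
                      stringsAsFactors = FALSE)

# "dog" and "DOG" are different strings, so no key matches
n_before <- nrow(inner_join(lb_demo, dl_demo, by = "animal_type"))

# Lower-case every character column, as done for dallas above
dl_clean <- dl_demo %>%
  mutate(across(where(is.character), tolower))
n_after <- nrow(inner_join(lb_demo, dl_clean, by = "animal_type"))

c(before = n_before, after = n_after)
```

With these toy tables the join finds no matches before lower-casing and two matches afterwards.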

longbeach

## # A tibble: 29,787 × 22
##    animal_id animal_name animal_type primary_color secondary_color sex     dob  
##    <chr>     <chr>       <chr>       <chr>         <chr>           <chr>   <chr>
##  1 A693708   *charlien   dog         white         <NA>            Female  2/21…
##  2 A708149   <NA>        reptile     brown         green           Unknown <NA> 
##  3 A638068   <NA>        bird        green         red             Unknown <NA> 
##  4 A639310   <NA>        bird        white         gray            Unknown <NA> 
##  5 A618968   *morgan     cat         black         white           Female  12/1…
##  6 A730385   *brandon    rabbit      black         white           Neuter… 4/19…
##  7 A646202   <NA>        bird        black         <NA>            Unknown <NA> 
##  8 A628138   <NA>        other       gray          black           Unknown 4/12…
##  9 A597464   <NA>        cat         black         <NA>            Unknown 8/21…
## 10 A734321   sophie      dog         cream         <NA>            Spayed  12/1…
## # ℹ 29,777 more rows
## # ℹ 15 more variables: intake_date <chr>, intake_condition <chr>,
## #   intake_type <chr>, intake_subtype <chr>, reason_for_intake <chr>,
## #   outcome_date <chr>, crossing <chr>, jurisdiction <chr>, outcome_type <chr>,
## #   outcome_subtype <chr>, latitude <dbl>, longitude <dbl>,
## #   outcome_is_dead <lgl>, was_outcome_alive <lgl>, geopoint <chr>

dallas

## # A tibble: 34,819 × 33
##    animal_id animal_type animal_breed   kennel_number kennel_status tag_type
##    <chr>     <chr>       <chr>          <chr>         <chr>         <chr>   
##  1 a0979593  dog         rhod ridgeback freezer       unavailable   na      
##  2 a0743013  dog         yorkshire terr receiving     impounded     na      
##  3 a1004433  bird        chicken        bay 31        impounded     na      
##  4 a0969724  dog         germ shepherd  dc 15         unavailable   na      
##  5 a0981479  dog         germ shepherd  psdog 01      unavailable   na      
##  6 a0958138  dog         basset hound   lost          lost report   na      
##  7 a1008940  cat         domestic sh    foster        unavailable   na      
##  8 a1008867  dog         germ shepherd  lfd 069       impounded     na      
##  9 a1003731  cat         domestic sh    foster        impounded     na      
## 10 a0957888  cat         domestic sh    cc 25         unavailable   na      
## # ℹ 34,809 more rows
## # ℹ 27 more variables: activity_number <chr>, activity_sequence <dbl>,
## #   source_id <chr>, census_tract <chr>, council_district <chr>,
## #   intake_type <chr>, intake_subtype <chr>, reason <chr>, staff_id <chr>,
## #   intake_date <dttm>, intake_time <chr>, due_out <dttm>,
## #   intake_condition <chr>, hold_request <chr>, outcome_type <chr>,
## #   outcome_date <chr>, outcome_time <chr>, receipt_number <chr>, …

2. Make data small

Describe the two datasets:

Data 1: longbeach

Columns: animal_type, intake_type, primary_color
Rows: 10

Data 2: dallas

Columns: animal_type, intake_type, outcome_type
Rows: 10

# Fix the seed so the random 10-row samples are reproducible
set.seed(1234)
longbeach_small <- longbeach %>%
  select(animal_type, intake_type, primary_color) %>%
  sample_n(10)
dallas_small <- dallas %>%
  select(animal_type, intake_type, outcome_type) %>%
  sample_n(10)

longbeach_small

## # A tibble: 10 × 3
##    animal_type intake_type     primary_color
##    <chr>       <chr>           <chr>        
##  1 other       wildlife        brown        
##  2 dog         stray           tricolor     
##  3 other       wildlife        gray         
##  4 dog         stray           tan          
##  5 dog         welfare seized  brown brindle
##  6 cat         stray           black        
##  7 cat         stray           brown  tabby 
##  8 dog         stray           tricolor     
##  9 dog         owner surrender blonde       
## 10 cat         stray           brown  tabby

dallas_small

## # A tibble: 10 × 3
##    animal_type intake_type     outcome_type
##    <chr>       <chr>           <chr>       
##  1 dog         confiscated     adoption    
##  2 dog         owner surrender euthanized  
##  3 cat         owner surrender adoption    
##  4 dog         stray           euthanized  
##  5 cat         stray           adoption    
##  6 cat         stray           transfer    
##  7 dog         owner surrender euthanized  
##  8 dog         lost report     lost report 
##  9 dog         stray           euthanized  
## 10 wildlife    stray           euthanized

3. inner_join

Describe the resulting data:

Columns: animal_type, intake_type, primary_color, outcome_type
Rows: 14

How is it different from the original two datasets?

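14 rows compared to 10 rows in each original dataset: only key combinations found in both datasets are kept, and each one is repeated for every pairing of matching rows (3 dog/stray rows in longbeach_small times 2 in dallas_small gives 6 rows)
the two key columns plus primary_color from longbeach_small and outcome_type from dallas_small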

longbeach_small %>% inner_join(dallas_small, by = c("animal_type", "intake_type"), relationship = "many-to-many")

## # A tibble: 14 × 4
##    animal_type intake_type     primary_color outcome_type
##    <chr>       <chr>           <chr>         <chr>       
##  1 dog         stray           tricolor      euthanized  
##  2 dog         stray           tricolor      euthanized  
##  3 dog         stray           tan           euthanized  
##  4 dog         stray           tan           euthanized  
##  5 cat         stray           black         adoption    
##  6 cat         stray           black         transfer    
##  7 cat         stray           brown  tabby  adoption    
##  8 cat         stray           brown  tabby  transfer    
##  9 dog         stray           tricolor      euthanized  
## 10 dog         stray           tricolor      euthanized  
## 11 dog         owner surrender blonde        euthanized  
## 12 dog         owner surrender blonde        euthanized  
## 13 cat         stray           brown  tabby  adoption    
## 14 cat         stray           brown  tabby  transfer
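The count of 14 can be checked by hand: for each key combination present in both samples, the join produces (rows in longbeach_small) times (rows in dallas_small). A minimal sketch, assuming the key columns are re-typed from the printed 10-row samples above (`lb`, `dl`, and `matches` are illustrative names, not objects from the original analysis):

```r
library(dplyr)

# Key columns copied from the printed longbeach_small and dallas_small
lb <- data.frame(
  animal_type = c("other", "dog", "other", "dog", "dog",
                  "cat", "cat", "dog", "dog", "cat"),
  intake_type = c("wildlife", "stray", "wildlife", "stray", "welfare seized",
                  "stray", "stray", "stray", "owner surrender", "stray"),
  stringsAsFactors = FALSE
)
dl <- data.frame(
  animal_type = c("dog", "dog", "cat", "dog", "cat",
                  "cat", "dog", "dog", "dog", "wildlife"),
  intake_type = c("confiscated", "owner surrender", "owner surrender", "stray",
                  "stray", "stray", "owner surrender", "lost report",
                  "stray", "stray"),
  stringsAsFactors = FALSE
)

# Count rows per key in each table, keep keys present in both,
# and multiply the two counts to get the rows the join creates
matches <- inner_join(
  count(lb, animal_type, intake_type, name = "n_longbeach"),
  count(dl, animal_type, intake_type, name = "n_dallas"),
  by = c("animal_type", "intake_type")
) %>%
  mutate(n_joined = n_longbeach * n_dallas)

matches
sum(matches$n_joined)
```

Three key combinations match (cat/stray, dog/owner surrender, dog/stray), contributing 6 + 2 + 6 = 14 rows.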

4. left_join

Describe the resulting data:

Columns: animal_type, intake_type, primary_color, outcome_type
Rows: 17

How is it different from the original two datasets?

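17 rows compared to 10 rows in each original dataset: the 14 matched rows plus the 3 longbeach_small rows with no match in dallas_small, which get NA for outcome_type
the two key columns plus primary_color from longbeach_small and outcome_type from dallas_small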

longbeach_small %>% left_join(dallas_small, by = c("animal_type", "intake_type"), relationship = "many-to-many")

## # A tibble: 17 × 4
##    animal_type intake_type     primary_color outcome_type
##    <chr>       <chr>           <chr>         <chr>       
##  1 other       wildlife        brown         <NA>        
##  2 dog         stray           tricolor      euthanized  
##  3 dog         stray           tricolor      euthanized  
##  4 other       wildlife        gray          <NA>        
##  5 dog         stray           tan           euthanized  
##  6 dog         stray           tan           euthanized  
##  7 dog         welfare seized  brown brindle <NA>        
##  8 cat         stray           black         adoption    
##  9 cat         stray           black         transfer    
## 10 cat         stray           brown  tabby  adoption    
## 11 cat         stray           brown  tabby  transfer    
## 12 dog         stray           tricolor      euthanized  
## 13 dog         stray           tricolor      euthanized  
## 14 dog         owner surrender blonde        euthanized  
## 15 dog         owner surrender blonde        euthanized  
## 16 cat         stray           brown  tabby  adoption    
## 17 cat         stray           brown  tabby  transfer

5. right_join

Describe the resulting data:

Columns: animal_type, intake_type, primary_color, outcome_type
Rows: 18

How is it different from the original two datasets?

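18 rows compared to 10 rows in each original dataset: the 14 matched rows plus the 4 dallas_small rows with no match in longbeach_small, which get NA for primary_color
the two key columns plus primary_color from longbeach_small and outcome_type from dallas_small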

longbeach_small %>% right_join(dallas_small, by = c("animal_type", "intake_type"), relationship = "many-to-many")

## # A tibble: 18 × 4
##    animal_type intake_type     primary_color outcome_type
##    <chr>       <chr>           <chr>         <chr>       
##  1 dog         stray           tricolor      euthanized  
##  2 dog         stray           tricolor      euthanized  
##  3 dog         stray           tan           euthanized  
##  4 dog         stray           tan           euthanized  
##  5 cat         stray           black         adoption    
##  6 cat         stray           black         transfer    
##  7 cat         stray           brown  tabby  adoption    
##  8 cat         stray           brown  tabby  transfer    
##  9 dog         stray           tricolor      euthanized  
## 10 dog         stray           tricolor      euthanized  
## 11 dog         owner surrender blonde        euthanized  
## 12 dog         owner surrender blonde        euthanized  
## 13 cat         stray           brown  tabby  adoption    
## 14 cat         stray           brown  tabby  transfer    
## 15 dog         confiscated     <NA>          adoption    
## 16 cat         owner surrender <NA>          adoption    
## 17 dog         lost report     <NA>          lost report 
## 18 wildlife    stray           <NA>          euthanized

6. full_join

Describe the resulting data:

Columns: animal_type, intake_type, primary_color, outcome_type
Rows: 21

How is it different from the original two datasets?

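21 rows compared to 10 rows in each original dataset: the 14 matched rows, the 3 unmatched longbeach_small rows, and the 4 unmatched dallas_small rows
the two key columns plus primary_color from longbeach_small and outcome_type from dallas_small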

longbeach_small %>% full_join(dallas_small, by = c("animal_type", "intake_type"), relationship = "many-to-many")

## # A tibble: 21 × 4
##    animal_type intake_type    primary_color outcome_type
##    <chr>       <chr>          <chr>         <chr>       
##  1 other       wildlife       brown         <NA>        
##  2 dog         stray          tricolor      euthanized  
##  3 dog         stray          tricolor      euthanized  
##  4 other       wildlife       gray          <NA>        
##  5 dog         stray          tan           euthanized  
##  6 dog         stray          tan           euthanized  
##  7 dog         welfare seized brown brindle <NA>        
##  8 cat         stray          black         adoption    
##  9 cat         stray          black         transfer    
## 10 cat         stray          brown  tabby  adoption    
## # ℹ 11 more rows
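Because the printout above stops at 10 rows, the total of 21 is easier to trust when broken into parts: matched rows, rows found only in longbeach_small, and rows found only in dallas_small. A rough sketch, assuming the key columns are re-typed from the printed samples (`lb`, `dl`, `keys`, and `counts` are illustrative names) and a dplyr version that supports the `relationship` argument, as the code in this report already does:

```r
library(dplyr)

keys <- c("animal_type", "intake_type")

# Key columns copied from the printed longbeach_small and dallas_small
lb <- data.frame(
  animal_type = c("other", "dog", "other", "dog", "dog",
                  "cat", "cat", "dog", "dog", "cat"),
  intake_type = c("wildlife", "stray", "wildlife", "stray", "welfare seized",
                  "stray", "stray", "stray", "owner surrender", "stray"),
  stringsAsFactors = FALSE
)
dl <- data.frame(
  animal_type = c("dog", "dog", "cat", "dog", "cat",
                  "cat", "dog", "dog", "dog", "wildlife"),
  intake_type = c("confiscated", "owner surrender", "owner surrender", "stray",
                  "stray", "stray", "owner surrender", "lost report",
                  "stray", "stray"),
  stringsAsFactors = FALSE
)

counts <- c(
  inner          = nrow(inner_join(lb, dl, by = keys, relationship = "many-to-many")),
  only_longbeach = nrow(anti_join(lb, dl, by = keys)),
  only_dallas    = nrow(anti_join(dl, lb, by = keys)),
  full           = nrow(full_join(lb, dl, by = keys, relationship = "many-to-many"))
)
counts
```

The parts are 14, 3, and 4, which add up to the 21 rows of the full join. The same parts give the left join (14 + 3 = 17) and the right join (14 + 4 = 18).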

7. semi_join

Describe the resulting data:

Columns: animal_type, intake_type, primary_color
Rows: 7

How is it different from the original two datasets?

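7 rows compared to 10 rows in longbeach_small: only rows whose key combination also appears in dallas_small are kept, and none are duplicated
only the three columns of longbeach_small; outcome_type from dallas_small is not added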

longbeach_small %>% semi_join(dallas_small, by = c("animal_type", "intake_type"))

## # A tibble: 7 × 3
##   animal_type intake_type     primary_color
##   <chr>       <chr>           <chr>        
## 1 dog         stray           tricolor     
## 2 dog         stray           tan          
## 3 cat         stray           black        
## 4 cat         stray           brown  tabby 
## 5 dog         stray           tricolor     
## 6 dog         owner surrender blonde       
## 7 cat         stray           brown  tabby

8. anti_join

Describe the resulting data:

Columns: animal_type, intake_type, primary_color
Rows: 3

How is it different from the original two datasets?

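3 rows compared to 10 rows in longbeach_small: only rows whose key combination does not appear in dallas_small are kept
only the three columns of longbeach_small; outcome_type from dallas_small is not added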

longbeach_small %>% anti_join(dallas_small, by = c("animal_type", "intake_type"))

## # A tibble: 3 × 3
##   animal_type intake_type    primary_color
##   <chr>       <chr>          <chr>        
## 1 other       wildlife       brown        
## 2 other       wildlife       gray         
## 3 dog         welfare seized brown brindle
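The two filtering joins split longbeach_small into complementary pieces: every row lands in exactly one of semi_join or anti_join, and neither adds columns. A minimal sketch, assuming the samples are re-typed from the printed output above (`lb`, `dl`, `matched`, and `unmatched` are illustrative names):

```r
library(dplyr)

keys <- c("animal_type", "intake_type")

# Values copied from the printed longbeach_small and dallas_small
lb <- data.frame(
  animal_type = c("other", "dog", "other", "dog", "dog",
                  "cat", "cat", "dog", "dog", "cat"),
  intake_type = c("wildlife", "stray", "wildlife", "stray", "welfare seized",
                  "stray", "stray", "stray", "owner surrender", "stray"),
  primary_color = c("brown", "tricolor", "gray", "tan", "brown brindle",
                    "black", "brown  tabby", "tricolor", "blonde", "brown  tabby"),
  stringsAsFactors = FALSE
)
dl <- data.frame(
  animal_type = c("dog", "dog", "cat", "dog", "cat",
                  "cat", "dog", "dog", "dog", "wildlife"),
  intake_type = c("confiscated", "owner surrender", "owner surrender", "stray",
                  "stray", "stray", "owner surrender", "lost report",
                  "stray", "stray"),
  outcome_type = c("adoption", "euthanized", "adoption", "euthanized", "adoption",
                   "transfer", "euthanized", "lost report", "euthanized", "euthanized"),
  stringsAsFactors = FALSE
)

matched   <- semi_join(lb, dl, by = keys)  # rows with a partner in dl
unmatched <- anti_join(lb, dl, by = keys)  # rows without one

# Together the two pieces account for every row of lb exactly once
nrow(matched) + nrow(unmatched) == nrow(lb)

# Filtering joins never add columns from the second table
names(matched)
```

Unlike inner_join, semi_join returns each longbeach_small row at most once, even when several dallas_small rows share its key.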

Liam Smith

2022-10-22
