Week 9: Apply it to your data 8

1. Import your data

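Import two related datasets from the TidyTuesday project (2023-01-31, UK pet cats). Both tables share the tag_id column, which is used as the join key below.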

cats_uk <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-01-31/cats_uk.csv')

## Rows: 18215 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): tag_id, study_name
## dbl  (5): event_id, location_long, location_lat, ground_speed, height_above_...
## lgl  (3): visible, algorithm_marked_outlier, manually_marked_outlier
## dttm (1): timestamp
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

cats_uk_reference <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-01-31/cats_uk_reference.csv')

## Rows: 101 Columns: 16
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (6): tag_id, animal_id, animal_taxon, animal_reproductive_condition, an...
## dbl  (4): prey_p_month, hrs_indoors, n_cats, age_years
## lgl  (4): hunt, food_dry, food_wet, food_other
## dttm (2): deploy_on_date, deploy_off_date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

2. Make data small

Describe the two datasets:

Data 1: cats_uk

Columns: tag_id, event_id, location_long
Rows: 10

Data 2: cats_uk_reference

Columns: tag_id, animal_taxon, hunt
Rows: 10

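library(dplyr)  # select(), sample_n() and the *_join() functions
set.seed(1234)  # make the random samples reproducible
# keep 3 columns and draw 10 random rows from each dataset (sample_n() is superseded by slice_sample() but still works)
cats_uk_small <- cats_uk %>% select(tag_id, event_id, location_long) %>% sample_n(10)
cats_uk_reference_small <- cats_uk_reference %>% select(tag_id, animal_taxon, hunt) %>% sample_n(10)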

cats_uk_small

## # A tibble: 10 × 3
##    tag_id           event_id location_long
##    <chr>               <dbl>         <dbl>
##  1 Ernie-Tag      3507105313         -4.64
##  2 Bits-Tag       3544803591         -4.92
##  3 Amber-Tag      3507104912         -4.64
##  4 Bits-Tag       3544803661         -4.92
##  5 Smudge-Tag     3637402232         -5.08
##  6 Jago           3396159641         -5.07
##  7 Fairclough-Tag 3766102907         -5.18
##  8 Frank_2-Tag    3672007720         -5.54
##  9 Charlie        3403118802         -5.08
## 10 Tilly-Tag      3716217172         -5.30

cats_uk_reference_small

## # A tibble: 10 × 3
##    tag_id         animal_taxon hunt 
##    <chr>          <chr>        <lgl>
##  1 Lola           Felis catus  TRUE 
##  2 Millie-Tag     Felis catus  TRUE 
##  3 Jim-Tag        Felis catus  TRUE 
##  4 Siberia-Tag    Felis catus  TRUE 
##  5 Reggie-Tag     Felis catus  TRUE 
##  6 Fairclough-Tag Felis catus  FALSE
##  7 Dexter2-Tag    Felis catus  FALSE
##  8 Fonzie-Tag     Felis catus  TRUE 
##  9 Frank-Tag      Felis catus  TRUE 
## 10 Freya-Tag      Felis catus  TRUE
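Because the two samples are drawn independently, only a few tag_id values are likely to appear in both, and cats_uk can hold several GPS events per cat (Bits-Tag appears twice). The minimal base R check below runs on the tag_id values re-typed by hand from the printed output above (the vectors x_ids and y_ids are illustrative names, not objects created earlier) and shows which keys the joins can actually match.

```r
# tag_id values re-typed from the two printed 10-row samples above
x_ids <- c("Ernie-Tag", "Bits-Tag", "Amber-Tag", "Bits-Tag", "Smudge-Tag",
           "Jago", "Fairclough-Tag", "Frank_2-Tag", "Charlie", "Tilly-Tag")
y_ids <- c("Lola", "Millie-Tag", "Jim-Tag", "Siberia-Tag", "Reggie-Tag",
           "Fairclough-Tag", "Dexter2-Tag", "Fonzie-Tag", "Frank-Tag", "Freya-Tag")

# keys present in both samples: only these rows can be matched by a join
# (joins need exact matches, so "Frank_2-Tag" and "Frank-Tag" do not match)
shared <- intersect(x_ids, y_ids)
shared                           # "Fairclough-Tag"

# keys repeated within each sample: a repeated key can duplicate rows in a join
unique(x_ids[duplicated(x_ids)])  # "Bits-Tag"
unique(y_ids[duplicated(y_ids)])  # none
```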

3. inner_join

Describe the resulting data:

Columns: tag_id, event_id, location_long, animal_taxon, hunt
Rows: 1

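How is it different from the original two datasets? Only 1 row remains (versus 10 rows in each original), because Fairclough-Tag is the only tag_id present in both samples. The result combines the columns of both datasets: 5 columns (tag_id once, plus the other two columns from each dataset).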

cats_uk_small %>% inner_join(cats_uk_reference_small, by = c("tag_id"))

## # A tibble: 1 × 5
##   tag_id           event_id location_long animal_taxon hunt 
##   <chr>               <dbl>         <dbl> <chr>        <lgl>
## 1 Fairclough-Tag 3766102907         -5.18 Felis catus  FALSE
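To see the inner join without downloading the data, here is a minimal sketch on reduced, hand-typed copies of the two samples (x and y are illustrative names; they keep only tag_id plus one column from each table, copied from the printed output above). As a cross-check it also uses base R `merge()`, a different function from dplyr's, whose default `all = FALSE` likewise performs an inner join.

```r
library(dplyr)

# Reduced, hand-typed copies of the two printed samples (illustrative only)
x <- tibble(
  tag_id = c("Ernie-Tag", "Bits-Tag", "Amber-Tag", "Bits-Tag", "Smudge-Tag",
             "Jago", "Fairclough-Tag", "Frank_2-Tag", "Charlie", "Tilly-Tag"),
  event_id = c(3507105313, 3544803591, 3507104912, 3544803661, 3637402232,
               3396159641, 3766102907, 3672007720, 3403118802, 3716217172)
)
y <- tibble(
  tag_id = c("Lola", "Millie-Tag", "Jim-Tag", "Siberia-Tag", "Reggie-Tag",
             "Fairclough-Tag", "Dexter2-Tag", "Fonzie-Tag", "Frank-Tag", "Freya-Tag"),
  hunt = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE)
)

# dplyr: keep only rows whose tag_id occurs in both tables
inner_dplyr <- inner_join(x, y, by = "tag_id")

# base R cross-check: merge() with the default all = FALSE is an inner join
inner_base <- merge(x, y, by = "tag_id")

nrow(inner_dplyr)  # 1
nrow(inner_base)   # 1
```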

4. left_join

Describe the resulting data:

Columns: tag_id, event_id, location_long, animal_taxon, hunt
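Rows: 10

How is it different from the original two datasets? It has 5 columns instead of 3: all 10 rows of cats_uk_small are kept, and animal_taxon and hunt are added from cats_uk_reference_small. Only Fairclough-Tag has a match, so the other 9 rows get NA in the added columns.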

cats_uk_small %>% left_join(cats_uk_reference_small, by = c("tag_id"))

## # A tibble: 10 × 5
##    tag_id           event_id location_long animal_taxon hunt 
##    <chr>               <dbl>         <dbl> <chr>        <lgl>
##  1 Ernie-Tag      3507105313         -4.64 <NA>         NA   
##  2 Bits-Tag       3544803591         -4.92 <NA>         NA   
##  3 Amber-Tag      3507104912         -4.64 <NA>         NA   
##  4 Bits-Tag       3544803661         -4.92 <NA>         NA   
##  5 Smudge-Tag     3637402232         -5.08 <NA>         NA   
##  6 Jago           3396159641         -5.07 <NA>         NA   
##  7 Fairclough-Tag 3766102907         -5.18 Felis catus  FALSE
##  8 Frank_2-Tag    3672007720         -5.54 <NA>         NA   
##  9 Charlie        3403118802         -5.08 <NA>         NA   
## 10 Tilly-Tag      3716217172         -5.30 <NA>         NA
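A small sketch of the left join's row accounting, again on reduced, hand-typed copies of the printed samples (x and y are illustrative names, not objects from the analysis above). The row count equals `nrow(x)` here only because each tag_id in y is unique; if y repeated a key, a left join would add extra rows for it.

```r
library(dplyr)

# Reduced, hand-typed copies of the two printed samples (illustrative only)
x <- tibble(
  tag_id = c("Ernie-Tag", "Bits-Tag", "Amber-Tag", "Bits-Tag", "Smudge-Tag",
             "Jago", "Fairclough-Tag", "Frank_2-Tag", "Charlie", "Tilly-Tag"),
  event_id = c(3507105313, 3544803591, 3507104912, 3544803661, 3637402232,
               3396159641, 3766102907, 3672007720, 3403118802, 3716217172)
)
y <- tibble(
  tag_id = c("Lola", "Millie-Tag", "Jim-Tag", "Siberia-Tag", "Reggie-Tag",
             "Fairclough-Tag", "Dexter2-Tag", "Fonzie-Tag", "Frank-Tag", "Freya-Tag"),
  hunt = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE)
)

left_result <- left_join(x, y, by = "tag_id")

nrow(left_result) == nrow(x)   # TRUE: every row of x is kept exactly once
sum(is.na(left_result$hunt))   # 9 rows of x found no match in y
```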

5. right_join

Describe the resulting data:

Columns: tag_id, event_id, location_long, animal_taxon, hunt
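Rows: 10

How is it different from the original two datasets? It has 5 columns instead of 3: all 10 rows of cats_uk_reference_small are kept, and event_id and location_long are added from cats_uk_small. Only Fairclough-Tag has a match, so the other 9 rows get NA in the added columns.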

cats_uk_small %>% right_join(cats_uk_reference_small, by = c("tag_id"))

## # A tibble: 10 × 5
##    tag_id           event_id location_long animal_taxon hunt 
##    <chr>               <dbl>         <dbl> <chr>        <lgl>
##  1 Fairclough-Tag 3766102907         -5.18 Felis catus  FALSE
##  2 Lola                   NA         NA    Felis catus  TRUE 
##  3 Millie-Tag             NA         NA    Felis catus  TRUE 
##  4 Jim-Tag                NA         NA    Felis catus  TRUE 
##  5 Siberia-Tag            NA         NA    Felis catus  TRUE 
##  6 Reggie-Tag             NA         NA    Felis catus  TRUE 
##  7 Dexter2-Tag            NA         NA    Felis catus  FALSE
##  8 Fonzie-Tag             NA         NA    Felis catus  TRUE 
##  9 Frank-Tag              NA         NA    Felis catus  TRUE 
## 10 Freya-Tag              NA         NA    Felis catus  TRUE
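A right join of x onto y keeps the same rows as a left join of y onto x; only the column order and row order differ. The sketch below checks that on reduced, hand-typed copies of the printed samples (x, y and flipped are illustrative names). Sorting the keys before comparing avoids depending on row order.

```r
library(dplyr)

# Reduced, hand-typed copies of the two printed samples (illustrative only)
x <- tibble(
  tag_id = c("Ernie-Tag", "Bits-Tag", "Amber-Tag", "Bits-Tag", "Smudge-Tag",
             "Jago", "Fairclough-Tag", "Frank_2-Tag", "Charlie", "Tilly-Tag"),
  event_id = c(3507105313, 3544803591, 3507104912, 3544803661, 3637402232,
               3396159641, 3766102907, 3672007720, 3403118802, 3716217172)
)
y <- tibble(
  tag_id = c("Lola", "Millie-Tag", "Jim-Tag", "Siberia-Tag", "Reggie-Tag",
             "Fairclough-Tag", "Dexter2-Tag", "Fonzie-Tag", "Frank-Tag", "Freya-Tag"),
  hunt = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE)
)

right_result <- right_join(x, y, by = "tag_id")
flipped <- left_join(y, x, by = "tag_id")  # same rows, columns in another order

nrow(right_result)                                              # 10
identical(sort(right_result$tag_id), sort(flipped$tag_id))      # TRUE
sum(is.na(right_result$event_id))                               # 9 unmatched rows of y
```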

6. full_join

Describe the resulting data:

Columns: tag_id, event_id, location_long, animal_taxon, hunt
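Rows: 19

How is it different from the original two datasets? It has 19 rows instead of 10: every row from both datasets is kept, and the single shared tag_id (Fairclough-Tag) is combined into one row (10 + 10 - 1 = 19). Columns with no matching value are filled with NA.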

cats_uk_small %>% full_join(cats_uk_reference_small, by = c("tag_id"))

## # A tibble: 19 × 5
##    tag_id           event_id location_long animal_taxon hunt 
##    <chr>               <dbl>         <dbl> <chr>        <lgl>
##  1 Ernie-Tag      3507105313         -4.64 <NA>         NA   
##  2 Bits-Tag       3544803591         -4.92 <NA>         NA   
##  3 Amber-Tag      3507104912         -4.64 <NA>         NA   
##  4 Bits-Tag       3544803661         -4.92 <NA>         NA   
##  5 Smudge-Tag     3637402232         -5.08 <NA>         NA   
##  6 Jago           3396159641         -5.07 <NA>         NA   
##  7 Fairclough-Tag 3766102907         -5.18 Felis catus  FALSE
##  8 Frank_2-Tag    3672007720         -5.54 <NA>         NA   
##  9 Charlie        3403118802         -5.08 <NA>         NA   
## 10 Tilly-Tag      3716217172         -5.30 <NA>         NA   
## 11 Lola                   NA         NA    Felis catus  TRUE 
## 12 Millie-Tag             NA         NA    Felis catus  TRUE 
## 13 Jim-Tag                NA         NA    Felis catus  TRUE 
## 14 Siberia-Tag            NA         NA    Felis catus  TRUE 
## 15 Reggie-Tag             NA         NA    Felis catus  TRUE 
## 16 Dexter2-Tag            NA         NA    Felis catus  FALSE
## 17 Fonzie-Tag             NA         NA    Felis catus  TRUE 
## 18 Frank-Tag              NA         NA    Felis catus  TRUE 
## 19 Freya-Tag              NA         NA    Felis catus  TRUE
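The 19 rows can be accounted for directly: a full join contains every row of the left join plus the rows of y that have no match in x. This sketch checks that count on reduced, hand-typed copies of the printed samples (x and y are illustrative names). The general form is used because the shortcut 10 + 10 - 1 only holds when each matched key occurs once in each table.

```r
library(dplyr)

# Reduced, hand-typed copies of the two printed samples (illustrative only)
x <- tibble(
  tag_id = c("Ernie-Tag", "Bits-Tag", "Amber-Tag", "Bits-Tag", "Smudge-Tag",
             "Jago", "Fairclough-Tag", "Frank_2-Tag", "Charlie", "Tilly-Tag"),
  event_id = c(3507105313, 3544803591, 3507104912, 3544803661, 3637402232,
               3396159641, 3766102907, 3672007720, 3403118802, 3716217172)
)
y <- tibble(
  tag_id = c("Lola", "Millie-Tag", "Jim-Tag", "Siberia-Tag", "Reggie-Tag",
             "Fairclough-Tag", "Dexter2-Tag", "Fonzie-Tag", "Frank-Tag", "Freya-Tag"),
  hunt = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE)
)

full_result <- full_join(x, y, by = "tag_id")

# left-join rows plus the rows of y with no partner in x
expected <- nrow(left_join(x, y, by = "tag_id")) + nrow(anti_join(y, x, by = "tag_id"))

nrow(full_result)  # 19
expected           # 19 (10 + 9)
```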

7. semi_join

Describe the resulting data:

Columns: tag_id, event_id, location_long
Rows: 1

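How is it different from the original two datasets? Only the 1 row of cats_uk_small with a matching tag_id (Fairclough-Tag) is kept, versus 10 rows in the original. Unlike inner_join, no columns are added from cats_uk_reference_small, so the result keeps the original 3 columns.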

cats_uk_small %>% semi_join(cats_uk_reference_small, by = c("tag_id"))

## # A tibble: 1 × 3
##   tag_id           event_id location_long
##   <chr>               <dbl>         <dbl>
## 1 Fairclough-Tag 3766102907         -5.18
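A semi join behaves like filtering x on whether its key appears in y: it adds no columns and never duplicates a row of x, even if y repeats a key. The sketch below compares it with `filter()` and `%in%` on reduced, hand-typed copies of the printed samples (x, y and via_filter are illustrative names).

```r
library(dplyr)

# Reduced, hand-typed copies of the two printed samples (illustrative only)
x <- tibble(
  tag_id = c("Ernie-Tag", "Bits-Tag", "Amber-Tag", "Bits-Tag", "Smudge-Tag",
             "Jago", "Fairclough-Tag", "Frank_2-Tag", "Charlie", "Tilly-Tag"),
  event_id = c(3507105313, 3544803591, 3507104912, 3544803661, 3637402232,
               3396159641, 3766102907, 3672007720, 3403118802, 3716217172)
)
y <- tibble(
  tag_id = c("Lola", "Millie-Tag", "Jim-Tag", "Siberia-Tag", "Reggie-Tag",
             "Fairclough-Tag", "Dexter2-Tag", "Fonzie-Tag", "Frank-Tag", "Freya-Tag"),
  hunt = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE)
)

semi_result <- semi_join(x, y, by = "tag_id")
via_filter <- filter(x, tag_id %in% y$tag_id)  # same rows, written as a filter

identical(semi_result$tag_id, via_filter$tag_id)  # TRUE
ncol(semi_result) == ncol(x)                      # TRUE: no columns from y
```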

8. anti_join

Describe the resulting data:

Columns: tag_id, event_id, location_long
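Rows: 9

How is it different from the original two datasets? It keeps the 9 rows of cats_uk_small whose tag_id has no match in cats_uk_reference_small (every row except Fairclough-Tag), versus 10 rows in the original, and keeps only the original 3 columns.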

cats_uk_small %>% anti_join(cats_uk_reference_small, by = c("tag_id"))

## # A tibble: 9 × 3
##   tag_id        event_id location_long
##   <chr>            <dbl>         <dbl>
## 1 Ernie-Tag   3507105313         -4.64
## 2 Bits-Tag    3544803591         -4.92
## 3 Amber-Tag   3507104912         -4.64
## 4 Bits-Tag    3544803661         -4.92
## 5 Smudge-Tag  3637402232         -5.08
## 6 Jago        3396159641         -5.07
## 7 Frank_2-Tag 3672007720         -5.54
## 8 Charlie     3403118802         -5.08
## 9 Tilly-Tag   3716217172         -5.30
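semi_join and anti_join split x into two complementary parts: rows with a match and rows without one. The sketch below checks that on reduced, hand-typed copies of the printed samples (x and y are illustrative names), and compares the anti join with the equivalent `!(... %in% ...)` subset; dplyr's filtering joins keep the row order of x, so the keys line up.

```r
library(dplyr)

# Reduced, hand-typed copies of the two printed samples (illustrative only)
x <- tibble(
  tag_id = c("Ernie-Tag", "Bits-Tag", "Amber-Tag", "Bits-Tag", "Smudge-Tag",
             "Jago", "Fairclough-Tag", "Frank_2-Tag", "Charlie", "Tilly-Tag"),
  event_id = c(3507105313, 3544803591, 3507104912, 3544803661, 3637402232,
               3396159641, 3766102907, 3672007720, 3403118802, 3716217172)
)
y <- tibble(
  tag_id = c("Lola", "Millie-Tag", "Jim-Tag", "Siberia-Tag", "Reggie-Tag",
             "Fairclough-Tag", "Dexter2-Tag", "Fonzie-Tag", "Frank-Tag", "Freya-Tag"),
  hunt = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE)
)

semi_result <- semi_join(x, y, by = "tag_id")
anti_result <- anti_join(x, y, by = "tag_id")

nrow(semi_result) + nrow(anti_result) == nrow(x)                   # TRUE: 1 + 9 = 10
identical(anti_result$tag_id, x$tag_id[!(x$tag_id %in% y$tag_id)])  # TRUE
```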


Michael Kulig

2024-03-28
