Week 9: Apply it to your data 8

1. Import your data

Import two related datasets from the TidyTuesday project.

simpsons_characters <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-02-04/simpsons_characters.csv')

## Rows: 6722 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): name, normalized_name, gender
## dbl (1): id
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

simpsons_locations <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-02-04/simpsons_locations.csv')

## Rows: 4459 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): name, normalized_name
## dbl (1): id
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

2. Make data small

Describe the two datasets:

Data 1: characters

Columns: 3 (id, name, normalized_name; gender is dropped)
Rows: 10 (random sample of the 6722 original rows)

Data 2: locations

Columns: 3 (id, name, normalized_name)
Rows: 10 (random sample of the 4459 original rows)

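library(dplyr)  # provides %>%, select(), sample_n(), and the join verbs

set.seed(1234)  # make the random samples reproducible
simpson_chr_small <- simpsons_characters %>% select(id, name, normalized_name) %>% sample_n(10)
simpson_loc_small <- simpsons_locations %>% select(id, name, normalized_name) %>% sample_n(10)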

simpson_chr_small

## # A tibble: 10 × 3
##       id name              normalized_name  
##    <dbl> <chr>             <chr>            
##  1  1027 Raheem            raheem           
##  2   651 Bernard           bernard          
##  3  2738 Red's Friend #2   reds friend 2    
##  4   962 Pig               pig              
##  5  4562 Spanish Sailor    spanish sailor   
##  6  2996 Tree Jockey       tree jockey      
##  7  2186 Fat Convict       fat convict      
##  8  3224 Ring Bearer       ring bearer      
##  9  2818 CANADIAN WOMAN    canadian woman   
## 10  5802 2nd Male Animator 2nd male animator

simpson_loc_small

## # A tibble: 10 × 3
##       id name                  normalized_name      
##    <dbl> <chr>                 <chr>                
##  1  2373 FLAMING RUINS OF TROY flaming ruins of troy
##  2  1100 DETENTION AREA        detention area       
##  3  4046 Bohemian Art Gallery  bohemian art gallery 
##  4  4366 THE RELATION SHIP     the relation ship    
##  5  3454 CONCRETE              concrete             
##  6  2230 African City          african city         
##  7  2621 ENGLISH MEADOW        english meadow       
##  8  3972 OUTER CONCOURSE       outer concourse      
##  9  1682 PARIS STREET          paris street         
## 10  2599 COUNSELOR'S OFFICE    counselor office
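
`sample_n()` still works, but the dplyr documentation marks it as superseded by `slice_sample()`. Below is a minimal sketch of the same select-then-sample step. It uses a hypothetical stand-in table (`toy_characters`, invented values) so it runs without downloading anything. The rows it draws are not expected to match the samples shown above.

```r
library(dplyr)

# Hypothetical stand-in for simpsons_characters (invented values)
toy_characters <- tibble(
  id = 1:100,
  name = paste("Character", 1:100),
  normalized_name = paste("character", 1:100),
  gender = NA_character_
)

set.seed(1234)  # makes the random draw reproducible
toy_small <- toy_characters %>%
  select(id, name, normalized_name) %>%  # keep only the three columns of interest
  slice_sample(n = 10)                   # draw 10 rows without replacement

dim(toy_small)
```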

3. inner_join

Describe the resulting data:

Columns: id, name, normalized_name
Rows: 0

How is it different from the original two datasets?

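The result has 0 rows, unlike the two 10-row originals. inner_join() keeps only rows that match in both tables. Here the join uses all three shared columns (id, name, normalized_name) as keys, and no location row matches a character row on all three.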

simpson_loc_small %>% inner_join(simpson_chr_small)

## Joining with `by = join_by(id, name, normalized_name)`

## # A tibble: 0 × 3
## # ℹ 3 variables: id <dbl>, name <chr>, normalized_name <chr>
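
Because the two samples share no rows, the output above cannot show what a successful match looks like. The sketch below uses two hypothetical toy tibbles (`chr_toy` and `loc_toy`, invented values that are not taken from the TidyTuesday files) and joins them on `id` only. Since `name` is then no longer a key, the `suffix` argument keeps the two `name` columns apart.

```r
library(dplyr)

# Hypothetical toy tables, invented for illustration only
chr_toy <- tibble(id = c(1, 2, 3), name = c("Homer", "Marge", "Bart"))
loc_toy <- tibble(id = c(2, 3, 4), name = c("Kitchen", "Treehouse", "Tavern"))

# Match on id only; the non-key name columns get suffixes to stay distinct
inner_toy <- chr_toy %>%
  inner_join(loc_toy, by = "id", suffix = c("_chr", "_loc"))

inner_toy
```

Only the ids present in both tables (2 and 3) survive, and each surviving row carries columns from both sides.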

4. left_join

Describe the resulting data:

Columns: id, name, normalized_name
Rows: 10

How is it different from the original two datasets?

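The result is identical to simpson_chr_small. left_join() keeps every row of the left table. Nothing matches and all three columns are join keys, so no rows or columns are added.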

simpson_chr_small %>% left_join(simpson_loc_small)

## Joining with `by = join_by(id, name, normalized_name)`

## # A tibble: 10 × 3
##       id name              normalized_name  
##    <dbl> <chr>             <chr>            
##  1  1027 Raheem            raheem           
##  2   651 Bernard           bernard          
##  3  2738 Red's Friend #2   reds friend 2    
##  4   962 Pig               pig              
##  5  4562 Spanish Sailor    spanish sailor   
##  6  2996 Tree Jockey       tree jockey      
##  7  2186 Fat Convict       fat convict      
##  8  3224 Ring Bearer       ring bearer      
##  9  2818 CANADIAN WOMAN    canadian woman   
## 10  5802 2nd Male Animator 2nd male animator
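
When the right table does contribute a non-key column, unmatched left rows are filled with `NA`. Here is a minimal sketch on hypothetical toy tibbles (invented values, joined on `id` only):

```r
library(dplyr)

# Hypothetical toy tables, invented for illustration only
chr_toy <- tibble(id = c(1, 2, 3), name = c("Homer", "Marge", "Bart"))
loc_toy <- tibble(id = c(2, 3, 4), location = c("Kitchen", "Treehouse", "Tavern"))

# Every row of chr_toy is kept; location is NA where no id matches
left_toy <- chr_toy %>% left_join(loc_toy, by = "id")

left_toy
```

The row count equals that of the left table, and id 1 has a missing `location` because `loc_toy` has no row with that id.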

5. right_join

Describe the resulting data:

Columns: id, name, normalized_name
Rows: 10

How is it different from the original two datasets?

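The result is identical to simpson_loc_small. right_join() keeps every row of the right table. Nothing matches and all three columns are join keys, so no rows or columns are added.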

simpson_chr_small %>% right_join(simpson_loc_small)

## Joining with `by = join_by(id, name, normalized_name)`

## # A tibble: 10 × 3
##       id name                  normalized_name      
##    <dbl> <chr>                 <chr>                
##  1  2373 FLAMING RUINS OF TROY flaming ruins of troy
##  2  1100 DETENTION AREA        detention area       
##  3  4046 Bohemian Art Gallery  bohemian art gallery 
##  4  4366 THE RELATION SHIP     the relation ship    
##  5  3454 CONCRETE              concrete             
##  6  2230 African City          african city         
##  7  2621 ENGLISH MEADOW        english meadow       
##  8  3972 OUTER CONCOURSE       outer concourse      
##  9  1682 PARIS STREET          paris street         
## 10  2599 COUNSELOR'S OFFICE    counselor office
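
A right join is a left join with the roles of the two tables swapped. The sketch below uses hypothetical toy tibbles (invented values, joined on `id` only) and builds both forms so they can be compared. They contain the same rows and differ only in column order.

```r
library(dplyr)

# Hypothetical toy tables, invented for illustration only
chr_toy <- tibble(id = c(1, 2, 3), name = c("Homer", "Marge", "Bart"))
loc_toy <- tibble(id = c(2, 3, 4), location = c("Kitchen", "Treehouse", "Tavern"))

right_toy   <- chr_toy %>% right_join(loc_toy, by = "id")
swapped_toy <- loc_toy %>% left_join(chr_toy, by = "id")

# Sort rows and align column order before comparing the two results
right_sorted   <- right_toy %>% arrange(id)
swapped_sorted <- swapped_toy %>% arrange(id) %>% select(id, name, location)

right_sorted
swapped_sorted
```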

6. full_join

Describe the resulting data:

Columns: id, name, normalized_name
Rows: 20

How is it different from the original two datasets?

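The result has 20 rows, twice as many as either original. full_join() keeps every row from both tables. No rows match, so the 10 rows of simpson_chr_small are followed by the 10 rows of simpson_loc_small.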

simpson_chr_small %>% full_join(simpson_loc_small)

## Joining with `by = join_by(id, name, normalized_name)`

## # A tibble: 20 × 3
##       id name                  normalized_name      
##    <dbl> <chr>                 <chr>                
##  1  1027 Raheem                raheem               
##  2   651 Bernard               bernard              
##  3  2738 Red's Friend #2       reds friend 2        
##  4   962 Pig                   pig                  
##  5  4562 Spanish Sailor        spanish sailor       
##  6  2996 Tree Jockey           tree jockey          
##  7  2186 Fat Convict           fat convict          
##  8  3224 Ring Bearer           ring bearer          
##  9  2818 CANADIAN WOMAN        canadian woman       
## 10  5802 2nd Male Animator     2nd male animator    
## 11  2373 FLAMING RUINS OF TROY flaming ruins of troy
## 12  1100 DETENTION AREA        detention area       
## 13  4046 Bohemian Art Gallery  bohemian art gallery 
## 14  4366 THE RELATION SHIP     the relation ship    
## 15  3454 CONCRETE              concrete             
## 16  2230 African City          african city         
## 17  2621 ENGLISH MEADOW        english meadow       
## 18  3972 OUTER CONCOURSE       outer concourse      
## 19  1682 PARIS STREET          paris street         
## 20  2599 COUNSELOR'S OFFICE    counselor office
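
The 20 rows follow from simple arithmetic. When the keys are unique in both tables, a full join returns the left rows plus the right rows minus the matched rows, which here is 10 + 10 - 0. The same check is sketched below on hypothetical toy tibbles (invented values, joined on `id` only) where two ids do match.

```r
library(dplyr)

# Hypothetical toy tables, invented for illustration only
chr_toy <- tibble(id = c(1, 2, 3), name = c("Homer", "Marge", "Bart"))
loc_toy <- tibble(id = c(2, 3, 4), location = c("Kitchen", "Treehouse", "Tavern"))

n_match  <- nrow(inner_join(chr_toy, loc_toy, by = "id"))  # ids found in both tables
full_toy <- full_join(chr_toy, loc_toy, by = "id")

# With unique keys on both sides: rows = left + right - matches
expected_rows <- nrow(chr_toy) + nrow(loc_toy) - n_match
nrow(full_toy) == expected_rows
```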

7. semi_join

Describe the resulting data:

Columns: id, name, normalized_name
Rows: 0

How is it different from the original two datasets?

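The result has 0 rows, unlike the originals. semi_join() keeps only the rows of simpson_chr_small that have a match in simpson_loc_small, and there are none.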

simpson_chr_small %>% semi_join(simpson_loc_small)

## Joining with `by = join_by(id, name, normalized_name)`

## # A tibble: 0 × 3
## # ℹ 3 variables: id <dbl>, name <chr>, normalized_name <chr>

8. anti_join

Describe the resulting data:

Columns: id, name, normalized_name
Rows: 10

How is it different from the original two datasets?

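The result is identical to simpson_chr_small. anti_join() keeps only the rows of simpson_chr_small that have no match in simpson_loc_small, which here is all 10.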

simpson_chr_small %>% anti_join(simpson_loc_small)

## Joining with `by = join_by(id, name, normalized_name)`

## # A tibble: 10 × 3
##       id name              normalized_name  
##    <dbl> <chr>             <chr>            
##  1  1027 Raheem            raheem           
##  2   651 Bernard           bernard          
##  3  2738 Red's Friend #2   reds friend 2    
##  4   962 Pig               pig              
##  5  4562 Spanish Sailor    spanish sailor   
##  6  2996 Tree Jockey       tree jockey      
##  7  2186 Fat Convict       fat convict      
##  8  3224 Ring Bearer       ring bearer      
##  9  2818 CANADIAN WOMAN    canadian woman   
## 10  5802 2nd Male Animator 2nd male animator
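
semi_join() and anti_join() are filtering joins. They never add columns from the right table, and together they split the left table into matched and unmatched rows (0 + 10 = 10 in the results above). A minimal sketch on hypothetical toy tibbles (invented values, joined on `id` only):

```r
library(dplyr)

# Hypothetical toy tables, invented for illustration only
chr_toy <- tibble(id = c(1, 2, 3), name = c("Homer", "Marge", "Bart"))
loc_toy <- tibble(id = c(2, 3, 4), location = c("Kitchen", "Treehouse", "Tavern"))

semi_toy <- chr_toy %>% semi_join(loc_toy, by = "id")  # rows of chr_toy with a match
anti_toy <- chr_toy %>% anti_join(loc_toy, by = "id")  # rows of chr_toy without one

# The two results partition the left table
nrow(semi_toy) + nrow(anti_toy) == nrow(chr_toy)
```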

James Hannigan

2025-06-12
