Import two related datasets from the TidyTuesday project.
simpsons_characters <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-02-04/simpsons_characters.csv')
## Rows: 6722 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): name, normalized_name, gender
## dbl (1): id
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
simpsons_locations <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-02-04/simpsons_locations.csv')
## Rows: 4459 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): name, normalized_name
## dbl (1): id
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
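Describe the two datasets:
Data 1: characters (simpsons_characters), 6722 rows and 4 columns: id, name, normalized_name, and gender
Data 2: locations (simpsons_locations), 4459 rows and 3 columns: id, name, and normalized_name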
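library(dplyr)  # provides %>%, select(), sample_n(), and the join functions
set.seed(1234)  # make the random samples reproducible
# Keep the three columns the two tables share and draw 10 random rows from each
simpson_chr_small <- simpsons_characters %>% select(id, name, normalized_name) %>% sample_n(10)
simpson_loc_small <- simpsons_locations %>% select(id, name, normalized_name) %>% sample_n(10)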
simpson_chr_small
## # A tibble: 10 × 3
##       id name              normalized_name
##    <dbl> <chr>             <chr>
##  1  1027 Raheem            raheem
##  2   651 Bernard           bernard
##  3  2738 Red's Friend #2   reds friend 2
##  4   962 Pig               pig
##  5  4562 Spanish Sailor    spanish sailor
##  6  2996 Tree Jockey       tree jockey
##  7  2186 Fat Convict       fat convict
##  8  3224 Ring Bearer       ring bearer
##  9  2818 CANADIAN WOMAN    canadian woman
## 10  5802 2nd Male Animator 2nd male animator
simpson_loc_small
## # A tibble: 10 × 3
##       id name                  normalized_name
##    <dbl> <chr>                 <chr>
##  1  2373 FLAMING RUINS OF TROY flaming ruins of troy
##  2  1100 DETENTION AREA        detention area
##  3  4046 Bohemian Art Gallery  bohemian art gallery
##  4  4366 THE RELATION SHIP     the relation ship
##  5  3454 CONCRETE              concrete
##  6  2230 African City          african city
##  7  2621 ENGLISH MEADOW        english meadow
##  8  3972 OUTER CONCOURSE       outer concourse
##  9  1682 PARIS STREET          paris street
## 10  2599 COUNSELOR'S OFFICE    counselor office
Describe the resulting data:
How is it different from the original two datasets?
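The result has 0 rows, while each sample has 10. inner_join() keeps only rows that match in both tables, and because the join uses all three shared columns (id, name, normalized_name), no location row matches a character row.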
simpson_loc_small %>% inner_join(simpson_chr_small)
## Joining with `by = join_by(id, name, normalized_name)`
## # A tibble: 0 × 3
## # ℹ 3 variables: id <dbl>, name <chr>, normalized_name <chr>
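Because the two samples share no rows, the output above cannot show what a match looks like. The following is a minimal sketch with two hypothetical toy tables, `chr_toy` and `loc_toy` (invented for illustration, not part of the TidyTuesday data), that share two `id` values and are joined on `id` alone through an explicit `by` argument.

```r
library(dplyr)

# Hypothetical toy tables (assumption: made up for this sketch).
# They share the ids 2 and 3.
chr_toy <- tibble(id = c(1, 2, 3), character = c("Pig", "Bernard", "Raheem"))
loc_toy <- tibble(id = c(2, 3, 4), location = c("CONCRETE", "PARIS STREET", "African City"))

# inner_join() keeps only the ids present in both tables (2 and 3)
# and carries the non-key columns of both tables into the result.
inner_toy <- chr_toy %>% inner_join(loc_toy, by = "id")
inner_toy
```

Stating the key with `by = "id"` also suppresses the `Joining with ...` message, which dplyr prints only when it has to guess the join columns.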
Describe the resulting data:
How is it different from the original two datasets?
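The result is identical to simpson_chr_small. left_join() keeps every row of the left table, and because no rows match and every column is a join key, no rows or columns are added.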
simpson_chr_small %>% left_join(simpson_loc_small)
## Joining with `by = join_by(id, name, normalized_name)`
## # A tibble: 10 × 3
##       id name              normalized_name
##    <dbl> <chr>             <chr>
##  1  1027 Raheem            raheem
##  2   651 Bernard           bernard
##  3  2738 Red's Friend #2   reds friend 2
##  4   962 Pig               pig
##  5  4562 Spanish Sailor    spanish sailor
##  6  2996 Tree Jockey       tree jockey
##  7  2186 Fat Convict       fat convict
##  8  3224 Ring Bearer       ring bearer
##  9  2818 CANADIAN WOMAN    canadian woman
## 10  5802 2nd Male Animator 2nd male animator
Describe the resulting data:
How is it different from the original two datasets?
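The result is identical to simpson_loc_small. right_join() keeps every row of the right table, and because no rows match and every column is a join key, no rows or columns are added.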
simpson_chr_small %>% right_join(simpson_loc_small)
## Joining with `by = join_by(id, name, normalized_name)`
## # A tibble: 10 × 3
##       id name                  normalized_name
##    <dbl> <chr>                 <chr>
##  1  2373 FLAMING RUINS OF TROY flaming ruins of troy
##  2  1100 DETENTION AREA        detention area
##  3  4046 Bohemian Art Gallery  bohemian art gallery
##  4  4366 THE RELATION SHIP     the relation ship
##  5  3454 CONCRETE              concrete
##  6  2230 African City          african city
##  7  2621 ENGLISH MEADOW        english meadow
##  8  3972 OUTER CONCOURSE       outer concourse
##  9  1682 PARIS STREET          paris street
## 10  2599 COUNSELOR'S OFFICE    counselor office
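With the sampled data, the left and right joins simply return one of the inputs, so the way unmatched rows are filled in stays hidden. As a rough sketch, assuming two hypothetical toy tables `chr_toy` and `loc_toy` (invented for illustration) that overlap on two `id` values, the difference could look like this:

```r
library(dplyr)

# Hypothetical toy tables (assumption: made up for this sketch).
chr_toy <- tibble(id = c(1, 2, 3), character = c("Pig", "Bernard", "Raheem"))
loc_toy <- tibble(id = c(2, 3, 4), location = c("CONCRETE", "PARIS STREET", "African City"))

# left_join() keeps every row of chr_toy; id 1 has no location, so it gets NA.
left_toy <- chr_toy %>% left_join(loc_toy, by = "id")

# right_join() keeps every row of loc_toy; id 4 has no character, so it gets NA.
right_toy <- chr_toy %>% right_join(loc_toy, by = "id")

left_toy
right_toy
```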
Describe the resulting data:
How is it different from the original two datasets?
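The result has 20 rows, twice as many as either sample: the 10 rows of simpson_chr_small followed by the 10 rows of simpson_loc_small. full_join() keeps every row from both tables, and none of them match.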
simpson_chr_small %>% full_join(simpson_loc_small)
## Joining with `by = join_by(id, name, normalized_name)`
## # A tibble: 20 × 3
##       id name                  normalized_name
##    <dbl> <chr>                 <chr>
##  1  1027 Raheem                raheem
##  2   651 Bernard               bernard
##  3  2738 Red's Friend #2       reds friend 2
##  4   962 Pig                   pig
##  5  4562 Spanish Sailor        spanish sailor
##  6  2996 Tree Jockey           tree jockey
##  7  2186 Fat Convict           fat convict
##  8  3224 Ring Bearer           ring bearer
##  9  2818 CANADIAN WOMAN        canadian woman
## 10  5802 2nd Male Animator     2nd male animator
## 11  2373 FLAMING RUINS OF TROY flaming ruins of troy
## 12  1100 DETENTION AREA        detention area
## 13  4046 Bohemian Art Gallery  bohemian art gallery
## 14  4366 THE RELATION SHIP     the relation ship
## 15  3454 CONCRETE              concrete
## 16  2230 African City          african city
## 17  2621 ENGLISH MEADOW        english meadow
## 18  3972 OUTER CONCOURSE       outer concourse
## 19  1682 PARIS STREET          paris street
## 20  2599 COUNSELOR'S OFFICE    counselor office
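The 20 rows above are just the two samples stacked, because nothing matches. A small sketch with hypothetical toy tables `chr_toy` and `loc_toy` (invented for illustration) suggests what a full join looks like when some keys do overlap: matched rows are merged into one, and unmatched rows from either side are kept with `NA` in the missing columns.

```r
library(dplyr)

# Hypothetical toy tables (assumption: made up for this sketch).
chr_toy <- tibble(id = c(1, 2, 3), character = c("Pig", "Bernard", "Raheem"))
loc_toy <- tibble(id = c(2, 3, 4), location = c("CONCRETE", "PARIS STREET", "African City"))

# full_join() keeps ids 1 through 4: ids 2 and 3 are merged,
# id 1 has no location and id 4 has no character.
full_toy <- chr_toy %>% full_join(loc_toy, by = "id")
full_toy
```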
Describe the resulting data:
How is it different from the original two datasets?
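The result has 0 rows, while each sample has 10. semi_join() keeps only the rows of simpson_chr_small that have a match in simpson_loc_small, and there are none.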
simpson_chr_small %>% semi_join(simpson_loc_small)
## Joining with `by = join_by(id, name, normalized_name)`
## # A tibble: 0 × 3
## # ℹ 3 variables: id <dbl>, name <chr>, normalized_name <chr>
Describe the resulting data:
How is it different from the original two datasets?
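The result is identical to simpson_chr_small. anti_join() keeps only the rows of simpson_chr_small that have no match in simpson_loc_small, which is all 10 of them.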
simpson_chr_small %>% anti_join(simpson_loc_small)
## Joining with `by = join_by(id, name, normalized_name)`
## # A tibble: 10 × 3
##       id name              normalized_name
##    <dbl> <chr>             <chr>
##  1  1027 Raheem            raheem
##  2   651 Bernard           bernard
##  3  2738 Red's Friend #2   reds friend 2
##  4   962 Pig               pig
##  5  4562 Spanish Sailor    spanish sailor
##  6  2996 Tree Jockey       tree jockey
##  7  2186 Fat Convict       fat convict
##  8  3224 Ring Bearer       ring bearer
##  9  2818 CANADIAN WOMAN    canadian woman
## 10  5802 2nd Male Animator 2nd male animator
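The semi and anti joins above return either nothing or everything, since the samples never match. As a minimal sketch, assuming hypothetical toy tables `chr_toy` and `loc_toy` (invented for illustration) with two shared `id` values, the two filtering joins split the left table into complementary parts and never add columns from the right table:

```r
library(dplyr)

# Hypothetical toy tables (assumption: made up for this sketch).
chr_toy <- tibble(id = c(1, 2, 3), character = c("Pig", "Bernard", "Raheem"))
loc_toy <- tibble(id = c(2, 3, 4), location = c("CONCRETE", "PARIS STREET", "African City"))

# semi_join() keeps the rows of chr_toy whose id appears in loc_toy (ids 2 and 3).
semi_toy <- chr_toy %>% semi_join(loc_toy, by = "id")

# anti_join() keeps the rows of chr_toy whose id does not appear in loc_toy (id 1).
anti_toy <- chr_toy %>% anti_join(loc_toy, by = "id")

semi_toy
anti_toy
```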