Week 9: Apply it to your data 8

1. Import your data

Import two related datasets from the TidyTuesday project.

# Load the tidyverse (readr for read_csv(), dplyr for the joins)
library(tidyverse)

# csv file
myData <- read_csv("../00_data/myData.csv")

## Rows: 27 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): film, film_rating
## dbl  (2): number, run_time
## date (1): release_date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

movie_profit <- read_csv("../00_data/movie_profit.csv", col_types = cols(release_date = col_date(format = "%m/%d/%Y")))

## New names:
## • `` -> `...1`

2. Make data small

Describe the two datasets: both list movies with their release dates and ratings.

Data1: movie_profit

Columns: movie, release_date, mpaa_rating (renamed to film_rating)
Rows: 10

Data2: myData

Columns: film, release_date, film_rating
Rows: 10

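set.seed(1236)

# Sample 10 rows from each dataset, keeping only title, release date, and rating
movie_profit_small <- movie_profit %>%
    select(movie, release_date, mpaa_rating) %>%
    sample_n(10)

# Give the rating column the same name in both tables so it can serve as the join key
movie_profit_small <- movie_profit_small %>%
    rename(film_rating = mpaa_rating)

myData_small <- myData %>%
    select(film, release_date, film_rating) %>%
    sample_n(10)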

movie_profit_small

## # A tibble: 10 × 3
##    movie                            release_date film_rating
##    <chr>                            <date>       <chr>      
##  1 Porky's                          1982-03-19   R          
##  2 Home                             2015-03-27   PG         
##  3 A Simple Wish                    1997-07-11   PG         
##  4 Big Daddy                        1999-06-25   PG-13      
##  5 I Feel Pretty                    2018-04-20   PG-13      
##  6 Spirit: Stallion of the Cimarron 2002-05-24   G          
##  7 Small Apartments                 2013-02-08   R          
##  8 Sherlock Gnomes                  2018-03-23   PG         
##  9 Fireflies in the Garden          2011-10-14   R          
## 10 Full Frontal                     2002-08-02   R

myData_small

## # A tibble: 10 × 3
##    film         release_date film_rating
##    <chr>        <date>       <chr>      
##  1 Finding Dory 2016-06-17   PG         
##  2 Cars 3       2017-06-16   G          
##  3 Cars 2       2011-06-24   G          
##  4 Toy Story 3  2010-06-18   G          
##  5 Onward       2020-03-06   PG         
##  6 Finding Nemo 2003-05-30   G          
##  7 Toy Story    1995-11-22   G          
##  8 Luca         2021-06-18   N/A        
##  9 Cars         2006-06-09   G          
## 10 WALL-E       2008-06-27   G
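
A possible alternative for the sampling step, shown as a sketch: select() can rename a column while selecting it, and slice_sample() is the function that supersedes sample_n() in current dplyr. The table toy_profit and its domestic_gross column are invented stand-ins, and slice_sample() is not guaranteed to draw the same rows as sample_n() for a given seed.

```r
library(dplyr)

set.seed(1236)

# Toy stand-in for movie_profit (all values are made up)
toy_profit <- tibble(
  movie          = paste("Movie", 1:27),
  release_date   = as.Date("2000-01-01") + 0:26,
  mpaa_rating    = rep(c("G", "PG", "R"), 9),
  domestic_gross = 1:27                      # hypothetical extra column
)

# Select, rename, and sample in one pipeline
toy_small <- toy_profit %>%
  select(movie, release_date, film_rating = mpaa_rating) %>%
  slice_sample(n = 10)

dim(toy_small)   # 10 rows, 3 columns
```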

3. inner_join

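Describe the resulting data: each film in myData_small is paired with every movie in movie_profit_small that has the same rating. Rows whose rating has no match in the other table are dropped.
* Columns: 5 (film, release_date.x, film_rating, movie, release_date.y)
* Rows: 13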

How is it different from the original two datasets?

myData_small%>% 
    inner_join(movie_profit_small, by = c("film_rating"))

## Warning in inner_join(., movie_profit_small, by = c("film_rating")): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 1 of `x` matches multiple rows in `y`.
## ℹ Row 6 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.

## # A tibble: 13 × 5
##    film         release_date.x film_rating movie                  release_date.y
##    <chr>        <date>         <chr>       <chr>                  <date>        
##  1 Finding Dory 2016-06-17     PG          Home                   2015-03-27    
##  2 Finding Dory 2016-06-17     PG          A Simple Wish          1997-07-11    
##  3 Finding Dory 2016-06-17     PG          Sherlock Gnomes        2018-03-23    
##  4 Cars 3       2017-06-16     G           Spirit: Stallion of t… 2002-05-24    
##  5 Cars 2       2011-06-24     G           Spirit: Stallion of t… 2002-05-24    
##  6 Toy Story 3  2010-06-18     G           Spirit: Stallion of t… 2002-05-24    
##  7 Onward       2020-03-06     PG          Home                   2015-03-27    
##  8 Onward       2020-03-06     PG          A Simple Wish          1997-07-11    
##  9 Onward       2020-03-06     PG          Sherlock Gnomes        2018-03-23    
## 10 Finding Nemo 2003-05-30     G           Spirit: Stallion of t… 2002-05-24    
## 11 Toy Story    1995-11-22     G           Spirit: Stallion of t… 2002-05-24    
## 12 Cars         2006-06-09     G           Spirit: Stallion of t… 2002-05-24    
## 13 WALL-E       2008-06-27     G           Spirit: Stallion of t… 2002-05-24

4. left_join

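Describe the resulting data: keeps every row of movie_profit_small and adds the matching rows from myData_small. Without a by argument the join uses both shared columns (release_date and film_rating), nothing matches, and film is NA in every row. Joining on film_rating alone repeats each movie once per film with the same rating.
* Columns: 4 with the default keys; 5 with by = "film_rating"
* Rows: 10 with the default keys; 19 with by = "film_rating"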

How is it different from the original two datasets?

movie_profit_small%>% left_join(myData_small)

## Joining with `by = join_by(release_date, film_rating)`

## # A tibble: 10 × 4
##    movie                            release_date film_rating film 
##    <chr>                            <date>       <chr>       <chr>
##  1 Porky's                          1982-03-19   R           <NA> 
##  2 Home                             2015-03-27   PG          <NA> 
##  3 A Simple Wish                    1997-07-11   PG          <NA> 
##  4 Big Daddy                        1999-06-25   PG-13       <NA> 
##  5 I Feel Pretty                    2018-04-20   PG-13       <NA> 
##  6 Spirit: Stallion of the Cimarron 2002-05-24   G           <NA> 
##  7 Small Apartments                 2013-02-08   R           <NA> 
##  8 Sherlock Gnomes                  2018-03-23   PG          <NA> 
##  9 Fireflies in the Garden          2011-10-14   R           <NA> 
## 10 Full Frontal                     2002-08-02   R           <NA>

movie_profit_small%>% left_join(myData_small, by = c("film_rating"))

## Warning in left_join(., myData_small, by = c("film_rating")): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 2 of `x` matches multiple rows in `y`.
## ℹ Row 1 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.

## # A tibble: 19 × 5
##    movie                         release_date.x film_rating film  release_date.y
##    <chr>                         <date>         <chr>       <chr> <date>        
##  1 Porky's                       1982-03-19     R           <NA>  NA            
##  2 Home                          2015-03-27     PG          Find… 2016-06-17    
##  3 Home                          2015-03-27     PG          Onwa… 2020-03-06    
##  4 A Simple Wish                 1997-07-11     PG          Find… 2016-06-17    
##  5 A Simple Wish                 1997-07-11     PG          Onwa… 2020-03-06    
##  6 Big Daddy                     1999-06-25     PG-13       <NA>  NA            
##  7 I Feel Pretty                 2018-04-20     PG-13       <NA>  NA            
##  8 Spirit: Stallion of the Cima… 2002-05-24     G           Cars… 2017-06-16    
##  9 Spirit: Stallion of the Cima… 2002-05-24     G           Cars… 2011-06-24    
## 10 Spirit: Stallion of the Cima… 2002-05-24     G           Toy … 2010-06-18    
## 11 Spirit: Stallion of the Cima… 2002-05-24     G           Find… 2003-05-30    
## 12 Spirit: Stallion of the Cima… 2002-05-24     G           Toy … 1995-11-22    
## 13 Spirit: Stallion of the Cima… 2002-05-24     G           Cars  2006-06-09    
## 14 Spirit: Stallion of the Cima… 2002-05-24     G           WALL… 2008-06-27    
## 15 Small Apartments              2013-02-08     R           <NA>  NA            
## 16 Sherlock Gnomes               2018-03-23     PG          Find… 2016-06-17    
## 17 Sherlock Gnomes               2018-03-23     PG          Onwa… 2020-03-06    
## 18 Fireflies in the Garden       2011-10-14     R           <NA>  NA            
## 19 Full Frontal                  2002-08-02     R           <NA>  NA
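
The two left joins above differ only in their keys. The following sketch, on made-up three-row and two-row tables, shows the default (all shared columns) next to an explicit key, and uses the suffix argument to replace .x and .y with more readable labels. The suffix values "_profit" and "_mydata" are arbitrary choices for this illustration.

```r
library(dplyr)

# Toy stand-ins for the two samples
movies <- tibble(
  movie        = c("Porky's", "Home", "Big Daddy"),
  release_date = as.Date(c("1982-03-19", "2015-03-27", "1999-06-25")),
  film_rating  = c("R", "PG", "PG-13")
)
films <- tibble(
  film         = c("Finding Dory", "Onward"),
  release_date = as.Date(c("2016-06-17", "2020-03-06")),
  film_rating  = c("PG", "PG")
)

# No `by`: dplyr joins on every shared column (release_date AND film_rating),
# so nothing matches and `film` is NA in every row
natural <- movies %>% left_join(films)

# Explicit key, plus readable suffixes for the two release_date columns
by_rating <- movies %>%
  left_join(films, by = "film_rating", suffix = c("_profit", "_mydata"))

all(is.na(natural$film))   # TRUE
names(by_rating)
```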

5. right_join

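Describe the resulting data: keeps every row of myData_small (the right-hand table) and adds the matching rows from movie_profit_small. With the default keys nothing matches, so movie is NA in every row. Joining on film_rating alone keeps Luca, whose "N/A" rating has no match, with NA in the movie_profit columns.
* Columns: 4 with the default keys; 5 with by = "film_rating"
* Rows: 10 with the default keys; 14 with by = "film_rating"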

How is it different from the original two datasets?

movie_profit_small%>% right_join(myData_small)

## Joining with `by = join_by(release_date, film_rating)`

## # A tibble: 10 × 4
##    movie release_date film_rating film        
##    <chr> <date>       <chr>       <chr>       
##  1 <NA>  2016-06-17   PG          Finding Dory
##  2 <NA>  2017-06-16   G           Cars 3      
##  3 <NA>  2011-06-24   G           Cars 2      
##  4 <NA>  2010-06-18   G           Toy Story 3 
##  5 <NA>  2020-03-06   PG          Onward      
##  6 <NA>  2003-05-30   G           Finding Nemo
##  7 <NA>  1995-11-22   G           Toy Story   
##  8 <NA>  2021-06-18   N/A         Luca        
##  9 <NA>  2006-06-09   G           Cars        
## 10 <NA>  2008-06-27   G           WALL-E

movie_profit_small%>% right_join(myData_small, by = c("film_rating"))

## Warning in right_join(., myData_small, by = c("film_rating")): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 2 of `x` matches multiple rows in `y`.
## ℹ Row 1 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.

## # A tibble: 14 × 5
##    movie                         release_date.x film_rating film  release_date.y
##    <chr>                         <date>         <chr>       <chr> <date>        
##  1 Home                          2015-03-27     PG          Find… 2016-06-17    
##  2 Home                          2015-03-27     PG          Onwa… 2020-03-06    
##  3 A Simple Wish                 1997-07-11     PG          Find… 2016-06-17    
##  4 A Simple Wish                 1997-07-11     PG          Onwa… 2020-03-06    
##  5 Spirit: Stallion of the Cima… 2002-05-24     G           Cars… 2017-06-16    
##  6 Spirit: Stallion of the Cima… 2002-05-24     G           Cars… 2011-06-24    
##  7 Spirit: Stallion of the Cima… 2002-05-24     G           Toy … 2010-06-18    
##  8 Spirit: Stallion of the Cima… 2002-05-24     G           Find… 2003-05-30    
##  9 Spirit: Stallion of the Cima… 2002-05-24     G           Toy … 1995-11-22    
## 10 Spirit: Stallion of the Cima… 2002-05-24     G           Cars  2006-06-09    
## 11 Spirit: Stallion of the Cima… 2002-05-24     G           WALL… 2008-06-27    
## 12 Sherlock Gnomes               2018-03-23     PG          Find… 2016-06-17    
## 13 Sherlock Gnomes               2018-03-23     PG          Onwa… 2020-03-06    
## 14 <NA>                          NA             N/A         Luca  2021-06-18
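
A right join is a left join with the tables swapped; only the column order (and possibly the row order) differs. The sketch below checks this on two tiny made-up tables by putting both results in the same column and row order before comparing them.

```r
library(dplyr)

# Toy stand-ins for the two samples
movies <- tibble(movie = c("Home", "Porky's"), film_rating = c("PG", "R"))
films  <- tibble(film = c("Onward", "Luca"),   film_rating = c("PG", "N/A"))

right <- movies %>% right_join(films, by = "film_rating")
left  <- films  %>% left_join(movies, by = "film_rating")

# Align column and row order, then compare as plain data frames
align <- function(df) {
  df %>% select(film, film_rating, movie) %>% arrange(film) %>% as.data.frame()
}

identical(align(right), align(left))   # TRUE
```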

6. full_join

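Describe the resulting data: keeps every row from both tables. Where a row has no match in the other table, the columns from that table are filled with NA.
* Columns: 4 with the default keys; 5 with by = "film_rating"
* Rows: 20 with the default keys; 20 with by = "film_rating"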

How is it different from the original two datasets?

movie_profit_small%>% full_join(myData_small)

## Joining with `by = join_by(release_date, film_rating)`

## # A tibble: 20 × 4
##    movie                            release_date film_rating film        
##    <chr>                            <date>       <chr>       <chr>       
##  1 Porky's                          1982-03-19   R           <NA>        
##  2 Home                             2015-03-27   PG          <NA>        
##  3 A Simple Wish                    1997-07-11   PG          <NA>        
##  4 Big Daddy                        1999-06-25   PG-13       <NA>        
##  5 I Feel Pretty                    2018-04-20   PG-13       <NA>        
##  6 Spirit: Stallion of the Cimarron 2002-05-24   G           <NA>        
##  7 Small Apartments                 2013-02-08   R           <NA>        
##  8 Sherlock Gnomes                  2018-03-23   PG          <NA>        
##  9 Fireflies in the Garden          2011-10-14   R           <NA>        
## 10 Full Frontal                     2002-08-02   R           <NA>        
## 11 <NA>                             2016-06-17   PG          Finding Dory
## 12 <NA>                             2017-06-16   G           Cars 3      
## 13 <NA>                             2011-06-24   G           Cars 2      
## 14 <NA>                             2010-06-18   G           Toy Story 3 
## 15 <NA>                             2020-03-06   PG          Onward      
## 16 <NA>                             2003-05-30   G           Finding Nemo
## 17 <NA>                             1995-11-22   G           Toy Story   
## 18 <NA>                             2021-06-18   N/A         Luca        
## 19 <NA>                             2006-06-09   G           Cars        
## 20 <NA>                             2008-06-27   G           WALL-E

myData_small%>% full_join(movie_profit_small, by = c("film_rating"))

## Warning in full_join(., movie_profit_small, by = c("film_rating")): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 1 of `x` matches multiple rows in `y`.
## ℹ Row 6 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.

## # A tibble: 20 × 5
##    film         release_date.x film_rating movie                  release_date.y
##    <chr>        <date>         <chr>       <chr>                  <date>        
##  1 Finding Dory 2016-06-17     PG          Home                   2015-03-27    
##  2 Finding Dory 2016-06-17     PG          A Simple Wish          1997-07-11    
##  3 Finding Dory 2016-06-17     PG          Sherlock Gnomes        2018-03-23    
##  4 Cars 3       2017-06-16     G           Spirit: Stallion of t… 2002-05-24    
##  5 Cars 2       2011-06-24     G           Spirit: Stallion of t… 2002-05-24    
##  6 Toy Story 3  2010-06-18     G           Spirit: Stallion of t… 2002-05-24    
##  7 Onward       2020-03-06     PG          Home                   2015-03-27    
##  8 Onward       2020-03-06     PG          A Simple Wish          1997-07-11    
##  9 Onward       2020-03-06     PG          Sherlock Gnomes        2018-03-23    
## 10 Finding Nemo 2003-05-30     G           Spirit: Stallion of t… 2002-05-24    
## 11 Toy Story    1995-11-22     G           Spirit: Stallion of t… 2002-05-24    
## 12 Luca         2021-06-18     N/A         <NA>                   NA            
## 13 Cars         2006-06-09     G           Spirit: Stallion of t… 2002-05-24    
## 14 WALL-E       2008-06-27     G           Spirit: Stallion of t… 2002-05-24    
## 15 <NA>         NA             R           Porky's                1982-03-19    
## 16 <NA>         NA             PG-13       Big Daddy              1999-06-25    
## 17 <NA>         NA             PG-13       I Feel Pretty          2018-04-20    
## 18 <NA>         NA             R           Small Apartments       2013-02-08    
## 19 <NA>         NA             R           Fireflies in the Gard… 2011-10-14    
## 20 <NA>         NA             R           Full Frontal           2002-08-02
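
One detail in the outputs above: Luca's rating prints as N/A rather than <NA>, which indicates it is stored as the text "N/A" and not as a missing value. If a true missing value is wanted, one option is dplyr's na_if(), sketched here on a two-row made-up table (films and films_clean are illustrative names).

```r
library(dplyr)

films <- tibble(film = c("Luca", "Cars"), film_rating = c("N/A", "G"))

# "N/A" is an ordinary string, so is.na() does not flag it
sum(is.na(films$film_rating))         # 0

# na_if() turns the placeholder into a real missing value
films_clean <- films %>%
  mutate(film_rating = na_if(film_rating, "N/A"))

sum(is.na(films_clean$film_rating))   # 1
```

Whether to convert depends on the analysis; by default dplyr joins treat NA keys as equal to each other, so converted ratings would match other NA ratings.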

7. semi_join

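Describe the resulting data: keeps the rows of movie_profit_small whose rating also appears in myData_small. No columns from myData_small are added, and no rows are duplicated.
* Columns: 3 (movie, release_date, film_rating)
* Rows: 4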

How is it different from the original two datasets?

movie_profit_small%>% semi_join(myData_small, by = c("film_rating"))

## # A tibble: 4 × 3
##   movie                            release_date film_rating
##   <chr>                            <date>       <chr>      
## 1 Home                             2015-03-27   PG         
## 2 A Simple Wish                    1997-07-11   PG         
## 3 Spirit: Stallion of the Cimarron 2002-05-24   G          
## 4 Sherlock Gnomes                  2018-03-23   PG
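
Because a semi join only filters the left table, it behaves like filter() with %in% on the key when the key has no missing values. The sketch below compares the two on made-up tables; note that PG appears twice in films, yet no movie is duplicated.

```r
library(dplyr)

# Toy stand-ins for the two samples
movies <- tibble(
  movie       = c("Home", "Porky's", "Big Daddy", "Sherlock Gnomes"),
  film_rating = c("PG", "R", "PG-13", "PG")
)
films <- tibble(
  film        = c("Onward", "Cars", "Finding Dory"),
  film_rating = c("PG", "G", "PG")
)

semi     <- movies %>% semi_join(films, by = "film_rating")
filtered <- movies %>% filter(film_rating %in% films$film_rating)

semi$movie                                                # "Home" "Sherlock Gnomes"
identical(as.data.frame(semi), as.data.frame(filtered))   # TRUE
```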

8. anti_join

Describe the resulting data: keeps the rows of movie_profit_small whose rating does not appear in myData_small. No columns from myData_small are added.
* Columns: 3 (movie, release_date, film_rating)
* Rows: 6

How is it different from the original two datasets?

movie_profit_small%>% anti_join(myData_small, by = c("film_rating"))

## # A tibble: 6 × 3
##   movie                   release_date film_rating
##   <chr>                   <date>       <chr>      
## 1 Porky's                 1982-03-19   R          
## 2 Big Daddy               1999-06-25   PG-13      
## 3 I Feel Pretty           2018-04-20   PG-13      
## 4 Small Apartments        2013-02-08   R          
## 5 Fireflies in the Garden 2011-10-14   R          
## 6 Full Frontal            2002-08-02   R
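
semi_join() and anti_join() with the same key split the left table into two non-overlapping parts, which is why the 4 rows from section 7 and the 6 rows here add up to the 10 rows of movie_profit_small. A minimal check on made-up tables:

```r
library(dplyr)

# Toy stand-ins for the two samples
movies <- tibble(
  movie       = c("Home", "Porky's", "Big Daddy", "Sherlock Gnomes"),
  film_rating = c("PG", "R", "PG-13", "PG")
)
films <- tibble(
  film        = c("Onward", "Cars"),
  film_rating = c("PG", "G")
)

matched   <- movies %>% semi_join(films, by = "film_rating")
unmatched <- movies %>% anti_join(films, by = "film_rating")

# Every row of movies lands in exactly one of the two results
nrow(matched) + nrow(unmatched) == nrow(movies)   # TRUE
```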

Daniel Lee

2025-11-12
