Import two related datasets from the TidyTuesday project.
# csv file
myData <- read_csv("../00_data/myData.csv")
## Rows: 27 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): film, film_rating
## dbl (2): number, run_time
## date (1): release_date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
movie_profit <- read_csv("../00_data/movie_profit.csv", col_types = cols(release_date = col_date(format = "%m/%d/%Y")))
## New names:
## • `` -> `...1`
Describe the two datasets: both contain movie titles, release dates, and ratings.
Data1: movie_profit
Data2: myData
set.seed(1236)
movie_profit_small <- movie_profit %>%
  select(movie, release_date, mpaa_rating) %>%
  sample_n(10)
movie_profit_small <- movie_profit_small %>%
  rename(film_rating = mpaa_rating)
myData_small <- myData %>%
  select(film, release_date, film_rating) %>%
  sample_n(10)
movie_profit_small
## # A tibble: 10 × 3
## movie release_date film_rating
## <chr> <date> <chr>
## 1 Porky's 1982-03-19 R
## 2 Home 2015-03-27 PG
## 3 A Simple Wish 1997-07-11 PG
## 4 Big Daddy 1999-06-25 PG-13
## 5 I Feel Pretty 2018-04-20 PG-13
## 6 Spirit: Stallion of the Cimarron 2002-05-24 G
## 7 Small Apartments 2013-02-08 R
## 8 Sherlock Gnomes 2018-03-23 PG
## 9 Fireflies in the Garden 2011-10-14 R
## 10 Full Frontal 2002-08-02 R
myData_small
## # A tibble: 10 × 3
## film release_date film_rating
## <chr> <date> <chr>
## 1 Finding Dory 2016-06-17 PG
## 2 Cars 3 2017-06-16 G
## 3 Cars 2 2011-06-24 G
## 4 Toy Story 3 2010-06-18 G
## 5 Onward 2020-03-06 PG
## 6 Finding Nemo 2003-05-30 G
## 7 Toy Story 1995-11-22 G
## 8 Luca 2021-06-18 N/A
## 9 Cars 2006-06-09 G
## 10 WALL-E 2008-06-27 G
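The two samples above are drawn with `sample_n()` after `set.seed()`. dplyr's documentation marks `sample_n()` as superseded by `slice_sample()`, so a minimal sketch of the same idea with the newer verb may be useful. The `toy` table is hypothetical, and it has not been checked whether `slice_sample()` would pick the same rows as `sample_n()` for a given seed.

```r
library(dplyr)

# Hypothetical toy table standing in for a larger dataset.
toy <- tibble(id = 1:20)

# Resetting the seed before each draw makes the sample repeatable.
set.seed(1236)
first_draw <- toy %>% slice_sample(n = 5)

set.seed(1236)
second_draw <- toy %>% slice_sample(n = 5)

# Both draws contain the same rows in the same order.
identical(first_draw, second_draw)
```

Fixing the seed is what keeps the sampled rows stable between knits; without it each run would return a different set of ten movies.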
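Describe the resulting data: it pairs each movie in myData_small with every movie in movie_profit_small that has the same rating, and drops movies whose rating has no match in the other table.
* Columns: 5
* Rows: 13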
How is it different from the original two datasets?
myData_small %>%
  inner_join(movie_profit_small, by = c("film_rating"))
## Warning in inner_join(., movie_profit_small, by = c("film_rating")): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 1 of `x` matches multiple rows in `y`.
## ℹ Row 6 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
## # A tibble: 13 × 5
## film release_date.x film_rating movie release_date.y
## <chr> <date> <chr> <chr> <date>
## 1 Finding Dory 2016-06-17 PG Home 2015-03-27
## 2 Finding Dory 2016-06-17 PG A Simple Wish 1997-07-11
## 3 Finding Dory 2016-06-17 PG Sherlock Gnomes 2018-03-23
## 4 Cars 3 2017-06-16 G Spirit: Stallion of t… 2002-05-24
## 5 Cars 2 2011-06-24 G Spirit: Stallion of t… 2002-05-24
## 6 Toy Story 3 2010-06-18 G Spirit: Stallion of t… 2002-05-24
## 7 Onward 2020-03-06 PG Home 2015-03-27
## 8 Onward 2020-03-06 PG A Simple Wish 1997-07-11
## 9 Onward 2020-03-06 PG Sherlock Gnomes 2018-03-23
## 10 Finding Nemo 2003-05-30 G Spirit: Stallion of t… 2002-05-24
## 11 Toy Story 1995-11-22 G Spirit: Stallion of t… 2002-05-24
## 12 Cars 2006-06-09 G Spirit: Stallion of t… 2002-05-24
## 13 WALL-E 2008-06-27 G Spirit: Stallion of t… 2002-05-24
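The many-to-many warning above appears because `film_rating` is not unique in either table, so every film is paired with every movie that shares its rating. The following is a small sketch of that behavior on hypothetical toy tables (`films` and `movies` are made up for illustration); it assumes dplyr 1.1.0 or later, where the `relationship` argument named in the warning is available.

```r
library(dplyr)

# Hypothetical toy tables: two films and two movies share the rating "PG".
films  <- tibble(film = c("A", "B"), film_rating = c("PG", "PG"))
movies <- tibble(movie = c("X", "Y", "Z"), film_rating = c("PG", "PG", "G"))

# Each "PG" film pairs with each "PG" movie, so 2 x 2 = 4 rows come back.
# Declaring the relationship documents the intent and silences the warning.
pairs <- films %>%
  inner_join(movies, by = "film_rating", relationship = "many-to-many")

nrow(pairs)
```

This is why the join above returns 13 rows from two 10-row samples: the row count is driven by how many movies share each rating, not by the size of either table.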
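Describe the resulting data: it keeps every row of movie_profit_small and adds the matching rows from myData_small; movies with no match get NA in the added columns.
* Columns: 4 with the default keys (release_date and film_rating), 5 when joining by film_rating only
* Rows: 10 with the default keys, 19 when joining by film_rating only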
How is it different from the original two datasets?
movie_profit_small %>% left_join(myData_small)
## Joining with `by = join_by(release_date, film_rating)`
## # A tibble: 10 × 4
## movie release_date film_rating film
## <chr> <date> <chr> <chr>
## 1 Porky's 1982-03-19 R <NA>
## 2 Home 2015-03-27 PG <NA>
## 3 A Simple Wish 1997-07-11 PG <NA>
## 4 Big Daddy 1999-06-25 PG-13 <NA>
## 5 I Feel Pretty 2018-04-20 PG-13 <NA>
## 6 Spirit: Stallion of the Cimarron 2002-05-24 G <NA>
## 7 Small Apartments 2013-02-08 R <NA>
## 8 Sherlock Gnomes 2018-03-23 PG <NA>
## 9 Fireflies in the Garden 2011-10-14 R <NA>
## 10 Full Frontal 2002-08-02 R <NA>
movie_profit_small %>% left_join(myData_small, by = c("film_rating"))
## Warning in left_join(., myData_small, by = c("film_rating")): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 2 of `x` matches multiple rows in `y`.
## ℹ Row 1 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
## # A tibble: 19 × 5
## movie release_date.x film_rating film release_date.y
## <chr> <date> <chr> <chr> <date>
## 1 Porky's 1982-03-19 R <NA> NA
## 2 Home 2015-03-27 PG Find… 2016-06-17
## 3 Home 2015-03-27 PG Onwa… 2020-03-06
## 4 A Simple Wish 1997-07-11 PG Find… 2016-06-17
## 5 A Simple Wish 1997-07-11 PG Onwa… 2020-03-06
## 6 Big Daddy 1999-06-25 PG-13 <NA> NA
## 7 I Feel Pretty 2018-04-20 PG-13 <NA> NA
## 8 Spirit: Stallion of the Cima… 2002-05-24 G Cars… 2017-06-16
## 9 Spirit: Stallion of the Cima… 2002-05-24 G Cars… 2011-06-24
## 10 Spirit: Stallion of the Cima… 2002-05-24 G Toy … 2010-06-18
## 11 Spirit: Stallion of the Cima… 2002-05-24 G Find… 2003-05-30
## 12 Spirit: Stallion of the Cima… 2002-05-24 G Toy … 1995-11-22
## 13 Spirit: Stallion of the Cima… 2002-05-24 G Cars 2006-06-09
## 14 Spirit: Stallion of the Cima… 2002-05-24 G WALL… 2008-06-27
## 15 Small Apartments 2013-02-08 R <NA> NA
## 16 Sherlock Gnomes 2018-03-23 PG Find… 2016-06-17
## 17 Sherlock Gnomes 2018-03-23 PG Onwa… 2020-03-06
## 18 Fireflies in the Garden 2011-10-14 R <NA> NA
## 19 Full Frontal 2002-08-02 R <NA> NA
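The first left join above matched on `release_date` and `film_rating` because those are the only column names the two tables share, and it found no matches. When the natural key is stored under different names (here `movie` and `film`), `join_by()` can name both sides explicitly. This is only a sketch on hypothetical toy tables with made-up values; whether any titles actually overlap between the two real datasets has not been checked here.

```r
library(dplyr)

# Hypothetical toy tables; the title column is named differently in each.
profit <- tibble(
  movie = c("Cars", "Home"),
  film_rating = c("G", "PG")
)
pixar <- tibble(
  film = c("Cars", "Luca"),
  run_time = c(117, 95)  # illustrative values only
)

# join_by(movie == film) matches profit$movie against pixar$film.
by_title <- profit %>%
  left_join(pixar, by = join_by(movie == film))

by_title
```

Only the left-hand key name (`movie`) is kept in the result, and rows of `profit` without a matching title get NA in `run_time`.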
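Describe the resulting data: it keeps every row of myData_small and adds the matching rows from movie_profit_small; films with no match get NA in the added columns.
* Columns: 4 with the default keys (release_date and film_rating), 5 when joining by film_rating only
* Rows: 10 with the default keys, 14 when joining by film_rating only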
How is it different from the original two datasets?
movie_profit_small %>% right_join(myData_small)
## Joining with `by = join_by(release_date, film_rating)`
## # A tibble: 10 × 4
## movie release_date film_rating film
## <chr> <date> <chr> <chr>
## 1 <NA> 2016-06-17 PG Finding Dory
## 2 <NA> 2017-06-16 G Cars 3
## 3 <NA> 2011-06-24 G Cars 2
## 4 <NA> 2010-06-18 G Toy Story 3
## 5 <NA> 2020-03-06 PG Onward
## 6 <NA> 2003-05-30 G Finding Nemo
## 7 <NA> 1995-11-22 G Toy Story
## 8 <NA> 2021-06-18 N/A Luca
## 9 <NA> 2006-06-09 G Cars
## 10 <NA> 2008-06-27 G WALL-E
movie_profit_small %>% right_join(myData_small, by = c("film_rating"))
## Warning in right_join(., myData_small, by = c("film_rating")): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 2 of `x` matches multiple rows in `y`.
## ℹ Row 1 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
## # A tibble: 14 × 5
## movie release_date.x film_rating film release_date.y
## <chr> <date> <chr> <chr> <date>
## 1 Home 2015-03-27 PG Find… 2016-06-17
## 2 Home 2015-03-27 PG Onwa… 2020-03-06
## 3 A Simple Wish 1997-07-11 PG Find… 2016-06-17
## 4 A Simple Wish 1997-07-11 PG Onwa… 2020-03-06
## 5 Spirit: Stallion of the Cima… 2002-05-24 G Cars… 2017-06-16
## 6 Spirit: Stallion of the Cima… 2002-05-24 G Cars… 2011-06-24
## 7 Spirit: Stallion of the Cima… 2002-05-24 G Toy … 2010-06-18
## 8 Spirit: Stallion of the Cima… 2002-05-24 G Find… 2003-05-30
## 9 Spirit: Stallion of the Cima… 2002-05-24 G Toy … 1995-11-22
## 10 Spirit: Stallion of the Cima… 2002-05-24 G Cars 2006-06-09
## 11 Spirit: Stallion of the Cima… 2002-05-24 G WALL… 2008-06-27
## 12 Sherlock Gnomes 2018-03-23 PG Find… 2016-06-17
## 13 Sherlock Gnomes 2018-03-23 PG Onwa… 2020-03-06
## 14 <NA> NA N/A Luca 2021-06-18
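Describe the resulting data: it keeps all observations from both tables and matches them where the keys agree; where a row has no match in the other table, the missing columns are filled with NA.
* Columns: 4 with the default keys (release_date and film_rating), 5 when joining by film_rating only
* Rows: 20 in both cases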
How is it different from the original two datasets?
movie_profit_small %>% full_join(myData_small)
## Joining with `by = join_by(release_date, film_rating)`
## # A tibble: 20 × 4
## movie release_date film_rating film
## <chr> <date> <chr> <chr>
## 1 Porky's 1982-03-19 R <NA>
## 2 Home 2015-03-27 PG <NA>
## 3 A Simple Wish 1997-07-11 PG <NA>
## 4 Big Daddy 1999-06-25 PG-13 <NA>
## 5 I Feel Pretty 2018-04-20 PG-13 <NA>
## 6 Spirit: Stallion of the Cimarron 2002-05-24 G <NA>
## 7 Small Apartments 2013-02-08 R <NA>
## 8 Sherlock Gnomes 2018-03-23 PG <NA>
## 9 Fireflies in the Garden 2011-10-14 R <NA>
## 10 Full Frontal 2002-08-02 R <NA>
## 11 <NA> 2016-06-17 PG Finding Dory
## 12 <NA> 2017-06-16 G Cars 3
## 13 <NA> 2011-06-24 G Cars 2
## 14 <NA> 2010-06-18 G Toy Story 3
## 15 <NA> 2020-03-06 PG Onward
## 16 <NA> 2003-05-30 G Finding Nemo
## 17 <NA> 1995-11-22 G Toy Story
## 18 <NA> 2021-06-18 N/A Luca
## 19 <NA> 2006-06-09 G Cars
## 20 <NA> 2008-06-27 G WALL-E
myData_small %>% full_join(movie_profit_small, by = c("film_rating"))
## Warning in full_join(., movie_profit_small, by = c("film_rating")): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 1 of `x` matches multiple rows in `y`.
## ℹ Row 6 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
## # A tibble: 20 × 5
## film release_date.x film_rating movie release_date.y
## <chr> <date> <chr> <chr> <date>
## 1 Finding Dory 2016-06-17 PG Home 2015-03-27
## 2 Finding Dory 2016-06-17 PG A Simple Wish 1997-07-11
## 3 Finding Dory 2016-06-17 PG Sherlock Gnomes 2018-03-23
## 4 Cars 3 2017-06-16 G Spirit: Stallion of t… 2002-05-24
## 5 Cars 2 2011-06-24 G Spirit: Stallion of t… 2002-05-24
## 6 Toy Story 3 2010-06-18 G Spirit: Stallion of t… 2002-05-24
## 7 Onward 2020-03-06 PG Home 2015-03-27
## 8 Onward 2020-03-06 PG A Simple Wish 1997-07-11
## 9 Onward 2020-03-06 PG Sherlock Gnomes 2018-03-23
## 10 Finding Nemo 2003-05-30 G Spirit: Stallion of t… 2002-05-24
## 11 Toy Story 1995-11-22 G Spirit: Stallion of t… 2002-05-24
## 12 Luca 2021-06-18 N/A <NA> NA
## 13 Cars 2006-06-09 G Spirit: Stallion of t… 2002-05-24
## 14 WALL-E 2008-06-27 G Spirit: Stallion of t… 2002-05-24
## 15 <NA> NA R Porky's 1982-03-19
## 16 <NA> NA PG-13 Big Daddy 1999-06-25
## 17 <NA> NA PG-13 I Feel Pretty 2018-04-20
## 18 <NA> NA R Small Apartments 2013-02-08
## 19 <NA> NA R Fireflies in the Gard… 2011-10-14
## 20 <NA> NA R Full Frontal 2002-08-02
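In the output above, Luca's rating is the literal string `N/A`, which is an ordinary value, while the `<NA>` and `NA` entries are true missing values created by the join. If the string should be treated as missing, one possible cleanup step is `na_if()`. The sketch below uses a hypothetical two-row table rather than the real data.

```r
library(dplyr)

# Hypothetical toy table with a placeholder string in the rating column.
ratings <- tibble(
  film = c("Luca", "Cars"),
  film_rating = c("N/A", "G")
)

# na_if() replaces values equal to "N/A" with a real missing value.
cleaned <- ratings %>%
  mutate(film_rating = na_if(film_rating, "N/A"))

is.na(cleaned$film_rating)
```

Note that dplyr joins treat NA keys as equal to each other by default, so converting the string to NA in both tables could change which rows match.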
Describe the resulting data: it keeps the observations in movie_profit_small whose rating has a match in myData_small, without adding any columns from myData_small.
* Columns: 3
* Rows: 4
How is it different from the original two datasets?
movie_profit_small %>% semi_join(myData_small, by = c("film_rating"))
## # A tibble: 4 × 3
## movie release_date film_rating
## <chr> <date> <chr>
## 1 Home 2015-03-27 PG
## 2 A Simple Wish 1997-07-11 PG
## 3 Spirit: Stallion of the Cimarron 2002-05-24 G
## 4 Sherlock Gnomes 2018-03-23 PG
Describe the resulting data: it keeps the observations in movie_profit_small whose rating doesn't have a match in myData_small.
* Columns: 3
* Rows: 6
How is it different from the original two datasets?
movie_profit_small %>% anti_join(myData_small, by = c("film_rating"))
## # A tibble: 6 × 3
## movie release_date film_rating
## <chr> <date> <chr>
## 1 Porky's 1982-03-19 R
## 2 Big Daddy 1999-06-25 PG-13
## 3 I Feel Pretty 2018-04-20 PG-13
## 4 Small Apartments 2013-02-08 R
## 5 Fireflies in the Garden 2011-10-14 R
## 6 Full Frontal 2002-08-02 R
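The semi join and anti join above split movie_profit_small into two non-overlapping groups: 4 rows with a rating match and 6 rows without, which add up to the original 10. A minimal sketch of that relationship on hypothetical toy tables (`x` and `y` are made up for illustration):

```r
library(dplyr)

# Hypothetical toy tables sharing only the film_rating column.
x <- tibble(
  movie = c("P", "Q", "R", "S"),
  film_rating = c("R", "PG", "G", "PG-13")
)
y <- tibble(film = c("T", "U"), film_rating = c("PG", "G"))

# semi_join keeps rows of x with a match in y; anti_join keeps the rest.
matched   <- x %>% semi_join(y, by = "film_rating")
unmatched <- x %>% anti_join(y, by = "film_rating")

# Together the two results account for every row of x exactly once.
nrow(matched) + nrow(unmatched) == nrow(x)
```

Because both are filtering joins, neither duplicates rows of `x` nor adds columns from `y`, which is why no many-to-many warning appears for them.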