Import two related datasets (NFL attendance and team standings) from the TidyTuesday project.
attendance <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2020/2020-02-04/attendance.csv')
## Rows: 10846 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): team, team_name
## dbl (6): year, total, home, away, week, weekly_attendance
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
standings <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2020/2020-02-04/standings.csv')
## Rows: 638 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): team, team_name, playoffs, sb_winner
## dbl (11): year, wins, loss, points_for, points_against, points_differential,...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Describe the two datasets:
Data 1: attendance_small
Data 2: standings_small
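library(dplyr)  # provides %>%, select(), sample_n() and the join verbs used below
set.seed(1234)  # make the random samples reproducible
attendance_small <- attendance %>% select(team_name, year, week) %>% sample_n(10)
standings_small <- standings %>% select(team, year, wins) %>% sample_n(10)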
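`sample_n()` still works, but current dplyr documentation marks it as superseded in favour of `slice_sample()`. Below is a minimal sketch of the newer verb on a hypothetical toy table (`df` and its values are made up for illustration). Switching verbs is not guaranteed to draw the same ten rows shown in the output below, even with the same seed.

```r
library(dplyr)

# Hypothetical toy table standing in for a full dataset (values are made up)
df <- tibble(id = 1:20, label = letters[1:20])

set.seed(1234)
# Keep one column, then draw 5 rows at random without replacement (the default)
sampled <- df %>% select(id) %>% slice_sample(n = 5)
sampled
```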
attendance_small
## # A tibble: 10 × 3
## team_name year week
## <chr> <dbl> <dbl>
## 1 Steelers 2013 6
## 2 Chargers 2014 9
## 3 Browns 2013 5
## 4 Buccaneers 2014 11
## 5 Colts 2013 10
## 6 Titans 2016 16
## 7 Bears 2001 11
## 8 Steelers 2001 16
## 9 Chiefs 2005 7
## 10 Cardinals 2004 4
standings_small
## # A tibble: 10 × 3
## team year wins
## <chr> <dbl> <dbl>
## 1 Indianapolis 2003 12
## 2 Tampa Bay 2018 5
## 3 Cincinnati 2010 4
## 4 Philadelphia 2002 12
## 5 Kansas City 2008 2
## 6 St. Louis 2011 2
## 7 Carolina 2005 11
## 8 San Francisco 2017 6
## 9 Buffalo 2000 8
## 10 Tennessee 2017 9
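Note which columns the two samples share. `attendance_small` keeps the nickname column `team_name`, while `standings_small` keeps the `team` column, so `year` is the only column they have in common. Every join below therefore matches rows on `year` alone, which is why each one prints `Joining with by = join_by(year)` and why the 2005 Chiefs attendance row gets paired with Carolina's 2005 standings. The sketch below uses hypothetical toy tables `att` and `stn` to contrast matching on `year` only with matching on a team-plus-year key. It assumes, for illustration, that the team identifier is coded the same way in both tables. The full files both contain `team` and `team_name` columns, but the sampled subsets above do not.

```r
library(dplyr)

# Hypothetical toy tables (made-up values); assumes `team` is coded the same way in both
att <- tibble(team = c("A", "B"), year = c(2005, 2005), week = c(7, 3))
stn <- tibble(team = c("A", "C"), year = c(2005, 2005), wins = c(11, 9))

# Matching on year alone pairs every 2005 row in `att` with every 2005 row in `stn`;
# `team` is then present in both inputs, so it comes back as team.x and team.y
by_year <- inner_join(att, stn, by = join_by(year), relationship = "many-to-many")

# Matching on team and year pairs only rows that describe the same team-season
by_team_year <- inner_join(att, stn, by = join_by(team, year))

by_year
by_team_year
```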
Describe the resulting data:
How is it different from the original two datasets?
attendance_small %>% inner_join(standings_small)
## Joining with `by = join_by(year)`
## # A tibble: 1 × 5
## team_name year week team wins
## <chr> <dbl> <dbl> <chr> <dbl>
## 1 Chiefs 2005 7 Carolina 11
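`inner_join()` keeps only the rows whose key appears in both tables, which is why a single 2005 row survives above. A minimal sketch on hypothetical toy tables `x` and `y` (made-up values), with the key named explicitly through the same `join_by()` helper that appears in the messages:

```r
library(dplyr)

# Hypothetical toy tables (made-up values)
x <- tibble(key = c(1, 2, 3), x_val = c("a", "b", "c"))
y <- tibble(key = c(2, 3, 4), y_val = c("B", "C", "D"))

# Only keys 2 and 3 are present in both tables; columns from x and y are combined
both <- inner_join(x, y, by = join_by(key))
both
```

Naming the key explicitly also silences the `Joining with ...` message and guards against joining on a column that happens to share a name by accident.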
Describe the resulting data:
How is it different from the original two datasets?
left_join(attendance_small, standings_small)
## Joining with `by = join_by(year)`
## # A tibble: 10 × 5
## team_name year week team wins
## <chr> <dbl> <dbl> <chr> <dbl>
## 1 Steelers 2013 6 <NA> NA
## 2 Chargers 2014 9 <NA> NA
## 3 Browns 2013 5 <NA> NA
## 4 Buccaneers 2014 11 <NA> NA
## 5 Colts 2013 10 <NA> NA
## 6 Titans 2016 16 <NA> NA
## 7 Bears 2001 11 <NA> NA
## 8 Steelers 2001 16 <NA> NA
## 9 Chiefs 2005 7 Carolina 11
## 10 Cardinals 2004 4 <NA> NA
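`left_join()` keeps every row of the first table and fills the second table's columns with `NA` where no key matches, so the result above still has all 10 attendance rows. A minimal sketch on hypothetical toy tables (made-up values):

```r
library(dplyr)

# Hypothetical toy tables (made-up values)
x <- tibble(key = c(1, 2, 3), x_val = c("a", "b", "c"))
y <- tibble(key = c(2, 3, 4), y_val = c("B", "C", "D"))

# All rows of x are kept; key 1 has no partner in y, so its y_val is NA
left <- left_join(x, y, by = join_by(key))
left
```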
Describe the resulting data:
How is it different from the original two datasets?
right_join(attendance_small, standings_small)
## Joining with `by = join_by(year)`
## # A tibble: 10 × 5
## team_name year week team wins
## <chr> <dbl> <dbl> <chr> <dbl>
## 1 Chiefs 2005 7 Carolina 11
## 2 <NA> 2003 NA Indianapolis 12
## 3 <NA> 2018 NA Tampa Bay 5
## 4 <NA> 2010 NA Cincinnati 4
## 5 <NA> 2002 NA Philadelphia 12
## 6 <NA> 2008 NA Kansas City 2
## 7 <NA> 2011 NA St. Louis 2
## 8 <NA> 2017 NA San Francisco 6
## 9 <NA> 2000 NA Buffalo 8
## 10 <NA> 2017 NA Tennessee 9
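`right_join()` is the mirror image: every row of the second table is kept, and the first table's columns are `NA` where no key matches, which is why `team_name` and `week` are missing for nine of the ten standings rows above. A minimal sketch on hypothetical toy tables (made-up values):

```r
library(dplyr)

# Hypothetical toy tables (made-up values)
x <- tibble(key = c(1, 2, 3), x_val = c("a", "b", "c"))
y <- tibble(key = c(2, 3, 4), y_val = c("B", "C", "D"))

# All rows of y are kept; key 4 has no partner in x, so its x_val is NA
right <- right_join(x, y, by = join_by(key))
right
```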
Describe the resulting data:
How is it different from the original two datasets?
full_join(standings_small, attendance_small, by = c("year"))
## # A tibble: 19 × 5
## team year wins team_name week
## <chr> <dbl> <dbl> <chr> <dbl>
## 1 Indianapolis 2003 12 <NA> NA
## 2 Tampa Bay 2018 5 <NA> NA
## 3 Cincinnati 2010 4 <NA> NA
## 4 Philadelphia 2002 12 <NA> NA
## 5 Kansas City 2008 2 <NA> NA
## 6 St. Louis 2011 2 <NA> NA
## 7 Carolina 2005 11 Chiefs 7
## 8 San Francisco 2017 6 <NA> NA
## 9 Buffalo 2000 8 <NA> NA
## 10 Tennessee 2017 9 <NA> NA
## 11 <NA> 2013 NA Steelers 6
## 12 <NA> 2014 NA Chargers 9
## 13 <NA> 2013 NA Browns 5
## 14 <NA> 2014 NA Buccaneers 11
## 15 <NA> 2013 NA Colts 10
## 16 <NA> 2016 NA Titans 16
## 17 <NA> 2001 NA Bears 11
## 18 <NA> 2001 NA Steelers 16
## 19 <NA> 2004 NA Cardinals 4
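`full_join()` keeps every row from both tables, matching where it can and filling `NA` elsewhere. With only one matching year, the 10 + 10 input rows collapse to 19 output rows above. A minimal sketch on hypothetical toy tables (made-up values):

```r
library(dplyr)

# Hypothetical toy tables (made-up values)
x <- tibble(key = c(1, 2, 3), x_val = c("a", "b", "c"))
y <- tibble(key = c(2, 3, 4), y_val = c("B", "C", "D"))

# Keys 2 and 3 match; key 1 (x only) and key 4 (y only) are kept with NA fill
full <- full_join(x, y, by = join_by(key))
full
```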
Describe the resulting data:
How is it different from the original two datasets?
semi_join(standings_small, attendance_small)
## Joining with `by = join_by(year)`
## # A tibble: 1 × 3
## team year wins
## <chr> <dbl> <dbl>
## 1 Carolina 2005 11
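`semi_join()` is a filtering join. It returns rows of the first table that have a match in the second, keeps only the first table's columns, and never duplicates a row. That is why the result above has the three `standings_small` columns and no `team_name` or `week`. A minimal sketch on hypothetical toy tables (made-up values), contrasted with `inner_join()` when a key repeats in `y`:

```r
library(dplyr)

# Hypothetical toy tables (made-up values); key 2 appears twice in y
x <- tibble(key = c(1, 2, 3), x_val = c("a", "b", "c"))
y <- tibble(key = c(2, 2, 3), y_val = c("B1", "B2", "C"))

# Filtering join: keys 2 and 3 of x survive once each, with x's columns only
kept <- semi_join(x, y, by = join_by(key))

# Mutating join for comparison: key 2 matches two rows of y, so it appears twice
dup <- inner_join(x, y, by = join_by(key))

kept
dup
```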
Describe the resulting data:
How is it different from the original two datasets?
anti_join(attendance_small, standings_small)
## Joining with `by = join_by(year)`
## # A tibble: 9 × 3
## team_name year week
## <chr> <dbl> <dbl>
## 1 Steelers 2013 6
## 2 Chargers 2014 9
## 3 Browns 2013 5
## 4 Buccaneers 2014 11
## 5 Colts 2013 10
## 6 Titans 2016 16
## 7 Bears 2001 11
## 8 Steelers 2001 16
## 9 Cardinals 2004 4
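`anti_join()` is the complement of `semi_join()`: it returns the rows of the first table that have no match in the second. Together the two split the first table into matched and unmatched rows, which is why the 9 rows above plus the single matched 2005 row account for all 10 rows of `attendance_small`. A minimal sketch on hypothetical toy tables (made-up values):

```r
library(dplyr)

# Hypothetical toy tables (made-up values)
x <- tibble(key = c(1, 2, 3, 4), x_val = c("a", "b", "c", "d"))
y <- tibble(key = c(2, 4), y_val = c("B", "D"))

# Rows of x with no key in y, and rows of x with a key in y
unmatched <- anti_join(x, y, by = join_by(key))
matched <- semi_join(x, y, by = join_by(key))

unmatched
matched
```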