Import two related datasets from the TidyTuesday project.
tt_datasets <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-07-02/tt_datasets.csv')
## Rows: 644 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): dataset_name
## dbl (4): year, week, variables, observations
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
tt_summary <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-07-02/tt_summary.csv')
## Rows: 324 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): title, source_title, article_title
## dbl (2): year, week
## date (1): date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
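Describe the two datasets:
Data 1: tt_summary
* 324 rows and 6 columns: title, source_title, article_title, year, week, date
Data 2: tt_datasets
* 644 rows and 5 columns: dataset_name, year, week, variables, observations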
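The row counts in the import messages (644 for tt_datasets against 324 for tt_summary) suggest that some weeks contribute more than one dataset, so year and week may not identify a single row in tt_datasets. A minimal sketch of how that could be checked, using a small made-up tibble `datasets_toy` (hypothetical, not the real data) in place of tt_datasets:

```r
library(dplyr)

# Hypothetical stand-in for tt_datasets: week 2 ships two files
datasets_toy <- tibble(
  year = c(2023, 2023, 2023),
  week = c(1, 2, 2),
  dataset_name = c("a", "b", "c")
)

# Keys that occur more than once will duplicate matching rows in a join
dup_keys <- datasets_toy %>%
  count(year, week) %>%
  filter(n > 1)

dup_keys
```

Running the same `count()` and `filter()` steps on the real tt_datasets would list the weeks with several datasets, if there are any.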
library(dplyr)  # provides %>%, select(), sample_n(), and the join functions
tt_summary_small <- tt_summary %>% select(year, week, title) %>% sample_n(10)
tt_datasets_small <- tt_datasets %>% select(year, week, variables) %>% sample_n(10)
tt_summary_small
## # A tibble: 10 × 3
## year week title
## <dbl> <dbl> <chr>
## 1 2022 25 Juneteenth
## 2 2023 21 Central Park Squirrels
## 3 2021 35 Lemurs
## 4 2020 4 Song Genres
## 5 2024 15 2023 & 2024 US Solar Eclipses
## 6 2019 23 Ramen Ratings
## 7 2020 9 Measles Vaccination
## 8 2023 34 Refugees
## 9 2018 3 Global Mortality
## 10 2020 12 The Office
tt_datasets_small
## # A tibble: 10 × 3
## year week variables
## <dbl> <dbl> <dbl>
## 1 2024 24 5
## 2 2020 19 4
## 3 2022 43 24
## 4 2021 37 7
## 5 2020 27 4
## 6 2019 29 21
## 7 2019 18 1
## 8 2019 19 9
## 9 2020 14 6
## 10 2022 10 24
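Because `sample_n()` draws rows at random, the two samples shown above change on every run, and so do the join results that follow. A minimal sketch of making the draw repeatable, assuming an arbitrary seed of 2024 (not taken from the original analysis) and a made-up tibble `toy` standing in for the real data:

```r
library(dplyr)

# Hypothetical stand-in for tt_summary: one row per (year, week)
toy <- tibble(
  year  = rep(2023, 20),
  week  = 1:20,
  title = paste("Topic", 1:20)
)

# Fixing the seed right before sampling makes the draw repeatable
set.seed(2024)  # arbitrary seed, chosen only for illustration
first_draw <- toy %>% sample_n(10)

set.seed(2024)
second_draw <- toy %>% sample_n(10)

# Check that both draws picked the same rows in the same order
identical(first_draw, second_draw)
```

Calling `set.seed()` once before the two `sample_n()` calls in the report would likewise keep the printed samples stable between runs.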
Describe the resulting data:
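How is it different from the original two datasets?
* 0 rows, compared to 10 rows in each of the sampled datasets, because no year and week pair appears in both random samples
* all columns from the two datasets: year, week, title, variables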
tt_summary_small %>% inner_join(tt_datasets_small, by = c("year", "week"))
## # A tibble: 0 × 4
## # ℹ 4 variables: year <dbl>, week <dbl>, title <chr>, variables <dbl>
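The inner join comes back empty because the two samples are drawn independently, so their year and week pairs need not overlap. One possible way to get samples that do overlap is to sample one table and then keep only the rows of the other table that share its keys. The sketch below assumes two made-up tibbles, `summary_toy` and `datasets_toy` (hypothetical stand-ins for tt_summary and tt_datasets):

```r
library(dplyr)

# Hypothetical stand-ins for tt_summary and tt_datasets
summary_toy  <- tibble(year = 2023, week = 1:20, title = paste("Topic", 1:20))
datasets_toy <- tibble(year = 2023, week = 1:20, variables = 21:40)

# Draw the sample once, then keep only dataset rows for the same weeks
summary_small  <- summary_toy %>% sample_n(10)
datasets_small <- datasets_toy %>%
  semi_join(summary_small, by = c("year", "week"))

# Every sampled week now has a partner, so the inner join keeps all 10 rows
joined <- summary_small %>%
  inner_join(datasets_small, by = c("year", "week"))

nrow(joined)
```

This trades the independence of the two samples for guaranteed matches, which makes the differences between the join types easier to see.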
Describe the resulting data:
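How is it different from the original two datasets?
* all 10 rows of tt_summary_small are kept
* the variables column from tt_datasets_small is added, but it is NA in every row because no year and week pair matches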
tt_summary_small %>% left_join(tt_datasets_small, by = c("year", "week"))
## # A tibble: 10 × 4
## year week title variables
## <dbl> <dbl> <chr> <dbl>
## 1 2022 25 Juneteenth NA
## 2 2023 21 Central Park Squirrels NA
## 3 2021 35 Lemurs NA
## 4 2020 4 Song Genres NA
## 5 2024 15 2023 & 2024 US Solar Eclipses NA
## 6 2019 23 Ramen Ratings NA
## 7 2020 9 Measles Vaccination NA
## 8 2023 34 Refugees NA
## 9 2018 3 Global Mortality NA
## 10 2020 12 The Office NA
Describe the resulting data:
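How is it different from the original two datasets?
* all 10 rows of tt_datasets_small are kept
* the title column from tt_summary_small is added, but it is NA in every row because no year and week pair matches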
tt_summary_small %>% right_join(tt_datasets_small, by = c("year", "week"))
## # A tibble: 10 × 4
## year week title variables
## <dbl> <dbl> <chr> <dbl>
## 1 2024 24 <NA> 5
## 2 2020 19 <NA> 4
## 3 2022 43 <NA> 24
## 4 2021 37 <NA> 7
## 5 2020 27 <NA> 4
## 6 2019 29 <NA> 21
## 7 2019 18 <NA> 1
## 8 2019 19 <NA> 9
## 9 2020 14 <NA> 6
## 10 2022 10 <NA> 24
Describe the resulting data:
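How is it different from the original two datasets?
* There are 20 rows in this dataset instead of 10: with no matching keys, all 10 rows from each sample are kept
* title is NA for the rows that come from tt_datasets_small, and variables is NA for the rows that come from tt_summary_small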
tt_summary_small %>% full_join(tt_datasets_small, by = c("year", "week"))
## # A tibble: 20 × 4
## year week title variables
## <dbl> <dbl> <chr> <dbl>
## 1 2022 25 Juneteenth NA
## 2 2023 21 Central Park Squirrels NA
## 3 2021 35 Lemurs NA
## 4 2020 4 Song Genres NA
## 5 2024 15 2023 & 2024 US Solar Eclipses NA
## 6 2019 23 Ramen Ratings NA
## 7 2020 9 Measles Vaccination NA
## 8 2023 34 Refugees NA
## 9 2018 3 Global Mortality NA
## 10 2020 12 The Office NA
## 11 2024 24 <NA> 5
## 12 2020 19 <NA> 4
## 13 2022 43 <NA> 24
## 14 2021 37 <NA> 7
## 15 2020 27 <NA> 4
## 16 2019 29 <NA> 21
## 17 2019 18 <NA> 1
## 18 2019 19 <NA> 9
## 19 2020 14 <NA> 6
## 20 2022 10 <NA> 24
Describe the resulting data:
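How is it different from the original two datasets?
* 0 rows, compared to 10 rows in the sampled datasets, because no row of tt_summary_small has a match in tt_datasets_small
* only the columns of tt_summary_small (year, week, title) are returned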
tt_summary_small %>% semi_join(tt_datasets_small, by = c("year", "week"))
## # A tibble: 0 × 3
## # ℹ 3 variables: year <dbl>, week <dbl>, title <chr>
Describe the resulting data:
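How is it different from the original two datasets?
* 10 rows, the same as tt_summary_small: every row is kept because none has a match in tt_datasets_small
* All the information in tt_summary_small is there, and no columns from tt_datasets_small are added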
tt_summary_small %>% anti_join(tt_datasets_small, by = c("year", "week"))
## # A tibble: 10 × 3
## year week title
## <dbl> <dbl> <chr>
## 1 2022 25 Juneteenth
## 2 2023 21 Central Park Squirrels
## 3 2021 35 Lemurs
## 4 2020 4 Song Genres
## 5 2024 15 2023 & 2024 US Solar Eclipses
## 6 2019 23 Ramen Ratings
## 7 2020 9 Measles Vaccination
## 8 2023 34 Refugees
## 9 2018 3 Global Mortality
## 10 2020 12 The Office
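With no shared keys in the samples above, several of the joins look alike. The differences are easier to see when the keys overlap only partly. A minimal sketch, assuming two made-up tibbles `left_toy` and `right_toy` (hypothetical, with invented values) that share weeks 4 to 6:

```r
library(dplyr)

# Hypothetical tables: weeks 1-6 on the left, weeks 4-9 on the right,
# so weeks 4, 5 and 6 are the only shared keys
left_toy  <- tibble(year = 2023, week = 1:6, title = paste("Topic", 1:6))
right_toy <- tibble(year = 2023, week = 4:9, variables = c(5, 4, 24, 7, 21, 9))

keys <- c("year", "week")

# Number of rows each join returns for the same pair of tables
row_counts <- c(
  inner = nrow(inner_join(left_toy, right_toy, by = keys)),
  left  = nrow(left_join(left_toy, right_toy, by = keys)),
  right = nrow(right_join(left_toy, right_toy, by = keys)),
  full  = nrow(full_join(left_toy, right_toy, by = keys)),
  semi  = nrow(semi_join(left_toy, right_toy, by = keys)),
  anti  = nrow(anti_join(left_toy, right_toy, by = keys))
)

row_counts
```

Here the inner and semi joins keep the 3 shared weeks, the left and right joins keep all 6 rows of their primary table, the full join keeps all 9 distinct weeks, and the anti join keeps the 3 left-only weeks.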