Import two related datasets from the TidyTuesday project.
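library(tidyverse)  # provides read_csv(), %>%, and the dplyr join verbs
# Freedom data (TidyTuesday 2022-02-22), read directly from CSV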
freedom <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-02-22/freedom.csv')
## Rows: 4979 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): country, Status, Region_Name
## dbl (5): year, CL, PR, Region_Code, is_ldc
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Student-teacher ratio data (TidyTuesday 2019-05-07), read directly from CSV
ratio <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-05-07/student_teacher_ratio.csv')
## Rows: 5189 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): edulit_ind, indicator, country_code, country, flag_codes, flags
## dbl (2): year, student_ratio
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
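Describe the two datasets:
Data 1: `freedom` has 4,979 rows and 8 columns. It records civil liberties (`CL`) and political rights (`PR`) ratings by `country` and `year`, along with the overall freedom `Status`, the region (`Region_Code`, `Region_Name`), and a least-developed-country indicator (`is_ldc`).
Data 2: `ratio` has 5,189 rows and 8 columns of student-to-teacher ratios (`student_ratio`) by `country` and `year`, along with indicator columns (`edulit_ind`, `indicator`), `country_code`, and flag columns (`flag_codes`, `flags`). Its `country` column also contains regional aggregates such as "Small Island Developing States" and "Asia (Central)".
The two datasets share the `country` and `year` columns, which serve as the join keys below.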
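Because the joins below match on `country` and `year`, it helps to check that this pair uniquely identifies rows in each table. If it does not, a join can silently duplicate rows. This is a minimal sketch, assuming dplyr. The hypothetical toy table `freedom_toy` stands in for the real data and repeats one key on purpose:

```r
library(dplyr)

# Hypothetical toy table standing in for `freedom`; Mali 2009 is repeated on purpose
freedom_toy <- tibble(
  country     = c("Tonga", "Tonga", "Mali", "Mali"),
  year        = c(2005, 2015, 2009, 2009),
  Region_Name = c("Oceania", "Oceania", "Africa", "Africa")
)

# Count rows per (country, year) key and keep keys that occur more than once;
# an empty result means the pair uniquely identifies rows
dup_keys <- freedom_toy %>%
  count(country, year) %>%
  filter(n > 1)

dup_keys
```

On the real data, the same pipeline applied to `freedom` or `ratio` returns 0 rows exactly when the key is unique.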
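set.seed(1234)  # make the random samples reproducible
# Keep the join keys (country, year) plus one variable from each dataset, then draw 50 random rows
freedom_small <- freedom %>% select(country, year, Region_Name) %>% sample_n(50)
ratio_small <- ratio %>% select(country, year, student_ratio) %>% sample_n(50)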
freedom_small
## # A tibble: 50 × 3
## country year Region_Name
## <chr> <dbl> <chr>
## 1 Congo 2010 Africa
## 2 Brazil 2019 Americas
## 3 Mali 2009 Africa
## 4 China 2018 Asia
## 5 Tonga 2005 Oceania
## 6 Montenegro 2015 Europe
## 7 Jamaica 2008 Americas
## 8 Nicaragua 2008 Americas
## 9 Mauritania 2012 Africa
## 10 Latvia 2002 Europe
## # ℹ 40 more rows
ratio_small
## # A tibble: 50 × 3
## country year student_ratio
## <chr> <dbl> <dbl>
## 1 Singapore 2016 15.1
## 2 Russian Federation 2015 20.1
## 3 Small Island Developing States 2016 21.6
## 4 Small Island Developing States 2014 23.0
## 5 Bulgaria 2016 11.8
## 6 Tonga 2015 10.5
## 7 Honduras 2015 29.1
## 8 South Africa 2015 30.3
## 9 Asia (Central) 2013 17.4
## 10 Solomon Islands 2013 33.8
## # ℹ 40 more rows
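`freedom_small` and `ratio_small` are sampled independently, so they may share few or no (`country`, `year`) pairs. When that happens, the joins that follow return empty or all-`NA` results. One possible alternative is to restrict one table to keys present in the other with `semi_join()`, then sample with `slice_sample()`, the current replacement for `sample_n()`. The sketch below applies this to hypothetical toy tables, not the real data:

```r
library(dplyr)

# Hypothetical toy tables standing in for `ratio` and `freedom`
ratio_toy <- tibble(
  country       = c("Tonga", "Bulgaria", "Honduras"),
  year          = c(2015, 2016, 2015),
  student_ratio = c(10.5, 11.8, 29.1)
)
freedom_toy <- tibble(
  country     = c("Tonga", "Tonga", "Mali"),
  year        = c(2005, 2015, 2009),
  Region_Name = c("Oceania", "Oceania", "Africa")
)

set.seed(1234)
# Keep only rows of ratio_toy whose (country, year) key also exists in freedom_toy,
# then sample from them, so every sampled row is guaranteed a match
ratio_matched <- ratio_toy %>%
  semi_join(freedom_toy, by = c("country", "year")) %>%
  slice_sample(n = 1)

# Joining the sample back now finds a Region_Name for every sampled row
joined <- ratio_matched %>% inner_join(freedom_toy, by = c("country", "year"))
joined
```

With the real data you would sample a larger `n`. Note that this changes the exercise: every sampled row is then guaranteed a match.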
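Describe the resulting data: a tibble with 0 rows and 4 variables (`country`, `year`, `student_ratio`, `Region_Name`).
How is it different from the original two datasets? An inner join keeps only rows whose (`country`, `year`) pair appears in both tables. The columns of both tables are combined, but none of the 50 sampled pairs in `ratio_small` occurs in `freedom_small`, so no rows remain.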
ratio_small %>% inner_join(freedom_small, by = c("country", "year"))
## # A tibble: 0 × 4
## # ℹ 4 variables: country <chr>, year <dbl>, student_ratio <dbl>,
## # Region_Name <chr>
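To see an inner join actually produce a match, here is a minimal sketch. It uses hypothetical toy tables (`ratio_toy`, `freedom_toy`) loosely based on the printed samples, and only the key (Tonga, 2015) appears in both:

```r
library(dplyr)

# Hypothetical toy tables; only the key (Tonga, 2015) appears in both
ratio_toy <- tibble(
  country       = c("Tonga", "Bulgaria", "Honduras"),
  year          = c(2015, 2016, 2015),
  student_ratio = c(10.5, 11.8, 29.1)
)
freedom_toy <- tibble(
  country     = c("Tonga", "Tonga", "Mali"),
  year        = c(2005, 2015, 2009),
  Region_Name = c("Oceania", "Oceania", "Africa")
)

# inner_join keeps only rows whose (country, year) pair is present in both tables.
# Tonga 2005 does not match: both key columns must agree.
inner <- ratio_toy %>% inner_join(freedom_toy, by = c("country", "year"))
inner
```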
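Describe the resulting data: a tibble with all 50 rows of `ratio_small` and 4 variables. `Region_Name` is `NA` in every row.
How is it different from the original two datasets? A left join keeps every row of the left table (`ratio_small`) and adds the `Region_Name` column from `freedom_small`. Because no (`country`, `year`) pair matches, every `Region_Name` value is `NA`.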
ratio_small %>% left_join(freedom_small, by = c("country", "year"))
## # A tibble: 50 × 4
## country year student_ratio Region_Name
## <chr> <dbl> <dbl> <chr>
## 1 Singapore 2016 15.1 <NA>
## 2 Russian Federation 2015 20.1 <NA>
## 3 Small Island Developing States 2016 21.6 <NA>
## 4 Small Island Developing States 2014 23.0 <NA>
## 5 Bulgaria 2016 11.8 <NA>
## 6 Tonga 2015 10.5 <NA>
## 7 Honduras 2015 29.1 <NA>
## 8 South Africa 2015 30.3 <NA>
## 9 Asia (Central) 2013 17.4 <NA>
## 10 Solomon Islands 2013 33.8 <NA>
## # ℹ 40 more rows
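Here is the same behaviour on hypothetical toy tables where one key does match. A left join keeps every row of the left table and fills `Region_Name` with `NA` only where there is no match. This is a minimal sketch, assuming dplyr:

```r
library(dplyr)

# Hypothetical toy tables; only the key (Tonga, 2015) appears in both
ratio_toy <- tibble(
  country       = c("Tonga", "Bulgaria", "Honduras"),
  year          = c(2015, 2016, 2015),
  student_ratio = c(10.5, 11.8, 29.1)
)
freedom_toy <- tibble(
  country     = c("Tonga", "Tonga", "Mali"),
  year        = c(2005, 2015, 2009),
  Region_Name = c("Oceania", "Oceania", "Africa")
)

# All 3 rows of ratio_toy are kept, in their original order;
# Bulgaria and Honduras have no match, so their Region_Name is NA
left <- ratio_toy %>% left_join(freedom_toy, by = c("country", "year"))
left
```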
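Describe the resulting data: a tibble with all 50 rows of `freedom_small` and 4 variables. `student_ratio` is `NA` in every row.
How is it different from the original two datasets? A right join keeps every row of the right table (`freedom_small`) and adds `student_ratio` from `ratio_small`. Since no (`country`, `year`) pair matches, the `NA` values appear in `student_ratio`, not in `Region_Name`.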
ratio_small %>% right_join(freedom_small, by = c("country", "year"))
## # A tibble: 50 × 4
## country year student_ratio Region_Name
## <chr> <dbl> <dbl> <chr>
## 1 Congo 2010 NA Africa
## 2 Brazil 2019 NA Americas
## 3 Mali 2009 NA Africa
## 4 China 2018 NA Asia
## 5 Tonga 2005 NA Oceania
## 6 Montenegro 2015 NA Europe
## 7 Jamaica 2008 NA Americas
## 8 Nicaragua 2008 NA Americas
## 9 Mauritania 2012 NA Africa
## 10 Latvia 2002 NA Europe
## # ℹ 40 more rows
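A right join gives the same result as a left join with the tables swapped, up to column and row order. The sketch below checks this on hypothetical toy tables, assuming dplyr:

```r
library(dplyr)

# Hypothetical toy tables; only the key (Tonga, 2015) appears in both
ratio_toy <- tibble(
  country       = c("Tonga", "Bulgaria", "Honduras"),
  year          = c(2015, 2016, 2015),
  student_ratio = c(10.5, 11.8, 29.1)
)
freedom_toy <- tibble(
  country     = c("Tonga", "Tonga", "Mali"),
  year        = c(2005, 2015, 2009),
  Region_Name = c("Oceania", "Oceania", "Africa")
)

right   <- ratio_toy %>% right_join(freedom_toy, by = c("country", "year"))
swapped <- freedom_toy %>% left_join(ratio_toy, by = c("country", "year"))

# Align column order and sort rows before comparing the two results
right_sorted   <- right %>% arrange(country, year)
swapped_sorted <- swapped %>% select(all_of(names(right))) %>% arrange(country, year)

isTRUE(all.equal(as.data.frame(right_sorted), as.data.frame(swapped_sorted)))
```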
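Describe the resulting data: a tibble with 100 rows and 4 variables.
How is it different from the original two datasets? A full join keeps every row from both tables. Since no keys match, the result is the 50 rows of `ratio_small` (with `Region_Name` = `NA`) followed by the 50 rows of `freedom_small` (with `student_ratio` = `NA`), giving 50 + 50 = 100 rows.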
ratio_small %>% full_join(freedom_small, by = c("country", "year"))
## # A tibble: 100 × 4
## country year student_ratio Region_Name
## <chr> <dbl> <dbl> <chr>
## 1 Singapore 2016 15.1 <NA>
## 2 Russian Federation 2015 20.1 <NA>
## 3 Small Island Developing States 2016 21.6 <NA>
## 4 Small Island Developing States 2014 23.0 <NA>
## 5 Bulgaria 2016 11.8 <NA>
## 6 Tonga 2015 10.5 <NA>
## 7 Honduras 2015 29.1 <NA>
## 8 South Africa 2015 30.3 <NA>
## 9 Asia (Central) 2013 17.4 <NA>
## 10 Solomon Islands 2013 33.8 <NA>
## # ℹ 90 more rows
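When each key matches at most one row on each side, the full-join row count follows from the inner join: rows(full) = rows(x) + rows(y) - rows(inner). With the samples above that gives 50 + 50 - 0 = 100. Here is a sketch on hypothetical toy tables, assuming dplyr:

```r
library(dplyr)

# Hypothetical toy tables with unique keys; only (Tonga, 2015) appears in both
ratio_toy <- tibble(
  country       = c("Tonga", "Bulgaria", "Honduras"),
  year          = c(2015, 2016, 2015),
  student_ratio = c(10.5, 11.8, 29.1)
)
freedom_toy <- tibble(
  country     = c("Tonga", "Tonga", "Mali"),
  year        = c(2005, 2015, 2009),
  Region_Name = c("Oceania", "Oceania", "Africa")
)

full  <- ratio_toy %>% full_join(freedom_toy, by = c("country", "year"))
inner <- ratio_toy %>% inner_join(freedom_toy, by = c("country", "year"))

# Matched rows + unmatched x rows + unmatched y rows
expected_rows <- nrow(ratio_toy) + nrow(freedom_toy) - nrow(inner)
nrow(full) == expected_rows
```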
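Describe the resulting data: a tibble with 0 rows and the 3 original variables of `ratio_small`.
How is it different from the original two datasets? A semi join keeps the rows of `ratio_small` that have a match in `freedom_small` and does not add any columns from `freedom_small`. No (`country`, `year`) pair matches, so no rows are returned.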
ratio_small %>% semi_join(freedom_small, by = c("country", "year"))
## # A tibble: 0 × 3
## # ℹ 3 variables: country <chr>, year <dbl>, student_ratio <dbl>
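A semi join behaves like a filter. It keeps the rows of `x` that have a match in `y` and never adds `y`'s columns. The sketch below compares it with a manual filter on a pasted key, using hypothetical toy tables and assuming dplyr:

```r
library(dplyr)

# Hypothetical toy tables; only the key (Tonga, 2015) appears in both
ratio_toy <- tibble(
  country       = c("Tonga", "Bulgaria", "Honduras"),
  year          = c(2015, 2016, 2015),
  student_ratio = c(10.5, 11.8, 29.1)
)
freedom_toy <- tibble(
  country     = c("Tonga", "Tonga", "Mali"),
  year        = c(2005, 2015, 2009),
  Region_Name = c("Oceania", "Oceania", "Africa")
)

semi <- ratio_toy %>% semi_join(freedom_toy, by = c("country", "year"))

# Equivalent manual filter: paste the key columns into one string per row
keys_y <- paste(freedom_toy$country, freedom_toy$year)
manual <- ratio_toy %>% filter(paste(country, year) %in% keys_y)
semi
```

The pasted key is only for illustration and relies on whole-number years. `semi_join()` is the more robust tool.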
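Describe the resulting data: a tibble with the same contents as `ratio_small`: all 50 rows and 3 variables.
How is it different from the original two datasets? An anti join keeps the rows of `ratio_small` that have no match in `freedom_small`, again without adding columns. Since none of its rows match, all 50 are kept. It does not remove any variables or `NA` values.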
ratio_small %>% anti_join(freedom_small, by = c("country", "year"))
## # A tibble: 50 × 3
## country year student_ratio
## <chr> <dbl> <dbl>
## 1 Singapore 2016 15.1
## 2 Russian Federation 2015 20.1
## 3 Small Island Developing States 2016 21.6
## 4 Small Island Developing States 2014 23.0
## 5 Bulgaria 2016 11.8
## 6 Tonga 2015 10.5
## 7 Honduras 2015 29.1
## 8 South Africa 2015 30.3
## 9 Asia (Central) 2013 17.4
## 10 Solomon Islands 2013 33.8
## # ℹ 40 more rows
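Semi and anti joins are complements: every row of `x` falls into exactly one of them. This is why the 0-row semi join and the 50-row anti join above together account for all 50 rows of `ratio_small`. Here is a sketch on hypothetical toy tables, assuming dplyr:

```r
library(dplyr)

# Hypothetical toy tables; only the key (Tonga, 2015) appears in both
ratio_toy <- tibble(
  country       = c("Tonga", "Bulgaria", "Honduras"),
  year          = c(2015, 2016, 2015),
  student_ratio = c(10.5, 11.8, 29.1)
)
freedom_toy <- tibble(
  country     = c("Tonga", "Tonga", "Mali"),
  year        = c(2005, 2015, 2009),
  Region_Name = c("Oceania", "Oceania", "Africa")
)

semi <- ratio_toy %>% semi_join(freedom_toy, by = c("country", "year"))
anti <- ratio_toy %>% anti_join(freedom_toy, by = c("country", "year"))

# Each row of ratio_toy lands in exactly one of the two results
nrow(semi) + nrow(anti) == nrow(ratio_toy)
anti
```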