1. Import your data

Import two related datasets from the TidyTuesday project.

library(tidyverse)  # provides the %>% pipe, select(), sample_n() and the join functions used below

life_expectancy_different_ages <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2023/2023-12-05/life_expectancy_different_ages.csv')
## Rows: 20755 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Entity, Code
## dbl (7): Year, LifeExpectancy0, LifeExpectancy10, LifeExpectancy25, LifeExpe...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
life_expectancy_female_male <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2023/2023-12-05/life_expectancy_female_male.csv')
## Rows: 19922 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Entity, Code
## dbl (2): Year, LifeExpectancyDiffFM
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

2. Make data small

Describe the two datasets:

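Data 1 (life_expectancy_female_male): Difference in life expectancy between women and men, stored in LifeExpectancyDiffFM

Data 2 (life_expectancy_different_ages): Life expectancy at different ages; only LifeExpectancy0 (life expectancy at age 0) is kept below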

set.seed(1234)
life_expectancy_female_male_small <- life_expectancy_female_male %>%
    select(Entity, Year, LifeExpectancyDiffFM) %>%
    sample_n(200)

life_expectancy_different_ages_small <- life_expectancy_different_ages %>%
    select(Entity, Year, LifeExpectancy0) %>%
    sample_n(200)

life_expectancy_female_male_small
## # A tibble: 200 × 3
##    Entity                      Year LifeExpectancyDiffFM
##    <chr>                      <dbl>                <dbl>
##  1 High-income countries       1963                 5.85
##  2 Indonesia                   1983                 3.48
##  3 Guinea                      1961                 2.55
##  4 Iran                        1981                 8.39
##  5 Kuwait                      2002                 3.40
##  6 Antigua and Barbuda         1996                 5.50
##  7 Saint Martin (French part)  1971                 5.36
##  8 Mali                        1963                 1.41
##  9 Asia (UN)                   2019                 4.98
## 10 New Caledonia               2010                 8.27
## # ℹ 190 more rows
life_expectancy_different_ages_small
## # A tibble: 200 × 3
##    Entity                           Year LifeExpectancy0
##    <chr>                           <dbl>           <dbl>
##  1 Romania                          2005            70.6
##  2 Namibia                          1957            45.6
##  3 Belgium                          2008            79.6
##  4 Sweden                           1819            37.0
##  5 Australia                        2012            82.3
##  6 Wallis and Futuna                1978            57.6
##  7 Bonaire Sint Eustatius and Saba  1986            73.6
##  8 Luxembourg                       1946            61.6
##  9 Cameroon                         1956            38.3
## 10 New Caledonia                    1977            64.3
## # ℹ 190 more rows
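
sample_n() still works, but recent dplyr releases (1.0.0 and later) mark it as superseded in favour of slice_sample(). The following is a minimal sketch of the same seeded sampling step written with slice_sample(); the toy table and its values are made up for illustration and are not taken from the TidyTuesday files.

```r
library(dplyr)
library(tibble)

# Toy table (hypothetical values) standing in for the full dataset
toy <- tibble(
  Entity = rep(c("A", "B"), each = 5),
  Year = rep(2000:2004, times = 2),
  LifeExpectancy0 = seq(60, 69)
)

# Resetting the same seed before each draw reproduces the same sample
set.seed(1234)
first_draw <- toy %>% slice_sample(n = 4)

set.seed(1234)
second_draw <- toy %>% slice_sample(n = 4)

# Compare the two draws
identical(first_draw, second_draw)
```

Note that in the code above a single set.seed() call precedes both sample_n() calls, so the second sample depends on the first one having been drawn.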

3. inner_join

Describe the resulting data:

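How is it different from the original two datasets? The result keeps only rows whose Entity and Year appear in both samples. Only one such pair exists, Saint Martin (French part) in 1983, so the two 200-row samples shrink to a single row that carries both LifeExpectancyDiffFM and LifeExpectancy0.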

life_expectancy_female_male_small %>% inner_join(life_expectancy_different_ages_small)
## Joining with `by = join_by(Entity, Year)`
## # A tibble: 1 × 4
##   Entity                      Year LifeExpectancyDiffFM LifeExpectancy0
##   <chr>                      <dbl>                <dbl>           <dbl>
## 1 Saint Martin (French part)  1983                 6.68            72.6
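
The "Joining with" message shows that dplyr picked the keys automatically from the shared column names. A small sketch of the same inner join with the keys spelled out through the by argument, using two toy tables with made-up values (the names diff_fm and at_birth are illustrative only):

```r
library(dplyr)
library(tibble)

# Toy tables (hypothetical values, not from the TidyTuesday files)
diff_fm <- tibble(
  Entity = c("A", "A", "B"),
  Year = c(2000, 2001, 2000),
  LifeExpectancyDiffFM = c(5.1, 5.2, 3.9)
)
at_birth <- tibble(
  Entity = c("A", "B", "C"),
  Year = c(2001, 1999, 2000),
  LifeExpectancy0 = c(78.0, 61.5, 70.2)
)

# A row is kept only when BOTH Entity and Year match;
# here that is the single pair ("A", 2001)
matched <- inner_join(diff_fm, at_birth, by = c("Entity", "Year"))
matched
```

Entity "B" appears in both toy tables but with different years, so it is dropped, just as most entities are in the sampled data.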

4. left_join

Describe the resulting data:

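How is it different from the original two datasets? It keeps all 200 rows of life_expectancy_female_male_small and adds a LifeExpectancy0 column. Because only Saint Martin (French part) in 1983 has a match in the other sample, LifeExpectancy0 is NA in every other row.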

life_expectancy_female_male_small %>% left_join(life_expectancy_different_ages_small)
## Joining with `by = join_by(Entity, Year)`
## # A tibble: 200 × 4
##    Entity                      Year LifeExpectancyDiffFM LifeExpectancy0
##    <chr>                      <dbl>                <dbl>           <dbl>
##  1 High-income countries       1963                 5.85              NA
##  2 Indonesia                   1983                 3.48              NA
##  3 Guinea                      1961                 2.55              NA
##  4 Iran                        1981                 8.39              NA
##  5 Kuwait                      2002                 3.40              NA
##  6 Antigua and Barbuda         1996                 5.50              NA
##  7 Saint Martin (French part)  1971                 5.36              NA
##  8 Mali                        1963                 1.41              NA
##  9 Asia (UN)                   2019                 4.98              NA
## 10 New Caledonia               2010                 8.27              NA
## # ℹ 190 more rows
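
Since the printed output shows only the first ten rows, one way to confirm how many rows actually found a match is to count the non-missing values in the added column. A sketch on toy tables with made-up values (diff_fm and at_birth are illustrative names):

```r
library(dplyr)
library(tibble)

# Toy tables (hypothetical values, not from the TidyTuesday files)
diff_fm <- tibble(
  Entity = c("A", "A", "B"),
  Year = c(2000, 2001, 2000),
  LifeExpectancyDiffFM = c(5.1, 5.2, 3.9)
)
at_birth <- tibble(
  Entity = c("A", "B", "C"),
  Year = c(2001, 1999, 2000),
  LifeExpectancy0 = c(78.0, 61.5, 70.2)
)

# Every row of the left table is kept; unmatched rows get NA
joined <- left_join(diff_fm, at_birth, by = c("Entity", "Year"))

# Number of left rows that found a partner in the right table
n_matched <- sum(!is.na(joined$LifeExpectancy0))
n_matched
```

This count is only reliable when the right-hand column has no missing values of its own, which is an assumption here.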

5. right_join

Describe the resulting data:

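How is it different from the original two datasets? It keeps all 200 rows of life_expectancy_different_ages_small and adds a LifeExpectancyDiffFM column. LifeExpectancyDiffFM is NA everywhere except the one matching row, Saint Martin (French part) in 1983.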

life_expectancy_female_male_small %>% right_join(life_expectancy_different_ages_small)
## Joining with `by = join_by(Entity, Year)`
## # A tibble: 200 × 4
##    Entity                           Year LifeExpectancyDiffFM LifeExpectancy0
##    <chr>                           <dbl>                <dbl>           <dbl>
##  1 Saint Martin (French part)       1983                 6.68            72.6
##  2 Romania                          2005                NA               70.6
##  3 Namibia                          1957                NA               45.6
##  4 Belgium                          2008                NA               79.6
##  5 Sweden                           1819                NA               37.0
##  6 Australia                        2012                NA               82.3
##  7 Wallis and Futuna                1978                NA               57.6
##  8 Bonaire Sint Eustatius and Saba  1986                NA               73.6
##  9 Luxembourg                       1946                NA               61.6
## 10 Cameroon                         1956                NA               38.3
## # ℹ 190 more rows
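
A right join can be read as a left join with the two tables swapped: the same rows come back, and only the column order and row order may differ. A sketch of that equivalence on toy tables with made-up values (diff_fm and at_birth are illustrative names):

```r
library(dplyr)
library(tibble)

# Toy tables (hypothetical values, not from the TidyTuesday files)
diff_fm <- tibble(
  Entity = c("A", "A", "B"),
  Year = c(2000, 2001, 2000),
  LifeExpectancyDiffFM = c(5.1, 5.2, 3.9)
)
at_birth <- tibble(
  Entity = c("A", "B", "C"),
  Year = c(2001, 1999, 2000),
  LifeExpectancy0 = c(78.0, 61.5, 70.2)
)

via_right <- right_join(diff_fm, at_birth, by = c("Entity", "Year"))
via_left <- left_join(at_birth, diff_fm, by = c("Entity", "Year"))

# Align column order and row order before comparing
via_right_aligned <- via_right %>%
  select(all_of(names(via_left))) %>%
  arrange(Entity, Year)
via_left_aligned <- via_left %>% arrange(Entity, Year)

isTRUE(all.equal(via_right_aligned, via_left_aligned))
```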

6. full_join

Describe the resulting data:

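How is it different from the original two datasets? It keeps every row from both samples. The one matching row, Saint Martin (French part) in 1983, appears once with both values filled in, so the result has 399 rows (200 + 200 - 1). Every other row has NA in the column that comes from the other dataset.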

full_data <- life_expectancy_female_male_small %>% full_join(life_expectancy_different_ages_small)
## Joining with `by = join_by(Entity, Year)`
view(full_data)
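
Because view() opens an interactive viewer rather than printing, the size of the full join is easier to check with nrow(). When each Entity and Year pair occurs at most once per table, the row count should equal the sum of both tables minus the number of matches. A sketch on toy tables with made-up values (diff_fm and at_birth are illustrative names):

```r
library(dplyr)
library(tibble)

# Toy tables (hypothetical values, not from the TidyTuesday files)
diff_fm <- tibble(
  Entity = c("A", "A", "B"),
  Year = c(2000, 2001, 2000),
  LifeExpectancyDiffFM = c(5.1, 5.2, 3.9)
)
at_birth <- tibble(
  Entity = c("A", "B", "C"),
  Year = c(2001, 1999, 2000),
  LifeExpectancy0 = c(78.0, 61.5, 70.2)
)

full <- full_join(diff_fm, at_birth, by = c("Entity", "Year"))
n_matches <- nrow(inner_join(diff_fm, at_birth, by = c("Entity", "Year")))

# Matched rows are counted once, not twice
expected_rows <- nrow(diff_fm) + nrow(at_birth) - n_matches
nrow(full) == expected_rows
```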

7. semi_join

Describe the resulting data:

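How is it different from the original two datasets? It contains the same single row as the inner join, Saint Martin (French part) in 1983, but only the columns of life_expectancy_female_male_small. semi_join filters the first dataset and never adds columns, so LifeExpectancy0 is absent.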

life_expectancy_female_male_small %>% semi_join(life_expectancy_different_ages_small)
## Joining with `by = join_by(Entity, Year)`
## # A tibble: 1 × 3
##   Entity                      Year LifeExpectancyDiffFM
##   <chr>                      <dbl>                <dbl>
## 1 Saint Martin (French part)  1983                 6.68

8. anti_join

Describe the resulting data:

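How is it different from the original two datasets? It is the complement of the semi join: it keeps the 199 rows of life_expectancy_female_male_small that have no match in the other sample, dropping only Saint Martin (French part) in 1983. Saint Martin (French part) in 1971 stays because its Year differs. As with semi_join, no LifeExpectancy0 column is added.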

life_expectancy_female_male_small %>% anti_join(life_expectancy_different_ages_small)
## Joining with `by = join_by(Entity, Year)`
## # A tibble: 199 × 3
##    Entity                      Year LifeExpectancyDiffFM
##    <chr>                      <dbl>                <dbl>
##  1 High-income countries       1963                 5.85
##  2 Indonesia                   1983                 3.48
##  3 Guinea                      1961                 2.55
##  4 Iran                        1981                 8.39
##  5 Kuwait                      2002                 3.40
##  6 Antigua and Barbuda         1996                 5.50
##  7 Saint Martin (French part)  1971                 5.36
##  8 Mali                        1963                 1.41
##  9 Asia (UN)                   2019                 4.98
## 10 New Caledonia               2010                 8.27
## # ℹ 189 more rows
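
Taken together, semi_join and anti_join split the first table into rows with a match and rows without one, which is why the row counts above add up to 200 (1 + 199). A sketch of that partition on toy tables with made-up values (diff_fm and at_birth are illustrative names):

```r
library(dplyr)
library(tibble)

# Toy tables (hypothetical values, not from the TidyTuesday files)
diff_fm <- tibble(
  Entity = c("A", "A", "B"),
  Year = c(2000, 2001, 2000),
  LifeExpectancyDiffFM = c(5.1, 5.2, 3.9)
)
at_birth <- tibble(
  Entity = c("A", "B", "C"),
  Year = c(2001, 1999, 2000),
  LifeExpectancy0 = c(78.0, 61.5, 70.2)
)

with_match <- semi_join(diff_fm, at_birth, by = c("Entity", "Year"))
without_match <- anti_join(diff_fm, at_birth, by = c("Entity", "Year"))

# Stacking the two pieces recovers the original left table
recombined <- bind_rows(with_match, without_match) %>% arrange(Entity, Year)
original_sorted <- diff_fm %>% arrange(Entity, Year)

isTRUE(all.equal(recombined, original_sorted))
```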