Import two related datasets from the TidyTuesday project (week of 2023-12-05).
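library(tidyverse)  # attaches readr, dplyr (%>%, select(), sample_n(), the *_join() verbs) and tibble (view())
life_expectancy_different_ages <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2023/2023-12-05/life_expectancy_different_ages.csv')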
## Rows: 20755 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Entity, Code
## dbl (7): Year, LifeExpectancy0, LifeExpectancy10, LifeExpectancy25, LifeExpe...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
life_expectancy_female_male <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2023/2023-12-05/life_expectancy_female_male.csv')
## Rows: 19922 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Entity, Code
## dbl (2): Year, LifeExpectancyDiffFM
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Describe the two datasets:
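Data 1 (life_expectancy_female_male): the difference between female and male life expectancy (LifeExpectancyDiffFM) for each Entity (a country, region, or country group) and Year.
Data 2 (life_expectancy_different_ages): life expectancy at different ages, such as at birth (LifeExpectancy0), at age 10, and at age 25, for each Entity and Year.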
set.seed(1234)  # make the random samples reproducible
# keep the join keys (Entity, Year) plus one measure per dataset, then draw 200 random rows
life_expectancy_female_male_small <- life_expectancy_female_male %>% select(Entity, Year, LifeExpectancyDiffFM) %>% sample_n(200)
life_expectancy_different_ages_small <- life_expectancy_different_ages %>% select(Entity, Year, LifeExpectancy0) %>% sample_n(200)
life_expectancy_female_male_small
## # A tibble: 200 × 3
## Entity Year LifeExpectancyDiffFM
## <chr> <dbl> <dbl>
## 1 High-income countries 1963 5.85
## 2 Indonesia 1983 3.48
## 3 Guinea 1961 2.55
## 4 Iran 1981 8.39
## 5 Kuwait 2002 3.40
## 6 Antigua and Barbuda 1996 5.50
## 7 Saint Martin (French part) 1971 5.36
## 8 Mali 1963 1.41
## 9 Asia (UN) 2019 4.98
## 10 New Caledonia 2010 8.27
## # ℹ 190 more rows
life_expectancy_different_ages_small
## # A tibble: 200 × 3
## Entity Year LifeExpectancy0
## <chr> <dbl> <dbl>
## 1 Romania 2005 70.6
## 2 Namibia 1957 45.6
## 3 Belgium 2008 79.6
## 4 Sweden 1819 37.0
## 5 Australia 2012 82.3
## 6 Wallis and Futuna 1978 57.6
## 7 Bonaire Sint Eustatius and Saba 1986 73.6
## 8 Luxembourg 1946 61.6
## 9 Cameroon 1956 38.3
## 10 New Caledonia 1977 64.3
## # ℹ 190 more rows
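The joins below match rows on Entity and Year. Before joining, it is worth checking that this pair is unique in each table and counting how many pairs the two tables share. The sketch below does this on two small hypothetical tibbles (gap and ages, with made-up values standing in for the samples above) and assumes dplyr 1.1.0 or later for join_by().

```r
library(dplyr)

# Hypothetical toy tables (made-up values), shaped like the two 200-row samples
gap <- tibble(
  Entity = c("A", "B", "C"),
  Year = c(2000, 2000, 2001),
  LifeExpectancyDiffFM = c(5.1, 4.2, 3.3)
)
ages <- tibble(
  Entity = c("A", "B", "D"),
  Year = c(2000, 2005, 2001),
  LifeExpectancy0 = c(70.0, 65.5, 60.2)
)

# Keys that occur more than once; zero rows means (Entity, Year) identifies each row
dup_gap <- gap %>% count(Entity, Year) %>% filter(n > 1)
dup_ages <- ages %>% count(Entity, Year) %>% filter(n > 1)

# Key pairs present in both tables; "B" is in both, but in different years, so it is not shared
shared_keys <- gap %>%
  distinct(Entity, Year) %>%
  inner_join(ages %>% distinct(Entity, Year), by = join_by(Entity, Year))
nrow(shared_keys)
```

Applied to the real samples, the same check explains why the joins below find only one match: two independent random draws of 200 rows share few Entity and Year pairs.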
Describe the resulting data:
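How is it different from the original two datasets? Only one combination of Entity and Year appears in both samples: Saint Martin (French part) in 1983. An inner join keeps only matching rows, so we go from two 200-row samples to a single row, which now carries both LifeExpectancyDiffFM and LifeExpectancy0. Some entities, such as New Caledonia, appear in both samples but in different years, so they do not match.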
life_expectancy_female_male_small %>% inner_join(life_expectancy_different_ages_small)
## Joining with `by = join_by(Entity, Year)`
## # A tibble: 1 × 4
## Entity Year LifeExpectancyDiffFM LifeExpectancy0
## <chr> <dbl> <dbl> <dbl>
## 1 Saint Martin (French part) 1983 6.68 72.6
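An inner join keeps only the rows whose Entity and Year both match. A minimal sketch on hypothetical toy tibbles (made-up values, not the TidyTuesday data), assuming dplyr 1.1.0 or later; passing by = join_by(Entity, Year) explicitly also silences the "Joining with" message.

```r
library(dplyr)

# Hypothetical toy tables (made-up values); "B" appears in both, but in different years
gap <- tibble(Entity = c("A", "B", "C"), Year = c(2000, 2000, 2001),
              LifeExpectancyDiffFM = c(5.1, 4.2, 3.3))
ages <- tibble(Entity = c("A", "B", "D"), Year = c(2000, 2005, 2001),
               LifeExpectancy0 = c(70.0, 65.5, 60.2))

# Only ("A", 2000) matches on both keys, so one row with all four columns comes back
inner <- gap %>% inner_join(ages, by = join_by(Entity, Year))
inner
```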
Describe the resulting data:
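How is it different from the original two datasets? It keeps all 200 rows of the first dataset (life_expectancy_female_male_small) and adds a fourth column, LifeExpectancy0, from the second. Because only Saint Martin (French part) in 1983 has a match, LifeExpectancy0 is NA in the other 199 rows.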
life_expectancy_female_male_small %>% left_join(life_expectancy_different_ages_small)
## Joining with `by = join_by(Entity, Year)`
## # A tibble: 200 × 4
## Entity Year LifeExpectancyDiffFM LifeExpectancy0
## <chr> <dbl> <dbl> <dbl>
## 1 High-income countries 1963 5.85 NA
## 2 Indonesia 1983 3.48 NA
## 3 Guinea 1961 2.55 NA
## 4 Iran 1981 8.39 NA
## 5 Kuwait 2002 3.40 NA
## 6 Antigua and Barbuda 1996 5.50 NA
## 7 Saint Martin (French part) 1971 5.36 NA
## 8 Mali 1963 1.41 NA
## 9 Asia (UN) 2019 4.98 NA
## 10 New Caledonia 2010 8.27 NA
## # ℹ 190 more rows
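A left join keeps every row of the first table and fills the second table's columns with NA where no match exists. A minimal sketch on hypothetical toy tibbles (made-up values), assuming dplyr 1.1.0 or later:

```r
library(dplyr)

# Hypothetical toy tables (made-up values)
gap <- tibble(Entity = c("A", "B", "C"), Year = c(2000, 2000, 2001),
              LifeExpectancyDiffFM = c(5.1, 4.2, 3.3))
ages <- tibble(Entity = c("A", "B", "D"), Year = c(2000, 2005, 2001),
               LifeExpectancy0 = c(70.0, 65.5, 60.2))

# Every row of gap is kept in its original order; rows without a partner get NA
left <- gap %>% left_join(ages, by = join_by(Entity, Year))
left
sum(is.na(left$LifeExpectancy0))  # number of gap rows with no match in ages
```

Counting the NAs this way on the real data is a quick check that 199 of the 200 rows found no match.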
Describe the resulting data:
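How is it different from the original two datasets? It keeps all 200 rows of the second dataset (life_expectancy_different_ages_small) and adds the LifeExpectancyDiffFM column from the first. That column is NA everywhere except the one matching row, Saint Martin (French part) in 1983, which appears first in the output.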
life_expectancy_female_male_small %>% right_join(life_expectancy_different_ages_small)
## Joining with `by = join_by(Entity, Year)`
## # A tibble: 200 × 4
## Entity Year LifeExpectancyDiffFM LifeExpectancy0
## <chr> <dbl> <dbl> <dbl>
## 1 Saint Martin (French part) 1983 6.68 72.6
## 2 Romania 2005 NA 70.6
## 3 Namibia 1957 NA 45.6
## 4 Belgium 2008 NA 79.6
## 5 Sweden 1819 NA 37.0
## 6 Australia 2012 NA 82.3
## 7 Wallis and Futuna 1978 NA 57.6
## 8 Bonaire Sint Eustatius and Saba 1986 NA 73.6
## 9 Luxembourg 1946 NA 61.6
## 10 Cameroon 1956 NA 38.3
## # ℹ 190 more rows
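A right join is the mirror image of a left join: right_join(x, y) returns the same rows as left_join(y, x), only with columns and rows in a different order. A sketch that checks this on hypothetical toy tibbles (made-up values), assuming dplyr 1.1.0 or later:

```r
library(dplyr)

# Hypothetical toy tables (made-up values)
gap <- tibble(Entity = c("A", "B", "C"), Year = c(2000, 2000, 2001),
              LifeExpectancyDiffFM = c(5.1, 4.2, 3.3))
ages <- tibble(Entity = c("A", "B", "D"), Year = c(2000, 2005, 2001),
               LifeExpectancy0 = c(70.0, 65.5, 60.2))

# All rows of ages are kept: matched rows first (in gap's order), then unmatched rows of ages
right <- gap %>% right_join(ages, by = join_by(Entity, Year))
# The same join written as a left join with the tables swapped
swapped <- ages %>% left_join(gap, by = join_by(Entity, Year))

# Put both results in a common column and row order before comparing them
tidy_up <- function(df) {
  df %>%
    select(Entity, Year, LifeExpectancyDiffFM, LifeExpectancy0) %>%
    arrange(Entity, Year)
}
isTRUE(all.equal(tidy_up(right), tidy_up(swapped)))
```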
Describe the resulting data:
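How is it different from the original two datasets? It contains every row from both datasets: 200 + 200 - 1 = 399 rows, because the one matching row (Saint Martin (French part), 1983) is merged into a single row instead of appearing twice. It has all four columns, with NA wherever a row came from only one of the datasets.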
full_data <- life_expectancy_female_male_small %>% full_join(life_expectancy_different_ages_small)
## Joining with `by = join_by(Entity, Year)`
view(full_data)  # opens the RStudio data viewer; prints nothing in the knitted output
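A full join keeps every row from both tables. When the keys are unique in each table, its row count is nrow(x) + nrow(y) minus the number of matches, which is where 200 + 200 - 1 = 399 comes from for the samples. A sketch checking that identity on hypothetical toy tibbles (made-up values), assuming dplyr 1.1.0 or later:

```r
library(dplyr)

# Hypothetical toy tables (made-up values)
gap <- tibble(Entity = c("A", "B", "C"), Year = c(2000, 2000, 2001),
              LifeExpectancyDiffFM = c(5.1, 4.2, 3.3))
ages <- tibble(Entity = c("A", "B", "D"), Year = c(2000, 2005, 2001),
               LifeExpectancy0 = c(70.0, 65.5, 60.2))

full <- gap %>% full_join(ages, by = join_by(Entity, Year))
n_matched <- nrow(gap %>% inner_join(ages, by = join_by(Entity, Year)))

# With unique keys on both sides: rows of gap + rows of ages - rows that would be counted twice
nrow(full) == nrow(gap) + nrow(ages) - n_matched
```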
Describe the resulting data:
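How is it different from the original two datasets? It keeps the same single row as the inner join (Saint Martin (French part), 1983), but only the columns of the first dataset. semi_join() filters the rows of the first dataset that have a match in the second without adding any of the second dataset's columns, so LifeExpectancy0 is not included.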
life_expectancy_female_male_small %>% semi_join(life_expectancy_different_ages_small)
## Joining with `by = join_by(Entity, Year)`
## # A tibble: 1 × 3
## Entity Year LifeExpectancyDiffFM
## <chr> <dbl> <dbl>
## 1 Saint Martin (French part) 1983 6.68
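A semi join is a filtering join: it returns the rows of x that have at least one match in y, never adds y's columns, and never duplicates x's rows. The difference from an inner join only shows up when y repeats a key, so the hypothetical toy data below deliberately gives ("A", 2000) two rows in y (made-up values). Assumes dplyr 1.1.0 or later; on 1.1.0 exactly, inner_join() may also emit a warning about multiple matches.

```r
library(dplyr)

# Hypothetical toy tables (made-up values)
gap <- tibble(Entity = c("A", "B", "C"), Year = c(2000, 2000, 2001),
              LifeExpectancyDiffFM = c(5.1, 4.2, 3.3))
# Hypothetical y table in which the key ("A", 2000) appears twice
ages_dup <- tibble(Entity = c("A", "A", "D"), Year = c(2000, 2000, 2001),
                   LifeExpectancy0 = c(70.0, 70.4, 60.2))

# inner_join() returns one row per matching pair, so gap's "A" row appears twice
inner <- gap %>% inner_join(ages_dup, by = join_by(Entity, Year))
# semi_join() only asks whether a match exists, so "A" appears once, with gap's columns only
semi <- gap %>% semi_join(ages_dup, by = join_by(Entity, Year))
nrow(inner)
nrow(semi)
```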
Describe the resulting data:
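How is it different from the original two datasets? It keeps the 199 rows of the first dataset that have no match in the second, so only the Saint Martin (French part) 1983 row is dropped; the 1971 Saint Martin row remains because its Year does not match. Like semi_join(), it returns only the first dataset's columns, so LifeExpectancy0 is not added.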
life_expectancy_female_male_small %>% anti_join(life_expectancy_different_ages_small)
## Joining with `by = join_by(Entity, Year)`
## # A tibble: 199 × 3
## Entity Year LifeExpectancyDiffFM
## <chr> <dbl> <dbl>
## 1 High-income countries 1963 5.85
## 2 Indonesia 1983 3.48
## 3 Guinea 1961 2.55
## 4 Iran 1981 8.39
## 5 Kuwait 2002 3.40
## 6 Antigua and Barbuda 1996 5.50
## 7 Saint Martin (French part) 1971 5.36
## 8 Mali 1963 1.41
## 9 Asia (UN) 2019 4.98
## 10 New Caledonia 2010 8.27
## # ℹ 189 more rows
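semi_join() and anti_join() split x into two complementary parts, the rows with a match in y and the rows without one, so their row counts add up to nrow(x), just as 1 + 199 = 200 above. A sketch on hypothetical toy tibbles (made-up values), assuming dplyr 1.1.0 or later:

```r
library(dplyr)

# Hypothetical toy tables (made-up values)
gap <- tibble(Entity = c("A", "B", "C"), Year = c(2000, 2000, 2001),
              LifeExpectancyDiffFM = c(5.1, 4.2, 3.3))
ages <- tibble(Entity = c("A", "B", "D"), Year = c(2000, 2005, 2001),
               LifeExpectancy0 = c(70.0, 65.5, 60.2))

matched <- gap %>% semi_join(ages, by = join_by(Entity, Year))    # rows of gap with a partner in ages
unmatched <- gap %>% anti_join(ages, by = join_by(Entity, Year))  # rows of gap without one

# Every row of gap lands in exactly one of the two results
nrow(matched) + nrow(unmatched) == nrow(gap)
unmatched
```

Listing the unmatched rows this way is also a handy check before trusting a join, since it shows exactly which keys failed to line up.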