Import two related datasets from the TidyTuesday project.
forest <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2021/2021-04-06/forest.csv')
## Rows: 475 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): entity, code
## dbl (2): year, net_forest_conversion
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
forest_area <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2021/2021-04-06/forest_area.csv')
## Rows: 7846 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): entity, code
## dbl (2): year, forest_area
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
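Describe the two datasets:
Data 1: Forest (forest). 475 rows and 4 columns: entity, code, year, net_forest_conversion.
Data 2: Forest Area (forest_area). 7846 rows and 4 columns: entity, code, year, forest_area.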
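library(dplyr) # provides %>%, select(), sample_n() and the join functions
set.seed(1234) # make the random samples reproducible
# Keep the join keys plus one value column, then draw 10 random rows from each dataset
forest_small <- forest %>% select(entity, year, net_forest_conversion) %>% sample_n(10)
forest_area_small <- forest_area %>% select(entity, year, forest_area) %>% sample_n(10)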
forest_small
## # A tibble: 10 × 3
## entity year net_forest_conversion
## <chr> <dbl> <dbl>
## 1 Morocco 2000 16800
## 2 Papua New Guinea 2000 -9910
## 3 Switzerland 2000 3850
## 4 Cuba 1990 37700
## 5 Djibouti 2010 0
## 6 Spain 2000 145140
## 7 Falkland Islands 1990 0
## 8 Togo 2010 -2960
## 9 Suriname 2015 -11080
## 10 South Africa 2015 -36400
forest_area_small
## # A tibble: 10 × 3
## entity year forest_area
## <chr> <dbl> <dbl>
## 1 El Salvador 2011 0.0152
## 2 Indonesia 1996 2.58
## 3 Greenland 1998 0.00000527
## 4 Romania 2011 0.161
## 5 Faroe Islands 1999 0.00000192
## 6 Burundi 2015 0.00685
## 7 Malawi 2017 0.0581
## 8 Trinidad and Tobago 2002 0.00569
## 9 Micronesia 1997 0.00153
## 10 Jordan 1996 0.00233
forest_small %>% inner_join(forest_area_small, by = c("entity", "year"))
## # A tibble: 0 × 4
## # ℹ 4 variables: entity <chr>, year <dbl>, net_forest_conversion <dbl>,
## # forest_area <dbl>
Describe the resulting data:
How is it different from the original two datasets?
Keeps only the rows whose entity and year match in both datasets, and carries the value columns from both. The two 10-row samples were drawn independently, so no entity-year pair appears in both, and the result has 0 rows.
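Because the sampled data happens to contain no matches, the following is a minimal sketch of what a matching row looks like. The tables conv_toy and area_toy are hypothetical, with made-up values that are not taken from the TidyTuesday data.

```r
library(dplyr)

# Hypothetical toy tables (made-up values, for illustration only)
conv_toy <- tibble(
  entity = c("A", "B", "C"),
  year = c(2000, 2000, 2010),
  net_forest_conversion = c(100, -50, 0)
)
area_toy <- tibble(
  entity = c("A", "C", "D"),
  year = c(2000, 2015, 2010),
  forest_area = c(1.5, 0.2, 0.7)
)

# Only (A, 2000) has the same entity AND year in both tables.
# C appears in both tables but with different years, so it is dropped.
inner_toy <- inner_join(conv_toy, area_toy, by = c("entity", "year"))
inner_toy
```

The single surviving row carries both net_forest_conversion and forest_area, which is what the 0-row result above would contain if any sampled keys had matched.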
forest_small %>% left_join(forest_area_small, by = c("entity", "year"))
## # A tibble: 10 × 4
## entity year net_forest_conversion forest_area
## <chr> <dbl> <dbl> <dbl>
## 1 Morocco 2000 16800 NA
## 2 Papua New Guinea 2000 -9910 NA
## 3 Switzerland 2000 3850 NA
## 4 Cuba 1990 37700 NA
## 5 Djibouti 2010 0 NA
## 6 Spain 2000 145140 NA
## 7 Falkland Islands 1990 0 NA
## 8 Togo 2010 -2960 NA
## 9 Suriname 2015 -11080 NA
## 10 South Africa 2015 -36400 NA
Describe the resulting data:
How is it different from the original two datasets?
Keeps all rows from the first dataset (forest_small) and adds matching values from the second dataset (forest_area_small). Missing matches are filled with NA. In this case, the forest_area column exists but contains only NA values because there are no matching rows.
forest_small %>%
right_join(forest_area_small, by = c("entity", "year"))
## # A tibble: 10 × 4
## entity year net_forest_conversion forest_area
## <chr> <dbl> <dbl> <dbl>
## 1 El Salvador 2011 NA 0.0152
## 2 Indonesia 1996 NA 2.58
## 3 Greenland 1998 NA 0.00000527
## 4 Romania 2011 NA 0.161
## 5 Faroe Islands 1999 NA 0.00000192
## 6 Burundi 2015 NA 0.00685
## 7 Malawi 2017 NA 0.0581
## 8 Trinidad and Tobago 2002 NA 0.00569
## 9 Micronesia 1997 NA 0.00153
## 10 Jordan 1996 NA 0.00233
Describe the resulting data:
How is it different from the original two datasets?
Keeps all rows from the second dataset (forest_area_small) and adds matching values from the first dataset (forest_small). In this case, there are no matches, so the net_forest_conversion column contains only NA values.
full_join(forest_small, forest_area_small, by = c("entity", "year"))
## # A tibble: 20 × 4
## entity year net_forest_conversion forest_area
## <chr> <dbl> <dbl> <dbl>
## 1 Morocco 2000 16800 NA
## 2 Papua New Guinea 2000 -9910 NA
## 3 Switzerland 2000 3850 NA
## 4 Cuba 1990 37700 NA
## 5 Djibouti 2010 0 NA
## 6 Spain 2000 145140 NA
## 7 Falkland Islands 1990 0 NA
## 8 Togo 2010 -2960 NA
## 9 Suriname 2015 -11080 NA
## 10 South Africa 2015 -36400 NA
## 11 El Salvador 2011 NA 0.0152
## 12 Indonesia 1996 NA 2.58
## 13 Greenland 1998 NA 0.00000527
## 14 Romania 2011 NA 0.161
## 15 Faroe Islands 1999 NA 0.00000192
## 16 Burundi 2015 NA 0.00685
## 17 Malawi 2017 NA 0.0581
## 18 Trinidad and Tobago 2002 NA 0.00569
## 19 Micronesia 1997 NA 0.00153
## 20 Jordan 1996 NA 0.00233
Describe the resulting data:
How is it different from the original two datasets?
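Combines all rows from both datasets. Unmatched rows from either dataset are included, with NA in the columns that come from the other dataset. In this case, there are no matches, so the result has 20 rows: the 10 rows of forest_small followed by the 10 rows of forest_area_small.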
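The row counts of the outer joins are easier to compare when some keys do match. A small sketch, assuming two hypothetical toy tables (conv_toy and area_toy, with made-up values) that share exactly one entity-year pair:

```r
library(dplyr)

# Hypothetical toy tables (made-up values, for illustration only)
conv_toy <- tibble(
  entity = c("A", "B", "C"),
  year = c(2000, 2000, 2010),
  net_forest_conversion = c(100, -50, 0)
)
area_toy <- tibble(
  entity = c("A", "C", "D"),
  year = c(2000, 2015, 2010),
  forest_area = c(1.5, 0.2, 0.7)
)

keys <- c("entity", "year")
left_toy <- left_join(conv_toy, area_toy, by = keys)   # every row of conv_toy
right_toy <- right_join(conv_toy, area_toy, by = keys) # every row of area_toy
full_toy <- full_join(conv_toy, area_toy, by = keys)   # every key from either table

# Row counts: the one shared key, (A, 2000), is counted once in the full join
c(left = nrow(left_toy), right = nrow(right_toy), full = nrow(full_toy))
```

With unique keys, the full join has as many rows as the two tables combined minus the number of matched keys, which is why the zero-match case above gives 10 + 10 = 20 rows.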
semi_join(forest_small, forest_area_small, by = c("entity", "year"))
## # A tibble: 0 × 3
## # ℹ 3 variables: entity <chr>, year <dbl>, net_forest_conversion <dbl>
Describe the resulting data:
How is it different from the original two datasets?
Returns only rows from the first dataset that have matches in the second dataset, without adding any columns from the second. Here there are no matches, so the result has 0 rows.
anti_join(forest_small, forest_area_small, by = c("entity", "year"))
## # A tibble: 10 × 3
## entity year net_forest_conversion
## <chr> <dbl> <dbl>
## 1 Morocco 2000 16800
## 2 Papua New Guinea 2000 -9910
## 3 Switzerland 2000 3850
## 4 Cuba 1990 37700
## 5 Djibouti 2010 0
## 6 Spain 2000 145140
## 7 Falkland Islands 1990 0
## 8 Togo 2010 -2960
## 9 Suriname 2015 -11080
## 10 South Africa 2015 -36400
Describe the resulting data:
How is it different from the original two datasets?
Returns rows from the first dataset that do not have matches in the second dataset. Since nothing matches here, all 10 rows of forest_small are returned unchanged.
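The two filtering joins split the first table into matched and unmatched rows. A brief sketch on hypothetical toy tables (conv_toy and area_toy, with made-up values) shows that split:

```r
library(dplyr)

# Hypothetical toy tables (made-up values, for illustration only)
conv_toy <- tibble(
  entity = c("A", "B", "C"),
  year = c(2000, 2000, 2010),
  net_forest_conversion = c(100, -50, 0)
)
area_toy <- tibble(
  entity = c("A", "C", "D"),
  year = c(2000, 2015, 2010),
  forest_area = c(1.5, 0.2, 0.7)
)

keys <- c("entity", "year")
semi_toy <- semi_join(conv_toy, area_toy, by = keys) # rows of conv_toy with a match
anti_toy <- anti_join(conv_toy, area_toy, by = keys) # rows of conv_toy without a match

# Every row of conv_toy lands in exactly one of the two results,
# and neither result gains the forest_area column.
nrow(semi_toy) + nrow(anti_toy) == nrow(conv_toy)
names(semi_toy)
```

This mirrors the results above: with no matching keys, semi_join returns 0 rows and anti_join returns all 10.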