Import two related datasets from the TidyTuesday project.
ocean_temperature <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-03-31/ocean_temperature.csv')
## Rows: 19165 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (4): sensor_depth_at_low_tide_m, mean_temperature_degree_c, sd_temperat...
## date (1): date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
ocean_temperature_deployments <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-03-31/ocean_temperature_deployments.csv')
## Rows: 14 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): deployment_id
## dbl (2): latitude, longitude
## date (2): start_date, end_date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
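The message above suggests either supplying column types or setting show_col_types = FALSE. As a minimal sketch, assuming readr 2.0 or later (needed for literal input through I()) and a made-up two-row CSV string in place of the real file, the column types can be declared up front so the parse is explicit and the message is not printed:

```r
library(readr)

# Hypothetical inline CSV standing in for the remote file
csv_text <- "date,sensor_depth_at_low_tide_m\n2018-02-20,2\n2018-02-21,5\n"

# Declaring col_types fixes the parse and suppresses the spec message
toy <- read_csv(
  I(csv_text),
  col_types = cols(
    date = col_date(),
    sensor_depth_at_low_tide_m = col_double()
  )
)

str(toy)
```

The same col_types argument can be passed to the two read_csv() calls above; the column names would then need to match the real files.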
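Describe the two datasets:
Data 1: ocean_temperature has 19,165 rows and 5 columns (date, sensor_depth_at_low_tide_m, mean_temperature_degree_c, sd_temperature_degree_c, n_obs).
Data 2: ocean_temperature_deployments has 14 rows and 5 columns (deployment_id, latitude, longitude, start_date, end_date).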
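library(dplyr)  # provides %>% and the *_join() verbs used below

# Keep small subsets so the join results are easy to inspect
data1 <- ocean_temperature %>%
  head(10)
data2 <- ocean_temperature_deployments %>%
  head(5)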
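Describe the resulting data: inner_join() returns 7 rows and 9 columns. It keeps only the data1 rows whose date equals a start_date in data2 (all on 2018-02-20) and appends the matching deployment's columns.
How is it different from the original two datasets? The three 2018-02-21 readings and the four deployments that start on other dates are dropped, and start_date is merged into date.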
inner_join(data1, data2, by = c("date" = "start_date"))
## # A tibble: 7 × 9
##   date       sensor_depth_at_low_tide_m mean_temperature_degree_c
##   <date>                          <dbl>                     <dbl>
## 1 2018-02-20                          2                      1.58
## 2 2018-02-20                          5                      1.50
## 3 2018-02-20                         10                      1.49
## 4 2018-02-20                         15                      1.63
## 5 2018-02-20                         20                      1.84
## 6 2018-02-20                         30                      1.92
## 7 2018-02-20                         40                      1.82
## # ℹ 6 more variables: sd_temperature_degree_c <dbl>, n_obs <dbl>,
## #   deployment_id <chr>, end_date <date>, latitude <dbl>, longitude <dbl>
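To see why only the matching rows survive, the pattern can be reproduced on toy tables. This is a sketch that assumes dplyr is installed; readings, deployments, and their values are hypothetical stand-ins, not the TidyTuesday data.

```r
library(dplyr)

# Hypothetical toy tables (not the TidyTuesday data)
readings <- tibble(
  date = as.Date(c("2018-02-20", "2018-02-20", "2018-02-21")),
  depth_m = c(2, 5, 2)
)
deployments <- tibble(
  deployment_id = c("A", "B"),
  start_date = as.Date(c("2018-02-20", "2018-04-25"))
)

# Only dates present in both tables are kept; deployment "A" is
# repeated once for each reading taken on its start date
inner <- inner_join(readings, deployments, by = c("date" = "start_date"))
nrow(inner)   # 2
names(inner)  # "date" "depth_m" "deployment_id"
```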
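Describe the resulting data: left_join() returns 10 rows and 9 columns. Every row of data1 is kept; the seven 2018-02-20 rows gain the matching deployment's columns.
How is it different from the original two datasets? It has the same rows as data1 plus four columns from data2. The three 2018-02-21 rows have no matching start_date, so their deployment columns are NA, and the four unmatched deployments do not appear.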
left_join(data1, data2, by = c("date" = "start_date"))
## # A tibble: 10 × 9
##    date       sensor_depth_at_low_tide_m mean_temperature_degree_c
##    <date>                          <dbl>                     <dbl>
##  1 2018-02-20                          2                      1.58
##  2 2018-02-20                          5                      1.50
##  3 2018-02-20                         10                      1.49
##  4 2018-02-20                         15                      1.63
##  5 2018-02-20                         20                      1.84
##  6 2018-02-20                         30                      1.92
##  7 2018-02-20                         40                      1.82
##  8 2018-02-21                          2                      1.72
##  9 2018-02-21                          5                      1.46
## 10 2018-02-21                         10                      1.42
## # ℹ 6 more variables: sd_temperature_degree_c <dbl>, n_obs <dbl>,
## #   deployment_id <chr>, end_date <date>, latitude <dbl>, longitude <dbl>
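Because the printed tibble hides the deployment columns, the unmatched rows are not visible in the output above. One way to surface them, sketched here on hypothetical toy tables (readings and deployments are illustrative names, not the real data), is to filter on a column that can only come from the right-hand table:

```r
library(dplyr)

# Hypothetical toy tables (not the TidyTuesday data)
readings <- tibble(
  date = as.Date(c("2018-02-20", "2018-02-20", "2018-02-21")),
  depth_m = c(2, 5, 2)
)
deployments <- tibble(
  deployment_id = c("A", "B"),
  start_date = as.Date(c("2018-02-20", "2018-04-25"))
)

left <- left_join(readings, deployments, by = c("date" = "start_date"))

# Rows of readings that found no deployment carry NA in deployment_id
unmatched <- left %>%
  filter(is.na(deployment_id))
unmatched$date  # "2018-02-21"
```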
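Describe the resulting data: right_join() returns 11 rows and 9 columns. Every row of data2 is kept; the deployment that starts on 2018-02-20 is repeated once for each of its seven matching readings.
How is it different from the original two datasets? The four deployments with no matching reading appear once each with NA in the temperature columns, and the three 2018-02-21 readings from data1 are dropped.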
right_join(data1, data2, by = c("date" = "start_date"))
## # A tibble: 11 × 9
##    date       sensor_depth_at_low_tide_m mean_temperature_degree_c
##    <date>                          <dbl>                     <dbl>
##  1 2018-02-20                          2                      1.58
##  2 2018-02-20                          5                      1.50
##  3 2018-02-20                         10                      1.49
##  4 2018-02-20                         15                      1.63
##  5 2018-02-20                         20                      1.84
##  6 2018-02-20                         30                      1.92
##  7 2018-02-20                         40                      1.82
##  8 2018-04-25                         NA                        NA
##  9 2019-05-02                         NA                        NA
## 10 2019-11-22                         NA                        NA
## 11 2020-11-08                         NA                        NA
## # ℹ 6 more variables: sd_temperature_degree_c <dbl>, n_obs <dbl>,
## #   deployment_id <chr>, end_date <date>, latitude <dbl>, longitude <dbl>
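A right join can also be read as a left join with the tables swapped. The sketch below, on hypothetical toy tables (readings and deployments are illustrative, not the real data), suggests that the two forms keep the same rows and differ only in column order and in which key name survives:

```r
library(dplyr)

# Hypothetical toy tables (not the TidyTuesday data)
readings <- tibble(
  date = as.Date(c("2018-02-20", "2018-02-20", "2018-02-21")),
  depth_m = c(2, 5, 2)
)
deployments <- tibble(
  deployment_id = c("A", "B"),
  start_date = as.Date(c("2018-02-20", "2018-04-25"))
)

right <- right_join(readings, deployments, by = c("date" = "start_date"))
swapped <- left_join(deployments, readings, by = c("start_date" = "date"))

nrow(right)     # 3
nrow(swapped)   # 3
names(right)    # "date" "depth_m" "deployment_id"
names(swapped)  # "deployment_id" "start_date" "depth_m"
```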
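Describe the resulting data: full_join() returns 14 rows and 9 columns: the 7 matched rows, the 3 readings found only in data1, and the 4 deployments found only in data2.
How is it different from the original two datasets? No row from either dataset is lost. Rows without a partner are padded with NA in the columns that come from the other dataset.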
full_join(data1, data2, by = c("date" = "start_date"))
## # A tibble: 14 × 9
##    date       sensor_depth_at_low_tide_m mean_temperature_degree_c
##    <date>                          <dbl>                     <dbl>
##  1 2018-02-20                          2                      1.58
##  2 2018-02-20                          5                      1.50
##  3 2018-02-20                         10                      1.49
##  4 2018-02-20                         15                      1.63
##  5 2018-02-20                         20                      1.84
##  6 2018-02-20                         30                      1.92
##  7 2018-02-20                         40                      1.82
##  8 2018-02-21                          2                      1.72
##  9 2018-02-21                          5                      1.46
## 10 2018-02-21                         10                      1.42
## 11 2018-04-25                         NA                        NA
## 12 2019-05-02                         NA                        NA
## 13 2019-11-22                         NA                        NA
## 14 2020-11-08                         NA                        NA
## # ℹ 6 more variables: sd_temperature_degree_c <dbl>, n_obs <dbl>,
## #   deployment_id <chr>, end_date <date>, latitude <dbl>, longitude <dbl>
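The 14 rows above break down as 7 matched + 3 left-only + 4 right-only. The same bookkeeping can be checked on toy tables; this is a sketch with hypothetical data (readings and deployments are illustrative names), not a general proof:

```r
library(dplyr)

# Hypothetical toy tables (not the TidyTuesday data)
readings <- tibble(
  date = as.Date(c("2018-02-20", "2018-02-20", "2018-02-21")),
  depth_m = c(2, 5, 2)
)
deployments <- tibble(
  deployment_id = c("A", "B"),
  start_date = as.Date(c("2018-02-20", "2018-04-25"))
)

full <- full_join(readings, deployments, by = c("date" = "start_date"))
matched <- inner_join(readings, deployments, by = c("date" = "start_date"))
left_only <- anti_join(readings, deployments, by = c("date" = "start_date"))
right_only <- anti_join(deployments, readings, by = c("start_date" = "date"))

# full join rows = matched + left-only + right-only
nrow(full)                                               # 4
nrow(matched) + nrow(left_only) + nrow(right_only)       # 4
```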
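Describe the resulting data: semi_join() returns 7 rows and 5 columns: the data1 rows that have a matching start_date in data2.
How is it different from the original two datasets? It is a filtered copy of data1. No columns from data2 are added, and the three 2018-02-21 rows are removed.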
semi_join(data1, data2, by = c("date" = "start_date"))
## # A tibble: 7 × 5
##   date       sensor_depth_at_low_tide_m mean_temperature_degree_c
##   <date>                          <dbl>                     <dbl>
## 1 2018-02-20                          2                      1.58
## 2 2018-02-20                          5                      1.50
## 3 2018-02-20                         10                      1.49
## 4 2018-02-20                         15                      1.63
## 5 2018-02-20                         20                      1.84
## 6 2018-02-20                         30                      1.92
## 7 2018-02-20                         40                      1.82
## # ℹ 2 more variables: sd_temperature_degree_c <dbl>, n_obs <dbl>
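Since a semi join only filters the left table, it behaves much like a filter() on key membership. The sketch below compares the two on hypothetical toy tables (readings and deployments are illustrative, not the real data); for a single key column the results should agree:

```r
library(dplyr)

# Hypothetical toy tables (not the TidyTuesday data)
readings <- tibble(
  date = as.Date(c("2018-02-20", "2018-02-20", "2018-02-21")),
  depth_m = c(2, 5, 2)
)
deployments <- tibble(
  deployment_id = c("A", "B"),
  start_date = as.Date(c("2018-02-20", "2018-04-25"))
)

semi <- semi_join(readings, deployments, by = c("date" = "start_date"))

# Equivalent membership filter for a single key column
filtered <- readings %>%
  filter(date %in% deployments$start_date)

names(semi)     # "date" "depth_m" (no deployment columns)
nrow(semi)      # 2
nrow(filtered)  # 2
```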
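Describe the resulting data: anti_join() returns 3 rows and 5 columns: the data1 rows with no matching start_date in data2 (the 2018-02-21 readings).
How is it different from the original two datasets? It is the complement of the semi join. Only data1 columns are kept, and the seven matched 2018-02-20 rows are removed.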
anti_join(data1, data2, by = c("date" = "start_date"))
## # A tibble: 3 × 5
##   date       sensor_depth_at_low_tide_m mean_temperature_degree_c
##   <date>                          <dbl>                     <dbl>
## 1 2018-02-21                          2                      1.72
## 2 2018-02-21                          5                      1.46
## 3 2018-02-21                         10                      1.42
## # ℹ 2 more variables: sd_temperature_degree_c <dbl>, n_obs <dbl>
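The semi join (7 rows) and the anti join (3 rows) together account for all 10 rows of data1. A quick check of that partition, sketched on hypothetical toy tables (readings and deployments are illustrative names, not the real data):

```r
library(dplyr)

# Hypothetical toy tables (not the TidyTuesday data)
readings <- tibble(
  date = as.Date(c("2018-02-20", "2018-02-20", "2018-02-21")),
  depth_m = c(2, 5, 2)
)
deployments <- tibble(
  deployment_id = c("A", "B"),
  start_date = as.Date(c("2018-02-20", "2018-04-25"))
)

semi <- semi_join(readings, deployments, by = c("date" = "start_date"))
anti <- anti_join(readings, deployments, by = c("date" = "start_date"))

# Every row of readings lands in exactly one of the two results
nrow(semi) + nrow(anti) == nrow(readings)  # TRUE
anti$date                                  # "2018-02-21"
```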