Week 9: Apply it to your data 8

1. Import your data

Import two related datasets from the TidyTuesday project.

ocean_temperature <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-03-31/ocean_temperature.csv')

## Rows: 19165 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl  (4): sensor_depth_at_low_tide_m, mean_temperature_degree_c, sd_temperat...
## date (1): date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

ocean_temperature_deployments <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-03-31/ocean_temperature_deployments.csv')

## Rows: 14 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): deployment_id
## dbl  (2): latitude, longitude
## date (2): start_date, end_date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
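The message above suggests declaring the column types. A minimal sketch of that, assuming the readr package is installed. The inline CSV text is a hypothetical two-row stand-in for the downloaded file, not the real data:

```r
library(readr)

# Hypothetical two-row CSV standing in for the downloaded file
csv_text <- I("date,mean_temperature_degree_c\n2018-02-20,1.58\n2018-02-21,1.72\n")

# Declaring col_types documents the expected types and quiets the message
temps <- read_csv(
  csv_text,
  col_types = cols(
    date = col_date(),
    mean_temperature_degree_c = col_double()
  )
)

str(temps)
```

The same `col_types` argument should work with the URLs above, provided every column in the file is either listed or covered by `.default`.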

2. Make data small

Describe the two datasets:

Data 1

Columns:5
Rows:10

Data 2

Columns:5
Rows:5

# dplyr provides %>% and the join functions used below
library(dplyr)

data1 <- ocean_temperature %>%
    head(10)
data2 <- ocean_temperature_deployments %>%
    head(5)
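The column and row counts described above can be checked in code rather than read off the printout. A small sketch, assuming dplyr is installed. The tibble `temps` is a hypothetical stand-in with invented values, not the TidyTuesday data:

```r
library(dplyr)

# Hypothetical 20-row dataset (values invented for illustration)
temps <- tibble(
  date        = as.Date("2018-02-20") + 0:19,
  mean_temp_c = 1 + (0:19) / 10
)

# Keep only the first 10 rows, as done for data1 above
small <- temps %>%
  head(10)

# dim() returns rows first, then columns
dim(small)
```

Running `dim(data1)` and `dim(data2)` in the same way would confirm the counts for the real subsets.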

3. inner_join

Describe the resulting data:

Columns:9
Rows:7

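How is it different from the original two datasets?

Only the 7 rows of data1 dated 2018-02-20 are kept, because that is the only date that matches a start_date in data2. The 3 rows dated 2018-02-21 and the 4 deployments with other start dates are dropped. The result has 9 columns: the 5 from data1 plus the 4 non-key columns of data2 (deployment_id, end_date, latitude, longitude).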

inner_join(data1, data2, by = c("date" = "start_date"))

## # A tibble: 7 × 9
##   date       sensor_depth_at_low_tide_m mean_temperature_degree_c
##   <date>                          <dbl>                     <dbl>
## 1 2018-02-20                          2                      1.58
## 2 2018-02-20                          5                      1.50
## 3 2018-02-20                         10                      1.49
## 4 2018-02-20                         15                      1.63
## 5 2018-02-20                         20                      1.84
## 6 2018-02-20                         30                      1.92
## 7 2018-02-20                         40                      1.82
## # ℹ 6 more variables: sd_temperature_degree_c <dbl>, n_obs <dbl>,
## #   deployment_id <chr>, end_date <date>, latitude <dbl>, longitude <dbl>
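To see where the column count of an inner join comes from, here is a sketch on miniature tables. The names `temps`, `deployments`, `depth_m`, `mean_temp_c`, the deployment ids and the latitudes are all hypothetical, chosen only to mirror the shape of data1 and data2:

```r
library(dplyr)

# Hypothetical miniature versions of data1 and data2 (ids and latitudes invented)
temps <- tibble(
  date        = as.Date(c("2018-02-20", "2018-02-20", "2018-02-21")),
  depth_m     = c(2, 5, 2),
  mean_temp_c = c(1.58, 1.50, 1.72)
)
deployments <- tibble(
  deployment_id = c("dep_A", "dep_B"),
  start_date    = as.Date(c("2018-02-20", "2018-04-25")),
  latitude      = c(45.0, 45.5)
)

joined <- inner_join(temps, deployments, by = c("date" = "start_date"))

# The key keeps the left-hand name (date) and start_date is not repeated,
# so the column count is ncol(temps) + ncol(deployments) - 1
names(joined)
dim(joined)
```

The same arithmetic applies to the output above: 5 + 5 - 1 = 9 columns.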

4. left_join

Describe the resulting data:

Columns:9
Rows:10

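How is it different from the original two datasets?

All 10 rows of data1 are kept, and the 4 non-key columns of data2 are added, giving 9 columns. The 7 rows dated 2018-02-20 receive deployment values. The 3 rows dated 2018-02-21 have no match in data2, so their deployment columns are NA.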

left_join(data1, data2, by = c("date" = "start_date"))

## # A tibble: 10 × 9
##    date       sensor_depth_at_low_tide_m mean_temperature_degree_c
##    <date>                          <dbl>                     <dbl>
##  1 2018-02-20                          2                      1.58
##  2 2018-02-20                          5                      1.50
##  3 2018-02-20                         10                      1.49
##  4 2018-02-20                         15                      1.63
##  5 2018-02-20                         20                      1.84
##  6 2018-02-20                         30                      1.92
##  7 2018-02-20                         40                      1.82
##  8 2018-02-21                          2                      1.72
##  9 2018-02-21                          5                      1.46
## 10 2018-02-21                         10                      1.42
## # ℹ 6 more variables: sd_temperature_degree_c <dbl>, n_obs <dbl>,
## #   deployment_id <chr>, end_date <date>, latitude <dbl>, longitude <dbl>
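Because the deployment columns are hidden in the truncated printout, the unmatched rows of a left join are easier to find by filtering on NA. A sketch on hypothetical miniature tables (all names and values below are invented stand-ins for data1 and data2):

```r
library(dplyr)

# Hypothetical miniature versions of data1 and data2
temps <- tibble(
  date        = as.Date(c("2018-02-20", "2018-02-20", "2018-02-21")),
  depth_m     = c(2, 5, 2),
  mean_temp_c = c(1.58, 1.50, 1.72)
)
deployments <- tibble(
  deployment_id = c("dep_A", "dep_B"),
  start_date    = as.Date(c("2018-02-20", "2018-04-25"))
)

joined <- left_join(temps, deployments, by = c("date" = "start_date"))

# Rows of temps with no matching deployment carry NA in deployment_id
unmatched <- joined %>%
  filter(is.na(deployment_id))

nrow(joined)     # every row of temps is kept
nrow(unmatched)  # rows that found no match
```

This check assumes deployment_id is never NA in the right-hand table itself. Otherwise an NA could also come from the source data.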

5. right_join

Describe the resulting data:

Columns:9
Rows:11

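How is it different from the original two datasets?

All 5 rows of data2 are represented. The deployment starting on 2018-02-20 matches 7 rows of data1, so it appears 7 times. The other 4 deployments have no match and appear once each, with NA in the temperature columns. The 3 rows of data1 dated 2018-02-21 are dropped.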

right_join(data1, data2, by = c("date" = "start_date"))

## # A tibble: 11 × 9
##    date       sensor_depth_at_low_tide_m mean_temperature_degree_c
##    <date>                          <dbl>                     <dbl>
##  1 2018-02-20                          2                      1.58
##  2 2018-02-20                          5                      1.50
##  3 2018-02-20                         10                      1.49
##  4 2018-02-20                         15                      1.63
##  5 2018-02-20                         20                      1.84
##  6 2018-02-20                         30                      1.92
##  7 2018-02-20                         40                      1.82
##  8 2018-04-25                         NA                     NA   
##  9 2019-05-02                         NA                     NA   
## 10 2019-11-22                         NA                     NA   
## 11 2020-11-08                         NA                     NA   
## # ℹ 6 more variables: sd_temperature_degree_c <dbl>, n_obs <dbl>,
## #   deployment_id <chr>, end_date <date>, latitude <dbl>, longitude <dbl>
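A right join can be read as a left join with the tables swapped. The sketch below, on hypothetical miniature tables (names and values invented), compares the two. Column order and the name of the key column differ between the results, so only the row counts and NA counts are compared:

```r
library(dplyr)

# Hypothetical miniature versions of data1 and data2
temps <- tibble(
  date        = as.Date(c("2018-02-20", "2018-02-20", "2018-02-21")),
  depth_m     = c(2, 5, 2),
  mean_temp_c = c(1.58, 1.50, 1.72)
)
deployments <- tibble(
  deployment_id = c("dep_A", "dep_B"),
  start_date    = as.Date(c("2018-02-20", "2018-04-25"))
)

# Keep every deployment, matched or not
right_version <- right_join(temps, deployments, by = c("date" = "start_date"))

# Same idea with the tables swapped; note the by mapping is reversed too
left_version <- left_join(deployments, temps, by = c("start_date" = "date"))

nrow(right_version)
nrow(left_version)

# Deployments without temperature rows show NA in the temperature columns
sum(is.na(right_version$depth_m))
sum(is.na(left_version$depth_m))
```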

6. full_join

Describe the resulting data:

Columns:9
Rows:14

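How is it different from the original two datasets?

Every row from both datasets is kept: the 7 matched rows, the 3 unmatched rows of data1 (2018-02-21), and the 4 unmatched deployments from data2, for 14 rows in total. Unmatched rows are filled with NA in the columns that come from the other dataset.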

full_join(data1, data2, by = c("date" = "start_date"))

## # A tibble: 14 × 9
##    date       sensor_depth_at_low_tide_m mean_temperature_degree_c
##    <date>                          <dbl>                     <dbl>
##  1 2018-02-20                          2                      1.58
##  2 2018-02-20                          5                      1.50
##  3 2018-02-20                         10                      1.49
##  4 2018-02-20                         15                      1.63
##  5 2018-02-20                         20                      1.84
##  6 2018-02-20                         30                      1.92
##  7 2018-02-20                         40                      1.82
##  8 2018-02-21                          2                      1.72
##  9 2018-02-21                          5                      1.46
## 10 2018-02-21                         10                      1.42
## 11 2018-04-25                         NA                     NA   
## 12 2019-05-02                         NA                     NA   
## 13 2019-11-22                         NA                     NA   
## 14 2020-11-08                         NA                     NA   
## # ℹ 6 more variables: sd_temperature_degree_c <dbl>, n_obs <dbl>,
## #   deployment_id <chr>, end_date <date>, latitude <dbl>, longitude <dbl>
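The row count of a full join can be decomposed into matched rows plus the unmatched rows from each side. A sketch of that arithmetic on hypothetical miniature tables (names and values invented for illustration):

```r
library(dplyr)

# Hypothetical miniature versions of data1 and data2
temps <- tibble(
  date        = as.Date(c("2018-02-20", "2018-02-20", "2018-02-21")),
  mean_temp_c = c(1.58, 1.50, 1.72)
)
deployments <- tibble(
  deployment_id = c("dep_A", "dep_B"),
  start_date    = as.Date(c("2018-02-20", "2018-04-25"))
)

full       <- full_join(temps, deployments, by = c("date" = "start_date"))
matched    <- inner_join(temps, deployments, by = c("date" = "start_date"))
only_left  <- anti_join(temps, deployments, by = c("date" = "start_date"))
only_right <- anti_join(deployments, temps, by = c("start_date" = "date"))

# matched + left-only + right-only should equal the full join
nrow(matched) + nrow(only_left) + nrow(only_right)
nrow(full)
```

For the output above the same decomposition is 7 + 3 + 4 = 14.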

7. semi_join

Describe the resulting data:

Columns:5
Rows:7

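How is it different from the original two datasets?

It returns the same 7 rows as inner_join but only the 5 columns of data1. data2 is used only as a filter, so none of its columns are added.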

semi_join(data1, data2, by = c("date" = "start_date"))

## # A tibble: 7 × 5
##   date       sensor_depth_at_low_tide_m mean_temperature_degree_c
##   <date>                          <dbl>                     <dbl>
## 1 2018-02-20                          2                      1.58
## 2 2018-02-20                          5                      1.50
## 3 2018-02-20                         10                      1.49
## 4 2018-02-20                         15                      1.63
## 5 2018-02-20                         20                      1.84
## 6 2018-02-20                         30                      1.92
## 7 2018-02-20                         40                      1.82
## # ℹ 2 more variables: sd_temperature_degree_c <dbl>, n_obs <dbl>
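In this exercise semi_join and inner_join return the same number of rows, but that is not guaranteed in general. The sketch below uses a hypothetical case, not present in the output above, where two deployments share a start date:

```r
library(dplyr)

# Hypothetical tables: two deployments share the same start date
temps <- tibble(
  date        = as.Date(c("2018-02-20", "2018-02-21")),
  mean_temp_c = c(1.58, 1.72)
)
deployments <- tibble(
  deployment_id = c("dep_A", "dep_B"),
  start_date    = as.Date(c("2018-02-20", "2018-02-20"))
)

# inner_join repeats the temperature row once per matching deployment
inner <- inner_join(temps, deployments, by = c("date" = "start_date"))

# semi_join only filters temps, so each row appears at most once
semi <- semi_join(temps, deployments, by = c("date" = "start_date"))

nrow(inner)
nrow(semi)
names(semi)
```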

8. anti_join

Describe the resulting data:

Columns:5
Rows:3

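How is it different from the original two datasets?

It returns the 3 rows of data1 that have no match in data2 (those dated 2018-02-21), with only the 5 columns of data1. Together with the semi_join result it accounts for all 10 rows of data1.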

anti_join(data1, data2, by = c("date" = "start_date"))

## # A tibble: 3 × 5
##   date       sensor_depth_at_low_tide_m mean_temperature_degree_c
##   <date>                          <dbl>                     <dbl>
## 1 2018-02-21                          2                      1.72
## 2 2018-02-21                          5                      1.46
## 3 2018-02-21                         10                      1.42
## # ℹ 2 more variables: sd_temperature_degree_c <dbl>, n_obs <dbl>
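semi_join and anti_join split the left table into two non-overlapping parts. A short sketch that checks this on hypothetical miniature tables (names and values invented for illustration):

```r
library(dplyr)

# Hypothetical miniature versions of data1 and data2
temps <- tibble(
  date        = as.Date(c("2018-02-20", "2018-02-20", "2018-02-21")),
  depth_m     = c(2, 5, 2),
  mean_temp_c = c(1.58, 1.50, 1.72)
)
deployments <- tibble(
  deployment_id = c("dep_A", "dep_B"),
  start_date    = as.Date(c("2018-02-20", "2018-04-25"))
)

matched   <- semi_join(temps, deployments, by = c("date" = "start_date"))
unmatched <- anti_join(temps, deployments, by = c("date" = "start_date"))

# Every row of temps lands in exactly one of the two results
nrow(matched) + nrow(unmatched) == nrow(temps)
```

In the exercise above the same split is 7 + 3 = 10 rows.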

Dillon Lee

2022-10-05
