This tutorial is part of the dplyr training series. Here is the YouTube video link for this tutorial:

https://youtu.be/a0_U3yTMEsI

Here is the link to the complete series on dplyr:

https://www.youtube.com/playlist?list=PLkHcMTpvAaXVJzyRSytUn3nSK92TJphxR

Why dplyr

dplyr is an R package for manipulating data frames: filtering, sorting, summarising and joining them.

dplyr commands may look long and overwhelming if you have not used the package before, but they are not as hard as they look.

Once you learn the basics, dplyr is intuitive to use, just like making a sentence once you have learnt the basic words of a language.

Audience

For beginners or experienced R users wanting to learn various commands of dplyr.

DPLYR : YouTube Video series

This tutorial is part of a series of tutorials on all practical aspects of dplyr. All the videos are available in a single playlist on YouTube.

https://www.youtube.com/playlist?list=PLkHcMTpvAaXVJzyRSytUn3nSK92TJphxR

FULL JOIN

Create sample dataset

Packages used

library(dplyr)

Sample datasets.

Let us create some data for patient ages.

First dataset

Contains the age and outcome of the patients. PatientID is the unique key in this dataset.

library(dplyr)

d1 <- data.frame( PatientID  = c('P001','P002','P003')
                  , Age         = c(23,12,5)
                  , Outcome  = c('Died','Died',NA)   )

d1

Second dataset

Contains the systolic blood pressure (SBP) and diastolic blood pressure (DBP) of the patients. PatientID is the unique key in this dataset.

d2 <- data.frame(PatientID   = c('P001','P002','P004')
                 , SBP       = c(120,   80, 45)
                 , DBP       = c(80,    70, 30)
)


d2      

First attempt at a full join

This is the simplest syntax in dplyr: you provide the names of your two datasets.

Both datasets use "PatientID" as the key, so you can name it once with by = "PatientID". If you leave by out, as in the first example below, dplyr joins on every column name the two datasets share and prints a message saying which columns it used. If the key has a different name in each dataset, you have to name both columns, as described under "Defining the key".

# First way: no `by` argument, just to show that dplyr picks the key automatically.
# PatientID is the only column that exists in both datasets, so it is used as the key.

d.full <- dplyr::full_join(d1,d2)
d.full
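
To see which columns dplyr would pick when by is omitted, you can compare the column names of the two datasets yourself. The sketch below rebuilds the same d1 and d2 and then lists the rows that came from only one of the tables; the shared_cols variable is just an illustrative name, not part of dplyr.

```r
library(dplyr)

d1 <- data.frame(PatientID = c("P001", "P002", "P003"),
                 Age       = c(23, 12, 5),
                 Outcome   = c("Died", "Died", NA))

d2 <- data.frame(PatientID = c("P001", "P002", "P004"),
                 SBP       = c(120, 80, 45),
                 DBP       = c(80, 70, 30))

# Column names present in both tables; full_join() joins on these when `by` is omitted
shared_cols <- intersect(names(d1), names(d2))
shared_cols

# dplyr prints a message naming the column(s) it joined on
d.full <- dplyr::full_join(d1, d2)

# Patients found in only one table keep their row,
# with NA in the columns that belong to the other table
d.full[!complete.cases(d.full[, c("Age", "SBP")]), ]
```

A full join keeps every patient from both tables, so P003 (no blood pressure record) and P004 (no age record) both appear in the result.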

Defining the key

Here we provide the key explicitly with by = "PatientID".

If the key has a different name in each table, use by = c("PatientID" = "PatId"), where PatientID is the key column in the first table and PatId is the key column in the second.

d.full <- dplyr::full_join(d1,d2, by = "PatientID")
d.full
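
The case where the key columns have different names can be sketched as follows. The table d3 is a made-up copy of the blood pressure data whose key column is called PatId; it exists only for this illustration.

```r
library(dplyr)

d1 <- data.frame(PatientID = c("P001", "P002", "P003"),
                 Age       = c(23, 12, 5),
                 Outcome   = c("Died", "Died", NA))

# Hypothetical table: same blood pressure data, but the key column is named PatId
d3 <- data.frame(PatId = c("P001", "P002", "P004"),
                 SBP   = c(120, 80, 45),
                 DBP   = c(80, 70, 30))

# Left-hand name is the key in d1, right-hand name is the key in d3
d.full2 <- dplyr::full_join(d1, d3, by = c("PatientID" = "PatId"))
d.full2
```

With the default settings the joined table keeps the key name from the first table (PatientID), and patients that exist only in d3 still show up under that column.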

Using the pipe

d.full <- d1%>%
           dplyr::full_join(d2, by = "PatientID")
d.full

What happens when your key is not unique

In the example below, PatientID P001 has two records because two blood pressure readings were taken for this patient. A full join returns one row for every matching pair of records, so P001 appears twice in the result. If your use case needs only one record per patient, make the key unique before joining. In our example we keep the highest SBP reading for each patient wherever the same patient has multiple readings.

Duplicate keys in the table

d1 <- data.frame( PatientID  = c('P001','P002','P003')
                  , Age        = c(23,12,5)
                  , Outcome  = c('Died','Died',NA)   )

# In d2, P001 appears twice
d2 <- data.frame(PatientID   = c('P001','P002','P004','P001')
                 , SBP       = c(120,   80, 45,110)
                 , DBP       = c(80,    70, 30,90)
)



d.full <- d1%>%
  dplyr::full_join(d2, by = "PatientID")

d.full
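
Before joining, it can help to check whether a key really is unique. One way to do that, sketched here with dplyr's count() and filter(), is to count the rows per PatientID and keep the IDs that occur more than once; dup_keys is just an illustrative variable name.

```r
library(dplyr)

d1 <- data.frame(PatientID = c("P001", "P002", "P003"),
                 Age       = c(23, 12, 5),
                 Outcome   = c("Died", "Died", NA))

d2 <- data.frame(PatientID = c("P001", "P002", "P004", "P001"),
                 SBP       = c(120, 80, 45, 110),
                 DBP       = c(80, 70, 30, 90))

# Count rows per key and keep only the keys that occur more than once
dup_keys <- d2 %>%
  dplyr::count(PatientID) %>%
  dplyr::filter(n > 1)
dup_keys

# Each duplicate key in d2 adds an extra row to the join result
d.full <- d1 %>%
  dplyr::full_join(d2, by = "PatientID")
nrow(d.full)
```

If dup_keys comes back empty, the key is unique and the join will return at most one row per patient from that table.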

Handle duplicate keys by making them unique

d1 <- data.frame( PatientID  = c('P001','P002','P003')
                  , Age        = c(23,12,5)
                  , Outcome  = c('Died','Died',NA)   )

# In d2, P001 appears twice
d2 <- data.frame(PatientID   = c('P001','P002','P004','P001')
                 , SBP       = c(120,   80, 45,110)
                 , DBP       = c(80,    70, 30,90)
)

# Make PatientID unique: sort by SBP (highest first), then keep the first row per patient
d2.unique <- d2 %>%
  dplyr::arrange(PatientID, -SBP) %>%
  dplyr::group_by(PatientID) %>%
  dplyr::slice(1) %>%
  dplyr::ungroup()


d.full <- d1%>%
  dplyr::full_join(d2.unique, by = "PatientID")

d.full
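
The same "keep the highest SBP reading per patient" step can also be written with slice_max(), which replaces the sort-then-take-first pattern. This is an alternative sketch, not the approach used in the video, and it assumes dplyr 1.0.0 or later, where slice_max() is available.

```r
library(dplyr)

d1 <- data.frame(PatientID = c("P001", "P002", "P003"),
                 Age       = c(23, 12, 5),
                 Outcome   = c("Died", "Died", NA))

d2 <- data.frame(PatientID = c("P001", "P002", "P004", "P001"),
                 SBP       = c(120, 80, 45, 110),
                 DBP       = c(80, 70, 30, 90))

# Keep the single row with the highest SBP for each patient.
# with_ties = FALSE guarantees one row per patient even if two readings are equal.
d2.unique <- d2 %>%
  dplyr::group_by(PatientID) %>%
  dplyr::slice_max(SBP, n = 1, with_ties = FALSE) %>%
  dplyr::ungroup()

d.full <- d1 %>%
  dplyr::full_join(d2.unique, by = "PatientID")

d.full
```

Because d2.unique now has one row per PatientID, the full join returns one row per patient again.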

Watch our complete tutorial on all aspects of DPLYR.

https://www.youtube.com/playlist?list=PLkHcMTpvAaXVJzyRSytUn3nSK92TJphxR