Often we have separate data frames that can have common and differing variables for similar observations. These types of data sets are referred as relational data sets.
The dplyr package offers three sets of joining functions to provide alternative ways to join data frames. These are:
The first set of functions to combine data sets is called “mutating joins”. left_join(), right_join(), inner_join(), and full_join() functions are in this group. The mutating join functions allow you to combine variables from two tables and add variables to the right (like mutate).
Note that mutating join functions add variables to the right. Therefore if you have a lot of variables already in the data, the new variables won’t get printed out.
Each mutating join takes an argument “by” that controls which variables are used to match observations in the two data sets. There are a few ways to specify it:
This code joins the planes column to flights2, joining on a common column called “tailnum” in both tables x and y.
# flights2 %>% left_join(planes, by = "tailnum")
This code joins the airport column to flights 2, joining on a common column “dest” in table x and column “faa” in column y. They are the same column but named differently.
# flights2 %>% left_join(airports, c("dest" = "faa"))
Filtering joins match observations in the same way as mutating joins, but affect the observations, not the variables. There are two types of filtering joins: semi_join() and anti_join().
BOTH OF THESE KEEP OBS IN X and GET RID OF OBS IN Y. THEY FILTER WHICH OBS IN X TO KEEP.
Anti-joins are useful for diagnosing joint mismatches.
The first set of functions to combine data sets is called “mutating joins”. left_join(), right_join(), inner_join(), and full_join() functions are in this group. The mutating join functions allow you to combine variables from two tables and add variables to the right (like mutate).
Note that mutating join functions add variables to the right. Therefore if you have a lot of variables already in the data, the new variables won’t get printed out.
Filtering joins match observations in the same way as mutating joins, but affect the observations, not the variables. There are two types of filtering joins: semi_join() and anti_join().
Set operations expect the x and y inputs to have the same variables and treat the observations like sets. There are three types of set operations:
Often you may just need to merge data frames by row and column. The bind_rows() and bind_cols() bind the multiple data frames by row and column, respectively.