Background

Original Reddit Link

I’m currently working on the project where I have about multiple CSV files need to match all combinations of events people attended by matching their customer id. They have been to multiple events. I need the events matched in a new column separated by comma. There is multiple data sets that need to be compared. Vlookup in excel was freezing due to the amount of data. How can I do this in R? I have tidyverse installed. I believe the rbind() function is needed.

Example data

library(knitr)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

# Generate Sample Data
df1 <- data.frame(LastName = c("Robson", "Robson", "Dixon", "Grimes"),
                  FirstName = c("Jonson", "Jonson", "Darryl", "Rick"),
                  CustID = c(23019, 23019, 23036, 23050),
                  Event = c("NP5", "RTE3", "TWD1", "TWD1"),
                  stringsAsFactors = FALSE)

df2 <- data.frame(LastName = c("Robson", "Robson", "Grimes", "Dixon", "Dixon"),
                  FirstName = c("Jonson", "Jonson", "Lori", "Darryl", "Darryl"),
                  CustID = c(23019, 23019, 23051, 23036, 23036),
                  Event = c("GRT2", "RTE4", "TWD1", "TWD1", "TWD2"),
                  stringsAsFactors = FALSE)

# Combine the data frames
df <- rbind(df1, df2)
knitr::kable(df)

LastName	FirstName	CustID	Event
Robson	Jonson	23019	NP5
Robson	Jonson	23019	RTE3
Dixon	Darryl	23036	TWD1
Grimes	Rick	23050	TWD1
Robson	Jonson	23019	GRT2
Robson	Jonson	23019	RTE4
Grimes	Lori	23051	TWD1
Dixon	Darryl	23036	TWD1
Dixon	Darryl	23036	TWD2

df_summary <- df %>% group_by(LastName, FirstName, CustID) %>%
  summarize(NumEventsUnique = length(unique(Event))) %>%
  arrange(CustID)

knitr::kable(df_summary)

LastName	FirstName	CustID	NumEventsUnique
Robson	Jonson	23019	4
Dixon	Darryl	23036	2
Grimes	Rick	23050	1
Grimes	Lori	23051	1

Now, the hard part is done. Although I wouldn’t put multiple events in a single column, it’s a fairly easy task to accomplish.

Using a `base` R approach.

df_summary$Events <- character(nrow(df_summary))
for (i in 1:nrow(df_summary)) {
  df_summary[i, ]$Events <- paste(df[df_summary[i, ]$CustID == df$CustID, ]$Event, collapse = ", ")
}

knitr::kable(df_summary)

LastName	FirstName	CustID	NumEventsUnique	Events
Robson	Jonson	23019	4	NP5, RTE3, GRT2, RTE4
Dixon	Darryl	23036	2	TWD1, TWD1, TWD2
Grimes	Rick	23050	1	TWD1
Grimes	Lori	23051	1	TWD1

Using a Tidyverse Approach

df_summary$Events <- character(nrow(df_summary))
df_summary %>% mutate(Events = paste(df[CustID == df$CustID, ]$Event, collapse = ", "))

Reddit Post: Finding combinations of data in multiple sets

Thaufas

October 7, 2018

Background

Using a `base` R approach.

Using a Tidyverse Approach

Reddit Post: Finding combinations of data in multiple sets

Thaufas

October 7, 2018

Background

Using a base R approach.

Using a Tidyverse Approach

Using a `base` R approach.