Building a Comprehensive CRM Framework for Cafecito: Uniting Customer Insights, Transactions, and Promotional Strategies

Business Description
CRM Data Structure & Rationale
Conclusion
Resources

Business Description

Cafecito, an embodiment of Latin American heritage and flavors, stands as a beacon of warmth and community in the bustling streets of New England’s vibrant cities. Cafecito, a small franchise chain, is driven by a passion for delivering an authentic coffee experience dedicated to providing a haven where customers can savor the rich aromas and flavors of carefully crafted beverages.

In the heart of each neighborhood, Cafecito establishes its presence with a network of boutique coffee shops, each radiating a cozy ambiance that invites patrons to linger and connect. Currently, Cafecito has 10 locations across New England, including Biddeford, Lewiston, Portland and Westbrook in Maine, Portsmouth in New Hampshire, and Boston in Massachusetts, strategically positioned to cater to diverse communities.

At Cafecito, the menu is a celebration of Latin American coffee traditions, offering a delightful array of specialty coffees, including Ecuadorian delights like canelazo, colada morada, and colada de avena, alongside refreshing guanabana juice. Colombian hot chocolate like chocolate santafereño and Peruvian specialties such as chicha morada add an exotic flair to our offerings.

The food menu is equally enticing, featuring traditional delights from across Latin America. Indulge in savory bolon, crispy buñuelo, flavorful empanadas, and hearty tamales, each small plate is a testament to the rich culinary heritage of the region.

The target customer demographic spans across various age groups, united by a love for exceptional coffee and a desire for genuine connection. From busy professionals seeking a morning pick-me-up to students craving a cozy study spot, Cafecito welcomes all who seek a moment of respite from the daily hustle.

As Cafecito embarks on its journey to revolutionize the coffee culture, it transcends being simply a place to enjoy a cup of coffee; it becomes a destination where relationships flourish, stories are shared, and memories are made. Join in embracing the essence of Latin hospitality, one sip at a time. And by adding sazon to your life!

# Week 2

# Importing and preparing the dataset 

CustomerDF <- read_csv("unique_customers.csv")

## Rows: 25000 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): FirstName, LastName, PhoneNumber, EmailAddress, City
## dbl (3): CustomerNumber, PersonaNumber, PostalCode
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# Clean names

clean_names(CustomerDF)

# Learn about the dataset

dim(CustomerDF) # checking dimensions of the dataset

summary(CustomerDF)

# Checking the structure of the dataset 

str(CustomerDF)

head(CustomerDF)

print(CustomerDF)

CRM Data Structure & Rationale

In the development process of a Customer Relationship Management (CRM) system for Cafecito, the objective is to deepen the understanding of the potential customer base. The task entails generating a comprehensive dataset that accurately reflects realistic customer interactions, transaction patterns, and marketing strategies.

The CRM system is structured around three primary tables: Customer Table, Transaction Table, and Coupon & Offers Table. These tables are merged into a final data frame named Final_Cafecito_Shop_Dataset, providing a consolidated perspective of customer data and interactions.

The foundational data for the CRM system is sourced from a customer data frame consisting of 25,000 observations and 10 variables, incorporating critical details such as First Name, Last Name, Customer Number, Persona Number, Contact Information (Phone Number, Email Address), City, and Postal Code. This dataset serves as the basis for simulating various customer interactions, transactional behaviors, and marketing endeavors.

The customer table will serve as an archive for detailed customer information, facilitating the tracking of individual preferences, demographics, and contact details. Concurrently, the Transaction Table will record transactional data, including order specifics, purchase history, and payment methods, allowing for in-depth analysis of buying patterns and trends.

Additionally, the coupon and offers table will organize promotional activities, managing coupon codes, descriptions, discount values, etc. Through the implementation of constraints such as conditional coupon values and targeted marketing strategies, the objective is to optimize customer engagement and retention.

Upon integration into a unified dataset, these tables will provide valuable insights into customer behavior, preferences, and loyalty. This comprehensive perspective will enable Cafecito to tailor its products, services, and marketing strategies to effectively serve and engage with its diverse customer base.

Customer Table

The customer table serves as a cornerstone of our CRM system, containing crucial information about each customer and providing insights for effective customer engagement. The structure of the customer table was carefully designed based on the analysis of the customer dataset and considerations of Cafecito’s business objectives.

Before proceeding with the customer table’s creation, an exploration of the frequency distribution of the persona number was conducted (Table 1). This analysis revealed a disproportionate distribution of persona numbers, indicating the need for further exploration and potential adjustments to ensure representation in customer segmentation.

To enhance the understanding of the persona numbers and align them with the clientele Cafecito aims to attract, names were assigned to each persona number (Table 2). These persona names include descriptors such as The Morning Regular, The Social Media Maven, The Health-Conscious Hero, The Studdy Sessioner, The Freelance Guru, The Local Celebrity, The Indecisive, and The Penny Pincher, reflecting different customer profiles and behaviors.

# ------------------------------------------------------
# Code for a complete frequency table
# ------------------------------------------------------

FreqTable = table(CustomerDF$PersonaNumber)

# print(FreqTable)

# Calculate percentages

PercentTable <- prop.table(FreqTable) * 100

# print(PercentTable)

# Calculate cumulative sum

CumulativeSum <- cumsum(FreqTable)

# print(CumulativeSum)

# Calculate cumulative percentage

CumulativePercentage <- cumsum(PercentTable)

# print(CumulativePercentage)

# Combine all the information into a data frame

ResultTable <- data.frame(PersonaID = names(FreqTable), 
                           Frequency = FreqTable, 
                           Percentage = PercentTable,
                           CumulativeSum = CumulativeSum,
                           CumulativePercent = CumulativePercentage)[, c("PersonaID", "Frequency.Freq", "Percentage.Freq", 
                                "CumulativeSum", "CumulativePercent")]


# print(ResultTable) 

knitr::kable(ResultTable, caption = "Table 1: Persona ID Frequency Distribution")

Table 1: Persona ID Frequency Distribution
PersonaID	Frequency.Freq	Percentage.Freq	CumulativeSum	CumulativePercent
1	7033	28.132	7033	28.132
2	4926	19.704	11959	47.836
3	4311	17.244	16270	65.080
4	2922	11.688	19192	76.768
5	2518	10.072	21710	86.840
6	1775	7.100	23485	93.940
7	987	3.948	24472	97.888
8	528	2.112	25000	100.000

# ---------------------------------------------------------------------------
# The above table shows that the personas are disproportionately distributed
# ---------------------------------------------------------------------------

In the creation of the customer table, several attributes were made to enrich the dataset and align it with the business needs. These additions include the inclusion of columns for persona names, replacing the original city column with specific city names corresponding to the locations of Cafecito’s franchise (Biddeford, Lewiston, Portland, Westbrook, Portsmouth, and Boston), and the addition of a state column to indicate the location of each franchise.

Furthermore, the postal codes in the original customer dataset were updated to match the city locations of the franchise outlets. This ensures accurate geographic representation within the dataset, enabling targeted marketing efforts and localized customer engagement strategies.

Following these changes, a random selection of 500 customers was extracted from the customer data frame and stored into a new data frame,SelectedCustomers. The customer table has 500 observations with 10 columns (variables). Additionally, Table 3 shows the frequency distribution of persona names, totaling 500 customers. This distribution illustrates the allocation of customers across different persona names, providing insights into the composition of the selected customer sample.

# Creating a customer table relevant to my business franchise 

# ------------------------------------------------------
# Giving names to personas and adding them to CustomerDF
# ------------------------------------------------------

PersonaNames <- c("The Morning Regular", "The Social Media Maven", "The Studdy Sessioner", "The Freelance Guru", "The Health-Conscious Hero", "The Local Celebrity", "The Indecisive", "The Penny Pincher")

# print(PersonaNames)

CustomerDF$PersonaName <- PersonaNames[CustomerDF$PersonaNumber]

# ------------------------------------------------------
# Code for a complete frequency table of persona names
# ------------------------------------------------------

FreqTable2 = table(CustomerDF$PersonaName)

# print(FreqTable2)

# Calculate percentages

PercentTable2 <- prop.table(FreqTable2) * 100

# print(PercentTable2)

# Calculate cumulative sum

CumulativeSum2 <- cumsum(FreqTable2)

# print(CumulativeSum2)

# Calculate cumulative percentage

CumulativePercent2 <- cumsum(PercentTable2)

# print(CumulativePercent2)

# Combine all the information into a data frame

ResultTable2 <- data.frame(PersonaName = names(FreqTable2), 
                           Frequency = FreqTable2, 
                           Percentage = PercentTable2,
                           CumulativeSum = CumulativeSum2,
                           CumulativePercent = CumulativePercent2)[, c("PersonaName", "Frequency.Freq", "Percentage.Freq", 
                                "CumulativeSum", "CumulativePercent")]

# print(ResultTable2)

knitr::kable(ResultTable2, caption = "Table 2: Persona Names Frequency Distribution")

Table 2: Persona Names Frequency Distribution
	PersonaName	Frequency.Freq	Percentage.Freq	CumulativeSum	CumulativePercent
The Freelance Guru	The Freelance Guru	2922	11.688	2922	11.688
The Health-Conscious Hero	The Health-Conscious Hero	2518	10.072	5440	21.760
The Indecisive	The Indecisive	987	3.948	6427	25.708
The Local Celebrity	The Local Celebrity	1775	7.100	8202	32.808
The Morning Regular	The Morning Regular	7033	28.132	15235	60.940
The Penny Pincher	The Penny Pincher	528	2.112	15763	63.052
The Social Media Maven	The Social Media Maven	4926	19.704	20689	82.756
The Studdy Sessioner	The Studdy Sessioner	4311	17.244	25000	100.000

# ------------------------------------------------------
# Giving names to "city" and adding them to CustomerDF
# ------------------------------------------------------

CityFreqTable = table(CustomerDF$City)

# print(CityFreqTable)

CityNames <- c(A = "Biddeford", B = "Lewiston", 
               C = "Portland", D = "Westbrook", 
               E = "Portsmouth", F = "Boston")

# print(CityNames)

CustomerDF$City <- CityNames[CustomerDF$City]

# ------------------------------------------------------
# Adding "State" column to CustomerDF
# ------------------------------------------------------

# Define a named vector mapping cities to states

StateMapping <- c(Biddeford = "ME", Lewiston = "ME", Portland = "ME", 
                  Westbrook = "ME", Portsmouth = "NH", Boston = "MA")

# Add State column based on city mapping

CustomerDF$State <- StateMapping[CustomerDF$City]

# Organize customerDF column

CustomerDF <- CustomerDF[c("FirstName", "LastName", "CustomerNumber",
                                                      "PersonaNumber",
                                                       "PersonaName",
                                                       "PhoneNumber",
                                                        "EmailAddress",
                                                        "City",
                                                        "State",
                                                        "PostalCode")]

# --------------------------------------------------------------------
# Giving accurate Postal Codes to match city and adding them to CustomerDF
# --------------------------------------------------------------------

# Define a named vector mapping cities to postal codes

PostalCodeMapping <- c(Biddeford = "04005", Lewiston = "04240", 
                       Portland = "04101", Westbrook = "04092", 
                       Portsmouth = "03801", Boston = "02116")

# print(PostalCodeMapping)

# Add PostalCode column based on city mapping

CustomerDF$PostalCode <- PostalCodeMapping[CustomerDF$City]

# ---------------------------------
# Step 2: Select n random customers
# ---------------------------------

set.seed(1996) # Set the seed for reproducibility

SelectedCustomers <- sample_n(CustomerDF, 500)

# head(SelectedCustomers)

dim(SelectedCustomers)  # Final Customer Table

## [1] 500  10

# print(SelectedCustomers) # Final Customer Table

CustomerTable6 <- head(SelectedCustomers)

knitr::kable(CustomerTable6, caption = "Cafecito Selected Customer Table")

Cafecito Selected Customer Table
FirstName	LastName	CustomerNumber	PersonaNumber	PersonaName	PhoneNumber	EmailAddress	City	State	PostalCode
James	HINES	686761	2	The Social Media Maven	401 - 312 - 9023	jameshines@example.com	Biddeford	ME	04005
Katherine	BRUCE	717779	2	The Social Media Maven	653 - 869 - 6422	katherinebruce@example.com	Boston	MA	02116
Lorelei	FROST	361593	1	The Morning Regular	227 - 599 - 6259	loreleifrost@example.com	Lewiston	ME	04240
Aarav	ZHANG	847030	2	The Social Media Maven	911 - 505 - 1161	aaravzhang@example.com	Portland	ME	04101
Lola	FRY	477764	1	The Morning Regular	555 - 729 - 3414	lolafry@example.com	Portsmouth	NH	03801
Emilio	KERR	419941	4	The Freelance Guru	924 - 287 - 1701	emiliokerr@example.com	Boston	MA	02116

# ------------------------------------------------------
# Code for a complete frequency table
# ------------------------------------------------------

FreqTable3 = table(SelectedCustomers$PersonaName)

# Calculate percentages

PercentTable3 <- prop.table(FreqTable3) * 100

# Calculate cumulative sum

CumulativeSum3 <- cumsum(FreqTable3)

# Calculate cumulative percentage

CumulativePercent3 <- cumsum(PercentTable3)

# Combine all the information into a data frame

ResultTable3 <- data.frame(PersonaName = names(FreqTable3), 
                           Frequency = FreqTable3, 
                           Percentage = PercentTable3,
                           CumulativeSum = CumulativeSum3,
                           CumulativePercent = CumulativePercent3)[, c("PersonaName", "Frequency.Freq", "Percentage.Freq", 
                                "CumulativeSum", "CumulativePercent")]

# print(ResultTable3)

knitr::kable(ResultTable3, caption = "Table 3: Persona Names Frequency Distribution:
             Sample Size = 500")

Table 3: Persona Names Frequency Distribution: Sample Size = 500
	PersonaName	Frequency.Freq	Percentage.Freq	CumulativeSum	CumulativePercent
The Freelance Guru	The Freelance Guru	54	10.8	54	10.8
The Health-Conscious Hero	The Health-Conscious Hero	60	12.0	114	22.8
The Indecisive	The Indecisive	19	3.8	133	26.6
The Local Celebrity	The Local Celebrity	41	8.2	174	34.8
The Morning Regular	The Morning Regular	131	26.2	305	61.0
The Penny Pincher	The Penny Pincher	11	2.2	316	63.2
The Social Media Maven	The Social Media Maven	106	21.2	422	84.4
The Studdy Sessioner	The Studdy Sessioner	78	15.6	500	100.0

The random selection was made to ensure a diverse representation of customers across different personas and geographic locations. Random selection through uniform distribution can be advantageous as it helps to avoid potential biases that could arise from a non-random selection process. By randomly selecting customers, it ensures that each customer has an equal chance of being included in the sample, thereby increasing the likelihood of capturing a diverse range of customer profiles. However, it’s important to acknowledge that this choice introduces a uniform distribution bias, potentially impacting the reliability and usefulness of the simulation results. A more balanced sampling approach may be necessary in future iterations to mitigate this bias and ensure a more representative dataset for statistical simulations. But for this statistical simulation, a uniform distribution is used.

Transaction Table

The transaction table serves as a crucial component of this CRM system, capturing essential information about customer transactions to further analyze purchasing patterns, product preferences, and revenue generation. The transaction table’s structure was carefully designed to include key attributes that provide comprehensive insights into customer behavior and transactional dynamics.

The chosen attributes for the transaction table include:

Order ID: An identifier uniquely assigned to each transaction, facilitating tracking and management of individual orders.

Order Date: The date and time when the transaction occurred, allowing for time-related analysis and trend identification.

Order Type: Categorization of the transaction type, such as Dine-In, Delivery, or Take Out, providing insights into customer preferences and service utilization.

Customer Number: A unique identifier linking each transaction to the respective customer, enabling customer focused analysis and segmentation.

Product: Description of the purchased items, including categories such as Coffee Beverages, Non-Coffee Beverages, Tea Beverages, Pastries, and Food, to support in product analysis and inventory management.

Items Purchased: Specific details of the items purchased within each transaction, providing detailed insights into customer preferences and buying behavior.

Item Size: The size or portion of the purchased items, such as 8 oz, 16 oz, 24 oz, or one size, allowing for analysis of size preferences and pricing strategies.

Quantity: The quantity of each item purchased in the transaction, assisting in sales volume analysis and inventory replenishment.

Price Per Unit: The unit price of each item, facilitating calculation of total transaction value and revenue analysis.

Order Total: The total value of the transaction, including all purchased items, providing a comprehensive view of transactional revenue.

Payment Method: The method used by the customer to make payment, such as Cash, Credit Card, or Online Payment, enabling analysis of payment preferences and financial transactions.

These attributes were chosen based on their significance in understanding customer purchasing behavior, analyzing sales performance, and optimizing operational processes. By capturing detailed transactional data, the transaction table enables Cafecito to gain valuable insights into customer preferences, product performance, and revenue generation, ultimately contributing to informed decision-making and strategic planning.

# ------------------------
# Step 3: Generating dates
# ------------------------

# ------------------------------------------------------
# All dates betwen two dates
# ------------------------------------------------------

AllDates <- seq(as.Date("2024-01-01"), as.Date("2024-12-31"), by="day")

# AllDates[1:366]

# length(AllDates)

# ------------------------------------------------------
# Generate multiple random date between two dates
# ------------------------------------------------------

StartDate <- as.Date("2023-01-01")

EndDate <- as.Date("2023-12-31")

RandomDate <- sample(seq(as.Date(StartDate), as.Date(EndDate), by="day"), 10)

# print(RandomDate)

# ------------------------------------------------------
# Function to generate random dates within a year
# This will be used to generate random dates for transactions
# ------------------------------------------------------

GenerateRandomDates <- function(n, StartDate, EndDate) {
  seq_dates <- seq(as.Date(StartDate), as.Date(EndDate), by="day")
  sample(seq_dates, n, replace = TRUE)
}

GenerateRandomDates(25, '2023-01-01', '2023-12-31')

##  [1] "2023-10-05" "2023-04-14" "2023-06-04" "2023-08-17" "2023-03-06"
##  [6] "2023-09-20" "2023-05-21" "2023-09-17" "2023-11-13" "2023-04-02"
## [11] "2023-08-18" "2023-07-27" "2023-06-03" "2023-09-30" "2023-02-26"
## [16] "2023-03-15" "2023-08-02" "2023-11-27" "2023-11-04" "2023-09-17"
## [21] "2023-07-04" "2023-12-05" "2023-11-12" "2023-10-20" "2023-12-31"

# ------------------------------------------------------
# Step 5: Transaction Table
# ------------------------------------------------------

# Generate transaction data

set.seed(1996)

# ----------------------------------------------------------------
# Method 1: Random generation of Order IDs
# ----------------------------------------------------------------

Test1 = data.frame(
  OrderID = c(1:10000)
)

# dim(Test1)

# head(Test1)

# ----------------------------------------------------------------
# Adding Customer IDs to the data frame
# ----------------------------------------------------------------

# Array of customer numbers, from the selected_customers table

CustomerNumbers <- SelectedCustomers$CustomerNumber

# CustomerNumbers[1:10]

# # ----------------------------------------------------------------
# # randomly select customers from the SelectedCustomer table 
# # What does replace = TRUE do?
# # ----------------------------------------------------------------

Test2 = data.frame(
  OrderID = c(1:10000),
  CustomerNumber = sample(SelectedCustomers$CustomerNumber, 10000,replace = TRUE)
)

# dim(Test2)

# head(Test2)

# Merge Test2 with SelectedCustomers to get the PersonaNumber and other details

Test2 <- merge(Test2, SelectedCustomers, by = "CustomerNumber")

# dim(Test2)

# head(Test2)

# Count unique number of customers

# length(unique(Test2$CustomerNumber))

# Customers with the highest number of transactions

sort(table(Test2$CustomerNumber), decreasing = TRUE)[1:10]

## 
## 545845 419941 691605 743992 747200 595426 995549 592638 974217 204593 
##     35     33     33     33     33     32     32     31     31     30

# ----------------------------------------------------------------
# Adding transaction date to the data frame
# ----------------------------------------------------------------

Test3 = data.frame(
  OrderID = c(1:10000),
  CustomerNumber = sample(SelectedCustomers$CustomerNumber, 10000,replace = TRUE),
  OrderDate = GenerateRandomDates(10000, '2023-01-01', '2023-12-31')
  
)

# dim(Test3)

# head(Test3)

# Merge Test3 with SelectedCustomers table

Test3 <- merge(Test3, SelectedCustomers, by = "CustomerNumber")

# dim(Test3)

# head(Test3)

# ----------------------------------------------------------------
# Adding more features to the data frame
# ----------------------------------------------------------------

# Product Details

ProductDetails <- list(
  Coffee_Beverages = c("Espresso", "Americano", "Latte", "Cappuccino", "Mocha", "Macchiato"),
  Tea_Beverages = c("Black Tea", "Green Tea", "Herbal Tea", "Chai Tea", "Matcha Latte", "Iced Tea"),
  Non_Coffee_Beverages = c("Guanabana Juice", "Canelazo", "Colada Morada", "Colada De AVena", 
                           "Chocolate Santafereño", "Chicha Morada"),
  Pastries = c("Arroz con Leche", "Muffin", "Scone", "Cookie", "Concha", "Tres Leches Cake", "Churros", "Ducle de Leche Flan"),
  Food = c("Empanadas", "Bolon", "Buñuelo", "Tamales", "Arepa", "Soup of the Day")
)

# Payment Method

PaymentMethods <- c("Cash", "Credit Card", "Online Payment")

# Add to Test4

Test4 = data.frame(
  OrderID = c(1:10000),
  CustomerNumber = sample(SelectedCustomers$CustomerNumber, 10000,replace = TRUE),
  OrderDate = GenerateRandomDates(10000, '2023-01-01', '2023-12-31'),        
  OrderType = sample(c("Dine-In", "Take Out", "Delivery"), 10000, replace = TRUE),
  Product = sample(c("Coffee_Beverages", "Tea_Beverages", "Non_Coffee_Beverages", "Pastries", "Food"), 10000, replace = TRUE),
  Quantity = sample(1:5, 10000, replace = TRUE),
  PricePerUnit = round(runif(10000,5, 15), 2),
  PaymentMethod = sample(PaymentMethods, 10000, replace = TRUE)
)

# head(Test4)

# Generate random product details for each transaction using the above lists

Test4$ItemsPurchased <- sapply(Test4$Product, function(x) sample(ProductDetails[[x]], 1))

# Calculate OrderTotal

Test4$OrderTotal <- Test4$Quantity * as.numeric(Test4$PricePerUnit)

# Add product sizes column 

ProductSizes <- c("8oz", "16oz", "24oz", "One Size")

# Assign sizes based on product type

Test4$ItemSize <- ifelse(Test4$Product %in% c("Coffee_Beverages", "Tea_Beverages", "Non_Coffee_Beverages"), 
                             sample(ProductSizes[1:3], nrow(Test4), replace = TRUE),
                             ProductSizes[4])

# Merge Test4 with SelectedCustomers

Test4 <- merge(Test4, SelectedCustomers, by = "CustomerNumber")

# Rearrange columns

Test4 <- Test4[, c("CustomerNumber", "FirstName", "LastName", "PersonaNumber", "PersonaName", "OrderID", "OrderDate", "OrderType", "Product", "ItemsPurchased", "ItemSize", "Quantity", "PricePerUnit", "OrderTotal", "PaymentMethod", "PhoneNumber", "EmailAddress", "City", "State", "PostalCode")]

dim(Test4) # Final Transaction Table (Includes Customer Table)

## [1] 10000    20

TransactionTable6 <- head(Test4) # Final Transaction Table (Includes Customer Table) 

knitr::kable(TransactionTable6, caption = " Cafecito Transaction Table")

Cafecito Transaction Table
CustomerNumber	FirstName	LastName	PersonaNumber	PersonaName	OrderID	OrderDate	OrderType	Product	ItemsPurchased	ItemSize	Quantity	PricePerUnit	OrderTotal	PaymentMethod	PhoneNumber	EmailAddress	City	State	PostalCode
101022	Shelby	FOLEY	5	The Health-Conscious Hero	7279	2023-01-31	Dine-In	Food	Bolon	One Size	4	9.74	38.96	Cash	492 - 895 - 2441	shelbyfoley@example.com	Lewiston	ME	04240
101022	Shelby	FOLEY	5	The Health-Conscious Hero	8004	2023-05-31	Delivery	Food	Bolon	One Size	3	8.38	25.14	Online Payment	492 - 895 - 2441	shelbyfoley@example.com	Lewiston	ME	04240
101022	Shelby	FOLEY	5	The Health-Conscious Hero	7783	2023-11-15	Take Out	Non_Coffee_Beverages	Guanabana Juice	16oz	5	7.46	37.30	Online Payment	492 - 895 - 2441	shelbyfoley@example.com	Lewiston	ME	04240
101022	Shelby	FOLEY	5	The Health-Conscious Hero	8206	2023-08-05	Take Out	Coffee_Beverages	Latte	24oz	3	13.22	39.66	Cash	492 - 895 - 2441	shelbyfoley@example.com	Lewiston	ME	04240
101022	Shelby	FOLEY	5	The Health-Conscious Hero	2241	2023-10-04	Dine-In	Tea_Beverages	Green Tea	24oz	5	9.24	46.20	Online Payment	492 - 895 - 2441	shelbyfoley@example.com	Lewiston	ME	04240
101022	Shelby	FOLEY	5	The Health-Conscious Hero	5772	2023-06-26	Take Out	Pastries	Concha	One Size	1	8.47	8.47	Cash	492 - 895 - 2441	shelbyfoley@example.com	Lewiston	ME	04240

# print(Test4)  # Final Transaction Table (Includes Customer Table)

The Cafecito Transaction Table contains 10,000 rows of observation, each representing a unique transaction, and encompasses 20 columns. These columns are a combination of attributes from the Customer Table, capturing essential customer details, and new attributes specific to transactional data. Together, they form a comprehensive dataset that provides insights into customer behavior, transaction dynamics, and product performance. The first 6 rows can be seen in the transaction table below.

Additionally, the 10,000 rows of transactional data were generated at random, ensuring a diverse representation of customer interactions and transaction patterns. This uniform distribution approach is beneficial as it introduces variability and mimics real-world scenarios, allowing for a more realistic simulation of customer behavior. The use of random generation also ensures that each transaction is independent and unbiased, contributing to the statistical reliability of the simulation. Furthermore, by using a uniform distribution, the transactional data reflects a wide range of transaction frequencies and values, enhancing the richness and utility of the statistical stimulation.

Coupon & Offers Table

In developing the coupon & offers table for Cafecito’s promotional activities, several considerations were made to ensure effective management of these campaigns and their impact on customer behavior.

Firstly, seven new columns were introduced to capture essential details of each promotion. These include Coupon Code, Descriptions, Discount Amounts, Is Coupon Given, Is Coupon Redeemed, Coupon Discount Amount, and Coupon Campaign ID. These attributes provide comprehensive information about each promotion, from its unique identifier to its specific discount value and campaign ID.

One crucial aspect of this table is the inclusion of the Is Coupon Given column, which determines whether a coupon is distributed to a customer. A binomial distribution with a probability of 0.5 was chosen for generating values in this column. This probability reflects a balanced approach, where there’s an equal chance of a coupon being given or not. This decision aligns with the aim of reaching a broad customer base while maintaining a manageable distribution of coupons.

The use of a binomial distribution for generating coupon distribution reflects the unpredictability of real-world promotional activities. By simulating coupon distribution based on a probabilistic model, the simulation captures the variability and uncertainty that Cafecito may encounter in its promotional campaigns. This approach allows for the exploration of different scenarios and their potential impact on customer behavior.

# ----------------------------------------------------------------
# Adding coupon related features to the data frame
# ----------------------------------------------------------------

# ------------------------------------------------------
# Coupon Strategy preparation
# ------------------------------------------------------

# Use Test4 dataframe to create Test5 dataframe

Test5 <- Test4

# Add coupon values based on persona

Test5$CouponCode <- paste("CPN", Test5$PersonaNumber, sep = "")

CouponValues <- c(10, 15, 20, 25, 30, 35, 40, 50)

# Assign discount amounts based on persona names

Test5$DiscountAmountPercent <- CouponValues[Test5$PersonaNumber]

# Set IsCouponRedeemed randomly

Test5$IsCouponRedeemed <- sample(c("Yes", "No"), 10000, replace = TRUE)

# use binomial distribution to add a column to show whether a customer is given a coupon or not
# The first line creates a 0 or 1 indicator with a 50% chance (you may change it any other number)
# The second line attaches a coupon value if the first column is a 1. 

Test5$IsCouponGiven <- ifelse(rbinom(10000, 1, 0.5) == 1, "Yes", "No")

Test5$CouponDiscountAmountPercent = ifelse(Test5$IsCouponGiven == "Yes", Test5$DiscountAmountPercent, 0) 

# Create code where if coupon given is yes then coupon redeemed is yes or no, but if coupon given is no then coupon redeemed should be no

Test5$IsCouponRedeemed <- ifelse(Test5$IsCouponGiven == "Yes", 
                                 sample(c("Yes", "No"), length(Test5$IsCouponGiven), replace = TRUE), 
                                 "No")

# Set CouponCampaignID based on persona names

Test5$CouponCampaignID <- paste("CMP", Test5$PersonaNumber, sep = "")

# Set CouponDiscountAmount based on whether the coupon is redeemed

Test5$CouponDiscountAmountPercent <- ifelse(Test5$IsCouponRedeemed == "Yes", Test5$DiscountAmountPercent, 0)

# Define CouponDescriptions

CouponDescriptions <- c(
  "Happy Hour Deal",
  "Referral Reward",
  "Weekend Special",
  "Seasonal Promotion",
  "Holiday Promotion",
  "New Customer Offer",
  "Limited-Time Offer",
  "Buy 1, Get 1 Beverage 50% Off"
)

# Add CouponDescription column based on DiscountAmount

Test5$CouponDescription <- ifelse(Test5$DiscountAmountPercent %in% CouponValues, 
                                  CouponDescriptions[match(Test5$DiscountAmountPercent, CouponValues)], 
                                  "Regular Price")

# Re-arrange columns

Test5 <- Test5[, c("CustomerNumber", "FirstName", "LastName", "PersonaNumber",
                   "PersonaName", "OrderID", "OrderDate", "OrderType", "Product", 
                   "ItemsPurchased", "ItemSize", "Quantity", "PricePerUnit", "OrderTotal", 
                   "PaymentMethod", "CouponCode", "DiscountAmountPercent", "IsCouponGiven", 
                   "IsCouponRedeemed", "CouponDiscountAmountPercent", "CouponCampaignID", 
                   "CouponDescription", "PhoneNumber", "EmailAddress", "City", "State", "PostalCode")]

# Print the updated dataframe

dim(Test5)

## [1] 10000    27

CouponTable6 <- head(Test5) # Final Coupon Table (Includes Customer Table & Transaction Table) 

knitr::kable(CouponTable6, caption = "Cafecito Coupon & Offers Table")  # Final Coupon Table (Includes Customer Table & Transaction Table)

Cafecito Coupon & Offers Table
CustomerNumber	FirstName	LastName	PersonaNumber	PersonaName	OrderID	OrderDate	OrderType	Product	ItemsPurchased	ItemSize	Quantity	PricePerUnit	OrderTotal	PaymentMethod	CouponCode	DiscountAmountPercent	IsCouponGiven	IsCouponRedeemed	CouponDiscountAmountPercent	CouponCampaignID	CouponDescription	PhoneNumber	EmailAddress	City	State	PostalCode
101022	Shelby	FOLEY	5	The Health-Conscious Hero	7279	2023-01-31	Dine-In	Food	Bolon	One Size	4	9.74	38.96	Cash	CPN5	30	No	No	0	CMP5	Holiday Promotion	492 - 895 - 2441	shelbyfoley@example.com	Lewiston	ME	04240
101022	Shelby	FOLEY	5	The Health-Conscious Hero	8004	2023-05-31	Delivery	Food	Bolon	One Size	3	8.38	25.14	Online Payment	CPN5	30	Yes	No	0	CMP5	Holiday Promotion	492 - 895 - 2441	shelbyfoley@example.com	Lewiston	ME	04240
101022	Shelby	FOLEY	5	The Health-Conscious Hero	7783	2023-11-15	Take Out	Non_Coffee_Beverages	Guanabana Juice	16oz	5	7.46	37.30	Online Payment	CPN5	30	No	No	0	CMP5	Holiday Promotion	492 - 895 - 2441	shelbyfoley@example.com	Lewiston	ME	04240
101022	Shelby	FOLEY	5	The Health-Conscious Hero	8206	2023-08-05	Take Out	Coffee_Beverages	Latte	24oz	3	13.22	39.66	Cash	CPN5	30	Yes	Yes	30	CMP5	Holiday Promotion	492 - 895 - 2441	shelbyfoley@example.com	Lewiston	ME	04240
101022	Shelby	FOLEY	5	The Health-Conscious Hero	2241	2023-10-04	Dine-In	Tea_Beverages	Green Tea	24oz	5	9.24	46.20	Online Payment	CPN5	30	No	No	0	CMP5	Holiday Promotion	492 - 895 - 2441	shelbyfoley@example.com	Lewiston	ME	04240
101022	Shelby	FOLEY	5	The Health-Conscious Hero	5772	2023-06-26	Take Out	Pastries	Concha	One Size	1	8.47	8.47	Cash	CPN5	30	Yes	Yes	30	CMP5	Holiday Promotion	492 - 895 - 2441	shelbyfoley@example.com	Lewiston	ME	04240

# print(Test5) # Final Coupon Table (Includes Customer Table & Transaction Table)

Furthermore, most parameters, such as coupon descriptions and discount amounts, were chosen at random (uniform distribution). This uniform distribution approach adds variability to the simulation, imitating the diverse nature of promotional offers in the coffee shop industry. By incorporating randomness into the simulation, the model becomes adaptable to different situations, enhancing its usefulness for statistical stimulation.

The first 6 rows of the Cafecito Coupon & Offers Table can be seen below. This table contains 10,000 rows of observation, each representing a unique transaction, and contains 27 columns. These columns are a combination of attributes from the Customer Table and Transaction Table. Together, they form a comprehensive dataset that provides insights into customer behavior, transaction dynamics, and product performance.

Overall, the selection of a probability of 0.5 for coupon distribution, along with the randomization of other parameters, ensures a realistic and flexible approach to simulating promotional activities for Cafecito. This approach allows for comprehensive exploration of various scenarios and their effects on customer engagement and order patterns, ultimately contributing to the optimization of promotional strategies in the coffee shop franchise.

The Final CRM Table

The merging of all three CRM systems culminates in the creation of the Final_Cafecito_Shop_Dataset, a comprehensive dataset that consolidates customer, transaction, and coupon & offers information into a unified framework. This integrated dataset combines data from the Customer Table, Transaction Table, and Coupon & Offers Table, offering a broader perspective on customer interactions, transactional patterns, and promotional initiatives within the Cafecito coffee shop franchise.

The Final_Cafecito_Shop_Dataset contains a total of 10,000 rows of observations, each representing a unique transaction, and contains 26 columns. These columns include a rich range of attributes, including customer demographics, transaction details, product information, and coupon & offers data. By consolidating these tables, a powerful resource has been created for analyzing customer preferences, identifying trends, and optimizing marketing strategies.

The attributes included in the Final_Cafecito_Shop_Dataset are obtained from the customer, transaction, and coupon & offers tables, ensuring that all relevant information is captured and synthesized in one cohesive dataset. This integration implements seamless analysis and reporting, which enables actionable insights and informed decision making to drive business growth and enhance customer satisfaction.

As a preview, the first six rows of the Final_Cafecito_Shop_Dataset are presented below:

Final_Cafecito_Shop_Dataset <- Test5 # Final dataset with all three tables combined

dim(Final_Cafecito_Shop_Dataset)

## [1] 10000    27

Final_Cafecito_Shop_Dataset6 <- head(Final_Cafecito_Shop_Dataset)

knitr::kable(Final_Cafecito_Shop_Dataset6, caption = "The Final Cafecito Shop Dataset")

The Final Cafecito Shop Dataset
CustomerNumber	FirstName	LastName	PersonaNumber	PersonaName	OrderID	OrderDate	OrderType	Product	ItemsPurchased	ItemSize	Quantity	PricePerUnit	OrderTotal	PaymentMethod	CouponCode	DiscountAmountPercent	IsCouponGiven	IsCouponRedeemed	CouponDiscountAmountPercent	CouponCampaignID	CouponDescription	PhoneNumber	EmailAddress	City	State	PostalCode
101022	Shelby	FOLEY	5	The Health-Conscious Hero	7279	2023-01-31	Dine-In	Food	Bolon	One Size	4	9.74	38.96	Cash	CPN5	30	No	No	0	CMP5	Holiday Promotion	492 - 895 - 2441	shelbyfoley@example.com	Lewiston	ME	04240
101022	Shelby	FOLEY	5	The Health-Conscious Hero	8004	2023-05-31	Delivery	Food	Bolon	One Size	3	8.38	25.14	Online Payment	CPN5	30	Yes	No	0	CMP5	Holiday Promotion	492 - 895 - 2441	shelbyfoley@example.com	Lewiston	ME	04240
101022	Shelby	FOLEY	5	The Health-Conscious Hero	7783	2023-11-15	Take Out	Non_Coffee_Beverages	Guanabana Juice	16oz	5	7.46	37.30	Online Payment	CPN5	30	No	No	0	CMP5	Holiday Promotion	492 - 895 - 2441	shelbyfoley@example.com	Lewiston	ME	04240
101022	Shelby	FOLEY	5	The Health-Conscious Hero	8206	2023-08-05	Take Out	Coffee_Beverages	Latte	24oz	3	13.22	39.66	Cash	CPN5	30	Yes	Yes	30	CMP5	Holiday Promotion	492 - 895 - 2441	shelbyfoley@example.com	Lewiston	ME	04240
101022	Shelby	FOLEY	5	The Health-Conscious Hero	2241	2023-10-04	Dine-In	Tea_Beverages	Green Tea	24oz	5	9.24	46.20	Online Payment	CPN5	30	No	No	0	CMP5	Holiday Promotion	492 - 895 - 2441	shelbyfoley@example.com	Lewiston	ME	04240
101022	Shelby	FOLEY	5	The Health-Conscious Hero	5772	2023-06-26	Take Out	Pastries	Concha	One Size	1	8.47	8.47	Cash	CPN5	30	Yes	Yes	30	CMP5	Holiday Promotion	492 - 895 - 2441	shelbyfoley@example.com	Lewiston	ME	04240

write.csv(Final_Cafecito_Shop_Dataset, "Final_Cafecito_Shop_Dataset.csv", row.names = FALSE)

Furthermore, the creation of the Final_Cafecito_Shop_Dataset represents a significant milestone in this CRM system development, providing a comprehensive and unified view of customer interactions and transactional dynamics. This dataset serves as a valuable asset for strategic decision-making, customer segmentation, and targeted marketing efforts, empowering Cafecito to deliver exceptional experiences and drive success in the competitive coffee shop industry.

Conclusion

The development of a comprehensive Customer Relationship Management (CRM) system for Cafecito has been a strategic endeavor to gain a better understanding of the potential customer base and optimizing marketing strategies. Through meticulous design and integration of customer, transaction, and promotional data, a powerful resource for analyzing customer interactions and transactional dynamics was created, Final_Cafecito_Shop_Dataset. This dataset, which contains 10,000 rows of observations and 27 columns, empowers Cafecito with actionable insights for strategic decision-making and targeted marketing efforts. By utilizing the insights gained from this CRM system, Cafecito is ready to deliver exceptional experiences and drive success in the competitive coffee shop industry. The development of a CRM system that aims to drive growth, enhance customer satisfaction, and solidify Cafecito’s position as a leading coffee shop franchise.

Resources

Bluman, A. (2018). Elementary statistics: A step by step approach (10th ed.). McGraw Hill. Goodreads. (n.d.).

Kabacoff, R.I. (2022). R in action: Data analysis and graphics with R and tidyverse (3rd edition).