Business Description

Cafecito, an embodiment of Latin American heritage and flavors, stands as a beacon of warmth and community in the bustling streets of New England’s vibrant cities. Cafecito, a small franchise chain, is driven by a passion for delivering an authentic coffee experience dedicated to providing a haven where customers can savor the rich aromas and flavors of carefully crafted beverages.

In the heart of each neighborhood, Cafecito establishes its presence with a network of boutique coffee shops, each radiating a cozy ambiance that invites patrons to linger and connect. Currently, Cafecito has 10 locations across New England, including Biddeford, Lewiston, Portland and Westbrook in Maine, Portsmouth in New Hampshire, and Boston in Massachusetts, strategically positioned to cater to diverse communities.

At Cafecito, the menu is a celebration of Latin American coffee traditions, offering a delightful array of specialty coffees, including Ecuadorian delights like canelazo, colada morada, and colada de avena, alongside refreshing guanabana juice. Colombian hot chocolate like chocolate santafereño and Peruvian specialties such as chicha morada add an exotic flair to our offerings.

The food menu is equally enticing, featuring traditional delights from across Latin America. Indulge in savory bolon, crispy buñuelo, flavorful empanadas, and hearty tamales, each small plate is a testament to the rich culinary heritage of the region.

The target customer demographic spans across various age groups, united by a love for exceptional coffee and a desire for genuine connection. From busy professionals seeking a morning pick-me-up to students craving a cozy study spot, Cafecito welcomes all who seek a moment of respite from the daily hustle.

As Cafecito embarks on its journey to revolutionize the coffee culture, it transcends being simply a place to enjoy a cup of coffee; it becomes a destination where relationships flourish, stories are shared, and memories are made. Join in embracing the essence of Latin hospitality, one sip at a time. And by adding sazon to your life!

# Week 2

# Importing and preparing the dataset 

CustomerDF <- read_csv("unique_customers.csv")
## Rows: 25000 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): FirstName, LastName, PhoneNumber, EmailAddress, City
## dbl (3): CustomerNumber, PersonaNumber, PostalCode
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Clean names

clean_names(CustomerDF)

# Learn about the dataset

dim(CustomerDF) # checking dimensions of the dataset

summary(CustomerDF)

# Checking the structure of the dataset 

str(CustomerDF)

head(CustomerDF)

print(CustomerDF)

CRM Data Structure & Rationale

In the development process of a Customer Relationship Management (CRM) system for Cafecito, the objective is to deepen the understanding of the potential customer base. The task entails generating a comprehensive dataset that accurately reflects realistic customer interactions, transaction patterns, and marketing strategies.

The CRM system is structured around three primary tables: Customer Table, Transaction Table, and Coupon & Offers Table. These tables are merged into a final data frame named Final_Cafecito_Shop_Dataset, providing a consolidated perspective of customer data and interactions.

The foundational data for the CRM system is sourced from a customer data frame consisting of 25,000 observations and 10 variables, incorporating critical details such as First Name, Last Name, Customer Number, Persona Number, Contact Information (Phone Number, Email Address), City, and Postal Code. This dataset serves as the basis for simulating various customer interactions, transactional behaviors, and marketing endeavors.

The customer table will serve as an archive for detailed customer information, facilitating the tracking of individual preferences, demographics, and contact details. Concurrently, the Transaction Table will record transactional data, including order specifics, purchase history, and payment methods, allowing for in-depth analysis of buying patterns and trends.

Additionally, the coupon and offers table will organize promotional activities, managing coupon codes, descriptions, discount values, etc. Through the implementation of constraints such as conditional coupon values and targeted marketing strategies, the objective is to optimize customer engagement and retention.

Upon integration into a unified dataset, these tables will provide valuable insights into customer behavior, preferences, and loyalty. This comprehensive perspective will enable Cafecito to tailor its products, services, and marketing strategies to effectively serve and engage with its diverse customer base.

Customer Table

The customer table serves as a cornerstone of our CRM system, containing crucial information about each customer and providing insights for effective customer engagement. The structure of the customer table was carefully designed based on the analysis of the customer dataset and considerations of Cafecito’s business objectives.

Before proceeding with the customer table’s creation, an exploration of the frequency distribution of the persona number was conducted (Table 1). This analysis revealed a disproportionate distribution of persona numbers, indicating the need for further exploration and potential adjustments to ensure representation in customer segmentation.

To enhance the understanding of the persona numbers and align them with the clientele Cafecito aims to attract, names were assigned to each persona number (Table 2). These persona names include descriptors such as The Morning Regular, The Social Media Maven, The Health-Conscious Hero, The Studdy Sessioner, The Freelance Guru, The Local Celebrity, The Indecisive, and The Penny Pincher, reflecting different customer profiles and behaviors.

# ------------------------------------------------------
# Code for a complete frequency table
# ------------------------------------------------------

FreqTable = table(CustomerDF$PersonaNumber)

# print(FreqTable)

# Calculate percentages

PercentTable <- prop.table(FreqTable) * 100

# print(PercentTable)

# Calculate cumulative sum

CumulativeSum <- cumsum(FreqTable)

# print(CumulativeSum)

# Calculate cumulative percentage

CumulativePercentage <- cumsum(PercentTable)

# print(CumulativePercentage)

# Combine all the information into a data frame

ResultTable <- data.frame(PersonaID = names(FreqTable), 
                           Frequency = FreqTable, 
                           Percentage = PercentTable,
                           CumulativeSum = CumulativeSum,
                           CumulativePercent = CumulativePercentage)[, c("PersonaID", "Frequency.Freq", "Percentage.Freq", 
                                "CumulativeSum", "CumulativePercent")]


# print(ResultTable) 

knitr::kable(ResultTable, caption = "Table 1: Persona ID Frequency Distribution")
Table 1: Persona ID Frequency Distribution
PersonaID Frequency.Freq Percentage.Freq CumulativeSum CumulativePercent
1 7033 28.132 7033 28.132
2 4926 19.704 11959 47.836
3 4311 17.244 16270 65.080
4 2922 11.688 19192 76.768
5 2518 10.072 21710 86.840
6 1775 7.100 23485 93.940
7 987 3.948 24472 97.888
8 528 2.112 25000 100.000
# ---------------------------------------------------------------------------
# The above table shows that the personas are disproportionately distributed
# ---------------------------------------------------------------------------

In the creation of the customer table, several attributes were made to enrich the dataset and align it with the business needs. These additions include the inclusion of columns for persona names, replacing the original city column with specific city names corresponding to the locations of Cafecito’s franchise (Biddeford, Lewiston, Portland, Westbrook, Portsmouth, and Boston), and the addition of a state column to indicate the location of each franchise.

Furthermore, the postal codes in the original customer dataset were updated to match the city locations of the franchise outlets. This ensures accurate geographic representation within the dataset, enabling targeted marketing efforts and localized customer engagement strategies.

Following these changes, a random selection of 500 customers was extracted from the customer data frame and stored into a new data frame,SelectedCustomers. The customer table has 500 observations with 10 columns (variables). Additionally, Table 3 shows the frequency distribution of persona names, totaling 500 customers. This distribution illustrates the allocation of customers across different persona names, providing insights into the composition of the selected customer sample.

# Creating a customer table relevant to my business franchise 

# ------------------------------------------------------
# Giving names to personas and adding them to CustomerDF
# ------------------------------------------------------

PersonaNames <- c("The Morning Regular", "The Social Media Maven", "The Studdy Sessioner", "The Freelance Guru", "The Health-Conscious Hero", "The Local Celebrity", "The Indecisive", "The Penny Pincher")

# print(PersonaNames)

CustomerDF$PersonaName <- PersonaNames[CustomerDF$PersonaNumber]

# ------------------------------------------------------
# Code for a complete frequency table of persona names
# ------------------------------------------------------

FreqTable2 = table(CustomerDF$PersonaName)

# print(FreqTable2)

# Calculate percentages

PercentTable2 <- prop.table(FreqTable2) * 100

# print(PercentTable2)

# Calculate cumulative sum

CumulativeSum2 <- cumsum(FreqTable2)

# print(CumulativeSum2)

# Calculate cumulative percentage

CumulativePercent2 <- cumsum(PercentTable2)

# print(CumulativePercent2)

# Combine all the information into a data frame

ResultTable2 <- data.frame(PersonaName = names(FreqTable2), 
                           Frequency = FreqTable2, 
                           Percentage = PercentTable2,
                           CumulativeSum = CumulativeSum2,
                           CumulativePercent = CumulativePercent2)[, c("PersonaName", "Frequency.Freq", "Percentage.Freq", 
                                "CumulativeSum", "CumulativePercent")]

# print(ResultTable2)

knitr::kable(ResultTable2, caption = "Table 2: Persona Names Frequency Distribution")
Table 2: Persona Names Frequency Distribution
PersonaName Frequency.Freq Percentage.Freq CumulativeSum CumulativePercent
The Freelance Guru The Freelance Guru 2922 11.688 2922 11.688
The Health-Conscious Hero The Health-Conscious Hero 2518 10.072 5440 21.760
The Indecisive The Indecisive 987 3.948 6427 25.708
The Local Celebrity The Local Celebrity 1775 7.100 8202 32.808
The Morning Regular The Morning Regular 7033 28.132 15235 60.940
The Penny Pincher The Penny Pincher 528 2.112 15763 63.052
The Social Media Maven The Social Media Maven 4926 19.704 20689 82.756
The Studdy Sessioner The Studdy Sessioner 4311 17.244 25000 100.000
# ------------------------------------------------------
# Giving names to "city" and adding them to CustomerDF
# ------------------------------------------------------

CityFreqTable = table(CustomerDF$City)

# print(CityFreqTable)

CityNames <- c(A = "Biddeford", B = "Lewiston", 
               C = "Portland", D = "Westbrook", 
               E = "Portsmouth", F = "Boston")

# print(CityNames)

CustomerDF$City <- CityNames[CustomerDF$City]

# ------------------------------------------------------
# Adding "State" column to CustomerDF
# ------------------------------------------------------

# Define a named vector mapping cities to states

StateMapping <- c(Biddeford = "ME", Lewiston = "ME", Portland = "ME", 
                  Westbrook = "ME", Portsmouth = "NH", Boston = "MA")

# Add State column based on city mapping

CustomerDF$State <- StateMapping[CustomerDF$City]

# Organize customerDF column

CustomerDF <- CustomerDF[c("FirstName", "LastName", "CustomerNumber",
                                                      "PersonaNumber",
                                                       "PersonaName",
                                                       "PhoneNumber",
                                                        "EmailAddress",
                                                        "City",
                                                        "State",
                                                        "PostalCode")]

# --------------------------------------------------------------------
# Giving accurate Postal Codes to match city and adding them to CustomerDF
# --------------------------------------------------------------------

# Define a named vector mapping cities to postal codes

PostalCodeMapping <- c(Biddeford = "04005", Lewiston = "04240", 
                       Portland = "04101", Westbrook = "04092", 
                       Portsmouth = "03801", Boston = "02116")

# print(PostalCodeMapping)

# Add PostalCode column based on city mapping

CustomerDF$PostalCode <- PostalCodeMapping[CustomerDF$City]

# ---------------------------------
# Step 2: Select n random customers
# ---------------------------------

set.seed(1996) # Set the seed for reproducibility

SelectedCustomers <- sample_n(CustomerDF, 500)

# head(SelectedCustomers)

dim(SelectedCustomers)  # Final Customer Table
## [1] 500  10
# print(SelectedCustomers) # Final Customer Table

CustomerTable6 <- head(SelectedCustomers)

knitr::kable(CustomerTable6, caption = "Cafecito Selected Customer Table")
Cafecito Selected Customer Table
FirstName LastName CustomerNumber PersonaNumber PersonaName PhoneNumber EmailAddress City State PostalCode
James HINES 686761 2 The Social Media Maven 401 - 312 - 9023 Biddeford ME 04005
Katherine BRUCE 717779 2 The Social Media Maven 653 - 869 - 6422 Boston MA 02116
Lorelei FROST 361593 1 The Morning Regular 227 - 599 - 6259 Lewiston ME 04240
Aarav ZHANG 847030 2 The Social Media Maven 911 - 505 - 1161 Portland ME 04101
Lola FRY 477764 1 The Morning Regular 555 - 729 - 3414 Portsmouth NH 03801
Emilio KERR 419941 4 The Freelance Guru 924 - 287 - 1701 Boston MA 02116
# ------------------------------------------------------
# Code for a complete frequency table
# ------------------------------------------------------

FreqTable3 = table(SelectedCustomers$PersonaName)

# Calculate percentages

PercentTable3 <- prop.table(FreqTable3) * 100

# Calculate cumulative sum

CumulativeSum3 <- cumsum(FreqTable3)

# Calculate cumulative percentage

CumulativePercent3 <- cumsum(PercentTable3)

# Combine all the information into a data frame

ResultTable3 <- data.frame(PersonaName = names(FreqTable3), 
                           Frequency = FreqTable3, 
                           Percentage = PercentTable3,
                           CumulativeSum = CumulativeSum3,
                           CumulativePercent = CumulativePercent3)[, c("PersonaName", "Frequency.Freq", "Percentage.Freq", 
                                "CumulativeSum", "CumulativePercent")]

# print(ResultTable3)

knitr::kable(ResultTable3, caption = "Table 3: Persona Names Frequency Distribution:
             Sample Size = 500")
Table 3: Persona Names Frequency Distribution: Sample Size = 500
PersonaName Frequency.Freq Percentage.Freq CumulativeSum CumulativePercent
The Freelance Guru The Freelance Guru 54 10.8 54 10.8
The Health-Conscious Hero The Health-Conscious Hero 60 12.0 114 22.8
The Indecisive The Indecisive 19 3.8 133 26.6
The Local Celebrity The Local Celebrity 41 8.2 174 34.8
The Morning Regular The Morning Regular 131 26.2 305 61.0
The Penny Pincher The Penny Pincher 11 2.2 316 63.2
The Social Media Maven The Social Media Maven 106 21.2 422 84.4
The Studdy Sessioner The Studdy Sessioner 78 15.6 500 100.0

The random selection was made to ensure a diverse representation of customers across different personas and geographic locations. Random selection through uniform distribution can be advantageous as it helps to avoid potential biases that could arise from a non-random selection process. By randomly selecting customers, it ensures that each customer has an equal chance of being included in the sample, thereby increasing the likelihood of capturing a diverse range of customer profiles. However, it’s important to acknowledge that this choice introduces a uniform distribution bias, potentially impacting the reliability and usefulness of the simulation results. A more balanced sampling approach may be necessary in future iterations to mitigate this bias and ensure a more representative dataset for statistical simulations. But for this statistical simulation, a uniform distribution is used.

Transaction Table

The transaction table serves as a crucial component of this CRM system, capturing essential information about customer transactions to further analyze purchasing patterns, product preferences, and revenue generation. The transaction table’s structure was carefully designed to include key attributes that provide comprehensive insights into customer behavior and transactional dynamics. 

The chosen attributes for the transaction table include: 

  1. Order ID: An identifier uniquely assigned to each transaction, facilitating tracking and management of individual orders. 
  1. Order Date: The date and time when the transaction occurred, allowing for time-related analysis and trend identification. 
  1. Order Type: Categorization of the transaction type, such as Dine-In, Delivery, or Take Out, providing insights into customer preferences and service utilization. 
  1. Customer Number: A unique identifier linking each transaction to the respective customer, enabling customer focused analysis and segmentation. 
  1. Product: Description of the purchased items, including categories such as Coffee Beverages, Non-Coffee Beverages, Tea Beverages, Pastries, and Food, to support in product analysis and inventory management. 
  1. Items Purchased: Specific details of the items purchased within each transaction, providing detailed insights into customer preferences and buying behavior. 
  1. Item Size: The size or portion of the purchased items, such as 8 oz, 16 oz, 24 oz, or one size, allowing for analysis of size preferences and pricing strategies. 
  1. Quantity: The quantity of each item purchased in the transaction, assisting in sales volume analysis and inventory replenishment. 
  1. Price Per Unit: The unit price of each item, facilitating calculation of total transaction value and revenue analysis. 
  1. Order Total: The total value of the transaction, including all purchased items, providing a comprehensive view of transactional revenue. 
  1. Payment Method: The method used by the customer to make payment, such as Cash, Credit Card, or Online Payment, enabling analysis of payment preferences and financial transactions. 

These attributes were chosen based on their significance in understanding customer purchasing behavior, analyzing sales performance, and optimizing operational processes. By capturing detailed transactional data, the transaction table enables Cafecito to gain valuable insights into customer preferences, product performance, and revenue generation, ultimately contributing to informed decision-making and strategic planning. 

# ------------------------
# Step 3: Generating dates
# ------------------------

# ------------------------------------------------------
# All dates betwen two dates
# ------------------------------------------------------

AllDates <- seq(as.Date("2024-01-01"), as.Date("2024-12-31"), by="day")

# AllDates[1:366]

# length(AllDates)

# ------------------------------------------------------
# Generate multiple random date between two dates
# ------------------------------------------------------

StartDate <- as.Date("2023-01-01")

EndDate <- as.Date("2023-12-31")

RandomDate <- sample(seq(as.Date(StartDate), as.Date(EndDate), by="day"), 10)

# print(RandomDate)

# ------------------------------------------------------
# Function to generate random dates within a year
# This will be used to generate random dates for transactions
# ------------------------------------------------------

GenerateRandomDates <- function(n, StartDate, EndDate) {
  seq_dates <- seq(as.Date(StartDate), as.Date(EndDate), by="day")
  sample(seq_dates, n, replace = TRUE)
}

GenerateRandomDates(25, '2023-01-01', '2023-12-31')
##  [1] "2023-10-05" "2023-04-14" "2023-06-04" "2023-08-17" "2023-03-06"
##  [6] "2023-09-20" "2023-05-21" "2023-09-17" "2023-11-13" "2023-04-02"
## [11] "2023-08-18" "2023-07-27" "2023-06-03" "2023-09-30" "2023-02-26"
## [16] "2023-03-15" "2023-08-02" "2023-11-27" "2023-11-04" "2023-09-17"
## [21] "2023-07-04" "2023-12-05" "2023-11-12" "2023-10-20" "2023-12-31"
# ------------------------------------------------------
# Step 5: Transaction Table
# ------------------------------------------------------

# Generate transaction data

set.seed(1996)

# ----------------------------------------------------------------
# Method 1: Random generation of Order IDs
# ----------------------------------------------------------------

Test1 = data.frame(
  OrderID = c(1:10000)
)

# dim(Test1)

# head(Test1)

# ----------------------------------------------------------------
# Adding Customer IDs to the data frame
# ----------------------------------------------------------------

# Array of customer numbers, from the selected_customers table

CustomerNumbers <- SelectedCustomers$CustomerNumber

# CustomerNumbers[1:10]

# # ----------------------------------------------------------------
# # randomly select customers from the SelectedCustomer table 
# # What does replace = TRUE do?
# # ----------------------------------------------------------------

Test2 = data.frame(
  OrderID = c(1:10000),
  CustomerNumber = sample(SelectedCustomers$CustomerNumber, 10000,replace = TRUE)
)

# dim(Test2)

# head(Test2)

# Merge Test2 with SelectedCustomers to get the PersonaNumber and other details

Test2 <- merge(Test2, SelectedCustomers, by = "CustomerNumber")

# dim(Test2)

# head(Test2)

# Count unique number of customers

# length(unique(Test2$CustomerNumber))

# Customers with the highest number of transactions

sort(table(Test2$CustomerNumber), decreasing = TRUE)[1:10]
## 
## 545845 419941 691605 743992 747200 595426 995549 592638 974217 204593 
##     35     33     33     33     33     32     32     31     31     30
# ----------------------------------------------------------------
# Adding transaction date to the data frame
# ----------------------------------------------------------------

Test3 = data.frame(
  OrderID = c(1:10000),
  CustomerNumber = sample(SelectedCustomers$CustomerNumber, 10000,replace = TRUE),
  OrderDate = GenerateRandomDates(10000, '2023-01-01', '2023-12-31')
  
)

# dim(Test3)

# head(Test3)

# Merge Test3 with SelectedCustomers table

Test3 <- merge(Test3, SelectedCustomers, by = "CustomerNumber")

# dim(Test3)

# head(Test3)

# ----------------------------------------------------------------
# Adding more features to the data frame
# ----------------------------------------------------------------

# Product Details

ProductDetails <- list(
  Coffee_Beverages = c("Espresso", "Americano", "Latte", "Cappuccino", "Mocha", "Macchiato"),
  Tea_Beverages = c("Black Tea", "Green Tea", "Herbal Tea", "Chai Tea", "Matcha Latte", "Iced Tea"),
  Non_Coffee_Beverages = c("Guanabana Juice", "Canelazo", "Colada Morada", "Colada De AVena", 
                           "Chocolate Santafereño", "Chicha Morada"),
  Pastries = c("Arroz con Leche", "Muffin", "Scone", "Cookie", "Concha", "Tres Leches Cake", "Churros", "Ducle de Leche Flan"),
  Food = c("Empanadas", "Bolon", "Buñuelo", "Tamales", "Arepa", "Soup of the Day")
)

# Payment Method

PaymentMethods <- c("Cash", "Credit Card", "Online Payment")

# Add to Test4

Test4 = data.frame(
  OrderID = c(1:10000),
  CustomerNumber = sample(SelectedCustomers$CustomerNumber, 10000,replace = TRUE),
  OrderDate = GenerateRandomDates(10000, '2023-01-01', '2023-12-31'),        
  OrderType = sample(c("Dine-In", "Take Out", "Delivery"), 10000, replace = TRUE),
  Product = sample(c("Coffee_Beverages", "Tea_Beverages", "Non_Coffee_Beverages", "Pastries", "Food"), 10000, replace = TRUE),
  Quantity = sample(1:5, 10000, replace = TRUE),
  PricePerUnit = round(runif(10000,5, 15), 2),
  PaymentMethod = sample(PaymentMethods, 10000, replace = TRUE)
)

# head(Test4)

# Generate random product details for each transaction using the above lists

Test4$ItemsPurchased <- sapply(Test4$Product, function(x) sample(ProductDetails[[x]], 1))

# Calculate OrderTotal

Test4$OrderTotal <- Test4$Quantity * as.numeric(Test4$PricePerUnit)

# Add product sizes column 

ProductSizes <- c("8oz", "16oz", "24oz", "One Size")

# Assign sizes based on product type

Test4$ItemSize <- ifelse(Test4$Product %in% c("Coffee_Beverages", "Tea_Beverages", "Non_Coffee_Beverages"), 
                             sample(ProductSizes[1:3], nrow(Test4), replace = TRUE),
                             ProductSizes[4])

# Merge Test4 with SelectedCustomers

Test4 <- merge(Test4, SelectedCustomers, by = "CustomerNumber")

# Rearrange columns

Test4 <- Test4[, c("CustomerNumber", "FirstName", "LastName", "PersonaNumber", "PersonaName", "OrderID", "OrderDate", "OrderType", "Product", "ItemsPurchased", "ItemSize", "Quantity", "PricePerUnit", "OrderTotal", "PaymentMethod", "PhoneNumber", "EmailAddress", "City", "State", "PostalCode")]

dim(Test4) # Final Transaction Table (Includes Customer Table) 
## [1] 10000    20
TransactionTable6 <- head(Test4) # Final Transaction Table (Includes Customer Table) 

knitr::kable(TransactionTable6, caption = " Cafecito Transaction Table")
Cafecito Transaction Table
CustomerNumber FirstName LastName PersonaNumber PersonaName OrderID OrderDate OrderType Product ItemsPurchased ItemSize Quantity PricePerUnit OrderTotal PaymentMethod PhoneNumber EmailAddress City State PostalCode
101022 Shelby FOLEY 5 The Health-Conscious Hero 7279 2023-01-31 Dine-In Food Bolon One Size 4 9.74 38.96 Cash 492 - 895 - 2441 Lewiston ME 04240
101022 Shelby FOLEY 5 The Health-Conscious Hero 8004 2023-05-31 Delivery Food Bolon One Size 3 8.38 25.14 Online Payment 492 - 895 - 2441 Lewiston ME 04240
101022 Shelby FOLEY 5 The Health-Conscious Hero 7783 2023-11-15 Take Out Non_Coffee_Beverages Guanabana Juice 16oz 5 7.46 37.30 Online Payment 492 - 895 - 2441 Lewiston ME 04240
101022 Shelby FOLEY 5 The Health-Conscious Hero 8206 2023-08-05 Take Out Coffee_Beverages Latte 24oz 3 13.22 39.66 Cash 492 - 895 - 2441 Lewiston ME 04240
101022 Shelby FOLEY 5 The Health-Conscious Hero 2241 2023-10-04 Dine-In Tea_Beverages Green Tea 24oz 5 9.24 46.20 Online Payment 492 - 895 - 2441 Lewiston ME 04240
101022 Shelby FOLEY 5 The Health-Conscious Hero 5772 2023-06-26 Take Out Pastries Concha One Size 1 8.47 8.47 Cash 492 - 895 - 2441 Lewiston ME 04240
# print(Test4)  # Final Transaction Table (Includes Customer Table) 

The Cafecito Transaction Table contains 10,000 rows of observation, each representing a unique transaction, and encompasses 20 columns. These columns are a combination of attributes from the Customer Table, capturing essential customer details, and new attributes specific to transactional data. Together, they form a comprehensive dataset that provides insights into customer behavior, transaction dynamics, and product performance. The first 6 rows can be seen in the transaction table below.  

Additionally, the 10,000 rows of transactional data were generated at random, ensuring a diverse representation of customer interactions and transaction patterns. This uniform distribution approach is beneficial as it introduces variability and mimics real-world scenarios, allowing for a more realistic simulation of customer behavior. The use of random generation also ensures that each transaction is independent and unbiased, contributing to the statistical reliability of the simulation. Furthermore, by using a uniform distribution, the transactional data reflects a wide range of transaction frequencies and values, enhancing the richness and utility of the statistical stimulation. 

Coupon & Offers Table

In developing the coupon & offers table for Cafecito’s promotional activities, several considerations were made to ensure effective management of these campaigns and their impact on customer behavior. 

Firstly, seven new columns were introduced to capture essential details of each promotion. These include Coupon Code, Descriptions, Discount Amounts, Is Coupon Given, Is Coupon Redeemed, Coupon Discount Amount, and Coupon Campaign ID. These attributes provide comprehensive information about each promotion, from its unique identifier to its specific discount value and campaign ID. 

One crucial aspect of this table is the inclusion of the Is Coupon Given column, which determines whether a coupon is distributed to a customer. A binomial distribution with a probability of 0.5 was chosen for generating values in this column. This probability reflects a balanced approach, where there’s an equal chance of a coupon being given or not. This decision aligns with the aim of reaching a broad customer base while maintaining a manageable distribution of coupons. 

The use of a binomial distribution for generating coupon distribution reflects the unpredictability of real-world promotional activities. By simulating coupon distribution based on a probabilistic model, the simulation captures the variability and uncertainty that Cafecito may encounter in its promotional campaigns. This approach allows for the exploration of different scenarios and their potential impact on customer behavior. 

# ----------------------------------------------------------------
# Adding coupon related features to the data frame
# ----------------------------------------------------------------

# ------------------------------------------------------
# Coupon Strategy preparation
# ------------------------------------------------------

# Use Test4 dataframe to create Test5 dataframe

Test5 <- Test4

# Add coupon values based on persona

Test5$CouponCode <- paste("CPN", Test5$PersonaNumber, sep = "")

CouponValues <- c(10, 15, 20, 25, 30, 35, 40, 50)

# Assign discount amounts based on persona names

Test5$DiscountAmountPercent <- CouponValues[Test5$PersonaNumber]

# Set IsCouponRedeemed randomly

Test5$IsCouponRedeemed <- sample(c("Yes", "No"), 10000, replace = TRUE)

# use binomial distribution to add a column to show whether a customer is given a coupon or not
# The first line creates a 0 or 1 indicator with a 50% chance (you may change it any other number)
# The second line attaches a coupon value if the first column is a 1. 

Test5$IsCouponGiven <- ifelse(rbinom(10000, 1, 0.5) == 1, "Yes", "No")

Test5$CouponDiscountAmountPercent = ifelse(Test5$IsCouponGiven == "Yes", Test5$DiscountAmountPercent, 0) 

# Create code where if coupon given is yes then coupon redeemed is yes or no, but if coupon given is no then coupon redeemed should be no

Test5$IsCouponRedeemed <- ifelse(Test5$IsCouponGiven == "Yes", 
                                 sample(c("Yes", "No"), length(Test5$IsCouponGiven), replace = TRUE), 
                                 "No")

# Set CouponCampaignID based on persona names

Test5$CouponCampaignID <- paste("CMP", Test5$PersonaNumber, sep = "")

# Set CouponDiscountAmount based on whether the coupon is redeemed

Test5$CouponDiscountAmountPercent <- ifelse(Test5$IsCouponRedeemed == "Yes", Test5$DiscountAmountPercent, 0)

# Define CouponDescriptions

CouponDescriptions <- c(
  "Happy Hour Deal",
  "Referral Reward",
  "Weekend Special",
  "Seasonal Promotion",
  "Holiday Promotion",
  "New Customer Offer",
  "Limited-Time Offer",
  "Buy 1, Get 1 Beverage 50% Off"
)

# Add CouponDescription column based on DiscountAmount

Test5$CouponDescription <- ifelse(Test5$DiscountAmountPercent %in% CouponValues, 
                                  CouponDescriptions[match(Test5$DiscountAmountPercent, CouponValues)], 
                                  "Regular Price")

# Re-arrange columns

Test5 <- Test5[, c("CustomerNumber", "FirstName", "LastName", "PersonaNumber",
                   "PersonaName", "OrderID", "OrderDate", "OrderType", "Product", 
                   "ItemsPurchased", "ItemSize", "Quantity", "PricePerUnit", "OrderTotal", 
                   "PaymentMethod", "CouponCode", "DiscountAmountPercent", "IsCouponGiven", 
                   "IsCouponRedeemed", "CouponDiscountAmountPercent", "CouponCampaignID", 
                   "CouponDescription", "PhoneNumber", "EmailAddress", "City", "State", "PostalCode")]

# Print the updated dataframe

dim(Test5)
## [1] 10000    27
CouponTable6 <- head(Test5) # Final Coupon Table (Includes Customer Table & Transaction Table) 

knitr::kable(CouponTable6, caption = "Cafecito Coupon & Offers Table")  # Final Coupon Table (Includes Customer Table & Transaction Table) 
Cafecito Coupon & Offers Table
CustomerNumber FirstName LastName PersonaNumber PersonaName OrderID OrderDate OrderType Product ItemsPurchased ItemSize Quantity PricePerUnit OrderTotal PaymentMethod CouponCode DiscountAmountPercent IsCouponGiven IsCouponRedeemed CouponDiscountAmountPercent CouponCampaignID CouponDescription PhoneNumber EmailAddress City State PostalCode
101022 Shelby FOLEY 5 The Health-Conscious Hero 7279 2023-01-31 Dine-In Food Bolon One Size 4 9.74 38.96 Cash CPN5 30 No No 0 CMP5 Holiday Promotion 492 - 895 - 2441 Lewiston ME 04240
101022 Shelby FOLEY 5 The Health-Conscious Hero 8004 2023-05-31 Delivery Food Bolon One Size 3 8.38 25.14 Online Payment CPN5 30 Yes No 0 CMP5 Holiday Promotion 492 - 895 - 2441 Lewiston ME 04240
101022 Shelby FOLEY 5 The Health-Conscious Hero 7783 2023-11-15 Take Out Non_Coffee_Beverages Guanabana Juice 16oz 5 7.46 37.30 Online Payment CPN5 30 No No 0 CMP5 Holiday Promotion 492 - 895 - 2441 Lewiston ME 04240
101022 Shelby FOLEY 5 The Health-Conscious Hero 8206 2023-08-05 Take Out Coffee_Beverages Latte 24oz 3 13.22 39.66 Cash CPN5 30 Yes Yes 30 CMP5 Holiday Promotion 492 - 895 - 2441 Lewiston ME 04240
101022 Shelby FOLEY 5 The Health-Conscious Hero 2241 2023-10-04 Dine-In Tea_Beverages Green Tea 24oz 5 9.24 46.20 Online Payment CPN5 30 No No 0 CMP5 Holiday Promotion 492 - 895 - 2441 Lewiston ME 04240
101022 Shelby FOLEY 5 The Health-Conscious Hero 5772 2023-06-26 Take Out Pastries Concha One Size 1 8.47 8.47 Cash CPN5 30 Yes Yes 30 CMP5 Holiday Promotion 492 - 895 - 2441 Lewiston ME 04240
# print(Test5) # Final Coupon Table (Includes Customer Table & Transaction Table) 

Furthermore, most parameters, such as coupon descriptions and discount amounts, were chosen at random (uniform distribution). This uniform distribution approach adds variability to the simulation, imitating the diverse nature of promotional offers in the coffee shop industry. By incorporating randomness into the simulation, the model becomes adaptable to different situations, enhancing its usefulness for statistical stimulation. 

The first 6 rows of the Cafecito Coupon & Offers Table can be seen below. This table contains 10,000 rows of observation, each representing a unique transaction, and contains 27 columns. These columns are a combination of attributes from the Customer Table and Transaction Table. Together, they form a comprehensive dataset that provides insights into customer behavior, transaction dynamics, and product performance.  

Overall, the selection of a probability of 0.5 for coupon distribution, along with the randomization of other parameters, ensures a realistic and flexible approach to simulating promotional activities for Cafecito. This approach allows for comprehensive exploration of various scenarios and their effects on customer engagement and order patterns, ultimately contributing to the optimization of promotional strategies in the coffee shop franchise. 

The Final CRM Table

The merging of all three CRM systems culminates in the creation of the Final_Cafecito_Shop_Dataset, a comprehensive dataset that consolidates customer, transaction, and coupon & offers information into a unified framework. This integrated dataset combines data from the Customer Table, Transaction Table, and Coupon & Offers Table, offering a broader perspective on customer interactions, transactional patterns, and promotional initiatives within the Cafecito coffee shop franchise.

The Final_Cafecito_Shop_Dataset contains a total of 10,000 rows of observations, each representing a unique transaction, and contains 26 columns. These columns include a rich range of attributes, including customer demographics, transaction details, product information, and coupon & offers data. By consolidating these tables, a powerful resource has been created for analyzing customer preferences, identifying trends, and optimizing marketing strategies.

The attributes included in the Final_Cafecito_Shop_Dataset are obtained from the customer, transaction, and coupon & offers tables, ensuring that all relevant information is captured and synthesized in one cohesive dataset. This integration implements seamless analysis and reporting, which enables actionable insights and informed decision making to drive business growth and enhance customer satisfaction.

As a preview, the first six rows of the Final_Cafecito_Shop_Dataset are presented below:

Final_Cafecito_Shop_Dataset <- Test5 # Final dataset with all three tables combined

dim(Final_Cafecito_Shop_Dataset)
## [1] 10000    27
Final_Cafecito_Shop_Dataset6 <- head(Final_Cafecito_Shop_Dataset)

knitr::kable(Final_Cafecito_Shop_Dataset6, caption = "The Final Cafecito Shop Dataset")
The Final Cafecito Shop Dataset
CustomerNumber FirstName LastName PersonaNumber PersonaName OrderID OrderDate OrderType Product ItemsPurchased ItemSize Quantity PricePerUnit OrderTotal PaymentMethod CouponCode DiscountAmountPercent IsCouponGiven IsCouponRedeemed CouponDiscountAmountPercent CouponCampaignID CouponDescription PhoneNumber EmailAddress City State PostalCode
101022 Shelby FOLEY 5 The Health-Conscious Hero 7279 2023-01-31 Dine-In Food Bolon One Size 4 9.74 38.96 Cash CPN5 30 No No 0 CMP5 Holiday Promotion 492 - 895 - 2441 Lewiston ME 04240
101022 Shelby FOLEY 5 The Health-Conscious Hero 8004 2023-05-31 Delivery Food Bolon One Size 3 8.38 25.14 Online Payment CPN5 30 Yes No 0 CMP5 Holiday Promotion 492 - 895 - 2441 Lewiston ME 04240
101022 Shelby FOLEY 5 The Health-Conscious Hero 7783 2023-11-15 Take Out Non_Coffee_Beverages Guanabana Juice 16oz 5 7.46 37.30 Online Payment CPN5 30 No No 0 CMP5 Holiday Promotion 492 - 895 - 2441 Lewiston ME 04240
101022 Shelby FOLEY 5 The Health-Conscious Hero 8206 2023-08-05 Take Out Coffee_Beverages Latte 24oz 3 13.22 39.66 Cash CPN5 30 Yes Yes 30 CMP5 Holiday Promotion 492 - 895 - 2441 Lewiston ME 04240
101022 Shelby FOLEY 5 The Health-Conscious Hero 2241 2023-10-04 Dine-In Tea_Beverages Green Tea 24oz 5 9.24 46.20 Online Payment CPN5 30 No No 0 CMP5 Holiday Promotion 492 - 895 - 2441 Lewiston ME 04240
101022 Shelby FOLEY 5 The Health-Conscious Hero 5772 2023-06-26 Take Out Pastries Concha One Size 1 8.47 8.47 Cash CPN5 30 Yes Yes 30 CMP5 Holiday Promotion 492 - 895 - 2441 Lewiston ME 04240
write.csv(Final_Cafecito_Shop_Dataset, "Final_Cafecito_Shop_Dataset.csv", row.names = FALSE)

Furthermore, the creation of the Final_Cafecito_Shop_Dataset represents a significant milestone in this CRM system development, providing a comprehensive and unified view of customer interactions and transactional dynamics. This dataset serves as a valuable asset for strategic decision-making, customer segmentation, and targeted marketing efforts, empowering Cafecito to deliver exceptional experiences and drive success in the competitive coffee shop industry.

Conclusion

The development of a comprehensive Customer Relationship Management (CRM) system for Cafecito has been a strategic endeavor to gain a better understanding of the potential customer base and optimizing marketing strategies. Through meticulous design and integration of customer, transaction, and promotional data, a powerful resource for analyzing customer interactions and transactional dynamics was created, Final_Cafecito_Shop_Dataset. This dataset, which contains 10,000 rows of observations and 27 columns, empowers Cafecito with actionable insights for strategic decision-making and targeted marketing efforts. By utilizing the insights gained from this CRM system, Cafecito is ready to deliver exceptional experiences and drive success in the competitive coffee shop industry. The development of a CRM system that aims to drive growth, enhance customer satisfaction, and solidify Cafecito’s position as a leading coffee shop franchise.

Resources

Bluman, A. (2018). Elementary statistics: A step by step approach (10th ed.). McGraw Hill. Goodreads. (n.d.).

Kabacoff, R.I. (2022). R in action: Data analysis and graphics with R and tidyverse (3rd edition).