Cafecito, an embodiment of Latin American heritage and flavors, stands as a beacon of warmth and community in the bustling streets of New England’s vibrant cities. Cafecito, a small franchise chain, is driven by a passion for delivering an authentic coffee experience dedicated to providing a haven where customers can savor the rich aromas and flavors of carefully crafted beverages.
In the heart of each neighborhood, Cafecito establishes its presence with a network of boutique coffee shops, each radiating a cozy ambiance that invites patrons to linger and connect. Currently, Cafecito has 10 locations across New England, including Biddeford, Lewiston, Portland and Westbrook in Maine, Portsmouth in New Hampshire, and Boston in Massachusetts, strategically positioned to cater to diverse communities.
At Cafecito, the menu is a celebration of Latin American coffee traditions, offering a delightful array of specialty coffees, including Ecuadorian delights like canelazo, colada morada, and colada de avena, alongside refreshing guanabana juice. Colombian hot chocolate like chocolate santafereño and Peruvian specialties such as chicha morada add an exotic flair to our offerings.
The food menu is equally enticing, featuring traditional delights from across Latin America. Indulge in savory bolon, crispy buñuelo, flavorful empanadas, and hearty tamales, each small plate is a testament to the rich culinary heritage of the region.
The target customer demographic spans across various age groups, united by a love for exceptional coffee and a desire for genuine connection. From busy professionals seeking a morning pick-me-up to students craving a cozy study spot, Cafecito welcomes all who seek a moment of respite from the daily hustle.
As Cafecito embarks on its journey to revolutionize the coffee culture, it transcends being simply a place to enjoy a cup of coffee; it becomes a destination where relationships flourish, stories are shared, and memories are made. Join in embracing the essence of Latin hospitality, one sip at a time. And by adding sazon to your life!
# Week 2
# Importing and preparing the dataset
CustomerDF <- read_csv("unique_customers.csv")
## Rows: 25000 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): FirstName, LastName, PhoneNumber, EmailAddress, City
## dbl (3): CustomerNumber, PersonaNumber, PostalCode
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Clean names
clean_names(CustomerDF)
# Learn about the dataset
dim(CustomerDF) # checking dimensions of the dataset
summary(CustomerDF)
# Checking the structure of the dataset
str(CustomerDF)
head(CustomerDF)
print(CustomerDF)
In the development process of a Customer Relationship Management (CRM) system for Cafecito, the objective is to deepen the understanding of the potential customer base. The task entails generating a comprehensive dataset that accurately reflects realistic customer interactions, transaction patterns, and marketing strategies.
The CRM system is structured around three primary tables: Customer
Table, Transaction Table, and Coupon & Offers Table. These tables
are merged into a final data frame named
Final_Cafecito_Shop_Dataset, providing a consolidated
perspective of customer data and interactions.
The foundational data for the CRM system is sourced from a customer data frame consisting of 25,000 observations and 10 variables, incorporating critical details such as First Name, Last Name, Customer Number, Persona Number, Contact Information (Phone Number, Email Address), City, and Postal Code. This dataset serves as the basis for simulating various customer interactions, transactional behaviors, and marketing endeavors.
The customer table will serve as an archive for detailed customer information, facilitating the tracking of individual preferences, demographics, and contact details. Concurrently, the Transaction Table will record transactional data, including order specifics, purchase history, and payment methods, allowing for in-depth analysis of buying patterns and trends.
Additionally, the coupon and offers table will organize promotional activities, managing coupon codes, descriptions, discount values, etc. Through the implementation of constraints such as conditional coupon values and targeted marketing strategies, the objective is to optimize customer engagement and retention.
Upon integration into a unified dataset, these tables will provide valuable insights into customer behavior, preferences, and loyalty. This comprehensive perspective will enable Cafecito to tailor its products, services, and marketing strategies to effectively serve and engage with its diverse customer base.
The customer table serves as a cornerstone of our CRM system, containing crucial information about each customer and providing insights for effective customer engagement. The structure of the customer table was carefully designed based on the analysis of the customer dataset and considerations of Cafecito’s business objectives.
Before proceeding with the customer table’s creation, an exploration of the frequency distribution of the persona number was conducted (Table 1). This analysis revealed a disproportionate distribution of persona numbers, indicating the need for further exploration and potential adjustments to ensure representation in customer segmentation.
To enhance the understanding of the persona numbers and align them
with the clientele Cafecito aims to attract, names were
assigned to each persona number (Table 2). These persona names include
descriptors such as The Morning Regular,
The Social Media Maven,
The Health-Conscious Hero,
The Studdy Sessioner, The Freelance Guru,
The Local Celebrity, The Indecisive, and
The Penny Pincher, reflecting different customer profiles
and behaviors.
# ------------------------------------------------------
# Code for a complete frequency table
# ------------------------------------------------------
FreqTable = table(CustomerDF$PersonaNumber)
# print(FreqTable)
# Calculate percentages
PercentTable <- prop.table(FreqTable) * 100
# print(PercentTable)
# Calculate cumulative sum
CumulativeSum <- cumsum(FreqTable)
# print(CumulativeSum)
# Calculate cumulative percentage
CumulativePercentage <- cumsum(PercentTable)
# print(CumulativePercentage)
# Combine all the information into a data frame
ResultTable <- data.frame(PersonaID = names(FreqTable),
Frequency = FreqTable,
Percentage = PercentTable,
CumulativeSum = CumulativeSum,
CumulativePercent = CumulativePercentage)[, c("PersonaID", "Frequency.Freq", "Percentage.Freq",
"CumulativeSum", "CumulativePercent")]
# print(ResultTable)
knitr::kable(ResultTable, caption = "Table 1: Persona ID Frequency Distribution")
| PersonaID | Frequency.Freq | Percentage.Freq | CumulativeSum | CumulativePercent |
|---|---|---|---|---|
| 1 | 7033 | 28.132 | 7033 | 28.132 |
| 2 | 4926 | 19.704 | 11959 | 47.836 |
| 3 | 4311 | 17.244 | 16270 | 65.080 |
| 4 | 2922 | 11.688 | 19192 | 76.768 |
| 5 | 2518 | 10.072 | 21710 | 86.840 |
| 6 | 1775 | 7.100 | 23485 | 93.940 |
| 7 | 987 | 3.948 | 24472 | 97.888 |
| 8 | 528 | 2.112 | 25000 | 100.000 |
# ---------------------------------------------------------------------------
# The above table shows that the personas are disproportionately distributed
# ---------------------------------------------------------------------------
In the creation of the customer table, several attributes were made to enrich the dataset and align it with the business needs. These additions include the inclusion of columns for persona names, replacing the original city column with specific city names corresponding to the locations of Cafecito’s franchise (Biddeford, Lewiston, Portland, Westbrook, Portsmouth, and Boston), and the addition of a state column to indicate the location of each franchise.
Furthermore, the postal codes in the original customer dataset were updated to match the city locations of the franchise outlets. This ensures accurate geographic representation within the dataset, enabling targeted marketing efforts and localized customer engagement strategies.
Following these changes, a random selection of 500 customers was
extracted from the customer data frame and stored into a new data
frame,SelectedCustomers. The customer table has 500
observations with 10 columns (variables). Additionally, Table 3 shows
the frequency distribution of persona names, totaling 500 customers.
This distribution illustrates the allocation of customers across
different persona names, providing insights into the composition of the
selected customer sample.
# Creating a customer table relevant to my business franchise
# ------------------------------------------------------
# Giving names to personas and adding them to CustomerDF
# ------------------------------------------------------
PersonaNames <- c("The Morning Regular", "The Social Media Maven", "The Studdy Sessioner", "The Freelance Guru", "The Health-Conscious Hero", "The Local Celebrity", "The Indecisive", "The Penny Pincher")
# print(PersonaNames)
CustomerDF$PersonaName <- PersonaNames[CustomerDF$PersonaNumber]
# ------------------------------------------------------
# Code for a complete frequency table of persona names
# ------------------------------------------------------
FreqTable2 = table(CustomerDF$PersonaName)
# print(FreqTable2)
# Calculate percentages
PercentTable2 <- prop.table(FreqTable2) * 100
# print(PercentTable2)
# Calculate cumulative sum
CumulativeSum2 <- cumsum(FreqTable2)
# print(CumulativeSum2)
# Calculate cumulative percentage
CumulativePercent2 <- cumsum(PercentTable2)
# print(CumulativePercent2)
# Combine all the information into a data frame
ResultTable2 <- data.frame(PersonaName = names(FreqTable2),
Frequency = FreqTable2,
Percentage = PercentTable2,
CumulativeSum = CumulativeSum2,
CumulativePercent = CumulativePercent2)[, c("PersonaName", "Frequency.Freq", "Percentage.Freq",
"CumulativeSum", "CumulativePercent")]
# print(ResultTable2)
knitr::kable(ResultTable2, caption = "Table 2: Persona Names Frequency Distribution")
| PersonaName | Frequency.Freq | Percentage.Freq | CumulativeSum | CumulativePercent | |
|---|---|---|---|---|---|
| The Freelance Guru | The Freelance Guru | 2922 | 11.688 | 2922 | 11.688 |
| The Health-Conscious Hero | The Health-Conscious Hero | 2518 | 10.072 | 5440 | 21.760 |
| The Indecisive | The Indecisive | 987 | 3.948 | 6427 | 25.708 |
| The Local Celebrity | The Local Celebrity | 1775 | 7.100 | 8202 | 32.808 |
| The Morning Regular | The Morning Regular | 7033 | 28.132 | 15235 | 60.940 |
| The Penny Pincher | The Penny Pincher | 528 | 2.112 | 15763 | 63.052 |
| The Social Media Maven | The Social Media Maven | 4926 | 19.704 | 20689 | 82.756 |
| The Studdy Sessioner | The Studdy Sessioner | 4311 | 17.244 | 25000 | 100.000 |
# ------------------------------------------------------
# Giving names to "city" and adding them to CustomerDF
# ------------------------------------------------------
CityFreqTable = table(CustomerDF$City)
# print(CityFreqTable)
CityNames <- c(A = "Biddeford", B = "Lewiston",
C = "Portland", D = "Westbrook",
E = "Portsmouth", F = "Boston")
# print(CityNames)
CustomerDF$City <- CityNames[CustomerDF$City]
# ------------------------------------------------------
# Adding "State" column to CustomerDF
# ------------------------------------------------------
# Define a named vector mapping cities to states
StateMapping <- c(Biddeford = "ME", Lewiston = "ME", Portland = "ME",
Westbrook = "ME", Portsmouth = "NH", Boston = "MA")
# Add State column based on city mapping
CustomerDF$State <- StateMapping[CustomerDF$City]
# Organize customerDF column
CustomerDF <- CustomerDF[c("FirstName", "LastName", "CustomerNumber",
"PersonaNumber",
"PersonaName",
"PhoneNumber",
"EmailAddress",
"City",
"State",
"PostalCode")]
# --------------------------------------------------------------------
# Giving accurate Postal Codes to match city and adding them to CustomerDF
# --------------------------------------------------------------------
# Define a named vector mapping cities to postal codes
PostalCodeMapping <- c(Biddeford = "04005", Lewiston = "04240",
Portland = "04101", Westbrook = "04092",
Portsmouth = "03801", Boston = "02116")
# print(PostalCodeMapping)
# Add PostalCode column based on city mapping
CustomerDF$PostalCode <- PostalCodeMapping[CustomerDF$City]
# ---------------------------------
# Step 2: Select n random customers
# ---------------------------------
set.seed(1996) # Set the seed for reproducibility
SelectedCustomers <- sample_n(CustomerDF, 500)
# head(SelectedCustomers)
dim(SelectedCustomers) # Final Customer Table
## [1] 500 10
# print(SelectedCustomers) # Final Customer Table
CustomerTable6 <- head(SelectedCustomers)
knitr::kable(CustomerTable6, caption = "Cafecito Selected Customer Table")
| FirstName | LastName | CustomerNumber | PersonaNumber | PersonaName | PhoneNumber | EmailAddress | City | State | PostalCode |
|---|---|---|---|---|---|---|---|---|---|
| James | HINES | 686761 | 2 | The Social Media Maven | 401 - 312 - 9023 | jameshines@example.com | Biddeford | ME | 04005 |
| Katherine | BRUCE | 717779 | 2 | The Social Media Maven | 653 - 869 - 6422 | katherinebruce@example.com | Boston | MA | 02116 |
| Lorelei | FROST | 361593 | 1 | The Morning Regular | 227 - 599 - 6259 | loreleifrost@example.com | Lewiston | ME | 04240 |
| Aarav | ZHANG | 847030 | 2 | The Social Media Maven | 911 - 505 - 1161 | aaravzhang@example.com | Portland | ME | 04101 |
| Lola | FRY | 477764 | 1 | The Morning Regular | 555 - 729 - 3414 | lolafry@example.com | Portsmouth | NH | 03801 |
| Emilio | KERR | 419941 | 4 | The Freelance Guru | 924 - 287 - 1701 | emiliokerr@example.com | Boston | MA | 02116 |
# ------------------------------------------------------
# Code for a complete frequency table
# ------------------------------------------------------
FreqTable3 = table(SelectedCustomers$PersonaName)
# Calculate percentages
PercentTable3 <- prop.table(FreqTable3) * 100
# Calculate cumulative sum
CumulativeSum3 <- cumsum(FreqTable3)
# Calculate cumulative percentage
CumulativePercent3 <- cumsum(PercentTable3)
# Combine all the information into a data frame
ResultTable3 <- data.frame(PersonaName = names(FreqTable3),
Frequency = FreqTable3,
Percentage = PercentTable3,
CumulativeSum = CumulativeSum3,
CumulativePercent = CumulativePercent3)[, c("PersonaName", "Frequency.Freq", "Percentage.Freq",
"CumulativeSum", "CumulativePercent")]
# print(ResultTable3)
knitr::kable(ResultTable3, caption = "Table 3: Persona Names Frequency Distribution:
Sample Size = 500")
| PersonaName | Frequency.Freq | Percentage.Freq | CumulativeSum | CumulativePercent | |
|---|---|---|---|---|---|
| The Freelance Guru | The Freelance Guru | 54 | 10.8 | 54 | 10.8 |
| The Health-Conscious Hero | The Health-Conscious Hero | 60 | 12.0 | 114 | 22.8 |
| The Indecisive | The Indecisive | 19 | 3.8 | 133 | 26.6 |
| The Local Celebrity | The Local Celebrity | 41 | 8.2 | 174 | 34.8 |
| The Morning Regular | The Morning Regular | 131 | 26.2 | 305 | 61.0 |
| The Penny Pincher | The Penny Pincher | 11 | 2.2 | 316 | 63.2 |
| The Social Media Maven | The Social Media Maven | 106 | 21.2 | 422 | 84.4 |
| The Studdy Sessioner | The Studdy Sessioner | 78 | 15.6 | 500 | 100.0 |
The random selection was made to ensure a diverse representation of customers across different personas and geographic locations. Random selection through uniform distribution can be advantageous as it helps to avoid potential biases that could arise from a non-random selection process. By randomly selecting customers, it ensures that each customer has an equal chance of being included in the sample, thereby increasing the likelihood of capturing a diverse range of customer profiles. However, it’s important to acknowledge that this choice introduces a uniform distribution bias, potentially impacting the reliability and usefulness of the simulation results. A more balanced sampling approach may be necessary in future iterations to mitigate this bias and ensure a more representative dataset for statistical simulations. But for this statistical simulation, a uniform distribution is used.
The transaction table serves as a crucial component of this CRM system, capturing essential information about customer transactions to further analyze purchasing patterns, product preferences, and revenue generation. The transaction table’s structure was carefully designed to include key attributes that provide comprehensive insights into customer behavior and transactional dynamics.
The chosen attributes for the transaction table include:
These attributes were chosen based on their significance in understanding customer purchasing behavior, analyzing sales performance, and optimizing operational processes. By capturing detailed transactional data, the transaction table enables Cafecito to gain valuable insights into customer preferences, product performance, and revenue generation, ultimately contributing to informed decision-making and strategic planning.
# ------------------------
# Step 3: Generating dates
# ------------------------
# ------------------------------------------------------
# All dates betwen two dates
# ------------------------------------------------------
AllDates <- seq(as.Date("2024-01-01"), as.Date("2024-12-31"), by="day")
# AllDates[1:366]
# length(AllDates)
# ------------------------------------------------------
# Generate multiple random date between two dates
# ------------------------------------------------------
StartDate <- as.Date("2023-01-01")
EndDate <- as.Date("2023-12-31")
RandomDate <- sample(seq(as.Date(StartDate), as.Date(EndDate), by="day"), 10)
# print(RandomDate)
# ------------------------------------------------------
# Function to generate random dates within a year
# This will be used to generate random dates for transactions
# ------------------------------------------------------
GenerateRandomDates <- function(n, StartDate, EndDate) {
seq_dates <- seq(as.Date(StartDate), as.Date(EndDate), by="day")
sample(seq_dates, n, replace = TRUE)
}
GenerateRandomDates(25, '2023-01-01', '2023-12-31')
## [1] "2023-10-05" "2023-04-14" "2023-06-04" "2023-08-17" "2023-03-06"
## [6] "2023-09-20" "2023-05-21" "2023-09-17" "2023-11-13" "2023-04-02"
## [11] "2023-08-18" "2023-07-27" "2023-06-03" "2023-09-30" "2023-02-26"
## [16] "2023-03-15" "2023-08-02" "2023-11-27" "2023-11-04" "2023-09-17"
## [21] "2023-07-04" "2023-12-05" "2023-11-12" "2023-10-20" "2023-12-31"
# ------------------------------------------------------
# Step 5: Transaction Table
# ------------------------------------------------------
# Generate transaction data
set.seed(1996)
# ----------------------------------------------------------------
# Method 1: Random generation of Order IDs
# ----------------------------------------------------------------
Test1 = data.frame(
OrderID = c(1:10000)
)
# dim(Test1)
# head(Test1)
# ----------------------------------------------------------------
# Adding Customer IDs to the data frame
# ----------------------------------------------------------------
# Array of customer numbers, from the selected_customers table
CustomerNumbers <- SelectedCustomers$CustomerNumber
# CustomerNumbers[1:10]
# # ----------------------------------------------------------------
# # randomly select customers from the SelectedCustomer table
# # What does replace = TRUE do?
# # ----------------------------------------------------------------
Test2 = data.frame(
OrderID = c(1:10000),
CustomerNumber = sample(SelectedCustomers$CustomerNumber, 10000,replace = TRUE)
)
# dim(Test2)
# head(Test2)
# Merge Test2 with SelectedCustomers to get the PersonaNumber and other details
Test2 <- merge(Test2, SelectedCustomers, by = "CustomerNumber")
# dim(Test2)
# head(Test2)
# Count unique number of customers
# length(unique(Test2$CustomerNumber))
# Customers with the highest number of transactions
sort(table(Test2$CustomerNumber), decreasing = TRUE)[1:10]
##
## 545845 419941 691605 743992 747200 595426 995549 592638 974217 204593
## 35 33 33 33 33 32 32 31 31 30
# ----------------------------------------------------------------
# Adding transaction date to the data frame
# ----------------------------------------------------------------
Test3 = data.frame(
OrderID = c(1:10000),
CustomerNumber = sample(SelectedCustomers$CustomerNumber, 10000,replace = TRUE),
OrderDate = GenerateRandomDates(10000, '2023-01-01', '2023-12-31')
)
# dim(Test3)
# head(Test3)
# Merge Test3 with SelectedCustomers table
Test3 <- merge(Test3, SelectedCustomers, by = "CustomerNumber")
# dim(Test3)
# head(Test3)
# ----------------------------------------------------------------
# Adding more features to the data frame
# ----------------------------------------------------------------
# Product Details
ProductDetails <- list(
Coffee_Beverages = c("Espresso", "Americano", "Latte", "Cappuccino", "Mocha", "Macchiato"),
Tea_Beverages = c("Black Tea", "Green Tea", "Herbal Tea", "Chai Tea", "Matcha Latte", "Iced Tea"),
Non_Coffee_Beverages = c("Guanabana Juice", "Canelazo", "Colada Morada", "Colada De AVena",
"Chocolate Santafereño", "Chicha Morada"),
Pastries = c("Arroz con Leche", "Muffin", "Scone", "Cookie", "Concha", "Tres Leches Cake", "Churros", "Ducle de Leche Flan"),
Food = c("Empanadas", "Bolon", "Buñuelo", "Tamales", "Arepa", "Soup of the Day")
)
# Payment Method
PaymentMethods <- c("Cash", "Credit Card", "Online Payment")
# Add to Test4
Test4 = data.frame(
OrderID = c(1:10000),
CustomerNumber = sample(SelectedCustomers$CustomerNumber, 10000,replace = TRUE),
OrderDate = GenerateRandomDates(10000, '2023-01-01', '2023-12-31'),
OrderType = sample(c("Dine-In", "Take Out", "Delivery"), 10000, replace = TRUE),
Product = sample(c("Coffee_Beverages", "Tea_Beverages", "Non_Coffee_Beverages", "Pastries", "Food"), 10000, replace = TRUE),
Quantity = sample(1:5, 10000, replace = TRUE),
PricePerUnit = round(runif(10000,5, 15), 2),
PaymentMethod = sample(PaymentMethods, 10000, replace = TRUE)
)
# head(Test4)
# Generate random product details for each transaction using the above lists
Test4$ItemsPurchased <- sapply(Test4$Product, function(x) sample(ProductDetails[[x]], 1))
# Calculate OrderTotal
Test4$OrderTotal <- Test4$Quantity * as.numeric(Test4$PricePerUnit)
# Add product sizes column
ProductSizes <- c("8oz", "16oz", "24oz", "One Size")
# Assign sizes based on product type
Test4$ItemSize <- ifelse(Test4$Product %in% c("Coffee_Beverages", "Tea_Beverages", "Non_Coffee_Beverages"),
sample(ProductSizes[1:3], nrow(Test4), replace = TRUE),
ProductSizes[4])
# Merge Test4 with SelectedCustomers
Test4 <- merge(Test4, SelectedCustomers, by = "CustomerNumber")
# Rearrange columns
Test4 <- Test4[, c("CustomerNumber", "FirstName", "LastName", "PersonaNumber", "PersonaName", "OrderID", "OrderDate", "OrderType", "Product", "ItemsPurchased", "ItemSize", "Quantity", "PricePerUnit", "OrderTotal", "PaymentMethod", "PhoneNumber", "EmailAddress", "City", "State", "PostalCode")]
dim(Test4) # Final Transaction Table (Includes Customer Table)
## [1] 10000 20
TransactionTable6 <- head(Test4) # Final Transaction Table (Includes Customer Table)
knitr::kable(TransactionTable6, caption = " Cafecito Transaction Table")
| CustomerNumber | FirstName | LastName | PersonaNumber | PersonaName | OrderID | OrderDate | OrderType | Product | ItemsPurchased | ItemSize | Quantity | PricePerUnit | OrderTotal | PaymentMethod | PhoneNumber | EmailAddress | City | State | PostalCode |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 101022 | Shelby | FOLEY | 5 | The Health-Conscious Hero | 7279 | 2023-01-31 | Dine-In | Food | Bolon | One Size | 4 | 9.74 | 38.96 | Cash | 492 - 895 - 2441 | shelbyfoley@example.com | Lewiston | ME | 04240 |
| 101022 | Shelby | FOLEY | 5 | The Health-Conscious Hero | 8004 | 2023-05-31 | Delivery | Food | Bolon | One Size | 3 | 8.38 | 25.14 | Online Payment | 492 - 895 - 2441 | shelbyfoley@example.com | Lewiston | ME | 04240 |
| 101022 | Shelby | FOLEY | 5 | The Health-Conscious Hero | 7783 | 2023-11-15 | Take Out | Non_Coffee_Beverages | Guanabana Juice | 16oz | 5 | 7.46 | 37.30 | Online Payment | 492 - 895 - 2441 | shelbyfoley@example.com | Lewiston | ME | 04240 |
| 101022 | Shelby | FOLEY | 5 | The Health-Conscious Hero | 8206 | 2023-08-05 | Take Out | Coffee_Beverages | Latte | 24oz | 3 | 13.22 | 39.66 | Cash | 492 - 895 - 2441 | shelbyfoley@example.com | Lewiston | ME | 04240 |
| 101022 | Shelby | FOLEY | 5 | The Health-Conscious Hero | 2241 | 2023-10-04 | Dine-In | Tea_Beverages | Green Tea | 24oz | 5 | 9.24 | 46.20 | Online Payment | 492 - 895 - 2441 | shelbyfoley@example.com | Lewiston | ME | 04240 |
| 101022 | Shelby | FOLEY | 5 | The Health-Conscious Hero | 5772 | 2023-06-26 | Take Out | Pastries | Concha | One Size | 1 | 8.47 | 8.47 | Cash | 492 - 895 - 2441 | shelbyfoley@example.com | Lewiston | ME | 04240 |
# print(Test4) # Final Transaction Table (Includes Customer Table)
The Cafecito Transaction Table contains 10,000 rows of observation, each representing a unique transaction, and encompasses 20 columns. These columns are a combination of attributes from the Customer Table, capturing essential customer details, and new attributes specific to transactional data. Together, they form a comprehensive dataset that provides insights into customer behavior, transaction dynamics, and product performance. The first 6 rows can be seen in the transaction table below.
Additionally, the 10,000 rows of transactional data were generated at random, ensuring a diverse representation of customer interactions and transaction patterns. This uniform distribution approach is beneficial as it introduces variability and mimics real-world scenarios, allowing for a more realistic simulation of customer behavior. The use of random generation also ensures that each transaction is independent and unbiased, contributing to the statistical reliability of the simulation. Furthermore, by using a uniform distribution, the transactional data reflects a wide range of transaction frequencies and values, enhancing the richness and utility of the statistical stimulation.
In developing the coupon & offers table for Cafecito’s promotional activities, several considerations were made to ensure effective management of these campaigns and their impact on customer behavior.
Firstly, seven new columns were introduced to capture essential details of each promotion. These include Coupon Code, Descriptions, Discount Amounts, Is Coupon Given, Is Coupon Redeemed, Coupon Discount Amount, and Coupon Campaign ID. These attributes provide comprehensive information about each promotion, from its unique identifier to its specific discount value and campaign ID.
One crucial aspect of this table is the inclusion of the
Is Coupon Given column, which determines whether a coupon
is distributed to a customer. A binomial distribution with a probability
of 0.5 was chosen for generating values in this column. This probability
reflects a balanced approach, where there’s an equal chance of a coupon
being given or not. This decision aligns with the aim of reaching a
broad customer base while maintaining a manageable distribution of
coupons.
The use of a binomial distribution for generating coupon distribution reflects the unpredictability of real-world promotional activities. By simulating coupon distribution based on a probabilistic model, the simulation captures the variability and uncertainty that Cafecito may encounter in its promotional campaigns. This approach allows for the exploration of different scenarios and their potential impact on customer behavior.
# ----------------------------------------------------------------
# Adding coupon related features to the data frame
# ----------------------------------------------------------------
# ------------------------------------------------------
# Coupon Strategy preparation
# ------------------------------------------------------
# Use Test4 dataframe to create Test5 dataframe
Test5 <- Test4
# Add coupon values based on persona
Test5$CouponCode <- paste("CPN", Test5$PersonaNumber, sep = "")
CouponValues <- c(10, 15, 20, 25, 30, 35, 40, 50)
# Assign discount amounts based on persona names
Test5$DiscountAmountPercent <- CouponValues[Test5$PersonaNumber]
# Set IsCouponRedeemed randomly
Test5$IsCouponRedeemed <- sample(c("Yes", "No"), 10000, replace = TRUE)
# use binomial distribution to add a column to show whether a customer is given a coupon or not
# The first line creates a 0 or 1 indicator with a 50% chance (you may change it any other number)
# The second line attaches a coupon value if the first column is a 1.
Test5$IsCouponGiven <- ifelse(rbinom(10000, 1, 0.5) == 1, "Yes", "No")
Test5$CouponDiscountAmountPercent = ifelse(Test5$IsCouponGiven == "Yes", Test5$DiscountAmountPercent, 0)
# Create code where if coupon given is yes then coupon redeemed is yes or no, but if coupon given is no then coupon redeemed should be no
Test5$IsCouponRedeemed <- ifelse(Test5$IsCouponGiven == "Yes",
sample(c("Yes", "No"), length(Test5$IsCouponGiven), replace = TRUE),
"No")
# Set CouponCampaignID based on persona names
Test5$CouponCampaignID <- paste("CMP", Test5$PersonaNumber, sep = "")
# Set CouponDiscountAmount based on whether the coupon is redeemed
Test5$CouponDiscountAmountPercent <- ifelse(Test5$IsCouponRedeemed == "Yes", Test5$DiscountAmountPercent, 0)
# Define CouponDescriptions
CouponDescriptions <- c(
"Happy Hour Deal",
"Referral Reward",
"Weekend Special",
"Seasonal Promotion",
"Holiday Promotion",
"New Customer Offer",
"Limited-Time Offer",
"Buy 1, Get 1 Beverage 50% Off"
)
# Add CouponDescription column based on DiscountAmount
Test5$CouponDescription <- ifelse(Test5$DiscountAmountPercent %in% CouponValues,
CouponDescriptions[match(Test5$DiscountAmountPercent, CouponValues)],
"Regular Price")
# Re-arrange columns
Test5 <- Test5[, c("CustomerNumber", "FirstName", "LastName", "PersonaNumber",
"PersonaName", "OrderID", "OrderDate", "OrderType", "Product",
"ItemsPurchased", "ItemSize", "Quantity", "PricePerUnit", "OrderTotal",
"PaymentMethod", "CouponCode", "DiscountAmountPercent", "IsCouponGiven",
"IsCouponRedeemed", "CouponDiscountAmountPercent", "CouponCampaignID",
"CouponDescription", "PhoneNumber", "EmailAddress", "City", "State", "PostalCode")]
# Print the updated dataframe
dim(Test5)
## [1] 10000 27
CouponTable6 <- head(Test5) # Final Coupon Table (Includes Customer Table & Transaction Table)
knitr::kable(CouponTable6, caption = "Cafecito Coupon & Offers Table") # Final Coupon Table (Includes Customer Table & Transaction Table)
| CustomerNumber | FirstName | LastName | PersonaNumber | PersonaName | OrderID | OrderDate | OrderType | Product | ItemsPurchased | ItemSize | Quantity | PricePerUnit | OrderTotal | PaymentMethod | CouponCode | DiscountAmountPercent | IsCouponGiven | IsCouponRedeemed | CouponDiscountAmountPercent | CouponCampaignID | CouponDescription | PhoneNumber | EmailAddress | City | State | PostalCode |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 101022 | Shelby | FOLEY | 5 | The Health-Conscious Hero | 7279 | 2023-01-31 | Dine-In | Food | Bolon | One Size | 4 | 9.74 | 38.96 | Cash | CPN5 | 30 | No | No | 0 | CMP5 | Holiday Promotion | 492 - 895 - 2441 | shelbyfoley@example.com | Lewiston | ME | 04240 |
| 101022 | Shelby | FOLEY | 5 | The Health-Conscious Hero | 8004 | 2023-05-31 | Delivery | Food | Bolon | One Size | 3 | 8.38 | 25.14 | Online Payment | CPN5 | 30 | Yes | No | 0 | CMP5 | Holiday Promotion | 492 - 895 - 2441 | shelbyfoley@example.com | Lewiston | ME | 04240 |
| 101022 | Shelby | FOLEY | 5 | The Health-Conscious Hero | 7783 | 2023-11-15 | Take Out | Non_Coffee_Beverages | Guanabana Juice | 16oz | 5 | 7.46 | 37.30 | Online Payment | CPN5 | 30 | No | No | 0 | CMP5 | Holiday Promotion | 492 - 895 - 2441 | shelbyfoley@example.com | Lewiston | ME | 04240 |
| 101022 | Shelby | FOLEY | 5 | The Health-Conscious Hero | 8206 | 2023-08-05 | Take Out | Coffee_Beverages | Latte | 24oz | 3 | 13.22 | 39.66 | Cash | CPN5 | 30 | Yes | Yes | 30 | CMP5 | Holiday Promotion | 492 - 895 - 2441 | shelbyfoley@example.com | Lewiston | ME | 04240 |
| 101022 | Shelby | FOLEY | 5 | The Health-Conscious Hero | 2241 | 2023-10-04 | Dine-In | Tea_Beverages | Green Tea | 24oz | 5 | 9.24 | 46.20 | Online Payment | CPN5 | 30 | No | No | 0 | CMP5 | Holiday Promotion | 492 - 895 - 2441 | shelbyfoley@example.com | Lewiston | ME | 04240 |
| 101022 | Shelby | FOLEY | 5 | The Health-Conscious Hero | 5772 | 2023-06-26 | Take Out | Pastries | Concha | One Size | 1 | 8.47 | 8.47 | Cash | CPN5 | 30 | Yes | Yes | 30 | CMP5 | Holiday Promotion | 492 - 895 - 2441 | shelbyfoley@example.com | Lewiston | ME | 04240 |
# print(Test5) # Final Coupon Table (Includes Customer Table & Transaction Table)
Furthermore, most parameters, such as coupon descriptions and discount amounts, were chosen at random (uniform distribution). This uniform distribution approach adds variability to the simulation, imitating the diverse nature of promotional offers in the coffee shop industry. By incorporating randomness into the simulation, the model becomes adaptable to different situations, enhancing its usefulness for statistical stimulation.
The first 6 rows of the Cafecito Coupon & Offers Table can be seen below. This table contains 10,000 rows of observation, each representing a unique transaction, and contains 27 columns. These columns are a combination of attributes from the Customer Table and Transaction Table. Together, they form a comprehensive dataset that provides insights into customer behavior, transaction dynamics, and product performance.
Overall, the selection of a probability of 0.5 for coupon distribution, along with the randomization of other parameters, ensures a realistic and flexible approach to simulating promotional activities for Cafecito. This approach allows for comprehensive exploration of various scenarios and their effects on customer engagement and order patterns, ultimately contributing to the optimization of promotional strategies in the coffee shop franchise.
The merging of all three CRM systems culminates in the creation of
the Final_Cafecito_Shop_Dataset, a comprehensive dataset
that consolidates customer, transaction, and coupon & offers
information into a unified framework. This integrated dataset combines
data from the Customer Table, Transaction Table, and Coupon & Offers
Table, offering a broader perspective on customer interactions,
transactional patterns, and promotional initiatives within the
Cafecito coffee shop franchise.
The Final_Cafecito_Shop_Dataset contains a total of
10,000 rows of observations, each representing a unique transaction, and
contains 26 columns. These columns include a rich range of attributes,
including customer demographics, transaction details, product
information, and coupon & offers data. By consolidating these
tables, a powerful resource has been created for analyzing customer
preferences, identifying trends, and optimizing marketing
strategies.
The attributes included in the
Final_Cafecito_Shop_Dataset are obtained from the customer,
transaction, and coupon & offers tables, ensuring that all relevant
information is captured and synthesized in one cohesive dataset. This
integration implements seamless analysis and reporting, which enables
actionable insights and informed decision making to drive business
growth and enhance customer satisfaction.
As a preview, the first six rows of the
Final_Cafecito_Shop_Dataset are presented below:
Final_Cafecito_Shop_Dataset <- Test5 # Final dataset with all three tables combined
dim(Final_Cafecito_Shop_Dataset)
## [1] 10000 27
Final_Cafecito_Shop_Dataset6 <- head(Final_Cafecito_Shop_Dataset)
knitr::kable(Final_Cafecito_Shop_Dataset6, caption = "The Final Cafecito Shop Dataset")
| CustomerNumber | FirstName | LastName | PersonaNumber | PersonaName | OrderID | OrderDate | OrderType | Product | ItemsPurchased | ItemSize | Quantity | PricePerUnit | OrderTotal | PaymentMethod | CouponCode | DiscountAmountPercent | IsCouponGiven | IsCouponRedeemed | CouponDiscountAmountPercent | CouponCampaignID | CouponDescription | PhoneNumber | EmailAddress | City | State | PostalCode |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 101022 | Shelby | FOLEY | 5 | The Health-Conscious Hero | 7279 | 2023-01-31 | Dine-In | Food | Bolon | One Size | 4 | 9.74 | 38.96 | Cash | CPN5 | 30 | No | No | 0 | CMP5 | Holiday Promotion | 492 - 895 - 2441 | shelbyfoley@example.com | Lewiston | ME | 04240 |
| 101022 | Shelby | FOLEY | 5 | The Health-Conscious Hero | 8004 | 2023-05-31 | Delivery | Food | Bolon | One Size | 3 | 8.38 | 25.14 | Online Payment | CPN5 | 30 | Yes | No | 0 | CMP5 | Holiday Promotion | 492 - 895 - 2441 | shelbyfoley@example.com | Lewiston | ME | 04240 |
| 101022 | Shelby | FOLEY | 5 | The Health-Conscious Hero | 7783 | 2023-11-15 | Take Out | Non_Coffee_Beverages | Guanabana Juice | 16oz | 5 | 7.46 | 37.30 | Online Payment | CPN5 | 30 | No | No | 0 | CMP5 | Holiday Promotion | 492 - 895 - 2441 | shelbyfoley@example.com | Lewiston | ME | 04240 |
| 101022 | Shelby | FOLEY | 5 | The Health-Conscious Hero | 8206 | 2023-08-05 | Take Out | Coffee_Beverages | Latte | 24oz | 3 | 13.22 | 39.66 | Cash | CPN5 | 30 | Yes | Yes | 30 | CMP5 | Holiday Promotion | 492 - 895 - 2441 | shelbyfoley@example.com | Lewiston | ME | 04240 |
| 101022 | Shelby | FOLEY | 5 | The Health-Conscious Hero | 2241 | 2023-10-04 | Dine-In | Tea_Beverages | Green Tea | 24oz | 5 | 9.24 | 46.20 | Online Payment | CPN5 | 30 | No | No | 0 | CMP5 | Holiday Promotion | 492 - 895 - 2441 | shelbyfoley@example.com | Lewiston | ME | 04240 |
| 101022 | Shelby | FOLEY | 5 | The Health-Conscious Hero | 5772 | 2023-06-26 | Take Out | Pastries | Concha | One Size | 1 | 8.47 | 8.47 | Cash | CPN5 | 30 | Yes | Yes | 30 | CMP5 | Holiday Promotion | 492 - 895 - 2441 | shelbyfoley@example.com | Lewiston | ME | 04240 |
write.csv(Final_Cafecito_Shop_Dataset, "Final_Cafecito_Shop_Dataset.csv", row.names = FALSE)
Furthermore, the creation of the
Final_Cafecito_Shop_Dataset represents a significant
milestone in this CRM system development, providing a comprehensive and
unified view of customer interactions and transactional dynamics. This
dataset serves as a valuable asset for strategic decision-making,
customer segmentation, and targeted marketing efforts, empowering
Cafecito to deliver exceptional experiences and drive success
in the competitive coffee shop industry.
The development of a comprehensive Customer Relationship Management
(CRM) system for Cafecito has been a strategic endeavor to gain
a better understanding of the potential customer base and optimizing
marketing strategies. Through meticulous design and integration of
customer, transaction, and promotional data, a powerful resource for
analyzing customer interactions and transactional dynamics was created,
Final_Cafecito_Shop_Dataset. This dataset, which contains
10,000 rows of observations and 27 columns, empowers Cafecito
with actionable insights for strategic decision-making and targeted
marketing efforts. By utilizing the insights gained from this CRM
system, Cafecito is ready to deliver exceptional experiences
and drive success in the competitive coffee shop industry. The
development of a CRM system that aims to drive growth, enhance customer
satisfaction, and solidify Cafecito’s position as a leading
coffee shop franchise.
Bluman, A. (2018). Elementary statistics: A step by step approach (10th ed.). McGraw Hill. Goodreads. (n.d.).
Kabacoff, R.I. (2022). R in action: Data analysis and graphics with R and tidyverse (3rd edition).