Let’s first take a look at the structure of each customer/project pair over time, in units of bi-weeks.
imputed_PDF_biweek_indiv_clean$account_created_at <-
  as.Date(imputed_PDF_biweek_indiv_clean$account_created_at)
# For each pair, mark every bi-week between its first and last observed TR as active.
imputed_PDF_biweek_indiv_clean <-
  imputed_PDF_biweek_indiv_clean |> group_by(user_project_pair) |>
  mutate(min_biweek = ifelse(any(observed == 1), min(biweek_i_indiv[observed == 1], na.rm = TRUE), NA),
         max_biweek = ifelse(any(observed == 1), max(biweek_i_indiv[observed == 1], na.rm = TRUE), NA),
         active_period = ifelse(biweek_i_indiv >= min_biweek & biweek_i_indiv <= max_biweek, 1, observed),
         account_created_min = min(account_created_at, na.rm = TRUE)) |> ungroup() |>
  arrange(account_created_min, user_project_pair, biweek_i_indiv) |>
  select(-min_biweek, -max_biweek, -account_created_min)
# Keep only the pair ID, bi-week index, and treatment columns needed by panelview.
imputed_PDF_biweek_indiv_clean_seg <- imputed_PDF_biweek_indiv_clean[, c(1:2, 53, 55)]
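As a sanity check on the new definition of active_period, here is a minimal toy sketch (a hypothetical observed sequence, not taken from PDF): bi-weeks between the first and last observed TR are filled in as active.
toy <- data.frame(user_project_pair = "A-1",
                  biweek_i_indiv = 1:5,
                  observed = c(0, 1, 0, 1, 0))
toy |> group_by(user_project_pair) |>
  mutate(active_period = ifelse(biweek_i_indiv >= min(biweek_i_indiv[observed == 1]) &
                                  biweek_i_indiv <= max(biweek_i_indiv[observed == 1]),
                                1, observed)) |> ungroup()
# -> active_period: 0 1 1 1 0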
The following time-series plot of self-pacing transaction and
redemption history is updated from Statistical Modeling II [3] by
employing the new definition of active_period.
Notations:
1: obs_TR (observed data with transaction/redemption recorded)
0: obs_nTR (observed data with transaction/redemption NOT recorded)
-1: mis (missing data, mainly for counterfactual inference if it hasn’t happened yet)
panelview(D = "active_period",
data = imputed_PDF_biweek_indiv_clean_seg,
index = c("user_project_pair", "biweek_i_indiv"),
display.all = TRUE, type = "treat", by.timing = TRUE,
xlab = "Biweek Index", ylab = "User Project Pair",
axis.lab.gap = c(2,10), main = "Self-Pacing Transaction & Redemption History")
## If the number of units is more than 300, we set "gridOff = TRUE".
## 3 treatment levels.
In this section, we create obs_TR_count_freq to record the frequency of each count of transactions and redemptions (TR for short).
obs_TR <- imputed_PDF_biweek_indiv_clean[imputed_PDF_biweek_indiv_clean$observed == 1,]
obs_TR_count <- table(obs_TR$user_project_pair)   # TR count per pair
obs_TR_count_freq <- table(obs_TR_count)          # number of pairs sharing each count
obs_TR_count_freq <- as.data.frame(obs_TR_count_freq)
colnames(obs_TR_count_freq) <- c("obs_count", "freq_obs_count")
ggplot(obs_TR_count_freq, aes(x = obs_count, y = freq_obs_count)) +
geom_bar(stat = "identity", fill = "steelblue2", color = "black") +
labs(title = "Frequency of Count of TR Made Within 1 Bi-week",
x = "Count of TR",
y = "Frequency") +
geom_text(aes(label = freq_obs_count), vjust = -0.5, size = 3) +
theme_minimal()
The function obs_TR_pair_k takes an input k and returns a vector of customer/project IDs whose TR count equals k. This saves time when we implement the subsetting examples.
obs_TR_pair_k <- function(k) {
obs_TR_count_k <- as.vector(names(obs_TR_count[obs_TR_count == k]))
return(obs_TR_count_k)
}
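For instance, a quick (hypothetical) call pulling the pairs with exactly one recorded TR:
obs_TR_pair_1 <- obs_TR_pair_k(1)
head(obs_TR_pair_1)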
The following example subsets the customer/project IDs whose TR counts exceed 20. A panel view is provided as usual.
obs_TR_count_gt20 <- as.vector(names(obs_TR_count[obs_TR_count > 20]))
length(obs_TR_count_gt20)
## [1] 365
model_PDF_TRgt20 <-
imputed_PDF_biweek_indiv_clean[imputed_PDF_biweek_indiv_clean$user_project_pair %in% obs_TR_count_gt20, ]
dim(model_PDF_TRgt20)
## [1] 20075 55
panelview(D = "active_period",
data = model_PDF_TRgt20,
index = c("user_project_pair", "biweek_i_indiv"),
display.all = TRUE, type = "treat", by.timing = TRUE,
xlab = "Biweek Index", ylab = "User Project Pair",
axis.lab.gap = c(2,10), main = "Self-Pacing TR History (active periods > 20)")
## If the number of units is more than 300, we set "gridOff = TRUE".
## 3 treatment levels.
For validation purposes, we hide 20% of the data and check whether we can recover it using the remaining 80%.
model_PDF_TRgt20 <- model_PDF_TRgt20[, -c(8, 10, 51:52, 54)]   # drop columns unused in modeling
model_PDF_TRgt20_imputed <- model_PDF_TRgt20
# Carry the last observation forward (zoo::na.locf) in columns 4-10.
model_PDF_TRgt20_imputed[, 4:10] <- na.locf(model_PDF_TRgt20_imputed[, 4:10])
# Zero-fill the remaining NAs in the numeric activity columns 11-48.
model_PDF_TRgt20_imputed[, 11:48][is.na(model_PDF_TRgt20_imputed[, 11:48])] <- 0
# Hide bi-weeks 44-54 (11 of the 55 bi-weeks per pair, i.e., 20%) as the test window.
model_PDF_TRgt20_imputed$hide <- 0
model_PDF_TRgt20_imputed$hide[model_PDF_TRgt20_imputed$biweek_i_indiv >= 44 &
                              model_PDF_TRgt20_imputed$biweek_i_indiv <= 54] <- 1
model_PDF_TRgt20_imputed_train <-
  model_PDF_TRgt20_imputed[model_PDF_TRgt20_imputed$hide == 0,]
model_PDF_TRgt20_imputed_test <-
  model_PDF_TRgt20_imputed[model_PDF_TRgt20_imputed$hide == 1,]
dim(model_PDF_TRgt20_imputed_train)
## [1] 16060 51
dim(model_PDF_TRgt20_imputed_test)
## [1] 4015 51
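A quick arithmetic check that the hidden window is the intended 20% (11 of the 55 bi-weeks for each of the 365 pairs):
4015 / (16060 + 4015)
## [1] 0.2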
panelview(D = "active_period",
data = model_PDF_TRgt20_imputed_train,
index = c("user_project_pair", "biweek_i_indiv"),
display.all = TRUE, type = "treat", by.timing = TRUE,
xlab = "Biweek Index", ylab = "User Project Pair",
axis.lab.gap = c(2,10), main = "Self-Pacing TR History (active periods > 20, training data)")
## If the number of units is more than 300, we set "gridOff = TRUE".
## 3 treatment levels.
This sub-section collapses the 3 treatment levels above into a binary treatment: levels 1 and 0 above are now under control, and level -1 is now under treatment.
# D_i = 1 on missing (counterfactual) bi-weeks, 0 otherwise.
model_PDF_TRgt20_imputed_train$D_i <-
  ifelse(model_PDF_TRgt20_imputed_train$observed == -1, 1, 0)
# treat flags the pairs that are ever treated.
model_PDF_TRgt20_imputed_train$treat <- ave(model_PDF_TRgt20_imputed_train$D_i,
                                            model_PDF_TRgt20_imputed_train$user_project_pair,
                                            FUN = function(x) as.integer(any(x == 1)))
panelview(D = "D_i",
data = model_PDF_TRgt20_imputed_train,
index = c("user_project_pair", "biweek_i_indiv"),
display.all = TRUE, type = "treat", by.timing = TRUE,
xlab = "Biweek Index", ylab = "User Project Pair",
axis.lab.gap = c(2,10), main = "Self-Pacing TR History (active periods > 20, binary treatment training data)")
## If the number of units is more than 300, we set "gridOff = TRUE".
Unfortunately, our data are not suitable for directly implementing bpCausal, since the software does not accept data with both staggered adoption (where different customers adopt purchase or redemption at different times) and staggered treatment assignment (where different customers receive the treatment at different times). Instead of rearranging the data to fit their model, we aim to create our own model that directly resolves this problem. This issue will be discussed in Statistical Modeling V in a .ipynb document.
Hence, Statistical Modeling III concentrates on customer segmentation under supervised learning. Due to heavy computation, unsupervised clustering methods will be discussed separately in Statistical Modeling IV.
We will rearrange the panel data structure of PDF so that it works smoothly in our designed program. First, we rename several columns below for easy reference.
names(PDF)[names(PDF) == "user_project_pair"] <- "pair"
names(PDF)[names(PDF) == "total_redemption_amount"] <- "redem"
names(PDF)[names(PDF) == "amount_charged_in_usd"] <- "charged"
names(PDF)[names(PDF) == "credit_given_in_usd"] <- "trans"
names(PDF)[names(PDF) == "cumsum_total_redemption_amount"] <- "cumsum_redem"
names(PDF)[names(PDF) == "cumsum_amount_charged_in_usd"] <- "cumsum_charged"
names(PDF)[names(PDF) == "cumsum_credit_given_in_usd"] <- "cumsum_trans"
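Equivalently, assuming dplyr is loaded, the renames can be collapsed into a single call:
PDF <- dplyr::rename(PDF,
                     pair = user_project_pair,
                     redem = total_redemption_amount,
                     charged = amount_charged_in_usd,
                     trans = credit_given_in_usd,
                     cumsum_redem = cumsum_total_redemption_amount,
                     cumsum_charged = cumsum_amount_charged_in_usd,
                     cumsum_trans = cumsum_credit_given_in_usd)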
The recency pattern discovered in EDA III [1] and IV [2] is an important indicator, so we will add this information to our modeling data, PDF. The detailed steps are:
1. Define last_date as the maximum value of the created_at column in PDF.
2. Create 3 new columns in PDF: recency_trans, recency_redem, and recency_charged.
3. If redem is not 0, record the difference in days between last_date and created_at in recency_redem; if redem is 0, record Inf.
4. Similarly, if trans is not 0, record the difference between last_date and created_at in recency_trans; otherwise record Inf.
5. Similarly, if charged is not 0, record the difference between last_date and created_at in recency_charged; otherwise record Inf.
last_date <- max(PDF$created_at)
PDF$recency_trans <- round(ifelse(PDF$trans != 0,
as.numeric(difftime(last_date, PDF$created_at, units = "days")), Inf))
PDF$recency_redem <- round(ifelse(PDF$redem != 0,
as.numeric(difftime(last_date, PDF$created_at, units = "days")), Inf))
PDF$recency_charged <- round(ifelse(PDF$charged != 0,
as.numeric(difftime(last_date, PDF$created_at, units = "days")), Inf))
head(PDF[, c(3,4, 80:82)], 10)
## pair created_at recency_trans recency_redem recency_charged
## 1 100001-695 2022-04-22 284 Inf 284
## 2 100002-676 2022-04-22 Inf 284 Inf
## 3 100002-676 2022-04-23 283 Inf 283
## 4 100003-185 2022-04-22 Inf 284 Inf
## 5 100003-185 2022-04-22 284 Inf 284
## 6 100003-185 2022-04-23 283 Inf 283
## 7 100005-692 2022-04-22 Inf 284 Inf
## 8 100005-692 2023-01-09 Inf 22 Inf
## 9 100005-692 2023-01-09 22 Inf 22
## 10 100007-695 2022-04-23 283 Inf 283
This sub-section begins the merging summary for PDF, taking the essential covariates for our subsequent analyses. Notice that PDF_cov is created as a simplified copy of PDF, and we rename the time index biweek_i_indiv as t for simplicity.
PDF_cov <- PDF[, c(3, 6, 7:9, 11, 12:14, 16, 80:82, 17, 22:61, 76, 62:66, 78:79)]
names(PDF_cov)[names(PDF_cov) == "biweek_i_indiv"] <- "t"
Our merging summary data frame is called PDF_JA, which refers to the panel data frame with jagged arrays. The expanded covariates are case-dependent:
For numeric columns, such as redem, charged, trans, and tip, their average (avg_redem, etc.) and nonzero frequency (freq_redem, etc.) are created.
For cumulative numeric columns, such as cumsum_redem, cumsum_charged, cumsum_trans, and cumsum_tip, their maximum is returned (max_cumsum_trans, etc.).
For recency-related columns, such as recency_redem, recency_trans, and recency_charged, their minimum is returned (min_recency_redem, etc.).
For character columns and indicator variables like is_reconcilable, suspicious_promo_binary, and utm_binary, since each pair shares the same information across rows, these inputs are returned unchanged.
For expanded factor types, such as Venue.Type...Detail_Bar, their nonzero frequency is returned (freq_Venue.Type...Detail_Bar).
PDF_JA <- PDF_cov |>
  group_by(pair, t) |>
  summarise(
    # Numeric columns: average and nonzero frequency.
    across(1:4,
           list(avg = ~round(mean(.), 2),
                freq = ~sum(. != 0 & !is.na(.))),
           .names = "{.fn}_{.col}"),
    # Cumulative columns: maximum within the bi-week.
    across(5:8,
           list(max = max),
           .names = "{.fn}_{.col}"),
    # Recency columns: minimum within the bi-week.
    across(9:11,
           list(min = min),
           .names = "{.fn}_{.col}"),
    # Character/indicator columns: constant within a pair, so take the first value.
    across(12:14, ~first(.), .names = "{.col}"),
    # Expanded factor dummies: nonzero frequency.
    across(15:52,
           ~sum(. != 0 & !is.na(.)),
           .names = "freq_{.col}"),
    across(53:60, ~first(.), .names = "{.col}")
  ) |>
  ungroup()
## `summarise()` has grouped output by 'pair'. You can override using the
## `.groups` argument.
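The positional indices inside across() are fragile if the column order of PDF_cov ever changes. A name-based sketch of the first three blocks (assuming the renamed prefixes above, plus tip and cumsum_tip columns as suggested by the names below) would be:
PDF_JA_alt <- PDF_cov |>
  group_by(pair, t) |>
  summarise(
    across(c(redem, charged, trans, tip),
           list(avg = ~round(mean(.), 2),
                freq = ~sum(. != 0 & !is.na(.))),
           .names = "{.fn}_{.col}"),
    across(starts_with("cumsum_"), list(max = max), .names = "{.fn}_{.col}"),
    across(starts_with("recency_"), list(min = min), .names = "{.fn}_{.col}"),
    .groups = "drop"
  )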
colnames(PDF_JA)
## [1] "pair"
## [2] "t"
## [3] "avg_redem"
## [4] "freq_redem"
## [5] "avg_charged"
## [6] "freq_charged"
## [7] "avg_trans"
## [8] "freq_trans"
## [9] "avg_tip"
## [10] "freq_tip"
## [11] "max_cumsum_redem"
## [12] "max_cumsum_charged"
## [13] "max_cumsum_trans"
## [14] "max_cumsum_tip"
## [15] "min_recency_trans"
## [16] "min_recency_redem"
## [17] "min_recency_charged"
## [18] "is_reconcilable"
## [19] "suspicious_promo_binary"
## [20] "utm_binary"
## [21] "freq_Venue.Type...Detail_Bar"
## [22] "freq_Venue.Type...Detail_Cafe"
## [23] "freq_Venue.Type...Detail_Casual.Dining"
## [24] "freq_Venue.Type...Detail_Casual.Fine.Dining"
## [25] "freq_Venue.Type...Detail_Coffee.Shop"
## [26] "freq_Venue.Type...Detail_Fast.Casual"
## [27] "freq_Venue.Type...Detail_Fast.Food"
## [28] "freq_Venue.Type...Detail_Fine.Dining"
## [29] "freq_Venue.Type...Detail_Market"
## [30] "freq_Venue.Type...Detail_Nightclub"
## [31] "freq_Venue.Type...Detail_Unknown"
## [32] "freq_Check.Average_High"
## [33] "freq_Check.Average_Low"
## [34] "freq_Check.Average_Mid"
## [35] "freq_Check.Average_Very.High"
## [36] "freq_Check.Average_Very.Low"
## [37] "freq_Check.Average_Missing"
## [38] "freq_Service.Type_Fast.Casual"
## [39] "freq_Service.Type_Full.Service"
## [40] "freq_Service.Type_Ghost.Kitchen"
## [41] "freq_Service.Type_QSR"
## [42] "freq_Service.Type_Unknown"
## [43] "freq_is_app_purchase_0"
## [44] "freq_is_app_purchase_1"
## [45] "freq_is_app_purchase_Missing"
## [46] "freq_transaction_type_0"
## [47] "freq_transaction_type_1"
## [48] "freq_transaction_type_Missing"
## [49] "freq_credit_type_0"
## [50] "freq_credit_type_5"
## [51] "freq_credit_type_Missing"
## [52] "freq_is_excess_0"
## [53] "freq_is_excess_1"
## [54] "freq_stripe_brand_American.Express"
## [55] "freq_stripe_brand_Discover"
## [56] "freq_stripe_brand_MasterCard"
## [57] "freq_stripe_brand_Visa"
## [58] "freq_stripe_brand_Other"
## [59] "account_created_at"
## [60] "project_name"
## [61] "Location"
## [62] "project_location_id"
## [63] "city"
## [64] "state"
## [65] "user_app_version"
## [66] "user_app_platform"
The RFM scoring system [4] is based on the following principles; a toy scoring sketch follows this list.
Recency (R): Represents the freshness of customer activity. A lower recency value indicates more recent activity, so a lower recency value corresponds to a higher score in the R dimension. This reflects the fact that more recently active customers are often more engaged.
Frequency (F): Reflects how often a customer purchases. A higher frequency value should correspond to a higher score in the F dimension, as more frequent customers are typically more loyal.
Monetary (M): Represents the total revenue generated by the customer. A higher monetary value should correspond to a higher score in the M dimension, as customers who spend more are generally more valuable to the business.
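As a minimal sketch of these scoring directions (toy values, using dplyr's ntile() as in the scoring code below):
rec <- c(5, 40, 120, 200, 300)   # days since last activity
6 - ntile(rec, 5)                # 5 4 3 2 1: lower recency, higher R score
freq <- c(1, 2, 4, 8, 16)        # purchase counts
ntile(freq, 5)                   # 1 2 3 4 5: higher frequency, higher F score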
10 customer segments (excluding the Unknown type) are defined below with various levels of R, F, and M. This segmentation approach is inspired by RFM - Customer Level Data (https://rfm.rsquaredacademy.com/articles/rfm-customer-level-data.html) [4], though I edit the inner structure here a bit to better suit our RFM data. Notice that each of our F and M dimensions includes 3 sub-components: transaction, redemption, and charged amount.
segments <- list(
"Champions" = c(5, 5, 5),
"Potential Loyalist" = list(3:5, 3:5, 2:5),
"Loyal Customers" = list(2:4, 2:4, 2:4),
"Promising" = list(3:4, 1:3, 3:5),
"New Customers" = list(4:5, 1:3, 1:5),
"Can't Lose Them" = list(1:2, 3:4, 4:5),
"At Risk" = list(1:2, 2:5, 4:5),
"Need Attention" = list(1:3, 3:5, 3:5),
"About To Sleep" = list(2:3, 1:3, 1:4),
"Lost" = list(1, 1:5, 1:5)
)
# Return the first segment whose R/F/M ranges contain the given scores.
# The ranges overlap, so the order of the segments list sets the matching priority.
assign_segments <- function(r_score, f_score, m_score) {
  for (segment in names(segments)) {
    r <- segments[[segment]][[1]]
    f <- segments[[segment]][[2]]
    m <- segments[[segment]][[3]]
    if (r_score %in% r & f_score %in% f & m_score %in% m) {
      return(segment)
    }
  }
  return("Unknown")
}
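A few illustrative calls (hypothetical scores) showing how the matching order resolves the overlaps:
assign_segments(5, 5, 5)   # "Champions"
assign_segments(3, 3, 3)   # "Potential Loyalist" (matched before "Loyal Customers")
assign_segments(1, 1, 1)   # "Lost"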
RFM <- PDF_JA |> group_by(pair) |>
summarise(
R_redem = min(min_recency_redem, na.rm = TRUE),
R_trans = min(min_recency_trans, na.rm = TRUE),
R_charged = min(min_recency_charged, na.rm = TRUE),
F_redem = sum(freq_redem, na.rm = TRUE),
F_trans = sum(freq_trans, na.rm = TRUE),
F_charged = sum(freq_charged, na.rm = TRUE),
M_redem = max(max_cumsum_redem, na.rm = TRUE),
M_trans = max(max_cumsum_trans, na.rm = TRUE),
M_charged = max(max_cumsum_charged, na.rm = TRUE)
)
head(RFM)
## # A tibble: 6 × 10
## pair R_redem R_trans R_charged F_redem F_trans F_charged M_redem M_trans
## <chr> <dbl> <dbl> <dbl> <int> <int> <int> <dbl> <dbl>
## 1 100001-695 Inf 284 284 0 1 1 0 48.6
## 2 100002-676 284 283 283 1 1 1 126. 1.5
## 3 100003-185 284 283 283 1 2 2 292. 292.
## 4 100005-692 22 22 22 2 1 1 118 18
## 5 100007-695 Inf 283 283 0 1 1 0 26.3
## 6 100008-692 284 283 283 1 1 1 19.5 19.5
## # ℹ 1 more variable: M_charged <dbl>
Let’s first implement R_trans, F_trans, and M_trans for the dimensions of R, F, and M. This is considered the most standard form of the RFM model, since it is based only on transaction history. The output here is RFM_score_t.
RFM_score_t <- RFM |>
mutate(
R_score = ifelse(R_trans == Inf, 1, 6 - ntile(R_trans, 5)),
F_score = ntile(F_trans, 5),
M_score = ntile(M_trans, 5),
RFM_score = paste(R_score, F_score, M_score, sep = "")
)
RFM_score_t$segment <- mapply(assign_segments,
RFM_score_t$R_score,
RFM_score_t$F_score,
RFM_score_t$M_score)
RFM_score_t[1:10, c(1, 11:15)]
## # A tibble: 10 × 6
## pair R_score F_score M_score RFM_score segment
## <chr> <dbl> <int> <int> <chr> <chr>
## 1 100001-695 3 2 2 322 Loyal Customers
## 2 100002-676 3 2 2 322 Loyal Customers
## 3 100003-185 3 5 4 354 Potential Loyalist
## 4 100005-692 5 2 2 522 New Customers
## 5 100007-695 3 2 2 322 Loyal Customers
## 6 100008-692 3 2 2 322 Loyal Customers
## 7 100009-720 3 5 4 354 Potential Loyalist
## 8 100012-702 3 2 4 324 Loyal Customers
## 9 100013-697 3 5 4 354 Potential Loyalist
## 10 100014-706 3 2 3 323 Loyal Customers
Let’s then implement a combination of R_trans, R_redem, and R_charged; a combination of F_trans, F_redem, and F_charged; AND a combination of M_trans, M_redem, and M_charged for the dimensions of R, F, and M. The logic implemented here is as follows (a quick check of the pmin() behavior follows this list):
Recency (R): Since we have 3 sub-components (R_redem, R_trans, R_charged), we consider the minimum of these 3 as the recency value. Because we take the minimum, Inf survives only when all 3 sub-components are Inf; in that case we automatically assign the lowest score in the R dimension.
Frequency (F): We sum the sub-components (F_redem, F_trans, F_charged) to get the frequency value.
Monetary (M): Similarly, we sum the sub-components (M_redem, M_trans, M_charged) to get the monetary value.
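A quick check of the pmin() behavior underlying the R dimension:
pmin(Inf, 284, 283)   # 283: any finite recency wins over Inf
pmin(Inf, Inf, Inf)   # Inf: only here is R_score forced to 1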
The output here is RFM_score.
RFM_score <- RFM |>
mutate(
R = pmin(R_redem, R_trans, R_charged, na.rm = TRUE),
F_sum = F_redem + F_trans + F_charged,
M_sum = M_redem + M_trans + M_charged
) |>
mutate(
R_score = ifelse(R == Inf, 1, 6 - ntile(R, 5)),
F_score = ntile(F_sum, 5),
M_score = ntile(M_sum, 5),
RFM_score = paste(R_score, F_score, M_score, sep = "")
)
RFM_score$segment <- mapply(assign_segments,
RFM_score$R_score,
RFM_score$F_score,
RFM_score$M_score)
RFM_score[1:10, c(1, 14:18)]
## # A tibble: 10 × 6
## pair R_score F_score M_score RFM_score segment
## <chr> <dbl> <int> <int> <chr> <chr>
## 1 100001-695 2 1 2 212 About To Sleep
## 2 100002-676 2 2 2 222 Loyal Customers
## 3 100003-185 2 4 5 245 Can't Lose Them
## 4 100005-692 5 4 2 542 Potential Loyalist
## 5 100007-695 2 1 1 211 About To Sleep
## 6 100008-692 2 2 1 221 About To Sleep
## 7 100009-720 2 4 4 244 Loyal Customers
## 8 100012-702 2 2 4 224 Loyal Customers
## 9 100013-697 2 5 4 254 At Risk
## 10 100014-706 2 2 3 223 Loyal Customers
colors <- brewer.pal(10, "Set3")
segment_descriptions <- c(
"Champions" = "Champions (R: 5, F: 5, M: 5)",
"Potential Loyalist" = "Potential Loyalist (R: 3-5, F: 3-5, M: 2-5)",
"Loyal Customers" = "Loyal Customers (R: 2-4, F: 2-4, M: 2-4)",
"Promising" = "Promising (R: 3-4, F: 1-3, M: 3-5)",
"New Customers" = "New Customers (R: 4-5, F: 1-3, M: 1-5)",
"Can't Lose Them" = "Can't Lose Them (R: 1-2, F: 3-4, M: 4-5)",
"At Risk" = "At Risk (R: 1-2, F: 2-5, M: 4-5)",
"Need Attention" = "Need Attention (R: 1-3, F: 3-5, M: 3-5)",
"About To Sleep" = "About To Sleep (R: 2-3, F: 1-3, M: 1-4)",
"Lost" = "Lost (R: 1, F: 1-5, M: 1-5)",
"Unknown" = "Unknown"
)
ggplot(RFM_score_t, aes(x = segment, fill = segment_descriptions[segment])) +
geom_bar() +
scale_fill_brewer(palette = "Set3", na.value = "gray", name = "Segment Description") +
ggtitle("Segment Distribution for RFM_score_t") +
xlab("Segment") +
ylab("Count") +
theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "right")
marker_size <- 3
plot_ly(RFM_score_t, x = ~R_trans, y = ~F_trans, z = ~M_trans,
color = ~segment_descriptions[segment], colors = "Set3") |>
add_markers(marker = list(size = marker_size)) |>
layout(scene = list(xaxis = list(title = "R_trans"),
yaxis = list(title = "F_trans"),
zaxis = list(title = "M_trans")),
title = "3D Scatterplot for RFM_score_t")
ggplot(RFM_score, aes(x = segment, fill = segment_descriptions[segment])) +
geom_bar() +
scale_fill_brewer(palette = "Set3", na.value = "gray", name = "Segment Description") +
ggtitle("Segment Distribution for RFM_score") +
xlab("Segment") +
ylab("Count") +
theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "right")
plot_ly(RFM_score, x = ~R, y = ~F_sum, z = ~M_sum,
color = ~segment_descriptions[segment], colors = "Set3") |>
add_markers(marker = list(size = marker_size)) |>
layout(scene = list(xaxis = list(title = "R"),
yaxis = list(title = "F_sum"),
zaxis = list(title = "M_sum")),
title = "3D Scatterplot for RFM_score")
The segments of RFM_score_t and RFM_score are predefined, meaning that the actual underlying structure in the data might not be captured effectively. Also, the approach is rigid, as it relies on specific rules for segmentation; any changes in customer behavior or market dynamics might necessitate a change in these rules. Last, unlike probabilistic models, such a classification approach does not provide any measure of uncertainty associated with the segment assignments. Given these disadvantages, we aim to revisit this problem from an unsupervised learning approach in Statistical Modeling IV. We will mainly take a look at 2 approaches: K-means clustering, used heavily in the data mining and customer segmentation literature in marketing science, and Gaussian Mixture Models (GMM), a potentially new direction.
[1] Z. Jiang, “RPubs - Exploratory Data Analysis III - Redemptions and Revenue,” rpubs.com, May 13, 2023. https://rpubs.com/jiangzm/1041691
[2] Z. Jiang, “RPubs - Exploratory Data Analysis IV - Further Revision on Redemptions and Revenue,” rpubs.com, May 17, 2023. https://rpubs.com/jiangzm/1044338
[3] Z. Jiang, “RPubs - Statistical Modeling II - Visualization of Causal Panel Analysis on inKind Data,” rpubs.com, Aug. 04, 2023. https://rpubs.com/jiangzm/1069499
[4] A. Hebbali, “RFM - Customer Level Data,” rfm.rsquaredacademy.com, May 25, 2023. https://rfm.rsquaredacademy.com/articles/rfm-customer-level-data.html