MAST 6252 - Applied Predictive Analytics II

Joyful Voyages, Inc. (JV) runs vacation cruises to various destinations around the world. In 2009, the company acquired a new cohort of 18,402 customers who took their first cruise that year. The ‘cruisescalib.txt’ file records all 26,748 cruises taken by this cohort from 2009 to 2014. By definition, all customers take a cruise in the year in which they are acquired (in this case, 2009), some are modeling only the repeat cruises from 2010 to 2014 (i.e., customers whose only cruise came in 2009 have taken zero repeat cruises during the observation period). Each customer can take at most one cruise per year.

Estimate both a BB and a BG-BB model on these data, and report the parameter estimates. Interpret the parameters for each model, in the context of each model’s implicit assumptions.

#Define function for getting parameters

log_bb <- function(x,m,a,b) {
lchoose(m, x) + lbeta(a+x, b+m-x) - lbeta(a,b)
}
LL_BB <- function(pars, N, m,x) {
a <- exp(pars[1])
b <- exp(pars[2])
z <- log_bb(x,m,a,b)
return(-sum(N*z))
}

M = 2014 - 2009

#Getting a,b parameters for BB model
start <- log(c(1,1))
opt <- optim(start, LL_BB, N=BB[['N']], m = M, x = BB[['Occur']])
BB_a <- exp(opt$par[1])
BB_b <- exp(opt$par[2])
BB_LL <- opt[["value"]]

# Get number of rows as n value for bgbb.LL function
rows <- nrow(BGBB)

#Create a function to sum up bgbb.LL
LL_BGBB <- function(pars, x, tx, m, N){
  a <- exp(pars[1])
  b <- exp(pars[2])
  c <- exp(pars[3])
  d <- exp(pars[4])
  p <- c(a, b, c, d)
  z <- bgbb.LL(p, x, tx, m)
  return(-sum(N*z))
}

#Get a, b, c, d parameters
start <- log(c(1,1,1,1))
opt <- optim(start, LL_BGBB, x = BGBB[['Occur']], tx = BGBB[['Recency']], m = BGBB[['m']], N=BGBB[['N']], method = "BFGS")
BGBB_a <- exp(opt$par[1])
BGBB_b <- exp(opt$par[2])
BGBB_c <- exp(opt$par[3])
BGBB_d <- exp(opt$par[4])
BGBB_LL <- opt[["value"]]

For Beta-binomial distribution, we have parameters a = 0.9 and b = 9.05. Beta-binomial has optimal likelihood 16535.36. Since “a” is smaller than 1 and “b” is bigger than 1, the distribution of (a, b) is “L” shape. As for the assumptions, BB model assumes that all customers will not “die”.

For Beta-Geometric-Beta-binomial distribution, we have parameters a = 1.47, b = 10.78, c = 0.33, and d = 2.36. The distribution has optimal likelihood 27378.13. The distribution of heterogeneity of drop out rate is also “L” shape. That being said, both BB and BGBB models’ parameters shows the cohort of 18,402 customers who decide not to travel with Joyful Voyages, Inc. drop out in the beginning of the observed period (2009-2014).

BGBB model assumes that customers can either be “alive” or “dead”. A “dead” customer will not have transaction in the future, providing no future value to the company.

A table of parameters and likelihood is provided here:

Parameters	BB	BGBB
a	0.90	1.47
b	9.05	10.78
c	NA	0.33
d	NA	2.36
LL	16535.36	27378.13

One way to assess how well a model fits is to compare observed and predicted “histograms” ofthe number of cruises. That is, for the 18,402 customers in the dataset, how many made (orare predicted to make),x= 0,1,2, . . .repeat cruises? Createbotha table and a bar plot todemonstrate this comparison. For the table, each row is a number of cruises (0,1,2, . . .), andeach column is the number of customers taking that number of cruises, as either observed inthe dataset or predicted by each model. For the plot, the number of cruises is on the x-axis,and the number of customers is on the y-axis. Over eachx, you will have three vertical barsnext to each other, each with its own color: one for the observed counts, and one each for theBB and BG-BB models. Be sure all axes, scales, and colors are labeled appropriately. Discuss how well you think the models fit the observed data, and why. Explain any clear discrepancies you may see.

#Get Number of Customers (in this particular cohort)
people <- nrow(df)

x <- c(0:M)

#Get BB rate
BB_rate <- exp(log_bb(x, M, BB_a, BB_b))

BGBB_pars <- c(BGBB_a, BGBB_b, BGBB_c, BGBB_d)
BGBB_rate <- bgbb.pmf(BGBB_pars, M, x)

#Store result in a dataframe
result <- data.frame(Number_of_Cruises = x,
                      BB = round((people * BB_rate),0),
                      BGBB = round((people * BGBB_rate),0),
                      Actual = BB$N
                      )


#Transform result to a histogram
hist <- melt(result, id.vars = 'Number_of_Cruises')
names(hist)[names(hist) == 'value'] <- 'Number_of_Customers'


kable(result, format = 'html') %>%
  kable_styling("striped", full_width = F)

Number_of_Cruises	BB	BGBB	Actual
0	12350	12358	12351
1	4272	4246	4260
2	1349	1370	1380
3	354	355	330
4	69	66	69
5	7	7	12

ggplot(data = hist, aes(x = Number_of_Cruises, y = Number_of_Customers, fill = variable)) + geom_bar(stat = 'identity', position = 'dodge')

The results of both models are very similar, both very close to observed results. This may result in short period of time. BGBB is better predicting in a long run. At 5 year period, both models are doing great. The reason that both models have similar result is because customers tend to stay with the business in short amount of time. Very few customers are “dead” at this time.

Next, let’s see how well the models capture the distribution of cruise counts during a holdout (forecast) period. The file cruiseshold.txt contains repeat cruises made by the same set of 18,402 customers from 2015 to 2018. Create the same types of table and plot as in Question 2, but for the observed and predicted customer counts for each number of cruises during thisholdout period. As above, discuss how well the two models did when predicting data in theholdout period, and explain why either model might do better than the other.Use the parameter estimates from the calibration data to predict counts during the holdoutperiod. Do not re-estimate the models using the holdout data.Note that some customers who took at least one cruise from 2009 to 2014 may not have takenany cruises at all from 2015 to 2018. Nevertheless, these customers still need to be accountedfor in the holdout comparison. You will need to do some kind of dataset manipulation to ensureall customers appear; a database “join” may be one way to do it.

BB_hold <- df2 %>%
  group_by(Occur) %>%
  summarise(N = n())


m1 <- 4
x1 <- c(0:m1)

BB_rate1 <- exp(log_bb(BB_hold$Occur, m1, BB_a, BB_b))
BGBB_rate1 <- bgbb.pmf.General(BGBB_pars, M, m1, 0:m1)

#Store result in a dataframe
result1 <- data.frame(Number_of_Cruises = x1,
                      BB = round((people * BB_rate1),0),
                      BGBB = round((people * BGBB_rate1),0),
                      Actual = BB_hold$N
                      )

#Transform result to a histogram
hist1 <- melt(result1, id.vars = 'Number_of_Cruises')
names(hist1)[names(hist1) == 'value'] <- 'Number_of_Customers'


kable(result1, format = 'html') %>%
  kable_styling("striped", full_width = F)

Number_of_Cruises	BB	BGBB	Actual
0	13204	14313	14216
1	3958	3057	3162
2	1022	851	849
3	197	164	161
4	21	17	14

ggplot(data = hist1, aes(x = Number_of_Cruises, y = Number_of_Customers, fill = variable)) + geom_bar(stat = 'identity', position = 'dodge')

When predicting future transactions, BGBB are closer to actual value. This is because BGBB takes ‘“dead”’ customer out of forecast in the future and predict who are alive only, while BB takes all customers and forecast them together. ‘“Dead”’ customers left in BB model created noise that disrupted accurate prediction.

It is now the nearing the end of 2018, and the company is preparing its forecasts for 2019 andbeyond. The company is assuming that the distributions of cruise probabilities and lifetimerates for new cohorts have remained stationary since 2009, and is comfortable using its existingBG-BB (not BB) parameter estimates for subsequent analysis. No more cruises are expected to be sold in 2018. The company earns a margin of $241 from each cruise, and its annual costof capital (discount rate) is 13%.

Under the maintained assumptions of the BG-BB model, and conditioning on all observed data from 2010 to 2018, what is the expected number of the original 18,402 customers acquired in 2009 who are still “alive” at the start of 2019? (You will need to combine the calibration and holdout datasets to answer this question. This is not a trivial task, so think hard about how you want to do that).
At the beginning of 2019, what is the total expected residual lifetime value of the 2009 cohort?

#Combine both data
all <- df2
all <- subset(all, select = -c(Occur, Recency))

for(i in 1:nrow(all)) {
  all$Year[i] <- ifelse(all$Year.x[i] >= all$Year.y[i], all$Year.x[i], all$Year.y[i])
  all$Occur[i] <- sum(all$`2010`[i], all$`2011`[i],all$`2012`[i], all$`2013`[i], all$`2014`[i], all$`2015`[i], all$`2016`[i], all$`2017`[i], all$`2018`[i])
  all$Recency[i] <- all$Year[i] - 2009
}

all <- subset(all, select = -c(Year.x, Year.y))

m2 <- 9

pred <- all %>%
  group_by(Recency, Occur) %>%
  summarise(N=n()) %>%
  mutate(m = m2)

alive <- bgbb.PAlive(BGBB_pars, pred$Occur, pred$Recency, pred$m)
alive_num <- round(sum(alive * pred$N),0)

discount <- .13
value <- bgbb.DERT(BGBB_pars, pred$Occur, pred$Recency, pred$m, .13)
value <- sum(value * 241 * pred$N)

(a). The number of customers who are still alive at the beginning of 2019 from 2009 cohort is 10253.

To know how many customers acquired in 2009 will still be alive in the beginning of 2019, we first merge the data from 2010 to 2018 and group by the ‘Recency’ and ‘Occurence.’ We then utilized the “bgbb.PAlive” function to calculate “the probability of being alive.” The probability of being alive (alive) times number of customers (N) in each category allows us to get the total predicted number of customers being alive in 2019.

This number gives us an idea that approximately 55.7% of 2009 cohort will still be alive at the start of 2019.

(b). The total expected residual lifetime value of the 2009 cohort is $1997407.

As we know the discount rate is 13%, we can utilize the “bgbb.DERT” function to derive the discounted expected residual transactions (DERT). The total expected residual lifetime value of the 2009 cohort can thus be obtained through DERT times number of customers (N) in each category times the margin $241 from each cruise.

While interpreting the expected residual lifetime value $1997407, we need to be careful as the marketing for profits and marketing for loyalty are two different stratgies. The loyal customers might ask for more financial benefits or services in return for their loyalty. These requests can lead to the lower margin. Therefore, the marketing team might want to look at more metrics to make sure they spend an appropriate amount for running an efficient custmoer retention program.

The company often acquires new customers through targeted online marketing campaigns. It believes that in the upcoming campaign for 2019, it can acquire 11,993 new customers.

By definition, all members of this cohort will take a cruise in 2019. But how many cruises can the company expect from this new cohort in each year from 2020 to 2024? Show this information as a table or plot (your choice).
What is the most that the company should spend on that campaign? Note that the customers have not yet been acquired, and payment for the first cruise has not yet been made. However, the payment for the first cruise will be certain and immediate. For simplicity, your schedule of annual cash flow may assume that all payments are made on January 1 of each year. That is, do not worry about how purchases are made at different times during a year.

year_num <- round(diff(11993 * bgbb.Expectation(BGBB_pars, 0:5)),0)
new_cohort <- data.frame(Year = 2020: 2024,
                         Number_of_CUstomer = year_num
)

expect <- 11993*241+11993*241*bgbb.DERT(BGBB_pars, 0, 0, 0, discount)

(a). Numbers of cruises the company can make from 2019 cohort in between 2020 to 2024 are in the following table:

kable(new_cohort, format = 'html') %>%
  kable_styling("striped", full_width = F)

Year	Number_of_CUstomer
2020	1263
2021	1149
2022	1068
2023	1006
2024	956

The company can make $4598361, including money they make in 2019. Therefore, the company should spend any amount in investment that is below $4598361.

MAST 6252 - Applied Predictive Analytics II - Assignment 2

Jack Chen, Sandy Hsieh, Carlos Tian

January 27, 2019