Abstract

Achieving optimal learning outcomes requires consistent engagement with learning activities over a sufficiently long period of time. However, sustaining regular engagement can be challenging, especially in a highly autonomous environment such as online learning. Using data from a subscription-based online learning platform, we investigate the factors that drive individuals' daily learning decisions and the subsequent evolution of consistency, or habit formation, in learning. We jointly model individual-level product usage and subscription decisions using a Bayesian nonparametric Gaussian process framework, which allows us to flexibly capture the dynamic, heterogeneous effects of contextual and individual-specific factors on consumer engagement and contract choices.

keywords: customer-base analysis, habit formation

Introduction

Literature

customer-base analysis:

Gopalakrishnan et al. (2017) highlight the perils of pooling data across cohorts without accounting for cross-cohort shifts. They develop a vector changepoint model to uncover evidence of (latent) regime changes for each cohort-level parameter separately, while disentangling cross-cohort changes from calendar-time changes.

Data and Summary Statistics

## [1] "Individuals That Enrolled Free Trials by Trial Length/Year/Program:"
##               2014 2015 2016  2017  2018  2019   2020   2021  2022
## (-2.05e+03,6]    0    0    2    11    30    81   1071   5490  3996
## (6,7]            0    0    0     3     5  4758 111801 139866 86963
## (7,14]           0    0    0    30   110   195   1054   1097   780
## (14,15]         12   15   35 40129 69930 76908 202820    246    83
## (15,16]          5    0    0   149   284   365    651     30    25
## (16,7.76e+03]   18    7   27  3008  2485   703   5029   1450   291
## COMP_LECTORA     0    0    0     0     0     0  10599  39031 32120
## MATEMATICAS     38   23   64 43385 73355 84010 313954 110206 60781
## [1] "Trial Period Information Availability and Association with Other Accounts"
##            multip_kids
## no_trialend  FALSE   TRUE
##       FALSE 373457 388591
##       TRUE     513   5005
## [1] "Number of New Subscriptions by Year and Contract Type"
##        101  102  103 104 134 135  136
## total 6089 5938 2024 753 944 283 3403
## 2017  1163 1660  425 134 133  19  492
## 2018  2697 2376  877 312 311  93 1149
## 2019  2229 1902  722 307 500 171 1762

Trial Usage

Subset a smaller sample of users that satisfy the following criteria:

  • started the trial period between 2017-01-01 and 2019-10-01
  • 15-day trial period
  • single child (the only child account associated with the tutor account)
  • who only subscribe to regular/summer promotional contracts

Data Processing:

  • Replace the recorded age of the 3434 users whose age falls outside our target child age range (4 to 14) with a random draw from the age distribution of the customer base.
  • Fill in days of the trial with no recorded observation as 0 daily usage.
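
The zero-filling step can be sketched as follows. This is a minimal illustration in base R on hypothetical toy data, not the code used to build the sample; the column names `alumno`, `tr_day` and `use` follow the session data, while the `usage` data frame and `trial_length` are assumptions made for the example.

```r
# Hypothetical toy data: one row per (user, trial day) with a recorded session.
usage <- data.frame(
  alumno = c(1, 1, 2),
  tr_day = c(1, 3, 2),
  use    = c(1, 1, 1)
)

trial_length <- 15  # assumed: 15-day trial, as in the sample definition

# Build the full user-by-day grid, then left-join the observed usage onto it.
grid  <- expand.grid(alumno = unique(usage$alumno),
                     tr_day = seq_len(trial_length))
panel <- merge(grid, usage, by = c("alumno", "tr_day"), all.x = TRUE)

# Days without a recorded session are treated as zero usage.
panel$use[is.na(panel$use)] <- 0
panel <- panel[order(panel$alumno, panel$tr_day), ]
```

The result is a balanced panel with one row per user and trial day, which is the shape the day-by-day usage summaries rely on.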

Sample size: 87,805 users

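library(dplyr)      # %>%, select, left_join, mutate, filter
library(lubridate)  # year()

# trial_window, session and con.id are created earlier in the data preparation.
# Keep 15-day trials of single-child accounts that started in
# [2017-01-01, 2019-10-01), excluding the users listed in con.id.
trial15 = subset(trial_window,tr_length==15 & num_asso_kids==1 & fecha>='2017-01-01' & fecha < '2019-10-01' & !alumno %in% con.id)

# Age in years at trial start; age_ran flags ages outside the 4-14 target range.
trial15$age = year(trial15$fecha) - year(trial15$fecha_nacimiento)
trial15$age_ran = ifelse(trial15$age %in% seq(4,14),0,1)
# Empirical distribution of in-range ages (bins (3,4], ..., (13,14]).
p = prop.table(table(cut(trial15$age,breaks=seq(3,14))))

# Replace out-of-range ages with random draws from that distribution.
set.seed(231)
trial15$age[!trial15$age %in% seq(4,14)] = sample(seq(4,14),size=sum(!trial15$age %in% seq(4,14)),replace=T,prob=p)
# Rename the first column so that it is used as the trial start date below.
names(trial15)[1] = 'fecha_trial_start'
trial15 = trial15 %>% select(-c('num_asso_kids','num_iden_kids','estado','orden','orden_program','use'))

# Attach session records and keep only those that fall inside the trial window.
ses_tr15 = trial15 %>% left_join(session[,c('alumno','use','mundo_virtual','fecha')],by=c('alumno'),suffix = c('_t1',''))
ses_tr15 = ses_tr15 %>% mutate(tr_day = as.numeric(fecha-fecha_trial_start)+1,tr_dend= as.numeric(fecha_trial_end-fecha)) %>% filter(tr_day>0,tr_dend>=0)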

Summary Statistics of Trial Usage and Subscription

Columns: Total; tarifa_cat (NoSub, Regular, Summer); tarifa (101 to 136). Blank cells are not reported for that group.

                    Total    NoSub  Regular   Summer      101      102      103      104      134      135      136
trial_ls
  Mean                3.6      3.2     11.3     10.1     10.7     11.6     12.5     11.4      9.1     10.0     10.4
  Std. dev.           3.7      3.2      3.3      3.9      3.5      3.1      2.8      3.3      4.3      3.8      3.8
  Unw. valid N      87793    83139     3808      846     1541     1642      437      188      178       39      629
trial_gs
  Mean                2.7      2.4      8.8      7.7      8.2      9.0     10.2      9.3      7.0      7.7      7.9
  Std. dev.           3.3      2.8      4.4      4.5      4.4      4.3      4.1      3.9      4.4      4.5      4.4
  Unw. valid N      87793    83139     3808      846     1541     1642      437      188      178       39      629
wkly.usage_1
  Mean                1.6      1.4      5.3      4.8      5.0      5.5      5.9      5.3      4.4      4.5      5.0
  Std. dev.           2.1      2.0      1.7      2.0      1.9      1.6      1.4      1.7      2.3      2.0      1.9
  Unw. valid N      87793    83139     3808      846     1541     1642      437      188      178       39      629
wkly.usage_2
  Mean                1.0      0.7      5.0      4.3      4.7      5.1      5.5      5.1      3.7      4.5      4.4
  Std. dev.           1.8      1.6      2.0      2.3      2.1      1.9      1.8      2.0      2.4      2.1      2.3
  Unw. valid N      87793    83139     3808      846     1541     1642      437      188      178       39      629
wkly.game_1
  Mean                1.2      1.1      4.2      3.7      3.9      4.3      4.9      4.4      3.4      3.6      3.8
  Std. dev.           1.8      1.7      2.2      2.3      2.3      2.2      2.1      2.1      2.4      2.3      2.3
  Unw. valid N      87793    83139     3808      846     1541     1642      437      188      178       39      629
wkly.game_2
  Mean                0.7      0.5      3.8      3.1      3.5      3.9      4.4      4.0      2.7      3.3      3.3
  Std. dev.           1.6      1.3      2.4      2.4      2.4      2.4      2.2      2.3      2.4      2.4      2.4
  Unw. valid N      87793    83139     3808      846     1541     1642      437      188      178       39      629
num_effect_paid
  Mean                6.1               6.5      4.5      9.2      5.4      2.5      2.2      4.1      2.9      4.8
  Std. dev.           8.4               8.5      7.4     11.3      5.7      2.2      1.6      7.3      4.1      7.5
  Unw. valid N       4654              3808      846     1541     1642      437      188      178       39      629
mean_price
  Mean              105.8             112.6     75.2     46.4    106.1    297.9    280.3     39.8     58.6     86.2
  Std. dev.          87.4              92.7     47.3     25.7     27.5     56.0    102.4     28.4     13.0     47.7
  Unw. valid N       4654              3808      846     1541     1642      437      188      178       39      629
use_dif
  Mean               -0.7     -0.7     -0.4     -0.6     -0.4     -0.4     -0.4     -0.2     -0.7     -0.1     -0.6
  Std. dev.           1.4      1.4      1.7      1.8      1.8      1.6      1.5      1.7      1.8      1.7      1.8
  Unw. valid N      87793    83139     3808      846     1541     1642      437      188      178       39      629
lag_sub
  Mean               31.8              18.7     91.2     21.8     16.8     15.5     16.3    130.1     78.9     81.0
  Std. dev.          99.5              72.2    163.3     78.2     72.1     57.6     47.4    215.5    162.9    143.7
  Unw. valid N       4654              3808      846     1541     1642      437      188      178       39      629

##         use_d1 use_d2 use_d3 use_d4 use_d5 use_d6 use_d7 use_d8 use_d9 use_d10
## use_d1       1   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00    0.00
## use_d2       0   1.00   0.44   0.40   0.38   0.35   0.35   0.35   0.36    0.34
## use_d3       0   0.44   1.00   0.50   0.46   0.43   0.42   0.41   0.42    0.42
## use_d4       0   0.40   0.50   1.00   0.53   0.48   0.46   0.45   0.45    0.45
## use_d5       0   0.38   0.46   0.53   1.00   0.53   0.50   0.48   0.48    0.46
## use_d6       0   0.35   0.43   0.48   0.53   1.00   0.55   0.51   0.50    0.48
## use_d7       0   0.35   0.42   0.46   0.50   0.55   1.00   0.55   0.53    0.51
## use_d8       0   0.35   0.41   0.45   0.48   0.51   0.55   1.00   0.57    0.54
## use_d9       0   0.36   0.42   0.45   0.48   0.50   0.53   0.57   1.00    0.59
## use_d10      0   0.34   0.42   0.45   0.46   0.48   0.51   0.54   0.59    1.00
## use_d11      0   0.31   0.39   0.46   0.46   0.47   0.48   0.51   0.56    0.59
## use_d12      0   0.30   0.38   0.42   0.46   0.47   0.48   0.49   0.53    0.55
## use_d13      0   0.29   0.35   0.39   0.43   0.47   0.47   0.48   0.50    0.52
## use_d14      0   0.27   0.33   0.36   0.39   0.42   0.46   0.45   0.47    0.48
## use_d15      0   0.26   0.30   0.33   0.35   0.37   0.40   0.42   0.44    0.45
##         use_d11 use_d12 use_d13 use_d14 use_d15
## use_d1     0.00    0.00    0.00    0.00    0.00
## use_d2     0.31    0.30    0.29    0.27    0.26
## use_d3     0.39    0.38    0.35    0.33    0.30
## use_d4     0.46    0.42    0.39    0.36    0.33
## use_d5     0.46    0.46    0.43    0.39    0.35
## use_d6     0.47    0.47    0.47    0.42    0.37
## use_d7     0.48    0.48    0.47    0.46    0.40
## use_d8     0.51    0.49    0.48    0.45    0.42
## use_d9     0.56    0.53    0.50    0.47    0.44
## use_d10    0.59    0.55    0.52    0.48    0.45
## use_d11    1.00    0.59    0.54    0.50    0.45
## use_d12    0.59    1.00    0.58    0.52    0.47
## use_d13    0.54    0.58    1.00    0.55    0.48
## use_d14    0.50    0.52    0.55    1.00    0.55
## use_d15    0.45    0.47    0.48    0.55    1.00

Usage Under Contract

3,808 users subscribe to regular contracts (1-/3-/12-month) between 2017-01 and 2019-11-01.

## [1] 26296
##      alumno              n        
##  Min.   : 267788   Min.   : 1.00  
##  1st Qu.: 546736   1st Qu.:89.00  
##  Median : 779851   Median :90.00  
##  Mean   : 780196   Mean   :79.98  
##  3rd Qu.:1003163   3rd Qu.:90.00  
##  Max.   :1485284   Max.   :94.00

dropoff <- read_csv("dropoff.csv")

id = dropoff$id[dropoff$fecha >= '2014-10-01' & dropoff$fecha<='2015-01-01' & dropoff$antiguedad_dias==0 & dropoff$tarifa==102]
dropoff_use = dropoff[dropoff$id %in% id, c('id','fecha','dias_sin_contrato','antiguedad_dias','tarifa','fecha_inicio','fecha_fin','fecha_ultima_sesion','start_time','end_time','num_sesiones_contrato','sesion_realizada')]
names(dropoff)
##  [1] "fecha"                                  
##  [2] "contrato_baja"                          
##  [3] "dias_contrato"                          
##  [4] "antiguedad_dias"                        
##  [5] "dias_sin_contrato"                      
##  [6] "tipo_alumno"                            
##  [7] "id"                                     
##  [8] "edad"                                   
##  [9] "curriculo"                              
## [10] "fecha_fin"                              
## [11] "fecha_inicio"                           
## [12] "id_contrato"                            
## [13] "tarifa"                                 
## [14] "renovacion_automatica"                  
## [15] "no_renovar_contratos"                   
## [16] "num_contratos_previos"                  
## [17] "fecha_ultima_sesion"                    
## [18] "fecha_ultima_sesion_solicitud_baja"     
## [19] "fecha_solicitud_baja"                   
## [20] "fecha_solicitud_baja_hermano"           
## [21] "hermano_baja"                           
## [22] "fecha_fin_hermano_baja"                 
## [23] "dia_baja"                               
## [24] "sin_sesiones_contrato"                  
## [25] "motivo_baja"                            
## [26] "fecha_penultima_sesion"                 
## [27] "num_sesiones_contrato"                  
## [28] "asistencia_contrato"                    
## [29] "num_sesiones_mes"                       
## [30] "asistencia_mes"                         
## [31] "num_sesiones_semana"                    
## [32] "asistencia_semana"                      
## [33] "dias_sin_sesion"                        
## [34] "dias_solicitud_baja"                    
## [35] "dias_solicitud_hermano_baja"            
## [36] "dias_hermano_baja"                      
## [37] "asistencia_contrato_tramos"             
## [38] "asistencia_mes_tramos"                  
## [39] "asistencia_semana_tramos"               
## [40] "sesion_realizada"                       
## [41] "tipo_sesion"                            
## [42] "leccion"                                
## [43] "sup_contrato"                           
## [44] "sup_mes"                                
## [45] "sup_semana"                             
## [46] "vez_leccion"                            
## [47] "mundo_virtual_contrato"                 
## [48] "mundo_virtual_mes"                      
## [49] "mundo_virtual_semana"                   
## [50] "pregunta_inicio_mes"                    
## [51] "pregunta_inicio_semana"                 
## [52] "triste_mes"                             
## [53] "triste_semana"                          
## [54] "pregunta_inicio"                        
## [55] "pregunta_fin_mes"                       
## [56] "pregunta_fin_semana"                    
## [57] "dificil_mes"                            
## [58] "dificil_semana"                         
## [59] "num_mes"                                
## [60] "num_semana"                             
## [61] "pregunta_fin"                           
## [62] "muy_triste_mes"                         
## [63] "muy_triste_semana"                      
## [64] "id_sesion"                              
## [65] "id_ejercicio"                           
## [66] "state_sesion"                           
## [67] "start_time"                             
## [68] "end_time"                               
## [69] "efectividad_sesion"                     
## [70] "numero_problemas_total"                 
## [71] "ticks"                                  
## [72] "consecutive_errors"                     
## [73] "consecutive_successes"                  
## [74] "dificultad"                             
## [75] "grados_dificultad"                      
## [76] "tipo_incidencia_microprogramacion"      
## [77] "tmr"                                    
## [78] "efectividad_ejercicio_normal"           
## [79] "numero_problemas_total_ejercicio_normal"
# Hour of day at which each session started
dropoff_use$tod = hour(ymd_hms(dropoff_use$start_time))

# Heatmap of sessions by hour of day and day of week for a random sample of users
subset(dropoff_use, id %in% sample(id, 8)) %>%
  mutate(dow = factor(weekdays(fecha, T), levels = c('Mon','Tue','Wed','Thu','Fri','Sat','Sun'))) %>%
  filter(!is.na(tod)) %>%
  group_by(id, dow, tod) %>%
  summarise(use = sum(sesion_realizada)) %>%
  ggplot(aes(x = factor(tod), y = dow)) +
  geom_tile(aes(fill = use)) +
  coord_equal() +
  facet_wrap(~id) +
  theme_classic()

# Session start hour by week of tenure (weeks 0 to 12), coloured by day of week
subset(dropoff_use, id %in% sample(id, 4)) %>%
  mutate(week = factor(antiguedad_dias %/% 7),
         dow = factor(weekdays(as.Date(start_time), T), levels = c('Mon','Tue','Wed','Thu','Fri','Sat','Sun'))) %>%
  filter(week %in% c(0:12)) %>%
  ggplot(aes(x = week, y = tod)) +
  geom_dotplot(aes(fill = dow), alpha = 0.7, binaxis = 'y', position = 'identity') +
  theme_classic() +
  facet_wrap(~id)

#geom_dotplot()
#subset(dropoff_use, id %in% sample(id,1)) %>% mutate(week= antiguedad_dias %/%7,dow= factor(weekdays(fecha,T),levels=c('Mon','Tue',"Wed",'Thu','Fri','Sat','Sun'))) %>% filter(!is.na(tod) & week<9) %>% group_by(id,dow,tod,week) %>% summarise(use=sum(sesion_realizada)) %>% ggplot(aes(y=factor(tod),x=dow)) + geom_tile(aes(fill=dow)) + coord_equal()+facet_wrap(~week,nrow=1) + theme_classic()

Model Development

Setup

Our data set contains transaction records (including daily product usage decisions and contract choices) throughout each individual's lifetime with the online learning platform.

Consider individual \(i\), observed over periods \(T_{it}\) throughout his/her lifetime, starting from the first interaction with the firm at calendar time \(T_{i,t=0}\). We consider a discrete-time, multi-stage model for each consumer, where the unit of time \(t\), denoting the lifetime with the firm, is defined at the daily level. The state of each individual at time \(T_{i,\tau}\) can be represented by a trivariate vector \(y = [y_{ik,\tau_0}^c,y_{ik,\tau_t}^m,y_{ik,\tau_t}^g]\), where \(y_{ik,\tau_0}^c\) represents the contract choice \(j_{\tau} \in J_{\tau}=\{0,1,2,...\}\) for the \(k\)th incidence of subscription decision, and \(y_{ik,\tau_t}^m\) and \(y_{ik,\tau_t}^g\) are binary indicators of daily usage: whether the user completed the learning session and whether the user engaged with activities in the virtual world.

  1. Contract Choice \(y^c_{it,k}\)

\[ u_{it,k}^j = \delta_{ij} -\beta_{ip} p_{j} + \theta_{it}l_j + \epsilon_{ijt} \]

\[ \theta_{it} = \theta_{i0} + \alpha^c(X_{it}) + \alpha^c(T_{it}) + \gamma'_cz_i \]

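\[ P(y_{it,k}^c= j) = \frac{\exp(v_{it}^j)}{1+\sum_{{j'}=1}^J \exp(v_{it}^{j'})} \]

where \(v_{it}^j = \delta_{ij} - \beta_{ip}p_j + \theta_{it}l_j\) is the deterministic part of \(u_{it,k}^j\), and the individual characteristics \(z_i\) enter through \(\theta_{it}\).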

  • \(X_{it}\): individual time-varying covariates: (1) past interactions with the product (e.g., usage stock, length of the most recent streak, days since last usage), (2) lifetime with the product and time elapsed since the end of the last contract, (3) past transactions (contracts), (4)
  • \(T_{it}\): calendar-time events: (1) short-term and long-term, (2) seasonality and day of the week
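
To make the choice probability concrete, the sketch below evaluates the logit formula for one individual on one decision occasion, with the no-subscription option normalised to zero utility. It is an illustration only: the function name `choice_probs`, the three contracts, and all parameter values are hypothetical, and \(l_j\) is treated here as the contract length in months, which is an assumption.

```r
# Multinomial logit with an outside option (no subscription) at utility 0.
# All inputs are hypothetical illustration values, not estimates.
choice_probs <- function(delta, beta_p, theta, price, len) {
  v <- delta - beta_p * price + theta * len   # deterministic utility v_it^j
  m <- max(c(0, v))                           # shift by the max for numerical stability
  ev <- exp(v - m)
  denom <- exp(-m) + sum(ev)                  # exp(-m) is the shifted outside option
  c(outside = exp(-m) / denom, ev / denom)
}

probs <- choice_probs(
  delta  = c(monthly = 1.0, quarterly = 0.5, annual = 0.2),  # contract intercepts
  beta_p = 0.05,                                             # price sensitivity
  theta  = 0.3,                                              # taste for l_j
  price  = c(20, 50, 150),
  len    = c(1, 3, 12)                                       # assumed: months
)
print(round(probs, 3))
```

Subtracting the maximum before exponentiating leaves the probabilities unchanged but avoids overflow when utilities are large.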

auto-renewal

frictions/inertia/…

  2. Daily usage \([y_{it}^{m},y_{it}^g|y_{it,k}^c=1]\)
  • \(y_{it}^m =1\) if the user completed the learning session, 0 otherwise

  • \(y_{it}^g=1\) if the user engaged with activities in virtual world, 0 otherwise

    \[ P(y_{it}^m=1) = \frac{\exp (v_{it}^m)}{1+\exp (v_{it}^m)} \]

\[ P(y_{it}^g=1) = \begin{cases} \dfrac{\exp (v_{it}^g)}{1+\exp(v_{it}^g)}, & y_{it}^m=1 \\ 0, & y_{it}^m=0 \end{cases} \]

\[ u_{it}^m = \alpha^m(X_{it}^2) +\alpha^m(T_{it}^2) + \mu^m_{it} + \gamma_m'z_i + \epsilon_{it}^m \]

\[ u_{it}^g = \alpha^g(X_{it}^3) + \alpha^g(T_{it}^3) + \mu_{it}^g+\gamma_g'z_i + \epsilon_{it}^g \]
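
The two-part structure of the daily usage model (virtual-world activity is only possible on days with a completed session) can be sketched as follows. This is a minimal illustration, assuming the deterministic utilities \(v_{it}^m\) and \(v_{it}^g\) are already given; the function names `usage_probs` and `simulate_day` and the input values are hypothetical.

```r
# Joint probabilities of the three possible daily outcomes, given the
# deterministic utilities v_m (session) and v_g (virtual world).
usage_probs <- function(v_m, v_g) {
  p_m <- plogis(v_m)   # P(y^m = 1): logistic in v_m
  p_g <- plogis(v_g)   # P(y^g = 1 | y^m = 1); zero when y^m = 0
  c(no_session       = 1 - p_m,
    session_only     = p_m * (1 - p_g),
    session_and_game = p_m * p_g)
}

# Draw one day's outcome: y^g is forced to 0 when no session is completed.
simulate_day <- function(v_m, v_g) {
  y_m <- rbinom(1, 1, plogis(v_m))
  y_g <- if (y_m == 1) rbinom(1, 1, plogis(v_g)) else 0
  c(y_m = y_m, y_g = y_g)
}

set.seed(1)
probs_day <- usage_probs(v_m = 0.4, v_g = -0.2)   # hypothetical utilities
draws <- t(replicate(1000, simulate_day(0.4, -0.2)))
print(round(probs_day, 3))
print(colMeans(draws))                            # simulated usage frequencies
```

In the simulated draws, a day with `y_g = 1` always has `y_m = 1`, which mirrors the zero-probability branch of the model.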

Appendix: Data Preparation