Achieving optimal learning outcomes requires consistent engagement with learning activities over a sufficiently long period of time. However, sustaining regular engagement can be challenging, especially in a highly autonomous environment such as online learning. Using data from a subscription-based online learning platform, we investigate the factors that drive individuals’ daily learning decisions and the subsequent evolution of consistency, or habit formation, in learning. We jointly model individual-level product usage and subscription decisions using a Bayesian nonparametric Gaussian process framework, which allows us to flexibly capture the dynamic, heterogeneous effects of contextual and individual-specific factors on consumer engagement and contract choices.
keywords: customer-base analysis, habit formation,
customer-base analysis:
Gopalakrishnan et al. (2017) highlight the perils of pooling data across cohorts without accounting for cross-cohort shifts. They develop a vector changepoint model to uncover evidence of (latent) regime changes for each cohort-level parameter separately, while disentangling cross-cohort changes from calendar-time changes.
## [1] "Individuals That Enrolled Free Trials by Trial Length/Year/Program:"
## 2014 2015 2016 2017 2018 2019 2020 2021 2022
## (-2.05e+03,6] 0 0 2 11 30 81 1071 5490 3996
## (6,7] 0 0 0 3 5 4758 111801 139866 86963
## (7,14] 0 0 0 30 110 195 1054 1097 780
## (14,15] 12 15 35 40129 69930 76908 202820 246 83
## (15,16] 5 0 0 149 284 365 651 30 25
## (16,7.76e+03] 18 7 27 3008 2485 703 5029 1450 291
## COMP_LECTORA 0 0 0 0 0 0 10599 39031 32120
## MATEMATICAS 38 23 64 43385 73355 84010 313954 110206 60781
## [1] "Trial Period Information Availability and Association with Other Accounts"
## multip_kids
## no_trialend FALSE TRUE
## FALSE 373457 388591
## TRUE 513 5005
## [1] "Number of New Subscriptions by Year and Contract Type"
## 101 102 103 104 134 135 136
## total 6089 5938 2024 753 944 283 3403
## 2017 1163 1660 425 134 133 19 492
## 2018 2697 2376 877 312 311 93 1149
## 2019 2229 1902 722 307 500 171 1762
Subset a smaller sample of users that satisfy the following criteria:
Data Processing:
Sample size: 87,805 users
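library(dplyr); library(lubridate)  # provide %>%, select, left_join, mutate, filter, year
# Keep 15-day trials with a single associated child, started from 2017-01-01 to 2019-09-30, excluding ids in con.id
trial15 = subset(trial_window,tr_length==15 & num_asso_kids==1 & fecha>='2017-01-01' & fecha < '2019-10-01' & !alumno %in% con.id)
# Age at trial start (difference in calendar years)
trial15$age = year(trial15$fecha) - year(trial15$fecha_nacimiento)
# age_ran = 1 flags ages outside 4-14 (or missing), which are imputed below
trial15$age_ran = ifelse(trial15$age %in% seq(4,14),0,1)
# Empirical distribution of the valid ages, one bin per integer age 4..14
p = prop.table(table(cut(trial15$age,breaks=seq(3,14))))
set.seed(231)
# Replace out-of-range ages with draws from the empirical distribution
trial15$age[!trial15$age %in% seq(4,14)] = sample(seq(4,14),size=sum(!trial15$age %in% seq(4,14)),replace=TRUE,prob=p)
#table(trial15$mundo_virtual)
names(trial15)[1] = 'fecha_trial_start'
trial15 = trial15 %>% select(-c('num_asso_kids','num_iden_kids','estado','orden','orden_program','use'))
#names(session)
# Attach each user's session records to the trial data
ses_tr15 = trial15 %>% left_join(session[,c('alumno','use','mundo_virtual','fecha')],by=c('alumno'),suffix = c('_t1',''))
# tr_day: day of trial (1 = start date); tr_dend: days until trial end; keep sessions inside the trial window
ses_tr15 = ses_tr15 %>% mutate(tr_day = as.numeric(fecha-fecha_trial_start)+1,tr_dend= as.numeric(fecha_trial_end-fecha)) %>% filter(tr_day>0,tr_dend>=0)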
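The age imputation and the trial-window filter above can be checked on a small example. The following is a minimal sketch in base R on made-up data: the `toy` data frame, the dates, and the assumption that a 15-day trial ends 14 days after it starts are all hypothetical and are not taken from the platform data.

```r
# Toy data: hypothetical values, not taken from the platform data.
set.seed(231)
toy <- data.frame(
  alumno = 1:8,
  age    = c(5, 7, 7, 9, 12, 2, 35, NA)  # last three are out of range or missing
)
valid_ages <- 4:14
is_valid   <- toy$age %in% valid_ages     # NA %in% ... is FALSE, so NA is imputed too

# Empirical distribution of valid ages, one bin per integer age 4..14.
p <- prop.table(table(factor(toy$age[is_valid], levels = valid_ages)))

# Flag the rows to be imputed, then draw replacements from the distribution.
toy$age_ran <- as.integer(!is_valid)
toy$age[!is_valid] <- sample(valid_ages, size = sum(!is_valid),
                             replace = TRUE, prob = as.numeric(p))

# Trial-day index: day 1 is the start date; tr_dend counts days left in the trial.
trial_start <- as.Date("2018-03-01")
trial_end   <- trial_start + 14           # assumed: 15-day trial, end date inclusive
session_day <- as.Date(c("2018-02-27", "2018-03-01", "2018-03-15", "2018-03-16"))
tr_day  <- as.numeric(session_day - trial_start) + 1
tr_dend <- as.numeric(trial_end - session_day)
keep    <- tr_day > 0 & tr_dend >= 0      # TRUE only for sessions inside the window
```

Because the imputed ages are drawn with probabilities proportional to the observed counts, ages that never occur among valid records are never drawn.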
Summary Statistics of Trial Usage and Subscription (NoSub, Regular, and Summer are tarifa_cat levels; 101 to 136 are tarifa codes)

| | Total | NoSub | Regular | Summer | 101 | 102 | 103 | 104 | 134 | 135 | 136 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| trial_ls | | | | | | | | | | | |
| Mean | 3.6 | 3.2 | 11.3 | 10.1 | 10.7 | 11.6 | 12.5 | 11.4 | 9.1 | 10.0 | 10.4 |
| Std. dev. | 3.7 | 3.2 | 3.3 | 3.9 | 3.5 | 3.1 | 2.8 | 3.3 | 4.3 | 3.8 | 3.8 |
| Unw. valid N | 87793 | 83139 | 3808 | 846 | 1541 | 1642 | 437 | 188 | 178 | 39 | 629 |
| trial_gs | | | | | | | | | | | |
| Mean | 2.7 | 2.4 | 8.8 | 7.7 | 8.2 | 9.0 | 10.2 | 9.3 | 7.0 | 7.7 | 7.9 |
| Std. dev. | 3.3 | 2.8 | 4.4 | 4.5 | 4.4 | 4.3 | 4.1 | 3.9 | 4.4 | 4.5 | 4.4 |
| Unw. valid N | 87793 | 83139 | 3808 | 846 | 1541 | 1642 | 437 | 188 | 178 | 39 | 629 |
| wkly.usage_1 | | | | | | | | | | | |
| Mean | 1.6 | 1.4 | 5.3 | 4.8 | 5.0 | 5.5 | 5.9 | 5.3 | 4.4 | 4.5 | 5.0 |
| Std. dev. | 2.1 | 2.0 | 1.7 | 2.0 | 1.9 | 1.6 | 1.4 | 1.7 | 2.3 | 2.0 | 1.9 |
| Unw. valid N | 87793 | 83139 | 3808 | 846 | 1541 | 1642 | 437 | 188 | 178 | 39 | 629 |
| wkly.usage_2 | | | | | | | | | | | |
| Mean | 1.0 | 0.7 | 5.0 | 4.3 | 4.7 | 5.1 | 5.5 | 5.1 | 3.7 | 4.5 | 4.4 |
| Std. dev. | 1.8 | 1.6 | 2.0 | 2.3 | 2.1 | 1.9 | 1.8 | 2.0 | 2.4 | 2.1 | 2.3 |
| Unw. valid N | 87793 | 83139 | 3808 | 846 | 1541 | 1642 | 437 | 188 | 178 | 39 | 629 |
| wkly.game_1 | | | | | | | | | | | |
| Mean | 1.2 | 1.1 | 4.2 | 3.7 | 3.9 | 4.3 | 4.9 | 4.4 | 3.4 | 3.6 | 3.8 |
| Std. dev. | 1.8 | 1.7 | 2.2 | 2.3 | 2.3 | 2.2 | 2.1 | 2.1 | 2.4 | 2.3 | 2.3 |
| Unw. valid N | 87793 | 83139 | 3808 | 846 | 1541 | 1642 | 437 | 188 | 178 | 39 | 629 |
| wkly.game_2 | | | | | | | | | | | |
| Mean | 0.7 | 0.5 | 3.8 | 3.1 | 3.5 | 3.9 | 4.4 | 4.0 | 2.7 | 3.3 | 3.3 |
| Std. dev. | 1.6 | 1.3 | 2.4 | 2.4 | 2.4 | 2.4 | 2.2 | 2.3 | 2.4 | 2.4 | 2.4 |
| Unw. valid N | 87793 | 83139 | 3808 | 846 | 1541 | 1642 | 437 | 188 | 178 | 39 | 629 |
| num_effect_paid | | | | | | | | | | | |
| Mean | 6.1 | | 6.5 | 4.5 | 9.2 | 5.4 | 2.5 | 2.2 | 4.1 | 2.9 | 4.8 |
| Std. dev. | 8.4 | | 8.5 | 7.4 | 11.3 | 5.7 | 2.2 | 1.6 | 7.3 | 4.1 | 7.5 |
| Unw. valid N | 4654 | | 3808 | 846 | 1541 | 1642 | 437 | 188 | 178 | 39 | 629 |
| mean_price | | | | | | | | | | | |
| Mean | 105.8 | | 112.6 | 75.2 | 46.4 | 106.1 | 297.9 | 280.3 | 39.8 | 58.6 | 86.2 |
| Std. dev. | 87.4 | | 92.7 | 47.3 | 25.7 | 27.5 | 56.0 | 102.4 | 28.4 | 13.0 | 47.7 |
| Unw. valid N | 4654 | | 3808 | 846 | 1541 | 1642 | 437 | 188 | 178 | 39 | 629 |
| use_dif | | | | | | | | | | | |
| Mean | -0.7 | -0.7 | -0.4 | -0.6 | -0.4 | -0.4 | -0.4 | -0.2 | -0.7 | -0.1 | -0.6 |
| Std. dev. | 1.4 | 1.4 | 1.7 | 1.8 | 1.8 | 1.6 | 1.5 | 1.7 | 1.8 | 1.7 | 1.8 |
| Unw. valid N | 87793 | 83139 | 3808 | 846 | 1541 | 1642 | 437 | 188 | 178 | 39 | 629 |
| lag_sub | | | | | | | | | | | |
| Mean | 31.8 | | 18.7 | 91.2 | 21.8 | 16.8 | 15.5 | 16.3 | 130.1 | 78.9 | 81.0 |
| Std. dev. | 99.5 | | 72.2 | 163.3 | 78.2 | 72.1 | 57.6 | 47.4 | 215.5 | 162.9 | 143.7 |
| Unw. valid N | 4654 | | 3808 | 846 | 1541 | 1642 | 437 | 188 | 178 | 39 | 629 |
## use_d1 use_d2 use_d3 use_d4 use_d5 use_d6 use_d7 use_d8 use_d9 use_d10
## use_d1 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## use_d2 0 1.00 0.44 0.40 0.38 0.35 0.35 0.35 0.36 0.34
## use_d3 0 0.44 1.00 0.50 0.46 0.43 0.42 0.41 0.42 0.42
## use_d4 0 0.40 0.50 1.00 0.53 0.48 0.46 0.45 0.45 0.45
## use_d5 0 0.38 0.46 0.53 1.00 0.53 0.50 0.48 0.48 0.46
## use_d6 0 0.35 0.43 0.48 0.53 1.00 0.55 0.51 0.50 0.48
## use_d7 0 0.35 0.42 0.46 0.50 0.55 1.00 0.55 0.53 0.51
## use_d8 0 0.35 0.41 0.45 0.48 0.51 0.55 1.00 0.57 0.54
## use_d9 0 0.36 0.42 0.45 0.48 0.50 0.53 0.57 1.00 0.59
## use_d10 0 0.34 0.42 0.45 0.46 0.48 0.51 0.54 0.59 1.00
## use_d11 0 0.31 0.39 0.46 0.46 0.47 0.48 0.51 0.56 0.59
## use_d12 0 0.30 0.38 0.42 0.46 0.47 0.48 0.49 0.53 0.55
## use_d13 0 0.29 0.35 0.39 0.43 0.47 0.47 0.48 0.50 0.52
## use_d14 0 0.27 0.33 0.36 0.39 0.42 0.46 0.45 0.47 0.48
## use_d15 0 0.26 0.30 0.33 0.35 0.37 0.40 0.42 0.44 0.45
## use_d11 use_d12 use_d13 use_d14 use_d15
## use_d1 0.00 0.00 0.00 0.00 0.00
## use_d2 0.31 0.30 0.29 0.27 0.26
## use_d3 0.39 0.38 0.35 0.33 0.30
## use_d4 0.46 0.42 0.39 0.36 0.33
## use_d5 0.46 0.46 0.43 0.39 0.35
## use_d6 0.47 0.47 0.47 0.42 0.37
## use_d7 0.48 0.48 0.47 0.46 0.40
## use_d8 0.51 0.49 0.48 0.45 0.42
## use_d9 0.56 0.53 0.50 0.47 0.44
## use_d10 0.59 0.55 0.52 0.48 0.45
## use_d11 1.00 0.59 0.54 0.50 0.45
## use_d12 0.59 1.00 0.58 0.52 0.47
## use_d13 0.54 0.58 1.00 0.55 0.48
## use_d14 0.50 0.52 0.55 1.00 0.55
## use_d15 0.45 0.47 0.48 0.55 1.00
3,808 users subscribe to regular contracts (1-/3-/12-month) between 2017-01 and 2019-11-01.
## [1] 26296
## alumno n
## Min. : 267788 Min. : 1.00
## 1st Qu.: 546736 1st Qu.:89.00
## Median : 779851 Median :90.00
## Mean : 780196 Mean :79.98
## 3rd Qu.:1003163 3rd Qu.:90.00
## Max. :1485284 Max. :94.00
dropoff <- read_csv("dropoff.csv")
id = dropoff$id[dropoff$fecha >= '2014-10-01' & dropoff$fecha<='2015-01-01' & dropoff$antiguedad_dias==0 & dropoff$tarifa==102]
dropoff_use = dropoff[dropoff$id %in% id, c('id','fecha','dias_sin_contrato','antiguedad_dias','tarifa','fecha_inicio','fecha_fin','fecha_ultima_sesion','start_time','end_time','num_sesiones_contrato','sesion_realizada')]
names(dropoff)
## [1] "fecha"
## [2] "contrato_baja"
## [3] "dias_contrato"
## [4] "antiguedad_dias"
## [5] "dias_sin_contrato"
## [6] "tipo_alumno"
## [7] "id"
## [8] "edad"
## [9] "curriculo"
## [10] "fecha_fin"
## [11] "fecha_inicio"
## [12] "id_contrato"
## [13] "tarifa"
## [14] "renovacion_automatica"
## [15] "no_renovar_contratos"
## [16] "num_contratos_previos"
## [17] "fecha_ultima_sesion"
## [18] "fecha_ultima_sesion_solicitud_baja"
## [19] "fecha_solicitud_baja"
## [20] "fecha_solicitud_baja_hermano"
## [21] "hermano_baja"
## [22] "fecha_fin_hermano_baja"
## [23] "dia_baja"
## [24] "sin_sesiones_contrato"
## [25] "motivo_baja"
## [26] "fecha_penultima_sesion"
## [27] "num_sesiones_contrato"
## [28] "asistencia_contrato"
## [29] "num_sesiones_mes"
## [30] "asistencia_mes"
## [31] "num_sesiones_semana"
## [32] "asistencia_semana"
## [33] "dias_sin_sesion"
## [34] "dias_solicitud_baja"
## [35] "dias_solicitud_hermano_baja"
## [36] "dias_hermano_baja"
## [37] "asistencia_contrato_tramos"
## [38] "asistencia_mes_tramos"
## [39] "asistencia_semana_tramos"
## [40] "sesion_realizada"
## [41] "tipo_sesion"
## [42] "leccion"
## [43] "sup_contrato"
## [44] "sup_mes"
## [45] "sup_semana"
## [46] "vez_leccion"
## [47] "mundo_virtual_contrato"
## [48] "mundo_virtual_mes"
## [49] "mundo_virtual_semana"
## [50] "pregunta_inicio_mes"
## [51] "pregunta_inicio_semana"
## [52] "triste_mes"
## [53] "triste_semana"
## [54] "pregunta_inicio"
## [55] "pregunta_fin_mes"
## [56] "pregunta_fin_semana"
## [57] "dificil_mes"
## [58] "dificil_semana"
## [59] "num_mes"
## [60] "num_semana"
## [61] "pregunta_fin"
## [62] "muy_triste_mes"
## [63] "muy_triste_semana"
## [64] "id_sesion"
## [65] "id_ejercicio"
## [66] "state_sesion"
## [67] "start_time"
## [68] "end_time"
## [69] "efectividad_sesion"
## [70] "numero_problemas_total"
## [71] "ticks"
## [72] "consecutive_errors"
## [73] "consecutive_successes"
## [74] "dificultad"
## [75] "grados_dificultad"
## [76] "tipo_incidencia_microprogramacion"
## [77] "tmr"
## [78] "efectividad_ejercicio_normal"
## [79] "numero_problemas_total_ejercicio_normal"
# Hour of day (0-23) at which each session started
dropoff_use$tod = hour(ymd_hms(dropoff_use$start_time))
dow_levels = c('Mon','Tue','Wed','Thu','Fri','Sat','Sun')
# Heatmap of sessions (sum of sesion_realizada) by hour of day and day of week for 8 random users;
# unique() ensures that users, not session rows, are sampled
subset(dropoff_use, id %in% sample(unique(id),8)) %>%
  mutate(dow = factor(weekdays(fecha,abbreviate=TRUE),levels=dow_levels)) %>%
  filter(!is.na(tod)) %>%
  group_by(id,dow,tod) %>%
  summarise(use=sum(sesion_realizada)) %>%
  ggplot(aes(x=factor(tod),y=dow)) + geom_tile(aes(fill=use)) + coord_equal() + facet_wrap(~id) + theme_classic()
# Session start hour by week of tenure (weeks 0-12) for 4 random users, colored by day of week
subset(dropoff_use, id %in% sample(unique(id),4)) %>%
  mutate(week = factor(antiguedad_dias %/% 7), dow = factor(weekdays(as.Date(start_time),abbreviate=TRUE),levels=dow_levels)) %>%
  filter(week %in% c(0:12)) %>%
  ggplot(aes(x=week,y=tod)) + geom_dotplot(aes(fill=dow),alpha=0.7,binaxis='y',position='identity') + theme_classic() + facet_wrap(~id)
#geom_dotplot()
#subset(dropoff_use, id %in% sample(id,1)) %>% mutate(week= antiguedad_dias %/%7,dow= factor(weekdays(fecha,T),levels=c('Mon','Tue',"Wed",'Thu','Fri','Sat','Sun'))) %>% filter(!is.na(tod) & week<9) %>% group_by(id,dow,tod,week) %>% summarise(use=sum(sesion_realizada)) %>% ggplot(aes(y=factor(tod),x=dow)) + geom_tile(aes(fill=dow)) + coord_equal()+facet_wrap(~week,nrow=1) + theme_classic()
Our data set contains transaction records (including daily product usage decisions and contract choices) throughout each individual’s lifetime with the online learning platform.
Consider individual \(i\), observed over periods \(T_{it}\) throughout his/her lifetime with the firm, starting from the first interaction at calendar time \(T_{i,t=0}\). We consider a discrete-time, multi-stage model for each consumer, where the unit of time \(t\), which denotes lifetime with the firm, is defined at the daily level. The state of each individual at time \(T_{i,\tau}\) can be represented by a trivariate vector \(y = [y_{ik,\tau_0}^c,y_{ik,\tau_t}^m,y_{ik,\tau_t}^g]\), where \(y_{ik,\tau_0}^c\) represents the contract choice \(j_{\tau} \in J_{\tau}=\{0,1,2,...\}\) for the \(k\)th incidence of subscription decision, and \(y_{ik,\tau_t}^m\) and \(y_{ik,\tau_t}^g\) are the daily learning-session and virtual-world engagement indicators.
\[ u_{it,k}^j = \delta_{ij} -\beta_{ip} p_{j} + \theta_{it}l_j + \epsilon_{ijt} \]
\[ \theta_{it} = \theta_{i0} + \alpha^c(X_{it}) + \alpha^c(T_{it}) + \gamma'_cz_i \]
\[ P(y_{it,k}^c= j) = \frac{\exp(v_{it}^j)}{1+\sum_{j'=1}^J \exp(v_{it}^{j'})} \]
where \(v_{it}^j = \delta_{ij} - \beta_{ip}p_j + \theta_{it}l_j + \gamma'z_i\).
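The choice probability above is a multinomial logit with an outside option (no subscription) whose utility is normalized to zero. A minimal sketch in base R follows; all parameter values, the contract labels `m1`, `m3`, `m12`, and the helper name `choice_prob` are hypothetical and chosen only for illustration.

```r
# Choice probabilities for J contracts plus an outside option (no subscription).
# v: numeric vector of deterministic utilities v_it^j, j = 1..J.
choice_prob <- function(v) {
  m     <- max(0, v)                 # shift all utilities for numerical stability
  ev    <- exp(v - m)
  denom <- exp(-m) + sum(ev)         # exp(0 - m) is the outside option's term
  c(outside = exp(-m) / denom, ev / denom)
}

# Hypothetical parameter values, for illustration only.
delta  <- c(m1 = 0.5, m3 = 0.2, m12 = -0.1)  # contract intercepts delta_ij
price  <- c(m1 = 0.4, m3 = 1.0, m12 = 3.0)   # prices p_j (arbitrary units)
len    <- c(m1 = 1,   m3 = 3,   m12 = 12)    # contract lengths l_j in months
beta_p <- 1.2                                # price sensitivity beta_ip
theta  <- 0.15                               # theta_it, taste for contract length
gz     <- 0                                  # gamma' z_i, set to zero here

v  <- delta - beta_p * price + theta * len + gz
pr <- choice_prob(v)                         # sums to 1 across the four options
```

Subtracting the maximum before exponentiating leaves the probabilities unchanged but avoids overflow when utilities are large.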
auto-renewal
frictions/inertia/…
\(y_{it}^m =1\) if the user completed the learning session, 0 otherwise
\(y_{it}^g=1\) if the user engaged with activities in the virtual world, 0 otherwise
\[ P(y_{it}^m=1) = \frac{\exp (v_{it}^m)}{1+\exp (v_{it}^m)} \]
\[ P(y_{it}^g=1) = \left\{ \begin{array}{ll} \frac{\exp (v_{it}^g)}{1+\exp(v_{it}^g)}, & y_{it}^m=1 \\ 0, & y_{it}^m=0 \end{array} \right. \]
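The two probabilities above describe a sequential structure: virtual-world activity can only occur on days when a learning session was completed. The sketch below simulates that structure in base R. The function names and the utility values passed in are hypothetical, and the deterministic utilities are taken as given rather than estimated.

```r
inv_logit <- function(v) 1 / (1 + exp(-v))   # equals exp(v) / (1 + exp(v))

# Simulate one day of outcomes for n users given deterministic utilities.
simulate_usage <- function(v_m, v_g) {
  stopifnot(length(v_m) == length(v_g))
  n   <- length(v_m)
  y_m <- rbinom(n, 1, inv_logit(v_m))        # session completed?
  # Virtual-world use is only possible after a completed session.
  y_g <- ifelse(y_m == 1, rbinom(n, 1, inv_logit(v_g)), 0L)
  data.frame(y_m = y_m, y_g = y_g)
}

set.seed(1)
sim <- simulate_usage(v_m = rep(0.3, 1000), v_g = rep(-0.5, 1000))
# Share of virtual-world use among days with a completed session
share_g_given_m <- mean(sim$y_g[sim$y_m == 1])
```

By construction, `y_g` never exceeds `y_m`, which mirrors the zero branch of the second equation.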
\[ u_{it}^m = \alpha^m(X_{it}^2) +\alpha^m(T_{it}^2) + \mu^m_{it} + \gamma_m'z_i + \epsilon_{it}^m \]
\[ u_{it}^g = \alpha^g(X_{it}^3) + \alpha^g(T_{it}^3) + \mu_{it}^g+\gamma_g'z_i + \epsilon_{it}^g \]
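To give a sense of how a flexible time effect such as \(\alpha^m(T_{it})\) might behave, the sketch below draws one function from a zero-mean Gaussian process prior over a 90-day lifetime. This is an illustration under assumptions: the equations above do not specify the prior, so the squared-exponential kernel, the hyperparameter values, and the function names `se_kernel` and `draw_gp` are all hypothetical choices, not the specification used in the model.

```r
# Squared-exponential covariance between two sets of time points.
# amp: marginal standard deviation; ls: length scale in days (both assumed).
se_kernel <- function(x1, x2, amp = 1, ls = 10) {
  amp^2 * exp(-0.5 * outer(x1, x2, "-")^2 / ls^2)
}

# Draw one function from a zero-mean GP prior evaluated on the grid x.
draw_gp <- function(x, amp = 1, ls = 10, jitter = 1e-6) {
  K <- se_kernel(x, x, amp, ls) + diag(jitter, length(x))  # jitter aids stability
  L <- t(chol(K))                    # lower-triangular factor with K = L %*% t(L)
  as.numeric(L %*% rnorm(length(x)))
}

set.seed(42)
days    <- 1:90                      # lifetime in days, matching the daily model
alpha_T <- draw_gp(days, amp = 0.8, ls = 14)
p_use   <- plogis(alpha_T)           # implied daily session probability
```

A longer length scale `ls` yields smoother usage trajectories, while `amp` controls how far the effect can move away from zero.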