Research problem

A company has struggled to retain customers, with a monthly customer retention rate consistently below 20%. The customer retention rate is defined as the proportion of repeat clients in a given month. In response, the company has decided to develop a loyalty programme. The programme will initially be introduced in selected areas and will be expanded to other regions after a thorough assessment of its impact. It is therefore crucial to determine whether the retention rate has increased and whether any improvement can be attributed to the loyalty programme.

Impact measurement

A before-after design will compare monthly retention rates over the 12 months before and the 12 months after the introduction of the loyalty programme.

An interrupted time series (ITS) model will be used to measure the impact of the programme by comparing monthly retention rates before and after implementation in the areas where the loyalty programme was introduced.

Introduction

The ITS design relies on time series data collected both before and after an intervention to assess whether there is a causal relationship between the intervention and its effects. A series of observations of the same outcome before and after the intervention is used to test for both immediate (level) and gradual (trend) effects. At least three data points before and three after the intervention have been recommended for ITS analysis. Because data are collected at regular intervals, the pre-post comparison can account for underlying trends in the outcome.

A single interrupted time series (SITS) design compares changes over time before and after the intervention within the exposed group only. It assumes that, without the intervention, the pre-intervention level and trend of the outcome in the exposed group would have continued unchanged.

Analysis approach: Generalised least squares

The most common approaches to analysing ITS data are segmented regression and autoregressive integrated moving average (ARIMA) models. Segmented regression is more popular than ARIMA, but it usually requires additional adjustment for serial correlation, which arises because observations taken close together in time tend to be correlated.
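To illustrate why serial correlation matters, the sketch below uses simulated data only (not the study data). It assumes a linear trend plus first-order autoregressive (AR(1)) errors with an assumed coefficient of 0.7, fits ordinary least squares, and checks the lag-1 autocorrelation of the residuals with base R's `acf()`. A value well above zero indicates that the OLS standard errors, which assume independent errors, are likely to be too small.

```r
# Simulated data only (not the study data): assumed linear trend + AR(1) errors
set.seed(42)
n <- 120                                    # number of hypothetical months
time <- seq_len(n)
errors <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))  # AR(1) noise, phi = 0.7
retention <- 12 + 0.05 * time + errors      # assumed trend, for illustration only

# Ordinary least squares treats successive months as independent
ols <- lm(retention ~ time)

# Lag-1 autocorrelation of the OLS residuals (element 1 is lag 0)
lag1 <- acf(residuals(ols), lag.max = 1, plot = FALSE)$acf[2]
print(round(lag1, 2))
```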

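\[ Y_t = \alpha + \beta_{1} \times T_t + \beta_{2} \times X_t + \beta_{3} \times P_t + \epsilon_{t} \] Where T: time since the start of the study (`time`); X: intervention indicator, 0 before and 1 after the intervention (`intervention`); P: time since intervention (`post_intervention_time`); \(\epsilon_t\): error term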
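The segmented regression model can be unpacked into the quantities used for interpretation. Writing \(T_t\) for time, \(X_t\) for the intervention indicator (0 before, 1 after) and \(P_t\) for time since intervention (assumed to be 0 before the intervention), so that \(Y_t = \alpha + \beta_1 T_t + \beta_2 X_t + \beta_3 P_t + \epsilon_t\), the expected outcome in each period and the implied effect are:

```latex
\[
\begin{aligned}
\text{Before } (X_t = 0,\ P_t = 0):\quad & E[Y_t] = \alpha + \beta_1 T_t \\
\text{After } (X_t = 1):\quad & E[Y_t] = (\alpha + \beta_2) + \beta_1 T_t + \beta_3 P_t \\
\text{Counterfactual (pre-intervention trend continued)}:\quad & E[Y_t^{cf}] = \alpha + \beta_1 T_t \\
\text{Effect } P_t \text{ months after the intervention}:\quad & E[Y_t] - E[Y_t^{cf}] = \beta_2 + \beta_3 P_t \\
\text{Post-intervention slope}:\quad & \beta_1 + \beta_3
\end{aligned}
\]
```

Here \(\beta_2\) is the immediate level change and \(\beta_3\) is the change in slope; because \(T_t\) and \(P_t\) both increase by one each month after the intervention, the post-intervention slope is \(\beta_1 + \beta_3\), not \(\beta_3\) alone.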

Time series data commonly show autocorrelation and unequal variances. To adjust for these, the effect of the intervention will be modelled using generalised least-squares (GLS) regression.
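GLS only adjusts for autocorrelation when a correlation structure is specified. The sketch below, on simulated data, shows one way to do this with `nlme::gls()` and an AR(1) structure via `corAR1(form = ~ time)`. The series length (48 months on each side), the true coefficients and the AR(1) coefficient of 0.6 are assumptions chosen for illustration, and AR(1) is only one possible error structure, not necessarily the one appropriate for the retention data.

```r
library(nlme)

# Hypothetical series: 48 months before and 48 after an intervention (simulated)
set.seed(2024)
n_pre <- 48
n_post <- 48
time <- seq_len(n_pre + n_post)
intervention <- as.integer(time > n_pre)               # 0 before, 1 after
post_intervention_time <- pmax(0L, time - n_pre)       # 0 before, 1, 2, ... after

# Assumed truth: level drop of 0.5, slope increase of 0.3/month, AR(1) errors (phi = 0.6)
errors <- as.numeric(arima.sim(model = list(ar = 0.6), n = length(time), sd = 0.5))
outcome <- 15 + 0.02 * time - 0.5 * intervention + 0.3 * post_intervention_time + errors
sim <- data.frame(time, intervention, post_intervention_time, outcome)

# Same mean model: independent errors vs AR(1) errors, both fitted by ML
fit_indep <- gls(outcome ~ time + intervention + post_intervention_time,
                 data = sim, method = "ML")
fit_ar1 <- gls(outcome ~ time + intervention + post_intervention_time,
               data = sim, method = "ML",
               correlation = corAR1(form = ~ time))

# Estimated AR(1) coefficient on its natural scale
phi_hat <- coef(fit_ar1$modelStruct$corStruct, unconstrained = FALSE)

# Likelihood-ratio test and AIC comparison of the two error structures
anova(fit_indep, fit_ar1)
summary(fit_ar1)$tTable
```

Fitting both models with `method = "ML"` keeps them comparable with `anova()`; a clearly lower AIC for the AR(1) fit suggests the independence assumption is inadequate.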

Data preparation

# Load packages (nlme provides gls())
library(readxl)
library(stargazer)
library(stringr)
library(dplyr)
library(DT)
library(nlme)
library(its.analysis)

The data have 28 data points and the following variables:

time: the study time in months, from the start to the end of the study
group: identifies whether the observation belongs to the intervention arm (labelled 1) or the control group (labelled 0)
intervention: a binary indicator of whether the intervention had taken place at each time point
post_intervention_time: the time elapsed since the intervention
outcome: the monthly retention rate
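As an illustration of how these variables are typically coded for a single series, the sketch below builds them for a hypothetical layout of 12 months before and 12 months after the programme starts. The layout, and the convention that `post_intervention_time` is 0 before the intervention and counts 1, 2, ... afterwards, are assumptions; `group` is omitted because a single interrupted time series uses only the intervention arm.

```r
# Hypothetical layout: 12 months before and 12 months after the programme starts
n_pre <- 12
n_post <- 12
time <- seq_len(n_pre + n_post)                                  # 1, 2, ..., 24
intervention <- as.integer(time > n_pre)                         # 0 before, 1 after
post_intervention_time <- ifelse(intervention == 1, time - n_pre, 0)  # 0 before, 1..12 after

its_vars <- data.frame(time, intervention, post_intervention_time)
print(its_vars[10:15, ])   # rows around the interruption
```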

Analysis

The following analysis models the monthly retention rate in the treatment areas only; it does not take the comparison areas into account.

# Scatter plot of the monthly retention rate
plot(mydata$time, mydata$outcome,
     bty="n", pch=19, col="blue",
     ylim=c(0, 20), xlim=c(0, 25),  # Set xlim to match your data range
     ylab="Retention rate",
     xlab="Time (months)",
     main="Customer Retention Rate Over Time")

# Line marking the interruption (start of the loyalty programme)
abline(v=13, col="purple", lty=2)
# Period labels: "Before" to the left of the interruption, "After" to the right
text(7, 20, "Before", col="firebrick", cex=0.8, pos=4)
text(16, 20, "After", col="purple4", cex=0.8, pos=4)

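# Segmented regression by ordinary least squares, used to draw the fitted line
# (named fit_ols to avoid masking base R's ts(); na.exclude keeps fitted values
# aligned with mydata$time if any outcomes are missing)
fit_ols <- lm(outcome ~ time + intervention + post_intervention_time,
              data=mydata, na.action=na.exclude)
lines(mydata$time, fitted(fit_ols), col="sienna4", lwd=2)

# GLS fit by maximum likelihood; no correlation structure is specified here,
# so the errors are treated as independent with constant variance
model.a <- gls(outcome ~ time + intervention + post_intervention_time, data = mydata, method = "ML")
summary(model.a)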
## Generalized least squares fit by maximum likelihood
##   Model: outcome ~ time + intervention + post_intervention_time 
##   Data: mydata 
##         AIC       BIC  logLik
##   -30.35479 -24.26041 20.1774
## 
## Coefficients:
##                            Value  Std.Error   t-value p-value
## (Intercept)            10.381128 0.07149540 145.19993  0.0000
## time                    0.014468 0.00984210   1.47004  0.1564
## intervention           -0.228081 0.09640555  -2.36585  0.0277
## post_intervention_time  1.210147 0.01392444  86.90810  0.0000
## 
##  Correlation: 
##                        (Intr) time   intrvn
## time                   -0.889              
## intervention            0.348 -0.565       
## post_intervention_time  0.629 -0.707 -0.070
## 
## Standardized residuals:
##         Min          Q1         Med          Q3         Max 
## -4.18025245 -0.11491765 -0.01243698  0.23590455  1.52009180 
## 
## Residual standard error: 0.1079557 
## Degrees of freedom: 25 total; 21 residual

Interpretation

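At the time of the intervention, the retention rate dropped by 0.23 units (p = 0.028). After the intervention, the slope increased by 1.21 units per month relative to the pre-intervention trend (p < 0.001), so the retention rate rose by about 1.22 units per month (0.014 + 1.210). The pre-intervention trend itself was small and not statistically significant (0.014 units per month, p = 0.156).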
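The arithmetic behind this interpretation can be reproduced from the coefficients printed in the `summary(model.a)` output. The sketch below assumes that `post_intervention_time` equals k in the k-th month after the interruption; if the data code it differently, the months indexed by `k` shift accordingly. It computes the post-intervention slope and the estimated difference from the counterfactual (the pre-intervention trend extended) for each of the first 12 months.

```r
# Point estimates copied from the summary(model.a) output
b_time <- 0.014468            # pre-intervention slope (beta_1)
b_intervention <- -0.228081   # immediate level change (beta_2)
b_post <- 1.210147            # change in slope after the intervention (beta_3)

# Post-intervention slope = pre-intervention slope + change in slope
post_slope <- b_time + b_post

# Estimated effect k months after the interruption relative to the counterfactual,
# assuming post_intervention_time = k in that month
k <- 1:12
effect <- b_intervention + b_post * k

print(round(post_slope, 3))
print(round(effect[12], 2))
```

These are point estimates only; their uncertainty depends on the standard errors of the fitted model.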