1 Loading the required libraries.

library(dplyr)
## Warning: package 'dplyr' was built under R version 3.5.1
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.5.1
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
library(DT)
## Warning: package 'DT' was built under R version 3.5.1

2 Introduction:

This exercise is to basically get an idea about retention, Customer Lifetime Value and Cohort Analysis. These are some of the metrics and analysis that are the foundation of data analysis in the product, web or the aaplication industry. Understanding the basics about these metrics is really important to any data analyst.

2.1 What is retention?

The term “retention rate” is used in a variety of fields, including marketing, investing, education, in the workplace and in clinical trials. Maintaining retention in each of these fields often results in a positive outcome for the overall organization or school, or pharmacological study. In marketing, retention rate is used to count customers and track customer activity irrespective of the number of transactions (or dollar value of those transactions) made by each customer.

2.2 What is customer lifetime value (CLV)?

In marketing, customer lifetime value (CLV or often CLTV), lifetime customer value (LCV), or life-time value (LTV) is a prediction of the net profit attributed to the entire future relationship with a customer. The prediction model can have varying levels of sophistication and accuracy, ranging from a crude heuristic to the use of complex predictive analytics techniques.

2.3 What will tis article contain?

This article would contain 2 important things

  1. Retention Analysis - We are going to understand the growth of our product.

  2. Customer Lifetime Value Calculation.

3 Loading the data

ltv <- read.csv("LTV.csv", stringsAsFactors = F)

4 The Dataset

cat("There are", nrow(ltv),"rows", "and", ncol(ltv), "columns.")
## There are 21597 rows and 8 columns.
ltv$month = month(mdy_hm(ltv$created_at))
ltv$year = year(mdy_hm(ltv$created_at))
ltv$YearMonth <-  paste0(ltv$year,ltv$month)

cat("There are",length(unique(ltv$YearMonth)),"months worth of transaction.")
## There are 14 months worth of transaction.

4.1 Getting to know the columns in a dataset.

datatable(head(ltv, 5))

There are 21597 transactions cpmpleted between February 2015 to March 2016 and there are 17036 customers involved in those transactions.

The visitors made transactions throught 3 platforms: - iOS

  • Android

  • Windows

Visitors also use 3 different modes to complete transactions: - Apps

  • Browser

5 Customer Retention Analysis

As the data is based on a monthly subscription service, I will be using months to calculate monthly retention rates. I am also going to assume that February 2015 as the first month and I will be starting by analyzing retention of customers who visited the site in February 2015.

  Februry_pool <- unique(ltv$customer_id[ltv$month == 2 & ltv$year == 2015])

  ltv %>% 
    filter(year == 2015) %>%
    filter(customer_id %in% Februry_pool) %>%
    group_by(year, month) %>%
    summarise(number = length(customer_id)) %>%
    arrange(year, month) %>%
    ungroup() %>%
    ggplot(aes(x = factor(month), y = number, group = 1)) +
      geom_line(color = "#0915BD") +
      #geom_point(color = "0915BD") +
    labs(x = "Months", y = "Visitors", title = "Retention") +
    theme_minimal()

In february we had 911 new customers, but and the graph shows the customer retention of the february cohort of customers.

5.1 Frequency of visits:

We will now identify which months do the visitors visit and the frequency of visits. We are also going to identy which of our users are returning after a gap, which of them visit every month and which of the customers are lost.

visitLog <- data.frame(customerID = ltv$customer_id, year = year(mdy_hm(ltv$created_at)), visit_month = month(mdy_hm(ltv$created_at)))

visitLog$visit_month <- if_else(visitLog$year == 2015, visitLog$visit_month, visitLog$year-2015+11+visitLog$visit_month)
visitLog <- visitLog[order(visitLog$customerID),]

visitLog <- visitLog %>%
              group_by(customerID) %>%
              mutate(lead = lead(visit_month)) 

visitLog$timediff <- visitLog$lead - visitLog$visit_month

visitLog$customerType <- ifelse(visitLog$timediff == 1, "Retained",
                                ifelse(visitLog$timediff > 1, "Lagger",
                                       "Repeat"))
visitLog$customerType <- ifelse(is.na(visitLog$timediff),"Lost",visitLog$customerType)  

visitLog %>%
    group_by(visit_month) %>%
    summarise(Retention = round(sum(customerType == "Retained")*100/n(), 2)) %>%
    filter(visit_month != 15) %>%
    ggplot(aes(x = factor(visit_month), y = Retention, group = 1)) +
      geom_line(color = "#0915BD") +
      theme_minimal() +
      labs(x = "Months", y = "Retention Percentage", title = "Month-On-Month Customer Retention")