1. Synopsis

There is 1 dataset(csv) with 3 years worth of customer orders. There are 4 columns in the csv dataset: index, CUSTOMER_EMAIL(unique identifier as hash), Net_Revenue, and Year.

For each year we need the following information:

  • Total revenue for the current year
  • New Customer Revenue e.g. new customers not present in previous year only
  • Existing Customer Growth, Revenue of existing customers for current year – Revenue of existing customers from existing year
  • Revenue lost from attrition
  • Existing Customer Revenue Current Year
  • Existing Customer Revenue Prior Year
  • Total Customers Current Year
  • Total Customers Previous Year
  • New Customers
  • Lost Customers

Dataset:https://www.dropbox.com/sh/xhy2fzjdvg3ykhy/AADAVKH9tgD_dWh6TZtOd34ia?dl=0

2. Packages Required

We begin by loading the packages that will be required throughout the course of our analysis.

library(tidyr)
library(DT)
library(ggplot2)
library(dplyr)
library(tidyverse)
library(kableExtra)
library(lubridate)
library(readxl)
library(highcharter)
library(lubridate)
library(scales)
library(RColorBrewer)
library(wesanderson)
library(plotly)
library(shiny)
library(readxl)
library(readr)
library(scales)
library(stringr)
library(boot)
library(reshape2)

3. Data Preparation

After loading the required packages, we move on to the data preparation step which would start by loading the data into our R-studio.

3.1 Loading and Reading the Data

Summary and Glimpse of the Data

summary(data)
##        V1         customer_email      net_revenue          year     
##  Min.   :     0   Length:685927      Min.   :  1.00   Min.   :2015  
##  1st Qu.:171482   Class :character   1st Qu.: 63.51   1st Qu.:2015  
##  Median :342963   Mode  :character   Median :125.73   Median :2016  
##  Mean   :342963                      Mean   :125.65   Mean   :2016  
##  3rd Qu.:514445                      3rd Qu.:187.82   3rd Qu.:2017  
##  Max.   :685926                      Max.   :250.00   Max.   :2017
glimpse(data)
## Rows: 685,927
## Columns: 4
## $ V1             <int> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15...
## $ customer_email <chr> "nhknapwsbx@gmail.com", "joiuzbvcpn@gmail.com", "ukk...
## $ net_revenue    <dbl> 249.92, 87.61, 168.38, 62.40, 43.08, 39.44, 171.46, ...
## $ year           <int> 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015...

Observations

Checking for NA

Next we will check for missing values and deal with them accordingly.

summary(is.na(data))
##      V1          customer_email  net_revenue        year        
##  Mode :logical   Mode :logical   Mode :logical   Mode :logical  
##  FALSE:685927    FALSE:685927    FALSE:685927    FALSE:685927

Observations

3.3 Cleaned Dataset

The final cleaned dataset can be found below in an interactive table.

datatable(head(data, 20), class = 'cell-border stripe')

Task

  1. Total revenue for the current year

data %>%
group_by(year) %>%
summarise(Revenue =sum(net_revenue))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 3 x 2
##    year   Revenue
##   <int>     <dbl>
## 1  2015 29036749.
## 2  2016 25730944.
## 3  2017 31417495.

Answer

The Current Revenue of 2017 is $31417495

• New Customer Revenue e.g. new customers not present in previous year only

cust_2017 <- data %>% 
  filter( year == 2017) 

cust_2016 <- data %>% 
  filter( year == 2016)

cust_2015 <- data %>% 
  filter( year == 2015)


new_cust_2017 <- anti_join(cust_2017, cust_2016, by = c("customer_email" = "customer_email"))
new_cust_2016 <- anti_join(cust_2016, cust_2015, by = c("customer_email" = "customer_email"))

new_cust_2017 %>%
summarise(Revenue_new_cust_2017 = sum(net_revenue))
##   Revenue_new_cust_2017
## 1              21769213
new_cust_2016 %>%
summarise(Revenue_new_cust_2016 = sum(net_revenue))
##   Revenue_new_cust_2016
## 1              17206367

• Existing Customer Growth, Revenue of existing customers for current year – Revenue of existing customers from existing year

result = data %>% 
  group_by(customer_email) 

head(result)
## # A tibble: 6 x 4
## # Groups:   customer_email [6]
##      V1 customer_email       net_revenue  year
##   <int> <chr>                      <dbl> <int>
## 1     0 nhknapwsbx@gmail.com       250.   2015
## 2     1 joiuzbvcpn@gmail.com        87.6  2015
## 3     2 ukkjctepxt@gmail.com       168.   2015
## 4     3 gykatilzrt@gmail.com        62.4  2015
## 5     4 mmsgsrtxah@gmail.com        43.1  2015
## 6     5 mobvusnzfr@gmail.com        39.4  2015