1. Synopsis

There is 1 dataset(csv) with 3 years worth of customer orders. There are 4 columns in the csv dataset: index, CUSTOMER_EMAIL(unique identifier as hash), Net_Revenue, and Year.

For each year we need the following information:

Total revenue for the current year
New Customer Revenue e.g. new customers not present in previous year only
Existing Customer Growth, Revenue of existing customers for current year – Revenue of existing customers from existing year
Revenue lost from attrition
Existing Customer Revenue Current Year
Existing Customer Revenue Prior Year
Total Customers Current Year
Total Customers Previous Year
New Customers
Lost Customers

Dataset:https://www.dropbox.com/sh/xhy2fzjdvg3ykhy/AADAVKH9tgD_dWh6TZtOd34ia?dl=0

2. Packages Required

We begin by loading the packages that will be required throughout the course of our analysis.

library(tidyr)
library(DT)
library(ggplot2)
library(dplyr)
library(tidyverse)
library(kableExtra)
library(lubridate)
library(readxl)
library(highcharter)
library(lubridate)
library(scales)
library(RColorBrewer)
library(wesanderson)
library(plotly)
library(shiny)
library(readxl)
library(readr)
library(scales)
library(stringr)
library(boot)
library(reshape2)

3. Data Preparation

After loading the required packages, we move on to the data preparation step which would start by loading the data into our R-studio.

3.1 Loading and Reading the Data

Summary and Glimpse of the Data

summary(data)

##        V1         customer_email      net_revenue          year     
##  Min.   :     0   Length:685927      Min.   :  1.00   Min.   :2015  
##  1st Qu.:171482   Class :character   1st Qu.: 63.51   1st Qu.:2015  
##  Median :342963   Mode  :character   Median :125.73   Median :2016  
##  Mean   :342963                      Mean   :125.65   Mean   :2016  
##  3rd Qu.:514445                      3rd Qu.:187.82   3rd Qu.:2017  
##  Max.   :685926                      Max.   :250.00   Max.   :2017

glimpse(data)

## Rows: 685,927
## Columns: 4
## $ V1             <int> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15...
## $ customer_email <chr> "nhknapwsbx@gmail.com", "joiuzbvcpn@gmail.com", "ukk...
## $ net_revenue    <dbl> 249.92, 87.61, 168.38, 62.40, 43.08, 39.44, 171.46, ...
## $ year           <int> 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015...

Observations

Checking for NA

Next we will check for missing values and deal with them accordingly.

summary(is.na(data))

##      V1          customer_email  net_revenue        year        
##  Mode :logical   Mode :logical   Mode :logical   Mode :logical  
##  FALSE:685927    FALSE:685927    FALSE:685927    FALSE:685927

Observations

3.3 Cleaned Dataset

The final cleaned dataset can be found below in an interactive table.

datatable(head(data, 20), class = 'cell-border stripe')

Task

Total revenue for the current year

data %>%
group_by(year) %>%
summarise(Revenue =sum(net_revenue))

## `summarise()` ungrouping output (override with `.groups` argument)

## # A tibble: 3 x 2
##    year   Revenue
##   <int>     <dbl>
## 1  2015 29036749.
## 2  2016 25730944.
## 3  2017 31417495.

Answer

The Current Revenue of 2017 is $31417495

• New Customer Revenue e.g. new customers not present in previous year only

cust_2017 <- data %>% 
  filter( year == 2017) 

cust_2016 <- data %>% 
  filter( year == 2016)

cust_2015 <- data %>% 
  filter( year == 2015)


new_cust_2017 <- anti_join(cust_2017, cust_2016, by = c("customer_email" = "customer_email"))
new_cust_2016 <- anti_join(cust_2016, cust_2015, by = c("customer_email" = "customer_email"))

new_cust_2017 %>%
summarise(Revenue_new_cust_2017 = sum(net_revenue))

##   Revenue_new_cust_2017
## 1              21769213

new_cust_2016 %>%
summarise(Revenue_new_cust_2016 = sum(net_revenue))

##   Revenue_new_cust_2016
## 1              17206367

• Existing Customer Growth, Revenue of existing customers for current year – Revenue of existing customers from existing year

result = data %>% 
  group_by(customer_email) 

head(result)

## # A tibble: 6 x 4
## # Groups:   customer_email [6]
##      V1 customer_email       net_revenue  year
##   <int> <chr>                      <dbl> <int>
## 1     0 nhknapwsbx@gmail.com       250.   2015
## 2     1 joiuzbvcpn@gmail.com        87.6  2015
## 3     2 ukkjctepxt@gmail.com       168.   2015
## 4     3 gykatilzrt@gmail.com        62.4  2015
## 5     4 mmsgsrtxah@gmail.com        43.1  2015
## 6     5 mobvusnzfr@gmail.com        39.4  2015

Stout Case Study

Number 2

Sayak Chakraborty

1. Synopsis

2. Packages Required

3. Data Preparation

3.1 Loading and Reading the Data

Summary and Glimpse of the Data

Observations

Checking for NA

Observations

3.3 Cleaned Dataset

Task

Total revenue for the current year

Answer

• New Customer Revenue e.g. new customers not present in previous year only

• Existing Customer Growth, Revenue of existing customers for current year – Revenue of existing customers from existing year