Consumer Packaged Goods Sales Management

Project Context

Explore the data to discover potential sales management issues and build relevant hypothesis
Create data visualizations/dashboards to facilitate issue/hypo discussion with team managers/memebers who aren’t failiar with data analysis

Why EDA?

Explore data to form initial hypothesis

Data Pre-processing?

Read data and library

library(tidyverse)

## ─ Attaching packages ──────────────────── tidyverse 1.2.1 ─

## ✔ ggplot2 3.1.0     ✔ purrr   0.2.4
## ✔ tibble  1.4.2     ✔ dplyr   0.7.5
## ✔ tidyr   0.8.2     ✔ stringr 1.3.1
## ✔ readr   1.1.1     ✔ forcats 0.3.0

## ─ Conflicts ────────────────────── tidyverse_conflicts() ─
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

library(knitr)

SalesTable <- read_csv('SalesTable.csv')

## Parsed with column specification:
## cols(
##   Agency = col_integer(),
##   Product_ID = col_integer(),
##   Client_ID = col_integer(),
##   Sales_Amount = col_integer(),
##   Sales = col_integer()
## )

ClientTable <- read_csv('ClientTable.csv')

## Parsed with column specification:
## cols(
##   Client_ID = col_integer(),
##   Client_Name = col_character()
## )

ProductTable <- read_csv('ProductTable.csv')

## Parsed with column specification:
## cols(
##   Product_ID = col_integer(),
##   Product_Name = col_character()
## )

Join all the data and look into the data structure

SalesTableNew <- SalesTable %>%
  inner_join(ClientTable, by = 'Client_ID') %>%
  inner_join(ProductTable, by = 'Product_ID')

kable(SalesTableNew[1:10,])

Agency	Product_ID	Client_ID	Sales_Amount	Sales	Client_Name	Product_Name
101	1004	2003	465	78971	CC	D
101	1007	2003	182	23053	CC	G
101	1008	2003	757	61897	CC	H
101	1016	2003	714	69885	CC	P
101	1017	2003	321	37015	CC	Q
101	1018	2003	56	6141	CC	R
101	1002	2003	541	54350	CC	B
101	1004	2003	475	56077	CC	D
101	1007	2003	222	40973	CC	G
101	1008	2003	1868	208044	CC	H

Change data type for the following analysis : Turn the number code from “interger” into “factor”

SalesTableNew$Agency <- as.factor(SalesTableNew$Agency)
SalesTableNew$Product_ID <- as.factor(SalesTableNew$Product_ID)
SalesTableNew$Client_ID <- as.factor(SalesTableNew$Client_ID)

Propose Hypothesis

Product unit price * Sales

Question : What is the relationship between product unit price & sales amount?
Chart purpose & type : Correlation, Scatterplot
Fact : Sales mostly generated by products which price at ~150 NTD
Hypothesis : The average product unit price is lower/similar/highter compared to the industry benchmark
Further Data Verification : Product unit price industry benchmark

SalesTablePrice <- SalesTableNew %>%
  mutate( Unit_Price = Sales / Sales_Amount)

ggplot(data = SalesTablePrice,
             aes( x = Unit_Price,
                  y = Sales_Amount))+
  geom_point(color = 'red',
             alpha = 0.5) + theme_bw()

## Warning: Removed 7 rows containing missing values (geom_point).

**Sales pattern * Client**

Question : What’s different in Sales between different clients?
Chart Purpose & Type : Distribution , Boxplot
Fact : LL’s avg. product sales (total sales by pdct) are obviously higher than other clients
Hypothesis : LL’s products sales is much more healthier
Further Data Verification : How many pdcts by clients have

ggplot(SalesTableNew)+geom_boxplot( aes( x = factor(Client_Name),
                                         y = Sales,
                                         colour = Client_Name))+
  labs( x = 'Client',
        title = 'Sales Distribution by Client') + theme_bw()

Consumer Packaged Goods Sales Management

Jiawen

Project Context

Why EDA?

Data Pre-processing?

Propose Hypothesis

Product unit price * Sales

Sales pattern * Client

**Sales pattern * Client**