Introduction

A sales manager came to me looking for a deep dive into his sales team. He has a team of 5 Sales Representatives and was looking for help parsing through the past years sales data to discover ways to improve.

About the Company

This is a fictitious company for use in this case. The “company” sells clothing, electronics, furniture, and food. There is a sales department consisting of 5 sales reps and a sales manager.

Data Collection

The data was offered in a single .csv format. It was collected from January of 2023 to January 2024. The data was downlaoded from https://www.kaggle.com/datasets/vinothkannaece/sales-dataset/data .

Sales Data Description This dataset represents synthetic sales data generated for practice purposes only. It is not real-time or based on actual business operations, and should be used solely for educational or testing purposes. The dataset contains information that simulates sales transactions across different products, regions, and customers. Each row represents an individual sale event with various details associated with it.

Columns in the Dataset

Product_ID: Unique identifier for each product sold. Randomly generated for practice purposes. Sale_Date: The date when the sale occurred. Randomly selected from the year 2023. Sales_Rep: The sales representative responsible for the transaction. The dataset includes five random sales representatives (Alice, Bob, Charlie, David, Eve). Region: The region where the sale took place. The possible regions are North, South, East, and West. Sales_Amount: The total sales amount for the transaction, including discounts if any. Values range from 100 to 10,000 (in currency units). Quantity_Sold: The number of units sold in that transaction, randomly generated between 1 and 50. Product_Category: The category of the product sold. Categories include Electronics, Furniture, Clothing, and Food. Unit_Cost: The cost per unit of the product sold, randomly generated between 50 and 5000 currency units. Unit_Price: The selling price per unit of the product, calculated to be higher than the unit cost. Customer_Type: Indicates whether the customer is a New or Returning customer. Discount: The discount applied to the sale, randomly chosen between 0% and 30%. Payment_Method: The method of payment used by the customer (e.g., Credit Card, Cash, Bank Transfer). Sales_Channel: The channel through which the sale occurred. Either Online or Retail. Region_and_Sales_Rep: A combined column that pairs the region and sales representative for easier tracking.

Data Cleaning & Processing

# Installing packages :      
install.packages("tidyverse")    
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
install.packages("lubridate")     
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
install.packages("dplyr")       
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
install.packages("ggplot2")      
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
install.packages("tidyr")    
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
install.packages("here")     
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
install.packages("skimr")     
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
install.packages("janitor")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
install.packages("plotly")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
install.packages("corrplot")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
#Load Libraries:
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(dplyr)
library(ggplot2)
library(tidyr)
library(here)
## here() starts at /cloud/project
library(skimr)
library(janitor)
## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(plotly)
## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout
library(corrplot)
## corrplot 0.95 loaded
SalesData<- read_csv('SalesData.csv')
## Rows: 1000 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (8): Sale_Date, Sales_Rep, Region, Product_Category, Customer_Type, Paym...
## dbl (6): Product_ID, Sales_Amount, Quantity_Sold, Unit_Cost, Unit_Price, Dis...
## lgl (1): Unit_Revenue
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
summary(SalesData)
##    Product_ID    Sale_Date          Sales_Rep            Region         
##  Min.   :1001   Length:1000        Length:1000        Length:1000       
##  1st Qu.:1024   Class :character   Class :character   Class :character  
##  Median :1051   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :1050                                                           
##  3rd Qu.:1075                                                           
##  Max.   :1100                                                           
##   Sales_Amount    Quantity_Sold   Product_Category     Unit_Cost      
##  Min.   : 100.1   Min.   : 1.00   Length:1000        Min.   :  60.28  
##  1st Qu.:2550.3   1st Qu.:13.00   Class :character   1st Qu.:1238.38  
##  Median :5019.3   Median :25.00   Mode  :character   Median :2467.24  
##  Mean   :5019.3   Mean   :25.36                      Mean   :2475.30  
##  3rd Qu.:7507.4   3rd Qu.:38.00                      3rd Qu.:3702.86  
##  Max.   :9989.0   Max.   :49.00                      Max.   :4995.30  
##    Unit_Price     Customer_Type         Discount      Payment_Method    
##  Min.   : 167.1   Length:1000        Min.   :0.0000   Length:1000       
##  1st Qu.:1509.1   Class :character   1st Qu.:0.0800   Class :character  
##  Median :2696.4   Mode  :character   Median :0.1500   Mode  :character  
##  Mean   :2728.4                      Mean   :0.1524                     
##  3rd Qu.:3958.0                      3rd Qu.:0.2300                     
##  Max.   :5442.1                      Max.   :0.3000                     
##  Sales_Channel      Region_and_Sales_Rep Unit_Revenue  
##  Length:1000        Length:1000          Mode:logical  
##  Class :character   Class :character     NA's:1000     
##  Mode  :character   Mode  :character                   
##                                                        
##                                                        
## 
colnames(SalesData)
##  [1] "Product_ID"           "Sale_Date"            "Sales_Rep"           
##  [4] "Region"               "Sales_Amount"         "Quantity_Sold"       
##  [7] "Product_Category"     "Unit_Cost"            "Unit_Price"          
## [10] "Customer_Type"        "Discount"             "Payment_Method"      
## [13] "Sales_Channel"        "Region_and_Sales_Rep" "Unit_Revenue"
glimpse(SalesData)
## Rows: 1,000
## Columns: 15
## $ Product_ID           <dbl> 1052, 1093, 1015, 1072, 1061, 1021, 1083, 1087, 1…
## $ Sale_Date            <chr> "2/3/2023", "4/21/2023", "9/21/2023", "8/24/2023"…
## $ Sales_Rep            <chr> "Bob", "Bob", "David", "Bob", "Charlie", "Charlie…
## $ Region               <chr> "North", "West", "South", "South", "East", "West"…
## $ Sales_Amount         <dbl> 5053.97, 4384.02, 4631.23, 2167.94, 3750.20, 3761…
## $ Quantity_Sold        <dbl> 18, 17, 30, 39, 13, 32, 29, 46, 30, 18, 13, 43, 2…
## $ Product_Category     <chr> "Furniture", "Furniture", "Food", "Clothing", "El…
## $ Unit_Cost            <dbl> 152.75, 3816.39, 261.56, 4330.03, 637.37, 900.79,…
## $ Unit_Price           <dbl> 267.22, 4209.44, 371.40, 4467.75, 692.71, 1106.51…
## $ Customer_Type        <chr> "Returning", "Returning", "Returning", "New", "Ne…
## $ Discount             <dbl> 0.09, 0.11, 0.20, 0.02, 0.08, 0.21, 0.14, 0.12, 0…
## $ Payment_Method       <chr> "Cash", "Cash", "Bank Transfer", "Credit Card", "…
## $ Sales_Channel        <chr> "Online", "Retail", "Retail", "Retail", "Online",…
## $ Region_and_Sales_Rep <chr> "North-Bob", "West-Bob", "South-David", "South-Bo…
## $ Unit_Revenue         <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
SalesData<-SalesData%>%
    mutate(Sales_Date = as.Date(Sale_Date))

Analysis

We will take a look at the sales of both the company and the sales reps to understand where improvements can be made.

Analysis by Category and Region Sales by Region

When sales are broken down by region, we can see that there should be room to grow in the “South”. The “North” is the top producer, with the “East” and “West” fairly equal.

With the sales being broken down by product, we see a pretty even distribution across our pie chart.

There appears to be an even disbursement between new and returning customers. The marketing online seems to eb working as there is a greater number of new clients. In regards to payment types, there does not appear to be much of note.

The box plot graph helps us to see Bob and David appear to have the greatest average sale amount, and Eve the lowest. Eve sold the 2nd most units, but her lowest average sale did hurt her total output. If her average sale was to increase she should be well over a million in sales yearly.

## # A tibble: 5 × 4
##   Sales_Rep Total_Sales  Num_Sales Average_Sale
##   <chr>     <chr>            <int> <chr>       
## 1 David     $1,141,737.4       222 $5,142.96   
## 2 Bob       $1,080,990.6       208 $5,197.07   
## 3 Eve       $  970,184.0       209 $4,642.03   
## 4 Alice     $  965,541.8       192 $5,028.86   
## 5 Charlie   $  860,811.5       169 $5,093.56

In Conclusion

Upon analyzing the sales data, it was found that the sales team has effectively divided their efforts, maintaining an even balance in both product distribution and the breakdown of sales between new and returning customers. This indicates a well-rounded approach to market penetration. However, there is still untapped potential in the “South” region, where growth opportunities remain. By focusing additional resources and strategies in this area, the company could capitalize on its existing strengths and further expand its market share in that region. In addition to focusing on the “South” region, there is a strategic opportunity to encourage the sales team to concentrate on closing deals near the $5,000 mark. The top two producers, who each average sales well above this figure, have been key contributors to achieving over $1,000,000 in total sales. By aligning the team to target similar-sized deals, they can replicate this success and drive greater performance across the board. The ultimate goal should be to close 200 units yearly, and by focusing on these high-value transactions, the sales team will be better positioned to meet this target while continuing to push overall sales growth.