A sales manager came to me looking for a deep dive into his sales team. He has a team of 5 Sales Representatives and was looking for help parsing through the past years sales data to discover ways to improve.
This is a fictitious company for use in this case. The “company” sells clothing, electronics, furniture, and food. There is a sales department consisting of 5 sales reps and a sales manager.
The data was offered in a single .csv format. It was collected from January of 2023 to January 2024. The data was downlaoded from https://www.kaggle.com/datasets/vinothkannaece/sales-dataset/data .
Sales Data Description This dataset represents synthetic sales data generated for practice purposes only. It is not real-time or based on actual business operations, and should be used solely for educational or testing purposes. The dataset contains information that simulates sales transactions across different products, regions, and customers. Each row represents an individual sale event with various details associated with it.
Columns in the Dataset
Product_ID: Unique identifier for each product sold. Randomly generated for practice purposes. Sale_Date: The date when the sale occurred. Randomly selected from the year 2023. Sales_Rep: The sales representative responsible for the transaction. The dataset includes five random sales representatives (Alice, Bob, Charlie, David, Eve). Region: The region where the sale took place. The possible regions are North, South, East, and West. Sales_Amount: The total sales amount for the transaction, including discounts if any. Values range from 100 to 10,000 (in currency units). Quantity_Sold: The number of units sold in that transaction, randomly generated between 1 and 50. Product_Category: The category of the product sold. Categories include Electronics, Furniture, Clothing, and Food. Unit_Cost: The cost per unit of the product sold, randomly generated between 50 and 5000 currency units. Unit_Price: The selling price per unit of the product, calculated to be higher than the unit cost. Customer_Type: Indicates whether the customer is a New or Returning customer. Discount: The discount applied to the sale, randomly chosen between 0% and 30%. Payment_Method: The method of payment used by the customer (e.g., Credit Card, Cash, Bank Transfer). Sales_Channel: The channel through which the sale occurred. Either Online or Retail. Region_and_Sales_Rep: A combined column that pairs the region and sales representative for easier tracking.
# Installing packages :
install.packages("tidyverse")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
install.packages("lubridate")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
install.packages("dplyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
install.packages("ggplot2")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
install.packages("tidyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
install.packages("here")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
install.packages("skimr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
install.packages("janitor")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
install.packages("plotly")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
install.packages("corrplot")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
#Load Libraries:
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(dplyr)
library(ggplot2)
library(tidyr)
library(here)
## here() starts at /cloud/project
library(skimr)
library(janitor)
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(plotly)
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
library(corrplot)
## corrplot 0.95 loaded
SalesData<- read_csv('SalesData.csv')
## Rows: 1000 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (8): Sale_Date, Sales_Rep, Region, Product_Category, Customer_Type, Paym...
## dbl (6): Product_ID, Sales_Amount, Quantity_Sold, Unit_Cost, Unit_Price, Dis...
## lgl (1): Unit_Revenue
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
summary(SalesData)
## Product_ID Sale_Date Sales_Rep Region
## Min. :1001 Length:1000 Length:1000 Length:1000
## 1st Qu.:1024 Class :character Class :character Class :character
## Median :1051 Mode :character Mode :character Mode :character
## Mean :1050
## 3rd Qu.:1075
## Max. :1100
## Sales_Amount Quantity_Sold Product_Category Unit_Cost
## Min. : 100.1 Min. : 1.00 Length:1000 Min. : 60.28
## 1st Qu.:2550.3 1st Qu.:13.00 Class :character 1st Qu.:1238.38
## Median :5019.3 Median :25.00 Mode :character Median :2467.24
## Mean :5019.3 Mean :25.36 Mean :2475.30
## 3rd Qu.:7507.4 3rd Qu.:38.00 3rd Qu.:3702.86
## Max. :9989.0 Max. :49.00 Max. :4995.30
## Unit_Price Customer_Type Discount Payment_Method
## Min. : 167.1 Length:1000 Min. :0.0000 Length:1000
## 1st Qu.:1509.1 Class :character 1st Qu.:0.0800 Class :character
## Median :2696.4 Mode :character Median :0.1500 Mode :character
## Mean :2728.4 Mean :0.1524
## 3rd Qu.:3958.0 3rd Qu.:0.2300
## Max. :5442.1 Max. :0.3000
## Sales_Channel Region_and_Sales_Rep Unit_Revenue
## Length:1000 Length:1000 Mode:logical
## Class :character Class :character NA's:1000
## Mode :character Mode :character
##
##
##
colnames(SalesData)
## [1] "Product_ID" "Sale_Date" "Sales_Rep"
## [4] "Region" "Sales_Amount" "Quantity_Sold"
## [7] "Product_Category" "Unit_Cost" "Unit_Price"
## [10] "Customer_Type" "Discount" "Payment_Method"
## [13] "Sales_Channel" "Region_and_Sales_Rep" "Unit_Revenue"
glimpse(SalesData)
## Rows: 1,000
## Columns: 15
## $ Product_ID <dbl> 1052, 1093, 1015, 1072, 1061, 1021, 1083, 1087, 1…
## $ Sale_Date <chr> "2/3/2023", "4/21/2023", "9/21/2023", "8/24/2023"…
## $ Sales_Rep <chr> "Bob", "Bob", "David", "Bob", "Charlie", "Charlie…
## $ Region <chr> "North", "West", "South", "South", "East", "West"…
## $ Sales_Amount <dbl> 5053.97, 4384.02, 4631.23, 2167.94, 3750.20, 3761…
## $ Quantity_Sold <dbl> 18, 17, 30, 39, 13, 32, 29, 46, 30, 18, 13, 43, 2…
## $ Product_Category <chr> "Furniture", "Furniture", "Food", "Clothing", "El…
## $ Unit_Cost <dbl> 152.75, 3816.39, 261.56, 4330.03, 637.37, 900.79,…
## $ Unit_Price <dbl> 267.22, 4209.44, 371.40, 4467.75, 692.71, 1106.51…
## $ Customer_Type <chr> "Returning", "Returning", "Returning", "New", "Ne…
## $ Discount <dbl> 0.09, 0.11, 0.20, 0.02, 0.08, 0.21, 0.14, 0.12, 0…
## $ Payment_Method <chr> "Cash", "Cash", "Bank Transfer", "Credit Card", "…
## $ Sales_Channel <chr> "Online", "Retail", "Retail", "Retail", "Online",…
## $ Region_and_Sales_Rep <chr> "North-Bob", "West-Bob", "South-David", "South-Bo…
## $ Unit_Revenue <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
SalesData<-SalesData%>%
mutate(Sales_Date = as.Date(Sale_Date))
We will take a look at the sales of both the company and the sales reps to understand where improvements can be made.
Analysis by Category and Region Sales by Region
When sales are broken down by region, we can see that there should be room to grow in the “South”. The “North” is the top producer, with the “East” and “West” fairly equal.
With the sales being broken down by product, we see a pretty even
distribution across our pie chart.
There appears to be an even disbursement between new and returning
customers. The marketing online seems to eb working as there is a
greater number of new clients. In regards to payment types, there does
not appear to be much of note.
The box plot graph helps us to see Bob and David appear to have the greatest average sale amount, and Eve the lowest. Eve sold the 2nd most units, but her lowest average sale did hurt her total output. If her average sale was to increase she should be well over a million in sales yearly.
## # A tibble: 5 × 4
## Sales_Rep Total_Sales Num_Sales Average_Sale
## <chr> <chr> <int> <chr>
## 1 David $1,141,737.4 222 $5,142.96
## 2 Bob $1,080,990.6 208 $5,197.07
## 3 Eve $ 970,184.0 209 $4,642.03
## 4 Alice $ 965,541.8 192 $5,028.86
## 5 Charlie $ 860,811.5 169 $5,093.56
Upon analyzing the sales data, it was found that the sales team has effectively divided their efforts, maintaining an even balance in both product distribution and the breakdown of sales between new and returning customers. This indicates a well-rounded approach to market penetration. However, there is still untapped potential in the “South” region, where growth opportunities remain. By focusing additional resources and strategies in this area, the company could capitalize on its existing strengths and further expand its market share in that region. In addition to focusing on the “South” region, there is a strategic opportunity to encourage the sales team to concentrate on closing deals near the $5,000 mark. The top two producers, who each average sales well above this figure, have been key contributors to achieving over $1,000,000 in total sales. By aligning the team to target similar-sized deals, they can replicate this success and drive greater performance across the board. The ultimate goal should be to close 200 units yearly, and by focusing on these high-value transactions, the sales team will be better positioned to meet this target while continuing to push overall sales growth.