The Supermarket Sales dataset used for this analysis can be downloaded for free from Kaggle (https://www.kaggle.com/datasets/aungpyaeap/supermarket-sales). The dataset contains sales records for three branches of a supermarket chain over three months. I have visualised revenue generation and customer perception across the three branches.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(RColorBrewer) #Useful for selecting colours that go well
Supermarket_Sales <- read.csv("Supermarket_Sales.csv")
theme_set(theme_classic()+
theme(plot.title = element_text(hjust = 0.5)))
#Setting a theme so each plot inherits these settings.
glimpse(Supermarket_Sales)
## Rows: 1,000
## Columns: 17
## $ Invoice.ID <chr> "750-67-8428", "226-31-3081", "631-41-3108", "…
## $ Branch <chr> "A", "C", "A", "A", "A", "C", "A", "C", "A", "…
## $ City <chr> "Yangon", "Naypyitaw", "Yangon", "Yangon", "Ya…
## $ Customer.type <chr> "Member", "Normal", "Normal", "Member", "Norma…
## $ Gender <chr> "Female", "Female", "Male", "Male", "Male", "M…
## $ Product.line <chr> "Health and beauty", "Electronic accessories",…
## $ Unit.price <dbl> 74.69, 15.28, 46.33, 58.22, 86.31, 85.39, 68.8…
## $ Quantity <int> 7, 5, 7, 8, 7, 7, 6, 10, 2, 3, 4, 4, 5, 10, 10…
## $ Tax.5. <dbl> 26.1415, 3.8200, 16.2155, 23.2880, 30.2085, 29…
## $ Total <dbl> 548.9715, 80.2200, 340.5255, 489.0480, 634.378…
## $ Date <chr> "1/5/2019", "3/8/2019", "3/3/2019", "1/27/2019…
## $ Time <chr> "13:08", "10:29", "13:23", "20:33", "10:37", "…
## $ Payment <chr> "Ewallet", "Cash", "Credit card", "Ewallet", "…
## $ cogs <dbl> 522.83, 76.40, 324.31, 465.76, 604.17, 597.73,…
## $ gross.margin.percentage <dbl> 4.761905, 4.761905, 4.761905, 4.761905, 4.7619…
## $ gross.income <dbl> 26.1415, 3.8200, 16.2155, 23.2880, 30.2085, 29…
## $ Rating <dbl> 9.1, 9.6, 7.4, 8.4, 5.3, 4.1, 5.8, 8.0, 7.2, 5…
names(Supermarket_Sales)
## [1] "Invoice.ID" "Branch"
## [3] "City" "Customer.type"
## [5] "Gender" "Product.line"
## [7] "Unit.price" "Quantity"
## [9] "Tax.5." "Total"
## [11] "Date" "Time"
## [13] "Payment" "cogs"
## [15] "gross.margin.percentage" "gross.income"
## [17] "Rating"
unique(Supermarket_Sales$Branch)
## [1] "A" "C" "B"
unique(Supermarket_Sales$Payment)
## [1] "Ewallet" "Cash" "Credit card"
unique(Supermarket_Sales$City)
## [1] "Yangon" "Naypyitaw" "Mandalay"
unique(Supermarket_Sales$Product.line)
## [1] "Health and beauty" "Electronic accessories" "Home and lifestyle"
## [4] "Sports and travel" "Food and beverages" "Fashion accessories"
I find these checks useful for seeing how the columns are named and which unique values each categorical column contains.
Supermarket_Sales %>%
select(Branch, Total) %>%
ggplot(aes(Branch, Total, fill = Branch))+
geom_col()+
scale_fill_brewer(palette = "Set2")+
labs(title = "Sales by Branch",
y = "")
Branch C is recording slightly more revenue than the others. Sales per product line could be explored to find out why.
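To put a number on "slightly more", the bar heights can be computed directly. This is a minimal sketch, not part of the original analysis: `revenue_by_branch()` is a helper name I made up, and it only assumes the `Branch` and `Total` columns shown in the `glimpse()` output above.

```r
library(dplyr)

# Hypothetical helper: total revenue and share of overall revenue per branch,
# sorted from highest to lowest revenue.
revenue_by_branch <- function(sales) {
  sales %>%
    group_by(Branch) %>%
    summarise(Revenue = sum(Total), .groups = "drop") %>%
    mutate(Share = Revenue / sum(Revenue)) %>%
    arrange(desc(Revenue))
}

# Intended usage with the data loaded earlier:
# revenue_by_branch(Supermarket_Sales)
```

The `Share` column makes it easier to judge whether the gap between branches is large enough to be worth explaining.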
Supermarket_Sales %>%
select(Branch, Total, Product.line) %>%
ggplot(aes(Product.line, Total, fill = Product.line))+
geom_col()+
coord_flip()+
scale_fill_brewer(palette = "Set2")+
labs(title = "Sales by Product Line",
y = "")
Most revenue is generated by Food and Beverages with Health and Beauty generating the least.
Supermarket_Sales %>%
select(Branch, Total, Product.line) %>%
filter(Branch == "A") %>%
ggplot(aes(Product.line, Total, fill = Product.line))+
geom_col()+
coord_flip()+
scale_fill_brewer(palette = "Set2")+
labs(title = "Sales by Product Line in Branch A",
y = "")
Supermarket_Sales %>%
select(Branch, Total, Product.line) %>%
filter(Branch == "B") %>%
ggplot(aes(Product.line, Total, fill = Product.line))+
geom_col()+
coord_flip()+
scale_fill_brewer(palette = "Set2")+
labs(title = "Sales by Product Line in Branch B",
y = "")
Supermarket_Sales %>%
select(Branch, Total, Product.line) %>%
filter(Branch == "C") %>%
ggplot(aes(Product.line, Total, fill = Product.line))+
geom_col()+
coord_flip()+
scale_fill_brewer(palette = "Set2")+
labs(title = "Sales by Product Line in Branch C",
y = "")
Different branches seem to have different bestselling product lines:
Branch A - Home and lifestyle
Branch B - Health and beauty
Branch C - Food and beverages
As the branches are located in different cities, the differences might be explained by demographics. An analysis of longer-term data could help to check whether the trend holds over time. Surprisingly, Branch B generated its highest revenue from Health and beauty products, the product line that generated the least revenue overall.
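Reading the top product line off three separate bar charts is error-prone when bars are close in length, so the ranking can also be checked numerically. The sketch below is an illustration rather than the method used for the plots: `top_product_line()` is a hypothetical helper, and it assumes only the `Branch`, `Product.line` and `Total` columns.

```r
library(dplyr)

# Hypothetical helper: the highest-revenue product line(s) in each branch.
# Ties are kept, so a branch can return more than one row.
top_product_line <- function(sales, n = 1) {
  sales %>%
    group_by(Branch, Product.line) %>%
    summarise(Revenue = sum(Total), .groups = "drop_last") %>%  # still grouped by Branch
    slice_max(Revenue, n = n) %>%
    ungroup()
}

# Intended usage with the data loaded earlier:
# top_product_line(Supermarket_Sales)         # winner per branch
# top_product_line(Supermarket_Sales, n = 2)  # winner and runner-up
```

Calling it with `n = 2` shows how far ahead each bestseller is of its runner-up, which matters before reading too much into the differences between cities.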
Supermarket_Sales %>%
select(Branch, Total, Date) %>%
ggplot(aes(Total))+
geom_density(colour = "steelblue")+
facet_wrap(~Branch)+
labs(title = "Customer Spending Distribution by Branch",
y = "")
The density peaks at a transaction total of just over 100 in all three branches, so this is the most common amount spent per purchase. It could serve as a baseline sales target per customer visit.
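The location of the peak can be estimated rather than read off the plot. This is a rough sketch under my own assumptions: it uses base R's `density()` with its default Gaussian kernel and bandwidth, which should be close to what `geom_density()` draws but may not match it exactly, and both function names are hypothetical.

```r
library(dplyr)

# Hypothetical helper: x-position of the highest point of a kernel density estimate.
density_mode <- function(x) {
  d <- density(x)        # default Gaussian kernel and bandwidth
  d$x[which.max(d$y)]
}

# Estimated peak and median transaction total per branch.
spending_mode_by_branch <- function(sales) {
  sales %>%
    group_by(Branch) %>%
    summarise(Mode = density_mode(Total),
              Median = median(Total),
              .groups = "drop")
}

# Intended usage with the data loaded earlier:
# spending_mode_by_branch(Supermarket_Sales)
```

The median is included because the distribution is right-skewed, so the typical transaction is likely to be higher than the peak suggests.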
Supermarket_Sales %>%
select(Branch, Gender, Rating) %>%
ggplot(aes(Branch, Rating, fill = Branch))+
geom_boxplot()+
scale_fill_brewer(palette = "Set2")+
labs(title = "Customer Reviews by Branch")
Branches A and C seem to be receiving more favourable reviews than Branch B.
Supermarket_Sales %>%
select(Branch, Gender, Rating) %>%
ggplot(aes(Branch, Rating, fill = Gender))+
geom_boxplot()+
facet_wrap(~Gender)+
scale_fill_brewer(palette = "Set2")+
labs(title = "Customer Reviews by Gender")
A look at reviews by gender shows that female shoppers in Branch B gave a wider range of ratings, suggesting they had polarised experiences. This should be looked into to improve the overall customer experience.
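The spread of each box can be quantified so the comparison does not rest on the eye alone. A minimal sketch, assuming the `Branch`, `Gender` and `Rating` columns; `rating_spread()` is a name I introduced for illustration.

```r
library(dplyr)

# Hypothetical helper: number of reviews, median, interquartile range and
# full range of ratings for each branch and gender combination.
rating_spread <- function(sales) {
  sales %>%
    group_by(Branch, Gender) %>%
    summarise(Reviews = n(),
              Median = median(Rating),
              Spread_IQR = IQR(Rating),
              Min = min(Rating),
              Max = max(Rating),
              .groups = "drop")
}

# Intended usage with the data loaded earlier:
# rating_spread(Supermarket_Sales)
```

A noticeably larger `Spread_IQR` for one group would support the reading of the boxplot, while similar values would suggest the difference is mostly visual.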
Supermarket_Sales %>%
select(Branch, Customer.type, Rating) %>%
ggplot(aes(Customer.type, Rating, fill = Customer.type))+
geom_boxplot()+
facet_wrap(~Branch)+
scale_fill_brewer(palette = "Set2")+
labs(title = "Customer Reviews by Customer type",
x = "Customer Type")
There doesn’t seem to be a notable difference in customer satisfaction between members and non-members.
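The boxplots alone cannot say whether a difference is statistically significant, so a simple test can back up the visual impression. The sketch below uses a Wilcoxon rank-sum test from base R; this is my choice of test, not something from the original analysis, and `compare_ratings()` is a hypothetical helper. It assumes `Customer.type` has exactly two levels, as the `glimpse()` output suggests.

```r
# Hypothetical helper: Wilcoxon rank-sum test of Rating between the two
# customer types. exact = FALSE uses the normal approximation, which avoids
# warnings when ratings contain ties.
compare_ratings <- function(sales) {
  wilcox.test(Rating ~ Customer.type, data = sales, exact = FALSE)
}

# Intended usage with the data loaded earlier:
# compare_ratings(Supermarket_Sales)$p.value
```

A large p-value would be consistent with members and non-members rating their experience similarly. The same call could be run on each branch separately, for example on `subset(Supermarket_Sales, Branch == "B")`.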