Business Task

The purpose is to provide a clear and interactive visualization of sales data for decision-making.

About the data

The data was provided by SP Toys company. The data was stored in csv file. The data for the year 2021 is incomplete yet, it stopped at the month of September. Hence there will be room for this analysis to be updated later on.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(ggplot2)
library(geosphere)
## The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
## which was just loaded, were retired in October 2023.
## Please refer to R-spatial evolution reports for details, especially
## https://r-spatial.org/r/2023/05/15/evolution4.html.
## It may be desirable to make the sf package available;
## package maintainers should consider adding sf to Suggests:.
library(janitor)
## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(dplyr)
library(ggmap)
## ℹ Google's Terms of Service: <https://mapsplatform.google.com>
## ℹ Please cite ggmap if you use it! Use `citation("ggmap")` for details.
library(readxl)

Prepare

Now, let’s prepare data for exploration

The data was copied and pasted from the different sheets in the excel file New Data (Aug 2021) and New Data (Sep 2021) into the sheet named Data.

Key tasks

  • Check the data for errors.

file path

excel_file <- "C:\\Users\\ARTHUR\\Downloads\\SPToys_Monthly_Saless.xlsx"

Importing Dataset

SPToys <- read_xlsx(excel_file, sheet = 'Data', range = "A1:K4795")

To verify the dataset was imported correctly, i used the head() function

str(SPToys)
## tibble [4,794 × 11] (S3: tbl_df/tbl/data.frame)
##  $ Year            : num [1:4794] 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
##  $ Month           : num [1:4794] 1 1 1 1 1 1 1 1 1 1 ...
##  $ Month_text      : chr [1:4794] "Jan" "Jan" "Jan" "Jan" ...
##  $ Store Name      : chr [1:4794] "Beverly Hills" "Beverly Hills" "Beverly Hills" "Beverly Hills" ...
##  $ Region          : chr [1:4794] "Los Angeles" "Los Angeles" "Los Angeles" "Los Angeles" ...
##  $ Store Type      : chr [1:4794] "Residential" "Residential" "Residential" "Residential" ...
##  $ Product Name    : chr [1:4794] "Action Figure" "Animal Figures" "Chutes & Ladders" "Classic Dominoes" ...
##  $ Product Category: chr [1:4794] "Toys" "Toys" "Games" "Games" ...
##  $ Units Sold      : num [1:4794] 63 7 5 11 5 24 81 18 52 10 ...
##  $ Revenue         : num [1:4794] 1007.4 90.9 65 109.9 75 ...
##  $ Profit          : num [1:4794] 378 21 15 22 40 96 243 18 208 50 ...

Process

Cleaning data for analysis or manipulation of data

The data was checked for inconsistency in formats, missing data, whitespaces, duplicates but none was seen.

Analyze

Since the data has been prepared and proven clean, it’s time for analysis

Exploring and summarising data

n_distinct(SPToys)
## [1] 4794
Insights:

SP Toys had a total of 4,794 sales transactions between January 2020 and September 2021. The company operates in three regions, has four types of stores, and offers 41 different products across five categories.

Recommendations:

Further analyze sales performance by region, store type, and product category to identify specific areas for improvement or expansion.

n_distinct(SPToys$Region)
## [1] 3
n_distinct(SPToys$`Store Type`)
## [1] 4
n_distinct(SPToys$`Store Name`)
## [1] 10
n_distinct(SPToys$`Product Name`)
## [1] 41
n_distinct(SPToys$`Product Category`)
## [1] 5

Here we learnt that SP Toy has branches in 3 regions and 4 type of stores. There are 10 different store names in the company, the company also deals on 41 products which are categorized into 5 product categories.

Let’s have an understanding of the data through summary.

SPToys %>%
  select(`Units Sold`,
         Revenue,
         Profit) %>%
  summary
##    Units Sold        Revenue            Profit      
##  Min.   :  1.00   Min.   :   4.99   Min.   :   1.0  
##  1st Qu.: 16.00   1st Qu.: 199.95   1st Qu.:  52.0  
##  Median : 35.00   Median : 431.69   Median : 116.0  
##  Mean   : 56.48   Mean   : 752.40   Mean   : 209.6  
##  3rd Qu.: 71.00   3rd Qu.: 911.43   3rd Qu.: 248.0  
##  Max.   :927.00   Max.   :9594.27   Max.   :4600.0
Insights:
  • The average number of units sold per store is 56.48 per day.
  • The average daily revenue for a store is $752.40, with an average daily profit of $209.60.
Recommendations:

Investigate factors influencing the variation in units sold, revenue, and profit across different stores and regions. Identify strategies to increase units sold and optimize revenue and profit margins.

Visualization

ggplot(data = SPToys, aes(x = `Units Sold`, y= Revenue)) + geom_point() +
  geom_smooth() + labs(title = "Units sold vs Revenue")
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

Insights:

Positive relationships exist between units sold and both revenue and profit, indicating that higher sales contribute to increased revenue and profit.

Recommendations:

Implement strategies to boost sales, such as targeted marketing campaigns or promotions.

ggplot(data = SPToys, aes(x = `Units Sold`, y= Profit)) + geom_point() +
  geom_smooth() + labs(title = "Units sold vs Profit")
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

* There is equally a positive relationship between profit and Units sold as we can see in the visualization.

Let’s see the best month with the highest revenue in the year 2020

Filter data for years 2020 and 2021
data_2020_2021 <- subset(SPToys, Year %in% c(2020, 2021))
Define the order of months
ordered_months <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
Aggregate data by store and sum the revenue for each month and year
monthly_summary_2020_2021 <- data_2020_2021 %>%
  group_by(Month_text, Year) %>%
  summarise(Revenue = sum(Revenue)) %>%
  mutate(Month_text = factor(Month_text, levels = ordered_months, ordered = TRUE))
## `summarise()` has grouped output by 'Month_text'. You can override using the
## `.groups` argument.
Convert to data frame
monthly_summary_2020_2021 <- as.data.frame(monthly_summary_2020_2021)

print(monthly_summary_2020_2021)
##    Month_text Year  Revenue
## 1         Apr 2020 175018.5
## 2         Apr 2021 216565.0
## 3         Aug 2020 121524.3
## 4         Aug 2021 160907.7
## 5         Dec 2020 203188.1
## 6         Feb 2020 127624.7
## 7         Feb 2021 184085.2
## 8         Jan 2020 131732.0
## 9         Jan 2021 178712.5
## 10        Jul 2020 157047.9
## 11        Jul 2021 223774.2
## 12        Jun 2020 175832.9
## 13        Jun 2021 199483.0
## 14        Mar 2020 143414.2
## 15        Mar 2021 217935.4
## 16        May 2020 167215.6
## 17        May 2021 224318.7
## 18        Nov 2020 159775.6
## 19        Oct 2020 145197.4
## 20        Sep 2020 137295.2
## 21        Sep 2021 156363.1
Create a bar graph for the year 2020 and 2021
ggplot(data = monthly_summary_2020_2021, aes(x = Month_text, y = Revenue, fill = as.factor(Year), group = 1)) +
  geom_line(color = "blue") +
  labs(title = "Revenue per month (2020 vs. 2021)", x = "Month", y = "Total Revenue") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  facet_wrap(~Year, scales = "free_x", ncol = 2)

Insights:

December consistently stands out as the month with the highest revenue and profit. February appears to be the month with the lowest profit in 2020, while May and September show the highest and lowest profits in 2021, respectively.

Recommendations:

Investigate the factors contributing to the seasonality of sales. Plan marketing and sales strategies around peak months to maximize revenue.

Let’s check the profit, so that we can know if higher revenue really means higher profit

Aggregate data by month and sum the profit for each month, excluding NA values

monthly_summary_2020_2021 <- data_2020_2021 %>%
  group_by(Month_text, Year) %>%
  summarise(Profit = sum(Profit)) %>%
  mutate(Month_text = factor(Month_text, levels = ordered_months, ordered = TRUE))
## `summarise()` has grouped output by 'Month_text'. You can override using the
## `.groups` argument.

Create a line chart for profit across the year 2020 and 2021

ggplot(data = monthly_summary_2020_2021, aes(x = Month_text, y = Profit, fill = as.factor(Year), group = 1)) +
  geom_line(color = "blue") +
  labs(title = "Profit per month (2020 vs. 2021)", x = "Month", y = "Total Profit") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  facet_wrap(~Year, scales = "free_x", ncol = 2)

Insights:

The analysis indicates that higher revenue does not necessarily mean higher profit.

Recommendations:

Conduct a detailed analysis of expenses, cost structures, and profit margins to understand the factors affecting profit. Implement cost-cutting measures or pricing strategies to improve profit margins.

It’s time to have a look at our stores to see how they have been performing over the years.

Aggregate data by store and sum the revenue for each store and year

store_summary_2020_2021 <- data_2020_2021 %>%
  group_by(`Store Name`, Year) %>%
  summarise(Total_Revenue = sum(Revenue))
## `summarise()` has grouped output by 'Store Name'. You can override using the
## `.groups` argument.

Create a bar chart for store performance in 2020 and 2021

ggplot(data = store_summary_2020_2021, aes(x = `Store Name`, y = Total_Revenue, fill = as.factor(Year))) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Store Performance Comparison (2020 vs. 2021)", x = "Store Name", y = "Total Revenue") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +  # Rotate x-axis labels for better readability
  facet_wrap(~Year, scales = "free_x", ncol = 2)

###### Insights: O’Hare consistently outperforms other stores in terms of revenue in both 2020 and 2021. JFX and Hollywood show lower performance among stores.

Recommendations:

Analyze the operational and marketing strategies of high-performing stores for potential implementation in others. Identify and address challenges faced by lower-performing stores.

Data Quality and Future Analysis:

Recommendations:
  • Continuously monitor and ensure data quality for accurate and reliable analysis.
  • Regularly update the analysis as more data becomes available for the year 2021.
  • By leveraging these insights and recommendations, SP Toys can make informed decisions to enhance overall sales performance, optimize store operations, and strategically position its products in the market.