The purpose is to provide a clear and interactive visualization of sales data for decision-making.
The data was provided by SP Toys company. The data was stored in csv file. The data for the year 2021 is incomplete yet, it stopped at the month of September. Hence there will be room for this analysis to be updated later on.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(ggplot2)
library(geosphere)
## The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
## which was just loaded, were retired in October 2023.
## Please refer to R-spatial evolution reports for details, especially
## https://r-spatial.org/r/2023/05/15/evolution4.html.
## It may be desirable to make the sf package available;
## package maintainers should consider adding sf to Suggests:.
library(janitor)
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(dplyr)
library(ggmap)
## ℹ Google's Terms of Service: <https://mapsplatform.google.com>
## ℹ Please cite ggmap if you use it! Use `citation("ggmap")` for details.
library(readxl)
Now, let’s prepare data for exploration
The data was copied and pasted from the different sheets in the excel file New Data (Aug 2021) and New Data (Sep 2021) into the sheet named Data.
excel_file <- "C:\\Users\\ARTHUR\\Downloads\\SPToys_Monthly_Saless.xlsx"
SPToys <- read_xlsx(excel_file, sheet = 'Data', range = "A1:K4795")
To verify the dataset was imported correctly, i used the head() function
str(SPToys)
## tibble [4,794 × 11] (S3: tbl_df/tbl/data.frame)
## $ Year : num [1:4794] 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
## $ Month : num [1:4794] 1 1 1 1 1 1 1 1 1 1 ...
## $ Month_text : chr [1:4794] "Jan" "Jan" "Jan" "Jan" ...
## $ Store Name : chr [1:4794] "Beverly Hills" "Beverly Hills" "Beverly Hills" "Beverly Hills" ...
## $ Region : chr [1:4794] "Los Angeles" "Los Angeles" "Los Angeles" "Los Angeles" ...
## $ Store Type : chr [1:4794] "Residential" "Residential" "Residential" "Residential" ...
## $ Product Name : chr [1:4794] "Action Figure" "Animal Figures" "Chutes & Ladders" "Classic Dominoes" ...
## $ Product Category: chr [1:4794] "Toys" "Toys" "Games" "Games" ...
## $ Units Sold : num [1:4794] 63 7 5 11 5 24 81 18 52 10 ...
## $ Revenue : num [1:4794] 1007.4 90.9 65 109.9 75 ...
## $ Profit : num [1:4794] 378 21 15 22 40 96 243 18 208 50 ...
Cleaning data for analysis or manipulation of data
The data was checked for inconsistency in formats, missing data, whitespaces, duplicates but none was seen.
Since the data has been prepared and proven clean, it’s time for analysis
n_distinct(SPToys)
## [1] 4794
SP Toys had a total of 4,794 sales transactions between January 2020 and September 2021. The company operates in three regions, has four types of stores, and offers 41 different products across five categories.
Further analyze sales performance by region, store type, and product category to identify specific areas for improvement or expansion.
n_distinct(SPToys$Region)
## [1] 3
n_distinct(SPToys$`Store Type`)
## [1] 4
n_distinct(SPToys$`Store Name`)
## [1] 10
n_distinct(SPToys$`Product Name`)
## [1] 41
n_distinct(SPToys$`Product Category`)
## [1] 5
Here we learnt that SP Toy has branches in 3 regions and 4 type of stores. There are 10 different store names in the company, the company also deals on 41 products which are categorized into 5 product categories.
Let’s have an understanding of the data through summary.
SPToys %>%
select(`Units Sold`,
Revenue,
Profit) %>%
summary
## Units Sold Revenue Profit
## Min. : 1.00 Min. : 4.99 Min. : 1.0
## 1st Qu.: 16.00 1st Qu.: 199.95 1st Qu.: 52.0
## Median : 35.00 Median : 431.69 Median : 116.0
## Mean : 56.48 Mean : 752.40 Mean : 209.6
## 3rd Qu.: 71.00 3rd Qu.: 911.43 3rd Qu.: 248.0
## Max. :927.00 Max. :9594.27 Max. :4600.0
Investigate factors influencing the variation in units sold, revenue, and profit across different stores and regions. Identify strategies to increase units sold and optimize revenue and profit margins.
ggplot(data = SPToys, aes(x = `Units Sold`, y= Revenue)) + geom_point() +
geom_smooth() + labs(title = "Units sold vs Revenue")
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
Positive relationships exist between units sold and both revenue and profit, indicating that higher sales contribute to increased revenue and profit.
Implement strategies to boost sales, such as targeted marketing campaigns or promotions.
ggplot(data = SPToys, aes(x = `Units Sold`, y= Profit)) + geom_point() +
geom_smooth() + labs(title = "Units sold vs Profit")
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
* There is equally a positive relationship between profit and Units sold
as we can see in the visualization.
Let’s see the best month with the highest revenue in the year 2020
data_2020_2021 <- subset(SPToys, Year %in% c(2020, 2021))
ordered_months <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
monthly_summary_2020_2021 <- data_2020_2021 %>%
group_by(Month_text, Year) %>%
summarise(Revenue = sum(Revenue)) %>%
mutate(Month_text = factor(Month_text, levels = ordered_months, ordered = TRUE))
## `summarise()` has grouped output by 'Month_text'. You can override using the
## `.groups` argument.
monthly_summary_2020_2021 <- as.data.frame(monthly_summary_2020_2021)
print(monthly_summary_2020_2021)
## Month_text Year Revenue
## 1 Apr 2020 175018.5
## 2 Apr 2021 216565.0
## 3 Aug 2020 121524.3
## 4 Aug 2021 160907.7
## 5 Dec 2020 203188.1
## 6 Feb 2020 127624.7
## 7 Feb 2021 184085.2
## 8 Jan 2020 131732.0
## 9 Jan 2021 178712.5
## 10 Jul 2020 157047.9
## 11 Jul 2021 223774.2
## 12 Jun 2020 175832.9
## 13 Jun 2021 199483.0
## 14 Mar 2020 143414.2
## 15 Mar 2021 217935.4
## 16 May 2020 167215.6
## 17 May 2021 224318.7
## 18 Nov 2020 159775.6
## 19 Oct 2020 145197.4
## 20 Sep 2020 137295.2
## 21 Sep 2021 156363.1
ggplot(data = monthly_summary_2020_2021, aes(x = Month_text, y = Revenue, fill = as.factor(Year), group = 1)) +
geom_line(color = "blue") +
labs(title = "Revenue per month (2020 vs. 2021)", x = "Month", y = "Total Revenue") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
facet_wrap(~Year, scales = "free_x", ncol = 2)
December consistently stands out as the month with the highest revenue and profit. February appears to be the month with the lowest profit in 2020, while May and September show the highest and lowest profits in 2021, respectively.
Investigate the factors contributing to the seasonality of sales. Plan marketing and sales strategies around peak months to maximize revenue.
Let’s check the profit, so that we can know if higher revenue really means higher profit
monthly_summary_2020_2021 <- data_2020_2021 %>%
group_by(Month_text, Year) %>%
summarise(Profit = sum(Profit)) %>%
mutate(Month_text = factor(Month_text, levels = ordered_months, ordered = TRUE))
## `summarise()` has grouped output by 'Month_text'. You can override using the
## `.groups` argument.
ggplot(data = monthly_summary_2020_2021, aes(x = Month_text, y = Profit, fill = as.factor(Year), group = 1)) +
geom_line(color = "blue") +
labs(title = "Profit per month (2020 vs. 2021)", x = "Month", y = "Total Profit") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
facet_wrap(~Year, scales = "free_x", ncol = 2)
The analysis indicates that higher revenue does not necessarily mean higher profit.
Conduct a detailed analysis of expenses, cost structures, and profit margins to understand the factors affecting profit. Implement cost-cutting measures or pricing strategies to improve profit margins.
It’s time to have a look at our stores to see how they have been performing over the years.
store_summary_2020_2021 <- data_2020_2021 %>%
group_by(`Store Name`, Year) %>%
summarise(Total_Revenue = sum(Revenue))
## `summarise()` has grouped output by 'Store Name'. You can override using the
## `.groups` argument.
ggplot(data = store_summary_2020_2021, aes(x = `Store Name`, y = Total_Revenue, fill = as.factor(Year))) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Store Performance Comparison (2020 vs. 2021)", x = "Store Name", y = "Total Revenue") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) + # Rotate x-axis labels for better readability
facet_wrap(~Year, scales = "free_x", ncol = 2)
###### Insights: O’Hare consistently outperforms other stores in terms
of revenue in both 2020 and 2021. JFX and Hollywood show lower
performance among stores.
Analyze the operational and marketing strategies of high-performing stores for potential implementation in others. Identify and address challenges faced by lower-performing stores.