library(completejourney) # Data that represents grocery store transactions from 2,469 households.
library(DT) # Creates understandable data tables once a user Knits into HTML.
library(lubridate) # Provides tools that make it easier to manipulate dates.
library(ggplot2) # A system for creating visually pleasing graphics based off of the data.
library(tidyverse) # System of packages for data manipulation, exploration and visualization.
library(dplyr) # Helps solve the most common data manipulation challenges with verbs.
library(scales) # Scales that map data to various aesthetics and helps with labels, axes and legends.
library(ggrepel) # Move text labels while plotting for aesthetic puposes.
transactions <- get_transactions(verbose = FALSE)
promotions <- get_promotions(verbose = FALSE)
products <- products
demographics <- demographics
plot_one <- products %>%
filter(str_detect(product_category, regex("STATIONERY & SCHOOL SUPPLIES", ignore_case = TRUE))) %>%
inner_join(transactions, by = "product_id") %>%
group_by(Month = month(transaction_timestamp, label = TRUE, abbr = TRUE)) %>%
mutate(month = month(transaction_timestamp)) %>%
summarize(total_sales = sum(sales_value, na.rm = TRUE))
plot_one %>%
ggplot(aes(x = Month, y = total_sales)) +
geom_col(fill = "lightblue", color = "darkgreen") +
scale_y_continuous(labels = scales::dollar_format()) +
labs(title = "Monthly School Supplies Sales",
subtitle = "August shows the most total sales across all other months",
x = "Month",
y = "Total Sales"
)
From this analysis we can tell that August has the highest total sales at just over $1,500 dollars. Following, we can see that no other months truly come close, with the next closest being in December at just over $1,000.
An easier look into the monthly sales of school supplies.
Showing the total sales in dollar amount by each month to get exact sales values.
datatable(head(plot_one, 12), options = list(pageLength = 12))
plot_two <- products %>%
filter(str_detect(product_category, regex("STATIONERY & SCHOOL SUPPLIES", ignore_case = TRUE))) %>%
inner_join(transactions, by = "product_id") %>%
group_by(day = day(transaction_timestamp)) %>%
mutate(day = day(transaction_timestamp)) %>%
summarize(total_sales = sum(sales_value, na.rm = TRUE))
plot_two %>%
ggplot(aes(x = day, y = total_sales, c("Aug"))) +
geom_line(color = "darkgreen") +
scale_y_continuous(labels = scales::dollar_format()) +
scale_x_continuous(breaks = seq(1, 31, by = 1)) +
labs(title = "Daily School Supply Sales in August",
x = "Day",
y = "Total Sales")
This analysis proves to be beneficial for Regork as you can tell from the chart that school supply sales are pretty steady from the first of the month until around the 19th of August. With Sales taking a huge leap from the 20th-24th. From this spike Regork must be aware of having enough product on hand to support the august spike, so customers are leaving happy and Regork is operating efficiently.
An easier look into the daily sales throughout August to determine exact sales.
Showing the total sales in dollar amount by each day in August.
datatable(plot_two)
plot_three <- products %>%
filter(str_detect(product_category, regex("STATIONERY & SCHOOL SUPPLIES", ignore_case = TRUE))) %>%
inner_join(transactions, by = "product_id") %>%
inner_join(demographics, by = "household_id") %>%
group_by(age) %>%
summarize(total_sales = sum(sales_value, na.rm = TRUE))
plot_three %>%
ggplot(aes(x = age, y = total_sales)) +
geom_col(fill = "lightblue", color = "darkgreen") +
scale_y_continuous(labels = scales::dollar_format()) +
geom_label_repel(aes(label = total_sales), size = 3, nudge_x = 0.1, nudge_y = 0.5) +
labs(title = "School Supplies Sales by Age Ranges",
x = "Age Ranges",
y = "Total Sales"
)
From this analysis you can tell that Regork should market the school supplies around the age ranges of 35-44 and 45-54 as they spend the most among all age ranges for school supplies. In doing so they could be as efficient as possible in terms of sales.
plot_three <- products %>%
filter(str_detect(product_category, regex("STATIONERY & SCHOOL SUPPLIES", ignore_case = TRUE))) %>%
inner_join(transactions, by = "product_id") %>%
inner_join(demographics, by = "household_id") %>%
group_by(income) %>%
summarize(total_sales = sum(sales_value, na.rm = TRUE))
plot_three %>%
ggplot(aes(x = income, y = total_sales)) +
geom_col(fill = "lightblue", color = "darkgreen") +
scale_y_continuous(labels = scales::dollar_format()) +
geom_label_repel(aes(label = total_sales), size = 2.5, nudge_x = .2, nudge_y = 1) +
theme(axis.text.x = element_text(angle = 50, size = 5, vjust = 0.5)) +
theme(axis.text.x = element_text(size = 7)) +
labs(title = "School Supplies Sales by Income Ranges",
x = "Income Ranges",
y = "Total Sales"
)
From this analysis you can see the total sales of school supplies by income ranges. Furthermore, we can see that the top two income ranges in terms of sales are from 35-49K and 50-74K. Meaning Regork can determine at what prices they should sell school supplies to appeal to these income ranges.
From this analysis Regork can determine the optimal solution to generate the most school supplies sales while operating efficiently. We found that sales in August were the greatest, diving deeper we found that sales were the highest from the 21^st to the 24^th of August. Following that, Regork should appeal the school supplies to ages 35-54 as they generate the highest sales. Finally, Regork should price their school supplies to income ranges between 35-74K as they generate the highest sales.