Results

Introduction

  • Introduction:
    • We wanted to determine what specific month and days school supplies sales are the greatest and which age ranges as well as income ranges had the highest total sales. By determining the days on which sales spike and which customers purchase the most school supplies, Regork can best utilize the school supplies to appeal to the customers and operate more efficiently.
  • The data used and the analytic methodology employed:
    • To determine how Regork could be most effective in their school supplies sales, we used the comparisons of sales between all of the months and then narrowed it down to school supplies sales each day in the peak month sales (August). Following that we wanted to determine which age ranges purchased the most school supplies based on total sales. Lastly, we wanted to determine which income ranges had the highest total sales. For our analysis we used data from products, demographics, and transactions.

Packages Required

  • The following are required packages needed to run all of the coding throughout.
library(completejourney) # Data that represents grocery store transactions from 2,469 households.
library(DT) # Creates understandable data tables once a user Knits into HTML.
library(lubridate) # Provides tools that make it easier to manipulate dates.
library(ggplot2) # A system for creating visually pleasing graphics based off of the data.
library(tidyverse) # System of packages for data manipulation, exploration and visualization.
library(dplyr) # Helps solve the most common data manipulation challenges with verbs.
library(scales) # Scales that map data to various aesthetics and helps with labels, axes and legends.
library(ggrepel) # Move text labels while plotting for aesthetic puposes. 

Exploratory Data Analysis

  • Data variables required throughout.
transactions <- get_transactions(verbose = FALSE)
promotions <- get_promotions(verbose = FALSE)
products <- products
demographics <- demographics
  • Code required to determine school supplies sales by month.
plot_one <- products %>%
  filter(str_detect(product_category, regex("STATIONERY & SCHOOL SUPPLIES", ignore_case = TRUE))) %>%
  inner_join(transactions, by = "product_id") %>%
  group_by(Month = month(transaction_timestamp, label = TRUE, abbr = TRUE)) %>%
  mutate(month = month(transaction_timestamp)) %>%
  summarize(total_sales = sum(sales_value, na.rm = TRUE))
  • Plot One: Code required to show sales by month column chart.
plot_one %>%
  ggplot(aes(x = Month, y = total_sales)) +
  geom_col(fill = "lightblue", color = "darkgreen") +
  scale_y_continuous(labels = scales::dollar_format()) +
  labs(title = "Monthly School Supplies Sales",
       subtitle = "August shows the most total sales across all other months",
       x = "Month",
       y = "Total Sales"
  )

From this analysis we can tell that August has the highest total sales at just over $1,500 dollars. Following, we can see that no other months truly come close, with the next closest being in December at just over $1,000.

An easier look into the monthly sales of school supplies.

Showing the total sales in dollar amount by each month to get exact sales values.

datatable(head(plot_one, 12), options = list(pageLength = 12))
  • Code required to determine the daily sales of school supplies surrounding the peak month of August.
plot_two <- products %>%
  filter(str_detect(product_category, regex("STATIONERY & SCHOOL SUPPLIES", ignore_case = TRUE))) %>%
  inner_join(transactions, by = "product_id") %>%
  group_by(day = day(transaction_timestamp)) %>%
  mutate(day = day(transaction_timestamp)) %>% 
  summarize(total_sales = sum(sales_value, na.rm = TRUE))
  • Code required to plot daily sales of school supplies in the month of August.
plot_two %>%
  ggplot(aes(x = day, y = total_sales, c("Aug"))) +
  geom_line(color = "darkgreen") +
  scale_y_continuous(labels = scales::dollar_format()) +
  scale_x_continuous(breaks = seq(1, 31, by = 1)) +
  labs(title = "Daily School Supply Sales in August",
       x = "Day",
       y = "Total Sales")

This analysis proves to be beneficial for Regork as you can tell from the chart that school supply sales are pretty steady from the first of the month until around the 19th of August. With Sales taking a huge leap from the 20th-24th. From this spike Regork must be aware of having enough product on hand to support the august spike, so customers are leaving happy and Regork is operating efficiently.

An easier look into the daily sales throughout August to determine exact sales.

Showing the total sales in dollar amount by each day in August.

datatable(plot_two)
  • Code required to find total sales by age ranges
plot_three <- products %>%
  filter(str_detect(product_category, regex("STATIONERY & SCHOOL SUPPLIES", ignore_case = TRUE))) %>%
  inner_join(transactions, by = "product_id") %>%
  inner_join(demographics, by = "household_id") %>%
  group_by(age) %>%
  summarize(total_sales = sum(sales_value, na.rm = TRUE))
  • Code required to plot school supplies sales by the various age ranges.
plot_three %>%
  ggplot(aes(x = age, y = total_sales)) +
  geom_col(fill = "lightblue", color = "darkgreen") +
  scale_y_continuous(labels = scales::dollar_format()) +
  geom_label_repel(aes(label = total_sales), size = 3, nudge_x = 0.1, nudge_y = 0.5) +
  labs(title = "School Supplies Sales by Age Ranges",
       x = "Age Ranges",
       y = "Total Sales"
  )

From this analysis you can tell that Regork should market the school supplies around the age ranges of 35-44 and 45-54 as they spend the most among all age ranges for school supplies. In doing so they could be as efficient as possible in terms of sales.

  • Code required to find total sales of school supplies from income ranges.
plot_three <- products %>%
  filter(str_detect(product_category, regex("STATIONERY & SCHOOL SUPPLIES", ignore_case = TRUE))) %>%
  inner_join(transactions, by = "product_id") %>%
  inner_join(demographics, by = "household_id") %>%
  group_by(income) %>%
  summarize(total_sales = sum(sales_value, na.rm = TRUE))
  • Code required to plot total sales of school supplies by income ranges.
plot_three %>%
  ggplot(aes(x = income, y = total_sales)) +
  geom_col(fill = "lightblue", color = "darkgreen") +
  scale_y_continuous(labels = scales::dollar_format()) +
  geom_label_repel(aes(label = total_sales), size = 2.5, nudge_x = .2, nudge_y = 1) +
  theme(axis.text.x = element_text(angle = 50, size = 5, vjust = 0.5)) +
  theme(axis.text.x = element_text(size = 7)) +
  labs(title = "School Supplies Sales by Income Ranges",
       x = "Income Ranges",
       y = "Total Sales"
  )

From this analysis you can see the total sales of school supplies by income ranges. Furthermore, we can see that the top two income ranges in terms of sales are from 35-49K and 50-74K. Meaning Regork can determine at what prices they should sell school supplies to appeal to these income ranges.

Summary

From this analysis Regork can determine the optimal solution to generate the most school supplies sales while operating efficiently. We found that sales in August were the greatest, diving deeper we found that sales were the highest from the 21^st to the 24^th of August. Following that, Regork should appeal the school supplies to ages 35-54 as they generate the highest sales. Finally, Regork should price their school supplies to income ranges between 35-74K as they generate the highest sales.