Mahika Gunjkar, Fairouz Maayah, Ashleigh Fischer

Introduction

Business Question: Do we notice a trend (positive or negative) where purchasing behavior is changing over time. Maybe sales volume is increasing for a particular product or customer segment, and we could invest in marketing to hopefully compound this trend?

Packages Required:

ggplot2: A visualization package that implements the Grammar of Graphics to create complex and customizable plots.

tidyverse: A collection of R packages designed for data science, emphasizing tidy data principles for streamlined data manipulation and visualization.

dplyr: A data manipulation package that provides functions for transforming and summarizing tabular data with a clear syntax.

completejourney: Contains datasets related to customer purchases, demographics, and promotional details for analyzing consumer behavior.

lubridate: Simplifies the manipulation and formatting of date-time objects for easier time series analysis.

kableExtra: Enhances knitr::kable() with additional formatting options for creating visually appealing tables in HTML and LaTeX.

scales: Provides functions to format axis labels and control scales in ggplot2 plots, improving readability.

plotly: An interactive graphing library that allows the creation of web-based interactive plots, integrating well with ggplot2.

RColorBrewer: Offers color palettes for creating visually appealing plots, especially useful in maps and categorical data visualizations.

viridis: Provides perceptually uniform color palettes that enhance accessibility and clarity in visualizations.

broom: Converts statistical model objects into tidy data frames, summarizing results into a standard format for easier analysis.

transactions <- transactions_sample
products <- products

combined_data <- transactions %>%
  left_join(products, by = "product_id")  

Total sales by department in a pie chart

department_sales <- combined_data %>%
  group_by(department) %>%
  summarise(Total_Sales = sum(quantity), .groups = 'drop') %>%
  arrange(desc(Total_Sales))

department_sales <- department_sales %>%
  mutate(Percentage = Total_Sales / sum(Total_Sales) * 100)

num_departments <- nrow(department_sales)
custom_colors <- viridis::viridis(num_departments)  

ggplot(department_sales, aes(x = "", y = Total_Sales, fill = department)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y") +
  labs(title = "Sales Contribution by Department", 
       fill = "Department") +
  theme_void() +
  scale_fill_manual(values = custom_colors) +  
  
  theme(legend.position = "right")  

We analyzed the sales contribution of different departments and the sales contribution by each department was represented by a pie graph, this was done to understand which department contributes the most sales which will allow us to plan for better resource allocations, promotions and inventory management. We utilized the transaction_sample and products datasets from the complete packages journey, then we merged these datasets with the product_id column. After joining the datasets the sales were aggregated by departments by summing up all the sales made by each department.

We found that the department with the highest sales was fuel, but in the pie graph you can also see the other departments that came second and third. By visualizing the percentage of sales for each department it becomes clear how sales are spread across different categories, this allows us to see if were only dependent on one department for most of the overall sales. When we understand which department contributes the most sales, it helps us tailor our marketing efforts so that the high-performance department which is fuel can benefit from enhanced promotions to capitalize sales but also see targeted offers to boost sales for other departments. I would recommend Regork to capitalize on their fuel sales and analyze all the other departments to understand the low sales and develop strategies to improve sales.

Average price and total sales for top products

top_products <- combined_data %>%
  group_by(department) %>%
  summarise(Total_Sales = sum(quantity), 
            Avg_Price = mean(sales_value, na.rm = TRUE), 
            .groups = 'drop') %>%
  arrange(desc(Total_Sales)) %>%
  head(5)  


ggplot() +
  geom_bar(data = top_products, aes(x = reorder(department, Total_Sales), y = Total_Sales), 
           stat = "identity", fill = "steelblue", position = "dodge") +
  geom_line(data = top_products, aes(x = reorder(department, Total_Sales), 
                                       y = Avg_Price * max(top_products$Total_Sales) / max(top_products$Avg_Price)), 
            color = "orange", size = 1, group = 1) +
  geom_point(data = top_products, aes(x = reorder(department, Total_Sales), 
                                        y = Avg_Price * max(top_products$Total_Sales) / max(top_products$Avg_Price)), 
             color = "orange", size = 2) +
  labs(title = "Top 5 Products: Total Sales and Average Price",
       x = "Product Department", 
       y = "Total Sales") +
  scale_y_continuous(labels = scales::comma,
                     sec.axis = sec_axis(~ . * max(top_products$Avg_Price) / max(top_products$Total_Sales), 
                                         name = "Average Price")) +
  theme_minimal(base_size = 12) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) 

We decided to identify the top 5 departments by total sales and compared their performances based one “total sales” and “average price of products” the analysis helps identify which departments not only sell the most products but also maintain higher average prices. We decided to use combines_data and merge it to transaction_sample and products databases and focus on the sales and pricing at the department level. For each department the total sales and average price are calculates and we chose the top five departments based on total sales, a bar chart is chosen to visualize the top 5 departments while a line graph is used to show the price average for each department. We found that the top five departments were fuel miscellaneous, grocery, drug gm, and produce with produce being the lowest and fuel being the highest. We recommend an adjustment in pricing strategy for departments with high total sales, but low average prices might benefit from price increases if the demand is inelastic while departments that have low sales might benefit from a price reduction. We also suggest targeted promotion towards low sales to increase sales volumes.

Predictive Analysis for Future

model <- lm(Total_Sales ~ month, data = monthly_sales)

future_months <- data.frame(month = seq(max(monthly_sales$month) + months(1), by = "month", length.out = 12))

predictions <- predict(model, newdata = future_months, interval = "confidence")

forecast_data <- bind_rows(
  monthly_sales %>% mutate(Type = "Historical"),
  future_months %>% mutate(Total_Sales = predictions[, "fit"], Type = "Predicted"),
  future_months %>% mutate(Lower_CI = predictions[, "lwr"], Upper_CI = predictions[, "upr"], Type = "Confidence Interval")
)

ggplot(forecast_data, aes(x = month)) +
  geom_line(aes(y = Total_Sales, color = Type), size = 1) +
  geom_point(aes(y = Total_Sales, color = Type), size = 2) +
  geom_ribbon(data = forecast_data %>% filter(Type == "Confidence Interval"), 
              aes(ymin = Lower_CI, ymax = Upper_CI), 
              fill = "lightblue", alpha = 0.5) +
  labs(title = "Monthly Sales Trend and Forecast with Confidence Intervals",
       x = "Month",
       y = "Total Sales") +
  scale_y_continuous(labels = scales::comma) +
  scale_color_manual(values = c("Historical" = "blue", "Predicted" = "orange", "Confidence Interval" = "lightblue")) +
  theme_minimal(base_size = 12) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

This predictive analysis uses historical sales data and a linear regression model to forecast future trends. The model predicts a rise in sales over the next 12 months, signaling positive consumer demand. However, variability in predictions, indicated by confidence intervals, suggests uncertainty in exact sales figures, emphasizing the need for ongoing monitoring.

Key actionable insights include increasing inventory levels to meet rising demand, planning marketing campaigns to capitalize on growth, and optimizing promotions around peak sales periods. If signs of a downturn appear, businesses should investigate factors affecting demand, adjust marketing strategies, and prepare for supply chain and inventory challenges.

Overall, the analysis offers critical insights into future sales trends, helping guide strategic decisions. Continuous comparison of actual sales against predictions will enable businesses to respond flexibly to market changes, ensuring they remain prepared for both growth opportunities and potential risks.

Summary table:

department_sales <- combined_data %>%
  group_by(department) %>%
  summarise(Total_Sales = sum(quantity),
            Avg_Sales_Per_Transaction = mean(quantity, na.rm = TRUE),
            Total_Transactions = n(),
            .groups = 'drop') %>%
  mutate(Percentage_Contribution = Total_Sales / sum(Total_Sales) * 100) %>%
  arrange(desc(Total_Sales))

summary_table <- department_sales %>%
  select(department, Total_Sales, Avg_Sales_Per_Transaction, Total_Transactions, Percentage_Contribution)

kable(summary_table, caption = "Detailed Summary of Sales by Department") %>%
  kable_styling(full_width = FALSE, position = "left")
Detailed Summary of Sales by Department
department Total_Sales Avg_Sales_Per_Transaction Total_Transactions Percentage_Contribution
FUEL 6479260 9613.1454006 674 83.2307603
MISCELLANEOUS 1208045 4718.9257812 256 15.5182079
GROCERY 63730 1.3442028 47411 0.8186577
DRUG GM 10199 1.2841853 7942 0.1310135
PRODUCE 9691 1.2731214 7612 0.1244879
MEAT-PCKGD 4241 1.3195395 3214 0.0544787
MEAT 3358 1.3257008 2533 0.0431359
DELI 2009 1.0847732 1852 0.0258071
PASTRY 1422 1.3339587 1066 0.0182666
NUTRITION 1290 1.3451512 959 0.0165710
SEAFOOD-PCKGD 384 1.2268371 313 0.0049328
SALAD BAR 325 1.1245675 289 0.0041749
COSMETICS 242 1.0297872 235 0.0031087
FLORAL 164 1.2058824 136 0.0021067
SEAFOOD 109 1.1122449 98 0.0014002
SPIRITS 74 1.0882353 68 0.0009506
GARDEN CENTER 43 1.7916667 24 0.0005524
CHEF SHOPPE 29 1.0740741 27 0.0003725
TRAVEL & LEISURE 21 1.0000000 21 0.0002698
COUPON 20 1.3333333 15 0.0002569
RESTAURANT 16 1.0666667 15 0.0002055
GM MERCH EXP 9 1.2857143 7 0.0001156
FROZEN GROCERY 6 1.0000000 6 0.0000771
AUTOMOTIVE 3 1.0000000 3 0.0000385
PHOTO & VIDEO 2 2.0000000 1 0.0000257
CNTRL/STORE SUP 1 1.0000000 1 0.0000128
NA 1 0.0045045 222 0.0000128

Summary:

Our analysis focused on the sales contribution by department at Regork, revealing that fuel was the top-selling category. We recommend that Regork prioritize the growth of high-performing departments while investigating the factors contributing to lower performance in other areas.

How do we get to our solution

We examined monthly sales trends and discovered that sales dramatically increase in September and decline sharply in January. Based on this trend, we suggest that Regork adjust its marketing strategies to capitalize on high-demand periods and effectively address low-demand months, thereby driving sales volumes.

Our Recommendation

In our analysis of top departments, sales, and pricing, we identified that produce had the lowest sales compared to the other departments, while fuel ranked the highest. We recommend that Regork adjust its pricing strategies to take advantage of the inelastic demand for fuel and consider ways to boost sales in lower-performing categories, thereby increasing overall sales volume.