Regork National Grocery Chain

Introduction

Formulating a precise business question is only the first step; the harder part is translating it into a coherent analytic approach that uses data to answer it. For instance, consider the question: “Do customers who purchase frozen pizzas also tend to buy beer in the same trip?” One approach is to merge transaction records with product data, use regular expressions to identify the relevant items, and create variables that flag concurrent purchases. Along the way, it is important to watch for subgroup differences, data quality issues, temporal trends, and potential confounding variables. The narrative should end in a clear storyline that explains the business problem and favors simplicity over complexity.

This report uses the Complete Journey data, which requires joining several datasets (transactions, products, and household demographics) to extract meaningful insights. Sometimes the most powerful analyses come from the simplest statistics.
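The frozen pizza and beer question can be sketched in a few lines of base R. This is only an illustration on toy data: the table and column names (`basket_id`, `product_id`, `product_category`) follow the completejourney layout as an assumption, and the values and the search patterns are hypothetical.

```r
# Toy stand-ins (hypothetical values) for the transactions and products tables.
transactions <- data.frame(
  basket_id  = c(1, 1, 2, 3, 3, 4),
  product_id = c(10, 20, 10, 20, 30, 30)
)
products <- data.frame(
  product_id       = c(10, 20, 30),
  product_category = c("FROZEN PIZZA", "BEERS/ALES", "FLUID MILK PRODUCTS")
)

# Merge transaction rows with product descriptions.
trans_prod <- merge(transactions, products, by = "product_id")

# Use regular expressions to flag the items of interest on each row.
trans_prod$is_pizza <- grepl("pizza", trans_prod$product_category, ignore.case = TRUE)
trans_prod$is_beer  <- grepl("beer",  trans_prod$product_category, ignore.case = TRUE)

# Collapse to one flag per basket: does the basket contain the item at all?
has_pizza <- tapply(trans_prod$is_pizza, trans_prod$basket_id, any)
has_beer  <- tapply(trans_prod$is_beer,  trans_prod$basket_id, any)

# Share of pizza baskets that also contain beer.
share_with_beer <- mean(has_beer[has_pizza])
print(share_with_beer)
```

Flagging at the basket level rather than the row level is what makes the purchases "concurrent": a household that buys pizza on one trip and beer on another is not counted.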

1. Soft drinks or Coffee?

## Joining with `by = join_by(household_id)`
## Joining with `by = join_by(product_id)`
## Total quantity of soft drink:  47769 
## Total quantity of coffee:  5162
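The code that produced these totals is not shown in the report. A minimal sketch of one way to compute them follows, using toy data in place of the real tables. The column names (`product_category`, `quantity`) and the search patterns are assumptions, not the author's actual code.

```r
# Toy stand-ins (hypothetical values) for the transactions and products tables.
transactions <- data.frame(
  product_id = c(1, 2, 3, 1, 4),
  quantity   = c(2, 1, 3, 4, 1)
)
products <- data.frame(
  product_id       = c(1, 2, 3, 4),
  product_category = c("SOFT DRINKS", "COFFEE", "FLUID MILK PRODUCTS", "COFFEE")
)

# Attach product descriptions to each transaction row.
trans_prod <- merge(transactions, products, by = "product_id")

# Identify the two categories with case-insensitive pattern matches (assumed patterns).
is_soft_drink <- grepl("soft drink", trans_prod$product_category, ignore.case = TRUE)
is_coffee     <- grepl("coffee",     trans_prod$product_category, ignore.case = TRUE)

# Sum the purchased quantity within each category.
total_soft_drink <- sum(trans_prod$quantity[is_soft_drink])
total_coffee     <- sum(trans_prod$quantity[is_coffee])

cat("Total quantity of soft drink: ", total_soft_drink, "\n")
cat("Total quantity of coffee: ", total_coffee, "\n")
```

A loose pattern can match categories that were not intended, so it is worth listing the unique matched categories before trusting the totals.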

2. Top 5 products for households of size 2 and larger than 2

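# Count transaction rows for households of size 2 versus all other sizes.
# Note: ifelse() labels every row whose household_size is not 2 as "More than 2",
# so single-person households, if present, are counted under that label.
household_counts <- table(ifelse(demo_trans_prod$household_size == 2, "Size 2", "More than 2"))

# Display the counts
print(household_counts)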
## 
## More than 2      Size 2 
##      512023      316827
# Most frequently purchased product types for households larger than 2
households_greater_than_2 <- subset(demo_trans_prod, household_size > 2)
product_counts_greater_than_2 <- table(households_greater_than_2$product_type)
sorted_product_counts_greater_than_2 <- sort(product_counts_greater_than_2, decreasing = TRUE)

# Display the top 5 product types
head(sorted_product_counts_greater_than_2, n = 5)
## 
##          FLUID MILK WHITE ONLY         YOGURT NOT MULTI-PACKS 
##                           6788                           4288 
## SOFT DRINKS 12/18&15PK CAN CAR SFT DRNK 2 LITER BTL CARB INCL 
##                           3890                           3611 
##                        BANANAS 
##                           3110
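# Convert the sorted counts to a data frame. as.data.frame() on a one-dimensional
# table already returns two columns (the category and its frequency), so no
# extra column is needed.
product_counts_df <- as.data.frame(sorted_product_counts_greater_than_2)

# Rename the columns
colnames(product_counts_df) <- c("Product_Type", "Count")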

# Create the bar chart
bar_chart <- ggplot(product_counts_df[1:5, ], aes(x = reorder(Product_Type, Count), y = Count)) +
  geom_bar(stat = "identity", fill = "skyblue", color = "black") +
  labs(title = "Top 5 Selling Products for Households > 2",
       x = "Product Type",
       y = "Count") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
print(bar_chart)

# Most frequently purchased product types for households of size 2

households_equal_to_2 <- subset(demo_trans_prod, household_size == 2)
product_counts_equal_to_2 <- table(households_equal_to_2$product_type)
sorted_product_counts_equal_to_2 <- sort(product_counts_equal_to_2, decreasing = TRUE)

print(head(sorted_product_counts_equal_to_2, n = 5))
## 
##          FLUID MILK WHITE ONLY         YOGURT NOT MULTI-PACKS 
##                           7909                           5289 
## SOFT DRINKS 12/18&15PK CAN CAR                        BANANAS 
##                           4743                           4129 
## SFT DRNK 2 LITER BTL CARB INCL 
##                           3553
# Convert the sorted counts to a data frame. as.data.frame() on a one-dimensional
# table already returns two columns (the category and its frequency), so no
# extra column is needed.
product_counts_df_equal_to_2 <- as.data.frame(sorted_product_counts_equal_to_2)

# Rename the columns
colnames(product_counts_df_equal_to_2) <- c("Product_Type", "Count")

# Create the bar chart
bar_chart_equal_to_2 <- ggplot(product_counts_df_equal_to_2[1:5, ], aes(x = reorder(Product_Type, Count), y = Count)) +
  geom_bar(stat = "identity", fill = "purple", color = "black") +
  labs(title = "Top 5 Selling Products for Households = 2",
       x = "Product Type",
       y = "Count") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
print(bar_chart_equal_to_2)
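The two groups above are built with separate but nearly identical code, and the earlier count table labels every non-2 household as "More than 2". A possible alternative is sketched below on toy data: it creates an explicit size group first and reuses one helper for both lists. It assumes `household_size` is an ordered factor with levels "1", "2", "3", "4", "5+"; the helper `top_n_by_group()` and the `size_group` column are hypothetical names introduced for illustration.

```r
# Toy stand-in (hypothetical values) for the joined table demo_trans_prod.
size_levels <- c("1", "2", "3", "4", "5+")
demo_trans_prod <- data.frame(
  household_size = factor(c("1", "2", "2", "3", "5+", "4", "2", "3"),
                          levels = size_levels, ordered = TRUE),
  product_type   = c("BANANAS", "BANANAS", "YOGURT", "BANANAS",
                     "YOGURT", "BANANAS", "BANANAS", "MILK")
)

# Position of each size within the ordered levels (1 for "1", ..., 5 for "5+").
size_rank <- as.integer(demo_trans_prod$household_size)

# Three explicit groups, so single-person households are not mixed in.
demo_trans_prod$size_group <- ifelse(size_rank == 2, "Size 2",
                              ifelse(size_rank > 2, "More than 2", "Size 1"))

# Hypothetical helper: top n product types by row count within one size group.
top_n_by_group <- function(df, group, n = 5) {
  counts <- table(df$product_type[df$size_group == group])
  head(sort(counts, decreasing = TRUE), n)
}

top_size_2      <- top_n_by_group(demo_trans_prod, "Size 2")
top_more_than_2 <- top_n_by_group(demo_trans_prod, "More than 2")
print(top_size_2)
print(top_more_than_2)
```

If `household_size` is stored as a plain number instead of a factor, the `as.integer()` step can be dropped and the column compared directly.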

Summary

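The analysis identified the five most frequently purchased product types, counted by transaction rows, for households of size 2 and for households larger than 2. The same five product types appear in both lists: white fluid milk, yogurt, two soft drink types (multi-pack cans and 2-liter bottles), and bananas. Only the order of the last two differs, which suggests these items are popular regardless of household size.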

Soft drinks stand out as a candidate for investment: two of the top five product types in both groups are soft drinks, and the total soft drink quantity (47,769) is roughly nine times that of coffee (5,162). Before committing, it is worth examining soft drink sales over time, checking for seasonal fluctuations, and understanding consumer preferences.
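As a starting point for the time-based follow-up, soft drink quantity could be aggregated by month. The sketch below uses toy data; the column name `transaction_timestamp` follows the completejourney transactions table as an assumption, and the values are hypothetical.

```r
# Toy stand-in (hypothetical values) for soft drink transaction rows.
soft_drinks <- data.frame(
  transaction_timestamp = as.POSIXct(
    c("2017-01-05 10:00:00", "2017-01-20 18:30:00", "2017-02-03 09:15:00",
      "2017-07-04 12:00:00", "2017-07-15 16:45:00"),
    tz = "UTC"
  ),
  quantity = c(2, 1, 3, 6, 4)
)

# Derive a year-month label from each timestamp.
soft_drinks$month <- format(soft_drinks$transaction_timestamp, "%Y-%m")

# Total quantity sold per month.
monthly_quantity <- aggregate(quantity ~ month, data = soft_drinks, FUN = sum)
print(monthly_quantity)
```

Plotting `monthly_quantity` as a line chart would make seasonal peaks, if any, easy to spot.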

In addition, a market analysis that covers competitor strategies and potential marketing initiatives can help maximize the return on these top-selling products.

Overall, combining these data-driven insights with market intelligence can inform strategic decisions and improve the chances that such investments succeed.