Formulating a precise business question is only the first step; the harder part is turning it into a workable analytic approach. The approach should lay out, step by step, how the data will be used to answer the question. Take the question: “Do customers who buy frozen pizza also tend to buy beer in the same trip?” One approach is to join the transaction records with the product data, use regular expressions to identify the relevant items, and create variables that flag baskets containing both. Throughout, watch for subgroup differences, data quality problems, time trends, and potential confounding variables. The result should be a clear storyline that explains the business problem and favors simple, high-impact findings over complexity. This analysis uses the Complete Journey data, which requires joining several datasets before any insight can be extracted. Often the most powerful analyses rest on the simplest statistics.
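The pizza-and-beer approach described above can be sketched in a few lines. This is only an illustration under stated assumptions: the two small data frames are made-up stand-ins for the transactions and products tables, the column names `basket_id`, `product_id`, and `product_category` are assumed, and base R `merge()` is used to keep the example dependency-free.

```r
# Toy stand-ins for the transactions and products tables (made-up values).
transactions <- data.frame(
  basket_id  = c(1, 1, 2, 3, 3, 4),
  product_id = c(10, 20, 10, 20, 30, 30)
)
products <- data.frame(
  product_id       = c(10, 20, 30),
  product_category = c("FROZEN PIZZA", "BEERS/ALES", "FLUID MILK PRODUCTS"),
  stringsAsFactors = FALSE
)

# Step 1: attach product descriptions to each transaction line.
trans_prod <- merge(transactions, products, by = "product_id")

# Step 2: flag relevant lines with case-insensitive regular expressions.
trans_prod$is_pizza <- grepl("pizza", trans_prod$product_category, ignore.case = TRUE)
trans_prod$is_beer  <- grepl("beer", trans_prod$product_category, ignore.case = TRUE)

# Step 3: collapse to one value per basket: did the basket contain the item?
has_pizza <- tapply(trans_prod$is_pizza, trans_prod$basket_id, any)
has_beer  <- tapply(trans_prod$is_beer, trans_prod$basket_id, any)
baskets <- data.frame(
  basket_id = names(has_pizza),
  has_pizza = as.vector(has_pizza),
  has_beer  = as.vector(has_beer)
)

# Step 4: share of pizza baskets that also contain beer.
share_with_beer <- mean(baskets$has_beer[baskets$has_pizza])
print(share_with_beer)
```

Working at the basket level rather than the row level is the key design choice here: a concurrent purchase is a property of the trip, not of an individual transaction line.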
library(tidyverse)
library(completejourney)  # data documentation: http://bit.ly/completejourney
## Joining with `by = join_by(household_id)`
## Joining with `by = join_by(product_id)`
## Total quantity of soft drink: 47769
## Total quantity of coffee: 5162
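The code that produced the two totals above is not shown. One way such totals could be computed is sketched below on made-up data. The helper `total_quantity()` is hypothetical, and it is an assumption that the match was made on `product_category` and summed over `quantity`.

```r
# Made-up stand-in for the joined table (the real one is demo_trans_prod).
toy_joined <- data.frame(
  product_category = c("SOFT DRINKS", "SOFT DRINKS", "COFFEE", "FLUID MILK PRODUCTS"),
  quantity         = c(2, 1, 1, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sum quantity over rows whose category matches a pattern.
total_quantity <- function(data, pattern) {
  hits <- grepl(pattern, data$product_category, ignore.case = TRUE)
  sum(data$quantity[hits])
}

cat("Total quantity of soft drink:", total_quantity(toy_joined, "soft drink"), "\n")
cat("Total quantity of coffee:", total_quantity(toy_joined, "coffee"), "\n")
```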
# Count transaction rows (not distinct households) by household size.
# Note: any size other than 2, including size 1 if present, lands in "More than 2".
household_counts <- table(ifelse(demo_trans_prod$household_size == 2, "Size 2", "More than 2"))
# Display the counts
print(household_counts)
##
## More than 2      Size 2
##      512023      316827
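The counts above are transaction rows, so a household with many purchases is counted many times. If the number of distinct households per group is wanted instead, one possible approach is sketched below on made-up data. The column names follow the joined table, and the explicit "Size 1" group is an addition for illustration.

```r
# Made-up stand-in for the joined table: one row per transaction line.
toy_rows <- data.frame(
  household_id   = c("h1", "h1", "h1", "h2", "h3", "h3"),
  household_size = c("2", "2", "2", "1", "5+", "5+"),
  stringsAsFactors = FALSE
)

# Three explicit groups, so size-1 households are not folded into "More than 2".
toy_rows$size_group <- ifelse(
  toy_rows$household_size == "1", "Size 1",
  ifelse(toy_rows$household_size == "2", "Size 2", "More than 2")
)

# Row counts: what table() on the joined data measures.
row_counts <- table(toy_rows$size_group)

# Household counts: keep one row per household before counting.
households <- unique(toy_rows[, c("household_id", "size_group")])
distinct_household_counts <- table(households$size_group)

print(row_counts)
print(distinct_household_counts)
```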
# Most frequently purchased product types for households larger than 2
households_greater_than_2 <- subset(demo_trans_prod, household_size > 2)
product_counts_greater_than_2 <- table(households_greater_than_2$product_type)
sorted_product_counts_greater_than_2 <- sort(product_counts_greater_than_2, decreasing = TRUE)
# Display the five most frequent product types
head(sorted_product_counts_greater_than_2, n = 5)
##
##          FLUID MILK WHITE ONLY         YOGURT NOT MULTI-PACKS
##                           6788                           4288
## SOFT DRINKS 12/18&15PK CAN CAR SFT DRNK 2 LITER BTL CARB INCL
##                           3890                           3611
##                        BANANAS
##                           3110
# Convert the sorted counts to a data frame; as.data.frame() on a one-dimensional
# table already returns two columns (category and frequency)
product_counts_df <- as.data.frame(sorted_product_counts_greater_than_2)
# Rename the columns
colnames(product_counts_df) <- c("Product_Type", "Count")
# Create the bar chart
bar_chart <- ggplot(product_counts_df[1:5, ], aes(x = reorder(Product_Type, Count), y = Count)) +
  geom_col(fill = "skyblue", color = "black") +
  labs(title = "Top 5 Selling Products for Households > 2",
       x = "Product Type",
       y = "Count") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
print(bar_chart)
# Most frequently purchased product types for households of size 2
households_equal_to_2 <- subset(demo_trans_prod, household_size == 2)
product_counts_equal_to_2 <- table(households_equal_to_2$product_type)
sorted_product_counts_equal_to_2 <- sort(product_counts_equal_to_2, decreasing = TRUE)
# Display the five most frequent product types
print(head(sorted_product_counts_equal_to_2, n = 5))
##
##          FLUID MILK WHITE ONLY         YOGURT NOT MULTI-PACKS
##                           7909                           5289
## SOFT DRINKS 12/18&15PK CAN CAR                        BANANAS
##                           4743                           4129
## SFT DRNK 2 LITER BTL CARB INCL
##                           3553
# Convert the sorted product counts to a data frame; as.data.frame() on a
# one-dimensional table already returns two columns (category and frequency)
product_counts_df_equal_to_2 <- as.data.frame(sorted_product_counts_equal_to_2)
# Rename the columns
colnames(product_counts_df_equal_to_2) <- c("Product_Type", "Count")
# Create the bar chart
bar_chart_equal_to_2 <- ggplot(product_counts_df_equal_to_2[1:5, ], aes(x = reorder(Product_Type, Count), y = Count)) +
  geom_col(fill = "purple", color = "black") +
  labs(title = "Top 5 Selling Products for Households = 2",
       x = "Product Type",
       y = "Count") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
print(bar_chart_equal_to_2)
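The two blocks above repeat the same subset, count, sort, and convert steps for each household group. As a sketch only, those steps could be wrapped in a small helper. The function name `top_products()` and the toy data are hypothetical, and the result is built directly from the table's names and values rather than through `as.data.frame()`.

```r
# Hypothetical helper: the n most frequent product types in a data frame.
top_products <- function(data, n = 5) {
  counts <- table(data$product_type)
  out <- data.frame(
    Product_Type = names(counts),
    Count        = as.vector(counts),
    stringsAsFactors = FALSE
  )
  # Sort by descending count and keep the first n rows.
  out <- out[order(out$Count, decreasing = TRUE), ]
  rownames(out) <- NULL
  head(out, n)
}

# Made-up rows standing in for the joined table.
toy_purchases <- data.frame(
  household_size = c("2", "2", "2", "3", "3", "4", "3"),
  product_type   = c("BANANAS", "BANANAS", "YOGURT", "MILK", "MILK", "MILK", "BANANAS"),
  stringsAsFactors = FALSE
)

top_size_2 <- top_products(toy_purchases[toy_purchases$household_size == "2", ], n = 1)
top_larger <- top_products(toy_purchases[toy_purchases$household_size %in% c("3", "4", "5+"), ], n = 1)
print(top_size_2)
print(top_larger)
```

The same data frame could then be passed to a single plotting function, so both charts share one definition and differ only in title and fill color.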
The same five product types lead in both groups: fluid white milk, yogurt (not multi-packs), soft drinks in 12/18/15-pack cans, 2-liter carbonated soft drinks, and bananas. Only the order changes: bananas rank fourth for households of size 2 (4,129 rows) and fifth for households larger than 2 (3,110 rows). These figures count transaction rows per product type rather than units sold.
Soft drinks stand out as a candidate for investment: they hold two of the top five positions in each group, and the total soft drink quantity (47,769) is roughly nine times that of coffee (5,162). Before committing, it is worth examining soft drink sales over time, checking for seasonal fluctuations, and understanding consumer preferences.
A market analysis that covers competitor strategies and possible marketing initiatives would help gauge how profitable an investment in these top-selling products could be, so that the decision rests on both the data and market intelligence.
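As a starting point for the over-time look at soft drinks, a monthly aggregation could be sketched as follows. The rows are made up, and the column names `transaction_timestamp`, `product_category`, and `quantity` are assumed to match the joined table.

```r
# Made-up transaction lines with timestamps.
toy_sales <- data.frame(
  transaction_timestamp = as.POSIXct(
    c("2017-01-05 10:00:00", "2017-01-20 18:30:00",
      "2017-02-03 09:15:00", "2017-02-03 12:00:00"),
    tz = "UTC"
  ),
  product_category = c("SOFT DRINKS", "COFFEE", "SOFT DRINKS", "SOFT DRINKS"),
  quantity         = c(2, 1, 1, 4),
  stringsAsFactors = FALSE
)

# Keep soft drink rows only (case-insensitive pattern match).
soft_drinks <- toy_sales[grepl("soft drink", toy_sales$product_category, ignore.case = TRUE), ]

# Derive a year-month label and total the quantity per month.
soft_drinks$month <- format(soft_drinks$transaction_timestamp, "%Y-%m")
monthly_quantity <- aggregate(quantity ~ month, data = soft_drinks, FUN = sum)
print(monthly_quantity)
```

Plotting `monthly_quantity` as a line chart would make seasonal peaks visible, and aggregating by week instead of month is a one-line change to the `format()` pattern.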