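library(tidyverse)        # attaches ggplot2 and dplyr, among others
library(completejourney)  # provides transactions_sample, products, and demographics

# Work with the sample of transactions bundled with the package
transactions <- transactions_sample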
# Data Processing
# Total sales per product category within each income level,
# then keep the five best-selling categories per income level
top_products <- transactions %>%
  inner_join(products, by = "product_id") %>%
  inner_join(demographics, by = "household_id") %>%
  group_by(income, product_category) %>%
  summarize(total_sales = sum(sales_value, na.rm = TRUE), .groups = "drop") %>%
  group_by(income) %>%
  slice_max(order_by = total_sales, n = 5) %>%
  ungroup()
# Plot
# One bar per category, stacked by income level and ordered by the full bar length
ggplot(top_products,
       aes(x = reorder(product_category, total_sales, FUN = sum),
           y = total_sales, fill = income)) +
  geom_col() +
  coord_flip() +
  labs(title = "Top 5 Product Categories by Income Level",
       x = "Product Category",
       y = "Total Sales ($)",
       fill = "Income Level") +
  theme_minimal()
This bar chart shows the five product categories with the highest total sales within each income level. Because the same category can rank in the top five for several income levels, its bar is stacked from one colored segment per income level. Reading the segments shows which categories are popular across many income brackets and which stand out for only a few, pointing to possible differences in purchasing behavior between groups.
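The top-five selection above relies on slice_max() applied within each income group. A minimal sketch on hypothetical toy data (the toy data frame, its category names, and its values are made up for illustration) shows how it behaves, including one detail worth knowing: ties are kept by default, so a group can return more than n rows.

```r
library(dplyr)

# Hypothetical toy data: total sales for four categories in two income groups
toy <- data.frame(
  income           = rep(c("Under 15K", "250K+"), each = 4),
  product_category = rep(c("Category A", "Category B", "Category C", "Category D"), times = 2),
  total_sales      = c(50, 30, 30, 10,   # Under 15K: B and C are tied
                       20, 90, 60, 40)   # 250K+
)

# Keep the 2 largest rows per income group; ties are kept (with_ties = TRUE is the default)
top2 <- toy %>%
  group_by(income) %>%
  slice_max(order_by = total_sales, n = 2) %>%
  ungroup()

# with_ties = FALSE guarantees exactly n rows per group, breaking ties by row order
top2_strict <- toy %>%
  group_by(income) %>%
  slice_max(order_by = total_sales, n = 2, with_ties = FALSE) %>%
  ungroup()

top2         # 5 rows: the tie in "Under 15K" keeps both Category B and Category C
top2_strict  # 4 rows: exactly 2 per income group
```

If the chart must show exactly five categories per income level, passing with_ties = FALSE in the data-processing step is one way to enforce that.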
# Data Processing
# Total spending for every combination of household size and income level
spending_vs_household <- transactions %>%
  inner_join(demographics, by = "household_id") %>%
  group_by(household_size, income) %>%
  summarize(total_spent = sum(sales_value, na.rm = TRUE), .groups = "drop")
# Plot
ggplot(spending_vs_household,
       aes(x = household_size, y = total_spent, color = income)) +
  geom_point(size = 3, alpha = 0.7) +
  # group = income fits one line per income level, even if household_size is a factor
  geom_smooth(aes(group = income), method = "lm", formula = y ~ x,
              se = FALSE, linetype = "dashed") +
  labs(title = "Total Spending vs. Household Size",
       x = "Household Size",
       y = "Total Spending ($)",
       color = "Income Level") +
  theme_minimal()
This scatter plot relates household size to total spending, with one color per income level. The dashed trend lines, fitted separately for each income level, show whether total spending tends to rise as household size grows. Because each point sums the spending of every household in that size and income group, the totals also reflect how many households fall into each group: a rising line can mean more households rather than more spending per household.
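To see why totals and per-household spending can tell different stories, here is a minimal sketch on hypothetical data (the households data frame and its values are invented for illustration, and household_size is assumed numeric here). It summarizes the same households both ways and fits the same kind of straight line that geom_smooth(method = "lm") draws.

```r
library(dplyr)

# Hypothetical per-household spending: four 1-person households, two 3-person households
households <- data.frame(
  household_id   = 1:6,
  household_size = c(1, 1, 1, 1, 3, 3),
  spent          = c(100, 100, 100, 100, 150, 150)
)

by_size <- households %>%
  group_by(household_size) %>%
  summarize(
    n_households  = n(),
    total_spent   = sum(spent),   # the quantity plotted above
    avg_per_house = mean(spent),  # spending per household
    .groups = "drop"
  )

# Linear fits of each measure against household size
slope_total <- coef(lm(total_spent ~ household_size, data = by_size))[["household_size"]]
slope_avg   <- coef(lm(avg_per_house ~ household_size, data = by_size))[["household_size"]]

by_size
slope_total  # negative: fewer large households, so their combined total is smaller
slope_avg    # positive: each large household spends more
```

Here the total falls with household size while spending per household rises, simply because there are fewer large households. Plotting mean(sales_value summed per household) instead of the raw total would separate those two effects.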
# Data Processing
# Total spending per age group within each income level
# (income and age are household attributes, so transactions roll up directly)
payment_by_income <- transactions %>%
  inner_join(demographics, by = "household_id") %>%
  group_by(income, age) %>%
  summarize(total_spent = sum(sales_value, na.rm = TRUE), .groups = "drop") %>%
  filter(income != "Unknown")  # Remove an "Unknown" income level, if present
# Plot
# position = "fill" rescales every bar to 100%, so segments show shares
ggplot(payment_by_income, aes(x = income, y = total_spent, fill = age)) +
  geom_col(position = "fill") +
  labs(title = "Share of Spending by Age Group Within Each Income Level",
       x = "Income Level",
       y = "Proportion of Total Spend",
       fill = "Age") +
  scale_y_continuous(labels = scales::percent_format()) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
This stacked bar chart shows, for each income level, the share of spending that comes from each age group. Because every bar is scaled to 100%, the chart compares the composition of spending rather than its total, so income levels with very different overall spending can be compared directly. Differences in segment sizes across bars suggest how age and income together shape purchasing behavior.
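The percentages in this chart are each age group's spending divided by the total for its income level. A minimal sketch, assuming hypothetical totals (the spend data frame and its values are invented for illustration), computes those shares explicitly with dplyr; they match the segment heights that position = "fill" would draw.

```r
library(dplyr)

# Hypothetical spending totals by income level and age group
spend <- data.frame(
  income      = c("35-49K", "35-49K", "35-49K", "50-74K", "50-74K"),
  age         = c("25-34", "35-44", "45-54", "25-34", "45-54"),
  total_spent = c(200, 300, 500, 150, 50)
)

# Divide each row by the total of its income level, so shares sum to 1 per bar
shares <- spend %>%
  group_by(income) %>%
  mutate(share = total_spent / sum(total_spent)) %>%
  ungroup()

shares
```

Computing the shares up front is useful when the percentages are also needed as text, for example as bar labels or in a summary table.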