Essential Business Question: The business question this report will attempt to solve is if we are doing the most possible business in every income demographic possible.
Thought Process: By Focusing on income levels we can see which provide the most value to our company and plan accordingly
Plan of Attack: Utilize product, transaction, and demographics data in order to provide a valuable analysis on how we can better our products for our customers and bring the company more profit.
Complete Journey- This gives us all of the Regork data needed for the project
GG PLOT 2- This package allows us to have awesome visualizations to help draw conclusions and make predictions about our data
DPLYR- This package allows us to format and use our data in advantageous ways
Transactions
Demographics
Products
transactions
demographics -> demographics
products -> products
product_id - This was the primary key used to join the table products and transactions tables together
income - this was they key to the data set as it got the income levels for the households
household_id - identifies each household
basket_id - identifies each product
# income_count to show number of customers per income level
income_count <- demographics %>%
group_by(income) %>%
summarize(num_customers = n()) %>%
ungroup()
#plot it
ggplot(data = income_count, aes(x = income, y = num_customers)) +
geom_col(fill = "skyblue") +
labs(
title = "Number of Customers Per Income Level",
x = "Income Level",
y = "Number of Customers"
) +
theme_minimal(base_size = 14) +
theme(
axis.text.x = element_text(angle = 45, hjust = 1)
)
This chart is helping us visualize the number of customers we serve per income level. While we expect more customers close to the national average in the $35,000-$100,000 range, it appears we are severely low on 200-249K customers.While we have enough customers in the other categories, this report will focus on increasing our customer base in the 200-249k customer range. Now we should check out the transactions per household.
trans_per_household <- transactions %>%
group_by(household_id) %>%
summarize(num_transactions = n_distinct(basket_id))
# join with demo
trans_per_household_income <- trans_per_household %>%
left_join(demographics, by = "household_id")
# look over the data/make sure it is looking good
head(trans_per_household_income)
## # A tibble: 6 × 9
## household_id num_transactions age income home_ownership marital_status
## <chr> <int> <ord> <ord> <ord> <ord>
## 1 1 51 65+ 35-49K Homeowner Married
## 2 10 1 <NA> <NA> <NA> <NA>
## 3 100 21 <NA> <NA> <NA> <NA>
## 4 1000 85 <NA> <NA> <NA> <NA>
## 5 1001 46 45-54 50-74K Homeowner Unmarried
## 6 1002 16 <NA> <NA> <NA> <NA>
## # ℹ 3 more variables: household_size <ord>, household_comp <ord>,
## # kids_count <ord>
#o see the avg # of transactions per household in each income level and filter out n/a
transactions_by_income <- trans_per_household_income %>%
group_by(income) %>%
summarize(avg_transactions = mean(num_transactions, na.rm = TRUE),
count_households = n()) %>%
ungroup()
#look over again
head(transactions_by_income)
## # A tibble: 6 × 3
## income avg_transactions count_households
## <ord> <dbl> <int>
## 1 Under 15K 121. 61
## 2 15-24K 98.0 74
## 3 25-34K 115. 77
## 4 35-49K 99.2 172
## 5 50-74K 98.6 192
## 6 75-99K 89 96
#plot it
ggplot(transactions_by_income, aes(x = income, y = avg_transactions)) +
geom_col(fill = "skyblue") +
labs(
title = "Average Number of Transactions Per Household by Income",
x = "Income Range",
y = "Avg. Transactions per Household"
) +
theme_minimal(base_size = 14) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
This chart helps paint a clear picture that we need to increase our number of customers in the 200-249k range and we need to find a way to increase their transactions. This chart also showcases all of the null/N/A data we have that can take away from the accuracy. This data also helps show some great consistency with the other income groups, besides $200-$249k
transactions_per_basket <- transactions %>%
group_by(household_id, basket_id) %>%
summarize(
total_spend = sum(sales_value, na.rm = TRUE),
.groups = "drop"
)
trans_basket_income <- transactions_per_basket %>%
left_join(demographics, by = "household_id")
spend_by_income <- trans_basket_income %>%
group_by(income) %>%
summarize(
avg_spend_per_transaction = mean(total_spend, na.rm = TRUE),
num_transactions = n(),
.groups = "drop"
)
ggplot(spend_by_income, aes(x = income, y = avg_spend_per_transaction)) +
geom_col(fill = "skyblue") +
labs(
title = "Avg Dollars Spent Per Transaction",
x = "Income",
y = "Avg $ Spend Per"
) +
theme_minimal(base_size = 14) +
theme(
axis.text.x = element_text(angle = 45, hjust = 1)
)
This helps show the importance of increasing our customer base in the 200-249k category. The 200-249k is among the highest spending per transaction. This provides our company a massive void to fill that can increase our customer base in this income rage.
#join/filter the data for the graph
transactions_200_249 <- transactions %>%
left_join(demographics, by = "household_id") %>%
left_join(products, by = "product_id") %>%
filter(income == "200-249K")
#group
top_sales_categories <- transactions_200_249 %>%
group_by(product_category) %>%
summarize(
total_sales = sum(sales_value, na.rm = TRUE),
.groups = "drop"
) %>%
arrange(desc(total_sales)) %>%
slice_head(n = 10)
#plot it
ggplot(top_sales_categories, aes(
x = reorder(product_category, total_sales),
y = total_sales
)) +
geom_col(fill = "skyblue") +
coord_flip() +
labs(
title = "Top 10 Products by Sales For $200K-$249K)",
x = "Products",
y = "Total Sales In Dollars"
) +
theme_minimal(base_size = 14)
This data helps us see what products we can capitalize on to increase growth from the $200-$249k range. We can see that they primarily buy groceries in their transactions. Considering their income, one thing we can do a better job at is offering higher quality beef products to suit their demands. In addition, we can do a better job at advertising that we have pet products available at our store.
There are a few major limitations in the analysis.
Only 2017 data is shown
Only numerical data is provided, I would love to see some reports on how consumers feel our selection and prices are
There was a lot of N/A data that can skew analysis depending on what that N/A data actually is
The new tables and data visualizations paint a clear picture on a fast way to improve our company. My official recommendation would be to run advertising campaigns centered at the $200-$249k income group. We should feature the products these households have been proven to buy with us, especially since they spend a high amount per transaction. If we can get their market share on par with other similarly sized markets, it will lead to near instant growth.