Introduction

When running a business, there are many ways to increase revenue and to entice customers to come back and shop with us again. By looking at the demographics of our customers, we can get a better idea of which items people buy at different income levels. For items that higher income customers tend to purchase, we can raise the price, on the assumption that these customers are likely to keep buying the product even at a higher cost. For items that lower income customers tend to purchase, we can lower the price or offer coupons more often, which encourages people to buy more of the product and can increase revenue.

Packages Used

library(completejourney)
## Welcome to the completejourney package! Learn more about these data
## sets at http://bit.ly/completejourney.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(dplyr)

Exploratory Data Analysis

transactions <- get_transactions()
ggplot(demographics, aes(x = income)) +
  geom_bar() +
  ggtitle('Income Ranges of Customers') +
  ylab('Number of Customers') +
  xlab('Income Ranges')

This graph shows the income distribution of our customers. Most customers fall in the middle income ranges, but we want to focus on the two tails of the graph. For this analysis we will consider lower income to be under 35K (the Under 15K, 15-24K, and 25-34K ranges) and higher income to be 150K or more.
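The income cutoffs above can also be applied by matching the bracket labels exactly rather than by pattern. The following is a minimal sketch on a made-up table; `demo_toy`, `lower_levels`, `higher_levels`, and `income_group` are illustrative names, and the bracket labels are assumed to match the ones used in demographics.

```r
library(dplyr)

# Toy stand-in for the demographics table (household IDs are made up)
demo_toy <- tibble(
  household_id = c("1", "2", "3", "4", "5"),
  income = c("Under 15K", "25-34K", "50-74K", "150-174K", "250K+")
)

# Assumed bracket labels for each group
lower_levels  <- c("Under 15K", "15-24K", "25-34K")
higher_levels <- c("150-174K", "175-199K", "200-249K", "250K+")

# Label each household by exact match on its income bracket
demo_grouped <- demo_toy %>%
  mutate(income_group = case_when(
    income %in% lower_levels  ~ "lower",
    income %in% higher_levels ~ "higher",
    TRUE                      ~ "middle"
  ))

count(demo_grouped, income_group)
```

Exact matching avoids accidental partial matches and makes a mistyped bracket label easier to spot, since a label that matches nothing simply falls into the "middle" group.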

higher_income <-
  demographics %>%
  filter(str_detect(income, regex('150-174K|175-199K|200-249K|250K'))) %>%
  inner_join(transactions, by = 'household_id') %>%
  inner_join(products, by = 'product_id')

head(sort(table(higher_income$product_category), decreasing = TRUE), n = 10)
## 
##            SOFT DRINKS    FLUID MILK PRODUCTS BAKED BREAD/BUNS/ROLLS 
##                   2198                   2030                   1825 
##                 YOGURT                 CHEESE                   SOUP 
##                   1782                   1681                   1454 
##             BAG SNACKS         TROPICAL FRUIT            COLD CEREAL 
##                   1314                   1233                   1089 
## FRZN MEAT/MEAT DINNERS 
##                    988
lower_income <-
  demographics %>%
  filter(str_detect(income, regex('Under 15K|15-24K|25-34K'))) %>%
  inner_join(transactions, by = 'household_id') %>%
  inner_join(products, by = 'product_id')

head(sort(table(lower_income$product_category), decreasing = TRUE), n = 10)
## 
##               SOFT DRINKS    BAKED BREAD/BUNS/ROLLS       FLUID MILK PRODUCTS 
##                     10963                      6351                      6201 
##    FRZN MEAT/MEAT DINNERS                    CHEESE                BAG SNACKS 
##                      5739                      5622                      5518 
##                      BEEF              FROZEN PIZZA VEGETABLES - SHELF STABLE 
##                      4425                      4049                      3541 
##                      SOUP 
##                      3476

After manipulating our data to show the most common purchases by higher and lower income customers, we can see that Yogurt and Tropical Fruit appear in the top ten categories for higher income customers but not for lower income customers. Soft Drinks, Bread, and Milk top the list for lower income customers, although the same three categories also lead the list for higher income customers.
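Because the lower income group has far more purchase records than the higher income group, raw counts are hard to compare directly. One option is to compare each category's share of purchases within its group. The sketch below uses invented rows; `purchases`, `income_group`, and `category_share` are hypothetical names, not objects from the analysis above.

```r
library(dplyr)

# Toy purchase records (invented for illustration); in the analysis these rows
# would come from higher_income and lower_income
purchases <- tibble(
  income_group = c(rep("higher", 4), rep("lower", 6)),
  product_category = c("YOGURT", "YOGURT", "SOFT DRINKS", "CHEESE",
                       "SOFT DRINKS", "SOFT DRINKS", "SOFT DRINKS",
                       "BEEF", "BEEF", "YOGURT")
)

# Count purchases per category, then divide by the group's total
category_share <- purchases %>%
  count(income_group, product_category, name = "n_purchases") %>%
  group_by(income_group) %>%
  mutate(share = n_purchases / sum(n_purchases)) %>%
  ungroup() %>%
  arrange(income_group, desc(share))

category_share
```

A category with a noticeably larger share in one group than in the other is a stronger signal of a real preference than a larger raw count.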

HI_no_groc_drug <-
  higher_income %>%
  filter(department != 'GROCERY') %>%
  filter(department != 'DRUG GM')

LI_no_groc_drug <-
  lower_income %>%
  filter(department != 'GROCERY') %>%
  filter(department != 'DRUG GM')

Next we can look at which departments people shop in based on their income levels. The GROCERY and DRUG GM departments were removed because they are the two most popular for both groups. This gives a better idea of how the remaining departments differ between the two groups.

ggplot(HI_no_groc_drug, aes(department)) +
  geom_bar() +
  ggtitle('Number of Department Purchases by Higher Income Customers') +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  ylab('Number of Total Products') +
  xlab('Departments')

ggplot(LI_no_groc_drug, aes(department)) +
  geom_bar() +
  ggtitle('Number of Department Purchases by Lower Income Customers') +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  ylab('Number of Total Products') +
  xlab('Departments')

Comparing the two graphs, one observation is that lower income customers buy Meat products much more often than higher income customers.
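Since the two graphs have different scales, it can help to put both groups on one chart as shares of each group's purchases. This is a rough sketch on invented data; `dept_toy`, `dept_share`, and `income_group` are hypothetical names used only for illustration.

```r
library(dplyr)
library(ggplot2)

# Toy department purchases (invented); one row per purchased product
dept_toy <- tibble(
  income_group = c(rep("higher", 5), rep("lower", 10)),
  department = c("MEAT", "PRODUCE", "PRODUCE", "DELI", "PRODUCE",
                 rep("MEAT", 5), rep("PRODUCE", 3), rep("DELI", 2))
)

# Share of each department within its income group
dept_share <- dept_toy %>%
  count(income_group, department) %>%
  group_by(income_group) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

# Side-by-side bars so the two groups can be compared on the same scale
p <- ggplot(dept_share, aes(x = department, y = share, fill = income_group)) +
  geom_col(position = "dodge") +
  ggtitle("Share of Department Purchases by Income Group") +
  ylab("Share of Purchases") +
  xlab("Departments")

p
```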

meat_prod_by_month <-
  products %>%
  filter(str_detect(department, 'MEAT')) %>%
  inner_join(transactions, by = 'product_id') %>%
  inner_join(demographics, by = 'household_id') %>%
  mutate(month = month(transaction_timestamp, label = TRUE, abbr = TRUE)) %>%
  group_by(month) %>%
  summarise(total_sales = sum(sales_value))

ggplot(meat_prod_by_month, aes(x = month, y = total_sales, group = 1)) +
  geom_line() +
  geom_point() +
  ggtitle('Total Sales in the Meat Department by Month') +
  ylab('Total Sales') +
  xlab('Month')

This graph shows that sales in the Meat department peak in May and July. Because the join uses every household in demographics, these totals cover all income levels rather than lower income customers alone.
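To see the monthly pattern for lower income households specifically, the demographics rows can be filtered before joining. The sketch below uses small made-up tables; `demo_toy`, `trans_toy`, and `lower_by_month` are hypothetical, and the bracket labels are assumed to match those in demographics.

```r
library(dplyr)
library(lubridate)

# Assumed labels for the lower income brackets
lower_levels <- c("Under 15K", "15-24K", "25-34K")

# Toy stand-ins for demographics and transactions (values are invented)
demo_toy <- tibble(
  household_id = c("1", "2"),
  income = c("15-24K", "150-174K")
)
trans_toy <- tibble(
  household_id = c("1", "1", "2", "1"),
  transaction_timestamp = as.POSIXct(
    c("2017-05-03 10:00:00", "2017-05-20 12:00:00",
      "2017-05-21 09:00:00", "2017-06-02 15:00:00"),
    tz = "UTC"
  ),
  sales_value = c(5, 7, 20, 4)
)

# Keep only lower income households, then total sales by month
lower_by_month <- demo_toy %>%
  filter(income %in% lower_levels) %>%
  inner_join(trans_toy, by = "household_id") %>%
  mutate(month = month(transaction_timestamp, label = TRUE, abbr = TRUE)) %>%
  group_by(month) %>%
  summarise(total_sales = sum(sales_value))

lower_by_month
```

The same filter could be applied to the soft drink totals, so that both seasonal charts describe the group the coupons are meant for.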

soda_by_month <-
  products %>%
  filter(str_detect(product_category, 'SOFT DRINKS')) %>%
  inner_join(transactions, by = 'product_id') %>%
  inner_join(demographics, by = 'household_id') %>%
  mutate(month = month(transaction_timestamp, label = TRUE, abbr = TRUE)) %>%
  group_by(month) %>%
  summarise(total_sales = sum(sales_value))

ggplot(soda_by_month, aes(x = month, y = total_sales, group = 1)) +
  geom_line() +
  geom_point() +
  ggtitle('Total Sales of Soft Drinks by Month') +
  ylab('Total Sales') +
  xlab('Month')

This last graph shows that Soft Drink sales also peak in May and July. As with the Meat graph, the totals cover every household in demographics rather than lower income customers alone.
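The size of a month-to-month rise or dip can be quantified with a percent change column instead of being read off the chart. This is a minimal sketch with invented totals; `monthly_toy` and `pct_change` are hypothetical names.

```r
library(dplyr)

# Toy monthly totals (invented); in the analysis these would come from
# meat_prod_by_month or soda_by_month
monthly_toy <- tibble(
  month = factor(c("Apr", "May", "Jun", "Jul"),
                 levels = c("Apr", "May", "Jun", "Jul")),
  total_sales = c(100, 130, 104, 135)
)

# Percent change from the previous month; the first month has no prior value
monthly_change <- monthly_toy %>%
  arrange(month) %>%
  mutate(pct_change = (total_sales - lag(total_sales)) / lag(total_sales) * 100)

monthly_change
```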

Summary

The problem we set out to solve was how to increase revenue while also helping lower income customers better afford the products they buy most. To do this, we defined which households would be considered higher income and lower income, then isolated those groups to find the products each group buys most often. We also looked at the departments where each group made most of its purchases, excluding the grocery and drug departments, to see what lower income customers buy more of than higher income customers. Once we identified those areas, which were the meat department and the soft drinks category, we looked at the time of year when customers bought more of those items.

One solution to our problem is to raise prices on items like yogurt and tropical fruit that higher income customers tend to buy, while lowering prices on items like milk and bread that lower income customers buy the most. Higher income customers would likely still buy yogurt and tropical fruit because they can afford it, and lower income customers could purchase milk and bread more often, which raises revenue.

Another solution is based on our last two graphs: in both the meat department and the soft drink product category, sales peak in May and July but dip in June. To make these products easier for lower income customers to buy, we can send coupons for both meat and soft drinks during the summer months. This would not only let lower income customers purchase these products more often, but could also raise sales in June, when these products were not selling as well as in May and July.

A limitation of this exploratory process is that many of the products that higher and lower income customers buy are the same, such as milk, bread, and soft drinks. This makes it difficult to make popular products easier for lower income customers to afford while also raising profits.