Introduction

Regork Growth Opportunity

The business problem I am trying to solve is: how does income affect which products are purchased? With this information, we can help our business grow and may even help families afford products they couldn’t afford before.

How Did I Address This?

I addressed this issue by plotting the data and comparing the graphs. This let me see trends in the data and visualize which products were purchased by which income groups.

Why is This Information Useful?

This information is useful because it can help Regork grow as a business. Seeing these trends shows where we can grow and how we can widen the range of products our customers purchase.

Packages

Complete Journey - This package contains all of the data sets required to perform my analysis for Regork.

DT - DT provides an easy way to display interactive data tables in an RMarkdown document when it is knit to HTML.

library(completejourney)
library(tidyverse)
library(dplyr)
library(ggplot2)
library(lubridate)
library(DT)

Load in data

demos <- demographics
prods <- products
trans <- get_transactions()
prom <- promotions_sample

Join the data

dt <- inner_join(demos, trans)
## Joining with `by = join_by(household_id)`
dtp <- inner_join(dt, prods)
## Joining with `by = join_by(product_id)`
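The joins above let dplyr pick the key automatically, which is why the "Joining with" messages appear. As a minimal sketch on made-up toy tables (the `_toy` names and all values are hypothetical, not part of the completejourney data), the key can be named explicitly, and `anti_join()` shows which transactions an inner join drops:

```r
library(dplyr)

# Toy stand-ins for the demographics and transactions tables (made-up values).
demos_toy <- tibble(
  household_id = c("1", "2"),
  income       = c("Under 15K", "50-74K")
)
trans_toy <- tibble(
  household_id = c("1", "1", "2", "3"),
  product_id   = c("a", "b", "a", "c"),
  sales_value  = c(2.5, 4.0, 2.5, 1.0)
)

# Naming the key explicitly avoids relying on the automatic key detection.
dt_toy <- inner_join(demos_toy, trans_toy, by = "household_id")

# Transactions from households without demographic data are not kept
# by the inner join; anti_join() lists them.
dropped <- anti_join(trans_toy, demos_toy, by = "household_id")
```

In this toy case, household "3" has no demographic row, so its transaction appears in `dropped` rather than in `dt_toy`. The same check on the real tables would show how many transactions the analysis leaves out.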

Preview the data

datatable(head(dtp, 500), options = list(pageLength = 10))

Analysis

ggplot(data = demos, aes(x = income, fill = income)) +
  geom_bar() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=.5)) +
  ggtitle("Income Distribution") 

Income Distribution

I created this graph to show the spread of incomes throughout the data. As you can see, it is skewed a bit to the right, as most households fall in the middle to lower income ranges.
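To put numbers on the shape of the income distribution, the households in each bracket can be counted and converted to shares. This is only a sketch on a hypothetical toy table (`demos_toy` and its values are made up); the same pipeline should apply to `demos`, assuming `income` is an ordered factor there:

```r
library(dplyr)

# Hypothetical toy data with three income brackets (made-up values).
demos_toy <- tibble(
  income = factor(
    c("Under 15K", "Under 15K", "35-49K", "35-49K", "35-49K", "150-174K"),
    levels  = c("Under 15K", "35-49K", "150-174K"),
    ordered = TRUE
  )
)

# Count households per bracket and express each count as a share of the total.
income_share <- demos_toy %>%
  count(income, name = "households") %>%
  mutate(share = households / sum(households))

income_share
```

A table like this makes it easy to see how few households sit in the highest brackets.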

dtp %>%
  filter(str_detect(product_category, "MEAT")) %>%
  ggplot(aes(x = product_category)) +
  geom_bar() +
  xlab("Product Category") +
  ylab("Count") +
  ggtitle("Number Bought According to Type of Meat per Income") +
  facet_wrap( ~ income) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

Number Bought According to Type of Meat per Income

This graph shows which types of meat are bought within each income range, which gives some insight into who is purchasing each category of meat. As you can see, the lower the income, the more often people tend to buy frozen meat. This tells us something about the lower income ranges, but there is very little data for the higher income ranges, so there is not much to compare against.
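Because the income ranges contain different amounts of data, raw counts are hard to compare across facets. One possible way to make them comparable is to look at each category's share of purchases within an income range. The sketch below uses a made-up toy table (`meat_toy`, with illustrative category labels and values), not the real `dtp` data:

```r
library(dplyr)

# Hypothetical purchase rows: one row per purchased item (made-up values).
meat_toy <- tibble(
  income = c(rep("Under 15K", 4), rep("100-124K", 2)),
  product_category = c(
    "FROZEN MEAT", "FROZEN MEAT", "FROZEN MEAT", "MEAT - MISC",
    "FROZEN MEAT", "MEAT - MISC"
  )
)

# Count rows per income range and category, then divide by the
# total for that income range so the shares sum to 1 within each range.
meat_share <- meat_toy %>%
  count(income, product_category) %>%
  group_by(income) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

meat_share
```

Plotting `share` with `geom_col()` instead of raw counts would put every facet on the same 0 to 1 scale, so a small income group is not visually dwarfed by a large one.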

dtp %>%
  filter(str_detect(product_category, "MEAT")) %>%
  group_by(product_category) %>%
  summarize(avg_salesval = mean(sales_value)) %>%
  ggplot(aes(x = product_category, y = avg_salesval, fill = product_category)) +
  geom_col() +
  xlab("Product Category") +
  ylab("Average Price") +
  ggtitle("Price According to Type of Meat") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=.5))

Price According to Type of Meat

In this graph, I wanted to show how the average price differs across types of meat. This helps explain why some types of meat are purchased more often than others.
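Note that `sales_value` is recorded per transaction row, so its mean is an average amount spent per row rather than a price per unit. A rough sketch of a per-unit alternative is shown below, using made-up rows (`sales_toy` is hypothetical) and assuming the `quantity` column from the transactions data is the number of units bought:

```r
library(dplyr)

# Hypothetical transaction rows (made-up values).
sales_toy <- tibble(
  product_category = c("FROZEN MEAT", "FROZEN MEAT", "MEAT - MISC"),
  sales_value      = c(6, 3, 8),
  quantity         = c(2, 1, 1)
)

# Compare the mean sales value per row with total sales divided by total units.
unit_price <- sales_toy %>%
  group_by(product_category) %>%
  summarize(
    avg_salesval   = mean(sales_value),
    price_per_unit = sum(sales_value) / sum(quantity)
  )

unit_price
```

When shoppers buy several units at once, the two measures diverge, so the per-unit figure may be the fairer basis for comparing categories.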

dtp %>%
  filter(str_detect(product_category, "VEG")) %>%
  ggplot(aes(x = product_category)) +
  geom_bar() +
  xlab("Product Category") +
  ylab("Count") +
  ggtitle("Number Bought According to Type of Vegetable per Income") +
  facet_wrap( ~ income) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

Number Bought According to Type of Vegetable per Income

This graph shows which types of vegetables are bought within each income range, which gives some insight into who is purchasing each category of vegetable. As you can see, middle and lower income families tend to buy “shelf stable” vegetables. This tells us something about the lower income ranges, but there is very little data for the higher income ranges, so there is not much to compare against.

dtp %>%
  filter(str_detect(product_category, "VEG")) %>%
  group_by(product_category) %>%
  summarize(avg_salesval = mean(sales_value)) %>%
  ggplot(aes(x = product_category, y = avg_salesval, fill = product_category)) +
  geom_col() +
  xlab("Product Category") +
  ylab("Average Price") +
  ggtitle("Price According to Type of Vegetable") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=.5))

Price According to Type of Vegetable

In this graph, I wanted to show how the average price differs across types of vegetables. This helps explain why some types of vegetables are purchased more often than others.

dtp %>%
  # No spaces around "|": the pattern "MEAT | VEG" would require a literal space next to each word
  filter(str_detect(product_category, "MEAT|VEG")) %>%
  filter(coupon_disc > 0) %>%
  group_by(product_category) %>%
  summarize(avg_disc = mean(coupon_disc)) %>%
  ggplot(aes(x = product_category, y=avg_disc, fill = product_category)) +
  geom_col() +
  xlab("Product Category") +
  ylab("Average Coupon Discount") +
  ggtitle("Average Discount for Meat and Vegetables") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=.5))

Average Discount for Meat and Vegetables

In this graph, I wanted to see if discounts had any effect on what types of foods were purchased. As you can see, the average coupon discount is small for every type, so I concluded that discounts had little to no effect on which types of products were purchased.
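Filtering to `coupon_disc > 0` measures the size of the discount only among purchases that used a coupon. How often a coupon is used at all is a separate quantity. The sketch below, on made-up rows (`disc_toy` is hypothetical), shows one way both could be summarized side by side:

```r
library(dplyr)

# Hypothetical transaction rows (made-up values).
disc_toy <- tibble(
  product_category = c(
    "FROZEN MEAT", "FROZEN MEAT", "FROZEN MEAT", "VEGETABLES - SHELF STABLE"
  ),
  coupon_disc = c(0, 0, 0.5, 0)
)

# Share of rows with any coupon discount, plus the average discount over all rows.
coupon_use <- disc_toy %>%
  group_by(product_category) %>%
  summarize(
    share_with_coupon = mean(coupon_disc > 0),
    avg_disc_all      = mean(coupon_disc)
  )

coupon_use
```

If coupons turn out to be rare in every category, that would support the reading that discounts did not drive the purchasing differences.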

Summary

Business Question

The business question I wanted to answer was: how does income affect which products are purchased? Were there any significant trends?

How I addressed this

When analyzing this question, I looked for trends in the data. Using the demographics, transactions, and products data, I compared income to the products purchased and evaluated the trends from there.

Insights

The most prominent insight from my analysis was that middle- to lower-income households tend to buy products that last longer and are cheaper. Though the trend was clear for middle- to lower-income households, there wasn’t enough data to make a comparison with higher-income households.

Implications

I believe we can use these findings to encourage a more diverse mix of purchases. Instead of only the cheaper items being purchased, we can offer discounts or promotions on the more expensive items, such as fresh meat and vegetables. This can help the company by broadening what customers buy while making fresh food more affordable for families.

Limitations

The main limitation of my analysis was the lack of data for higher-income households. There wasn’t enough for me to analyze, and I felt that any assumptions I made about that group could make the analysis incorrect.