The Business Problem to be Solved -

The business problem I am trying to solve is twofold: which products is the youngest age bracket not buying, and how can we reach this group to gain more sales from it? The young adult group I will be analyzing is the 19-24 age group. Most people in this group are just starting to grocery shop for themselves for the first time. So how can we adjust our marketing to gain more sales on products that this age group would not normally buy?

Packages Needed -

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(completejourney)
## Welcome to the completejourney package! Learn more about these data
## sets at http://bit.ly/completejourney.
library(ggplot2)
library(dplyr)
library(purrr)
library(readr)
library(tibble)

Data to be Loaded -

# Call the function with parentheses; without them R prints the function definition instead of loading the data
transactions <- get_transactions()
get_promotions()
## # A tibble: 20,940,529 × 5
##    product_id store_id display_location mailer_location  week
##    <chr>      <chr>    <fct>            <fct>           <int>
##  1 1000050    316      9                0                   1
##  2 1000050    337      3                0                   1
##  3 1000050    441      5                0                   1
##  4 1000092    292      0                A                   1
##  5 1000092    293      0                A                   1
##  6 1000092    295      0                A                   1
##  7 1000092    298      0                A                   1
##  8 1000092    299      0                A                   1
##  9 1000092    304      0                A                   1
## 10 1000092    306      0                A                   1
## # ℹ 20,940,519 more rows
products
## # A tibble: 92,331 × 7
##    product_id manufacturer_id department    brand  product_category product_type
##    <chr>      <chr>           <chr>         <fct>  <chr>            <chr>       
##  1 25671      2               GROCERY       Natio… FRZN ICE         ICE - CRUSH…
##  2 26081      2               MISCELLANEOUS Natio… <NA>             <NA>        
##  3 26093      69              PASTRY        Priva… BREAD            BREAD:ITALI…
##  4 26190      69              GROCERY       Priva… FRUIT - SHELF S… APPLE SAUCE 
##  5 26355      69              GROCERY       Priva… COOKIES/CONES    SPECIALTY C…
##  6 26426      69              GROCERY       Priva… SPICES & EXTRAC… SPICES & SE…
##  7 26540      69              GROCERY       Priva… COOKIES/CONES    TRAY PACK/C…
##  8 26601      69              DRUG GM       Priva… VITAMINS         VITAMIN - M…
##  9 26636      69              PASTRY        Priva… BREAKFAST SWEETS SW GDS: SW …
## 10 26691      16              GROCERY       Priva… PNT BTR/JELLY/J… HONEY       
## # ℹ 92,321 more rows
## # ℹ 1 more variable: package_size <chr>
demographics
## # A tibble: 801 × 8
##    household_id age   income    home_ownership marital_status household_size
##    <chr>        <ord> <ord>     <ord>          <ord>          <ord>         
##  1 1            65+   35-49K    Homeowner      Married        2             
##  2 1001         45-54 50-74K    Homeowner      Unmarried      1             
##  3 1003         35-44 25-34K    <NA>           Unmarried      1             
##  4 1004         25-34 15-24K    <NA>           Unmarried      1             
##  5 101          45-54 Under 15K Homeowner      Married        4             
##  6 1012         35-44 35-49K    <NA>           Married        5+            
##  7 1014         45-54 15-24K    <NA>           Married        4             
##  8 1015         45-54 50-74K    Homeowner      Unmarried      1             
##  9 1018         45-54 35-49K    Homeowner      Married        5+            
## 10 1020         45-54 25-34K    Homeowner      Married        2             
## # ℹ 791 more rows
## # ℹ 2 more variables: household_comp <ord>, kids_count <ord>

The data sets used in this report are promotions, products, transactions, and demographics.

Adjusting the Data to Analyze Our Age Group -

To properly analyze the 19-24 age group, I adjusted the data frames so that the transactions within this age group are easy to see.

demo <- demographics
promotions <- get_promotions()
transactions <- get_transactions()

Joining Data

Before filtering, I joined the demographics and transactions data by household, then added the product details for each transaction.

transgraphics <- inner_join(demo, transactions)
## Joining with `by = join_by(household_id)`
Alltransactions <- left_join(transgraphics,products)
## Joining with `by = join_by(product_id)`

Filtering Data

Now that all of my data is put together, I will filter it to show only the 19-24 age group.

youngagedata <- filter(Alltransactions, age == "19-24")
youngagedata
## # A tibble: 43,913 × 24
##    household_id age   income home_ownership marital_status household_size
##    <chr>        <ord> <ord>  <ord>          <ord>          <ord>         
##  1 1147         19-24 25-34K <NA>           <NA>           2             
##  2 1147         19-24 25-34K <NA>           <NA>           2             
##  3 1147         19-24 25-34K <NA>           <NA>           2             
##  4 1147         19-24 25-34K <NA>           <NA>           2             
##  5 1147         19-24 25-34K <NA>           <NA>           2             
##  6 1147         19-24 25-34K <NA>           <NA>           2             
##  7 1147         19-24 25-34K <NA>           <NA>           2             
##  8 1147         19-24 25-34K <NA>           <NA>           2             
##  9 1147         19-24 25-34K <NA>           <NA>           2             
## 10 1147         19-24 25-34K <NA>           <NA>           2             
## # ℹ 43,903 more rows
## # ℹ 18 more variables: household_comp <ord>, kids_count <ord>, store_id <chr>,
## #   basket_id <chr>, product_id <chr>, quantity <dbl>, sales_value <dbl>,
## #   retail_disc <dbl>, coupon_disc <dbl>, coupon_match_disc <dbl>, week <int>,
## #   transaction_timestamp <dttm>, manufacturer_id <chr>, department <chr>,
## #   brand <fct>, product_category <chr>, product_type <chr>, package_size <chr>

Exploring Data

Now that the 19-24 data has been separated from all of the other age groups, I can explore it to uncover the buying habits of young adults. I will be examining which products they purchase the most, which they purchase the least, and which products we can dive into further.

First, I wanted to see how the number of purchases varies across age groups. Each bar in this graph counts the transaction lines recorded for one age group, and it shows how little the 19-24 age group buys from our store compared with the other groups. This is the problem that I am going to address.

ggplot(Alltransactions, aes(x = age)) +
  geom_bar() +
  ggtitle("Sales Per Age")
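Because geom_bar() counts rows, the chart above reflects the number of transaction lines per age group rather than dollars spent. The following is a minimal sketch of how dollar totals could be compared alongside those counts, assuming the `age` and `sales_value` columns seen in Alltransactions. The helper name `sales_by_age` and the toy rows are made up for illustration and are not real completejourney values.

```r
library(dplyr)
library(tibble)

# Hypothetical helper: total dollars and number of transaction lines per age group.
sales_by_age <- function(data) {
  data %>%
    group_by(age) %>%
    summarise(
      total_sales = sum(sales_value, na.rm = TRUE),  # dollars spent
      n_lines     = n(),                             # what geom_bar() counts
      .groups     = "drop"
    )
}

# Made-up toy rows, used only so the sketch runs on its own.
toy <- tibble(
  age         = c("19-24", "19-24", "45-54", "45-54", "45-54"),
  sales_value = c(2.5, 1.5, 4, 3, 5)
)

toy_summary <- sales_by_age(toy)
print(toy_summary)

# With the report's data, the call would be: sales_by_age(Alltransactions)
```

If the dollar totals and the line counts rank the age groups the same way, the bar chart is a fair stand-in for sales. If they differ, the dollar totals are the better measure of the gap.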

Now that I have seen how little this age group purchases from our store, I wanted to see how much purchasing power they have.

ggplot(youngagedata, aes(x = income)) +
  geom_bar() +
  ggtitle("Income Distribution for 19-24 Year Olds")

People in the 19-24 age group are going through a large number of changes in their lives. Some are attending or have graduated from higher education. Others did not attend college and went straight into the workforce. In this bar chart you can see two large spikes in the count. The high count in the “Under 15K” range is most likely people who have graduated high school but do not currently have a full-time job. The other spike is in the range between “35K and 74K”. This bracket is most likely people who obtained a full-time job after high school, or who have graduated college and are beginning their careers.

One thing to note is the household size of this group as well.
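One caveat on the income chart: youngagedata has one row per transaction line, so a household that shops often is counted many times. The sketch below shows one way to count each household once per income bracket, assuming the `household_id` and `income` columns from the demographics data. The helper name `households_by_income` and the toy rows are hypothetical.

```r
library(dplyr)
library(tibble)

# Hypothetical helper: number of distinct households in each income bracket.
households_by_income <- function(data) {
  data %>%
    distinct(household_id, income) %>%   # keep one row per household
    count(income, name = "households")   # then count households per bracket
}

# Made-up toy rows: household A appears three times but should count once.
toy <- tibble(
  household_id = c("A", "A", "A", "B", "C"),
  income       = c("Under 15K", "Under 15K", "Under 15K", "35-49K", "35-49K")
)

toy_households <- households_by_income(toy)
print(toy_households)

# With the report's data, the call would be: households_by_income(youngagedata)
```

The same approach could be used for household_size by swapping the column name.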

ggplot(youngagedata, aes(x = household_size)) +
  geom_bar() +
  ggtitle("Household Size for 19-24 Year Olds")

As you can see, most are living by themselves or with one other person. This means that, more than likely, the average spending for this group will be low as well.

Transactional Data for 19-24 Year Olds

ggplot(youngagedata, aes(x = quantity, y = sales_value)) +
  geom_smooth() +
  ggtitle("Sale Values for 19-24 Year Olds")
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

This chart shows the relationship between the sales value and the quantity purchased. I will now show the same relationship for all age groups combined.

ggplot(Alltransactions, aes(x = quantity, y = sales_value)) +
  geom_smooth() +
  ggtitle("Sale Values for All Ages")
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

With these two graphs, you can see that the sales values are actually closely related. This is very beneficial to know because it shows that the 19-24 group is buying goods of around the same value, just on a much smaller scale. Looking at this, I think the best way for us to improve sales within this age group is to increase how often they come in and shop at our store.
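The idea of increasing shopping frequency could be checked directly by comparing trips per household across age groups. This is a rough sketch, assuming the `age`, `household_id`, and `basket_id` columns seen in Alltransactions and treating each distinct `basket_id` as one shopping trip. The helper name `trips_per_household` and the toy rows are made up for illustration.

```r
library(dplyr)
library(tibble)

# Hypothetical helper: average number of baskets (trips) per household, by age group.
trips_per_household <- function(data) {
  data %>%
    group_by(age) %>%
    summarise(
      households = n_distinct(household_id),
      baskets    = n_distinct(basket_id),
      .groups    = "drop"
    ) %>%
    mutate(baskets_per_household = baskets / households)
}

# Made-up toy rows: one young household with 2 trips, two older households with 6 trips in total.
toy <- tibble(
  age          = c("19-24", "19-24", "19-24",
                   "45-54", "45-54", "45-54", "45-54", "45-54", "45-54"),
  household_id = c("A", "A", "A", "B", "B", "B", "B", "B", "C"),
  basket_id    = c("b1", "b1", "b2", "b3", "b4", "b5", "b6", "b7", "b8")
)

toy_trips <- trips_per_household(toy)
print(toy_trips)

# With the report's data, the call would be: trips_per_household(Alltransactions)
```

A lower baskets_per_household value for the 19-24 group would support focusing on visit frequency. A similar value would suggest the gap comes mainly from there being fewer young households in the data.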

ggplot(youngagedata, aes(sales_value, department)) +
  geom_point() +
  ggtitle("Sale Values per Department")

This graph shows the sales value of each purchase made by the 19-24 age group. With our strategy of keeping the average price low, we can see that Drug GM, Spirits, Miscellaneous, and Fuel are a little pricier than the other departments. These are areas where we could look into lowering prices or providing incentives for students and younger adults. What incentives could we provide? We could give students a discount on fuel if they show their student ID before buying gas. This would lower the average price of gas for students, but they may feel more comfortable buying larger amounts of gas. We could also lower the prices of the Drug GM products that students mainly buy, such as allergy pills, sleep aids, or health products for stress.
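To see which departments the 19-24 group under-buys relative to everyone else, each department's share of spending can be compared between the two groups. The following is a sketch only, assuming the `age`, `department`, and `sales_value` columns seen in Alltransactions. The helper name `department_share`, the `segment` labels, and the toy rows are hypothetical.

```r
library(dplyr)
library(tibble)

# Hypothetical helper: share of each segment's spending that goes to each department.
department_share <- function(data, target_age = "19-24") {
  data %>%
    mutate(segment = if_else(as.character(age) == target_age, "target", "other")) %>%
    group_by(segment, department) %>%
    summarise(sales = sum(sales_value, na.rm = TRUE), .groups = "drop") %>%
    group_by(segment) %>%
    mutate(share = sales / sum(sales)) %>%   # shares sum to 1 within each segment
    ungroup()
}

# Made-up toy rows, used only so the sketch runs on its own.
toy <- tibble(
  age         = c("19-24", "19-24", "45-54", "45-54"),
  department  = c("GROCERY", "FUEL", "GROCERY", "FUEL"),
  sales_value = c(6, 4, 5, 5)
)

toy_share <- department_share(toy)
print(toy_share)

# With the report's data, the call would be: department_share(Alltransactions)
```

Departments where the target share is clearly below the other share would be candidates for the products this age group is not buying.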

Summary

The 19-24 age bracket is a very important demographic to target in the market. Increasing sales within this age group gives us the opportunity to build a relationship with these customers right as they begin to grocery shop for themselves. The low household sizes show us that they do not need to purchase a lot of items, so we can market smaller quantities of items in order to bring them into our store instead of our competitors'. Most do not have a high income either, so we will also have to keep the average price of the items we want to market low.

Final Findings

Looking at the data that I have uncovered in this report, I have come to two final findings. First, since the 19-24 demographic does not have a high share of total sales, we should begin to market towards them, not just in the store but outside of it as well. We should offer special deals for students to lower the cost of everyday items that they may purchase, including fuel and health products for young adults. Second, we should focus on marketing towards younger adults in general. With the average household size for our group being low, we could advertise cheaper items that younger people could use to get started in a new chapter of their lives. Getting them into our store and letting them become familiar with our company will be pivotal in increasing sales.