The business problem I am trying to solve is this: which products is the lowest age bracket not buying, and how can we reach this age group to gain more sales from it? The young adult group I will be analyzing is the 19-24 age group. Most people in this group are starting to grocery shop for themselves for the first time. So how can we adjust our marketing to gain more sales on products that this age group would not normally buy?
Packages Needed -
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(completejourney)
## Welcome to the completejourney package! Learn more about these data
## sets at http://bit.ly/completejourney.
library(ggplot2)
library(dplyr)
library(purrr)
library(readr)
library(tibble)
get_transactions
## function (verbose = FALSE)
## {
## download_data(which = "transactions", verbose = verbose)
## }
## <bytecode: 0x000002759f058028>
## <environment: namespace:completejourney>
get_promotions()
## # A tibble: 20,940,529 × 5
## product_id store_id display_location mailer_location week
## <chr> <chr> <fct> <fct> <int>
## 1 1000050 316 9 0 1
## 2 1000050 337 3 0 1
## 3 1000050 441 5 0 1
## 4 1000092 292 0 A 1
## 5 1000092 293 0 A 1
## 6 1000092 295 0 A 1
## 7 1000092 298 0 A 1
## 8 1000092 299 0 A 1
## 9 1000092 304 0 A 1
## 10 1000092 306 0 A 1
## # ℹ 20,940,519 more rows
products
## # A tibble: 92,331 × 7
## product_id manufacturer_id department brand product_category product_type
## <chr> <chr> <chr> <fct> <chr> <chr>
## 1 25671 2 GROCERY Natio… FRZN ICE ICE - CRUSH…
## 2 26081 2 MISCELLANEOUS Natio… <NA> <NA>
## 3 26093 69 PASTRY Priva… BREAD BREAD:ITALI…
## 4 26190 69 GROCERY Priva… FRUIT - SHELF S… APPLE SAUCE
## 5 26355 69 GROCERY Priva… COOKIES/CONES SPECIALTY C…
## 6 26426 69 GROCERY Priva… SPICES & EXTRAC… SPICES & SE…
## 7 26540 69 GROCERY Priva… COOKIES/CONES TRAY PACK/C…
## 8 26601 69 DRUG GM Priva… VITAMINS VITAMIN - M…
## 9 26636 69 PASTRY Priva… BREAKFAST SWEETS SW GDS: SW …
## 10 26691 16 GROCERY Priva… PNT BTR/JELLY/J… HONEY
## # ℹ 92,321 more rows
## # ℹ 1 more variable: package_size <chr>
demographics
## # A tibble: 801 × 8
## household_id age income home_ownership marital_status household_size
## <chr> <ord> <ord> <ord> <ord> <ord>
## 1 1 65+ 35-49K Homeowner Married 2
## 2 1001 45-54 50-74K Homeowner Unmarried 1
## 3 1003 35-44 25-34K <NA> Unmarried 1
## 4 1004 25-34 15-24K <NA> Unmarried 1
## 5 101 45-54 Under 15K Homeowner Married 4
## 6 1012 35-44 35-49K <NA> Married 5+
## 7 1014 45-54 15-24K <NA> Married 4
## 8 1015 45-54 50-74K Homeowner Unmarried 1
## 9 1018 45-54 35-49K Homeowner Married 5+
## 10 1020 45-54 25-34K Homeowner Married 2
## # ℹ 791 more rows
## # ℹ 2 more variables: household_comp <ord>, kids_count <ord>
The data sets used in this analysis are promotions, products, transactions, and demographics.
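To properly analyze the 19-24 age group, I loaded the data into data frames so that I could easily see the transactions within this age group.
# Copy the built-in demographics data, then download the full promotions and transactions data
demo <- demographics
promotions <- get_promotions()  # the parentheses are needed to call the function
transactions <- get_transactions()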
Next, I joined the demographics, transactions, and products data into a single data frame.
transgraphics <- inner_join(demo, transactions)
## Joining with `by = join_by(household_id)`
Alltransactions <- left_join(transgraphics,products)
## Joining with `by = join_by(product_id)`
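The joins above let dplyr pick the key columns on its own. As a minimal sketch of what the two join types do, here is the same pattern with explicit keys on a small made-up data set. The `toy_*` tables and their values are illustrative assumptions, not part of the completejourney data.

```r
library(dplyr)
library(tibble)

# Toy stand-ins for demographics, transactions, and products (made-up values).
toy_demo <- tibble(
  household_id = c("1", "2"),
  age          = c("19-24", "45-54")
)
toy_trans <- tibble(
  household_id = c("1", "1", "2", "3"),
  product_id   = c("A", "B", "A", "C"),
  sales_value  = c(2.5, 1.0, 4.0, 3.0)
)
toy_products <- tibble(
  product_id = c("A", "B"),
  department = c("GROCERY", "DRUG GM")
)

# inner_join keeps only households found in both tables,
# so household "3" (no demographic record) drops out.
toy_joined <- inner_join(toy_demo, toy_trans, by = "household_id")

# left_join keeps every remaining transaction row and attaches
# product details wherever a matching product_id exists.
toy_all <- left_join(toy_joined, toy_products, by = "product_id")

nrow(toy_all)
```

Naming the key with `by =` makes the join explicit and protects against an accidental match on some other shared column name.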
With all of the data combined, I can now filter it to show only the 19-24 age group.
youngagedata <- filter(Alltransactions, age == "19-24")
youngagedata
## # A tibble: 43,913 × 24
## household_id age income home_ownership marital_status household_size
## <chr> <ord> <ord> <ord> <ord> <ord>
## 1 1147 19-24 25-34K <NA> <NA> 2
## 2 1147 19-24 25-34K <NA> <NA> 2
## 3 1147 19-24 25-34K <NA> <NA> 2
## 4 1147 19-24 25-34K <NA> <NA> 2
## 5 1147 19-24 25-34K <NA> <NA> 2
## 6 1147 19-24 25-34K <NA> <NA> 2
## 7 1147 19-24 25-34K <NA> <NA> 2
## 8 1147 19-24 25-34K <NA> <NA> 2
## 9 1147 19-24 25-34K <NA> <NA> 2
## 10 1147 19-24 25-34K <NA> <NA> 2
## # ℹ 43,903 more rows
## # ℹ 18 more variables: household_comp <ord>, kids_count <ord>, store_id <chr>,
## # basket_id <chr>, product_id <chr>, quantity <dbl>, sales_value <dbl>,
## # retail_disc <dbl>, coupon_disc <dbl>, coupon_match_disc <dbl>, week <int>,
## # transaction_timestamp <dttm>, manufacturer_id <chr>, department <chr>,
## # brand <fct>, product_category <chr>, product_type <chr>, package_size <chr>
Now that the 19-24 data has been separated from the other age groups, I can explore it to uncover the buying habits of young adults. I will examine which products they purchase the most, which they purchase the least, and which products we can dive into further.
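One way to look for products the group is not buying is to compare category shares between the 19-24 group and everyone else. The following is a minimal sketch on made-up toy data that reuses the `age` and `product_category` column names from `Alltransactions`. The toy rows and the names `category_share` and `not_bought_by_young` are illustrative assumptions.

```r
library(dplyr)
library(tibble)

# Toy transactions (made-up values) using the report's column names.
toy <- tibble(
  age = c("19-24", "19-24", "19-24", "45-54", "45-54", "45-54", "45-54"),
  product_category = c("SOFT DRINKS", "SOFT DRINKS", "BREAD",
                       "BREAD", "VITAMINS", "VITAMINS", "SOFT DRINKS")
)

# Share of each group's transaction rows that falls in each category.
category_share <- toy %>%
  mutate(group = if_else(age == "19-24", "young", "other")) %>%
  count(group, product_category) %>%
  group_by(group) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

# Categories that other shoppers buy but the 19-24 group never does.
not_bought_by_young <- category_share %>%
  filter(group == "other") %>%
  anti_join(filter(category_share, group == "young"),
            by = "product_category") %>%
  pull(product_category)

not_bought_by_young
```

On the real data, the same pipeline could be run on `Alltransactions` in place of `toy`. Comparing `share` between the two groups would also surface categories the young group buys, but at a lower rate.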
First, I wanted to see how the amount of purchasing varies by age group. This graph, which counts the purchase records for each age group, shows how little the 19-24 age group buys in our store. This is the problem that I am going to address.
ggplot(Alltransactions, aes(x = age)) +
geom_bar() +
ggtitle("Sales Per Age")
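The bar chart above counts transaction rows per age group. Dollar sales and sales per household can tell a different story, since an age group with few households in the data will have a short bar even if each household spends a lot. Here is a minimal sketch on made-up toy data with the report's column names. The values and the name `sales_by_age` are illustrative assumptions.

```r
library(dplyr)
library(tibble)

# Toy transactions (made-up values) using the report's column names.
toy <- tibble(
  age          = c("19-24", "19-24", "45-54", "45-54", "45-54"),
  household_id = c("1", "1", "2", "3", "3"),
  sales_value  = c(2, 3, 10, 5, 5)
)

# Summarise rows, dollars, and households for each age group.
sales_by_age <- toy %>%
  group_by(age) %>%
  summarise(
    transactions        = n(),
    total_sales         = sum(sales_value),
    households          = n_distinct(household_id),
    sales_per_household = total_sales / households
  )

sales_by_age
```

Running the same summary on `Alltransactions` would show whether the 19-24 group spends less per household or is simply a smaller group.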
Now that I have seen how little this age group purchases from our store, I want to see how much purchasing power they have.
ggplot(youngagedata, aes(x = income)) +
geom_bar() +
ggtitle("Income Distribution for 19-24 Year Olds")
People in the 19-24 age group are going through many changes in their lives. Some are attending or have graduated from higher education. Some did not attend college and went straight into the workforce. In this bar chart you can see two large spikes in the count. The high count in the “Under 15K” range is most likely people who have graduated from high school but do not currently have a full-time job. The other spike is in the range between 35K and 74K. This bracket is most likely people who found a full-time job after high school, or who have graduated from college and are beginning their careers.
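Because `youngagedata` has one row per purchase, the bars above count purchases in each income bracket, so a household that shops often counts many times. As a minimal sketch, here is how each household could be counted once instead, shown on made-up toy data. The values and the names `by_rows` and `by_household` are illustrative assumptions.

```r
library(dplyr)
library(tibble)

# Toy purchase-level data (made-up values): household "1" shops three times.
toy_young <- tibble(
  household_id = c("1", "1", "1", "2"),
  income       = c("Under 15K", "Under 15K", "Under 15K", "35-49K")
)

# Counting rows weights each bracket by number of purchases.
by_rows <- count(toy_young, income)

# Keeping one row per household counts each household once.
by_household <- toy_young %>%
  distinct(household_id, income) %>%
  count(income)

by_household
```

The `by_household` result could be plotted with `geom_col()` to show how many 19-24 households fall in each income bracket.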
Another thing to note is the household size of this group.
ggplot(youngagedata, aes(x = household_size)) +
geom_bar() +
ggtitle("Household Size for 19-24 Year Olds")
As you can see, most are living by themselves or with one other person. This means that the average spending for this group will most likely be low as well.
ggplot(youngagedata, aes(x = quantity, y = sales_value)) +
geom_smooth()+
ggtitle("Sale Values for 19-24 Year Olds")
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
This chart shows the relationship between the sales value and the quantity purchased for the 19-24 age group. Next is a graph that shows the sales value relative to the quantity purchased for all age groups.
ggplot(Alltransactions, aes(x = quantity, y = sales_value)) +
geom_smooth()+
ggtitle("Sale Values for All Ages")
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
With these two graphs, you can see that the sales values are closely related. This is useful to know because it suggests that the 19-24 group buys goods of around the same value as other shoppers, just on a much smaller scale. Looking at this, I think the best way to improve sales within this age group is to increase how often they come in and shop at our store.
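The idea of increasing shopping frequency can be checked directly by counting store trips per household. Below is a minimal sketch on made-up toy data that treats each distinct `basket_id` as one trip. The values and the name `trips` are illustrative assumptions.

```r
library(dplyr)
library(tibble)

# Toy transactions (made-up values) using the report's column names.
toy <- tibble(
  age          = c("19-24", "19-24", "19-24",
                   "45-54", "45-54", "45-54", "45-54", "45-54"),
  household_id = c("1", "1", "1", "2", "2", "2", "3", "3"),
  basket_id    = c("b1", "b1", "b2", "b3", "b4", "b7", "b5", "b6"),
  sales_value  = c(2, 3, 5, 4, 6, 10, 10, 20)
)

# Trips per household and average spend per trip, by age group.
trips <- toy %>%
  group_by(age) %>%
  summarise(
    households            = n_distinct(household_id),
    baskets               = n_distinct(basket_id),
    baskets_per_household = baskets / households,
    avg_basket_value      = sum(sales_value) / baskets
  )

trips
```

If `baskets_per_household` on the real data is clearly lower for the 19-24 group while `avg_basket_value` is similar, that would support focusing on visit frequency.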
ggplot(youngagedata, aes(sales_value, department)) +
geom_point()+
ggtitle("Sale Values per Department")
This graph shows the sales value of each purchase made by the 19-24 age group. Given our strategy of keeping the average price low, we can see that Drug GM, Spirits, Miscellaneous, and Fuel are a little pricier than the other departments. These are areas where we could look into lowering prices or providing incentives for students and younger adults. What incentives could we provide? We could give students a discount on fuel if they show their student ID before buying gas. This would lower the average price of gas for students, and they may feel more comfortable buying larger amounts. We could also lower the prices of Drug GM products that students mainly buy, such as allergy pills, sleep aids, or health products for stress.
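The scatter plot shows every purchase, which can make departments hard to compare by eye. A per-department summary gives the same comparison as numbers. This is a minimal sketch on made-up toy data. The values and the name `dept_prices` are illustrative assumptions, not figures from the report.

```r
library(dplyr)
library(tibble)

# Toy purchases for the 19-24 group (made-up values).
toy_young <- tibble(
  department  = c("GROCERY", "GROCERY", "FUEL", "DRUG GM"),
  sales_value = c(4, 3, 20, 8)
)

# Typical sales value per purchase in each department, priciest first.
dept_prices <- toy_young %>%
  group_by(department) %>%
  summarise(
    purchases    = n(),
    mean_value   = mean(sales_value),
    median_value = median(sales_value)
  ) %>%
  arrange(desc(mean_value))

dept_prices
```

Applied to `youngagedata`, the top rows of this table would point to the departments where discounts or student incentives might matter most.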
The 19-24 age bracket is a very important demographic to target. Increasing sales within this age group gives us the opportunity to build a relationship with these customers right from the start, as they begin to grocery shop for themselves. The small household sizes tell us that they do not need to purchase a lot of items, so we can market smaller quantities of items in order to get them into our store rather than a competitor's. Most do not have a high income either, so we will also have to keep the average price of the items we market to them low.
Looking at the data that I have uncovered in this report, I have come to two final findings. First, since the 19-24 demographic does not have a high share of total sales, we should begin to market towards them, both inside the store and outside of it. I think we should have special deals for students that lower the cost of everyday items they may purchase, including fuel and health products for young adults. Second, we should focus on marketing towards younger adults in general. With the average household size for this group being low, we could advertise cheaper items that younger people could use to get started in a new chapter of their lives. Getting them into our store and letting them become familiar with our company will be pivotal in increasing sales.