3. DATA PREPARATION
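#Loading the packages this section relies on
library(completejourney) #get_transactions(), products, demographics
library(dplyr) #%>%, inner_join(), mutate(), glimpse()
library(lubridate) #date()
#Importing data sets needed for analysis
transactions <- get_transactions()
#products and demographics ship with completejourney; these lines copy them into the workspace
products <- products
demographics <- demographics
#Joining, mutating and cleaning data sets
#Inner joins keep only transactions with a matching product and a household demographic record
tran_prod_dem <- transactions %>%
  inner_join(products, by = "product_id") %>%
  inner_join(demographics, by = "household_id") %>%
  mutate(Transaction_date = date(transaction_timestamp))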
glimpse(tran_prod_dem)
## Rows: 825,622
## Columns: 25
## $ household_id <chr> "900", "900", "1228", "906", "906", "906", "906"…
## $ store_id <chr> "330", "330", "406", "319", "319", "319", "319",…
## $ basket_id <chr> "31198570044", "31198570047", "31198655051", "31…
## $ product_id <chr> "1095275", "9878513", "1041453", "1020156", "105…
## $ quantity <dbl> 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ sales_value <dbl> 0.50, 0.99, 1.43, 1.50, 2.78, 5.49, 1.50, 1.00, …
## $ retail_disc <dbl> 0.00, 0.10, 0.15, 0.29, 0.80, 0.50, 0.29, 0.29, …
## $ coupon_disc <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ coupon_match_disc <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ week <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ transaction_timestamp <dttm> 2017-01-01 06:53:26, 2017-01-01 07:10:28, 2017-…
## $ manufacturer_id <chr> "2", "69", "69", "2142", "2326", "608", "2326", …
## $ department <chr> "PASTRY", "GROCERY", "GROCERY", "GROCERY", "GROC…
## $ brand <fct> National, Private, Private, National, National, …
## $ product_category <chr> "ROLLS", "FACIAL TISS/DNR NAPKIN", "BAG SNACKS",…
## $ product_type <chr> "ROLLS: BAGELS", "FACIAL TISSUE & PAPER HANDKE",…
## $ package_size <chr> "4 OZ", "85 CT", "11.5 OZ", "17.1 OZ", "5.0 OZ",…
## $ age <ord> 35-44, 35-44, 45-54, 55-64, 55-64, 55-64, 55-64,…
## $ income <ord> 35-49K, 35-49K, 100-124K, Under 15K, Under 15K, …
## $ home_ownership <ord> Homeowner, Homeowner, NA, Homeowner, Homeowner, …
## $ marital_status <ord> Married, Married, Unmarried, Married, Married, M…
## $ household_size <ord> 2, 2, 1, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ household_comp <ord> 2 Adults No Kids, 2 Adults No Kids, 1 Adult No K…
## $ kids_count <ord> 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ Transaction_date <date> 2017-01-01, 2017-01-01, 2017-01-01, 2017-01-01,…
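Because the pipeline uses inner joins, transactions from households without a demographic record are dropped. The sketch below shows one way to check how many rows a join of this kind removes. It runs on small toy tables whose values are hypothetical; the names `transactions_toy`, `demographics_toy`, and `join_check` are illustrative and are not part of the original analysis.

```r
library(dplyr)

# Toy stand-ins for the real tables; all values are hypothetical.
transactions_toy <- tibble(
  household_id = c("1", "1", "2", "3"),
  sales_value  = c(1.00, 2.00, 3.00, 4.00)
)
demographics_toy <- tibble(
  household_id = c("1", "2"),
  income       = c("35-49K", "Under 15K")
)

# Rows that survive the inner join (households with a demographic record).
kept <- inner_join(transactions_toy, demographics_toy, by = "household_id")

# Rows the inner join drops (households with no demographic record).
dropped <- anti_join(transactions_toy, demographics_toy, by = "household_id")

# Summary of the effect of the join on row counts.
join_check <- tibble(
  rows_before  = nrow(transactions_toy),
  rows_kept    = nrow(kept),
  rows_dropped = nrow(dropped)
)
print(join_check)
```

Applied to the real tables, the same pattern would use `transactions` and `demographics` in place of the toy tables.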
5. SUMMARY
Problem:
The goal of analyzing the Complete Journey data was to evaluate how
top sales values and specific demographics could be used to identify
purchasing behaviors and trends, and from those to determine the best
products to invest in and the best time to invest.
Addressing the Problem:
We addressed this by combining the transaction, demographic, and
product data and visualizing the result. We generated graphs that
compare total sales by household income range and number of children,
identify the products with the highest sales values for households with
five or more members, and show the sales values of the top three
products each month. Additional graphs show total monthly sales for
2017 and sales values by department, split by national and private
brand. We also used word clouds to highlight the most frequent
departments and transactions. Together, these graphs show how
demographics relate to purchasing behaviors.
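As an illustration of the aggregations behind the monthly graphs, here is a minimal sketch. It assumes a data frame with the `Transaction_date`, `product_type`, and `sales_value` columns shown in the data preparation output. The function names `monthly_sales` and `top_products_by_month` are hypothetical, grouping products by `product_type` is an assumption, and the toy rows are invented for the example.

```r
library(dplyr)
library(lubridate)

# Total sales value per calendar month.
monthly_sales <- function(df) {
  df %>%
    mutate(month = floor_date(Transaction_date, "month")) %>%
    group_by(month) %>%
    summarise(total_sales = sum(sales_value), .groups = "drop")
}

# The n products with the highest sales value within each month.
top_products_by_month <- function(df, n = 3) {
  df %>%
    mutate(month = floor_date(Transaction_date, "month")) %>%
    group_by(month, product_type) %>%
    summarise(total_sales = sum(sales_value), .groups = "drop") %>%
    group_by(month) %>%
    slice_max(total_sales, n = n, with_ties = FALSE) %>%
    ungroup()
}

# Toy data with hypothetical values, standing in for tran_prod_dem.
toy <- tibble(
  Transaction_date = as.Date(c("2017-01-05", "2017-01-20",
                               "2017-02-03", "2017-02-10")),
  product_type     = c("A", "B", "B", "B"),
  sales_value      = c(2.50, 1.00, 1.50, 0.50)
)

monthly <- monthly_sales(toy)
top_one <- top_products_by_month(toy, n = 1)
print(monthly)
print(top_one)
```

On the full data set, `monthly_sales(tran_prod_dem)` would give one row per month, which can then be passed to a plotting function.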
Insights:
We discovered that the highest-income families were not the ones
contributing the most to total sales, which was unexpected. Through our
analysis of sales value by department, we identified several departments
with relatively low sales, and noted that most departments had a
significant number of customers purchasing private brands. One standout
finding was that “Fluid White Milk” emerged as a top-selling product for
families with five or more members, ranking among the highest-selling
products overall and maintaining consistent demand throughout the year.
Total sales also held steady through most of 2017, including the summer
months, then fell sharply to below $4,000 between November and December
before spiking to nearly $16,000 at the end of the year. Our word cloud
analysis also showed that the Grocery department had the highest number
of transactions, with other departments trailing by about 5%, which
suggests that Grocery would be a strong area for future investment.
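The department and brand comparisons can be approximated with two small summaries, sketched below. The sketch assumes that each row of the joined data is one purchased product line and counts rows as transactions, which may differ from the counting used for the word cloud. The function names `department_share` and `brand_split` are hypothetical, and the toy values are invented.

```r
library(dplyr)

# Share of transaction rows that fall in each department.
department_share <- function(df) {
  df %>%
    count(department, name = "n_rows") %>%
    mutate(share = n_rows / sum(n_rows)) %>%
    arrange(desc(share))
}

# Within each department, the share of sales value by brand type.
brand_split <- function(df) {
  df %>%
    group_by(department, brand) %>%
    summarise(total_sales = sum(sales_value), .groups = "drop_last") %>%
    mutate(brand_share = total_sales / sum(total_sales)) %>%
    ungroup()
}

# Toy data with hypothetical values, standing in for tran_prod_dem.
toy <- tibble(
  department  = c("GROCERY", "GROCERY", "GROCERY", "PASTRY"),
  brand       = c("National", "Private", "Private", "National"),
  sales_value = c(1, 2, 1, 4)
)

shares <- department_share(toy)
brands <- brand_split(toy)
print(shares)
print(brands)
```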
Implications:
From these insights, we concluded that while higher-income families
might be expected to account for a larger portion of total sales, the
number of children or overall family size plays a more significant role.
Based on our findings, it would be beneficial to invest in departments
with higher sales values, as indicated in the “Sales Value per
Department,” “Top 3 Products,” and “Sales for Households over 5”
analyses. Departments such as Grocery, Drug GM, Miscellaneous, and Fuel
had higher sales value and a wider variety of products purchased by a
broader range of customers.
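Total sales for an income bracket depend on how many households fall in that bracket as well as on how much each household spends. The following sketch separates the two by reporting both the total and the per-household average. It is an illustrative check rather than the computation behind the original graphs; the function name `sales_by_income` and the toy values are hypothetical.

```r
library(dplyr)

# Total sales, household count, and average sales per household by income bracket.
sales_by_income <- function(df) {
  df %>%
    group_by(income) %>%
    summarise(
      total_sales = sum(sales_value),
      households  = n_distinct(household_id),
      .groups     = "drop"
    ) %>%
    mutate(sales_per_household = total_sales / households)
}

# Toy data with hypothetical values, standing in for tran_prod_dem.
toy <- tibble(
  household_id = c("1", "2", "3", "3"),
  income       = c("Under 15K", "Under 15K", "100-124K", "100-124K"),
  sales_value  = c(10, 10, 8, 7)
)

by_income <- sales_by_income(toy)
print(by_income)
```

In this toy example the higher bracket has lower total sales but higher sales per household, which shows why the two measures are worth reading side by side.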
Additionally, the “Total Sales of All Products per Month” graph
suggests that the optimal time to invest is toward the end of the year
when sales spike, likely due to holiday shopping. This makes the holiday
season a key period for maximizing returns on investment.
We recommend leveraging the strong sales performance of top products
and departments to create targeted marketing strategies aimed at larger
households, families with more children, and specific income brackets.
Promotions that focus on these demographics could drive further
engagement. Furthermore, it may be worthwhile to address departments
with lower sales by developing strategies to attract these demographics,
thereby increasing overall sales in those areas.
Limitations of our analysis:
Our main limitation is that we examined only a few demographic
relationships with sales values. Future analysis should explore which
departments have the highest purchase rates across different demographic
segments. More detailed statistics on age would also give a clearer
picture of the age groups involved and support more effective, tailored
marketing strategies. A more comprehensive analysis of the remaining
demographic attributes would offer a more nuanced understanding of
customer behavior and purchasing trends.
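One possible starting point for that follow-up is sketched below: for a chosen demographic column, it computes each department's share of sales value within each segment. It assumes the `department`, `sales_value`, and demographic columns from the joined data. The function name `department_mix_by_segment` is hypothetical, and the toy values are invented.

```r
library(dplyr)

# Share of each segment's sales value that goes to each department.
# `segment` is an unquoted column name, for example age or income.
department_mix_by_segment <- function(df, segment) {
  df %>%
    group_by({{ segment }}, department) %>%
    summarise(total_sales = sum(sales_value), .groups = "drop_last") %>%
    mutate(share_within_segment = total_sales / sum(total_sales)) %>%
    ungroup()
}

# Toy data with hypothetical values, standing in for tran_prod_dem.
toy <- tibble(
  age         = c("35-44", "35-44", "55-64"),
  department  = c("GROCERY", "PASTRY", "GROCERY"),
  sales_value = c(3, 1, 2)
)

mix <- department_mix_by_segment(toy, age)
print(mix)
```

The same function could be called with `income`, `household_size`, or `kids_count` to compare other segments.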