Final Project

suppressWarnings(suppressMessages(library(completejourney)))
suppressWarnings(suppressMessages(library(tidyverse)))  # attaches dplyr, ggplot2, stringr
suppressWarnings(suppressMessages(library(lubridate)))  # month() and wday()

Objective

I’d like to show ways to increase fruit sales at Regork. Fruit is a seasonal product: each fruit has a best and a worst time to eat. By looking at customers’ buying habits and at advertising strategy, Regork could reduce food waste from fruit products by advertising them to likely buyers.

# Loading Promotions and Transactions
promotions <- get_promotions()
transactions <- get_transactions() 

# Joining transactions with products and demographics
# (explicit keys; inner joins keep only rows with a match in both tables)
transactions <- transactions %>%
  inner_join(products, by = "product_id") %>%
  inner_join(demographics, by = "household_id")

# Plotting fruit sales based on product category
transactions %>%
  filter(str_detect(product_category, regex("fruit", ignore_case = TRUE))) %>%
  group_by(product_category) %>%
  summarize(total_sales = sum(sales_value, retail_disc, coupon_disc, coupon_match_disc)) %>%
  ggplot(aes(x = product_category, y = total_sales)) +
  geom_col(fill = "green") +
  coord_flip() +
  ggtitle("Sales of Each Fruit Product Category Purchased") +
  scale_y_continuous(name = "Sales", labels = scales::dollar) +
  scale_x_discrete(name = "Fruits Product Category")
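
In the chart code, `total_sales` is computed with `sum()` over four columns, which adds the three discount columns to `sales_value`. Assuming the discount columns hold positive dollar amounts (an assumption worth checking against the data), the result is closer to a pre-discount total than to the money received. A minimal sketch with invented numbers, not Regork data, shows the difference:

```r
# Invented values for two transactions; not taken from the Regork data
sales_value <- c(2.50, 4.00)   # dollars received at the register
retail_disc <- c(0.50, 0.00)   # loyalty discount applied (assumed positive)
coupon_disc <- c(0.00, 1.00)   # manufacturer coupon discount (assumed positive)

# sum() with several vectors adds every element of every vector
pre_discount_total <- sum(sales_value, retail_disc, coupon_disc)

# Total of the amounts actually received
received_total <- sum(sales_value)

pre_discount_total  # 8
received_total      # 6.5
```

Either figure can be a reasonable definition of "sales"; what matters is that the same definition is used in every chart, as it is in this report.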

Taking Product_Category into Account

As shown in the chart above, Tropical and Shelf Stable are the top fruit product categories by sales. However, because the goal is to use advertising to reduce the amount of fruit that perishes on Regork shelves, frozen, shelf-stable, and dried fruit are left out of the rest of the analysis, as these products are less likely to perish.

# Creating a filtered fruit list for only perishable fruits
# This will be used for the rest of the study
fruit_transactions <- transactions %>%
  filter(
    str_detect(product_category, regex("fruit", ignore_case = TRUE)),
    !str_detect(product_category, regex("frzn fruits", ignore_case = TRUE)),
    !str_detect(product_category, regex("dried fruit", ignore_case = TRUE)),
    !str_detect(product_category, regex("fruit - shelf stable", ignore_case = TRUE))
  )

# Plotting monthly fruit sales
fruit_transactions %>%
  group_by(Month = month(transaction_timestamp, label = TRUE)) %>%
  summarize(total_sales = sum(sales_value, retail_disc, coupon_disc, coupon_match_disc)) %>%
  ggplot(aes(x = Month, y = total_sales)) +
  geom_point() +
  ggtitle("Perishable Fruit Sales by Month") +
  scale_y_continuous(name = "Sales", labels = scales::dollar) +
  scale_x_discrete(name = "Month") + 
  geom_segment(aes(x = Month, xend = Month, y = 0, yend = total_sales), linewidth = 0.15)

Total fruit sales value by month

As shown in the chart above, perishable fruit sales rise sharply in the summer months compared with the rest of the year. This makes sense, as many fruits are at their peak in summer, but it also suggests that fall fruits like apples and citrus, or fruits whose season can start in spring like cherries, might not be bought for as long as they could be. Regork could therefore work on increasing perishable fruit sales for products that are at their best outside the summer months. Next, we look at perishable fruit sales by day of the week.
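
To put a number on the summer peak, one option is to express each month's sales as a share of the total. The sketch below uses a small invented table (`toy_fruit` and its values are illustrative assumptions, not Regork data); on the real data the same steps would start from `fruit_transactions`.

```r
library(dplyr)

# Toy transactions; months and values are invented for illustration
toy_fruit <- tibble(
  month = factor(c("May", "Jun", "Jun", "Jul", "Jul", "Oct"),
                 levels = c("May", "Jun", "Jul", "Oct")),
  sales_value = c(1, 2, 1, 3, 2, 1)
)

# Total sales per month, then each month's share of the overall total
monthly_share <- toy_fruit %>%
  group_by(month) %>%
  summarize(total_sales = sum(sales_value)) %>%
  mutate(share = total_sales / sum(total_sales))

monthly_share
```

In this toy table June and July together account for 80% of sales; the same `share` column on the real data would show how concentrated perishable fruit sales are in summer.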

# Plotting fruit sales by week day
fruit_transactions %>%
  group_by(day_of_week = wday(transaction_timestamp, label = TRUE)) %>%
  summarize(total_sales = sum(sales_value, retail_disc, coupon_disc, coupon_match_disc)) %>%
  ggplot(aes(x = day_of_week, y = total_sales)) +
  geom_col(fill = "orange") +
  geom_text(aes(label = scales::dollar(total_sales)), color = "black", size = 4) +
  ggtitle("Perishable Fruit Sales by Weekday") +
  scale_y_continuous(name = "Sales", labels = scales::dollar) +
  scale_x_discrete(name = "Weekday")

# Plotting quantity of fruit purchased by week day
fruit_transactions %>%
  group_by(day_of_week = wday(transaction_timestamp, label = TRUE)) %>%
  summarize(total_quantity = sum(quantity)) %>%
  ggplot(aes(x = day_of_week, y = total_quantity)) +
  geom_col(fill = "orange") +
  geom_text(aes(label = total_quantity), color = "black", size = 4) +
  ggtitle("Quantity of Perishable Fruit Sales by Weekday") +
  scale_y_continuous(name = "Quantity") +
  scale_x_discrete(name = "Weekday")

Total fruit sales value by day of week

As shown in the charts above, the expectation that people shop on weekends holds up, with Saturday and Sunday earning the most perishable fruit sales. Sunday ranks highest, both in the quantity of fruit bought and in sales value.
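
The weekend effect can also be summarized as a single number: the share of sales that falls on Saturday or Sunday. The following is a sketch on invented data (`toy_days` is hypothetical); it uses base R's `as.POSIXlt()` to get the weekday number so that it does not depend on locale-specific day names.

```r
library(dplyr)

# Toy transactions; 2017-01-01 was a Sunday and 2017-01-07 a Saturday
toy_days <- tibble(
  transaction_date = as.Date(c("2017-01-01", "2017-01-02",
                               "2017-01-04", "2017-01-07")),
  sales_value = c(4, 1, 2, 3)
)

weekend_share <- toy_days %>%
  mutate(
    wday_num = as.POSIXlt(transaction_date)$wday,  # 0 = Sunday, 6 = Saturday
    is_weekend = wday_num %in% c(0, 6)
  ) %>%
  summarize(weekend_share = sum(sales_value[is_weekend]) / sum(sales_value))

weekend_share
```

Since two of seven days are weekend days, a share well above 2/7 (about 29%) would support the claim that shoppers favor weekends.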

# Plotting fruit sales by marital status
fruit_transactions %>%
  group_by(marital_status, product_category) %>%
  summarize(
    total_sales = sum(sales_value, retail_disc, coupon_disc, coupon_match_disc),
    .groups = "drop"
  ) %>%
  ggplot(aes(x = marital_status, y = total_sales, fill = product_category)) +
  geom_col() +
  ggtitle("Perishable Fruit Sales by Marital Status") +
  scale_y_continuous(name = "Sales", labels = scales::dollar) +
  scale_x_discrete(name = "Marital Status")

# Plotting fruit sales by age
fruit_transactions %>%
  group_by(age, product_category) %>%
  summarize(
    total_sales = sum(sales_value, retail_disc, coupon_disc, coupon_match_disc),
    .groups = "drop"
  ) %>%
  ggplot(aes(x = age, y = total_sales, fill = product_category)) +
  geom_col() +
  ggtitle("Perishable Fruit Sales by Age Range") +
  scale_y_continuous(name = "Sales", labels = scales::dollar) +
  scale_x_discrete(name = "Age Ranges")

# Plotting fruit sales by income
fruit_transactions %>%
  group_by(income, product_category) %>%
  summarize(
    total_sales = sum(sales_value, retail_disc, coupon_disc, coupon_match_disc),
    .groups = "drop"
  ) %>%
  ggplot(aes(x = income, y = total_sales, fill = product_category)) +
  geom_col() +
  coord_flip() +
  ggtitle("Perishable Fruit Sales by Income Range") +
  scale_y_continuous(name = "Sales", labels = scales::dollar) +
  scale_x_discrete(name = "Income Ranges")

Demographics Data

From the charts above, we can learn a lot about the demographics of the people buying fruit. The most likely buyer of perishable fruit at Regork appears to be a married person between the ages of 45 and 54 with an income of 50-74 thousand dollars. Consistent with the first chart of total sales by fruit product category, it is no surprise that the most commonly bought perishable fruit is tropical fruit.
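
One caveat when reading total sales by demographic group is that a group can lead simply because it contains more households. A possible check, sketched here on invented data (`toy_households` is hypothetical), is to divide each group's sales by its number of distinct households.

```r
library(dplyr)

# Toy data: three married households and one unmarried household
toy_households <- tibble(
  household_id = c("h1", "h1", "h2", "h3", "h3", "h4"),
  marital_status = c("Married", "Married", "Married",
                     "Married", "Married", "Unmarried"),
  sales_value = c(2, 2, 4, 1, 3, 9)
)

per_household <- toy_households %>%
  group_by(marital_status) %>%
  summarize(
    total_sales = sum(sales_value),
    households = n_distinct(household_id),
    sales_per_household = total_sales / households
  )

per_household
```

In this toy example the married group has the higher total (12 versus 9) but the lower spend per household (4 versus 9), so the two views can point to different target groups.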

# Plotting fruit sales by in-store location
fruit_transactions %>%
  inner_join(promotions, by = c("store_id", "product_id", "week")) %>%
  group_by(display_location) %>%
  summarize(total_sales = sum(sales_value, retail_disc, coupon_disc, coupon_match_disc)) %>%
  ggplot(aes(x = display_location, y = total_sales)) +
  geom_col(fill = "blue") +
  ggtitle("Sales of Perishable Fruit by Display Location") +
  scale_y_continuous(name = "Sales", labels = scales::dollar) +
  scale_x_discrete(name = "Display Location", labels = c("0" = "No Display", 
                                                         "1" = "Store Front",
                                                         "2" = "Store Rear",
                                                         "3" = "Front End Cap",
                                                         "4" = "Mid-Aisle End Cap",
                                                         "5" = "Rear End Cap",
                                                         "6" = "Side Aisle End Cap",
                                                         "7" = "In-Aisle",
                                                         "9" = "Secondary Location",
                                                         "A" = "In-Shelf"))

Fruit sales value by display location

Looking at where perishable fruit is displayed in-store, most perishable fruit sales come from products that are not in a prominent display location. This suggests that location in the store is not a big factor in improving fruit sales: customers already know where to find fruit and do not appear to be persuaded by displays when buying perishable fruit.
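
Note that an inner join with `promotions` keeps only transactions that have a matching promotion record, so sales of fruit with no promotion record at all are not in the chart. A sketch of how a left join could keep them as their own group, using invented tables (`toy_transactions`, `toy_promotions`, and the "No promotion record" label are assumptions for illustration):

```r
library(dplyr)

toy_transactions <- tibble(
  store_id = c("s1", "s1", "s2"),
  product_id = c("p1", "p2", "p1"),
  week = c(1L, 1L, 2L),
  sales_value = c(5, 3, 2)
)

toy_promotions <- tibble(
  store_id = "s1",
  product_id = "p1",
  week = 1L,
  display_location = "1"
)

display_sales <- toy_transactions %>%
  # left_join keeps every transaction; unmatched rows get NA
  left_join(toy_promotions, by = c("store_id", "product_id", "week")) %>%
  mutate(display_location = coalesce(display_location, "No promotion record")) %>%
  group_by(display_location) %>%
  summarize(total_sales = sum(sales_value))

display_sales
```

If `display_location` is stored as a factor in the real data, converting it with `as.character()` before `coalesce()` keeps the types compatible.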

# Plotting fruit sales by mailer location
fruit_transactions %>%
  inner_join(promotions, by = c("store_id", "product_id", "week")) %>%
  group_by(mailer_location) %>%
  summarize(total_sales = sum(sales_value, retail_disc, coupon_disc, coupon_match_disc)) %>%
  ggplot(aes(x = mailer_location, y = total_sales)) +
  geom_col(fill = "blue") +
  ggtitle("Sales of Perishable Fruit by Mailer Location") +
  scale_y_continuous(name = "Sales", labels = scales::dollar) +
  scale_x_discrete(name = "Mailer Location", labels = c("0" = "Not on ad", 
                                                        "A" = "Interior Page Feature",
                                                        "C" = "Interior Page Line Item",
                                                        "D" = "Front Page Feature",
                                                        "F" = "Back Page Feature",
                                                        "H" = "Wrap Front Feature",
                                                        "J" = "Wrap Interior Coupon",
                                                        "L" = "Wrap Back Feature",
                                                        "P" = "Interior Page Coupon",
                                                        "X" = "Free on Interior Page",
                                                        "Z" = "Free on Front Page, Back Page, or Wrap"))

Sales value by mailer location

Among perishable fruit sales for products that appeared in the mailer, most come from products advertised as a ‘Back Page Feature’, with a second group advertised as a ‘Front Page Feature’. This suggests that mailer advertising works for perishable fruit, and that a back or front page feature works especially well.

Conclusion

Looking at the data we have gathered on Regork’s perishable fruit sales, we can see some areas that might improve those sales:

- Looking at fruit sales by product category, the largest portion of fruit products sold is shelf stable fruit, with tropical fruit second. This is great for not throwing bad fruit out, but it means we could be throwing out a lot of tropical fruit.
- From sales by month, we can see that summer months have very high perishable fruit sales, with this trend falling off in October. As the buying season for perishable fruit does not start until June, we could advertise citrus and other perishable fruits like cherries outside of the summer months. These advertisements would hopefully drive up fruit sales and decrease the amount of wasted product.
- From the demographics of people who purchase perishable fruit, our target demographic seems to be married people between 45 and 54 who make 50-74K.
- Location in the store does not seem to have a positive effect on people purchasing perishable fruit, so keeping fruit in its traditional location is best. However, the location of advertisements in mailers does seem to have an effect, so fruit advertisements should be placed in a front or back page feature if we are looking to increase sales.