Introduction

There is an opportunity to grow the sale of paper products together rather than seperately.

How we approached our solution

By first identifying which paper products were sold the most depending on household size, we were able to first determine that both toilet tissue and paper towels & holders were the two most prominent paper products.

The Utility from our Analysis

Packages/Libraries Required

complete journey - provides access to data sets in order to perform our analysis.

tidyverse - collection of open souce packages.

lubridate - allows for the manipulation of dates and times.

dplyr - helps with data manipulation.

ggplot - allows for the creation of graphs using our data.

Exploratory Data Analysis

library(completejourney)
## Welcome to the completejourney package! Learn more about these data
## sets at http://bit.ly/completejourney.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(dplyr)
library(ggplot2)
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
transactions <- get_transactions()
products %>%
  filter(str_detect(product_type, regex("(TOILET TISSUE)|(PAPER NAPKINS)|(PAPER TOWELS & HOLDERS)|(BASIC PAPER PLATES)"))) %>%
  inner_join(transactions, by = 'product_id') %>%
  inner_join(demographics, by = 'household_id') %>%
  group_by(household_size, product_type) %>%
  summarise(total_sales = sum(sales_value, na.rm = TRUE)) %>%
  ggplot(aes(x = product_type, y = total_sales)) +
  geom_col() +
  facet_wrap(~ household_size) +
  coord_flip() +
  scale_y_continuous("Total Sales") +
  scale_x_discrete("Paper Products") +
  ggtitle("Which Paper Product is Bought the Most per Household?")

Graph Description

The Paper Products graph shows us the highest demanded paper based products are toilet tissue as well paper towels & holders. It is to note that although it shows that household sizes with one or two people appear to buy the most, this is due to the fact that there is more data for the household sizes in comparison to the other observations.
house_1 <- transactions %>%
  mutate(day = wday(transaction_timestamp, label = TRUE)) %>%
  inner_join(products, by = 'product_id') %>%
  inner_join(demographics, by = "household_id") %>%
  filter(str_detect(product_type, regex("(TOILET TISSUE)"))) %>%
  group_by(household_size, day, product_type) %>%
  summarize(total_sales = sum(sales_value, na.rm = TRUE)) %>%
  filter(household_size == "1") %>%
  arrange(desc(total_sales))
print(house_1)
## # A tibble: 7 × 4
## # Groups:   household_size, day [7]
##   household_size day   product_type  total_sales
##   <ord>          <ord> <chr>               <dbl>
## 1 1              Sat   TOILET TISSUE        919.
## 2 1              Sun   TOILET TISSUE        885.
## 3 1              Fri   TOILET TISSUE        863.
## 4 1              Thu   TOILET TISSUE        773.
## 5 1              Tue   TOILET TISSUE        709.
## 6 1              Mon   TOILET TISSUE        678.
## 7 1              Wed   TOILET TISSUE        583.
For households with only one occupant, the three days with the highest sale of toilet tissue are Saturday, Sunday, and Friday.
house_2 <- transactions %>%
  mutate(day = wday(transaction_timestamp, label = TRUE)) %>%
  inner_join(products, by = 'product_id') %>%
  inner_join(demographics, by = "household_id") %>%
  filter(str_detect(product_type, regex("(TOILET TISSUE)"))) %>%
  group_by(household_size, day, product_type) %>%
  summarize(total_sales = sum(sales_value, na.rm = TRUE)) %>%
  filter(household_size == "2") %>%
  arrange(desc(total_sales))
print(house_2)
## # A tibble: 7 × 4
## # Groups:   household_size, day [7]
##   household_size day   product_type  total_sales
##   <ord>          <ord> <chr>               <dbl>
## 1 2              Sun   TOILET TISSUE       1399.
## 2 2              Sat   TOILET TISSUE       1392.
## 3 2              Fri   TOILET TISSUE       1064.
## 4 2              Tue   TOILET TISSUE       1037.
## 5 2              Mon   TOILET TISSUE       1036.
## 6 2              Wed   TOILET TISSUE       1032.
## 7 2              Thu   TOILET TISSUE        880.
For households with two occupants, the three days with the highest sale of toilet tissue are Sunday, Saturday, and Friday.
house_3 <- transactions %>%
  mutate(day = wday(transaction_timestamp, label = TRUE)) %>%
  inner_join(products, by = 'product_id') %>%
  inner_join(demographics, by = "household_id") %>%
  filter(str_detect(product_type, regex("(TOILET TISSUE)"))) %>%
  group_by(household_size, day, product_type) %>%
  summarize(total_sales = sum(sales_value, na.rm = TRUE)) %>%
  filter(household_size == "3") %>%
  arrange(desc(total_sales))
print(house_3)
## # A tibble: 7 × 4
## # Groups:   household_size, day [7]
##   household_size day   product_type  total_sales
##   <ord>          <ord> <chr>               <dbl>
## 1 3              Sun   TOILET TISSUE       1038.
## 2 3              Sat   TOILET TISSUE        551.
## 3 3              Mon   TOILET TISSUE        403.
## 4 3              Tue   TOILET TISSUE        397.
## 5 3              Fri   TOILET TISSUE        360.
## 6 3              Wed   TOILET TISSUE        279.
## 7 3              Thu   TOILET TISSUE        202.
For households with three occupants, the three days with the highest sale of toilet tissue are Sunday, Saturday, and Monday.
house_4 <- transactions %>%
  mutate(day = wday(transaction_timestamp, label = TRUE)) %>%
  inner_join(products, by = 'product_id') %>%
  inner_join(demographics, by = "household_id") %>%
  filter(str_detect(product_type, regex("(TOILET TISSUE)"))) %>%
  group_by(household_size, day, product_type) %>%
  summarize(total_sales = sum(sales_value, na.rm = TRUE)) %>%
  filter(household_size == "4") %>%
  arrange(desc(total_sales))
print(house_4)
## # A tibble: 7 × 4
## # Groups:   household_size, day [7]
##   household_size day   product_type  total_sales
##   <ord>          <ord> <chr>               <dbl>
## 1 4              Sun   TOILET TISSUE        299.
## 2 4              Fri   TOILET TISSUE        240.
## 3 4              Mon   TOILET TISSUE        219.
## 4 4              Sat   TOILET TISSUE        218.
## 5 4              Tue   TOILET TISSUE        187.
## 6 4              Thu   TOILET TISSUE        181.
## 7 4              Wed   TOILET TISSUE        165.
For households with four occupants, the three days with the highest sale of toilet tissue are Sunday, Friday, and Monday.
house_5 <- transactions %>%
  mutate(day = wday(transaction_timestamp, label = TRUE)) %>%
  inner_join(products, by = 'product_id') %>%
  inner_join(demographics, by = "household_id") %>%
  filter(str_detect(product_type, regex("(TOILET TISSUE)"))) %>%
  group_by(household_size, day, product_type) %>%
  summarize(total_sales = sum(sales_value, na.rm = TRUE)) %>%
  filter(household_size == "5+") %>%
  arrange(desc(total_sales))
print(house_5)
## # A tibble: 7 × 4
## # Groups:   household_size, day [7]
##   household_size day   product_type  total_sales
##   <ord>          <ord> <chr>               <dbl>
## 1 5+             Fri   TOILET TISSUE        288.
## 2 5+             Tue   TOILET TISSUE        217.
## 3 5+             Sat   TOILET TISSUE        215.
## 4 5+             Sun   TOILET TISSUE        208.
## 5 5+             Wed   TOILET TISSUE        184.
## 6 5+             Mon   TOILET TISSUE        160.
## 7 5+             Thu   TOILET TISSUE        152.
For households with five plus occupants, the three days with the highest sale of toilet tissue are Friday, Tuesday, and Saturday.
house_a <- transactions %>%
  mutate(day = wday(transaction_timestamp, label = TRUE)) %>%
  inner_join(products, by = 'product_id') %>%
  inner_join(demographics, by = "household_id") %>%
  filter(str_detect(product_type, regex("(PAPER TOWELS & HOLDERS)"))) %>%
  group_by(household_size, day, product_type) %>%
  summarize(total_sales = sum(sales_value, na.rm = TRUE)) %>%
  filter(household_size == "1") %>%
  arrange(desc(total_sales))
print(house_a)
## # A tibble: 7 × 4
## # Groups:   household_size, day [7]
##   household_size day   product_type           total_sales
##   <ord>          <ord> <chr>                        <dbl>
## 1 1              Sun   PAPER TOWELS & HOLDERS        472.
## 2 1              Sat   PAPER TOWELS & HOLDERS        464.
## 3 1              Thu   PAPER TOWELS & HOLDERS        455.
## 4 1              Fri   PAPER TOWELS & HOLDERS        381.
## 5 1              Mon   PAPER TOWELS & HOLDERS        351.
## 6 1              Wed   PAPER TOWELS & HOLDERS        349.
## 7 1              Tue   PAPER TOWELS & HOLDERS        319.
For households with only one occupant, the three days with the highest sale of paper towels & holders are Sunday, Saturday, and Friday.
house_b <- transactions %>%
  mutate(day = wday(transaction_timestamp, label = TRUE)) %>%
  inner_join(products, by = 'product_id') %>%
  inner_join(demographics, by = "household_id") %>%
  filter(str_detect(product_type, regex("(PAPER TOWELS & HOLDERS)"))) %>%
  group_by(household_size, day, product_type) %>%
  summarize(total_sales = sum(sales_value, na.rm = TRUE)) %>%
  filter(household_size == "2") %>%
  arrange(desc(total_sales))
print(house_b)
## # A tibble: 7 × 4
## # Groups:   household_size, day [7]
##   household_size day   product_type           total_sales
##   <ord>          <ord> <chr>                        <dbl>
## 1 2              Sun   PAPER TOWELS & HOLDERS        796.
## 2 2              Sat   PAPER TOWELS & HOLDERS        730.
## 3 2              Tue   PAPER TOWELS & HOLDERS        686.
## 4 2              Fri   PAPER TOWELS & HOLDERS        577.
## 5 2              Mon   PAPER TOWELS & HOLDERS        564.
## 6 2              Thu   PAPER TOWELS & HOLDERS        524.
## 7 2              Wed   PAPER TOWELS & HOLDERS        520.
For households with two occupants, the three days with the highest sale of paper towels & holders are Sunday, Saturday, and Tuesday.
house_c <- transactions %>%
  mutate(day = wday(transaction_timestamp, label = TRUE)) %>%
  inner_join(products, by = 'product_id') %>%
  inner_join(demographics, by = "household_id") %>%
  filter(str_detect(product_type, regex("(PAPER TOWELS & HOLDERS)"))) %>%
  group_by(household_size, day, product_type) %>%
  summarize(total_sales = sum(sales_value, na.rm = TRUE)) %>%
  filter(household_size == "3") %>%
  arrange(desc(total_sales))
print(house_c)
## # A tibble: 7 × 4
## # Groups:   household_size, day [7]
##   household_size day   product_type           total_sales
##   <ord>          <ord> <chr>                        <dbl>
## 1 3              Sun   PAPER TOWELS & HOLDERS        546.
## 2 3              Sat   PAPER TOWELS & HOLDERS        280.
## 3 3              Tue   PAPER TOWELS & HOLDERS        237.
## 4 3              Mon   PAPER TOWELS & HOLDERS        217.
## 5 3              Wed   PAPER TOWELS & HOLDERS        189.
## 6 3              Fri   PAPER TOWELS & HOLDERS        182.
## 7 3              Thu   PAPER TOWELS & HOLDERS        151.
For households with three occupants, the three days with the highest sale of paper towels & holders are Sunday, Saturday, and Tuesday.
house_d <- transactions %>%
  mutate(day = wday(transaction_timestamp, label = TRUE)) %>%
  inner_join(products, by = 'product_id') %>%
  inner_join(demographics, by = "household_id") %>%
  filter(str_detect(product_type, regex("(PAPER TOWELS & HOLDERS)"))) %>%
  group_by(household_size, day, product_type) %>%
  summarize(total_sales = sum(sales_value, na.rm = TRUE)) %>%
  filter(household_size == "4") %>%
  arrange(desc(total_sales))
print(house_d)
## # A tibble: 7 × 4
## # Groups:   household_size, day [7]
##   household_size day   product_type           total_sales
##   <ord>          <ord> <chr>                        <dbl>
## 1 4              Sun   PAPER TOWELS & HOLDERS       184. 
## 2 4              Fri   PAPER TOWELS & HOLDERS       122. 
## 3 4              Mon   PAPER TOWELS & HOLDERS       115. 
## 4 4              Wed   PAPER TOWELS & HOLDERS       110. 
## 5 4              Tue   PAPER TOWELS & HOLDERS       101. 
## 6 4              Thu   PAPER TOWELS & HOLDERS        97.1
## 7 4              Sat   PAPER TOWELS & HOLDERS        81.6
For households with four occupants, the three days with the highest sale of paper towels & holders are Sunday, Friday, and Monday.
house_e <- transactions %>%
  mutate(day = wday(transaction_timestamp, label = TRUE)) %>%
  inner_join(products, by = 'product_id') %>%
  inner_join(demographics, by = "household_id") %>%
  filter(str_detect(product_type, regex("(PAPER TOWELS & HOLDERS)"))) %>%
  group_by(household_size, day, product_type) %>%
  summarize(total_sales = sum(sales_value, na.rm = TRUE)) %>%
  filter(household_size == "5+") %>%
  arrange(desc(total_sales))
print(house_e)
## # A tibble: 7 × 4
## # Groups:   household_size, day [7]
##   household_size day   product_type           total_sales
##   <ord>          <ord> <chr>                        <dbl>
## 1 5+             Sun   PAPER TOWELS & HOLDERS        194.
## 2 5+             Tue   PAPER TOWELS & HOLDERS        183.
## 3 5+             Sat   PAPER TOWELS & HOLDERS        175.
## 4 5+             Thu   PAPER TOWELS & HOLDERS        154.
## 5 5+             Wed   PAPER TOWELS & HOLDERS        137.
## 6 5+             Mon   PAPER TOWELS & HOLDERS        136.
## 7 5+             Fri   PAPER TOWELS & HOLDERS        134.
For households with five plus occupants, the three days with the highest sale of paper towels & holders are Sunday, Tuesday, and Saturday.

Summarize the Problem Statement Addressed

Through our analysis, we have discovered a common theme amongst purchasing toilet tissue and paper towels & holders on weekends rather than weekdays.

Summarize how you Addressed this Problem Statement

By analyzing buying habits of different household sizes as well as the date in which these paper products were bought, we were able to discover the similiar buying habits.

Summarize the Interesting Insights that you Analysis Provided

There was a large seperation between the top two paper based products (toilet tissue & paper towels & holders) opposed to the two other paper based products (paper napkins & basic paper plates).
There was also a large seperation between the amount purchased specifically on both Wednesday and Thursday, which failed to reach the top three most common days for all household sizes and both paper products.

Summarize the Implications to the Consumer and give a Reccommendation

We recommend a coupon to be created in which money is saved for the consumer, if they were to buy both toilet tissue and paper towels & holders together. We recommend that this coupon be delivered to consumers and reoccurring patrons specifically on a Sunday, then give the coupon a five day expiration date. We believe this is the best choice because a majority of consumers tend to do their weekly shopping on Sundays. With the expiration date, this would incentivize them to come back once throughout the week in order to redeem the coupon. This could induce the Diderot Effect, making for more purchases as consumers buy more then they had initially intended for throughout the week.

Limitations

The largest implicaiton of our analysis lies in the quantity of observations for mulitple household sizes, forbidding us to make the conclusion that specific household sizes buy more of a product compared to others.
Another implication within our analysis is the fact that consumers can buy their paper based products at a different location for instance a store such as Costco, which historically has had better prices for these types of products, especially for bulk purchases.