Introduction



We have been tasked with identifying and analyzing potential growth opportunities where Regork can invest in order to increase revenue and profits. Companies like Regork are always looking to make an impact and to win the “hearts and minds” of the consumer. As a grocer, sometimes this can be difficult when you deal with commodities. How can you gain the attention of the consumer and get them into the store?

Do the targeted campaigns generate a greater amount of customer engagement?

Regork already holds much of the data needed to paint a picture of the consumer. With this data, we can see that the targeted Type A campaigns have a good effect on larger households of 5+, but less so on smaller households. Exploratory data analysis (EDA) techniques and visualizations allow us to identify potential ways to increase customer engagement through campaigns while reaching larger households.

Packages Required


The following R packages are required in order to run the code in this R project:

library(completejourney)          # grocery store shopping transactions data from group of 2,469 households
library(corrplot)                 # visualization of a correlation matrix
library(dplyr)                    # manipulating and transforming data (i.e., filtering, joining, etc.)
library(forcats)                  # working with categorical variables or factors
library(GGally)                   # plotting system for R based on "Grammar of Graphics"; extension to "ggplot2"
library(ggrepel)                  # position and repel overlapping text labels with "ggplot2"
library(ggplot2)                  # data visualization plotting system using "Grammar of Graphics"
library(ggthemes)                 # additional plotting themes, scales, and geoms for "ggplot2"
library(grid)                     # grid graphics plotting functions and capabilities
library(gridExtra)                # additional grid capability functions for "grid"
library(jpeg)                     # reading and writing JPEG images
library(knitr)                    # dynamic report generation in R
library(lubridate)                # functions used for working with dates and times
library(naniar)                   # find data quality issues; summarize, visualize, and manipulate missing data
library(png)                      # reading and writing PNG images
library(readr)                    # reading rectangular text data
library(stringr)                  # manipulating text strings
library(tidyr)                    # functions used for tidying or cleaning up messy data
library(tidyverse)                # tidying data and working with other R packages
library(treemapify)               # plotting treemaps in "ggplot2"

Data Preparation


The data for this R project can be accessed from the CompleteJourney website. The CompleteJourney datasets are based on grocery shopping transactions from a group of 2,469 households. Entities such as demographics, products, coupons, campaigns, etc., were collected over a one-year timeframe from January 2017 - December 2017.


Let’s Begin

The first task in the data preparation process is to get the full transactions and promotions datasets. Both datasets and the remaining CompleteJourney datasets were previewed in a table-like format called “tibble.”

# get the completejourney - transactions dataset
transactions <- get_transactions()
transactions
## # A tibble: 1,469,307 × 11
##    house…¹ store…² baske…³ produ…⁴ quant…⁵ sales…⁶ retai…⁷ coupo…⁸ coupo…⁹  week
##    <chr>   <chr>   <chr>   <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <int>
##  1 900     330     311985… 1095275       1    0.5     0          0       0     1
##  2 900     330     311985… 9878513       1    0.99    0.1        0       0     1
##  3 1228    406     311986… 1041453       1    1.43    0.15       0       0     1
##  4 906     319     311987… 1020156       1    1.5     0.29       0       0     1
##  5 906     319     311987… 1053875       2    2.78    0.8        0       0     1
##  6 906     319     311987… 1060312       1    5.49    0.5        0       0     1
##  7 906     319     311987… 1075313       1    1.5     0.29       0       0     1
##  8 1058    381     311986… 985893        1    1.88    0.21       0       0     1
##  9 1058    381     311986… 988791        1    1.5     1.29       0       0     1
## 10 1058    381     311986… 9297106       1    2.69    0          0       0     1
## # … with 1,469,297 more rows, 1 more variable: transaction_timestamp <dttm>,
## #   and abbreviated variable names ¹​household_id, ²​store_id, ³​basket_id,
## #   ⁴​product_id, ⁵​quantity, ⁶​sales_value, ⁷​retail_disc, ⁸​coupon_disc,
## #   ⁹​coupon_match_disc
# get the completejourney - promotions dataset
promotions <- get_promotions()
promotions
## # A tibble: 20,940,529 × 5
##    product_id store_id display_location mailer_location  week
##    <chr>      <chr>    <fct>            <fct>           <int>
##  1 1000050    316      9                0                   1
##  2 1000050    337      3                0                   1
##  3 1000050    441      5                0                   1
##  4 1000092    292      0                A                   1
##  5 1000092    293      0                A                   1
##  6 1000092    295      0                A                   1
##  7 1000092    298      0                A                   1
##  8 1000092    299      0                A                   1
##  9 1000092    304      0                A                   1
## 10 1000092    306      0                A                   1
## # … with 20,940,519 more rows
campaigns
## # A tibble: 6,589 × 2
##    campaign_id household_id
##    <chr>       <chr>       
##  1 1           105         
##  2 1           1238        
##  3 1           1258        
##  4 1           1483        
##  5 1           2200        
##  6 1           293         
##  7 1           529         
##  8 1           536         
##  9 1           568         
## 10 1           630         
## # … with 6,579 more rows
campaign_descriptions
## # A tibble: 27 × 4
##    campaign_id campaign_type start_date end_date  
##    <chr>       <ord>         <date>     <date>    
##  1 1           Type B        2017-03-03 2017-04-09
##  2 2           Type B        2017-03-08 2017-04-09
##  3 3           Type C        2017-03-13 2017-05-08
##  4 4           Type B        2017-03-29 2017-04-30
##  5 5           Type B        2017-04-03 2017-05-07
##  6 6           Type C        2017-04-19 2017-05-21
##  7 7           Type B        2017-04-24 2017-05-28
##  8 8           Type A        2017-05-08 2017-06-25
##  9 9           Type B        2017-05-31 2017-07-02
## 10 10          Type B        2017-06-28 2017-07-30
## # … with 17 more rows
coupons
## # A tibble: 116,204 × 3
##    coupon_upc  product_id campaign_id
##    <chr>       <chr>      <chr>      
##  1 10000085207 9676830    26         
##  2 10000085207 9676943    26         
##  3 10000085207 9676944    26         
##  4 10000085207 9676947    26         
##  5 10000085207 9677008    26         
##  6 10000085207 9677052    26         
##  7 10000085207 9677385    26         
##  8 10000085207 9677479    26         
##  9 10000085207 9677791    26         
## 10 10000085207 9677878    26         
## # … with 116,194 more rows
coupon_redemptions
## # A tibble: 2,102 × 4
##    household_id coupon_upc  campaign_id redemption_date
##    <chr>        <chr>       <chr>       <date>         
##  1 1029         51380041013 26          2017-01-01     
##  2 1029         51380041313 26          2017-01-01     
##  3 165          53377610033 26          2017-01-03     
##  4 712          51380041013 26          2017-01-07     
##  5 712          54300016033 26          2017-01-07     
##  6 2488         51200092776 26          2017-01-10     
##  7 2488         51410010050 26          2017-01-10     
##  8 1923         53000012033 26          2017-01-14     
##  9 1923         54300021057 26          2017-01-14     
## 10 1923         57047091041 26          2017-01-14     
## # … with 2,092 more rows
demographics
## # A tibble: 801 × 8
##    household_id age   income    home_ownership marital…¹ house…² house…³ kids_…⁴
##    <chr>        <ord> <ord>     <ord>          <ord>     <ord>   <ord>   <ord>  
##  1 1            65+   35-49K    Homeowner      Married   2       2 Adul… 0      
##  2 1001         45-54 50-74K    Homeowner      Unmarried 1       1 Adul… 0      
##  3 1003         35-44 25-34K    <NA>           Unmarried 1       1 Adul… 0      
##  4 1004         25-34 15-24K    <NA>           Unmarried 1       1 Adul… 0      
##  5 101          45-54 Under 15K Homeowner      Married   4       2 Adul… 2      
##  6 1012         35-44 35-49K    <NA>           Married   5+      2 Adul… 3+     
##  7 1014         45-54 15-24K    <NA>           Married   4       2 Adul… 2      
##  8 1015         45-54 50-74K    Homeowner      Unmarried 1       1 Adul… 0      
##  9 1018         45-54 35-49K    Homeowner      Married   5+      2 Adul… 3+     
## 10 1020         45-54 25-34K    Homeowner      Married   2       2 Adul… 0      
## # … with 791 more rows, and abbreviated variable names ¹​marital_status,
## #   ²​household_size, ³​household_comp, ⁴​kids_count
products
## # A tibble: 92,331 × 7
##    product_id manufacturer_id department    brand    product_c…¹ produ…² packa…³
##    <chr>      <chr>           <chr>         <fct>    <chr>       <chr>   <chr>  
##  1 25671      2               GROCERY       National FRZN ICE    ICE - … 22 LB  
##  2 26081      2               MISCELLANEOUS National <NA>        <NA>    <NA>   
##  3 26093      69              PASTRY        Private  BREAD       BREAD:… <NA>   
##  4 26190      69              GROCERY       Private  FRUIT - SH… APPLE … 50 OZ  
##  5 26355      69              GROCERY       Private  COOKIES/CO… SPECIA… 14 OZ  
##  6 26426      69              GROCERY       Private  SPICES & E… SPICES… 2.5 OZ 
##  7 26540      69              GROCERY       Private  COOKIES/CO… TRAY P… 16 OZ  
##  8 26601      69              DRUG GM       Private  VITAMINS    VITAMI… 300 CT…
##  9 26636      69              PASTRY        Private  BREAKFAST … SW GDS… <NA>   
## 10 26691      16              GROCERY       Private  PNT BTR/JE… HONEY   12 OZ  
## # … with 92,321 more rows, and abbreviated variable names ¹​product_category,
## #   ²​product_type, ³​package_size


Dataframes

This section of the data preparation combines all the specified dataframes that are used with the visualizations in this R project. References to “na.rm = TRUE” have been incorporated to remove “NA” values from the calculations or summarization of variables.

filtered_trans <- demographics %>%
  left_join(transactions) %>%
  mutate(transaction_quarter = quarter(transaction_timestamp, type = "year.quarter"))

full_view <- filtered_trans %>%
  inner_join(products, by = "product_id") %>%
  inner_join(coupon_redemptions) %>%
  inner_join(coupons) %>%
  inner_join(campaign_descriptions) %>%
  inner_join(campaigns)

savings_view <- full_view %>%
  dplyr::mutate(coupon_savings = (coupon_disc + coupon_match_disc)) %>%
  dplyr::mutate(customer_amount = (sales_value - (coupon_savings))) %>%
#  dplyr::mutate(campaign_id = as.integer(campaign_id)) %>%
  group_by(transaction_quarter, week, household_id, household_size, campaign_id, campaign_type, age, income, marital_status, quantity, sales_value) %>%
  summarise(total_saving = sum(coupon_savings))

spend_view <- full_view %>%
  dplyr::mutate(coupon_savings = (coupon_disc + coupon_match_disc)) %>%
  dplyr::mutate(customer_amount = (sales_value - (coupon_savings))) %>%
# dplyr::mutate(campaign_id = as.integer(campaign_id)) %>%
  group_by(transaction_quarter, week, household_id, household_size, campaign_id, campaign_type, age, income, marital_status) %>%
  summarise(total_spend = sum(customer_amount))

avg_spend <- demographics %>%
  inner_join(transactions) %>%
  inner_join(campaigns) %>%
  inner_join(campaign_descriptions) %>%
  group_by(household_id) %>%
  mutate(basket = sum(sales_value, na.rm = TRUE)) %>%  # household total, carried on every row
  group_by(household_size, campaign_type) %>%
  summarise(spending = mean(basket))

household_percent <- full_view %>% 
  count(campaign_type, household_size) %>%
  group_by(household_size) %>%
  mutate(pct = n / sum(n))

sales_campaign <- campaigns %>%
  inner_join(demographics, by = "household_id") %>%
  inner_join(campaign_descriptions, by = "campaign_id")

coupon_redeem <- coupon_redemptions %>%
  inner_join(demographics) %>%
  group_by(household_size)

# coupon_redeem %>% map_int(n_distinct)
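
To make the derived columns in savings_view and spend_view easier to follow, here is a minimal sketch on made-up data. The toy_trans table and its values are hypothetical; only the column names mirror the completejourney transactions table. An NA is introduced on purpose to show what "na.rm = TRUE" changes. The views above call sum() without it, which gives the same result as long as the discount columns contain no missing values.

```r
library(dplyr)

# Made-up transactions (hypothetical values, real column names)
toy_trans <- tibble(
  household_id      = c("1", "1", "2"),
  sales_value       = c(5.00, 3.50, 4.00),
  coupon_disc       = c(1.00, 0.00, NA),
  coupon_match_disc = c(0.50, 0.00, 0.25)
)

toy_summary <- toy_trans %>%
  mutate(
    coupon_savings  = coupon_disc + coupon_match_disc,  # NA if either part is NA
    customer_amount = sales_value - coupon_savings      # NA propagates here too
  ) %>%
  group_by(household_id) %>%
  summarise(
    total_saving = sum(coupon_savings, na.rm = TRUE),   # NA rows are skipped
    total_spend  = sum(customer_amount, na.rm = TRUE),
    .groups = "drop"
  )

toy_summary
```

Household 1 ends up with 1.50 in savings and 7.00 in spend. Household 2 has a single row with a missing discount, so with na.rm = TRUE both sums are 0; without it they would be NA.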

Exploratory Data Analysis


The Exploratory Data Analysis (EDA) section outlines the visualizations that are used in the R project to address the problem statement and/or business question. Formats such as bar charts, line charts, scatter plots, correlation matrix, etc., are used as different styles for the visualizations.


Let’s Explore

We began our analysis by looking at the three types of campaigns: Type A, Type B, and Type C. Type A was the only targeted campaign of the three, with coupons sent to customers based on purchase history. The graph below shows the number of campaigns received by household size, broken out by campaign type. Type A made up over half of the campaigns received by households of 1 and 2 and just under half for households of 3 and 5+; households of 4 were the exception, receiving slightly more Type B (139) than Type A (126). Type C was the least disbursed campaign type for every household size. Overall, the campaign types were disbursed in a broadly similar ratio across household sizes. Our analysis will determine whether the number of Type A campaigns received should increase or decrease based on consumer response.

# campaigns received based on household size by campaign type

campaigns %>%
  inner_join(demographics, by = "household_id") %>%
  inner_join(campaign_descriptions, by = "campaign_id") %>%
  group_by(household_id, campaign_id) %>%
  ggplot(aes(x = household_size, fill = campaign_type)) +
  geom_bar() +
  labs(
    fill = "Campaign Type",
    x = "Household Size",
    y = "Campaigns Received",
    title = "Campaigns Received by Household Size",
    subtitle = "Data represents the number of campaigns received based on campaign type by household size.\nThe top campaign type is A with its highest count being received within a household size of 2.",
    caption = "Data Source R Package: completejourney" 
    ) +
  theme(
    plot.subtitle = element_text(face = "italic"),
    plot.caption = element_text(hjust = 1, size = 7),
    legend.title = element_text(size = 9),
    legend.text = element_text(size = 7)
  ) 


We then took the analysis a step further and looked at coupon redemptions by household size. The graph shows that households of 1 or 2 people redeemed more coupons than households of 3 or more.

# total coupon redemption by household size

ggplot(coupon_redeem, aes(x = household_size, fill = household_size)) +
  geom_bar(stat = "count") +
  labs(
    fill = "Household Size",
    title = "Total Coupon Redemption Volume by Household Size",
    subtitle = "Data represents total coupon redemption from January 1 - December 31, 2017.",
    x = "Household Size",
    y = "Total Coupon Redemption Volume",
    caption = "Data Source R Package: completejourney"
    )  +
  theme(
    plot.subtitle = element_text(face = "italic"),
    plot.caption = element_text(hjust = 1, size = 7),
    legend.title = element_text(size = 9),
    legend.text = element_text(size = 7)
  )


The following “tibbles” display the counts of campaigns received by household size and campaign type. The “Groups: household_size” tibble gives the number of campaigns received for each household size. Each row of the joined data is a household-campaign pair, so these counts are campaigns received rather than distinct households. Households of 2 received the most campaigns (1,482), while households of 4 received the fewest (300).

We then reviewed the “Groups: campaign_type” tibble to see how the campaigns received were distributed by type. About 52% were Type A, 38% were Type B, and only 9% were Type C. These numbers suggest that Regork places the most emphasis on coupons that are targeted to customer purchase history.

# count of campaigns received based on household size
campaigns %>%
  inner_join(demographics, by = "household_id") %>%
  inner_join(campaign_descriptions, by = "campaign_id") %>%
  group_by(household_size) %>%
  count(household_size)
## # A tibble: 5 × 2
## # Groups:   household_size [5]
##   household_size     n
##   <ord>          <int>
## 1 1               1155
## 2 2               1482
## 3 3                576
## 4 4                300
## 5 5+               322
# count of campaigns by campaign type
campaigns %>%
  inner_join(demographics, by = "household_id") %>%
  inner_join(campaign_descriptions, by = "campaign_id") %>%
  group_by(campaign_type) %>%
  count(campaign_type)
## # A tibble: 3 × 2
## # Groups:   campaign_type [3]
##   campaign_type     n
##   <ord>         <int>
## 1 Type A         2007
## 2 Type B         1466
## 3 Type C          362
# count of campaigns received by household size based on campaign type
campaigns %>%
  inner_join(demographics, by = "household_id") %>%
  inner_join(campaign_descriptions, by = "campaign_id") %>%
  group_by(household_size) %>%
  count(campaign_type)
## # A tibble: 15 × 3
## # Groups:   household_size [5]
##    household_size campaign_type     n
##    <ord>          <ord>         <int>
##  1 1              Type A          647
##  2 1              Type B          403
##  3 1              Type C          105
##  4 2              Type A          802
##  5 2              Type B          558
##  6 2              Type C          122
##  7 3              Type A          280
##  8 3              Type B          228
##  9 3              Type C           68
## 10 4              Type A          126
## 11 4              Type B          139
## 12 4              Type C           35
## 13 5+             Type A          152
## 14 5+             Type B          138
## 15 5+             Type C           32
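
The percentages quoted above can be checked directly from the counts. The sketch below re-enters the 15 counts from the last tibble by hand (the campaign_counts object is introduced here only for illustration) and computes each campaign type's share overall and within each household size.

```r
library(dplyr)

# Counts copied from the household size by campaign type tibble above
campaign_counts <- tibble(
  household_size = rep(c("1", "2", "3", "4", "5+"), each = 3),
  campaign_type  = rep(c("Type A", "Type B", "Type C"), times = 5),
  n = c(647, 403, 105, 802, 558, 122, 280, 228, 68, 126, 139, 35, 152, 138, 32)
)

# Overall share of each campaign type, in percent
overall_share <- campaign_counts %>%
  group_by(campaign_type) %>%
  summarise(n = sum(n), .groups = "drop") %>%
  mutate(pct = round(100 * n / sum(n), 1))

# Share of each campaign type within a household size, in percent
size_share <- campaign_counts %>%
  group_by(household_size) %>%
  mutate(pct = round(100 * n / sum(n), 1)) %>%
  ungroup()

overall_share
size_share %>% filter(campaign_type == "Type A")
```

Overall, the shares come to 52.3%, 38.2%, and 9.4%. Within household sizes, Type A is above 50% only for sizes 1 and 2 (56.0% and 54.1%), and it is lowest for size 4 (42.0%).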


The next graph shows the average spending per observation by household size, subset by campaign type. We used the average rather than the total so that groups with more observations would not skew the comparison, which gives a more accurate picture of the similarities and differences in spending for each campaign type. Each household size shows slightly different spending patterns across the campaign types.

Households of all sizes were strong users of Type A campaigns, especially household sizes 1-4, where Type A showed average spending less than or equal to that of Types B and C. Sales value in our data already reflects any discounts applied in the transaction, so we interpret smaller averages as a sign of more coupon usage.
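
As a small illustration of why an average can be easier to compare than a total when group sizes differ, consider the made-up figures below. The toy_baskets table is hypothetical and is not drawn from the completejourney data.

```r
library(dplyr)

# Made-up household totals; the two groups are deliberately unequal in size
toy_baskets <- tibble(
  household_size = c("2", "2", "2", "2", "5+"),
  basket         = c(100, 120, 80, 100, 300)
)

toy_compare <- toy_baskets %>%
  group_by(household_size) %>%
  summarise(
    households  = n(),
    total_spend = sum(basket),
    avg_spend   = mean(basket),
    .groups = "drop"
  )

toy_compare
```

By total, the size 2 group looks larger (400 vs. 300) simply because it contains four households. By average, the single 5+ household spends three times as much (300 vs. 100).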


We also wanted to confirm the seasonality of the campaigns to make sure there were no issues with timing of the coupons. Our seasonality graph shows that Type A is effective year-round, which is another positive factor showing that the targeted campaigns are effective.

# products sales per month by campaign type; top 10 coupons

products %>%
  inner_join(transactions, by = "product_id") %>%
  inner_join(coupons, by = "product_id") %>%
  inner_join(campaigns, by = c("household_id", "campaign_id")) %>%
  inner_join(campaign_descriptions, by = "campaign_id") %>%
  mutate(product_category = str_replace_all(product_category, pattern = "FRZN", replacement = "FROZEN")) %>%
  mutate(sales_month = month(transaction_timestamp, label = TRUE)) %>%
  group_by(sales_month, coupon_upc, campaign_type) %>%
  summarize(total_sales = sum(sales_value, na.rm = TRUE)) %>%
  arrange(desc(total_sales)) %>%
  slice(1:10) %>%
  ggplot(aes(x = sales_month, y = total_sales, color = campaign_type)) +
  geom_jitter() +
  guides(color = guide_legend(title = "Campaign Type")) +
  scale_y_continuous(name = "Total Sales", labels = scales::dollar) +
  labs(
    x = "Month",
    title = "Total Product Sales Per Month",
    subtitle = "Data represents the total sales based on the top 10 coupons.\nCampaign Type A displays the highest product sales per month.",
    caption = "Data Source R Package: completejourney"
    ) +
  theme(
    plot.subtitle = element_text(face = "italic"),
    plot.caption = element_text(hjust = 1, size = 7),
    legend.title = element_text(size = 9),
    legend.text = element_text(size = 7)
  )
## `summarise()` has grouped output by 'sales_month', 'coupon_upc'. You can
## override using the `.groups` argument.
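
A note on the "top 10" step: after summarize(), the data is still grouped by sales_month and coupon_upc, so slice(1:10) keeps up to 10 rows within each of those groups rather than 10 coupons overall. If the intent is the top N coupons per month (an assumption on our part), one option is slice_max() after regrouping by month. The sketch below uses made-up data; toy_sales and its values are hypothetical.

```r
library(dplyr)

# Made-up monthly sales per coupon
toy_sales <- tibble(
  sales_month = c("Jan", "Jan", "Jan", "Feb", "Feb", "Feb"),
  coupon_upc  = c("a", "b", "c", "a", "b", "c"),
  total_sales = c(30, 10, 20, 5, 50, 40)
)

# Keep the 2 best-selling coupons within each month
top_per_month <- toy_sales %>%
  group_by(sales_month) %>%
  slice_max(total_sales, n = 2, with_ties = FALSE) %>%
  ungroup()

top_per_month
```

Here January keeps coupons a and c, and February keeps b and c. Setting with_ties = FALSE guarantees exactly n rows per month even when sales values are equal.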


* this needs a write-up… (Emmett)

# spending by household size by week; filtering outlier

spend_view %>%  #line view
  filter(household_id != 1228) %>%
  arrange(campaign_type) %>%
  ggplot(aes(y = total_spend, x = week, color = campaign_type)) +
  geom_line() +  # inherits the filtered data so the outlier stays excluded
  guides(color = guide_legend(title = "Campaign Type")) +
  scale_y_continuous(labels = scales::dollar) +
  theme(legend.justification = 'centre',
        legend.position = 'bottom',
        legend.direction = "horizontal",
        legend.key.height = unit(0.5, "cm"),
        legend.key.width = unit(0.5,"cm")) +
  labs(
    x = "Week",
    y = "Total Spending",
    title = "Spending by Household Size", 
    subtitle = "Data represents weekly spending split by campaign type.",
    caption = "Data Source R Package: completejourney"
    ) +
  facet_wrap(~ household_size) +
  theme(
    plot.subtitle = element_text(face = "italic"),
    plot.caption = element_text(hjust = 1, size = 7),
    legend.title = element_text(size = 9),
    legend.text = element_text(size = 7)
  )


* this needs a write-up… (still working on boxplot or bar chart - Cecilia)

# total sales by household size based on campaign type

sales_campaign_2 <- sales_campaign %>%
  inner_join(transactions, by = "household_id") %>%
  group_by(household_id, household_size, campaign_type) %>%
  summarize(total_sales = sum(sales_value, na.rm = TRUE))
## `summarise()` has grouped output by 'household_id', 'household_size'. You can
## override using the `.groups` argument.
box1 <- sales_campaign_2 %>%
  ggplot(aes(x = campaign_type, y = total_sales, color = household_size)) +
  geom_boxplot() +
  guides(color = guide_legend(title = "Household Size")) +
  scale_y_log10(name = "Total Sales", labels = scales::dollar) +
  labs(
    x = "Campaign Type",
    y = "Total Sales",
    title = "Total Sales by Campaign Type",
    subtitle = "Data represents the total sales by campaign type based on the household size.",
    caption = "Data Source R Package: completejourney"
    ) +
  theme(
    plot.subtitle = element_text(face = "italic"),
    plot.caption = element_text(hjust = 1, size = 7),
    legend.title = element_text(size = 9),
    legend.text = element_text(size = 7)
  )
box1

sales_campaign_2
## # A tibble: 1,549 × 4
## # Groups:   household_id, household_size [756]
##    household_id household_size campaign_type total_sales
##    <chr>        <ord>          <ord>               <dbl>
##  1 1            2              Type A              7247.
##  2 1            2              Type B              9662.
##  3 1            2              Type C              2416.
##  4 1001         1              Type A              5118.
##  5 1001         1              Type B              2559.
##  6 1003         1              Type A              1404.
##  7 1004         1              Type A              7461.
##  8 1004         1              Type B              4974.
##  9 101          4              Type A             13450.
## 10 101          4              Type B              4483.
## # … with 1,539 more rows
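
One pattern in the output above may be worth confirming: household 1's totals (7,247, 9,662, and 2,416) are almost exactly in a 3:4:1 ratio. That is what would result if each transaction were matched to every campaign row for the household before summing. The sketch below reproduces the effect on made-up data. The toy_ tables are hypothetical; this illustrates how the join behaves and is not a diagnosis of the real data.

```r
library(dplyr)

# One household that received two Type A campaigns and one Type B campaign
toy_campaigns <- tibble(
  household_id  = c("1", "1", "1"),
  campaign_type = c("Type A", "Type A", "Type B")
)

# The same household made two purchases worth 30 in total
toy_transactions <- tibble(
  household_id = c("1", "1"),
  sales_value  = c(10, 20)
)

# Joining on household_id alone pairs every campaign row with every
# transaction; recent dplyr versions warn about the many-to-many match
joined_totals <- toy_campaigns %>%
  inner_join(toy_transactions, by = "household_id") %>%
  group_by(household_id, campaign_type) %>%
  summarise(total_sales = sum(sales_value), .groups = "drop")

joined_totals
```

The household spent 30 in total, yet the joined result reports 60 for Type A (two campaign rows) and 30 for Type B (one campaign row). Summing sales per household first and joining the campaigns afterwards would avoid the repetition.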


* this needs a write-up…(work in progress by Cecilia)



* this needs a write-up (Zekeriya)

# spending by household size where quantity less than 40000 and sales value < 400

filtered_trans %>%
  select(sales_value, quantity, household_size) %>%
  filter(quantity < 40000, sales_value < 400) %>%
  ggplot(aes(x = quantity, y = sales_value, color = household_size)) +
  geom_point() + 
  guides(color = guide_legend(title = "Household Size")) +
  scale_y_continuous(labels = scales::dollar) +
  labs(
    x = "Quantity",
    y = "Total Spending",
    title = "Spending by Household Size", 
    subtitle = "Data represents household spending where quantity is less than\n40,000 and sales value is less than $400.",
    caption = "Data Source R Package: completejourney"
    ) +
  theme(
    plot.subtitle = element_text(face = "italic"),
    plot.caption = element_text(hjust = 1, size = 7),
    legend.title = element_text(size = 9),
    legend.text = element_text(size = 7)
  )

* this needs a write-up (Zekeriya)

# savings based on household size

savings_view %>%
  ggplot(aes(x = household_size, y = total_saving, color = household_size)) +
  geom_point() + 
  guides(color = guide_legend(title = "Household Size")) +
  scale_y_continuous(labels = scales::dollar) +
  labs(
    x = "Household Size",
    y = "Total Savings",
    title = "Savings by Household Size", 
    subtitle = "Data represents total savings by household size.",
    caption = "Data Source R Package: completejourney"
    ) +
  theme(
    plot.subtitle = element_text(face = "italic"),
    plot.caption = element_text(hjust = 1, size = 7),
    legend.title = element_text(size = 9),
    legend.text = element_text(size = 7)
  )


Our next graph shows the savings per campaign type by week. Type A campaigns accounted for a prominent share of the savings compared to Types B and C, which gives a good picture of the overall usage of Type A relative to the other two types. (We can then tie into how we view Type A as the dominant campaign, being the targeted one, and roll into a solution on how to make campaigns better?)

# savings based on campaign type by week; filtering outlier

savings_view %>% 
  filter(household_id != 1228) %>%
  arrange(campaign_id) %>%
  ggplot(aes(y = total_saving, x = week, color = campaign_type)) +
  geom_point(position = 'dodge', stat = 'identity') +
  guides(color = guide_legend(title = "Campaign Type")) +
  scale_y_continuous(labels = scales::dollar) +
  labs(
    x = "Week",
    y = "Total Savings",
    title = "Savings by Campaign Type", 
    subtitle = "Data represents total savings by campaign type on a week basis.",
    caption = "Data Source R Package: completejourney"
    ) +
  theme(
    plot.subtitle = element_text(face = "italic"),
    plot.caption = element_text(hjust = 1, size = 7),
    legend.title = element_text(size = 9),
    legend.text = element_text(size = 7)
  )
## Warning: Width not defined. Set with `position_dodge(width = ?)`


* this needs a write-up… (Kristi)

# spending based on household size by quarter; filtering outlier 

spend_view %>% 
  filter(household_id != 1228) %>%
  arrange(campaign_id) %>%
  ggplot(aes(y = total_spend, x = transaction_quarter, fill = household_size)) +
  geom_bar(position = 'dodge', stat = 'identity') +
  scale_y_continuous(labels = scales::dollar) +
  labs(
    fill = "Household Size",
    x = "Transaction by Quarter",
    y = "Total Spending",
    title = "Quarterly Spending by Household Size", 
    subtitle = "Data represents total spending by household size on a quarterly basis.",
    caption = "Data Source R Package: completejourney"
    ) +
  theme(
    plot.subtitle = element_text(face = "italic"),
    plot.caption = element_text(hjust = 1, size = 7),
    legend.title = element_text(size = 9),
    legend.text = element_text(size = 7)
  )


* this needs a write-up… (Kristi)

# savings based on household size by quarter; filtering outlier

savings_view %>% 
  filter(household_id != 1228) %>%
  arrange(campaign_id) %>%
  ggplot(aes(y = total_saving, x = transaction_quarter, fill = household_size)) +
  geom_bar(position = 'dodge', stat = 'identity') +
  scale_y_continuous(labels = scales::dollar) +
  labs(
    fill = "Household Size",
    x = "Transaction by Quarter",
    y = "Total Savings",
    title = "Quarterly Savings by Household Size", 
    subtitle = "Data represents total savings by household size on a quarterly basis.",
    caption = "Data Source R Package: completejourney"
    ) +
  theme(
    plot.subtitle = element_text(face = "italic"),
    plot.caption = element_text(hjust = 1, size = 7),
    legend.title = element_text(size = 9),
    legend.text = element_text(size = 7)
  )

Summary


Emmett to work on write-up