We have been tasked with identifying and analyzing potential growth opportunities where Regork can invest in order to increase revenue and profits. Companies like Regork are always looking to make an impact and to win the “hearts and minds” of the consumer. As a grocer, sometimes this can be difficult when you deal with commodities. How can you gain the attention of the consumer and get them into the store?
Do the targeted campaigns generate a greater amount of customer engagement?
Regork already holds much of the data needed to paint a picture of the consumer. With this data, we can see that the targeted Type A campaigns have a good effect on larger families of 5+, but less so on smaller families. Exploratory data analysis (EDA) techniques and visualizations will allow us to identify potential ways to increase customer engagement through campaigns while reaching larger household sizes.
The following R packages are required in order to run the code in this R project:
library(completejourney) # grocery store shopping transactions data from group of 2,469 households
library(corrplot) # visualization of a correlation matrix
library(dplyr) # manipulating and transforming data (i.e., filtering, joining, etc.)
library(forcats) # working with categorical variables or factors
library(GGally) # plotting system for R based on "Grammar of Graphics"; extension to "ggplot2"
library(ggrepel) # position and repel overlapping text labels with "ggplot2"
library(ggplot2) # data visualization plotting system using "Grammar of Graphics"
library(ggthemes) # additional plotting themes, scales, and geoms for "ggplot2"
library(grid) # grid graphics plotting functions and capabilities
library(gridExtra) # additional grid capability functions for "grid"
library(jpeg) # reading and writing JPEG images
library(knitr) # dynamic report generation in R
library(lubridate) # functions used for working with dates and times
library(naniar) # find data quality issues; summarize, visualize, and manipulate missing data
library(png) # reading and writing PNG images
library(readr) # reading rectangular text data
library(stringr) # manipulating text (strings)
library(tidyr) # functions used for tidying or cleaning up messy data
library(tidyverse) # tidying data and working with other R packages
library(treemapify) # plotting treemaps in "ggplot2"
The data for this R project can be accessed from the CompleteJourney website. The CompleteJourney datasets are based on grocery shopping transactions from a group of 2,469 households. Entities such as demographics, products, coupons, campaigns, etc., were collected over a one-year timeframe from January 2017 - December 2017.
Let’s Begin
The first task in the data preparation process is to get the full transactions and promotions datasets. Both datasets and the remaining CompleteJourney datasets were previewed in a table-like format called “tibble.”
# get the completejourney - transactions dataset
transactions <- get_transactions()
transactions
## # A tibble: 1,469,307 × 11
## house…¹ store…² baske…³ produ…⁴ quant…⁵ sales…⁶ retai…⁷ coupo…⁸ coupo…⁹ week
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 900 330 311985… 1095275 1 0.5 0 0 0 1
## 2 900 330 311985… 9878513 1 0.99 0.1 0 0 1
## 3 1228 406 311986… 1041453 1 1.43 0.15 0 0 1
## 4 906 319 311987… 1020156 1 1.5 0.29 0 0 1
## 5 906 319 311987… 1053875 2 2.78 0.8 0 0 1
## 6 906 319 311987… 1060312 1 5.49 0.5 0 0 1
## 7 906 319 311987… 1075313 1 1.5 0.29 0 0 1
## 8 1058 381 311986… 985893 1 1.88 0.21 0 0 1
## 9 1058 381 311986… 988791 1 1.5 1.29 0 0 1
## 10 1058 381 311986… 9297106 1 2.69 0 0 0 1
## # … with 1,469,297 more rows, 1 more variable: transaction_timestamp <dttm>,
## # and abbreviated variable names ¹household_id, ²store_id, ³basket_id,
## # ⁴product_id, ⁵quantity, ⁶sales_value, ⁷retail_disc, ⁸coupon_disc,
## # ⁹coupon_match_disc
# get the completejourney - promotions dataset
promotions <- get_promotions()
promotions
## # A tibble: 20,940,529 × 5
## product_id store_id display_location mailer_location week
## <chr> <chr> <fct> <fct> <int>
## 1 1000050 316 9 0 1
## 2 1000050 337 3 0 1
## 3 1000050 441 5 0 1
## 4 1000092 292 0 A 1
## 5 1000092 293 0 A 1
## 6 1000092 295 0 A 1
## 7 1000092 298 0 A 1
## 8 1000092 299 0 A 1
## 9 1000092 304 0 A 1
## 10 1000092 306 0 A 1
## # … with 20,940,519 more rows
campaigns
## # A tibble: 6,589 × 2
## campaign_id household_id
## <chr> <chr>
## 1 1 105
## 2 1 1238
## 3 1 1258
## 4 1 1483
## 5 1 2200
## 6 1 293
## 7 1 529
## 8 1 536
## 9 1 568
## 10 1 630
## # … with 6,579 more rows
campaign_descriptions
## # A tibble: 27 × 4
## campaign_id campaign_type start_date end_date
## <chr> <ord> <date> <date>
## 1 1 Type B 2017-03-03 2017-04-09
## 2 2 Type B 2017-03-08 2017-04-09
## 3 3 Type C 2017-03-13 2017-05-08
## 4 4 Type B 2017-03-29 2017-04-30
## 5 5 Type B 2017-04-03 2017-05-07
## 6 6 Type C 2017-04-19 2017-05-21
## 7 7 Type B 2017-04-24 2017-05-28
## 8 8 Type A 2017-05-08 2017-06-25
## 9 9 Type B 2017-05-31 2017-07-02
## 10 10 Type B 2017-06-28 2017-07-30
## # … with 17 more rows
coupons
## # A tibble: 116,204 × 3
## coupon_upc product_id campaign_id
## <chr> <chr> <chr>
## 1 10000085207 9676830 26
## 2 10000085207 9676943 26
## 3 10000085207 9676944 26
## 4 10000085207 9676947 26
## 5 10000085207 9677008 26
## 6 10000085207 9677052 26
## 7 10000085207 9677385 26
## 8 10000085207 9677479 26
## 9 10000085207 9677791 26
## 10 10000085207 9677878 26
## # … with 116,194 more rows
coupon_redemptions
## # A tibble: 2,102 × 4
## household_id coupon_upc campaign_id redemption_date
## <chr> <chr> <chr> <date>
## 1 1029 51380041013 26 2017-01-01
## 2 1029 51380041313 26 2017-01-01
## 3 165 53377610033 26 2017-01-03
## 4 712 51380041013 26 2017-01-07
## 5 712 54300016033 26 2017-01-07
## 6 2488 51200092776 26 2017-01-10
## 7 2488 51410010050 26 2017-01-10
## 8 1923 53000012033 26 2017-01-14
## 9 1923 54300021057 26 2017-01-14
## 10 1923 57047091041 26 2017-01-14
## # … with 2,092 more rows
demographics
## # A tibble: 801 × 8
## household_id age income home_ownership marital…¹ house…² house…³ kids_…⁴
## <chr> <ord> <ord> <ord> <ord> <ord> <ord> <ord>
## 1 1 65+ 35-49K Homeowner Married 2 2 Adul… 0
## 2 1001 45-54 50-74K Homeowner Unmarried 1 1 Adul… 0
## 3 1003 35-44 25-34K <NA> Unmarried 1 1 Adul… 0
## 4 1004 25-34 15-24K <NA> Unmarried 1 1 Adul… 0
## 5 101 45-54 Under 15K Homeowner Married 4 2 Adul… 2
## 6 1012 35-44 35-49K <NA> Married 5+ 2 Adul… 3+
## 7 1014 45-54 15-24K <NA> Married 4 2 Adul… 2
## 8 1015 45-54 50-74K Homeowner Unmarried 1 1 Adul… 0
## 9 1018 45-54 35-49K Homeowner Married 5+ 2 Adul… 3+
## 10 1020 45-54 25-34K Homeowner Married 2 2 Adul… 0
## # … with 791 more rows, and abbreviated variable names ¹marital_status,
## # ²household_size, ³household_comp, ⁴kids_count
products
## # A tibble: 92,331 × 7
## product_id manufacturer_id department brand product_c…¹ produ…² packa…³
## <chr> <chr> <chr> <fct> <chr> <chr> <chr>
## 1 25671 2 GROCERY National FRZN ICE ICE - … 22 LB
## 2 26081 2 MISCELLANEOUS National <NA> <NA> <NA>
## 3 26093 69 PASTRY Private BREAD BREAD:… <NA>
## 4 26190 69 GROCERY Private FRUIT - SH… APPLE … 50 OZ
## 5 26355 69 GROCERY Private COOKIES/CO… SPECIA… 14 OZ
## 6 26426 69 GROCERY Private SPICES & E… SPICES… 2.5 OZ
## 7 26540 69 GROCERY Private COOKIES/CO… TRAY P… 16 OZ
## 8 26601 69 DRUG GM Private VITAMINS VITAMI… 300 CT…
## 9 26636 69 PASTRY Private BREAKFAST … SW GDS… <NA>
## 10 26691 16 GROCERY Private PNT BTR/JE… HONEY 12 OZ
## # … with 92,321 more rows, and abbreviated variable names ¹product_category,
## # ²product_type, ³package_size
Dataframes
This section of the data preparation combines all the specified dataframes that are used with the visualizations in this R project. References to “na.rm = TRUE” have been incorporated to remove “NA” values from the calculations or summarization of variables.
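As a minimal illustration of what `na.rm = TRUE` does in the summaries that follow, the sketch below uses a small made-up vector. The values are hypothetical and are not taken from the CompleteJourney data.

```r
# Hypothetical sales values with one missing entry (illustration only)
sales_value <- c(1.50, NA, 2.78, 0.99)

# Without na.rm, a single NA makes the whole sum NA
total_with_na <- sum(sales_value)

# With na.rm = TRUE, NA values are dropped before summing
total_clean <- sum(sales_value, na.rm = TRUE)

print(total_with_na)
print(total_clean)
```

Dropping missing values this way keeps a summary from collapsing to `NA`, but it also silently ignores the incomplete rows, so it is worth checking how many there are.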
filtered_trans <- demographics %>%
left_join(transactions) %>%
mutate(transaction_quarter = quarter(transaction_timestamp, type = "year.quarter"))
full_view <- filtered_trans %>%
inner_join(products, by = "product_id") %>%
inner_join(coupon_redemptions) %>%
inner_join(coupons) %>%
inner_join(campaign_descriptions) %>%
inner_join(campaigns)
savings_view <- full_view %>%
dplyr::mutate(coupon_savings = (coupon_disc + coupon_match_disc)) %>%
dplyr::mutate(customer_amount = (sales_value - (coupon_savings))) %>%
# dplyr::mutate(campaign_id = as.integer(campaign_id)) %>%
group_by(transaction_quarter, week, household_id, household_size, campaign_id, campaign_type, age, income, marital_status, quantity, sales_value) %>%
summarise(total_saving = sum(coupon_savings))
spend_view <- full_view %>%
dplyr::mutate(coupon_savings = (coupon_disc + coupon_match_disc)) %>%
dplyr::mutate(customer_amount = (sales_value - (coupon_savings))) %>%
# dplyr::mutate(campaign_id = as.integer(campaign_id)) %>%
group_by(transaction_quarter, week, household_id, household_size, campaign_id, campaign_type, age, income, marital_status) %>%
summarise(total_spend = sum(customer_amount))
avg_spend <- demographics %>%
inner_join(transactions) %>%
inner_join(campaigns) %>%
inner_join(campaign_descriptions) %>%
group_by(household_id) %>%
summarise(basket = sum(sales_value , na.rm = TRUE), campaign_type, household_size) %>%
group_by(household_size, campaign_type) %>%
summarise(spending = mean(basket))
household_percent <- full_view %>%
count(campaign_type, household_size) %>%
group_by(household_size) %>%
mutate(pct = n / sum(n))
sales_campaign <- campaigns %>%
inner_join(demographics, by = "household_id") %>%
inner_join(campaign_descriptions, by = "campaign_id")
coupon_redeem <- coupon_redemptions %>%
inner_join(demographics) %>%
group_by(household_size)
# coupon_redeem %>% map_int(n_distinct)
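Most of the joins above rely on dplyr matching whichever columns share a name. The sketch below, on two made-up tables (the `toy_*` names and values are hypothetical), shows how naming the key with `by` makes the match explicit and how an inner join drops households that have no matching row.

```r
library(dplyr)

# Hypothetical toy tables (not CompleteJourney data)
toy_demographics <- tibble(
  household_id = c("1", "2", "3"),
  household_size = c("2", "1", "5+")
)
toy_redemptions <- tibble(
  household_id = c("1", "1", "3"),
  coupon_upc = c("A", "B", "C")
)

# Naming the key explicitly avoids relying on whichever columns share a name
joined <- toy_demographics %>%
  inner_join(toy_redemptions, by = "household_id")

# Household "2" has no redemption, so it drops out of the inner join
redeem_counts <- joined %>% count(household_size)
redeem_counts
```

Because household "2" disappears here, counts built on inner joins describe only households that appear in every joined table, which is worth keeping in mind when reading the charts.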
The Exploratory Data Analysis (EDA) section outlines the visualizations that are used in the R project to address the problem statement and/or business question. Formats such as bar charts, line charts, scatter plots, correlation matrix, etc., are used as different styles for the visualizations.
Let’s Explore
We began our analysis by looking at the three types of campaigns: Type A, Type B, and Type C. Type A was the only targeted campaign of the three, with coupons sent to customers based on purchase history. As shown in the graph below, the total number of campaigns received was analyzed by household size and subset by campaign type. The data indicates that Type A made up more than half of the campaigns received by households of 1 and 2 and just under half for households of 3 and 5+; for households of 4, Type B (139) slightly exceeded Type A (126). Type C was the least disbursed campaign type for every household size. Overall, campaign types were disbursed in a broadly similar ratio across household sizes. Our analysis will determine whether the number of Type A campaigns received should increase or decrease based on consumer reports.
# campaigns received based on household size by campaign type
campaigns %>%
inner_join(demographics, by = "household_id") %>%
inner_join(campaign_descriptions, by = "campaign_id") %>%
group_by(household_id, campaign_id) %>%
ggplot(aes(x = household_size, fill = campaign_type)) +
geom_bar() +
labs(
fill = "Campaign Type",
x = "Household Size",
y = "Campaigns Received",
title = "Campaigns Received by Household Size",
subtitle = "Data represents the number of campaigns received based on campaign type by household size.\nThe top campaign type is A with its highest count being received within a household size of 2.",
caption = "Data Source R Package: completejourney"
) +
theme(
plot.subtitle = element_text(face = "italic"),
plot.caption = element_text(hjust = 1, size = 7),
legend.title = element_text(size = 9),
legend.text = element_text(size = 7)
)
We decided to take our findings a little more in depth and analyze the
specific coupon redemption by household. Our graph shows that households
of 1 or 2 people redeemed more coupons than households of 3 or more.
# total coupon redemption by household size
ggplot(coupon_redeem, aes(x = household_size, fill = household_size)) +
geom_bar(stat = "count") +
labs(
fill = "Household Size",
title = "Total Coupon Redemption Volume by Household Size",
subtitle = "Data represents total coupon redemption from January 1 - December 31, 2017.",
x = "Household Size",
y = "Total Coupon Redemption Volume",
caption = "Data Source R Package: completejourney"
) +
theme(
plot.subtitle = element_text(face = "italic"),
plot.caption = element_text(hjust = 1, size = 7),
legend.title = element_text(size = 9),
legend.text = element_text(size = 7)
)
The following “tibbles” display the counts of campaigns received by household size and campaign type. The first tibble, grouped by household_size, counts campaign-household pairs rather than distinct households. Households of two received the most campaigns (1,482), while households of four received the fewest (300).
We then reviewed the tibble grouped by campaign_type to see how the campaigns received were split by type. 52% of campaigns disbursed were Type A, 38% were Type B, and only 9% were Type C. These numbers suggest that a higher value is placed on coupons that are targeted to customer purchase history.
# count of campaigns received based on household size
campaigns %>%
inner_join(demographics, by = "household_id") %>%
inner_join(campaign_descriptions, by = "campaign_id") %>%
group_by(household_size) %>%
count(household_size)
## # A tibble: 5 × 2
## # Groups: household_size [5]
## household_size n
## <ord> <int>
## 1 1 1155
## 2 2 1482
## 3 3 576
## 4 4 300
## 5 5+ 322
# count of campaigns by campaign type
campaigns %>%
inner_join(demographics, by = "household_id") %>%
inner_join(campaign_descriptions, by = "campaign_id") %>%
group_by(campaign_type) %>%
count(campaign_type)
## # A tibble: 3 × 2
## # Groups: campaign_type [3]
## campaign_type n
## <ord> <int>
## 1 Type A 2007
## 2 Type B 1466
## 3 Type C 362
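The percentages quoted in the text can be checked directly from the counts in this tibble. The sketch below copies the three counts by hand; the object names are illustrative only.

```r
# Campaign counts by type, copied from the tibble above
type_counts <- c("Type A" = 2007, "Type B" = 1466, "Type C" = 362)

# Share of all campaigns received, as a percentage
type_share <- 100 * type_counts / sum(type_counts)
round(type_share, 1)
```

The shares come out at roughly 52.3%, 38.2%, and 9.4%, consistent with the 52%, 38%, and 9% stated above.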
# count of campaigns received by household size based on campaign type
campaigns %>%
inner_join(demographics, by = "household_id") %>%
inner_join(campaign_descriptions, by = "campaign_id") %>%
group_by(household_size) %>%
count(campaign_type)
## # A tibble: 15 × 3
## # Groups: household_size [5]
## household_size campaign_type n
## <ord> <ord> <int>
## 1 1 Type A 647
## 2 1 Type B 403
## 3 1 Type C 105
## 4 2 Type A 802
## 5 2 Type B 558
## 6 2 Type C 122
## 7 3 Type A 280
## 8 3 Type B 228
## 9 3 Type C 68
## 10 4 Type A 126
## 11 4 Type B 139
## 12 4 Type C 35
## 13 5+ Type A 152
## 14 5+ Type B 138
## 15 5+ Type C 32
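To compare campaign mix across household sizes, the counts in this tibble can be converted to within-size shares. The sketch below copies the counts by hand into a matrix and uses base R `prop.table()`; the object names are illustrative only.

```r
# Counts copied from the tibble above:
# rows = household size, columns = campaign type
counts <- matrix(
  c(647, 403, 105,
    802, 558, 122,
    280, 228,  68,
    126, 139,  35,
    152, 138,  32),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    household_size = c("1", "2", "3", "4", "5+"),
    campaign_type = c("Type A", "Type B", "Type C")
  )
)

# Row-wise proportions: each row sums to 1
shares <- prop.table(counts, margin = 1)
round(100 * shares, 1)
```

On these counts, Type A is the majority for household sizes 1 and 2 and the largest share for sizes 3 and 5+, while for size 4 Type B slightly exceeds Type A.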
The next graph shows the average spending per observation by household size, subset by campaign type. We used the average rather than the total so that household sizes with more observations would not skew the comparison, which gives a more accurate picture of how spending differs by campaign type. Each household size shows slightly different habits in which campaign is redeemed the most.
Households of all sizes were strong users of Type A campaigns, especially household sizes 1-4, where Type A showed average spending less than or equal to that of Types B and C. Because sales_value in our data already reflects any discounts applied in the transaction, we interpret smaller averages as a sign of more coupon usage.
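The reason for comparing averages rather than totals can be sketched with made-up numbers. The table below is hypothetical (it is not CompleteJourney data) and simply gives one household size more observations than the other.

```r
library(dplyr)

# Hypothetical per-household baskets (illustration only)
toy_baskets <- tibble(
  household_size = c("1", "1", "1", "1", "5+", "5+"),
  basket = c(100, 120, 80, 100, 150, 170)
)

# Total and average spending side by side for each household size
toy_summary <- toy_baskets %>%
  group_by(household_size) %>%
  summarise(
    households = n(),
    total_spend = sum(basket),
    avg_spend = mean(basket)
  )
toy_summary
```

In this toy case the size-1 group has the larger total only because it contains more households, while the 5+ group spends more per household, which is the kind of distortion an average is meant to avoid.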
We also wanted to confirm the seasonality of the campaigns to make sure
there were no issues with timing of the coupons. Our seasonality graph
shows that Type A is effective year-round, which is another positive
factor showing that the targeted campaigns are effective.
# products sales per month by campaign type; top 10 coupons
products %>%
inner_join(transactions, by = "product_id") %>%
inner_join(coupons, by = "product_id") %>%
inner_join(campaigns, by = c("household_id", "campaign_id")) %>%
inner_join(campaign_descriptions, by = "campaign_id") %>%
mutate(product_category = str_replace_all(product_category, pattern = "FRZN", replacement = "FROZEN")) %>%
mutate(sales_month = month(transaction_timestamp, label = TRUE)) %>%
group_by(sales_month, coupon_upc, campaign_type) %>%
summarize(total_sales = sum(sales_value, na.rm = TRUE)) %>%
arrange(desc(total_sales)) %>%
slice(1:10) %>%
ggplot(aes(x = sales_month, y = total_sales, color = campaign_type)) +
geom_jitter() +
guides(color = guide_legend(title = "Campaign Type")) +
scale_y_continuous(name = "Total Sales", labels = scales::dollar) +
labs(
x = "Month",
title = "Total Product Sales Per Month",
subtitle = "Data represents the total sales based on the top 10 coupons.\nCampaign Type A displays the highest product sales per month.",
caption = "Data Source R Package: completejourney"
) +
theme(
plot.subtitle = element_text(face = "italic"),
plot.caption = element_text(hjust = 1, size = 7),
legend.title = element_text(size = 9),
legend.text = element_text(size = 7)
)
## `summarise()` has grouped output by 'sales_month', 'coupon_upc'. You can
## override using the `.groups` argument.
# spending by household size by week; filtering outlier
spend_view %>% #line view
filter(household_id != 1228) %>%
arrange(campaign_type) %>%
ggplot(aes(y = total_spend, x = week, color = campaign_type)) +
# inherit the filtered data so the outlier household stays excluded
geom_line() +
guides(color = guide_legend(title = "Campaign Type")) +
scale_y_continuous(labels = scales::dollar) +
theme(legend.justification = 'centre',
legend.position = 'bottom',
legend.direction = "horizontal",
legend.key.height = unit(0.5, "cm"),
legend.key.width = unit(0.5,"cm")) +
labs(
x = "Week",
y = "Total Spending",
title = "Spending by Household Size",
subtitle = "Data represents weekly spending split by campaign type.",
caption = "Data Source R Package: completejourney"
) +
facet_wrap(~ household_size) +
theme(
plot.subtitle = element_text(face = "italic"),
plot.caption = element_text(hjust = 1, size = 7),
legend.title = element_text(size = 9),
legend.text = element_text(size = 7)
)
# total sales by household size based on campaign type
sales_campaign_2 <- sales_campaign %>%
inner_join(transactions, by = "household_id") %>%
group_by(household_id, household_size, campaign_type) %>%
summarize(total_sales = sum(sales_value, na.rm = TRUE))
## `summarise()` has grouped output by 'household_id', 'household_size'. You can
## override using the `.groups` argument.
box1 <- sales_campaign_2%>%
ggplot(aes(x = campaign_type, y = total_sales, color = household_size)) +
geom_boxplot() +
guides(color = guide_legend(title = "Household Size")) +
scale_y_log10(name = "Total Sales", labels = scales::dollar) +
labs(
x = "Campaign Type",
y = "Total Sales",
title = "Total Sales by Campaign Type",
subtitle = "Data represents the total sales by campaign type based on the household size.",
caption = "Data Source R Package: completejourney"
) +
theme(
plot.subtitle = element_text(face = "italic"),
plot.caption = element_text(hjust = 1, size = 7),
legend.title = element_text(size = 9),
legend.text = element_text(size = 7)
)
box1
sales_campaign_2
## # A tibble: 1,549 × 4
## # Groups: household_id, household_size [756]
## household_id household_size campaign_type total_sales
## <chr> <ord> <ord> <dbl>
## 1 1 2 Type A 7247.
## 2 1 2 Type B 9662.
## 3 1 2 Type C 2416.
## 4 1001 1 Type A 5118.
## 5 1001 1 Type B 2559.
## 6 1003 1 Type A 1404.
## 7 1004 1 Type A 7461.
## 8 1004 1 Type B 4974.
## 9 101 4 Type A 13450.
## 10 101 4 Type B 4483.
## # … with 1,539 more rows
## [1] "#7F3B08" "#B75C07" "#E68D25" "#FDC57E" "#FEEDD6" "#E9EAF3" "#BEBADA"
## [8] "#8B7FB4" "#582F8C" "#2D004B"
# spending by household size where quantity less than 40000 and sales value < 400
filtered_trans %>%
select(sales_value, quantity, household_size) %>%
filter(quantity < 40000, sales_value < 400) %>%
ggplot(aes(x = quantity, y = sales_value, color = household_size)) +
geom_point() +
guides(color = guide_legend(title = "Household Size")) +
scale_y_continuous(labels = scales::dollar) +
labs(
x = "Quantity",
y = "Total Spending",
title = "Spending by Household Size",
subtitle = "Data represents household spending where quantity less than\n40,000 and sales less than $400.",
caption = "Data Source R Package: completejourney"
) +
theme(
plot.subtitle = element_text(face = "italic"),
plot.caption = element_text(hjust = 1, size = 7),
legend.title = element_text(size = 9),
legend.text = element_text(size = 7)
)
# savings based on household size
savings_view %>%
ggplot(aes(x = quantity, y = total_saving, color = household_size)) +
geom_point() +
guides(color = guide_legend(title = "Household Size")) +
scale_y_continuous(labels = scales::dollar) +
labs(
x = "Quantity",
y = "Total Savings",
title = "Savings by Household Size",
subtitle = "Data represents total savings by household size.",
caption = "Data Source R Package: completejourney"
) +
theme(
plot.subtitle = element_text(face = "italic"),
plot.caption = element_text(hjust = 1, size = 7),
legend.title = element_text(size = 9),
legend.text = element_text(size = 7)
)
Our next graph shows the savings per campaign type. Type A campaigns feature most prominently in the savings compared to Types B and C, which gives a clear picture of the overall usage of Type A relative to the other two types. This supports our view that Type A, the targeted campaign, is the dominant one and leads into the question of how to make campaigns more effective.
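For reference, the savings plotted here come from the derivation used in savings_view and spend_view: coupon savings are the sum of `coupon_disc` and `coupon_match_disc`, and the customer amount is `sales_value` minus those savings. The sketch below works through that arithmetic on two made-up rows; the values are hypothetical, and whether subtracting from `sales_value` is appropriate depends on how that column is defined in the dataset.

```r
# Hypothetical transaction rows (illustration only);
# column names follow the transactions dataset
toy <- data.frame(
  sales_value = c(5.49, 2.78),
  coupon_disc = c(1.00, 0.00),
  coupon_match_disc = c(0.50, 0.00)
)

# Same derivation as in savings_view and spend_view
toy$coupon_savings <- toy$coupon_disc + toy$coupon_match_disc
toy$customer_amount <- toy$sales_value - toy$coupon_savings
toy
```

A row with no coupon activity contributes zero savings, so weekly totals are driven entirely by the transactions where a coupon was applied.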
# savings based on campaign type by week; filtering outlier
savings_view %>%
filter(household_id != 1228) %>%
arrange(campaign_id) %>%
ggplot(aes(y = total_saving, x = week, color = campaign_type)) +
geom_point() +
guides(color = guide_legend(title = "Campaign Type")) +
scale_y_continuous(labels = scales::dollar) +
labs(
x = "Week",
y = "Total Savings",
title = "Savings by Campaign Type",
subtitle = "Data represents total savings by campaign type on a weekly basis.",
caption = "Data Source R Package: completejourney"
) +
theme(
plot.subtitle = element_text(face = "italic"),
plot.caption = element_text(hjust = 1, size = 7),
legend.title = element_text(size = 9),
legend.text = element_text(size = 7)
)
# spending based on household size by quarter; filtering outlier
spend_view %>%
filter(household_id != 1228) %>%
arrange(campaign_id) %>%
ggplot(aes(y = total_spend, x = transaction_quarter, fill = household_size)) +
geom_bar(position = 'dodge', stat = 'identity') +
scale_y_continuous(labels = scales::dollar) +
labs(
fill = "Household Size",
x = "Transaction by Quarter",
y = "Total Spending",
title = "Quarterly Spending by Household Size",
subtitle = "Data represents total spending by household size on a quarterly basis.",
caption = "Data Source R Package: completejourney"
) +
theme(
plot.subtitle = element_text(face = "italic"),
plot.caption = element_text(hjust = 1, size = 7),
legend.title = element_text(size = 9),
legend.text = element_text(size = 7)
)
# savings based on household size by quarter; filtering outlier
savings_view %>%
filter(household_id != 1228) %>%
arrange(campaign_id) %>%
ggplot(aes(y = total_saving, x = transaction_quarter, fill = household_size)) +
geom_bar(position = 'dodge', stat = 'identity') +
scale_y_continuous(labels = scales::dollar) +
labs(
fill = "Household Size",
x = "Transaction by Quarter",
y = "Total Savings",
title = "Quarterly Savings by Household Size",
subtitle = "Data represents total savings by household size on a quarterly basis.",
caption = "Data Source R Package: completejourney"
) +
theme(
plot.subtitle = element_text(face = "italic"),
plot.caption = element_text(hjust = 1, size = 7),
legend.title = element_text(size = 9),
legend.text = element_text(size = 7)
)