Marketing Strategies Regarding Campaigns
Class: Data Mining (4080-001)
Authors: An Nguyen, Avneet Dharni, Oanh Dang, Roan Zappanti, Tatum Arey

Prerequities

Note: RColorBrewer provides color schemes for graphs

library(tidyverse)
library(lubridate)
library(completejourney)
library(RColorBrewer)
library(dplyr)


c(promotions, transactions) %<-% get_data(which = 'both', verbose = FALSE)

Introduction

Our objective is to evaluate success factors and areas for growth in Campaigns A, B, and C. This will allow us to understand if the initiatives we are implementing are optimized to their fullest potential.

We took 3 primary things into consideration during our analysis; who we are serving, what we are serving, and where we are serving it. These pillars provide a thorough analysis to understand each campaign’s strengths and weaknesses. This analysis will provide strategic recommendations to revise our campaigns for increased sales for Regork.

Business Objective: Identify a potential growth area where the company could invest future resources to increase revenue and profit

Business Question 1: Which campaign types generated most sales and what households are associated with that campaign type?

Business Question 2: What further information can we glean from this data about Campaign Types B and C that can be used to develop a marketing strategy?

Answering the Questions


Initial Analysis on sales by campaign and household size

Based on the results of this exploratory graph, we discovered differences between sales in Type A, Type B, and Type C. This was our starting point for looking into campaigns.


# Creating a new table with summarized sales by campaign
sales_by_campaign <- campaign_descriptions %>%
  inner_join(campaigns) %>%
  inner_join(transactions) %>%
  inner_join(demographics) %>%
  group_by(campaign_id, campaign_type, household_comp) %>%
  summarise(total_sales = sum(sales_value, na.rm = TRUE)) %>%
  arrange(desc(total_sales))
# Graphing the sales by campaign with household information
sales_by_campaign %>%
  ggplot(aes(x = campaign_type, y = total_sales, fill = household_comp)) +
  geom_col(stat = "identity", position = "dodge") +
  scale_y_continuous(labels = scales::dollar) +
  labs(
    title = "Total sales by campaign",
    subtitle = "Campaign running from 2016 to 2017",
    x = "Campaign type",
    y = "Total sales",
    fill = "Household size")


Product Count & Sales Amt Analysis

Next, we wanted to know how many products were involved in each campaign type, and we wanted to validate that sales amounts differed by campaign with a second graph.

It seems that campaign C lacks in both product count and total sales value.

# Joining tables for further general analysis
df <-transactions %>%
  left_join(products) %>%
  left_join(demographics) %>%
  full_join(campaigns) %>%
  full_join(campaign_descriptions)

  
# This graph show how many products is applied for each type of campaign
#This is the reason why campaign A has the most sales value 
#(df2 is the sales value of each campaign type)
df1 <- df %>%
  group_by(campaign_type) %>%
  count(n_distinct(product_id)) %>%
  summarise(total_products= sum(n))

df1 %>% ggplot(aes(x= campaign_type,y=total_products)) +
  geom_segment( aes(x= campaign_type,xend= campaign_type, y=0, yend= total_products, color= "Orange"), show.legend = FALSE)+
  geom_point(color= 'Orange', size = 3, show.legend = FALSE) + 
  scale_y_continuous(trans= 'log2',name = "Number of products", labels = scales::comma) +
  scale_x_discrete(name= "Campaign Type")+
  labs( title = "Number of products in each campaign type",
        subtitle = "This graph shows the number of products applied \n in each campaign type",
       caption = "http://https://github.com/bradleyboehmke/completejourney")

# Creating a second data frame that looks at sales by campaign type without household information (more intuitive to understand)
df2<- df %>%
  group_by(campaign_type) %>%
summarise(total_sales_campaign_type= sum(sales_value))

df2 %>% 
  ggplot(aes(x= campaign_type, y= total_sales_campaign_type, fill= campaign_type))+
  geom_bar(stat = "identity", show.legend = FALSE)+
  scale_fill_brewer(palette = "Oranges")+
  scale_y_continuous( name = "Sales Value of each Campaign Type", labels = scales::dollar)+
  scale_x_discrete(name= "Campaign Type")+
  labs( title = "Total Sales Value of Each Campaign Type",
       caption = "http://https://github.com/bradleyboehmke/completejourney")


An Analysis: Income Analysis

Additionally, we wanted to know what which household income factors were attributed the most (percentage-wise) to each Campaign Type.

Based on the resulting stacked graph, middle-income families are the target market of these campaigns.


# Without saving any variables, create the graph output of the count of households by income across campaign types.
demographics %>%
  inner_join(transactions) %>%
  inner_join(campaigns) %>%
  inner_join(campaign_descriptions) %>%
  filter(campaign_type %in% c("Type A","Type B", "Type C")) %>%
  group_by(campaign_type) %>%
  count(income, campaign_type) %>%
  mutate(pct = n/sum(n)) %>%
  ggplot(aes(reorder(income,n), y = n, fill = campaign_type)) +
    geom_bar(stat = "identity") +
    theme(axis.title.y = element_blank()) +
    theme(panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        axis.line = element_line(colour = "black"),
        plot.caption = element_text(size = 8, margin = margin(t = 10), color = "grey70", hjust = 0)) +
    scale_fill_brewer(palette = "Oranges") +
    # Adding labels of percentages if wanted
    # geom_text(aes(label = round(pct, 2)), hjust = -0.1,
    # position = position_stack(vjust = 0.01),
    # color = "black", check_overlap = TRUE) +
    scale_y_continuous(labels = scales::number) +
    coord_flip() +
  labs(
    title = "Income Distribution Among All Campaign Types",
    x = "Income range",
    y = "Count",
    caption = "Source: R data package that provides access to data in the Complete Journey package \n provided by 84.51",
    fill = "Campaign type")


Campaign-Location Analysis

Finally, we wanted to know which promotional display locations contributed to each campaign type by counting the # of unique proucts that were placed in each location during the campaign. (For the sake of options, we made 2 graphs for each type).

Locations 0 and 7 are the top two for Types A and B, while locations 5 and 0 are common for type C (which has lower performance).

#Defining variables to analyze locations in each campaign. This uses large data frames.
plot_data <- promotions %>% 
    group_by(display_location, product_id) %>% 
    select(display_location, product_id) %>% 
    inner_join(products, by = "product_id") %>% 
    inner_join(coupons, by = "product_id") %>% 
    inner_join(campaign_descriptions, by = "campaign_id") %>%
    select(campaign_type, display_location, product_id) %>% 
    group_by(campaign_type, display_location) %>% 
    summarize(distinct = n_distinct(product_id)) %>%
    arrange(campaign_type, desc(distinct))

location_plot1 <- NA
location_plot2 <- NA
# This function intakes the campaign type (ex: 'Type A') and outputs two ranked graphs regarding location counts by distinct product id

campaign_locations <- function(campaign_tp, messages = TRUE){
  
    
  location_plot1 <<- plot_data %>%
    filter(campaign_type == campaign_tp) %>% 
    ggplot(aes(x = reorder(display_location, -distinct), 
                y = distinct,
               fill = reorder(display_location, -distinct))) + 
      geom_bar(stat = "identity", show.legend = FALSE) +
      scale_y_continuous(labels = scales::comma) +
      scale_fill_grey() +
      xlab("Display Locations") +
      ylab("Distinct Product Count") +
      labs( title = paste("# of unique products per display location"),
            subtitle = paste("For", campaign_tp))
    
  location_plot2 <<- plot_data %>% 
    filter(campaign_type == campaign_tp) %>% 
    ggplot(aes(x = distinct, 
               y = reorder(display_location, distinct),
               col = -distinct)) + 
      geom_point(show.legend = FALSE, size = 5) +
      scale_fill_distiller(type = "seq", palette = "Blues") +      
      xlab("Distinct Product Count") +
      ylab("Display Location") +
      scale_x_continuous(labels = scales::comma) +
      labs( title = paste("# of unique products per display location"),
            subtitle = paste("For", campaign_tp))
  
  message("Function complete")
}  
# Running the 'campgain_locations()' function and displaying outputs
cmpgns <- c("Type A","Type B","Type C")
for(i in cmpgns){
  campaign_locations(i)
  print(location_plot1)
  print(location_plot2)
  rm(location_plot1)
  rm(location_plot2)
}
Function complete

rm(cmpgns)
rm(i)
rm(plot_data)


Summary

Problem Statement:

Which campaign types generated most sales and what households are associated with that campaign type? Additionally, what further information can be gleaned from this data about Campaign Types A, B, and C that can be used to develop a marketing strategy?

Analysis Methodology

Addressing the strategy required to properly invest in all three campaigns required a multi-metric approach, so the analysis was divided into major categories of:

  • Basic sales metrics by campaign type

  • Household income distributions

  • Location popularity distributions

Once broken into the above-stated parts, each analysis used dplyr to join, mutate, and filter (and more) multiple data tables from the completejourney data. Then, the created data frames were visualized utilizing ggplot2 and RColorBrewer.

Strategic Insights

From a marketing perspective, we wanted to address the 4 P’s (pricing, product, placement, & promotion) to fully understand the consumer. We recognized that the most popular income range for all 3 campaigns was 50-74K. This is a great insight to understand the consumers that are reacting positively to our campaigns. It will also allow us to implement a strategic pricing and promotion initiative in place.

Although that is great, we also have some missed opportunities in the other income ranges. This would be a chance for Regork to investigate why that may be and innovate solutions to address that in future campaigns.

Next, we also found that the higher the number of products offered, the higher the sales value for the campaign. This product variety seems to attract consumers and positively impact their buying habits.

Finally, we discovered that campaigns A & B were very successful using display locations 0 and 7. Grocery stores are very strategic in their product placement to increase sales. We wanted to be sensitive to this and understand how this influences campaign sales.

Implications on the Customer

Based on the findings listed above, our recommendations are to:

  1. Continue to target the 50-74K income range

  2. Increase the number of products per campaign (particularly C)

  3. Rotate the campaigns among display locations 0 and 7

This will not only drive already-profitable growth in the 50-74k income range, but also shift Campaign C’s low-quality strategy into the same, highly successful strategies as A and B.

Analysis Limitations

After conducting out analysis, we discovered 4 primary limitations; unknown locations, overlapping campaigns, campaign budget, and unknown product categories.

One of our recommendations was to rotate the campaigns between display locations 0 and 7. Although that is good in theory, this poses some potential road blocks. First, we do not have any information on these locations other than their designated number. This is crucial information especially in a grocery store setting. Is it an end cap? Is it refrigerated? How much space is available? All of these things would be necessary information to implement the recommendation.

Next, since we recommended rotating the campaigns this means no campaigns can overlap in timelines which could cause an issue. Another limitation is campaign budget. One of our recommendations was to add more products to Campaign C. However, more products require more capital which may not be available in the budget.

Lastly, the data set did not provide product categories. This information would’ve been instrumental in creating more specific and successful campaigns.

---
title: "Group 11 Project"
date: ' `r Sys.Date()`'
output: html_notebook
---
Marketing Strategies Regarding Campaigns <br>
Class: Data Mining (4080-001) <br>
Authors: An Nguyen, Avneet Dharni, Oanh Dang, Roan Zappanti, Tatum Arey 
<br><br>
  
## Prerequities
Note: RColorBrewer provides color schemes for graphs
```{r, warning=FALSE, message=FALSE}
library(tidyverse)
library(lubridate)
library(completejourney)
library(RColorBrewer)
library(dplyr)


c(promotions, transactions) %<-% get_data(which = 'both', verbose = FALSE)
```
## Introduction  

Our objective is to evaluate success factors and areas for growth in Campaigns A, B, and C. This will allow us to understand if the initiatives we are implementing are optimized to their fullest potential. <br> <br>We took 3 primary things into consideration during our analysis; who we are serving, what we are serving, and where we are serving it. These pillars provide a thorough analysis to understand each campaign's strengths and weaknesses. This analysis will provide strategic recommendations to revise our campaigns for increased sales for Regork.

**Business Objective**: Identify a potential growth area where the company could invest future resources to increase revenue and profit

**Business Question 1**: Which campaign types generated most sales and what households are associated with that campaign type?

**Business Question 2:** What further information can we glean from this data about Campaign Types B and C that can be used to develop a marketing strategy?  <br><br>

- What are the total sales and number of products within each campaign?

-  What are the income distributions among each campaigns type?  

- What display locations lead to higher sales, and what locations are common within each campaign type? 
<br><br>


## Answering the Questions

<br>

#### Initial Analysis on sales by campaign and household size

Based on the results of this exploratory graph, we discovered differences between sales in Type A, Type B, and Type C. This was our starting point for looking into campaigns.
```{r, message= FALSE, include= TRUE, echo= TRUE} 

# Creating a new table with summarized sales by campaign
sales_by_campaign <- campaign_descriptions %>%
  inner_join(campaigns) %>%
  inner_join(transactions) %>%
  inner_join(demographics) %>%
  group_by(campaign_id, campaign_type, household_comp) %>%
  summarise(total_sales = sum(sales_value, na.rm = TRUE)) %>%
  arrange(desc(total_sales))
```


```{r, warning=FALSE}
# Graphing the sales by campaign with household information
sales_by_campaign %>%
  ggplot(aes(x = campaign_type, y = total_sales, fill = household_comp)) +
  geom_col(stat = "identity", position = "dodge") +
  scale_y_continuous(labels = scales::dollar) +
  labs(
    title = "Total sales by campaign",
    subtitle = "Campaign running from 2016 to 2017",
    x = "Campaign type",
    y = "Total sales",
    fill = "Household size")
```
<br>

#### Product Count & Sales Amt Analysis
Next, we wanted to know how many products were involved in each campaign type, and we wanted to validate that sales amounts differed by campaign with a second graph. 
<br><br>
It seems that **campaign C** lacks in both product count and total sales value.

```{r echo=TRUE, warning=FALSE, message=FALSE}
# Joining tables for further general analysis
df <-transactions %>%
  left_join(products) %>%
  left_join(demographics) %>%
  full_join(campaigns) %>%
  full_join(campaign_descriptions)

  
# This graph show how many products is applied for each type of campaign
#This is the reason why campaign A has the most sales value 
#(df2 is the sales value of each campaign type)
df1 <- df %>%
  group_by(campaign_type) %>%
  count(n_distinct(product_id)) %>%
  summarise(total_products= sum(n))

df1 %>% ggplot(aes(x= campaign_type,y=total_products)) +
  geom_segment( aes(x= campaign_type,xend= campaign_type, y=0, yend= total_products, color= "Orange"), show.legend = FALSE)+
  geom_point(color= 'Orange', size = 3, show.legend = FALSE) + 
  scale_y_continuous(trans= 'log2',name = "Number of products", labels = scales::comma) +
  scale_x_discrete(name= "Campaign Type")+
  labs( title = "Number of products in each campaign type",
        subtitle = "This graph shows the number of products applied \n in each campaign type",
       caption = "http://https://github.com/bradleyboehmke/completejourney")
```

```{r echo=TRUE, warning=FALSE, message= FALSE}
# Creating a second data frame that looks at sales by campaign type without household information (more intuitive to understand)
df2<- df %>%
  group_by(campaign_type) %>%
summarise(total_sales_campaign_type= sum(sales_value))

df2 %>% 
  ggplot(aes(x= campaign_type, y= total_sales_campaign_type, fill= campaign_type))+
  geom_bar(stat = "identity", show.legend = FALSE)+
  scale_fill_brewer(palette = "Oranges")+
  scale_y_continuous( name = "Sales Value of each Campaign Type", labels = scales::dollar)+
  scale_x_discrete(name= "Campaign Type")+
  labs( title = "Total Sales Value of Each Campaign Type",
       caption = "http://https://github.com/bradleyboehmke/completejourney")
```
<br>

#### An Analysis: Income Analysis

Additionally, we wanted to know what which household income factors were attributed the most (percentage-wise) to each Campaign Type.
<br><br>
Based on the resulting stacked graph, middle-income families are the target market of these campaigns.
```{r echo=TRUE, warning=FALSE, message=FALSE}

# Without saving any variables, create the graph output of the count of households by income across campaign types.
demographics %>%
  inner_join(transactions) %>%
  inner_join(campaigns) %>%
  inner_join(campaign_descriptions) %>%
  filter(campaign_type %in% c("Type A","Type B", "Type C")) %>%
  group_by(campaign_type) %>%
  count(income, campaign_type) %>%
  mutate(pct = n/sum(n)) %>%
  ggplot(aes(reorder(income,n), y = n, fill = campaign_type)) +
    geom_bar(stat = "identity") +
    theme(axis.title.y = element_blank()) +
    theme(panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        axis.line = element_line(colour = "black"),
        plot.caption = element_text(size = 8, margin = margin(t = 10), color = "grey70", hjust = 0)) +
    scale_fill_brewer(palette = "Oranges") +
    # Adding labels of percentages if wanted
    # geom_text(aes(label = round(pct, 2)), hjust = -0.1,
    # position = position_stack(vjust = 0.01),
    # color = "black", check_overlap = TRUE) +
    scale_y_continuous(labels = scales::number) +
    coord_flip() +
  labs(
    title = "Income Distribution Among All Campaign Types",
    x = "Income range",
    y = "Count",
    caption = "Source: R data package that provides access to data in the Complete Journey package \n provided by 84.51",
    fill = "Campaign type")
```

<br>

#### Campaign-Location Analysis

Finally, we wanted to know which promotional display locations contributed to each campaign type by counting the # of unique proucts that were placed in each location during the campaign.
(For the sake of options, we made 2 graphs for each type).

Locations 0 and 7 are the top two for Types A and B, while locations 5 and 0 are common for type C (which has lower performance).

```{r Defining Data & Variables, echo = TRUE, message= FALSE}
#Defining variables to analyze locations in each campaign. This uses large data frames.
plot_data <- promotions %>% 
    group_by(display_location, product_id) %>% 
    select(display_location, product_id) %>% 
    inner_join(products, by = "product_id") %>% 
    inner_join(coupons, by = "product_id") %>% 
    inner_join(campaign_descriptions, by = "campaign_id") %>%
    select(campaign_type, display_location, product_id) %>% 
    group_by(campaign_type, display_location) %>% 
    summarize(distinct = n_distinct(product_id)) %>%
    arrange(campaign_type, desc(distinct))

location_plot1 <- NA
location_plot2 <- NA
```

```{r Create the Function, echo = TRUE}
# This function intakes the campaign type (ex: 'Type A') and outputs two ranked graphs regarding location counts by distinct product id

campaign_locations <- function(campaign_tp, messages = TRUE){
  
    
  location_plot1 <<- plot_data %>%
    filter(campaign_type == campaign_tp) %>% 
    ggplot(aes(x = reorder(display_location, -distinct), 
                y = distinct,
               fill = reorder(display_location, -distinct))) + 
      geom_bar(stat = "identity", show.legend = FALSE) +
      scale_y_continuous(labels = scales::comma) +
      scale_fill_grey() +
      xlab("Display Locations") +
      ylab("Distinct Product Count") +
      labs( title = paste("# of unique products per display location"),
            subtitle = paste("For", campaign_tp))
    
  location_plot2 <<- plot_data %>% 
    filter(campaign_type == campaign_tp) %>% 
    ggplot(aes(x = distinct, 
               y = reorder(display_location, distinct),
               col = -distinct)) + 
      geom_point(show.legend = FALSE, size = 5) +
      scale_fill_distiller(type = "seq", palette = "Blues") +      
      xlab("Distinct Product Count") +
      ylab("Display Location") +
      scale_x_continuous(labels = scales::comma) +
      labs( title = paste("# of unique products per display location"),
            subtitle = paste("For", campaign_tp))
  
  message("Function complete")
}  

```

```{r Run the Function, messages = FALSE}
# Running the 'campgain_locations()' function and displaying outputs
cmpgns <- c("Type A","Type B","Type C")
for(i in cmpgns){
  campaign_locations(i)
  print(location_plot1)
  print(location_plot2)
  rm(location_plot1)
  rm(location_plot2)
}

rm(cmpgns)
rm(i)
rm(plot_data)
```
<br>

## Summary

### Problem Statement:
Which campaign types generated most sales and what households are associated with that campaign type? Additionally, what further information can be gleaned from this data about Campaign Types A, B, and C that can be used to develop a marketing strategy? <br><br>

### Analysis Methodology
Addressing the strategy required to properly invest in all three campaigns required a multi-metric approach, so the analysis was divided into major categories of: <br>

- Basic sales metrics by campaign type

- Household income distributions  

- Location popularity distributions 

Once broken into the above-stated parts, each analysis used dplyr to join, mutate, and filter (and more) multiple data tables from the completejourney data. Then, the created data frames were visualized utilizing ggplot2 and RColorBrewer.<br><br>

### Strategic Insights
From a marketing perspective, we wanted to address the 4 P's (pricing, product, placement, & promotion) to fully understand the consumer. 
We recognized that the most popular income range for all 3 campaigns was 50-74K. This is a great insight to understand the consumers that are reacting positively to our campaigns. It will also allow us to implement a strategic pricing and promotion initiative in place. <br><br>Although that is great, we also have some missed opportunities in the other income ranges. This would be a chance for Regork to investigate why that may be and innovate solutions to address that in future campaigns. <br><br>Next, we also found that the higher the number of products offered, the higher the sales value for the campaign. This product variety seems to attract consumers and positively impact their buying habits. <br><br>Finally, we discovered that campaigns A & B were very successful using display locations 0 and 7. Grocery stores are very strategic in their product placement to increase sales. We wanted to be sensitive to this and understand how this influences campaign sales.<br><br>

### Implications on the Customer
Based on the findings listed above, our recommendations are to:<br>

1. Continue to target the 50-74K income range 

2. Increase the number of products per campaign (particularly C) 

3. Rotate the campaigns among display locations 0 and 7

This will not only drive already-profitable growth in the 50-74k income range, but also shift Campaign C's low-quality strategy into the same, highly successful strategies as A and B. <br><br>

### Analysis Limitations
After conducting out analysis, we discovered 4 primary limitations; unknown locations, overlapping campaigns, campaign budget, and unknown product categories. <br><br>One of our recommendations was to rotate the campaigns between display locations 0 and 7. Although that is good in theory, this poses some potential road blocks. First, we do not have any information on these locations other than their designated number. This is crucial information especially in a grocery store setting. Is it an end cap? Is it refrigerated? How much space is available? All of these things would be necessary information to implement the recommendation. <br><br>Next, since we recommended rotating the campaigns this means no campaigns can overlap in timelines which could cause an issue. Another limitation is campaign budget. One of our recommendations was to add more products to Campaign C. However, more products require more capital which may not be available in the budget. <br><br>Lastly, the data set did not provide product categories. This information would've been instrumental in creating more specific and successful campaigns.


