Part 1

Primary Focus: Optimize the Direct channel’s performance while reducing inefficient costs in Affiliates.

Key Insights: - Direct is the dominant channel (largest volume, strong profitability). - Paid Social has the highest profit per customer, though with lower volume. - Geography & Age Trends: Dublin leads in Direct volume while Longford leads in conversion rate; younger demographics (20-29) are key profit drivers.

Recommendations: Cut costs by reallocating spend from Affiliates to Direct and Paid Social. Prioritize Dublin as it drives substantial profit and should be a focal point for further investment. Focus on 20-29 age group for maximum long-term profitability.

library(tidyr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(rio)

rm(list=ls())

setwd("/Users/mkim/Desktop/2K")

df <- import("/Users/mkim/Desktop/2K/Demographic.csv")
dfvalue <- import("/Users/mkim/Desktop/2K/Value.csv")

# 1. Data Preparation

df <- df %>%
  separate(`Registration Date`, into = c("Date", "Time"), sep = " ")

df <- df %>%
  mutate(
    `Date Of Birth` = mdy(`Date Of Birth`),  # Convert to Date format (if stored as DD/MM/YYYY)
    `Date` = mdy(`Date`),  # Ensure the reference date is also in Date format
    Age = as.integer(interval(`Date Of Birth`, `Date`) / years(1)) # Calculate Age
  )

# 2. Merge in Customer Value Data (dfvalue)

# Assume customer value data is in 'dfvalue'
df_full <- left_join(df, dfvalue, by = "Customer ID") %>%
  # Create an overall profitability metric
  mutate(Total_Profit = `1st Order Profit` + `Subsequent Order Profit` - `Total value of all promotions`)
channel_attr <- df_full %>%
  group_by(`Source of Customer`) %>%
  summarise(
    Num_Customers = n(),
    Total_First_Order_Profit = sum(`1st Order Profit`, na.rm = TRUE),
    Total_Subsequent_Order_Profit = sum(`Subsequent Order Profit`, na.rm = TRUE),
    Total_Profit = sum(Total_Profit, na.rm = TRUE),
    Avg_Profit_Per_Customer = Total_Profit / n()
  ) %>%
  # Calculate percentage contributions
  mutate(
    Perc_Customers = Num_Customers / sum(Num_Customers) * 100,
    Perc_Profit = Total_Profit / sum(Total_Profit) * 100
  )

# View the aggregated data
print(channel_attr)
## # A tibble: 5 × 8
##   `Source of Customer` Num_Customers Total_First_Order_Profit
##   <chr>                        <int>                    <int>
## 1 Affiliates                    3868                    29751
## 2 Direct                       11252                   118219
## 3 Organic Search                8392                    90581
## 4 Paid Search                   6518                    69435
## 5 Paid Social                   1411                    19029
## # ℹ 5 more variables: Total_Subsequent_Order_Profit <int>, Total_Profit <int>,
## #   Avg_Profit_Per_Customer <dbl>, Perc_Customers <dbl>, Perc_Profit <dbl>
#Bar Chart – Total Profit by Channel
ggplot(channel_attr, aes(x = reorder(`Source of Customer`, -Total_Profit), y = Total_Profit, fill = `Source of Customer`)) +
  geom_bar(stat = "identity") +
  labs(title = "Total Profit Attribution by Source of Customer",
       x = "Source of Customer",
       y = "Total Profit") +
  theme_minimal()

# Bar Chart – Average Profit per Customer
ggplot(channel_attr, aes(x = reorder(`Source of Customer`, -Avg_Profit_Per_Customer), y = Avg_Profit_Per_Customer, fill = `Source of Customer`)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = round(Avg_Profit_Per_Customer, 1)), vjust = -0.5, size = 4) +
  labs(title = "Average Profit per Customer by Source of Customer",
       x = "Source of Customer",
       y = "Avg Profit per Customer") +
  theme_minimal()

# Scatterplot – Customer vs. Profit Contribution by Source of Customer
ggplot(channel_attr, aes(x = Perc_Customers, y = Perc_Profit, color = `Source of Customer`)) +
  geom_point(size = 3) + 
  geom_text(aes(label = `Source of Customer`), vjust = -1, size = 4, color = "black") + 
  labs(title = "Customer vs. Profit Contribution by Source of Customer",
       x = "Percentage of Customers",
       y = "Percentage of Total Profit") +
  theme_minimal(base_size = 15) + 
  theme(
    plot.title = element_text(size = 14, face = "bold"), 
    axis.text = element_text(size = 7), 
    axis.title = element_text(size = 12) 
  ) +
  coord_cartesian(clip = "off") 

Observations:

Affiliates: This channel underperforms significantly, with the lowest average profit per customer (29.35) and a disproportionate profit contribution relative to its customer base (8.91% profit vs. 12.30% customers). Action: Review and optimize partnerships, focusing on high-value affiliates. Negotiate better cost structures or discontinue low-performing affiliates.

Direct: While it dominates in volume and profit, its average profit per customer (41.32) is lower than channels like Paid Social. Action: Focus on customer retention and increasing subsequent orders to enhance profitability.

Organic Search: Balances customer acquisition volume and profit contribution well. Its 42.10 average profit per customer is solid. Action: Continue investing in SEO and organic content strategies. Address crowd-out effects by complementing organic efforts with Paid Search.

Paid Search: Performs efficiently but shows slightly lower average profit per customer (40.84) compared to Organic Search and Paid Social. Action: Optimize ad spending by identifying high-performing keywords and reducing crowd-out from competitors.

Paid Social: While small in scale, it delivers the highest average profit per customer (53.99) and a profit contribution higher than its customer share (5.98% profit vs. 4.49% customers). Action: Scale Paid Social campaigns cautiously, maintaining profitability by focusing on high-conversion audiences.

# Aggregate promotions and calculate adjusted profit
channel_promo_impact <- df_full %>%
  group_by(`Source of Customer`) %>%
  summarise(
    Total_Customers = n(),
    Total_First_Order_Profit = sum(`1st Order Profit`, na.rm = TRUE),
    Total_Subsequent_Order_Profit = sum(`Subsequent Order Profit`, na.rm = TRUE),
    Total_Value_of_All_Promotions = sum(`Total value of all promotions`, na.rm = TRUE),
    Total_Profit = sum(`1st Order Profit`, na.rm = TRUE) + sum(`Subsequent Order Profit`, na.rm = TRUE),
    Total_Net_Profit = Total_Profit - Total_Value_of_All_Promotions
  ) %>%
  mutate(
    Profit_Impact = Total_Profit - Total_Net_Profit,  # Total loss due to promotions
    Profit_Impact_Percentage = (Profit_Impact / Total_Profit) * 100  # Percentage loss
  )

# View the aggregated data
print(channel_promo_impact)
## # A tibble: 5 × 9
##   `Source of Customer` Total_Customers Total_First_Order_Profit
##   <chr>                          <int>                    <int>
## 1 Affiliates                      3868                    29751
## 2 Direct                         11252                   118219
## 3 Organic Search                  8392                    90581
## 4 Paid Search                     6518                    69435
## 5 Paid Social                     1411                    19029
## # ℹ 6 more variables: Total_Subsequent_Order_Profit <int>,
## #   Total_Value_of_All_Promotions <int>, Total_Profit <int>,
## #   Total_Net_Profit <int>, Profit_Impact <int>, Profit_Impact_Percentage <dbl>
# Bar plot showing original vs adjusted profit
ggplot(channel_promo_impact, aes(x = reorder(`Source of Customer`, -Total_Profit))) +
  geom_bar(aes(y = Total_Profit, fill = "Total Profit"), stat = "identity") +
  geom_bar(aes(y = Total_Net_Profit, fill = "Net Profit"), stat = "identity", alpha = 0.8) +
  labs(
    title = "Impact of Promotions on Channel Performance",
    x = "Source of Customer",
    y = "Profit",
    fill = "Profit Type"
  ) +
  theme_minimal()

# Bar plot showing profit impact percentage
ggplot(channel_promo_impact, aes(x = reorder(`Source of Customer`, -Profit_Impact_Percentage), y = Profit_Impact_Percentage, fill = `Source of Customer`)) +
  geom_bar(stat = "identity") +
  labs(
    title = "Profit Impact Percentage by Channel Due to Promotions",
    x = "Source of Customer",
    y = "Impact Percentage (%)"
  ) +
  theme_minimal()

High promotional spending in Direct (69,762) is justified given its dominant profit contribution. Organic Search and Paid Search show good profitability but might benefit from a combined organic and paid strategy to reduce crowd-out impacts. Additionally, Affiliates demonstrate low profitability (113,542 net profit) despite moderate promotional costs, indicating inefficiency in this channel. This is backed by the rising cost for Affiliates during January to March of 2013, making it unsustainable as a customer acquisition channel. Paid Social’s high average profit per customer (53.99) shows potential for growth, given minimal crowd-out effects.

library(lmtest)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
# Convert 'Contact Allowed' to a binary 'conversion' column
df_full <- df_full %>%
  mutate(conversion = ifelse(`Contact Allowed` == "Y", 1, 0))

# Regression Model: Impact of Conversion Rate on Total Profit
model_total_profit <- lm(Total_Profit ~ conversion + `Source of Customer` + Age + Title, data = df_full)
summary(model_total_profit)
## 
## Call:
## lm(formula = Total_Profit ~ conversion + `Source of Customer` + 
##     Age + Title, data = df_full)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -72.93 -29.37 -10.81  17.61 353.19 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        12.935003   1.015709  12.735   <2e-16 ***
## conversion                         14.699235   0.492089  29.871   <2e-16 ***
## `Source of Customer`Direct         14.725804   0.808757  18.208   <2e-16 ***
## `Source of Customer`Organic Search 14.310708   0.838726  17.062   <2e-16 ***
## `Source of Customer`Paid Search    13.453586   0.876400  15.351   <2e-16 ***
## `Source of Customer`Paid Social    25.407326   1.339612  18.966   <2e-16 ***
## Age                                -0.143120   0.008215 -17.422   <2e-16 ***
## TitleMiss                          -0.384297   1.071276  -0.359    0.720    
## TitleMr                            15.019500   0.746471  20.121   <2e-16 ***
## TitleMrs                            1.720046   1.123762   1.531    0.126    
## TitleMs                            -0.311075   1.121211  -0.277    0.781    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 42.51 on 30580 degrees of freedom
##   (850 observations deleted due to missingness)
## Multiple R-squared:  0.07523,    Adjusted R-squared:  0.07493 
## F-statistic: 248.8 on 10 and 30580 DF,  p-value: < 2.2e-16
# Regression Model: Impact of Conversion Rate on First Order Profit
model_first_order_profit <- lm(`1st Order Profit` ~ conversion + `Source of Customer` + Age + Title, data = df_full)
summary(model_first_order_profit)
## 
## Call:
## lm(formula = `1st Order Profit` ~ conversion + `Source of Customer` + 
##     Age + Title, data = df_full)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -13.820  -5.194  -0.843   4.115  43.771 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                         6.858719   0.162805  42.128  < 2e-16 ***
## conversion                         -0.084954   0.078876  -1.077 0.281462    
## `Source of Customer`Direct          3.278260   0.129633  25.289  < 2e-16 ***
## `Source of Customer`Organic Search  3.351777   0.134437  24.932  < 2e-16 ***
## `Source of Customer`Paid Search     3.220577   0.140475  22.926  < 2e-16 ***
## `Source of Customer`Paid Social     5.919619   0.214722  27.569  < 2e-16 ***
## Age                                -0.034361   0.001317 -26.095  < 2e-16 ***
## TitleMiss                          -0.027290   0.171712  -0.159 0.873727    
## TitleMr                             2.236935   0.119650  18.696  < 2e-16 ***
## TitleMrs                            0.618564   0.180124   3.434 0.000595 ***
## TitleMs                            -0.148417   0.179715  -0.826 0.408900    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.814 on 30580 degrees of freedom
##   (850 observations deleted due to missingness)
## Multiple R-squared:  0.07054,    Adjusted R-squared:  0.07024 
## F-statistic: 232.1 on 10 and 30580 DF,  p-value: < 2.2e-16
# Regression Model: Impact of Conversion Rate on Subsequent Order Profit
model_subsequent_order_profit <- lm(`Subsequent Order Profit` ~ conversion + `Source of Customer` + Age + Title, data = df_full)
summary(model_subsequent_order_profit)
## 
## Call:
## lm(formula = `Subsequent Order Profit` ~ conversion + `Source of Customer` + 
##     Age + Title, data = df_full)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -65.75 -33.66 -12.28  20.67 356.13 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        12.415290   1.104947  11.236   <2e-16 ***
## conversion                         14.733770   0.535323  27.523   <2e-16 ***
## `Source of Customer`Direct         11.257967   0.879812  12.796   <2e-16 ***
## `Source of Customer`Organic Search 11.018750   0.912414  12.076   <2e-16 ***
## `Source of Customer`Paid Search     9.924481   0.953398  10.410   <2e-16 ***
## `Source of Customer`Paid Social    19.796256   1.457307  13.584   <2e-16 ***
## Age                                -0.104771   0.008937 -11.724   <2e-16 ***
## TitleMiss                          -0.437682   1.165396  -0.376    0.707    
## TitleMr                            13.042050   0.812054  16.061   <2e-16 ***
## TitleMrs                            1.691230   1.222492   1.383    0.167    
## TitleMs                            -0.194792   1.219717  -0.160    0.873    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 46.24 on 30580 degrees of freedom
##   (850 observations deleted due to missingness)
## Multiple R-squared:  0.05153,    Adjusted R-squared:  0.05122 
## F-statistic: 166.1 on 10 and 30580 DF,  p-value: < 2.2e-16
# Print summaries for interpretation
cat("\nRegression Analysis: Total Profit\n")
## 
## Regression Analysis: Total Profit
print(summary(model_total_profit))
## 
## Call:
## lm(formula = Total_Profit ~ conversion + `Source of Customer` + 
##     Age + Title, data = df_full)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -72.93 -29.37 -10.81  17.61 353.19 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        12.935003   1.015709  12.735   <2e-16 ***
## conversion                         14.699235   0.492089  29.871   <2e-16 ***
## `Source of Customer`Direct         14.725804   0.808757  18.208   <2e-16 ***
## `Source of Customer`Organic Search 14.310708   0.838726  17.062   <2e-16 ***
## `Source of Customer`Paid Search    13.453586   0.876400  15.351   <2e-16 ***
## `Source of Customer`Paid Social    25.407326   1.339612  18.966   <2e-16 ***
## Age                                -0.143120   0.008215 -17.422   <2e-16 ***
## TitleMiss                          -0.384297   1.071276  -0.359    0.720    
## TitleMr                            15.019500   0.746471  20.121   <2e-16 ***
## TitleMrs                            1.720046   1.123762   1.531    0.126    
## TitleMs                            -0.311075   1.121211  -0.277    0.781    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 42.51 on 30580 degrees of freedom
##   (850 observations deleted due to missingness)
## Multiple R-squared:  0.07523,    Adjusted R-squared:  0.07493 
## F-statistic: 248.8 on 10 and 30580 DF,  p-value: < 2.2e-16
cat("\nRegression Analysis: First Order Profit\n")
## 
## Regression Analysis: First Order Profit
print(summary(model_first_order_profit))
## 
## Call:
## lm(formula = `1st Order Profit` ~ conversion + `Source of Customer` + 
##     Age + Title, data = df_full)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -13.820  -5.194  -0.843   4.115  43.771 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                         6.858719   0.162805  42.128  < 2e-16 ***
## conversion                         -0.084954   0.078876  -1.077 0.281462    
## `Source of Customer`Direct          3.278260   0.129633  25.289  < 2e-16 ***
## `Source of Customer`Organic Search  3.351777   0.134437  24.932  < 2e-16 ***
## `Source of Customer`Paid Search     3.220577   0.140475  22.926  < 2e-16 ***
## `Source of Customer`Paid Social     5.919619   0.214722  27.569  < 2e-16 ***
## Age                                -0.034361   0.001317 -26.095  < 2e-16 ***
## TitleMiss                          -0.027290   0.171712  -0.159 0.873727    
## TitleMr                             2.236935   0.119650  18.696  < 2e-16 ***
## TitleMrs                            0.618564   0.180124   3.434 0.000595 ***
## TitleMs                            -0.148417   0.179715  -0.826 0.408900    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.814 on 30580 degrees of freedom
##   (850 observations deleted due to missingness)
## Multiple R-squared:  0.07054,    Adjusted R-squared:  0.07024 
## F-statistic: 232.1 on 10 and 30580 DF,  p-value: < 2.2e-16
cat("\nRegression Analysis: Subsequent Order Profit\n")
## 
## Regression Analysis: Subsequent Order Profit
print(summary(model_subsequent_order_profit))
## 
## Call:
## lm(formula = `Subsequent Order Profit` ~ conversion + `Source of Customer` + 
##     Age + Title, data = df_full)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -65.75 -33.66 -12.28  20.67 356.13 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        12.415290   1.104947  11.236   <2e-16 ***
## conversion                         14.733770   0.535323  27.523   <2e-16 ***
## `Source of Customer`Direct         11.257967   0.879812  12.796   <2e-16 ***
## `Source of Customer`Organic Search 11.018750   0.912414  12.076   <2e-16 ***
## `Source of Customer`Paid Search     9.924481   0.953398  10.410   <2e-16 ***
## `Source of Customer`Paid Social    19.796256   1.457307  13.584   <2e-16 ***
## Age                                -0.104771   0.008937 -11.724   <2e-16 ***
## TitleMiss                          -0.437682   1.165396  -0.376    0.707    
## TitleMr                            13.042050   0.812054  16.061   <2e-16 ***
## TitleMrs                            1.691230   1.222492   1.383    0.167    
## TitleMs                            -0.194792   1.219717  -0.160    0.873    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 46.24 on 30580 degrees of freedom
##   (850 observations deleted due to missingness)
## Multiple R-squared:  0.05153,    Adjusted R-squared:  0.05122 
## F-statistic: 166.1 on 10 and 30580 DF,  p-value: < 2.2e-16
# (a) Bar Chart of Conversions by Marketing Channel
channel_conv <- df %>%
  filter(`Contact Allowed` == "Y") %>%   # Only count successful conversions
  group_by(`Source of Customer`) %>%
  summarise(Conversions = n())

model_total_profit <- lm(Total_Profit ~ Age + conversion + `Source of Customer` + Title, data = df_full)
summary(model_total_profit)
## 
## Call:
## lm(formula = Total_Profit ~ Age + conversion + `Source of Customer` + 
##     Title, data = df_full)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -72.93 -29.37 -10.81  17.61 353.19 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        12.935003   1.015709  12.735   <2e-16 ***
## Age                                -0.143120   0.008215 -17.422   <2e-16 ***
## conversion                         14.699235   0.492089  29.871   <2e-16 ***
## `Source of Customer`Direct         14.725804   0.808757  18.208   <2e-16 ***
## `Source of Customer`Organic Search 14.310708   0.838726  17.062   <2e-16 ***
## `Source of Customer`Paid Search    13.453586   0.876400  15.351   <2e-16 ***
## `Source of Customer`Paid Social    25.407326   1.339612  18.966   <2e-16 ***
## TitleMiss                          -0.384297   1.071276  -0.359    0.720    
## TitleMr                            15.019500   0.746471  20.121   <2e-16 ***
## TitleMrs                            1.720046   1.123762   1.531    0.126    
## TitleMs                            -0.311075   1.121211  -0.277    0.781    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 42.51 on 30580 degrees of freedom
##   (850 observations deleted due to missingness)
## Multiple R-squared:  0.07523,    Adjusted R-squared:  0.07493 
## F-statistic: 248.8 on 10 and 30580 DF,  p-value: < 2.2e-16
  1. Contribution of Conversion (customers agreeing to receiving marketing)

Total Profit: Conversion has the strongest positive impact (14.70) on overall profitability, highlighting the importance of successfully converting customers.

First Order Profit: Conversion is not statistically significant (p = 0.28) for first-order profits.

Subsequent Order Profit: Conversion has a significant and strong positive impact (14.73) on subsequent order profits, suggesting that once a customer converts, they contribute significantly to long-term profitability through repeat purchases.

Insight: Conversion plays a crucial role in driving subsequent profitability, which ultimately feeds into total profit.

  1. Source of Customer

Total Profit: Paid Social: The most impactful source of customer acquisition (25.41), contributing significantly to overall profitability. Direct (14.73) and Organic Search (14.31) follow closely, making them critical channels for scaling. Paid Search (13.45) is the least impactful among paid channels.

First Order Profit: Paid Social: Highest impact (5.92), but the difference between channels is smaller compared to total profit. Direct, Organic Search, and Paid Search contribute similarly (between 3.22 and 3.35), indicating comparable performance for initial purchases.

Subsequent Order Profit: Paid Social again leads (19.80), followed by Direct (11.26) and Organic Search (11.02). Paid Search contributes the least (9.92) among major channels but still significantly impacts subsequent profits.

Insight: Paid Social consistently drives higher profitability, particularly in subsequent orders, suggesting that customers acquired through this channel have a strong long-term value.

  1. Age

Total Profit: Negative coefficient (-0.14) indicates that older customers are associated with lower total profit.

First Order Profit: Similar negative effect (-0.034), showing that younger customers contribute slightly more to initial profits.

Subsequent Order Profit: Stronger negative effect (-0.10), reinforcing that younger customers are more valuable over time.

Insight: Younger customers are more profitable both in the short and long term, making them a priority for targeting.

  1. Titles

Total Profit: “Mr” has the largest positive impact (15.02), while other titles (Miss, Mrs, Ms) have minimal or non-significant effects.

First Order Profit: “Mr” contributes 2.24, with Mrs adding a modest 0.62.

Subsequent Order Profit: “Mr” has a strong impact (13.04), consistent with total profit, while other titles remain non-significant.

Insight: Customers addressed as “Mr” are consistently more profitable across all metrics, which makes sense for a male fashion line.

# Aggregate metrics by marketing channel
channel_performance <- df_full %>%
  group_by(`Source of Customer`) %>%
  summarise(
    Total_1st_Order_Profit = sum(`1st Order Profit`, na.rm = TRUE),
    Total_Subsequent_Order_Profit = sum(`Subsequent Order Profit`, na.rm = TRUE),
    Total_Subsequent_Orders_Count = sum(`Subsequent Orders Count`, na.rm = TRUE),
    Total_Promotions_Value = sum(`Total value of all promotions`, na.rm = TRUE)
  )

print(channel_performance)
## # A tibble: 5 × 5
##   `Source of Customer` Total_1st_Order_Profit Total_Subsequent_Order_Profit
##   <chr>                                 <int>                         <int>
## 1 Affiliates                            29751                        108538
## 2 Direct                               118219                        416526
## 3 Organic Search                        90581                        317012
## 4 Paid Search                           69435                        236535
## 5 Paid Social                           19029                         66661
## # ℹ 2 more variables: Total_Subsequent_Orders_Count <int>,
## #   Total_Promotions_Value <int>
ggplot(df_full, aes(x = reorder(`Source of Customer`, -conversion), y = conversion)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  labs(title = "Conversions by Marketing Channel",
       x = "Marketing Channel",
       y = "Number of Conversions") +
  theme_minimal()

Direct is the strongest overall performer, leading in first-order profit, subsequent profit, and repeat orders, despite incurring the highest promotional costs.

Organic Search is a close second, showing strong profitability and efficient promotional spending relative to its performance. Paid Social, while having the lowest volume, shows strong profitability metrics like Revenue per Subsequent Order, suggesting high customer value.

Affiliates and Paid Search contribute moderately but require optimization in promotional spending and subsequent order retention to improve their overall profitability.

As Direct is the strongest performer in all value categories as well as conversion, which significantly impacts total profit, we will focus on analyzing Direct channel’s target demographic.

# Focus on the Direct channel
direct_channel <- df_full %>%
  filter(`Source of Customer` == "Direct") %>%
  mutate(
    Age_Group = case_when(
      Age < 20 ~ "Under 20",
      Age >= 20 & Age < 30 ~ "20-29",
      Age >= 30 & Age < 40 ~ "30-39",
      Age >= 40 & Age < 50 ~ "40-49",
      Age >= 50 ~ "50 and above"
    )
  )

# Calculate conversion rates and profits by age group
age_analysis <- direct_channel %>%
  group_by(Age_Group) %>%
  summarise(
    Total_Customers = n(),
    Conversions = sum(conversion == 1, na.rm = TRUE),
    Total_Profit = sum(Total_Profit, na.rm = TRUE),
    Conversion_Rate = (Conversions / Total_Customers) * 100
  ) %>%
  arrange(desc(Conversion_Rate)) # Order by highest conversion rate


# View the results
print("Age Demographic Analysis:")
## [1] "Age Demographic Analysis:"
print(age_analysis)
## # A tibble: 4 × 5
##   Age_Group Total_Customers Conversions Total_Profit Conversion_Rate
##   <chr>               <int>       <int>        <int>           <dbl>
## 1 20-29                4774        2803       168053            58.7
## 2 Under 20             1762        1021        92841            57.9
## 3 30-39                3825        2215       164411            57.9
## 4 40-49                 891         499        39678            56.0

Under 20: Despite being a smaller demographic, they show high engagement and profitability relative to their size, making them a valuable secondary target for growth initiatives.

20-29: This is the strongest performing age group across all metrics, making it the primary target demographic for marketing efforts. Their high customer volume, conversions, profit, and conversion rate suggest that campaigns targeting this group are highly effective.

30-39: This group is a close competitor to 20-29 in terms of total profit and conversions, suggesting they should not be overlooked in campaign strategies.

40-49: This is the weakest-performing group across all metrics, with the smallest customer base, lowest conversions, and lowest profit. Efforts to improve engagement in this demographic may require tailored campaigns or alternative strategies to boost their profitability.

# Calculate conversion rates and profits by time of day (hour)
time_analysis <- direct_channel %>%
  group_by(Time) %>%
  summarise(
    Total_Customers = n(),
    Conversions = sum(conversion, na.rm = TRUE),
    Total_Profit = sum(Total_Profit, na.rm = TRUE),
    Conversion_Rate = (Conversions / Total_Customers) * 100
  ) %>%
  arrange(desc(Conversion_Rate)) # Order by highest conversion rate

direct_channel <- direct_channel %>%
  mutate(Hour = hour(hm(Time)))  # Extract hour using lubridate

# Group data into 4-hour time buckets
time_analysis <- direct_channel %>%
  mutate(Time_Bucket = cut(Hour, 
                           breaks = seq(0, 24, by = 4), 
                           include.lowest = TRUE, 
                           labels = c("12-4 AM", "4-8 AM", "8-12 PM", 
                                      "12-4 PM", "4-8 PM", "8-12 AM"))) %>%
  group_by(Time_Bucket) %>%
  summarise(
    Total_Customers = n(),
    Conversions = sum(conversion == 1, na.rm = TRUE),
    Total_Profit = sum(Total_Profit, na.rm = TRUE),
    Avg_Conversion_Rate = mean(conversion == 1, na.rm = TRUE) * 100
  )

# Plot Conversion Rate and Total Profit
ggplot(time_analysis, aes(x = Time_Bucket)) +
  geom_bar(aes(y = Avg_Conversion_Rate), stat = "identity", fill = "steelblue", alpha = 0.7) +
  geom_line(aes(y = Total_Profit / 1000, group = 1), color = "darkred", size = 1.2) +
  geom_point(aes(y = Total_Profit / 1000), color = "darkred", size = 3) +
  scale_y_continuous(
    name = "Average Conversion Rate (%)",
    sec.axis = sec_axis(~ . * 1000, name = "Total Profit (in Thousands)")
  ) +
  labs(
    title = "Time of Day Analysis (Grouped by Time Buckets)",
    x = "Time of Day",
    y = "Average Conversion Rate (%)"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.title.y.right = element_text(color = "darkred")
  )
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

High Profit and Consistent Conversion: The 12-4 PM bucket performs the best, with high total profit and a stable conversion rate. Marketing and sales efforts should focus on maximizing reach and engagement during this period.

Underperforming Periods: The 4-8 AM and 4-8 PM buckets underperform both in terms of profit and conversion rates, suggesting less customer activity during these hours. Resources might be better allocated to other time periods.

# Focus on the Direct channel and prepare the data
city_analysis <- df_full %>%
  filter(`Source of Customer` == "Direct") %>%
  group_by(`Address City`) %>%
  summarise(
    Total_Customers = n(),
    Conversions = sum(conversion, na.rm = TRUE),
    Total_Profit = sum(Total_Profit, na.rm = TRUE),
    Conversion_Rate = (Conversions / Total_Customers) * 100
  ) %>%
  arrange(desc(Conversion_Rate)) # Order by highest conversion rate

# View the results
print("City Analysis:")
## [1] "City Analysis:"
print(city_analysis)
## # A tibble: 50 × 5
##    `Address City` Total_Customers Conversions Total_Profit Conversion_Rate
##    <chr>                    <int>       <dbl>        <int>           <dbl>
##  1 Longford                    16          15          973            93.8
##  2 Blackrock                   41          31         1662            75.6
##  3 Portlaoise                  56          40         2869            71.4
##  4 Tralee                      91          63         4267            69.2
##  5 Dungarvan                   22          15          759            68.2
##  6 Enniscorthy                 43          29         2176            67.4
##  7 Athlone                     54          36         1970            66.7
##  8 Gorey                       33          22         1452            66.7
##  9 Tullamore                   33          22         1692            66.7
## 10 Ashbourne                   45          29         2133            64.4
## # ℹ 40 more rows

For Longford & Blackrock (High Conversion, Low Profit):

Increase transaction value: Introduce upselling and cross-selling strategies to encourage higher-value purchases in these cities. Expand customer base: Consider local campaigns to attract new customers and grow market share in these regions. Optimize costs: Review supply chain and operational efficiencies in these areas to improve margins.

For Dublin & Limerick (Moderate Conversion, High Profit):

Focus on retention: Build loyalty programs or premium offerings to retain high-value customers in these cities. Replicate success: Identify what works in these regions (e.g., product mix, promotions) and adapt these strategies for underperforming locations. General Approach: Profit-driven segmentation: Segment campaigns based on profit potential rather than just conversion rates, ensuring resources are allocated efficiently.

Part 2

  1. The URL includes:

    • The base category: https://www.marksandspencer.com/ie/l/men/mens-shirts.
    • A fit filter: /fs5/slim-fit.
    • A color filter: /blue. - The fs5 section specifies filters in a hierarchical order: /fs5/{fit}/{color}
    • An anchor (#) that includes metadata for internal tracking, such as navigation source, product type, and other details.
  2. Regular Fit Shirts

    https://www.marksandspencer.com/ie/l/men/mens-shirts/fs5/regular-fit#intid=finav_mw_plp_Regular%20Fit%20Shirts_foundit_fi_fi-Men%E2%80%99s-Regular-Fit_top_plp_12_1
  3. Slim Fit Shirts

    https://www.marksandspencer.com/ie/l/men/mens-shirts/fs5/slim-fit#intid=finav_mw_plp_Slim%20Fit%20Shirts_foundit_fi_fi-Men%E2%80%99s-Slim-Fit_top_plp_12_2
  4. Blue Shirts

    https://www.marksandspencer.com/ie/l/men/mens-shirts/fs5/blue#intid=finav_mw_plp_Blue%20Shirts_foundit_fi_fi-Men%E2%80%99s-Blue_top_plp_12_3
  5. White Shirts

    https://www.marksandspencer.com/ie/l/men/mens-shirts/fs5/white#intid=finav_mw_plp_White%20Shirts_foundit_fi_fi-Men%E2%80%99s-White_top_plp_12_4
  6. Green Shirts

    https://www.marksandspencer.com/ie/l/men/mens-shirts/fs5/green#intid=finav_mw_plp_Green%20Shirts_foundit_fi_fi-Men%E2%80%99s-Green_top_plp_12_5
  7. Regular Fit Blue Shirts

    https://www.marksandspencer.com/ie/l/men/mens-shirts/fs5/regular-fit/blue#intid=finav_mw_plp_Regular%20Fit%20Blue%20Shirts_foundit_fi_fi-Men%E2%80%99s-Blue-Regular-Fit_top_plp_12_6
  8. Regular Fit White Shirts

    https://www.marksandspencer.com/ie/l/men/mens-shirts/fs5/regular-fit/white#intid=finav_mw_plp_Regular%20Fit%20White%20Shirts_foundit_fi_fi-Men%E2%80%99s-White-Regular-Fit_top_plp_12_7
  9. Regular Fit Green Shirts

    https://www.marksandspencer.com/ie/l/men/mens-shirts/fs5/regular-fit/green#intid=finav_mw_plp_Regular%20Fit%20Green%20Shirts_foundit_fi_fi-Men%E2%80%99s-Green-Regular-Fit_top_plp_12_8
  10. Slim Fit Blue Shirts

    https://www.marksandspencer.com/ie/l/men/mens-shirts/fs5/slim-fit/blue#intid=finav_mw_plp_Slim%20Fit%20Blue%20Shirts_foundit_fi_fi-Men%E2%80%99s-Blue-Slim-Fit_top_plp_12_9
  11. Slim Fit White Shirts

https://www.marksandspencer.com/ie/l/men/mens-shirts/fs5/slim-fit/white#intid=finav_mw_plp_Slim%20Fit%20White%20Shirts_foundit_fi_fi-Men%E2%80%99s-White-Slim-Fit_top_plp_12_10
  1. Slim Fit Green Shirts
https://www.marksandspencer.com/ie/l/men/mens-shirts/fs5/slim-fit/green#intid=finav_mw_plp_Slim%20Fit%20Green%20Shirts_foundit_fi_fi-Men%E2%80%99s-Green-Slim-Fit_top_plp_12_11

Key Takeaways: - The pattern consistently appends fit and color as segments of the fs5 path. - The #intid portion tracks navigation metadata and might vary based on campaigns. - Ensure the correct fit and color filters are ordered properly in the URL.