Part 1
Primary Focus: Optimize the Direct channel’s performance while reducing inefficient costs in Affiliates.
Key Insights: - Direct is the dominant channel (largest volume, strong profitability). - Paid Social has the highest profit per customer, though with lower volume. - Geography & Age Trends: Dublin leads in Direct volume while Longford leads in conversion rate; younger demographics (20-29) are key profit drivers.
Recommendations: Cut costs by reallocating spend from Affiliates to Direct and Paid Social. Prioritize Dublin as it drives substantial profit and should be a focal point for further investment. Focus on 20-29 age group for maximum long-term profitability.
library(tidyr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(rio)
rm(list=ls())
setwd("/Users/mkim/Desktop/2K")
df <- import("/Users/mkim/Desktop/2K/Demographic.csv")
dfvalue <- import("/Users/mkim/Desktop/2K/Value.csv")
# 1. Data Preparation
df <- df %>%
separate(`Registration Date`, into = c("Date", "Time"), sep = " ")
df <- df %>%
mutate(
`Date Of Birth` = mdy(`Date Of Birth`), # Convert to Date format (if stored as DD/MM/YYYY)
`Date` = mdy(`Date`), # Ensure the reference date is also in Date format
Age = as.integer(interval(`Date Of Birth`, `Date`) / years(1)) # Calculate Age
)
# 2. Merge in Customer Value Data (dfvalue)
# Assume customer value data is in 'dfvalue'
df_full <- left_join(df, dfvalue, by = "Customer ID") %>%
# Create an overall profitability metric
mutate(Total_Profit = `1st Order Profit` + `Subsequent Order Profit` - `Total value of all promotions`)
channel_attr <- df_full %>%
group_by(`Source of Customer`) %>%
summarise(
Num_Customers = n(),
Total_First_Order_Profit = sum(`1st Order Profit`, na.rm = TRUE),
Total_Subsequent_Order_Profit = sum(`Subsequent Order Profit`, na.rm = TRUE),
Total_Profit = sum(Total_Profit, na.rm = TRUE),
Avg_Profit_Per_Customer = Total_Profit / n()
) %>%
# Calculate percentage contributions
mutate(
Perc_Customers = Num_Customers / sum(Num_Customers) * 100,
Perc_Profit = Total_Profit / sum(Total_Profit) * 100
)
# View the aggregated data
print(channel_attr)
## # A tibble: 5 × 8
## `Source of Customer` Num_Customers Total_First_Order_Profit
## <chr> <int> <int>
## 1 Affiliates 3868 29751
## 2 Direct 11252 118219
## 3 Organic Search 8392 90581
## 4 Paid Search 6518 69435
## 5 Paid Social 1411 19029
## # ℹ 5 more variables: Total_Subsequent_Order_Profit <int>, Total_Profit <int>,
## # Avg_Profit_Per_Customer <dbl>, Perc_Customers <dbl>, Perc_Profit <dbl>
#Bar Chart – Total Profit by Channel
ggplot(channel_attr, aes(x = reorder(`Source of Customer`, -Total_Profit), y = Total_Profit, fill = `Source of Customer`)) +
geom_bar(stat = "identity") +
labs(title = "Total Profit Attribution by Source of Customer",
x = "Source of Customer",
y = "Total Profit") +
theme_minimal()
# Bar Chart – Average Profit per Customer
ggplot(channel_attr, aes(x = reorder(`Source of Customer`, -Avg_Profit_Per_Customer), y = Avg_Profit_Per_Customer, fill = `Source of Customer`)) +
geom_bar(stat = "identity") +
geom_text(aes(label = round(Avg_Profit_Per_Customer, 1)), vjust = -0.5, size = 4) +
labs(title = "Average Profit per Customer by Source of Customer",
x = "Source of Customer",
y = "Avg Profit per Customer") +
theme_minimal()
# Scatterplot – Customer vs. Profit Contribution by Source of Customer
ggplot(channel_attr, aes(x = Perc_Customers, y = Perc_Profit, color = `Source of Customer`)) +
geom_point(size = 3) +
geom_text(aes(label = `Source of Customer`), vjust = -1, size = 4, color = "black") +
labs(title = "Customer vs. Profit Contribution by Source of Customer",
x = "Percentage of Customers",
y = "Percentage of Total Profit") +
theme_minimal(base_size = 15) +
theme(
plot.title = element_text(size = 14, face = "bold"),
axis.text = element_text(size = 7),
axis.title = element_text(size = 12)
) +
coord_cartesian(clip = "off")
Observations:
Affiliates: This channel underperforms significantly, with the lowest average profit per customer (29.35) and a disproportionate profit contribution relative to its customer base (8.91% profit vs. 12.30% customers). Action: Review and optimize partnerships, focusing on high-value affiliates. Negotiate better cost structures or discontinue low-performing affiliates.
Direct: While it dominates in volume and profit, its average profit per customer (41.32) is lower than channels like Paid Social. Action: Focus on customer retention and increasing subsequent orders to enhance profitability.
Organic Search: Balances customer acquisition volume and profit contribution well. Its 42.10 average profit per customer is solid. Action: Continue investing in SEO and organic content strategies. Address crowd-out effects by complementing organic efforts with Paid Search.
Paid Search: Performs efficiently but shows slightly lower average profit per customer (40.84) compared to Organic Search and Paid Social. Action: Optimize ad spending by identifying high-performing keywords and reducing crowd-out from competitors.
Paid Social: While small in scale, it delivers the highest average profit per customer (53.99) and a profit contribution higher than its customer share (5.98% profit vs. 4.49% customers). Action: Scale Paid Social campaigns cautiously, maintaining profitability by focusing on high-conversion audiences.
# Aggregate promotions and calculate adjusted profit
channel_promo_impact <- df_full %>%
group_by(`Source of Customer`) %>%
summarise(
Total_Customers = n(),
Total_First_Order_Profit = sum(`1st Order Profit`, na.rm = TRUE),
Total_Subsequent_Order_Profit = sum(`Subsequent Order Profit`, na.rm = TRUE),
Total_Value_of_All_Promotions = sum(`Total value of all promotions`, na.rm = TRUE),
Total_Profit = sum(`1st Order Profit`, na.rm = TRUE) + sum(`Subsequent Order Profit`, na.rm = TRUE),
Total_Net_Profit = Total_Profit - Total_Value_of_All_Promotions
) %>%
mutate(
Profit_Impact = Total_Profit - Total_Net_Profit, # Total loss due to promotions
Profit_Impact_Percentage = (Profit_Impact / Total_Profit) * 100 # Percentage loss
)
# View the aggregated data
print(channel_promo_impact)
## # A tibble: 5 × 9
## `Source of Customer` Total_Customers Total_First_Order_Profit
## <chr> <int> <int>
## 1 Affiliates 3868 29751
## 2 Direct 11252 118219
## 3 Organic Search 8392 90581
## 4 Paid Search 6518 69435
## 5 Paid Social 1411 19029
## # ℹ 6 more variables: Total_Subsequent_Order_Profit <int>,
## # Total_Value_of_All_Promotions <int>, Total_Profit <int>,
## # Total_Net_Profit <int>, Profit_Impact <int>, Profit_Impact_Percentage <dbl>
# Bar plot showing original vs adjusted profit
ggplot(channel_promo_impact, aes(x = reorder(`Source of Customer`, -Total_Profit))) +
geom_bar(aes(y = Total_Profit, fill = "Total Profit"), stat = "identity") +
geom_bar(aes(y = Total_Net_Profit, fill = "Net Profit"), stat = "identity", alpha = 0.8) +
labs(
title = "Impact of Promotions on Channel Performance",
x = "Source of Customer",
y = "Profit",
fill = "Profit Type"
) +
theme_minimal()
# Bar plot showing profit impact percentage
ggplot(channel_promo_impact, aes(x = reorder(`Source of Customer`, -Profit_Impact_Percentage), y = Profit_Impact_Percentage, fill = `Source of Customer`)) +
geom_bar(stat = "identity") +
labs(
title = "Profit Impact Percentage by Channel Due to Promotions",
x = "Source of Customer",
y = "Impact Percentage (%)"
) +
theme_minimal()
High promotional spending in Direct (69,762) is justified given its dominant profit contribution. Organic Search and Paid Search show good profitability but might benefit from a combined organic and paid strategy to reduce crowd-out impacts. Additionally, Affiliates demonstrate low profitability (113,542 net profit) despite moderate promotional costs, indicating inefficiency in this channel. This is backed by the rising cost for Affiliates during January to March of 2013, making it unsustainable as a customer acquisition channel. Paid Social’s high average profit per customer (53.99) shows potential for growth, given minimal crowd-out effects.
library(lmtest)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
# Convert 'Contact Allowed' to a binary 'conversion' column
df_full <- df_full %>%
mutate(conversion = ifelse(`Contact Allowed` == "Y", 1, 0))
# Regression Model: Impact of Conversion Rate on Total Profit
model_total_profit <- lm(Total_Profit ~ conversion + `Source of Customer` + Age + Title, data = df_full)
summary(model_total_profit)
##
## Call:
## lm(formula = Total_Profit ~ conversion + `Source of Customer` +
## Age + Title, data = df_full)
##
## Residuals:
## Min 1Q Median 3Q Max
## -72.93 -29.37 -10.81 17.61 353.19
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.935003 1.015709 12.735 <2e-16 ***
## conversion 14.699235 0.492089 29.871 <2e-16 ***
## `Source of Customer`Direct 14.725804 0.808757 18.208 <2e-16 ***
## `Source of Customer`Organic Search 14.310708 0.838726 17.062 <2e-16 ***
## `Source of Customer`Paid Search 13.453586 0.876400 15.351 <2e-16 ***
## `Source of Customer`Paid Social 25.407326 1.339612 18.966 <2e-16 ***
## Age -0.143120 0.008215 -17.422 <2e-16 ***
## TitleMiss -0.384297 1.071276 -0.359 0.720
## TitleMr 15.019500 0.746471 20.121 <2e-16 ***
## TitleMrs 1.720046 1.123762 1.531 0.126
## TitleMs -0.311075 1.121211 -0.277 0.781
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 42.51 on 30580 degrees of freedom
## (850 observations deleted due to missingness)
## Multiple R-squared: 0.07523, Adjusted R-squared: 0.07493
## F-statistic: 248.8 on 10 and 30580 DF, p-value: < 2.2e-16
# Regression Model: Impact of Conversion Rate on First Order Profit
model_first_order_profit <- lm(`1st Order Profit` ~ conversion + `Source of Customer` + Age + Title, data = df_full)
summary(model_first_order_profit)
##
## Call:
## lm(formula = `1st Order Profit` ~ conversion + `Source of Customer` +
## Age + Title, data = df_full)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.820 -5.194 -0.843 4.115 43.771
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.858719 0.162805 42.128 < 2e-16 ***
## conversion -0.084954 0.078876 -1.077 0.281462
## `Source of Customer`Direct 3.278260 0.129633 25.289 < 2e-16 ***
## `Source of Customer`Organic Search 3.351777 0.134437 24.932 < 2e-16 ***
## `Source of Customer`Paid Search 3.220577 0.140475 22.926 < 2e-16 ***
## `Source of Customer`Paid Social 5.919619 0.214722 27.569 < 2e-16 ***
## Age -0.034361 0.001317 -26.095 < 2e-16 ***
## TitleMiss -0.027290 0.171712 -0.159 0.873727
## TitleMr 2.236935 0.119650 18.696 < 2e-16 ***
## TitleMrs 0.618564 0.180124 3.434 0.000595 ***
## TitleMs -0.148417 0.179715 -0.826 0.408900
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.814 on 30580 degrees of freedom
## (850 observations deleted due to missingness)
## Multiple R-squared: 0.07054, Adjusted R-squared: 0.07024
## F-statistic: 232.1 on 10 and 30580 DF, p-value: < 2.2e-16
# Regression Model: Impact of Conversion Rate on Subsequent Order Profit
model_subsequent_order_profit <- lm(`Subsequent Order Profit` ~ conversion + `Source of Customer` + Age + Title, data = df_full)
summary(model_subsequent_order_profit)
##
## Call:
## lm(formula = `Subsequent Order Profit` ~ conversion + `Source of Customer` +
## Age + Title, data = df_full)
##
## Residuals:
## Min 1Q Median 3Q Max
## -65.75 -33.66 -12.28 20.67 356.13
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.415290 1.104947 11.236 <2e-16 ***
## conversion 14.733770 0.535323 27.523 <2e-16 ***
## `Source of Customer`Direct 11.257967 0.879812 12.796 <2e-16 ***
## `Source of Customer`Organic Search 11.018750 0.912414 12.076 <2e-16 ***
## `Source of Customer`Paid Search 9.924481 0.953398 10.410 <2e-16 ***
## `Source of Customer`Paid Social 19.796256 1.457307 13.584 <2e-16 ***
## Age -0.104771 0.008937 -11.724 <2e-16 ***
## TitleMiss -0.437682 1.165396 -0.376 0.707
## TitleMr 13.042050 0.812054 16.061 <2e-16 ***
## TitleMrs 1.691230 1.222492 1.383 0.167
## TitleMs -0.194792 1.219717 -0.160 0.873
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 46.24 on 30580 degrees of freedom
## (850 observations deleted due to missingness)
## Multiple R-squared: 0.05153, Adjusted R-squared: 0.05122
## F-statistic: 166.1 on 10 and 30580 DF, p-value: < 2.2e-16
# Print summaries for interpretation
cat("\nRegression Analysis: Total Profit\n")
##
## Regression Analysis: Total Profit
print(summary(model_total_profit))
##
## Call:
## lm(formula = Total_Profit ~ conversion + `Source of Customer` +
## Age + Title, data = df_full)
##
## Residuals:
## Min 1Q Median 3Q Max
## -72.93 -29.37 -10.81 17.61 353.19
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.935003 1.015709 12.735 <2e-16 ***
## conversion 14.699235 0.492089 29.871 <2e-16 ***
## `Source of Customer`Direct 14.725804 0.808757 18.208 <2e-16 ***
## `Source of Customer`Organic Search 14.310708 0.838726 17.062 <2e-16 ***
## `Source of Customer`Paid Search 13.453586 0.876400 15.351 <2e-16 ***
## `Source of Customer`Paid Social 25.407326 1.339612 18.966 <2e-16 ***
## Age -0.143120 0.008215 -17.422 <2e-16 ***
## TitleMiss -0.384297 1.071276 -0.359 0.720
## TitleMr 15.019500 0.746471 20.121 <2e-16 ***
## TitleMrs 1.720046 1.123762 1.531 0.126
## TitleMs -0.311075 1.121211 -0.277 0.781
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 42.51 on 30580 degrees of freedom
## (850 observations deleted due to missingness)
## Multiple R-squared: 0.07523, Adjusted R-squared: 0.07493
## F-statistic: 248.8 on 10 and 30580 DF, p-value: < 2.2e-16
cat("\nRegression Analysis: First Order Profit\n")
##
## Regression Analysis: First Order Profit
print(summary(model_first_order_profit))
##
## Call:
## lm(formula = `1st Order Profit` ~ conversion + `Source of Customer` +
## Age + Title, data = df_full)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.820 -5.194 -0.843 4.115 43.771
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.858719 0.162805 42.128 < 2e-16 ***
## conversion -0.084954 0.078876 -1.077 0.281462
## `Source of Customer`Direct 3.278260 0.129633 25.289 < 2e-16 ***
## `Source of Customer`Organic Search 3.351777 0.134437 24.932 < 2e-16 ***
## `Source of Customer`Paid Search 3.220577 0.140475 22.926 < 2e-16 ***
## `Source of Customer`Paid Social 5.919619 0.214722 27.569 < 2e-16 ***
## Age -0.034361 0.001317 -26.095 < 2e-16 ***
## TitleMiss -0.027290 0.171712 -0.159 0.873727
## TitleMr 2.236935 0.119650 18.696 < 2e-16 ***
## TitleMrs 0.618564 0.180124 3.434 0.000595 ***
## TitleMs -0.148417 0.179715 -0.826 0.408900
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.814 on 30580 degrees of freedom
## (850 observations deleted due to missingness)
## Multiple R-squared: 0.07054, Adjusted R-squared: 0.07024
## F-statistic: 232.1 on 10 and 30580 DF, p-value: < 2.2e-16
cat("\nRegression Analysis: Subsequent Order Profit\n")
##
## Regression Analysis: Subsequent Order Profit
print(summary(model_subsequent_order_profit))
##
## Call:
## lm(formula = `Subsequent Order Profit` ~ conversion + `Source of Customer` +
## Age + Title, data = df_full)
##
## Residuals:
## Min 1Q Median 3Q Max
## -65.75 -33.66 -12.28 20.67 356.13
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.415290 1.104947 11.236 <2e-16 ***
## conversion 14.733770 0.535323 27.523 <2e-16 ***
## `Source of Customer`Direct 11.257967 0.879812 12.796 <2e-16 ***
## `Source of Customer`Organic Search 11.018750 0.912414 12.076 <2e-16 ***
## `Source of Customer`Paid Search 9.924481 0.953398 10.410 <2e-16 ***
## `Source of Customer`Paid Social 19.796256 1.457307 13.584 <2e-16 ***
## Age -0.104771 0.008937 -11.724 <2e-16 ***
## TitleMiss -0.437682 1.165396 -0.376 0.707
## TitleMr 13.042050 0.812054 16.061 <2e-16 ***
## TitleMrs 1.691230 1.222492 1.383 0.167
## TitleMs -0.194792 1.219717 -0.160 0.873
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 46.24 on 30580 degrees of freedom
## (850 observations deleted due to missingness)
## Multiple R-squared: 0.05153, Adjusted R-squared: 0.05122
## F-statistic: 166.1 on 10 and 30580 DF, p-value: < 2.2e-16
# (a) Bar Chart of Conversions by Marketing Channel
channel_conv <- df %>%
filter(`Contact Allowed` == "Y") %>% # Only count successful conversions
group_by(`Source of Customer`) %>%
summarise(Conversions = n())
model_total_profit <- lm(Total_Profit ~ Age + conversion + `Source of Customer` + Title, data = df_full)
summary(model_total_profit)
##
## Call:
## lm(formula = Total_Profit ~ Age + conversion + `Source of Customer` +
## Title, data = df_full)
##
## Residuals:
## Min 1Q Median 3Q Max
## -72.93 -29.37 -10.81 17.61 353.19
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.935003 1.015709 12.735 <2e-16 ***
## Age -0.143120 0.008215 -17.422 <2e-16 ***
## conversion 14.699235 0.492089 29.871 <2e-16 ***
## `Source of Customer`Direct 14.725804 0.808757 18.208 <2e-16 ***
## `Source of Customer`Organic Search 14.310708 0.838726 17.062 <2e-16 ***
## `Source of Customer`Paid Search 13.453586 0.876400 15.351 <2e-16 ***
## `Source of Customer`Paid Social 25.407326 1.339612 18.966 <2e-16 ***
## TitleMiss -0.384297 1.071276 -0.359 0.720
## TitleMr 15.019500 0.746471 20.121 <2e-16 ***
## TitleMrs 1.720046 1.123762 1.531 0.126
## TitleMs -0.311075 1.121211 -0.277 0.781
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 42.51 on 30580 degrees of freedom
## (850 observations deleted due to missingness)
## Multiple R-squared: 0.07523, Adjusted R-squared: 0.07493
## F-statistic: 248.8 on 10 and 30580 DF, p-value: < 2.2e-16
Total Profit: Conversion has the strongest positive impact (14.70) on overall profitability, highlighting the importance of successfully converting customers.
First Order Profit: Conversion is not statistically significant (p = 0.28) for first-order profits.
Subsequent Order Profit: Conversion has a significant and strong positive impact (14.73) on subsequent order profits, suggesting that once a customer converts, they contribute significantly to long-term profitability through repeat purchases.
Insight: Conversion plays a crucial role in driving subsequent profitability, which ultimately feeds into total profit.
Total Profit: Paid Social: The most impactful source of customer acquisition (25.41), contributing significantly to overall profitability. Direct (14.73) and Organic Search (14.31) follow closely, making them critical channels for scaling. Paid Search (13.45) is the least impactful among paid channels.
First Order Profit: Paid Social: Highest impact (5.92), but the difference between channels is smaller compared to total profit. Direct, Organic Search, and Paid Search contribute similarly (between 3.22 and 3.35), indicating comparable performance for initial purchases.
Subsequent Order Profit: Paid Social again leads (19.80), followed by Direct (11.26) and Organic Search (11.02). Paid Search contributes the least (9.92) among major channels but still significantly impacts subsequent profits.
Insight: Paid Social consistently drives higher profitability, particularly in subsequent orders, suggesting that customers acquired through this channel have a strong long-term value.
Total Profit: Negative coefficient (-0.14) indicates that older customers are associated with lower total profit.
First Order Profit: Similar negative effect (-0.034), showing that younger customers contribute slightly more to initial profits.
Subsequent Order Profit: Stronger negative effect (-0.10), reinforcing that younger customers are more valuable over time.
Insight: Younger customers are more profitable both in the short and long term, making them a priority for targeting.
Total Profit: “Mr” has the largest positive impact (15.02), while other titles (Miss, Mrs, Ms) have minimal or non-significant effects.
First Order Profit: “Mr” contributes 2.24, with Mrs adding a modest 0.62.
Subsequent Order Profit: “Mr” has a strong impact (13.04), consistent with total profit, while other titles remain non-significant.
Insight: Customers addressed as “Mr” are consistently more profitable across all metrics, which makes sense for a male fashion line.
# Aggregate metrics by marketing channel
channel_performance <- df_full %>%
group_by(`Source of Customer`) %>%
summarise(
Total_1st_Order_Profit = sum(`1st Order Profit`, na.rm = TRUE),
Total_Subsequent_Order_Profit = sum(`Subsequent Order Profit`, na.rm = TRUE),
Total_Subsequent_Orders_Count = sum(`Subsequent Orders Count`, na.rm = TRUE),
Total_Promotions_Value = sum(`Total value of all promotions`, na.rm = TRUE)
)
print(channel_performance)
## # A tibble: 5 × 5
## `Source of Customer` Total_1st_Order_Profit Total_Subsequent_Order_Profit
## <chr> <int> <int>
## 1 Affiliates 29751 108538
## 2 Direct 118219 416526
## 3 Organic Search 90581 317012
## 4 Paid Search 69435 236535
## 5 Paid Social 19029 66661
## # ℹ 2 more variables: Total_Subsequent_Orders_Count <int>,
## # Total_Promotions_Value <int>
ggplot(df_full, aes(x = reorder(`Source of Customer`, -conversion), y = conversion)) +
geom_bar(stat = "identity", fill = "skyblue") +
labs(title = "Conversions by Marketing Channel",
x = "Marketing Channel",
y = "Number of Conversions") +
theme_minimal()
Direct is the strongest overall performer, leading in first-order profit, subsequent profit, and repeat orders, despite incurring the highest promotional costs.
Organic Search is a close second, showing strong profitability and efficient promotional spending relative to its performance. Paid Social, while having the lowest volume, shows strong profitability metrics like Revenue per Subsequent Order, suggesting high customer value.
Affiliates and Paid Search contribute moderately but require optimization in promotional spending and subsequent order retention to improve their overall profitability.
As Direct is the strongest performer in all value categories as well as conversion, which significantly impacts total profit, we will focus on analyzing Direct channel’s target demographic.
# Focus on the Direct channel
direct_channel <- df_full %>%
filter(`Source of Customer` == "Direct") %>%
mutate(
Age_Group = case_when(
Age < 20 ~ "Under 20",
Age >= 20 & Age < 30 ~ "20-29",
Age >= 30 & Age < 40 ~ "30-39",
Age >= 40 & Age < 50 ~ "40-49",
Age >= 50 ~ "50 and above"
)
)
# Calculate conversion rates and profits by age group
age_analysis <- direct_channel %>%
group_by(Age_Group) %>%
summarise(
Total_Customers = n(),
Conversions = sum(conversion == 1, na.rm = TRUE),
Total_Profit = sum(Total_Profit, na.rm = TRUE),
Conversion_Rate = (Conversions / Total_Customers) * 100
) %>%
arrange(desc(Conversion_Rate)) # Order by highest conversion rate
# View the results
print("Age Demographic Analysis:")
## [1] "Age Demographic Analysis:"
print(age_analysis)
## # A tibble: 4 × 5
## Age_Group Total_Customers Conversions Total_Profit Conversion_Rate
## <chr> <int> <int> <int> <dbl>
## 1 20-29 4774 2803 168053 58.7
## 2 Under 20 1762 1021 92841 57.9
## 3 30-39 3825 2215 164411 57.9
## 4 40-49 891 499 39678 56.0
Under 20: Despite being a smaller demographic, they show high engagement and profitability relative to their size, making them a valuable secondary target for growth initiatives.
20-29: This is the strongest performing age group across all metrics, making it the primary target demographic for marketing efforts. Their high customer volume, conversions, profit, and conversion rate suggest that campaigns targeting this group are highly effective.
30-39: This group is a close competitor to 20-29 in terms of total profit and conversions, suggesting they should not be overlooked in campaign strategies.
40-49: This is the weakest-performing group across all metrics, with the smallest customer base, lowest conversions, and lowest profit. Efforts to improve engagement in this demographic may require tailored campaigns or alternative strategies to boost their profitability.
# Calculate conversion rates and profits by time of day (hour)
time_analysis <- direct_channel %>%
group_by(Time) %>%
summarise(
Total_Customers = n(),
Conversions = sum(conversion, na.rm = TRUE),
Total_Profit = sum(Total_Profit, na.rm = TRUE),
Conversion_Rate = (Conversions / Total_Customers) * 100
) %>%
arrange(desc(Conversion_Rate)) # Order by highest conversion rate
direct_channel <- direct_channel %>%
mutate(Hour = hour(hm(Time))) # Extract hour using lubridate
# Group data into 4-hour time buckets
time_analysis <- direct_channel %>%
mutate(Time_Bucket = cut(Hour,
breaks = seq(0, 24, by = 4),
include.lowest = TRUE,
labels = c("12-4 AM", "4-8 AM", "8-12 PM",
"12-4 PM", "4-8 PM", "8-12 AM"))) %>%
group_by(Time_Bucket) %>%
summarise(
Total_Customers = n(),
Conversions = sum(conversion == 1, na.rm = TRUE),
Total_Profit = sum(Total_Profit, na.rm = TRUE),
Avg_Conversion_Rate = mean(conversion == 1, na.rm = TRUE) * 100
)
# Plot Conversion Rate and Total Profit
ggplot(time_analysis, aes(x = Time_Bucket)) +
geom_bar(aes(y = Avg_Conversion_Rate), stat = "identity", fill = "steelblue", alpha = 0.7) +
geom_line(aes(y = Total_Profit / 1000, group = 1), color = "darkred", size = 1.2) +
geom_point(aes(y = Total_Profit / 1000), color = "darkred", size = 3) +
scale_y_continuous(
name = "Average Conversion Rate (%)",
sec.axis = sec_axis(~ . * 1000, name = "Total Profit (in Thousands)")
) +
labs(
title = "Time of Day Analysis (Grouped by Time Buckets)",
x = "Time of Day",
y = "Average Conversion Rate (%)"
) +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
axis.title.y.right = element_text(color = "darkred")
)
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
High Profit
and Consistent Conversion: The 12-4 PM bucket performs the best, with
high total profit and a stable conversion rate. Marketing and sales
efforts should focus on maximizing reach and engagement during this
period.
Underperforming Periods: The 4-8 AM and 4-8 PM buckets underperform both in terms of profit and conversion rates, suggesting less customer activity during these hours. Resources might be better allocated to other time periods.
# Focus on the Direct channel and prepare the data
city_analysis <- df_full %>%
filter(`Source of Customer` == "Direct") %>%
group_by(`Address City`) %>%
summarise(
Total_Customers = n(),
Conversions = sum(conversion, na.rm = TRUE),
Total_Profit = sum(Total_Profit, na.rm = TRUE),
Conversion_Rate = (Conversions / Total_Customers) * 100
) %>%
arrange(desc(Conversion_Rate)) # Order by highest conversion rate
# View the results
print("City Analysis:")
## [1] "City Analysis:"
print(city_analysis)
## # A tibble: 50 × 5
## `Address City` Total_Customers Conversions Total_Profit Conversion_Rate
## <chr> <int> <dbl> <int> <dbl>
## 1 Longford 16 15 973 93.8
## 2 Blackrock 41 31 1662 75.6
## 3 Portlaoise 56 40 2869 71.4
## 4 Tralee 91 63 4267 69.2
## 5 Dungarvan 22 15 759 68.2
## 6 Enniscorthy 43 29 2176 67.4
## 7 Athlone 54 36 1970 66.7
## 8 Gorey 33 22 1452 66.7
## 9 Tullamore 33 22 1692 66.7
## 10 Ashbourne 45 29 2133 64.4
## # ℹ 40 more rows
For Longford & Blackrock (High Conversion, Low Profit):
Increase transaction value: Introduce upselling and cross-selling strategies to encourage higher-value purchases in these cities. Expand customer base: Consider local campaigns to attract new customers and grow market share in these regions. Optimize costs: Review supply chain and operational efficiencies in these areas to improve margins.
For Dublin & Limerick (Moderate Conversion, High Profit):
Focus on retention: Build loyalty programs or premium offerings to retain high-value customers in these cities. Replicate success: Identify what works in these regions (e.g., product mix, promotions) and adapt these strategies for underperforming locations. General Approach: Profit-driven segmentation: Segment campaigns based on profit potential rather than just conversion rates, ensuring resources are allocated efficiently.
Part 2
The URL includes:
https://www.marksandspencer.com/ie/l/men/mens-shirts./fs5/slim-fit./blue. - The fs5 section
specifies filters in a hierarchical order: /fs5/{fit}/{color}#) that includes metadata for internal
tracking, such as navigation source, product type, and other
details.Regular Fit Shirts
https://www.marksandspencer.com/ie/l/men/mens-shirts/fs5/regular-fit#intid=finav_mw_plp_Regular%20Fit%20Shirts_foundit_fi_fi-Men%E2%80%99s-Regular-Fit_top_plp_12_1Slim Fit Shirts
https://www.marksandspencer.com/ie/l/men/mens-shirts/fs5/slim-fit#intid=finav_mw_plp_Slim%20Fit%20Shirts_foundit_fi_fi-Men%E2%80%99s-Slim-Fit_top_plp_12_2Blue Shirts
https://www.marksandspencer.com/ie/l/men/mens-shirts/fs5/blue#intid=finav_mw_plp_Blue%20Shirts_foundit_fi_fi-Men%E2%80%99s-Blue_top_plp_12_3White Shirts
https://www.marksandspencer.com/ie/l/men/mens-shirts/fs5/white#intid=finav_mw_plp_White%20Shirts_foundit_fi_fi-Men%E2%80%99s-White_top_plp_12_4Green Shirts
https://www.marksandspencer.com/ie/l/men/mens-shirts/fs5/green#intid=finav_mw_plp_Green%20Shirts_foundit_fi_fi-Men%E2%80%99s-Green_top_plp_12_5Regular Fit Blue Shirts
https://www.marksandspencer.com/ie/l/men/mens-shirts/fs5/regular-fit/blue#intid=finav_mw_plp_Regular%20Fit%20Blue%20Shirts_foundit_fi_fi-Men%E2%80%99s-Blue-Regular-Fit_top_plp_12_6Regular Fit White Shirts
https://www.marksandspencer.com/ie/l/men/mens-shirts/fs5/regular-fit/white#intid=finav_mw_plp_Regular%20Fit%20White%20Shirts_foundit_fi_fi-Men%E2%80%99s-White-Regular-Fit_top_plp_12_7Regular Fit Green Shirts
https://www.marksandspencer.com/ie/l/men/mens-shirts/fs5/regular-fit/green#intid=finav_mw_plp_Regular%20Fit%20Green%20Shirts_foundit_fi_fi-Men%E2%80%99s-Green-Regular-Fit_top_plp_12_8Slim Fit Blue Shirts
https://www.marksandspencer.com/ie/l/men/mens-shirts/fs5/slim-fit/blue#intid=finav_mw_plp_Slim%20Fit%20Blue%20Shirts_foundit_fi_fi-Men%E2%80%99s-Blue-Slim-Fit_top_plp_12_9Slim Fit White Shirts
https://www.marksandspencer.com/ie/l/men/mens-shirts/fs5/slim-fit/white#intid=finav_mw_plp_Slim%20Fit%20White%20Shirts_foundit_fi_fi-Men%E2%80%99s-White-Slim-Fit_top_plp_12_10
https://www.marksandspencer.com/ie/l/men/mens-shirts/fs5/slim-fit/green#intid=finav_mw_plp_Slim%20Fit%20Green%20Shirts_foundit_fi_fi-Men%E2%80%99s-Green-Slim-Fit_top_plp_12_11
Key Takeaways: - The pattern consistently appends fit and color as
segments of the fs5 path. - The #intid portion
tracks navigation metadata and might vary based on campaigns. - Ensure
the correct fit and color filters are ordered properly in the URL.