This project analyzes a digital marketing campaign dataset of 10,000 records from five companies (TechCorp, Alpha Innovations, Innovate Industries, DataTech Solutions, and NexGen Systems), covering campaigns run throughout 2021.
Each record represents one marketing campaign and includes:
Campaign attributes: Campaign Type (Display, Email, Influencer, Search, Social Media), Channel (YouTube, Google Ads, Instagram, Facebook, Email, Website), Target Audience, Customer Segment, Location, Language, Duration, and Date.
Performance metrics: Conversion Rate, Acquisition Cost, ROI, Clicks, Impressions, and Engagement Score.
The goal of this analysis is to evaluate campaign effectiveness across campaign types, channels, and audiences, with a primary focus on ROI, Impressions, and Conversion Rate.
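# Packages used in this analysis
library(tidyverse)  # readr, dplyr, tidyr, tibble, ggplot2
library(lubridate)  # floor_date()
library(fmsb)       # radarchart()
library(plotly)     # plot_ly(), add_annotations(), colorbar()
marketing_data <- read_csv("marketing_campaign_data_s.csv")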
## Rows: 10000 Columns: 16
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (9): Company, Campaign_Type, Target_Audience, Duration, Channel_Used, A...
## dbl (6): Campaign_ID, Conversion_Rate, ROI, Clicks, Impressions, Engagement...
## date (1): Date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Remove "$" signs and thousands separators, then convert Acquisition_Cost to numeric
marketing_data <- marketing_data %>%
  mutate(Acquisition_Cost = as.numeric(gsub("[$,]", "", Acquisition_Cost)))
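The cleaning step assumes Acquisition_Cost arrives as currency text. A minimal base-R sketch of the same conversion on hypothetical sample strings (not values taken from the dataset), followed by a check that no value silently became NA:

```r
# Hypothetical sample values in the currency-string format the gsub() pattern targets
cost_text <- c("$16,174.00", "$11,566.00", "$10,200.00")
# Same conversion as the mutate() step: drop "$" and "," then coerce to numeric
cost_num <- as.numeric(gsub("[$,]", "", cost_text))
print(cost_num)
# as.numeric() returns NA (with a warning) for anything it cannot parse,
# so counting NAs is a quick way to confirm the cleaning worked
stopifnot(sum(is.na(cost_num)) == 0)
```

On the full data the equivalent check is `sum(is.na(marketing_data$Acquisition_Cost))`; readr's `parse_number()` is a one-call alternative to the `gsub()` approach.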
marketing_data
## # A tibble: 10,000 × 16
## Campaign_ID Company Campaign_Type Target_Audience Duration Channel_Used
## <dbl> <chr> <chr> <chr> <chr> <chr>
## 1 182735 Innovate Ind… Display Men 18-24 30 days YouTube
## 2 188942 TechCorp Influencer All Ages 45 days Google Ads
## 3 134058 TechCorp Display Women 35-44 30 days Email
## 4 124022 TechCorp Social Media Men 18-24 60 days Google Ads
## 5 160997 Innovate Ind… Display Men 18-24 30 days Instagram
## 6 103065 DataTech Sol… Email All Ages 60 days Google Ads
## 7 124507 TechCorp Search Men 18-24 15 days Email
## 8 199365 Alpha Innova… Email Women 35-44 30 days Email
## 9 193627 TechCorp Email Men 25-34 15 days YouTube
## 10 45404 NexGen Syste… Social Media Men 25-34 30 days YouTube
## # ℹ 9,990 more rows
## # ℹ 10 more variables: Conversion_Rate <dbl>, Acquisition_Cost <dbl>,
## # ROI <dbl>, Location <chr>, Language <chr>, Clicks <dbl>, Impressions <dbl>,
## # Engagement_Score <dbl>, Customer_Segment <chr>, Date <date>
For my first figure, I am going to create a bar chart showing the average ROI for each Campaign Type, ordered from highest to lowest.
This helps identify which type of campaign delivers the best return on investment.
fig_dat1 <- marketing_data %>%
  group_by(Campaign_Type) %>%
  summarise(Avg_ROI = round(mean(ROI, na.rm = TRUE), 2)) %>%
  arrange(desc(Avg_ROI))
ggplot(fig_dat1, aes(x = reorder(Campaign_Type, -Avg_ROI), y = Avg_ROI, fill = Campaign_Type)) +
  geom_col(width = 0.6) +
  geom_text(aes(label = Avg_ROI), vjust = -0.5, size = 4) +
  # Zoomed y-axis: bars are cut off below 4.5, so small gaps between types look larger
  coord_cartesian(ylim = c(4.5, 5.5)) +
  labs(title = "Average ROI by Campaign Type", x = "Campaign Type", y = "Average ROI") +
  theme_minimal() +
  theme(legend.position = "none")
The chart reveals that all five campaign types deliver very similar average ROI values ranging between 4.97 and 5.02.
Influencer and Social Media campaigns marginally lead, while Email campaigns show the lowest average return.
This suggests that no campaign type meaningfully outperforms the others: only 0.05 separates the highest and lowest averages, which is consistent with a balanced but undifferentiated mix of campaign types across the five companies.
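Whether a 0.05 gap in mean ROI is distinguishable from random variation can be checked with a one-way ANOVA. A minimal sketch, written as a hypothetical helper `roi_type_pvalue()` and demonstrated on simulated data; no result for the actual dataset is implied.

```r
# Hypothetical helper: p-value of a one-way ANOVA of ROI on Campaign_Type
roi_type_pvalue <- function(df) {
  fit <- aov(ROI ~ Campaign_Type, data = df)
  # summary() returns a list holding one ANOVA table; row 1 is Campaign_Type
  summary(fit)[[1]][["Pr(>F)"]][1]
}

# Simulated example: every campaign type drawn from the same ROI distribution
set.seed(42)
toy <- data.frame(
  Campaign_Type = rep(c("Display", "Email", "Influencer", "Search", "Social Media"), each = 200),
  ROI = runif(1000, min = 2, max = 8)
)
p <- roi_type_pvalue(toy)
print(p)
```

With the real data the call would be `roi_type_pvalue(marketing_data)`; a large p-value would support the reading that campaign type does not separate returns. `kruskal.test(ROI ~ Campaign_Type, data = marketing_data)` is a rank-based alternative if ROI is far from normally distributed.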
This visualization explores the relationship between Clicks and Impressions across different marketing channels using a scatter plot.
By highlighting the channels with the highest and lowest average clicks, it shows how audience reach relates to engagement and which channels drive the most clicks relative to their impressions.
fig_dat2 <- marketing_data %>%
  group_by(Channel_Used) %>%
  summarise(
    Avg_Clicks = round(mean(Clicks, na.rm = TRUE), 1),
    Avg_Impressions = round(mean(Impressions, na.rm = TRUE), 1)
  ) %>%
  mutate(Performance = case_when(
    Avg_Clicks == max(Avg_Clicks) ~ "Highest Clicks",
    Avg_Clicks == min(Avg_Clicks) ~ "Lowest Clicks",
    TRUE ~ "Average"
  ))
ggplot(fig_dat2, aes(x = Avg_Impressions, y = Avg_Clicks, color = Performance, size = Performance)) +
  geom_point() +
  geom_text(aes(label = Channel_Used), vjust = -1.2, hjust = 0.5, size = 3.5, color = "black") +
  scale_color_manual(values = c(
    "Highest Clicks" = "#1D9E75",
    "Lowest Clicks" = "#D85A30",
    "Average" = "#534AB7"
  )) +
  scale_size_manual(values = c(
    "Highest Clicks" = 6,
    "Lowest Clicks" = 6,
    "Average" = 4
  )) +
  # Scale limits drop (with a warning) any channel whose average falls outside
  # these ranges; widen them if a channel is missing from the plot
  scale_x_continuous(limits = c(5300, 5700)) +
  scale_y_continuous(limits = c(540, 570)) +
  labs(
    title = "Average Clicks vs Impressions by Marketing Channel",
    subtitle = "Green = Highest Clicks | Red = Lowest Clicks | Purple = Average",
    x = "Average Impressions",
    y = "Average Clicks",
    color = "Performance",
    size = "Performance"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom", plot.margin = margin(20, 40, 10, 10))
Insights from the Clicks vs Impressions by Channel scatter plot:
Email has the highest average clicks of all channels, while Instagram has the lowest.
Google Ads has the highest average impressions but does not convert this reach into proportionally more clicks, suggesting a gap between visibility and audience engagement for that channel.
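The gap between reach and clicks can be quantified as a click-through rate per channel (CTR = total clicks / total impressions). A minimal base-R sketch with a hypothetical helper `ctr_by_channel()` on a small made-up data frame; with the real data it would be applied to marketing_data, which has the same Clicks, Impressions and Channel_Used columns.

```r
# Hypothetical helper: click-through rate per channel as total clicks / total impressions
ctr_by_channel <- function(df) {
  totals <- aggregate(cbind(Clicks, Impressions) ~ Channel_Used, data = df, FUN = sum)
  totals$CTR <- totals$Clicks / totals$Impressions
  # Highest CTR first
  totals[order(-totals$CTR), ]
}

# Made-up example rows (not from the dataset)
toy <- data.frame(
  Channel_Used = c("Email", "Email", "Google Ads"),
  Clicks = c(10, 20, 5),
  Impressions = c(100, 200, 100)
)
ctr <- ctr_by_channel(toy)
print(ctr)
```

Dividing totals weights large campaigns more heavily; averaging per-campaign `Clicks / Impressions` instead would give each campaign equal weight.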
For my third figure, I am going to create a pie chart comparing the average Conversion Rate across Customer Segments.
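This helps determine which audience segments respond best to marketing campaigns. Because average conversion rates are not parts of a whole, slice sizes convey only relative magnitude, so each slice is labelled with its value.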
fig_dat3 <- marketing_data %>%
  group_by(Customer_Segment) %>%
  summarise(Avg_Conversion = round(mean(Conversion_Rate, na.rm = TRUE), 4)) %>%
  arrange(desc(Avg_Conversion))
ggplot(fig_dat3, aes(x = "", y = Avg_Conversion, fill = Customer_Segment)) +
  geom_col(width = 1) +
  coord_polar("y", start = 0) +
  # Conversion_Rate appears to be stored as a fraction (e.g. 0.0805 for 8.05%),
  # so multiply by 100 to label slices as percentages
  geom_text(aes(label = paste0(Customer_Segment, "\n", round(Avg_Conversion * 100, 2), "%")),
            position = position_stack(vjust = 0.5), size = 3) +
  labs(
    title = "Average Conversion Rate by Customer Segment",
    fill = "Customer Segment"
  ) +
  theme_void()
Insights from the Conversion Rate by Customer Segment pie chart:
Foodies show the highest average conversion rate at 8.05%, followed closely by Health & Wellness and Outdoor Adventurers at 8.02%.
Fashionistas record the lowest conversion rate at 7.93%. The spread from highest to lowest is only 0.12 percentage points, indicating that all customer segments show broadly similar conversion behavior.
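To judge whether a 0.12 percentage-point spread between segments is meaningful, each segment's mean can be reported with a confidence interval. A minimal sketch using a normal-approximation 95% interval (mean ± 1.96 × standard error); the helper `segment_ci()` and the toy values are hypothetical.

```r
# Hypothetical helper: per-segment mean with a normal-approximation 95% confidence interval
segment_ci <- function(rate, segment) {
  n  <- tapply(rate, segment, length)
  m  <- tapply(rate, segment, mean)
  se <- tapply(rate, segment, sd) / sqrt(n)  # standard error of each mean
  data.frame(
    Segment = names(m),
    Mean = as.numeric(m),
    Lower = as.numeric(m - 1.96 * se),
    Upper = as.numeric(m + 1.96 * se)
  )
}

# Made-up conversion rates (fractions) for two segments
toy_rate <- c(0.10, 0.20, 0.30, 0.05, 0.07, 0.09)
toy_seg  <- rep(c("Foodies", "Fashionistas"), each = 3)
res <- segment_ci(toy_rate, toy_seg)
print(res)
```

On the real data the call would be `segment_ci(marketing_data$Conversion_Rate, marketing_data$Customer_Segment)`; heavily overlapping intervals would support the reading that the segment differences are marginal.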
This visualization displays the proportional distribution of campaign types used across all marketing campaigns in the dataset.
A donut chart effectively shows each campaign type’s share of the total, helping identify which strategies are most and least frequently deployed.
fig_dat4 <- marketing_data %>%
  count(Campaign_Type, name = "Count") %>%
  mutate(Percentage = round(Count / sum(Count) * 100, 1)) %>%
  arrange(desc(Count))
ggplot(fig_dat4, aes(x = 2, y = Count, fill = Campaign_Type)) +
  geom_col(width = 1) +
  coord_polar("y", start = 0) +
  # Extending the x-axis below the ring (0.5 to 2.5) leaves an empty centre: the donut hole
  xlim(0.5, 2.5) +
  geom_text(aes(label = paste0(Campaign_Type, "\n", Percentage, "%")),
            position = position_stack(vjust = 0.5), size = 3) +
  labs(
    title = "Distribution of Campaign Types",
    fill = "Campaign Type"
  ) +
  theme_void()
Insights from the Campaign Type Distribution donut chart:
Display campaigns are the most frequently deployed at 20.8% of all campaigns, while Influencer and Email campaigns are the least used at 19.7% each.
Overall, the distribution is nearly equal across all five campaign types, which may reflect a deliberately balanced mix of campaign types rather than concentration in any single type.
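A near-equal split can be tested formally with a chi-square goodness-of-fit test against equal proportions, which is what `chisq.test()` does by default when given a single vector of counts. A sketch on hypothetical counts (not the dataset's actual values); with the real data the input would be `table(marketing_data$Campaign_Type)`.

```r
# Hypothetical campaign-type counts, for illustration only
type_counts <- c(Display = 210, Email = 195, Influencer = 195, Search = 200, `Social Media` = 200)
# With only x supplied, chisq.test() compares against equal expected proportions
gof <- chisq.test(type_counts)
print(gof)
```

With 10,000 rows, even a small imbalance can reach statistical significance, so the percentage differences themselves remain the better guide to practical importance.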
This visualization tracks the number of marketing campaigns launched each month throughout 2021. Identifying seasonal patterns in campaign frequency shows how marketing activity fluctuates over the year and whether certain months see more campaigns launched.
fig_dat5 <- marketing_data %>%
  mutate(Month = floor_date(Date, "month")) %>%
  count(Month, name = "Campaign_Count") %>%
  arrange(Month)
ggplot(fig_dat5, aes(x = Month, y = Campaign_Count)) +
  geom_line(color = "#534AB7", linewidth = 1.2) +
  geom_point(color = "#D85A30", size = 3) +
  geom_text(aes(label = Campaign_Count), vjust = -1, size = 3.5) +
  labs(
    title = "Monthly Marketing Campaign Count (2021)",
    x = "Month",
    y = "Number of Campaigns"
  ) +
  scale_x_date(date_labels = "%b", date_breaks = "1 month") +
  scale_y_continuous(limits = c(650, 950)) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Insights from the Monthly Campaign Count line chart:
Campaign activity peaks in December with 895 campaigns and is lowest in February with 712. Part of February's low count reflects its shorter length (28 days versus 31), but the gap persists on a per-day basis: about 25.4 campaigns per day in February versus about 28.9 in December. There is a noticeable mid-year dip between June and September, followed by a recovery toward year end, possibly reflecting budget planning cycles or seasonal marketing strategies aligned with holiday periods.
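Because months differ in length, raw monthly counts partly reflect the calendar. A short base-R sketch that derives the number of days in each month of 2021 and converts the two counts quoted above (712 in February, 895 in December) into campaigns per day:

```r
# First day of each month of 2021, plus 1 January 2022 to close the last interval
month_starts <- seq(as.Date("2021-01-01"), by = "month", length.out = 13)
# Differences between consecutive month starts give the number of days in each month
days_2021 <- as.integer(diff(month_starts))
names(days_2021) <- month.abb
# Monthly counts quoted in the text for the lowest and highest months
counts <- c(Feb = 712, Dec = 895)
per_day <- round(counts / days_2021[names(counts)], 2)
print(per_day)
```

The same division applied to every row of fig_dat5 (Campaign_Count divided by the days in that month) would show whether the mid-year dip also holds after adjusting for month length.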
This visualization compares the average Engagement Score across different marketing channels using a radar chart.
A radar chart places each channel on its own spoke, giving a compact view of how a single metric compares across categories and making it easy to spot which channels drive the highest audience engagement and which underperform. Because the axes here run only from 5 to 6, small differences in score appear visually enlarged.
fig_dat6 <- marketing_data %>%
  group_by(Channel_Used) %>%
  summarise(Avg_Engagement = round(mean(Engagement_Score, na.rm = TRUE), 2)) %>%
  arrange(desc(Avg_Engagement))
# fmsb::radarchart() needs one column per axis: row 1 = axis maximum, row 2 = axis minimum
radar_dat <- fig_dat6 %>%
  pivot_wider(names_from = Channel_Used, values_from = Avg_Engagement)
radar_dat <- as.data.frame(rbind(
  rep(6, ncol(radar_dat)),  # axis maximum
  rep(5, ncol(radar_dat)),  # axis minimum
  radar_dat
))
radarchart(radar_dat,
  axistype = 1,
  seg = 4,
  caxislabels = seq(5, 6, 0.25),  # label the 4 segments with engagement scores
  pcol = "#534AB7",
  pfcol = rgb(83, 74, 183, 80, maxColorValue = 255),
  plwd = 2,
  cglcol = "grey",
  cglty = 1,
  axislabcol = "grey30",
  vlcex = 0.9,
  title = "Average Engagement Score by Marketing Channel"
)
Insights from the Engagement Score by Channel radar chart:
Google Ads leads all channels with an average engagement score of 5.60, followed by Facebook and YouTube, both at 5.50.
Website and Instagram show the lowest engagement at 5.37 and 5.38 respectively. Although the differences are small (only 0.23 points separate Google Ads from Website), Google Ads has the highest average audience interaction of all channels.
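The radar axes run from 5 (second row of radar_dat) to 6 (first row), so a channel's position reflects where its score sits within that one-point range rather than its absolute size. A small worked example with the averages quoted above (Email is omitted because its value is not stated):

```r
# Average engagement scores quoted in the text
engagement <- c(`Google Ads` = 5.60, Facebook = 5.50, YouTube = 5.50,
                Instagram = 5.38, Website = 5.37)
# Fraction of the 5-to-6 axis range covered by each score
axis_fraction <- (engagement - 5) / (6 - 5)
# Relative gap on the raw scale versus relative gap in axis position
raw_ratio  <- engagement[["Google Ads"]] / engagement[["Website"]]
axis_ratio <- axis_fraction[["Google Ads"]] / axis_fraction[["Website"]]
print(round(c(raw = raw_ratio, axis = axis_ratio), 2))
```

Google Ads scores about 4% above Website, but covers roughly 1.6 times as much of the axis range. By default fmsb's `radarchart()` does not put the minimum exactly at the centre (see its `centerzero` argument), so the drawn radii differ somewhat from these fractions, but the visual gap remains much larger than the raw one.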
This visualization displays the top 10 performing marketing campaigns ranked by ROI using a horizontal bar chart.
Identifying the highest performing individual campaigns helps understand which specific combinations of company, campaign type, and strategy deliver the best returns.
fig_dat7 <- marketing_data %>%
  select(Campaign_ID, Company, Campaign_Type, ROI, Conversion_Rate) %>%
  arrange(desc(ROI)) %>%
  # ROI values cluster at the top (7.99 to 8.00), so ties are likely; head(10)
  # keeps whichever tied rows come first and drops the rest
  head(10) %>%
  mutate(Campaign_Label = paste0("ID:", Campaign_ID, " | ", Company))
ggplot(fig_dat7, aes(x = reorder(Campaign_Label, ROI), y = ROI, fill = Campaign_Type)) +
  geom_col(width = 0.7) +
  geom_text(aes(label = round(ROI, 2)), hjust = -0.2, size = 3.5) +
  coord_flip() +
  labs(
    title = "Top 10 Marketing Campaigns by ROI",
    x = "Campaign",
    y = "ROI",
    fill = "Campaign Type"
  ) +
  scale_y_continuous(limits = c(0, 9)) +
  theme_minimal() +
  theme(axis.text.y = element_text(size = 8))
Insights from the Top 10 Campaigns by ROI horizontal bar chart:
The top-performing campaigns all achieve an ROI of 7.99 to 8.00, where 8.00 is the maximum in the dataset. These high performers are spread across multiple companies and campaign types, including Influencer, Social Media, and Display, suggesting that no single strategy monopolizes top performance. NexGen Systems and Innovate Industries appear most frequently among the top 10, although because ROI values are tied or nearly tied at the top, which campaigns make the list depends partly on row order.
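With ROI values bunched at the 8.00 maximum, a "top 10" list can be arbitrary. A base-R sketch of a hypothetical helper that counts how many rows tie with or exceed the n-th highest ROI; if that count is well above n, the campaigns shown depend on row order.

```r
# Hypothetical helper: number of rows whose ROI is at least the n-th highest value
n_at_or_above_cutoff <- function(roi, n = 10) {
  cutoff <- sort(roi, decreasing = TRUE)[n]
  sum(roi >= cutoff)
}

# Made-up ROI values with a tie at the cutoff
toy_roi <- c(8.00, 8.00, 8.00, 7.99, 7.99, 7.50, 6.00)
n_tied <- n_at_or_above_cutoff(toy_roi, n = 4)
print(n_tied)
```

With the real data this would be `n_at_or_above_cutoff(marketing_data$ROI)`. dplyr's `slice_max(ROI, n = 10, with_ties = TRUE)` keeps all tied campaigns instead of cutting them off.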
This interactive visualization displays the correlations between key numerical marketing metrics: Clicks, Impressions, Conversion Rate, ROI, Engagement Score, and Acquisition Cost.
A heatmap shows every pairwise relationship at once. With a color scale centred on zero, more saturated purple indicates a stronger positive correlation, more saturated red-orange a stronger negative one, and white no linear correlation. Hovering over a cell shows its exact correlation value.
raw_dat8 <- marketing_data %>%
  select(Clicks, Impressions, Conversion_Rate, ROI, Engagement_Score, Acquisition_Cost)
fig_dat8 <- round(cor(raw_dat8, use = "complete.obs"), 3)
# Reshape the correlation matrix to long format: one row per variable pair
corr_long <- as.data.frame(fig_dat8) %>%
  rownames_to_column("Var1") %>%
  pivot_longer(-Var1, names_to = "Var2", values_to = "Correlation")
plot_ly(
  data = corr_long,
  x = ~Var2,
  y = ~Var1,
  z = ~Correlation,
  type = "heatmap",
  colors = colorRamp(c("#D85A30", "white", "#534AB7")),
  # Symmetric range puts white exactly at 0 (red = negative, purple = positive)
  zmin = -1,
  zmax = 1,
  text = ~Correlation,
  hovertemplate = "Variable 1: %{y}<br>Variable 2: %{x}<br>Correlation: %{text}<extra></extra>"
) %>%
  # Print each correlation value inside its cell
  add_annotations(
    x = ~Var2,
    y = ~Var1,
    text = ~Correlation,
    showarrow = FALSE,
    font = list(size = 11, color = "black")
  ) %>%
  layout(
    title = "Correlation Heatmap of Marketing Metrics",
    xaxis = list(title = "", tickangle = -45),
    yaxis = list(title = ""),
    margin = list(l = 120, b = 120)
  ) %>%
  colorbar(title = "Correlation")
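In a three-color scale, the middle color (white here) sits halfway between zmin and zmax, assuming plotly maps z linearly across that range. This sketch uses `grDevices::colorRamp()`, the same function passed to `plot_ly()`, to compute the color a correlation of exactly 0 receives under an asymmetric range (zmin = -0.05, zmax = 1) versus a symmetric one (-1 to 1). The helper `ramp_position()` is hypothetical.

```r
# Same three-color ramp as the heatmap; returns RGB values (0-255) for inputs in [0, 1]
ramp <- colorRamp(c("#D85A30", "white", "#534AB7"))

# Hypothetical helper: position of value z on the ramp, assuming a linear zmin-to-zmax mapping
ramp_position <- function(z, zmin, zmax) (z - zmin) / (zmax - zmin)

# Color assigned to a correlation of 0 under each range
rgb_asym <- ramp(ramp_position(0, zmin = -0.05, zmax = 1))
rgb_sym  <- ramp(ramp_position(0, zmin = -1, zmax = 1))
print(round(rbind(asymmetric = rgb_asym[1, ], symmetric = rgb_sym[1, ])))
```

Under the asymmetric range, zero lands close to the red-orange end, so uncorrelated pairs look negatively correlated; a symmetric range maps zero to pure white.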
Insights from the Correlation Heatmap (interactive plotly): The heatmap shows that the six marketing metrics (Clicks, Impressions, Conversion Rate, ROI, Engagement Score, and Acquisition Cost) have essentially no linear relationship with one another: every off-diagonal correlation lies between -0.02 and 0.02.
Because these correlations are near zero for every pair, campaigns that score high on one metric are not systematically higher or lower on any other. In this dataset, an improvement in one metric gives no indication of an improvement in another, which highlights the complexity of optimizing marketing campaign performance holistically.
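With 10,000 rows, even tiny correlations can be statistically significant, so significance and practical importance need to be separated. A worked sketch of the standard critical value for a two-sided test of Pearson's r, obtained by solving t = r·sqrt(n - 2) / sqrt(1 - r²) for r, which gives r_crit = t / sqrt(n - 2 + t²), where t is the upper α/2 quantile of a t distribution with n - 2 degrees of freedom:

```r
# Smallest |r| that is significant at level alpha for sample size n (two-sided test)
r_critical <- function(n, alpha = 0.05) {
  t_crit <- qt(1 - alpha / 2, df = n - 2)
  t_crit / sqrt(n - 2 + t_crit^2)
}

r_crit <- r_critical(10000)
# Share of variance explained by a correlation of 0.02, the largest magnitude reported above
r2 <- 0.02^2
print(c(r_crit = round(r_crit, 4), r_squared = r2))
```

A correlation of 0.02 therefore sits right at the conventional significance threshold for this sample size while explaining only 0.04% of the variance, and with 15 variable pairs in the matrix one borderline value would not be surprising by chance alone. `cor.test()` gives the exact p-value for any single pair.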