Project Title

Marketing Campaign Performance Analysis using Data Visualization

Introduction

This project focuses on analyzing a large-scale marketing campaign dataset to understand how different campaign strategies influence customer engagement, conversion rate, impressions, clicks, and return on investment (ROI).

Dataset Information

Dataset Name

Marketing Campaign Dataset

Source

Open marketing campaign dataset provided for academic analysis.

Link: https://www.kaggle.com/datasets/manishabhatt22/marketing-campaign-performance-dataset

Dataset Characteristics

  • File format: .csv
  • Rows: Approximately 200,000 in the full dataset (the analysis below uses a 10,000-row sample, marketing_sampled_data.csv)
  • Columns: 16
  • Flat structured dataset suitable for spreadsheet viewing
  • Includes temporal data (Date)
  • Contains categorical and numerical variables
  • Suitable for visualization and exploratory analysis

Key Variables

Variable          Description
Campaign_Type     Type of marketing campaign
Channel_Used      Marketing channel used
Conversion_Rate   Percentage of successful conversions
ROI               Return on investment
Clicks            Number of clicks
Impressions       Number of impressions
Engagement_Score  Customer engagement level
Target_Audience   Audience category
Location          Geographic location
Date              Campaign date

Research Objectives

The project aims to answer the following questions:

  1. Which campaign types generate the highest ROI?
  2. Which marketing channels perform best?
  3. How do impressions and clicks relate to conversion rates?
  4. Which customer segments show higher engagement?
  5. How does campaign performance vary over time?
  6. Which locations have stronger campaign performance?

Visualizations

The report contains at least eight figures with more than three visualization types.

Figures

  1. Bar chart of average ROI by campaign type
  2. Line chart showing campaign trends over time
  3. Scatter plot of clicks vs conversion rate
  4. Histogram of engagement scores
  5. Boxplot of ROI across channels
  6. Heatmap of campaign type vs target audience
  7. Pie chart of customer segments
  8. Faceted visualization comparing locations and ROI

Tools and Technologies

Importing Data

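# Load the tidyverse (readr, dplyr, tibble, ggplot2), which the code below relies on
library(tidyverse)

# Import the marketing dataset
Campaign_data <- read_csv("marketing_sampled_data.csv")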
## Rows: 10000 Columns: 16
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): Company, Campaign_Type, Target_Audience, Duration, Channel_Used, A...
## dbl  (6): Campaign_ID, Conversion_Rate, ROI, Clicks, Impressions, Engagement...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Clean Acquisition_Cost: remove "$" and "," then convert to numeric
Campaign_data <- Campaign_data %>%
  mutate(Acquisition_Cost = as.numeric(gsub("[$,]", "", Acquisition_Cost)))

# read_csv() already returns a tibble; as_tibble() is kept as a harmless safeguard
Campaign_data <- as_tibble(Campaign_data)

# Preview the data
Campaign_data
## # A tibble: 10,000 × 16
##    Campaign_ID Company       Campaign_Type Target_Audience Duration Channel_Used
##          <dbl> <chr>         <chr>         <chr>           <chr>    <chr>       
##  1      182735 Innovate Ind… Display       Men 18-24       30 days  YouTube     
##  2      188942 TechCorp      Influencer    All Ages        45 days  Google Ads  
##  3      134058 TechCorp      Display       Women 35-44     30 days  Email       
##  4      124022 TechCorp      Social Media  Men 18-24       60 days  Google Ads  
##  5      160997 Innovate Ind… Display       Men 18-24       30 days  Instagram   
##  6      103065 DataTech Sol… Email         All Ages        60 days  Google Ads  
##  7      124507 TechCorp      Search        Men 18-24       15 days  Email       
##  8      199365 Alpha Innova… Email         Women 35-44     30 days  Email       
##  9      193627 TechCorp      Email         Men 25-34       15 days  YouTube     
## 10       45404 NexGen Syste… Social Media  Men 25-34       30 days  YouTube     
## # ℹ 9,990 more rows
## # ℹ 10 more variables: Conversion_Rate <dbl>, Acquisition_Cost <dbl>,
## #   ROI <dbl>, Location <chr>, Language <chr>, Clicks <dbl>, Impressions <dbl>,
## #   Engagement_Score <dbl>, Customer_Segment <chr>, Date <chr>
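
As a small illustration of the Acquisition_Cost cleaning step, the sketch below applies the same gsub() pattern to a few made-up currency strings and checks that no value was silently turned into NA. The strings are hypothetical and only assume the "$" and "," formatting mentioned in the cleaning comment; they are not rows from the dataset.

```r
# Hypothetical currency strings (illustrative only, not real rows)
raw_cost <- c("$16,174.00", "$11,566.00", "$950.50")

# Same pattern as the cleaning step: strip "$" and "," before converting
clean_cost <- as.numeric(gsub("[$,]", "", raw_cost))

# as.numeric() yields NA (with a warning) for strings it cannot parse,
# so counting NAs is a cheap check that the pattern covered every format
n_failed <- sum(is.na(clean_cost))

print(clean_cost)
print(n_failed)
```

On the real data, the same check would be sum(is.na(Campaign_data$Acquisition_Cost)) immediately after the mutate() call.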

Figure 1 — Bar Chart of Average ROI by Campaign Type

The bar chart compares average ROI across the five campaign types. The x-axis shows the campaign types (Influencer, Social Media, Display, Search, and Email), ordered from highest to lowest average ROI, and the y-axis shows the average ROI for each type. The data come from a two-column tibble containing Campaign_Type and avg_ROI. Key insight: average ROI ranges only from 4.97 to 5.02, so no single campaign type dramatically outperforms the others, and diversification across types may be a sound approach.

fig_dat1 <- Campaign_data %>%
  select(Campaign_Type, ROI) %>%
  group_by(Campaign_Type) %>%
  summarise(avg_ROI = round(mean(ROI, na.rm = TRUE), 2), .groups = "drop") %>%
  arrange(desc(avg_ROI))

fig_dat1
## # A tibble: 5 × 2
##   Campaign_Type avg_ROI
##   <chr>           <dbl>
## 1 Influencer       5.02
## 2 Social Media     5.02
## 3 Search           5   
## 4 Display          4.98
## 5 Email            4.97
ggplot(fig_dat1, aes(x = reorder(Campaign_Type, -avg_ROI),
                     y = avg_ROI,
                     fill = Campaign_Type)) +
  geom_col(width = 0.6, show.legend = FALSE) +
  geom_text(aes(label = avg_ROI), vjust = -0.5, size = 4) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title    = "Average ROI by Campaign Type",
    subtitle = "Based on 10,000 marketing campaigns",
    x        = "Campaign Type",
    y        = "Average ROI"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title    = element_text(face = "bold", hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5)
  )
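
To put a number on how close the campaign types are, the following sketch computes the gap between the best and worst average ROI, both in absolute terms and relative to the overall mean. The values are copied from the fig_dat1 output above; the variable names are illustrative.

```r
# Averages copied from the fig_dat1 output shown above
avg_roi <- c(Influencer = 5.02, "Social Media" = 5.02, Search = 5.00,
             Display = 4.98, Email = 4.97)

# Absolute gap between the best and worst campaign type
roi_spread <- max(avg_roi) - min(avg_roi)

# The same gap as a percentage of the mean of the five averages
roi_spread_pct <- roi_spread / mean(avg_roi) * 100

print(round(roi_spread, 2))      # 0.05
print(round(roi_spread_pct, 1))  # 1
```

A gap of about 1% of the mean supports reading the bars as practically level rather than as a ranking.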


Figure 3 — Scatter Plot of Clicks vs Conversion Rate

The scatter plot shows whether more clicks translate into better conversion rates. The x-axis represents the number of clicks and the y-axis represents the conversion rate (%). Each point represents a single marketing campaign. Points are colored by campaign type (Display, Email, Influencer, Search, and Social Media), and a linear trend line is fitted for each type. The data come from a three-column tibble containing Clicks, Conversion_Rate, and Campaign_Type. Key insight: more clicks do not necessarily translate into better conversion rates, and differences across campaign types are subtle.

fig_dat3 <- Campaign_data %>%
  select(Clicks, Conversion_Rate, Campaign_Type)

fig_dat3
## # A tibble: 10,000 × 3
##    Clicks Conversion_Rate Campaign_Type
##     <dbl>           <dbl> <chr>        
##  1    121            0.07 Display      
##  2    797            0.05 Influencer   
##  3    426            0.15 Display      
##  4    870            0.14 Social Media 
##  5    169            0.13 Display      
##  6    155            0.13 Email        
##  7    304            0.05 Search       
##  8    732            0.02 Email        
##  9    592            0.11 Email        
## 10    483            0.13 Social Media 
## # ℹ 9,990 more rows
ggplot(fig_dat3, aes(x = Clicks, y = Conversion_Rate, color = Campaign_Type)) +
  geom_jitter(alpha = 0.3, size = 1.5, height = 0.005) +
  geom_smooth(method = "lm", se = FALSE, linewidth = 0.8) +
  scale_color_brewer(palette = "Set2") +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  labs(
    title    = "Clicks vs Conversion Rate by Campaign Type",
    subtitle = "Each point represents one campaign",
    x        = "Clicks",
    y        = "Conversion Rate (%)",
    color    = "Campaign Type"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title      = element_text(face = "bold", hjust = 0.5),
    plot.subtitle   = element_text(hjust = 0.5),
    legend.position = "bottom"
  )
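
The visual impression from the scatter plot can be backed by a correlation coefficient. The sketch below is a minimal example that uses only the ten rows visible in the fig_dat3 preview, so the resulting numbers are illustrative and should not be read as results for the full sample.

```r
# The ten rows visible in the fig_dat3 preview above (not the full data)
preview <- data.frame(
  Clicks          = c(121, 797, 426, 870, 169, 155, 304, 732, 592, 483),
  Conversion_Rate = c(0.07, 0.05, 0.15, 0.14, 0.13, 0.13, 0.05, 0.02, 0.11, 0.13)
)

# Pearson correlation: values near 0 indicate little linear association
r_pearson <- cor(preview$Clicks, preview$Conversion_Rate)

# Spearman rank correlation is less sensitive to outliers and non-linearity
r_spearman <- cor(preview$Clicks, preview$Conversion_Rate, method = "spearman")

print(c(pearson = r_pearson, spearman = r_spearman))
```

For the actual analysis, the same two cor() calls can be run on fig_dat3$Clicks and fig_dat3$Conversion_Rate.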


Figure 4 — Histogram of Engagement Scores

The histogram shows the distribution of engagement scores across all campaigns. The x-axis shows the engagement score (1–10) and the y-axis shows the number of campaigns in each bin. The data come from a single-column tibble containing Engagement_Score. Key insight: engagement scores are spread roughly evenly across the 1–10 range, with no score range disproportionately common.

fig_dat4 <- Campaign_data %>%
  select(Engagement_Score)

fig_dat4
## # A tibble: 10,000 × 1
##    Engagement_Score
##               <dbl>
##  1                2
##  2                5
##  3                5
##  4                1
##  5               10
##  6                6
##  7                1
##  8                8
##  9                9
## 10                8
## # ℹ 9,990 more rows
ggplot(fig_dat4, aes(x = Engagement_Score)) +
  geom_histogram(binwidth = 1, fill = "#66C2A5", color = "white", boundary = 0.5) +
  scale_x_continuous(breaks = 1:10) +
  labs(
    title    = "Distribution of Engagement Scores",
    subtitle = "Based on 10,000 marketing campaigns",
    x        = "Engagement Score (1–10)",
    y        = "Number of Campaigns"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title    = element_text(face = "bold", hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5)
  )
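
One way to check the "roughly even" reading of the histogram is a chi-square goodness-of-fit test against equal proportions. The sketch below runs on simulated scores, a synthetic stand-in used only so that the example runs without the CSV; its output says nothing about the real data.

```r
# Simulated stand-in for Engagement_Score: 10,000 integer scores from 1 to 10
# (synthetic data, used only so the example runs without the CSV)
set.seed(42)
scores <- sample(1:10, size = 10000, replace = TRUE)

# Count campaigns per score, keeping all ten levels even if one is empty
score_counts <- table(factor(scores, levels = 1:10))

# For a one-dimensional table, chisq.test() tests against equal proportions
uniform_test <- chisq.test(score_counts)

print(score_counts)
print(uniform_test$p.value)
```

Replacing scores with fig_dat4$Engagement_Score applies the same test to the sample; a large p-value would mean there is no evidence against an even spread.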


Figure 5 — Boxplot of ROI Across Channels

The boxplot shows how ROI varies across marketing channels. The x-axis represents the channel used (YouTube, Website, Email, Instagram, Google Ads, Facebook), ordered by median ROI, and the y-axis represents ROI. Each box displays the median ROI and the interquartile range (IQR), with any outliers drawn as individual points. A white diamond inside each box marks the mean ROI. The data come from a two-column tibble containing Channel_Used and ROI. Key insight: ROI is not uniform across channels; some platforms yield steadier returns, while others are more unpredictable.

fig_dat5 <- Campaign_data %>%
  select(Channel_Used, ROI)

fig_dat5
## # A tibble: 10,000 × 2
##    Channel_Used   ROI
##    <chr>        <dbl>
##  1 YouTube       5.82
##  2 Google Ads    7.37
##  3 Email         3.28
##  4 Google Ads    3.19
##  5 Instagram     6.55
##  6 Google Ads    2.81
##  7 Email         6.54
##  8 Email         7.18
##  9 YouTube       3.48
## 10 YouTube       5.15
## # ℹ 9,990 more rows
ggplot(fig_dat5, aes(x    = reorder(Channel_Used, ROI, FUN = median),
                     y    = ROI,
                     fill = Channel_Used)) +
  geom_boxplot(outlier.shape = 21, outlier.size = 1.5,
               outlier.alpha = 0.5, show.legend = FALSE) +
  stat_summary(fun = mean, geom = "point", shape = 23,
               size = 3, fill = "white") +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title    = "Distribution of ROI Across Marketing Channels",
    subtitle = "White diamond indicates the mean; box shows median and IQR",
    x        = "Channel Used",
    y        = "ROI"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title    = element_text(face = "bold", hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5),
    axis.text.x   = element_text(angle = 15, hjust = 1)
  )
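
The quantities the boxplot draws (median, IQR, and mean) can also be tabulated per channel. The sketch below assumes dplyr is installed and uses only the ten rows visible in the fig_dat5 preview, so it covers four of the six channels and is illustrative rather than a result.

```r
library(dplyr)

# The ten rows visible in the fig_dat5 preview above (not the full data)
preview <- data.frame(
  Channel_Used = c("YouTube", "Google Ads", "Email", "Google Ads", "Instagram",
                   "Google Ads", "Email", "Email", "YouTube", "YouTube"),
  ROI          = c(5.82, 7.37, 3.28, 3.19, 6.55, 2.81, 6.54, 7.18, 3.48, 5.15)
)

# One row per channel: sample size, centre (median, mean) and spread (IQR)
channel_stats <- preview %>%
  group_by(Channel_Used) %>%
  summarise(
    n          = n(),
    median_ROI = median(ROI),
    mean_ROI   = mean(ROI),
    iqr_ROI    = IQR(ROI),
    .groups    = "drop"
  ) %>%
  arrange(desc(median_ROI))

print(channel_stats)
```

Running the same pipeline on fig_dat5 gives a table in which channels with a small iqr_ROI are the "steadier" ones and those with a large iqr_ROI are the more unpredictable ones.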


Figure 6 — Heatmap of Campaign Type vs Target Audience

The heatmap shows how average ROI varies across campaign types and target audiences. The x-axis represents the campaign types (Display, Email, Influencer, Search, Social Media) and the y-axis represents the target audience groups (e.g., Men 18-24, Women 25-34). The fill color of each cell indicates the average ROI for that combination. The data come from a three-column tibble containing Campaign_Type, Target_Audience, and avg_ROI. Key insight: ROI is not uniform across audiences and campaign types, although the differences are small (4.91 to 5.06 in the rows shown below), so tailoring campaigns to the right audience may yield modest but measurable improvements.

fig_dat6 <- Campaign_data %>%
  select(Campaign_Type, Target_Audience, ROI) %>%
  group_by(Campaign_Type, Target_Audience) %>%
  summarise(avg_ROI = round(mean(ROI, na.rm = TRUE), 2), .groups = "drop")

fig_dat6
## # A tibble: 25 × 3
##    Campaign_Type Target_Audience avg_ROI
##    <chr>         <chr>             <dbl>
##  1 Display       All Ages           5.02
##  2 Display       Men 18-24          4.93
##  3 Display       Men 25-34          5   
##  4 Display       Women 25-34        5.06
##  5 Display       Women 35-44        4.92
##  6 Email         All Ages           4.97
##  7 Email         Men 18-24          4.91
##  8 Email         Men 25-34          5   
##  9 Email         Women 25-34        5.01
## 10 Email         Women 35-44        4.98
## # ℹ 15 more rows
ggplot(fig_dat6, aes(x    = Campaign_Type,
                     y    = Target_Audience,
                     fill = avg_ROI)) +
  geom_tile(color = "white", linewidth = 0.8) +
  geom_text(aes(label = avg_ROI), size = 3.5, color = "white", fontface = "bold") +
  scale_fill_gradient(low  = "#a8d5b5",
                      high = "#1a6b3c",
                      name = "Avg ROI") +
  labs(
    title    = "Heatmap of Average ROI by\nCampaign Type and Target Audience",
    subtitle = "Darker green = higher average ROI",
    x        = "Campaign Type",
    y        = "Target Audience"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title    = element_text(face = "bold", hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5),
    axis.text.x   = element_text(angle = 20, hjust = 1),
    panel.grid    = element_blank()
  )
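
The heatmap is a picture of an audience-by-campaign-type grid of averages. As a rough sketch, the grid can be built with xtabs(), here from only the ten fig_dat6 rows visible above (Display and Email), which makes the highest and lowest cells easy to read off.

```r
# The ten rows visible in the fig_dat6 preview above (Display and Email only)
cells <- data.frame(
  Campaign_Type   = rep(c("Display", "Email"), each = 5),
  Target_Audience = rep(c("All Ages", "Men 18-24", "Men 25-34",
                          "Women 25-34", "Women 35-44"), times = 2),
  avg_ROI         = c(5.02, 4.93, 5.00, 5.06, 4.92,
                      4.97, 4.91, 5.00, 5.01, 4.98)
)

# Reshape to a grid: one row per audience, one column per campaign type.
# Each combination appears once, so the "sum" in xtabs() is just the value.
roi_grid <- xtabs(avg_ROI ~ Target_Audience + Campaign_Type, data = cells)

# Gap between the strongest and weakest visible cell
cell_gap <- max(roi_grid) - min(roi_grid)

print(roi_grid)
print(round(cell_gap, 2))  # 0.15
```

Among the visible cells, the maximum is Display for Women 25-34 (5.06) and the minimum is Email for Men 18-24 (4.91).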


Figure 7 — Pie Chart of Customer Segments

The pie chart shows how marketing campaigns are distributed across customer segments. Each slice represents one segment, sized according to the number of campaigns directed at that group. Percentage labels on the slices make the proportions clear at a glance. The data come from a four-column tibble containing Customer_Segment, count, percentage, and label. Key insight: campaigns are allocated almost evenly, with each segment receiving between 19.4% and 20.6% of campaigns, which is consistent with an effort to reach diverse customer groups rather than concentrating resources on one.

fig_dat7 <- Campaign_data %>%
  select(Customer_Segment) %>%
  group_by(Customer_Segment) %>%
  summarise(count = n(), .groups = "drop") %>%
  mutate(percentage = round(count / sum(count) * 100, 1),
         label      = paste0(Customer_Segment, "\n", percentage, "%"))

fig_dat7
## # A tibble: 5 × 4
##   Customer_Segment    count percentage label                       
##   <chr>               <int>      <dbl> <chr>                       
## 1 Fashionistas         2001       20   "Fashionistas\n20%"         
## 2 Foodies              2039       20.4 "Foodies\n20.4%"            
## 3 Health & Wellness    1962       19.6 "Health & Wellness\n19.6%"  
## 4 Outdoor Adventurers  1942       19.4 "Outdoor Adventurers\n19.4%"
## 5 Tech Enthusiasts     2056       20.6 "Tech Enthusiasts\n20.6%"
ggplot(fig_dat7, aes(x = "", y = count, fill = Customer_Segment)) +
  geom_col(width = 1, color = "white", linewidth = 0.8) +
  coord_polar(theta = "y", start = 0) +
  geom_text(aes(label = paste0(percentage, "%")),
            position = position_stack(vjust = 0.5),
            size     = 3.8,
            fontface = "bold",
            color    = "white") +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title    = "Distribution of Campaigns\nby Customer Segment",
    subtitle = "Proportion of total campaigns targeting each segment",
    fill     = "Customer Segment",
    x        = NULL,
    y        = NULL
  ) +
  theme_void(base_size = 13) +
  theme(
    plot.title      = element_text(face = "bold", hjust = 0.5),
    plot.subtitle   = element_text(hjust = 0.5, margin = margin(b = 10)),
    legend.position = "right"
  )
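
The percentages on the slices follow directly from the counts. The short sketch below recomputes them from the fig_dat7 counts shown above and measures how far each segment sits from a perfectly even 20% share; the variable names are illustrative.

```r
# Counts copied from the fig_dat7 output shown above
segment_counts <- c(Fashionistas = 2001, Foodies = 2039,
                    "Health & Wellness" = 1962,
                    "Outdoor Adventurers" = 1942,
                    "Tech Enthusiasts" = 2056)

# Share of each segment, rounded to one decimal as in the report
total_campaigns <- sum(segment_counts)
pct <- round(segment_counts / total_campaigns * 100, 1)

# Largest deviation from an exactly even split (100 / 5 = 20%)
even_share <- 100 / length(segment_counts)
max_dev <- max(abs(pct - even_share))

print(pct)
print(max_dev)  # 0.6 percentage points
```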


Figure 8 — Faceted Visualization of ROI by Location and Campaign Type

The faceted bar chart compares campaign performance across geographic markets. Each panel represents one location (e.g., Chicago, Houston, Los Angeles, Miami, New York). Within each panel, horizontal bars show the average ROI for each campaign type (Influencer, Social Media, Search, Display, Email). This layout makes it easy to compare how campaign types perform both within a single city and across cities. The data come from a three-column tibble containing Location, Campaign_Type, and avg_ROI. Key insight: while ROI values are close across campaign types, regional differences matter. Miami shows the strongest returns, while Houston and Los Angeles show more variation. This can guide location-specific marketing strategies.

fig_dat8 <- Campaign_data %>%
  select(Location, Campaign_Type, ROI) %>%
  group_by(Location, Campaign_Type) %>%
  summarise(avg_ROI = round(mean(ROI, na.rm = TRUE), 2), .groups = "drop")

fig_dat8
## # A tibble: 25 × 3
##    Location Campaign_Type avg_ROI
##    <chr>    <chr>           <dbl>
##  1 Chicago  Display          5.1 
##  2 Chicago  Email            4.97
##  3 Chicago  Influencer       5.07
##  4 Chicago  Search           5.03
##  5 Chicago  Social Media     4.99
##  6 Houston  Display          4.85
##  7 Houston  Email            5   
##  8 Houston  Influencer       4.96
##  9 Houston  Search           4.94
## 10 Houston  Social Media     5.04
## # ℹ 15 more rows
ggplot(fig_dat8, aes(x    = reorder(Campaign_Type, avg_ROI),
                     y    = avg_ROI,
                     fill = Campaign_Type)) +
  geom_col(width = 0.7, show.legend = FALSE) +
  geom_text(aes(label = avg_ROI),
            hjust    = -0.1,
            size     = 2.8,
            fontface = "bold") +
  scale_fill_brewer(palette = "Set2") +
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  coord_flip() +
  facet_wrap(~ Location, ncol = 3) +
  labs(
    title    = "Average ROI by Campaign Type Across Locations",
    subtitle = "Each panel represents a geographic market",
    x        = "Campaign Type",
    y        = "Average ROI"
  ) +
  theme_minimal(base_size = 11) +
  theme(
    plot.title       = element_text(face = "bold", hjust = 0.5),
    plot.subtitle    = element_text(hjust = 0.5),
    strip.text       = element_text(face = "bold", size = 10),
    strip.background = element_rect(fill = "#f0f0f0", color = NA),
    panel.spacing    = unit(1, "lines")
  )
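
Regional comparisons of this kind can be summarised with a per-location mean and range of the campaign-type averages. The sketch below uses only the Chicago and Houston rows visible in the fig_dat8 preview. It takes a simple unweighted mean of the five averages, which can differ slightly from the mean over individual campaigns when campaign types have different sizes.

```r
# The ten rows visible in the fig_dat8 preview above (Chicago and Houston only)
visible <- data.frame(
  Location      = rep(c("Chicago", "Houston"), each = 5),
  Campaign_Type = rep(c("Display", "Email", "Influencer",
                        "Search", "Social Media"), times = 2),
  avg_ROI       = c(5.10, 4.97, 5.07, 5.03, 4.99,
                    4.85, 5.00, 4.96, 4.94, 5.04)
)

# Unweighted mean of the campaign-type averages within each location
loc_mean <- tapply(visible$avg_ROI, visible$Location, mean)

# Range (max - min) within each location: a simple measure of variation
loc_range <- tapply(visible$avg_ROI, visible$Location,
                    function(x) max(x) - min(x))

print(round(loc_mean, 3))   # Chicago 5.032, Houston 4.958
print(round(loc_range, 2))  # Chicago 0.13,  Houston 0.19
```

In these visible rows Houston has the wider range, which matches the observation that it shows more variation.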