Digital Marketing Campaign Performance

What Drives Conversion Rates?

Luis Berumen · Mickyas Shawel · Aldo Naranjo · Jake Evans

IBM 6510 - Econometrics

2026-05-16

Introduction

Research Question

How do engagement, exposure, and cost efficiency influence conversion rates across digital marketing campaigns?

  • Dependent Variable (Y): Conversion Rate
  • Key Variables (X): Clicks · Impressions · Engagement Score
  • Controls: Acquisition Cost · Duration · Campaign Type · Channel Used
  • Data: 200,000 campaigns — Kaggle synthetic dataset

Why This Matters

Conversion rate is the most direct measure of campaign effectiveness — whether exposure translated into action.

Understanding which levers drive conversions helps firms:

  • Allocate ad spend more efficiently
  • Compare channel performance
  • Justify marketing ROI to stakeholders

Literature Review

Manchanda et al. (2006): Banner Advertising & Internet Purchasing

  • Banner ad exposure alone increases purchase probability
  • Supports Impressions as a separate key variable

Kireyev, Pauwels & Gupta (2016): Do Display Ads Influence Search?

  • Display ads positively shift search behavior
  • Justifies Clicks and Impressions as separate regressors

Blake, Nosko & Tadelis (2015): Consumer Heterogeneity & Paid Search

  • Many clicks come from users who would have converted anyway
  • Motivates controlling for Engagement Score and Acquisition Cost

Lambrecht & Tucker (2013): When Does Retargeting Work?

  • Targeting and channel significantly moderate conversion outcomes
  • Supports including Campaign Type and Channel as controls

Econometric Model

\[\text{ConversionRate}_i = \beta_0 + \beta_1\text{Clicks}_i + \beta_2\text{Impressions}_i + \beta_3\text{EngagementScore}_i + \beta_4\text{AcquisitionCost}_i + \beta_5\text{Duration}_i + \beta_6\text{CampaignType}_i + \beta_7\text{ChannelUsed}_i + u_i\]

Variable | Type | Expected Sign | Rationale
Clicks | Key | + | Direct user intent
Impressions | Key | + | Exposure builds conversion probability
Engagement Score | Key | + | Stronger purchase intent
Acquisition Cost | Control | − | Higher cost = less efficient spend
Duration | Control | + | More time = more exposure
Campaign Type | Control | Varies | Different conversion mechanisms
Channel Used | Control | Varies | Different audience reach structures

Reference Categories:

  • Campaign Type → Email (baseline)
  • Channel Used → Email (baseline)

All dummy coefficients are interpreted relative to Email.
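Because Campaign Type and Channel Used are categorical, each enters the regression as a set of dummy variables rather than a single term. One way to write the specification with the dummies explicit is sketched below. The indicator notation and the symbols \(\gamma_j\) and \(\delta_k\) are illustrative shorthand, not part of the original equation; the category sets are read off the regression table.

\[\text{ConversionRate}_i = \beta_0 + \beta_1\text{Clicks}_i + \beta_2\text{Impressions}_i + \beta_3\text{EngagementScore}_i + \beta_4\text{AcquisitionCost}_i + \beta_5\text{Duration}_i + \sum_{j \in \mathcal{T}} \gamma_j\,\mathbf{1}[\text{CampaignType}_i = j] + \sum_{k \in \mathcal{C}} \delta_k\,\mathbf{1}[\text{ChannelUsed}_i = k] + u_i\]

Here \(\mathcal{T}\) = {Display, Influencer, Search, Social Media} and \(\mathcal{C}\) = {Facebook, Google Ads, Instagram, Website, YouTube}, with Email omitted from both sets. This gives 5 + 4 + 5 = 14 slope coefficients plus the intercept.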

Estimation: OLS with heteroscedasticity-robust standard errors (HC1) applied to all specifications.
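To make the HC1 correction concrete, here is a minimal sketch for the one-regressor case, written in plain Python. It is not the estimation code behind these slides; the function name `ols_hc1` and the toy data are hypothetical, and the full model would need the matrix version of the same formula.

```python
import math

def ols_hc1(x, y):
    """Simple OLS of y on x with an intercept.

    Returns (slope, classical SE of slope, HC1 robust SE of slope).
    """
    n = len(x)
    k = 2  # estimated parameters: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Classical variance: assumes one common error variance for all observations.
    sigma2 = sum(e ** 2 for e in resid) / (n - k)
    se_classical = math.sqrt(sigma2 / sxx)

    # HC0 sandwich variance: each squared residual is weighted by the
    # squared deviation of x from its mean. HC1 rescales HC0 by n / (n - k).
    hc0 = sum((xi - x_bar) ** 2 * e ** 2 for xi, e in zip(x, resid)) / sxx ** 2
    se_hc1 = math.sqrt(hc0 * n / (n - k))
    return slope, se_classical, se_hc1

# Toy data (hypothetical, not from the campaign dataset).
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 2.1, 2.9, 4.4, 4.6, 6.8]
slope, se_classical, se_hc1 = ols_hc1(x, y)
print(slope, se_classical, se_hc1)
```

The slope is identical under both approaches; only the standard error changes, which is why robust SEs affect inference but not the coefficient estimates.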

Data & Summary Statistics

Source: Kaggle — Marketing Campaign Performance Dataset

Type: Cross-sectional | N: 200,000 | Variables: 16 total; 8 used

Missing values: None detected

⚠️ Synthetic dataset. Variables were likely generated independently, so results should not be extrapolated to real-world campaigns.

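Variable | Mean | SD | Min | Max
Conversion Rate | 0.1 | 0.0 | 0 | 0.1
Clicks | 549.8 | 260.0 | 100 | 1000
Impressions | 5507.3 | 2596.9 | 1000 | 10000
Engagement Score | 5.5 | 2.9 | 1 | 10
Acq. Cost ($) | 12504.4 | 4337.7 | 5000 | 20000
Duration (days) | 37.5 | 16.7 | 15 | 60

Note: values are rounded to one decimal place, so Conversion Rate (mean ≈ 0.08) shows as 0.1 with an SD of 0.0.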

Nearly identical means and medians — consistent with synthetic generation.
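A quick way to probe the synthetic-generation claim is to compare each reported SD with the SD a continuous uniform distribution on [Min, Max] would have, namely (Max − Min)/√12. The sketch below does this for three variables. It assumes the Min and Max in the table are exact endpoints and that the variables are continuous. Engagement Score and Duration are left out because the continuous formula would not apply if they take only a few discrete values.

```python
import math

def uniform_sd(low, high):
    """Standard deviation of a continuous uniform distribution on [low, high]."""
    return (high - low) / math.sqrt(12)

# (min, max, reported SD) taken from the summary table.
reported = {
    "Clicks": (100, 1000, 260.0),
    "Impressions": (1000, 10000, 2596.9),
    "Acq. Cost ($)": (5000, 20000, 4337.7),
}

for name, (low, high, sd) in reported.items():
    implied = uniform_sd(low, high)
    gap_pct = 100 * abs(sd - implied) / implied
    print(f"{name}: implied SD {implied:.1f}, reported SD {sd:.1f}, gap {gap_pct:.2f}%")
```

All three gaps come out below 1%, which is consistent with these variables having been drawn from uniform distributions.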

Descriptive Statistics

Distribution of Conversion Rate: approximately uniform, mean ≈ 0.08. No skew, so a log transformation is not needed.

Scatterplots against Conversion Rate: flat fitted lines confirm near-zero correlation (r = 0.000).

Regression Results — Full Baseline Model

Variable | Coeff | SE | t | p
Key Explanatory Variables
Engagement Score | -8.87e-06 | 3.16e-05 | -0.28 | 0.779
Clicks | 4.02e-08 | 3.50e-07 | 0.11 | 0.909
Impressions | -4.43e-08 | 3.50e-08 | -1.27 | 0.205
Controls: Continuous
Acquisition Cost | 6.57e-09 | 2.10e-08 | 0.31 | 0.754
Duration | -2.58e-06 | 5.43e-06 | -0.48 | 0.634
Controls: Campaign Type (ref: Email)
Camp: Display | 3.01e-04 | 2.88e-04 | 1.04 | 0.296
Camp: Influencer | 5.26e-04 | 2.87e-04 | 1.83 | 0.067 *
Camp: Search | 2.32e-04 | 2.87e-04 | 0.81 | 0.418
Camp: Social Media | 3.46e-04 | 2.88e-04 | 1.20 | 0.229
Controls: Channel (ref: Email)
Chan: Facebook | -2.89e-04 | 3.14e-04 | -0.92 | 0.358
Chan: Google Ads | -9.73e-05 | 3.14e-04 | -0.31 | 0.757
Chan: Instagram | -3.96e-04 | 3.13e-04 | -1.26 | 0.206
Chan: Website | -9.77e-05 | 3.13e-04 | -0.31 | 0.755
Chan: YouTube | -3.94e-04 | 3.14e-04 | -1.25 | 0.210

Metric | Value
Adj. R² | −0.000028
AIC | −713,980.71
BIC | −713,817.41
N | 200,000
Robust SE | HC1
BP Test (p) | 0.169

At the 5% level, no regressor is significant; only the intercept is.

★ Influencer (marked * in the table) is marginally significant at the 10% level (p = 0.067), the one result worth discussing.
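The t and p columns can be checked directly from the coefficient and its robust SE. The sketch below uses the standard normal approximation for the two-sided p-value, which is an assumption made here for simplicity; with N = 200,000 it is practically indistinguishable from the t distribution. The helper name `t_and_p` is hypothetical.

```python
import math

def t_and_p(coef, se):
    """t-statistic and two-sided p-value under the standard normal approximation."""
    t = coef / se
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p

# Coefficient and robust SE for Camp: Influencer, from the results table.
t, p = t_and_p(5.26e-04, 2.87e-04)
print(f"t = {t:.2f}, p = {p:.3f}")  # t = 1.83, p = 0.067
```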

Adj. R² ≈ 0 — model explains essentially none of the variation in Conversion Rate.
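A negative adjusted R² is possible because the adjustment penalizes the number of regressors. As a worked check, using the standard definition with k = 14 slope coefficients (counted from the results table) and n = 200,000, the reported value implies:

\[\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-k-1} \quad\Rightarrow\quad R^2 = 1 - (1 - \bar{R}^2)\frac{n-k-1}{n-1} = 1 - (1.000028)\frac{199{,}985}{199{,}999} \approx 0.000042\]

For comparison, k regressors that are unrelated to the outcome would be expected to produce an unadjusted R² of roughly k/(n − 1) ≈ 0.00007 by chance, so the fit here is no better than noise.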

Expected vs. Actual Signs

Variable | Expected | Actual | Sign Match? | Significant?
Clicks | + | + | Yes | No
Impressions | + | − | No | No
Engagement Score | + | − | No | No
Acquisition Cost | − | + | No | No
Duration | + | − | No | No
Camp: Display | Varies | + | N/A | No
Camp: Influencer | Varies | + | N/A | Yes (10%)
Camp: Search | Varies | + | N/A | No
Camp: Social Media | Varies | + | N/A | No
Chan: Facebook | Varies | − | N/A | No
Chan: Google Ads | Varies | − | N/A | No
Chan: Instagram | Varies | − | N/A | No
Chan: Website | Varies | − | N/A | No
Chan: YouTube | Varies | − | N/A | No

The Influencer Finding

What the data shows

  • Influencer campaigns: +0.000526 above Email baseline
  • Marginally significant at 10% (p = 0.067)
  • All other campaign types: not significant

Why likely a statistical artifact

With n = 200,000, even trivially small differences become detectable. The magnitude (+0.000526, about 0.053 percentage points above baseline) has no practical marketing relevance. With 14 coefficients tested, at least one is likely to cross the 10% threshold by chance alone, and the joint F-test on the Campaign Type dummies does not reject (p = 0.471).
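A rough calculation shows how easily one such result can arise by chance. It treats the 14 coefficient tests as independent, which is a simplifying assumption made here (dummies that share a baseline are correlated), and supposes every true effect is zero:

\[P(\text{at least one } p < 0.10) = 1 - 0.90^{14} \approx 0.77, \qquad E\big[\#\{p < 0.10\}\big] = 14 \times 0.10 = 1.4\]

Under those assumptions, a single marginally significant dummy is about what pure noise would be expected to produce.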

Could it be theoretically real?

There is a theoretical reason to expect influencer campaigns to outperform email:

  • Influencer content is perceived as authentic — reducing ad skepticism
  • Email requires opt-in behavior and faces inbox competition
  • Influencers reach high-intent niche audiences that email lists may miss

Our result is directionally consistent with theory — even if not meaningful in magnitude on synthetic data.

Econometric Diagnostics

Heteroscedasticity

Breusch-Pagan Test

  • Statistic: 18.902
  • p-value: 0.169
  • Fail to reject H₀ — no heteroscedasticity detected

HC1 robust SEs applied to all models regardless.
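The reported p-value can be recovered from the test statistic. The sketch below assumes the statistic is compared against a chi-square distribution with 14 degrees of freedom, one per slope coefficient. That count is an inference from the results table, not something stated on the slide. For an even number of degrees of freedom the chi-square tail probability has a closed form, so no statistics library is needed.

```python
import math

def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    term = 1.0   # (x/2)^j / j! for j = 0
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j   # update to (x/2)^j / j!
        total += term
    return math.exp(-half) * total

# Statistic from the slide; df = 14 is an assumption (see the text above).
p_value = chi2_sf_even_df(18.902, 14)
print(round(p_value, 3))
```

The result agrees with the reported 0.169 to three decimals, which supports the reading that the test used all 14 regressors.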

Multicollinearity (VIF)

Variable | VIF
Engagement Score | 1.000
Clicks | 1.000
Impressions | 1.000
Acquisition Cost | 1.000
Duration | 1.000
Campaign Type | 1.000
Channel Used | 1.000

All ≈ 1.00 — no multicollinearity.
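To see what a VIF of 1.000 implies, recall the standard definition, where \(R_j^2\) is the R² from regressing regressor j on all the other regressors:

\[\text{VIF}_j = \frac{1}{1 - R_j^2} \quad\Rightarrow\quad R_j^2 = 1 - \frac{1}{\text{VIF}_j}\]

A value that rounds to 1.000 means \(\text{VIF}_j < 1.0005\), so \(R_j^2 < 0.0005\): the other regressors explain less than 0.05% of the variation in each one.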

Outliers & Joint F-Tests

Outliers (|z| > 2): 0 (0.00%)

Adj R² identical with/without removal.
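A zero count is what one would expect if the variables are approximately uniform, as the summary statistics suggest. For a continuous uniform variable on [a, b], an idealization assumed here, the largest possible standardized value is:

\[|z|_{\max} = \frac{b - \frac{a+b}{2}}{(b-a)/\sqrt{12}} = \frac{\sqrt{12}}{2} = \sqrt{3} \approx 1.73 < 2\]

So under that assumption no observation can exceed the |z| > 2 cutoff, whatever the sample size.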

Robust F-Tests:

Test | F | p
Campaign_Type | 0.886 | 0.471
Channel_Used | 0.584 | 0.713
Clicks & Impr. | 0.809 | 0.446

All joint tests: not significant.

Robustness Checks

Model | Adj R² | AIC | BIC | Key Finding
Full Baseline (n = 200,000) | −0.000028 | −713,981 | −713,817 | Baseline: all coefficients near zero and insignificant
RC1: Exclude Impressions (n = 50,000) | −0.000056 | −178,509 | −178,377 | Clicks coefficient stable (−4.41e-07 vs −4.42e-07). No sensitivity to Impressions.
RC2: Exclude Categorical Controls (n = 50,000) | −0.000012 | −178,520 | −178,458 | Key coefficients stable in sign and magnitude without categorical controls.
RC3: Email & Search Only (n = 80,027) | −0.000014 | −285,546 | −285,434 | Results hold in homogeneous subsample. No sign reversals on primary variables.

⚠️ Sample size caveat: RC1 and RC2 use n = 50,000; RC3 uses n = 80,027. Differences in AIC, BIC, and Adj R² partly reflect smaller sample sizes, not only variable changes. Model fit statistics should not be compared directly across subsamples.
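One informal way to see how much of the AIC gap is mechanical is to divide each AIC by its sample size. This is only a rough diagnostic, not a formal model comparison, and the dictionary below simply restates the figures from the robustness table.

```python
# AIC and sample size for each model, taken from the robustness table.
models = {
    "Full Baseline": (-713_981, 200_000),
    "RC1: Exclude Impressions": (-178_509, 50_000),
    "RC2: Exclude Categorical Controls": (-178_520, 50_000),
    "RC3: Email & Search Only": (-285_546, 80_027),
}

# Dividing by n removes most of the mechanical dependence on sample size.
aic_per_obs = {name: aic / n for name, (aic, n) in models.items()}

for name, value in aic_per_obs.items():
    print(f"{name}: {value:.4f}")
```

On a per-observation basis all four models sit at about −3.57, so the large raw differences in AIC come almost entirely from the different sample sizes.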

Discussion & Limitations

What the null results mean

These are legitimate findings, not a modeling failure. A well-specified model on synthetic data correctly returns null results when variables were generated independently.

“Null results from a well-specified model on a synthetic dataset are an honest finding.”

Threats to Internal Validity

  1. Omitted Variable Bias: Unmeasured factors such as brand awareness or audience targeting quality likely affect conversion rate and correlate with our regressors. Partially addressed by including Engagement Score and Acquisition Cost as proxies.
  2. Functional Form Misspecification: A linear model is appropriate given the uniform distribution of Conversion Rate, but real data may require log or quadratic transformations of Clicks and Impressions.
  3. Measurement Error: Engagement Score construction is undocumented; if it is a noisy proxy for true engagement, its coefficient would be attenuated toward zero.
  4. Simultaneous Causality: In real data, conversion rate could affect future ad spend and campaign type choices, creating reverse causality. Cross-sectional OLS cannot address this; IV or experimental designs would be required.

Threats to External Validity

  1. Unrepresentative Data: The data are synthetic and the variables are independently generated with no real-world covariance structure. Findings cannot generalize to actual marketing campaigns.
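The attenuation mentioned under Measurement Error follows from the classical errors-in-variables result. As a sketch for a single regressor, assume the observed score equals true engagement plus independent noise, \(x = x^* + e\). The symbols here are illustrative and do not come from the dataset documentation:

\[\operatorname{plim}\,\hat{\beta} = \beta\,\frac{\sigma_{x^*}^2}{\sigma_{x^*}^2 + \sigma_e^2}\]

The ratio lies between 0 and 1, so noise in the score shrinks the estimated coefficient toward zero without flipping its sign.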

This project as a template

Despite these limitations, our specification, cleaning pipeline, diagnostics, and robustness checks form a fully reproducible framework ready to apply to real marketing data — where theoretically predicted relationships should emerge.

Conclusion

Main Findings

  1. No variable significantly explains Conversion Rate at the 5% level: all coefficients are near zero and Adj R² ≈ 0
  2. Results are consistent across all robustness checks: dropping variables or restricting the sample does not change the core finding
  3. Influencer dummy is marginally significant at 10%: directionally consistent with theory, but likely a chance result on synthetic data
  4. No econometric issues detected: the BP test does not reject homoscedasticity, all VIF ≈ 1.0, zero outliers

Actionable Recommendation

Firms should not assume that channel type or campaign format alone drives conversion rate. Before optimizing campaign mix, invest in real tracking infrastructure and data quality — specifically, linking ad exposure to individual-level conversion outcomes. Until that covariance structure exists in the data, any model will return null results regardless of how well-specified it is. This framework is ready to deploy the moment that data is available.

If We Could Continue

  • Apply to real campaign-level data
  • Explore log transformations of Clicks and Impressions
  • Test interaction terms (e.g., Engagement Score × Campaign Type)
  • Address endogeneity via IV or natural experiment
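The log and interaction extensions listed above could be combined in a specification along the following lines. This is only a sketch: the coefficient symbols are illustrative, \(D_{ij}\) denotes the Campaign Type dummies with Email as the baseline, and \(\mathbf{x}_i'\boldsymbol{\lambda}\) stands in for the remaining controls.

\[\text{ConversionRate}_i = \beta_0 + \beta_1\ln(\text{Clicks}_i) + \beta_2\ln(\text{Impressions}_i) + \beta_3\text{EngagementScore}_i + \sum_{j}\gamma_j D_{ij} + \sum_{j}\theta_j\left(\text{EngagementScore}_i \times D_{ij}\right) + \mathbf{x}_i'\boldsymbol{\lambda} + u_i\]

In this form \(\theta_j\) measures how the Engagement Score slope for campaign type j differs from the slope for Email campaigns.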

Project Contributions

Member | Role
Luis Berumen | Research & Literature
Jake Evans | Data Cleaning & Stats
Mickyas Shawel | Model Specification
Aldo Naranjo | Regression & Presentation

References

Academic Sources

  • Manchanda, P., Dube, J. P., Goh, K. Y., & Chintagunta, P. K. (2006). The effect of banner advertising on internet purchasing. Journal of Marketing Research, 43(1), 98–108.

  • Kireyev, P., Pauwels, K., & Gupta, S. (2016). Do display ads influence search? Attribution and dynamics in online advertising. International Journal of Research in Marketing, 33(3), 475–490.

  • Blake, T., Nosko, C., & Tadelis, S. (2015). Consumer heterogeneity and paid search effectiveness: A large-scale field experiment. Econometrica, 83(1), 155–174.

  • Lambrecht, A., & Tucker, C. (2013). When does retargeting work? Information specificity in online advertising. Journal of Marketing Research, 50(5), 561–576.

Data & Software

  • Bhatt, M. (2023). Marketing Campaign Performance Dataset [Data set]. Kaggle. https://www.kaggle.com/datasets/manishabhatt22/marketing-campaign-performance-dataset

  • R Core Team (2024). R: A language and environment for statistical computing. https://www.R-project.org/