What Drives Conversion Rates?
IBM 6510 - Econometrics
2026-05-16
How do engagement, exposure, and cost efficiency influence conversion rates across digital marketing campaigns?
Conversion rate is the most direct measure of campaign effectiveness — whether exposure translated into action.
Understanding which levers drive conversions helps firms:
Manchanda et al. (2006) — Banner Advertising & Internet Purchasing
Kireyev, Pauwels & Gupta (2016) — Do Display Ads Influence Search?
Blake, Nosko & Tadelis (2015) — Consumer Heterogeneity & Paid Search
Lambrecht & Tucker (2013) — When Does Retargeting Work?
\[\text{ConversionRate}_i = \beta_0 + \beta_1\text{Clicks}_i + \beta_2\text{Impressions}_i + \beta_3\text{EngagementScore}_i + \beta_4\text{AcquisitionCost}_i + \beta_5\text{Duration}_i + \beta_6\text{CampaignType}_i + \beta_7\text{ChannelUsed}_i + u_i\]
| Variable | Type | Sign | Rationale |
|---|---|---|---|
| Clicks | Key | + | Direct user intent |
| Impressions | Key | + | Exposure builds conversion probability |
| Engagement Score | Key | + | Stronger purchase intent |
| Acquisition Cost | Control | − | Higher cost = less efficient spend |
| Duration | Control | + | More time = more exposure |
| Campaign Type | Control | Varies | Different conversion mechanisms |
| Channel Used | Control | Varies | Different audience reach structures |
Reference Categories: Campaign Type = Email; Channel Used = Email.
All dummy coefficients are interpreted relative to Email.
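The reference-category coding can be sketched in a few lines. This is an illustration in Python, not the authors' code (the deck cites R); the helper name `dummy_code` is hypothetical, and the category labels are taken from the results table.

```python
# Campaign types as they appear in the results table; Email is the reference.
CAMPAIGN_TYPES = ["Email", "Display", "Influencer", "Search", "Social Media"]

def dummy_code(value, categories=CAMPAIGN_TYPES, reference="Email"):
    """Return 0/1 indicators for every category except the reference.

    Hypothetical helper for illustration only.
    """
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return {f"Camp: {c}": int(value == c) for c in categories if c != reference}

# An Email campaign has every indicator at zero, so it is absorbed by the intercept.
print(dummy_code("Email"))
# An Influencer campaign switches on exactly one indicator.
print(dummy_code("Influencer"))
```

Because Email has no indicator of its own, each dummy coefficient measures the difference in Conversion Rate between that category and Email, holding the other regressors fixed.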
Estimation: OLS with heteroscedasticity-robust standard errors (HC1) applied to all specifications.
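To make the HC1 adjustment concrete, here is a minimal sketch for a one-regressor model. It is written in plain Python for illustration and is not the estimation code behind these slides; the function name `ols_hc1` and the toy data are hypothetical.

```python
import math

def ols_hc1(x, y):
    """Simple regression y = b0 + b1*x with an HC1 standard error for b1."""
    n = len(x)
    k = 2  # estimated parameters: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # HC0 sandwich variance for the slope: each squared residual is weighted
    # by its squared deviation in x, instead of assuming one common variance.
    hc0 = sum((xi - x_bar) ** 2 * e ** 2 for xi, e in zip(x, resid)) / sxx ** 2
    # HC1 applies the degrees-of-freedom correction n / (n - k).
    se_hc1 = math.sqrt(hc0 * n / (n - k))
    return b0, b1, se_hc1

# Hypothetical toy data, not drawn from the campaign dataset.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1, se = ols_hc1(x, y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, HC1 SE(b1) = {se:.4f}")
```

With n = 200,000 the n / (n - k) factor is essentially 1, so HC1 and HC0 would be nearly identical in the full sample.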
Source: Kaggle — Marketing Campaign Performance Dataset
Type: Cross-sectional | N: 200,000 | Variables: 16 total; 8 used
Missing values: None detected
⚠️ Synthetic dataset. Variables were likely generated independently. Results cannot be extrapolated to real-world conclusions.
| Variable | Mean | SD | Min | Max |
|---|---|---|---|---|
| Conversion Rate | 0.1 | 0.0 | 0 | 0.1 |
| Clicks | 549.8 | 260.0 | 100 | 1000 |
| Impressions | 5507.3 | 2596.9 | 1000 | 10000 |
| Engagement Score | 5.5 | 2.9 | 1 | 10 |
| Acq. Cost ($) | 12504.4 | 4337.7 | 5000 | 20000 |
| Duration (days) | 37.5 | 16.7 | 15 | 60 |
Nearly identical means and medians — consistent with synthetic generation.
Approximately uniform, mean ≈ 0.08. No skew — log transformation not needed.
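The uniform shape can be cross-checked against the descriptive statistics. A continuous uniform variable on [min, max] has a standard deviation of (max - min) / sqrt(12). The sketch below compares that implied value with the reported SDs, assuming the three variables shown were drawn from a continuous uniform distribution, which the source does not confirm.

```python
import math

# (min, max, reported SD) taken from the descriptive statistics table.
reported = {
    "Clicks": (100, 1000, 260.0),
    "Impressions": (1000, 10000, 2596.9),
    "Acq. Cost ($)": (5000, 20000, 4337.7),
}

def uniform_sd(lo, hi):
    """Standard deviation of a continuous uniform distribution on [lo, hi]."""
    return (hi - lo) / math.sqrt(12)

for name, (lo, hi, sd) in reported.items():
    implied = uniform_sd(lo, hi)
    print(f"{name}: implied SD {implied:.1f}, reported SD {sd:.1f}")
```

Implied and reported values agree to within about 1% for all three variables, which fits the reading that each was generated as a uniform draw over its range.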
Flat fitted lines confirm near-zero correlation (r = 0.000).
| Variable | Coeff | SE | t | p |
|---|---|---|---|---|
| Key Explanatory Variables | ||||
| Engagement Score | -8.87e-06 | 3.16e-05 | -0.28 | 0.779 |
| Clicks | 4.02e-08 | 3.50e-07 | 0.11 | 0.909 |
| Impressions | -4.43e-08 | 3.50e-08 | -1.27 | 0.205 |
| Controls — Continuous | ||||
| Acquisition Cost | 6.57e-09 | 2.10e-08 | 0.31 | 0.754 |
| Duration | -2.58e-06 | 5.43e-06 | -0.48 | 0.634 |
| Controls — Campaign Type (ref: Email) | ||||
| Camp: Display | 3.01e-04 | 2.88e-04 | 1.04 | 0.296 |
| Camp: Influencer | 5.26e-04 | 2.87e-04 | 1.83 | 0.067 * |
| Camp: Search | 2.32e-04 | 2.87e-04 | 0.81 | 0.418 |
| Camp: Social Media | 3.46e-04 | 2.88e-04 | 1.20 | 0.229 |
| Controls — Channel (ref: Email) | ||||
| Chan: Facebook | -2.89e-04 | 3.14e-04 | -0.92 | 0.358 |
| Chan: Google Ads | -9.73e-05 | 3.14e-04 | -0.31 | 0.757 |
| Chan: Instagram | -3.96e-04 | 3.13e-04 | -1.26 | 0.206 |
| Chan: Website | -9.77e-05 | 3.13e-04 | -0.31 | 0.755 |
| Chan: YouTube | -3.94e-04 | 3.14e-04 | -1.25 | 0.210 |
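Any row of the table can be checked by hand: t is the coefficient divided by its standard error, and with a sample this large the reference distribution is effectively standard normal. The sketch below reproduces the Influencer row; using the normal distribution instead of the exact t distribution is an approximation made here for simplicity.

```python
import math

def two_sided_p(t):
    """Two-sided p-value under a standard normal reference distribution."""
    # erfc(|t| / sqrt(2)) equals 2 * (1 - Phi(|t|)).
    return math.erfc(abs(t) / math.sqrt(2))

coef, se = 5.26e-04, 2.87e-04  # Camp: Influencer row of the results table
t = coef / se
print(f"t = {t:.2f}, p = {two_sided_p(t):.3f}")
```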
| Metric | Value |
|---|---|
| Adj. R² | −0.000028 |
| AIC | −713,980.71 |
| BIC | −713,817.41 |
| N | 200,000 |
| Robust SE | HC1 |
| BP Test (p) | 0.169 |
No slope coefficient is significant at the 5% level; only the intercept is.
★ Influencer (highlighted) is marginally significant at the 10% level (p = 0.067) and is the one result worth discussing.
Adj. R² ≈ 0: the model explains essentially none of the variation in Conversion Rate.
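A negative adjusted R² is not an error. It appears whenever the degrees-of-freedom penalty exceeds the raw R². The sketch below inverts the standard adjustment formula to recover the implied raw R², taking k = 14 as the number of slope coefficients listed in the results table.

```python
def r2_from_adjusted(adj_r2, n, k):
    """Invert adj R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1) to recover R^2."""
    return 1 - (1 - adj_r2) * (n - k - 1) / (n - 1)

n = 200_000
k = 14  # slope coefficients in the results table (intercept excluded)
r2 = r2_from_adjusted(-0.000028, n, k)
print(f"implied R^2 = {r2:.6f}")
```

The implied raw R² is positive but tiny (on the order of 0.00004), smaller than the penalty of roughly k / (n - 1) ≈ 0.00007, so the adjusted value dips just below zero.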
| Variable | Expected | Actual | Sign Match? | Significant? |
|---|---|---|---|---|
| Clicks | + | + | Yes | No |
| Impressions | + | − | No | No |
| Engagement Score | + | − | No | No |
| Acquisition Cost | − | + | No | No |
| Duration | + | − | No | No |
| Camp: Display | Varies | + | N/A | No |
| Camp: Influencer | Varies | + | N/A | Yes (10%) |
| Camp: Search | Varies | + | N/A | No |
| Camp: Social Media | Varies | + | N/A | No |
| Chan: Facebook | Varies | − | N/A | No |
| Chan: Google Ads | Varies | − | N/A | No |
| Chan: Instagram | Varies | − | N/A | No |
| Chan: Website | Varies | − | N/A | No |
| Chan: YouTube | Varies | − | N/A | No |
What the data shows
Why likely a statistical artifact
With n = 200,000, even trivially small differences become detectable. The magnitude (a coefficient of 5.26e-04, or about +0.053 percentage points relative to the Email baseline) has no practical marketing relevance. The synthetic data-generating process likely produced a small random difference that the large sample turns into marginal significance.
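A multiple-testing view supports the artifact reading. The sketch below counts how many rejections to expect at the 10% level when every true coefficient is zero. It treats the 14 tests as independent, which is an assumption made here for illustration.

```python
alpha = 0.10  # significance level at which Influencer is flagged
m = 14        # slope coefficients tested in the results table

expected_false_positives = m * alpha
# Probability of at least one rejection if all m nulls are true and the
# tests are independent (illustrative assumption).
p_at_least_one = 1 - (1 - alpha) ** m

print(f"expected false positives: {expected_false_positives:.1f}")
print(f"P(at least one rejection): {p_at_least_one:.2f}")
```

Under these assumptions more than one false positive is expected, and seeing at least one has a probability of roughly three in four, so a single coefficient with p = 0.067 is about what chance alone would produce.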
Could it be theoretically real?
There is a theoretical reason to expect influencer campaigns to outperform email:
Our result is directionally consistent with theory — even if not meaningful in magnitude on synthetic data.
Breusch-Pagan Test
Breusch-Pagan p = 0.169: no evidence of heteroscedasticity at conventional levels. HC1 robust SEs are applied to all models regardless.
| Variable | VIF |
|---|---|
| Engagement Score | 1.000 |
| Clicks | 1.000 |
| Impressions | 1.000 |
| Acquisition Cost | 1.000 |
| Duration | 1.000 |
| Campaign Type | 1.000 |
| Channel Used | 1.000 |
All ≈ 1.00 — no multicollinearity.
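For intuition on why VIFs of 1.000 follow from independent generation: with two regressors, the VIF is 1 / (1 - r²), where r is their correlation. The simulation below is a sketch on made-up data, not the Kaggle file; the seed, sample size, and ranges are illustrative assumptions (the ranges mirror the Min and Max columns of the descriptive table).

```python
import random

def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

random.seed(0)  # arbitrary seed so the sketch is reproducible
n = 10_000      # illustrative sample size, smaller than the real N
clicks = [random.uniform(100, 1000) for _ in range(n)]
impressions = [random.uniform(1000, 10000) for _ in range(n)]

r = pearson_r(clicks, impressions)
# With two regressors the auxiliary R^2 equals r^2, so VIF = 1 / (1 - r^2).
vif = 1 / (1 - r ** 2)
print(f"r = {r:.4f}, VIF = {vif:.4f}")
```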
Outliers (|z| > 2): 0 (0.00%)
Adj R² identical with/without removal.
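A short check shows why a zero count is unsurprising, assuming the screened variable is continuous and uniform on \([a, b]\) (the source does not state which variable was screened or how it was generated). The largest possible standardized distance from the mean is half the range divided by the standard deviation:

\[|z|_{\max} = \frac{(b - a)/2}{(b - a)/\sqrt{12}} = \sqrt{3} \approx 1.73 < 2\]

Under that assumption a \(|z| > 2\) rule essentially never flags an observation, whatever the sample size, so the empty outlier set reflects the shape of the distribution and not the quality of the cleaning step.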
Robust F-Tests:
| Test | F | p |
|---|---|---|
| Campaign_Type | 0.886 | 0.471 |
| Channel_Used | 0.584 | 0.713 |
| Clicks & Impr. | 0.809 | 0.446 |
All joint tests: not significant.
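The reported p-values can be roughly reproduced from the F statistics. With a very large sample, q times F is approximately chi-square with q degrees of freedom, where q is the number of restrictions. The sketch below uses a closed form that only holds for even q, so it covers Campaign_Type (q = 4) and Clicks & Impr. (q = 2) but not Channel_Used (q = 5). The large-sample approximation and the function name are assumptions of this illustration, not the authors' procedure.

```python
import math

def chi2_sf_even_df(x, df):
    """Survival function of a chi-square with an even number of degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2
    # P(X > x) = exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

# (F statistic, number of restrictions q) from the robust F-test table.
tests = {"Campaign_Type": (0.886, 4), "Clicks & Impr.": (0.809, 2)}
for name, (f_stat, q) in tests.items():
    print(f"{name}: approximate p = {chi2_sf_even_df(q * f_stat, q):.3f}")
```

Both approximations land within about 0.001 of the tabulated p-values.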
| Model | Adj R² | AIC | BIC | Key Finding |
|---|---|---|---|---|
| Full Baseline (n = 200,000) | −0.000028 | −713,981 | −713,817 | Baseline — all coefficients near zero and insignificant |
| RC1: Exclude Impressions (n = 50,000) | −0.000056 | −178,509 | −178,377 | Clicks coefficient stable (−4.41e-07 vs −4.42e-07). No sensitivity to Impressions. |
| RC2: Exclude Categorical Controls (n = 50,000) | −0.000012 | −178,520 | −178,458 | Key coefficients stable in sign and magnitude without categorical controls. |
| RC3: Email & Search Only (n = 80,027) | −0.000014 | −285,546 | −285,434 | Results hold in homogeneous subsample. No sign reversals on primary variables. |
⚠️ Sample size caveat: RC1 and RC2 use n = 50,000; RC3 uses n = 80,027. Differences in AIC, BIC, and Adj R² partly reflect smaller sample sizes, not only variable changes. Model fit statistics should not be compared directly across subsamples.
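The caveat is easy to see numerically: AIC scales with the number of observations, so dividing by n removes most of the gap between rows. The sketch below is a rough normalization for intuition only, not a formal model-comparison procedure, and it does not make fits on different subsamples strictly comparable.

```python
# (AIC, n) pairs from the robustness table.
models = {
    "Full Baseline": (-713_981, 200_000),
    "RC1": (-178_509, 50_000),
    "RC2": (-178_520, 50_000),
    "RC3": (-285_546, 80_027),
}

per_obs = {name: aic / n for name, (aic, n) in models.items()}
for name, value in per_obs.items():
    print(f"{name}: AIC per observation = {value:.4f}")
```

On a per-observation basis the four specifications are nearly indistinguishable, which is consistent with the raw AIC differences being driven mainly by sample size.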
What the null results mean
These are legitimate findings, not a modeling failure. A well-specified model on synthetic data correctly returns null results when variables were generated independently.
“Null results from a well-specified model on a synthetic dataset are an honest finding.”
Threats to Internal Validity
Threats to External Validity
This project as a template
Despite these limitations, our specification, cleaning pipeline, diagnostics, and robustness checks form a fully reproducible framework ready to apply to real marketing data — where theoretically predicted relationships should emerge.
Main Findings
Actionable Recommendation
Firms should not assume that channel type or campaign format alone drives conversion rate. Before optimizing campaign mix, invest in real tracking infrastructure and data quality — specifically, linking ad exposure to individual-level conversion outcomes. Until that covariance structure exists in the data, any model will return null results regardless of how well-specified it is. This framework is ready to deploy the moment that data is available.
If We Could Continue
Project Contributions
| Member | Role |
|---|---|
| Luis Berumen | Research & Literature |
| Jake Evans | Data Cleaning & Stats |
| Mickyas Shawel | Model Specification |
| Aldo Naranjo | Regression & Presentation |
Academic Sources
Manchanda, P., Dubé, J.-P., Goh, K. Y., & Chintagunta, P. K. (2006). The effect of banner advertising on internet purchasing. Journal of Marketing Research, 43(1), 98–108.
Kireyev, P., Pauwels, K., & Gupta, S. (2016). Do display ads influence search? Attribution and dynamics in online advertising. International Journal of Research in Marketing, 33(3), 475–490.
Blake, T., Nosko, C., & Tadelis, S. (2015). Consumer heterogeneity and paid search effectiveness: A large-scale field experiment. Econometrica, 83(1), 155–174.
Lambrecht, A., & Tucker, C. (2013). When does retargeting work? Information specificity in online advertising. Journal of Marketing Research, 50(5), 561–576.
Data & Software
Bhatt, M. (2023). Marketing Campaign Performance Dataset [Data set]. Kaggle. https://www.kaggle.com/datasets/manishabhatt22/marketing-campaign-performance-dataset
R Core Team (2024). R: A language and environment for statistical computing. https://www.R-project.org/
IBM 6510 | Digital Marketing Campaign Performance