Digital Marketing Campaign Performance

Introduction

Research Question

How do engagement, exposure, and cost efficiency influence conversion rates across digital marketing campaigns?

Dependent Variable (Y): Conversion Rate
Key Variables (X): Clicks · Impressions · Engagement Score
Controls: Acquisition Cost · Duration · Campaign Type · Channel Used
Data: 200,000 campaigns — Kaggle synthetic dataset

Why This Matters

Conversion rate is the most direct measure of campaign effectiveness — whether exposure translated into action.

Understanding which levers drive conversions helps firms:

Allocate ad spend more efficiently
Compare channel performance
Justify marketing ROI to stakeholders

Literature Review

Manchanda et al. (2006) — Banner Advertising & Internet Purchasing

Banner ad exposure alone increases purchase probability
Supports Impressions as a separate key variable

Kireyev, Pauwels & Gupta (2016) — Do Display Ads Influence Search?

Display ads positively shift search behavior
Justifies Clicks and Impressions as separate regressors

Blake, Nosko & Tadelis (2015) — Consumer Heterogeneity & Paid Search

Many clicks come from users who would have converted anyway
Motivates controlling for Engagement Score and Acquisition Cost

Lambrecht & Tucker (2013) — When Does Retargeting Work?

Targeting and channel significantly moderate conversion outcomes
Supports including Campaign Type and Channel as controls

Our model is grounded in four academic studies, each of which directly motivated a specific variable choice and expected sign.

Manchanda and colleagues found that passive banner ad exposure — even without any clicks — increases purchase probability. This led us to include Impressions as a separate key variable with an expected positive sign, rather than treating it as noise or collapsing it with Clicks.

Kireyev, Pauwels, and Gupta showed that display ads shift downstream search behavior, confirming that the mechanisms behind Clicks and Impressions are distinct. This is why we include both as separate regressors rather than a single combined metric — and why both carry independent positive expected signs.

Blake, Nosko, and Tadelis found that many clicks come from users who were already likely to convert, meaning Clicks can reflect selection rather than pure campaign effectiveness. This cautioned us against treating Clicks as a clean causal variable and motivated including Engagement Score to capture underlying purchase intent, and Acquisition Cost as a control for targeting quality. Engagement Score is expected positive; Acquisition Cost is expected negative, since a higher cost per acquisition signals less efficient targeting.

Finally, Lambrecht and Tucker showed that campaign type and channel significantly moderate how impressions translate into conversions. This directly motivated including Campaign Type and Channel Used as categorical controls — we expected these to vary in sign and magnitude depending on the conversion mechanism of each type.

Econometric Model

\[\text{ConversionRate}_i = \beta_0 + \beta_1\text{Clicks}_i + \beta_2\text{Impressions}_i + \beta_3\text{EngagementScore}_i + \beta_4\text{AcquisitionCost}_i + \beta_5\text{Duration}_i + \beta_6\text{CampaignType}_i + \beta_7\text{ChannelUsed}_i + u_i\]

Variable	Type	Sign	Rationale
Clicks	Key	+	Direct user intent
Impressions	Key	+	Exposure builds conversion probability
Engagement Score	Key	+	Stronger purchase intent
Acquisition Cost	Control	−	Higher cost = less efficient spend
Duration	Control	+	More time = more exposure
Campaign Type	Control	Varies	Different conversion mechanisms
Channel Used	Control	Varies	Different audience reach structures

Reference Categories:

Campaign Type → Email (baseline)
Channel Used → Email (baseline)

All dummy coefficients interpreted relative to Email.
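To make the Email baseline concrete, here is a minimal Python sketch of dummy coding with a dropped reference category. It is an illustration only: the references list R as the project's software, the level names are read off the regression table, and the helper name `dummy_row` is hypothetical.

```python
# Levels read off the regression table; Email is the baseline in both sets.
CAMPAIGN_TYPES = ["Email", "Display", "Influencer", "Search", "Social Media"]
CHANNELS = ["Email", "Facebook", "Google Ads", "Instagram", "Website", "YouTube"]

def dummy_row(value, levels, baseline="Email"):
    """Return 0/1 indicators for every level except the baseline (hypothetical helper)."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return [1 if value == level else 0 for level in levels if level != baseline]

# An Email campaign is all zeros, so its expected outcome sits in the intercept.
print(dummy_row("Email", CAMPAIGN_TYPES))       # [0, 0, 0, 0]
# An Influencer campaign switches on one dummy; its coefficient is the gap to Email.
print(dummy_row("Influencer", CAMPAIGN_TYPES))  # [0, 1, 0, 0]
```

Because the baseline column is dropped, each reported dummy coefficient is a difference from Email rather than a level.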

Estimation: OLS with heteroscedasticity-robust standard errors (HC1) applied to all specifications.
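As a rough sketch of what the HC1 correction does, the Python snippet below computes it by hand for a regression with a single regressor. This is not the code behind our results: the function name `ols_hc1` and the toy data are assumptions made for illustration, and the full model has many more regressors.

```python
def ols_hc1(x, y):
    """Fit y = b0 + b1*x by OLS and return (b0, b1, HC1 robust SE of b1)."""
    n = len(x)
    k = 2  # estimated coefficients: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # HC0 sandwich variance of the slope: each squared residual is weighted
    # by that observation's squared deviation of x from its mean.
    var_hc0 = sum((xi - x_bar) ** 2 * ei ** 2 for xi, ei in zip(x, resid)) / sxx ** 2
    # HC1 applies the degrees-of-freedom correction n / (n - k) to HC0.
    var_hc1 = var_hc0 * n / (n - k)
    return b0, b1, var_hc1 ** 0.5

# Toy data for illustration only (not taken from the campaign dataset).
b0, b1, se_b1 = ols_hc1([1, 2, 3, 4], [1, 3, 2, 5])
print(f"slope = {b1:.3f}, HC1 SE = {se_b1:.3f}, t = {b1 / se_b1:.2f}")
```

With n = 200,000 the factor n / (n - k) is almost exactly 1, so HC1 and HC0 would be nearly indistinguishable in our setting.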

Data & Summary Statistics

Source: Kaggle — Marketing Campaign Performance Dataset

Type: Cross-sectional | N: 200,000 | Variables: 16 total; 8 used

Missing values: None detected

⚠️ Synthetic dataset. Variables were likely generated independently. Results cannot be extrapolated to real-world conclusions.

Variable	Mean	SD	Min	Max
Conversion Rate	0.1	0.0	0	0.1
Clicks	549.8	260.0	100	1000
Impressions	5507.3	2596.9	1000	10000
Engagement Score	5.5	2.9	1	10
Acq. Cost ($)	12504.4	4337.7	5000	20000
Duration (days)	37.5	16.7	15	60

Note: Conversion Rate entries are rounded to one decimal place; its mean is ≈ 0.08.

Nearly identical means and medians — consistent with synthetic generation.
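One way to probe the synthetic-generation claim is to compare the reported moments with those of a uniform distribution on the observed range. The Python sketch below does this for three variables from the summary table; treating them as continuous uniforms is our assumption, not something the dataset documents.

```python
import math

# (mean, sd, min, max) copied from the summary statistics table
SUMMARY = {
    "Clicks": (549.8, 260.0, 100, 1000),
    "Impressions": (5507.3, 2596.9, 1000, 10000),
    "Acquisition Cost": (12504.4, 4337.7, 5000, 20000),
}

def uniform_moments(lo, hi):
    """Mean and SD of a continuous uniform distribution on [lo, hi]."""
    return (lo + hi) / 2, (hi - lo) / math.sqrt(12)

for name, (mean, sd, lo, hi) in SUMMARY.items():
    u_mean, u_sd = uniform_moments(lo, hi)
    # Ratios close to 1 mean the sample moments match the uniform benchmark.
    print(f"{name}: mean ratio {mean / u_mean:.3f}, SD ratio {sd / u_sd:.3f}")
```

Ratios close to 1 are what independent uniform draws would produce. Engagement Score and Duration are left out because they appear to take only a limited set of discrete values, where the continuous formula does not apply.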

Descriptive Statistics

Conversion Rate is approximately uniform, with mean ≈ 0.08. It shows no skew, so a log transformation is not needed.

Flat fitted lines confirm near-zero correlation (r = 0.000).

Regression Results — Full Baseline Model

Variable	Coeff	SE	t	p
Key Explanatory Variables
Engagement Score	-8.87e-06	3.16e-05	-0.28	0.779
Clicks	4.02e-08	3.50e-07	0.11	0.909
Impressions	-4.43e-08	3.50e-08	-1.27	0.205
Controls — Continuous
Acquisition Cost	6.57e-09	2.10e-08	0.31	0.754
Duration	-2.58e-06	5.43e-06	-0.48	0.634
Controls — Campaign Type (ref: Email)
Camp: Display	3.01e-04	2.88e-04	1.04	0.296
Camp: Influencer	5.26e-04	2.87e-04	1.83	0.067 *
Camp: Search	2.32e-04	2.87e-04	0.81	0.418
Camp: Social Media	3.46e-04	2.88e-04	1.20	0.229
Controls — Channel (ref: Email)
Chan: Facebook	-2.89e-04	3.14e-04	-0.92	0.358
Chan: Google Ads	-9.73e-05	3.14e-04	-0.31	0.757
Chan: Instagram	-3.96e-04	3.13e-04	-1.26	0.206
Chan: Website	-9.77e-05	3.13e-04	-0.31	0.755
Chan: YouTube	-3.94e-04	3.14e-04	-1.25	0.210

Metric	Value
Adj. R²	−0.000028
AIC	−713,980.71
BIC	−713,817.41
N	200,000
Robust SE	HC1
BP Test (p)	0.169

At the 5% level, no coefficient other than the intercept is statistically significant.

* Influencer is marginally significant at the 10% level (p = 0.067), the one result worth discussing.

Adj. R² ≈ 0 — model explains essentially none of the variation in Conversion Rate.

Here are the full baseline model results with robust standard errors. Let me walk through the sign, significance, and practical magnitude of each variable.

Engagement Score: coefficient -8.87e-06, expected positive, actual negative. Practically — moving from the minimum score of 1 to the maximum of 10 is associated with a change of only -0.000080 in Conversion Rate, about 0.008 percentage points. Statistically insignificant at p = 0.779.

Clicks: coefficient +4.02e-08, sign matches expectation. Practically — increasing Clicks by 100 above the sample mean of roughly 550 is associated with only +0.000004 change in Conversion Rate. Statistically insignificant at p = 0.909.

Impressions: coefficient -4.43e-08, expected positive, actual negative. Practically — increasing Impressions by 1,000 from the mean of roughly 5,500 is associated with -0.000044 change in Conversion Rate. Statistically insignificant at p = 0.205.

Acquisition Cost: coefficient +6.57e-09, expected negative, actual positive. Practically — a $1,000 increase in Acquisition Cost is associated with +0.0000066 change in Conversion Rate. Statistically insignificant at p = 0.754.

Duration: coefficient -2.58e-06, expected positive, actual negative. Practically — extending a campaign by 30 days is associated with -0.000077 change in Conversion Rate. Statistically insignificant at p = 0.634.

For the campaign dummies relative to Email — Display +3.01e-04, Influencer +5.26e-04 (marginally significant at 10%, p = 0.067), Search +2.32e-04, Social Media +3.46e-04. All magnitudes are below 0.001 in Conversion Rate — practically negligible. Channel dummies relative to Email are all negative and none significant, with the largest in magnitude being Instagram at -3.96e-04, under 0.04 percentage points.
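The practical magnitudes quoted for the continuous variables are just coefficient times change in the regressor. A minimal Python sketch of that arithmetic follows, using the coefficients from the baseline table. It assumes Conversion Rate is stored as a proportion (mean ≈ 0.08), so multiplying by 100 gives percentage points.

```python
# Coefficients copied from the baseline regression table
COEF = {
    "Engagement Score": -8.87e-06,
    "Clicks": 4.02e-08,
    "Impressions": -4.43e-08,
    "Acquisition Cost": 6.57e-09,
    "Duration": -2.58e-06,
}
# Size of the change in each regressor used in the walk-through
DELTA = {
    "Engagement Score": 9,      # from the minimum of 1 to the maximum of 10
    "Clicks": 100,
    "Impressions": 1_000,
    "Acquisition Cost": 1_000,  # dollars
    "Duration": 30,             # days
}

def implied_change(name):
    """Linear-model prediction: change in Conversion Rate = coefficient * change in X."""
    return COEF[name] * DELTA[name]

for name in COEF:
    change = implied_change(name)
    # Assumption: the outcome is a proportion, so * 100 converts to percentage points.
    print(f"{name}: {change:+.7f} ({change * 100:+.5f} percentage points)")
```

Every implied change is smaller than 0.0001 in absolute value, which is the sense in which the effects are practically negligible.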

The Adjusted R-squared is essentially zero. These are null results — correct and expected for a synthetic dataset where variables were generated independently.

Expected vs. Actual Signs

Variable	Expected	Actual	Sign Match?	Significant?
Clicks	+	+	Yes	No
Impressions	+	−	No	No
Engagement Score	+	−	No	No
Acquisition Cost	−	+	No	No
Duration	+	−	No	No
Camp: Display	Varies	+	N/A	No
Camp: Influencer	Varies	+	N/A	Yes (10%)
Camp: Search	Varies	+	N/A	No
Camp: Social Media	Varies	+	N/A	No
Chan: Facebook	Varies	−	N/A	No
Chan: Google Ads	Varies	−	N/A	No
Chan: Instagram	Varies	−	N/A	No
Chan: Website	Varies	−	N/A	No
Chan: YouTube	Varies	−	N/A	No

The Influencer Finding

What the data shows

Influencer campaigns: +0.000526 above Email baseline
Marginally significant at 10% (p = 0.067)
All other campaign types: not significant

Why likely a statistical artifact

With n = 200,000, even trivially small differences become detectable. The magnitude (+0.000526, or about 0.05 percentage points above the Email baseline) has no practical marketing relevance. The joint F-test on the campaign-type dummies is also far from significant (p = 0.471), so the synthetic data-generating process most likely produced a small random difference that the large sample inflates into marginal significance.
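Counting tests gives another reason to read a single p-value below 0.10 cautiously. As a rough sketch, assume the 14 slope coefficients in the baseline table are tested independently at the 10% level and every true effect is zero. This is an idealization, since dummies within a set share a baseline and are not strictly independent. Under that assumption, the expected number of false positives and the chance of at least one are:

\[E[\text{false positives}] = 14 \times 0.10 = 1.4, \qquad P(\text{at least one}) = 1 - 0.90^{14} \approx 0.77\]

So one marginally significant coefficient out of 14 is roughly what pure chance would deliver.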

Could it be theoretically real?

There is a theoretical reason to expect influencer campaigns to outperform email:

Influencer content is perceived as authentic — reducing ad skepticism
Email requires opt-in behavior and faces inbox competition
Influencers reach high-intent niche audiences that email lists may miss

Our result is directionally consistent with theory — even if not meaningful in magnitude on synthetic data.

Econometric Diagnostics

Heteroscedasticity

Breusch-Pagan Test

Statistic: 18.902
p-value: 0.169
Fail to reject H₀ — no heteroscedasticity detected

HC1 robust SEs applied to all models regardless.
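As a sanity check on the reported p-value, the Breusch-Pagan statistic can be compared against a chi-square distribution. The Python sketch below uses the closed-form upper tail that exists for even degrees of freedom. The choice of df = 14 (one per slope coefficient in the baseline table) is our assumption about how the test was run, and the function name is hypothetical.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable, valid for even df only.

    For df = 2m the tail has the closed form exp(-x/2) * sum_{j<m} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0  # j = 0 term of the sum
    for j in range(1, df // 2):
        term *= half / j    # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total

# Assumption: df = 14, one per slope coefficient in the baseline model.
p_value = chi2_sf_even_df(18.902, 14)
print(round(p_value, 3))
```

Under that assumption the result should land very close to the reported p = 0.169, which suggests the statistic and p-value on this slide are internally consistent.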

Multicollinearity (VIF)

Variable	VIF
Engagement Score	1.000
Clicks	1.000
Impressions	1.000
Acquisition Cost	1.000
Duration	1.000
Campaign Type	1.000
Channel Used	1.000

All ≈ 1.00 — no multicollinearity.

Outliers & Joint F-Tests

Outliers (|z| > 2): 0 (0.00%)

Adj R² identical with/without removal.
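The zero-outlier count is close to mechanical for bounded, roughly uniform variables: a continuous uniform variable can never sit more than √3 ≈ 1.73 standard deviations from its mean. The Python sketch below checks the largest possible |z| for each variable using the summary statistics table. It assumes the screen was applied to these variables, which the slide does not state, and it skips Conversion Rate because its printed SD is rounded to 0.0.

```python
# (mean, sd, min, max) copied from the summary statistics table
SUMMARY = {
    "Clicks": (549.8, 260.0, 100, 1000),
    "Impressions": (5507.3, 2596.9, 1000, 10000),
    "Engagement Score": (5.5, 2.9, 1, 10),
    "Acquisition Cost": (12504.4, 4337.7, 5000, 20000),
    "Duration": (37.5, 16.7, 15, 60),
}

def max_abs_z(mean, sd, lo, hi):
    """Largest |z| any observation can reach, given the observed min and max."""
    return max(abs(lo - mean), abs(hi - mean)) / sd

for name, stats in SUMMARY.items():
    print(f"{name}: max |z| = {max_abs_z(*stats):.2f}")
```

Since no variable can exceed |z| = 2 even at its extremes, the screen cannot flag anything on this dataset, so the result says more about the data-generating process than about data quality.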

Robust F-Tests:

Test	F	p
Campaign_Type	0.886	0.471
Channel_Used	0.584	0.713
Clicks & Impr.	0.809	0.446

All joint tests: not significant.

Robustness Checks

Model	Adj R²	AIC	BIC	Key Finding
Full Baseline (n = 200,000)	−0.000028	−713,981	−713,817	Baseline — all coefficients near zero and insignificant
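RC1: Exclude Impressions (n = 50,000)	−0.000056	−178,509	−178,377	Clicks coefficient stable with and without Impressions (−4.41e-07 vs −4.42e-07). These values differ from the full-sample estimate (+4.02e-08), presumably because both are estimated on the n = 50,000 subsample. No sensitivity to Impressions.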
RC2: Exclude Categorical Controls (n = 50,000)	−0.000012	−178,520	−178,458	Key coefficients stable in sign and magnitude without categorical controls.
RC3: Email & Search Only (n = 80,027)	−0.000014	−285,546	−285,434	Results hold in homogeneous subsample. No sign reversals on primary variables.

⚠️ Sample size caveat: RC1 and RC2 use n = 50,000; RC3 uses n = 80,027. Differences in AIC, BIC, and Adj R² partly reflect smaller sample sizes, not only variable changes. Model fit statistics should not be compared directly across subsamples.
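To illustrate the caveat, the Python sketch below divides each model's AIC by its sample size. The figures are copied from the robustness table; the per-observation scaling is our own illustrative device, not a statistic reported by the estimation software.

```python
# (n, AIC) copied from the robustness table
MODELS = {
    "Full Baseline": (200_000, -713_981),
    "RC1: Exclude Impressions": (50_000, -178_509),
    "RC2: Exclude Categorical Controls": (50_000, -178_520),
    "RC3: Email & Search Only": (80_027, -285_546),
}

def aic_per_obs(n, aic):
    """AIC divided by sample size, to strip out the mechanical effect of n."""
    return aic / n

for name, (n, aic) in MODELS.items():
    print(f"{name}: AIC / n = {aic_per_obs(n, aic):.4f}")
```

Once scaled by n, the four models are nearly indistinguishable, which suggests the large gaps in raw AIC come almost entirely from sample size.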

Discussion & Limitations

What the null results mean

These are legitimate findings, not a modeling failure. A well-specified model on synthetic data correctly returns null results when variables were generated independently.

“Null results from a well-specified model on a synthetic dataset are an honest finding.”

Threats to Internal Validity

Omitted Variable Bias — Unmeasured factors like brand awareness or audience targeting quality likely affect conversion rate and correlate with our regressors. Partially addressed by including Engagement Score and Acquisition Cost as proxies.
Functional Form Misspecification — A linear model is appropriate given the uniform distribution of Conversion Rate, but real data may require log or quadratic transformations of Clicks and Impressions.
Measurement Error — Engagement Score construction is undocumented; if it is a noisy proxy for true engagement, its coefficient would be attenuated toward zero.
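The attenuation claim can be stated compactly. As a sketch for the single-regressor case, suppose the observed score is \(X = X^* + e\), where \(X^*\) is true engagement and \(e\) is noise uncorrelated with both \(X^*\) and the regression error. The symbols \(X^*\), \(e\), and their variances are illustrative notation, not quantities we can estimate from this dataset. Then the OLS slope converges to

\[\hat{\beta}_1 \xrightarrow{p} \beta_1 \cdot \frac{\sigma^2_{X^*}}{\sigma^2_{X^*} + \sigma^2_{e}}\]

The ratio lies between 0 and 1, so the estimate shrinks toward zero as the noise variance grows. With a true coefficient of zero, as is plausible here, attenuation and a genuine null effect look the same.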

Threats to External Validity

Sample Selection — Data is synthetic and variables are independently generated with no real-world covariance structure. Findings cannot generalize to actual marketing campaigns.
Simultaneous Causality — In real data, conversion rate could affect future ad spend and campaign type choices, creating reverse causality. Strictly speaking, this is a threat to internal rather than external validity. Cross-sectional OLS cannot address it; IV or experimental designs would be required.

This project as a template

Despite these limitations, our specification, cleaning pipeline, diagnostics, and robustness checks form a fully reproducible framework ready to apply to real marketing data — where theoretically predicted relationships should emerge.

Let me walk through our limitations using the validity framework from class.

For internal validity — first, omitted variable bias. We likely omit brand awareness, targeting sophistication, and creative quality, all of which affect conversion and correlate with our included variables. We partially address this with Engagement Score and Acquisition Cost as proxies, but these are imperfect.

Second, functional form. Our linear specification is appropriate for this dataset given the uniform distribution of Conversion Rate. But in real-world data with right-skewed Clicks and Impressions, log transformations would likely be needed.

Third, measurement error. Engagement Score is an undocumented index. If it is a noisy proxy for true purchase intent, classical measurement error theory tells us the coefficient would be biased toward zero — which is consistent with what we observe, though we cannot distinguish this from genuine zero effect here.

For external validity — our data is synthetic. Variables were generated independently with no real covariance structure, so our null findings cannot be extrapolated to real campaigns. This is the most fundamental limitation.

Finally, simultaneous causality. In real data, conversion rate outcomes likely feed back into future campaign allocation decisions, creating reverse causality between our dependent variable and the categorical controls. Cross-sectional OLS cannot address this.

Conclusion

Main Findings

No variable significantly explains Conversion Rate — all coefficients near zero, Adj R² ≈ 0
Results consistent across all robustness checks — dropping variables or restricting the sample does not change the core finding
Influencer dummy marginally significant at 10% — directionally consistent with theory, but likely a large-sample artifact on synthetic data
No econometric issues detected — the Breusch-Pagan test does not reject homoscedasticity, all VIF ≈ 1.0, zero outliers

Actionable Recommendation

Firms should not assume that channel type or campaign format alone drives conversion rate. Before optimizing campaign mix, invest in real tracking infrastructure and data quality — specifically, linking ad exposure to individual-level conversion outcomes. Until that covariance structure exists in the data, any model will return null results regardless of how well-specified it is. This framework is ready to deploy the moment that data is available.

If We Could Continue

Apply to real campaign-level data
Explore log transformations of Clicks and Impressions
Test interaction terms (e.g., Engagement Score × Campaign Type)
Address endogeneity via IV or natural experiment

Project Contributions

Member	Role
Luis Berumen	Research & Literature
Jake Evans	Data Cleaning & Stats
Mickyas Shawel	Model Specification
Aldo Naranjo	Regression & Presentation

To summarize our four main findings. One: no variable significantly explains Conversion Rate — all coefficients are near zero and Adj R² is essentially zero. Two: results are stable across all three robustness checks. Three: the Influencer dummy is marginally significant at 10%, directionally consistent with theory but likely a large-sample artifact. Four: no econometric issues detected.

For our actionable recommendation — even though we have null results, we can still take a position. Our recommendation to a firm or marketing team is this: do not optimize campaign mix based on channel type or format alone. The more fundamental investment is in data infrastructure — specifically, linking individual-level ad exposure to actual conversion outcomes. The reason our model returns null results is not a flaw in the methodology; it is a flaw in the data-generating process. Synthetic data has no real covariance structure by design. But this exact framework — the model, the diagnostics, the robustness checks — is ready to be applied the moment a firm has real tracking data that links impressions and clicks to conversions at the individual level. That is the prerequisite, and that is the recommendation.

Thank you.

References

Academic Sources

Manchanda, P., Dube, J. P., Goh, K. Y., & Chintagunta, P. K. (2006). The effect of banner advertising on internet purchasing. Journal of Marketing Research, 43(1), 98–108.
Kireyev, P., Pauwels, K., & Gupta, S. (2016). Do display ads influence search? Attribution and dynamics in online advertising. International Journal of Research in Marketing, 33(3), 475–490.

Blake, T., Nosko, C., & Tadelis, S. (2015). Consumer heterogeneity and paid search effectiveness: A large-scale field experiment. Econometrica, 83(1), 155–174.
Lambrecht, A., & Tucker, C. (2013). When does retargeting work? Information specificity in online advertising. Journal of Marketing Research, 50(5), 561–576.

Data & Software

Bhatt, M. (2023). Marketing Campaign Performance Dataset [Data set]. Kaggle. https://www.kaggle.com/datasets/manishabhatt22/marketing-campaign-performance-dataset
R Core Team (2024). R: A language and environment for statistical computing. https://www.R-project.org/