1 Research note

This note specifies outcome variables, explanatory dimensions, and identification strategies to evaluate whether algorithmic pricing improves retailer profitability. Emphasis is placed on firm-level financial indicators available in the dataset and on causal methods suitable for staggered adoption across markets.

The study links retailer–product level evidence of algorithmic pricing on PriceSpy to firm-year financial outcomes. Adoption is detected at the firm level (AlgoRetailer = 0/1). Profitability impacts are estimated by comparing adopters with matched non-adopters using coarsened exact matching (CEM) and propensity score matching (PSM), supplemented with difference-in-differences (DiD) and synthetic control (SCM) designs.

1.1 Profitability and performance

All firm-level monetary variables are expressed in EUR, nominal, after standardising magnitudes and converting currencies as described under “Accounting units and currency standardisation”. Unless stated otherwise, statistics and estimations use the winsorised versions of continuous variables.

  • EBIT (earnings before interest and tax, EUR)
  • OpProfit (operating profit, EUR)
  • NetIncome (EUR)
  • ROA = OpProfit ÷ TA
  • ProfitMargin = OpProfit ÷ Sales
  • SalesPerTA = Sales ÷ TA (asset turnover)
  • Growth indicators: SalesGrowth, OpProfitGrowth, ROAGrowth, ProfitMarginGrowth, SalesPerTAGrowth (firm-wise first differences divided by lag levels)

Winsorisation (implemented after currency conversion):
Within each year, clip values to the 1st/99th percentiles (observations are kept, not dropped) and store suffixed variables used in matching and regressions:

ROA_W, ProfitMargin_W, SalesPerTA_W, AvgInv_W, InvTurnover_W, DIO_W, SalesGrowth_W, OpProfitGrowth_W, ROAGrowth_W, ProfitMarginGrowth_W, SalesPerTAGrowth_W, DebtRatio_W, LogTA_W, LogSales_W.

(Sensitivity: a stricter variant winsorises within Country×Year; results available on request.)
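
The within-year clipping rule can be sketched as follows. This is a minimal illustration in Python (the project's own scripts are in R); the function names, the dict-based row layout, and the linear-interpolation quantile are assumptions made for the example, not the pipeline's implementation.

```python
from collections import defaultdict


def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def winsorise_within_year(rows, var, lower=0.01, upper=0.99):
    """Add `var + "_W"` to each row, clipping `var` to within-year percentiles.

    Rows whose value is None (NA) keep None in the winsorised copy.
    """
    by_year = defaultdict(list)
    for row in rows:
        if row[var] is not None:
            by_year[row["Year"]].append(row[var])

    # Percentile bounds are computed separately for every year.
    bounds = {}
    for year, vals in by_year.items():
        vals.sort()
        bounds[year] = (quantile(vals, lower), quantile(vals, upper))

    for row in rows:
        if row[var] is None:
            row[var + "_W"] = None
        else:
            lo, hi = bounds[row["Year"]]
            row[var + "_W"] = min(max(row[var], lo), hi)
    return rows
```

The Country×Year sensitivity variant would only change the grouping key from `row["Year"]` to the pair `(row["Country"], row["Year"])`.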

1.2 Explanatory and treatment variables

Treatment

  • AlgoRetailer (0/1): Firm classified as algorithmic retailer (treatment group).
  • onPSpy: Indicator of presence on the PriceSpy platform (treated as 1 when missing for CH rows during integration).

Market presence and portfolio

  • Market coverage: number of categories and products listed (if aggregated).

  • Firm identifiers: StoreID, Identifier, CompanyName.

    • Identifier: not used in analysis, but included for data merging and tracking.
    • StoreID: links firm data to PriceSpy records.
    • CompanyName: unique firm identification (canonical).
  • Contextual dimensions: Country, Industry, Sector, SectorNew.

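    • Country: one of the seven major markets where PriceSpy operates; six of them (Denmark, Finland, France, Norway, Sweden, UK) appear in the panel scope table in Section 2.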
    • Industry: fine-grained sub-industry classification (Eikon) and 21 SNI codes (SHOF).
    • Sector: contains empty strings for Swedish data (no Sector column provided).
    • SectorNew: coarser aggregation of Industry used for grouping.

Scale and structure

  • TA (total assets), Sales (turnover), TCap (capital).
  • Leverage: DebtLT, DebtIntB, DebtRatio (and DebtRatio_W).
  • Liquidity and working capital: Cash, AccRec, AccPay, Inv, AvgInv (and AvgInv_W).
  • Cost and financing items: COGS, Expenses, Interest, Tax.

1.3 Controls for matching (CEM / PSM)

Construct a control group of non-adopters matched on pre-treatment characteristics. Suggested coarsening:

  • Country and sector: exact match
  • Size: TA, Sales, TCap (quantile bins)
  • Profitability: ROA_W, ProfitMargin_W
  • Leverage and liquidity: DebtRatio_W, Cash
  • Growth pre-trends: SalesGrowth_W, ROAGrowth_W
  • Platform presence/tenure: onPSpy, years active

Notes:

  • Prefer CEM to reduce model dependence; retain PSM as robustness.
  • Use winsorised pre-treatment averages to avoid distortion by outliers.
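
To make the coarsening step concrete, the sketch below forms strata from exact keys plus quantile bins and keeps only strata that contain both adopters and non-adopters. It is an illustrative Python sketch under stated assumptions: the function names, the number of bins, and the use of a single binned covariate in the usage example are hypothetical, and CEM weights are not computed.

```python
from bisect import bisect_right
from collections import defaultdict


def quantile_cutpoints(values, n_bins):
    """Interior cutpoints splitting `values` into `n_bins` roughly equal groups."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * k) // n_bins] for k in range(1, n_bins)]


def cem_match(firms, exact_keys, binned_keys, n_bins=4):
    """Return the firms that fall in strata with both adopters and non-adopters."""
    cuts = {k: quantile_cutpoints([f[k] for f in firms], n_bins) for k in binned_keys}

    # A stratum is the tuple of exact values plus the bin index of each
    # coarsened covariate.
    strata = defaultdict(list)
    for f in firms:
        signature = tuple(f[k] for k in exact_keys) + tuple(
            bisect_right(cuts[k], f[k]) for k in binned_keys
        )
        strata[signature].append(f)

    matched = []
    for members in strata.values():
        if {m["AlgoRetailer"] for m in members} == {0, 1}:
            matched.extend(members)
    return matched
```

A call such as `cem_match(firms, ["Country", "SectorNew"], ["LogTA_W", "ROA_W"])` would apply exact matching on country and sector and quartile bins on size and profitability; unmatched adopters are dropped, which is the usual price of reduced model dependence.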

1.4 Identification and estimands

Difference-in-Differences (staggered adoption)

  • Event-study around adoption, report dynamic leads/lags
  • Use robust DiD estimators (Callaway–Sant’Anna; Sun–Abraham)
  • Firm and time fixed effects; optionally firm-by-category FE
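
For reference, a generic event-study specification consistent with the bullets above can be written as below. The notation is an assumption introduced here: Y_it is a firm-year outcome (e.g., ROA_W), α_i and λ_t are firm and year fixed effects, G_i is the adoption year of firm i, and k = −1 is the omitted reference period. Never-adopters have all event-time indicators equal to zero.

```latex
Y_{it} = \alpha_i + \lambda_t
       + \sum_{k \neq -1} \beta_k \, \mathbf{1}[\, t - G_i = k \,]
       + \varepsilon_{it}

\mathrm{ATT}(g, t) = \mathbb{E}\big[\, Y_t(g) - Y_t(0) \mid G = g \,\big]
```

Coefficients β_k with k < 0 are the leads used to inspect pre-trends; those with k ≥ 0 trace the dynamic effect after adoption. With staggered adoption and heterogeneous effects, the two-way fixed-effects version of the first equation can be biased, which is the motivation for estimators that target group-time effects such as ATT(g, t), where Y_t(g) is the potential outcome in year t for a firm adopting in year g and Y_t(0) the outcome without adoption.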

Synthetic Control (SCM) for targeted cases

  • For markets with few adopters or single major firm adopters
  • Construct counterfactual trajectories from matched controls

Sensitivity

  • Vary adoption definitions (e.g., robustness to alternative thresholds)
  • Placebo adoption dates; drop early/late adopters
  • Reweight by firm size or leverage

1.5 Accounting units and currency standardisation

We standardise units before computing ratios and growth rates:

Magnitude adjustment

  • SHOF: reported in SEK thousands → multiply by 1,000.
  • Eikon: reported in EUR → no change.
  • Companies House (CH): reported in GBP millions → multiply by 1,000,000.

Currency conversion to EUR

  • SEK→EUR: divide monetary variables by the SEK/EUR rate (SekEur, quoted as SEK per 1 EUR) for the corresponding year.
  • GBP→EUR: divide by the GBP/EUR rate (GbpEur, quoted as GBP per 1 EUR) for the corresponding year.
  • FX rates are the ECB euro foreign exchange reference rates. For each year we take the last available business day in December (31 Dec when published; otherwise the most recent business day), excluding NA observations.
  • Practical detail: ECB publishes only on TARGET business days; a row can exist for 31 Dec but have an NA value. In such cases we select the last non-NA December observation; if an entire December is NA, we fall back to the latest non-NA earlier in the year.
  • After conversion we drop the temporary SekEur/GbpEur columns and proceed to construct ratios.
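
The year-end selection rule described above can be sketched as a small function. This is an illustrative Python sketch, not the project's R code; the function name and the `(date, rate)` input layout are assumptions, with `None` standing in for NA.

```python
from datetime import date


def year_end_rate(observations, year):
    """Pick the year-end observation from (date, rate) pairs.

    Preference order: last non-NA December observation; otherwise the latest
    non-NA observation earlier in the same year; None if nothing is usable.
    Returns the (date, rate) pair so that fallbacks can be flagged in QA.
    """
    valid = sorted(
        (d, r) for d, r in observations if d.year == year and r is not None
    )
    if not valid:
        return None
    december = [(d, r) for d, r in valid if d.month == 12]
    return december[-1] if december else valid[-1]
```

Returning the selected date alongside the rate makes it easy to flag years in which the fallback was used (selected month earlier than December).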

Order of operations

  1) Magnitude normalisation → 2) Currency conversion to EUR → 3) Derived ratios and growth rates → 4) NA/Inf filtering for derived metrics → 5) Within-year winsorisation.

Implications for interpretation

  • All monetary levels (e.g., TA, Sales, OpProfit, EBIT, NetIncome) are now EUR and comparable across sources and countries.
  • Ratios (ROA, ProfitMargin, SalesPerTA) are unit-free: the magnitude factor and the exchange rate cancel as long as numerator and denominator come from the same source and year. They would only be affected by currency conversion if the two originated from different books prior to harmonisation, which is not the case here.

1.6 Measurement definitions

| Name | Definition / formula | Source |
|---|---|---|
| EBIT | Earnings before interest and tax | Financial accounts |
| OpProfit | Operating profit (alt. measure) | Financial accounts |
| NetIncome | Profit after tax and financing | Financial accounts |
| Sales | Turnover | Financial accounts |
| TA | Total assets | Financial accounts |
| ROA | OpProfit ÷ TA | Derived |
| ProfitMargin | OpProfit ÷ Sales | Derived |
| SalesPerTA | Sales ÷ TA (asset turnover) | Derived |
| SalesGrowth | ΔSales ÷ lagged Sales (YoY) | Derived |
| OpProfitGrowth | ΔOpProfit ÷ lagged OpProfit (YoY) | Derived |
| ROAGrowth | ΔROA ÷ lagged ROA (YoY) | Derived |
| ProfitMarginGrowth | ΔProfitMargin ÷ lagged ProfitMargin (YoY) | Derived |
| SalesPerTAGrowth | ΔSalesPerTA ÷ lagged SalesPerTA (YoY) | Derived |
| DebtRatio | DebtLT ÷ TA | Derived |
| AvgInv | Average inventory, (Inv + lagged Inv) ÷ 2 | Derived |
| InvTurnover | COGS ÷ AvgInv | Derived |
| DIO | Days inventory outstanding, 365 ÷ InvTurnover | Derived |
| LogTA | log(TA + 1) | Derived |
| LogSales | log(Sales + 1) | Derived |
| *_W variables | Winsorised versions of the above (see list in Section 1.1) | Derived (winsorised) |
| AlgoRetailer | 0/1 adoption indicator | Platform data |
| onPSpy | Presence on PriceSpy | Platform data |
| Country, SectorNew | Firm context | Metadata |

Notes

  • Units: All monetary variables are in EUR (nominal) after magnitude and currency conversion.
  • FX source: ECB euro reference rates (daily business days). Year-end rate is the last non-NA business day in December for SEK/EUR and GBP/EUR.
  • Weekend/holiday gaps: ECB does not publish on weekends/holidays. We do not forward-fill for the year-end selector; we explicitly pick the last available non-NA December date.
  • Data caveat: In rare years where December is entirely NA for a given series, we fall back to the latest non-NA earlier in the year (flagged in QA).

2 Data structure & scope

We load the harmonised firm–year panel, inspect structure and coverage, and generate a few quick tables.

  • Unit: firm-year with treatment status and winsorised variables
  • Keys: Country, CompanyName, Year
  • Merge: firm-level financials linked with adoption classification
  • Time span: 2000–2024 for Sweden and 2010–2024 for other countries

Panel scope by country (unique algos, on-PriceSpy firms, and all firms)

| Country | Unique algos | Unique on PriceSpy | Unique firms |
|---|---|---|---|
| Denmark | 1 | 10 | 1234 |
| Finland | 1 | 12 | 3403 |
| France | 3 | 24 | 41188 |
| Norway | 2 | 35 | 6863 |
| Sweden | 22 | 374 | 974671 |
| UK | 10 | 50 | 14095 |
| Total | 33 | 493 | 1041443 |

Before turning to the descriptive comparison, we clarify the meaning of key variables:

  • ROA_W (Return on Assets, winsorised): Operating profit relative to total assets.
  • ProfitMargin_W (Profit margin, winsorised): Operating profit relative to sales.
  • SalesPerTA_W (Asset turnover, winsorised): Sales relative to total assets.
  • LogTA_W (Log of total assets, winsorised): Size of the firm in terms of asset base.
  • LogSales_W (Log of sales, winsorised): Size of the firm in terms of revenue.

Algo vs Non-algo on PriceSpy by Country (mean with sd in parentheses)

| Country | Group | Unique firms | ROA | Profit margin | Asset turnover | Log TA | Log Sales |
|---|---|---|---|---|---|---|---|
| Denmark | Non-algo | 10 | -0.109 (0.666) | -0.192 (1.175) | 3.382 (1.759) | 10.619 (1.192) | 11.811 (1.255) |
| Finland | Non-algo | 12 | 0.045 (0.104) | 0.007 (0.060) | 3.609 (2.377) | 9.654 (1.731) | 10.755 (1.584) |
| France | Non-algo | 22 | 0.061 (0.174) | -0.201 (1.403) | 2.214 (1.078) | 9.790 (2.642) | 10.196 (2.766) |
| France | Algo | 2 | 0.117 (0.031) | 0.025 (0.010) | 4.916 (0.730) | 7.059 (0.221) | 8.306 (0.794) |
| Norway | Non-algo | 34 | 0.110 (0.162) | 0.045 (0.076) | 2.947 (1.498) | 8.725 (1.712) | 9.644 (1.595) |
| Norway | Algo | 1 | 0.100 (0.155) | 0.010 (0.017) | 6.653 (4.151) | 8.351 (2.033) | 9.542 (2.206) |
| Sweden | Non-algo | 506 | 0.057 (0.211) | -0.026 (0.598) | 2.770 (1.683) | 13.181 (2.281) | 13.839 (2.196) |
| Sweden | Algo | 31 | 0.000 (0.294) | -0.123 (0.916) | 3.359 (1.659) | 12.618 (2.455) | 13.494 (2.299) |
| UK | Non-algo | 41 | 0.135 (0.285) | 0.001 (0.787) | 2.445 (1.634) | 10.575 (2.963) | 11.145 (2.728) |
| UK | Algo | 9 | 0.082 (0.264) | -0.026 (0.442) | 3.370 (1.193) | 13.560 (3.250) | 14.141 (2.857) |

3 Data Integration and Preprocessing

Scope. We merge the main panel (APfullDATA.rds) with Companies House (CH) and harmonise identifiers, industry labels, accounting magnitudes, currencies, and derived measures.

3.1 Ingestion & normalisation

  • Load main panel (Eikon; SHOF for Sweden) and CH long-format Excel; drop unused fields (after, CompanyID, IndustryGroup).
  • Annotate dataSource ("Eikon", "SHOF", "CH"), set onPSpy = 1 where missing in CH.
  • Force Country = "Sweden" for SHOF rows; coerce IDs to character.

3.2 Duplicate resolution & name standardisation

  • Preserve original CompanyName as CompanyNameOriginal; standardise via _scripts/standardize_company_names.R and _scripts/standardize_names_shof_eikon.R.
  • Lower-case CompanyName for matching; when both Eikon and SHOF exist for the same (CompanyName,Year), keep Eikon, drop SHOF.
  • Construct SectorNew from Sector with fallbacks; map CH “RETAILER” → “Retail Trade”.
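
The Eikon-over-SHOF rule for overlapping firm-years can be sketched as below. This is an illustrative Python sketch with a hypothetical function name; it assumes names have already been standardised, and it leaves CH rows untouched because the rule above only concerns Eikon and SHOF.

```python
def drop_shof_duplicates(rows):
    """Drop SHOF rows whose (lower-cased CompanyName, Year) also exists in Eikon."""
    def key(row):
        return (row["CompanyName"].lower(), row["Year"])

    eikon_keys = {key(r) for r in rows if r["dataSource"] == "Eikon"}
    return [
        r for r in rows
        if not (r["dataSource"] == "SHOF" and key(r) in eikon_keys)
    ]
```

SHOF rows for years that Eikon does not cover are kept, so a firm's series can still combine both sources across different years.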

3.3 Magnitude & currency

  • Apply magnitude factors (SHOF ×1,000; CH ×1,000,000; Eikon unchanged).
  • Merge year-end ECB rates: SekEur (EXR.D.SEK.EUR.SP00.A) and GbpEur (EXR.D.GBP.EUR.SP00.A), selecting the last non-NA December observation per year; fall back to the latest non-NA earlier in the year if needed.
  • Convert SHOF (SEK) and CH (GBP) to EUR by division; remove SekEur and GbpEur afterwards.
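
The two steps above (magnitude factor, then division by the year-end rate) can be combined per row, as in the following sketch. It is illustrative Python rather than the project's R code; the function name and dict layout are assumptions, and any rates used with it in examples are made-up numbers, not actual ECB values.

```python
MAGNITUDE = {"SHOF": 1_000, "CH": 1_000_000, "Eikon": 1}
RATE_COLUMN = {"SHOF": "SekEur", "CH": "GbpEur"}  # Eikon rows are already in EUR


def to_eur(row, monetary_vars):
    """Return a copy of `row` with monetary variables in EUR and FX columns removed."""
    out = dict(row)
    source = row["dataSource"]
    rate_col = RATE_COLUMN.get(source)
    rate = row[rate_col] if rate_col else 1.0
    for var in monetary_vars:
        if out.get(var) is not None:
            # Scale to currency units first, then divide by units per 1 EUR.
            out[var] = out[var] * MAGNITUDE[source] / rate
    # The temporary rate columns are dropped after conversion.
    out.pop("SekEur", None)
    out.pop("GbpEur", None)
    return out
```

For instance, a SHOF value of 11,000 (SEK thousands) with a hypothetical rate of 11.0 SEK per EUR becomes 1,000,000 EUR.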

3.4 Derived variables (post-conversion)

  • Profitability/efficiency: ROA = OpProfit/TA, ProfitMargin = OpProfit/Sales, SalesPerTA = Sales/TA.
  • Working capital & inventory: AvgInv = (Inv + lag(Inv))/2 (by firm), InvTurnover = COGS/AvgInv, DIO = 365/InvTurnover.
  • Growth: first differences divided by lag (by firm) for Sales, OpProfit, ROA, ProfitMargin, SalesPerTA.
  • Leverage & scale: DebtRatio = DebtLT/TA, LogTA = log(TA+1), LogSales = log(Sales+1).
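
The formulas above can be sketched for a single firm as follows. This is an illustrative Python sketch, not the project's R code. It assumes the rows are sorted by Year with no gaps and that the level variables are non-missing; as a simplification, undefined results (missing lag, zero denominator) are stored as None instead of producing Inf or NaN for a later filter.

```python
import math


def add_derived(panel):
    """Add ratios, inventory metrics and growth rates to one firm's rows."""
    def ratio(num, den):
        if num is None or den is None or den == 0:
            return None
        return num / den

    prev = None
    for row in panel:
        row["ROA"] = ratio(row["OpProfit"], row["TA"])
        row["ProfitMargin"] = ratio(row["OpProfit"], row["Sales"])
        row["SalesPerTA"] = ratio(row["Sales"], row["TA"])
        row["DebtRatio"] = ratio(row["DebtLT"], row["TA"])
        row["LogTA"] = math.log(row["TA"] + 1)
        row["LogSales"] = math.log(row["Sales"] + 1)

        # Average inventory needs the previous year's inventory.
        lag_inv = prev["Inv"] if prev else None
        row["AvgInv"] = None if lag_inv is None else (row["Inv"] + lag_inv) / 2
        row["InvTurnover"] = ratio(row["COGS"], row["AvgInv"])
        row["DIO"] = ratio(365, row["InvTurnover"])

        # Growth: first difference divided by the lagged level.
        for var in ("Sales", "OpProfit", "ROA", "ProfitMargin", "SalesPerTA"):
            lag = prev[var] if prev else None
            diff = None if lag is None or row[var] is None else row[var] - lag
            row[var + "Growth"] = ratio(diff, lag)
        prev = row
    return panel
```

The first year of each firm has no lag, so its growth rates, AvgInv, InvTurnover and DIO are None by construction.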

3.5 NA/Inf handling (post-conversion, pre-winsorisation)

  • Drop rows missing TA, Sales, or OpProfit in the raw sources, except for adopters (AlgoRetailer==1), which are retained even when these keys are missing.
  • Remove rows where derived ratios are Inf, -Inf, or NaN; retain expected first-year NAs in growth rates.
  • Document and flag firms with persistent missing financials (QA list retained).
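
The row-level filter implied by the rules above can be sketched as a predicate. This is an illustrative Python sketch with hypothetical names; None stands in for NA, and the order of the two checks (Inf/NaN first, then missing keys) is an assumption about how the rules interact.

```python
import math

KEYS = ("TA", "Sales", "OpProfit")
RATIOS = ("ROA", "ProfitMargin", "SalesPerTA")


def is_bad(x):
    """True for NaN or +/-Inf; None (NA) is handled separately."""
    return isinstance(x, float) and (math.isnan(x) or math.isinf(x))


def keep_row(row):
    """Decide whether a firm-year row survives the NA/Inf filter."""
    # Inf/NaN in a derived ratio removes the row regardless of treatment status.
    if any(is_bad(row.get(r)) for r in RATIOS):
        return False
    # Missing raw keys remove the row unless the firm is an adopter.
    missing_key = any(row.get(k) is None for k in KEYS)
    return (not missing_key) or row.get("AlgoRetailer") == 1
```

Growth rates are deliberately not inspected, so the expected first-year NAs do not remove otherwise valid rows.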

3.6 Winsorisation

  • Within-year 1st/99th percentile winsorisation; create _W copies used for matching and estimation. (Sensitivity: Country×Year.)

3.7 Backups & QA

  • Persist _data/interim/backup_pre_final_clean.csv, _data/interim/backup_final_clean.csv, and final _data/processed/financials_final_clean.csv.
  • QA checks: year coverage (Sweden 2000–2024; other countries 2010–2024), empty-string scans in character columns, and summary stats of _W variables.