1 Research note

This note specifies outcome variables, explanatory dimensions, and identification strategies to evaluate whether algorithmic pricing improves retailer profitability. Emphasis is placed on firm-level financial indicators available in the dataset and on causal methods suitable for staggered adoption across markets.

The study links retailer–product level evidence of algorithmic pricing on PriceSpy to firm-year financial outcomes. Adoption is detected at the firm level (AlgoRetailer = 0/1). Profitability impacts are estimated by comparing adopters with matched non-adopters using CEM/PSM, supplemented with DiD and SCM designs.

1.1 Profitability and performance

  • EBIT (earnings before interest and tax)
  • OpProfit (alternative operating profit definition in the data)
  • NetIncome (bottom line, after tax and financing)
  • ROA = OpProfit ÷ TA
  • ProfitMargin = OpProfit ÷ Sales
  • SalesPerTA = Sales ÷ TA (asset turnover)
  • Growth indicators: SalesGrowth, OpProfitGrowth, ROAGrowth, ProfitMarginGrowth, SalesPerTAGrowth

Winsorization:
To reduce the influence of extreme outliers, the following variables are winsorized at the 1st/99th percentiles within each year. Winsorized versions carry the _W suffix and are used in matching and regressions:

  • ROAROA_W
  • ProfitMarginProfitMargin_W
  • SalesPerTASalesPerTA_W
  • AvgInvAvgInv_W
  • InvTurnoverInvTurnover_W
  • DIODIO_W
  • SalesGrowthSalesGrowth_W
  • OpProfitGrowthOpProfitGrowth_W
  • ROAGrowthROAGrowth_W
  • ProfitMarginGrowthProfitMarginGrowth_W
  • SalesPerTAGrowthSalesPerTAGrowth_W
  • DebtRatioDebtRatio_W
  • LogTALogTA_W
  • LogSalesLogSales_W

Rationale: EBIT/OpProfit, ROA, and ProfitMargin are central profitability metrics. SalesPerTA and growth rates capture efficiency and dynamics of performance. Winsorization ensures these measures are not distorted by extreme values.

1.2 Explanatory and treatment variables

Treatment

  • AlgoRetailer (0/1): Firm classified as algorithmic retailer (treatment group).
  • onPSpy: Indicator of presence on the PriceSpy platform.

Market presence and portfolio

  • Market coverage: number of categories and products listed (if aggregated).

  • Firm identifiers: StoreID, Identifier, CompanyName.

    • Identifier: not used in analysis, but included for data merging and tracking.
    • StoreID: links firm data to PriceSpy records.
    • CompanyName: unique firm identification (canonical).
  • Contextual dimensions: Country, Industry, Sector, SectorNew.

    • Country: one of the seven major markets where PriceSpy operates.
    • Industry: fine-grained sub-industry classification (Eikon) and 21 SNI codes (SHOF).
    • Sector: contains empty strings for Swedish data (no Sector column provided).
    • SectorNew: coarser aggregation of Industry used for grouping.

Scale and structure

  • TA (total assets), Sales (turnover), TCap (capital).
  • Leverage: DebtLT, DebtIntB, DebtRatio (and DebtRatio_W).
  • Liquidity and working capital: Cash, AccRec, AccPay, Inv, AvgInv (and AvgInv_W).
  • Cost and financing items: COGS, Expenses, Interest, Tax.

1.3 Controls for matching (CEM / PSM)

Construct a control group of non-adopters matched on pre-treatment characteristics. Suggested coarsening:

  • Country and sector: exact match
  • Size: TA, Sales, TCap (quantile bins)
  • Profitability: ROA_W, ProfitMargin_W
  • Leverage and liquidity: DebtRatio_W, Cash
  • Growth pre-trends: SalesGrowth_W, ROAGrowth_W
  • Platform presence/tenure: onPSpy, years active

Notes:

  • Prefer CEM to reduce model dependence; retain PSM as robustness.
  • Use winsorized pre-treatment averages to avoid distortion by outliers.

1.4 Identification and estimands

Difference-in-Differences (staggered adoption)

  • Event-study around adoption, report dynamic leads/lags
  • Use robust DiD estimators (Callaway–Sant’Anna; Sun–Abraham)
  • Firm and time fixed effects; optionally firm-by-category FE

Synthetic Control (SCM) for targeted cases

  • For markets with few adopters or single major firm adopters
  • Construct counterfactual trajectories from matched controls

Sensitivity

  • Vary adoption definitions (e.g., robustness to alternative thresholds)
  • Placebo adoption dates; drop early/late adopters
  • Reweight by firm size or leverage

1.5 Measurement definitions

Name Definition / formula Source
EBIT Operating profit before interest & tax Financial accounts
OpProfit Operating profit (alt. measure) Financial accounts
NetIncome Profit after tax and financing Financial accounts
Sales Turnover Financial accounts
TA Total assets Financial accounts
ROA OpProfit ÷ TA Derived
ProfitMargin OpProfit ÷ Sales Derived
SalesPerTA Sales ÷ TA (asset turnover) Derived
SalesGrowth ΔSales / Sales (YoY) Derived
ROAGrowth ΔROA / ROA (YoY) Derived
ProfitMarginGrowth ΔProfitMargin / ProfitMargin (YoY) Derived
SalesPerTAGrowth ΔSalesPerTA / SalesPerTA (YoY) Derived
DebtRatio Debt ÷ TA Financial accounts
AvgInv Average inventory Financial accounts
InvTurnover Sales ÷ AvgInv Derived
DIO Days inventory outstanding Derived
LogTA log(TA) Derived
LogSales log(Sales) Derived
*_W variables Winsorized versions of above (see list) Derived (trimmed)
AlgoRetailer 0/1 adoption indicator Platform data
onPSpy Presence on PriceSpy Platform data
Country, SectorNew Firm context Metadata

2 Data structure & scope

We load the harmonized firm–year panel, inspect structure and coverage, and generate a few quick tables.

  • Unit: firm-year with treatment status and winsorized variables
  • Keys: Country, CompanyName, Year
  • Merge: firm-level financials linked with adoption classification
  • Time span: 2000–2024 for Sweden and 2010 - 2024 for other countries
Panel scope by country (unique algos, on-PriceSpy firms, and all firms)
Country Unique algos Unique on PriceSpy Unique firms
Denmark 1 10 1234
Finland 1 12 3403
France 3 24 41188
Norway 2 35 6863
Sweden 22 374 974667
UK 10 50 14095
Total 33 493 1041439

Before turning to the descriptive comparison, we clarify the meaning of the key variables:

  • ROA_W (Return on Assets, winsorized): Operating profit relative to total assets.
    • Higher is better → indicates more efficient use of assets to generate profit.
  • ProfitMargin_W (Profit margin, winsorized): Operating profit relative to sales.
    • Higher is better → indicates stronger pricing power or cost control.
  • SalesPerTA_W (Asset turnover, winsorized): Sales relative to total assets.
    • Higher is better → indicates more efficient use of assets to generate revenue.
  • LogTA_W (Log of total assets, winsorized): Size of the firm in terms of asset base.
    • Higher indicates larger firms; not a “better/worse” measure, but important for scale.
  • LogSales_W (Log of sales, winsorized): Size of the firm in terms of revenue.
    • Higher indicates larger sales volumes; again, not a “better/worse” measure, but useful for scale.
Algo vs Non-algo on PriceSpy by Country (mean with sd in parentheses)
Country Group Unique firms ROA Profit margin Asset turnover Log TA Log Sales
Denmark Non-algo 10 -0.109 (0.666) -0.192 (1.175) 3.382 (1.759) 10.581 (1.254) 11.485 (0.845)
Finland Non-algo 12 0.045 (0.104) 0.007 (0.060) 3.609 (2.377) 9.642 (1.710) 10.588 (1.317)
France Non-algo 22 0.061 (0.174) -0.201 (1.403) 2.214 (1.078) 9.348 (2.215) 9.660 (2.085)
France Algo 2 0.117 (0.031) 0.025 (0.010) 4.916 (0.730) 6.725 (0.660) 8.306 (0.794)
Norway Non-algo 34 0.110 (0.162) 0.045 (0.076) 2.947 (1.498) 8.687 (1.686) 9.570 (1.452)
Norway Algo 1 0.100 (0.155) 0.010 (0.017) 6.653 (4.151) 7.756 (2.874) 9.542 (2.206)
Sweden Non-algo 506 0.057 (0.211) -0.026 (0.598) 2.770 (1.683) 8.887 (1.967) 9.558 (1.931)
Sweden Algo 31 0.000 (0.294) -0.123 (0.916) 3.359 (1.659) 8.588 (1.854) 9.490 (1.780)
UK Non-algo 41 0.135 (0.285) 0.001 (0.787) 2.445 (1.634) 9.848 (2.287) 10.397 (1.752)
UK Algo 9 0.082 (0.264) -0.026 (0.442) 3.370 (1.193) 7.190 (2.925) 8.628 (2.995)

3 Data Integration and Preprocessing

Scope. Firm–year panel linking PriceSpy retailer presence and algo status to financials from multiple sources (Eikon, SHOF, Companies House/CH). Processing aims for harmonized identifiers, consistent industry/sector labels, clean accounting signs, derived ratios, and robust outlier handling.

3.1 Ingestion and harmonization

  • Load main panel (APfullDATA.rds) and UK Companies House long-format Excel.

  • Drop unused columns (after, CompanyID) and IndustryGroup from CH.

  • Rename CH columns to align with the main schema (e.g., Identifier, StoreID, CompanyName, Country, Industry, Sector, TA, Year, Sales, OpProfit, EBIT, NetIncome, etc.).

  • Annotate CH rows: dataSource = "CH", onPSpy = 1; cast Identifier to character.

  • Row-bind main and CH data; keep only data in environment.

  • Remove blank CompanyName; convert to data.table (dt).

  • Normalize fields:

    • Force Country = "Sweden" where dataSource == "SHOF".
    • Replace onPSpy = NA with 1 (assume NA = on platform).
    • CSV round-trip to standardize storage types.

3.2 De-duplication and firm name standardization

  • Focus on Sweden & on-platform rows (onPSpy == 1 & Country == "Sweden").

  • Prefer Eikon over SHOF when duplicates exist in the same (CompanyName, Year).

  • Standardize retailer names:

    • Apply _scripts/standardize_company_names.R (manual mapping/audit).
    • Lowercase CompanyName to catch residual duplicates.
    • Apply _scripts/standardize_names_shof_eikon.R to align SHOF ↔︎ Eikon variants.
  • Remove SHOF rows where an Eikon row exists for the same CompanyNameYear.

3.3 Industry/Sector consolidation

  • Clean Industry: trim whitespace, map specific numeric codes to ALL-CAPS labels; convert ""NA.

  • For CH rows with Industry == NA, set Industry = "RETAILER" (temporary canonical label).

  • Build SectorNew:

    • If Sector == "", set SectorNew = Industry; otherwise SectorNew = Sector.
    • Replace SectorNew == "RETAILER" with "Retail Trade".
  • Expected quirk: Sector contains empty strings for Swedish data; handled via SectorNew.

3.4 Basic filtering and sign conventions

  • Keep a safety backup (dt_backup).

  • Remove rows with missing key inputs unless treated (AlgoRetailer == 1):

    • Drop if TA is NA (except treated), drop if OpProfit is NA (except treated), drop if Sales is NA (except treated).
  • Enforce accounting sign: set negative COGS to abs(COGS).

3.5 Core derived variables (firm–year)

  • Profitability and efficiency:

    • ROA = OpProfit / TA, ProfitMargin = OpProfit / Sales, SalesPerTA = Sales / TA.
  • Working capital & inventory dynamics:

    • AvgInv = (Inv + lag(Inv)) / 2 (by CompanyName), InvTurnover = COGS / AvgInv, DIO = 365 / InvTurnover.
  • Growth rates (by CompanyName; first year naturally NA):

    • SalesGrowth, OpProfitGrowth, ROAGrowth, ProfitMarginGrowth, SalesPerTAGrowth.
  • Leverage & scale transforms:

    • DebtRatio = DebtLT / TA; LogTA = log(TA + 1), LogSales = log(Sales + 1).

3.6 NA/Inf/NaN handling

  • For non-key metrics when onPSpy == 0, drop rows with COGS or Inv missing.

  • Remove rows where any of the following are Inf, -Inf, or NaN:
    ROA, ProfitMargin, SalesPerTA, AvgInv, InvTurnover, DIO, DebtRatio.
    (Growth variables keep expected first-year NAs.)

  • Replace any residual Inf/NaN in all derived variables (incl. growths) with NA.

3.7 Winsorization (within-year)

  • Trim the 1st/99th percentiles within Year and create _W copies (preferred for matching/estimation):
    ROA_W, ProfitMargin_W, SalesPerTA_W, AvgInv_W, InvTurnover_W, DIO_W,
    SalesGrowth_W, OpProfitGrowth_W, ROAGrowth_W, ProfitMarginGrowth_W, SalesPerTAGrowth_W,
    DebtRatio_W, LogTA_W, LogSales_W.
    (Option: winsorize within Country × Year for stricter comparability.)

3.8 Backups and QA

  • Write intermediary backups to _data/interim/backup_pre_final_clean.csv and _data/interim/backup_final_clean.csv.

  • Sanity checks:

    • summary(dt); verify year coverage (2000–2024).
    • Scan character columns for empty strings (expected only in original Sector).
    • Quick histograms/plots for years and key winsorized ratios.
  • Output: a harmonized, de-duplicated firm–year panel with derived and winsorized variables, ready for matching (CEM/PSM) and causal analysis.