Algorithmic Pricing &
Profitability
Variables and Identification Plan
This note specifies outcome variables, explanatory dimensions, and identification strategies to evaluate whether algorithmic pricing improves retailer profitability. Emphasis is placed on firm-level financial indicators available in the dataset and on causal methods suitable for staggered adoption across markets.
The study links retailer–product level evidence of algorithmic
pricing on PriceSpy to firm-year financial outcomes. Adoption is
detected at the firm level (AlgoRetailer
= 0/1).
Profitability impacts are estimated by comparing adopters with matched
non-adopters using CEM/PSM, supplemented with DiD and SCM designs.
Winsorization:
To reduce the influence of extreme outliers, the following variables are
winsorized at the 1st/99th percentiles within each year. Winsorized
versions carry the _W
suffix and are used in matching and
regressions:
ROA_W
ProfitMargin_W
SalesPerTA_W
AvgInv_W
InvTurnover_W
DIO_W
SalesGrowth_W
OpProfitGrowth_W
ROAGrowth_W
ProfitMarginGrowth_W
SalesPerTAGrowth_W
DebtRatio_W
LogTA_W
LogSales_W
Rationale: EBIT/OpProfit, ROA, and ProfitMargin are central profitability metrics. SalesPerTA and growth rates capture efficiency and dynamics of performance. Winsorization ensures these measures are not distorted by extreme values.
Treatment
AlgoRetailer
(0/1): Firm classified as algorithmic
retailer (treatment group).onPSpy
: Indicator of presence on the PriceSpy
platform.Market presence and portfolio
Market coverage: number of categories and products listed (if
aggregated).
Firm identifiers: StoreID
, Identifier
,
CompanyName
.
Identifier
: not used in analysis, but included for data
merging and tracking.StoreID
: links firm data to PriceSpy records.CompanyName
: unique firm identification
(canonical).Contextual dimensions: Country
,
Industry
, Sector
, SectorNew
.
Country
: one of the seven major markets where PriceSpy
operates.Industry
: fine-grained sub-industry classification
(Eikon) and 21 SNI codes (SHOF).Sector
: contains empty strings for Swedish data (no
Sector column provided).SectorNew
: coarser aggregation of Industry
used for grouping.Scale and structure
TA
(total assets), Sales
(turnover),
TCap
(capital).DebtLT
, DebtIntB
,
DebtRatio
(and DebtRatio_W
).Cash
,
AccRec
, AccPay
, Inv
,
AvgInv
(and AvgInv_W
).COGS
, Expenses
,
Interest
, Tax
.Construct a control group of non-adopters matched on pre-treatment characteristics. Suggested coarsening:
Notes:
Difference-in-Differences (staggered adoption)
Synthetic Control (SCM) for targeted cases
Sensitivity
Name | Definition / formula | Source |
---|---|---|
EBIT | Operating profit before interest & tax | Financial accounts |
OpProfit | Operating profit (alt. measure) | Financial accounts |
NetIncome | Profit after tax and financing | Financial accounts |
Sales | Turnover | Financial accounts |
TA | Total assets | Financial accounts |
ROA | OpProfit ÷ TA | Derived |
ProfitMargin | OpProfit ÷ Sales | Derived |
SalesPerTA | Sales ÷ TA (asset turnover) | Derived |
SalesGrowth | ΔSales / Sales (YoY) | Derived |
ROAGrowth | ΔROA / ROA (YoY) | Derived |
ProfitMarginGrowth | ΔProfitMargin / ProfitMargin (YoY) | Derived |
SalesPerTAGrowth | ΔSalesPerTA / SalesPerTA (YoY) | Derived |
DebtRatio | Debt ÷ TA | Financial accounts |
AvgInv | Average inventory | Financial accounts |
InvTurnover | Sales ÷ AvgInv | Derived |
DIO | Days inventory outstanding | Derived |
LogTA | log(TA) | Derived |
LogSales | log(Sales) | Derived |
*_W variables | Winsorized versions of above (see list) | Derived (trimmed) |
AlgoRetailer | 0/1 adoption indicator | Platform data |
onPSpy | Presence on PriceSpy | Platform data |
Country, SectorNew | Firm context | Metadata |
We load the harmonized firm–year panel, inspect structure and coverage, and generate a few quick tables.
Country | Unique algos | Unique on PriceSpy | Unique firms |
---|---|---|---|
Denmark | 1 | 10 | 1234 |
Finland | 1 | 12 | 3403 |
France | 3 | 24 | 41188 |
Norway | 2 | 35 | 6863 |
Sweden | 22 | 374 | 974667 |
UK | 10 | 50 | 14095 |
Total | 33 | 493 | 1041439 |
Before turning to the descriptive comparison, we clarify the meaning of the key variables:
Country | Group | Unique firms | ROA | Profit margin | Asset turnover | Log TA | Log Sales |
---|---|---|---|---|---|---|---|
Denmark | Non-algo | 10 | -0.109 (0.666) | -0.192 (1.175) | 3.382 (1.759) | 10.581 (1.254) | 11.485 (0.845) |
Finland | Non-algo | 12 | 0.045 (0.104) | 0.007 (0.060) | 3.609 (2.377) | 9.642 (1.710) | 10.588 (1.317) |
France | Non-algo | 22 | 0.061 (0.174) | -0.201 (1.403) | 2.214 (1.078) | 9.348 (2.215) | 9.660 (2.085) |
France | Algo | 2 | 0.117 (0.031) | 0.025 (0.010) | 4.916 (0.730) | 6.725 (0.660) | 8.306 (0.794) |
Norway | Non-algo | 34 | 0.110 (0.162) | 0.045 (0.076) | 2.947 (1.498) | 8.687 (1.686) | 9.570 (1.452) |
Norway | Algo | 1 | 0.100 (0.155) | 0.010 (0.017) | 6.653 (4.151) | 7.756 (2.874) | 9.542 (2.206) |
Sweden | Non-algo | 506 | 0.057 (0.211) | -0.026 (0.598) | 2.770 (1.683) | 8.887 (1.967) | 9.558 (1.931) |
Sweden | Algo | 31 | 0.000 (0.294) | -0.123 (0.916) | 3.359 (1.659) | 8.588 (1.854) | 9.490 (1.780) |
UK | Non-algo | 41 | 0.135 (0.285) | 0.001 (0.787) | 2.445 (1.634) | 9.848 (2.287) | 10.397 (1.752) |
UK | Algo | 9 | 0.082 (0.264) | -0.026 (0.442) | 3.370 (1.193) | 7.190 (2.925) | 8.628 (2.995) |
Scope. Firm–year panel linking PriceSpy retailer presence and algo status to financials from multiple sources (Eikon, SHOF, Companies House/CH). Processing aims for harmonized identifiers, consistent industry/sector labels, clean accounting signs, derived ratios, and robust outlier handling.
Load main panel (APfullDATA.rds
) and UK Companies
House long-format Excel.
Drop unused columns (after
, CompanyID
)
and IndustryGroup
from CH.
Rename CH columns to align with the main schema (e.g.,
Identifier
, StoreID
, CompanyName
,
Country
, Industry
, Sector
,
TA
, Year
, Sales
,
OpProfit
, EBIT
, NetIncome
,
etc.).
Annotate CH rows: dataSource = "CH"
,
onPSpy = 1
; cast Identifier
to
character.
Row-bind main and CH data; keep only data
in
environment.
Remove blank CompanyName
; convert to
data.table
(dt
).
Normalize fields:
Country = "Sweden"
where
dataSource == "SHOF"
.onPSpy = NA
with 1
(assume NA = on
platform).Focus on Sweden & on-platform rows
(onPSpy == 1 & Country == "Sweden"
).
Prefer Eikon over SHOF when
duplicates exist in the same (CompanyName
,
Year
).
Standardize retailer names:
_scripts/standardize_company_names.R
(manual
mapping/audit).CompanyName
to catch residual
duplicates._scripts/standardize_names_shof_eikon.R
to align
SHOF ↔︎ Eikon variants.Remove SHOF rows where an Eikon row exists for the same
CompanyName
–Year
.
Clean Industry
: trim whitespace, map specific
numeric codes to ALL-CAPS labels; convert ""
→
NA
.
For CH rows with Industry == NA
, set
Industry = "RETAILER"
(temporary canonical label).
Build SectorNew
:
Sector == ""
, set SectorNew = Industry
;
otherwise SectorNew = Sector
.SectorNew == "RETAILER"
with
"Retail Trade"
.Expected quirk: Sector
contains
empty strings for Swedish data; handled via
SectorNew
.
Keep a safety backup (dt_backup
).
Remove rows with missing key inputs unless
treated (AlgoRetailer == 1
):
TA
is NA
(except treated), drop if
OpProfit
is NA
(except treated), drop if
Sales
is NA
(except treated).Enforce accounting sign: set negative COGS
to
abs(COGS)
.
Profitability and efficiency:
ROA = OpProfit / TA
,
ProfitMargin = OpProfit / Sales
,
SalesPerTA = Sales / TA
.Working capital & inventory dynamics:
AvgInv = (Inv + lag(Inv)) / 2
(by
CompanyName
), InvTurnover = COGS / AvgInv
,
DIO = 365 / InvTurnover
.Growth rates (by CompanyName
; first year naturally
NA
):
SalesGrowth
, OpProfitGrowth
,
ROAGrowth
, ProfitMarginGrowth
,
SalesPerTAGrowth
.Leverage & scale transforms:
DebtRatio = DebtLT / TA
;
LogTA = log(TA + 1)
,
LogSales = log(Sales + 1)
.For non-key metrics when
onPSpy == 0
, drop rows with COGS
or
Inv
missing.
Remove rows where any of the following are Inf
,
-Inf
, or NaN
:
ROA
, ProfitMargin
, SalesPerTA
,
AvgInv
, InvTurnover
, DIO
,
DebtRatio
.
(Growth variables keep expected first-year NA
s.)
Replace any residual Inf
/NaN
in all
derived variables (incl. growths) with NA
.
Year
and create _W
copies (preferred
for matching/estimation):ROA_W
, ProfitMargin_W
,
SalesPerTA_W
, AvgInv_W
,
InvTurnover_W
, DIO_W
,SalesGrowth_W
, OpProfitGrowth_W
,
ROAGrowth_W
, ProfitMarginGrowth_W
,
SalesPerTAGrowth_W
,DebtRatio_W
, LogTA_W
,
LogSales_W
.Country × Year
for stricter
comparability.)Write intermediary backups to
_data/interim/backup_pre_final_clean.csv
and
_data/interim/backup_final_clean.csv
.
Sanity checks:
summary(dt)
; verify year coverage (2000–2024).Sector
).Output: a harmonized, de-duplicated firm–year panel with derived and winsorized variables, ready for matching (CEM/PSM) and causal analysis.