IRI distribution assessment plan

Author

Boris Lebedenko

Problem definition

The primary objective of this project is to elucidate the factors which affect Inventory Record Inaccuracy (IRI).
To achieve this objective via regression modeling, we first need to study the distributions of the IRIs.
We focus on parametric models previously suggested in the literature, both continuous and discrete.

Data

The data will be collected from \(M=86\) branches on \(n=15\) items.
For each item there are 6 types of IRI, each to be examined separately.

Table 1: Types of IRI and their notation

IRI type	Notation	Suggested distributions
Shelf	s
Backroom shelf	b
Checkout	w	Normal
Data capture	c	Log-normal
Visual complexity	v	Log-normal
Total	T

Denote by \(\varepsilon^{(k)}_{j,i}\) the k-th type of IRI for branch \(j\) \((j = 1,2,\dots, M)\) and item type \(i\) \((i=1,2,\dots,n)\).
- We omit the upper index \(^{(k)}\) for notational brevity.

Assumption 1: The errors for each item are independent and identically distributed within a branch, i.e., \(\varepsilon_{j,1} \sim \varepsilon_{j,2} \sim \dots \sim \varepsilon_{j,15} \sim \mathcal{F}_{\theta_j}\).

Assumption 2: The errors are independent and identically distributed across all branches and items, i.e., \(\varepsilon_{j_1,i_1} \sim \varepsilon_{j_2,i_2} \sim \mathcal{F}_\theta\) for every pair \((j_1,i_1) \neq (j_2,i_2)\).

Simultaneous testing for normality

We are testing the following hypotheses:

\(H_0\): For every \(j=1,2,\dots,86\):

\[\varepsilon_{j1},\dots,\varepsilon_{jn} \sim \mathcal{N}(\mu_j,\sigma^2_j)\]

\(H_1\): Otherwise, i.e., for some \(j\), \(\varepsilon_{ji}\) is not normally distributed.

The following sections describe proposed methods to test these hypotheses.

Approach 1. Standardize-Combine-Assess

This approach relies on the assumption that all 86 branches share the same shape and differ only in mean and variance.

1. For the \(j\)-th branch calculate:

\[\hat{\mu}_j = \frac{1}{n}\sum_{i=1}^{n}\varepsilon_{ji},\quad \hat{\sigma}^2_j = \frac{1}{n-1}\sum_{i=1}^n(\varepsilon_{ji} - \hat{\mu}_j)^2.\]

2. Obtain standardized values:

\[z_{ji} = \frac{\varepsilon_{ji}-\hat{\mu}_j}{\hat{\sigma}_j}.\]

3. Collate all standardized values into one vector \(\mathbf{z}\in \mathbb{R}^{1290}\) (\(86\times15 = 1290\) entries).
4. Use the Shapiro-Wilk, Shapiro-Francia, or Anderson-Darling test to assess the normality of \(\mathbf{z}\).
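The steps above can be sketched in Python as follows. This is a minimal illustration, not the project's implementation: it assumes NumPy and SciPy are available, the function name `standardize_combine_assess` is hypothetical, and the data are simulated placeholders rather than real IRI measurements. Only the Shapiro-Wilk variant is shown.

```python
import numpy as np
from scipy import stats

def standardize_combine_assess(eps):
    """eps: array of shape (M, n), one row per branch, one column per item."""
    eps = np.asarray(eps, dtype=float)
    mu_hat = eps.mean(axis=1, keepdims=True)             # branch means
    sigma_hat = eps.std(axis=1, ddof=1, keepdims=True)   # branch SDs, 1/(n-1) divisor
    z = ((eps - mu_hat) / sigma_hat).ravel()             # pooled vector of length M*n
    w_stat, p_value = stats.shapiro(z)                   # Shapiro-Wilk on the pooled values
    return z, w_stat, p_value

# Simulated placeholder data (an assumption, not project data): 86 branches x 15 items,
# each branch with its own mean and standard deviation.
rng = np.random.default_rng(0)
M, n = 86, 15
mu = rng.normal(0.0, 5.0, size=(M, 1))
sigma = rng.uniform(0.5, 3.0, size=(M, 1))
eps_sim = mu + sigma * rng.standard_normal((M, n))

z, w_stat, p_value = standardize_combine_assess(eps_sim)
print(f"len(z) = {z.size}, W = {w_stat:.4f}, p = {p_value:.4f}")
```

Because each branch is standardized with its own estimates from only 15 values, the pooled values are only approximately independent standard normal under \(H_0\), so the reported p-value should be read as approximate.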

Approach 2. Meta-analytic multiple testing

Compute a normality test (Shapiro-Wilk) p-value for each sample \(j\): \(p_j\).
Use Fisher’s method to obtain a test statistic for the combined p-values:

\[\chi^2 = -2\sum_{j=1}^{86}\log(p_j).\]

This test statistic is chi-squared distributed with \(2\times 86\) degrees of freedom under \(H_0\). A large test statistic implies that the combined p-value is small.

Note: This method inherently tests “are any non-normal?” If the p-values behave like a uniform sample on \((0,1)\), as expected under \(H_0\), the combined test likely won’t reject. However, a single severely non-normal sample, or many samples with moderately small p-values, can pull down the combined p-value.
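A minimal sketch of this procedure in Python, assuming SciPy is available. The function name `fisher_combined_normality` is hypothetical and the data are simulated placeholders, not project data.

```python
import numpy as np
from scipy import stats

def fisher_combined_normality(eps):
    """eps: array of shape (M, n); returns per-branch p-values and Fisher's combination."""
    eps = np.asarray(eps, dtype=float)
    # One Shapiro-Wilk p-value per branch (row)
    p_values = np.array([stats.shapiro(row).pvalue for row in eps])
    chi2_stat = -2.0 * np.sum(np.log(p_values))     # Fisher's statistic
    df = 2 * len(p_values)                          # 2 * M degrees of freedom
    combined_p = stats.chi2.sf(chi2_stat, df)       # upper-tail probability
    return p_values, chi2_stat, combined_p

# Simulated placeholder data (an assumption): 86 normal branches with 15 items each
rng = np.random.default_rng(1)
eps_sim = rng.normal(loc=2.0, scale=1.5, size=(86, 15))

p_values, chi2_stat, combined_p = fisher_combined_normality(eps_sim)
print(f"chi2 = {chi2_stat:.2f} on {2 * len(p_values)} df, combined p = {combined_p:.4f}")
```

SciPy also provides `scipy.stats.combine_pvalues(p_values, method="fisher")`, which should give the same statistic and p-value; the manual version is shown only to mirror the formula.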

Simultaneous testing for a specified parametric family

In this section we formulate testing frameworks for a specific parametric family \(\mathcal{F}_\theta\) for each class of IRI.

Approach 1. Pooled approach

Under the assumption \(\theta_1 = \theta_2 = \dots = \theta_M\), i.e., that the parameter of the IRI distribution is the same for all \(M\) branches, we can employ the following approach:

Use Maximum Likelihood Estimation on the entire sample vector \(\boldsymbol{\varepsilon}\) to obtain the estimator \(\hat{\theta}\).
Use a goodness-of-fit test (Pearson’s chi-squared for discrete distributions and Kolmogorov-Smirnov for continuous ones) to assess whether the distribution is appropriate. Complement the test with a visualization such as a QQ-plot or a histogram with an overlay of the fitted density.
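As an illustration of the pooled approach for a continuous family, the sketch below fits a log-normal distribution by maximum likelihood and applies the Kolmogorov-Smirnov test. It assumes SciPy, simulated placeholder data, and a log-normal location fixed at zero; none of these choices come from the project itself.

```python
import numpy as np
from scipy import stats

# Simulated placeholder data (an assumption): positive values, 86 branches x 15 items
rng = np.random.default_rng(2)
eps_sim = rng.lognormal(mean=0.3, sigma=0.6, size=(86, 15))
pooled = eps_sim.ravel()                      # entire sample vector

# Maximum likelihood fit of the log-normal with location fixed at 0,
# so the estimated parameters are the shape (sigma) and scale (exp(mu)).
shape, loc, scale = stats.lognorm.fit(pooled, floc=0)

# Kolmogorov-Smirnov test against the fitted distribution.
# Caveat: the parameters were estimated from the same data, so this
# p-value tends to be conservative (too large).
ks_stat, ks_p = stats.kstest(pooled, "lognorm", args=(shape, loc, scale))
print(f"sigma_hat = {shape:.3f}, mu_hat = {np.log(scale):.3f}, KS p = {ks_p:.4f}")
```

A parametric bootstrap of the KS statistic is one way to account for the estimated parameters, at the cost of extra computation.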

Approach 2. Meta-analytic multiple testing

Same framework as in the normal case with the appropriate estimation scheme for \(\theta\).

Selecting distribution family for IRI from specified alternatives

Some IRI types require choosing a parametric family. We consider the following alternatives:

Normal
Laplace
Log-normal
Beta

For each candidate model obtain an estimate \(\hat{\theta}\) of the parameter \(\theta\) via Maximum Likelihood based on \(\boldsymbol{\varepsilon}\).
Compute the log-likelihood for each fitted distribution:

\[\ell(\hat{\theta}) = \sum_{t=1}^{M\times n} \ln f(\varepsilon_t\mid\hat{\theta}),\]

where \(\varepsilon_t\) is the \(t\)-th entry of the pooled sample \(\boldsymbol{\varepsilon}\).
Calculate the AIC (Akaike Information Criterion):

\[\text{AIC} = -2\ell(\hat{\theta}) + 2p,\]

where \(p\) is the number of estimated parameters.
Select the distribution that yields the lowest AIC (the best trade-off between fit and number of parameters among the candidates).
Validate the fit using a goodness-of-fit test and a visual assessment.
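The selection procedure can be sketched as follows. This is an illustrative outline under several assumptions: SciPy is used for fitting, the helper `aic_table` is hypothetical, the data are simulated placeholders lying strictly in \((0,1)\) so that all four candidates are defined, and the log-normal and Beta supports are fixed (location 0, and location 0 with scale 1, respectively) so that every candidate has two free parameters.

```python
import numpy as np
from scipy import stats

def aic_table(x, candidates):
    """Fit each candidate by maximum likelihood and return {name: AIC}."""
    results = {}
    for name, (dist, fixed) in candidates.items():
        params = dist.fit(x, **fixed)              # MLE, with some parameters held fixed
        loglik = np.sum(dist.logpdf(x, *params))   # log-likelihood at the estimate
        p = len(params) - len(fixed)               # number of estimated parameters
        results[name] = -2.0 * loglik + 2 * p
    return results

# Candidate families; the fixed parameters are assumptions of this sketch.
candidates = {
    "Normal": (stats.norm, {}),
    "Laplace": (stats.laplace, {}),
    "Log-normal": (stats.lognorm, {"floc": 0}),
    "Beta": (stats.beta, {"floc": 0, "fscale": 1}),
}

# Simulated placeholder data (an assumption): M * n = 1290 values in (0, 1)
rng = np.random.default_rng(3)
x_sim = rng.beta(2.0, 5.0, size=86 * 15)

aic = aic_table(x_sim, candidates)
best = min(aic, key=aic.get)
for name, value in sorted(aic.items(), key=lambda kv: kv[1]):
    print(f"{name:10s} AIC = {value:.2f}")
print("Lowest AIC:", best)
```

With real IRI data, candidates whose support does not contain the observations (for example, log-normal with non-positive values) would have to be dropped or the data transformed before comparison.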