In this article, our purpose is to analyze the sampling distribution of the proportion, mean, and variance for five stores carrying out a certain activity, distributed by daily turnover, as follows:
The population consists of 5 stores with different activities and turnovers: N = 5, with values {2, 7, 2, 4, 5} (in millions). From this population, we will extract all 10 possible samples of size n = 3 to analyze their characteristics and to examine the sampling distribution. The population distribution is characterized by its mean, variance, and standard deviation.
The mean of a population is the average value of all the units in the population:
\[ \mu = \frac{1}{N} \sum_{i=1}^{N} x_i \]
where:
\(\mu\) is the population mean,
\(N\) is the total number of units in the population,
\(x_i\) is the value of each unit.
The variance of a population measures how much the values deviate from the mean:
\[ \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2 \]
where:
\(\sigma^2\) is the population variance,
\(\mu\) is the population mean,
\(x_i\) is the value of each unit.
The standard deviation is the square root of the variance:
\[ \sigma = \sqrt{\sigma^2} \]
where:
\(\sigma\) is the population standard deviation,
\(\sigma^2\) is the population variance.
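For this population, the mean is \(\mu\) = 4, the variance is \(\sigma^2\) = 4.5, and the standard deviation is \(\sigma = \sqrt{\sigma^2}\) = 2.12. The variance of 4.5 corresponds to the divisor \(N - 1\) (18/4); with the divisor \(N\) used in the variance formula above, the variance is 18/5 = 3.6 and the standard deviation is approximately 1.90.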
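As a quick numerical check of the three formulas, the following minimal sketch uses Python's standard `statistics` module on the five turnover values. It reports the variance under both the divisor N and the divisor N − 1, since the two give different results; the variable names are illustrative and not part of the original analysis.

```python
import statistics

# Daily turnover of the five stores, in millions (values from the text)
population = [2, 7, 2, 4, 5]

mu = statistics.mean(population)              # population mean
var_pop = statistics.pvariance(population)    # variance with divisor N
sd_pop = statistics.pstdev(population)        # square root of var_pop
var_n1 = statistics.variance(population)      # variance with divisor N - 1
sd_n1 = statistics.stdev(population)          # square root of var_n1

print(f"mean = {mu}")
print(f"variance (divisor N)     = {var_pop:.2f}, sd = {sd_pop:.2f}")
print(f"variance (divisor N - 1) = {var_n1:.2f}, sd = {sd_n1:.2f}")
```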
The samples are extracted by simple random sampling without replacement. Under this procedure, the number of distinct samples that can be extracted is given by the following formula:
\[ C_N^n = \frac{N!}{n!(N-n)!} \]
where: N is the total number of units in the population and n is the size of the sample.
Using this procedure, we can extract 10 samples: \[ C_5^3 = \frac{5!}{3!(5-3)!} = 10 \quad \text{samples} \]
The probability of extracting any particular sample of size n = 3 units is given by the following formula:
\[ P(\text{sample}) = \frac{1}{C_N^n} = \frac{1}{\frac{N!}{n!(N-n)!}} \]
The probability of extracting a single sample using this procedure is:
\[ P(\text{sample}) = \frac{1}{C_5^3} = \frac{1}{\frac{5!}{3!(5-3)!}} = \frac{1}{10} = 0.1 \]
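The count and the per-sample probability can be reproduced with a short sketch. It assumes the stores are labelled 1 to 5 (labels chosen here for illustration) and cross-checks the combinatorial formula by listing every sample.

```python
import math
from itertools import combinations

N, n = 5, 3

# Number of distinct samples under simple random sampling without replacement
num_samples = math.comb(N, n)

# Cross-check by listing every sample as a tuple of store labels 1..N
all_samples = list(combinations(range(1, N + 1), n))

# Every sample is equally likely
p_sample = 1 / num_samples

print(num_samples, len(all_samples), p_sample)
```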
The probability of including a specific unit in the sample in simple random sampling without replacement is:
\[ P(\text{specific unit included}) = \frac{1}{N} + \dots + \frac{1}{N}= \sum_{i=1}^{n} \frac{1}{N} = \frac{n}{N} \]
The probability of including a unit in the sample using this procedure is:
\[ P(\text{specific unit included}) = \frac{1}{5} + \frac{1}{5} + \frac{1}{5}= \sum_{i=1}^{n} \frac{1}{N} = \frac{3}{5} = 0.6 \]
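The same result can be built up draw by draw. The probability of selecting the specific unit in the first extraction is:
\[ P_1 = \frac{1}{5} = 0.2 \]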
The total probability of selecting the specific unit in the second extraction is:
\[ P_2 = \left( \frac{4}{5} \right) \times \left( \frac{1}{4} \right) = \frac{4}{20} = \frac{1}{5} = 0.2 \]
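where \(\frac{4}{5}\) is the probability that another unit is picked first and \(\frac{1}{4}\) is the probability of selecting the specific unit in the second extraction. The probability of selecting the specific unit in the third extraction is: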
\[ P_3 = \left( \frac{4}{5} \right) \times \left( \frac{3}{4} \right) \times \left( \frac{1}{3} \right) = \frac{12}{60} = \frac{1}{5} = 0.2 \]
where:
\(\frac{4}{5}\) is the probability that another unit is picked first
\(\frac{3}{4}\) is the probability that another unit is picked second
\(\frac{1}{3}\) is the probability of selecting the specific unit in the third extraction
The total probability of selecting a specific unit is:
\[ P(\text{unit included}) = P_1 + P_2 + P_3 = \frac{1}{5} + \frac{1}{5} + \frac{1}{5} = \frac{3}{5} = 0.6 \]
Thus, the probability that a specific unit is included in the sample is 60%.
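The draw-by-draw argument can be checked with exact fractions. This is a minimal sketch: the store labels and the choice of `target` are illustrative assumptions, and any label gives the same result.

```python
from fractions import Fraction
from itertools import combinations

N, n = 5, 3
target = 1  # label of the unit we follow (hypothetical; any label 1..N works)

# Draw by draw: the unit is missed on the earlier draws, then picked on draw k
draw_probs = []
not_yet_picked = Fraction(1)
for k in range(n):
    remaining = N - k
    draw_probs.append(not_yet_picked * Fraction(1, remaining))
    not_yet_picked *= Fraction(remaining - 1, remaining)

p_included = sum(draw_probs)

# Cross-check by enumeration: share of all samples that contain the unit
samples = list(combinations(range(1, N + 1), n))
p_enumerated = Fraction(sum(target in s for s in samples), len(samples))

print(draw_probs)
print(p_included, p_enumerated)
```

Both routes give the same inclusion probability, n/N.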
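Note that as units are removed from the population, fewer options are left, so the conditional probability of selecting any remaining unit increases from one extraction to the next (\(\frac{1}{5}\), \(\frac{1}{4}\), \(\frac{1}{3}\)), while the unconditional probability of selecting the specific unit at each extraction stays at \(\frac{1}{5}\).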
For a random sample of size n, the sampling distribution of the sample proportion \(\hat{p}\) (the proportion of sample units that satisfy the condition) is approximately normal, provided that both np and n(1−p) are sufficiently large. With n = 3 these conditions are not met, so the sampling distribution is obtained here by listing all possible samples.
The possible samples that can be extracted from the reference population are:
Next, we calculate the mean, variance, and standard deviation for each sample extracted from the reference population. Additionally, we represent graphically the sampling distribution of the mean, variance, and proportion.
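The per-sample calculations can be sketched as follows. The code lists all 10 samples by store index, so that the two stores with a turnover of 2 stay distinct, and computes the statistics of each one. The text does not state which divisor is used for the sample variance; the divisor n − 1 is an assumption made here.

```python
import statistics
from itertools import combinations

population = [2, 7, 2, 4, 5]   # turnover values from the text, in millions
n = 3

rows = []
for idx in combinations(range(len(population)), n):
    values = [population[i] for i in idx]
    rows.append({
        "stores": tuple(i + 1 for i in idx),        # store labels 1..5
        "values": values,
        "mean": statistics.mean(values),
        "variance": statistics.variance(values),    # divisor n - 1 (assumption)
        "sd": statistics.stdev(values),
    })

for r in rows:
    print(r["stores"], r["values"],
          f"mean={r['mean']:.2f} var={r['variance']:.2f} sd={r['sd']:.2f}")

# Averages over the 10 equally likely samples
mean_of_means = statistics.mean(r["mean"] for r in rows)
mean_of_variances = statistics.mean(r["variance"] for r in rows)
print(f"mean of sample means     = {mean_of_means:.2f}")
print(f"mean of sample variances = {mean_of_variances:.2f}")
```

Averaged over all samples, the sample means return 4, the population mean, and the sample variances with divisor n − 1 return 4.5.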
When the population parameters (μ, σ) are known, a variable is standardized with the following formula: \[ Z = \frac{X - \mu}{\sigma} \]
When the population parameters are unknown, we use their sample estimates instead: \[ Z = \frac{X - \bar{x}}{s} \quad \text{or} \quad Z = \frac{X - \hat{\mu}}{\hat{\sigma}} \]
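Both versions of the standardization can be illustrated with a small sketch. The population parameters use the divisor N, and the sample made of stores 1, 2 and 4 is chosen only for illustration; neither choice comes from the original analysis.

```python
import statistics

population = [2, 7, 2, 4, 5]

# Known population parameters (divisor N, as in the population variance formula)
mu = statistics.mean(population)
sigma = statistics.pstdev(population)
z_known = [(x - mu) / sigma for x in population]

# Parameters treated as unknown: estimate them from one sample
# (stores 1, 2 and 4, a hypothetical choice for illustration)
sample = [2, 7, 4]
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)          # divisor n - 1
z_estimated = [(x - x_bar) / s for x in sample]

print([round(z, 2) for z in z_known])
print([round(z, 2) for z in z_estimated])
```

In both cases the standardized values are centered on zero, which is the purpose of the transformation.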
The estimated proportion values for each sample extracted from the reference population are distributed as follows:
As can be seen from the graphic, most of the samples have a proportion of 0.33 for the attributive variable activity, followed by samples with a proportion of 0.67 and, finally, those with a proportion of 0. The mean is 0.4, the variance is 0.04, and the standard deviation is 0.2, suggesting that the data are centered around 0.4 with relatively low variability.
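The reported figures can be reproduced under one assumption: that 2 of the 5 stores have the attribute, which is what a mean proportion of 0.4 implies. The text does not say which stores these are, so the flags in `has_attribute` are hypothetical; the resulting distribution does not depend on their position.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# ASSUMPTION: 1 marks a store that has the attribute. Which stores have it is
# not given in the text; two out of five is inferred from the reported mean 0.4.
has_attribute = [1, 1, 0, 0, 0]
n = 3

# Sample proportion for every possible sample of size n
proportions = [Fraction(sum(s), n) for s in combinations(has_attribute, n)]

distribution = Counter(proportions)
mean_p = sum(proportions) / len(proportions)
var_p = sum((p - mean_p) ** 2 for p in proportions) / len(proportions)
sd_p = float(var_p) ** 0.5

for value, count in sorted(distribution.items()):
    print(f"p_hat = {float(value):.2f}: {count} sample(s)")
print(float(mean_p), float(var_p), round(sd_p, 2))
```

Under this assumption, six samples have a proportion of 0.33, three have 0.67, and one has 0, in line with the description above.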
The sample mean \(\bar{X}\) is an unbiased estimator of the population mean \(\mu\):
\[ \text{Bias}(\bar{X}) = \mathbb{E}[\bar{X}] - \mu = 0 \]
If \(\hat{p}\) is the sample proportion estimating the population proportion \(p\):
\[ \text{Bias}(\hat{p}) = \mathbb{E}[\hat{p}] - p \]
If the sample is randomly selected, \(\mathbb{E}[\hat{p}] = p\), meaning the sample proportion is unbiased.
However, in cases of non-random sampling, selection bias may introduce systematic error.
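For simple random sampling without replacement, the unbiasedness of the sample mean can be sketched from the inclusion probability \(\frac{n}{N}\) obtained earlier. The indicator \(I_i\), equal to 1 if unit \(i\) is in the sample and 0 otherwise, is notation introduced here for illustration:
\[ \bar{X} = \frac{1}{n} \sum_{i=1}^{N} x_i I_i, \qquad \mathbb{E}[I_i] = \frac{n}{N} \]
\[ \mathbb{E}[\bar{X}] = \frac{1}{n} \sum_{i=1}^{N} x_i \, \mathbb{E}[I_i] = \frac{1}{n} \cdot \frac{n}{N} \sum_{i=1}^{N} x_i = \mu \]
The same argument applies to \(\hat{p}\), which is the sample mean of a variable coded 0 or 1. For the five stores, \(\mathbb{E}[\bar{X}] = \frac{1}{3} \cdot \frac{3}{5} \cdot 20 = 4 = \mu\).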
## Conclusions
We observe that for the proportion and the mean, the bias is very small, almost zero. Therefore, the larger the sample size, the closer the value of the estimator tends to be to the value of the parameter.