The uniform distribution is a probability distribution in which all possible values are equally likely: each value has the same probability in the discrete case and the same probability density in the continuous case. There are two main types of uniform distribution:
Discrete uniform distribution
Continuous uniform distribution
The discrete uniform distribution applies when there is a finite number of possible values.
Each value has the same probability of occurring.
For a discrete uniform distribution over values \(\{ x_1, x_2, \dots, x_N \}\), the mean (expected value) and variance are given by:
Mean (Expected Value) \[ E[X] = \frac{1}{N} \sum_{i=1}^{N} x_i \]
For a uniform distribution over consecutive integers {1,2,…,N}, this simplifies to: \[ E[X] = \frac{N + 1}{2} \]
Variance \[ \text{Var}(X) = E[X^2] - (E[X])^2 \]
For the consecutive integers {1,2,…,N}, using the formula for the sum of the squares of the first N integers: \[E[X^2] = \frac{1}{N} \sum_{i=1}^{N} i^2 = \frac{(N+1)(2N+1)}{6}\]
Thus, the variance is:
\[\text{Var}(X) = \frac{(N^2-1)}{12}\]
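The algebra between the two moments and the closed form is short enough to write out. For the values {1,2,…,N}, with \(E[X] = \frac{N+1}{2}\):

\[ \text{Var}(X) = \frac{(N+1)(2N+1)}{6} - \left(\frac{N+1}{2}\right)^2 = \frac{(N+1)\left[2(2N+1) - 3(N+1)\right]}{12} = \frac{(N+1)(N-1)}{12} = \frac{N^2-1}{12} \]

As an illustration, a fair six-sided die (N = 6) has mean \(\frac{7}{2}\) and variance \(\frac{35}{12} \approx 2.92\).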
Properties of the discrete uniform distribution
Equal probability: each outcome \(x_i\) has the same probability \(P(X = x_i) = \frac{1}{N}\).
For equally spaced values such as {1,2,…,N}, the distribution is symmetric around the mean value.
The distribution is defined over a finite set of discrete values.
The sum of many independent discrete uniform variables is approximately normally distributed (Central Limit Theorem).
Among all discrete distributions on a given finite set of values, the discrete uniform distribution has the highest entropy, meaning it represents maximum uncertainty.
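The maximum-entropy property can be illustrated numerically. The sketch below is only an illustration: `entropy()` is a helper defined here (not a base R function), and the loaded-die probabilities are made-up values chosen for comparison.

```r
# Shannon entropy (in nats) of a probability mass function.
# Zero-probability terms contribute nothing, so they are dropped.
entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

k <- 6
uniform_pmf <- rep(1 / k, k)                     # fair die
loaded_pmf  <- c(0.5, 0.1, 0.1, 0.1, 0.1, 0.1)   # hypothetical loaded die

entropy(uniform_pmf)   # equals log(k)
entropy(loaded_pmf)    # strictly smaller than log(k)
```

Any other probability assignment on the same six values gives an entropy below `log(k)`.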
Plotting the Mean and Variance of a Discrete Uniform Distribution
When the variable is discrete, the uniform distribution is defined differently from the continuous case. Instead of the continuous uniform distribution (where every value in an interval [a, b] has the same probability density), we use the discrete uniform distribution, in which each value of a finite set \(\{ x_1, x_2, \dots, x_N \}\) has the same probability, \(\frac{1}{N}\). Here we generate a finite number of values (for example, 10) as follows:
N <- 10 # maximum value
N
## [1] 10
x <- 1:N # the possible values
x
## [1] 1 2 3 4 5 6 7 8 9 10
pmf <- rep(1/N, N) # the probability of each generated value
pmf
## [1] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
To compute the CDF values, we use the cumsum() function:
cdf <- cumsum(pmf) # compute CDF values
cdf
## [1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
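The same vectors can be used to compute the mean and the variance directly from their definitions and to compare them with the closed-form expressions. This is a minimal sketch; the names `mean_x` and `var_x` are introduced here for illustration.

```r
N   <- 10
x   <- 1:N              # the possible values
pmf <- rep(1 / N, N)    # probability of each value

mean_x <- sum(x * pmf)                 # E[X]
var_x  <- sum(x^2 * pmf) - mean_x^2    # E[X^2] - (E[X])^2

mean_x   # 5.5, matches (N + 1) / 2
var_x    # 8.25, matches (N^2 - 1) / 12
```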
Probability function
The probability mass function of a discrete uniform variable on \(\{ 1, 2, \dots, N \}\) is:
\[P(X = x) = \frac{1}{N}, \quad \text{for } x \in \{1,2,\dots,N\}\]
Cumulative Distribution Function (CDF)
The CDF of a discrete uniform distribution on \(\{1, \dots, N\}\) is: \[F(x) = \begin{cases} 0, & x < 1 \\ \frac{\lfloor x \rfloor}{N}, & 1 \leq x < N \\ 1, & x \geq N \end{cases}\] In particular, \(F(k) = \frac{k}{N}\) for \(k \in \{1, \dots, N\}\).
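Because the CDF is a step function, it can be evaluated at any real number by counting how many of the values 1, …, N lie at or below it. A possible sketch follows; `p_disc_unif()` is a hypothetical helper name, not a function provided by base R.

```r
# CDF of the discrete uniform distribution on {1, ..., N}.
# floor(q) counts the support points at or below q; the result is
# clamped to [0, N] so that F is 0 below 1 and 1 from N onwards.
p_disc_unif <- function(q, N) {
  pmin(pmax(floor(q), 0), N) / N
}

p_disc_unif(c(0.5, 1, 3.7, 10, 12), N = 10)
# 0.0 0.1 0.3 1.0 1.0
```

At the integer points the function reproduces the values obtained with cumsum().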
The continuous uniform distribution applies to a continuous range of values.
The probability density function is constant over the interval [a, b].
Let \(X \sim U(a, b)\) be a random variable uniformly distributed on the interval [a, b], with \(a, b \in \mathbb{R}\) and \(a < b\). The probability density function of \(X\) is: \[ f(x) = \begin{cases} \frac{1}{b-a}, & a \leq x \leq b \\ 0, & \text{otherwise} \end{cases} \] The parameters of the continuous uniform distribution are the endpoints \(a\) and \(b\); the mean (expected value) and the variance are determined by them.
The fundamental properties of the uniform distribution are:
Mean:
\[
E[X] = \frac{a + b}{2}
\]
The mean represents the central location of the distribution, the midpoint of the interval [a, b].
Median: \[ \text{Med}(X) = \frac{a + b}{2} \]
Mode: any value in [a, b], since the density is constant over the interval.
Variance:
\[
\text{Var}(X) = \frac{(b - a)^2}{12}
\]
Variance measures the spread of the distribution. Since the density is constant over [a, b], the variance depends only on the length of the interval [a, b].
Probability density function:
\[f(x) =
\begin{cases}
\frac{1}{b - a}, & \text{if } a \leq x \leq b \\
0, & \text{otherwise}
\end{cases}\]
The function is constant over the [a, b] interval and describes the probability distribution of a continuous uniform variable.
To evaluate the density of a continuous uniform variable over a given interval, we use the dunif() function, whose arguments are: x (the values at which the density is evaluated), min (the lower limit of the distribution, a), max (the upper limit of the distribution, b) and, optionally, log (if TRUE, the logarithm of the density is returned).
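A short usage example of dunif() may help; the interval [0, 2] and the evaluation points are arbitrary choices made for illustration.

```r
a <- 0
b <- 2
x <- c(-1, 0, 0.5, 2, 3)   # points inside and outside [a, b]

dunif(x, min = a, max = b)
# 0.0 0.5 0.5 0.5 0.0
```

Inside the interval the density equals 1 / (b - a) = 0.5, and outside it the density is 0.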
Cumulative distribution function \[F(x) = \begin{cases} 0, & x < a \\ \frac{x - a}{b - a}, & a \leq x \leq b \\ 1, & x > b \end{cases}\]
The cumulative distribution function gives the probability that the random variable is less than or equal to x.
Moment generating function
\[M_X(t) = \begin{cases} \frac{e^{bt} - e^{at}}{t(b - a)}, & t \neq 0 \\ 1, & t = 0 \end{cases}\]
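The closed form for \(t \neq 0\) can be checked against the definition \(M_X(t) = E[e^{tX}]\) by numerical integration. The values of a, b and t0 below are arbitrary illustrative choices.

```r
a  <- 1
b  <- 3
t0 <- 0.7   # any non-zero value of t

# Closed form for t != 0
mgf_closed <- (exp(b * t0) - exp(a * t0)) / (t0 * (b - a))

# E[exp(t X)] as the integral of exp(t x) f(x) over [a, b]
mgf_numeric <- integrate(
  function(x) exp(t0 * x) * dunif(x, min = a, max = b),
  lower = a, upper = b
)$value

c(closed = mgf_closed, numeric = mgf_numeric)   # the two values agree
```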
Skewness and Kurtosis
Skewness: 0 (since the distribution is symmetric).
Excess kurtosis: \(-\frac{6}{5}\) (the distribution is flatter and has lighter tails than a normal distribution).
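Both values can be verified from the central moments of the distribution. The sketch below computes them by numerical integration for an arbitrary interval; `central_moment()` is a helper defined here for illustration.

```r
a  <- 2
b  <- 5
mu <- (a + b) / 2

# k-th central moment of U(a, b), computed by numerical integration
central_moment <- function(k) {
  integrate(
    function(x) (x - mu)^k * dunif(x, min = a, max = b),
    lower = a, upper = b
  )$value
}

m2 <- central_moment(2)
m3 <- central_moment(3)
m4 <- central_moment(4)

skewness        <- m3 / m2^(3 / 2)
excess_kurtosis <- m4 / m2^2 - 3

skewness          # approximately 0
excess_kurtosis   # approximately -1.2
```

The result does not depend on the choice of a and b, because both quantities are invariant to shifting and rescaling the interval.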
Suppose we want the probability that the outcome falls between c and d, with \(a \leq c \leq d \leq b\). We can compute it as the area of a rectangle, multiplying its width \(d - c\) by its height \(\frac{1}{b-a}\), or, equivalently, as the fraction of the whole domain covered by the interval: \[P(c \leq X \leq d) = \frac{d - c}{b - a}\]
So, for a uniform distribution, the probability of an interval depends only on its length and not on its position.
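The two ways of computing an interval probability can be compared in R using punif(), the uniform CDF. The support [0, 10] and the sub-intervals are hypothetical values chosen for illustration.

```r
a <- 0
b <- 10          # X ~ U(a, b)
c_low  <- 2
d_high <- 5      # sub-interval [c_low, d_high]

p_ratio <- (d_high - c_low) / (b - a)   # length of interval / length of domain
p_cdf   <- punif(d_high, min = a, max = b) - punif(c_low, min = a, max = b)

p_ratio   # 0.3
p_cdf     # 0.3

# An interval of the same length in a different position has the same probability
p_shifted <- punif(8, min = a, max = b) - punif(5, min = a, max = b)
p_shifted # 0.3
```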
We now derive, step by step, the mean (expected value) and the variance of a continuous uniform distribution \(X \sim U(a, b)\).
The expected value is defined as: \[ E[X] = \int_a^b x f(x) \,dx \] The probability density function (PDF) of a uniform distribution over [a, b] is: \[ f(x) = \begin{cases} \frac{1}{b-a}, & a \leq x \leq b \\ 0, & \text{otherwise} \end{cases} \] Substituting \(f(x)\) into the integral: \[ E[X] = \int_a^b x \cdot \frac{1}{b-a} \,dx \] Since \(\frac{1}{b-a}\) is a constant, it can be taken outside the integral: \[ E[X] = \frac{1}{b-a} \int_a^b x \,dx \]
The antiderivative of \(x\) is: \[ \int x \,dx = \frac{x^2}{2} \] Evaluating it from \(a\) to \(b\): \[ E[X] = \frac{1}{b-a} \left( \frac{b^2}{2} - \frac{a^2}{2} \right) \] \[ E[X] = \frac{b^2 - a^2}{2(b-a)} \] Knowing that \(b^2 - a^2 = (b-a)(b+a)\): \[ E[X] = \frac{(b-a)(b+a)}{2(b-a)} = \frac{b+a}{2} \] Thus, the mean value is: \[ E[X] = \frac{a+b}{2} \]
The variance is defined as: \[ \text{Var}(X) = E[X^2] - (E[X])^2 \]
We need two terms: \(E[X^2]\) and \((E[X])^2\).
Compute the first term \(E[X^2]\)
\[ E[X^2] = \int_a^b x^2 f(x) \,dx \] Substituting \(f(x)\): \[ E[X^2] = \frac{1}{b-a} \int_a^b x^2 \,dx \] The antiderivative of \(x^2\) is: \[ \int x^2 \,dx = \frac{x^3}{3} \] Evaluating it between the limits \(a\) and \(b\): \[ E[X^2] = \frac{1}{b-a} \left( \frac{b^3}{3} - \frac{a^3}{3} \right) \] \[ E[X^2] = \frac{b^3 - a^3}{3(b-a)} \] Using the identity \(b^3 - a^3 = (b-a)(b^2 + ab + a^2)\), we obtain the following result: \[ E[X^2] = \frac{(b-a)(b^2 + ab + a^2)}{3(b-a)} = \frac{b^2 + ab + a^2}{3} \]
Compute Var(X)
\[ \text{Var}(X) = E[X^2] - (E[X])^2 \] Substituting the above values: \[ \text{Var}(X) = \frac{b^2 + ab + a^2}{3} - \left(\frac{a+b}{2}\right)^2 \] \[ \left(\frac{a+b}{2}\right)^2 = \frac{a^2 + 2ab + b^2}{4} \] \[ \text{Var}(X) = \frac{b^2 + ab + a^2}{3} - \frac{a^2 + 2ab + b^2}{4} \] The common denominator is 12: \[ \text{Var}(X) = \frac{4(b^2 + ab + a^2)}{12} - \frac{3(a^2 + 2ab + b^2)}{12} \] \[ \text{Var}(X) = \frac{4b^2 + 4ab + 4a^2 - 3a^2 - 6ab - 3b^2}{12} \] \[ \text{Var}(X) = \frac{(4b^2 - 3b^2) + (4a^2 - 3a^2) + (4ab - 6ab)}{12} \] \[ \text{Var}(X) = \frac{b^2 - 2ab + a^2}{12} \] We know that \(b^2 - 2ab + a^2 = (b-a)^2\), thus, the variance of X is: \[ \text{Var}(X) = \frac{(b-a)^2}{12} \]
The square root of the variance is the standard deviation (\(\sigma_X\)): \[\text{SD}(X) = \sigma_X = \sqrt{\text{Var}(X)} = \sqrt{\frac{(b-a)^2}{12}} = \frac{b-a}{\sqrt{12}}\]
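The derived formulas can be checked numerically by evaluating the defining integrals with integrate(). The interval [2, 5] is an arbitrary choice made for illustration.

```r
a <- 2
b <- 5
f <- function(x) dunif(x, min = a, max = b)   # density of U(a, b)

ex  <- integrate(function(x) x   * f(x), lower = a, upper = b)$value   # E[X]
ex2 <- integrate(function(x) x^2 * f(x), lower = a, upper = b)$value   # E[X^2]

var_x <- ex2 - ex^2
sd_x  <- sqrt(var_x)

c(numeric = ex,    formula = (a + b) / 2)         # 3.5
c(numeric = var_x, formula = (b - a)^2 / 12)      # 0.75
c(numeric = sd_x,  formula = (b - a) / sqrt(12))  # about 0.866
```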
Histograms of Uniform Distributions with Different Sample Sizes
For a continuous uniform distribution \(X \sim U(a, b)\), the mean, variance and standard deviation are: \[ E[X] = \frac{a+b}{2}, \quad \text{Var}(X) = \frac{(b-a)^2}{12}, \quad \text{SD}(X) = \frac{b-a}{\sqrt{12}} \]
For a discrete uniform distribution on {1,2,…,N}, the mean, variance and standard deviation are:
\[ E[X] = \frac{N + 1}{2}, \quad \text{Var}(X) = \frac{(N^2 - 1)}{12}, \quad \text{SD}(X) = \sqrt{\frac{N^2 - 1}{12}} \] For a discrete variable, varying the sample size gives the following results:
The probabilities and the cumulative relative frequencies for the sample with n = 10 random values.
The probabilities and the cumulative relative frequencies for the sample with n = 100 random values.
The probabilities and the cumulative relative frequencies for the sample with n = 1000 random values.
The probabilities and the cumulative relative frequencies for the sample with n = 10000 random values.
The probabilities and the cumulative relative frequencies for the sample with n = 100000 random values.
The probabilities and the cumulative relative frequencies for the sample with n = 1000000 random values.
In the case of a continuous variable, when the sample size is small, the distribution may appear irregular, with some regions containing more values than others. As the sample size increases, the distribution becomes smoother and closely approximates the true probability density function, confirming that each value within the range \([a, b]\) has an equal probability density.
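This smoothing effect can be reproduced without drawing the histograms, by looking at the bin densities that hist() computes. The sketch below assumes U(0, 1), ten equal-width bins and an arbitrary seed; for a perfectly uniform sample every bin density would equal 1.

```r
set.seed(123)   # arbitrary seed, only for reproducibility
a <- 0
b <- 1
breaks <- seq(a, b, by = 0.1)   # ten equal-width bins

for (n in c(10L, 1000L, 100000L)) {
  x <- runif(n, min = a, max = b)
  h <- hist(x, breaks = breaks, plot = FALSE)   # bin densities without plotting
  cat(sprintf("n = %6d  bin densities range from %.3f to %.3f\n",
              n, min(h$density), max(h$density)))
}
```

With small n the bin densities vary widely, while with large n they all lie close to the theoretical density 1 / (b - a) = 1.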
For a discrete variable, when the sample size is small, some discrete values may appear more frequently than others, leading to an irregular distribution. As the sample size increases, the frequency of each discrete value becomes closer to:
\[ P(X = x) = \frac{1}{N} \]
For a large sample, the empirical relative frequency of each possible value converges to its theoretical probability, \(\frac{1}{N}\).
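A small simulation illustrates this convergence. It is a sketch under stated assumptions: the values {1, …, 10} are drawn with sample(), the seed and the three sample sizes are arbitrary, and the largest deviation from 1/N is used here as a simple summary of how irregular the empirical distribution is.

```r
set.seed(42)   # arbitrary seed, only for reproducibility
N <- 10

for (n in c(10L, 1000L, 100000L)) {
  draws    <- sample(1:N, size = n, replace = TRUE)
  # factor() keeps values that were never drawn, with frequency 0
  rel_freq <- as.numeric(table(factor(draws, levels = 1:N))) / n
  max_dev  <- max(abs(rel_freq - 1 / N))
  cat(sprintf("n = %6d  largest deviation from 1/N: %.4f\n", n, max_dev))
}

cumsum(rel_freq)   # cumulative relative frequencies for the largest sample
```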
As shown in the graphs, as the sample size increases, the empirical distribution converges to the theoretical distribution in both continuous and discrete uniform distributions.
The uniform distribution is useful in random number generation. The generated values can be used, for example, to randomly assign individuals to treatment groups in experimental studies or to select participants for surveys. The uniform distribution is also used in domains such as the gaming industry and traffic engineering.
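As an illustration of random assignment, the sketch below splits ten participants into two groups of equal size by shuffling the group labels with sample(), so that every possible allocation is equally likely. The participant IDs, group names and seed are hypothetical.

```r
set.seed(2024)   # arbitrary seed, only for reproducibility

participants <- paste0("P", 1:10)                 # hypothetical participant IDs
labels       <- rep(c("treatment", "control"), each = 5)

# Shuffling the labels gives a uniformly random balanced assignment
assignment <- data.frame(
  participant = participants,
  group       = sample(labels)
)

table(assignment$group)   # 5 participants in each group
```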