Random variables and theoretical distributions play a very important role in modeling uncertainty. They offer a syntactic approach to the language of uncertainty. Many standard texts on Mathematical Statistics and/or Statistical Inference, such as this, deal with distributions extensively.
Random variable
Range of a random variable
Nature of a variable based on its range
Discrete / Continuous
Bounded / Unbounded (Finite / Infinite)
Probability Mass / Density Functions (PMF/PDF)
Summarizing a distribution
Expectation of functions of a random variable
Probability statements about a random variable
Order statistics of a random variable
Need for Theoretical distributions
Nature and Role of parameters
The following is a broad guide to the symbols used to represent random variables and distributions. These conventions may be modified in a specific context; for more details, please refer
Variables: upper-case Latin letters \(X, Y, Z\) (preferably from the last six letters)
Values of a variable: lower-case Latin letters \(x, y, z\) (preferably from the last six letters)
Constants: lower-case Latin letters \(a, b, c\) (preferably from the first five letters)
Parameters: lower-case Latin letters such as \(a, b, c, k, l, m, n\) (except possibly the last five letters) or lower-case Greek letters \(\alpha, \beta, \gamma\)
Upper-case Latin letters with subscripts are also used to list several variables, \(X_1, X_2, \ldots, X_n\), and the values of the corresponding variables are denoted by lower-case Latin letters, \(x_1, x_2, \ldots, x_n\).
The subscript in the previous point is an index that identifies each variable. Lower-case Latin letters are used for indices, such as \(i = 1, 2, 3, \ldots, N\) or \(j = 1, 2, 3, \ldots, n\).
Collectively, in the language of Mathematical Statistics, \[X \sim f(x|\theta)\] means that \(X\) is a random variable with PMF (or PDF) \(f\) and parameter \(\theta\)
\(\theta\) may be of any dimension (a single parameter or a vector of parameters)
The range of \(X\) is denoted by \(\mathscr{A}_X\)
\(|\) should be read as “given”
\(f(x|\theta)\) should be read as “the probability mass (density) function of \(x\) given \(\theta\) is equal to ……”
When \(\theta\) is known, \(f(x|\theta)\) gives the value of the function at the point \(x\) of \(X\)
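In R, the density and mass functions of the stats package (`dnorm()`, `dbinom()`, and so on) mirror this notation. The first argument is the point \(x\), and the remaining arguments fix \(\theta\). The following is a small sketch with parameter values chosen purely for illustration. Note that `dnorm()` takes the standard deviation \(\sigma\), not \(\sigma^2\).

```r
# f(x | theta) for the normal distribution with theta = (mu, sigma) = (0, 1),
# evaluated at x = 0; note that dnorm() takes sd = sigma, not sigma^2
fx_normal <- dnorm(0, mean = 0, sd = 1)

# f(x | theta) for the binomial distribution with theta = (n, p) = (4, 0.5),
# evaluated at x = 2
fx_binom <- dbinom(2, size = 4, prob = 0.5)

# X is the variable; x_1, ..., x_n are its observed values, here a
# simulated sample of size n = 5 (seed fixed for reproducibility)
set.seed(1)
x <- rnorm(5, mean = 0, sd = 1)

print(fx_normal)  # 1 / sqrt(2 * pi), about 0.3989
print(fx_binom)   # choose(4, 2) / 2^4 = 0.375
print(x)
```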
There are infinitely many ways to define the mathematical form of a distribution, provided it satisfies two conditions: non-negativity, and total probability equal to 1. Such possibilities can nevertheless be understood through a few basic mathematical functions and five operations, \(+\), \(-\), \(\times\), \(\div\), and \(\circ\), where \(\circ\) denotes composition of functions:
Constant function \(f(x) = k\)
Power function \(f(x) = x^k\), possibly excluding \(x = 0\) (e.g. when \(k < 0\))
Exponential function \(f(x) = k^x\) with \(k > 0\); the case \(k = e\) receives the most scientific attention
Logarithmic function \(f(x) = \log_{a}x\) for \(x > 0\), where the base satisfies \(a > 0\) and \(a \neq 1\)
Common logarithm: \(a=10\)
Natural logarithm: \(a=e\)
Circular trigonometric functions
Each of the five functions above exhibits interesting characteristics for suitable values of its parameters. In statistics, three types of parameters are most prominent: location, scale, and shape parameters.
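The two conditions stated earlier, non-negativity and total probability 1, can be checked numerically in R. In the minimal sketch below, the candidate function \(x^2 e^{-x}\) on \((0, \infty)\) is our own illustrative choice and does not come from the source. It multiplies a power function by an exponential function, and dividing by its integral turns it into a valid PDF. The names `g_kernel` and `pdf_candidate` are hypothetical.

```r
# Hypothetical candidate on (0, Inf): power function x^2 times exponential e^(-x)
g_kernel <- function(x) x^2 * exp(-x)

# Condition 1: non-negativity, checked on a grid of points
grid_x <- seq(0, 50, by = 0.01)
stopifnot(all(g_kernel(grid_x) >= 0))

# Condition 2: total probability 1. Divide by the integral of the kernel.
k_const <- integrate(g_kernel, lower = 0, upper = Inf)$value
pdf_candidate <- function(x) g_kernel(x) / k_const

total <- integrate(pdf_candidate, lower = 0, upper = Inf)$value
print(k_const)  # close to 2
print(total)    # close to 1

# Compare with R's gamma density (shape 3, rate 1) at one point
print(pdf_candidate(1.5) - dgamma(1.5, shape = 3, rate = 1))
```

Here the normalizing constant is \(\Gamma(3) = 2\), so the result is the gamma density with shape 3. Any non-negative function with a finite, positive integral can be normalized in the same way.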
In this section, we provide graphs of some of the functional forms above, so that the role of each type of parameter can be understood visually.
For the first curve, \(x^2\), the minimum is at zero. In the other two curves, \((x-3)^2\) and \((x+3)^2\), the minimum is shifted three units to the right and to the left of zero, respectively.
library(ggplot2)   # the linewidth argument for lines requires ggplot2 >= 3.4.0
library(gridExtra)
library(grid)
# (x - a)^2 with location parameter a = 0, 3, -3
PFun1 <- function(x) {
  x^2
}
PFun2 <- function(x) {
  (x - 3)^2
}
PFun3 <- function(x) {
  (x + 3)^2
}
p1 <- ggplot(data.frame(x = c(-5, 5)), aes(x = x)) +
  stat_function(fun = PFun1, col = "red", linewidth = 2) +
  geom_vline(xintercept = 0, lty = 2, linewidth = 2) +
  geom_hline(yintercept = 0, linewidth = 2) +
  annotate("text", x = 1, y = 15, label = "a = 0", size = 5) +
  theme(text = element_text(size = 20),
        axis.text.x = element_text(size = 20))
p2 <- ggplot(data.frame(x = c(-5, 11)), aes(x = x)) +
  stat_function(fun = PFun2, col = "blue", linewidth = 2) +
  geom_vline(xintercept = 3, lty = 2, linewidth = 2) +
  geom_hline(yintercept = 0, linewidth = 2) +
  annotate("text", x = 1.5, y = 25, label = "a = 3", size = 5) +
  theme(text = element_text(size = 20),
        axis.text.x = element_text(size = 20))
p3 <- ggplot(data.frame(x = c(-11, 5)), aes(x = x)) +
  stat_function(fun = PFun3, col = "darkgreen", linewidth = 2) +
  geom_vline(xintercept = -3, lty = 2, linewidth = 2) +
  geom_hline(yintercept = 0, linewidth = 2) +
  annotate("text", x = -1.5, y = 25, label = "a = -3", size = 5) +
  theme(text = element_text(size = 20),
        axis.text.x = element_text(size = 20))
# Arrange the three panels side by side under a plotmath title
grid.arrange(p1, p2, p3, ncol = 3,
             top = textGrob(expression("Location parameter 'a' in" ~ (x - a)^2),
                            gp = gpar(col = "black", fontsize = 20)))
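The shift produced by the location parameter can also be confirmed numerically. The small sketch below uses base R's `optimize()` as the minimizer (`loc_fun` is a hypothetical helper). It locates the minimum of \((x-a)^2\) for the three values of \(a\) used in the plots:

```r
# Location parameter a in (x - a)^2: the minimum of the curve sits at x = a
loc_fun <- function(x, a) (x - a)^2

a_values <- c(0, 3, -3)
minima <- sapply(a_values, function(a) {
  # optimize() searches the interval for the minimizer; the extra
  # argument a is passed on to loc_fun
  optimize(loc_fun, interval = c(-10, 10), a = a)$minimum
})
print(round(minima, 3))
```

Only the position of the curve changes with \(a\). Its form stays the same, which is what makes \(a\) a location parameter.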
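Curves of \(e^{-bx}\) for different values of the parameter \(b\) (\(b = 0.1, 1, 2, 3\)), plotted over \(0 \le x \le 2\). The larger \(b\) is, the faster the curve decays. Strictly, \(b\) here is a rate parameter; the corresponding scale parameter is \(1/b\).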
# e^(-b x) for b = 1, 2, 3 and 0.1
EFun1 <- function(x) {
  exp(-x)
}
EFun2 <- function(x) {
  exp(-2 * x)
}
EFun3 <- function(x) {
  exp(-3 * x)
}
EFun4 <- function(x) {
  exp(-0.1 * x)
}
ggplot(data.frame(x = c(0, 2)), aes(x = x)) +
  stat_function(fun = EFun1, col = "darkgreen", linewidth = 2) +
  stat_function(fun = EFun2, col = "darkblue", linewidth = 2) +
  stat_function(fun = EFun3, col = "red", linewidth = 2) +
  stat_function(fun = EFun4, col = "darkviolet", linewidth = 2) +
  geom_vline(xintercept = 0, linewidth = 2) +
  geom_hline(yintercept = 0, linewidth = 2) +
  annotate("text", x = 1.6, y = 0.25, label = "b = 1", colour = "darkgreen", size = 5) +
  annotate("text", x = 1, y = 0.25, label = "b = 2", colour = "darkblue", size = 5) +
  annotate("text", x = 0.25, y = 0.25, label = "b = 3", colour = "red", size = 5) +
  annotate("text", x = 1.75, y = 0.75, label = "b = 0.1", colour = "darkviolet", size = 5) +
  labs(title = bquote("Scale parameter 'b' in" ~ e^-{b * x})) +
  theme(plot.title = element_text(size = 20))   # title size is set via theme(), not labs()
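One way to see how \(b\) stretches or compresses the curve along the \(x\)-axis is to find where \(e^{-bx}\) falls to one half. The following is a sketch using base R's `uniroot()`. The half-point is our own illustrative summary and is not from the source.

```r
# Parameter b in e^(-b x): find the point where the curve drops to 1/2
b_values <- c(0.1, 1, 2, 3)
half_point <- sapply(b_values, function(b) {
  uniroot(function(x) exp(-b * x) - 0.5, interval = c(0, 100), tol = 1e-10)$root
})

# Analytically exp(-b x) = 1/2 gives x = log(2) / b, proportional to 1/b:
# the whole curve is stretched or compressed along the x-axis
print(cbind(b = b_values, numeric = half_point, exact = log(2) / b_values))
```

Because every feature of the curve scales by the same factor \(1/b\), only the spread changes, not the form.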
Some examples combining location and scale parameters in the form \(\frac{1}{b}f\left(\frac{x-a}{b}\right)\), here with \(f(z) = e^{-z^2}\). In the following, the first curve has \(a = 0\) and \(b = 1\) for comparison with the cases \(b > 1\) and \(b < 1\).
# (1/b) f((x - a)/b) with f(z) = exp(-z^2)
EPFun1 <- function(x) {
  exp(-x^2)                            # a = 0, b = 1
}
EPFun2 <- function(x) {
  (1 / 3) * exp(-((x - 2) / 3)^2)      # a = 2, b = 3
}
EPFun3 <- function(x) {
  (1 / 4) * exp(-((x + 2) / 4)^2)      # a = -2, b = 4
}
EPFun4 <- function(x) {
  (1 / 0.1) * exp(-((x + 2) / 0.1)^2)  # a = -2, b = 0.1
}
p1 <- ggplot(data.frame(x = c(-3, 3)), aes(x = x)) +
  stat_function(fun = EPFun1, col = "darkred", linewidth = 2) +
  geom_vline(xintercept = 0, linewidth = 2) +
  geom_hline(yintercept = 0, linewidth = 2) +
  annotate("text", x = 2, y = 0.75, label = "a = 0, b = 1", colour = "darkred", size = 5)
p2 <- ggplot(data.frame(x = c(-3, 7)), aes(x = x)) +
  stat_function(fun = EPFun2, col = "blue", linewidth = 2) +
  geom_vline(xintercept = 0) +
  geom_vline(xintercept = 2, col = "blue", linewidth = 2) +
  geom_hline(yintercept = 0) +
  annotate("text", x = -2, y = 0.2, label = "a = 2, b = 3", colour = "blue", size = 5)
p3 <- ggplot(data.frame(x = c(-9, 5)), aes(x = x)) +
  stat_function(fun = EPFun3, col = "darkgreen", linewidth = 2) +
  geom_vline(xintercept = 0) +
  geom_vline(xintercept = -2, col = "darkgreen", linewidth = 2) +
  geom_hline(yintercept = 0) +
  annotate("text", x = 4, y = 0.2, label = "a = -2, b = 4", colour = "darkgreen", size = 5)
# b = 0.1 gives a very narrow peak, so evaluate the curve on a finer grid
p4 <- ggplot(data.frame(x = c(-9, 5)), aes(x = x)) +
  stat_function(fun = EPFun4, col = "darkviolet", linewidth = 2, n = 2001) +
  geom_vline(xintercept = 0) +
  geom_vline(xintercept = -2, col = "darkviolet", linewidth = 2) +
  geom_hline(yintercept = 0) +
  annotate("text", x = -5.5, y = 4, label = "a = -2, b = 0.1", colour = "darkviolet", size = 5)
grid.arrange(p1, p2, p3, p4, nrow = 2, ncol = 2,
             top = textGrob(expression("Location (a) and scale (b) parameters in" ~
                                         frac(1, b) * exp(-((x - a) / b)^2)),
                            gp = gpar(col = "black", fontsize = 20)))
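The factor \(\frac{1}{b}\) in \(\frac{1}{b}f\left(\frac{x-a}{b}\right)\) is what keeps the total area fixed. Substituting \(z = (x-a)/b\) gives \(dx = b\,dz\), so the area equals that under \(f\) itself for every \(a\) and \(b > 0\). The following is a minimal numerical check in R with \(f(z) = e^{-z^2}\), whose area is \(\sqrt{\pi}\). The helpers `f_base` and `loc_scale` are hypothetical names.

```r
# Location-scale form (1/b) f((x - a)/b) with f(z) = exp(-z^2)
f_base <- function(z) exp(-z^2)
loc_scale <- function(x, a, b) (1 / b) * f_base((x - a) / b)

# (a, b) pairs as labelled in the four panels above
params <- data.frame(a = c(0, 2, -2, -2), b = c(1, 3, 4, 0.1))

# Integrate over a +/- 10 b; outside that range the integrand is below exp(-100)
areas <- mapply(function(a, b) {
  integrate(loc_scale, lower = a - 10 * b, upper = a + 10 * b, a = a, b = b)$value
}, params$a, params$b)

# Every area equals the area under f itself, sqrt(pi)
print(areas / sqrt(pi))

# The same construction with f = dnorm reproduces the N(a, b^2) density
same_as_dnorm <- isTRUE(all.equal((1 / 3) * dnorm((1.7 - 2) / 3),
                                  dnorm(1.7, mean = 2, sd = 3)))
print(same_as_dnorm)  # TRUE
```

The finite integration range is a deliberate choice. With an infinite range, `integrate()` can miss very narrow peaks such as the \(b = 0.1\) case.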
# (1 - x)^c on 0 <= x <= 1 for shape parameter c = 2, 4, 2.5 and -0.5
SFun1 <- function(x) {
  (1 - x)^2
}
SFun2 <- function(x) {
  (1 - x)^4
}
SFun3 <- function(x) {
  (1 - x)^2.5
}
SFun4 <- function(x) {
  (1 - x)^-0.5   # unbounded as x approaches 1
}
p1 <- ggplot(data.frame(x = c(0, 1)), aes(x = x)) +
  stat_function(fun = SFun1, col = "darkred", linewidth = 2) +
  stat_function(fun = SFun2, col = "blue", linewidth = 2) +
  stat_function(fun = SFun3, col = "darkgreen", linewidth = 2) +
  geom_vline(xintercept = 0) +
  geom_hline(yintercept = 0) +
  annotate("text", x = 0.4, y = 0.8, label = "c = 2", colour = "darkred", size = 5) +
  annotate("text", x = 0.5, y = 0.7, label = "c = 4", colour = "blue", size = 5) +
  annotate("text", x = 0.75, y = 0.75, label = "c = 2.5", colour = "darkgreen", size = 5)
p2 <- ggplot(data.frame(x = c(0, 1)), aes(x = x)) +
  stat_function(fun = SFun4, col = "darkviolet", linewidth = 2) +
  geom_vline(xintercept = 0) +
  geom_hline(yintercept = 0) +
  annotate("text", x = 0.5, y = 2, label = "c = -0.5", colour = "darkviolet", size = 5)
grid.arrange(p1, p2, ncol = 2, nrow = 1,
             top = textGrob(expression("Shape parameter (c) in" ~ (1 - x)^c),
                            gp = gpar(col = "black", fontsize = 20)))
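Unlike location and scale, the shape parameter \(c\) changes the form of the curve itself. The following sketch connects \((1-x)^c\) to a standard distribution. For \(c > -1\), its area on \((0, 1)\) is \(1/(c+1)\), and the normalized curve \((c+1)(1-x)^c\) is exactly the Beta\((1, c+1)\) density. This link to the beta family is our own illustration rather than a claim from the source. The loop variable `cc` is used to avoid masking R's `c()`.

```r
# Shape parameter c in (1 - x)^c on (0, 1), for the values used in the plots
c_values <- c(2, 4, 2.5, -0.5)

# For c > -1 the area under (1 - x)^c is 1 / (c + 1); for c = -0.5 the
# integrand has an integrable singularity at x = 1
areas <- sapply(c_values, function(cc) {
  integrate(function(x) (1 - x)^cc, lower = 0, upper = 1)$value
})
print(cbind(c = c_values, area = areas, exact = 1 / (c_values + 1)))

# Dividing by that area gives (c + 1)(1 - x)^c, the Beta(1, c + 1) density
x0 <- c(0.1, 0.5, 0.9)
for (cc in c_values) {
  stopifnot(isTRUE(all.equal((cc + 1) * (1 - x0)^cc, dbeta(x0, 1, cc + 1))))
}
```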
A few more examples combining shape and location parameters:
Fun1 <- function(x) {
  x^3 * exp(-x^2)          # power function times exponential
}
Fun2 <- function(x) {
  x^0.5 * exp(-x^2)
}
Fun3 <- function(x) {
  x^0.5 * exp(-(x - 3)^2)  # exponential part shifted to location 3
}
Fun4 <- function(x) {
  x^-0.5 * (1 - x)^-0.5    # unbounded at both x = 0 and x = 1
}
p1 <- ggplot(data.frame(x = c(0, 3)), aes(x = x)) +
  stat_function(fun = Fun1, col = "red", linewidth = 2) +
  geom_vline(xintercept = 0) +
  geom_hline(yintercept = 0)
p2 <- ggplot(data.frame(x = c(0, 3)), aes(x = x)) +
  stat_function(fun = Fun2, col = "blue", linewidth = 2) +
  geom_vline(xintercept = 0) +
  geom_hline(yintercept = 0)
p3 <- ggplot(data.frame(x = c(0, 3)), aes(x = x)) +
  stat_function(fun = Fun3, col = "darkgreen", linewidth = 2) +
  geom_vline(xintercept = 0) +
  geom_hline(yintercept = 0)
p4 <- ggplot(data.frame(x = c(0, 1)), aes(x = x)) +
  stat_function(fun = Fun4, col = "darkviolet", linewidth = 2) +
  geom_vline(xintercept = 0) +
  geom_hline(yintercept = 0)
grid.arrange(p1, p2, p3, p4, nrow = 2, ncol = 2,
             top = textGrob("Shape and location parameters",
                            gp = gpar(col = "black", fontsize = 20)))
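The curves above are not yet densities, because their areas are not 1. The following sketch obtains the normalizing constants of two of them in R. For \(x^3 e^{-x^2}\), it uses numerical integration; substituting \(u = x^2\) gives \(\Gamma(2)/2 = 1/2\). For \(x^{-1/2}(1-x)^{-1/2}\), it uses the beta function, since this is the kernel of a Beta(1/2, 1/2) distribution. The helper names `g1` and `g4` are ours.

```r
# x^3 exp(-x^2) on (0, Inf): the area is Gamma(2) / 2 = 1/2
g1 <- function(x) x^3 * exp(-x^2)
const1 <- integrate(g1, lower = 0, upper = Inf)$value
print(const1)  # close to 0.5

# x^(-1/2) (1 - x)^(-1/2) on (0, 1) is the Beta(1/2, 1/2) kernel; its
# normalizing constant is the beta function B(1/2, 1/2) = pi
g4 <- function(x) x^-0.5 * (1 - x)^-0.5
const4 <- beta(0.5, 0.5)
x0 <- c(0.2, 0.5, 0.8)
print(all.equal(g4(x0) / const4, dbeta(x0, 0.5, 0.5)))  # TRUE
```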
A theoretical distribution is given a functional form, as described earlier, on the basis of practical reasoning about the phenomenon it models. The following three distributions illustrate the basic information one needs to know about a distribution. The genesis of many standard theoretical distributions (and their other features) may be found here; other similar texts by these authors are also worth reading.
Normal distribution
PDF: \(\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\)
Parameters: Location: \(\mu\), Scale: \(\sigma\) (the variance is \(\sigma^2\))
Range of X, \(\mathscr{A}_X\): \(-\infty \lt x \lt \infty\)
Parameter Space: \(-\infty\lt\mu\lt\infty\), \(\sigma^2\gt0\)
Beta distribution
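PDF: \(\frac{1}{B(\alpha,\beta)}x^{\alpha-1}(1-x)^{\beta-1}\), where \(B(\alpha,\beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}\) is the beta function
Parameters: Shape: \(\alpha, \beta\)
Range of X, \(\mathscr{A}_X\): \(0 \lt x \lt 1\)
Parameter Space: \(\alpha \gt 0\), \(\beta \gt 0\)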
Binomial distribution
PMF: \({n \choose x} \theta^x (1-\theta)^{n-x}\)
Parameters: Shape: \(n\) (number of trials), \(\theta\) (probability of success)
Range of X, \(\mathscr{A}_X\): \(x = 0, 1, 2, \ldots, n\)
Parameter Space: \(0 \lt \theta \lt 1\), \(n\) a positive integer
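The three specifications can be cross-checked against R's built-in `dnorm()`, `dbeta()`, and `dbinom()`. The sketch below uses arbitrary illustrative parameter values of our choosing. It compares the explicit formulas with the built-ins and then checks that total probability is 1. Note that `dnorm()` is parameterized by the standard deviation \(\sigma\), not \(\sigma^2\).

```r
# Normal: explicit PDF with mu = 1, sigma^2 = 4 versus dnorm (which takes sd)
mu <- 1
sigma2 <- 4
x <- c(-2, 0, 1, 3.5)
normal_pdf <- 1 / sqrt(2 * pi * sigma2) * exp(-(x - mu)^2 / (2 * sigma2))
stopifnot(isTRUE(all.equal(normal_pdf, dnorm(x, mean = mu, sd = sqrt(sigma2)))))

# Beta: explicit PDF with alpha = 2, beta = 5 versus dbeta;
# beta() is base R's beta function B(alpha, beta)
alpha <- 2
beta_par <- 5
u <- c(0.1, 0.3, 0.7)
beta_pdf <- u^(alpha - 1) * (1 - u)^(beta_par - 1) / beta(alpha, beta_par)
stopifnot(isTRUE(all.equal(beta_pdf, dbeta(u, alpha, beta_par))))

# Binomial: explicit PMF with n = 10, theta = 0.3 versus dbinom
n <- 10
theta <- 0.3
k <- 0:n
binom_pmf <- choose(n, k) * theta^k * (1 - theta)^(n - k)
stopifnot(isTRUE(all.equal(binom_pmf, dbinom(k, size = n, prob = theta))))

# Total probability: a sum for the discrete case, integrals for the others
total_binom <- sum(binom_pmf)
total_normal <- integrate(dnorm, -Inf, Inf, mean = mu, sd = sqrt(sigma2))$value
total_beta <- integrate(dbeta, 0, 1, shape1 = alpha, shape2 = beta_par)$value
print(c(binomial = total_binom, normal = total_normal, beta = total_beta))
```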
Five basic forms of functions, together with the associated operations, help in understanding the various possible forms of a function.
These forms are controlled by the parameters of a function. In statistical theory, parameters play a vital role in modeling many real-world scenarios.
Hence, for a PMF/PDF \(f(x|\theta)\), it is equally important to know the range of the variable \(X\) and the parameter space of \(\theta\).
Geometrically, location, scale, and shape parameters produce, respectively, additive shifts, multiplicative stretching or compression, and changes in the shape of a function.
Mathematical forms can represent real-world scenarios. A theoretical distribution may vary in its nature, range, and parameter space depending on the purpose it serves and the reasoning behind it.
Web resources such as Wikipedia are also notable references for theoretical distributions. A common feature of all such resources is the inclusion of the keywords mentioned above.
As an initial introduction, these notes attempt to explain the mathematical form of random variables and their associated parameters.