Introduction

Random variables and theoretical distributions play a very important role in modeling uncertainty. They offer a syntactic approach to the language of uncertainty. Many standard texts on Mathematical Statistics and/or Statistical Inference deal with distributions extensively.

A few important keywords

  • Random variable

  • Range of a random variable

  • Nature of a variable based on its range

    • Discrete / Continuous

    • Bounded / Unbounded (Finite / Infinite)

  • Probability Mass / Density Functions (PMF/PDF)

  • Summarizing a distribution

    • Expectation of functions of a random variable

    • Probability statements about a random variable

    • Order statistics of a random variable

  • Need for Theoretical distributions

  • Nature and Role of parameters
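
As a rough illustration of two of the keywords above (expectation of functions of a random variable, and probability statements), here is a minimal sketch in base R. The choice of an exponential variable with rate 2 and the helper name `expect` are assumptions made only for this example.

```r
# Hypothetical example: X has PDF f(x) = rate * exp(-rate * x) on x > 0
rate <- 2
pdf_x <- function(x) rate * exp(-rate * x)

# Expectation of a function g(X): integrate g(x) * f(x) over the range of X
expect <- function(g) {
  integrate(function(x) g(x) * pdf_x(x), lower = 0, upper = Inf)$value
}

mean_x  <- expect(function(x) x)     # E[X]
mean_x2 <- expect(function(x) x^2)   # E[X^2]
var_x   <- mean_x2 - mean_x^2        # Var(X) = E[X^2] - (E[X])^2

# Probability statement: P(0.5 < X < 1) is the area under the PDF
prob <- integrate(pdf_x, lower = 0.5, upper = 1)$value

print(c(mean = mean_x, variance = var_x, prob = prob))
```

For this particular choice the numerical results can be compared with the closed forms \(1/\text{rate}\) for the mean and \(1/\text{rate}^2\) for the variance.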

Conventional notations

The following is a broad guide to the symbols used to represent random variables and distributions. These conventions may be modified in a specific context.

  • Variables: Upper case Latin letters \(X, Y, Z\) (preferably the last 6 letters)

  • Unknown value of a variable: Lower case Latin letters \(x, y, z\) (preferably the last 6 letters)

  • Constants: Lower case Latin letters \(a, b, c\) (preferably the first 5 letters)

  • Parameters: Lower case Latin letters \(a, b, c, k, l, m, n\) (except possibly the last 5 letters) or lower case Greek letters \(\alpha, \beta, \gamma\)

  • Upper case Latin letters with subscripts are used to list several variables, \(X_1, X_2, \ldots, X_n\); the values of the corresponding variables are denoted by lower case Latin letters, \(x_1, x_2, \ldots, x_n\).

  • The subscript in the previous point is an index that identifies each variable. Lower case Latin letters are used for an index, such as \(i=1,2,3,\ldots,N\) or \(j=1,2,3,\ldots,n\)

  • Collectively, in the language of Mathematical Statistics, \[X \sim f(x|\theta)\] means that \(X\) is a random variable with PMF/PDF \(f\) and parameter \(\theta\)

    • \(\theta\) can be any dimension

    • Range of X is denoted as \(\mathscr{A}_X\)

    • \(|\) should be read as given

    • \(f(x|\theta)\) should be read as “the probability mass (density) function of \(X\) at \(x\), given \(\theta\), is equal to …”

    • Whenever \(\theta\) is known, \(f(x|\theta)\) gives the value of the function at a value \(x\) of \(X\)
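
To make the notation \(f(x|\theta)\) concrete, the sketch below evaluates a density and a mass function at a point once \(\theta\) is fixed. The numerical values of the parameters are assumptions chosen only for illustration; `dnorm` and `dbinom` are the base R density functions for the normal and binomial distributions.

```r
# Continuous case: theta = (mu, sigma), both assumed known
mu <- 10
sigma <- 2
f_norm <- dnorm(11, mean = mu, sd = sigma)    # f(x = 11 | mu = 10, sigma = 2)

# Discrete case: theta = (n, p), both assumed known
f_binom <- dbinom(3, size = 5, prob = 0.4)    # P(X = 3 | n = 5, p = 0.4) = 0.2304

# theta may have any dimension; here it is passed as a named vector
theta <- c(mu = 10, sigma = 2)
f_vec <- dnorm(c(8, 10, 12), mean = theta["mu"], sd = theta["sigma"])

print(f_norm)
print(f_binom)
print(f_vec)
```

Changing \(\theta\) while keeping \(x\) fixed gives a different value of the same function, which is why the parameter is written after the "given" bar.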

Functional form of Distributions

There are infinitely many ways to define a mathematical form for a distribution that satisfies two conditions: non-negativity and a total probability of 1. These possibilities can be understood in terms of a few basic mathematical functions and five operations, \(+,~-,~\times,~\div,\) and \(\circ\), where \(\circ\) denotes composition of functions.

  1. Constant Function \(f(x)=k\)

  2. Power Function \(f(x) = x^k\), except possibly at \(x=0\)

  3. Exponential Function \(f(x) = k^x\) with \(k>0\); in particular \(k=e\) draws much scientific attention

  4. Logarithmic functions \(f(x) = \log_{a}x\) where \(a>0\), \(a \neq 1\) is the base

    1. Common logarithm: \(a=10\)

    2. Natural logarithm: \(a=e\)

  5. Circular trigonometric functions

Each of the above five functions exhibits interesting characteristics for suitable values of its parameters. In the statistical paradigm, three types of parameters are most prominent: location, scale, and shape parameters.
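
The two conditions (non-negativity and total probability 1) can be checked numerically for any candidate form. The sketch below combines a power function and an exponential function by multiplication, then divides by the total area to obtain a valid PDF. The kernel \(x^2 e^{-x}\) on \(x > 0\) and the names `kernel`, `const` and `pdf_x` are assumptions made only for this illustration.

```r
# Hypothetical kernel: power function times exponential function, on x > 0
kernel <- function(x) x^2 * exp(-x)

# Total area under the kernel; dividing by it forces total probability 1
const <- integrate(kernel, lower = 0, upper = Inf)$value
pdf_x <- function(x) kernel(x) / const

# Condition 1: non-negativity on a grid of points in the range
grid_x <- seq(0, 20, by = 0.1)
nonneg <- all(pdf_x(grid_x) >= 0)

# Condition 2: total probability is 1
total <- integrate(pdf_x, lower = 0, upper = Inf)$value

print(c(constant = const, total = total))
print(nonneg)
```

With this kernel the normalized function coincides with the gamma density with shape 3 and rate 1, which can be confirmed by comparing `pdf_x(x)` with `dgamma(x, shape = 3, rate = 1)`.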

Visual Representations

In this section, we give graphical forms of the functional representations above, so that the role of the parameters can be understood visually.

Location Parameter

For the first curve, \(x^2\), the origin is at zero; in the other two curves, \((x-3)^2\) and \((x+3)^2\), the origin is shifted by three units to the right and to the left of zero, respectively.

library(ggplot2)
library(gridExtra)
library(grid)
PFun1 <- function(x) {
       x^2 
}
PFun2 <- function(x) {
       (x-3)^2
}

PFun3 <- function(x) {
       (x+3)^2
}

p1=ggplot(data.frame(x = c(-5, 5)), aes(x = x)) +
       stat_function(fun = PFun1,col="red",size=2)+
       geom_vline(xintercept=0, lty=2,size=2)+
       geom_hline(yintercept=0, size=2)+
       geom_text(x = 1, y = 15, label = 'a = 0',size=5)+
   theme(text = element_text(size=20),
         axis.text.x = element_text(size = 20))

p2=ggplot(data.frame(x = c(-5, 11)), aes(x = x))+       
stat_function(fun = PFun2,col="blue",size=2)+
       geom_vline(xintercept=3, lty=2,size=2)+
       geom_hline(yintercept=0, size=2)+
       geom_text(x = 1.5, y = 25, label = 'a = 3',size=5)+
   theme(text = element_text(size=20),
         axis.text.x = element_text(size = 20))


p3=ggplot(data.frame(x = c(-11, 5)), aes(x = x))+       
       stat_function(fun = PFun3,col="darkgreen",size=2)+
       geom_vline(xintercept=-3, lty=2,size=2)+
       geom_hline(yintercept=0, size=2)+
       geom_text(x = -1.5, y = 25, label = 'a = -3',size=5)+
   theme(text = element_text(size=20),
         axis.text.x = element_text(size = 20))


grid.arrange(p1, p2,p3, ncol = 3,
        top = textGrob("Location parameter 'a' in" ~(x-a)^2,
            gp = gpar(col = "black", fontsize = 20)))

Scale Parameter

The curves below show \(e^{-bx}\) for \(x\) from 0 to 2, with scale parameter \(b = 0.1, 1, 2, 3\).

EFun1 <- function(x) {
       exp(-x) 
}
EFun2 <- function(x) {
       exp(-2*x)
}

EFun3 <- function(x) {
       exp(-3*x)
}

EFun4 <- function(x) {
       exp(-0.1*x)
}

ggplot(data.frame(x = c(0, 2)), aes(x = x))+       
       stat_function(fun = EFun1,col="darkgreen",size=2)+
       stat_function(fun = EFun2,col="darkblue",size=2)+
       stat_function(fun = EFun3,col="red",size=2)+
       stat_function(fun = EFun4,col="darkviolet",size=2)+
       geom_vline(xintercept=0, size=2)+
       geom_hline(yintercept=0, size=2)+
       geom_text(x = 1.6, y = 0.25, label = 'b = 1',col="darkgreen",size=5)+
       geom_text(x = 1, y = 0.25, label = 'b = 2',col="darkblue",size=5)+
       geom_text(x = 0.25, y = 0.25, label = 'b = 3',col="red",size=5)+
       geom_text(x = 1.75, y = 0.75, label = 'b = 0.1',col="darkviolet",size=5)+
       labs(title = bquote("Scale Parameter 'b' in"~e^-{b*x}))+
       theme(plot.title = element_text(size = 20))  # title size belongs in theme(), not labs()

Scale and Location Parameters

Some examples combining location and scale parameters in the form \(\frac{1}{b}f\left(\frac{x-a}{b}\right)\), here with \(f(x)=e^{-x^2}\). The first curve has \(a = 0\) and \(b = 1\) and serves as the reference for comparing the cases \(b > 1\) and \(b < 1\).

# Reference curve: a = 0, b = 1
EPFun1 <- function(x) {
       exp(-x^2) 
}
# a = 2, b = 3
EPFun2 <- function(x) {
      (1/3)*exp(-((x-2)/3)^2)
}

# a = -2, b = 4
EPFun3 <- function(x) {
       (1/4)*exp(-((x+2)/4)^2)
}

# a = -2, b = 0.1
EPFun4 <- function(x) {
      (1/0.1)*exp(-((x+2)/0.1)^2)
}

p1=ggplot(data.frame(x = c(-3, 3)), aes(x = x)) +
       stat_function(fun = EPFun1,col="darkred",size=2)+
       geom_vline(xintercept=0, size=2)+
       geom_hline(yintercept=0, size=2)+
       geom_text(x = 2, y = 0.75, label = 'a = 0, b=1',col="darkred",size=5)

p2=ggplot(data.frame(x = c(-3, 7)), aes(x = x))+       
       stat_function(fun = EPFun2,col="blue",size=2)+
       geom_vline(xintercept=0)+
       geom_vline(xintercept=2, col="blue",size=2)+
       geom_hline(yintercept=0)+
      geom_text(x = -2, y = 0.2, label = 'a = 2, b=3',col="blue",size=5)

p3=ggplot(data.frame(x = c(-9, 5)), aes(x = x))+       
       stat_function(fun = EPFun3,col="darkgreen",size=2)+
       geom_vline(xintercept=0)+
       geom_vline(xintercept=-2, col="darkgreen",size=2)+
       geom_hline(yintercept=0)+
       geom_text(x = 4, y = 0.2, label = 'a = -2,b=4',col="darkgreen",size=5)

p4=ggplot(data.frame(x = c(-9, 5)), aes(x = x))+       
       stat_function(fun = EPFun4,col="darkviolet",size=2)+
       geom_vline(xintercept=0)+
       geom_vline(xintercept=-2, col="darkviolet",size=2)+
       geom_hline(yintercept=0)+
       geom_text(x = -5.5, y = 4, label = 'a = -2,b=0.1',col="darkviolet",size=5)

grid.arrange(p1, p2,p3,p4,nrow=2, ncol = 2,
top = textGrob("Location (a) and scale (b) parameters in" ~frac(1,b)*f(frac(x-a,b)),
            gp = gpar(col = "black", fontsize = 20)))
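
The factor \(\frac{1}{b}\) in the location-scale form \(\frac{1}{b}f\left(\frac{x-a}{b}\right)\) is what keeps the total area unchanged. A short derivation, assuming \(b > 0\) and that \(f\) is integrable over the real line, uses the substitution \(u = \frac{x-a}{b}\), so that \(dx = b\,du\):

\[
\int_{-\infty}^{\infty} \frac{1}{b}\, f\!\left(\frac{x-a}{b}\right) dx
= \int_{-\infty}^{\infty} \frac{1}{b}\, f(u)\, b\, du
= \int_{-\infty}^{\infty} f(u)\, du
\]

So if \(f\) integrates to 1, every shifted and rescaled version integrates to 1 as well; if \(f\) integrates to some other constant (as \(e^{-x^2}\) does), that constant is preserved for every \(a\) and \(b\).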

Shape Parameter

SFun1 <- function(x) {
       (1-x)^2 
}
SFun2 <- function(x) {
       (1-x)^4
}

SFun3 <- function(x) {
       (1-x)^2.5
}

SFun4 <- function(x) {
       (1-x)^-0.5
}

p1=ggplot(data.frame(x = c(0, 1)), aes(x = x)) +
       stat_function(fun = SFun1,col="darkred",size=2)+
       stat_function(fun = SFun2,col="blue",size=2)+
       stat_function(fun = SFun3,col="darkgreen",size=2)+
       geom_vline(xintercept=0)+
       geom_hline(yintercept=0)+
       geom_text(x = 0.4, y = 0.8, label = 'c = 2', col="darkred", size=5)+
       geom_text(x = 0.5, y = 0.7, label = 'c = 4', col="blue", size=5)+
       geom_text(x = 0.75, y = 0.75, label = 'c = 2.5', col="darkgreen",size=5)

p2=ggplot(data.frame(x = c(0, 1)), aes(x = x))+       
       stat_function(fun = SFun4,col="darkviolet",size=2)+
       geom_vline(xintercept=0)+
       geom_hline(yintercept=0)+
       geom_text(x = 0.5, y = 2, label = 'c = -0.5',col="darkviolet",size=5)

grid.arrange(p1, p2, ncol = 2,nrow=1,
             top = textGrob("Shape (c) parameters in" ~(1-x)^c, gp = gpar(col = "black", fontsize = 20)))

A few more examples combining shape and location parameters

Fun1 <- function(x) {
       x^3*exp(-x^2) 
}
Fun2 <- function(x) {
       x^0.5*exp(-x^2)
}

Fun3 <- function(x) {
       x^0.5*exp(-(x-3)^2)
}

Fun4 <- function(x) {
       x^-0.5*(1-x)^-0.5
}

p1=ggplot(data.frame(x = c(0, 3)), aes(x = x)) +
       stat_function(fun = Fun1,col="red",size=2)+
       geom_vline(xintercept=0)+
       geom_hline(yintercept=0)

p2=ggplot(data.frame(x = c(0, 3)), aes(x = x))+       
       stat_function(fun = Fun2,col="blue",size=2)+
       geom_vline(xintercept=0)+
       geom_hline(yintercept=0)

p3=ggplot(data.frame(x = c(0, 3)), aes(x = x))+       
       stat_function(fun = Fun3,col="darkgreen",size=2)+
       geom_vline(xintercept=0)+
       geom_hline(yintercept=0)

p4=ggplot(data.frame(x = c(0, 1)), aes(x = x))+       
       stat_function(fun =Fun4,col="darkviolet",size=2)+
       geom_vline(xintercept=0)+
       geom_hline(yintercept=0)

grid.arrange(p1, p2,p3,p4,nrow=2, ncol = 2,
 top = grid::textGrob("Shape and location parameters",
            gp = gpar(col = "black", fontsize = 20)))

Reading a Theoretical distribution

Each theoretical distribution arises from some practical reasoning that leads to a functional form of the kind described earlier. The following three examples illustrate the basic facts one should know about any distribution. For the genesis of many standard theoretical distributions (and their other features), refer here; other similar texts by these authors are also worth reading.

Normal distribution

  • PDF: \(\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\)

  • Parameters: Location: \(\mu\), Scale: \(\sigma\) (the PDF above is written in terms of the variance \(\sigma^2\))

  • Range of X, \(\mathscr{A}\): \(-\infty \lt x \lt\infty\)

  • Parameter Space: \(-\infty\lt\mu\lt\infty\), \(\sigma^2\gt0\)

Beta distribution

  • PDF: \(\frac{1}{B(\alpha,\beta)}x^{\alpha-1}(1-x)^{\beta-1}\), where \(B(\alpha,\beta)\) is the Beta function

  • Parameters: Shape: \(\alpha,\beta\)

  • Range of X, \(\mathscr{A}\): \(0\lt x\lt1\)

  • Parameter Space: \(\alpha\gt0\),\(\beta\gt0\)

Binomial distribution

  • PMF: \({n \choose x} \theta^x (1-\theta)^{n-x}\)

  • Parameters: Shape: \(n,\theta\)

  • Range of X, \(\mathscr{A}\): \(x = 0,1,2,...,n\)

  • Parameter Space: \(0\lt \theta \lt1\), \(n\) a positive integer
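
The three functional forms above can be typed directly into R and compared with the built-in density functions. This is only a sketch: the function names `normal_pdf`, `beta_pdf` and `binom_pmf` and the parameter values are assumptions made for the example. Note that `dnorm` takes the standard deviation \(\sigma\), not the variance \(\sigma^2\).

```r
# Hand-written versions of the three formulas (hypothetical helper names)
normal_pdf <- function(x, mu, sigma2) {
  exp(-(x - mu)^2 / (2 * sigma2)) / sqrt(2 * pi * sigma2)
}
beta_pdf <- function(x, a, b) {
  x^(a - 1) * (1 - x)^(b - 1) / beta(a, b)   # beta() is the Beta function B(a, b)
}
binom_pmf <- function(x, n, theta) {
  choose(n, x) * theta^x * (1 - theta)^(n - x)
}

# Compare with base R on a few points inside each range
x_real <- c(-1, 0, 2.5)
x_unit <- c(0.1, 0.5, 0.9)
x_count <- 0:10

ok_normal <- isTRUE(all.equal(normal_pdf(x_real, mu = 1, sigma2 = 4),
                              dnorm(x_real, mean = 1, sd = 2)))
ok_beta   <- isTRUE(all.equal(beta_pdf(x_unit, a = 2, b = 3),
                              dbeta(x_unit, shape1 = 2, shape2 = 3)))
ok_binom  <- isTRUE(all.equal(binom_pmf(x_count, n = 10, theta = 0.3),
                              dbinom(x_count, size = 10, prob = 0.3)))

# Total probability over the whole range of the binomial variable
total_binom <- sum(binom_pmf(x_count, n = 10, theta = 0.3))

print(c(normal = ok_normal, beta = ok_beta, binomial = ok_binom))
print(total_binom)
```

Evaluating any of these functions outside the stated range or parameter space (for example `beta_pdf` with \(x > 1\)) no longer gives a meaningful density, which is why the range and parameter space are listed alongside each formula.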

Final Remarks

  • The five basic forms of a function and the associated operations help in understanding the various possible forms of a function.

  • These aspects are controlled by the parameters of a function. In statistical theory, parameters play vital roles in modeling many real-world scenarios.

  • Hence, for a PDF \(f(x|\theta)\), it is equally important to know the range of the variable \(X\) and the space of the parameter \(\theta\).

  • Geometrically, additive changes, multiplicative changes, and changes in the shape of a function are effected by location, scale, and shape parameters, respectively.

  • There are mathematical forms to represent real-world scenarios. A theoretical distribution may vary in its nature, range, and parameter space depending on the purpose and the reasoning behind it.

  • Web resources like Wiki are also notable references for theoretical distributions. One common aspect of any such resource is the inclusion of the aforementioned keywords.

  • As an initial understanding, these notes attempt to explain the mathematical form of a random variable and its associated parameters.