Basic Distributions

Some of the basic statistical distributions are:

Normal Distribution
Binomial Distribution
Poisson Distribution
t Distribution
F Distribution
Exponential Distribution
Chi Squared Distribution

To get help on the various distributions available in R's stats package, run either of the following commands:

help("distributions") #or,
?distributions

Normal Distribution

Let’s say a variable \(X\) is normally distributed with a known mean \(\mu = 75\) and standard deviation \(\sigma = 5\).

So,

\(X \sim NORM(\mu = 75, \sigma^2 = 25)\)

To get some help on the normal distribution in R:

help("pnorm")
#or,
?pnorm

To find the probability that \(X \le 70\), i.e., \(P(X \le 70)\):

pnorm(q=70, mean = 75, sd = 5, lower.tail = TRUE )

## [1] 0.1586553

In R, lower.tail = TRUE is the default, so it could be omitted here; lower.tail = FALSE returns the upper-tail probability instead.

To find the probability that \(X \ge 85\), i.e., \(P(X \ge 85)\):

pnorm(q=85, mean = 75, sd = 5, lower.tail = FALSE )

## [1] 0.02275013
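Since the total probability is 1, the upper-tail probability is the complement of the lower-tail one, and the probability of an interval is a difference of two lower-tail probabilities. A small sketch for the same \(X \sim NORM(\mu = 75, \sigma^2 = 25)\) (the variable names are only illustrative):

```r
# X ~ N(mean = 75, sd = 5), as in the example above
mu <- 75
sigma <- 5

# Upper tail computed directly, and as 1 minus the lower tail
upper_direct <- pnorm(85, mean = mu, sd = sigma, lower.tail = FALSE)
upper_complement <- 1 - pnorm(85, mean = mu, sd = sigma)
all.equal(upper_direct, upper_complement)   # TRUE

# P(70 <= X <= 85) = F(85) - F(70)
p_between <- pnorm(85, mean = mu, sd = sigma) - pnorm(70, mean = mu, sd = sigma)
p_between                                   # about 0.8186
```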

The pnorm() command can also be used to calculate a probability from a \(z\)-score, by using the standard normal distribution (mean = 0, sd = 1).

To find the probability that \(Z \ge 1\), i.e., \(P(Z \ge 1)\):

pnorm(q=1, mean = 0, sd = 1, lower.tail = FALSE )

## [1] 0.1586553
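This result matches \(P(X \le 70)\) from earlier because \(x = 70\) lies exactly one standard deviation below the mean, and the normal curve is symmetric. A short sketch that standardises \(x = 70\) via \(z = (x - \mu)/\sigma\) and checks both facts:

```r
mu <- 75
sigma <- 5

# Standardise x = 70: z = (x - mu) / sigma
z <- (70 - mu) / sigma
z                                    # -1

# P(X <= 70) on the original scale equals P(Z <= -1) on the standard normal scale
p_original <- pnorm(70, mean = mu, sd = sigma)
p_standard <- pnorm(z)               # mean = 0 and sd = 1 are the defaults
all.equal(p_original, p_standard)    # TRUE

# Symmetry of the normal curve: P(Z <= -1) = P(Z >= 1)
all.equal(pnorm(-1), pnorm(1, lower.tail = FALSE))   # TRUE
```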

To find the 25th percentile (1st quartile), we can use the qnorm() command:

qnorm(p = 0.25, mean = 75, sd = 5, lower.tail = TRUE)

## [1] 71.62755
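qnorm() is the inverse of pnorm(): it takes a lower-tail probability and returns the matching value of \(X\). A sketch that checks this round trip and computes all three quartiles at once, since these functions are vectorised over their first argument:

```r
mu <- 75
sigma <- 5

# Round trip: pnorm() of the 25th percentile gives back 0.25
q1 <- qnorm(0.25, mean = mu, sd = sigma)
pnorm(q1, mean = mu, sd = sigma)     # 0.25

# All three quartiles in one call
quartiles <- qnorm(c(0.25, 0.50, 0.75), mean = mu, sd = sigma)
quartiles                            # 71.62755 75.00000 78.37245
```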

Let’s plot the probability density function. First, create a sequence of \(x\) values from 55 to 95 in steps of 0.25:

x = seq(from = 55, to = 95, by = 0.25)
x

##   [1] 55.00 55.25 55.50 55.75 56.00 56.25 56.50 56.75 57.00 57.25 57.50 57.75
##  [13] 58.00 58.25 58.50 58.75 59.00 59.25 59.50 59.75 60.00 60.25 60.50 60.75
##  [25] 61.00 61.25 61.50 61.75 62.00 62.25 62.50 62.75 63.00 63.25 63.50 63.75
##  [37] 64.00 64.25 64.50 64.75 65.00 65.25 65.50 65.75 66.00 66.25 66.50 66.75
##  [49] 67.00 67.25 67.50 67.75 68.00 68.25 68.50 68.75 69.00 69.25 69.50 69.75
##  [61] 70.00 70.25 70.50 70.75 71.00 71.25 71.50 71.75 72.00 72.25 72.50 72.75
##  [73] 73.00 73.25 73.50 73.75 74.00 74.25 74.50 74.75 75.00 75.25 75.50 75.75
##  [85] 76.00 76.25 76.50 76.75 77.00 77.25 77.50 77.75 78.00 78.25 78.50 78.75
##  [97] 79.00 79.25 79.50 79.75 80.00 80.25 80.50 80.75 81.00 81.25 81.50 81.75
## [109] 82.00 82.25 82.50 82.75 83.00 83.25 83.50 83.75 84.00 84.25 84.50 84.75
## [121] 85.00 85.25 85.50 85.75 86.00 86.25 86.50 86.75 87.00 87.25 87.50 87.75
## [133] 88.00 88.25 88.50 88.75 89.00 89.25 89.50 89.75 90.00 90.25 90.50 90.75
## [145] 91.00 91.25 91.50 91.75 92.00 92.25 92.50 92.75 93.00 93.25 93.50 93.75
## [157] 94.00 94.25 94.50 94.75 95.00

Now, let’s calculate the probability density for each element of the vector x and plot it, with a vertical line marking the mean:

dens = dnorm(x, mean = 75, sd = 5)
plot(x, dens, 
     type = "h",
     xlab = "x",
     ylab = "Probability Density",
     las = 1,
     main = "Normal Distribution",
     col = "dark red"
     )

abline(v = 75, col = "red", lwd = 2)
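Note that dnorm() returns density heights, not probabilities; probabilities are areas under the curve. As a rough check (a simple Riemann sum over the same grid, plus R's integrate()), the area under the plotted curve between 55 and 95 should be very close to 1, and the peak height at the mean should equal \(1/(\sigma\sqrt{2\pi})\):

```r
# Recreate the grid and densities used for the plot
x <- seq(from = 55, to = 95, by = 0.25)
dens <- dnorm(x, mean = 75, sd = 5)

# Peak height at the mean: a density value, not a probability
max(dens)                      # about 0.0798
1 / (5 * sqrt(2 * pi))

# Approximate the area under the curve with a Riemann sum (bar width 0.25)
area <- sum(dens) * 0.25
area                           # very close to 1

# Numerical integration over the same range; extra arguments are passed to dnorm()
integrate(dnorm, lower = 55, upper = 95, mean = 75, sd = 5)
```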

To draw a random sample from a normally distributed population, we can use the rnorm() command:

randsample = rnorm(n=40, mean = 75, sd = 5)
randsample

##  [1] 75.95536 75.85261 75.88264 79.68332 66.58519 82.89505 71.81083 86.43229
##  [9] 74.45343 80.83728 82.26750 84.93609 76.74022 77.97471 68.53062 73.21949
## [17] 65.90488 68.60800 78.12507 79.01691 77.69162 72.77396 83.09418 76.60586
## [25] 80.41287 80.28058 74.07191 68.39433 75.11834 67.45095 74.31962 79.17617
## [33] 72.97175 83.70086 69.96044 85.74568 73.86673 68.97701 73.04973 84.79578
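The values printed above were drawn without a fixed seed, so re-running the code gives a different sample each time. A minimal sketch of making the draw reproducible with set.seed() (the seed value 123 is arbitrary):

```r
# Same seed, same "random" draws
set.seed(123)
sample_a <- rnorm(n = 40, mean = 75, sd = 5)

set.seed(123)
sample_b <- rnorm(n = 40, mean = 75, sd = 5)

identical(sample_a, sample_b)   # TRUE

# Sample statistics vary around the population values (75 and 5)
mean(sample_a)
sd(sample_a)
```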

Let’s quickly make a histogram to see the distribution of the sample:

hist(randsample,
     xlab = "Sample",
     ylab = "Frequency",
     main = "Sample Distribution",
     col = "seagreen",
     border = "white")

From the above histogram, we can see that the sample does not look perfectly bell-shaped, even though it was drawn from a normally distributed population. With only 40 observations, random sampling variation makes the histogram irregular.
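As a rough illustration (the sample size of 10,000, the number of bins and the seed are arbitrary choices), a much larger sample from the same population, plotted on the density scale, follows the normal curve closely:

```r
set.seed(1)  # arbitrary seed, only for reproducibility
big_sample <- rnorm(n = 10000, mean = 75, sd = 5)

# freq = FALSE puts the histogram on the density scale,
# so it can be overlaid with the true N(75, 5^2) density curve
hist(big_sample,
     freq = FALSE,
     breaks = 40,
     xlab = "Sample",
     main = "Sample of 10,000 with N(75, 25) density",
     col = "seagreen",
     border = "white")
curve(dnorm(x, mean = 75, sd = 5), add = TRUE, col = "darkred", lwd = 2)

# Summary statistics are now much closer to the population values
mean(big_sample)
sd(big_sample)
```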

Binomial Distribution

Let’s say a variable \(X\) is binomially distributed with \(n = 20\) trials and probability of success \(p = 1/6\) on each trial.

So,

\(X \sim BIN(n = 20, p = 1/6)\)

To get some help on the binomial distribution in R:

help("dbinom")
#or,
?dbinom

The dbinom() command returns values of the probability mass function of \(X\), \(f(x) = P(X = x)\). (For a discrete distribution this is a probability mass function rather than a density.)

So, to find the probability that \(X = 3\), i.e., \(P(X = 3)\):

dbinom(x=3, size = 20, prob = 1/6)

## [1] 0.2378866

So, the probability of exactly \(3\) successes in \(20\) trials is approximately \(23.8\%\).

To find \(P(X=0)\), \(P(X=1)\), \(P(X=2)\) and \(P(X=3)\) at once, pass a vector of values to x:

dbinom(x=0:3, size = 20, prob = 1/6)

## [1] 0.02608405 0.10433621 0.19823881 0.23788657
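As a check on what dbinom() computes, the binomial probability mass function is \(P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}\). A short sketch comparing the formula with dbinom() for \(k = 0, \dots, 3\), and confirming that the probabilities over all outcomes \(0, \dots, 20\) sum to 1:

```r
n <- 20
p <- 1 / 6
k <- 0:3

# Binomial PMF written out explicitly (vectorised over k)
by_formula <- choose(n, k) * p^k * (1 - p)^(n - k)
by_dbinom  <- dbinom(k, size = n, prob = p)
all.equal(by_formula, by_dbinom)        # TRUE

# Probabilities over every possible outcome sum to 1
sum(dbinom(0:n, size = n, prob = p))    # 1
```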

So, we can see:

Approx. \(2.6\%\) chance of getting \(0\) successes in \(20\) trials
Approx. \(10.4\%\) chance of getting \(1\) success in \(20\) trials
Approx. \(19.8\%\) chance of getting \(2\) successes in \(20\) trials
Approx. \(23.8\%\) chance of getting \(3\) successes in \(20\) trials

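Let’s plot the probability mass function. To make the shape easier to see, this example switches to a different binomial distribution with \(n = 150\) trials and \(p = 1/2\), using the values 0 to 150: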

binomlist = seq(from = 0, to = 150, by = 1)
binomlist

##   [1]   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
##  [19]  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35
##  [37]  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53
##  [55]  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71
##  [73]  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89
##  [91]  90  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107
## [109] 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125
## [127] 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143
## [145] 144 145 146 147 148 149 150

Now, let’s calculate the probability for each element of the vector binomlist and plot it:

binomdens = dbinom(binomlist, size = 150, prob = 1/2)
plot(binomlist, binomdens, 
     type = "h",
     xlab = "x",
     ylab = "Probability",
     las = 1,
     main = "Binomial Distribution",
     col = "dark red"
     )
abline(v = 75, col = "red", lwd = 2)
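The red vertical line sits at 75 because the mean of a binomial distribution is \(np = 150 \times 1/2\); its variance is \(np(1-p) = 37.5\). A sketch recovering both from the plotted probabilities:

```r
binomlist <- 0:150
binomdens <- dbinom(binomlist, size = 150, prob = 1/2)

# Mean: sum of x * P(X = x), expected n * p = 75
binom_mean <- sum(binomlist * binomdens)
binom_mean

# Variance: sum of (x - mean)^2 * P(X = x), expected n * p * (1 - p) = 37.5
binom_var <- sum((binomlist - binom_mean)^2 * binomdens)
binom_var
```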

Returning to \(X \sim BIN(n = 20, p = 1/6)\), to find the probability that \(X \le 3\), i.e., \(P(X \le 3)\), we can sum the individual probabilities:

sum(dbinom(x=0:3, size = 20, prob = 1/6))

## [1] 0.5665456

Or, we can use the pbinom() command, which returns the cumulative distribution function of \(X\), \(F(x) = P(X \le x)\):

pbinom(q = 3, size = 20, prob = 1/6, lower.tail = TRUE)

## [1] 0.5665456
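For a discrete distribution, pbinom(..., lower.tail = FALSE) returns \(P(X > q)\), not \(P(X \ge q)\). So the probability of 4 or more successes, \(P(X \ge 4) = 1 - P(X \le 3)\), uses q = 3. A sketch computing it three equivalent ways:

```r
n <- 20
p <- 1 / 6

# Upper tail: P(X > 3), which is the same event as X >= 4
p_at_least_4 <- pbinom(q = 3, size = n, prob = p, lower.tail = FALSE)
p_at_least_4                                 # about 0.4335

# Same value as the complement of P(X <= 3) ...
1 - pbinom(q = 3, size = n, prob = p)

# ... and as the sum of the PMF over 4, 5, ..., 20
sum(dbinom(x = 4:20, size = n, prob = p))
```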

So, the probability of getting \(3\) or fewer successes in \(20\) trials is approximately \(56.7\%\).

The rbinom() command is used to take a random sample from a binomial distribution.
The qbinom() command is used to find quantiles for a binomial distribution.
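A brief sketch of both commands for \(X \sim BIN(20, 1/6)\). qbinom() returns the smallest \(x\) such that \(P(X \le x)\) is at least the requested probability; the seed and the number of simulated values are arbitrary choices:

```r
trials <- 20
p_success <- 1 / 6

# Median: smallest x with P(X <= x) >= 0.5 (P(X <= 3) is about 0.567)
qbinom(p = 0.5, size = trials, prob = p_success)             # 3

# Lower and upper quartiles
qbinom(p = c(0.25, 0.75), size = trials, prob = p_success)   # 2 4

# Ten simulated success counts, each out of 20 trials
set.seed(2024)
sims <- rbinom(n = 10, size = trials, prob = p_success)
sims
```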

Poisson Distribution

Let’s say a variable \(X\) follows a Poisson distribution with a known rate of \(\lambda = 7\).

So,

\(X \sim POISSON(\lambda = 7)\)

To get some help on the Poisson distribution in R:

help("dpois") #or,
?dpois

We can calculate probabilities for the Poisson distribution using the dpois() command (probability mass function) and the ppois() command (cumulative distribution function).

To find the probability that \(X = 4\), i.e., \(P(X = 4)\):

dpois(x = 4, lambda = 7)

## [1] 0.09122619

So, there is approximately a \(9.1\%\) chance of exactly \(4\) occurrences.

To find \(P(X=0)\), \(P(X=1)\), \(P(X=2)\), \(P(X=3)\) and \(P(X=4)\) at once:

dpois(x = 0:4, lambda = 7)

## [1] 0.000911882 0.006383174 0.022341108 0.052129252 0.091226192
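dpois() evaluates the Poisson probability mass function \(P(X = k) = e^{-\lambda}\lambda^k / k!\). A quick sketch comparing the formula with dpois() for \(k = 0, \dots, 4\):

```r
lambda <- 7
k <- 0:4

# Poisson PMF written out explicitly (vectorised over k)
by_formula <- exp(-lambda) * lambda^k / factorial(k)
by_dpois   <- dpois(k, lambda = lambda)
all.equal(by_formula, by_dpois)   # TRUE
```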

So, we can see:

Approx. \(0.09\%\) chance of exactly \(0\) occurrences.
Approx. \(0.6\%\) chance of exactly \(1\) occurrence.
Approx. \(2.2\%\) chance of exactly \(2\) occurrences.
Approx. \(5.2\%\) chance of exactly \(3\) occurrences.
Approx. \(9.1\%\) chance of exactly \(4\) occurrences.

Let’s plot the probability mass function. Note that the plotting example uses a different rate, \(\lambda = 2\), over the values 0 to 100:

poislist = seq(from = 0, to = 100)
poislist

##   [1]   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
##  [19]  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35
##  [37]  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53
##  [55]  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71
##  [73]  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89
##  [91]  90  91  92  93  94  95  96  97  98  99 100

Now, let’s calculate the probability for each element of the vector poislist and plot it:

poisdens = dpois(poislist, lambda = 2)
plot(poislist, poisdens, 
     type = "h",
     xlab = "x",
     ylab = "Probability",
     las = 1,
     main = "Poisson Distribution",
     col = "dark red"
     )
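For a Poisson distribution the mean and the variance both equal \(\lambda\). A sketch recovering both from the plotted probabilities for \(\lambda = 2\) (the tail beyond 100 is negligible, so summing over 0 to 100 is enough):

```r
poislist <- 0:100
poisdens <- dpois(poislist, lambda = 2)

# Mean: sum of x * P(X = x), expected lambda = 2
pois_mean <- sum(poislist * poisdens)
pois_mean

# Variance: sum of (x - mean)^2 * P(X = x), also expected lambda = 2
pois_var <- sum((poislist - pois_mean)^2 * poisdens)
pois_var

# Almost all of the probability sits at small counts
sum(poisdens[poislist <= 10])
```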

Returning to \(X \sim POISSON(\lambda = 7)\), to find the probability that \(X \le 4\), i.e., \(P(X \le 4)\), we can sum the individual probabilities:

sum(dpois(x = 0:4, lambda = 7))

## [1] 0.1729916

Or, we can use the ppois() command, which returns the cumulative distribution function of \(X\), \(F(x) = P(X \le x)\):

ppois(q = 4, lambda = 7, lower.tail = TRUE)

## [1] 0.1729916

So, there is approximately a \(17.3\%\) chance of \(4\) or fewer occurrences.

Similarly, for the upper tail: with lower.tail = FALSE, ppois() returns \(P(X > q)\), so q = 12 gives \(P(X > 12)\), i.e., \(P(X \ge 13)\):

ppois(q = 12, lambda = 7, lower.tail = FALSE)

## [1] 0.02699977

So, there is approximately a \(2.7\%\) chance of more than \(12\) occurrences (i.e., \(13\) or more).
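To get \(P(X \ge 12)\) itself, shift q down by one, since for a count \(P(X \ge 12) = P(X > 11)\). A sketch comparing the two tail probabilities for \(\lambda = 7\):

```r
lambda <- 7

# P(X >= 12) = P(X > 11): use q = 11 with lower.tail = FALSE
p_ge_12 <- ppois(q = 11, lambda = lambda, lower.tail = FALSE)
p_ge_12                                              # about 0.0533

# Same value as 1 - P(X <= 11)
1 - sum(dpois(x = 0:11, lambda = lambda))

# P(X > 12) = P(X >= 13), the smaller upper-tail value
ppois(q = 12, lambda = lambda, lower.tail = FALSE)   # about 0.0270
```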

The rpois() command is used to take a random sample from a Poisson distribution.
The qpois() command is used to find quantiles for a Poisson distribution.
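A brief sketch of both commands for \(X \sim POISSON(\lambda = 7)\). As with qbinom(), qpois() returns the smallest \(x\) such that \(P(X \le x)\) is at least the requested probability; the seed and the number of simulated counts are arbitrary:

```r
lambda <- 7

# Median: smallest x with P(X <= x) >= 0.5
qpois(p = 0.5, lambda = lambda)    # 7

# 95th percentile: P(X <= 11) is about 0.947, P(X <= 12) about 0.973
qpois(p = 0.95, lambda = lambda)   # 12

# Simulate 1000 counts; the sample mean should be close to lambda
set.seed(7)
counts <- rpois(n = 1000, lambda = lambda)
mean(counts)
```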

t Distribution

To get help on the t distribution, we can simply write:

help(pt)

or,

?pt

Let’s say we are given:

t-statistic: \(t = 2.3\)
Sample size: \(n = 26\)

So, for a one-sample t-test, the degrees of freedom are:

\(df = n - 1 = 25\)

To get \(P(T > 2.3)\):

pt(q = 2.3, df = 25, lower.tail = FALSE)

## [1] 0.01503675

The argument lower.tail = FALSE asks for the upper-tail area, i.e., the area to the right of \(t = 2.3\).
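Two quick consistency checks (a sketch): the upper-tail area equals 1 minus the lower-tail area, and because the t distribution is symmetric about 0, the area to the right of 2.3 equals the area to the left of \(-2.3\):

```r
t_obs <- 2.3
df <- 25

# Area to the right of 2.3
upper <- pt(q = t_obs, df = df, lower.tail = FALSE)

# Complement of the lower-tail area
all.equal(upper, 1 - pt(q = t_obs, df = df))   # TRUE

# Symmetry: area left of -2.3 is the same
all.equal(upper, pt(q = -t_obs, df = df))      # TRUE
```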

Getting a two-sided p-value:

A two-sided p-value is the total area above \(t = 2.3\) and below \(t = -2.3\).

To get this:

pt(q = 2.3, df = 25, lower.tail = FALSE) + pt(q = -2.3, df = 25, lower.tail = TRUE)

## [1] 0.03007351

Or, since the t distribution is symmetric about 0, we can simply double the upper-tail area:

pt(q = 2.3, df = 25, lower.tail = FALSE) * 2

## [1] 0.03007351
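To connect this with an actual test, here is a sketch that computes a one-sample t-statistic by hand for a hypothetical simulated sample of size 26 (the data, the seed and the hypothesised mean of 0 are all assumptions for illustration), gets the two-sided p-value with pt(), and compares it with R's t.test():

```r
set.seed(10)                               # arbitrary seed
x <- rnorm(n = 26, mean = 1, sd = 2)       # hypothetical sample, n = 26
mu0 <- 0                                   # hypothesised population mean

# One-sample t-statistic and degrees of freedom
n <- length(x)
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))
df <- n - 1

# Two-sided p-value: double the upper-tail area beyond |t|
p_manual <- 2 * pt(q = abs(t_stat), df = df, lower.tail = FALSE)

# Compare with the built-in one-sample t-test
res <- t.test(x, mu = mu0)
all.equal(unname(res$statistic), t_stat)   # TRUE
all.equal(p_manual, res$p.value)           # TRUE
```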

Calculating the critical t-value for \(95\%\) confidence:

A two-sided \(95\%\) confidence level leaves \(5\%\) in the two tails combined, i.e., \(2.5\%\) in each tail.

So, to find the t-value that cuts off \(2.5\%\) in the lower tail:

qt(p = 0.025, df = 25, lower.tail = T)

## [1] -2.059539
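By symmetry, the upper critical value is qt(p = 0.975, df = 25), about 2.059539. A sketch of using it to build a 95% confidence interval for a mean, \(\bar{x} \pm t^{*} \, s/\sqrt{n}\), on a hypothetical simulated sample of size 26 (data and seed are assumptions), checked against t.test():

```r
df <- 25

# Lower and upper 2.5% cut-offs differ only in sign
t_lower <- qt(p = 0.025, df = df)
t_upper <- qt(p = 0.975, df = df)          # about 2.059539
all.equal(t_lower, -t_upper)               # TRUE

# 95% confidence interval for the mean of a hypothetical sample
set.seed(10)                               # arbitrary seed
x <- rnorm(n = 26, mean = 1, sd = 2)       # hypothetical sample, n = 26
se <- sd(x) / sqrt(length(x))
ci_manual <- mean(x) + c(-1, 1) * t_upper * se
ci_manual

# Interval reported by t.test(), with its attribute stripped
ci_builtin <- as.numeric(t.test(x)$conf.int)
all.equal(ci_manual, ci_builtin)           # TRUE
```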