Some of the basic statistical distributions covered here are the normal, binomial, Poisson, and t distributions.
To get help on the various distributions available in the stats package of R, run the following command:
help("distributions") #or,
?distributions
Let’s say a variable \(X\) is normally distributed with a known mean \(\mu = 75\) and standard deviation \(\sigma = 5\).
So,
\(X \sim NORM(\mu = 75, \sigma^2 = 25)\)
To get help on the normal distribution in R:
help("pnorm")
#or,
?pnorm
To find the probability that \(X \le 70\), i.e., \(P(X \le 70)\):
pnorm(q=70, mean = 75, sd = 5, lower.tail = TRUE )
## [1] 0.1586553
lower.tail = TRUE is the default value of that argument.
To find the probability that \(X \ge 85\), i.e., \(P(X \ge 85)\):
pnorm(q=85, mean = 75, sd = 5, lower.tail = FALSE )
## [1] 0.02275013
The pnorm() command can also be used to calculate a probability when a \(z\) value is given.
To find the probability that \(z \ge 1\), i.e., \(P(z \ge 1)\):
pnorm(q=1, mean = 0, sd = 1, lower.tail = FALSE )
## [1] 0.1586553
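The link between the original scale and the \(z\) scale can be checked directly. The following is a minimal sketch, assuming the same \(\mu = 75\) and \(\sigma = 5\) as above; the variable names (`mu`, `sigma`, `p_original`, `p_standard`, `p_between`) are illustrative and not part of the original text. It standardizes \(x = 85\), confirms that both routes give the same upper-tail probability, and also computes an interval probability by subtracting two cumulative probabilities.

```r
# Parameters of the example distribution (taken from the text above).
mu <- 75
sigma <- 5

# Standardize x = 85: z = (x - mu) / sigma.
z <- (85 - mu) / sigma

# Upper-tail probability on the original scale and on the z scale.
p_original <- pnorm(q = 85, mean = mu, sd = sigma, lower.tail = FALSE)
p_standard <- pnorm(q = z, mean = 0, sd = 1, lower.tail = FALSE)

# TRUE if both routes agree up to floating-point tolerance.
all.equal(p_original, p_standard)

# P(70 <= X <= 85) as a difference of two lower-tail probabilities.
p_between <- pnorm(85, mean = mu, sd = sigma) - pnorm(70, mean = mu, sd = sigma)
p_between
```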
To find the 25th percentile (1st quartile), we can use the qnorm() command :
qnorm(p = 0.25, mean = 75, sd = 5, lower.tail = TRUE)
## [1] 71.62755
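Since qnorm() is the inverse of pnorm(), feeding a quantile back into pnorm() should return the original probability. A small sketch under the same assumed parameters (the names `q1` and `quartiles` are illustrative only):

```r
# 25th percentile of a normal distribution with mean 75 and SD 5.
q1 <- qnorm(p = 0.25, mean = 75, sd = 5)

# Passing the quantile back to pnorm() recovers the probability 0.25
# (up to floating-point error).
pnorm(q = q1, mean = 75, sd = 5)

# qnorm() is vectorized, so all three quartiles can be requested at once.
quartiles <- qnorm(p = c(0.25, 0.50, 0.75), mean = 75, sd = 5)
quartiles
```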
Let’s plot the probability density function. First, we create a grid of x values:
x = seq(from = 55, to = 95, by = 0.25)
x
## [1] 55.00 55.25 55.50 55.75 56.00 56.25 56.50 56.75 57.00 57.25 57.50 57.75
## [13] 58.00 58.25 58.50 58.75 59.00 59.25 59.50 59.75 60.00 60.25 60.50 60.75
## [25] 61.00 61.25 61.50 61.75 62.00 62.25 62.50 62.75 63.00 63.25 63.50 63.75
## [37] 64.00 64.25 64.50 64.75 65.00 65.25 65.50 65.75 66.00 66.25 66.50 66.75
## [49] 67.00 67.25 67.50 67.75 68.00 68.25 68.50 68.75 69.00 69.25 69.50 69.75
## [61] 70.00 70.25 70.50 70.75 71.00 71.25 71.50 71.75 72.00 72.25 72.50 72.75
## [73] 73.00 73.25 73.50 73.75 74.00 74.25 74.50 74.75 75.00 75.25 75.50 75.75
## [85] 76.00 76.25 76.50 76.75 77.00 77.25 77.50 77.75 78.00 78.25 78.50 78.75
## [97] 79.00 79.25 79.50 79.75 80.00 80.25 80.50 80.75 81.00 81.25 81.50 81.75
## [109] 82.00 82.25 82.50 82.75 83.00 83.25 83.50 83.75 84.00 84.25 84.50 84.75
## [121] 85.00 85.25 85.50 85.75 86.00 86.25 86.50 86.75 87.00 87.25 87.50 87.75
## [133] 88.00 88.25 88.50 88.75 89.00 89.25 89.50 89.75 90.00 90.25 90.50 90.75
## [145] 91.00 91.25 91.50 91.75 92.00 92.25 92.50 92.75 93.00 93.25 93.50 93.75
## [157] 94.00 94.25 94.50 94.75 95.00
Now, let’s calculate the probability density for each element of the vector x.
dens = dnorm(x, mean = 75, sd = 5)
plot(x, dens,
type = "h",
xlab = "x",
ylab = "Probability Density",
las = 1,
main = "Normal Distribution",
col = "dark red"
)
abline(v = 75, col = "red", lwd = 2)
To get a random sample from a normally distributed population, we can use the rnorm() command.
randsample = rnorm(n=40, mean = 75, sd = 5)
randsample
## [1] 75.95536 75.85261 75.88264 79.68332 66.58519 82.89505 71.81083 86.43229
## [9] 74.45343 80.83728 82.26750 84.93609 76.74022 77.97471 68.53062 73.21949
## [17] 65.90488 68.60800 78.12507 79.01691 77.69162 72.77396 83.09418 76.60586
## [25] 80.41287 80.28058 74.07191 68.39433 75.11834 67.45095 74.31962 79.17617
## [33] 72.97175 83.70086 69.96044 85.74568 73.86673 68.97701 73.04973 84.79578
Let’s quickly make a histogram to see the distribution of the sample :
hist(randsample,
xlab = "Sample",
ylab = "Frequency",
main = "Sample Distribution",
col = "seagreen",
border = "white")
From the above histogram, we can see that a random sample of this size does not necessarily look normally distributed, even though it has been taken from a normally distributed population.
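Because rnorm() draws new values on every call, the sample and its histogram will differ from run to run. The sketch below assumes an arbitrary seed of 123 and an illustrative larger sample size of 10000; neither value comes from the original text. It shows one way to make the draw reproducible and to compare how closely a small and a large sample track the population mean of 75 and SD of 5.

```r
# Fix the random number generator state so the draws are reproducible.
# The seed value 123 is arbitrary.
set.seed(123)

# A small sample (same size as in the text) and a much larger one.
small_sample <- rnorm(n = 40, mean = 75, sd = 5)
large_sample <- rnorm(n = 10000, mean = 75, sd = 5)

# Sample mean and SD of each; the larger sample typically lands
# closer to the population values 75 and 5.
c(mean = mean(small_sample), sd = sd(small_sample))
c(mean = mean(large_sample), sd = sd(large_sample))
```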
Let’s say a variable \(X\) is binomially distributed with \(n=20\) trials and \(p=1/6\) probability of success.
So,
\(X \sim BIN(n = 20, p = 1/6)\)
To get help on the binomial distribution in R:
help("dbinom")
#or,
?dbinom
The dbinom() command gives the values of the probability mass function of \(X\), \(f(x) = P(X = x)\).
So, to find the probability that \(X=3\), i.e., \(P(X=3)\):
dbinom(x=3, size = 20, prob = 1/6)
## [1] 0.2378866
So, the probability of exactly \(3\) successes in \(20\) trials is approximately \(23.8\%\).
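The value returned by dbinom() can be reproduced from the binomial formula \(P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}\). A minimal sketch, with illustrative variable names (`n`, `p`, `k`, `manual`, `builtin`) that are not part of the original text:

```r
# Parameters from the example: 20 trials, success probability 1/6,
# and exactly 3 successes.
n <- 20
p <- 1 / 6
k <- 3

# Binomial probability computed by hand: choose(n, k) * p^k * (1 - p)^(n - k).
manual <- choose(n, k) * p^k * (1 - p)^(n - k)

# The same probability from the built-in mass function.
builtin <- dbinom(x = k, size = n, prob = p)

# TRUE if the two agree up to floating-point tolerance.
all.equal(manual, builtin)
```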
To find the probabilities \(P(X=0)\), \(P(X=1)\), \(P(X=2)\), and \(P(X=3)\) in a single call:
dbinom(x=0:3, size = 20, prob = 1/6)
## [1] 0.02608405 0.10433621 0.19823881 0.23788657
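So, we can see that \(P(X=0) \approx 0.026\), \(P(X=1) \approx 0.104\), \(P(X=2) \approx 0.198\), and \(P(X=3) \approx 0.238\).
Let’s plot a probability mass function. The plot uses a different binomial distribution, with \(n = 150\) trials and \(p = 1/2\):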
binomlist = seq(from = 0, to = 150, by = 1)
binomlist
## [1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
## [19] 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
## [37] 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
## [55] 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
## [73] 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89
## [91] 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107
## [109] 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125
## [127] 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143
## [145] 144 145 146 147 148 149 150
Now, let’s calculate the probability for each element of the vector binomlist.
binomdens = dbinom(binomlist, size = 150, prob = 1/2)
plot(binomlist, binomdens,
type = "h",
xlab = "x",
ylab = "Probability Density",
las = 1,
main = "Binomial Distribution",
col = "dark red"
)
abline(v = 75, col = "red", lwd = 2)
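The vertical line at 75 coincides with the mean of this distribution, \(n p = 150 \times 1/2 = 75\). As a rough sketch (the variable names and the comparison with a normal curve are illustrative additions, not part of the original plot), the mean and standard deviation can be computed directly and the binomial probabilities compared with a normal density that has the same mean and SD:

```r
# Parameters used for the plot above.
size <- 150
prob <- 1 / 2

# Mean and standard deviation of a binomial distribution.
binom_mean <- size * prob
binom_sd <- sqrt(size * prob * (1 - prob))

# Binomial probabilities and the matching normal density on 0, 1, ..., 150.
k <- 0:size
binom_probs <- dbinom(k, size = size, prob = prob)
normal_dens <- dnorm(k, mean = binom_mean, sd = binom_sd)

# Largest pointwise gap between the two; a small value means the
# normal curve follows the binomial probabilities closely here.
max_gap <- max(abs(binom_probs - normal_dens))
max_gap
```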
Returning to the example with \(n = 20\) and \(p = 1/6\), to find the probability that \(X \le 3\), i.e., \(P(X \le 3)\), we can sum the individual probabilities:
sum(dbinom(x=0:3, size = 20, prob = 1/6))
## [1] 0.5665456
Or, we can use the pbinom() command, which gives the cumulative distribution function of \(X\), \(F(x) = P(X \le x)\):
pbinom(q=3, size = 20, prob = 1/6, lower.tail = TRUE)
## [1] 0.5665456
So, the probability of getting \(3\) or fewer successes in \(20\) trials is approximately \(56.7\%\).
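For a discrete distribution, lower.tail = FALSE returns \(P(X > q)\), which excludes \(q\) itself. A minimal sketch with illustrative variable names (not from the original text) shows that the lower and upper tails at \(q = 3\) are complementary, so the upper tail is the probability of \(4\) or more successes:

```r
# P(X <= 3): three or fewer successes in 20 trials.
p_at_most_3 <- pbinom(q = 3, size = 20, prob = 1/6, lower.tail = TRUE)

# P(X > 3), i.e. P(X >= 4): four or more successes.
p_at_least_4 <- pbinom(q = 3, size = 20, prob = 1/6, lower.tail = FALSE)

# The two tails partition all outcomes, so they sum to 1.
p_at_most_3 + p_at_least_4

# The upper tail also equals the sum of the mass function over 4, ..., 20.
all.equal(p_at_least_4, sum(dbinom(x = 4:20, size = 20, prob = 1/6)))
```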
The rbinom() command is used to take a random sample from a binomial distribution.
The qbinom() command is used to find quantiles for a binomial distribution.
Let’s say a variable \(X\) follows a Poisson distribution with a known rate of \(\lambda=7\).
So,
\(X \sim POISSON(\lambda = 7)\)
To get help on the Poisson distribution in R:
help("dpois") #or,
?dpois
We can calculate probabilities for the Poisson distribution using the ppois() and dpois() commands.
To find the probability that \(X=4\), i.e., \(P(X=4)\):
dpois(x = 4, lambda = 7)
## [1] 0.09122619
So, there is approximately a \(9\%\) chance of exactly \(4\) occurrences.
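This value follows from the Poisson formula \(P(X = k) = e^{-\lambda} \lambda^k / k!\). A small sketch, using illustrative variable names that are not part of the original text, compares the hand computation with dpois():

```r
# Rate and count from the example above.
lambda <- 7
k <- 4

# Poisson probability computed by hand: exp(-lambda) * lambda^k / k!.
manual <- exp(-lambda) * lambda^k / factorial(k)

# The same probability from the built-in mass function.
builtin <- dpois(x = k, lambda = lambda)

# TRUE if the two agree up to floating-point tolerance.
all.equal(manual, builtin)
```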
To find the probabilities \(P(X=0)\), \(P(X=1)\), \(P(X=2)\), \(P(X=3)\), and \(P(X=4)\) in a single call:
dpois(x = 0:4, lambda = 7)
## [1] 0.000911882 0.006383174 0.022341108 0.052129252 0.091226192
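So, we can see that \(P(X=0) \approx 0.0009\), \(P(X=1) \approx 0.006\), \(P(X=2) \approx 0.022\), \(P(X=3) \approx 0.052\), and \(P(X=4) \approx 0.091\).
Let’s plot a probability mass function: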
poislist = seq(from = 0, to = 100)
poislist
## [1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
## [19] 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
## [37] 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
## [55] 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
## [73] 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89
## [91] 90 91 92 93 94 95 96 97 98 99 100
Now, let’s calculate the probability for each element of the vector poislist. Note that this plot uses \(\lambda = 2\), not the \(\lambda = 7\) of the example above.
poisdens = dpois(poislist, lambda = 2)
plot(poislist, poisdens,
type = "l",
xlab = "x",
ylab = "Probability Density",
las = 1,
main = "Poisson Distribution",
col = "dark red"
)
To find the probability that \(X \le 4\), i.e., \(P(X \le 4)\), we can sum the individual probabilities:
sum(dpois(x = 0:4, lambda = 7))
## [1] 0.1729916
Or, we can use the ppois() command, which gives the cumulative distribution function of \(X\), \(F(x) = P(X \le x)\):
ppois(q=4, lambda = 7, lower.tail = TRUE)
## [1] 0.1729916
So, there is approximately a \(17.3\%\) chance of \(4\) or fewer occurrences.
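Similarly, to find an upper-tail probability, we set lower.tail = FALSE. For a discrete distribution this returns \(P(X > q)\), so the command below gives \(P(X > 12)\), i.e., \(P(X \ge 13)\):
ppois(q=12, lambda = 7, lower.tail = FALSE)
## [1] 0.02699977
So, there is approximately a \(2.7\%\) chance of more than \(12\) occurrences, that is, \(13\) or more.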
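Whether the boundary value is included matters for a discrete distribution. As a hedged sketch (the variable names are illustrative and not from the original text), \(P(X \ge 12)\) can be obtained by moving the cut-off down to \(q = 11\), and the gap between the two upper tails is exactly \(P(X = 12)\):

```r
lambda <- 7

# P(X > 12): the boundary value 12 is excluded.
p_more_than_12 <- ppois(q = 12, lambda = lambda, lower.tail = FALSE)

# P(X >= 12): shift the cut-off to 11 so that 12 is included.
p_at_least_12 <- ppois(q = 11, lambda = lambda, lower.tail = FALSE)

# The difference between the two tails is the probability of exactly 12.
all.equal(p_at_least_12 - p_more_than_12, dpois(x = 12, lambda = lambda))

# Equivalent complement form of P(X >= 12).
all.equal(p_at_least_12, 1 - ppois(q = 11, lambda = lambda))
```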
The rpois() command is used to take a random sample from a Poisson distribution.
The qpois() command is used to find quantiles for a Poisson distribution.
To get help on the t-distribution, we can simply write:
help(pt)
or,
?pt
Let’s say we are given :
\(t\text{-statistic} = 2.3\)
\(\text{Sample size } (n) = 26\)
So,
\(df = n-1 = 25\)
To get \(p(t > 2.3)\) :
pt(q = 2.3, df = 25, lower.tail = FALSE)
## [1] 0.01503675
So, with the argument lower.tail = FALSE, we get the area to the right of \(t = 2.3\).
For a two-sided test, we need the area above \(t = 2.3\) plus the area below \(t = -2.3\).
To get this:
pt(q=2.3, df=25, lower.tail = FALSE) + pt(q=-2.3, df=25, lower.tail = TRUE)
## [1] 0.03007351
Or, since the t-distribution is symmetric about zero:
pt(q=2.3, df=25, lower.tail = FALSE) * 2
## [1] 0.03007351
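The doubling shortcut only works as written when the t-statistic is positive. A minimal sketch of a helper that handles either sign by taking the absolute value first; `two_sided_p` is a hypothetical function name introduced here for illustration, not something provided by R:

```r
# Hypothetical helper: two-sided p-value for a t-statistic.
# Taking abs() first makes the result the same for t and -t.
two_sided_p <- function(t_stat, df) {
  2 * pt(q = abs(t_stat), df = df, lower.tail = FALSE)
}

# Same p-value for a positive and a negative statistic of equal size.
two_sided_p(t_stat = 2.3, df = 25)
two_sided_p(t_stat = -2.3, df = 25)
```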
A 95% confidence level leaves \(2.5\%\) of the probability in each tail.
So, to find the critical t-value that cuts off \(2.5\%\) in the lower tail:
qt(p = 0.025, df = 25, lower.tail = TRUE)
## [1] -2.059539
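By symmetry, the upper critical value is the same number with a positive sign, and it is the multiplier used in a 95% confidence interval for a mean. The sketch below assumes a hypothetical sample mean of 75 and sample SD of 5, chosen only for illustration; the sample size of 26 is the one given above.

```r
# Sample size from the text and the resulting degrees of freedom.
n <- 26
df <- n - 1

# Upper critical value for a 95% confidence level; by symmetry this
# equals -qt(p = 0.025, df = df).
t_crit <- qt(p = 0.975, df = df)

# Hypothetical sample statistics (assumed values, not from the text).
sample_mean <- 75
sample_sd <- 5

# Margin of error and the resulting 95% confidence interval.
margin <- t_crit * sample_sd / sqrt(n)
ci <- c(lower = sample_mean - margin, upper = sample_mean + margin)
ci
```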