library(ggplot2)
x <- c(1:4)
d <- 1/4
p <- rep(d,4)
data <- data.frame(x,p)
plot <- ggplot(data=data, aes(x=x, y=p))+
  geom_bar(stat="identity", fill="cornflowerblue") +
  labs(title="Uniform Probability Distribution for n=4", x="Number", y="Probability")
plot
The Uniform Distribution
With the binomial distribution there were only two possible outcomes for each trial: true or false, on or off, heads or tails, each the result of a single Bernoulli trial. However, we can have a probability space with more possible outcomes. For example, when throwing a standard six-sided die there are six possible outcomes, corresponding to the values on the faces. These are usually the numbers 1 to 6, but there is no reason that they cannot be any other numbers.
The probability space is defined by the geometry of the die. There are six faces and, as long as the die has been made fair, each has the same probability of being the uppermost face after a throw. The probability is therefore 1/6 for each of the faces, giving a total probability of 1.
Formal Mathematical Definition
The uniform or rectangular distribution gives an equal probability to all n possibilities that make up the sample space.
Usually we use dice based on regular polyhedra - tetrahedral (4-sided), cubic (6-sided), octahedral (8-sided), dodecahedral (12-sided) and icosahedral (20-sided). These are the five Platonic solids.
The probability is defined by a single parameter, n - the number of possible outcomes.
\[f(x)= P(X=x) = {\frac{1}{n}} \]
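The definition above can be written as a small R function. This is only a minimal sketch: `dunif_discrete` is a hypothetical helper name (it is not part of base R), and it assumes the outcomes are labelled 1 to n.

```r
# Hypothetical helper: probability mass function of the discrete uniform
# distribution on the outcomes 1, 2, ..., n (assumed labelling).
dunif_discrete <- function(x, n) {
  # 1/n for whole-number outcomes between 1 and n, 0 for anything else
  ifelse(x %in% seq_len(n), 1 / n, 0)
}

p <- dunif_discrete(1:6, n = 6)  # probabilities for a six-sided die
p
sum(p)                           # the probabilities add up to 1
dunif_discrete(7, n = 6)         # an impossible outcome has probability 0
```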
In most real-world cases we cannot assume that all possible events are equally probable. For dice this is not a bad assumption, as dice can be made fair. But for something like a World Cup or a sports league, the probability of winning is not the same for each competitor.
library(ggplot2)
x <- c(1:6)
d <- 1/6
p <- rep(d,6)
data <- data.frame(x,p)
plot <- ggplot(data=data, aes(x=x, y=p))+
  geom_bar(stat="identity", fill="coral") +
  labs(title="Uniform Probability Distribution for n=6", x="Number", y="Probability")
plot
library(ggplot2)
x <- c(1:8)
d <- 1/8
p <- rep(d,8)
data <- data.frame(x,p)
plot <- ggplot(data=data, aes(x=x, y=p))+
  geom_bar(stat="identity", fill="darkorchid") +
  labs(title="Uniform Probability Distribution for n=8", x="Number", y="Probability")
plot
library(ggplot2)
x <- c(1:12)
d <- 1/12
p <- rep(d,12)
data <- data.frame(x,p)
plot <- ggplot(data=data, aes(x=x, y=p))+
  geom_bar(stat="identity", fill="tomato") +
  labs(title="Uniform Probability Distribution for n=12", x="Number", y="Probability")
plot
library(ggplot2)
x <- c(1:20)
d <- 1/20
p <- rep(d,20)
data <- data.frame(x,p)
plot <- ggplot(data=data, aes(x=x, y=p))+
  geom_bar(stat="identity", fill="olivedrab") +
  labs(title="Uniform Probability Distribution for n=20", x="Number", y="Probability")
plot
There is no dedicated function for the discrete uniform distribution in base R. If you want to sample from a simple discrete uniform distribution, you can simulate it using the sample function with replacement.
For example, to simulate 10 rolls of a six-sided die you could use the following code.
x <- c(1:6)
sample(x,10,replace=TRUE)
[1] 5 4 2 2 6 5 4 1 6 2
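Because `sample` draws at random, running the code above will usually give different numbers from those shown. If a repeatable result is needed, one option is to fix the random seed first with `set.seed`. The seed value 42 below is an arbitrary choice for illustration.

```r
x <- c(1:6)

set.seed(42)                           # arbitrary seed, chosen only for illustration
first <- sample(x, 10, replace = TRUE)

set.seed(42)                           # resetting the same seed...
second <- sample(x, 10, replace = TRUE)

identical(first, second)               # ...reproduces exactly the same rolls
```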
If you wanted to simulate ages between 18 and 60, and you assumed that ages were uniformly distributed in the population, you could use the following code.
x <- c(18:60)
sample(x,10,replace=TRUE)
[1] 51 24 28 55 47 20 29 46 25 18
You can also use the same process to generate random DNA or amino acid sequences.
x <- c("A","C","T","G")
sample(x,50,replace=TRUE)
[1] "G" "A" "T" "G" "C" "G" "C" "C" "G" "T" "G" "T" "C" "C" "A" "A" "T" "C" "T"
[20] "A" "C" "C" "G" "C" "A" "C" "C" "T" "T" "T" "T" "T" "G" "A" "A" "A" "T" "T"
[39] "T" "A" "C" "T" "A" "A" "A" "G" "A" "C" "A" "A"
If you sample from the discrete uniform distribution and plot the results, the observed proportions will converge to the uniform distribution as the sample size grows, but for small samples there can be large variations.
library(ggplot2)
x <- c(1:6)
y <- sample(x,10,replace=TRUE)
t <- table(y)
data <- as.data.frame(t)
plot <- ggplot(data=data, aes(x=y, y=Freq/10))+
  geom_bar(stat="identity", fill="steelblue") +
  labs(title="Sampled Probability Distribution for a six sided die: Sample Size = 10", x="Number", y="Probability")
plot
library(ggplot2)
x <- c(1:6)
y <- sample(x,10000,replace=TRUE)
t <- table(y)
data <- as.data.frame(t)
plot <- ggplot(data=data, aes(x=y, y=Freq/10000))+
  geom_bar(stat="identity", fill="steelblue") +
  labs(title="Sampled Probability Distribution for a six sided die: Sample Size = 10000", x="Number", y="Probability")
plot
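The convergence seen in the two plots can also be checked numerically. The following is a rough sketch rather than a formal test: for a handful of sample sizes (chosen arbitrarily here) it records the largest gap between an observed proportion and the theoretical value of 1/6. The gap should tend to shrink as the sample size grows, although it need not do so at every single step.

```r
x <- c(1:6)
sizes <- c(10, 100, 1000, 10000, 100000)  # arbitrary sample sizes for illustration

# For each sample size, find the largest gap between an observed
# proportion and the theoretical probability of 1/6.
max_gap <- sapply(sizes, function(n) {
  y <- sample(x, n, replace = TRUE)
  # factor(..., levels = x) keeps faces that were never rolled (count 0)
  observed <- table(factor(y, levels = x)) / n
  max(abs(observed - 1 / 6))
})

data.frame(sample_size = sizes, max_gap = max_gap)
```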
As an alternative, there is an implementation of the discrete uniform distribution in the purrr library, with which we can run the same sampling as with the sample function. Note that rdunif() was deprecated in purrr 1.0.0.
library(purrr)
y <- rdunif(10,a=1,b=6)
Warning: `rdunif()` was deprecated in purrr 1.0.0.
t <- table(y)
data <- as.data.frame(t)
plot <- ggplot(data=data, aes(x=y, y=Freq/10))+
  geom_bar(stat="identity", fill="coral") +
  labs(title="Sampled Probability Distribution for a six sided die: Sample Size = 10", x="Number", y="Probability")
plot
library(purrr)
y <- rdunif(10000,a=1,b=6)
t <- table(y)
data <- as.data.frame(t)
plot <- ggplot(data=data, aes(x=y, y=Freq/10000))+
  geom_bar(stat="identity", fill="coral") +
  labs(title="Sampled Probability Distribution for a six sided die: Sample Size = 10000", x="Number", y="Probability")
plot
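Since `rdunif()` is deprecated, the same kind of draw can be produced with base R alone. The sketch below uses `sample.int`; `roll_dunif` is a hypothetical helper name, not a function from purrr or base R, and it assumes `a` and `b` are whole numbers with `a <= b`.

```r
# Hypothetical helper: n draws from the discrete uniform distribution
# on the whole numbers a, a + 1, ..., b (assumes a <= b).
roll_dunif <- function(n, a = 1, b = 6) {
  # sample.int() draws from 1..(b - a + 1); adding (a - 1) shifts
  # the values into the range a..b
  sample.int(b - a + 1, size = n, replace = TRUE) + (a - 1)
}

y <- roll_dunif(10, a = 1, b = 6)       # ten rolls of a six-sided die
ages <- roll_dunif(10, a = 18, b = 60)  # ten ages between 18 and 60
y
ages
```

`sample.int` is used here rather than `sample(a:b, ...)` because `sample` treats a single number as the range from 1 to that number, which would give surprising results when `a` equals `b`.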
If the world were simple, we would live in a world of uniform distributions. The world is complex, and so we don’t. The next level of complexity that we can introduce is to have uneven but still discrete distributions, using urn models in which the urn holds different numbers of balls of different colours/numbers/letters. The outcomes then have different probabilities, and you can again sample with replacement. There are many variations on urn models, both with and without replacement.
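A simple urn model can be sketched with the same sample function used earlier. The urn contents below (5 red, 3 blue and 2 green balls) are an assumption made up purely for illustration.

```r
# Assumed urn contents for illustration: 5 red, 3 blue and 2 green balls
urn <- c(rep("red", 5), rep("blue", 3), rep("green", 2))

# Drawing with replacement: each ball goes back into the urn after
# the draw, so the probabilities stay at 0.5, 0.3 and 0.2
with_replacement <- sample(urn, 1000, replace = TRUE)
table(with_replacement) / 1000

# Drawing without replacement: the urn empties as balls are removed,
# so at most length(urn) balls can be drawn
without_replacement <- sample(urn, 4, replace = FALSE)
without_replacement
```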
The uniform distribution describes the whole event space for many games. The binomial distribution describes how often you are likely to see a successful event in repeated plays of that “game”. Another question you can ask is how many times you need to play the game before you win, and with what probability. That is described by the geometric distribution.