| Feature | PMF (Discrete Variable) | PDF (Continuous Variable) |
|---|---|---|
| Function Type | Probability Mass Function | Probability Density Function |
| Applicable to | Discrete Random Variables | Continuous Random Variables |
| What it gives | Probability that \(X = x\) | Density of \(X\) at \(x\) |
| Exact Value Probabilities | Possible (e.g., \(P(X = 3) = 0.25\)) | Not possible, \(P(X = x) = 0\) for all \(x\) |
| Sum vs Integral | Probabilities add up to 1 | Total area under the curve is 1 |
| Example | Rolling a die | Heights of people |
The (probability) mass function of a discrete random variable \(X\) is \(f_X(x) = P\{X = x\}\).
The mass function has two basic properties:
\(f_X(x) \geq 0\) for all \(x \in S\), the state space. Probabilities are non-negative.
\(\sum_{x \in S} f_X(x) = 1\). The collection
\(C_x = \{\omega; X(\omega) = x\}\)
for all \(x \in S\) forms a partition of the probability space \(\Omega\).
This ensures that the probabilities of all possible values of \(X\) sum to 1.
Example of PMF (Rolling a Die):
For a fair die, the PMF is:
\(P(X = x) = \begin{cases} \frac{1}{6}, & \text{if } x = 1, 2, 3, 4, 5, 6 \\ 0, & \text{otherwise} \end{cases}\)
This means that the probability of rolling any particular number (1 through 6) is \(\frac{1}{6}\), and the probability of rolling anything outside this range is 0.
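As a small sketch, the two basic properties of a mass function can be checked numerically for the fair die. The function name `die_pmf` is an illustrative choice, not something defined in these notes.

```r
# PMF of a fair six-sided die: 1/6 on the faces 1..6, 0 elsewhere
# (die_pmf is an illustrative name, not from the notes)
die_pmf <- function(x) ifelse(x %in% 1:6, 1 / 6, 0)

faces <- 1:6
probs <- die_pmf(faces)

nonneg  <- all(probs >= 0)            # property 1: non-negative
total   <- sum(probs)                 # property 2: sums to 1 (up to rounding)
outside <- die_pmf(c(0, 3.5, 7))      # values outside the state space

nonneg
total
outside
```

Values that are not faces of the die, such as 0, 3.5, or 7, receive probability 0.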
Example 7.22. Toss a biased coin repeatedly, with independent outcomes, until the first head appears. Let \(X\) denote the random variable that gives the number of tails before the first head, and let \(p\) denote the probability of heads on any given toss. Then
\(f_X(0) = P\{X =0\} = P\{H\} = p\)
\(f_X(1) = P\{X = 1\} = P\{TH\} = (1 - p)p\)
\(f_X(2) = P\{X = 2\} = P\{TTH\} = {(1 - p)}^2p\)
and, in general, for \(x = 0, 1, 2, \ldots\),
\(f_X(x) = P\{X = x\} = P\{T \cdots TH\} = {(1 - p)}^x p\)
Because the terms in the mass function form a geometric sequence, \(X\) is called a geometric random variable.
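The derived formula can be compared with R's built-in geometric mass function `dgeom`, which also counts failures (tails) before the first success (head). This is a minimal sketch; the value `p = 1/4` is an assumed illustration.

```r
p <- 1 / 4                      # assumed probability of heads
x <- 0:10                       # number of tails before the first head

f_formula <- (1 - p)^x * p      # mass function from the derivation
f_builtin <- dgeom(x, p)        # R's geometric mass function

# TRUE when the two agree up to floating-point tolerance
all.equal(f_formula, f_builtin)
```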
Recall that a geometric sequence \(c, cr, cr^2, \ldots, cr^n\) has sum
\(s_n = c + cr + cr^2 + \cdots + cr^n = \frac{c(1 - r^{n+1})}{1-r}\)
for \(r \neq 1\).
If \(|r| < 1\), then \(\lim_{n \to \infty} r^n = 0\)
and the infinite series sums to \(\sum_{k=0}^{\infty} cr^k = \lim_{n \to \infty} s_n = \frac{c}{1-r}\).
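For the geometric mass function, take \(c = p\) and \(r = 1 - p\). The partial sums \(\frac{c(1 - r^{n+1})}{1-r}\) then simplify to \(1 - (1-p)^{n+1}\), and the limit \(\frac{c}{1-r}\) equals 1. The sketch below checks this numerically, assuming \(p = \frac{1}{4}\); the names `c0`, `s_closed`, and `s_direct` are illustrative.

```r
p  <- 1 / 4                                   # assumed probability of heads
n  <- 0:10
c0 <- p                                       # first term c (c0 avoids masking c())
r  <- 1 - p                                   # common ratio

s_closed <- c0 * (1 - r^(n + 1)) / (1 - r)    # closed-form partial sums
s_direct <- cumsum(c0 * r^n)                  # term-by-term partial sums
limit    <- c0 / (1 - r)                      # sum of the infinite series

all.equal(s_closed, s_direct)
all.equal(s_closed, pgeom(n, p))              # partial sums match the CDF
limit
```

The partial sums coincide with `pgeom(n, p)`, which is why the distribution function of a geometric random variable has a closed form.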
Exercise 7.24. We use R to investigate a geometric random variable with \(p = \frac{1}{4}\)
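# Values 0 through 10 of the geometric random variable
x <- 0:10
# Mass function P{X = x} with p = 1/4
f <- dgeom(x, 1 / 4)
# Distribution function P{X <= x}
F <- pgeom(x, 1 / 4)
data.frame(x, f, F)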
## x f F
## 1 0 0.25000000 0.2500000
## 2 1 0.18750000 0.4375000
## 3 2 0.14062500 0.5781250
## 4 3 0.10546875 0.6835938
## 5 4 0.07910156 0.7626953
## 6 5 0.05932617 0.8220215
## 7 6 0.04449463 0.8665161
## 8 7 0.03337097 0.8998871
## 9 8 0.02502823 0.9249153
## 10 9 0.01877117 0.9436865
## 11 10 0.01407838 0.9577649
Check that the jumps in the cumulative distribution function, \(F_X(x) - F_X(x - 1)\), are equal to the values of the mass function.
For a geometric random variable, the probability mass function (PMF) is \(f_X(x) = P(X = x) = {(1-p)}^x p\).
The cumulative distribution function (CDF) is the probability that the random variable \(X\) is less than or equal to a given value \(x\):
\(F_X(x) = P(X \leq x)\)
Because \(X\) takes only integer values, the jump in the CDF at an integer \(x\) is
\(F_X(x) - F_X(x - 1) = P(X \leq x) - P(X \leq x - 1) = P(X = x)\)
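The check can be sketched in R by differencing the values of `pgeom` and comparing them with `dgeom`. The sketch assumes \(p = \frac{1}{4}\) and uses \(F_X(-1) = 0\) for the first jump.

```r
p   <- 1 / 4
x   <- 0:10
F_x <- pgeom(x, p)                 # F_X(0), ..., F_X(10)

# Jumps F_X(x) - F_X(x - 1); prepend F_X(-1) = 0 for the jump at x = 0
jumps <- diff(c(0, F_x))

all.equal(jumps, dgeom(x, p))      # TRUE: jumps equal the mass function
```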
Find \(P\{X \leq 4\}\), \(P\{2 < X \leq 5\}\), and \(P\{X \geq 5\}\).
P.X.leq <- pgeom(4, 1 / 4)
P.X.leq
## [1] 0.7626953
P.2.lt.X.leq.5 <- pgeom(5, 1 / 4) - pgeom(2, 1 / 4)
P.2.lt.X.leq.5
## [1] 0.2438965
P.X.geq.5 <- 1 - pgeom(4, 1 / 4)
P.X.geq.5
## [1] 0.2373047
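As a cross-check, the same probabilities can be obtained by summing the mass function over the relevant values, or from the tail formula \((1-p)^5\) (five tails in a row). This is a sketch with \(p = \frac{1}{4}\); the variable names are illustrative.

```r
p <- 1 / 4

leq4    <- sum(dgeom(0:4, p))                  # P{X <= 4}
between <- sum(dgeom(3:5, p))                  # P{2 < X <= 5}, i.e. X in {3, 4, 5}
geq5    <- pgeom(4, p, lower.tail = FALSE)     # P{X >= 5} = P{X > 4}
tail5   <- (1 - p)^5                           # first five tosses are all tails

all.equal(leq4, pgeom(4, p))
all.equal(between, pgeom(5, p) - pgeom(2, p))
all.equal(geq5, tail5)
```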
What is a Continuous Random Variable? A continuous random variable can take on any value within an interval of real numbers, so its set of possible values is uncountable.
For example, the time it takes to complete a task, or the exact height of individuals.
Let \(X\) be a random variable whose distribution function \(F_X\) has a derivative. The function \(f_X\) satisfying
\(F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt\)
is called the probability density function, and
\(X\) is called a continuous random variable.
By the fundamental theorem of calculus, the density function is the derivative of the distribution function.
\(f_X(x) = \lim_{\Delta x \to 0} \frac{F_X(x+\Delta x) - F_X(x)}{\Delta x} = F_X'(x)\)
In other words, if \(\Delta x\) is small,
\(F_X(x + \Delta x) - F_X(x) \approx f_X(x)\Delta x\)
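A numerical sketch of this approximation, using the standard normal distribution as an assumed example (it is not the distribution discussed in these notes): `pnorm` plays the role of \(F_X\) and `dnorm` the role of \(f_X\).

```r
x  <- 0.5       # point at which to compare
dx <- 1e-4      # small increment

# Difference quotient of the distribution function
quotient <- (pnorm(x + dx) - pnorm(x)) / dx
density  <- dnorm(x)

quotient
density
abs(quotient - density)   # small, and shrinks as dx shrinks
```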
We can compute probabilities by evaluating definite integrals
\(P\{a < X \leq b\} = F_X(b) - F_X(a) = \int_{a}^{b} f_X(t)\,dt\).
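The identity can be illustrated with R's numerical integrator `integrate`. The sketch again assumes a standard normal random variable, with endpoints \(a = -1\) and \(b = 2\) chosen only for illustration.

```r
a <- -1
b <- 2

# Area under the density between a and b
area <- integrate(dnorm, lower = a, upper = b)$value

# Difference of the distribution function at the endpoints
diff_F <- pnorm(b) - pnorm(a)

all.equal(area, diff_F, tolerance = 1e-6)
```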
The density function has two basic properties that mirror the properties of the mass function:
\(f_X(x) \geq 0\) for all \(x\) in the state space.
\(\int_{- \infty}^{\infty} f_X(x) dx = 1\).
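Both properties can be spot-checked numerically. The sketch below uses the standard normal density `dnorm` as an assumed example and checks non-negativity only on a finite grid, so it is a sanity check rather than a proof.

```r
# Property 1: non-negative on a grid of points
grid   <- seq(-5, 5, by = 0.1)
nonneg <- all(dnorm(grid) >= 0)

# Property 2: total area under the density is 1
total <- integrate(dnorm, lower = -Inf, upper = Inf)$value

nonneg
total
```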
Exercise. Let \(f_X\) be the density for a random variable \(X\) and pick a number \(x_0\). Explain why \(P\{X = x_0\} = 0\).
Exercise. Let \(X\) be a continuous random variable with density
\(f_X(x) = \begin{cases} 0, & \text{if } x \leq 0 \\ \frac{1}{2\sqrt{x}}, & \text{if } 0 < x \leq 1 \\ 0, & \text{if } 1 < x \end{cases}\)
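A quick sketch for exploring this exercise, assuming the density on \(0 < x \leq 1\) is \(\frac{1}{2\sqrt{x}}\), the scaling under which the total area is 1. Under that assumption the distribution function on this interval is \(F_X(x) = \sqrt{x}\). The names `f`, `total`, and `F_quarter` are illustrative.

```r
# Assumed density: 1 / (2 * sqrt(x)) on (0, 1], and 0 elsewhere
f <- function(x) ifelse(x > 0 & x <= 1, 1 / (2 * sqrt(x)), 0)

# Total area under the density; the singularity at 0 is integrable
total <- integrate(f, lower = 0, upper = 1)$value

# P{X <= 0.25}, to compare with sqrt(0.25)
F_quarter <- integrate(f, lower = 0, upper = 0.25)$value

total
F_quarter
sqrt(0.25)
```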
In summary:
PMF: Applies to discrete variables and gives the probability for exact values.
PDF: Applies to continuous variables and gives the probability density, with probabilities calculated over intervals.