Mathematical Statistics for Health Researchers

Author
Affiliation

Bongani Ncube

University Of the Witwatersrand (School of Public Health)

Published

March 29, 2025

Keywords

Expectation, Variance, Moment Generating Functions, Statistical Analysis, Biostatistics

Summary of Random Variables

Definition
  Discrete: A random variable is discrete if it can assume at most a finite or countably infinite number of possible values.
  Continuous: A random variable is continuous if it can assume any value in some interval or intervals of real numbers, and the probability that it assumes any specific value is 0.

Density Function
  Discrete: A function f is called a density for X if (1) \(f(x) \ge 0\), (2) \(\sum_{\text{all }x}f(x)=1\), and (3) \(f(x)=P(X=x)\) for x real.
  Continuous: A function f is called a density for X if (1) \(f(x) \ge 0\) for x real, (2) \(\int_{-\infty}^{\infty} f(x) \, dx=1\), and (3) \(P[a \le X \le b] =\int_{a}^{b} f(x) \, dx\) for a and b real.

Cumulative Distribution Function (for x real)
  Discrete: \(F(x)=P[X \le x]\)
  Continuous: \(F(x)=P[X \le x]=\int_{-\infty}^{x}f(t)\,dt\)

Expected Value \(E[H(X)]\)
  Discrete: \(\sum_{\text{all }x}H(x)f(x)\)
  Continuous: \(\int_{-\infty}^{\infty}H(x)f(x)\,dx\)

Mean \(\mu=E[X]\)
  Discrete: \(\sum_{\text{all }x}xf(x)\)
  Continuous: \(\int_{-\infty}^{\infty}xf(x)\,dx\)

Ordinary Moments: the kth ordinary moment of X is \(E[X^k]\)
  Discrete: \(\sum_{\text{all }x}x^kf(x)\)
  Continuous: \(\int_{-\infty}^{\infty}x^kf(x)\,dx\)

Moment Generating Function (mgf): \(m_X(t)=E[e^{tX}]\)
  Discrete: \(\sum_{\text{all }x}e^{tx}f(x)\)
  Continuous: \(\int_{-\infty}^{\infty}e^{tx}f(x)\,dx\)


Expected Value for Discrete Random Variables

Recall, a random variable is a real-valued function defined over a sample space, usually denoted by \(X\) or \(Y,\) and \(X\) is discrete if the space of \(X\) is finite or countably infinite.

Note

If \(X\) is a discrete random variable with probability function \(p(x),\) then the expected value of \(X\), denoted \(E(X),\) is \[E(X) = \sum_{\text{all }x} x \cdot p(x).\] The expected value \(E(X)\) is also called the mean of \(X\), and is often denoted as \(\mu_X,\) or \(\mu\) if the random variable \(X\) is understood.
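
As a quick illustration, this sum can be computed directly in R; the probability function below is a small hypothetical example (not one from the text).

x <- c(0, 1, 2, 3)        # support of X (hypothetical example)
p <- c(.1, .2, .3, .4)    # probability function p(x); the values sum to 1
sum(x * p)                # E(X) = sum over x of x * p(x), here 2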

Note

Let \(X\) be a discrete random variable with probability function \(p(x),\) and suppose \(g(X)\) is a real-valued function of \(X\). Then the expected value of \(g(X)\) is \[E(g(X)) = \sum_{\text{all }x} g(x) \cdot p(x).\]

Variance

Note

If \(X\) is a random variable with expected value \(E(X) = \mu,\) the variance of \(X\), denoted \(V(X),\) is \[V(X) = E((X-\mu)^2).\] The variance of \(X\) is often denoted \(\sigma^2_X,\) or \(\sigma^2\) if the random variable is understood. Also, \(\sqrt{V(X)},\) denoted \(\sigma_X\) or \(\sigma,\) is called the standard deviation of \(X\).
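
Continuing the same hypothetical example, the definition of variance can be computed directly in R:

x <- c(0, 1, 2, 3); p <- c(.1, .2, .3, .4)   # hypothetical probability function
mu <- sum(x * p)                             # E(X) = 2
sum((x - mu)^2 * p)                          # V(X) = E((X - mu)^2), here 1
sqrt(sum((x - mu)^2 * p))                    # standard deviation, here 1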

Properties of Expected Value

Note

Suppose \(X\) is a discrete random variable, \(c \in \mathbb{R}\) is a constant, and \(g,\) \(g_1,\) and \(g_2\) are functions of \(X\).

  1. \(E(c) = c\).
  2. \(E(c\cdot g(X))= cE(g(X))\).
  3. \(E(g_1(X) \pm g_2(X)) = E(g_1(X))\pm E(g_2(X))\).

Let’s take the time to prove these properties. Each of them essentially follows from properties of summations.

Proof
  1. Given a constant \(c,\) we can view this constant as a function of \(X,\) say \(f(x) = c\). Then \[\begin{align*} E(c) &= \sum_{\text{all }x} c \cdot p(x) \\ &= c \sum_{\text{all }x} p(x) \end{align*}\]

Since the sum over all \(x\) of \(p(x)\) is 1 for any probability model, the result follows.

  2. Here we appeal to the formula for \(E(g(X))\) above: \[\begin{align*} E(c\cdot g(X)) &= \sum_{\text{all }x} c \cdot g(x) \cdot p(x) & \\ &= c \sum_{\text{all }x} g(x) p(x) &\text{by arithmetic}\\ &= c E(g(X)) & \end{align*}\]

  3. Here we also appeal to that formula and arithmetic: \[\begin{align*} E(g_1(X) \pm g_2(X)) &= \sum_{\text{all }x} (g_1(x) \pm g_2(x))\cdot p(x) &\\ &= \sum_{\text{all }x} (g_1(x) p(x) \pm g_2(x) p(x)) &\text{by arithmetic}\\ &= \sum_{\text{all }x} g_1(x) p(x) \pm \sum_{\text{all }x} g_2(x) p(x) &\text{by arithmetic}\\ &= E(g_1(X)) \pm E(g_2(X)) & \end{align*}\]

Let \(X\) be a discrete random variable with probability function \(p(x)\) and expected value \(E(X) = \mu\). Then \[V(X) = E(X^2)-\mu^2.\]

Important

By definition, \[\begin{align*} V(X) &= E((X-\mu)^2)\\ &= E(X^2 - 2\mu X + \mu^2) &\text{by expanding}\\ &= E(X^2) - E(2\mu X) + E(\mu^2) &\text{by E() Property 3} \\ &= E(X^2) - 2\mu E(X) + \mu^2 &\text{by E() Properties 2 and 1}\\ &= E(X^2) - 2\mu^2 + \mu^2 & \text{since }E(X)=\mu \\ V(X) &= E(X^2) - \mu^2. \end{align*}\]
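
For the same hypothetical probability function, a quick R check that the working formula agrees with the definition:

x <- c(0, 1, 2, 3); p <- c(.1, .2, .3, .4)   # hypothetical probability function
mu  <- sum(x * p)                            # E(X)
EX2 <- sum(x^2 * p)                          # E(X^2)
EX2 - mu^2                                   # V(X) via the working formula, here 1
sum((x - mu)^2 * p)                          # V(X) via the definition, also 1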

Tchebysheff’s Theorem

Let \(X\) be a random variable with mean \(E(X) = \mu\) and finite variance \(V(X) = \sigma^2 > 0\). Then for any constant \(k > 0,\) \[P(|X - \mu| < k\sigma ) \geq 1 - \frac{1}{k^2}.\] Equivalently, \[P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}.\]

Important

We prove Tchebysheff’s inequality for the case of a discrete random variable, and we come back to this theorem after defining continuous random variables.

Let \(k > 0\) be given.

Then \[V(X) = \sum_{\text{all }x} (x - \mu)^2 p(x),\] by the definition of variance. We can partition the space of \(X\) into three disjoint sets, depending on the location of \(x\) relative to \(\mu \pm k\sigma\):

\[V(X) = \sum_{\text{all } x \leq \mu - k\sigma} (x - \mu)^2 p(x) + \sum_{\text{all } x \text{ s.t. } |x-\mu|< k\sigma } (x - \mu)^2 p(x) + \sum_{\text{all } x \geq \mu + k\sigma} (x - \mu)^2 p(x)\]

Each of these three sums is non-negative, and for the first and third sums we can also say that \((x-\mu)^2 \geq k^2\sigma^2\) for all \(x\) in the given range, so it follows that \[V(X) \geq \sum_{\text{all } x \leq \mu - k\sigma} k^2\sigma^2 p(x) + 0 + \sum_{\text{all } x \geq \mu + k\sigma} k^2\sigma^2 p(x).\] So,

\[\begin{align*} \sigma^2 &\geq \sum_{\text{all } x \leq \mu - k\sigma} k^2\sigma^2 p(x) + 0 + \sum_{\text{all } x \geq \mu + k\sigma} k^2\sigma^2 p(x) \\ &= k^2\sigma^2 \left(\sum_{\text{all } x \leq \mu - k\sigma} p(x) + \sum_{\text{all } x \geq \mu + k\sigma} p(x) \right) \\ &= k^2\sigma^2\left(P(X\leq \mu-k\sigma)+P(X \geq \mu+k\sigma)\right) \\ &= k^2\sigma^2P(|X-\mu|\geq k\sigma) \end{align*}\]

Dividing both sides of the inequality by the positive value \(k^2\sigma^2\) gives us the result: \[P(|X-\mu| \geq k\sigma) \leq \frac{1}{k^2}.\]
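
A small simulation sketch of the bound; the Exp(1) distribution and the value of \(k\) below are arbitrary choices, and the empirical tail probability should fall at or below \(1/k^2\).

set.seed(1)
x <- rexp(1e5, rate = 1)    # arbitrary example: Exp(1) has mu = 1 and sigma = 1
k <- 2
mean(abs(x - 1) >= k * 1)   # empirical P(|X - mu| >= k*sigma), roughly 0.05
1 / k^2                     # Tchebysheff's upper bound, 0.25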

Expected Value for Continuous Random Variables

Note

If \(X\) is a continuous random variable with probability density function \(f(x),\) then the expected value of \(X\), denoted \(E(X),\) is \[E(X) = \int_{-\infty}^\infty x \cdot f(x)~dx,\] provided this integral exists. The expected value \(E(X)\) is also called the mean of \(X\), and is often denoted as \(\mu_X,\) or \(\mu\) if the random variable \(X\) is understood.

The expected value of the function \(g(X)\) of \(X\) is \[E(g(X)) = \int_{-\infty}^\infty g(x) \cdot f(x)~dx,\] provided this integral exists.

The variance of \(X\) is \[V(X) = E((X-\mu_X)^2),\] provided this integral exists.

As in the discrete case, one can show \(V(X) = E(X^2)-E(X)^2,\) a working formula for variance which is sometimes easier to use to calculate variance.

Find \(E(X)\) and \(V(X)\) where \(X\) is the continuous random variable with the density recalled below.

Recall \(X\) has density function \(\displaystyle f(x) = 3x^2/8\) for \(0 \leq x \leq 2\).

Expected Value: \[\begin{align*} E(X) &= \int_0^2 x \cdot 3x^2/8~dx \\ &= \frac{3}{8} \int_0^2 x^3~dx \\ &= \frac{3}{8}\frac{1}{4}x^4 ~\biggr|_0^2 \\ &= \frac{3}{2}. \end{align*}\]

Variance: We first find \(E(X^2)\): \[\begin{align*} E(X^2) &= \int_0^2 x^2 \cdot 3x^2/8~dx \\ &= \frac{3}{8} \int_0^2 x^4~dx \\ &= \frac{3}{8}\frac{1}{5}x^5 ~\biggr|_0^2 \\ &= \frac{12}{5}. \end{align*}\]

Then, \[\begin{align*} V(X) &= E(X^2) - E(X)^2 \\ &= (12/5) - (3/2)^2\\ &= 0.15. \end{align*}\]
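
These integrals can also be checked numerically in R with integrate():

f   <- function(x) 3 * x^2 / 8                          # density on [0, 2]
EX  <- integrate(function(x) x   * f(x), 0, 2)$value    # E(X)   = 1.5
EX2 <- integrate(function(x) x^2 * f(x), 0, 2)$value    # E(X^2) = 2.4
EX2 - EX^2                                              # V(X)   = 0.15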

The properties of expected value that held for discrete random variables also hold for continuous random variables.

Note

Suppose \(X\) is a continuous random variable, \(c \in \mathbb{R}\) is a constant, and \(g,\) \(g_1,\) and \(g_2\) are functions of \(X\).

  1. \(E(c) = c\).
  2. \(E(c\cdot g(X))= cE(g(X))\).
  3. \(E(g_1(X) \pm g_2(X)) = E(g_1(X)) \pm E(g_2(X))\).

These results follow immediately from properties of integration. For instance, to prove property 1 we observe that for constant \(c,\) \[E(c) = \int_{-\infty}^\infty c\cdot f(x)~ dx = c \int_{-\infty}^\infty f(x)~ dx,\] and the integral in the last expression equals 1 by definition of a valid probability density function.

Let \(X\) be a random variable (discrete or continuous) with \(E(X) = \mu\) and \(V(X) = \sigma^2,\) and let \(a, b\) be constants. Then

  1. \(\displaystyle E(aX + b) = aE(X) + b = a \mu + b.\)
  2. \(\displaystyle V(aX + b) = a^2V(X) = a^2 \sigma^2.\)

Proof.

  1. This result follows immediately from the properties of expected value above.

  2. Let \(Y = aX + b\). Then part 1 says that \(E(Y) = a \mu + b,\) so \[\begin{align*} V(Y) &= E((Y-(a\mu + b))^2) \\ &= E\left(((aX+b)-(a\mu + b))^2\right)\\ &= E\left((aX-a\mu)^2\right)\\ &= a^2 E\left((X-\mu)^2\right) \end{align*}\] But \(E\left((X-\mu)^2\right)=V(X)\) by the definition of variance, so the result follows.
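
A quick Monte Carlo sanity check of these two properties; the distribution and the constants \(a\) and \(b\) below are arbitrary choices.

set.seed(1)
x <- rnorm(1e5, mean = 3, sd = 2)   # E(X) = 3, V(X) = 4 (arbitrary example)
a <- 5; b <- 7
mean(a * x + b)                     # approximately a*mu + b = 22
var(a * x + b)                      # approximately a^2 * sigma^2 = 100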

Moments and Moment-Generating Functions for Discrete Random Variables

Moment generating functions (MGFs), probability generating functions (PGFs), and characteristic functions provide a way of representing pdfs/pmfs through functions of a single variable. They are useful in many ways, including the following:

  1. They provide an easy way of calculating the moments of a distribution, which helps in computing the mean and variance of different random variables.
  2. They provide some powerful tools for addressing certain counting and combinatorial problems.
  3. They provide an easy way of characterizing the distribution of the sum of independent random variables.
  4. They provide a bridge between complex analysis and probability, so that complex-analytic methods can be brought to bear on probability problems.
  5. They provide powerful tools for proving limiting theorems such as the law of large numbers and the central limit theorem.

For a random variable \(X\) we have seen that \(E(X)\) and \(E(X^2)\) provide useful information:

  • \(\mu = E(X)\) gives the mean of the distribution
  • \(\sigma^2 = E(X^2) - E(X)^2\) gives the variance of the distribution.
Important

Let \(X\) be a random variable, and \(k \geq 1\). The \(k\)th moment of \(X\) about the origin is \(E(X^k)\). More generally, for any constant \(c \in \mathbb{R},\) \(E((X-c)^k)\) is called the \(k\)th moment of \(X\) about \(x = c\).

Often we can encode all the moments of a random variable in an object called a moment-generating function.

Note

Let \(X\) be a discrete random variable with density function \(p(x)\). If there is a positive real number \(h\) such that for all \(t \in (-h,h),\) \[E(e^{tx})\] exists and is finite, then the function of \(t\) defined by \[m(t) = E(e^{tx})\] is called the moment-generating function of \(X\).

Suppose \(X\) has the density function \[ \begin{array}{c|c|c|c|c} x & 0 & 1 & 2 & 3 \\ \hline p(x) & .1 & .2 & .3 & .4 \end{array} \]

Then, for any real number \(t,\)

\[\begin{align*} m(t) &= E(e^{tx}) \\ &= \sum_{x=0}^3 e^{tx}\cdot p(x)\\ &= e^0\cdot (.1) +e^t\cdot (.2)+e^{2t}\cdot (.3) +e^{3t}\cdot (.4)\\ &= .1 + .2e^t + .3e^{2t} + .4e^{3t}, \end{align*}\]

and this sum exists as a finite number for any \(-\infty < t < \infty,\) so the mgf for \(X\) exists.

How does \(m(t)\) encode the moments \(E(X), E(X^2), E(X^3), \ldots\)?

Suppose \(X\) is a random variable with moment-generating function \(m(t)\) which exists for \(t\) in some open interval containing 0. Then the \(k\)th moment of \(X\) equals the \(k\)th derivative of \(m(t)\) evaluated at \(t = 0\): \[E(X^k) = m^{(k)}(0).\]

Proof. Let’s say \(X\) is discrete and \[m(t) = \sum_{\text{all }x} e^{tx}\cdot p(x).\] Then the derivative of \(m(t)\) with respect to the variable \(t\) is \[m^\prime(t) = \sum_{\text{all }x} x\cdot e^{tx}\cdot p(x),\] and letting \(t = 0\) we have \[m^\prime(0) = \sum_{\text{all }x} x\cdot e^{0}\cdot p(x),\] which equals \(E(X)\) since \(e^0 = 1\).

The second derivative of \(m(t)\) is \[\begin{align*} m^{\prime\prime}(t) &= \frac{d}{dt}\left[m^\prime(t)\right]\\ &=\sum_{\text{all }x} x^2\cdot e^{tx}\cdot p(x) \end{align*}\]

Evaluating this at \(t = 0\) gives \[m^{\prime\prime}(0)=\sum_{\text{all }x} x^2\cdot 1 \cdot p(x) = E(X^2).\]

Continuing in this manner, for any \(k \geq 1,\) the \(k\)th derivative of \(m(t)\) is \[m^{(k)}(t)=\sum_{\text{all }x} x^k\cdot e^{tx}\cdot p(x),\] which evaluates to the definition of \(E(X^k)\) when \(t = 0\).
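
This can be checked numerically for the mgf of the small example above, using a central-difference approximation to \(m^\prime(0)\):

m <- function(t) .1 + .2*exp(t) + .3*exp(2*t) + .4*exp(3*t)   # mgf from the example above
h <- 1e-5
(m(h) - m(-h)) / (2 * h)                  # numerical m'(0), approximately 2
sum(c(0, 1, 2, 3) * c(.1, .2, .3, .4))    # E(X) computed directly, exactly 2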

The mgf for a geometric distribution

If \(X\) is geometric with parameter \(p,\) then \[p(x) = (1-p)^{x-1}\cdot p,\] for \(x = 1, 2, 3, \ldots,\) and

\[\begin{align*} m(t) &= E(e^{tx})\\ &= \sum_{x = 1}^\infty e^{tx}(1-p)^{x-1}\cdot p\\ &= pe^t \sum_{x=1}^\infty e^{t(x-1)}(1-p)^{x-1} &\text{since }e^t\cdot e^{t(x-1)} = e^{tx}\\ &= pe^t \sum_{x=1}^\infty[e^t(1-p)]^{x-1} &\\ &= pe^t \sum_{k=0}^\infty[e^t(1-p)]^{k} &\text{where }k=x-1 \text{ is a change of index}\\ &= \frac{pe^t}{1-e^t(1-p)} \end{align*}\]

The last step is true by the geometric series formula, provided \(|e^t(1-p)|<1\). Since \(0\leq |e^t(1-p)| = e^t(1-p),\) the series converges by the geometric series formula if and only if \(e^t(1-p) < 1\). Well,

\[\begin{align*} e^t(1-p) < 1 &\iff e^t < \frac{1}{1-p} \\ &\iff t < \ln\left(\frac{1}{1-p}\right). \end{align*}\]

In other words, yes, there exists an interval containing 0 for which \(m(t)\) exists for all \(t\) in the interval.
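
A numerical sketch of this result, using hypothetical values of \(p\) and \(t\) (with \(t\) inside the interval of convergence) and truncating the infinite sum:

p <- 0.3; t <- 0.2                      # hypothetical values; note t < log(1/(1-p)), about 0.357
x <- 1:2000                             # truncate the infinite sum (the tail is negligible)
sum(exp(t * x) * (1 - p)^(x - 1) * p)   # direct computation of E(e^{tX})
p * exp(t) / (1 - exp(t) * (1 - p))     # closed-form mgf; the two values agree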

The mgf for a Poisson distribution

Find the mgf of a Poisson random variable \(X\) with parameter \(\lambda\). Since we’re considering a Poisson distribution, our strategy for finding the mgf will be to rewrite the expectation so that it contains a power series for \(e^{\text{junk}}\).

Strategy: Work our series to include \[\sum_{x=0}^\infty\frac{(\text{junk})^x}{x!}\] since this converges to \(e^{\text{junk}}\).

\[\begin{align*} m(t) &= E(e^{tx})\\ &= \sum_{x = 0}^\infty e^{tx}\frac{\lambda^x e^{-\lambda}}{x!}\\ &= e^{-\lambda} \sum_{x=0}^\infty \frac{(\lambda e^t)^x}{x!} &\text{here it is!}\\ &= e^{-\lambda}e^{[\lambda e^t]} &\text{for all } -\infty < t < \infty\\ &= e^{\lambda(e^t-1)}. \end{align*}\]

Let’s derive our \(\mu\) and \(\sigma\) formulas for a Poisson random variable using the mgf.

The first derivative is \[m^\prime(t) = e^{\lambda(e^t-1)} \cdot \lambda e^t,\] and \(m^\prime(0) = e^{\lambda(1-1)}\cdot \lambda e^0 = \lambda.\)

The second derivative is \[m^{\prime\prime}(t) = (e^{\lambda(e^t-1)} \cdot \lambda e^t) \cdot \lambda e^t + e^{\lambda(e^t-1)} \cdot \lambda e^t,\] so \[m^{\prime\prime}(0) = \lambda^2 + \lambda.\]

Now \[\mu = m^\prime(0) = \lambda,\] check! And, \[\sigma^2 = m^{\prime\prime}(0) - [m^\prime(0)]^2 = (\lambda^2 + \lambda) - \lambda^2 = \lambda,\] check again!
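
A numerical check with a hypothetical \(\lambda\), using finite-difference approximations to the first two derivatives of the mgf at 0:

lambda <- 4                                   # hypothetical rate
m <- function(t) exp(lambda * (exp(t) - 1))   # Poisson mgf
h <- 1e-4
(m(h) - m(-h)) / (2 * h)                      # m'(0), approximately lambda = 4
(m(h) - 2 * m(0) + m(-h)) / h^2               # m''(0), approximately lambda^2 + lambda = 20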

Moments and Moment-Generating Functions For Normal Random Variables

Recall that the moment-generating function (mgf) associated with a discrete random variable \(X,\) should it exist, is given by \[m_X(t) = E(e^{tX})\] where the function is defined on some open interval of \(t\) values containing 0. The same definition applies to continuous random variables. We have seen that this mgf encodes information about \(X\): the \(k\)th derivative of \(m\) evaluated at \(t = 0\) gives us the \(k\)th moment. That is, for \(k = 1,2,3,\ldots,\) \[m_X^{(k)}(0) = E(X^k).\]

In fact, it turns out that the mgf gives us all the information about a random variable \(X,\) per the following theorem, whose proof is beyond the scope of this course.

Let \(m_X(t)\) and \(m_Y(t)\) denote the mgfs of random variables \(X\) and \(Y,\) respectively. If both mgfs exist and \(m_X(t) = m_Y(t)\) for all values of \(t\) then \(X\) and \(Y\) have the same probability distribution.

Find the mgf for the standard normal random variable \(Z \sim N(0,1)\).

\[\begin{align*} m_Z(t) &= E(e^{tZ})\\ &= \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}}e^{-z^2/2}\cdot e^{tz}~dz\\ &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty e^{tz-z^2/2}~dz\\ &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty e^{-\frac{1}{2}(z-t)^2+\frac{1}{2}t^2}~dz &\text{complete the square}\\ &= e^{\frac{1}{2}t^2}\left[\frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty e^{-\frac{1}{2}(z-t)^2}~dz\right] \end{align*}\]

The bracketed portion of this last expression equals 1, for all \(t,\) since it is the integral of the density function of a \(N(t,1)\) distribution, so \[m_Z(t) = e^{\frac{1}{2}t^2},\] for all \(-\infty < t < \infty\).
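
This closed form can be checked against direct numerical integration of \(E(e^{tZ})\) for a hypothetical value of \(t\):

t <- 0.7                                                        # hypothetical value
integrate(function(z) exp(t * z) * dnorm(z), -Inf, Inf)$value   # E(e^{tZ}) numerically
exp(t^2 / 2)                                                    # closed form; the values agree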

More generally, for \(X \sim N(\mu,\sigma),\) one can show its mgf is

\[\begin{equation} m(t) = e^{\left(\mu t + \frac{\sigma^2}{2}t^2\right)} \end{equation}\]

We now return to the proof of the standardization result stated earlier, which we restate as the following lemma.

If \(X\) is \(N(\mu,\sigma)\) and \(Z = \frac{X-\mu}{\sigma},\) then \(Z\) is \(N(0,1)\).

Note

Let \(X\) be \(N(\mu,\sigma),\) and \(Z = \frac{X-\mu}{\sigma}\). Then the mgf for \(Z\) is

\[\begin{align*} m_Z(t) &= E\left[e^{tZ}\right]\\ &= E\left[e^{t\left(\frac{X-\mu}{\sigma}\right)}\right]\\ &= E\left[e^{\frac{Xt}{\sigma} - \frac{\mu t}{\sigma}}\right]\\ &= E\left[e^{Xt/\sigma} \cdot e^{-\mu t/\sigma}\right] \\ &= e^{-\mu t/\sigma}\cdot E\left[e^{Xt/\sigma}\right]\\ &= e^{-\mu t/\sigma}\cdot m_X(t/\sigma) \end{align*}\] This last step follows because \(\displaystyle E\left[e^{Xt/\sigma}\right]\) is the mgf of \(X\) evaluated at \(t/\sigma\). Then,

\[\begin{align*} m_Z(t) &= e^{-\mu t/\sigma}\cdot e^{\left(\mu (t/\sigma) + \frac{\sigma^2}{2}(t/\sigma)^2\right)} \\ &= e^{t^2/2} \end{align*}\]

But hey! This mgf is the mgf for \(N(0,1),\) so by the uniqueness theorem above, since \(Z = (X-\mu)/\sigma\) and \(N(0,1)\) have the same mgf, they have the same probability distribution.

If \(Z\) is \(N(0,1)\) then \(Z^2\) is \(\chi^2(1)\).

The proof of this lemma is left for now.

Note

Let \(X_1, X_2, \ldots, X_n\) be independent random variables with mgfs \(m_1(t), m_2(t), \ldots m_n(t),\) respectively. If \(S_n = X_1 + X_2 + \cdots + X_n\) then \[m_{S_n}(t) = m_1(t) \cdot m_2(t) \cdot ~\cdots~ \cdot m_n(t).\]

Sketch of Proof:

\[\begin{align*} m_{S_n}(t) &= E\left[e^{t{S_n}}\right]\\ &= E\left[e^{t(X_1 + X_2 + \cdots X_n)}\right]\\ &= E\left[e^{tX_1}\cdot\ e^{tX_2} \cdot ~\cdots~ \cdot e^{tX_n}\right]\\ &= E\left[e^{tX_1}\right] \cdot E\left[e^{tX_2}\right] \cdot ~\cdots~ \cdot E\left[e^{tX_n}\right]\\ &= M_{X_1}(t) \cdot M_{X_2}(t) \cdot ~\cdots~ \cdot M_{X_n}(t) \end{align*}\]

Note

Let \(X_1, X_2, \ldots, X_n\) be independent random variables from a common distribution with mgf \(m(t)\) and distribution function \(F(x)\). If \(S_n = X_1 + X_2 + \cdots + X_n,\) then \[m_{S_n}(t) = m_1(t) \cdot m_2(t) \cdot ~\cdots~ \cdot m_n(t)=[m(t)]^n.\]

That the expectation distributes through the product in the fourth line of the sketch above follows because the \(X_i\) are assumed to be independent.
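
A small simulation sketch of the product rule, using two independent Poisson variables with hypothetical values of \(\lambda\) and \(t\):

set.seed(1)
lambda <- 2; t <- 0.3                           # hypothetical values
s <- rpois(1e5, lambda) + rpois(1e5, lambda)    # S_2 = X_1 + X_2, independent
mean(exp(t * s))                                # Monte Carlo estimate of m_{S_2}(t)
exp(lambda * (exp(t) - 1))^2                    # [m(t)]^2 from the Poisson mgf; roughly equal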

Note

Let \(X_1, X_2, \ldots, X_n\) be independent normal random variables with \(X_i \sim N(\mu_i, \sigma_i),\) and let \(a_1, a_2, \ldots, a_n\) be constants. If \[S_n = \sum_{i=1}^n a_i X_i,\] then \(S_n\) is normally distributed with \[\mu = \sum_{i=1}^n a_i \mu_i ~~~ \text{ and } ~~~ \sigma^2 = \sum_{i=1}^n a_i^2 \sigma_i^2.\]

Note

Since \(X_i\) is \(N(\mu_i,\sigma_i),\) \(X_i\) has mgf \[m_{X_i}(t) = e^{\left(\mu_it + \sigma_i^2t^2/2\right)}.\] For constant \(a_i,\) the random variable \(a_iX_i\) has mgf \[m_{a_iX_i}(t) =E(e^{a_iX_it}) = m_{X_i}(a_it) = e^{\left(\mu_ia_it + a_i^2\sigma_i^2t^2/2\right)}.\] Then by the product theorem for mgfs and properties of exponents, for \(S_n = \sum a_i X_i,\) \[\begin{align*} m_{S_n}(t) &= \prod_{i=1}^n m_{a_iX_i}(t) \\ &= \prod_{i=1}^n e^{\left(\mu_ia_it + a_i^2\sigma_i^2t^2/2\right)}\\ &= e^{\left(t\sum a_i\mu_i + \frac{t^2}{2}\sum a_i^2\sigma_i^2\right)} \end{align*}\]

But hey! This is the mgf for a normal distribution with mean \(\sum a_i \mu_i\) and variance \(\sum a_i^2 \sigma_i^2,\) so we have proved the result.

Let \(X_1, X_2, \ldots, X_n\) be independent normal random variables with \(X_i \sim N(\mu_i, \sigma_i),\) and \(\displaystyle Z_i = \frac{X_i - \mu_i}{\sigma_i}\) for \(i = 1, \ldots, n\). Then \[U = \sum_{i=1}^n Z_i^2\] is \(\chi^2(n)\).

Note

Suppose the number of customers arriving at a particular checkout counter in an hour follows a Poisson distribution. Let \(X_1\) record the time until the first arrival, \(X_2,\) the time between the 1st and 2nd arrival, and so on, up to \(X_n,\) the time between the \((n-1)\)st and \(n\)th arrival. Then it turns out the \(X_i\) are independent, and each is an exponential random variable with density \[f_{X_i}(x_i) = \frac{1}{\theta}e^{-x_i/\theta},\] for \(x_i > 0\) (and 0 else). Find the density function for the waiting time \(U\) until the \(n\)th customer arrives.

Well, \(U = X_1 + X_2 + \cdots + X_n,\) and each \(X_i\) is exponential with mgf \(m_i(t) = (1-\theta t)^{-1},\) so by the product theorem for mgfs, \[m_U(t) = m_1(t)\cdot ~\cdots~ \cdot m_n(t) = (1-\theta t)^{-n}.\] But, hey! This is the mgf for a gamma\((\alpha = n, \beta = \theta)\) random variable, so by the uniqueness theorem, \(U\) is gamma\((n,\theta)\). So \[f_U(u) = \frac{1}{(n-1)!\theta^n}u^{n-1}e^{-u/\theta},\] for \(u > 0\) (and 0 else).
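
For hypothetical values of \(n\) and \(\theta\), this density matches R's built-in dgamma() with shape \(n\) and scale \(\theta\):

n <- 5; theta <- 2; u <- 7                                   # hypothetical values
dgamma(u, shape = n, scale = theta)                          # built-in gamma(n, theta) density at u
u^(n - 1) * exp(-u / theta) / (factorial(n - 1) * theta^n)   # the formula above; identical value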

Note

If \(Y_1\) is \(N(10,.5)\) and \(Y_2\) is \(N(4,.2)\) and \(U = 100 + 7Y_1 + 3Y_2,\) how is \(U\) distributed, and what value marks the 90th percentile for \(U\)?

The theorem above on linear combinations of independent normal random variables says that \(U\) is normal with \[E(U) = 100 + 7 \cdot 10 + 3 \cdot 4 = 182,\] and \[V(U) = 0 + 7^2\cdot (.5)^2 + 3^2\cdot(.2)^2 = 12.61,\] so \(\sigma_U = \sqrt{12.61} = 3.55.\)

The 90th percentile can be found in R with the qnorm() function:

qnorm(.9,mean=182,sd=3.55)
[1] 186.5495
MGF for a Uniform Distribution

Find the moment-generating function for \(X \sim U(\theta_1, \theta_2)\).

\[\begin{align*} m_X(t) &= E(e^{tX})\\ &= \int_{\theta_1}^{\theta_2} e^{tx}\frac{1}{\theta_2-\theta_1}~dx\\ &= \frac{1}{\theta_2-\theta_1}\cdot \frac{1}{t}e^{tx}~\biggr|_{\theta_1}^{\theta_2} \\ &= \frac{e^{t\theta_2}-e^{t\theta_1}}{t(\theta_2-\theta_1)}, \end{align*}\] for \(t \neq 0\) (and \(m_X(0) = 1\)).
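
A quick numerical check of this formula with hypothetical endpoints and a hypothetical \(t\):

th1 <- 1; th2 <- 4; t <- 0.5                                      # hypothetical values
integrate(function(x) exp(t * x) / (th2 - th1), th1, th2)$value   # E(e^{tX}) numerically
(exp(t * th2) - exp(t * th1)) / (t * (th2 - th1))                 # closed form; the values agree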

MGF for a Gamma Distribution

Find the moment-generating function for \(X \sim \text{gamma}(\alpha,\beta)\) and compute \(E(X)\) and \(V(X)\).

\[\begin{align*} m_X(t) &= E(e^{tX})\\ &= \int_{0}^{\infty} e^{tx} \cdot \frac{1}{\beta^\alpha \Gamma(\alpha)}x^{\alpha-1}e^{-(x/\beta)}~dx\\ &= \frac{1}{\beta^\alpha \Gamma(\alpha)} \int_{0}^{\infty} x^{\alpha - 1}e^{-x(1/\beta-t)}~dx\\ &= \frac{1}{\beta^\alpha \Gamma(\alpha)} \cdot \left(\frac{1}{1/\beta - t}\right)^\alpha \Gamma(\alpha) \int_{0}^{\infty} \frac{x^{\alpha - 1}e^{-x(1/\beta-t)}}{\left(\frac{1}{1/\beta - t}\right)^\alpha \Gamma(\alpha)}\cdot ~dx\\ &= \frac{1}{\beta^\alpha \Gamma(\alpha)} \cdot \left(\frac{1}{1/\beta - t}\right)^\alpha \Gamma(\alpha) \end{align*}\]

The last integral above evaluates to 1 because its integrand is the pdf of a gamma distribution with shape \(\alpha\) and scale \((1/\beta - t)^{-1}\)! After simplifying we obtain \[m_X(t) = (1-\beta t)^{-\alpha},\] valid for \(t < 1/\beta\).

With the mgf for a gamma random variable in hand, we can now derive its mean and variance.

\[\begin{align*} m_X^\prime(t) &= -\alpha(1-\beta t)^{-\alpha-1}\cdot(-\beta) \\ &= \alpha\beta(1-\beta t)^{-\alpha-1}, \end{align*}\] so \[E(X) = m_X^\prime(0) = \alpha\beta.\] Turning to the second derivative, \[\begin{align*} m_X^{\prime\prime}(t) &= (-\alpha-1)\alpha\beta(1-\beta t)^{-\alpha-2}\cdot(-\beta)\\ &= \alpha(\alpha+1)\beta^2(1-\beta t)^{-\alpha-2}, \end{align*}\] so \[E(X^2) = m_X^{\prime\prime}(0) = \alpha(\alpha+1)\beta^2.\] Thus, \[V(X) = E(X^2)-E(X)^2 = \alpha(\alpha+1)\beta^2 - (\alpha\beta)^2 = \alpha\beta^2.\]
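
These moment formulas can be checked numerically for hypothetical \(\alpha\) and \(\beta\), using finite differences of the mgf at 0:

alpha <- 3; beta <- 2                       # hypothetical shape and scale
m <- function(t) (1 - beta * t)^(-alpha)    # gamma mgf, valid for t < 1/beta
h <- 1e-4
m1 <- (m(h) - m(-h)) / (2 * h)              # m'(0), approximately alpha*beta = 6
m2 <- (m(h) - 2 * m(0) + m(-h)) / h^2       # m''(0), approximately alpha*(alpha+1)*beta^2 = 48
m2 - m1^2                                   # V(X), approximately alpha*beta^2 = 12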

Moment generating function

Moment generating function properties:

  1. \(\frac{d^k(m_X(t))}{dt^k}|_{t=0}=E[X^k]\)
  2. \(\mu=E[X]=m_X'(0)\)
  3. \(E[X^2]=m_X''(0)\)

mgf Theorems

Let \(X_1,X_2,...X_n,Y\) be random variables with moment-generating functions \(m_{X_1}(t),m_{X_2}(t),...,m_{X_n}(t),m_{Y}(t)\)

  1. If \(m_{X_1}(t)=m_{X_2}(t)\) for all t in some open interval about 0, then \(X_1\) and \(X_2\) have the same distribution
  2. If \(Y = \alpha + \beta X_1\), then \(m_{Y}(t)= e^{\alpha t}m_{X_1}(\beta t)\)
  3. If \(X_1,X_2,...X_n\) are independent and \(Y = \alpha_0 + \alpha_1 X_1 + \alpha_2 X_2 + ... + \alpha_n X_n\) (where \(\alpha_0, ... ,\alpha_n\) are real numbers), then \(m_{Y}(t)=e^{\alpha_0 t}m_{X_1}(\alpha_1t)m_{X_2}(\alpha_2 t)...m_{X_n}(\alpha_nt)\)
  4. Suppose \(X_1,X_2,...X_n\) are independent normal random variables with means \(\mu_1,\mu_2,...\mu_n\) and variances \(\sigma^2_1,\sigma^2_2,...,\sigma^2_n\). If \(Y = \alpha_0 + \alpha_1 X_1 + \alpha_2 X_2 + ... + \alpha_n X_n\) (where \(\alpha_0, ... ,\alpha_n\) are real numbers), then Y is normally distributed with mean \(\mu_Y = \alpha_0 + \alpha_1 \mu_1 +\alpha_2 \mu_2 + ... + \alpha_n \mu_n\) and variance \(\sigma^2_Y = \alpha_1^2 \sigma_1^2 + \alpha_2^2 \sigma_2^2 + ... + \alpha_n^2 \sigma_n^2\)

Moment

\[
\begin{array}{c|c|c}
\text{Moment} & \text{Uncentered} & \text{Centered} \\ \hline
\text{1st} & E(X)=\mu=\text{Mean}(X) & \\
\text{2nd} & E(X^2) & E((X-\mu)^2)=\text{Var}(X)=\sigma^2 \\
\text{3rd} & E(X^3) & E((X-\mu)^3) \\
\text{4th} & E(X^4) & E((X-\mu)^4)
\end{array}
\]

Skewness(X) = \(E((X-\mu)^3)/\sigma^3\)

Kurtosis(X) = \(E((X-\mu)^4)/\sigma^4\)
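
These ratios are easy to estimate from simulated data; the Exp(1) distribution below is an arbitrary example, for which the theoretical skewness is 2 and the (non-excess) kurtosis is 9.

set.seed(1)
x <- rexp(1e5, rate = 1)         # arbitrary example distribution
mu <- mean(x); s <- sd(x)
mean((x - mu)^3) / s^3           # sample skewness, approximately 2
mean((x - mu)^4) / s^4           # sample kurtosis, approximately 9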

Variate Transformations

Transformations and Expectations

Distributions of Functions of a Random Variable

If X is a random variable with cdf \(F_X(x)\), then any function of X, say g(X), is also a random variable. If we set Y = g(X), then for any set A,

\[P(Y \in A) = P(g(X) \in A)\]

Formally, if we write y = g(x), the function g(x) defines a mapping from the original sample space of X, \(\mathcal{X}\), to a new sample space, \(\mathcal{Y}\), the sample space of the random variable Y. That is,

\[g(x): \mathcal{X} \to \mathcal{Y}\]

We associate with g an inverse mapping, denoted by \(g^{-1}\),

\[g^{-1}(A) = \{ x \in \mathcal{X}: g(x) \in A\}\]

If the random variable Y is now defined by Y = g(X), we can write for any set \(A \subset \mathcal{Y}\),

\[P(Y \in A) = P(g(X) \in A) = P(\{x\in\mathcal{X}: g(x) \in A\}) = P(X \in g^{-1}(A))\]

If Y is a discrete random variable, the pmf for Y is

\[f_Y(y) = P(Y=y) = \sum_{x \in g^{-1}(y)}P(X=x) =\sum_{x \in g^{-1}(y)}f_X(x),\text{ for }y \in \mathcal{Y}\]

It’s easiest to deal with functions g(x) that are monotone, that is, those that are either increasing or decreasing. If the transformation \(x \mapsto g(x)\) is monotone, then it is one-to-one and onto from \(\mathcal{X}\) to \(\mathcal{Y}\).

Theorem 2.1.3

Let X have cdf \(F_X(x)\), let Y = g(X), and let \(\mathcal{X} = \{ x: f_X(x) > 0\}\), \(\mathcal{Y} = \{y: y = g(x)\text{ for some }x \in \mathcal{X}\}\).

  • If g is an increasing function on \(\mathcal{X}\), \(F_Y(y) = F_X(g^{-1}(y))\text{ for }y \in \mathcal{Y}\)
  • If g is a decreasing function on \(\mathcal{X}\) and X is a continuous random variable, \(F_Y(y) = 1-F_X(g^{-1}(y))\text{ for }y \in \mathcal{Y}\)

Theorem 2.1.5

Let X have pdf \(f_X(x)\) and let \(Y=g(X)\), where g is a monotone function. Let \(\mathcal{X} = \{ x: f_X(x) > 0\}\), \(\mathcal{Y} = \{y: y = g(x)\text{ for some }x \in \mathcal{X}\}\). Suppose that \(f_X(x)\) is continuous on \(\mathcal{X}\) and that \(g^{-1}(y)\) has a continuous derivative on \(\mathcal{Y}\). Then the pdf of Y is given by

\[f_Y(y)= \begin{cases}f_X(g^{-1}(y))|\frac{d}{dy}g^{-1}(y)| & \quad y \in \mathcal{Y} \\ 0 & \quad \text{otherwise}\end{cases}\]
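
As a sketch of how these results can be checked by simulation, take the hypothetical choice \(X \sim \text{Exp}(1)\) and the increasing transformation \(g(x) = \sqrt{x}\), so that Theorem 2.1.3 gives \(F_Y(y) = F_X(g^{-1}(y)) = 1 - e^{-y^2}\) for \(y > 0\) and Theorem 2.1.5 gives \(f_Y(y) = 2y\,e^{-y^2}\):

set.seed(1)
x <- rexp(1e5, rate = 1)      # X with cdf F_X(x) = 1 - exp(-x), x > 0
y <- sqrt(x)                  # Y = g(X) with g increasing
mean(y <= 1.2)                # empirical P(Y <= 1.2)
1 - exp(-1.2^2)               # F_Y(1.2) = F_X(1.2^2); the two values agree closely
# (Theorem 2.1.5 gives the corresponding pdf f_Y(y) = 2*y*exp(-y^2) for y > 0)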