Expectation, Variance, Moment Generating Functions, Statistical Analysis, Biostatistics
| | Discrete Variable | Continuous Variable |
|---|---|---|
| Definition | A random variable is discrete if it can assume at most a finite or countably infinite number of possible values | A random variable is continuous if it can assume any value in some interval or intervals of real numbers and the probability that it assumes any specific value is 0 |
| Density Function | A function \(f\) is called a density for \(X\) if: (1) \(f(x) \ge 0\) for \(x\) real, (2) \(\sum_{\text{all } x}f(x)=1\), (3) \(f(x)=P(X=x)\) | A function \(f\) is called a density for \(X\) if: (1) \(f(x) \ge 0\) for \(x\) real, (2) \(\int_{-\infty}^{\infty} f(x)\,dx=1\), (3) \(P[a \le X \le b] =\int_{a}^{b} f(x)\,dx\) for \(a\) and \(b\) real |
| Cumulative Distribution Function (for \(x\) real) | \(F(x)=P[X \le x]\) | \(F(x)=P[X \le x]=\int_{-\infty}^{x}f(t)\,dt\) |
| \(E[H(X)]\) | \(\sum_{\text{all } x}H(x)f(x)\) | \(\int_{-\infty}^{\infty}H(x)f(x)\,dx\) |
| Mean \(\mu=E[X]\) | \(\sum_{\text{all } x}xf(x)\) | \(\int_{-\infty}^{\infty}xf(x)\,dx\) |
| \(k\)th ordinary moment \(E[X^k]\) | \(\sum_{\text{all } x}x^kf(x)\) | \(\int_{-\infty}^{\infty}x^kf(x)\,dx\) |
| Moment generating function (mgf) \(m_X(t)=E[e^{tX}]\) | \(\sum_{\text{all } x}e^{tx}f(x)\) | \(\int_{-\infty}^{\infty}e^{tx}f(x)\,dx\) |
Recall, a random variable is a real-valued function defined over a sample space, usually denoted by \(X\) or \(Y,\) and \(X\) is discrete if the space of \(X\) is finite or countably infinite.
If \(X\) is a discrete random variable with probability function \(p(x),\) then the expected value of \(X\), denoted \(E(X),\) is \[E(X) = \sum_{\text{all }x} x \cdot p(x).\] The expected value \(E(X)\) is also called the mean of \(X\), and is often denoted as \(\mu_X,\) or \(\mu\) if the random variable \(X\) is understood.
Let \(X\) be a discrete random variable with probability function \(p(x),\) and suppose \(g(X)\) is a real-valued function of \(X\). Then the expected value of \(g(X)\) is \[E(g(X)) = \sum_{\text{all }x} g(x) \cdot p(x).\]
If \(X\) is a random variable with expected value \(E(X) = \mu,\) the variance of \(X\), denoted \(V(X),\) is \[V(X) = E((X-\mu)^2).\] The variance of \(X\) is often denoted \(\sigma^2_X,\) or \(\sigma^2\) if the random variable is understood. Also, \(\sqrt{V(X)},\) denoted \(\sigma_X\) or \(\sigma,\) is called the standard deviation of \(X\).
Suppose \(X\) is a discrete random variable, \(c \in \mathbb{R}\) is a constant, and \(g,\) \(g_1,\) and \(g_2\) are functions of \(X\). Then: (1) \(E(c) = c\); (2) \(E(c\cdot g(X)) = c\,E(g(X))\); (3) \(E(g_1(X) \pm g_2(X)) = E(g_1(X)) \pm E(g_2(X))\).
Let’s take the time to prove these properties. Each of them essentially follows by properties of summations.
For property 1, note that \(E(c) = \sum_{\text{all }x} c\cdot p(x) = c\sum_{\text{all }x} p(x).\) Since the sum over all \(x\) of \(p(x)\) is 1 for any probability model, the result follows.
For property 2 we appeal to the theorem above on \(E(g(X))\): \[\begin{align*} E(c\cdot g(X)) &= \sum_{\text{all }x} c \cdot g(x) \cdot p(x) & \\ &= c \sum_{\text{all }x} g(x) p(x) &\text{by arithmetic}\\ &= c E(g(X)) & \end{align*}\]
For property 3 we again appeal to that theorem and arithmetic: \[\begin{align*} E(g_1(X) \pm g_2(X)) &= \sum_{\text{all }x} (g_1(x) \pm g_2(x))\cdot p(x) &\\ &= \sum_{\text{all }x} (g_1(x) p(x) \pm g_2(x) p(x)) &\text{by arithmetic}\\ &= \sum_{\text{all }x} g_1(x) p(x) \pm \sum_{\text{all }x} g_2(x) p(x) &\text{by arithmetic}\\ &= E(g_1(X)) \pm E(g_2(X)) & \end{align*}\]
Let \(X\) be a discrete random variable with probability function \(p(x)\) and expected value \(E(X) = \mu\). Then \[V(X) = E(X^2)-\mu^2.\]
By definition, \[\begin{align*} V(X) &= E((X-\mu)^2)\\ &= E(X^2 - 2\mu X + \mu^2) &\text{by expanding}\\ &= E(X^2) - E(2\mu X) + E(\mu^2) &\text{by E() Property 3} \\ &= E(X^2) - 2\mu E(X) + \mu^2 &\text{by E() Properties 2 and 1}\\ &= E(X^2) - 2\mu^2 + \mu^2 & \text{since }E(X)=\mu \\ V(X) &= E(X^2) - \mu^2. \end{align*}\]
Let \(X\) be a random variable with mean \(E(X) = \mu\) and finite variance \(V(X) = \sigma^2 > 0\). Then for any constant \(k > 0,\) \[P(|X - \mu| < k\sigma ) \geq 1 - \frac{1}{k^2}.\] Equivalently, \[P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}.\]
We prove Tchebysheff’s inequality in the case for a discrete random variable, and we come back to this theorem after defining continuous random variables.
Let \(k > 0\) be given.
Then \[V(X) = \sum_{\text{all }x} (x - \mu)^2 p(x),\] by the definition of variance. We can partition the space of \(X\) into three disjoint sets, depending on the location of \(x\) relative to \(\mu \pm k\sigma\):
\[V(X) = \sum_{\text{all } x \leq \mu - k\sigma} (x - \mu)^2 p(x) + \sum_{\text{all } x \text{ s.t. } |x-\mu|< k\sigma } (x - \mu)^2 p(x) + \sum_{\text{all } x \geq \mu + k\sigma} (x - \mu)^2 p(x)\]
Each of these three sums is non-negative, and for the first and third sums we can also say that \((x-\mu)^2 \geq k^2\sigma^2\) for all \(x\) in the given range, so it follows that \[V(X) \geq \sum_{\text{all } x \leq \mu - k\sigma} k^2\sigma^2 p(x) + 0 + \sum_{\text{all } x \geq \mu + k\sigma} k^2\sigma^2 p(x).\] So,
\[\begin{align*} \sigma^2 &\geq \sum_{\text{all } x \leq \mu - k\sigma} k^2\sigma^2 p(x) + 0 + \sum_{\text{all } x \geq \mu + k\sigma} k^2\sigma^2 p(x) \\ &= k^2\sigma^2 \left(\sum_{\text{all } x \leq \mu - k\sigma} p(x) + \sum_{\text{all } x \geq \mu + k\sigma} p(x) \right) \\ &= k^2\sigma^2\left(P(X\leq \mu-k\sigma)+P(X \geq \mu+k\sigma)\right) \\ &= k^2\sigma^2P(|X-\mu|\geq k\sigma) \end{align*}\]
Dividing both sides of the inequality by the positive value \(k^2\sigma^2\) gives us the result: \[P(|X-\mu| \geq k\sigma) \leq \frac{1}{k^2}.\]
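Tchebysheff's bound is easy to check empirically. Below is a minimal R sketch; the exponential(1) distribution, the seed, the sample size, and the values of \(k\) are illustrative choices, not part of the text.

```r
# Empirical check (not part of the proof): for an exponential(1) variable,
# compare P(|X - mu| >= k*sigma) with the bound 1/k^2.
set.seed(1)                         # assumed seed, for reproducibility
x <- rexp(1e5, rate = 1)            # exponential(1) has mu = 1 and sigma = 1
mu <- 1; sigma <- 1
for (k in c(1.5, 2, 3)) {
  emp <- mean(abs(x - mu) >= k * sigma)
  cat(sprintf("k = %.1f: empirical %.4f <= bound %.4f\n", k, emp, 1 / k^2))
}
```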
If \(X\) is a continuous random variable with probability density function \(f(x),\) then the expected value of \(X\), denoted \(E(X),\) is \[E(X) = \int_{-\infty}^\infty x \cdot f(x)~dx,\] provided this integral exists. The expected value \(E(X)\) is also called the mean of \(X\), and is often denoted as \(\mu_X,\) or \(\mu\) if the random variable \(X\) is understood.
The expected value of the function \(g(X)\) of \(X\) is \[E(g(X)) = \int_{-\infty}^\infty g(x) \cdot f(x)~dx,\] provided this integral exists.
The variance of \(X\) is \[V(X) = E((X-\mu_X)^2),\] provided this integral exists.
As in the discrete case, one can show \(V(X) = E(X^2)-E(X)^2,\) a working formula that is often easier to use when calculating variance.
Find \(E(X)\) and \(V(X)\) where \(X\) is the continuous random variable from the earlier example.
Recall \(X\) has density function \(\displaystyle f(x) = 3x^2/8\) for \(0 \leq x \leq 2\).
Expected Value: \[\begin{align*} E(X) &= \int_0^2 x \cdot 3x^2/8~dx \\ &= \frac{3}{8} \int_0^2 x^3~dx \\ &= \frac{3}{8}\frac{1}{4}x^4 ~\biggr|_0^2 \\ &= \frac{3}{2}. \end{align*}\]
Variance: We first find \(E(X^2)\): \[\begin{align*} E(X^2) &= \int_0^2 x^2 \cdot 3x^2/8~dx \\ &= \frac{3}{8} \int_0^2 x^4~dx \\ &= \frac{3}{8}\frac{1}{5}x^5 ~\biggr|_0^2 \\ &= \frac{12}{5}. \end{align*}\]
Then, \[\begin{align*} V(X) &= E(X^2) - E(X)^2 \\ &= (12/5) - (3/2)^2\\ &= 0.15. \end{align*}\]
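As a quick numerical sanity check of these calculations, here is a minimal R sketch using `integrate()` on the density above:

```r
# Numerical check of E(X) and V(X) for the density f(x) = 3x^2/8 on [0, 2]
f   <- function(x) 3 * x^2 / 8
EX  <- integrate(function(x) x   * f(x), lower = 0, upper = 2)$value  # expect 1.5
EX2 <- integrate(function(x) x^2 * f(x), lower = 0, upper = 2)$value  # expect 2.4
c(EX = EX, VX = EX2 - EX^2)                                           # expect 1.50, 0.15
```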
The properties of expected value that held for discrete random variables also hold for continuous random variables.
Suppose \(X\) is a continuous random variable, \(c \in \mathbb{R}\) is a constant, and \(g,\) \(g_1,\) and \(g_2\) are functions of \(X\). Then, just as in the discrete case: (1) \(E(c) = c\); (2) \(E(c\cdot g(X)) = c\,E(g(X))\); (3) \(E(g_1(X) \pm g_2(X)) = E(g_1(X)) \pm E(g_2(X))\).
These results follow immediately from properties of integration. For instance, to prove property 1 we observe that for constant \(c,\) \[E(c) = \int_{-\infty}^\infty c\cdot f(x)~ dx = c \int_{-\infty}^\infty f(x)~ dx,\] and the integral in the last expression equals 1 by definition of a valid probability density function.
Let \(X\) be a random variable (discrete or continuous) with \(E(X) = \mu\) and \(V(X) = \sigma^2,\) and let \(a, b\) be constants. Then (a) \(E(aX + b) = a\mu + b,\) and (b) \(V(aX + b) = a^2\sigma^2.\)
Proof.
Part (a) follows immediately from the properties of expected value.
Let \(Y = aX + b\). Then (a) says that \(E(Y) = a \mu + b,\) so \[\begin{align*} V(Y) &= E((Y-(a\mu + b))^2) \\ &= E\left(((aX+b)-(a\mu + b))^2\right)\\ &= E\left((aX-a\mu)^2\right)\\ &= a^2 E\left((X-\mu)^2\right) \end{align*}\] But \(E\left((X-\mu)^2\right)=V(X)\) by the definition of variance, so the result follows.
Moment generating functions (MGFs), probability generating functions (PGFs), and characteristic functions provide a way of representing pdfs/pmfs through functions of a single variable. They are useful in many ways; in particular, they give us a convenient way to compute moments and to identify distributions.
For random variable \(X\) we have seen that \(E(X)\) and \(E(X^2)\) provide useful information:
Let \(X\) be a random variable, and \(k \geq 1\). The \(k\)th moment of \(X\) about the origin is \(E(X^k)\). More generally, for any constant \(c \in \mathbb{R},\) \(E((X-c)^k)\) is called the \(k\)th moment of \(X\) about \(x = c\).
Often we can encode all the moments of a random variable in a single object called a moment-generating function.
Let \(X\) be a discrete random variable with density function \(p(x)\). If there is a positive real number \(h\) such that for all \(t \in (-h,h),\) \[E(e^{tX})\] exists and is finite, then the function of \(t\) defined by \[m(t) = E(e^{tX})\] is called the moment-generating function of \(X\).
Suppose \(X\) has the density function \[ \begin{array}{c|c|c|c|c} x & 0 & 1 & 2 & 3 \\ \hline p(x) & .1 & .2 & .3 & .4 \end{array} \]
Then, for any real number \(t,\)
\[\begin{align*} m(t) &= E(e^{tx}) \\ &= \sum_{x=0}^3 e^{tx}\cdot p(x)\\ &= e^0\cdot (.1) +e^t\cdot (.2)+e^{2t}\cdot (.3) +e^{3t}\cdot (.4)\\ &= .1 + .2e^t + .3e^{2t} + .4e^{3t}, \end{align*}\]
and this sum exists as a finite number for any \(-\infty < t < \infty,\) so the mgf for \(X\) exists.
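As a preview of what comes next, here is a small R sketch (using the pmf above) comparing a finite-difference estimate of \(m^\prime(0)\) with the mean computed directly; the step size `h` is an arbitrary choice.

```r
# mgf of the tabled pmf, with a finite-difference check that m'(0) = E(X)
x <- 0:3
p <- c(.1, .2, .3, .4)
m <- function(t) sum(exp(t * x) * p)
h <- 1e-6
c(numerical_mprime_0 = (m(h) - m(-h)) / (2 * h),  # central difference at t = 0
  EX                 = sum(x * p))                # both should equal 2
```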
How does \(m(t)\) encode the moments \(E(X), E(X^2), E(X^3), \ldots\)?
Suppose \(X\) is a random variable with moment-generating function \(m(t)\) which exists for \(t\) in some open interval containing 0. Then the \(k\)th moment of \(X\) equals the \(k\)th derivative of \(m(t)\) evaluated at \(t = 0\): \[E(X^k) = m^{(k)}(0).\]
Proof. Let’s say \(X\) is discrete and \[m(t) = \sum_{\text{all }x} e^{tx}\cdot p(x).\] Then the derivative of \(m(t)\) with respect to the variable \(t\) is \[m^\prime(t) = \sum_{\text{all }x} x\cdot e^{tx}\cdot p(x),\] and letting \(t = 0\) we have \[m^\prime(0) = \sum_{\text{all }x} x\cdot e^{0}\cdot p(x),\] which equals \(E(X)\) since \(e^0 = 1\).
The second derivative of \(m(t)\) is \[\begin{align*} m^{\prime\prime}(t) &= \frac{d}{dt}\left[m^\prime(t)\right]\\ &=\sum_{\text{all }x} x^2\cdot e^{tx}\cdot p(x) \end{align*}\]
Evaluating this at \(t = 0\) gives \[m^{\prime\prime}(0)=\sum_{\text{all }x} x^2\cdot 1 \cdot p(x) = E(X^2).\]
Continuing in this manner, for any \(k \geq 1,\) the \(k\)th derivative of \(m(t)\) is \[m^{(k)}(t)=\sum_{\text{all }x} x^k\cdot e^{tx}\cdot p(x),\] which evaluates to the definition of \(E(X^k)\) when \(t = 0\).
If \(X\) is geometric with parameter \(p,\) then \[p(x) = (1-p)^{x-1}\cdot p,\] for \(x = 1, 2, 3, \ldots,\) and
\[\begin{align*} m(t) &= E(e^{tX})\\ &= \sum_{x = 1}^\infty e^{tx}(1-p)^{x-1}\cdot p\\ &= pe^t \sum_{x=1}^\infty e^{t(x-1)}(1-p)^{x-1} &\text{since }e^t\cdot e^{t(x-1)} = e^{tx}\\ &= pe^t \sum_{x=1}^\infty[e^t(1-p)]^{x-1} \\ &= pe^t \sum_{k=0}^\infty[e^t(1-p)]^{k} &\text{where }k=x-1 \text{ is a change of index}\\ &= pe^t\frac{1}{1-e^t(1-p)} \end{align*}\]
The last step is true by the geometric series formula, provided \(|e^t(1-p)|<1\). Since \(0\leq |e^t(1-p)| = e^t(1-p),\) the series converges by the geometric series formula if and only if \(e^t(1-p) < 1\). Well,
\[\begin{align*} e^t(1-p) < 1 &\iff e^t < \frac{1}{1-p} \\ &\iff t < \ln\left(\frac{1}{1-p}\right). \end{align*}\]
In other words, yes, there exists an interval containing 0 for which \(m(t)\) exists for all \(t\) in the interval.
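As a sanity check of the closed form, here is a small R sketch with illustrative values \(p = 0.3\) and \(t = 0.1\); note that R's `rgeom()` counts failures before the first success, so we add 1 to match the trials-counting convention used here.

```r
# Monte Carlo check of the geometric mgf at one admissible t (illustrative values)
set.seed(2)
p <- 0.3
t <- 0.1                                   # note: t < ln(1/(1-p)) is about 0.357 here
x <- rgeom(1e5, prob = p) + 1              # rgeom counts failures, so +1 gives trials
c(monte_carlo = mean(exp(t * x)),
  closed_form = p * exp(t) / (1 - exp(t) * (1 - p)))
```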
Find the mgf of a Poisson random variable \(X\) with parameter \(\lambda\). Since we’re considering a Poisson distribution, our strategy for finding the mgf will be to work our expectation to look like a power series for \(e^{\text{junk}}\).
Strategy: Work our series to include \[\sum_{x=0}^\infty\frac{(\text{junk})^x}{x!}\] since this converges to \(e^{\text{junk}}\).
\[\begin{align*} m(t) &= E(e^{tx})\\ &= \sum_{x = 0}^\infty e^{tx}\frac{\lambda^x e^{-\lambda}}{x!}\\ &= e^{-\lambda} \sum_{x=0}^\infty \frac{(\lambda e^t)^x}{x!} &\text{here it is!}\\ &= e^{-\lambda}e^{[\lambda e^t]} &\text{for all } -\infty < t < \infty\\ &= e^{\lambda(e^t-1)}. \end{align*}\]
Let’s derive our \(\mu\) and \(\sigma\) formulas for a Poisson random variable using the mgf.
The first derivative is \[m^\prime(t) = e^{\lambda(e^t-1)} \cdot \lambda e^t,\] and \(m^\prime(0) = e^{\lambda(1-1)}\cdot \lambda e^0 = \lambda.\)
The second derivative is \[m^{\prime\prime}(t) = (e^{\lambda(e^t-1)} \cdot \lambda e^t) \cdot \lambda e^t + e^{\lambda(e^t-1)} \cdot \lambda e^t,\] so \[m^{\prime\prime}(0) = \lambda^2 + \lambda.\]
Now \[\mu = m^\prime(0) = \lambda,\] check! And, \[\sigma^2 = m^{\prime\prime}(0) - [m^\prime(0)]^2 = (\lambda^2 + \lambda) - \lambda^2 = \lambda,\] check again!
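The same derivative checks can be carried out in R with the base symbolic differentiator `D()`; the value of \(\lambda\) below is purely illustrative.

```r
# Symbolic check of the Poisson mgf derivatives at t = 0 using base R's D()
lambda <- 2.5                                   # illustrative value of lambda
mexpr  <- expression(exp(lambda * (exp(t) - 1)))
m1 <- D(mexpr, "t")                             # first derivative of the mgf
m2 <- D(m1, "t")                                # second derivative
c(mu       = eval(m1, list(t = 0)),             # lambda = 2.5
  EX2      = eval(m2, list(t = 0)),             # lambda^2 + lambda = 8.75
  variance = eval(m2, list(t = 0)) - eval(m1, list(t = 0))^2)  # lambda = 2.5
```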
Recall, the moment-generating function (mgf) associated with a discrete random variable \(X,\) should it exist, is given by \[m_X(t) = E(e^{tX})\] where the function is defined on some open interval of \(t\) values containing 0. The same definition applies to continuous random variables. We have seen that this mgf encodes information about \(X\): the \(k\)th derivative of \(m\) evaluated at \(t = 0\) gives us the \(k\)th moment. That is, for \(k = 1,2,3,\ldots,\) \[m_X^{(k)}(0) = E(X^k).\]
In fact, it turns out that the mgf gives us all the information about a random variable \(X,\) per the following theorem, whose proof is beyond the scope of this course.
Let \(m_X(t)\) and \(m_Y(t)\) denote the mgfs of random variables \(X\) and \(Y,\) respectively. If both mgfs exist and \(m_X(t) = m_Y(t)\) for all values of \(t\) then \(X\) and \(Y\) have the same probability distribution.
Find the mgf for the standard normal random variable \(Z \sim N(0,1)\).
\[\begin{align*} m_Z(t) &= E(e^{tZ})\\ &= \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}}e^{-z^2/2}\cdot e^{tz}~dz\\ &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty e^{tz-z^2/2}~dz\\ &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty e^{-\frac{1}{2}(z-t)^2+\frac{1}{2}t^2}~dz &\text{complete the square}\\ &= e^{\frac{1}{2}t^2}\left[\frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty e^{-\frac{1}{2}(z-t)^2}~dz\right] \end{align*}\]
The bracketed portion of this last expression equals 1, for all \(t,\) since it is the integral of the density function of a \(N(t,1)\) distribution, so \[m_Z(t) = e^{\frac{1}{2}t^2},\] for all \(-\infty < t < \infty\).
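As a numerical check of this closed form, here is a small R sketch; the \(t\) values are arbitrary.

```r
# Direct numerical check of m_Z(t) = exp(t^2/2) at a few t values
sapply(c(-1, 0.5, 2), function(t)
  c(numerical = integrate(function(z) exp(t * z) * dnorm(z), -Inf, Inf)$value,
    closed    = exp(t^2 / 2)))
```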
More generally, for \(X \sim N(\mu,\sigma),\) one can show its mgf is
\[\begin{equation} m(t) = e^{\left(\mu t + \frac{\sigma^2}{2}t^2\right)} \end{equation}\]
We now return to the proof of the earlier theorem that standardizing a normal random variable produces a standard normal random variable, which we restate as the following lemma.
If \(X\) is \(N(\mu,\sigma)\) and \(Z = \frac{X-\mu}{\sigma},\) then \(Z\) is \(N(0,1)\).
Let \(X\) be \(N(\mu,\sigma),\) and \(Z = \frac{X-\mu}{\sigma}\). Then the mgf for \(Z\) is
\[\begin{align*} m_Z(t) &= E\left[e^{tZ}\right]\\ &= E\left[e^{t\left(\frac{X-\mu}{\sigma}\right)}\right]\\ &= E\left[e^{\frac{Xt}{\sigma} - \frac{\mu t}{\sigma}}\right]\\ &= E\left[e^{Xt/\sigma} \cdot e^{-\mu t/\sigma}\right] \\ &= e^{-\mu t/\sigma}\cdot E\left[e^{Xt/\sigma}\right]\\ &= e^{-\mu t/\sigma}\cdot m_X(t/\sigma) \end{align*}\] This last step follows because \(\displaystyle E\left[e^{Xt/\sigma}\right]\) is the mgf of \(X\) evaluated at \(t/\sigma\). Then,
\[\begin{align*} m_Z(t) &= e^{-\mu t/\sigma}\cdot e^{\left(\mu (t/\sigma) + \frac{\sigma^2}{2}(t/\sigma)^2\right)} \\ &= e^{t^2/2} \end{align*}\]
But hey! This mgf is the mgf for \(N(0,1),\) so by the uniqueness theorem above, since \(Z = (X-\mu)/\sigma\) and \(N(0,1)\) have the same mgf, they have the same probability distribution.
If \(Z\) is \(N(0,1)\) then \(Z^2\) is \(\chi^2(1)\).
The proof of this lemma is left for now.
Let \(X_1, X_2, \ldots, X_n\) be independent random variables with mgfs \(m_1(t), m_2(t), \ldots m_n(t),\) respectively. If \(S_n = X_1 + X_2 + \cdots + X_n\) then \[m_{S_n}(t) = m_1(t) \cdot m_2(t) \cdot ~\cdots~ \cdot m_n(t).\]
Sketch of Proof:
\[\begin{align*} m_{S_n}(t) &= E\left[e^{t{S_n}}\right]\\ &= E\left[e^{t(X_1 + X_2 + \cdots + X_n)}\right]\\ &= E\left[e^{tX_1}\cdot e^{tX_2} \cdot ~\cdots~ \cdot e^{tX_n}\right]\\ &= E\left[e^{tX_1}\right] \cdot E\left[e^{tX_2}\right] \cdot ~\cdots~ \cdot E\left[e^{tX_n}\right]\\ &= m_1(t) \cdot m_2(t) \cdot ~\cdots~ \cdot m_n(t) \end{align*}\]
That \(E[~]\) distributes through the product in the fourth line above follows since the \(X_i\) are assumed to be independent.
In particular, if \(X_1, X_2, \ldots, X_n\) are independent random variables coming from a common distribution with mgf \(m(t)\) and distribution function \(F(x),\) and \(S_n = X_1 + X_2 + \cdots + X_n,\) then \[m_{S_n}(t) = m_1(t) \cdot m_2(t) \cdot ~\cdots~ \cdot m_n(t)=[m(t)]^n.\]
Let \(X_1, X_2, \ldots, X_n\) be independent normal random variables with \(X_i \sim N(\mu_i, \sigma_i),\) and let \(a_1, a_2, \ldots, a_n\) be constants. If \[S_n = \sum_{i=1}^n a_i X_i,\] then \(S_n\) is normally distributed with \[\mu = \sum_{i=1}^n a_i \mu_i ~~~ \text{ and } ~~~ \sigma^2 = \sum_{i=1}^n a_i^2 \sigma_i^2.\]
Since \(X_i\) is \(N(\mu_i,\sigma_i),\) \(X_i\) has mgf \[m_{X_i}(t) = e^{\left(\mu_it + \sigma_i^2t^2/2\right)}.\] For constant \(a_i,\) the random variable \(a_iX_i\) has mgf \[m_{a_iX_i}(t) =E(e^{a_iX_it}) = m_{X_i}(a_it) = e^{\left(\mu_ia_it + a_i^2\sigma_i^2t^2/2\right)}.\] Then by the theorem on the mgf of a sum of independent random variables and properties of exponents, for \(S_n = \sum a_i X_i,\) \[\begin{align*} m_{S_n}(t) &= \prod_{i=1}^n m_{a_iX_i}(t) \\ &= \prod_{i=1}^n e^{\left(\mu_ia_it + a_i^2\sigma_i^2t^2/2\right)}\\ &= e^{\left(t\sum a_i\mu_i + \frac{t^2}{2}\sum a_i^2\sigma_i^2\right)} \end{align*}\]
But hey! This is the mgf for a normal distribution with mean \(\sum a_i \mu_i\) and variance \(\sum a_i^2 \sigma_i^2,\) so we have proved the result.
Let \(X_1, X_2, \ldots, X_n\) be independent normal random variables with \(X_i \sim N(\mu_i, \sigma_i),\) and \(\displaystyle Z_i = \frac{X_i - \mu_i}{\sigma_i}\) for \(i = 1, \ldots, n\). Then \[U = \sum_{i=1}^n Z_i^2\] is \(\chi^2(n)\).
Suppose the number of customers arriving at a particular checkout counter in an hour follows a Poisson distribution. Let \(X_1\) record the time until the first arrival, \(X_2,\) the time between the 1st and 2nd arrival, and so on, up to \(X_n,\) the time between the \((n-1)\)st and \(n\)th arrival. Then it turns out the \(X_i\) are independent, and each is an exponential random variable with density \[f_{X_i}(x_i) = \frac{1}{\theta}e^{-x_i/\theta},\] for \(x_i > 0\) (and 0 else). Find the density function for the waiting time \(U\) until the \(n\)th customer arrives.
Well \(U = X_1 + X_2 + \cdots + X_n,\) and each exponential(\(\theta\)) variable is gamma\((1,\theta)\) with mgf \((1-\theta t)^{-1},\) so by the theorem on the mgf of a sum of independent random variables, \[m_U(t) = m_1(t)\cdot ~\cdots~ \cdot m_n(t) = (1-\theta t)^{-n}.\] But, hey! This is the mgf for a gamma\((\alpha = n, \beta = \theta)\) random variable, so by the uniqueness theorem, \(U\) is gamma\((n,\theta)\). So \[f_U(u) = \frac{1}{(n-1)!\theta^n}u^{n-1}e^{-u/\theta},\] for \(u > 0\) (and 0 else).
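A quick simulation makes this concrete. Below is a minimal R sketch with illustrative values of \(n\) and \(\theta\); note that `rexp()` is parameterized by the rate \(1/\theta\).

```r
# Simulation check: the sum of n iid exponential(theta) times behaves like gamma(n, theta)
set.seed(3)
n <- 5; theta <- 2                                   # illustrative values
U <- replicate(1e4, sum(rexp(n, rate = 1 / theta)))  # rexp uses rate = 1/theta
c(sim_mean = mean(U), gamma_mean = n * theta,        # both near 10
  sim_var  = var(U),  gamma_var  = n * theta^2)      # both near 20
```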
If \(Y_1\) is \(N(10,.5)\) and \(Y_2\) is \(N(4,.2)\) and \(U = 100 + 7Y_1 + 3Y_2,\) how is \(U\) distributed, and what value marks the 90th percentile for \(U\)?
The theorem on linear combinations of independent normal random variables says that \(U\) is normal with \[E(U) = 100 + 7 \cdot 10 + 3 \cdot 4 = 182,\] and \[V(U) = 0 + 7^2\cdot (.5)^2 + 3^2\cdot(.2)^2 = 12.61,\] so \(\sigma_U = \sqrt{12.61} = 3.55.\)
The 90th percentile can be found in R with the `qnorm()` function:

```r
qnorm(.9, mean = 182, sd = 3.55)
```

```
[1] 186.5495
```
Find the moment-generating function for \(X \sim U(\theta_1, \theta_2)\).
\[\begin{align*} m_X(t) &= E(e^{tX})\\ &= \int_{\theta_1}^{\theta_2} e^{tx}\frac{1}{\theta_2-\theta_1}~dx\\ &= \frac{1}{\theta_2-\theta_1}\cdot \frac{1}{t}e^{tx}~\biggr|_{\theta_1}^{\theta_2} \\ &= \frac{e^{t\theta_2}-e^{t\theta_1}}{t(\theta_2-\theta_1)}, \end{align*}\] for \(t \neq 0\) (and \(m_X(0) = 1\)).
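As a check of this closed form at one nonzero \(t\), here is a small R sketch with illustrative values of \(\theta_1,\) \(\theta_2,\) and \(t\):

```r
# Check of the uniform mgf at one nonzero t (illustrative parameter values)
theta1 <- 1; theta2 <- 4; t <- 0.5
c(numerical = integrate(function(x) exp(t * x) / (theta2 - theta1),
                        lower = theta1, upper = theta2)$value,
  closed    = (exp(t * theta2) - exp(t * theta1)) / (t * (theta2 - theta1)))
```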
Find the moment-generating function for \(X \sim \text{gamma}(\alpha,\beta)\) and compute \(E(X)\) and \(V(X)\).
\[\begin{align*} m_X(t) &= E(e^{tX})\\ &= \int_{0}^{\infty} e^{tx} \cdot \frac{1}{\beta^\alpha \Gamma(\alpha)}x^{\alpha-1}e^{-(x/\beta)}~dx\\ &= \frac{1}{\beta^\alpha \Gamma(\alpha)} \int_{0}^{\infty} x^{\alpha - 1}e^{-x(1/\beta-t)}~dx\\ &= \frac{1}{\beta^\alpha \Gamma(\alpha)} \cdot \left(\frac{1}{1/\beta - t}\right)^\alpha \Gamma(\alpha) \int_{0}^{\infty} \frac{x^{\alpha - 1}e^{-x(1/\beta-t)}}{\left(\frac{1}{1/\beta - t}\right)^\alpha \Gamma(\alpha)}\cdot ~dx\\ &= \frac{1}{\beta^\alpha \Gamma(\alpha)} \cdot \left(\frac{1}{1/\beta - t}\right)^\alpha \Gamma(\alpha) \end{align*}\]
The last integral above evaluates to 1 because the integrand is the pdf of a \(\text{gamma}\left(\alpha, (1/\beta - t)^{-1}\right)\) distribution (valid for \(t < 1/\beta,\) so the mgf exists on an open interval containing 0). After simplifying we obtain \[m_X(t) = (1-\beta t)^{-\alpha}.\]
With the mgf for a gamma random variable in hand, we can now derive its mean and variance, thus proving the earlier theorem on the mean and variance of a gamma random variable.
\[\begin{align*} m_X^\prime(t) &= -\alpha(1-\beta t)^{-\alpha-1}\cdot(-\beta) \\ &= \alpha\beta(1-\beta t)^{-\alpha-1}, \end{align*}\] so \[E(X) = m_X^\prime(0) = \alpha\beta.\] Turning to the second derivative, \[\begin{align*} m_X^{\prime\prime}(t) &= (-\alpha-1)\alpha\beta(1-\beta t)^{-\alpha-2}\cdot(-\beta)\\ &= \alpha(\alpha+1)\beta^2(1-\beta t)^{-\alpha-2}, \end{align*}\] so \[E(X^2) = m_X^{\prime\prime}(0) = \alpha(\alpha+1)\beta^2.\] Thus, \[V(X) = E(X^2)-E(X)^2 = \alpha(\alpha+1)\beta^2 - (\alpha\beta)^2 = \alpha\beta^2.\]
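A quick simulation check of these formulas (the parameter values are illustrative; `rgamma()` uses the shape/scale parameterization matching \(\alpha\) and \(\beta\)):

```r
# Simulation check of E(X) = alpha*beta and V(X) = alpha*beta^2
set.seed(5)
alpha <- 3; beta <- 2
x <- rgamma(1e5, shape = alpha, scale = beta)
c(sim_mean = mean(x), alpha_beta  = alpha * beta,     # both near 6
  sim_var  = var(x),  alpha_beta2 = alpha * beta^2)   # both near 12
```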
Moment generating function properties:
mgf Theorems
Let \(X_1, X_2, \ldots, X_n, Y\) be random variables with moment-generating functions \(m_{X_1}(t), m_{X_2}(t), \ldots, m_{X_n}(t), m_Y(t)\).
| Moment | Uncentered | Centered |
|---|---|---|
| 1st | \(E(X)=\mu=Mean(X)\) | |
| 2nd | \(E(X^2)\) | \(E((X-\mu)^2)=Var(X)=\sigma^2\) |
| 3rd | \(E(X^3)\) | \(E((X-\mu)^3)\) |
| 4th | \(E(X^4)\) | \(E((X-\mu)^4)\) |
Skewness(X) = \(E((X-\mu)^3)/\sigma^3\)
Kurtosis(X) = \(E((X-\mu)^4)/\sigma^4\)
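These standardized central moments are easy to estimate from data. Below is a small R sketch using a simulated exponential(1) sample, for which the skewness is 2 and the kurtosis is 9; the seed and sample size are illustrative.

```r
# Sample skewness and kurtosis as standardized central moments
set.seed(6)
x <- rexp(1e5, rate = 1)            # exponential(1): skewness 2, kurtosis 9
z <- (x - mean(x)) / sd(x)          # standardize the sample
c(skewness = mean(z^3), kurtosis = mean(z^4))
```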
If X is a random variable with cdf \(F_X(x)\), then any function of X, say g(X), is also a random variable. If we set Y = g(X), then for any set A,
\[P(Y \in A) = P(g(X) \in A)\]
Formally, if we write y = g(x), the function g(x) defines a mapping from the original sample space of X, \(\mathcal{X}\), to a new sample space, \(\mathcal{Y}\), the sample space of the random variable Y. That is,
\[g(x): \mathcal{X} \to \mathcal{Y}\]
We associate with g an inverse mapping, denoted by \(g^{-1}\),
\[g^{-1}(A) = \{ x \in \mathcal{X}: g(x) \in A\}\]
If the random variable Y is now defined by Y = g(X), we can write for any set \(A \subset \mathcal{Y}\),
\[P(Y \in A) = P(g(X) \in A) = P(\{x\in\mathcal{X}: g(x) \in A\}) = P(X \in g^{-1}(A))\]
If Y is a discrete random variable, the pmf for Y is
\[f_Y(y) = P(Y=y) = \sum_{x \in g^{-1}(y)}P(X=x) =\sum_{x \in g^{-1}(y)}f_X(x),\text{ for }y \in \mathcal{Y}\]
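Here is a small R sketch of this formula for a hypothetical pmf and the transformation \(g(x) = x^2\); the support and probabilities below are made up for illustration.

```r
# pmf of Y = g(X) for a discrete X: sum f_X(x) over the preimage g^{-1}(y)
x_vals <- c(-2, -1, 0, 1, 2)          # assumed support of X
f_x    <- c(.1, .2, .4, .2, .1)       # assumed pmf of X
y_vals <- x_vals^2                    # transformation g(x) = x^2
tapply(f_x, y_vals, sum)              # P(Y=0)=.4, P(Y=1)=.4, P(Y=4)=.2
```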
It’s easiest to deal with functions g(x) that are monotone, that is, those that are either increasing or decreasing. If the transformation \(x \to g(x)\) is monotone, then it is one-to-one and onto from \(\mathcal{X} \to \mathcal{Y}\).
Theorem 2.1.3
Let X have cdf \(F_X(x)\), let Y = g(X), and let \(\mathcal{X} = \{ x: f_X(x) > 0\}\), \(\mathcal{Y} = \{y: y = g(x)\text{ for some }x \in \mathcal{X}\}\). (a) If g is an increasing function on \(\mathcal{X}\), then \(F_Y(y) = F_X(g^{-1}(y))\) for \(y \in \mathcal{Y}\). (b) If g is a decreasing function on \(\mathcal{X}\) and X is a continuous random variable, then \(F_Y(y) = 1 - F_X(g^{-1}(y))\) for \(y \in \mathcal{Y}\).
Theorem 2.1.5
Let X have pdf \(f_X(x)\) and let \(Y=g(X)\), where g is a monotone function. Let \(\mathcal{X} = \{ x: f_X(x) > 0\}\), \(\mathcal{Y} = \{y: y = g(x)\text{ for some }x \in \mathcal{X}\}\). Suppose that \(f_X(x)\) is continuous on \(\mathcal{X}\) and that \(g^{-1}(y)\) has a continuous derivative on \(\mathcal{Y}\). Then the pdf of Y is given by
\[f_Y(y)= \begin{cases}f_X(g^{-1}(y))|\frac{d}{dy}g^{-1}(y)| & \quad y \in \mathcal{Y} \\ 0 & \quad \text{otherwise}\end{cases}\]
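Here is a small R sketch applying Theorem 2.1.5 to a standard example: \(X \sim U(0,1)\) and \(Y = -\log(X),\) a monotone (decreasing) transformation. The formula gives \(f_Y(y) = e^{-y}\) for \(y > 0,\) which the simulation below confirms visually; the seed and sample size are illustrative.

```r
# Theorem 2.1.5 in action: X ~ Uniform(0,1), Y = g(X) = -log(X) is monotone decreasing.
# g^{-1}(y) = exp(-y), |d/dy g^{-1}(y)| = exp(-y), and f_X(x) = 1 on (0,1), so
# f_Y(y) = 1 * exp(-y) for y > 0, i.e. Y is exponential with rate 1.
set.seed(7)
y <- -log(runif(1e5))
hist(y, breaks = 60, freq = FALSE, main = "Y = -log(X) for X ~ U(0,1)")
curve(dexp(x, rate = 1), add = TRUE, lwd = 2)   # density from the theorem overlays the histogram
```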