Data 605 Assignment Week 8

Alexander Ng

10/17/2019

Chapter 7.2 page 303 Problem 11

Statement

A company buys 100 lightbulbs, each of which has an exponential lifetime of 1000 hours. What is the expected time for the first of these bulbs to burn out? (See Exercise 10.)

Solution

Exercise 10 considers \(n\) independent random variables \(X_1, \dots, X_n\), each of which has an exponential density with mean \(\mu\). Let \(M\) be the minimum value of the \(X_j\). Exercise 10 states that the density of \(M\) is exponential with mean \(\mu/n\).
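The claim of Exercise 10 can be sketched as follows, assuming only that the \(X_j\) are independent and that each has the exponential survival function \(P(X_j > t) = e^{-t/\mu}\) for \(t \geq 0\):

\[ \begin{aligned} P(M > t) &= P(X_1 > t, \dots, X_n > t) \\ &= \prod_{j=1}^{n} P(X_j > t) \\ &= e^{-nt/\mu} \end{aligned} \]

This is the survival function of an exponential density with mean \(\mu/n\).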

An exponential random variable \(X_j\) with mean \(\mu\) has rate (intensity) \(\lambda\) satisfying \[\mu = \frac{1}{\lambda}\]

Since \(\mu = 1000\) hours, this gives \(\lambda = 1/1000 = 0.001\) per hour.

Exercise 10 implies the expected time for the first of these bulbs to burn out is 10 hours. This can be seen by calculating the expected value of the minimum \(M\) as follows:

\[ E[M] = \frac{\mu}{n} = \frac{1000}{100} = 10\]
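As a quick numerical cross-check, here is a minimal sketch. It assumes the survival function of the minimum is \(P(M > t) = e^{-n\lambda t}\), which follows from independence, and uses the identity \(E[M] = \int_0^\infty P(M > t)\,dt\) for a non-negative random variable. The names `surv_min` and `expected_min` are illustrative and not part of the original solution.

```r
# Numerical cross-check of E[M] = mu / n using base R's integrate().
mu     <- 1000    # mean lifetime of one bulb (hours)
n      <- 100     # number of bulbs
lambda <- 1 / mu  # rate of a single bulb

# Survival function of the minimum of n independent exponentials:
# P(M > t) = P(X_1 > t)^n = exp(-n * lambda * t)
surv_min <- function(t) exp(-n * lambda * t)

# For a non-negative random variable, the mean equals the integral
# of the survival function over [0, Inf).
expected_min <- integrate(surv_min, lower = 0, upper = Inf)$value
print(expected_min)  # should be very close to mu / n
```

The numerical integral should agree with \(\mu/n\) up to the integration tolerance.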

We can also empirically demonstrate this by a Monte Carlo simulation below.

First we simulate \(N=5000\) trials. In each trial, we simulate the lifetimes of \(n=100\) independent lightbulbs, each following an exponential density with parameter \(\lambda = 0.001\), by applying the inverse-transform method to uniform random numbers. We record the minimum \(V\) and the average \(E\) of the lifetimes \((X_1, \dots, X_n)\). Then we plot histograms of \(V\) and \(E\) across the trials and summarize their means.

lambda = 1/1000
batch = 100
N = 5000

# Preallocate storage for the trial outcomes
V = numeric(N)
E = numeric(N)

# Run the simulation

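for (i in 1:N)
{
    U = runif(batch)
    # Inverse-transform sampling: -log(U)/lambda is exponential with rate lambda.
    # The name "lifetimes" avoids masking T, R's shorthand for TRUE.
    lifetimes = -log(U)/lambda
    V[i] = min(lifetimes)   # Stores the batch minimum
    E[i] = mean(lifetimes)  # Stores the batch mean
}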
# Show the histogram and mean of the batch minima
# The mean of the batch minima is close to 10.
hist(V)

summary(V)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##   0.00601   2.92017   6.80071   9.91079  13.81737 136.14662
# The mean of the batch means is close to 1000.
hist(E)

summary(E)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   669.5   932.2   997.8  1000.6  1066.7  1383.7

The simulated results agree with the theory: the batch minimum has mean approximately 10 hours (9.91 in this run), while the average lifetime is approximately 1000 hours (1000.6 in this run).
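The same experiment can also be written without an explicit loop. The sketch below is one possible variant, not the method used above: it draws lifetimes directly with `rexp()` and adds a Monte Carlo standard error for each estimated mean. The seed value and the names `lifetimes`, `se_V` and `se_E` are arbitrary choices made here for illustration.

```r
set.seed(605)  # arbitrary seed, used only to make this sketch reproducible
lambda <- 1 / 1000
batch  <- 100
N      <- 5000

# One row per trial, one column per bulb.
lifetimes <- matrix(rexp(N * batch, rate = lambda), nrow = N, ncol = batch)

V <- apply(lifetimes, 1, min)  # batch minima
E <- rowMeans(lifetimes)       # batch means

# Monte Carlo standard error of each estimated mean.
se_V <- sd(V) / sqrt(N)
se_E <- sd(E) / sqrt(N)

cat("mean of minima:", mean(V), "+/-", se_V, "\n")
cat("mean of means: ", mean(E), "+/-", se_E, "\n")
```

The standard errors give a rough sense of how far the simulated means can be expected to sit from 10 and 1000.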

Chapter 7.2 page 303 Problem 14

Statement

Assume that \(X_1\) and \(X_2\) are independent random variables, each having an exponential density with parameter \(\lambda\). Show that \(Z = X_1 - X_2\) has density

\[ f_{Z}(z) = \frac{1}{2}\lambda e^{-\lambda |z|}\]

Solution

The density \(f_Z\) is the convolution of the densities of \(X_1\) and \(-X_2\). The main work is determining the limits of integration from the supports of the two densities.

For notational simplicity, let \(X = X_1\), with density \(f\), be exponential with parameter \(\lambda > 0\). Let \(Y = -X_2\), where \(X_2\) is independent of \(X_1\) and has the same distribution. The density of \(Y\), which we denote by \(g\), is the reflection of \(f\) across the vertical axis:

\[g(t) = f(-t) \text{ for all } t\]

Thus, we have the densities:

\[ f(t) = \begin{cases} \lambda e^{-\lambda t } & t \geq 0 \\ 0 & t < 0 \end{cases} \]

\[ g(t) = \begin{cases} \lambda e^{-\lambda |t| } & t \leq 0 \\ 0 & t > 0 \end{cases} \]

The density \(f_Z\) of \(Z = X + Y\) is the convolution of the densities \(f\) and \(g\). This means \[ f_Z(z) = (f * g)(z) = \int_{-\infty}^{\infty} f(z-y)g(y)dy\] The convolution is calculated differently according to the sign of \(z\). There are two cases.

Case 1. \(z\geq 0\).

The integrand \(f(z-y)g(y)\) is nonzero only for those \(y\) where both \(f(z-y) \neq 0\) and \(g(y) \neq 0\). Since the support of \(f\) is the non-negative reals, the first constraint is

\[ f(z-y) \neq 0 \implies z-y \geq 0 \implies y \leq z\]

The second constraint comes from the support of \(g\), which is the non-positive reals: \[ g(y) \neq 0 \implies y \leq 0 \]

Since \(z \geq 0\) in this case, any \(y \leq 0\) automatically satisfies \(y \leq z\), so the first constraint is redundant. Thus, the support of the integrand is simply \(y \leq 0\).

On this interval, we observe \(|y| = -y\). Thus we write the convolution as

\[ \begin{aligned} f_Z(z) &= (f * g)(z) \\ &= \int_{-\infty}^{0} f(z-y)g(y)\,dy \\ &= \int_{-\infty}^{0} \lambda e^{-\lambda (z-y)} \lambda e^{-\lambda (-y)}\,dy \\ &= \lambda^2 e^{-\lambda z}\int_{-\infty}^{0} e^{2\lambda y}\,dy \\ &= \lambda^2 e^{-\lambda z} \left. \frac{e^{2\lambda y}}{2\lambda} \right\rvert_{-\infty}^{0} \\ &= \lambda^2 e^{-\lambda z} \left( \frac{1}{2\lambda} - 0 \right) \\ &= \frac{\lambda}{2} e^{-\lambda z} \\ &= \frac{\lambda}{2} e^{-\lambda |z|} \text{ for } z \geq 0 \end{aligned} \] This proves the result for Case 1.

Case 2. \(z<0\)

We will show that the support of the integrand of the convolution is the interval \((-\infty, z]\).

The support of \(g\) implies the constraint

\[g(y) \neq 0 \implies y \leq 0\] and the support of \(f\) implies

\[f(z-y) \neq 0 \implies z-y \geq 0 \implies y \leq z \]

Since \(z < 0\), the second constraint is the stricter one: \(y \leq z\) already forces \(y \leq 0\). Hence \[y \in (-\infty, z]\]

Moreover, we observe \(|y| = -y\) in this interval and use that below.

\[ \begin{aligned} f_Z(z) & = (f*g)(z) \\ & = \int_{-\infty}^{z} f(z-y)g(y)\,dy \\ & = \int_{-\infty}^{z} \lambda e^{-\lambda (z-y)} \lambda e^{-\lambda (-y)}\,dy \\ & = \lambda^2 e^{-\lambda z} \int_{-\infty}^{z} e^{2\lambda y}\,dy \\ & = \lambda^2 e^{-\lambda z} \left. \frac{e^{2\lambda y}}{2 \lambda} \right\rvert_{-\infty}^{z} \\ & = \lambda^2 e^{-\lambda z} \left( \frac{e^{2\lambda z}}{ 2\lambda} - 0 \right) \\ & = \frac{\lambda}{2} e^{\lambda z} \\ & = \frac{\lambda}{2} e^{-\lambda |z|} \text{ for } z < 0 \end{aligned} \]

This proves the result in both cases.
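The closed form can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the proof: it picks an arbitrary rate \(\lambda = 1.5\), integrates \(f(z-y)g(y)\) over \(y \leq \min(z, 0)\) with base R's `integrate()`, and compares the result with \(\frac{1}{2}\lambda e^{-\lambda |z|}\) at a few points. The helper names `conv_density` and `laplace_density` are hypothetical.

```r
lambda <- 1.5  # arbitrary illustrative rate; any lambda > 0 could be used

# Integrand f(z - y) * g(y), where g(y) = f(-y) is the density of -X_2.
integrand <- function(y, z) {
  dexp(z - y, rate = lambda) * dexp(-y, rate = lambda)
}

# Numerical convolution over the support found in the two cases:
# y <= 0 when z >= 0, and y <= z when z < 0, i.e. y <= min(z, 0).
conv_density <- function(z) {
  integrate(integrand, lower = -Inf, upper = min(z, 0),
            rel.tol = 1e-8, z = z)$value
}

# Closed form from the problem statement.
laplace_density <- function(z) 0.5 * lambda * exp(-lambda * abs(z))

z_grid    <- c(-2, -0.5, 0, 0.5, 2)
f_numeric <- sapply(z_grid, conv_density)
f_closed  <- laplace_density(z_grid)

print(cbind(z = z_grid, numeric = f_numeric, closed = f_closed))
```

The two columns should agree to within the integration tolerance, on both sides of \(z = 0\).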

Chapter 8.2 Page 320-321 Problem 1

Statement

Let X be a continuous random variable with mean µ = 10 and variance \(\sigma^2 = 100/3\). Using Chebyshev’s Inequality, find an upper bound for the following probabilities.
(a) P(|X − 10| ≥ 2). (b) P(|X − 10| ≥ 5). (c) P(|X − 10| ≥ 9). (d) P(|X − 10| ≥ 20).

Solution

Chebyshev's Inequality states that for any \(\epsilon > 0\): \[P(\lvert X - \mu \rvert \geq \epsilon ) \leq \frac{ \sigma^2}{\epsilon^2}\]

Since a probability never exceeds 1, the bound is informative only when \(\sigma^2/\epsilon^2 < 1\).

  1. Using \(\epsilon = 2\) implies \[ P\left( |X - 10| \geq 2 \right) \leq \frac{\sigma^2}{\epsilon^2} = \frac{100}{3 \cdot 4} = \frac{25}{3}\] Because \(25/3 > 1\), the inequality gives only the trivial upper bound of 1.

  2. Using \(\epsilon = 5\) implies \[ P\left( |X - 10| \geq 5 \right) \leq \frac{\sigma^2}{\epsilon^2} = \frac{100}{3 \cdot 25} = \frac{4}{3}\] Because \(4/3 > 1\), the inequality again gives only the trivial upper bound of 1.

  3. Using \(\epsilon = 9\) implies \[ P\left( |X - 10| \geq 9 \right) \leq \frac{\sigma^2}{\epsilon^2} = \frac{100}{3 \cdot 81} = \frac{100}{243} \approx 0.412\]

  4. Using \(\epsilon = 20\) implies \[ P\left( |X - 10| \geq 20 \right) \leq \frac{\sigma^2}{\epsilon^2} = \frac{100}{3 \cdot 400} = \frac{1}{12} \approx 0.083\]
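The arithmetic for Chebyshev's bound \(\sigma^2/\epsilon^2\) can be tabulated in R. The sketch below caps each bound at 1, since no probability can exceed 1. For comparison only, it also computes exact tail probabilities under the assumption that \(X\) is uniform on \([0, 20]\). The problem does not state this distribution; it is one example with mean 10 and variance \(20^2/12 = 100/3\).

```r
sigma2  <- 100 / 3
epsilon <- c(2, 5, 9, 20)

# Raw Chebyshev ratio sigma^2 / epsilon^2, then capped at 1 because
# a probability cannot exceed 1.
raw_bound <- sigma2 / epsilon^2
bound     <- pmin(raw_bound, 1)

# Assumption for illustration only: X ~ Uniform(0, 20), which matches the
# stated mean and variance. Then P(|X - 10| >= eps) = 1 - 2 * eps / 20
# for eps <= 10, and 0 for larger eps.
exact_uniform <- pmax(0, 1 - (2 * epsilon) / 20)

print(data.frame(epsilon, raw_bound, bound, exact_uniform))
```

Under that assumed distribution, each exact probability sits at or below the corresponding bound, as Chebyshev's Inequality guarantees for any distribution with this mean and variance.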