Many types of measurements, such as height, weight, angle, temperature, etc., may in principle have a continuum of possible values. Continuous random variables are used to model uncertainty regarding future values of such measurements.
The main difference between discrete and continuous random variables is in the sample space, i.e., the collection of possible outcomes. The former is used when the possible outcomes are separated from each other, as the integers are. The latter is used when the possible outcomes form the entire real line or an interval (possibly an open-ended one) of real numbers.
The difference between the two types of sample spaces implies differences in the way the distribution of the random variable is described. For discrete random variables one may list the probability associated with each value in the sample space using a table, a formula, or a bar plot. For continuous random variables, on the other hand, probabilities are assigned to intervals of values, not to specific values. Hence, densities are used to display the distribution.
A probability distribution for a continuous random variable \(X\) is written as a function, \(f(x)\), which is called a probability density function (pdf). A density function has the following two properties:
1. \(f(x) \ge 0\) for all \(x\), and
2. the total area under the density curve equals 1, i.e., \(\int_{-\infty}^{\infty} f(x)dx = 1\).
For continuous random variables, integration replaces summation and the density replaces the probability function in the computation of quantities such as the probability of an event, the expectation, and the variance.
The mean of a discrete random variable is given by the formula \(\mu = \sum x \cdot p(x)\); the mean of a continuous random variable uses the same formula with the summation replaced by integration:
\[E(X) = \int x \cdot f(x)dx\] where \(f(x)\) is the probability density function of \(X\).
NOTE: We will not actually be integrating any functions, this is for your information only.
Likewise, the variance of a discrete random variable is
\(Var(X) = \sigma^2 = \sum (x-\mu)^2 \cdot p(x)\), and for a continuous random variable the summation is again replaced by integration: \[Var(X) = \sigma^2 = \int (x-\mu)^2 \cdot f(x)dx\]
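To make these formulas concrete (and purely optionally, since no integration is required in this course), R's integrate() function can carry out the integrals numerically. The density used below, \(f(x) = 2x\) on \([0,1]\), is just an illustrative choice, not one taken from this section.

# An illustrative density: f(x) = 2x for 0 <= x <= 1, and 0 elsewhere
f <- function(x) ifelse(x >= 0 & x <= 1, 2 * x, 0)

# Property check: the total area under the density curve should be 1
integrate(f, lower = 0, upper = 1)$value                              # 1

# Mean: E(X) = integral of x * f(x)
mu <- integrate(function(x) x * f(x), lower = 0, upper = 1)$value
mu                                                                    # 2/3

# Variance: Var(X) = integral of (x - mu)^2 * f(x)
integrate(function(x) (x - mu)^2 * f(x), lower = 0, upper = 1)$value  # 1/18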
The Uniform distribution is used in order to model measurements that may have values in a given interval, with all values in this interval equally likely to occur.
A random variable \(X\) that has a uniform distribution over the interval \((a,b)\) is denoted by \(X \sim \text{Unif}(a,b)\). The probability density function (pdf) for the uniform distribution is
\[f(x) = \frac{1}{b-a} \text{ for } a \le x \le b \text{ and } f(x) = 0 \text{ elsewhere}\] Graphically, the density is a horizontal line of height \(\frac{1}{b-a}\) over the interval from \(a\) to \(b\), so the region under the curve is a rectangle with base \(b-a\) and height \(\frac{1}{b-a}\).
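If you would like to draw this picture yourself, here is a short R sketch; the endpoints \(a = 2\) and \(b = 5\) are chosen purely for illustration.

# Plot the Unif(2, 5) density; dunif() gives the height of the curve
curve(dunif(x, min = 2, max = 5), from = 1, to = 6,
      xlab = "x", ylab = "f(x)", main = "Unif(2, 5) density")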
Notice that the uniform density satisfies the two properties listed above. It is non-negative for all \(x\) and the area under the density curve is 1 (recall that the area of a rectangle is base times height).
Example 1:
\(X\) is a uniform random variable on the interval \((3,7)\). Find the probability density function of \(X\).
Answer: The height of the density curve must be equal to the reciprocal of the base of the density curve so that the area under the density curve is 1, since area = base times height. The base is equal to \(4 = 7 - 3\), so the height must be \(\frac{1}{4}=0.25\). Hence, the probability density function is \[f(x) = 0.25 \text{ for } 3 \le x \le 7 \text{ and } f(x) = 0 \text{ elsewhere}\]
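As an optional check, R's dunif() function returns the height of the uniform density. Evaluating it at any point inside the interval \((3,7)\), say \(x = 5\), gives 0.25.

# Height of the Unif(3, 7) density at x = 5 (any point in (3, 7) works)
dunif(5, min = 3, max = 7)    # 0.25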
The probability that a continuous random variable takes on values over an interval is the area under the curve along that interval. We can find this area using integration (not required for this course) or we can use the formula for the area of a rectangle, i.e. area equals base times height.
The cumulative distribution function (cdf) of the Uniform distribution on the interval \((a,b)\) is
\[F(x_0) = P(X \le x_0) = \int_{- \infty}^{x_0} f(x)dx = \int_{a}^{x_0} \frac{1}{b-a}dx = (x_0-a)\cdot \frac{1}{b-a} = \text{ base} \cdot \text{height}\] for \(a \le x_0 \le b\).
This function can be computed using the R command punif(x0, a, b).
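For instance, using the interval from Example 1 and the arbitrarily chosen point \(x_0 = 5\), the base-times-height formula and punif() agree:

# F(5) for Unif(3, 7), computed two ways
(5 - 3) / (7 - 3)             # 0.5
punif(5, min = 3, max = 7)    # 0.5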
Suppose that \(X\) has a probability distribution with density function,
\[f(x)=0.5 \text{ for } 0 \le x \le 2 \text{ and } 0 \text{ elsewhere.}\] It is easy to see that \(P(X \le 1) = 0.5\), since the interval from 0 to 1 covers half of the area under the curve. We can also check this answer using the formula for the area of a rectangle, or by integrating the density over the interval of interest, \[ P(X \le 1) = \int_0^1 0.5\, dx = (1-0) \cdot 0.5 = \text{base} \cdot \text{height} = 0.5,\] or we can use R
> punif(1,0,2)
[1] 0.5
It is very important to note that \(P(X=a) = 0\) when \(X\) is a continuous random variable. Why is this so?
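One way to see it: \(P(a \le X \le b) = F(b) - F(a)\), so as the interval shrinks toward a single point, the base of the rectangle, and with it the probability, shrinks to 0. The short R sketch below illustrates this with the Unif(0, 2) density from above.

# Probability of an interval shrinking toward the single point x = 1
punif(1.1, 0, 2)  - punif(1, 0, 2)    # 0.05
punif(1.01, 0, 2) - punif(1, 0, 2)    # 0.005
punif(1, 0, 2)    - punif(1, 0, 2)    # 0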
Example 2:
The amount of time, in minutes, that a person must wait for a bus is uniformly distributed between 0 and 15 minutes, inclusive.
(a). Find the probability that a randomly selected person waits
(b). Find the median time that a randomly selected person waits for a bus.
Answer: Let \(X=\) the time a randomly selected person waits for a bus. So \(X \sim \text{Unif}(0,15)\) and \[f(x) = \frac{1}{15} \text{ for } 0 \le x \le 15\] (a).
(b). The median is the \(50^{th}\) percentile of the distribution, so we want to find \(k\) such that \(P(X < k) = 0.5\). We know that \(P(X<k)=(k - 0) \cdot \frac{1}{15}\), so setting this equal to 0.5 and solving for \(k\) gives \(k=7.5\). You can also do this in R with qunif(0.5,0,15).
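As a quick check of part (b) in R, the qunif() function returns percentiles of the uniform distribution:

# Median (50th percentile) of Unif(0, 15)
qunif(0.5, min = 0, max = 15)    # 7.5

# By hand: solve k * (1/15) = 0.5, so k = 0.5 * 15
0.5 * 15                         # 7.5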