Basic Concepts of Probability

An event is a set of possible outcomes of an experiment. The probability of an event is what we expect its relative frequency to approach as we run the experiment a large number of times. Probability values can arise from several sources.

Probabilities are always numbers in the range \(0\) to \(1\).


If the elementary outcomes of an experiment are discrete and all equally likely, the theoretical probability of an event is defined as

\[\frac{\text{Number of elementary outcomes in the event}}{\text{Total number of elementary outcomes}}\]

As an example, consider the probability of getting a head when you flip a fair coin. There are two elementary outcomes of this experiment, a head, or a tail. Since the coin is fair, the outcomes are equally likely. There are two possible outcomes and only one is in our event. So, the probability is \(1/2\). In everyday language, you may hear “fifty-fifty.” To stick with the language of probability we say that the probability is \(.5\).
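The relative-frequency idea can be illustrated with a small simulation. This is only a sketch: the seed, the number of flips and the variable names are arbitrary choices made for illustration, and a different seed will give slightly different numbers.

```r
# Simulate flips of a fair coin and track the relative frequency of heads.
set.seed(1)                      # arbitrary seed so the run can be repeated
flips <- sample(c("H", "T"), size = 10000, replace = TRUE)
rel_freq <- mean(flips == "H")   # proportion of heads in all 10,000 flips
rel_freq

# Relative frequency after the first 10, 100, 1000 and 10000 flips.
running <- cumsum(flips == "H") / seq_along(flips)
running[c(10, 100, 1000, 10000)]
```

The running proportion usually wanders at first and then settles close to .5 as the number of flips grows.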

Using equally likely elementary outcomes with discrete experiments is clearly advantageous. Consider the probability of getting a sum of seven dots when rolling a pair of dice. If we think of the elementary outcomes of this experiment as the sums from two to twelve, the outcomes are not equally likely. There is only one way to get a sum of two and only one way to get a sum of twelve, but there are many ways to get a sum of seven. There are 36 equally likely elementary outcomes when we consider the ordered pairs describing what happened to die 1 and what happened to die 2. The pairs that give a sum of seven are \((2,5),(5,2),(3,4),(4,3),(6,1) \text{ and } (1,6)\). Since there are six of them, the probability of getting a seven is \(6/36 = 1/6\).
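The counting in the dice example can be checked by listing all of the ordered pairs. The following is a minimal sketch; the names rolls, n_seven and p_seven are chosen only for illustration.

```r
# Enumerate all 36 ordered pairs (die 1, die 2).
rolls <- expand.grid(die1 = 1:6, die2 = 1:6)
rolls$total <- rolls$die1 + rolls$die2

# Count the elementary outcomes in the event "the sum is seven".
n_seven <- sum(rolls$total == 7)
n_seven
## [1] 6

# Theoretical probability: outcomes in the event / total outcomes.
p_seven <- n_seven / nrow(rolls)
p_seven

# Number of ways to get each sum, showing the sums are not equally likely.
table(rolls$total)
```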


Probabilities can also be empirical. This is really just noting that you can describe some empirical facts using the language of probability. An example or two should help.

Based on the Current Population Survey, about 60% of the US population over the age of 16 meets the definition of participating in the labor force. We can say that if a member of the population over the age of 16 were picked at random, the probability of that person being a labor force participant is about .6.

Based on data from the CDC, there were 2,515,458 deaths in the US in 2010. Of those, the primary cause of death was heart disease in 596,577 cases. We can say that if we were to pick a 2010 death certificate at random, the probability of the death having a primary cause of heart disease is about .237.
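The arithmetic behind this empirical probability is just a relative frequency. A short sketch, using the two counts quoted above and variable names chosen for illustration:

```r
deaths_total <- 2515458   # all US deaths in 2010, as quoted above
deaths_heart <- 596577    # deaths with heart disease as the primary cause

# Empirical probability = count in the event / total count.
p_heart <- deaths_heart / deaths_total
round(p_heart, 3)
## [1] 0.237
```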


Probabilities can also be subjective. This is really just using the language of probability to express a personal degree of belief. Here are some examples based on my own personal experiences.

  • I believe that the probability of having a class cancelled this winter because of snow is about .5.

  • I believe that the probability of my laptop having a hardware failure during the next 6 months is less than .05.

Random Variables

Some experiments have numerical outcomes. The numerical outcomes are called random variables.

Discrete Random Variables

For example, if I flip a fair coin three times and record the number of heads I get as the outcome of the experiment, I have a random variable with four possible values: \(0, 1, 2 \text{ and }3\). Such a random variable, with a finite number of possible values, is called discrete.

Continuous Random Variables

Random variables can also be continuous, as opposed to discrete. For example, suppose the experiment is to pick a rock at random from a large pile of rocks and record the weight. The number of possible values is conceptually infinite, although it would be recorded only to the degree of accuracy provided by our scales.

The Distribution of Probability

Random variables have functions which describe the relative frequency with which they take on their range of possible values.

Probability Distribution Function

In the case of a discrete random variable, this function is called a probability distribution function.

For example, consider the simple experiment of flipping a fair coin once and counting the number of heads as the outcome. There are two possible values of the random variable, \(0\) and \(1\). If the distribution function is denoted \(d()\), we have \(d(0)=.5\) and \(d(1)=.5\).

The other example above, flipping a fair coin three times and counting the number of heads, is a little more complicated. However, it fits the theoretical requirements for a binomial random variable.
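Before turning to the binomial machinery, the distribution for the three-flip example can be found by plain counting, since the eight possible sequences of flips are equally likely. This is a sketch with illustrative variable names.

```r
# All 8 equally likely sequences of three flips (1 = head, 0 = tail).
flips <- expand.grid(first = 0:1, second = 0:1, third = 0:1)

# The random variable: number of heads in each sequence.
heads <- rowSums(flips)

# d(0), d(1), d(2), d(3): sequences with that many heads / total sequences.
by_counting <- as.vector(table(heads)) / nrow(flips)
by_counting
## [1] 0.125 0.375 0.375 0.125
```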

Binomial Random Variables

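The following requirements define a binomial random variable.

  • The experiment consists of a fixed number, \(n\), of trials.

  • Each trial has only two possible outcomes, called success and failure.

  • The probability of success, \(p\), is the same on every trial.

  • The trials are independent.

  • The random variable is the number of successes in the \(n\) trials.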

The binomial probability distribution function gives us the probability of exactly \(x\) successes given the values of \(n\) and \(p\).

\[d(x) = \binom{n}{x} p^x (1-p)^{n-x}\]

Fortunately, you will never have to use this formula. R provides the function dbinom() to do this calculation. For example, to calculate the probability of exactly 5 successes out of 10 independent trials with a constant probability .2 of success on each trial, the following code will work.

dbinom(5,10,.2)
## [1] 0.02642412
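As a check on what dbinom() is doing, here is a minimal sketch that evaluates the usual binomial formula, \(\binom{n}{x}p^x(1-p)^{n-x}\), directly with choose(). The variable names are illustrative only.

```r
n <- 10    # number of trials
p <- 0.2   # probability of success on each trial
x <- 5     # number of successes we are asking about

# The binomial formula written out term by term.
by_hand <- choose(n, x) * p^x * (1 - p)^(n - x)
by_hand
## [1] 0.02642412

# The same value from dbinom().
dbinom(x, n, p)
## [1] 0.02642412
```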

It is easy to get a table of all of the values of the probability distribution for this example as follows.

# Create a vector x with the possible values of the random variable.
x <- 0:10

# Use dbinom() to create the value of the probability of each value in x.
d <- dbinom(x,10,.2)

# Put x and d together in a dataframe for display purposes.
df <- data.frame(x,d)

# Display the probability distribution.
df
##     x            d
## 1   0 0.1073741824
## 2   1 0.2684354560
## 3   2 0.3019898880
## 4   3 0.2013265920
## 5   4 0.0880803840
## 6   5 0.0264241152
## 7   6 0.0055050240
## 8   7 0.0007864320
## 9   8 0.0000737280
## 10  9 0.0000040960
## 11 10 0.0000001024
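A table like this is easy to sanity check and to reuse. The sketch below rebuilds x and d so that it runs on its own, confirms that the probabilities add to 1, and adds up a few rows to get the probability of an event such as "at most 2 successes."

```r
x <- 0:10
d <- dbinom(x, 10, .2)

# The probabilities of all possible values must add to 1.
sum(d)
## [1] 1

# Probability of at most 2 successes: add the rows for x = 0, 1 and 2.
sum(d[x <= 2])
## [1] 0.6777995

# The most likely number of successes.
x[which.max(d)]
## [1] 2
```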

The drudgery that students of 1960 experienced in doing the computations with the formula above is no longer necessary. Doing these computations by hand does not increase your insight into statistics. The relative ease of invoking dbinom() leads some people to fear that “You can’t understand it if you let a computer do all the work.” There are two human parts of this kind of problem.

  • Recognizing that the binomial distribution fits the situation and identifying the values of n and p.

  • Getting R to do the computation with dbinom().

The first of these takes experience and is where you need to really use your mind. The second is what you have just learned.

To help you gain some skill in the first human task, here are some examples. For each of these you should decide if the binomial distribution fits and identify the values of n and p.


Continuous Random Variables - The Normal Distribution

A continuous random variable takes on any numerical value in some interval, which may stretch all the way from \(-\infty\) to \(+\infty\). Every continuous random variable has a probability density function which describes how probability is distributed among its possible values. This function is different from the probability distribution function of a discrete random variable. Note the words “distribution” and “density.” The value of a probability density function is not a probability. In the case of a continuous random variable, the probability of an interval is given by the area under the density function above the interval. The probability of any specific value of a continuous random variable is always \(0\). For these random variables, probability is an area and a line segment has no area.
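The difference between a density value and a probability can be seen numerically. The sketch below uses dnorm(), the normal density function in R, and the general-purpose integrate() function to compute areas; the interval endpoints and widths are arbitrary choices for illustration.

```r
# The height of the standard normal density at 0. This is not a probability.
dnorm(0)

# A density value can even exceed 1 when the standard deviation is small.
dnorm(0, mean = 0, sd = 0.1)

# Probability is area under the density over an interval.
area <- integrate(dnorm, lower = -1, upper = 1)
area$value

# Areas of narrower and narrower intervals centered at 1.
widths <- c(1, 0.1, 0.01, 0.001)
areas <- sapply(widths, function(w) integrate(dnorm, 1 - w/2, 1 + w/2)$value)
areas
```

As the interval shrinks toward the single value 1, its area shrinks toward 0, which is why a specific value has probability 0.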

The most important type of continuous random variable has a normal probability density function. Such a density has a central peak, with high probability for intervals near the peak and low probability for intervals far away from it. There are two parameters which characterize a normal random variable, the mean and the standard deviation. In the most important case, the standard normal distribution, the mean is \(0\) and the standard deviation is \(1\). The graph below shows a standard normal distribution and indicates the probability of a value being less than 1.5. Strictly speaking, it shows the probability of the interval from \(-4\) to \(+1.5\). However, the amount of probability (area) to the left of \(-4\) is essentially zero.

This is a function I created to do normal probability plots. It involves several tricks that I don’t expect you to learn. However, learning to use it is easy and I do expect you to do that.

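MyNormProb <- function(
  lb = NA,                       # Lower bound
  ub = NA,                       # Upper bound
  mean = 0,                      # Mean
  sd = 1,                        # Standard deviation
  MyLabel = "Standard Normal RV" # Description of the variable
) {
  # This function produces a plot and the probability that
  # a normal random variable with the given mean and
  # standard deviation falls between the given lower and
  # upper bounds.

  # A bound that is not supplied defaults to 4 standard deviations
  # from the mean, which is the edge of the plotted range.
  if (is.na(lb)) {lb <- mean - 4 * sd}
  if (is.na(ub)) {ub <- mean + 4 * sd}

  # Grid of x values and the height of the density at each one.
  x <- seq(-4,4,length=1000)*sd + mean
  hx <- dnorm(x,mean,sd)

  plot(x, hx, type="n", xlab=MyLabel, ylab="Density",
       main="Normal Distribution", axes=FALSE)

  # Draw the curve and shade the region between the bounds.
  i <- x >= lb & x <= ub
  lines(x, hx)
  polygon(c(lb,x[i],ub), c(0,hx[i],0), col="red")

  area <- pnorm(ub, mean, sd) - pnorm(lb, mean, sd)
  result <- paste("P(",lb,"< ",MyLabel," <" ,ub,") =",
                  signif(area, digits=3))
  # Assumed reconstruction: show the result above the plot and draw
  # the horizontal axis, since the plot was created with axes=FALSE.
  mtext(result, 3)
  axis(1)
  segments(x0 = mean, y0 = 0, x1 = mean, y1 = dnorm(mean,mean,sd))

  # Return the probability so it can be stored or printed.
  area
}

v <- MyNormProb(ub=1.5,MyLabel="Standard Normal Distribution")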

How does one find normal probabilities? You have already learned that this can be done with tables. To solve the problem above, you would locate 1.5 in the border of the normal curve table and read the value .933 in the body of the table. That process is no longer necessary. The result is available through the pnorm() function. For example, consider the following.

pnorm(1.5)
## [1] 0.9331928

The pnorm() function gives you the area under the standard normal curve to the left of its input value. If you want the area to the right, you subtract from 1, which is the entire area under the curve.

1 - pnorm(1.5)
## [1] 0.0668072

Note that because of symmetry, this is the same as the area to the left of \(-1.5\).

pnorm(-1.5)
## [1] 0.0668072

Here are the graphical displays.

v <- MyNormProb(ub=1.5,MyLabel = "Standard Normal")

v <- MyNormProb(lb=1.5,MyLabel = "Standard Normal")

v <- MyNormProb(ub=-1.5,MyLabel = "Standard Normal")

To get an area (probability) between two values, we get the area to the left of the larger value and subtract the area to the left of the smaller value. For example, suppose we want the area under the normal curve between .8 and 2.4.

pnorm(2.4) - pnorm(.8)
## [1] 0.2036579

And here’s the picture.

MyNormProb(lb=.8,ub=2.4,MyLabel="Standard Normal")

## [1] 0.2036579

Non-Standard Normal Distributions

Most random variables with a normal distribution don’t have means of \(0\) and standard deviations of \(1\). However, the family of normal distributions has a strong property of geometric similarity, which allows us to do everything by recasting the non-standard random variable as a standard normal random variable. Suppose we have to deal with a mean of 100 and a standard deviation of 10. We want to know the probability that a randomly selected value will be less than 115. We think of 115 as being 1.5 standard deviations to the right of the mean. This number of standard deviations is known as a z-score. Formally, we define the z-score as

\[ z = \frac{x - \mu}{\sigma}\]

This can be reversed if our work leads us to a z-score and we need the value or values on the original scale of measurement.

\[ x = z\sigma + \mu\]

In this example, \(x=115\), \(\mu=100\) and \(\sigma=10\). Doing the arithmetic, we get \(z=1.5\). In the pre-R era, it was necessary to compute the z-value and refer to the standard normal table. However, pnorm() is equipped to receive the values of \(\mu\) and \(\sigma\) as optional parameters (mean and sd respectively) and do the necessary transformations internally.

pnorm(1.5)
## [1] 0.9331928
pnorm(115, mean=100, sd=10)
## [1] 0.9331928
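The conversion in both directions can be written out step by step. This sketch uses the numbers from the example; the variable names x, mu and sigma are chosen to match the symbols in the formulas.

```r
x <- 115      # value on the original scale
mu <- 100     # mean
sigma <- 10   # standard deviation

# Convert to a z-score: how many standard deviations x is from the mean.
z <- (x - mu) / sigma
z
## [1] 1.5

# The probability is the same whether we standardize ourselves
# or let pnorm() do it through its mean and sd arguments.
pnorm(z)
pnorm(x, mean = mu, sd = sigma)

# Reverse the conversion to get back to the original scale.
z * sigma + mu
## [1] 115
```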

Going Backwards

Sometimes we have to reverse the work done with pnorm(). The situation is that we have a probability either to the left or the right of a z-score and we want the z-score. For example, it is reasonable to ask what z-score would have 90% of the area to the left. With the table, we would look in the body of the table for .9 and read our answer from the borders of the table. With R, we use the qnorm() function, which is the inverse of the pnorm() function.

qnorm(.9)
## [1] 1.281552

Note what happens when we put this result back into pnorm().

pnorm(1.281552)
## [1] 0.9000001

We get a tiny bit of rounding error because the displayed result stopped short of displaying an obscene number of digits. We can make the result clearer by letting pnorm() have the result of qnorm() without human intervention.

pnorm(qnorm(.9))
## [1] 0.9

Note that qnorm() also accepts mean and sd as optional parameters, as does MyNormProb().

## [1] 119.2233
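The value 119.2233 is what qnorm() returns for a left-tail probability of .9 with a mean of 100 and a standard deviation of 15. Those parameter values are an assumption here, since the call that produced the output is not shown. The sketch below also shows one way to handle a probability given to the right of the value, using the lower.tail argument of qnorm().

```r
mu <- 100     # assumed mean, not stated in the text
sigma <- 15   # assumed standard deviation, not stated in the text

# Find the z-score with 90% of the area to its left,
# then convert it to the original scale by hand.
z <- qnorm(.9)
z * sigma + mu

# Let qnorm() do the conversion through its mean and sd arguments.
qnorm(.9, mean = mu, sd = sigma)

# 10% of the area to the right is the same as 90% to the left.
qnorm(.1, lower.tail = FALSE)
```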