Functions and Summary Statistics of Discrete Probability Distributions

Author

Dr Andrew Dalby

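The equivalent of the mean as a summary statistic for a probability distribution is the expected value, or expectation. It is the mean value of the random variable X: each possible value is weighted by its probability. For example, X could be the number of cytosines in 4 DNA bases drawn at random. If each of the four bases is equally likely, each draw is a cytosine with probability 1/4, which gives the probabilities in the table below.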

| x | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| P(X=x) | 81/256 | 108/256 | 54/256 | 12/256 | 1/256 |
| x·P(X=x) | 0 | 108/256 | 108/256 | 36/256 | 4/256 |

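Summing the bottom row gives the expectation: (0 + 108 + 108 + 36 + 4)/256 = 256/256 = 1, so on average we expect one cytosine in 4 bases.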
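The arithmetic in the table can be reproduced in R as a short sketch. It assumes, as above, that each base is a cytosine with probability 1/4 independently of the others, so the table should match `dbinom` with `size = 4` and `prob = 0.25`; the variable names `x` and `f` are illustrative.

```r
# Values of X: number of cytosines in 4 randomly drawn bases
x <- 0:4
# P(X = x), using the exact fractions from the table
f <- c(81, 108, 54, 12, 1) / 256

# Assumption check: the table matches Binomial(n = 4, p = 1/4)
all.equal(f, dbinom(x, size = 4, prob = 0.25))

# Expectation: sum of x * P(X = x)
expectation <- sum(x * f)
expectation   # 1
```

Because `f` holds the exact fractions from the table, the sum comes out as exactly 1.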

Formal Mathematical Definition of the Expectation

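The expectation of a discrete random variable X is:

\[ E(X)=\sum_{x} x f(x) \]

where f(x) = P(X=x) is the probability of the value x and the sum runs over all possible values of X.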

For the binomial distribution the expectation can be simplified to:

\[ E(X) = np \]

This is because in n independent trials, each with probability of success p, we expect the event to happen np times on average. That would have been a simpler way to calculate the expected number of cytosines in 4 DNA bases drawn at random: E(X) = 4 × 1/4 = 1.
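As a numerical check of the shortcut, the sketch below computes the sum of x f(x) directly from `dbinom` and compares it with np. The helper name `binom_mean_by_sum` is hypothetical, and the parameter values are just examples.

```r
# Hypothetical helper: E(X) for X ~ Binomial(n, p), computed from the definition
binom_mean_by_sum <- function(n, p) {
  x <- 0:n
  sum(x * dbinom(x, size = n, prob = p))
}

# Cytosine example: n = 4 bases, p = 1/4, so np = 1
all.equal(binom_mean_by_sum(4, 0.25), 4 * 0.25)
# Another case: n = 6, p = 0.2, so np = 1.2
all.equal(binom_mean_by_sum(6, 0.2), 6 * 0.2)
```

`all.equal` is used rather than `==` because the summed probabilities are floating-point values and may differ from np in the last few digits.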

If we transform the values of X with a function g, each transformed value g(x) occurs with the same probability f(x) = P(X=x) as the original value x: only the values change, not the probabilities. This gives the general formula:

\[ E(g(X))=\sum_{x} g(x)f(x) \]
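For example, taking g(x) = x² in the cytosine example gives E(X²). The short R sketch below applies the formula directly, using the probabilities from the table; the function name `g` simply mirrors the notation above.

```r
# Cytosine example: values and exact probabilities from the table
x <- 0:4
f <- c(81, 108, 54, 12, 1) / 256

# E(g(X)) = sum of g(x) * f(x); the probabilities f(x) are unchanged by g
g <- function(x) x^2
e_g <- sum(g(x) * f)
e_g   # 1.75, i.e. 448/256
```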

If g multiplies X by a constant a, the expectation is multiplied by the same constant. If the function is the sum of two functions of X, the expectation of the sum is the sum of their expectations:

\[ E(aX)=a E(X) \]

\[ E(g(X)+h(X))=E(g(X))+E(h(X)) \]
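Both rules can be checked numerically on the cytosine example. In this sketch the helper `expectation_of` is hypothetical and simply applies the general formula for E(g(X)); the constant a = 3 and the choice h(x) = x² are arbitrary.

```r
x <- 0:4
f <- c(81, 108, 54, 12, 1) / 256   # cytosine example probabilities

# Hypothetical helper: E(g(X)) = sum of g(x) * f(x) for this distribution
expectation_of <- function(g) sum(g(x) * f)

a <- 3
# E(aX) = a E(X)
all.equal(expectation_of(function(x) a * x), a * expectation_of(function(x) x))
# E(g(X) + h(X)) = E(g(X)) + E(h(X)), with g(x) = x and h(x) = x^2
all.equal(expectation_of(function(x) x + x^2),
          expectation_of(function(x) x) + expectation_of(function(x) x^2))
```

Both comparisons return `TRUE`: E(3X) = 3 and E(X + X²) = 1 + 1.75 = 2.75.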

Formal Mathematical Definition of the Variance

\[ Var(X)=E\left[(X-\mu)^{2}\right] \]

where \(\mu = E(X)\)

This can be rewritten as:

\[ Var(X)= E(X^{2})-\mu^{2} \]
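The rewritten form follows by expanding the square and applying the linearity rules for expectation, treating μ as a constant. Because the probabilities sum to 1, the expectation of the constant μ² is just μ², and E(X) = μ:

\[
\begin{aligned}
Var(X) &= E\left[(X-\mu)^{2}\right] \\
&= E(X^{2}-2\mu X+\mu^{2}) \\
&= E(X^{2})-2\mu E(X)+\mu^{2} \\
&= E(X^{2})-2\mu^{2}+\mu^{2} \\
&= E(X^{2})-\mu^{2}
\end{aligned}
\]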

For the binomial distribution:

\[ Var(X)= np(1-p) \]
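For the cytosine example (n = 4, p = 1/4) the three ways of computing the variance should agree. The sketch below, with illustrative variable names, computes the definition, the rewritten form and the binomial formula side by side.

```r
x <- 0:4
f <- c(81, 108, 54, 12, 1) / 256   # cytosine example, n = 4, p = 1/4
mu <- sum(x * f)                    # E(X) = 1

# Variance from the definition E[(X - mu)^2]
var_def <- sum((x - mu)^2 * f)
# Variance from the rewritten form E(X^2) - mu^2
var_short <- sum(x^2 * f) - mu^2
# Variance from the binomial formula np(1 - p)
var_binom <- 4 * 0.25 * (1 - 0.25)

c(var_def, var_short, var_binom)   # all 0.75
```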

Example Problems

In a specific population, 20% of members have the Rhesus-negative blood type. Find, to 2 significant figures, the probability that more than one person in a random sample of six people from this population has this blood type.

This is an example of a binomial distribution problem where the probability of success is 0.2 and the number of trials is 6.

We could calculate the probabilities of there being 2, 3, 4, 5 and 6 Rhesus-negative members of the sample and add them, but it is easier to calculate the probability of there being 0 or 1 and subtract this from one. In R this uses the pbinom function, which calculates the cumulative probability P(X ≤ q); here we need P(X ≤ 1).

```r
# P(X > 1) = 1 - P(X <= 1) for X ~ Binomial(size = 6, prob = 0.2)
answer <- 1 - pbinom(1, size = 6, prob = 0.2)
signif(answer, 2)
## [1] 0.34
```
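The same probability can be checked in two other ways: by adding the individual probabilities for 2 to 6 people with `dbinom`, or by asking `pbinom` for the upper tail directly with `lower.tail = FALSE`, which returns P(X > q). The variable names in this sketch are illustrative.

```r
# Direct sum of P(X = 2), ..., P(X = 6) for X ~ Binomial(6, 0.2)
direct <- sum(dbinom(2:6, size = 6, prob = 0.2))
# Upper tail P(X > 1) straight from pbinom
upper <- pbinom(1, size = 6, prob = 0.2, lower.tail = FALSE)
# Complement of the cumulative probability P(X <= 1)
complement <- 1 - pbinom(1, size = 6, prob = 0.2)

signif(c(direct, upper, complement), 2)   # all 0.34
```

All three give 0.34464 before rounding. Using `lower.tail = FALSE` is generally preferable when the upper-tail probability is very small, because subtracting a number close to 1 from 1 loses precision; here the probability is large enough that the methods agree.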