Geometric Distribution

0.1 Geometric Distribution

0.1.1 Geometric Distribution

The Geometric distribution is related to the Binomial distribution in that both are based on independent trials in which the probability of success is constant and equal to $p$.
However, a Geometric random variable is the number of trials until the first failure, whereas a Binomial random variable is the number of successes in n trials.

0.1.2 Type I Geometric Distribution

Geometric distributions model (some) discrete random variables. Typically, a Geometric random variable is the number of trials required to obtain the first failure, for example, the number of tosses of a coin until the first ‘tail’ is obtained, or a process where components from a production line are tested, in turn, until the first defective item is found.
A discrete random variable $X$ is said to follow a Geometric distribution with parameter $p$, written $X \sim Ge(p)$, if it has probability distribution \[P(X=x) = p^{x-1}(1-p)\] where
$x = 1, 2, 3, \ldots$
p = success probability; $0 < p < 1$
The trials must meet the following requirements:

the total number of trials is potentially infinite;
there are just two outcomes of each trial: success and failure;
the outcomes of all the trials are statistically independent;
all the trials have the same probability of success.

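Under this convention (counting trials up to and including the first failure, with $p$ the success probability), the Geometric distribution has expected value and variance \[E(X) = \frac{1}{1-p}\] \[V(X) = \frac{p}{(1-p)^2}.\]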
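The mean and variance for the first-failure convention can be checked numerically. The following is a minimal sketch, assuming the pmf is $p^{x-1}(1-p)$ (that is, $x-1$ successes followed by one failure); the function name and the value $p = 0.75$ are illustrative choices, not part of the notes.

```python
def pmf_first_failure(x: int, p: float) -> float:
    """P(X = x): x - 1 successes followed by one failure (p = success probability)."""
    if x < 1:
        return 0.0
    return p ** (x - 1) * (1 - p)


p = 0.75  # illustrative success probability (an assumption, not from the notes)
xs = range(1, 2000)  # truncate the infinite support; the remaining tail is negligible

total = sum(pmf_first_failure(x, p) for x in xs)
mean = sum(x * pmf_first_failure(x, p) for x in xs)
var = sum((x - mean) ** 2 * pmf_first_failure(x, p) for x in xs)

# Compare with the closed forms 1/(1-p) and p/(1-p)^2
print(total, mean, 1 / (1 - p))
print(var, p / (1 - p) ** 2)
```

The truncated sums should agree with the closed-form expressions up to floating-point error.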


%——————————————————————%

The geometric distribution is used for Bernoulli trials, where the outcomes are classified as either failures or successes.

In probability theory, the geometric distribution is either of two discrete probability distributions:

The probability distribution of the number of trials needed to get first success, supported on the set $\{ 1, 2, 3, \ldots\}$
The probability distribution of the number of failures before the first success, supported on the set $\{ 0, 1, 2, 3, \ldots\}$
Which of these one calls “the” geometric distribution is a matter of convention and convenience; results for one are easily derived from the other.
These two different geometric distributions should not be confused with each other.
Often, the name shifted geometric distribution is adopted for the former one (the distribution of the number of trials $X$);
however, to avoid ambiguity, it is wise to indicate which is intended by stating the support explicitly.
If the probability of success on each trial is $p$, then the probability that the first success occurs on the $k$th of $k$ independent trials is \[ P(X = k) = (1-p)^{k-1}\,p \quad \text{for } k = 1, 2, 3, \ldots \]
This form of the geometric distribution models the number of trials up to and including the first success.
By contrast, the following form models the number of failures before the first success, so that $Y = X - 1$: \[ P(Y=k) = (1 - p)^k\,p \quad \text{for } k = 0, 1, 2, 3, \ldots\]

In either case, the sequence of probabilities is a geometric sequence.
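The relationship between the two conventions can be sketched in a few lines of Python. The function names and the value of `p` below are assumptions made for illustration; the sketch only shows that the failures-count pmf is the trials-count pmf shifted by one, and that consecutive probabilities share a common ratio.

```python
import math


def pmf_trials(k: int, p: float) -> float:
    """P(X = k): first success on trial k, support {1, 2, 3, ...}."""
    return (1 - p) ** (k - 1) * p if k >= 1 else 0.0


def pmf_failures(k: int, p: float) -> float:
    """P(Y = k): k failures before the first success, support {0, 1, 2, ...}."""
    return (1 - p) ** k * p if k >= 0 else 0.0


p = 0.3  # illustrative success probability (an assumption)

for k in range(1, 6):
    # Y = X - 1, so P(X = k) equals P(Y = k - 1)
    print(k, pmf_trials(k, p), pmf_failures(k - 1, p))

# Consecutive probabilities form a geometric sequence with a common ratio
ratios = [pmf_trials(k + 1, p) / pmf_trials(k, p) for k in range(1, 6)]
print(ratios)
```

Stating the support next to each function, as in the docstrings, is one way to keep the two conventions from being confused.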

For example, suppose an ordinary die is thrown repeatedly until the first time a “1” appears.
The probability distribution of the number of times it is thrown is supported on the infinite set $\{ 1, 2, 3, \ldots \}$ and is a geometric distribution with $p = 1/6$.

The expected value of a geometrically distributed random variable $X$ (the number of trials up to and including the first success) is $1/p$ and the variance is $(1 - p)/p^2$: \[ \mathrm{E}(X) = \frac{1}{p}, \qquad\mathrm{var}(X) = \frac{1-p}{p^2}. \]
Similarly, the expected value of the geometrically distributed random variable $Y$ (the number of failures before the first success) is $(1 - p)/p$, and its variance is $(1 - p)/p^2$: \[ \mathrm{E}(Y) = \frac{1-p}{p}, \qquad\mathrm{var}(Y) = \frac{1-p}{p^2}.\]
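
As a quick worked check of these formulas, take the die example with $p = 1/6$ (the example is used here only for illustration):
\[ \mathrm{E}(X) = \frac{1}{1/6} = 6, \qquad \mathrm{var}(X) = \frac{5/6}{(1/6)^2} = 30, \qquad \mathrm{E}(Y) = \frac{5/6}{1/6} = 5. \]
So on average the die is thrown six times, five of which are failures before the first “1” appears.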


Now consider an experiment with only two outcomes. Independent repeated trials of such an experiment are called Bernoulli trials, named after the Swiss mathematician Jacob Bernoulli (1654-1705). The term “independent” means that the outcome of any trial does not depend on the previous outcomes (as in tossing a coin).
We will call one of the outcomes the “success” and the other outcome the “failure”.

Suppose that I am at a party and I start asking girls to dance, and that each girl accepts independently with probability $p$. Let X be the number of girls that I need to ask in order to find a partner. If the first girl accepts, then X=1. If the first girl declines but the next girl accepts, then X=2. And so on.
When X=n, it means that I failed on the first n-1 tries and succeeded on the nth try. My probability of failing on the first try is $(1-p)$. My probability of failing on the first two tries is $(1-p)(1-p)$.
My probability of failing on the first n-1 tries is $(1-p)^{n-1}$. Then, my probability of succeeding on the nth try is $p$. Thus, we have

\[ P(X = n) = (1-p)^{n-1}p \]

This is known as the geometric distribution. When you have a sequence of numbers in which the (n+1)th number is a multiple of the nth number, it is called a geometric sequence. In this case, P(X = n+1) is a multiple of P(X = n). (What is that multiple?)
What is the probability that it will take more than n tries to succeed? We know that if I ask an infinite number of girls to dance, eventually one of them will accept. So, the probability that it will take more than n tries is the same as the probability that I fail n times. That is,

\[ P(X > n) = (1-p)^n \]
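
The tail formula can be checked against the pmf directly. This is a small sketch under the trials-until-first-success convention; the helper names and the value $p = 0.2$ are assumptions chosen for illustration.

```python
import math


def pmf(n: int, p: float) -> float:
    """P(X = n) = (1 - p)^(n - 1) * p for n = 1, 2, 3, ..."""
    return (1 - p) ** (n - 1) * p


def tail_by_complement(n: int, p: float) -> float:
    """P(X > n), computed as 1 minus the sum of P(X = k) for k = 1..n."""
    return 1.0 - sum(pmf(k, p) for k in range(1, n + 1))


p = 0.2  # illustrative success probability (an assumption)
for n in (1, 3, 10):
    # The two columns should agree up to floating-point error
    print(n, tail_by_complement(n, p), (1 - p) ** n)
```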

If X is geometric with parameter p, what is E(X)?

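We are faced with an infinite sum. Multiplying each value $n$ by $P(X = n)$ for $n = 1, 2, 3, \ldots$ and adding gives

\[ [1] \quad S = p + 2p(1-p) + 3p(1-p)^2 + \cdots + np(1-p)^{n-1} + \cdots \]

Multiply both sides by $(1-p)$ and you have

\[ [2] \quad (1-p)S = p(1-p) + 2p(1-p)^2 + 3p(1-p)^3 + \cdots + np(1-p)^n + \cdots \]

Subtracting [2] from [1] gives

\[ S - (1-p)S = pS = p\left[1 + (1-p) + (1-p)^2 + \cdots + (1-p)^n + \cdots\right] = p \cdot \frac{1}{p} = 1, \qquad \text{so} \qquad S = \frac{1}{p}. \]

The bracketed sum is a geometric series with ratio $(1-p)$, which sums to $1/(1-(1-p)) = 1/p$.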

Therefore, the mean of the geometric distribution is equal to $1/p$. If we are trying to estimate how many girls I will have to ask to dance until I find a partner, and $p$, the probability of one girl accepting, is 0.2, then on average I will have to ask five girls. You will not have to know it, but for the record, the variance of the geometric distribution is $(1-p)/p^2$.
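
The result can also be illustrated by simulation. The sketch below repeats the experiment many times with $p = 0.2$ and averages the number of tries; the function name, the seed and the number of repetitions are arbitrary assumptions, and the output is only approximately equal to the theoretical values.

```python
import random


def trials_until_success(p: float, rng: random.Random) -> int:
    """Run Bernoulli trials until the first success and return how many were needed."""
    n = 1
    while rng.random() >= p:  # rng.random() < p counts as a success
        n += 1
    return n


rng = random.Random(0)  # fixed seed so the run is repeatable
p = 0.2
samples = [trials_until_success(p, rng) for _ in range(100_000)]

mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / (len(samples) - 1)

# Theory: mean = 1/p = 5 and variance = (1-p)/p^2 = 20
print(mean, var)
```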

%———————–

The formula for the geometric distribution is

\[ P(X=k) = (1-p)^{k-1} p \]

For example, with $p = 0.2$, the probability that the first success occurs on the fourth try is

\[ P(X=4) = (1-0.2)^{4-1}(0.2) = 0.1024 \]

%————————————————————-%


\[ P(X = n) = (1-p)^{n-1}p \]

\[ P(X > n) = (1-p)^n \]

%————————————–%

The expected value is $E[X] = 1/p$.
The variance of the geometric distribution is \[Var(X) = \frac{1-p}{p^2}.\]

%————————————————————-%


The following conditions characterize the hypergeometric distribution: The result of each draw (the elements of the population being sampled) can be classified into one of two mutually exclusive categories (e.g. Pass/Fail or Female/Male or Employed/Unemployed). The probability of a success changes on each draw, as each draw decreases the population (sampling without replacement from a finite population).

A random variable X follows the hypergeometric distribution if its probability mass function (pmf) is given by[1] \[ P(X = k) = \frac{\binom{K}{k} \binom{N - K}{n-k}}{\binom{N}{n}},\] where

N is the population size,
K is the number of success states in the population,
n is the number of draws,
k is the number of observed successes,
$\textstyle {a \choose b}$ is a binomial coefficient.
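
The pmf above translates directly into code. The following is a minimal sketch using Python's standard `math.comb`; the function name `hypergeom_pmf` and the population values are hypothetical and chosen only to show that the probabilities over all possible $k$ add up to 1.

```python
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k) when drawing n items without replacement from N, of which K are successes."""
    # Outside this range the event is impossible
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


N, K, n = 20, 7, 5  # hypothetical population size, success states, and draws
probs = [hypergeom_pmf(k, N, K, n) for k in range(n + 1)]

for k, prob in enumerate(probs):
    print(k, prob)
print(sum(probs))  # should be 1 up to floating-point error
```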

When sampling is done without replacement of each sampled item taken from a finite population of items, the Bernoulli process does not apply because there is a systematic change in the probability of success as items are removed from the population.

When sampling without replacement is used in a situation that would otherwise qualify as a Bernoulli process, the hypergeometric distribution is the appropriate discrete probability distribution.
Given that $X$ is the designated number of successes, $N$ is the total number of items in the population, $T$ is the total number of successes included in the population, and $n$ is the number of items in the sample, the formula for determining hypergeometric probabilities is \[ P(X) = \frac{\binom{T}{X} \binom{N - T}{n - X}}{\binom{N}{n}}. \]

Equivalently, think of the population as two types of groups:
select $k_1$ items from group 1, which has $n_1$ members;
select $k_2$ items from group 2, which has $n_2$ members.

\[ \frac{ {n_1 \choose k_1}\times {n_2 \choose k_2} }{{n_T \choose k_T}} \]

where
$k_T = k_1 + k_2$
$n_T = n_1 + n_2$

Example: suppose we have to select a committee of 8 people from 18.
Of these 18 people, 10 are males and 8 are females.
What is the probability that the committee contains 5 females?
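
One way to work this example, assuming the committee is chosen at random and treating the 8 females as group 1 ($n_1 = 8$, $k_1 = 5$) and the 10 males as group 2 ($n_2 = 10$, $k_2 = 3$):
\[ P(\text{5 females}) = \frac{\binom{8}{5}\binom{10}{3}}{\binom{18}{8}} = \frac{56 \times 120}{43758} = \frac{6720}{43758} \approx 0.1536 \]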


In the last class, we looked at how to compute the mean, variance and standard deviation.

As these are key outcomes of this part of the course, we shall briefly go over this material again.

The mean (i.e. average) value is denoted with a bar over the set name, i.e. $\bar{x}$.

$\bar{x}$ (pronounced “x bar”) is the sample mean.

%=================================================%

Example

A sample data set comprises five values.

What is the sample mean value of data set $x$ (i.e. what is $\bar{x}$)?

The sample mean is 44.

%=================================================%

Variance

How do we calculate the variance? We can use scientific calculators or we can calculate it by hand using the following formula: \[ s^2 = \frac{\sum (x - \bar{x})^2}{n-1} \]

We are calculating the difference between each observation $x$ and the mean $\bar{x}$. Remark: the mean is used in the calculation. Some of the differences will be positive and some will be negative, so we square the differences to make them all positive.

%=================================================%

An easier formula to use if you are calculating the sample standard deviation by hand is \[ s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1}} \]
The population variance (which is rarely known) is denoted by the Greek letter $\sigma^2$ (sigma squared).
Important: the standard deviation $\sigma$ is the square root of the variance $\sigma^2$.
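
The hand calculation can be mirrored in a short script. This is a sketch only: the five data values are hypothetical (chosen so that the mean is 44), and it assumes the usual sample definition with divisor $n - 1$, comparing the definition with the algebraically equivalent shortcut that uses the sum of squares.

```python
import math
import statistics

data = [40, 42, 44, 46, 48]  # hypothetical sample of five values (an assumption)
n = len(data)

mean = sum(data) / n

# Definition: squared deviations from the mean, divided by n - 1
s2_definition = sum((x - mean) ** 2 for x in data) / (n - 1)

# Shortcut: sum of squares minus (sum squared over n), divided by n - 1
s2_shortcut = (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

s = math.sqrt(s2_definition)  # sample standard deviation

print(mean, s2_definition, s2_shortcut, s)
print(statistics.variance(data))  # standard-library cross-check
```

Both routes give the same variance; the shortcut simply avoids computing each deviation separately.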


0.1.3 Videos

Geometric Distribution with Chi Square Test for Goodness of Fit