We ended with an example of the probability of non-mutually exclusive sets
Non-mutually exclusive sets
Mutually exclusive sets: No elements in common
Non-mutually exclusive sets: Sets with elements in common
The probability for the union of mutually exclusive sets was the sum of their probabilities:
P(A \(\cup\) B) = P(A) + P(B)
With non-mutually exclusive sets this results in double counting of the shared elements, so… what can happen?
S = {1,2,3,4,5,6,7,8,9,10}
A = {1,2,3,4,5,6}
B = {5,6,7,8,9,10}
With this example, P(A) + P(B) = 6/10 + 6/10 = 12/10, so simple addition could lead to P > 1
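A minimal R sketch of the double counting, using the sets above (the object names are just for illustration):

S <- 1:10
A <- 1:6
B <- 5:10

# Naive addition counts the shared elements {5, 6} twice
(length(A) + length(B)) / length(S)   # 1.2, an impossible probability

# The union only contains 10 distinct elements, so the true probability is 1
length(union(A, B)) / length(S)       # 1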
Another example
S = {C,D,E,F,G,H,W,Z} (8 possibilities)
A = {C,D,E,W,Z} (5 favorable)
B = {D,E,F,G,H} (5 favorable)
P(A) = 5/8 and P(B) = 5/8
Simple addition gives P(A) + P(B) = 10/8, which violates the rule that probabilities range from 0 to 1
{D,E} got counted twice
{D,E} is A \(\cap\) B
Solution
So for non-mutually exclusive sets, our formula is:
P(A \(\cup\) B) = P(A) + P(B) - P(A \(\cap\) B)
Note that this formula also works for mutually exclusive sets, because their intersection is the empty set, which has P = 0
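A minimal R sketch of the corrected formula, reusing the sets from the example above:

S <- c("C","D","E","F","G","H","W","Z")
A <- c("C","D","E","W","Z")
B <- c("D","E","F","G","H")

p.A  <- length(A) / length(S)                   # 5/8
p.B  <- length(B) / length(S)                   # 5/8
p.AB <- length(intersect(A, B)) / length(S)     # {D,E} -> 2/8

# P(A union B) = P(A) + P(B) - P(A intersect B) = 8/8 = 1
p.A + p.B - p.AB

# Matches counting the union directly
length(union(A, B)) / length(S)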
Independence
For now, problems will assume independence unless explicitly specified
Non-independent (conditional) events are covered by Bayes Rule
You don’t have to worry about Bayes Rule for now except to understand that it exists and applies to non-independent events aka conditional events
Independent events are unrelated
If learning the probability of one event (A) doesn’t affect the probability of the other event (B), they are independent
P(A \(\cap\) B) = P(A)P(B) for independent events
The probability rules for non-independent events, called conditional probability, are different
Example: If I draw a number from a hat with the numbers 1 to 5, then flip a coin, the outcome of the draw doesn’t affect the outcome of the coin flip.
H = {1,2,3,4,5}
C = {head,tail}
What is P(1 \(\cap\) tail)?
Possible Test Question
What if on a short answer test question, I ask: “Event A and Event B are not independent. How would you determine the conditional probability of event A given event B?” What would you answer?
Why does the hat draw-coin flip example work?
If we create a set O of all possible outcomes, it looks like this:
O = {1+head,2+head,3+head,4+head,5+head,1+tail,2+tail,3+tail,4+tail,5+tail}
(1 \(\cap\) tail) or {1+tail} is one event
P(1 \(\cap\) tail) = 1 favorable / 10 possible = 1/10
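The same count can be reproduced in R by enumerating the sample space; a minimal sketch (the object names are just for illustration):

H <- 1:5
C <- c("head", "tail")

# All 10 equally likely draw-and-flip outcomes
O <- expand.grid(draw = H, coin = C)
nrow(O)                                 # 10

# P(1 and tail): 1 favorable outcome out of 10
mean(O$draw == 1 & O$coin == "tail")    # 0.1

# Matches the multiplication rule for independent events
(1/5) * (1/2)                           # 0.1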
Back to the test question
Answers:
I would apply Bayes Rule.
OR
I would construct a sample space and determine the probabilities with set theory. (Note: This is what Bayes Rule actually does.)
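For reference, the definition behind that answer is P(A | B) = P(A \(\cap\) B) / P(B). Here is a minimal R sketch with made-up, non-independent events (the sets are hypothetical, chosen only to illustrate the calculation):

S <- 1:10
A <- c(2, 4, 6, 8, 10)   # hypothetical event: draw an even number
B <- 6:10                # hypothetical event: draw a number greater than 5

p.B  <- length(B) / length(S)                  # 0.5
p.AB <- length(intersect(A, B)) / length(S)    # {6, 8, 10} -> 0.3

# Conditional probability: P(A | B) = P(A and B) / P(B)
p.AB / p.B               # 0.6

# Not equal to P(A) = 0.5, so A and B are not independent here
length(A) / length(S)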
Why do we use data?
Purpose: analyzing data for causal inference (to begin to make statements about cause and effect - inferring causes)
Complex and uncertain data requires that we make…
Assumptions about the data
Because the world is complex, to make sense of unknowns we make assumptions about data
The assumptions are useful approximations even when not precisely true
We still need to check that the real data does not seriously violate the assumptions
Data Assumptions: Random, Independent, and Identically Distributed
Randomness and independence matter as assumptions about data
Specifically, these are assumptions about the Data Generating Process or DGP
The Data Generating Process: the way the world produces the data
The Data Generating Process
The source of the data matters - the DGP matters
Previously stated: Data comes from a random world
So the DGP is random
Independence and Distribution
Events in the data are independent and identically distributed - the IID assumption
Independence is statistical independence - the outcome of one event does not affect our belief about the probability of another event
The hat draw does not affect the coin toss
X does not affect Y
If X does affect Y, we may begin to infer some direct or indirect causal relationship in some direction, possibly running through one or more additional variables, but not necessarily that X causes Y. This is commonly shortened to the not-quite-accurate summary “correlation does not imply causation.”
Identically distributed: drawn from the same probability distribution
So…
Introduction to distributions
R has functions for at least 20 distributions
The most important is the normal distribution
This is because of the central limit theorem
We will look at these in the most detail: normal, binomial, uniform, Poisson
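For each distribution, R uses a consistent naming scheme: d for the density, p for the cumulative probability, q for the quantile, and r for random draws. For example, with the normal distribution:

dnorm(0)       # density at x = 0
pnorm(1.96)    # P(X <= 1.96), about 0.975
qnorm(0.975)   # the value with 97.5% of the distribution below it, about 1.96
rnorm(5)       # 5 random draws from a standard normal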
Distribution examples
The following are histograms
They represent the frequency, or simply the count, of observations for each value
For example, if the value 4 shows 500, it means that 4 came up 500 times in the data
The graphs were produced by generating random numbers based on the particular distribution with an R function
Uniform distribution
All outcomes are equally likely
Uniform distribution: with code
rand.unif <- runif(100000, min = 0, max = 10)
hist(rand.unif, breaks = 20, freq = TRUE,
     main = "uniform distribution of 100,000 random draws", xlab = 'x', col = "red")
Normal Distribution
symmetrical around its mean with most values near the central peak
width is a function of the standard deviation
Other names: Gaussian distribution, bell curve
Normal Distribution: with code
rand.norm <- rnorm(100000)
hist(rand.norm, breaks = 200, freq = TRUE,
     main = "normal distribution, sd = 1, 100,000 random draws", xlab = 'x', col = "red")
Binomial Distribution
binary
success/failure
yes/no
distribution for a number of Bernoulli trials
Binomial example: with code
n = 1 makes this a Bernoulli distribution
rand.binom <- rbinom(100000, 1, .5)
hist(rand.binom, breaks = 200, freq = TRUE,
     main = "binomial distribution, p = .5, 1 trial, 100,000 draws", xlab = 'x', col = "red")
Binomial example: with code
trials = 25
rand.binom2 <- rbinom(100000, 25, .5)
hist(rand.binom2, breaks = 200, freq = TRUE,
     main = "binomial distribution, p = .5, 25 trials, 100,000 draws", xlab = 'x', col = "red")
Preview of the Central Limit Theorem
What happens if we do the same thing as above, but with 1,000 trials instead of 25, and plot the counts?
Preview of the Central Limit Theorem: code
rand.binom3 <- rbinom(100000, 1000, .5)
hist(rand.binom3, breaks = 200, freq = TRUE,
     main = "Histogram of binomial distribution, p = .5, 1000 trials, 100,000 draws", xlab = 'x', col = "red")
Preview of the Central Limit Theorem
For sufficiently large sample sizes, the distribution of sample means approximates a normal distribution
This means with a large enough number of trials, we can apply the normal distribution to know things about measures of central tendency, measures of dispersion, and probabilities
A common rule of thumb: “sufficiently large” means sample sizes above 30
This is just a preview
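A minimal R sketch of the idea: take repeated samples from a non-normal (uniform) distribution, and the histogram of the sample means is already roughly normal:

# 1,000 sample means, each from a sample of 30 uniform draws
sample.means <- replicate(1000, mean(runif(30, min = 0, max = 10)))
hist(sample.means, breaks = 30, freq = TRUE,
     main = "sample means of uniform draws, n = 30", xlab = 'x', col = "red")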
68-95-99.7 Rule
One of the rules for normal distributions is:
The 68-95-99.7 rule
68% of the data is within 1 standard deviation of the mean
95% of the data is within 2 standard deviations of the mean
99.7% of the data is within 3 standard deviations of the mean
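A quick empirical check of the rule in R, using simulated standard normal draws:

z <- rnorm(100000)
mean(abs(z) < 1)   # about 0.68
mean(abs(z) < 2)   # about 0.95
mean(abs(z) < 3)   # about 0.997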
Preview of the Law of Large Numbers
The law of large numbers tells us that if we repeat an experiment a large number of times, the average of the results will be close to the expected value
This allows us to treat the actual mean of a large sample as an estimate of the expected mean of the population
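A minimal R sketch: the running average of repeated fair coin flips drifts toward the expected value of 0.5 as the number of flips grows:

flips <- rbinom(10000, 1, .5)              # 10,000 fair coin flips (1 = head)
running.mean <- cumsum(flips) / seq_along(flips)
running.mean[c(10, 100, 1000, 10000)]      # moves closer and closer to 0.5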
Poisson distribution
Count of number of events in a fixed time/space
Known constant mean rate (we know how often events occur on average)
Independent of time since last event
Poisson distribution: with code
rand.poiss <- rpois(100000, 1)
hist(rand.poiss, breaks = 200, freq = TRUE,
     main = "poisson distribution, lambda = 1, 100,000 draws", xlab = 'x', col = "red")
Why we can’t use standard OLS regression for other DGP
We base the likelihood of something being significant on the proximity to the mean
As things get further from the mean in a normal distribution, they become less likely
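For example, the normal density drops off quickly as you move away from the mean, which is what makes distance from the mean informative:

dnorm(0)   # density at the mean, about 0.40
dnorm(1)   # 1 sd away, about 0.24
dnorm(2)   # 2 sd away, about 0.05
dnorm(3)   # 3 sd away, about 0.004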