Prob. & Freq. Distributions
Variable Types, Correlation, Covariance
POLS 3316: Statistics for Political Scientists

Tom Hanna

2023-09-28

Agenda

  • This is October 4th and 9th lecture

  • Last things before hypothesis testing

      - Finish probability - m
      - Finish frequency distribution - m
      - Types of variables - a
      - Correlations, covariance - t
  • practice stats problems by hand

m = basis of methods of testing; a = assumptions about methods; t = what we are testing

Announcements and reminders

  • Problem Set 2 (exam practice): Due October 10

  • Exam in this classroom October 11

      - Calculator (no phones)
      - Pencils/pens
      - 1 page single sided of notes
      - blank scratch paper
      - Formula sheet will be printed on the test

Probabilities and Frequence Distributions Conclusion

Other probability topics

Pre-quiz Question 14: I flip a fair coin ten times, the probability that coin comes up heads exactly five times is:

Simpler example

  • Exactly 2 tails out of 3 flips:

  • Question: In general, using sets how would we start?

Simpler example

  • Exactly 2 tails out of 3 flips:

  • Question: In general, using sets how would we start?

  • Answer: define the sample space

  • Question: What is the sample space for this problem?

Simple example: Sample space

  • The sample space is:

S = {TTT , TTH , THT , THH , HTT , HTH , HHT , HHH}

What is our total possible outcomes or the number of elements in the samples space?

Simple example: Sample space

  • The sample space is:

S = {TTT , TTH , THT , THH , HTT , HTH , HHT , HHH}

What is our total possible outcomes or the number of elements in the samples space? 8

(Note that with 3 trials of two possible outcomes this is \(2^3\))

  • What is our next general step?

Simple example: Sample space

  • The sample space is:

S = {TTT , TTH , THT , THH , HTT , HTH , HHT , HHH}

What is our total possible outcomes or the number of elements in the samples space? 8

(Note that with 3 trials of two possible outcomes this is \(2^3\))

  • Question: What is our next general step?

  • Answer: Define the event space

Simple example: Event space

  • The event space is possible ways that we can get exactly two tails
  • In probability, this is our favorable outcomes
  • What is our event space?

Simple example: Event space

  • What is our event space?
  • s = {TTH,THT,HTT}
  • Question: What is our favorable outcomes or number of elements in the event space?

Simple example: Event space

  • What is our event space?
  • s = {TTH,THT,HTT}
  • Question: What is our favorable outcomes or number of elements in the event space?
  • Answer: 3
  • Question: So what is the probability?

Simple example: Probability

  • P(A) = \(\frac{favorable_A}{all \; possible \; outcomes_A}\)

  • P(s) = \(\frac{favorable_S}{all \; possible \; outcomes_S}\)

  • favorable_S is the event space s

  • P(s) =

Simple example: Probability

  • P(A) = \(\frac{favorable_A}{all \; possible \; outcomes_A}\)

  • P(s) = \(\frac{favorable_S}{all \; possible \; outcomes_S}\)

  • favorable_S is the event space s

  • P(s) = \(\frac{3}{8}\)

Slightly more complicated example

  • If we wanted at least two tails
  • The sample space is unchanged with 8 elements
  • Q: What is the event space, \(s_b\)

Slightly more complicated example

  • If we wanted at least two tails
  • The sample space is unchanged with 8 elements
  • Q; What is the event space, \(s_b\)
  • A: \(s_b\) = {TTT,TTH,THT,HTT}
  • Q: What is the total elements?
  • Q: What is the probability?

Slightly more complicated example

  • If we wanted at least two tails
  • The sample space is unchanged with 8 elements
  • Q; What is the event space, \(s_b\)
  • A: \(s_b\) = {TTT,TTH,THT,HTT}
  • Q: What is the total elements? 4
  • Q: What is the probability? \(\frac{4}{8}\)

Bigger numbers

  • For bigger numbers this is too cumbersome to do by hand
  • But conceptually this is how we derive probabilities
  • Thinking in terms of sample and event spaces can be useful in understanding how to solve problems and in understanding probabilities we are given.
  • We have rules for combinations and permutations
  • If you master these rules you will get a better GRE score than me

Combinations and permutations

  • Permutations: Order matters

  • With repetition - number possible: \(n^r\)

  • Without repetition - Number possible: \(\frac{n!}{(n − r)!}\)

  • Combinations: Order doesn’t matter

  • Number possible: \(\binom{n}{r}\) = \(\frac{n!}{r!(n − r)!}\)

Question 14: Combinations

Question 14: I flip a fair coin ten times, the probability that coin comes up heads exactly five times is:

There are 10 flips, each with two possible outcomes. which means there are \(2^{10}\) possible outcomes. The number of favorable possibilities is the combinations that can result in 5 heads, found with the combinations formula

  • \(\binom{n}{r}\) or \(\binom{10}{5}\)
  • or \(\frac{n!}{r!(n − r)!}\)

Question 14

  • That yields this cumbersome formula:

\(\frac{10!}{(5!)^2 * 2^{10}}\)

  • This can be rewritten as this:

\(\frac{10*9*8*7*6*5*4*3*2*1}{(5*4*3*2*1)*(5*4*3*2*1)*2^{10}}\)

  • which simplifies to

\(\frac{10*9*8*7*6}{(5*4*3*2*1)*2^{10}}\)

Potential Test Questions

If you see a test question on this, I would expect you to be able to:

  • Identify if there is a combination vs permutation

  • Identify if it is with or without repetition

  • Solving a permutation question may be bonus material

  • Complete a very simple example using sets, i.e. exactly 2 tails out of 3 flips.

      - For this I might ask for a numeric answer, to pick the correct sample space from a list, to pick the correct event space from a list, or some other stage of the problem.
  • Set up the problem without completing it. For example see next slide:

Set up without completing examples

  • Narrative: The answer to this problem would be figured by dividing the combination 10 choose 5 by the number of possible outcomes, which is 2 to the tenth power
  • Formula: \(\frac{\binom{n}{r}}{2^{10}}\)
  • Narrative: The favorable outcomes are a combination choosing 5 from 10 and the possible outcomes are \(2^10\)
  • Narrative: I would use a combination to find the favorable outcomes and use 2^10 as the total possible outcomes

If you intend to take the GRE

Types of variables

  • We need to know types of variables because they affect how we should treat the data.
  • The techniques for this class are only completely valid for one type: continuous.

Types of variables

  • Continuous - numbers with decimal places

  • Discrete - whole numbers or integers

  • Categorical - numbers just represent categories

      - Ordinal - ordered highest to lowest
      - Nominal - no particular order
  • Binary - choice of two alternatives such as war/peace

  • Count variables

Continuous variable: examples

  • temperature
  • amount of rain in a city
  • height in inches \(^t\)
  • age \(^t\)
  • nations’ GDP
  • household income
  • population

\(^t\) Technically only if we measure to fairly small fractions, but see the slide following the next.

Discrete variable: examples

  • number of days with a high above 90 degrees
  • number of days of rain in a city
  • height in feet
  • age in decades
  • GDP by billions
  • income by ranges of $5,000
  • district magnitude

Discrete v. Continuous

  • The distinction between discrete and continuous is a matter of judgment. At the smallest level, called the Planck scale in physics, every physical measurement is discrete.

  • It is the size of the measurement vs the size of the gaps that matter

      - measuring by centimer if we are measuring 5 centimeters is discrete
      - measuring by centimer if we are measuring 1000 km is continuous

Planck scale: \(10x10^{−35}\) m, \(5x10^-{44}\) seconds

Categorical: Nominal

  1. Numbers that represent categories in no particular order.
  2. We can’t do math on the numbers because the numbers really just represent names.
  • color
  • ethnicity
  • religion
  • country of origin
  • zip code
  • form of government

Categorical: Ordinal

  1. Order matters.
  2. We still can’t use math because the numbers don’t represent ratios, just order

Categorical: Ordinal

  • hot, warm, cool, cold

  • Movie ratins (G, PG, PG-13, R, NC-17)

  • Education level

  • Likert scales

  • Regime type

      - Closed Authocracy
      - Electoral Autocracy
      - Electoral Democracy
      - Liberal Democracy

What do we do with categorical variables?

  • As independent variables, R can handle them by treating them as factors
  • As dependent variables, we have to use more advanced Maximum Likelihood Estimate (MLE) techniques

Binary variables

  • yes/no as in a survey answer
  • war/peace
  • democracy/non-democracy
  • proportional/majoritarian
  • freedom/slavery

Count variables (Poisson variables)

  • Number of days of rain in a city in a year
  • Number of text messages per hour
  • Number of militarized interstate disputes per country per year
  • Number of men kicked to death by horses per Prussian cavalry unit per year (Ladislaus Bortkiewicz, 1898)

Types of variables

We need to know types of variables because they affect how we should treat the data. :

  • Different types have different distributions
  • Simple statistics treating categorical variables as continuous or discrete numerical values yield false results
  • Discrete values may be misinterpreted

Population and Sample

  • Population is the full set

  • All possible data

  • The entire group of interest

      - the population of country under study
      - the set of nation states
      - the set of Members of Congress
  • S{}

Population and Sample

  • Sample is the subset that we actually observe
  • Population is S{}, sample is s{}
  • S is always less than s

Population and Sample formulas

For many formulas:

  • Population statistics are divided by n
  • Sample statistics are divided by n - 1
  • This is to correct for the bias created by not observing the entire population
  • The larger n is, the less difference the -1 makes

Correlation and Covariance

  • Related measures
  • Do not worry about computing them for the test
  • You should understand what they measure

Correlation

  • What is correlation?

  • The strength of association between two variables

Correlation of 1

  • 1 is perfect positive correlation

      - It doesn't mean you just multiply times 1 and get the other
      - The relationship can be complex (not a line)
      - It just means there is no variation at all in the relationship

Correlation of 0

  • 0 is perfectly uncorrelated
  • There is zero relationship between the variables

Negative correlation

  • -1 is a perfect negative correlation
  • It is still a perfect correlation
  • Indicates that when one increases the other decreases in perfect relationship
  • Like a correlation of 1, it can be complex

Computing correlations

  • Many ways to figure correlation
  • Probably the most common is Pearson’s Correlation Coefficient
  • Pearson’s correlation is the default method when using the cor() function in R

Computing correlations

  • The formula for Pearson’s correlation is:

Pearson’s Correlation Formula

Covariance

  • What is covariance?
  • A measure of the joint variability of two random variables
  • Measures the direction of the relationship
  • Correlation measures strength of the relationship

Covariance Formula

  • The formual for covariance is:

Covariance formula

Finding correlation from covariance

  • Cor(x,y) = \(\frac{Cov(x,y)}{\sigma_x,\sigma_y}\)

  • \(\sigma_x\) is standard deviation of x

  • \(\sigma_y\) is standard deviation of y

Authorship, License, Credits

Covariance formula from: EDUCBA

Creative Commons License