Prob. & Freq. Distributions
Variable Types, Correlation, Covariance
POLS 3316: Statistics for Political Scientists
2023-09-28
This is the lecture for October 4th and 9th
Last things before hypothesis testing
- Finish probability - m
- Finish frequency distribution - m
- Types of variables - a
- Correlations, covariance - t
practice stats problems by hand
m = basis of methods of testing; a = assumptions about methods; t = what we are testing
Problem Set 2 (exam practice): Due October 10
Exam in this classroom October 11
- Calculator (no phones)
- Pencils/pens
- 1 page single sided of notes
- blank scratch paper
- Formula sheet will be printed on the test
Pre-quiz Question 14: I flip a fair coin ten times. The probability that the coin comes up heads exactly five times is:
Exactly 2 tails out of 3 flips:
Question: In general, using sets how would we start?
Answer: define the sample space
Question: What is the sample space for this problem?
S = {TTT , TTH , THT , THH , HTT , HTH , HHT , HHH}
What is the total number of possible outcomes, i.e. the number of elements in the sample space? 8 (Note that with 3 trials of two possible outcomes each, this is \(2^3\).)
Question: What is our next general step?
Answer: Define the event space
P(A) = \(\frac{\text{favorable}_A}{\text{all possible outcomes}_A}\)
P(s) = \(\frac{\text{favorable}_s}{\text{all possible outcomes}_s}\)
\(\text{favorable}_s\) is the number of outcomes in the event space s = {TTH , THT , HTT}
P(s) = \(\frac{3}{8}\)
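The set-based steps above (define the sample space, define the event space, divide) can be checked by brute-force enumeration. This is only a minimal sketch; the variable names are illustrative and not part of the lecture.

```python
from fractions import Fraction
from itertools import product

# Sample space: every ordered sequence of 3 flips, each either H or T.
sample_space = ["".join(flips) for flips in product("HT", repeat=3)]

# Event space: the outcomes with exactly 2 tails.
event_space = [outcome for outcome in sample_space if outcome.count("T") == 2]

# Probability = favorable outcomes / all possible outcomes.
probability = Fraction(len(event_space), len(sample_space))

print(len(sample_space))    # 8
print(sorted(event_space))  # ['HTT', 'THT', 'TTH']
print(probability)          # 3/8
```

Enumeration like this only works for very small problems, which is why the counting formulas matter for larger ones.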
Permutations: Order matters
With repetition - number possible: \(n^r\)
Without repetition - number possible: \(\frac{n!}{(n - r)!}\)
Combinations: Order doesn’t matter
Number possible: \(\binom{n}{r}\) = \(\frac{n!}{r!(n - r)!}\)
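As a quick check on the three counting formulas, here is a small sketch using Python's standard `math` module. The values n = 5 and r = 3 are a hypothetical example chosen only for illustration.

```python
import math

# Hypothetical example: choose r = 3 items from n = 5.
n, r = 5, 3

# Permutations with repetition: n^r
perm_with_repetition = n ** r

# Permutations without repetition: n! / (n - r)!
perm_without_repetition = math.perm(n, r)

# Combinations (order doesn't matter): n! / (r! (n - r)!)
combinations = math.comb(n, r)

print(perm_with_repetition)     # 125
print(perm_without_repetition)  # 60
print(combinations)             # 10
```

Ignoring order shrinks the count: each combination of 3 items corresponds to 3! = 6 orderings, and 60 / 6 = 10.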
Question 14: I flip a fair coin ten times. The probability that the coin comes up heads exactly five times is:
There are 10 flips, each with two possible outcomes, which means there are \(2^{10}\) possible outcomes. The number of favorable outcomes is the number of combinations that result in exactly 5 heads, found with the combinations formula \(\binom{10}{5}\):
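\(\frac{10!}{(5!)^2 \times 2^{10}}\)
\(\frac{10 \times 9 \times 8 \times 7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1}{(5 \times 4 \times 3 \times 2 \times 1) \times (5 \times 4 \times 3 \times 2 \times 1) \times 2^{10}}\)
\(\frac{10 \times 9 \times 8 \times 7 \times 6}{(5 \times 4 \times 3 \times 2 \times 1) \times 2^{10}}\)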
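To finish the arithmetic, the expression above can be evaluated directly. This is a sketch for checking hand calculations, not a required method; the variable names are illustrative.

```python
import math
from fractions import Fraction

flips = 10
heads = 5

# Favorable outcomes: ways to choose which 5 of the 10 flips are heads.
favorable = math.comb(flips, heads)

# All possible outcomes: 2 outcomes per flip, 10 flips.
total = 2 ** flips

probability = Fraction(favorable, total)

print(favorable, total)    # 252 1024
print(probability)         # 63/256
print(float(probability))  # 0.24609375
```

So the probability of exactly five heads is a little under 25%, even though five is the single most likely count.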
If you see a test question on this, I would expect you to be able to:
Identify whether the problem is a combination or a permutation
Identify whether it is with or without repetition
Solving a permutation question may be bonus material
Complete a very simple example using sets, i.e. exactly 2 tails out of 3 flips.
- For this I might ask for a numeric answer, to pick the correct sample space from a list, to pick the correct event space from a list, or some other stage of the problem.
Set up the problem without completing it. For example see next slide:
These problems make up a substantial number of GRE quantitative section questions, and they can be both tricky and time-consuming if you don’t know them well.
https://www.mathsisfun.com/combinatorics/combinations-permutations.html
https://statisticsbyjim.com/probability/permutations-probabilities/
Continuous - numbers with decimal places
Discrete - whole numbers or integers
Categorical - numbers just represent categories
- Ordinal - ordered highest to lowest
- Nominal - no particular order
Binary - choice of two alternatives such as war/peace
Count variables
\(^t\) Technically only if we measure to fairly small fractions, but see the slide following the next.
The distinction between discrete and continuous is a matter of judgment. At the smallest level, called the Planck scale in physics, every physical measurement is discrete.
It is the size of the measurement vs. the size of the gaps that matters
- measuring by centimeter if we are measuring 5 centimeters is discrete
- measuring by centimeter if we are measuring 1000 km is continuous
Planck scale: about \(1.6 \times 10^{-35}\) m, \(5 \times 10^{-44}\) seconds
hot, warm, cool, cold
Movie ratings (G, PG, PG-13, R, NC-17)
Education level
Likert scales
Regime type
- Closed Autocracy
- Electoral Autocracy
- Electoral Democracy
- Liberal Democracy
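One way to see what "ordered" buys us for an ordinal variable: the categories can be ranked, sorted, and given a median, even though distances between them are not defined. The sketch below assumes the regime types are ordered as listed; the observations are hypothetical and made up purely for illustration.

```python
# Regime types ordered from least to most democratic (assumed ordering).
REGIME_ORDER = [
    "Closed Autocracy",
    "Electoral Autocracy",
    "Electoral Democracy",
    "Liberal Democracy",
]

# Map each label to its rank: 0, 1, 2, 3.
RANK = {label: position for position, label in enumerate(REGIME_ORDER)}

# Hypothetical observations, not real data.
observations = [
    "Liberal Democracy",
    "Closed Autocracy",
    "Electoral Democracy",
    "Electoral Autocracy",
    "Liberal Democracy",
]

# Sorting by rank is meaningful for ordinal data.
ordered = sorted(observations, key=lambda label: RANK[label])

# With an odd number of observations, the median is the middle one.
median_regime = ordered[len(ordered) // 2]

print(median_regime)  # Electoral Democracy
```

For a nominal variable there is no such ordering, so only counts and the mode make sense.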
We need to know types of variables because they affect how we should treat the data.
Population is the full set
All possible data
The entire group of interest
- the population of country under study
- the set of nation states
- the set of Members of Congress
S{}
For many formulas:
What is correlation?
The strength of association between two variables
1 is perfect positive correlation
- It doesn't mean you just multiply one variable by 1 to get the other
- The relationship can be complex (not a line)
- It just means there is no variation at all in the relationship
Pearson’s Correlation Formula
Covariance formula
Cor(x,y) = \(\frac{Cov(x,y)}{\sigma_x \sigma_y}\)
\(\sigma_x\) is standard deviation of x
\(\sigma_y\) is standard deviation of y
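The relationship between covariance and correlation can be sketched by hand-coding both. This sketch assumes the population versions (dividing by n); using n - 1 throughout would give the same correlation, as long as the covariance and both standard deviations use the same divisor. The function names and the data are illustrative only.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(x, y):
    """Population covariance: the average product of deviations from the means."""
    mean_x, mean_y = mean(x), mean(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / len(x)


def std_dev(values):
    """Population standard deviation: square root of a variable's covariance with itself."""
    return math.sqrt(covariance(values, values))


def correlation(x, y):
    """Cor(x, y) = Cov(x, y) / (sigma_x * sigma_y)."""
    return covariance(x, y) / (std_dev(x) * std_dev(y))


# Made-up data: y is exactly 2 times x, z moves exactly opposite to x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
z = [5, 4, 3, 2, 1]

print(covariance(x, y))             # 4.0
print(round(correlation(x, y), 6))  # 1.0
print(round(correlation(x, z), 6))  # -1.0
```

Note that y is 2 times x, not 1 times x, and the correlation is still 1: dividing by the standard deviations removes the units, so correlation measures how consistent the relationship is, not its slope.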
Covariance formula from: EDUCBA
Author: Tom Hanna
Website: tomhanna.me
License: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
GOVT2306, Fall 2023, Instructor: Tom Hanna