POLS3316: Statistics for Political Scientists
Lecture 2: Variables, Units of Observation, Variable Types, Measures of Central Tendency
Instructor: Tom Hanna, Fall 2025
2026-01-27
Why do we use descriptive statistics?
- Explore the data
- See patterns in the data
- Communicate about the data
What are the most basic things we need to know?
- What are the variables?
- What is the scope of the data (time, geography, cases)?
- What is the unit of observation?
What are variables?
- In math (algebra) - a symbol that represents a number
- In science - a characteristic or attribute that can vary across units of observation
- In statistics - a characteristic or attribute that can vary across units of observation
- In R coding - a column in a data frame
- In all of the above - something that can take on different values
What is a unit of observation?
- In science - the entity that is being measured
- In statistics - the entity that is being measured
- In R coding - a row in a data frame
- In all of the above - the thing that has the variable measured on/for it
Putting it together
- Variables are characteristics or attributes that vary across units of observation
Example
Titanic data
  Passenger_Type     number_of_deaths
1 1st-Male-Child                    0
2 2nd-Male-Child                    0
3 3rd-Male-Child                   35
4 Crew-Male-Child                   0
5 1st-Female-Child                  0
6 2nd-Female-Child                  0
7 3rd-Female-Child                 17
8 Crew-Female-Child                 0
- units of observation: rows (one passenger type per row)
- variables: columns (Passenger_Type, number_of_deaths)
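To make the row/column idea concrete, here is a minimal sketch that rebuilds the table above by hand as an R data frame. The object name `titanic_children` is an assumption made for illustration; the values are copied from the slide.

```r
# Rebuild the table shown above (values copied from the slide).
# The name titanic_children is hypothetical.
titanic_children <- data.frame(
  Passenger_Type = c(
    "1st-Male-Child", "2nd-Male-Child", "3rd-Male-Child", "Crew-Male-Child",
    "1st-Female-Child", "2nd-Female-Child", "3rd-Female-Child", "Crew-Female-Child"
  ),
  number_of_deaths = c(0, 0, 35, 0, 0, 0, 17, 0)
)

# Each row is one unit of observation
n_units <- nrow(titanic_children)

# Each column is one variable
variable_names <- names(titanic_children)

print(n_units)
print(variable_names)
```

Here `nrow()` counts the units of observation and `names()` lists the variables.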
Measures of Central Tendency
Measures of central tendency help us:
- reveal patterns
- find the typical measurement
- find the center
A few numbers that can summarize the center of measurement
Mean
- Symbol: \(\bar{x}\)
- Not the middle value
- Not the most common
- The center of mass - the total distance of the values above the mean equals the total distance of the values below it
- Formula is \(\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\)
- Read that: The mean of x equals the sum of the observations \(x_i\) divided by the number of observations \(n\).
Example A:
A. What is the mean of 1,5,7,9,10,12,18
[1] "Mean produced by sum function: 8.85714285714286"
[1] "Mean produced by mean function: 8.85714285714286"
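The two lines of output above are consistent with computing the mean in two ways: once directly from the formula and once with R's built-in `mean()`. This is a minimal sketch of how that might look, not necessarily the code used in the lecture; the variable names are assumptions.

```r
# Values from Example A
x <- c(1, 5, 7, 9, 10, 12, 18)

# The formula by hand: sum of the observations divided by n
mean_by_sum <- sum(x) / length(x)

# The built-in function
mean_by_function <- mean(x)

print(paste("Mean produced by sum function:", mean_by_sum))
print(paste("Mean produced by mean function:", mean_by_function))
```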
[Figure: center of mass for Example A]
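The center-of-mass idea can be checked numerically. The sketch below uses the Example A values and compares the total distance of the values below the mean with the total distance of the values above it; the variable names are assumptions for illustration.

```r
# Values from Example A
x <- c(1, 5, 7, 9, 10, 12, 18)

# Signed distance of each value from the mean
deviations <- x - mean(x)

# Total distance on each side of the mean
distance_below <- sum(abs(deviations[deviations < 0]))
distance_above <- sum(deviations[deviations > 0])

print(distance_below)
print(distance_above)

# all.equal() allows for tiny floating-point differences
print(isTRUE(all.equal(distance_below, distance_above)))
```

The two totals balance, which is what makes the mean behave like a balance point.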
Example B
B. What is the mean of 10,20,25,30,35,40,45,50,75
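Example B can be worked the same way as Example A. A minimal sketch, with variable names chosen only for illustration:

```r
# Values from Example B
b <- c(10, 20, 25, 30, 35, 40, 45, 50, 75)

total_b <- sum(b)      # sum of the observations
n_b <- length(b)       # number of observations
mean_b <- total_b / n_b

print(total_b)
print(n_b)
print(round(mean_b, 2))
```

The sum is 330 over 9 observations, so the mean is about 36.67.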
Example A
A - 1,5,7,9,10,12,18
Example B
B - 10,20,25,30,35,40,45,50,75
Midpoint and Center of Mass
Keep in mind for later
In both of our examples, the mean and median were close but not the same. They will not always be this close.
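A short sketch comparing the two measures for Examples A and B. The third vector, `a_outlier`, is hypothetical data that is not from the lecture: it is Example A with the largest value changed from 18 to 180 to show how far apart the mean and median can get.

```r
# Values from Examples A and B
a <- c(1, 5, 7, 9, 10, 12, 18)
b <- c(10, 20, 25, 30, 35, 40, 45, 50, 75)

# Hypothetical data (not from the lecture): Example A with one extreme value
a_outlier <- c(1, 5, 7, 9, 10, 12, 180)

summary_table <- data.frame(
  data   = c("A", "B", "A with outlier"),
  mean   = c(mean(a), mean(b), mean(a_outlier)),
  median = c(median(a), median(b), median(a_outlier))
)

print(summary_table)
```

For A and B the two measures stay close. In the hypothetical vector the median is still 9, while the mean is pulled up to 32.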
Mode
- Most common value
- Just count
Examples:
C. 1,2,3,4,4,5,6,7
Answer:
D. 10,20,30,30,40,40,40,50,50,60,70
Answer:
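Base R has no built-in function for the statistical mode (`mode()` reports how an object is stored, not its most common value), so "just count" can be done with `table()`. The helper name `stat_mode` below is an assumption made for illustration, not a standard function.

```r
# Hypothetical helper: return the most common value(s) in x
stat_mode <- function(x) {
  counts <- table(x)                           # count each distinct value
  most_common <- names(counts)[counts == max(counts)]
  as.numeric(most_common)                      # table() stores values as names
}

example_c <- c(1, 2, 3, 4, 4, 5, 6, 7)
example_d <- c(10, 20, 30, 30, 40, 40, 40, 50, 50, 60, 70)

mode_c <- stat_mode(example_c)
mode_d <- stat_mode(example_d)

print(mode_c)
print(mode_d)
```

If two or more values tie for most common, this sketch returns all of them.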
Advantages and disadvantages
Median isn’t affected by outliers.
Mean gives the broader picture because it includes the outliers.
Mode is the only option for nominal categorical variables; ordinal variables also allow a median.
Variable Types
- Categorical (nominal, ordinal)
- Numerical (interval, ratio)
Variable Type Examples: Categorical
Nominal (Order is meaningless)
- Gender
- Race
- Religion
- Democrat vs Republican (also binary)
Ordinal (ORDer matters)
- Education level
- Income brackets
- Likert scale responses
Binary
- Yes/No
- 0/1
- True/False
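In R, these categorical types are commonly stored as factors (nominal), ordered factors (ordinal), and logical vectors (binary). The sketch below uses made-up values and hypothetical variable names purely for illustration.

```r
# Hypothetical data, invented for illustration

# Nominal: categories with no meaningful order
party <- factor(c("Democrat", "Republican", "Democrat", "Democrat"))

# Ordinal: categories with a meaningful order, given by levels
education <- factor(
  c("High school", "College", "Graduate", "College"),
  levels = c("High school", "College", "Graduate"),
  ordered = TRUE
)

# Binary: TRUE/FALSE, which R can also treat as 1/0
voted <- c(TRUE, FALSE, TRUE, TRUE)

print(is.ordered(party))            # FALSE: order is meaningless
print(is.ordered(education))        # TRUE: order matters
print(education[1] < education[3])  # comparisons work on ordered factors
print(as.integer(voted))            # the 0/1 coding of a binary variable
print(table(party))                 # counting is how the mode is found
```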
Variable Type Examples: Numerical
Interval (equal intervals, no true zero) *
- Temperature (Celsius, Fahrenheit)
- IQ scores
- Calendar years
Ratio (equal intervals, true zero) *
- Height
- Weight
- Age
- Income
- Kelvin temperature
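One way to see why a true zero matters: differences are meaningful on both interval and ratio scales, but ratios are only meaningful on a ratio scale. A minimal sketch using two illustrative temperatures (10 and 20 degrees Celsius, chosen only as an example):

```r
# Two illustrative temperatures in Celsius (interval scale)
celsius <- c(10, 20)

# The same temperatures in Kelvin (ratio scale, true zero)
kelvin <- celsius + 273.15

# Differences agree across the two scales
diff_celsius <- celsius[2] - celsius[1]
diff_kelvin <- kelvin[2] - kelvin[1]

# Ratios do not agree: "twice as hot" only appears on the Celsius scale
ratio_celsius <- celsius[2] / celsius[1]
ratio_kelvin <- kelvin[2] / kelvin[1]

print(c(diff_celsius, diff_kelvin))
print(c(ratio_celsius, round(ratio_kelvin, 3)))
```

The Celsius ratio is 2, but the Kelvin ratio is only about 1.035, so 20 degrees Celsius is not "twice as hot" as 10 degrees Celsius.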
Discrete (countable values)
- Number of children
- Number of countries in a trade agreement
- Battle deaths in a conflict
Social Science examples
- V-Dem v2x_libdem → ratio (0-1, true zero = no polyarchy) *
- Polity2 score (-10 to 10) → interval (0 not “no democracy”) *
- Country GDP → ratio *
- Year → interval
- Battle deaths (COW) → ratio/discrete *
- is_autocracy (0/1) → binary/categorical
- regime_type (dem/au/other) → nominal categorical
- freedom_level (low/med/high) → ordinal categorical