POLS3316: Statistics for Political Scientists

Lecture 2: Variables, Units of Observation, Variable Types, Measures of Central Tendency

Instructor: Tom Hanna, Fall 2025

2026-01-27

Descriptive Statistics

Why do we use descriptive statistics?

  • Explore the data
  • See patterns in the data
  • Communicate about the data

What are the most basic things we need to know?

  • What are the variables?
  • What is the scope of the data (time, geography, cases)?
  • What is the unit of observation?

What are variables?

  • In math (algebra) - a symbol that represents a number
  • In science - a characteristic or attribute that can vary across units of observation
  • In statistics - a characteristic or attribute that can vary across units of observation
  • In R coding - a column in a data frame
  • In all of the above - something that can take on different values

What is a unit of observation?

  • In science - the entity that is being measured
  • In statistics - the entity that is being measured
  • In R coding - a row in a data frame
  • In all of the above - the thing that has the variable measured on/for it

Putting it together

  • Variables are characteristics or attributes that vary across units of observation

Example

Titanic data

     Passenger_Type number_of_deaths
1    1st-Male-Child                0
2    2nd-Male-Child                0
3    3rd-Male-Child               35
4   Crew-Male-Child                0
5  1st-Female-Child                0
6  2nd-Female-Child                0
7  3rd-Female-Child               17
8 Crew-Female-Child                0
  • units of observation: rows (one passenger type per row)
  • variables: columns
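
As a minimal sketch, the table above can be rebuilt as an R data frame to see the rows-and-columns idea directly. The object name `titanic` is just an illustrative choice; the values are copied from the table.

```r
# Rebuild the Titanic summary table shown above as a data frame.
titanic <- data.frame(
  Passenger_Type = c("1st-Male-Child", "2nd-Male-Child", "3rd-Male-Child",
                     "Crew-Male-Child", "1st-Female-Child", "2nd-Female-Child",
                     "3rd-Female-Child", "Crew-Female-Child"),
  number_of_deaths = c(0, 0, 35, 0, 0, 0, 17, 0)
)

nrow(titanic)   # number of units of observation (rows)
names(titanic)  # the variables (columns)
```

Each row is one passenger type, and each column is something measured for that passenger type.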

Wide format

BEWARE! Not all data is formatted this way! Sometimes you have to think “is this a variable or a unit of observation?”

For example, the following data on Scandinavian temperatures:

  country avgtemp.1994 avgtemp.1995 avgtemp.1996
1  Sweden            3            8            5
2 Denmark            9           11            8
3  Norway            9            5            9

It looks like the unit of observation is country and the variable is a combination of year and temperature.

Long Format

If we look at it in long format, it’s a little clearer:

  country year avgtemp
1  Sweden 1994       3
2 Denmark 1994       9
3  Norway 1994       9
4  Sweden 1995       8
5 Denmark 1995      11
6  Norway 1995       5
7  Sweden 1996       5
8 Denmark 1996       8
9  Norway 1996       9

The variable is average temperature.
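
One way to move from the wide table to the long table is base R's `reshape()`. This is only a sketch: the object names `wide` and `long` are illustrative, and tidyverse users would more commonly use `tidyr::pivot_longer()`, which is not shown here.

```r
# Wide-format temperatures, copied from the table above.
wide <- data.frame(
  country = c("Sweden", "Denmark", "Norway"),
  avgtemp.1994 = c(3, 9, 9),
  avgtemp.1995 = c(8, 11, 5),
  avgtemp.1996 = c(5, 8, 9)
)

# Split column names such as "avgtemp.1994" at the "." into a
# variable name (avgtemp) and a time value (1994).
long <- reshape(
  wide,
  direction = "long",
  varying = c("avgtemp.1994", "avgtemp.1995", "avgtemp.1996"),
  idvar = "country",
  timevar = "year",
  sep = "."
)
rownames(long) <- NULL  # drop the generated row names for cleaner printing

long
```

After reshaping there is one row per country and year, and a single `avgtemp` column holds the variable.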

Country-year format

  country_year avg_temp high_temp low_temp
1 Denmark-1994        9        12        4
2 Denmark-1995       11        14       10
3 Denmark-1996        8         9        7
4  Norway-1994        9        11        6
5  Norway-1995        5        10        1
6  Norway-1996        9        11        4
7  Sweden-1994        3         6        0
8  Sweden-1995        8        10        5
9  Sweden-1996        5         9        4

But the unit of observation is no longer country - it is country-year.

Beware again: The country-year table above is wide format in statistical terms, but long format in R tidyverse nomenclature.
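
A quick way to check what the unit of observation is: find the column, or combination of columns, that identifies each row exactly once. The sketch below uses the average temperatures from the long table; the object name `temps` is illustrative.

```r
# Long-format temperature data from the example above.
temps <- data.frame(
  country = rep(c("Sweden", "Denmark", "Norway"), times = 3),
  year    = rep(c(1994, 1995, 1996), each = 3),
  avgtemp = c(3, 9, 9, 8, 11, 5, 5, 8, 9)
)

# Combine country and year into one identifier for the unit of observation.
temps$country_year <- paste(temps$country, temps$year, sep = "-")

# Each country-year appears exactly once, so it identifies a row.
anyDuplicated(temps$country_year) == 0

# Country alone does not identify a row: each country appears three times.
table(temps$country)
```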

Measures of Central Tendency

Measures of central tendency help us:

  • reveal patterns
  • find the typical measurement
  • find the center

A few numbers that can summarize the center of measurement

  • Mean

  • Median

  • Mode

Mean

  • Symbol: \(\bar{x}\)
  • Not the middle value
  • Not the most common
  • The center of mass - the distances of the values above the mean add up to the same total as the distances of the values below it
  • Formula is \(\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\)
  • Read that: The mean of x equals the sum of the observations \(x_i\) divided by the number of observations \(n\).

Example A:

A. What is the mean of 1,5,7,9,10,12,18

[1] "Mean produced by sum function: 8.85714285714286"
[1] "Mean produced by mean function: 8.85714285714286"
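
The code behind this output is not shown on the slide. One way it could have been produced, as a sketch with illustrative object names, is:

```r
# Example A values from the slide.
a <- c(1, 5, 7, 9, 10, 12, 18)

# Apply the formula directly: sum of the observations divided by n.
mean_by_sum <- sum(a) / length(a)

# Use R's built-in mean() function.
mean_by_function <- mean(a)

print(paste("Mean produced by sum function:", mean_by_sum))
print(paste("Mean produced by mean function:", mean_by_function))
```

Both approaches apply the same formula, so they agree.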

Center of mass
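
The center-of-mass idea can be checked numerically. This sketch measures how far each value in Example A sits from the mean and compares the total distance above with the total distance below. The object names are illustrative.

```r
a <- c(1, 5, 7, 9, 10, 12, 18)

# Distance of each value from the mean (negative = below the mean).
deviations <- a - mean(a)

# Total distance above the mean and total distance below it.
total_above <- sum(deviations[deviations > 0])
total_below <- -sum(deviations[deviations < 0])

# The two totals balance, up to floating-point rounding.
all.equal(total_above, total_below)
```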

Example B

B. What is the mean of 10,20,25,30,35,40,45,50,75

[1] 36.66667
[1] 36.66667

Median

  • Midpoint
  • Half of the observations are greater, half are lower
  • Sort the values, then just count to the middle
  • Even number of observations - the midpoint (average) of the middle two values

Example A

A. What is the median of 1,5,7,9,10,12,18

[1] "Median of A: 9"

Example B

B - 10,20,25,30,35,40,45,50,75

[1] "Median of B: 35"
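
The counting rule for the median can be sketched as a small function and checked against both examples. The function name `median_by_counting` is hypothetical, and the four-value vector is made up to show the even case. In practice, R's built-in `median()` does the same job.

```r
# Median by sorting and counting to the middle.
median_by_counting <- function(x) {
  sorted <- sort(x)
  n <- length(sorted)
  if (n %% 2 == 1) {
    # Odd number of observations: take the single middle value.
    sorted[(n + 1) / 2]
  } else {
    # Even number of observations: midpoint of the middle two values.
    (sorted[n / 2] + sorted[n / 2 + 1]) / 2
  }
}

a <- c(1, 5, 7, 9, 10, 12, 18)
b <- c(10, 20, 25, 30, 35, 40, 45, 50, 75)

median_by_counting(a)
median_by_counting(b)

# Made-up even-length example: the middle two values are 5 and 7.
median_by_counting(c(1, 5, 7, 9))
```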

Midpoint and Center of Mass

Keep in mind for later

In both of our examples, the mean and median were close but not the same (A: 8.86 vs 9; B: 36.67 vs 35). They are not always this close.

Mode

  • Most common value
  • Just count

Examples:

C. 1,2,3,4,4,5,6,7

Answer: 4

D. 10,20,30,30,40,40,40,50,50,60,70

Answer: 40
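
Base R has no built-in function for the statistical mode (its `mode()` function reports how an object is stored, not the most common value). A minimal sketch of counting by hand follows; the function name `stat_mode` is hypothetical.

```r
# Return the most common value(s) in x; ties return every tied value.
stat_mode <- function(x) {
  values <- unique(x)
  # Count how many times each distinct value appears.
  counts <- tabulate(match(x, values))
  values[counts == max(counts)]
}

c_values <- c(1, 2, 3, 4, 4, 5, 6, 7)
d_values <- c(10, 20, 30, 30, 40, 40, 40, 50, 50, 60, 70)

stat_mode(c_values)
stat_mode(d_values)
```

Because the function only counts matches, it also works on text categories, where the mean and median are not available.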

Advantages and disadvantages

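  • Median is resistant to outliers - a few extreme values barely move it

  • Mean uses every observation, so it gives the broader picture, but outliers can pull it away from the typical value.

  • Mode is the only option for nominal categorical variables (the median also works for ordinal ones).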
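
To see the effect of an outlier, the sketch below takes Example A and swaps the largest value for a much larger one. The value 180 is hypothetical and chosen only for illustration.

```r
a <- c(1, 5, 7, 9, 10, 12, 18)

# Hypothetical variant: replace 18 with an extreme value of 180.
a_outlier <- c(1, 5, 7, 9, 10, 12, 180)

# The median does not move, because the middle value is unchanged.
median(a)
median(a_outlier)

# The mean is pulled toward the extreme value.
mean(a)
mean(a_outlier)
```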

Variable Types

  • Categorical (nominal, ordinal)
  • Numerical (interval, ratio)

Variable Type Examples: Categorical

  • Nominal (Order is meaningless)

      - Gender
      - Race
      - Religion
      - Democrat vs Republican (also binary)
  • Ordinal (ORDer matters)

      - Education level
      - Income brackets
      - Likert scale responses
  • Binary

      - Yes/No
      - 0/1
      - True/False
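
In R, these categorical types are commonly stored as factors (nominal), ordered factors (ordinal), and logical vectors (binary). The sketch below uses invented survey responses. All values and object names are hypothetical.

```r
# Hypothetical survey responses, invented for illustration only.

# Nominal: categories with no meaningful order.
party <- factor(c("Democrat", "Republican", "Democrat", "Independent"))

# Ordinal: categories with a meaningful order, given by `levels`.
education <- factor(
  c("High school", "College", "Graduate", "College"),
  levels = c("High school", "College", "Graduate"),
  ordered = TRUE
)

# Binary: TRUE/FALSE.
voted <- c(TRUE, FALSE, TRUE, TRUE)

is.ordered(party)
is.ordered(education)

# Comparisons are meaningful for ordinal variables.
education[1] < education[3]

# Counts per category, and the proportion of TRUE values.
table(party)
mean(voted)
```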

Variable Type Examples: Numerical

  • Interval (equal intervals, no true zero) *

      - Temperature (Celsius, Fahrenheit)
      - IQ scores
      - Calendar years
  • Ratio (equal intervals, true zero) *

      - Height
      - Weight
      - Age
      - Income
      - Kelvin temperature
  • Discrete (countable values)

      - Number of children
      - Number of countries in a trade agreement
      - Battle deaths in a conflict
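
The difference between interval and ratio scales shows up when dividing. In this sketch, two made-up temperatures are compared in Celsius (interval) and Kelvin (ratio): differences agree on both scales, but only the Kelvin ratio is physically meaningful.

```r
# Two hypothetical temperatures in degrees Celsius.
celsius <- c(10, 20)

# The same temperatures in Kelvin, which has a true zero.
kelvin <- celsius + 273.15

# Differences are the same on both scales (equal intervals).
celsius[2] - celsius[1]
kelvin[2] - kelvin[1]

# Ratios are not: 20 C is not "twice as hot" as 10 C.
celsius[2] / celsius[1]
kelvin[2] / kelvin[1]
```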

Social Science examples

  • V-Dem v2x_libdem → ratio (0-1, true zero = no liberal democracy) *
  • Polity2 score (-10 to 10) → interval (0 not “no democracy”) *
  • Country GDP → ratio *
  • Year → interval
  • Battle deaths (COW) → ratio/discrete *
  • is_autocracy (0/1) → binary/categorical
  • regime_type (dem/au/other) → nominal categorical
  • freedom_level (low/med/high) → ordinal categorical

Skewed distribution - when mean and median are different

The mean, median, and mode are often different for the same sample or population, especially when the distribution is skewed.

Example:

Negatively skewed, Normal, and Positively Skewed distributions
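
As a small numeric illustration, the made-up sample below has a long right tail (positive skew). The data are hypothetical and chosen so that the three measures come out different.

```r
# Hypothetical positively skewed sample: most values are small, a few are large.
x <- c(1, 2, 2, 2, 3, 3, 4, 5, 6, 9, 15)

sample_mean <- mean(x)
sample_median <- median(x)

# Mode: the value with the highest count.
counts <- table(x)
sample_mode <- as.numeric(names(counts)[which.max(counts)])

c(mode = sample_mode, median = sample_median, mean = sample_mean)
```

In this sample, the mode is the smallest of the three and the mean is the largest, because the large values in the tail pull the mean upward.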

Authorship and License

skewed distribution source: Statistics by Jim

Creative Commons License