What are ranking data?

Example and analysis borrowed from this paper. These data show the aggregated responses of 13 women who were asked to rank three options for whom they prefer to spend leisure time with (1: male; 2: female; 3: both sexes).

Each of the first three columns is an option that was ranked, and each row is a possible rank order. The freq_ranking column gives the number of times that rank order appeared in the data.

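library(dplyr)  # provides %>% and rename()
library(knitr)  # provides kable()
library(pmr)    # assumed source of leisure.black and destat()

# give the columns readable names
d <- leisure.black %>%
  rename(male = X1,
         female = X2,
         both_sexes = X3,
         freq_ranking = n)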

d %>% kable()

| male| female| both_sexes| freq_ranking|
|----:|------:|----------:|------------:|
|    1|      2|          3|            1|
|    1|      3|          2|            1|
|    2|      1|          3|            0|
|    2|      3|          1|            5|
|    3|      1|          2|            0|
|    3|      2|          1|            6|

For example, the last row of the table tells us that the rank order male: 3, female: 2, both: 1 occurred six times in the dataset.
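For readers who want to follow along without the original data object, the aggregated table above can be typed in directly. This is a minimal sketch in base R; the names d_manual and d_long are illustrative and not part of the original analysis.

```r
# Aggregated rankings typed in from the table above
# (one row per possible rank order; d_manual is a hypothetical name).
d_manual <- data.frame(
  male         = c(1, 1, 2, 2, 3, 3),
  female       = c(2, 3, 1, 3, 1, 2),
  both_sexes   = c(3, 2, 3, 1, 2, 1),
  freq_ranking = c(1, 1, 0, 5, 0, 6)
)

# The frequencies should add up to the 13 respondents.
n_respondents <- sum(d_manual$freq_ranking)

# Expand to one row per respondent by repeating each rank order
# as many times as it was observed (rows with frequency 0 drop out).
row_index <- rep(seq_len(nrow(d_manual)), times = d_manual$freq_ranking)
d_long <- d_manual[row_index, c("male", "female", "both_sexes")]

n_respondents
nrow(d_long)
```

The expanded form has one row per respondent, which can be handy when a tool expects individual rankings rather than an aggregated table.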

Descriptive statistics for rank data

The goal is to provide a set of numbers that clearly communicates the central tendency of people’s preferences. Three statistics are commonly reported for rank data, and each answers a different question:

  1. Mean Ranks (how popular was an option?)
  2. Pairwise ranking comparisons (how many times was an option ranked higher than each of the other options?)
  3. Marginal frequencies (how many times was an option put at this ranking spot?)

Mean Rank

The first thing to do is compute the popularity of each option using its mean rank. We can think of this as a weighted mean: each rank value the option received is weighted by the number of times the option was placed at that rank. Because rank 1 is the most preferred position, a lower mean rank means a more popular option.

# popularity of the male option 
stats::weighted.mean(x = d$male, w = d$freq_ranking)
## [1] 2.307692
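To make the weighting explicit, the same calculation can be written out by hand as sum(rank × frequency) / sum(frequency) and applied to every option at once. The sketch below uses base R only and retypes the table so that it runs on its own; d_manual and mean_ranks are illustrative names.

```r
# Aggregated rankings, retyped so the example is self-contained.
d_manual <- data.frame(
  male         = c(1, 1, 2, 2, 3, 3),
  female       = c(2, 3, 1, 3, 1, 2),
  both_sexes   = c(3, 2, 3, 1, 2, 1),
  freq_ranking = c(1, 1, 0, 5, 0, 6)
)

rank_cols <- c("male", "female", "both_sexes")
freq <- d_manual$freq_ranking

# Weighted mean rank for each option: sum(rank * freq) / sum(freq).
mean_ranks <- sapply(d_manual[rank_cols], function(r) sum(r * freq) / sum(freq))

round(mean_ranks, 2)
#>       male     female both_sexes
#>       2.31       2.46       1.23
```

For male, this works out to (1 + 1 + 10 + 18) / 13 = 30 / 13, which matches the weighted.mean() result above.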

The destat() function from the pmr package computes all of the mean ranks from an aggregated data set in one call.

tibble(
  option = c('male', 'female', 'both'),
  mean_rank = destat(d)$mean.rank 
) %>% 
  kable(digits = 2)

|option | mean_rank|
|:------|---------:|
|male   |      2.31|
|female |      2.46|
|both   |      1.23|

The mean ranks tell us that the “both” option was clearly the most preferred (it has the lowest mean rank, 1.23), and that there is no strong preference between the other two options (2.31 vs. 2.46).

Pairwise frequencies

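This measure tells us how many times a given option was ranked higher than (that is, preferred to) each of the other options. In the resulting matrix, each cell counts how often the row option was ranked above the column option.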

pairwise_matrix <- destat(d)$pair 

# clean up table and print
colnames(pairwise_matrix) <- c("male", "female", "both")
rownames(pairwise_matrix) <- c("male", "female", "both")

pairwise_matrix %>% kable()

|       | male| female| both|
|:------|----:|------:|----:|
|male   |    0|      7|    2|
|female |    6|      0|    1|
|both   |   11|     12|    0|

Using this information, we can say things like:

The option both was ranked higher than male 11 times and higher than female 12 times.
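The pairwise counts above can also be reproduced by hand, which makes the definition concrete: for each pair of options, add up the frequencies of the rank orders in which the row option has the smaller (better) rank number. This is a sketch in base R with the table retyped; it is not how destat() is implemented internally, and the object names are illustrative.

```r
# Aggregated rankings, retyped so the example is self-contained.
ranks <- cbind(
  male   = c(1, 1, 2, 2, 3, 3),
  female = c(2, 3, 1, 3, 1, 2),
  both   = c(3, 2, 3, 1, 2, 1)
)
freq <- c(1, 1, 0, 5, 0, 6)

items <- colnames(ranks)
pair <- matrix(0, nrow = 3, ncol = 3, dimnames = list(items, items))

# pair[i, j] = number of respondents who ranked option i above option j,
# i.e. gave i a smaller rank number than j.
for (i in items) {
  for (j in items) {
    if (i != j) pair[i, j] <- sum(freq[ranks[, i] < ranks[, j]])
  }
}

pair
```

A quick sanity check: with complete rankings and no ties, pair[i, j] + pair[j, i] equals the number of respondents (13) for every pair, for example 7 + 6 for male versus female.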

Marginal frequencies

This measure tells us the number of times people placed an option at a particular ranking position.

marginal_matrix <- destat(d)$mar 

# clean up table and print
colnames(marginal_matrix) <- c("1", "2", "3")
rownames(marginal_matrix) <- c("male", "female", "both")

marginal_matrix %>% kable()

|       |  1|  2|  3|
|:------|--:|--:|--:|
|male   |  2|  5|  6|
|female |  0|  7|  6|
|both   | 11|  1|  1|

Using this information, we can say things like:

The option female was ranked first 0 times, second 7 times, and third 6 times.
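The marginal counts can be checked by hand in the same spirit: for each ranking position k, sum the frequencies of the rank orders in which an option sits at position k. The sketch below uses base R with the table retyped; it illustrates the definition rather than the internals of destat(), and the object names are illustrative.

```r
# Aggregated rankings, retyped so the example is self-contained.
ranks <- cbind(
  male   = c(1, 1, 2, 2, 3, 3),
  female = c(2, 3, 1, 3, 1, 2),
  both   = c(3, 2, 3, 1, 2, 1)
)
freq <- c(1, 1, 0, 5, 0, 6)

# For each position k, (ranks == k) marks where an option holds that position;
# multiplying by freq weights each rank order (row) by how often it occurred.
mar <- sapply(1:3, function(k) colSums((ranks == k) * freq))
colnames(mar) <- c("1", "2", "3")

mar
```

Every row and every column of this matrix sums to 13, because each respondent places each option at exactly one position and fills each position with exactly one option.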