First version: 10-DEC-2021
Updated: 30-JAN-2022 (thanks to Jacob van Etten, Kaue de Sousa and Heather Turner for their suggestions)

1 Packages needed

library(readxl)
library(PlackettLuce)

2 Introduction

The main objective of this document is to show how rankings of our Working paper on Opportunities for implementing impoved tree seed sourcing for forest landscape restoration, agroforestry and wider tree planting at the project design stage: findings of a survey of planters, researchers and funders were analyzed via the PlackettLuce package (for a description of the package, see Turner et al. 2020).

The example shown here refers to Question 11 of the survey, with results discussed in section 3.5 (Information needed by funders from prospective planters to assess tree seed sourcing quality) of the working paper.

Question 11. Let us assume that tree planters are required to explain how they will source the tree seed that they intend to use for planting. (So that an assessment of the quality of tree seed sourcing is possible.) If this is so, how important is it for the planters to provide the different pieces of information below?

Please rank between these 5 options (where 1 is the most important information planters should provide and 5 the least important – each option must have a different rank):

The expected performance of seed at the chosen planting site (providing information on the ‘match’ between seed and site)
The provenance (original collection site) or variety of the seed(s) to be used
A list of species to be planted
The type of seed source (e.g., from natural forest, plantation, farmland, seed orchard)
The supplier of seed(s) (of each chosen species)

Earlier in the survey (Question 7), the respondents were asked to choose whether their institution was best described as an institution that directly carries out or organizes tree planting (‘Planter’), as an institution that does research on tree planting (‘Researcher’) or as an institution that funds tree planting (‘Funder’). For the analysis of Question 11, respondents from the category of ‘Planters’ were excluded.

3 Data preparation

3.1 Load the data

Prior to importing the data, data was sorted by the different rankings. Such prior sorting is not required for the statistical analysis.

data.file <- paste0(getwd(), "\\TQTP R pub.xlsx")
sdata <- data.frame(read_excel(path=data.file, sheet="data"))

print(sdata)

##    respondent Site.matching Origin Species.list Source.type Supplier
## 1      funder             1      2            4           3        5
## 2      funder             1      2            5           4        3
## 3      funder             1      3            5           4        2
## 4      funder             1      4            5           2        3
## 5      funder             2      1            4           3        5
## 6      funder             2      4            1           3        5
## 7      funder             2      4            1           5        3
## 8      funder             2      5            1           4        3
## 9      funder             3      1            5           4        2
## 10     funder             3      2            1           5        4
## 11     funder             3      2            5           4        1
## 12     funder             3      4            5           2        1
## 13     funder             3      5            2           1        4
## 14     funder             5      2            3           1        4
## 15 researcher             1      2            3           5        4
## 16 researcher             1      2            4           3        5
## 17 researcher             1      2            4           3        5
## 18 researcher             1      2            5           3        4
## 19 researcher             1      2            5           3        4
## 20 researcher             1      2            5           3        4
## 21 researcher             1      2            5           4        3
## 22 researcher             1      3            2           4        5
## 23 researcher             1      3            2           4        5
## 24 researcher             1      3            2           4        5
## 25 researcher             1      3            2           4        5
## 26 researcher             1      3            2           4        5
## 27 researcher             1      3            2           5        4
## 28 researcher             1      3            2           5        4
## 29 researcher             1      3            4           2        5
## 30 researcher             1      3            4           2        5
## 31 researcher             1      3            5           2        4
## 32 researcher             1      3            5           2        4
## 33 researcher             1      4            3           5        2
## 34 researcher             1      4            5           2        3
## 35 researcher             1      4            5           3        2
## 36 researcher             2      1            3           4        5
## 37 researcher             2      1            3           4        5
## 38 researcher             2      1            4           3        5
## 39 researcher             2      1            4           3        5
## 40 researcher             2      1            5           4        3
## 41 researcher             2      1            5           4        3
## 42 researcher             2      3            1           4        5
## 43 researcher             2      3            1           4        5
## 44 researcher             2      3            1           4        5
## 45 researcher             2      3            1           4        5
## 46 researcher             2      3            1           4        5
## 47 researcher             2      3            1           4        5
## 48 researcher             2      3            1           4        5
## 49 researcher             2      3            1           4        5
## 50 researcher             2      3            1           5        4
## 51 researcher             2      3            1           5        4
## 52 researcher             2      3            4           1        5
## 53 researcher             2      3            5           4        1
## 54 researcher             3      1            4           2        5
## 55 researcher             3      1            5           2        4
## 56 researcher             3      1            5           2        4
## 57 researcher             3      2            1           4        5
## 58 researcher             3      2            1           4        5
## 59 researcher             3      2            1           4        5
## 60 researcher             3      2            1           4        5
## 61 researcher             3      2            1           5        4
## 62 researcher             3      2            1           5        4
## 63 researcher             3      2            4           1        5
## 64 researcher             3      4            1           2        5
## 65 researcher             3      4            1           2        5
## 66 researcher             3      4            5           1        2
## 67 researcher             3      5            1           2        4
## 68 researcher             4      1            3           2        5
## 69 researcher             4      1            3           2        5
## 70 researcher             4      2            1           3        5
## 71 researcher             4      2            1           5        3
## 72 researcher             4      3            1           2        5
## 73 researcher             4      3            1           2        5
## 74 researcher             4      3            5           1        2
## 75 researcher             5      1            3           2        4
## 76 researcher             5      1            3           2        4
## 77 researcher             5      1            4           2        3
## 78 researcher             5      1            4           3        2
## 79 researcher             5      2            1           3        4
## 80 researcher             5      2            1           4        3
## 81 researcher             5      3            1           2        4
## 82 researcher             5      3            1           4        2
## 83 researcher             5      3            1           4        2

3.2 Prepare data with reference level for worst combined-ranked option

To analyze data for all the respondents, the type of respondent is not required. The first column can thus be removed.

sdata1 <- sdata[, -1]

The number of options (items to be ranked) equals the number of columns in the data:

opt.n <- ncol(sdata1)
opt.n

## [1] 5

Two new data sets will be created. In the first of these new data sets, the first column corresponds to the option with the worst combined rank overall. This option should be the ‘Supplier’ column, as this column had the largest average ranking.

colSums(sdata1) / nrow(sdata1)

## Site.matching        Origin  Species.list   Source.type      Supplier 
##      2.457831      2.506024      2.795181      3.240964      4.000000

col1 <- as.numeric(which.max(colSums(sdata1)))
col1

## [1] 5

sdataW <- cbind(sdata1[, col1], sdata1[, -col1])
names(sdataW)[1] <- names(sdata1)[col1]

head(sdataW)

##   Supplier Site.matching Origin Species.list Source.type
## 1        5             1      2            4           3
## 2        3             1      2            5           4
## 3        2             1      3            5           4
## 4        3             1      4            5           2
## 5        5             2      1            4           3
## 6        5             2      4            1           3

The data now need reorganizing into columns that show which option was given rank 1, rank 2, … for each of the respondents. This is the same format as the nascar data set from PlacettLuce.

sdataWR <- sdataW

names(sdataWR) <- paste0("rank", c(1:opt.n))
sdataWR[,] <- NA

for (i in 1:nrow(sdataWR)) {
  data.i <- sdataW[i, 1:opt.n]
  for (j in 1:opt.n) {
    sdataWR[i, j] <- which(data.i == j)
  }
}

sdataW[1:2, ] # first respondent ranked column 2 (Site.matching) first

##   Supplier Site.matching Origin Species.list Source.type
## 1        5             1      2            4           3
## 2        3             1      2            5           4

sdataWR[1:2, ] # first respondent ranked column 2 (Site.matching) first

##   rank1 rank2 rank3 rank4 rank5
## 1     2     3     5     4     1
## 2     2     3     1     5     4

This data set can now be used to create a data set in the required format for PlackettLuce.

sdataWP <- as.rankings(sdataWR,
                       input="orderings",
                       items=names(sdataW)) # column names were the options

sdataWP[1,]

## [1] "Site.matching > Origin > Source.type ..."

sdataWP[2,]

## [1] "Site.matching > Origin > Supplier >  ..."

3.3 Prepare data with reference level for best combined-ranked option

In the second of the new data sets, the reference level corresponds to the option with the best combined rank overall. As it happens in our data set, this was already the case (column ‘Site.matching’). But within a general working pipeline, this is how the new data set can be created.

col1 <- as.numeric(which.min(colSums(sdata1)))
col1

## [1] 1

# col1 <- 2 # to make 'Origin' the reference level, see section 4.2
sdataB <- cbind(sdata1[, col1], sdata1[, -col1])
names(sdataB)[1] <- names(sdata1)[col1]

head(sdataB)

##   Site.matching Origin Species.list Source.type Supplier
## 1             1      2            4           3        5
## 2             1      2            5           4        3
## 3             1      3            5           4        2
## 4             1      4            5           2        3
## 5             2      1            4           3        5
## 6             2      4            1           3        5

Next create the data set with columns for different ranks.

sdataBR <- sdataB

names(sdataBR) <- paste0("rank", c(1:opt.n))
sdataBR[,] <- NA

for (i in 1:nrow(sdataBR)) {
  data.i <- sdataB[i, 1:opt.n]
  for (j in 1:opt.n) {
    sdataBR[i, j] <- which(data.i == j)
  }
}

The sdataBR data set can now be converted into a rankings object.

sdataBP <- as.rankings(sdataBR,
                       input="orderings",
                       items=names(sdataB)) # column names were the options

sdataBP[1,]

## [1] "Site.matching > Origin > Source.type ..."

4 Import data, updated in January 2022

It is actually possible to directly create a rankings object from the input data that was loaded in Section 3.1. Moreover, as shown in the updates for model fitting (Section 6), it is possible to define the reference levels during that step. Thus steps shown in sections 3.2 and 3.3 are not necessary.

sdata2022 <- as.rankings(sdata1)
sdata2022[1:10, ]

##  [1] "Site.matching > Origin > Source.type ..."
##  [2] "Site.matching > Origin > Supplier >  ..."
##  [3] "Site.matching > Supplier > Origin >  ..."
##  [4] "Site.matching > Source.type > Suppli ..."
##  [5] "Origin > Site.matching > Source.type ..."
##  [6] "Species.list > Site.matching > Sourc ..."
##  [7] "Species.list > Site.matching > Suppl ..."
##  [8] "Species.list > Site.matching > Suppl ..."
##  [9] "Origin > Supplier > Site.matching >  ..."
## [10] "Species.list > Origin > Site.matchin ..."

5 Fit the models

5.1 Models with baseline of worst combined-ranked option

The coefficient for the reference level (‘Supplier’) is zero.

W.mod <- PlackettLuce(sdataWP)
W.mod

## Call: PlackettLuce(rankings = sdataWP)
## 
## Coefficients:
##      Supplier  Site.matching         Origin   Species.list    Source.type  
##        0.0000         1.2115         1.2687         0.7729         0.6741

summary(W.mod)

## Call: PlackettLuce(rankings = sdataWP)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## Supplier        0.0000         NA      NA       NA    
## Site.matching   1.2115     0.2072   5.846 5.04e-09 ***
## Origin          1.2687     0.2034   6.237 4.46e-10 ***
## Species.list    0.7729     0.2083   3.710 0.000207 ***
## Source.type     0.6741     0.1966   3.429 0.000606 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual deviance:  739.93 on 826 degrees of freedom
## AIC:  747.93 
## Number of iterations: 7

With all the coefficients being significant (P < 0.001), we concluded in the working paper that ‘Supplier’ was statistically significantly the lowest-ranked option compared to all other response options.

The quasi-variances can be visualized as follows.

qv <- qvcalc(W.mod)
qv$qvframe <- qv$qvframe[order(coef(W.mod)), ]

par(mai=c(1.5, 0.5, 0.5, 0.5))

plot(qv, xaxt="n")
axis(1, at=c(1:opt.n), 
     labels=rownames(qv$qvframe), 
     las=2)

5.2 Models with baseline of best combined-ranked option

The coefficient for the reference level (‘Site.matching’) is zero. Whereas we could have expected that coefficients would be negative for all the other options, the coefficient was positive for the ‘Origin’ option. This means that PlackettLuce ranked ‘Origin’ higher than ‘Site.matching’. However, the probability associated with the coefficient for ‘Origin’ was significantly higher than 0.05, indicating that both ‘Site.matching’ and ‘Origin’ did not differ in ranking significantly.

The third-ranked option is ‘Species.list’, significantly different from ‘Site.matching’ and ‘Origin’. This could be checked by making ‘Origin’ the baseline - see the script of section 3.2 with reference to this section (4.2).

B.mod <- PlackettLuce(sdataBP)
B.mod

## Call: PlackettLuce(rankings = sdataBP)
## 
## Coefficients:
## Site.matching         Origin   Species.list    Source.type       Supplier  
##       0.00000        0.05721       -0.43863       -0.53745       -1.21151

summary(B.mod)

## Call: PlackettLuce(rankings = sdataBP)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## Site.matching  0.00000         NA      NA       NA    
## Origin         0.05721    0.18601   0.308  0.75839    
## Species.list  -0.43863    0.19993  -2.194  0.02824 *  
## Source.type   -0.53745    0.19114  -2.812  0.00493 ** 
## Supplier      -1.21151    0.20724  -5.846 5.04e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual deviance:  739.93 on 826 degrees of freedom
## AIC:  747.93 
## Number of iterations: 7

When visualizing the rankings, the confidence intervals for the ‘Species.List’ overlapped those of the best and second-best option. In the working paper, we relied on the significance levels of the coefficients to determine if rankings were significantly different (and therefore, that ‘Species.List’ ranked significantly lower).

qv <- qvcalc(B.mod)
qv$qvframe <- qv$qvframe[order(coef(B.mod)), ]

par(mai=c(1.5, 0.5, 0.5, 0.5))

plot(qv, xaxt="n")
axis(1, at=c(1:opt.n), 
     labels=rownames(qv$qvframe), 
     las=2)

6 Fit the models, updated in January 2022

The baseline can be defined during the model fitting.

6.1 Fit the model

PL.mod <- PlackettLuce(sdata2022)
PL.mod

## Call: PlackettLuce(rankings = sdata2022)
## 
## Coefficients:
## Site.matching         Origin   Species.list    Source.type       Supplier  
##       0.00000        0.05721       -0.43863       -0.53745       -1.21151

summary(PL.mod)

## Call: PlackettLuce(rankings = sdata2022)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## Site.matching  0.00000         NA      NA       NA    
## Origin         0.05721    0.18601   0.308  0.75839    
## Species.list  -0.43863    0.19993  -2.194  0.02824 *  
## Source.type   -0.53745    0.19114  -2.812  0.00493 ** 
## Supplier      -1.21151    0.20724  -5.846 5.04e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual deviance:  739.93 on 826 degrees of freedom
## AIC:  747.93 
## Number of iterations: 7

6.2 Modify the baseline level via argument ref

The baseline level can be chosen while summarizing via argument of ref.

coef(PL.mod, ref="Supplier")

## Site.matching        Origin  Species.list   Source.type      Supplier 
##     1.2115053     1.2687198     0.7728725     0.6740572     0.0000000

summary(PL.mod, ref="Supplier")

## Call: PlackettLuce(rankings = sdata2022)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## Site.matching   1.2115     0.2072   5.846 5.04e-09 ***
## Origin          1.2687     0.2034   6.237 4.46e-10 ***
## Species.list    0.7729     0.2083   3.710 0.000207 ***
## Source.type     0.6741     0.1966   3.429 0.000606 ***
## Supplier        0.0000         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual deviance:  739.93 on 826 degrees of freedom
## AIC:  747.93 
## Number of iterations: 7

coef(PL.mod, ref="Origin")

## Site.matching        Origin  Species.list   Source.type      Supplier 
##   -0.05721454    0.00000000   -0.49584729   -0.59466260   -1.26871983

summary(PL.mod, ref="Origin")

## Call: PlackettLuce(rankings = sdata2022)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## Site.matching -0.05721    0.18601  -0.308  0.75839    
## Origin         0.00000         NA      NA       NA    
## Species.list  -0.49585    0.19455  -2.549  0.01081 *  
## Source.type   -0.59466    0.18388  -3.234  0.00122 ** 
## Supplier      -1.26872    0.20342  -6.237 4.46e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual deviance:  739.93 on 826 degrees of freedom
## AIC:  747.93 
## Number of iterations: 7

As before, the candidate level for the worst combined ranking overall can be calculated by the average rankings:

col1 <- as.numeric(which.max(colSums(sdata1)))
worst.cand <- names(sdata1)[col1]
worst.cand

## [1] "Supplier"

col2 <- as.numeric(which.min(colSums(sdata1)))
best.cand <- names(sdata1)[col2]
best.cand

## [1] "Site.matching"

These candidates can now be used while summarizing the results:

summary(PL.mod, ref=worst.cand)

## Call: PlackettLuce(rankings = sdata2022)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## Site.matching   1.2115     0.2072   5.846 5.04e-09 ***
## Origin          1.2687     0.2034   6.237 4.46e-10 ***
## Species.list    0.7729     0.2083   3.710 0.000207 ***
## Source.type     0.6741     0.1966   3.429 0.000606 ***
## Supplier        0.0000         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual deviance:  739.93 on 826 degrees of freedom
## AIC:  747.93 
## Number of iterations: 7

summary(PL.mod, ref=best.cand)

## Call: PlackettLuce(rankings = sdata2022)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## Site.matching  0.00000         NA      NA       NA    
## Origin         0.05721    0.18601   0.308  0.75839    
## Species.list  -0.43863    0.19993  -2.194  0.02824 *  
## Source.type   -0.53745    0.19114  -2.812  0.00493 ** 
## Supplier      -1.21151    0.20724  -5.846 5.04e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual deviance:  739.93 on 826 degrees of freedom
## AIC:  747.93 
## Number of iterations: 7

6.3 Plotting

Also the qvcalc function has an argument of ref:

qv <- qvcalc(PL.mod, ref=worst.cand)
qv$qvframe <- qv$qvframe[order(coef(PL.mod)), ]

par(mai=c(1.5, 0.5, 0.5, 0.5))

plot(qv, xaxt="n")
axis(1, at=c(1:opt.n), 
     labels=rownames(qv$qvframe), 
     las=2)

qv <- qvcalc(PL.mod, ref=best.cand)
qv$qvframe <- qv$qvframe[order(coef(PL.mod)), ]

par(mai=c(1.5, 0.5, 0.5, 0.5))

plot(qv, xaxt="n")
axis(1, at=c(1:opt.n), 
     labels=rownames(qv$qvframe), 
     las=2)

6.4 Coefficients without the log scale

Setting the argument of log to FALSE, the probability of winning is calculated. Here the reference level is not required.

coef(PL.mod, log=FALSE, ref=worst.cand)

## Site.matching        Origin  Species.list   Source.type      Supplier 
##    0.27887884    0.29530005    0.17985387    0.16293142    0.08303582

coef(PL.mod, log=FALSE, ref=best.cand)

## Site.matching        Origin  Species.list   Source.type      Supplier 
##    0.27887884    0.29530005    0.17985387    0.16293142    0.08303582

7 Separate analysis for different types of respondents

In the working paper, we further analyzed whether rankings were different for different types of respondents.

Here I will only show the results for researchers, where the baseline level corresponds to the best-combined ranked option.

7.1 Subset the data

sdata2 <- sdata[sdata$respondent == "researcher", ]
sdata2 <- sdata2[, -1]

7.2 Prepare data in the same way as the combined data

Make the first column the reference level. In this case, the reference level is ‘Origin’.

col1 <- as.numeric(which.min(colSums(sdata2)))
col1

## [1] 2

sdataB <- cbind(sdata2[, col1], sdata2[, -col1])
names(sdataB)[1] <- names(sdata2)[col1]

head(sdataB)

##    Origin Site.matching Species.list Source.type Supplier
## 15      2             1            3           5        4
## 16      2             1            4           3        5
## 17      2             1            4           3        5
## 18      2             1            5           3        4
## 19      2             1            5           3        4
## 20      2             1            5           3        4