library(readxl)
library(PlackettLuce)
The main objective of this document is to show how rankings of our Working paper on Opportunities for implementing impoved tree seed sourcing for forest landscape restoration, agroforestry and wider tree planting at the project design stage: findings of a survey of planters, researchers and funders were analyzed via the PlackettLuce package (for a description of the package, see Turner et al. 2020).
The example shown here refers to Question 11 of the survey, with results discussed in section 3.5 (Information needed by funders from prospective planters to assess tree seed sourcing quality) of the working paper.
Question 11. Let us assume that tree planters are required to explain how they will source the tree seed that they intend to use for planting. (So that an assessment of the quality of tree seed sourcing is possible.) If this is so, how important is it for the planters to provide the different pieces of information below?
Please rank between these 5 options (where 1 is the most important information planters should provide and 5 the least important – each option must have a different rank):
Earlier in the survey (Question 7), the respondents were asked to choose whether their institution was best described as an institution that directly carries out or organizes tree planting (‘Planter’), as an institution that does research on tree planting (‘Researcher’) or as an institution that funds tree planting (‘Funder’). For the analysis of Question 11, respondents from the category of ‘Planters’ were excluded.
Prior to importing the data, data was sorted by the different rankings. Such prior sorting is not required for the statistical analysis.
data.file <- paste0(getwd(), "\\TQTP R pub.xlsx")
sdata <- data.frame(read_excel(path=data.file, sheet="data"))
print(sdata)
## respondent Site.matching Origin Species.list Source.type Supplier
## 1 funder 1 2 4 3 5
## 2 funder 1 2 5 4 3
## 3 funder 1 3 5 4 2
## 4 funder 1 4 5 2 3
## 5 funder 2 1 4 3 5
## 6 funder 2 4 1 3 5
## 7 funder 2 4 1 5 3
## 8 funder 2 5 1 4 3
## 9 funder 3 1 5 4 2
## 10 funder 3 2 1 5 4
## 11 funder 3 2 5 4 1
## 12 funder 3 4 5 2 1
## 13 funder 3 5 2 1 4
## 14 funder 5 2 3 1 4
## 15 researcher 1 2 3 5 4
## 16 researcher 1 2 4 3 5
## 17 researcher 1 2 4 3 5
## 18 researcher 1 2 5 3 4
## 19 researcher 1 2 5 3 4
## 20 researcher 1 2 5 3 4
## 21 researcher 1 2 5 4 3
## 22 researcher 1 3 2 4 5
## 23 researcher 1 3 2 4 5
## 24 researcher 1 3 2 4 5
## 25 researcher 1 3 2 4 5
## 26 researcher 1 3 2 4 5
## 27 researcher 1 3 2 5 4
## 28 researcher 1 3 2 5 4
## 29 researcher 1 3 4 2 5
## 30 researcher 1 3 4 2 5
## 31 researcher 1 3 5 2 4
## 32 researcher 1 3 5 2 4
## 33 researcher 1 4 3 5 2
## 34 researcher 1 4 5 2 3
## 35 researcher 1 4 5 3 2
## 36 researcher 2 1 3 4 5
## 37 researcher 2 1 3 4 5
## 38 researcher 2 1 4 3 5
## 39 researcher 2 1 4 3 5
## 40 researcher 2 1 5 4 3
## 41 researcher 2 1 5 4 3
## 42 researcher 2 3 1 4 5
## 43 researcher 2 3 1 4 5
## 44 researcher 2 3 1 4 5
## 45 researcher 2 3 1 4 5
## 46 researcher 2 3 1 4 5
## 47 researcher 2 3 1 4 5
## 48 researcher 2 3 1 4 5
## 49 researcher 2 3 1 4 5
## 50 researcher 2 3 1 5 4
## 51 researcher 2 3 1 5 4
## 52 researcher 2 3 4 1 5
## 53 researcher 2 3 5 4 1
## 54 researcher 3 1 4 2 5
## 55 researcher 3 1 5 2 4
## 56 researcher 3 1 5 2 4
## 57 researcher 3 2 1 4 5
## 58 researcher 3 2 1 4 5
## 59 researcher 3 2 1 4 5
## 60 researcher 3 2 1 4 5
## 61 researcher 3 2 1 5 4
## 62 researcher 3 2 1 5 4
## 63 researcher 3 2 4 1 5
## 64 researcher 3 4 1 2 5
## 65 researcher 3 4 1 2 5
## 66 researcher 3 4 5 1 2
## 67 researcher 3 5 1 2 4
## 68 researcher 4 1 3 2 5
## 69 researcher 4 1 3 2 5
## 70 researcher 4 2 1 3 5
## 71 researcher 4 2 1 5 3
## 72 researcher 4 3 1 2 5
## 73 researcher 4 3 1 2 5
## 74 researcher 4 3 5 1 2
## 75 researcher 5 1 3 2 4
## 76 researcher 5 1 3 2 4
## 77 researcher 5 1 4 2 3
## 78 researcher 5 1 4 3 2
## 79 researcher 5 2 1 3 4
## 80 researcher 5 2 1 4 3
## 81 researcher 5 3 1 2 4
## 82 researcher 5 3 1 4 2
## 83 researcher 5 3 1 4 2
To analyze data for all the respondents, the type of respondent is not required. The first column can thus be removed.
sdata1 <- sdata[, -1]
The number of options (items to be ranked) equals the number of columns in the data:
opt.n <- ncol(sdata1)
opt.n
## [1] 5
Two new data sets will be created. In the first of these new data sets, the first column corresponds to the option with the worst combined rank overall. This option should be the ‘Supplier’ column, as this column had the largest average ranking.
colSums(sdata1) / nrow(sdata1)
## Site.matching Origin Species.list Source.type Supplier
## 2.457831 2.506024 2.795181 3.240964 4.000000
col1 <- as.numeric(which.max(colSums(sdata1)))
col1
## [1] 5
sdataW <- cbind(sdata1[, col1], sdata1[, -col1])
names(sdataW)[1] <- names(sdata1)[col1]
head(sdataW)
## Supplier Site.matching Origin Species.list Source.type
## 1 5 1 2 4 3
## 2 3 1 2 5 4
## 3 2 1 3 5 4
## 4 3 1 4 5 2
## 5 5 2 1 4 3
## 6 5 2 4 1 3
The data now need reorganizing into columns that show which option was given rank 1, rank 2, … for each of the respondents. This is the same format as the nascar data set from PlacettLuce.
sdataWR <- sdataW
names(sdataWR) <- paste0("rank", c(1:opt.n))
sdataWR[,] <- NA
for (i in 1:nrow(sdataWR)) {
data.i <- sdataW[i, 1:opt.n]
for (j in 1:opt.n) {
sdataWR[i, j] <- which(data.i == j)
}
}
sdataW[1:2, ] # first respondent ranked column 2 (Site.matching) first
## Supplier Site.matching Origin Species.list Source.type
## 1 5 1 2 4 3
## 2 3 1 2 5 4
sdataWR[1:2, ] # first respondent ranked column 2 (Site.matching) first
## rank1 rank2 rank3 rank4 rank5
## 1 2 3 5 4 1
## 2 2 3 1 5 4
This data set can now be used to create a data set in the required format for PlackettLuce.
sdataWP <- as.rankings(sdataWR,
input="orderings",
items=names(sdataW)) # column names were the options
sdataWP[1,]
## [1] "Site.matching > Origin > Source.type ..."
sdataWP[2,]
## [1] "Site.matching > Origin > Supplier > ..."
In the second of the new data sets, the reference level corresponds to the option with the best combined rank overall. As it happens in our data set, this was already the case (column ‘Site.matching’). But within a general working pipeline, this is how the new data set can be created.
col1 <- as.numeric(which.min(colSums(sdata1)))
col1
## [1] 1
# col1 <- 2 # to make 'Origin' the reference level, see section 4.2
sdataB <- cbind(sdata1[, col1], sdata1[, -col1])
names(sdataB)[1] <- names(sdata1)[col1]
head(sdataB)
## Site.matching Origin Species.list Source.type Supplier
## 1 1 2 4 3 5
## 2 1 2 5 4 3
## 3 1 3 5 4 2
## 4 1 4 5 2 3
## 5 2 1 4 3 5
## 6 2 4 1 3 5
Next create the data set with columns for different ranks.
sdataBR <- sdataB
names(sdataBR) <- paste0("rank", c(1:opt.n))
sdataBR[,] <- NA
for (i in 1:nrow(sdataBR)) {
data.i <- sdataB[i, 1:opt.n]
for (j in 1:opt.n) {
sdataBR[i, j] <- which(data.i == j)
}
}
The sdataBR data set can now be converted into a rankings object.
sdataBP <- as.rankings(sdataBR,
input="orderings",
items=names(sdataB)) # column names were the options
sdataBP[1,]
## [1] "Site.matching > Origin > Source.type ..."
It is actually possible to directly create a rankings object from the input data that was loaded in Section 3.1. Moreover, as shown in the updates for model fitting (Section 6), it is possible to define the reference levels during that step. Thus steps shown in sections 3.2 and 3.3 are not necessary.
sdata2022 <- as.rankings(sdata1)
sdata2022[1:10, ]
## [1] "Site.matching > Origin > Source.type ..."
## [2] "Site.matching > Origin > Supplier > ..."
## [3] "Site.matching > Supplier > Origin > ..."
## [4] "Site.matching > Source.type > Suppli ..."
## [5] "Origin > Site.matching > Source.type ..."
## [6] "Species.list > Site.matching > Sourc ..."
## [7] "Species.list > Site.matching > Suppl ..."
## [8] "Species.list > Site.matching > Suppl ..."
## [9] "Origin > Supplier > Site.matching > ..."
## [10] "Species.list > Origin > Site.matchin ..."
The coefficient for the reference level (‘Supplier’) is zero.
W.mod <- PlackettLuce(sdataWP)
W.mod
## Call: PlackettLuce(rankings = sdataWP)
##
## Coefficients:
## Supplier Site.matching Origin Species.list Source.type
## 0.0000 1.2115 1.2687 0.7729 0.6741
summary(W.mod)
## Call: PlackettLuce(rankings = sdataWP)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## Supplier 0.0000 NA NA NA
## Site.matching 1.2115 0.2072 5.846 5.04e-09 ***
## Origin 1.2687 0.2034 6.237 4.46e-10 ***
## Species.list 0.7729 0.2083 3.710 0.000207 ***
## Source.type 0.6741 0.1966 3.429 0.000606 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual deviance: 739.93 on 826 degrees of freedom
## AIC: 747.93
## Number of iterations: 7
With all the coefficients being significant (P < 0.001), we concluded in the working paper that ‘Supplier’ was statistically significantly the lowest-ranked option compared to all other response options.
The quasi-variances can be visualized as follows.
qv <- qvcalc(W.mod)
qv$qvframe <- qv$qvframe[order(coef(W.mod)), ]
par(mai=c(1.5, 0.5, 0.5, 0.5))
plot(qv, xaxt="n")
axis(1, at=c(1:opt.n),
labels=rownames(qv$qvframe),
las=2)
The coefficient for the reference level (‘Site.matching’) is zero. Whereas we could have expected that coefficients would be negative for all the other options, the coefficient was positive for the ‘Origin’ option. This means that PlackettLuce ranked ‘Origin’ higher than ‘Site.matching’. However, the probability associated with the coefficient for ‘Origin’ was significantly higher than 0.05, indicating that both ‘Site.matching’ and ‘Origin’ did not differ in ranking significantly.
The third-ranked option is ‘Species.list’, significantly different from ‘Site.matching’ and ‘Origin’. This could be checked by making ‘Origin’ the baseline - see the script of section 3.2 with reference to this section (4.2).
B.mod <- PlackettLuce(sdataBP)
B.mod
## Call: PlackettLuce(rankings = sdataBP)
##
## Coefficients:
## Site.matching Origin Species.list Source.type Supplier
## 0.00000 0.05721 -0.43863 -0.53745 -1.21151
summary(B.mod)
## Call: PlackettLuce(rankings = sdataBP)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## Site.matching 0.00000 NA NA NA
## Origin 0.05721 0.18601 0.308 0.75839
## Species.list -0.43863 0.19993 -2.194 0.02824 *
## Source.type -0.53745 0.19114 -2.812 0.00493 **
## Supplier -1.21151 0.20724 -5.846 5.04e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual deviance: 739.93 on 826 degrees of freedom
## AIC: 747.93
## Number of iterations: 7
When visualizing the rankings, the confidence intervals for the ‘Species.List’ overlapped those of the best and second-best option. In the working paper, we relied on the significance levels of the coefficients to determine if rankings were significantly different (and therefore, that ‘Species.List’ ranked significantly lower).
qv <- qvcalc(B.mod)
qv$qvframe <- qv$qvframe[order(coef(B.mod)), ]
par(mai=c(1.5, 0.5, 0.5, 0.5))
plot(qv, xaxt="n")
axis(1, at=c(1:opt.n),
labels=rownames(qv$qvframe),
las=2)
The baseline can be defined during the model fitting.
PL.mod <- PlackettLuce(sdata2022)
PL.mod
## Call: PlackettLuce(rankings = sdata2022)
##
## Coefficients:
## Site.matching Origin Species.list Source.type Supplier
## 0.00000 0.05721 -0.43863 -0.53745 -1.21151
summary(PL.mod)
## Call: PlackettLuce(rankings = sdata2022)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## Site.matching 0.00000 NA NA NA
## Origin 0.05721 0.18601 0.308 0.75839
## Species.list -0.43863 0.19993 -2.194 0.02824 *
## Source.type -0.53745 0.19114 -2.812 0.00493 **
## Supplier -1.21151 0.20724 -5.846 5.04e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual deviance: 739.93 on 826 degrees of freedom
## AIC: 747.93
## Number of iterations: 7
The baseline level can be chosen while summarizing via argument of ref.
coef(PL.mod, ref="Supplier")
## Site.matching Origin Species.list Source.type Supplier
## 1.2115053 1.2687198 0.7728725 0.6740572 0.0000000
summary(PL.mod, ref="Supplier")
## Call: PlackettLuce(rankings = sdata2022)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## Site.matching 1.2115 0.2072 5.846 5.04e-09 ***
## Origin 1.2687 0.2034 6.237 4.46e-10 ***
## Species.list 0.7729 0.2083 3.710 0.000207 ***
## Source.type 0.6741 0.1966 3.429 0.000606 ***
## Supplier 0.0000 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual deviance: 739.93 on 826 degrees of freedom
## AIC: 747.93
## Number of iterations: 7
coef(PL.mod, ref="Origin")
## Site.matching Origin Species.list Source.type Supplier
## -0.05721454 0.00000000 -0.49584729 -0.59466260 -1.26871983
summary(PL.mod, ref="Origin")
## Call: PlackettLuce(rankings = sdata2022)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## Site.matching -0.05721 0.18601 -0.308 0.75839
## Origin 0.00000 NA NA NA
## Species.list -0.49585 0.19455 -2.549 0.01081 *
## Source.type -0.59466 0.18388 -3.234 0.00122 **
## Supplier -1.26872 0.20342 -6.237 4.46e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual deviance: 739.93 on 826 degrees of freedom
## AIC: 747.93
## Number of iterations: 7
As before, the candidate level for the worst combined ranking overall can be calculated by the average rankings:
col1 <- as.numeric(which.max(colSums(sdata1)))
worst.cand <- names(sdata1)[col1]
worst.cand
## [1] "Supplier"
col2 <- as.numeric(which.min(colSums(sdata1)))
best.cand <- names(sdata1)[col2]
best.cand
## [1] "Site.matching"
These candidates can now be used while summarizing the results:
summary(PL.mod, ref=worst.cand)
## Call: PlackettLuce(rankings = sdata2022)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## Site.matching 1.2115 0.2072 5.846 5.04e-09 ***
## Origin 1.2687 0.2034 6.237 4.46e-10 ***
## Species.list 0.7729 0.2083 3.710 0.000207 ***
## Source.type 0.6741 0.1966 3.429 0.000606 ***
## Supplier 0.0000 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual deviance: 739.93 on 826 degrees of freedom
## AIC: 747.93
## Number of iterations: 7
summary(PL.mod, ref=best.cand)
## Call: PlackettLuce(rankings = sdata2022)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## Site.matching 0.00000 NA NA NA
## Origin 0.05721 0.18601 0.308 0.75839
## Species.list -0.43863 0.19993 -2.194 0.02824 *
## Source.type -0.53745 0.19114 -2.812 0.00493 **
## Supplier -1.21151 0.20724 -5.846 5.04e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual deviance: 739.93 on 826 degrees of freedom
## AIC: 747.93
## Number of iterations: 7
Also the qvcalc function has an argument of ref:
qv <- qvcalc(PL.mod, ref=worst.cand)
qv$qvframe <- qv$qvframe[order(coef(PL.mod)), ]
par(mai=c(1.5, 0.5, 0.5, 0.5))
plot(qv, xaxt="n")
axis(1, at=c(1:opt.n),
labels=rownames(qv$qvframe),
las=2)
qv <- qvcalc(PL.mod, ref=best.cand)
qv$qvframe <- qv$qvframe[order(coef(PL.mod)), ]
par(mai=c(1.5, 0.5, 0.5, 0.5))
plot(qv, xaxt="n")
axis(1, at=c(1:opt.n),
labels=rownames(qv$qvframe),
las=2)
Setting the argument of log to FALSE, the probability of winning is calculated. Here the reference level is not required.
coef(PL.mod, log=FALSE, ref=worst.cand)
## Site.matching Origin Species.list Source.type Supplier
## 0.27887884 0.29530005 0.17985387 0.16293142 0.08303582
coef(PL.mod, log=FALSE, ref=best.cand)
## Site.matching Origin Species.list Source.type Supplier
## 0.27887884 0.29530005 0.17985387 0.16293142 0.08303582
In the working paper, we further analyzed whether rankings were different for different types of respondents.
Here I will only show the results for researchers, where the baseline level corresponds to the best-combined ranked option.
sdata2 <- sdata[sdata$respondent == "researcher", ]
sdata2 <- sdata2[, -1]
Make the first column the reference level. In this case, the reference level is ‘Origin’.
col1 <- as.numeric(which.min(colSums(sdata2)))
col1
## [1] 2
sdataB <- cbind(sdata2[, col1], sdata2[, -col1])
names(sdataB)[1] <- names(sdata2)[col1]
head(sdataB)
## Origin Site.matching Species.list Source.type Supplier
## 15 2 1 3 5 4
## 16 2 1 4 3 5
## 17 2 1 4 3 5
## 18 2 1 5 3 4
## 19 2 1 5 3 4
## 20 2 1 5 3 4
Next create the data set with columns for different ranks.
sdataBR <- sdataB
names(sdataBR) <- paste0("rank", c(1:opt.n))
sdataBR[,] <- NA
for (i in 1:nrow(sdataBR)) {
data.i <- sdataB[i, 1:opt.n]
for (j in 1:opt.n) {
sdataBR[i, j] <- which(data.i == j)
}
}
The sdataBR data set can now be converted into a rankings object.
sdataBP <- as.rankings(sdataBR,
input="orderings",
items=names(sdataB)) # column names were the options
sdataBP[1,]
## 15
## "Site.matching > Origin > Species.lis ..."
B.mod <- PlackettLuce(sdataBP)
B.mod
## Call: PlackettLuce(rankings = sdataBP)
##
## Coefficients:
## Origin Site.matching Species.list Source.type Supplier
## 0.0000 -0.1878 -0.4816 -0.7070 -1.5073
summary(B.mod)
## Call: PlackettLuce(rankings = sdataBP)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## Origin 0.0000 NA NA NA
## Site.matching -0.1878 0.2044 -0.918 0.358371
## Species.list -0.4816 0.2125 -2.266 0.023471 *
## Source.type -0.7070 0.2032 -3.479 0.000503 ***
## Supplier -1.5073 0.2297 -6.562 5.3e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual deviance: 602.04 on 686 degrees of freedom
## AIC: 610.04
## Number of iterations: 7
Visualize the rankings.
qv <- qvcalc(B.mod)
qv$qvframe <- qv$qvframe[order(coef(B.mod)), ]
par(mai=c(1.5, 0.5, 0.5, 0.5))
plot(qv, xaxt="n")
axis(1, at=c(1:opt.n),
labels=rownames(qv$qvframe),
las=2)
sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19042)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United Kingdom.1252
## [2] LC_CTYPE=English_United Kingdom.1252
## [3] LC_MONETARY=English_United Kingdom.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United Kingdom.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] PlackettLuce_0.4.0 readxl_1.3.1
##
## loaded via a namespace (and not attached):
## [1] gmp_0.6-2 Rcpp_1.0.7 RSpectra_0.16-0 pillar_1.4.4
## [5] cellranger_1.1.0 compiler_4.0.2 CVXR_1.0-9 tools_4.0.2
## [9] partykit_1.2-13 rpart_4.1-15 digest_0.6.25 bit_4.0.4
## [13] lifecycle_0.2.0 tibble_3.0.1 evaluate_0.14 lattice_0.20-41
## [17] pkgconfig_2.0.3 rlang_0.4.8 Matrix_1.2-18 igraph_1.2.6
## [21] yaml_2.2.1 mvtnorm_1.1-1 libcoin_1.0-8 xfun_0.15
## [25] Rmpfr_0.8-4 stringr_1.4.0 knitr_1.28 vctrs_0.3.4
## [29] psychotools_0.6-1 qvcalc_1.0.2 bit64_4.0.5 grid_4.0.2
## [33] psychotree_0.15-4 R6_2.4.1 survival_3.1-12 rmarkdown_2.3
## [37] Formula_1.2-3 magrittr_1.5 ellipsis_0.3.1 htmltools_0.5.1.1
## [41] splines_4.0.2 sandwich_2.5-1 stringi_1.4.6 inum_1.0-4
## [45] crayon_1.3.4 zoo_1.8-8