Psychometrics can be used in transport engineering to evaluate the habits and attitudes of system users, and measuring them makes it possible to assess users’ preferences and choices. Since choices may differ from preferences because of external factors such as cost, supply, availability, and convenience, psychometrics can help identify when habits and attitudes are dissonant, revealing restrictions that limit users’ opportunities or even a lack of supply or service quality. The main limitation of psychometrics in transport engineering concerns the available variables. Most studies in the field rely on secondary data such as spatial, sociodemographic, supply, and infrastructure data. The few user interviews rarely adopt a Likert scale or produce ordinal data. Another common source of interview data is origin/destination surveys, which usually provide sociodemographic and spatial information rather than ordinal questions. Although some of these data can be recoded into ordinal form, they were not designed for that purpose and can introduce considerable noise into the results.
Psychometrics can also contribute to a more integrated view in transport engineering analyses. Survey-based studies in transportation engineering often evaluate each question individually. When a set of questions belonging to the same topic is evaluated within a single construct, the analysis is enriched, because the information carried by all the questions can be integrated into more robust analyses.
The latent variable is a hidden-variable model obtained through a representational measurement instrument; it depends on the discrimination capacity of the items that compose it, the difficulty of those items, and the chance of a correct answer by guessing. In this sense, the latent variable corresponds to a phenomenon that is not directly observable but is estimated through other variables (the items). Items with strong covariance and a theoretical basis linking them can therefore be evaluated jointly as a set of responses determined by the latent variable. In other words, the latent variable corresponds to a latent cause that is not observed but can be estimated from its associated items. Ultimately, the latent variable is a construct or trait that, although not directly observable, is fundamental in shaping the habits and attitudes that are observed. Latent variables are abstract, dispositional elements expressed through measured attitudes; accordingly, they are expressed through the parameters of the associated items.
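For reference, these three item properties (discrimination, difficulty, and the chance of random correctness) are commonly formalized in the three-parameter logistic item response function; the notation below is the standard one and is given here only as a sketch of how the latent trait enters the item parameters:

\[P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + e^{-a_i(\theta - b_i)}}\]

where \(\theta\) is the individual’s latent trait, \(a_i\) the discrimination, \(b_i\) the difficulty, and \(c_i\) the guessing parameter of item \(i\).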
CTT takes the total test score as its main object of analysis (dealing with the individual’s true score and the error associated with the measurement), whereas IRT focuses on the test items individually. CTT also assumes that the expected value of the test is the true score and that precision increases as more observations representing the population of interest are added. CTT aims to select the best items from a larger set based on discrimination and difficulty criteria. Another characteristic of CTT is that it requires parallel instruments to assess reliability. CTT also uses Confirmatory Factor Analysis to verify that the data are consistent with the chosen factor (latent) structure, under an assumption of a linear relationship between the items, which therefore requires (even if only as an assumption) the same scale difficulty for all items. IRT differs not as an opposition to CTT but as an improvement on it. IRT assumes that the individual has latent traits that influence the chance of answering the items correctly, so this latent trait allows the individual’s chance of success to be estimated. From this point IRT begins to differentiate itself: if the latent trait is a property of each individual, it is independent of the sample and of the difficulty level of the instrument. This means that, unlike CTT, IRT postulates that an individual’s latent trait can be estimated even from a non-representative sample. Another postulate that distinguishes IRT from CTT is that, while CTT assumes that measurement errors are equal for all individuals, IRT imposes no such restriction. Furthermore, IRT does not require the application of parallel tests to estimate the instrument’s reliability. Finally, while CTT treats difficulty, discrimination, and random error as separate parameters, IRT places the error on the same scale as the ability, allowing them to be related.
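To make the contrast concrete, the sketch below applies both perspectives to the same simulated binary item matrix (the data and object names are hypothetical and serve only to illustrate the two viewpoints): CTT summarizes the instrument through the total score and its internal consistency, while IRT estimates discrimination and difficulty for each item.

library(psych)   # CTT-style reliability (Cronbach's alpha)
library(mirt)    # IRT model fitting

# Simulate binary responses from 200 hypothetical respondents to 5 items
set.seed(1)
theta <- rnorm(200)                              # latent trait
b <- c(-1, -0.5, 0, 0.5, 1)                      # item difficulties
items <- as.data.frame(sapply(b, function(bi) rbinom(200, 1, plogis(theta - bi))))
names(items) <- paste0("i", 1:5)

# CTT view: total score and internal consistency of the summed scale
total_score <- rowSums(items)
psych::alpha(items)

# IRT view: item-level parameters (discrimination a, difficulty b)
fit_2pl <- mirt(items, model = 1, itemtype = "2PL", verbose = FALSE)
coef(fit_2pl, IRTpars = TRUE, simplify = TRUE)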
Item 1 presents a reasonable capacity for discrimination but, at the same time, little difficulty, indicating that individuals with almost no ability can answer it correctly. Item 2 has a low discrimination capacity, which ends up being irrelevant because the item also has the lowest degree of difficulty and is therefore unnecessary in a test. Item 3 presents a strong association between ability and success, with the chance of success increasing roughly linearly with the individual’s ability; as a consequence, its discrimination capacity is low. Item 4 brings a new situation: individuals with little ability have little chance of answering correctly, and the chance of a correct response only increases after a certain degree of ability; on the other hand, the ability interval over which this transition occurs is very wide, which reduces the item’s power to discriminate. Finally, item 5 shows the greatest discrimination capacity, although with a moderate slope. This item also presents a high degree of difficulty, as the chance of a correct answer only starts to vary above the 50th percentile of the individual’s ability (latent trait).
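The five behaviours described above can be visualised with item characteristic curves. The parameter values in the sketch below are hypothetical, chosen only to reproduce the qualitative shapes discussed, not taken from any model fitted in this text.

# Logistic item characteristic curve (c defaults to 0, i.e. no guessing)
icc <- function(theta, a, b, c = 0) c + (1 - c) * plogis(a * (theta - b))

theta <- seq(-4, 4, length.out = 200)
pars <- data.frame(a = c(1.2, 0.3, 0.4, 0.6, 1.5),    # discrimination
                   b = c(-2.0, -3.5, 0.0, 0.8, 1.0))  # difficulty

plot(NULL, xlim = range(theta), ylim = c(0, 1),
     xlab = expression(theta), ylab = "P(correct answer)")
for (k in seq_len(nrow(pars)))
  lines(theta, icc(theta, pars$a[k], pars$b[k]), col = k, lwd = 2)
legend("bottomright", legend = paste("Item", 1:5), col = 1:5, lwd = 2)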
library(dplyr)     # data manipulation and glimpse()
library(corrplot)  # correlation matrix plots
library(mirt)      # IRT model fitting
# Load the binary item responses; keep every column except the first and drop the unused 'ne' item
df = readRDS("C:/Users/fagne/OneDrive/PPGEP-LASTRAN/Doutorado/ComputationalPsychometricsAppliedToEngineering/ds_2.rds")
dfsub = df[ , 2:ncol(df)]
dfsub$ne = NULL
glimpse(dfsub)
## Rows: 11
## Columns: 15
## $ pl <dbl> 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1
## $ gi <dbl> 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1
## $ ef <dbl> 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1
## $ re <dbl> 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1
## $ co <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1
## $ ts <dbl> 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1
## $ fl <dbl> 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1
## $ `in` <dbl> 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1
## $ pr <dbl> 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1
## $ he <dbl> 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1
## $ sa <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1
## $ rl <dbl> 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0
## $ pa <dbl> 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0
## $ en <dbl> 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1
## $ fr <dbl> 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1
#dfsub$sa = (dfsub$sa-1)*(-1)
M <- cor(dfsub)   # correlation matrix of the items
M
## pl gi ef re co ts
## pl 1.00000000 0.6708204 0.5416667 0.54166667 0.2886751 0.38575837
## gi 0.67082039 1.0000000 0.6708204 0.67082039 0.4303315 0.44854261
## ef 0.54166667 0.6708204 1.0000000 0.54166667 0.2886751 0.38575837
## re 0.54166667 0.6708204 0.5416667 1.00000000 0.2886751 -0.03857584
## co 0.28867513 0.4303315 0.2886751 0.28867513 1.0000000 0.35634832
## ts 0.38575837 0.4485426 0.3857584 -0.03857584 0.3563483 1.00000000
## fl 0.38575837 0.8280787 0.3857584 0.38575837 0.3563483 0.21428571
## in 0.76980036 0.5163978 0.2405626 0.24056261 0.2222222 0.62360956
## pr 0.46291005 0.6900656 0.4629100 0.46291005 0.6236096 0.17857143
## he 0.24056261 0.5163978 0.7698004 0.24056261 0.2222222 0.62360956
## sa 0.19364917 0.2886751 0.1936492 0.19364917 0.6708204 0.23904572
## rl 0.08333333 0.2608746 0.5416667 0.08333333 -0.2405626 0.38575837
## pa 0.28867513 0.4303315 0.2886751 0.28867513 -0.2222222 -0.13363062
## en 0.67082039 0.6333333 0.2608746 0.67082039 0.4303315 0.06900656
## fr 0.81009259 0.8280787 0.3857584 0.81009259 0.3563483 0.21428571
## fl in pr he sa rl
## pl 0.38575837 0.7698004 0.46291005 0.24056261 0.1936492 0.08333333
## gi 0.82807867 0.5163978 0.69006556 0.51639778 0.2886751 0.26087460
## ef 0.38575837 0.2405626 0.46291005 0.76980036 0.1936492 0.54166667
## re 0.38575837 0.2405626 0.46291005 0.24056261 0.1936492 0.08333333
## co 0.35634832 0.2222222 0.62360956 0.22222222 0.6708204 -0.24056261
## ts 0.21428571 0.6236096 0.17857143 0.62360956 0.2390457 0.38575837
## fl 1.00000000 0.1336306 0.57142857 0.13363062 0.2390457 -0.03857584
## in 0.13363062 1.0000000 0.35634832 0.38888889 0.1490712 0.24056261
## pr 0.57142857 0.3563483 1.00000000 0.35634832 0.4183300 0.03857584
## he 0.13363062 0.3888889 0.35634832 1.00000000 0.1490712 0.76980036
## sa 0.23904572 0.1490712 0.41833001 0.14907120 1.0000000 -0.51639778
## rl -0.03857584 0.2405626 0.03857584 0.76980036 -0.5163978 1.00000000
## pa 0.35634832 0.2222222 0.62360956 0.22222222 -0.1490712 0.28867513
## en 0.44854261 0.5163978 0.31052950 0.04303315 0.2886751 -0.14907120
## fr 0.60714286 0.6236096 0.57142857 0.13363062 0.2390457 -0.03857584
## pa en fr
## pl 0.28867513 0.67082039 0.81009259
## gi 0.43033148 0.63333333 0.82807867
## ef 0.28867513 0.26087460 0.38575837
## re 0.28867513 0.67082039 0.81009259
## co -0.22222222 0.43033148 0.35634832
## ts -0.13363062 0.06900656 0.21428571
## fl 0.35634832 0.44854261 0.60714286
## in 0.22222222 0.51639778 0.62360956
## pr 0.62360956 0.31052950 0.57142857
## he 0.22222222 0.04303315 0.13363062
## sa -0.14907120 0.28867513 0.23904572
## rl 0.28867513 -0.14907120 -0.03857584
## pa 1.00000000 -0.04303315 0.35634832
## en -0.04303315 1.00000000 0.82807867
## fr 0.35634832 0.82807867 1.00000000
corrplot(M, method = "pie")   # quick look at the correlation structure
library(RColorBrewer)
# Helper: matrix of p-values for all pairwise correlation tests
cor.mtest <- function(mat, ...) {
  mat <- as.matrix(mat)
  n <- ncol(mat)
  p.mat <- matrix(NA, n, n)
  diag(p.mat) <- 0
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      tmp <- cor.test(mat[, i], mat[, j], ...)
      p.mat[i, j] <- p.mat[j, i] <- tmp$p.value
    }
  }
  colnames(p.mat) <- rownames(p.mat) <- colnames(mat)
  p.mat
}
p.mat <- cor.mtest(dfsub)
col <- colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))
corrplot(M, method = "color", col = col(200),
         type = "upper", order = "hclust",
         addCoef.col = "black",            # print coefficients in black
         tl.col = "black", tl.srt = 45,    # label colour and rotation
         p.mat = p.mat, sig.level = 0.999, insig = "blank",
         diag = FALSE)
#dfsub$sa = (dfsub$sa-1)*(-1)
mirt(data = dfsub ,model = 1, itemtype ="Rasch", verbose =FALSE)
##
## Call:
## mirt(data = dfsub, model = 1, itemtype = "Rasch", verbose = FALSE)
##
## Full-information item factor analysis with 1 factor(s).
## Converged within 1e-04 tolerance after 20 EM iterations.
## mirt version: 1.41
## M-step optimizer: nlminb
## EM acceleration: Ramsay
## Number of rectangular quadrature: 61
## Latent density type: Gaussian
##
## Log-likelihood = -70.54286
## Estimated parameters: 16
## AIC = 173.0857
## BIC = 179.452; SABIC = 131.276
## G2 (32751) = 91.1, p = 1
## RMSEA = 0, CFI = NaN, TLI = NaN
mirt(data = dfsub ,model = 1, itemtype ="2PL", verbose =FALSE)
##
## Call:
## mirt(data = dfsub, model = 1, itemtype = "2PL", verbose = FALSE)
##
## Full-information item factor analysis with 1 factor(s).
## Converged within 1e-04 tolerance after 303 EM iterations.
## mirt version: 1.41
## M-step optimizer: BFGS
## EM acceleration: Ramsay
## Number of rectangular quadrature: 61
## Latent density type: Gaussian
##
## Log-likelihood = -61.03451
## Estimated parameters: 30
## AIC = 182.069
## BIC = 194.0059; SABIC = 103.6759
## G2 (32737) = 72.09, p = 1
## RMSEA = 0, CFI = NaN, TLI = NaN
mirt(data = dfsub ,model = 1, itemtype ="3PL", verbose =FALSE, draws = 10000)
##
## Call:
## mirt(data = dfsub, model = 1, itemtype = "3PL", draws = 10000,
## verbose = FALSE)
##
## Full-information item factor analysis with 1 factor(s).
## FAILED TO CONVERGE within 1e-04 tolerance after 500 EM iterations.
## mirt version: 1.41
## M-step optimizer: BFGS
## EM acceleration: Ramsay
## Number of rectangular quadrature: 61
## Latent density type: Gaussian
##
## Log-likelihood = -55.13017
## Estimated parameters: 45
## AIC = 200.2603
## BIC = 218.1656; SABIC = 82.67063
## G2 (32722) = 60.28, p = 1
## RMSEA = 0, CFI = NaN, TLI = NaN
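Because the three models above are fitted to the same data, they can be compared on fit and parsimony. A minimal sketch, assuming the fits are first stored in objects (the object names are illustrative):

# Refit and store the two simpler models so they can be compared
fit_rasch <- mirt(data = dfsub, model = 1, itemtype = "Rasch", verbose = FALSE)
fit_2pl   <- mirt(data = dfsub, model = 1, itemtype = "2PL",   verbose = FALSE)

# Likelihood-ratio test plus AIC/BIC for the nested pair (Rasch vs. 2PL)
anova(fit_rasch, fit_2pl)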
plot(0,0)
# Greatest lower bound (GLB) and related reliability estimates for the item set
psych::glb(dfsub, key=NULL)
## Call: psych::glb(r = dfsub, key = NULL)
##
## Estimates of the Greatest Lower Bound for reliability, based on factor and cluster models
## GLB estimated from factor based communalities = 0.98 with 6 factors.
## Use glb.fa to see more details
##
## Various estimates based upon splitting the scale into two (see keys for the various splits)
## Beta = 0.65
## Beta fa 0.54 This is an estimate of the worst split half reliability
## Kmeans clusters for best split 0.99
## Cluster based estimates
## glb.IC = 0.96
## glb.max 0.99 Is the maximum of these estimates
## alpha-PC = 0.84 An estimate of alpha based on eignvalues
## TenBerge bounds
## mu0 = 0.89 mu1 = 0.9 mu2 = 0.91 mu3 = 0.91
##
## estimated greatest lower bound based upon splitHalf = 0.77
##
## Use short = FALSE to see the various ways of splitting the scale
mirt::itemstats(dfsub)$overall   # overall descriptive statistics for the item set
glimpse(dfsub)
## Rows: 11
## Columns: 15
## $ pl <dbl> 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1
## $ gi <dbl> 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1
## $ ef <dbl> 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1
## $ re <dbl> 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1
## $ co <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1
## $ ts <dbl> 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1
## $ fl <dbl> 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1
## $ `in` <dbl> 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1
## $ pr <dbl> 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1
## $ he <dbl> 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1
## $ sa <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1
## $ rl <dbl> 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0
## $ pa <dbl> 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0
## $ en <dbl> 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1
## $ fr <dbl> 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1
The number of items required to reach a desired reliability can be estimated with the Spearman-Brown prophecy formula, where \(r_o\) is the observed reliability and \(r_d\) the desired one:

\[j = \frac{r_d(1-r_o)}{r_o(1-r_d)}\]
ro = 0.89   # observed reliability
rd = 0.9    # desired reliability
j = (rd*(1-ro)) / (ro*(1-rd))
paste0("J: ", j)
## [1] "J: 1.1123595505618"
n = ncol(dfsub)
paste0("Current items in the dataset: ", n)
## [1] "Current items in the dataset: 15"
itens = ceiling(n*j)
paste0("Items needed for internal consistency = 0.9: ", itens)
## [1] "Items needed for internal consistency = 0.9: 17"
#ufs::scaleStructure(dfsub[ , c(1,2,3,4,5,11,12,14)])
dfsub[, 1:15] <- sapply(dfsub[, 1:15], as.integer)   # coerce the items from double to integer
glimpse(dfsub)
## Rows: 11
## Columns: 15
## $ pl <int> 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1
## $ gi <int> 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1
## $ ef <int> 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1
## $ re <int> 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1
## $ co <int> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1
## $ ts <int> 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1
## $ fl <int> 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1
## $ `in` <int> 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1
## $ pr <int> 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1
## $ he <int> 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1
## $ sa <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1
## $ rl <int> 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0
## $ pa <int> 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0
## $ en <int> 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1
## $ fr <int> 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1
# Internal-structure and reliability estimates for a nine-item subset
ufs::scaleStructure(dfsub[ , c(1,2,3,4,5,6,11,12,14)])
Dataframe:                         dfsub[, c(1, 2, 3, 4, 5, 6, 11, 12, 14)]
Items:                             all
Observations:                      11
Positive correlations:             32
Number of correlations:            36
Percentage positive correlations:  89

Omega (total):                     0.85
Omega (hierarchical):              0.69
Revelle’s Omega (total):           0.92
Greatest Lower Bound (GLB):        NA
Coefficient H:                     0.92
Coefficient Alpha:                 0.82
Omega (total):                     [0.71; 0.98]
Coefficient Alpha:                 [0.65; 0.99]

Ordinal Omega (total):             0.93
Ordinal Omega (hierarch.):         0.88
Ordinal Coefficient Alpha:         0.91
Ordinal Omega (total):             [0.87; 0.99]
Ordinal Coefficient Alpha:         [0.83; 0.99]

Note: the normal point estimate and confidence interval for omega are based on the procedure suggested by Dunn, Baguley & Brunsden (2013) using the MBESS function ci.reliability, whereas the psych package point estimate was suggested in Revelle & Zinbarg (2008). See the help (‘?ufs::scaleStructure’) for more information.