Interpretable Predictive Segmentation for Doctoral LIS Workforce Planning

A theory-informed, reproducible analysis of expressed doctoral study interest in the Philippine Librarians Census

Author: Dan Anthony Dorado
Affiliation: School of Library and Information Studies
Published: May 21, 2026

Abstract

Doctoral education in library and information science (LIS) is a workforce-development mechanism through which the profession builds research capacity, academic leadership, and evidence-informed institutional practice. Yet little is known about how structural opportunity, professional capital, and institutional access conditions shape expressed interest in doctoral LIS study within national librarian populations. This article develops an interpretable predictive segmentation framework using the Philippine Librarians Census to classify expressed doctoral study interest and translate empirically derived workforce segments into ethically bounded recruitment personas. The study uses a cross-sectional, sequential predictive segmentation design combining descriptive workforce profiling, interpretable machine-learning classification, calibration and subgroup diagnostics, segmentation analysis, and persona translation safeguards. The findings show that expressed doctoral study interest is patterned across educational, professional, institutional, and geographic contexts, supporting the use of predictive analytics as a planning tool rather than as an admissions, ranking, or individual forecasting system. The article contributes to LIS workforce research by demonstrating how national professional census data can support doctoral pipeline planning while preserving inferential restraint, interpretability, fairness awareness, and non-exclusionary governance.

Keywords

library and information science education, doctoral education, predictive analytics, machine learning, workforce planning, recruitment personas, Philippine librarians

1 Introduction

1.1 Doctoral Workforce Development in LIS

Doctoral education in library and information science (LIS) is not merely an advanced credentialing activity. It is part of the workforce infrastructure through which the profession develops future researchers, educators, policy actors, institutional leaders, and evidence-oriented practitioners. In national LIS systems where professional work is distributed across institutional sectors and geographic regions, doctoral program planning is therefore both an educational-design problem and a problem of structural access to professional mobility.

This article develops an interpretable predictive segmentation framework for doctoral LIS workforce planning. It does not attempt to predict actual enrollment, admissions suitability, individual capability, or professional worth. Its narrower target is expressed doctoral study interest: a self-reported signal that can support institutional planning but cannot substitute for longitudinal enrollment evidence. This construct separation is central. Expressed interest, professional preparedness, institutional reachability, recruitment feasibility, and realized enrollment are related but analytically distinct phenomena.

1.2 Structural Inequality in Graduate Education Access

The study assumes that doctoral interest is socially situated. Expressed interest may be amplified by prior access to graduate education, research mentoring, professional networks, institutional encouragement, and geographic proximity to advanced study. It may also be suppressed by cost, workload, distance, limited information, or low perceived feasibility. Thus, a weak or absent expressed-interest signal should not be read as lack of potential. It may reflect unequal opportunity to imagine doctoral study as a realistic pathway.

1.3 Philippine LIS Workforce Context

The study is grounded in the Philippine Librarians Census, a landmark national account of the librarian workforce (Obille & Dorado, 2022). Subsequent scholarship based on the same professional landscape shows the value of national workforce evidence for understanding the structure, distribution, and policy relevance of librarianship in the Philippines (Dorado, 2024). The Philippine context makes doctoral planning especially important because professional opportunities, graduate education access, institutional support, and research exposure are likely to vary across regions and employment settings.

The problem addressed here is not simply whether there is “demand” for a PhD in LIS. Demand is too broad a construct for the available cross-sectional evidence. The more defensible problem is how national workforce evidence can identify professional profiles associated with expressed doctoral study interest and how those profiles can be translated into ethically bounded recruitment personas without exceeding the limits of observational data.

1.4 Problem Statement

Despite the strategic importance of doctoral education to LIS workforce sustainability, little evidence exists regarding how structural inequalities, professional capital, and institutional access conditions shape expressed doctoral study interest within national librarian populations. Existing LIS workforce studies remain predominantly descriptive and provide limited insight into how interpretable predictive analytics may support ethically bounded doctoral pipeline planning without reproducing exclusionary educational logics.

This study develops and evaluates an interpretable predictive segmentation framework using national Philippine librarian census data to identify statistically distinguishable professional profiles associated with expressed doctoral study interest. Rather than treating machine learning as an admissions, ranking, or selection mechanism, the study positions predictive segmentation as a bounded institutional planning tool for understanding heterogeneous patterns of doctoral aspiration across professional contexts.

The study further examines whether empirically derived segments can be translated into recruitment-relevant personas while preserving the inferential limits of cross-sectional workforce data. In this framing, personas are not latent psychological types, real individuals, or deterministic categories. They are interpretive communication abstractions derived from empirical segments and intended for non-exclusionary program planning.

1.5 Research Questions and Objectives

The study is guided by four research questions, each aligned with a different inferential level:

  1. What demographic, educational, professional, and institutional characteristics are associated with expressed interest in doctoral LIS study among Filipino librarians?
  2. How accurately can interpretable machine-learning models classify expressed doctoral study interest using workforce and professional profile variables?
  3. What stable workforce profile segments emerge among librarians with elevated predicted probabilities of expressed doctoral study interest?
  4. How can empirically derived workforce segments be translated into ethically bounded recruitment personas for institutional doctoral pipeline planning?

The corresponding objectives are to construct a reproducible R workflow, evaluate interpretable prediction models, identify segmentation patterns associated with expressed doctoral study interest, and translate stable segment patterns into planning-oriented personas with explicit safeguards against overinterpretation.

1.6 Study Scope and Boundaries

This is a cross-sectional predictive segmentation study. It supports classification of expressed interest patterns and strategic interpretation of professional heterogeneity. It does not support causal claims about why librarians pursue doctoral study, individual forecasting of future enrollment, admissions decisions, merit ranking, or exclusionary recruitment.

The study identifies statistically observable patterns associated with expressed doctoral interest but does not directly model the psychological or sociological mechanisms underlying aspiration formation.

The article makes three contributions. First, it demonstrates an interpretable workforce analytics approach for doctoral planning. Second, it develops an ethically bounded approach to educational segmentation that rejects admissions, ranking, and exclusionary uses. Third, it provides evidence that the visibility of doctoral study interest in a national LIS workforce varies systematically with structural inequality and professional opportunity conditions, a pattern consistent with, though not proof of, the view that these conditions shape expressed aspiration.

3 Methodology

3.1 Sequential Predictive Segmentation Design

This study uses a sequential predictive segmentation design: a cross-sectional interpretable predictive modeling stage followed by post-hoc segmentation and bounded persona translation. The analysis is exploratory and strategic rather than causal or confirmatory. Its purpose is to classify patterns of expressed doctoral study interest, identify interpretable professional segments, and translate empirically stable segment patterns into recruitment personas. Consistent with transparent prediction-model reporting principles (G. S. Collins et al., 2015), this section documents data provenance, outcome construction, eligibility rules, preprocessing, model development, validation, and the interpretive use of model outputs.

The study follows a prediction-and-interpretation workflow. The major stages are summarized in Table 1, which functions as the reproducibility map for the rest of the article.

  1. Define the analytic outcome as expressed doctoral study interest.
  2. Construct an eligible analytic sample from the census.
  3. Prepare predictors representing demographic, educational, employment, geographic, professional-development, and professional-affiliation domains.
  4. Train interpretable benchmark and non-linear machine-learning models.
  5. Evaluate discrimination, calibration-relevant summaries, and classification performance.
  6. Use model scores and segmentation to construct bounded strategic personas.
  7. Interpret the personas as planning profiles rather than deterministic labels.

Table 1: Overview of the reproducible methodology.

| Stage | Purpose |
|---|---|
| Data source | Use the openly available Philippine Librarians Census as national workforce evidence. |
| Outcome construction | Identify respondents reporting doctoral study as a future educational plan, interpreted as expressed interest rather than enrollment behavior. |
| Analytic sample | Restrict analysis to respondents with observable outcome status and without current PhD attainment. |
| Preprocessing | Clean missing values, recode outcome, retain substantively relevant planning predictors, and avoid leakage. |
| Prediction | Classify expressed doctoral study interest using interpretable and non-linear classifiers. |
| Segmentation | Group respondents into strategically meaningful workforce profiles. |
| Persona synthesis | Translate statistical patterns into recruitment and program-design personas with explicit anti-stereotyping safeguards. |

As Table 1 shows, the workflow is organized from provenance to interpretation. This order is deliberate: the personas are presented only after the outcome, sample, predictors, preprocessing, prediction, and segmentation logic have been made explicit.

3.2 Data Source and Census Structure

The empirical source is the Philippine Librarians Census (Obille & Dorado, 2022), a national survey conducted by the University of the Philippines School of Library and Information Studies from November 2018 to October 2019 and released openly through Zenodo. The census was designed to establish baseline evidence for LIS workforce planning, professional development, education, and policy. It remains especially valuable for this study because it links professional circumstances with educational aspirations in a single national workforce instrument.

The analysis uses the local RDS file included with this article’s reproducible materials. The file is read directly in R to avoid transcription or spreadsheet-conversion errors. Table 2 records the source file name and confirms the dimensions of the imported object.

Table 2: Imported source data audit.

| Object | Source file | Rows | Columns |
|---|---|---:|---:|
| raw_census | LibrarianCensus.rds | 684 | 107 |

The audit in Table 2 establishes that the analysis begins from the expected source object before any exclusions or transformations are applied. This table is included to make the starting point of the workflow inspectable.

3.3 Reproducibility Repository and Public Rendering

The article is accompanied by a public reproducibility repository at https://github.com/dddorado/lis-phd-workforce-analytics. The repository contains the Quarto manuscript, local RDS data source, bibliography, citation style, generated tables, generated figures, model diagnostic output, conceptual framework image, and rendered HTML and PDF outputs. It is intended to make the full evidentiary chain inspectable: readers can trace the analysis from the imported census object, through the executable R workflow, to the tables and figures discussed in the results.

A public rendered instance of the article is available on RPubs at https://rpubs.com/danddorado/1434018. The RPubs version serves as a stable reading copy, while the GitHub repository functions as the computational archive. The two outputs should be read together: RPubs supports access and review, and GitHub supports rerendering, file inspection, version control, and verification of the analysis artifacts. This reproducibility arrangement does not change the inferential limits of the study; it makes those limits more transparent by exposing the source file, operationalization decisions, generated outputs, and reporting workflow used to produce the manuscript.

3.4 Outcome Operationalization and Construct Validity

The dependent variable is expressed doctoral study interest, operationalized from the survey’s forward-looking education-plan item. Respondents who selected PhD are coded as the positive class, while respondents who selected another observed plan are coded as the comparison class. Respondents without an observed future-education plan are excluded from supervised modeling because their outcome status cannot be determined. Respondents already reporting PhD attainment are also excluded so that the model classifies possible future-oriented interest rather than describing current PhD holders.

This outcome is interpreted as expressed interest, not as application, admission, enrollment, persistence, capability, suitability, or demand for a specific program. The distinction is central to construct validity. A stated plan can inform recruitment strategy, but it cannot by itself establish realized demand. The binary coding is therefore a pragmatic classification device rather than a claim that doctoral interest is naturally dichotomous. Table 3 documents the resulting analytic sample and the observed positive-class rate after applying these rules.

Table 3: Analytic outcome and sample audit.

| Measure | Value |
|---|---:|
| Original census records | 684 |
| Records with observed modeled outcome and no current PhD | 420 |
| Positive outcome records | 135 |
| Observed positive outcome rate | 32.1% |

As shown in Table 3, the supervised analysis is based on the subset of records for which expressed doctoral study interest can be observed. The positive-class rate provides the baseline against which later model performance and segmentation results should be read.
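The outcome coding and eligibility rules can be illustrated with a short sketch. It is written in Python for illustration only (the article's own workflow is in R) and assumes records arrive as dictionaries: `after5years` and the `"Ph.D."` label follow the outcome definition reported in Table 7, while `highest_education` is a hypothetical name for the attainment field and the toy records are invented.

```python
from __future__ import annotations

from typing import Optional

OUTCOME_FIELD = "after5years"           # from the article's outcome definition
ATTAINMENT_FIELD = "highest_education"  # hypothetical name for the attainment field
PHD_LABEL = "Ph.D."


def code_outcome(plan: Optional[str]) -> Optional[int]:
    """1 for a PhD plan, 0 for any other observed plan, None if no plan is observed."""
    if plan is None or plan.strip() == "":
        return None
    return 1 if plan.strip() == PHD_LABEL else 0


def build_analytic_sample(records: list[dict]) -> list[dict]:
    """Keep records with an observed outcome and no current PhD attainment."""
    sample = []
    for rec in records:
        y = code_outcome(rec.get(OUTCOME_FIELD))
        if y is None:
            continue  # outcome status cannot be determined
        if rec.get(ATTAINMENT_FIELD) == PHD_LABEL:
            continue  # current PhD holders fall outside the future-oriented target
        sample.append({**rec, "phd_interest": y})
    return sample


def sample_audit(records: list[dict]) -> dict:
    """Counts of the kind reported in Table 3."""
    sample = build_analytic_sample(records)
    positives = sum(r["phd_interest"] for r in sample)
    return {
        "original_records": len(records),
        "analytic_records": len(sample),
        "positive_records": positives,
        "positive_rate": positives / len(sample) if sample else float("nan"),
    }


# Invented toy records, not census data.
toy = [
    {"after5years": "Ph.D.", "highest_education": "MLIS"},
    {"after5years": "MLIS", "highest_education": "BLIS"},
    {"after5years": None, "highest_education": "BLIS"},
    {"after5years": "Ph.D.", "highest_education": "Ph.D."},
]
print(sample_audit(toy))
# {'original_records': 4, 'analytic_records': 2, 'positive_records': 1, 'positive_rate': 0.5}
```

Of the four toy records, one is dropped for an unobserved plan and one for current PhD attainment, which mirrors the two exclusion rules described above.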

3.5 Predictor Domains

To address RQ1, predictors are organized into four conceptual domains: demographic location in the workforce, educational capital, professional capital, and institutional access conditions. Demographic variables describe social and geographic position. Educational variables represent accumulated formal preparation. Professional variables represent career stage, role, sector, and workplace experience. Institutional access variables represent conditions that may make doctoral study more or less feasible. This domain structure keeps the predictor list from being purely data-driven and ties interpretation to the study's theoretical framework.

3.6 Leakage Control

Predictors are selected to represent workforce characteristics that could plausibly inform doctoral recruitment strategy before a PhD program has direct applicant data. Within the four domains, the candidate predictors cover demographic background, professional experience, educational preparation, current study status, employment setting, institutional context, compensation, region, mobility, continuing professional development, collection or service indicators, and professional affiliation.

Variables that directly encode the outcome, duplicate the outcome, or would be unavailable in a real recruitment-planning scenario are excluded from model training. This leakage-control step is important because the goal is not merely to maximize predictive performance. The goal is to estimate a credible planning model whose predictors could be interpreted ethically and operationally. Table 4 summarizes how many protocol-defined predictors are available in the imported source.

Table 4: Predictor availability audit.

| Measure | Value |
|---|---:|
| Candidate predictors named in protocol | 46 |
| Available predictors | 46 |
| Unavailable predictors | 0 |

The availability audit in Table 4 is a reproducibility checkpoint. It separates the conceptual modeling protocol from the fields actually present in the local source file, making later changes in source structure easier to diagnose.
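The availability audit in Table 4 and the leakage exclusion reduce to simple set comparisons. The Python sketch below is illustrative rather than the article's R code: the protocol list borrows a few field names that appear in Table 5 instead of reproducing the full 46-predictor protocol, and the leakage set is an assumed example.

```python
# A few candidate predictors named in Table 5; the full protocol lists 46.
PROTOCOL_PREDICTORS = [
    "age_group", "position", "tenure", "industry", "location",
    "net_salary", "cpdsatisfaction", "worktravel",
]
# Assumed examples of fields that encode or duplicate the outcome.
LEAKAGE_FIELDS = {"after5years", "phd_interest"}


def availability_audit(protocol, source_columns):
    """Compare protocol-named predictors with the columns actually present."""
    present = set(source_columns)
    unavailable = [p for p in protocol if p not in present]
    return {
        "named": len(protocol),
        "available": len(protocol) - len(unavailable),
        "unavailable": len(unavailable),
        "unavailable_fields": unavailable,
    }


def training_predictors(protocol, source_columns, leakage=LEAKAGE_FIELDS):
    """Available protocol predictors, with any outcome-encoding field removed."""
    present = set(source_columns)
    return [p for p in protocol if p in present and p not in leakage]


# Hypothetical source columns in which "tenure" is absent.
columns = ["after5years", "age_group", "position", "industry", "location",
           "net_salary", "cpdsatisfaction", "worktravel"]
print(availability_audit(PROTOCOL_PREDICTORS, columns)["unavailable_fields"])
# ['tenure']
```

Keeping the leakage filter separate from the availability check means that an outcome-encoding field is dropped even if it is accidentally added to the protocol list.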

3.7 Missing Data and Preprocessing

The preprocessing protocol is designed to preserve substantive meaning while making the data suitable for supervised learning and segmentation. Text fields are trimmed, common non-informative responses are converted to missing values, categorical predictors are treated as nominal unless an ordered interpretation is explicit, and numeric fields are retained as numeric when their meaning is stable. Apparent extreme values in age and years of service are treated conservatively as missing for modeling because they are more likely to reflect data-entry noise than plausible professional histories.
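The cleaning rules above can be sketched as two small functions. This is a hedged Python illustration, not the article's implementation: the set of non-informative tokens, the field names `age` and `years_service`, and the plausibility bounds are all hypothetical, because the article does not report the exact values it uses.

```python
# Assumed non-informative tokens; the article does not enumerate them.
NON_INFORMATIVE = {"", "n/a", "na", "-", "no answer"}

# Hypothetical field names and plausibility bounds.
PLAUSIBLE_RANGES = {"age": (18, 80), "years_service": (0, 60)}


def clean_text(value):
    """Trim whitespace and map non-informative responses to None."""
    if value is None:
        return None
    text = str(value).strip()
    return None if text.lower() in NON_INFORMATIVE else text


def clean_numeric(field, value):
    """Parse a number; return None if missing, unparseable, or implausible."""
    text = clean_text(value)
    if text is None:
        return None
    try:
        number = float(text)
    except ValueError:
        return None
    low, high = PLAUSIBLE_RANGES.get(field, (float("-inf"), float("inf")))
    return number if low <= number <= high else None


print(clean_text("  Academic "), clean_text("N/A"))             # Academic None
print(clean_numeric("age", "42"), clean_numeric("age", "420"))  # 42.0 None
```

Treating implausible values as missing, rather than capping them, lets the downstream imputation step handle them in the same way as genuine nonresponse.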

Missingness is handled inside the modeling workflow rather than by deleting all incomplete records. The working assumption is that missingness is unlikely to be completely random because survey nonresponse may reflect professional context, question relevance, or respondent burden. Numeric predictors are imputed using training-set medians. Categorical predictors are imputed using an explicit missing category. Low-frequency categorical levels may be pooled during modeling to reduce instability. The exact imputation and encoding operations are to be estimated within resampling folds to avoid information leakage from assessment data into training data. Table 5 identifies the candidate predictors with the highest missingness after sample construction, which is useful for interpreting model stability and the limits of persona detail.

Table 5: Highest-missingness candidate predictors after analytic-sample construction.

| Field | Missing (n) | Missing (%) |
|---|---:|---:|
| completing | 199 | 47.4 |
| net_salary_group | 126 | 30.0 |
| gross_salary_group | 85 | 20.2 |
| type | 77 | 18.3 |
| net_salary | 54 | 12.9 |
| gross_salary | 50 | 11.9 |
| location | 22 | 5.2 |
| lvm | 22 | 5.2 |
| worktravel | 20 | 4.8 |
| position | 6 | 1.4 |
| tenure | 4 | 1.0 |
| cpdsatisfaction | 4 | 1.0 |
| industry | 3 | 0.7 |
| age_group | 2 | 0.5 |
| benefits6 | 2 | 0.5 |

The missingness pattern in Table 5 is not treated as a reason to discard the analytic sample. Instead, it informs cautious interpretation: highly incomplete fields may still contribute planning signal, but they should not carry excessive weight in persona narratives.
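The fold-wise imputation logic can be sketched as a fit step and an apply step, so that medians and retained categorical levels are learned only from the rows a model is trained on. This Python sketch assumes rows are dictionaries with `None` for missing values; the level labels `(missing)` and `(other)` and the `min_count` pooling threshold are hypothetical choices, not values reported in the article.

```python
from collections import Counter
from statistics import median

MISSING_LEVEL = "(missing)"  # hypothetical explicit missing category
OTHER_LEVEL = "(other)"      # hypothetical pooled low-frequency category


def fit_preprocessor(train_rows, numeric, categorical, min_count=5):
    """Learn imputation medians and retained levels from training rows only."""
    medians = {}
    for col in numeric:
        observed = [r[col] for r in train_rows if r.get(col) is not None]
        # The 0.0 fallback for an all-missing column is an arbitrary placeholder.
        medians[col] = median(observed) if observed else 0.0
    levels = {}
    for col in categorical:
        counts = Counter(r[col] for r in train_rows if r.get(col) is not None)
        levels[col] = {lvl for lvl, n in counts.items() if n >= min_count}
    return {"medians": medians, "levels": levels}


def apply_preprocessor(rows, state):
    """Impute and pool any rows using a previously fitted state."""
    out = []
    for r in rows:
        new = dict(r)
        for col, m in state["medians"].items():
            if new.get(col) is None:
                new[col] = m
        for col, keep in state["levels"].items():
            value = new.get(col)
            if value is None:
                new[col] = MISSING_LEVEL
            elif value not in keep:
                new[col] = OTHER_LEVEL  # low-frequency or unseen level
        out.append(new)
    return out
```

Within each resampling fold, `fit_preprocessor` would be called on the analysis rows only, and the resulting state applied to both the analysis and assessment rows; this is what keeps assessment data out of the imputation estimates.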

3.8 Modeling and Validation

Two complementary supervised-learning approaches are specified. First, a regularized logistic regression model serves as an interpretable benchmark because it estimates a linear decision surface and can identify stable directional associations after encoding. Second, a random forest model serves as a flexible non-linear classifier because it can capture interactions and threshold effects common in workforce data. Where software availability permits, explainable boosting or SHAP-supported gradient boosting may be added as a sensitivity model, but black-box deep learning is not appropriate for the size, structure, or institutional stakes of this dataset. The benchmark model emphasizes transparency; the non-linear model emphasizes predictive flexibility.

Model development is organized around a stratified 75/25 train-test split, with repeated five-fold cross-validation within the training data. Hyperparameters are selected within the resampling procedure using ROC AUC as the primary tuning criterion and average precision as a secondary check under class imbalance. The holdout test set is reserved for final performance estimation. The primary discrimination metric is ROC AUC. Secondary metrics include average precision, balanced accuracy, sensitivity, specificity, precision, recall, F1, Brier score, and calibration diagnostics. Because this is a planning model, no single threshold is treated as universally correct. Threshold-dependent metrics are reported at a transparent reference threshold and may be varied in sensitivity analysis rather than interpreted as final decision rules.
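The split-and-resample design can be sketched with the standard library alone. The Python sketch below stratifies by class for both the 75/25 holdout and the five-fold resampling; the seed and the number of repeats are hypothetical, since the article specifies repeated five-fold cross-validation without stating how many repeats are used.

```python
import random


def stratified_split(labels, test_fraction=0.25, seed=2026):
    """Return (train_idx, test_idx), preserving the class ratio in each part."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def repeated_stratified_folds(indices, labels, k=5, repeats=3, seed=2026):
    """Yield (analysis_idx, assessment_idx) pairs drawn from the training indices."""
    rng = random.Random(seed)
    for _ in range(repeats):
        folds = [[] for _ in range(k)]
        for cls in sorted({labels[i] for i in indices}):
            members = [i for i in indices if labels[i] == cls]
            rng.shuffle(members)
            for pos, i in enumerate(members):
                folds[pos % k].append(i)  # deal each class across folds
        for f in range(k):
            assessment = sorted(folds[f])
            held_out = set(assessment)
            analysis = [i for i in indices if i not in held_out]
            yield analysis, assessment
```

With 135 positives among 420 records, `stratified_split` places 105 records in the test set, 34 of them positive, so the holdout positive rate stays close to the 32.1% observed overall. The test indices never enter `repeated_stratified_folds`, which keeps the holdout reserved for final estimation.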

The preferred validation architecture includes calibration assessment and subgroup performance checks. Calibration matters because institutional planning depends on whether estimated probabilities are numerically meaningful, not only whether cases are ranked correctly. Subgroup performance checks are required because educational access and professional opportunity can be stratified by gender, region, sector, and institutional context. These checks are interpreted as bias diagnostics rather than as claims of model fairness.
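Calibration can be summarized by the Brier score and by comparing the mean predicted probability with the observed positive rate within probability bins. The Python sketch below uses equal-width bins, an assumption of this sketch because the article does not specify its binning scheme; with roughly a hundred holdout cases, some bins will be sparse and should be read cautiously.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def reliability_table(y_true, y_prob, n_bins=5):
    """Compare mean predicted probability with the observed rate in equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        b = min(int(p * n_bins), n_bins - 1)  # a probability of exactly 1.0 joins the top bin
        bins[b].append((y, p))
    table = []
    for b, members in enumerate(bins):
        if not members:
            continue  # empty bins are omitted
        table.append({
            "bin": (b / n_bins, (b + 1) / n_bins),
            "n": len(members),
            "mean_predicted": sum(p for _, p in members) / len(members),
            "observed_rate": sum(y for y, _ in members) / len(members),
        })
    return table
```

A well-calibrated model shows `mean_predicted` close to `observed_rate` in each populated bin; a model can rank cases well (high ROC AUC) and still fail this check.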

3.9 Calibration and Fairness Audit

Because doctoral access is socially and institutionally stratified, predictive segmentation can reproduce existing inequalities if used carelessly. The study therefore treats bias assessment as part of interpretation rather than as an optional technical appendix. At minimum, subgroup diagnostics should examine model performance across gender, region, institutional sector, and employment setting. Relevant summaries include subgroup recall, false-positive rate, false-negative rate, subgroup calibration, demographic parity of high-score classification, and the distribution of predicted probabilities across groups.

These diagnostics do not make the model fair by themselves. They identify where model outputs may be less reliable, where outreach could become exclusionary, and where program planners should avoid overinterpreting modeled likelihood. The ethical standard is non-exclusionary deployment: model outputs may inform where additional advising, bridge support, or recruitment attention is needed, but they must not be used to restrict opportunity.
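The subgroup summaries listed above can be computed from group labels, observed outcomes, and predicted probabilities. This is an illustrative Python sketch: the reference threshold of 0.5 is a placeholder (the article does not report its threshold), the high-score rate serves as the demographic-parity summary, and subgroup calibration would require applying a calibration summary within each group, which is not shown here.

```python
from collections import defaultdict


def subgroup_diagnostics(groups, y_true, y_prob, threshold=0.5):
    """Per-group error rates, high-score rate, and mean predicted probability."""
    cells = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0, "probs": []})
    for g, y, p in zip(groups, y_true, y_prob):
        c = cells[g]
        c["probs"].append(p)
        flagged = p >= threshold
        if y == 1:
            c["tp" if flagged else "fn"] += 1
        else:
            c["fp" if flagged else "tn"] += 1
    report = {}
    for g, c in cells.items():
        pos, neg, n = c["tp"] + c["fn"], c["fp"] + c["tn"], len(c["probs"])
        report[g] = {
            "n": n,
            # None marks a rate that is undefined because the group lacks that class.
            "recall": c["tp"] / pos if pos else None,
            "false_negative_rate": c["fn"] / pos if pos else None,
            "false_positive_rate": c["fp"] / neg if neg else None,
            "high_score_rate": (c["tp"] + c["fp"]) / n,  # compared across groups for parity
            "mean_predicted": sum(c["probs"]) / n,
        }
    return report
```

Large gaps in false-negative rate between groups would signal where high-score outreach lists could systematically miss interested respondents, which is the exclusion risk described above.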

3.10 Segmentation Stability Assessment

The persona component requires segmentation in addition to prediction. Segmentation is conducted on planning-relevant workforce features rather than on the outcome alone. This ensures that personas represent recognizable professional profiles, not merely high- and low-probability bins. The segmentation procedure standardizes numeric features, encodes categorical features, and compares candidate solutions for interpretability, separation, and strategic usefulness.
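The standardization and encoding step can be sketched briefly. This minimal Python sketch assumes rows have already been imputed so that no values are missing; z-scoring with the population standard deviation and full one-hot coding are illustrative choices, not the article's documented settings.

```python
from statistics import mean, pstdev


def encode_for_segmentation(rows, numeric, categorical):
    """Z-score numeric columns and one-hot encode categorical columns.

    Assumes every value is present (rows already imputed).
    """
    scale = {}
    for col in numeric:
        values = [r[col] for r in rows]
        sd = pstdev(values)
        scale[col] = (mean(values), sd if sd > 0 else 1.0)  # guard constant columns
    levels = {col: sorted({r[col] for r in rows}) for col in categorical}
    names = list(numeric) + [f"{col}={lvl}" for col in categorical for lvl in levels[col]]
    matrix = []
    for r in rows:
        vec = [(r[col] - scale[col][0]) / scale[col][1] for col in numeric]
        for col in categorical:
            vec.extend(1.0 if r[col] == lvl else 0.0 for lvl in levels[col])
        matrix.append(vec)
    return names, matrix
```

One-hot columns take values of 0 or 1 while z-scores are unbounded, so the relative weighting of numeric and categorical features is itself a modeling decision that deserves a sensitivity check.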

The preferred solution is not selected mechanically by a single index. It must satisfy four criteria: statistical adequacy, stability, interpretability to LIS decision-makers, and usefulness for differentiated recruitment or program design. Latent class analysis or Gaussian mixture modeling would be preferred when assumptions and software support permit probabilistic class membership. K-means can be used only as a pragmatic exploratory method when variables are carefully encoded and standardized, and when the solution is interpreted cautiously.

Cluster validation should include silhouette or comparable separation diagnostics, sensitivity to the number of segments, sensitivity to initialization, entropy or posterior classification uncertainty when probabilistic methods are used, and bootstrap or resampling-based stability checks. Each resulting segment is profiled by its size, observed expressed-interest rate, central tendencies, dominant professional characteristics, uncertainty indicators, and modeled likelihood distribution. Segments are retained only if they are reproducible enough to support interpretation and distinct enough to justify differentiated planning.
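One way to operationalize resampling-based stability is to rerun the segmentation on bootstrap samples and measure how well each resulting partition agrees with the original one on the shared cases. The adjusted Rand index (ARI) is a common agreement measure for this purpose; the article does not name a specific stability statistic, so the Python sketch below is one assumed option.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same n >= 2 cases."""
    n = len(labels_a)
    pairs_joint = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:  # degenerate case, e.g. both partitions put everyone together
        return 1.0
    return (pairs_joint - expected) / (maximum - expected)
```

Because the index compares pairs of cases rather than label names, it is unaffected by segments being renumbered between runs. Values near 1 indicate that the same respondents keep being grouped together; values near 0 indicate agreement no better than chance.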

3.11 Persona Translation Protocol

Personas are constructed after prediction and segmentation, using four evidence streams: observed expressed interest by segment, modeled likelihood by segment, dominant professional characteristics, and strategic recruitment implications. Each persona includes a descriptive label, profile markers, likely program-design needs, recruitment implications, uncertainty qualifiers, and cautions. Labels are intentionally interpretive and should be read as planning shorthand rather than as essential characteristics of individuals.

Persona construction follows six explicit translation rules:

  1. Every persona must be traceable to a segment or empirically observed profile.
  2. Persona narratives must distinguish expressed interest from actual enrollment behavior.
  3. No persona should be used to exclude individuals from outreach, advising, or opportunity.
  4. Every persona must include the statement: “This profile represents a statistical tendency rather than an individual prediction.”
  5. Persona labels must avoid claims about ability, merit, motivation, or psychological type unless directly supported by evidence.
  6. Personas should be reviewed by LIS educators or stakeholders before operational use to assess resonance, stereotype risk, and practical relevance.

The proper use of personas is to broaden and sharpen recruitment design, not to narrow access. Table 6 specifies the required components of each persona so that the narrative profiles remain auditable.

Table 6: Persona construction template.

| Element | Description |
|---|---|
| Persona label | Short strategic name for the segment. |
| Empirical basis | Segment membership, observed outcome rate, and modeled likelihood distribution. |
| Profile markers | Dominant professional and educational characteristics. |
| Recruitment implication | Message, channel, or advising strategy suggested by the profile. |
| Program-design implication | Scheduling, bridge, supervision, funding, or curriculum implication. |
| Uncertainty qualifier | Explicit reminder that the persona is a statistical tendency rather than an individual prediction. |
| Caution | Boundary condition preventing overinterpretation or exclusionary use. |

As Table 6 indicates, each persona must include both strategic implications and a caution statement. This keeps the personas useful for recruitment design while reducing the risk that they will be mistaken for fixed identities or eligibility categories.
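The template in Table 6 lends itself to a structured record that can be checked before a persona is circulated. The Python sketch below is a hypothetical encoding of that template, simplified to a single observed rate in place of the full empirical basis; it flags empty fields and a missing mandatory uncertainty statement, but it cannot check narrative quality or stereotype risk, which remain matters for stakeholder review.

```python
from dataclasses import dataclass, fields

# Mandatory statement quoted from the persona translation rules.
REQUIRED_STATEMENT = (
    "This profile represents a statistical tendency "
    "rather than an individual prediction."
)


@dataclass
class Persona:
    label: str
    segment_id: str                  # traceability to an empirical segment
    observed_rate: float             # observed expressed-interest rate in the segment
    profile_markers: list
    recruitment_implication: str
    program_design_implication: str
    uncertainty_qualifier: str
    caution: str

    def problems(self) -> list:
        """Return template violations; an empty list means these checks passed."""
        issues = [
            f.name
            for f in fields(self)
            if isinstance(getattr(self, f.name), str)
            and not getattr(self, f.name).strip()
        ]
        if not self.profile_markers:
            issues.append("profile_markers")
        if REQUIRED_STATEMENT not in self.uncertainty_qualifier:
            issues.append("required statement missing from uncertainty_qualifier")
        if not 0.0 <= self.observed_rate <= 1.0:
            issues.append("observed_rate out of range")
        return issues
```

Requiring a `segment_id` enforces the traceability rule mechanically, while the remaining rules still depend on human judgment.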

3.12 Ethical Governance

The analysis uses openly available secondary data and reports only aggregate patterns. Its purpose is strategic program planning, not individual-level selection, eligibility screening, or automated decision-making. Even when a model assigns high or low likelihood to a profile, that estimate should not be interpreted as an individual’s capacity for doctoral study. Doctoral participation is shaped by institutional support, financing, mentoring, scheduling, research opportunity, and personal circumstances that may not be fully captured in workforce data.

The ethical stance of the study is therefore opportunity-oriented. Machine learning is used to identify where program outreach, bridge advising, and support structures may be most needed. Personas are treated as tools for institutional empathy and planning discipline, not as fixed categories of professional worth.

4 Results

4.1 Analytic Sample and Outcome Distribution

The first result is the construction of the analytic sample used for classification and persona development. Table 7 summarizes the source file, the retained modeling sample, and the observed expressed-interest rate. The table establishes the empirical baseline for the remaining results: the analysis is not estimating universal demand, but modeling the subset of respondents for whom future educational intention can be observed and interpreted.

Table 7: Analytic sample and outcome summary.

| Measure | Value |
|---|---|
| Rows in original RDS | 684 |
| Variables in original RDS | 107 |
| Modeling sample after target observed and current PhD excluded | 420 |
| PhD-intending respondents in modeling sample | 135 |
| Observed PhD-interest rate | 32.1% |
| Outcome definition | `after5years == "Ph.D."` |

As shown in Table 7, roughly one-third of the analytic sample (135 of 420 respondents, 32.1%) indicated expressed doctoral study interest. This rate is large enough to justify classification as an exploratory planning exercise. The overall rate does not by itself show how interest is distributed, however; the group-level differences reported in Section 4.2 indicate that it is uneven across professional groups, so recruitment strategy should be segmented rather than generic.

Figure 2 visualizes the same outcome distribution. The figure is useful because it makes the class balance visible before model performance is interpreted. A moderately imbalanced outcome requires attention to metrics beyond accuracy, especially recall, precision, balanced accuracy, calibration, and area-under-curve measures.

Figure 2: Distribution of the modeled doctoral-aspiration outcome.

The distribution in Figure 2 supports the use of both threshold-independent and threshold-dependent metrics. Because the positive class represents a strategic recruitment audience rather than a rare adverse event, the analysis prioritizes ranking, segmentation, and interpretation over a single fixed classification threshold.

4.2 Expressed Doctoral Study Interest Across Professional Groups

Before interpreting model output, it is important to inspect observed expressed-interest rates across major professional groups. Table 8 reports the highest observed group rates among categories with sufficient representation. These descriptive patterns are not causal estimates, but they identify where expressed doctoral study interest appears most concentrated.

Table 8: Highest observed doctoral-aspiration rates by professional group.

| Variable | Category | N | PhD-intending N | PhD-interest rate |
|---|---|---:|---:|---:|
| Age group | 51-60 | 14 | 9 | 0.643 |
| Position level | Management | 11 | 7 | 0.636 |
| Highest/current education | MLIS | 140 | 85 | 0.607 |
| Highest/current education | Master degree | 23 | 12 | 0.522 |
| Currently enrolled | Yes | 166 | 77 | 0.464 |
| Island group | Mindanao | 54 | 23 | 0.426 |
| Age group | 41-50 | 70 | 29 | 0.414 |
| Library type | Academic | 197 | 80 | 0.406 |
| Position level | Supervisory | 127 | 50 | 0.394 |
| Institution sector | Government | 196 | 76 | 0.388 |
| Island group | Luzon | 138 | 51 | 0.370 |
| Age group | 31-40 | 126 | 43 | 0.341 |

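The pattern in Table 8 suggests that expressed interest is associated with professional capital and institutional location, and that the groups with the highest rates are not necessarily the largest. Respondents aged 51-60 (9 of 14) and management-level respondents (7 of 11) show the highest rates, but these rates rest on very small cells and are correspondingly unstable, whereas larger groups such as MLIS holders (85 of 140) and academic librarians (80 of 197) combine moderately high rates with substantial numbers. This is why persona construction is preferable to broad recruitment messaging: high-interest groups can be small, while larger groups may require different forms of preparation and support.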
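The group rates in Table 8 are simple proportions filtered by a minimum group size. The Python sketch below reproduces that logic and adds a Wilson score interval to show how uncertain small-cell rates are. The `min_n=10` cutoff is hypothetical (the smallest cell shown in Table 8 has 11 respondents), and the interval is an addition of this sketch rather than something the article reports.

```python
from collections import defaultdict
from math import sqrt


def group_rates(rows, variable, outcome="phd_interest", min_n=10):
    """Observed expressed-interest rate by category, dropping sparse categories."""
    counts = defaultdict(lambda: [0, 0])  # category -> [n, positives]
    for r in rows:
        level = r.get(variable)
        if level is None:
            continue  # missing category values are not ranked
        counts[level][0] += 1
        counts[level][1] += r[outcome]
    table = [
        {"variable": variable, "category": lvl, "n": n, "positives": k, "rate": k / n}
        for lvl, (n, k) in counts.items()
        if n >= min_n
    ]
    return sorted(table, key=lambda row: row["rate"], reverse=True)


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for k positives out of n (about 95% when z = 1.96)."""
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    margin = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - margin, center + margin
```

For the 7 of 11 management-level respondents, the 95% Wilson interval runs from roughly 0.35 to 0.85, which illustrates why rates from the smallest cells should not anchor persona narratives on their own.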

The group-level plots in Figure 3, Figure 4, Figure 5, and Figure 6 provide complementary visual checks. Together, they show that expressed interest varies across educational preparation, institutional sector, library type, and position level, reinforcing the need for differentiated recruitment and advising.

Figure 3: Observed doctoral-aspiration rate by educational background.

Figure 3 indicates that educational background is one of the clearest descriptive separators of expressed doctoral study interest. This does not imply that respondents with less graduate preparation should be excluded from outreach; rather, it suggests that bridge pathways and advising may be especially important for audiences earlier in the graduate-study pipeline.

Figure 4: Observed doctoral-aspiration rate by institutional sector.

Figure 4 shows that institutional context matters for interpreting demand. Sectoral differences may reflect career incentives, promotion structures, research expectations, and access to graduate-study support.

Figure 5: Observed doctoral-aspiration rate by library type.

Figure 5 adds a service-context dimension to the results. Library type helps distinguish audiences whose doctoral motivations may differ, such as research leadership, academic service development, professional advancement, or specialized institutional capacity-building.

Figure 6: Observed doctoral-aspiration rate by position level.

Figure 6 suggests that career stage is also relevant. Doctoral recruitment cannot be reduced to early-career interest alone; it must also account for mid-career and senior professionals who may view doctoral study as a route to leadership, research authority, or academic mobility.

4.3 Predictive Model Performance

The supervised models were evaluated through repeated cross-validation, with regularized logistic regression serving as an interpretable benchmark and random forest serving as a flexible non-linear classifier. Table 9 reports the cross-validated performance metrics for both models.

Table 9: Cross-validated predictive performance by model.
| Model | ROC AUC | Average precision | Balanced accuracy | F1 | Recall | Precision |
|---|---|---|---|---|---|---|
| Regularized logistic regression | 0.793 ± 0.026 | 0.601 ± 0.029 | 0.711 ± 0.039 | 0.608 ± 0.051 | 0.674 ± 0.100 | 0.558 ± 0.029 |
| Random forest | 0.816 ± 0.016 | 0.622 ± 0.066 | 0.700 ± 0.028 | 0.594 ± 0.040 | 0.659 ± 0.103 | 0.548 ± 0.022 |

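As Table 9 shows, both models perform well above chance: ROC AUC is about 0.8 against a chance value of 0.5, and average precision is about 0.6 against a chance baseline equal to the outcome prevalence (about 0.32, or 135 of the 420 scored respondents). Expressed doctoral study interest is therefore patterned rather than random with respect to the available workforce evidence. The random forest has slightly higher ROC AUC and average precision, whereas the logistic model has slightly higher balanced accuracy, F1, recall, and precision; because these differences fall within the cross-validation spread, neither model clearly dominates, and the logistic model remains useful as a transparent benchmark. The appropriate interpretation is not that the model can determine who will enroll, but that the data contain enough signal to support strategic segmentation.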
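
The evaluation design can be sketched with scikit-learn's `RepeatedStratifiedKFold` and `cross_validate`. This is an illustration on synthetic data, not the study's pipeline: the number of folds and repeats, the regularization strength, the forest settings, and the use of class weighting are all assumptions, since Table 9 does not report them. The final line prints the chance baseline for average precision, which equals the share of positive cases rather than 0.5.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for encoded census features: 420 cases, roughly 32% positive.
X, y = make_classification(n_samples=420, n_features=12, n_informative=5,
                           weights=[0.68], random_state=0)

models = {
    # L2-penalized logistic regression on standardized features (assumed settings).
    "Regularized logistic regression": make_pipeline(
        StandardScaler(),
        LogisticRegression(C=1.0, class_weight="balanced", max_iter=1000)),
    # Hypothetical forest settings; the article does not report its tuning.
    "Random forest": RandomForestClassifier(
        n_estimators=200, min_samples_leaf=5, class_weight="balanced", random_state=0),
}
scoring = ["roc_auc", "average_precision", "balanced_accuracy", "f1", "recall", "precision"]
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    # Mean and SD of each metric across all folds and repeats.
    results[name] = {m: (scores[f"test_{m}"].mean(), scores[f"test_{m}"].std())
                     for m in scoring}
    print(name, {m: f"{mu:.3f} +/- {sd:.3f}" for m, (mu, sd) in results[name].items()})

# Chance baselines: 0.5 for ROC AUC; the positive-class prevalence for average precision.
print("Average-precision baseline (prevalence):", round(float(y.mean()), 3))
```

Fold-level scores from repeated cross-validation share training data, so the reported spread describes variability across splits rather than a formal standard error.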

4.4 Calibration and Bias Diagnostics

Discrimination metrics show whether the model ranks cases effectively, but planning also requires attention to calibration and subgroup reliability. Table 10 reports decile-level calibration of the modeled interest scores. The table compares the mean predicted score with the observed expressed-interest rate within score bands.

Table 10: Calibration of modeled doctoral-interest scores by decile.
| Score band | N | Mean predicted | Observed rate | Calibration gap |
|---|---|---|---|---|
| [0.078,0.152] | 42 | 0.125 | 0.000 | 0.125 |
| (0.152,0.2] | 42 | 0.174 | 0.024 | 0.150 |
| (0.2,0.251] | 42 | 0.220 | 0.000 | 0.220 |
| (0.251,0.375] | 42 | 0.320 | 0.000 | 0.320 |
| (0.375,0.452] | 42 | 0.408 | 0.048 | 0.360 |
| (0.452,0.524] | 42 | 0.485 | 0.333 | 0.152 |
| (0.524,0.584] | 42 | 0.554 | 0.548 | 0.007 |
| (0.584,0.642] | 42 | 0.613 | 0.452 | 0.161 |
| (0.642,0.705] | 42 | 0.669 | 0.810 | -0.140 |
| (0.705,0.785] | 42 | 0.737 | 1.000 | -0.263 |

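Table 10 should be interpreted as a planning diagnostic rather than as proof of deployable probability accuracy. The gaps are large: in the five lowest bands, mean predicted scores of 0.125 to 0.408 correspond to observed rates of 0.048 or less, while in the two highest bands the observed rates (0.810 and 1.000) exceed the predicted scores by 0.14 and 0.26. Averaged across bands, predicted scores (about 0.43) are also higher than the observed rate (about 0.32). Because observed rates largely rise across the bands (with one reversal between the seventh and eighth), the scores order cases reasonably well but compress and shift the underlying rates. They should therefore be used mainly for ranking and segmentation, not for precise enrollment forecasting.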
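
A decile calibration table like Table 10 can be computed by cutting scores into equal-count bands and comparing the mean score with the observed rate in each band. The sketch below uses pandas `qcut` on synthetic data; the function name and data are hypothetical, and the gap follows the same sign convention as Table 10 (mean predicted minus observed).

```python
import numpy as np
import pandas as pd

def decile_calibration(y_true, y_score, n_bins=10):
    """Mean predicted score versus observed rate within equal-count score bands."""
    df = pd.DataFrame({"y": np.asarray(y_true), "score": np.asarray(y_score)})
    # Equal-frequency bands; duplicates="drop" merges bands if edges tie.
    df["band"] = pd.qcut(df["score"], q=n_bins, duplicates="drop")
    table = df.groupby("band", observed=True).agg(
        n=("y", "count"),
        mean_predicted=("score", "mean"),
        observed_rate=("y", "mean"),
    )
    # Positive gap: scores overstate the observed rate; negative: they understate it.
    table["calibration_gap"] = table["mean_predicted"] - table["observed_rate"]
    return table.round(3)

# Synthetic scores and outcomes for illustration only.
rng = np.random.default_rng(0)
scores = rng.uniform(0.05, 0.80, size=420)
true_prob = np.clip(2.0 * (scores - 0.35) + 0.3, 0.0, 1.0)   # steeper than the scores
y = rng.binomial(1, true_prob)
cal = decile_calibration(y, scores)
print(cal)
```

Passing out-of-fold scores, for example from scikit-learn's `cross_val_predict(..., method="predict_proba")`, avoids judging calibration on the same cases used to fit the model. scikit-learn's `calibration_curve(..., strategy="quantile")` returns the same two per-band quantities without the counts.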

Bias diagnostics examine whether model performance varies across major subgroups. For selected grouping variables, Table 11 summarizes each subgroup's sample size, observed expressed-interest rate, mean modeled score, calibration gap, high-score rate, and classification error metrics. These metrics are not a complete fairness audit, but they identify where model interpretation may require additional caution.

Table 11: Subgroup performance diagnostics for modeled doctoral-interest scores.
| Variable | Group | N | Observed rate | Mean score | Calibration gap | High-score rate | Precision | Recall | False-positive rate | False-negative rate |
|---|---|---|---|---|---|---|---|---|---|---|
| Gender | Female | 317 | 0.312 | 0.428 | 0.115 | 0.432 | 0.672 | 0.929 | 0.206 | 0.071 |
| Gender | Male | 78 | 0.385 | 0.453 | 0.068 | 0.462 | 0.722 | 0.867 | 0.208 | 0.133 |
| Gender | With diverse SOGIE | 18 | 0.278 | 0.411 | 0.134 | 0.389 | 0.714 | 1.000 | 0.154 | 0.000 |
| Institution sector | Government | 196 | 0.388 | 0.484 | 0.096 | 0.500 | 0.745 | 0.961 | 0.208 | 0.039 |
| Institution sector | Private | 192 | 0.266 | 0.384 | 0.118 | 0.370 | 0.620 | 0.863 | 0.191 | 0.137 |
| Institution sector | NGO | 31 | 0.258 | 0.394 | 0.136 | 0.419 | 0.538 | 0.875 | 0.261 | 0.125 |
| Island group | NCR | 165 | 0.279 | 0.398 | 0.120 | 0.370 | 0.672 | 0.891 | 0.168 | 0.109 |
| Island group | Luzon | 138 | 0.370 | 0.455 | 0.086 | 0.493 | 0.706 | 0.941 | 0.230 | 0.059 |
| Island group | Mindanao | 54 | 0.426 | 0.498 | 0.072 | 0.574 | 0.677 | 0.913 | 0.323 | 0.087 |
| Island group | Visayas | 41 | 0.293 | 0.429 | 0.136 | 0.366 | 0.733 | 0.917 | 0.138 | 0.083 |
| Island group | Missing | 22 | 0.136 | 0.355 | 0.219 | 0.318 | 0.429 | 1.000 | 0.211 | 0.000 |
| Library type | Academic | 197 | 0.406 | 0.487 | 0.081 | 0.563 | 0.676 | 0.938 | 0.308 | 0.062 |
| Library type | School | 89 | 0.213 | 0.339 | 0.125 | 0.258 | 0.696 | 0.842 | 0.100 | 0.158 |
| Library type | Missing | 77 | 0.325 | 0.430 | 0.106 | 0.416 | 0.719 | 0.920 | 0.173 | 0.080 |
| Library type | Special | 41 | 0.171 | 0.345 | 0.174 | 0.195 | 0.750 | 0.857 | 0.059 | 0.143 |
| Library type | Public | 16 | 0.250 | 0.473 | 0.223 | 0.500 | 0.500 | 1.000 | 0.333 | 0.000 |

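Table 11 is included to prevent a common overclaim: a model with acceptable overall performance may still be less reliable for specific groups. Several patterns stand out. Mean modeled scores exceed observed rates in every subgroup, with the largest calibration gaps for public libraries (0.223), respondents with missing island-group data (0.219), and special libraries (0.174). False-positive rates range from 0.059 for special libraries to 0.333 for public libraries, and false-negative rates are higher for school (0.158), special (0.143), private-institution (0.137), and male (0.133) respondents than for most other groups. Several subgroups are small (public libraries, N = 16; diverse SOGIE, N = 18; missing island group, N = 22), so their rates rest on few cases, and their recall values of 1.000 should not be read as evidence of superior performance. These descriptive diagnostics support the article's non-deployment boundary. The model can inform institutional planning, but any future operational use would require more formal fairness evaluation, stakeholder review, and external validation.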
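
The subgroup metrics in Table 11 follow from a confusion matrix computed within each group once scores are split into high and low classifications. A minimal sketch, assuming a 0.5 threshold (the article does not report the threshold used) and hypothetical argument names; the toy example also illustrates how a group with a single positive case can only produce a recall of 0 or 1.

```python
import numpy as np
import pandas as pd

def subgroup_diagnostics(y_true, y_score, groups, threshold=0.5):
    """Descriptive per-group diagnostics for a thresholded score (not a fairness audit)."""
    df = pd.DataFrame({"y": np.asarray(y_true), "s": np.asarray(y_score),
                       "g": np.asarray(groups)})
    df["pred"] = (df["s"] >= threshold).astype(int)   # 1 = "high score" (assumed cutoff)
    rows = []
    for g, d in df.groupby("g"):
        tp = int(((d["pred"] == 1) & (d["y"] == 1)).sum())
        fp = int(((d["pred"] == 1) & (d["y"] == 0)).sum())
        fn = int(((d["pred"] == 0) & (d["y"] == 1)).sum())
        tn = int(((d["pred"] == 0) & (d["y"] == 0)).sum())
        rows.append({
            "group": g,
            "n": len(d),
            "observed_rate": d["y"].mean(),
            "mean_score": d["s"].mean(),
            "calibration_gap": d["s"].mean() - d["y"].mean(),
            "high_score_rate": d["pred"].mean(),
            # NaN when a denominator is zero, e.g. a small group with no positives.
            "precision": tp / (tp + fp) if tp + fp else np.nan,
            "recall": tp / (tp + fn) if tp + fn else np.nan,
            "false_positive_rate": fp / (fp + tn) if fp + tn else np.nan,
            "false_negative_rate": fn / (tp + fn) if tp + fn else np.nan,
        })
    return pd.DataFrame(rows).round(3)

# Toy example with two hypothetical groups.
diag = subgroup_diagnostics(
    y_true=[1, 1, 0, 0, 1, 0, 0, 0],
    y_score=[0.9, 0.4, 0.6, 0.2, 0.7, 0.3, 0.1, 0.55],
    groups=["A", "A", "A", "A", "B", "B", "B", "B"],
)
print(diag)
```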

4.5 Model Interpretation and Feature Importance

Model interpretation focuses on global feature importance rather than individual-level prediction. Table 12 lists the top-ranked predictors from the random forest model, while Figure 7 visualizes the same ranking. These outputs are used to identify the broad dimensions most relevant to planning, not to assign deterministic importance to any one person’s profile.

Table 12: Top model feature-importance rankings.
| Variable | Importance | Relative importance |
|---|---|---|
| Highest/current education | 0.274 | 0.274 |
| Currently enrolled | 0.119 | 0.119 |
| Age | 0.066 | 0.066 |
| Years in service | 0.056 | 0.056 |
| Gross salary bracket/value | 0.051 | 0.051 |
| Library type | 0.041 | 0.041 |
| Net salary bracket/value | 0.033 | 0.033 |
| Theses | 0.030 | 0.030 |
| Island group | 0.027 | 0.027 |
| CPD sufficiency | 0.024 | 0.024 |
| Willing to travel for work/study | 0.024 | 0.024 |
| Position level | 0.023 | 0.023 |

Table 12 shows that the strongest signals are not isolated demographic traits but a cluster of educational, career-stage, institutional, and professional-development indicators. This pattern supports the article’s persona strategy: expressed-interest profiles are best understood as configurations of professional capital and context rather than as single-variable categories.

Figure 7: Random forest feature-importance ranking.

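The visual ranking in Figure 7 confirms the same interpretation in a more scannable form. The ranking is steep: highest/current education (0.274) has more than twice the importance of the next predictor, current enrollment (0.119), and no other predictor exceeds 0.07. A small number of planning dimensions therefore carry much of the model's global signal, while the features ranked sixth to twelfth, whose values lie close together (0.023 to 0.041), should be interpreted more cautiously.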
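
The article does not state whether Table 12 reports impurity-based or permutation importance. scikit-learn documents that impurity-based `feature_importances_` can be misleading for features with many distinct values, such as age, years in service, or salary, so a permutation check on held-out data is a common complement. The sketch below compares the two on synthetic data; it is a swapped-in cross-check, not the study's procedure, and the feature names are placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: with shuffle=False, columns 0-2 are informative and 3-7 are noise.
X, y = make_classification(n_samples=420, n_features=8, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=1)
names = [f"feature_{i}" for i in range(X.shape[1])]   # placeholder names
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=1)

rf = RandomForestClassifier(n_estimators=300, random_state=1).fit(X_tr, y_tr)
impurity = rf.feature_importances_            # mean decrease in impurity, sums to 1
perm = permutation_importance(rf, X_te, y_te, scoring="roc_auc",
                              n_repeats=20, random_state=1)  # drop in held-out AUC

# Rank features by permutation importance and show both measures side by side.
order = np.argsort(perm.importances_mean)[::-1]
for i in order:
    print(f"{names[i]:<10} impurity={impurity[i]:.3f}  "
          f"permutation={perm.importances_mean[i]:.3f} +/- {perm.importances_std[i]:.3f}")
```

Where the two rankings disagree for continuous predictors, the permutation result on held-out data is usually the more conservative reading.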

4.6 Market Segments

Segmentation translates model-relevant patterns into strategic audience groups. Table 13 reports the four market segments, their sizes, observed expressed-interest counts, rates, and dominant profile markers. Figure 8 provides a visual summary of these segment differences.

Table 13: Market segments for PhD in LIS recruitment planning.
| Segment | N | PhD-intending N | PhD-interest rate | Median age | Median years of service | Top education | Top sector | Top library type | Top position | Top island group | Currently enrolled share | Travel-willing share |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Advanced-degree growth seekers | 19 | 10 | 0.526 | 37 | 14.0 | MLIS | Private | Academic | Supervisory | NCR | 0.474 | 0.526 |
| Mid-career government career consolidators | 103 | 46 | 0.447 | 43 | 17.0 | MLIS | Government | Academic | Supervisory | Luzon | 0.214 | 0.323 |
| Younger government academic aspirants | 140 | 59 | 0.421 | 30 | 6.5 | BLIS | Government | Academic | Senior level | NCR | 0.679 | 0.415 |
| Early-career credential builders | 158 | 20 | 0.127 | 24 | 2.0 | BLIS | Private | Academic | Junior level | NCR | 0.255 | 0.327 |

The segments in Table 13 show a strategic tension common in doctoral pipeline planning: the smallest segment (N = 19) has the highest observed expressed-interest rate, while larger segments may represent broader but more developmentally varied recruitment audiences. In absolute terms, the two government segments together contain 105 of the 135 librarians who expressed interest, compared with 10 in the smallest segment. This finding argues for a portfolio strategy rather than a single recruitment campaign.
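
Once segment labels exist, the profile columns in Table 13 (size, interest count and rate, medians, modal categories, and shares) reduce to grouped summaries. A minimal pandas sketch with hypothetical column names; it does not reproduce how the segments themselves were formed.

```python
import pandas as pd

def profile_segments(df, segment, outcome, numeric, categorical, shares):
    """Table 13-style profile: size, interest rate, medians, modal categories, shares."""
    g = df.groupby(segment)
    out = pd.DataFrame({"n": g.size(), "n_interested": g[outcome].sum()})
    out["interest_rate"] = out["n_interested"] / out["n"]
    for col in numeric:
        out[f"median_{col}"] = g[col].median()
    for col in categorical:
        # Most frequent category; Series.mode() sorts ties, so the first sorted value wins.
        out[f"top_{col}"] = g[col].agg(
            lambda s: s.mode().iloc[0] if s.notna().any() else pd.NA)
    for col in shares:
        out[f"{col}_share"] = g[col].mean()   # share of 1s in a 0/1 indicator
    return out.sort_values("interest_rate", ascending=False).round(3)

# Toy data with hypothetical column names (not census records).
toy = pd.DataFrame({
    "segment": ["S1"] * 3 + ["S2"] * 5,
    "phd_interest": [1, 1, 0, 0, 0, 1, 0, 0],
    "age": [37, 41, 35, 24, 26, 23, 25, 22],
    "education": ["MLIS", "MLIS", "BLIS", "BLIS", "BLIS", "BLIS", "MLIS", "BLIS"],
    "enrolled": [1, 0, 1, 0, 1, 0, 0, 0],
})
prof = profile_segments(toy, "segment", "phd_interest",
                        numeric=["age"], categorical=["education"], shares=["enrolled"])
print(prof)
```

Modal categories hide how mixed a segment is, so reporting the share of the top category alongside it is a cheap way to flag weakly defined profile markers.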

Figure 8: Observed doctoral aspiration and profile differences by market segment.

Figure 8 reinforces the segment-level interpretation. The visual separation among segments suggests that recruitment audiences differ not only in estimated interest but also in professional profile, implying different messages, supports, and program pathways.

Because segment interpretation is vulnerable to overstatement, Table 14 reports descriptive separation diagnostics using the scored sample: segment size, share of the analytic sample, mean modeled score, within-segment score dispersion, mean silhouette width, and bootstrap agreement. These values do not replace a full stability analysis, entropy-based classification diagnostics, or latent-class model comparison, but they make visible whether segments are sharply or weakly separated.

Table 14: Descriptive segment separation and uncertainty diagnostics.
| Segment | N | Share | Mean score | Score SD | Mean silhouette | Bootstrap agreement |
|---|---|---|---|---|---|---|
| Mid-career government career consolidators | 103 | 0.245 | 0.568 | 0.164 | 0.032 | 0.859 |
| Advanced-degree growth seekers | 19 | 0.045 | 0.561 | 0.131 | 0.064 | 0.923 |
| Younger government academic aspirants | 140 | 0.333 | 0.506 | 0.162 | 0.031 | 0.788 |
| Early-career credential builders | 158 | 0.376 | 0.259 | 0.146 | 0.083 | 0.885 |

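Table 14 reinforces the need for cautious persona translation. The mean silhouette and bootstrap agreement columns are descriptive stability indicators, not definitive validation. Mean silhouette widths range only from 0.031 to 0.083; because silhouette values near zero indicate cases lying close to the boundary between segments, all four segments overlap substantially. Bootstrap agreement is moderate to high (0.788 to 0.923) and lowest for the younger government academic aspirants. The three higher-interest segments also have mean modeled scores within 0.07 of one another (0.506 to 0.568), with within-segment score SDs of 0.13 to 0.16. The segments should therefore be treated as planning strata rather than as sharply bounded populations. Formal latent profile or latent class validation remains a priority for future research.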
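
The clustering algorithm and the exact definition of bootstrap agreement behind Table 14 are not described in this section. The sketch below therefore uses k-means as a stand-in and defines agreement, hypothetically, as the share of a reference segment's members that land in the matched segment after refitting on a bootstrap resample, with labels aligned by the Hungarian algorithm (`scipy.optimize.linear_sum_assignment`). Per-segment silhouette means come from scikit-learn's `silhouette_samples`. On the well-separated demo data both indicators are high, which helps put the near-zero silhouettes in Table 14 in context.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples

def match_labels(ref, new, k):
    """Relabel `new` so its clusters maximally overlap `ref` (Hungarian matching)."""
    overlap = np.zeros((k, k), dtype=int)
    for a, b in zip(ref, new):
        overlap[a, b] += 1
    rows, cols = linear_sum_assignment(-overlap)   # negate to maximize overlap
    mapping = {int(c): int(r) for r, c in zip(rows, cols)}
    return np.array([mapping[int(b)] for b in new])

def segment_stability(X, k=4, n_boot=50, seed=0):
    """Per-segment size, mean silhouette, and bootstrap agreement (k-means stand-in)."""
    rng = np.random.default_rng(seed)
    ref = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    sil = silhouette_samples(X, ref)               # values near 0 = near a boundary
    agree = np.zeros(k)
    for _ in range(n_boot):
        idx = rng.integers(0, len(X), size=len(X))   # bootstrap resample of cases
        km = KMeans(n_clusters=k, n_init=10,
                    random_state=int(rng.integers(1_000_000))).fit(X[idx])
        # Reassign every original case with the refitted model, then align labels.
        new = match_labels(ref, km.predict(X), k)
        for c in range(k):
            agree[c] += np.mean(new[ref == c] == c)
    return {c: (int(np.sum(ref == c)), float(sil[ref == c].mean()), float(agree[c] / n_boot))
            for c in range(k)}

# Well-separated synthetic blobs, so stability should be high here.
X_demo, _ = make_blobs(n_samples=400, centers=[[0, 0], [6, 0], [0, 6], [6, 6]],
                       cluster_std=1.0, random_state=0)
stability = segment_stability(X_demo, k=4, n_boot=20)
for seg, (n, sil_mean, agreement) in stability.items():
    print(f"segment {seg}: n={n}, mean silhouette={sil_mean:.3f}, "
          f"bootstrap agreement={agreement:.3f}")
```

Mixed categorical and numeric census features would call for a different distance (for example, Gower distance) or a different algorithm; k-means here serves only to illustrate the two diagnostics.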

4.7 Recruitment Personas

The final result is the translation of segments into personas. Table 15 presents the persona labels, associated segments, approximate sizes, observed expressed-interest rates, profile markers, and recruitment implications. These personas are communication abstractions for institutional planning, not psychological profiles or stable professional identities.

Table 15: Recruitment personas derived from prediction and segmentation results.
| Persona | Associated segment | Approximate size | Observed PhD-interest rate | Profile markers | Recruitment implication |
|---|---|---|---|---|---|
| The advanced academic professional | Advanced-degree growth seekers | 19 | 0.526 | MLIS; Private; Academic; Supervisory; median age 37 | Emphasize research mentoring, publication pathways, flexible dissertation supervision, and recognition of prior graduate work. |
| The public-sector advancement candidate | Mid-career government career consolidators | 103 | 0.447 | MLIS; Government; Academic; Supervisory; median age 43 | Frame the PhD around leadership, policy, evidence-based service improvement, and government-compatible scheduling. |
| The emerging academic-service leader | Younger government academic aspirants | 140 | 0.421 | BLIS; Government; Academic; Senior level; median age 30 | Position the PhD as a pathway from professional service to research leadership, with clear MLIS-to-PhD advising and scholarship guidance. |
| The early-career credential builder | Early-career credential builders | 158 | 0.127 | BLIS; Private; Academic; Junior level; median age 24 | Offer bridge advising from BLIS/MLIS, staged milestones, peer cohorts, and funding information that clarifies the path to doctoral readiness. |

Table 15 identifies four bounded recruitment abstractions. The advanced academic professional profile appears closest to immediate doctoral recruitment because of advanced preparation and high observed interest. The public-sector advancement candidate and emerging academic-service leader profiles represent larger strategic audiences whose expressed interest may be tied to leadership, institutional contribution, and career mobility. The early-career credential builder profile has lower immediate expressed interest but remains important for pipeline development, bridge advising, and long-term program sustainability. Each persona represents a statistical tendency rather than an individual prediction.

Personas developed in this study are probabilistic communication abstractions derived from aggregated workforce patterns and should not be interpreted as deterministic representations of individual librarians. Because segment boundaries overlap and the source evidence is cross-sectional, persona labels should remain provisional, revisable, and subject to stakeholder validation before any operational use.

Taken together, the results support the use of machine learning as a planning instrument for PhD in LIS development. The models identify meaningful signal, the feature rankings clarify the broad dimensions of that signal, the segments organize the professional landscape, and the personas translate the analysis into differentiated recruitment and program-design strategies without claiming to predict actual enrollment.

5 Discussion

5.1 Principal Findings

The findings show that expressed doctoral study interest among Philippine librarians is patterned enough to support strategic classification, segmentation, and persona development. Both models perform well above chance, so interest is not randomly distributed with respect to the available workforce evidence. The models identify a coherent structure in which educational preparation, career stage, institutional context, professional development, and mobility-related conditions combine to distinguish different professional profiles. This is the central methodological contribution of the study: machine learning can help convert national workforce evidence into a practical planning language for doctoral LIS education without claiming to forecast actual enrollment.

The results also show that the most strategically important audiences are not identical in size, expressed-interest rate, or recruitment logic. The smallest segment has the highest observed expressed-interest rate, while larger segments contain more varied profiles. This distinction matters for program planning. A proposed PhD in LIS should not rely only on the most immediately reachable candidates, nor should it assume that all interested professionals need the same recruitment message. The evidence instead points toward a portfolio strategy that includes immediate doctoral recruitment, mid-career advancement pathways, emerging-leader cultivation, and longer-term pipeline development.

5.2 Structural Interpretation of Findings

The results are best interpreted structurally rather than psychologically. The model does not reveal why any individual librarian wants or does not want doctoral education. It reveals how expressed interest is distributed across a professional field. This field is shaped by accumulated capital, access to educational opportunity, institutional expectations, and professional mobility pathways (Abbott, 1988; Bourdieu, 1986; Perna, 2006).

From this perspective, higher modeled interest should not be read as intrinsic motivation alone. It may reflect greater access to graduate education, stronger professional networks, clearer research identity, institutional incentives, or more visible returns to doctoral study. Lower modeled interest may reflect lower feasibility, weaker access, uncertain costs, limited mentoring, or weaker exposure to research careers. The implication is that doctoral pipeline planning should address structural conditions, not merely market to individuals.

5.3 Workforce Inequality and Doctoral Pathways

The findings suggest that professional opportunity structures shape the visibility of doctoral interest. Geographic location, institutional sector, and workplace context may influence whether doctoral education appears useful, accessible, and institutionally supported. These are not merely background variables. They are conditions through which professional capital becomes available or constrained.

This interpretation shifts the institutional question from “Who is most interested?” to “Where are doctoral pathways already visible, and where must they be made more feasible?” A doctoral program that ignores geographic and institutional asymmetry may recruit efficiently in the short term while reproducing long-standing inequalities in graduate access. A program that treats segmentation as a diagnostic of unequal opportunity can instead design distributed advising, scholarship pathways, remote participation options, and research mentorship pipelines.

5.4 Implications for PhD in LIS Program Design

The persona structure suggests that doctoral program design should be differentiated from the beginning. For advanced academic professionals, the program should foreground research supervision, publication pathways, methodological training, and opportunities to convert professional expertise into scholarly contribution. This audience may ask whether the program has sufficient academic depth, supervisory capacity, research culture, and institutional prestige to justify the opportunity cost of doctoral study.

For public-sector advancement candidates, the program should connect doctoral study to leadership, policy, institutional improvement, and evidence-based service development. The recruitment message should not present the PhD as an abstract academic credential alone. It should clarify how doctoral training can support public knowledge institutions, government service, organizational decision-making, and national LIS capacity.

For emerging academic-service leaders, the key design challenge is pathway clarity. This group may be professionally motivated but may need stronger advising on the transition from professional practice to research formation. A PhD program can serve this audience by offering structured research-preparation modules, mentoring, writing support, scholarship information, and clear expectations about the relationship between prior graduate study and doctoral preparation.

For early-career credential builders, the immediate strategic task is not aggressive PhD recruitment. It is pipeline development. This group may benefit more from bridge advising, MLIS-to-PhD roadmaps, research exposure, peer cohorts, and staged milestones. Treating this audience as a developmental pipeline rather than an immediate enrollment market can help the institution build long-term capacity without overstating current expressed interest.

5.5 Implications for Recruitment and Pipeline Strategy

The results argue against a single broad recruitment campaign. A generic message about the availability of a PhD in LIS would likely underperform because it would ignore the different motivations, constraints, and readiness profiles identified in the analysis. Instead, recruitment should be segmented by persona.

Immediate recruitment can prioritize high-readiness professionals with advanced preparation and strong alignment with academic or leadership trajectories. Mid-career recruitment can emphasize flexible scheduling, workplace relevance, policy contribution, and institutional leadership. Emerging-leader recruitment can stress mentoring, research identity formation, and scholarship pathways. Pipeline recruitment can focus on long-term advising and preparation rather than immediate application.

This approach also changes how recruitment success should be measured. A strategic PhD recruitment plan should track not only applications and enrollments, but also advising contacts, bridge-program participation, research-preparation engagement, scholarship inquiries, and movement from early interest to application preparation. In other words, the personas suggest a doctoral pipeline with multiple developmental stages rather than a simple conversion model.

5.6 Interpretable ML for Institutional Planning

A major interpretive boundary of this study is that the model should not be used to rank individuals for opportunity. Its proper role is decision support for institutional planning. This distinction is consistent with the broader caution in predictive analytics: models can support planning when they are transparent about their target, validation, and limitations, but they become problematic when treated as deterministic judgments about individual futures (G. S. Collins et al., 2015; Sghir et al., 2022). It is also consistent with arguments for interpretable models in high-stakes decision contexts and with ethical warnings that learning analytics can reshape institutional responsibility if students or professionals become objects of surveillance rather than agents (Rudin, 2019; Slade & Prinsloo, 2013).

In this study, machine learning is valuable because it reveals structure in the professional landscape. It helps answer questions such as: where is expressed doctoral study interest concentrated, what kinds of professional profiles are associated with that interest, and what support strategies might fit different audiences? These are planning questions, not eligibility questions. The ethical use of the model therefore widens rather than narrows institutional outreach: it should help the institution design more inclusive and responsive pathways, not restrict outreach to those with the highest modeled likelihood.

This distinction is particularly important for LIS education because doctoral participation is not a fixed individual trait. It is shaped by institutional support, mentoring, funding, workload, family responsibilities, geographic access, professional recognition, and prior exposure to research. A person with lower modeled likelihood may become an excellent doctoral student if provided with the right pathway. Conversely, a person with high modeled likelihood may face barriers not visible in the data. The model should be read as a map of strategic conditions, not as a verdict on individual capacity.

5.7 Equity and Access Considerations

The persona findings have an equity dimension. If recruitment focuses only on the most immediately reachable candidates, the program may reproduce existing inequalities in access to graduate preparation, research mentoring, and professional advancement. A doctoral program that aims to strengthen the national LIS workforce must therefore distinguish between observed expressed interest and potential for future participation. Observed interest identifies where current signals are strongest; potential identifies where institutional support could widen access.

This is where the early-career and emerging-leader personas become strategically important. Their lower or more varied modeled interest should not be interpreted as low value. Rather, these groups point to the need for bridge structures: research boot camps, writing clinics, faculty mentorship, scholarship advising, cohort-based preparation, and flexible study planning. Such supports can make doctoral education more accessible while also expanding the future applicant pool.

The caution from data-driven persona research is relevant here. Personas can help institutions empathize with audience segments, but they can also flatten complex lives into overly neat profiles if used carelessly (Salminen et al., 2021). The ethical response is to keep persona labels provisional, evidence-linked, and revisable. Personas should guide program design conversations, not replace direct engagement with prospective students.

5.8 Ethical Limits of Predictive Segmentation

Predictive segmentation should remain a planning instrument with explicit governance boundaries. It should not become a hidden sorting mechanism that allocates attention only to those already advantaged by prior education, institutional support, or geographic proximity. Fairness diagnostics, calibration checks, and stakeholder review are therefore not technical embellishments. They are safeguards against turning structural inequality into apparently neutral model output.

The persona layer has the same ethical limit. Personas are useful only when treated as bounded planning abstractions. They should support questions such as which advising pathways, scholarship messages, or research-preparation opportunities might be needed. They should not be used to infer individual motivation, merit, or likelihood of success.

5.9 Practical Governance Implications

If institutions use predictive segmentation for doctoral pipeline planning, governance should be explicit before deployment. A responsible governance process should document the model purpose, prohibit individual ranking, disclose the limitations of expressed-interest data, review subgroup performance, and specify who may access outputs. It should also require periodic recalibration, stakeholder review of persona language, and a process for retiring or revising personas that become misleading.

The practical implication is that analytics should expand institutional responsibility rather than automate it. A school using this framework should ask where additional advising, funding, remote access, or research preparation is needed, not which individuals deserve attention. Transparency, fairness monitoring, and non-exclusionary use are therefore core conditions for responsible adoption.

5.10 Contributions to LIS Workforce Research

This study contributes to LIS education research by showing how workforce evidence can be translated into a reproducible doctoral-planning framework. The literature has long positioned LIS education as a bridge between universities and the profession (Birdi, 2022), and has emphasized the need for educational programs to respond to workforce needs and skills gaps (Katuli-Munyoro & Mutula, 2017). This article extends that discussion by demonstrating how machine learning can operationalize workforce responsiveness without reducing education planning to crude demand estimation or individualized targeting.

The article also contributes methodologically. It integrates prediction, segmentation, and persona synthesis in a single workflow. Predictive performance establishes whether the data contain usable signal. Feature importance clarifies which broad dimensions structure the signal. Segmentation organizes the professional population into strategic groups. Personas translate those groups into program-design and recruitment implications. This sequence provides a replicable template for other LIS schools considering new graduate programs or evaluating advanced-degree interest.

The theoretical contribution is to connect doctoral pipeline planning to professional capital and structural access. The study demonstrates that expressed interest in doctoral LIS education can be analyzed as a patterned workforce phenomenon rather than as a simple individual preference or marketing target. This reframing moves the contribution beyond institutional recruitment analytics and toward a theory-informed account of how professional opportunity structures become visible through predictive segmentation.

Finally, the study contributes to the responsible use of analytics in LIS. Rather than presenting machine learning as a neutral oracle, the article frames it as an interpretive tool embedded in institutional decision-making. The value of the model depends not only on performance metrics, but also on whether its outputs are understandable, calibrated, ethically bounded, and useful for expanding educational opportunity.

5.11 Limitations

Several limitations should guide interpretation. First, the outcome represents expressed doctoral study interest, not application, admission, enrollment, persistence, or completion. Expressed interest is temporally unstable and may shift with tuition, delivery mode, scholarship availability, family responsibilities, workload, or institutional encouragement. The findings therefore support interest exploration and recruitment planning, but they do not forecast actual cohort size. Second, the analysis relies on secondary survey data collected before the proposed program exists as a concrete offering. Respondents could not evaluate specific tuition levels, delivery modes, faculty expertise, scholarship packages, or admission requirements.

Third, the outcome is self-reported and may be affected by social desirability, institutional signaling, misunderstanding of doctoral requirements, or aspiration suppression under structural constraint. Fourth, the modeling workflow depends on the quality and completeness of the available survey evidence. Missingness, noisy responses, unobserved socioeconomic constraints, and uneven representation across regions or institutional groups can affect both model performance and persona detail. Fifth, feature importance should be interpreted globally and cautiously. It indicates which predictors are useful to the fitted model, not which factors causally produce doctoral interest.

Sixth, the segmentation and persona labels are interpretive. They are intended to support strategy, not to describe fixed identities. Seventh, the present results are context-bound to the Philippine LIS workforce and should not be generalized to other national systems without replication. Future work should validate the personas through interviews, focus groups, prospective-student consultations, or pilot recruitment activities. Such validation would help determine whether the model-derived profiles resonate with librarians’ own accounts of their motivations, constraints, and doctoral ambitions.

5.12 Future Research and Validation

The next research step is longitudinal validation. Once a PhD in LIS program is offered, future studies should track whether expressed interest, modeled probability, segment membership, advising participation, and bridge-program engagement predict application, admission, enrollment, persistence, and completion. This would allow the framework to move from cross-sectional interest classification toward validated doctoral pipeline analysis.

A second priority is qualitative triangulation. Interviews, focus groups, and participatory persona review with librarians from different regions and institutional contexts would test whether the personas are recognizable, useful, and non-stereotyping. A third priority is methodological replication using probabilistic segmentation, calibration refinement, and more formal fairness assessment. Such work would strengthen both the scholarly contribution and the ethical governance of predictive segmentation in LIS education.

5.13 Strategic Takeaway

The central takeaway is that a PhD in LIS recruitment strategy should be evidence-informed, segmented, and developmental. The strongest current expressions of interest may be concentrated among advanced and institutionally positioned professionals, but the long-term success of the program depends on a broader ecosystem of advising, bridge preparation, financial support, and research identity formation. Machine learning helps reveal where those strategies might be directed; it does not replace academic judgment, ethical recruitment, or direct engagement with the profession.

References

Abbiati, M., Nendaz, M., & Cerutti, B. (2024). Exploring medical career choice to better inform Swiss physician workforce planning: Protocol for a national cohort study. JMIR Research Protocols. https://doi.org/10.2196/53138
Abbott, A. (1988). The system of professions: An essay on the division of expert labor. University of Chicago Press. https://doi.org/10.7208/chicago/9780226189666.001.0001
Almahri, F. A. A. J., Bell, D., & Arzoky, M. (2019). Personas design for conversational systems in education. Informatics, 6(4), 46. https://doi.org/10.3390/informatics6040046
Barocas, S., Hardt, M., & Narayanan, A. (2023). Fairness and machine learning: Limitations and opportunities. MIT Press.
Birdi, B. (2022). The contribution of library and information science education to decolonising. In Narrative expansions: Interpreting decolonisation in academic libraries (pp. 91–104). Facet. https://doi.org/10.29085/9781783304998.009
Bourdieu, P. (1986). The forms of capital. In J. G. Richardson (Ed.), Handbook of theory and research for the sociology of education (pp. 241–258). Greenwood.
Collins, G. S., Reitsma, J. B., & Altman, D. G. (2015). Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): The TRIPOD statement. Journal of Clinical Epidemiology, 68(2), 112–121. https://doi.org/10.1016/j.jclinepi.2014.11.010
Collins, L. M., & Lanza, S. T. (2009). Latent class and latent transition analysis. Wiley. https://doi.org/10.1002/9780470567333
Dorado, D. A. D. (2024). Exploring the landscape of librarianship in the Philippines: Establishing the profession’s population parameter estimates. Journal of Librarianship and Information Science, 57(3), 733–745. https://doi.org/10.1177/09610006241240485
Eccles, J. S., & Wigfield, A. (2002). Motivational beliefs, values, and goals. Annual Review of Psychology, 53(1), 109–132. https://doi.org/10.1146/annurev.psych.53.100901.135153
Freidson, E. (2001). Professionalism: The third logic. University of Chicago Press.
Hernandez-Blanco, A., Herrera-Flores, B., & Tomas, D. (2019). A systematic review of deep learning approaches to educational data mining. Complexity, 2019(1). https://doi.org/10.1155/2019/1306039
Katuli-Munyoro, P., & Mutula, S. M. (2017). Redefining library and information science education and training in Zimbabwe to close the workforce skills gaps. Journal of Librarianship and Information Science, 51(4), 915–926. https://doi.org/10.1177/0961000617748472
Knox, J. (2017). Data power in education: Exploring critical awareness with the learning analytics report card. Television & New Media, 18(8), 734–752. https://doi.org/10.1177/1527476417690029
Macgilchrist, F. (2021). What is ‘critical’ in critical studies of edtech? Three responses. Learning, Media and Technology, 46(3), 243–249. https://doi.org/10.1080/17439884.2021.1958843
Obille, K. L. B., & Dorado, D. A. D. (2022). Philippine Librarians Census [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6864788
Panigrahi, P. (2010). Library and information science education in East and North-East India: Retrospect and prospects. DESIDOC Journal of Library & Information Technology, 30(5), 32–47. https://doi.org/10.14429/djlit.30.613
Park, D.-H., & Kang, J. (2022). Constructing data-driven personas through an analysis of mobile application store data. Applied Sciences, 12(6), 2869. https://doi.org/10.3390/app12062869
Perna, L. W. (2006). Studying college access and choice: A proposed conceptual model. In Higher education: Handbook of theory and research (pp. 99–157). Kluwer Academic Publishers. https://doi.org/10.1007/1-4020-4512-3_3
Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206–215. https://doi.org/10.1038/s42256-019-0048-x
Salminen, J., Jung, S., & Jansen, B. J. (2021). Are data-driven personas considered harmful? Persona Studies, 7(1), 48–63. https://doi.org/10.21153/psj2021vol7no1art1236
Sghir, N., Adadi, A., & Lahmer, M. (2022). Recent advances in predictive learning analytics: A decade systematic review (2012–2022). Education and Information Technologies, 28(7), 8299–8333. https://doi.org/10.1007/s10639-022-11536-0
Slade, S., & Prinsloo, P. (2013). Learning analytics: Ethical issues and dilemmas. American Behavioral Scientist, 57(10), 1510–1529. https://doi.org/10.1177/0002764213479366