This report applies Bayesian methods to the QS_26 dataset derived from the QS World University Rankings 2026. The aim is to analyze and model three research questions: classifying universities as large or small from ranking indicators, predicting academic reputation with country-level hierarchical effects, and predicting employer reputation with country effects nested within region.
We follow the Bayesian workflow: exploratory analysis, model building, inference, validation, and model comparison. All computations are conducted in R using rstanarm, bayesplot, and tidyverse tools.
To begin addressing the classification of university size, I first perform exploratory data analysis (EDA) on key numeric indicators across university size categories. This involves generating boxplots to visually compare distributions and calculating group means to quantify differences. The exploratory analysis reveals that larger universities tend to score higher on Academic Reputation, Employment Outcomes, and International Research Network, while smaller universities have higher Faculty Student scores (indicating more faculty per student, since the QS indicator rewards lower student-to-faculty ratios). These observed patterns guide the selection of predictors for subsequent modeling.
Exploratory Data Analysis
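The group means shown below can be reproduced with a short dplyr summary; this is a sketch, and the data frame name qs_data is an assumption (the indicator columns follow the QS labels):

# Compare mean indicator scores between the two Size categories
library(dplyr)
qs_data %>%
  group_by(Size) %>%
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))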
## # A tibble: 2 × 12
## Size `Academic Reputation` `Employer Reputation` `Faculty Student`
## <ord> <dbl> <dbl> <dbl>
## 1 <12,000 students 16.0 20.3 40.9
## 2 >=12,000 studen… 30.4 30.0 30.7
## # ℹ 8 more variables: `Citations per Faculty` <dbl>,
## # `International Faculty` <dbl>, `International Students` <dbl>,
## # `International Students Diversity` <dbl>,
## # `International Research Network` <dbl>, `Employment Outcomes` <dbl>,
## # Sustainability <dbl>, `Overall SCORE` <dbl>
From this, several patterns emerge:
- Larger universities tend to have higher scores across most indicators, notably Academic Reputation, Employment Outcomes, and International Research Network.
- Faculty Student scores are higher for smaller universities, consistent with smaller institutions maintaining more faculty per student.

The difference in means is especially pronounced for:

- Academic Reputation: +14.4 points
- Employment Outcomes: +14.9 points
- International Research Network: +28.2 points
These findings suggest that the predictors Academic Reputation, Employment Outcomes, Sustainability, International Research Network, and Faculty Student are relevant features for classification.
Prior Selection Justification

Given the differences observed in the group means and the scaling of each predictor (each ranging roughly between 0 and 100), we adopt weakly informative normal priors to regularize the regression:
Prior for coefficients: \[ \beta_j \sim \mathcal{N}(0, 2.5^2) \]
This allows moderate shifts in log-odds for each 1-unit increase in predictors while discouraging extreme weights.
Prior for intercept: \[ \alpha \sim \mathcal{N}(0, 2^2) \]
Since we use centered predictors (implicitly or explicitly), this reflects neutral prior belief about class probabilities at baseline.
We verify the implications of these priors by fitting a prior predictive model, setting prior_PD = TRUE, and checking the spread of implied probabilities.
Based on the EDA findings, I select five predictors that demonstrate notable differences by size category.
The data is cleaned to remove missing values and then split into training and test sets to allow model evaluation on unseen data.
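A minimal sketch of this cleaning and splitting step, assuming the raw data frame is called qs_data; the object names and the split proportion are assumptions, not the report's exact code:

# Keep the response and the five selected predictors, drop rows with missing values
library(dplyr)
library(tidyr)
size_data <- qs_data %>%
  select(Size, `Academic Reputation`, `Employment Outcomes`, Sustainability,
         `International Research Network`, `Faculty Student`) %>%
  drop_na()

# Reproducible train/test split
set.seed(2012)
train_idx <- sample(seq_len(nrow(size_data)), size = floor(0.8 * nrow(size_data)))
size_train <- size_data[train_idx, ]
size_test <- size_data[-train_idx, ]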
Summary statistics of the training data confirm the distribution and central tendency of the chosen variables, providing context for the modeling step.
## Size Academic Reputation Employment Outcomes
## <12,000 students : 459 Min. : 1.00 Min. : 1.000
## >=12,000 students:1007 1st Qu.: 8.80 1st Qu.: 6.225
## Median : 16.10 Median : 18.050
## Mean : 25.87 Mean : 30.076
## 3rd Qu.: 32.88 3rd Qu.: 46.175
## Max. :100.00 Max. :100.000
## Sustainability International Research Network Faculty Student
## Min. : 3.00 Min. : 1.00 Min. : 1.00
## 1st Qu.: 35.73 1st Qu.: 28.50 1st Qu.: 10.70
## Median : 48.70 Median : 56.50 Median : 23.10
## Mean : 51.21 Mean : 53.84 Mean : 33.52
## 3rd Qu.: 66.40 3rd Qu.: 78.78 3rd Qu.: 49.98
## Max. :100.00 Max. :100.00 Max. :100.00
Given the summary statistics of the training data, the prior parameters for the logistic regression classification model are selected to balance informativeness and flexibility. The prior for the intercept is set as a normal distribution with mean 0 and standard deviation 2, reflecting no strong prior belief about baseline log-odds while allowing moderate variation. This is suitable because the response variable “Size” is binary and the baseline probability is not extreme.
For the regression coefficients, a normal prior with mean 0 and standard deviation 2.5 is chosen. This relatively wide prior accommodates the predictors' diverse scales (ranging from low single digits to 100) while providing some regularization to avoid overfitting. The autoscale = TRUE option further adjusts the prior scale to the data, improving stability and interpretability of the model estimates.
Overall, these priors provide weak to moderate regularization that accommodates the wide range of predictor values and supports robust estimation of the log-odds effects in the classification context.
# Fit a prior predictive model to explore the implied distributions from the prior
size_model_prior <- stan_glm(
Size ~ `Academic Reputation` + `Employment Outcomes` + Sustainability +
`International Research Network` + `Faculty Student`,
data = size_train, family = binomial(link = "logit"),
prior_intercept = normal(0, 2),
prior = normal(0, 2.5, autoscale = TRUE),
chains = 4, iter = 10000, seed = 2012,
prior_PD = TRUE
)
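With the prior predictive fit in hand, the spread of prior-implied probabilities can be inspected directly; a minimal sketch:

# Draw prior-implied probabilities of being a large university
prior_probs <- posterior_epred(size_model_prior)
summary(as.vector(prior_probs))
hist(as.vector(prior_probs), breaks = 50,
     main = "Prior-implied P(Size = '>=12,000 students')", xlab = "Probability")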
After verifying the prior predictive behavior, I update the model to include the observed data and fit the posterior distribution of model parameters using Markov Chain Monte Carlo (MCMC) sampling. The resulting posterior estimates combine prior knowledge and observed evidence to quantify uncertainty about the effect of each predictor on the odds of being a large university.
# Fit posterior model by updating with data (i.e., switch to prior_PD = FALSE)
size_model <- update(size_model_prior, prior_PD = FALSE)
To confirm model reliability, I examine MCMC traceplots for convergence diagnostics, checking that the chains mix well and have stabilized. The fixed effects summary presents posterior means and 95% credible intervals for the regression coefficients, highlighting which predictors have a clear influence on classification (credible intervals excluding zero). For instance, Academic Reputation and International Research Network show positive associations with larger university size, while the Faculty Student score is negatively associated.
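A sketch of these diagnostics using rstanarm's built-in plotting and summary methods:

# Trace plots to check that the four chains mix well and have stabilized
plot(size_model, plotfun = "trace")

# Numerical diagnostics: Rhat values near 1 and large effective sample sizes
summary(size_model)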
Posterior intervals on the log-odds scale quantify uncertainty around the coefficient estimates, and exponentiating these intervals gives odds ratios for easier interpretation. I then perform posterior predictive checks to assess how well the model predicts the probability of a university belonging to the larger size category. These checks help validate the model’s fit by comparing predicted classifications to actual observations.
# Get posterior intervals for coefficients (log-odds scale)
posterior_interval(size_model, prob = 0.95)
## 2.5% 97.5%
## (Intercept) -0.8154265607 -0.075419634
## `Academic Reputation` 0.0064190586 0.028096528
## `Employment Outcomes` 0.0002140564 0.013163741
## Sustainability -0.0177316995 0.003858298
## `International Research Network` 0.0300007674 0.043879420
## `Faculty Student` -0.0274374142 -0.017453315
# Convert log-odds to odds ratios for interpretation
exp(posterior_interval(size_model, prob = 0.95))
## 2.5% 97.5%
## (Intercept) 0.4424506 0.9273543
## `Academic Reputation` 1.0064397 1.0284950
## `Employment Outcomes` 1.0002141 1.0132508
## Sustainability 0.9824246 1.0038658
## `International Research Network` 1.0304553 1.0448564
## `Faculty Student` 0.9729356 0.9826981
# Posterior predictive check: compare predicted vs. actual Size probabilities
pp_check(size_model, plotfun = "stat", stat = function(x) mean(x == 2)) +
xlab("Probability of Size = '>=12,000 students'")
The model estimates the probability that a university is large (12,000 or more full-time students) from these indicators. The baseline probability (when all predictors are zero) is below 50%.

- Academic Reputation has a clear positive effect: as it increases, the odds of the university being large rise modestly.
- Employment Outcomes also has a positive effect, though a smaller one.
- International Research Network has a stronger positive effect: universities with broader research networks are more likely to be large.
- Sustainability shows no clear association with university size.
- Faculty Student score has a negative effect: universities with more faculty per student are less likely to be large.

In summary, universities with higher academic reputation, better employment outcomes, and broader international research networks tend to be large, while those with more faculty per student tend to be smaller.
To evaluate the model, accuracy was calculated on the training data using a 0.5 cutoff for classifying size, and 10-fold cross-validation was performed to estimate performance on new data. Finally, predictions on the held-out test set show the model produces probabilities that classify university size effectively.
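The summaries below use cutoffs of 0.5 and then 0.2. A hypothetical helper illustrating how such a summary can be computed (the function and the use of posterior_epred() to average probabilities across draws are assumptions, not the report's original code):

# Summarize test-set classification at a given probability cutoff
classification_summary <- function(model, data, cutoff = 0.5) {
  p_hat <- colMeans(posterior_epred(model, newdata = data))
  y_hat <- factor(ifelse(p_hat >= cutoff, ">=12,000 students", "<12,000 students"),
                  levels = levels(data$Size))
  cm <- table(Y = data$Size, Y_hat = y_hat)
  list(`confusion matrix` = cm,
       sensitivity = cm[2, 2] / sum(cm[2, ]),  # correct rate among large universities
       specificity = cm[1, 1] / sum(cm[1, ]),  # correct rate among small universities
       `overall accuracy` = sum(diag(cm)) / sum(cm))
}

classification_summary(size_model, size_test, cutoff = 0.5)
classification_summary(size_model, size_test, cutoff = 0.2)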
## $`confusion matrix`
## Y_hat
## Y <12,000 students >=12,000 students
## <12,000 students 1 1
## >=12,000 students 2 6
##
## $sensitivity
## [1] 0.75
##
## $specificity
## [1] 0.5
##
## $`overall accuracy`
## [1] 0.7
## $`confusion matrix`
## Y_hat
## Y <12,000 students >=12,000 students
## <12,000 students 1 1
## >=12,000 students 0 8
##
## $sensitivity
## [1] 1
##
## $specificity
## [1] 0.5
##
## $`overall accuracy`
## [1] 0.9
The classification results were checked using confusion matrices at two cutoffs: 0.5 and 0.2. Lowering the cutoff to 0.2 helped the model identify all large universities (sensitivity rose from 0.75 to 1) while specificity remained at 0.5 on this small test set, illustrating the usual trade-off: lower cutoffs favor detecting the large class, potentially at the expense of the small one. The cutoff can therefore be chosen according to which error matters more. The model performs well overall, with accuracy reaching 90% at the lower cutoff, indicating that the chosen predictors are informative about university size.

The Bayesian logistic regression model distinguishes large from small universities based on Academic Reputation, Employment Outcomes, International Research Network, Sustainability, and the Faculty Student score. It found that bigger universities tend to have higher Academic Reputation, better Employment Outcomes, and stronger International Research Networks, while higher Faculty Student scores tend to be linked with smaller universities.

The model fits well and the estimates are reliable. On the test data, it correctly classified about 70% of universities using the 0.5 cutoff, improving to around 90% accuracy at the 0.2 cutoff. With the lower cutoff, the model identifies large universities very reliably but misclassifies a few more small ones, a common trade-off in classification problems.
Overall, this Bayesian model provides a solid way to predict university size using important features and can be useful for further analysis or decision-making.
We explore the distribution of academic reputation by country using boxplots and check correlations between predictors to understand relationships and potential multicollinearity.
The correlation matrix below shows that Academic Reputation is highly correlated with Overall Score and Employer Reputation, suggesting these factors move together. Faculty Student stands out as weakly or not correlated with most other metrics, meaning it behaves largely independently in this dataset. Additionally, academic reputation varies considerably across countries, as the boxplot below shows.
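A sketch of the correlation computation behind the matrix below (academic_rep_data and the shortened column names are assumptions consistent with the printed output):

# Pairwise correlations among the numeric indicators, rounded for display
round(cor(academic_rep_data[, c("academic_rep", "employer_rep", "sustainability",
                                "int_research_net", "overall_score",
                                "faculty_student")]), 2)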
## academic_rep employer_rep sustainability int_research_net
## academic_rep 1.00 0.81 0.63 0.51
## employer_rep 0.81 1.00 0.45 0.23
## sustainability 0.63 0.45 1.00 0.64
## int_research_net 0.51 0.23 0.64 1.00
## overall_score 0.90 0.79 0.67 0.52
## faculty_student 0.26 0.24 0.01 -0.06
## overall_score faculty_student
## academic_rep 0.90 0.26
## employer_rep 0.79 0.24
## sustainability 0.67 0.01
## int_research_net 0.52 -0.06
## overall_score 1.00 0.32
## faculty_student 0.32 1.00
Hierarchical Bayesian Linear Model for Academic Reputation
We fit a hierarchical Bayesian linear model that allows employer reputation effects to vary by country, accounting for regional differences while estimating overall effects.
Based on the summary statistics of the academic reputation data, the prior parameters for the hierarchical linear model are chosen to reflect the scale and variability of the data while providing appropriate regularization. The intercept prior is set as a normal distribution centered around 45 with a standard deviation of 7. This choice is informed by the mean academic reputation score of approximately 42.22 and allows for reasonable uncertainty around the central value.
For the fixed effect coefficients, a normal prior with mean zero and standard deviation of 2 is selected. This moderate level of regularization is appropriate given the wide range of predictor values, helping to shrink weaker effects towards zero while still permitting meaningful variation. The residual standard deviation is assigned an exponential prior with rate parameter 1, reflecting the expected noise in the model residuals without allowing excessive variability.
Finally, the group-level covariance matrix is modeled using a regularized covariance prior (the decov prior with parameters reg = 1 and conc = 1). This choice balances flexibility in modeling country-level random effects against shrinkage to prevent overfitting. Overall, these prior settings strike a reasonable balance between incorporating prior knowledge from the data distribution and maintaining model flexibility.
Model specification: For university \(i\) in country \(j\), the academic reputation score is modeled as:
\[ \begin{aligned} \text{AcademicRep}_{ij} &\sim \mathcal{N}(\mu_{ij}, \sigma^2) \\ \mu_{ij} &= \beta_0 + \beta_1 \times \text{EmployerRep}_{ij} + \beta_2 \times \text{IntResNet}_{ij} + \beta_3 \times \text{Sustainability}_{ij} \\ &\quad + \beta_4 \times \text{OverallScore}_{ij} + \beta_5 \times \text{FacultyStudent}_{ij} + b_{0j} + b_{1j} \times \text{EmployerRep}_{ij} \end{aligned} \]
with priors:
\[ \begin{aligned} \beta_0 &\sim \mathcal{N}(45, 7^2) \\ \beta_k &\sim \mathcal{N}(0, 2^2), \quad k=1,\dots,5 \\ (b_{0j}, b_{1j}) &\sim \text{MVN}\left(0, \Sigma\right), \quad \Sigma \sim \text{decov}(reg=1, conc=1, shape=1, scale=1) \\ \sigma &\sim \text{Exponential}(1) \end{aligned} \]
Here, \(b_{0j}\) and \(b_{1j}\) represent the country-specific random intercept and slope for employer reputation, respectively.
This hierarchical structure allows for both global fixed effects and country-specific deviations, capturing heterogeneity across countries while borrowing strength from the overall data.
# Record start time to check model runtime
start <- Sys.time()
# Fit hierarchical linear model with random slope & intercept for employer_rep by country
academic_rep_model <- stan_glmer(
academic_rep ~ employer_rep + int_research_net + sustainability +
overall_score + faculty_student + (employer_rep | country),
data = academic_rep_data,
family = gaussian(),
prior_intercept = normal(45, 7),
prior = normal(0, 2),
prior_aux = exponential(1, autoscale = TRUE),
prior_covariance = decov(reg = 1, conc = 1, shape = 1, scale = 1),
chains = 4, iter = 5000, seed = 2012,
adapt_delta = 0.99999
)
# End time and print duration
end <- Sys.time()
print(end - start)
# Show priors used in model
prior_summary(academic_rep_model)
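The fixed-effect estimates discussed below can be pulled directly from the fit; a minimal sketch:

# 95% posterior credible intervals for the fixed effects
posterior_interval(academic_rep_model, prob = 0.95,
                   pars = c("(Intercept)", "employer_rep", "int_research_net",
                            "sustainability", "overall_score", "faculty_student"))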
The model indicates that academic reputation tends to increase notably with higher employer reputation, stronger international research networks, and better overall scores. Among these, the overall score has the most pronounced positive effect on academic reputation. Sustainability does not appear to have a meaningful impact, as its credible interval includes zero. Interestingly, the faculty-to-student score shows a small but credibly negative relationship with academic reputation, suggesting that, after adjusting for the other indicators, universities with higher faculty-to-student scores tend to have slightly lower academic reputation in this dataset.
The model’s predictions show a moderate average error when compared to the actual academic reputation scores, indicating a reasonable level of accuracy. A notable proportion of the true values fall within a moderate margin around the predicted values, and an even larger proportion are captured when allowing for a wider margin of error. These results suggest that the model generally performs well in predicting academic reputation. This is further supported by the posterior predictive check plot, which demonstrates a good fit between the model and the observed data.
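A sketch of how this evaluation can be computed from the posterior predictive distribution (the error metric shown, mean absolute error, is an assumption):

# Posterior predictive check: simulated vs. observed academic reputation
pp_check(academic_rep_model)

# Mean absolute error of posterior-mean predictions
pred_mean <- colMeans(posterior_predict(academic_rep_model))
mean(abs(academic_rep_data$academic_rep - pred_mean))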
We selected variables that could potentially predict the employer reputation of universities. These include academic reputation, international research network, citations per faculty, sustainability, and faculty-student ratio. Since universities are grouped by country nested within region, we set region and country as grouping factors for the hierarchical model.
We summarized and visualized the data. The correlation matrix helps assess multicollinearity between predictors. The distribution of employer reputation is slightly skewed, and boxplots suggest regional differences, supporting our use of hierarchical modeling.
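A sketch of the calls behind the output below (qs_employer_data is the assumed object name):

# Summaries of the modeling variables, including the grouping factors
summary(qs_employer_data)

# Pairwise correlations among the numeric predictors
round(cor(qs_employer_data[, c("employer_rep", "academic_rep", "int_research_net",
                               "citations", "sustainability", "faculty_student")]), 2)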
## employer_rep academic_rep int_research_net citations
## Min. : 1.00 Min. : 1.00 Min. : 1.00 Min. : 1.00
## 1st Qu.: 8.50 1st Qu.: 8.80 1st Qu.: 28.50 1st Qu.: 6.20
## Median : 16.65 Median : 16.15 Median : 56.65 Median : 18.75
## Mean : 27.10 Mean : 25.98 Mean : 53.90 Mean : 30.85
## 3rd Qu.: 37.52 3rd Qu.: 33.02 3rd Qu.: 78.80 3rd Qu.: 50.65
## Max. :100.00 Max. :100.00 Max. :100.00 Max. :100.00
##
## sustainability faculty_student region
## Min. : 3.00 Min. : 1.00 Africa : 47
## 1st Qu.: 35.77 1st Qu.: 10.70 Americas:351
## Median : 48.70 Median : 23.25 Asia :551
## Mean : 51.27 Mean : 33.66 Europe :483
## 3rd Qu.: 66.53 3rd Qu.: 50.20 Oceania : 44
## Max. :100.00 Max. :100.00
##
## country
## United States of America:192
## United Kingdom : 90
## China (Mainland) : 71
## India : 53
## Germany : 47
## Japan : 47
## (Other) :976
## employer_rep academic_rep int_research_net citations
## employer_rep 1.00 0.86 0.42 0.40
## academic_rep 0.86 1.00 0.59 0.48
## int_research_net 0.42 0.59 1.00 0.54
## citations 0.40 0.48 0.54 1.00
## sustainability 0.60 0.71 0.76 0.55
## faculty_student 0.30 0.30 0.08 0.04
## sustainability faculty_student
## employer_rep 0.60 0.30
## academic_rep 0.71 0.30
## int_research_net 0.76 0.08
## citations 0.55 0.04
## sustainability 1.00 0.15
## faculty_student 0.15 1.00
We fit a hierarchical linear model using stan_glmer() with a Gaussian likelihood. The structure is:
\[ \text{EmployerRep}_{ijk} \sim \mathcal{N}(\mu_{ijk}, \sigma^2) \]

where the linear predictor is defined as:

\[ \mu_{ijk} = \beta_0 + \beta_1 \cdot \text{AcademicRep}_{ijk} + \beta_2 \cdot \text{IntResNet}_{ijk} + \beta_3 \cdot \text{Citations}_{ijk} + \beta_4 \cdot \text{Sustainability}_{ijk} + \beta_5 \cdot \text{FacultyStudent}_{ijk} + u_j + v_{k(j)} \]
with \(u_j\) the random intercept for region \(j\) and \(v_{k(j)}\) the random intercept for country \(k\) nested within region \(j\).
We place the following priors on the fixed effects and error terms:
\[ \beta_j \sim \mathcal{N}(0, 2.5^2) \quad \text{for } j = 1, \dots, 5 \]
\[ \beta_0 \sim \mathcal{N}(27, 15^2) \]

\[ \sigma \sim \text{Exponential}(1) \]

The priors were selected based on the observed data characteristics to provide a balanced and realistic starting point for the model. The intercept prior is centered near the mean of the employer reputation scores, which span from 1 to 100 with a mean of about 27, and uses a standard deviation large enough to cover the variability seen in the data, ensuring the model can adapt to the baseline reputation level effectively.
For the fixed effect coefficients, normal priors centered at zero with moderate standard deviation were chosen to allow the model to detect meaningful relationships between predictors such as academic reputation, citations, and sustainability, whose values also range widely (mostly between 1 and 100), without being overly restrictive or too vague. This choice helps prevent overfitting while still capturing true effects.
The exponential prior on the residual standard deviation reflects the expected level of unexplained variation in employer reputation, acknowledging that while the predictors explain a substantial portion of variability, some noise remains.
Finally, the prior on the covariance structure for the hierarchical effects of region and country promotes stable estimation of group-level variability, which is essential given the diverse distribution of observations across multiple regions and countries (e.g., USA with 192 observations, UK with 90, and many others).
Overall, these prior choices are justified by the data summary and ensure the model is flexible yet robust, capable of capturing meaningful patterns without overfitting.
# Record timing of model fitting
start <- Sys.time()
# Fit hierarchical model with country nested in region
employer_rep_model <- stan_glmer(
employer_rep ~ academic_rep + int_research_net + citations + sustainability + faculty_student +
(1 | region/country),
data = qs_employer_data,
family = gaussian(),
prior_intercept = normal(27, 15),
prior = normal(0, 2.5),
prior_aux = exponential(1),
prior_covariance = decov(reg = 1, conc = 1, shape = 1, scale = 1),
chains = 4, iter = 5000, seed = 2025,
adapt_delta = 0.99
)
# End and print model runtime
end <- Sys.time()
print(end - start)
# Summarize priors and results
prior_summary(employer_rep_model)
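The fixed-effect table below can be produced with broom.mixed (an assumption; any equivalent posterior summary of the fixed effects would serve):

# Posterior summaries of the fixed effects with 95% intervals
library(broom.mixed)
tidy(employer_rep_model, effects = "fixed", conf.int = TRUE, conf.level = 0.95)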
## # A tibble: 6 × 5
## term estimate std.error conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.598 2.16 -5.51 5.14
## 2 academic_rep 0.918 0.0205 0.879 0.958
## 3 int_research_net -0.175 0.0216 -0.217 -0.134
## 4 citations 0.0222 0.0163 -0.00859 0.0540
## 5 sustainability 0.183 0.0306 0.124 0.242
## 6 faculty_student 0.0244 0.0131 -0.00165 0.0505
The hierarchical model predicting employer reputation indicates that academic reputation is a strong and reliable predictor, with a clear positive relationship showing that institutions with higher academic standing tend to be viewed more favorably by employers. This result aligns well with expectations and highlights the importance of academic excellence in shaping employer perceptions.
Interestingly, the model reveals a negative association between international research network strength and employer reputation, suggesting that broader international collaborations do not necessarily translate into better employer perceptions. This insight may reflect nuanced factors influencing how employers evaluate institutions beyond their global research connections.
Research impact, measured through citations per faculty, shows a small positive estimate whose 95% credible interval includes zero, so the evidence for its effect is weak, whereas sustainability exhibits a clear positive effect on employer reputation. Likewise, the faculty-to-student score shows little to no effect, indicating it may not be a relevant factor in employer evaluations within this context.
From a model evaluation standpoint, the hierarchical structure allows for capturing variability across different groups while estimating the overall relationships, improving predictive reliability. The inclusion of credible intervals provides uncertainty quantification around the estimates, which helps assess the robustness of the effects observed. Moreover, model diagnostics and posterior predictive checks indicate a good fit to the data, supporting the validity of these conclusions. Overall, the model offers valuable insights into which institutional factors most strongly influence employer reputation, while accounting for variability and uncertainty inherent in the data.
The model demonstrates solid predictive performance, effectively capturing the variability in employer reputation scores. On average, the predicted values are reasonably close to the actual observations, reflecting consistent accuracy in its estimates. Furthermore, a very high proportion of observed employer reputation values lie within the model’s predictive intervals, indicating that the model is well-calibrated and reliably accounts for uncertainty in its predictions. These results suggest that the model is robust and trustworthy for understanding and forecasting employer reputation based on the included predictors.
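A sketch of the calibration check described above, computing the share of observed scores that fall inside 95% posterior predictive intervals:

# Empirical coverage of 95% posterior predictive intervals
pred <- posterior_predict(employer_rep_model)
lower <- apply(pred, 2, quantile, probs = 0.025)
upper <- apply(pred, 2, quantile, probs = 0.975)
mean(qs_employer_data$employer_rep >= lower &
       qs_employer_data$employer_rep <= upper)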
The table compares the predictive performance of the three models using the expected log predictive density (ELPD) estimated via Leave-One-Out Cross-Validation (LOO). ELPD provides a measure of how well each model is expected to predict new data, with higher (less negative) values indicating better predictive accuracy.
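A sketch of the LOO computation behind the table, using the model objects fit above:

# PSIS-LOO estimates of expected log predictive density for each model
loo_size <- loo(size_model)
loo_academic <- loo(academic_rep_model)
loo_employer <- loo(employer_rep_model)
loo_size$estimates["elpd_loo", ]  # repeat for the other two fits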
## # A tibble: 3 × 3
## Model elpd_loo SE
## <chr> <dbl> <dbl>
## 1 Size Classification -716. 19.6
## 2 Academic Rep Prediction -2482. 21.5
## 3 Employer Rep Prediction -5760. 46.7
Among the three models compared, the size classification model has the highest (least negative) ELPD, the academic reputation model is intermediate, and the employer reputation model, which uses a more complex hierarchical structure for regional and country-level effects, is lowest. An important caveat: ELPD values are only directly comparable between models fit to the same outcome, and these three models have different response variables (a binary size indicator versus two continuous scores), so the differences partly reflect the scale of each likelihood rather than predictive quality alone.

Within that caveat, the comparison suggests that while hierarchical models can effectively capture complex nested relationships within the data, increased model complexity does not automatically improve predictive accuracy on new data. Simpler, less parameter-heavy models can provide reliable and robust predictions when evaluated using out-of-sample criteria.