Modeling Study Performance: A Logistic Regression Approach

Introduction

Academic performance is shaped by a complex interplay of psychological, behavioral, and demographic factors. This analysis applies logistic regression to model study performance as a binary outcome using data from a student survey (N = 275, MCI Innsbruck). We investigate how mental well-being, digital dependency, sleep adequacy, marijuana use, and gender relate to the odds of achieving above-average academic results. Logistic regression is well suited here because the outcome is dichotomous and we are interested in the likelihood of group membership rather than a continuous score.

Data Preparation

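# Packages: xlsx (read.xlsx), ltm (cronbach.alpha), knitr (kable),
# kableExtra (kable_classic), ggplot2 (forest plot)
library(xlsx)
library(ltm)
library(knitr)
library(kableExtra)
library(ggplot2)

# Load survey and meta data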
df = read.xlsx("ihsm25_survey.isle.xlsx", sheetName = "data", stringsAsFactors = TRUE)
df_meta = read.xlsx("ihsm25_survey.isle.xlsx", sheetName = "meta")
n_raw = nrow(df)
df = df[df$student != 0, ]

# Apply variable names and factor levels from the meta sheet
varnames = df_meta$altName
names(df) = varnames

values = df_meta$valLabel
names(values) = varnames

fnames = names(df)[sapply(names(df), function(x) class(df[, x]) == "factor")]
for (name in fnames) {
  df[, name] = factor(df[, name], trimws(strsplit(gsub("[()]", "", values[name]), "\\|")[[1]]))
}

# Keep variable labels for reference
varlabels = df_meta$varLabel
names(varlabels) = df_meta$altName

n_total = nrow(df)

The survey contains self-reported data on demographics, health behaviors, substance use, digital habits, and psychological well-being. We construct two multi-item scales and recode several predictors before fitting the model.

# Convert digital behavior items to numeric, set "No answer" (level 8) to NA
digi_vars = c("hdb_stop", "hdb_lazy", "hdb_anxious", "hdb_happy")
for (v in digi_vars) {
  df[[paste0("n_", v)]] = as.numeric(df[[v]])
  df[[paste0("n_", v)]][df[[paste0("n_", v)]] == 8] = NA
}
# Reverse-code hdb_stop so higher values mean poorer ability to disconnect
df$n_hdb_stop = 8 - df$n_hdb_stop

# Reliability of all four items
alpha_all = cronbach.alpha(df[, paste0("n_", digi_vars)], na.rm = TRUE)
alpha_all

## 
## Cronbach's alpha for the 'df[, paste0("n_", digi_vars)]' data-set
## 
## Items: 4
## Sample units: 275
## alpha: 0.509

The four-item set shows poor internal consistency (alpha = 0.509, well below the conventional 0.70 threshold). Inspection of the inter-item correlations suggests that hdb_stop (“I can stop using my smartphone when I need to do something else”) correlates only weakly with the other items, likely because it measures self-regulation rather than dependency.
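The two diagnostics behind this decision, inter-item correlations and alpha-if-item-deleted, can be computed in a few lines of base R. The sketch below is illustrative only: the helpers `cronbach_alpha()` and `alpha_if_deleted()` are hypothetical, they use complete cases (which may not match exactly how `ltm::cronbach.alpha(..., na.rm = TRUE)` handles missing values), and the demonstration runs on simulated items rather than the survey data.

```r
# Hypothetical helper: Cronbach's alpha from item and total-score variances,
# alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
cronbach_alpha = function(items) {
  items = as.matrix(items)
  items = items[stats::complete.cases(items), , drop = FALSE]  # complete cases only
  k = ncol(items)
  item_var = apply(items, 2, stats::var)
  total_var = stats::var(rowSums(items))
  (k / (k - 1)) * (1 - sum(item_var) / total_var)
}

# Hypothetical helper: alpha recomputed with each item left out in turn
alpha_if_deleted = function(items) {
  items = as.matrix(items)
  sapply(colnames(items), function(j) {
    cronbach_alpha(items[, colnames(items) != j, drop = FALSE])
  })
}

# Simulated example: x1-x3 share a common factor, x4 is unrelated noise
set.seed(1)
n = 500
latent = rnorm(n)
demo_items = data.frame(
  x1 = latent + rnorm(n, sd = 0.5),
  x2 = latent + rnorm(n, sd = 0.5),
  x3 = latent + rnorm(n, sd = 0.5),
  x4 = rnorm(n)
)

round(cor(demo_items, use = "pairwise.complete.obs"), 2)  # x4 correlates near 0 with the rest
alpha_all_demo = cronbach_alpha(demo_items)
alpha_drop_demo = alpha_if_deleted(demo_items)
alpha_all_demo
alpha_drop_demo  # dropping x4 raises alpha; dropping any of x1-x3 lowers it
```

On the survey itself, the same idea corresponds to `cor(df[, paste0("n_", digi_vars)], use = "pairwise.complete.obs")` for the four digital items.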

digi_vars3 = c("hdb_lazy", "hdb_anxious", "hdb_happy")
alpha_reduced = cronbach.alpha(df[, paste0("n_", digi_vars3)], na.rm = TRUE)
alpha_reduced

## 
## Cronbach's alpha for the 'df[, paste0("n_", digi_vars3)]' data-set
## 
## Items: 3
## Sample units: 275
## alpha: 0.568

# Construct final digineed scale as the mean of three items
df$digineed = rowMeans(df[, paste0("n_", digi_vars3)], na.rm = TRUE)

Removing hdb_stop raises Cronbach’s alpha to 0.568. This is an improvement, but the value remains below the conventional 0.70 threshold, so digineed should be treated as a fairly noisy measure. The final digineed scale captures the degree of digital dependency as a composite of physical inactivity due to phone use, anxiety when separated from the phone, and reliance on the phone for happiness. Higher scores indicate stronger digital dependency.
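Because `rowMeans(..., na.rm = TRUE)` averages whatever items a respondent answered, someone who answered one of three items still receives a digineed score, and someone who answered none receives `NaN`, which `glm()` later drops as missing. A common alternative is to require a minimum number of valid items. The sketch below is hypothetical: the helper `scale_mean()` and the `min_valid` threshold are assumptions, not part of the original analysis.

```r
# Hypothetical helper: row mean computed only when enough items are present
scale_mean = function(items, min_valid = 2) {
  items = as.matrix(items)
  n_valid = rowSums(!is.na(items))          # answered items per respondent
  out = rowMeans(items, na.rm = TRUE)       # NaN when nothing was answered
  out[n_valid < min_valid] = NA_real_       # too few items: treat as missing
  out
}

demo = rbind(
  c(4, 5, 6),     # all three items answered
  c(2, NA, 4),    # two of three answered
  c(7, NA, NA),   # only one answered
  c(NA, NA, NA)   # none answered
)
rowMeans(demo, na.rm = TRUE)      # 5, 3, 7, NaN
scale_mean(demo, min_valid = 2)   # 5, 3, NA, NA
```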

# Convert WHO-5 well-being items to numeric, set "No answer" (level 7) to NA
mh_vars = c("feel_good", "feel_relaxed", "feel_active", "feel_fresh", "feel_fulfilled")
for (v in mh_vars) {
  df[[paste0("n_", v)]] = as.numeric(df[[v]])
  df[[paste0("n_", v)]][df[[paste0("n_", v)]] == 7] = NA
}

# Reliability check
alpha_mh = cronbach.alpha(df[, paste0("n_", mh_vars)], na.rm = TRUE)
alpha_mh

## 
## Cronbach's alpha for the 'df[, paste0("n_", mh_vars)]' data-set
## 
## Items: 5
## Sample units: 275
## alpha: 0.79

# Construct mental health scale as the mean of all five items
df$mentalhealth = rowMeans(df[, paste0("n_", mh_vars)], na.rm = TRUE)

The five mental health items show good reliability. The mentalhealth scale represents the WHO-5 well-being index — a well-validated measure of subjective psychological well-being. Higher scores reflect better mental health.
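The WHO-5 is conventionally reported on a 0–100 percentage scale: each item is scored 0–5, the five item scores are summed (0–25), and the sum is multiplied by 4. Assuming the survey codes 1–6 correspond to the WHO-5 responses 0–5 in order (an assumption; the meta sheet defines the actual coding), a mean item code converts as (mean − 1) × 20. The helper name `who5_percent()` is hypothetical. Note that conventional WHO-5 scoring assumes all five items were answered, whereas mentalhealth averages the available items.

```r
# Hypothetical helper: convert a 1-6 item mean into the conventional WHO-5 0-100 score.
# Assumes survey code 1 = WHO-5 response 0 ("at no time") ... code 6 = response 5.
who5_percent = function(mean_code) {
  raw_item_mean = mean_code - 1    # shift 1-6 codes onto the 0-5 WHO-5 item scale
  raw_total = raw_item_mean * 5    # equivalent raw sum over five items (0-25)
  raw_total * 4                    # conventional percentage score (0-100)
}

who5_percent(c(1, 3.80, 6))        # roughly 0, 56, 100
```

Under that coding assumption, the sample mean of 3.80 corresponds to about 56 on the 0–100 scale; scores of 50 or below are often used as a screening threshold for low well-being.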

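# Binary outcome: High performance = Excellent or Good GPA
# (missing GPA stays missing; %in% alone returns FALSE for NA and would code it "Low")
df$high_perf = factor(
  ifelse(is.na(df$gpa), NA,
         ifelse(df$gpa %in% c("Excellent", "Good"), "High", "Low")),
  levels = c("Low", "High")
)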

prop.table(table(df$high_perf))

## 
##       Low      High 
## 0.2690909 0.7309091

The outcome variable distinguishes students reporting an Excellent or Good GPA (High) from those reporting Average, Sufficient, or Insufficient (Low). On the five-level Austrian grading scale used at MCI, these are the two best grades, so the threshold is defined by the grading scale rather than by the sample distribution. About 73% of students fall into the High group and 27% into the Low group, an imbalance worth noting but not severe enough to compromise logistic regression.

# Adequate sleep: 7+ hours per night (missing sleep stays missing)
df$sleep_adequate = factor(
  ifelse(is.na(df$sleep), NA,
         ifelse(df$sleep %in% c("7-8 hours", "9 or more hours"), "Adequate", "Inadequate")),
  levels = c("Inadequate", "Adequate")
)

# Marijuana use: dichotomized due to sparse cell counts
df$use_marijuana2 = factor(NA, levels = c("Not at all", "At least sometimes"))
df$use_marijuana2[df$use_marijuana %in% "Not at all"] = "Not at all"
df$use_marijuana2[df$use_marijuana %in% c("Several days", "More than half of the days", "Nearly every day")] = "At least sometimes"

# Gender remains as is, with "female" as the reference level
df$gender = relevel(df$gender, ref = "female")

Sleep is dichotomized at 7 hours because adult sleep recommendations generally treat at least 7 hours per night as the threshold for adequate rest, including for young adults. Marijuana use is collapsed into two groups because the original five-level variable has very small counts in the upper frequency categories. Gender enters the model as a demographic control.
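When dichotomizing, the handling of missing responses matters: `%in%` returns FALSE rather than NA for a missing input, so `ifelse(x %in% levels, ...)` silently places non-respondents in the second group unless the NAs are restored explicitly. The hypothetical helper below keeps missing values missing. If a variable also carries an explicit “No answer” level (as the digital and WHO-5 items do), that level would need to be set to NA first; whether GPA and sleep have one depends on the meta sheet.

```r
# Hypothetical helper: dichotomize a factor while keeping missing values missing
dichotomize = function(x, high_levels, labels = c("Low", "High")) {
  out = ifelse(x %in% high_levels, labels[2], labels[1])
  out[is.na(x)] = NA                       # undo the FALSE that %in% gives for NA
  factor(out, levels = labels)
}

gpa_demo = factor(c("Excellent", "Average", NA, "Good", "Insufficient"))
NA %in% c("Excellent", "Good")                    # FALSE, not NA
dichotomize(gpa_demo, c("Excellent", "Good"))     # High Low <NA> High Low
```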

Descriptive Statistics

t_outcome = as.data.frame(table(df$high_perf))
names(t_outcome) = c("Performance", "Count")
t_outcome$Percent = round(prop.table(table(df$high_perf)) * 100, 1)
kable_classic(kable(t_outcome, align = "lrr"), full_width = FALSE)

Performance	Count	Percent
Low	74	26.9
High	201	73.1

The sample contains about 2.7 times as many high-performing as low-performing students. In the full sample, a naive rule that always predicts “High” would therefore be correct about 73% of the time. This no-information rate is the benchmark for judging the model’s added value, although it should be recomputed on the complete cases the model is actually fitted to.

cont_vars = df[, c("mentalhealth", "digineed")]
sum_tab = do.call(rbind, lapply(cont_vars, function(x) {
  c(Mean = round(mean(x, na.rm = TRUE), 2),
    SD = round(sd(x, na.rm = TRUE), 2),
    Min = round(min(x, na.rm = TRUE), 2),
    Max = round(max(x, na.rm = TRUE), 2))
}))
kable_classic(kable(sum_tab, align = "rrrr"), full_width = FALSE)

	Mean	SD	Min	Max
mentalhealth	3.80	0.89	1.4	6
digineed	3.55	1.32	1.0	7

Mental health scores span nearly the full range of the scale (1–6), with a mean of 3.80, slightly above the scale midpoint of 3.5, suggesting that students on average report moderate well-being. Digital dependency scores average 3.55, slightly below the scale midpoint of 4, with substantial variability across respondents (SD = 1.32).

# Cross-tabulations with outcome
cat_vars = list(
  "Sleep" = df$sleep_adequate,
  "Marijuana Use" = df$use_marijuana2,
  "Gender" = df$gender
)

for (nm in names(cat_vars)) {
  ct = table(cat_vars[[nm]], df$high_perf)
  pt = round(prop.table(ct, 1) * 100, 1)
  tbl = cbind(ct, pt)
  colnames(tbl) = c("Low (n)", "High (n)", "Low (%)", "High (%)")
  cat("\n###", nm, "\n\n")
  print(kable_classic(kable(tbl, align = "lrrrr"), full_width = FALSE))
}

Sleep

	Low (n)	High (n)	Low (%)	High (%)
Inadequate	28	56	33.3	66.7
Adequate	46	145	24.1	75.9

Marijuana Use

	Low (n)	High (n)	Low (%)	High (%)
Not at all	47	152	23.6	76.4
At least sometimes	8	19	29.6	70.4

Gender

	Low (n)	High (n)	Low (%)	High (%)
female	39	127	23.5	76.5
male	33	73	31.1	68.9
other	1	1	50.0	50.0

Students who report adequate sleep show a higher proportion of high performance than those with inadequate sleep (75.9% vs. 66.7%), suggesting a possible association between rest and academic outcomes. The pattern for marijuana use is weaker: 70.4% of students who use marijuana at least sometimes fall into the high-performance category, compared with 76.4% of non-users, although the user group contains only 27 respondents. Female students show a somewhat higher share of high performance than male students (76.5% vs. 68.9%); the “other” category contains only two respondents and cannot be interpreted on its own.
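The row percentages above can also be summarised as unadjusted odds ratios, which have the same form as the model’s odds ratios but no adjustment for the other predictors. The sketch below uses the counts from the Sleep and Marijuana Use tables; the helper `odds_ratio_2x2()` is hypothetical.

```r
# Hypothetical helper: unadjusted odds ratio of "High" for row 2 vs. row 1
# of a 2x2 table laid out like the cross-tabulations (rows = groups, columns = Low, High)
odds_ratio_2x2 = function(tab) {
  odds = tab[, "High"] / tab[, "Low"]   # odds of high performance in each group
  unname(odds[2] / odds[1])
}

sleep_tab = matrix(c(28, 56,
                     46, 145), nrow = 2, byrow = TRUE,
                   dimnames = list(c("Inadequate", "Adequate"), c("Low", "High")))
marijuana_tab = matrix(c(47, 152,
                         8, 19), nrow = 2, byrow = TRUE,
                       dimnames = list(c("Not at all", "At least sometimes"), c("Low", "High")))

odds_ratio_2x2(sleep_tab)       # about 1.58
odds_ratio_2x2(marijuana_tab)   # about 0.73
```

These raw figures use all respondents available for each table, not the complete cases the model is fitted on, so they are not directly comparable with the adjusted estimates; the adjusted marijuana odds ratio (1.211) nonetheless points the other way.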

Logistic Regression Model

We model the log-odds of high study performance as a function of five predictors:

Mental health — better well-being should support cognitive function and academic engagement.
Digital dependency — higher dependency may distract from studying and reduce performance.
Sleep adequacy — sufficient sleep is essential for memory consolidation and attention.
Marijuana use — regular use may impair concentration and executive function.
Gender — included as a demographic control to account for baseline differences.
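Written out with dummy indicators for the non-reference levels (inadequate sleep, no marijuana use, and female are the reference categories), the model is the following; each odds ratio reported later is the exponentiated coefficient.

```latex
\log\frac{P(Y_i = \text{High})}{1 - P(Y_i = \text{High})}
  = \beta_0 + \beta_1\,\mathrm{mentalhealth}_i + \beta_2\,\mathrm{digineed}_i
  + \beta_3\,\mathrm{Adequate}_i + \beta_4\,\mathrm{Sometimes}_i
  + \beta_5\,\mathrm{Male}_i + \beta_6\,\mathrm{Other}_i,
\qquad \mathrm{OR}_j = e^{\beta_j}
```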

model = glm(high_perf ~ mentalhealth + digineed + sleep_adequate + use_marijuana2 + gender,
            data = df, family = binomial)
n_model = nobs(model)
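lrt = anova(model, test = "Chisq")  # sequential (Type I) tests, one row per term
# Overall likelihood ratio test against the intercept-only model on the same cases
lrt_p = pchisq(model$null.deviance - model$deviance,
               df = model$df.null - model$df.residual, lower.tail = FALSE)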

Model Results

# Extract coefficients, ORs, CIs, and p-values
s = summary(model)
or = exp(coef(model))
ci = exp(confint(model))
p_vals = coef(s)[, 4]

# Clean table, dropping the intercept
or_tab = data.frame(
  Predictor = names(or)[-1],
  OR = round(or[-1], 3),
  CI = paste0("[", round(ci[-1, 1], 3), ", ", round(ci[-1, 2], 3), "]"),
  p = sprintf("%.3f", p_vals[-1])
)
names(or_tab) = c("Predictor", "Odds Ratio", "95% CI", "p")

kable_classic(
  kable(or_tab, align = "llrl", row.names = FALSE),
  full_width = FALSE
)

Predictor	Odds Ratio	95% CI	p
mentalhealth	1.201	[0.796, 1.807]	0.379
digineed	0.832	[0.646, 1.07]	0.151
sleep_adequateAdequate	1.370	[0.664, 2.763]	0.385
use_marijuana2At least sometimes	1.211	[0.471, 3.467]	0.704
gendermale	0.576	[0.297, 1.112]	0.100
genderother	0.372	[0.013, 10.551]	0.507

## 
## Call:
## glm(formula = high_perf ~ mentalhealth + digineed + sleep_adequate + 
##     use_marijuana2 + gender, family = binomial, data = df)
## 
## Coefficients:
##                                  Estimate Std. Error z value Pr(>|z|)  
## (Intercept)                        1.0761     0.9985   1.078   0.2811  
## mentalhealth                       0.1831     0.2080   0.880   0.3787  
## digineed                          -0.1841     0.1282  -1.436   0.1509  
## sleep_adequateAdequate             0.3147     0.3620   0.869   0.3847  
## use_marijuana2At least sometimes   0.1913     0.5030   0.380   0.7037  
## gendermale                        -0.5519     0.3353  -1.646   0.0997 .
## genderother                       -0.9900     1.4915  -0.664   0.5069  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 246.88  on 222  degrees of freedom
## Residual deviance: 238.97  on 216  degrees of freedom
##   (52 observations deleted due to missingness)
## AIC: 252.97
## 
## Number of Fisher Scoring iterations: 4

None of the five predictors reached statistical significance at the conventional 0.05 level. The model as a whole also did not significantly outperform the intercept-only model (likelihood ratio test: chi-square = 7.91 on 6 df, p ≈ 0.24), indicating that the selected predictors (mental health, digital dependency, sleep adequacy, marijuana use, and gender) do not reliably differentiate between high and low academic performers in this sample.
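The overall likelihood ratio test compares the fitted model with an intercept-only model on the same complete cases, so it can be computed directly from the null and residual deviances printed by `summary(model)`. By contrast, `anova(model, test = "Chisq")` produces sequential (Type I) tests, so its second row tests only the addition of mentalhealth to the intercept. A minimal sketch using the rounded deviances reported above:

```r
# Overall likelihood ratio test from the deviances reported by summary(model)
null_dev  = 246.88   # null deviance on 222 df
resid_dev = 238.97   # residual deviance on 216 df
lr_stat = null_dev - resid_dev                     # chi-square statistic, about 7.91
lr_df   = 222 - 216                                # number of slope coefficients, 6
lr_p    = pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
c(statistic = lr_stat, df = lr_df, p = lr_p)       # p is roughly 0.24

# With the fitted model object, the equivalent is
#   pchisq(model$null.deviance - model$deviance,
#          df = model$df.null - model$df.residual, lower.tail = FALSE)
```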

Although non-significant, the direction and magnitude of the estimated effects are worth examining. The odds ratio for mentalhealth is 1.201: holding the other predictors constant, a one-unit increase on the well-being scale corresponds to an estimated 20.1% higher odds of high performance. This is a modest effect, and the wide confidence interval (crossing 1.0) is compatible with anything from a 20.4% decrease to an 80.7% increase in the odds, too imprecise to draw firm conclusions.
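The percentage reading follows from (OR − 1) × 100, applied to the point estimate and to each confidence bound. The sketch below uses the mentalhealth values from the odds ratio table; the helper name is hypothetical, and the per-SD rescaling (SD = 0.89 from the descriptive table) is an optional extra, not part of the original analysis.

```r
# Percent change in the odds implied by an odds ratio (other predictors held fixed)
pct_change_odds = function(or) (or - 1) * 100

# Mental health: point estimate and 95% CI bounds from the odds ratio table
mh = c(estimate = 1.201, lower = 0.796, upper = 1.807)
pct_change_odds(mh)        # +20.1, -20.4, +80.7

# The odds ratio is exp() of the log-odds coefficient shown by summary(model)
exp(0.1831)                # about 1.201

# Odds ratio for a 1-SD (0.89-point) rather than a 1-point increase
exp(0.1831 * 0.89)         # about 1.18
```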

Digital dependency (digineed) shows an odds ratio of 0.832. The direction aligns with the expectation that higher dependency is associated with lower performance, but again the confidence interval includes 1.0. Sleep adequacy yields an odds ratio of 1.370, in the expected direction. Marijuana use yields an odds ratio of 1.211, which runs against both the hypothesis and the raw cross-tabulation; neither estimate is distinguishable from the null given the available data. The wide confidence intervals for marijuana use and the “other” gender category reflect very small subgroup sizes.

Gender comes closest to a marginal trend: male students have an estimated 42.4% lower odds of high performance than female students (OR = 0.576, p = 0.100). This effect, while not significant at the 5% level, may warrant further investigation in larger samples.

# Prepare data for the forest plot
or_df = data.frame(
  Predictor = c("Mental Health", "Digital Dependency", "Sleep: Adequate",
                "Marijuana: At Least Sometimes", "Gender: Male", "Gender: Other"),
  OR = or[-1],
  CI_lower = ci[-1, 1],
  CI_upper = ci[-1, 2]
)
or_df$Predictor = factor(or_df$Predictor, levels = rev(or_df$Predictor))

ggplot(or_df, aes(x = OR, y = Predictor)) +
  geom_vline(xintercept = 1, linetype = "dashed", color = "grey50") +
  # Draw CI segments first so the point estimates sit on top; setting yend
  # explicitly keeps this working on ggplot2 versions that require it
  geom_segment(aes(x = CI_lower, xend = CI_upper, yend = Predictor),
               linewidth = 1, color = "#2c3e50") +
  geom_point(size = 3, color = "#2c3e50") +
  scale_x_log10() +
  labs(x = "Odds Ratio (log scale)", y = NULL) +
  theme_minimal(base_size = 11)

The forest plot visualises each predictor’s odds ratio and its 95% confidence interval. All confidence intervals cross the reference line at OR = 1, confirming that none of the effects reach conventional significance. The plot highlights the considerable uncertainty around several estimates — particularly for marijuana use and the “other” gender category, where small subgroup sizes produce wide intervals.

Model Evaluation

# Predicted probabilities and classification
pred_prob = predict(model, type = "response")
pred_class = factor(ifelse(pred_prob > 0.5, "High", "Low"), levels = c("Low", "High"))
actual = df[names(pred_prob), "high_perf"]

cm = table(Predicted = pred_class, Actual = actual)
cm

##          Actual
## Predicted Low High
##      Low    0    1
##      High  54  168

accuracy = sum(diag(cm)) / sum(cm)
sensitivity = cm[2, 2] / sum(cm[, 2])
specificity = cm[1, 1] / sum(cm[, 1])

The model achieves an overall accuracy of 75.3%. This figure is misleading: the model assigns all but one of the 223 observations to the majority class (“High”), which makes up 75.8% of the modelled sample (169 of 223), so the model is in fact slightly less accurate than a naive always-High rule. Sensitivity appears near-perfect at 99.4%, but specificity is zero: the model never correctly identifies a low performer. In combination, the predictors therefore do not separate the two groups.
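Because the relevant baseline is the majority-class share among the 223 modelled cases, it helps to report accuracy next to the no-information rate and balanced accuracy. A minimal sketch using the confusion-matrix counts above, with “High” as the positive class; the helper `classification_metrics()` is hypothetical.

```r
# Confusion matrix counts from the evaluation step (rows = predicted, columns = actual)
cm_demo = matrix(c(0, 1,
                   54, 168), nrow = 2, byrow = TRUE,
                 dimnames = list(Predicted = c("Low", "High"), Actual = c("Low", "High")))

classification_metrics = function(cm) {
  n = sum(cm)
  sens = cm["High", "High"] / sum(cm[, "High"])   # true-positive rate
  spec = cm["Low", "Low"] / sum(cm[, "Low"])      # true-negative rate
  c(
    accuracy = sum(diag(cm)) / n,
    no_information_rate = max(colSums(cm)) / n,   # accuracy of always predicting the majority class
    sensitivity = sens,
    specificity = spec,
    balanced_accuracy = (sens + spec) / 2
  )
}

round(classification_metrics(cm_demo), 3)
# accuracy 0.753, no_information_rate 0.758, sensitivity 0.994,
# specificity 0.000, balanced_accuracy 0.497
```

The same function should also accept the `cm` table built in the evaluation step, since it only uses character indexing, `diag()`, and `colSums()`. A balanced accuracy of about 0.50 is what a classifier with no discriminating ability would achieve.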

# McFadden's pseudo-R-squared
pseudo_r2 = 1 - model$deviance / model$null.deviance
pseudo_r2

## [1] 0.0320457

# AIC
AIC(model)

## [1] 252.971

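# Likelihood ratio test against the intercept-only model
# (fitted on the same 223 complete cases; from the summary: 246.88 - 238.97 = 7.91 on 6 df, p ≈ 0.24)
lr_stat = model$null.deviance - model$deviance
lr_df = model$df.null - model$df.residual
c(Chisq = lr_stat, Df = lr_df, p = pchisq(lr_stat, df = lr_df, lower.tail = FALSE))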

McFadden’s pseudo-R-squared is 0.032: the predictors reduce the deviance by only about 3% relative to the intercept-only model. (McFadden’s measure is a relative improvement in log-likelihood, not a share of explained variance as in linear regression.) This very small improvement aligns with the non-significant likelihood ratio test. Together, these diagnostics confirm that the selected predictors have limited ability to distinguish between high and low academic performers in this sample.
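Accuracy depends on the 0.5 cut-off, which is a poor choice when three quarters of the cases are “High”. The area under the ROC curve is a common threshold-free complement: it is the probability that a randomly chosen High case receives a higher predicted probability than a randomly chosen Low case. A minimal base-R sketch via the rank (Mann-Whitney) formula; the helper `auc_rank()` is hypothetical and was not part of the original analysis.

```r
# ROC AUC via the rank formula; average ranks count ties as one half
auc_rank = function(prob, actual, positive = "High") {
  is_pos = actual == positive
  n_pos = sum(is_pos)
  n_neg = sum(!is_pos)
  r = rank(prob)                                   # ties.method = "average" by default
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy check with four cases: three of the four High/Low pairs are ordered correctly
auc_rank(c(0.1, 0.4, 0.35, 0.8), c("Low", "Low", "High", "High"))   # 0.75

# Applied to this model, using objects from the evaluation step:
#   auc_rank(pred_prob, actual)
```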

Discussion and Limitations

None of the five predictors in the logistic regression model reached statistical significance, and the model overall did not outperform the null model. This null result is an informative finding in its own right: it suggests that the psychological and behavioural factors examined here — mental well-being, digital dependency, sleep adequacy, and marijuana use — may not be strong enough determinants of GPA-based performance to be detectable in a sample of 223 students, once they are considered together.

Several factors likely contribute to this outcome. First, the sample size, while adequate for descriptive analysis, provides limited statistical power for detecting small-to-moderate effects in a multivariable logistic regression. The odds ratios we observe (ranging from 0.37 to 1.37) are modest in magnitude, and detecting effects of this size would typically require a considerably larger sample. Second, the GPA measure, though a natural proxy for study performance, is self-reported, recorded in broad ordinal categories, and further dichotomised for this analysis. This coarseness discards information and may obscure real associations that continuous, institutionally verified grades would reveal.
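The power argument can be made concrete with a small simulation. The sketch is deliberately simplified and hypothetical: a single normally distributed predictor with SD 0.89 (the mentalhealth SD), a true odds ratio of 1.2 per unit (close to the mentalhealth estimate), a baseline of 75% High, and a Wald test at the 0.05 level. It ignores the other predictors, so it approximates rather than reproduces the multivariable setting.

```r
# Simulation-based power for a single-predictor logistic regression (hypothetical setup)
simulate_power = function(n, or = 1.2, sd_x = 0.89, p_base = 0.75,
                          reps = 300, alpha = 0.05) {
  beta = log(or)                 # true log-odds slope
  intercept = qlogis(p_base)     # log-odds of High at x = 0
  hits = replicate(reps, {
    x = rnorm(n, mean = 0, sd = sd_x)
    y = rbinom(n, size = 1, prob = plogis(intercept + beta * x))
    fit = glm(y ~ x, family = binomial)
    coef(summary(fit))["x", "Pr(>|z|)"] < alpha
  })
  mean(hits)                     # share of simulated studies with p < alpha
}

set.seed(2025)
power_223  = simulate_power(223)
power_2000 = simulate_power(2000)
c(n_223 = power_223, n_2000 = power_2000)
```

Under these assumptions, power at n = 223 should come out far below the conventional 80% target, while a sample of around 2,000 gets much closer to it.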

Third, missing data reduced the effective sample from 275 to 223 observations, a loss of 52 cases (18.9%). Judging from the cross-tabulations, most of this loss stems from the marijuana item, which has no valid answer for 49 respondents; because the scales average whichever items were answered, they lose a case only when all of its items are missing. This attrition further erodes power and may introduce systematic bias if non-response correlates with the outcome. Fourth, the digineed scale showed poor reliability (Cronbach’s alpha = 0.568, below the conventional 0.70 threshold), meaning that the measurement of digital dependency contains considerable noise, which attenuates estimated effects.
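The attenuation point corresponds to the classical regression-dilution result: when a predictor carries random measurement error, its estimated slope shrinks toward zero by roughly the reliability ratio. The sketch treats Cronbach’s alpha of 0.568 as that ratio (an assumption; alpha is generally a lower bound on reliability) and uses a linear model on simulated data, where the shrinkage is exact in expectation; for logistic regression it holds only approximately.

```r
# Regression dilution under classical measurement error (simulated, linear model)
set.seed(42)
n = 20000
reliability = 0.568                        # digineed alpha, used as the reliability ratio
true_x = rnorm(n, sd = 1)                  # error-free construct, variance 1
error_sd = sqrt(1 / reliability - 1)       # gives var(true) / var(observed) = 0.568
observed_x = true_x + rnorm(n, sd = error_sd)
y = 1.0 * true_x + rnorm(n)                # true slope = 1

slope_true = coef(lm(y ~ true_x))[["true_x"]]
slope_observed = coef(lm(y ~ observed_x))[["observed_x"]]
c(true = slope_true, observed = slope_observed)   # observed slope is close to 0.568
```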

Fifth, the set of predictors is far from exhaustive. Study-related behaviours such as planning, concentration, and perceived academic challenge were not included because of their conceptual proximity to the outcome, but their omission likely limits the model’s explanatory power. It is plausible that factors like study time, motivation, prior academic achievement, or teaching quality are more directly relevant to GPA than the health and lifestyle variables examined here.

Finally, all measures are self-reported and subject to social desirability bias and recall error, particularly for substance use and digital habits. The cross-sectional design precludes any causal interpretation, and the sample is drawn from a single university, which limits generalisability.

Future research could address these limitations through larger, multi-institutional samples; the use of institutionally verified grade data; more refined measures of digital behaviour; and longitudinal designs that track changes in both predictors and performance over time. Despite the null findings, the present analysis demonstrates a principled approach to logistic regression modelling, from scale construction and variable recoding to model evaluation and honest reporting of results. The directions of the mental health, digital dependency, and sleep estimates are broadly consistent with expectations (the marijuana estimate is not), and the gender difference (p = 0.100) may warrant further investigation in better-powered studies.