#install.packages("xlsx") 
library(xlsx)

# set working directory
setwd("~/R")

# read data
df = read.xlsx("ihsm25_survey.isle.xlsx", sheetName = "data", stringsAsFactors = TRUE)
df_meta = read.xlsx("ihsm25_survey.isle.xlsx", sheetName = "meta")

# Binary Outcome
# 1 = High Performance (Excellent or Good)
# 0 = Standard Performance (Average, Sufficient, or Insufficient)
# Note: %in% returns FALSE for NA, so a missing G05Q15 rating is coded 0, not NA
df$high_perf = ifelse(df$G05Q15 %in% c("Excellent", "Good"), 1, 0)

# Predictors: convert frequency/satisfaction labels into 1-5 scales
levels_5 = c("Never", "Rarely", "Sometimes", "Often", "Always")
soc_levels = c("Very unsatisfied", "Unsatisfied", "Neutral", "Satisfied", "Very satisfied")
df$planning = as.numeric(factor(df$G05Q06.SQ001., levels = levels_5))
df$concentration = as.numeric(factor(df$G05Q06.SQ003., levels = levels_5))
df$social_sat = as.numeric(factor(df$G05Q24, levels = soc_levels))

# Estimate the Logistic Regression Model
model = glm(high_perf ~ planning * concentration + social_sat, data = df, family = "binomial")
summary(model)
## 
## Call:
## glm(formula = high_perf ~ planning * concentration + social_sat, 
##     family = "binomial", data = df)
## 
## Coefficients:
##                        Estimate Std. Error z value Pr(>|z|)  
## (Intercept)              1.7385     2.1307   0.816   0.4145  
## planning                -0.5532     0.5738  -0.964   0.3350  
## concentration           -0.9382     0.6326  -1.483   0.1381  
## social_sat               0.1347     0.1459   0.923   0.3558  
## planning:concentration   0.3109     0.1703   1.826   0.0678 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 319.03  on 272  degrees of freedom
## Residual deviance: 300.07  on 268  degrees of freedom
##   (3 observations deleted due to missingness)
## AIC: 310.07
## 
## Number of Fisher Scoring iterations: 4
# Odds Ratios and 95% Confidence Intervals
results = exp(cbind(OR = coef(model), confint(model)))
## Waiting for profiling to be done...
print("Odds Ratios and 95% Confidence Intervals:")
## [1] "Odds Ratios and 95% Confidence Intervals:"
print(results)
##                               OR      2.5 %     97.5 %
## (Intercept)            5.6886833 0.09330439 429.710308
## planning               0.5751213 0.18065242   1.749580
## concentration          0.3913424 0.10740538   1.312005
## social_sat             1.1441742 0.85703037   1.522677
## planning:concentration 1.3646759 0.98599724   1.931925

The outcome variable high_perf was created by combining the responses “Excellent” and “Good” into a single category. This cutoff was selected to separate students who are clearly meeting or surpassing the learning goals from those who may be struggling or only achieving the basic standard (those rated from “Average” to “Insufficient”). This binary classification helps us focus on identifying the behaviors that are specifically associated with top academic performance.
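
The dichotomization described above can be sketched on a toy vector. The ratings below are hypothetical and are not taken from the survey file; the point is only to show how the `%in%` coding treats a missing rating, and one possible way to keep it missing instead (the name `high_perf_na` is introduced here for illustration).

```r
# Toy ratings (hypothetical values, not from the survey data)
rating <- c("Excellent", "Good", "Average", "Sufficient", "Insufficient", NA)

# Coding used in the analysis: %in% returns FALSE for NA,
# so a missing rating ends up in the 0 group
high_perf <- ifelse(rating %in% c("Excellent", "Good"), 1, 0)

# Alternative sketch: keep missing ratings as NA so glm() drops those rows
high_perf_na <- ifelse(is.na(rating), NA, high_perf)

# Cross-tabulate to verify the recoding, including the NA row
table(rating, high_perf, useNA = "ifany")
```

Whether missing ratings should count as "standard performance" or be excluded is an analytic decision; the sketch only makes the difference visible.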

Analysis of Results (Odds Ratios & Significance)

The interaction between planning and concentration was the strongest finding (OR = 1.36, p = 0.068), indicating that the association between planning and performance becomes more positive at higher concentration levels. Specifically, the odds ratio for a one-step increase in planning is multiplied by about 1.36 for each additional step in concentration. The 95% confidence interval (0.99–1.93) narrowly includes 1.
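
To make the interaction concrete, the following sketch computes the implied odds ratio for a one-step increase in planning at each concentration level. It uses only the rounded coefficients printed in the model summary; the variable names `b_planning`, `b_interaction`, and `or_planning` are introduced here for illustration, and the point estimates carry the same uncertainty as the coefficients themselves.

```r
# Coefficients copied from the summary(model) output (log-odds scale)
b_planning    <- -0.5532
b_interaction <-  0.3109

# Concentration is measured on a 1-5 scale
concentration <- 1:5

# Conditional odds ratio of planning at each concentration level:
# exp(beta_planning + beta_interaction * concentration)
or_planning <- exp(b_planning + b_interaction * concentration)

data.frame(concentration, or_planning = round(or_planning, 2))
```

Under these estimates the odds ratio for planning is below 1 at the lowest concentration level and above 1 from level 2 upward, which is the pattern the interaction term describes.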

Social satisfaction showed a positive but non-significant association with performance (OR = 1.14, p = 0.36), corresponding to an estimated 14% increase in the odds of high performance for each step up in satisfaction, although the 95% confidence interval (0.86–1.52) included 1.

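The odds ratios for planning (0.58) and concentration (0.39) on their own were below 1.0. In a model with an interaction term these are not overall main effects: each describes the effect of one variable when the other equals zero, a value that lies outside the 1–5 range of both scales. They should therefore not be read as evidence that planning or concentration by itself lowers performance.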
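
One common way to make the lower-order terms interpretable is to center the predictors before fitting, so that each term describes the effect at the sample mean of the other variable. The sketch below uses simulated data, not the survey data: the data-generating coefficients and the names `sim`, `planning_c`, and `concentration_c` are assumptions made purely for illustration.

```r
set.seed(1)
n <- 300

# Simulated 1-5 scale predictors (hypothetical, not the survey responses)
sim <- data.frame(
  planning      = sample(1:5, n, replace = TRUE),
  concentration = sample(1:5, n, replace = TRUE)
)

# Hypothetical data-generating process with an interaction
eta <- -3 + 0.2 * sim$planning + 0.1 * sim$concentration +
  0.15 * sim$planning * sim$concentration
sim$high_perf <- rbinom(n, 1, plogis(eta))

# Center each predictor at its sample mean
sim$planning_c      <- sim$planning - mean(sim$planning)
sim$concentration_c <- sim$concentration - mean(sim$concentration)

m_raw      <- glm(high_perf ~ planning * concentration,
                  data = sim, family = binomial)
m_centered <- glm(high_perf ~ planning_c * concentration_c,
                  data = sim, family = binomial)

# The interaction coefficient is unchanged by centering;
# only the lower-order terms and the intercept change meaning
coef(m_raw)
coef(m_centered)
```

Centering does not change the model fit or the interaction estimate. It only moves the reference point for the lower-order terms from zero to the sample mean.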

At the 5% significance level, none of the predictors reached statistical significance. The interaction term came closest (p = 0.068), indicating a marginal trend rather than strong evidence of an effect. The observed relationships may be real, but the current sample (273 complete cases) does not provide enough evidence to rule out chance.
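
The p-value for the interaction can be reproduced approximately from the printed estimate and standard error. This is a sketch of the Wald test that `summary()` reports for a `glm` fit, using rounded inputs, so the result may differ from the printed value in the last digit. The Wald interval computed here is also not the same as the profile-likelihood interval returned by `confint(model)`.

```r
# Estimate and standard error of planning:concentration, from summary(model)
est <- 0.3109
se  <- 0.1703

# Wald z statistic and two-sided p-value
z <- est / se
p <- 2 * pnorm(-abs(z))

# Wald 95% confidence interval on the odds-ratio scale
wald_ci <- exp(est + c(-1, 1) * qnorm(0.975) * se)

round(c(z = z, p = p), 4)
round(wald_ci, 3)
```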

The model also has several limitations. First, the reduction from null deviance to residual deviance is small (319.03 to 300.07, roughly 6% of the null deviance), indicating that only a limited share of the variation in performance is explained. A larger sample could help clarify the interaction effect. Second, important factors such as prior academic achievement or financial stress were not included, raising the possibility of omitted variable bias. Finally, because both the predictors and the outcome are self-reported, the data may be affected by social desirability bias, with students potentially overstating traits such as concentration or performance.
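
The deviance comparison in the first limitation can be quantified directly from the printed model summary. The sketch below computes a likelihood ratio test against the intercept-only model and McFadden's pseudo R-squared; the variable names are introduced here for illustration, and the inputs are the rounded values from the output above.

```r
# Deviances and degrees of freedom copied from summary(model)
null_dev  <- 319.03
resid_dev <- 300.07
null_df   <- 272
resid_df  <- 268

# Likelihood ratio test of the full model against the intercept-only model
lr_stat <- null_dev - resid_dev
lr_df   <- null_df - resid_df
lr_p    <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# McFadden's pseudo R-squared: share of null deviance removed by the model
mcfadden_r2 <- 1 - resid_dev / null_dev

round(c(lr_stat = lr_stat, lr_df = lr_df, lr_p = lr_p,
        mcfadden_r2 = mcfadden_r2), 4)
```

The two quantities answer different questions. The likelihood ratio test asks whether the predictors jointly improve on the null model, while the pseudo R-squared indicates how much of the null deviance they account for. A model can pass the first test and still explain only a small share of the variation.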