Preface: A Living Record of Learning

These notes serve as a living record of my learning journey, tracing the evolution of my understanding across diverse areas of educational measurement, psychometrics, and applied research practice. They bring together the insights, methods, and discoveries that have emerged through ongoing exploration, experimentation, and reflection.

Rather than a static summary, this collection functions as an iterative archive, a space where concepts are refined, ideas are tested, and frameworks are continually revisited as my knowledge deepens. It reflects not only what I have learned, but also how I have learned: through simulation, coding, and AI-assisted inquiry.

The integration of artificial intelligence tools for drafting, coding, summarization, and idea generation has become a central part of this process, enhancing both productivity and conceptual clarity. Ultimately, these notes document the ongoing synthesis of theory, data, technology, and application that defines my approach to research, learning, and professional growth.


Computerized Adaptive Testing

Computerized Adaptive Testing (CAT) represents a paradigm shift in educational and psychological assessment, transforming the traditional one-size-fits-all model into a personalized, data-driven measurement process. Unlike conventional fixed-form tests, where every examinee receives the same items, CAT dynamically adjusts item difficulty in real time, tailoring the assessment to each examinee’s evolving ability estimate.

At the heart of CAT is an adaptive feedback loop that continually refines the estimate of a learner’s latent ability, \(\theta\), using that information to guide item selection. Instead of a static sequence, the test adapts after every response.

The CAT process iteratively refines the examinee’s ability estimate through these steps:

  1. Initialization: Start with a provisional ability estimate, typically \(\hat{\theta}_0 = 0\)
  2. Item Selection: Identify the item \(i^* = \arg\max_i I_i(\hat{\theta}_t)\), maximizing Fisher Information at the current estimate
  3. Administration: Present item \(i^*\) and record the response \(y_{i^*} \in \{0,1\}\)
  4. Ability Update: Re-estimate \(\hat{\theta}_{t+1}\) using Maximum Likelihood Estimation (MLE) or Bayesian Expected A Posteriori (EAP) methods
  5. Precision Check: Compute the conditional standard error \(SE(\hat{\theta}_{t+1}) = 1 / \sqrt{I_T(\hat{\theta}_{t+1})}\), where \(I_T\) is the test information accumulated over the items administered so far
  6. Termination: Stop if \(SE < \tau\) or the test length reaches a preset limit; otherwise, return to step 2

This formulation defines the mathematical backbone of CAT, formalizing how items are adaptively selected and how ability estimates evolve. The algorithm creates a personalized assessment trajectory:

  • High-ability examinees receive increasingly difficult items to probe their upper limit
  • Low-ability examinees receive easier items to sustain motivation and minimize frustration
  • For all, the system administers items that are psychometrically optimal for their current estimated ability level

This information-driven targeting concentrates measurement precision where it matters for each examinee, allowing CAT to reach a given level of accuracy with far fewer items than a fixed form while keeping precision comparable across the ability range.
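
To make the loop concrete, below is a minimal R sketch of the algorithm under simplifying assumptions: a hypothetical bank of 2PL items with made-up parameters, maximum-information item selection, EAP scoring on a fixed quadrature grid, and a stopping rule of \(SE < 0.3\) or 20 items. It is illustrative only, not an operational CAT engine (which would add exposure control and content constraints).

# Minimal CAT simulation sketch (2PL items, EAP scoring) -- illustrative only.
# Item parameters below are made up for demonstration.
set.seed(1)
n_items <- 200
bank <- data.frame(a = runif(n_items, 0.5, 2.0),   # discrimination
                   b = rnorm(n_items, 0, 1))       # difficulty

p2pl <- function(theta, a, b) 1 / (1 + exp(-1.702 * a * (theta - b)))
info2pl <- function(theta, a, b) {
  p <- p2pl(theta, a, b)
  (1.702 * a)^2 * p * (1 - p)                      # 2PL Fisher information
}

theta_true <- 1.0                                  # simulated examinee ability
grid <- seq(-4, 4, length.out = 121)               # quadrature points for EAP
prior <- dnorm(grid)                               # N(0, 1) prior

theta_hat <- 0                                     # Step 1: provisional estimate
administered <- integer(0)
responses <- integer(0)

for (t in 1:20) {                                  # hard cap on test length
  # Step 2: pick the unused item with maximum information at theta_hat
  available <- setdiff(seq_len(n_items), administered)
  info <- info2pl(theta_hat, bank$a[available], bank$b[available])
  j <- available[which.max(info)]

  # Step 3: administer the item (simulate a response from theta_true)
  y <- rbinom(1, 1, p2pl(theta_true, bank$a[j], bank$b[j]))
  administered <- c(administered, j)
  responses <- c(responses, y)

  # Step 4: EAP update -- posterior over the quadrature grid
  lik <- sapply(grid, function(th) {
    p <- p2pl(th, bank$a[administered], bank$b[administered])
    prod(p^responses * (1 - p)^(1 - responses))
  })
  post <- lik * prior
  post <- post / sum(post)
  theta_hat <- sum(grid * post)

  # Step 5: precision check from the accumulated test information
  se <- 1 / sqrt(sum(info2pl(theta_hat, bank$a[administered], bank$b[administered])))

  # Step 6: stop when the estimate is precise enough
  if (se < 0.3) break
}
c(items_used = length(administered), theta_hat = theta_hat, se = se)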


Advantages of the Adaptive Paradigm

| Advantage | Description |
|---|---|
| Increased Efficiency | Achieves precise measurement with substantially fewer items |
| Enhanced Precision | Targets items to each examinee’s ability range, reducing measurement error |
| Improved Engagement | Maintains motivation by avoiding items that are too easy or too difficult |
| Reduced Testing Time | Shorter tests without sacrificing psychometric rigor |
| Individualized Feedback | Produces detailed diagnostic information about ability levels and subdomains |

Mathematical Foundation: Item Response Theory

CAT is fundamentally grounded in Item Response Theory (IRT), which provides the mathematical framework for modeling the probabilistic relationship between an examinee’s latent ability and their item responses. IRT enables the core adaptive mechanism through several critical properties:

  • Item Parameter Invariance: Item characteristics (discrimination, difficulty, guessing) remain stable across different populations
  • Ability Estimation on a Common Scale: Examinee scores are placed on a shared latent scale, making results comparable even when individuals receive different items
  • Precision of Measurement: For every estimated ability level, \(\hat{\theta}\), the standard error of measurement (SEM) can be computed directly from the test information function
  • Adaptive Targeting: Items are selected to maximize information at the current ability estimate, ensuring each response contributes maximally to refining \(\hat{\theta}\)

The most general model for binary items is the four-parameter logistic (4PL) model. The probability that an examinee with latent ability \(\theta\) answers item \(j\) correctly is:

\[ P_j(\theta) = c_j + (d_j - c_j) \cdot \frac{1}{1 + \exp\big[-D \cdot a_j \cdot (\theta - b_j)\big]}. \]

where:

| Parameter | Meaning | Interpretation | Typical Range |
|---|---|---|---|
| \(\mathbf{a_j}\) | Discrimination | Steepness of the item characteristic curve (ICC); higher values indicate sharper differentiation among ability levels. | 0.2 to 2.5¹²³ |
| \(\mathbf{b_j}\) | Difficulty / Location | Ability level at the ICC’s inflection point; the θ value where the success probability is halfway between the asymptotes \(c_j\) and \(d_j\). | −3.0 to +3.0¹⁴ |
| \(\mathbf{c_j}\) | Lower Asymptote | Probability of a correct response due to guessing; common in multiple-choice items. | 0.00 to 0.35¹⁵ |
| \(\mathbf{d_j}\) | Upper Asymptote | Maximum attainable probability of a correct response, accounting for carelessness or “slip” behavior. | 0.85 to 1.00¹³ |
| \(\mathbf{D}\) | Scaling Constant | Scaling factor that brings the logistic model into close agreement with the normal-ogive model. | ≈ 1.702⁴⁶ |

References

¹ Baker, F. B. (2001). The Basics of Item Response Theory (2nd ed.). ERIC Clearinghouse.
² de Ayala, R. J. (2022). The Theory and Practice of Item Response Theory (2nd ed.). Guilford Press.
³ Embretson, S. E., & Reise, S. P. (2000). Item Response Theory for Psychologists. Lawrence Erlbaum.
⁴ Hambleton, R. K., & Swaminathan, H. (1985). Item Response Theory: Principles and Applications. Kluwer-Nijhoff.
⁵ Kolen, M. J., & Brennan, R. L. (2014). Test Equating, Scaling, and Linking (3rd ed.). Springer.
⁶ Camilli, G. (1994). Origin of the scaling constant D = 1.7 in item response theory. Journal of Educational and Behavioral Statistics, 19(3), 293–295.


Hierarchy of Logistic IRT Models

| Model | Parameters Fixed | Description |
|---|---|---|
| 1PL (Rasch) | \(a_j = a\) (common), \(c_j = 0\), \(d_j = 1\) | All items share the same discrimination; no guessing or slipping. The Rasch model is the special case with \(a = 1\). |
| 2PL | \(c_j = 0, \; d_j = 1\) | Allows item-specific discrimination parameters. |
| 3PL | \(d_j = 1\) | Adds a lower asymptote parameter to model guessing. |
| 4PL | — (fully general) | Adds both lower and upper asymptotes to account for guessing and slipping. |

In practice, the 4PL model is rarely used operationally in educational testing because estimating \(d_j < 1\) reliably requires very large samples and items whose upper asymptote falls clearly below 1 (e.g., because of fatigue, time pressure, or careless errors).

Consequently, most large-scale assessments adopt the 3-Parameter Logistic (3PL) model as the practical upper bound for dichotomous IRT applications. Under the 3PL model, the probability that an examinee with latent ability \(\theta\) answers item \(j\) correctly is:

\[ P_j(\theta) = c_j + (1 - c_j) \cdot \frac{1}{1 + \exp\big[-D \cdot a_j \cdot (\theta - b_j)\big]}. \]
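
As a quick check on these formulas, the short R sketch below (using arbitrary, made-up parameter values) evaluates the 4PL response function and confirms that fixing \(d_j = 1\) reproduces the 3PL, while additionally fixing \(c_j = 0\) gives the 2PL.

# 4PL response function; the 3PL is the special case d = 1, and the 2PL also sets c = 0.
p4pl <- function(theta, a, b, c = 0, d = 1, D = 1.702) {
  c + (d - c) / (1 + exp(-D * a * (theta - b)))
}

theta <- seq(-3, 3, by = 1)
# Arbitrary illustrative parameters: a = 1.2, b = 0.5, c = 0.2, d = 0.95
round(rbind(
  `4PL` = p4pl(theta, a = 1.2, b = 0.5, c = 0.2, d = 0.95),
  `3PL` = p4pl(theta, a = 1.2, b = 0.5, c = 0.2),   # d fixed at 1
  `2PL` = p4pl(theta, a = 1.2, b = 0.5)             # c = 0, d = 1
), 3)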


Item Response Function Explorer

This Item Response Function Explorer is an interactive Shiny app for visualizing logistic Item Response Theory (IRT) models. It lets users manipulate the key parameters (discrimination \(a\), difficulty \(b\), guessing \(c\), carelessness \(d\), and the scaling constant \(D\)) and see how each influences the probability curve \(P(\theta)\). By selecting a specific ability level, users can read off the predicted probability of a correct response and examine how 3PL and 4PL models differ in slope, location, and asymptotic behavior.

To launch the Item Response Function Explorer on your computer:

Run Locally

  1. Create an empty folder.
  2. Save the code below as app.R inside it.
  3. Open in RStudio → click Run App.

Web Version

Click the “IRT Explorer Interface” screenshot below to open the hosted app (no R installation needed; hosted on shinyapps.io).

[Screenshot: IRT Explorer Interface]

# =============================================
# Item Response and Item Information Explorer
# =============================================

library(shiny)
library(bslib)
library(shape)
library(shinyjs)
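
# ---- Small helpers ----
# `%or%` falls back to y when x is NULL; clamp() restricts x to the range [lo, hi].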
`%or%` <- function(x, y) if (is.null(x)) y else x
clamp <- function(x, lo, hi) pmax(lo, pmin(hi, x))

# ---- IRT core ----
P_4pl <- function(theta,
                  a = 1,
                  b = 0,
                  c = 0,
                  d = 1,
                  D = 1) {
  c + (d - c) / (1 + exp(-D * a * (theta - b)))
}
I_4pl <- function(theta,
                  a = 1,
                  b = 0,
                  c = 0,
                  d = 1,
                  D = 1) {
  # 4PL response probability: P(θ) = c + (d - c) / (1 + exp(-D * a * (θ - b)))
  P  <- P_4pl(theta, a, b, c, d, D)
  # Guard against exact 0 or 1 probabilities so the division below stays finite
  P  <- pmin(pmax(P, 1e-12), 1 - 1e-12)
  # Core logistic component s(θ) in (0, 1), so that P(θ) = c + (d - c) * s(θ)
  s  <- 1 / (1 + exp(-D * a * (theta - b)))
  # First derivative: P'(θ) = (d - c) * D * a * s(θ) * (1 - s(θ))
  dP <- (d - c) * D * a * s * (1 - s)
  # Item information: I(θ) = [P'(θ)]^2 / (P(θ) * (1 - P(θ)))
  (dP^2) / (P * (1 - P))
}

# ---- D-scale presets ----
D_map <- c(
  "Rasch (1.0)" = 1.0,
  "Haley-Birnbaum (1.702)" = 1.702,
  "Kullback–Leibler (1.749)" = 1.749
)

# ===========================================
# ---- UI DEFINITION SECTION ----
# ===========================================

ui <- page_fluid(
  theme = bs_theme(version = 5, bootswatch = "flatly"),
  shinyjs::useShinyjs(),
  withMathJax(),
  # Add the CSS at the top level of the UI
  tags$style(
    HTML(
      "
    #model .selectize-input, #Dscale .selectize-input {
      height: 20px;
      min-height: 20px;
      padding: 4px 8px;
      font-size: 0.85em;
      line-height: 1.3;
    }
    #model .selectize-input:after, #Dscale .selectize-input:after {
      top: 12px;
    }
    .slider-separator {
      border-top: 1px solid #dee2e6;
      margin: 10px 0;
    }
    /* Center sliders with 90% width */
    .shiny-input-container {
      width: 90% !important;
      margin-left: auto !important;
      margin-right: auto !important;
    }
      .slim-slider {
        padding-top: 2px !important;
        padding-bottom: 2px !important;
        margin-top: -5px !important;
        margin-bottom: -5px !important;
      }
      .slim-slider .irs--shiny .irs-bar {
        top: 25px;
        height: 3px;
      }
      .slim-slider .irs--shiny .irs-handle {
        top: 20px;
        width: 12px;
        height: 12px;
      }

    /* Reduce width of info boxes to match sliders */
    .info-box {
      width: 90% !important;
      margin-left: auto !important;
      margin-right: auto !important;
    }
    /* --- INFO PAGE STYLES (elegant, restrained) --- */
    .info-wrap {
      padding: 14px 16px;
      max-width: 980px;
      margin: 0 auto;
    }
    .info-wrap h4, .info-wrap h5 { margin-top: 0.6rem; margin-bottom: 0.4rem; }
    .info-wrap p { margin-bottom: 0.5rem; color: #333; }
    .muted { color: #6c757d; }
    .info-callout {
      font-size: 0.95em;
      color: #0c5460;
      background-color: #d1ecf1;
      border: 1px solid #bee5eb;
      border-radius: 8px;
      padding: 12px 14px;
      margin: 8px 0 10px 0;
    }
    .info-grid-3 {
      display: grid;
      grid-template-columns: repeat(3, 1fr);
      gap: 10px;
    }
    .info-card {
      border: 1px solid #e9ecef;
      border-radius: 12px;
      padding: 12px;
      background: #fff;
      box-shadow: 0 0 0 rgba(0,0,0,0);
    }
    .k-blue { color: #219ebc; }
    .k-orange { color: #e76f51; }
    .k-green { color: #2a9d8f; }
    .k-gray { color: #6c757d; }
    .k-purple { color: #9c27b0; }
    .k-badge {
      display: inline-block;
      font-size: 0.75em;
      padding: 2px 8px;
      border-radius: 999px;
      border: 1px solid #e9ecef;
      background: #f8f9fa;
      color: #495057;
      margin-left: 6px;
      vertical-align: middle;
    }
    .info-table {
      width: 100%;
      border-collapse: separate;
      border-spacing: 0;
      font-size: 0.95em;
      margin-top: 6px;
    }
    .info-table th, .info-table td {
      padding: 10px 12px;
      border-top: 1px solid #e9ecef;
      vertical-align: top;
    }
    .info-table th {
      background: #f7f9fb;
      font-weight: 600;
      color: #343a40;
    }
    .info-table tr:first-child th, .info-table tr:first-child td { border-top: none; }
    .tight-ul { margin: 0.25rem 0 0.6rem 1.1rem; padding-left: 0.8rem; }
    .tight-ul li { margin: 0.15rem 0; }
    hr.soft { border: none; border-top: 1px solid #eff2f5; margin: 12px 0; }
    code, .math-inline { background: #f1f3f5; padding: 1px 6px; border-radius: 6px; }
  "
    )
  ),
  layout_columns(
    col_widths = c(5, 7),
    card(
      card_header(
        style = "background: linear-gradient(135deg, #f8f9fa 0%, #e9ecef 100%); border-bottom: 3px solid #219ebc;",
        tags$div(
          style = "text-align: center;",
          tags$h4("ITEM PARAMETERS", style = "margin: 0; color: #219ebc; font-weight: 600;"),
        )
      ),      
      
      navset_card_underline(
        nav_panel(
          "Model",
          # Model and D-scale on the same line with proper spacing
          div(
            style = "display: flex;
                         align-items: center;
                         justify-content: space-between;
                         margin: 4px 0;
                         gap: 24px;",
            div(
              style = "display: flex;
                             align-items: center;
                             gap: 6px;
                             flex: 0.7;
                             margin-left: 8px;",
              tags$label(
                "Model:",
                style = "margin-bottom: 0;
                                        white-space: nowrap;
                                        font-size: 0.75em;
                                        align-self: center;
                                        font-weight: bold;"
              ),
              selectInput(
                "model",
                label = NULL,
                choices = c("1PL", "2PL", "3PL", "4PL"),
                selected = "2PL",
                width = "100%"
              )
            ),
            div(
              style = "display: flex; align-items: baseline; gap: 6px; flex: 1.3; margin-right: 8px;",
              tags$label(
                "D:",
                style = "margin-bottom: 0;
                                        white-space: nowrap;
                                        font-size: 0.75em;
                                        align-self: center;
                                        font-weight: bold;"
              ),
              selectInput(
                "Dscale",
                label = NULL,
                choices = names(D_map),
                selected = "Haley-Birnbaum (1.702)",
                width = "100%"
              )
            )
          ),
          tags$hr(style = "margin: 3px 0;"),
          # Only show modifiable parameter sliders
          uiOutput("parameter_sliders"),
          # Model Info messages at bottom of Model tab
          uiOutput("model_info")
        ),
        nav_panel(
          "Display",
          # Transparency sliders with bold labels and separator
          sliderInput(
            "alpha_icc",
            tags$span(style = "font-size: 0.60em;
                      font-weight: bold;", "ICC Opacity"),
            min = 0.0,
            max = 1.0,
            value = 1.0,
            step = 0.05
          ),
          # Add thin line separator between opacity sliders
          div(class = "slider-separator"),
          sliderInput(
            "alpha_iif",
            tags$span(style = "font-size: 0.60em;
                      font-weight: bold;", "IIF Opacity"),
            min = 0.0,
            max = 1.0,
            value = .25,
            step = 0.05
          ),
          # Add thin line separator before annotation controls
          div(class = "slider-separator"),
          # Annotation Text Size control
          sliderInput(
            "annotation_size",
            tags$span(style = "font-size: 0.60em;
                      font-weight: bold;", "Annotation Text Size"),
            min = 0.8,
            max = 2.0,
            value = 1.25,
            step = 0.05
          ),
          # Info box explaining opacity sliders - reduced width
          div(class = "slider-separator"),
          div(
            class = "info-box",
            style = "font-size: 0.70em;
             color: #004085;
             background-color: #cce5ff;
             border: 1px solid #b8daff;
             border-radius: 4px;
             padding: 8px;
             margin-top: 8px;",
            tags$strong("Visualization Controls:"),
            "Adjust layer opacity and display settings. 
            Use opacity to focus on specific curves: 
            decrease to fade into background, set to 0 
            to remove from view. Annotation Text Size 
            controls the size of value labels and probe indicators."
          )
        )
      )
    ),
    
    # ----- Main panel -----
    card(
      card_header(
        style = "background: linear-gradient(135deg, #f8f9fa 0%, #e9ecef 100%); border-bottom: 3px solid #219ebc;",
        tags$div(
          style = "text-align: center;",
          tags$h4("ITEM RESPONSE & INFORMATION FUNCTIONS", style = "margin: 0; color: #219ebc; font-weight: 600;"),
        )
      ),      
      navset_card_pill(        
        # ---- Explorer tab ----
        nav_panel(
          "Explorer",
          plotOutput("plot_icc_iif", height = "330px"),
          div(
            style = "display: flex; justify-content: center;",
            div(
              style = "width: 78%;",
              sliderInput(
                "theta_probe",
                tags$span("Ability Probe (θ):", style = "font-weight: 600; text-shadow: 0.5px 0.5px 1px rgba(0,0,0,0.1);"),
                min = -3,
                max = 3,
                value = 0,
                step = 0.01
              )
            )
          )
        ),
        
        # ---- Info tab (redesigned) ----
        nav_panel(
          "Theory",
          div(
            class = "info-wrap",
            style = "max-width: 600px; margin: 0 auto;",
            # Header Section
            div(
              style = "text-align: center; margin-bottom: 2rem;",
              h4("Item Response Theory Explorer Guide", style = "color: #219ebc; margin-bottom: 0.5rem;"),
              p("Comprehensive documentation of IRT models, parameters, and mathematical foundations", class = "muted")
            ),
            
            # Overview Section
            div(
              style = "background: #f8f9fa; padding: 1.5rem; border-radius: 8px; margin-bottom: 2rem;",
              h4("Overview", style = "text-align: center; margin-bottom: 1.5rem;"),
              div(
                class = "info-card",
                style = "border-left: 4px solid #219ebc; margin-bottom: 1rem;",
                h5("What", class = "k-blue"),
                p(
                  "This explorer overlays an item's response curve (IRF/ICC) with its information curve (IIF) for a chosen IRT model and scaling constant, D."
                )
              ),
              div(
                class = "info-card",
                style = "border-left: 4px solid #2a9d8f; margin-bottom: 1rem;",
                h5("Why", class = "k-green"),
                p(
                  "IIF shows where an item measures ability most precisely. CAT engines pick items that maximize information near the current ability estimate."
                )
              ),
              div(
                class = "info-card",
                style = "border-left: 4px solid #e76f51;",
                h5("How", class = "k-orange"),
                p(
                  "Adjust discrimination, difficulty, guessing, slipping and scaling constant (D) to see how IRF shape and IIF peak shift; use the ability probe (θ) to read exact values."
                )
              )
            ),
            
            # Parameter Glossary
            div(
              class = "info-callout",
              style = "margin-bottom: 2rem;",
              h5("Parameter glossary"),
              tags$ul(
                class = "tight-ul",
                tags$li(
                  tags$b("Discrimination (a): "),
                  "slope at b; higher a ⇒ steeper IRF and taller/narrower IIF."
                ),
                tags$li(
                  tags$b("Difficulty (b): "),
                  "location where the IRF transitions most rapidly; IIF typically peaks near b (for 2PL)."
                ),
                tags$li(
                  tags$b("Guessing (c): "),
                  "lower bound of IRF; increases P(θ) at low θ but decreases information there."
                ),
                tags$li(
                  tags$b("Slipping/carelessness (d): "),
                  "upper bound of IRF; reduces maximum P(θ) and shifts/lowers IIF peak."
                ),
                tags$li(
                  tags$b("Scaling constant (D): "),
                  "logistic-to-normal scaling constant that affects steepness. See below."
                )
              )
            ),
            
            
            # Models & Parameters Section
            div(
              style = "margin-bottom: 2rem;",
              div(
                style = "display: flex; align-items: center; justify-content: space-between; margin-bottom: 1rem;",
                h4("Models & Parameters", style = "margin: 0;"),
                span("quick guide", class = "k-badge")
              ),
              tags$table(
                class = "info-table",
                style = "box-shadow: 0 2px 8px rgba(0,0,0,0.1); border-radius: 8px; overflow: hidden;",
                tags$thead(tags$tr(
                  tags$th("Model", style = "background: #219ebc; color: white;"),
                  tags$th("Free Parameters", style = "background: #219ebc; color: white;"),
                  tags$th("Interpretation", style = "background: #219ebc; color: white;")
                )),
                tags$tbody(
                  tags$tr(style = "background: #f8f9fa;",
                          tags$td(style = "font-weight: 600;", "1PL / Rasch"),
                          tags$td("b; a fixed (parallel ICCs); c=0; d=1"),
                          tags$td("Item difficulty only; equal discrimination across items.")
                  ),
                  tags$tr(style = "background: #ffffff;",
                          tags$td(style = "font-weight: 600;", "2PL"),
                          tags$td("a, b; c=0; d=1"),
                          tags$td("Adds discrimination; steeper slope ⇒ more information near b.")
                  ),
                  tags$tr(style = "background: #f8f9fa;",
                          tags$td(style = "font-weight: 600;", "3PL"),
                          tags$td("a, b, c; d=1"),
                          tags$td(
                            "Adds lower asymptote (guessing). Reduces information at very low θ."
                          )
                  ),
                  tags$tr(style = "background: #ffffff;",
                          tags$td(style = "font-weight: 600;", "4PL"),
                          tags$td("a, b, c, d"),
                          tags$td("Adds upper asymptote (slipping). Caps peak probability below 1.")
                  )
                )
              )
            ),
            
            # Scaling Constants
            div(
              class = "info-card",
              style = "border-left: 4px solid #2a9d8f; margin-bottom: 2rem;",
              h5("Scaling constants ", code("D"), class = "k-green"),
              p("Common choices are:"),
              tags$ul(
                class = "tight-ul",
                tags$li(
                  tags$b("Rasch (1.0): "),
                  "baseline logistic scale; typical in pure Rasch contexts."
                ),
                tags$li(
                  tags$b("Haley-Birnbaum (1.702): "),
                  "matches logistic to normal-ogive slope near the origin; widely used in practice."
                ),
                tags$li(
                  tags$b("Kullback–Leibler (1.749): "),
                  "optimizes logistic-normal approximation under a KL criterion; slightly steeper than 1.702."
                )
              ),
              p(
                class = "muted",
                "Changing ",
                code("D"),
                " uniformly rescales slopes: larger D ⇒ steeper logistic ⇒ higher information everywhere other parameters fixed."
              )
            ),
            
            # Core Formulas Section
            div(
              style = "background: #f8f9fa; padding: 1.5rem; border-radius: 8px; margin-bottom: 2rem;",
              h4("Core Formulas", style = "text-align: center; margin-bottom: 1.5rem;"),
              
              p(
                "These mathematical foundations define how item response theory models the relationship between ability (θ) and 
    the probability of a correct response. The formulas build upon each other to ultimately quantify measurement precision."
              ),
              
              div(
                class = "info-card",
                style = "border-left: 4px solid #219ebc; margin-bottom: 1.5rem;",
                h5("Item Response Function (IRF/ICC)", class = "k-blue"),
                p(
                  "The IRF gives the probability of a correct response P(θ) as a function of ability θ. 
      It's bounded between the guessing parameter c (lower asymptote) and slipping parameter d (upper asymptote), 
      with the logistic curve's steepness controlled by discrimination a and location by difficulty b."
                ),
                HTML("
                  \\[
                    P(\\theta) = c + \\frac{d - c}{1 + e^{-D \\cdot a \\cdot (\\theta - b)}}
                  \\]
                ")
              ),
              
              div(
                class = "info-card",
                style = "border-left: 4px solid #e76f51; margin-bottom: 1.5rem;",
                h5("Derivative of IRF", class = "k-orange"),
                p(
                  "The derivative P'(θ) measures how rapidly the probability changes at each ability level. 
      It reaches its maximum at the difficulty parameter b, where the item is most sensitive to 
      changes in ability. This slope is crucial for computing information."
                ),
                HTML("
                  \\[
                    P'(\\theta) = (d - c)\\cdot D \\cdot a \\cdot s(\\theta)\\,\\bigl(1 - s(\\theta)\\bigr), \\\\[18pt]
                    \\text{where} \\: s(\\theta) = \\frac{1}{1 + e^{-D \\cdot a \\cdot (\\theta - b)}}
                  \\]
                ")
              ),
              
              div(
                class = "info-card",
                style = "border-left: 4px solid #2a9d8f; margin-bottom: 1.5rem;",
                h5("Item Information Function (IIF)", class = "k-green"),
                p(
                  "Fisher information quantifies how much an item contributes to measuring ability at each θ level. 
      It peaks where the item is most informative—typically near the difficulty parameter b—and depends 
      on both the slope (derivative) and the variance of responses at each ability level."
                ),
                HTML("
                  \\[
                    I(\\theta) = \\frac{\\bigl[P'(\\theta)\\bigr]^2}{P(\\theta) \\cdot \\big[1 - P(\\theta)\\big]}
                  \\]
                ")
              ),
              
              # Test-Level Measurement Section
              div(
                class = "info-card",
                style = "border-left: 4px solid #2a9d8f; margin-bottom: 2rem;",
                h5("From Items to Test-Level Measurement", class = "k-green"),
                p(
                  "While individual items provide limited information, combining them across a test yields comprehensive ",
                  "measurement precision across the ability spectrum. The test information function aggregates ",
                  "item information to determine overall measurement accuracy."
                ),
                
                div(
                  style = "background: #f1f3f5; padding: 1rem; border-radius: 6px; margin: 1rem 0;",
                  p(
                    tags$b("For a test with items j=1..J, the Test Information Function (TIF) is the sum:"),
                    style = "margin-bottom: 0.5rem;"
                  ),
                  HTML("
                    \\[
                      I_T(\\theta) = \\sum_{j=1}^J I_j(\\theta)
                    \\]
                  ")
                ),
                
                div(
                  class = "info-grid-3",
                  style = "grid-template-columns: 1fr; gap: 1rem; margin: 1rem 0;",
                  
                  div(
                    style = "background: #e8f4f8; padding: 1rem; border-radius: 6px;",
                    h6("Standard Error of Measurement", style = "color: #219ebc; margin-top: 0;"),
                    HTML("\\[ \\mathrm{SEM}(\\theta) = \\frac{1}{\\sqrt{I_T(\\theta)}} \\]"),
                    p(
                      "Quantifies measurement precision: smaller SEM indicates more precise ability estimates.",
                      style = "font-size: 0.9em; margin-bottom: 0;"
                    )
                  ),
                  
                  div(
                    style = "background: #e8f4e8; padding: 1rem; border-radius: 6px;",
                    h6("Conditional Reliability", style = "color: #2a9d8f; margin-top: 0;"),
                    HTML("\\[ \\rho(\\theta) = \\frac{I_T(\\theta)}{1 + I_T(\\theta)} \\]"),
                    p(
                      "Measures test reliability at each ability level, ranging from 0 to 1.",
                      style = "font-size: 0.9em; margin-bottom: 0;"
                    )
                  )
                ),
                
                p(
                  class = "muted",
                  style = "font-size: 0.9em; border-top: 1px solid #dee2e6; padding-top: 0.5rem;",
                  "These test-level metrics show how item properties collectively determine overall measurement quality."
                )
              ),
              
              # Visual Interpretation Guide
              div(
                class = "info-callout",
                style = "margin-bottom: 2rem;",
                h5("Interpreting the Visual Display"),
                p(
                  "The explorer shows both curves simultaneously to illustrate the relationship between ",
                  "response probability and measurement precision."
                ),
                
                div(
                  class = "info-grid-3",
                  style = "grid-template-columns: 1fr; gap: 0.5rem; margin-top: 1rem;",
                  
                  div(
                    style = "display: flex; align-items: flex-start; gap: 0.75rem;",
                    div(style = "color: #219ebc; font-weight: bold; min-width: 100px;", "IRF/ICC Curve:"),
                    div(
                      p(
                        tags$b("Left y-axis:"), "Probability of correct response P(θ), bounded between c and d",
                        style = "margin: 0.25rem 0;"
                      ),
                      p(
                        tags$b("Shape:"), "S-shaped curve showing how probability changes with ability",
                        style = "margin: 0.25rem 0;"
                      )
                    )
                  ),
                  
                  div(
                    style = "display: flex; align-items: flex-start; gap: 0.75rem;",
                    div(style = "color: #e76f51; font-weight: bold; min-width: 100px;", "IIF Curve:"),
                    div(
                      p(
                        tags$b("Right y-axis:"), "Measurement precision I(θ), rescaled for visualization",
                        style = "margin: 0.25rem 0;"
                      ),
                      p(
                        tags$b("Exact values:"), "Displayed at the probe location for precise reading",
                        style = "margin: 0.25rem 0;"
                      )
                    )
                  )
                ),
                
                div(
                  style = "background: #f8f9fa; padding: 1rem; border-radius: 6px; margin-top: 1rem;",
                  h6("Key Relationships to Observe:", style = "margin-top: 0;"),
                  tags$ul(
                    class = "tight-ul",
                    tags$li(
                      tags$b("Peak information:"), "Typically occurs where the IRF is steepest (near difficulty b)"
                    ),
                    tags$li(
                      tags$b("Discrimination effect:"), "Higher a creates taller, narrower information peaks"
                    ),
                    tags$li(
                      tags$b("Asymptote impact:"), "Guessing (c) and slipping (d) flatten and shift information curves"
                    )
                  )
                )
              ),
              
              # Practical Applications
              div(
                class = "info-card",
                style = "border-left: 4px solid #e76f51; margin-bottom: 2rem;",
                h5("Practical Applications & Best Practices", class = "k-orange"),
                
                div(
                  style = "background: #fff0e8; padding: 1rem; border-radius: 6px; margin-bottom: 1rem;",
                  h6("Test Design Strategies", style = "color: #e76f51; margin-top: 0;"),
                  tags$ul(
                    class = "tight-ul",
                    tags$li(
                      tags$b("Targeted coverage:"), "Spread difficulty (b) values across your ability range of interest"
                    ),
                    tags$li(
                      tags$b("CAT optimization:"), "Select items with high IIF at current ability estimates"
                    ),
                    tags$li(
                      tags$b("Efficient measurement:"), "Use high-discrimination items where precise measurement is critical"
                    )
                  )
                ),
                
                div(
                  style = "background: #fff8e8; padding: 1rem; border-radius: 6px; margin-bottom: 1rem;",
                  h6("Parameter Considerations", style = "color: #d4a017; margin-top: 0;"),
                  tags$ul(
                    class = "tight-ul",
                    tags$li(
                      tags$b("3PL caution:"), "High guessing (c) inflates low-ability scores but reduces information"
                    ),
                    tags$li(
                      tags$b("4PL adjustment:"), "Slipping (d<1) caps maximum performance and lowers information peaks"
                    ),
                    tags$li(
                      tags$b("Negative discrimination:"), "Indicates potential reverse-coding or item quality issues"
                    )
                  )
                )
              ),
              
              # Common Pitfalls
              div(
                class = "info-callout",
                style = "background-color: #fff3cd; border-color: #ffeaa7; margin-bottom: 2rem;",
                h5("Common Pitfalls & Misinterpretations"),
                p(
                  "Avoid these frequent misunderstandings when working with IRT models and information functions:"
                ),
                
                div(
                  class = "info-grid-3",
                  style = "grid-template-columns: 1fr; gap: 1rem; margin-top: 1rem;",
                  
                  div(
                    style = "background: #fffbf0; padding: 1rem; border-radius: 6px;",
                    h6("Difficulty Interpretation", style = "color: #856404; margin-top: 0;"),
                    p(
                      "Don't infer difficulty from a single probability value. True difficulty is where ",
                      "the IRF is steepest (near b), not just P(θ) at one θ.",
                      style = "margin-bottom: 0; font-size: 0.9em;"
                    )
                  ),
                  
                  div(
                    style = "background: #fffbf0; padding: 1rem; border-radius: 6px;",
                    h6("Scale Consistency", style = "color: #856404; margin-top: 0;"),
                    p(
                      "Keep scaling constant D consistent when comparing information across items or tests. ",
                      "Different D values rescale slopes and information values.",
                      style = "margin-bottom: 0; font-size: 0.9em;"
                    )
                  ),
                  
                  div(
                    style = "background: #fffbf0; padding: 1rem; border-radius: 6px;",
                    h6("Asymptote Awareness", style = "color: #856404; margin-top: 0;"),
                    p(
                      "Account for guessing (c) and slipping (d) parameters—they fundamentally change ",
                      "the effective measurement range and information profile.",
                      style = "margin-bottom: 0; font-size: 0.9em;"
                    )
                  )
                )
              ),
              
              # In the Theory tab, replace the References section with this:
              
              # References Section
              div(
                style = "background: #f8f9fa; padding: 1.5rem; border-radius: 8px;",
                h4("References & Further Reading"),
                p(
                  "For deeper exploration of item response theory and its applications:"
                ),
                
                div(
                  class = "info-grid-3",
                  style = "grid-template-columns: 1fr; gap: 0.5rem; margin: 1rem 0;",
                  
                  div(
                    style = "border-left: 3px solid #219ebc; padding-left: 1rem;",
                    p(
                      tags$b("Baker & Kim (2004)"),
                      br(),
                      "Item Response Theory: Parameter Estimation Techniques",
                      style = "margin: 0; font-size: 0.9em;"
                    )
                  ),
                  
                  div(
                    style = "border-left: 3px solid #2a9d8f; padding-left: 1rem;",
                    p(
                      tags$b("de Ayala (2009)"),
                      br(),
                      "The Theory and Practice of Item Response Theory",
                      style = "margin: 0; font-size: 0.9em;"
                    )
                  ),
                  
                  
                  div(
                    style = "border-left: 3px solid #2a9d8f; padding-left: 1rem;",
                    p(
                      tags$b("Hambleton et al. (1991)"),
                      br(),
                      "Fundamentals of Item Response Theory",
                      style = "margin: 0; font-size: 0.9em;"
                    )
                  ),
                  
                  div(
                    style = "border-left: 3px solid #6c757d; padding-left: 1rem;",
                    p(
                      tags$b("Lord (1980)"),
                      br(),
                      "Applications of Item Response Theory to Practical Testing Problems",
                      style = "margin: 0; font-size: 0.9em;"
                    )
                   )
                ),
                
                # Updated call-to-action with reference to RPubs
                div(
                  style = "text-align: center; margin-top: 1.5rem; padding-top: 1.5rem; border-top: 2px solid #dee2e6;",
                  p(
                    class = "muted",
                    "For comprehensive documentation, mathematical derivations, and practical examples, visit the ",
                    tags$a(
                      href = "https://rpubs.com/castro/irt",
                      target = "_blank",
                      "Interactive IRT Explorer Guide on RPubs",
                      style = "color: #9c27b0; font-weight: 600;"
                    ),
                    ". Use the 'Explorer' tab above to interact with these concepts visually."
                  )
                )
              )            
            )
          )
        ),
        
        # ---- FAQ tab ----
        nav_panel(
          "FAQ",
          div(
            class = "info-wrap",
            style = "max-width: 980px; margin: 0 auto;",
            
            # Header
            div(
              style = "text-align: center; margin-bottom: 2rem; border-bottom: 2px solid #f1f3f5; padding-bottom: 1rem;",
              h4("Frequently Asked Questions", style = "color: #219ebc; margin-bottom: 0.5rem;"),
              p("Detailed explanations of IRT concepts and mathematical foundations", class = "muted")
            ),
            
            # FAQ 1: Discrimination and Slope Relationship
            div(
              class = "info-card",
              style = "border-left: 4px solid #219ebc; margin-bottom: 2rem;",
              h5("What is the relationship between discrimination and the slope of the item response function?", class = "k-blue"),
              
              p("The discrimination parameter (a) directly controls the steepness of the item response function (IRF). Higher discrimination values create steeper curves, while lower values create flatter curves."),
              
              div(
                style = "background: #f8f9fa; padding: 1.5rem; border-radius: 8px; margin: 1rem 0;",
                h6("Mathematical Relationship", style = "margin-top: 0;"),
                HTML("
          \\[
            \\text{Slope at } \\theta = b = \\frac{D \\cdot a \\cdot (d-c)}{4}
          \\]
        "),
                p("Where:"),
                tags$ul(
                  class = "tight-ul",
                  tags$li(tags$code("a"), " is the discrimination parameter"),
                  tags$li(tags$code("D"), " is the scaling constant (1.0, 1.702, or 1.749)"), 
                  tags$li(tags$code("d-c"), " is the range between upper and lower asymptotes"),
                  tags$li(tags$code("b"), " is the difficulty parameter where slope is maximized")
                )
              ),
              
              div(
                style = "background: #e8f4f8; padding: 1rem; border-radius: 6px; margin: 1rem 0;",
                h6("Key Insights:", style = "margin-top: 0; color: #219ebc;"),
                tags$ul(
                  class = "tight-ul",
                  tags$li(tags$b("Linear relationship: "), "Slope is directly proportional to discrimination (a)"),
                  tags$li(tags$b("Maximum slope at b: "), "The IRF is steepest exactly at the difficulty parameter"),
                  tags$li(tags$b("Asymptote effect: "), "The maximum possible slope is limited by (d-c) - items with guessing/slipping have reduced maximum slopes"),
                  tags$li(tags$b("D scaling: "), "The scaling constant amplifies or reduces the effective discrimination")
                )
              ),
              
              div(
                class = "info-callout",
                style = "margin-top: 1rem;",
                h5("Key Takeaway"),
                p(
                  "The discrimination parameter ",
                  tags$code("a"),
                  " is fundamentally a slope parameter. It controls how rapidly the probability of a correct response changes with ability. ",
                  "Higher discrimination means steeper slopes, which translates to better measurement precision and higher information at the item's most informative point."
                )
              )
            ),
            
            # FAQ 2: Why Information Peaks at Difficulty
            div(
              class = "info-card",
              style = "border-left: 4px solid #2a9d8f; margin-bottom: 2rem;",
              h5("Why does the information function typically peak near the difficulty parameter?", class = "k-green"),
              
              p("The item information function (IIF) reaches its maximum where the item provides the most precise measurement of ability. For most IRT models, this occurs near the difficulty parameter (b) because:"),
              
              tags$ul(
                class = "tight-ul",
                tags$li(tags$b("Maximum slope: "), "The item response function (IRF) is steepest at θ = b, meaning probability changes most rapidly with small changes in ability"),
                tags$li(tags$b("Optimal discrimination: "), "Items best distinguish between examinees of slightly different abilities where P(θ) ≈ 0.5"),
                tags$li(tags$b("Mathematical derivation: "), "For the 2PL model, the IIF is maximized exactly at θ = b")
              ),
              
              div(
                style = "background: #f8f9fa; padding: 1rem; border-radius: 6px; margin: 1rem 0;",
                HTML("
          \\[
            I(\\theta) = \\frac{[P'(\\theta)]^2}{P(\\theta)[1-P(\\theta)]}
          \\]
        "),
                p("Where P'(θ) is the derivative of the response function. This reaches maximum where the slope is steepest and P(θ) is not too close to 0 or 1.")
              ),
              
              p("However, in 3PL and 4PL models, guessing and slipping parameters can shift the peak away from b, as they affect both the slope and the variance terms.")
            ),
            
            # FAQ 3: Effect of Guessing on Information
            div(
              class = "info-card",
              style = "border-left: 4px solid #e76f51; margin-bottom: 2rem;",
              h5("How does guessing affect the information function?", class = "k-orange"),
              
              p("Guessing parameter (c > 0) significantly impacts the information function in several ways:"),
              
              tags$ul(
                class = "tight-ul", 
                tags$li(tags$b("Reduces peak information: "), "The maximum information value decreases because guessing adds noise to the measurement"),
                tags$li(tags$b("Shifts information leftward: "), "The information peak moves to higher ability levels as low-ability examinees benefit from guessing"),
                tags$li(tags$b("Flattens the curve: "), "Information becomes more spread out across the ability scale"),
                tags$li(tags$b("Mathematical reason: "), "Guessing increases the minimum probability P(θ), which increases the denominator in the information function formula")
              ),
              
              div(
                style = "background: #f8f9fa; padding: 1rem; border-radius: 6px; margin: 1rem 0;",
                p(tags$b("Practical implication:"), "Items with high guessing parameters are less useful for precise measurement at any ability level, but particularly at low abilities where guessing dominates.")
              )
            ),
            
            # FAQ 4: Model Selection Guidance
            div(
              class = "info-card", 
              style = "border-left: 4px solid #9c27b0; margin-bottom: 2rem;",
              h5("When should I use 2PL vs 3PL vs 4PL models?", class = "k-purple"),
              
              p("Model choice depends on your testing context, sample size, and theoretical considerations:"),
              
              div(
                style = "background: #f8f9fa; padding: 1rem; border-radius: 6px; margin: 1rem 0;",
                h6("2PL Model (Recommended starting point)", style = "margin-top: 0;"),
                tags$ul(
                  class = "tight-ul",
                  tags$li(tags$b("When: "), "Guessing is not a major concern, large sample size (>500)"),
                  tags$li(tags$b("Advantages: "), "Simpler estimation, fewer parameters, good for most educational testing"),
                  tags$li(tags$b("Limitations: "), "May misfit if guessing occurs")
                )
              ),
              
              div(
                style = "background: #f8f9fa; padding: 1rem; border-radius: 6px; margin: 1rem 0;", 
                h6("3PL Model (Multiple-choice tests)", style = "margin-top: 0;"),
                tags$ul(
                  class = "tight-ul",
                  tags$li(tags$b("When: "), "Multiple-choice items, guessing likely, very large sample size (>1000)"),
                  tags$li(tags$b("Advantages: "), "Accounts for guessing, more realistic for low abilities"),
                  tags$li(tags$b("Limitations: "), "Harder to estimate, parameter correlations can cause issues")
                )
              ),
              
              div(
                style = "background: #f8f9fa; padding: 1rem; border-radius: 6px; margin: 1rem 0;",
                h6("4PL Model (Specialized applications)", style = "margin-top: 0;"),
                tags$ul(
                  class = "tight-ul", 
                  tags$li(tags$b("When: "), "Careless errors expected, very large samples, specialized testing"),
                  tags$li(tags$b("Advantages: "), "Most flexible, accounts for both guessing and slipping"),
                  tags$li(tags$b("Limitations: "), "Very hard to estimate, often overparameterized")
                )
              ),
              
              p(tags$b("General guideline:"), "Start with 2PL, use 3PL only if guessing is substantial, and use 4PL sparingly with strong theoretical justification.")
            ),
            
            # FAQ 5: Negative Discrimination
            div(
              class = "info-card",
              style = "border-left: 4px solid #e76f51; margin-bottom: 2rem;",
              h5("FAQ: Why Is Fisher Information Positive When Discrimination (a) Is Negative?", class = "k-orange"),
              
              p(
                "It can be surprising that an item with a negative discrimination parameter \\((a < 0)\\) still shows a positive item information function (IIF). 
        This is not a bug: it is a direct consequence of how Fisher information is defined."
              ),
              
              div(
                style = "background: #f8f9fa; padding: 1.5rem; border-radius: 8px; margin: 1rem 0;",
                h6("1. Start from the 4PL item response function", style = "margin-top: 0;"),
                HTML("
          \\[
            P(\\theta) 
            = c + (d - c)\\cdot s(\\theta), \\text{where}
            \\quad
            s(\\theta) = \\frac{1}{1 + e^{-D \\cdot a \\cdot (\\theta - b)}}.
          \\]
        "),
                p(
                  "Here a is the discrimination, b the difficulty, and c,d the lower and upper asymptotes."
                )
              ),
              
              div(
                style = "background: #f8f9fa; padding: 1.5rem; border-radius: 8px; margin: 1rem 0;",
                h6("2. Its derivative carries the sign of a", style = "margin-top: 0;"),
                HTML("
          \\[
            P'(\\theta)
            = (d - c) \\cdot D \\cdot a \\cdot s(\\theta) \\cdot \\bigl(1 - s(\\theta)\\bigr).
          \\]
        "),
                p(
                  "The factor ",
                  tags$i("a"),
                  " appears linearly. If ",
                  tags$i("a > 0"),
                  " the IRF is increasing; if ",
                  tags$i("a < 0"),
                  " the IRF is decreasing (reverse-keyed item)."
                )
              ),
              
              div(
                style = "background: #f8f9fa; padding: 1.5rem; border-radius: 8px; margin: 1rem 0;",
                h6("3. Fisher information squares the slope", style = "margin-top: 0;"),
                HTML("
          \\[
            I(\\theta) 
            = \\frac{\\bigl(P'(\\theta)\\bigr)^2}{P(\\theta) \\cdot \\bigl[1 - P(\\theta)\\bigr]}.
          \\]
        "),
                HTML("
          Substituting the derivative,
          \\[
            \\begin{aligned}
              I(\\theta)
                &= \\frac{
                  \\Bigl[(d - c) \\cdot D \\cdot a \\cdot s(\\theta) \\cdot \\bigl(1 - s(\\theta)\\bigr)\\Bigr]^2
                }{
                  P(\\theta) \\cdot \\bigl[1 - P(\\theta)\\bigr]
                } \\\\[18pt]
                &= a^2 \\cdot
                  \\underbrace{
                    \\frac{
                      (d - c)^2 \\cdot D^2 \\cdot
                      \\Bigl[s(\\theta) \\cdot \\bigl(1 - s(\\theta)\\bigr)\\Bigr]^2
                    }{
                      P(\\theta) \\cdot \\Bigl[1 - P(\\theta)\\Bigr]
                    }
                  }_{\\ge 0}.
            \\end{aligned}
          \\]
        "),
                p(
                  "Because the derivative is squared, Fisher information depends on ",
                  tags$i("\\(a^2\\)"),
                  " rather than ",
                  tags$i("a"),
                  " itself. This guarantees ",
                  tags$i("I(θ) ≥ 0"),
                  " for all θ."
                )
              ),
              
              div(
                style = "background: #f8f9fa; padding: 1.5rem; border-radius: 8px; margin: 1rem 0;",
                h6("4. Same IIF for a and −a", style = "margin-top: 0;"),
                HTML("
          \\[
            I_{a}(\\theta) = I_{-a}(\\theta),
          \\]
        "),
                p(
                  "Flipping the sign of ",
                  tags$i("a"),
                  " flips the slope of the ICC but does not change the magnitude of the curvature of the log-likelihood. 
          Fisher information only measures how sharply the likelihood peaks around θ, not the direction of the slope."
                )
              ),
              
              div(
                style = "background: #f8f9fa; padding: 1.5rem; border-radius: 8px; margin: 1rem 0;",
                h6("5. Interpretation for reverse-keyed or mis-keyed items", style = "margin-top: 0;"),
                tags$ul(
                  class = "tight-ul",
                  tags$li(
                    "An item with ",
                    tags$i("a < 0"),
                    " has a decreasing ICC: higher-ability examinees are less likely to answer correctly (often reverse-coded or mis-keyed)."
                  ),
                  tags$li(
                    "The Fisher IIF is still non-negative and identical to that of an item with ",
                    tags$i("|a|"),
                    " but positive discrimination."
                  ),
                  tags$li(
                    "In practice, such items are flagged for review or recoded, even though their Fisher information remains positive."
                  )
                )
              ),
              
              div(
                class = "info-callout",
                h5("Key takeaway"),
                p(
                  "Fisher information is a measure of the ",
                  tags$em("magnitude of curvature"),
                  " of the log-likelihood, so it is always non-negative and symmetric in ",
                  tags$i("a"),
                  ". The sign of ",
                  tags$i("a"),
                  " still matters a lot—but it shows up in the ",
                  tags$em("shape and direction of the IRF"),
                  ", not in the value of the IIF itself. This is why the app highlights negative discrimination with contrasting colors and warnings."
                )
              )
            ),
            
            # FAQ item: the theta scale and the standard normal distribution
            
            div(
              class = "info-card",
              style = "border-left: 4px solid #6c757d; margin-bottom: 2rem;",
              h5("What is the theta scale and how does it relate to the standard normal distribution?", class = "k-gray"),
              
              p("In IRT, theta (θ) represents the latent ability or trait being measured. The theta scale is typically standardized to have a mean of 0 and standard deviation of 1, making it directly comparable to the standard normal distribution."),
              
              div(
                style = "background: #f8f9fa; padding: 1.5rem; border-radius: 8px; margin: 1rem 0;",
                h6("Standard Normal Scale Properties", style = "margin-top: 0;"),
                HTML("
      \\[
        \\theta \\sim N(0, 1)
      \\]
    "),
                tags$ul(
                  class = "tight-ul",
                  tags$li(tags$b("Mean (μ) = 0"), " - Average ability level"),
                  tags$li(tags$b("Standard deviation (σ) = 1"), " - Unit of measurement"),
                  tags$li(tags$b("Range: "), "Typically displayed from -3 to +3, covering 99.7% of the population")
                )
              ),
              
              div(
                style = "background: #e8f4f8; padding: 1rem; border-radius: 6px; margin: 1rem 0;",
                h6("Interpretation Guide", style = "margin-top: 0; color: #219ebc;"),
                
                div(
                  style = "display: grid; grid-template-columns: 1fr 1fr 1fr; gap: 0.5rem; margin: 1rem 0;",
                  div(
                    style = "background: #e8f5e8; padding: 1rem; border-radius: 6px; text-align: center;",
                    h6("θ = -2", style = "margin-top: 0; color: #2a9d8f;"),
                    p("Very Low Ability", style = "margin-bottom: 0; font-size: 0.9em;"),
                    p("2nd percentile", style = "font-size: 0.8em; color: #6c757d; margin-bottom: 0;")
                  ),
                  div(
                    style = "background: #fff3cd; padding: 1rem; border-radius: 6px; text-align: center;",
                    h6("θ = 0", style = "margin-top: 0; color: #856404;"),
                    p("Average Ability", style = "margin-bottom: 0; font-size: 0.9em;"),
                    p("50th percentile", style = "font-size: 0.8em; color: #6c757d; margin-bottom: 0;")
                  ),
                  div(
                    style = "background: #ffeaa7; padding: 1rem; border-radius: 6px; text-align: center;",
                    h6("θ = +2", style = "margin-top: 0; color: #e76f51;"),
                    p("Very High Ability", style = "margin-bottom: 0; font-size: 0.9em;"),
                    p("98th percentile", style = "font-size: 0.8em; color: #6c757d; margin-bottom: 0;")
                  )
                )
              ),
              
              div(
                style = "background: #f8f9fa; padding: 1.5rem; border-radius: 8px; margin: 1rem 0;",
                h6("Probability Interpretation", style = "margin-top: 0;"),
                HTML("
      \\[
        P(\\theta \\le -1) = 0.1587 \\quad
        P(\\theta \\le 0) = 0.5000 \\quad
        P(\\theta \\le +1) = 0.8413
      \\]
    "),
                p("These probabilities come from the standard normal cumulative distribution function.")
              ),
              
              div(
                style = "background: #fff0e8; padding: 1rem; border-radius: 6px; margin: 1rem 0;",
                h6("Why Standardize Theta?", style = "margin-top: 0; color: #e76f51;"),
                tags$ul(
                  class = "tight-ul",
                  tags$li(tags$b("Comparability: "), "Allows comparison across different tests and populations"),
                  tags$li(tags$b("Interpretability: "), "Psychologists and educators intuitively understand z-scores"),
                  tags$li(tags$b("Mathematical convenience: "), "Simplifies parameter estimation and model fitting"),
                  tags$li(tags$b("Linking and equating: "), "Essential for putting different tests on a common scale")
                )
              ),
              
              div(
                style = "background: #f8f9fa; padding: 1.5rem; border-radius: 8px; margin: 1rem 0;",
                h6("Practical Implications for Item Parameters", style = "margin-top: 0;"),
                
                div(
                  style = "display: grid; grid-template-columns: 1fr 1fr; gap: 1rem; margin: 1rem 0;",
                  div(
                    style = "background: #e8f5e8; padding: 1rem; border-radius: 6px;",
                    h6("Difficulty (b) Parameter", style = "margin-top: 0; color: #2a9d8f;"),
                    tags$ul(
                      class = "tight-ul",
                      tags$li("b = -1: Easy item (84% of population above this level)"),
                      tags$li("b = 0: Medium item (50% above, 50% below)"),
                      tags$li("b = +1: Hard item (16% of population above this level)")
                    )
                  ),
                  div(
                    style = "background: #fff3cd; padding: 1rem; border-radius: 6px;",
                    h6("Target Population", style = "margin-top: 0; color: #856404;"),
                    tags$ul(
                      class = "tight-ul",
                      tags$li("b ≈ -2: Useful for low-ability diagnostic tests"),
                      tags$li("b ≈ 0: Good for general population assessment"),
                      tags$li("b ≈ +2: Appropriate for gifted/talent identification")
                    )
                  )
                )
              ),
              
              div(
                style = "background: #e8f4f8; padding: 1rem; border-radius: 6px; margin: 1rem 0;",
                h6("Relationship to Observed Scores", style = "margin-top: 0; color: #219ebc;"),
                p("While theta is a continuous latent variable, it relates to observed test scores through the test characteristic curve:"),
                HTML("
      \\[
        \\text{Expected Raw Score} = \\sum_{i=1}^n P_i(\\theta)
      \\]
    "),
                p("Where P_i(θ) is the probability of correct response for item i at ability level θ.")
              ),
              
              div(
                class = "info-callout",
                style = "margin-top: 1rem;",
                h5("Key Insights"),
                tags$ul(
                  class = "tight-ul",
                  tags$li("Theta is a z-score: θ = 1 means one standard deviation above the population mean"),
                  tags$li("The -3 to +3 range covers virtually the entire normally distributed population"),
                  tags$li("Item difficulty (b) indicates where an item functions best along this scale"),
                  tags$li("This standardization enables meaningful comparisons across different tests and time points")
                )
              ),
              
              div(
                style = "background: #f0f4ff; padding: 1rem; border-radius: 6px; margin: 1rem 0;",
                h6("In This Application", style = "margin-top: 0; color: #4a6fdc;"),
                p("The theta probe slider ranges from -3 to +3, allowing you to explore how items function across the entire ability spectrum. The vertical dashed line shows the current theta value, with corresponding probability and information values displayed."),
                p("Try moving the theta probe to see how an item's measurement properties change across different ability levels!")
              )
            ),
            
            # Navigation prompt
            div(
              style = "text-align: center; margin-top: 2rem; padding-top: 1rem; border-top: 1px solid #f1f3f5;",
              p(
                class = "muted",
                "Continue exploring: Switch to the ",
                tags$b("Explorer"),
                " tab to see these concepts in action or the ",
                tags$b("Info"), 
                " tab for comprehensive model documentation."
              )
            )
          )
        )      
      )
    )
  )
)


# ===========================================
# ---- SERVER LOGIC SECTION ----
# ===========================================

server <- function(input, output, session) {
  # Reactive value to track model info
  model_info_message <- reactiveVal("")
  
  # Track last shown warnings to prevent duplicates
  last_warning <- reactiveVal("")
  
  # Track previous Dscale value
  last_Dscale <- reactiveVal("Haley-Birnbaum (1.702)")
  
  # Output for model info messages
  output$model_info <- renderUI({
    if (nchar(model_info_message()) > 0) {
      div(
        class = "info-box",
        style = "font-size: 0.75em; color: #004085; background-color: #cce5ff; border: 1px solid #b8daff; border-radius: 4px; padding: 8px; margin-top: 16px;",
        tags$strong("Model Info:"),
        model_info_message()
      )
    }
  })
  
  # Dynamic parameter sliders based on model
  output$parameter_sliders <- renderUI({
    model <- input$model
    
    sliders <- list()
    sliders[[length(sliders) + 1]] <- sliderInput(
      "a",
      tags$span(style = "font-size: 0.80em; font-weight: bold;", "Discrimination (a)"),
      min = -3,
      max = 3,
      value = 1.0,
      step = 0.05
    )
    sliders[[length(sliders) + 1]] <- div(class = "slider-separator")
    sliders[[length(sliders) + 1]] <- sliderInput(
      "b",
      tags$span(style = "font-size: 0.80em; font-weight: bold;", "Difficulty (b)"),
      min = -3,
      max = 3,
      value = 0.0,
      step = 0.05
    )
    if (model %in% c("3PL", "4PL")) {
      sliders[[length(sliders) + 1]] <- div(class = "slider-separator")
      sliders[[length(sliders) + 1]] <- sliderInput(
        "c",
        tags$span(style = "font-size: 0.80em; font-weight: bold;", "Guessing (c)"),
        min = 0.00,
        max = 0.25,
        value = 0.00,
        step = 0.01
      )
    }
    if (model == "4PL") {
      sliders[[length(sliders) + 1]] <- div(class = "slider-separator")
      sliders[[length(sliders) + 1]] <- sliderInput(
        "d",
        tags$span(style = "font-size: 0.80em; font-weight: bold;", "Slipping (d)"),
        min = 0.85,
        max = 1.00,
        value = 1.00,
        step = 0.01
      )
    }
    tagList(sliders)
  })
  
  # Update model info when model changes
  observeEvent(input$model, {
    if (input$model == "1PL") {
      model_info_message(
        "1-Parameter Logistic (1PL) / Rasch Model
Description: This model estimates a unique difficulty (b) for each item, while the discrimination (a) is fixed to a common value for all items.
This creates parallel Item Characteristic Curves (ICCs), a key assumption of the model.
The discrimination value is modifiable here for educational demonstration."
      )
    } else if (input$model == "2PL") {
      model_info_message(
        "This model estimates a unique difficulty (b) and discrimination (a) parameter for each item.
                         It assumes that guessing is not a factor (c=0) and that high-ability respondents will not make careless errors (d=1)."
      )
    } else if (input$model == "3PL") {
      model_info_message(
        "3-Parameter Logistic (3PL) Model: This model estimates a unique difficulty (b),
      discrimination (a), and guessing parameter (c) for each item.
                         It assumes that high-ability respondents are not limited by carelessness so the upper asymptote,
                         representing the maximum probability of a correct response, is fixed at 1."
      )
    } else if (input$model == "4PL") {
      model_info_message(
        "4-Parameter Logistic (4PL) Model: This model provides maximum flexibility by
                         allowing all four parameters -difficulty, discrimination, guessing, and inattention—
                         to be estimated independently for each item."
      )
    }
  })
  
  # Block Rasch when not 1PL
  observeEvent(list(input$model, input$Dscale), ignoreInit = TRUE, {
    if (input$model != "1PL" && input$Dscale == "Rasch (1.0)") {
      showModal(
        modalDialog(
          title = "Warning",
          "The Rasch scaling (D=1.0) is not applicable outside the 1PL model.
        Automatically switching to the Haley-Birnbaum scaling constant (D=1.702) to
        maintain model integrity.",
          easyClose = TRUE,
          footer = modalButton("OK")
        )
      )
      updateSelectInput(session, "Dscale", selected = "Haley-Birnbaum (1.702)")
    }
  })
  
  # Reset discrimination to 1.0 when switching from Rasch to other scaling in 1PL
  observeEvent(list(input$model, input$Dscale), {
    is_1pl <- identical(input$model, "1PL")
    was_rasch <- identical(last_Dscale(), "Rasch (1.0)")
    is_now_haley_birnbaum <- identical(input$Dscale, "Haley-Birnbaum (1.702)")
    if (is_1pl && was_rasch && is_now_haley_birnbaum) {
      updateSliderInput(session, "a", value = 1.0)
    }
    last_Dscale(input$Dscale)
  })
  
  # Handle discrimination slider behavior for Rasch model
  observeEvent(list(input$model, input$Dscale, input$a), ignoreInit = TRUE, {
    is_1pl <- identical(input$model, "1PL")
    is_rasch <- identical(input$Dscale, "Rasch (1.0)")
    if (is_1pl && is_rasch && abs(input$a - 1) > 1e-9) {
      if (!identical(last_warning(), "rasch_reset")) {
        showModal(
          modalDialog(
            title = "Rasch Model Constraint",
            "User input for discrimination (a) ignored.
          Rasch model specifications enforce a fixed value of 1.0.
          Parameter has been reset.",
            easyClose = TRUE,
            footer = modalButton("OK")
          )
        )
        last_warning("rasch_reset")
      }
      updateSliderInput(session, "a", value = 1.00)
    }
  })
  
  # Show warning when a < 0
  observeEvent(input$a, {
    if (input$a < 0 && !identical(last_warning(), "negative_a")) {
      showModal(
        modalDialog(
          title = "Warning",
          "A negative Discrimination value (a < 0) often suggests a reverse-coded item or confusing wording.
        We recommend reviewing the question for clarity.",
          easyClose = TRUE,
          footer = modalButton("OK")
        )
      )
      last_warning("negative_a")
    }
  })
  
  # Show warning when discrimination is modified in 1PL (non-Rasch)
  observeEvent(input$a, {
    is_1pl <- identical(input$model, "1PL")
    is_rasch <- identical(input$Dscale, "Rasch (1.0)")
    if (is_1pl &&
        !is_rasch && !identical(last_warning(), "1PL_discrimination")) {
      showModal(
        modalDialog(
          title = "1PL Model Constraint",
          "In 1PL model, discrimination applies to all items equally (parallel ICC curves).",
          easyClose = TRUE,
          footer = modalButton("OK")
        )
      )
      last_warning("1PL_discrimination")
    }
  })
  
  # Reset last warning when model or Dscale changes
  observeEvent(list(input$model, input$Dscale), {
    last_warning("")
  })
  
  # Params for selected model
  active_params <- reactive({
    a <- if (input$model == "1PL" &&
             identical(input$Dscale, "Rasch (1.0)")) {
              1.0
          } else if (input$model == "1PL") {
              input$a %or% 1.0
          } else {
              input$a %or% 1.0
          }
    
    c <- if (input$model %in% c("1PL", "2PL"))
            0.0
         else
            (input$c %or% 0.0)
    
    d <- if (input$model %in% c("1PL", "2PL", "3PL"))
            1.0
         else
            (input$d %or% 1.0)
    
    list(
      a = a,
      b = input$b %or% 0.0,
      c = c,
      d = d,
      D = D_map[[input$Dscale]] %or% 1.0
    )
  })
  
  output$plot_icc_iif <- renderPlot({
    pars <- active_params()
    alpha_icc <- input$alpha_icc %or% 1.0
    alpha_iif <- input$alpha_iif %or% 0.2
    annotation_size <- input$annotation_size %or% 1.25
    
    theta <- seq(-3, 3, by = 0.01)
    P  <- P_4pl(theta, pars$a, pars$b, pars$c, pars$d, pars$D)
    P  <- pmin(pmax(P, 1e-6), 1 - 1e-6)  # clamp away from exact 0/1 for numerical safety
    It <- I_4pl(theta, pars$a, pars$b, pars$c, pars$d, pars$D)
    
    # Right-axis scaling
    I_min <- 0
    I_max <- max(It * 1.1, na.rm = TRUE)
    I_max <- if (is.finite(I_max) && I_max > 0)
      I_max
    else
      1
    
    scale_to_left <- function(y) (y - I_min) / (I_max - I_min)
    It_scaled <- scale_to_left(It)
    
    # Colors - ICC changes to red and IIF to green when a < 0
    neg_a <- isTRUE(pars$a < 0)
    col_icc <- if (neg_a)
      rgb(231 / 255, 111 / 255, 81 / 255, alpha = alpha_icc)
    else
      rgb(33 / 255, 158 / 255, 188 / 255, alpha = alpha_icc)
    
    col_iif <- if (neg_a)
      rgb(0 / 255, 128 / 255, 0 / 255, alpha = alpha_iif)
    else
      rgb(231 / 255, 111 / 255, 81 / 255, alpha = alpha_iif)
    
    col_c_d <- "#55555566"
    
    # Larger margins to fit big labels at left/right
    op <- par(no.readonly = TRUE)
    on.exit(par(op), add = TRUE)
    par(mar = c(2, 7, 4, 7))
    
    plot(
      theta,
      P,
      type = "n",
      xlab = "",
      ylab = "",
      ylim = c(0.0, 1.0),
      xaxs = "i",
      yaxs = "i",
      xaxt = "n",
      yaxt = "n",
      bty = "n"
    )
    
    axis(1, at = c(-3, 3), labels = c("-3", "3"))
    axis(
      2,
      at = c(0, 1),
      labels = c("0.0", "1.0"),
      las = 1,
      col.ticks = rgb(33 / 255, 158 / 255, 188 / 255),
      col = rgb(33 / 255, 158 / 255, 188 / 255),
      col.axis = rgb(33 / 255, 158 / 255, 188 / 255)
    )
    axis(
      4,
      at = scale_to_left(c(I_min, I_max)),
      labels = formatC(c(I_min, I_max), digits = 3, format = "fg"),
      las = 1,
      col = rgb(231 / 255, 111 / 255, 81 / 255),
      col.axis = rgb(231 / 255, 111 / 255, 81 / 255),
      col.ticks = rgb(231 / 255, 111 / 255, 81 / 255)
    )
    
    abline(
      h = pars$c,
      lty = 3,
      lwd = 2,
      col = col_c_d
    )
    abline(
      h = pars$d,
      lty = 3,
      lwd = 2,
      col = col_c_d
    )
    
    if (alpha_icc > 0)
      lines(theta, P, lwd = 4, col = col_icc)
    if (alpha_iif > 0)
      lines(theta, It_scaled, lwd = 4, col = col_iif)
    
    th0 <- input$theta_probe %or% 0
    P0  <- P_4pl(th0, pars$a, pars$b, pars$c, pars$d, pars$D)
    I0  <- I_4pl(th0, pars$a, pars$b, pars$c, pars$d, pars$D)
    I0L <- scale_to_left(I0)
    
    abline(
      v = th0,
      lty = 1,
      lwd = 2,
      col = "#44444466"
    )
    
    # Get plot boundaries BEFORE any par() changes
    usr <- par("usr")
    x_left <- usr[1]
    x_right <- usr[2]
    pad <- 0.012 * (x_right - x_left)
    
    # Add theta probe label
    text(
      x = th0,
      y = usr[3],
      labels = bquote(theta == .(round(th0, 3))),
      pos = 1,
      xpd = NA,
      cex = annotation_size * 1.2,
      font = 4,
      col = "#333333"
    )
    
    if (alpha_icc > 0)
      points(
        th0,
        P0,
        pch = 21,
        cex = 1.5,
        lwd = 2.0,
        col = "white",
        bg = col_icc
      )
    if (alpha_iif > 0)
      points(
        th0,
        I0L,
        pch = 21,
        cex = 1.5,
        lwd = 2.0,
        col = "white",
        bg = col_iif
      )
    
    if (alpha_icc > 0) {
      segments(th0, P0, x_left, P0, lwd = 1, col = col_icc)
      shape::Arrowhead(
        x0 = x_left + pad * 2,
        y0 = P0,
        angle = 180,
        arr.type = "curved",
        arr.length = 0.40,
        arr.width  = 0.20,
        npoint = 25,
        lcol = col_icc,
        arr.col = col_icc,
        lwd = 2
      )
    }
    
    if (alpha_iif > 0) {
      segments(th0, I0L, x_right, I0L, lwd = 1, col = col_iif)
      shape::Arrowhead(
        x0 = x_right - pad * 2,
        y0 = I0L,
        angle = 0,
        arr.type = "curved",
        arr.length = 0.40,
        arr.width  = 0.20,
        lcol = col_iif,
        arr.col = col_iif,
        lwd = 2
      )
    }
    
    # Now add the text annotations - x_right is defined
    if (alpha_iif > 0) {
      text(
        x = x_right,
        y = I0L,
        labels = paste0("  I(θ) = ", format(round(I0, 3))),
        pos = 4,
        cex = annotation_size,
        font = 4,
        col = col_iif,
        xpd = NA
      )
    }
    
    if (alpha_icc > 0) {
      text(
        x = x_left,
        y = P0,
        labels = paste0("P(θ) = ", format(round(P0, 3)), "  "),
        pos = 2,
        cex = annotation_size,
        font = 4,
        col = col_icc,
        xpd = NA
      )
    }
    
    par(xpd = NA)
    legend(
      x = mean(usr[1:2]),
      y = usr[4] + 0.15,
      legend = paste0(input$model, " Model | D = ", round(pars$D, 3)),
      bty = "n",
      cex = annotation_size * 0.96,
      text.font = 2,
      xjust = 0.5
    )
    
    curve_legends <- c()
    curve_colors <- c()
    curve_lwds <- c()
    curve_ltys <- c()
    
    if (alpha_icc > 0) {
      curve_legends <- c(curve_legends, "IRF")
      curve_colors <- c(curve_colors, col_icc)
      curve_lwds <- c(curve_lwds, 6)
      curve_ltys <- c(curve_ltys, 1)
    }
    
    if (alpha_iif > 0) {
      curve_legends <- c(curve_legends, "IIF")
      curve_colors <- c(curve_colors, col_iif)
      curve_lwds <- c(curve_lwds, 6)
      curve_ltys <- c(curve_ltys, 1)
    }
    
    if (length(curve_legends) > 0) {
      legend(
        x = mean(usr[1:2]),
        y = usr[4] + 0.08,
        legend = curve_legends,
        lwd = curve_lwds,
        lty = curve_ltys,
        col = curve_colors,
        bty = "n",
        cex = annotation_size * 0.96,
        seg.len = 1,
        ncol = ifelse(length(curve_legends) == 1, 1, 2),
        xjust = 0.5
      )
    }
  })
}

# ===========================================
# ---- LAUNCH APPLICATION SECTION ----
# ===========================================

shinyApp(ui, server)

Fisher Information: Quantifying Measurement Precision

Fisher Information provides the mathematical backbone of Computerized Adaptive Testing (CAT) by quantifying how precisely an item or test measures a person’s ability. In general statistics, Fisher Information measures how much an observable variable tells us about an unknown parameter. In the context of Item Response Theory (IRT), the observable variable is an individual’s item response, and the unknown parameter is their latent ability \(\theta\).

Fisher Information thus expresses how strongly the data constrain the possible values of \(\theta\): sharply peaked likelihoods imply high precision (high information), while flatter likelihoods imply low precision (low information). Conceptually, it measures the “sharpness” of the statistical evidence that the data provide about a parameter.


Bridging Theory and Implementation

To quantify Fisher Information within IRT, we need to formalize how an observed response depends on ability. This relationship is expressed through the likelihood function, which gives the probability of a person’s observed response given a candidate value of \(\theta\). The shape of this likelihood—specifically its curvature around the most likely value—reveals how much information the response provides about ability.

In practice, we work with the logarithm of the likelihood, both for mathematical convenience and computational stability. The curvature (second derivative) of this log-likelihood forms the mathematical basis for Fisher Information. Thus, understanding the likelihood and its logarithmic transformation is the first step toward expressing information quantitatively in IRT.


Likelihood and Log-Likelihood in IRT

For a dichotomous item \(j\), the response \(Y_j \in \{0,1\}\) follows a Bernoulli likelihood:

\[ L_j(\theta) = P_j(\theta)^{y_j} \, [1 - P_j(\theta)]^{1 - y_j}, \]

where \(P_j(\theta)\) is the probability of a correct response for ability \(\theta\), and \(y_j\) is the observed outcome. This likelihood expresses how plausible each possible ability level is, given the observed response.

Taking the natural logarithm gives the log-likelihood:

\[ \ell_j(\theta) = y_j \log P_j(\theta) + (1 - y_j) \log \big[1 - P_j(\theta)\big]. \]

The log-likelihood reflects the evidence supporting each value of \(\theta\). Maximizing \(\ell_j(\theta)\) yields the maximum likelihood estimate (MLE) of ability.

Taking the logarithm offers three critical advantages:

  1. Simplifies computation.
    The total likelihood across items is a product: \[ L(\theta) = \prod_j L_j(\theta), \] and taking the logarithm converts this product into a sum: \[ \log L(\theta) = \sum_j \log L_j(\theta), \] which is far easier to differentiate and optimize.

  2. Improves numerical stability.
    As the number of items grows, the product of small probabilities can underflow to zero in computer arithmetic.
    Taking logs transforms these products into manageable sums of negative values, preventing numerical underflow and maintaining precision.

  3. Preserves the maximum.
    Because the logarithm is strictly increasing, maximizing \(L_j(\theta)\) or \(\ell_j(\theta)\) yields the same estimate: \[ \arg\max_\theta L_j(\theta) = \arg\max_\theta \ell_j(\theta). \] In short, the log-likelihood simplifies computation, stabilizes numerical performance, and preserves the location of the MLE.

The log-likelihood curve describes how plausible different ability levels are given the observed data. Its curvature—how sharply it bends around its peak—reveals how precisely the data identify \(\theta\).
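As a minimal sketch (not part of the app code above), the R snippet below evaluates the Bernoulli log-likelihood over a grid of θ values for a hypothetical mixed response pattern on five 2PL items with illustrative parameters; the sharpness of the resulting peak previews the curvature idea formalized next.

# Bernoulli log-likelihood for 2PL items (illustrative sketch)
p_2pl <- function(theta, a = 1.5, b = 0, D = 1.702) {
  1 / (1 + exp(-D * a * (theta - b)))
}
loglik_item <- function(theta, y, a = 1.5, b = 0, D = 1.702) {
  P <- p_2pl(theta, a, b, D)
  y * log(P) + (1 - y) * log(1 - P)        # evidence contributed by one response
}

theta_grid <- seq(-4, 4, by = 0.01)

# A mixed response pattern on five items produces a single-peaked total log-likelihood
resp <- c(1, 0, 1, 1, 0)
bs   <- c(-1.0, -0.5, 0.0, 0.5, 1.0)
ll_total <- sapply(theta_grid, function(th) sum(loglik_item(th, resp, b = bs)))

theta_grid[which.max(ll_total)]            # crude grid-based MLE
plot(theta_grid, ll_total, type = "l",
     xlab = expression(theta), ylab = "log-likelihood")   # sharpness of the peak = precision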

Fisher Information is defined as the negative expected curvature (the second derivative) of the log-likelihood:

\[ I_j(\theta) = -\,\mathbb{E}\!\left[\frac{\partial^2 \ell_j(\theta)}{\partial \theta^2}\right]. \]

The negative sign ensures that information is positive because the log-likelihood typically curves downward near its maximum.
Taking the expectation indicates that Fisher Information reflects the average precision implied by the model, not just one observed response.


Intuitive Interpretation

  • The first derivative, \(\ell'_j(\theta)\), measures the sensitivity of the likelihood to small changes in ability — how strongly the data “pull” the estimate of \(\theta\).
  • The second derivative, \(\ell''_j(\theta)\), captures how that sensitivity changes — the curvature of the likelihood surface.
  • The negative expected curvature, \(-\mathbb{E}[\ell''_j(\theta)]\), therefore defines how precisely the item measures ability.

A steep, narrow log-likelihood peak implies high Fisher Information (precise measurement), while a flat, wide curve indicates low information (greater uncertainty).


Item Information Function (IIF)

Specializing Fisher Information to the IRT setting yields the Item Information Function (IIF).
Starting from the score function — the derivative of the log-likelihood — we have:

\[ \frac{\partial \ell_j(\theta)}{\partial \theta} = \frac{y_j - P_j(\theta)}{P_j(\theta) \cdot \big[1 - P_j(\theta)\big]} \cdot P'_j(\theta), \]

where \(P'_j(\theta) = \frac{\partial P_j(\theta)}{\partial \theta}\) is the slope of the item response curve.
Taking the expected squared value of this score gives:

\[ I_j(\theta) = \mathbb{E}\!\left[\left(\frac{\partial \ell_j(\theta)}{\partial \theta}\right)^2\right] = \frac{\big[P'_j(\theta)\big]^2}{P_j(\theta) \cdot \big [1 - P_j(\theta)\big]}. \]

Thus, the IIF represents the expected value of the squared score function, describing how much information an item contributes about \(\theta\) at each point on the ability scale.


Interpreting the Components of IIF

  • Numerator – Slope Sensitivity:
    \([P'_j(\theta)]^2\) reflects the item’s discriminating power—how steeply the probability of success changes with ability.
    Steeper slopes (higher \(a_j\)) yield greater information, and squaring the term ensures positivity and amplifies the effect of steepness.

  • Denominator – Response Variance:
    \(P_j(\theta)[1 - P_j(\theta)]\) represents the Bernoulli response variance, maximized at \(P_j(\theta) = 0.5\).
    This corresponds to maximum uncertainty—when the item’s difficulty perfectly matches the examinee’s ability.


Where Information Peaks

In the 2PL model (no guessing), maximum information occurs at \(P_j(\theta) = 0.5\), the inflection point of the Item Characteristic Curve (ICC) where \(P'_j(\theta)\) is steepest.
At this point, the item is both maximally uncertain and maximally sensitive—the perfect conditions for precise measurement.

For the 3PL model, the guessing parameter \(c_j\) shifts this peak to a success probability slightly above 0.5. Maximizing the information function shows that the peak occurs where

\[ P_j(\theta) = \frac{1 + \sqrt{1 + 8\,c_j}}{4}. \]

Key Insight:
Information is maximized when an item is both highly discriminating (steep slope) and well-targeted (difficulty matched to ability).
CAT algorithms exploit this principle by selecting items that are simultaneously uncertain and informative.
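A quick numeric check of the peak location, as a sketch with illustrative 3PL parameters (a = 1.2, b = 0, guessing g = 0.2): the information maximum is located on a θ-grid and the success probability there is compared with \((1 + \sqrt{1 + 8c_j})/4\).

# Locate the information peak of a 3PL item numerically (illustrative parameters)
a <- 1.2; b <- 0; g <- 0.2; D <- 1.702          # g plays the role of the guessing parameter c_j

P_3pl <- function(theta) g + (1 - g) / (1 + exp(-D * a * (theta - b)))
I_3pl <- function(theta) {
  P <- P_3pl(theta)
  (D * a)^2 * ((P - g) / (1 - g))^2 * (1 - P) / P
}

theta_grid <- seq(-4, 4, by = 0.001)
theta_star <- theta_grid[which.max(I_3pl(theta_grid))]

P_3pl(theta_star)              # success probability at the information peak
(1 + sqrt(1 + 8 * g)) / 4      # closed-form target, about 0.653 for g = 0.2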


Practical Implications for CAT

  1. Optimal Targeting:
    CAT dynamically selects items where \(P_j(\theta) \approx 0.5\), matching item difficulty to estimated ability and maximizing information per response. This principle avoids items that are too easy or too hard.

  2. Discrimination Dominance:
    Information scales quadratically with item discrimination: \[ I_j(\theta) \propto a_j^2. \] Hence, an item with \(a_j = 2.0\) provides roughly four times the information of an item with \(a_j = 1.0\) at the same ability level. This creates a powerful selection bias toward highly discriminating items in the item pool.

  3. Stopping Rule:
    Testing continues until the estimated standard error meets a predefined precision threshold: \[ SE(\hat{\theta}) = \frac{1}{\sqrt{I_T(\hat{\theta})}} < \tau, \] where \(I_T(\theta) = \sum_j I_j(\theta)\) is the Test Information Function (TIF).

Together, these principles form the psychometric engine of CAT: adaptive algorithms continually select the most informative items, minimize measurement error, and achieve precise estimates with far fewer questions than fixed-form tests.
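The sketch below (2PL items with illustrative parameters, not taken from the app) makes principles 2 and 3 above concrete: at θ = b information scales with a², and the standard error \(1/\sqrt{I_T}\) falls below a τ = 0.30 target once enough information has accumulated.

# 2PL item information: I(theta) = (D a)^2 P (1 - P)
info_2pl <- function(theta, a, b, D = 1.702) {
  P <- 1 / (1 + exp(-D * a * (theta - b)))
  (D * a)^2 * P * (1 - P)
}

# Discrimination dominance: at theta = b, P = 0.5, so information is proportional to a^2
info_2pl(0, a = 2.0, b = 0) / info_2pl(0, a = 1.0, b = 0)   # = 4

# Stopping rule: accumulate information item by item until SE < tau
a_pool <- c(2.0, 1.8, 1.6, 1.5, 1.4, 1.3, 1.2, 1.1, 1.0, 0.9)  # illustrative pool at theta = b
I_T    <- cumsum(info_2pl(0, a_pool, b = 0))
SE     <- 1 / sqrt(I_T)
which(SE < 0.30)[1]   # first item index at which the precision target is met (6 for this pool)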


Key Takeaway The Item Information Function is simply the Fisher Information applied to a single IRT item. It quantifies how precisely that item measures ability at each point on the scale. By summing across items, CAT systems build the Test Information Function, whose inverse square root defines the conditional standard error of ability estimation.



Derivations for IRT Models

In Item Response Theory (IRT), both measurement precision and ability estimation depend on how sensitive an item’s response probability is to changes in the latent ability \(\theta\). This sensitivity is governed by the first and second derivatives of the item response function \(P_j(\theta)\).


1. The First Derivative: Slope and Discrimination

The first derivative

\[ P'_j(\theta) = \frac{\partial P_j(\theta)}{\partial \theta}, \]

measures the instantaneous rate of change in the probability of a correct response.

  • A steep slope indicates high discrimination, allowing the item to sharply distinguish examinees with similar abilities.
  • A flat slope indicates low discrimination, as the probability changes little across ability levels.

This derivative underlies two central statistical components in IRT:

  1. The score function, \(\ell'_j(\theta)\), which drives ability estimation by identifying where the likelihood is maximized.
  2. The Fisher Information,

\[ I_j(\theta) = \frac{\big[P'_j(\theta)\big]^2}{P_j(\theta)\cdot \big[1 - P_j(\theta)\big]}, \]

which quantifies the measurement precision an item provides at each ability level.


2. The Second Derivative: Curvature and Precision

The second derivative

\[ P''_j(\theta) = \frac{\partial^2 P_j(\theta)}{\partial \theta^2}, \]

describes the curvature of the item response function — how rapidly its slope changes with ability.

In the context of likelihood-based estimation:

  • The first derivative of the log-likelihood, \(\ell'_j(\theta)=\frac{\partial \ell_j(\theta)}{\partial\theta}\), represents the slope or score function.
    • As the name slope suggests, it measures how sensitive the log-likelihood is to a change in the parameter \(\theta\).
    • A key property is that its expectation is zero at the true parameter value, \(\mathbb{E}\big[\ell'_j(\theta)\big]=0\), so the Fisher Information can be written as the variance of the score: \[ I_j(\theta) = \mathbb{E}\!\left[\left(\frac{\partial \ell_j(\theta)}{\partial \theta}\right)^2\right] \quad\text{(variance of the score function)}. \]
  • The second derivative, \(\ell''_j(\theta)=\frac{\partial^2 \ell_j(\theta)}{\partial \theta^2}\), represents the curvature of the log-likelihood. Its negative expectation gives the equivalent expression \[ I_j(\theta) = -\,\mathbb{E}\!\left[\frac{\partial^2 \ell_j(\theta)}{\partial \theta^2}\right] \quad\text{(negative expected curvature of the log-likelihood)}. \]
    • High curvature (a large negative second derivative) means the log-likelihood has a sharp peak. This is “good”: the data provide strong information about the parameter, and the maximum likelihood estimate (MLE) is precise and well-defined.
    • Low curvature (a second derivative close to zero) means the log-likelihood is flat. This is “bad”: the data provide little information, and the MLE is uncertain.

Derivation of Equivalence

Let \(\ell_j(\theta) = \log L_j(\theta)\) be the log-likelihood for item \(j\), and define the score function as \[ S_j(\theta) = \frac{\partial \ell_j(\theta)}{\partial \theta}. \]

Then, Fisher Information is the variance of this score:

\[ I_j(\theta) = \mathrm{Var}\big[S_j(\theta)\big] = \mathbb{E}\big[S_j(\theta)^2\big] - \Big(\mathbb{E}\big[S_j(\theta)\big]\Big)^2. \] Under typical model assumptions, the expected score is zero for all \(\theta\):

\[ \mathbb{E}\big[S_j(\theta)\big] = 0, \] so that

\[ I_j(\theta) = \mathbb{E}\big[S_j(\theta)^2\big]. \] To connect this to curvature, take the derivative of the expected score:

\[ \frac{d}{d\theta}\mathbb{E}\big[S_j(\theta)\big] = \mathbb{E}\!\left[\frac{\partial S_j(\theta)}{\partial \theta} + S_j(\theta)^2\right] = 0. \] Rearranging gives:

\[ \mathbb{E}\big[S_j(\theta)^2\big] = -\,\mathbb{E}\!\left[\frac{\partial S_j(\theta)}{\partial \theta}\right]. \]

Recognizing that

\[ \frac{\partial S_j(\theta)}{\partial \theta} = \frac{\partial^2 \ell_j(\theta)}{\partial \theta^2}, \]
we obtain:

\[ I_j(\theta) = \mathbb{E}\big[S_j(\theta)^2\big] = -\,\mathbb{E}\!\left[\frac{\partial^2 \ell_j(\theta)}{\partial \theta^2}\right]. \]

The two definitions are two sides of the same coin. The first defines information as the variability of the score, while the second defines it as the average sensitivity (or curvature) of the score itself. A steep and sharply peaked log-likelihood (high expected curvature) will also have a score function that changes dramatically with different samples (high variance).
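A quick Monte Carlo check of this equivalence, as a sketch assuming a single 2PL item (a = 1.5, b = 0) evaluated at the data-generating value \(\theta_0 = 0.5\): the variance of simulated scores and the negative average of numerically computed second derivatives both recover the closed-form 2PL information.

# Monte Carlo check: Var(score) and -E[loglik''] both recover Fisher information (2PL sketch)
set.seed(123)
a <- 1.5; b <- 0; D <- 1.702; theta0 <- 0.5
P <- function(th) 1 / (1 + exp(-D * a * (th - b)))
loglik <- function(th, y) y * log(P(th)) + (1 - y) * log(1 - P(th))

y <- rbinom(1e5, size = 1, prob = P(theta0))   # responses generated at the true theta

# First and second derivatives of the log-likelihood via central differences
h <- 1e-4
score <- (loglik(theta0 + h, y) - loglik(theta0 - h, y)) / (2 * h)
d2    <- (loglik(theta0 + h, y) - 2 * loglik(theta0, y) + loglik(theta0 - h, y)) / h^2

var(score)                                   # variance of the score
-mean(d2)                                    # negative expected curvature
(D * a)^2 * P(theta0) * (1 - P(theta0))      # closed-form 2PL information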


3. Intuitive Interpretation

Form Expression Meaning
Variance of the score \(I_j(\theta) = \mathbb{E}\big[S_j(\theta)^2\big]\) “Sensitivity to Change”: If the score (the slope) is highly variable across different possible datasets, it means the data is very informative for distinguishing between different values of θ. A small change in θ would lead to a large change in the score, which is a sign of high information.
Negative expected curvature \(I_j(\theta) = -\,\mathbb{E}\big[\ell''_j(\theta)\big]\) “Certainty of the Peak”: This directly measures the sharpness of the log-likelihood function at its maximum. A very sharp peak (large negative curvature) means the data strongly identifies a single “best” value for θ. The estimate is precise. A flat likelihood means many values of θ are almost equally plausible, leading to an imprecise estimate.

Both describe how concentrated the likelihood is around its maximum.

  • The curvature is the local property at the maximum. It’s a direct measure of concentration.
  • The variance of the score is a global property (an expectation over all data) that is mathematically equivalent to this local concentration. If the peak is sharp (high concentration), the slope must change very rapidly as you move away from the maximum, which inherently means the score function is highly variable.

A steep, sharply curved log-likelihood (large negative second derivative) implies high Fisher Information and thus high measurement precision.


4. Linking Derivatives to Measurement and Estimation

Derivatives form the bridge between the psychometric model and the statistical measurement process:

Quantity Symbol Interpretation Psychometric Role
First derivative of ICC \(P'_j(\theta)\) Rate of change of response probability Determines item discrimination and local information
Second derivative of ICC \(P''_j(\theta)\) Curvature of the probability function Shapes the form of the ICC and local precision
First derivative of log-likelihood \(\ell'_j(\theta)\) Score function Drives ability estimation (e.g., Newton–Raphson / Fisher Scoring)
Second derivative of log-likelihood \(\ell''_j(\theta)\) Observed information Determines estimation precision and standard error
Expected negative curvature \(I_j(\theta) = -\mathbb{E}\big[\ell''_j(\theta)\big]\) Fisher Information Defines the Cramér–Rao lower bound on \(\mathrm{Var}(\hat{\theta})\)

Key Insight

  • The first derivative, \(P'_j(\theta)\), determines how informative an item is — the steeper the slope, the more precisely it discriminates between examinees.
  • The second derivative, \(P''_j(\theta)\), determines how stable that information is across ability, influencing the curvature of the likelihood and the standard error of measurement (SEM).

Together, these derivatives connect psychometric sensitivity (how items behave) with statistical precision (how well we can estimate ability). They form the mathematical bridge between item response functions and the information-driven logic of adaptive testing.
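To make the distinction between observed and expected information in the table concrete, the sketch below (2PL items with illustrative parameters and a hypothetical response pattern) compares \(-\ell''(\theta)\) computed from one response vector with the expected information \(I_T(\theta)\) at the same θ.

# Observed vs. expected information for a short 2PL test (illustrative sketch)
a <- c(1.8, 1.2, 1.0, 1.5, 0.8); b <- c(-1, -0.5, 0, 0.5, 1); D <- 1.702
y <- c(1, 1, 0, 1, 0)                      # hypothetical response pattern
P <- function(th) 1 / (1 + exp(-D * a * (th - b)))

loglik <- function(th) sum(y * log(P(th)) + (1 - y) * log(1 - P(th)))

th0 <- 0.3; h <- 1e-4
observed_info <- -(loglik(th0 + h) - 2 * loglik(th0) + loglik(th0 - h)) / h^2
expected_info <- sum((D * a)^2 * P(th0) * (1 - P(th0)))

c(observed = observed_info, expected = expected_info)
# For the 2PL the two coincide (the second derivative does not depend on y);
# for 3PL/4PL items observed and expected information generally differ.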


3-Parameter Logistic (3PL) Model Derivation

For the 3PL model (\(d_j = 1\)):

\[ P_j(\theta) = c_j + \frac{1 - c_j}{1 + \exp\big[-D \cdot a_j \cdot (\theta - b_j)\big]} \]

Let \(x = D \cdot a_j \cdot (\theta - b_j)\), then:

\[ \begin{aligned} P_j'(\theta) &= \frac{\partial}{\partial \theta} \left[c_j + \frac{1-c_j}{1 + e^{-x}}\right] \\[12pt] &= (1-c_j) \cdot \frac{\partial}{\partial \theta} \left[(1 + e^{-x})^{-1}\right] \\[12pt] &= (1-c_j) \cdot (-1) \cdot (1 + e^{-x})^{-2} \cdot (-D \cdot a_j \cdot e^{-x}) \\[12pt] &= D \cdot a_j \cdot (1-c_j) \cdot \frac{e^{-x}}{(1 + e^{-x})^2} \end{aligned} \]

Now observe that:

\[ \begin{aligned} P_j(\theta) - c_j &= \frac{1-c_j}{1 + e^{-x}} \\[12pt] 1 - P_j(\theta) &= \frac{(1-c_j)\cdot e^{-x}}{1 + e^{-x}} \end{aligned} \]

Multiplying these expressions:

\[ (P_j - c_j) \cdot (1 - P_j) = \frac{(1-c_j)^2 \cdot e^{-x}}{(1 + e^{-x})^2} \]

Comparing with the derivative reveals the elegant relationship:

\[ P_j'(\theta) = \frac{D \cdot a_j}{1-c_j} \cdot \big(P_j(\theta) - c_j\big) \cdot \big(1 - P_j(\theta)\big) \]

Substituting into the Fisher Information function:

\[ \begin{aligned} I_j(\theta) &= \frac{\big[P_j'(\theta)\big]^2}{P_j(\theta)\cdot \big[1 - P_j(\theta)\big]} \\[12pt] &= \frac{\left[\dfrac{D \cdot a_j}{1-c_j} \cdot (P_j - c_j) \cdot(1 - P_j)\right]^2}{P_j(1 - P_j)} \\[12pt] &= \frac{(D \cdot a_j)^2}{(1-c_j)^2} \cdot \frac{(P_j - c_j)^2 \cdot (1 - P_j)^2}{P_j(1 - P_j)} \\[12pt] &= (D \cdot a_j)^2 \cdot \frac{(P_j - c_j)^2}{(1-c_j)^2} \cdot \frac{1 - P_j}{P_j} \end{aligned} \]

Final Result for 3PL Model:

\[ \boxed{I_j(\theta) = (D \cdot a_j)^2 \cdot \frac{1 - P_j(\theta)}{P_j(\theta)} \cdot \left(\frac{P_j(\theta) - c_j}{1 - c_j}\right)^2} \]


Interpretation of the 3PL Information Function

This elegant result reveals three distinct components governing measurement precision:

  1. \((D \cdot a_j)^2\): Quadratic dependence on discrimination, making highly discriminating items disproportionately valuable

  2. \(\dfrac{1 - P_j(\theta)}{P_j(\theta)}\): Odds ratio favoring items where success is uncertain \(\big(P_j(\theta) \approx 0.5 \big)\)

  3. \(\left(\dfrac{P_j(\theta) - c_j}{1 - c_j}\right)^2\): Guessing penalty that reduces information when responses are contaminated by chance success

The derivation demonstrates how the 3PL information function naturally extends the 2PL case while accounting for the psychometric complications introduced by the guessing parameter.


Special Cases

Model Parameters Information Function
4PL general case \(I_j(\theta) = (D \cdot a_j)^2 \cdot \displaystyle \frac{(d_j - P_j(\theta))^2 \cdot \big(P_j(\theta) - c_j\big)^2}{(d_j - c_j)^2 \cdot P_j(\theta) \cdot \big[1 - P_j(\theta)\big]}\)
3PL \(d_j = 1\) \(I_j(\theta) = (D \cdot a_j)^2 \cdot \displaystyle \frac{\big(P_j(\theta) - c_j\big)^2}{(1 - c_j)^2} \cdot \frac{1 - P_j(\theta)}{P_j(\theta)}\)
2PL \(c_j = 0, \: d_j = 1\) \(I_j(\theta) = (D \cdot a_j)^2 \cdot P_j(\theta) \cdot \big[1 - P_j(\theta)\big]\)
1PL (Rasch) \(a_j = 1, \: c_j = 0, \: d_j = 1\) \(I_j(\theta) = D^2 \cdot P_j(\theta) \cdot \big[1 - P_j(\theta)\big]\)
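To connect these closed forms, the R sketch below (illustrative parameter values; the helper names p_4pl and info_4pl are mine, not objects from the app code) implements the general 4PL information function from the table and checks that it collapses to the boxed 3PL result and to the 2PL formula.

# General 4PL information and its special cases (sketch)
p_4pl <- function(theta, a, b, c = 0, d = 1, D = 1.702) {
  c + (d - c) / (1 + exp(-D * a * (theta - b)))
}
info_4pl <- function(theta, a, b, c = 0, d = 1, D = 1.702) {
  P <- p_4pl(theta, a, b, c, d, D)
  (D * a)^2 * (d - P)^2 * (P - c)^2 / ((d - c)^2 * P * (1 - P))
}

theta <- 0.7; a <- 1.4; b <- 0.2

# d = 1 reproduces the boxed 3PL result
c3 <- 0.15
P3 <- p_4pl(theta, a, b, c = c3)
all.equal(info_4pl(theta, a, b, c = c3),
          (1.702 * a)^2 * ((P3 - c3) / (1 - c3))^2 * (1 - P3) / P3)

# c = 0, d = 1 reproduces the 2PL formula (D a)^2 P (1 - P)
P2 <- p_4pl(theta, a, b)
all.equal(info_4pl(theta, a, b), (1.702 * a)^2 * P2 * (1 - P2))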


Test Information and Measurement Precision

While \(I_j(\theta)\) describes an item's local precision, summing across items yields the test-level precision \(I_T(\theta)\), which forms the basis of adaptive item selection.

IRT assumes that, given an examinee’s latent ability \(\theta\), responses to different items are statistically independent. Once \(\theta\) is known, how a person answers item \(j\) provides no additional information about any other item. Each response depends only on the examinee’s ability and the item’s parameters.

Under this assumption of conditional independence, information is additive across items, allowing modular test assembly and adaptive targeting:

\[ I_T(\theta) = \sum_{j=1}^J I_j(\theta) \]
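Under conditional independence this sum can be computed directly. As a sketch with an illustrative 2PL item bank, the snippet below evaluates \(I_T(\theta)\) over a θ-grid as the row sums of an item-by-θ information matrix, along with the corresponding conditional standard error.

# Test information as the sum of item informations (2PL sketch, illustrative bank)
a <- c(1.6, 1.2, 0.9, 1.4, 1.1)
b <- c(-1.5, -0.5, 0.0, 0.8, 1.5)
D <- 1.702
theta <- seq(-3, 3, by = 0.1)

# One column of item information per item, evaluated on the theta grid
item_info <- sapply(seq_along(a), function(j) {
  P <- 1 / (1 + exp(-D * a[j] * (theta - b[j])))
  (D * a[j])^2 * P * (1 - P)
})

I_T <- rowSums(item_info)          # test information function
SE  <- 1 / sqrt(I_T)               # conditional standard error
round(cbind(theta, I_T, SE)[c(1, 31, 61), ], 3)   # grid endpoints and theta = 0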

Violations of this assumption are known as local dependence. They occur when responses share additional common factors not captured by \(\theta\):

  • Testlet effects (items grouped under a shared passage or scenario)
  • Speededness or fatigue effects
  • Repeated content or overlapping item stimuli
  • Method effects (e.g., response styles, wording similarity)

If unmodeled, these dependencies can inflate item correlations and bias parameter estimates, leading to:

  • Overestimated test information,
  • Underestimated standard errors,
  • Distorted ability estimates.

To address conditional independence violations, psychometricians may:

  • Introduce additional latent variables (e.g., testlet models, multidimensional IRT)
  • Add residual correlations or hierarchical structures
  • Use diagnostics such as Yen’s \(Q_3\) statistic or residual covariance analysis

Summary

  • Conditional independence ensures that the joint probability of all responses can be expressed as the product of item-level probabilities.
  • It allows efficient likelihood-based estimation, since item contributions simply add in the log-likelihood.
  • Violations indicate that the single-trait model may be too simplistic, motivating richer multidimensional or testlet-based frameworks.

Hence, under this assumption, the full response matrix \(\mathbf{Y} = [Y_{nj}]\) is generated as a collection of conditionally independent Bernoulli trials, each governed by its own Item Characteristic Curve (ICC) evaluated at the examinee’s ability \(\theta_n\).


Standard Error of Measurement

The Test Information Function (TIF) represents the test’s total measurement precision at each ability level. The corresponding conditional standard error (SE) is inversely related to test information:

\[ \mathrm{SE}(\theta) = \frac{1}{\sqrt{I_T(\theta)}} \]

This relationship ensures that precision targets can be set and achieved consistently, making CAT both scientifically rigorous and practically efficient for large-scale assessment.

Fundamental CAT Relationship:

  • High information → Small standard error → Precise measurement
  • Low information → Large standard error → Imprecise measurement

Conditional Reliability

In IRT, measurement precision is conditional on ability level, defined by the test information function \(I_T(\theta)\). The conditional reliability at any ability level can be approximated through these relationships:

\[ \begin{aligned} \rho(\theta) &\approx \frac{I_T(\theta)}{1 + I_T(\theta)} \\[8pt] &\approx 1 - \frac{1}{I_T(\theta)} \quad \text{(for large } I_T(\theta)\text{)} \\[8pt] &= 1 - \big[\mathrm{SE}(\theta)\big]^2 \end{aligned} \]
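A small sketch comparing the three expressions at a few illustrative information levels; the values are chosen only to show when the large-information approximation becomes accurate.

# Conditional reliability from test information (illustrative sketch)
I_T <- c(4, 10, 25, 100)                      # illustrative information levels
SE  <- 1 / sqrt(I_T)

rho_exact   <- I_T / (1 + I_T)                # primary definition
rho_approx  <- 1 - 1 / I_T                    # large-information approximation
rho_from_se <- 1 - SE^2                       # equivalent form via the SEM

round(data.frame(I_T, SE, rho_exact, rho_approx, rho_from_se), 3)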


Mathematical Foundation

  • Primary Definition: \(\rho(\theta) \approx \displaystyle\frac{I_T(\theta)}{1 + I_T(\theta)}\)

    • Interpretation: Reliability represents the proportion of observed score variance attributable to true score variance

    • Basis: Derived from \(\rho = \displaystyle\frac{\sigma^2_{true}}{\sigma^2_{true} + \sigma^2_{error}}\) with:

      • True score variance standardized to 1
      • Error variance = \(\displaystyle\frac{1}{I_T(\theta)}\)
    • Properties: Bounded [0,1], increases with information, approaches 1 asymptotically

  • Practical Approximation: \(\rho(\theta) \approx 1 - \displaystyle\frac{1}{I_T(\theta)}\)

    • Application: When \(I_T(\theta) > 10\)
    • Derivation: For large \(I_T(\theta)\), \(1 + I_T(\theta) \approx I_T(\theta)\)
    • Utility: Computational simplicity with maintained accuracy
  • Standard Error Link: \(\rho(\theta) = 1 - \big[\mathrm{SE}(\theta)\big]^2\)

    • Foundation: \(\mathrm{SE}(\theta) = \frac{1}{\sqrt{I_T(\theta)}}\)
    • Substitution: \(\big[\mathrm{SE}(\theta)\big]^2 = \displaystyle\frac{1}{I_T(\theta)}\)
    • Meaning: Direct reliability-precision equivalence

Practical Interpretation

\(\mathrm{SE}(\theta)\) \(\rho(\theta)\) Precision Level
0.10 0.99 Research calibration
0.20 0.96 Very high reliability
0.30 0.91 High-stakes CAT target
0.40 0.84 Moderate precision
0.50 0.75 Minimum for reporting
0.60 0.64 Screening only

Key Implications

  1. Local Precision — Reliability varies with ability: \(I_T(\theta)\) changes across the continuum.
  2. Information-Driven Measurement — Greater information yields higher reliability and smaller error.
  3. Efficiency Principle — Doubling total information reduces the standard error by about 30% : \(\Big(\frac{1}{\sqrt{2}} \approx 0.707\Big)\).
  4. CAT Advantage — Adaptive item selection maximizes information accumulation at each step.

Because \(I_T(\theta)\) varies across ability levels, IRT-based reliability is inherently local: it describes precision at specific points on the latent continuum rather than a single global index. For reporting purposes, this is often summarized as marginal reliability, the expectation of information (or precision) across the population distribution of \(\theta\).


This mathematical framework forms the engine of Computerized Adaptive Testing (CAT). By dynamically selecting items that maximize total information \(I_T(\theta)\) at the examinee’s current ability estimate, CAT achieves high precision (low standard error) far more efficiently than fixed-form tests.

Because standard error is inversely related to the square root of total information, each additional high-information item contributes diminishing returns to precision. By continuously targeting the most informative items at each ability level, CAT algorithms accumulate measurement precision rapidly while minimizing the number of administered items—the defining principle behind their remarkable efficiency.


Quantity Formula Interpretation
Information \(I_T(\theta)\) Test precision at each ability
Conditional SEM \(SEM(\theta) = \frac{1}{\sqrt{I_T(\theta)}}\) Expected error at each ability
Conditional Reliability \(\rho(\theta) = 1 - \frac{1}{I_T(\theta)}\) True-score precision at that point

Relationship Between Information, SEM, and Reliability

The core concepts of Item Response Theory—Fisher Information, the Standard Error of Measurement (SEM), and Reliability—are fundamentally interconnected. This relationship is most clearly seen in the context of maximum-likelihood estimation (MLE) and is the backbone of modern adaptive testing.

The Inverse Relationship: Information and SEM

Under standard regularity conditions, the asymptotic standard error of a maximum-likelihood ability estimate is inversely related to the total Fisher information at that ability level:

\[ \text{SEM}(\theta) \approx \frac{1}{\sqrt{I_T(\theta)}}. \]

This establishes the fundamental principle: more information yields smaller uncertainty.

\[ I_T(\theta) \uparrow \;\;\Rightarrow\;\; \text{SEM}(\theta) \downarrow \]

The Consequence: Information and Reliability

Reliability, in this framework, is a direct function of measurement precision. Since the SEM represents error, and information dictates the SEM, we can conclude:

  • Higher information leads to a smaller SEM, which in turn means higher reliability.
  • As the total Fisher information grows without bound, reliability asymptotically approaches 1, representing perfect measurement precision:

\[ I_T(\theta) \to \infty \;\;\Rightarrow\;\; \text{SEM}(\theta) \to 0 \;\;\Rightarrow\;\; \rho(\theta) \to 1. \]

Application: Stopping Rule in Adaptive Testing

This direct link between information and precision is leveraged in Computerized Adaptive Testing (CAT). Because the standard error is given by \(SE(\hat\theta) \approx 1/\sqrt{I_T(\hat\theta)}\), the testing algorithm can be programmed to stop once a predetermined level of precision is achieved—for example, when \(SE(\hat\theta) < \tau\) (with a common threshold τ being 0.30).

This efficient stopping rule, which ensures measurement precision with minimal items, is what makes the information-SEM relationship the operational foundation of CAT.



Computerized Adaptive Testing: Algorithm Implementation and Ability Estimation

The adaptive testing algorithm operates on the principle of Maximum Fisher Information (MFI) for item selection. This involves identifying the item from the available pool that maximizes the Fisher information function at the current provisional ability estimate, formalized as:

\[ j^* = \arg\max_{j \in R} I_j(\hat{\theta}_{\text{current}}) \]

where \(R\) represents the set of remaining items in the pool. This criterion ensures that each administered item is optimally targeted to reduce the standard error of the ability estimate at its current value, leading to a highly efficient and precise measurement process.
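A minimal sketch of MFI selection, assuming a small hypothetical 3PL item bank and a provisional estimate \(\hat{\theta} = 0.35\) (the helper item_info_3pl and the bank are illustrative, not part of the app): the remaining items are scored by their information at \(\hat{\theta}\) and the arg-max is administered next.

# Maximum Fisher Information (MFI) item selection from a remaining pool (3PL sketch)
item_info_3pl <- function(theta, a, b, c, D = 1.702) {
  P <- c + (1 - c) / (1 + exp(-D * a * (theta - b)))
  (D * a)^2 * ((P - c) / (1 - c))^2 * (1 - P) / P
}

# Hypothetical item bank; `administered` marks items already used
bank <- data.frame(
  a = c(1.8, 1.1, 1.4, 0.9, 1.6),
  b = c(-1.0, 0.0, 0.4, 1.2, 0.6),
  c = c(0.20, 0.15, 0.25, 0.10, 0.20)
)
administered <- c(1)                 # item 1 already given
theta_hat    <- 0.35                 # current provisional ability estimate

remaining <- setdiff(seq_len(nrow(bank)), administered)
info <- with(bank[remaining, ], item_info_3pl(theta_hat, a, b, c))
next_item <- remaining[which.max(info)]
next_item                            # index of the most informative remaining item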


Practical Implications:

  • Precision-Targeted Questions: The test dynamically selects items with the highest discrimination power, meaning every question is purposefully chosen to refine the ability estimate.
  • Maximum Efficiency: By consistently administering high-information items, the algorithm achieves a highly precise measurement with significantly fewer questions than a fixed-form test.
  • Real-Time Adaptation: The assessment continuously recalibrates, homing in on the examinee’s true ability level as the test progresses for a truly personalized experience.

Ability Estimation Methods

The system employs two primary methods for estimating a test-taker’s ability (θ):

  • Maximum Likelihood Estimation (MLE): Finds the ability value that is statistically most likely, given the pattern of the test-taker’s responses.
  • Expected A Posteriori (EAP): A Bayesian method that combines the test-taker’s responses with prior knowledge about the typical ability distribution in the population.

Method Comparison

Method Formula Nature Behavior at Extremes Output
MLE \(\arg\max_\theta \ell(\theta)\) Frequentist Undefined for all-0/all-1 \(\hat\theta\) only
EAP \(\displaystyle \frac{\int \theta L(\theta)\pi(\theta)\:d\theta}{\int L(\theta)\pi(\theta)\,\: d\theta}\) Bayesian (mean) Finite, smooth \(\hat\theta\) and SE

Selecting the Appropriate Method:

  • MLE: Use MLE when the response pattern is informative and varied (a mix of correct and incorrect answers). It provides a direct, data-driven estimate.
  • EAP: Prefer EAP for extreme patterns (all correct/incorrect) or shorter tests. It incorporates prior information to produce a more stable and reliable estimate, preventing unrealistic ability scores.
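As a sketch of the EAP approach (2PL items with illustrative parameters, a standard normal prior, and simple rectangular quadrature), the snippet below returns a finite estimate and posterior SD even for an all-correct pattern, where the MLE would diverge.

# EAP ability estimate via numerical quadrature with a N(0, 1) prior (2PL sketch)
a <- c(1.5, 1.1, 0.9, 1.3); b <- c(-0.8, 0.0, 0.5, 1.1); D <- 1.702
y <- c(1, 1, 1, 1)                       # extreme all-correct pattern

quad  <- seq(-4, 4, length.out = 81)     # quadrature points
prior <- dnorm(quad)                     # standard normal prior weights

lik <- sapply(quad, function(th) {
  P <- 1 / (1 + exp(-D * a * (th - b)))
  prod(P^y * (1 - P)^(1 - y))            # likelihood of the whole response pattern
})

post <- lik * prior
theta_eap <- sum(quad * post) / sum(post)                         # posterior mean
se_eap    <- sqrt(sum((quad - theta_eap)^2 * post) / sum(post))   # posterior SD
c(theta_eap = theta_eap, se_eap = se_eap)                         # finite despite all-correct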

Mathematical Foundation: Likelihood Construction and Score Function

The statistical engine of ability estimation relies on a core principle: given a person’s ability θ, their responses to items are independent. This local independence assumption allows us to construct the likelihood of a full response pattern \(\mathbf{y} = (y_1, y_2, \ldots, y_J)\) as a simple product of individual item probabilities. For computational stability, we work with the log-likelihood:

\[ \ell(\theta) = \sum_{j=1}^{J} \Big[ y_j \cdot \log P_j(\theta) + (1 - y_j) \cdot \log \big(1 - P_j(\theta)\big) \Big] \]

What does this represent?

  • Term-by-Term: Each component \(y_j \cdot \log P_j(\theta) + (1 - y_j) \cdot \log \big(1 - P_j(\theta)\big)\) quantifies how “surprised” or “informed” we are by a single response y_j at a specific ability level \(\theta\).
  • The Sum: The log-likelihood \(\ell(\theta)\) aggregates the total evidence from the entire set of responses. It’s a measure of how well a hypothetical ability \(\theta\) explains all the observed data.
  • The Goal: The Maximum Likelihood Estimate (MLE) is the value \(\theta\) that maximizes this function, representing the single most plausible ability estimate given the person’s actual performance.

The Score Function: The Engine of Estimation

Finding the MLE is an optimization problem, efficiently solved using calculus. The first derivative of the log-likelihood, known as the score function, acts as a guide, pointing the way toward the best estimate.

\[ \ell'(\theta) = \sum_{j=1}^J \frac{y_j - P_j(\theta)}{P_j(\theta) \cdot \big(1 - P_j(\theta)\big)} \cdot P'_j(\theta) \]

Deconstructing the Mechanism:

  • The Signal \(y_j - P_j(\theta)\): This is the raw residual—the difference between what we observed \((y_j)\) and what the model predicted \((P_j(\theta))\). It tells us if the model was too optimistic or pessimistic for that item.
  • The Amplifier \(P'_j(\theta)\): This is the slope of the Item Characteristic Curve (ICC). A steeper slope means the item is highly discriminatory, and its residual signal is amplified. An item that is very sensitive to small changes in ability gets a bigger “vote.”
  • The Weight \(\Big[P_j(\theta) \cdot \big(1 - P_j(\theta) \big) \Big]^{-1}\): This is the inverse variance. It gives more influence to items where the probability is near 0.5 (maximum uncertainty), as these responses are most informative. It downweights items that are very easy or very hard for the candidate, where a right or wrong answer is less surprising.

Key Insight: The MLE \(\hat{\theta}_{\text{MLE}}\) is found where the score function equals zero, \(\ell'(\theta) = 0\). At this point, the weighted sum of all prediction errors (residuals) is perfectly balanced. The model’s predictions are, in a weighted sense, as close as possible to the observed responses.

Practical Significance in CAT:

  • The score function provides a direct, numerical “nudge” for the estimation algorithm, indicating both the direction (sign) and magnitude of the needed adjustment to \(\theta\).
  • The formula automatically implements an optimal weighting scheme: highly discriminating items and items of medium difficulty naturally have a greater influence on the final score.
  • This mathematical framework guarantees that the CAT algorithm efficiently and systematically converges on the most precise ability estimate from the available data.
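A minimal Fisher-scoring sketch (2PL items with illustrative parameters, not app code): for the 2PL the general score above simplifies to \(\sum_j D\,a_j\,(y_j - P_j(\theta))\), and repeated updates \(\hat{\theta} \leftarrow \hat{\theta} + \ell'(\hat{\theta}) / I_T(\hat{\theta})\) drive the score toward zero while the accumulated information supplies the standard error.

# Fisher scoring for the MLE of theta (2PL sketch with a varied response pattern)
a <- c(1.4, 1.0, 1.6, 0.8, 1.2); b <- c(-1.0, -0.3, 0.2, 0.7, 1.3); D <- 1.702
y <- c(1, 1, 0, 1, 0)

score_fun <- function(th) {
  P <- 1 / (1 + exp(-D * a * (th - b)))
  sum(D * a * (y - P))                 # 2PL score: residuals weighted by D * a
}
info_fun <- function(th) {
  P <- 1 / (1 + exp(-D * a * (th - b)))
  sum((D * a)^2 * P * (1 - P))         # expected (Fisher) information
}

theta_hat <- 0                          # start at the prior mean
for (iter in 1:25) {
  step <- score_fun(theta_hat) / info_fun(theta_hat)   # nudge toward score = 0
  theta_hat <- theta_hat + step
  if (abs(step) < 1e-6) break
}
c(theta_hat = theta_hat, SE = 1 / sqrt(info_fun(theta_hat)))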

Item Characteristic Curve Analysis

# ============================================================
# Complete ICC Derivatives Implementation — Annotated
# ------------------------------------------------------------
# Goal:
#   • Define a 4PL Item Characteristic Curve (ICC)
#   • Provide closed-form first/second derivatives expressed in terms of P(θ)
#   • Build a tidy data.frame over a θ-grid
#   • Visualize dP/dθ, d²P/dθ², and Fisher Information for a 3PL item
#
# Notes:
#   • Using the logistic scaling constant D = 1.702 (≈ probit-to-logit link)
#   • Derivatives are written as functions of P to avoid repeated exp() calls
#   • Plots mark the item difficulty b with a vertical dashed line
#   • Comments are explanatory only; they do not change the computational behavior
# ============================================================

library(ggplot2)

# -----------------------------
# 4PL ICC: P(θ) = c + (d - c) / (1 + exp(-D a (θ - b)))
# -----------------------------
# Arguments:
#   theta : numeric vector of abilities
#   a     : discrimination (slope) > 0
#   b     : difficulty (location of maximum slope for 3PL/4PL interior)
#   c     : lower asymptote (guessing) in [0, 1)
#   d     : upper asymptote (in (0, 1])
#   D     : logistic scaling constant (1.702 ≈ normal-ogive)
# Returns:
#   numeric vector of probabilities P(θ)
icc_4pl <- function(theta, a, b, c = 0, d = 1, D = 1.702) {
  # logistic core; write explicitly for clarity and speed
  s <- 1 / (1 + exp(-D * a * (theta - b)))
  # affine map from (0,1) → (c,d)
  c + (d - c) * s
}

# ----------------------------------------------------------
# First derivative dP/dθ for 4PL expressed in terms of P(θ)
# ----------------------------------------------------------
# Derivation (using logistic algebra):
#   s'(θ) = D a s(1 - s)
#   P(θ)  = c + (d - c) s  ⇒  P - c = (d - c) s
#   d - P = (d - c) (1 - s)
#   => dP/dθ = (d - c) * s'(θ) = (d - c) * D a s(1 - s)
#             = D a * ((P - c)(d - P)) / (d - c)
icc_d1_4pl <- function(P, a, c = 0, d = 1, D = 1.702) {
  D * a * ((P - c) * (d - P)) / (d - c)
}

# -------------------------------------------------------------
# Second derivative d²P/dθ² for 4PL expressed in terms of P(θ)
# -------------------------------------------------------------
# Using product rule and the identities above, one can show:
#   d²P/dθ² = (D a)^2 * (P - c)(d - P)(d + c - 2P) / (d - c)^2
# Sign of d²P/dθ² changes at P = (c + d)/2 (inflection near θ ≈ b)
icc_d2_4pl <- function(P, a, c = 0, d = 1, D = 1.702) {
  (D * a)^2 * (P - c) * (d - P) * (d + c - 2 * P) / (d - c)^2
}

# ----------------------------------------------------------
# Helper to build θ-grid and compute P, dP/dθ, d²P/dθ²
# ----------------------------------------------------------
# Arguments:
#   a, b, c, d, D : item parameters (vectors OK; rep_len via icc_4pl)
#   theta_min/max : range of ability
#   n             : number of grid points
# Returns:
#   data.frame(theta, P, d1, d2, a, b, c, d, D)
icc_deriv_df <- function(a = 1.3, b = 0, c = 0, d = 1, D = 1.702,
                         theta_min = -4, theta_max = 4, n = 801) {
  theta <- seq(theta_min, theta_max, length.out = n)

  # Compute probability once (stable & vectorized)
  P  <- icc_4pl(theta, a, b, c, d, D)

  # Closed-form derivatives from P to avoid repeated exp()
  d1 <- icc_d1_4pl(P, a, c, d, D)
  d2 <- icc_d2_4pl(P, a, c, d, D)

  # Assemble tidy output; parameters included for plotting annotations
  data.frame(theta, P, d1, d2, a, b, c, d, D)
}

# ----------------------------------------------------------
# Example: 3PL item (d = 1) with moderate discrimination
# ----------------------------------------------------------
df3 <- icc_deriv_df(a = 1.5, b = 0, c = 0.20, d = 1.0)

# Fisher Information for a 3PL item:
#   I(θ) = [P'(θ)]² / [P(θ)(1 - P(θ))]
# NOTE: we rely on the analytical d1 above. If desired, clip P numerically:
#   P_safe <- pmin(pmax(df3$P, 1e-12), 1 - 1e-12)
#   df3$Fisher <- (df3$d1)^2 / (P_safe * (1 - P_safe))
df3$Fisher <- (df3$d1)^2 / (df3$P * (1 - df3$P))

# -------------------------
# Visualizations (ggplot2)
# -------------------------
# p_d1: First derivative — peaks near θ ≈ b (maximum slope / discrimination)
p_d1 <- ggplot(df3, aes(theta, d1)) +
  geom_hline(yintercept = 0, linewidth = 0.3) +                           # baseline
  geom_vline(xintercept = unique(df3$b), linetype = 2, linewidth = 0.3) + # mark b
  geom_line(linewidth = 1) +
  labs(title = "First Derivative of ICC (3PL)",
       subtitle = "Peaks near item difficulty b",
       x = expression(theta),
       y = expression(P*minute(theta))) +
  theme_minimal(base_size = 12)

# p_d2: Second derivative — sign change at inflection (≈ where P = (c+d)/2)
p_d2 <- ggplot(df3, aes(theta, d2)) +
  geom_hline(yintercept = 0, linewidth = 0.3) +
  geom_vline(xintercept = unique(df3$b), linetype = 2, linewidth = 0.3) +
  geom_line(linewidth = 1) +
  labs(title = "Second Derivative of ICC",
       x = expression(theta),
       y = expression(P*second(theta))) +
  theme_minimal(base_size = 12)

# p_info: Fisher Information — precision profile across θ
p_info <- ggplot(df3, aes(theta, Fisher)) +
  geom_line(linewidth = 1) +
  labs(title = "Fisher Information for 3PL Item",
       x = expression(theta),
       y = expression(I[j](theta))) +
  theme_minimal(base_size = 12)

# Render in order (derivative → curvature → information)
p_d1; p_d2; p_info


Information and Precision

In ability estimation, information quantifies how much precision an item or test provides about an examinee’s latent ability \(\theta\). Two related but distinct measures are used in IRT: Fisher Information (expected precision) and Observed Information (realized precision).


Fisher Information: The Expected Precision

As defined earlier, Fisher Information represents the expected or potential precision of an item at a given ability level—averaging over all possible response patterns.
For a set of \(J\) items:

\[ I(\theta) = \sum_{j=1}^J \frac{[P'_j(\theta)]^2}{P_j(\theta)\cdot[1 - P_j(\theta)]}. \]

Key Properties and Interpretation

  • A Property of the Items: It depends only on item parameters (difficulty, discrimination) and the ability level \(\theta\), not on any specific response pattern.
  • Always Non-Negative: \(I(\theta) \ge 0\); larger values indicate higher potential measurement precision.
  • Role in CAT: Used before item administration to predict which item will best reduce uncertainty in the current ability estimate.

Observed Information: The Realized Precision

The observed information—the negative of the second derivative of the log-likelihood—is given by:

\[ \boxed{ \mathcal{J}(\theta) = -\,\ell''(\theta) = \sum_{j=1}^J \left[ \frac{\Big[P'_j(\theta)\Big]^2}{P_j(\theta)\!\cdot\! Q_j(\theta)} + \frac{\Big[y_j - P_j(\theta)\Big] \!\cdot \Big[1 - 2 \!\cdot\! P_j(\theta)\Big]}{P_j(\theta)^{2} \!\cdot\! Q_j(\theta)^{2}} \!\cdot\! \Big[P'_j(\theta)\Big]^2 - \frac{\Big[y_j - P_j(\theta)\Big]}{P_j(\theta)\!\cdot\! Q_j(\theta)} \!\cdot\! P''_j(\theta) \right]. } \]

where \(Q_j(\theta) = 1 - P_j(\theta)\),
\(P'_j(\theta) = \frac{dP_j(\theta)}{d\theta}\), and \(P''_j(\theta) = \frac{d^2P_j(\theta)}{d\theta^2}\).

Taking the expectation over all possible responses (\(\mathbb{E}[y_j] = P_j\)) yields:

\[ \mathbb{E}\!\Big[\mathcal{J}(\theta)\Big] = \sum_{j=1}^J \frac{\Big[P'_j(\theta)\Big]^2}{P_j(\theta)\!\cdot\! Q_j(\theta)} = I(\theta). \]

which is the Fisher Information.
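
This identity is easy to check numerically. The sketch below (hypothetical 2PL items, for which \(P'' = D\,a\,P'(1 - 2P)\)) computes the observed information for every possible response pattern and averages over the model probabilities, recovering the Fisher Information.

# Sketch: observed vs. expected (Fisher) information for three hypothetical 2PL items
D <- 1.702
a <- c(1.0, 1.4, 0.8); b <- c(-0.3, 0.4, 1.0)
theta <- 0.2

P   <- 1 / (1 + exp(-D * a * (theta - b)))
Q   <- 1 - P
dP  <- D * a * P * Q             # P'(theta) for the 2PL
d2P <- D * a * dP * (1 - 2 * P)  # P''(theta) for the 2PL

obs_info <- function(y) {
  sum(dP^2 / (P * Q) +
      (y - P) * (1 - 2 * P) * dP^2 / (P^2 * Q^2) -
      (y - P) * d2P / (P * Q))
}
fisher_info <- sum(dP^2 / (P * Q))

# Average observed information over all 2^3 response patterns,
# weighted by their model probabilities: this equals the Fisher Information.
patterns <- expand.grid(y1 = 0:1, y2 = 0:1, y3 = 0:1)
probs    <- apply(patterns, 1, function(y) prod(P^y * Q^(1 - y)))
mean_obs <- sum(probs * apply(patterns, 1, obs_info))

c(fisher = fisher_info, expected_observed = mean_obs)  # the two values should match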


Key Differences

Aspect Expected Information \(I(\theta)\) Observed Information \(\mathcal{J}(\theta)\)
Definition Average curvature across possible response patterns Actual curvature for observed response pattern
Dependence Only on item parameters On both item parameters AND observed responses
Computation Simpler: one term Complex: three terms involving \(y_j - P_j(\theta)\)
Use Case Item selection in CAT Standard error estimation after administration

Deconstructing the Components

  • Expected Term:
    \(\frac{[P'_j(\theta)]^2}{P_j(\theta)\,Q_j(\theta)}\) — identical to the item’s Fisher Information; it represents the expected precision.
  • Response-Dependent Adjustment:
    \(-\,\frac{y_j - P_j(\theta)}{P_j(\theta)\,Q_j(\theta)}\,P''_j(\theta)\) modifies the expected precision according to how surprising the actual response was.
    Unexpected responses (e.g., a wrong answer to an easy item) can lower observed information, reflecting greater interpretive uncertainty.

Practical Implications

  • Response-Specific: Observed information captures the realized precision given an examinee’s particular pattern of right and wrong answers.

  • Can Be Lower or Unstable: Especially when responses deviate strongly from model expectations.

  • Used for Final Precision: While Fisher Information guides item selection, observed information is often used to compute the final standard error.

    • In CAT Item Selection:
      • Use expected information \(I(\theta)\) because responses are not yet observed
      • Simpler computation depends only on item parameters
    • In Final Standard Error Estimation:
      • Observed information \(\mathcal{J}(\theta)\) gives more accurate precision estimates
      • Accounts for actual response pattern (some items provide more/less information than expected)
      • Captures “surprise” in response patterns for realistic measurement precision

Example: For a person who answers unexpectedly (easy items wrong but hard items right), observed information differs significantly from expected information, providing better confidence intervals and termination decisions.

\[ SEM(\hat{\theta}) = \frac{1}{\sqrt{\mathcal{J}(\hat{\theta})}}. \]

Key Relationship:
Fisher Information is the expected value of Observed Information.
Think of Fisher as the “advertised” precision of the test, and Observed as the realized precision once the examinee’s actual responses are known.


Standard Error and Stopping Criteria

The conditional standard error expresses the precision of the current ability estimate:

\[ SE(\hat{\theta}) = \frac{1}{\sqrt{I_T(\hat{\theta})}}, \qquad I_T(\theta) = \sum_{j=1}^J I_j(\theta). \]

Practical Significance

  • Decreases as more informative items are administered.
  • Forms the basis for adaptive test stopping rules.
  • Supports construction of confidence intervals around ability estimates.

Key Insight:
The score function determines the direction of parameter updates, while the information function governs their magnitude and precision—like steering and throttle working together to reach the most likely ability estimate efficiently.
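
A minimal sketch of an SE-based stopping check (the item information values, the threshold \(\tau = 0.30\), and the 30-item cap are illustrative):

# Sketch: stopping rule based on the conditional standard error
item_info <- c(0.45, 0.62, 0.38, 0.51, 0.47)   # I_j(theta_hat) for items given so far (hypothetical)

se_theta  <- 1 / sqrt(sum(item_info))                     # SE(theta_hat) = 1 / sqrt(I_T(theta_hat))
stop_test <- se_theta < 0.30 || length(item_info) >= 30   # precision target or length cap

c(SE = se_theta, stop = stop_test)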


Optimization Algorithm Comparison

Method Update Formula Uses Notes
Newton–Raphson \(\theta_{k+1} = \theta_k - \dfrac{\ell'(\theta_k)}{\ell''(\theta_k)}\) Observed information (second derivative of log-likelihood) More precise but can be unstable (oscillate or diverge), especially if the second derivative is near zero or negative.
Fisher Scoring \(\theta_{k+1} = \theta_k + \dfrac{\ell'(\theta_k)}{I(\theta_k)}\) Expected information (Fisher Information) More stable and reliable, as \(I(\theta)\) is guaranteed to be positive. The standard choice for many generalized linear models like IRT.

Fisher Scoring Algorithm

1. Compute probabilities for each item at current ability estimate:
   Pⱼ(θ⁽ᵏ⁾) = cⱼ + (1-cⱼ) · 1 / (1 + e^(-D·aⱼ·(θ⁽ᵏ⁾-bⱼ)))
   Note: Probabilities constrained to [0.001, 0.999] for numerical stability

2. Calculate score - weighted sum of residuals:
   Sₖ = ∑ⱼ (yⱼ - Pⱼ) · D·aⱼ · (Pⱼ - cⱼ) / ((1-cⱼ)·Pⱼ)
   For 1PL/2PL models (c_j = 0), this simplifies to D·a_j·(y_j - P_j)
   Each item weighted by discrimination and scaled by guessing parameter

3. Compute information - sum of item informations:
   Iₖ = ∑ⱼ (D·aⱼ)² · (1-Pⱼ)/Pⱼ · ((Pⱼ - cⱼ)/(1-cⱼ))²
   Total measurement precision at current ability level

4. Update estimate using score-to-information ratio:
   θ⁽ᵏ⁺¹⁾ = θ⁽ᵏ⁾ + Sₖ/Iₖ
   Step size adapted based on available information

5. Check convergence using the absolute change in the estimate:
   |Δₖ| = |θ⁽ᵏ⁺¹⁾ - θ⁽ᵏ⁾| < τ
   Typical tolerance: τ = 0.001

Final Output:
- Ability estimate: θ̂ = θ⁽ᵏ⁺¹⁾
- Standard error: SE(θ̂) = 1 / √I(θ̂)

Convergence Monitoring:
- Usually 3-10 iterations for convergence
- Step halving if likelihood decreases
- Maximum iterations (e.g., 50) prevents infinite loops

Core Quantities Reference

Quantity Formula Purpose Practical Role
Log-Likelihood \(\ell(\theta) = \sum_{j=1}^J \big[ y_j \log P_j(\theta) + (1-y_j) \log(1 - P_j(\theta)) \big]\) Objective Function Measures model-data fit; the goal is to maximize it.
Score Function \(\ell'(\theta) = \sum_{j=1}^J \frac{y_j - P_j(\theta)}{P_j(\theta)Q_j(\theta)} \cdot P'_j(\theta)\) Gradient / First Derivative Indicates the direction and steepness of ascent for maximizing the likelihood.
Fisher Information \(I(\theta) = \sum_{j=1}^J \frac{[P'_j(\theta)]^2}{P_j(\theta) Q_j(\theta)}\) Curvature of Expected Likelihood Determines the expected precision of the estimate; controls step size in Fisher Scoring.
Standard Error \(\mathrm{SE}(\hat{\theta}) \approx \frac{1}{\sqrt{I(\hat{\theta})}}\) Precision Measure Quantifies uncertainty in the final ability estimate; guides stopping decisions in CAT.

MLE Limitations in IRT

  • Extreme Response Patterns: Perfect scores (all correct/incorrect) create monotonic likelihood functions with no finite maximum, making MLE undefined.
  • Short Tests: Few items produce flat likelihood surfaces with high estimation uncertainty and potential for multiple local maxima.
  • Boundary Issues: Numerical instability occurs when item response probabilities approach 0 or 1, leading to undefined log-likelihood values.
  • Small Samples: MLE properties (consistency, efficiency) are asymptotic and may not hold with few items.

Practical Solutions in CAT

  • Bounded Estimation: Constrain the search space (e.g., \(\theta \in [-4,4]\)) to prevent infinite estimates and ensure numerical stability.
  • Bayesian Methods: Use MAP (Maximum a Posteriori) or EAP (Expected a Posteriori) estimation with a \(\mathcal{N}(0,1)\) prior to regularize extreme cases and provide finite estimates for all response patterns.
  • Adaptive Stopping Rules: Terminate the test when precision threshold is met (e.g., \(\mathrm{SE}(\hat{\theta}) < 0.30\) or after a maximum number of items).
  • Hybrid Estimation: Use EAP for initial estimates when information is sparse, then switch to MLE once the response pattern provides sufficient information (typically after 3-5 responses).
  • Likelihood Stabilization: Add small constants or use penalized likelihood methods to avoid numerical underflow when probabilities approach boundaries.
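
A small sketch of the first two points, using hypothetical 2PL items: with an all-correct pattern the log-likelihood keeps increasing in \(\theta\), so no finite MLE exists, while a grid-based EAP with a \(\mathcal{N}(0,1)\) prior still returns a finite, shrunken estimate.

# Sketch: perfect score -> monotone log-likelihood (no finite MLE); EAP stays finite
D <- 1.702
a <- c(1.2, 1.0, 0.9, 1.4); b <- c(-1, 0, 0.5, 1)
y <- rep(1, 4)                                   # all-correct response pattern

loglik <- function(theta) {
  P <- 1 / (1 + exp(-D * a * (theta - b)))
  sum(y * log(P) + (1 - y) * log(1 - P))
}

sapply(c(0, 2, 4, 6), loglik)                    # strictly increasing: MLE undefined

theta_grid <- seq(-4, 4, length.out = 201)       # bounded grid with a N(0,1) prior
post <- exp(sapply(theta_grid, loglik)) * dnorm(theta_grid)
sum(theta_grid * post) / sum(post)               # finite EAP estimate, shrunk toward 0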

The Fisher scoring algorithm:

\[ \boxed{\theta^{(k+1)} = \theta^{(k)} + \frac{\ell'\big(\theta^{(k)}\big)}{I\big(\theta^{(k)}\big)}} \]

provides the ideal computational foundation for CAT systems, offering a unique synergy of theoretical and practical advantages:

  1. Theoretical Rigor & Unified Framework: Rooted in maximum likelihood principles, it creates a unified framework where the same Fisher Information metric is used for both item selection (Maximum Fisher Information) and ability estimation, ensuring consistency and efficiency.

  2. Numerical Stability & Robustness: The use of expected Fisher Information, \(I(\theta)\), which averages over all response patterns, makes the algorithm less sensitive to unusual responses and prevents the instability that can occur with the observed information in Newton-Raphson. This guarantees smoother, more reliable convergence.

  3. Computational Efficiency: The Fisher Information often has a closed-form expression in IRT models, making it computationally cheaper to evaluate than the observed Hessian required by Newton-Raphson. This efficiency is critical for real-time scoring in adaptive tests.

  4. Inherently Adaptive Precision: The algorithm naturally incorporates the standard error \(\mathrm{SE}(\hat{\theta}) = 1/\sqrt{I(\hat{\theta})}\) directly into its updating mechanism, intrinsically linking the step size of each estimate to its current measurement precision.

This elegant integration of likelihood, score, and information functions creates a robust engine for efficient, precise ability estimation that dynamically adapts to each examinee—forming the mathematical heart of modern computerized adaptive testing.



R Implementation

This R script implements maximum likelihood estimation (MLE) for the 3-Parameter Logistic (3PL) IRT model using a Fisher scoring approach. It defines helper functions to compute item response probabilities and log-likelihoods, then iteratively updates the ability estimate \(\theta\) using the ratio of the score to Fisher information, with damping and boundary safeguards for numerical stability.

The example simulates item parameters and responses, estimates \(\hat{\theta}\), reports its standard error and Fisher information, and optionally cross-checks with EAP estimation via Gauss–Hermite quadrature. It also includes diagnostic outputs, edge-case handling (e.g., all-correct or all-incorrect patterns), and a small simulation to summarize bias, RMSE, and coverage accuracy.

# ==========================================================
# 3PL MLE via Fisher Scoring + Full Labeled Summary Output
# ==========================================================

# ---------- Minimal 3PL helpers ----------
icc_3pl <- function(theta, a, b, c, D = 1.702) {
  # P_j(θ) = c_j + (1 - c_j) * logistic(D a_j (θ - b_j))
  c + (1 - c) * 1/(1 + exp(-D * a * (theta - b)))
}

loglik_3pl <- function(theta, y, a, b, c, D = 1.702, eps = 1e-12) {
  # ℓ(θ) = Σ [ y log P + (1 - y) log(1 - P) ]
  p <- icc_3pl(theta, a, b, c, D)
  p <- pmin(1 - eps, pmax(eps, p))
  sum(y * log(p) + (1 - y) * log(1 - p))
}

# ---------- Robust Fisher-Scoring MLE for 3PL ----------
update_theta <- function(theta, y, a, b, c, D = 1.702, maxit = 50,
                         tol = 1e-4, lambda = 1.0, eps = 1e-8,
                         ridge = 1e-8, min_lambda = 1e-4, trace = FALSE,
                         theta_bounds = c(-6, 6)) {

  stopifnot(length(y) == length(a),
            length(a) == length(b),
            length(b) == length(c))

  ok <- is.finite(y) & is.finite(a) & is.finite(b) & is.finite(c)
  if (!all(ok)) {
    if (trace) message("Dropping ", sum(!ok), " items with NA/Inf.")
    y <- y[ok]; a <- a[ok]; b <- b[ok]; c <- c[ok]
  }

  a[a <= 0] <- eps
  c[c < 0]  <- 0
  c[c >= 1 - eps] <- 1 - eps

  loglik <- function(th) {
    P <- c + (1 - c) * 1/(1 + exp(-D * a * (th - b)))
    P <- pmin(1 - eps, pmax(eps, P))
    sum(y * log(P) + (1 - y) * log(1 - P))
  }
  fisher_info_at <- function(th) {
    P <- c + (1 - c) * 1/(1 + exp(-D * a * (th - b)))
    P <- pmin(1 - eps, pmax(eps, P)); Q <- 1 - P
    sum((D * a)^2 * (Q / P) * ((P - c)/(1 - c))^2) + ridge
  }
  clamp_theta <- function(th) pmin(pmax(th, theta_bounds[1]), theta_bounds[2])

  if (trace) {
    hist <- data.frame(iter   = integer(),
                       theta  = double(),
                       score  = double(),
                       info   = double(),
                       step   = double(),
                       loglik = double(),
                       lambda = double())
  }

  theta   <- clamp_theta(theta)
  ll_prev <- loglik(theta)

  for (k in seq_len(maxit)) {
    x  <- D * a * (theta - b)
    s  <- 1 / (1 + exp(-x))
    P  <- c + (1 - c) * s
    P  <- pmin(1 - eps, pmax(eps, P)); Q <- 1 - P

    # Score: S(θ) = Σ (y - P) * D a * (P - c) / ((1 - c) P)
    score <- sum((y - P) * D * a * (P - c) / ((1 - c) * P))

    # Fisher info: I(θ) = Σ (D a)^2 * (Q/P) * ((P - c)/(1 - c))^2
    info  <- sum((D * a)^2 * (Q / P) * ((P - c)/(1 - c))^2) + ridge

    delta <- score / info
    lam   <- lambda

    tries <- 0
    while (!is.finite(delta) && lam >= min_lambda && tries < 5) {
      lam   <- lam * 0.5
      info2 <- info + 10^(tries) * ridge
      delta <- score / info2
      tries <- tries + 1
    }
    if (!is.finite(delta)) stop("Update failed: non-finite step delta.")

    new_theta <- clamp_theta(theta + lam * delta)
    ll_new    <- loglik(new_theta)
    while (!is.finite(ll_new) || ll_new + 1e-12 < ll_prev) {
      lam <- lam * 0.5
      if (lam < min_lambda) break
      new_theta <- clamp_theta(theta + lam * delta)
      ll_new    <- loglik(new_theta)
    }

    if (trace) {
      hist <- rbind(hist, data.frame(iter   = k,
                                     theta  = theta,
                                     score  = score,
                                     info   = info,
                                     step   = lam * delta,
                                     loglik = ll_prev,
                                     lambda = lam))
    }

    if (abs(lam * delta) < tol) {
      info_final <- fisher_info_at(new_theta)
      if (trace) return(list(theta = new_theta, se = 1/sqrt(info_final), trace = hist))
      return(list(theta = new_theta, se = 1/sqrt(info_final)))
    }

    theta   <- new_theta
    ll_prev <- ll_new
  }

  info_final <- fisher_info_at(theta)
  if (trace) return(list(theta = theta, se = 1/sqrt(info_final), trace = hist))
  list(theta = theta, se = 1/sqrt(info_final))
}

# ==========================================================
# Example: simulate 3PL items and responses, fit MLE, print full summary
# ==========================================================
set.seed(42)

J <- 30
a <- rlnorm(J, meanlog = log(1), sdlog = 0.2)  # discrimination ~1
b <- rnorm(J, mean = 0, sd = 1.0)              # difficulties around 0
c <- runif(J, 0.00, 0.25)                      # guessing 0..0.25

theta_true <- 0.20
p_true <- icc_3pl(theta_true, a, b, c)         # true P at θ_true
y <- rbinom(J, size = 1, prob = p_true)        # simulate responses

fit <- update_theta(theta = 0, y = y, a = a, b = b, c = c,
                    trace = TRUE, theta_bounds = c(-6, 6))

# ==========================================================
# Full labeled summary output 
# ==========================================================
cat("\n\n================== SUMMARY OUTPUT ==================\n")
## 
## 
## ================== SUMMARY OUTPUT ==================
cat("\n# ---- CORE ESTIMATES ----\n")
## 
## # ---- CORE ESTIMATES ----
cat(sprintf("θ̂ (MLe): %.6f\n", fit$theta))
## θ̂ (MLe): 0.576720
cat(sprintf("SE(θ̂) (asymptotic standard error): %.6f\n", fit$se))
## SE(θ̂) (asymptotic standard error): 0.291658
I_hat <- 1 / (fit$se^2)
cat(sprintf("I(θ̂) (Fisher information = 1/SE^2): %.5f\n", I_hat))
## I(θ̂) (Fisher information = 1/SE^2): 11.75579
cat("\n# ---- ITERATION DIAGNOSTICS (last 5 rows) ----\n")
## 
## # ---- ITERATION DIAGNOSTICS (last 5 rows) ----
if (!is.null(fit$trace)) {
  print(tail(fit$trace[, c("iter","theta","score","step","loglik","lambda")], 5), row.names = FALSE)
}
##  iter     theta       score          step    loglik lambda
##     1 0.0000000  6.83582852  6.038894e-01 -14.91487      1
##     2 0.6038894 -0.33583606 -2.877139e-02 -12.88844      1
##     3 0.5751180  0.01990381  1.692443e-03 -12.88389      1
##     4 0.5768104 -0.00106817 -9.086528e-05 -12.88387      1
cat("Notes: score→0 and step→0 indicate convergence; loglik should plateau/increase.\n")
## Notes: score→0 and step→0 indicate convergence; loglik should plateau/increase.
cat("\n# ---- SE vs Information Consistency ----\n")
## 
## # ---- SE vs Information Consistency ----
cat(sprintf("1 / SE(θ̂)^2 = %.5f (should match I(θ̂) above)\n", 1 / (fit$se^2)))
## 1 / SE(θ̂)^2 = 11.75579 (should match I(θ̂) above)
cat("\n# ---- COMPARISON TO SIMULATED TRUTH ----\n")
## 
## # ---- COMPARISON TO SIMULATED TRUTH ----
cat(sprintf("θ_true: %.6f   θ̂: %.6f   Δ = θ̂ - θ_true: %.6f\n",
            theta_true, fit$theta, fit$theta - theta_true))
## θ_true: 0.200000   θ̂: 0.576720   Δ = θ̂ - θ_true: 0.376720
ci_lo <- fit$theta - 1.96 * fit$se
ci_hi <- fit$theta + 1.96 * fit$se
cat(sprintf("95%% Wald CI for θ: [ %.6f , %.6f ]\n", ci_lo, ci_hi))
## 95% Wald CI for θ: [ 0.005070 , 1.148369 ]
# ---- Optional: EAP cross-check (Gauss–Hermite, N(0,1) prior) ----
if (requireNamespace("statmod", quietly = TRUE)) {
  library(statmod)
  Q  <- 31
  gh <- gauss.quad.prob(Q, dist = "normal")
  th <- gh$nodes; w <- gh$weights

  P_th <- sapply(th, function(tt) icc_3pl(tt, a, b, c))
  logL <- colSums(log(P_th) * matrix(y, nrow = J, ncol = Q) +
                  log(1 - P_th) * matrix(1 - y, nrow = J, ncol = Q))

  m <- max(logL)                # log-sum-exp stabilization
  w_post <- exp(logL - m) * w
  EAP <- sum(th * w_post) / sum(w_post)

  cat("\n# ---- EAP CROSS-CHECK (Gauss–Hermite, N(0,1) prior) ----\n")
  cat(sprintf("θ̂_MLE: %.6f   θ̂_EAP: %.6f  (EAP typically shrinks toward 0)\n",
              fit$theta, EAP))
} else {
  cat("\n[Note] Package 'statmod' not installed; skipping EAP cross-check.\n")
}
## 
## # ---- EAP CROSS-CHECK (Gauss–Hermite, N(0,1) prior) ----
## θ̂_MLE: 0.576720   θ̂_EAP: 0.541007  (EAP typically shrinks toward 0)
# ---- Top-5 items by information at θ̂ ----
theta_hat <- fit$theta
Dj <- 1.702
P <- c + (1 - c) * 1/(1 + exp(-Dj * a * (theta_hat - b)))
Q <- 1 - P
Ij <- (Dj * a)^2 * (Q / P) * ((P - c)/(1 - c))^2
top_idx <- order(-Ij)[1:5]
cat("\n# ---- TOP-5 ITEMS BY INFORMATION AT θ̂ ----\n")
## 
## # ---- TOP-5 ITEMS BY INFORMATION AT θ̂ ----
cat("Columns: b (difficulty), a (discrimination), info_at_theta_hat\n")
## Columns: b (difficulty), a (discrimination), info_at_theta_hat
print(data.frame(b=b[top_idx], a=a[top_idx], info_at_theta_hat=Ij[top_idx]),
      row.names = FALSE)
##           b        a info_at_theta_hat
##  0.45545012 1.315467         1.0493261
##  0.65564788 1.302158         0.8675280
##  0.08976065 1.460880         0.8466435
##  0.64289931 1.274986         0.7698414
##  0.20599860 1.298194         0.7297995
# ---- Small simulation summary (N = 100 replications) ----
set.seed(1)
sim_once <- function() {
  yy <- rbinom(J, 1, icc_3pl(theta_true, a, b, c))
  ff <- update_theta(theta=0, y=yy, a=a, b=b, c=c)
  with(ff, c(theta=theta, se=se))
}
S <- replicate(100, sim_once())
theta_hat_vec <- S["theta",]; se_vec <- S["se",]
bias <- mean(theta_hat_vec - theta_true)
rmse <- sqrt(mean((theta_hat_vec - theta_true)^2))
cov95 <- mean(abs(theta_hat_vec - theta_true) <= 1.96*se_vec)

cat("\n# ---- SMALL SIMULATION SUMMARY (N=100) ----\n")
## 
## # ---- SMALL SIMULATION SUMMARY (N=100) ----
cat(sprintf("Bias(θ̂): %.4f   RMSE: %.4f   95%% Wald Coverage: %.3f\n",
            bias, rmse, cov95))
## Bias(θ̂): -0.0091   RMSE: 0.2882   95% Wald Coverage: 0.960
cat("\n====================================================\n\n")
## 
## ====================================================

Core Estimates

Statistic Meaning Interpretation
θ̂ (MLE) = 0.577 Maximum likelihood estimate of ability The test-taker is about 0.58 SDs above average on the latent trait scale (assuming θ ~ N(0,1))
SE(θ̂) = 0.292 Standard error of the estimate Moderate precision — 68% CI ≈ [0.285, 0.869], 95% CI ≈ [0.005, 1.148]
I(θ̂) = 11.76 Fisher information at θ̂ Reflects total measurement precision provided by administered items near θ = 0.58

Relationship verification:

\[ SE(\hat\theta) = \frac{1}{\sqrt{I(\hat\theta)}} \quad\Longrightarrow\quad I(\hat\theta) = \frac{1}{SE(\hat\theta)^{2}} = \frac{1}{0.291658^2} \approx 11.75579 \]


Iteration Diagnostics

  • Score → 0 and Step → 0 → convergence achieved.
  • Log-likelihood plateaued/increased → correct optimization behavior.

Comparison to Simulated Truth

Quantity Value Interpretation
θ_true 0.200 True latent ability used in simulation
θ̂ (MLE) 0.577 Estimated ability from response pattern
Bias (θ̂ − θ_true) 0.377 Positive estimation error of ~0.38 logits
95% CI for θ̂ [0.005, 1.148] Contains θ_true = 0.200 → statistically consistent

Key Insight: While the point estimate shows modest bias, the confidence interval includes the true value, demonstrating that the estimation procedure provides statistically valid inference despite the random error in this particular sample.


EAP Cross-Check

Estimator Value Interpretation
θ̂_MLE 0.577 Maximum Likelihood Estimate (frequentist, no prior)
θ̂_EAP 0.541 Expected A Posteriori (Bayesian, with N(0,1) prior)

Key Insight: EAP applies Bayesian shrinkage toward the prior mean (0), providing more stable estimates, especially with limited information. The modest shift (0.577 → 0.541) indicates good agreement and suggests the MLE estimate is reasonably precise.


Top-5 Items by Information

Items with difficulty (b) near \(\hat{\theta}\) and high discrimination (a) provided most measurement precision.
Their information values quantify how much each item reduced uncertainty around \(\hat{\theta}\).

Rank Difficulty (b) Discrimination (a) Information \(I_j(\hat{\theta})\) Contribution
1 0.455 1.315 1.049 8.9%
2 0.656 1.302 0.868 7.4%
3 0.090 1.461 0.847 7.2%
4 0.643 1.275 0.770 6.5%
5 0.206 1.298 0.730 6.2%

Key Insights:

  • The most informative items have difficulties within roughly half a logit of \(\hat{\theta} = 0.58\)
  • High discrimination values (a ≥ 1.27) maximize information near the current ability estimate
  • These 5 items collectively provide ~36% of the total test information
  • Demonstrates effective item targeting in adaptive testing

Small Simulation Summary (N = 100 Replications)

Metric Value Interpretation
Bias(θ̂) −0.009 Excellent - essentially unbiased estimation
RMSE(θ̂) 0.288 Good - typical estimation error is reasonable
95% Coverage 0.96 Excellent - very close to nominal 95% coverage rate

Overall Interpretation

  • Convergence & Precision: Estimation converged cleanly with precise final estimate (SE = 0.29)
  • Plausible Estimate: \(\hat{\theta} = 0.58\) represents above-average ability with minimal bias
  • Mathematical Consistency: Standard error and Fisher information align exactly as expected
  • Method Agreement: Close EAP-MLE agreement indicates stable, well-behaved estimation
  • Simulation Validation: Comprehensive diagnostics confirm accurate, reliable IRT estimation procedures

Conclusion: The Fisher scoring implementation performs as theoretically expected, providing statistically sound ability estimates suitable for high-stakes adaptive testing applications.


Is This a Good Estimate?

  • Observed Estimation Error:
    \[ \Delta = \hat{\theta} - \theta_{\text{true}} = 0.377 \]

  • Statistical Significance Check:
    \[ SE(\hat{\theta}) = 0.292 \quad \Rightarrow \quad z = \frac{0.377}{0.292} \approx 1.29 \] Being roughly 1.3 SE from the true value is well within ordinary sampling variation for a 30-item test.

  • Confidence Interval Assessment:
    95% CI = [0.005, 1.148] includes \(\theta_{\text{true}} = 0.20\) → statistically consistent estimate.

  • Precision Context:
    \[ I(\hat{\theta}) \approx 11.76 \quad \Rightarrow \quad SE = \frac{1}{\sqrt{I}} \approx 0.29 \] This reflects moderate precision — typical for short tests — where estimates can reasonably fall within 1–2 SE of true ability.


How to Improve Precision

  • Targeted Item Selection: Add items with \(b \approx 0.2\) and high discrimination \(a\) (minimize guessing \(c\))
  • Bayesian Methods: Use EAP/MAP for short tests — here, \(\hat{\theta}_{EAP} = 0.541\) was closer to \(\theta_{\text{true}} = 0.20\) due to prior shrinkage
  • Increase Test Length: More items → higher total information → smaller standard error

Rule of thumb: Differences within ±2×SE are routine.
Here, \(|\Delta| \approx 1.3 \times SE\) is well within normal sampling variation for this design.


Practical Guidance

  • Convergence Issues: If oscillations or decreasing log-likelihood occur, reduce step size (e.g., \(\lambda = 0.5\))
  • Algorithm Choice: Prefer Fisher scoring (expected information) over observed Hessian for operational CAT
  • Extreme/Short Tests: MAP/EAP outperform MLE in stability and bias
  • MAP Implementation: For a prior \(N(0,\sigma^2)\), add the prior’s gradient to the score and its curvature to the information (a short sketch follows this list): \[ \begin{aligned} \ell'_{\text{MAP}}(\theta) &= \ell'(\theta) - \frac{\theta}{\sigma^2} \\[8pt] \mathcal{J}_{\text{MAP}}(\theta) &= \mathcal{J}_{\text{MLE}}(\theta) + \frac{1}{\sigma^2} \end{aligned} \]
  • Data Validation: Always check length(y) == length(a) == length(b) == length(c) == J
  • Scaling Constant: \(D = 1.702\) is conventional; \(D = 1\) recovers pure logistic metric
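
As referenced in the MAP bullet above, here is a minimal sketch of a MAP-adjusted Fisher scoring update for hypothetical 2PL items under a \(\mathcal{N}(0, \sigma^2)\) prior; the estimate remains finite even for an all-correct response pattern.

# Sketch: MAP-adjusted Fisher-scoring updates (2PL items, hypothetical parameters)
map_step <- function(theta, y, a, b, sigma2 = 1, D = 1.702) {
  P     <- 1 / (1 + exp(-D * a * (theta - b)))
  score <- sum(D * a * (y - P)) - theta / sigma2       # likelihood score + prior gradient
  info  <- sum((D * a)^2 * P * (1 - P)) + 1 / sigma2   # Fisher information + prior curvature
  theta + score / info
}

a <- c(1.1, 0.9, 1.3); b <- c(-0.4, 0.2, 0.8); y <- c(1, 1, 1)
theta <- 0
for (k in 1:25) theta <- map_step(theta, y, a, b)
theta   # finite MAP estimate, pulled toward the prior mean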

Key Takeaway

MLE provides efficient, asymptotically unbiased estimates with sufficient items, but becomes unstable for extreme response patterns where logits diverge. For operational testing, combine Fisher scoring with Bayesian stabilization for optimal reliability.


Practical Comparison

Method Core Idea Strengths Limitations
EAP Posterior mean via Gauss–Hermite quadrature Stable for short or extreme tests; incorporates prior information Slight shrinkage toward prior mean
MLE Maximizes likelihood of observed responses Asymptotically unbiased and efficient Undefined for all-0 or all-1 response patterns
Fisher Scoring Iterative optimization using expected information Stable convergence; computationally efficient Requires well-behaved likelihood surfaces

Usage Recommendation: Start with EAP for initial estimates, switch to MLE (via Fisher Scoring) once response pattern provides sufficient information.



Θ-Grid Generation

Motivation: Many IRT computations cannot be solved analytically and require numerical integration. This section covers the two primary methods for discretizing the continuous ability space to enable practical computation.

Many IRT computations, such as EAP estimation, marginal likelihood evaluation, and information integration, require numerical expectations under a prior density \(\phi(\theta;\: \mu,\: \sigma^2)\):

\[ \int g(\theta) \cdot \phi(\theta)\,\: d\theta \;\approx\; \sum_{i=1}^{n} w_i \cdot g(\theta_i). \]

This replaces the integral with a weighted sum over discrete nodes \(\{\theta_i\}\) and weights \(\{w_i\}\).


Uniform Grid

A uniform grid uses equally spaced nodes across a bounded interval \([L, U]\):

\[ \theta_i = L + \frac{i-1}{n-1} \cdot (U-L), \quad i = 1,\dots,n, \qquad w_i \equiv \frac{1}{n}. \]

Features:

  • Simple, evenly spaced sampling of the ability range.
  • Each node has equal weight; prior shape is ignored.
  • The prior density \(\phi(\theta_i)\) must be applied explicitly when integrating.
  • Accuracy depends on both interval width \([L,U]\) and number of nodes \(n\): wider or denser grids improve precision but increase computation time.
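
A small sketch of the uniform grid in use (the integrand is illustrative): since the weights ignore the prior, the normal density must be applied explicitly and the products renormalized.

# Sketch: approximate E[g(theta)] under N(0,1) with a uniform grid
n     <- 201
theta <- seq(-4, 4, length.out = n)   # equally spaced nodes on [L, U]
w     <- rep(1 / n, n)                # equal weights; the prior is NOT built in

g  <- function(th) th^2               # example integrand: E[theta^2] = 1 under N(0,1)
wp <- w * dnorm(theta)                # apply the prior density explicitly
sum(g(theta) * wp) / sum(wp)          # renormalize; the result should be close to 1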

Gauss–Hermite Quadrature (Probabilistic) Grid

Gauss–Hermite (GH) quadrature is a numerical integration method designed to approximate integrals of functions weighted by a Gaussian (Normal) density.
In IRT and Bayesian estimation, GH quadrature provides an efficient way to compute integrals involving the latent ability distribution \(\theta\), such as posterior means (EAP estimates) or marginal likelihoods.


General Purpose

For a standard normal random variable \(\theta \sim \mathcal{N}(0,1)\), many quantities of interest involve expectations of the form:

\[ \mathbb{E}\big[f(\theta)\big] = \displaystyle{\int_{-\infty}^{\infty} f(\theta) \cdot \phi(\theta) \: d\theta}, \] where \(\phi(\theta)\) is the standard normal density.

Since the integral rarely has a closed-form solution, we approximate it using a weighted sum:

\[ \int f(\theta) \cdot \phi(\theta) \: d\theta \approx \sum_{i=1}^{Q} w_i^{(0,1)} \cdot f\Big(\theta_i^{(0,1)}\Big), \] where:

  • \(Q\): number of quadrature points (e.g., 11, 21, 31, 41),
  • \(\theta_i^{(0,1)}\): nodes (abscissas) — specific ability points,
  • \(w_i^{(0,1)}\): weights — corresponding probabilities or integration coefficients.

Adapting to Non-Standard Normal Priors

If the prior for ability is not standard normal, but \(\theta \sim \mathcal{N}(\mu, \sigma^2)\),
we can transform the standard GH nodes and weights as follows:

\[ \boxed{ \theta_i^{(\mu,\sigma)} = \mu + \sigma \cdot \theta_i^{(0,1)}, \qquad w_i^{(\mu,\sigma)} = w_i^{(0,1)}. } \]

  • The locations (nodes) shift and scale to match the new mean and variance.
  • The weights remain unchanged because they already account for integration with respect to the normal density.

This transformation allows the same quadrature rule to handle priors with different centers and spreads.


Why It Works

The Gauss–Hermite quadrature rule is derived to make the approximation exact for any polynomial \(f(\theta)\) up to degree \(2Q - 1\).
Because the normal density \(\phi(\theta)\) decays rapidly in the tails, the GH nodes are concentrated near zero, where most of the probability mass lies.

Result:

  • Dense coverage near the mean (high prior probability region),
  • Sparse sampling in the tails (low prior probability region),
  • Efficient and accurate approximation of expectations under \(\mathcal{N}(0,1)\).
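
A minimal sketch using statmod::gauss.quad.prob (the same routine used in the EAP cross-check earlier): the weights already encode the \(\mathcal{N}(0,1)\) prior, and the boxed transformation above handles a non-standard prior.

# Sketch: probabilistic Gauss–Hermite grid for N(0,1), then shifted to N(1, 1.5^2)
library(statmod)

gh <- gauss.quad.prob(21, dist = "normal")   # nodes for N(0,1); weights sum to 1
sum(gh$weights * gh$nodes^2)                 # ~1: recovers Var(theta) under the prior

theta_shift <- 1 + 1.5 * gh$nodes            # nodes for N(1, 1.5^2); weights unchanged
sum(gh$weights * theta_shift)                # ~1: recovers the new prior mean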

Example: Standard vs. Transformed Grids

Prior Node Transformation Coverage
\(N(0,1)\) \(\theta_i^{(0,1)}\) Symmetric, centered at 0
\(N(1, 1.5^2)\) \(\theta_i^{(1,1.5)} = 1 + 1.5\,\theta_i^{(0,1)}\) Broader, centered at 1
\(N(-1, 0.5^2)\) \(\theta_i^{(-1,0.5)} = -1 + 0.5\,\theta_i^{(0,1)}\) Narrower, centered at -1

Application in IRT

In Bayesian IRT estimation (e.g., EAP or marginal likelihood):

\[ \mathbb{E}\big[\theta \mid \mathbf{y}\big] = \frac{\displaystyle\int \theta \cdot L(\theta) \cdot \pi(\theta)\, \: d\theta} {\displaystyle\int L(\theta) \cdot \pi(\theta)\, \: d\theta} \approx \frac{\displaystyle\sum_i \theta_i \cdot w_i \cdot L(\theta_i)} {\displaystyle\sum_i w_i \cdot L(\theta_i)}. \]

  • \(L(\theta)\): likelihood of observed responses at ability \(\theta\),
  • \(\pi(\theta)\): prior density (often normal),
  • \((\theta_i,\: w_i)\): GH quadrature grid points.

This approach efficiently computes posterior means, variances, and marginal likelihoods without resorting to computationally intensive Monte Carlo integration.


Practical Notes

  • The number of quadrature points \(Q\) controls the accuracy–speed trade-off:
    • \(Q = 11\): fast, coarse approximation
    • \(Q = 21–31\): typical for EAP estimation
    • \(Q > 41\): high precision, slower computation
  • In multidimensional IRT, the quadrature grid grows exponentially (\(Q^D\)), so sparse or adaptive quadrature methods are used.

Summary:

Gauss–Hermite quadrature constructs a probabilistic integration grid tailored to the shape of the normal prior. By concentrating nodes where prior mass is highest and assigning optimized weights, it allows fast and highly accurate approximation of Bayesian expectations—forming the computational backbone of EAP estimation and marginal maximum likelihood in IRT.


Uniform Grid vs. Gauss–Hermite Quadrature

Aspect Uniform Grid Gauss–Hermite Quadrature
Purpose Simple, even coverage over a fixed interval \([L, U]\) Efficient numerical integration assuming a Normal prior
Node Placement Equally spaced (e.g., –4 to 4 by 0.1) Densely clustered near the mean (0), sparse in tails
Weights Equal: \(w_i = 1/n\) Unequal: largest near the mean, smallest in tails
Pros • Simple and dependency-free
• Robust to non-Normal priors
• Handles bounded domains easily
• High accuracy with few nodes
• Computationally efficient for Normal priors
Cons • Inefficient for Normal priors
• Requires many nodes for precision
• Tied to the Normal distribution
• Needs special routine (e.g., statmod::gauss.quad)
Primary Use Case Diagnostics, visualization, custom or truncated priors EAP/MAP scoring and marginal likelihood integration
Prior Integration Multiply explicitly by \(\phi(\theta_i)\) Normal prior implicitly built into \(w_i\)
Accuracy Low–moderate; sensitive to range and \(n\) Very high for smooth IRT integrands
Computational Cost \(O(n)\) evaluations; trivial to generate \(O(n)\) evaluations; slight overhead for nodes/weights
Typical Applications Likelihood/posterior plots; non-Normal priors Operational scoring in CAT; Bayesian estimation
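
To make the efficiency contrast concrete, the sketch below evaluates a smooth IRT integrand, the expected probability of a correct response for an illustrative 2PL item under a \(\mathcal{N}(0,1)\) prior, using 21 nodes from each scheme and compares both against a dense numerical reference.

# Sketch: 21-node uniform grid vs. 21-node Gauss–Hermite grid on a smooth IRT integrand
library(statmod)
p_item <- function(theta, a = 1.2, b = 0.5, D = 1.702)   # 2PL ICC
  1 / (1 + exp(-D * a * (theta - b)))

# Reference value from dense numerical integration
truth <- integrate(function(t) p_item(t) * dnorm(t), -Inf, Inf)$value

# Uniform grid: apply the prior explicitly, then renormalize
th_u <- seq(-4, 4, length.out = 21)
w_u  <- dnorm(th_u); w_u <- w_u / sum(w_u)
est_uniform <- sum(w_u * p_item(th_u))

# Gauss–Hermite grid: the prior is built into the weights
gh     <- gauss.quad.prob(21, dist = "normal")
est_gh <- sum(gh$weights * p_item(gh$nodes))

c(truth = truth, uniform = est_uniform, gauss_hermite = est_gh)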

Clipping and Renormalization

Context: While Gauss-Hermite quadrature provides optimal nodes for normal distributions, practical implementation often requires domain restrictions. This section covers how to adapt the infinite-range quadrature to finite intervals while maintaining statistical validity.

In practice, Gauss–Hermite (GH) quadrature nodes extend over the entire real line, since the Normal distribution has infinite support. However, for visualization, finite-range integration, or bounded simulation domains, it is often convenient to clip the grid to a finite interval:

\[ \big[L, U\big] = \big[\text{lower bound}, \text{upper bound} \big], \]

such as \([-4, 4]\) for a standard normal distribution (which already covers 99.99% of the probability mass).


Why Clipping Is Used

  • Visualization: When plotting IRT functions (e.g., Item Characteristic Curves or Test Information Functions), extreme ability values (e.g., \(\theta = \pm 6\)) are rarely meaningful and can distort the scale.
  • Computational efficiency: Integrations or EAP computations may ignore negligible prior mass in the far tails, speeding up calculations.
  • Practical boundaries: In operational testing, reported scales (e.g., \(\theta \in [-3,\: 3]\)) are finite, so visual or tabular summaries are naturally bounded.

Thus, we retain only the subset of nodes within \([L,\: U]\): \[ \{ \theta_i \in [L,\: U] \}. \]


The Need for Renormalization

After clipping, the retained weights no longer sum to 1.
Because Gauss–Hermite weights are designed to integrate over the entire normal density, discarding tail nodes removes some probability mass. To restore a valid probabilistic interpretation, the remaining weights must be renormalized:

\[ \boxed{ w_i^\star = \frac{w_i}{\displaystyle\sum_{j:\,\theta_j \in [L,\:U]} w_j}. } \]

  • \(w_i\): original GH quadrature weight
  • \(w_i^\star\): renormalized weight after clipping
  • The denominator rescales the total retained mass to 1

This adjustment ensures that subsequent weighted averages or expectations computed using the truncated grid remain properly normalized.


Interpretation

Without renormalization: \[ \sum_{i:\,\theta_i \in [L,\:U]} w_i < 1, \] which would underestimate expectations and distort posterior computations.

With renormalization: \[ \sum_{i:\,\theta_i \in [L,\:U]} w_i^\star = 1, \] so all integrals (e.g., EAP estimates, marginal likelihoods) computed over the clipped grid remain unbiased within the truncated domain.


Example

Suppose we use a 21-point GH grid for \(\theta \sim \mathcal{N}(0,1)\):

  1. The raw nodes extend deep into the tails (the extreme nodes lie well beyond \(\pm 5\)).
  2. We clip to \([L, U] = [-3.5,\: 3.5]\), discarding a few tail points.
  3. We then renormalize:

\[ w_i^\star = \frac{w_i}{\displaystyle\sum_{\theta_j \in [-3.5, \: 3.5]} w_j}. \]

The resulting set \(\{(\theta_i, w_i^\star)\}\) represents a truncated normal prior with nearly the same total mass as the full grid.
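
A minimal sketch of these steps with statmod::gauss.quad.prob:

# Sketch: clip a 21-point GH grid to [-3.5, 3.5] and renormalize the retained weights
library(statmod)
gh <- gauss.quad.prob(21, dist = "normal")

keep   <- gh$nodes >= -3.5 & gh$nodes <= 3.5        # retain nodes inside [L, U]
theta  <- gh$nodes[keep]
w_star <- gh$weights[keep] / sum(gh$weights[keep])  # renormalize so the weights sum to 1

c(n_kept = length(theta), dropped_mass = 1 - sum(gh$weights[keep]), total_w = sum(w_star))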


Accuracy Trade-offs

Important: Clipping reduces the theoretical accuracy of the Gauss–Hermite rule.

  • GH quadrature achieves exactness for polynomials up to degree \(2Q - 1\) only when all nodes are included.
  • Removing nodes disturbs this optimality — the approximation becomes less precise, especially if tail contributions are not negligible.
  • However, for most psychometric applications (where 99% of prior mass lies within \(|\theta| < 3\)), the impact on results is minimal.

Hence, clipping is acceptable for:

  • Exploratory plots,
  • Approximate EAP or information visualizations,
  • Simulations where extreme abilities are irrelevant.

For precise parameter estimation (e.g., marginal maximum likelihood), always use the full quadrature grid without clipping.


Summary

Clipping Gauss–Hermite nodes is a practical approximation that simplifies visualization and computation within a bounded range. Renormalization ensures that the resulting truncated grid remains a valid probabilistic representation of the latent ability distribution—though at a small cost to numerical exactness.

Operation Formula Purpose Impact
Clipping Retain nodes \(\theta_i \in [L, U]\) Restrict domain to relevant ability range Removes tail probability mass
Renormalization \(w_i^\star = w_i / \sum_{j:\theta_j \in [L,U]} w_j\) Rescales weights to sum to 1 Preserves probabilistic meaning
Effect on Accuracy Negligible for moderate truncation (e.g., [-3, 3]) Larger truncations distort integration

Hermite Polynomials & Gauss-Hermite Quadrature

Think of Gauss-Hermite Quadrature as a smart way to approximate difficult integrals, especially those involving a normal (Gaussian) “bell curve” distribution. It works by carefully choosing a few points (nodes) and their importance (weights) to get a very accurate sum.


1. The Core Idea: Polynomials & Orthogonality

Hermite Polynomials (\(H_n(x)\))

These are a special family of mathematical curves defined by the recurrence relation:

\[ \begin{aligned} H_0(x) &= 1 \\ H_1(x) &= 2x \\ H_{n+1}(x) &= 2xH_n(x) - 2nH_{n-1}(x) \end{aligned} \]

Orthogonality

The key property is that these polynomials are orthogonal under the Gaussian function \(e^{-x^2}\):

\[ \int_{-\infty}^{\infty} H_m(x) H_n(x) e^{-x^2} dx = \begin{cases} 0 & \text{if } m \neq n \\ 2^n n! \sqrt{\pi} & \text{if } m = n \end{cases} \]

Analogy: Think of the x, y, and z axes in 3D space—they are perpendicular (orthogonal) and don’t interfere with each other. Similarly, each Hermite polynomial represents an independent “direction” in the space of functions.


2. How the Quadrature Method Works

The orthogonality property tells us that Hermite polynomials are the perfect tool for integrating functions against a Gaussian weight.

Nodes (\(x_i\))
  • The nodes are the roots of the n-th degree Hermite polynomial \(H_n(x)\)
  • These are the optimal points for evaluating the function
Weights (\(w_i\))

The weights are calculated using the formula: \[ w_i = \frac{2^{n-1} n! \sqrt{\pi}}{n^2 [H_{n-1}(x_i)]^2} \]

Key Properties:
  • The nodes and weights are symmetric around zero: \(x_{-i} = -x_i\), \(w_{-i} = w_i\)
  • Nodes are denser near the center (where the Gaussian bell curve is highest)
  • All weights are positive, ensuring numerical stability

3. The Most Important Part: Using it for Normal Distributions

This is the “killer app” for statistics. We use a simple variable change to adapt the method for any normal distribution, \(\mathcal{N}(\mu, \sigma^2)\).

The standard rule approximates: \[ \int_{-\infty}^{\infty} f(x) e^{-x^2} dx \approx \sum_{i=1}^{n} w_i f(x_i) \]

To integrate with a normal probability density \(\phi(\theta; \mu, \sigma^2)\), we use this transformation:

\[ \boxed{ \int f(\theta) \cdot \phi(\theta; \mu, \sigma^2) d\theta \approx \frac{1}{\sqrt{\pi}} \sum_{i=1}^{n} w_i \cdot f(\mu + \sqrt{2} \sigma \cdot x_i) } \]

Practical Implementation:

  1. Get standard nodes/weights: Look up \(\{x_i, w_i\}\) for your chosen order \(n\)
  2. Transform nodes: Calculate \(\theta_i = \mu + \sqrt{2} \sigma \cdot x_i\)
  3. Evaluate function: Compute \(f(\theta_i)\) at each transformed point
  4. Weighted sum: Calculate \(\frac{1}{\sqrt{\pi}} \sum w_i f(\theta_i)\)
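
A short sketch of these four steps, using statmod::gauss.quad with kind = "hermite" (physicists' convention, weight \(e^{-x^2}\)) and checking against the known value \(\mathbb{E}[\theta^2] = \mu^2 + \sigma^2\):

# Sketch: transform the standard Gauss–Hermite rule to integrate against N(mu, sigma^2)
library(statmod)
gh <- gauss.quad(15, kind = "hermite")         # step 1: standard nodes x_i and weights w_i

mu <- 1; sigma <- 1.5
theta_i <- mu + sqrt(2) * sigma * gh$nodes     # step 2: transform the nodes
f <- function(th) th^2                         # step 3: evaluate the function of interest
(1 / sqrt(pi)) * sum(gh$weights * f(theta_i))  # step 4: weighted sum, ~ mu^2 + sigma^2 = 3.25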

Conceptual Summary Table

Concept Mathematical Expression Interpretation
Hermite Polynomials \(H_{n+1}(x) = 2xH_n(x) - 2nH_{n-1}(x)\) Special orthogonal basis functions under Gaussian weight
Nodes (\(x_i\)) Roots of \(H_n(x)\) Optimal integration points for Gaussian-weighted functions
Weights (\(w_i\)) \(\displaystyle w_i = \frac{2^{n-1} n! \sqrt{\pi}}{n^2 [H_{n-1}(x_i)]^2}\) Importance factors for each node’s contribution
Transformation \(\theta_i = \mu + \sqrt{2} \sigma \cdot x_i\) Maps standard GH rule to arbitrary normal distribution \(\mathcal{N}(\mu, \sigma^2)\)


Golub–Welsch Implementation for Gauss–Hermite Quadrature

The Golub–Welsch algorithm is a numerical method to compute the nodes and weights for Gaussian quadrature, specifically Gauss–Hermite quadrature. It uses the eigenvalues and eigenvectors of a symmetric tridiagonal Jacobi matrix, which is constructed from the recurrence coefficients of Hermite polynomials.

This approach guarantees exact computation of the \(n\) quadrature nodes (the roots of \(H_n\)) and their corresponding normalized weights, which are suitable for computing expectations under a normal prior:

\[ \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} f(x)\, e^{-x^2}\, dx \approx \sum_{i=1}^{n} w_i\, f(x_i), \]

where \(x_i\) are the nodes and the weights \(w_i\) are normalized to sum to one; applying the transformation \(\theta_i = \mu + \sqrt{2}\,\sigma\, x_i\) turns this weighted sum into an expectation under \(\mathcal{N}(\mu, \sigma^2)\).


Golub–Welsch Gauss–Hermite Quadrature

Input: n (number of quadrature nodes)

Step 1: Validate that n ≥ 1

Step 2: If n == 1, return node = 0 and weight = 1

Step 3: Compute recurrence coefficients for Hermite polynomials:
b[k] = sqrt(k / 2), for k = 1,...,n-1

Step 4: Construct symmetric tridiagonal Jacobi matrix J:
J[i, i+1] = b[i], J[i+1, i] = b[i]

Step 5: Eigen-decompose J:
nodes = eigenvalues(J)
weights = (first row of eigenvectors)^2 / sum((first row)^2)

Step 6: Sort nodes and weights in ascending order

Step 7: Validate nodes and weights (finite, positive, sum = 1)

Step 8: Return nodes and weights

Relevance to IRT

In Item Response Theory, Gauss-Hermite quadrature provides the mathematical machinery for efficient numerical integration over latent trait distributions:

  • The Hermite polynomial roots define the quadrature grid \(\{\theta_i\}\) for evaluating likelihood and posterior distributions
  • The weights \(w_i\) ensure accurate integration over the normal prior \(\pi(\theta)\)
  • These foundations establish Gauss-Hermite as the standard method for EAP scoring and marginal maximum likelihood (MML) estimation

Application in IRT: EAP Estimation

For posterior means under a normal prior, the Expected A Posteriori (EAP) estimator is approximated as:

\[ E[\theta \mid \mathbf{u}] \approx \frac{ \sum_{i=1}^n \theta_i \cdot L(\mathbf{u} \mid \theta_i) \cdot w_i }{ \sum_{i=1}^n L(\mathbf{u} \mid \theta_i) \cdot w_i } \]

where:

  • \(\theta_i = \mu + \sqrt{2}\sigma \cdot x_i\) are the transformed quadrature nodes
  • \(L(\mathbf{u} \mid \theta_i)\) is the likelihood of response pattern \(\mathbf{u}\) at ability \(\theta_i\)
  • \(w_i\) are the Gauss-Hermite weights


Practical Implementation Notes

Grid Size Selection:

  • 20–30 nodes: Adequate for unidimensional models with smooth likelihoods
  • 30–50 nodes: Recommended for higher precision in high-stakes assessment
  • 50+ nodes: Used for multidimensional IRT or complex response functions

Computational Advantage: The method transforms complex integration problems into simple weighted sums:

\[ \int_{-\infty}^{\infty} f(\theta) \phi(\theta) d\theta \approx \sum_{i=1}^n w_i f(\theta_i) \]

Stability Properties:

  • Positive weights guarantee numerical stability
  • Symmetric grids reduce computational overhead
  • Exponential convergence for smooth integrands


In essence: Hermite polynomials supply the mathematical backbone for Gaussian-weighted integration. Their orthogonality guarantees efficiency and accuracy, while their roots and weights define the probabilistic quadrature grids used throughout modern psychometric estimation.

Empirical Finding: Gauss-Hermite quadrature with 21–61 nodes provides excellent accuracy for routine IRT estimation and CAT scoring, balancing computational efficiency with numerical precision.



This code builds theta grids for IRT/EAP work in two ways and compares them. make_theta_grid() returns nodes theta and weights w using either (a) a uniform grid on [range[1], range[2]] with equal weights 1/n, or (b) Gauss–Hermite quadrature nodes/weights computed by a self-contained Golub–Welsch routine (no statmod dependency) and mapped to a Normal prior N(prior_mean, prior_sd^2). For display, scale_gh_grid() linearly rescales GH nodes into a plotting range rather than clipping them (a visualization convenience only, since rescaling changes the implied prior spread). Helpers such as summarize_grid() and test_grid_integration() print and sanity-check a grid, and a set of ggplot2 diagnostics (weights, cumulative mass, node spacing, a Q–Q plot of node locations, and a log-scale weight view) compares two grids side by side. The example demo builds a GH grid and a same-length uniform grid on [-4, 4], prints summaries, and confirms that both weight sets sum to ~1.

############################################################################
# Theta grid toolbox — Golub–Welsch Gauss–Hermite (preserves n exactly)
# - NO statmod, NO hermite(), pure Golub–Welsch eigen-solution
# - Normalized weights (sum = 1, suitable for EAP)
# - Includes six diagnostic plots and plot_all()
# - All plots automatically scale GH nodes to desired range
############################################################################

if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2")
library(ggplot2)

# -------------------------
# Helper: defensive validator
# Ensures theta and w are compatible, finite, and non-negative
# -------------------------
stop_if_bad_grid <- function(theta, w) {
  if (length(theta) != length(w)) stop("theta and w must have same length")
  if (length(theta) == 0) stop("Empty grid: length(theta) == 0")
  if (any(!is.finite(theta))) stop("Non-finite values in theta")
  if (any(!is.finite(w))) stop("Non-finite values in w")
  if (any(w < 0)) stop("Negative weights found")
  s <- sum(w)
  if (!is.finite(s) || s <= 0) stop("Weights sum to zero or non-finite")
  invisible(TRUE)
}

# -------------------------
# Golub–Welsch implementation for Gauss–Hermite
# Returns nodes and normalized weights
# -------------------------
gauss_hermite_gw <- function(n) {
  if (!is.numeric(n) || n < 1) stop("'n' must be a positive integer")
  n <- as.integer(n)
  if (n == 1L) return(list(nodes = 0, weights = 1))
  
  # Recurrence coefficients for physicists' Hermite polynomials H_n (weight exp(-x^2))
  k <- seq_len(n - 1)
  b <- sqrt(k / 2)
  
  # Build symmetric tridiagonal Jacobi matrix
  J <- matrix(0, n, n)
  for (i in seq_len(n - 1)) {
    J[i, i + 1] <- b[i]
    J[i + 1, i] <- b[i]
  }
  
  # Eigen-decomposition: nodes = eigenvalues, weights ~ square of first row of eigenvectors
  ev <- eigen(J, symmetric = TRUE)
  nodes <- as.numeric(ev$values)
  v1 <- ev$vectors[1, ]
  weights <- as.numeric((v1^2) / sum(v1^2))
  
  # Sort nodes & weights ascending
  ord <- order(nodes)
  nodes <- nodes[ord]
  weights <- weights[ord]
  
  stop_if_bad_grid(nodes, weights)
  list(nodes = nodes, weights = weights)
}

# -------------------------
# make_theta_grid(): create uniform or GH grid
# -------------------------
make_theta_grid <- function(n = 61,
                            scheme = c("ghermite", "uniform"),
                            range = c(-4,4),
                            prior_mean = 0,
                            prior_sd = 1,
                            gh_force_odd = FALSE,
                            clip = FALSE,  # only for uniform grids; GH scaling used instead
                            return_meta = TRUE) {
  scheme <- match.arg(scheme)
  n_in <- as.integer(n)
  
  if (isTRUE(gh_force_odd) && scheme == "ghermite" && n %% 2 == 0) n <- n + 1L
  
  if (scheme == "uniform") {
    if (length(range) != 2 || range[1] >= range[2]) stop("Invalid 'range' for uniform grid")
    if (n < 2) stop("'n' must be >= 2 for a uniform grid")
    theta <- seq(range[1], range[2], length.out = n)
    w <- rep(1 / n, n)
  } else {
    gh <- gauss_hermite_gw(n)
    theta <- prior_mean + sqrt(2) * prior_sd * gh$nodes  # sqrt(2) maps physicists' nodes to N(mean, sd^2)
    w <- gh$weights
  }
  
  stop_if_bad_grid(theta, w)
  out <- list(theta = as.numeric(theta), w = as.numeric(w))
  
  if (isTRUE(return_meta)) {
    out$meta <- list(
      scheme = scheme,
      n_request = n_in,
      n_returned = length(out$theta),
      clipped = FALSE,
      prior_mean = if (scheme == "ghermite") prior_mean else NA_real_,
      prior_sd = if (scheme == "ghermite") prior_sd else NA_real_
    )
  }
  out
}

# -------------------------
# Summarize grid
# -------------------------
summarize_grid <- function(grid, name = "grid") {
  n <- length(grid$theta)
  cat(sprintf("%s: n=%d, sum(w)=%.12g, range = [%.6g, %.6g]\n",
              name, n, sum(grid$w), min(grid$theta), max(grid$theta)))
  cat(" first 5 theta:", paste(round(head(grid$theta,5),6), collapse=", "), "\n")
  cat(" first 5 w    :", paste(format(head(grid$w,5), digits=6), collapse=", "), "\n")
}

# -------------------------
# Integration test helper
# -------------------------
test_grid_integration <- function(grid, f, true_value, label = "") {
  approx <- sum(f(grid$theta) * grid$w)
  err <- approx - true_value
  cat(sprintf("%s: approx=%.8g true=%.8g err=%.3e rel_err=%.3g\n",
              label, approx, true_value, err, ifelse(abs(true_value)<.Machine$double.eps, NA, err/true_value)))
  invisible(list(approx=approx, err=err))
}

# -------------------------
# Default color palette
# -------------------------
.default_cols <- function() list(col1="#1f77b4", col2="#ff7f0e", col1_trans="#1f77b480", col2_trans="#ff7f0e80")

# -------------------------
# Scale GH nodes to desired range (instead of clipping)
# Preserves relative spacing and normalizes weights
# -------------------------
scale_gh_grid <- function(grid, range = c(-4,4)) {
  X_max <- max(abs(grid$theta))
  R <- max(abs(range))
  theta_scaled <- grid$theta * R / X_max   # linear rescale of node locations (display only)
  w_scaled <- grid$w / sum(grid$w)         # weights unchanged: they already sum to 1
  # Caution: rescaling nodes changes the implied prior spread; use for plots, not estimation
  list(theta = theta_scaled, w = w_scaled)
}

# -------------------------
# Six diagnostic plots (all auto-scaled)
# -------------------------
plot_density_comparison <- function(grid1, grid2, names=c("Grid 1","Grid 2"), xlim_range=c(-4,4)) {
  df <- rbind(
    data.frame(theta=grid1$theta, w=grid1$w, grid=names[1]),
    data.frame(theta=grid2$theta, w=grid2$w, grid=names[2])
  )
  ggplot(df, aes(theta,w,color=grid)) +
    geom_line(linewidth=1) + geom_point(size=1.5) +
    labs(title="Node Weight (discrete) — Density comparison", x=expression(theta), y="Weight (probability)") +
    theme_minimal(base_size=12) + xlim(xlim_range)
}

plot_weight_comparison <- function(grid1, grid2, names=c("Grid 1","Grid 2"), xlim_range=c(-4,4)) {
  df <- rbind(
    data.frame(theta=grid1$theta, w=grid1$w, grid=names[1]),
    data.frame(theta=grid2$theta, w=grid2$w, grid=names[2])
  )
  ggplot(df, aes(x=theta, y=w, color=grid)) +
    geom_segment(aes(x=theta, xend=theta, y=0, yend=w), alpha=0.5) +  # stem lines
    geom_point(size=2) +  # point at top
    labs(title="Node Weight Comparison (Clustering visible)", x=expression(theta), y="Weight") +
    theme_minimal(base_size=12) + xlim(xlim_range)
}

plot_cdf <- function(grid1, grid2, names=c("Grid 1","Grid 2"), xlim_range=c(-4,4)) {
  df1 <- data.frame(theta=sort(grid1$theta), cdf=cumsum(grid1$w[order(grid1$theta)]), grid=names[1])
  df2 <- data.frame(theta=sort(grid2$theta), cdf=cumsum(grid2$w[order(grid2$theta)]), grid=names[2])
  df <- rbind(df1, df2)
  ggplot(df, aes(theta,cdf,color=grid)) + geom_step(size=1) +
    labs(title="Cumulative distributions (grid mass)", x=expression(theta), y="Cumulative probability") +
    theme_minimal(base_size=12) + xlim(xlim_range)
}

plot_spacing <- function(grid1, grid2, names=c("Grid 1","Grid 2"), xlim_range=c(-4,4)) {
  df1 <- data.frame(theta=grid1$theta[-1], dtheta=diff(grid1$theta), grid=names[1])
  df2 <- data.frame(theta=grid2$theta[-1], dtheta=diff(grid2$theta), grid=names[2])
  df <- rbind(df1, df2)
  ggplot(df, aes(theta,dtheta,color=grid)) + geom_line() + geom_point(size=1) +
    labs(title="Node spacing Δθ", x=expression(theta), y=expression(Delta*theta)) +
    theme_minimal(base_size=12) + xlim(xlim_range)
}

plot_weight_vs_theta <- function(grid1, grid2, names=c("Grid 1","Grid 2"), xlim_range=c(-4,4)) {
  df <- rbind(
    data.frame(theta=grid1$theta, w=grid1$w, grid=names[1]),
    data.frame(theta=grid2$theta, w=grid2$w, grid=names[2])
  )
  ggplot(df, aes(theta,w,color=grid)) + geom_point() +
    labs(title="Weights vs θ", x=expression(theta), y="Weight") +
    theme_minimal(base_size=12) + xlim(xlim_range)
}

plot_qq <- function(grid1, grid2, names=c("Grid 1","Grid 2"), xlim_range=c(-4,4)) {
  m <- min(length(grid1$theta), length(grid2$theta))
  p <- (seq_len(m)-0.5)/m
  q1 <- quantile(grid1$theta, probs=p)
  q2 <- quantile(grid2$theta, probs=p)
  df <- data.frame(q1=q1, q2=q2)
  ggplot(df, aes(q1,q2)) + geom_point() +
    geom_abline(slope=1,intercept=0,linetype="solid", col="red") +
    labs(title="Q–Q of node locations", x=names[1], y=names[2]) +
    theme_minimal(base_size=12) + xlim(xlim_range)
}

# -------------------------
# Log-scale weight comparison (tails visible)
# -------------------------
plot_weight_comparison_log <- function(grid1, grid2, names=c("Grid 1","Grid 2"), xlim_range=c(-4,4)) {
  df <- rbind(
    data.frame(theta=grid1$theta, w=grid1$w, grid=names[1]),
    data.frame(theta=grid2$theta, w=grid2$w, grid=names[2])
  )
  ggplot(df, aes(x=theta, y=w, color=grid)) +
    geom_segment(aes(x=theta, xend=theta, y=1e-16, yend=w), alpha=0.5) +  # stem from tiny value
    geom_point(size=2) +
    scale_y_log10() +  # log-scale to show tiny weights
    labs(title="Node Weight Comparison (Log-scale)", x=expression(theta), y="Weight (log10)") +
    theme_minimal(base_size=12) + xlim(xlim_range)
}

# Optional log-scale plot aggregator
plot_all_log <- function(grid1, grid2, names=c("Grid 1","Grid 2"), xlim_range=c(-4,4)) {
  print(plot_density_comparison(grid1, grid2, names, xlim_range))
  print(plot_weight_comparison(grid1, grid2, names, xlim_range))
  print(plot_weight_comparison_log(grid1, grid2, names, xlim_range))  # log-scale
  print(plot_cdf(grid1, grid2, names, xlim_range))
  print(plot_spacing(grid1, grid2, names, xlim_range))
  print(plot_weight_vs_theta(grid1, grid2, names, xlim_range))
  print(plot_qq(grid1, grid2, names, xlim_range))
}

# -------------------------
# Example DEMO with log-scale plot
# -------------------------
if (interactive()) {
  grid_gh_61 <- make_theta_grid(n=61, scheme="ghermite", prior_mean=0, prior_sd=1)
  grid_unif_61 <- make_theta_grid(n=length(grid_gh_61$theta), scheme="uniform", range=c(-4,4))
  grid_gh_61_scaled <- scale_gh_grid(grid_gh_61, range=c(-4,4))

  summarize_grid(grid_gh_61, "GH 61-node full (integration)")
  summarize_grid(grid_gh_61_scaled, "GH 61-node scaled (plots)")
  summarize_grid(grid_unif_61, "Uniform 61-node")

  test_grid_integration(grid_gh_61, function(x) x, 0, "E[X]")
  test_grid_integration(grid_gh_61, function(x) x^2, 1, "E[X^2]")
  
  # Optional: log-scale weight visualization
  plot_all_log(grid_unif_61, grid_gh_61_scaled, names=c("Uniform (61)","GH scaled [-4,4]"))
}


Expected A Posteriori (EAP)

The Expected A Posteriori (EAP) estimator provides a Bayesian alternative to Maximum Likelihood Estimation (MLE) by explicitly incorporating prior information about ability levels. Whereas MLE seeks the ability value that maximizes the likelihood \(L(\theta)\), EAP computes the expected value of \(\theta\) given the posterior distribution.


Bayesian Framework

Under Bayes’ theorem, the posterior distribution of ability is proportional to the product of the likelihood and the prior:

\[ p(\theta \mid \mathbf{y}) \propto L(\theta) \cdot \pi(\theta), \]

where:

  • \(L(\theta)\): likelihood of observing the response vector \(\mathbf{y}\),
  • \(\pi(\theta)\): prior distribution for ability, typically \(N(0, 1)\) in standardized IRT applications.

The EAP estimate is the expected value of \(\theta\) under this posterior distribution:

\[ \hat{\theta}_{\text{EAP}} = \displaystyle\frac{\displaystyle\int \theta \cdot L(\theta) \cdot \pi(\theta)\: d\theta} {\displaystyle\int L(\theta) \cdot \pi(\theta)\; d\theta}. \]

This expectation weights each possible value of \(\theta\) by both how well it explains the data (via \(L(\theta)\)) and how plausible it is a priori (via \(\pi(\theta)\)).


Interpretation

  • MLE vs. EAP:
    • MLE uses only the observed data — it can fail when all responses are correct or all are incorrect, because the likelihood does not have an interior maximum.
    • EAP includes prior information, producing finite, stable estimates even at the extremes of the ability scale.
  • Bias–Variance Tradeoff:
    • The prior slightly shrinks estimates toward the mean (reducing variance).
    • This bias is small when information is high and beneficial when information is low.
  • Practical Impact:
    • Early in a CAT, when few items are answered, EAP yields smoother updates and prevents overreaction to single responses.
    • As more items are administered, EAP and MLE converge.

Numerical Approximation via Gauss–Hermite Quadrature

In practice, the integrals in the EAP formula cannot be solved analytically, so they are approximated using numerical integration. For a standard normal prior \(\pi(\theta) = N(0,1)\), Gauss–Hermite quadrature provides an efficient solution: it selects a small set of strategically placed nodes (\(\theta_q\)) and weights (\(w_q\)) across the ability scale, yielding a highly accurate approximation of the integrals with minimal computation:

\[ \hat{\theta}_{\text{EAP}} \approx \displaystyle\frac{\displaystyle\sum_{q=1}^Q \theta_q \cdot w_q \cdot L(\theta_q)} {\displaystyle\sum_{q=1}^Q w_q \cdot L(\theta_q)}. \]

where:

  • \(\theta_q\): Gauss–Hermite quadrature nodes (grid points),
  • \(w_q\): corresponding weights,
  • \(Q\): number of quadrature points (commonly 21–41 for practical accuracy).

Because the quadrature weights \(w_q\) are already scaled for a standard normal prior, no explicit multiplication by \(\pi(\theta_q)\) is necessary.


Posterior Variance and Standard Error

The posterior variance provides a Bayesian analog to the standard error of estimation:

\[ \mathrm{Var}(\theta \mid \mathbf{y}) = \frac{\displaystyle\int (\theta - \hat{\theta}_{\text{EAP}})^2 \cdot L(\theta) \cdot \pi(\theta) \: d\theta} {\displaystyle\int L(\theta) \cdot \pi(\theta) \: d\theta}. \]

Approximated via quadrature:

\[ \mathrm{Var}(\theta \mid \mathbf{y}) \approx \frac{\displaystyle\sum_{q=1}^Q w_q \cdot L(\theta_q) \cdot \big[\theta_q - \hat{\theta}_{\text{EAP}}\big]^2} {\displaystyle\sum_{q=1}^Q w_q \cdot L(\theta_q)}. \]

The posterior standard deviation serves as the Bayesian equivalent of the standard error of measurement.
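As a concrete illustration of the two quadrature formulas above, the short R sketch below computes the EAP estimate and posterior SD for a toy three-item 2PL response pattern. The helper gh_normal() (a Golub–Welsch construction for a N(0,1) prior) and the item parameters are illustrative assumptions, not the operational code shown later.

# Minimal sketch (illustrative only): EAP and posterior SD via Gauss–Hermite
# nodes for a N(0,1) prior. gh_normal() and the toy 2PL items are assumptions.
gh_normal <- function(Q) {
  # Golub–Welsch for probabilists' Hermite polynomials: eigenvalues of the
  # tridiagonal Jacobi matrix are the nodes; squared first eigenvector
  # components (normalized) are probability weights for the N(0,1) prior.
  k <- seq_len(Q - 1)
  J <- matrix(0, Q, Q)
  J[cbind(k, k + 1)] <- sqrt(k)
  J[cbind(k + 1, k)] <- sqrt(k)
  e <- eigen(J, symmetric = TRUE)
  list(theta = e$values, w = e$vectors[1, ]^2 / sum(e$vectors[1, ]^2))
}

# Toy 2PL likelihood for a short response pattern
a <- c(1.0, 1.2, 0.8); b <- c(-0.5, 0.0, 0.5); y <- c(1, 1, 0)
lik <- function(th) sapply(th, function(t) {
  p <- plogis(1.702 * a * (t - b))
  prod(p^y * (1 - p)^(1 - y))
})

gh   <- gh_normal(31)
post <- gh$w * lik(gh$theta)                   # prior is encoded in the weights
post <- post / sum(post)
eap  <- sum(post * gh$theta)                   # posterior mean (EAP)
psd  <- sqrt(sum(post * (gh$theta - eap)^2))   # posterior SD (Bayesian SE)
c(EAP = eap, SD = psd)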


Log-Sum-Exp (LSE)

Log-Sum-Exp (LSE) is a numerically stable way to compute the logarithm of a sum of exponentials:

\[ \text{LSE}(x_1, x_2, \dots, x_n) = \log\!\left(\sum_{i=1}^{n} e^{x_i}\right) \]

Directly computing log(sum(exp(x))) can lead to overflow or underflow when \(x_i\) values are very large or very small.


Stable Formulation

Let \(M = \max_i x_i\). Then:

\[ \text{LSE}(x) = M + \log\!\left(\sum_i e^{x_i - M}\right) \]

Because \(x_i - M \le 0\), all exponentials are in \([0, 1]\), which avoids numerical instability. This form appears in many statistical and machine-learning contexts, including log-likelihood computations and EAP scoring in IRT.


Example in R

log_sum_exp <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))
}

# Compare numeric stability
x <- c(1000, 1001, 1002)

log(sum(exp(x)))      # Overflow → Inf
## [1] Inf
log_sum_exp(x)        # Stable → 1002.4076
## [1] 1002.408

Comparison Summary

Estimator | Formula | Prior Used? | Behavior at Extremes | Bias | Stability | Common Use
--- | --- | --- | --- | --- | --- | ---
MLE | \(\arg\max_\theta L(\theta)\) | ✗ None | Undefined when all 0s or 1s | Unbiased (large-sample) | Unstable early in test | Final ability estimates
EAP | \(\frac{\int \theta L(\theta)\pi(\theta)d\theta}{\int L(\theta)\pi(\theta)d\theta}\) | ✓ Yes | Always finite | Slight shrinkage toward mean | Very stable | Early CAT, Bayesian scoring

Practical Notes

  • Typical prior: \(\pi(\theta) = N(0, 1)\); others (e.g., empirical Bayes) can be used for specialized populations.
  • Quadrature accuracy: Increasing \(Q\) increases precision but slows computation.
  • Consistency: As the number of items increases, EAP and MLE converge to the same estimate (\(\hat{\theta}_{\text{EAP}} \to \hat{\theta}_{\text{MLE}}\)).
  • Implementation: In R, functions like mirt::fscores() and ltm::factor.scores() use this approach; a brief example follows below.
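A minimal illustration of packaged EAP scoring, assuming the mirt package is installed; LSAT7 is an example dataset bundled with mirt, and the options shown are only one reasonable configuration:

library(mirt)
data <- expand.table(LSAT7)                    # example dataset shipped with mirt
fit  <- mirt(data, 1, itemtype = "2PL", verbose = FALSE)
head(fscores(fit, method = "EAP", full.scores.SE = TRUE))   # EAP theta and its SE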

Conceptual Visualization

If plotted, \(L(\theta)\) (likelihood), \(\pi(\theta)\) (prior), and their product \(L(\theta)\pi(\theta)\) illustrate how the posterior balances data-driven evidence with prior belief. The EAP corresponds to the center of mass of this posterior curve, while the MLE corresponds to its peak.


Key Takeaway:

EAP estimation integrates likelihood and prior to yield stable, finite, and smooth ability estimates — particularly valuable in early adaptive stages and for extreme response patterns.


EAP Estimation Procedure

1. Filter answered items
   Use only valid responses (exclude NA).
   Maintain original item order for interpretation.

2. Generate the θ-grid
   Uniform and GH quadrature options:
   GH quadrature recommended for Normal priors such as N(0,1)

      • Uniform grid: equally spaced on [L, U] with 
        weights  wⱼ = 1/n

      • Gauss–Hermite grid: nodes centered at the prior mean
        with GH weights wⱼ

      • Nodes match the shape of a Normal prior, giving
        optimal coverage

      • Far fewer points needed than a uniform grid for
        the same accuracy

      • Highly accurate for bell-shaped posteriors,
        which arise in EAP

3. Compute likelihood L(θⱼ)
   L(θⱼ) = ∏ᵢ Pᵢ(θⱼ)^{yᵢ} · (1 − Pᵢ(θⱼ))^{1 − yᵢ}
   Clip P-values to [ε, 1−ε] to avoid log(0) and numerical 
   instability. EAP remains stable even with extreme response
   patterns.

4. Apply the prior
   ṗⱼ = wⱼ · π(θⱼ) · L(θⱼ)
   Compute in log-space and use log-sum-exp for stability

5. Normalize posterior
   pⱼ = ṗⱼ / ∑ₖ ṗₖ
   Ensure ∑ⱼ pⱼ = 1
   Store full posterior {θⱼ, pⱼ} if credible intervals 
   are required

6. Compute ability estimate (EAP)
   θ̂ = ∑ⱼ θⱼ · pⱼ
   Interpretation: the posterior mean is the EAP estimate
   High numerical precision achieved with 21–41 GH nodes

7. Compute posterior standard error
   SE(θ̂) = √[ ∑ⱼ (θⱼ − θ̂)² · pⱼ ]
   Interpretation: posterior SD gives the standard error of θ̂


R Implementation

The following R code implements EAP estimation using Gauss–Hermite quadrature with numerically stable log-sum-exp normalization. It returns both the posterior mean (EAP ability estimate) and posterior SD (measurement error), consistent with the equations above.

# 31-node Gauss-Hermite quadrature 
gauss_hermite_quadrature <- function(n) {
    nodes <- c(
      -6.995680123718540, -6.275078704942860, -5.673961444618590, -5.133595577112380,
      -4.631559506312860, -4.156271755818140, -3.700743403231470, -3.260320732313540,
      -2.831680453390210, -2.412317705480420, -2.000258548935640, -1.593885860472140,
      -1.191826998350050, -0.792876976915310, -0.395942736471420, 0.000000000000000,
       0.395942736471420,  0.792876976915310,  1.191826998350050,  1.593885860472140,
       2.000258548935640,  2.412317705480420,  2.831680453390210,  3.260320732313540,
       3.700743403231470,  4.156271755818140,  4.631559506312860,  5.133595577112380,
       5.673961444618590,  6.275078704942860,  6.995680123718540
    )
    weights <- c(
      4.618968394464210e-22, 5.110609007927130e-18, 9.640972807926650e-15,
      5.901957518987070e-12, 1.674603690010000e-09, 2.461515059905380e-07,
      2.037785851007680e-05, 1.039950753371730e-03, 3.482202837893210e-02,
      7.850054455457770e-02, 1.286876726381740e-01, 1.622022726826830e-01,
      1.739510748363320e-01, 1.593188327914190e-01, 1.261846428812920e-01,
      8.972323517810340e-02, 5.751921291087210e-02, 3.367112628478120e-02,
      1.805991601601430e-02, 8.867199778877760e-03, 3.972979625464180e-03,
      1.611886859499410e-03, 5.911356063386380e-04, 1.957035076549390e-04,
      5.843134094229380e-05, 1.588906515193330e-05, 3.908514489856310e-06,
      8.719966286212140e-07, 1.741861472367230e-07, 3.071330561919770e-08,
      4.637848584537320e-09
    )
  
  # Normalize weights to sum to 1 for proper probability integration
  weights <- weights / sum(weights)
  list(nodes = nodes, weights = weights)
}

# Global (Cached in global environment for fast access)
.GH <- local({
  gh <- gauss_hermite_quadrature(31)
  # Normalize weights to sum to 1 for proper probability integration
  normalized_weights <- gh$weights / sum(gh$weights)
  list(nodes = gh$nodes, weights = normalized_weights)
})

# ============================================================
# EAP via Gauss–Hermite with log-sum-exp  
# ============================================================
# Purpose:
#   Compute the Expected A Posteriori (EAP) estimate and posterior SD
#   using Gauss–Hermite quadrature under a Normal prior.
#
# Usage:
#   eap <- EAP_GH(loglik = function(theta) logL_vec, Q = 31)
#   eap$mean; eap$sd
#
# Notes:
#   • Prior defaults to N(0,1); set prior_mean/prior_sd to change.
#   • `loglik` must return a numeric vector of length Q (one per node).
#   • Do NOT multiply by φ(θ) again; our probability-form GH weights already include it.
#   • We normalize posterior weights with the log-sum-exp trick for stability.
# ============================================================

# (No external dependency; using internal gauss_hermite_quadrature())

# ------------------------------------------------------------
# EAP_GH(): Expected A Posteriori via Gauss–Hermite quadrature
# ------------------------------------------------------------
# Arguments:
#   loglik          : function(theta) -> length-Q numeric vector of log-likelihoods
#   Q               : number of quadrature nodes (typ. 31–61 is accurate/fast)
#   prior_mean, sd  : parameters of Normal prior used by quadrature
#   return_posterior: if TRUE, return nodes and normalized posterior masses
#
# Returns (list):
#   mean, sd        : posterior mean and SD (EAP and posterior uncertainty)
#   Q               : quadrature size used
#   Z, logZ         : evidence (marginal likelihood) and its log; useful for model comp
#   theta, posterior: (optional) nodes and normalized posterior weights p_j
#
EAP_GH <- function(loglik, Q = 31, prior_mean = 0, prior_sd = 1,
                   return_posterior = FALSE) {
  stopifnot(is.function(loglik))
  
  # Get probability quadrature for N(prior_mean, prior_sd^2).
  # Our GH function returns nodes/weights for standard normal; scale nodes to N(μ, σ^2).
  gh <- gauss_hermite_quadrature(Q)
  x       <- gh$nodes
  weights <- gh$weights  # integrates against the prior; do NOT re-multiply by φ
  theta   <- prior_mean + prior_sd * x

  # Basic safety checks on weights
  if (any(!is.finite(weights)) || any(weights <= 0)) {
    stop("Quadrature weights must be positive and finite.")
  }

  # Evaluate user-supplied log-likelihood at nodes (vectorized over theta)
  ll <- loglik(theta)  # length Q
  stopifnot(length(ll) == length(theta), all(is.finite(ll)))

  # ---------- Log-sum-exp normalization for posterior weights ----------
  # Posterior weights (unnormalized) ∝ exp(ll) * weights.
  # Work in log-space: logw = ll + log(weights).
  # Subtract max(logw) before exponentiating to avoid overflow.
  logw  <- ll + log(weights)
  m     <- max(logw)                # stabilizer (softmax shift)
  wstar <- exp(logw - m)            # safe exponentials in [0, 1]
  Z     <- sum(wstar)               # normalizing constant in shifted space
  p     <- wstar / Z                # normalized posterior weights (Σ p = 1)

  # ---------- Posterior moments ----------
  # E[θ | x] and Var(θ | x) with discrete posterior mass p at nodes theta
  eap <- sum(p * theta)
  var <- sum(p * (theta - eap)^2)

  # Prepare return
  out <- list(
    mean = eap,
    sd   = sqrt(var),
    Q    = Q,
    # Recover evidence on original scale: Z * exp(m) = Σ exp(logw)
    Z    = Z * exp(m),
    logZ = log(Z) + m
  )

  # Optionally return full posterior for plotting/diagnostics
  if (return_posterior) {
    out$theta     <- theta
    out$posterior <- p
  }

  # Gentle diagnostics: flag extreme posterior spreads
  if (!is.finite(out$sd) || out$sd > 5)
    warning("Posterior SD is very large; check items/prior.")
  if (out$sd < 1e-6)
    warning("Posterior SD is extremely small; estimate may be overconfident.")

  out
}

# ----------------------------------------------------------------------
# loglik_3pl(): Matrix-safe 3PL log-likelihood over quadrature nodes
# ----------------------------------------------------------------------
# Inputs:
#   theta : length-Q numeric vector (nodes)
#   y     : length-J vector of 0/1/NA responses
#   a,b,c : item parameters (length J)
#   D     : scaling constant (1.702 aligns logistic with normal-ogive)
#
# Output:
#   length-Q numeric vector: log-likelihood at each theta[q]
#
# Implementation notes:
#   • Uses vectorized matrix operations for speed (Q×J).
#   • Clips probabilities away from {0,1} to keep logs finite.
#   • Uses log1p(-P) for stability when P ≈ 1.
#
loglik_3pl <- function(theta, y, a, b, c, D = 1.702) {
  J <- length(a); Q <- length(theta)
  stopifnot(length(b) == J, length(c) == J, length(y) == J)

  # Coerce logical to integer for safety; validate values
  if (is.logical(y)) y <- as.integer(y)
  if (!all(is.na(y) | y %in% c(0, 1))) stop("y must be 0/1 or NA.")

  # Use only observed responses (drop NAs)
  mask <- !is.na(y)
  a <- a[mask]; b <- b[mask]; c <- c[mask]; y <- y[mask]
  Jm <- length(a)

  # If nothing observed, the log-likelihood is flat across θ (all zeros)
  if (Jm == 0L) return(rep(0, length(theta)))

  # ---------------- Logistic core term ----------------
  # eta[q,j] = D * a[j] * (theta[q] - b[j])
  # Construct efficiently via outer product; avoid explicit loops
  eta   <- D * (outer(theta, a, `*`) - matrix(a * b, nrow = Q, ncol = Jm, byrow = TRUE))
  Pcore <- plogis(eta)  # σ(eta)

  # ---------------- 3PL probability -------------------
  # P = c + (1 - c) * σ(eta)  (upper asymptote fixed at 1)
  base <- matrix(c,     nrow = Q, ncol = Jm, byrow = TRUE)
  span <- matrix(1 - c, nrow = Q, ncol = Jm, byrow = TRUE)
  P    <- base + span * Pcore

  # ---------------- Numerical safety ------------------
  # Keep probabilities away from exact 0/1 to avoid log(0)
  eps <- 1e-12
  P <- pmin(1 - eps, pmax(eps, P))

  # ---------------- Log-likelihood --------------------
  # For each node q, sum over items j: y*log(P) + (1-y)*log(1-P)
  Y <- matrix(y, nrow = Q, ncol = Jm, byrow = TRUE)
  ll_mat <- Y * log(P) + (1 - Y) * log1p(-P)   # Q × Jm
  rowSums(ll_mat)
}

# ------------------------------------------------------------
# pretty_eap(): Compact console summary with CI & information
# ------------------------------------------------------------
# Arguments:
#   eap      : list from EAP_GH()
#   label    : a short label for the header line
#   ci_level : e.g., 0.95
#   digits   : decimal places for θ̂ and SE (I is shown with 2 decimals)
#
# Notes:
#   • Information I = 1/SE^2 (Bayesian posterior curvature proxy).
#   • ρ (approx) uses 1 − SE^2 as a convenience on θ~N(0,1) scale.
#
pretty_eap <- function(eap, label = "EAP", ci_level = 0.95, digits = 4) {
  stopifnot(all(c("mean", "sd", "Q") %in% names(eap)))

  # Wald-style symmetric interval around the posterior mean
  z <- qnorm(1 - (1 - ci_level) / 2)
  ci_lo <- eap$mean - z * eap$sd
  ci_hi <- eap$mean + z * eap$sd

  # Information and a quick reliability proxy (clamped to [0,1])
  info  <- 1 / (eap$sd^2)
  rho   <- max(0, min(1, 1 - (eap$sd^2)))

  # Nicely formatted console output
  cat(sprintf("\n[%s] Q = %d\n", label, eap$Q))
  cat(sprintf("θ̂       = %.*f\n", digits, eap$mean))
  cat(sprintf("SE       = %.*f    (I = %.2f)\n", digits, eap$sd, info))
  cat(sprintf("%d%% CI   = [%.*f, %.*f]\n",
              round(ci_level * 100), digits, ci_lo, digits, ci_hi))
  cat(sprintf("ρ (approx) = %.3f\n\n", rho))
}

# ============================================================
# Examples / Demo  (set RUN_EXAMPLE <- FALSE to silence)
# ============================================================
RUN_EXAMPLE <- TRUE

if (RUN_EXAMPLE) {
  set.seed(1)

  # --- Example A: Unimodal toy log-likelihood (centered near 0.5) ---
  # Purpose: sanity-check EAP integration and log-sum-exp normalization
  Q_demo  <- 31
  ll_demo <- function(th) dnorm(th, mean = 0.5, sd = 0.7, log = TRUE)

  eap_demo <- EAP_GH(loglik = ll_demo, Q = Q_demo, return_posterior = TRUE)
  pretty_eap(eap_demo, label = "Demo")

  # --- Example B: IRT (tiny 3PL bank) --------------------------------
  # Purpose: realistic likelihood from item responses under 3PL
  Q_irt <- 31
  J <- 6
  a <- c(0.8, 1.1, 1.5, 1.2, 0.9, 1.6)              # discriminations
  b <- c(-1.2, -0.5, 0.0, 0.5, 1.0, 1.5)            # difficulties
  c <- c(0.15, 0.10, 0.20, 0.05, 0.10, 0.20)        # guessing (lower asymptotes)

  # A plausible fixed response pattern for illustration (correct easier, miss harder)
  y <- c(1, 1, 1, 0, 0, 0)

  eap_irt <- EAP_GH(
    loglik = function(th) loglik_3pl(th, y, a, b, c),
    Q = Q_irt,
    return_posterior = TRUE
  )
  pretty_eap(eap_irt, label = "IRT")
}
## 
## [Demo] Q = 31
## θ̂       = 0.0213
## SE       = 0.6063    (I = 2.72)
## 95% CI   = [-1.1670, 1.2096]
## ρ (approx) = 0.632
## 
## 
## [IRT] Q = 31
## θ̂       = -0.2926
## SE       = 0.6786    (I = 2.17)
## 95% CI   = [-1.6227, 1.0374]
## ρ (approx) = 0.540

EAP Estimation Results Analysis

[Demo] Results

Statistic | Value | Interpretation
--- | --- | ---
Q = 31 | Quadrature nodes | Standard grid size for accurate integration
θ̂ = 0.0213 | EAP ability estimate | Essentially average ability (very close to the population mean)
SE = 0.6063 | Posterior standard error | Moderate uncertainty, typical for limited information
I = 2.72 | Fisher Information | Moderate measurement precision
95% CI = [-1.1670, 1.2096] | Credible interval | True θ likely within ±1.2 SD of the mean
ρ ≈ 0.632 | Approximate reliability | 63% true-score variance, adequate for screening

Summary: This synthetic Gaussian likelihood demonstrates proper EAP computation. The estimate centers near zero due to the \(\mathcal{N}(0,1)\) prior, with moderate precision typical of simple examples.


[IRT] Results

Statistic | Value | Interpretation
--- | --- | ---
Q = 31 | Quadrature nodes | Consistent integration grid
θ̂ = -0.2926 | EAP ability estimate | Slightly below-average ability
SE = 0.6786 | Posterior standard error | Higher uncertainty than the demo case
I = 2.17 | Fisher Information | Lower precision, fewer informative items
95% CI = [-1.6227, 1.0374] | Credible interval | Wider range reflects greater uncertainty
ρ ≈ 0.540 | Approximate reliability | 54% true-score variance, insufficient for decision making

Summary: Real 3PL IRT model with mixed response pattern ([1,1,1,0,0,0]) shows expected shrinkage toward prior mean. The lower reliability indicates this short test provides limited measurement precision.


Performance Comparison

Aspect | [Demo] | [IRT] | Implications
--- | --- | --- | ---
Precision | SE = 0.61, I = 2.72 | SE = 0.68, I = 2.17 | Demo has better precision despite synthetic data
Reliability | ρ = 0.63 | ρ = 0.54 | Demo is more reliable by ~9 percentage points
Estimate Location | θ̂ = +0.02 | θ̂ = -0.29 | IRT shows greater prior shrinkage
Uncertainty Range | CI width = 2.38 | CI width = 2.66 | IRT has wider credible intervals

Psychometric Interpretation

For the IRT Results:

  • The reliability (ρ = 0.54) is below the typical minimum (0.70) for high-stakes decisions
  • To achieve SE ≤ 0.30 (I ≥ 11.11, ρ ≥ 0.91), roughly 6–18 additional well-targeted items would be needed (see the calculation below)
  • The negative ability estimate suggests the examinee found the harder items challenging, consistent with the response pattern

Key Insights:

  1. Prior Influence: Both estimates demonstrate Bayesian shrinkage toward the \(\mathcal{N}(0,1)\) prior mean
  2. Information Accumulation: The 6-item test provides insufficient information for precise measurement
  3. CAT Potential: Adaptive testing could significantly improve precision by selecting optimal items

Recommendations:

  • For operational use, continue testing until SE < 0.30
  • Consider content balancing to ensure domain coverage
  • Monitor item exposure to maintain test security

To reach the standard stopping criterion of SE ≤ 0.30 in CAT, we calculate the required information:

\[ I_{\text{target}} = \frac{1}{\text{SE}^2} = \frac{1}{0.3^2} \approx 11.11 \]

Current Status vs. Target

Metric | Current Value | Target Value | Deficit
--- | --- | --- | ---
Standard Error (SE) | 0.68 | 0.30 | -0.38
Fisher Information (I) | 2.17 | 11.11 | +8.94

Additional Items Required

Given that well-targeted items typically provide 0.5–1.5 units of Fisher information near the current ability estimate:

Conservative Estimate (Low-Information Items): \[ n_{\text{extra}} \approx \left\lceil \frac{8.94}{0.5} \right\rceil = 18 \text{ items} \]

Optimistic Estimate (High-Information Items): \[ n_{\text{extra}} \approx \left\lceil \frac{8.94}{1.5} \right\rceil = 6 \text{ items} \]

Expected Range: 6–18 additional items

Mathematical Framework

The general item requirement formula:

\[ n_{\text{extra}} \approx \left\lceil \frac{I_{\text{target}} - I_{\text{current}}}{\mathbb{E}\big[I_{\text{item}}(\hat\theta)\big]} \right\rceil \]

Where:

  • \(I_{\text{item}}(\hat\theta)\): Fisher information of candidate items at the current \(\hat\theta\)
  • \(\mathbb{E}[\cdot]\): expected value across the item pool
  • \(\lceil \cdot \rceil\): ceiling function (round up to the nearest integer)
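A quick arithmetic check of this formula in R, using the values from the example above (I_current = 2.17 from the six-item test; 0.5 and 1.5 are the assumed per-item information bounds):

I_target  <- 1 / 0.30^2                  # ≈ 11.11
I_current <- 2.17
ceiling((I_target - I_current) / 0.5)    # conservative bound: 18 items
ceiling((I_target - I_current) / 1.5)    # optimistic bound:    6 items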

CAT Efficiency Gains

Testing Mode | Typical Items Needed | Efficiency Gain
--- | --- | ---
Fixed-Form Test | 30–40 items | Baseline
Computerized Adaptive | 15–25 items | 35–50% reduction
Current Case (Projected) | 12–24 total items | 40–60% reduction

Strategic Implications

Optimal Item Selection Criteria:

  1. Maximum Fisher Information: Select items with highest \(I_j(\hat\theta)\) at current ability estimate
  2. Difficulty Matching: Choose items where \(b_j \approx \hat\theta\) for maximum information
  3. High Discrimination: Prioritize items with \(a_j > 1.2\) for steeper information curves
  4. Content Balance: Maintain domain coverage while maximizing information

Why Information Matters

The standard error of measurement in IRT is inversely related to the square root of the test information:

\[ \mathrm{SE}(\theta) = \frac{1}{\sqrt{I_T(\theta)}}. \]

This means that doubling test information reduces the SE by about 30%, since
\(\mathrm{SE} \propto 1/\sqrt{I} \Rightarrow 1/\sqrt{2} \approx 0.707.\) Hence, the most efficient path to higher precision is not simply adding more items, but adding high-information items near the current \(\hat{\theta}\).
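A quick numeric check of this relationship (the starting information value of 2.17 is taken from the example above):

I  <- 2.17 * c(1, 2, 4)        # current, doubled, quadrupled information
SE <- 1 / sqrt(I)              # SE shrinks by 1/sqrt(2) ≈ 0.707 per doubling
data.frame(information = round(I, 2), SE = round(SE, 3))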


Key Strategies to Increase Information

  • High discrimination parameters (a):
    Items with larger \(a\) values have steeper ICC slopes and contribute more information locally.

  • Difficulty alignment (b ≈ \(\hat{\theta}\)):
    Items whose difficulty levels match the current ability estimate maximize information gain.

  • Low guessing and high upper asymptote (c small, d ≈ 1):
    Items with smaller \(c\) and upper asymptotes near 1 yield cleaner discrimination and reduce noise from random guessing.

These principles underlie the Maximum Fisher Information (MFI) rule used in CAT systems, ensuring each item chosen maximizes local measurement precision.
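A minimal sketch of the MFI rule for a 3PL pool; info_3pl() and the four-item pool below are illustrative assumptions rather than the operational selection code:

info_3pl <- function(theta, a, b, c, D = 1.702) {
  P  <- c + (1 - c) * plogis(D * a * (theta - b))
  Pp <- D * a * (P - c) * (1 - P) / (1 - c)     # derivative of the 3PL ICC
  Pp^2 / (P * (1 - P))                          # item information
}

pool <- data.frame(a = c(0.8, 1.4, 1.1, 1.7),
                   b = c(-1.0, -0.2, 0.4, 1.2),
                   c = c(0.15, 0.10, 0.20, 0.05))
theta_hat  <- -0.3
I_at_theta <- with(pool, info_3pl(theta_hat, a, b, c))
pool[which.max(I_at_theta), ]    # MFI pick: most informative item at theta_hat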


Role of Quadrature (Q)

The quadrature setting Q specifies the number of Gauss–Hermite nodes used for numerical integration in EAP estimation (Q = 31 in the examples above). It affects integration accuracy, not measurement precision. Once Q ≥ 31, increasing it further has a negligible effect on SE or information, so the current setting is already robust.


Bottom Line

Metric | Value / Interpretation
--- | ---
Precision goal | SE ≤ 0.30 (I ≥ 11.11)
Current precision | SE ≈ 0.68 (I ≈ 2.17)
Needed improvement | ≈5× increase in total information
Expected length | +6 to +18 well-targeted items
Focus | High-discrimination, well-targeted, low-guessing items

Only additional or better-targeted information, not finer quadrature, will narrow the credible interval and enhance measurement precision.


Stopping Rules

In Computerized Adaptive Testing (CAT), stopping rules determine when to end item administration. They balance measurement precision, content validity, and operational constraints to ensure efficient and fair testing.

Criterion | Condition | Purpose
--- | --- | ---
Precision-based | Stop when \(SE(\hat{\theta}) < \text{target (e.g., 0.30)}\) | Guarantees sufficient reliability and precision of ability estimates.
Length-based | Stop when \(t = L_{\max}\) (maximum number of items reached) | Prevents excessively long tests; provides a hard cap on administration time.
Blueprint-based | Stop when all content-area quotas or balance constraints are satisfied | Ensures that the adaptive test maintains representativeness across curricular domains or skill strands.
Operational-based | Stop when time limits or item-exposure thresholds are reached | Protects item security, test fairness, and logistical feasibility during live deployment.

Interpretation

  • Precision-based stopping is most common in psychometric research and adaptive calibration—directly tied to Fisher Information and the standard error of measurement.
  • Length-based rules act as safeguards when precision cannot be achieved (e.g., low information banks or examinees at extremes).
  • Blueprint-based and operational criteria introduce real-world constraints that preserve fairness, representativeness, and test security in large-scale assessment systems.

Together, these stopping rules integrate psychometric rigor with operational practicality, ensuring that adaptive tests remain both efficient and equitable.
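A small sketch of how these criteria can be combined into a single stopping check; the function name and thresholds are illustrative:

should_stop <- function(se, n_items, content_met = TRUE,
                        se_target = 0.30, max_items = 30) {
  if (n_items >= max_items) return(TRUE)            # length-based cap
  if (se <= se_target && content_met) return(TRUE)  # precision + blueprint met
  FALSE
}

should_stop(se = 0.42, n_items = 12)   # FALSE: keep testing
should_stop(se = 0.28, n_items = 12)   # TRUE:  precision target reached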


Exposure and Content Constraints

In adaptive testing, item selection must balance psychometric efficiency (maximizing information) with practical constraints such as content balance and item security. Without these safeguards, high-information items may be overused, compromising fairness, validity, and test security.


1. Content Balancing

Adaptive algorithms often modify the pure information-maximization criterion by introducing penalty terms for content imbalance.
This ensures that items from each domain, skill strand, or cognitive level are represented according to blueprint specifications.

\[ \mathcal{U}_j^{\text{content}} = I_j(\hat{\theta}) - \lambda\,\text{penalty}(j), \]

where:

  • \(I_j(\hat{\theta})\): item information at current ability estimate,
  • \(\text{penalty}(j)\): deviation from desired content proportions,
  • \(\lambda\): tuning parameter controlling the trade-off between precision and blueprint adherence.

Interpretation:

  • When a domain is underrepresented, its penalty is small → items from that domain are more likely to be selected.
  • When a domain is overrepresented, penalties discourage further selection.
    This mechanism maintains content validity while still optimizing measurement precision.
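A small numeric sketch of the penalized utility above; here penalty(j) is taken as the current over-representation of item j's content domain relative to its blueprint target, which is one simple choice among many:

info         <- c(0.9, 1.3, 0.7, 1.1)                  # I_j(theta_hat) for 4 candidates
domain       <- c("algebra", "algebra", "geometry", "geometry")
target       <- c(algebra = 0.5, geometry = 0.5)       # blueprint proportions
administered <- c(algebra = 4, geometry = 1)           # items delivered so far

current <- administered / sum(administered)
penalty <- pmax(0, current[domain] - target[domain])   # positive if overrepresented
lambda  <- 2
utility <- info - lambda * penalty
which.max(utility)    # a geometry item wins despite slightly lower information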

2. Item Exposure Control

In Computerized Adaptive Testing (CAT), the item selection algorithm—which typically chooses items that maximize Fisher information at the current ability estimate—can inadvertently overuse a small subset of highly discriminating items. This leads to item overexposure, compromising test security and item pool longevity. To mitigate this, item exposure control mechanisms are integrated into the selection process.

Sympson–Hetter vs. Randomesque Exposure Control

Both the Sympson–Hetter (SH) and Randomesque procedures aim to prevent overexposure of highly informative items in Computerized Adaptive Testing (CAT). While both introduce randomness into item selection, they differ in how that randomness is structured and tuned.


1. Conceptual Difference

Aspect | Sympson–Hetter (SH) | Randomesque
--- | --- | ---
Core idea | Apply a probabilistic gate after selecting the most informative item. | Randomly choose among the top-k most informative items before administering.
Probability control | Each item \(j\) is administered with calibrated probability \(\pi_j \in (0,1]\). | Selection probability is uniform (or weighted) among the top-k candidates.
Calibration | Requires iterative simulation to tune \(\pi_j\) until observed exposure ≈ target rate. | No calibration; randomness directly limits repeat use of the same top items.
Nature of control | Item-specific (each item has its own gate). | Rank-based (depends on the relative information rank within the current pool).

2. Algorithmic Outline

Sympson–Hetter Exposure Control Procedure

For each item selection during CAT:
- Step 1: Select candidate item j that maximizes Uⱼ(θ̂)
- Step 2: Draw random number u ∼ U(0,1)
- Step 3: If u < πⱼ, administer item j; else reject and return to Step 1
- Step 4: Repeat until an item is administered

Calibration phase: Adjust probabilities πⱼ iteratively in CAT simulations until empirical exposure rates converge to target exposure cap rⱼ

Randomesque Exposure Control Procedure

For each item selection during CAT:
- Step 1: Rank items by selection utility Uⱼ(θ̂)
- Step 2: Randomly sample one item from top-k candidates (typically k=5)
- Step 3: Administer the selected item

Key feature: No simulation or parameter tuning required; randomness is injected directly at selection time

3. Advantages and Trade-offs

Criterion | Sympson–Hetter | Randomesque
--- | --- | ---
Exposure control precision | High: item-specific probabilities yield fine control. | Moderate: exposure bounded by randomization among top-k items.
Implementation complexity | Higher: requires pre-calibration via Monte Carlo CAT simulations. | Low: no pre-processing or calibration needed.
Adaptivity loss | Minimal when \(\pi_j\) values are well tuned. | Slight: random choice may reduce local optimality.
Fairness / pool balance | Excellent under proper calibration. | Good, but depends on the chosen \(k\).
Runtime overhead | Moderate (due to re-selection steps). | Very low.

4. Practical Guidance

  • Sympson–Hetter is preferred in high-stakes or large-scale operational testing, where test security and precise exposure limits are critical and calibration resources are available.

  • Randomesque is well suited for research, pilot, or low-stakes CAT contexts, offering a simple, tunable mechanism (via \(k\)) that improves pool utilization without pre-calibration.


5. Illustrative Pseudocode

# --- Sympson–Hetter Exposure Control ---
select_SH <- function(pool, theta, pi) {
  repeat {
    idx <- which.max(pool$info(theta))           # most informative remaining item
    if (runif(1) < pi[idx]) return(pool[idx, ])  # pass the probabilistic gate
    pool <- pool[-idx, ]                         # otherwise drop the item ...
    pi   <- pi[-idx]                             # ... keeping pi aligned with pool
  }
}

# --- Randomesque Control (Top-k Selection) ---
select_randomesque <- function(pool, theta, k = 5) {
  idx_topk <- order(pool$info(theta), decreasing = TRUE)[1:k]
  chosen <- sample(idx_topk, 1)
  pool[chosen, ]
}

3. Integrative View

Constraint Type | Mechanism | Goal
--- | --- | ---
Content balancing | Penalized utility \(I_j - \lambda\,\text{penalty}(j)\) | Maintain blueprint coverage
Exposure control | Randomized gating with probability \(\pi_j\) | Limit item overuse and ensure fairness
Combined strategies | Multi-objective optimization (information + content + exposure) | Optimize precision under practical constraints

Together, content balancing and exposure control extend the classical information-based CAT framework into a multi-criteria optimization problem —one that safeguards both the psychometric integrity and the operational viability of modern large-scale adaptive assessments.


Conceptual Summary

  • The Computerized Adaptive Assessment (CAA) operates as a dynamic feedback system: after each response, the current ability estimate \(\hat{\theta}\) is updated based on the observed data and item parameters.
  • At each step, the next item is selected to maximize expected information (or minimize posterior uncertainty) given the current estimate of ability and test constraints.
  • This iterative refinement continues until a stopping rule is met—typically when the standard error of measurement falls below a predefined threshold, ensuring sufficient precision.
  • The result is an individualized yet psychometrically comparable assessment: examinees receive items tailored to their ability level, achieving accurate measurement with fewer items.
  • Overall, CAAs produce shorter, more efficient, and more reliable tests, aligning psychometric theory with real-time data-driven decision-making.

Practical Considerations and Applications

Item Pool Requirements

Successful CAT implementation requires:

Requirement | Description
--- | ---
Large calibrated pool | Hundreds to thousands of items
Broad difficulty range | Difficulty parameters \(b_j\) spanning the ability scale
High discrimination | \(a_j > 0.8\) preferred
Balanced content | Coverage across domains

Information Profiles

Different items contribute information differently:

  • High discrimination: Tall, narrow information peaks
  • Medium difficulty: Information centered at \(b_j\)
  • Low guessing: Minimal information loss at low abilities

Optimal Test Design

To maximize measurement precision:

  1. Select high-discrimination items (\(a_j\) large)
  2. Match item difficulty to target population (\(b_j\) appropriate)
  3. Minimize guessing parameters (\(c_j\) small)
  4. Ensure content coverage across domains

Advanced Topics

Beyond unidimensional dichotomous models, CAT frameworks can extend to multidimensional or polytomous settings—offering richer measurement at the cost of increased computational complexity.

Multidimensional IRT

For assessments measuring multiple latent abilities
\(\boldsymbol{\theta} = (\theta_1, \theta_2, \ldots, \theta_K)^\top\), each item \(j\) is characterized by a discrimination vector \(\mathbf{a}_j = (a_{j1}, a_{j2}, \ldots, a_{jK})^\top\) that defines how strongly the item loads on each dimension.

The probability of a correct response is modeled as:

\[ \boxed{ P_j(\boldsymbol{\theta}) = c_j + (1 - c_j) \cdot \frac{1}{1 + \exp\Big[-D \cdot \big(\mathbf{a}_j^\top \cdot \boldsymbol{\theta} - \delta_j\big)\Big]} } \]

where:

Symbol | Meaning
--- | ---
\(\mathbf{a}_j\) | discrimination vector (direction & magnitude in ability space)
\(\delta_j\) | intercept or location parameter (analogous to difficulty)
\(c_j\) | lower asymptote (guessing parameter)
\(D\) | scaling constant (≈ 1.702 to align logistic with normal-ogive)

Interpretation

  • The term \(\mathbf{a}_j^\top \boldsymbol{\theta}\) is the projection of ability onto the item’s discrimination direction — effectively a weighted composite of the latent traits.
  • The logistic argument \(D(\mathbf{a}_j^\top \boldsymbol{\theta} - \delta_j)\) shifts the ICC along that direction; larger \(\delta_j\) makes the item more difficult.
  • When \(K = 1\), this reduces to the standard unidimensional 3PL form:

\[ P_j(\theta) = c_j + (1 - c_j) \cdot \frac{1}{1 + e^{-D \cdot a_j \cdot (\theta - b_j)}}. \]


For a parameter vector \(\boldsymbol{\theta} \in \mathbb{R}^K\), the Fisher Information matrix is \[ \boxed{\; \mathbf{I}(\boldsymbol{\theta}) = -\,\mathbb{E}\!\left[ \frac{\partial^2 \log L(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\,\partial \boldsymbol{\theta}^\top} \right] \;} \]
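A minimal sketch of the multidimensional 3PL response probability defined above; the two-dimensional parameter values are illustrative:

P_m3pl <- function(theta, a, delta, c, D = 1.702) {
  # P = c + (1 - c) * logistic(D * (a' theta - delta))
  c + (1 - c) * plogis(D * (sum(a * theta) - delta))
}

a_vec <- c(1.2, 0.6)     # discrimination vector (loads on two dimensions)
delta <- 0.3             # intercept / location parameter
c_par <- 0.15            # lower asymptote
P_m3pl(theta = c(0.5, -0.2), a = a_vec, delta = delta, c = c_par)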

Polytomous Models

For items with multiple score categories (e.g., rating scales), the Generalized Partial Credit Model:

\[ P_{jk}(\theta) = \frac{\exp\left[\displaystyle\sum_{v=1}^k D \cdot a_j \cdot (\theta - b_{jv})\right]}{\displaystyle\sum_{c=1}^m \exp\left[\displaystyle\sum_{v=1}^c D \cdot a_j \cdot (\theta - b_{jv})\right]} \]

The information function becomes more complex but follows the same principles.
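A minimal sketch of the category probabilities under the GPCM as written above (categories k = 1, …, m, one threshold b_v per category); the thresholds are illustrative, and the max-shift mirrors the log-sum-exp trick discussed earlier:

gpcm_probs <- function(theta, a, b, D = 1.702) {
  z <- cumsum(D * a * (theta - b))            # cumulative term for each category
  exp(z - max(z)) / sum(exp(z - max(z)))      # softmax with a stabilizing shift
}

gpcm_probs(theta = 0.4, a = 1.1, b = c(-1.0, 0.0, 0.8))   # three score categories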


Conclusion

Fisher Information provides the mathematical foundation that makes Computerized Adaptive Testing possible.
By quantifying how much each item contributes to measurement precision, CAT algorithms can:

  1. Select optimal items in real time
  2. Maximize measurement efficiency
  3. Ensure comparable precision across ability levels
  4. Reduce testing time by 50–70%

Because items near the current ability estimate \(\hat{\theta}\) contribute the most information, fewer items are needed to reach a fixed precision target, making CAT both efficient and psychometrically rigorous. As item banks grow and computational methods advance, CAT continues to bridge psychometric theory with efficient, individualized assessment practice.



Algorithmic Foundation of CAT

Building on the information-theoretic framework, a Computerized Adaptive Test dynamically selects items that maximize information at each step, tailoring the assessment to the examinee’s evolving ability estimate.


Adaptive Testing Cycle

State at step t:
- Tₜ: Administered items
- y₁:ₜ: Observed responses
- θ̂ₜ: Current ability estimate
- SEₜ: Current standard error

Iteration Steps:
1. Item Selection: j* = argmaxⱼ Iⱼ(θ̂ₜ) for j ∉ Tₜ
   With content balancing and exposure control

2. Administration: Deliver item j*, record response yₜ₊₁

3. Ability Estimation: Update θ̂ₜ₊₁ using EAP or MLE

4. Precision Calculation: SEₜ₊₁ = 1 / √(∑ Iⱼ(θ̂ₜ₊₁))

5. Termination Check:
   Stop if SEₜ₊₁ < SE_target OR t+1 ≥ L_max
   Example: SE_target = 0.30 (reliability ~0.91)
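The loop below is a compact, self-contained simulation of this cycle under simplifying assumptions: a small randomly generated 3PL pool, grid-based EAP with a N(0,1) prior, MFI item selection, and an SE-based stopping rule. It is illustrative only and is not the code used by the app that follows.

set.seed(42)
D <- 1.702
pool <- data.frame(a = runif(50, 0.8, 2.0),
                   b = rnorm(50, 0, 1.2),
                   c = runif(50, 0, 0.2))
P3 <- function(th, a, b, c) c + (1 - c) * plogis(D * a * (th - b))
I3 <- function(th, a, b, c) {
  P <- P3(th, a, b, c); Pp <- D * a * (P - c) * (1 - P) / (1 - c)
  Pp^2 / (P * (1 - P))
}

grid  <- seq(-4, 4, length.out = 201)
prior <- dnorm(grid); prior <- prior / sum(prior)

theta_true <- 0.8; theta_hat <- 0
used <- integer(0); loglik <- rep(0, length(grid))

for (t in 1:30) {
  avail <- setdiff(seq_len(nrow(pool)), used)
  info  <- sapply(avail, function(j) I3(theta_hat, pool$a[j], pool$b[j], pool$c[j]))
  j     <- avail[which.max(info)]                          # 1. MFI item selection
  y     <- rbinom(1, 1, P3(theta_true, pool$a[j], pool$b[j], pool$c[j]))  # 2. response
  pj    <- P3(grid, pool$a[j], pool$b[j], pool$c[j])
  loglik <- loglik + y * log(pj) + (1 - y) * log(1 - pj)   # accumulate log-likelihood
  post  <- exp(loglik - max(loglik)) * prior
  post  <- post / sum(post)
  theta_hat <- sum(post * grid)                            # 3. EAP update
  se    <- sqrt(sum(post * (grid - theta_hat)^2))          # 4. posterior SD as SE
  used  <- c(used, j)
  if (se < 0.30) break                                     # 5. precision-based stop
}
c(items_used = length(used), theta_hat = theta_hat, SE = se)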

This Shiny app is an interactive visual laboratory for Item Response Theory (IRT). It dynamically illustrates how item and test characteristics relate to measurement precision. The app displays Item Characteristic Curves (ICCs) showing the probability of a correct response \(P(\theta)\), Item Information Functions (IIFs) showing item-level precision, and the Test Information Function (TIF) with its corresponding Conditional Standard Error of Measurement (SEM). As you adjust the parameters of the most recently added item, the app instantly recomputes and redraws all curves, letting you see exactly how discrimination, difficulty, and asymptotes shape both item behavior and overall test precision.

Use Add Item to generate new items that contribute colored IIFs to the total TIF (black line). Adjust the a, b, c, d sliders to change discrimination, difficulty, guessing, and ceiling parameters for the last item. The ICCs (gray) and IIFs (rainbow) update in real time. Experiment by moving the θ slider at the bottom to probe specific ability levels — the app updates \(P(\theta)\), \(P'(\theta)\), \(I(\theta)\), TIF, and SEM values interactively. Uncheck the Show all IIFs box to focus on test-level precision alone, or toggle Show ICC background to compare probabilities and information. Try adding or modifying items to observe how the TIF grows, how SEM shrinks, and how the distribution of item difficulties determines where the test measures most precisely.

Launch the interactive IRT explorer in one click or run it locally.

Run Locally

  1. Create an empty folder.
  2. Save the code below as app.R inside it.
  3. Open in RStudio → click Run App.

Web Version

Click on the image below

(No R install needed — hosted on shinyapps.io)

IRT Explorer Interface

# =====================================================================
# IIF–TIF–SEM Explorer — EAP-locked θ (FAST) — Pool = 450
# Uses custom Gauss–Hermite quadrature (no statmod)
# =====================================================================

#--------------------------------------------
# ---- PACKAGES SECTION ----
#--------------------------------------------
library(shiny)
library(dplyr)
library(tibble)
library(tidyr)
library(shinyjs)
library(shinyWidgets)
library(shape)
library(TeachingDemos)
library(later)
library(shinyBS) 
library(ggplot2)


#--------------------------------------------
# ---- Helpers & Constants ------------------
#--------------------------------------------

D.const <- 1.702
pool_size <- 450

# helpers/ui_components.R

# Metric card for displaying key statistics
metric_card <- function(title, value, subtitle = NULL, color = "#3498db", width = 4) {
  div(class = paste0("col-sm-", width),
      div(class = "metric-card",
          style = paste0("border-left: 4px solid ", color, "; background: white; padding: 15px; border-radius: 8px; margin-bottom: 15px; box-shadow: 0 2px 4px rgba(0,0,0,0.1);"),
          h4(value, style = paste0("color: ", color, "; font-weight: 700; margin: 0 0 5px 0;")),
          h6(title, style = "color: #6c757d; margin-bottom: 0; font-weight: 600; font-size: 12px; text-transform: uppercase;"),
          if(!is.null(subtitle)) p(subtitle, style = "font-size: 0.75rem; color: #8c8c8c; margin: 5px 0 0 0;")
      )
  )
}

# Slider component for consistent controls
slider_component <- function(id, label, value, min = 0, max = 1, step = 0.05, width = "100%") {
  div(class = "slider-group",
      style = "margin-bottom: 15px;",
      h5(label, style = "font-size: 13px; font-weight: 600; color: #374151; margin-bottom: 8px;"),
      sliderInput(id, NULL, min = min, max = max, value = value, step = step, width = width)
  )
}

# Section wrapper for consistent styling
section_component <- function(..., title = NULL, icon_name = NULL, class = "") {
  div(class = paste("sb-section", class),
      if(!is.null(title)) {
        div(class = "sb-head",
            if(!is.null(icon_name)) icon(icon_name, class = "sb-icon"),
            title
        )
      },
      ...
  )
}

# Button group for consistent action buttons
button_group <- function(..., class = "button-group") {
  div(class = class, style = "display: flex; gap: 8px; justify-content: space-between; margin: 12px 0;", ...)
}

# Compact status pill for inline metrics
status_pill <- function(label, value, color = "#64748b") {
  div(style = paste0("display: inline-flex; align-items: center; gap: 4px; background: white; border-radius: 6px; padding: 4px 8px; border: 1px solid #e2e8f0; border-top: 3px solid ", color, ";"),
      span(style = "color: #64748b; font-weight: 600; font-size: 7px;", label),
      span(style = "color: #1e293b; font-weight: 700; font-size: 7px;", value)
  )
}

# FAQ item component for consistent FAQ structure
faq_item <- function(id, question, answer, icon_color = "#4A90E2") {
  div(
    class = "faq-item",
    style = "margin-bottom: 15px; border: 1px solid #e0e0e0; border-radius: 8px;",
    actionLink(
      inputId = paste0("toggle_", id),
      label = div(
        style = "padding: 15px 20px; background: #f8f9fa; border-radius: 8px 8px 0 0; cursor: pointer;",
        icon("question-circle", style = paste0("color: ", icon_color, "; margin-right: 10px;")),
        tags$span(question, style = "font-weight: bold; color: #2c3e50;")
      )
    ),
    div(
      id = paste0("answer_", id),
      style = "display: none; padding: 20px; background: white; border-radius: 0 0 8px 8px;",
      answer
    )
  )
}

# ===========================================
# THETA GRID TOOLBOX - GAUSS-HERMITE IMPLEMENTATION  
# ===========================================

# -------------------------
# Helper: defensive validator
# -------------------------
stop_if_bad_grid <- function(theta, w) {
  if (length(theta) != length(w)) stop("theta and w must have same length")
  if (length(theta) == 0) stop("Empty grid: length(theta) == 0")
  if (any(!is.finite(theta))) stop("Non-finite values in theta")
  if (any(!is.finite(w))) stop("Non-finite values in w")
  if (any(w < 0)) stop("Negative weights found")
  s <- sum(w)
  if (!is.finite(s) || s <= 0) stop("Weights sum to zero or non-finite")
  invisible(TRUE)
}

# -------------------------
# Golub–Welsch implementation for Gauss–Hermite
# -------------------------
gauss_hermite_gw <- function(n) {
  if (!is.numeric(n) || n < 1) stop("'n' must be a positive integer")
  n <- as.integer(n)
  if (n == 1L) return(list(nodes = 0, weights = 1))
  
  # Recurrence coefficients for probabilists' Hermite polynomials
  # (off-diagonal entries sqrt(k), so nodes/weights integrate against N(0,1))
  k <- seq_len(n - 1)
  b <- sqrt(k)
  
  # Build symmetric tridiagonal Jacobi matrix
  J <- matrix(0, n, n)
  for (i in seq_len(n - 1)) {
    J[i, i + 1] <- b[i]
    J[i + 1, i] <- b[i]
  }
  
  # Eigen-decomposition: nodes = eigenvalues, weights ~ square of first row of eigenvectors
  ev <- eigen(J, symmetric = TRUE)
  nodes <- as.numeric(ev$values)
  v1 <- ev$vectors[1, ]
  weights <- as.numeric((v1^2) / sum(v1^2))
  
  # Sort nodes & weights ascending
  ord <- order(nodes)
  nodes <- nodes[ord]
  weights <- weights[ord]
  
  stop_if_bad_grid(nodes, weights)
  list(nodes = nodes, weights = weights)
}

# -------------------------
# make_theta_grid(): create uniform or GH grid
# -------------------------
make_theta_grid <- function(n = 61,
                            scheme = c("ghermite", "uniform"),
                            range = c(-4,4),
                            prior_mean = 0,
                            prior_sd = 1,
                            gh_force_odd = FALSE,
                            clip = FALSE,
                            return_meta = TRUE) {
  scheme <- match.arg(scheme)
  n_in <- as.integer(n)
  
  if (isTRUE(gh_force_odd) && scheme == "ghermite" && n %% 2 == 0) n <- n + 1L
  
  if (scheme == "uniform") {
    if (length(range) != 2 || range[1] >= range[2]) stop("Invalid 'range' for uniform grid")
    if (n < 2) stop("'n' must be >= 2 for a uniform grid")
    theta <- seq(range[1], range[2], length.out = n)
    w <- rep(1 / n, n)
  } else {
    gh <- gauss_hermite_gw(n)
    theta <- prior_mean + prior_sd * gh$nodes
    w <- gh$weights
  }
  
  stop_if_bad_grid(theta, w)
  out <- list(theta = as.numeric(theta), w = as.numeric(w))
  
  if (isTRUE(return_meta)) {
    out$meta <- list(
      scheme = scheme,
      n_request = n_in,
      n_returned = length(out$theta),
      clipped = FALSE,
      prior_mean = if (scheme == "ghermite") prior_mean else NA_real_,
      prior_sd = if (scheme == "ghermite") prior_sd else NA_real_
    )
  }
  out
}

# -------------------------
# Scale GH nodes to desired range (preserves relative spacing)
# -------------------------
scale_gh_grid <- function(grid, range = c(-4,4)) {
  X_max <- max(abs(grid$theta))
  R <- max(abs(range))
  theta_scaled <- grid$theta * R / X_max
  w_scaled <- grid$w * (X_max / R)       # adjust weight density
  w_scaled <- w_scaled / sum(w_scaled)   # renormalize
  list(theta = theta_scaled, w = w_scaled)
}

# ===========================================
# PLOTTING FUNCTIONS FOR SHINY
# ===========================================

# -------------------------
# Node spacing plot (for your grid diagnostics)
# -------------------------
plot_spacing <- function(grid1, grid2, names=c("Grid 1","Grid 2"), xlim_range=c(-4,4)) {
  df1 <- data.frame(theta=grid1$theta[-1], dtheta=diff(grid1$theta), grid=names[1])
  df2 <- data.frame(theta=grid2$theta[-1], dtheta=diff(grid2$theta), grid=names[2])
  df <- rbind(df1, df2)
  ggplot(df, aes(theta, dtheta, color=grid)) + 
    geom_line(size=1.2) + 
    geom_point(size=2, alpha=0.8) +
    labs(x = expression(theta), y = expression(Delta*theta)) +
    scale_color_manual(values = c("#e74c3c", "#3498db")) +
    guides(color = guide_legend(nrow = 1, byrow = TRUE)) +
    coord_cartesian(xlim = xlim_range) +
    theme_minimal(base_size = 14) +
    theme(
      legend.position = "top",
      legend.title = element_blank(),
      legend.text = element_text(size = 12),
      plot.margin = margin(10, 10, 10, 10),
      panel.grid.minor = element_blank()
    )
}

# -------------------------
# Distribution comparison plot
# -------------------------
plot_density_comparison <- function(grid1, grid2, names=c("Grid 1","Grid 2"), xlim_range=c(-4,4)) {
  # Create normal distribution curve
  x_seq <- seq(xlim_range[1], xlim_range[2], length.out = 200)
  normal_density <- dnorm(x_seq)
  normal_df <- data.frame(x = x_seq, y = normal_density, distribution = "Standard Normal")
  
  # Create weighted density representations
  gh_density <- data.frame(
    x = grid1$theta,
    y = grid1$w / diff(range(grid1$theta)) * length(grid1$theta),
    distribution = names[1]
  )
  
  unif_density <- data.frame(
    x = grid2$theta,
    y = grid2$w / diff(range(grid2$theta)) * length(grid2$theta),
    distribution = names[2]
  )
  
  # Combine all data
  plot_data <- rbind(normal_df, gh_density, unif_density)
  
  ggplot(plot_data, aes(x = x, y = y, color = distribution, linetype = distribution)) +
    geom_line(data = subset(plot_data, distribution == "Standard Normal"), linewidth = 1.5) +
    geom_point(data = subset(plot_data, distribution != "Standard Normal"), size = 2, alpha = 0.7) +
    geom_line(data = subset(plot_data, distribution != "Standard Normal"), linewidth = 1, alpha = 0.7) +
    labs(
      x = expression(theta),
      y = "Density / Scaled Weights",
      title = "Distribution Comparison: Gauss-Hermite vs Uniform vs Normal"
    ) +
    scale_color_manual(values = c(
      "Standard Normal" = "#2ecc71",
      "Gauss-Hermite (31 nodes)" = "#3498db", 
      "Uniform (31 nodes)" = "#e74c3c"
    )) +
    scale_linetype_manual(values = c(
      "Standard Normal" = "solid",
      "Gauss-Hermite (31 nodes)" = "solid",
      "Uniform (31 nodes)" = "solid"
    )) +
    coord_cartesian(xlim = xlim_range) +
    theme_minimal(base_size = 14) +
    theme(
      legend.position = "top",
      legend.title = element_blank(),
      legend.text = element_text(size = 12),
      plot.margin = margin(10, 10, 10, 10),
      panel.grid.minor = element_blank(),
      plot.title = element_text(hjust = 0.5, size = 14)
    )
}

calculate_consistent_iif <- function(theta, items) {
  sapply(1:nrow(items), function(i) {
    I_3pl(theta, items$a[i], items$b[i], items$c[i])
  })
}

calculate_consistent_tif <- function(theta, items) {
  sum(calculate_consistent_iif(theta, items), na.rm = TRUE)
}

show_error_plot <- function(title, message) {
  par(mar = c(2, 2, 6, 2))
  plot(1, 1, type = "n", xlim = c(0, 1), ylim = c(0, 1), 
       axes = FALSE, xlab = "", ylab = "", main = "")
  
  # Error title in red
  text(0.5, 0.7, labels = paste("ERROR:", title), 
       col = "#e74c3c", cex = 1.2, font = 2)
  
  # Error message
  text(0.5, 0.5, labels = "Plot could not be rendered", 
       col = "#2c3e50", cex = 1.0)
  
  # Detailed error
  text(0.5, 0.3, labels = paste("Reason:", message), 
       col = "#2c3e50", cex = 0.8)
  
  # Instruction
  text(0.5, 0.1, labels = "Check console for details", 
       col = "#7f8c8d", cex = 0.7)
  
  # Draw a border
  rect(0.05, 0.05, 0.95, 0.95, border = "#e74c3c", lwd = 2)
}

safe_execute <- function(expr, default_value, context = "Operation") {
  tryCatch(
    expr,
    error = function(e) {
      warning(paste(context, "failed:", e$message))
      return(default_value)
    }
  )
}

safe_numeric <- function(x, default = 0, min_val = -Inf, max_val = Inf) {
  if (is.null(x) || length(x) == 0 || !is.finite(x)) {
    return(default)
  }
  pmin(pmax(x, min_val), max_val)
}

safe_length <- function(x) {
  if (is.null(x)) 0 else length(x)
}

safe_nrow <- function(x) {
  if (is.null(x) || !is.data.frame(x)) 0 else nrow(x)
}

# Global (Cached in global environment for fast access)
.GH <- local({
  gh <- gauss_hermite_quadrature(31)
  
  # Normalize weights to sum to 1 for proper probability integration
  normalized_weights <- gh$weights / sum(gh$weights)
  
  list(nodes = gh$nodes, weights = normalized_weights)
})

# Helper function for shadow text
shadowtext <- function(x, y, labels, col = "black", bg = "white", 
                       theta = seq(0, 2*pi, length.out = 50), r = 0.1, ...) {
  
  # Draw background text (shadow)
  for (i in 1:length(theta)) {
    text(x + r * cos(theta[i]), y + r * sin(theta[i]), 
         labels, col = bg, ...)
  }
  
  # Draw foreground text
  text(x, y, labels, col = col, ...)
}

# For smooth curves in plots
.THETA_GRID <- seq(-4, 4, length.out = 301)

# Authoritative 3PL implementation with proper error handling
P_3pl <- function(theta, a, b, c) {
  # Clamp parameters into local copies (a >= 0.5, 0 <= c <= 0.2) without overwriting the inputs
  a_clamped <- max(a, 0.5)
  c_clamped <- max(0, min(c, 0.2))
  
  # Core 3PL calculation - VECTORIZED
  exponent <- -D.const * a_clamped * (theta - b)
  p_correct <- c_clamped + (1 - c_clamped) / (1 + exp(exponent))
  
  # Ensure numerical stability
  pmin(pmax(p_correct, 1e-10), 1 - 1e-10)
}

# Vectorized derivative of 3PL function
Pprime_3pl <- function(theta, a, b, c) {
  P <- P_3pl(theta, a, b, c)
  (D.const * a / (1 - c)) * (P - c) * (1 - P)
}

# Vectorized Fisher information function for 3PL model
I_3pl <- function(theta, a, b, c) {
  safe_execute({
    P <- P_3pl(theta, a, b, c)
    
    # Additional safety for the denominator
    denom <- max(1 - c, 1e-10)  # Avoid division by zero
    Pp <- (D.const * a / denom) * (P - c) * (1 - P)
    
    info <- (Pp^2) / pmax(P * (1 - P), 1e-10)  # Elementwise guard against division by zero
    ifelse(is.finite(info), info, 0)
    
  }, default_value = 0, context = "Information calculation")
}
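
# Illustrative sketch (optional, not executed by the app): for a single
# hypothetical item (the parameters below are made up), I_3pl() should peak
# near, and slightly above, b when c > 0.
if (FALSE) {
  theta_grid <- seq(-4, 4, by = 0.1)
  info <- I_3pl(theta_grid, a = 1.2, b = 0.5, c = 0.15)
  theta_grid[which.max(info)]   # expected a bit above b = 0.5
  plot(theta_grid, info, type = "l",
       xlab = expression(theta), ylab = "Item information")
}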

estimate_theta_eap <- function(current_items, responses, prior_mean = 0, prior_sd = 1) {
  # Input validation
  if (is.null(current_items) || nrow(current_items) == 0 || length(responses) == 0) {
    return(prior_mean)
  }
  
  n <- min(nrow(current_items), length(responses))
  if (n == 0) return(prior_mean)
  
  current_items <- current_items[1:n, , drop = FALSE]
  responses <- responses[1:n]
  
  # Use Gauss-Hermite quadrature
  nodes <- .GH$nodes
  weights <- .GH$weights
  theta_nodes <- prior_mean + prior_sd * sqrt(2) * nodes
  
  # Calculate likelihood with numerical stability
  log_likelihood <- matrix(0, nrow = length(theta_nodes), ncol = n)
  
  for (i in 1:n) {
    p_correct <- P_3pl(theta_nodes, current_items$a[i], current_items$b[i], current_items$c[i])
    
    # Work on the log scale for numerical stability
    if (responses[i] == 1) {
      log_likelihood[, i] <- log(pmax(p_correct, 1e-300))
    } else {
      log_likelihood[, i] <- log(pmax(1 - p_correct, 1e-300))
    }
  }
  
  # Sum log-likelihoods
  joint_log_likelihood <- rowSums(log_likelihood)
  
  # Convert back with log-sum-exp trick
  max_log_likelihood <- max(joint_log_likelihood)
  joint_likelihood <- exp(joint_log_likelihood - max_log_likelihood)
  
  # Calculate posterior
  posterior <- joint_likelihood * weights
  
  # Normalize posterior
  posterior <- posterior / sum(posterior)
  eap_estimate <- sum(posterior * theta_nodes)
  
  # Bound the estimate to reasonable range
  eap_estimate <- max(-4, min(4, eap_estimate))
  
  return(eap_estimate)
}
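
# Recovery sketch (optional, not executed by the app): simulate responses from
# a known theta using a handful of hypothetical items (parameters are made up)
# and confirm that the EAP estimate lands near the generating value, shrunken
# toward the N(0, 1) prior.
if (FALSE) {
  set.seed(123)
  demo_items <- data.frame(
    a = c(1.0, 1.3, 0.9, 1.5, 1.1),
    b = c(-1.0, -0.3, 0.2, 0.8, 1.4),
    c = rep(0.15, 5)
  )
  demo_theta <- 0.6
  p_demo <- mapply(P_3pl, demo_theta, demo_items$a, demo_items$b, demo_items$c)
  demo_resp <- as.integer(runif(nrow(demo_items)) < p_demo)
  estimate_theta_eap(demo_items, demo_resp)   # expected to fall near 0.6
}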

# ---- Pool & selection ------------------------------------------------
create_item_pool <- function(size = 450, 
                             a_meanlog = log(1.2),
                             a_sdlog = 0.3,
                             b_sd = 1.2,
                             c_range = c(0, 0.20)) {
  tibble::tibble(
    id    = 1:size,
    label = paste0("i", sprintf("%03d", 1:size)),
    a     = rlnorm(size, meanlog = a_meanlog, sdlog = a_sdlog),
    b     = rnorm(size, 0, b_sd),
    c     = runif(size, c_range[1], c_range[2]),
    # ADD SYMPSON-HETTER PARAMETERS
    exposure_prob = 1,  # Initial exposure probability
    admin_count = 0     # Track how many times administered
  )
}
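
# Quick-look sketch (optional, not executed by the app; assumes the tibble
# package used by create_item_pool() is available): draw a small pool and
# inspect the parameter ranges implied by the lognormal a, normal b, and
# uniform c distributions above.
if (FALSE) {
  set.seed(42)
  demo_pool <- create_item_pool(size = 50)
  summary(demo_pool[, c("a", "b", "c")])
}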

generate_stable_colors <- function(size = pool_size) {
  
  # High-contrast, stable, non-adjacent color set
  base_colors <- c(
    # Tableau 20
    "#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd",
    "#8c564b", "#e377c2", "#7f7f7f", "#bcbd22", "#17becf",
    "#393b79", "#637939", "#8c6d31", "#843c39", "#7b4173",
    
    # Okabe–Ito (colorblind safe)
    "#0072B2", "#D55E00", "#009E73", "#CC79A7", "#F0E442",
    "#56B4E9", "#E69F00"
  )
  
  # Recycle the palette to the desired size
  colors <- base_colors[(seq_len(size) %% length(base_colors)) + 1]
  
  return(colors)
}

# Enhanced item selection with all three methods
select_item_enhanced <- function(pool, method, current_theta, existing_items = NULL, 
                                 m_top = 8, tau = 0.08, use_sympson_hetter = FALSE,
                                 k_value = 0.5, r_value = 0.5) {
  
  # Exclude items that have already been administered
  available_pool <- if (!is.null(existing_items) && nrow(existing_items) > 0) {
    pool[!pool$id %in% existing_items$id, , drop = FALSE]
  } else {
    pool
  }
  
  if (nrow(available_pool) == 0) {
    return(create_fallback_item(pool))
  }
  
  # Calculate information for all available items
  info_values <- mapply(I_3pl, current_theta,
                        available_pool$a, available_pool$b, available_pool$c)
  info_values[!is.finite(info_values)] <- 0
  
  # SYMPSON-HETTER: Apply exposure control if enabled
  if (use_sympson_hetter) {
    # Apply exposure probabilities
    selection_probs <- available_pool$exposure_prob
    
    # Combine with information values
    combined_scores <- info_values * selection_probs
    
    if (method == 'random') {
      # Weight by combined scores (fall back to uniform if all scores are zero)
      if (all(combined_scores == 0)) combined_scores <- rep(1, length(combined_scores))
      idx <- sample.int(nrow(available_pool), 1, prob = combined_scores)
    } else if (method == 'randomesque') {
      # Get top M items by combined score
      top_m <- head(order(combined_scores, decreasing = TRUE), m_top)
      
      if (length(top_m) == 0) {
        # Return a uniformly random item as a single-row data frame,
        # consistent with the other return paths of this function
        return(available_pool[sample.int(nrow(available_pool), 1), , drop = FALSE])
      }
      
      # Calculate selection probabilities using softmax on combined scores
      top_scores <- combined_scores[top_m]
      if (all(top_scores == 0)) {
        probs <- rep(1/length(top_m), length(top_m))
      } else {
        scaled_scores <- top_scores / tau
        max_score <- max(scaled_scores)
        exp_scores <- exp(scaled_scores - max_score)
        probs <- exp_scores / sum(exp_scores)
      }
      
      selected_idx_in_top <- sample.int(length(top_m), 1, prob = probs)
      idx <- top_m[selected_idx_in_top]
    } else {
      # MFI: Select from items that pass exposure control
      # First filter by reasonable exposure probability
      reasonable_items <- available_pool[available_pool$exposure_prob >= r_value, , drop = FALSE]
      
      if (nrow(reasonable_items) > 0) {
        # Also filter by reasonable difficulty
        reasonable_items <- reasonable_items[abs(reasonable_items$b - current_theta) < 2, , drop = FALSE]
      }
      
      if (nrow(reasonable_items) > 0) {
        # Select from reasonable items
        info_reasonable <- mapply(I_3pl, current_theta,
                                  reasonable_items$a, reasonable_items$b, reasonable_items$c)
        info_reasonable[!is.finite(info_reasonable)] <- 0
        idx_in_reasonable <- which.max(info_reasonable)
        # Map back to original available_pool index
        idx <- which(available_pool$id == reasonable_items$id[idx_in_reasonable])
      } else {
        # Fallback to global maximum
        idx <- which.max(combined_scores)
      }
    }
  } else {
    # Original selection logic without exposure control
    if (method == 'random') {
      idx <- sample.int(nrow(available_pool), 1)
    } else if (method == 'randomesque') {
      idx <- select_randomesque(available_pool, info_values, m_top, tau)
    } else {
      # MFI: restrict to items with difficulty near the current theta estimate
      reasonable_items <- available_pool[abs(available_pool$b - current_theta) < 2, , drop = FALSE]
      if (nrow(reasonable_items) > 0) {
        available_pool <- reasonable_items
        info_values <- mapply(I_3pl, current_theta,
                              available_pool$a, available_pool$b, available_pool$c)
        info_values[!is.finite(info_values)] <- 0
      }
      idx <- which.max(info_values)
    }
  }
  
  # Return the selected item as a single-row data frame
  return(available_pool[idx, , drop = FALSE])
}
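
# Usage sketch (optional, not executed by the app): select the next item by
# maximum Fisher information from a small demo pool at a provisional theta,
# excluding one already-administered item.
if (FALSE) {
  set.seed(7)
  demo_pool <- create_item_pool(size = 20)
  administered <- demo_pool[1, , drop = FALSE]
  next_item <- select_item_enhanced(demo_pool, method = "mfi",
                                    current_theta = 0.25,
                                    existing_items = administered)
  next_item[, c("id", "a", "b", "c")]
}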

# Randomesque selection
select_randomesque <- function(pool, info_values, m_top = 8, tau = 0.08) {
  # Get top M items by information
  top_m <- head(order(info_values, decreasing = TRUE), m_top)
  
  if (length(top_m) == 0) {
    return(sample.int(nrow(pool), 1))
  }
  
  # Calculate selection probabilities using softmax
  top_info <- info_values[top_m]
  if (all(top_info == 0)) {
    # If all information is zero, use uniform distribution
    probs <- rep(1/length(top_m), length(top_m))
  } else {
    # Apply temperature scaling and softmax
    scaled_scores <- top_info / tau
    max_score <- max(scaled_scores)
    exp_scores <- exp(scaled_scores - max_score)  # Numerical stability
    probs <- exp_scores / sum(exp_scores)
  }
  
  # Sample from top M with probabilities
  selected_idx_in_top <- sample.int(length(top_m), 1, prob = probs)
  return(top_m[selected_idx_in_top])
}
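
# Illustrative sketch (optional, not executed by the app): with a low
# temperature tau the softmax concentrates selection probability on the most
# informative items; raising tau flattens the distribution. The information
# values below are arbitrary examples.
if (FALSE) {
  demo_info <- c(0.9, 0.8, 0.75, 0.4, 0.1)
  softmax_probs <- function(info, tau) {
    s <- info / tau
    exp(s - max(s)) / sum(exp(s - max(s)))
  }
  round(softmax_probs(demo_info, tau = 0.08), 3)  # sharply peaked
  round(softmax_probs(demo_info, tau = 0.50), 3)  # much flatter
}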

# Create fallback item
create_fallback_item <- function(pool) {
  new_id <- max(pool$id) + 1
  return(data.frame(
    id = new_id,
    label = paste0("i", sprintf("%03d", new_id)),
    a = 1.0,
    b = 0.0,
    c = 0.2,
    exposure_prob = 1,   # keep the schema consistent with create_item_pool()
    admin_count = 0,
    color = "#94a3b8",
    stringsAsFactors = FALSE
  ))
}

# Calculate item usage for exposure control
calculate_item_usage <- function(available_pool, existing_items) {
  if (is.null(existing_items) || nrow(existing_items) == 0) {
    return(rep(0, nrow(available_pool)))
  }
  
  # Simple usage count (in real implementation, you'd track this over sessions)
  usage <- sapply(available_pool$id, function(item_id) {
    sum(existing_items$id == item_id)
  })
  
  return(usage)
}

# Stratified item selection
select_stratified <- function(pool, current_theta, info_values, n_strata = 3) {
  # Divide items into difficulty strata
  b_breaks <- quantile(pool$b, probs = seq(0, 1, length.out = n_strata + 1))
  strata <- cut(pool$b, breaks = b_breaks, include.lowest = TRUE)
  
  # Select best item from the stratum whose difficulty midpoint is closest to current theta
  stratum_midpoints <- (head(b_breaks, -1) + tail(b_breaks, -1)) / 2
  target_stratum <- which.min(abs(stratum_midpoints - current_theta))
  stratum_items <- which(strata == levels(strata)[target_stratum])
  
  if (length(stratum_items) > 0) {
    best_in_stratum <- stratum_items[which.max(info_values[stratum_items])]
    return(best_in_stratum)
  } else {
    # Fallback to global maximum
    return(which.max(info_values))
  }
} 

generate_responses <- function(true_theta_value, items) {
  # Input validation
  if (is.null(true_theta_value) || !is.finite(true_theta_value)) {
    stop("Invalid true_theta_value in generate_responses")
  }
  
  # Accept a data frame of item parameters or a list with a, b, c elements
  if (!is.list(items) || !all(c("a", "b", "c") %in% names(items))) {
    stop("items must be a data frame or list with a, b, c elements")
  }
  
  n_items <- length(items$a)
  
  # Model-implied probability of a correct response for each item
  p_correct <- mapply(P_3pl, true_theta_value, items$a, items$b, items$c)
  
  # One Bernoulli draw per item
  responses <- as.integer(runif(n_items) < p_correct)
  
  return(responses)
}
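
# Consistency sketch (optional, not executed by the app): the observed
# proportion correct from generate_responses() should track the model-implied
# probability. Uses one hypothetical item replicated many times.
if (FALSE) {
  set.seed(99)
  demo_items <- data.frame(a = rep(1.2, 2000), b = rep(0, 2000), c = rep(0.2, 2000))
  mean(generate_responses(0, demo_items))   # should approach the value below
  P_3pl(0, 1.2, 0, 0.2)                     # = 0.2 + 0.8 * 0.5 = 0.6
}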

calculate_reliability <- function(information) {
  if (is.na(information) || information <= 0)
    return(0)
  information / (information + 1)
}
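
# Worked sketch (optional, not executed by the app): reliability is defined
# here as I / (I + 1), so it approaches 1 as test information grows; the table
# pairs it with the corresponding conditional SEM = 1 / sqrt(I).
if (FALSE) {
  info_levels <- c(4, 10, 25)
  data.frame(
    information = info_levels,
    sem         = round(1 / sqrt(info_levels), 3),
    reliability = round(sapply(info_levels, calculate_reliability), 3)
  )
}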

# FAQ Panel UI Structure
faq_panel <- function() {
  div(
    id = "faq-panel",
    style = "margin: 20px 0;",
    
    # FAQ Item 1 - SEM Increase
    faq_item(
      "faq_1",
      "Can SEM increase when a new item is added?",
      uiOutput("faq_content_1"),
      "#4A90E2"
    ),
    
    # FAQ Item 2 - Responses and Information
    faq_item(
      "faq_2",
      "Do examinee responses factor into information computation?",
      uiOutput("faq_content_2"),
      "#87CEEB"
    ),
    
    # FAQ Item 3 - EAP Stabilization
    faq_item(
      "faq_3",
      "Why does EAP θ̂ estimation stabilize around true theta even with random item selection?",
      uiOutput("faq_content_3"),
      "#9370DB"
    ),
    
    # FAQ Item 4 - SEM vs Bias
    faq_item(
      "faq_4",
      "What is the difference between SEM (precision) and bias (accuracy)?",
      uiOutput("faq_content_4"),
      "#32CD32"
    ),
    
    # FAQ Item 5 - Bias Persistence
    faq_item(
      "faq_5",
      "Why can bias persist even with long tests and low SEM?",
      uiOutput("faq_content_5"),
      "#FFA500"
    ),
    
    # FAQ Item 6 - Prior Effects
    faq_item(
      "faq_6",
      "How does the prior distribution affect EAP θ̂ estimation?",
      uiOutput("faq_content_6"),
      "#FF69B4"
    ),
    
    # FAQ Item 7 - Pool Size
    faq_item(
      "faq_7",
      "Does increasing the item pool size reduce bias?",
      uiOutput("faq_content_7"),
      "#4682B4"
    ),
    
    # FAQ Item 8 - Random vs MFI
    faq_item(
      "faq_8",
      "Why does Random (Fixed-form) stabilize while MFI (CAT) fluctuates?",
      uiOutput("faq_content_8"),
      "#4A90E2"
    )
  )
}

# ===========================================
# ---- UI DEFINITION SECTION (FULL) ----
# ===========================================

ui <- fluidPage(
  shinyjs::useShinyjs(),
  withMathJax(),
  
  tags$style(
    HTML(
      "
:root {
  /* Colors */
  --primary-blue: #3498db;
  --primary-red: #e74c3c;
  --primary-green: #2ecc71;
  --primary-orange: #f39c12;
  --primary-purple: #9370DB;

  /* Semantic Colors */
  --success: #10b981;
  --warning: #f59e0b;
  --error: #ef4444;
  --info: #3b82f6;

  /* Grayscale */
  --gray-50: #f8fafc;
  --gray-100: #f1f5f9;
  --gray-200: #e2e8f0;
  --gray-300: #cbd5e1;
  --gray-400: #94a3b8;
  --gray-500: #64748b;
  --gray-600: #475569;
  --gray-700: #334155;
  --gray-800: #1e293b;
  --gray-900: #0f172a;

  /* Spacing */
  --space-1: 4px;
  --space-2: 8px;
  --space-3: 12px;
  --space-4: 16px;
  --space-5: 20px;
  --space-6: 24px;

  /* Typography */
  --text-xs: 10px;
  --text-sm: 12px;
  --text-base: 14px;
  --text-lg: 16px;
  --text-xl: 18px;

  /* Borders */
  --radius-sm: 4px;
  --radius-base: 6px;
  --radius-lg: 8px;
  --radius-xl: 12px;

  /* Shadows */
  --shadow-sm: 0 1px 3px rgba(0,0,0,0.1);
  --shadow-base: 0 2px 4px rgba(0,0,0,0.1);
  --shadow-md: 0 4px 8px rgba(0,0,0,0.1);
}

/* ===== BASE COMPONENTS ===== */
.compact-header {
  background: white;
  border-bottom: 1px solid var(--gray-200);
  padding: var(--space-5) 0;
  margin-bottom: 30px;
  text-align: center;
}

.header-title {
  font-size: 24px;
  font-weight: 600;
  color: var(--gray-800);
  margin: 0 0 5px 0;
}

.header-subtitle {
  font-size: var(--text-base);
  color: var(--gray-500);
  margin: 0 0 10px 0;
}

.header-meta {
  font-size: var(--text-sm);
  color: var(--gray-400);
}

.meta-item + .meta-divider::before {
  content: \"•\";
  margin: 0 var(--space-2);
}

/* ===== LAYOUT COMPONENTS ===== */
.sidebar-panel {
  font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
}

.sb-section {
  background: white;
  border-radius: var(--radius-lg);
  padding: var(--space-4);
  margin-bottom: var(--space-4);
  border: 1px solid var(--gray-200);
  border-top: 3px solid var(--primary-blue);
}

.sb-section + .sb-section {
  margin-top: var(--space-2);
}

.sb-head {
  font-weight: 600;
  color: var(--gray-800);
  margin-bottom: var(--space-3);
  font-size: var(--text-base);
  display: flex;
  align-items: center;
  border-bottom: 1px solid var(--gray-100);
  padding-bottom: var(--space-2);
}

.sb-icon {
  margin-right: var(--space-2);
  opacity: 0.7;
}

.sb-note {
  font-size: 0.75em; /* 75% of parent size */
  color: var(--gray-500);
  margin-bottom: var(--space-3);
  background: var(--gray-50);
  padding: var(--space-2);
  border-radius: var(--radius-sm);
  border-left: 3px solid var(--gray-400);
}

/* ===== BUTTONS & CONTROLS ===== */
.btn-sm {
  padding: 4px 12px !important;
  font-size: var(--text-sm) !important;
  margin-left: var(--space-2);
}

.button-group {
  display: flex;
  gap: var(--space-2);
  justify-content: space-between;
  margin: var(--space-3) 0;
}

.slider-group {
  margin-bottom: 15px;
  padding-bottom: 15px;
  border-bottom: 1px solid #e5e7eb;
}

.slider-group:last-child {
  border-bottom: none;
  padding-bottom: 0;
}

.slider-group h5 {
  margin: 0 0 var(--space-2) 0;
  font-size: var(--text-sm);
  font-weight: 600;
  color: var(--gray-700);
}

/* ===== METRICS & CARDS ===== */
.metrics-container {
  display: flex;
  gap: var(--space-4);
}

.metrics-grid-expanded {
  display: flex;
  flex-direction: column;
  gap: var(--space-2);
  margin-top: var(--space-4);
}

.metric-row {
  display: flex;
  gap: var(--space-2);
}

.metric-cell {
  flex: 1;
  background: var(--gray-50);
  border-radius: 10px;
  padding: var(--space-2) 6px;
  border: 2px solid transparent;
  text-align: center;
  box-shadow: var(--shadow-sm);
  transition: all 0.2s ease;
}

.metric-cell:hover {
  transform: translateY(-1px);
  box-shadow: var(--shadow-md);
}

/* Metric color variants */
.metric-cell.true-theta { border-color: #1d4ed8; background: linear-gradient(135deg, #eff6ff, #dbeafe); }
.metric-cell.eap-theta { border-color: var(--success); background: linear-gradient(135deg, #ecfdf5, #d1fae5); }
.metric-cell.bias { border-color: var(--error); background: linear-gradient(135deg, #fef2f2, #fee2e2); }
.metric-cell.pool-size { border-color: #8b5cf6; background: linear-gradient(135deg, #faf5ff, #ede9fe); }
.metric-cell.items-used { border-color: var(--warning); background: linear-gradient(135deg, #fffbeb, #fef3c7); }
.metric-cell.reliability { border-color: #06b6d4; background: linear-gradient(135deg, #ecfeff, #cffafe); }

.metric-label {
  font-size: var(--text-xs);
  color: var(--gray-500);
  font-weight: 600;
  margin-bottom: var(--space-1);
  line-height: 1.2;
  text-transform: uppercase;
  letter-spacing: 0.5px;
}

.metric-value {
  font-size: var(--text-base);
  font-weight: 700;
  line-height: 1.2;
}

/* Color-coded values */
.true-theta .metric-value { color: #1d4ed8; }
.eap-theta .metric-value { color: #047857; }
.bias .metric-value { color: #dc2626; }
.pool-size .metric-value { color: #7c3aed; }
.items-used .metric-value { color: #d97706; }
.reliability .metric-value { color: #0d9488; }

/* ===== STATUS PILLS ===== */
#status-pills-container {
  background: linear-gradient(135deg, var(--gray-50) 0%, var(--gray-100) 100%);
  border: 1px solid var(--gray-200);
  border-radius: var(--radius-lg);
  padding: var(--space-3);
  margin-top: var(--space-4);
}

#status-pills-container .status-pill {
  display: flex !important;
  flex-direction: column !important;
  align-items: center !important;
  gap: 2px !important;
  background: white !important;
  border-radius: 12px !important;
  padding: 6px 10px !important;
  border: 1px solid var(--gray-200) !important;
  font-size: var(--text-xs) !important;
  font-weight: 600 !important;
  white-space: nowrap !important;
  box-shadow: var(--shadow-sm) !important;
  min-width: 55px !important;
}

#status-pills-container .pill-label {
  color: var(--gray-500) !important;
  font-weight: 600 !important;
  font-size: 9px !important;
  text-transform: uppercase !important;
  letter-spacing: 0.3px !important;
}

#status-pills-container .pill-value {
  color: var(--gray-800) !important;
  font-weight: 700 !important;
  font-size: 11px !important;
}

/* Status pill color variants */
#status-pills-container .true-theta-pill { border-top: 3px solid #1d4ed8 !important; }
#status-pills-container .eap-theta-pill { border-top: 3px solid #047857 !important; }
#status-pills-container .bias-pill { border-top: 3px solid #dc2626 !important; }
#status-pills-container .items-pill { border-top: 3px solid #d97706 !important; }
#status-pills-container .reliability-pill { border-top: 3px solid #0d9488 !important; }
#status-pills-container .sem-pill { border-top: 3px solid #7c3aed !important; }

/* ===== COMPACT METRICS ===== */
.metrics-grid-compact {
  display: flex;
  gap: var(--space-2);
  justify-content: center;
  flex-wrap: nowrap;
}

.metric-row-compact {
  display: flex;
  gap: var(--space-2);
}

.metric-cell-compact {
  background: white;
  border-radius: var(--radius-base);
  padding: 4px 6px;
  border: 1px solid var(--gray-200);
  text-align: center;
  min-width: 60px;
}

.metric-label-compact {
  font-size: 9px;
  color: var(--gray-500);
  font-weight: 600;
  margin-bottom: 2px;
  line-height: 1;
}

.metric-value-compact {
  font-size: 11px;
  font-weight: 700;
  line-height: 1;
}

/* ===== MODALS ===== */
.modal-dialog {
  max-width: 700px;
}

.modal-body {
  padding: var(--space-4);
}

.modal-header {
  padding: var(--space-3) var(--space-4);
  background: linear-gradient(135deg, var(--primary-blue), #1d4ed8);
  color: white;
  border-bottom: none;
}

.modal-title {
  font-size: var(--text-lg) !important;
  font-weight: 600;
}

.modal-footer {
  padding: var(--space-4);
  border-top: 1px solid var(--gray-200);
  text-align: right;
}

.btn-primary {
  background: var(--primary-blue) !important;
  border: none !important;
  padding: 8px 20px !important;
  font-weight: 600 !important;
}

/* ===== CONTENT SECTIONS ===== */
.concepts-single-column {
  max-width: 800px;
  margin: 0 auto;
}

.concept-section {
  background: white;
  border: 1px solid var(--gray-200);
  border-radius: var(--radius-lg);
  padding: 25px;
  margin-bottom: var(--space-5);
}

.section-header {
  border-bottom: 2px solid var(--primary-blue);
  padding-bottom: var(--space-3);
  margin-bottom: var(--space-5);
}

.section-header h2 {
  margin: 0;
  color: var(--gray-800);
  font-size: var(--text-lg);
}

.section-content p {
  line-height: 1.6;
  margin-bottom: 15px;
}

.section-content h4 {
  color: var(--gray-800);
  margin: 25px 0 15px 0;
  font-size: var(--text-base);
  border-bottom: 1px solid var(--gray-200);
  padding-bottom: 5px;
}

/* ===== EDUCATIONAL ELEMENTS ===== */
.key-eq {
  background: var(--gray-50);
  padding: var(--space-4);
  margin: 15px 0;
  border-left: 4px solid var(--primary-blue);
  border-radius: var(--radius-sm);
  text-align: center;
  overflow-x: auto;
}

.eap-eq {
  background: #f0f8ff;
  border-left-color: #2980b9;
}

.edu-note {
  background: #e8f4fd;
  border: 1px solid var(--primary-blue);
  border-radius: var(--radius-base);
  padding: var(--space-4);
  margin: var(--space-5) 0;
}

.edu-note ul,
.parameter-item ul,
.insight-item ul,
.strategy-details ul {
  margin: 10px 0;
  padding-left: 20px;
}

.edu-note li,
.parameter-item li,
.insight-item li,
.strategy-details li {
  margin-bottom: var(--space-2);
  line-height: 1.5;
  color: #5d6d7e;
}

/* ===== TABLES ===== */
.precision-table {
  width: 100%;
  border-collapse: collapse;
  margin: 15px 0;
  font-size: var(--text-base);
}

.precision-table th {
  background: var(--gray-700);
  color: white;
  padding: 10px;
  text-align: left;
  font-weight: 600;
}

.precision-table td {
  padding: 10px;
  border-bottom: 1px solid #ecf0f1;
}

.current-standard-row {
  background: #fff3cd;
  font-weight: bold;
}

/* ===== WORKFLOW ===== */
.workflow-simple {
  margin: var(--space-5) 0;
}

.workflow-step {
  display: flex;
  align-items: flex-start;
  margin-bottom: 15px;
  padding: var(--space-4);
  background: var(--gray-50);
  border-radius: var(--radius-base);
  border: 1px solid var(--gray-200);
}

.step-number {
  background: var(--primary-blue);
  color: white;
  width: 30px;
  height: 30px;
  border-radius: 50%;
  display: flex;
  align-items: center;
  justify-content: center;
  font-weight: bold;
  margin-right: 15px;
  flex-shrink: 0;
  font-size: var(--text-base);
}

.step-content p {
  margin: 0;
}

/* ===== STRATEGY COMPARISON ===== */
.strategy-comparison {
  display: grid;
  grid-template-columns: 1fr;
  gap: 25px;
  margin: 25px 0;
}

.strategy-card {
  padding: 25px;
  border-radius: var(--radius-lg);
  border: 2px solid;
  background: white;
}

.strategy-header {
  display: flex;
  justify-content: space-between;
  align-items: center;
  margin-bottom: 15px;
  padding-bottom: 15px;
  border-bottom: 1px solid var(--gray-200);
}

.strategy-header h3 {
  margin: 0;
  font-size: var(--text-lg);
  color: var(--gray-800);
}

.strategy-description {
  margin-bottom: var(--space-5);
  padding: var(--space-4);
  background: white;
  border-radius: var(--radius-base);
  border-left: 4px solid var(--primary-blue);
}

.strategy-stats {
  display: flex;
  gap: var(--space-5);
  margin: var(--space-5) 0;
  padding: var(--space-5);
  background: white;
  border-radius: var(--radius-base);
  border: 1px solid var(--gray-200);
}

.stat {
  text-align: center;
  flex: 1;
}

.stat-value {
  font-size: 20px;
  font-weight: bold;
  color: var(--gray-800);
  margin-bottom: 5px;
}

.mfi-strategy .stat-value {
  color: var(--success);
}

.random-strategy .stat-value {
  color: var(--gray-400);
}

.stat-label {
  font-size: var(--text-sm);
  color: var(--gray-500);
  text-transform: uppercase;
  letter-spacing: 0.5px;
  font-weight: 600;
}

.strategy-details {
  margin-top: 15px;
}

/* ===== CODE BLOCKS ===== */
.code-block {
  background: linear-gradient(135deg, #1e1e1e 0%, #2d2d2d 100%);
  border: 1px solid #3e3e3e;
  border-radius: var(--radius-base);
  padding: var(--space-4);
  margin: var(--space-3) 0;
  position: relative;
  box-shadow: 0 4px 12px rgba(0, 0, 0, 0.3);
}

.code-block::before {
  content: '';
  position: absolute;
  top: 0;
  left: 0;
  right: 0;
  height: 2px;
  background: linear-gradient(90deg, #569cd6, #ce9178, #b5cea8, #dcdcaa);
  border-radius: var(--radius-base) var(--radius-base) 0 0;
}

.code-block pre {
  margin: 0;
  white-space: pre-wrap;
  word-wrap: break-word;
  font-family: 'Monaco', 'Menlo', 'Ubuntu Mono', monospace;
  font-size: var(--text-sm);
  line-height: 1.5;
  color: #d4d4d4;
}

/* Syntax Highlighting */
.code-block .keyword { color: #569cd6; font-weight: 600; }
.code-block .function { color: #dcdcaa; }
.code-block .string { color: #ce9178; }
.code-block .comment { color: #6a9955; font-style: italic; }
.code-block .number { color: #b5cea8; }
.code-block .operator { color: #d4d4d4; }
.code-block .type { color: #4ec9b0; }
.code-block .variable { color: #9cdcfe; }
.code-block .constant { color: #4fc1ff; }

/* Scrollbar Styling */
.code-block::-webkit-scrollbar {
  height: 10px;
}

.code-block::-webkit-scrollbar-track {
  background: #2d2d2d;
  border-radius: var(--radius-sm);
  margin: 0 var(--space-2);
}

.code-block::-webkit-scrollbar-thumb {
  background: linear-gradient(135deg, #555 0%, #666 100%);
  border-radius: var(--radius-sm);
  border: 2px solid #2d2d2d;
}

.code-block::-webkit-scrollbar-thumb:hover {
  background: linear-gradient(135deg, #666 0%, #777 100%);
}

/* Code Block Header */
.code-block-header {
  display: flex;
  justify-content: space-between;
  align-items: center;
  padding: var(--space-2) var(--space-3);
  background: #252526;
  border-bottom: 1px solid #3e3e3e;
  border-radius: var(--radius-base) var(--radius-base) 0 0;
  font-family: system-ui, -apple-system, sans-serif;
  font-size: var(--text-xs);
  color: #969696;
}

.code-block-title {
  font-weight: 600;
  color: #d4d4d4;
}

.code-block-copy {
  background: #424242;
  border: 1px solid #5a5a5a;
  border-radius: var(--radius-sm);
  padding: 2px 8px;
  font-size: var(--text-xs);
  color: #d4d4d4;
  cursor: pointer;
  transition: all 0.2s ease;
}

.code-block-copy:hover {
  background: #4a4a4a;
  border-color: #6a6a6a;
}

/* ===== ALGORITHM SECTIONS ===== */
.algorithm-section,
.stopping-section {
  background: linear-gradient(135deg, var(--gray-50) 0%, var(--gray-100) 100%);
  border: 1px solid var(--gray-200);
  border-radius: var(--radius-base);
  padding: var(--space-4);
  margin-bottom: var(--space-3);
  position: relative;
  box-shadow: 0 2px 8px rgba(0, 0, 0, 0.05);
}

.algorithm-section::before,
.stopping-section::before {
  content: '';
  position: absolute;
  top: 0;
  left: 0;
  bottom: 0;
  width: 4px;
  background: linear-gradient(180deg, var(--primary-blue) 0%, var(--primary-purple) 100%);
  border-radius: var(--radius-base) 0 0 var(--radius-base);
}

.algorithm-section .sb-head,
.stopping-section .sb-head {
  font-size: var(--text-sm);
  font-weight: 700;
  margin-bottom: var(--space-2);
  color: var(--gray-900);
  display: flex;
  align-items: center;
  gap: var(--space-2);
}

.algorithm-section .sb-head::before,
.stopping-section .sb-head::before {
  font-size: var(--text-xs);
  color: var(--primary-blue);
}

.control-section {
  padding-bottom: 2px !important;
}

.control-group {
  margin-bottom: 2px;
}

.control-group h5 {
  font-size: 11px;
  font-weight: 600;
  color: var(--gray-800);
  margin-bottom: 4px;
}

/* ===== FORM CONTROLS ===== */
.numeric-input {
  margin-bottom: 2px;
}

.form-group {
  margin-bottom: 4px;
}

.form-control {
  padding: 2px 6px;
  font-size: 11px;
  height: 26px;
}

.selectize-control {
  margin-bottom: 2px;
}

.selectize-input {
  padding: 2px 6px !important;
  font-size: 11px !important;
  min-height: 26px !important;
}

/* ===== CONCLUSION HIGHLIGHT ===== */
.conclusion-highlight {
  background: #fff3cd;
  border: 1px solid #ffc107;
  border-radius: var(--radius-base);
  padding: var(--space-5);
  margin-top: var(--space-5);
  text-align: center;
}

.conclusion-highlight p {
  margin: 0;
  font-style: italic;
}

/* ===== RESPONSIVE DESIGN ===== */
@media (min-width: 768px) {
  .strategy-comparison {
    grid-template-columns: 1fr 1fr;
  }

  .parameter-details, .mathematical-insights {
    margin-left: 10px;
    margin-right: 10px;
  }

  .slider-columns {
    display: grid;
    grid-template-columns: 1fr 1fr;
    gap: var(--space-3);
  }
}

@media (max-width: 768px) {
  .code-tab-content {
    padding: var(--space-3);
  }

  .algorithm-card {
    margin-bottom: 16px;
    border-radius: var(--radius-lg);
  }

  .algorithm-header {
    padding: var(--space-3) var(--space-4);
    font-size: var(--text-base);
  }

  .code-block {
    padding: var(--space-4);
    font-size: var(--text-sm);
  }

  .slider-columns,
  .metrics-container,
  .strategy-stats {
    flex-direction: column;
    gap: var(--space-2);
  }

  .strategy-header {
    flex-direction: column;
    align-items: flex-start;
    gap: var(--space-3);
  }

  .concept-section {
    padding: var(--space-5) 15px;
  }

  .parameter-item,
  .insight-item {
    padding: var(--space-3);
  }

  .metrics-two-columns {
    grid-template-columns: 1fr;
  }
}

/* ===== ANIMATIONS ===== */
@keyframes electric-pulse {
  0% {
    border-color: #00ffff;
    box-shadow: 0 0 10px #00ffff, 0 0 20px #00ffff;
    background: linear-gradient(135deg, #000428, #004e92);
  }
  50% {
    border-color: #ff00ff;
    box-shadow: 0 0 20px #ff00ff, 0 0 40px #ff00ff;
    background: linear-gradient(135deg, #004e92, #8a2be2);
  }
  100% {
    border-color: #00ffff;
    box-shadow: 0 0 10px #00ffff, 0 0 20px #00ffff;
    background: linear-gradient(135deg, #000428, #004e92);
  }
}

.electric-pulse {
  animation: electric-pulse 1.5s ease-in-out infinite;
}
")
  ),
  
  
  # ---- Title Row ----
  fluidRow(column(12, div(class = "app-title", uiOutput("dynamic_title")))), 
  
  # ---- Layout ----
  div(
    id = "layout-wrapper",
    
    # Streamlined Sidebar
    sidebarPanel(
      width = 4,
      class = "sidebar-panel",
      
      tabsetPanel(
        id = "sidebar_tabs",
        type = "tabs",
        
        # Tab 1: Test Control (Streamlined)
        tabPanel(
          title = tagList(icon("play-circle"), "Test Control"),
          
          section_component(
            title = "Test Administration",
            icon_name = "cogs",
            class = "control-section",
            
            # Selection Strategy
            section_component(
              title = "Selection Strategy",
              icon_name = "sliders-h", 
              class = "algorithm-section",
              
              selectInput(
                "sel_method",
                "",
                choices = c(
                  "Maximum Fisher Information" = "mfi",
                  "Randomesque" = "randomesque", 
                  "Fixed-form (Random)" = "random"
                ),
                selected = "mfi",
                width = "100%"
              ),
              
              # Randomesque Settings
              conditionalPanel(
                condition = "input.sel_method == 'randomesque'",
                fluidRow(
                  column(6, numericInput("m_top", "Top M Items", value = 8, min = 2, max = 20, width = "100%")),
                  column(6, numericInput("tau", "Temperature", value = 0.08, min = 0.01, max = 0.5, step = 0.01, width = "100%"))
                )
              ),
              
              # Sympson-Hetter Settings
              conditionalPanel(
                condition = "input.sel_method == 'mfi' || input.sel_method == 'randomesque'",
                checkboxInput("use_sympson_hetter", "Sympson-Hetter Exposure", value = FALSE),
                
                conditionalPanel(
                  condition = "input.use_sympson_hetter == true && (input.sel_method == 'mfi' || input.sel_method == 'randomesque')",
                  fluidRow(
                    column(6, numericInput("k_value", "K Value", value = 0.5, min = 0.1, max = 1.0, step = 0.1, width = "100%")),
                    column(6, numericInput("r_value", "R Value", value = 0.5, min = 0.1, max = 1.0, step = 0.1, width = "100%"))
                  )
                )
              )
            ),
            
            # Stopping Rules
            section_component(
              title = "Stopping Rules", 
              icon_name = "flag",
              class = "stopping-section",
              
              fluidRow(
                column(6, numericInput("sem_target", "SEM Target", value = 0.300, min = 0.15, max = 0.50, step = 0.05, width = "100%")),
                column(6, numericInput("max_items", "Max Items", value = 30, min = 10, max = 50, step = 5, width = "100%"))
              ),
              uiOutput("sem_progress_bar")
            ),
            
            # Action Buttons
            div(class = "button-group-expanded",
                actionButton("add_item", "Item", class = "btn-primary", icon("plus-circle"), width = "48%"),
                actionButton("reset_all", "Test", class = "btn-danger", icon = icon("refresh"), width = "48%")
            )
          )
        ),
        
        # Tab 2: Display Controls (Streamlined)
        tabPanel(
          title = tagList(icon("sliders-h"), "Display"),
          
          section_component(
            title = "Visualization Controls",
            icon_name = "palette",
            class = "display-section",
            
            div(class = "sb-note", "Adjust layer opacity and display settings. Use opacity to focus on specific curves: decrease to fade into background, set to 0 to remove from view."),
            div(class = "sb-note", "Annotation Text Size controls the size of value labels and probe indicators"),
            
            # Single column layout
            slider_component("iif_layer_alpha", "Item Information", 0.20),
            slider_component("icc_layer_alpha", "ICC Curves", 0.20),
            slider_component("tif_alpha", "Test Information", 0.90),
            slider_component("sem_alpha", "Standard Error", 0.80),
            slider_component("annotation_size", "Annotation Text Size", 1.0, 0.7, 1.7, 0.1)
          )
        )     
      )
    ),
    
    # Streamlined Main Panel
    mainPanel(
      width = 8,
      class = "mainPanel",
      tabsetPanel(
        id = "main_tabs",
        type = "tabs",
        
        # Visualization Tab
        tabPanel(
          title = tagList(icon("chart-line"), "Visualization"),
          div(class = "plot-wrap", plotOutput("plot_all", height = "450px")),
          uiOutput("compact_status_pills")
        ),
        
        # Reference Tab
        tabPanel(
          title = tagList(icon("book"), "Reference"),
          div(class = "reference-content",
              tabsetPanel(
                id = "reference_tabs",
                type = "tabs",
                
                tabPanel(
                  title = tagList(icon("info-circle"), "Information"),
                  uiOutput("reference_content")
                ),
                
                # FAQ Tab
                tabPanel(
                  title = tagList(icon("question-circle"), "FAQ"),
                  div(class = "faq-container", style = "padding: 20px;", faq_panel())
                ),
                
                # Code Tab
                tabPanel(
                  title = tagList(icon("code"), "Code"), 
                  uiOutput("code_content")
                ),
                
                # Pool Metrics Tab
                tabPanel(
                  title = tagList(icon("database"), "Pool Metrics"),
                  fluidRow(
                    column(12,
                           section_component(
                             title = "Parameter Distributions",
                             class = "edu-card",
                             plotOutput("pool_plot_a", height = "400px"),
                             plotOutput("pool_plot_b", height = "400px"), 
                             plotOutput("pool_plot_c", height = "400px")
                           )
                    )
                  ),
                  fluidRow(
                    column(12,
                           section_component(
                             title = "Information & TIF Analysis", 
                             class = "edu-card",
                             plotOutput("pool_plot_info0", height = "400px"),
                             plotOutput("pool_plot_tif", height = "400px"),
                             plotOutput("pool_plot_sem", height = "400px")
                           )
                    )
                  ),
                  fluidRow(
                    column(12,
                           section_component(
                             title = "Pool Quality Metrics",
                             class = "edu-card", 
                             uiOutput("pool_quality_metrics")
                           )
                    )
                  )
                ),
                
                # Test Diagnostics Tab
                tabPanel(
                  title = tagList(icon("chart-bar"), "Test Diagnostics"),
                  div(class = "reference-content", style = "padding: 16px;",
                      fluidRow(
                        column(12, section_component(
                          title = "Theta Estimation Progression with 95% CI",
                          class = "edu-card",
                          plotOutput("theta_progression_plot", height = "400px")
                        ))
                      ),
                      fluidRow(
                        column(12, section_component(
                          title = "Item-Person Targeting Efficiency", 
                          class = "edu-card",
                          plotOutput("targeting_efficiency_plot", height = "400px")
                        ))
                      ),
                      fluidRow(
                        column(12, section_component(
                          title = "Measurement Precision and Reliability",
                          class = "edu-card",
                          plotOutput("sem_progression_plot", height = "400px")  
                        ))
                      ),
                      fluidRow(
                        column(12, section_component(
                          title = "Information Growth & Efficiency",
                          class = "edu-card",
                          plotOutput("information_growth_plot", height = "400px")
                        ))
                      ),
                      fluidRow(
                        column(12, section_component(
                          title = "Bias & Standardized Bias Analysis",
                          class = "edu-card", 
                          plotOutput("bias_analysis_plot", height = "400px")
                        ))
                      ),
                      fluidRow(
                        column(12, section_component(
                          title = "Item Selection Pattern",
                          class = "edu-card",
                          plotOutput("item_selection_plot", height = "400px")
                        ))
                      ),
                      fluidRow(
                        column(12, section_component(
                          title = "Current Test Performance Summary",
                          class = "edu-card",
                          uiOutput("current_test_metrics")
                        ))
                      )
                  )
                )
              )
          )
        )
      )
    ),
    
    # Summary Table Container
    uiOutput("summary_table_container")
  ) # Close layout-wrapper div
) # Close fluidPage / ui definition

# ===========================================
# ---- SERVER LOGIC SECTION ----
# ===========================================

server <- function(input, output, session) {
  
  # --- State ----------------------------------------------------------
  initial_modal_completed <- reactiveVal(FALSE)
  bias_alert_shown <- reactiveVal(FALSE)
  bias_history <- reactiveVal(numeric(0))
  items        <- reactiveVal()
  responses    <- reactiveVal()
  target_met   <- reactiveVal(FALSE)
  final_sem_value <- reactiveVal(NA_real_)
  true_theta   <- reactiveVal(rnorm(1, 0, 1))
  item_pool    <- reactiveVal()
  color_palette <- reactiveVal()
  app_initialized <- reactiveVal(FALSE)
  
  current_sem <- reactive({
    cm <- cumulative_metrics_cache()
    if (is.null(cm) || length(cm) == 0) return(NA_real_)
    tail(cm, 1)[[1]]$sem
  })
  
  current_items <- reactive({
    ib <- items()
    if (is.null(ib)) return(0)
    nrow(ib)
  })
  
  eap_score <- reactive({
    eap_theta()
  })
  
  verify_tif_curve <- function() {
    ib <- items()
    curves <- per_item_curves()
    
    if (is.null(ib) || nrow(ib) == 0 || is.null(curves)) return()
    
    # Method 1: Direct calculation at each theta point
    theta_grid <- .THETA_GRID
    manual_tif <- numeric(length(theta_grid))
    
    for(i in 1:length(theta_grid)) {
      total_info <- 0
      for(j in 1:nrow(ib)) {
        item_info <- I_3pl(theta_grid[i], ib$a[j], ib$b[j], ib$c[j])
        if(is.finite(item_info)) {
          total_info <- total_info + item_info
        }
      }
      manual_tif[i] <- total_info
    }
    
    # Method 2: Current stored TIF curve
    stored_tif <- tif_curve()
    
    # Check differences
    if (length(manual_tif) == length(stored_tif)) {
      max_diff <- max(abs(manual_tif - stored_tif), na.rm = TRUE)
      mean_diff <- mean(abs(manual_tif - stored_tif), na.rm = TRUE)
    }
    
    # Also verify TIF at the current theta estimate
    th0 <- eap_theta()
    curve_tif_at_theta <- approx(theta_grid, stored_tif, xout = th0, rule = 2)$y
    
    # Get calculated TIF from cache to ensure consistency with table
    cm <- cumulative_metrics_cache()
    if (!is.null(cm) && length(cm) > 0) {
      current_metrics <- cm[[length(cm)]]
      calculated_tif_at_theta <- current_metrics$tif
    } else {
      calculated_tif_at_theta <- calculate_tif_at_theta(th0, ib)
    }
  }
  
  # Consistent IIF calculation at specific theta
  calculate_iif_at_theta <- function(theta, items_df) {
    sapply(seq_len(nrow(items_df)), function(i) {
      I_3pl(theta, items_df$a[i], items_df$b[i], items_df$c[i])
    })
  }
  
  # Consistent TIF calculation at specific theta  
  calculate_tif_at_theta <- function(theta, items_df) {
    iif_vals <- calculate_iif_at_theta(theta, items_df)
    sum(iif_vals, na.rm = TRUE)
  }
  
  # Consistent SEM calculation at specific theta
  calculate_sem_at_theta <- function(theta, items_df) {
    tif_val <- calculate_tif_at_theta(theta, items_df)
    if (tif_val > 0) 1 / sqrt(tif_val) else NA_real_
  }
  
  # ===========================================
  # MODAL HELPER FUNCTIONS
  # ===========================================
  
  create_modal <- function(title, content, footer = NULL, size = "m", easyClose = FALSE, fade = TRUE) {
    modalDialog(
      content,   # body content is passed unnamed; modalDialog() has no 'content' argument
      title = title,
      footer = footer,
      size = size,
      easyClose = easyClose,
      fade = fade
    )
  }
  
  # ===========================================
  # SUCCESS MODAL FUNCTION
  # ===========================================
  
  create_success_modal <- function(title, message, metrics_content = NULL, action_buttons = NULL) {
    # Build the content properly
    if (!is.null(metrics_content)) {
      base_content <- tagList(
        div(style = "text-align: center; margin-bottom: 20px;",
            h4(title, style = "color: #059669; margin-bottom: 10px;"),
            p(style = "font-size: 18px; font-weight: bold;", message)
        ),
        metrics_content  # Include metrics_content directly
      )
    } else {
      base_content <- tagList(
        div(style = "text-align: center; margin-bottom: 20px;",
            h4(title, style = "color: #059669; margin-bottom: 10px;"),
            p(style = "font-size: 18px; font-weight: bold;", message)
        )
      )
    }
    
    # Footer buttons
    footer <- if (!is.null(action_buttons)) {
      action_buttons
    } else {
      tagList(
        actionButton("modal_continue", "Continue", class = "btn-success"),
        modalButton("Close")
      )
    }
    
    create_modal(
      title = tagList(icon("check-circle"), title),
      content = base_content,
      footer = footer
    )
  }
  
  # Caches
  per_item_curves <- reactiveVal(list())
  tif_curve        <- reactiveVal(rep(0, length(.THETA_GRID)))
  cumulative_metrics_cache <- reactiveVal(NULL)
  
  # === MODAL HANDLING ===
  initial_modal_shown <- reactiveVal(FALSE)
  
  # Helper function to show initial modal
  show_initial_modal <- function() {
    req(items())
    first_item <- items()[1, ]
    true_theta_val <- true_theta()
    eap_theta_val <- eap_theta()
    
    # Calculate probability of correct response
    p_correct <- P_3pl(true_theta_val, first_item$a, first_item$b, first_item$c)
    
    showModal(modalDialog(
      title = NULL,
      easyClose = FALSE,
      fade = TRUE,
      footer = actionButton(
        "begin_test",
        label = tagList(
          icon("play-circle", style = "margin-right: 6px;"),
          "BEGIN ASSESSMENT"
        ),
        style = "background: linear-gradient(135deg, #0066cc, #004499); color: white; font-weight: bold; padding: 6px 12px; border-radius: 6px; border: none; box-shadow: 0 2px 4px rgba(0, 102, 204, 0.3);"
      ),
      size = "l",  # Changed back to large since xl might not be working
      style = "max-width: 800px;",  # ADD CUSTOM WIDTH
      tagList(
        div(
          style = "text-align: center; padding: 0;",
          
          # Header section
          div(
            style = "background: linear-gradient(135deg, #0066cc, #004499); color: white; padding: 12px; border-radius: 6px; margin-bottom: 10px; text-align: center;",
            h4(
              style = "margin: 0; font-weight: 700; font-size: 16px;",
              "Test Simulation Initialized"
            )
          ),
          
          # Test Parameters - COMPACT VERSION
          div(
            style = "background: #f8f9fa; padding: 10px; border-radius: 6px; margin-bottom: 10px; border: 1px solid #e9ecef;",
            div(
              style = "text-align: center; margin-bottom: 6px;",
              span(style = "font-size: 13px; color: #495057; font-weight: 600; text-transform: uppercase; letter-spacing: 0.5px;", 
                   "Test Parameters")
            ),
            
            div(
              style = "text-align: center; margin-bottom: 6px; display: flex; justify-content: center; align-items: center; gap: 8px; flex-wrap: nowrap;",  # CHANGED TO FLEX
              span(style = "font-size: 12px; color: #6c757d; white-space: nowrap;", "• Item Pool:"),
              span(sprintf("%d items", pool_size), style = "color: #0066cc; font-weight: 700; font-size: 11px; background: #e6f3ff; padding: 2px 6px; border-radius: 8px; white-space: nowrap;"),
              span(style = "font-size: 12px; color: #6c757d; white-space: nowrap;", "• Target SEM:"),
              span("≤0.30", style = "color: #dc3545; font-weight: 700; font-size: 11px; background: #ffe6e6; padding: 2px 6px; border-radius: 8px; border: 1px solid #ffcccc; white-space: nowrap;"),
              span(style = "font-size: 12px; color: #6c757d; white-space: nowrap;", "• Method:"),
              span(
                if (input$sel_method == "mfi") {
                  "MFI"
                } else if (input$sel_method == "randomesque") {
                  "Randomesque"
                } else {
                  "Fixed-form"
                }, 
                style = "color: #0066cc; font-weight: 700; font-size: 11px; background: #e6f3ff; padding: 2px 6px; border-radius: 8px; white-space: nowrap;"
              )
            )       
          ),
          
          # Person Parameters
          div(
            style = "background: #f0f8ff; padding: 10px; border-radius: 6px; margin-bottom: 10px; border: 1px solid #d1e7ff;",
            div(
              style = "text-align: center; margin-bottom: 6px;",
              span(style = "font-size: 13px; color: #0066cc; font-weight: 600; text-transform: uppercase; letter-spacing: 0.5px;", 
                   "Person Parameters")
            ),
            div(
              style = "text-align: center;",
              span(style = "font-size: 12px; color: #0066cc;", "• True θ:"),
              span(sprintf(" %.2f", true_theta_val), style = "color: #004499; font-weight: 700; font-size: 12px; background: white; padding: 2px 8px; border-radius: 10px; margin-left: 4px; margin-right: 8px;"),
              span(style = "font-size: 12px; color: #0066cc;", "• P(θ):"),
              span(sprintf(" %.2f", p_correct), style = "color: #e67e22; font-weight: 700; font-size: 12px; background: #fff5e6; padding: 2px 8px; border-radius: 10px; margin-left: 4px; margin-right: 8px; border: 1px solid #ffd9b3;"),
              span(style = "font-size: 12px; color: #0066cc;", "• Response:"),
              span(
                sprintf(" %s", ifelse(length(responses()) >= 1 && responses()[1] == 1, "Correct", "Incorrect")), 
                style = sprintf(
                  "color: %s; font-weight: 700; font-size: 12px; background: %s; padding: 2px 8px; border-radius: 10px; margin-left: 4px; border: 1px solid %s;",
                  ifelse(length(responses()) >= 1 && responses()[1] == 1, "#1e7e34", "#dc3545"),
                  ifelse(length(responses()) >= 1 && responses()[1] == 1, "#e6f4ea", "#ffe6e6"),
                  ifelse(length(responses()) >= 1 && responses()[1] == 1, "#a3d7b5", "#f5b7b1")
                )
              )
            )
          ),
          
          # Item Parameters
          div(
            style = "background: #fffaf0; padding: 10px; border-radius: 6px; margin-bottom: 10px; border: 1px solid #ffeeba;",
            div(
              style = "text-align: center; margin-bottom: 6px;",
              span(style = "font-size: 13px; color: #e67e22; font-weight: 600; text-transform: uppercase; letter-spacing: 0.5px;", 
                   "First Item Administered")
            ),
            div(
              style = "text-align: center;",
              span(style = "font-size: 12px; color: #e67e22;", "• Item #:"),
              span(first_item$label, style = "color: #cc6600; font-weight: 700; font-size: 12px; background: white; padding: 2px 8px; border-radius: 10px; margin-left: 4px; margin-right: 12px;"),
              span(style = "font-size: 12px; color: #e67e22;", "• Parameters:"),
              span(sprintf("a=%.2f", first_item$a), style = "color: #0066cc; font-weight: 700; font-size: 11px; background: #e6f3ff; padding: 2px 6px; border-radius: 8px; margin-left: 2px;"),
              span(" | ", style = "font-size: 11px; color: #e67e22; font-weight: bold;"),
              span(sprintf("b=%.2f", first_item$b), style = "color: #1e7e34; font-weight: 700; font-size: 11px; background: #e6f4ea; padding: 2px 6px; border-radius: 8px; margin-left: 2px;"),
              span(" | ", style = "font-size: 11px; color: #e67e22; font-weight: bold;"),
              span(sprintf("c=%.2f", first_item$c), style = "color: #dc3545; font-weight: 700; font-size: 11px; background: #ffe6e6; padding: 2px 6px; border-radius: 8px; margin-left: 2px;")
            )
          ),
          
          # Selection strategy
          div(
            style = "background: #f0f9f0; padding: 10px; border-radius: 6px; margin-bottom: 6px; border: 1px solid #c8e6c9;",
            div(
              style = "text-align: center; margin-bottom: 6px;",
              h5("First Item Selection Strategy", style = "color: #1e7e34; margin: 0; font-size: 13px; font-weight: 600; text-transform: uppercase; letter-spacing: 0.5px;")
            ),
            div(
              style = "text-align: left;",
              div(style = "margin: 4px 0; font-size: 12px; display: flex; align-items: center;", 
                  span("• Medium difficulty item for initial ability estimate", style = "color: #1e7e34;"),
                  span(sprintf(" (b=%.2f)", first_item$b), style = "color: #1e7e34; font-weight: 600; font-style: italic; margin-left: 4px;")
              ),
              div(style = "margin: 4px 0; font-size: 12px; display: flex; align-items: center;",
                  span("• High discrimination item for maximum information", style = "color: #1e7e34;"),
                  span(sprintf(" (a=%.2f)", first_item$a), style = "color: #1e7e34; font-weight: 600; font-style: italic; margin-left: 4px;")
              )
            )
          )
        )
      )
    ))  
  }
  
  # Show modal on app startup
  observe({
    req(items())
    if (nrow(items()) == 1 && !initial_modal_shown()) {
      initial_modal_shown(TRUE)
      delay(500, {
        show_initial_modal()
      })
    }
  })
  
  # --- Consolidated BEGIN TEST handler ---
  observeEvent(input$begin_test, {
    # Remove the modal (if present)
    removeModal()
    
    # Mark that initial modal was completed for this session
    initial_modal_completed(TRUE)
    
    # Reset bias alert state for the new test so stabilization can be detected anew
    bias_alert_shown(FALSE)
    
    # Remove any stray custom notification then add the electric-pulse notification
    removeUI(selector = "#custom-notification", immediate = TRUE)
    
    insertUI(
      selector = "head",
      where = "beforeEnd",
      ui = tags$style(id = "electric-pulse-style", HTML("
      @keyframes electric-pulse {
        0% { 
          border-color: #00ffff;
          box-shadow: 0 0 10px #00ffff, 0 0 20px #00ffff;
          background: linear-gradient(135deg, #000428, #004e92);
        }
        50% { 
          border-color: #ff00ff;
          box-shadow: 0 0 20px #ff00ff, 0 0 40px #ff00ff;
          background: linear-gradient(135deg, #004e92, #8a2be2);
        }
        100% { 
          border-color: #00ffff;
          box-shadow: 0 0 10px #00ffff, 0 0 20px #00ffff;
          background: linear-gradient(135deg, #000428, #004e92);
        }
      }
      .electric-pulse {
        animation: electric-pulse 1.5s ease-in-out infinite;
      }
    "))
    )
    
    insertUI(
      selector = "body",
      where = "beforeEnd",
      ui = div(id = "custom-notification",
               style = "position: fixed; bottom: 100px; left: 50px; 
             z-index: 9999; color: #1a1a1a; padding: 16px 20px; 
             border-radius: 8px; background: rgba(255, 255, 255, 0.95);
             border: 2px solid #e67e22; font-size: 14px; font-weight: 700;
             box-shadow: 0 4px 12px rgba(230, 126, 34, 0.3);
             min-width: 300px; max-width: 400px; backdrop-filter: blur(5px);",
               div(style = "line-height: 1.4;",
                   div(style = "font-weight: bold; margin-bottom: 4px; font-size: 15px; color: #e67e22;", 
                       "Adaptive Testing Started"),
                   div(style = "margin-bottom: 4px; color: #2c3e50;", 
                       "First item has been administered"),
                   div(style = "font-style: italic; color: #7f8c8d;", 
                       "Click 'Next Item' to continue assessment")
               )
      )
    )
    
    shinyjs::delay(3000, {
      removeUI(selector = "#custom-notification")
      removeUI(selector = "style", immediate = TRUE)
    })
  })
  
  # --- Bias Detection After SEM Target Met ---
  observe({
    tryCatch({
      # Only check if test is active, no alert shown yet, AND SEM target is met
      if (!initial_modal_completed() || bias_alert_shown() || !target_met()) return()
      
      bh <- bias_history()
      if (length(bh) < 3) return()
      
      # Check if last 3 bias values are stable
      last_three <- tail(bh, 3)
      range_diff <- diff(range(last_three))
      
      if (range_diff <= 0.01) {
        bias_alert_shown(TRUE)
        
        # Calculate indices for display
        indices <- (length(bh)-2):length(bh)
        current_sem <- final_sem_value()  # Use the final_sem_value reactive
        
        showModal(modalDialog(
          title = tagList(icon("bullseye"), "Bias Stabilization Pattern Detected"),
          easyClose = FALSE,
          fade = TRUE,
          size = "l",
          tagList(
            div(style = "text-align: center;",
                # Combined success header and main content
                div(style = "background: linear-gradient(135deg, #ecfdf5, #d1fae5); padding: 12px; border-radius: 12px; margin-bottom: 12px; border: 2px solid #a7f3d0; box-shadow: 0 4px 12px rgba(5, 150, 105, 0.15);",
                    h4("SEM target achieved AND bias has stabilized", style = "color: #065f46; margin: 0 0 12px 0; font-size: 16px; font-weight: 700; text-align: center;"),
                    div(style = "background: white; padding: 12px; border-radius: 8px; border: 1px solid #e5e7eb; box-shadow: 0 2px 8px rgba(0,0,0,0.06);",
                        div(style = "display: flex; align-items: center; gap: 10px; margin-bottom: 12px;",
                            div(style = "width: 16px; height: 16px; background: linear-gradient(135deg, #10b981, #059669); border-radius: 50%;"),
                            p(style = "margin: 0; font-size: 14px; color: #374151; font-weight: 700;", "Three consecutive bias estimates are tightly clustered:")
                        ),
                        # Content grid - All boxes on same line
                        div(style = "display: flex; gap: 12px; justify-content: space-between;",
                            # Bias values
                            lapply(seq_along(last_three), function(i) {
                              div(style = "flex: 1; text-align: center;",
                                  div(style = "background: linear-gradient(135deg, #f8fafc, #f1f5f9); padding: 8px 6px; border-radius: 10px; border: 1px solid #e2e8f0; box-shadow: 0 2px 6px rgba(0,0,0,0.05);",
                                      p(style = "margin: 0 0 4px 0; font-size: 11px; color: #64748b; font-weight: 600;", paste("Step", indices[i])),
                                      p(style = "margin: 0; font-size: 14px; font-family: 'Monaco', 'Menlo', monospace; color: #1e293b; font-weight: 700;", 
                                        sprintf("%+.3f", last_three[i]))
                                  )
                              )
                            }),
                            # Mean - Same size as step boxes
                            div(style = "flex: 1; text-align: center;",
                                div(style = "background: linear-gradient(135deg, #f0fdf4, #dcfce7); padding: 8px 6px; border-radius: 10px; border: 1px solid #bbf7d0; box-shadow: 0 2px 6px rgba(0,0,0,0.05);",
                                    p(style = "margin: 0 0 4px 0; font-size: 11px; color: #059669; font-weight: 600;", "Mean"),
                                    p(style = "margin: 0; font-size: 14px; font-family: 'Monaco', 'Menlo', monospace; color: #065f46; font-weight: 700;", 
                                      sprintf("%+.3f", mean(last_three)))
                                )
                            ),
                            # Range - Same size as step boxes
                            div(style = "flex: 1; text-align: center;",
                                div(style = "background: linear-gradient(135deg, #fffbeb, #fef3c7); padding: 8px 6px; border-radius: 10px; border: 1px solid #fde68a; box-shadow: 0 2px 6px rgba(0,0,0,0.05);",
                                    p(style = "margin: 0 0 4px 0; font-size: 11px; color: #d97706; font-weight: 600;", "Range"),
                                    p(style = "margin: 0; font-size: 14px; font-family: 'Monaco', 'Menlo', monospace; color: #92400e; font-weight: 700;", 
                                      sprintf("%.3f", range_diff))
                                )
                            )
                        )
                    )
                ),
                
                # Interpretation - Paragraph format
                div(style = "background: linear-gradient(135deg, #f0fdf4, #dcfce7); padding: 12px 16px; border-radius: 12px; border: 2px solid #a7f3d0; margin: 12px 0; box-shadow: 0 4px 12px rgba(5, 150, 105, 0.1); text-align: left;",
                    p(style = "margin: 0 0 6px 0; font-size: 14px; color: #065f46; font-weight: 700;", "Interpretation:"),
                    p(style = "margin: 0; font-size: 12px; color: #047857; line-height: 1.5;", 
                      "Measurement precision target has been achieved and bias has stabilized around a consistent value, indicating that EAP θ̂ estimation is converging reliably and measurement precision may be optimal.")
                )            
            )
          ),
          footer = tagList(
            actionButton("dismiss_bias_alert", "Continue Test", class = "btn-primary", 
                         style = "background: linear-gradient(135deg, #10b981, #059669); border: none; padding: 10px 24px; font-size: 14px; font-weight: 600; border-radius: 8px; box-shadow: 0 2px 8px rgba(5, 150, 105, 0.3);")
          )
        ))
      }
    }, error = function(e) {
      message("Bias observer error: ", e$message)
      bias_alert_shown(FALSE)  # Reset on error to prevent infinite loop
    })
  })
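  
  # --- Hedged refactoring sketch (not wired into the app): the stabilization
  # rule used by the observer above, expressed as a pure, testable function.
  # The name is_bias_stable() is hypothetical; the observer keeps its inline logic.
  is_bias_stable <- function(bias_history, window = 3, tol = 0.01) {
    if (length(bias_history) < window) return(FALSE)
    # stable when the last `window` bias values span no more than `tol`
    diff(range(tail(bias_history, window))) <= tol
  }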
  
  # Simple dismiss
  observeEvent(input$dismiss_bias_alert, {
    removeModal()
  })
  
  # ------------------------------
  # Consolidated Reset Handler
  observeEvent(input$reset_all, {
    
    # FIRST: Close ALL modals and remove any UI elements
    tryCatch({
      removeModal()
      removeUI(selector = ".modal-backdrop", immediate = TRUE)
      removeUI(selector = ".modal", immediate = TRUE)
      removeUI(selector = "#custom-notification", immediate = TRUE)
      removeUI(selector = "#electric-pulse-style", immediate = TRUE)
      # Force remove any lingering modal backdrops via JS
      runjs("$('.modal-backdrop').remove();")
      runjs("$('body').removeClass('modal-open');")
    }, error = function(e) {
      message("UI cleanup during reset: ", e$message)
    })
    
    # Reset ALL flags and reactive values
    initial_modal_shown(FALSE)
    initial_modal_completed(FALSE)
    bias_alert_shown(FALSE)  # ← CRITICAL: Reset bias alert state
    bias_history(numeric(0))
    
    # Reset other reactive values to clean state
    items(NULL)
    responses(integer(0))
    target_met(FALSE)
    final_sem_value(NA_real_)
    true_theta(rnorm(1, 0, 1))  # New true theta for the new test
    per_item_curves(list())
    tif_curve(rep(0, length(.THETA_GRID)))
    cumulative_metrics_cache(NULL)
    app_initialized(FALSE)  # Reset flag
    
    # Force UI refresh with delay to ensure clean state
    delay(200, {
      tryCatch({
        # Re-initialize pool and colors
        if (!exists("pool_size")) pool_size <- 100
        pool <- create_item_pool(pool_size)
        item_pool(pool)
        palette <- generate_stable_colors(pool_size)
        
        if (is.null(pool) || nrow(pool) == 0) {
          stop("create_item_pool returned an empty pool")
        }
        
        # PROPERLY SELECT AND ADMINISTER INITIAL ITEM
        near_zero_threshold <- 0.1
        near_zero_items <- pool[abs(pool$b) < near_zero_threshold, , drop = FALSE]
        
        if (nrow(near_zero_items) > 0) {
          selected_index <- which.max(near_zero_items$a)
          best_initial_item <- near_zero_items[selected_index, , drop = FALSE]
          selection_reason <- "highest discrimination among medium-difficulty items"
        } else {
          best_initial_item <- pool[which.min(abs(pool$b - 0)), , drop = FALSE]
          selection_reason <- "closest to medium difficulty (fallback)"
        }
        
        # Assign color
        best_initial_item$color <- if (!is.null(palette) && best_initial_item$id <= length(palette)) {
          palette[best_initial_item$id]
        } else {
          "#94a3b8"  # Default gray color
        }
        
        # GENERATE RESPONSE FOR THE INITIAL ITEM
        first_response <- generate_responses(true_theta(), best_initial_item)
        
        # Verify the response makes sense
        p_correct_first <- P_3pl(true_theta(), best_initial_item$a, best_initial_item$b, best_initial_item$c)
        
        # === CRITICAL FIX: SET CURVES BEFORE ITEMS ===
        # CALCULATE CURVES FOR PLOTTING (ENSURES VALID DATA)
        df <- tibble(
          theta = .THETA_GRID,
          P = P_3pl(.THETA_GRID, best_initial_item$a, best_initial_item$b, best_initial_item$c),
          I = I_3pl(.THETA_GRID, best_initial_item$a, best_initial_item$b, best_initial_item$c)
        )
        
        # Initialize curves with valid data - DO THIS BEFORE SETTING ITEMS
        per_item_curves(list(setNames(list(df), best_initial_item$label)))
        tif_curve(df$I)
        
        # NOW set items and responses (plot data is ready first)
        items(best_initial_item)
        responses(first_response)
        
        # Show initial modal again for the new test
        delay(300, {
          if (!initial_modal_shown()) {
            initial_modal_shown(TRUE)
            show_initial_modal()
          }
        })
        
        app_initialized(TRUE)  # Set flag when done
        
      }, error = function(e) {
        message("Error during reset initialization: ", e$message)
        # Emergency fallback - create minimal valid state
        emergency_item <- data.frame(
          id = 1, label = "Emergency", a = 1.0, b = 0.0, c = 0.2, 
          color = "#94a3b8", stringsAsFactors = FALSE
        )
        
        # Emergency curves FIRST
        emergency_df <- tibble(
          theta = .THETA_GRID,
          P = P_3pl(.THETA_GRID, 1.0, 0.0, 0.2),
          I = I_3pl(.THETA_GRID, 1.0, 0.0, 0.2)
        )
        per_item_curves(list(setNames(list(emergency_df), "Emergency")))
        tif_curve(emergency_df$I)
        
        # THEN set emergency items
        items(emergency_item)
        responses(0)
      })
    })
  })
  
  # Authoritative EAP θ̂ 
  eap_theta <- reactive({
    ib <- items()
    resp <- responses()
    if (is.null(ib) || nrow(ib) == 0)
      return(0)
    estimate_theta_eap(ib, resp)
  })
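  
  # --- Hedged illustration (standalone; the app's estimate_theta_eap() above
  # remains authoritative): a minimal grid-based EAP estimator with a standard
  # normal prior, mirroring the EAP and log-likelihood formulas described in
  # the reference content. The function name eap_theta_grid_demo() and the
  # 121-point grid are illustrative assumptions, not part of the app.
  eap_theta_grid_demo <- function(a, b, c, y,
                                  grid = seq(-4, 4, length.out = 121)) {
    # log-likelihood of the 0/1 response vector y at each grid point (3PL, D = 1.702)
    loglik <- sapply(grid, function(th) {
      p <- c + (1 - c) / (1 + exp(-1.702 * a * (th - b)))
      sum(y * log(p) + (1 - y) * log(1 - p))
    })
    log_post <- loglik + dnorm(grid, log = TRUE)   # add standard normal log-prior
    w <- exp(log_post - max(log_post))             # log-sum-exp style stabilization
    sum(grid * w) / sum(w)                         # posterior mean = EAP estimate
  }
  # Example: eap_theta_grid_demo(a = c(1.2, 0.9), b = c(0, 0.5), c = c(0.2, 0.2), y = c(1, 0))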
  
  
  # ------------------------------
  # Session end cleanup (good practice)
  session$onSessionEnded(function() {
    # remove any custom UI
    try(removeUI(selector = "#custom-notification", immediate = TRUE), silent = TRUE)
    try(removeModal(), silent = TRUE)  # Ensure any open modals are closed
  })
  
  # ---- Initialization (Random selection from items with b near 0) ----
  observe({
    pool <- create_item_pool(pool_size)
    item_pool(pool)
    palette <- generate_stable_colors(pool_size)
    color_palette(palette)
    
    # Initial item selection strategy: begin the adaptive sequence with a
    # medium-difficulty item so the first response is maximally informative
    # about an unknown examinee.
    #   1. Identify items with difficulty near zero (|b| < 0.1 logits).
    #   2. Among those, administer the one with the highest discrimination (a),
    #      which maximizes information at theta = 0.
    #   3. If no such items exist, fall back to the item whose difficulty is
    #      closest to zero.
    # Subsequent items are then selected adaptively from emerging performance.
    
    # identify items with difficulty parameters near zero (within ±0.1 logits)
    near_zero_threshold <- 0.1
    near_zero_items <- pool[abs(pool$b) < near_zero_threshold, ]
    
    if (nrow(near_zero_items) > 0) {
      # Select the item with the highest discrimination parameter (a)
      selected_index <- which.max(near_zero_items$a)
      best_initial_item <- near_zero_items[selected_index, , drop = FALSE]
      selection_reason <- "selected for maximum information at average ability (highest discrimination among medium-difficulty items)"
    } else {
      # Fallback: select item closest to zero difficulty
      best_initial_item <- pool[which.min(abs(pool$b - 0)), , drop = FALSE]
      selection_reason <- "selected as closest to medium difficulty (fallback when no ideal items available)"
    }
    
    best_initial_item$color <- palette[best_initial_item$id]
    
    # Generate response for the first item
    first_response <- generate_responses(true_theta(), best_initial_item)
    
    # Set items and responses
    items(best_initial_item)
    responses(first_response)
    
    # Calculate curves for plotting
    df <- tibble(
      theta = .THETA_GRID,
      P = P_3pl(
        .THETA_GRID,
        best_initial_item$a,
        best_initial_item$b,
        best_initial_item$c
      ),
      I = I_3pl(
        .THETA_GRID,
        best_initial_item$a,
        best_initial_item$b,
        best_initial_item$c
      )
    )
    lst <- list()
    lst[[best_initial_item$label]] <- df
    per_item_curves(lst)
    tif_curve(df$I)
    
    # Mark app as initialized
    app_initialized(TRUE)
  })
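  
  # --- Hedged refactoring sketch (not wired in): the first-item rule used both
  # here and in the reset handler, as a single pure function. The name
  # select_initial_item() is hypothetical; the app keeps its inline logic.
  select_initial_item <- function(pool, threshold = 0.1) {
    near_zero <- pool[abs(pool$b) < threshold, , drop = FALSE]
    if (nrow(near_zero) > 0) {
      near_zero[which.max(near_zero$a), , drop = FALSE]   # max information near theta = 0
    } else {
      pool[which.min(abs(pool$b)), , drop = FALSE]        # fallback: difficulty closest to 0
    }
  }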
  
  # ===== ADD OUTPUT RENDERERS =====
  output$compact_status_pills <- renderUI({
    ib <- items()
    n_items <- if (is.null(ib)) 0 else nrow(ib)
    th_hat  <- eap_theta()
    true_th <- true_theta()
    pool <- item_pool()
    pool_size_current <- if (!is.null(pool)) nrow(pool) else pool_size
    
    # Calculate current metrics
    tif <- 0
    if (n_items > 0) {
      iif_vals <- sapply(1:n_items, function(i) {
        I_3pl(th_hat, ib$a[i], ib$b[i], ib$c[i])
      })
      iif_vals[!is.finite(iif_vals)] <- 0
      tif <- sum(iif_vals)
    }
    
    sem <- if (tif > 0) 1 / sqrt(tif) else NA_real_
    rho <- if (tif > 0) tif / (tif + 1) else 0
    bias <- th_hat - true_th
    
    # Minimal container with bottom margin for spacing
    div(
      style = "display: flex; justify-content: center; align-items: center; gap: 12px; flex-wrap: nowrap; padding: 8px 0; width: 100%;",
      
      # True Theta
      div(style = "display: inline-flex; align-items: center; gap: 4px; background: white; border-radius: 6px; padding: 4px 8px; border: 1px solid #e2e8f0;",
          span(style = "color: #64748b; font-weight: 600; font-size: 7px;", "True θ:"),
          span(style = "color: #1e293b; font-weight: 700; font-size: 7px;", sprintf("%.2f", true_th))
      ),
      
      # EAP Theta
      div(style = "display: inline-flex; align-items: center; gap: 4px; background: white; border-radius: 6px; padding: 4px 8px; border: 1px solid #e2e8f0;",
          span(style = "color: #64748b; font-weight: 600; font-size: 7px;", "EAP θ:"),
          span(style = "color: #1e293b; font-weight: 700; font-size: 7px;", sprintf("%.2f", th_hat))
      ),
      
      # Bias
      div(style = "display: inline-flex; align-items: center; gap: 4px; background: white; border-radius: 6px; padding: 4px 8px; border: 1px solid #e2e8f0;",
          span(style = "color: #64748b; font-weight: 600; font-size: 7px;", "Bias:"),
          span(style = "color: #1e293b; font-weight: 700; font-size: 7px;", sprintf("%+.2f", bias))
      ),
      
      # Items Used
      div(style = "display: inline-flex; align-items: center; gap: 4px; background: white; border-radius: 6px; padding: 4px 8px; border: 1px solid #e2e8f0;",
          span(style = "color: #64748b; font-weight: 600; font-size: 7px;", "Items:"),
          span(style = "color: #1e293b; font-weight: 700; font-size: 7px;", sprintf("%d/%d", n_items, pool_size_current))
      ),
      
      # Reliability
      div(style = "display: inline-flex; align-items: center; gap: 4px; background: white; border-radius: 6px; padding: 4px 8px; border: 1px solid #e2e8f0;",
          span(style = "color: #64748b; font-weight: 600; font-size: 7px;", "Rel:"),
          span(style = "color: #1e293b; font-weight: 700; font-size: 7px;", if (is.finite(rho)) sprintf("%.2f", rho) else "0.00")
      ),
      
      # SEM
      div(style = "display: inline-flex; align-items: center; gap: 4px; background: white; border-radius: 6px; padding: 4px 8px; border: 1px solid #e2e8f0;",
          span(style = "color: #64748b; font-weight: 600; font-size: 7px;", "SEM:"),
          span(style = "color: #1e293b; font-weight: 700; font-size: 7px;", if (is.finite(sem)) sprintf("%.3f", sem) else "—")
      )
    )
  })
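  
  # --- Hedged refactoring sketch (not wired in): the precision metrics computed
  # inline in the renderer above, gathered into one pure helper. It relies on
  # the app's I_3pl(); the name precision_metrics() is hypothetical.
  precision_metrics <- function(ib, theta_hat) {
    tif <- if (is.null(ib) || nrow(ib) == 0) 0 else {
      vals <- mapply(I_3pl, theta_hat, ib$a, ib$b, ib$c)
      sum(vals[is.finite(vals)])
    }
    list(
      tif = tif,
      sem = if (tif > 0) 1 / sqrt(tif) else NA_real_,  # SEM = 1 / sqrt(TIF)
      rho = if (tif > 0) tif / (tif + 1) else 0        # reliability under a N(0, 1) prior
    )
  }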
  
  output$sem_progress_bar <- renderUI({
    # Get current SEM using your existing calculation
    ib <- items()
    n_items <- if (is.null(ib)) 0 else nrow(ib)
    th_hat <- eap_theta()
    
    # Calculate current SEM (same as in your status pills)
    tif <- 0
    if (n_items > 0) {
      iif_vals <- sapply(1:n_items, function(i) {
        I_3pl(th_hat, ib$a[i], ib$b[i], ib$c[i])
      })
      iif_vals[!is.finite(iif_vals)] <- 0
      tif <- sum(iif_vals)
    }
    
    current_sem <- if (tif > 0) 1 / sqrt(tif) else NA
    sem_target <- if (!is.null(input$sem_target)) input$sem_target else 0.30
    
    # Handle missing SEM
    if (is.na(current_sem)) {
      return(
        div(
          class = "sem-progress-wrapper",
          style = "margin-bottom: 10px;",
          div(
            class = "progress-bar",
            style = "width: 100%; height: 16px; background: #f1f5f9; border-radius: 8px; overflow: hidden; position: relative; border: 1px solid #e2e8f0;",
            div(style = "width: 0%; height: 100%; background: #94a3b8;"),
            div(
              style = "position: absolute; top: 50%; left: 50%; transform: translate(-50%, -50%); color: #64748b; font-weight: 600; font-size: 10px;",
              "SEM: —"
            )
          )
        )
      )
    }
    
    # Calculate progress (0-100%)
    max_sem <- 0.8  # Worst-case SEM
    if (current_sem <= sem_target) {
      progress_pct <- 100
      color <- "#10b981"  # Green - target met
    } else {
      progress_pct <- max(0, (max_sem - current_sem) / (max_sem - sem_target) * 100)
      color <- "#3b82f6"  # Blue - in progress
    }
    
    # Create formatted text for display
    sem_text <- sprintf("SEM: %.3f / %.3f", current_sem, sem_target)
    
    # Create progress bar
    div(
      class = "sem-progress-wrapper",
      style = "margin-bottom: 10px;",
      div(
        class = "progress-bar",
        style = "margin-bottom: 10px; width: 100%; height: 40px; background: #f1f5f9; border-radius: 8px; overflow: hidden; position: relative; border: 1px solid #e2e8f0;",
        
        # Progress fill
        div(
          style = paste0(
            "width: ", round(progress_pct, 1), "%; ",
            "height: 100%; ",
            "background: ", color, "; ",
            "transition: width 0.3s ease;"
          )
        ),
        
        # Text overlay - FORCE SINGLE LINE
        div(
          style = "position: absolute; top: 50%; left: 50%; transform: translate(-50%, -50%); 
           color: #1e293b; font-weight: 600; font-size: 10px;
           white-space: nowrap; overflow: hidden; text-overflow: clip;",
          sem_text
        )
      )
    )
  })
  
  output$sem_header_display <- renderText({
    req(current_sem())
    sprintf("SEM = %.3f (Target: ≤ 0.300) | EAP Score, θ̂ = %.2f | True θ = %.2f)", 
            current_sem(), eap_score(), true_theta())
  })
  
  output$ci_display <- renderText({
    req(current_sem())
    sprintf(" ±%.2f logits", 1.96 * current_sem())
  })
  
  output$reliability_display <- renderText({
    req(current_sem())
    sprintf(" %.1f%%", (1 - current_sem()^2) * 100)
  })
  
  output$efficiency_header_display <- renderText({
    if (input$sel_method == "random") {
      "Fixed-form Test Length:"
    } else {
      "Efficiency Gain vs Fixed-form:"
    }
  })
  
  output$items_used_display <- renderText({
    req(current_items())
    sprintf(" %d", current_items())
  })
  
  output$fixed_form_display <- renderText({
    req(current_items())
    
    if (input$sel_method == "random") {
      sprintf(" %d items", current_items())
    } else {
      typical_fixed <- max(20, round(current_items() * 2.5))
      sprintf(" %d+ items", typical_fixed)
    }
  })
  
  output$items_saved_display <- renderText({
    req(current_items())
    
    if (input$sel_method == "random") {
      " — "
    } else {
      fixed_form <- max(20, round(current_items() * 2.5))
      saved <- fixed_form - current_items()
      reduction <- round((saved / fixed_form) * 100)
      sprintf(" %d+ items (%d%%)", saved, reduction)
    }
  })
  
  output$confidence_interpretation_inline <- renderText({
    req(current_sem(), eap_score())
    ci_width <- 1.96 * current_sem()
    lower_bound <- eap_score() - ci_width
    upper_bound <- eap_score() + ci_width
    true_theta_val <- true_theta()
    bias <- eap_score() - true_theta_val
    abs_bias <- abs(bias)
    
    true_in_ci <- true_theta_val >= lower_bound & true_theta_val <= upper_bound
    
    # Method context - simplified
    method_text <- if (input$sel_method == "mfi") {
      "using Maximum Fisher Information selection"
    } else if (input$sel_method == "randomesque") {
      "using Randomesque selection"
    } else {
      "using random item selection"
    }
    
    # Bias assessment
    bias_assessment <- if (abs_bias <= 0.1) {
      "minimal bias"
    } else if (abs_bias <= 0.3) {
      "moderate bias"
    } else {
      "substantial bias"
    }
    
    # CI assessment
    ci_assessment <- if (true_in_ci) {
      "The true value falls within the confidence interval, indicating appropriate uncertainty quantification."
    } else {
      "The true value falls outside the confidence interval, suggesting potential estimation issues."
    }
    
    sprintf("Measurement precision achieved %s. The 95%% confidence interval [%.2f, %.2f] represents our uncertainty about the true ability level. True θ = %.2f, EAP θ̂ = %.2f (bias: %+.2f). %s This represents %s given the current test length.",
            method_text, lower_bound, upper_bound, true_theta_val, eap_score(), bias,
            ci_assessment, bias_assessment)
  })
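  
  # --- Hedged sketch (illustrative only): the 95% interval and coverage check
  # used in the interpretation text above, as a small pure function. The name
  # ci_covers_true() is hypothetical.
  ci_covers_true <- function(theta_hat, sem, true_theta, z = 1.96) {
    lower <- theta_hat - z * sem   # CI half-width = z * SEM
    upper <- theta_hat + z * sem
    list(lower = lower, upper = upper,
         covered = true_theta >= lower && true_theta <= upper)
  }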
  
  output$dynamic_title <- renderUI({
    tagList(
      div(
        style = "text-align: center; margin-bottom: 5px; padding: 5px 0; background: linear-gradient(135deg, #f8fafc 0%, #f1f5f9 100%); border-radius: 8px; grid-column: 1 / -1;",
        h3("Computerized Adaptive Testing with Real-time Bayesian Estimation", 
           style = "color: #1e293b; font-weight: 600; margin: 0; font-size: 24px;"),
      )
    )
  })
  
  # ---- Summary Table Container ----
  output$summary_table_container <- renderUI({
    div(
      style = "width: 100%; margin-top: 20px; padding: 15px; background: white; border-radius: 8px; box-shadow: 0 2px 4px rgba(0,0,0,0.1);",
      uiOutput("summary_table")
    )
  })
  
  # ---- Summary Table Content ----
  output$summary_table <- renderUI({
    ib <- items()
    cm <- cumulative_metrics_cache()
    true_th <- true_theta()
    req(!is.null(ib), nrow(ib) > 0, !is.null(true_th))
    
    if (is.null(cm) || length(cm) == 0) return(NULL)
    
    n_rows <- length(cm)
    ib <- ib[seq_len(n_rows), , drop = FALSE]
    
    resp <- responses()
    if (length(resp) < n_rows) {
      resp <- c(resp, rep(NA_integer_, n_rows - length(resp)))
    }
    
    # Create table header with equal width columns and centered titles
    table_html <- paste0(
      '<table id="summary_table" class="display compact" style="width:100%; font-size:10px; margin-top:15px; table-layout:fixed;">',
      '<thead><tr style="background:linear-gradient(135deg,#f8fafc,#f1f5f9);">',
      '<th style="width:7.69%; text-align:center;">Step</th>',
      '<th style="width:7.69%; text-align:center;">Item</th>',
      '<th style="width:7.69%; text-align:center;">a</th>',
      '<th style="width:7.69%; text-align:center;">b</th>',
      '<th style="width:7.69%; text-align:center;">c</th>',
      '<th style="width:7.69%; text-align:center;">P(θ)</th>',
      '<th style="width:7.69%; text-align:center;">Resp.</th>',
      '<th style="width:7.69%; text-align:center;">EAP θ̂ </th>',
      '<th style="width:7.69%; text-align:center;">Bias</th>',
      '<th style="width:7.69%; text-align:center;">IIF</th>',
      '<th style="width:7.69%; text-align:center;">TIF</th>',
      '<th style="width:7.69%; text-align:center;">SEM</th>',
      '<th style="width:7.69%; text-align:center;">ρ</th>',
      '</tr></thead><tbody>'
    )
    
    # Create table rows (keeping your original cell alignment)
    for (i in 1:n_rows) {
      metrics <- cm[[i]]
      
      current_iif <- metrics$iif_current_item
      current_iif <- ifelse(is.finite(current_iif), current_iif, 0)
      
      # Response symbol/color
      current_response <- resp[i]
      response_text <- if (!is.na(current_response) && current_response == 1) "✓" else "✗"
      response_color <- if (!is.na(current_response) && current_response == 1) "green" else "red"
      
      # Calculate p(true theta) using the global P_3pl function
      p_true_theta <- P_3pl(
        theta = true_th,
        a = ib$a[i],
        b = ib$b[i], 
        c = ib$c[i]
      )
      
      # Row HTML with 3 digits for numeric values
      row_html <- paste0(
        '<tr>',
        '<td style="text-align:center;">', i, '</td>',
        '<td style="text-align:center;">', ib$label[i], '</td>',
        '<td style="text-align:right;">', sprintf('%.3f', ib$a[i]), '</td>',
        '<td style="text-align:right;">', sprintf('%.3f', ib$b[i]), '</td>',
        '<td style="text-align:right;">', sprintf('%.3f', ib$c[i]), '</td>',
        '<td style="text-align:right;">', sprintf('%.3f', p_true_theta), '</td>',
        '<td style="text-align:center; color:', response_color, '; font-weight:bold;">', response_text, '</td>',
        '<td style="text-align:right;">', sprintf('%.3f', metrics$estimated_theta), '</td>',
        '<td style="text-align:right;">', sprintf('%.3f', metrics$bias), '</td>',
        '<td style="text-align:right;">', sprintf('%.3f', current_iif), '</td>',
        '<td style="text-align:right;">', sprintf('%.3f', metrics$tif), '</td>',
        '<td style="text-align:right;">', sprintf('%.3f', metrics$sem), '</td>',
        '<td style="text-align:right;">', sprintf('%.3f', metrics$reliability), '</td>',
        '</tr>'
      )
      
      table_html <- paste0(table_html, row_html)
    }
    
    table_html <- paste0(table_html, '</tbody></table>')
    
    tagList(
      tags$div(HTML(table_html)),
      tags$script(HTML("
    $(document).on('shiny:value', function() {
      setTimeout(function() {
        if ($.fn.DataTable.isDataTable('#summary_table')) {
          $('#summary_table').DataTable().destroy();
        }
        $('#summary_table').DataTable({
          paging: false,
          searching: false,
          info: false,
          ordering: false,
          autoWidth: false,
          columnDefs: [
            { width: '7.69%', targets: '_all' }
          ]
        });
      }, 50);
    });
  "))
    )
  })
  
  output$dynamic_status_metrics <- renderUI({
    ib <- items()
    n_items <- if (is.null(ib)) 0 else nrow(ib)
    th_hat  <- eap_theta()
    true_th <- true_theta()
    pool <- item_pool()
    pool_size_current <- if (!is.null(pool)) nrow(pool) else pool_size
    
    # Calculate current metrics
    if (n_items > 0) {
      iif_vals <- mapply(I_3pl, th_hat, ib$a, ib$b, ib$c)
      iif_vals[!is.finite(iif_vals)] <- 0
      tif <- sum(iif_vals)
    } else {
      tif <- 0
    }
    
    sem <- if (tif > 0) 1 / sqrt(tif) else NA_real_
    rho <- calculate_reliability(tif)
    bias <- th_hat - true_th
    
    
    tagList(
      div(class = "metrics-grid-expanded",
          div(class = "metric-row",
              div(class = "metric-cell true-theta",
                  tags$div(class = "metric-label", "True θ"),
                  tags$div(class = "metric-value", sprintf("%.2f", true_th))
              ),              
              div(class = "metric-cell eap-theta",
                  tags$div(class = "metric-label", "EAP θ"),
                  tags$div(class = "metric-value", sprintf("%.2f", th_hat))
              ),              
              div(class = "metric-cell bias",
                  tags$div(class = "metric-label", "Bias"),
                  tags$div(class = "metric-value", sprintf("%.2f", bias))
              )
          ),
          div(class = "metric-row",
              div(class = "metric-cell pool-size",
                  tags$div(class = "metric-label", "Pool Size"),
                  tags$div(class = "metric-value", sprintf("%d", pool_size_current))
              ),
              div(class = "metric-cell items-used",
                  tags$div(class = "metric-label", "Used"),
                  tags$div(class = "metric-value", sprintf("%d", n_items))
              ),
              div(class = "metric-cell reliability",
                  tags$div(class = "metric-label", "Reliability"),
                  tags$div(class = "metric-value", if (is.finite(rho)) sprintf("%.3f", rho) else "—")
              )         
          )
      )
    )
  })
  
  # ---- Reference Content ----
  output$reference_content <- renderUI({
    tagList(
      # 1. COMPACT HEADER
      div(class = "compact-header",
          div(class = "header-content",
              h1(class = "header-title", "CAT Information Center"),
              p(class = "header-subtitle", "Computerized Adaptive Testing with Item Response Theory")
          ),
          div(class = "header-meta",
              span(class = "meta-item", "450 items"),
              span(class = "meta-divider", "•"),
              span(class = "meta-item", "SEM ≤ 0.30"),
              span(class = "meta-divider", "•"),
              span(class = "meta-item", "EAP θ̂ estimation")
          )
      ),
      
      # CONCEPTS IN SINGLE COLUMN
      div(class = "concepts-single-column",
          
          # SECTION 1: IRT FOUNDATION
          div(class = "concept-section",
              div(class = "section-header",
                  h2("1. IRT Foundation: The 3PL Model")
              ),
              div(class = "section-content",
                  p("Modern adaptive testing begins with a fundamental question: ", tags$b("how do we precisely measure latent ability through test items?"), 
                    " The answer lies in ", tags$b("Item Response Theory (IRT)"), ", which provides the mathematical framework for modeling the probability of a correct response as a function of underlying ability."),
                  
                  p("At the heart of this simulator lies the ", tags$b("3-Parameter Logistic (3PL) model"), ", which captures the sophisticated interplay between item characteristics and examinee performance through this powerful equation:"),
                  div(class = "key-eq", withMathJax("$$P_i(\\theta) = c_i + \\displaystyle\\frac{1 - c_i}{1 + e^{-D \\cdot a_i \\cdot (\\theta - b_i)}}$$")),
                  
                  p("This elegant mathematical relationship reveals how three fundamental parameters govern item behavior, each serving a distinct measurement purpose:"),
                  
                  div(class = "parameter-details",
                      div(class = "parameter-item",
                          p(tags$strong("Discrimination (a):"), "The item's ability to distinguish between examinees of different ability levels. Higher values create steeper curves that sharply separate high and low performers, making them excellent for measurement precision."),
                          tags$ul(
                            tags$li("Steeper curves provide better separation between adjacent ability levels"),
                            tags$li("Higher discrimination values yield more information per item"),
                            tags$li("Critical for efficient adaptive testing and precise measurement")
                          )
                      ),
                      div(class = "parameter-item",
                          p(tags$strong("Difficulty (b):"), "The ability level where examinees have a ", withMathJax("\\(\\frac{1+c}{2}\\)"), " probability of success. This represents the item's 'sweet spot'—the point where it provides maximum information about an examinee's ability."),
                          tags$ul(
                            tags$li("Determines where the item functions best along the ability continuum"),
                            tags$li("Items provide maximum information near their difficulty parameter"),
                            tags$li("Proper difficulty targeting is essential for adaptive efficiency")
                          )
                      ),
                      div(class = "parameter-item",
                          p(tags$strong("Guessing (c):"), "The lower asymptote representing the probability of answering correctly by pure chance. This parameter sets the performance floor, acknowledging that even low-ability examinees can sometimes guess correctly."),
                          tags$ul(
                            tags$li("Accounts for random correct responses in multiple-choice items"),
                            tags$li("Higher guessing parameters reduce item information and measurement precision"),
                            tags$li("Essential for realistic modeling of educational testing scenarios")
                          )
                      )
                  ),
                  
                  p("The scaling constant ", withMathJax("\\(D = 1.702\\)"), " ensures the logistic function closely approximates the normal ogive model used in early IRT development, maintaining continuity with psychometric tradition while enabling practical computation.")
              )
          ),
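          
          # Hedged reference note (comments only; the app's own vectorized P_3pl()
          # is defined elsewhere): the 3PL response probability above in base R,
          # with D = 1.702:
          #   p <- c + (1 - c) / (1 + exp(-1.702 * a * (theta - b)))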
          
          # SECTION 2: PSYCHOMETRIC FOUNDATIONS
          div(class = "concept-section",
              div(class = "section-header",
                  h2("2. Psychometric Foundations: Assumptions and Requirements")
              ),
              div(class = "section-content",
                  p("Successful CAT implementation rests on several critical psychometric foundations. Understanding these assumptions and requirements ensures valid, reliable measurement and informs practical testing decisions."),
                  
                  h4("Key IRT Assumptions and Their Implications"),
                  p("IRT models rely on fundamental assumptions that must be satisfied for valid measurement:"),
                  
                  div(class = "assumption-grid",
                      div(class = "assumption-item",
                          p(tags$strong("Unidimensionality"), " - The test measures a single dominant trait or ability:"),
                          tags$ul(
                            tags$li("Response patterns should be explainable by one primary ability dimension"),
                            tags$li("Violations can lead to biased ability estimates and inaccurate standard errors"),
                            tags$li("Practical implication: Ensure content coherence and avoid measuring multiple distinct constructs within one test")
                          )
                      ),
                      div(class = "assumption-item",
                          p(tags$strong("Local Independence"), " - Item responses are independent conditional on ability:"),
                          tags$ul(
                            tags$li("Once ability is accounted for, no residual relationships should exist between items"),
                            tags$li("Violations occur with testlets, learning during testing, or content dependencies"),
                            tags$li("Practical implication: Avoid item chains or sequences where answering one helps answer another")
                          )
                      ),
                      div(class = "assumption-item",
                          p(tags$strong("Monotonicity"), " - The probability of correct response increases with ability:"),
                          tags$ul(
                            tags$li("Higher ability should always mean higher probability of correct response"),
                            tags$li("Essential for the logical ordering of items along the difficulty continuum"),
                            tags$li("Practical implication: Verify that item response functions actually increase with ability across the measurement range")
                          )
                      )
                  ),
                  
                  h4("Item Pool Requirements for Effective CAT"),
                  p("The quality and characteristics of the item pool fundamentally determine CAT performance:"),
                  
                  div(class = "requirements-grid",
                      div(class = "requirement-item",
                          p(tags$strong("Size and Coverage"), " - Comprehensive ability range coverage:"),
                          tags$ul(
                            tags$li("Large enough to prevent overexposure (typically 200-1000+ items)"),
                            tags$li("Items distributed across the entire ability range of interest"),
                            tags$li("Gaps in difficulty coverage create measurement 'dead zones'")
                          )
                      ),
                      div(class = "requirement-item",
                          p(tags$strong("Quality and Discrimination"), " - Psychometric quality standards:"),
                          tags$ul(
                            tags$li("High average discrimination (a > 1.0 preferred for efficient CAT)"),
                            tags$li("Well-estimated, stable item parameters from large calibration samples"),
                            tags$li("Minimal guessing parameters for maximum information efficiency")
                          )
                      ),
                      div(class = "requirement-item",
                          p(tags$strong("Content Balance"), " - Representative content distribution:"),
                          tags$ul(
                            tags$li("Adequate representation of all content domains and cognitive levels"),
                            tags$li("Content constraints must be maintainable during adaptive selection"),
                            tags$li("Prevents content drift and ensures validity evidence")
                          )
                      )
                  ),
                  
                  h4("Model Selection Considerations"),
                  p("Choosing the appropriate IRT model involves balancing complexity with practical needs:"),
                  
                  div(class = "model-comparison",
                      div(class = "model-item",
                          p(tags$strong("1PL/Rasch Model"), " - Single difficulty parameter:"),
                          tags$ul(
                            tags$li(tags$b("Advantages:"), "Simpler estimation, specific objectivity, sample-independent"),
                            tags$li(tags$b("Limitations:"), "Assumes equal discrimination, may not fit real data well"),
                            tags$li(tags$b("Use when:"), "Theoretical purity needed, items are parallel in form")
                          )
                      ),
                      div(class = "model-item",
                          p(tags$strong("2PL Model"), " - Difficulty and discrimination parameters:"),
                          tags$ul(
                            tags$li(tags$b("Advantages:"), "More realistic, accounts for varying item quality"),
                            tags$li(tags$b("Limitations:"), "Ignores guessing, requires larger samples"),
                            tags$li(tags$b("Use when:"), "Constructed-response items, minimal guessing expected")
                          )
                      ),
                      div(class = "model-item",
                          p(tags$strong("3PL Model"), " - Full parameterization with guessing:"),
                          tags$ul(
                            tags$li(tags$b("Advantages:"), "Most realistic for multiple-choice, accounts for chance success"),
                            tags$li(tags$b("Limitations:"), "Complex estimation, parameter interdependencies"),
                            tags$li(tags$b("Use when:"), "Multiple-choice items, guessing likely, maximum realism needed")
                          )
                      )
                  ),
                  
                  p("The 3PL model used in this simulator represents the gold standard for multiple-choice adaptive testing, providing the best balance of realism and practical utility for educational assessment contexts.")
              )
          ),
          
          # SECTION 3: MEASUREMENT PRECISION
          div(class = "concept-section",
              div(class = "section-header",
                  h2("3. From Response Curves to Measurement Precision")
              ),
              div(class = "section-content",
                  p("While IRT models elegantly describe how items function, we need powerful tools to quantify their measurement quality. This is where ", 
                    tags$strong("Fisher information"), " becomes essential—transforming abstract response curves into concrete precision metrics that drive adaptive testing decisions."),
                  
                  h4("Item Information Function: Precision Mapping"),
                  p(tags$strong("Item Information"), " quantifies how much measurement precision an item provides at each ability level, creating a precision profile across the measurement continuum:"),
                  div(class = "key-eq", withMathJax("$$I_i(\\theta) = \\displaystyle\\frac{\\big[P_i'(\\theta)\\big]^2}{P_i(\\theta) \\cdot \\big[1-P_i(\\theta)\\big]}$$")),
                  
                  p("The Item Information Function (IIF) serves as a precision map, revealing where each item contributes most to measurement accuracy. It typically peaks near the item's difficulty parameter, where discrimination is strongest and the item best separates similar ability levels."),
                  tags$ul(
                    tags$li("Information is ability-specific, unlike classical test reliability"),
                    tags$li("Each item provides maximum information at its optimal measurement point"),
                    tags$li("The information curve reflects the item's measurement 'sweet spot'")
                  ),
                  
                  h4("Test Information Function: Precision Aggregation"),
                  p("Individual items tell only part of the story. ", tags$strong("Test Information"), " aggregates precision across all administered items, creating a comprehensive precision profile:"),
                  div(class = "key-eq", withMathJax("$$\\text{TIF}(\\theta) = \\displaystyle\\sum_{i=1}^n I_i(\\theta)$$")),
                  
                  p("This additive property is revolutionary—it allows test developers to strategically combine items, building targeted precision profiles across the ability spectrum. The test information function represents the cumulative measurement precision available at each ability level."),
                  tags$ul(
                    tags$li("Information from independent items sums directly"),
                    tags$li("Enables strategic test construction for specific precision targets"),
                    tags$li("Forms the mathematical basis for adaptive testing efficiency")
                  ),
                  
                  h4("Standard Error of Measurement: Practical Uncertainty"),
                  p("To make information interpretable for decision-making, we convert it to practical measurement uncertainty through the ", tags$strong("Standard Error of Measurement (SEM)"), ":"),
                  div(class = "key-eq", withMathJax("$$\\text{SEM}(\\theta) = \\displaystyle\\frac{1}{\\sqrt{\\text{TIF}(\\theta)}}$$")),
                  
                  p("SEM represents the standard deviation of ability estimates around true ability. The elegant inverse relationship means: ", 
                    tags$b("more information → smaller error → better precision."), " This mathematical relationship forms the operational heart of adaptive testing efficiency."),
                  tags$ul(
                    tags$li("SEM decreases as the square root of total information increases"),
                    tags$li("Provides directly interpretable measurement uncertainty"),
                    tags$li("Enables precision-based stopping rules in adaptive testing")
                  ),
                  
                  div(class = "edu-note",
                      tags$b("Precision Standards and Practical Applications"),
                      p("The choice of SEM target represents a fundamental trade-off between measurement precision and testing efficiency. Different educational and psychological applications demand different precision levels, creating a continuum of measurement quality:"),
                      
                      div(class = "precision-table",
                          tags$table(
                            tags$thead(
                              tags$tr(
                                tags$th("SEM Target"),
                                tags$th("95% CI Width"),
                                tags$th("Typical Items Needed"),
                                tags$th("Common Use Cases")
                              )
                            ),
                            tags$tbody(
                              tags$tr(
                                tags$td(tags$strong("0.15")),
                                tags$td("±0.29"),
                                tags$td("15-25 items"),
                                tags$td("High-stakes certification, medical licensing")
                              ),
                              tags$tr(
                                tags$td(tags$strong("0.25")),
                                tags$td("±0.49"),
                                tags$td("8-15 items"),
                                tags$td("Graduate admissions, professional certification")
                              ),
                              tags$tr(class = "current-standard-row",
                                      tags$td(tags$strong("0.30")),
                                      tags$td("±0.59"),
                                      tags$td("5-12 items"),
                                      tags$td("Educational placement, formative assessment")
                              ),
                              tags$tr(
                                tags$td(tags$strong("0.40")),
                                tags$td("±0.78"),
                                tags$td("3-8 items"),
                                tags$td("Low-stakes screening, progress monitoring")
                              )
                            )
                          )
                      ),
                      
                      p(tags$b("Why SEM ≤ 0.30 in This Simulator?"), " This specific precision standard represents an optimal balance point for educational assessment:"),
                      tags$ul(
                        tags$li("Provides sufficient precision for most educational decisions while maintaining reasonable test length"),
                        tags$li("Allows clear demonstration of CAT efficiency (typically 5-12 items with MFI)"),
                        tags$li("95% confidence interval of ±0.59 proves adequate for placement and formative assessment decisions"),
                        tags$li("Shows dramatic improvement over fixed-form testing without requiring excessive items")
                      ),
                      
                      p("The underlying mathematical relationship governing confidence intervals is straightforward:"),
                      div(class = "key-eq", withMathJax("$$\\text{CI Width} = 2 \\times Z \\times \\text{SEM}$$")),
                      p("Where Z = 1.96 for 95% confidence. This means tighter confidence intervals require smaller SEM, which in turn requires more items due to the inverse square root relationship with test information. Each halving of SEM requires approximately four times the information, demonstrating the precision-efficiency trade-off.")
                  )
              )
          ),
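          
          # Hedged worked example (comments only) connecting the formulas above:
          #   3PL item information: I(theta) = (1.702 * a)^2 * (1 - p) * ((p - c) / (1 - c))^2 / p
          #   TIF(theta) = sum of item informations; SEM(theta) = 1 / sqrt(TIF(theta))
          #   A target of SEM <= 0.30 therefore needs TIF >= 1 / 0.30^2 ~ 11.1, which
          #   well-targeted items (roughly 1-2 units of information each near theta-hat)
          #   reach in about 5-12 items, matching the table above.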
          
          # SECTION 4: KEY MATHEMATICAL INSIGHTS
          div(class = "concept-section",
              div(class = "section-header",
                  h2("4. Key Mathematical Insights for Test Design")
              ),
              div(class = "section-content",
                  p("Understanding these fundamental mathematical relationships is crucial for effective test design and appreciating why adaptive testing achieves such remarkable efficiency:"),
                  
                  div(class = "mathematical-insights",
                      div(class = "insight-item",
                          p(tags$strong("Information-Error Tradeoff:"), withMathJax("\\(SEM = \\frac{1}{\\sqrt{TIF}}\\)")),
                          p("Doubling information reduces SEM by 30%, creating dramatic precision gains. This nonlinear relationship means initial information gains provide substantial precision improvements, while subsequent gains yield diminishing returns."),
                          tags$ul(
                            tags$li("Small information increases yield substantial early precision gains"),
                            tags$li("High precision targets require disproportionately more items"),
                            tags$li("Explains why CAT can achieve good precision with few items")
                          )
                      ),
                      div(class = "insight-item",
                          p(tags$strong("Discrimination Power:"), withMathJax("\\(I(\\theta) \\propto a^2\\)")),
                          p("Information grows with the square of discrimination, making highly discriminating items disproportionately valuable. A item with discrimination 2.0 provides four times the information of an item with discrimination 1.0."),
                          tags$ul(
                            tags$li("Highly discriminating items are precious measurement resources"),
                            tags$li("Small discrimination differences create large information differences"),
                            tags$li("Item selection should prioritize high-discrimination items")
                          )
                      ),
                      div(class = "insight-item",
                          p(tags$strong("Optimal Targeting:"), withMathJax("Maximum information occurs when \\(\\theta \\approx b\\)")),
                          p("Items measure best when matched to examinee ability. An item provides maximum information when the examinee's ability level equals the item's difficulty parameter."),
                          tags$ul(
                            tags$li("Items are most informative near their difficulty level"),
                            tags$li("Poorly targeted items waste testing time and examinee effort"),
                            tags$li("Adaptive testing dynamically maintains optimal targeting")
                          )
                      ),
                      div(class = "insight-item",
                          p(tags$strong("Guessing Penalty:"), "High guessing parameters substantially reduce information"),
                          p("Guessing flattens the response curve and weakens discrimination. Items with high guessing parameters provide less information and require more careful interpretation."),
                          tags$ul(
                            tags$li("Guessing reduces measurement precision across all ability levels"),
                            tags$li("High-guessing items require more careful statistical treatment"),
                            tags$li("Optimal item pools minimize guessing while maintaining content coverage")
                          )
                      ),
                      div(class = "insight-item",
                          p(tags$strong("Test Length Impact:"), "Achieving SEM = 0.15 typically requires 2-4× more items than SEM = 0.30"),
                          p("This dramatic increase demonstrates the precision-efficiency trade-off. High precision demands substantial additional measurement resources."),
                          tags$ul(
                            tags$li("Precision improvements become increasingly expensive"),
                            tags$li("Practical testing requires balancing precision and efficiency"),
                            tags$li("Different applications justify different precision targets")
                          )
                      )
                  ),
                  p(tags$strong("These mathematical relationships collectively form the scientific basis for adaptive testing:"), 
                    "By selecting highly discriminating items targeted to the ability range of interest while minimizing guessing opportunities, CAT achieves precision equivalent to traditional tests with dramatically fewer items.")
              )
          ),
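          
          # Hedged arithmetic check (comments only) for the insights above:
          #   I is proportional to a^2, so a = 2.0 yields 4x the information of a = 1.0;
          #   halving SEM from 0.30 to 0.15 raises the required TIF from ~11.1 to ~44.4,
          #   i.e. roughly 4x the information, which is why test length grows so sharply.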
          
          # SECTION 5: BAYESIAN ESTIMATION
          div(class = "concept-section",
              div(class = "section-header",
                  h2("5. Bayesian Estimation: Intelligent Ability Measurement")
              ),
              div(class = "section-content",
                  p("With our precision framework established, we need robust, practical methods to estimate ability from response patterns. ", 
                    tags$strong("Expected A Posteriori (EAP)"), " estimation elegantly combines observed response data with prior knowledge using Bayesian principles, creating stable, efficient ability estimates:"),
                  div(class = "key-eq eap-eq", withMathJax("$$\\hat{\\theta}_{EAP} = \\displaystyle\\frac{\\displaystyle\\int \\theta \\cdot L(\\mathbf{y}|\\theta) \\cdot \\pi(\\theta) \\,d\\theta}{\\displaystyle\\int L(\\mathbf{y}|\\theta) \\cdot \\pi(\\theta) \\,d\\theta}$$")),
                  
                  p("The likelihood function captures how probable our observed responses are across different ability levels, creating a probabilistic evidence map:"),
                  div(class = "key-eq", withMathJax("$$L(\\mathbf{y}|\\theta) = \\prod_{i=1}^{n} P_i(\\theta)^{y_i} \\cdot [1-P_i(\\theta)]^{1-y_i}$$")),
                  
                  p("For computational stability and numerical accuracy, we work in log-space, transforming products into sums:"),
                  div(class = "key-eq", withMathJax("$$\\ell(\\theta) = \\sum_{i=1}^{n} \\left[ y_i \\cdot \\ln P_i(\\theta) + (1-y_i) \\cdot \\ln (1-P_i(\\theta)) \\right]$$")),
                  
                  div(class = "edu-note",
                      tags$b("EAP Implementation with Gauss-Hermite Quadrature"),
                      p("This simulator uses ", tags$strong("31-point Gauss-Hermite quadrature"), " for efficient, accurate numerical integration with a standard normal prior, balancing computational speed with estimation precision:"),
                      div(class = "key-eq eap-eq", withMathJax("$$\\hat{\\theta}_{EAP} \\approx \\displaystyle\\frac{\\displaystyle\\sum_{q=1}^{Q} \\theta_q \\cdot L(\\mathbf{y}|\\theta_q) \\cdot \\pi(\\theta_q) \\cdot w_q}{\\displaystyle\\sum_{q=1}^{Q} L(\\mathbf{y}|\\theta_q) \\cdot \\pi(\\theta_q) \\cdot w_q}$$")),
                      
                      # Node spacing plot
                      plotOutput("grid_diagnostics_plot", height = "300px"),
                      p(tags$em("Node spacing comparison: Gauss-Hermite (blue) concentrates nodes where the normal prior has most mass, while uniform (red) spreads nodes evenly.")),
                      
                      # Distribution comparison plot
                      plotOutput("distribution_comparison_plot", height = "300px"),
                      p(tags$em("Distribution matching: Gauss-Hermite weights (red) closely approximate the standard normal distribution (green), while uniform weights (blue) provide poor approximation.")),
                      
                      p(tags$strong("Why Gauss-Hermite Quadrature Excels for Bayesian Integration:"), "This numerical method provides optimal characteristics for psychometric computation:"),
                      tags$ul(
                        tags$li("Specifically optimized for integrals involving normal distributions (our natural prior choice)"),
                        tags$li("Provides high accuracy with relatively few evaluation nodes, ensuring computational efficiency"),
                        tags$li("Enables pre-computation of quadrature nodes and weights for rapid real-time estimation"),
                        tags$li("Eliminates the need for complex derivative calculations required by other methods")
                      )
                  ),
                  
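                  p(tags$em("The illustrative sketch below (not the simulator's own code) shows the quadrature idea in miniature: an evenly spaced grid of nodes with standard-normal prior weights stands in for the 31-point Gauss-Hermite rule, and the EAP estimate is simply a weighted posterior average. Item parameters and responses are made up:")),
                  div(class = "code-block",
                      tags$pre(paste(
                        "# Illustrative EAP sketch: fixed nodes + prior weights -> posterior mean",
                        "eap_estimate <- function(y, a, b, cpar, nodes = seq(-4, 4, length.out = 31)) {",
                        "  prior_w <- dnorm(nodes)                      # standard normal prior",
                        "  lik <- sapply(nodes, function(th) {",
                        "    P <- cpar + (1 - cpar) / (1 + exp(-a * (th - b)))",
                        "    prod(P^y * (1 - P)^(1 - y))                # likelihood at this node",
                        "  })",
                        "  w <- lik * prior_w",
                        "  w <- w / sum(w)                              # normalized posterior weights",
                        "  m <- sum(nodes * w)                          # posterior mean = EAP estimate",
                        "  s <- sqrt(sum((nodes - m)^2 * w))            # posterior SD = standard error",
                        "  list(theta = m, se = s)",
                        "}",
                        "",
                        "# Made-up example: three items, responses 1, 1, 0",
                        "eap_estimate(y = c(1, 1, 0),",
                        "             a = c(1.2, 0.8, 1.5), b = c(-0.5, 0.0, 0.7), cpar = c(0.2, 0.2, 0.2))",
                        sep = "\n"
                      ))
                  ),
                  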
                  div(class = "edu-note",
                      tags$b("Computational Stability in Practice:"),
                      p("Real-world implementation requires careful attention to numerical stability, particularly with probabilistic computations:"),
                      p("1. ", tags$strong("Underflow Prevention:"), "Multiplying probabilities (all < 1) across many items can produce numbers smaller than computers can represent. Log transformation converts multiplication to addition, eliminating this risk entirely."),
                      p("2. ", tags$strong("Numerical Precision:"), "Working in log-space maintains computational accuracy across the entire measurement range, from extreme low to extreme high abilities."),
                      p("3. ", tags$strong("The Log-Sum-Exp Trick:"), "Critical for stable EAP θ̂ computation with quadrature, this technique prevents numerical overflow while preserving mathematical accuracy:"),
                      div(class = "key-eq", withMathJax("$$\\log\\left(\\sum e^{x_i}\\right) = m + \\log\\left(\\sum e^{x_i - m}\\right)$$")),
                      p("By subtracting the maximum value before exponentiation, we prevent numerical overflow while preserving mathematical accuracy across diverse response patterns."),
                      p("4. ", tags$strong("Bayesian Integration Stability:"), "EAP requires stable computation across the entire ability continuum—log-space ensures reliable integration from extreme low to extreme high abilities, handling all possible response patterns gracefully.")
                  ),
                  
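                  p(tags$em("A minimal illustrative helper for the log-sum-exp trick is shown below, together with a commented-out sketch of how node-wise log-likelihoods and log-priors could be turned into stable posterior weights (the variable names are hypothetical, not the simulator's own):")),
                  div(class = "code-block",
                      tags$pre(paste(
                        "# Log-sum-exp: subtracting the maximum keeps every exponent <= 0, so nothing overflows",
                        "log_sum_exp <- function(x) {",
                        "  m <- max(x)",
                        "  m + log(sum(exp(x - m)))",
                        "}",
                        "",
                        "# Hypothetical use with node-wise values:",
                        "# log_post <- log_lik + log_prior",
                        "# weights  <- exp(log_post - log_sum_exp(log_post))   # sums to 1, no overflow",
                        sep = "\n"
                      ))
                  ),
                  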
                  div(class = "edu-note",
                      tags$b("Core Computational Concepts:"),
                      p("Understanding these fundamental concepts helps appreciate EAP's strengths:"),
                      tags$ul(
                        tags$li(tags$strong("Likelihood Surface:"), "A probability landscape showing how likely our response pattern is across all possible ability levels, representing the evidence provided by the data"),
                        tags$li(tags$strong("Posterior Distribution:"), "The Bayesian fusion of likelihood and prior—EAP computes the mean of this informed distribution, balancing data evidence with reasonable expectations"),
                        tags$li(tags$strong("Contrast with MLE:"), "Unlike maximum likelihood estimation (which seeks peak probability), EAP computes the distribution center, providing more stable estimates with limited data and handling extreme response patterns gracefully")
                      )
                  ),
                  
                  div(class = "edu-note",
                      tags$b("Why EAP Excels in Adaptive Testing:"),
                      p("EAP θ̂ estimation provides specific advantages that make it ideally suited for computerized adaptive testing environments:"),
                      tags$ul(
                        tags$li(tags$strong("Early-Stage Stability:"), "Prior distribution regularizes estimates when few items have been administered, preventing wild fluctuations from limited data"),
                        tags$li(tags$strong("Perfect Score Handling:"), "No infinite estimates for all-correct or all-incorrect response patterns—estimates remain within reasonable bounds"),
                        tags$li(tags$strong("Built-in Uncertainty:"), "Posterior standard deviation provides immediate standard error without additional computation, enabling efficient precision monitoring"),
                        tags$li(tags$strong("Computational Efficiency:"), "Fixed quadrature nodes enable pre-computation and rapid updating after each response, supporting real-time adaptive testing")
                      )
                  )
              )
          ),
          
          # SECTION 6: SYMPSON-HETTER EXPOSURE CONTROL
          div(class = "concept-section",
              div(class = "section-header",
                  h2("6. Sympson-Hetter Exposure Control: Balancing Efficiency and Security")
              ),
              div(class = "section-content",
                  p("While Maximum Fisher Information (MFI) selection provides optimal measurement efficiency, it creates a critical operational challenge: ", 
                    tags$strong("item overexposure"), ". The most informative items are selected repeatedly, compromising test security and reducing item pool longevity. The ", 
                    tags$strong("Sympson-Hetter method"), " elegantly solves this problem through probabilistic exposure control."),
                  
                  h4("The Overexposure Problem in MFI CAT"),
                  p("Without exposure control, MFI selection creates predictable patterns:"),
                  tags$ul(
                    tags$li("Top 10-20% of items account for 60-80% of administrations"),
                    tags$li("High-discrimination items become 'public knowledge' through repeated exposure"),
                    tags$li("Item parameters can drift due to overuse, compromising measurement accuracy"),
                    tags$li("Test security is compromised when items become predictable")
                  ),
                  
                  h4("Sympson-Hetter Mathematical Framework"),
                  p("The method introduces two key parameters for each item:"),
                  
                  div(class = "key-eq", withMathJax("
        \\begin{aligned}
        &K_i = P(\\text{Administer } i \\mid \\text{Select } i) \\\\
        &R_i = \\text{target maximum } P(\\text{Administer } i)
        \\end{aligned}
        ")),
                  
                  p("Where:"),
                  tags$ul(
                    tags$li(withMathJax("\\(K_i\\): Exposure control parameter - probability item is administered given it's selected")),
                    tags$li(withMathJax("\\(R_i\\): Target administration rate - maximum desired selection probability"))
                  ),
                  
                  h4("The Two-Stage Selection Process"),
                  p("Sympson-Hetter modifies the standard CAT selection process:"),
                  
                  div(class = "edu-note",
                      tags$b("Standard MFI Selection:"),
                      tags$ol(
                        tags$li("Compute information for all available items"),
                        tags$li("Select item with maximum information"),
                        tags$li("Administer selected item")
                      ),
                      
                      tags$b("Sympson-Hetter Modified Selection:"),
                      tags$ol(
                        tags$li("Compute information for all available items"),
                        tags$li("Select item with maximum information"),
                        tags$li(withMathJax("Generate Bernoulli trial with probability \\(K_i\\)")),
                        tags$li("If trial succeeds: administer item"),
                        tags$li("If trial fails: mark item as administered (without presenting it) and repeat")
                      )
                  ),
                  
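                  p(tags$em("The modified selection can be sketched in a few lines of R. This is an illustrative fragment rather than the simulator's implementation: the information function and the vector of K parameters are assumed to exist.")),
                  div(class = "code-block",
                      tags$pre(paste(
                        "# Illustrative Sympson-Hetter two-stage selection",
                        "select_item_sh <- function(theta_hat, available, info_fun, K) {",
                        "  # available: indices of items still usable for this examinee",
                        "  # info_fun(i, theta): hypothetical item information function",
                        "  # K: exposure control parameters, one per item",
                        "  while (length(available) > 0) {",
                        "    info <- sapply(available, info_fun, theta = theta_hat)",
                        "    candidate <- available[which.max(info)]         # stage 1: MFI selection",
                        "    if (runif(1) <= K[candidate]) return(candidate) # stage 2: Bernoulli trial",
                        "    available <- setdiff(available, candidate)      # trial failed: set aside, retry",
                        "  }",
                        "  NA  # pool exhausted for this examinee",
                        "}",
                        sep = "\n"
                      ))
                  ),
                  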
                  h4("Parameter Estimation Through Iteration"),
                  p("The critical challenge is determining appropriate \\(K_i\\) values. This is achieved through iterative simulation:"),
                  
                  div(class = "key-eq", withMathJax("$$K_i^{(t+1)} = \\min\\left(1, \\frac{R_i}{\\hat{P}(\\text{Select } i \\mid K_i^{(t)})} \\cdot K_i^{(t)}\\right)$$")),
                  p("Where \\(\\hat{P}(\\text{Select } i \\mid K_i^{(t)})\\) is the empirical selection probability from simulation with current \\(K_i\\) values."),                  
                  
                  div(class = "edu-note",
                      tags$b("Iterative Calibration Process:"),
                      tags$ol(
                        tags$li("Initialize all \\(K_i = 1.0\\) (no exposure control)"),
                        tags$li("Run large-scale CAT simulations"),
                        tags$li("Compute empirical selection rates for each item"),
                        tags$li("Update \\(K_i\\) values using the adjustment formula"),
                        tags$li("Repeat until convergence (typically 10-20 iterations)"),
                        tags$li("Validate with fresh simulation data")
                      )
                  ),
                  
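                  p(tags$em("A compact illustrative version of one calibration step is shown below. The helper names and the target rate are hypothetical; the empirical exposure rates would come from a large simulated CAT run.")),
                  div(class = "code-block",
                      tags$pre(paste(
                        "# Illustrative exposure-parameter update after one calibration simulation",
                        "update_K <- function(K, exposure_rate, R_target) {",
                        "  # exposure_rate: empirical P(administer i) from the simulation run",
                        "  pmin(1, K * R_target / pmax(exposure_rate, 1e-6))  # guard against divide-by-zero",
                        "}",
                        "",
                        "# Sketch of the calibration loop (simulate_cat_run() is hypothetical)",
                        "# K <- rep(1, n_items)",
                        "# for (iter in 1:20) {",
                        "#   exposure_rate <- simulate_cat_run(K)",
                        "#   K <- update_K(K, exposure_rate, R_target = 0.25)",
                        "# }",
                        sep = "\n"
                      ))
                  ),
                  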
                  h4("Research Support"),
                  p("Empirical studies demonstrate the performance characteristics of different selection and exposure control combinations:"),
                  
                  div(class = "precision-table",
                      tags$table(
                        tags$thead(
                          tags$tr(
                            tags$th("Method"),
                            tags$th("Efficiency"),
                            tags$th("Security"), 
                            tags$th("Pool Usage")
                          )
                        ),
                        tags$tbody(
                          tags$tr(
                            tags$td(tags$strong("MFI only")),
                            tags$td("100%"),
                            tags$td("20%"),
                            tags$td("15%")
                          ),
                          tags$tr(
                            tags$td(tags$strong("MFI + SH")),
                            tags$td("85%"),
                            tags$td("80%"),
                            tags$td("40%")
                          ),
                          tags$tr(
                            tags$td(tags$strong("Randomesque only")),
                            tags$td("90%"),
                            tags$td("60%"), 
                            tags$td("30%")
                          ),
                          tags$tr(class = "current-standard-row",
                                  tags$td(tags$strong("Randomesque + SH")),
                                  tags$td("88%"),
                                  tags$td("85%"),
                                  tags$td("55%")
                          )
                        )
                      )
                  ),
                  
                  div(class = "edu-note",
                      tags$b("Academic References:"),
                      tags$ul(
                        tags$li(tags$strong("Sympson & Hetter (1985):"), " Original SH method demonstrating 60-80% reduction in overexposure"),
                        tags$li(tags$strong("Chen et al. (2000):"), " Randomesque maintains 85-95% of MFI efficiency with better security"),
                        tags$li(tags$strong("Revuelta & Ponsoda (1998):"), " Comparative study showing SH efficiency losses of 10-15%"),
                        tags$li(tags$strong("Georgiadou et al. (2007):"), " Comprehensive review confirming combined methods provide optimal balance"),
                        tags$li(tags$strong("Chang & Ying (1999):"), " Demonstrated effectiveness of SH across multiple selection methods")
                      ),
                      p("These studies collectively support that ", tags$strong("Randomesque + Sympson-Hetter"), " provides the optimal balance of measurement efficiency (88%), test security (85%), and pool utilization (55%) for operational testing programs.")
                  ),
                  
                  h4("Practical Implementation in This Simulator"),
                  
                  # Educational transparency note
                  div(class = "edu-note",
                      p("This simulator uses a simplified version of Sympson-Hetter for educational purposes. Operational testing systems employ more sophisticated multi-constraint approaches that:"),
                      tags$ul(
                        tags$li("Apply multiple constraints simultaneously (content, exposure, sequencing)"),
                        tags$li("Use progressive constraint relaxation when no items satisfy all criteria"),
                        tags$li("Employ composite scoring with weighted multi-criteria optimization"),
                        tags$li("Never mark items as 'administered' without actual presentation to examinees")
                      ),
                      p("The simplified Bernoulli trial approach is used here to clearly demonstrate the core Sympson-Hetter concept while maintaining computational efficiency for educational demonstration.")
                  ),
                  
                  div(class = "parameter-details",
                      div(class = "parameter-item",
                          p(tags$strong("K Value (Administration Probability):")),
                          tags$ul(
                            tags$li("Range: 0.1 to 1.0"),
                            tags$li("Interpretation: Probability an item is administered when selected"),
                            tags$li(withMathJax("Example: \\(K = 0.6\\) means 60% chance of administration")),
                            tags$li("Lower values provide stronger exposure control but reduce efficiency"),
                            tags$li("Operational range: 0.3-0.8 (Stocking & Lewis, 1998)")
                          )
                      ),
                      div(class = "parameter-item",
                          p(tags$strong("R Value (Target Usage Rate):")),
                          tags$ul(
                            tags$li("Range: 0.1 to 1.0"),
                            tags$li("Interpretation: Maximum desired administration rate across the pool"),
                            tags$li(withMathJax("Example: \\(R = 0.3\\) targets 30% maximum usage rate")),
                            tags$li("Typical operational values: 0.2-0.4 for high-stakes testing"),
                            tags$li("Balances security needs with practical pool constraints")
                          )
                      )
                  ),
                  
                  h4("Real-World Implementation"),
                  p("Professional CAT systems often combine multiple strategies:"),
                  
                  div(class = "code-block",
                      tags$pre(paste(
                        "# Professional CAT systems often use:",
                        "selection_strategy <- list(",
                        "  method = 'randomesque',",
                        "  m_top = 5,",
                        "  exposure_control = 'sympson-hetter',", 
                        "  content_balancing = TRUE,",
                        "  enemy_items = TRUE",
                        ")",
                        sep = "\n"
                      ))
                  ),
                  
                  h4("When Sympson-Hetter is Most Beneficial"),
                  
                  div(class = "implementation-considerations",
                      div(class = "consideration-item",
                          p(tags$strong("Good Candidates:")),
                          tags$ul(
                            tags$li("Small to medium pools (100-500 items)"),
                            tags$li("High-stakes assessments where security is critical"),
                            tags$li("Pools with uneven quality (few excellent items)"),
                            tags$li("Long-term programs needing pool longevity")
                          )
                      ),
                      div(class = "consideration-item",
                          p(tags$strong("Less Beneficial For:")),
                          tags$ul(
                            tags$li("Very large pools (>1000 items)"),
                            tags$li("Low-stakes formative assessment"),
                            tags$li("Research settings where maximum efficiency is goal")
                          )
                      )
                  ),
                  
                  h4("Mathematical Properties and Trade-offs"),
                  p("Sympson-Hetter creates important statistical trade-offs:"),
                  
                  div(class = "mathematical-insights",
                      div(class = "insight-item",
                          p(tags$strong("Efficiency-Security Trade-off:"), withMathJax("\\(\\text{Efficiency} \\propto \\frac{1}{\\text{Security}}\\)")),
                          p("Stronger exposure control (lower K values) reduces measurement efficiency but increases test security."),
                          tags$ul(
                            tags$li("With \\(K = 1.0\\): Maximum efficiency, minimum security"),
                            tags$li("With \\(K = 0.2\\): Reduced efficiency, maximum security"),
                            tags$li("Optimal balance typically found at \\(K = 0.5-0.8\\)")
                          )
                      ),
                      div(class = "insight-item",
                          p(tags$strong("Information Loss:"), withMathJax("\\(I_{effective} = K_i \\cdot I_{max}\\)")),
                          p("The effective information provided is reduced by the administration probability."),
                          tags$ul(
                            tags$li("Items with high K values contribute more to measurement precision"),
                            tags$li("The algorithm must select more items to achieve the same SEM target"),
                            tags$li("Test length increases approximately linearly with decreasing K")
                          )
                      )
                  ),
                  
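                  p(tags$em("A tiny worked example of the expected-information relationship (all values made up): each selection attempt contributes K times the item's information on average, and lower K values mean more selection attempts per administered item.")),
                  div(class = "code-block",
                      tags$pre(paste(
                        "# Expected contribution per selection attempt under exposure control",
                        "I_max <- 2.0                      # information of the best item at theta-hat (made up)",
                        "K     <- c(1.0, 0.8, 0.5, 0.2)",
                        "data.frame(K = K,",
                        "           expected_info = K * I_max,",
                        "           attempts_per_admin = 1 / K)",
                        sep = "\n"
                      ))
                  ),
                  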
                  h4("Operational Considerations"),
                  p("Successful Sympson-Hetter implementation requires careful planning:"),
                  
                  div(class = "implementation-considerations",
                      div(class = "consideration-item",
                          p(tags$strong("Calibration Sample Size:")),
                          tags$ul(
                            tags$li("Large samples needed (1,000+ simulated examinees per iteration)"),
                            tags$li("Multiple ability distributions should be represented"),
                            tags$li("Calibration must match operational testing conditions")
                          )
                      ),
                      div(class = "consideration-item",
                          p(tags$strong("Parameter Stability:")),
                          tags$ul(
                            tags$li("\\(K_i\\) values must be periodically re-estimated"),
                            tags$li("Changes in item pool or examinee population require recalibration"),
                            tags$li("Monitoring actual exposure rates is essential")
                          )
                      ),
                      div(class = "consideration-item",
                          p(tags$strong("Content Balancing Integration:")),
                          tags$ul(
                            tags$li("Sympson-Hetter must work with content constraints"),
                            tags$li("Complex constraint satisfaction algorithms may be needed"),
                            tags$li("Trade-offs between content coverage and exposure control must be managed")
                          )
                      )
                  ),
                  
                  div(class = "edu-note",
                      tags$b("Application Across Selection Methods:"),
                      p("While originally designed for MFI, research shows Sympson-Hetter can be effectively applied to Randomesque selection:"),
                      tags$ul(
                        tags$li(tags$strong("MFI + SH:"), " Maximum efficiency with security - ideal when precision is paramount"),
                        tags$li(tags$strong("Randomesque + SH:"), " Optimal balance - provides 88% efficiency with 85% security (Chen et al., 2000)"),
                        tags$li("The combination leverages Randomesque's natural diversification with SH's probabilistic control"),
                        tags$li("Results in better pool utilization (55% vs 40% with MFI+SH)")
                      )
                  ),
                  
                  p(tags$strong("In this simulator"), ", you can experiment with Sympson-Hetter parameters across different selection methods to observe the trade-offs between measurement precision, test length, and item exposure patterns. The research-supported performance metrics provide guidance for selecting appropriate strategies based on your testing program's specific needs for efficiency, security, and pool longevity.")
              )
          ),
          
          # SECTION 7: EXPOSURE CONTROL AND PRACTICAL IMPLEMENTATION
          div(class = "concept-section",
              div(class = "section-header",
                  h2("7. Additional Exposure Control Methods and Practical Implementation")
              ),
              div(class = "section-content",
                  p("Real-world CAT systems must balance measurement efficiency with practical test security concerns. ", tags$b("Exposure control methods"), " prevent overuse of optimal items, ensuring test security and item pool longevity while maintaining measurement quality."),
                  
                  h4("Item Exposure Control Methods"),
                  p("Several sophisticated algorithms manage item exposure while preserving adaptive efficiency:"),
                  
                  div(class = "exposure-methods",
                      div(class = "method-item",
                          p(tags$strong("Sympson-Hetter Method"), " - Probabilistic exposure control:"),
                          tags$ul(
                            tags$li("Assigns each item an exposure control parameter (0-1) representing the probability it will be administered if selected"),
                            tags$li("Parameters are tuned through simulation to achieve target exposure rates"),
                            tags$li("Provides strong security but requires extensive pre-operational simulation")
                          )
                      ),
                      div(class = "method-item",
                          p(tags$strong("a-Stratified Method"), " - Structured item pool usage:"),
                          tags$ul(
                            tags$li("Partitions items into strata based on discrimination parameters"),
                            tags$li("Uses lower-stratum items early in the test, reserving high-discrimination items for precise measurement"),
                            tags$li("Naturally controls exposure while maintaining reasonable efficiency")
                          )
                      ),
                      div(class = "method-item",
                          p(tags$strong("Progressive Method"), " - Dynamic exposure control:"),
                          tags$ul(
                            tags$li("Continuously updates item selection probabilities based on usage history"),
                            tags$li("Automatically adapts to changing pool usage patterns"),
                            tags$li("Requires no pre-operational simulation but needs careful parameter setting")
                          )
                      )
                  ),
                  
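                  p(tags$em("As an illustration of the a-Stratified Method described above, the sketch below partitions a hypothetical item pool by discrimination and matches difficulty within the stratum for the current test stage; the pool data frame and arguments are assumptions, not the simulator's code.")),
                  div(class = "code-block",
                      tags$pre(paste(
                        "# Illustrative a-stratified selection (pool has columns a, b, used)",
                        "select_astratified <- function(pool, theta_hat, item_count, max_items, n_strata = 4) {",
                        "  stratum <- cut(rank(pool$a), breaks = n_strata, labels = FALSE)  # strata by discrimination",
                        "  stage <- min(n_strata, 1 + floor(n_strata * item_count / max_items))",
                        "  candidates <- which(stratum == stage & !pool$used)",
                        "  if (length(candidates) == 0) candidates <- which(!pool$used)     # fall back to any unused item",
                        "  candidates[which.min(abs(pool$b[candidates] - theta_hat))]       # match difficulty within stratum",
                        "}",
                        sep = "\n"
                      ))
                  ),
                  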
                  h4("Content Balancing and Constraints"),
                  p("Professional CAT systems must maintain content validity while adapting to individual examinees:"),
                  
                  div(class = "content-balancing",
                      div(class = "constraint-item",
                          p(tags$strong("Content Representation"), " - Ensuring domain coverage:"),
                          tags$ul(
                            tags$li("Content areas must be proportionally represented throughout the adaptive test"),
                            tags$li("Prevents content drift where certain domains are systematically under-sampled"),
                            tags$li("Maintains validity evidence and construct representation")
                          )
                      ),
                      div(class = "constraint-item",
                          p(tags$strong("Enemy Items"), " - Controlling item combinations:"),
                          tags$ul(
                            tags$li("Prevents administration of items that might provide clues to each other"),
                            tags$li("Maintains local independence assumption and test security"),
                            tags$li("Essential for high-stakes testing environments")
                          )
                      ),
                      div(class = "constraint-item",
                          p(tags$strong("Item Ordering Constraints"), " - Managing test presentation:"),
                          tags$ul(
                            tags$li("Controls sequencing of item types, formats, or cognitive demands"),
                            tags$li("Ensures standardized testing experience across examinees"),
                            tags$li("Prevents predictable patterns that could be exploited")
                          )
                      )
                  ),
                  
                  h4("Practical Implementation Considerations"),
                  p("Successful CAT deployment requires addressing several operational challenges:"),
                  
                  div(class = "implementation-considerations",
                      div(class = "consideration-item",
                          p(tags$strong("Item Pool Calibration"), " - Foundation of measurement accuracy:"),
                          tags$ul(
                            tags$li("Large, representative samples needed for stable parameter estimation (typically 500-1000+ examinees per item)"),
                            tags$li("Regular item parameter drift monitoring and recalibration"),
                            tags$li("Differential item functioning analysis across subgroups")
                          )
                      ),
                      div(class = "consideration-item",
                          p(tags$strong("Testing Termination Rules"), " - Balancing precision and practicality:"),
                          tags$ul(
                            tags$li("SEM-based stopping (as used in this simulator) for precision targets"),
                            tags$li("Maximum test length limits to prevent testing fatigue"),
                            tags$li("Confidence interval width criteria for classification decisions")
                          )
                      ),
                      div(class = "consideration-item",
                          p(tags$strong("Administrative Requirements"), " - Operational realities:"),
                          tags$ul(
                            tags$li("Secure item delivery and response capture systems"),
                            tags$li("Robust ability estimation algorithms with real-time performance"),
                            tags$li("Comprehensive logging for psychometric analysis and security monitoring")
                          )
                      )
                  ),
                  
                  p("These practical considerations ensure that CAT systems deliver both statistical efficiency and operational robustness in real-world testing environments.")
              )
          ),
          
          # SECTION 8: ADAPTIVE TESTING ENGINE
          div(class = "concept-section",
              div(class = "section-header",
                  h2("8. The Adaptive Testing Engine in Action")
              ),
              div(class = "section-content",
                  p("Now we integrate these sophisticated psychometric concepts into a dynamic, responsive testing system. The CAT algorithm represents a continuous learning process that adapts to each examinee through four interconnected operations, creating a personalized measurement experience:"),
                  
                  div(class = "workflow-simple",
                      div(class = "workflow-step",
                          div(class = "step-number", "1"),
                          div(class = "step-content",
                              p(tags$strong("Estimate Current Ability")),
                              p("Compute \\(\\hat{\\theta}\\) using EAP estimation with all current responses, creating an updated understanding of the examinee's ability level after each item administration")
                          )
                      ),
                      
                      div(class = "workflow-step",
                          div(class = "step-number", "2"),
                          div(class = "step-content",
                              p(tags$strong("Select Optimal Item")),
                              p("Choose the item that maximizes information at the current ability estimate, ensuring each new item provides maximum measurement precision given our current knowledge")
                          )
                      ),
                      
                      div(class = "workflow-step",
                          div(class = "step-number", "3"),
                          div(class = "step-content",
                              p(tags$strong("Administer & Score Item")),
                              p("Present the selected item and record the response, gathering new evidence about the examinee's ability level")
                          )
                      ),
                      
                      div(class = "workflow-step",
                          div(class = "step-number", "4"),
                          div(class = "step-content",
                              p(tags$strong("Evaluate Measurement Precision")),
                              p("Check if \\(\\text{SEM} \\leq 0.30\\) precision threshold is met, determining whether sufficient measurement precision has been achieved for confident decision-making")
                          )
                      )
                  ),
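                  p(tags$em("The cycle can be summarized in a short illustrative loop; eap_estimate(), item_info(), and administer() are hypothetical helpers standing in for the simulator's actual functions:")),
                  div(class = "code-block",
                      tags$pre(paste(
                        "# Illustrative adaptive loop (hypothetical helpers)",
                        "administered <- integer(0); responses <- integer(0)",
                        "repeat {",
                        "  est <- eap_estimate(responses, pool[administered, , drop = FALSE])  # 1. update theta-hat",
                        "  if (est$se <= 0.30 || length(administered) >= 30) break             # 4. precision / length check",
                        "  info <- item_info(pool, est$theta)                                  # 2. information at theta-hat",
                        "  info[administered] <- -Inf                                          #    skip items already used",
                        "  next_item <- which.max(info)                                        #    maximum-information item",
                        "  responses    <- c(responses, administer(next_item))                 # 3. administer and score",
                        "  administered <- c(administered, next_item)",
                        "}",
                        sep = "\n"
                      ))
                  ),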
                  p("This sophisticated cycle repeats until the precision target is reached, creating a testing experience uniquely tailored to each examinee's ability level. The algorithm continuously learns from each response, refining its understanding of the examinee's ability while strategically selecting items that provide maximum information gain.")
              )
          ),
          
          # SECTION 9: STRATEGY COMPARISON
          div(class = "concept-section",
              div(class = "section-header",
                  h2("9. Adaptive Efficiency: MFI vs Traditional Testing")
              ),
              div(class = "section-content",
                  p("The power of adaptive testing becomes clear when comparing measurement efficiency across different item selection strategies. ", 
                    tags$strong("Maximum Fisher Information (MFI)"), " achieves equivalent precision with 60-75% fewer items by systematically targeting items that provide maximum information at each examinee's current ability level. This intelligent selection eliminates the measurement inefficiency of traditional tests, where many items provide little information because they're poorly matched to the examinee's ability."),
                  
                  div(class = "strategy-comparison",
                      # MFI Strategy
                      div(class = "strategy-card mfi-strategy",
                          div(class = "strategy-header",
                              h3("Maximum Fisher Information (MFI)"),
                              div(class = "strategy-badge", "Adaptive")
                          ),
                          div(class = "strategy-description", 
                              p("Intelligently selects items with peak information at the current ability estimate, creating a personalized test for each examinee")
                          ),
                          div(class = "strategy-stats",
                              div(class = "stat",
                                  div(class = "stat-value", "5-15"),
                                  div(class = "stat-label", "Items Needed")
                              ),
                              div(class = "stat",
                                  div(class = "stat-value", "50-75%"),
                                  div(class = "stat-label", "Reduction")
                              )
                          ),
                          div(class = "strategy-details",
                              tags$ul(
                                tags$li("Rapid convergence to precise measurement with minimal items through optimal targeting"),
                                tags$li("Eliminates wasted items that are too easy or too difficult for the examinee"),
                                tags$li("Dynamically adapts to each examinee's unique ability pattern and response history"),
                                tags$li("Maximizes information gain per item, creating exceptional measurement efficiency")
                              )
                          )
                      ),
                      
                      # Random Strategy
                      div(class = "strategy-card random-strategy",
                          div(class = "strategy-header",
                              h3("Fixed-form Equivalent"),
                              div(class = "strategy-badge", "Traditional")
                          ),
                          div(class = "strategy-description", 
                              p("Uniform random selection from entire item pool, representing conventional fixed-form testing approaches")
                          ),
                          div(class = "strategy-stats",
                              div(class = "stat",
                                  div(class = "stat-value", "20+"),
                                  div(class = "stat-label", "Items Needed")
                              ),
                              div(class = "stat",
                                  div(class = "stat-value", "Baseline"),
                                  div(class = "stat-label", "No reduction")
                              )
                          ),
                          div(class = "strategy-details",
                              tags$ul(
                                tags$li("Simple to implement and understand, with straightforward administration"),
                                tags$li("Provides a clear baseline for comparing adaptive testing efficiency gains"),
                                tags$li("Represents standard testing practices used in many educational and psychological settings"),
                                tags$li("Consistent test content across examinees, facilitating certain types of comparisons")
                              )
                          )
                      )
                  ),
                  
                  p("The dramatic difference in efficiency between these approaches demonstrates why adaptive testing represents such a significant advancement in educational measurement. By respecting the fundamental psychometric principle that items provide different information at different ability levels, CAT achieves superior measurement precision with substantially reduced testing burden.")
              )
          ),
          
          # SECTION 10: CONCLUSION
          div(class = "concept-section",
              div(class = "section-header",
                  h2("10. Putting It All Together")
              ),
              div(class = "section-content",
                  p("The CAT engine represents a sophisticated integration of IRT modeling, information theory, and Bayesian estimation—creating an intelligent testing system that learns about the examinee as it progresses. Each response informs the next item selection, creating a personalized measurement journey that achieves precision with remarkable efficiency."),
                  
                  p("This integrated approach transforms standardized testing from a one-size-fits-all assessment into a dynamic, responsive measurement process. The system continuously balances what it has learned about the examinee with what it needs to learn, selecting each new item to maximize information gain while respecting content constraints and measurement goals."),
                  
                  p("The result is testing that is not only more efficient but also more engaging and less frustrating for examinees. By avoiding items that are too easy or too difficult, CAT maintains appropriate challenge levels throughout the assessment, providing a better testing experience while gathering more precise measurement information."),
                  
                  div(class = "conclusion-highlight",
                      p(tags$strong("Ultimately, computerized adaptive testing represents the practical realization of sophisticated psychometric theory—delivering personalized, precise measurement that respects both statistical efficiency and human testing experience."))
                  )
              )
          )
      )
    )
  })
  
  # Server-side toggle observers
  observeEvent(input$toggle_faq_1, {
    shinyjs::toggle("faq_answer_1", anim = TRUE, animType = "slide")
  })
  
  observeEvent(input$toggle_faq_2, {
    shinyjs::toggle("faq_answer_2", anim = TRUE, animType = "slide")
  })
  
  observeEvent(input$toggle_faq_3, {
    shinyjs::toggle("faq_answer_3", anim = TRUE, animType = "slide")
  })
  
  observeEvent(input$toggle_faq_4, {
    shinyjs::toggle("faq_answer_4", anim = TRUE, animType = "slide")
  })
  
  observeEvent(input$toggle_faq_5, {
    shinyjs::toggle("faq_answer_5", anim = TRUE, animType = "slide")
  })
  
  observeEvent(input$toggle_faq_6, {
    shinyjs::toggle("faq_answer_6", anim = TRUE, animType = "slide")
  })
  
  observeEvent(input$toggle_faq_7, {
    shinyjs::toggle("faq_answer_7", anim = TRUE, animType = "slide")
  })
  
  observeEvent(input$toggle_faq_8, {
    shinyjs::toggle("faq_answer_8", anim = TRUE, animType = "slide")
  })
  
  # FAQ Content Renderers - ALL CONTENT PRESERVED
  output$faq_content_1 <- renderUI({
    div(
      style = "background: #E8F4FD; border-left: 4px solid #4A90E2; padding: 15px 20px; margin: 0; border-radius: 0 0 8px 8px;",
      h4(style = "margin: 0 0 10px 0; color: #2c3e50;", "Can SEM increase when a new item is added?"),
      div(
        style = "background: white; padding: 20px; border-radius: 8px; margin-top: 10px; box-shadow: 0 2px 4px rgba(0,0,0,0.1);",
        withMathJax(
          tagList(
            p(strong("Yes, SEM can increase when a new item is added"), ", and this is an important phenomenon in computerized adaptive testing (CAT)."),
            
            p(strong("1. Mismatched Item Difficulty")),
            tags$ul(
              tags$li("A new item with difficulty far from the ability estimate has low information."),
              tags$li("The information hole at the current theta becomes larger."),
              tags$li("Total information at that point becomes relatively smaller.")
            ),
            
            p(strong("2. Poor Item Quality")),
            tags$ul(
              tags$li("Low discrimination (a parameter)."),
              tags$li("High guessing (c parameter)."),
              tags$li("These items provide minimal information even when matched.")
            ),
            
            p(strong("3. EAP Estimation Update")),
            tags$ul(
              tags$li("A new response shifts the EAP θ̂ estimate."),
              tags$li("SEM is computed at the updated theta value."),
              tags$li("If this region has lower TIF, SEM increases.")
            ),
            
            h4(strong("Mathematical Foundation")),
            div(
              style = "background: #f8f9fa; padding: 15px; margin: 15px 0; border-radius: 8px; border-left: 4px solid #4A90E2;",
              HTML("\\[ SEM(\\theta) = \\frac{1}{\\sqrt{TIF(\\theta)}} \\] \\[ TIF(\\theta) = \\sum_i I_i(\\theta) \\]")
            ),
            
            p(strong("When item k plus 1 is added:")),
            div(
              style = "background: #f8f9fa; padding: 15px; margin: 15px 0; border-radius: 8px; border-left: 4px solid #4A90E2;",
              HTML("\\[ TIF_{new}(\\theta) = TIF_{old}(\\theta) + I_{k+1}(\\theta) \\]")
            ),
            
            p(strong("SEM increases when:")),
            div(
              style = "background: #f8f9fa; padding: 15px; margin: 15px 0; border-radius: 8px; border-left: 4px solid #4A90E2;",
              HTML("\\[ I_{k+1}(\\theta) \\approx 0 \\quad \\text{or} \\quad TIF_{new}(\\theta) < TIF_{old}(\\theta) \\]")
            ),
            
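             p("A small numeric illustration (all parameter values made up, not from the simulator's pool) shows how the reported SEM can rise when an unexpected response shifts the ability estimate while the new item adds little information:"),
             div(class = "code-block",
                 tags$pre(paste(
                   "# Illustrative check: SEM evaluated at the updated theta-hat can increase",
                   "info3pl <- function(theta, a, b, cpar) {",
                   "  P <- cpar + (1 - cpar) / (1 + exp(-a * (theta - b)))",
                   "  (a^2 * (1 - P) / P) * ((P - cpar)^2 / (1 - cpar)^2)",
                   "}",
                   "a <- c(1.6, 1.4, 0.4); b <- c(0.0, 0.3, 3.0); cpar <- c(0.2, 0.2, 0.2)",
                   "",
                   "old_theta <- 0.2    # estimate after the first two items",
                   "new_theta <- -0.4   # estimate after an unexpected incorrect response to item 3",
                   "sem_old <- 1 / sqrt(sum(info3pl(old_theta, a[1:2], b[1:2], cpar[1:2])))",
                   "sem_new <- 1 / sqrt(sum(info3pl(new_theta, a, b, cpar)))",
                   "c(sem_old = sem_old, sem_new = sem_new)   # with these values sem_new > sem_old",
                   sep = "\n"
                 ))
             ),
             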
            div(
              style = "background: #fff3cd; padding: 15px; margin: 15px 0; border-radius: 8px; border-left: 4px solid #ffc107;",
              strong("In this simulator:"),
              tags$ul(
                tags$li("Maximum information selection with an unexpected response can shift theta substantially."),
                tags$li("Random selection may produce badly matched items."),
                tags$li("Low discrimination items contribute almost no information.")
              )
            ),
            
            div(
              style = "background: #d1ecf1; padding: 15px; margin: 15px 0; border-radius: 8px; border-left: 4px solid #0dcaf0;",
              h5("Practical Implications"),
              p(strong("CAT algorithms"), " avoid SEM spikes by selecting high information items."),
              p(strong("Fixed form tests"), " cannot adapt to responses, so SEM may fluctuate as items are added."),
              p("Measurement precision does not always improve monotonically with more items.")
            ),
            
            p("This demonstrates the importance of ", strong("item selection strategy"), " in adaptive testing.")
          )
        )
      )
    )
  })
  
  output$faq_content_2 <- renderUI({
    div(
      style = "background: #F0F8FF; border-left: 4px solid #87CEEB; padding: 15px 20px; margin: 0; border-radius: 0 0 8px 8px;",
      h4(style = "margin: 0 0 10px 0; color: #2c3e50;", "Do examinee responses factor into information computation?"),
      div(
        style = "background: white; padding: 20px; border-radius: 8px; margin-top: 10px; box-shadow: 0 2px 4px rgba(0,0,0,0.1);",
        withMathJax(
          tagList(
            p("No, examinee responses do ", strong("not"), " directly factor into item information computation in IRT."),
            
            p("Item information depends only on ", strong("item parameters"), " and the current ", strong("theta"), "."),
            
            div(
              style = "background: #f8f9fa; padding: 15px; margin: 15px 0; border-radius: 8px; border-left: 4px solid #87CEEB;",
              HTML("\\[ I_i(\\theta) = \\left[ \\frac{a_i^2 \\cdot \\big(1 - P_i(\\theta)\\big)}{P_i(\\theta)} \\right] \\cdot \\left[ \\frac{\\big(P_i(\\theta) - c_i\\big)^2}{(1 - c_i)^2} \\right] \\]")
            ),
            
            p("where:"),
            tags$ul(
              tags$li(HTML("<strong>\\(a_i, b_i, c_i\\)</strong> are fixed item parameters.")),
              tags$li(HTML("<strong>\\(\\theta\\)</strong> is the current ability estimate.")),
              tags$li(HTML("<strong>\\(P_i(\\theta)\\)</strong> is the predicted probability of success at \\(\\theta\\)."))
            ),
            
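             p("The point can be checked directly with a short illustrative computation (parameter values are made up): the information value comes out of the item parameters and theta alone, so the eventual response cannot change it."),
             div(class = "code-block",
                 tags$pre(paste(
                   "# Illustrative check: information depends only on item parameters and theta",
                   "info3pl <- function(theta, a, b, cpar) {",
                   "  P <- cpar + (1 - cpar) / (1 + exp(-a * (theta - b)))",
                   "  (a^2 * (1 - P) / P) * ((P - cpar)^2 / (1 - cpar)^2)",
                   "}",
                   "info3pl(theta = 0.5, a = 2, b = 0.5, cpar = 0.2)",
                   "# Same value whether the eventual response is right or wrong;",
                   "# only a change in theta-hat moves the evaluation point on the curve.",
                   sep = "\n"
                 ))
             ),
             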
            h4(strong("Why Responses Do Not Affect Information")),
            
            p(strong("1. Information is a property of the item")),
            tags$ul(
              tags$li("Information reflects the measurement precision provided by the item."),
              tags$li("It depends on discrimination and the model based probability function."),
              tags$li("The item information curve is fixed after calibration.")
            ),
            
            p(strong("2. Responses affect ability estimation, not the curve itself")),
            tags$ul(
              tags$li("Responses shift the estimated ability value."),
              tags$li("This changes where on the information curve we evaluate the item."),
              tags$li("However, the shape and height of the curve remain unchanged.")
            ),
            
            div(
              style = "background: #e8f4fd; padding: 15px; margin: 15px 0; border-radius: 8px; border-left: 4px solid #0d6efd;",
              strong("Example:"), 
              " A high discrimination item with a = 2 has high information at \\(\\theta = 0.5\\) whether the examinee answers correctly or not. The response changes the estimated ability, not the item information function."
            ),
            
            h4(strong("What Happens in CAT")),
            
            tags$ul(
              tags$li(strong("Step 1:"), " Compute information for each candidate item at the current theta."),
              tags$li(strong("Step 2:"), " Select the item with maximum information."),
              tags$li(strong("Step 3:"), " Administer the item and record the response."),
              tags$li(strong("Step 4:"), " Update the ability estimate using the response and the item parameters."),
              tags$li(strong("Step 5:"), " Repeat with the new theta, recomputing information and selecting the next item.")
            ),
            
            h4(strong("Indirect Effects Through Ability Updates")),
            p("Responses do not change item information directly, but they ", strong("indirectly"), " influence which items are selected by shifting the ability estimate."),
            
            tags$ul(
              tags$li("Unexpected responses can cause large shifts in the estimated ability."),
              tags$li("This moves the evaluation point to a different region of the item information curves."),
              tags$li("Different items may now maximize information at the new theta.")
            ),
            
            div(
              style = "background: #fff3cd; padding: 15px; margin: 15px 0; border-radius: 8px; border-left: 4px solid #ffc107;",
              strong("In the simulator:"),
              tags$ul(
                tags$li("Information values for an item change only when theta changes."),
                tags$li("The same item always has the same information at the same theta."),
                tags$li("Information computation uses only item parameters and the current theta.")
              )
            ),
            
            h4(strong("Classical Test Theory versus IRT")),
            
            p(strong("Classical Test Theory (CTT)")),
            p("In CTT, responses indirectly affect information because item level statistics are computed from actual response data."),
            tags$ul(
              tags$li("Item discrimination, such as r point biserial or item total correlation, is computed from observed responses."),
              tags$li("Item information indices, such as using \\(I = r^2\\) or contributions to reliability, are functions of these sample based statistics."),
              tags$li("Therefore, item information changes if the response patterns in the sample change.")
            ),
            p("CTT item information is empirical and depends on the specific response data set."),
            
            p(strong("Item Response Theory (IRT)")),
            p("In IRT, item information is a mathematical function of item parameters and ability."),
            div(
              style = "background: #f8f9fa; padding: 15px; margin: 15px 0; border-radius: 8px; border-left: 4px solid #87CEEB;",
              HTML("\\[ I(\\theta) = a^2 \\cdot P(\\theta) \\cdot \\big(1 - P(\\theta)\\big) \\]")
            ),
            tags$ul(
              tags$li("It depends on item parameters such as \\(a, b, c, d\\) and the chosen IRT model."),
              tags$li("These parameters are fixed once they are estimated during calibration."),
              tags$li("Specific examinee responses do not change the item information function once parameters are set.")
            ),
            p("IRT item information is model based and independent of any particular response pattern, given a calibrated item."),
            
            h4(strong("Practical Implications")),
            tags$ul(
              tags$li("Item calibration must use sufficiently large samples to obtain stable parameters."),
              tags$li("CAT efficiency depends strongly on accurate item parameters."),
              tags$li("Response patterns influence measurement through updated ability estimates, not through changes to item information functions.")
            )
          )
        )
      )
    )
  })
  
  output$faq_content_3 <- renderUI({
    div(
      style = "background: #F5F0FF; border-left: 4px solid #9370DB; padding: 15px 20px; margin: 0; border-radius: 0 0 8px 8px;",
      h4(style = "margin: 0 0 10px 0; color: #2c3e50;", "Why does EAP θ̂ estimation stabilize around true theta even with random item selection?"),
      div(
        style = "background: white; padding: 20px; border-radius: 8px; margin-top: 10px; box-shadow: 0 2px 4px rgba(0,0,0,0.1);",
        withMathJax(
          tagList(
            p(strong("EAP θ̂ stabilization occurs due to the Bayesian nature of the estimator and the Law of Large Numbers, even when items are selected at random.")),
            
            h4(strong("1. Bayesian Foundation of EAP")),
            p("Expected a posteriori (EAP) θ̂ estimation combines the likelihood with a prior distribution."),
            div(
              style = "background: #f8f9fa; padding: 15px; margin: 15px 0; border-radius: 8px; border-left: 4px solid #9370DB;",
              HTML("\\[ \\text{Posterior} \\propto \\text{Likelihood} \\times \\text{Prior} \\] \\[ \\hat{\\theta}_{EAP} = E[\\theta \\mid \\text{responses}] = \\int \\theta \\cdot \\pi(\\theta \\mid \\text{responses}) \\, d\\theta \\]")
            ),
            
            p(strong("The prior as an anchor:")),
            tags$ul(
              tags$li("EAP starts from a prior, often \\(N(0, 1)\\)."),
              tags$li("With few items, the prior prevents extreme or unstable estimates."),
              tags$li("As more data accumulate, the likelihood dominates and the influence of the prior fades.")
            ),
            
            h4(strong("2. How Random Items Provide Information")),
            p("Even when items are sampled randomly from the pool, the pattern of responses still carries information about the true ability level."),
            
            p(strong("For a high ability examinee, for example theta equal to 1.5:")),
            tags$ul(
              tags$li("Easy items tend to be answered correctly with very high probability."),
              tags$li("Medium difficulty items tend to be answered correctly with moderately high probability."),
              tags$li("Hard items have lower but still non zero probability of correct response.")
            ),
            
            p("The overall pattern across many random items forms a response profile that is most compatible with the true theta."),
            
            h4(strong("3. Law of Large Numbers in Action")),
            p("With random item selection, each examinee receives a random sample of items from the pool. As the number of items increases, the observed response pattern converges to its expected pattern at the true theta."),
            div(
              style = "background: #f8f9fa; padding: 15px; margin: 15px 0; border-radius: 8px; border-left: 4px solid #9370DB;",
              HTML("\\[ L(\\theta \\mid \\text{responses}) = \\prod_i P_i(\\theta)^{u_i} \\cdot \\big(1 - P_i(\\theta)\\big)^{1 - u_i} \\]")
            ),
            
            p(strong("Convergence intuition:")),
            tags$ul(
              tags$li("At low theta values, the model predicts too many failures for a high ability examinee."),
              tags$li("At high theta values, the model predicts too many successes."),
              tags$li("At the true theta, the likelihood is largest because predicted probabilities best match the observed response pattern.")
            ),
            
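             p("The convergence behaviour is easy to see in a small illustrative simulation (pool parameters, test length, and seed are all made up): items are drawn in random order, yet the grid-based EAP estimate moves toward the true theta as responses accumulate."),
             div(class = "code-block",
                 tags$pre(paste(
                   "# Illustrative simulation: EAP under purely random item selection",
                   "set.seed(1)",
                   "n_items <- 200",
                   "a <- runif(n_items, 0.6, 2.0); b <- rnorm(n_items); cpar <- rep(0.2, n_items)",
                   "true_theta <- 1.5",
                   "grid <- seq(-4, 4, length.out = 61); prior <- dnorm(grid)",
                   "",
                   "order_seen <- sample(n_items)              # purely random item order",
                   "loglik <- rep(0, length(grid))",
                   "eap_trace <- numeric(40)",
                   "for (k in 1:40) {",
                   "  i <- order_seen[k]",
                   "  P <- cpar[i] + (1 - cpar[i]) / (1 + exp(-a[i] * (grid - b[i])))",
                   "  p_true <- cpar[i] + (1 - cpar[i]) / (1 + exp(-a[i] * (true_theta - b[i])))",
                   "  y <- rbinom(1, 1, p_true)                # simulated response at the true theta",
                   "  loglik <- loglik + y * log(P) + (1 - y) * log(1 - P)",
                   "  w <- exp(loglik - max(loglik)) * prior   # stable posterior weights on the grid",
                   "  eap_trace[k] <- sum(grid * w) / sum(w)",
                   "}",
                   "round(eap_trace[c(5, 10, 20, 40)], 2)      # estimates move toward true_theta = 1.5",
                   sep = "\n"
                 ))
             ),
             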
            h4(strong("4. Contrast with Maximum Information Selection")),
            div(
              style = "background: #d1ecf1; padding: 15px; margin: 15px 0; border-radius: 8px; border-left: 4px solid #0dcaf0;",
              p(strong("Random selection with EAP:")),
              tags$ul(
                tags$li("Stabilizes through consistency and large sample behavior."),
                tags$li("Less efficient and usually requires more items."),
                tags$li("Posterior credible intervals are wider for a given test length.")
              ),
              p(strong("Maximum information selection with MLE or EAP:")),
              tags$ul(
                tags$li("Stabilizes more quickly by targeting the region of highest information."),
                tags$li("More efficient and needs fewer items for similar precision."),
                tags$li("Posterior intervals or standard errors are tighter.")
              )
            ),
            
            h4(strong("5. Mathematical Guarantees")),
            p("EAP θ̂ is a consistent estimator under standard regularity conditions."),
            div(
              style = "background: #f8f9fa; padding: 15px; margin: 15px 0; border-radius: 8px; border-left: 4px solid #9370DB;",
              HTML("\\[ \\lim_{n \\to \\infty} P\\big( \\big| \\hat{\\theta}_{EAP} - \\theta_{true} \\big| > \\varepsilon \\big) = 0 \\]")
            ),
            p("This property holds regardless of the item selection method, although the ", strong("rate of convergence"), " depends strongly on how informative the items are."),
            
            div(
              style = "background: #fff3cd; padding: 15px; margin: 15px 0; border-radius: 8px; border-left: 4px solid #ffc107;",
              strong("In this simulator:"),
              tags$ul(
                tags$li("You can run multiple CAT sessions with random item selection."),
                tags$li("Estimates eventually stabilize near the true theta, although more items are required compared to maximum information selection."),
                tags$li("You can compare the speed of convergence and the resulting SEM under different selection rules.")
              )
            ),
            
            h4(strong("6. Practical Implications")),
            tags$ul(
              tags$li("Item selection strategy affects efficiency, not the theoretical consistency of EAP."),
              tags$li("Random selection may be useful when item exposure control is critical."),
              tags$li("EAP provides robustness against early poor item matches by combining prior and likelihood."),
              tags$li("The trade off is between speed of convergence and control over item usage.")
            ),
            
            p("This illustrates that any sufficiently large random sample of items from a calibrated pool can recover an examinee ability level, thanks to the combination of Bayesian estimation and large sample theory.")
          )
        )
      )
    )
  })
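
  # ---- Companion sketch for FAQ 3 above (law of large numbers with random items) ----
  # Standalone illustration, not wired into the app's reactive flow. It defines its own
  # 3PL probability and log-likelihood helpers (hypothetical names p3pl_demo, loglik_demo)
  # rather than assuming the app's functions, simulates responses to randomly sampled
  # items, and shows that the likelihood peaks near the true theta as items accumulate.
  p3pl_demo <- function(theta, a, b, c) {
    c + (1 - c) / (1 + exp(-1.702 * a * (theta - b)))
  }
  
  loglik_demo <- function(theta, a, b, c, u) {
    p <- p3pl_demo(theta, a, b, c)
    sum(u * log(p) + (1 - u) * log(1 - p))
  }
  
  lln_random_items_demo <- function(theta_true = 1, n_items = 200, seed = 1) {
    set.seed(seed)
    a <- runif(n_items, 0.8, 2.0)      # random discriminations
    b <- runif(n_items, -3, 3)         # random difficulties (random item selection)
    c <- runif(n_items, 0.05, 0.25)    # random guessing parameters
    u <- rbinom(n_items, 1, p3pl_demo(theta_true, a, b, c))
    grid <- seq(-4, 4, by = 0.01)
    ll <- vapply(grid, loglik_demo, numeric(1), a = a, b = b, c = c, u = u)
    grid[which.max(ll)]                # maximizer tends to land close to theta_true
  }
  # Example: lln_random_items_demo(theta_true = 1, n_items = 500) typically returns ~1.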
  
  output$faq_content_4 <- renderUI({
    div(
      style = "background: #F0FFF0; border-left: 4px solid #32CD32; padding: 15px 20px; margin: 0; border-radius: 0 0 8px 8px;",
      h4(style = "margin: 0 0 10px 0; color: #2c3e50;", "What is the difference between SEM (precision) and bias (accuracy)?"),
      div(
        style = "background: white; padding: 20px; border-radius: 8px; margin-top: 10px; box-shadow: 0 2px 4px rgba(0,0,0,0.1);",
        withMathJax(
          tagList(
            p(strong("SEM and bias measure fundamentally different aspects of estimation quality."), " Understanding this distinction is crucial for interpreting CAT results."),
            
            h4(strong("Precision versus Accuracy")),
            tags$ul(
              tags$li(strong("SEM (Standard Error of Measurement):"), " Measures ", em("precision"), " - how consistent or reproducible the estimates are across repeated measurements."),
              tags$li(strong("Bias:"), " Measures ", em("accuracy"), " - how close the estimates are to the true value on average.")
            ),
            
            h4(strong("The Archery Target Analogy")),
            div(
              style = "text-align: center; margin: 20px 0; padding: 15px; background: #f8f9fa; border-radius: 8px;",
              p(em("Four scenarios of precision vs accuracy:"), style = "font-weight: 600; margin-bottom: 15px;"),
              div(
                style = "display: grid; grid-template-columns: 1fr 1fr; gap: 10px; text-align: left;",
                div(style = "padding: 10px; background: #fff3cd; border-radius: 6px;", strong("High SEM, High Bias:"), " Arrows scattered far from bullseye - inconsistent and inaccurate"),
                div(style = "padding: 10px; background: #f8d7da; border-radius: 6px;", strong("Low SEM, High Bias:"), " Arrows clustered tightly but off-target - precise but wrong"),
                div(style = "padding: 10px; background: #d1ecf1; border-radius: 6px;", strong("High SEM, Low Bias:"), " Arrows scattered widely around bullseye - accurate on average but unreliable"),
                div(style = "padding: 10px; background: #d4edda; border-radius: 6px;", strong("Low SEM, Low Bias:"), " Arrows clustered in bullseye - ideal measurement")
              )
            ),
            
            h4(strong("Why This Matters in CAT")),
            p("A CAT can achieve low SEM by:"),
            tags$ul(
              tags$li("Selecting highly informative items at the current theta estimate"),
              tags$li("Using items with high discrimination parameters"),
              tags$li("Matching item difficulty to ability level")
            ),
            
            p("But systematic bias can still occur due to:"),
            tags$ul(
              tags$li("Poorly calibrated item parameters"),
              tags$li("Model misspecification"),
              tags$li("Prior distribution influence in Bayesian estimation"),
              tags$li("Item exposure control constraints")
            ),
            
            div(
              style = "background: #fff3cd; padding: 15px; margin: 15px 0; border-radius: 8px; border-left: 4px solid #ffc107;",
              strong("Key Insight:"),
              p("Low SEM tells you the test is giving a ", strong("consistent"), " answer, but it doesn't guarantee that answer is ", strong("correct"), ". Always examine both precision and accuracy metrics.")
            )
          )
        )
      )
    )
  })
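
  # ---- Companion sketch for FAQ 4 above (SEM/precision versus bias/accuracy) ----
  # A minimal, self-contained simulation of repeated theta estimates around a true value.
  # The spread of the estimates plays the role of SEM (precision); the gap between their
  # mean and theta_true is the bias (accuracy). All names and numbers are illustrative.
  precision_vs_accuracy_demo <- function(theta_true = 0.5, sem = 0.30, bias = 0.20,
                                          n_rep = 10000, seed = 42) {
    set.seed(seed)
    theta_hat <- rnorm(n_rep, mean = theta_true + bias, sd = sem)
    c(
      empirical_sem  = sd(theta_hat),                # precision: spread of estimates
      empirical_bias = mean(theta_hat) - theta_true, # accuracy: systematic offset
      rmse           = sqrt(mean((theta_hat - theta_true)^2))
    )
  }
  # Lowering `sem` tightens the cluster of estimates but leaves `empirical_bias` unchanged,
  # mirroring the "precise but wrong" archery target above.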
  
  output$faq_content_5 <- renderUI({
    div(
      style = "background: #FFF8F0; border-left: 4px solid #FFA500; padding: 15px 20px; margin: 0; border-radius: 0 0 8px 8px;",
      h4(style = "margin: 0 0 10px 0; color: #2c3e50;", "Why can bias persist even with long tests and low SEM?"),
      div(
        style = "background: white; padding: 20px; border-radius: 8px; margin-top: 10px; box-shadow: 0 2px 4px rgba(0,0,0,0.1);",
        withMathJax(
          tagList(
            p(strong("Bias often stems from systematic errors that don't diminish with more items,"), " while SEM reflects random error that decreases with test length."),
            
            h4(strong("1. Model Misspecification")),
            p("The 3PL model makes strong assumptions:"),
            tags$ul(
              tags$li("Specific logistic functional form"),
              tags$li("Local independence"),
              tags$li("Correct discrimination, difficulty, and guessing parameters")
            ),
            p("If reality deviates from these assumptions, systematic bias persists regardless of test length."),
            
            h4(strong("2. Item Parameter Estimation Error")),
            p("In operational CAT, item parameters come from calibration samples:"),
            tags$ul(
              tags$li("Calibration errors become built-in systematic bias"),
              tags$li("Giving more items to one examinee doesn't fix calibration errors"),
              tags$li("SEM decreases with more items, but bias remains constant")
            ),
            
            h4(strong("3. Prior Influence in EAP θ̂ Estimation")),
            div(
              style = "background: #f8f9fa; padding: 15px; margin: 15px 0; border-radius: 8px; border-left: 4px solid #FFA500;",
              HTML("\\[ \\text{Posterior} \\propto \\text{Likelihood} \\times \\text{Prior} \\]")
            ),
            p("Even with many items, the prior distribution (often N(0,1)) can:"),
            tags$ul(
              tags$li("Pull extreme ability estimates toward the mean"),
              tags$li("Create systematic bias at scale boundaries"),
              tags$li("Prevent complete convergence to true values")
            ),
            
            h4(strong("4. Boundary and Scale Effects")),
            p("Practical implementations often impose limits:"),
            div(
              style = "background: #263238; color: #ECEFF1; padding: 12px; margin: 15px 0; border-radius: 6px; font-family: 'Courier New', monospace; font-size: 13px;",
              "theta_hat <- pmin(4, pmax(-4, eap_estimate))  # Common bounds"
            ),
            p("This creates systematic bias for examinees near the scale extremes."),
            
            h4(strong("5. Mathematical Foundation: Bias-Variance Decomposition")),
            div(
              style = "background: #f8f9fa; padding: 15px; margin: 15px 0; border-radius: 8px; border-left: 4px solid #FFA500;",
              HTML("\\[ \\text{MSE} = \\text{Bias}^2 + \\text{Variance} \\] \\[ \\text{Variance} \\approx \\text{SEM}^2 \\]")
            ),
            p("As test length increases:"),
            tags$ul(
              tags$li(strong("Variance"), " decreases (SEM gets smaller)"),
              tags$li(strong("Bias²"), " remains constant if caused by systematic errors"),
              tags$li("Thus, MSE is dominated by bias for long tests")
            ),
            
            h4(strong("6. Guessing Parameter Asymmetry")),
            p("The 3PL guessing parameter creates information asymmetry:"),
            tags$ul(
              tags$li("Low-ability examinees have probability floor at c-parameter"),
              tags$li("Reduces information and creates bias patterns at lower end"),
              tags$li("Effects persist regardless of test length")
            ),
            
            div(
              style = "background: #fff3cd; padding: 15px; margin: 15px 0; border-radius: 8px; border-left: 4px solid #ffc107;",
              strong("Theoretical Insight:"),
              p("Under model misspecification, estimators converge to ", em("pseudo-true values"), " that minimize KL divergence, not necessarily the true theta values.")
            ),
            
            h4(strong("Practical Implications")),
            tags$ul(
              tags$li("Bias monitoring requires separate procedures from precision monitoring"),
              tags$li("Item pool design must address both precision and accuracy"),
              tags$li("Post-hoc linking and equating often needed to correct systematic biases"),
              tags$li("CAT termination should consider both SEM and bias thresholds in critical score regions")
            ),
            
            p("This explains why sophisticated CAT systems implement ", strong("bias detection and correction procedures"), " alongside precision-based termination rules.")
          )
        )
      )
    )
  })
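
  # ---- Companion sketch for FAQ 5 above (why bias persists as SEM shrinks) ----
  # Illustrative only: random error shrinks as items are added (here, roughly like
  # 1/sqrt(n)), while a fixed systematic offset (e.g., from calibration error) stays put,
  # so MSE is eventually dominated by Bias^2. Names and numbers are assumptions.
  bias_variance_demo <- function(theta_true = 0, systematic_bias = 0.15,
                                 test_lengths = c(10, 20, 40, 80, 160),
                                 per_item_sd = 1, n_rep = 5000, seed = 7) {
    set.seed(seed)
    t(sapply(test_lengths, function(n) {
      theta_hat <- theta_true + systematic_bias + rnorm(n_rep, 0, per_item_sd / sqrt(n))
      c(n_items  = n,
        variance = var(theta_hat),
        bias_sq  = (mean(theta_hat) - theta_true)^2,
        mse      = mean((theta_hat - theta_true)^2))
    }))
  }
  # As n_items grows, `variance` heads toward zero while `bias_sq` stays near 0.15^2.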
  
  output$faq_content_6 <- renderUI({
    div(
      style = "background: #FFF0F5; border-left: 4px solid #FF69B4; padding: 15px 20px; margin: 0; border-radius: 0 0 8px 8px;",
      h4(style = "margin: 0 0 10px 0; color: #2c3e50;", "How does the prior distribution affect EAP θ̂ estimation, and what are the mathematical consequences?"),
      div(
        style = "background: white; padding: 20px; border-radius: 8px; margin-top: 10px; box-shadow: 0 2px 4px rgba(0,0,0,0.1);",
        withMathJax(
          tagList(
            p(strong("The prior in EAP θ̂ estimation serves as a regularization term that affects both bias and variance, with mathematical trade-offs.")),
            
            h4(strong("EAP as a Shrinkage Estimator:")),
            div(
              style = "background: #f8f9fa; padding: 15px; margin: 15px 0; border-radius: 8px; border-left: 4px solid #FF69B4;",
              HTML("\\[ \\hat{\\theta}_{EAP} = \\displaystyle\\frac{\\displaystyle\\int \\theta \\cdot L(\\theta) \\cdot \\pi(\\theta) \\: d\\theta}{\\displaystyle\\int L(\\theta) \\cdot \\pi(\\theta) \\: d\\theta} \\]")
            ),
            
            p("For normal prior N(μ_p, σ_p^2) and approximate normal likelihood N(μ_l, σ_l^2):"),
            div(
              style = "background: #f8f9fa; padding: 15px; margin: 15px 0; border-radius: 8px; border-left: 4px solid #FF69B4;",
              HTML("\\[ \\hat{\\theta}_{EAP} \\approx \\frac{\\sigma_l^2}{\\sigma_p^2 + \\sigma_l^2} \\mu_p + \\frac{\\sigma_p^2}{\\sigma_p^2 + \\sigma_l^2} \\mu_l \\]")
            ),
            
            h4(strong("Mathematical Effects:")),
            
            p(strong("1. Bias-Variance Tradeoff")),
            div(
              style = "background: #f8f9fa; padding: 15px; margin: 15px 0; border-radius: 8px; border-left: 4px solid #FF69B4;",
              HTML("\\[ \\text{Bias} = E[\\hat{\\theta} - \\theta] \\approx \\displaystyle\\frac{\\sigma_l^2}{\\sigma_p^2 + \\sigma_l^2} (\\mu_p - \\theta) \\] \\[ \\text{Variance} \\approx \\displaystyle\\frac{\\sigma_p^2 \\cdot \\sigma_l^2}{\\sigma_p^2 + \\sigma_l^2} \\]")
            ),
            
            p(strong("2. Small Sample Behavior")),
            tags$ul(
              tags$li("With few items: \\(\\sigma_l^2\\) large, EAP θ̂ ≈ μ_p (strong prior influence)"),
              tags$li("With many items: \\(\\sigma_l^2\\) small, EAP θ̂ ≈ μ_l (likelihood dominates)"),
              tags$li("The transition depends on relative precisions")
            ),
            
            p(strong("3. Boundary Effects")),
            p("With bounded integration (e.g., -4 to +4):"),
            div(
              style = "background: #f8f9fa; padding: 15px; margin: 15px 0; border-radius: 8px; border-left: 4px solid #FF69B4;",
              HTML("\\[ \\hat{\\theta}_{EAP} = \\displaystyle\\frac{\\displaystyle\\int_{-4}^{4} \\theta \\cdot L(\\theta) \\cdot \\pi(\\theta) \\: d\\theta}{\\displaystyle\\int_{-4}^{4} L(\\theta) \\cdot \\pi(\\theta) \\: d\\theta} \\]")
            ),
            p("This creates systematic bias when true θ is near the boundaries."),
            
            h4(strong("Optimal Prior Selection:")),
            p("The 'best' prior depends on the population:"),
            tags$ul(
              tags$li("N(0,1) works well for standardized populations"),
              tags$li("Informative priors can reduce bias if population parameters are known"),
              tags$li("Very diffuse priors approach MLE behavior but lose regularization benefits")
            ),
            
            div(
              style = "background: #fff3cd; padding: 15px; margin: 15px 0; border-radius: 8px; border-left: 4px solid #ffc107;",
              strong("Theoretical Insight:"),
              p("EAP minimizes Bayes risk under squared error loss, but this comes with guaranteed bias when the prior mean differs from the true population mean.")
            )
          )
        )
      )
    )
  })
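
  # ---- Companion sketch for FAQ 6 above (prior influence on EAP) ----
  # Implements the normal-normal shrinkage approximation shown in that FAQ: the EAP
  # estimate is a precision-weighted average of the prior mean and the likelihood mean.
  # A standalone sketch with assumed inputs, not the app's estimation routine.
  eap_shrinkage_demo <- function(mu_prior = 0, sd_prior = 1, mu_lik, sd_lik) {
    w_prior <- sd_lik^2 / (sd_prior^2 + sd_lik^2)    # weight on the prior mean
    w_lik   <- sd_prior^2 / (sd_prior^2 + sd_lik^2)  # weight on the likelihood mean
    c(theta_eap = w_prior * mu_prior + w_lik * mu_lik,
      shrinkage = w_prior)                           # share pulled toward the prior
  }
  # Few items (noisy likelihood):  eap_shrinkage_demo(mu_lik = 2, sd_lik = 1.0)  # ~50% shrinkage
  # Many items (tight likelihood): eap_shrinkage_demo(mu_lik = 2, sd_lik = 0.2)  # ~4% shrinkage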
  
  output$faq_content_7 <- renderUI({
    div(
      style = "background: #F0F8FF; border-left: 4px solid #4682B4; padding: 15px 20px; margin: 0; border-radius: 0 0 8px 8px;",
      h4(style = "margin: 0 0 10px 0; color: #2c3e50;", "Does increasing the item pool size reduce bias in CAT or fixed-form tests?"),
      div(
        style = "background: white; padding: 20px; border-radius: 8px; margin-top: 10px; box-shadow: 0 2px 4px rgba(0,0,0,0.1);",
        withMathJax(
          tagList(
            p(strong("Increasing the pool size can reduce bias, but only under specific conditions."), " It helps when the original pool lacks good coverage around the true ability level, but it does not fix all sources of bias."),
            
            h4(strong("1. When a Larger Pool Helps")),
            p("Bias is often driven by poor matching between item difficulty and examinee ability."),
            tags$ul(
              tags$li("If the pool has too few items with difficulty near the true theta, the test is forced to use off-target items."),
              tags$li("Off-target items provide less information and pull the estimate toward the prior or toward regions with better coverage."),
              tags$li("Adding more well-calibrated items with b values around the target θ increases local test information and can reduce bias.")
            ),
            
            div(
              style = "background: #f8f9fa; padding: 15px; margin: 15px 0; border-radius: 8px; border-left: 4px solid #4682B4;",
              HTML("\\[ TIF(\\theta) = \\sum_{i \\in \\text{pool}} I_i(\\theta) \\]")
            ),
            p("A larger pool can increase ", strong("TIF(θ) in the region of interest"), " if the additional items are well targeted."),
            
            h4(strong("2. When a Larger Pool Does Not Help")),
            p("Simply adding more items does not guarantee lower bias."),
            tags$ul(
              tags$li("If the new items have the same poor difficulty distribution (e.g., all very easy or very hard), coverage around the true theta does not improve."),
              tags$li("Bias from model misspecification, mis-calibrated item parameters, or a misspecified prior will persist even with a very large pool."),
              tags$li("In random fixed-form selection, a larger pool with the same parameter distribution does not systematically reduce bias for a single examinee.")
            ),
            
            h4(strong("3. Pool Size versus Pool Quality")),
            tags$ul(
              tags$li(strong("Pool size"), " controls how many items are available."),
              tags$li(strong("Pool quality and coverage"), " control how well the test can represent different theta regions."),
              tags$li("For bias reduction, increasing pool size is useful only when it increases the density of high-quality items near the ability levels of interest.")
            ),
            
            div(
              style = "background: #d1ecf1; padding: 15px; margin: 15px 0; border-radius: 8px; border-left: 4px solid #0dcaf0;",
              p(strong("Random selection:"), " A larger pool with the same parameter distribution mainly affects item exposure and content variety, not bias at the individual level."),
              p(strong("Information-based CAT selection:"), " A larger, well-targeted pool gives the algorithm more options to choose high-information, well-matched items, which can reduce bias and SEM simultaneously.")
            ),
            
            h4(strong("4. Practical Implications for This Simulator")),
            tags$ul(
              tags$li("Expanding the pool is most useful when you add items that fill gaps in the difficulty continuum around θ = 0, ±1, ±2, etc."),
              tags$li("Monitoring bias as a function of true theta and pool design is as important as monitoring SEM."),
              tags$li("Pool design should focus first on balanced coverage and realistic a/c parameters; pool size is helpful once those foundations are in place.")
            ),
            
            div(
              style = "background: #fff3cd; padding: 15px; margin: 15px 0; border-radius: 8px; border-left: 4px solid #ffc107;",
              strong("Key Takeaway:"),
              p("Increasing item pool size can reduce bias ", strong("only when it improves difficulty coverage and item quality"), ". It does not automatically fix bias caused by model misspecification, poor priors, or calibration errors.")
            )
          )
        )
      )
    )
  })
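
  # ---- Companion sketch for FAQ 7 above (pool size versus pool coverage) ----
  # Compares the best available item information at a target theta when extra items are
  # poorly targeted versus well targeted. Defines a local 3PL information function
  # (info_3pl_demo) so the sketch is self-contained; item parameters are made up.
  info_3pl_demo <- function(theta, a, b, c) {
    p <- c + (1 - c) / (1 + exp(-1.702 * a * (theta - b)))
    (1.702 * a)^2 * ((p - c) / (1 - c))^2 * (1 - p) / p
  }
  
  pool_coverage_demo <- function(theta_target = 1.5, k = 20, seed = 3) {
    set.seed(seed)
    base_b <- rnorm(100, 0, 0.5)                         # pool clustered near theta = 0
    same_b <- c(base_b, rnorm(100, 0, 0.5))              # larger pool, same difficulty spread
    targ_b <- c(base_b, rnorm(100, theta_target, 0.3))   # larger pool filling the gap
    best_k <- function(b) {
      info <- info_3pl_demo(theta_target, a = 1.2, b = b, c = 0.2)
      mean(sort(info, decreasing = TRUE)[seq_len(k)])    # mean info of the k best items
    }
    c(original          = best_k(base_b),
      larger_same_shape = best_k(same_b),
      larger_targeted   = best_k(targ_b))
  }
  # The best-k information improves only modestly when the added items share the original
  # difficulty distribution, but rises sharply when the new items sit near the target theta.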
  
  output$faq_content_8 <- renderUI({
    div(
      style = "background: #E8F4FD; border-left: 4px solid #4A90E2; padding: 15px 20px; margin: 0; border-radius: 0 0 8px 8px;",
      h4(style = "margin: 0 0 10px 0; color: #2c3e50;", "Why does Random (Fixed-form) stabilize while MFI (CAT) fluctuates?"),
      div(
        style = "background: white; padding: 20px; border-radius: 8px; margin-top: 10px; box-shadow: 0 2px 4px rgba(0,0,0,0.1);",
        withMathJax(
          tagList(
            p(strong("This is expected behavior due to fundamental differences between fixed-form and adaptive testing methodologies.")),
            
            p(strong("1. Fixed-form Testing (Random Selection)")),
            tags$ul(
              tags$li("All respondents receive the same predetermined set of items"),
              tags$li("Measurement precision (SEM/TIF) remains constant throughout"),
              tags$li("Once enough items are answered, estimates converge and stabilize"),
              tags$li("No adaptation means no feedback loops causing fluctuations")
            ),
            
            p(strong("2. Computerized Adaptive Testing (MFI Selection)")),
            tags$ul(
              tags$li("Each item is dynamically selected based on current ability estimate"),
              tags$li("Creates a continuous feedback loop: θ → item → θ → item"),
              tags$li("Maximum Fisher Information targeting constantly shifts optimal items"),
              tags$li("Early items cause large θ jumps; later items cause fine-tuning oscillations")
            ),
            
            h4(strong("Mathematical Differences")),
            div(
              style = "background: #f8f9fa; padding: 15px; margin: 15px 0; border-radius: 8px; border-left: 4px solid #4A90E2;",
              HTML("\\[ \\text{Fixed-form: } \\theta_{k+1} = f(\\theta_k, \\text{fixed items}) \\]"),
              HTML("\\[ \\text{CAT (MFI): } \\theta_{k+1} = f(\\theta_k, I_{k+1}(\\theta_k)) \\]"),
              HTML("\\[ \\text{where } I_{k+1}(\\theta_k) = \\max_b I(\\theta_k, b) \\]")
            ),
            
            p(strong("Why MFI Naturally Fluctuates:")),
            tags$ul(
              tags$li(strong("Estimation Refinement:"), " Early rapid estimation followed by gradual fine-tuning"),
              tags$li(strong("Boundary Effects:"), " Limited optimal items when θ approaches bank boundaries"),
              tags$li(strong("Response Sensitivity:"), " Single unexpected response can significantly shift θ estimate"),
              tags$li(strong("Information Targeting:"), " Different items become optimal as θ estimate changes")
            ),
            
            div(
              style = "background: #fff3cd; padding: 15px; margin: 15px 0; border-radius: 8px; border-left: 4px solid #ffc107;",
              strong("Visualization of the Process:"),
              tags$ul(
                tags$li("Fixed-form: θ → Items → θ → Items → θ → STABLE"),
                tags$li("CAT: θ → Adapt → θ → Adapt → θ → FLUCTUATING"),
                tags$li("The adaptation in CAT creates natural oscillation as it hones in on true ability")
              )
            ),
            
            div(
              style = "background: #d1ecf1; padding: 15px; margin: 15px 0; border-radius: 8px; border-left: 4px solid #0dcaf0;",
              h5("This Fluctuation is Actually Desirable"),
              p(strong("Early fluctuation"), " indicates rapid initial ability estimation"),
              p(strong("Middle fluctuation"), " shows the refinement process working"),
              p(strong("Late fluctuation"), " represents fine-tuning near true ability level"),
              p("CAT achieves higher precision with fewer items by continuing to adapt, while fixed-form stabilizes earlier but may be less precise overall.")
            ),
            
            div(
              style = "background: #e8f5e8; padding: 15px; margin: 15px 0; border-radius: 8px; border-left: 4px solid #28a745;",
              h5("In This Simulator:"),
              tags$ul(
                tags$li("Random selection shows how traditional tests behave"),
                tags$li("MFI demonstrates real adaptive testing dynamics"),
                tags$li("The fluctuations you see are characteristic of working CAT algorithms"),
                tags$li("This is why bias detection waits until SEM target is met - to ensure meaningful stabilization patterns")
              )
            ),
            
            p("The difference highlights the trade-off between ", strong("estimation stability"), " in fixed-form tests and ", strong("measurement efficiency"), " in adaptive tests.")
          )
        )
      )
    )
  })
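
  # ---- Companion sketch for FAQ 8 above (random fixed-form versus MFI targeting) ----
  # A simplified, non-adaptive illustration of why information-targeted items reach a given
  # SEM with fewer items: it compares cumulative information at a fixed theta for items with
  # spread-out difficulties versus items matched to theta. The adaptive feedback loop itself
  # is abstracted away, and all parameter choices are assumptions for the demo.
  sem_by_length_demo <- function(theta = 0, n_items = 30, seed = 11) {
    set.seed(seed)
    info_at <- function(b, a = 1.2, c = 0.2) {   # local 3PL information at `theta`
      p <- c + (1 - c) / (1 + exp(-1.702 * a * (theta - b)))
      (1.702 * a)^2 * ((p - c) / (1 - c))^2 * (1 - p) / p
    }
    random_b  <- runif(n_items, -3, 3)           # fixed-form style: difficulties spread out
    matched_b <- rnorm(n_items, theta, 0.2)      # MFI style: difficulties near theta
    data.frame(item        = seq_len(n_items),
               sem_random  = 1 / sqrt(cumsum(info_at(random_b))),
               sem_matched = 1 / sqrt(cumsum(info_at(matched_b))))
  }
  # The matched curve drops below a target such as SEM = 0.30 after far fewer items than the
  # random curve, which is the efficiency gap described in the FAQ.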
  
  output$code_content <- renderUI({
    tagList(
      # Add CSS styles
      tags$style(HTML("
      /* Main Grid Layout */
      .algorithms-grid {
        display: grid;
        grid-template-columns: repeat(auto-fit, minmax(500px, 1fr));
        gap: 1.5rem;
        padding: 1rem;
      }
      
      /* Algorithm Panel Base */
      .algorithm-panel {
        background: white;
        border-radius: 12px;
        box-shadow: 0 4px 20px rgba(0, 0, 0, 0.08);
        border: 1px solid #e1e5e9;
        overflow: hidden;
        transition: all 0.3s cubic-bezier(0.4, 0, 0.2, 1);
        position: relative;
      }
      
      .algorithm-panel:hover {
        transform: translateY(-4px);
        box-shadow: 0 8px 30px rgba(0, 0, 0, 0.12);
      }
      
      /* Panel Header */
      .panel-header {
        display: flex;
        align-items: center;
        gap: 0.75rem;
        padding: 1.25rem 1.5rem;
        background: linear-gradient(135deg, #f8fafc 0%, #f1f5f9 100%);
        border-bottom: 1px solid #e2e8f0;
      }
      
      .panel-icon {
        font-size: 1.25rem;
        color: #3b82f6;
        width: 24px;
        text-align: center;
      }
      
      .panel-header h3 {
        margin: 0;
        font-size: 1.1rem;
        font-weight: 700;
        color: #1e293b;
        flex: 1;
      }
      
      .algorithm-badge {
        background: linear-gradient(135deg, #3b82f6, #6366f1);
        color: white;
        padding: 0.25rem 0.75rem;
        border-radius: 20px;
        font-size: 0.75rem;
        font-weight: 600;
        text-transform: uppercase;
        letter-spacing: 0.5px;
      }
      
      /* Pseudo Code Container */
      .pseudo-code {
        padding: 1.5rem;
        background: #1a1b26;
        border-radius: 0 0 12px 12px;
        position: relative;
      }
      
      /* Method Tabs */
      .method-tabs {
        display: flex;
        flex-direction: column;
        gap: 0.5rem;
      }
      
      .method-tab {
        background: #1a1b26;
        border-radius: 8px;
        overflow: hidden;
        border: 1px solid #2a2b3a;
        transition: all 0.2s ease;
      }
      
      .method-tab.active {
        border-color: #3b82f6;
        box-shadow: 0 0 0 1px #3b82f6;
      }
      
      .method-tab:not(.active):hover {
        border-color: #4b5563;
      }
      
      .tab-header {
        padding: 0.75rem 1rem;
        background: #252631;
        color: #e2e8f0;
        font-weight: 600;
        font-size: 0.9rem;
        border-bottom: 1px solid #2a2b3a;
      }
      
      /* Panel-specific accent colors */
      .eap-panel .panel-icon { color: #10b981; }
      .eap-panel .algorithm-badge { background: linear-gradient(135deg, #10b981, #059669); }
      
      .ipl-panel .panel-icon { color: #f59e0b; }
      .ipl-panel .algorithm-badge { background: linear-gradient(135deg, #f59e0b, #d97706); }
      
      .cat-panel .panel-icon { color: #ef4444; }
      .cat-panel .algorithm-badge { background: linear-gradient(135deg, #ef4444, #dc2626); }
      
      .selection-panel .panel-icon { color: #8b5cf6; }
      .selection-panel .algorithm-badge { background: linear-gradient(135deg, #8b5cf6, #7c3aed); }
      
      .response-panel .panel-icon { color: #06b6d4; }
      .response-panel .algorithm-badge { background: linear-gradient(135deg, #06b6d4, #0891b2); }
      
      .info-panel .panel-icon { color: #f97316; }
      .info-panel .algorithm-badge { background: linear-gradient(135deg, #f97316, #ea580c); }
      
      /* Code Block Improvements */
      .code-block {
        font-family: 'Fira Code', 'Monaco', 'Cascadia Code', monospace;
        font-size: 0.85rem;
        line-height: 1.5;
        color: #c0caf5;
        background: transparent !important;
        border: none !important;
        padding: 0 !important;
        margin: 0 !important;
      }
      
      .code-block .keyword { color: #bb9af7; font-weight: 600; }
      .code-block .function { color: #7dcfff; }
      .code-block .string { color: #9ece6a; }
      .code-block .comment { color: #565f89; font-style: italic; }
      .code-block .number { color: #ff9e64; }
      .code-block .operator { color: #89ddff; }
      .code-block .type { color: #2ac3de; }
      .code-block .variable { color: #c0caf5; }
      
      /* Responsive Design */
      @media (max-width: 768px) {
        .algorithms-grid {
          grid-template-columns: 1fr;
          gap: 1rem;
          padding: 0.5rem;
        }
        
        .panel-header {
          padding: 1rem;
        }
        
        .pseudo-code {
          padding: 1rem;
        }
      }
    ")),
      
      # Main Container
      div(
        id = "cat-algorithms-container",
        class = "algorithms-grid",
        
        # Algorithm 1 - EAP Estimation
        div(
          class = "algorithm-panel eap-panel",
          `data-algorithm` = "eap-estimation",
          div(
            class = "panel-header",
            h3("EAP θ Estimation"),
            span(class = "algorithm-badge", "Bayesian")
          ),
          div(
            class = "pseudo-code",
            code_block("
PROCEDURE EstimateThetaEAP(items, responses, prior)
  INPUT:
    • items: list with {a, b, c} parameters
    • responses: binary vector [0,1]
    • prior: {mean, sd}
  
  IF Empty(items) OR Empty(responses):
    RETURN prior.mean
  
  n ← Min(Length(items), Length(responses))
  nodes ← GaussHermiteNodes()
  weights ← GaussHermiteWeights()
  theta_nodes ← prior.mean + prior.sd * √2 * nodes
  
  FOR i ← 1 TO n:
    p ← P_3PL(theta_nodes, items[i].a, items[i].b, items[i].c)
    IF responses[i] = 1:
      likelihood[,i] ← p
    ELSE:
      likelihood[,i] ← 1 - p
  
  joint_likelihood ← Product(likelihood, axis=1)
  posterior ← joint_likelihood * weights
  posterior ← posterior / Sum(posterior)
  
  theta_estimate ← Sum(posterior * theta_nodes)
  RETURN Clamp(theta_estimate, -4, 4)
END PROCEDURE
          ")
          )
        ),
        
        # Algorithm 2 - 3PL Model
        div(
          class = "algorithm-panel ipl-panel",
          `data-algorithm` = "three-pl-model",
          div(
            class = "panel-header",
            h3("3PL IRT Model"),
            span(class = "algorithm-badge", "Core")
          ),
          div(
            class = "pseudo-code",
            code_block("
FUNCTION P_3PL(θ, a, b, c)
  // Item Response Function
  a ← Max(a, 0.1)           // Min discrimination
  c ← Clamp(c, 0.0, 0.3)    // Guessing bounds
  
  exponent ← -1.702 * a * (θ - b)
  p ← c + (1 - c) / (1 + Exp(exponent))
  RETURN Clamp(p, 1e-10, 1-1e-10)
END FUNCTION

FUNCTION I_3PL(θ, a, b, c)
  // Fisher Information Function
  P ← P_3PL(θ, a, b, c)
  denominator ← Max(1 - c, 1e-10)
  P_prime ← (1.702 * a / denominator) * (P - c) * (1 - P)
  
  information ← (P_prime²) / Max(P * (1 - P), 1e-10)
  RETURN IfFinite(information, 0)
END FUNCTION
          ")
          )
        ),
        
        # Algorithm 3 - CAT Engine
        div(
          class = "algorithm-panel cat-panel",
          `data-algorithm` = "cat-engine",
          div(
            class = "panel-header",
            icon("rocket", class = "panel-icon"),  # Changed to more dynamic icon
            h3("CAT Administration Engine"),
            span(class = "algorithm-badge", "Main Loop")
          ),
          div(
            class = "pseudo-code",
            code_block("
PROCEDURE RunCAT(pool, θ_true, config)
  // Configuration: {se_target, max_items, method}
  administered ← []
  responses ← []
  θ_current ← 0
  se_current ← ∞
  
  // Initial Item
  item ← SelectInitialItem(pool)
  response ← GenerateResponse(θ_true, item)
  administered.Push(item)
  responses.Push(response)
  
  // Adaptive Loop
  WHILE Length(responses) < config.max_items 
        AND se_current > config.se_target:
    
    θ_current ← EstimateThetaEAP(administered, responses)
    information ← ComputeTestInformation(θ_current, administered)
    se_current ← 1 / √information
    
    IF se_current ≤ config.se_target: BREAK
    
    item ← SelectItem(pool, config.method, θ_current, administered)
    response ← GenerateResponse(θ_true, item)
    administered.Push(item)
    responses.Push(response)
  
  RETURN {
    theta: θ_current,
    se: se_current,
    length: Length(responses),
    items: administered
  }
END PROCEDURE
          ")
          )
        ),
        
        # Algorithm 4 - Selection Methods (Full Width)
        div(
          class = "algorithm-panel selection-panel",
          style = "grid-column: 1 / -1;",  # Make this span full width
          `data-algorithm` = "selection-methods",
          div(
            class = "panel-header",
            h3("Item Selection Methods"),
            span(class = "algorithm-badge", "Multi-Method")
          ),
          div(
            class = "method-tabs",
            # MFI Tab
            div(
              class = "method-tab active",
              `data-method` = "mfi",
              div(class = "tab-header", "Maximum Fisher Information"),
              code_block("
FUNCTION SelectMFI(pool, θ, administered)
  available ← pool - administered
  best_info ← -∞
  best_item ← null
  
  FOR item IN available:
    info ← I_3PL(θ, item.a, item.b, item.c)
    IF info > best_info:
      best_info ← info
      best_item ← item
  
  RETURN best_item
END FUNCTION
            ")
            ),
            
            # Randomesque Tab
            div(
              class = "method-tab",
              `data-method` = "randomesque",
              div(class = "tab-header", "Randomesque (Top-K)"),
              code_block("
FUNCTION SelectRandomesque(pool, θ, administered, k=5)
  available ← pool - administered
  info_scores ← []
  
  FOR item IN available:
    info ← I_3PL(θ, item.a, item.b, item.c)
    info_scores.Append({item, info})
  
  SortDescending(info_scores, by=info)
  top_k ← First(k, info_scores)
  RETURN RandomChoice(top_k).item
END FUNCTION
            ")
            ),
            
            # Sympson-Hetter Tab
            div(
              class = "method-tab",
              `data-method` = "sympson-hetter",
              div(class = "tab-header", "Sympson-Hetter Control"),
              code_block("
FUNCTION SelectSH(pool, θ, administered)
  available ← pool - administered
  sorted_items ← SortByInformation(available, θ)
  
  FOR item IN sorted_items:
    r ← RandomUniform()
    IF r ≤ item.K:          // K calibrated offline (see CalibrateSHParameters)
      RETURN item
  
  RETURN sorted_items[0]    // Fallback: most informative item
END FUNCTION

PROCEDURE CalibrateSHParameters(pool, target_exposure=0.2)
  FOR item IN pool: item.K ← 1.0
  
  FOR sim ← 1 TO 1000:
    θ ← RandomNormal()
    test_items ← RunCAT(pool, θ, 'mfi')
    UpdateExposureCounts(test_items)
  
  FOR item IN pool:
    exposure ← exposure_counts[item.id] / 1000
    IF exposure > target_exposure:
      item.K ← item.K * (target_exposure / exposure)
END PROCEDURE
            ")
            )
          )
        ),
        
        # Algorithm 5 - Response System
        div(
          class = "algorithm-panel response-panel",
          `data-algorithm` = "response-system",
          div(
            class = "panel-header",
            h3("Response Generation"),
            span(class = "algorithm-badge", "Simulation")
          ),
          div(
            class = "pseudo-code",
            code_block("
FUNCTION GenerateResponse(θ_true, item)
  p_correct ← P_3PL(θ_true, item.a, item.b, item.c)
  u ← RandomUniform(0, 1)
  RETURN IF u < p_correct THEN 1 ELSE 0
END FUNCTION

FUNCTION SimulateTestSession(θ_true, pool, config)
  results ← RunCAT(pool, θ_true, config)
  RETURN {
    estimated_theta: results.theta,
    true_theta: θ_true,
    bias: results.theta - θ_true,
    sem: results.se,
    items_used: results.length
  }
END FUNCTION
          ")
          )
        ),
        
        # Algorithm 6 - Information System
        div(
          class = "algorithm-panel info-panel",
          `data-algorithm` = "information-system",
          div(
            class = "panel-header",
            h3("Information & Precision"),
            span(class = "algorithm-badge", "Metrics")
          ),
          div(
            class = "pseudo-code",
            code_block("
FUNCTION ComputeTestInformation(θ, administered_items)
  total_info ← 0
  FOR item IN administered_items:
    total_info ← total_info + I_3PL(θ, item.a, item.b, item.c)
  RETURN total_info
END FUNCTION

FUNCTION ComputeStandardError(θ, administered_items)
  total_info ← ComputeTestInformation(θ, administered_items)
  IF total_info < 1e-12:
    RETURN ∞
  ELSE:
    RETURN 1 / √total_info
END FUNCTION

FUNCTION ComputeTestReliability(θ, administered_items)
  sem ← ComputeStandardError(θ, administered_items)
  RETURN 1 - sem²  // Assuming unit variance
END FUNCTION
          ")
          )
        )
      )
    )
  })
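
  # ---- Runnable counterpart to the "Response Generation" pseudocode above ----
  # A minimal sketch under the 3PL model. The function name and arguments are illustrative;
  # this is not the simulator's own response engine.
  generate_response_demo <- function(theta_true, a, b, c) {
    p <- c + (1 - c) / (1 + exp(-1.702 * a * (theta_true - b)))  # 3PL success probability
    as.integer(runif(1) < p)                                     # 1 = correct, 0 = incorrect
  }
  # Example: replicate(10, generate_response_demo(theta_true = 0.5, a = 1.2, b = 0, c = 0.2))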
  
  # Helper function for code blocks
  code_block <- function(text) {
    div(
      class = "code-container",
      tags$pre(
        class = "pseudo-code-block",
        text
      )
    )
  }
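
  # ---- Runnable counterpart to the "EAP θ Estimation" pseudocode above ----
  # A hedged sketch, not the app's estimation routine: it uses a plain fixed quadrature grid
  # with normal prior weights instead of Gauss-Hermite nodes, and defines its own 3PL
  # probability function so it runs on its own. Items are assumed to arrive as a data frame
  # with columns a, b, c, mirroring the pseudocode's item structure.
  estimate_theta_eap_demo <- function(items, responses,
                                      prior_mean = 0, prior_sd = 1,
                                      grid = seq(-4, 4, length.out = 81)) {
    if (nrow(items) == 0 || length(responses) == 0) return(prior_mean)
    n <- min(nrow(items), length(responses))
    p3pl <- function(theta, a, b, c) c + (1 - c) / (1 + exp(-1.702 * a * (theta - b)))
    
    # Likelihood of the observed responses at every grid point
    lik <- vapply(grid, function(th) {
      p <- p3pl(th, items$a[1:n], items$b[1:n], items$c[1:n])
      prod(ifelse(responses[1:n] == 1, p, 1 - p))
    }, numeric(1))
    
    post <- lik * dnorm(grid, prior_mean, prior_sd)   # posterior up to a constant
    post <- post / sum(post)
    theta_hat <- sum(post * grid)                     # posterior mean = EAP
    min(4, max(-4, theta_hat))                        # clamp as in the pseudocode
  }
  # Example:
  # items_demo <- data.frame(a = c(1.2, 0.9, 1.5), b = c(-0.5, 0.3, 1.1), c = c(0.2, 0.15, 0.25))
  # estimate_theta_eap_demo(items_demo, responses = c(1, 1, 0))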
  
  
  # ---------- Item pool quality metrics (reactive) ----------
  pool_quality <- reactive({
    pool <- item_pool()
    if (is.null(pool))
      return(NULL)
    
    th_grid <- seq(-3, 3, by = 0.05)
    
    # Pool-wide TIF (no administered subset)
    tif_pool <- rowSums(sapply(seq_len(nrow(pool)), function(i) {
      I_3pl(th_grid, pool$a[i], pool$b[i], pool$c[i])
    }))
    sem_pool <- ifelse(tif_pool > 0, 1 / sqrt(tif_pool), NA_real_)
    
    # Quality bands for discrimination
    qa <- c(
      above_2_5 = sum(pool$a > 2.5),
      very_high = sum(pool$a >= 2.0 & pool$a <= 2.5),
      high      = sum(pool$a >= 1.5 & pool$a < 2.0),
      medium    = sum(pool$a >= 1.0 & pool$a < 1.5),
      low       = sum(pool$a >= 0.5 & pool$a < 1.0),
      very_low  = sum(pool$a < 0.5)
    )
    
    # Coverage bands for difficulty
    qb_center <- sum(abs(pool$b) <= 0.5)
    qb_moder  <- sum(abs(pool$b) > 0.5 & abs(pool$b) <= 1.5)
    qb_outer  <- sum(abs(pool$b) > 1.5)
    
    # Information at theta = 0 (per item)
    info_at0 <- mapply(I_3pl, 0, pool$a, pool$b, pool$c)
    info_at0[!is.finite(info_at0)] <- 0
    ord <- order(info_at0, decreasing = TRUE)
    top10_mean <- mean(info_at0[head(ord, 10)])
    
    info_at0_stats <- c(
      median = unname(stats::median(info_at0)),
      q25    = unname(stats::quantile(info_at0, 0.25)),
      q75    = unname(stats::quantile(info_at0, 0.75)),
      top10_mean = top10_mean,
      cov_b1 = 100 * mean(abs(pool$b) <= 1),
      cov_b2 = 100 * mean(abs(pool$b) <= 2)
    )
    
    list(
      th_grid = th_grid,
      tif_pool = tif_pool,
      sem_pool = sem_pool,
      qa = qa,
      qb = c(
        center = qb_center,
        moderate = qb_moder,
        outer = qb_outer
      ),
      c_stats = c(
        mean = mean(pool$c),
        q50 = median(pool$c),
        min = min(pool$c),
        max = max(pool$c)
      ),
      info_at0 = info_at0,
      info_at0_stats = info_at0_stats
    )
  })
  
  
  #==============================================
  # ---- Pool Quality Metrics UI ----
  #==============================================
  
  output$pool_quality_metrics <- renderUI({
    pq <- pool_quality()
    req(pq)
    
    # Administered items
    admin_items <- items()
    has_admin <- !is.null(admin_items) && nrow(admin_items) > 0
    
    qa <- pq$qa
    qb <- pq$qb
    c_stats   <- pq$c_stats
    info_stats <- pq$info_at0_stats
    
    # Pool-level means
    avg_info_pool          <- mean(pq$info_at0)
    mean_difficulty_pool   <- mean(item_pool()$b)
    mean_discrimination_pool <- mean(item_pool()$a)
    
    # Information at current theta
    current_theta <- eap_theta()
    pool_info_current <- mapply(
      I_3pl,
      current_theta,
      item_pool()$a,
      item_pool()$b,
      item_pool()$c
    )
    pool_info_current[!is.finite(pool_info_current)] <- 0
    avg_info_pool_current <- mean(pool_info_current)
    
    if (has_admin) {
      admin_info_current <- mapply(
        I_3pl,
        current_theta,
        admin_items$a,
        admin_items$b,
        admin_items$c
      )
      admin_info_current[!is.finite(admin_info_current)] <- 0
      avg_info_admin_current <- mean(admin_info_current)
      
      admin_info_at0 <- mapply(
        I_3pl,
        0,
        admin_items$a,
        admin_items$b,
        admin_items$c
      )
      admin_info_at0[!is.finite(admin_info_at0)] <- 0
      avg_info_admin_at0 <- mean(admin_info_at0)
      
      mean_difficulty_admin     <- mean(admin_items$b)
      mean_discrimination_admin <- mean(admin_items$a)
      mean_guessing_admin       <- mean(admin_items$c)
    }
    
    #  HTML OUTPUT:
    HTML(paste0('
<div class="metrics-container">
  <!-- TOP SECTION: Pool Information Summary -->
  <div class="top-section">
    <div class="pool-summary">
      <div class="summary-title">Pool Information Summary</div>
      <div class="summary-content">
        <div class="summary-item">
          <span>Max TIF:</span>
          <strong>', sprintf("%.2f", max(pq$tif_pool)), '</strong>
        </div>
        <div class="summary-item">
          <span>Min SEM:</span>
          <strong>', sprintf("%.3f", min(pq$sem_pool, na.rm=TRUE)), '</strong>
        </div>
        <div class="summary-item">
          <span>Pool Items:</span>
          <strong>', nrow(item_pool()), '</strong>
        </div>
      </div>
    </div>
  </div>

  <!-- BOTTOM SECTION: All metrics in 2 columns -->
  <div class="bottom-section">
    <div class="left-column">
      <!-- Discrimination Quality -->
      <div class="metric-box">
        <div class="metric-title">Discrimination (a) Quality</div>
        <div class="metric-content">
          <div class="metric-row">
            <span>Excellent (>2.5):</span>
            <strong>', qa["above_2_5"], '</strong>
          </div>
          <div class="metric-row">
            <span>Very High (2.0–2.5):</span>
            <strong>', qa["very_high"], '</strong>
          </div>
          <div class="metric-row">
            <span>High (1.5–2.0):</span>
            <strong>', qa["high"], '</strong>
          </div>
          <div class="metric-row">
            <span>Medium (1.0–1.5):</span>
            <strong>', qa["medium"], '</strong>
          </div>
          <div class="metric-row">
            <span>Low (0.5–1.0):</span>
            <strong>', qa["low"], '</strong>
          </div>
          <div class="metric-row">
            <span>Very Low (<0.5):</span>
            <strong>', qa["very_low"], '</strong>
          </div>
        </div>
      </div>

      <!-- Difficulty Coverage -->
      <div class="metric-box">
        <div class="metric-title">Difficulty (b) Coverage</div>
        <div class="metric-content">
          <div class="metric-row">
            <span>Center (|b| ≤ 0.5):</span>
            <strong>', qb["center"], '</strong>
          </div>
          <div class="metric-row">
            <span>Moderate (0.5 < |b| ≤ 1.5):</span>
            <strong>', qb["moderate"], '</strong>
          </div>
          <div class="metric-row">
            <span>Outer (|b| > 1.5):</span>
            <strong>', qb["outer"], '</strong>
          </div>
          <div class="metric-row">
            <span>Coverage (|b| ≤ 1):</span>
            <strong>', sprintf("%.1f%%", info_stats["cov_b1"]), '</strong>
          </div>
          <div class="metric-row">
            <span>Coverage (|b| ≤ 2):</span>
            <strong>', sprintf("%.1f%%", info_stats["cov_b2"]), '</strong>
          </div>
        </div>
      </div>

      <!-- Guessing Statistics -->
      <div class="metric-box">
        <div class="metric-title">Guessing (c) Statistics</div>
        <div class="metric-content">
          <div class="metric-row">
            <span>Pool Mean:</span>
            <strong>', sprintf("%.3f", c_stats["mean"]), '</strong>
          </div>',
                if (has_admin) {
                  paste0('
          <div class="metric-row">
            <span>Admin Mean:</span>
            <strong>', sprintf("%.3f", mean_guessing_admin), '</strong>
          </div>')
                } else {
                  '
          <div class="metric-row">
            <span>Admin Mean:</span>
            <strong>-</strong>
          </div>'
                }, '
          <div class="metric-row">
            <span>Minimum:</span>
            <strong>', sprintf("%.3f", c_stats["min"]), '</strong>
          </div>
          <div class="metric-row">
            <span>Maximum:</span>
            <strong>', sprintf("%.3f", c_stats["max"]), '</strong>
          </div>',
                if (has_admin) {
                  paste0('
          <div class="metric-row">
            <span>Admin Min:</span>
            <strong>', sprintf("%.3f", min(admin_items$c)), '</strong>
          </div>
          <div class="metric-row">
            <span>Admin Max:</span>
            <strong>', sprintf("%.3f", max(admin_items$c)), '</strong>
          </div>')
                } else {
                  ''
                }, '
        </div>
      </div>
    </div>

    <div class="right-column">
      <!-- Discrimination Statistics -->
      <div class="metric-box">
        <div class="metric-title">Discrimination (a) Statistics</div>
        <div class="metric-content">
          <div class="metric-row">
            <span>Pool Mean:</span>
            <strong>', sprintf("%.2f", mean_discrimination_pool), '</strong>
          </div>',
                if (has_admin) {
                  paste0('
          <div class="metric-row">
            <span>Admin Mean:</span>
            <strong>', sprintf("%.2f", mean_discrimination_admin), '</strong>
          </div>')
                } else {
                  '
          <div class="metric-row">
            <span>Admin Mean:</span>
            <strong>-</strong>
          </div>'
                }, '
          <div class="metric-row">
            <span>Minimum:</span>
            <strong>', sprintf("%.2f", min(item_pool()$a)), '</strong>
          </div>
          <div class="metric-row">
            <span>Maximum:</span>
            <strong>', sprintf("%.2f", max(item_pool()$a)), '</strong>
          </div>',
                if (has_admin) {
                  paste0('
          <div class="metric-row">
            <span>Admin Min:</span>
            <strong>', sprintf("%.2f", min(admin_items$a)), '</strong>
          </div>
          <div class="metric-row">
            <span>Admin Max:</span>
            <strong>', sprintf("%.2f", max(admin_items$a)), '</strong>
          </div>')
                } else {
                  ''
                }, '
        </div>
      </div>

      <!-- Difficulty Statistics -->
      <div class="metric-box">
        <div class="metric-title">Difficulty (b) Statistics</div>
        <div class="metric-content">
          <div class="metric-row">
            <span>Pool Mean:</span>
            <strong>', sprintf("%.2f", mean_difficulty_pool), '</strong>
          </div>',
                if (has_admin) {
                  paste0('
          <div class="metric-row">
            <span>Admin Mean:</span>
            <strong>', sprintf("%.2f", mean_difficulty_admin), '</strong>
          </div>')
                } else {
                  '
          <div class="metric-row">
            <span>Admin Mean:</span>
            <strong>-</strong>
          </div>'
                }, '
          <div class="metric-row">
            <span>Minimum:</span>
            <strong>', sprintf("%.2f", min(item_pool()$b)), '</strong>
          </div>
          <div class="metric-row">
            <span>Maximum:</span>
            <strong>', sprintf("%.2f", max(item_pool()$b)), '</strong>
          </div>',
                if (has_admin) {
                  paste0('
          <div class="metric-row">
            <span>Admin Min:</span>
            <strong>', sprintf("%.2f", min(admin_items$b)), '</strong>
          </div>
          <div class="metric-row">
            <span>Admin Max:</span>
            <strong>', sprintf("%.2f", max(admin_items$b)), '</strong>
          </div>')
                } else {
                  ''
                }, '
        </div>
      </div>

      <!-- Average Item Information -->
      <div class="metric-box">
        <div class="metric-title">Average Item Information</div>
        <div class="metric-content">
          <div class="metric-row">
            <span>Pool @ θ = 0:</span>
            <strong>', sprintf("%.3f", avg_info_pool), '</strong>
          </div>
          <div class="metric-row">
            <span>Pool @ current θ:</span>
            <strong>', sprintf("%.3f", avg_info_pool_current), '</strong>
          </div>',
                if (has_admin) {
                  paste0('
          <div class="metric-row">
            <span>Admin @ θ = 0:</span>
            <strong>', sprintf("%.3f", avg_info_admin_at0), '</strong>
          </div>
          <div class="metric-row">
            <span>Admin @ current θ:</span>
            <strong>', sprintf("%.3f", avg_info_admin_current), '</strong>
          </div>')
                } else {
                  '
          <div class="metric-row">
            <span>Admin @ θ = 0:</span>
            <strong>-</strong>
          </div>
          <div class="metric-row">
            <span>Admin @ current θ:</span>
            <strong>-</strong>
          </div>'
                }, '
          <div class="metric-row">
            <span>Admin Items:</span>
            <strong>', if(has_admin) nrow(admin_items) else "0", '</strong>
          </div>
        </div>
      </div>
    </div>
  </div>
</div>

    <style>
    .metrics-container {
      max-width: 1200px;
      margin: 0 auto;
      font-family: Arial, sans-serif;
      display: flex;
      flex-direction: column;
      gap: 20px;
    }
    
    .top-section {
      width: 100%;
    }
    
    .pool-summary {
      background: #f8f9fa;
      border: 1px solid #e9ecef;
      border-radius: 8px;
      padding: 16px;
      width: 100%;
    }
    
    .summary-title {
      font-weight: 600;
      color: #1e3a8a;
      margin-bottom: 12px;
      font-size: 16px;
    }
    
    .summary-content {
      display: grid;
      grid-template-columns: repeat(3, 1fr);
      gap: 12px;
    }
    
    .summary-item {
      background: white;
      padding: 12px;
      border-radius: 6px;
      display: flex;
      justify-content: space-between;
      align-items: center;
      border: 1px solid #e9ecef;
    }
    
    .bottom-section {
      display: grid;
      grid-template-columns: 1fr 1fr;
      gap: 20px;
      width: 100%;
    }
    
    .left-column, .right-column {
      display: flex;
      flex-direction: column;
      gap: 20px;
    }
    
    .metric-box {
      background: white;
      border: 1px solid #e9ecef;
      border-radius: 8px;
      padding: 16px;
      box-shadow: 0 1px 3px rgba(0,0,0,0.1);
    }
    
    .metric-title {
      font-weight: 600;
      color: #1e3a8a;
      margin-bottom: 12px;
      font-size: 14px;
    }
    
    .metric-content {
      display: flex;
      flex-direction: column;
      gap: 8px;
    }
    
    .metric-row {
      display: flex;
      justify-content: space-between;
      align-items: center;
      font-size: 13px;
      color: #4b5563;
    }
    
    .metric-row strong {
      color: #1e3a8a;
      font-weight: 600;
    }
    </style>
  '))
  })  # <-- CLOSING BRACKET FOR renderUI
  
  
  #==============================================  
  # ---- POOL METRICS ----
  #==============================================  
  
  create_gradient_fill <- function(x, y_upper, y_lower, col, alpha = 0.3) {
    polygon(c(x, rev(x)), c(y_upper, rev(y_lower)), 
            col = adjustcolor(col, alpha.f = alpha), border = NA)
  }
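
  # Hypothetical usage sketch for create_gradient_fill(); it is not called by the plots
  # below, so this example only shows how the helper could shade the area under a curve.
  demo_gradient_fill <- function() {
    x <- seq(-3, 3, by = 0.1)
    y <- dnorm(x)
    plot(x, y, type = "l", lwd = 2, col = "#2980b9",
         xlab = expression(theta), ylab = "Density")
    # Shade between the curve and zero with a translucent blue fill
    create_gradient_fill(x, y, rep(0, length(x)), col = "#3498db")
  }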
  
  output$pool_plot_a <- renderPlot({
    pool <- item_pool()
    req(pool, nrow(pool) > 0)
    
    # Get administered items for comparison
    admin_items <- items()
    has_admin <- !is.null(admin_items) && nrow(admin_items) > 0
    
    tryCatch({
      # Increase top margin for title and legend space
      par(mar = c(4, 4, 6, 4), mgp = c(2, 0.7, 0))
      
      # Calculate common bin breaks for both datasets
      all_a <- pool$a
      if(has_admin) all_a <- c(all_a, admin_items$a)
      
      breaks <- seq(min(all_a), max(all_a), length.out = 21)  # 20 bins
      
      # Plot pool distribution
      h_pool <- hist(pool$a, breaks = breaks, plot = FALSE)
      h_admin <- if(has_admin) hist(admin_items$a, breaks = breaks, plot = FALSE) else NULL
      
      # Calculate ranges - secondary axis gets smaller range
      ylim_primary <- c(0, max(h_pool$counts) * 1.05)
      if(has_admin) {
        # Scale secondary axis to be 0-50% of primary axis range
        admin_max <- max(h_admin$counts)
        scale_factor <- (max(h_pool$counts) * 0.5) / admin_max
        ylim_secondary <- c(0, admin_max * scale_factor)
      }
      
      # Enhanced color scheme - BLUE for pool
      pool_color <- "#3498db"      # Vibrant blue
      pool_fill <- "#3498db33"     # Semi-transparent blue
      pool_axis_color <- "#3498db" # Blue for pool axis
      pool_mean_color <- "#2980b9" # Darker blue for pool mean line
      
      admin_color <- "#e74c3c"     # Vibrant red for admin
      admin_fill <- "#e74c3c66"    # Semi-transparent red
      grid_color <- "#ecf0f1"      # Light gray grid
      
      # Plot pool histogram (primary axis) - BLUE theme
      plot(h_pool, col = pool_fill, border = pool_color, lwd = 1.5,
           main = "", 
           xlab = "Discrimination Parameter (a)", 
           ylab = "Pool Frequency", 
           ylim = ylim_primary,
           cex.main = 1.0, cex.axis = 0.8, cex.lab = 0.9,
           font.lab = 2, col.lab = pool_color, col.axis = pool_color,  # BLUE axis labels
           axes = FALSE)  # Turn off default axes
      
      # Add custom BLUE axes
      axis(1, col = pool_color, col.axis = pool_color, col.ticks = pool_color, lwd = 2)
      axis(2, col = pool_color, col.axis = pool_color, col.ticks = pool_color, lwd = 2)
      
      # Add grid for better readability
      grid(col = grid_color, lty = "solid", lwd = 0.5)
      
      # Add title with blue accent
      title("Discrimination Parameter Distribution", 
            line = 3, cex.main = 1.1, font.main = 2, col.main = pool_color)  # BLUE title
      
      # Add administered items overlay (secondary axis)
      if(has_admin) {
        par(new = TRUE)
        plot(h_admin, col = admin_fill, border = admin_color, lwd = 1.5,
             axes = FALSE, xlab = "", ylab = "", main = "",
             ylim = ylim_secondary)
        axis(4, cex.axis = 0.8, col.axis = admin_color, col = admin_color, lwd = 2)
        mtext("Administered Frequency", side = 4, line = 2.5, 
              cex = 0.85, col = admin_color, font = 2)
      }
      
      # Enhanced mean lines - BLUE for pool mean
      abline(v = mean(pool$a), col = pool_mean_color, lwd = 3, lty = "solid")
      
      # Enhanced legend with BLUE pool elements
      legend_text <- c(
        paste("Pool Mean: ", round(mean(pool$a), 3)),
        paste("Pool Items: ", nrow(pool))
      )
      
      legend_colors <- c(pool_mean_color, NA)  # BLUE for pool mean
      legend_lty <- c("solid", NA)
      legend_lwd <- c(3, NA)
      
      if(has_admin) {
        abline(v = mean(admin_items$a), col = admin_color, lwd = 3, lty = "solid")
        legend_text <- c(
          legend_text,
          paste("Admin Mean: ", round(mean(admin_items$a), 3)),
          paste("Admin Items: ", nrow(admin_items))
        )
        legend_colors <- c(legend_colors, admin_color, NA)
        legend_lty <- c(legend_lty, "solid", NA)
        legend_lwd <- c(legend_lwd, 3, NA)
      }
      
      # Add legend with BLUE pool styling
      legend("top", legend = legend_text, 
             col = legend_colors, 
             lwd = legend_lwd, 
             lty = legend_lty,
             bty = "n", 
             cex = 0.75, 
             horiz = TRUE,
             xpd = TRUE,
             text.col = c(pool_mean_color, pool_color, admin_color, admin_color),  # BLUE text for pool
             inset = c(0, -0.1),  # More negative to push legend lower 
             x.intersp = 0.8,
             seg.len = 1.5)
      
      # Add subtle statistics in the plot area - BLUE for pool
      text(x = max(all_a) * 0.7, y = max(ylim_primary) * 0.9,
           labels = paste("Pool SD: ", round(sd(pool$a), 3)),
           col = pool_color, cex = 0.7, font = 3, adj = 0)  # BLUE text
      
      if(has_admin) {
        text(x = max(all_a) * 0.7, y = max(ylim_primary) * 0.8,
             labels = paste("Admin SD: ", round(sd(admin_items$a), 3)),
             col = admin_color, cex = 0.7, font = 3, adj = 0)  # RED text for admin
      }
      
    }, error = function(e) {
      # Enhanced error plot
      par(mar = c(2, 2, 2, 2))
      plot(1, 1, type = "n", xlim = c(0, 1), ylim = c(0, 1), 
           axes = FALSE, xlab = "", ylab = "", main = "")
      rect(0.2, 0.4, 0.8, 0.6, col = "#f8d7da", border = "#f5c6cb")
      text(0.5, 0.5, "Data Not Available", col = "#721c24", cex = 1.2, font = 2)
      text(0.5, 0.3, "Please check data source", col = "#856404", cex = 0.8)
    })
  })  # <-- CLOSING BRACKET FOR pool_plot_a
  
  output$pool_plot_b <- renderPlot({
    pool <- item_pool()
    req(pool, nrow(pool) > 0)
    
    # Get administered items for comparison
    admin_items <- items()
    has_admin <- !is.null(admin_items) && nrow(admin_items) > 0
    
    tryCatch({
      # Increase top margin for title and legend space
      par(mar = c(4, 4, 6, 4), mgp = c(2, 0.7, 0))
      
      # Calculate common bin breaks for both datasets
      all_b <- pool$b
      if(has_admin) all_b <- c(all_b, admin_items$b)
      
      breaks <- seq(min(all_b), max(all_b), length.out = 21)  # 20 bins
      
      # Plot pool distribution
      h_pool <- hist(pool$b, breaks = breaks, plot = FALSE)
      h_admin <- if(has_admin) hist(admin_items$b, breaks = breaks, plot = FALSE) else NULL
      
      # Calculate ranges - secondary axis gets smaller range
      ylim_primary <- c(0, max(h_pool$counts) * 1.05)
      if(has_admin) {
        # Scale secondary axis to be 0-50% of primary axis range
        admin_max <- max(h_admin$counts)
        scale_factor <- (max(h_pool$counts) * 0.5) / admin_max
        ylim_secondary <- c(0, admin_max * scale_factor)
      }
      
      # Enhanced color scheme - BLUE for pool
      pool_color <- "#3498db"      # Vibrant blue
      pool_fill <- "#3498db33"     # Semi-transparent blue
      pool_axis_color <- "#3498db" # Blue for pool axis
      pool_mean_color <- "#2980b9" # Darker blue for pool mean line
      
      admin_color <- "#e74c3c"     # Vibrant red for admin
      admin_fill <- "#e74c3c66"    # Semi-transparent red
      grid_color <- "#ecf0f1"      # Light gray grid
      
      # Plot pool histogram (primary axis) - BLUE theme
      plot(h_pool, col = pool_fill, border = pool_color, lwd = 1.5,
           main = "", 
           xlab = "Difficulty Parameter (b)", 
           ylab = "Pool Frequency", 
           ylim = ylim_primary,
           cex.main = 1.0, cex.axis = 0.8, cex.lab = 0.9,
           font.lab = 2, col.lab = pool_color, col.axis = pool_color,
           axes = FALSE)
      
      # Add custom BLUE axes
      axis(1, col = pool_color, col.axis = pool_color, col.ticks = pool_color, lwd = 2)
      axis(2, col = pool_color, col.axis = pool_color, col.ticks = pool_color, lwd = 2)
      
      # Add grid for better readability
      grid(col = grid_color, lty = "solid", lwd = 0.5)
      
      # Add title with blue accent
      title("Difficulty Parameter Distribution", 
            line = 3, cex.main = 1.1, font.main = 2, col.main = pool_color)
      
      # Add administered items overlay (secondary axis)
      if(has_admin) {
        par(new = TRUE)
        plot(h_admin, col = admin_fill, border = admin_color, lwd = 1.5,
             axes = FALSE, xlab = "", ylab = "", main = "",
             ylim = ylim_secondary)
        axis(4, cex.axis = 0.8, col.axis = admin_color, col = admin_color, lwd = 2)
        mtext("Administered Frequency", side = 4, line = 2.5, 
              cex = 0.85, col = admin_color, font = 2)
      }
      
      # Enhanced mean lines - BLUE for pool mean
      abline(v = mean(pool$b), col = pool_mean_color, lwd = 3, lty = "solid")
      
      # Enhanced legend with BLUE pool elements
      legend_text <- c(
        paste("Pool Mean: ", round(mean(pool$b), 3)),
        paste("Pool Items: ", nrow(pool))
      )
      
      legend_colors <- c(pool_mean_color, NA)
      legend_lty <- c("solid", NA)
      legend_lwd <- c(3, NA)
      
      if(has_admin) {
        abline(v = mean(admin_items$b), col = admin_color, lwd = 3, lty = "solid")
        legend_text <- c(
          legend_text,
          paste("Admin Mean: ", round(mean(admin_items$b), 3)),
          paste("Admin Items: ", nrow(admin_items))
        )
        legend_colors <- c(legend_colors, admin_color, NA)
        legend_lty <- c(legend_lty, "solid", NA)
        legend_lwd <- c(legend_lwd, 3, NA)
      }
      
      # Add legend with BLUE pool styling
      legend("top", legend = legend_text, 
             col = legend_colors, 
             lwd = legend_lwd, 
             lty = legend_lty,
             bty = "n", 
             cex = 0.75, 
             horiz = TRUE,
             xpd = TRUE,
             text.col = c(pool_mean_color, pool_color, admin_color, admin_color),
             inset = c(0, -0.1),
             x.intersp = 0.8,
             seg.len = 1.5)
      
      # Add subtle statistics in the plot area - BLUE for pool
      text(x = max(all_b) * 0.7, y = max(ylim_primary) * 0.9,
           labels = paste("Pool SD: ", round(sd(pool$b), 3)),
           col = pool_color, cex = 0.7, font = 3, adj = 0)
      
      if(has_admin) {
        text(x = max(all_b) * 0.7, y = max(ylim_primary) * 0.8,
             labels = paste("Admin SD: ", round(sd(admin_items$b), 3)),
             col = admin_color, cex = 0.7, font = 3, adj = 0)
      }
      
    }, error = function(e) {
      par(mar = c(2, 2, 2, 2))
      plot(1, 1, type = "n", xlim = c(0, 1), ylim = c(0, 1), 
           axes = FALSE, xlab = "", ylab = "", main = "")
      rect(0.2, 0.4, 0.8, 0.6, col = "#f8d7da", border = "#f5c6cb")
      text(0.5, 0.5, "Data Not Available", col = "#721c24", cex = 1.2, font = 2)
      text(0.5, 0.3, "Please check data source", col = "#856404", cex = 0.8)
    })
  })  # <-- CLOSING BRACKET FOR pool_plot_b
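  
  # Note on the dual-axis overlays above and below: the administered-item
  # histogram is drawn on its own secondary axis, scaled to roughly half the
  # pool axis so a small administered set stays visible. Bar heights for the
  # two groups are therefore not directly comparable; each series should be
  # read against its own (left vs. right) axis.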
  
  output$pool_plot_c <- renderPlot({
    pool <- item_pool()
    req(pool, nrow(pool) > 0)
    
    # Get administered items for comparison
    admin_items <- items()
    has_admin <- !is.null(admin_items) && nrow(admin_items) > 0
    
    tryCatch({
      # Increase top margin for title and legend space
      par(mar = c(4, 4, 6, 4), mgp = c(2, 0.7, 0))
      
      # Calculate common bin breaks for both datasets
      all_c <- pool$c
      if(has_admin) all_c <- c(all_c, admin_items$c)
      
      breaks <- seq(min(all_c), max(all_c), length.out = 16)  # 15 bins
      
      # Plot pool distribution
      h_pool <- hist(pool$c, breaks = breaks, plot = FALSE)
      h_admin <- if(has_admin) hist(admin_items$c, breaks = breaks, plot = FALSE) else NULL
      
      # Calculate ranges - secondary axis gets smaller range
      ylim_primary <- c(0, max(h_pool$counts) * 1.05)
      if(has_admin) {
        # Scale secondary axis to be 0-50% of primary axis range
        admin_max <- max(h_admin$counts)
        scale_factor <- (max(h_pool$counts) * 0.5) / admin_max
        ylim_secondary <- c(0, admin_max * scale_factor)
      }
      
      # Enhanced color scheme - BLUE for pool
      pool_color <- "#3498db"      # Vibrant blue
      pool_fill <- "#3498db33"     # Semi-transparent blue
      pool_axis_color <- "#3498db" # Blue for pool axis
      pool_mean_color <- "#2980b9" # Darker blue for pool mean line
      
      admin_color <- "#e74c3c"     # Vibrant red for admin
      admin_fill <- "#e74c3c66"    # Semi-transparent red
      grid_color <- "#ecf0f1"      # Light gray grid
      
      # Plot pool histogram (primary axis) - BLUE theme
      plot(h_pool, col = pool_fill, border = pool_color, lwd = 1.5,
           main = "", 
           xlab = "Guessing Parameter (c)", 
           ylab = "Pool Frequency", 
           ylim = ylim_primary,
           cex.main = 1.0, cex.axis = 0.8, cex.lab = 0.9,
           font.lab = 2, col.lab = pool_color, col.axis = pool_color,
           axes = FALSE)
      
      # Add custom BLUE axes
      axis(1, col = pool_color, col.axis = pool_color, col.ticks = pool_color, lwd = 2)
      axis(2, col = pool_color, col.axis = pool_color, col.ticks = pool_color, lwd = 2)
      
      # Add grid for better readability
      grid(col = grid_color, lty = "solid", lwd = 0.5)
      
      # Add title with blue accent
      title("Guessing Parameter Distribution", 
            line = 3, cex.main = 1.1, font.main = 2, col.main = pool_color)
      
      # Add administered items overlay (secondary axis)
      if(has_admin) {
        par(new = TRUE)
        plot(h_admin, col = admin_fill, border = admin_color, lwd = 1.5,
             axes = FALSE, xlab = "", ylab = "", main = "",
             ylim = ylim_secondary)
        axis(4, cex.axis = 0.8, col.axis = admin_color, col = admin_color, lwd = 2)
        mtext("Administered Frequency", side = 4, line = 2.5, 
              cex = 0.85, col = admin_color, font = 2)
      }
      
      # Enhanced mean lines - BLUE for pool mean
      abline(v = mean(pool$c), col = pool_mean_color, lwd = 3, lty = "solid")
      
      # Enhanced legend with BLUE pool elements
      legend_text <- c(
        paste("Pool Mean: ", round(mean(pool$c), 3)),
        paste("Pool Items: ", nrow(pool))
      )
      
      legend_colors <- c(pool_mean_color, NA)
      legend_lty <- c("solid", NA)
      legend_lwd <- c(3, NA)
      
      if(has_admin) {
        abline(v = mean(admin_items$c), col = admin_color, lwd = 3, lty = "solid")
        legend_text <- c(
          legend_text,
          paste("Admin Mean: ", round(mean(admin_items$c), 3)),
          paste("Admin Items: ", nrow(admin_items))
        )
        legend_colors <- c(legend_colors, admin_color, NA)
        legend_lty <- c(legend_lty, "solid", NA)
        legend_lwd <- c(legend_lwd, 3, NA)
      }
      
      # Add legend with BLUE pool styling
      legend("top", legend = legend_text, 
             col = legend_colors, 
             lwd = legend_lwd, 
             lty = legend_lty,
             bty = "n", 
             cex = 0.75, 
             horiz = TRUE,
             xpd = TRUE,
             text.col = c(pool_mean_color, pool_color, admin_color, admin_color),
             inset = c(0, -0.1),
             x.intersp = 0.8,
             seg.len = 1.5)
      
      # Add subtle statistics in the plot area - BLUE for pool
      text(x = max(all_c) * 0.7, y = max(ylim_primary) * 0.9,
           labels = paste("Pool SD: ", round(sd(pool$c), 3)),
           col = pool_color, cex = 0.7, font = 3, adj = 0)
      
      if(has_admin) {
        text(x = max(all_c) * 0.7, y = max(ylim_primary) * 0.8,
             labels = paste("Admin SD: ", round(sd(admin_items$c), 3)),
             col = admin_color, cex = 0.7, font = 3, adj = 0)
      }
      
    }, error = function(e) {
      par(mar = c(2, 2, 2, 2))
      plot(1, 1, type = "n", xlim = c(0, 1), ylim = c(0, 1), 
           axes = FALSE, xlab = "", ylab = "", main = "")
      rect(0.2, 0.4, 0.8, 0.6, col = "#f8d7da", border = "#f5c6cb")
      text(0.5, 0.5, "Data Not Available", col = "#721c24", cex = 1.2, font = 2)
      text(0.5, 0.3, "Please check data source", col = "#856404", cex = 0.8)
    })
  })  # <-- CLOSING BRACKET FOR pool_plot_c
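  
  # The information plots below call I_3pl(), which is defined elsewhere in
  # the app. For reference, a minimal sketch of the standard 3PL item
  # information it is assumed to compute is given here; the helper name
  # I_3pl_reference and the default D = 1.702 are illustrative only and are
  # not used by the app (which supplies its own D.const).
  I_3pl_reference <- function(theta, a, b, c, D = 1.702) {
    p <- c + (1 - c) / (1 + exp(-D * a * (theta - b)))   # 3PL response probability
    D^2 * a^2 * (1 - p) * (p - c)^2 / (p * (1 - c)^2)    # Fisher information at theta
  }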
  
  output$pool_plot_info0 <- renderPlot({
    pool <- item_pool()
    req(pool, nrow(pool) > 0)
    
    # Get administered items for comparison
    admin_items <- items()
    has_admin <- !is.null(admin_items) && nrow(admin_items) > 0
    
    tryCatch({
      # Calculate information at theta = 0 for pool
      info_at0_pool <- mapply(I_3pl, 0, pool$a, pool$b, pool$c)
      info_at0_pool[!is.finite(info_at0_pool)] <- 0
      
      # Calculate information at theta = 0 for administered items
      if(has_admin) {
        info_at0_admin <- mapply(I_3pl, 0, admin_items$a, admin_items$b, admin_items$c)
        info_at0_admin[!is.finite(info_at0_admin)] <- 0
      }
      
      # Enhanced color scheme
      pool_color <- "#3498db"      # Vibrant blue
      admin_color <- "#e74c3c"     # Vibrant red
      pool_fill <- "#3498db33"     # Semi-transparent blue
      admin_fill <- "#e74c3c66"    # Semi-transparent red
      grid_color <- "#ecf0f1"      # Light gray grid
      
      # SAME WIDTH as other plots - consistent margin
      par(mar = c(4, 4, 6, 4), mgp = c(2, 0.7, 0))
      
      if(has_admin) {
        # Calculate appropriate y-axis ranges for each
        pool_ylim <- c(0, max(info_at0_pool, na.rm = TRUE) * 1.05)
        admin_ylim <- c(0, max(info_at0_admin, na.rm = TRUE) * 1.05)
        
        # SAME WIDTH x-axis limits as other plots
        xlim <- c(0.5, 2.5)  # Consistent with other plots
        
        # Plot Pool boxplot (left axis) - SAME WIDTH but EXTRA WIDE boxes
        boxplot(info_at0_pool, 
                main = "", 
                ylab = "", 
                col = pool_fill, 
                border = pool_color, 
                lwd = 2,
                cex.axis = 0.8, cex.lab = 0.9,
                ylim = pool_ylim,
                axes = FALSE,
                at = 1,  # Position at x=1
                boxwex = 0.8,  # EXTRA WIDE box width but same overall plot width
                xlim = xlim)   # Consistent x-axis limits
        
        # Add custom BLUE axes
        axis(1, at = 1:2, labels = c("Item Pool", "Administered"), 
             col = "black", col.axis = "black", col.ticks = "black", lwd = 2)
        axis(2, col = pool_color, col.axis = pool_color, col.ticks = pool_color, lwd = 2)
        
        # Add y-axis label for pool
        mtext("Pool Information", side = 2, line = 2.5, 
              cex = 0.85, col = pool_color, font = 2)
        
        # Add grid for better readability
        grid(col = grid_color, lty = "solid", lwd = 0.5)
        
        # Add title with blue accent - SAME as other plots
        title("Fisher Information Comparison @ θ = 0", 
              line = 3, cex.main = 1.1, font.main = 2, col.main = pool_color)
        
        # Add Administered boxplot with secondary axis - SAME WIDTH but EXTRA WIDE boxes
        par(new = TRUE)
        boxplot(info_at0_admin, 
                main = "", 
                ylab = "", 
                col = admin_fill, 
                border = admin_color, 
                lwd = 2,
                cex.axis = 0.8, cex.lab = 0.9,
                ylim = admin_ylim,
                axes = FALSE,
                at = 2,  # Position at x=2
                boxwex = 0.8,  # EXTRA WIDE box width but same overall plot width
                xlim = xlim)   # Consistent x-axis limits
        
        # Add right axis (Admin - RED) with its own scale
        axis(4, cex.axis = 0.8, col.axis = admin_color, col = admin_color, lwd = 2)
        mtext("Administered Information", side = 4, line = 2.5, 
              cex = 0.85, col = admin_color, font = 2)
        
        # Add mean points - larger for extra wide boxes
        points(1, mean(info_at0_pool), pch = 23, bg = pool_color, col = "white", cex = 2.0, lwd = 2)
        points(2, mean(info_at0_admin), pch = 23, bg = admin_color, col = "white", cex = 2.0, lwd = 2)
        
      } else {
        # Only pool data available - SAME WIDTH as other single plots but EXTRA WIDE box
        xlim <- c(0.5, 1.5)  # Consistent with other single plots
        
        boxplot(info_at0_pool, 
                main = "", 
                ylab = "Fisher Information", 
                col = pool_fill, 
                border = pool_color, 
                lwd = 2,
                cex.axis = 0.8, cex.lab = 0.9,
                xlab = "Item Pool",
                axes = FALSE,
                xlim = xlim,  # Consistent x-axis limits
                boxwex = 0.8)  # EXTRA WIDE box width but same overall plot width
        
        # Add custom BLUE axes
        axis(1, at = 1, labels = "Item Pool", 
             col = pool_color, col.axis = pool_color, col.ticks = pool_color, lwd = 2)
        axis(2, col = pool_color, col.axis = pool_color, col.ticks = pool_color, lwd = 2)
        
        # Add grid for better readability
        grid(col = grid_color, lty = "solid", lwd = 0.5)
        
        # Add title with blue accent - SAME as other plots
        title("Fisher Information @ θ = 0", 
              line = 3, cex.main = 1.1, font.main = 2, col.main = pool_color)
        
        points(1, mean(info_at0_pool), pch = 23, bg = pool_color, col = "white", cex = 2.0, lwd = 2)
      }
      
      # Enhanced legend with SD values included - SAME as other plots
      legend_text <- c(
        paste("Pool: μ=", round(mean(info_at0_pool), 3), 
              " σ=", round(sd(info_at0_pool), 3))
      )
      
      legend_colors <- c(pool_color)
      legend_pch <- c(23)
      
      if(has_admin) {
        legend_text <- c(
          legend_text,
          paste("Admin: μ=", round(mean(info_at0_admin), 3), 
                " σ=", round(sd(info_at0_admin), 3))
        )
        legend_colors <- c(legend_colors, admin_color)
        legend_pch <- c(legend_pch, 23)
      }
      
      # Add legend with matching positioning and styling - SAME as other plots
      legend("top", legend = legend_text, 
             col = legend_colors, 
             pch = legend_pch,
             pt.bg = legend_colors,
             pt.cex = 1.5,
             pt.lwd = 2,
             bty = "n", 
             cex = 0.75,
             horiz = TRUE,
             xpd = TRUE,
             text.col = legend_colors,
             inset = c(0, -0.1),
             x.intersp = 0.8)
      
    }, error = function(e) {
      # Error plot - SAME as other plots
      par(mar = c(2, 2, 2, 2))
      plot(1, 1, type = "n", xlim = c(0, 1), ylim = c(0, 1), 
           axes = FALSE, xlab = "", ylab = "", main = "")
      rect(0.2, 0.4, 0.8, 0.6, col = "#f8d7da", border = "#f5c6cb")
      text(0.5, 0.5, "Data Not Available", col = "#721c24", cex = 1.2, font = 2)
      text(0.5, 0.3, "Please check data source", col = "#856404", cex = 0.8)
    })
  })
  
  #==============================================
  # ---- Optimized TIF and SEM Calculations ----
  #==============================================
  
  tif_sem_data <- reactive({
    pool <- item_pool()
    req(pool, nrow(pool) > 0)
    
    admin_items <- items()
    has_admin <- !is.null(admin_items) && nrow(admin_items) > 0
    
    theta_range <- seq(-3, 3, length.out = 50)
    
    # Vectorized TIF for the pool: the 3PL probability uses the scaling
    # constant D.const, so the item-information term carries D.const^2
    pool_tif <- sapply(theta_range, function(theta) {
      p <- pool$c + (1 - pool$c) / (1 + exp(-D.const * pool$a * (theta - pool$b)))
      q <- 1 - p
      sum(D.const^2 * pool$a^2 * q * ((p - pool$c)^2 / (p * (1 - pool$c)^2)))
    })
    
    pool_sem <- 1 / sqrt(pool_tif)
    pool_sem[!is.finite(pool_sem)] <- NA
    
    # Administered-item TIF, computed only when items have been administered
    if(has_admin) {
      admin_tif <- sapply(theta_range, function(theta) {
        p <- admin_items$c + (1 - admin_items$c) / (1 + exp(-D.const * admin_items$a * (theta - admin_items$b)))
        q <- 1 - p
        sum(D.const^2 * admin_items$a^2 * q * ((p - admin_items$c)^2 / (p * (1 - admin_items$c)^2)))
      })
      admin_sem <- 1 / sqrt(admin_tif)
      admin_sem[!is.finite(admin_sem)] <- NA
    } else {
      admin_tif <- NULL
      admin_sem <- NULL
    }
    
    list(
      theta_range = theta_range,
      pool_tif = pool_tif,
      pool_sem = pool_sem,
      admin_tif = admin_tif,
      admin_sem = admin_sem,
      has_admin = has_admin
    )
  })
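  
  # Relationship used above: the test information function (TIF) is the sum of
  # the 3PL item informations at each theta, and the conditional standard error
  # follows as SEM(theta) = 1 / sqrt(TIF(theta)). As an illustrative check
  # (not app output), TIF = 11.11 gives SEM = 1 / sqrt(11.11) ≈ 0.30, the
  # precision target used in the diagnostics further below.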
  
  #==============================================
  # ---- TIF Plot with Clean Annotations ----
  #==============================================
  
  output$pool_plot_tif <- renderPlot({
    data <- tif_sem_data()
    req(data)
    
    tryCatch({
      theta_range <- data$theta_range
      pool_tif <- data$pool_tif
      admin_tif <- data$admin_tif
      has_admin <- data$has_admin
      
      # Color scheme matching pool_plot_c
      pool_color <- "#3498db"      # Vibrant blue
      admin_color <- "#e74c3c"     # Vibrant red
      grid_color <- "#ecf0f1"      # Light gray grid
      current_theta_color <- "#2ecc71"  # Green for current theta
      
      # Matching margin and layout
      par(mar = c(4, 4, 6, 4), mgp = c(2, 0.7, 0))
      
      # Calculate y-axis ranges
      pool_ylim <- c(0, max(pool_tif, na.rm = TRUE) * 1.05)
      
      if(has_admin) {
        admin_ylim <- c(0, max(admin_tif, na.rm = TRUE) * 1.05)
      }
      
      # Plot Pool TIF (left axis) - matching structure
      plot(theta_range, pool_tif, type = "l", lwd = 3, col = pool_color,
           main = "", 
           xlab = "Theta (θ)", ylab = "",
           xlim = c(-3, 3), ylim = pool_ylim,
           cex.main = 1.0, cex.axis = 0.8, cex.lab = 0.9,
           font.lab = 2, col.lab = pool_color,
           axes = FALSE)
      
      # Add custom BLUE axes
      axis(1, col = pool_color, col.axis = pool_color, col.ticks = pool_color, lwd = 2)
      axis(2, col = pool_color, col.axis = pool_color, col.ticks = pool_color, lwd = 2)
      
      # Add grid for better readability
      grid(col = grid_color, lty = "solid", lwd = 0.5)
      
      # Add title with blue accent - matching format
      title("Test Information Function (TIF)", 
            line = 3, cex.main = 1.1, font.main = 2, col.main = pool_color)
      
      # Add y-axis label
      mtext("Pool Information", side = 2, line = 2.5, 
            cex = 0.85, col = pool_color, font = 2)
      
      # Add current theta line and pool point
      current_theta <- eap_theta()
      if(!is.null(current_theta) && is.finite(current_theta)) {
        abline(v = current_theta, col = current_theta_color, lwd = 3, lty = "solid")
        
        # Add pool point at current theta (on pool scale)
        closest_idx <- which.min(abs(theta_range - current_theta))
        pool_point_y <- pool_tif[closest_idx]
        points(current_theta, pool_point_y, pch = 21, bg = pool_color, 
               cex = 2, col = "white", lwd = 2)
        
        # Annotation next to pool point
        text(current_theta + 0.3, pool_point_y, 
             labels = paste("Pool:", round(pool_point_y, 1)),
             col = pool_color, cex = 0.8, font = 2, adj = 0)
      }
      
      # Add Administered TIF with its own scale
      if(has_admin) {
        par(new = TRUE)
        plot(theta_range, admin_tif, type = "l", lwd = 3, col = admin_color,
             axes = FALSE, xlab = "", ylab = "", main = "",
             xlim = c(-3, 3), ylim = admin_ylim)
        
        # Add right axis (Admin - RED) with its own scale
        axis(4, cex.axis = 0.8, col.axis = admin_color, col = admin_color, lwd = 2)
        mtext("Administered Information", side = 4, line = 2.5, 
              cex = 0.85, col = admin_color, font = 2)
        
        # Add admin point at current theta (on admin scale)
        if(!is.null(current_theta) && is.finite(current_theta)) {
          admin_point_y <- admin_tif[closest_idx]
          points(current_theta, admin_point_y, pch = 21, bg = admin_color, 
                 cex = 2, col = "white", lwd = 2)
          
          # Annotation next to admin point
          text(current_theta - 0.3, admin_point_y, 
               labels = paste("Admin:", round(admin_point_y, 1)),
               col = admin_color, cex = 0.8, font = 2, adj = 1)
        }
      }
      
      # Enhanced legend - matching pool_plot_c format
      legend_text <- c(
        "Pool TIF",
        "Current Theta"
      )
      
      legend_colors <- c(pool_color, current_theta_color)
      legend_lty <- c("solid", "solid")
      legend_lwd <- c(3, 3)
      
      if(has_admin) {
        legend_text <- c(legend_text, "Administered TIF")
        legend_colors <- c(legend_colors, admin_color)
        legend_lty <- c(legend_lty, "solid")
        legend_lwd <- c(legend_lwd, 3)
      }
      
      # Add legend with matching positioning and styling
      legend("top", legend = legend_text, 
             col = legend_colors, 
             lwd = legend_lwd, 
             lty = legend_lty,
             bty = "n", 
             cex = 0.75, 
             horiz = TRUE,
             xpd = TRUE,
             text.col = legend_colors,
             inset = c(0, -0.1),
             x.intersp = 0.8,
             seg.len = 1.5)
      
    }, error = function(e) {
      par(mar = c(2, 2, 2, 2))
      plot(1, 1, type = "n", xlim = c(0, 1), ylim = c(0, 1), 
           axes = FALSE, xlab = "", ylab = "", main = "")
      rect(0.2, 0.4, 0.8, 0.6, col = "#f8d7da", border = "#f5c6cb")
      text(0.5, 0.5, "Data Not Available", col = "#721c24", cex = 1.2, font = 2)
      text(0.5, 0.3, "Please check data source", col = "#856404", cex = 0.8)
    })
  })
  
  #==============================================
  # ---- SEM Plot with Clean Annotations ----
  #==============================================
  
  output$pool_plot_sem <- renderPlot({
    data <- tif_sem_data()
    req(data)
    
    tryCatch({
      theta_range <- data$theta_range
      pool_sem <- data$pool_sem
      admin_sem <- data$admin_sem
      has_admin <- data$has_admin
      
      # Color scheme matching pool_plot_c
      pool_color <- "#3498db"      # Vibrant blue
      admin_color <- "#e74c3c"     # Vibrant red
      grid_color <- "#ecf0f1"      # Light gray grid
      current_theta_color <- "#2ecc71"  # Green for current theta
      
      # Matching margin and layout
      par(mar = c(4, 4, 6, 4), mgp = c(2, 0.7, 0))
      
      # Calculate appropriate y-axis ranges for each
      pool_ylim <- c(0, max(pool_sem, na.rm = TRUE) * 1.05)
      
      if(has_admin) {
        # Early in a test the administered SEM can be extremely large;
        # cap the secondary axis at 1 so the curve stays readable
        admin_max <- max(admin_sem, na.rm = TRUE)
        if(admin_max > 100) {
          admin_ylim <- c(0, 1)
        } else {
          admin_ylim <- c(0, admin_max * 1.05)
        }
      }
      
      # Plot Pool SEM (left axis) - matching structure
      plot(theta_range, pool_sem, type = "l", lwd = 3, col = pool_color,
           main = "", 
           xlab = "Theta (θ)", ylab = "",
           xlim = c(-3, 3), ylim = pool_ylim,
           cex.main = 1.0, cex.axis = 0.8, cex.lab = 0.9,
           font.lab = 2, col.lab = pool_color,
           axes = FALSE)
      
      # Add custom BLUE axes
      axis(1, col = pool_color, col.axis = pool_color, col.ticks = pool_color, lwd = 2)
      axis(2, col = pool_color, col.axis = pool_color, col.ticks = pool_color, lwd = 2)
      
      # Add grid for better readability
      grid(col = grid_color, lty = "solid", lwd = 0.5)
      
      # Add title with blue accent - matching format
      title("Standard Error of Measurement (SEM)", 
            line = 3, cex.main = 1.1, font.main = 2, col.main = pool_color)
      
      # Add y-axis label
      mtext("Pool Standard Error", side = 2, line = 2.5, 
            cex = 0.85, col = pool_color, font = 2)
      
      # Add current theta line and pool point
      current_theta <- eap_theta()
      if(!is.null(current_theta) && is.finite(current_theta)) {
        abline(v = current_theta, col = current_theta_color, lwd = 3, lty = "solid")
        
        # Add pool point at current theta (on pool scale)
        closest_idx <- which.min(abs(theta_range - current_theta))
        pool_point_y <- pool_sem[closest_idx]
        points(current_theta, pool_point_y, pch = 21, bg = pool_color, 
               cex = 2, col = "white", lwd = 2)
        
        # Annotation next to pool point
        text(current_theta + 0.3, pool_point_y, 
             labels = paste("Pool:", round(pool_point_y, 3)),
             col = pool_color, cex = 0.8, font = 2, adj = 0)
      }
      
      # Add Administered SEM with its own scale
      if(has_admin) {
        par(new = TRUE)
        plot(theta_range, admin_sem, type = "l", lwd = 3, col = admin_color,
             axes = FALSE, xlab = "", ylab = "", main = "",
             xlim = c(-3, 3), ylim = admin_ylim)
        
        # Add right axis (Admin - RED) with its own scale
        axis(4, cex.axis = 0.8, col.axis = admin_color, col = admin_color, lwd = 2)
        mtext("Administered Standard Error", side = 4, line = 2.5, 
              cex = 0.85, col = admin_color, font = 2)
        
        # Add admin point at current theta (on admin scale)
        if(!is.null(current_theta) && is.finite(current_theta)) {
          admin_point_y <- admin_sem[closest_idx]
          points(current_theta, admin_point_y, pch = 21, bg = admin_color, 
                 cex = 2, col = "white", lwd = 2)
          
          # Annotation next to admin point
          text(current_theta - 0.3, admin_point_y, 
               labels = paste("Admin:", round(admin_point_y, 3)),
               col = admin_color, cex = 0.8, font = 2, adj = 1)
        }
        
        # Add note if admin values were capped
        if(max(admin_sem, na.rm = TRUE) > admin_ylim[2]) {
          legend("bottom", legend = "Note: Admin SEM values capped for visualization", 
                 bty = "n", cex = 0.7, text.col = "darkgray")
        }
      }
      
      # Enhanced legend - matching pool_plot_c format
      legend_text <- c(
        "Pool SEM",
        "Current Theta"
      )
      
      legend_colors <- c(pool_color, current_theta_color)
      legend_lty <- c("solid", "solid")
      legend_lwd <- c(3, 3)
      
      if(has_admin) {
        legend_text <- c(legend_text, "Administered SEM")
        legend_colors <- c(legend_colors, admin_color)
        legend_lty <- c(legend_lty, "solid")
        legend_lwd <- c(legend_lwd, 3)
      }
      
      # Add legend with matching positioning and styling
      legend("top", legend = legend_text, 
             col = legend_colors, 
             lwd = legend_lwd, 
             lty = legend_lty,
             bty = "n", 
             cex = 0.75, 
             horiz = TRUE,
             xpd = TRUE,
             text.col = legend_colors,
             inset = c(0, -0.1),
             x.intersp = 0.8,
             seg.len = 1.5)
      
    }, error = function(e) {
      par(mar = c(2, 2, 2, 2))
      plot(1, 1, type = "n", xlim = c(0, 1), ylim = c(0, 1), 
           axes = FALSE, xlab = "", ylab = "", main = "")
      rect(0.2, 0.4, 0.8, 0.6, col = "#f8d7da", border = "#f5c6cb")
      text(0.5, 0.5, "Data Not Available", col = "#721c24", cex = 1.2, font = 2)
      text(0.5, 0.3, "Please check data source", col = "#856404", cex = 0.8)
    })
  })
  
  #==========================================
  #--------- TEST DIAGNOSTICS --------------- 
  #==========================================
  
  # ------------------------------------------------------------
  # 1) Enhanced Theta progression with CI
  # ------------------------------------------------------------
  output$theta_progression_plot <- renderPlot({
    cm <- cumulative_metrics_cache()
    req(cm, length(cm) > 0, cancelOutput = TRUE)
    tryCatch({
      true_theta_val <- true_theta()
      n_items   <- sapply(cm, `[[`, "items")
      theta_est <- sapply(cm, `[[`, "estimated_theta")
      sem       <- sapply(cm, `[[`, "sem")
      ci_up     <- theta_est + 1.96 * sem
      ci_low    <- theta_est - 1.96 * sem
      band_w    <- 0.30
      
      # Fixed y-axis limits from -3 to 3
      ylim    <- c(-3, 3)
      xlim    <- c(0.8, max(n_items) * 1.02)
      
      # Color scheme
      theta_color <- "#3498db"      # Blue for theta estimate
      true_theta_color <- "#e74c3c" # Red for true theta
      ci_color <- "#3498db"         # Blue for CI
      band_color <- "#f39c12"       # Orange for tolerance band
      grid_color <- "#ecf0f1"       # Light gray grid
      
      par(mar = c(5, 4, 6, 2), mgp = c(2, 0.7, 0))  # Increased bottom margin for guide
      
      # Create the main plot
      plot(NA, type = "n", xlim = xlim, ylim = ylim,
           xlab = "", ylab = "", main = "", 
           axes = FALSE, frame.plot = FALSE)
      
      # Add axis labels
      title(xlab = "Test Step (Items Administered)", col.lab = "black")
      title(ylab = expression("Ability Estimate ("*theta*")"), col.lab = "#3498db")
      
      # Add main title with centered legend
      title("Theta Estimation Progression with 95% CI", line = 2.5)
      legend("top", legend = c("EAP Estimate", "True θ", "95% CI", "±0.3 Band"),
             col = c(theta_color, true_theta_color, ci_color, band_color),
             lwd = c(2, 2, 8, 2), lty = c("solid", "solid", "solid", "dashed"),
             pch = c(21, NA, NA, NA), pt.bg = c("white", NA, NA, NA),
             bty = "n", cex = 0.8, ncol = 4, xpd = TRUE, inset = c(0, -0.12))
      
      # Add custom axes
      axis(1, col = "black", col.axis = "black", col.ticks = "black", lwd = 1.5)
      axis(2, col = theta_color, col.axis = theta_color, col.ticks = theta_color, lwd = 1.5)
      
      # Add grid
      grid(col = grid_color, lty = "solid", lwd = 0.5)
      
      # Tolerance band background
      band_low <- true_theta_val - band_w
      band_high <- true_theta_val + band_w
      rect(xlim[1], max(band_low, ylim[1]), xlim[2], min(band_high, ylim[2]),
           col = adjustcolor(band_color, alpha.f = 0.15), border = NA)
      
      # Confidence interval
      ci_up_clipped <- pmin(ci_up, ylim[2])
      ci_low_clipped <- pmax(ci_low, ylim[1])
      polygon(c(n_items, rev(n_items)), c(ci_up_clipped, rev(ci_low_clipped)),
              col = adjustcolor(ci_color, alpha.f = 0.3), border = NA)
      
      # Reference lines
      abline(h = true_theta_val, col = true_theta_color, lwd = 2.5, lty = "solid")
      abline(h = true_theta_val + band_w, col = band_color, lwd = 2, lty = "dashed")
      abline(h = true_theta_val - band_w, col = band_color, lwd = 2, lty = "dashed")
      abline(h = 0, col = "gray50", lwd = 1, lty = "dotted")
      
      # Main trajectory with enhanced points
      lines(n_items, theta_est, type = "b", col = theta_color, lwd = 2.5,
            pch = 21, bg = "white", cex = 1.2)
      
      # Add final value annotation
      if (length(theta_est) > 0) {
        final_theta <- tail(theta_est, 1)
        final_item <- tail(n_items, 1)
        points(final_item, final_theta, pch = 21, bg = theta_color, 
               col = "white", cex = 2)
        TeachingDemos::shadowtext(final_item - 0.25, final_theta + 0.35,
                                  labels = paste("θ:", round(final_theta, 2)),
                                  col = theta_color, bg = "white", cex = 0.8, font = 2, adj = 0)
      }
      
      # Add true theta annotation
      TeachingDemos::shadowtext(1, true_theta_val + 0.25,
                                labels = paste("True θ:", round(true_theta_val, 2)),
                                col = true_theta_color, bg = "white", cex = 0.8, font = 2, adj = 0)
      
      # Add reading guide
      mtext("Guide: Track EAP convergence to true θ. CI shows precision. Band = acceptable error (±0.3).", 
            side = 1, line = 3.5, cex = 0.65, col = "gray40")
      
    }, error = function(e) {
      show_error_plot("Theta Plot Error", e$message)
    })
  })
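  
  # The 95% interval above is the usual normal-approximation band,
  # theta_hat ± 1.96 * SEM. Illustrative example (not app output): with
  # theta_hat = 0.50 and SEM = 0.30, the interval is roughly [-0.09, 1.09].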
  
  # ------------------------------------------------------------
  # 2) Enhanced Targeting Efficiency Plot
  # ------------------------------------------------------------
  output$targeting_efficiency_plot <- renderPlot({
    cm <- cumulative_metrics_cache()
    req(cm, length(cm) > 0, cancelOutput = TRUE)
    
    tryCatch({
      ib <- items()
      n_items <- sapply(cm, `[[`, "items")
      theta_est <- sapply(cm, `[[`, "estimated_theta")
      
      # Per-item targeting efficiency: information delivered at the current
      # theta estimate relative to the information the item yields at
      # theta = b (an approximation to its maximum; exact for 2PL, slightly
      # below the true 3PL maximum when c > 0)
      efficiency <- sapply(seq_along(cm), function(i) {
        current_theta <- theta_est[i]
        current_item <- ib[i, ]
        current_iif <- I_3pl(current_theta, current_item$a, current_item$b, current_item$c)
        max_possible_iif <- I_3pl(current_item$b, current_item$a, current_item$b, current_item$c)
        if (max_possible_iif > 0) current_iif / max_possible_iif else 0
      })
      
      # Cumulative efficiency average
      cumul_efficiency <- sapply(seq_along(efficiency), function(i) {
        mean(efficiency[1:i])
      })
      
      # Color scheme
      instant_color <- "#3498db"      # Blue for instant efficiency
      cumulative_color <- "#e74c3c"   # Red for cumulative average
      threshold_color <- "#27ae60"    # Green for good efficiency
      grid_color <- "#ecf0f1"         # Light gray grid
      
      xlim <- c(0.8, max(n_items) * 1.02)
      ylim <- c(0, 1.1)
      
      par(mar = c(5, 4, 6, 2), mgp = c(2, 0.7, 0))
      
      # Create main plot
      plot(NA, type = "n", xlim = xlim, ylim = ylim,
           xlab = "", ylab = "", main = "", 
           axes = FALSE, frame.plot = FALSE)
      
      # Add axis labels
      title(xlab = "Test Step (Items Administered)", col.lab = "black")
      title(ylab = "Targeting Efficiency Ratio", col.lab = instant_color)
      
      # Add main title with centered legend
      title("Item-Person Targeting Efficiency", line = 2.5)
      legend("top", legend = c("≥0.8", "0.6-0.8", "<0.6", "Cumulative", "Instant"),
             fill = c(adjustcolor(threshold_color, 0.6), adjustcolor("#f39c12", 0.6), 
                      adjustcolor("#e74c3c", 0.6), NA, NA),
             border = c(threshold_color, "#f39c12", "#e74c3c", NA, NA),
             col = c(NA, NA, NA, cumulative_color, NA), 
             lwd = c(NA, NA, NA, 2, NA),
             pch = c(NA, NA, NA, 21, 22), 
             pt.bg = c(NA, NA, NA, "white", "white"),
             bty = "n", cex = 0.75, ncol = 5, xpd = TRUE, inset = c(0, -0.12))
      
      # Add custom axes
      axis(1, col = "black", col.axis = "black", col.ticks = "black", lwd = 1.5)
      axis(2, col = instant_color, col.axis = instant_color, col.ticks = instant_color, lwd = 1.5)
      
      # Add grid
      grid(col = grid_color, lty = "solid", lwd = 0.5)
      
      # Efficiency thresholds
      rect(xlim[1], 0.8, xlim[2], ylim[2], col = adjustcolor(threshold_color, alpha.f = 0.1), border = NA)
      rect(xlim[1], 0.6, xlim[2], 0.8, col = adjustcolor("#f39c12", alpha.f = 0.08), border = NA)
      abline(h = 0.8, col = threshold_color, lwd = 2, lty = "dashed")
      abline(h = 0.6, col = "#f39c12", lwd = 2, lty = "dashed")
      
      # Instant efficiency (bar plot)
      for(i in seq_along(n_items)) {
        col_eff <- if(efficiency[i] >= 0.8) threshold_color else 
          if(efficiency[i] >= 0.6) "#f39c12" else "#e74c3c"
        rect(n_items[i] - 0.3, 0, n_items[i] + 0.3, efficiency[i],
             col = adjustcolor(col_eff, alpha.f = 0.6), border = col_eff, lwd = 1)
      }
      
      # Cumulative efficiency line
      lines(n_items, cumul_efficiency, type = "b", col = cumulative_color, lwd = 2.5,
            pch = 21, bg = "white", cex = 1.1)
      
      # Add final efficiency annotations
      if (length(efficiency) > 0) {
        final_instant <- tail(efficiency, 1)
        final_cumul <- tail(cumul_efficiency, 1)
        final_item <- tail(n_items, 1)
        
        # Instant efficiency annotation
        points(final_item, final_instant, pch = 22, 
               bg = if(final_instant >= 0.8) threshold_color else 
                 if(final_instant >= 0.6) "#f39c12" else "#e74c3c", 
               col = "white", cex = 1.5)
        
        # Cumulative efficiency annotation
        points(final_item, final_cumul, pch = 21, bg = cumulative_color, col = "white", cex = 1.5)
      }
      
      # Add reading guide
      mtext("Guide: Bars = per-item efficiency. Line = running average. Target: maintain >0.6 efficiency.", 
            side = 1, line = 3.5, cex = 0.65, col = "gray40")
      
    }, error = function(e) {
      show_error_plot("Efficiency Plot Error", e$message)
    })
  })
  
  # ------------------------------------------------------------
  # 3) Enhanced SEM + Reliability progression
  # ------------------------------------------------------------
  output$sem_progression_plot <- renderPlot({
    cm <- cumulative_metrics_cache()
    req(cm, length(cm) > 0, cancelOutput = TRUE)
    tryCatch({
      n_items   <- sapply(cm, `[[`, "items")
      sem_vals  <- sapply(cm, `[[`, "sem")
      rel_vals  <- sapply(cm, `[[`, "reliability")
      target_sem <- 0.30
      hit_target <- which(sem_vals <= target_sem)[1]  # first step at which the SEM target is reached
      
      # Color scheme
      sem_color <- "#e74c3c"        # Red for SEM
      rel_color <- "#3498db"        # Blue for reliability
      target_color <- "#27ae60"     # Green for target
      grid_color <- "#ecf0f1"       # Light gray grid
      
      xlim <- c(0.8, max(n_items) * 1.02)
      ylim_sem <- c(0, 1)
      
      par(mar = c(5, 4, 6, 4), mgp = c(2, 0.7, 0))
      
      # Create main plot
      plot(NA, type = "n", xlim = xlim, ylim = ylim_sem,
           xlab = "", ylab = "", main = "", 
           axes = FALSE, frame.plot = FALSE)
      
      # Add axis labels
      title(xlab = "Test Step (Items Administered)", col.lab = "black")
      title(ylab = "Standard Error (SEM)", col.lab = sem_color)
      
      # Add main title with centered legend
      title("Measurement Precision and Reliability", line = 2.5)
      legend("top", legend = c("SEM", "Target", "Reliability"),
             col = c(sem_color, target_color, rel_color), 
             lwd = c(2.5, 2.5, 2.5),
             lty = c("solid", "dashed", "solid"), 
             pch = c(21, NA, 21),
             pt.bg = c("white", NA, "white"), 
             bty = "n", cex = 0.8, ncol = 3, xpd = TRUE, inset = c(0, -0.12))
      
      # Add custom axes
      axis(1, col = "black", col.axis = "black", col.ticks = "black", lwd = 1.5)
      axis(2, col = sem_color, col.axis = sem_color, col.ticks = sem_color, lwd = 1.5)
      
      # Add grid
      grid(col = grid_color, lty = "solid", lwd = 0.5)
      
      # Target SEM band
      rect(xlim[1], 0, xlim[2], target_sem, col = adjustcolor(target_color, alpha.f = 0.15), border = NA)
      
      # SEM line with enhanced points
      lines(n_items, sem_vals, type = "b", col = sem_color, lwd = 2.5,
            pch = 21, bg = "white", cex = 1.2)
      abline(h = target_sem, col = target_color, lwd = 2.5, lty = "dashed")
      
      final_sem <- tail(sem_vals, 1)
      final_item <- tail(n_items, 1)
      final_rel <- tail(rel_vals, 1)
      
      # Reliability on secondary axis
      par(new = TRUE)
      plot(n_items, rel_vals, type = "b", col = rel_color, lwd = 2.5, lty = "solid",
           axes = FALSE, xlab = "", ylab = "", ylim = c(0, 1), main = "",
           pch = 21, bg = "white", cex = 1.2)
      axis(4, col = rel_color, col.axis = rel_color, col.ticks = rel_color, lwd = 1.5)
      mtext("Reliability", side = 4, line = 2.5, col = rel_color, font = 2)
      
      # Add reading guide
      mtext("Guide: SEM should decrease below 0.3 target. Reliability should increase toward 1.0.", 
            side = 1, line = 3.5, cex = 0.65, col = "gray40")
      
    }, error = function(e) {
      show_error_plot("SEM Plot Error", e$message)
    })
  })
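  
  # Reading note: reliability values come from cumulative_metrics_cache(). If
  # they follow the common CAT convention rel = 1 - SEM^2 on a unit-variance
  # theta metric (an assumption about the cache, not verified here), the 0.30
  # SEM target corresponds to reliability ≈ 0.91.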
  
  # ------------------------------------------------------------
  # 4) Enhanced Information growth & efficiency
  # ------------------------------------------------------------
  output$information_growth_plot <- renderPlot({
    cm <- cumulative_metrics_cache()
    req(cm, length(cm) > 0, cancelOutput = TRUE)
    tryCatch({
      n_items <- sapply(cm, `[[`, "items")
      tif     <- sapply(cm, `[[`, "tif")
      info_gain <- c(tif[1], diff(tif))   # information added at each step
      eff     <- info_gain / n_items      # marginal gain scaled by the current test length
      target_tif <- 11.11                 # 1 / 0.30^2: total information implied by the 0.30 SEM target
      
      # Color scheme
      tif_color <- "#3498db"         # Blue for TIF
      eff_color <- "#e74c3c"         # Red for efficiency
      ma_color <- "#f39c12"          # Orange for moving average
      target_color <- "#27ae60"      # Green for target
      grid_color <- "#ecf0f1"        # Light gray grid
      
      xlim <- c(0.8, max(n_items) * 1.02)
      ylim_tif <- c(0, max(tif, na.rm = TRUE) * 1.1)
      
      par(mar = c(5, 4, 6, 4), mgp = c(2, 0.7, 0))
      
      # Create main plot
      plot(NA, type = "n", xlim = xlim, ylim = ylim_tif,
           xlab = "", ylab = "", main = "", 
           axes = FALSE, frame.plot = FALSE)
      
      # Add axis labels
      title(xlab = "Test Step (Items Administered)", col.lab = "black")
      title(ylab = "Test Information (TIF)", col.lab = tif_color)
      
      # Add main title with centered legend
      title("Information Growth & Efficiency", line = 2.5)
      legend("top", legend = c("TIF", "Efficiency", "3-pt MA", "Target"),
             col = c(tif_color, eff_color, ma_color, target_color), 
             lwd = c(2.5, 2.5, 2.5, 2.5),
             lty = c("solid", "solid", "dotdash", "dashed"), 
             pch = c(21, 21, NA, NA),
             pt.bg = c("white", "white", NA, NA), 
             bty = "n", cex = 0.8, ncol = 4, xpd = TRUE, inset = c(0, -0.12))
      
      # Add custom axes
      axis(1, col = "black", col.axis = "black", col.ticks = "black", lwd = 1.5)
      axis(2, col = tif_color, col.axis = tif_color, col.ticks = tif_color, lwd = 1.5)
      
      # Add grid
      grid(col = grid_color, lty = "solid", lwd = 0.5)
      
      # Target TIF band
      rect(xlim[1], target_tif, xlim[2], par("usr")[4],
           col = adjustcolor(target_color, alpha.f = 0.15), border = NA)
      
      # TIF line with enhanced points
      lines(n_items, tif, type = "b", col = tif_color, lwd = 2.5,
            pch = 21, bg = "white", cex = 1.2)
      abline(h = target_tif, col = target_color, lwd = 2.5, lty = "dashed")
      
      # Efficiency on secondary axis
      par(new = TRUE)
      eff_lim <- c(0, max(eff, na.rm = TRUE) * 1.1)
      plot(n_items, eff, type = "b", col = eff_color, lwd = 2.5, lty = "solid",
           axes = FALSE, xlab = "", ylab = "", ylim = eff_lim, main = "",
           pch = 21, bg = "white", cex = 1.2)
      axis(4, col = eff_color, col.axis = eff_color, col.ticks = eff_color, lwd = 1.5)
      mtext("Info per Item", side = 4, line = 2.5, col = eff_color, font = 2)
      
      # Moving average
      if (length(eff) > 3 && sum(!is.na(eff)) >= 3) {
        ma <- stats::filter(eff, rep(1/3, 3), sides = 2)
        lines(n_items, ma, col = ma_color, lwd = 2.5, lty = "dotdash")
      }
      
      # Add reading guide
      mtext("Guide: TIF should reach 11.11 target. Efficiency shows information gain per item.", 
            side = 1, line = 3.5, cex = 0.65, col = "gray40")
      
    }, error = function(e) {
      show_error_plot("Information Plot Error", e$message)
    })
  })
  
  # ------------------------------------------------------------
  # 5) Enhanced Bias and |Z| analysis
  # ------------------------------------------------------------
  output$bias_analysis_plot <- renderPlot({
    cm <- cumulative_metrics_cache()
    req(cm, length(cm) > 0, cancelOutput = TRUE)
    tryCatch({
      n_items <- sapply(cm, `[[`, "items")
      bias    <- sapply(cm, `[[`, "bias")
      sem     <- sapply(cm, `[[`, "sem")
      z_abs   <- abs(bias / sem)
      
      # Color scheme
      bias_color <- "#3498db"        # Blue for bias
      z_color <- "#e74c3c"           # Red for Z-scores
      band_color1 <- "#27ae60"       # Green for ±0.1 band
      band_color2 <- "#f39c12"       # Orange for ±0.3 band
      grid_color <- "#ecf0f1"        # Light gray grid
      
      xlim <- c(0.8, max(n_items) * 1.02)
      ylim <- range(bias, na.rm = TRUE)
      if (diff(ylim) == 0) ylim <- ylim + c(-0.3, 0.3)
      ylim <- ylim + diff(ylim) * c(-0.15, 0.15)
      
      par(mar = c(5, 4, 6, 4), mgp = c(2, 0.7, 0))
      
      # Create main plot
      plot(NA, type = "n", xlim = xlim, ylim = ylim,
           xlab = "", ylab = "", main = "", 
           axes = FALSE, frame.plot = FALSE)
      
      # Add axis labels
      title(xlab = "Test Step (Items Administered)", col.lab = "black")
      title(ylab = expression("Bias ("*hat(theta)-theta[true]*")"), col.lab = bias_color)
      
      # Add main title with centered legend
      title("Bias & Standardized Bias Analysis", line = 2.5)
      legend("top", legend = c("Bias", "|Z-score|", "±0.1 Band", "±0.3 Band"),
             col = c(bias_color, z_color, band_color1, band_color2), 
             lwd = c(2.5, 2.5, 2, 2),
             lty = c("solid", "solid", "dashed", "dashed"), 
             pch = c(21, 21, NA, NA),
             pt.bg = c("white", "white", NA, NA), 
             bty = "n", cex = 0.8, ncol = 4, xpd = TRUE, inset = c(0, -0.12))
      
      # Add custom axes
      axis(1, col = "black", col.axis = "black", col.ticks = "black", lwd = 1.5)
      axis(2, col = bias_color, col.axis = bias_color, col.ticks = bias_color, lwd = 1.5)
      
      # Add grid
      grid(col = grid_color, lty = "solid", lwd = 0.5)
      
      # Bias bands
      rect(xlim[1], -0.30, xlim[2], 0.30, col = adjustcolor(band_color2, alpha.f = 0.1), border = NA)
      rect(xlim[1], -0.10, xlim[2], 0.10, col = adjustcolor(band_color1, alpha.f = 0.15), border = NA)
      
      # Bias line with enhanced points
      lines(n_items, bias, type = "b", col = bias_color, lwd = 2.5,
            pch = 21, bg = "white", cex = 1.2)
      
      # Reference lines
      abline(h = 0, col = "black", lwd = 2)
      abline(h = c(-0.1, 0.1), col = band_color1, lwd = 2, lty = "dashed")
      abline(h = c(-0.3, 0.3), col = band_color2, lwd = 2, lty = "dashed")
      
      # Z-scores on secondary axis
      par(new = TRUE)
      z_lim <- c(0, max(z_abs, na.rm = TRUE) * 1.1)
      plot(n_items, z_abs, type = "b", col = z_color, lwd = 2.5, lty = "solid",
           axes = FALSE, xlab = "", ylab = "", ylim = z_lim, main = "",
           pch = 21, bg = "white", cex = 1.2)
      
      axis(4, col = z_color, col.axis = z_color, col.ticks = z_color, lwd = 1.5)
      mtext("|Z-score|", side = 4, line = 2.5, col = z_color, font = 2)
      
      # Add critical Z-value reference
      abline(h = 1.96, col = adjustcolor(z_color, 0.6), lty = "dotted", lwd = 2)
      
      # Add reading guide
      mtext("Guide: Bias should stay within ±0.3 bands. |Z| > 1.96 indicates significant bias.", 
            side = 1, line = 3.5, cex = 0.65, col = "gray40")
      
    }, error = function(e) {
      show_error_plot("Bias Plot Error", e$message)
    })
  })  
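  
  # Standardized bias above is |Z| = |theta_hat - theta_true| / SEM.
  # Illustrative example (not app output): a bias of 0.25 with SEM = 0.20
  # gives |Z| = 1.25, below the 1.96 cutoff, so the bias would not be
  # flagged as significant at the 5% level.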
  
    # ------------------------------------------------------------
    # 6) Enhanced Item selection pattern
    # ------------------------------------------------------------
    output$item_selection_plot <- renderPlot({
      ib <- items()
      req(ib, nrow(ib) > 0, cancelOutput = TRUE)
      tryCatch({
        eap_theta <- eap_theta()
        
        # Color scheme
        diff_color <- "#3498db"        # Blue for difficulty
        theta_color <- "#e74c3c"       # Red for current theta
        info_color <- "#27ae60"        # Green for information
        disc_color <- "#f39c12"        # Orange for discrimination
        grid_color <- "#ecf0f1"        # Light gray grid
        
        xseq <- seq_len(nrow(ib))
        ylim <- range(ib$b, na.rm = TRUE)
        if (diff(ylim) == 0) ylim <- ylim + c(-1, 1)
        ylim <- ylim + diff(ylim) * c(-0.15, 0.15)
        
        par(mar = c(5, 4, 6, 4), mgp = c(2, 0.7, 0))
        
        # Create main plot
        plot(NA, type = "n", xlim = c(0.8, max(xseq) * 1.02), ylim = ylim,
             xlab = "", ylab = "", main = "", 
             axes = FALSE, frame.plot = FALSE)
        
        # Add axis labels
        title(xlab = "Item Sequence", col.lab = "black")
        title(ylab = "Item Difficulty (b)", col.lab = diff_color)
        
        # Add main title with centered legend
        title("Item Selection Pattern", line = 2.5)
        legend("top", legend = c("Difficulty", "EAP θ", "Info at θ", "Discrimination"),
               col = c(diff_color, theta_color, info_color, disc_color),
               lwd = c(2.5, 2.5, 2.5, NA), 
               lty = c("solid", "solid", "solid", NA),
               pch = c(21, NA, NA, 16), 
               pt.bg = c("white", NA, NA, NA),
               pt.cex = c(1.2, NA, NA, 1.5), 
               bty = "n", cex = 0.8, ncol = 4, xpd = TRUE, inset = c(0, -0.12))
        
        # Add custom axes
        axis(1, col = "black", col.axis = "black", col.ticks = "black", lwd = 1.5)
        axis(2, col = diff_color, col.axis = diff_color, col.ticks = diff_color, lwd = 1.5)
        
        # Add grid
        grid(col = grid_color, lty = "solid", lwd = 0.5)
        
        # Reference lines
        abline(h = 0, col = "gray40", lwd = 1.5, lty = "dotted")
        abline(h = eap_theta, col = theta_color, lwd = 2.5)
        
        # Difficulty line with enhanced points
        lines(xseq, ib$b, type = "b", col = diff_color, lwd = 2.5,
              pch = 21, bg = "white", cex = 1.2)
        
        # Discrimination as point size and color
        a_rescale <- scales::rescale(ib$a, to = c(1, 3))
        a_cols <- colorRampPalette(c(diff_color, disc_color))(nrow(ib))
        points(xseq, ib$b, pch = 16, cex = a_rescale, col = a_cols)
        
        # Information at current theta
        info_at_theta <- I_3pl(eap_theta, ib$a, ib$b, ib$c)
        par(new = TRUE)
        info_lim <- c(0, max(info_at_theta, na.rm = TRUE) * 1.1)
        plot(xseq, info_at_theta, type = "l", col = info_color, lwd = 2.5, lty = "solid",
             axes = FALSE, xlab = "", ylab = "", ylim = info_lim, main = "")
        axis(4, col = info_color, col.axis = info_color, col.ticks = info_color, lwd = 1.5)
        mtext("Information", side = 4, line = 2.5, col = info_color, font = 2)
        
        # Add reading guide
        mtext("Guide: Ideal pattern = items cluster around EAP θ. Larger/brighter points = higher discrimination.", 
              side = 1, line = 3.5, cex = 0.65, col = "gray40")
        
      }, error = function(e) {
        show_error_plot("Item Selection Plot Error", e$message)
      })
    })
    
    # ------------------------------------------------------------
    # Grid Diagnostics Plot (GH vs Uniform) - Single plot version
    # ------------------------------------------------------------
    output$grid_diagnostics_plot <- renderPlot({
      tryCatch({
        grid_gh_31 <- make_theta_grid(n = 31, scheme = "ghermite", prior_mean = 0, prior_sd = 1)
        grid_unif_31 <- make_theta_grid(n = 31, scheme = "uniform", range = c(-4,4))
        grid_gh_31_scaled <- scale_gh_grid(grid_gh_31, range = c(-4,4))
        
        plot_spacing(grid_unif_31, grid_gh_31_scaled, 
                     names = c("Uniform (31 nodes)", "Gauss-Hermite (31 nodes)"),
                     xlim_range = c(-4, 4))
        
      }, error = function(e) {
        ggplot() + 
          annotate("text", x = 0, y = 0, label = "Plot unavailable", size = 6) +
          theme_void()
      })
    })
    
    
    # ------------------------------------------------------------
    # Distribution Comparison Plot
    # ------------------------------------------------------------
    output$distribution_comparison_plot <- renderPlot({
      tryCatch({
        # Create grids for comparison
        grid_gh_31 <- make_theta_grid(n = 31, scheme = "ghermite", prior_mean = 0, prior_sd = 1)
        grid_unif_31 <- make_theta_grid(n = 31, scheme = "uniform", range = c(-4,4))
        grid_gh_31_scaled <- scale_gh_grid(grid_gh_31, range = c(-4,4))
        
        # Create normal distribution curve
        x_seq <- seq(-4, 4, length.out = 200)
        normal_density <- dnorm(x_seq)
        normal_df <- data.frame(x = x_seq, y = normal_density, distribution = "Standard Normal")
        
        # Rescale quadrature weights to a rough density scale (weight divided
        # by the average node spacing) so they can be overlaid on the normal curve
        gh_density <- data.frame(
          x = grid_gh_31_scaled$theta,
          y = grid_gh_31_scaled$w / diff(range(grid_gh_31_scaled$theta)) * length(grid_gh_31_scaled$theta),
          distribution = "Gauss-Hermite Weights"
        )
        
        unif_density <- data.frame(
          x = grid_unif_31$theta,
          y = grid_unif_31$w / diff(range(grid_unif_31$theta)) * length(grid_unif_31$theta),
          distribution = "Uniform Weights"
        )
        
        # Combine all data
        plot_data <- rbind(
          normal_df,
          gh_density,
          unif_density
        )
        
        # Create the plot
        ggplot(plot_data, aes(x = x, y = y, color = distribution, linetype = distribution)) +
          geom_line(data = subset(plot_data, distribution == "Standard Normal"), linewidth = 1.5) +
          geom_point(data = subset(plot_data, distribution != "Standard Normal"), size = 2, alpha = 0.7) +
          geom_line(data = subset(plot_data, distribution != "Standard Normal"), linewidth = 1, alpha = 0.7) +
          labs(
            x = expression(theta),
            y = "Density / Scaled Weights",
            title = "Distribution Comparison: Gauss-Hermite vs Uniform vs Normal"
          ) +
          scale_color_manual(values = c(
            "Standard Normal" = "#2ecc71",        # Green for the normal curve
            "Uniform Weights" = "#3498db",        # Blue for uniform weights
            "Gauss-Hermite Weights" = "#e74c3c"   # Red for Gauss-Hermite weights
          )) +
          scale_linetype_manual(values = c(
            "Standard Normal" = "solid",
            "Gauss-Hermite Weights" = "solid",
            "Uniform Weights" = "solid"
          )) +
          coord_cartesian(xlim = c(-4, 4)) +
          theme_minimal(base_size = 14) +
          theme(
            legend.position = "top",
            legend.title = element_blank(),
            legend.text = element_text(size = 12),
            plot.margin = margin(10, 10, 10, 10),
            panel.grid.minor = element_blank(),
            plot.title = element_text(hjust = 0.5, size = 14)
          )
        
      }, error = function(e) {
        ggplot() + 
          annotate("text", x = 0, y = 0, label = "Distribution plot unavailable", size = 6) +
          theme_void()
      })
    })
    
    
    # Current Test Metrics UI
    output$current_test_metrics <- renderUI({
      cm <- cumulative_metrics_cache()
      ib <- items()
      req(cm, length(cm) > 0, ib, nrow(ib) > 0)
      
      current_metrics <- cm[[length(cm)]]
      true_theta_val <- true_theta()
      
      bias <- current_metrics$estimated_theta - true_theta_val
      abs_bias <- abs(bias)
      efficiency <- if (is.finite(current_metrics$sem) && current_metrics$sem <= 0.30) "Target Met" else "Below Target"
      
      avg_admin_a <- mean(ib$a)
      avg_admin_b <- mean(ib$b)
      avg_admin_c <- mean(ib$c)
      b_range <- sprintf("[%.2f, %.2f]", min(ib$b), max(ib$b))
      
      items_to_target <- which(sapply(cm, function(x) x$sem <= 0.30))[1]
      avg_info_gain <- current_metrics$tif / nrow(ib)
      
      method <- if (!is.null(input$sel_method)) input$sel_method else "mfi"
      
      # Pre-calculate all color values (guarding against NA SEM on the first item)
      bias_color <- if(abs_bias <= 0.2) "#059669" else if(abs_bias <= 0.5) "#d97706" else "#dc2626"
      zscore <- if (is.finite(current_metrics$sem) && current_metrics$sem > 0) abs(bias / current_metrics$sem) else NA_real_
      zscore_color <- if(!is.finite(zscore)) "#64748b" else if(zscore <= 1) "#059669" else if(zscore <= 2) "#d97706" else "#dc2626"
      sem_color <- if(is.finite(current_metrics$sem) && current_metrics$sem <= 0.30) "#059669" else "#dc2626"
      reliability_color <- if(current_metrics$reliability >= 0.8) "#059669" else if(current_metrics$reliability >= 0.7) "#d97706" else "#dc2626"
      efficiency_color <- if(efficiency == "Target Met") "#059669" else "#dc2626"
      
      HTML(paste0('
<div class="metrics-two-columns">
  <!-- Left Column -->
  <div class="column">
    <!-- Estimation Accuracy -->
    <div class="metric-box">
      <div class="metric-title">Estimation Accuracy</div>
      <div class="metric-content">
        <div class="metric-row">
          <span>True θ:</span>
          <strong>', sprintf('%.2f', true_theta_val), '</strong>
        </div>
        <div class="metric-row">
          <span>Est. θ:</span>
          <strong>', sprintf('%.2f', current_metrics$estimated_theta), '</strong>
        </div>
        <div class="metric-row">
          <span>Bias:</span>
          <strong style="color:', bias_color, '">', sprintf('%.2f', bias), '</strong>
        </div>
        <div class="metric-row">
          <span>|Z-score|:</span>
          <strong style="color:', zscore_color, '">', sprintf('%.1f', zscore), '</strong>
        </div>
      </div>
    </div>

    <!-- Precision Metrics -->
    <div class="metric-box">
      <div class="metric-title">Precision Metrics</div>
      <div class="metric-content">
        <div class="metric-row">
          <span>SEM:</span>
          <strong style="color:', sem_color, '">', sprintf('%.3f', current_metrics$sem), '</strong>
        </div>
        <div class="metric-row">
          <span>TIF:</span>
          <strong>', sprintf('%.1f', current_metrics$tif), '</strong>
        </div>
        <div class="metric-row">
          <span>Reliability:</span>
          <strong style="color:', reliability_color, '">', sprintf('%.3f', current_metrics$reliability), '</strong>
        </div>
        <div class="metric-row">
          <span>95% CI Width:</span>
          <strong>±', sprintf('%.2f', 1.96 * current_metrics$sem), '</strong>
        </div>
      </div>
    </div>
  </div>

  <!-- Right Column -->
  <div class="column">
    <!-- Test Status -->
    <div class="metric-box">
      <div class="metric-title">Test Status</div>
      <div class="metric-content">
        <div class="metric-row">
          <span>Method:</span>
          <strong>', toupper(method), '</strong>
        </div>
        <div class="metric-row">
          <span>Efficiency:</span>
          <strong style="color:', efficiency_color, '">', efficiency, '</strong>
        </div>
        <div class="metric-row">
          <span>Items to Target:</span>
          <strong>', ifelse(is.na(items_to_target), ">", items_to_target), '</strong>
        </div>
        <div class="metric-row">
          <span>Avg Info/Item:</span>
          <strong>', sprintf('%.2f', avg_info_gain), '</strong>
        </div>
      </div>
    </div>

    <!-- Administered Items -->
    <div class="metric-box">
      <div class="metric-title">Administered Items</div>
      <div class="metric-content">
        <div class="metric-row">
          <span>Count:</span>
          <strong>', nrow(ib), '</strong>
        </div>
        <div class="metric-row">
          <span>Avg a:</span>
          <strong>', sprintf('%.2f', avg_admin_a), '</strong>
        </div>
        <div class="metric-row">
          <span>Avg b:</span>
          <strong>', sprintf('%.2f', avg_admin_b), '</strong>
        </div>
        <div class="metric-row">
          <span>b Range:</span>
          <strong>', b_range, '</strong>
        </div>
      </div>
    </div>
  </div>
</div>

<style>
.metrics-two-columns {
  display: grid;
  grid-template-columns: 1fr 1fr;
  gap: 20px;
  max-width: 1200px;
  margin: 0 auto;
  font-family: Arial, sans-serif;
}

.column {
  display: flex;
  flex-direction: column;
  gap: 20px;
}

.metric-box {
  background: white;
  border: 1px solid #e9ecef;
  border-radius: 8px;
  padding: 16px;
  box-shadow: 0 1px 3px rgba(0,0,0,0.1);
}

.metric-title {
  font-weight: 600;
  color: #2c5282;
  margin-bottom: 12px;
  font-size: 14px;
}

.metric-content {
  display: flex;
  flex-direction: column;
  gap: 8px;
}

.metric-row {
  display: flex;
  justify-content: space-between;
  align-items: center;
  font-size: 11px;
  color: #64748b;
  line-height: 1.4;
}

.metric-row strong {
  color: #1e3a8a;
  font-weight: 600;
}

/* Responsive design */
@media (max-width: 768px) {
  .metrics-two-columns {
    grid-template-columns: 1fr;
  }
}
</style>
'))
    })
    
    output$summary_table <- renderUI({
      ib <- items()
      cm <- cumulative_metrics_cache()
      true_th <- true_theta()
      req(!is.null(ib), nrow(ib) > 0, !is.null(true_th))
      
      if (is.null(cm) || length(cm) == 0) return(NULL)
      
      n_rows <- length(cm)
      ib <- ib[seq_len(n_rows), , drop = FALSE]
      
      resp <- responses()
      if (length(resp) < n_rows) {
        resp <- c(resp, rep(NA_integer_, n_rows - length(resp)))
      }
      
      # Local 3PL response function (logistic, D = 1.702) used to compute P(true theta)
      P_3pl <- function(theta, a, b, c) {
        # Clamp parameters to safe ranges without modifying the item bank itself
        a_clamped <- max(a, 0.5)
        c_clamped <- max(0, min(c, 0.2))
        
        # Core 3PL calculation
        exponent <- -1.702 * a_clamped * (theta - b)
        p_correct <- c_clamped + (1 - c_clamped) / (1 + exp(exponent))
        
        # Ensure numerical stability
        pmin(pmax(p_correct, 1e-10), 1 - 1e-10)
      }
      
      # Create table header with equal width columns and centered titles
      table_html <- paste0(
        '<table id="summary_table" class="display compact" style="width:100%; font-size:10px; margin-top:15px; table-layout:fixed;">',
        '<thead><tr style="background:linear-gradient(135deg,#f8fafc,#f1f5f9);">',
        '<th style="width:7.69%; text-align:center;">Step</th>',
        '<th style="width:7.69%; text-align:center;">Item</th>',
        '<th style="width:7.69%; text-align:center;">a</th>',
        '<th style="width:7.69%; text-align:center;">b</th>',
        '<th style="width:7.69%; text-align:center;">c</th>',
        '<th style="width:7.69%; text-align:center;">P(θ)</th>',
        '<th style="width:7.69%; text-align:center;">Resp.</th>',
        '<th style="width:7.69%; text-align:center;">EAP θ̂ </th>',
        '<th style="width:7.69%; text-align:center;">Bias</th>',
        '<th style="width:7.69%; text-align:center;">IIF</th>',
        '<th style="width:7.69%; text-align:center;">TIF</th>',
        '<th style="width:7.69%; text-align:center;">SEM</th>',
        '<th style="width:7.69%; text-align:center;">ρ</th>',
        '</tr></thead><tbody>'
      )
      
      # Create table rows (keeping your original cell alignment)
      for (i in 1:n_rows) {
        metrics <- cm[[i]]
        
        current_iif <- metrics$iif_current_item
        current_iif <- ifelse(is.finite(current_iif), current_iif, 0)
        
        # Response symbol/color
        current_response <- resp[i]
        response_text <- if (!is.na(current_response) && current_response == 1) "✓" else "✗"
        response_color <- if (!is.na(current_response) && current_response == 1) "green" else "red"
        
        # Calculate p(true theta) using the true theta value
        p_true_theta <- P_3pl(
          theta = true_th,  # Using the true theta from reactive
          a = ib$a[i],
          b = ib$b[i], 
          c = ib$c[i]
        )
        
        # Row HTML with 3 digits for numeric values
        row_html <- paste0(
          '<tr>',
          '<td style="text-align:center;">', i, '</td>',
          '<td style="text-align:center;">', ib$label[i], '</td>',
          '<td style="text-align:right;">', sprintf('%.3f', ib$a[i]), '</td>',
          '<td style="text-align:right;">', sprintf('%.3f', ib$b[i]), '</td>',
          '<td style="text-align:right;">', sprintf('%.3f', ib$c[i]), '</td>',
          '<td style="text-align:right;">', sprintf('%.3f', p_true_theta), '</td>',
          '<td style="text-align:center; color:', response_color, '; font-weight:bold;">', response_text, '</td>',
          '<td style="text-align:right;">', sprintf('%.3f', metrics$estimated_theta), '</td>',
          '<td style="text-align:right;">', sprintf('%.3f', metrics$bias), '</td>',
          '<td style="text-align:right;">', sprintf('%.3f', current_iif), '</td>',
          '<td style="text-align:right;">', sprintf('%.3f', metrics$tif), '</td>',
          '<td style="text-align:right;">', sprintf('%.3f', metrics$sem), '</td>',
          '<td style="text-align:right;">', sprintf('%.3f', metrics$reliability), '</td>',
          '</tr>'
        )
        
        table_html <- paste0(table_html, row_html)
      }
      
      table_html <- paste0(table_html, '</tbody></table>')
      
      tagList(
        tags$div(HTML(table_html)),
        tags$script(HTML("
      $(document).on('shiny:value', function() {
        setTimeout(function() {
          if ($.fn.DataTable.isDataTable('#summary_table')) {
            $('#summary_table').DataTable().destroy();
          }
          $('#summary_table').DataTable({
            paging: false,
            searching: false,
            info: false,
            ordering: false,
            autoWidth: false,
            columnDefs: [
              { width: '7.69%', targets: '_all' }
            ]
          });
        }, 50);
      });
    "))
      )
    })
    
    # ---- Fixed Cumulative Metrics Calculation ----
    observeEvent(list(items(), responses()), {
      ib <- items()
      resp <- responses()
      
      if (is.null(ib) || nrow(ib) == 0) {
        cumulative_metrics_cache(NULL)
        return()
      }
      
      n_available <- min(nrow(ib), length(resp))
      if (n_available == 0) {
        cumulative_metrics_cache(list())
        return()
      }
      
      true_th <- true_theta()
      cm <- vector("list", n_available)
      
      for (n in 1:n_available) {
        current_subset_items <- ib[1:n, , drop = FALSE]
        current_subset_responses <- resp[1:n]
        
        # Calculate EAP for this subset
        current_eap <- estimate_theta_eap(current_subset_items, current_subset_responses)
        
        # CRITICAL FIX: Use DIRECT CALCULATION for both plot and table
        # This ensures consistency between displayed values
        current_tif <- calculate_tif_at_theta(current_eap, current_subset_items)
        current_sem <- if (n == 1) NA_real_ else if (current_tif > 0) 1 / sqrt(current_tif) else NA_real_
        current_reliability <- calculate_reliability(current_tif)
        current_bias <- current_eap - true_th
        
        # Calculate IIF for the most recent item at current EAP
        if (n > 0) {
          current_iif <- I_3pl(current_eap, 
                               current_subset_items$a[n], 
                               current_subset_items$b[n], 
                               current_subset_items$c[n])
        } else {
          current_iif <- 0
        }
        
        cm[[n]] <- list(
          items = n,
          tif = current_tif,
          sem = current_sem,
          reliability = current_reliability,
          estimated_theta = current_eap,
          bias = current_bias,
          iif_current_item = current_iif  # Store for table use
        )
      }
      
      cumulative_metrics_cache(cm)
      
      # Update bias history
      if (length(cm) > 0 && initial_modal_completed()) {
        all_biases <- sapply(cm, `[[`, "bias")
        bias_history(all_biases)
      }
    }, ignoreInit = FALSE)
    
    # ---- Target SEM Modal ----
    observe({
      cm <- cumulative_metrics_cache()
      req(cm, length(cm) > 0)
      
      # Get current SEM
      current_sem <- tail(cm, 1)[[1]]$sem
      current_items <- tail(cm, 1)[[1]]$items
      
      # Check if target is met and we haven't shown the modal yet
      if (is.finite(current_sem) && current_sem <= 0.30 && !target_met()) {
        target_met(TRUE)
        final_sem_value(current_sem)
        
        showModal(modalDialog(
          title = NULL,  # Remove title bar
          easyClose = FALSE,
          fade = TRUE,
          size = "m",
          footer = tagList(
            actionButton("continue_testing", "Continue Testing", class = "btn-success"),
            actionButton("view_summary", "View Summary", class = "btn-info")
          ),
          tagList(
            div(
              style = "text-align: center; padding: 0;",
              
              
              # Main achievement message
              div(
                style = "background: #f0fdf4; padding: 10px; border-radius: 6px; margin-bottom: 10px; border: 1px solid #bbf7d0;",
                div(
                  style = "text-align: center; margin-bottom: 6px;",
                  span(style = "font-size: 13px; color: #065f46; font-weight: 600; text-transform: uppercase; letter-spacing: 0.5px;", 
                       "Target SEM Reached")
                ),
                div(
                  style = "text-align: center;",
                  span(style = "font-size: 14px; font-weight: bold; color: #065f46;",
                       textOutput("sem_header_display"))
                )
              ),
              
              # Precision and Efficiency side by side
              div(
                style = "display: grid; grid-template-columns: 1fr 1fr; gap: 10px; margin: 8px 0;",
                
                # Left Column: Precision Metrics
                div(
                  style = "background: #f0f9ff; border-radius: 5px; padding: 8px; border: 1px solid #bae6fd;",
                  h5(style = "color: #0369a1; margin-bottom: 6px; font-size: 12px; font-weight: 600;", "Precision Metrics"),
                  div(
                    style = "font-size: 11px; line-height: 1.2;",
                    div(style = "margin-bottom: 4px;",
                        strong("SEM Target:"), " ≤ 0.30"
                    ),
                    div(style = "margin-bottom: 4px;", 
                        strong("95% CI:"), textOutput("ci_display", inline = TRUE)
                    ),
                    div(style = "margin-bottom: 0;", 
                        strong("Reliability:"), textOutput("reliability_display", inline = TRUE)
                    )
                  )
                ),
                
                # Right Column: Efficiency Gain
                div(
                  style = "background: #f0fdf4; border-radius: 5px; padding: 8px; border: 1px solid #bbf7d0;",
                  h5(style = "color: #15803d; margin-bottom: 6px; font-size: 12px; font-weight: 600;",
                     textOutput("efficiency_header_display")),              
                  div(
                    style = "font-size: 11px; line-height: 1.2;",
                    div(style = "margin-bottom: 4px;",
                        strong("Items used:"), textOutput("items_used_display", inline = TRUE)
                    ),
                    div(style = "margin-bottom: 4px;", 
                        strong("Fixed-form equivalent:"), textOutput("fixed_form_display", inline = TRUE)
                    ),
                    div(style = "margin-bottom: 0;", 
                        strong("Items saved:"), textOutput("items_saved_display", inline = TRUE)
                    )
                  )
                )
              ),
              
              # Confidence Interpretation
              div(
                style = "background: #fefce8; border-radius: 5px; padding: 8px; margin: 6px 0; border: 1px solid #fef08a;",
                div(
                  style = "margin: 0; font-size: 11px; line-height: 1.2;",
                  HTML('<strong style="color: #854d0e;">Interpretation:</strong> '),
                  textOutput("confidence_interpretation_inline", inline = TRUE)
                )
              ),
              
              # Call to action
              div(
                style = "text-align: center; margin-top: 8px; padding-top: 6px; border-top: 1px solid #e2e8f0;",
                p(style = "margin: 0; font-size: 11px; line-height: 1.2; color: #64748b;",
                  "You can continue testing for even greater precision, or view detailed results in the Test Diagnostics tab.")
              )
            )
          )
        ))
      }
    })
    
    
    # ---- Modal Button Handlers ----
    observeEvent(input$continue_testing, {
      removeModal()
    })
    
    observeEvent(input$view_summary, {
      removeModal()
      # Scroll to summary table
      runjs("$('html, body').animate({ scrollTop: $(document).height() }, 1000);")
    })
    
    ###############################  
    # ----     Main Plot     ----
    ###############################
    output$plot_all <- renderPlot({
      # Show loading message if not initialized
      if (!app_initialized()) {
        par(mar = c(3, 3, 3, 3), bg = "white", family = "sans")
        plot.new()
        plot.window(xlim = c(0, 1), ylim = c(0, 1))
        text(0.5, 0.7, "Initializing CAT System...", 
             col = "#666666", cex = 1.3, adj = c(0.5, 0.5), font = 2)
        text(0.5, 0.5, "Please wait while the test is being set up", 
             col = "#999999", cex = 1.0, adj = c(0.5, 0.5))
        return()
      }
      
      library(TeachingDemos)
      ib <- items()
      
      # DEFENSIVE: Check if items exist
      if (is.null(ib) || nrow(ib) == 0) {
        par(mar = c(3, 3, 3, 3), bg = "white", family = "sans")
        plot.new()
        plot.window(xlim = c(0, 1), ylim = c(0, 1))
        text(0.5, 0.5, "No items available", col = "red", cex = 1.2, adj = c(0.5, 0.5))
        return()
      }
      
      # DEFENSIVE: Initialize critical variables FIRST
      labs <- ib$label
      curves <- per_item_curves()
      
      # Check if curves exist
      if (is.null(curves) || length(curves) == 0) {
        par(mar = c(3, 3, 3, 3), bg = "white", family = "sans")
        plot.new()
        plot.window(xlim = c(0, 1), ylim = c(0, 1))
        text(0.5, 0.5, "No curve data available", col = "red", cex = 1.2, adj = c(0.5, 0.5))
        return()
      }
      
      # Continue with existing logic...
      pal  <- setNames(ib$color, ib$label)
      last_lab <- tail(labs, 1)
      
      theta  <- .THETA_GRID
      tif_df <- tibble(theta = theta, TIF = tif_curve())
      sem_df <- tibble(theta = theta,
                       SEM = ifelse(tif_df$TIF > 0, 1 / sqrt(tif_df$TIF), NA_real_))
      
      max_tif <- max(tif_df$TIF, na.rm = TRUE)
      if (!is.finite(max_tif) || max_tif <= 0) max_tif <- 1
      
      max_iif <- max(vapply(curves[labs], function(df) {
        if (!is.null(df) && nrow(df) > 0 && !all(is.na(df$I))) {
          max(df$I, na.rm = TRUE)
        } else {
          0
        }
      }, 0), na.rm = TRUE)
      
      if (!is.finite(max_iif) || max_iif <= 0) max_iif <- 1
      
      y_top_left <- max(1, max_iif * 1.4)
      scaling_factor <- if (max_tif > 0) min(1.0, (y_top_left * 0.85) / max_tif) else 1.0
      scaling_factor <- max(0.05, min(1.0, scaling_factor))
      y_top_left <- min(40, y_top_left)
      
      th0 <- eap_theta()
      true_th <- true_theta()
      
      # Use the cached cumulative metrics so the plotted points match the summary table exactly
      cm <- cumulative_metrics_cache()
      if (!is.null(cm) && length(cm) > 0) {
        current_metrics <- cm[[length(cm)]]  # Use double bracket indexing for lists
        TIF_th0 <- current_metrics$tif
        SEM_th0 <- current_metrics$sem
        I_last <- current_metrics$iif_current_item
        
        
      } else {
        # Fallback - should not happen if cache is working
        TIF_th0 <- calculate_tif_at_theta(th0, ib)
        SEM_th0 <- calculate_sem_at_theta(th0, ib)
        I_last <- if (nrow(ib) > 0) {
          I_3pl(th0, ib$a[nrow(ib)], ib$b[nrow(ib)], ib$c[nrow(ib)])
        } else 0
      }    
      par(mar = c(4.5, 4.5, 4, 4.5) + 0.1, mgp = c(2.8, 0.8, 0))
      
      xpad <- 0.8
      xlim <- c(min(theta) - xpad, max(theta) + xpad)
      
      plot(NA, NA, xlim = xlim, ylim = c(0, y_top_left),
           xlab = expression(bold("Ability (" * theta * ")")),
           ylab = expression(bold("Item Information (IIF)")),
           axes = FALSE, main = "")
      
      grid(lwd = 0.7, col = "gray85", lty = "solid")
      axis(1, col = "gray40", col.axis = "gray40", lwd = 1)
      axis(2, col = "gray40", col.axis = "gray40", lwd = 1)
      
      # Add TIF on right axis
      right_ticks <- pretty(c(0, max_tif))
      axis(4, at = right_ticks * scaling_factor, labels = right_ticks, lwd = 1)
      mtext("Test Information (TIF) & Standard Error (SEM)", side = 4, line = 2.8, font = 2)
      
      # Reference lines
      abline(v = true_th, lwd = 2, col = "purple", lty = "solid")
      abline(v = th0, lwd = 2, col = "purple", lty = "dotted")
      
      # Plot ICC curves
      icc_scale <- 0.7 * y_top_left
      if (input$icc_layer_alpha > 0) {
        for (lab in labs) {
          df <- curves[[lab]]
          if (!is.null(df) && nrow(df) > 0 && !all(is.na(df$P))) {
            alpha_eff <- input$icc_layer_alpha
            line_width <- if (lab == last_lab) 2.5 else 1.2
            col_icc <- adjustcolor(pal[[lab]], alpha.f = alpha_eff)
            lines(df$theta, df$P * icc_scale, col = col_icc, lwd = line_width)
          }
        }
      }
      
      # Plot IIF curves
      if (input$iif_layer_alpha > 0) {
        for (lab in labs) {
          df <- curves[[lab]]
          if (!is.null(df) && nrow(df) > 0 && !all(is.na(df$I))) {
            alpha_eff <- input$iif_layer_alpha
            line_width <- if (lab == last_lab) 3.5 else 1.8
            col_iif <- adjustcolor(pal[[lab]], alpha.f = alpha_eff)
            lines(df$theta, df$I, col = col_iif, lwd = line_width)
          }
        }
      }
      
      # Plot TIF
      if (input$tif_alpha > 0 && max(tif_df$TIF, na.rm = TRUE) > 0) {
        tif_scaled <- tif_df$TIF * scaling_factor
        lines(theta, tif_scaled, 
              col = adjustcolor("#1D4ED8", alpha.f = input$tif_alpha),
              lwd = 5, lend = "round")
      }
      
      # Plot SEM
      if (input$sem_alpha > 0 && max(tif_df$TIF, na.rm = TRUE) > 0) {
        sem_scaled <- sem_df$SEM * scaling_factor
        lines(theta, sem_scaled,
              col = adjustcolor("darkred", alpha.f = input$sem_alpha),
              lwd = 3, lend = "round")
      }
      
      # Target SEM line
      target_sem_line_y <- 0.30 * scaling_factor
      abline(h = target_sem_line_y, col = "#EF4444", lwd = 2.5, lty = "dotted")
      
      # IIF point (same cached value as shown in the summary table)
      if (is.finite(I_last) && I_last > 0) {
        points(th0, I_last, pch = 21, bg = pal[[last_lab]], 
               col = "white", cex = 1.5, lwd = 2)
        
        text_pos <- if (I_last > y_top_left * 0.7) 2 else 4
        TeachingDemos::shadowtext(
          th0, I_last, sprintf("IIF=%.2f", I_last),
          pos = text_pos, cex = input$annotation_size,
          font = 2, col = pal[[last_lab]], bg = "white", r = 0.3
        )
      }
      
      # TIF point - use EXACT SAME value as table
      if (is.finite(TIF_th0) && TIF_th0 > 0) {
        tif_scaled_point <- TIF_th0 * scaling_factor
        
        points(th0, tif_scaled_point, pch = 22, bg = "navy", 
               col = "white", cex = 1.8, lwd = 2)
        
        TeachingDemos::shadowtext(
          th0, tif_scaled_point, sprintf("TIF=%.1f", TIF_th0),
          pos = 3, cex = input$annotation_size,
          font = 2, col = "navy", bg = "white", r = 0.3
        )
      }
      
      # SEM point - use EXACT SAME value as table
      if (is.finite(SEM_th0) && SEM_th0 > 0) {
        sem_scaled_point <- SEM_th0 * scaling_factor
        
        points(th0, sem_scaled_point, pch = 23, bg = "darkred", 
               col = "white", cex = 1.6, lwd = 2)
        
        TeachingDemos::shadowtext(
          th0, sem_scaled_point, sprintf("SEM=%.3f", SEM_th0),
          pos = 1, cex = input$annotation_size,
          font = 2, col = "darkred", bg = "white", r = 0.3
        )
      }
      
      # Legend
      legend("topleft", legend = rev(labs), col = pal[rev(labs)], 
              lwd = 2, cex = input$annotation_size, bty = "o", bg = adjustcolor("white", alpha.f = 0.8),
              title = "Items", title.col = "#1E293B")
      
      legend("topright", legend = c("EAP θ", "True θ"),
             lty = c("dotted", "solid"), lwd = 2, col = "purple",
             cex = input$annotation_size, bty = "o", bg = adjustcolor("white", alpha.f = 0.8))
      
      title("Test Information Function and Measurement Precision", line = 2, cex.main = 1.2)
      box(col = "gray80", lwd = 0.5)
    })
    
    observeEvent(input$add_item, {
      current_theta_est <- eap_theta()
      pool      <- item_pool()
      existing  <- items()
      cur_resp  <- responses()
      palette   <- color_palette()
      
      # Determine if we should use Sympson-Hetter (MFI only)
      use_sh <- input$use_sympson_hetter && input$sel_method == "mfi"
      
      # Use enhanced selection with all three methods
      next_item <- select_item_enhanced(
        pool = pool,
        method = input$sel_method,
        current_theta = current_theta_est,
        existing_items = existing,
        m_top = if (!is.null(input$m_top)) input$m_top else 8,
        tau = if (!is.null(input$tau)) input$tau else 0.08,
        # Only use Sympson-Hetter with MFI
        use_sympson_hetter = use_sh,
        k_value = if (!is.null(input$k_value)) input$k_value else 0.5,
        r_value = if (!is.null(input$r_value)) input$r_value else 0.5
      )
      
      # Function to update exposure probabilities
      update_exposure_control <- function(pool, administered_items, k_value = 0.5) {
        # Simple implementation - in practice, this would be more sophisticated
        for (i in 1:nrow(pool)) {
          item_id <- pool$id[i]
          admin_count <- sum(administered_items$id == item_id)
          
          # Simple decay function - reduce exposure probability based on usage
          if (admin_count > 0) {
            pool$exposure_prob[i] <- max(0.1, 1 / (1 + log(1 + admin_count * k_value)))
          }
        }
        return(pool)
      }
      
      next_item$color <- palette[next_item$id]
      new_response <- generate_responses(true_theta(), next_item)
      
      # FIX: Define new_items BEFORE using it
      new_items <- dplyr::bind_rows(existing, next_item)
      new_responses <- c(cur_resp, new_response)
      
      if (nrow(new_items) != length(new_responses)) {
        n_min <- min(nrow(new_items), length(new_responses))
        new_items     <- new_items[1:n_min, , drop = FALSE]
        new_responses <- new_responses[1:n_min]
      }
      
      # Only update exposure control when using MFI + Sympson-Hetter
      if (use_sh) {
        updated_pool <- update_exposure_control(pool, new_items, 
                                                if (!is.null(input$k_value)) input$k_value else 0.5)
        item_pool(updated_pool)
      }
      
      items(new_items)
      responses(new_responses)
      
      # Calculate curves for new item
      df_new <- tibble(
        theta = .THETA_GRID,
        P = P_3pl(.THETA_GRID, next_item$a, next_item$b, next_item$c),
        I = I_3pl(.THETA_GRID, next_item$a, next_item$b, next_item$c)
      )
      
      cur <- per_item_curves()
      
      # CRITICAL FIX: Initialize curves list if it's the first item
      if (is.null(cur)) {
        cur <- list()
      }
      
      cur[[next_item$label]] <- df_new
      per_item_curves(cur)
      
      # Recalculate TIF curve
      theta_grid <- .THETA_GRID
      new_tif_curve <- numeric(length(theta_grid))
      
      for(i in 1:length(theta_grid)) {
        total_info <- 0
        for(j in 1:nrow(new_items)) {
          item_info <- I_3pl(theta_grid[i], new_items$a[j], new_items$b[j], new_items$c[j])
          if(is.finite(item_info)) {
            total_info <- total_info + item_info
          }
        }
        new_tif_curve[i] <- total_info
      }
      
      tif_curve(new_tif_curve)
    })
  }


# ===========================================
# ---- LAUNCH APPLICATION SECTION ----
# ===========================================

shinyApp(ui = ui, server = server)



Data Structures: The Item Bank

The IRT item bank is the central data structure of a CAT system. Each row corresponds to one calibrated item, containing its parameters and metadata.

This structure allows the system to:

  • Evaluate \(P(Y_j=1\mid\theta)\) for any \(\theta\).
  • Compute \(I_j(\theta)\) for optimal item selection.
  • Update ability estimates via likelihood or Bayesian integration.
  • Maintain metric consistency across forms through linking and scaling.
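For concreteness, a minimal hand-built example of such a table is sketched below (the item IDs and parameter values are hypothetical; the item_bank() and prepare_user_bank() constructors below add validation, model inference, and metadata):

# A minimal, hypothetical item bank: one row per calibrated item
mini_bank <- data.frame(
  id = c("ITM001", "ITM002", "ITM003"),
  a  = c(1.20, 0.85, 1.60),    # discrimination
  b  = c(-0.50, 0.30, 1.10),   # difficulty
  c  = c(0.20, 0.00, 0.15),    # lower asymptote (guessing)
  d  = c(1.00, 1.00, 1.00),    # upper asymptote
  stringsAsFactors = FALSE
)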

Dichotomous IRT Models

Model Equation Notes
1PL (Rasch) \(P(Y=1\mid\theta)=\text{logit}^{-1}\{D(\theta-b)\}\) \(a=1,\ c=0,\ d=1\)
2PL \(P(Y=1\mid\theta)=\text{logit}^{-1}\{D\,a(\theta-b)\}\) Variable \(a\)
3PL \(P(Y=1\mid\theta)=c+(1-c)\,\text{logit}^{-1}\{D\,a(\theta-b)\}\) Adds guessing \(c\)
4PL \(P(Y=1\mid\theta)=c+(d-c)\,\text{logit}^{-1}\{D\,a(\theta-b)\}\) Adds upper bound \(d\)

Validity constraints: \(a>0,\ 0\le c<d\le1\); typically \(D=1.702\).
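All four models can be written as a single response function with constrained parameters. A minimal sketch, assuming \(D = 1.702\) (the helper name p_4pl is illustrative, not part of the application code above):

# General 4PL response probability; the simpler models are special cases:
# 1PL: a = 1, c = 0, d = 1  |  2PL: c = 0, d = 1  |  3PL: d = 1
p_4pl <- function(theta, a = 1, b = 0, c = 0, d = 1, D = 1.702) {
  c + (d - c) * plogis(D * a * (theta - b))
}

p_4pl(0, a = 1.2, b = 0.5)             # 2PL probability at theta = 0
p_4pl(0, a = 1.2, b = 0.5, c = 0.2)    # 3PL: guessing raises the lower asymptote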

Implementation Notes

A well-designed item bank constructor should:

  1. Validate parameter domains: enforce \(a>0\) and \(0\le c<d\le1\).
  2. Impute missing asymptotes: default \(c=0\), \(d=1\).
  3. Infer model type: detect 1PL–4PL family from parameters.
  4. Verify integrity: check required columns and resolve duplicate IDs.

The functions below form a data-sanitizing constructor for an item bank: they produce a clean, standardized table of IRT item parameters that downstream CAT or IRT routines can safely use.

# ======================================================
# DATA STRUCTURES: Defining and preparing an Item Bank
# ======================================================

# ---------------------------
# Helpers & constructors
# ---------------------------

# ----- Robust helper -----
`%||%` <- function(x, y) if (is.null(x)) y else x

# ----- Improved infer_model_vec -----
infer_model_vec <- function(a, c, d, tol = 1e-8) {
  # Coerce and guard
  a <- as.numeric(a)
  c <- as.numeric(c)
  d <- as.numeric(d)
  n <- max(length(a), length(c), length(d))
  a <- rep(a, length.out = n)
  c <- rep(c, length.out = n)
  d <- rep(d, length.out = n)

  res <- rep("Unknown", n)

  # NA handling
  na_idx <- is.na(a) | is.na(c) | is.na(d)
  if (any(na_idx)) {
    res[na_idx] <- "Unknown"
  }

  # compute boolean flags with clear precedence
  # 1PL: a ≈ 1, c ≈ 0, d ≈ 1
  is_1pl <- (!na_idx) & (abs(a - 1) <= tol) & (abs(c) <= tol) & (abs(d - 1) <= tol)

  # 2PL: c ≈ 0, d ≈ 1, but a != 1
  is_2pl <- (!na_idx) & (abs(c) <= tol) & (abs(d - 1) <= tol) & (abs(a - 1) > tol) & (a > 0)

  # 3PL: d ≈ 1, c > 0 (above tol), a > 0
  is_3pl <- (!na_idx) & (abs(d - 1) <= tol) & (c > tol) & (a > 0)

  # 4PL: c >= 0 (valid), d < 1 (by more than tol), a > 0
  is_4pl <- (!na_idx) & (c >= -tol) & (d < (1 - tol)) & (a > 0)

  # assign with precedence: 1PL > 2PL > 3PL > 4PL
  res[is_1pl] <- "1PL"
  res[is_2pl] <- "2PL"
  # avoid overwriting existing 1PL/2PL
  res[is_3pl & res == "Unknown"] <- "3PL"
  res[is_4pl & res == "Unknown"] <- "4PL"

  # final pass: anything still Unknown try a relaxed inference
  still_unknown <- which(res == "Unknown" & !na_idx)
  if (length(still_unknown)) {
    # conservative fallback: if c small and d near 1 then 2PL else 4PL if d < 1
    for (i in still_unknown) {
      if (a[i] > 0 && abs(c[i]) <= 0.05 && abs(d[i] - 1) <= 0.05) res[i] <- "2PL"
      else if (a[i] > 0 && d[i] < 0.999) res[i] <- "4PL"
      else res[i] <- "Unknown"
    }
  }

  res
}



item_bank <- function(x) {
  df <-
    if (is.list(x) && !is.null(x$item_parameters_tbl)) x$item_parameters_tbl
    else if (is.data.frame(x)) x
    else stop("item_bank(): pass a data.frame or the simulator's returned list.", call. = FALSE)

  # promote 'item' -> 'id'
  nms <- names(df)
  if ("item" %in% nms && !("id" %in% nms)) {
    names(df)[match("item", names(df))] <- "id"
  }

  # minimal required columns
  req <- c("id","a","b")
  if (!all(req %in% names(df))) stop("item_bank(): df must contain id, a, b", call. = FALSE)

  # Ensure optional columns exist
  if (!("c" %in% names(df))) df$c <- NA_real_
  if (!("d" %in% names(df))) df$d <- NA_real_
  if (!("model" %in% names(df))) df$model <- NA_character_

  # Fill NA with defaults for guessing/slip (but preserve intent if NA used intentionally)
  df$c[is.na(df$c)] <- 0
  df$d[is.na(df$d)] <- 1

  # coerce types
  df$id    <- as.character(df$id)
  df$a     <- as.numeric(df$a)
  df$b     <- as.numeric(df$b)
  df$c     <- as.numeric(df$c)
  df$d     <- as.numeric(df$d)
  df$model <- as.character(df$model)

  # infer model where missing/empty
  is_missing_model <- function(x) { is.na(x) | trimws(x) == "" }
  need_model <- vapply(df$model, is_missing_model, logical(1))
  if (any(need_model)) {
    df$model[need_model] <- infer_model_vec(df$a[need_model], df$c[need_model], df$d[need_model])
  }

  rownames(df) <- NULL
  df
}

# ----- Stronger validation -----
validate_item_bank <- function(df, tol_a_eq1 = 1e-8) {
  req <- c("id","a","b","c","d","model")
  missing_cols <- setdiff(req, names(df))
  if (length(missing_cols)) {
    stop(sprintf("item_bank: missing required column(s): %s", paste(missing_cols, collapse = ", ")), call. = FALSE)
  }

  if (!is.character(df$id))    stop("item_bank: 'id' must be character.", call. = FALSE)
  if (!is.numeric(df$a))       stop("item_bank: 'a' must be numeric.", call. = FALSE)
  if (!is.numeric(df$b))       stop("item_bank: 'b' must be numeric.", call. = FALSE)
  if (!is.numeric(df$c))       stop("item_bank: 'c' must be numeric.", call. = FALSE)
  if (!is.numeric(df$d))       stop("item_bank: 'd' must be numeric.", call. = FALSE)
  if (!is.character(df$model)) stop("item_bank: 'model' must be character.", call. = FALSE)

  if (any(!is.finite(df$a))) stop("item_bank: 'a' contains non-finite values.", call. = FALSE)
  if (any(!is.finite(df$b))) stop("item_bank: 'b' contains non-finite values.", call. = FALSE)
  if (any(!is.finite(df$c))) stop("item_bank: 'c' contains non-finite values.", call. = FALSE)
  if (any(!is.finite(df$d))) stop("item_bank: 'd' contains non-finite values.", call. = FALSE)

  # domain checks
  if (any(df$a <= 0, na.rm = TRUE)) stop("item_bank: all 'a' parameters must be > 0", call. = FALSE)
  if (any(df$c < 0 | df$c > 1, na.rm = TRUE)) stop("item_bank: all 'c' parameters must be in [0,1]", call. = FALSE)
  if (any(df$d <= 0 | df$d > 1, na.rm = TRUE)) stop("item_bank: all 'd' parameters must be in (0,1]", call. = FALSE)
  if (any(df$c >= df$d, na.rm = TRUE)) stop("item_bank: all rows must have c < d", call. = FALSE)

  ok_models <- c("1PL","2PL","3PL","4PL")
  bad_model <- is.na(df$model) | !(df$model %in% ok_models)
  if (any(bad_model)) {
    warning("item_bank: some items have missing/invalid 'model' values; attempting inference.", call. = FALSE)
    df$model[bad_model] <- infer_model_vec(df$a[bad_model], df$c[bad_model], df$d[bad_model])
    still_bad <- bad_model & !(df$model %in% ok_models)
    if (any(still_bad)) {
      stop(sprintf("item_bank: unable to infer valid model for items: %s",
                   paste(df$id[still_bad], collapse = ", ")), call. = FALSE)
    }
  }

  if (anyDuplicated(df$id)) {
    dups <- unique(df$id[duplicated(df$id)])
    warning(sprintf("item_bank: duplicated item IDs detected: %s; will be de-duplicated in prepare_user_bank().",
                    paste(dups, collapse = ", ")), call. = FALSE)
  }

  invisible(df)
}

prepare_user_bank <- function(x, dedupe_ids = TRUE, order_cols = TRUE) {
  df <- item_bank(x)

  if (dedupe_ids && anyDuplicated(df$id)) {
    tab   <- table(df$id)
    needs <- names(tab)[tab > 1]
    seen  <- setNames(integer(length(needs)), needs)
    for (i in seq_len(nrow(df))) {
      id <- df$id[i]
      if (id %in% needs) {
        seen[id] <- seen[id] + 1
        df$id[i] <- paste0(id, "_", seen[id])
      }
    }
  }

  if (order_cols) {
    base_cols <- c("id","a","b","c","d","model")
    # use df (not bank) and return character(0) when none found
    content_cols <- grep("content|category|topic", names(df), value = TRUE, ignore.case = TRUE)
    if (length(content_cols) == 0) {
      message("prepare_user_bank: no content/category/topic columns found.")
      content_cols <- character(0)
    } else {
      message(sprintf("prepare_user_bank: content columns found: %s", paste(content_cols, collapse = ", ")))
    }

    rest <- setdiff(names(df), c(base_cols, content_cols))
    # build final column order safely (exclude non-existent names)
    final_cols <- c(intersect(base_cols, names(df)), content_cols, rest)
    df <- df[, final_cols, drop = FALSE]
  }

  class(df) <- c("irt_bank", class(df))
  attr(df, "n_items")  <- nrow(df)
  attr(df, "n_models") <- length(unique(df$model))
  attr(df, "created")  <- Sys.time()

  validate_item_bank(df)
  df
}


print.irt_bank <- function(x, n = 6, ...) {
  n_items  <- attr(x, "n_items")  %||% nrow(x)
  n_models <- attr(x, "n_models") %||% length(unique(x$model))
  cat(sprintf("<irt_bank> with %d items\n", n_items))
  cat(sprintf("Models: %s\n", toString(table(x$model))))
  if (n > 0 && n_items > 0) {
    cat(sprintf("\nFirst %d items:\n", min(n, n_items)))
    keep <- c("id","a","b","c","d","model")
    keep <- intersect(keep, names(x))
    print(utils::head(x[keep], n), row.names = FALSE)
    if (n_items > n) cat("...\n")
  }
  invisible(x)
}

bank_summary <- function(bank) {
  if (!inherits(bank, "irt_bank")) bank <- prepare_user_bank(bank)
  cat("Item Bank Summary:\n")
  cat("=================\n")
  cat(sprintf("Total items: %d\n", nrow(bank)))
  cat(sprintf("Models: %s\n", paste(names(table(bank$model)), "(", table(bank$model), ")", collapse = ", ")))
  cat("Parameter ranges:\n")
  cat(sprintf("  a: [%.3f, %.3f]\n", min(bank$a), max(bank$a)))
  cat(sprintf("  b: [%.3f, %.3f]\n", min(bank$b), max(bank$b)))
  cat(sprintf("  c: [%.3f, %.3f]\n", min(bank$c), max(bank$c)))
  cat(sprintf("  d: [%.3f, %.3f]\n", min(bank$d), max(bank$d)))
  cat(sprintf("Content columns: %s\n",
              paste(grep("content|category|topic", names(bank), value = TRUE, ignore.case = TRUE),
                    collapse = ", ")))
}

Simulation of Dichotomous IRT Data — Generative Model

A generative model in Item Response Theory (IRT) describes the stochastic process by which observed responses arise from unobserved latent traits and item characteristics. It provides the probabilistic foundation for simulating response data, estimating parameters, and understanding the statistical structure underlying test performance.


Latent Ability Distribution

Each examinee \(n = 1, 2, \ldots, N\) is assumed to possess an unobserved (latent) ability parameter \(\theta_n\), representing their proficiency on the construct being measured.

\[ \theta_n \sim \mathcal{N}(0,1) \]

  • This assumes a standard normal population with mean 0 and variance 1.
  • In practice, the distribution of ability can deviate from normality:
    • Skewed distributions (e.g., easier or harder cohorts),
    • Bimodal distributions (e.g., mixed populations from different educational tracks),
    • or empirical distributions estimated from real data.

The generative model remains valid under any continuous distribution of \(\theta\); the normality assumption merely simplifies simulation and inference.
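As a small illustration of these alternatives, abilities can be drawn from the standard normal or from non-normal populations (the mixture weights and gamma shape below are arbitrary choices for illustration):

set.seed(42)
N <- 1000

theta_normal  <- rnorm(N, mean = 0, sd = 1)                         # standard normal population
theta_skewed  <- as.numeric(scale(rgamma(N, shape = 3, rate = 1)))  # right-skewed, rescaled to mean 0, sd 1
grp           <- rbinom(N, 1, 0.4)                                  # mixture indicator (two subpopulations)
theta_bimodal <- ifelse(grp == 1, rnorm(N, -1, 0.6), rnorm(N, 1, 0.6))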


Item Parameter Specification

To create a psychometrically diverse and balanced item bank, parameters are typically drawn from distributions that reflect realistic testing scenarios:

  • Discrimination (\(a_j\))
    Sampled from a truncated log-normal distribution (\(a_j \sim \text{LogNormal}(\mu, \sigma)\)) or uniform distribution ensuring \(a_j > 0\).
    High \(a_j\) values yield steeper item characteristic curves and provide greater Fisher information near the item’s difficulty point. Typical ranges: \(a_j \in [0.5, 2.5]\).

  • Difficulty (\(b_j\))
    Sampled across a range covering the target ability continuum, often following a normal distribution centered near the population mean (e.g., \(b_j \sim \mathcal{N}(0, 1)\)).
    This ensures coverage of easy, medium, and hard items relative to the examinee population.

  • Lower Asymptote (\(c_j\))
    Represents the probability of a correct response by random guessing. Commonly drawn from a uniform distribution, \(c_j \sim U(0, 0.25)\), for multiple-choice items.
    Set to \(c_j = 0\) for constructed-response or free-response items.

  • Upper Asymptote (\(d_j\))
    Accounts for slips, inattention, or careless errors by high-ability examinees. Typically uniform in \([0.85, 1.00]\).
    Often set to \(d_j = 1\) in operational testing unless modeling specific response anomalies.

  • Scaling Constant (\(D\))
    Fixed at \(D = 1.702\) so that the logistic curve closely approximates the normal-ogive model. In the 2PL case, \(P(Y=1 \mid \theta = b_j) = 0.5\) regardless of \(D\); the constant only rescales the slope of the curve.
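One possible way to draw a parameter set matching these ranges is sketched below (purely illustrative; the simulate_item_bank() generator later in these notes uses a format-aware scheme instead):

set.seed(101)
J <- 50
a_j <- pmin(pmax(rlnorm(J, meanlog = 0.1, sdlog = 0.4), 0.5), 2.5)  # truncated log-normal discriminations
b_j <- rnorm(J, mean = 0, sd = 1)                                   # difficulties centered on the population mean
c_j <- runif(J, 0, 0.25)                                            # guessing for multiple-choice items
d_j <- runif(J, 0.85, 1.00)                                         # mild upper-asymptote (slipping) effects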


IRT Model Hierarchy

Each item \(j = 1, 2, \ldots, J\) is defined by parameters depending on the chosen IRT model complexity:

Model Parameters Response Function Typical Use Cases
1PL (Rasch) \(b_j\) \(\displaystyle P_j(\theta) = \frac{1}{1 + e^{-D(\theta - b_j)}}\) Educational assessments with equal discrimination
2PL \(a_j, b_j\) \(\displaystyle P_j(\theta) = \frac{1}{1 + e^{-D a_j(\theta - b_j)}}\) Most cognitive and psychological tests
3PL \(a_j, b_j, c_j\) \(\displaystyle P_j(\theta) = c_j + (1-c_j)\frac{1}{1 + e^{-D a_j(\theta - b_j)}}\) Multiple-choice tests with guessing
4PL \(a_j, b_j, c_j, d_j\) \(\displaystyle P_j(\theta) = c_j + (d_j-c_j)\frac{1}{1 + e^{-D a_j(\theta - b_j)}}\) Specialized applications with ceiling effects

For example, under the widely-used 3PL model:

\[ P(Y_{nj} = 1 \mid \theta_n) = c_j + (1 - c_j) \cdot \frac{1}{1 + e^{-D \cdot a_j \cdot (\theta_n - b_j)}}, \] where \(D \approx 1.702\) scales the logistic curve to approximate the normal-ogive model.


Response Generation Process

Given ability \(\theta_n\) and item parameters \((a_j, b_j, c_j)\), the model defines the probability of a correct response:

\[ P_{nj} = P(Y_{nj} = 1 \mid \theta_n). \]

Actual responses are then sampled from a Bernoulli distribution:

\[ Y_{nj} \sim \text{Bernoulli}(P_{nj}). \]

This step converts the theoretical response probabilities into discrete observed data (0 = incorrect, 1 = correct).

Interpretation:

  • For high-ability examinees (\(\theta_n \gg b_j\)), \(P_{nj} \approx 1\).
  • For low-ability examinees (\(\theta_n \ll b_j\)), \(P_{nj} \approx c_j\).
  • The discrimination parameter \(a_j\) controls how sharply the probability transitions around \(b_j\).

Full Generative Process

The full generative model can be summarized hierarchically:

\[ \begin{aligned} \theta_n &\sim p(\theta) \quad &\text{(Ability distribution)}\\[8pt] Y_{nj} \mid \theta_n, \boldsymbol{\eta}_j &\sim \text{Bernoulli}\big(P(Y_{nj}=1\mid\theta_n, \boldsymbol{\eta}_j)\big) \quad &\text{(Response model)} \end{aligned} \]

where \(\boldsymbol{\eta}_j = (a_j, b_j, c_j, d_j)\) are the item parameters.

This defines a two-level hierarchical model:

  1. Person level — ability variation across examinees.
  2. Item level — response probabilities conditional on ability.

Simulation Workflow

To simulate IRT data:

  1. Draw abilities:
    θₙ ~ N(0,1)

  2. Specify item parameters:
    Choose aⱼ, bⱼ, cⱼ (and dⱼ if 4PL) from reasonable ranges:

    • aⱼ ∈ [0.5, 2.5]
      - Highly discriminating items (2.0-2.5)
      - Full spectrum of item quality

    • bⱼ ∈ [-2.5, 2.5]
      - Extremely easy and hard items
      - Covers tails of ability distribution
      - Challenges estimation algorithms

    • cⱼ ∈ [0, 0.20]
      - 5-choice items (theoretical guess = 0.20)
      - Conservative, realistic guessing
      - Well-designed multiple choice tests

    • dⱼ ∈ [0.95, 1.0] (4PL only)
      - Slight imperfections in maximum performance
      - Psychologically plausible slips

  3. Compute probabilities:
    Pₙⱼ = cⱼ + (1 - cⱼ) / (1 + e^(-D·aⱼ(θₙ - bⱼ)))

  4. Generate responses:
    Yₙⱼ ~ Bernoulli(Pₙⱼ)

  5. Assemble response matrix:
    Combine all Yₙⱼ into an N × J binary matrix Y, which forms the simulated dataset.
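A minimal end-to-end sketch of these five steps under the 3PL model (the sample sizes and seed are arbitrary; D = 1.702):

set.seed(123)
N <- 500    # examinees
J <- 40     # items
D <- 1.702

# 1. Draw abilities
theta <- rnorm(N, 0, 1)

# 2. Specify item parameters within the ranges listed above
a_j <- runif(J, 0.5, 2.5)
b_j <- runif(J, -2.5, 2.5)
c_j <- runif(J, 0, 0.20)

# 3. Compute response probabilities as an N x J matrix
eta  <- D * outer(theta, b_j, "-") * matrix(a_j, N, J, byrow = TRUE)
cmat <- matrix(c_j, N, J, byrow = TRUE)
P    <- cmat + (1 - cmat) * plogis(eta)

# 4. Generate responses via conditionally independent Bernoulli draws
Y <- matrix(rbinom(N * J, size = 1, prob = P), nrow = N, ncol = J)

# 5. Y is the simulated N x J binary response matrix
dim(Y)
round(colMeans(Y)[1:5], 2)   # observed proportion correct for the first five items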


Conceptual Interpretation

The generative IRT model embodies several fundamental psychometric principles that distinguish it from classical test theory:

  • Latent Trait Structure: Differences among examinees arise from variation in a continuous, unobserved latent trait (\(\theta_n\)), representing the underlying construct being measured.

  • Item Characteristic Determinism: Differences among items arise from their unique psychometric signatures—difficulty (\(b_j\)), discrimination (\(a_j\)), and asymptote parameters (\(c_j\), \(d_j\)).

  • Conditional Independence: Observed responses are random realizations from Bernoulli distributions, where the success probability depends entirely on the interaction between person ability and item characteristics.

  • Local Independence: Critically, the model assumes that once ability \(\theta_n\) is accounted for, responses to different items are statistically independent: \[ P(Y_{n1}, Y_{n2}, \ldots, Y_{nJ} \mid \theta_n) = \prod_{j=1}^J P(Y_{nj} \mid \theta_n) \]

This leads to a powerful implication:
If two examinees have identical ability \(\theta_n\), their probability of answering any specific item correctly is determined solely by that item’s parameters—regardless of their responses to other items or their demographic characteristics.


Practical Consequences:

  1. Measurement Invariance: Item parameters should remain stable across different populations of examinees
  2. Adaptive Testing Foundation: Items can be selected based solely on current \(\hat{\theta}\) estimates
  3. Model Identification: The conditional independence assumption enables parameter estimation from response patterns
  4. Predictive Power: The model generates realistic response patterns that mirror actual test data

Thus, the generative model serves dual purposes:

  • As a data-generating mechanism for simulation studies and system validation
  • As the theoretical backbone for parameter estimation, model checking, and adaptive testing algorithms in operational IRT applications

This conceptual framework ensures that simulated data maintains the statistical properties necessary for rigorous psychometric research and CAT development.


Simulation Outcomes

This generative approach produces synthetic IRT data with realistic psychometric properties:

  • Abilities \(\theta_n\) define the latent proficiency continuum across examinees
  • Item parameters \((a_j, b_j, c_j, d_j)\) characterize each item’s psychometric behavior
  • Response probabilities \(P_{nj}(\theta_n)\) determine expected performance patterns
  • Observed responses \(Y_{nj}\) are realized through conditionally independent Bernoulli draws

The resulting synthetic dataset serves multiple purposes:

  • IRT parameter calibration and model validation
  • Computerized adaptive testing algorithm development
  • Item bank diagnostics and exposure control studies
  • Power analysis for test design and sample size planning

This foundation enables rigorous simulation studies that mirror real-world testing conditions while maintaining full control over the data-generating process.


The function below simulates a dichotomous item bank with realistic parameter values, including format-dependent guessing. The resulting table is then standardized by the item_bank()/prepare_user_bank() constructors (renaming columns, filling missing c/d, enforcing types, inferring the model), after which it is ready for item information, ability estimation, or CAT algorithms; generating the person abilities and 0/1 response matrix follows the workflow sketched above.

simulate_item_bank <- function(n_items) {
  items <- data.frame(
    id = sprintf("OP%03d", 1:n_items),
    a = rlnorm(n_items, meanlog = 0.15, sdlog = 0.35),
    b = rnorm(n_items, 0, 1.0),
    c = NA_real_,
    d = 1.0,
    item_format = sample(c("MC4", "MC5", "CR", "TF"), n_items,
                        prob = c(0.5, 0.3, 0.15, 0.05), replace = TRUE),
    stringsAsFactors = FALSE
  )

  # Format-dependent guessing: none for constructed response, theoretical
  # chance levels (plus noise) for multiple choice, 0.50 for true/false
  items$c[items$item_format == "CR"] <- 0.00
  items$c[items$item_format == "MC4"] <- 0.15
  items$c[items$item_format == "MC5"] <- 0.12
  items$c[items$item_format == "TF"] <- 0.50

  mc_indices <- items$item_format %in% c("MC4", "MC5")
  items$c[mc_indices] <- items$c[mc_indices] + rnorm(sum(mc_indices), 0, 0.03)
  items$c[mc_indices] <- pmax(0, pmin(items$c[mc_indices], 0.25))  # keep MC guessing within [0, 0.25]

  items
}

cat_population <- function(bank, thetas,
                           L_max = 25, SE_target = 0.30,
                           estimator = "EAP",
                           theta0 = 0,
                           content_filter = NULL,
                           pi_SH = NULL,
                           D = 1.702,
                           tie_break = "first",
                           progress = TRUE, seed = NULL) {

  if (!is.null(seed)) set.seed(seed)

  bank <- prepare_user_bank(bank)    # use prepared bank (dedupe, attributes)
  N <- length(thetas)

  if (progress) message(sprintf("Running CAT for N = %d examinees...", N))

  # runs: serial loop (parallelize here if run_cat_with_log() is costly)
  runs <- vector("list", N)
  for (n in seq_len(N)) {
    runs[[n]] <- run_cat_with_log(bank, theta_true = thetas[n],
                                  L_max = L_max, SE_target = SE_target,
                                  estimator = estimator, theta0 = theta0,
                                  content_filter = content_filter, pi_SH = pi_SH,
                                  D = D, tie_break = tie_break)
  }

  # Extract summary statistics safely (handle possible missing pieces)
  theta_hat <- vapply(runs, function(x) if (!is.null(x$theta_hat)) x$theta_hat else NA_real_, numeric(1))
  se_hat    <- vapply(runs, function(x) if (!is.null(x$se)) x$se else NA_real_, numeric(1))
  len_vec   <- vapply(runs, function(x) if (!is.null(x$length)) x$length else NA_real_, numeric(1))

  list(
    runs = runs,
    summary = list(
      N = N,
      mean_SE = mean(se_hat, na.rm = TRUE),
      median_SE = median(se_hat, na.rm = TRUE),
      mean_len = mean(len_vec, na.rm = TRUE),
      median_len = median(len_vec, na.rm = TRUE),
      corr_true_hat = suppressWarnings(cor(thetas, theta_hat, use = "pairwise.complete.obs"))
    ),
    theta_true = thetas,
    theta_hat  = theta_hat,
    se = se_hat,
    len = len_vec
  )
}
# info_bank: item Fisher information for 1PL/2PL/3PL/4PL
# - theta: numeric scalar or vector (ability)
# - bank: data.frame with columns a,b,c,d (c,d optional but recommended)
# - D: scaling constant (1.702 typical)
# Returns:
#  - if length(theta)==1: numeric vector length n_items (info per item)
#  - if length(theta)>1: matrix n_items x length(theta) (rows = items)
info_bank <- function(theta, bank, D = 1.702) {
  if (is.list(bank) && !is.null(bank$item_parameters_tbl)) bank <- bank$item_parameters_tbl
  bank <- as.data.frame(bank, stringsAsFactors = FALSE)
  if (!all(c("a","b") %in% names(bank))) stop("bank must contain 'a' and 'b' columns")
  if (!("c" %in% names(bank))) bank$c <- 0
  if (!("d" %in% names(bank))) bank$d <- 1

  a <- as.numeric(bank$a)
  b <- as.numeric(bank$b)
  c <- as.numeric(bank$c)
  d <- as.numeric(bank$d)

  # ensure shapes
  thetas <- as.numeric(theta)
  m <- length(a)
  n_theta <- length(thetas)

  # Logistic kernel: z_ij = D * a_i * (theta_j - b_i), an m x n_theta matrix
  # (rows = items, columns = theta values); b is recycled item-wise inside outer()
  z <- D * outer(a, thetas, FUN = function(ai, th) ai * (th - b))
  z <- pmax(pmin(z, 700), -700)  # clamp before exp() to avoid overflow

  G <- 1 / (1 + exp(-z))               # m x n_theta
  P <- c + (d - c) * G                 # m x n_theta
  Q <- 1 - P

  # derivative dP/dtheta = D * a * (d - c) * G * (1 - G)
  dP <- D * (a * (d - c)) * (G * (1 - G))

  # Fisher information: I = (dP/dtheta)^2 / (P * (1 - P))
  # be robust to extreme P (avoid dividing by 0)
  denom <- P * Q
  denom[denom <= .Machine$double.eps] <- .Machine$double.eps

  info_mat <- (dP^2) / denom

  # if single theta, return numeric vector
  if (n_theta == 1) {
    as.numeric(info_mat[, 1])
  } else {
    info_mat
  }
}


bank <- simulate_item_bank(100)     # your generator
bank <- prepare_user_bank(bank)     # ensure c,d,model types, dedupe, etc.

# single theta
theta_hat <- 0.25
info_vec <- info_bank(theta_hat, bank, D = 1.702)   # numeric vector length 100

# multiple thetas
thetas <- seq(-3, 3, length.out = 31)
info_matrix <- info_bank(thetas, bank)             # 100 x 31 matrix (items × thetas)

# get item with max information for theta=0.25
which.max(info_vec)
## [1] 40
# Placeholder CAT runner used for quick end-to-end checks: it does not select
# items from the bank, but simulates an estimate drifting toward theta_true
# over a random number of steps and returns a standard error near the target.
run_cat_with_log <- function(bank, theta_true, L_max = 25, SE_target = 0.30,
                             estimator = "EAP", theta0 = 0, content_filter = NULL,
                             pi_SH = NULL, D = 1.702, tie_break = "first") {
  steps <- sample(10:25, 1)                       # random test length (ignores L_max)
  theta_hat <- theta0
  for (i in seq_len(steps)) {
    # partial step toward the true ability plus a little noise
    theta_hat <- theta_hat + (theta_true - theta_hat) * runif(1, 0.05, 0.25) + rnorm(1, 0, 0.05)
  }
  se <- max(0.05, SE_target + rnorm(1, 0, 0.02))  # SE hovering around the target
  list(theta_hat = theta_hat, se = se, length = steps, log = NULL)
}
set.seed(7)
J <- 300
bank <- simulate_item_bank(J)
bank_prep <- prepare_user_bank(bank)

# single examinee
theta_true <- rnorm(1, 0, 1)
out <- run_cat_with_log(bank_prep, theta_true)
str(out)
## List of 4
##  $ theta_hat: num -0.263
##  $ se       : num 0.31
##  $ length   : int 25
##  $ log      : NULL
# population
thetas <- rnorm(100)   # smaller N for quick test
pop <- cat_population(bank_prep, thetas, L_max = 25, SE_target = 0.30, estimator = "EAP", tie_break = "random", seed = 123)
pop$summary
## $N
## [1] 100
## 
## $mean_SE
## [1] 0.3005556
## 
## $median_SE
## [1] 0.3024428
## 
## $mean_len
## [1] 17.99
## 
## $median_len
## [1] 18
## 
## $corr_true_hat
## [1] 0.9917245


Information, EAP/MAP, and Precision Gauge

Building on the item bank data structure defined above, this section implements the core measurement utilities that power the adaptive cycle. These functions compute item and test information, estimate ability \((\hat{\theta})\) using Bayesian and likelihood-based methods, and continuously track measurement precision during test administration. Together, they provide the computational bridge between psychometric theory and real-time adaptive decision-making.


Core Functions and Purpose

  1. Item/Test Information Computation
    Calculates Fisher information under 1PL–4PL models, which quantifies how much precision an item contributes at each ability level.

  2. Ability Estimation (EAP/MAP)
    Implements Bayesian Expected A Posteriori (EAP) and Maximum A Posteriori (MAP) estimation using numerical integration (via uniform or Gauss–Hermite quadrature).

  3. Precision Gauge
    Displays real-time updates on the test’s precision progress — showing \(\hat{\theta}\), Fisher-based SE, posterior SE, and the percentage of target information achieved (e.g., SE < 0.30).
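The percentage-of-target computation can be sketched directly from the SE target (hypothetical numbers; this is a sketch of the arithmetic, not the precision_gauge() implementation itself):

se_target   <- 0.30
info_target <- 1 / se_target^2                 # information required to reach the target SE
info_now    <- 8.5                             # hypothetical current test information
se_now      <- 1 / sqrt(info_now)
pct_target  <- 100 * min(1, info_now / info_target)
round(c(SE = se_now, pct_of_target = pct_target), 2)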


Main Components

| Component | Description |
|---|---|
| `inv_logit`, `clip01`, `%\|\|%` | Utility functions for logistic transformations and safe numerical bounds |
| `icc_2pl`, `icc_3pl`, `icc_4pl` | Compute Item Characteristic Curves (ICCs), \(P(Y=1 \mid \theta)\), under each model |
| `info_2pl`, `info_3pl`, `info_4pl` | Implement Fisher Information formulas for 1–4PL models |
| `info_bank()` | Produces an item × θ matrix of item-level information (items in rows, θ values in columns) |
| `test_info()` | Aggregates total test information across administered items |
| `se_from_info()` | Converts information values into standard errors: SE = 1 / √I |
| `make_theta_grid()` | Constructs quadrature grids (uniform or Gauss–Hermite) with weights for Bayesian integration |
| `loglik_dich()` | Computes the log-likelihood for dichotomous responses given item parameters |
| `estimate_theta()` | Performs EAP or MAP estimation with posterior uncertainty |
| `precision_gauge()` | Reports \(\hat{\theta}\), Fisher SE, EAP SE, and % of the target information threshold |
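
For orientation, here is a minimal sketch of how test_info() and se_from_info() might compose with the info_bank() defined earlier (signatures are illustrative; the actual helpers used in these notes may differ):

``` r
# Minimal sketch: total test information and SE from info_bank()
test_info_sketch <- function(theta, items, D = 1.702) {
  info <- info_bank(theta, items, D = D)   # vector (one theta) or items x theta matrix
  if (is.matrix(info)) colSums(info) else sum(info)
}

se_from_info_sketch <- function(info) 1 / sqrt(pmax(info, .Machine$double.eps))

# Example usage with the bank prepared earlier:
# I_T <- test_info_sketch(0.25, bank)
# se_from_info_sketch(I_T)
```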

Logistic Transformation Functions

These functions form the mathematical backbone of logistic regression and Item Response Theory (IRT) models. They define how a linear predictor on the latent (ability) scale maps to probabilities on the observed response scale.

Geometric Intuition

  • The logistic curve is S-shaped and bounded between 0 and 1.

  • It has an inflection point at \(p = 0.5\), corresponding to \(\theta = b\), where the model is most sensitive to changes in ability.

  • The slope of the curve at the inflection point is
    \[ \left.\frac{dP(\theta)}{d\theta}\right|_{\theta = b} = \frac{D \cdot a}{4}. \]

    Thus, higher discrimination \(a\) produces a steeper slope and stronger differentiation around the item’s difficulty.
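
A quick numerical check of this slope identity (the values of `a` and `b` are illustrative):

``` r
# Slope of the 2PL ICC at theta = b should equal D * a / 4
D <- 1.702; a <- 1.4; b <- 0.5
P <- function(theta) plogis(D * a * (theta - b))

h <- 1e-6
slope_numeric <- (P(b + h) - P(b - h)) / (2 * h)  # central difference
slope_closed  <- D * a / 4

c(numeric = slope_numeric, closed_form = slope_closed)
# both are approximately 0.5957
```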


Asymptotic Behavior

\[ \begin{aligned} \lim_{\eta \to -\infty} p(\eta) &= 0, \\[8pt] \lim_{\eta \to +\infty} p(\eta) &= 1. \end{aligned} \]

IRT extensions introduce asymptotes:

  • 3PL: \(p = c + (1-c)\,\mathrm{inv\_logit}(Da(\theta - b))\)
    → lower asymptote \(c\) models guessing
  • 4PL: \(p = c + (d-c)\,\mathrm{inv\_logit}(Da(\theta - b))\)
    → upper asymptote \(d < 1\) models slipping/inattention
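
A minimal sketch of these two response functions, consistent with the formulas above (the `_sketch` names are illustrative; the icc_* helpers listed in the components table above may differ in signature):

``` r
# 3PL / 4PL ICCs matching the formulas above
icc_3pl_sketch <- function(theta, a, b, c, D = 1.702) {
  c + (1 - c) * plogis(D * a * (theta - b))
}
icc_4pl_sketch <- function(theta, a, b, c, d, D = 1.702) {
  c + (d - c) * plogis(D * a * (theta - b))
}

# Asymptotes in action
icc_3pl_sketch(-6, a = 1.2, b = 0, c = 0.2)            # ~0.20 (guessing floor)
icc_4pl_sketch( 6, a = 1.2, b = 0, c = 0.2, d = 0.95)  # ~0.95 (slipping ceiling)
```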

Parameter Effects on the ICC

| Parameter | Symbol | Effect on Curve |
|---|---|---|
| Discrimination | \(a\) | Steeper slope → greater sensitivity |
| Difficulty | \(b\) | Shifts curve horizontally |
| Guessing | \(c\) | Raises lower bound (chance success) |
| Upper asymptote | \(d\) | Lowers upper bound (slipping/inattention) |

Practical Summary

  • The logistic transformation enables modeling of binary outcomes via a linear predictor.
  • In IRT, it defines a smooth, monotonic link between latent ability and response probability.
  • The derivative \(p(1-p)\) drives key computations:
    • Item Information \(I_j(\theta)\)
    • Fisher Scoring / Newton-Raphson estimation
    • Posterior weighting in EAP and MAP estimation

Logistic Function Properties

| Function | Formula | Maps | Purpose |
|---|---|---|---|
| Inverse-logit | \(p(\eta) = \dfrac{1}{1 + e^{-\eta}}\) | \(\mathbb{R} \to (0,1)\) | Converts linear predictor to probability |
| Logit | \(\eta(p) = \log\!\left(\dfrac{p}{1 - p}\right)\) | \((0,1) \to \mathbb{R}\) | Converts probability to linear predictor |
| Derivative | \(\dfrac{dp}{d\eta} = p(1 - p)\) | \(\mathbb{R} \to (0, 0.25]\) | Sensitivity of probability to predictor changes |
| Symmetry | \(p(-\eta) = 1 - p(\eta)\) | | Curve symmetric around \(\eta = 0\) |
| Asymptotes | \(\lim_{\eta \to -\infty} p = 0,\ \lim_{\eta \to +\infty} p = 1\) | | Defines probability bounds |

R Implementation

# ============================================================
# Logistic Transformation: clean, numerically stable version
# ============================================================

# Safe probability clipping (protect qlogis near 0/1)
clip01 <- function(x, eps = 1e-12) pmin(1 - eps, pmax(eps, x))

# Stable logit / inverse-logit
logit     <- function(p) qlogis(clip01(p))  # log-odds
inv_logit <- function(x) plogis(x)          # probability

# Derivative of logistic (dp/dη) = p(1-p)
logistic_deriv <- function(x) {
  p <- inv_logit(x)
  p * (1 - p)
}

# ============================================================
# Demonstration
# ============================================================
# 1) Round-trip test
p <- 0.8
eta <- logit(p)
p_back <- inv_logit(eta)

cat("Logistic Transformation Test:\n")
## Logistic Transformation Test:
cat(sprintf("p = %.3f → η = %.6f → p_back = %.6f\n\n", p, eta, p_back))
## p = 0.800 → η = 1.386294 → p_back = 0.800000
# 2) Edge cases (near 0/1)
test_p <- c(1e-10, 0.25, 0.5, 0.75, 1 - 1e-10)
cat("Edge Cases:\n")
## Edge Cases:
for (v in test_p) {
  eta_v <- logit(v)
  v2    <- inv_logit(eta_v)
  cat(sprintf("p = %.2e → η = %9.6f → p' = %.6f\n", v, eta_v, v2))
}
## p = 1.00e-10 → η = -23.025851 → p' = 0.000000
## p = 2.50e-01 → η = -1.098612 → p' = 0.250000
## p = 5.00e-01 → η =  0.000000 → p' = 0.500000
## p = 7.50e-01 → η =  1.098612 → p' = 0.750000
## p = 1.00e+00 → η = 23.025851 → p' = 1.000000
cat("\n")
# 3) Derivative check at a few η values
eta_vals <- c(-3, -1, 0, 1, 3)
cat("Derivative dp/dη = p(1-p):\n")
## Derivative dp/dη = p(1-p):
for (x in eta_vals) {
  cat(sprintf("η = %3.1f, p = %.6f, dp/dη = %.6f\n", x, inv_logit(x), logistic_deriv(x)))
}
## η = -3.0, p = 0.047426, dp/dη = 0.045177
## η = -1.0, p = 0.268941, dp/dη = 0.196612
## η = 0.0, p = 0.500000, dp/dη = 0.250000
## η = 1.0, p = 0.731059, dp/dη = 0.196612
## η = 3.0, p = 0.952574, dp/dη = 0.045177
cat("\n")
# 4) Symmetry check: p(-η) = 1 - p(η)
cat("Symmetry check: p(-η) = 1 - p(η)\n")
## Symmetry check: p(-η) = 1 - p(η)
for (x in eta_vals) {
  cat(sprintf("η=%3.1f → p(η)=%.6f | p(-η)=%.6f | 1-p(η)=%.6f\n",
              x, inv_logit(x), inv_logit(-x), 1 - inv_logit(x)))
}
## η=-3.0 → p(η)=0.047426 | p(-η)=0.952574 | 1-p(η)=0.952574
## η=-1.0 → p(η)=0.268941 | p(-η)=0.731059 | 1-p(η)=0.731059
## η=0.0 → p(η)=0.500000 | p(-η)=0.500000 | 1-p(η)=0.500000
## η=1.0 → p(η)=0.731059 | p(-η)=0.268941 | 1-p(η)=0.268941
## η=3.0 → p(η)=0.952574 | p(-η)=0.047426 | 1-p(η)=0.047426
cat("\n")
# 5) Quick table (verification)
res <- data.frame(
  eta        = eta_vals,
  p          = inv_logit(eta_vals),
  logit_back = logit(inv_logit(eta_vals)),
  deriv      = logistic_deriv(eta_vals)
)
print(res)
##   eta          p logit_back      deriv
## 1  -3 0.04742587         -3 0.04517666
## 2  -1 0.26894142         -1 0.19661193
## 3   0 0.50000000          0 0.25000000
## 4   1 0.73105858          1 0.19661193
## 5   3 0.95257413          3 0.04517666
# 6) Optional visualization
#    (derivative equals dlogis(η), the standard logistic PDF)
curve(plogis(x), from = -6, to = 6, lwd = 2,
      main = "Logistic Function and Its Derivative",
      xlab = expression(eta), ylab = "p(eta)")
curve(dlogis(x), from = -6, to = 6, add = TRUE, lwd = 2, col = "red")
legend("topleft", legend = c("p(η)", "dp/dη = p(1-p)"),
       lwd = 2, col = c("black","red"), bty = "n")


Key Applications in Psychometrics

The logistic transformation functions serve as the computational engine for modern psychometric modeling, with several critical applications:

IRT Model Specification

Forms the mathematical core of all logistic IRT models (1PL-4PL), providing the link between latent ability \(\theta\) and observed response probabilities:

  • 1PL/Rasch: \(P(\theta) = \mathrm{inv\_logit}(D(\theta - b))\)
  • 2PL: \(P(\theta) = \mathrm{inv\_logit}(D a (\theta - b))\)
  • 3PL: \(P(\theta) = c + (1-c) \cdot \mathrm{inv\_logit}(D a (\theta - b))\)
  • 4PL: \(P(\theta) = c + (d-c) \cdot \mathrm{inv\_logit}(D a (\theta - b))\)

Numerical Stability

Built-in R functions (plogis, qlogis) handle edge cases robustly, preventing numerical overflow and maintaining precision even with extreme probability values near 0 or 1.


Information Computation

The derivative \(p(1-p)\) appears directly in Fisher Information formulas: \[ I_j(\theta) = \frac{\big[P_j'(\theta)\big]^2}{P_j(\theta) \cdot \big[1-P_j(\theta)\big]} = (D \cdot a_j)^2 \cdot P_j(\theta) \cdot \big[1-P_j(\theta)\big] \] for the 2PL case, quantifying measurement precision at each ability level.
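
A quick numerical check of this identity for a single 2PL item (the parameter values are illustrative):

``` r
# Check: [P'(theta)]^2 / (P(1-P)) equals (D*a)^2 * P * (1-P) for the 2PL
D <- 1.702; a <- 1.3; b <- -0.4; theta <- 0.6
P  <- plogis(D * a * (theta - b))
Pp <- D * a * P * (1 - P)            # closed-form derivative of the 2PL ICC

info_ratio  <- Pp^2 / (P * (1 - P))
info_direct <- (D * a)^2 * P * (1 - P)
all.equal(info_ratio, info_direct)   # TRUE
```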

Optimization and Estimation

Enables stable numerical methods for parameter estimation:

  • Fisher Scoring: Uses expected information based on logistic derivatives
  • Newton-Raphson: Leverages the well-behaved curvature of logistic functions
  • EM Algorithms: Exploit the exponential family properties of logistic models

Simulation and Data Generation

Generates realistic response probabilities for synthetic data that maintain proper psychometric properties, enabling:

  • CAT algorithm development and testing
  • Item bank calibration studies
  • Power analysis for test design
  • Methodological research validation

Conceptual Significance

The logistic transformation provides the mathematical foundation that makes modern IRT and adaptive testing both computationally tractable and theoretically sound by:

  • Ensuring probabilities remain bounded between 0 and 1
  • Providing smooth, monotonic relationships between ability and performance
  • Enabling efficient numerical optimization through well-behaved derivatives
  • Maintaining interpretable parameters with clear psychometric meaning
  • Supporting both frequentist and Bayesian estimation frameworks

This elegant mathematical framework bridges the gap between latent trait theory and practical assessment implementation, forming the bedrock of computerized adaptive testing systems worldwide.


Conceptual Notes

  • Information-driven measurement:
    Each item’s contribution to precision is captured through its Fisher Information curve, \(I_j(\theta)\).

  • Bayesian estimation:
    Integrates prior distributions over \(\theta\) for stable ability estimation early in the test or for extreme response patterns.

  • Dynamic precision feedback:
    The precision_gauge() function quantifies progress toward a target precision goal, enabling adaptive termination once the standard error is sufficiently low.
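
To make the gauge concrete, here is a minimal sketch of the kind of report such a function could return (the function below is hypothetical and not the precision_gauge() used elsewhere in these notes):

``` r
# Sketch of a precision-gauge style report
precision_gauge_sketch <- function(theta_hat, info_total, se_target = 0.30) {
  se_fisher   <- 1 / sqrt(max(info_total, .Machine$double.eps))
  info_target <- 1 / se_target^2                      # information needed to reach the SE target
  pct_target  <- min(100, 100 * info_total / info_target)
  data.frame(theta_hat = theta_hat,
             SE = se_fisher,
             pct_of_target_info = round(pct_target, 1))
}

precision_gauge_sketch(theta_hat = 0.42, info_total = 7.5)
# theta_hat = 0.42, SE ~ 0.365, about 67.5% of the target information
```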



A Computerized Adaptive Test (CAT) is a closed-loop decision process that dynamically selects items to maximize measurement precision and efficiency. At each step \(t\), the system maintains an ability estimate \(\hat{\theta}^{(t)}\), selects an item \(i^*\), observes a response \(y_{i^*}\), and updates \(\hat{\theta}^{(t+1)}\).

The core CAT loop consists of four key operations:

  1. Predict the examinee’s current ability \(\hat\theta\)
  2. Select the next item that is most informative (subject to constraints)
  3. Update the estimate with the new response
  4. Stop once precision or length goals are met

This section formalizes the loop; surveys common selection policies (e.g., maximum Fisher information, Kullback–Leibler/MI-based, Bayesian D-optimal); details constraints (content blueprints, exposure, enemy sets, passages); and presents practical stopping and stability safeguards for production CATs.


Item-Selection Criteria

Let \(I_i(\theta)\) denote the Fisher information for item \(i\), and
\(I_T(\theta) = \sum_i I_i(\theta)\) the test information.


1. Myopic Maximum Fisher Information (MFI)

Select the eligible item that maximizes Fisher information at the current estimate \(\hat{\theta}\):

\[ i^* = \arg\max_{i \in E} I_i(\hat{\theta}). \]

  • Pros: Simple, efficient, and yields locally optimal precision
  • Cons: Overly local — tends to oversample near \(\hat{\theta}\); can inflate exposure and narrow content coverage
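
Using the info_bank() helper from earlier, a bare-bones MFI step can be sketched as follows (`mfi_select` and the `administered` bookkeeping are illustrative names):

``` r
# Myopic MFI: pick the not-yet-administered item with maximum information at theta_hat
mfi_select <- function(theta_hat, bank, administered = integer(0)) {
  info <- info_bank(theta_hat, bank)   # one information value per bank row
  if (length(administered)) info[administered] <- -Inf
  which.max(info)                      # row index of the selected item
}

# e.g. mfi_select(0.25, bank, administered = c(12, 40))
```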

2. Bayesian / Posterior-Variance Reduction

Choose the item that minimizes the expected posterior variance (equivalently, maximizes the expected reduction in posterior variance):

\[ i^* = \arg\min_{i \in E} \; \mathbb{E}_{Y_i \mid \hat{\theta}} \big[\mathrm{Var}(\theta \mid \mathbf y, Y_i, i)\big]. \]

Implementation:

  • Evaluate two possible posteriors (for \(Y_i = 0, 1\)) on a θ-grid
  • Compute expected posterior variance reduction
  • Efficient to approximate via Laplace expansion around \(\hat{\theta}\)
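
One grid-based way to score candidates under this criterion is sketched below (`theta_grid`, `w`, and `p_i` are assumed inputs, not objects defined earlier):

``` r
# Expected posterior variance of theta if candidate item i were administered next.
# `theta_grid` is the quadrature grid, `w` the current posterior weights on that grid,
# and `p_i` the candidate's P(Y = 1 | theta) evaluated on the same grid.
expected_post_var <- function(theta_grid, w, p_i) {
  w <- w / sum(w)
  post_var <- function(wt) {              # variance of a normalized grid posterior
    wt <- wt / sum(wt)
    m  <- sum(wt * theta_grid)
    sum(wt * (theta_grid - m)^2)
  }
  p1 <- sum(w * p_i)                      # predictive probability of a correct response
  p1 * post_var(w * p_i) + (1 - p1) * post_var(w * (1 - p_i))
}

# Select i* = argmin over candidates of expected_post_var(grid, w, P_i(grid))
```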

3. Kullback–Leibler (KL) / Mutual Information (MI)

Select the item that maximizes expected information gain about the examinee’s ability:

\[ \begin{aligned} i^* &= \arg\max_{i \in E} \mathbb{E}_{Y_i} \left[ \mathrm{KL}\big(p(\theta \mid \mathbf y, Y_i, i) \,\|\, p(\theta \mid \mathbf y)\big) \right] \\ \\ &= \arg\max_{i \in E} I(\theta; Y_i \mid \mathbf y). \end{aligned} \]

  • Equivalent to maximizing mutual information between response and latent ability
  • Particularly well-suited for Bayesian CAT implementations
  • Smooths selection compared with myopic MFI

4. a-Stratified and b-Blocking Heuristics

Operational strategies to manage exposure and stability:

| Heuristic | Strategy | Benefit |
|---|---|---|
| a-Stratified | Administer low-discrimination items early, high-\(a\) items later | Reduces early-stage overfitting; balances usage |
| b-Blocking | Partition items by difficulty \(b\); select near \(\hat{\theta}\) | Ensures smooth difficulty progression |

These heuristics trade off local efficiency for global balance across a testing program.


5. Randomesque / Stochastic Tie-Breaking

To avoid deterministic selection and overexposure, sample randomly from the top-K most informative items:

\[ i^* \sim \text{TopK}\big(I_i(\hat{\theta})\big), \]

where sampling probabilities may be proportional to \(I_i(\hat{\theta})\).
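
A minimal sketch of this rule, reusing info_bank() from earlier (`randomesque_pick` is an illustrative name):

``` r
# Top-K "randomesque" pick with selection probability proportional to information
randomesque_pick <- function(info, K = 5) {
  top <- order(info, decreasing = TRUE)[seq_len(min(K, length(info)))]
  if (length(top) == 1) return(top)     # guard against sample()'s scalar behavior
  sample(top, 1, prob = info[top])
}

# e.g. randomesque_pick(info_bank(0.25, bank), K = 5)
```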

Tip: Modern adaptive systems often use a hybrid approach:

Constraints → Candidate pool → Acquisition scoring → Top-K randomesque → Exposure gating

Common exposure gates include Sympson–Hetter, Bayesian exposure control, or weighted shadow tests.


Constraints & Eligibility

Let \(E_t\) denote the eligible set of items at step \(t\).
Constraints maintain content balance, fairness, and security.

| Constraint | Purpose |
|---|---|
| Content blueprinting | Maintain target counts by strand, domain, or cognitive level |
| Enemy sets | Exclude items that share a stem or paraphrase each other |
| Passage groups | Enforce entry/exit consistency for passage-based clusters |
| Exposure ceilings | Apply probabilistic exposure control (Sympson–Hetter, β-capping) |
| Time or fatigue constraints | Withhold long/complex items near session end |
| Status flags | Exclude retired, drifted, or field-test items |

Advanced engines: Build a shadow test (integer program) at each step that fully satisfies blueprint and exposure constraints, selecting from that shadow form.
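
As a concrete example of an exposure gate, the sketch below adds a Sympson–Hetter administer-or-block step on top of the mfi_select sketch shown earlier (`pi_SH` echoes the argument name used above; everything else is illustrative):

``` r
# Sympson-Hetter gate: administer a provisionally selected item only with its
# control probability; otherwise block it for this step and reselect.
sh_gate <- function(selected, pi_SH) runif(1) <= pi_SH[selected]

select_with_exposure_control <- function(theta_hat, bank, used, pi_SH = NULL) {
  blocked <- integer(0)
  repeat {
    i_star <- mfi_select(theta_hat, bank, administered = c(used, blocked))
    if (is.null(pi_SH) || sh_gate(i_star, pi_SH)) return(i_star)
    blocked <- c(blocked, i_star)                                  # ineligible this step only
    if (length(used) + length(blocked) >= nrow(bank)) return(i_star)  # pool exhausted
  }
}
```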


Ability Update Step

After administering item \(i^*\) and recording the response \(y_{i^*}\):

(1) Frequentist (MLE / MAP)

Iteratively update using Newton–Raphson or Fisher scoring:

\[ \ell'(\theta) = \sum_i \frac{y_i - P_i(\theta)}{P_i(\theta)[1 - P_i(\theta)]}\,P_i'(\theta). \]

For a Normal prior \(\pi(\theta) = \mathcal{N}(\mu, \sigma^2)\):

\[ \begin{aligned} S_{\text{MAP}}(\theta) &= \ell'(\theta) - \frac{\theta - \mu}{\sigma^2},\\[4pt] I_{\text{MAP}}(\theta) &= I_T(\theta) + \frac{1}{\sigma^2}. \end{aligned} \]

Then apply the update: \[ \theta^{(t+1)} = \theta^{(t)} + \frac{S_{\text{MAP}}(\theta^{(t)})}{I_{\text{MAP}}(\theta^{(t)})}. \]

  • For MLE, omit the prior term
  • For MAP, prior adds regularization, improving stability at extremes
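
A compact Fisher-scoring sketch of this MAP update for 2PL items (illustrative only: `map_step` and the toy `a`, `b`, `y` values are assumptions, not objects defined elsewhere in these notes):

``` r
# One MAP Fisher-scoring step for 2PL items (a, b vectors; y in {0,1})
map_step <- function(theta, a, b, y, mu = 0, sigma2 = 1, D = 1.702) {
  P     <- plogis(D * a * (theta - b))
  Pp    <- D * a * P * (1 - P)                                  # dP/dtheta (2PL)
  score <- sum((y - P) / (P * (1 - P)) * Pp) - (theta - mu) / sigma2
  info  <- sum((D * a)^2 * P * (1 - P)) + 1 / sigma2
  list(theta = theta + score / info, se = 1 / sqrt(info))
}

# Iterate to convergence with toy data
a <- c(1.2, 0.9, 1.5); b <- c(-0.5, 0.3, 1.0); y <- c(1, 1, 0)
th <- 0
for (k in 1:10) th <- map_step(th, a, b, y)$theta
c(theta_map = th, se = map_step(th, a, b, y)$se)
```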

(2) Bayesian (EAP)

Recompute the posterior mean using the updated likelihood \(L_t(\theta)\):

\[ \hat{\theta}_{EAP}^{(t)} = \frac{\displaystyle\int \theta\,L_t(\theta)\,\pi(\theta)\,d\theta} {\displaystyle\int L_t(\theta)\,\pi(\theta)\,d\theta}. \]

Posterior variance provides an updated precision estimate:

\[ SE^{(t)} = \sqrt{\mathrm{Var}(\theta \mid \mathbf y_t)}. \]
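
A minimal grid-based version of these two integrals with a standard-normal prior (a sketch; `eap_grid` is a hypothetical helper, not the estimate_theta() function listed earlier):

``` r
# EAP and posterior SE on a uniform theta grid (2PL items, N(0,1) prior)
eap_grid <- function(a, b, y, D = 1.702,
                     grid = seq(-4, 4, length.out = 161)) {
  loglik <- sapply(grid, function(th) {
    P <- plogis(D * a * (th - b))
    sum(y * log(P) + (1 - y) * log(1 - P))
  })
  w <- exp(loglik - max(loglik)) * dnorm(grid)   # likelihood x prior
  w <- w / sum(w)
  theta_eap <- sum(w * grid)
  list(theta = theta_eap, se = sqrt(sum(w * (grid - theta_eap)^2)))
}

# eap_grid(a = c(1.2, 0.9, 1.5), b = c(-0.5, 0.3, 1.0), y = c(1, 1, 0))
```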


(3) Update Logic Summary

| Step | Action | Outcome |
|---|---|---|
| 1 | Add the new (item, response) pair | Update the likelihood |
| 2 | Re-estimate \(\hat{\theta}\) | MLE / MAP / EAP |
| 3 | Compute \(SE = 1/\sqrt{I_T(\hat{\theta})}\) (or posterior SE) | Stop if \(SE \le \text{target}\) |
| 4 | Update eligibility and constraints | Remove used items; refresh pools |
| 5 | Log diagnostics | \(\hat{\theta}_t,\ SE_t,\ I_T(\hat{\theta}_t)\) |


Summary of Update Equations

| Estimator | Update | SE | Behavior |
|---|---|---|---|
| MLE | \(\hat{\theta} = \hat{\theta}_{t-1} + \ell'(\theta)/I_T(\theta)\) | \(1/\sqrt{I_T(\hat{\theta})}\) | May diverge at extremes |
| MAP | \(\hat{\theta} = \hat{\theta}_{t-1} + \big[\ell'(\theta)-(\theta-\mu)/\sigma^2\big]/\big[I_T(\theta)+1/\sigma^2\big]\) | \(1/\sqrt{I_T+1/\sigma^2}\) | Finite for all response patterns |
| EAP | Posterior mean over grid | Posterior SD | Stable and smooth; slower computationally |


Practical Guidance

  • Use EAP or MAP early in CAT when information is low
  • Switch to MLE near convergence for unbiased point estimates
  • Maintain a running SE or posterior variance for the stopping rule
  • Combine posterior-based selection (KL, MI) with constraint-aware filtering for optimal psychometric and operational performance

Minimal CAT Engine — Pseudo-code

input:
  item_bank, constraints, theta0, prior(μ, σ² | optional), max_items, SE_target, eps

initialize:
  θ̂    ← theta0
  t     ← 0
  used  ← ∅
  E     ← EligibleItems(item_bank, constraints, used)
  # Variance tracker Σ is:
  #   • Bayes/MAP: posterior var
  #   • MLE: 1 / I_T(θ̂)
  Σ     ← (if prior given) σ² else +∞

while not Stop(θ̂, Σ, t; max_items, SE_target):
  # 1) Select next item (respecting constraints/exposure/content)
  i* ← argmax_{i ∈ E} Acquisition(i; θ̂, Σ)
       # e.g., Max Fisher Info at θ̂, KL-information, A/B-optimality, PV-reduction

  # 2) Administer item and observe response
  y  ← Administer(i*)

  # 3) Update estimate (choose ONE)
  # Guardrails used by all methods:
  #   • Clip probabilities: P ← clamp(P, eps, 1-eps)
  #   • Use log-sum-exp for likelihoods/posteriors
  #   • Optional damping: θ̂ ← θ̂ + α·Δ, with α ∈ (0,1]
  #   • If I_T(θ̂) < tiny ⇒ fallback to prior or reduce step

  (a) MLE (Fisher scoring):
      Δ  ← ℓ′(θ̂) / I_T(θ̂)
      θ̂ ← θ̂ + α·Δ
      Σ  ← 1 / I_T(θ̂)

  (b) MAP (Fisher scoring with Gaussian prior N(μ, σ²)):
      Δ  ← [ℓ′(θ̂) − (θ̂−μ)/σ²] / [I_T(θ̂) + 1/σ²]
      θ̂ ← θ̂ + α·Δ
      Σ  ← 1 / [I_T(θ̂) + 1/σ²]

  (c) EAP (grid/quadrature):
      # posterior weights w_post ∝ L(θ) · π(θ) · w_quad (use log-sum-exp)
      # (the Σ on the left is the variance tracker; Σ_k denotes the sum over grid points)
      θ̂ ← E[θ | y]     = Σ_k w_post,k · θ_k
      Σ  ← Var[θ | y]   = Σ_k w_post,k · (θ_k − θ̂)²

  # 4) Bookkeeping & eligibility
  used ← used ∪ {i*}
  E    ← UpdateEligibility(item_bank, constraints, used)
  t    ← t + 1
end while

return:
  θ̂,
  SE = sqrt(Σ),
  report = {administered_items = used, responses, theta_path, SE_path}
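
The pseudo-code above maps nearly line-for-line onto base R. The following self-contained toy (2PL bank, MFI selection, grid-based EAP, SE stopping rule) is a sketch only; `bank2` and the other names are illustrative and unrelated to the objects defined elsewhere in these notes:

``` r
set.seed(42)

# Illustrative 2PL bank
n_items <- 200
bank2   <- data.frame(a = rlnorm(n_items, 0.1, 0.3), b = rnorm(n_items, 0, 1.2))
D       <- 1.702

# Posterior over a uniform theta grid, started at the N(0,1) prior
grid     <- seq(-4, 4, length.out = 161)
log_post <- dnorm(grid, log = TRUE)

theta_true <- 1.0
theta_hat  <- 0; se <- Inf
used <- integer(0); max_items <- 30; se_target <- 0.30

while (length(used) < max_items && se > se_target) {
  # 1) Select: maximum Fisher information at the current estimate (2PL)
  P_hat <- plogis(D * bank2$a * (theta_hat - bank2$b))
  info  <- (D * bank2$a)^2 * P_hat * (1 - P_hat)
  info[used] <- -Inf                      # already administered
  i_star <- which.max(info)

  # 2) Administer: simulate the response from theta_true
  p_true <- plogis(D * bank2$a[i_star] * (theta_true - bank2$b[i_star]))
  y      <- rbinom(1, 1, p_true)

  # 3) Update: fold the item's log-likelihood into the grid posterior (EAP)
  P_grid <- plogis(D * bank2$a[i_star] * (grid - bank2$b[i_star]))
  P_grid <- pmin(pmax(P_grid, 1e-12), 1 - 1e-12)
  log_post <- log_post + y * log(P_grid) + (1 - y) * log(1 - P_grid)
  w <- exp(log_post - max(log_post)); w <- w / sum(w)
  theta_hat <- sum(w * grid)
  se        <- sqrt(sum(w * (grid - theta_hat)^2))

  # 4) Bookkeeping
  used <- c(used, i_star)
}

c(theta_hat = theta_hat, se = se, items_administered = length(used))
```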


Launch the interactive IRT explorer in one click or run it locally.

**Run Locally**

1. Create an empty folder.  
2. Save the code below as *`app.R`* inside it.  
3. Open in *RStudio* → click *Run App*.

**Web Version** 

Click on the image below

*(No R install needed — hosted on shinyapps.io)*

[<img src="/Users/salvadorcastro/Desktop/RCode/CAT/CAT_Explorer/Screenshot.png" alt="IRT Explorer Interface" width="750">](https://castro.shinyapps.io/Information_Explorer/)


---


``` r
# ======================================================================
# CAT Simulator (3PL) with Enhanced Item Pool Analysis
# ======================================================================

#--------------------------------------------
# ---- PACKAGES ----
#--------------------------------------------

library(shiny)        # Web application framework
library(bslib)        # Bootstrap themes for styling
library(ggplot2)      # Data visualization
library(dplyr)        # Data manipulation
library(tidyr)        # Data tidying
library(shape)        # Arrow shapes for plots
library(shinyBS)      # Bootstrap components for Shiny (for tooltips)

#--------------------------------------------
# ---- HELPERS ----
#--------------------------------------------

# Helper functions for safer programming
`%or%` <- function(x, y) {
  if (is.null(x)) y else x
}
clamp <- function(x, lo, hi) pmax(lo, pmin(hi, x))

# Performance optimization - throttle reactive updates
throttle <- function(r, millis) {
  require(shiny)
  force(r)
  force(millis)
  last <- NULL
  lastVal <- NULL
  function() {
    now <- as.numeric(Sys.time()) * 1000
    if (is.null(last) || (now - last > millis)) {
      last <<- now
      lastVal <<- r()
    }
    lastVal
  }
}

# Custom Gauss-Hermite quadrature (replaces statmod dependency).
# The hard-coded nodes/weights below are the 20-point Gauss-Hermite rule; any other
# n falls back to an approximate uniform grid weighted by the standard-normal density.
gauss_hermite_quadrature <- function(n) {
  if (n == 20) {
    nodes <- c(-5.38748089, -4.60368245, -3.94476404, -3.34785457, -2.78880606,
               -2.25497400, -1.73853771, -1.23407622, -0.73747373, -0.24534071,
               0.24534071,  0.73747373,  1.23407622,  1.73853771,  2.25497400,
               2.78880606,  3.34785457,  3.94476404,  4.60368245,  5.38748089)
    weights <- c(2.22939365e-13, 4.39934099e-10, 1.08606937e-07, 7.80255648e-06,
                 2.28338636e-04, 3.24377334e-03, 2.48105209e-02, 1.09017206e-01,
                 2.86675505e-01, 4.62243670e-01, 4.62243670e-01, 2.86675505e-01,
                 1.09017206e-01, 2.48105209e-02, 3.24377334e-03, 2.28338636e-04,
                 7.80255648e-06, 1.08606937e-07, 4.39934099e-10, 2.22939365e-13)
  } else {
    nodes <- seq(-5, 5, length.out = n)
    # weights proportional to exp(-theta^2 / 2): a standard-normal prior on the uniform grid
    weights <- rep(1/n, n) * dnorm(nodes) * sqrt(2 * pi)
  }
  list(nodes = nodes, weights = weights)
}

# Global quadrature grid (compute once)
.GH <- local({
  gh <- gauss_hermite_quadrature(31)
  list(nodes = gh$nodes, weights = gh$weights)
})

# Global theta grid for plotting
.THETA_GRID <- seq(-4, 4, length.out = 301)

# ---- IRT FUNCTIONS SECTION ----
D <- 1.702  # Scaling constant for 3PL model

# Numerical stability helper
clip01 <- function(x, eps = 1e-10) pmin(pmax(x, eps), 1 - eps)

# 3PL Item Response Function (vectorized)
P_3pl <- function(theta, a, b, c) {
  c + (1 - c) / (1 + exp(-D * a * (theta - b)))
}

# 3PL Item Information Function (vectorized, more stable)
Pprime_3pl <- function(theta, a, b, c) {
  P <- P_3pl(theta, a, b, c)
  (D * a / (1 - c)) * (P - c) * (1 - P)
}

# Fisher Information function for the 3-Parameter Logistic (3PL) IRT model
I_3pl <- function(theta, a, b, c) {
  P <- clip01(P_3pl(theta, a, b, c))
  Pp <- Pprime_3pl(theta, a, b, c)
  info <- (Pp^2) / (P * (1 - P))
  ifelse(is.finite(info), info, 0)
}

# Fast EAP (vectorized, log-space, cached GH)
eap_update <- function(resp, items, grid = .GH) {
  if (nrow(resp) == 0) {
    return(list(theta = 0, se = 1, posterior = grid$weights))
  }
  
  nodes <- grid$nodes
  weights <- grid$weights
  
  # Vectorized probability calculation
  P_mat <- vapply(seq_len(nrow(resp)), function(i) {
    row <- items[items$id == resp$id[i], , drop = FALSE]
    clip01(P_3pl(nodes, row$a, row$b, row$c))
  }, numeric(length(nodes)))
  
  # Vectorized likelihood calculation
  Y <- matrix(rep(resp$y, each = length(nodes)), nrow = length(nodes))
  loglik_mat <- Y * log(P_mat) + (1 - Y) * log1p(-P_mat)
  
  log_post <- log(weights) + rowSums(loglik_mat)
  m <- max(log_post)
  w <- exp(log_post - m)
  denom <- sum(w)
  
  if (!is.finite(denom) || denom == 0) {
    return(list(theta = 0, se = 1, posterior = weights))
  }
  
  theta <- sum(w * nodes) / denom
  se <- sqrt(sum((nodes - theta)^2 * w) / denom)
  
  list(theta = clamp(theta, -4, 4), se = se, posterior = w / denom)
}

# More robust reliability calculation
calculate_reliability <- function(tif) {
  if (is.na(tif) || tif <= 0) return(NA_real_)
  tif / (tif + 1)
}

# Generate responses from TRUE theta (not estimate) for realism
generate_response <- function(true_theta_value, a, b, c) {
  p_correct <- P_3pl(true_theta_value, a, b, c)
  as.integer(runif(1) < p_correct)
}

#--------------------------------------------
# ---- ITEM SELECTION ALGORITHMS  ----
#--------------------------------------------

# Select first item based on proximity to initial theta estimate
pick_first_item <- function(pool, theta0 = 0) {
  d <- abs(pool$b - theta0)
  close <- which(d <= .5)
  if (!length(close))
    close <- which.min(d)  # Fallback to closest item
  sub <- pool[close, , drop = FALSE]
  sub$id[which.max(sub$a)]  # Prefer highest discrimination among close items
}

# Enhanced item selection with better fallback
select_next_item <- function(pool, used, th, top_k = 5, I_min = .02) {
  cand <- pool[!(pool$id %in% used), , drop = FALSE]
  if (!nrow(cand)) return(NA_integer_)
  
  # Calculate information more efficiently
  I <- I_3pl(th, cand$a, cand$b, cand$c)
  I[!is.finite(I)] <- 0
  
  # Enhanced fallback to difficulty matching
  if (max(I, na.rm = TRUE) < I_min) {
    # Prefer items with good discrimination near theta
    d <- abs(cand$b - th)
    close_items <- which(d <= 1.0)  # Wider tolerance
    if (length(close_items)) {
      sub <- cand[close_items, , drop = FALSE]
      return(sub$id[which.max(sub$a)])
    }
    return(cand$id[which.min(abs(cand$b - th))])
  }
  
  # Select stochastically from the top-k most informative items (randomesque)
  ord <- order(I, decreasing = TRUE)
  k <- min(top_k, length(ord))
  top <- ord[seq_len(k)]
  ids <- cand$id[top]
  if (length(ids) == 1) return(ids)  # Guard: sample(x, 1) treats a lone scalar as 1:x
  sample(ids, 1, prob = I[top])  # Stochastic selection weighted by information
}

# Maximum Fisher Information selection (deterministic CAT)
select_mfi_item <- function(pool, used, th, I_min = .02) {
  cand <- pool[!(pool$id %in% used), , drop = FALSE]
  if (!nrow(cand))
    return(NA_integer_)
  
  I <- I_3pl(th, cand$a, cand$b, cand$c)
  I[!is.finite(I)] <- 0
  
  # Enhanced fallback to difficulty matching
  if (max(I, na.rm = TRUE) < I_min) {
    d <- abs(cand$b - th)
    close_items <- which(d <= 1.0)
    if (length(close_items)) {
      sub <- cand[close_items, , drop = FALSE]
      return(sub$id[which.max(sub$a)])
    }
    return(cand$id[which.min(abs(cand$b - th))])
  }
  
  cand$id[which.max(I)]  # Always pick most informative item
}

# Random selection (simulates fixed-form test)
select_random_item <- function(pool, used) {
  ids <- setdiff(pool$id, used)
  if (!length(ids))
    return(NA_integer_)
  if (length(ids) == 1) return(ids)  # Guard: sample(x, 1) treats a lone scalar as 1:x
  sample(ids, 1)  # Completely random selection
}

# ===========================================
# ---- UI DEFINITION ----
# ===========================================

ui <- fluidPage(
  theme = bs_theme(
    bootswatch = "flatly",
    version = 5,
    base_font = font_google("Inter")
  ),
  tags$head(tags$style(
    HTML("
:root { 
    --tb-h: 42px; 
}

/* Info icon styling */
.info-icon {
  margin-left: 4px;
  color: #6c757d;
  cursor: pointer;
  font-size: 12px;
}

.info-icon:hover {
  color: #495057;
}

/* Tooltip styling */
.tooltip-inner {
  max-width: 300px;
  padding: 8px 12px;
  font-size: 11px;
  text-align: left;
}

/* Enhanced policy tab performance */
.policy-tabs {
  display: flex;
  gap: 8px;
  width: 100%;
  justify-content: space-between;
}

.policy-tab {
  flex: 1 1 0;
  text-align: center;
  padding: 10px 12px;
  font-size: 11px;
  border: 2px solid #dee2e6;
  background: #f8f9fa;
  color: #495057;
  border-radius: 50px;
  cursor: pointer;
  user-select: none;
  font-weight: 500;
  transition: all 0.2s ease;
  box-shadow: 0 1px 3px rgba(0,0,0,0.1);
}

.policy-tab:hover {
  background: #e9ecef;
  border-color: #adb5bd;
  transform: translateY(-1px);
  box-shadow: 0 2px 4px rgba(0,0,0,0.15);
}

.policy-tab.active {
  background: #007bff;
  border-color: #007bff;
  box-shadow: 0 2px 6px rgba(0, 123, 255, 0.3);
  color: #ffffff;
  font-weight: 600;
}

/* Disable transitions during rapid changes */
.policy-tabs.no-transition .policy-tab {
  transition: none !important;
}

/* Large checkboxes */
#show_traj { 
    padding: 5px 0; 
}

#show_traj .checkbox { 
    margin: 15px 30px 15px 40px !important; 
    display: inline-block !important; 
    vertical-align: middle; 
}

#show_traj .checkbox label { 
    font-size: 12px !important; 
    font-weight: bold !important; 
    margin-left: 15px !important; 
    vertical-align: middle; 
    line-height: 1.2; 
}

#show_traj input[type='checkbox'] { 
    width: 12px !important; 
    height: 12px !important; 
    transform: scale(1.8) !important; 
    margin-right: 12px !important; 
    margin-left: 45px !important; 
    vertical-align: middle; 
    border: 2px solid #666 !important; 
}

#show_traj input[type='checkbox']:checked { 
    background-color: #007bff !important; 
    border-color: #007bff !important; 
}

#show_traj input[type='checkbox']:focus { 
    box-shadow: 0 0 0 2px rgba(0, 123, 255, 0.25) !important; 
}

/* Smaller text */
.compact-text span { 
    font-size: 9px !important; 
}

/* Darker slider labels */
.control-label {
    color: #1a237e !important;
    font-weight: 600 !important;
    font-size: 11px !important;
}

/* Top policy bar */
.policy-bar { 
    display: flex; 
    align-items: center; 
    gap: 12px; 
    border-bottom: 1px solid #e5e7eb; 
    padding-bottom: 8px; 
    margin: 0 0 12px 0; 
}

/* Toolbar row */
.toolbar-row { 
    border-bottom: 1px solid #e5e7eb; 
    padding-bottom: 8px; 
    margin-bottom: 8px; 
}

.toolbar { 
    display: flex; 
    align-items: center; 
    gap: 10px; 
    margin: 0; 
}

.tool-btn-square { 
    width: calc(var(--tb-h) * 2) !important; 
    height: calc(var(--tb-h) * 2) !important; 
    padding: 0; 
    border-radius: 12px; 
    font-size: 24px !important; 
    display: flex; 
    align-items: center; 
    justify-content: center; 
    flex: 0 0 calc(var(--tb-h) * 1.5) !important; 
}

/* SINGLE progress bar - reduced height */
.progress-shell { 
    flex: 1 1 auto; 
    display: flex; 
    align-items: center; 
    justify-content: space-between; 
    gap: 10px; 
    height: calc(var(--tb-h) * 1.8) !important;  /* Reduced height */
    border: 1px solid #e6e8eb; 
    border-radius: 10px; 
    background: #f8f9fa; 
    padding: 12px; 
    min-width: 240px; 
}

.progress-left { 
    display: flex; 
    flex-direction: column; 
    gap: 8px; 
    flex: 1 1 auto; 
    min-width: 0; 
}

.progress-text { 
    font-size: 10px; 
    color: #495057; 
    white-space: normal; 
    overflow: visible; 
    text-overflow: unset; 
    line-height: 1.2; 
    min-width: 0; 
}

.progress-item {
    display: flex;
    flex-direction: column;
    gap: 4px;
}

.progress-info {
    display: flex;
    justify-content: space-between;
    align-items: center;
}

.progress-label {
    font-size: 9px;
    font-weight: 600;
    color: #475569;
    text-transform: uppercase;
    letter-spacing: 0.5px;
}

.progress-value {
    font-size: 9px;
    font-weight: 700;
    color: #334155;
    font-family: 'Monaco', 'Menlo', 'Ubuntu Mono', monospace;
}

.progress-track { 
    height: 6px;
    background: #e9ecef; 
    border-radius: 3px; 
    overflow: hidden; 
}

.progress-fill { 
    height: 100%; 
    border-radius: 3px; 
    transition: width 0.3s ease;
    position: relative;
}

.progress-fill::after {
    content: '';
    position: absolute;
    top: 0;
    left: 0;
    right: 0;
    bottom: 0;
    background: linear-gradient(90deg, transparent, rgba(255,255,255,0.3), transparent);
    animation: shimmer 2s infinite;
}

@keyframes shimmer {
    0% { transform: translateX(-100%); }
    100% { transform: translateX(100%); }
}

.item-progress {
    background: linear-gradient(90deg, #3b82f6, #60a5fa);
}

.precision-progress {
    background: linear-gradient(90deg, #10b981, #34d399);
}

.se-badge { 
    font-size: 18px; 
    font-weight: 700; 
    line-height: 1; 
    padding: 25px 25px; 
    border-radius: 10px; 
    border: 1px solid #dee2e6; 
    background: #fff; 
    color: #2C3E50; 
    min-width: 90px; 
    text-align: center; 
    height: auto; 
    min-height: 44px; 
    display: flex; 
    align-items: center; 
    justify-content: center; 
}

.se-badge.good { 
    color: #0f5132; 
    background: #d1e7dd; 
    border-color: #badbcc; 
}

.se-badge.warn { 
    color: #664d03; 
    background: #fff3cd; 
    border-color: #ffecb5; 
}

.se-badge.bad { 
    color: #842029; 
    background: #f8d7da; 
    border-color: #f5c2c7; 
}

/* Response symbols */
.correct-response { 
    color: #28a745; 
    font-weight: 700; 
}

.incorrect-response { 
    color: #dc3545; 
    font-weight: 700; 
}

.correct-response, .incorrect-response { 
    display: inline-block; 
    width: 100%; 
    text-align: center; 
}

.compact-text { 
    font-size: 11px; 
}

.compact-table { 
    font-size: 10px; 
}

.warning-box { 
    background: #fff3cd; 
    border: 1px solid #ffeaa7; 
    border-radius: 6px; 
    padding: 6px 8px; 
    margin: 4px 0; 
    font-size: 10px; 
}

.color-coded-row { 
    border-left: 4px solid; 
}

/* === ENHANCED STATUS CARD STYLING (CLEANED) === */
.status-card {
    background: linear-gradient(135deg, #ffffff 0%, #f8fafc 100%);
    border: 1px solid #e2e8f0;
    border-radius: 12px;
    padding: 16px;
    box-shadow: 0 4px 6px -1px rgba(0, 0, 0, 0.1), 0 2px 4px -1px rgba(0, 0, 0, 0.06);
    position: relative;
    overflow: hidden;
    margin-bottom: 8px;
}

.status-card::before {
    content: '';
    position: absolute;
    top: 0;
    left: 0;
    right: 0;
    height: 3px;
    background: linear-gradient(90deg, #3b82f6, #8b5cf6, #06b6d4);
}

/* Header */
.status-header {
    display: flex;
    justify-content: space-between;
    align-items: center;
    margin-bottom: 16px;
}

.status-title {
    font-size: 14px;
    font-weight: 700;
    color: #1e293b;
    display: flex;
    align-items: center;
    gap: 6px;
}

.status-title::before {
    content: '📊';
    font-size: 12px;
}

.status-badge {
    padding: 4px 10px;
    border-radius: 20px;
    font-size: 11px;
    font-weight: 600;
    text-transform: uppercase;
    letter-spacing: 0.5px;
}

.status-badge.idle {
    background: #f1f5f9;
    color: #64748b;
    border: 1px solid #e2e8f0;
}

.status-badge.active {
    background: #dbeafe;
    color: #1d4ed8;
    border: 1px solid #bfdbfe;
}

.status-badge.success {
    background: #dcfce7;
    color: #166534;
    border: 1px solid #bbf7d0;
}

.status-badge.warning {
    background: #fef3c7;
    color: #92400e;
    border: 1px solid #fde68a;
}

.metric-card.target-precision {
    background: linear-gradient(135deg, #f0fdf4, #dcfce7);
    border-color: #bbf7d0;
}

/* Metrics Grid with Color Coding */
.metrics-grid {
    display: grid;
    grid-template-columns: 1fr 1fr;
    gap: 10px;
    margin-bottom: 14px;
}  

.metric-card {
    background: white;
    border: 1px solid #f1f5f9;
    border-radius: 8px;
    padding: 10px;
    text-align: center;
    box-shadow: 0 1px 3px rgba(0, 0, 0, 0.05);
    transition: all 0.2s ease;
}

.metric-card:hover {
    transform: translateY(-1px);
    box-shadow: 0 4px 6px rgba(0, 0, 0, 0.07);
}

.metric-card.true-theta {
    background: linear-gradient(135deg, #ecfdf5, #d1fae5);
    border-color: #a7f3d0;
}

.metric-card.theta-hat {
    background: linear-gradient(135deg, #dbeafe, #e0f2fe);
    border-color: #bfdbfe;
}

.metric-card.sem {
    background: linear-gradient(135deg, #fef3c7, #fef7cd);
    border-color: #fde68a;
}

.metric-card.reliability {
    background: linear-gradient(135deg, #dcfce7, #bbf7d0);
    border-color: #86efac;
}

.metric-value {
    font-size: 16px;
    font-weight: 700;
    color: #1e293b;
    font-family: 'Monaco', 'Menlo', 'Ubuntu Mono', monospace;
    margin-bottom: 2px;
}

.metric-label {
    font-size: 10px;
    font-weight: 600;
    color: #64748b;
    text-transform: uppercase;
    letter-spacing: 0.5px;
}

/* Performance Section */
.performance-section {
    background: #f8fafc;
    border-radius: 8px;
    padding: 12px;
    margin-bottom: 12px;
    border: 1px solid #f1f5f9;
}

.performance-grid {
    display: grid;
    grid-template-columns: 1fr 1fr;
    gap: 12px;
}

.performance-metric {
    text-align: center;
}

.performance-value {
    font-size: 14px;
    font-weight: 700;
    color: #1e293b;
    font-family: 'Monaco', 'Menlo', 'Ubuntu Mono', monospace;
    margin-bottom: 2px;
}

.performance-label {
    font-size: 10px;
    font-weight: 600;
    color: #64748b;
    text-transform: uppercase;
    letter-spacing: 0.5px;
}

/* Target Info */
.target-info {
    display: flex;
    justify-content: center;
    align-items: center;
    gap: 6px;
    padding: 8px 12px;
    background: #f0fdf4;
    border: 1px solid #dcfce7;
    border-radius: 6px;
}

.target-label {
    font-size: 11px;
    font-weight: 600;
    color: #166534;
}

.target-value {
    font-size: 11px;
    font-weight: 700;
    color: #166534;
    font-family: 'Monaco', 'Menlo', 'Ubuntu Mono', monospace;
}

/* Status Message */
.status-message {
    text-align: center;
    color: #64748b;
    font-size: 12px;
    margin-bottom: 12px;
    font-style: italic;
}

.status-content {
    padding: 8px 0;
}

/* Boxed controls */
.control-card-pool { 
    border: 1px solid #cfe9cf; 
    border-radius: 10px; 
    background: #edf7ed; 
    padding: 10px; 
    box-shadow: 0 1px 0 rgba(0, 0, 0, .02); 
}

.control-card-opts { 
    border: 1px solid #ddd6fe; 
    border-radius: 10px; 
    background: #f5f3ff; 
    padding: 10px; 
    box-shadow: 0 1px 0 rgba(0, 0, 0, .02); 
}

/* Increase space between individual sliders */
.sidebar-compact .shiny-input-container { 
    margin-top: 5px !important;
    margin-bottom: 5px !important;
}

.tab-content { 
    padding-top: 5px !important; 
}

/* Increase space between sliders within the same box */
.sidebar-compact .form-group {
    margin-bottom: 10px !important;
}

.sidebar-compact hr {
    margin: 20px 0 !important;
}

.nav-tabs { 
    flex-wrap: nowrap !important; 
    font-size: 12px; 
    margin-bottom: 6px; 
}

.nav-tabs .nav-link { 
    padding: 6px 10px; 
}

.table-active { 
    background-color: rgba(0, 0, 0, .05) !important; 
}

/* Comparison tables */
.comparison-table-container { 
    max-height: 300px; 
    overflow-y: auto; 
    margin-top: 10px; 
}

.policy-header { 
    background-color: #f8f9fa !important; 
    font-weight: bold; 
}

.randomesque-col { 
    background-color: #f0f8ff; 
}

.mfi-col { 
    background-color: #f0fff0; 
}

.fixed-col { 
    background-color: #f8f0ff; 
}
")
  )),
  
  titlePanel(div(
    style = "text-align:center; margin-bottom:10px;",
    h4("COMPUTERIZED ADAPTIVE TEST SIMULATOR — 3PL", style = "margin:0; font-size:20px;")
  )),
  
  fluidRow(
    column(
      width = 4,
      style = "padding: 6px;",
      withMathJax(),
      
      # Status
      div(style = "margin-bottom:8px;", uiOutput("status_panel")),
      
      # ---- ITEM POOL CONTROLS (pastel green) ----
      div(
        class = "control-card-pool sidebar-compact",
        sliderInput(
          "pool_n",
          tags$span("Item pool",
                    bsTooltip("pool_n_info", "Number of items in the pool. Larger pools provide more selection options but may reduce targeting efficiency.", 
                              placement = "top")),
          min = 50,
          max = 950,
          value = 300,
          step = 10,
          width = "100%"
        ),
        hr(style = "margin: 8px 0; border-top: 1px solid;"),
        sliderInput(
          "max_items",
          tags$span("Max items",
                    bsTooltip("max_items_info", "Maximum number of items to administer. CAT stops when this limit is reached or target precision is achieved.", 
                              placement = "top")),
          min = 5,
          max = 185,
          value = 50,
          step = 1,
          width = "100%"
        )
      ),
      
      # ---- SIMULATION CONTROLS (pastel violet) ----
      div(
        class = "control-card-opts sidebar-compact",
        style = "margin-top:8px;",
        sliderInput(
          "true_theta",
          tags$span(HTML("True Ability (θ)"),
                    bsTooltip("true_theta_info", "The true underlying ability being estimated. This is unknown in real testing but set here for simulation.", 
                              placement = "top")),
          min = -3,
          max = 3,
          value = 1.5,
          step = 0.1,
          width = "100%"
        ),
        hr(style = "margin: 8px 0; border-top: 1px solid;"),
        sliderInput(
          "target_se",
          tags$span(HTML("Target Standard Error"),
                    bsTooltip("target_se_info", "Desired measurement precision. CAT stops when standard error falls below this threshold.", 
                              placement = "top")),
          min = 0.05,
          max = .65,
          value = 0.30,
          step = 0.05,
          width = "100%"
        ),
        uiOutput("warnings_panel")
      )
    ),
    
    column(
      width = 8,
      style = "padding: 6px;",
      
      # Policy tabs - UPDATED TO PILL STYLE
      div(class = "policy-bar", 
          div(
            class = "policy-tabs",
            div(
              class = "policy-tab active",
              id = "tab-randq",
              "Top-5 Randomesque\n(Stochastic CAT)"
            ),
            div(
              class = "policy-tab",
              id = "tab-mfi", 
              "Maximum Fisher Info\n(Deterministic CAT)"
            ),
            div(
              class = "policy-tab",
              id = "tab-fixed",
              "Random Item Selection\n(Represents Fixed-form)"
            )
          )
      ),
      
      # Toolbar
      div(
        class = "toolbar-row",
        div(
          class = "toolbar",
          actionButton("step", NULL, class = "btn-primary tool-btn-square", icon = icon("play")),
          actionButton(
            "run_all",
            NULL,
            class = "btn-success tool-btn-square",
            icon = icon("angles-right")
          ),
          uiOutput("progress_toolbar"),
          actionButton(
            "reset",
            NULL,
            class = "btn-outline-secondary tool-btn-square",
            icon = icon("rotate-left")
          )
        )
      ),
      
      # Main tabs
      tabsetPanel(
        id = "main_tabs",
        type = "tabs",
        tabPanel(
          tags$span("θ̂ Estimate", 
                    bsTooltip("theta_estimate_info", "Shows convergence of ability estimates to true theta across different selection methods.", 
                              placement = "top")), 
          div(
            class = "card card-compact",
            div(
              class = "card-header card-header-compact",
              h5("Ability Convergence (Selectable Trajectories)", style = "margin:0; font-size:12px;")
            ),
            div(
              class = "card-body card-body-compact",
              div(
                class = "convergence-controls compact-text",
                checkboxGroupInput(
                  "show_traj",
                  NULL,
                  choices = c(
                    "Top-5 Randomesque" = "randomesque",
                    "Maximum Fisher Information" = "mfi",
                    "Fixed-form Test" = "fixed"
                  ),
                  selected = c("randomesque", "mfi", "fixed"),
                  inline = TRUE
                )
              ),
              plotOutput("plot_theta", height = "350px"),
              # REMOVED DUPLICATE CONVERGENCE STATS - already in status panel
              uiOutput("convergence_table")
            )
          )
        ),
        tabPanel(
          tags$span("SEM", 
                    bsTooltip("sem_info", "Standard Error of Measurement comparison across selection methods. Lower values indicate better precision.", 
                              placement = "top")), 
          div(
            class = "card card-compact",
            div(
              class = "card-header card-header-compact",
              h5(
                "Top-5 Randomesque vs Maximum Fisher Information vs Fixed-form Test",
                style = "margin:0; font-size:11px;"
              )
            ),
            div(
              class = "card-body card-body-compact",
              plotOutput("compare_plot_se", height = "350px"),
              uiOutput("comparison_tables")
            )
          )
        ),
        tabPanel(
          tags$span("Information", 
                    bsTooltip("information_info", "Fisher Information functions. Higher information = lower measurement error. TIF = sum of IIFs.", 
                              placement = "top")), 
          div(
            class = "card card-compact",
            div(
              class = "card-header card-header-compact",
              h5("Information Function", style = "margin:0; font-size:12px;")
            ),
            div(
              class = "card-body card-body-compact",
              plotOutput("plot_info", height = "450px"),
              uiOutput("information_table")
            )
          )
        ),
        tabPanel(
          tags$span("Precision", 
                    bsTooltip("precision_info", "Measurement precision metrics: SEM (Standard Error), Reliability (ρ), and Precision (ψ = 1/SEM).", 
                              placement = "top")), 
          div(
            class = "card card-compact",
            div(
              class = "card-header card-header-compact",
              h5("Measurement Precision", style = "margin:0; font-size:12px;")
            ),
            div(
              class = "card-body card-body-compact",
              plotOutput("plot_precision", height = "300px"),
              uiOutput("precision_table")
            )
          )
        ),
        tabPanel(
          tags$span("ICCs", 
                    bsTooltip("iccs_info", "Item Characteristic Curves show probability of correct response as function of ability. 3PL model includes guessing parameter.", 
                              placement = "top")), 
          div(
            class = "card card-compact",
            div(
              class = "card-header card-header-compact",
              h5("Item Characteristic Curves", style = "margin:0; font-size:12px;")
            ),
            div(
              class = "card-body card-body-compact",
              plotOutput("plot_icc", height = "450px"),
              uiOutput("item_table")
            )
          )
        ),
        tabPanel(
          tags$span("Item Quality", 
                    bsTooltip("item_quality_info", "Item pool analysis including parameter distributions, targeting efficiency, and quality metrics.", 
                              placement = "top")), 
          div(
            class = "card card-compact",
            div(
              class = "card-header card-header-compact",
              h5("Item Pool Distribution and Quality Metrics", style = "margin:0; font-size:12px;")
            ),
            div(
              class = "card-body card-body-compact",
              div(
                style = "margin-bottom: 10px; text-align: center;",
                downloadButton("download_pool", "Download Item Pool CSV", 
                               class = "btn-outline-primary btn-sm")
              ),
              plotOutput("plot_pool_distribution", height = "300px"),
              uiOutput("pool_quality_dashboard"),
              plotOutput("plot_targeting_efficiency", height = "250px"),
              plotOutput("plot_parameter_correlations", height = "250px"),
              uiOutput("pool_stats")
            )
          )
        ),
        tabPanel(
          tags$span("Data Table", 
                    bsTooltip("data_table_info", "Complete administration history with all item parameters, responses, and measurement statistics.", 
                              placement = "top")), 
          div(
            class = "card card-compact",
            div(
              class = "card-header card-header-compact",
              h5(uiOutput("policy_title"), style = "margin:0; font-size:12px;")  # Use uiOutput here
            ),
            div(class = "card-body card-body-compact", uiOutput("complete_table"))
          )
        )
      )
    )
  ),
  
  # JS to toggle active policy tabs
  tags$script(
    HTML(
      "
    // Initialize with randomesque active
    document.addEventListener('DOMContentLoaded', function() {
      const tabs = document.querySelectorAll('.policy-tab');
      tabs.forEach(tab => {
        if (tab.id === 'tab-randq') {
          tab.classList.add('active');
        }
      });
    });
    
    // Handle policy tab clicks
    document.addEventListener('click', function(e){
      const id = e.target && e.target.id;
      if(['tab-randq','tab-mfi','tab-fixed'].indexOf(id) >= 0){
        // Update visual state
        const tabs = document.querySelectorAll('.policy-tab');
        tabs.forEach(tab => {
          if (tab.id === id) {
            tab.classList.add('active');
          } else {
            tab.classList.remove('active');
          }
        });
        // Send to Shiny
        Shiny.setInputValue('policy_click', id, {priority: 'event'});
      }
    });
  "
    )
  )
)

# ===========================================
# ---- SERVER LOGIC ----
# ===========================================

server <- function(input, output, session) {
  
  # ---- REACTIVE DATA SECTION ----
  # Reactive item pool generation with quality metrics
  pool <- reactive({
    set.seed(123)
    n <- input$pool_n %or% 300
    
    # Enhanced pool generation with better parameter control
    data.frame(
      id = seq_len(n),
      a = pmin(2.5, rlnorm(n, meanlog = log(1.2), sdlog = 0.3)),  # Better discrimination distribution
      b = rnorm(n, 0, 1.2),  # Slightly wider difficulty range
      c = runif(n, 0, 0.25)  # More realistic guessing parameters
    )
  })
  
  # Policy Method
  output$policy_title <- renderUI({
    paste0("Complete Item Administration Table — ",
           switch(selected_policy(),
                  "mfi" = "Maximum Fisher Information", 
                  "fixed" = "Random (Fixed-form)", 
                  "randomesque" = "Randomesque (Top-5)"))
  })
  
  # ---- STATE MANAGEMENT SECTION ----
  
  # Enhanced State Structure with Historical Tracking
  new_state <- function() {
    list(
      step = 0,
      used = integer(0),
      chosen = data.frame(
        step = integer(),
        id = integer(),
        a = double(),
        b = double(),
        c = double(),
        y = integer(),
        I_at_theta = double(),
        I_at_selection = double()
      ),
      theta_hat = 0,
      se = 1,
      history = data.frame(
        step = integer(),
        theta_est = numeric(),
        se = numeric(),
        total_info = numeric(),
        items_used = I(list())
      ),
      warnings = character(0)
    )
  }
  
  # Maintain separate states for each selection policy
  state <- reactiveVal(new_state())
  state_mfi <- reactiveVal(new_state())
  state_rand <- reactiveVal(new_state())
  
  # ---- POLICY MANAGEMENT SECTION ----
  selected_policy <- reactiveVal("randomesque")
  
  observeEvent(input$policy_click, {
    policy <- switch(input$policy_click,
                     "tab-randq" = "randomesque",
                     "tab-mfi" = "mfi", 
                     "tab-fixed" = "fixed",
                     "randomesque"  # Default case
    )
    
    selected_policy(policy)
    cat("Policy changed to:", policy, "\n")  # Debug output
  })
  
  # ---- ACTIVE STATE ACCESSOR ----
  active_state <- reactive({
    switch(selected_policy(),
           "mfi" = state_mfi(),
           "fixed" = state_rand(),
           state())  # Default to randomesque
  })  
  
  # ---- DATA EXPORT SECTION ----
  output$download_pool <- downloadHandler(
    filename = function() {
      paste0("item_pool_", format(Sys.time(), "%Y%m%d_%H%M%S"), ".csv")
    },
    content = function(file) {
      pl <- pool()
      
      # Calculate additional metrics for the download
      theta_true <- input$true_theta %or% 0
      
      # Add information function values at true theta
      pl$I_at_true_theta <- I_3pl(theta_true, pl$a, pl$b, pl$c)
      
      # Add targeting information
      pl$targeting_error <- abs(pl$b - theta_true)
      pl$targeting_zone <- cut(pl$targeting_error, 
                               breaks = c(0, 0.5, 1.0, 2.0, Inf),
                               labels = c("Excellent (±0.5)", "Good (±1.0)", "Fair (±2.0)", "Poor"),
                               include.lowest = TRUE)
      
      # Add quality classifications
      pl$discrimination_quality <- cut(pl$a,
                                       breaks = c(0, 0.5, 0.8, 1.2, Inf),
                                       labels = c("Poor (<0.5)", "Fair (0.5-0.8)", "Good (0.8-1.2)", "Excellent (>1.2)"),
                                       include.lowest = TRUE)
      
      pl$guessing_quality <- cut(pl$c,
                                 breaks = c(0, 0.1, 0.2, Inf),
                                 labels = c("Low (<0.1)", "Medium (0.1-0.2)", "High (>0.2)"),
                                 include.lowest = TRUE)
      
      # Add administration status if any items have been used
      st <- active_state()
      if (st$step > 0) {
        pl$administered <- ifelse(pl$id %in% st$used, "Yes", "No")
      } else {
        pl$administered <- "No"
      }
      
      # Reorder columns for better readability
      pl <- pl %>% select(
        id, administered, a, discrimination_quality, b, c, guessing_quality,
        I_at_true_theta, targeting_error, targeting_zone,
        everything()
      )
      
      write.csv(pl, file, row.names = FALSE)
    }
  )
  
  # ---- CAT ALGORITHM SECTION ----
  step_once <- function(st,
                        which,
                        pl,
                        theta_true,
                        target_se_slider,
                        max_items) {
    stop_se <- target_se_slider
    
    # Stop if max items reached or target precision achieved
    if (st$step >= max_items || st$se <= stop_se)
      return(st)
    
    # Item selection based on policy
    id <- if (st$step == 0) {
      pick_first_item(pl, 0)
    } else {
      switch(
        which,
        randomesque = select_next_item(
          pl,
          st$used,
          st$theta_hat,
          top_k = 5,
          I_min = .02
        ),
        mfi = select_mfi_item(pl, st$used, st$theta_hat, I_min = .02),
        fixed = select_random_item(pl, st$used)
      )
    }
    
    # Handle empty candidate pool
    if (is.na(id)) {
      st$warnings <- c(st$warnings,
                       sprintf("Step %d: No more items available", st$step))
      return(st)
    }
    
    # Simulate response based on true theta
    row <- pl[pl$id == id, , drop = FALSE]
    y <- generate_response(theta_true, row$a, row$b, row$c)
    
    # Calculate IIF at CURRENT theta for selection
    selection_theta <- st$theta_hat
    I_selection <- I_3pl(selection_theta, row$a, row$b, row$c)
    
    # Update administration record
    st$chosen <- rbind(
      st$chosen,
      data.frame(
        step = st$step + 1,
        id = id,
        a = row$a,
        b = row$b,
        c = row$c,
        y = y,
        I_at_theta = I_selection,
        I_at_selection = I_selection
      )
    )
    st$used <- c(st$used, id)
    
    # Bayesian ability estimation update using new EAP function
    resp <- data.frame(id = st$used, y = st$chosen$y[seq_along(st$used)])
    items_used <- pl[pl$id %in% st$used, ]
    upd <- eap_update(resp, items_used)
    st$theta_hat <- clamp(upd$theta, -4, 4)
    
    # Recalculate ALL IIFs at new theta for consistency
    if (nrow(st$chosen) > 0) {
      st$chosen$I_at_theta <- I_3pl(st$theta_hat, st$chosen$a, st$chosen$b, st$chosen$c)
      st$chosen$I_at_theta[!is.finite(st$chosen$I_at_theta)] <- 0
    }
    
    # Standard error from total information (more robust)
    I_total <- sum(I_3pl(st$theta_hat, items_used$a, items_used$b, items_used$c))
    st$se <- if (I_total > 0) 1 / sqrt(I_total) else 1.0
    
    # Warning for large theta changes
    if (st$step > 0) {
      prev <- tail(st$history$theta_est, 1)
      if (length(prev) && !is.na(prev)) {
        dth <- abs(st$theta_hat - prev)
        if (dth > 1.0) {
          st$warnings <- c(st$warnings,
                           sprintf("Step %d: Large θ change (Δ=%.2f)", st$step + 1, dth))
        }
      }
    }
    
    # Update history with consistent total information
    st$history <- rbind(
      st$history,
      data.frame(
        step = st$step + 1,
        theta_est = st$theta_hat,
        se = st$se,
        total_info = I_total,
        items_used = I(list(st$used))
      )
    )
    st$step <- st$step + 1
    st
  }
  
  # ---- EVENT HANDLERS SECTION ----
  observeEvent(input$reset, {
    state(new_state())
    state_mfi(new_state())
    state_rand(new_state())
  })
  
  observeEvent(input$step, {
    pl <- pool()
    theta_true <- input$true_theta %or% 0
    target_se_slider <- input$target_se %or% .30
    max_items <- input$max_items %or% 25
    
    state(step_once(
      state(),
      "randomesque",
      pl,
      theta_true,
      target_se_slider,
      max_items
    ))
    state_mfi(step_once(
      state_mfi(),
      "mfi",
      pl,
      theta_true,
      target_se_slider,
      max_items
    ))
    state_rand(step_once(
      state_rand(),
      "fixed",
      pl,
      theta_true,
      target_se_slider,
      max_items
    ))
  })
  
  observeEvent(input$run_all, {
    pl <- pool()
    theta_true <- input$true_theta %or% 0
    target_se_slider <- input$target_se %or% .30
    max_items <- input$max_items %or% 25
    guard <- max_items * 3 + 50
    
    for (i in seq_len(guard)) {
      before <- list(A = state(),
                     B = state_mfi(),
                     C = state_rand())
      
      a <- step_once(before$A,
                     "randomesque",
                     pl,
                     theta_true,
                     target_se_slider,
                     max_items)
      b <- step_once(before$B,
                     "mfi",
                     pl,
                     theta_true,
                     target_se_slider,
                     max_items)
      c <- step_once(before$C,
                     "fixed",
                     pl,
                     theta_true,
                     target_se_slider,
                     max_items)
      
      state(a)
      state_mfi(b)
      state_rand(c)
      
      doneA <- (a$step >= max_items) || (a$se <= target_se_slider)
      doneB <- (b$step >= max_items) || (b$se <= target_se_slider)
      doneC <- (c$step >= max_items) || (c$se <= target_se_slider)
      
      if (doneA && doneB && doneC)
        break
    }
  })
  
  # ---- UI OUTPUTS SECTION ----
  
  # Progress toolbar: precision progress bar and SE badge
  output$progress_toolbar <- renderUI({
    st <- active_state()
    target <- input$target_se %or% 0.30
    max_items <- input$max_items %or% 25
    cur_se <- st$se
    
    # Precision progress: 100% when at/below target, scaling down as SEM increases
    if (cur_se <= target) {
      precision_progress_pct <- 100  # Target achieved
    } else {
      # Scale from 100% at target to 0% at worst-case (SEM = 1.0)
      worst_case_se <- 1.0
      precision_progress_pct <- 100 * (1 - (cur_se - target) / (worst_case_se - target))
      precision_progress_pct <- pmax(0, pmin(100, precision_progress_pct))
    }
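    # Example: with target SE = 0.30 and current SE = 0.65,
    # progress = 100 * (1 - (0.65 - 0.30) / (1.0 - 0.30)) = 50%.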
    
    # Calculate efficiency text
    efficiency_text <- ""
    if (st$step > 0) {
      comparisons <- character(0)
      
      mfi_items <- state_mfi()$step
      fixed_items <- state_rand()$step
      current_policy <- selected_policy()
      
      if (current_policy != "mfi" && mfi_items > 0) {
        items_saved_mfi <- mfi_items - st$step
        if (items_saved_mfi != 0) {
          comparisons <- c(comparisons, 
                           sprintf("MFI: %s%d", ifelse(items_saved_mfi > 0, "+", ""), items_saved_mfi))
        }
      }
      
      if (current_policy != "fixed" && fixed_items > 0) {
        items_saved_fixed <- fixed_items - st$step
        if (items_saved_fixed != 0) {
          comparisons <- c(comparisons,
                           sprintf("Fixed: %s%d", ifelse(items_saved_fixed > 0, "+", ""), items_saved_fixed))
        }
      }
      
      if (length(comparisons) > 0) {
        efficiency_text <- paste(comparisons, collapse = " | ")
      }
    }
    
    achieved <- (!is.na(cur_se) && (cur_se <= target))
    hit_max <- st$step >= max_items
    
    badge_class <- if (is.na(cur_se)) {
      "se-badge"
    } else if (achieved) {
      "se-badge good"
    } else if (cur_se <= target + 0.10) {
      "se-badge warn"
    } else {
      "se-badge bad"
    }
    
    base_txt <- if (st$step == 0) {
      "Ready to begin"
    } else {
      if (achieved) {
        sprintf("Target achieved: %d items", st$step)
      } else if (hit_max) {
        sprintf("Max items: %d", st$step)
      } else {
        sprintf("In progress: %d/%d items", st$step, max_items)
      }
    }
    
    div(
      class = "progress-shell",
      div(
        class = "progress-left",
        div(style = "width: 100%;", 
            div(class = "progress-text", base_txt), 
            if (nzchar(efficiency_text)) {
              div(class = "progress-text", style = "font-size: 8px; color: #6c757d; line-height: 1.1;", efficiency_text)
            }),
        # SINGLE progress bar for precision only
        div(class = "progress-item",
            div(class = "progress-info",
                span(class = "progress-label", "Precision Progress"),
                span(class = "progress-value", sprintf("%.0f%%", precision_progress_pct))
            ),
            div(class = "progress-track",
                div(class = "progress-fill precision-progress",
                    style = sprintf("width: %.1f%%", precision_progress_pct)))
        )
      ),
      div(class = badge_class, if (st$step > 0) sprintf("SE %.3f", cur_se) else "SE —")
    )
  })
  
  # Status panel: headline metrics for the currently selected policy
  output$status_panel <- renderUI({
    pol <- selected_policy()
    st <- switch(pol, "mfi" = state_mfi(), "fixed" = state_rand(), state())
    
    # Early return for empty state
    if (st$step == 0) {
      return(div(
        class = "status-card",
        div(class = "status-header",
            div(class = "status-title", paste0("Status — ", 
                                               switch(pol, "mfi" = "Maximum Fisher Information", "fixed" = "Random (Fixed-form)", "Randomesque (Top-5)")))
        ),
        div(class = "status-content",
            div(class = "status-message", "Click 'Next' to begin administration")
        )
      ))
    }
    
    # Calculate metrics
    theta_true <- input$true_theta %or% 0
    tgt <- input$target_se %or% .30
    
    bias <- st$theta_hat - theta_true
    current_se <- st$se
    rmse <- if (st$step > 0) sqrt(mean((st$history$theta_est - theta_true)^2)) else NA
    reliability <- if (!is.na(current_se) && current_se > 0) 1 - current_se^2 else NA
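    # Reliability ≈ 1 - SE² assumes θ is reported on a unit-variance scale
    # (the usual N(0, 1) metric for EAP scoring).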
    
    div(
      class = "status-card",
      
      # Header
      div(class = "status-header",
          div(class = "status-title", paste0("Status — ", 
                                             switch(pol, "mfi" = "MFI", "fixed" = "Fixed-form", "Randomesque")))
      ),
      
      # Core metrics grid
      div(class = "metrics-grid",
          div(class = "metric-card true-theta",
              div(class = "metric-value", sprintf("%.3f", theta_true)),
              div(class = "metric-label", "True θ")
          ),
          div(class = "metric-card theta-hat",
              div(class = "metric-value", sprintf("%.3f", st$theta_hat)),
              div(class = "metric-label", HTML("Estimated θ̂"))
          ),
          div(class = "metric-card reliability",
              div(class = "metric-value", if(!is.na(reliability)) sprintf("%.3f", reliability) else "—"),
              div(class = "metric-label", "Reliability")
          ),
          div(class = "metric-card target-precision",  # REPLACED EMPTY CARD WITH TARGET PRECISION
              div(class = "metric-value", sprintf("%.2f", tgt)),
              div(class = "metric-label", "Target SEM")
          )
      ),
      
      # Performance metrics
      div(class = "performance-section",
          div(class = "performance-grid",
              div(class = "performance-metric",
                  div(class = "performance-value", 
                      style = ifelse(abs(bias) < 0.1, "color: #059669;", 
                                     ifelse(abs(bias) < 0.3, "color: #d97706;", "color: #dc2626;")),
                      sprintf("%+.3f", bias)),
                  div(class = "performance-label", "Bias")
              ),
              div(class = "performance-metric",
                  div(class = "performance-value", 
                      style = ifelse(!is.na(rmse) && rmse < 0.2, "color: #059669;", 
                                     ifelse(!is.na(rmse) && rmse < 0.4, "color: #d97706;", "color: #dc2626;")),
                      if(!is.na(rmse)) sprintf("%.3f", rmse) else "—"),
                  div(class = "performance-label", "RMSE")
              )
          )
      )
    )
  })
  
  
  # Warnings panel
  output$warnings_panel <- renderUI({
    st <- active_state()
    if (length(st$warnings) == 0) return(NULL)
    
    div(class = "warning-box",
        h6("Warnings", style = "margin:0 0 4px 0; font-size:11px;"),
        tags$ul(style = "margin:0; padding-left:15px;",
                lapply(st$warnings, tags$li))
    )
  })
  
  # ---- VISUALIZATION OUTPUTS SECTION ----
  
  # Ability convergence plot
  output$plot_theta <- renderPlot({
    stR <- state()
    stM <- state_mfi()
    stF <- state_rand()
    
    if (stR$step == 0 && stM$step == 0 && stF$step == 0) {
      plot.new()
      text(.5, .5, "No items yet — click Next", col = "#666666", cex = 1.3, font = 2)
      return(invisible())
    }
    
    target_se <- input$target_se %or% .30
    theta_true <- input$true_theta %or% 0
    show <- input$show_traj %or% c("randomesque", "mfi", "fixed")
    showR <- "randomesque" %in% show
    showM <- "mfi" %in% show
    showF <- "fixed" %in% show
    
    stepsR <- stR$history$step
    thsR <- stR$history$theta_est
    sesR <- stR$history$se
    stepsM <- stM$history$step
    thsM <- stM$history$theta_est
    sesM <- stM$history$se
    stepsF <- stF$history$step
    thsF <- stF$history$theta_est
    sesF <- stF$history$se
    
    # Color palette: Darker point boundaries, lighter fills, main line colors
    colors <- list(
      randomesque = list(
        main = "#FF0000",      # Pure Red for lines
        band = "#FFB3B3",      # Light Red for confidence bands
        point_fill = "#FF9999", # Much lighter Red for point fill
        point_border = "#990000" # Much darker Red for point boundary
      ),
      mfi = list(
        main = "#0000FF",      # Pure Blue for lines
        band = "#B3B3FF",      # Light Blue for confidence bands
        point_fill = "#9999FF", # Much lighter Blue for point fill
        point_border = "#000099" # Much darker Blue for point boundary
      ),
      fixed = list(
        main = "#00AA00",      # Pure Green for lines
        band = "#B3FFB3",      # Light Green for confidence bands
        point_fill = "#99FF99", # Much lighter Green for point fill
        point_border = "#006600" # Much darker Green for point boundary
      ),
      reference = list(
        true = "#000000",      # Pure Black
        target = "#FFA500"     # Orange
      )
    )
    
    vals <- c(theta_true)
    if (showR) vals <- c(vals, thsR)
    if (showM) vals <- c(vals, thsM)
    if (showF) vals <- c(vals, thsF)
    
    max_x <- max(c(1, if (showR) stepsR else 0, 
                   if (showM) stepsM else 0, 
                   if (showF) stepsF else 0), na.rm = TRUE)
    span <- diff(range(vals))
    yr <- range(vals) + c(-1, 1) * max(1.0, 0.3 * ifelse(span > 0, span, 1))
    yr[1] <- max(-3.5, yr[1])
    yr[2] <- min(3.5, yr[2])
    
    # Professional plotting parameters
    par(
      mar = c(4, 4.5, 3.5, 2),
      mgp = c(2.5, 0.8, 0),
      bg = "white",
      fg = "#000000",
      col.axis = "#000000",
      col.lab = "#000000",
      col.main = "#000000",
      cex.axis = 0.95,
      cex.lab = 1.1,
      family = "sans",
      lwd = 0.5
    )
    
    # Create main plot frame
    plot(
      NA,
      xlim = c(0, max_x),
      ylim = yr,
      xlab = "Test Step",
      ylab = expression(bold("Ability Estimate" ~ (theta))),
      main = "Ability Estimation Convergence",
      panel.first = {
        # Add professional grid
        grid(
          col = "#e0e0e0", 
          lty = 1, 
          lwd = 1,
          nx = NA, 
          ny = NULL
        )
      },
      bty = "n",
      axes = FALSE,
      font.lab = 2
    )
    
    # Custom professional axes
    axis(1, col = "#808080", col.axis = "#000000", lwd = 1, tck = -0.02)
    axis(2, col = "#808080", col.axis = "#000000", lwd = 1, tck = -0.02)
    box(col = "#808080", lwd = 1)
    
    # Reference lines with high contrast
    abline(h = theta_true, col = colors$reference$true, lwd = 4, lty = 1)
    abline(h = theta_true + target_se, col = colors$reference$target, lwd = 3, lty = 2)
    abline(h = theta_true - target_se, col = colors$reference$target, lwd = 3, lty = 2)
    
    # Initialize legend components
    leg <- character()
    line_cols <- character()
    line_ltys <- numeric()
    line_lwds <- numeric()
    point_pchs <- numeric()
    point_cols <- character()
    point_bg <- character()
    
    # Randomesque (Top-5) - Dark boundaries, light fills
    if (showR && length(stepsR)) {
      # Confidence band
      polygon(
        c(stepsR, rev(stepsR)),
        c(thsR + sesR, rev(thsR - sesR)),
        col = adjustcolor(colors$randomesque$band, 0.4),
        border = NA
      )
      # Main line (bright color)
      lines(
        stepsR, thsR,
        type = "l",  # Remove points from line - we'll add them separately
        lwd = 4,
        col = colors$randomesque$main,
        ljoin = 1
      )
      # Points with dark boundaries and light fills
      points(
        stepsR, thsR,
        pch = 21,
        col = colors$randomesque$point_border, # Dark boundary
        bg = colors$randomesque$point_fill,    # Light fill
        cex = 1.4,
        lwd = 1.2  # Slightly thicker for dark boundaries
      )
      leg <- c(leg, "Randomesque (Top-5)", "±1 SE Band")
      line_cols <- c(line_cols, colors$randomesque$main, colors$randomesque$band)
      line_ltys <- c(line_ltys, 1, NA)
      line_lwds <- c(line_lwds, 4, NA)
      point_pchs <- c(point_pchs, 21, 15)
      point_cols <- c(point_cols, colors$randomesque$point_border, colors$randomesque$band)
      point_bg <- c(point_bg, colors$randomesque$point_fill, colors$randomesque$band)
    }
    
    # MFI - Dark boundaries, light fills
    if (showM && length(stepsM)) {
      polygon(
        c(stepsM, rev(stepsM)),
        c(thsM + sesM, rev(thsM - sesM)),
        col = adjustcolor(colors$mfi$band, 0.4),
        border = NA
      )
      lines(
        stepsM, thsM,
        type = "l",
        lwd = 4,
        col = colors$mfi$main
      )
      points(
        stepsM, thsM,
        pch = 22,
        col = colors$mfi$point_border, # Dark boundary
        bg = colors$mfi$point_fill,    # Light fill
        cex = 1.4,
        lwd = 1.2
      )
      leg <- c(leg, "Maximum Information", "±1 SE Band")
      line_cols <- c(line_cols, colors$mfi$main, colors$mfi$band)
      line_ltys <- c(line_ltys, 1, NA)
      line_lwds <- c(line_lwds, 4, NA)
      point_pchs <- c(point_pchs, 22, 15)
      point_cols <- c(point_cols, colors$mfi$point_border, colors$mfi$band)
      point_bg <- c(point_bg, colors$mfi$point_fill, colors$mfi$band)
    }
    
    # Fixed-form - Dark boundaries, light fills
    if (showF && length(stepsF)) {
      polygon(
        c(stepsF, rev(stepsF)),
        c(thsF + sesF, rev(thsF - sesF)),
        col = adjustcolor(colors$fixed$band, 0.4),
        border = NA
      )
      lines(
        stepsF, thsF,
        type = "l",
        lwd = 4,
        col = colors$fixed$main
      )
      points(
        stepsF, thsF,
        pch = 23,
        col = colors$fixed$point_border, # Dark boundary
        bg = colors$fixed$point_fill,    # Light fill
        cex = 1.4,
        lwd = 1.2
      )
      leg <- c(leg, "Fixed Form", "±1 SE Band")
      line_cols <- c(line_cols, colors$fixed$main, colors$fixed$band)
      line_ltys <- c(line_ltys, 1, NA)
      line_lwds <- c(line_lwds, 4, NA)
      point_pchs <- c(point_pchs, 23, 15)
      point_cols <- c(point_cols, colors$fixed$point_border, colors$fixed$band)
      point_bg <- c(point_bg, colors$fixed$point_fill, colors$fixed$band)
    }
    
    # Add reference lines to legend
    leg <- c(leg, "True Ability", sprintf("Target SE ±%.2f", target_se))
    line_cols <- c(line_cols, colors$reference$true, colors$reference$target)
    line_ltys <- c(line_ltys, 1, 2)
    line_lwds <- c(line_lwds, 4, 3)
    point_pchs <- c(point_pchs, NA, NA)
    point_cols <- c(point_cols, NA, NA)
    point_bg <- c(point_bg, NA, NA)
    
    # High-contrast legend
    legend(
      "bottomright",
      bg = "white",
      box.col = "#000000",
      box.lwd = 2,
      cex = 0.9,
      legend = leg,
      col = line_cols,
      lty = line_ltys,
      lwd = line_lwds,
      pch = point_pchs,
      pt.bg = point_bg,
      pt.cex = 1.2,
      pt.lwd = 1.2,
      ncol = 2,
      x.intersp = 1.0,
      y.intersp = 1.2,
      text.col = "#000000",
      seg.len = 2.5
    )
  })
  
  # SE comparison plot
  output$compare_plot_se <- renderPlot({
    stR <- state()
    stM <- state_mfi()
    stF <- state_rand()
    
    if (stR$step == 0 && stM$step == 0 && stF$step == 0) {
      plot.new()
      text(.5, .5, "No data yet — click Next", col = "#666666", cex = 1.3, font = 2)
      return(invisible())
    }
    
    target <- input$target_se %or% .30
    max_x <- max(c(1, stR$history$step, stM$history$step, stF$history$step), na.rm = TRUE)
    
    # Color palette: Darker point boundaries, lighter fills, main line colors
    colors <- list(
      randomesque = list(
        main = "#FF0000",
        point_fill = "#FF9999", # Much lighter Red
        point_border = "#990000" # Much darker Red
      ),
      mfi = list(
        main = "#0000FF",
        point_fill = "#9999FF", # Much lighter Blue
        point_border = "#000099" # Much darker Blue
      ),
      fixed = list(
        main = "#00AA00",
        point_fill = "#99FF99", # Much lighter Green
        point_border = "#006600" # Much darker Green
      ),
      target = "#FFA500"
    )
    
    # Professional plotting parameters
    par(
      mar = c(4, 4.5, 3.5, 2),
      mgp = c(2.5, 0.8, 0),
      bg = "white",
      fg = "#000000",
      col.axis = "#000000",
      col.lab = "#000000",
      col.main = "#000000",
      cex.axis = 0.95,
      cex.lab = 1.1,
      family = "sans"
    )
    
    all_se <- c(stR$history$se, stM$history$se, stF$history$se, target, 1)
    yr <- range(all_se, na.rm = TRUE)
    yr[1] <- 0
    yr[2] <- min(yr[2], 1.2)
    
    # Create high-contrast plot frame
    plot(
      NA,
      xlim = c(0, max_x),
      ylim = yr,
      xlab = "Test Step",
      ylab = expression(bold("Standard Error" ~ (SE(theta)))),
      main = "Measurement Precision Over Time",
      panel.first = {
        grid(col = "#e0e0e0", lty = 1, lwd = 1, nx = NA, ny = NULL)
      },
      bty = "n",
      axes = FALSE,
      font.lab = 2
    )
    
    # Custom high-contrast axes
    axis(1, col = "#808080", col.axis = "#000000", lwd = 1, tck = -0.02)
    axis(2, col = "#808080", col.axis = "#000000", lwd = 1, tck = -0.02)
    box(col = "#808080", lwd = 1)
    
    # Plot lines with separate points (dark boundaries, light fills)
    if (nrow(stR$history)) {
      lines(
        stR$history$step, stR$history$se,
        lwd = 4, col = colors$randomesque$main, type = "l"
      )
      points(
        stR$history$step, stR$history$se,
        pch = 21, col = colors$randomesque$point_border,
        bg = colors$randomesque$point_fill, cex = 1.4, lwd = 1.2
      )
    }
    if (nrow(stM$history)) {
      lines(
        stM$history$step, stM$history$se,
        lwd = 4, col = colors$mfi$main, type = "l"
      )
      points(
        stM$history$step, stM$history$se,
        pch = 22, col = colors$mfi$point_border,
        bg = colors$mfi$point_fill, cex = 1.4, lwd = 1.2
      )
    }
    if (nrow(stF$history)) {
      lines(
        stF$history$step, stF$history$se,
        lwd = 4, col = colors$fixed$main, type = "l"
      )
      points(
        stF$history$step, stF$history$se,
        pch = 23, col = colors$fixed$point_border,
        bg = colors$fixed$point_fill, cex = 1.4, lwd = 1.2
      )
    }
    
    # Target line with high contrast
    abline(h = target, col = colors$target, lty = 2, lwd = 3)
    
    # High-contrast legend with dark boundaries and light fills
    legend(
      "topright",
      bg = "white",
      box.col = "#000000",
      box.lwd = 2,
      cex = 0.9,
      inset = 0.02,
      legend = c(
        "Randomesque (Top-5)",
        "Maximum Information", 
        "Fixed Form",
        sprintf("Target SE = %.2f", target)
      ),
      col = c(colors$randomesque$point_border, colors$mfi$point_border, colors$fixed$point_border, colors$target),
      lty = c(1, 1, 1, 2),
      lwd = c(4, 4, 4, 3),
      pch = c(21, 22, 23, NA),
      pt.bg = c(colors$randomesque$point_fill, colors$mfi$point_fill, colors$fixed$point_fill, NA),
      pt.lwd = 1.2,
      text.col = "#000000",
      seg.len = 2.5
    )
  })
  
  # Information function plot
  output$plot_info <- renderPlot({
    st <- active_state()
    pl <- pool()
    
    if (st$step == 0) {
      plot.new()
      text(.5,
           .5,
           "No items administered yet\nClick 'Next' to begin",
           col = "gray50")
      return(invisible())
    }
    
    used <- pl[pl$id %in% st$used, ]
    theta <- seq(-4, 4, by = .01)
    current_theta <- st$theta_hat
    
    # Calculate information functions
    I_mat <- sapply(1:nrow(used), function(i)
      I_3pl(theta, used$a[i], used$b[i], used$c[i]))
    if (is.null(dim(I_mat)))
      I_mat <- matrix(I_mat, ncol = 1)
    
    I_sum <- rowSums(I_mat)
    I_indiv <- I_mat
    
    # Calculate CURRENT information at final theta
    current_iifs <- I_3pl(current_theta, used$a, used$b, used$c)
    current_tif <- sum(current_iifs)
    
    # Reconstruct each administered item's historical position (the θ̂ recorded at its step and the IIF at that θ̂)
    historical_points <- list()
    if (nrow(st$chosen) > 0) {
      for (i in 1:nrow(st$chosen)) {
        item_id <- st$chosen$id[i]
        selection_step <- st$chosen$step[i]
        
        # θ̂ recorded at the step this item was administered
        historical_theta <- st$history$theta_est[st$history$step == selection_step]
        
        # Get the item parameters
        item <- used[used$id == item_id, ]
        
        # Calculate what the IIF was AT THAT HISTORICAL THETA
        historical_iif <- I_3pl(historical_theta, item$a, item$b, item$c)
        
        historical_points[[i]] <- list(
          item_id = item_id,
          historical_theta = historical_theta,
          historical_iif = historical_iif,
          current_iif = current_iifs[which(used$id == item_id)]
        )
      }
    }
    
    # Separate y-axis ranges for the TIF (left) and IIF (right) scales
    tif_range <- range(I_sum, na.rm = TRUE)
    iif_range <- range(I_indiv, na.rm = TRUE)
    
    # Ensure reasonable ranges
    tif_ylim <- c(0, max(3, tif_range[2] * 1.1))
    iif_ylim <- c(0, max(1, iif_range[2] * 1.1))
    
    # Create transformation function between TIF and IIF scales
    transform_iif_to_tif <- function(iif_vals) {
      (iif_vals - iif_ylim[1]) / (iif_ylim[2] - iif_ylim[1]) *
        (tif_ylim[2] - tif_ylim[1]) + tif_ylim[1]
    }
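    # Linear rescaling from the IIF range onto the TIF range so individual item
    # curves can share the panel and be read off the right-hand axis.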
    
    # Create plot with TIF on left axis
    par(
      mar = c(3, 3, 1, 3),
      mgp = c(1.8, .5, 0),
      bg = "white",
      cex.axis = .7,
      cex.lab = .8
    )
    plot(
      theta,
      I_sum,
      type = "l",
      lwd = 4,
      col = "black",
      xlab = "Ability (θ)",
      ylab = "Test Information (TIF)",
      ylim = tif_ylim
    )
    
    # Add right axis with proper IIF scale
    right_ticks <- pretty(iif_ylim, 5)
    axis(
      4,
      at = transform_iif_to_tif(right_ticks),
      labels = right_ticks,
      cex.axis = .7
    )
    mtext(
      "Item Information (IIF)",
      side = 4,
      line = 1.8,
      cex = .8
    )
    
    # Plot individual information functions (scaled to TIF axis)
    cols <- rainbow(ncol(I_indiv))
    for (j in seq_len(ncol(I_indiv))) {
      I_scaled <- transform_iif_to_tif(I_indiv[, j])
      lines(theta,
            I_scaled,
            col = adjustcolor(cols[j], .6),
            lwd = 1.5)
    }
    
    # Mark true theta
    abline(v = input$true_theta,
           col = "#E74C3C",
           lwd = 1.5)
    
    # Mark current theta
    abline(
      v = current_theta,
      col = "#2C3E50",
      lty = 1,
      lwd = 1
    )
    points(
      current_theta,
      current_tif,
      pch = 21,
      bg = "#2C3E50",
      col = "white",
      cex = 1.2,
      lwd = 2
    )
    
    # Mark CURRENT IIFs at current theta - SOLID points (scaled to TIF axis)
    current_iifs_scaled <- transform_iif_to_tif(current_iifs)
    for (j in seq_along(current_iifs_scaled)) {
      points(
        current_theta,
        current_iifs_scaled[j],
        pch = 21,
        bg = cols[j],
        col = "white",
        cex = 0.8,
        lwd = 1
      )
    }
    
    # Add HOLLOW points at HISTORICAL positions (scaled to TIF axis)
    if (length(historical_points) > 0) {
      for (i in seq_along(historical_points)) {
        hist <- historical_points[[i]]
        item_idx <- which(used$id == hist$item_id)
        historical_scaled <- transform_iif_to_tif(hist$historical_iif)
        
        # HOLLOW CIRCLE at historical theta position
        points(
          hist$historical_theta,
          historical_scaled,
          pch = 1,
          col = cols[item_idx],
          cex = 1.0,
          lwd = 1.5
        )
        
        # Add response indicator (correct/incorrect) at historical position
        text(
          hist$historical_theta,
          historical_scaled,
          labels = ifelse(st$chosen$y[i] == 1, "✓", "✗"),
          col = ifelse(st$chosen$y[i] == 1, "#28a745", "#dc3545"),
          cex = 0.7,
          pos = 3,  # Position above the point
          offset = 0.5
        )
        
        # For the LAST item only, add connecting line to show the change
        if (i == length(historical_points)) {
          segments(
            x0 = hist$historical_theta,
            y0 = historical_scaled,
            x1 = current_theta,
            y1 = current_iifs_scaled[item_idx],
            col = adjustcolor(cols[item_idx], 0.4),
            lty = 2,
            lwd = 1
          )
        }
      }
    }
    
    # Arrow from the current θ̂ across to the left axis at the current TIF value
    shape::Arrows(
      x0 = current_theta,
      y0 = current_tif,
      x1 = min(theta) * 1.05,
      y1 = current_tif,
      arr.length = .4,
      arr.type = "curved",
      lty = 1,
      lwd = 1,
      col = adjustcolor("black", .85)
    )
    
    # Add TIF value as a tick label on left axis
    axis(2,
         at = current_tif,
         labels = sprintf(" %.2f ", current_tif),
         col.axis = "black",
         cex.axis = .8,
         tck = 0,
         line = -0.5,
         las = 1
    )
    
    # Arrow from the last administered item's current IIF across to the right (IIF) axis
    if (length(historical_points) > 0) {
      last_item <- historical_points[[length(historical_points)]]
      last_item_idx <- which(used$id == last_item$item_id)
      last_item_current_iif <- last_item$current_iif
      last_item_current_iif_scaled <- current_iifs_scaled[last_item_idx]
      
      # Draw the arrow at the last item's current IIF, rescaled onto the TIF axis
      shape::Arrows(
        x0 = current_theta,
        y0 = last_item_current_iif_scaled,
        x1 = max(theta) * 1.05,
        y1 = last_item_current_iif_scaled,
        arr.length = .4,
        arr.type = "curved",
        lty = 1,
        lwd = 1,
        col = adjustcolor(cols[last_item_idx], .85)
      )
      
      # Add IIF value as a tick label on right axis
      axis(4,
           at = last_item_current_iif_scaled,
           labels = sprintf(" %.2f", last_item_current_iif),
           col.axis = cols[last_item_idx],
           cex.axis = 0.8,
           tck = 0,
           line = -0.5,
           las = 1
      )
    }
    
    # Enhanced legend
    legend(
      "topright",
      bg = "white",
      cex = 1,
      inset = .01,
      legend = c(
        "TIF (left axis)",
        "IIFs (right axis)", 
        "Current θ̂",
        "Current IIF",
        "Historical IIF",
        "Correct response",
        "Incorrect response"
      ),
      col = c(
        "black",
        "#3498DB",
        "#2C3E50", 
        "#E74C3C",
        if (length(historical_points) > 0)
          cols[length(historical_points)]
        else
          "#E74C3C",
        "#28a745",
        "#dc3545"
      ),
      lty = c(1, 1, 1, 1, 1, NA, NA),
      lwd = c(4, 2, 1, 2, 2, NA, NA),
      pch = c(NA, NA, NA, 19, 1, -0x2713, -0x2717),
      pt.cex = c(NA, NA, NA, 1, 1, 1.2, 1.2)
    )
    
    title(
      main = sprintf(
        "Information Functions (%d Items)",
        st$step
      ),
      cex.main = 0.9,
      line = 0.5
    )
  })
  
  # Enhanced measurement precision plot (SEM and Reliability only)
  output$plot_precision <- renderPlot({
    st <- active_state()
    pl <- pool()
    
    if (st$step == 0) { 
      plot.new()
      text(.5, .5, "No items administered yet\nClick 'Next' to begin", col = "gray50")
      return(invisible()) 
    }
    
    used <- pl[pl$id %in% st$used, ]
    theta <- seq(-3, 3, by = .01)
    
    # Accumulate the test information function across administered items (3PL)
    I_sum <- rep(0, length(theta))
    
    for(i in 1:nrow(used)) {
      a <- used$a[i]
      b <- used$b[i]
      c <- used$c[i]
      
      # Calculate probability of correct response
      p <- c + (1 - c) / (1 + exp(-D * a * (theta - b)))
      
      # Calculate item information for 3PL model
      Q <- 1 - p
      info <- (D * a)^2 * Q * ((p - c)^2 / (p * (1 - c)^2))
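      # Equivalent closed form: I_i(θ) = (D·a)² · (Q/P) · ((P - c) / (1 - c))²,
      # which reduces to (D·a)² · P · Q when c = 0 (the 2PL case).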
      
      I_sum <- I_sum + info
    }
    
    # Calculate SEM and Reliability curves
    SE <- 1 / sqrt(I_sum)
    # Cap SEM at a reasonable maximum and handle numerical instability
    SE[SE > 3] <- 3
    SE[I_sum < .Machine$double.eps] <- 3
    SE[!is.finite(SE)] <- 3
    
    # Calculate reliability from SEM: ρ = 1 - SEM²
    Rho <- 1 - SE^2
    # Ensure reliability stays in valid range [0,1]
    Rho <- pmax(0, pmin(1, Rho))
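    # ρ(θ) = 1 - SEM(θ)² is the conditional reliability implied by the test
    # information function, again assuming a unit-variance θ scale.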
    
    target <- input$target_se %or% .30
    
    # Use the current theta from state
    current_theta <- st$theta_hat
    
    # Get historical theta estimates and SEM values
    historical_theta <- st$history$theta_est
    historical_se <- st$history$se
    historical_rho <- 1 - historical_se^2
    
    # Find the index closest to current theta in the theta grid
    theta_idx <- which.min(abs(theta - current_theta))
    
    # Get the actual values FROM THE CURVES at current theta position
    plot_se_at_theta <- SE[theta_idx]
    plot_rho_at_theta <- Rho[theta_idx]
    
    # Use state SEM for consistency with other displays
    state_se <- st$se
    state_rho <- 1 - state_se^2
    
    # Calculate ymax with reasonable bounds for SEM
    ymax <- max(SE, target * 1.05, state_se * 1.1, na.rm = TRUE)
    ymax <- min(ymax, 3)  # Don't let ymax exceed 3
    ymax <- max(ymax, target * 1.5, 0.5)  # Ensure reasonable minimum
    
    par(mar = c(3, 3, 1, 3), mgp = c(1.8, .5, 0), bg = "white", cex.axis = .7, cex.lab = .8)
    
    # Plot SEM curve with proper ylim
    plot(theta, SE, type = "l", lwd = 4, col = "#E74C3C", 
         xlab = "", ylab = "SEM",
         ylim = c(0, ymax), col.lab = "#E74C3C",
         yaxt = "n")  # Suppress default y-axis
    
    # Add X-axis label with custom color
    title(xlab = "Ability (θ)", col.lab = "#2C3E50", line = 2, cex.lab = 0.9)
    
    # Create custom y-axis for SEM with only the values we want
    sem_ticks <- pretty(c(0, ymax), 3)
    axis(2, at = sem_ticks, col = "#E74C3C", col.axis = "#E74C3C", cex.axis = 0.8)
    
    # Add target SE as a special tick mark (only if not already in sem_ticks)
    if (!target %in% sem_ticks) {
      axis(2, at = target, labels = sprintf("SE=%.2f ", target), 
           col.axis = "#E74C3C", col.ticks = "#E74C3C", 
           cex.axis = 0.9, tck = -0.03, line = -0.5, las = 1)
    }
    
    # Add state SEM as a special tick mark (only if not already in sem_ticks and not too close to target)
    if (!state_se %in% sem_ticks && abs(state_se - target) > 0.05) {
      axis(2, at = state_se, labels = sprintf("SE=%.2f", state_se), 
           col.axis = "#E74C3C", col.ticks = "#E74C3C", 
           cex.axis = 0.7, tck = -0.03, line = -0.5, las = 1)
    }
    
    # Mark current theta position
    abline(v = current_theta, col = "#2C3E50", lty = 1, lwd = 1)
    
    # HOLLOW CIRCLES for historical estimates (SEM)
    if (length(historical_theta) > 1) {
      # All but the current estimate - using HOLLOW circles with colored borders
      points(historical_theta[1:(length(historical_theta)-1)], 
             historical_se[1:(length(historical_se)-1)], 
             pch = 21, bg = "white", col = adjustcolor("#E74C3C", 0.2), cex = 1.2, lwd = 1.5)
    }
    
    # SOLID CIRCLE for final current estimate (SEM)
    points(current_theta, state_se, 
           pch = 21, bg = "#E74C3C", col = "white", cex = 1.5, lwd = 2)
    
    # Arrow pointing to STATE SEM value
    shape::Arrows(x0 = current_theta, y0 = state_se, 
                  x1 = min(theta) * 1.05, y1 = state_se,
                  arr.length = .4, arr.type = "curved", 
                  lty = 1, lwd = 1, col = adjustcolor("#E74C3C", .85))
    
    par(new = TRUE)
    
    # Plot Reliability curve (calculated from SEM: ρ = 1 - SEM²)
    plot(theta, Rho, type = "l", lwd = 4, col = "#27AE60", 
         axes = FALSE, xlab = "", ylab = "", ylim = c(0, 1))
    
    # Create custom y-axis for reliability
    rho_ticks <- pretty(c(0, 1), 3)
    axis(4, at = rho_ticks, col = "#27AE60", col.axis = "#27AE60", cex.axis = 0.8)
    mtext("Reliability (ρ)", side = 4, line = 1.8, col = "#27AE60", cex = .8)
    
    # Add state reliability as a special tick mark (only if not already in rho_ticks)
    if (!state_rho %in% rho_ticks) {
      axis(4, at = state_rho, labels = sprintf(" ρ = %.2f", state_rho), 
           col.axis = "#27AE60", col.ticks = "#27AE60", 
           cex.axis = 0.9, tck = -0.03, line = -0.5, las = 1)
    }
    
    # HOLLOW CIRCLES for historical estimates (Reliability)
    if (length(historical_theta) > 1) {
      # All but the current estimate - using HOLLOW circles with colored borders
      points(historical_theta[1:(length(historical_theta)-1)], 
             historical_rho[1:(length(historical_rho)-1)], 
             pch = 21, bg = "white", col = adjustcolor("#27AE60", 0.2), cex = 1.2, lwd = 1.5)
    }
    
    # SOLID CIRCLE for final current estimate (Reliability)
    points(current_theta, state_rho, 
           pch = 21, bg = "#27AE60", col = "white", cex = 1.5, lwd = 2)
    
    # Arrow pointing to STATE reliability value
    shape::Arrows(x0 = current_theta, y0 = state_rho, 
                  x1 = max(theta) * 1.04, y1 = state_rho,
                  arr.length = .4, arr.type = "curved", 
                  lty = 1, lwd = 1, col = adjustcolor("#27AE60", .85))
    
    legend("topleft", bg = "white", cex = 0.8, inset = .01,
           legend = c("SEM Curve", "Reliability Curve", "Current θ", 
                      "Historical Estimates", "Final Estimate"),
           col = c("#E74C3C", "#27AE60", "#2C3E50", "#E74C3C", "#E74C3C"), 
           lty = c(1, 1, 1, NA, NA), lwd = c(4, 4, 1, NA, NA), 
           pch = c(NA, NA, NA, 21, 21),
           pt.bg = c(NA, NA, NA, "white", "#E74C3C"))
    
    # Use STATE values in title
    title(main = sprintf("Measurement Precision (θ̂ = %.2f, SEM = %.3f, ρ = %.3f)", 
                         current_theta, state_se, state_rho), 
          cex.main = 0.8, line = 0.5)
  })
  
  # Item Characteristic Curves plot
  output$plot_icc <- renderPlot({
    st <- active_state()
    pl <- pool()
    
    if (st$step == 0) {
      plot.new()
      text(.5,
           .5,
           "No items administered yet\nClick 'Next' to begin",
           col = "gray50")
      return(invisible())
    }
    
    used <- pl[match(st$used, pl$id), ]  # rows in administration order, aligned with st$chosen
    theta <- seq(-4, 4, by = .01)
    cols <- rainbow(nrow(used))
    
    par(
      mar = c(3, 3, 1, 1),
      mgp = c(1.8, .5, 0),
      bg = "white",
      cex.axis = .7,
      cex.lab = .8
    )
    
    plot(
      theta,
      P_3pl(theta, used$a[1], used$b[1], used$c[1]),
      type = "l",
      lwd = 2,
      col = cols[1],
      xlab = "Ability (θ)",
      ylab = "P(θ)",
      ylim = c(0, 1)
    )
    
    if (nrow(used) > 1) {
      for (i in 2:nrow(used)) {
        lines(
          theta,
          P_3pl(theta, used$a[i], used$b[i], used$c[i]),
          lwd = 1.5,
          col = cols[i]
        )
      }
    }
    
    abline(
      v = st$theta_hat,
      col = "#2C3E50",
      lty = 2,
      lwd = 1.5
    )
    abline(v = input$true_theta,
           col = "#E74C3C",
           lwd = 1.5)
    
    if (nrow(used) > 0) {
      for (i in 1:nrow(used)) {
        step_when_administered <- st$chosen$step[i]
        theta_at_admin <- st$history$theta_est[st$history$step == step_when_administered]
        current_theta <- st$theta_hat
        
        # Calculate probabilities at both theta positions
        prob_at_admin <- P_3pl(theta_at_admin, used$a[i], used$b[i], used$c[i])
        prob_at_current <- P_3pl(current_theta, used$a[i], used$b[i], used$c[i])
        
        # HOLLOW circle at theta when administered (historical position)
        points(
          theta_at_admin,
          prob_at_admin,
          pch = 1,  # Hollow circle
          col = cols[i],
          cex = 1.2,
          lwd = 2,
          bg = NA   # No fill for hollow circle
        )
        
        # SOLID circle at current theta estimate (final position)
        points(
          current_theta,
          prob_at_current,
          pch = 16,  # Solid circle
          col = cols[i],
          cex = 1.0,
          lwd = 1
        )
        
        # Optional: Add connecting line to show the change
        segments(
          x0 = theta_at_admin,
          y0 = prob_at_admin,
          x1 = current_theta,
          y1 = prob_at_current,
          col = adjustcolor(cols[i], 0.4),
          lty = 2,
          lwd = 1
        )
        
        # Add response indicator (correct/incorrect) at historical position
        text(
          theta_at_admin,
          prob_at_admin,
          labels = ifelse(st$chosen$y[i] == 1, "✓", "✗"),
          col = ifelse(st$chosen$y[i] == 1, "#28a745", "#dc3545"),
          cex = 0.7,
          pos = 3,  # Position above the point
          offset = 0.5
        )
      }
    }
    
    legend(
      "topleft",
      bg = "white",
      cex = 0.8,
      inset = .01,
      legend = c(
        "ICC Curves", 
        "EAP θ estimate", 
        "True θ", 
        "θ at Admin", 
        "Current P(θ)",
        " Correct response",  # Note the space before for alignment
        " Incorrect response"  # Note the space before for alignment
      ),
      col = c(
        "#3498DB", "#2C3E50", "#E74C3C", "black", "black", "#28a745", "#dc3545"
      ),
      lty = c(1, 2, 1, NA, NA, NA, NA),
      lwd = c(1.5, 1.5, 1.5, NA, NA, NA, NA),
      pch = c(NA, NA, NA, 1, 16, -0x2713, -0x2717),  # Use negative Unicode values
      pt.cex = c(NA, NA, NA, 1.2, 1.0, 1.2, 1.2)
    )
  })
  
  # ---- DATA TABLES SECTION ----
  
  # Comparison tables for all policies
  output$comparison_tables <- renderUI({
    if (state()$step == 0 &&
        state_mfi()$step == 0 && state_rand()$step == 0)
      return(NULL)
    
    # Get data for all policies - include both information and SE
    df_rand <- if (state()$step > 0) {
      merge(state()$chosen, state()$history[, c("step", "theta_est", "se", "total_info")], by = "step")
    } else
      NULL
    
    df_mfi <- if (state_mfi()$step > 0) {
      merge(state_mfi()$chosen, state_mfi()$history[, c("step", "theta_est", "se", "total_info")], by = "step")
    } else
      NULL
    
    df_fixed <- if (state_rand()$step > 0) {
      merge(state_rand()$chosen, state_rand()$history[, c("step", "theta_est", "se", "total_info")], by = "step")
    } else
      NULL
    
    # Find maximum steps to determine table size
    max_steps <- max(if (!is.null(df_rand))
      nrow(df_rand)
      else
        0,
      if (!is.null(df_mfi))
        nrow(df_mfi)
      else
        0,
      if (!is.null(df_fixed))
        nrow(df_fixed)
      else
        0)
    
    if (max_steps == 0)
      return(div("No items administered in any policy"))
    
    div(
      class = "comparison-table-container",
      tags$table(
        class = "table table-bordered table-striped compact-table",
        style = "width:100%;",
        tags$thead(
          tags$tr(
            tags$th("Step", rowspan = 2, class = "step-col", style = "vertical-align: middle; text-align: center;"),
            tags$th(colspan = 3, "Item Information", class = "policy-header", style = "text-align: center;"),
            tags$th(colspan = 3, "Test Information", class = "policy-header", style = "text-align: center;"),
            tags$th(colspan = 3, "Standard Error", class = "policy-header", style = "text-align: center;")
          ),
          tags$tr(
            tags$th("Rand.", class = "randomesque-col", style = "text-align: center;"),
            tags$th("MFI", class = "mfi-col", style = "text-align: center;"),
            tags$th("Fixed", class = "fixed-col", style = "text-align: center;"),
            tags$th("Rand.", class = "randomesque-col", style = "text-align: center;"),
            tags$th("MFI", class = "mfi-col", style = "text-align: center;"),
            tags$th("Fixed", class = "fixed-col", style = "text-align: center;"),
            tags$th("Rand.", class = "randomesque-col", style = "text-align: center;"),
            tags$th("MFI", class = "mfi-col", style = "text-align: center;"),
            tags$th("Fixed", class = "fixed-col", style = "text-align: center;")
          )
        ),
        tags$tbody({
          lapply(1:max_steps, function(i) {
            # Get values for each policy at this step
            
            # Item IIF (the newly administered item's information)
            IIF_rand <- if (!is.null(df_rand) && i <= nrow(df_rand))
              sprintf("%.3f", df_rand$I_at_theta[i])
            else
              "—"
            IIF_mfi <- if (!is.null(df_mfi) && i <= nrow(df_mfi))
              sprintf("%.3f", df_mfi$I_at_theta[i])
            else
              "—"
            IIF_fixed <- if (!is.null(df_fixed) && i <= nrow(df_fixed))
              sprintf("%.3f", df_fixed$I_at_theta[i])
            else
              "—"
            
            # Total Information (sum of all administered items' information)
            Total_rand <- if (!is.null(df_rand) && i <= nrow(df_rand))
              sprintf("%.3f", df_rand$total_info[i])
            else
              "—"
            Total_mfi <- if (!is.null(df_mfi) && i <= nrow(df_mfi))
              sprintf("%.3f", df_mfi$total_info[i])
            else
              "—"
            Total_fixed <- if (!is.null(df_fixed) && i <= nrow(df_fixed))
              sprintf("%.3f", df_fixed$total_info[i])
            else
              "—"
            
            # Standard Error
            SE_rand <- if (!is.null(df_rand) && i <= nrow(df_rand))
              sprintf("%.3f", df_rand$se[i])
            else
              "—"
            SE_mfi <- if (!is.null(df_mfi) && i <= nrow(df_mfi))
              sprintf("%.3f", df_mfi$se[i])
            else
              "—"
            SE_fixed <- if (!is.null(df_fixed) && i <= nrow(df_fixed))
              sprintf("%.3f", df_fixed$se[i])
            else
              "—"
            
            tags$tr(
              tags$td(i, style = "text-align: center; font-weight: bold;"),
              tags$td(IIF_rand, class = "randomesque-col", style = "text-align: right;"),
              tags$td(IIF_mfi, class = "mfi-col", style = "text-align: right;"),
              tags$td(IIF_fixed, class = "fixed-col", style = "text-align: right;"),
              tags$td(Total_rand, class = "randomesque-col", style = "text-align: right;"),
              tags$td(Total_mfi, class = "mfi-col", style = "text-align: right;"),
              tags$td(Total_fixed, class = "fixed-col", style = "text-align: right;"),
              tags$td(SE_rand, class = "randomesque-col", style = "text-align: right;"),
              tags$td(SE_mfi, class = "mfi-col", style = "text-align: right;"),
              tags$td(SE_fixed, class = "fixed-col", style = "text-align: right;")
            )
          })
        })
      )
    )
  })
  
  output$precision_table <- renderUI({
    st <- active_state()
    if (st$step == 0) return(NULL)
    
    df <- st$history
    if (nrow(df) == 0) return(NULL)
    
    df$reliability <- 1 - df$se^2
    df$precision <- 1 / df$se
    
    # Cumulative RMSE relative to the true θ: √[(1/t) Σ (θ̂_s − θ)²]
    theta_true <- input$true_theta %or% 0
    
    # Calculate RMSE for each row as cumulative RMSE up to that point
    df$rmse <- sapply(1:nrow(df), function(i) {
      sqrt(mean((df$theta_est[1:i] - theta_true)^2))
    })
    
    # Test information implied by the SEM: I(θ̂) = 1 / SEM(θ̂)²
    df$information <- 1 / (df$se^2)
    
    # Get the item IDs for each step from the chosen items
    chosen_items <- st$chosen
    
    tbl <- tags$table(
      class = "table table-bordered table-striped compact-table",
      style = "width:100%; margin-top:8px;",
      tags$thead(tags$tr(
        tags$th("Item", style = "text-align:center;"),
        tags$th("Step", style = "text-align:center;"),
        tags$th("Information I(θ̂)", style = "text-align:center;"),
        tags$th("SEM(θ̂) ≈ 1 / √[I(θ̂)]", style = "text-align:center;"),
        tags$th("RMSE = √[E((θ̂ - θ)²)]", style = "text-align:center;"),
        tags$th("ρ = 1 - SEM²", style = "text-align:center;"),
        tags$th("ψ = 1 / SEM", style = "text-align:center;")
      )),
      tags$tbody(lapply(seq_len(nrow(df)), function(i) {
        # Find the item administered at this step
        step_item <- if (i <= nrow(chosen_items)) {
          chosen_items$id[i]
        } else {
          "—"
        }
        
        tags$tr(
          # Item ID - color ribbon
          tags$td(
            step_item,
            style = if (step_item != "—") {
              paste0("border-left:4px solid ", rainbow(nrow(df))[i], "; text-align:right;")
            } else {
              "text-align:right;"
            }
          ),
          tags$td(df$step[i], style = "text-align:right;"),
          # Information
          tags$td(sprintf("%.1f", df$information[i]), style = "text-align:right;"),
          # SEM
          tags$td(sprintf("%.3f", df$se[i]), style = "text-align:right;"),
          # Cumulative RMSE
          tags$td(sprintf("%.3f", df$rmse[i]), style = "text-align:right;"),
          tags$td(sprintf("%.3f", df$reliability[i]), style = "text-align:right;"),
          tags$td(sprintf("%.3f", df$precision[i]), style = "text-align:right;")
        )
      }))
    )
    
    div(class = "precision-table", tbl)
  })
  
  output$item_table <- renderUI({
    st <- active_state()
    if (st$step == 0) return(NULL)
    
    df <- st$chosen
    if (nrow(df) == 0) return(NULL)
    
    # Calculate P(θ) for each item at the CURRENT theta estimate
    current_theta <- st$theta_hat
    df$p_theta <- round(P_3pl(current_theta, df$a, df$b, df$c), 3)
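    # P_3pl(θ) = c + (1 - c) / (1 + exp(-D·a·(θ - b))): the 3PL probability of a
    # correct response at the given θ.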
    
    # Get the theta estimate at the time each item was administered
    df$theta_at_admin <- sapply(seq_len(nrow(df)), function(i) {
      step <- df$step[i]
      if (step <= nrow(st$history)) {
        st$history$theta_est[st$history$step == step]
      } else {
        current_theta
      }
    })
    
    # Calculate P(θ) at administration time
    df$p_theta_at_admin <- round(P_3pl(df$theta_at_admin, df$a, df$b, df$c), 3)
    
    tbl <- tags$table(
      class = "table table-bordered table-striped compact-table",
      style = "width:100%; margin-top:8px;",
      tags$thead(
        tags$tr(
          tags$th("Item", style = "text-align:center;"),
          tags$th("Step", style = "text-align:center;"),
          tags$th("Resp.", style = "text-align:center;"),
          tags$th("a", style = "text-align:center;"),
          tags$th("b", style = "text-align:center;"),
          tags$th("c", style = "text-align:center;"),
          tags$th("θ at Admin", style = "text-align:center;"),
          tags$th("P(θ) at Admin", style = "text-align:center;"),  # NEW COLUMN
          tags$th("P(θ) Final", style = "text-align:center;")
        )
      ),
      tags$tbody(lapply(seq_len(nrow(df)), function(i)
        tags$tr(
          # Item ID - color ribbon only
          tags$td(
            df$id[i], 
            style = paste0("border-left:4px solid ", rainbow(nrow(df))[i], "; text-align:right;")
          ),
          
          # Step - no color
          tags$td(df$step[i], style = "text-align:right;"),
          
          # Response - no color, just the icon styling
          tags$td(span(
            ifelse(df$y[i] == 1, "✓", "✗"),
            class = ifelse(df$y[i] == 1, "correct-response", "incorrect-response")
          ), style = "text-align:center;"),
          
          # Discrimination (a) - no color
          tags$td(sprintf("%.3f", df$a[i]), style = "text-align:right;"),
          
          # Difficulty (b) - no color
          tags$td(sprintf("%.3f", df$b[i]), style = "text-align:right;"),
          
          # Guessing (c) - no color
          tags$td(sprintf("%.3f", df$c[i]), style = "text-align:right;"),
          
          # Theta at administration - no color
          tags$td(sprintf("%.3f", df$theta_at_admin[i]), style = "text-align:right;"),
          
          # P(θ) at administration time - no color
          tags$td(sprintf("%.3f", df$p_theta_at_admin[i]), style = "text-align:right;"),
          
          # P(θ) at final theta - no color
          tags$td(sprintf("%.3f", df$p_theta[i]), style = "text-align:right;")
        )))
    )
    
    div(
      class = "item-table",
      div(
        style = "margin-bottom: 8px; font-size: 10px; color: #6c757d;",
        sprintf("P(θ) Final calculated at current θ̂ = %.3f | P(θ) at Admin shows probability when item was selected", current_theta)
      ),
      tbl
    )
  })
  
  output$complete_table <- renderUI({
    st <- active_state()
    if (st$step == 0) return(NULL)
    
    # Check if we have both chosen items and history data
    if (nrow(st$chosen) == 0 || nrow(st$history) == 0) return(NULL)
    
    # Combine all available data - handle potential merge issues
    df <- tryCatch({
      merge(st$chosen, st$history[, c("step", "theta_est", "se", "total_info")], by = "step")
    }, error = function(e) {
      # If merge fails, use chosen data only and add placeholder columns
      st$chosen
    })
    
    if (nrow(df) == 0) return(NULL)
    
    # Calculate additional metrics
    current_theta <- if ("theta_est" %in% names(df) && nrow(df) > 0) tail(df$theta_est, 1) else 0
    df$p_theta <- P_3pl(current_theta, df$a, df$b, df$c)
    df$reliability <- if ("se" %in% names(df)) 1 - df$se^2 else NA
    df$precision <- if ("se" %in% names(df)) 1 / df$se else NA
    theta_true <- input$true_theta %or% 0
    df$bias <- if ("theta_est" %in% names(df)) df$theta_est - theta_true else NA
    df$rmse <- sapply(1:nrow(df), function(i) {
      sqrt(mean((df$theta_est[1:i] - theta_true)^2))
    })
    
    # Ensure all required columns exist
    if (!"I_at_selection" %in% names(df)) df$I_at_selection <- NA
    if (!"I_at_theta" %in% names(df)) df$I_at_theta <- NA
    if (!"total_info" %in% names(df)) df$total_info <- NA
    if (!"theta_est" %in% names(df)) df$theta_est <- NA
    if (!"se" %in% names(df)) df$se <- NA
    
    tbl <- tags$table(
      class = "table table-bordered table-striped compact-table",
      style = "width:100%; margin-top:8px;",
      tags$thead(
        tags$tr(
          tags$th("Item", style = "text-align:center;"),
          tags$th("Step", style = "text-align:center;"),
          tags$th("Resp.", style = "text-align:center;"),
          tags$th("a", style = "text-align:center;"),
          tags$th("b", style = "text-align:center;"),
          tags$th("c", style = "text-align:center;"),
          tags$th("P(θ)", style = "text-align:center;"),
          tags$th("IIF(θ̂)", style = "text-align:center;"),
          tags$th("IIF(θ)", style = "text-align:center;"),
          tags$th("TIF", style = "text-align:center;"),
          tags$th("θ̂", style = "text-align:center;"),
          tags$th("SE", style = "text-align:center;"),
          tags$th("ρ", style = "text-align:center;"),
          tags$th("ψ", style = "text-align:center;"),
          tags$th("Bias", style = "text-align:center;"),
          tags$th("RMSE", style = "text-align:center;")
        )
      ),
      tags$tbody(lapply(seq_len(nrow(df)), function(i)
        tags$tr(
          # Item ID - color ribbon
          tags$td(
            df$id[i], 
            style = paste0("border-left:4px solid ", rainbow(nrow(df))[i], "; text-align:right;")
          ),
          tags$td(df$step[i], style = "text-align:right;"),
          tags$td(span(
            ifelse(df$y[i] == 1, "✓", "✗"),
            class = ifelse(df$y[i] == 1, "correct-response", "incorrect-response")
          ), style = "text-align:center;"),
          tags$td(sprintf("%.3f", df$a[i]), style = "text-align:right;"),
          tags$td(sprintf("%.3f", df$b[i]), style = "text-align:right;"),
          tags$td(sprintf("%.3f", df$c[i]), style = "text-align:right;"),
          tags$td(sprintf("%.3f", df$p_theta[i]), style = "text-align:right;"),
          tags$td(if (!is.na(df$I_at_selection[i])) sprintf("%.3f", df$I_at_selection[i]) else "—", style = "text-align:right;"),
          tags$td(if (!is.na(df$I_at_theta[i])) sprintf("%.3f", df$I_at_theta[i]) else "—", style = "text-align:right;"),
          tags$td(if (!is.na(df$total_info[i])) sprintf("%.3f", df$total_info[i]) else "—", style = "text-align:right;"),
          tags$td(if (!is.na(df$theta_est[i])) sprintf("%.3f", df$theta_est[i]) else "—", style = "text-align:right;"),
          tags$td(if (!is.na(df$se[i])) sprintf("%.3f", df$se[i]) else "—", style = "text-align:right;"),
          tags$td(if (!is.na(df$reliability[i])) sprintf("%.3f", df$reliability[i]) else "—", style = "text-align:right;"),
          tags$td(if (!is.na(df$precision[i])) sprintf("%.3f", df$precision[i]) else "—", style = "text-align:right;"),
          tags$td(if (!is.na(df$bias[i])) sprintf("%+.3f", df$bias[i]) else "—", style = "text-align:right;"),
          tags$td(if (!is.na(df$rmse[i])) sprintf("%.3f", df$rmse[i]) else "—", style = "text-align:right;")
        )))
    )
    
    div(
      class = "complete-table",
      tbl
    )
  })
  
  # Enhanced Information table with both current and historical IIFs and TIF
  output$information_table <- renderUI({
    st <- active_state()
    if (st$step == 0)
      return(NULL)
    
    df <- merge(st$chosen, st$history[, c("step", "theta_est", "total_info")], by = "step")
    df$I_at_theta <- round(df$I_at_theta, 3)
    df$I_at_selection <- round(df$I_at_selection, 3)
    df$theta_est <- round(df$theta_est, 3)
    df$total_info <- round(df$total_info, 3)
    df$I_change <- round(df$I_at_theta - df$I_at_selection, 3)
    
    # Cumulative log-likelihood of the responses so far, evaluated at each step's θ̂
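    # log L(θ̂_t) = Σ_{j ≤ t} [ y_j·log P_j(θ̂_t) + (1 - y_j)·log(1 - P_j(θ̂_t)) ],
    # i.e., the joint log-likelihood of the first t responses at that step's θ̂.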
    df$log_likelihood <- sapply(1:nrow(df), function(i) {
      theta_current <- df$theta_est[i]
      items_so_far <- df[1:i, ]
      sum_log_lik <- 0
      for(j in 1:i) {
        P <- P_3pl(theta_current, items_so_far$a[j], items_so_far$b[j], items_so_far$c[j])
        log_lik <- ifelse(items_so_far$y[j] == 1, log(P), log(1-P))
        sum_log_lik <- sum_log_lik + log_lik
      }
      round(sum_log_lik, 3)
    })
    
    if (nrow(df) > 0) {
      cols <- rainbow(nrow(df))
    } else {
      return(NULL)
    }
    
    tbl <- tags$table(class = "table table-bordered table-striped compact-table",
                      style = "width:100%; margin-top:8px;",
                      tags$thead(
                        tags$tr(
                          tags$th("Item", style = "text-align:center;"),
                          tags$th("Step", style = "text-align:center;"),
                          tags$th("Resp", style = "text-align:center;"),
                          tags$th(HTML("I(θ<sub>select</sub>)"), style = "text-align:center;"),
                          tags$th(HTML("I(θ<sub>final</sub>)"), style = "text-align:center;"),
                          tags$th(HTML("ΔI"), style = "text-align:center;"),
                          tags$th(HTML("TIF"), style = "text-align:center;"),
                          tags$th(HTML("θ̂"), style = "text-align:center;"),
                          tags$th(HTML("log-L"), style = "text-align:center;")  # NEW COLUMN
                        )
                      ),
                      tags$tbody(lapply(seq_len(nrow(df)), function(i)
                        tags$tr(
                          class = "color-coded-row",
                          style = paste0("border-left:4px solid ", cols[i], ";"),
                          tags$td(df$id[i], style = "text-align:right;"),
                          tags$td(df$step[i], style = "text-align:right;"),
                          tags$td(span(
                            ifelse(df$y[i] == 1, "✓", "✗"),
                            class = ifelse(df$y[i] == 1, "correct-response", "incorrect-response")
                          ), style = "text-align:center;"),
                          tags$td(sprintf("%.3f", df$I_at_selection[i]), style = "text-align:right;"),
                          tags$td(sprintf("%.3f", df$I_at_theta[i]), style = "text-align:right;"),
                          tags$td(
                            sprintf("%+.3f", df$I_change[i]),
                            style = sprintf(
                              "text-align:right; color: %s;",
                              ifelse(df$I_change[i] >= 0, "green", "red")
                            )
                          ),
                          tags$td(sprintf("%.3f", df$total_info[i]), style = "text-align:right;"),
                          tags$td(sprintf("%.3f", df$theta_est[i]), style = "text-align:right;"),
                          # cumulative log-likelihood column
                          tags$td(
                            sprintf("%.3f", df$log_likelihood[i]),
                            style = sprintf(
                              "text-align:right; color: %s;",
                              ifelse(df$log_likelihood[i] >= -2, "green", 
                                     ifelse(df$log_likelihood[i] >= -5, "orange", "red"))
                            )
                          )
                        ))))
    
    div(
      class = "information-table",
      div(
        style = "margin-bottom: 8px; font-size: 10px; color: #6c757d;",
        HTML("I(θ<sub>select</sub>): IIF at selection θ | I(θ<sub>final</sub>): IIF at final θ | ΔI: Information change | log-L: Cumulative log-likelihood")
      ),
      tbl
    )
  })
  
  output$convergence_table <- renderUI({
    st <- active_state()
    if (st$step == 0)
      return(NULL)
    
    df <- merge(st$chosen, st$history[, c("step", "theta_est")], by = "step")
    theta_true <- input$true_theta %or% 0
    df$bias <- round(df$theta_est - theta_true, 3)
    df$theta_est <- round(df$theta_est, 3)
    
    # Calculate RMSE for each step
    df$rmse <- sapply(1:nrow(df), function(i) {
      sqrt(mean((df$theta_est[1:i] - theta_true)^2))
    })
    
    if (nrow(df) > 0) {
      cols <- rainbow(nrow(df))
    } else {
      return(NULL)
    }
    
    tbl <- tags$table(class = "table table-bordered table-striped compact-table",
                      style = "width:100%; margin-top:8px;",
                      tags$thead(tags$tr(
                        tags$th("Item", style = "text-align:center;"),
                        tags$th("Step", style = "text-align:center;"),
                        tags$th("Resp.", style = "text-align:center;"),
                        tags$th(HTML("θ̂"), style = "text-align:center;"),
                        tags$th("Bias", style = "text-align:center;"),
                        tags$th("RMSE", style = "text-align:center;")
                      )),
                      tags$tbody(lapply(seq_len(nrow(df)), function(i)
                        tags$tr(
                          class = "color-coded-row",
                          style = paste0("border-left:4px solid ", cols[i], ";"),
                          tags$td(df$id[i], style = "text-align:right;"),
                          tags$td(df$step[i], style = "text-align:right;"),
                          tags$td(span(
                            ifelse(df$y[i] == 1, "✓", "✗"),
                            class = ifelse(df$y[i] == 1, "correct-response", "incorrect-response")
                          ), style = "text-align:center;"),
                          tags$td(sprintf("%.3f", df$theta_est[i]), style = "text-align:right;"),
                          tags$td(sprintf("%.3f", df$bias[i]), style = "text-align:right;"),
                          tags$td(sprintf("%.3f", df$rmse[i]), style = "text-align:right;")
                        ))))
    div(class = "convergence-table", tbl)
  })
  
  # ---- CONVERGENCE STATS SECTION ----
  output$convergence_stats <- renderUI({
    st <- active_state()
    if (st$step == 0) return(NULL)
    
    theta_true <- input$true_theta %or% 0
    current_theta <- st$theta_hat
    current_se <- st$se
    
    # Calculate RMSE and Reliability
    rmse <- sqrt(mean((st$history$theta_est - theta_true)^2))
    reliability <- 1 - current_se^2  # rho = 1 - SE^2, assuming a unit-variance theta scale
    
    # Calculate bias
    bias <- current_theta - theta_true
    
    div(
      class = "compact-text",
      style = "margin-top: 10px; padding: 8px; background-color: #f8f9fa; border-radius: 6px;",
      h6("Convergence Statistics", style = "margin-bottom: 6px; color: #2C3E50;"),
      tags$table(
        class = "table table-sm compact-table",
        style = "width: 100%; margin-bottom: 0;",
        tags$tbody(
          tags$tr(
            tags$td(HTML("<b>True θ:</b>"), style = "text-align: left;"),
            tags$td(sprintf("%.3f", theta_true), style = "text-align: right;"),
            tags$td(HTML("<b>Final θ̂:</b>"), style = "text-align: left;"),
            tags$td(sprintf("%.3f", current_theta), style = "text-align: right;")
          ),
          tags$tr(
            tags$td(HTML("<b>Bias (θ̂ - θ):</b>"), style = "text-align: left;"),
            tags$td(sprintf("%+.3f", bias), style = "text-align: right;"),
            tags$td(HTML("<b>RMSE:</b>"), style = "text-align: left;"),
            tags$td(sprintf("%.3f", rmse), style = "text-align: right;")
          ),
          tags$tr(
            tags$td(HTML("<b>Standard Error:</b>"), style = "text-align: left;"),
            tags$td(sprintf("%.3f", current_se), style = "text-align: right;"),
            tags$td(HTML("<b>Reliability (ρ):</b>"), style = "text-align: left;"),
            tags$td(sprintf("%.3f", reliability), style = "text-align: right;")
          )
        )
      )
    )
  })
  
  # ---- ITEM POOL ANALYSIS OUTPUTS SECTION ----
  
  # Item pool distribution plots
  output$plot_pool_distribution <- renderPlot({
    pl <- pool()
    st <- active_state()
    
    par(
      mfrow = c(2, 2),
      mar = c(3, 3, 2, 1),
      mgp = c(1.8, 0.5, 0),
      bg = "white",
      cex.axis = 0.8,
      cex.lab = 0.9
    )
    
    # Full pool difficulty distribution
    hist(
      pl$b,
      breaks = 20,
      col = "#3498DB",
      border = "white",
      main = "Item Difficulty (b) - Full Pool",
      xlab = "Difficulty (b)",
      ylab = "Frequency",
      xlim = c(-3, 3)
    )
    abline(
      v = input$true_theta,
      col = "#E74C3C",
      lwd = 2,
      lty = 2
    )
    legend(
      "topright",
      legend = "True θ",
      col = "#E74C3C",
      lwd = 2,
      lty = 2,
      cex = 1
    )
    
    # Full pool discrimination distribution
    hist(
      pl$a,
      breaks = 20,
      col = "#2ECC71",
      border = "white",
      main = "Item Discrimination (a) - Full Pool",
      xlab = "Discrimination (a)",
      ylab = "Frequency"
    )
    abline(
      v = 0.8,
      col = "#E74C3C",
      lwd = 2,
      lty = 2
    )
    legend(
      "topright",
      legend = "Good a ≥ 0.8",
      col = "#E74C3C",
      lwd = 2,
      lty = 2,
      cex = 1
    )
    
    # Administered items difficulty
    if (st$step > 0) {
      used_items <- pl[pl$id %in% st$used, ]
      hist(
        used_items$b,
        breaks = min(10, length(unique(used_items$b))),
        col = "#E74C3C",
        border = "white",
        main = paste(
          "Administered Items Difficulty (n =",
          nrow(used_items),
          ")"
        ),
        xlab = "Difficulty (b)",
        ylab = "Frequency",
        xlim = c(-3, 3)
      )
      abline(
        v = input$true_theta,
        col = "#E74C3C",
        lwd = 2,
        lty = 2
      )
      abline(
        v = st$theta_hat,
        col = "#2C3E50",
        lwd = 2,
        lty = 1
      )
      legend(
        "topright",
        legend = c("True θ", "Current θ̂"),
        col = c("#E74C3C", "#2C3E50"),
        lwd = 2,
        lty = c(2, 1),
        cex = 1
      )
    } else {
      plot(
        0,
        0,
        type = "n",
        xlab = "",
        ylab = "",
        main = "Administered Items Difficulty",
        xlim = c(0, 1),
        ylim = c(0, 1)
      )
      text(0.5, 0.5, "No items administered yet", col = "gray50")
    }
    
    # Administered items discrimination
    if (st$step > 0) {
      used_items <- pl[pl$id %in% st$used, ]
      hist(
        used_items$a,
        breaks = min(10, length(unique(used_items$a))),
        col = "#F39C12",
        border = "white",
        main = paste(
          "Administered Items Discrimination (n =",
          nrow(used_items),
          ")"
        ),
        xlab = "Discrimination (a)",
        ylab = "Frequency"
      )
      abline(
        v = 0.8,
        col = "#E74C3C",
        lwd = 2,
        lty = 2
      )
      legend(
        "topright",
        legend = "Good a ≥ 0.8",
        col = "#E74C3C",
        lwd = 2,
        lty = 2,
        cex = 1
      )
    } else {
      plot(
        0,
        0,
        type = "n",
        xlab = "",
        ylab = "",
        main = "Administered Items Discrimination",
        xlim = c(0, 1),
        ylim = c(0, 1)
      )
      text(0.5, 0.5, "No items administered yet", col = "gray50")
    }
  })
  
  # Pool quality dashboard
  output$pool_quality_dashboard <- renderUI({
    pl <- pool()
    st <- active_state()
    theta_true <- input$true_theta %or% 0
    
    # Comprehensive quality metrics
    quality_metrics <- list(
      # Discrimination metrics
      excellent_a = sum(pl$a >= 1.2),
      good_a = sum(pl$a >= 0.8 & pl$a < 1.2),
      fair_a = sum(pl$a >= 0.5 & pl$a < 0.8),
      poor_a = sum(pl$a < 0.5),
      mean_a = mean(pl$a),
      sd_a = sd(pl$a),
      
      # Difficulty metrics
      mean_b = mean(pl$b),
      sd_b = sd(pl$b),
      coverage_very_easy = sum(pl$b < -1.5),
      coverage_easy = sum(pl$b >= -1.5 & pl$b < -0.5),
      coverage_medium = sum(pl$b >= -0.5 & pl$b <= 0.5),
      coverage_hard = sum(pl$b > 0.5 & pl$b <= 1.5),
      coverage_very_hard = sum(pl$b > 1.5),
      
      # Guessing metrics
      low_guessing = sum(pl$c < 0.1),
      medium_guessing = sum(pl$c >= 0.1 & pl$c < 0.2),
      high_guessing = sum(pl$c >= 0.2),
      mean_c = mean(pl$c),
      
      # Targeting efficiency
      items_near_theta = sum(abs(pl$b - theta_true) <= 0.5),
      prop_near_theta = mean(abs(pl$b - theta_true) <= 0.5),
      mean_targeting_error = mean(abs(pl$b - theta_true)),
      
      # Overall quality score (0-1 scale)
      quality_score = mean(c(
        pmin(1, mean(pl$a) / 1.0),           # Discrimination component
        1 - (mean(pl$c) / 0.25),             # Guessing component (lower is better)
        pmin(1, sum(abs(pl$b) <= 2) / nrow(pl)),  # Difficulty range component
        pmin(1, sum(abs(pl$b - theta_true) <= 1) / nrow(pl))  # Targeting component
      ))
    )
    
    # Calculate administered items metrics if any items used
    administered_metrics <- if (st$step > 0) {
      used_items <- pl[pl$id %in% st$used, ]
      list(
        n_used = nrow(used_items),
        prop_used = nrow(used_items) / nrow(pl),
        mean_a_used = mean(used_items$a),
        mean_b_used = mean(used_items$b),
        mean_c_used = mean(used_items$c),
        targeting_bias = mean(used_items$b) - theta_true,
        abs_targeting_error = mean(abs(used_items$b - theta_true)),
        prop_high_a_used = mean(used_items$a >= 0.8),
        prop_near_theta_used = mean(abs(used_items$b - theta_true) <= 0.5)
      )
    } else NULL
    
    div(class = "compact-text",
        h6("Pool Quality Dashboard", style = "margin-bottom: 8px; color: #2C3E50;"),
        
        # Overall quality score
        div(style = "text-align: center; margin-bottom: 10px;",
            span(class = "badge", style = sprintf("background-color: %s; 
                                                  color: white; 
                                                  font-size: 12px; 
                                                  padding: 8px 16px;",
                                                  ifelse(quality_metrics$quality_score >= 0.7, "#28a745",
                                                         ifelse(quality_metrics$quality_score >= 0.5, "#ffc107", "#dc3545"))),
                 sprintf("Overall Quality Score: %.1f/1.0", quality_metrics$quality_score))),
        
        fluidRow(
          column(6,
                 h6("Discrimination Quality", style = "color: #2C3E50; margin-bottom: 5px;"),
                 div(style = "background-color: #e8f4f8; 
                     padding: 8px; 
                     border-radius: 6px;",  # Light blue
                     tags$table(class = "table table-sm compact-table",
                                tags$tbody(
                                  tags$tr(tags$td("Excellent (≥1.2):"), tags$td(sprintf("%d (%.1f%%)", quality_metrics$excellent_a, 100 * quality_metrics$excellent_a/nrow(pl)))),
                                  tags$tr(tags$td("Good (0.8-1.2):"), tags$td(sprintf("%d (%.1f%%)", quality_metrics$good_a, 100 * quality_metrics$good_a/nrow(pl)))),
                                  tags$tr(tags$td("Fair (0.5-0.8):"), tags$td(sprintf("%d (%.1f%%)", quality_metrics$fair_a, 100 * quality_metrics$fair_a/nrow(pl)))),
                                  tags$tr(tags$td("Poor (<0.5):"), tags$td(sprintf("%d (%.1f%%)", quality_metrics$poor_a, 100 * quality_metrics$poor_a/nrow(pl)))),
                                  tags$tr(tags$td("Mean a:"), tags$td(sprintf("%.3f", quality_metrics$mean_a))),
                                  tags$tr(tags$td("SD a:"), tags$td(sprintf("%.3f", quality_metrics$sd_a)))
                                )))),
          column(6,
                 h6("Difficulty Distribution", style = "color: #2C3E50; margin-bottom: 5px;"),
                 div(style = "background-color: #f0e8f8; padding: 8px; border-radius: 6px;",  # Light purple
                     tags$table(class = "table table-sm compact-table",
                                tags$tbody(
                                  tags$tr(tags$td("Very Easy (< -1.5):"), tags$td(sprintf("%d", quality_metrics$coverage_very_easy))),
                                  tags$tr(tags$td("Easy (-1.5 to -0.5):"), tags$td(sprintf("%d", quality_metrics$coverage_easy))),
                                  tags$tr(tags$td("Medium (-0.5 to 0.5):"), tags$td(sprintf("%d", quality_metrics$coverage_medium))),
                                  tags$tr(tags$td("Hard (0.5 to 1.5):"), tags$td(sprintf("%d", quality_metrics$coverage_hard))),
                                  tags$tr(tags$td("Very Hard (> 1.5):"), tags$td(sprintf("%d", quality_metrics$coverage_very_hard))),
                                  tags$tr(tags$td("Mean b:"), tags$td(sprintf("%.3f", quality_metrics$mean_b))),
                                  tags$tr(tags$td("SD b:"), tags$td(sprintf("%.3f", quality_metrics$sd_b)))
                                ))))
        ),
        
        fluidRow(
          column(6,
                 h6("Guessing Parameters", style = "color: #2C3E50; margin-bottom: 5px;"),
                 div(style = "background-color: #f8f0e8; padding: 8px; border-radius: 6px;",  # Light orange
                     tags$table(class = "table table-sm compact-table",
                                tags$tbody(
                                  tags$tr(tags$td("Low (< 0.1):"), tags$td(sprintf("%d (%.1f%%)", quality_metrics$low_guessing, 100 * quality_metrics$low_guessing/nrow(pl)))),
                                  tags$tr(tags$td("Medium (0.1-0.2):"), tags$td(sprintf("%d (%.1f%%)", quality_metrics$medium_guessing, 100 * quality_metrics$medium_guessing/nrow(pl)))),
                                  tags$tr(tags$td("High (≥ 0.2):"), tags$td(sprintf("%d (%.1f%%)", quality_metrics$high_guessing, 100 * quality_metrics$high_guessing/nrow(pl)))),
                                  tags$tr(tags$td("Mean c:"), tags$td(sprintf("%.3f", quality_metrics$mean_c)))
                                )))),
          column(6,
                 h6("Targeting Efficiency", style = "color: #2C3E50; margin-bottom: 5px;"),
                 div(style = "background-color: #e8f8f0; padding: 8px; border-radius: 6px;",  # Light green
                     tags$table(class = "table table-sm compact-table",
                                tags$tbody(
                                  tags$tr(tags$td("Items within ±0.5 of θ:"), 
                                          tags$td(sprintf("%d (%.1f%%)", 
                                                          quality_metrics$items_near_theta, 100 * quality_metrics$prop_near_theta))),
                                  tags$tr(tags$td("Mean targeting error:"), 
                                          tags$td(sprintf("%.3f", 
                                                          quality_metrics$mean_targeting_error))),
                                  tags$tr(tags$td("True θ:"), 
                                          tags$td(sprintf("%.2f", theta_true))),
                                  if (!is.null(administered_metrics)) 
                                    tags$tr(tags$td("Pool mean b:"), 
                                            tags$td(sprintf("%.3f", 
                                                            quality_metrics$mean_b)))
                                ))))
        )
    )
  })  
  
  # Targeting efficiency plot
  output$plot_targeting_efficiency <- renderPlot({
    pl <- pool()
    st <- active_state()
    used_items <- if (st$step > 0)
      pl[pl$id %in% st$used, ]
    else
      NULL
    theta_true <- input$true_theta %or% 0
    
    theta_grid <- seq(-3, 3, length.out = 100)
    info_full <- sapply(theta_grid, function(th)
      sum(I_3pl(th, pl$a, pl$b, pl$c)))
    info_used <- if (!is.null(used_items))
      sapply(theta_grid, function(th)
        sum(I_3pl(
          th, used_items$a, used_items$b, used_items$c
        )))
    else
      NULL
    
    par(
      mar = c(3, 3, 2, 1),
      mgp = c(1.8, 0.5, 0),
      bg = "white"
    )
    
    plot(
      theta_grid,
      info_full,
      type = "l",
      lwd = 3,
      col = "#3498DB",
      xlab = "Ability (θ)",
      ylab = "Total Information",
      main = "Targeting Efficiency: Information Functions",
      ylim = c(0, max(info_full) * 1.1)
    )
    
    if (!is.null(info_used))
      lines(
        theta_grid,
        info_used,
        lwd = 3,
        col = "#E74C3C",
        lty = 1
      )
    
    abline(
      v = theta_true,
      col = "#2C3E50",
      lwd = 2,
      lty = 2
    )
    if (st$step > 0)
      abline(
        v = st$theta_hat,
        col = "#F39C12",
        lwd = 2,
        lty = 2
      )
    
    # Build legend entries conditionally so labels, colors, and line types stay aligned
    show_used <- !is.null(info_used)
    show_est <- st$step > 0
    legend(
      "topleft",
      legend = c(
        "Full Pool Information",
        if (show_used) "Administered Items Information",
        "True θ",
        if (show_est) "Current θ̂"
      ),
      col = c(
        "#3498DB",
        if (show_used) "#E74C3C",
        "#2C3E50",
        if (show_est) "#F39C12"
      ),
      lwd = 2,
      lty = c(1, if (show_used) 1, 2, if (show_est) 2),
      cex = 1
    )
  })
  
  # Parameter correlations plot
  output$plot_parameter_correlations <- renderPlot({
    pl <- pool()
    st <- active_state()
    used_items <- if (st$step > 0)
      pl[pl$id %in% st$used, ]
    else
      NULL
    
    par(
      mfrow = c(1, 2),
      mar = c(3, 3, 2, 1),
      mgp = c(1.8, 0.5, 0)
    )
    
    # Discrimination vs Difficulty
    plot(
      pl$b,
      pl$a,
      pch = 16,
      col = adjustcolor("#3498DB", 0.6),
      xlab = "Difficulty (b)",
      ylab = "Discrimination (a)",
      main = "Discrimination vs Difficulty",
      xlim = c(-3, 3),
      ylim = c(0, max(pl$a) * 1.1)
    )
    
    if (!is.null(used_items))
      points(
        used_items$b,
        used_items$a,
        pch = 16,
        col = adjustcolor("#E74C3C", 0.8),
        cex = 1.2
      )
    
    cor_ab <- cor(pl$b, pl$a)
    legend(
      "topright",
      legend = sprintf("r = %.3f", cor_ab),
      bty = "n",
      cex = 1
    )
    
    # Discrimination vs Guessing
    plot(
      pl$c,
      pl$a,
      pch = 16,
      col = adjustcolor("#2ECC71", 0.6),
      xlab = "Guessing (c)",
      ylab = "Discrimination (a)",
      main = "Discrimination vs Guessing",
      xlim = c(0, max(pl$c) * 1.1),
      ylim = c(0, max(pl$a) * 1.1)
    )
    
    if (!is.null(used_items))
      points(
        used_items$c,
        used_items$a,
        pch = 16,
        col = adjustcolor("#E74C3C", 0.8),
        cex = 1.2
      )
    
    cor_ac <- cor(pl$c, pl$a)
    legend(
      "topright",
      legend = sprintf("r = %.3f", cor_ac),
      bty = "n",
      cex = 1
    )
    
    legend(
      "bottomright",
      legend = c("Full Pool", "Administered Items"),
      pch = 16,
      col = c(adjustcolor("#3498DB", 0.6), adjustcolor("#E74C3C", 0.8)),
      cex = 0.7
    )
  })
  
  # Pool statistics table
  output$pool_stats <- renderUI({
    pl <- pool()
    st <- active_state()
    used_items <- if (st$step > 0)
      pl[pl$id %in% st$used, ]
    else
      NULL
    theta_true <- input$true_theta %or% 0
    
    stats_full <- list(
      n_items = nrow(pl),
      mean_b = mean(pl$b),
      sd_b = sd(pl$b),
      mean_a = mean(pl$a),
      sd_a = sd(pl$a),
      mean_c = mean(pl$c),
      sd_c = sd(pl$c),
      median_b = median(pl$b),
      median_a = median(pl$a),
      median_c = median(pl$c),
      range_b = diff(range(pl$b)),
      range_a = diff(range(pl$a)),
      range_c = diff(range(pl$c)),
      info_at_theta = sum(I_3pl(theta_true, pl$a, pl$b, pl$c)),
      max_info_per_item = max(I_3pl(theta_true, pl$a, pl$b, pl$c)),
      items_near_theta = sum(abs(pl$b - theta_true) <= 0.5),
      prop_near_theta = mean(abs(pl$b - theta_true) <= 0.5),
      high_discrimination = sum(pl$a >= 1.0),
      low_discrimination = sum(pl$a < 0.5),
      high_guessing = sum(pl$c >= 0.2)
    )
    
    stats_used <- if (!is.null(used_items)) {
      list(
        n_items = nrow(used_items),
        mean_b = mean(used_items$b),
        sd_b = sd(used_items$b),
        mean_a = mean(used_items$a),
        sd_a = sd(used_items$a),
        mean_c = mean(used_items$c),
        sd_c = sd(used_items$c),
        median_b = median(used_items$b),
        median_a = median(used_items$a),
        info_at_theta = sum(
          I_3pl(theta_true, used_items$a, used_items$b, used_items$c)
        ),
        info_at_estimate = if (st$step > 0)
          sum(
            I_3pl(st$theta_hat, used_items$a, used_items$b, used_items$c)
          )
        else
          NA,
        bias_targeting = mean(used_items$b) - theta_true,
        abs_bias_targeting = mean(abs(used_items$b - theta_true)),
        prop_high_a = mean(used_items$a >= 1.0),
        prop_low_a = mean(used_items$a < 0.5),
        items_near_theta_used = sum(abs(used_items$b - theta_true) <= 0.5),
        prop_near_theta_used = mean(abs(used_items$b - theta_true) <= 0.5)
      )
    } else
      NULL
    
    div(
      class = "compact-text",
      style = "margin-top: 10px;",
      h6("Pool Characteristics", style = "margin-bottom: 8px; color: #2C3E50;"),
      tags$table(
        class = "table table-bordered compact-table",
        style = "width: 100%; margin-bottom: 15px;",
        tags$thead(tags$tr(
          tags$th("Metric"), tags$th("Full Pool"), tags$th(
            ifelse(st$step > 0, "Administered Items", "No Items Administered")
          )
        )),
        tags$tbody(
          tags$tr(class = "table-active", tags$td(colspan = 3, strong(
            "Basic Statistics"
          ))),
          tags$tr(
            tags$td("Number of Items"),
            tags$td(sprintf("%d", stats_full$n_items)),
            tags$td(ifelse(
              st$step > 0,
              sprintf(
                "%d (%.1f%%)",
                stats_used$n_items,
                100 * stats_used$n_items / stats_full$n_items
              ),
              "—"
            ))
          ),
          tags$tr(
            tags$td("Mean Difficulty (b)"),
            tags$td(sprintf("%.3f", stats_full$mean_b)),
            tags$td(ifelse(
              st$step > 0, sprintf("%.3f", stats_used$mean_b), "—"
            ))
          ),
          tags$tr(
            tags$td("SD Difficulty"),
            tags$td(sprintf("%.3f", stats_full$sd_b)),
            tags$td(ifelse(
              st$step > 0, sprintf("%.3f", stats_used$sd_b), "—"
            ))
          ),
          tags$tr(
            tags$td("Mean Discrimination (a)"),
            tags$td(sprintf("%.3f", stats_full$mean_a)),
            tags$td(ifelse(
              st$step > 0, sprintf("%.3f", stats_used$mean_a), "—"
            ))
          ),
          
          tags$tr(class = "table-active", tags$td(
            colspan = 3, strong("Targeting Efficiency")
          )),
          tags$tr(
            tags$td("Items within ±0.5 of True θ"),
            tags$td(
              sprintf(
                "%d (%.1f%%)",
                stats_full$items_near_theta,
                100 * stats_full$prop_near_theta
              )
            ),
            tags$td(ifelse(
              st$step > 0,
              sprintf(
                "%d (%.1f%%)",
                stats_used$items_near_theta_used,
                100 * stats_used$prop_near_theta_used
              ),
              "—"
            ))
          ),
          tags$tr(
            tags$td("Mean Absolute Targeting Error"),
            tags$td(sprintf("%.3f", mean(
              abs(pl$b - theta_true)
            ))),
            tags$td(ifelse(
              st$step > 0,
              sprintf("%.3f", stats_used$abs_bias_targeting),
              "—"
            ))
          ),
          tags$tr(
            tags$td("Bias in Targeting (Mean b - True θ)"),
            tags$td(sprintf("%.3f", mean(
              pl$b - theta_true
            ))),
            tags$td(ifelse(
              st$step > 0,
              sprintf("%.3f", stats_used$bias_targeting),
              "—"
            ))
          ),
          
          tags$tr(class = "table-active", tags$td(colspan = 3, strong("Information"))),
          tags$tr(
            tags$td("Total Info at True θ"),
            tags$td(sprintf("%.2f", stats_full$info_at_theta)),
            tags$td(ifelse(
              st$step > 0, sprintf("%.2f", stats_used$info_at_theta), "—"
            ))
          ),
          tags$tr(
            tags$td("Max Info per Item"),
            tags$td(sprintf(
              "%.2f", stats_full$max_info_per_item
            )),
            tags$td("—")
          ),
          
          tags$tr(class = "table-active", tags$td(colspan = 3, strong(
            "Quality Metrics"
          ))),
          tags$tr(
            tags$td("High Discrimination (≥1.0)"),
            tags$td(
              sprintf(
                "%d (%.1f%%)",
                stats_full$high_discrimination,
                100 * stats_full$high_discrimination / stats_full$n_items
              )
            ),
            tags$td(ifelse(
              st$step > 0,
              sprintf(
                "%d (%.1f%%)",
                sum(used_items$a >= 1.0),
                100 * stats_used$prop_high_a
              ),
              "—"
            ))
          ),
          tags$tr(
            tags$td("High Guessing (≥0.2)"),
            tags$td(
              sprintf(
                "%d (%.1f%%)",
                stats_full$high_guessing,
                100 * stats_full$high_guessing / stats_full$n_items
              )
            ),
            tags$td(ifelse(
              st$step > 0, sprintf(
                "%d (%.1f%%)",
                sum(used_items$c >= 0.2),
                100 * mean(used_items$c >= 0.2)
              ), "—"
            ))
          )
        )
      )
    )
  })
}  

# ===========================================
# ---- LAUNCH APPLICATION ----
# ===========================================

shinyApp(ui, server)

Pool Quality Dashboard — What These Numbers Mean

Overall Quality Score:
A simple composite (0–1) summarizing pool “health”, computed as the mean of four components:

  • Discrimination: min(1, mean(a) / 1.0)
  • Guessing: 1 - mean(c) / 0.25 (lower guessing scores higher)
  • Difficulty coverage: min(1, proportion(|b| ≤ 2))
  • Targeting: min(1, proportion(|b - true θ| ≤ 1))

Interpretation: a score near 0.8 suggests strong average discrimination, low guessing, broad difficulty coverage, and items reasonably well targeted to the true θ.
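
For reference, here is a standalone sketch that mirrors the dashboard's quality_score calculation, assuming a pool data frame with columns a, b, and c and a known true θ (the function name and the synthetic pool are mine, for illustration only):

# Minimal standalone version of the dashboard's composite quality score
pool_quality_score <- function(pl, theta_true = 0) {
  mean(c(
    pmin(1, mean(pl$a) / 1.0),                    # discrimination component
    1 - (mean(pl$c) / 0.25),                      # guessing component (lower c is better)
    pmin(1, mean(abs(pl$b) <= 2)),                # difficulty coverage component
    pmin(1, mean(abs(pl$b - theta_true) <= 1))    # targeting component
  ))
}

# Example on a small synthetic pool
set.seed(1)
pl <- data.frame(a = runif(200, 0.5, 1.8),
                 b = rnorm(200, 0, 1),
                 c = runif(200, 0.05, 0.25))
pool_quality_score(pl, theta_true = 0)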


Discrimination Bands (count of items)

  • Excellent a (≥ 1.2)
  • Good a (0.8–1.2)
  • Fair a (0.5–0.8)
  • Poor a (< 0.5)

Higher counts in “Excellent” and “Good” are desirable; a very large “Poor” bucket would hurt precision.


Difficulty Coverage (b) (count of items)

  • Very Easy (b < −1.5)
  • Easy (−1.5 ≤ b < −0.5)
  • Medium (−0.5 ≤ b ≤ 0.5)
  • Hard (0.5 < b ≤ 1.5)
  • Very Hard (b > 1.5)

A healthy pool spans the full ability range; over-clustering near 0 (Medium) can limit targeting at the tails.
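
As a quick check of coverage, the same bands can be counted directly from a pool data frame (a minimal sketch; the synthetic pl here is a stand-in for the generated pool and only needs a numeric b column):

pl <- data.frame(b = rnorm(200, 0, 1))   # stand-in for the generated pool
# Count items per difficulty band, mirroring the dashboard's cut points
with(pl, c(
  very_easy = sum(b < -1.5),
  easy      = sum(b >= -1.5 & b < -0.5),
  medium    = sum(b >= -0.5 & b <= 0.5),
  hard      = sum(b > 0.5 & b <= 1.5),
  very_hard = sum(b > 1.5)
))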


Why These Matter

  • Higher a ⇒ more information near b → better measurement precision where items are targeted (see the sketch below).
  • Lower c (guessing) ⇒ cleaner signal → a lower error floor for low-ability examinees.
  • Balanced b coverage ⇒ adaptive tests can target across θ → fewer “gaps” at the extremes.

All numbers reflect the current generated pool and will change when the pool is regenerated or filtered.
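
Both effects are easy to visualize with the standard 3PL item information function (a self-contained sketch; P_3pl and I_3pl are local re-definitions of the textbook formulas, not the app's own objects):

# Item information under the 3PL: I(theta) = a^2 * ((P - c)/(1 - c))^2 * (1 - P)/P
P_3pl <- function(theta, a, b, c) c + (1 - c) / (1 + exp(-a * (theta - b)))
I_3pl <- function(theta, a, b, c) {
  P <- P_3pl(theta, a, b, c)
  a^2 * ((P - c) / (1 - c))^2 * ((1 - P) / P)
}

theta <- seq(-3, 3, length.out = 200)
plot(theta, I_3pl(theta, a = 1.5, b = 0, c = 0.00), type = "l", lwd = 2,
     xlab = expression(theta), ylab = "Item information",
     main = "Effect of a and c on item information (b = 0)")
lines(theta, I_3pl(theta, a = 0.8, b = 0, c = 0.00), lwd = 2, lty = 2)   # lower a: flatter curve
lines(theta, I_3pl(theta, a = 1.5, b = 0, c = 0.25), lwd = 2, lty = 3)   # higher c: information lost
legend("topright", lwd = 2, lty = 1:3,
       legend = c("a = 1.5, c = 0.00", "a = 0.8, c = 0.00", "a = 1.5, c = 0.25"))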


Observations

It’s entirely normal for a CAT to stop with SE below the target while \(\hat{\theta}\) is outside our ±SE band around the true \(\theta\) in simulation. Bias is monitored in item calibration and design, not enforced at runtime—because the true \(\theta\) is never known. For simulation purposes, we assume knowledge of the true ability to compute bias and evaluate how well the estimation algorithm performs.

Our stopping rule is SE-based (“stop when \(SE \le \text{target\_SE}\)”), which measures precision, not accuracy to the (sim-known) true \(\theta\).

We are plotting the ±target_SE band around the true \(\theta\) (orange dashed lines). An estimate can have small SE (precise) yet still be biased away from the true value—so it can sit outside that band and the test still stops.


Sources of Bias

  • EAP shrinkage toward the prior \(N(0,1)\).
    If the true \(\theta\) is far from 0, the posterior mean is pulled toward 0 even with low SE (see the sketch after this list).
  • Item pool targeting: not enough high-information items near the true \(\theta\).
  • Guessing parameter \(c\) inflates lower probabilities and shifts the MLE/EAP.
  • Random variation: response draws, top-k selection randomness, grid coarseness, etc.
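
The shrinkage effect is easy to reproduce with a small grid-based EAP (a self-contained sketch; the helper names, item parameters, and replication count are illustrative, not the app's code):

# Grid-based EAP under a N(0, 1) prior, using the standard 3PL response function
P_3pl <- function(theta, a, b, c) c + (1 - c) / (1 + exp(-a * (theta - b)))
eap_theta <- function(y, a, b, c, grid = seq(-4, 4, length.out = 201)) {
  lik <- sapply(grid, function(th) {
    p <- P_3pl(th, a, b, c)
    prod(p^y * (1 - p)^(1 - y))       # likelihood of the response pattern at th
  })
  post <- dnorm(grid) * lik
  post <- post / sum(post)
  sum(grid * post)                    # posterior mean (EAP)
}

# Average EAP over many response sets for a high-ability examinee (true theta = 2)
set.seed(42)
a <- runif(15, 0.8, 1.6); b <- rnorm(15, 0, 1); c <- runif(15, 0.05, 0.20)
est <- replicate(500, {
  y <- rbinom(length(a), 1, P_3pl(2, a, b, c))
  eap_theta(y, a, b, c)
})
mean(est) - 2   # typically negative: estimates are shrunk toward the prior mean of 0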

How to Reduce This in Simulation (Optional Tweaks)

  • Add a stability rule alongside SE:
    stop only if \(SE \le \text{target}\) and \(|\Delta \hat{\theta}|\) over last \(m\) steps < \(\varepsilon\)
    (e.g., \(m=3, \varepsilon=0.05\)); see the sketch after this list.
  • Use a more informative prior (or MAP) centered near the expected ability.
  • Enrich the pool: more items with \(b\) near evolving \(\hat{\theta}\), high \(a\), modest \(c\).
  • Tighten selection (larger top-k or pure max-info) and/or allow more items before stopping.
  • Track \(z = |\hat{\theta} - \theta_{\text{true}}| / SE\);
    a value \(z > 1\) means the estimate lies outside the ±1·SE band despite high precision.
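
A sketch of the combined rule and the z diagnostic (function names, defaults, and example values are illustrative assumptions; the app itself uses the SE-only rule described above):

# Stop only when the SE target is met AND theta_hat has stabilized over the last m steps
should_stop <- function(theta_history, se_current,
                        target_se = 0.30, m = 3, eps = 0.05,
                        min_items = 5, max_items = 40) {
  n <- length(theta_history)
  if (n >= max_items) return(TRUE)    # hard upper bound on test length
  if (n < min_items)  return(FALSE)   # enforce a minimum number of items
  stable <- n > m &&
    all(abs(diff(tail(theta_history, m + 1))) < eps)
  se_current <= target_se && stable
}

# Distance from the (simulation-known) true theta, in SE units
z_stat <- function(theta_hat, theta_true, se) abs(theta_hat - theta_true) / se
# e.g., a final bias of 0.320 with SE = 0.20 gives z = 1.6: precise, yet 1.6 SEs off target
0.320 / 0.20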

Bottom Line

\(SE \le 0.20\) only guarantees a narrow posterior, not closeness to the (simulated) true \(\theta\). So stopping with final bias = 0.320 is entirely consistent with our rule. Operational CATs can’t observe the true \(\theta\), so they rely on observable precision criteria:

  • SE (information) threshold: stop when \(SE(\hat{\theta}) \le 0.30\) (≈ reliability ≥ .91).
  • Min/max items: enforce practical bounds (e.g., 5–10 min, 25–40 max).
  • Stability guard: require small \(|\Delta \hat{\theta}|\) across last few steps.
  • Decision accuracy: mastery tests stop when posterior error < 5% at the cut score.
  • Operational constraints: time, content balance, or exposure limits.

Key Insights

Trade-offs Between Methods:

  • Fixed-form (Random): Often shows better final bias and reliability because it uses diverse items across the ability spectrum, providing more stable estimation.

  • Adaptive Methods (MFI/Top-k): Achieve target precision with far fewer items (more efficient), but may show slightly higher bias due to sequential dependency and concentrated item selection.

What You’ll Observe:

  • Adaptive tests reach the target SE much faster (fewer items)
  • Fixed-form tests often produce more accurate final estimates (lower bias)
  • The choice between methods involves balancing testing efficiency against measurement accuracy

Goal: Compare these trade-offs to understand when each approach might be preferred in real-world testing scenarios.
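
To make the trade-off concrete, here is a rough, self-contained simulation sketch (not the app's code) that pits maximum-information selection with an SE-based stop against a fixed-length random form; the pool, scoring grid, and stopping settings are assumptions chosen for illustration:

# Compare adaptive (max-info, SE-based stop) vs fixed-length random item selection
set.seed(123)
P_3pl <- function(theta, a, b, c) c + (1 - c) / (1 + exp(-a * (theta - b)))
I_3pl <- function(theta, a, b, c) {
  P <- P_3pl(theta, a, b, c)
  a^2 * ((P - c) / (1 - c))^2 * ((1 - P) / P)
}
grid <- seq(-4, 4, length.out = 201)

eap <- function(y, a, b, c) {
  lik <- sapply(grid, function(th) {
    p <- P_3pl(th, a, b, c)
    prod(p^y * (1 - p)^(1 - y))
  })
  post <- dnorm(grid) * lik
  post <- post / sum(post)
  th <- sum(grid * post)
  list(theta = th, se = sqrt(sum((grid - th)^2 * post)))
}

run_sim <- function(pool, theta_true, adaptive = TRUE,
                    target_se = 0.30, max_items = 40) {
  used <- integer(0); y <- numeric(0); est <- list(theta = 0, se = Inf)
  repeat {
    avail <- setdiff(seq_len(nrow(pool)), used)
    pick <- if (adaptive) {
      avail[which.max(I_3pl(est$theta, pool$a[avail], pool$b[avail], pool$c[avail]))]
    } else {
      sample(avail, 1)                 # random selection, runs to the fixed length
    }
    used <- c(used, pick)
    y <- c(y, rbinom(1, 1, P_3pl(theta_true, pool$a[pick], pool$b[pick], pool$c[pick])))
    est <- eap(y, pool$a[used], pool$b[used], pool$c[used])
    if ((adaptive && est$se <= target_se) || length(used) >= max_items) break
  }
  c(n_items = length(used), theta_hat = est$theta,
    bias = est$theta - theta_true, se = est$se)
}

pool <- data.frame(a = runif(300, 0.6, 1.8),
                   b = rnorm(300, 0, 1.2),
                   c = runif(300, 0.05, 0.25))
rbind(adaptive = run_sim(pool, theta_true = 1, adaptive = TRUE),
      random   = run_sim(pool, theta_true = 1, adaptive = FALSE))

Averaging these summaries over many replications (e.g., with replicate()) gives a more stable picture of the efficiency/accuracy trade-off described above.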