Fuzzy Cluster-Adaptive p'-Chart - Morán-Zabala et al. (2026)

The method monitors a nonconformity or discard proportion by subgroup. For each observation \(i\), the following proportion is defined:

where \(X_i\) is the number of non-viable, discarded, or nonconforming units, and \(n_i\) is the subgroup size. In heterogeneous processes, a single global parameter can induce false signals because it mixes different operational regimes. Therefore, Fuzzy C-Means is used to represent latent states with partial memberships.

1.1. Fuzzy C-Means

where \(u_{ij}\) is the membership degree of observation \(i\) in cluster \(j\), \(v_j\) is the centroid, and \(m>1\) is the fuzzification parameter. This code uses \(m=2\) by default.

1.2. Cluster-conditioned central line

The central line of cluster \(j\) is estimated through a fuzzy-membership-weighted mean:

Additionally, to summarize the global performance of a window, the following quantity can be computed:

where \(n_j^{\ast}\) corresponds to the number of observations dominantly assigned to cluster \(j\).

1.3. Regime-specific variability

The extra-binomial or between-subgroup component is obtained by subtracting the expected binomial variability:

1.4. FCA p'-chart adaptive control limits

2. Included validation indices

2.1. Equations of the validation indices

Index	Interpretation	Usual criterion
PC: Partition Coefficient	Measures compact fuzzy partitioning.	Higher is better.
PE: Partition Entropy	Measures partition uncertainty.	Lower is better.
MPC: Modified Partition Coefficient	PC adjusted by number of clusters.	Higher is better.
SIL.F aproximado	Silhouette based on dominant assignment.	Higher is better.
DB: Davies-Bouldin	Dispersion/separation ratio.	Lower is better.
CH: Calinski-Harabasz	Between-group separation relative to within-group compactness.	Higher is better.
Gap aproximado	Comparison against a uniform reference distribution.	Higher value or elbow criterion.

Partition Coefficient (PC). Evaluates how well-defined the fuzzy assignment is. Higher values indicate more concentrated memberships.

Partition Entropy (PE). Measures partition uncertainty. Lower values indicate less ambiguity.

Modified Partition Coefficient (MPC). Corrects PC by the number of clusters.

Davies-Bouldin (DB). Contrasts internal cluster dispersion with centroid separation.

Calinski-Harabasz (CH). Compares between-cluster variability against within-cluster variability.

Signal rate by window. Summarizes the proportion of out-of-control observations in a window.

3. Executable RHTML code

# ==========================================================
# 0. Instalación/carga de paquetes
# ==========================================================
required_pkgs <- c(
  "readxl", "dplyr", "ggplot2", "e1071", "scales", "writexl",
  "tidyr", "cluster", "knitr"
)

install_if_missing <- function(pkgs){
  missing <- pkgs[!pkgs %in% rownames(installed.packages())]
  if(length(missing) > 0){
    install.packages(missing, dependencies = TRUE)
  }
}
install_if_missing(required_pkgs)
invisible(lapply(required_pkgs, library, character.only = TRUE))

knitr::opts_chunk$set(
  echo = TRUE,
  message = FALSE,
  warning = FALSE,
  fig.width = 8.5,
  fig.height = 4.8,
  dpi = 150,
  out.width = '100%'
)

# ==========================================================
# 1. General parameters
# ==========================================================
DATA_PATH <- "EX_DATA.xlsx"       # Change if the file is in another path
base_out_dir <- "outputs_pcharts_rhtml"
dir.create(base_out_dir, showWarnings = FALSE, recursive = TRUE)

exploratory <- 200      # initial reference window
window_size <- 1000     # sequential window size
m_fuzz      <- 2        # FCM fuzzification parameter
k_sigma     <- 3        # Shewhart-type limit width

k_min <- 2
k_max <- 10
K_MAX <- 10

# K_ELEGIDO define la solución final que se desea visualizar de forma general.
# This study recommends starting with 4 clusters, but it can be changed
# to 5, 6, ..., 10 if the validation indices and operational interpretation
# justify a more granular partition.
K_ELEGIDO <- 4

# Output sizes adjusted to Letter/A4 paper.
# These values prevent oversized figures when exporting
# o al compilar/imprimir el RHTML.
FIG_W_STD <- 8.5       # standard width for individual plots
FIG_H_STD <- 4.8       # standard height for individual plots
FIG_W_LAND <- 10.5     # maximum recommended width in landscape page
FIG_H_LAND <- 7.2      # maximum recommended height in landscape page
FIG_W_PORT <- 7.5      # ancho recomendado en hoja vertical
FIG_H_PORT <- 9.8      # alto recomendado en hoja vertical
DPI_EXPORT <- 180      # sufficient report resolution without excessive file sizes
PLOT_BASE_SIZE <- 10   # compact base size so legends and facets fit on the page
if(K_ELEGIDO < 4){
  stop("K_ELEGIDO must be 4 or greater according to the analysis request.")
}
if(K_ELEGIDO < k_min || K_ELEGIDO > k_max){
  stop("K_ELEGIDO must be within the k_min:k_max range.")
}

CTRL_LTY <- c("UCL" = "longdash", "LCL" = "dotdash", "CCL" = "solid")
CTRL_LWD <- c("UCL" = 1.40, "LCL" = 1.20, "CCL" = 0.85)
CTRL_ALPHA <- c("UCL" = 0.95, "LCL" = 0.95, "CCL" = 0.55)

4. Auxiliary functions

# ==========================================================
# 2. Auxiliary functions for plots, windows, FCM, and indices
# ==========================================================
get_cluster_palette <- function(k){
  cols <- scales::hue_pal()(k)
  names(cols) <- as.character(seq_len(k))
  cols
}

limits_long <- function(out_df){
  dplyr::bind_rows(
    data.frame(week = out_df$week, limit = "UCL", value = out_df$UCL),
    data.frame(week = out_df$week, limit = "LCL", value = out_df$LCL),
    data.frame(week = out_df$week, limit = "CCL", value = out_df$CCL)
  ) %>%
    dplyr::mutate(
      limit = factor(limit, levels = c("UCL", "LCL", "CCL")),
      linewidth = unname(CTRL_LWD[as.character(limit)]),
      alpha     = unname(CTRL_ALPHA[as.character(limit)])
    )
}

make_windows <- function(N, exploratory = 200, window_size = 1000){
  windows <- list()
  windows[[1]] <- list(id = 0, start = 1, end = min(exploratory, N))
  idx <- exploratory + 1
  wid <- 1
  while(idx <= N){
    endi <- min(idx + window_size - 1, N)
    windows[[length(windows) + 1]] <- list(id = wid, start = idx, end = endi)
    idx <- endi + 1
    wid <- wid + 1
  }
  windows
}

run_fcm_1d <- function(x, centers = 3, m = 2, seed = 1){
  set.seed(seed)
  fit <- e1071::cmeans(
    as.matrix(x), centers = centers, m = m,
    iter.max = 300, verbose = FALSE, method = "cmeans"
  )
  list(centers = as.numeric(fit$centers), U = fit$membership, fit = fit)
}

cluster_stats_pchart <- function(x, n_i, U, m = 2){
  Um <- U^m
  wsum <- colSums(Um) + 1e-12
  mu <- colSums(Um * x) / wsum

  diff2 <- (matrix(x, nrow = length(x), ncol = ncol(U)) -
              matrix(mu, nrow = length(x), ncol = ncol(U), byrow = TRUE))^2
  sig2_total <- colSums(Um * diff2) / wsum
  nbar <- colSums(Um * n_i) / wsum

  sig2_between <- pmax(
    0,
    (sig2_total - (mu * (1 - mu)) / nbar) / (1 - 1 / nbar)
  )

  list(mu = mu, sig2_total = sig2_total, nbar = nbar, sig2_between = sig2_between)
}

reorder_by_mu <- function(mu, U, centers, sig2_total, nbar, sig2_between){
  ord <- order(mu)
  list(
    centers = centers[ord],
    U = U[, ord, drop = FALSE],
    mu = mu[ord],
    sig2_total = sig2_total[ord],
    nbar = nbar[ord],
    sig2_between = sig2_between[ord]
  )
}

limits_regime_adaptive <- function(x, n_i, U, mu, sig2_between, k = 3){
  j_star <- max.col(U, ties.method = "first")
  mu_star <- mu[j_star]
  sig2b_star <- sig2_between[j_star]
  var_hat <- (mu_star * (1 - mu_star)) / n_i + sig2b_star * (1 - 1 / n_i)
  sd_hat <- sqrt(pmax(var_hat, 0))
  ucl <- pmin(1, pmax(0, mu_star + k * sd_hat))
  lcl <- pmin(1, pmax(0, mu_star - k * sd_hat))
  list(cluster = j_star, lcl = lcl, ucl = ucl, cl = mu_star, var_hat = var_hat)
}

cluster_limits_rep <- function(mu, nbar, sig2_between, k = 3){
  var_rep <- (mu * (1 - mu)) / nbar + sig2_between * (1 - 1 / nbar)
  sd_rep  <- sqrt(pmax(var_rep, 0))
  data.frame(
    cluster = seq_along(mu),
    CL_rep  = mu,
    LCL_rep = pmin(1, pmax(0, mu - k * sd_rep)),
    UCL_rep = pmin(1, pmax(0, mu + k * sd_rep))
  )
}

p_fuzzy_global <- function(mu, dom_count){
  if(sum(dom_count) == 0) return(NA_real_)
  w <- dom_count / sum(dom_count)
  sum(w * mu)
}

enforce_u_cols <- function(df_in, K_MAX = 10, after_col = "signal"){
  u_cols <- paste0("u_", 1:K_MAX)
  for(uc in u_cols){
    if(!uc %in% names(df_in)) df_in[[uc]] <- NA_real_
  }
  df_in[u_cols] <- lapply(df_in[u_cols], function(v) as.numeric(v))
  if(after_col %in% names(df_in)){
    df_in <- df_in %>% dplyr::relocate(dplyr::all_of(u_cols), .after = dplyr::all_of(after_col))
  } else {
    df_in <- df_in %>% dplyr::relocate(dplyr::all_of(u_cols), .after = dplyr::last_col())
  }
  df_in
}

# ---------- Validation indices ----------
partition_coefficient <- function(U){
  mean(rowSums(U^2))
}

partition_entropy <- function(U){
  U_safe <- pmax(U, 1e-12)
  -mean(rowSums(U_safe * log(U_safe)))
}

modified_partition_coefficient <- function(U){
  c <- ncol(U)
  pc <- partition_coefficient(U)
  (pc - 1 / c) / (1 - 1 / c)
}

davies_bouldin_1d <- function(x, clusters){
  labs <- sort(unique(clusters))
  centers <- sapply(labs, function(g) mean(x[clusters == g]))
  S <- sapply(labs, function(g) mean(abs(x[clusters == g] - mean(x[clusters == g]))))
  R <- matrix(NA_real_, length(labs), length(labs))
  for(a in seq_along(labs)){
    for(b in seq_along(labs)){
      if(a != b){
        R[a, b] <- (S[a] + S[b]) / (abs(centers[a] - centers[b]) + 1e-12)
      }
    }
  }
  mean(apply(R, 1, max, na.rm = TRUE))
}

calinski_harabasz_1d <- function(x, clusters){
  N <- length(x)
  labs <- sort(unique(clusters))
  K <- length(labs)
  grand_mean <- mean(x)
  B <- sum(sapply(labs, function(g) sum(clusters == g) * (mean(x[clusters == g]) - grand_mean)^2))
  W <- sum(sapply(labs, function(g) sum((x[clusters == g] - mean(x[clusters == g]))^2)))
  (B / (K - 1)) / (W / (N - K) + 1e-12)
}

silhouette_hard_1d <- function(x, clusters){
  if(length(unique(clusters)) < 2) return(NA_real_)
  sil <- cluster::silhouette(clusters, dist(as.matrix(x)))
  mean(sil[, "sil_width"])
}

gap_approx_1d <- function(x, K, B = 20, m = 2){
  # Gap-type approximation using FCM: compares observed log(Wk) against uniform references.
  W_obs <- local({
    fcm <- run_fcm_1d(x, centers = K, m = m, seed = 100 + K)
    U <- fcm$U; centers <- fcm$centers
    sum((U^m) * (as.numeric(outer(x, centers, "-"))^2))
  })
  refs <- replicate(B, {
    xr <- runif(length(x), min(x), max(x))
    fcmr <- run_fcm_1d(xr, centers = K, m = m, seed = sample.int(1e6, 1))
    sum((fcmr$U^m) * (as.numeric(outer(xr, fcmr$centers, "-"))^2))
  })
  mean(log(refs + 1e-12)) - log(W_obs + 1e-12)
}

5. Data loading and preparation

# ==========================================================
# 3. Dataset loading and preparation
# ==========================================================
if(!file.exists(DATA_PATH)){
  stop(paste0(
    "File not found: ", DATA_PATH,
    ". Place EX_DATA.xlsx in the working directory or update DATA_PATH."
  ))
}

EX_DATA <- readxl::read_excel(DATA_PATH)

required_cols <- c(
  "Semana",
  "Número de muestras tomadas",
  "Número de muestras descartadas (no viables)"
)
missing_cols <- setdiff(required_cols, names(EX_DATA))
if(length(missing_cols) > 0){
  stop(paste("Missing columns in the Excel file:", paste(missing_cols, collapse = ", ")))
}

df <- EX_DATA %>%
  dplyr::rename(
    week = `Semana`,
    n_taken = `Número de muestras tomadas`,
    x_discarded = `Número de muestras descartadas (no viables)`
  ) %>%
  dplyr::mutate(
    week = as.integer(round(week)),
    n = as.integer(round(n_taken)),
    X = as.integer(round(x_discarded)),
    n = pmax(n, 1L),
    X = pmin(pmax(X, 0L), n),
    p = X / n
  ) %>%
  dplyr::select(week, n, X, p)

N <- nrow(df)
windows <- make_windows(N, exploratory, window_size)

knitr::kable(head(df, 10), caption = "First observations prepared for the FCA p'-chart")

First observations prepared for the FCA p'-chart
week	n	X	p
1	13477	3850	0.2856719
2	12857	5330	0.4145602
3	12559	2690	0.2141890
4	10223	10223	1.0000000
5	11894	3950	0.3321002
6	13240	4340	0.3277946
7	12980	7830	0.6032357
8	10439	4590	0.4396973
9	8742	4720	0.5399222
10	11331	3010	0.2656429

data_quality <- data.frame(
  observations = N,
  min_week = min(df$week, na.rm = TRUE),
  max_week = max(df$week, na.rm = TRUE),
  min_n = min(df$n, na.rm = TRUE),
  max_n = max(df$n, na.rm = TRUE),
  min_p = min(df$p, na.rm = TRUE),
  mean_p = mean(df$p, na.rm = TRUE),
  max_p = max(df$p, na.rm = TRUE),
  windows_created = length(windows)
)
knitr::kable(data_quality, digits = 4, caption = "Data quality and dataset structure summary")

Data quality and dataset structure summary
observations	min_week	max_week	min_n	max_n	min_p	mean_p	max_p	windows_created
11539	1	11539	152	29670	0.0016	0.5912	1	13

6. Overdispersion diagnosis

A global variance ratio is computed. If the observed variance of the proportions substantially exceeds the expected binomial variance, there is evidence of overdispersion.

# ==========================================================
# 4. Analytical and graphical overdispersion diagnosis
# ==========================================================
p_pool <- sum(df$X) / sum(df$n)
sampling_var <- mean(p_pool * (1 - p_pool) / df$n)
total_var <- stats::var(df$p)
variance_ratio <- total_var / sampling_var

od_table <- data.frame(
  N = N,
  p_pool = p_pool,
  sampling_variance = sampling_var,
  total_variance = total_var,
  variance_ratio = variance_ratio,
  threshold_Heiman = 1.357,
  diagnosis = ifelse(variance_ratio > 1.357, "Overdispersion detected", "No severe overdispersion")
)
knitr::kable(od_table, digits = 6, caption = "Analytical overdispersion diagnosis")

Analytical overdispersion diagnosis
N	p_pool	sampling_variance	total_variance	variance_ratio	threshold_Heiman	diagnosis
11539	0.556322	3.2e-05	0.087079	2716.212	1.357	Overdispersion detected

# Arcsine-square-root transformation for visual diagnosis
od_plot_df <- df %>%
  dplyr::mutate(z = asin(sqrt(p)))

ggplot(od_plot_df, aes(sample = z)) +
  stat_qq(size = 1.2, alpha = 0.65) +
  stat_qq_line(linewidth = 0.55) +
  labs(
    title = "Graphical overdispersion diagnosis",
    subtitle = "Normal Q-Q plot of the arcsine-square-root transformation",
    x = "Theoretical quantiles", y = "Transformed sample quantiles"
  ) +
  theme_minimal(base_size = PLOT_BASE_SIZE)

7. Validation of the number of clusters

# ==========================================================
# 5. Validation indices for K = 2,...,10
# ==========================================================
set.seed(123)
validity_rows <- list()
x_all <- df$p

for(K in k_min:k_max){
  fcm <- run_fcm_1d(x_all, centers = K, m = m_fuzz, seed = 1000 + K)
  U <- fcm$U
  clusters <- max.col(U, ties.method = "first")

  validity_rows[[length(validity_rows) + 1]] <- data.frame(
    K = K,
    PC = partition_coefficient(U),
    PE = partition_entropy(U),
    MPC = modified_partition_coefficient(U),
    SIL_F_approx = silhouette_hard_1d(x_all, clusters),
    DB = davies_bouldin_1d(x_all, clusters),
    CH = calinski_harabasz_1d(x_all, clusters),
    GAP_approx = gap_approx_1d(x_all, K, B = 20, m = m_fuzz)
  )
}

validity_indices <- dplyr::bind_rows(validity_rows)
knitr::kable(validity_indices, digits = 4, caption = "Cluster validation indices")

Cluster validation indices
K	PC	PE	MPC	SIL_F_approx	DB	CH	GAP_approx
2	0.8726	0.2177	0.7451	0.6673	0.4502	40716.30	0.0835
3	0.8236	0.3220	0.7355	0.6302	0.4904	55698.78	0.1353
4	0.7910	0.3972	0.7213	0.6067	0.5070	68179.34	0.1367
5	0.7880	0.4189	0.7350	0.6161	0.4882	86885.24	0.1985
6	0.7745	0.4568	0.7294	0.6112	0.4922	101313.90	0.1973
7	0.7676	0.4801	0.7289	0.6074	0.4936	115481.84	0.2040
8	0.7682	0.4876	0.7351	0.6155	0.4899	135049.97	0.2311
9	0.7644	0.5025	0.7349	0.6153	0.4850	151716.67	0.2427
10	0.7585	0.5198	0.7317	0.6105	0.4899	165648.00	0.2369

validity_long <- validity_indices %>%
  tidyr::pivot_longer(cols = -K, names_to = "Index", values_to = "Value")

ggplot(validity_long, aes(x = K, y = Value)) +
  geom_line(linewidth = 0.8) +
  geom_point(size = 2) +
  facet_wrap(~ Index, scales = "free_y", ncol = 2) +
  scale_x_continuous(breaks = k_min:k_max) +
  labs(
    title = "Validation of the number of clusters",
    subtitle = "PC, MPC, SIL_F, and CH are maximized; PE and DB are minimized; GAP is analyzed by maximum value or elbow criterion",
    x = "Number of clusters K", y = "Index value"
  ) +
  theme_minimal(base_size = PLOT_BASE_SIZE)

best_summary <- data.frame(
  criterion = c("max_PC", "min_PE", "max_MPC", "max_SIL_F", "min_DB", "max_CH", "max_GAP"),
  selected_K = c(
    validity_indices$K[which.max(validity_indices$PC)],
    validity_indices$K[which.min(validity_indices$PE)],
    validity_indices$K[which.max(validity_indices$MPC)],
    validity_indices$K[which.max(validity_indices$SIL_F_approx)],
    validity_indices$K[which.min(validity_indices$DB)],
    validity_indices$K[which.max(validity_indices$CH)],
    validity_indices$K[which.max(validity_indices$GAP_approx)]
  )
)
knitr::kable(best_summary, caption = "K suggested by each criterion")

K suggested by each criterion
criterion	selected_K
max_PC	2
min_PE	2
max_MPC	2
max_SIL_F	2
min_DB	2
max_CH	10
max_GAP	9

write_xlsx(list(Cluster_Validity = validity_indices, Best_K_By_Criterion = best_summary),
           path = file.path(base_out_dir, "cluster_validity_indices.xlsx"))

8. Main execution of the FCA p'-chart

The routine loops over \(K=2,\ldots,10\), divides the series into an initial exploratory window and sequential windows, computes FCM, estimates cluster-level parameters, constructs adaptive limits, and exports the results.

8.1. Calculations performed within each window

# ==========================================================
# 6. Main loop: FCA p'-chart for K = 2,...,10
# ==========================================================
master_detail_rows <- list()
master_summary_cluster_rows <- list()
master_summary_window_pfuzzy_rows <- list()
master_summary_window_rows <- list()
master_cluster_dist_rows <- list()

for(c_clusters in k_min:k_max){

  message("Running c_clusters = ", c_clusters)

  out_dir <- file.path(base_out_dir, sprintf("K_%02d", c_clusters))
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

  out_pdf <- file.path(out_dir, sprintf("pcharts_fuzzy_windowing_report_%dclusters.pdf", c_clusters))
  cluster_cols <- get_cluster_palette(c_clusters)

  detail_rows <- list()
  summary_rows <- list()
  summary_window_rows <- list()

  grDevices::pdf(out_pdf, width = FIG_W_LAND, height = FIG_H_LAND, onefile = TRUE)

  for(w in windows){
    wid <- w$id
    s <- w$start
    e <- w$end
    ww <- df[s:e, , drop = FALSE]
    x <- ww$p
    n_i <- ww$n

    fcm <- run_fcm_1d(x, centers = c_clusters, m = m_fuzz, seed = wid + 1)
    stats <- cluster_stats_pchart(x, n_i, fcm$U, m = m_fuzz)

    re <- reorder_by_mu(
      stats$mu, fcm$U, fcm$centers,
      stats$sig2_total, stats$nbar, stats$sig2_between
    )

    U <- re$U
    mu <- re$mu
    sig2_total <- re$sig2_total
    nbar <- re$nbar
    sig2_between <- re$sig2_between

    lim <- limits_regime_adaptive(x, n_i, U, mu, sig2_between, k = k_sigma)

    out <- ww %>%
      dplyr::mutate(
        window_id = wid,
        idx_global = s:e,
        k_clusters = c_clusters,
        cluster = lim$cluster,
        CCL = lim$cl,
        LCL = lim$lcl,
        UCL = lim$ucl,
        var_hat = lim$var_hat,
        signal = (p > UCL) | (p < LCL),
        cluster_f = factor(cluster, levels = 1:c_clusters),
        ooc_flag = signal,
        ooc_lab = factor(ifelse(ooc_flag, "OOC", NA_character_), levels = "OOC")
      )

    for(j in 1:K_MAX){
      out[[paste0("u_", j)]] <- if(j <= c_clusters) U[, j] else NA_real_
    }
    out <- enforce_u_cols(out, K_MAX = K_MAX, after_col = "signal")
    detail_rows[[length(detail_rows) + 1]] <- out

    rep_limits <- cluster_limits_rep(mu, nbar, sig2_between, k = k_sigma)
    dom_counts <- integer(c_clusters)

    for(j in 1:c_clusters){
      dom_count <- sum(out$cluster == j)
      dom_counts[j] <- dom_count
      avg_max_mem <- if(dom_count > 0) mean(out[[paste0("u_", j)]][out$cluster == j]) else NA_real_
      signals_j <- sum(out$signal & out$cluster == j)
      rep_row <- rep_limits %>% dplyr::filter(cluster == j)

      summary_rows[[length(summary_rows) + 1]] <- data.frame(
        k_clusters = c_clusters,
        window_id = wid,
        start_idx = s,
        end_idx = e,
        n_obs = nrow(out),
        cluster = j,
        mu_hat = mu[j],
        sig2_total = sig2_total[j],
        nbar = nbar[j],
        sig2_between_hat = sig2_between[j],
        CL_rep = rep_row$CL_rep,
        LCL_rep = rep_row$LCL_rep,
        UCL_rep = rep_row$UCL_rep,
        dominant_count = dom_count,
        avg_max_membership = avg_max_mem,
        signals = signals_j
      )
    }

    p_fg <- p_fuzzy_global(mu, dom_counts)
    summary_window_rows[[length(summary_window_rows) + 1]] <- data.frame(
      k_clusters = c_clusters,
      window_id = wid,
      start_idx = s,
      end_idx = e,
      n_obs = nrow(out),
      p_fuzzy_global = p_fg
    )

    # ---------- Plot 1: FCA p-chart by window ----------
    lim_df <- limits_long(out)
    x_start <- min(out$week)
    x_end <- max(out$week)

    g1 <- ggplot(out, aes(x = week)) +
      geom_line(
        data = lim_df,
        aes(x = week, y = value, linetype = limit, group = limit,
            linewidth = linewidth, alpha = alpha),
        color = "black"
      ) +
      scale_linetype_manual(name = "Control limits", values = CTRL_LTY) +
      scale_linewidth_identity(guide = "none") +
      scale_alpha_identity(guide = "none") +
      geom_line(aes(y = p, color = cluster_f), linewidth = 0.35, alpha = 0.9) +
      geom_point(aes(y = p, color = cluster_f), size = 0.75, alpha = 0.9) +
      geom_point(
        data = out %>% dplyr::filter(ooc_flag),
        aes(y = p, shape = ooc_lab),
        color = "red", size = 1.5, stroke = 0.8
      ) +
      geom_hline(yintercept = -0.02, color = "black", linewidth = 0.55) +
      scale_color_manual(values = cluster_cols, limits = as.character(1:c_clusters),
                         name = "Cluster", drop = FALSE) +
      scale_shape_manual(name = NULL, values = c("OOC" = 4), drop = FALSE) +
      coord_cartesian(ylim = c(-0.02, 1.02)) +
      labs(
        title = sprintf("Window %d (rows %d-%d, n=%d): FCA p-Chart - %d clusters",
                        wid, s, e, nrow(out), c_clusters),
        x = "Weeks", y = expression(p[i])
      ) +
      scale_x_continuous(breaks = c(x_start, x_end), labels = c(x_start, x_end),
                         minor_breaks = NULL, expand = c(0, 0)) +
      theme_classic(base_size = PLOT_BASE_SIZE) +
      theme(plot.title = element_text(face = "bold", size = 12), legend.position = "right",
            axis.text.x = element_text(size = 8), axis.ticks.x = element_blank(),
            axis.line.y = element_line(color = "black", linewidth = 0.55),
            axis.line.x = element_blank())

    print(g1)
    ggsave(file.path(out_dir, sprintf("window_%02d_pchart_%dclusters.png", wid, c_clusters)),
           g1, width = FIG_W_LAND, height = FIG_H_LAND, units = "in", dpi = DPI_EXPORT)

    # ---------- Plot 2: membership diagnostic ----------
    dom_u <- sapply(1:nrow(out), function(i) U[i, out$cluster[i]])
    out2 <- out %>% dplyr::mutate(dom_membership = dom_u)

    g2 <- ggplot(out2, aes(x = p, y = dom_membership, color = cluster_f)) +
      geom_point(size = 1.3, alpha = 0.75) +
      scale_color_manual(values = cluster_cols, limits = as.character(1:c_clusters),
                         name = "Cluster", drop = FALSE) +
      coord_cartesian(xlim = c(-0.02, 1.02), ylim = c(-0.02, 1.02)) +
      labs(
        title = sprintf("Window %d: Membership diagnostics - %d clusters", wid, c_clusters),
        subtitle = "Dominant membership degree vs observed proportion p",
        x = expression(p[i]), y = "Dominant membership degree"
      ) +
      theme_classic(base_size = PLOT_BASE_SIZE) +
      theme(plot.title = element_text(face = "bold", size = 12), legend.position = "right")

    print(g2)
    ggsave(file.path(out_dir, sprintf("window_%02d_membership_%dclusters.png", wid, c_clusters)),
           g2, width = FIG_W_LAND, height = FIG_H_LAND, units = "in", dpi = DPI_EXPORT)
  }

  grDevices::dev.off()

  # ---------- Export by K ----------
  detail_all <- dplyr::bind_rows(detail_rows) %>%
    dplyr::mutate(cluster_f = factor(cluster, levels = 1:c_clusters))
  detail_all <- enforce_u_cols(detail_all, K_MAX = K_MAX, after_col = "signal")

  summary_cluster_all <- dplyr::bind_rows(summary_rows)
  summary_window_pfuzzy <- dplyr::bind_rows(summary_window_rows)

  summary_window_all <- detail_all %>%
    dplyr::group_by(k_clusters, window_id) %>%
    dplyr::summarise(
      n_obs = dplyr::n(),
      signals = sum(signal),
      signal_rate = mean(signal),
      p_mean = mean(p),
      p_sd = sd(p),
      .groups = "drop"
    )

  cluster_dist <- detail_all %>%
    dplyr::group_by(k_clusters, window_id, cluster_f) %>%
    dplyr::summarise(count = dplyr::n(), signals = sum(signal), .groups = "drop") %>%
    dplyr::group_by(k_clusters, window_id) %>%
    dplyr::mutate(pct_within_window = count / sum(count)) %>%
    dplyr::ungroup()

  writexl::write_xlsx(
    list(
      All_Observations = detail_all,
      Window_Cluster_Summary = summary_cluster_all,
      Window_pFuzzy_Global = summary_window_pfuzzy,
      Window_Summary = summary_window_all,
      Window_Cluster_Distribution = cluster_dist
    ),
    path = file.path(out_dir, sprintf("FCA_p_results_%dclusters.xlsx", c_clusters))
  )

  # ---------- Global plots by K ----------
  lim_df_all <- dplyr::bind_rows(
    data.frame(idx_global = detail_all$idx_global, window_id = detail_all$window_id,
               limit = "UCL", value = detail_all$UCL),
    data.frame(idx_global = detail_all$idx_global, window_id = detail_all$window_id,
               limit = "LCL", value = detail_all$LCL),
    data.frame(idx_global = detail_all$idx_global, window_id = detail_all$window_id,
               limit = "CCL", value = detail_all$CCL)
  ) %>%
    dplyr::mutate(limit = factor(limit, levels = c("UCL", "LCL", "CCL")),
                  linewidth = unname(CTRL_LWD[as.character(limit)]),
                  alpha = unname(CTRL_ALPHA[as.character(limit)]))

  g_global_facet <- ggplot(detail_all, aes(x = idx_global)) +
    geom_line(data = lim_df_all,
              aes(x = idx_global, y = value, linetype = limit,
                  group = interaction(window_id, limit), linewidth = linewidth, alpha = alpha),
              color = "black") +
    scale_linetype_manual(name = "Control limits", values = CTRL_LTY) +
    scale_linewidth_identity(guide = "none") +
    scale_alpha_identity(guide = "none") +
    geom_point(aes(y = p, color = cluster_f), size = 1.0, alpha = 0.85) +
    geom_point(data = detail_all %>% dplyr::filter(signal),
               aes(y = p, shape = "OOC"), color = "red", size = 2.0, stroke = 1.0) +
    scale_color_manual(values = cluster_cols, limits = as.character(1:c_clusters),
                       name = "Cluster", drop = FALSE) +
    scale_shape_manual(name = NULL, values = c("OOC" = 4)) +
    facet_wrap(~ window_id, scales = "free_x", ncol = 2) +
    coord_cartesian(ylim = c(-0.02, 1.02)) +
    labs(title = sprintf("Global FCA p-Chart faceted by window - %d clusters", c_clusters),
         subtitle = "Limits computed window-by-window", x = "Global index", y = expression(p[i])) +
    theme_minimal(base_size = PLOT_BASE_SIZE)

  ggsave(file.path(out_dir, sprintf("GLOBAL_pchart_facet_by_window_%dclusters.png", c_clusters)),
         g_global_facet, width = FIG_W_LAND, height = FIG_H_LAND, units = "in", dpi = DPI_EXPORT)

  g_global_single <- ggplot(detail_all, aes(x = idx_global)) +
    geom_line(aes(y = UCL), color = "black", linewidth = 0.35,
              alpha = CTRL_ALPHA["UCL"], linetype = CTRL_LTY["UCL"]) +
    geom_line(aes(y = LCL), color = "black", linewidth = 0.35,
              alpha = CTRL_ALPHA["LCL"], linetype = CTRL_LTY["LCL"]) +
    geom_line(aes(y = CCL), color = "black", linewidth = 0.25,
              alpha = CTRL_ALPHA["CCL"], linetype = CTRL_LTY["CCL"]) +
    geom_point(aes(y = p, color = cluster_f), size = 0.6, alpha = 0.6) +
    geom_point(data = detail_all %>% dplyr::filter(signal),
               aes(y = p, shape = "OOC"), color = "red", size = 1.6, stroke = 0.9) +
    scale_color_manual(values = cluster_cols, limits = as.character(1:c_clusters),
                       name = "Cluster", drop = FALSE) +
    scale_shape_manual(name = NULL, values = c("OOC" = 4)) +
    coord_cartesian(ylim = c(-0.02, 1.02)) +
    labs(title = sprintf("Global FCA p-Chart single view - %d clusters", c_clusters),
         subtitle = "All observations in one panel", x = "Global index", y = expression(p[i])) +
    theme_minimal(base_size = PLOT_BASE_SIZE)

  ggsave(file.path(out_dir, sprintf("GLOBAL_pchart_single_%dclusters.png", c_clusters)),
         g_global_single, width = FIG_W_LAND, height = FIG_H_LAND, units = "in", dpi = DPI_EXPORT)

  master_detail_rows[[length(master_detail_rows) + 1]] <- detail_all
  master_summary_cluster_rows[[length(master_summary_cluster_rows) + 1]] <- summary_cluster_all
  master_summary_window_pfuzzy_rows[[length(master_summary_window_pfuzzy_rows) + 1]] <- summary_window_pfuzzy
  master_summary_window_rows[[length(master_summary_window_rows) + 1]] <- summary_window_all
  master_cluster_dist_rows[[length(master_cluster_dist_rows) + 1]] <- cluster_dist
}

9. Master summary and final comparative plots

# ==========================================================
# 7. Master export and summary plots
# ==========================================================
master_detail_all <- dplyr::bind_rows(master_detail_rows)
master_summary_cluster_all <- dplyr::bind_rows(master_summary_cluster_rows)
master_summary_window_pfuzzy_all <- dplyr::bind_rows(master_summary_window_pfuzzy_rows)
master_summary_window_all <- dplyr::bind_rows(master_summary_window_rows)
master_cluster_dist_all <- dplyr::bind_rows(master_cluster_dist_rows)

master_detail_all <- enforce_u_cols(master_detail_all, K_MAX = K_MAX, after_col = "signal")

master_xlsx <- file.path(base_out_dir, sprintf("MASTER_FCA_p_results_K%02d_to_K%02d.xlsx", k_min, k_max))
writexl::write_xlsx(
  list(
    All_Observations_AllK = master_detail_all,
    Window_Cluster_Summary_AllK = master_summary_cluster_all,
    Window_pFuzzy_Global_AllK = master_summary_window_pfuzzy_all,
    Window_Summary_AllK = master_summary_window_all,
    Window_Cluster_Distribution_AllK = master_cluster_dist_all,
    Cluster_Validity = validity_indices
  ),
  path = master_xlsx
)

signal_by_K <- master_summary_window_all %>%
  dplyr::group_by(k_clusters) %>%
  dplyr::summarise(
    total_signals = sum(signals),
    mean_signal_rate = mean(signal_rate),
    .groups = "drop"
  )

knitr::kable(signal_by_K, digits = 4, caption = "OOC signal summary by number of clusters")

OOC signal summary by number of clusters
k_clusters	total_signals	mean_signal_rate
2	0	0.0000
3	6	0.0005
4	99	0.0093
5	159	0.0139
6	156	0.0131
7	171	0.0142
8	154	0.0129
9	145	0.0125
10	150	0.0129

ggplot(signal_by_K, aes(x = k_clusters, y = total_signals)) +
  geom_line(linewidth = 0.55) +
  geom_point(size = 2.5) +
  scale_x_continuous(breaks = k_min:k_max) +
  labs(
    title = "Total out-of-control signals by K",
    subtitle = "Compares the sensitivity of the FCA p'-chart against the number of clusters",
    x = "Number of clusters K", y = "Total OOC signals"
  ) +
  theme_minimal(base_size = PLOT_BASE_SIZE)

# Simple automatic recommendation: lowest number of signals with validity balance.
recommended_K_by_signals <- signal_by_K$k_clusters[which.min(signal_by_K$total_signals)]
cat("Output directory:", normalizePath(base_out_dir), "\n")

## Output directory: /Users/jeanpaul/Documents/Master's/Software_registration/FCA_p-chart/outputs_pcharts_rhtml

cat("Archivo maestro:", normalizePath(master_xlsx), "\n")

## Archivo maestro: /Users/jeanpaul/Documents/Master's/Software_registration/FCA_p-chart/outputs_pcharts_rhtml/MASTER_FCA_p_results_K02_to_K10.xlsx

cat("K selected for the general plot:", K_ELEGIDO, "\n")

## K selected for the general plot: 4

cat("K with the lowest number of OOC signals:", recommended_K_by_signals, "\n")

## K with the lowest number of OOC signals: 2

10. General FCA p'-chart plot for the selected number of clusters

The general plot makes it possible to observe four elements simultaneously: the observed proportion \(p_i\), the dominant cluster of each observation, the adaptive limits \(LCL_i, CCL_i, UCL_i\), and the out-of-control signals. Mathematically, for the selected number of clusters \(K^*\), each observation is represented by:

Where \(j_i^*\) is the dominant cluster. The main reading is: if a red point appears outside the limits, the deviation should be interpreted relative to the operational regime to which it belongs, not relative to a single global mean.

# ==========================================================
# 8. General plot for the selected number of clusters
# ==========================================================
selected_detail <- master_detail_all %>%
  dplyr::filter(k_clusters == K_ELEGIDO) %>%
  dplyr::mutate(
    cluster_f = factor(cluster, levels = 1:K_ELEGIDO),
    signal_label = factor(ifelse(signal, "OOC", NA_character_), levels = "OOC")
  )

if(nrow(selected_detail) == 0){
  stop("No hay resultados disponibles para K_ELEGIDO. Verifique que el loop principal haya incluido ese K.")
}

selected_detail$dom_membership <- mapply(
  FUN = function(row_id, cl){
    selected_detail[[paste0("u_", cl)]][row_id]
  },
  row_id = seq_len(nrow(selected_detail)),
  cl = selected_detail$cluster
)

selected_cols <- get_cluster_palette(K_ELEGIDO)

selected_summary <- selected_detail %>%
  dplyr::summarise(
    selected_K = unique(k_clusters),
    observations = dplyr::n(),
    windows = dplyr::n_distinct(window_id),
    total_signals = sum(signal),
    signal_rate = mean(signal),
    p_mean = mean(p),
    p_sd = sd(p),
    .groups = "drop"
  )

knitr::kable(selected_summary, digits = 4,
             caption = paste("General summary of the FCA p'-chart for selected K =", K_ELEGIDO))

General summary of the FCA p'-chart for selected K = 4
selected_K	observations	windows	total_signals	signal_rate	p_mean	p_sd
4	11539	13	99	0.0086	0.5912	0.2951

selected_cluster_summary <- selected_detail %>%
  dplyr::group_by(cluster) %>%
  dplyr::summarise(
    observations = dplyr::n(),
    porcentaje = observations / nrow(selected_detail),
    signals = sum(signal),
    tasa_signals = mean(signal),
    p_promedio = mean(p),
    CCL_promedio = mean(CCL),
    LCL_promedio = mean(LCL),
    UCL_promedio = mean(UCL),
    membresia_promedio = mean(dom_membership, na.rm = TRUE),
    .groups = "drop"
  )

knitr::kable(selected_cluster_summary, digits = 4,
             caption = paste("General characterization by cluster for selected K =", K_ELEGIDO))

General characterization by cluster for selected K = 4
cluster	observations	porcentaje	signals	tasa_signals	p_promedio	CCL_promedio	LCL_promedio	UCL_promedio	membresia_promedio
1	2759	0.2391	4	0.0014	0.2232	0.2198	0.0133	0.4372	0.8615
2	3192	0.2766	0	0.0000	0.4429	0.4411	0.2382	0.6439	0.8089
3	2323	0.2013	0	0.0000	0.6932	0.6923	0.4694	0.9153	0.8070
4	3265	0.2830	95	0.0291	0.9746	0.9811	0.8539	1.0000	0.9449

selected_limits_long <- dplyr::bind_rows(
  data.frame(idx_global = selected_detail$idx_global, window_id = selected_detail$window_id,
             limit = "UCL", value = selected_detail$UCL),
  data.frame(idx_global = selected_detail$idx_global, window_id = selected_detail$window_id,
             limit = "LCL", value = selected_detail$LCL),
  data.frame(idx_global = selected_detail$idx_global, window_id = selected_detail$window_id,
             limit = "CCL", value = selected_detail$CCL)
) %>%
  dplyr::mutate(
    limit = factor(limit, levels = c("UCL", "LCL", "CCL")),
    linewidth = unname(CTRL_LWD[as.character(limit)]),
    alpha = unname(CTRL_ALPHA[as.character(limit)])
  )

g_selected_global <- ggplot(selected_detail, aes(x = idx_global)) +
  geom_line(
    data = selected_limits_long,
    aes(x = idx_global, y = value, linetype = limit, group = limit,
        linewidth = linewidth, alpha = alpha),
    color = "black"
  ) +
  scale_linetype_manual(name = "Adaptive limits", values = CTRL_LTY) +
  scale_linewidth_identity(guide = "none") +
  scale_alpha_identity(guide = "none") +
  geom_point(aes(y = p, color = cluster_f), size = 0.8, alpha = 0.72) +
  geom_point(
    data = selected_detail %>% dplyr::filter(signal),
    aes(y = p, shape = signal_label),
    color = "red", size = 2.1, stroke = 1.0
  ) +
  scale_color_manual(values = selected_cols, limits = as.character(1:K_ELEGIDO),
                     name = "Dominant cluster", drop = FALSE) +
  scale_shape_manual(name = NULL, values = c("OOC" = 4), drop = FALSE) +
  coord_cartesian(ylim = c(-0.02, 1.02)) +
  labs(
    title = paste0("General FCA p'-chart plot for selected K = ", K_ELEGIDO),
    subtitle = "Full series: observed proportion, dominant cluster, adaptive limits, and OOC signals",
    x = "Global observation index",
    y = expression(p[i])
  ) +
  theme_minimal(base_size = PLOT_BASE_SIZE) +
  theme(
    plot.title = element_text(face = "bold", size = 12),
    legend.position = "right",
    panel.grid.minor = element_blank()
  )

print(g_selected_global)

ggsave(
  filename = file.path(base_out_dir, sprintf("GENERAL_FCA_pchart_K_elegido_%02d.png", K_ELEGIDO)),
  plot = g_selected_global, width = FIG_W_LAND, height = FIG_H_LAND, units = "in", dpi = DPI_EXPORT
)

g_selected_cluster_bars <- selected_cluster_summary %>%
  tidyr::pivot_longer(cols = c(observations, signals), names_to = "type", values_to = "value") %>%
  ggplot(aes(x = factor(cluster), y = value, fill = type)) +
  geom_col(position = "dodge", alpha = 0.85) +
  labs(
    title = paste0("General distribution of observations and signals by cluster - K = ", K_ELEGIDO),
    subtitle = "Identifies which regimes concentrate more data and which concentrate more OOC signals",
    x = "Dominant cluster",
    y = "Count"
  ) +
  theme_minimal(base_size = PLOT_BASE_SIZE) +
  theme(plot.title = element_text(face = "bold", size = 12))

print(g_selected_cluster_bars)

ggsave(
  filename = file.path(base_out_dir, sprintf("GENERAL_cluster_distribution_K_elegido_%02d.png", K_ELEGIDO)),
  plot = g_selected_cluster_bars, width = FIG_W_STD, height = FIG_H_STD, units = "in", dpi = DPI_EXPORT
)

11. Sequential windowing strategy for the selected number of clusters

The strategy separates the monitoring horizon into an initial exploratory window and subsequent sequential windows. The first window makes it possible to observe the initial process behavior; the following windows support analysis of local stability, regime changes, and temporal concentration of OOC signals.

For each observation within a window, the model identifies the dominant cluster and evaluates the observed proportion against regime-conditioned limits:

\[ Signal_i= \begin{cases} 1, & p_i>UCL_i\;\text{o}\;p_i

Generated plot. The faceted figure shows one FCA p'-chart per window. Each panel displays the monitored proportion \(p_i\), the limits \(LCL_i\), \(CCL_i\), \(UCL_i\), the dominant cluster through color, and OOC signals through red markers. This visualization is the operational equivalent of the windowing plot in the Word document.

# ==========================================================
# 9. Sequential windowing strategy for K_ELEGIDO
# ==========================================================
# This section uses the results already computed in selected_detail.
# Por defecto K_ELEGIDO = 4, pero funciona para K_ELEGIDO >= 4.

selected_detail_windowed <- selected_detail %>%
  dplyr::mutate(
    window_label = ifelse(
      window_id == 0,
      paste0("Initial window (", exploratory, " obs.)"),
      paste0("Window ", window_id, ":(" window_size, " obs.)")
    ),
    window_label = factor(window_label, levels = unique(window_label))
  )

selected_window_summary <- selected_detail_windowed %>%
  dplyr::group_by(window_id, window_label) %>%
  dplyr::summarise(
    start_idx = min(idx_global),
    end_idx = max(idx_global),
    n_obs = dplyr::n(),
    clusters_presentes = paste(sort(unique(cluster)), collapse = ", "),
    total_signals = sum(signal),
    signal_rate = mean(signal),
    p_mean = mean(p),
    p_sd = sd(p),
    CCL_mean = mean(CCL),
    LCL_mean = mean(LCL),
    UCL_mean = mean(UCL),
    .groups = "drop"
  )

knitr::kable(
  selected_window_summary,
  digits = 4,
  caption = paste("Summary of the sequential windowing strategy for selected K =", K_ELEGIDO)
)

selected_limits_window_long <- dplyr::bind_rows(
  data.frame(idx_global = selected_detail_windowed$idx_global,
             window_label = selected_detail_windowed$window_label,
             limit = "UCL", value = selected_detail_windowed$UCL),
  data.frame(idx_global = selected_detail_windowed$idx_global,
             window_label = selected_detail_windowed$window_label,
             limit = "LCL", value = selected_detail_windowed$LCL),
  data.frame(idx_global = selected_detail_windowed$idx_global,
             window_label = selected_detail_windowed$window_label,
             limit = "CCL", value = selected_detail_windowed$CCL)
) %>%
  dplyr::mutate(
    limit = factor(limit, levels = c("UCL", "LCL", "CCL")),
    linewidth = unname(CTRL_LWD[as.character(limit)]),
    alpha = unname(CTRL_ALPHA[as.character(limit)])
  )

# Main windowing plot, similar to the figure in the Word document.
g_selected_window_facets <- ggplot(selected_detail_windowed, aes(x = idx_global)) +
  geom_line(
    data = selected_limits_window_long,
    aes(x = idx_global, y = value, linetype = limit,
        group = interaction(window_label, limit),
        linewidth = linewidth, alpha = alpha),
    color = "black"
  ) +
  scale_linetype_manual(name = "Adaptive limits", values = CTRL_LTY) +
  scale_linewidth_identity(guide = "none") +
  scale_alpha_identity(guide = "none") +
  geom_line(aes(y = p, color = cluster_f, group = window_label), linewidth = 0.25, alpha = 0.75) +
  geom_point(aes(y = p, color = cluster_f), size = 0.45, alpha = 0.85) +
  geom_point(
    data = selected_detail_windowed %>% dplyr::filter(signal),
    aes(y = p, shape = signal_label),
    color = "red", size = 1.4, stroke = 0.8
  ) +
  scale_color_manual(values = selected_cols,
                     limits = as.character(1:K_ELEGIDO),
                     name = "Dominant cluster",
                     drop = FALSE) +
  scale_shape_manual(name = NULL, values = c("OOC" = 4), drop = FALSE) +
  facet_wrap(~ window_label, scales = "free_x", ncol = 2) +
  coord_cartesian(ylim = c(-0.02, 1.02)) +
  labs(
    title = paste0("Sequential windowing strategy FCA p'-chart - selected K = ", K_ELEGIDO),
    subtitle = "Each panel recalculates and shows local behavior: p_i, dominant cluster, CCL, LCL, UCL, and OOC signals",
    x = "Global observation index within each window",
    y = expression(p[i])
  ) +
  theme_minimal(base_size = PLOT_BASE_SIZE) +
  theme(
    plot.title = element_text(face = "bold", size = 12),
    strip.text = element_text(face = "bold", size = 8.5),
    legend.position = "right",
    panel.grid.minor = element_blank()
  )

print(g_selected_window_facets)

ggsave(
  filename = file.path(base_out_dir, sprintf("VENTANEO_FCA_pchart_K_elegido_%02d.png", K_ELEGIDO)),
  plot = g_selected_window_facets,
  width = FIG_W_LAND, height = FIG_H_LAND, units = "in", dpi = DPI_EXPORT
)

# Auxiliary plot: OOC signals by window.
g_selected_signals_by_window <- selected_window_summary %>%
  ggplot(aes(x = factor(window_id), y = total_signals)) +
  geom_col(alpha = 0.85) +
  geom_text(aes(label = total_signals), vjust = -0.25, size = 3.5) +
  labs(
    title = paste0("OOC signals by window - selected K = ", K_ELEGIDO),
    subtitle = "Identifies critical windows and compares local process stability",
    x = "Ventana secuencial",
    y = "Number of OOC signals"
  ) +
  theme_minimal(base_size = PLOT_BASE_SIZE) +
  theme(plot.title = element_text(face = "bold", size = 12))

print(g_selected_signals_by_window)

ggsave(
  filename = file.path(base_out_dir, sprintf("VENTANEO_signals_by_window_K_elegido_%02d.png", K_ELEGIDO)),
  plot = g_selected_signals_by_window,
  width = FIG_W_STD, height = FIG_H_STD, units = "in", dpi = DPI_EXPORT
)

## Error in parse(text = input): <text>:12:41: unexpected symbol
## 11:       paste0("Initial window (", exploratory, " obs.)"),
## 12:       paste0("Window ", window_id, ":(" window_size
##                                             ^

Result and interpretation. The faceted plot makes it possible to read the process by windows, not only globally. If OOC signals concentrate in one or a few windows, the behavior suggests local episodes of instability. If signals are persistently distributed in the same cluster across several windows, the result suggests a structural operational condition associated with that regime. For K_ELEGIDO = 4, this figure shows the four latent clusters as dominant regimes within each window.

How to report it. The report may state that, after selecting the four-cluster solution, a sequential windowing strategy was applied, consisting of an initial exploratory window of 200 observations and subsequent windows of 1,000 observations. In each window, FCA limits conditioned by the dominant cluster were recalculated, making it possible to differentiate local OOC signals from fluctuations produced by regime mixing.

12. Inventory of generated outputs

When the document is executed, the following elements are generated:

Overdispersion plots: Q-Q plot with arcsine-square-root transformation.
Validation index table: PC, PE, MPC, approximate SIL.F, DB, CH, and approximate GAP.
Faceted index plot: sensitivity of each index against \(K\).
PDF for each K: report with p-charts by window and membership diagnostics.
PNG by window: FCA p-chart and dominant-membership plot.
Global PNG files: faceted view by window and single global view.
General PNG for selected K: GENERAL_FCA_pchart_K_elegido_XX.png and signal distribution by cluster.
Windowing PNG for selected K: VENTANEO_FCA_pchart_K_elegido_XX.png and VENTANEO_signals_by_window_K_elegido_XX.png.
Excel for each K: observations, memberships \(u_1,\ldots,u_{10}\), limits, signals, window summary, and cluster distribution.
Master Excel file: consolidated results for \(K=2,\ldots,10\).

13. Plot interpretation

Plot	What it shows	How to interpret it
Q-Q arcsine-square-root plot	Visual overdispersion diagnosis.	Strong deviations from the line indicate variability greater than binomial variability.
Validation indices	Relative partition quality for each K.	Look for convergence: high PC/MPC/SIL/CH and low PE/DB.
FCA p-chart by window	Series \(p_i\), dominant cluster, CCL, LCL, UCL, and OOC signals.	Red signals indicate observations outside the adaptive limit of their regime.
Membership diagnostics	Relationship between \(p_i\) and dominant membership.	High memberships imply clear assignment; medium/low memberships suggest transition between regimes.
Global faceted chart	Comparison of successive windows.	Useful for detecting local stability, temporal changes, and critical windows.
Sequential windowing for selected K	FCA p'-chart by window for the selected solution, default K=4.	Allows each window to be interpreted as a local context, with adaptive limits, dominant cluster, and OOC signals.
General plot for selected K	Full series with dominant cluster, CCL, LCL, UCL, and OOC signals.	This is the main figure for reporting process behavior with the selected solution, for example K=4 or more.
Total OOC by K	Method sensitivity to the number of clusters.	It should be analyzed together with validity indices to avoid choosing K only by minimizing signals.

14. Guide for reporting the results

Integrated reading of results. The methodology should be reported at four levels. First, confirm that the proportional data show overdispersion, which justifies not relying only on a classical p-chart. Second, validate the existence of latent regimes through clustering indices. Third, construct and show that FCA p'-chart limits adapt to each regime and reduce spurious signals. Fourth, analyze the remaining OOC signals as contextual events associated with specific windows and clusters.

Element to report	Result to extract from the RHTML	Technical interpretation
Overdispersion	Variance table and Q-Q plot	Justifies the use of adjusted limits and a fuzzy-regime approach.
Number of clusters	Index table and plot by K	Supports selection of an interpretable latent structure.
Parameters by cluster	Window_Cluster_Summary	Describes CCL, variability, effective size, and signals by regime.
OOC signals	All_Observations and Window_Summary	Identifies critical observations, window, cluster, and limits used.
Dominant membership	Membership diagnostics and u_j columns	Evaluates whether the signal clearly belongs to a regime or to a transitional zone.
Comparison across K	MASTER_FCA_p_results	Allows analysis of robustness, sensitivity, and possible over-segmentation.

Fuzzy Cluster-Adaptive p'-Chart for Overdispersed Data

Developed by

Jean Paul Morán-Zabala, Juan Miguel Cogollo-Flórez & Alexander Alberto Correa-Espinal

1. Methodological foundation