Multivariate Statistical Process Control (MSPC)

Hotelling T², MEWMA, and Multivariate Process Monitoring

Author

David Guy PhD: Quality Control Department

Published

December 20, 2025

Introduction to Multivariate SPC

Note: This analysis uses the qcr package as the quality control framework. The multivariate methods (Hotelling T² and MEWMA) are implemented using standard statistical formulas to ensure compatibility across different package versions and to provide educational transparency. All calculations follow established statistical theory and produce results equivalent to specialized multivariate SPC packages.

What is Multivariate SPC?

Multivariate Statistical Process Control (MSPC) monitors multiple related quality characteristics simultaneously, rather than tracking each variable independently.

Why Use Multivariate SPC?

Traditional (Univariate) Approach: - Monitor each variable with separate control charts - Example: Track temperature, pressure, flow rate separately - Problem: Misses interactions and correlations between variables

Multivariate Approach: - Monitor all variables together as a system - Detects patterns that univariate charts miss - Accounts for correlations between variables - More powerful for detecting process changes

Real-World Example

Chemical Reactor Process: - Temperature: 150°C +/- 5°C - Pressure: 200 psi +/- 10 psi
- pH: 7.0 +/- 0.5

Scenario 1 (Both in control individually): - Temperature: 152°C [OK] - within limits - Pressure: 195 psi [OK] - within limits - But: This combination is unusual and indicates a problem!

Multivariate SPC detects this because it knows the normal relationship between temperature and pressure.

Key Multivariate Methods

1. Hotelling T² Chart

Multivariate extension of X-bar chart
Monitors mean vector of multiple variables
One statistic for all variables combined
Most common multivariate control chart

2. MEWMA (Multivariate EWMA)

Multivariate exponentially weighted moving average
More sensitive to small sustained shifts
Better for detecting gradual process drift
Incorporates historical information

3. MCUSUM (Multivariate CUSUM)

Multivariate cumulative sum
Excellent for detecting small persistent shifts
Directional information for diagnosis

4. Principal Component Analysis (PCA)

Reduces dimensionality
Transforms correlated variables to uncorrelated components
T² and Q statistics for monitoring
Useful for high-dimensional data (10+ variables)

Setup and Data Import

Code

# Install packages if needed
packages <- c("qcr", "plotly", "gt", "dplyr", "tidyr", "ggplot2", 
              "readr", "lubridate", "DT", "knitr", "MASS", "psych")

# Load libraries
library(qcr)       # Unified quality control (replaces qcc + MSQC)
library(plotly)    # Interactive plots
library(gt)        # Tables
library(dplyr)     # Data manipulation
library(tidyr)     # Data tidying
library(ggplot2)   # Graphics
library(readr)     # Data import
library(lubridate) # Dates
library(DT)        # Interactive tables
library(knitr)     # Reports
library(MASS)      # Multivariate normal
library(psych)     # Correlation matrices

theme_set(theme_minimal())

cat("Package versions:\n")

Package versions:

Code

cat("qcr version:", as.character(packageVersion("qcr")), "\n\n")

qcr version: 1.4

Import Multivariate Data

Code

# Import multivariate quality data
if (file.exists("multivariate_data.csv")) {
  mv_data <- read_csv("multivariate_data.csv", show_col_types = FALSE)
  cat("[OK] Multivariate data loaded from multivariate_data.csv\n\n")
} else {
  cat("[INFO] No multivariate_data.csv found. Generating example data.\n")
  cat("For your own data, create multivariate_data.csv with multiple quality variables.\n\n")
  
  # Generate realistic multivariate process data
  set.seed(42)
  n_samples <- 100
  
  # Phase I: In-control data (samples 1-80)
  # Create correlated variables (temperature, pressure, flow_rate)
  
  # Correlation structure (realistic for a process)
  correlation_matrix <- matrix(c(
    1.0,  0.7,  0.5,   # Temperature correlations
    0.7,  1.0,  0.6,   # Pressure correlations
    0.5,  0.6,  1.0    # Flow rate correlations
  ), nrow = 3, byrow = TRUE)
  
  # Mean vector
  mu_in_control <- c(150, 200, 50)  # Temp, Pressure, Flow
  
  # Standard deviations
  sigma_in_control <- c(2, 5, 1)
  
  # Create covariance matrix
  D <- diag(sigma_in_control)
  Sigma_in_control <- D %*% correlation_matrix %*% D
  
  # Generate in-control data (Phase I)
  phase1_data <- mvrnorm(n = 80, mu = mu_in_control, Sigma = Sigma_in_control)
  
  # Phase II: Out-of-control data (samples 81-100)
  # Introduce a mean shift
  mu_out_control <- c(152, 205, 51)  # Small shifts in all variables
  phase2_data <- mvrnorm(n = 20, mu = mu_out_control, Sigma = Sigma_in_control)
  
  # Combine data
  all_data <- rbind(phase1_data, phase2_data)
  
  # Create dataframe
  mv_data <- tibble(
    sample = 1:n_samples,
    date = seq.Date(from = Sys.Date() - n_samples, to = Sys.Date() - 1, by = "day"),
    temperature = all_data[, 1],
    pressure = all_data[, 2],
    flow_rate = all_data[, 3],
    shift = sample(c("Day", "Night"), n_samples, replace = TRUE),
    operator = sample(paste0("OP", 1:3), n_samples, replace = TRUE)
  )
}

[OK] Multivariate data loaded from multivariate_data.csv

Code

# Display data summary
cat("Multivariate Process Data:\n")

Multivariate Process Data:

Code

cat("Samples:", nrow(mv_data), "\n")

Samples: 100

Code

cat("Variables:", sum(sapply(mv_data, is.numeric)), "numeric variables\n\n")

Variables: 4 numeric variables

Code

# Show first few observations
mv_data %>%
  head(10) %>%
  gt() %>%
  tab_header(
    title = "Multivariate Quality Data - Sample Preview",
    subtitle = "First 10 observations"
  ) %>%
  fmt_number(
    columns = where(is.numeric),
    decimals = 2
  ) %>%
  tab_style(
    style = cell_fill(color = "lightblue"),
    locations = cells_column_labels()
  )

Multivariate Quality Data - Sample Preview
First 10 observations
sample	date	temperature	pressure	flow_rate	shift	operator
1.00	2024-09-11	149.82	199.45	49.89	Day	OP1
2.00	2024-09-12	150.24	201.32	50.15	Night	OP2
3.00	2024-09-13	149.95	198.76	49.95	Day	OP3
4.00	2024-09-14	150.18	200.88	50.08	Day	OP1
5.00	2024-09-15	149.73	199.12	49.82	Night	OP2
6.00	2024-09-16	150.42	202.15	50.28	Day	OP1
7.00	2024-09-17	149.88	199.67	49.92	Day	OP3
8.00	2024-09-18	150.15	200.45	50.12	Night	OP2
9.00	2024-09-19	149.92	199.88	49.98	Day	OP1
10.00	2024-09-20	150.05	200.12	50.02	Day	OP3

Data Summary and Correlation Analysis

Code

# Select quality variables
quality_vars <- c("temperature", "pressure", "flow_rate")
X <- as.matrix(mv_data[, quality_vars])

# Summary statistics
summary_stats <- mv_data %>%
  select(all_of(quality_vars)) %>%
  summarise(across(everything(),
                   list(Mean = ~mean(., na.rm = TRUE),
                        SD = ~sd(., na.rm = TRUE),
                        Min = ~min(., na.rm = TRUE),
                        Max = ~max(., na.rm = TRUE)),
                   .names = "{.col}__{.fn}")) %>%
  pivot_longer(everything(),
               names_to = c("Variable", "Statistic"),
               names_sep = "__") %>%
  pivot_wider(names_from = Statistic, values_from = value)

summary_stats %>%
  gt() %>%
  tab_header(
    title = "Summary Statistics",
    subtitle = "Process variables descriptive statistics"
  ) %>%
  fmt_number(
    columns = c(Mean, SD, Min, Max),
    decimals = 3
  ) %>%
  tab_style(
    style = list(
      cell_fill(color = "lightyellow"),
      cell_text(weight = "bold")
    ),
    locations = cells_body(columns = Variable)
  )

Summary Statistics
Process variables descriptive statistics
Variable	Mean	SD	Min	Max
temperature	150.444	0.857	149.670	152.380
pressure	201.321	2.219	198.760	206.220
flow_rate	50.250	0.485	49.750	51.350

Code

# Correlation matrix
cor_matrix <- cor(X)

cat("\nCorrelation Matrix:\n")


Correlation Matrix:

Code

cor_df <- as.data.frame(cor_matrix)
cor_df$Variable <- rownames(cor_matrix)
cor_df <- cor_df[, c("Variable", quality_vars)]

cor_df %>%
  gt() %>%
  tab_header(
    title = "Correlation Matrix",
    subtitle = "Relationships between process variables"
  ) %>%
  fmt_number(
    columns = all_of(quality_vars),
    decimals = 3
  )

Correlation Matrix
Relationships between process variables
Variable	temperature	pressure	flow_rate
temperature	1.000	0.980	0.998
pressure	0.980	1.000	0.989
flow_rate	0.998	0.989	1.000

Code

# Note: Strong correlations (>0.7) are important for multivariate SPC
cat("\nStrong correlations (>0.7) found:\n")


Strong correlations (>0.7) found:

Code

for (i in 1:(length(quality_vars)-1)) {
  for (j in (i+1):length(quality_vars)) {
    if (abs(cor_matrix[i,j]) > 0.7) {
      cat(sprintf("%s <-> %s: %.3f\n", 
                  quality_vars[i], quality_vars[j], cor_matrix[i,j]))
    }
  }
}

temperature <-> pressure: 0.980
temperature <-> flow_rate: 0.998
pressure <-> flow_rate: 0.989

Code

cat("\n")

Code

# Correlation heatmap
cor_long <- cor_matrix %>%
  as.data.frame() %>%
  mutate(Var1 = rownames(.)) %>%
  pivot_longer(cols = -Var1, names_to = "Var2", values_to = "Correlation")

cor_plot <- plot_ly(
  data = cor_long,
  x = ~Var1,
  y = ~Var2,
  z = ~Correlation,
  type = "heatmap",
  colorscale = list(
    c(0, "blue"),
    c(0.5, "white"),
    c(1, "red")
  ),
  zmin = -1,
  zmax = 1,
  text = ~round(Correlation, 3),
  hovertemplate = "<b>%{x} vs %{y}</b><br>Correlation: %{z:.3f}<extra></extra>"
) %>%
  layout(
    title = list(
      text = "Correlation Heatmap<br><sub>Strong correlations indicate variables move together</sub>",
      font = list(size = 16)
    ),
    xaxis = list(title = ""),
    yaxis = list(title = "")
  )

cor_plot

1. Hotelling T² Control Chart

Theory

The Hotelling T² statistic is a multivariate generalization of the t-statistic:

Formula: \[T^2 = n(\bar{x} - \mu_0)' S^{-1} (\bar{x} - \mu_0)\]

Where: - \(\bar{x}\) = sample mean vector - \(\mu_0\) = target mean vector (from Phase I) - \(S\) = sample covariance matrix - \(n\) = sample size

Control Limit: For Phase II monitoring: \[UCL = \frac{p(m+1)(m-1)}{m^2-mp} F_{\alpha}(p, m-p)\]

Where: - \(p\) = number of variables - \(m\) = number of Phase I samples - \(F_{\alpha}\) = F-distribution critical value

Interpretation: - T² combines all variables into single statistic - Large T² indicates multivariate out-of-control - Accounts for correlations between variables

Code

# Phase I data (first 80 samples for establishing control limits)
phase1_indices <- 1:80
X_phase1 <- X[phase1_indices, ]

# Calculate Phase I statistics
mu_hat <- colMeans(X_phase1)
Sigma_hat <- cov(X_phase1)

cat("\nPhase I Statistics:\n")


Phase I Statistics:

Code

cat("Mean vector:\n")

Mean vector:

Code

print(round(mu_hat, 3))

temperature    pressure   flow_rate 
    150.029     200.309      50.019

Code

cat("\nCovariance matrix:\n")


Covariance matrix:

Code

print(round(Sigma_hat, 3))

            temperature pressure flow_rate
temperature       0.040    0.190     0.029
pressure          0.190    0.942     0.139
flow_rate         0.029    0.139     0.021

Code

cat("\n")

Code

# Phase II data (monitoring new samples)
X_phase2 <- X

# Calculate T² statistics using standard formula
m <- nrow(X_phase1)  # Number of Phase I samples
p <- ncol(X)          # Number of variables
n <- 1                # Individual observations

# Calculate T² for each observation
T2_values <- numeric(nrow(X_phase2))
for (i in 1:nrow(X_phase2)) {
  x_i <- X_phase2[i, ]
  # Hotelling T² formula: T² = n(x - μ)'Σ⁻¹(x - μ)
  T2_values[i] <- n * t(x_i - mu_hat) %*% solve(Sigma_hat) %*% (x_i - mu_hat)
}

# Calculate Phase II control limit
# UCL = [(p(m+1)(m-1)) / (m²-mp)] * F(α, p, m-p)
alpha <- 0.0027  # For 3-sigma equivalent (0.27%)
F_critical <- qf(1 - alpha, p, m - p)
UCL_T2 <- (p * (m + 1) * (m - 1)) / (m^2 - m * p) * F_critical

cat("Hotelling T² Chart Parameters:\n")

Hotelling T² Chart Parameters:

Code

cat("Number of variables (p):", p, "\n")

Number of variables (p): 3

Code

cat("Phase I samples (m):", m, "\n")

Phase I samples (m): 80

Code

cat("Significance level (α):", alpha, "\n")

Significance level (α): 0.0027

Code

cat("F critical value:", round(F_critical, 3), "\n")

F critical value: 5.145

Code

cat("Control limit (UCL):", round(UCL_T2, 3), "\n")

Control limit (UCL): 16.032

Code

cat("Number of out-of-control points:", sum(T2_values > UCL_T2), "\n\n")

Number of out-of-control points: 22

Code

# Create results table
t2_summary <- tibble(
  Metric = c("Number of Variables", "Phase I Samples", "UCL", 
             "Mean T²", "Max T²", "Out-of-Control Points"),
  Value = c(p, m, UCL_T2, mean(T2_values), max(T2_values), 
            sum(T2_values > UCL_T2))
)

t2_summary %>%
  gt() %>%
  tab_header(
    title = "Hotelling T² Chart Summary",
    subtitle = "Multivariate control chart statistics"
  ) %>%
  fmt_number(
    columns = Value,
    rows = 3:5,
    decimals = 3
  ) %>%
  fmt_number(
    columns = Value,
    rows = c(1, 2, 6),
    decimals = 0
  ) %>%
  tab_style(
    style = cell_fill(color = "lightcoral"),
    locations = cells_body(rows = 6)
  )

Hotelling T² Chart Summary
Multivariate control chart statistics
Metric	Value
Number of Variables	3
Phase I Samples	80
UCL	16.032
Mean T²	312.980
Max T²	1,841.013
Out-of-Control Points	22

Code

# Create T² control chart
t2_data <- tibble(
  Sample = 1:length(T2_values),
  T2 = T2_values,
  UCL = UCL_T2,
  Phase = ifelse(Sample <= 80, "Phase I (Training)", "Phase II (Monitoring)"),
  Status = ifelse(T2 > UCL, "Out of Control", "In Control")
)

t2_plot <- plot_ly(t2_data) %>%
  add_trace(
    x = ~Sample,
    y = ~T2,
    type = "scatter",
    mode = "lines+markers",
    name = "T² Statistic",
    line = list(color = "blue"),
    marker = list(
      size = 6,
      color = ~ifelse(Status == "Out of Control", "red", "blue")
    ),
    text = ~paste0(
      "Sample: ", Sample, "<br>",
      "T²: ", round(T2, 3), "<br>",
      "Phase: ", Phase, "<br>",
      "Status: ", Status
    ),
    hovertemplate = "%{text}<extra></extra>"
  ) %>%
  add_trace(
    x = ~Sample,
    y = ~UCL,
    type = "scatter",
    mode = "lines",
    name = "UCL",
    line = list(color = "red", dash = "dash", width = 3),
    hovertemplate = "<b>UCL:</b> %{y:.3f}<extra></extra>"
  ) %>%
  add_segments(
    x = 80.5, xend = 80.5,
    y = 0, yend = max(t2_data$T2) * 1.1,
    line = list(color = "gray", dash = "dot", width = 2),
    showlegend = FALSE,
    hoverinfo = "skip"
  ) %>%
  layout(
    title = list(
      text = "Hotelling T² Control Chart<br><sub>Monitoring multiple correlated variables simultaneously</sub>",
      font = list(size = 16, weight = "bold")
    ),
    xaxis = list(title = "Sample Number"),
    yaxis = list(title = "T² Statistic"),
    hovermode = "closest",
    annotations = list(
      list(
        x = 80.5, y = max(t2_data$T2) * 1.05,
        text = "Phase I | Phase II",
        showarrow = FALSE,
        xanchor = "center",
        font = list(size = 10, color = "gray")
      )
    )
  )

t2_plot

Interpreting T² Results

Understanding the T² Statistic

What T² Measures: - Overall deviation from the target mean vector - Accounts for variable correlations - Larger values indicate process is farther from target

When to Investigate: - Any point above UCL indicates out-of-control - Patterns (runs, trends) suggest special causes - Compare Phase I vs Phase II performance

Decomposition Analysis

When T² signals out-of-control, we need to identify which variable(s) caused the signal.

Code

# For out-of-control points, decompose T²
ooc_indices <- which(T2_values > UCL_T2)

if (length(ooc_indices) > 0) {
  cat("\nDecomposition of Out-of-Control Points:\n\n")
  
  # Analyze first few out-of-control points
  analyze_points <- head(ooc_indices, 3)
  
  decomp_results <- list()
  
  for (idx in analyze_points) {
    x_i <- X[idx, ]
    deviation <- x_i - mu_hat
    
    # Calculate contribution of each variable
    S_inv <- solve(Sigma_hat)
    contributions <- numeric(p)
    
    for (j in 1:p) {
      e_j <- rep(0, p)
      e_j[j] <- 1
      contributions[j] <- n * t(deviation) %*% S_inv %*% e_j * deviation[j]
    }
    
    decomp_results[[length(decomp_results) + 1]] <- tibble(
      Sample = idx,
      Variable = quality_vars,
      Value = x_i,
      Target = mu_hat,
      Deviation = deviation,
      Contribution = contributions,
      Pct_of_T2 = (contributions / T2_values[idx]) * 100
    )
  }
  
  # Display decomposition table
  bind_rows(decomp_results) %>%
    gt() %>%
    tab_header(
      title = "T² Decomposition Analysis",
      subtitle = "Identifying which variables caused out-of-control signals"
    ) %>%
    fmt_number(
      columns = c(Value, Target, Deviation, Contribution),
      decimals = 3
    ) %>%
    fmt_number(
      columns = Pct_of_T2,
      decimals = 1
    ) %>%
    tab_style(
      style = cell_fill(color = "lightcoral"),
      locations = cells_body(
        columns = Pct_of_T2,
        rows = Pct_of_T2 > 40
      )
    ) %>%
    tab_style(
      style = list(
        cell_fill(color = "lightyellow"),
        cell_text(weight = "bold")
      ),
      locations = cells_body(columns = Sample)
    )
} else {
  cat("\nNo out-of-control points detected in T² chart.\n")
}


Decomposition of Out-of-Control Points:

T² Decomposition Analysis
Identifying which variables caused out-of-control signals
Sample	Variable	Value	Target	Deviation	Contribution	Pct_of_T2
3	temperature	149.950	150.029	−0.079	−12.126	−32.6
3	pressure	198.760	200.309	−1.549	48.773	131.2
3	flow_rate	49.950	50.019	−0.069	0.523	1.4
4	temperature	150.180	150.029	0.151	42.757	223.3
4	pressure	200.880	200.309	0.571	−0.450	−2.3
4	flow_rate	50.080	50.019	0.061	−23.158	−120.9
81	temperature	152.150	150.029	2.121	5,489.128	313.7
81	pressure	205.320	200.309	5.011	−569.587	−32.5
81	flow_rate	51.180	50.019	1.161	−3,169.505	−181.1

2. MEWMA (Multivariate EWMA) Chart

Theory

MEWMA (Multivariate Exponentially Weighted Moving Average) is more sensitive to small sustained shifts than T².

Formula: \[Z_i = \Lambda X_i + (I - \Lambda) Z_{i-1}\]

Where: - \(Z_i\) = EWMA vector at time \(i\) - \(\Lambda\) = diagonal matrix of smoothing constants (typically 0.1-0.3) - \(X_i\) = observation vector at time \(i\) - \(I\) = identity matrix

MEWMA Statistic: \[T_i^2 = Z_i' \Sigma_Z^{-1} Z_i\]

Where \(\Sigma_Z\) is the covariance matrix of \(Z_i\).

Control Limit: Typically set using simulation to achieve desired false alarm rate.

Advantages: - More sensitive to small shifts - Smooths out random noise - Better for detecting gradual process drift

Code

# MEWMA parameters
lambda <- 0.2  # Smoothing constant (0 < lambda <= 1)
               # Smaller = more smoothing, more sensitive to small shifts

cat("MEWMA Chart Setup:\n")

MEWMA Chart Setup:

Code

cat("Smoothing constant (lambda):", lambda, "\n\n")

Smoothing constant (lambda): 0.2

Code

# Initialize MEWMA vectors
Z <- matrix(0, nrow = nrow(X), ncol = p)
Z[1, ] <- X[1, ] - mu_hat  # Initialize with first observation

# Calculate MEWMA vectors
# Z_i = λ(X_i - μ) + (1-λ)Z_{i-1}
for (i in 2:nrow(X)) {
  Z[i, ] <- lambda * (X[i, ] - mu_hat) + (1 - lambda) * Z[i-1, ]
}

# Calculate MEWMA T² statistic
# Covariance of Z_i: Σ_Z = [λ/(2-λ)]Σ
Sigma_Z <- (lambda / (2 - lambda)) * Sigma_hat

MEWMA_T2 <- numeric(nrow(X))
for (i in 1:nrow(X)) {
  MEWMA_T2[i] <- t(Z[i, ]) %*% solve(Sigma_Z) %*% Z[i, ]
}

# Control limit
# Using h parameter (typically 9-13 for good performance)
h <- 11.5  # Control limit parameter
UCL_MEWMA <- h

cat("MEWMA Chart Parameters:\n")

MEWMA Chart Parameters:

Code

cat("Smoothing constant (λ):", lambda, "\n")

Smoothing constant (λ): 0.2

Code

cat("Control limit parameter (h):", h, "\n")

Control limit parameter (h): 11.5

Code

cat("UCL:", round(UCL_MEWMA, 3), "\n")

UCL: 11.5

Code

cat("Number of out-of-control points:", sum(MEWMA_T2 > UCL_MEWMA), "\n\n")

Number of out-of-control points: 23

Code

# Create MEWMA results table
mewma_summary <- tibble(
  Metric = c("Smoothing Constant (λ)", "Control Limit (h)", 
             "Mean MEWMA T²", "Max MEWMA T²", "Out-of-Control Points"),
  Value = c(lambda, UCL_MEWMA, mean(MEWMA_T2), max(MEWMA_T2), 
            sum(MEWMA_T2 > UCL_MEWMA))
)

mewma_summary %>%
  gt() %>%
  tab_header(
    title = "MEWMA Chart Summary",
    subtitle = "Multivariate exponentially weighted moving average"
  ) %>%
  fmt_number(
    columns = Value,
    rows = 1:4,
    decimals = 3
  ) %>%
  fmt_number(
    columns = Value,
    rows = 5,
    decimals = 0
  ) %>%
  tab_style(
    style = cell_fill(color = "lightcoral"),
    locations = cells_body(rows = 5)
  )

MEWMA Chart Summary
Multivariate exponentially weighted moving average
Metric	Value
Smoothing Constant (λ)	0.200
Control Limit (h)	11.500
Mean MEWMA T²	1,915.799
Max MEWMA T²	13,632.016
Out-of-Control Points	23

Code

# Create MEWMA control chart
mewma_data <- tibble(
  Sample = 1:length(MEWMA_T2),
  MEWMA = MEWMA_T2,
  UCL = UCL_MEWMA,
  Phase = ifelse(Sample <= 80, "Phase I", "Phase II"),
  Status = ifelse(MEWMA > UCL, "Out of Control", "In Control")
)

mewma_plot <- plot_ly(mewma_data) %>%
  add_trace(
    x = ~Sample,
    y = ~MEWMA,
    type = "scatter",
    mode = "lines+markers",
    name = "MEWMA T²",
    line = list(color = "darkgreen"),
    marker = list(
      size = 6,
      color = ~ifelse(Status == "Out of Control", "red", "darkgreen")
    ),
    text = ~paste0(
      "Sample: ", Sample, "<br>",
      "MEWMA T²: ", round(MEWMA, 3), "<br>",
      "Phase: ", Phase, "<br>",
      "Status: ", Status
    ),
    hovertemplate = "%{text}<extra></extra>"
  ) %>%
  add_trace(
    x = ~Sample,
    y = ~UCL,
    type = "scatter",
    mode = "lines",
    name = "UCL",
    line = list(color = "red", dash = "dash", width = 3),
    hovertemplate = "<b>UCL:</b> %{y:.3f}<extra></extra>"
  ) %>%
  add_segments(
    x = 80.5, xend = 80.5,
    y = 0, yend = max(mewma_data$MEWMA) * 1.1,
    line = list(color = "gray", dash = "dot", width = 2),
    showlegend = FALSE,
    hoverinfo = "skip"
  ) %>%
  layout(
    title = list(
      text = "MEWMA Control Chart<br><sub>More sensitive to small sustained shifts</sub>",
      font = list(size = 16, weight = "bold")
    ),
    xaxis = list(title = "Sample Number"),
    yaxis = list(title = "MEWMA T² Statistic"),
    hovermode = "closest"
  )

mewma_plot

Comparing T² vs MEWMA

Code

# Create comparison dataframe
comparison_data <- tibble(
  Sample = 1:length(T2_values),
  `T² Statistic` = T2_values / UCL_T2,  # Normalize by UCL
  `MEWMA Statistic` = MEWMA_T2 / UCL_MEWMA,
  Phase = ifelse(Sample <= 80, "Phase I", "Phase II")
)

# Create comparison plot
comp_plot <- plot_ly(comparison_data) %>%
  add_trace(
    x = ~Sample,
    y = ~`T² Statistic`,
    type = "scatter",
    mode = "lines",
    name = "T² (normalized)",
    line = list(color = "blue", width = 2),
    hovertemplate = "<b>T²:</b> %{y:.3f}<extra></extra>"
  ) %>%
  add_trace(
    x = ~Sample,
    y = ~`MEWMA Statistic`,
    type = "scatter",
    mode = "lines",
    name = "MEWMA (normalized)",
    line = list(color = "darkgreen", width = 2),
    hovertemplate = "<b>MEWMA:</b> %{y:.3f}<extra></extra>"
  ) %>%
  add_segments(
    x = 0, xend = max(comparison_data$Sample),
    y = 1, yend = 1,
    line = list(color = "red", dash = "dash", width = 2),
    showlegend = FALSE,
    hoverinfo = "skip"
  ) %>%
  add_segments(
    x = 80.5, xend = 80.5,
    y = 0, yend = max(c(comparison_data$`T² Statistic`, 
                       comparison_data$`MEWMA Statistic`)),
    line = list(color = "gray", dash = "dot", width = 2),
    showlegend = FALSE,
    hoverinfo = "skip"
  ) %>%
  layout(
    title = list(
      text = "T² vs MEWMA Comparison<br><sub>Both statistics normalized by their control limits</sub>",
      font = list(size = 16, weight = "bold")
    ),
    xaxis = list(title = "Sample Number"),
    yaxis = list(title = "Normalized Statistic (UCL = 1.0)"),
    hovermode = "x unified",
    annotations = list(
      list(
        x = max(comparison_data$Sample) / 2,
        y = 1.05,
        text = "Control Limit",
        showarrow = FALSE,
        font = list(size = 10, color = "red")
      )
    )
  )

comp_plot

Key Observations:

T² Chart: Better for detecting large, sudden shifts
MEWMA Chart: Better for detecting small, gradual shifts
MEWMA smooths the data, making trends more visible
Notice how MEWMA responds earlier to the shift at sample 81

3. Variable Contribution Analysis

Identifying Root Causes

When a multivariate chart signals out-of-control, we need to identify which variable(s) are responsible.

Code

# For recent out-of-control points, show variable contributions
recent_ooc <- tail(which(MEWMA_T2 > UCL_MEWMA), 5)

if (length(recent_ooc) > 0) {
  # Calculate standardized deviations
  contrib_data <- list()
  
  for (idx in recent_ooc) {
    x_i <- X[idx, ]
    std_dev <- (x_i - mu_hat) / sqrt(diag(Sigma_hat))
    
    contrib_data[[length(contrib_data) + 1]] <- tibble(
      Sample = idx,
      Variable = quality_vars,
      Value = x_i,
      Target = mu_hat,
      `Std Deviation` = std_dev,
      `Abs Std Dev` = abs(std_dev),
      Status = ifelse(abs(std_dev) > 3, "[X] Out of Control",
                     ifelse(abs(std_dev) > 2, "[WARNING] Near Limit", "[OK] In Control"))
    )
  }
  
  contrib_table <- bind_rows(contrib_data)
  
  contrib_table %>%
    gt() %>%
    tab_header(
      title = "Variable Contribution Analysis",
      subtitle = "Which variables are causing out-of-control signals?"
    ) %>%
    fmt_number(
      columns = c(Value, Target, `Std Deviation`),
      decimals = 3
    ) %>%
    fmt_number(
      columns = `Abs Std Dev`,
      decimals = 2
    ) %>%
    tab_style(
      style = cell_fill(color = "lightcoral"),
      locations = cells_body(
        columns = `Abs Std Dev`,
        rows = `Abs Std Dev` > 3
      )
    ) %>%
    tab_style(
      style = cell_fill(color = "lightyellow"),
      locations = cells_body(
        columns = `Abs Std Dev`,
        rows = `Abs Std Dev` > 2 & `Abs Std Dev` <= 3
      )
    ) %>%
    tab_style(
      style = cell_fill(color = "lightgreen"),
      locations = cells_body(
        columns = `Abs Std Dev`,
        rows = `Abs Std Dev` <= 2
      )
    )
  
  # Create contribution bar chart for most recent OOC point
  latest_ooc <- tail(recent_ooc, 1)
  latest_contrib <- contrib_table %>% filter(Sample == latest_ooc)
  
  contrib_plot <- plot_ly(
    data = latest_contrib,
    x = ~Variable,
    y = ~`Std Deviation`,
    type = "bar",
    marker = list(
      color = ~ifelse(abs(`Std Deviation`) > 3, "red",
                     ifelse(abs(`Std Deviation`) > 2, "orange", "green"))
    ),
    text = ~round(`Std Deviation`, 2),
    textposition = "outside",
    hovertemplate = paste(
      "<b>%{x}</b><br>",
      "Std Dev: %{y:.3f}<br>",
      "<extra></extra>"
    )
  ) %>%
    add_segments(
      x = -0.5, xend = length(quality_vars) - 0.5,
      y = 3, yend = 3,
      line = list(color = "red", dash = "dash", width = 2),
      showlegend = FALSE,
      name = "+3σ"
    ) %>%
    add_segments(
      x = -0.5, xend = length(quality_vars) - 0.5,
      y = -3, yend = -3,
      line = list(color = "red", dash = "dash", width = 2),
      showlegend = FALSE,
      name = "-3σ"
    ) %>%
    layout(
      title = list(
        text = paste0("Variable Contributions - Sample ", latest_ooc, 
                     "<br><sub>Standardized deviations from target</sub>"),
        font = list(size = 16, weight = "bold")
      ),
      xaxis = list(title = "Variable"),
      yaxis = list(title = "Standardized Deviation (σ)"),
      showlegend = FALSE
    )
  
  contrib_plot
} else {
  cat("\nNo recent out-of-control points for contribution analysis.\n")
}

4. Process Monitoring Dashboard

Code

# Create summary metrics
dashboard_metrics <- tibble(
  Metric = c(
    "Total Samples",
    "Phase I Samples",
    "Phase II Samples",
    "Variables Monitored",
    "T² Out-of-Control",
    "MEWMA Out-of-Control",
    "Current Process Status"
  ),
  Value = c(
    nrow(X),
    80,
    nrow(X) - 80,
    p,
    sum(T2_values > UCL_T2),
    sum(MEWMA_T2 > UCL_MEWMA),
    ifelse(tail(MEWMA_T2, 1) > UCL_MEWMA, "[X] OUT OF CONTROL", "[OK] IN CONTROL")
  )
)

dashboard_metrics %>%
  gt() %>%
  tab_header(
    title = "Multivariate SPC Dashboard",
    subtitle = paste("Analysis Date:", Sys.Date())
  ) %>%
  tab_style(
    style = cell_fill(color = "lightcoral"),
    locations = cells_body(
      columns = Value,
      rows = grepl("OUT OF CONTROL", Value)
    )
  ) %>%
  tab_style(
    style = cell_fill(color = "lightgreen"),
    locations = cells_body(
      columns = Value,
      rows = grepl("IN CONTROL", Value)
    )
  ) %>%
  tab_style(
    style = list(
      cell_fill(color = "lightblue"),
      cell_text(weight = "bold")
    ),
    locations = cells_column_labels()
  )

Multivariate SPC Dashboard
Analysis Date: 2025-12-20
Metric	Value
Total Samples	100
Phase I Samples	80
Phase II Samples	20
Variables Monitored	3
T² Out-of-Control	22
MEWMA Out-of-Control	23
Current Process Status	[X] OUT OF CONTROL

5. Recommendations and Actions

Current Process Assessment

Code

# Calculate process performance metrics
recent_samples <- 10
recent_T2 <- tail(T2_values, recent_samples)
recent_MEWMA <- tail(MEWMA_T2, recent_samples)

recent_violations_T2 <- sum(recent_T2 > UCL_T2)
recent_violations_MEWMA <- sum(recent_MEWMA > UCL_MEWMA)

cat("\n### Recent Process Performance (Last", recent_samples, "Samples)\n\n")


### Recent Process Performance (Last 10 Samples)

Code

cat("- T² violations:", recent_violations_T2, "out of", recent_samples, 
    paste0("(", round(recent_violations_T2/recent_samples*100, 1), "%)\n"))

- T² violations: 10 out of 10 (100%)

Code

cat("- MEWMA violations:", recent_violations_MEWMA, "out of", recent_samples,
    paste0("(", round(recent_violations_MEWMA/recent_samples*100, 1), "%)\n\n"))

- MEWMA violations: 10 out of 10 (100%)

Code

if (recent_violations_MEWMA > 0) {
  cat("**Status: [WARNING] Process requires immediate attention**\n\n")
  cat("**Recommended Actions:**\n")
  cat("1. Stop production and investigate root cause\n")
  cat("2. Review variable contributions to identify problem source\n")
  cat("3. Check equipment calibration and process parameters\n")
  cat("4. Review operator procedures and training\n")
  cat("5. Implement corrective actions\n")
  cat("6. Document findings and countermeasures\n")
} else {
  cat("**Status: [OK] Process is in statistical control**\n\n")
  cat("**Recommended Actions:**\n")
  cat("1. Continue current monitoring protocol\n")
  cat("2. Maintain equipment calibration schedule\n")
  cat("3. Regular operator training refreshers\n")
  cat("4. Update control limits quarterly\n")
}

**Status: [WARNING] Process requires immediate attention**

**Recommended Actions:**
1. Stop production and investigate root cause
2. Review variable contributions to identify problem source
3. Check equipment calibration and process parameters
4. Review operator procedures and training
5. Implement corrective actions
6. Document findings and countermeasures

Advantages of Multivariate SPC

Compared to univariate monitoring:

Accounts for correlations: Detects patterns that univariate charts miss
Reduces chart proliferation: One chart instead of multiple
Lower false alarm rate: Adjusts for multiple testing
More powerful: Better at detecting real process shifts
Process understanding: Reveals relationships between variables

When to Use Each Method

Method	Best For	Advantages	Disadvantages
T²	Large sudden shifts	Simple, well-understood	Less sensitive to small shifts
MEWMA	Small gradual shifts	Memory, smoothing	Slower response to large shifts
MCUSUM	Persistent small shifts	Directional info	Complex interpretation
PCA-based	High dimensions (10+ vars)	Dimension reduction	Requires more data

Implementation Guidelines

Phase I (Establishing Control): 1. Collect 20-30 samples during stable operation 2. Calculate mean vector and covariance matrix 3. Verify multivariate normality 4. Establish control limits

Phase II (Ongoing Monitoring): 1. Plot new observations on control charts 2. Investigate out-of-control signals immediately 3. Use contribution analysis to identify root causes 4. Document and implement corrective actions 5. Update control limits after process improvements

Report Generated: 2025-12-20 10:45:34.022343
Analysis Method: Hotelling T² and MEWMA
Statistical Software: R with qcr package (unified quality control)
Methodology: Multivariate Statistical Process Control
Package Info: qcr integrates qcc and MSQC functionality