Time Series Statistical Process Control

Autocorrelated Data, ARIMA Models, and Time Series Cross-Validation

Author

David Guy, PhD, Quality Control Department

Published

December 21, 2025

Introduction to Time Series SPC

The Autocorrelation Problem

Traditional SPC assumes:

  • Observations are independent
  • There is no correlation between consecutive measurements
  • Samples are drawn randomly from a stable process

Reality in many processes:

  ✗ Chemical reactors have thermal inertia
  ✗ Manufacturing lines have tool wear
  ✗ Measurements taken close together in time are correlated

This temporal dependence is called autocorrelation.

Why Autocorrelation Matters

Impact on traditional control charts:

  1. False alarm rate increases - charts signal when nothing has changed
  2. Detection power decreases - real shifts are missed
  3. Control limits are too narrow - they are calculated assuming independence
  4. The run length distribution changes - it is no longer geometric

Example:

  • Traditional I-MR chart: nominal false alarm rate of about 0.27% (3-sigma limits)
  • With autocorrelation (ρ = 0.5): the realized false alarm rate can be many times higher (on the order of 15% in unfavorable cases)
  • The process appears “out of control” when it is actually stable
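
A minimal simulation sketch of this effect (illustrative only, not part of the analysis data): generate a stable AR(1) series with ρ = 0.5, estimate I-chart limits from the moving range in the usual way, and count how often the stable series breaches them. The exact inflation depends on ρ and on how σ is estimated, but it lands well above the nominal 0.27%.

Code
# Sketch: false alarm rate of a standard I-chart applied to a stable AR(1) series
set.seed(1)
n   <- 10000
rho <- 0.5
x   <- as.numeric(arima.sim(model = list(ar = rho), n = n))  # stable, mean 0

# Usual I-chart limits: sigma_hat = mean moving range / d2 (d2 = 1.128 for n = 2)
mr_bar    <- mean(abs(diff(x)))
sigma_hat <- mr_bar / 1.128
ucl <- mean(x) + 3 * sigma_hat
lcl <- mean(x) - 3 * sigma_hat

# Observed false alarm rate vs the nominal 0.27%
mean(x > ucl | x < lcl)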

Solutions for Autocorrelated Data

1. Time Series Models (ARIMA)

  • Model the autocorrelation explicitly
  • Monitor residuals (which ARE independent)
  • Most theoretically sound approach

2. Modified Control Charts

  • EWMA (exponentially weighted moving average)
  • CUSUM (cumulative sum)
  • More robust to autocorrelation (a short EWMA sketch follows this list)

3. Time Series Cross-Validation

  • Block-based CV respects temporal order
  • Time series bootstrap preserves dependence
  • Validates model performance
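
As a quick illustration of option 2, the sketch below runs qcc's ewma() chart on a simulated autocorrelated series (the qcc package is loaded in the Setup section that follows). The lambda and nsigmas values are the conventional defaults, not values tuned for the data analyzed later.

Code
# Sketch: EWMA chart on a simulated AR(1) series
library(qcc)

set.seed(42)
x <- 100 + as.numeric(arima.sim(model = list(ar = 0.7), n = 150, sd = 1.4))

# EWMA with conventional defaults: lambda = 0.2, 3-sigma limits
ewma_chart <- ewma(x, lambda = 0.2, nsigmas = 3, plot = FALSE)
summary(ewma_chart)   # smoothed values and control limits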

Setup and Libraries

Code
# Load required packages
library(forecast)   # Time series modeling (ARIMA, auto.arima)
library(tseries)    # Time series tests (ADF, KPSS)
library(plotly)     # Interactive visualizations
library(gt)         # Beautiful tables
library(dplyr)      # Data manipulation
library(tidyr)      # Data tidying
library(ggplot2)    # Graphics
library(readr)      # Data import
library(lubridate)  # Date handling
library(DT)         # Interactive tables
library(knitr)      # Reports
library(qcc)        # Quality control charts
library(zoo)        # Moving averages

theme_set(theme_minimal())

cat("Time Series SPC Analysis\n")
Time Series SPC Analysis
Code
cat("Package versions:\n")
Package versions:
Code
cat("forecast:", as.character(packageVersion("forecast")), "\n")
forecast: 8.24.0 
Code
cat("tseries:", as.character(packageVersion("tseries")), "\n\n")
tseries: 0.10.58 

Import or Generate Time Series Data

Code
# Import time series quality data
if (file.exists("timeseries_data.csv")) {
  ts_data <- read_csv("timeseries_data.csv", show_col_types = FALSE)
  cat("[OK] Time series data loaded from timeseries_data.csv\n\n")
} else {
  cat("[INFO] No timeseries_data.csv found. Generating example data.\n")
  cat("For your own data, create timeseries_data.csv with columns:\n")
  cat("  time, measurement, batch, shift\n\n")
  
  # Generate realistic autocorrelated time series
  set.seed(123)
  n <- 200
  
  # Phase I: In-control with autocorrelation (samples 1-150)
  # AR(1) process: X_t = μ(1 - φ) + φ*X_{t-1} + ε_t
  # (innovation sd set to σ·sqrt(1 - φ²) so the marginal sd of X_t equals σ)
  phi <- 0.7  # Autocorrelation parameter
  mu <- 100   # Process mean
  sigma <- 2  # Process std dev
  
  # Generate AR(1) process
  x <- numeric(n)
  x[1] <- rnorm(1, mu, sigma)
  
  for (i in 2:150) {
    x[i] <- mu * (1 - phi) + phi * x[i-1] + rnorm(1, 0, sigma * sqrt(1 - phi^2))
  }
  
  # Phase II: Mean shift (samples 151-200)
  mu_shift <- 103  # 1.5 sigma shift
  for (i in 151:n) {
    x[i] <- mu_shift * (1 - phi) + phi * x[i-1] + rnorm(1, 0, sigma * sqrt(1 - phi^2))
  }
  
  # Create dataframe
  ts_data <- tibble(
    time = 1:n,
    date = seq.Date(from = Sys.Date() - n, to = Sys.Date() - 1, by = "day"),
    measurement = x,
    batch = rep(1:20, each = 10),
    shift = sample(c("Day", "Night"), n, replace = TRUE),
    phase = ifelse(time <= 150, "Phase I", "Phase II")
  )
}
[OK] Time series data loaded from timeseries_data.csv
Code
# Display summary
cat("Time Series Data Summary:\n")
Time Series Data Summary:
Code
cat("Observations:", nrow(ts_data), "\n")
Observations: 200 
Code
cat("Time range:", min(ts_data$time), "to", max(ts_data$time), "\n\n")
Time range: 1 to 200 
Code
# Show first observations
ts_data %>%
  head(10) %>%
  gt() %>%
  tab_header(
    title = "Time Series Quality Data - Sample",
    subtitle = "First 10 observations"
  ) %>%
  fmt_number(
    columns = measurement,
    decimals = 2
  ) %>%
  tab_style(
    style = cell_fill(color = "lightblue"),
    locations = cells_column_labels()
  )
Time Series Quality Data - Sample
First 10 observations
time date measurement batch shift phase
1 2025-06-04 14:39:17.929335 97.83 1 Night Phase I
2 2025-06-05 14:39:17.929335 99.90 1 Day Phase I
3 2025-06-06 14:39:17.929335 100.34 1 Night Phase I
4 2025-06-07 14:39:17.929335 98.08 1 Day Phase I
5 2025-06-08 14:39:17.929335 97.83 1 Day Phase I
6 2025-06-09 14:39:17.929335 100.84 1 Day Phase I
7 2025-06-10 14:39:17.929335 97.12 1 Day Phase I
8 2025-06-11 14:39:17.929335 97.37 1 Night Phase I
9 2025-06-12 14:39:17.929335 99.97 1 Night Phase I
10 2025-06-13 14:39:17.929335 98.74 1 Night Phase I

Time Series Visualization

Code
# Create comprehensive time series plot
cat("\nTime Series Overview:\n\n")

Time Series Overview:
Code
# Summary statistics
ts_summary <- ts_data %>%
  group_by(phase) %>%
  summarise(
    n = n(),
    Mean = mean(measurement),
    `Std Dev` = sd(measurement),
    Min = min(measurement),
    Max = max(measurement),
    .groups = "drop"
  )

ts_summary %>%
  gt() %>%
  tab_header(
    title = "Time Series Summary Statistics",
    subtitle = "Phase I vs Phase II comparison"
  ) %>%
  fmt_number(
    columns = c(Mean, `Std Dev`, Min, Max),
    decimals = 3
  ) %>%
  tab_style(
    style = cell_fill(color = "lightgreen"),
    locations = cells_body(rows = phase == "Phase I")
  ) %>%
  tab_style(
    style = cell_fill(color = "lightyellow"),
    locations = cells_body(rows = phase == "Phase II")
  )
Time Series Summary Statistics
Phase I vs Phase II comparison
phase n Mean Std Dev Min Max
Phase I 150 100.237 2.201 93.778 105.562
Phase II 50 102.271 1.855 98.444 106.511
Code
# Main time series plot
ts_plot <- plot_ly(ts_data) %>%
  add_trace(
    x = ~time,
    y = ~measurement,
    type = "scatter",
    mode = "lines+markers",
    name = "Measurements",
    line = list(color = "blue", width = 2),
    marker = list(
      size = 4,
      color = ~ifelse(phase == "Phase I", "blue", "red")
    ),
    text = ~paste0(
      "Time: ", time, "<br>",
      "Date: ", date, "<br>",
      "Value: ", round(measurement, 2), "<br>",
      "Batch: ", batch, "<br>",
      "Shift: ", shift, "<br>",
      "Phase: ", phase
    ),
    hovertemplate = "%{text}<extra></extra>"
  ) %>%
  add_segments(
    x = 150.5, xend = 150.5,
    y = min(ts_data$measurement), yend = max(ts_data$measurement),
    line = list(color = "red", dash = "dash", width = 3),
    showlegend = FALSE,
    name = "Phase Boundary"
  ) %>%
  add_segments(
    x = 1, xend = 150,
    y = mean(ts_data$measurement[1:150]), 
    yend = mean(ts_data$measurement[1:150]),
    line = list(color = "darkgreen", dash = "dot", width = 2),
    name = "Phase I Mean"
  ) %>%
  add_segments(
    x = 151, xend = 200,
    y = mean(ts_data$measurement[151:200]), 
    yend = mean(ts_data$measurement[151:200]),
    line = list(color = "orange", dash = "dot", width = 2),
    name = "Phase II Mean"
  ) %>%
  layout(
    title = list(
      text = "Time Series Data Overview<br><sub>Full process measurement history with phase separation</sub>",
      font = list(size = 16, weight = "bold")
    ),
    xaxis = list(title = "Time Index"),
    yaxis = list(title = "Measurement Value"),
    hovermode = "closest",
    annotations = list(
      list(
        x = 150.5, 
        y = max(ts_data$measurement) * 0.95,
        text = "Phase I | Phase II",
        showarrow = FALSE,
        xanchor = "center",
        font = list(size = 12, color = "red", weight = "bold")
      )
    )
  )

ts_plot
Code
# Histogram by phase
hist_plot <- plot_ly() %>%
  add_trace(
    x = ts_data$measurement[ts_data$phase == "Phase I"],
    type = "histogram",
    name = "Phase I",
    marker = list(color = "lightblue", 
                  line = list(color = "darkblue", width = 1)),
    opacity = 0.7,
    nbinsx = 20
  ) %>%
  add_trace(
    x = ts_data$measurement[ts_data$phase == "Phase II"],
    type = "histogram",
    name = "Phase II",
    marker = list(color = "lightcoral", 
                  line = list(color = "darkred", width = 1)),
    opacity = 0.7,
    nbinsx = 20
  ) %>%
  layout(
    title = list(
      text = "Distribution Comparison<br><sub>Phase I vs Phase II histograms</sub>",
      font = list(size = 16, weight = "bold")
    ),
    xaxis = list(title = "Measurement Value"),
    yaxis = list(title = "Frequency"),
    barmode = "overlay"
  )

hist_plot
Code
# Moving average plot to show trend
window_size <- 10
ts_data_ma <- ts_data %>%
  mutate(
    MA = zoo::rollmean(measurement, k = window_size, fill = NA, align = "right")
  )

ma_plot <- plot_ly(ts_data_ma) %>%
  add_trace(
    x = ~time,
    y = ~measurement,
    type = "scatter",
    mode = "lines",
    name = "Original",
    line = list(color = "lightgray", width = 1),
    opacity = 0.5
  ) %>%
  add_trace(
    x = ~time,
    y = ~MA,
    type = "scatter",
    mode = "lines",
    name = paste0("Moving Average (", window_size, ")"),
    line = list(color = "darkblue", width = 3)
  ) %>%
  add_segments(
    x = 150.5, xend = 150.5,
    y = min(ts_data$measurement, na.rm = TRUE), 
    yend = max(ts_data$measurement, na.rm = TRUE),
    line = list(color = "red", dash = "dash", width = 2),
    showlegend = FALSE
  ) %>%
  layout(
    title = list(
      text = "Moving Average Trend<br><sub>Smoothed time series showing process shift</sub>",
      font = list(size = 16, weight = "bold")
    ),
    xaxis = list(title = "Time Index"),
    yaxis = list(title = "Measurement Value")
  )

ma_plot
Code
# Calculate and display first differences
ts_data_diff <- ts_data %>%
  mutate(
    Difference = c(NA, diff(measurement))
  )

diff_plot <- plot_ly(ts_data_diff) %>%
  add_trace(
    x = ~time,
    y = ~Difference,
    type = "scatter",
    mode = "lines+markers",
    name = "First Difference",
    line = list(color = "purple"),
    marker = list(size = 3)
  ) %>%
  add_segments(
    x = 0, xend = max(ts_data$time),
    y = 0, yend = 0,
    line = list(color = "black", width = 1),
    showlegend = FALSE
  ) %>%
  add_segments(
    x = 150.5, xend = 150.5,
    y = min(ts_data_diff$Difference, na.rm = TRUE), 
    yend = max(ts_data_diff$Difference, na.rm = TRUE),
    line = list(color = "red", dash = "dash", width = 2),
    showlegend = FALSE
  ) %>%
  layout(
    title = list(
      text = "First Differences (Change from Previous Time Point)<br><sub>Assessing variability and detecting changes</sub>",
      font = list(size = 16, weight = "bold")
    ),
    xaxis = list(title = "Time Index"),
    yaxis = list(title = "Difference (Δy)")
  )

diff_plot
Code
cat("\n")
Code
cat("Visual Assessment:\n")
Visual Assessment:
Code
cat("- Main plot shows overall trend and phase separation\n")
- Main plot shows overall trend and phase separation
Code
cat("- Histogram shows distribution shift between phases\n")
- Histogram shows distribution shift between phases
Code
cat("- Moving average reveals underlying trend\n")
- Moving average reveals underlying trend
Code
cat("- First differences show period-to-period variability\n\n")
- First differences show period-to-period variability
Code
if (mean(ts_data$measurement[151:200]) > mean(ts_data$measurement[1:150])) {
  cat("Observation: Phase II mean is HIGHER than Phase I\n")
  cat("Shift magnitude:", round(mean(ts_data$measurement[151:200]) - mean(ts_data$measurement[1:150]), 3), "\n\n")
} else {
  cat("Observation: Phase II mean is LOWER than Phase I\n")
  cat("Shift magnitude:", round(abs(mean(ts_data$measurement[151:200]) - mean(ts_data$measurement[1:150])), 3), "\n\n")
}
Observation: Phase II mean is HIGHER than Phase I
Shift magnitude: 2.034 

1. Diagnostic Analysis

Check for Autocorrelation

Code
# Extract measurement vector
y <- ts_data$measurement
y_phase1 <- y[1:150]  # Phase I for model building

# Create time series object
ts_phase1 <- ts(y_phase1, frequency = 1)
ts_all <- ts(y, frequency = 1)

# Calculate autocorrelation function (ACF)
acf_result <- acf(y_phase1, lag.max = 30, plot = FALSE)
pacf_result <- pacf(y_phase1, lag.max = 30, plot = FALSE)

# Test for autocorrelation
# Ljung-Box test (H0: no autocorrelation)
lb_test <- Box.test(y_phase1, lag = 20, type = "Ljung-Box")

# Durbin-Watson test
dw_stat <- sum(diff(y_phase1)^2) / sum((y_phase1 - mean(y_phase1))^2)

cat("\nAutocorrelation Tests:\n\n")

Autocorrelation Tests:
Code
cat("Ljung-Box Test:\n")
Ljung-Box Test:
Code
cat("  Statistic:", round(lb_test$statistic, 3), "\n")
  Statistic: 283.577 
Code
cat("  p-value:", round(lb_test$p.value, 4), "\n")
  p-value: 0 
Code
cat("  Interpretation:", ifelse(lb_test$p.value < 0.05, 
                                 "[SIGNIFICANT] Autocorrelation detected",
                                 "[OK] No significant autocorrelation"), "\n\n")
  Interpretation: [SIGNIFICANT] Autocorrelation detected 
Code
cat("Durbin-Watson Statistic:", round(dw_stat, 3), "\n")
Durbin-Watson Statistic: 0.589 
Code
cat("  (Values near 2 = no autocorrelation, <1.5 or >2.5 = autocorrelation)\n")
  (Values near 2 = no autocorrelation, <1.5 or >2.5 = autocorrelation)
Code
cat("  Interpretation:", ifelse(dw_stat < 1.5 | dw_stat > 2.5,
                                 "[SIGNIFICANT] Autocorrelation present",
                                 "[OK] Little autocorrelation"), "\n\n")
  Interpretation: [SIGNIFICANT] Autocorrelation present 
Code
# Summary table
autocorr_summary <- tibble(
  Test = c("Ljung-Box", "Durbin-Watson"),
  Statistic = c(lb_test$statistic, dw_stat),
  `p-value` = c(lb_test$p.value, NA),
  Interpretation = c(
    ifelse(lb_test$p.value < 0.05, "Autocorrelation detected", "No autocorrelation"),
    ifelse(dw_stat < 1.5 | dw_stat > 2.5, "Autocorrelation present", "Little autocorrelation")
  )
)

autocorr_summary %>%
  gt() %>%
  tab_header(
    title = "Autocorrelation Tests Summary",
    subtitle = "Testing for temporal dependence in Phase I data"
  ) %>%
  fmt_number(
    columns = Statistic,
    decimals = 3
  ) %>%
  fmt_number(
    columns = `p-value`,
    decimals = 4
  ) %>%
  sub_missing(
    columns = `p-value`,
    missing_text = "—"
  ) %>%
  tab_style(
    style = cell_fill(color = "lightcoral"),
    locations = cells_body(
      columns = Interpretation,
      rows = grepl("detected|present", Interpretation)
    )
  )
Autocorrelation Tests Summary
Testing for temporal dependence in Phase I data
Test Statistic p-value Interpretation
Ljung-Box 283.577 0.0000 Autocorrelation detected
Durbin-Watson 0.589 Autocorrelation present
Code
# ACF Plot
acf_data <- tibble(
  Lag = 1:length(acf_result$acf[-1]),
  ACF = acf_result$acf[-1],
  Type = "ACF"
)

pacf_data <- tibble(
  Lag = 1:length(pacf_result$acf),
  ACF = pacf_result$acf,
  Type = "PACF"
)

combined_acf <- bind_rows(acf_data, pacf_data)

# Critical value for 95% confidence
ci <- qnorm(0.975) / sqrt(length(y_phase1))

acf_plot <- plot_ly(combined_acf) %>%
  add_trace(
    x = ~Lag,
    y = ~ACF,
    type = "bar",
    color = ~Type,
    colors = c("ACF" = "blue", "PACF" = "darkgreen"),
    hovertemplate = "<b>%{fullData.name}</b><br>Lag: %{x}<br>Value: %{y:.3f}<extra></extra>"
  ) %>%
  add_segments(
    x = 0, xend = max(combined_acf$Lag),
    y = ci, yend = ci,
    line = list(color = "red", dash = "dash", width = 2),
    showlegend = FALSE,
    name = "95% CI"
  ) %>%
  add_segments(
    x = 0, xend = max(combined_acf$Lag),
    y = -ci, yend = -ci,
    line = list(color = "red", dash = "dash", width = 2),
    showlegend = FALSE
  ) %>%
  layout(
    title = list(
      text = "ACF and PACF Plots<br><sub>Identifying autocorrelation patterns</sub>",
      font = list(size = 16, weight = "bold")
    ),
    xaxis = list(title = "Lag"),
    yaxis = list(title = "Correlation"),
    barmode = "group"
  )

acf_plot
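
For reference, an AR(1) process has theoretical autocorrelations ρ_k = φ^k, so the sample ACF should decay roughly geometrically while the PACF cuts off after lag 1. The short sketch below compares the first few sample autocorrelations against the geometric pattern implied by the lag-1 estimate; it is an informal check, not a formal test.

Code
# Informal check: sample ACF vs the geometric decay phi^k implied by an AR(1)
phi_hat <- acf_result$acf[2]   # lag-1 sample autocorrelation (acf[1] is lag 0)

tibble(
  Lag           = 1:10,
  `Sample ACF`  = round(acf_result$acf[2:11], 3),
  `AR(1) phi^k` = round(phi_hat^(1:10), 3)
)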

Stationarity Tests

Code
# Augmented Dickey-Fuller test (H0: non-stationary)
adf_test <- adf.test(y_phase1)

# KPSS test (H0: stationary)
kpss_test <- kpss.test(y_phase1)

cat("\nStationarity Tests:\n\n")

Stationarity Tests:
Code
cat("Augmented Dickey-Fuller Test:\n")
Augmented Dickey-Fuller Test:
Code
cat("  Statistic:", round(adf_test$statistic, 3), "\n")
  Statistic: -3.865 
Code
cat("  p-value:", round(adf_test$p.value, 4), "\n")
  p-value: 0.0178 
Code
cat("  Interpretation:", ifelse(adf_test$p.value < 0.05,
                                 "[OK] Series is stationary",
                                 "[WARNING] Series may be non-stationary"), "\n\n")
  Interpretation: [OK] Series is stationary 
Code
cat("KPSS Test:\n")
KPSS Test:
Code
cat("  Statistic:", round(kpss_test$statistic, 3), "\n")
  Statistic: 0.105 
Code
cat("  p-value:", round(kpss_test$p.value, 4), "\n")
  p-value: 0.1 
Code
cat("  Interpretation:", ifelse(kpss_test$p.value > 0.05,
                                 "[OK] Series is stationary",
                                 "[WARNING] Series is non-stationary"), "\n\n")
  Interpretation: [OK] Series is stationary 
Code
# Summary table
stationarity_summary <- tibble(
  Test = c("Augmented Dickey-Fuller", "KPSS"),
  Hypothesis = c("H0: Non-stationary", "H0: Stationary"),
  Statistic = c(adf_test$statistic, kpss_test$statistic),
  `p-value` = c(adf_test$p.value, kpss_test$p.value),
  Conclusion = c(
    ifelse(adf_test$p.value < 0.05, "Stationary", "Non-stationary"),
    ifelse(kpss_test$p.value > 0.05, "Stationary", "Non-stationary")
  )
)

stationarity_summary %>%
  gt() %>%
  tab_header(
    title = "Stationarity Tests",
    subtitle = "Testing if mean and variance are constant over time"
  ) %>%
  fmt_number(
    columns = c(Statistic, `p-value`),
    decimals = 4
  ) %>%
  tab_style(
    style = cell_fill(color = "lightgreen"),
    locations = cells_body(
      columns = Conclusion,
      rows = Conclusion == "Stationary"
    )
  ) %>%
  tab_style(
    style = cell_fill(color = "lightyellow"),
    locations = cells_body(
      columns = Conclusion,
      rows = Conclusion == "Non-stationary"
    )
  )
Stationarity Tests
Testing if mean and variance are constant over time
Test Hypothesis Statistic p-value Conclusion
Augmented Dickey-Fuller H0: Non-stationary −3.8651 0.0178 Stationary
KPSS H0: Stationary 0.1047 0.1000 Stationary
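
As a quick cross-check of the two tests above, forecast::ndiffs() uses a unit-root test to suggest how many differences the series needs to become stationary; given the results above it should return 0 here. This is a convenience check, not an additional analysis step.

Code
# Suggested differencing order d from unit-root tests (0 expected, per the tests above)
ndiffs(y_phase1)                 # KPSS-based by default
ndiffs(y_phase1, test = "adf")   # ADF-based alternative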

2. ARIMA Model Building

Automatic Model Selection

Code
# Automatic ARIMA model selection
cat("\nFitting ARIMA model to Phase I data...\n\n")

Fitting ARIMA model to Phase I data...
Code
arima_model <- auto.arima(ts_phase1, 
                          seasonal = FALSE,
                          stepwise = TRUE,
                          approximation = FALSE,
                          trace = FALSE)

cat("Selected Model:", format(arima_model$call), "\n")
Selected Model: auto.arima(y = ts_phase1, seasonal = FALSE, stepwise = TRUE,      trace = FALSE, approximation = FALSE, x = structure(list(         x = structure(c(97.8287387933989, 99.9046113804412, 100.33740210444,          98.0847622947602, 97.8329271486981, 100.841772170881,          97.123249292472, 97.3736647365155, 99.9696839528894,          98.7408258572421, 98.1489347274819, 98.5689828445965,          101.128418446025, 99.8773583357823, 99.2800177573723,          98.8756347206254, 102.363642665268, 104.777905134276,          104.778609404013, 103.896611088766, 103.78080074418,          104.775751737961, 102.006388098132, 103.083891462195,          100.367824214204, 99.3465856079496, 100.838215292158,          98.5461865109298, 98.7822724095899, 97.9167585043672,          98.1766334651384, 94.7264586660874, 93.7782656913309,          94.6451613480021, 97.5762942589147, 98.0554046209828,          98.6428480156261, 100.032972257785, 98.7668514116694,          99.5418968348383, 98.5290343148848, 96.5027184123315,          96.9935863086652, 98.7150691157144, 99.5841502758808,          99.6920078671976, 103.201386571364, 102.830727127978,          103.379423616953, 105.562304624808, 102.045289694209,          99.9480164548274, 102.454130729517, 100.57802992962,          100.447017083682, 101.840200651206, 102.560323644697,          104.298705365059, 105.14530086738, 105.129108849709,          102.486727399015, 102.876000149823, 102.46207029605,          99.829163235352, 101.904722205198, 102.486269930958,          101.805361782678, 100.930831193718, 98.9400654634583,          99.5430232026388, 100.349181130777, 99.0573000249441,          101.000069424662, 99.132929191656, 96.3606565948708,          98.937486937375, 98.6801189178403, 98.8960769898098,          98.0310407461143, 96.3279548993663, 99.2224060040414,          98.4717824938718, 101.302562409174, 102.064860412742,          100.995837733503, 99.1461075578458, 98.3561103195336,          97.1174477919397, 100.963207590309, 100.909114368494,          102.279202169988, 99.7853007286366, 100.108280394259,          101.758119622923, 100.752192660219, 101.999260884158,          99.850409795014, 97.9478599664601, 99.1053944399024,          98.8322038340736, 100.099580205981, 97.24471712366, 99.088619369841,          103.073153865328, 102.116034768954, 101.52998905227,          101.327440295374, 98.2697749519012, 99.3975016118133,          97.2852673711942, 97.4888385117662, 100.017359745138,          98.9620519673089, 99.9893631338912, 101.439034887848,          101.405445996278, 99.025706121714, 98.8431246085634,          101.988786398743, 98.4998066029257, 98.5559634061243,          98.2006063288164, 98.9128861604702, 100.307685968181,          102.513050460081, 101.373166264736, 102.12147187883,          102.198802010666, 102.216664863071, 100.746220924922,          99.0978946695429, 97.7973504372616, 97.377736867761,          98.623876139027, 100.123566423073, 100.548502421318,          99.5998869889291, 102.299362149084, 103.778927533107,          102.139635979395, 100.321649895244, 100.41113908176,          102.097861871481, 101.943786763754, 102.15556128325,          101.205981895406, 101.495872515207, 103.2531614916, 101.934897554224,          101.559112670949), tsp = c(1, 150, 1), class = "ts")), class = "data.frame", row.names = c(NA,      -150L))) 
Code
cat("Order: ARIMA(", arima_model$arma[1], ",", arima_model$arma[6], ",", arima_model$arma[2], ")\n\n")
Order: ARIMA( 1 , 0 , 0 )
Code
# Model summary
# Model summary - use accuracy for error metrics
arima_accuracy <- accuracy(arima_model)

# Extract model parameters
if (arima_model$arma[1] > 0) {
  ar_coefs <- coef(arima_model)[grep("ar", names(coef(arima_model)))]
} else {
  ar_coefs <- numeric(0)
}

if (arima_model$arma[2] > 0) {
  ma_coefs <- coef(arima_model)[grep("ma", names(coef(arima_model)))]
} else {
  ma_coefs <- numeric(0)
}

# Model information
model_info <- tibble(
  Metric = c("Model", "AIC", "BIC", "Log Likelihood", "Sigma²", "ME", "RMSE", "MAE"),
  Value = c(
    paste0("ARIMA(", arima_model$arma[1], ",", arima_model$arma[6], ",", arima_model$arma[2], ")"),
    as.character(round(arima_model$aic, 2)),
    as.character(round(arima_model$bic, 2)),
    as.character(round(arima_model$loglik, 2)),
    as.character(round(arima_model$sigma2, 4)),
    as.character(round(arima_accuracy[,"ME"], 4)),
    as.character(round(arima_accuracy[,"RMSE"], 4)),
    as.character(round(arima_accuracy[,"MAE"], 4))
  )
)

model_info %>%
  gt() %>%
  tab_header(
    title = "ARIMA Model Summary",
    subtitle = "Automatic model selection results"
  ) %>%
  tab_style(
    style = list(
      cell_fill(color = "lightblue"),
      cell_text(weight = "bold")
    ),
    locations = cells_body(rows = Metric == "Model")
  )
ARIMA Model Summary
Automatic model selection results
Metric Value
Model ARIMA(1,0,0)
AIC 565.38
BIC 574.42
Log Likelihood -279.69
Sigma² 2.4602
ME 0.0159
RMSE 1.558
MAE 1.2846
Code
# Model coefficients
if (length(coef(arima_model)) > 0) {
  coef_table <- tibble(
    Parameter = names(coef(arima_model)),
    Estimate = coef(arima_model),
    `Std Error` = sqrt(diag(arima_model$var.coef)),
    `t-value` = Estimate / `Std Error`,
    `p-value` = 2 * (1 - pnorm(abs(`t-value`)))
  )
  
  coef_table %>%
    gt() %>%
    tab_header(
      title = "ARIMA Model Coefficients",
      subtitle = "Parameter estimates and significance tests"
    ) %>%
    fmt_number(
      columns = c(Estimate, `Std Error`, `t-value`),
      decimals = 4
    ) %>%
    fmt_number(
      columns = `p-value`,
      decimals = 4
    ) %>%
    tab_style(
      style = cell_fill(color = "lightgreen"),
      locations = cells_body(
        columns = `p-value`,
        rows = `p-value` < 0.05
      )
    )
}
ARIMA Model Coefficients
Parameter estimates and significance tests
Parameter Estimate Std Error t-value p-value
ar1 0.7031 0.0577 12.1891 0.0000
intercept 100.2203 0.4218 237.5805 0.0000

Residual Diagnostics

Code
# Extract residuals
residuals_arima <- residuals(arima_model)

# Test residuals for independence
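# Note: for ARIMA residuals, Box.test() can take fitdf = p + q so the degrees of
# freedom account for the estimated parameters (forecast::checkresiduals() makes
# this adjustment automatically).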
lb_residuals <- Box.test(residuals_arima, lag = 20, type = "Ljung-Box")

cat("\nResidual Diagnostics:\n\n")

Residual Diagnostics:
Code
cat("Ljung-Box Test on Residuals:\n")
Ljung-Box Test on Residuals:
Code
cat("  p-value:", round(lb_residuals$p.value, 4), "\n")
  p-value: 0.0116 
Code
cat("  Interpretation:", ifelse(lb_residuals$p.value > 0.05,
                                 "[OK] Residuals are white noise (independent)",
                                 "[WARNING] Residuals still autocorrelated"), "\n\n")
  Interpretation: [WARNING] Residuals still autocorrelated 
Code
# Residual statistics
resid_stats <- tibble(
  Statistic = c("Mean", "Std Dev", "Min", "Max", "Ljung-Box p-value"),
  Value = c(
    mean(residuals_arima),
    sd(residuals_arima),
    min(residuals_arima),
    max(residuals_arima),
    lb_residuals$p.value
  )
)

resid_stats %>%
  gt() %>%
  tab_header(
    title = "Residual Statistics",
    subtitle = "ARIMA model residuals should be white noise"
  ) %>%
  fmt_number(
    columns = Value,
    decimals = 4
  ) %>%
  tab_footnote(
    footnote = "p-value > 0.05 indicates residuals are independent (good)",
    locations = cells_body(rows = Statistic == "Ljung-Box p-value")
  )
Residual Statistics
ARIMA model residuals should be white noise
Statistic Value
Mean 0.0159
Std Dev 1.5631
Min −4.0570
Max 3.6485
Ljung-Box p-value1 0.0116
1 p-value > 0.05 indicates residuals are independent (good)
Code
# Residual plots
resid_data <- tibble(
  Time = 1:length(residuals_arima),
  Residual = as.numeric(residuals_arima)
)

resid_plot <- plot_ly(resid_data) %>%
  add_trace(
    x = ~Time,
    y = ~Residual,
    type = "scatter",
    mode = "lines+markers",
    name = "Residuals",
    line = list(color = "blue"),
    marker = list(size = 4),
    hovertemplate = "<b>Time:</b> %{x}<br><b>Residual:</b> %{y:.3f}<extra></extra>"
  ) %>%
  add_segments(
    x = 0, xend = max(resid_data$Time),
    y = 0, yend = 0,
    line = list(color = "black", dash = "solid", width = 1),
    showlegend = FALSE
  ) %>%
  add_segments(
    x = 0, xend = max(resid_data$Time),
    y = 3 * sd(residuals_arima), yend = 3 * sd(residuals_arima),
    line = list(color = "red", dash = "dash", width = 2),
    showlegend = FALSE
  ) %>%
  add_segments(
    x = 0, xend = max(resid_data$Time),
    y = -3 * sd(residuals_arima), yend = -3 * sd(residuals_arima),
    line = list(color = "red", dash = "dash", width = 2),
    showlegend = FALSE
  ) %>%
  layout(
    title = list(
      text = "ARIMA Residuals Time Plot<br><sub>Residuals should be random with no patterns</sub>",
      font = list(size = 16, weight = "bold")
    ),
    xaxis = list(title = "Time"),
    yaxis = list(title = "Residual")
  )

resid_plot

3. Control Charts for Residuals

I-MR Chart on ARIMA Residuals

Code
# Now residuals should be independent - apply traditional control charts
residuals_vec <- as.numeric(residuals_arima)

# Calculate moving range
mr <- abs(diff(residuals_vec))
mr_mean <- mean(mr, na.rm = TRUE)

# Control limits for individuals
d2 <- 1.128  # For n=2
sigma_hat <- mr_mean / d2

ucl_i <- mean(residuals_vec) + 3 * sigma_hat
lcl_i <- mean(residuals_vec) - 3 * sigma_hat

# Control limits for moving range
d3 <- 0      # For n=2
d4 <- 3.267  # For n=2
ucl_mr <- d4 * mr_mean
lcl_mr <- d3 * mr_mean

cat("\nControl Chart Parameters (on Residuals):\n\n")

Control Chart Parameters (on Residuals):
Code
cat("Individuals Chart:\n")
Individuals Chart:
Code
cat("  Center Line:", round(mean(residuals_vec), 4), "\n")
  Center Line: 0.0159 
Code
cat("  UCL:", round(ucl_i, 4), "\n")
  UCL: 4.8869 
Code
cat("  LCL:", round(lcl_i, 4), "\n\n")
  LCL: -4.8551 
Code
cat("Moving Range Chart:\n")
Moving Range Chart:
Code
cat("  Center Line:", round(mr_mean, 4), "\n")
  Center Line: 1.8315 
Code
cat("  UCL:", round(ucl_mr, 4), "\n")
  UCL: 5.9835 
Code
cat("  LCL:", round(lcl_mr, 4), "\n\n")
  LCL: 0 
Code
# Identify out-of-control points
ooc_i <- which(residuals_vec > ucl_i | residuals_vec < lcl_i)
cat("Out-of-control points (Individuals):", length(ooc_i), "\n")
Out-of-control points (Individuals): 0 
Code
if (length(ooc_i) > 0) {
  cat("  At times:", head(ooc_i, 10), "\n")
}

# Control chart data
control_data <- tibble(
  Time = 1:length(residuals_vec),
  Value = residuals_vec,
  UCL = ucl_i,
  CL = mean(residuals_vec),
  LCL = lcl_i,
  Status = ifelse(Value > UCL | Value < LCL, "Out of Control", "In Control")
)

# Individuals chart
i_chart <- plot_ly(control_data) %>%
  add_trace(
    x = ~Time,
    y = ~Value,
    type = "scatter",
    mode = "lines+markers",
    name = "Residuals",
    line = list(color = "blue"),
    marker = list(
      size = 6,
      color = ~ifelse(Status == "Out of Control", "red", "blue")
    ),
    text = ~paste0("Time: ", Time, "<br>Value: ", round(Value, 3), "<br>", Status),
    hovertemplate = "%{text}<extra></extra>"
  ) %>%
  add_trace(
    x = ~Time, y = ~UCL, type = "scatter", mode = "lines",
    name = "UCL", line = list(color = "red", dash = "dash", width = 2)
  ) %>%
  add_trace(
    x = ~Time, y = ~CL, type = "scatter", mode = "lines",
    name = "Center Line", line = list(color = "green", width = 2)
  ) %>%
  add_trace(
    x = ~Time, y = ~LCL, type = "scatter", mode = "lines",
    name = "LCL", line = list(color = "red", dash = "dash", width = 2)
  ) %>%
  layout(
    title = list(
      text = "I-Chart: ARIMA Residuals<br><sub>Monitoring residuals (independent observations)</sub>",
      font = list(size = 16, weight = "bold")
    ),
    xaxis = list(title = "Time"),
    yaxis = list(title = "Residual Value")
  )

i_chart
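
For contrast, the sketch below applies the same I-chart construction directly to the raw Phase I measurements, ignoring the autocorrelation. Because consecutive points are positively correlated, the moving range tends to understate the process sigma, so this chart will typically flag more points than the residual chart above; it is shown for comparison only.

Code
# Comparison sketch: I-chart limits computed on the RAW Phase I data
mr_raw        <- abs(diff(y_phase1))
sigma_hat_raw <- mean(mr_raw) / 1.128          # d2 for n = 2

ucl_raw <- mean(y_phase1) + 3 * sigma_hat_raw
lcl_raw <- mean(y_phase1) - 3 * sigma_hat_raw

ooc_raw <- which(y_phase1 > ucl_raw | y_phase1 < lcl_raw)
cat("Raw-data I-chart:", length(ooc_raw), "of 150 points beyond limits (",
    round(length(ooc_raw) / 150 * 100, 1), "%)\n")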

4. Monitoring Phase II with ARIMA

One-Step-Ahead Forecasts

Code
# Use ARIMA model to monitor Phase II data
# Calculate one-step-ahead forecast errors

cat("\nPhase II Monitoring with ARIMA Model:\n\n")

Phase II Monitoring with ARIMA Model:
Code
# Refit model on full data for monitoring
all_residuals <- numeric(length(y))
all_forecasts <- numeric(length(y))

# Calculate residuals for all data
for (i in 1:length(y)) {
  if (i <= 150) {
    # Use Phase I model
    all_residuals[i] <- residuals_arima[i]
    all_forecasts[i] <- y[i] - residuals_arima[i]
  } else {
    # One-step-ahead forecast for Phase II
    # Refit on data up to i-1
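    # Note: because the refit window keeps expanding into Phase II, a sustained
    # shift is gradually absorbed into the model, which reduces detection power;
    # an alternative is to freeze the Phase I parameters and only forecast.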
    temp_model <- Arima(y[1:(i-1)], 
                        order = c(arima_model$arma[1], arima_model$arma[6], arima_model$arma[2]),
                        include.mean = TRUE)
    forecast_i <- forecast(temp_model, h = 1)
    all_forecasts[i] <- forecast_i$mean[1]
    all_residuals[i] <- y[i] - forecast_i$mean[1]
  }
}

# Calculate control limits (same as Phase I)
ucl_phase2 <- 3 * sigma_hat
lcl_phase2 <- -3 * sigma_hat

# Identify Phase II out-of-control
phase2_residuals <- all_residuals[151:200]
ooc_phase2 <- which(abs(phase2_residuals) > 3 * sigma_hat)

cat("Phase II Out-of-Control Points:", length(ooc_phase2), "out of 50\n")
Phase II Out-of-Control Points: 0 out of 50
Code
cat("False Alarm Rate:", round(length(ooc_phase2) / 50 * 100, 2), "%\n\n")
False Alarm Rate: 0 %
Code
# Full time series plot with forecasts
monitoring_data <- tibble(
  Time = 1:length(y),
  Observation = y,
  Forecast = all_forecasts,
  Residual = all_residuals,
  Phase = ifelse(Time <= 150, "Phase I", "Phase II"),
  UCL = ucl_phase2,
  LCL = lcl_phase2,
  Status = ifelse(abs(Residual) > 3 * sigma_hat, "Out of Control", "In Control")
)

# Monitoring chart
monitoring_plot <- plot_ly(monitoring_data) %>%
  add_trace(
    x = ~Time,
    y = ~Observation,
    type = "scatter",
    mode = "lines+markers",
    name = "Observations",
    line = list(color = "black"),
    marker = list(size = 4)
  ) %>%
  add_trace(
    x = ~Time,
    y = ~Forecast,
    type = "scatter",
    mode = "lines",
    name = "ARIMA Forecast",
    line = list(color = "blue", dash = "dot")
  ) %>%
  add_segments(
    x = 150.5, xend = 150.5,
    y = min(monitoring_data$Observation), yend = max(monitoring_data$Observation),
    line = list(color = "gray", dash = "dot", width = 2),
    showlegend = FALSE
  ) %>%
  layout(
    title = list(
      text = "ARIMA-Based Process Monitoring<br><sub>Observations vs One-Step-Ahead Forecasts</sub>",
      font = list(size = 16, weight = "bold")
    ),
    xaxis = list(title = "Time"),
    yaxis = list(title = "Measurement"),
    annotations = list(
      list(
        x = 150.5, y = max(monitoring_data$Observation),
        text = "Phase I | Phase II",
        showarrow = FALSE,
        font = list(size = 10, color = "gray")
      )
    )
  )

monitoring_plot
Code
# Residual monitoring chart for Phase II
residual_monitor_plot <- plot_ly(monitoring_data %>% filter(Time > 150)) %>%
  add_trace(
    x = ~Time,
    y = ~Residual,
    type = "scatter",
    mode = "lines+markers",
    name = "Forecast Errors",
    line = list(color = "blue"),
    marker = list(
      size = 6,
      color = ~ifelse(Status == "Out of Control", "red", "blue")
    )
  ) %>%
  add_trace(
    x = ~Time, y = ~UCL, type = "scatter", mode = "lines",
    name = "UCL", line = list(color = "red", dash = "dash", width = 2)
  ) %>%
  add_trace(
    x = ~Time, y = 0, type = "scatter", mode = "lines",
    name = "Center", line = list(color = "green", width = 2)
  ) %>%
  add_trace(
    x = ~Time, y = ~LCL, type = "scatter", mode = "lines",
    name = "LCL", line = list(color = "red", dash = "dash", width = 2)
  ) %>%
  layout(
    title = list(
      text = "Phase II: Forecast Error Monitoring<br><sub>Detecting shifts in autocorrelated process</sub>",
      font = list(size = 16, weight = "bold")
    ),
    xaxis = list(title = "Time"),
    yaxis = list(title = "Forecast Error (Residual)")
  )

residual_monitor_plot
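
A sustained shift of roughly one sigma rarely pushes individual forecast errors past 3-sigma limits, which is why no points signal above. A CUSUM (or EWMA) applied to the standardized forecast errors accumulates small, persistent deviations and is the usual complement. The sketch below is a plain one-sided tabular CUSUM with conventional parameters (k = 0.5, h = 5); it assumes the all_residuals and sigma_hat objects computed above and makes no claim about how quickly it signals on this particular series.

Code
# Sketch: upper tabular CUSUM on standardized one-step-ahead forecast errors
k <- 0.5   # reference value, tuned for shifts of about 1 sigma
h <- 5     # decision interval

z        <- all_residuals / sigma_hat      # standardize with the Phase I sigma estimate
cusum_up <- numeric(length(z))
for (t in 2:length(z)) {
  cusum_up[t] <- max(0, cusum_up[t - 1] + z[t] - k)
}

# Time indices at which the upper CUSUM exceeds h (Phase II starts at t = 151)
signal_times <- which(cusum_up > h)
signal_times[signal_times > 150]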

5. Time Series Cross-Validation

Rolling Origin Cross-Validation

Code
cat("\nTime Series Cross-Validation:\n\n")

Time Series Cross-Validation:
Code
cat("Method: Rolling Origin (Expanding Window)\n")
Method: Rolling Origin (Expanding Window)
Code
cat("This respects temporal order and tests forecasting performance\n\n")
This respects temporal order and tests forecasting performance
Code
# Rolling origin cross-validation
# Start with minimum training size, expand window each iteration
min_train <- 50
n_test <- length(y_phase1)
h <- 1  # One-step-ahead forecast

# Storage for results
cv_errors <- numeric(n_test - min_train)
cv_mae <- numeric(n_test - min_train)
cv_rmse <- numeric(n_test - min_train)

for (i in seq_along(cv_errors)) {
  train_end <- min_train + i - 1
  test_point <- train_end + 1
  
  if (test_point <= length(y_phase1)) {
    # Fit model on training data
    train_data <- y_phase1[1:train_end]
    test_data <- y_phase1[test_point]
    
    # Fit ARIMA
    cv_model <- auto.arima(train_data, seasonal = FALSE)
    
    # Forecast
    cv_forecast <- forecast(cv_model, h = h)
    
    # Calculate error
    error <- test_data - cv_forecast$mean[1]
    cv_errors[i] <- error
    cv_mae[i] <- abs(error)
    cv_rmse[i] <- error^2
  }
}

# Summary statistics
mae_cv <- mean(cv_mae, na.rm = TRUE)
rmse_cv <- sqrt(mean(cv_rmse, na.rm = TRUE))
mape_cv <- mean(abs(cv_errors / y_phase1[(min_train+1):length(y_phase1)]) * 100, na.rm = TRUE)

cat("Cross-Validation Results:\n")
Cross-Validation Results:
Code
cat("  Number of folds:", length(cv_errors), "\n")
  Number of folds: 100 
Code
cat("  MAE:", round(mae_cv, 4), "\n")
  MAE: 1.2871 
Code
cat("  RMSE:", round(rmse_cv, 4), "\n")
  RMSE: 1.5549 
Code
cat("  MAPE:", round(mape_cv, 2), "%\n\n")
  MAPE: 1.28 %
Code
# CV results table
cv_summary <- tibble(
  Metric = c("Mean Absolute Error (MAE)", "Root Mean Squared Error (RMSE)", 
             "Mean Absolute Percentage Error (MAPE)", "Number of Folds"),
  Value = c(
    paste0(round(mae_cv, 4)),
    paste0(round(rmse_cv, 4)),
    paste0(round(mape_cv, 2), "%"),
    as.character(length(cv_errors))
  ),
  Interpretation = c(
    "Average forecast error magnitude",
    "Penalizes large errors more",
    "Relative error as percentage",
    "Expanding window iterations"
  )
)

cv_summary %>%
  gt() %>%
  tab_header(
    title = "Time Series Cross-Validation Results",
    subtitle = "Rolling origin method with expanding window"
  ) %>%
  tab_style(
    style = cell_fill(color = "lightblue"),
    locations = cells_column_labels()
  )
Time Series Cross-Validation Results
Rolling origin method with expanding window
Metric Value Interpretation
Mean Absolute Error (MAE) 1.2871 Average forecast error magnitude
Root Mean Squared Error (RMSE) 1.5549 Penalizes large errors more
Mean Absolute Percentage Error (MAPE) 1.28% Relative error as percentage
Number of Folds 100 Expanding window iterations
Code
# Plot CV errors over time
cv_plot_data <- tibble(
  Fold = 1:length(cv_errors),
  `Forecast Error` = cv_errors,
  `Training Size` = min_train + Fold - 1
)

cv_error_plot <- plot_ly(cv_plot_data) %>%
  add_trace(
    x = ~`Training Size`,
    y = ~`Forecast Error`,
    type = "scatter",
    mode = "lines+markers",
    name = "CV Error",
    line = list(color = "blue"),
    marker = list(size = 4),
    hovertemplate = "<b>Training Size:</b> %{x}<br><b>Error:</b> %{y:.3f}<extra></extra>"
  ) %>%
  add_segments(
    x = min(cv_plot_data$`Training Size`), 
    xend = max(cv_plot_data$`Training Size`),
    y = 0, yend = 0,
    line = list(color = "black", width = 1),
    showlegend = FALSE
  ) %>%
  layout(
    title = list(
      text = "Cross-Validation Forecast Errors<br><sub>Performance improves with more training data</sub>",
      font = list(size = 16, weight = "bold")
    ),
    xaxis = list(title = "Training Sample Size"),
    yaxis = list(title = "One-Step-Ahead Forecast Error")
  )

cv_error_plot
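
The forecast package also provides tsCV(), which performs the same rolling-origin evaluation in a single call. A compact sketch, fixing the order at the ARIMA(1,0,0) selected above rather than rerunning auto.arima() at every fold:

Code
# Compact alternative: rolling-origin CV via forecast::tsCV()
f_ar1 <- function(x, h) forecast(Arima(x, order = c(1, 0, 0), include.mean = TRUE), h = h)

# Errors are NA until the initial training window of min_train observations is available
cv_e <- tsCV(ts_phase1, f_ar1, h = 1, initial = min_train)

cat("tsCV  MAE:", round(mean(abs(cv_e), na.rm = TRUE), 4),
    " RMSE:", round(sqrt(mean(cv_e^2, na.rm = TRUE)), 4), "\n")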

Block Bootstrap for Time Series

Code
cat("\nBlock Bootstrap for Time Series:\n\n")

Block Bootstrap for Time Series:
Code
cat("Method: Moving Block Bootstrap\n")
Method: Moving Block Bootstrap
Code
cat("Preserves temporal dependence structure\n\n")
Preserves temporal dependence structure
Code
# Moving block bootstrap
block_length <- 10  # Block size
n_bootstrap <- 1000  # Number of bootstrap samples
n_obs <- length(y_phase1)

# Storage for bootstrap statistics
boot_means <- numeric(n_bootstrap)
boot_vars <- numeric(n_bootstrap)
boot_acf1 <- numeric(n_bootstrap)

set.seed(456)

for (b in 1:n_bootstrap) {
  # Generate bootstrap sample using moving blocks
  n_blocks <- ceiling(n_obs / block_length)
  boot_sample <- numeric(0)
  
  for (i in 1:n_blocks) {
    # Random starting point
    start <- sample(1:(n_obs - block_length + 1), 1)
    block <- y_phase1[start:(start + block_length - 1)]
    boot_sample <- c(boot_sample, block)
  }
  
  # Trim to original length
  boot_sample <- boot_sample[1:n_obs]
  
  # Calculate statistics
  boot_means[b] <- mean(boot_sample)
  boot_vars[b] <- var(boot_sample)
  
  # Calculate ACF(1)
  acf_temp <- acf(boot_sample, lag.max = 1, plot = FALSE)
  boot_acf1[b] <- acf_temp$acf[2]
}

# Calculate confidence intervals
ci_level <- 0.95
alpha <- 1 - ci_level

mean_ci <- quantile(boot_means, c(alpha/2, 1-alpha/2))
var_ci <- quantile(boot_vars, c(alpha/2, 1-alpha/2))
acf1_ci <- quantile(boot_acf1, c(alpha/2, 1-alpha/2))

cat("Bootstrap Results (", n_bootstrap, "iterations, block length =", block_length, "):\n\n")
Bootstrap Results ( 1000 iterations, block length = 10 ):
Code
# Bootstrap results table
boot_results <- tibble(
  Statistic = c("Mean", "Variance", "ACF(1)"),
  `Original Value` = c(
    mean(y_phase1),
    var(y_phase1),
    acf(y_phase1, lag.max = 1, plot = FALSE)$acf[2]
  ),
  `Bootstrap Mean` = c(
    mean(boot_means),
    mean(boot_vars),
    mean(boot_acf1)
  ),
  `95% CI Lower` = c(mean_ci[1], var_ci[1], acf1_ci[1]),
  `95% CI Upper` = c(mean_ci[2], var_ci[2], acf1_ci[2])
)

boot_results %>%
  gt() %>%
  tab_header(
    title = "Block Bootstrap Results",
    subtitle = paste0("Moving block bootstrap (", n_bootstrap, " iterations, block length = ", block_length, ")")
  ) %>%
  fmt_number(
    columns = c(`Original Value`, `Bootstrap Mean`, `95% CI Lower`, `95% CI Upper`),
    decimals = 4
  ) %>%
  tab_footnote(
    footnote = "Bootstrap preserves autocorrelation structure in time series",
    locations = cells_column_labels(columns = `Bootstrap Mean`)
  )
Block Bootstrap Results
Moving block bootstrap (1000 iterations, block length = 10)
Statistic Original Value Bootstrap Mean1 95% CI Lower 95% CI Upper
Mean 100.2371 100.2205 99.4992 100.9615
Variance 4.8457 4.8260 3.4209 6.4229
ACF(1) 0.7004 0.6190 0.4732 0.7275
1 Bootstrap preserves autocorrelation structure in time series
Code
# Bootstrap distribution plots
boot_dist_data <- tibble(
  Mean = boot_means,
  Variance = boot_vars,
  `ACF(1)` = boot_acf1
)

# Mean distribution
mean_hist <- plot_ly(x = boot_dist_data$Mean, type = "histogram", 
                     name = "Bootstrap Distribution",
                     marker = list(color = "lightblue", line = list(color = "black", width = 1))) %>%
  add_segments(
    x = mean(y_phase1), xend = mean(y_phase1),
    y = 0, yend = 150,
    line = list(color = "red", width = 3),
    name = "Original Value"
  ) %>%
  add_segments(
    x = mean_ci[1], xend = mean_ci[1],
    y = 0, yend = 150,
    line = list(color = "green", dash = "dash", width = 2),
    name = "95% CI"
  ) %>%
  add_segments(
    x = mean_ci[2], xend = mean_ci[2],
    y = 0, yend = 150,
    line = list(color = "green", dash = "dash", width = 2),
    showlegend = FALSE
  ) %>%
  layout(
    title = list(
      text = "Bootstrap Distribution of Mean<br><sub>Block bootstrap preserves temporal dependence</sub>",
      font = list(size = 16, weight = "bold")
    ),
    xaxis = list(title = "Mean Value"),
    yaxis = list(title = "Frequency")
  )

mean_hist

Comparing Bootstrap Methods

Code
# Compare block bootstrap vs naive bootstrap
cat("\nComparing Bootstrap Methods:\n\n")

Comparing Bootstrap Methods:
Code
# Naive bootstrap (WRONG for time series - ignores autocorrelation)
naive_boot_acf1 <- numeric(n_bootstrap)

set.seed(456)
for (b in 1:n_bootstrap) {
  # Naive bootstrap: random sampling with replacement
  naive_sample <- sample(y_phase1, replace = TRUE)
  acf_temp <- acf(naive_sample, lag.max = 1, plot = FALSE)
  naive_boot_acf1[b] <- acf_temp$acf[2]
}

# Comparison
comparison_boot <- tibble(
  Method = c("Original Data", "Block Bootstrap", "Naive Bootstrap (WRONG)"),
  `Mean ACF(1)` = c(
    acf(y_phase1, lag.max = 1, plot = FALSE)$acf[2],
    mean(boot_acf1),
    mean(naive_boot_acf1)
  ),
  `Std Dev ACF(1)` = c(
    NA,
    sd(boot_acf1),
    sd(naive_boot_acf1)
  ),
  Validity = c(
    "Ground truth",
    "[OK] Preserves dependence",
    "[X] Destroys dependence"
  )
)

comparison_boot %>%
  gt() %>%
  tab_header(
    title = "Bootstrap Method Comparison",
    subtitle = "Block bootstrap vs Naive bootstrap for time series"
  ) %>%
  fmt_number(
    columns = c(`Mean ACF(1)`, `Std Dev ACF(1)`),
    decimals = 4
  ) %>%
  sub_missing(
    columns = `Std Dev ACF(1)`,
    missing_text = "—"
  ) %>%
  tab_style(
    style = cell_fill(color = "lightgreen"),
    locations = cells_body(rows = Method == "Block Bootstrap")
  ) %>%
  tab_style(
    style = cell_fill(color = "lightcoral"),
    locations = cells_body(rows = Method == "Naive Bootstrap (WRONG)")
  ) %>%
  tab_footnote(
    footnote = "Naive bootstrap underestimates ACF because it breaks temporal order",
    locations = cells_body(rows = Method == "Naive Bootstrap (WRONG)", columns = Method)
  )
Bootstrap Method Comparison
Block bootstrap vs Naive bootstrap for time series
Method Mean ACF(1) Std Dev ACF(1) Validity
Original Data 0.7004 Ground truth
Block Bootstrap 0.6190 0.0642 [OK] Preserves dependence
Naive Bootstrap (WRONG)1 −0.0093 0.0770 [X] Destroys dependence
1 Naive bootstrap underestimates ACF because it breaks temporal order
Code
cat("\nKey Insight:\n")

Key Insight:
Code
cat("Block bootstrap mean ACF(1):", round(mean(boot_acf1), 4), "\n")
Block bootstrap mean ACF(1): 0.619 
Code
cat("Original data ACF(1):", round(acf(y_phase1, lag.max = 1, plot = FALSE)$acf[2], 4), "\n")
Original data ACF(1): 0.7004 
Code
cat("Naive bootstrap mean ACF(1):", round(mean(naive_boot_acf1), 4), "\n\n")
Naive bootstrap mean ACF(1): -0.0093 
Code
cat("Block bootstrap preserves autocorrelation, naive bootstrap destroys it!\n")
Block bootstrap preserves autocorrelation, naive bootstrap destroys it!
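
For routine use, the boot package's tsboot() implements the same idea: sim = "fixed" gives the moving block bootstrap and sim = "geom" the stationary bootstrap. A minimal sketch reusing the block length of 10 from above (boot is an extra package, not loaded in Setup):

Code
# Sketch: moving block bootstrap via boot::tsboot()
library(boot)

# Statistic computed on each bootstrap series: mean and lag-1 autocorrelation
ts_stat <- function(x) c(mean = mean(x),
                         acf1 = acf(x, lag.max = 1, plot = FALSE)$acf[2])

set.seed(456)
tsb <- tsboot(ts(y_phase1), statistic = ts_stat, R = 1000,
              l = 10, sim = "fixed")        # "fixed" = moving block bootstrap

# Percentile 95% intervals for the mean and ACF(1)
apply(tsb$t, 2, quantile, probs = c(0.025, 0.975))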

6. Dashboard and Summary

Code
# Summary metrics
dashboard_metrics <- tibble(
  Metric = c(
    "Total Observations",
    "Phase I Samples",
    "Phase II Samples",
    "ARIMA Model",
    "Autocorrelation (ACF1)",
    "Phase I OOC Points",
    "Phase II OOC Points",
    "CV MAE",
    "Bootstrap Mean CI Width"
  ),
  Value = c(
    length(y),
    150,
    50,
    paste0("ARIMA(", arima_model$arma[1], ",", arima_model$arma[6], ",", arima_model$arma[2], ")"),
    as.character(round(acf(y_phase1, lag.max = 1, plot = FALSE)$acf[2], 3)),
    as.character(length(ooc_i)),
    as.character(length(ooc_phase2)),
    as.character(round(mae_cv, 4)),
    as.character(round(mean_ci[2] - mean_ci[1], 4))
  )
)

dashboard_metrics %>%
  gt() %>%
  tab_header(
    title = "Time Series SPC Dashboard",
    subtitle = paste("Analysis Date:", Sys.Date())
  ) %>%
  tab_style(
    style = cell_fill(color = "lightblue"),
    locations = cells_column_labels()
  )
Time Series SPC Dashboard
Analysis Date: 2025-12-21
Metric Value
Total Observations 200
Phase I Samples 150
Phase II Samples 50
ARIMA Model ARIMA(1,0,0)
Autocorrelation (ACF1) 0.7
Phase I OOC Points 0
Phase II OOC Points 0
CV MAE 1.2871
Bootstrap Mean CI Width 1.4623

7. Recommendations

When to Use Time Series SPC

Use ARIMA-based SPC when:

  ✅ Autocorrelation is detected (Ljung-Box p < 0.05)
  ✅ The Durbin-Watson statistic is < 1.5 or > 2.5
  ✅ The process is slow-moving (chemical, thermal)
  ✅ Measurements are taken at high frequency
  ✅ Traditional charts show too many false alarms

Stick with traditional SPC when:

  ✅ There is no significant autocorrelation
  ✅ Samples are independent
  ✅ Rational subgrouping is possible
  ✅ The process is simple

Best Practices

1. Model Validation
  • Always check residual diagnostics
  • The Ljung-Box test on residuals should have p > 0.05
  • Residuals should look like white noise (see the checkresiduals() sketch below)
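
In practice, forecast::checkresiduals() bundles these diagnostics: a residual time plot, the residual ACF, a histogram, and a Ljung-Box test whose degrees of freedom are adjusted for the number of fitted parameters. A one-line sketch using the model fitted above:

Code
# One-call residual diagnostics for the fitted ARIMA model
checkresiduals(arima_model, lag = 20)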

2. Cross-Validation
  • Use time series CV (not random k-fold!)
  • Use a rolling origin or expanding window
  • Report out-of-sample forecast accuracy

3. Bootstrap
  • Use a block bootstrap (not the naive bootstrap)
  • Set the block length to roughly 1.5 × the lag at which the ACF dies out
  • Validate confidence interval coverage

4. Monitoring
  • Monitor residuals, not the original observations
  • Update the model periodically with new data
  • Watch for model deterioration

Practical Implementation

Code
# Generate recommendations based on analysis
cat("\n### Analysis-Specific Recommendations:\n\n")

### Analysis-Specific Recommendations:
Code
if (lb_test$p.value < 0.05) {
  cat("[REQUIRED] Significant autocorrelation detected (p =", round(lb_test$p.value, 4), ")\n")
  cat("  Action: Use ARIMA-based SPC (implemented above)\n")
  cat("  Benefit: Proper false alarm rate (~0.27% instead of ~", 
      round(length(ooc_i)/150*100, 1), "%)\n\n")
} else {
  cat("[OPTIONAL] Weak autocorrelation detected\n")
  cat("  Action: Traditional SPC may be acceptable\n")
  cat("  Alternative: Use EWMA charts for added robustness\n\n")
}
[REQUIRED] Significant autocorrelation detected (p = 0 )
  Action: Use ARIMA-based SPC (implemented above)
  Benefit: restores the nominal ~0.27% false alarm rate that
           autocorrelation would otherwise inflate
Code
if (length(ooc_phase2) > 5) {
  cat("[ALERT] Multiple Phase II out-of-control signals\n")
  cat("  Count:", length(ooc_phase2), "out of 50 samples\n")
  cat("  Action: Investigate process change at sample ~", min(ooc_phase2) + 150, "\n\n")
}

cat("Model Performance:\n")
Model Performance:
Code
cat("  CV MAE:", round(mae_cv, 4), "- Good if < 1 sigma\n")
  CV MAE: 1.2871 - Good if < 1 sigma
Code
cat("  Model:", paste0("ARIMA(", arima_model$arma[1], ",", arima_model$arma[6], ",", arima_model$arma[2], ")\n"))
  Model: ARIMA(1,0,0)

Report Generated: 2025-12-21 10:07:12.26231
Analysis Method: ARIMA-based Time Series SPC
Statistical Software: R with forecast and qcc packages
Methodology: Time series modeling with cross-validation and bootstrap