Time Series Statistical Process Control

Autocorrelated Data, ARIMA Models, and Time Series Cross-Validation

Author

David Guy, PhD, Quality Control Department

Published

December 21, 2025

Introduction to Time Series SPC

The Autocorrelation Problem

Traditional SPC assumes:

  • Observations are independent
  • There is no correlation between consecutive measurements
  • Samples are drawn randomly from a stable process

Reality in many processes:

  ✗ Chemical reactors have thermal inertia
  ✗ Manufacturing lines have tool wear
  ✗ Measurements taken close together in time are correlated

This temporal dependence is called autocorrelation.

Why Autocorrelation Matters

Impact on traditional control charts:

  1. False alarm rate increases - charts signal when nothing has changed
  2. Detection power decreases - real shifts are missed
  3. Control limits are too narrow - they are calculated assuming independence
  4. The run length distribution changes - it is no longer geometric

Example:

  • Traditional I-MR chart: nominal false alarm rate of about 0.27% (3-sigma limits)
  • With autocorrelation (ρ = 0.5): the realized false alarm rate can be many times higher (on the order of 15% in unfavorable cases)
  • The process appears “out of control” when it is actually stable
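
A minimal simulation sketch of this effect (illustrative only, not part of the analysis data): generate a stable AR(1) series with ρ = 0.5, estimate I-chart limits from the moving range in the usual way, and count how often the stable series breaches them. The exact inflation depends on ρ and on how σ is estimated, but it lands well above the nominal 0.27%.

Code
# Sketch: false alarm rate of a standard I-chart applied to a stable AR(1) series
set.seed(1)
n   <- 10000
rho <- 0.5
x   <- as.numeric(arima.sim(model = list(ar = rho), n = n))  # stable, mean 0

# Usual I-chart limits: sigma_hat = mean moving range / d2 (d2 = 1.128 for n = 2)
mr_bar    <- mean(abs(diff(x)))
sigma_hat <- mr_bar / 1.128
ucl <- mean(x) + 3 * sigma_hat
lcl <- mean(x) - 3 * sigma_hat

# Observed false alarm rate vs the nominal 0.27%
mean(x > ucl | x < lcl)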

Solutions for Autocorrelated Data

1. Time Series Models (ARIMA)

  • Model the autocorrelation explicitly
  • Monitor residuals (which ARE independent)
  • Most theoretically sound approach

2. Modified Control Charts

  • EWMA (exponentially weighted moving average)
  • CUSUM (cumulative sum)
  • More robust to autocorrelation (a short EWMA sketch follows this list)

3. Time Series Cross-Validation

  • Block-based CV respects temporal order
  • Time series bootstrap preserves dependence
  • Validates model performance
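
As a quick illustration of option 2, the sketch below runs qcc's ewma() chart on a simulated autocorrelated series (the qcc package is loaded in the Setup section that follows). The lambda and nsigmas values are the conventional defaults, not values tuned for the data analyzed later.

Code
# Sketch: EWMA chart on a simulated AR(1) series
library(qcc)

set.seed(42)
x <- 100 + as.numeric(arima.sim(model = list(ar = 0.7), n = 150, sd = 1.4))

# EWMA with conventional defaults: lambda = 0.2, 3-sigma limits
ewma_chart <- ewma(x, lambda = 0.2, nsigmas = 3, plot = FALSE)
summary(ewma_chart)   # smoothed values and control limits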

Setup and Libraries

Code
# Load required packages
library(forecast)   # Time series modeling (ARIMA, auto.arima)
library(tseries)    # Time series tests (ADF, KPSS)
library(plotly)     # Interactive visualizations
library(gt)         # Beautiful tables
library(dplyr)      # Data manipulation
library(tidyr)      # Data tidying
library(ggplot2)    # Graphics
library(readr)      # Data import
library(lubridate)  # Date handling
library(DT)         # Interactive tables
library(knitr)      # Reports
library(qcc)        # Quality control charts
library(zoo)        # Moving averages

theme_set(theme_minimal())

cat("Time Series SPC Analysis\n")
Time Series SPC Analysis
Code
cat("Package versions:\n")
Package versions:
Code
cat("forecast:", as.character(packageVersion("forecast")), "\n")
forecast: 8.24.0 
Code
cat("tseries:", as.character(packageVersion("tseries")), "\n\n")
tseries: 0.10.58 

Import or Generate Time Series Data

Code
# Import time series quality data
if (file.exists("timeseries_data.csv")) {
  ts_data <- read_csv("timeseries_data.csv", show_col_types = FALSE)
  cat("[OK] Time series data loaded from timeseries_data.csv\n\n")
} else {
  cat("[INFO] No timeseries_data.csv found. Generating example data.\n")
  cat("For your own data, create timeseries_data.csv with columns:\n")
  cat("  time, measurement, batch, shift\n\n")
  
  # Generate realistic autocorrelated time series
  set.seed(123)
  n <- 200
  
  # Phase I: In-control with autocorrelation (samples 1-150)
  # AR(1) process: X_t = μ(1 - φ) + φ*X_{t-1} + ε_t
  # (innovation sd set to σ·sqrt(1 - φ²) so the marginal sd of X_t equals σ)
  phi <- 0.7  # Autocorrelation parameter
  mu <- 100   # Process mean
  sigma <- 2  # Process std dev
  
  # Generate AR(1) process
  x <- numeric(n)
  x[1] <- rnorm(1, mu, sigma)
  
  for (i in 2:150) {
    x[i] <- mu * (1 - phi) + phi * x[i-1] + rnorm(1, 0, sigma * sqrt(1 - phi^2))
  }
  
  # Phase II: Mean shift (samples 151-200)
  mu_shift <- 103  # 1.5 sigma shift
  for (i in 151:n) {
    x[i] <- mu_shift * (1 - phi) + phi * x[i-1] + rnorm(1, 0, sigma * sqrt(1 - phi^2))
  }
  
  # Create dataframe
  ts_data <- tibble(
    time = 1:n,
    date = seq.Date(from = Sys.Date() - n, to = Sys.Date() - 1, by = "day"),
    measurement = x,
    batch = rep(1:20, each = 10),
    shift = sample(c("Day", "Night"), n, replace = TRUE),
    phase = ifelse(time <= 150, "Phase I", "Phase II")
  )
}
[OK] Time series data loaded from timeseries_data.csv
Code
# Display summary
cat("Time Series Data Summary:\n")
Time Series Data Summary:
Code
cat("Observations:", nrow(ts_data), "\n")
Observations: 200 
Code
cat("Time range:", min(ts_data$time), "to", max(ts_data$time), "\n\n")
Time range: 1 to 200 
Code
# Show first observations
ts_data %>%
  head(10) %>%
  gt() %>%
  tab_header(
    title = "Time Series Quality Data - Sample",
    subtitle = "First 10 observations"
  ) %>%
  fmt_number(
    columns = measurement,
    decimals = 2
  ) %>%
  tab_style(
    style = cell_fill(color = "lightblue"),
    locations = cells_column_labels()
  )
Time Series Quality Data - Sample
First 10 observations
time date measurement batch shift phase
1 2025-06-04 14:39:17.929335 97.83 1 Night Phase I
2 2025-06-05 14:39:17.929335 99.90 1 Day Phase I
3 2025-06-06 14:39:17.929335 100.34 1 Night Phase I
4 2025-06-07 14:39:17.929335 98.08 1 Day Phase I
5 2025-06-08 14:39:17.929335 97.83 1 Day Phase I
6 2025-06-09 14:39:17.929335 100.84 1 Day Phase I
7 2025-06-10 14:39:17.929335 97.12 1 Day Phase I
8 2025-06-11 14:39:17.929335 97.37 1 Night Phase I
9 2025-06-12 14:39:17.929335 99.97 1 Night Phase I
10 2025-06-13 14:39:17.929335 98.74 1 Night Phase I

Time Series Visualization

Code
# Create comprehensive time series plot
cat("\nTime Series Overview:\n\n")

Time Series Overview:
Code
# Summary statistics
ts_summary <- ts_data %>%
  group_by(phase) %>%
  summarise(
    n = n(),
    Mean = mean(measurement),
    `Std Dev` = sd(measurement),
    Min = min(measurement),
    Max = max(measurement),
    .groups = "drop"
  )

ts_summary %>%
  gt() %>%
  tab_header(
    title = "Time Series Summary Statistics",
    subtitle = "Phase I vs Phase II comparison"
  ) %>%
  fmt_number(
    columns = c(Mean, `Std Dev`, Min, Max),
    decimals = 3
  ) %>%
  tab_style(
    style = cell_fill(color = "lightgreen"),
    locations = cells_body(rows = phase == "Phase I")
  ) %>%
  tab_style(
    style = cell_fill(color = "lightyellow"),
    locations = cells_body(rows = phase == "Phase II")
  )
Time Series Summary Statistics
Phase I vs Phase II comparison
phase n Mean Std Dev Min Max
Phase I 150 100.237 2.201 93.778 105.562
Phase II 50 102.271 1.855 98.444 106.511
Code
# Main time series plot
ts_plot <- plot_ly(ts_data) %>%
  add_trace(
    x = ~time,
    y = ~measurement,
    type = "scatter",
    mode = "lines+markers",
    name = "Measurements",
    line = list(color = "blue", width = 2),
    marker = list(
      size = 4,
      color = ~ifelse(phase == "Phase I", "blue", "red")
    ),
    text = ~paste0(
      "Time: ", time, "<br>",
      "Date: ", date, "<br>",
      "Value: ", round(measurement, 2), "<br>",
      "Batch: ", batch, "<br>",
      "Shift: ", shift, "<br>",
      "Phase: ", phase
    ),
    hovertemplate = "%{text}<extra></extra>"
  ) %>%
  add_segments(
    x = 150.5, xend = 150.5,
    y = min(ts_data$measurement), yend = max(ts_data$measurement),
    line = list(color = "red", dash = "dash", width = 3),
    showlegend = FALSE,
    name = "Phase Boundary"
  ) %>%
  add_segments(
    x = 1, xend = 150,
    y = mean(ts_data$measurement[1:150]), 
    yend = mean(ts_data$measurement[1:150]),
    line = list(color = "darkgreen", dash = "dot", width = 2),
    name = "Phase I Mean"
  ) %>%
  add_segments(
    x = 151, xend = 200,
    y = mean(ts_data$measurement[151:200]), 
    yend = mean(ts_data$measurement[151:200]),
    line = list(color = "orange", dash = "dot", width = 2),
    name = "Phase II Mean"
  ) %>%
  layout(
    title = list(
      text = "Time Series Data Overview<br><sub>Full process measurement history with phase separation</sub>",
      font = list(size = 16, weight = "bold")
    ),
    xaxis = list(title = "Time Index"),
    yaxis = list(title = "Measurement Value"),
    hovermode = "closest",
    annotations = list(
      list(
        x = 150.5, 
        y = max(ts_data$measurement) * 0.95,
        text = "Phase I | Phase II",
        showarrow = FALSE,
        xanchor = "center",
        font = list(size = 12, color = "red", weight = "bold")
      )
    )
  )

ts_plot
Code
# Histogram by phase
hist_plot <- plot_ly() %>%
  add_trace(
    x = ts_data$measurement[ts_data$phase == "Phase I"],
    type = "histogram",
    name = "Phase I",
    marker = list(color = "lightblue", 
                  line = list(color = "darkblue", width = 1)),
    opacity = 0.7,
    nbinsx = 20
  ) %>%
  add_trace(
    x = ts_data$measurement[ts_data$phase == "Phase II"],
    type = "histogram",
    name = "Phase II",
    marker = list(color = "lightcoral", 
                  line = list(color = "darkred", width = 1)),
    opacity = 0.7,
    nbinsx = 20
  ) %>%
  layout(
    title = list(
      text = "Distribution Comparison<br><sub>Phase I vs Phase II histograms</sub>",
      font = list(size = 16, weight = "bold")
    ),
    xaxis = list(title = "Measurement Value"),
    yaxis = list(title = "Frequency"),
    barmode = "overlay"
  )

hist_plot
Code
# Moving average plot to show trend
window_size <- 10
ts_data_ma <- ts_data %>%
  mutate(
    MA = zoo::rollmean(measurement, k = window_size, fill = NA, align = "right")
  )

ma_plot <- plot_ly(ts_data_ma) %>%
  add_trace(
    x = ~time,
    y = ~measurement,
    type = "scatter",
    mode = "lines",
    name = "Original",
    line = list(color = "lightgray", width = 1),
    opacity = 0.5
  ) %>%
  add_trace(
    x = ~time,
    y = ~MA,
    type = "scatter",
    mode = "lines",
    name = paste0("Moving Average (", window_size, ")"),
    line = list(color = "darkblue", width = 3)
  ) %>%
  add_segments(
    x = 150.5, xend = 150.5,
    y = min(ts_data$measurement, na.rm = TRUE), 
    yend = max(ts_data$measurement, na.rm = TRUE),
    line = list(color = "red", dash = "dash", width = 2),
    showlegend = FALSE
  ) %>%
  layout(
    title = list(
      text = "Moving Average Trend<br><sub>Smoothed time series showing process shift</sub>",
      font = list(size = 16, weight = "bold")
    ),
    xaxis = list(title = "Time Index"),
    yaxis = list(title = "Measurement Value")
  )

ma_plot
Code
# Calculate and display first differences
ts_data_diff <- ts_data %>%
  mutate(
    Difference = c(NA, diff(measurement))
  )

diff_plot <- plot_ly(ts_data_diff) %>%
  add_trace(
    x = ~time,
    y = ~Difference,
    type = "scatter",
    mode = "lines+markers",
    name = "First Difference",
    line = list(color = "purple"),
    marker = list(size = 3)
  ) %>%
  add_segments(
    x = 0, xend = max(ts_data$time),
    y = 0, yend = 0,
    line = list(color = "black", width = 1),
    showlegend = FALSE
  ) %>%
  add_segments(
    x = 150.5, xend = 150.5,
    y = min(ts_data_diff$Difference, na.rm = TRUE), 
    yend = max(ts_data_diff$Difference, na.rm = TRUE),
    line = list(color = "red", dash = "dash", width = 2),
    showlegend = FALSE
  ) %>%
  layout(
    title = list(
      text = "First Differences (Change from Previous Time Point)<br><sub>Assessing variability and detecting changes</sub>",
      font = list(size = 16, weight = "bold")
    ),
    xaxis = list(title = "Time Index"),
    yaxis = list(title = "Difference (Δy)")
  )

diff_plot
Code
cat("\n")
Code
cat("Visual Assessment:\n")
Visual Assessment:
Code
cat("- Main plot shows overall trend and phase separation\n")
- Main plot shows overall trend and phase separation
Code
cat("- Histogram shows distribution shift between phases\n")
- Histogram shows distribution shift between phases
Code
cat("- Moving average reveals underlying trend\n")
- Moving average reveals underlying trend
Code
cat("- First differences show period-to-period variability\n\n")
- First differences show period-to-period variability
Code
if (mean(ts_data$measurement[151:200]) > mean(ts_data$measurement[1:150])) {
  cat("Observation: Phase II mean is HIGHER than Phase I\n")
  cat("Shift magnitude:", round(mean(ts_data$measurement[151:200]) - mean(ts_data$measurement[1:150]), 3), "\n\n")
} else {
  cat("Observation: Phase II mean is LOWER than Phase I\n")
  cat("Shift magnitude:", round(abs(mean(ts_data$measurement[151:200]) - mean(ts_data$measurement[1:150])), 3), "\n\n")
}
Observation: Phase II mean is HIGHER than Phase I
Shift magnitude: 2.034 

1. Diagnostic Analysis

Check for Autocorrelation

Code
# Extract measurement vector
y <- ts_data$measurement
y_phase1 <- y[1:150]  # Phase I for model building

# Create time series object
ts_phase1 <- ts(y_phase1, frequency = 1)
ts_all <- ts(y, frequency = 1)

# Calculate autocorrelation function (ACF)
acf_result <- acf(y_phase1, lag.max = 30, plot = FALSE)
pacf_result <- pacf(y_phase1, lag.max = 30, plot = FALSE)

# Test for autocorrelation
# Ljung-Box test (H0: no autocorrelation)
lb_test <- Box.test(y_phase1, lag = 20, type = "Ljung-Box")

# Durbin-Watson test
dw_stat <- sum(diff(y_phase1)^2) / sum((y_phase1 - mean(y_phase1))^2)

cat("\nAutocorrelation Tests:\n\n")

Autocorrelation Tests:
Code
cat("Ljung-Box Test:\n")
Ljung-Box Test:
Code
cat("  Statistic:", round(lb_test$statistic, 3), "\n")
  Statistic: 283.577 
Code
cat("  p-value:", round(lb_test$p.value, 4), "\n")
  p-value: 0 
Code
cat("  Interpretation:", ifelse(lb_test$p.value < 0.05, 
                                 "[SIGNIFICANT] Autocorrelation detected",
                                 "[OK] No significant autocorrelation"), "\n\n")
  Interpretation: [SIGNIFICANT] Autocorrelation detected 
Code
cat("Durbin-Watson Statistic:", round(dw_stat, 3), "\n")
Durbin-Watson Statistic: 0.589 
Code
cat("  (Values near 2 = no autocorrelation, <1.5 or >2.5 = autocorrelation)\n")
  (Values near 2 = no autocorrelation, <1.5 or >2.5 = autocorrelation)
Code
cat("  Interpretation:", ifelse(dw_stat < 1.5 | dw_stat > 2.5,
                                 "[SIGNIFICANT] Autocorrelation present",
                                 "[OK] Little autocorrelation"), "\n\n")
  Interpretation: [SIGNIFICANT] Autocorrelation present 
Code
# Summary table
autocorr_summary <- tibble(
  Test = c("Ljung-Box", "Durbin-Watson"),
  Statistic = c(lb_test$statistic, dw_stat),
  `p-value` = c(lb_test$p.value, NA),
  Interpretation = c(
    ifelse(lb_test$p.value < 0.05, "Autocorrelation detected", "No autocorrelation"),
    ifelse(dw_stat < 1.5 | dw_stat > 2.5, "Autocorrelation present", "Little autocorrelation")
  )
)

autocorr_summary %>%
  gt() %>%
  tab_header(
    title = "Autocorrelation Tests Summary",
    subtitle = "Testing for temporal dependence in Phase I data"
  ) %>%
  fmt_number(
    columns = Statistic,
    decimals = 3
  ) %>%
  fmt_number(
    columns = `p-value`,
    decimals = 4
  ) %>%
  sub_missing(
    columns = `p-value`,
    missing_text = "—"
  ) %>%
  tab_style(
    style = cell_fill(color = "lightcoral"),
    locations = cells_body(
      columns = Interpretation,
      rows = grepl("detected|present", Interpretation)
    )
  )
Autocorrelation Tests Summary
Testing for temporal dependence in Phase I data
Test Statistic p-value Interpretation
Ljung-Box 283.577 0.0000 Autocorrelation detected
Durbin-Watson 0.589 Autocorrelation present
Code
# ACF Plot
acf_data <- tibble(
  Lag = 1:length(acf_result$acf[-1]),
  ACF = acf_result$acf[-1],
  Type = "ACF"
)

pacf_data <- tibble(
  Lag = 1:length(pacf_result$acf),
  ACF = pacf_result$acf,
  Type = "PACF"
)

combined_acf <- bind_rows(acf_data, pacf_data)

# Critical value for 95% confidence
ci <- qnorm(0.975) / sqrt(length(y_phase1))

acf_plot <- plot_ly(combined_acf) %>%
  add_trace(
    x = ~Lag,
    y = ~ACF,
    type = "bar",
    color = ~Type,
    colors = c("ACF" = "blue", "PACF" = "darkgreen"),
    hovertemplate = "<b>%{fullData.name}</b><br>Lag: %{x}<br>Value: %{y:.3f}<extra></extra>"
  ) %>%
  add_segments(
    x = 0, xend = max(combined_acf$Lag),
    y = ci, yend = ci,
    line = list(color = "red", dash = "dash", width = 2),
    showlegend = FALSE,
    name = "95% CI"
  ) %>%
  add_segments(
    x = 0, xend = max(combined_acf$Lag),
    y = -ci, yend = -ci,
    line = list(color = "red", dash = "dash", width = 2),
    showlegend = FALSE
  ) %>%
  layout(
    title = list(
      text = "ACF and PACF Plots<br><sub>Identifying autocorrelation patterns</sub>",
      font = list(size = 16, weight = "bold")
    ),
    xaxis = list(title = "Lag"),
    yaxis = list(title = "Correlation"),
    barmode = "group"
  )

acf_plot
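
For reference, an AR(1) process has theoretical autocorrelations ρ_k = φ^k, so the sample ACF should decay roughly geometrically while the PACF cuts off after lag 1. The short sketch below compares the first few sample autocorrelations against the geometric pattern implied by the lag-1 estimate; it is an informal check, not a formal test.

Code
# Informal check: sample ACF vs the geometric decay phi^k implied by an AR(1)
phi_hat <- acf_result$acf[2]   # lag-1 sample autocorrelation (acf[1] is lag 0)

tibble(
  Lag           = 1:10,
  `Sample ACF`  = round(acf_result$acf[2:11], 3),
  `AR(1) phi^k` = round(phi_hat^(1:10), 3)
)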

Stationarity Tests

Code
# Augmented Dickey-Fuller test (H0: non-stationary)
adf_test <- adf.test(y_phase1)

# KPSS test (H0: stationary)
kpss_test <- kpss.test(y_phase1)

cat("\nStationarity Tests:\n\n")

Stationarity Tests:
Code
cat("Augmented Dickey-Fuller Test:\n")
Augmented Dickey-Fuller Test:
Code
cat("  Statistic:", round(adf_test$statistic, 3), "\n")
  Statistic: -3.865 
Code
cat("  p-value:", round(adf_test$p.value, 4), "\n")
  p-value: 0.0178 
Code
cat("  Interpretation:", ifelse(adf_test$p.value < 0.05,
                                 "[OK] Series is stationary",
                                 "[WARNING] Series may be non-stationary"), "\n\n")
  Interpretation: [OK] Series is stationary 
Code
cat("KPSS Test:\n")
KPSS Test:
Code
cat("  Statistic:", round(kpss_test$statistic, 3), "\n")
  Statistic: 0.105 
Code
cat("  p-value:", round(kpss_test$p.value, 4), "\n")
  p-value: 0.1 
Code
cat("  Interpretation:", ifelse(kpss_test$p.value > 0.05,
                                 "[OK] Series is stationary",
                                 "[WARNING] Series is non-stationary"), "\n\n")
  Interpretation: [OK] Series is stationary 
Code
# Summary table
stationarity_summary <- tibble(
  Test = c("Augmented Dickey-Fuller", "KPSS"),
  Hypothesis = c("H0: Non-stationary", "H0: Stationary"),
  Statistic = c(adf_test$statistic, kpss_test$statistic),
  `p-value` = c(adf_test$p.value, kpss_test$p.value),
  Conclusion = c(
    ifelse(adf_test$p.value < 0.05, "Stationary", "Non-stationary"),
    ifelse(kpss_test$p.value > 0.05, "Stationary", "Non-stationary")
  )
)

stationarity_summary %>%
  gt() %>%
  tab_header(
    title = "Stationarity Tests",
    subtitle = "Testing if mean and variance are constant over time"
  ) %>%
  fmt_number(
    columns = c(Statistic, `p-value`),
    decimals = 4
  ) %>%
  tab_style(
    style = cell_fill(color = "lightgreen"),
    locations = cells_body(
      columns = Conclusion,
      rows = Conclusion == "Stationary"
    )
  ) %>%
  tab_style(
    style = cell_fill(color = "lightyellow"),
    locations = cells_body(
      columns = Conclusion,
      rows = Conclusion == "Non-stationary"
    )
  )
Stationarity Tests
Testing if mean and variance are constant over time
Test Hypothesis Statistic p-value Conclusion
Augmented Dickey-Fuller H0: Non-stationary −3.8651 0.0178 Stationary
KPSS H0: Stationary 0.1047 0.1000 Stationary
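
As a quick cross-check of the two tests above, forecast::ndiffs() uses a unit-root test to suggest how many differences the series needs to become stationary; given the results above it should return 0 here. This is a convenience check, not an additional analysis step.

Code
# Suggested differencing order d from unit-root tests (0 expected, per the tests above)
ndiffs(y_phase1)                 # KPSS-based by default
ndiffs(y_phase1, test = "adf")   # ADF-based alternative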

2. ARIMA Model Building

Automatic Model Selection

Code
# Automatic ARIMA model selection
cat("\nFitting ARIMA model to Phase I data...\n\n")

Fitting ARIMA model to Phase I data...
Code
arima_model <- auto.arima(ts_phase1, 
                          seasonal = FALSE,
                          stepwise = TRUE,
                          approximation = FALSE,
                          trace = FALSE)

cat("Selected Model:", format(arima_model$call), "\n")
Selected Model: auto.arima(y = ts_phase1, seasonal = FALSE, stepwise = TRUE,      trace = FALSE, approximation = FALSE, x = structure(list(         x = structure(c(97.8287387933989, 99.9046113804412, 100.33740210444,          98.0847622947602, 97.8329271486981, 100.841772170881,          97.123249292472, 97.3736647365155, 99.9696839528894,          98.7408258572421, 98.1489347274819, 98.5689828445965,          101.128418446025, 99.8773583357823, 99.2800177573723,          98.8756347206254, 102.363642665268, 104.777905134276,          104.778609404013, 103.896611088766, 103.78080074418,          104.775751737961, 102.006388098132, 103.083891462195,          100.367824214204, 99.3465856079496, 100.838215292158,          98.5461865109298, 98.7822724095899, 97.9167585043672,          98.1766334651384, 94.7264586660874, 93.7782656913309,          94.6451613480021, 97.5762942589147, 98.0554046209828,          98.6428480156261, 100.032972257785, 98.7668514116694,          99.5418968348383, 98.5290343148848, 96.5027184123315,          96.9935863086652, 98.7150691157144, 99.5841502758808,          99.6920078671976, 103.201386571364, 102.830727127978,          103.379423616953, 105.562304624808, 102.045289694209,          99.9480164548274, 102.454130729517, 100.57802992962,          100.447017083682, 101.840200651206, 102.560323644697,          104.298705365059, 105.14530086738, 105.129108849709,          102.486727399015, 102.876000149823, 102.46207029605,          99.829163235352, 101.904722205198, 102.486269930958,          101.805361782678, 100.930831193718, 98.9400654634583,          99.5430232026388, 100.349181130777, 99.0573000249441,          101.000069424662, 99.132929191656, 96.3606565948708,          98.937486937375, 98.6801189178403, 98.8960769898098,          98.0310407461143, 96.3279548993663, 99.2224060040414,          98.4717824938718, 101.302562409174, 102.064860412742,          100.995837733503, 99.1461075578458, 98.3561103195336,          97.1174477919397, 100.963207590309, 100.909114368494,          102.279202169988, 99.7853007286366, 100.108280394259,          101.758119622923, 100.752192660219, 101.999260884158,          99.850409795014, 97.9478599664601, 99.1053944399024,          98.8322038340736, 100.099580205981, 97.24471712366, 99.088619369841,          103.073153865328, 102.116034768954, 101.52998905227,          101.327440295374, 98.2697749519012, 99.3975016118133,          97.2852673711942, 97.4888385117662, 100.017359745138,          98.9620519673089, 99.9893631338912, 101.439034887848,          101.405445996278, 99.025706121714, 98.8431246085634,          101.988786398743, 98.4998066029257, 98.5559634061243,          98.2006063288164, 98.9128861604702, 100.307685968181,          102.513050460081, 101.373166264736, 102.12147187883,          102.198802010666, 102.216664863071, 100.746220924922,          99.0978946695429, 97.7973504372616, 97.377736867761,          98.623876139027, 100.123566423073, 100.548502421318,          99.5998869889291, 102.299362149084, 103.778927533107,          102.139635979395, 100.321649895244, 100.41113908176,          102.097861871481, 101.943786763754, 102.15556128325,          101.205981895406, 101.495872515207, 103.2531614916, 101.934897554224,          101.559112670949), tsp = c(1, 150, 1), class = "ts")), class = "data.frame", row.names = c(NA,      -150L))) 
Code
cat("Order: ARIMA(", arima_model$arma[1], ",", arima_model$arma[6], ",", arima_model$arma[2], ")\n\n")
Order: ARIMA( 1 , 0 , 0 )
Code
# Model summary
# Model summary - use accuracy for error metrics
arima_accuracy <- accuracy(arima_model)

# Extract model parameters
if (arima_model$arma[1] > 0) {
  ar_coefs <- coef(arima_model)[grep("ar", names(coef(arima_model)))]
} else {
  ar_coefs <- numeric(0)
}

if (arima_model$arma[2] > 0) {
  ma_coefs <- coef(arima_model)[grep("ma", names(coef(arima_model)))]
} else {
  ma_coefs <- numeric(0)
}

# Model information
model_info <- tibble(
  Metric = c("Model", "AIC", "BIC", "Log Likelihood", "Sigma²", "ME", "RMSE", "MAE"),
  Value = c(
    paste0("ARIMA(", arima_model$arma[1], ",", arima_model$arma[6], ",", arima_model$arma[2], ")"),
    as.character(round(arima_model$aic, 2)),
    as.character(round(arima_model$bic, 2)),
    as.character(round(arima_model$loglik, 2)),
    as.character(round(arima_model$sigma2, 4)),
    as.character(round(arima_accuracy[,"ME"], 4)),
    as.character(round(arima_accuracy[,"RMSE"], 4)),
    as.character(round(arima_accuracy[,"MAE"], 4))
  )
)

model_info %>%
  gt() %>%
  tab_header(
    title = "ARIMA Model Summary",
    subtitle = "Automatic model selection results"
  ) %>%
  tab_style(
    style = list(
      cell_fill(color = "lightblue"),
      cell_text(weight = "bold")
    ),
    locations = cells_body(rows = Metric == "Model")
  )
ARIMA Model Summary
Automatic model selection results
Metric Value
Model ARIMA(1,0,0)
AIC 565.38
BIC 574.42
Log Likelihood -279.69
Sigma² 2.4602
ME 0.0159
RMSE 1.558
MAE 1.2846
Code
# Model coefficients
if (length(coef(arima_model)) > 0) {
  coef_table <- tibble(
    Parameter = names(coef(arima_model)),
    Estimate = coef(arima_model),
    `Std Error` = sqrt(diag(arima_model$var.coef)),
    `t-value` = Estimate / `Std Error`,
    `p-value` = 2 * (1 - pnorm(abs(`t-value`)))
  )
  
  coef_table %>%
    gt() %>%
    tab_header(
      title = "ARIMA Model Coefficients",
      subtitle = "Parameter estimates and significance tests"
    ) %>%
    fmt_number(
      columns = c(Estimate, `Std Error`, `t-value`),
      decimals = 4
    ) %>%
    fmt_number(
      columns = `p-value`,
      decimals = 4
    ) %>%
    tab_style(
      style = cell_fill(color = "lightgreen"),
      locations = cells_body(
        columns = `p-value`,
        rows = `p-value` < 0.05
      )
    )
}
ARIMA Model Coefficients
Parameter estimates and significance tests
Parameter Estimate Std Error t-value p-value
ar1 0.7031 0.0577 12.1891 0.0000
intercept 100.2203 0.4218 237.5805 0.0000

Residual Diagnostics

Code
# Extract residuals
residuals_arima <- residuals(arima_model)

# Test residuals for independence
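# Note: for ARIMA residuals, Box.test() can take fitdf = p + q so the degrees of
# freedom account for the estimated parameters (forecast::checkresiduals() makes
# this adjustment automatically).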
lb_residuals <- Box.test(residuals_arima, lag = 20, type = "Ljung-Box")

cat("\nResidual Diagnostics:\n\n")

Residual Diagnostics:
Code
cat("Ljung-Box Test on Residuals:\n")
Ljung-Box Test on Residuals:
Code
cat("  p-value:", round(lb_residuals$p.value, 4), "\n")
  p-value: 0.0116 
Code
cat("  Interpretation:", ifelse(lb_residuals$p.value > 0.05,
                                 "[OK] Residuals are white noise (independent)",
                                 "[WARNING] Residuals still autocorrelated"), "\n\n")
  Interpretation: [WARNING] Residuals still autocorrelated 
Code
# Residual statistics
resid_stats <- tibble(
  Statistic = c("Mean", "Std Dev", "Min", "Max", "Ljung-Box p-value"),
  Value = c(
    mean(residuals_arima),
    sd(residuals_arima),
    min(residuals_arima),
    max(residuals_arima),
    lb_residuals$p.value
  )
)

resid_stats %>%
  gt() %>%
  tab_header(
    title = "Residual Statistics",
    subtitle = "ARIMA model residuals should be white noise"
  ) %>%
  fmt_number(
    columns = Value,
    decimals = 4
  ) %>%
  tab_footnote(
    footnote = "p-value > 0.05 indicates residuals are independent (good)",
    locations = cells_body(rows = Statistic == "Ljung-Box p-value")
  )
Residual Statistics
ARIMA model residuals should be white noise
Statistic Value
Mean 0.0159
Std Dev 1.5631
Min −4.0570
Max 3.6485
Ljung-Box p-value1 0.0116
1 p-value > 0.05 indicates residuals are independent (good)
Code
# Residual plots
resid_data <- tibble(
  Time = 1:length(residuals_arima),
  Residual = as.numeric(residuals_arima)
)

resid_plot <- plot_ly(resid_data) %>%
  add_trace(
    x = ~Time,
    y = ~Residual,
    type = "scatter",
    mode = "lines+markers",
    name = "Residuals",
    line = list(color = "blue"),
    marker = list(size = 4),
    hovertemplate = "<b>Time:</b> %{x}<br><b>Residual:</b> %{y:.3f}<extra></extra>"
  ) %>%
  add_segments(
    x = 0, xend = max(resid_data$Time),
    y = 0, yend = 0,
    line = list(color = "black", dash = "solid", width = 1),
    showlegend = FALSE
  ) %>%
  add_segments(
    x = 0, xend = max(resid_data$Time),
    y = 3 * sd(residuals_arima), yend = 3 * sd(residuals_arima),
    line = list(color = "red", dash = "dash", width = 2),
    showlegend = FALSE
  ) %>%
  add_segments(
    x = 0, xend = max(resid_data$Time),
    y = -3 * sd(residuals_arima), yend = -3 * sd(residuals_arima),
    line = list(color = "red", dash = "dash", width = 2),
    showlegend = FALSE
  ) %>%
  layout(
    title = list(
      text = "ARIMA Residuals Time Plot<br><sub>Residuals should be random with no patterns</sub>",
      font = list(size = 16, weight = "bold")
    ),
    xaxis = list(title = "Time"),
    yaxis = list(title = "Residual")
  )

resid_plot

3. Control Charts for Residuals

I-MR Chart on ARIMA Residuals

Code
# Now residuals should be independent - apply traditional control charts
residuals_vec <- as.numeric(residuals_arima)

# Calculate moving range
mr <- abs(diff(residuals_vec))
mr_mean <- mean(mr, na.rm = TRUE)

# Control limits for individuals
d2 <- 1.128  # For n=2
sigma_hat <- mr_mean / d2

ucl_i <- mean(residuals_vec) + 3 * sigma_hat
lcl_i <- mean(residuals_vec) - 3 * sigma_hat

# Control limits for moving range
d3 <- 0      # For n=2
d4 <- 3.267  # For n=2
ucl_mr <- d4 * mr_mean
lcl_mr <- d3 * mr_mean

cat("\nControl Chart Parameters (on Residuals):\n\n")

Control Chart Parameters (on Residuals):
Code
cat("Individuals Chart:\n")
Individuals Chart:
Code
cat("  Center Line:", round(mean(residuals_vec), 4), "\n")
  Center Line: 0.0159 
Code
cat("  UCL:", round(ucl_i, 4), "\n")
  UCL: 4.8869 
Code
cat("  LCL:", round(lcl_i, 4), "\n\n")
  LCL: -4.8551 
Code
cat("Moving Range Chart:\n")
Moving Range Chart:
Code
cat("  Center Line:", round(mr_mean, 4), "\n")
  Center Line: 1.8315 
Code
cat("  UCL:", round(ucl_mr, 4), "\n")
  UCL: 5.9835 
Code
cat("  LCL:", round(lcl_mr, 4), "\n\n")
  LCL: 0 
Code
# Identify out-of-control points
ooc_i <- which(residuals_vec > ucl_i | residuals_vec < lcl_i)
cat("Out-of-control points (Individuals):", length(ooc_i), "\n")
Out-of-control points (Individuals): 0 
Code
if (length(ooc_i) > 0) {
  cat("  At times:", head(ooc_i, 10), "\n")
}

# Control chart data
control_data <- tibble(
  Time = 1:length(residuals_vec),
  Value = residuals_vec,
  UCL = ucl_i,
  CL = mean(residuals_vec),
  LCL = lcl_i,
  Status = ifelse(Value > UCL | Value < LCL, "Out of Control", "In Control")
)

# Individuals chart
i_chart <- plot_ly(control_data) %>%
  add_trace(
    x = ~Time,
    y = ~Value,
    type = "scatter",
    mode = "lines+markers",
    name = "Residuals",
    line = list(color = "blue"),
    marker = list(
      size = 6,
      color = ~ifelse(Status == "Out of Control", "red", "blue")
    ),
    text = ~paste0("Time: ", Time, "<br>Value: ", round(Value, 3), "<br>", Status),
    hovertemplate = "%{text}<extra></extra>"
  ) %>%
  add_trace(
    x = ~Time, y = ~UCL, type = "scatter", mode = "lines",
    name = "UCL", line = list(color = "red", dash = "dash", width = 2)
  ) %>%
  add_trace(
    x = ~Time, y = ~CL, type = "scatter", mode = "lines",
    name = "Center Line", line = list(color = "green", width = 2)
  ) %>%
  add_trace(
    x = ~Time, y = ~LCL, type = "scatter", mode = "lines",
    name = "LCL", line = list(color = "red", dash = "dash", width = 2)
  ) %>%
  layout(
    title = list(
      text = "I-Chart: ARIMA Residuals<br><sub>Monitoring residuals (independent observations)</sub>",
      font = list(size = 16, weight = "bold")
    ),
    xaxis = list(title = "Time"),
    yaxis = list(title = "Residual Value")
  )

i_chart
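
For contrast, the sketch below applies the same I-chart construction directly to the raw Phase I measurements, ignoring the autocorrelation. Because consecutive points are positively correlated, the moving range tends to understate the process sigma, so this chart will typically flag more points than the residual chart above; it is shown for comparison only.

Code
# Comparison sketch: I-chart limits computed on the RAW Phase I data
mr_raw        <- abs(diff(y_phase1))
sigma_hat_raw <- mean(mr_raw) / 1.128          # d2 for n = 2

ucl_raw <- mean(y_phase1) + 3 * sigma_hat_raw
lcl_raw <- mean(y_phase1) - 3 * sigma_hat_raw

ooc_raw <- which(y_phase1 > ucl_raw | y_phase1 < lcl_raw)
cat("Raw-data I-chart:", length(ooc_raw), "of 150 points beyond limits (",
    round(length(ooc_raw) / 150 * 100, 1), "%)\n")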

4. Monitoring Phase II with ARIMA

One-Step-Ahead Forecasts

Code
# Use ARIMA model to monitor Phase II data
# Calculate one-step-ahead forecast errors

cat("\nPhase II Monitoring with ARIMA Model:\n\n")

Phase II Monitoring with ARIMA Model:
Code
# Refit model on full data for monitoring
all_residuals <- numeric(length(y))
all_forecasts <- numeric(length(y))

# Calculate residuals for all data
for (i in 1:length(y)) {
  if (i <= 150) {
    # Use Phase I model
    all_residuals[i] <- residuals_arima[i]
    all_forecasts[i] <- y[i] - residuals_arima[i]
  } else {
    # One-step-ahead forecast for Phase II
    # Refit on data up to i-1
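    # Note: because the refit window keeps expanding into Phase II, a sustained
    # shift is gradually absorbed into the model, which reduces detection power;
    # an alternative is to freeze the Phase I parameters and only forecast.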
    temp_model <- Arima(y[1:(i-1)], 
                        order = c(arima_model$arma[1], arima_model$arma[6], arima_model$arma[2]),
                        include.mean = TRUE)
    forecast_i <- forecast(temp_model, h = 1)
    all_forecasts[i] <- forecast_i$mean[1]
    all_residuals[i] <- y[i] - forecast_i$mean[1]
  }
}

# Calculate control limits (same as Phase I)
ucl_phase2 <- 3 * sigma_hat
lcl_phase2 <- -3 * sigma_hat

# Identify Phase II out-of-control
phase2_residuals <- all_residuals[151:200]
ooc_phase2 <- which(abs(phase2_residuals) > 3 * sigma_hat)

cat("Phase II Out-of-Control Points:", length(ooc_phase2), "out of 50\n")
Phase II Out-of-Control Points: 0 out of 50
Code
cat("False Alarm Rate:", round(length(ooc_phase2) / 50 * 100, 2), "%\n\n")
False Alarm Rate: 0 %
Code
# Full time series plot with forecasts
monitoring_data <- tibble(
  Time = 1:length(y),
  Observation = y,
  Forecast = all_forecasts,
  Residual = all_residuals,
  Phase = ifelse(Time <= 150, "Phase I", "Phase II"),
  UCL = ucl_phase2,
  LCL = lcl_phase2,
  Status = ifelse(abs(Residual) > 3 * sigma_hat, "Out of Control", "In Control")
)

# Monitoring chart
monitoring_plot <- plot_ly(monitoring_data) %>%
  add_trace(
    x = ~Time,
    y = ~Observation,
    type = "scatter",
    mode = "lines+markers",
    name = "Observations",
    line = list(color = "black"),
    marker = list(size = 4)
  ) %>%
  add_trace(
    x = ~Time,
    y = ~Forecast,
    type = "scatter",
    mode = "lines",
    name = "ARIMA Forecast",
    line = list(color = "blue", dash = "dot")
  ) %>%
  add_segments(
    x = 150.5, xend = 150.5,
    y = min(monitoring_data$Observation), yend = max(monitoring_data$Observation),
    line = list(color = "gray", dash = "dot", width = 2),
    showlegend = FALSE
  ) %>%
  layout(
    title = list(
      text = "ARIMA-Based Process Monitoring<br><sub>Observations vs One-Step-Ahead Forecasts</sub>",
      font = list(size = 16, weight = "bold")
    ),
    xaxis = list(title = "Time"),
    yaxis = list(title = "Measurement"),
    annotations = list(
      list(
        x = 150.5, y = max(monitoring_data$Observation),
        text = "Phase I | Phase II",
        showarrow = FALSE,
        font = list(size = 10, color = "gray")
      )
    )
  )

monitoring_plot
Code
# Residual monitoring chart for Phase II
residual_monitor_plot <- plot_ly(monitoring_data %>% filter(Time > 150)) %>%
  add_trace(
    x = ~Time,
    y = ~Residual,
    type = "scatter",
    mode = "lines+markers",
    name = "Forecast Errors",
    line = list(color = "blue"),
    marker = list(
      size = 6,
      color = ~ifelse(Status == "Out of Control", "red", "blue")
    )
  ) %>%
  add_trace(
    x = ~Time, y = ~UCL, type = "scatter", mode = "lines",
    name = "UCL", line = list(color = "red", dash = "dash", width = 2)
  ) %>%
  add_trace(
    x = ~Time, y = 0, type = "scatter", mode = "lines",
    name = "Center", line = list(color = "green", width = 2)
  ) %>%
  add_trace(
    x = ~Time, y = ~LCL, type = "scatter", mode = "lines",
    name = "LCL", line = list(color = "red", dash = "dash", width = 2)
  ) %>%
  layout(
    title = list(
      text = "Phase II: Forecast Error Monitoring<br><sub>Detecting shifts in autocorrelated process</sub>",
      font = list(size = 16, weight = "bold")
    ),
    xaxis = list(title = "Time"),
    yaxis = list(title = "Forecast Error (Residual)")
  )

residual_monitor_plot
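
A sustained shift of roughly one sigma rarely pushes individual forecast errors past 3-sigma limits, which is why no points signal above. A CUSUM (or EWMA) applied to the standardized forecast errors accumulates small, persistent deviations and is the usual complement. The sketch below is a plain one-sided tabular CUSUM with conventional parameters (k = 0.5, h = 5); it assumes the all_residuals and sigma_hat objects computed above and makes no claim about how quickly it signals on this particular series.

Code
# Sketch: upper tabular CUSUM on standardized one-step-ahead forecast errors
k <- 0.5   # reference value, tuned for shifts of about 1 sigma
h <- 5     # decision interval

z        <- all_residuals / sigma_hat      # standardize with the Phase I sigma estimate
cusum_up <- numeric(length(z))
for (t in 2:length(z)) {
  cusum_up[t] <- max(0, cusum_up[t - 1] + z[t] - k)
}

# Time indices at which the upper CUSUM exceeds h (Phase II starts at t = 151)
signal_times <- which(cusum_up > h)
signal_times[signal_times > 150]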

5. Time Series Cross-Validation

Rolling Origin Cross-Validation

Code
cat("\nTime Series Cross-Validation:\n\n")

Time Series Cross-Validation:
Code
cat("Method: Rolling Origin (Expanding Window)\n")
Method: Rolling Origin (Expanding Window)
Code
cat("This respects temporal order and tests forecasting performance\n\n")
This respects temporal order and tests forecasting performance
Code
# Rolling origin cross-validation
# Start with minimum training size, expand window each iteration
min_train <- 50
n_test <- length(y_phase1)
h <- 1  # One-step-ahead forecast

# Storage for results
cv_errors <- numeric(n_test - min_train)
cv_mae <- numeric(n_test - min_train)
cv_rmse <- numeric(n_test - min_train)

for (i in seq_along(cv_errors)) {
  train_end <- min_train + i - 1
  test_point <- train_end + 1
  
  if (test_point <= length(y_phase1)) {
    # Fit model on training data
    train_data <- y_phase1[1:train_end]
    test_data <- y_phase1[test_point]
    
    # Fit ARIMA
    cv_model <- auto.arima(train_data, seasonal = FALSE)
    
    # Forecast
    cv_forecast <- forecast(cv_model, h = h)
    
    # Calculate error
    error <- test_data - cv_forecast$mean[1]
    cv_errors[i] <- error
    cv_mae[i] <- abs(error)
    cv_rmse[i] <- error^2
  }
}

# Summary statistics
mae_cv <- mean(cv_mae, na.rm = TRUE)
rmse_cv <- sqrt(mean(cv_rmse, na.rm = TRUE))
mape_cv <- mean(abs(cv_errors / y_phase1[(min_train+1):length(y_phase1)]) * 100, na.rm = TRUE)

cat("Cross-Validation Results:\n")
Cross-Validation Results:
Code
cat("  Number of folds:", length(cv_errors), "\n")
  Number of folds: 100 
Code
cat("  MAE:", round(mae_cv, 4), "\n")
  MAE: 1.2871 
Code
cat("  RMSE:", round(rmse_cv, 4), "\n")
  RMSE: 1.5549 
Code
cat("  MAPE:", round(mape_cv, 2), "%\n\n")
  MAPE: 1.28 %
Code
# CV results table
cv_summary <- tibble(
  Metric = c("Mean Absolute Error (MAE)", "Root Mean Squared Error (RMSE)", 
             "Mean Absolute Percentage Error (MAPE)", "Number of Folds"),
  Value = c(
    paste0(round(mae_cv, 4)),
    paste0(round(rmse_cv, 4)),
    paste0(round(mape_cv, 2), "%"),
    as.character(length(cv_errors))
  ),
  Interpretation = c(
    "Average forecast error magnitude",
    "Penalizes large errors more",
    "Relative error as percentage",
    "Expanding window iterations"
  )
)

cv_summary %>%
  gt() %>%
  tab_header(
    title = "Time Series Cross-Validation Results",
    subtitle = "Rolling origin method with expanding window"
  ) %>%
  tab_style(
    style = cell_fill(color = "lightblue"),
    locations = cells_column_labels()
  )
Time Series Cross-Validation Results
Rolling origin method with expanding window
Metric Value Interpretation
Mean Absolute Error (MAE) 1.2871 Average forecast error magnitude
Root Mean Squared Error (RMSE) 1.5549 Penalizes large errors more
Mean Absolute Percentage Error (MAPE) 1.28% Relative error as percentage
Number of Folds 100 Expanding window iterations
Code
# Plot CV errors over time
cv_plot_data <- tibble(
  Fold = 1:length(cv_errors),
  `Forecast Error` = cv_errors,
  `Training Size` = min_train + Fold - 1
)

cv_error_plot <- plot_ly(cv_plot_data) %>%
  add_trace(
    x = ~`Training Size`,
    y = ~`Forecast Error`,
    type = "scatter",
    mode = "lines+markers",
    name = "CV Error",
    line = list(color = "blue"),
    marker = list(size = 4),
    hovertemplate = "<b>Training Size:</b> %{x}<br><b>Error:</b> %{y:.3f}<extra></extra>"
  ) %>%
  add_segments(
    x = min(cv_plot_data$`Training Size`), 
    xend = max(cv_plot_data$`Training Size`),
    y = 0, yend = 0,
    line = list(color = "black", width = 1),
    showlegend = FALSE
  ) %>%
  layout(
    title = list(
      text = "Cross-Validation Forecast Errors<br><sub>Performance improves with more training data</sub>",
      font = list(size = 16, weight = "bold")
    ),
    xaxis = list(title = "Training Sample Size"),
    yaxis = list(title = "One-Step-Ahead Forecast Error")
  )

cv_error_plot
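
The forecast package also provides tsCV(), which performs the same rolling-origin evaluation in a single call. A compact sketch, fixing the order at the ARIMA(1,0,0) selected above rather than rerunning auto.arima() at every fold:

Code
# Compact alternative: rolling-origin CV via forecast::tsCV()
f_ar1 <- function(x, h) forecast(Arima(x, order = c(1, 0, 0), include.mean = TRUE), h = h)

# Errors are NA until the initial training window of min_train observations is available
cv_e <- tsCV(ts_phase1, f_ar1, h = 1, initial = min_train)

cat("tsCV  MAE:", round(mean(abs(cv_e), na.rm = TRUE), 4),
    " RMSE:", round(sqrt(mean(cv_e^2, na.rm = TRUE)), 4), "\n")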

Block Bootstrap for Time Series

Code
cat("\nBlock Bootstrap for Time Series:\n\n")

Block Bootstrap for Time Series:
Code
cat("Method: Moving Block Bootstrap\n")
Method: Moving Block Bootstrap
Code
cat("Preserves temporal dependence structure\n\n")
Preserves temporal dependence structure
Code
# Moving block bootstrap
block_length <- 10  # Block size
n_bootstrap <- 1000  # Number of bootstrap samples
n_obs <- length(y_phase1)

# Storage for bootstrap statistics
boot_means <- numeric(n_bootstrap)
boot_vars <- numeric(n_bootstrap)
boot_acf1 <- numeric(n_bootstrap)

set.seed(456)

for (b in 1:n_bootstrap) {
  # Generate bootstrap sample using moving blocks
  n_blocks <- ceiling(n_obs / block_length)
  boot_sample <- numeric(0)
  
  for (i in 1:n_blocks) {
    # Random starting point
    start <- sample(1:(n_obs - block_length + 1), 1)
    block <- y_phase1[start:(start + block_length - 1)]
    boot_sample <- c(boot_sample, block)
  }
  
  # Trim to original length
  boot_sample <- boot_sample[1:n_obs]
  
  # Calculate statistics
  boot_means[b] <- mean(boot_sample)
  boot_vars[b] <- var(boot_sample)
  
  # Calculate ACF(1)
  acf_temp <- acf(boot_sample, lag.max = 1, plot = FALSE)
  boot_acf1[b] <- acf_temp$acf[2]
}

# Calculate confidence intervals
ci_level <- 0.95
alpha <- 1 - ci_level

mean_ci <- quantile(boot_means, c(alpha/2, 1-alpha/2))
var_ci <- quantile(boot_vars, c(alpha/2, 1-alpha/2))
acf1_ci <- quantile(boot_acf1, c(alpha/2, 1-alpha/2))

cat("Bootstrap Results (", n_bootstrap, "iterations, block length =", block_length, "):\n\n")
Bootstrap Results ( 1000 iterations, block length = 10 ):
Code
# Bootstrap results table
boot_results <- tibble(
  Statistic = c("Mean", "Variance", "ACF(1)"),
  `Original Value` = c(
    mean(y_phase1),
    var(y_phase1),
    acf(y_phase1, lag.max = 1, plot = FALSE)$acf[2]
  ),
  `Bootstrap Mean` = c(
    mean(boot_means),
    mean(boot_vars),
    mean(boot_acf1)
  ),
  `95% CI Lower` = c(mean_ci[1], var_ci[1], acf1_ci[1]),
  `95% CI Upper` = c(mean_ci[2], var_ci[2], acf1_ci[2])
)

boot_results %>%
  gt() %>%
  tab_header(
    title = "Block Bootstrap Results",
    subtitle = paste0("Moving block bootstrap (", n_bootstrap, " iterations, block length = ", block_length, ")")
  ) %>%
  fmt_number(
    columns = c(`Original Value`, `Bootstrap Mean`, `95% CI Lower`, `95% CI Upper`),
    decimals = 4
  ) %>%
  tab_footnote(
    footnote = "Bootstrap preserves autocorrelation structure in time series",
    locations = cells_column_labels(columns = `Bootstrap Mean`)
  )
Block Bootstrap Results
Moving block bootstrap (1000 iterations, block length = 10)
Statistic Original Value Bootstrap Mean1 95% CI Lower 95% CI Upper
Mean 100.2371 100.2205 99.4992 100.9615
Variance 4.8457 4.8260 3.4209 6.4229
ACF(1) 0.7004 0.6190 0.4732 0.7275
1 Bootstrap preserves autocorrelation structure in time series
Code
# Bootstrap distribution plots
boot_dist_data <- tibble(
  Mean = boot_means,
  Variance = boot_vars,
  `ACF(1)` = boot_acf1
)

# Mean distribution
mean_hist <- plot_ly(x = boot_dist_data$Mean, type = "histogram", 
                     name = "Bootstrap Distribution",
                     marker = list(color = "lightblue", line = list(color = "black", width = 1))) %>%
  add_segments(
    x = mean(y_phase1), xend = mean(y_phase1),
    y = 0, yend = 150,
    line = list(color = "red", width = 3),
    name = "Original Value"
  ) %>%
  add_segments(
    x = mean_ci[1], xend = mean_ci[1],
    y = 0, yend = 150,
    line = list(color = "green", dash = "dash", width = 2),
    name = "95% CI"
  ) %>%
  add_segments(
    x = mean_ci[2], xend = mean_ci[2],
    y = 0, yend = 150,
    line = list(color = "green", dash = "dash", width = 2),
    showlegend = FALSE
  ) %>%
  layout(
    title = list(
      text = "Bootstrap Distribution of Mean<br><sub>Block bootstrap preserves temporal dependence</sub>",
      font = list(size = 16, weight = "bold")
    ),
    xaxis = list(title = "Mean Value"),
    yaxis = list(title = "Frequency")
  )

mean_hist

Comparing Bootstrap Methods

Code
# Compare block bootstrap vs naive bootstrap
cat("\nComparing Bootstrap Methods:\n\n")

Comparing Bootstrap Methods:
Code
# Naive bootstrap (WRONG for time series - ignores autocorrelation)
naive_boot_acf1 <- numeric(n_bootstrap)

set.seed(456)
for (b in 1:n_bootstrap) {
  # Naive bootstrap: random sampling with replacement
  naive_sample <- sample(y_phase1, replace = TRUE)
  acf_temp <- acf(naive_sample, lag.max = 1, plot = FALSE)
  naive_boot_acf1[b] <- acf_temp$acf[2]
}

# Comparison
comparison_boot <- tibble(
  Method = c("Original Data", "Block Bootstrap", "Naive Bootstrap (WRONG)"),
  `Mean ACF(1)` = c(
    acf(y_phase1, lag.max = 1, plot = FALSE)$acf[2],
    mean(boot_acf1),
    mean(naive_boot_acf1)
  ),
  `Std Dev ACF(1)` = c(
    NA,
    sd(boot_acf1),
    sd(naive_boot_acf1)
  ),
  Validity = c(
    "Ground truth",
    "[OK] Preserves dependence",
    "[X] Destroys dependence"
  )
)

comparison_boot %>%
  gt() %>%
  tab_header(
    title = "Bootstrap Method Comparison",
    subtitle = "Block bootstrap vs Naive bootstrap for time series"
  ) %>%
  fmt_number(
    columns = c(`Mean ACF(1)`, `Std Dev ACF(1)`),
    decimals = 4
  ) %>%
  sub_missing(
    columns = `Std Dev ACF(1)`,
    missing_text = "—"
  ) %>%
  tab_style(
    style = cell_fill(color = "lightgreen"),
    locations = cells_body(rows = Method == "Block Bootstrap")
  ) %>%
  tab_style(
    style = cell_fill(color = "lightcoral"),
    locations = cells_body(rows = Method == "Naive Bootstrap (WRONG)")
  ) %>%
  tab_footnote(
    footnote = "Naive bootstrap underestimates ACF because it breaks temporal order",
    locations = cells_body(rows = Method == "Naive Bootstrap (WRONG)", columns = Method)
  )
Bootstrap Method Comparison
Block bootstrap vs Naive bootstrap for time series
Method Mean ACF(1) Std Dev ACF(1) Validity
Original Data 0.7004 Ground truth
Block Bootstrap 0.6190 0.0642 [OK] Preserves dependence
Naive Bootstrap (WRONG)1 −0.0093 0.0770 [X] Destroys dependence
1 Naive bootstrap underestimates ACF because it breaks temporal order
Code
cat("\nKey Insight:\n")

Key Insight:
Code
cat("Block bootstrap mean ACF(1):", round(mean(boot_acf1), 4), "\n")
Block bootstrap mean ACF(1): 0.619 
Code
cat("Original data ACF(1):", round(acf(y_phase1, lag.max = 1, plot = FALSE)$acf[2], 4), "\n")
Original data ACF(1): 0.7004 
Code
cat("Naive bootstrap mean ACF(1):", round(mean(naive_boot_acf1), 4), "\n\n")
Naive bootstrap mean ACF(1): -0.0093 
Code
cat("Block bootstrap preserves autocorrelation, naive bootstrap destroys it!\n")
Block bootstrap preserves autocorrelation, naive bootstrap destroys it!
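
For routine use, the boot package's tsboot() implements the same idea: sim = "fixed" gives the moving block bootstrap and sim = "geom" the stationary bootstrap. A minimal sketch reusing the block length of 10 from above (boot is an extra package, not loaded in Setup):

Code
# Sketch: moving block bootstrap via boot::tsboot()
library(boot)

# Statistic computed on each bootstrap series: mean and lag-1 autocorrelation
ts_stat <- function(x) c(mean = mean(x),
                         acf1 = acf(x, lag.max = 1, plot = FALSE)$acf[2])

set.seed(456)
tsb <- tsboot(ts(y_phase1), statistic = ts_stat, R = 1000,
              l = 10, sim = "fixed")        # "fixed" = moving block bootstrap

# Percentile 95% intervals for the mean and ACF(1)
apply(tsb$t, 2, quantile, probs = c(0.025, 0.975))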

6. Dashboard and Summary

Code
# Summary metrics
dashboard_metrics <- tibble(
  Metric = c(
    "Total Observations",
    "Phase I Samples",
    "Phase II Samples",
    "ARIMA Model",
    "Autocorrelation (ACF1)",
    "Phase I OOC Points",
    "Phase II OOC Points",
    "CV MAE",
    "Bootstrap Mean CI Width"
  ),
  Value = c(
    length(y),
    150,
    50,
    paste0("ARIMA(", arima_model$arma[1], ",", arima_model$arma[6], ",", arima_model$arma[2], ")"),
    as.character(round(acf(y_phase1, lag.max = 1, plot = FALSE)$acf[2], 3)),
    as.character(length(ooc_i)),
    as.character(length(ooc_phase2)),
    as.character(round(mae_cv, 4)),
    as.character(round(mean_ci[2] - mean_ci[1], 4))
  )
)

dashboard_metrics %>%
  gt() %>%
  tab_header(
    title = "Time Series SPC Dashboard",
    subtitle = paste("Analysis Date:", Sys.Date())
  ) %>%
  tab_style(
    style = cell_fill(color = "lightblue"),
    locations = cells_column_labels()
  )
Time Series SPC Dashboard
Analysis Date: 2025-12-21
Metric Value
Total Observations 200
Phase I Samples 150
Phase II Samples 50
ARIMA Model ARIMA(1,0,0)
Autocorrelation (ACF1) 0.7
Phase I OOC Points 0
Phase II OOC Points 0
CV MAE 1.2871
Bootstrap Mean CI Width 1.4623

7. Recommendations

When to Use Time Series SPC

Use ARIMA-based SPC when:

  ✅ Autocorrelation is detected (Ljung-Box p < 0.05)
  ✅ The Durbin-Watson statistic is < 1.5 or > 2.5
  ✅ The process is slow-moving (chemical, thermal)
  ✅ Measurements are taken at high frequency
  ✅ Traditional charts show too many false alarms

Stick with traditional SPC when:

  ✅ There is no significant autocorrelation
  ✅ Samples are independent
  ✅ Rational subgrouping is possible
  ✅ The process is simple

Best Practices

1. Model Validation
  • Always check residual diagnostics
  • The Ljung-Box test on residuals should have p > 0.05
  • Residuals should look like white noise (see the checkresiduals() sketch below)
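
In practice, forecast::checkresiduals() bundles these diagnostics: a residual time plot, the residual ACF, a histogram, and a Ljung-Box test whose degrees of freedom are adjusted for the number of fitted parameters. A one-line sketch using the model fitted above:

Code
# One-call residual diagnostics for the fitted ARIMA model
checkresiduals(arima_model, lag = 20)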

2. Cross-Validation
  • Use time series CV (not random k-fold!)
  • Use a rolling origin or expanding window
  • Report out-of-sample forecast accuracy

3. Bootstrap
  • Use a block bootstrap (not the naive bootstrap)
  • Set the block length to roughly 1.5 × the lag at which the ACF dies out
  • Validate confidence interval coverage

4. Monitoring
  • Monitor residuals, not the original observations
  • Update the model periodically with new data
  • Watch for model deterioration

Practical Implementation

Code
# Generate recommendations based on analysis
cat("\n### Analysis-Specific Recommendations:\n\n")

### Analysis-Specific Recommendations:
Code
if (lb_test$p.value < 0.05) {
  cat("[REQUIRED] Significant autocorrelation detected (p =", round(lb_test$p.value, 4), ")\n")
  cat("  Action: Use ARIMA-based SPC (implemented above)\n")
  cat("  Benefit: Proper false alarm rate (~0.27% instead of ~", 
      round(length(ooc_i)/150*100, 1), "%)\n\n")
} else {
  cat("[OPTIONAL] Weak autocorrelation detected\n")
  cat("  Action: Traditional SPC may be acceptable\n")
  cat("  Alternative: Use EWMA charts for added robustness\n\n")
}
[REQUIRED] Significant autocorrelation detected (p = 0 )
  Action: Use ARIMA-based SPC (implemented above)
  Benefit: restores the nominal ~0.27% false alarm rate that
           autocorrelation would otherwise inflate
Code
if (length(ooc_phase2) > 5) {
  cat("[ALERT] Multiple Phase II out-of-control signals\n")
  cat("  Count:", length(ooc_phase2), "out of 50 samples\n")
  cat("  Action: Investigate process change at sample ~", min(ooc_phase2) + 150, "\n\n")
}

cat("Model Performance:\n")
Model Performance:
Code
cat("  CV MAE:", round(mae_cv, 4), "- Good if < 1 sigma\n")
  CV MAE: 1.2871 - Good if < 1 sigma
Code
cat("  Model:", paste0("ARIMA(", arima_model$arma[1], ",", arima_model$arma[6], ",", arima_model$arma[2], ")\n"))
  Model: ARIMA(1,0,0)

Report Generated: 2025-12-21 10:07:12.26231
Analysis Method: ARIMA-based Time Series SPC
Statistical Software: R with forecast and qcc packages
Methodology: Time series modeling with cross-validation and bootstrap