Autocorrelated Data, ARIMA Models, and Time Series Cross-Validation
Author
David Guy, PhD, Quality Control Department
Published
December 21, 2025
Introduction to Time Series SPC
The Autocorrelation Problem
Traditional SPC assumes:

- Observations are independent
- No correlation between consecutive measurements
- Random sampling from a stable process
Reality in many processes:

- ✗ Chemical reactors have thermal inertia
- ✗ Manufacturing lines have tool wear
- ✗ Measurements taken close in time are correlated

This temporal dependence is called autocorrelation.
Why Autocorrelation Matters
Impact on traditional control charts:

1. False alarm rate increases: charts signal when nothing has changed
2. Detection power decreases: real shifts are missed
3. Control limits too narrow: they are calculated assuming independence
4. Run length distribution changes: it is no longer geometric
Example:

- Traditional I-MR chart: 1% false alarm rate
- With autocorrelation (ρ = 0.5): 15% false alarm rate!
- The process appears "out of control" when it is actually stable
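To see this inflation concretely, here is a minimal simulation sketch (not part of the original report): it generates an in-control AR(1) series and counts how often a naive 3-sigma individuals chart, with limits estimated from the average moving range as if the data were independent, raises a signal. The exact rate depends on φ and the rule set used, so the numbers will not match the prose example exactly.

```r
# Minimal simulation sketch (illustrative assumptions: phi = 0.5, 3-sigma I-chart,
# sigma estimated from the average moving range as if observations were independent)
set.seed(1)
n   <- 10000
phi <- 0.5
x   <- as.numeric(arima.sim(model = list(ar = phi), n = n))  # stationary, in control

mr        <- abs(diff(x))
sigma_hat <- mean(mr) / 1.128          # d2 for subgroups of size 2
ucl <- mean(x) + 3 * sigma_hat
lcl <- mean(x) - 3 * sigma_hat

# Fraction of points flagged even though the process never shifted
mean(x > ucl | x < lcl)                # well above the nominal 0.27%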
Solutions for Autocorrelated Data
1. Time Series Models (ARIMA)
- Model the autocorrelation explicitly
- Monitor residuals (which ARE independent)
- Most theoretically sound approach
2. Modified Control Charts
- EWMA (exponentially weighted moving average)
- CUSUM (cumulative sum)
- More robust to autocorrelation
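As a quick illustration of this option, the hedged sketch below applies the qcc package's EWMA and CUSUM charts to the raw Phase I measurements. The object name `ts_data$measurement` follows the report's data, and the smoothing and decision parameters are conventional defaults rather than values tuned for this process.

```r
# Hedged sketch: EWMA and CUSUM charts on the raw Phase I measurements.
# Note: with strong autocorrelation the chart limits still need careful
# calibration (e.g., wider limits or residual-based charting).
library(qcc)

ewma_chart  <- ewma(ts_data$measurement[1:150], lambda = 0.2, nsigmas = 3)
cusum_chart <- cusum(ts_data$measurement[1:150],
                     decision.interval = 5, se.shift = 1)
```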
3. Time Series Cross-Validation
- Block-based CV respects temporal order
- Time series bootstrap preserves dependence
- Validates model performance
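For the cross-validation option, forecast's `tsCV()` implements the rolling-origin idea directly. The sketch below is a hedged example: it assumes the Phase I vector is named `y_phase1` (it is defined later, in the diagnostics section) and uses a simple AR(1) forecaster for illustration.

```r
# Hedged sketch: rolling-origin (one-step-ahead) cross-validation with forecast::tsCV
library(forecast)

ar1_fc <- function(x, h) forecast(Arima(x, order = c(1, 0, 0)), h = h)

cv_err <- tsCV(ts(y_phase1), ar1_fc, h = 1)   # one-step-ahead forecast errors

c(MAE  = mean(abs(cv_err), na.rm = TRUE),
  RMSE = sqrt(mean(cv_err^2, na.rm = TRUE)))
```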
Setup and Libraries
```r
# Load required packages
library(forecast)   # Time series modeling (ARIMA, auto.arima)
library(tseries)    # Time series tests (ADF, KPSS)
library(plotly)     # Interactive visualizations
library(gt)         # Beautiful tables
library(dplyr)      # Data manipulation
library(tidyr)      # Data tidying
library(ggplot2)    # Graphics
library(readr)      # Data import
library(lubridate)  # Date handling
library(DT)         # Interactive tables
library(knitr)      # Reports
library(qcc)        # Quality control charts
library(zoo)        # Moving averages

theme_set(theme_minimal())

cat("Time Series SPC Analysis\n")
```
cat("- Main plot shows overall trend and phase separation\n")
- Main plot shows overall trend and phase separation
Code
cat("- Histogram shows distribution shift between phases\n")
- Histogram shows distribution shift between phases
Code
cat("- Moving average reveals underlying trend\n")
- Moving average reveals underlying trend
Code
cat("- First differences show period-to-period variability\n\n")
- First differences show period-to-period variability
Code
if (mean(ts_data$measurement[151:200]) >mean(ts_data$measurement[1:150])) {cat("Observation: Phase II mean is HIGHER than Phase I\n")cat("Shift magnitude:", round(mean(ts_data$measurement[151:200]) -mean(ts_data$measurement[1:150]), 3), "\n\n")} else {cat("Observation: Phase II mean is LOWER than Phase I\n")cat("Shift magnitude:", round(abs(mean(ts_data$measurement[151:200]) -mean(ts_data$measurement[1:150])), 3), "\n\n")}
Observation: Phase II mean is HIGHER than Phase I
Shift magnitude: 2.034
1. Diagnostic Analysis
Check for Autocorrelation
```r
# Extract measurement vector
y        <- ts_data$measurement
y_phase1 <- y[1:150]                    # Phase I for model building

# Create time series objects
ts_phase1 <- ts(y_phase1, frequency = 1)
ts_all    <- ts(y, frequency = 1)

# Calculate autocorrelation and partial autocorrelation functions
acf_result  <- acf(y_phase1, lag.max = 30, plot = FALSE)
pacf_result <- pacf(y_phase1, lag.max = 30, plot = FALSE)

# Ljung-Box test (H0: no autocorrelation)
lb_test <- Box.test(y_phase1, lag = 20, type = "Ljung-Box")

# Durbin-Watson statistic
dw_stat <- sum(diff(y_phase1)^2) / sum((y_phase1 - mean(y_phase1))^2)

cat("\nAutocorrelation Tests:\n\n")
```
2. ARIMA Model Building

Residual Diagnostics

An ARIMA model is fitted to the Phase I series with auto.arima() and stored as arima_model (the selected model, reported in the dashboard below, is ARIMA(1,0,0)). Its residuals are then checked for independence.

```r
# Extract residuals
residuals_arima <- residuals(arima_model)

# Test residuals for independence
lb_residuals <- Box.test(residuals_arima, lag = 20, type = "Ljung-Box")

cat("\nResidual Diagnostics:\n\n")
cat("  Interpretation:",
    ifelse(lb_residuals$p.value > 0.05,
           "[OK] Residuals are white noise (independent)",
           "[WARNING] Residuals still autocorrelated"), "\n\n")
```
Interpretation: [WARNING] Residuals still autocorrelated
```r
# Residual statistics
resid_stats <- tibble(
  Statistic = c("Mean", "Std Dev", "Min", "Max", "Ljung-Box p-value"),
  Value     = c(mean(residuals_arima),
                sd(residuals_arima),
                min(residuals_arima),
                max(residuals_arima),
                lb_residuals$p.value)
)

resid_stats %>%
  gt() %>%
  tab_header(
    title    = "Residual Statistics",
    subtitle = "ARIMA model residuals should be white noise"
  ) %>%
  fmt_number(columns = Value, decimals = 4) %>%
  tab_footnote(
    footnote  = "p-value > 0.05 indicates residuals are independent (good)",
    locations = cells_body(rows = Statistic == "Ljung-Box p-value")
  )
```
Residual Statistics
ARIMA model residuals should be white noise

| Statistic | Value |
|---|---|
| Mean | 0.0159 |
| Std Dev | 1.5631 |
| Min | −4.0570 |
| Max | 3.6485 |
| Ljung-Box p-value¹ | 0.0116 |

¹ p-value > 0.05 indicates residuals are independent (good)
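The Ljung-Box p-value of 0.0116 indicates the ARIMA(1,0,0) residuals are not quite white noise. One hedged follow-up (not performed in the original report) is to compare a few nearby candidate orders by AICc and residual whiteness before settling on a monitoring model:

```r
# Hedged follow-up: screen nearby candidate orders when residuals fail the whiteness test
candidates <- list(c(1, 0, 0), c(2, 0, 0), c(1, 0, 1), c(2, 0, 1))

for (ord in candidates) {
  fit <- Arima(ts_phase1, order = ord, include.mean = TRUE)
  lb  <- Box.test(residuals(fit), lag = 20,
                  fitdf = sum(ord[c(1, 3)]), type = "Ljung-Box")
  cat(sprintf("ARIMA(%d,%d,%d)  AICc = %8.2f  Ljung-Box p = %.3f\n",
              ord[1], ord[2], ord[3], fit$aicc, lb$p.value))
}
```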
3. Control Charts for Residuals

I-MR Chart on ARIMA Residuals

```r
# Residuals should now be (approximately) independent, so apply traditional control charts
residuals_vec <- as.numeric(residuals_arima)

# Average moving range
mr      <- abs(diff(residuals_vec))
mr_mean <- mean(mr, na.rm = TRUE)

# Control limits for individuals
d2        <- 1.128                     # for n = 2
sigma_hat <- mr_mean / d2
ucl_i <- mean(residuals_vec) + 3 * sigma_hat
lcl_i <- mean(residuals_vec) - 3 * sigma_hat

# Control limits for the moving range
d3 <- 0                                # for n = 2
d4 <- 3.267                            # for n = 2
ucl_mr <- d4 * mr_mean
lcl_mr <- d3 * mr_mean

cat("\nControl Chart Parameters (on Residuals):\n\n")
```
Control Chart Parameters (on Residuals):
```r
cat("Individuals Chart:\n")
cat("  Center Line:", round(mean(residuals_vec), 4), "\n")
```

Individuals Chart:

```r
# Identify out-of-control residuals (beyond the 3-sigma limits)
ooc_i <- which(residuals_vec > ucl_i | residuals_vec < lcl_i)

if (length(ooc_i) > 0) {
  cat("  At times:", head(ooc_i, 10), "\n")
}

# Control chart data
control_data <- tibble(
  Time   = 1:length(residuals_vec),
  Value  = residuals_vec,
  UCL    = ucl_i,
  CL     = mean(residuals_vec),
  LCL    = lcl_i,
  Status = ifelse(Value > UCL | Value < LCL, "Out of Control", "In Control")
)

# Individuals chart
i_chart <- plot_ly(control_data) %>%
  add_trace(
    x = ~Time, y = ~Value,
    type = "scatter", mode = "lines+markers", name = "Residuals",
    line   = list(color = "blue"),
    marker = list(size  = 6,
                  color = ~ifelse(Status == "Out of Control", "red", "blue")),
    text = ~paste0("Time: ", Time, "<br>Value: ", round(Value, 3), "<br>", Status),
    hovertemplate = "%{text}<extra></extra>"
  ) %>%
  add_trace(x = ~Time, y = ~UCL, type = "scatter", mode = "lines",
            name = "UCL", line = list(color = "red", dash = "dash", width = 2)) %>%
  add_trace(x = ~Time, y = ~CL, type = "scatter", mode = "lines",
            name = "Center Line", line = list(color = "green", width = 2)) %>%
  add_trace(x = ~Time, y = ~LCL, type = "scatter", mode = "lines",
            name = "LCL", line = list(color = "red", dash = "dash", width = 2)) %>%
  layout(
    title = list(
      text = "I-Chart: ARIMA Residuals<br><sub>Monitoring residuals (independent observations)</sub>",
      font = list(size = 16, weight = "bold")
    ),
    xaxis = list(title = "Time"),
    yaxis = list(title = "Residual Value")
  )

i_chart
```
4. Monitoring Phase II with ARIMA
One-Step-Ahead Forecasts
```r
# Use the ARIMA model to monitor Phase II data via one-step-ahead forecast errors
cat("\nPhase II Monitoring with ARIMA Model:\n\n")
```
Phase II Monitoring with ARIMA Model:
```r
# Compute one-step-ahead forecasts and residuals for the full series
all_residuals <- numeric(length(y))
all_forecasts <- numeric(length(y))

for (i in 1:length(y)) {
  if (i <= 150) {
    # Phase I: use the fitted model's residuals directly
    all_residuals[i] <- residuals_arima[i]
    all_forecasts[i] <- y[i] - residuals_arima[i]
  } else {
    # Phase II: refit on data up to i - 1, then forecast one step ahead
    temp_model <- Arima(
      y[1:(i - 1)],
      order = c(arima_model$arma[1], arima_model$arma[6], arima_model$arma[2]),
      include.mean = TRUE
    )
    forecast_i       <- forecast(temp_model, h = 1)
    all_forecasts[i] <- forecast_i$mean[1]
    all_residuals[i] <- y[i] - forecast_i$mean[1]
  }
}

# Control limits (same sigma estimate as Phase I)
ucl_phase2 <-  3 * sigma_hat
lcl_phase2 <- -3 * sigma_hat

# Identify Phase II out-of-control points
phase2_residuals <- all_residuals[151:200]
ooc_phase2       <- which(abs(phase2_residuals) > 3 * sigma_hat)

cat("Phase II Out-of-Control Points:", length(ooc_phase2), "out of 50\n")
```
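Refitting the ARIMA model at every Phase II time point lets the parameters drift toward the shifted data and is slow. A hedged alternative, using forecast's `model` argument (which applies a previously estimated model to new data without re-estimating coefficients), keeps the Phase I parameters fixed:

```r
# Hedged alternative: apply the frozen Phase I model to the full series
fixed_fit       <- Arima(y, model = arima_model)   # no re-estimation of coefficients
fixed_residuals <- residuals(fixed_fit)            # one-step-ahead errors under the Phase I model

phase2_errors_fixed <- fixed_residuals[151:200]
sum(abs(phase2_errors_fixed) > 3 * sigma_hat)      # out-of-control signals, fixed-model version
```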
6. Dashboard and Summary

(The quantities mae_cv and mean_ci referenced below are produced by the rolling-origin cross-validation and moving-block bootstrap steps of the full report.)

```r
# Summary metrics
dashboard_metrics <- tibble(
  Metric = c("Total Observations", "Phase I Samples", "Phase II Samples",
             "ARIMA Model", "Autocorrelation (ACF1)", "Phase I OOC Points",
             "Phase II OOC Points", "CV MAE", "Bootstrap Mean CI Width"),
  Value = c(
    length(y),
    150,
    50,
    paste0("ARIMA(", arima_model$arma[1], ",", arima_model$arma[6], ",",
           arima_model$arma[2], ")"),
    as.character(round(acf(y_phase1, lag.max = 1, plot = FALSE)$acf[2], 3)),
    as.character(length(ooc_i)),
    as.character(length(ooc_phase2)),
    as.character(round(mae_cv, 4)),
    as.character(round(mean_ci[2] - mean_ci[1], 4))
  )
)

dashboard_metrics %>%
  gt() %>%
  tab_header(
    title    = "Time Series SPC Dashboard",
    subtitle = paste("Analysis Date:", Sys.Date())
  ) %>%
  tab_style(
    style     = cell_fill(color = "lightblue"),
    locations = cells_column_labels()
  )
```
Time Series SPC Dashboard
Analysis Date: 2025-12-21

| Metric | Value |
|---|---|
| Total Observations | 200 |
| Phase I Samples | 150 |
| Phase II Samples | 50 |
| ARIMA Model | ARIMA(1,0,0) |
| Autocorrelation (ACF1) | 0.7 |
| Phase I OOC Points | 0 |
| Phase II OOC Points | 0 |
| CV MAE | 1.2871 |
| Bootstrap Mean CI Width | 1.4623 |
7. Recommendations
When to Use Time Series SPC
Use ARIMA-based SPC when:

- ✅ Autocorrelation detected (Ljung-Box p < 0.05)
- ✅ Durbin-Watson statistic < 1.5 or > 2.5
- ✅ Slow-moving process (chemical, thermal)
- ✅ High-frequency measurements
- ✅ Traditional charts show too many false alarms

Stick with traditional SPC when:

- ✅ No significant autocorrelation
- ✅ Independent samples
- ✅ Rational subgrouping possible
- ✅ Simple process
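As a practical shortcut, the hedged helper below (a hypothetical function, not from the report) bundles the two screening checks used above, the Ljung-Box test and the Durbin-Watson statistic, into a single recommendation:

```r
# Hypothetical screening helper: should this series get ARIMA/EWMA-based SPC?
needs_ts_spc <- function(x, lag = 20) {
  lb <- Box.test(x, lag = lag, type = "Ljung-Box")
  dw <- sum(diff(x)^2) / sum((x - mean(x))^2)
  autocorrelated <- lb$p.value < 0.05 || dw < 1.5 || dw > 2.5
  list(
    ljung_box_p    = lb$p.value,
    durbin_watson  = dw,
    recommendation = if (autocorrelated) "Use ARIMA- or EWMA-based SPC"
                     else "Traditional SPC is likely acceptable"
  )
}

needs_ts_spc(y_phase1)
```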
Best Practices
1. Model Validation:
- Always check residual diagnostics
- The Ljung-Box test on residuals should have p > 0.05
- Residuals should look like white noise

2. Cross-Validation:
- Use time series CV (not random k-fold!)
- Rolling origin or expanding window
- Report out-of-sample forecast accuracy

3. Bootstrap:
- Use a block bootstrap, not the naive bootstrap (a sketch follows this list)
- Block length roughly 1.5 × the autocorrelation length
- Validate CI coverage

4. Monitoring:
- Monitor residuals, not the original observations
- Update the model periodically with new data
- Watch for model deterioration
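For practice point 3, here is a minimal moving-block bootstrap sketch for the Phase I mean, mirroring the approach used in the full report's bootstrap section; the block length and iteration count are illustrative choices.

```r
# Hedged sketch: moving-block bootstrap CI for the Phase I mean
set.seed(456)
block_len <- 10
B         <- 1000
n_obs     <- length(y_phase1)

boot_means <- replicate(B, {
  starts <- sample(seq_len(n_obs - block_len + 1),
                   size = ceiling(n_obs / block_len), replace = TRUE)
  resampled <- unlist(lapply(starts, function(s) y_phase1[s:(s + block_len - 1)]))
  mean(resampled[seq_len(n_obs)])        # trim to the original series length
})

quantile(boot_means, c(0.025, 0.975))    # 95% CI that preserves short-range dependence
```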
Practical Implementation
```r
# Generate recommendations based on the analysis
cat("\n### Analysis-Specific Recommendations:\n\n")
```
### Analysis-Specific Recommendations:
```r
if (lb_test$p.value < 0.05) {
  cat("[REQUIRED] Significant autocorrelation detected (p =",
      round(lb_test$p.value, 4), ")\n")
  cat("  Action: Use ARIMA-based SPC (implemented above)\n")
  cat("  Benefit: Proper false alarm rate (~0.27% instead of ~",
      round(length(ooc_i) / 150 * 100, 1), "%)\n\n")
} else {
  cat("[OPTIONAL] Weak autocorrelation detected\n")
  cat("  Action: Traditional SPC may be acceptable\n")
  cat("  Alternative: Use EWMA charts for added robustness\n\n")
}
```
Report Generated: 2025-12-21 10:07:12
Analysis Method: ARIMA-based Time Series SPC
Statistical Software: R with forecast and qcc packages
Methodology: Time series modeling with cross-validation and bootstrap