© 2026 Dr. Debashis Chatterjee. All rights
reserved.
Prepared for educational purposes only.
A control chart is a time-ordered plot used to check whether a process is stable.
Golden classroom rule:
> First check variation (R or S chart), then check
mean (X̄ chart).
You only need these ideas:
If you remember only one sentence:
> A control chart is a “traffic light” for stability: most points should behave normally; unusual behavior suggests a special cause.
We learn charts in this order:
If a statistic is approximately Normal, then about 99.73% of its values lie within \(\pm 3\sigma\) of its mean, so points outside these limits are rare under a stable process.
pkgs <- c("qcc","ggplot2","dplyr","tidyr","knitr")
to_install <- pkgs[!sapply(pkgs, requireNamespace, quietly = TRUE)]
if(length(to_install) > 0) install.packages(to_install, dependencies = TRUE)
library(qcc)
library(ggplot2)
library(dplyr)
library(tidyr)
library(knitr)
set.seed(2026)
k_tbl <- function(x, caption=NULL, digits=4){
knitr::kable(as.data.frame(x), caption = caption, digits = digits)
}
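# Show CL/LCL/UCL neatly.
# q$limits is a matrix with columns LCL and UCL; for some charts (p, u) it can
# hold one row per sample, so index by column and keep only the distinct rows.
limits_tbl <- function(q){
  lim <- unique(matrix(q$limits, ncol = 2))
  data.frame(LCL = lim[, 1], CL = q$center, UCL = lim[, 2])
}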
# Out-of-control points beyond limits (most common)
ooc_tbl <- function(q){
idx <- integer(0)
if(!is.null(q$violations) && is.list(q$violations) && !is.null(q$violations$beyond.limits)){
idx <- q$violations$beyond.limits
}
if(length(idx)==0) return(data.frame(message="No points beyond control limits."))
data.frame(index = idx, statistic = as.numeric(q$statistics[idx]))
}
ToothGrowth is a real dataset (built into R) from an experiment on tooth growth in guinea pigs.
We will use the measurement len (tooth length).
How we create rational subgroups (simple teaching method):
- Take the data in order
- Make subgroups of size \(n=5\)
- Each row in the matrix = one subgroup
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.20 13.07 19.25 18.81 25.27 33.90
| len | supp | dose |
|---|---|---|
| 4.2 | VC | 0.5 |
| 11.5 | VC | 0.5 |
| 7.3 | VC | 0.5 |
| 5.8 | VC | 0.5 |
| 6.4 | VC | 0.5 |
| 10.0 | VC | 0.5 |
| 11.2 | VC | 0.5 |
| 11.2 | VC | 0.5 |
| 5.2 | VC | 0.5 |
| 7.0 | VC | 0.5 |
x <- as.numeric(ToothGrowth$len)
x <- x[is.finite(x)]
n <- 5
m <- floor(length(x)/n)
m_use <- min(m, 20) # keep small for classroom
Xmat <- matrix(x[1:(m_use*n)], nrow=m_use, ncol=n, byrow=TRUE)
k_tbl(head(Xmat, 6), caption="Subgroup matrix: each row is a subgroup (n=5)", digits=2)
| V1 | V2 | V3 | V4 | V5 |
|---|---|---|---|---|
| 4.2 | 11.5 | 7.3 | 5.8 | 6.4 |
| 10.0 | 11.2 | 11.2 | 5.2 | 7.0 |
| 16.5 | 16.5 | 15.2 | 17.3 | 22.5 |
| 17.3 | 13.6 | 14.5 | 18.8 | 15.5 |
| 23.6 | 18.5 | 33.9 | 25.5 | 26.4 |
| 32.5 | 26.7 | 21.5 | 23.3 | 29.5 |
df_raw <- data.frame(t = 1:(m_use*n), value = x[1:(m_use*n)])
ggplot(df_raw, aes(t, value)) +
geom_line(color="#2c7fb8", linewidth=0.5) +
geom_point(color="#2c7fb8", size=1.7) +
labs(title="Raw measurement stream used for subgrouping (ToothGrowth$len)",
x="Observation index", y="Tooth length") +
theme_minimal(base_size=12)
| LCL | CL | UCL |
|---|---|---|
| 0 | 8.6417 | 18.2725 |
| message |
|---|
| No points beyond control limits. |
| LCL | CL | UCL |
|---|---|---|
| 13.8288 | 18.8133 | 23.7979 |
| index | statistic |
|---|---|
| 5 | 25.58 |
| 6 | 26.70 |
| 11 | 24.72 |
| 12 | 27.40 |
| 1 | 7.04 |
| 2 | 8.92 |
| 8 | 10.76 |
sub_mean <- rowMeans(Xmat)
sub_range <- apply(Xmat, 1, function(v) max(v)-min(v))
sub_tbl <- data.frame(subgroup=1:m_use, xbar=sub_mean, R=sub_range)
k_tbl(head(sub_tbl, 12), caption="Subgroup statistics (first 12 subgroups)", digits=3)
| subgroup | xbar | R |
|---|---|---|
| 1 | 7.04 | 7.3 |
| 2 | 8.92 | 6.0 |
| 3 | 17.60 | 7.3 |
| 4 | 15.94 | 5.2 |
| 5 | 25.58 | 15.4 |
| 6 | 26.70 | 11.0 |
| 7 | 15.70 | 11.8 |
| 8 | 10.76 | 8.3 |
| 9 | 22.60 | 6.7 |
| 10 | 22.80 | 12.8 |
| 11 | 24.72 | 4.0 |
| 12 | 27.40 | 7.9 |
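The plotting code for the subgroup means uses an object named `qX`, whose construction is not shown here. A minimal sketch of how such objects might be built with `qcc` follows; the name `qR` and the use of `plot = FALSE` are assumptions made for illustration, and the block is skipped if `qcc` is not installed.

```r
# Rebuild the subgroup matrix: first 60 ToothGrowth lengths, 12 subgroups of 5.
x    <- ToothGrowth$len
Xmat <- matrix(x[1:60], ncol = 5, byrow = TRUE)

if (requireNamespace("qcc", quietly = TRUE)) {
  # R chart first (variation), then X-bar chart (mean).
  qR <- qcc::qcc(Xmat, type = "R",    plot = FALSE)   # qR is a hypothetical name
  qX <- qcc::qcc(Xmat, type = "xbar", plot = FALSE)

  print(qR$limits)   # LCL and UCL for the subgroup ranges
  print(qX$limits)   # LCL and UCL for the subgroup means
}
```

Checking the R chart before the X-bar chart follows the classroom rule stated at the start: the X-bar limits are computed from the within-subgroup variation.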
df_xbar <- data.frame(subgroup=1:m_use, xbar=qX$statistics)
ggplot(df_xbar, aes(subgroup, xbar)) +
geom_line(color="#2c7fb8", linewidth=0.6) +
geom_point(color="#2c7fb8", size=2) +
geom_hline(yintercept=qX$limits[2], linetype="dashed", color="#d95f0e", linewidth=1) +
geom_hline(yintercept=qX$center, linetype="solid", color="#1b9e77", linewidth=1) +
geom_hline(yintercept=qX$limits[1], linetype="dashed", color="#d95f0e", linewidth=1) +
labs(title="Subgroup means (X-bar) with control limits", x="Subgroup", y="X-bar") +
theme_minimal(base_size=12)
iris is a famous real dataset of flower measurements.
We use Sepal.Length and create subgroups of size \(n=10\).
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.300 5.100 5.800 5.843 6.400 7.900
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 5.4 | 3.9 | 1.7 | 0.4 | setosa |
| 4.6 | 3.4 | 1.4 | 0.3 | setosa |
| 5.0 | 3.4 | 1.5 | 0.2 | setosa |
x2 <- as.numeric(iris$Sepal.Length)
x2 <- x2[is.finite(x2)]
n2 <- 10
m2 <- floor(length(x2)/n2)
m2_use <- min(m2, 12)
X2 <- matrix(x2[1:(m2_use*n2)], nrow=m2_use, ncol=n2, byrow=TRUE)
k_tbl(head(X2, 4), caption="iris subgroup matrix (n=10)", digits=2)
| V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 |
|---|---|---|---|---|---|---|---|---|---|
| 5.1 | 4.9 | 4.7 | 4.6 | 5.0 | 5.4 | 4.6 | 5.0 | 4.4 | 4.9 |
| 5.4 | 4.8 | 4.8 | 4.3 | 5.8 | 5.7 | 5.4 | 5.1 | 5.7 | 5.1 |
| 5.4 | 5.1 | 4.6 | 5.1 | 4.8 | 5.0 | 5.0 | 5.2 | 5.2 | 4.7 |
| 4.8 | 5.4 | 5.2 | 5.5 | 4.9 | 5.0 | 5.5 | 4.9 | 4.4 | 5.1 |
| LCL | CL | UCL |
|---|---|---|
| 0.1301 | 0.4585 | 0.7869 |
| index | statistic |
|---|---|
| 11 | 0.8042 |
| LCL | CL | UCL |
|---|---|---|
| 5.2056 | 5.6525 | 6.0994 |
| index | statistic |
|---|---|
| 6 | 6.10 |
| 8 | 6.26 |
| 11 | 6.57 |
| 12 | 6.55 |
| 1 | 4.86 |
| 3 | 5.01 |
| 4 | 5.07 |
| 5 | 4.88 |
airquality is real daily air quality data in New York
(1973).
We use Temp as a single measurement per day (\(n=1\) each time).
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 56.00 72.00 79.00 77.88 85.00 97.00
| Ozone | Solar.R | Wind | Temp | Month | Day |
|---|---|---|---|---|---|
| 41 | 190 | 7.4 | 67 | 5 | 1 |
| 36 | 118 | 8.0 | 72 | 5 | 2 |
| 12 | 149 | 12.6 | 74 | 5 | 3 |
| 18 | 313 | 11.5 | 62 | 5 | 4 |
| NA | NA | 14.3 | 56 | 5 | 5 |
| 28 | NA | 14.9 | 66 | 5 | 6 |
| 23 | 299 | 8.6 | 65 | 5 | 7 |
| 19 | 99 | 13.8 | 59 | 5 | 8 |
aq <- airquality   # assumption: aq is the built-in airquality data frame
xI <- as.numeric(aq$Temp)
qI <- qcc(xI, type="xbar.one", title="Individuals Chart (airquality Temp)", plot=TRUE)
| LCL | CL | UCL |
|---|---|---|
| 66.3517 | 77.8824 | 89.413 |
| index | statistic |
|---|---|
| 40 | 90 |
| 42 | 93 |
| 43 | 92 |
| 69 | 92 |
| 70 | 92 |
| 75 | 91 |
| 100 | 90 |
| 101 | 90 |
| 102 | 92 |
| 120 | 97 |
| 121 | 94 |
| 122 | 96 |
| 123 | 94 |
| 124 | 91 |
| 125 | 92 |
| 126 | 93 |
| 127 | 93 |
| 4 | 62 |
| 5 | 56 |
| 6 | 66 |
| 7 | 65 |
| 8 | 59 |
| 9 | 61 |
| 13 | 66 |
| 15 | 58 |
| 16 | 64 |
| 17 | 66 |
| 18 | 57 |
| 20 | 62 |
| 21 | 59 |
| 23 | 61 |
| 24 | 61 |
| 25 | 57 |
| 26 | 58 |
| 27 | 57 |
| 49 | 65 |
| 144 | 64 |
| 148 | 63 |
Moving range \(MR_t = |x_t - x_{t-1}|\) is the range of a 2-point subgroup \((x_t, x_{t-1})\).
pair_mat <- cbind(xI[-1], xI[-length(xI)]) # (x_t, x_{t-1}) => subgroup size 2
qMR <- qcc(pair_mat, type="R", title="Moving Range (MR) Chart (airquality Temp)", plot=TRUE)
| LCL | CL | UCL |
|---|---|---|
| 0 | 4.3355 | 14.1654 |
| index | statistic |
|---|---|
| 34 | 17 |
| 143 | 18 |
We use airquality$Ozone (real).
Define a defective day as one with Ozone > 80 (a threshold chosen for illustration).
aqo <- airquality |> filter(!is.na(Ozone)) |> mutate(defective = as.integer(Ozone > 80))
table(aqo$defective)
##
## 0 1
## 100 16
T <- 25
n <- 12
batches <- data.frame(t=1:T, d=NA_integer_)
for(i in 1:T){
samp <- sample(aqo$defective, n, replace=TRUE)
batches$d[i] <- sum(samp)
}
k_tbl(head(batches, 10), caption="First 10 batches: number of defectives (Ozone>80)")
| t | d |
|---|---|
| 1 | 3 |
| 2 | 2 |
| 3 | 1 |
| 4 | 0 |
| 5 | 2 |
| 6 | 2 |
| 7 | 4 |
| 8 | 1 |
| 9 | 0 |
| 10 | 4 |
| LCL | CL | UCL |
|---|---|---|
| 0 | 0.1767 | 0.507 |
| message |
|---|
| No points beyond control limits. |
When batch size is constant, we can chart the count \(d\) directly as an np chart.
qNP <- qcc(batches$d, type="np", sizes=rep(n,T), title="np Chart (airquality: Ozone>80)", plot=TRUE)
| LCL | CL | UCL |
|---|---|---|
| 0 | 2.12 | 6.0835 |
| message |
|---|
| No points beyond control limits. |
InsectSprays contains counts of insects after using
different sprays.
We treat count as “defects per unit” (for chart
demonstration).
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 3.00 7.00 9.50 14.25 26.00
| count | spray |
|---|---|
| 10 | A |
| 7 | A |
| 20 | A |
| 14 | A |
| 14 | A |
| 12 | A |
| 10 | A |
| 23 | A |
| 17 | A |
| 20 | A |
| LCL | CL | UCL |
|---|---|---|
| 0.2534 | 9.5 | 18.7466 |
| index | statistic |
|---|---|
| 3 | 20 |
| 8 | 23 |
| 10 | 20 |
| 15 | 21 |
| 21 | 19 |
| 22 | 21 |
| 64 | 22 |
| 69 | 26 |
| 70 | 26 |
| 71 | 24 |
| 25 | 0 |
| 34 | 0 |
Titanic is a real table of passenger counts.
We treat:
- defects = deaths
- opportunity = total passengers
We create \(u = \text{deaths}/\text{total}\) by passenger Class.
data(Titanic)
Tdf <- as.data.frame(Titanic)
u_df <- Tdf |>
group_by(Class) |>
summarise(
deaths = sum(Freq[Survived=="No"]),
total = sum(Freq),
u = deaths/total,
.groups="drop"
) |> mutate(t=row_number())
k_tbl(u_df, caption="u chart data (Titanic by Class)", digits=4)
| Class | deaths | total | u | t |
|---|---|---|---|---|
| 1st | 122 | 325 | 0.3754 | 1 |
| 2nd | 167 | 285 | 0.5860 | 2 |
| 3rd | 528 | 706 | 0.7479 | 3 |
| Crew | 673 | 885 | 0.7605 | 4 |
qU <- qcc(u_df$deaths, type="u", sizes=u_df$total, title="u Chart (Titanic deaths per passenger)", plot=TRUE)
| LCL | CL | UCL |
|---|---|---|
| 0.5400 | 0.677 | 0.8139 |
| 0.5308 | 0.677 | 0.8232 |
| 0.5841 | 0.677 | 0.7699 |
| 0.5940 | 0.677 | 0.7599 |
| index | statistic |
|---|---|
| 4 | 0.7605 |
| 1 | 0.3754 |
For each chart, students should write:
A production/service process generates a sequence of observations over time:
\[ X_1, X_2, \dots, X_t, \dots \]
A process is said to be in statistical control (stable) if its probability distribution does not change over time (mean and variability remain stable). A process is out of control if something changes because of a special/assignable cause, such as tool wear, wrong setting, raw material change, operator change, or environment change.
Control charts help us answer:
Key principle: Control charts do not “improve” quality by themselves; they detect instability so that we can remove special causes.
Every process has variation. SPC separates variation into:
Common-cause variation: small natural fluctuations due to many minor factors (background noise). If only common causes exist, the process is stable.
Special-cause variation: variation due to identifiable events (machine misalignment, wrong temperature, sensor drift, operator change, etc.). Special causes make the process unstable.
Control charts are designed so that, under common causes only, points rarely cross the control limits; if they do, we suspect special causes.
Many charts are based on the “3-sigma” concept. If a statistic \(Z\) is approximately Normal:
\[ Z \sim N(0,1), \]
then:
\[ \mathbb{P}(|Z| \le 3) \approx 0.9973 \quad \Rightarrow \quad \mathbb{P}(|Z|>3) \approx 0.0027. \]
So, under stable conditions, only about 0.27% of points fall outside ±3σ limits.
If the probability of a false alarm per point is \(\alpha\), then approximately:
\[ \mathrm{ARL}_0 \approx \frac{1}{\alpha}. \]
For \(\alpha \approx 0.0027\):
\[ \mathrm{ARL}_0 \approx \frac{1}{0.0027} \approx 370. \]
Meaning: under stability, we expect (on average) one false signal every ~370 plotted points.
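The two numbers above can be checked directly. This is a small base-R sketch that assumes the plotted statistic is exactly Normal; the names `alpha` and `arl0` are illustrative.

```r
# False-alarm probability per plotted point for 3-sigma limits.
alpha <- 2 * pnorm(-3)   # P(|Z| > 3) for a standard Normal Z
arl0  <- 1 / alpha       # in-control average run length

round(alpha, 4)          # about 0.0027
round(arl0)              # about 370
```

Widening the limits lowers `alpha` and lengthens the in-control run length, at the cost of slower detection of real shifts.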
When we take measurements in subgroups, we observe:
\[ X_{t1}, X_{t2}, \dots, X_{tn} \quad\text{(subgroup at time }t\text{ of size }n). \]
A rational subgroup is formed so that:
- variation within a subgroup reflects common-cause variation,
- variation between subgroups captures potential special causes.
Typical rational subgrouping:
- take items close in time (e.g., 5 consecutive items every hour),
- keep machine and conditions nearly constant within a subgroup.
Variables charts are for continuous measurements (length, weight, temperature, etc.).
For subgroup \(t\) of size \(n\), the subgroup mean is:
\[ \bar{X}_t = \frac{1}{n}\sum_{j=1}^n X_{tj}, \]
Range:
\[ R_t = \max_j X_{tj} - \min_j X_{tj}, \]
Sample standard deviation:
\[ S_t = \sqrt{\frac{1}{n-1}\sum_{j=1}^n (X_{tj}-\bar{X}_t)^2 }. \]
Golden rule: Always check the R/S chart first.
If variability is unstable, mean chart limits become unreliable.
If the process is stable and approximately Normal:
\[ X \sim N(\mu,\sigma^2), \]
then subgroup mean:
\[ \bar{X}_t \sim N\!\left(\mu,\frac{\sigma^2}{n}\right). \]
So “3-sigma” limits for subgroup mean would be:
\[ \mathrm{UCL}_{\bar{X}}=\mu+3\frac{\sigma}{\sqrt{n}},\quad \mathrm{CL}_{\bar{X}}=\mu,\quad \mathrm{LCL}_{\bar{X}}=\mu-3\frac{\sigma}{\sqrt{n}}. \]
In practice, \(\mu\) and \(\sigma\) are estimated from Phase I data, and constants from standard tables are used (Montgomery).
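As a worked check of the estimation step, the following base-R sketch computes X-bar limits by hand for the ToothGrowth subgroups used earlier. It assumes \(\sigma\) is estimated by \(\bar{R}/d_2\) with the tabulated constant \(d_2 = 2.326\) for \(n = 5\); names such as `sigma_hat` and `ooc` are illustrative.

```r
# X-bar limits by hand: first 60 ToothGrowth lengths, 12 subgroups of n = 5.
x    <- ToothGrowth$len
n    <- 5
Xmat <- matrix(x[1:60], ncol = n, byrow = TRUE)

xbar <- rowMeans(Xmat)                               # subgroup means
R    <- apply(Xmat, 1, function(v) max(v) - min(v))  # subgroup ranges

d2        <- 2.326               # tabulated constant for n = 5 (assumption)
sigma_hat <- mean(R) / d2        # estimate of the process sigma
CL  <- mean(xbar)
UCL <- CL + 3 * sigma_hat / sqrt(n)
LCL <- CL - 3 * sigma_hat / sqrt(n)

round(c(LCL = LCL, CL = CL, UCL = UCL), 4)
ooc <- which(xbar > UCL | xbar < LCL)   # subgroups beyond the limits
ooc
```

Under these assumptions the result should agree with the X-bar limits reported for this dataset (13.8288, 18.8133, 23.7979). Many subgroups fall outside because the data are ordered by dose, so the subgroup means shift systematically.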
The range \(R_t\) measures within-subgroup variation. Limits have the form:
\[ \mathrm{UCL}_R = D_4 \bar{R},\quad \mathrm{CL}_R = \bar{R},\quad \mathrm{LCL}_R = D_3 \bar{R}, \]
where:
\[ \bar{R}=\frac{1}{m}\sum_{t=1}^m R_t \]
and \(D_3, D_4\) depend on subgroup size \(n\) (taken from tables).
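For small subgroups (\(n \le 6\)) the computed 3-sigma lower limit would be negative, so the tables give \(D_3 = 0\) and we use: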
\[ \mathrm{LCL}_R = 0. \]
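The R chart limits can be sketched by hand in the same way. The constants \(D_3 = 0\) and \(D_4 = 2.114\) are the usual tabulated values for \(n = 5\) and are entered here as assumptions; software that uses unrounded constants may differ slightly in the later decimals.

```r
# R chart limits by hand for the ToothGrowth subgroups (n = 5).
x    <- ToothGrowth$len
Xmat <- matrix(x[1:60], ncol = 5, byrow = TRUE)
R    <- apply(Xmat, 1, function(v) diff(range(v)))   # subgroup ranges

D3 <- 0        # tabulated for n = 5 (assumption)
D4 <- 2.114    # tabulated for n = 5 (assumption)

Rbar  <- mean(R)
LCL_R <- max(0, D3 * Rbar)   # never below zero
UCL_R <- D4 * Rbar

round(c(LCL = LCL_R, CL = Rbar, UCL = UCL_R), 4)
any(R > UCL_R)               # TRUE would flag unstable variation
```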
Similarly, for the S chart:
\[ \mathrm{UCL}_S = B_4 \bar{S},\quad \mathrm{CL}_S = \bar{S},\quad \mathrm{LCL}_S = B_3 \bar{S}, \]
where:
\[ \bar{S}=\frac{1}{m}\sum_{t=1}^m S_t \]
and \(B_3, B_4\) depend on subgroup size \(n\).
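A base-R sketch of the S chart limits for the iris subgroups (\(n = 10\)) is given below. The constants \(B_3 = 0.284\) and \(B_4 = 1.716\) are the usual tabulated values for \(n = 10\), entered as assumptions; the variable names are illustrative.

```r
# S chart limits by hand: first 120 Sepal.Length values, 12 subgroups of 10.
x2 <- iris$Sepal.Length
X2 <- matrix(x2[1:120], ncol = 10, byrow = TRUE)
S  <- apply(X2, 1, sd)        # subgroup standard deviations

B3 <- 0.284    # tabulated for n = 10 (assumption)
B4 <- 1.716    # tabulated for n = 10 (assumption)

Sbar  <- mean(S)
LCL_S <- B3 * Sbar
UCL_S <- B4 * Sbar

round(c(LCL = LCL_S, CL = Sbar, UCL = UCL_S), 4)
which(S > UCL_S | S < LCL_S)  # subgroups with unusual spread
```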
The Individuals (I) chart is used when the subgroup size is 1: one observation at each time point.
Let \(X_t\) be the observation at time \(t\). If stable:
\[ X_t \sim N(\mu,\sigma^2). \]
Then:
\[ \mathrm{UCL}_I = \mu + 3\sigma,\quad \mathrm{CL}_I=\mu,\quad \mathrm{LCL}_I = \mu - 3\sigma. \]
But \(\sigma\) is unknown. We estimate it using moving ranges.
Define the moving range of length 2:
\[ MR_t = |X_t - X_{t-1}|,\qquad t=2,3,\dots \]
Let:
\[ \overline{MR} = \frac{1}{T-1}\sum_{t=2}^{T} MR_t. \]
A standard estimator is:
\[ \hat{\sigma} \approx \frac{\overline{MR}}{d_2}, \]
where \(d_2 = 1.128\) for moving range of size 2.
Then Individuals chart limits become:
\[ \mathrm{UCL}_I = \bar{X} + 3\hat{\sigma},\quad \mathrm{LCL}_I = \bar{X} - 3\hat{\sigma}. \]
The MR chart itself is an R-chart applied to subgroups of size 2 (or equivalently to the \(MR_t\) sequence).
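The Individuals and MR limits can be reproduced by hand from these formulas. The sketch below uses the airquality temperatures; \(d_2 = 1.128\) comes from the text, while \(D_4 = 3.267\) for subgroups of size 2 is a tabulated value entered as an assumption.

```r
# Individuals and moving-range limits by hand (airquality Temp, n = 1).
x  <- airquality$Temp
MR <- abs(diff(x))            # MR_t = |x_t - x_{t-1}|, t = 2, ..., T

MRbar     <- mean(MR)
sigma_hat <- MRbar / 1.128    # d2 for a moving range of size 2

CL_I  <- mean(x)
UCL_I <- CL_I + 3 * sigma_hat
LCL_I <- CL_I - 3 * sigma_hat

UCL_MR <- 3.267 * MRbar       # D4 for n = 2 (assumption); LCL_MR is 0

round(c(LCL = LCL_I, CL = CL_I, UCL = UCL_I), 4)
round(c(LCL = 0, CL = MRbar, UCL = UCL_MR), 4)
```

Because daily temperatures follow a seasonal trend, many individual points fall outside these limits; that reflects the trend rather than isolated special causes.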
Attribute charts are for defectives or defects.
At time \(t\), inspect \(n_t\) units and find \(D_t\) defectives:
\[ \hat{p}_t=\frac{D_t}{n_t}. \]
Assume Binomial model:
\[ D_t \sim \mathrm{Bin}(n_t,p). \]
Then:
\[ \mathbb{E}[\hat{p}_t]=p,\qquad \mathrm{Var}(\hat{p}_t)=\frac{p(1-p)}{n_t}. \]
So 3-sigma limits (using estimated \(\bar{p}\)):
\[ \mathrm{UCL}_{p,t}=\bar{p}+3\sqrt{\frac{\bar{p}(1-\bar{p})}{n_t}},\quad \mathrm{LCL}_{p,t}=\bar{p}-3\sqrt{\frac{\bar{p}(1-\bar{p})}{n_t}}. \]
Truncate at 0 and 1 if needed.
If \(n_t=n\) is constant, chart \(D_t\) directly:
\[ \mathbb{E}[D_t]=np,\qquad \mathrm{Var}(D_t)=np(1-p). \]
Limits:
\[ \mathrm{UCL}_{np}=n\bar{p}+3\sqrt{n\bar{p}(1-\bar{p})},\quad \mathrm{LCL}_{np}=n\bar{p}-3\sqrt{n\bar{p}(1-\bar{p})}. \]
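A short sketch of both sets of limits, using the batch size \(n = 12\) and the np center line \(n\bar{p} = 2.12\) reported for the ozone example. The variable names are illustrative, and truncation at 0 and 1 is applied as described above.

```r
# p and np chart limits for constant batch size n.
n     <- 12
p_bar <- 2.12 / n                      # overall fraction defective

# p chart: limits on the fraction defective
se_p  <- sqrt(p_bar * (1 - p_bar) / n)
p_lim <- c(LCL = max(0, p_bar - 3 * se_p),
           CL  = p_bar,
           UCL = min(1, p_bar + 3 * se_p))

# np chart: limits on the count of defectives
se_np  <- sqrt(n * p_bar * (1 - p_bar))
np_lim <- c(LCL = max(0, n * p_bar - 3 * se_np),
            CL  = n * p_bar,
            UCL = n * p_bar + 3 * se_np)

round(p_lim, 4)
round(np_lim, 4)
```

With constant \(n\), the np limits are simply the p limits multiplied by \(n\), so the two charts always flag the same batches.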
Let \(C_t\) be the number of defects on a constant-sized unit. Assume a Poisson model:
\[ C_t \sim \mathrm{Poisson}(\lambda),\quad \mathbb{E}[C_t]=\mathrm{Var}(C_t)=\lambda. \]
Estimate \(\lambda\) by:
\[ \bar{c}=\frac{1}{m}\sum_{t=1}^m C_t. \]
Limits:
\[ \mathrm{UCL}_c=\bar{c}+3\sqrt{\bar{c}},\quad \mathrm{CL}_c=\bar{c},\quad \mathrm{LCL}_c=\bar{c}-3\sqrt{\bar{c}}, \]
truncate LCL at 0.
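The c chart limits for the InsectSprays counts can be sketched in a few lines of base R; the variable names are illustrative.

```r
# c chart limits by hand for InsectSprays$count.
counts <- InsectSprays$count
c_bar  <- mean(counts)                      # estimate of lambda

UCL_c <- c_bar + 3 * sqrt(c_bar)
LCL_c <- max(0, c_bar - 3 * sqrt(c_bar))    # truncate at 0

round(c(LCL = LCL_c, CL = c_bar, UCL = UCL_c), 4)
ooc <- which(counts > UCL_c | counts < LCL_c)
data.frame(index = ooc, statistic = counts[ooc])
```

The many flagged points are expected here: the counts come from six different sprays, so the data do not follow a single Poisson distribution.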
If opportunity varies, let \(n_t\) be exposure/area/time, and:
\[ C_t \mid n_t \sim \mathrm{Poisson}(n_t\lambda). \]
Define:
\[ u_t=\frac{C_t}{n_t}. \]
Then:
\[ \mathbb{E}[u_t]=\lambda,\qquad \mathrm{Var}(u_t)=\frac{\lambda}{n_t}. \]
Estimate:
\[ \bar{u}=\frac{\sum C_t}{\sum n_t}. \]
Limits:
\[ \mathrm{UCL}_{u,t}=\bar{u}+3\sqrt{\frac{\bar{u}}{n_t}},\quad \mathrm{LCL}_{u,t}=\bar{u}-3\sqrt{\frac{\bar{u}}{n_t}}, \]
truncate at 0.
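Because \(n_t\) varies, each point has its own pair of limits. The following base-R sketch computes them for the Titanic data by class; the data frame `lim` and its column names are illustrative.

```r
# u chart limits by hand: deaths per person, one point per class.
tab    <- apply(Titanic, c(1, 4), sum)   # Class x Survived counts
deaths <- tab[, "No"]
total  <- rowSums(tab)

u     <- deaths / total
u_bar <- sum(deaths) / sum(total)        # pooled rate
half  <- 3 * sqrt(u_bar / total)         # half-width differs per class

lim <- data.frame(Class = rownames(tab),
                  u     = as.numeric(u),
                  LCL   = pmax(0, u_bar - half),
                  CL    = u_bar,
                  UCL   = u_bar + half)
lim$beyond <- lim$u > lim$UCL | lim$u < lim$LCL
lim
```

Larger groups get narrower limits, so a rate that is acceptable for a small class can signal for a large one.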
When you look at any chart:
For each chart: