Introduction

The purpose of this document is to outline a number of techniques for dealing with time series data in operations research. This document will cover the basics of control charts and capability ratios and indexes. This document will provide specific examples covered in the Georgia Tech MGMT-6203 course regarding the use of control charts in operations research. The contents included here are NOT part of a homework assignment, or exam, but rather just a “re-working” of the examples used in the course content. Georgia Tech is very much concerned about academic integrity and honesty. My goal is not to cheat, but rather the help others understand the lectures.

The original work was done by Bob Myers, Lecturer, Georgia Institute of Technology, MGMT-6203 Data Analytics for Business, Summer 2020.

Time series data

Time series data is a set of data that contains elements that are organized and sorted according to a time index.

In dealing with time series data for control charts, we often have data that consists of multiple samples taken at the same time. Consider the following data that measures the diameter of mouth of bottles coming off an assembly line. The mouth of these bottles needs to be consistent and accurate because a bottling process that follows will not work otherwise.

rm(list=ls())
bottles = data.frame(n=1,a=.604,b=.612,c=.588,d=.600)
bottles <- rbind(bottles, data.frame(n=2,a=.597,b=.601,c=.607,d=.603))
bottles <- rbind(bottles, data.frame(n=3,a=.581,b=.570,c=.585,d=.592))
bottles <- rbind(bottles, data.frame(n=4,a=.620,b=.605,c=.595,d=.588))
bottles <- rbind(bottles, data.frame(n=5,a=.590,b=.614,c=.608,d=.604))
bottles <- rbind(bottles, data.frame(n=6,a=.585,b=.583,c=.617,d=.579))

bottles
##   n     a     b     c     d
## 1 1 0.604 0.612 0.588 0.600
## 2 2 0.597 0.601 0.607 0.603
## 3 3 0.581 0.570 0.585 0.592
## 4 4 0.620 0.605 0.595 0.588
## 5 5 0.590 0.614 0.608 0.604
## 6 6 0.585 0.583 0.617 0.579

To analyze this data, we will develop a control chart. A control chart will give us a graphical way to determine if the output of the bottles is staying with statistical control, or if things are starting to get off track. The first step in this process is to calculate the \(R\) value for each of the rows. The \(R\) value is defined as follows:

\[R_i=max(S_i)-min(S_i)\]

Where S is a list of all the samples in row i.

We will also create a column called X-bar that will be the mean of the samples.

\[\bar{X}=mean(X)\]

get_r <- function(x){
  max(x)-min(x)
}

get_xb <- function(x){
  mean(x)
}

bottles$R <- apply(bottles[2:5], 1, get_r)
bottles$Xb <- apply(bottles[2:5],1, get_xb)
bottles
##   n     a     b     c     d     R    Xb
## 1 1 0.604 0.612 0.588 0.600 0.024 0.601
## 2 2 0.597 0.601 0.607 0.603 0.010 0.602
## 3 3 0.581 0.570 0.585 0.592 0.022 0.582
## 4 4 0.620 0.605 0.595 0.588 0.032 0.602
## 5 5 0.590 0.614 0.608 0.604 0.024 0.604
## 6 6 0.585 0.583 0.617 0.579 0.038 0.591

The next step is to calculate the mean of the R values. This value is referred to as “R bar” and represented with the symbol:

\[\bar{R}=mean(R)\]

Rb = mean(bottles$R)
print(Rb)
## [1] 0.025

Next we compute the upper and lower control units based on the following formula:

\[ UCL_r=D_4* \bar{R} \] \[ LCL_R=D_3* \bar{R} \]

Where D3 and D4 come from the following table, which is given based on the statistical level of control desired.

Sample Size \(D_4\) (Upper) \(D_3\) (Lower)
2 3.268 0
3 2.574 0
4 2.282 0
5 2.115 0
6 2.004 0
7 2.924 .0767
8 1.864 0.136

Based on the above we have:

# Note that N is our sample size, which is 4.
D3= 0
D4=2.282
UCLr = D4*Rb
LCLr = D3*Rb

Next we add these values to our data frame, and this allows us to plot the R control chart.

bottles$UCLr <- rep(UCLr,nrow(bottles))
bottles$LCLr <- rep(LCLr,nrow(bottles))
bottles
##   n     a     b     c     d     R    Xb    UCLr LCLr
## 1 1 0.604 0.612 0.588 0.600 0.024 0.601 0.05705    0
## 2 2 0.597 0.601 0.607 0.603 0.010 0.602 0.05705    0
## 3 3 0.581 0.570 0.585 0.592 0.022 0.582 0.05705    0
## 4 4 0.620 0.605 0.595 0.588 0.032 0.602 0.05705    0
## 5 5 0.590 0.614 0.608 0.604 0.024 0.604 0.05705    0
## 6 6 0.585 0.583 0.617 0.579 0.038 0.591 0.05705    0

We can now plot our R chart.

library(ggplot2)
ggplot(bottles) + geom_line(aes(x=n, y=R)) +
  geom_line(aes(x=n, y=UCLr)) +
  geom_line(aes(x=n, y=LCLr)) +
  ylim(-0.05, .10)

Next, we can create a x-bar chart. Similar to the R chart, we have two formulas and a table. The formulas are as follows:

\[ UCLx = \overline{\overline{X}}+A_2*\bar{R}\] \[ LCLx = \overline{\overline{X}}-A_2*\bar{R}\]

Where X “double bar” is the mean of all the samples, and \(A2\) comes from the following table, again based on sample size:

Sample Size \(A_2\) (Mean Factor)
2 1.880
3 1.023
4 .729
5 .577
6 .483
7 .419
8 .373

Doing the calculations in R, we have the following:

Xbb = mean(as.matrix(bottles[,2:5]))
# Based on sample size of n=4

A2 = 0.729
UCLx = Xbb + A2 * Rb
LCLx = Xbb - A2 * Rb

Adding these to the data frame, along with the mean of the samples and graphing we have:

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching packages ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ tibble  3.0.1     ✓ purrr   0.3.4
## ✓ tidyr   1.1.0     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## ── Conflicts ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
bottles$UCLx <- rep(UCLx,nrow(bottles))
bottles$LCLx <- rep(LCLx,nrow(bottles))
ggplot(bottles) + geom_line(aes(x=n, y=Xb)) +
  geom_line(aes(x=n, y=UCLx)) +
  geom_line(aes(x=n, y=LCLx)) +
  ylim(.55,.65)

Based on the above, since the mean of the samples stays within the upper and lower control limits, we can say that the process is in control.

Process Capability Ratio (Cp) and Index (Cpk)

The Process Capability Index, or Cp and Cpk provide a way to determine if a process is actually capable of meeting a desired specification. In many cases, manufactured parts are often given design tolerances (for example 15 inches +/- .5). The ratio measures the spread of a process (not how well it is centered on the target value.)

The formula for the Cp ratio is given by

\[C_p = \frac{Upper-Lower}{6\sigma}\]

The Process Capability Index, or Cpk, looks at the proportion of variation between the center of the process and the nearest specification limit.

\[C_{pk}=min[(S_u-\overline{\overline{X}})/3\sigma, (\overline{\overline{X}}-S_l)/3\sigma] \]

Where \(S_u\) is the upper specification and \(S_l\) is the lower specification.

The calculations for these are shown below in R code:

\(C_{pk} = 1\) means the process meets specifications
\(C_{pk} > 1\) means the process is better than the specifications
\(C_{pk} < 1\) means the process does NOT meet the specifications

The following R code illustrates these calculations:

Given \(S_u=0.650\) \(S_L=0.550\) \(\sigma=.012\)

su=0.650
sl=0.550
sd=0.012
cp = (su-sl)/(6*sd)
cpk = min((su-Xbb)/(3*sd), (Xbb-sl)/(3*sd))
cap <- data.frame(cp=format(cp,scientific = F), cpk=cpk)
cap
##         cp      cpk
## 1 1.388889 1.305556

Therefore, the process is capable, and exceeds the specifications.

This concludes this write-up on control charts.