Here I will demonstrate how to produce Contol Charts using the qicharts package. I’ll start by modifying and exploring the data using the ggplot2 package then I’ll get into how to quickly produce control charts.
# Tools

For this demo I will use a series of data manipulation and visualiztion tools in addition to a useful helper function

Packages

  • readxl
  • ggplot2
  • tidyverse
  • lubridate
  • dplyr
  • qicharts
library(readxl) # Contains read_excel function for importing data
library(ggplot2) # General package for data visualization
library(tidyverse) # Group of packages for data manipulation
library(lubridate) # Date manipulation
library(qicharts) # Control Charts package
library(dplyr) # Data Manipulation

Helper Function

We use the multiplot function, courtesy of R Cookbooks to create multi-panel plots.

# Multiple plot function
#
# ggplot objects can be passed in ..., or to plotlist (as a list of ggplot objects)
# - cols:   Number of columns in layout
# - layout: A matrix specifying the layout. If present, 'cols' is ignored.
#
# If the layout is something like matrix(c(1,2,3,3), nrow=2, byrow=TRUE),
# then plot 1 will go in the upper left, 2 will go in the upper right, and
# 3 will go all the way across the bottom.
#
multiplot <- function(..., plotlist=NULL, file, cols=1, layout=NULL) {
  library(grid)
  # Make a list from the ... arguments and plotlist
  plots <- c(list(...), plotlist)
  numPlots = length(plots)
  # If layout is NULL, then use 'cols' to determine layout
  if (is.null(layout)) {
    # Make the panel
    # ncol: Number of columns of plots
    # nrow: Number of rows needed, calculated from # of cols
    layout <- matrix(seq(1, cols * ceiling(numPlots/cols)),
                    ncol = cols, nrow = ceiling(numPlots/cols))
  }
 if (numPlots==1) {
    print(plots[[1]])
  } else {
    # Set up the page
    grid.newpage()
    pushViewport(viewport(layout = grid.layout(nrow(layout), ncol(layout))))
    # Make each plot, in the correct location
    for (i in 1:numPlots) {
      # Get the i,j matrix positions of the regions that contain this subplot
      matchidx <- as.data.frame(which(layout == i, arr.ind = TRUE))
      print(plots[[i]], vp = viewport(layout.pos.row = matchidx$row,
                                      layout.pos.col = matchidx$col))
    }
  }
}

Data Exploration

Loading Data

Here we load the golden_gate.xlsx file found here.

We can get a feel for the contents of the file using the glimpse, first, and last functions

golden_gate <- read_excel(paste(path, "Golden Gate Bridge traffic TS//golden_gate.xlsx", sep = ''))
glimpse(golden_gate)
Observations: 168
Variables: 3
$ Row     <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 5...
$ Month   <dttm> 1968-01-01, 1968-02-01, 1968-03-01, 1968-04-01, 1968-05-01, 1968-06-01, 1968-07-01, 1968-08-01, 1968-09-01, 1968-10-01, 1968-11-01, 1968-12-01, 1969-01-01, 1969-02-01, 1969-03-01, 1969-04-01, 1969-05-01, 1969-06-01,...
$ Traffic <dbl> 73637, 77136, 81481, 84127, 84562, 91959, 94174, 96087, 88952, 83479, 80814, 77466, 75225, 79418, 84813, 85691, 8749, 92995, 95375, 98396, 92791, 88018, 86899, 83636, 79245, 85536, 89313, 88785, 91307, 96394, 99864, ...
first(golden_gate$Month)
[1] "1968-01-01 UTC"
last(golden_gate$Month)
[1] "1981-12-01 UTC"

We see that the dataset contains three columns.
- The first is an arbitrary row number column.
- The second is the month for which the data was taken. The data appears to span from January 1968 (1968-01-01) to December 1981 (1981-12-01).
- The third column is then the number of cars that travelled across the Golden Gate bridge in the specified month.

Quick Traffic Exploration

Summary of the Traffic Feature

summary(golden_gate$Traffic)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1017   87909   93769   88012   98906  111475 
sd(golden_gate$Traffic)
[1] 23306.58

Here we see that an average day saw 88,012 cars cross the Golden Gate bridge. The slowest day saw only 1,017 cars while the peak observation saw a total of 111,475 cars. With a standard deviation of 23306.58, we should expect to find a few outliers to towards the bottom.

p1 <- ggplot(golden_gate, aes(x = Traffic)) +
  geom_density(fill = 'lightblue', alpha = .6) +
  geom_vline(xintercept = mean(golden_gate$Traffic), col = 'red', lty = 2) +
  theme_minimal() +
  labs(y = 'Density') +
  theme(axis.text.y = element_blank(),
        axis.title.x = element_blank(),
        axis.text.x  = element_blank())
p2 <- ggplot(golden_gate, aes(x = 1, y = Traffic)) +
  geom_boxplot()+
  coord_flip() +
  xlab( 'Boxplot') +
  theme_minimal() +
  theme(axis.text.y = element_blank())
layout <- matrix(c(1,1,1,2),4,1, byrow = T)
multiplot(p1, p2, layout = layout)

Viewing the density plot shows a very stongly bi-modal dataset. With a total of 1237 observations in the lower mode, there should be sufficient data to properly use a control chart.

I’ll quickly visualize traffic data in comparison to dates in order to get a better feel of how traffic changes over time.

p1 <- ggplot(golden_gate, aes(x = Month, y = Traffic))+
  geom_line(col = 'midnightblue') +
  theme_minimal() +
  theme(axis.title.x = element_blank()) +
  geom_smooth()
p2 <- golden_gate %>%
  ggplot(aes(Traffic)) +
  geom_histogram(fill = 'blue', bins = 30) +
  theme(axis.text.x= element_text(angle = 90))
p3 <- golden_gate %>%
  mutate(wday = wday(Month, label = T)) %>%
  group_by(wday) %>%
  summarize(DTraffic = median(Traffic)) %>%
  ggplot(aes(wday, DTraffic, fill = wday)) +
  geom_col() +
  labs( y = 'Mean Traffic', x = 'Weekday') +
  theme_minimal() +
  theme(legend.position = 'none', axis.text.x = element_text(angle= 90, hjust = 1, vjust = 0.9)) 
p4 <- golden_gate %>%
  mutate(month = month(Month , label = T)) %>%
  group_by(month) %>%
  summarize(MTraffic = median(Traffic), medianTraffic = median(Traffic)) %>%
  ggplot(aes(month, MTraffic, fill = month)) +
  geom_col() +
  labs(x = 'Month', y = '')+
  theme_minimal() +
  theme(legend.position = 'none', axis.text.x = element_text(angle = 90, hjust = 1, vjust = .5))
layout <- matrix(c(1,1,1,1,2,3,4,4), nrow = 2 ,byrow = TRUE)
multiplot(p1, p2, p3, p4, layout = layout)

From the line graph, it appears there is a slight seasonality in addition to some datapoints that may have some error in collection. Dealing with those errors is outside the scope of this write-up. Traffic seems fairly consistent across weekdays with Wednesday and Thursday seeming to be the slowest days. The seasonality posit seems to be true looking at the median traffic by month. Late summer sees the most traffic with a dip during the winter months.

To make our qicharts demo a little easier to read, lets create a new object ‘gg_no_out’ in which we remove the extreme values.

gg_no_out <- golden_gate %>%
  filter(golden_gate$Traffic > 30000)
dim(gg_no_out)
[1] 156   3

This leaves us with 156 of our original 168 observations. Lets quickly reproduce the above charts using the new set of data

p1 <- ggplot(gg_no_out, aes(x = Month, y = Traffic))+
  geom_line(col = 'midnightblue') +
  theme_minimal() +
  theme(axis.title.x = element_blank()) +
  geom_smooth()
p2 <- gg_no_out %>%
  ggplot(aes(Traffic)) +
  geom_histogram(fill = 'blue', bins = 30) +
  theme(axis.text.x= element_text(angle = 90))
p3 <- gg_no_out %>%
  mutate(wday = wday(Month, label = T)) %>%
  group_by(wday) %>%
  summarize(DTraffic = median(Traffic)) %>%
  ggplot(aes(wday, DTraffic, fill = wday)) +
  geom_col() +
  labs( y = 'Mean Traffic', x = 'Weekday') +
  theme_minimal() +
  theme(legend.position = 'none', axis.text.x = element_text(angle= 90, hjust = 1, vjust = 0.9))
p4 <- gg_no_out %>%
  mutate(month = month(Month , label = T)) %>%
  group_by(month) %>%
  summarize(MTraffic = median(Traffic), medianTraffic = median(Traffic)) %>%
  ggplot(aes(month, MTraffic, fill = month)) +
  geom_col() +
  labs(x = 'Month', y = '')+
  theme_minimal() +
  theme(legend.position = 'none', axis.text.x = element_text(angle = 90, hjust = 1, vjust = .5))
layout <- matrix(c(1,1,1,1,2,3,4,4),nrow = 2, byrow = T)
multiplot(p1, p2, p3, p4, layout = layout)

The seasonality appears to be even stronger and the smoothing curve shows a much stronger positive trend.

All this being said, it seems we have a sufficient dataset for creating a Control Chart.

qicharts

qic Introduction

The qic() function within qicharts2 makes generating control charts very easy. The default control chart is a simple run chart and can be generated by simply passing the Traffic column to the qic function

qic(y = gg_no_out$Traffic )

This can be used to see the general trend but doesn’t contain the control limits. To put a little more information on the chart, simply specify “chart = ‘i’” in the qic function. This will place upper/lower limit lines as well as highlight points that fall outside of the 3-sigma confidence interval.

i Charts

qic(y = gg_no_out$Traffic, x = gg_no_out$Month,chart = 'i')

As you can see, this generates a beautiful control chart! I have additionally specified the x axis to contain date information.

One limitation to the qic function is that it does not allow the user to specify how many observations to use. For instance you cannot natively just select the most recent 30 instances of the observation. Lucky for us, R is very powerful and this effect can be achieved in just one line.

gg_no_out.recent <- tail(gg_no_out, n = 30)
q <-qic(y = gg_no_out.recent$Traffic, x = gg_no_out.recent$Month, chart = 'i')

Accessing Signal Points

Lastly we may want to investigate the points that fall outside of the control limits. We can do that by subsetting using the $signals element of the qic object.

signal <- q$signals
gg_no_out.recent[signal , ]

Implementation

Using the qic function would easily enough be generalized to be applied to truck MPG or some other metric through rstat if we can figure out how to get it installed properly. This is far from the end of qicharts2’s usefulness. Additional information can be found https://cran.r-project.org/web/packages/qicharts/vignettes/controlcharts.html

---
title: "Control Charts with qicharts"
author: "Orrin Wheeler"
date: "1/25/18"
output:
  html_document: default
  html_notebook:
    code_folding: hide
    theme: flatly
    toc: yes
    toc_float: yes
  pdf_document: default
---

Here I will demonstrate how to produce Contol Charts using the qicharts package.  I'll start by modifying and exploring the data using the ggplot2 package then I'll get into how to quickly produce control charts.  
# Tools

For this demo I will use a series of data manipulation and visualiztion tools in addition to a useful helper function

### Packages

- readxl
- ggplot2
- tidyverse
- lubridate
- dplyr
- qicharts
```{r loadPackage ,warning=F, message=F}
library(readxl) # Contains read_excel function for importing data
library(ggplot2) # General package for data visualization
library(tidyverse) # Group of packages for data manipulation
library(lubridate) # Date manipulation
library(qicharts) # Control Charts package
library(dplyr) # Data Manipulation
```

### Helper Function
We use the *multiplot* function, courtesy of [R Cookbooks](http://www.cookbook-r.com/Graphs/Multiple_graphs_on_one_page_(ggplot2)/) to create multi-panel plots. 

```{r helperFunction}
# Multiple plot function
#
# ggplot objects can be passed in ..., or to plotlist (as a list of ggplot objects)
# - cols:   Number of columns in layout
# - layout: A matrix specifying the layout. If present, 'cols' is ignored.
#
# If the layout is something like matrix(c(1,2,3,3), nrow=2, byrow=TRUE),
# then plot 1 will go in the upper left, 2 will go in the upper right, and
# 3 will go all the way across the bottom.
#
multiplot <- function(..., plotlist=NULL, file, cols=1, layout=NULL) {
  library(grid)

  # Make a list from the ... arguments and plotlist
  plots <- c(list(...), plotlist)

  numPlots = length(plots)

  # If layout is NULL, then use 'cols' to determine layout
  if (is.null(layout)) {
    # Make the panel
    # ncol: Number of columns of plots
    # nrow: Number of rows needed, calculated from # of cols
    layout <- matrix(seq(1, cols * ceiling(numPlots/cols)),
                    ncol = cols, nrow = ceiling(numPlots/cols))
  }

 if (numPlots==1) {
    print(plots[[1]])

  } else {
    # Set up the page
    grid.newpage()
    pushViewport(viewport(layout = grid.layout(nrow(layout), ncol(layout))))

    # Make each plot, in the correct location
    for (i in 1:numPlots) {
      # Get the i,j matrix positions of the regions that contain this subplot
      matchidx <- as.data.frame(which(layout == i, arr.ind = TRUE))

      print(plots[[i]], vp = viewport(layout.pos.row = matchidx$row,
                                      layout.pos.col = matchidx$col))
    }
  }
}
```



# Data Exploration

### Loading Data
Here we load the golden_gate.xlsx file found [here](http://www.statpoint.net/default.aspx).

We can get a feel for the contents of the file using the glimpse, first, and last functions
```{r setPath ,echo = F}
path <- 'C://Users//e085724//Desktop//R//'
```

```{r loadData}
golden_gate <- read_excel(paste(path, "Golden Gate Bridge traffic TS//golden_gate.xlsx", sep = ''))
glimpse(golden_gate)
first(golden_gate$Month)
last(golden_gate$Month)
```
```{r convertToLubridateDate ,echo = F}
golden_gate$Month <- ymd(golden_gate$Month)
```


We see that the dataset contains three columns.  
- The first is an arbitrary row number column.  
- The second is the month for which the data was taken.  The data appears to span from January 1968 (**`r first(golden_gate$Month)`**) to December 1981 (**`r last(golden_gate$Month)`**).  
- The third column is then the number of cars that travelled across the Golden Gate bridge in the specified month.  


### Quick Traffic Exploration

Summary of the Traffic Feature
```{r trafficSummary}
summary(golden_gate$Traffic)
sd(golden_gate$Traffic)
```



Here we see that an average day saw 88,012 cars cross the Golden Gate bridge.  The slowest day saw only 1,017 cars while the peak observation saw a total of 111,475 cars.  With a standard deviation of 23306.58, we should expect to find a few outliers to towards the bottom.  
```{r trafficDensityBoxplot}
p1 <- ggplot(golden_gate, aes(x = Traffic)) +
  geom_density(fill = 'lightblue', alpha = .6) +
  geom_vline(xintercept = mean(golden_gate$Traffic), col = 'red', lty = 2) +
  theme_minimal() +
  labs(y = 'Density') +
  theme(axis.text.y = element_blank(),
        axis.title.x = element_blank(),
        axis.text.x  = element_blank())

p2 <- ggplot(golden_gate, aes(x = 1, y = Traffic)) +
  geom_boxplot()+
  coord_flip() +
  xlab( 'Boxplot') +
  theme_minimal() +
  theme(axis.text.y = element_blank())

layout <- matrix(c(1,1,1,2),4,1, byrow = T)
multiplot(p1, p2, layout = layout)
```

Viewing the density plot shows a very stongly bi-modal dataset.  With a total of `r sum(which(golden_gate$Traffic < 30000)) ` observations in the lower mode, there should be sufficient data to properly use a control chart.


I'll quickly visualize traffic data in comparison to dates in order to get a better feel of how traffic changes over time.

```{r trafficBreakdown}
p1 <- ggplot(golden_gate, aes(x = Month, y = Traffic))+
  geom_line(col = 'midnightblue') +
  theme_minimal() +
  theme(axis.title.x = element_blank()) +
  geom_smooth()

p2 <- golden_gate %>%
  ggplot(aes(Traffic)) +
  geom_histogram(fill = 'blue', bins = 30) +
  theme(axis.text.x= element_text(angle = 90))

p3 <- golden_gate %>%
  mutate(wday = wday(Month, label = T)) %>%
  group_by(wday) %>%
  summarize(DTraffic = median(Traffic)) %>%
  ggplot(aes(wday, DTraffic, fill = wday)) +
  geom_col() +
  labs( y = 'Mean Traffic', x = 'Weekday') +
  theme_minimal() +
  theme(legend.position = 'none', axis.text.x = element_text(angle= 90, hjust = 1, vjust = 0.9)) 

p4 <- golden_gate %>%
  mutate(month = month(Month , label = T)) %>%
  group_by(month) %>%
  summarize(MTraffic = median(Traffic), medianTraffic = median(Traffic)) %>%
  ggplot(aes(month, MTraffic, fill = month)) +
  geom_col() +
  labs(x = 'Month', y = '')+
  theme_minimal() +
  theme(legend.position = 'none', axis.text.x = element_text(angle = 90, hjust = 1, vjust = .5))

layout <- matrix(c(1,1,1,1,2,3,4,4), nrow = 2 ,byrow = TRUE)
multiplot(p1, p2, p3, p4, layout = layout)
```

From the line graph, it appears there is a slight seasonality in addition to some datapoints that may have some error in collection.  Dealing with those errors is outside the scope of this write-up. Traffic seems fairly consistent across weekdays with Wednesday and Thursday seeming to be the slowest days. The seasonality posit seems to be true looking at the median traffic by month. Late summer sees the most traffic with a dip during the winter months. 

To make our qicharts demo a little easier to read, lets create a new object 'gg_no_out' in which we remove the extreme values.

```{r creategg_no_out}
gg_no_out <- golden_gate %>%
  filter(golden_gate$Traffic > 30000)
dim(gg_no_out)
```
This leaves us with 156 of our original 168 observations.  Lets quickly reproduce the above charts using the new set of data
```{r trafficBreakdowngg_no_out, warning=T, message=T}
p1 <- ggplot(gg_no_out, aes(x = Month, y = Traffic))+
  geom_line(col = 'midnightblue') +
  theme_minimal() +
  theme(axis.title.x = element_blank()) +
  geom_smooth()

p2 <- gg_no_out %>%
  ggplot(aes(Traffic)) +
  geom_histogram(fill = 'blue', bins = 30) +
  theme(axis.text.x= element_text(angle = 90))

p3 <- gg_no_out %>%
  mutate(wday = wday(Month, label = T)) %>%
  group_by(wday) %>%
  summarize(DTraffic = median(Traffic)) %>%
  ggplot(aes(wday, DTraffic, fill = wday)) +
  geom_col() +
  labs( y = 'Mean Traffic', x = 'Weekday') +
  theme_minimal() +
  theme(legend.position = 'none', axis.text.x = element_text(angle= 90, hjust = 1, vjust = 0.9))

p4 <- gg_no_out %>%
  mutate(month = month(Month , label = T)) %>%
  group_by(month) %>%
  summarize(MTraffic = median(Traffic), medianTraffic = median(Traffic)) %>%
  ggplot(aes(month, MTraffic, fill = month)) +
  geom_col() +
  labs(x = 'Month', y = '')+
  theme_minimal() +
  theme(legend.position = 'none', axis.text.x = element_text(angle = 90, hjust = 1, vjust = .5))



layout <- matrix(c(1,1,1,1,2,3,4,4),nrow = 2, byrow = T)
multiplot(p1, p2, p3, p4, layout = layout)
```

The seasonality appears to be even stronger and the smoothing curve shows a much stronger positive trend. 

All this being said, it seems we have a sufficient dataset for creating a Control Chart.



# qicharts

### qic Introduction

The qic() function within qicharts2 makes generating control charts very easy.  The default control chart is a simple run chart and can be generated by simply passing the Traffic column to the qic function

```{r qic}
qic(y = gg_no_out$Traffic )
```

This can be used to see the general trend but doesn't contain the control limits.  To put a little more information on the chart, simply specify "chart = 'i'" in the qic function.  This will place upper/lower limit lines as well as highlight points that fall outside of the 3-sigma confidence interval.

### i Charts

```{r qicIChart}
qic(y = gg_no_out$Traffic, x = gg_no_out$Month,chart = 'i')
```

As you can see, this generates a beautiful control chart!  I have additionally specified the x axis to contain date information.  

One limitation to the qic function is that it does not allow the user to specify how many observations to use.  For instance you cannot natively just select the most recent 30 instances of the observation.  Lucky for us, R is very powerful and this effect can be achieved in just one line.  


```{r recentQicIChart}
gg_no_out.recent <- tail(gg_no_out, n = 30)
q <-qic(y = gg_no_out.recent$Traffic, x = gg_no_out.recent$Month, chart = 'i')
```


### Accessing Signal Points

Lastly we may want to investigate the points that fall outside of the control limits.  We can do that by subsetting using the $signals element of the qic object.

```{r}
signal <- q$signals
gg_no_out.recent[signal , ]
```


# Implementation
Using the qic function would easily enough be generalized to be applied to truck MPG or some other metric through rstat if we can figure out how to get it installed properly.  This is far from the end of qicharts2's usefulness.  Additional information can be found https://cran.r-project.org/web/packages/qicharts/vignettes/controlcharts.html 

