Finding Changes in your data

Author: Carlos Jorge

Looking at historical values (mean) to look for changes by comparing to current values in a set of data points is the purpose of this paper.

Western Electric Company was looking for a way to find problematic electrical lines and came up with a set of rules now known as Statistical Quality Control handbook.

Thanks to the cran-r-project and their work developing the “qcc” R library you can apply these same set of rules to your data to see if you can see changes in your realtime or present data.

We will first look at randomly creating a dataset of historical values with a mean of 100 by replicating it 400 times using the rep() function. We then use the rnorm() function to create 500 values to this repeated set of numbers to add noise to the original data. rnorm(400) generates 400 random numbers centered around zero. The function basically generates a basic normal curve which is then sent to variable x. A histogram will show us the normal curve with a mean close to 100.0 representing historical data.

library(qcc)
package 㤼㸱qcc㤼㸲 was built under R version 3.5.3  __ _  ___ ___ 
 / _  |/ __/ __|  Quality Control Charts and 
| (_| | (_| (__   Statistical Process Control
 \__  |\___\___|
    |_|           version 2.7
Type 'citation("qcc")' for citing this R package in publications.
x <- rep(100,400) + rnorm(400)
mean(x)
[1] 100.0893
hist(x)

We now create a new set of data points , representing current data , by again randomly creating 25 new data points with a mean of 101 which the library(qcc) should flag.

xx <- rep(101,25) + rnorm(25)
mean(xx)
[1] 100.9047

Now we can compare historical data with the current data using the qcc library.

qcc(x,newdata = xx,type = "xbar.one")
List of 15
 $ call        : language qcc(data = x, type = "xbar.one", newdata = xx)
 $ type        : chr "xbar.one"
 $ data.name   : chr "x"
 $ data        : num [1:400, 1] 99.3 98.2 98.6 97.9 100.6 ...
  ..- attr(*, "dimnames")=List of 2
 $ statistics  : Named num [1:400] 99.3 98.2 98.6 97.9 100.6 ...
  ..- attr(*, "names")= chr [1:400] "1" "2" "3" "4" ...
 $ sizes       : int [1:400] 1 1 1 1 1 1 1 1 1 1 ...
 $ center      : num 100
 $ std.dev     : num 1.05
 $ newstats    : Named num [1:25] 102 101 103 100 101 ...
  ..- attr(*, "names")= chr [1:25] "401" "402" "403" "404" ...
 $ newdata     : num [1:25, 1] 102 101 103 100 101 ...
 $ newsizes    : int [1:25] 1 1 1 1 1 1 1 1 1 1 ...
 $ newdata.name: chr "xx"
 $ nsigmas     : num 3
 $ limits      : num [1, 1:2] 96.9 103.2
  ..- attr(*, "dimnames")=List of 2
 $ violations  :List of 2
 - attr(*, "class")= chr "qcc"

Because the current data’s mean was higher, the qcc library flagged UCL . The graph is read first by looking at the top most labels showing Calibration data or “Historical Data”, next is the New data or “Current Data” . Current data points are flagged and show to be above the mean and your presumption is that your data is changing and could be viewed as a change in values.

We will now Add warning limits at 2 std deviations

q1 <- qcc(x,newdata=xx, type="xbar.one",plot=FALSE)
(warn.limits =limits.xbar(q1$center,q1$std.dev, q1$sizes,2))
      LCL      UCL
 97.99358 102.1849

Plotting

plot(q1, restore.par = FALSE)
abline(h = warn.limits, lty = 3, col = "red")

LS0tDQp0aXRsZTogIlN0YXRpc3RpY2FsIFF1YWxpdHkgQ29udHJvbCINCm91dHB1dDogaHRtbF9ub3RlYm9vaw0KLS0tDQojIyBGaW5kaW5nIENoYW5nZXMgaW4geW91ciBkYXRhDQpBdXRob3I6IENhcmxvcyBKb3JnZQ0KDQpMb29raW5nIGF0IGhpc3RvcmljYWwgdmFsdWVzIChtZWFuKSAgdG8gbG9vayBmb3IgY2hhbmdlcyBieSBjb21wYXJpbmcgdG8gY3VycmVudCB2YWx1ZXMgaW4gYSBzZXQgb2YgZGF0YSBwb2ludHMgaXMgdGhlIHB1cnBvc2Ugb2YgdGhpcyBwYXBlci4NCg0KV2VzdGVybiBFbGVjdHJpYyBDb21wYW55IHdhcyBsb29raW5nIGZvciBhIHdheSB0byBmaW5kIHByb2JsZW1hdGljIGVsZWN0cmljYWwgbGluZXMgYW5kIGNhbWUgdXAgd2l0aCBhIHNldCBvZiBydWxlcyBub3cga25vd24gYXMgW1N0YXRpc3RpY2FsIFF1YWxpdHkgQ29udHJvbF0oaHR0cHM6Ly93d3cud2VzdGVybmVsZWN0cmljLmNvbS9zdXBwb3J0LXN0YXRpc3RpY2FsLXF1YWxpdHktY29udHJvbC1oYW5kYm9vay5odG1sKSBoYW5kYm9vay4gDQoNClRoYW5rcyB0byB0aGUgW2NyYW4tci1wcm9qZWN0XShodHRwczovL2NyYW4uci1wcm9qZWN0Lm9yZy93ZWIvcGFja2FnZXMvcWNjL3ZpZ25ldHRlcy9xY2NfYV9xdWlja190b3VyLmh0bWwpIGFuZCB0aGVpciB3b3JrIGRldmVsb3BpbmcgdGhlICAicWNjIiBSIGxpYnJhcnkgeW91IGNhbiBhcHBseSB0aGVzZSBzYW1lIHNldCBvZiBydWxlcyB0byB5b3VyIGRhdGEgdG8gc2VlIGlmIHlvdSBjYW4gc2VlIGNoYW5nZXMgaW4geW91ciByZWFsdGltZSBvciBwcmVzZW50IGRhdGEuDQoNCldlIHdpbGwgZmlyc3QgbG9vayBhdCByYW5kb21seSBjcmVhdGluZyBhIGRhdGFzZXQgb2YgaGlzdG9yaWNhbCB2YWx1ZXMgd2l0aCBhIG1lYW4gb2YgMTAwIGJ5IHJlcGxpY2F0aW5nIGl0IDQwMCB0aW1lcyB1c2luZyB0aGUgW3JlcCgpXShodHRwczovL3d3dy5yZG9jdW1lbnRhdGlvbi5vcmcvcGFja2FnZXMvYmFzZS92ZXJzaW9ucy8zLjYuMC90b3BpY3MvcmVwKSBmdW5jdGlvbi4gDQpXZSB0aGVuIHVzZSB0aGUgW3Jub3JtKCldKGh0dHBzOi8vd3d3LnJkb2N1bWVudGF0aW9uLm9yZy9wYWNrYWdlcy9jb21wb3NpdGlvbnMvdmVyc2lvbnMvMS40MC0yL3RvcGljcy9ybm9ybSkgZnVuY3Rpb24gdG8gY3JlYXRlIDUwMCB2YWx1ZXMgdG8gdGhpcyByZXBlYXRlZCBzZXQgb2YgbnVtYmVycyB0byBhZGQgbm9pc2UgdG8gdGhlIG9yaWdpbmFsIGRhdGEuIA0Kcm5vcm0oNDAwKSBnZW5lcmF0ZXMgNDAwIHJhbmRvbSBudW1iZXJzIGNlbnRlcmVkIGFyb3VuZCB6ZXJvLiBUaGUgZnVuY3Rpb24gYmFzaWNhbGx5IGdlbmVyYXRlcyBhIGJhc2ljIG5vcm1hbCBjdXJ2ZSB3aGljaCBpcyB0aGVuIHNlbnQgdG8gdmFyaWFibGUgeC4NCkEgaGlzdG9ncmFtIHdpbGwgc2hvdyB1cyB0aGUgbm9ybWFsIGN1cnZlIHdpdGggYSBtZWFuIGNsb3NlIHRvIDEwMC4wIHJlcHJlc2VudGluZyBoaXN0b3JpY2FsIGRhdGEuDQoNCmBgYHtyfQ0KbGlicmFyeShxY2MpDQp4IDwtIHJlcCgxMDAsNDAwKSArIHJub3JtKDQwMCkNCm1lYW4oeCkNCmhpc3QoeCkNCmBgYA0KDQpXZSBub3cgY3JlYXRlIGEgbmV3IHNldCBvZiBkYXRhIHBvaW50cyAsIHJlcHJlc2VudGluZyBjdXJyZW50IGRhdGEgLCBieSBhZ2FpbiByYW5kb21seSBjcmVhdGluZyAyNSBuZXcgZGF0YSBwb2ludHMgd2l0aCBhIG1lYW4gb2YgMTAxIHdoaWNoIHRoZSBsaWJyYXJ5KHFjYykgc2hvdWxkIGZsYWcuDQoNCmBgYHtyfQ0KeHggPC0gcmVwKDEwMSwyNSkgKyBybm9ybSgyNSkNCm1lYW4oeHgpDQoNCmBgYA0KDQoNCk5vdyB3ZSBjYW4gY29tcGFyZSBoaXN0b3JpY2FsIGRhdGEgd2l0aCB0aGUgY3VycmVudCBkYXRhIHVzaW5nIHRoZSBxY2MgbGlicmFyeS4NCg0KYGBge3J9DQpxY2MoeCxuZXdkYXRhID0geHgsdHlwZSA9ICJ4YmFyLm9uZSIpDQpgYGANCg0KDQpCZWNhdXNlIHRoZSBjdXJyZW50IGRhdGEncyBtZWFuIHdhcyBoaWdoZXIsIHRoZSBxY2MgbGlicmFyeSBmbGFnZ2VkIFVDTCAuDQpUaGUgZ3JhcGggaXMgcmVhZCBmaXJzdCBieSBsb29raW5nIGF0IHRoZSB0b3AgbW9zdCBsYWJlbHMgc2hvd2luZyBDYWxpYnJhdGlvbiBkYXRhIG9yICJIaXN0b3JpY2FsIERhdGEiLCBuZXh0IGlzIHRoZSBOZXcgZGF0YSBvciAiQ3VycmVudCBEYXRhIiAuIEN1cnJlbnQgZGF0YSBwb2ludHMgYXJlIGZsYWdnZWQgYW5kIHNob3cgdG8gYmUgYWJvdmUgdGhlIG1lYW4gYW5kIHlvdXIgcHJlc3VtcHRpb24gaXMgdGhhdCB5b3VyIGRhdGEgaXMgY2hhbmdpbmcgYW5kIGNvdWxkIGJlIHZpZXdlZCBhcyBhIGNoYW5nZSBpbiB2YWx1ZXMuDQoNCg0KIyMgIFdlIHdpbGwgbm93IEFkZCB3YXJuaW5nIGxpbWl0cyBhdCAyIHN0ZCBkZXZpYXRpb25zDQoNCmBgYHtyIH0NCnExIDwtIHFjYyh4LG5ld2RhdGE9eHgsIHR5cGU9InhiYXIub25lIixwbG90PUZBTFNFKQ0KKHdhcm4ubGltaXRzID1saW1pdHMueGJhcihxMSRjZW50ZXIscTEkc3RkLmRldiwgcTEkc2l6ZXMsMikpDQpgYGANCg0KIyMgUGxvdHRpbmcgDQpgYGB7cn0NCnBsb3QocTEsIHJlc3RvcmUucGFyID0gRkFMU0UpDQphYmxpbmUoaCA9IHdhcm4ubGltaXRzLCBsdHkgPSAzLCBjb2wgPSAicmVkIikNCmBgYA==