For a real life working example, we have this dataset graciously provided to us by the National Institute of Standards and Technology (http://www.nist.gov/) for one of their test bed parts. We will be trying to solve the first example from the introduction, which was a real problem faced by the NIST researchers. As we walk through each step of the exploratory process, you will understand how you can use similar techniques to solve similar issues that you might have with your Machine Tools.
Accurately estimate the productive time of a part, from the data of a part that was manufactured with interruptions. Productive time is defined as the time taken by the machine to complete the part excluding interruptions and inefficiencies.
First let us define the data files that we are going to be working with. Two sets of example data from the MTConnect Agent samples and the result of the MTConnect probe are provided along with the package. We will be using the one provided by nist for this analysis. Note that the package can read in a compressed file as if it were a normal file as well for the samples.
suppressPackageStartupMessages(require(mtconnectR))
suppressPackageStartupMessages(require(dplyr))
file_path_dmtcd = "../data/delimited_mtc_data/nist_test_bed/GF_Agie.tar.gz"
file_path_xml = "../data/delimited_mtc_data/nist_test_bed/Devices.xml"
Before we read in the data into the MTC Device Class, it might help us a bit in understanding a bit about the data items that we have.
The MTConnectDevices XML document has information about the logical components of one or more devices. This file can obtained using the probe request from an MTConnect Agent.
We can check out the devices for which the info is present in the devices XML using the get_device_info_from_xml function. From the device info, we can select the name of the device that we want to analyse further
(device_info = get_device_info_from_xml(file_path_xml))
## name uuid id
## 1 nist_testbed_Mazak_QT_1 nist_testbed_Mazak_QT_1_74fd52 Mazak_QT_1_1
## 2 nist_testbed_GF_Agie_1 nist_testbed_GF_Agie_1_3a0e8a GF_Agie_1_78
device_name = device_info$name[2]
The get_xpath_from_xml function can read in the xpath info for a single device into a easily read data.frame format.
The data.frame contains the id and name of each data item and the xpath along with the type, category and subType of the data_item. It is easy to find out what are the data items of a particular type using this function. For example, we are going to find out the conditions data items which we will be using in the next step.
xpath_info = get_xpaths_from_xml(file_path_xml, device_name)
head(xpath_info)
## id name type category subType
## 1 dtop_79 avail AVAILABILITY EVENT <NA>
## 2 dtop_80 estop EMERGENCY_STOP EVENT <NA>
## 3 dtop_81 system SYSTEM CONDITION <NA>
## 4 X_84 Xposition POSITION SAMPLE ACTUAL
## 5 Y_86 Yposition POSITION SAMPLE ACTUAL
## 6 Z_88 Zposition POSITION SAMPLE ACTUAL
## xpath
## 1 nist_testbed_GF_Agie_1<Device>:avail<AVAILABILITY>
## 2 nist_testbed_GF_Agie_1<Device>:estop<EMERGENCY_STOP>
## 3 nist_testbed_GF_Agie_1<Device>:system<SYSTEM>
## 4 nist_testbed_GF_Agie_1<Device>:Xposition<POSITION-ACTUAL>
## 5 nist_testbed_GF_Agie_1<Device>:Yposition<POSITION-ACTUAL>
## 6 nist_testbed_GF_Agie_1<Device>:Zposition<POSITION-ACTUAL>
MTConnectStreams dataMTConnectStreams data from an MTConnect Agent can be collected using a ruby script to generate a delimited log of device data (referred to in this document as log data) which is then used by the mtconnectR Package.
create_mtc_device_from_dmtcd function can read in both the Delimited MTConnect data (DMTCD) and the xml data for a device and combine it into a single MTCDevice Class with the data organized separately for each data item.
mtc_device = create_mtc_device_from_dmtcd(file_path_dmtcd, file_path_xml, device_name)
## Reading Delimted MTC data...
names(mtc_device@data_item_list)
It looks like we have the position data items that we might need for this analysis in the log data. Let’s see the variation in position. We can plot all the data items in one plot using ggplot2.
require("ggplot2")
## Loading required package: ggplot2
require("reshape2")
## Loading required package: reshape2
xpos_data = getDataItem(mtc_device, "Xposition") %>% getData()
ypos_data = getDataItem(mtc_device, "Yposition") %>% getData()
zpos_data = getDataItem(mtc_device, "Zposition") %>% getData()
ggplot() + geom_line(data = xpos_data, aes(x = timestamp, y = value))
ggplot() + geom_line(data = ypos_data, aes(x = timestamp, y = value))
ggplot() + geom_line(data = zpos_data, aes(x = timestamp, y = value))
It looks like the machine is going back and forth quite some distance quite often, across all the axes. We also don’t know how this traversal varies across different axis. However, we can get a much better idea of the motion if we could plot one axis against the other. For that we have to merge the different data items. Since the different data items have different timestamp values as the key, it is not as straightforward as doing a join of one data item against the other. For this purpose, the mtconnect packge has a merge method defined for the MTCDevice Class
merged_pos_data = merge(mtc_device, "position") # merge all dataitems with the word position
head(merged_pos_data)
## timestamp
## 1 2015-11-02 14:58:49.994391
## 2 2015-11-02 14:59:03.742392
## 3 2015-11-02 14:59:03.886408
## 4 2015-11-02 14:59:14.122334
## 5 2015-11-02 14:59:14.270387
## 6 2015-11-02 14:59:22.486440
## nist_testbed_GF_Agie_1<Device>:Aposition<ANGLE-ACTUAL>
## 1 -0.0001
## 2 -0.0001
## 3 -0.0001
## 4 -0.0001
## 5 -0.0001
## 6 -0.0001
## nist_testbed_GF_Agie_1<Device>:Cposition<ANGLE-ACTUAL>
## 1 0.0278
## 2 0.0278
## 3 0.0278
## 4 0.0278
## 5 0.0278
## 6 0.0278
## nist_testbed_GF_Agie_1<Device>:Xposition<POSITION-ACTUAL>
## 1 33.69547
## 2 33.69548
## 3 33.69547
## 4 33.69548
## 5 33.69547
## 6 33.69548
## nist_testbed_GF_Agie_1<Device>:Yposition<POSITION-ACTUAL>
## 1 -38.69783
## 2 -38.69783
## 3 -38.69783
## 4 -38.69783
## 5 -38.69783
## 6 -38.69784
## nist_testbed_GF_Agie_1<Device>:Zposition<POSITION-ACTUAL>
## 1 20.37543
## 2 20.37543
## 3 20.37543
## 4 20.37543
## 5 20.37543
## 6 20.37543
Oops. Looks like we have also merged in the angular position. Let’s try a more directed merge. Also, the names of the data items have the full xpaths attached to them. While this might be useful in other circumstances to get the hierarchical position of the data, we can dispense with it now using the extract_param_from_xpath function. Let’s view the data after that
merged_pos_data = merge(mtc_device, "position<POSITION-ACTUAL") # merge all dataitems with the word position
names(merged_pos_data) = extract_param_from_xpath(names(merged_pos_data), param = "DIName", show_warnings = F)
head(merged_pos_data)
## timestamp Xposition Yposition Zposition
## 1 2015-11-02 14:58:49.994391 33.69547 -38.69783 20.37543
## 2 2015-11-02 14:59:03.742392 33.69548 -38.69783 20.37543
## 3 2015-11-02 14:59:03.886408 33.69547 -38.69783 20.37543
## 4 2015-11-02 14:59:14.122334 33.69548 -38.69783 20.37543
## 5 2015-11-02 14:59:14.270387 33.69547 -38.69783 20.37543
## 6 2015-11-02 14:59:22.486440 33.69548 -38.69784 20.37543
Much better. Now let’s plot the data items in one shot.
ggplot(data = merged_pos_data, aes(x = timestamp)) +
geom_line(aes(y = Xposition, col = 'Xpos')) +
geom_line(aes(y = Yposition, col = 'Ypos')) +
geom_line(aes(y = Zposition, col = 'Zpos')) +
theme(legend.title = element_blank())
It does look the sudden traverals are simultaenous across the axes. Plotting one axes against the other leads to the same conclusion. It also gives us an idea of the different representations of the part
ggplot(data = merged_pos_data, aes(x = Xposition, y = Yposition)) + geom_path()
ggplot(data = merged_pos_data, aes(x = Xposition, y = Zposition)) + geom_path()
ggplot(data = merged_pos_data, aes(x = Zposition, y = Yposition)) + geom_path()
So the machine tool is going to the origin every so often.
It might help our analysis to also calculate a few process parameters that the machine tool is not providing directly. Here we are going to calculate the actual Path Feedrate of the machine as it executes the process using the position data.
Path Feedrate can be calculated as the rate of change of the position values. Here, we must use the 3-dimensional distance value and not just one of the position vectors.
PFR = Total Distance / Total Time = Sqrt(Sum of Squares of distane along individual axis) / time taken for motion
position_change_3d =
((lead(merged_pos_data$Xposition, 1) - merged_pos_data$Xposition) ^ 2 +
(lead(merged_pos_data$Yposition, 1) - merged_pos_data$Yposition) ^ 2 +
(lead(merged_pos_data$Zposition, 1) - merged_pos_data$Zposition) ^ 2 ) ^ 0.5
merged_pos_data$time_taken =
lead(as.numeric(merged_pos_data$timestamp), 1) - as.numeric(merged_pos_data$timestamp)
merged_pos_data$pfr = round(position_change_3d / merged_pos_data$time_taken, 4)
dt.df <- melt(merged_pos_data, measure.vars = c("pfr", "Xposition", "Yposition"))
ggplot(dt.df, aes(x = timestamp, y = value)) +
geom_line(aes(color = variable)) +
facet_grid(variable ~ ., scales = "free_y")
## Warning: Removed 1 rows containing missing values (geom_path).
ggplot(data = merged_pos_data, aes(x = timestamp)) +
geom_step(aes(y = pfr)) +
geom_step(aes(y = Xposition))
## Warning: Removed 1 rows containing missing values (geom_path).
Let’s add this derived data back into the MTCDevice Class.
pfr_data = merged_pos_data %>% select(timestamp, value = pfr) # Structuring data correctly
mtc_device = add_data_item_to_mtc_device(mtc_device, pfr_data, data_item_name = "pfr<PATH_FEEDRATE>",
data_item_type = "Sample", source_type = "calculated")
names(mtc_device@data_item_list)
## [1] "nist_testbed_GF_Agie_1<Device>:Aposition<ANGLE-ACTUAL>"
## [2] "nist_testbed_GF_Agie_1<Device>:avail<AVAILABILITY>"
## [3] "nist_testbed_GF_Agie_1<Device>:Cposition<ANGLE-ACTUAL>"
## [4] "nist_testbed_GF_Agie_1<Device>:estop<EMERGENCY_STOP>"
## [5] "nist_testbed_GF_Agie_1<Device>:execution<EXECUTION>"
## [6] "nist_testbed_GF_Agie_1<Device>:Fovr<PATH_FEEDRATE-OVERRIDE>"
## [7] "nist_testbed_GF_Agie_1<Device>:line<LINE>"
## [8] "nist_testbed_GF_Agie_1<Device>:mode<CONTROLLER_MODE>"
## [9] "nist_testbed_GF_Agie_1<Device>:move<x:MOTION>"
## [10] "nist_testbed_GF_Agie_1<Device>:program<PROGRAM>"
## [11] "nist_testbed_GF_Agie_1<Device>:Sovr<SPINDLE_SPEED-OVERRIDE>"
## [12] "nist_testbed_GF_Agie_1<Device>:Xposition<POSITION-ACTUAL>"
## [13] "nist_testbed_GF_Agie_1<Device>:Yposition<POSITION-ACTUAL>"
## [14] "nist_testbed_GF_Agie_1<Device>:Zposition<POSITION-ACTUAL>"
## [15] "pfr<PATH_FEEDRATE>"
Our first task is to identify the periods when the machine was idle. For this we can use a few approaches.
We will be trying out all the approaches and choosing union of the three as the period when machine is idle.
# Getting all the relevant data
merged_data = merge(mtc_device, "EXECUTION|PATH_FEEDRATE|POSITION")
names(merged_data) = extract_param_from_xpath(names(merged_data), param = "DIName", show_warnings = F)
merged_data = merged_data %>%
mutate(exec_idle = F, feed_idle = F, override_idle = F) %>% # Setting everything false by default
mutate(exec_idle = replace(exec_idle, !(execution %in% "ACTIVE"), TRUE)) %>%
mutate(feed_idle = replace(feed_idle, pfr < 0.01, TRUE)) %>%
mutate(override_idle = replace(override_idle, Fovr < 1, TRUE)) %>%
mutate(machine_idle = as.logical(exec_idle + feed_idle + override_idle))
head(merged_data)
## timestamp execution Fovr Xposition Yposition
## 1 2015-11-02 14:58:49.990541 <NA> 111.25 NA NA
## 2 2015-11-02 14:58:49.994391 <NA> 111.25 33.69547 -38.69783
## 3 2015-11-02 14:59:03.742392 <NA> 111.25 33.69548 -38.69783
## 4 2015-11-02 14:59:03.886408 <NA> 111.25 33.69547 -38.69783
## 5 2015-11-02 14:59:14.122334 <NA> 111.25 33.69548 -38.69783
## 6 2015-11-02 14:59:14.270387 <NA> 111.25 33.69547 -38.69783
## Zposition pfr exec_idle feed_idle override_idle machine_idle
## 1 NA NA TRUE FALSE FALSE TRUE
## 2 20.37543 0.0000 TRUE TRUE FALSE TRUE
## 3 20.37543 0.0001 TRUE TRUE FALSE TRUE
## 4 20.37543 0.0000 TRUE TRUE FALSE TRUE
## 5 20.37543 0.0001 TRUE TRUE FALSE TRUE
## 6 20.37543 0.0000 TRUE TRUE FALSE TRUE
We need to identify the time spent by the machine at origin. Let’s look at the X - Y graph again
ggplot(data = merged_pos_data, aes(x = Xposition, y = Yposition)) + geom_path()
It is clear that the periods when the machine was origin are roughly X > 30, Y < -30. Adding this into the mix
merged_data_final = merged_data %>%
mutate(at_origin = F) %>% # Setting everything false by default
mutate(at_origin = replace(at_origin, Xposition > 30 & Yposition < -30, TRUE)) %>%
select(timestamp, machine_idle, at_origin)
head(merged_data_final)
## timestamp machine_idle at_origin
## 1 2015-11-02 14:58:49.990541 TRUE FALSE
## 2 2015-11-02 14:58:49.994391 TRUE TRUE
## 3 2015-11-02 14:59:03.742392 TRUE TRUE
## 4 2015-11-02 14:59:03.886408 TRUE TRUE
## 5 2015-11-02 14:59:14.122334 TRUE TRUE
## 6 2015-11-02 14:59:14.270387 TRUE TRUE
Now we have all the data at our disposal to calculate the time statistics. First we need to convert the time series into interval format to get the duratins. We can use convert_ts_to_interval function to do the same.
merged_data_intervals = convert_ts_to_interval(merged_data_final)
head(merged_data_intervals)
## start end duration
## 1 2015-11-02 14:58:49.990541 2015-11-02 09:58:49.994391 0.00
## 2 2015-11-02 14:58:49.994391 2015-11-02 09:59:03.742392 13.75
## 3 2015-11-02 14:59:03.742392 2015-11-02 09:59:03.886408 0.14
## 4 2015-11-02 14:59:03.886408 2015-11-02 09:59:14.122334 10.24
## 5 2015-11-02 14:59:14.122334 2015-11-02 09:59:14.270387 0.15
## 6 2015-11-02 14:59:14.270387 2015-11-02 09:59:22.486440 8.22
## machine_idle at_origin
## 1 TRUE FALSE
## 2 TRUE TRUE
## 3 TRUE TRUE
## 4 TRUE TRUE
## 5 TRUE TRUE
## 6 TRUE TRUE
Now we can aggregate across the different states to find the total amount of time in each state.
time_summary = merged_data_intervals %>% group_by(machine_idle, at_origin) %>%
summarise(total_time = sum(duration, na.rm = T))
print(time_summary)
## Source: local data frame [4 x 3]
## Groups: machine_idle [?]
##
## machine_idle at_origin total_time
## (lgl) (lgl) (dbl)
## 1 FALSE FALSE 2713.77
## 2 FALSE TRUE 21.99
## 3 TRUE FALSE 3339.20
## 4 TRUE TRUE 576.95
total_time = sum(time_summary$total_time)
efficient_time = sum(time_summary$total_time[1])
inefficient_time = sum(time_summary$total_time[2:4])
interrupted_time = sum(time_summary$total_time[3:4])
time_at_origin = sum(time_summary$total_time[c(2,4)])
## [1] "Results"
## [1] "Total Time of Operation (including interruptions) = 6651.91s"
## [1] "Total Time without identified inefficiencies = 2713.77s"
## [1] "Total Time wasted due to interruptions = 3916.15s"
## [1] "Total Time wasted due to being at origin = 598.94s"