I was taught by Dr Nelson that the first thing we should do after gathering data is plot it and look for trends. I decided to analyze the Irving Oil CUI probability of detection data using probability plots.
For CUI data, either extreme value (Gumbel) or Weibull distributions should provide a satisfactory fit. I tried using both. While the Gumbel distribution sometimes exhibited an improved goodness of fit, for the purposes of this exercise (since I wanted to use interval MLE techniques to analyze the data) it was quicker and easier to use Weibull plots, as there are R packages readily available for most of what I wanted to do.
Interval MLE is useful for our purposes because the measured value is not always the most accurate way to analyze the data. For example, even though we may report CUI loss as zero, the actual loss may be more accurately analyzed as being less than a prescribed minimum value, which is technique dependent. Interval MLE allows the analysis to consider the data as (1) less than a given value, (2) in the interval between two values, or (3) greater than a value. We can get into the details of exactly how I formatted the data at a later date, if you wish.
For visual inspection and digital radiography I used 25 mils as the detection limit. For PEC and GUL, I used 50 and 100 mils, respectively.
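To make the three data types concrete, here is a minimal sketch of an interval-censored Weibull likelihood. It is written in plain Python rather than the R packages used for the actual analysis, the function names are hypothetical, the wall losses are made up for illustration, and the crude grid search stands in for a proper optimizer.

```python
import math

def weibull_cdf(x, beta, eta):
    """Weibull CDF; beta is the shape, eta the characteristic value."""
    if x <= 0.0:
        return 0.0
    if math.isinf(x):
        return 1.0
    return 1.0 - math.exp(-((x / eta) ** beta))

def interval_loglik(bounds, beta, eta):
    """Log-likelihood for observations given as (lower, upper) bounds.

    lower == 0        -> "less than upper"
    upper == math.inf -> "greater than lower"
    lower == upper    -> exact measurement
    """
    total = 0.0
    for lo, hi in bounds:
        if lo == hi:  # exact measurement: log of the Weibull density
            z = (lo / eta) ** beta
            total += math.log(beta / lo) + math.log(z) - z
        else:  # censored: log of the probability mass between the bounds
            mass = weibull_cdf(hi, beta, eta) - weibull_cdf(lo, beta, eta)
            total += math.log(max(mass, 1e-300))
    return total

def grid_mle(bounds, betas, etas):
    """Crude grid search for the (beta, eta) pair with the highest likelihood."""
    _, beta, eta = max((interval_loglik(bounds, b, e), b, e)
                       for b in betas for e in etas)
    return beta, eta

# Made-up wall losses in inches, for illustration only (not study data).
example = [(0.0, 0.025)] * 4 + [(0.03, 0.03), (0.045, 0.045),
                                (0.06, 0.06), (0.05, 0.10), (0.10, math.inf)]
betas = [0.5 + 0.05 * i for i in range(51)]   # shape grid, 0.5 to 3.0
etas = [0.01 + 0.002 * i for i in range(46)]  # scale grid, 0.01 to 0.10 inches
beta_hat, eta_hat = grid_mle(example, betas, etas)
print(f"beta ~ {beta_hat:.2f}, eta ~ {eta_hat:.3f} in")
```

The point of the sketch is that a "zero" reading contributes the probability of falling below the detection limit, rather than a density evaluated at zero.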
I did not use a detection limit for RTR since the data was simply reported as moderate or severe wall loss. For RTR, I assumed moderate metal loss was greater than 25% nominal wall and severe metal loss was greater than 50% nominal, which I believe is consistent with the report.
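A rough sketch of how readings might be turned into bounds using these limits follows. Several things here are assumptions rather than the formatting actually applied in the spreadsheet: any reading below the detection limit is treated as "less than the limit", moderate RTR loss is bounded above by 50% of nominal and severe by 100%, and the function name and the 280 mil nominal wall are hypothetical.

```python
# Detection limits in mils, as stated in the text.
DETECTION_LIMIT_MILS = {"visual": 25, "DR": 25, "PEC": 50, "GUL": 100}

# Assumed upper bounds: moderate = 25-50% of nominal, severe = 50-100%.
RTR_FRACTIONS = {"moderate": (0.25, 0.50), "severe": (0.50, 1.00)}

def to_interval(technique, reading, nominal_wall_mils=None):
    """Return (lower, upper) wall-loss bounds in mils for one reading."""
    if technique == "RTR":
        lo, hi = RTR_FRACTIONS[reading]
        return (lo * nominal_wall_mils, hi * nominal_wall_mils)
    limit = DETECTION_LIMIT_MILS[technique]
    if reading < limit:
        # Below the limit: only known to be less than the limit.
        return (0.0, float(limit))
    # At or above the limit: treated as an exact measurement.
    return (float(reading), float(reading))

print(to_interval("PEC", 0))                # a reported zero from PEC
print(to_interval("GUL", 120))              # a measured loss above the limit
print(to_interval("RTR", "moderate", 280))  # hypothetical 280 mil nominal wall
```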
We already know that the data for this study is quite sparse. For the carbonate line, the amount of data proved useful but I don’t believe the East IC4 line currently has enough data for an accurate analysis. My hope is that additional data on more lines will be gathered in the future and the techniques I have used here will prove even more useful.
Some data formatting was performed in the Inspection Result Raw Data spreadsheet before importing the data into R. I will send a copy of the spreadsheet so you can see what changes were made in Excel.
As an aside - I am not a specialist (or even a practitioner) in classical statistics…and in particular the use of F1 and F2 scores. I always tend to gravitate toward data visualization. I do have concerns that a real statistician might suggest that the uncertainty in metrics such as F1 and F2 would require orders of magnitude more data than has been acquired so far.
I believe, currently, the problem statement for this study may be something like…‘what is the probability of obtaining the actual CUI wall loss at a particular location, when using a particular inspection technique’.
Big picture, I suppose what I am pondering is whether the problem statement should be: ‘what is the probability of obtaining the actual distribution of CUI damage on components exposed to a similar environment for the same time period, using a particular inspection technique’.
Whatever the problem statement, I feel like it should lead an inspection organization to areas of focus, which may include progressive inspection using similar or different techniques - or other mitigations, with the obvious goal of minimizing CUI failures.
The truth plot for the carbonate line was generated using profile digital radiography only - there were no visual observations. The “canned” Weibull plot (from the WeibullR package; a rather cluttered plot IMO, but OK for our purposes) demonstrates a satisfactory goodness of fit. The 55 “discovery” values (provided to the analysis as less than 25 mils loss) fit in nicely with the 93 measured wall thicknesses.
The Weibull shape (beta) parameter of 1.16 and the characteristic value eta parameter of approximately 0.05” will be useful for comparison to values obtained using other inspection techniques.
The PEC plot doesn’t have a great goodness of fit - it has quite a bit of curvature at wall losses greater than about 0.1 inches, but it’s still useful for our purposes. The data does essentially fit within the 90% confidence intervals. It could be that a distribution analysis would show that a lognormal fit is better, or perhaps something in the data gathering or data pre-processing contributes to this curvature. It would be interesting to speak with our PEC technicians about this apparent anomaly.
The Weibull shape (beta) parameter of ~0.85 and the characteristic value eta parameter of approximately 0.04” are surprisingly close to those of our truth distribution. More on that later.
The GUL Weibull Plot shows a significantly lower beta of 0.7, indicating a much wider distribution of CUI loss and a much lower eta value of 0.025” indicating lower overall losses than the truth case.
The RTR Weibull Plot shows a very high beta of 4.3, indicating a very narrow distribution of CUI loss, but with a very high eta value of 0.092”.
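The link between beta and the width of the distribution can be checked directly, since the coefficient of variation (standard deviation divided by mean) of a Weibull depends only on beta. The sketch below uses the approximate beta values quoted above; the function name is hypothetical.

```python
import math

def weibull_cv(beta):
    """Coefficient of variation of a Weibull distribution (depends only on beta)."""
    g1 = math.gamma(1.0 + 1.0 / beta)
    g2 = math.gamma(1.0 + 2.0 / beta)
    return math.sqrt(g2 / g1 ** 2 - 1.0)

# Approximate shape parameters as reported for each fit.
for name, beta in [("Truth", 1.16), ("PEC", 0.85), ("GUL", 0.7), ("RTR", 4.3)]:
    print(f"{name}: beta = {beta}, CV = {weibull_cv(beta):.2f}")
```

A lower beta gives a larger relative spread, which is why the GUL fit reads as a wide distribution and the RTR fit as a narrow one.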
Because of the way the RTR data was reported, I decided to analyze all RTR readings as interval data. This is something we should discuss amongst ourselves and perhaps with an RTR Tech at some point.
Comparing probability densities for the wall loss distributions, we can see how the PEC data most closely resembles the Truth case, while the RTR data could not be further from the truth.
The following plot shows the Truth case as a red outline with the GUL, PEC, and RTR loss distributions superimposed.
Again, it is clear that PEC most closely resembles the Truth loss distribution.
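The visual comparison can be roughly reproduced from the fitted parameters alone. The sketch below evaluates the Weibull density and, purely as an illustration, the largest vertical gap between each technique's fitted CDF and the Truth CDF. That gap is only one possible way to put a number on "closeness" and is an assumption of this sketch, not a metric used in the study; the parameter values are the approximate ones quoted above, in inches.

```python
import math

# (beta, eta) pairs as reported in the text; eta in inches, values approximate.
FITS = {"Truth": (1.16, 0.05), "PEC": (0.85, 0.04),
        "GUL": (0.7, 0.025), "RTR": (4.3, 0.092)}

def weibull_pdf(x, beta, eta):
    """Weibull probability density at wall loss x > 0."""
    z = (x / eta) ** beta
    return (beta / x) * z * math.exp(-z)

def weibull_cdf(x, beta, eta):
    """Weibull cumulative probability at wall loss x > 0."""
    return 1.0 - math.exp(-((x / eta) ** beta))

def max_cdf_gap(fit_a, fit_b, x_max=0.3, steps=3000):
    """Largest absolute CDF difference on a grid from just above 0 to x_max inches."""
    xs = [x_max * i / steps for i in range(1, steps + 1)]
    return max(abs(weibull_cdf(x, *fit_a) - weibull_cdf(x, *fit_b)) for x in xs)

gaps = {name: max_cdf_gap(FITS["Truth"], fit)
        for name, fit in FITS.items() if name != "Truth"}
for name, gap in gaps.items():
    density = weibull_pdf(0.05, *FITS[name])
    print(f"{name}: max CDF gap vs Truth = {gap:.2f}, density at 0.05 in = {density:.1f}")
```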
There are several options for providing a quantitative metric to predict how well a particular inspection technique matches the truth case. Since the purpose of this study was to estimate a probability of detection, a quantitative measure would be useful. This is not an easy question to answer; one such option will be proposed in a subsequent section.
It should be useful to compare the Truth wall loss distribution with the losses estimated using each inspection technique. The following plot shows that the PEC error is the lowest of all three techniques; RTR shows the greatest error.
This plot suggests that RTR tends to overestimate wall losses compared with GUL and PEC. This result really needs to be confirmed after obtaining additional data.
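As a quick cross-check that needs only the fitted parameters, the mean wall loss implied by each Weibull fit is eta times Gamma(1 + 1/beta). The sketch below compares those means with the Truth case. It uses the approximate parameter values quoted earlier and is not the calculation behind the plot; the names are hypothetical.

```python
import math

# (beta, eta) pairs as reported in the text; eta in inches, values approximate.
FITS = {"Truth": (1.16, 0.05), "PEC": (0.85, 0.04),
        "GUL": (0.7, 0.025), "RTR": (4.3, 0.092)}

def weibull_mean(beta, eta):
    """Mean of a Weibull distribution with shape beta and scale eta."""
    return eta * math.gamma(1.0 + 1.0 / beta)

truth_mean = weibull_mean(*FITS["Truth"])
diffs = {name: weibull_mean(*fit) - truth_mean
         for name, fit in FITS.items() if name != "Truth"}

for name, diff in diffs.items():
    direction = "over" if diff > 0 else "under"
    print(f"{name}: mean loss differs from Truth by {diff:+.3f} in ({direction}estimate)")
```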
The specifics for providing a probability of detection for CUI wall loss distributions are something we need to discuss and/or we should engage a real statistician.
One technique frequently employed is examining the likelihood contour plots for the distribution parameters. Again, boiling this down to one easy number will take some thought, but I believe this is a reasonable start to the discussion.
For expediency, the following plots were generated using a demo version of WinSMITH Weibull and WinSMITH Visual. It is useful to think of these contour plots as looking down from the top of a mountain onto the spread of parameters beta and eta. The top of the mountain is represented by the triangle that is generally centered on each contour plot.
When the likelihood contours for two distributions (in this case the (1) Truth and (2) inspection technique loss distributions) overlap, it can be said that the distributions are not significantly different at that confidence level. For example, two distributions can be considered “the same” if they do not exhibit greater than 90% significant difference.
The first plot shows parameter contours for the Truth versus PEC comparison. The overlap between the two sets of contours is obvious. The title shows that the two distributions only exhibit 84% significant difference. This indicates that the PEC data returned essentially the same CUI loss distribution as the Truth case.
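For discussion, the contour idea can be sketched with a likelihood-ratio region: keep every (beta, eta) pair whose log-likelihood is within a fixed drop of the maximum, where the drop comes from the chi-square distribution with 2 degrees of freedom. This is a textbook construction on synthetic, uncensored data and is an assumption of this sketch; the commercial software may well build its contours differently, and all names below are hypothetical.

```python
import math
import random

def loglik(data, beta, eta):
    """Weibull log-likelihood for exact (uncensored) observations."""
    return sum(math.log(beta / x) + beta * math.log(x / eta) - (x / eta) ** beta
               for x in data)

def contour_drop(conf):
    """Allowed drop in log-likelihood for a joint (beta, eta) region.

    Half of the chi-square quantile with 2 degrees of freedom,
    which has the closed form -2 * ln(1 - conf).
    """
    return -math.log(1.0 - conf)

def region(data, betas, etas, conf=0.90):
    """Grid points whose log-likelihood lies inside the confidence contour."""
    surface = {(b, e): loglik(data, b, e) for b in betas for e in etas}
    cutoff = max(surface.values()) - contour_drop(conf)
    return {point for point, ll in surface.items() if ll >= cutoff}

# Synthetic samples for illustration only (not study data).
random.seed(1)
sample_a = [random.weibullvariate(0.05, 1.16) for _ in range(90)]   # scale, shape
sample_b = [random.weibullvariate(0.092, 4.3) for _ in range(40)]

betas = [0.5 + 0.1 * i for i in range(56)]    # shape grid, 0.5 to 6.0
etas = [0.01 + 0.002 * i for i in range(71)]  # scale grid, 0.01 to 0.15 inches

region_a = region(sample_a, betas, etas)
region_b = region(sample_b, betas, etas)
overlap = bool(region_a & region_b)
print("90% contours overlap:", overlap)
```

If the two regions share no grid point, the two parameter sets differ at that confidence level; shared points mean the data cannot separate them.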
The second contour plot shows the Truth versus GUL comparison. Again, the contours overlap - but as shown in the title, the two distributions exceed 90% significant difference.
The third contour plot shows the Truth versus RTR comparison. In this case the contours do not overlap. There is greater than 100 percent significant difference between the two sets of parameters. One would not expect RTR alone to produce the Truth loss distribution.