13 June 2017

Overview

  • Statistical Methods for Reliability Data (Meeker & Escobar, 1998) remains a foundational and well-respected text for analyzing failure-time and survival data

  • Along with the text, the authors developed an S-Plus software package for applying the methods to industry data

  • Today, R is one of the most popular statistical computing languages in the world and has largely supplanted S

  • This presentation introduces tools under development for use with the current and future versions of SMRD to

    • Analyze industry and laboratory test data
    • Simplify reliability/survivability instruction in the classroom

The SMRD Toolkit

  1. R Package SMRD

    • Implements methods from the text and reproduces its results
    • Implements methods from the text for use on industry data
    • Interactive Shiny gadgets to instantly perform reliability analyses and generate reports
  2. Statistical Methods for Reliability Data in R

    • Expanded package documentation and example vignettes
    • In-depth examples corresponding to each chapter in the SMRD text
  3. R package teachingSMRD

    • Examples from the SMRD reproduced as interactive shiny apps
    • Automatic generation of assignments and solutions for SMRD exercises

Background

The R Language & R Packages

The R Project for Statistical Computing

  • A statistical programming environment for data analysis and graphics

  • Developed by Ross Ihaka and Robert Gentleman at the University of Auckland

  • Open-source implementation of the 'S' language created by Becker et al. at Bell Labs

  • A pre-eminent tool for statistics and data science

  • One of the fastest growing technical computing languages in the world

    • Used for data processing and visualization, computational statistics, natural language processing, etc.

    • Heavily used by Google, Facebook, Twitter, Microsoft, etc.

R Packages

  • In R, the fundamental unit of shareable code is called a package

  • Packages bundle together code, data, documentation, and tests to easily share analysis methods with others

  • Currently, 10,773 packages are available on the Comprehensive R Archive Network (CRAN)

  • Many more are available from the Bioconductor and GitHub repositories

  • The huge variety of packages is a key reason why R is so popular

    • Chances are that someone has already solved a problem that you're working on

    • You can benefit from their work by downloading their package
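As a minimal sketch of that workflow, the snippet below checks whether a package is present, installs it from CRAN only if it is missing, and then attaches it. The survival package is used purely as an example, and the CRAN mirror URL is an assumption; any mirror would do.

```r
# Names of the packages installed in the current library paths
installed <- rownames(installed.packages())
length(installed)

# Install from CRAN only when the package is missing (needs network access)
if (!requireNamespace("survival", quietly = TRUE)) {
  install.packages("survival", repos = "https://cloud.r-project.org")
}

# Attach the package so its functions can be called directly
library(survival)
```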

R Package SMRD

Development Process and Package Features

SMRD - Development Process

  • Meeker developed a large collection of FORTRAN subroutines as part of contracted efforts at Bell Labs and Iowa State

  • Meeker & Escobar wrapped the FORTRAN code into an S-Plus package called SPLIDA (S-Plus LIfe Data Analysis)

  • SPLIDA serves as the companion software for Statistical Methods for Reliability Data 1st ed.

  • Meeker began an effort to translate SPLIDA into R under the name RSplida

    • Not user-friendly - couldn't be installed as a traditional R package

    • Difficult to use with modern IDEs (e.g., RStudio, Visual Studio, Eclipse)

SMRD - Development Process (cont.)

  • (2015) Freels & Meeker signed an MOU to share the FORTRAN code for the purpose of developing an R package

  • Aim to publish SMRD to the CRAN in 2018

  • Remaining tasks to be completed before publishing (% complete)

    • (90%) Update older R & S-Plus idioms to modern equivalents
    • (10%) Translate FORTRAN code to C++
    • (75%) Update graphics objects
    • (75%) Document datasets
    • (15%) Document exported functions
    • (75%) Update for modern use-cases - literate programming/interactivity
    • (50%) Ensure compatibility with modern dependencies

SMRD Package Features

Powerful estimation/prediction methods for many types of failure data

  • Multiple failure modes
  • Censored observations (right, left, and interval censoring)
  • Truncated observations (right, left, and interval truncation)
  • Failure data with explanatory variables (failure-time regression)
  • Repeated measures degradation data (linear & non-linear mixed effects)
  • Repairable system failure data (recurring events)
  • Physical/performance degradation data
  • Failure data with prior information (Bayesian reliability)
  • Reliability growth test data
  • Reliability test simulations

SMRD Package Features

Minimal data pre-processing through flexible event definitions

  • Organizations often use different terms to describe the same event

    • 'Failure' = 'Failed' = 'Fail' = 'dead' = 'died'
    • 'right' = 'rcensored' = 'suspended' = 'alive'
    • 'left' = 'doa' = 'lcensored'
    • 'interval' = 'int' = 'icensored' = 'grouped'
  • Many applications force users to recode these events

  • SMRD allows for flexible event definitions to utilize the data as-is

  • Event definitions can even be mixed

  • SMRD event definitions are easily mapped to the numeric status codes used by the survival package
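The idea can be sketched in base R as follows. This is not SMRD's implementation: `event.map`, `surv.code`, and `normalize_event()` are hypothetical names introduced here for illustration. The numeric codes are the status values that `survival::Surv()` documents for `type = "interval"` (0 = right censored, 1 = event, 2 = left censored, 3 = interval censored).

```r
# Hypothetical lookup table: canonical event type for each label listed above
event.map <- c(
  failure = "failure", failed = "failure", fail = "failure",
  dead = "failure", died = "failure",
  right = "right", rcensored = "right", suspended = "right", alive = "right",
  left = "left", doa = "left", lcensored = "left",
  interval = "interval", int = "interval", icensored = "interval",
  grouped = "interval"
)

# Status codes documented for survival::Surv(type = "interval")
surv.code <- c(right = 0, failure = 1, left = 2, interval = 3)

# Hypothetical helper: map free-form labels to canonical event types
normalize_event <- function(x) {
  key <- tolower(trimws(x))
  out <- unname(event.map[key])
  if (anyNA(out)) {
    stop("Unrecognized event label: ",
         paste(unique(x[is.na(out)]), collapse = ", "))
  }
  out
}

# Mixed labels, as they might arrive from different organizations
labels <- c("Failed", "suspended", "DOA", "grouped", "dead", "alive")
canon  <- normalize_event(labels)
data.frame(label = labels, event = canon, code = unname(surv.code[canon]))
```

Matching is case-insensitive here so that "DOA" and "doa" are treated alike; whether SMRD does the same is an assumption.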

SMRD Default Event Definitions

SMRD Package Features

Easily Access Data from Multiple Sources

  • SMRD includes over 120 fully-documented datasets

  • For importing external data, SMRD leverages several other R packages

  • Excel files

    • XLConnect, readxl, xlsx
  • CSV/TSV files

    • base, utils, readr, data.table
  • Epi Info, Minitab, S, SAS, SPSS, Stata, Systat and Weka files

    • foreign, Hmisc
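A minimal sketch of the CSV route, using only `utils::read.csv()` from base R. The file contents and column names (`km`, `mode`, `event`) are made up for illustration and are written to a temporary file so the example is self-contained.

```r
# Write a small made-up CSV file to a temporary location
csv.file <- tempfile(fileext = ".csv")
writeLines(c("km,mode,event",
             "5000,Mode1,Failed",
             "8000,None,suspended",
             "12000,Mode2,Failed"),
           csv.file)

# Read it back as a data frame; event labels stay as plain character strings
field.data <- read.csv(csv.file, stringsAsFactors = FALSE)
str(field.data)
```

For Excel files, `readxl::read_excel()` plays the same role and also returns a data frame.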

SMRD Package Features

Faster workflows through literate programming

Literate programming: integrating text with snippets of executable code in documents & presentations

  • In R, literate programming is supported by the knitr and rmarkdown packages

    • Weave code from multiple languages, \(\LaTeX\)-typeset equations, and text together in one document
    • Run and compile code, \(\LaTeX\), and text simultaneously
    • Everything is stored in a single file
    • Output to any of a number of presentation or document formats
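The weaving step can be sketched with `knitr::knit()` applied to a small in-memory document. The document text and the numbers in it are made up for illustration; the chunk delimiter is built with `strrep()` only so that this example does not nest code fences.

```r
library(knitr)

fence <- strrep("`", 3)  # chunk delimiter, built here to avoid nested fences

# A tiny R Markdown document: prose, one code chunk, one inline expression
doc <- c(
  "# Test summary",
  "",
  paste0(fence, "{r}"),
  "x <- c(10, 20, 30)",
  "mean(x)",
  fence,
  "",
  "The summary is based on `r length(x)` observations."
)

# Run the code and weave its results into the text; returns Markdown as a string
out <- knit(text = doc, quiet = TRUE)
cat(out)
```

In practice the same content would live in a single `.Rmd` file and be rendered to HTML, PDF, or slides with `rmarkdown::render()`.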

SMRD Package Features

Faster workflows through literate programming

  • With SPLIDA, many results were returned simultaneously

    • Graphics
    • Numeric values
    • Tables of results
    • Text summaries
  • For GUI-based software tools, presenting multiple results is GOOD

  • For tools emphasizing reproducible research and literate programming, presenting multiple results simultaneously is BAD

    • SMRD built to support literate programming - go from data to report fast
    • Ensure that specific results can be produced and called where desired

A Quick SMRD Example

Analyzing the shockabsorber dataset

Dataset: shockabsorber

  • This example demonstrates some of the SMRD functions to analyze the shockabsorber dataset used throughout the text

Creating life.data Objects

  • Many of the methods in the package require a life.data-class object
shock.ld <- frame.to.ld(frame = shockabsorber,
                        response.column = 1,
                        failure.mode.column = 2,
                        censor.column = 3,
                        time.units = 'Kilometers')
  • Since SPLIDA was written as a GUI, many functions to produce results and graphics already existed

  • Thus, once the life.data object has been created, many different plots and numeric results can be produced, each requiring only a single line of code
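Because SMRD is not yet on CRAN, the same idea is sketched below with the survival package instead: a `Surv` object bundles the response and censoring columns much as a life.data object does, and one further line produces a nonparametric estimate. The data are made up for illustration and are not the shockabsorber dataset.

```r
library(survival)

# Made-up illustrative data (NOT the shockabsorber dataset)
toy <- data.frame(
  km     = c(5, 8, 12, 15, 20, 22, 30, 35) * 1000,
  status = c(1, 1, 0, 1, 1, 0, 1, 0)   # 1 = failure, 0 = right-censored
)

# Bundle the response and censoring information into one object
toy.surv <- Surv(time = toy$km, event = toy$status)
toy.surv

# Kaplan-Meier estimate of S(t); the nonparametric CDF is F(t) = 1 - S(t)
km.fit <- survfit(toy.surv ~ 1)
data.frame(km = km.fit$time, F.hat = 1 - km.fit$surv)
```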

Producing Results From life.data Objects

  • Nonparametric CDF plots

  • Parametric CDF plots

  • ML CDF and hazard plots

  • Explanatory variable plots

  • Multi-failure mode plots

  • Relative likelihood surfaces

  • Relative likelihood curves


  • \(F(t)\) at specified values of \(t\)

  • \(h(t)\) at specified values of \(t\)

  • Quantiles \(t_p = F^{-1}(p)\) at specified values of \(p\)

  • ML parameter estimates and standard errors

  • Logit and log transformed confidence intervals (pointwise and simultaneous)
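To make the three quantities concrete, the sketch below evaluates them for a Weibull distribution using base R only, not SMRD. The values of `eta` and `beta` are the Weibull estimates reported in the ML estimate table of this deck; the evaluation points `t` and `p` are arbitrary choices for illustration.

```r
# Weibull estimates taken from the ML estimate table in this deck
eta  <- 27718.72   # scale (kilometers)
beta <- 3.16       # shape

# F(t): fraction failing by distance t
t   <- c(10000, 20000, 30000)
F.t <- pweibull(t, shape = beta, scale = eta)

# h(t): hazard function, f(t) / (1 - F(t))
h.t <- dweibull(t, shape = beta, scale = eta) /
  pweibull(t, shape = beta, scale = eta, lower.tail = FALSE)

# t_p: distance by which a fraction p has failed (quantile function)
p   <- c(0.01, 0.10, 0.50)
t.p <- qweibull(p, shape = beta, scale = eta)

data.frame(t = t, F.t = F.t, h.t = h.t)
data.frame(p = p, t.p = t.p)
```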

Nonparametric & Parametric CDF plots

plot(shock.ld)
plot(shock.ld, distribution = 'lognormal')

ML Plots \(F(t)\) & \(h(t)\)

mlehazplot(shock.ld,  distribution = 'lognormal', param.loc = 'topleft')
mleprobplot(shock.ld, distribution = 'weibull', param.loc = 'topleft')
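For readers without SMRD installed, a comparable Weibull ML fit can be sketched with `survival::survreg()`. This is a stand-in, not the SMRD routine, and the data are made up for illustration. `survreg()` reports the fit on the log-location-scale (smallest extreme value) parameterization, so the usual Weibull parameters follow from eta = exp(mu) and beta = 1/sigma.

```r
library(survival)

# Made-up illustrative data (NOT the shockabsorber dataset)
toy <- data.frame(
  km     = c(5, 8, 12, 15, 20, 22, 30, 35) * 1000,
  status = c(1, 1, 0, 1, 1, 0, 1, 0)   # 1 = failure, 0 = right-censored
)

# Maximum likelihood fit of a Weibull distribution with right censoring
fit <- survreg(Surv(km, status) ~ 1, data = toy, dist = "weibull")

mu    <- unname(coef(fit))  # location of log(T)
sigma <- fit$scale          # scale of log(T)

# Convert to the Weibull scale (eta) and shape (beta)
c(mu = mu, sigma = sigma, eta = exp(mu), beta = 1 / sigma)
```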

ML Surface Plots

simple.contour(shock.ld, distribution = 'sev', threeD = TRUE, original.par = FALSE)
simple.contour(shock.ld, distribution = 'sev', show.confidence = FALSE, zoom = 1.75)
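What such a surface represents can be sketched in base R: the relative likelihood R(eta, beta) = L(eta, beta) / L(eta.hat, beta.hat), evaluated on a grid. This sketch is not `simple.contour()`; it uses the Weibull (eta, beta) parameterization, made-up data, arbitrary grid limits, and approximates the maximum by the best grid point.

```r
# Made-up illustrative data, in thousands of kilometers
km     <- c(5, 8, 12, 15, 20, 22, 30, 35)
status <- c(1, 1, 0, 1, 1, 0, 1, 0)   # 1 = failure, 0 = right-censored

# Weibull log-likelihood: failures contribute the log density,
# right-censored units contribute the log survival probability
loglik <- function(eta, beta) {
  sum(status * dweibull(km, shape = beta, scale = eta, log = TRUE) +
      (1 - status) * pweibull(km, shape = beta, scale = eta,
                              lower.tail = FALSE, log.p = TRUE))
}

# Evaluate the log-likelihood over a grid of parameter values
eta.grid  <- seq(15, 60, length.out = 60)
beta.grid <- seq(0.5, 4, length.out = 60)
ll <- outer(eta.grid, beta.grid, Vectorize(loglik))

# Relative likelihood, scaled so the best grid point equals 1
R <- exp(ll - max(ll))

contour(eta.grid, beta.grid, R, levels = c(0.1, 0.5, 0.9),
        xlab = "eta (thousands of km)", ylab = "beta")
```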

ML Estimate Table

Generate tables automatically

tab <- print(mlest(shock.ld, distribution = 'weibull'))$mle
xarray(table = tab)

\[ \begin{array}{rrrrr} \hline & MLE & Std.Err. & 95\% Lower & 95\% Upper \\ \hline mu & 10.23 & 0.11 & 10.01 & 10.45 \\ sigma & 0.32 & 0.07 & 0.20 & 0.50 \\ Weibull (eta) & 27718.72 & 3046.02 & 22347.77 & 34380.49 \\ Weibull (beta) & 3.16 & 0.73 & 2.01 & 4.97 \\ \hline \end{array} \]
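As a quick check on how the interval columns relate to the estimates, the sketch below recomputes 95% limits for the two Weibull rows as log-transformed Wald intervals, est / w and est * w with w = exp(z * se / est). That the table was produced this way is an assumption; the inputs are the rounded values from the table, so agreement can only be approximate.

```r
# Estimates and standard errors copied from the table above (rounded values)
est <- c(eta = 27718.72, beta = 3.16)
se  <- c(eta = 3046.02,  beta = 0.73)

# Log-transformed Wald interval: keeps the limits positive
z <- qnorm(0.975)
w <- exp(z * se / est)

ci <- cbind(lower = est / w, upper = est * w)
round(ci, 2)
```

The recomputed limits land close to the tabulated ones, which is consistent with the log-transformed intervals mentioned earlier in the deck.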

R Package teachingSMRD

teachingSMRD

  • Teaching a course generates a lot of files

  • teachingSMRD is designed to manage your course content and make it easier to use SMRD in the classroom

    • Automatically generate assignments and solutions for SMRD exercises
    • Call up worked exercises in-class
    • Render examples as interactive shiny applications
    • Teach R programming skills and course content simultaneously
  • Two package versions

    • Student version (published to the CRAN, solutions not included 👎)
    • Faculty version (not published, solutions included 👍)

Examples Using teachingSMRD

In-person demonstration - if you're reading this after QPRC, sorry

Use-Cases for teachingSMRD

Four Example Scenarios

  • Viewing an interactive example from the text with shiny

  • Generating a 'fill-able' assignment of multiple exercises

  • Automatically generating a solution set for an assignment (.html and .pdf)

  • Using shiny gadgets to instantly analyze a data set and generate a report

Summary

New reliability tools are under development for R

  • R package SMRD - implements the methods presented in the Statistical Methods for Reliability Data text

  • R package teachingSMRD - makes it easier to use SMRD in the classroom

  • Statistical Methods for Reliability Data in R - extends the SMRD package documentation with in-depth examples and advanced use-cases

The plan is to publish the packages to CRAN in late 2018

Want To Know More? Contact Me!

Maj. Jason Freels, Air Force Institute of Technology

QUESTIONS?