W. Q. Meeker, L. A. Escobar, and J. K. Freels
06 October 2016
The background of the SMRD package we'll use throughout this course
Basic ideas behind product reliability
Reasons for collecting reliability data
Distinguishing features of reliability data
General models for reliability data
Examples of reliability data and motivations for collecting the data
A general strategy used for data analysis, modeling, and inference from reliability data
Quality: how well a population of manufactured products conforms to the initial design requirements and specifications
A product can have BOTH high quality AND low reliability
This highlights the importance of developing good reliability requirements early on in a program
Academic discipline focused on the analysis, characterization, and measurement of system failures to increase system design life and improve system availability by:
Eliminating and/or reducing the likelihood of failures and safety risks
Reducing downtime due to maintenance
Merging many statistical and engineering disciplines
...is engineering in its most practical form...
James R. Schlesinger, U.S. Secretary of Defense (1973-1975)
Ever-increasing system complexity and sophistication
Public awareness and insistence on product reliability
Profit considerations resulting from the high cost of failures, repairs, and warranty programs
Contractual requirements to meet reliability and maintainability performance specifications
Laws and regulations concerning product liability and safety
Food and Drug Act
Flammable Fabrics Act
Federal Hazardous Substance Act
National Traffic and Motor Vehicle Safety Act
Fire Research and Safety Act
Child Protection and Toy Safety Act
Poison Prevention Packaging Act
Occupational Safety and Health Act
Federal Boat Safety Act
Consumer Product Safety Act
Engineering (deterministic) approach
Prevent failures by designing in a safety factor of 4 to 10 times the expected average stress
Can result in over-designed products, leading to dramatically increased costs
Can also result in under-designed products if an unanticipated load or a material weakness results in a failure
Probabilistic approach
Treats failures as random events
In theory, if we understood the exact physics and chemistry of a failure process, many failures of a component could be predicted with certainty
With limited data on the state of a component, and incomplete knowledge of the processes that cause failures, failures will appear to occur at random over time
This random process may exhibit a pattern which can be modeled by some probability distribution
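As a minimal sketch of this idea, the Python snippet below draws random failure times from a Weibull distribution and compares the observed fraction failed by time t with the model's cumulative distribution function F(t). The Weibull model and its parameters (scale ETA = 1000 hours, shape BETA = 2) are hypothetical choices for illustration. Individual failure times look random, but their overall pattern follows the distribution.

```python
import math
import random

# Hypothetical Weibull parameters, chosen only for illustration
ETA = 1000.0  # scale (characteristic life), hours
BETA = 2.0    # shape

def weibull_cdf(t: float) -> float:
    """Model probability that a unit fails by time t: F(t) = 1 - exp(-(t/ETA)**BETA)."""
    return 1.0 - math.exp(-((t / ETA) ** BETA)) if t > 0 else 0.0

def sample_failure_times(n: int, seed: int = 42) -> list[float]:
    """Draw n random failure times; random.weibullvariate takes (scale, shape)."""
    rng = random.Random(seed)
    return [rng.weibullvariate(ETA, BETA) for _ in range(n)]

def fraction_failed_by(times: list[float], t: float) -> float:
    """Empirical fraction of the sample that failed at or before time t."""
    return sum(x <= t for x in times) / len(times)

if __name__ == "__main__":
    times = sample_failure_times(10_000)
    for t in (250.0, 500.0, 1000.0, 2000.0):
        print(f"t = {t:6.0f} h  observed = {fraction_failed_by(times, t):.3f}"
              f"  model F(t) = {weibull_cdf(t):.3f}")
```

At t = ETA the model gives F(t) = 1 - exp(-1), about 0.632, for any shape value, which is why the Weibull scale is often called the characteristic life.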
Product consumers & producers often have different priorities when it comes to product reliability
Develop improved materials that can enable new capabilities
Find more efficient system architectures
Issue new safety requirements (for products and employees)
Create new environmental regulations
Develop newer/better/cheaper products that can impact your profit margins
Competition to get your product to market first
Develop new test capabilities to find and remove failures
Derive new analysis techniques to reduce test time or number of samples
Assess performance characteristics of materials over design life
Predict product reliability
Assess the effect of a proposed design change
Compare components from different manufacturers
Assess product reliability in field
Check the veracity of an advertising claim
Track new failure modes
Predict product warranty costs
Ensure safety requirements are met
Reliability
Software required to solve all but the most basic problems (this class will use R)
Data are typically
Observations (e.g., time or cycles to failure) are strictly positive
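Estimating model parameters is usually not the primary interest; the focus is instead on quantities such as failure probabilities and life-distribution quantiles (e.g., the time by which 10% of units fail)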
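Because the quantity of practical interest is usually a failure probability or a life quantile rather than a parameter itself, a fitted model is typically summarized by the time t_p by which a fraction p of units fail. A minimal sketch, assuming a Weibull model with hypothetical parameters (not estimates from any data set in this course):

```python
import math

def weibull_quantile(p: float, eta: float, beta: float) -> float:
    """Time t_p by which a fraction p of units is expected to fail.

    Inverts F(t) = 1 - exp(-(t/eta)**beta) to get t_p = eta * (-ln(1 - p))**(1/beta).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    # math.log1p(-p) computes ln(1 - p) accurately for small p
    return eta * (-math.log1p(-p)) ** (1.0 / beta)

if __name__ == "__main__":
    eta, beta = 1000.0, 2.0  # hypothetical scale (hours) and shape
    for p in (0.01, 0.10, 0.50):
        print(f"{p:.0%} of units expected to fail by {weibull_quantile(p, eta, beta):.1f} h")
```

Two models with quite different parameter values can give similar low quantiles, which is one reason reliability conclusions are usually stated in terms of t_p or F(t).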
This section lists several real-world examples of reliability data sets
Demonstrates the wide range of reliability data structures
Right, left, and interval censoring
Multiple failure modes
Different usage measures (time, flight hours, miles driven, rounds fired)
Explanatory variables (accelerated stresses, differing usage environments)
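One common way to handle right, left, and interval censoring in a single framework is to record every observation as an interval (lower, upper) known to contain the failure time. The sketch below assumes a Weibull model with hypothetical parameters: an exact failure contributes the density f(t) to the likelihood, while a censored observation contributes the probability F(upper) - F(lower). The data values are made up for illustration.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Observation:
    lower: float  # failure occurred after this time (0.0 if left-censored)
    upper: float  # failure occurred before this time (math.inf if right-censored)

    @property
    def kind(self) -> str:
        if self.lower == self.upper:
            return "exact"
        if math.isinf(self.upper):
            return "right-censored"
        if self.lower == 0.0:
            return "left-censored"
        return "interval-censored"

def weibull_cdf(t: float, eta: float, beta: float) -> float:
    if t <= 0.0:
        return 0.0
    return 1.0 - math.exp(-((t / eta) ** beta))  # exp(-inf) == 0.0, so F(inf) == 1.0

def weibull_pdf(t: float, eta: float, beta: float) -> float:
    z = t / eta
    return (beta / eta) * z ** (beta - 1.0) * math.exp(-(z ** beta))

def log_likelihood(data: list[Observation], eta: float, beta: float) -> float:
    """Sum of each observation's log-likelihood contribution."""
    total = 0.0
    for obs in data:
        if obs.kind == "exact":
            total += math.log(weibull_pdf(obs.lower, eta, beta))
        else:
            prob = weibull_cdf(obs.upper, eta, beta) - weibull_cdf(obs.lower, eta, beta)
            total += math.log(prob)
    return total

# Hypothetical data, one observation of each type (times in hours)
data = [
    Observation(120.0, 120.0),      # failed at exactly 120 h
    Observation(500.0, math.inf),   # still running when the test stopped at 500 h
    Observation(0.0, 50.0),         # found failed at the first inspection (50 h)
    Observation(200.0, 300.0),      # failed between inspections at 200 h and 300 h
]

if __name__ == "__main__":
    for obs in data:
        print(obs, obs.kind)
    print("log-likelihood at eta=1000, beta=2:", log_likelihood(data, 1000.0, 2.0))
```

Maximizing this log-likelihood over (eta, beta) gives maximum likelihood estimates that use the censored units instead of discarding them.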
Explanatory variables are commonly used in regression modeling or design of experiments
May be called regressors or experimental factors in certain contexts
Used to reduce uncertainty in responses by incorporating differences in how units were tested
In the context of reliability, explanatory variables can be used to define the severity of an environment to which a system was exposed
The severity of the environment affects how quickly a product's "life" is consumed
If it can reasonably be assumed that every unit was exposed to an equivalent environment, explanatory variables may be ignored and the failure observations \(t_i,\ i=1,2,\ldots\) are assumed to be independent and identically distributed (iid)
Assuming \(t_i \overset{iid}{\sim} F\) for all \(i\) implies that any differences among the sampled units or in the test process are absorbed into the spread of the single distribution \(F\) (e.g., \(\mathrm{Var}[T]\))
The time-scale used to measure operating life differs between test units
Company A specifies the lifetime of landing gear components in flight hours
Company B specifies the lifetime of landing gear components in take-off/landing cycles
The severity of the test environment is not considered
Often, units are tested at multiple environments with differing levels of severity
To drive failures more quickly
To reduce the overall test time
Recall that explanatory variables can be used to describe how quickly "life" is consumed in each environment
Changing the value of an explanatory variable changes the value of one or more parameters in the underlying failure distribution (think regression)
Complex models are often required to compare life consumption rates between severity levels
Examples
Cracks grow faster when a higher level of stress is applied
Tire treads wear faster when driven on gravel roads
Metal corrodes faster when it is exposed to humid environments
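A minimal sketch of how an explanatory variable can change a distribution parameter, assuming a log-linear relationship log(eta) = B0 + B1*x between the Weibull scale eta and a stress variable x, with the shape parameter held fixed across stress levels. The coefficients B0 and B1 and the stress values are hypothetical; a real life-stress model must be justified by the physics of the failure mode and estimated from data.

```python
import math

# Hypothetical coefficients for log(eta) = B0 + B1 * x
B0 = 9.0
B1 = -0.05  # negative: higher stress -> smaller scale -> shorter lives

def weibull_scale(x: float) -> float:
    """Weibull scale parameter (hours) at stress level x."""
    return math.exp(B0 + B1 * x)

def acceleration_factor(x_use: float, x_test: float) -> float:
    """Ratio of life at use stress to life at test stress.

    With a common shape parameter, every quantile scales by the same factor,
    so one test hour at x_test "consumes" this many hours of use-level life.
    """
    return weibull_scale(x_use) / weibull_scale(x_test)

if __name__ == "__main__":
    x_use, x_test = 20.0, 60.0  # hypothetical stress levels
    print(f"scale at use stress:  {weibull_scale(x_use):.0f} h")
    print(f"scale at test stress: {weibull_scale(x_test):.0f} h")
    print(f"acceleration factor:  {acceleration_factor(x_use, x_test):.2f}")
```

Holding the shape fixed is itself an assumption: if the failure mechanism changes at high stress, the shape typically changes too, and the acceleration factor no longer transfers test results to the use environment.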
It is critical that the results observed at higher-stress environments represent the behavior observed at the use-level stress
We could subject a test unit to a temperature that would cause some component to melt, but that would not give any useful information about the time to failure out in the field
We can generate failures very quickly, even instantly in some cases
But, the desire to reduce the total test time should not cause failures that would never be observed in the operating environment
Occur when the value of a system performance measure crosses above or below a critical value
Power window motor raises/lowers windows too slowly
Tire tread depth falls below a safe level
Power window motor gear teeth wear down to the point where the gear fails
Tire wear becomes so extreme that the tire fails
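The link between a degradation measure and a failure time can be sketched by assuming a linear degradation path D(d) = D0 - r*d for tread depth over distance d, with the wear rate r varying from tire to tire. A failure is recorded when D(d) reaches a critical level, so the failure distance is (D0 - D_CRIT)/r. The initial depth, critical depth, and lognormal wear-rate parameters below are hypothetical, chosen only for illustration.

```python
import random

D0 = 8.0      # hypothetical new-tire tread depth, mm
D_CRIT = 1.6  # hypothetical critical tread depth, mm

def failure_distance(wear_rate: float) -> float:
    """Distance (thousands of km) at which D(d) = D0 - wear_rate * d reaches D_CRIT."""
    if wear_rate <= 0:
        raise ValueError("wear rate must be positive")
    return (D0 - D_CRIT) / wear_rate

def simulate_failure_distances(n: int, seed: int = 7) -> list[float]:
    """Unit-to-unit variation in wear rate (mm per 1000 km) turns one
    degradation model into a distribution of failure distances."""
    rng = random.Random(seed)
    rates = [rng.lognormvariate(-2.5, 0.3) for _ in range(n)]
    return sorted(failure_distance(r) for r in rates)

if __name__ == "__main__":
    distances = simulate_failure_distances(1000)
    median = distances[len(distances) // 2]
    print(f"shortest: {distances[0]:.0f}, median: {median:.0f}, "
          f"longest: {distances[-1]:.0f} (thousand km)")
```

Here the failure is defined by crossing a threshold, so changing D_CRIT changes the failure-time distribution even though the physical wear process is the same.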
Used to ASSESS the performance of MATURE products
Most statistics courses discuss enumerative studies
Test a random sample from a population of units
Observe test results, analyze data, and make conclusions
BUT, what if your conclusion is these parts suck and don't meet the reliability requirement?
Used to IMPROVE the performance of IMMATURE products
Most reliability tests are Analytic Studies
You improve the design, and now...
Your test data is based on a prior design that doesn't exist anymore
Failure results when a flaw is exposed to a severe environment for a sufficient period of time.
Flaws result from poor designs
Flaws can result from manufacturing
Flaws can result if the usage environment is not well understood
The environment in which the system operates imparts stresses on the system which over time can
Extend microcracks
Loosen joints
Weaken electrical connections
Magnify vibrations
Elevate internal temperatures
These enlarged flaws weaken the system until the environmental stresses exceed the flaw's residual strength
It's well understood that a system's performance is affected by its operating environment
It's less understood how many different environments a system is actually exposed to
Example: A car
Timescale 1: Years of ownership
Timescale 2: Miles driven
Timescale 3: Number of times started
Each customer may have their own time-scale of interest
Each system component has a time-scale that is most appropriate
When it leaves the assembly line?
When it ships from factory to a retailer?
When it is purchased by a customer?
When it is installed/unboxed?
When it is first used?
Is it even possible to know when each of these events occurs?
When the customer realizes it has stopped functioning?
When a warranty claim is made?
When it is received by the manufacturer?
As this is a statistics course, defining the time origin and the failure time will not be our main focus
However, recognize that these issues must be considered when collecting time-to-event data
Light bulbs
Small appliances
Consumable items (filters, belts, hoses)
Tires (tread depth)
Computers (processing speed, obsolescence)
Non-repairable systems may be represented by a transition diagram with one working state and one or more absorbing failure states
State 0: Initial working state
States 1, 2,...: Absorbing failed states
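A non-repairable system with several absorbing failure states can be simulated as competing failure modes: starting in State 0, the unit moves to the failed state of whichever mode occurs first. The sketch assumes independent, exponentially distributed mode times with hypothetical rates; under that assumption the long-run fraction of units absorbed in state k is rate_k divided by the sum of all rates.

```python
import random

# Hypothetical constant failure rates (per 1000 hours) for each absorbing state
RATES = {1: 0.20, 2: 0.05}

def simulate_unit(rng: random.Random) -> tuple[float, int]:
    """Return (time of leaving State 0, absorbing failed state entered)."""
    mode_times = {state: rng.expovariate(rate) for state, rate in RATES.items()}
    first = min(mode_times, key=mode_times.get)  # earliest mode ends the unit's life
    return mode_times[first], first

def simulate(n: int, seed: int = 2016) -> list[tuple[float, int]]:
    rng = random.Random(seed)
    return [simulate_unit(rng) for _ in range(n)]

if __name__ == "__main__":
    results = simulate(20_000)
    total_rate = sum(RATES.values())
    for state, rate in RATES.items():
        observed = sum(s == state for _, s in results) / len(results)
        print(f"State {state}: simulated {observed:.3f}, theory {rate / total_rate:.3f}")
```

Each unit enters exactly one absorbing state, so the time for the other mode is never observed; from that mode's point of view the unit is right-censored.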
Reliability tests are often campaigns composed of many disparate tests
Each test will have its own goals
The type(s) of test used depend on where the system is in its development
The goal of a particular test may not align with the overall goal of the test campaign
Environmental test chambers
Performance diagnostic equipment
Data acquisition systems
Personnel - technical subject-matter experts (SMEs) for root cause analyses
Test ranges
Maintenance equipment
Personnel - operational SMEs for realistic performance exercises
Test environments should be similar to the fielded environment
Ensure test results are representative of fielded operations
Consumers want the test environment to be
Producers want the test environment to be
Fielded systems will be exposed to multiple stresses simultaneously
Every failure mode will respond to each stress differently
If a failure mode is sensitive to thermal loads, a vibration test may not produce meaningful results
Do the data being gathered in the test support this?
Often, data from
This is a very active research area
Assess the data nonparametrically - using simple graphical methods
Fit one or more simple parametric models to the data
Check for violations of model assumptions
Compute model parameter estimates and confidence intervals
Use numeric and graphical methods to check fit to data
Perform sensitivity analyses on parameter values and model assumptions
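For the first step (a nonparametric look at the data), one widely used estimator for right-censored data is the Kaplan-Meier (product-limit) estimator of the reliability function R(t) = 1 - F(t). A minimal, dependency-free sketch follows; it handles right censoring only, and the data are hypothetical. In practice the estimates would be plotted (e.g., as a probability plot) before choosing a parametric model.

```python
def kaplan_meier(times: list[float], failed: list[bool]) -> list[tuple[float, float]]:
    """Product-limit estimate of R(t) = 1 - F(t) for right-censored data.

    times:  observed time for each unit (failure or censoring time)
    failed: True if the unit failed at that time, False if it was censored
    Returns (failure time, estimated R just after that time) pairs.
    """
    data = sorted(zip(times, failed))
    at_risk = len(data)
    reliability = 1.0
    estimates = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = removed = 0
        # Group all units observed at time t; units censored at t are
        # treated as still at risk when failures at t are counted.
        while i < len(data) and data[i][0] == t:
            failures += int(data[i][1])
            removed += 1
            i += 1
        if failures:
            reliability *= 1.0 - failures / at_risk
            estimates.append((t, reliability))
        at_risk -= removed
    return estimates

if __name__ == "__main__":
    # Hypothetical test: 6 units, hours to failure or to removal from test
    times = [50, 80, 80, 120, 150, 200]
    failed = [True, True, False, True, False, True]
    for t, r in kaplan_meier(times, failed):
        print(f"t = {t:>3} h  R(t) = {r:.3f}  F(t) = {1 - r:.3f}")
```

The estimate only changes at failure times; censored units reduce the number at risk but never cause a drop by themselves.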
Examples of time-to-failure data for different types of systems
A strategy for estimating system performance metrics from the data
The most common reliability metric is the failure-time distribution
Many failure-time processes are modeled on a continuous scale (e.g., time)
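For a continuous failure time T, the failure-time distribution can be described equivalently by the cdf F(t) = Pr(T <= t), the reliability (survival) function R(t) = 1 - F(t), the density f(t) = dF(t)/dt, or the hazard function h(t) = f(t)/R(t). The sketch below evaluates all four for a Weibull model with a hypothetical scale; a shape parameter below 1, equal to 1, or above 1 gives a decreasing, constant, or increasing hazard, respectively.

```python
import math

def weibull_functions(t: float, eta: float, beta: float) -> dict[str, float]:
    """Evaluate F, R, f, and h at time t > 0 for a Weibull(scale=eta, shape=beta) model."""
    if t <= 0:
        raise ValueError("t must be positive")
    z = (t / eta) ** beta
    R = math.exp(-z)          # reliability: Pr(T > t)
    F = 1.0 - R               # cdf: Pr(T <= t)
    f = (beta / t) * z * R    # density: (beta/eta) * (t/eta)**(beta-1) * R
    h = beta * z / t          # hazard: equals f/R, computed directly to avoid dividing by a tiny R
    return {"F": F, "R": R, "f": f, "h": h}

if __name__ == "__main__":
    eta = 1000.0  # hypothetical scale, hours
    for beta in (0.5, 1.0, 3.0):
        hazards = [weibull_functions(t, eta, beta)["h"] for t in (100.0, 500.0, 1000.0)]
        print(f"beta = {beta}: h(100), h(500), h(1000) = "
              + ", ".join(f"{h:.2e}" for h in hazards))
```

The hazard is often the most useful of the four for engineering interpretation, because it describes the failure propensity of units that have survived to time t.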