Addressing FPSO Failures: Strategies for Mitigation and Predictive Maintenance
Shape Technical Report
This study aims to conduct a comprehensive analysis of FPSO platform failures and provide valuable insights for the timing of maintenance measures. By examining historical data and employing advanced analytics techniques, patterns and indicators preceding failures will be identified. Predictive models will be developed to forecast potential failure events, offering significant implications for the oil production industry. Accurate predictions of maintenance actions will enable operators to minimize downtime, optimize production efficiency, enhance operational reliability, reduce costs, and improve overall performance in the FPSO sector.
Outline
1 - Introduction
2 - Methodology
3 - Exploratory Data Analysis
4 - Main Hypothesis
5 - Modelling
6 - Conclusions
Introduction
We have collected measurements from sensors that capture temperature, vibration magnitude in the x, y, and z directions, as well as the underlying vibration frequency.
The sensors record these features continuously for a single piece of equipment across different time periods (1).
To gain insights from the data, we will begin by conducting an exploratory data analysis (EDA).
I will make certain assumptions and then apply classification models such as logistic regression and GLMNET. The latter can be viewed as logistic regression with regularization to mitigate overfitting and multicollinearity. This choice gives us interpretable parameters from which to draw insights.
Further, I will use R to conduct the analysis and create visualizations with ggplot2.
Python could also be used for data wrangling and model training, but I chose to set it aside and use R only.
The dataset consists of 800 measurements taken from the same machine under different configurations, one measurement per running cycle.
Below, we can see how the dataset is organized:
display(dataset.head(5).to_markdown())

| | Cycle | Preset_1 | Preset_2 | Temperature | Pressure | VibrationX | VibrationY | VibrationZ | Frequency | Fail |
|---:|--------:|-----------:|-----------:|--------------:|-----------:|-------------:|-------------:|-------------:|------------:|:-------|
| 0 | 1 | 3 | 6 | 44.2352 | 47.6573 | 46.4418 | 64.8203 | 66.4545 | 44.4832 | False |
| 1 | 2 | 2 | 4 | 60.8072 | 63.1721 | 62.006 | 80.7144 | 81.2464 | 60.2287 | False |
| 2 | 3 | 2 | 1 | 79.0275 | 83.0322 | 82.6421 | 98.2544 | 98.7852 | 80.9935 | False |
| 3 | 4 | 2 | 3 | 79.7162 | 100.509 | 122.362 | 121.363 | 118.653 | 80.3156 | False |
| 4 | 5 | 2 | 5 | 39.9891 | 51.7648 | 42.5143 | 61.0379 | 50.7165 | 64.2452 | False |
The preset variables here are predetermined configurations over which the FPSO will run at each cycle. For each preset, sensors capture measurements that are assumed to represent different aspects of the operation:
Temperature: Sensors capture the average temperature during the cycle.
Pressure: Sensors measure the pressure levels during the cycle.
VibrationX, VibrationY, VibrationZ: Sensors detect vibrations in the X, Y, and Z directions respectively, providing insights into the equipment’s mechanical stability and movement.
Frequency: Sensors record the frequency of certain phenomena or events relevant to the operation.
Fail: A boolean variable indicating the failure status of the equipment. True (1) means that the equipment failed during the cycle; False (0) means that it did not.
Methodology
In this work, we will proceed by following the steps outlined below.
Assessing Data Dependencies:
It is essential to determine whether the collected measurements can be considered independent and identically distributed (IID).
Further investigation is conducted to assess potential dependencies present in the data.
Exploratory Data Analysis (EDA):
An EDA is performed to gain insights from the collected data.
Various techniques and visualizations are employed to understand the data characteristics, distributions, and relationships.
Certain assumptions are made during the analysis, such as assuming the independence of data points over time.
Classification Models:
Classification models, such as ordinary Logistic Regression and GLMNET, are applied. GLMNET is a logistic regression model with regularization to mitigate overfitting and provide interpretable parameters.
These models help gain further insights and understand the relationships between variables.
Conclusions are drawn at the end of the analysis, and further work is suggested.
Exploratory Data Analysis
To gain insights into failure patterns, we will address the following basic questions:
How many times has the equipment failed?
failure_count = dataset['Fail'].sum()
print("Number of equipment failures:", failure_count)

Number of equipment failures: 66

py$dataset %>%
  group_by(Fail) %>%
  summarize(count = n()) %>%
  ggplot(aes(x = Fail, y = count, fill = Fail)) +
  geom_col(width = 0.5) +
  geom_text(aes(label = count), colour = 'white', position = position_stack(vjust = 0.5)) +
  theme_bw() +
  labs(title = 'Has the equipment failed?', x = 'Failed', y = 'Frequency')

The target variable is highly imbalanced (66 failures in 800 cycles), and remedies for this imbalance should be considered later in the analysis.
Categorize equipment failures by setup configuration (Preset 1 and Preset 2)
Next, we look at the failure distribution for each preset configuration.
I chose to perform chi-squared tests of independence to assess whether there is any relationship between the presets and the failure rates. The results are shown below.
Failure distribution for Preset 1

| Fail | 1 | 2 | 3 |
|---|---|---|---|
| FALSE | 237 | 260 | 237 |
| TRUE | 27 | 21 | 18 |

Pearson's Chi-squared test: X-squared = 2.0655, df = 2, p-value = 0.356

Failure distribution for Preset 2

| Fail | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| FALSE | 84 | 92 | 95 | 90 | 88 | 92 | 100 | 93 |
| TRUE | 11 | 9 | 6 | 3 | 12 | 9 | 9 | 7 |

Pearson's Chi-squared test: X-squared = 7.3847, df = 7, p-value = 0.39
The Chi-squared tests check whether there is any association between each preset, taken marginally, and the failure outcome.
- For Preset 1, the p-value of the test was 0.356,
- For Preset 2, the p-value of the test was 0.39.
At the 5% level of significance, this suggests that there is not sufficient evidence to conclude a significant direct association between Presets and Failure rates.
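As a cross-check, the Pearson statistic can be recomputed directly from the two contingency tables reported above. The sketch below uses only the Python standard library; the function name `chi_square_statistic` is illustrative and not part of the original analysis. It returns the statistic and the degrees of freedom; the p-value would then come from the chi-squared distribution with that many degrees of freedom.

```python
def chi_square_statistic(table):
    """Pearson chi-squared statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Rows: Fail = FALSE, Fail = TRUE. Columns: preset levels.
preset_1 = [[237, 260, 237], [27, 21, 18]]
preset_2 = [[84, 92, 95, 90, 88, 92, 100, 93], [11, 9, 6, 3, 12, 9, 9, 7]]

for name, table in [("Preset 1", preset_1), ("Preset 2", preset_2)]:
    stat, df = chi_square_statistic(table)
    print(f"{name}: X-squared = {stat:.4f}, df = {df}")
```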
Cumulative distribution over the presets
The table and plot below show the cumulative distribution of failures over the preset configurations. The x axis shows the preset configuration. The configurations are sorted by failure count in descending order, so the first configuration has the most failures and the last has the fewest.
| | Preset_1 | Preset_2 | Fail | Cumulative_Failures | Failure_Percentage |
|---:|---:|---:|---:|---:|---:|
| 1 | 1 | 2 | 5 | 5 | 7.575758 |
| 4 | 1 | 5 | 5 | 10 | 15.151515 |
| 0 | 1 | 1 | 4 | 14 | 21.212121 |
| 20 | 3 | 5 | 4 | 18 | 27.272727 |
| 6 | 1 | 7 | 4 | 22 | 33.333333 |
| 8 | 2 | 1 | 4 | 26 | 39.393939 |
| 15 | 2 | 8 | 4 | 30 | 45.454546 |
| 22 | 3 | 7 | 3 | 33 | 50.000000 |
| 21 | 3 | 6 | 3 | 36 | 54.545454 |
| 16 | 3 | 1 | 3 | 39 | 59.090909 |
| 13 | 2 | 6 | 3 | 42 | 63.636364 |
| 12 | 2 | 5 | 3 | 45 | 68.181818 |
| 5 | 1 | 6 | 3 | 48 | 72.727273 |
| 10 | 2 | 3 | 2 | 50 | 75.757576 |
| 14 | 2 | 7 | 2 | 52 | 78.787879 |
| 9 | 2 | 2 | 2 | 54 | 81.818182 |
| 7 | 1 | 8 | 2 | 56 | 84.848485 |
| 17 | 3 | 2 | 2 | 58 | 87.878788 |
| 18 | 3 | 3 | 2 | 60 | 90.909091 |
| 3 | 1 | 4 | 2 | 62 | 93.939394 |
| 2 | 1 | 3 | 2 | 64 | 96.969697 |
| 11 | 2 | 4 | 1 | 65 | 98.484848 |
| 23 | 3 | 8 | 1 | 66 | 100.000000 |
| 19 | 3 | 4 | 0 | 66 | 100.000000 |

From the cumulative table, we can draw several conclusions:
- The table provides information about different presets (Preset_1 and Preset_2), the number of failures (Fail), cumulative failures (Cumulative_Failures), and the failure percentage (Failure_Percentage).
- The cumulative failures increase down the table because each row adds that configuration's failures to the running total. The rows are ordered by failure count, not by time.
- The Failure_Percentage column shows cumulative failures as a share of all 66 failures, not of all observations. It reaches 100% at the second-to-last row, so every configuration except the last one (Preset_1 = 3, Preset_2 = 4) experienced at least one failure.
The cumulative distribution plot is nearly linear, which indicates that failures are spread almost uniformly across the configurations, that is, the distribution is very flat.
In other words, a nearly linear cumulative curve suggests that failures are about equally likely under any configuration. This contrasts with skewed distributions (e.g., exponential, log-normal) or distributions with distinct peaks and valleys (e.g., Gaussian or multimodal distributions).
This suggests that the preset configuration has little influence on the failure rate.
This table shows the cumulative failure distribution for each preset configuration.
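The cumulative columns can be rebuilt from the per-configuration failure counts alone. The following is a minimal sketch using only the Python standard library; the counts are copied from the table above, and the variable names are illustrative.

```python
# Failures per (Preset_1, Preset_2) configuration, copied from the cumulative table.
failures = {
    (1, 2): 5, (1, 5): 5, (1, 1): 4, (3, 5): 4, (1, 7): 4, (2, 1): 4,
    (2, 8): 4, (3, 7): 3, (3, 6): 3, (3, 1): 3, (2, 6): 3, (2, 5): 3,
    (1, 6): 3, (2, 3): 2, (2, 7): 2, (2, 2): 2, (1, 8): 2, (3, 2): 2,
    (3, 3): 2, (1, 4): 2, (1, 3): 2, (2, 4): 1, (3, 8): 1, (3, 4): 0,
}

total = sum(failures.values())

# Sort configurations by failure count, largest first (ties keep input order).
ranked = sorted(failures.items(), key=lambda item: item[1], reverse=True)

rows = []
running = 0
for config, count in ranked:
    running += count
    # Share of ALL failures accounted for so far, in percent.
    rows.append((config, count, running, 100 * running / total))

for config, count, cumulative, pct in rows:
    print(config, count, cumulative, f"{pct:.2f}")
```

Because the percentage is taken relative to the total number of failures, it necessarily ends at 100% regardless of how many cycles ran without failing.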
Categorize equipment failures by their nature/root cause according to parameter readings (temperature, pressure, and others).
We have generated synthetic data using a method called ADASYN (Adaptive Synthetic Sampling). Like SMOTE, ADASYN creates synthetic minority-class samples by interpolating between a minority sample and its nearest neighbours (KNN). Unlike SMOTE, ADASYN generates samples adaptively over the manifold created by the features, producing more of them in regions with severe class imbalance.
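To make the idea concrete, here is a minimal pure-Python sketch of the ADASYN procedure on a toy two-dimensional dataset. It is not the implementation used for this report: the function name `adasyn`, the parameters `k`, `beta`, and `seed`, and the toy points are all assumptions made for illustration.

```python
import math
import random


def adasyn(X, y, k=5, beta=1.0, seed=0):
    """Return synthetic minority samples (label 1) for a binary dataset."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == 1]
    n_major = len(X) - len(minority)
    n_new = int((n_major - len(minority)) * beta)  # total samples to create

    def nearest(point, pool):
        # Indices of the k points in `pool` closest to `point`, excluding itself.
        order = sorted(range(len(pool)), key=lambda j: math.dist(point, pool[j]))
        return [j for j in order if pool[j] is not point][:k]

    # For each minority point: share of majority points among its k neighbours.
    ratios = []
    for x in minority:
        neighbours = nearest(x, X)
        ratios.append(sum(1 for j in neighbours if y[j] == 0) / k)
    total = sum(ratios)
    if total == 0 or n_new <= 0:
        return []

    synthetic = []
    for x, ratio in zip(minority, ratios):
        n_i = round(ratio / total * n_new)  # harder points get more samples
        neighbours = nearest(x, minority)
        if not neighbours:
            continue
        for _ in range(n_i):
            z = minority[rng.choice(neighbours)]
            lam = rng.random()
            # Interpolate between the point and a random minority neighbour.
            synthetic.append([a + lam * (b - a) for a, b in zip(x, z)])
    return synthetic


# Toy data (hypothetical): 25 majority points on a grid, 4 minority points.
majority_pts = [[float(i), float(j)] for i in range(5) for j in range(5)]
minority_pts = [[2.5, 2.5], [2.6, 2.4], [10.0, 10.0], [10.5, 10.2]]
X = majority_pts + minority_pts
y = [0] * len(majority_pts) + [1] * len(minority_pts)

synthetic = adasyn(X, y, k=3)
print(len(synthetic), "synthetic minority samples")
```

Because each count is rounded per point, the number of generated samples can differ slightly from the requested total.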
First, we conducted some data analysis to examine how the preset configurations influence the numerical measurements and to determine whether an increase in each measurement is associated with higher failure rates.
Density estimators for the features without and with oversampling (ADASYN):
This plot displays density estimators for the distributions of sensor measurements categorized into Failure and non-failure groups.
[Figure: density plots. Left panel: dataset without oversampling. Right panel: dataset with oversampling.]
Based on the observed data, there is a notable indication that higher values might be linked to increased failure rates. This is evident from the rightward shift in the distributions of the Failure group. To investigate further, we can run a battery of one-way ANOVA tests to assess whether the measurements differ across the preset configurations.
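For reference, the F statistic of a one-way ANOVA is the between-group mean square divided by the within-group (residual) mean square. The sketch below computes it in plain Python on small hypothetical groups; the function name and the toy numbers are illustrative only and are not taken from the dataset.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    values = [v for group in groups for v in group]
    grand_mean = sum(values) / len(values)
    means = [sum(group) / len(group) for group in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)

    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical readings for three configurations.
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_value, df_between, df_within = one_way_anova_f(toy_groups)
print(f_value, df_between, df_within)
```

With 24 preset configurations and 800 observations, the same formulas give 23 and 776 degrees of freedom.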
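One-way ANOVA output, one test per measurement (the variable labels were not preserved in the output). In each test the first row, with 23 degrees of freedom, is the preset term and the second row is the residuals:

| Test | Row | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|---|
| 1 | Presets | 23 | 28856.39 | 1254.626 | 1.194321 | 0.2410519 |
| 1 | Residuals | 776 | 815182.59 | 1050.493 | NA | NA |
| 2 | Presets | 23 | 20416.23 | 887.6620 | 1.375968 | 0.1124494 |
| 2 | Residuals | 776 | 500611.79 | 645.1183 | NA | NA |
| 3 | Presets | 23 | 11509.97 | 500.4334 | 0.5058131 | 0.9748019 |
| 3 | Residuals | 776 | 767746.64 | 989.3642 | NA | NA |
| 4 | Presets | 23 | 37461.05 | 1628.741 | 1.543265 | 0.0499838 |
| 4 | Residuals | 776 | 818979.79 | 1055.386 | NA | NA |
| 5 | Presets | 23 | 17773.69 | 772.7692 | 0.9966092 | 0.4669688 |
| 5 | Residuals | 776 | 601709.12 | 775.3983 | NA | NA |
| 6 | Presets | 23 | 17028.07 | 740.3510 | 0.8686648 | 0.6423903 |
| 6 | Residuals | 776 | 661374.04 | 852.2861 | NA | NA |

Next, we have the effect of each preset on the measurements: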
Preset effects on measurement distributions:
From the plots above, we can conclude that certain presets widen the spread of the measurements, introducing more variability into the process.
The configurations associated with narrower, left-shifted distributions are the most favourable.
Boxplot grid for the variables according to the preset configurations:
Let's draw some intuitions. The plots above show that configuration 3-4 (Preset_1 = 3, Preset_2 = 4) is the only one under which no failures occurred.
Configuration 3-4 is the most stable; all others have higher values for at least one of the measured variables. The boxplots clearly show that higher values are associated with failed cycles.
dataset %>%
  dplyr::filter(config == '3 4') %>%
  select(Pressure, Temperature, Frequency, VibrationX, VibrationY, VibrationZ) %>%
  summary() %>%
  t() %>%
  kable(caption = "Summary of the variables from the original dataset: Config = 3 4") %>%
  kable_styling(font_size = 12, latex_options = 'HOLD_position')

Summary of the variables from the original dataset: Config = 3 4

| Variable | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| Pressure | 29.14 | 55.88 | 75.22 | 81.22 | 110.94 | 146.73 |
| Temperature | 30.22 | 47.41 | 62.72 | 68.96 | 81.17 | 139.17 |
| Frequency | 24.86 | 45.09 | 63.40 | 60.59 | 70.35 | 121.60 |
| VibrationX | 12.76 | 48.19 | 70.65 | 71.30 | 87.87 | 149.72 |
| VibrationY | 18.27 | 47.60 | 62.63 | 68.10 | 85.11 | 120.72 |
| VibrationZ | 28.99 | 50.44 | 61.12 | 68.57 | 89.21 | 130.43 |

Now, let's see the summary statistics for config = '1 2', restricted to the observations that failed.
Summary statistics for the variables from the original dataset: Config = 1 2

| Variable | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| Pressure | 69.92 | 73.12 | 80.41 | 95.88 | 110.69 | 145.27 |
| Temperature | 48.17 | 80.36 | 81.55 | 80.96 | 88.75 | 105.97 |
| Frequency | 48.92 | 90.47 | 90.95 | 94.43 | 110.71 | 131.07 |
| VibrationX | 88.98 | 89.14 | 89.35 | 115.61 | 149.40 | 161.19 |
| VibrationY | 71.19 | 75.03 | 104.74 | 103.11 | 111.03 | 153.58 |
| VibrationZ | 80.54 | 90.30 | 129.17 | 122.00 | 149.77 | 160.24 |
Modelling
Classification Model for Failure Prediction
With the insights gained from the exploratory analysis, we can now carry out the modelling using the assumptions set in the previous step.
First, we will fit a GLMNET with a logit link function on the oversampled data and check whether it performs well compared with the model trained on the imbalanced dataset.
This model should be useful because its regularization automatically emphasizes the most important features. We will therefore fit it alongside an ordinary logistic regression to assess which model better predicts equipment failure from the sensor measurements.
To train these models, we will take the following steps:
- Split the data into training and test sets (70/30 proportion).
- Apply ADASYN (Adaptive Synthetic Sampling) oversampling to the training set only.
- Train the model on the balanced training set.
- Evaluate the model’s performance on the unchanged test set.
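The ordering of these steps matters: oversampling must happen after the split so that no synthetic copy of a test observation leaks into training. The sketch below illustrates that ordering in plain Python under stated assumptions. The rows are placeholder records rather than the real dataset, the split is a simple seeded shuffle, and `random_oversample` is a naive stand-in that duplicates minority rows; it is not ADASYN.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and cut it into train and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def random_oversample(rows, seed=42):
    """Stand-in for ADASYN: duplicate minority rows until classes are balanced."""
    rng = random.Random(seed)
    minority = [r for r in rows if r["fail"]]
    majority = [r for r in rows if not r["fail"]]
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return rows + extra


# Placeholder data: 800 cycles, 66 of them flagged as failures.
rows = [{"cycle": i, "fail": i % 12 == 0} for i in range(1, 801)]

train, test = train_test_split(rows)          # 1. split first
balanced_train = random_oversample(train)     # 2. oversample the training set only
# 3. fit the model on balanced_train, 4. evaluate on the untouched test set.
print(len(train), len(test), len(balanced_train))
```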
Let’s fit the models in R! We could just as easily fit them in Python using the glmnet_py library.
Ordinary Logistic Regression (GLM)
The first model is logistic regression.
Let’s fit it and see how the variables affect the outcome, Fail.
Below is the average log-likelihood function for ordinary logistic regression, without regularization.
\[ \ell(\pmb{\beta} \ | \ \pmb{X}) = \frac{1}{N}\sum_{i=1}^N \left[ y_i (\beta_0+x_{i}^\intercal\pmb{\beta})-\log\left(1+e^{\beta_0+x_{i}^\intercal\pmb{\beta}}\right) \right] \]
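The log-likelihood above translates almost line for line into code. The following sketch, in plain Python, evaluates it for given coefficients; the function name and the toy inputs are illustrative, and the logarithm term is computed in a numerically stable form so that large linear predictors do not overflow.

```python
import math


def avg_log_likelihood(beta0, beta, X, y):
    """Average Bernoulli log-likelihood of a logistic regression model."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        eta = beta0 + sum(b * v for b, v in zip(beta, x_i))  # linear predictor
        # log(1 + exp(eta)) computed without overflow for large |eta|.
        log1p_exp = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        total += y_i * eta - log1p_exp
    return total / len(y)


# Toy inputs (hypothetical): two features, four observations.
X_toy = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
y_toy = [0, 0, 1, 1]

print(avg_log_likelihood(0.0, [0.0, 0.0], X_toy, y_toy))
print(avg_log_likelihood(-5.0, [1.0, 1.0], X_toy, y_toy))
```

With all coefficients at zero every observation has predicted probability 0.5, so the value equals minus the natural logarithm of 2; a better-fitting set of coefficients gives a value closer to zero.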
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -23.6990649 | 3.1507676 | -7.521680 | 0.0000000 |
| Pressure | 0.0246172 | 0.0080827 | 3.045671 | 0.0023216 |
| Frequency | 0.0531435 | 0.0114963 | 4.622679 | 0.0000038 |
| Temperature | 0.0345102 | 0.0105590 | 3.268336 | 0.0010818 |
| VibrationX | 0.0374928 | 0.0104573 | 3.585306 | 0.0003367 |
| VibrationY | 0.0357305 | 0.0083359 | 4.286335 | 0.0000182 |
| VibrationZ | 0.0485345 | 0.0108979 | 4.453582 | 0.0000084 |
| term | df | Deviance | Resid..Df | Resid..Dev | p.value |
|---|---|---|---|---|---|
| NULL | NA | NA | 559 | 336.9880 | NA |
| Pressure | 1 | 65.18635 | 558 | 271.8016 | 0e+00 |
| Frequency | 1 | 28.78975 | 557 | 243.0119 | 1e-07 |
| Temperature | 1 | 24.85541 | 556 | 218.1564 | 6e-07 |
| VibrationX | 1 | 30.27358 | 555 | 187.8829 | 0e+00 |
| VibrationY | 1 | 40.57580 | 554 | 147.3071 | 0e+00 |
| VibrationZ | 1 | 24.38525 | 553 | 122.9218 | 8e-07 |
Evaluating model fit:
Parameters
Now let’s analyze the fitting of the model and interpret the coefficients:
The intercept term (-23.6990649) represents the estimated log odds of failure when all predictors are zero.
The categorical Preset variable was not significant, so I removed it: its configurations do not have a significant effect on the log odds of failure relative to the reference configuration (I chose Preset_1 = 3, Preset_2 = 4).
Among the continuous predictors, every coefficient has a p-value below 0.05 and is therefore statistically significant at the 5% level. Each estimate is a change in log odds, so a one-unit increase in a predictor multiplies the odds of failure by \(e^{\hat\beta}\):
A one-unit increase in Pressure multiplies the odds of failure by \(e^{0.0246172} \approx 1.025\).
A one-unit increase in Frequency multiplies the odds by \(e^{0.0531435} \approx 1.055\), the largest effect per unit.
A one-unit increase in Temperature multiplies the odds by \(e^{0.0345102} \approx 1.035\).
VibrationX has a coefficient of 0.0374928, so a one-unit increase multiplies the odds by \(e^{0.0374928} \approx 1.038\).
A one-unit increase in VibrationY multiplies the odds by \(e^{0.0357305} \approx 1.036\).
VibrationZ has the second-largest coefficient (0.0485345), after Frequency; a one-unit increase multiplies the odds by \(e^{0.0485345} \approx 1.050\).
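The conversion from coefficients to odds ratios is a single exponential per term. The short sketch below applies it to the estimates in the coefficient table; the 10-unit column is an added illustration of how the effect compounds and is not a result reported by the original analysis.

```python
import math

# Coefficient estimates copied from the logistic regression table.
coefficients = {
    "Pressure": 0.0246172,
    "Frequency": 0.0531435,
    "Temperature": 0.0345102,
    "VibrationX": 0.0374928,
    "VibrationY": 0.0357305,
    "VibrationZ": 0.0485345,
}

# Odds ratio for a one-unit increase, and for a ten-unit increase.
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}
odds_ratios_10 = {name: math.exp(10 * b) for name, b in coefficients.items()}

for name in coefficients:
    print(f"{name}: x{odds_ratios[name]:.3f} per unit, "
          f"x{odds_ratios_10[name]:.2f} per 10 units")
```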
Deviance
The difference between the null deviance (model with only the intercept) and the residual deviance (model with predictors) helps us understand how well our model performs. A larger difference means our model fits the data better.
By looking at the table, we can see that adding each variable one by one decreases the deviance. Variables like Pressure, Frequency, and Vibration have a big impact in reducing the residual deviance. A high p-value suggests that including a particular variable doesn’t add much improvement to the model. Ideally, we want to see a significant drop in deviance and the AIC (a measure of model quality) to indicate a better model fit.
Here is a breakdown of the deviance-related statistics from the table:
Null: This row is the intercept-only model. No term has been added yet, so its df and deviance entries are not available (NA). The residual df is 559, and the residual deviance is 336.9880.
Pressure: Adding the Pressure variable results in a deviance of 65.18635 with 1 df. The residual df decreases to 558, and the residual deviance decreases to 271.8016. The p-value associated with this variable is 0e+00, indicating high statistical significance.
Frequency: Including the Frequency variable leads to a deviance of 28.78975 with 1 df. The residual df becomes 557, and the residual deviance decreases further to 243.0119. The p-value for this variable is 1e-07, indicating strong statistical significance.
Temperature: Adding the Temperature variable yields a deviance of 24.85541 with 1 df. The residual df reduces to 556, and the residual deviance decreases to 218.1564. The p-value associated with this variable is 6e-07, indicating its statistical significance.
VibrationX: The inclusion of the VibrationX variable results in a deviance of 30.27358 with 1 df. The residual df becomes 555, and the residual deviance decreases to 187.8829. The p-value for this variable is 0e+00, indicating high statistical significance.
VibrationY: Including the VibrationY variable leads to a deviance of 40.57580 with 1 df. The residual df decreases to 554, and the residual deviance decreases to 147.3071. The p-value associated with this variable is 0e+00, indicating strong statistical significance.
VibrationZ: Adding the VibrationZ variable results in a deviance of 24.38525 with 1 df. The residual df becomes 553, and the residual deviance decreases to 122.9218. The p-value for this variable is 8e-07, indicating its statistical significance.
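Each p-value in the deviance table comes from comparing the drop in deviance with a chi-squared distribution with 1 degree of freedom. For 1 degree of freedom the upper tail has a closed form through the complementary error function, which the sketch below uses. It assumes the table was produced by a sequential chi-squared test, which is consistent with the reported values; the function name is illustrative.

```python
import math


def chi2_upper_tail_1df(x):
    """P(X > x) for a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


# Deviance reductions copied from the analysis-of-deviance table.
deviance_drops = {
    "Pressure": 65.18635,
    "Frequency": 28.78975,
    "Temperature": 24.85541,
    "VibrationX": 30.27358,
    "VibrationY": 40.57580,
    "VibrationZ": 24.38525,
}

p_values = {name: chi2_upper_tail_1df(d) for name, d in deviance_drops.items()}
for name, p in p_values.items():
    print(f"{name}: p = {p:.1e}")

# The individual drops add up to the gap between null and residual deviance.
total_drop = sum(deviance_drops.values())
print(f"Total reduction: {total_drop:.4f}")
```

Values printed as 0e+00 in the table are simply too small to show at the table's rounding; they are not exactly zero.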
Using ADASYN to oversample the minority class
Model2 Fit Summary

| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -37.0986974 | 3.3131339 | -11.197464 | 0e+00 |
| Temperature | 0.0606809 | 0.0092877 | 6.533484 | 0e+00 |
| Pressure | 0.0389149 | 0.0076867 | 5.062636 | 4e-07 |
| VibrationX | 0.0653861 | 0.0091865 | 7.117623 | 0e+00 |
| VibrationY | 0.0655596 | 0.0083776 | 7.825541 | 0e+00 |
| VibrationZ | 0.0747454 | 0.0093757 | 7.972265 | 0e+00 |
| Frequency | 0.0936682 | 0.0107925 | 8.678998 | 0e+00 |

ANOVA for Model2

| term | df | Deviance | Resid..Df | Resid..Dev | p.value |
|---|---|---|---|---|---|
| NULL | NA | NA | 1015 | 1408.4593 | NA |
| Temperature | 1 | 224.9049 | 1014 | 1183.5544 | 0 |
| Pressure | 1 | 160.5223 | 1013 | 1023.0321 | 0 |
| VibrationX | 1 | 125.7149 | 1012 | 897.3172 | 0 |
| VibrationY | 1 | 299.5132 | 1011 | 597.8040 | 0 |
| VibrationZ | 1 | 111.9039 | 1010 | 485.9001 | 0 |
| Frequency | 1 | 161.5295 | 1009 | 324.3706 | 0 |
Model predictive performance
Confusion matrices for the logistic regression models without and with oversampling, respectively:
Confusion Matrix and Statistics
Not_Failed Failed
Not_Failed 219 7
Failed 5 9
Accuracy : 0.95
95% CI : (0.9143, 0.9739)
No Information Rate : 0.9333
P-Value [Acc > NIR] : 0.1840
Kappa : 0.5735
Mcnemar's Test P-Value : 0.7728
Sensitivity : 0.56250
Specificity : 0.97768
Pos Pred Value : 0.64286
Neg Pred Value : 0.96903
Prevalence : 0.06667
Detection Rate : 0.03750
Detection Prevalence : 0.05833
Balanced Accuracy : 0.77009
'Positive' Class : Failed
Confusion Matrix and Statistics
Not_Failed Failed
Not_Failed 206 0
Failed 18 16
Accuracy : 0.925
95% CI : (0.8841, 0.9549)
No Information Rate : 0.9333
P-Value [Acc > NIR] : 0.7482
Kappa : 0.6041
Mcnemar's Test P-Value : 6.151e-05
Sensitivity : 1.00000
Specificity : 0.91964
Pos Pred Value : 0.47059
Neg Pred Value : 1.00000
Prevalence : 0.06667
Detection Rate : 0.06667
Detection Prevalence : 0.14167
Balanced Accuracy : 0.95982
'Positive' Class : Failed
The model trained on oversampled data was better at detecting machine failures (sensitivity 1.00 versus 0.56) but produced more false positives (18 versus 5). In this case, however, false positives are preferable to false negatives.
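The summary statistics in the two outputs can be reproduced from the four cell counts of each matrix. The sketch below assumes that rows are predictions and columns are the reference classes, which is consistent with the reported sensitivity and specificity; the function name `summarize` is illustrative.

```python
def summarize(tn, fn, fp, tp):
    """Accuracy, sensitivity, specificity, balanced accuracy and Cohen's kappa."""
    n = tn + fn + fp + tp
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # share of real failures that were caught
    specificity = tn / (tn + fp)   # share of normal cycles left alone
    # Agreement expected by chance, from the row and column totals.
    expected = ((tn + fn) * (tn + fp) + (fp + tp) * (fn + tp)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "kappa": kappa,
    }


# Cell counts read from the two confusion matrices ('Failed' is the positive class).
without_oversampling = summarize(tn=219, fn=7, fp=5, tp=9)
with_oversampling = summarize(tn=206, fn=0, fp=18, tp=16)

for label, stats in [("without", without_oversampling), ("with", with_oversampling)]:
    print(label, {k: round(v, 4) for k, v in stats.items()})
```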
Logistic Regression with regularization (GLMNET)
GLMNET is a model focused on prediction performance. Regularization techniques like glmnet are designed to address potential issues such as overfitting and multicollinearity.
It’s important to note that the purpose of glmnet is not to replicate the exact coefficient values or significance levels but rather to provide a regularized model that can potentially improve predictive performance and handle collinearity issues.
Below is the objective function for logistic regression with L1 and L2 regularization: the negative average log-likelihood plus the elastic-net penalty.
\[ \min_{\beta_0,\,\pmb{\beta}} \ -\frac{1}{N}\sum_{i=1}^N \left[ y_i (\beta_0+x_{i}^\intercal\pmb{\beta})-\log\left(1+e^{\beta_0+x_{i}^\intercal\pmb{\beta}}\right) \right]+\lambda\left[(1-\alpha)||\pmb{\beta}||_2^2/2+\alpha||\pmb{\beta}||_1\right] \]
To fit this model, we evaluate this function on our data and find the values of \(\beta_0\) and \(\pmb{\beta}\) that minimize it, just as ordinary logistic regression minimizes the negative log-likelihood alone.
Here, \(\alpha\) controls the balance between the two kinds of regularization (\(\alpha = 1\) is pure \(L_1\), \(\alpha = 0\) is pure \(L_2\)), and \(\lambda\) controls the overall strength of the penalty.
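To show how \(\alpha\) and \(\lambda\) enter the fit, the sketch below evaluates the elastic-net penalty and the full penalized objective (negative average log-likelihood plus penalty) in plain Python. It only evaluates the objective; it does not perform the coordinate-descent optimization that glmnet uses. Function names and toy inputs are illustrative.

```python
import math


def elastic_net_penalty(beta, lam, alpha):
    """lambda * [(1 - alpha) * ||beta||_2^2 / 2 + alpha * ||beta||_1]."""
    l1 = sum(abs(b) for b in beta)
    l2_squared = sum(b * b for b in beta)
    return lam * ((1 - alpha) * l2_squared / 2 + alpha * l1)


def penalized_objective(beta0, beta, X, y, lam, alpha):
    """Negative average log-likelihood plus the elastic-net penalty."""
    nll = 0.0
    for x_i, y_i in zip(X, y):
        eta = beta0 + sum(b * v for b, v in zip(beta, x_i))
        # Stable log(1 + exp(eta)).
        nll += max(eta, 0.0) + math.log1p(math.exp(-abs(eta))) - y_i * eta
    # The intercept beta0 is not penalized.
    return nll / len(y) + elastic_net_penalty(beta, lam, alpha)


# Toy inputs (hypothetical).
X_small = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
y_small = [0, 0, 1, 1]

print(elastic_net_penalty([1.0, -2.0], lam=0.5, alpha=1.0))  # pure L1 (lasso)
print(elastic_net_penalty([1.0, -2.0], lam=0.5, alpha=0.0))  # pure L2 (ridge)
print(penalized_objective(0.0, [0.0, 0.0], X_small, y_small, lam=0.5, alpha=0.5))
```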
We will fit this model on the oversampled dataset only.
Confusion Matrix and Statistics
Not_Failed Failed
Not_Failed 196 0
Failed 28 16
Accuracy : 0.8833
95% CI : (0.8358, 0.9211)
No Information Rate : 0.9333
P-Value [Acc > NIR] : 0.9985
Kappa : 0.4828
Mcnemar's Test P-Value : 3.352e-07
Sensitivity : 1.00000
Specificity : 0.87500
Pos Pred Value : 0.36364
Neg Pred Value : 1.00000
Prevalence : 0.06667
Detection Rate : 0.06667
Detection Prevalence : 0.18333
Balanced Accuracy : 0.93750
'Positive' Class : Failed
Conclusion
In this analysis, we made assumptions about the dependencies between data points to ensure the reliability of our findings. We conducted a thorough investigation to identify any potential dependencies present in the collected measurements.
To gain a deeper understanding of the data, we performed an Exploratory Data Analysis (EDA) using various techniques and visualizations. This allowed us to uncover valuable insights about the characteristics, distributions, and relationships among variables in the data.
One important finding is a strong association between higher sensor readings and failure events of the FPSO equipment.
Furthermore, we discovered that Preset configuration 3-4 is the only configuration with no failure events.
Finally, we fitted three logistic regression models to investigate which variables are most strongly associated with the equipment experiencing failure events.
Throughout the analysis, we made certain assumptions, including assuming independence of data points over time.
In addition, we applied classification models such as GLMNET alongside ordinary logistic regression. GLMNET is a logistic regression model with regularization, which helps prevent overfitting and multicollinearity. These models complemented our analysis, allowing us to gain further insights and understand the relationships between variables.
Overall, this comprehensive analysis has provided valuable insights into the dataset, helping us understand the underlying factors contributing to failures.
Now, let’s see a summary of our model performances.
| Model | Precision (2) | Recall (3) | F1 (4) |
|---|---|---|---|
| Logistic Reg. | 0.64286 | 0.56250 | 0.60 |
| Logistic Reg Overs. | 0.47059 | 1.00000 | 0.64 |
| GlmNet Overs. 6fold CV | 0.36364 | 1.00000 | 0.53 |
The last two models performed better at identifying true failures, with both reaching a recall of 1. Between them, logistic regression with oversampling is worth preferring when interpretable parameters are needed, and it also has the higher precision.
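The figures in the summary table follow from the true positive, false positive, and false negative counts in the three confusion matrices, using the formulas given in the footnotes. A minimal sketch, with an illustrative function name:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 score from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# (tp, fp, fn) read from the confusion matrices, 'Failed' as the positive class.
counts = {
    "Logistic Reg.": (9, 5, 7),
    "Logistic Reg Overs.": (16, 18, 0),
    "GlmNet Overs.": (16, 28, 0),
}

scores = {name: precision_recall_f1(*c) for name, c in counts.items()}
for name, (precision, recall, f1) in scores.items():
    print(f"{name}: precision={precision:.5f} recall={recall:.5f} f1={f1:.2f}")
```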
Future work will involve:
Investigating the proportional hazards assumption to explore the relationship between covariates and the time to failure events.
Estimating the time to failure and other relevant quantities when our data includes censoring. A survival model would enable us to analyze the impact of different variables on the failure time, providing valuable insights into the factors influencing failures.
Including mechanical fatigue models, which are mathematical models that help predict the behavior and failure of materials or components under cyclic loading conditions.
Footnotes
1. However, it is important to determine whether these measurements can be considered independent and identically distributed (IID). Further investigation is necessary to assess the potential dependencies present in the data.
2. Precision = True Positives / (True Positives + False Positives)
3. Recall (Sensitivity) = True Positives / (True Positives + False Negatives)
4. F1 Score = 2 * (Precision * Recall) / (Precision + Recall)