1 Why are you here?

Welcome! You are here to learn how to perform a simulation study in R to evaluate the quality of the estimates in Flexible Weibull extension regression. So let’s get started.


2 The Flexible Weibull extension distribution

The Flexible Weibull extension (FWE) distribution was proposed by Bebbington, Lai and Zitikis in 2007. This distribution has two parameters, \(\mu\) and \(\sigma\), and its probability density function is given by:

\[ f(x;\mu,\sigma)= \left(\mu+\frac{\sigma}{x^{2}}\right)e^{\left(\mu x - \frac{\sigma}{x}\right)}\exp\left(-e^{\mu x - \frac{\sigma}{x}}\right), \]

where \(x>0\), \(\mu>0\) and \(\sigma>0\). If a random variable \(X\) follows a Flexible Weibull extension distribution, we write \(X \sim FWE(\mu, \sigma)\).
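As a quick sanity check of the formula above, here is a minimal base R sketch of the density. The function names `dfwe_sketch` and `pfwe_sketch` are hypothetical helpers written for this post, not functions of any package. The distribution function used in the check, \(F(x) = 1 - \exp\left(-e^{\mu x - \sigma/x}\right)\), follows from integrating the density.

```r
# Hypothetical helpers for illustration only (not part of any package).

# Density of FWE(mu, sigma), computed on the log scale to avoid Inf * 0.
dfwe_sketch <- function(x, mu, sigma) {
  stopifnot(mu > 0, sigma > 0)
  out <- numeric(length(x))            # density is 0 outside x > 0
  pos <- x > 0 & is.finite(x)
  z <- mu * x[pos] - sigma / x[pos]    # exponent mu * x - sigma / x
  out[pos] <- exp(log(mu + sigma / x[pos]^2) + z - exp(z))
  out
}

# Distribution function: F(x) = 1 - exp(-exp(mu * x - sigma / x)) for x > 0.
pfwe_sketch <- function(x, mu, sigma) {
  out <- numeric(length(x))
  pos <- x > 0
  out[pos] <- 1 - exp(-exp(mu * x[pos] - sigma / x[pos]))
  out
}

# Check 1: the density should integrate to approximately one.
# The upper limit 20 is far in the right tail for these parameter values.
total_mass <- integrate(dfwe_sketch, lower = 0, upper = 20,
                        mu = 0.21, sigma = 0.25)$value

# Check 2: the density should match a numerical derivative of F.
x0 <- 2
h <- 1e-5
num_deriv <- (pfwe_sketch(x0 + h, 0.21, 0.25) -
              pfwe_sketch(x0 - h, 0.21, 0.25)) / (2 * h)

total_mass
c(numerical = num_deriv, formula = dfwe_sketch(x0, 0.21, 0.25))
```

Working on the log scale matters here because \(e^{\mu x - \sigma/x}\) overflows for large \(x\) while the second exponential underflows to zero.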

If you want to learn more about this distribution, see the article "A flexible Weibull extension" (Bebbington, Lai and Zitikis, 2007).


3 RelDists package

RelDists is an R package with several distributions useful for reliability analysis. With this package it is also possible to estimate parameters and fit regression models within the GAMLSS framework. The current version of the package is hosted on GitHub and can be installed with the following code:

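# Install devtools if it is not already available
if (!require('devtools')) install.packages('devtools')
# Install RelDists from GitHub (force=TRUE reinstalls even if it is already installed)
devtools::install_github('ousuga/RelDists', force=TRUE)
# Load the package
library(RelDists)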

If you want more information about the package visit the website.


4 Simulation study

Now we perform a Monte Carlo simulation study to evaluate the quality of the estimates in Flexible Weibull extension regression, without covariates, with covariates and with censored data, using the RelDists package. To assess the performance of the estimation, we use the mean value, the Bias and the Mean Squared Error (MSE) of the estimator, which are given by:

\[\begin{align*} \text{Mean} &= \frac{\sum_{i=1}^{k}\hat{\theta}_i}{k},\\ \text{Bias} &= \frac{\sum_{i=1}^{k}(\hat{\theta}_i-\theta)}{k},\\ \text{MSE} &= \frac{\sum_{i=1}^{k}(\hat{\theta}_i-\theta)^{2}}{k}, \end{align*}\]

where \(\theta\) is the parameter that we want to estimate, \(\hat{\theta}_i\) is the estimate obtained in the \(i\)-th simulation and \(k\) is the number of simulations we want to perform.
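These three measures are one-liners in R. The sketch below uses a hypothetical helper name, `sim_metrics`, and a made-up vector of three estimates that only illustrates the arithmetic (it is not output from the study).

```r
# Hypothetical helper: summarise k estimates of a parameter with true value theta.
sim_metrics <- function(estimates, theta) {
  c(mean = mean(estimates),
    bias = mean(estimates - theta),
    mse  = mean((estimates - theta)^2))
}

# Toy illustration with made-up estimates (NOT results from the study).
toy_estimates <- c(0.20, 0.22, 0.24)
toy_metrics <- sim_metrics(toy_estimates, theta = 0.21)
toy_metrics

# The MSE decomposes into squared bias plus the variance of the estimates
# (variance computed with divisor k rather than k - 1).
var_k <- mean((toy_estimates - mean(toy_estimates))^2)
decomposition <- toy_metrics[["bias"]]^2 + var_k
```

The decomposition is a handy check when reading the results: a large MSE can come from bias, from variability, or from both.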

4.1 Model without covariates

In the first part of the simulation study we generate data from the following model:

\[\begin{align*} y_i &\sim FWE(\mu, \sigma),\\ \mu &= 0.21,\\ \sigma &= 0.25. \end{align*}\]

We consider different scenarios in which the sample size \(n\) and the percentage of censored data \(pcd\) vary: \(n=20, 30, 40, \ldots, 280, 290, 300\) and \(pcd=0.00, 0.10, 0.20, 0.30, 0.40, 0.50\). For each scenario, we follow these steps to simulate the data and estimate the parameters \(\mu\) and \(\sigma\):

  1. Generate a sample of size \(n\) from \(FWE(\mu=0.21, \sigma=0.25)\).

  2. Modify the simulated random sample to ensure a \(pcd \times 100 \%\) of right-censored observations.

  3. Obtain and store the estimates \(\hat{\mu}\) and \(\hat{\sigma}\) using the proposed procedure.

  4. Repeat steps 1 to 3 \(k = 10000\) times.
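The four steps can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the code used in the study, which relies on the RelDists package. All function names (`rfwe_sketch`, `censor_sketch`, `negloglik`, `fit_fwe_sketch`) are hypothetical helpers. The censoring mechanism is an assumption, since the steps only say that a fraction \(pcd\) of the observations must be right-censored. Estimation is done by maximizing the right-censored log-likelihood directly with `optim()`. The number of replicates is reduced to 200 so that the example runs quickly.

```r
set.seed(123)

# Step 1: inverse-transform sampling. Solving F(x) = p, with
# F(x) = 1 - exp(-exp(mu * x - sigma / x)), gives the expression below.
rfwe_sketch <- function(n, mu, sigma) {
  w <- log(-log(1 - runif(n)))
  (w + sqrt(w^2 + 4 * mu * sigma)) / (2 * mu)
}

# Step 2 (ASSUMED mechanism): choose round(n * pcd) observations at random
# and replace each by a censoring time drawn uniformly below its true time.
censor_sketch <- function(y, pcd) {
  status <- rep(1, length(y))                # 1 = observed, 0 = right-censored
  idx <- sample.int(length(y), size = round(length(y) * pcd))
  if (length(idx) > 0) {
    y[idx] <- runif(length(idx), min = 0, max = y[idx])
    status[idx] <- 0
  }
  list(time = y, status = status)
}

# Step 3 (ASSUMED estimator): observed times contribute the log density,
# censored times contribute the log survival function.
negloglik <- function(par, time, status) {
  mu <- exp(par[1]); sigma <- exp(par[2])    # log scale keeps both positive
  z <- mu * time - sigma / time
  log_f <- log(mu + sigma / time^2) + z - exp(z)
  log_s <- -exp(z)
  nll <- -(sum(log_f[status == 1]) + sum(log_s[status == 0]))
  if (is.finite(nll)) nll else 1e10          # large penalty if not computable
}

fit_fwe_sketch <- function(time, status) {
  opt <- optim(c(0, 0), negloglik, time = time, status = status)
  exp(opt$par)                               # back-transform to (mu, sigma)
}

# Step 4: repeat steps 1 to 3 (k = 10000 in the study, 200 here).
k <- 200
est <- t(replicate(k, {
  d <- censor_sketch(rfwe_sketch(100, mu = 0.21, sigma = 0.25), pcd = 0.10)
  fit_fwe_sketch(d$time, d$status)
}))
colnames(est) <- c("mu_hat", "sigma_hat")
colMeans(est)
```

Running the same loop over every combination of \(n\) and \(pcd\), and comparing each column of `est` with the true value, gives one Mean, Bias and MSE value per scenario.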

4.2 Model with covariates

In the second part of the simulation study we generate data from the following model:

\[\begin{align*} y_i &\sim FWE(\mu_i, \sigma_i),\\ \log(\mu_i) &= \beta_0 + \beta_1 \times X_{1i},\\ \log(\sigma_i) &= \gamma_0 + \gamma_1 \times X_{2i},\\ X_1 &\sim U(0,1),\\ X_2 &\sim U(0,1), \end{align*}\]

where \(\beta_0=-2\), \(\beta_1=0.9\), \(\gamma_0=2\) and \(\gamma_1=-6.7\). As in the first part of the simulation study, we consider \(n=20, 30, 40, \ldots, 280, 290, 300\) and \(pcd=0.00, 0.10, 0.20, 0.30, 0.40, 0.50\), and follow these steps to simulate the data and estimate the parameters \(\beta_0\), \(\beta_1\), \(\gamma_0\) and \(\gamma_1\):

  1. Generate a sample of size \(n\) from the model presented above.

  2. Modify the simulated random sample to ensure a \(pcd \times 100 \%\) of right-censored observations.

  3. Obtain and store the estimates \(\hat{\beta}_0\), \(\hat{\beta}_1\), \(\hat{\gamma}_0\) and \(\hat{\gamma}_1\) using the proposed procedure.

  4. Repeat steps 1 to 3 \(k = 10000\) times.
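To make the covariate model concrete, here is a minimal base R sketch of one replicate: simulate a data set and estimate the four coefficients by maximum likelihood. It is an illustration under assumptions, not the RelDists procedure used in the study. The helpers `sim_fwe_reg` and `negloglik_reg` are hypothetical names, sampling uses the inverse of \(F(x) = 1 - \exp\left(-e^{\mu x - \sigma/x}\right)\), the starting values are a simple heuristic, and the `status` column is set to 1 (no censoring) for brevity. Right-censored rows would carry `status = 0`, which the likelihood already handles.

```r
set.seed(2024)

# Hypothetical helper: simulate n observations from the covariate model.
sim_fwe_reg <- function(n, beta = c(-2, 0.9), gamma = c(2, -6.7)) {
  x1 <- runif(n)
  x2 <- runif(n)
  mu    <- exp(beta[1]  + beta[2]  * x1)     # log link for mu
  sigma <- exp(gamma[1] + gamma[2] * x2)     # log link for sigma
  w <- log(-log(1 - runif(n)))               # inverse-transform sampling
  y <- (w + sqrt(w^2 + 4 * mu * sigma)) / (2 * mu)
  data.frame(y = y, x1 = x1, x2 = x2, status = 1)  # 1 = observed
}

# Negative log-likelihood in par = (beta0, beta1, gamma0, gamma1).
# Rows with status 0 are treated as right-censored.
negloglik_reg <- function(par, d) {
  mu    <- exp(par[1] + par[2] * d$x1)
  sigma <- exp(par[3] + par[4] * d$x2)
  z <- mu * d$y - sigma / d$y
  log_f <- log(mu + sigma / d$y^2) + z - exp(z)    # log density
  log_s <- -exp(z)                                 # log survival
  obs <- d$status == 1
  nll <- -(sum(log_f[obs]) + sum(log_s[!obs]))
  if (is.finite(nll)) nll else 1e10          # large penalty if not computable
}

d <- sim_fwe_reg(n = 300)

# Heuristic start (an assumption): a small intercept for mu keeps
# exp(mu * y) moderate at the first iteration; all other coefficients are 0.
start <- c(-log(max(d$y)), 0, 0, 0)
fit <- optim(start, negloglik_reg, d = d, method = "BFGS",
             control = list(maxit = 1000))

est <- setNames(fit$par, c("beta0", "beta1", "gamma0", "gamma1"))
est
```

Because both linear predictors use a log link, the coefficients are unconstrained and no bounds are needed in the optimizer.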


5 Results

5.1 Model without covariates

The following graphs show the mean value of the estimators of \(\mu\) and \(\sigma\) for each combination of \(pcd\) and \(n\). For both parameters, the mean value of the estimator gets closer to the true value of the parameter as the sample size \(n\) increases, regardless of the value of \(pcd\). For \(\sigma\), the mean value of the estimator behaves very similarly across the different values of \(n\) and \(pcd\), whereas for \(\mu\) it approaches the true value faster when the percentage of censored data \(pcd\) is small.


The following graphs show the MSE of the estimators of \(\mu\) and \(\sigma\) for each combination of \(pcd\) and \(n\). For both parameters, the MSE gets closer to zero as the sample size \(n\) increases, regardless of the value of \(pcd\). For \(\sigma\), the MSE behaves very similarly across the different values of \(n\) and \(pcd\), whereas for \(\mu\) it approaches zero faster when the percentage of censored data \(pcd\) is small.


The following graphs show the Bias of the estimators of \(\mu\) and \(\sigma\) for each combination of \(pcd\) and \(n\). For both parameters, the Bias gets closer to zero as the sample size \(n\) increases, regardless of the value of \(pcd\). For \(\sigma\), the Bias behaves very similarly across the different values of \(n\) and \(pcd\), although with \(pcd=0.5\) the behavior varies a little. For \(\mu\), the Bias approaches zero faster when the percentage of censored data \(pcd\) is small.


5.2 Model with covariates

The following graphs show the mean value of the estimators of \(\beta_0\), \(\beta_1\), \(\gamma_0\) and \(\gamma_1\) for each combination of \(pcd\) and \(n\). For \(\gamma_0\) and \(\gamma_1\), the mean value of the estimator gets closer to the true value of the parameter as the sample size \(n\) increases when \(pcd<0.5\). For \(\beta_0\) and \(\beta_1\), this happens when \(pcd<0.4\). For all the parameters, with large values of the percentage of censored data \(pcd\), the mean value of the estimator deviates considerably from the true value of the parameter, especially in the case of \(\beta_0\) and \(\beta_1\).


The following graphs show the MSE of the estimators of \(\beta_0\), \(\beta_1\), \(\gamma_0\) and \(\gamma_1\) for each combination of \(pcd\) and \(n\). For \(\gamma_0\) and \(\gamma_1\), the MSE gets closer to zero as the sample size \(n\) increases, regardless of the value of \(pcd\). For \(\beta_0\) and \(\beta_1\), the MSE gets closer to zero as the sample size \(n\) increases when \(pcd<0.4\). When \(pcd>0.4\), the MSE deviates considerably from zero.


The following graphs show the Bias of the estimators of \(\beta_0\), \(\beta_1\), \(\gamma_0\) and \(\gamma_1\) for each combination of \(pcd\) and \(n\). For \(\gamma_0\) and \(\gamma_1\), the Bias gets closer to zero as the sample size \(n\) increases when \(pcd<0.5\). For \(\beta_0\) and \(\beta_1\), the Bias gets closer to zero as the sample size \(n\) increases when \(pcd<0.4\). For all the parameters, with large values of the percentage of censored data \(pcd\), the Bias deviates considerably from zero, especially in the case of \(\beta_0\) and \(\beta_1\).



I hope you enjoyed this post and learned about how to perform a simulation study using R. I encourage you to replicate this procedure with other regression models.