Some issues on Monte Carlo simulation for safety factor

1. Test independent variables

Multivariate Monte Carlo simulation for the safety factor yita requires the esitimated joint distribution of the four parametershp h_peak, h0, t_rise and alpha. According to multiplication rule of probability, if all paired parameters are independent, we can just simulate each parameter separately and thus calculate yita, which is equivalent to simulate using joint distribution.

First plot the scatterplots of all paired parameters:

Then test the association between all paired parameters, using Pearson’s product moment correlation coefficient:

correlation test of h_peak - h0

## 
##  Pearson's product-moment correlation
## 
## data:  dat$hp and dat$h0
## t = -0.28105, df = 28, p-value = 0.7807
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.405558  0.313216
## sample estimates:
##         cor 
## -0.05303814

correlation test of h_peak - t_rise

## 
##  Pearson's product-moment correlation
## 
## data:  dat$hp and dat$d
## t = 0.42922, df = 28, p-value = 0.6711
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.2878036  0.4286329
## sample estimates:
##        cor 
## 0.08084854

correlation test of h_peak - alpha

## 
##  Pearson's product-moment correlation
## 
## data:  dat$hp and dat$a
## t = 3.6292, df = 28, p-value = 0.001125
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2578929 0.7691489
## sample estimates:
##       cor 
## 0.5656108

correlation test of h0 - t_rise

## 
##  Pearson's product-moment correlation
## 
## data:  dat$hp and dat$a
## t = 3.6292, df = 28, p-value = 0.001125
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2578929 0.7691489
## sample estimates:
##       cor 
## 0.5656108

correlation test of h0 - alpha

## 
##  Pearson's product-moment correlation
## 
## data:  dat$h0 and dat$a
## t = -4.202, df = 28, p-value = 0.0002439
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.8023795 -0.3371440
## sample estimates:
##        cor 
## -0.6218783

correlation test of t_rise - alpha

## 
##  Pearson's product-moment correlation
## 
## data:  dat$d and dat$a
## t = 3.7909, df = 28, p-value = 0.000734
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2810899 0.7791704
## sample estimates:
##       cor 
## 0.5823824

The above-mentioned test is a kind of hypothesis testing. It seems at least 3 pairs have correlations under 5% significance level: hp-a(positive), h0-a(negative), d-a(positive). So in this case, we can not separately simulate each paramter to get the yita.

But good news is that only alpha is correlated with the other three parameters. So we can first model the multivariate correlation between alpha and the other three parameters, and then separately simulate the three independent paramters and obtain alpha based on the constructed model.

References

D. J. Best & D. E. Roberts (1975), Algorithm AS 89: The Upper Tail Probabilities of Spearman’s rho. Applied Statistics, 24, 377–379.
Myles Hollander & Douglas A. Wolfe (1973), Nonparametric Statistical Methods. New York: John Wiley & Sons. Pages 185–194

2. Multivariate regression of alpha

Parameter \(α\) will be predict using multivariate regression model.

First read the four parameters data:

##       hp  h0     d     a
## 1  561.0 330 1.215 0.257
## 2  570.2 350 1.132 0.251
## 3  531.0 390 0.924 0.219
## 4  577.5 310 0.965 0.327
## 5  566.4 350 1.069 0.264
## 6  679.1 340 1.160 0.381
## 7  595.4 430 1.271 0.265
## 8  651.5 330 0.688 0.335
## 9  566.3 340 1.750 0.369
## 10 645.2 330 1.000 0.418
## 11 525.2 320 1.007 0.328
## 12 624.1 380 1.111 0.340
## 13 545.0 290 0.785 0.207
## 14 548.0 360 0.681 0.159
## 15 523.0 400 0.583 0.157
## 16 666.0 380 1.479 0.357
## 17 569.0 280 1.563 0.408
## 18 654.0 260 0.868 0.477
## 19 572.0 330 2.049 0.365
## 20 533.0 320 0.840 0.175
## 21 529.0 380 1.000 0.241
## 22 561.0 380 0.917 0.237
## 23 552.0 270 1.931 0.438
## 24 579.5 370 0.729 0.182
## 25 628.4 400 0.688 0.234
## 26 539.5 420 1.188 0.148
## 27 527.2 370 0.917 0.177
## 28 539.1 410 0.563 0.131
## 29 626.9 430 1.097 0.258
## 30 624.5 360 1.153 0.247

Then predict \(α\) using the multivariate regression model estimated by ordinary least square (OLS). References please find [1]. Here I provide two candidate models.

Model 1 with intercept: \[α=β_0+β_1h_p+β_2h_0+β_3d\]
Model 2 without intercept: \[α=β_1h_p+β_2h_0+β_3d\]

For model 1:

## 
## Call:
## lm(formula = a ~ hp + h0 + d, data = dat)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.077824 -0.028875  0.002744  0.025956  0.079294 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.0698254  0.1213786  -0.575     0.57    
## hp           0.0010166  0.0001657   6.134 1.74e-06 ***
## h0          -0.0010004  0.0001782  -5.614 6.69e-06 ***
## d            0.1040380  0.0219487   4.740 6.68e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04201 on 26 degrees of freedom
## Multiple R-squared:  0.8236, Adjusted R-squared:  0.8033 
## F-statistic: 40.47 on 3 and 26 DF,  p-value: 6.112e-10

The formula of model 1 is \[α=-0.0698254+0.0010166h_p-0.0010004h_0+0.1040380d\].

For model 2:

## 
## Call:
## lm(formula = a ~ hp + h0 + d - 1, data = dat)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.076658 -0.028311  0.002986  0.028703  0.072518 
## 
## Coefficients:
##      Estimate Std. Error t value Pr(>|t|)    
## hp  9.407e-04  9.917e-05   9.486 4.35e-10 ***
## h0 -1.062e-03  1.410e-04  -7.529 4.24e-08 ***
## d   1.005e-01  2.079e-02   4.833 4.78e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04148 on 27 degrees of freedom
## Multiple R-squared:  0.982,  Adjusted R-squared:   0.98 
## F-statistic: 491.8 on 3 and 27 DF,  p-value: < 2.2e-16

The formula of model 2 is \[α=0.0009407h_p-0.001062h_0+0.1005d\].

From the summaries above we can use the Residual standard error and Adjusted R-squared to quantify and compare the uncertainties of two models. So we can select Model 2 as the BETTER one.

Finally just visualize the differences between the observed and predicted alpha under the two models.

References

Chambers, J. M. (1992) Linear models. Chapter 4 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

3. Distribution fitting of three parameters

I try to obtain the optimal probability distributions of the three independent parameters given observed data.

Read the three parameters data:

##       hp  h0     d
## 1  561.0 330 1.215
## 2  570.2 350 1.132
## 3  531.0 390 0.924
## 4  577.5 310 0.965
## 5  566.4 350 1.069
## 6  679.1 340 1.160
## 7  595.4 430 1.271
## 8  651.5 330 0.688
## 9  566.3 340 1.750
## 10 645.2 330 1.000
## 11 525.2 320 1.007
## 12 624.1 380 1.111
## 13 545.0 290 0.785
## 14 548.0 360 0.681
## 15 523.0 400 0.583
## 16 666.0 380 1.479
## 17 569.0 280 1.563
## 18 654.0 260 0.868
## 19 572.0 330 2.049
## 20 533.0 320 0.840
## 21 529.0 380 1.000
## 22 561.0 380 0.917
## 23 552.0 270 1.931
## 24 579.5 370 0.729
## 25 628.4 400 0.688
## 26 539.5 420 1.188
## 27 527.2 370 0.917
## 28 539.1 410 0.563
## 29 626.9 430 1.097
## 30 624.5 360 1.153

I will select the most fitted one from the following continuous probability distributions because I don’t know the approprate one for each parameter and the observed data show continuous patterns:

Normal distribution
Lognormal distribution
Exponential distribution
Gamma distribution
T distribution
Cauchy distribution
Weibull distribution

Then I will fit all parameters separately based on maximum likelihood estimation(MLE).

1. Parameter \(h_p\)

1.1 Normal distribution

The estimated parameters (shape and rate) of normal distribution:

##      mean        sd 
## 580.33333  46.45295

The log-likelihood value:

## [1] -157.7214

1.2 Lognormal distribution

The estimated parameters (shape and rate) of lognormal distribution:

##    meanlog      sdlog 
## 6.36048583 0.07842126

The log-likelihood value:

## [1] -157.0129

1.3 Exponential distribution

The estimated parameters (rate) of exponential distribution:

##        rate 
## 0.001723148

The log-likelihood value:

## [1] -220.9081

1.4 Gamma distribution

The estimated parameters (shape and rate) of gamma distribution:

##       shape        rate 
## 150.8708281   0.2599733

The log-likelihood value:

## [1] -157.26

1.5 T distribution

The estimated parameters (m, s and df) of t distribution:

##         m         s        df 
## 580.08575  46.22848 117.46225

The log-likelihood value:

## [1] -157.7782

1.6 Cauchy distribution

The estimated parameters (location and scale) of cauchy distribution:

##  location     scale 
## 562.31167  25.71839

The log-likelihood value:

## [1] -163.2164

1.7 Weibull distribution

The estimated parameters (shape and scale) of weibull distribution:

##     shape     scale 
##  12.52087 602.67229

The log-likelihood value:

## [1] -160.3896

So I suggest select Lognormal distribution to fit \(h_p\):

where \(μ=6.36048583\) and \(σ=0.07842126\).

Information about lognormal distribution please refer to https://en.wikipedia.org/wiki/Log-normal_distribution

2. Parameter \(h_0\)

2.1 Normal distribution

The estimated parameters (shape and rate) of normal distribution:

##      mean        sd 
## 353.66667  44.83178

The log-likelihood value:

## [1] -156.6557

2.2 Lognormal distribution

The estimated parameters (shape and rate) of lognormal distribution:

##   meanlog     sdlog 
## 5.8600025 0.1306732

The log-likelihood value:

## [1] -157.3166

2.3 Exponential distribution

The estimated parameters (rate) of exponential distribution:

##        rate 
## 0.002827521

The log-likelihood value:

## [1] -206.0506

2.4 Gamma distribution

The estimated parameters (shape and rate) of gamma distribution:

##      shape       rate 
## 60.1579393  0.1700979

The log-likelihood value:

## [1] -157.0291

2.5 T distribution

The estimated parameters (m, s and df) of t distribution:

##         m         s        df 
## 353.80551  44.34350  64.80637

The log-likelihood value:

## [1] -156.727

2.6 Cauchy distribution

The estimated parameters (location and scale) of cauchy distribution:

##  location     scale 
## 355.77171  29.38045

The log-likelihood value:

## [1] -163.2001

2.7 Weibull distribution

The estimated parameters (shape and scale) of weibull distribution:

##      shape      scale 
##   9.091592 373.241314

The log-likelihood value:

## [1] -156.5283

So I suggest select Weibull distribution to fit \(h_0\):

where \(k=9.091592\) and \(λ=373.241314\).

Information about weibull distribution please refer to https://en.wikipedia.org/wiki/Weibull_distribution

3. Parameter \(d\)

3.1 Normal distribution

The estimated parameters (shape and rate) of normal distribution:

##      mean        sd 
## 1.0774333 0.3646438

The log-likelihood value:

## [1] -12.30313

3.2 Lognormal distribution

The estimated parameters (shape and rate) of lognormal distribution:

##    meanlog      sdlog 
## 0.02193016 0.32072027

The log-likelihood value:

## [1] -9.110482

3.3 Exponential distribution

The estimated parameters (rate) of exponential distribution:

##      rate 
## 0.9281317

The log-likelihood value:

## [1] -32.23745

3.4 Gamma distribution

The estimated parameters (shape and rate) of gamma distribution:

##    shape     rate 
## 9.660083 8.965831

The log-likelihood value:

## [1] -9.723271

3.5 T distribution

The estimated parameters (m, s and df) of t distribution:

##         m         s        df 
## 1.0140764 0.2577623 3.3185024

The log-likelihood value:

## [1] -11.46695

3.6 Cauchy distribution

The estimated parameters (location and scale) of cauchy distribution:

##  location     scale 
## 1.0032638 0.1774824

The log-likelihood value:

## [1] -13.8768

3.7 Weibull distribution

The estimated parameters (shape and scale) of weibull distribution:

## Warning in densfun(x, parm[1], parm[2], ...): 产生了NaNs

##    shape    scale 
## 3.063641 1.204610

The log-likelihood value:

## [1] -12.16217

So I suggest also select Lognormal distribution to fit \(d\) just as \(h_p\):

where \(μ=0.02193016\) and \(σ=0.32072027\).

And this case is the most fitted one among the three paramters, for its significant increase in log-likelihood value.

So in the next step, we can simulate the three parameters using the above estimated distributions, and then obtain \(α\) using the multivariate regression.

Some issues on Monte Carlo simulation for safety factor

Dengfeng Liu, Visiting Research Fellow of EU-IRSES, Nanjing University

2016-12

1. Test independent variables

2. Multivariate regression of alpha

3. Distribution fitting of three parameters

1. Parameter \(h_p\)

2. Parameter \(h_0\)

3. Parameter \(d\)