Multivariate Monte Carlo simulation for the safety factor yita requires the esitimated joint distribution of the four parametershp h_peak, h0, t_rise and alpha. According to multiplication rule of probability, if all paired parameters are independent, we can just simulate each parameter separately and thus calculate yita, which is equivalent to simulate using joint distribution.
First plot the scatterplots of all paired parameters:
Then test the association between all paired parameters, using Pearson’s product moment correlation coefficient:
##
## Pearson's product-moment correlation
##
## data: dat$hp and dat$h0
## t = -0.28105, df = 28, p-value = 0.7807
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.405558 0.313216
## sample estimates:
## cor
## -0.05303814
##
## Pearson's product-moment correlation
##
## data: dat$hp and dat$d
## t = 0.42922, df = 28, p-value = 0.6711
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.2878036 0.4286329
## sample estimates:
## cor
## 0.08084854
##
## Pearson's product-moment correlation
##
## data: dat$hp and dat$a
## t = 3.6292, df = 28, p-value = 0.001125
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2578929 0.7691489
## sample estimates:
## cor
## 0.5656108
##
## Pearson's product-moment correlation
##
## data: dat$hp and dat$a
## t = 3.6292, df = 28, p-value = 0.001125
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2578929 0.7691489
## sample estimates:
## cor
## 0.5656108
##
## Pearson's product-moment correlation
##
## data: dat$h0 and dat$a
## t = -4.202, df = 28, p-value = 0.0002439
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.8023795 -0.3371440
## sample estimates:
## cor
## -0.6218783
##
## Pearson's product-moment correlation
##
## data: dat$d and dat$a
## t = 3.7909, df = 28, p-value = 0.000734
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2810899 0.7791704
## sample estimates:
## cor
## 0.5823824
The above-mentioned test is a kind of hypothesis testing. It seems at least 3 pairs have correlations under 5% significance level: hp-a(positive), h0-a(negative), d-a(positive). So in this case, we can not separately simulate each paramter to get the yita.
But good news is that only alpha is correlated with the other three parameters. So we can first model the multivariate correlation between alpha and the other three parameters, and then separately simulate the three independent paramters and obtain alpha based on the constructed model.
References
D. J. Best & D. E. Roberts (1975), Algorithm AS 89: The Upper Tail Probabilities of Spearman’s rho. Applied Statistics, 24, 377–379.
Myles Hollander & Douglas A. Wolfe (1973), Nonparametric Statistical Methods. New York: John Wiley & Sons. Pages 185–194
Parameter \(α\) will be predict using multivariate regression model.
First read the four parameters data:
## hp h0 d a
## 1 561.0 330 1.215 0.257
## 2 570.2 350 1.132 0.251
## 3 531.0 390 0.924 0.219
## 4 577.5 310 0.965 0.327
## 5 566.4 350 1.069 0.264
## 6 679.1 340 1.160 0.381
## 7 595.4 430 1.271 0.265
## 8 651.5 330 0.688 0.335
## 9 566.3 340 1.750 0.369
## 10 645.2 330 1.000 0.418
## 11 525.2 320 1.007 0.328
## 12 624.1 380 1.111 0.340
## 13 545.0 290 0.785 0.207
## 14 548.0 360 0.681 0.159
## 15 523.0 400 0.583 0.157
## 16 666.0 380 1.479 0.357
## 17 569.0 280 1.563 0.408
## 18 654.0 260 0.868 0.477
## 19 572.0 330 2.049 0.365
## 20 533.0 320 0.840 0.175
## 21 529.0 380 1.000 0.241
## 22 561.0 380 0.917 0.237
## 23 552.0 270 1.931 0.438
## 24 579.5 370 0.729 0.182
## 25 628.4 400 0.688 0.234
## 26 539.5 420 1.188 0.148
## 27 527.2 370 0.917 0.177
## 28 539.1 410 0.563 0.131
## 29 626.9 430 1.097 0.258
## 30 624.5 360 1.153 0.247
Then predict \(α\) using the multivariate regression model estimated by ordinary least square (OLS). References please find [1]. Here I provide two candidate models.
Model 1 with intercept: \[α=β_0+β_1h_p+β_2h_0+β_3d\]
Model 2 without intercept: \[α=β_1h_p+β_2h_0+β_3d\]
For model 1:
##
## Call:
## lm(formula = a ~ hp + h0 + d, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.077824 -0.028875 0.002744 0.025956 0.079294
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.0698254 0.1213786 -0.575 0.57
## hp 0.0010166 0.0001657 6.134 1.74e-06 ***
## h0 -0.0010004 0.0001782 -5.614 6.69e-06 ***
## d 0.1040380 0.0219487 4.740 6.68e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04201 on 26 degrees of freedom
## Multiple R-squared: 0.8236, Adjusted R-squared: 0.8033
## F-statistic: 40.47 on 3 and 26 DF, p-value: 6.112e-10
The formula of model 1 is \[α=-0.0698254+0.0010166h_p-0.0010004h_0+0.1040380d\].
For model 2:
##
## Call:
## lm(formula = a ~ hp + h0 + d - 1, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.076658 -0.028311 0.002986 0.028703 0.072518
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## hp 9.407e-04 9.917e-05 9.486 4.35e-10 ***
## h0 -1.062e-03 1.410e-04 -7.529 4.24e-08 ***
## d 1.005e-01 2.079e-02 4.833 4.78e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04148 on 27 degrees of freedom
## Multiple R-squared: 0.982, Adjusted R-squared: 0.98
## F-statistic: 491.8 on 3 and 27 DF, p-value: < 2.2e-16
The formula of model 2 is \[α=0.0009407h_p-0.001062h_0+0.1005d\].
From the summaries above we can use the Residual standard error and Adjusted R-squared to quantify and compare the uncertainties of two models. So we can select Model 2 as the BETTER one.
Finally just visualize the differences between the observed and predicted alpha under the two models.
References
I try to obtain the optimal probability distributions of the three independent parameters given observed data.
Read the three parameters data:
## hp h0 d
## 1 561.0 330 1.215
## 2 570.2 350 1.132
## 3 531.0 390 0.924
## 4 577.5 310 0.965
## 5 566.4 350 1.069
## 6 679.1 340 1.160
## 7 595.4 430 1.271
## 8 651.5 330 0.688
## 9 566.3 340 1.750
## 10 645.2 330 1.000
## 11 525.2 320 1.007
## 12 624.1 380 1.111
## 13 545.0 290 0.785
## 14 548.0 360 0.681
## 15 523.0 400 0.583
## 16 666.0 380 1.479
## 17 569.0 280 1.563
## 18 654.0 260 0.868
## 19 572.0 330 2.049
## 20 533.0 320 0.840
## 21 529.0 380 1.000
## 22 561.0 380 0.917
## 23 552.0 270 1.931
## 24 579.5 370 0.729
## 25 628.4 400 0.688
## 26 539.5 420 1.188
## 27 527.2 370 0.917
## 28 539.1 410 0.563
## 29 626.9 430 1.097
## 30 624.5 360 1.153
I will select the most fitted one from the following continuous probability distributions because I don’t know the approprate one for each parameter and the observed data show continuous patterns:
Then I will fit all parameters separately based on maximum likelihood estimation(MLE).
1.1 Normal distribution
The estimated parameters (shape and rate) of normal distribution:
## mean sd
## 580.33333 46.45295
The log-likelihood value:
## [1] -157.7214
1.2 Lognormal distribution
The estimated parameters (shape and rate) of lognormal distribution:
## meanlog sdlog
## 6.36048583 0.07842126
The log-likelihood value:
## [1] -157.0129
1.3 Exponential distribution
The estimated parameters (rate) of exponential distribution:
## rate
## 0.001723148
The log-likelihood value:
## [1] -220.9081
1.4 Gamma distribution
The estimated parameters (shape and rate) of gamma distribution:
## shape rate
## 150.8708281 0.2599733
The log-likelihood value:
## [1] -157.26
1.5 T distribution
The estimated parameters (m, s and df) of t distribution:
## m s df
## 580.08575 46.22848 117.46225
The log-likelihood value:
## [1] -157.7782
1.6 Cauchy distribution
The estimated parameters (location and scale) of cauchy distribution:
## location scale
## 562.31167 25.71839
The log-likelihood value:
## [1] -163.2164
1.7 Weibull distribution
The estimated parameters (shape and scale) of weibull distribution:
## shape scale
## 12.52087 602.67229
The log-likelihood value:
## [1] -160.3896
So I suggest select Lognormal distribution to fit \(h_p\):
where \(μ=6.36048583\) and \(σ=0.07842126\).
Information about lognormal distribution please refer to https://en.wikipedia.org/wiki/Log-normal_distribution
2.1 Normal distribution
The estimated parameters (shape and rate) of normal distribution:
## mean sd
## 353.66667 44.83178
The log-likelihood value:
## [1] -156.6557
2.2 Lognormal distribution
The estimated parameters (shape and rate) of lognormal distribution:
## meanlog sdlog
## 5.8600025 0.1306732
The log-likelihood value:
## [1] -157.3166
2.3 Exponential distribution
The estimated parameters (rate) of exponential distribution:
## rate
## 0.002827521
The log-likelihood value:
## [1] -206.0506
2.4 Gamma distribution
The estimated parameters (shape and rate) of gamma distribution:
## shape rate
## 60.1579393 0.1700979
The log-likelihood value:
## [1] -157.0291
2.5 T distribution
The estimated parameters (m, s and df) of t distribution:
## m s df
## 353.80551 44.34350 64.80637
The log-likelihood value:
## [1] -156.727
2.6 Cauchy distribution
The estimated parameters (location and scale) of cauchy distribution:
## location scale
## 355.77171 29.38045
The log-likelihood value:
## [1] -163.2001
2.7 Weibull distribution
The estimated parameters (shape and scale) of weibull distribution:
## shape scale
## 9.091592 373.241314
The log-likelihood value:
## [1] -156.5283
So I suggest select Weibull distribution to fit \(h_0\):
where \(k=9.091592\) and \(λ=373.241314\).
Information about weibull distribution please refer to https://en.wikipedia.org/wiki/Weibull_distribution
3.1 Normal distribution
The estimated parameters (shape and rate) of normal distribution:
## mean sd
## 1.0774333 0.3646438
The log-likelihood value:
## [1] -12.30313
3.2 Lognormal distribution
The estimated parameters (shape and rate) of lognormal distribution:
## meanlog sdlog
## 0.02193016 0.32072027
The log-likelihood value:
## [1] -9.110482
3.3 Exponential distribution
The estimated parameters (rate) of exponential distribution:
## rate
## 0.9281317
The log-likelihood value:
## [1] -32.23745
3.4 Gamma distribution
The estimated parameters (shape and rate) of gamma distribution:
## shape rate
## 9.660083 8.965831
The log-likelihood value:
## [1] -9.723271
3.5 T distribution
The estimated parameters (m, s and df) of t distribution:
## m s df
## 1.0140764 0.2577623 3.3185024
The log-likelihood value:
## [1] -11.46695
3.6 Cauchy distribution
The estimated parameters (location and scale) of cauchy distribution:
## location scale
## 1.0032638 0.1774824
The log-likelihood value:
## [1] -13.8768
3.7 Weibull distribution
The estimated parameters (shape and scale) of weibull distribution:
## Warning in densfun(x, parm[1], parm[2], ...): 产生了NaNs
## shape scale
## 3.063641 1.204610
The log-likelihood value:
## [1] -12.16217
So I suggest also select Lognormal distribution to fit \(d\) just as \(h_p\):
where \(μ=0.02193016\) and \(σ=0.32072027\).
And this case is the most fitted one among the three paramters, for its significant increase in log-likelihood value.
So in the next step, we can simulate the three parameters using the above estimated distributions, and then obtain \(α\) using the multivariate regression.