This week, one of the topics of interest is variance reduction. Discuss a method of variance reduction in simulation (see Rizzo’s Statistical Computing with R chapter 5). Demonstrate the method. Discuss when it might be used.
This method uses negatively correlated variables as a method for reducing variance. It capitalizes on the fact that if \(U\) is uniformly distributed on \([0,1]\) then \(1-U\) has the same distribution as \(U\), but \(U\) and \(1-U\) are negatively correlated. This method can be used in Monto-Carlo simulation to approximate the value of \(\pi\) faster than it would be approximated without the variance reduction. The below code demonstrates the approximation of \(\pi\) using the Monte-Carlo methods (sometimes called a dart throwing estimate) with variance reduction. The pseudorandom number generator used is the Mersenne Twister.
#include <ctime>
#include <Rcpp.h>
double MTUniform (unsigned int seed) {
static unsigned int X[624], m[2];
static int i0 = 625, next[624], shift[624];
unsigned int k, N;
if (i0 == 625) {
X[0] = (seed ? seed : 1);
for (k = 0; k < 624; k++) {
if (k) X[k] = 22695477 * X[k-1] + 1;
(k < 623) ? next[k] = k + 1 : next[k] = 0;
(k < 227) ? shift[k] = 397 : shift[k] = -227;
}
m[0] = 0; m[1] = 0x9908b0df;
i0 = 624;
}
if (i0 == 624) {
for (k = 0; k < 624; k++) {
N = (X[k] & 0x80000000) | (X[next[k]] & 0x7fffffff);
X[k] = X[k+shift[k]] ^ (N >> 1) ^ m[N & 1];
}
i0 = 0;
}
N = X[i0];
i0 ++;
N ^= (N >> 11);
N ^= (N << 7) & 0x9d2c5680;
N ^= (N << 15) & 0xefc60000;
N ^= (N >> 18);
return ( (N + 0.5) / 4294967296.0);
}
// [[Rcpp::export]]
void pi_antithetic() {
std::clock_t start;
int n, done;
double U, V, X, Y, Z, Zbar, Z2bar, stdZbarhat, epsilon, t, t_star;
epsilon = 0.0001;
start = clock();
Zbar = Z2bar = n = done = 0;
std::printf("simulations estimate error time elapsed time expected\n");
while (!done) {
U = MTUniform(0);
V = MTUniform(0);
X = (U*U + V*V <= 1 ? 4 : 0);
U = 1.0 - U;
V = 1.0 - V;
Y = (U*U + V*V <= 1 ? 4 : 0);
Z = (X + Y) / 2;
n ++;
Zbar = ((n-1) * Zbar + Z) / n;
Z2bar = ((n-1) * Z2bar + Z*Z) / n;
if (n % 50000000 == 0) {
stdZbarhat = sqrt ( (Z2bar - Zbar*Zbar) / n );
t = (clock() - start) / CLK_TCK;
t_star = t * pow (1.96 * stdZbarhat / epsilon, 2);
std::printf("%10d \%10.6f %10.6f %10.6f %14.6f\n", n, Zbar, 1.96*stdZbarhat, t, t_star);
if (1.96 * stdZbarhat <= epsilon) {
done = 1;
}
}
}
}
pi_antithetic()
simulations estimate error time elapsed time expected
50000000 3.141617 0.000274 4.000000 30.116444
100000000 3.141665 0.000194 8.000000 30.116028
150000000 3.141650 0.000158 12.000000 30.116154
200000000 3.141639 0.000137 17.000000 31.998513
250000000 3.141600 0.000123 21.000000 31.622416
300000000 3.141596 0.000112 25.000000 31.371481
350000000 3.141565 0.000104 29.000000 31.192498
400000000 3.141553 0.000097 33.000000 31.058155
Kelton Chapter 6 problems 1, 2, 3, 4, 5, 8
library(fitdistrplus)
base_url <- "https://raw.githubusercontent.com/jzuniga123/SPS/master/DATA%20604/"
The Excel file Problem_Dataset_06_01
, available for download in the student area on the book’s website as described in the Preface, contains 42 observations on interarrival times (in minutes) to a call center. Use Stat::Fit (or other software) to fit One or more probability distributions to these data, including goodness-of-fit testing and probability plots. What’s your recommendation for a distribution to be used in the simulation model from which to generate these interarival times? Provide the correct Sinio expression, heeding any parameterization-difference issues.
file <- "Problem_Dataset_06_01"
x <- read.csv(paste0(base_url, file, ".csv"), header=F)
descdist(x$V1, discrete=FALSE, boot=500)
## summary statistics
## ------
## min: 2.06 max: 48.65
## median: 9.015
## mean: 11.92048
## estimated sd: 9.832445
## estimated skewness: 1.790431
## estimated kurtosis: 6.895655
dist <- c("gamma", "lnorm", "weibull")
fit <- list()
for (i in 1:length(dist)) { fit[[i]] = fitdist(x$V1, dist[i]) }
par(mfrow=c(2,2))
denscomp(fit, legendtext = dist)
cdfcomp(fit, legendtext = dist)
qqcomp(fit, legendtext = dist)
ppcomp(fit, legendtext = dist)
gofstat(fit)
## Goodness-of-fit statistics
## 1-mle-gamma 2-mle-lnorm 3-mle-weibull
## Kolmogorov-Smirnov statistic 0.07461236 0.05044292 0.07934941
## Cramer-von Mises statistic 0.05174037 0.01825121 0.05847634
## Anderson-Darling statistic 0.35230811 0.14142579 0.44537170
##
## Goodness-of-fit criteria
## 1-mle-gamma 2-mle-lnorm 3-mle-weibull
## Akaike's Information Criterion 288.2076 284.9146 290.3611
## Bayesian Information Criterion 291.6830 288.3900 293.8364
The Gamma Distribution with the below parameters would be used in the Simio expression Random.Gamma(1.8546097, 0.1555721)
.
fitdist(x$V1, 'gamma')
## Fitting of the distribution ' gamma ' by maximum likelihood
## Parameters:
## estimate Std. Error
## shape 1.8546097 0.37397891
## rate 0.1555721 0.03598171
The Excel file Problem_Dataset_06_02
, available for download in the student area on the book’s website as described in the Preface, contains 47 observations on call durations (in minutes) at the call center of Problem 1. Use Stati::Fit (or other Software) to fit one or more probability distributions to these data. including goodness-of-fit testing and probability plots. What’s your recommendation for a distribution to be used in the simulation model from which to generate these call-duration times? Provide the and correct Simio expression, heeding any parameterization-difference issues.
file <- "Problem_Dataset_06_02"
x <- read.csv(paste0(base_url, file, ".csv"), header=F)
descdist(x$V1, discrete=FALSE, boot=500)
## summary statistics
## ------
## min: 3.8 max: 36.13
## median: 9.14
## mean: 10.89234
## estimated sd: 6.364957
## estimated skewness: 1.97635
## estimated kurtosis: 7.933002
dist <- c("gamma", "lnorm", "weibull")
fit <- list()
for (i in 1:length(dist)) { fit[[i]] = fitdist(x$V1, dist[i]) }
par(mfrow=c(2,2))
denscomp(fit, legendtext = dist)
cdfcomp(fit, legendtext = dist)
qqcomp(fit, legendtext = dist)
ppcomp(fit, legendtext = dist)
gofstat(fit)
## Goodness-of-fit statistics
## 1-mle-gamma 2-mle-lnorm 3-mle-weibull
## Kolmogorov-Smirnov statistic 0.09941791 0.06764782 0.1140618
## Cramer-von Mises statistic 0.11444195 0.03973639 0.2008969
## Anderson-Darling statistic 0.71667740 0.26739444 1.3230284
##
## Goodness-of-fit criteria
## 1-mle-gamma 2-mle-lnorm 3-mle-weibull
## Akaike's Information Criterion 288.3562 282.6829 296.2684
## Bayesian Information Criterion 292.0565 286.3832 299.9687
The Log-normal Distribution with the below parameters would be used in the Simio expression Random.Lognormal(2.2579185, 0.4905905)
.
fitdist(x$V1, 'lnorm')
## Fitting of the distribution ' lnorm ' by maximum likelihood
## Parameters:
## estimate Std. Error
## meanlog 2.2579185 0.07155999
## sdlog 0.4905905 0.05059961
The Excel file Problem_Dataset-06-03
,available for download in the student area on the book’s website as described in the Preface, contains 45 observations on the number of extra tech-support people (beyond the initial person who originally took the call) needed to resolve the problem on a call to the call center of Problems 1 and 2. Use Stat::Fit (or each other software) to fit one or more probability distributions to these data, including goodness-of-fit testing and probability plots. What’s your recommendation for a distribution to be used in the simulation model from which to generate the number of extra tech-Support people needed for a call? Provide the correct Simio expression. heeding any parameterization difference issues.
file <- "Problem_Dataset_06_03"
x <- read.csv(paste0(base_url, file, ".csv"), header=F)
descdist(x$V1, discrete=TRUE, boot=500)
## summary statistics
## ------
## min: 1 max: 8
## median: 3
## mean: 2.955556
## estimated sd: 1.536755
## estimated skewness: 1.02143
## estimated kurtosis: 4.428947
dist <- c("geom", "nbinom", "pois")
fit <- list()
for (i in 1:length(dist)) { fit[[i]] = fitdist(x$V1, dist[i], discrete=T) }
par(mfrow=c(2,2))
denscomp(fit, legendtext = dist)
cdfcomp(fit, legendtext = dist)
qqcomp(fit, legendtext = dist)
ppcomp(fit, legendtext = dist)
gofstat(fit)
## Chi-squared statistic: 31.0542 1.919848 1.920083
## Degree of freedom of the Chi-squared distribution: 3 2 3
## Chi-squared p-value: 8.280024e-07 0.3829221 0.5891583
## the p-value may be wrong with some theoretical counts < 5
## Chi-squared table:
## obscounts theo 1-mle-geom theo 2-mle-nbinom theo 3-mle-pois
## <= 1 7 19.876752 9.265752 9.264853
## <= 2 13 6.351383 10.230512 10.230086
## <= 3 11 4.745696 10.078506 10.078530
## <= 4 8 3.545942 7.446571 7.446914
## > 4 6 10.480227 7.978659 7.979617
##
## Goodness-of-fit criteria
## 1-mle-geom 2-mle-nbinom 3-mle-pois
## Akaike's Information Criterion 203.2825 166.2799 164.2799
## Bayesian Information Criterion 205.0891 169.8932 166.0866
The Poisson Distribution with the below parameters would be used in the Simio expression Random.Poisson(2.9555556)
.
fitdist(x$V1, 'pois')
## Fitting of the distribution ' pois ' by maximum likelihood
## Parameters:
## estimate Std. Error
## lambda 2.955556 0.2562791
Derive the inverse-CDF formula for generating random variates from a continuous uniform distribution between real numbers \(a\) and \(b\), \((a < b)\). \[F(x)=y=\frac{x-a}{b-a} \Rightarrow y(b-a)=x-a \Rightarrow\]\[y(b-a)+a=x \Rightarrow\]\[Q(x) = x(b-a)+a\]
Derive the inverse-CDF formula for generating random variates from a Weibull distribution. Look up its definition (including its CDF) in one of the references, either book or online, cited in this chapter. Check the Simio definition in its documentation to be sure you have your formula parameterized properly.
\[F(x;k,\lambda )=y=1+{ e }^{ { -\left( { x }/{ \lambda } \right) }^{ k } } \Rightarrow\]\[y-1={ e }^{ { -\left( { x }/{ \lambda } \right) }^{ k } }\Rightarrow \ln { \left( y-1 \right) } =-\left( \frac { x }{ \lambda } \right) ^{ k }\Rightarrow -\ln { \left( y-1 \right) } =\left( \frac { x }{ \lambda } \right) ^{ k }\Rightarrow \left( -\ln { \left( y-1 \right) } \right) ^{ 1/k }=\frac { x }{ \lambda } \Rightarrow \lambda \left( -\ln { \left( y-1 \right) } \right) ^{ 1/k }=x\Rightarrow\]\[Q(x;k,\lambda )=\lambda \left( -\ln { \left( x-1 \right) } \right) ^{ 1/k }\]
Re-do the produce-stand spreadsheet simulation (Problem 17 from Chapter 3), but now use more precise probability mass functions for daily demand; Walther has taken good data over recent years on this, and he also now allows customers to buy only in certain pre-packaged weights. For oats, use the distribution in Problem 7 in the present chapter [0.05, 0.07, 0.09, 0.11, 0.15, 0.25, 0.10, 0.09, 0.06, 0.03]; note that the pre-packaged sizes are (in pounds), 0 (meaning that a customer chooses not to buy any packages of oats), 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 7.5, and 10.0. For peas, the pre-packaged sizes are 0, 0.5, 1.0, 1.5, 2.0, and 3.0 pounds with respective demand probabilities 0.1, 0.2, 0.2, 0.3, 0.1, and 0.1. For beans, the pre-packaged sizes are 0, 1.0, 3.0, and 4.5 pounds with respective demand probabilities 0.2, 0.4, 0.3, and 0.1. For barley, the pre-packaged sizes are 0, 0.5, 1.0, and 3.5 pounds with respective demand probabilities 0.2, 0.4, 0.3, and 0.1. Refer to Problem 7 in the present chapter for the method to generate simulated daily demands on each product.
Compute the… expected value… of \(X\). Then, write a program… to generate first \(n = 100\) and then a separate \(n = 1000\) I.I.D. variates, and for each value of \(n\), compute the sample mean…use whatever random-number generator is built-in or convenient, which is good enough for this purpose.
\[\bar{x} = \frac{\sum_{i=1}^N w_i x_i}{\sum_{i=1}^N w_i}, \quad\quad s=\sqrt{ \frac{ \sum_{i=1}^N w_i (x_i - \bar{x}^*)^2 }{ \frac{(M-1)}{M} \sum_{i=1}^N w_i } }\]
demand <- function(product) {
x <- product[1, ]
w <- product[2, ]
mu <- weighted.mean(x, w)
numerator <- sum(w * (x - mu)^2)
M <- sum(w > 0)
denominator <- ((M - 1) / M) * sum(w)
sigma <- sqrt(numerator / denominator)
N <- 1000
sample1 <- numeric(N)
sample2 <- numeric(N / 10)
for (i in 1:N) {
sample1[i] <- abs(rnorm(1, mean = mu, sd = sigma))
if (i < 101) {
sample2[i] <- abs(rnorm(1, mean = mu, sd = sigma))
}
}
return(mean(mean(sample1), mean(sample2)))
}
oats <- matrix(c(0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 7.5, 10.0, 0.05, 0.07,
0.09, 0.11, 0.15, 0.25, 0.10, 0.09, 0.06, 0.03), 2 , 10, T)
peas <- matrix(c(0, 0.5, 1.0, 1.5, 2.0, 3.0, 0.1, 0.2, 0.2, 0.3, 0.1, 0.1), 2, 6, T)
beans <- matrix(c(0, 1.0, 3.0, 4.5, 0.2, 0.4, 0.3, 0.1), 2, 4, T)
barley <- matrix(c(0, 0.5, 1.0, 3.5, 0.2, 0.4, 0.3, 0.1), 2, 4, T)
n <- 90
day <- matrix(NA, n, 7)
for (i in 1:n) {
day[i, 1] <- demand(oats)
day[i, 2] <- demand(peas)
day[i, 3] <- demand(beans)
day[i, 4] <- demand(barley)
day[i, 5] <- sum(day[i, 1:4] * c(1.05, 3.17, 1.99, 0.95))
day[i, 6] <- sum(day[i, 1:4] * c(1.29, 3.76, 2.23, 1.65))
day[i, 7] <- day[i, 6] - day[i, 5]
}
t(data.frame(mean_cost = mean(day[, 5]), mean_revenue = mean(day[, 6]), mean_profit = mean(day[, 7]),
total_cost = sum(day[, 5]), total_revenue = sum(day[, 6]), total_profit = sum(day[, 7])))
## [,1]
## mean_cost 12.538676
## mean_revenue 15.337758
## mean_profit 2.799083
## total_cost 1128.480805
## total_revenue 1380.398237
## total_profit 251.917432
https://en.wikipedia.org/wiki/Weibull_distribution
http://www.di.fc.ul.pt/~jpn/r/distributions/fitting.html
https://stat.ethz.ch/R-manual/R-devel/library/MASS/html/fitdistr.html
https://cran.r-project.org/web/packages/fitdistrplus/vignettes/FAQ.html
http://s1.daumcdn.net/editor/fp/service_nc/pencil/Pencil_chromestore.html
https://www.rdocumentation.org/packages/fitdistrplus/versions/1.0-8/topics/fitdist
Simio and Simulation: Modeling, Analysis, Applications 3d Ed. by W. David Kelton, Jeffrey S. Smith and David T. Sturrock with Simio software.
Discrete-Event Systems Simulation, 5th Edition (2010), by Jerry Banks, John S. Carlson, Barry L. Nelson,and David M. Nicol.
Statistical Computing with R (2008), by Maria L. Rizzo. ISBN 1-58488-545-9
Methods of Monte Carlo Simulation, Baruch College MTH 4135, C. Douglas Howard, 2011