Discussion Problem
- Antithetic Variance Reduction
Assignment Problems
References

Discussion Problem

This week, one of the topics of interest is variance reduction. Discuss a method of variance reduction in simulation (see Rizzo’s Statistical Computing with R chapter 5). Demonstrate the method. Discuss when it might be used.

Antithetic Variance Reduction

This method uses negatively correlated variables as a method for reducing variance. It capitalizes on the fact that if \(U\) is uniformly distributed on \([0,1]\) then \(1-U\) has the same distribution as \(U\), but \(U\) and \(1-U\) are negatively correlated. This method can be used in Monto-Carlo simulation to approximate the value of \(\pi\) faster than it would be approximated without the variance reduction. The below code demonstrates the approximation of \(\pi\) using the Monte-Carlo methods (sometimes called a dart throwing estimate) with variance reduction. The pseudorandom number generator used is the Mersenne Twister.

#include <ctime>
#include <Rcpp.h>

double MTUniform (unsigned int seed) {
    static unsigned int X[624], m[2];
    static int i0 = 625, next[624], shift[624];
    unsigned int k, N;
    if (i0 == 625) {
        X[0] = (seed ? seed : 1);
        for (k = 0; k < 624; k++) {
           if (k) X[k] = 22695477 * X[k-1] + 1;
           (k < 623) ? next[k] = k + 1 : next[k] = 0;
           (k < 227) ? shift[k] = 397 : shift[k] = -227;
        }
        m[0] = 0; m[1] = 0x9908b0df;
        i0 = 624;
    }
    if (i0 == 624) {
       for (k = 0; k < 624; k++) {
          N = (X[k] & 0x80000000) | (X[next[k]] & 0x7fffffff);
          X[k] = X[k+shift[k]] ^ (N >> 1) ^ m[N & 1];
       }
       i0 = 0;
    }
    N = X[i0];
    i0 ++;
    N ^= (N >> 11);
    N ^= (N << 7) & 0x9d2c5680;
    N ^= (N << 15) & 0xefc60000;
    N ^= (N >> 18);
    return ( (N + 0.5) / 4294967296.0);
}

// [[Rcpp::export]]
void pi_antithetic() {
  std::clock_t start;
  int n, done;
  double U, V, X, Y, Z, Zbar, Z2bar, stdZbarhat, epsilon, t, t_star;
  epsilon = 0.0001;
  start = clock();
  Zbar = Z2bar = n = done = 0;
  std::printf("simulations  estimate   error     time elapsed   time expected\n");
  while (!done) {
    U = MTUniform(0);
    V = MTUniform(0);
    X = (U*U + V*V <= 1 ? 4 : 0);
    U = 1.0 - U;
    V = 1.0 - V;
    Y = (U*U + V*V <= 1 ? 4 : 0);
    Z = (X + Y) / 2; 
    n ++;
    Zbar = ((n-1) * Zbar + Z) / n;
    Z2bar = ((n-1) * Z2bar + Z*Z) / n;
    if (n % 50000000 == 0) {
      stdZbarhat  = sqrt ( (Z2bar - Zbar*Zbar) / n );
      t = (clock() - start) / CLK_TCK;
      t_star = t * pow (1.96 * stdZbarhat / epsilon, 2);
      std::printf("%10d \%10.6f %10.6f %10.6f %14.6f\n", n, Zbar, 1.96*stdZbarhat, t, t_star);
      if (1.96 * stdZbarhat <= epsilon) {
        done = 1;
      }
    }
  }
}

pi_antithetic()

simulations  estimate   error     time elapsed   time expected
  50000000   3.141617   0.000274   4.000000      30.116444
 100000000   3.141665   0.000194   8.000000      30.116028
 150000000   3.141650   0.000158  12.000000      30.116154
 200000000   3.141639   0.000137  17.000000      31.998513
 250000000   3.141600   0.000123  21.000000      31.622416
 300000000   3.141596   0.000112  25.000000      31.371481
 350000000   3.141565   0.000104  29.000000      31.192498
 400000000   3.141553   0.000097  33.000000      31.058155

Assignment Problems

Kelton Chapter 6 problems 1, 2, 3, 4, 5, 8

library(fitdistrplus)
base_url <- "https://raw.githubusercontent.com/jzuniga123/SPS/master/DATA%20604/"

Problem 6.5.1

The Excel file Problem_Dataset_06_01, available for download in the student area on the book’s website as described in the Preface, contains 42 observations on interarrival times (in minutes) to a call center. Use Stat::Fit (or other software) to fit One or more probability distributions to these data, including goodness-of-fit testing and probability plots. What’s your recommendation for a distribution to be used in the simulation model from which to generate these interarival times? Provide the correct Sinio expression, heeding any parameterization-difference issues.

file <- "Problem_Dataset_06_01"
x <- read.csv(paste0(base_url, file, ".csv"), header=F)
descdist(x$V1, discrete=FALSE, boot=500)

## summary statistics
## ------
## min:  2.06   max:  48.65 
## median:  9.015 
## mean:  11.92048 
## estimated sd:  9.832445 
## estimated skewness:  1.790431 
## estimated kurtosis:  6.895655

dist <- c("gamma", "lnorm", "weibull")
fit <- list()
for (i in 1:length(dist)) { fit[[i]] = fitdist(x$V1, dist[i]) }
par(mfrow=c(2,2))
denscomp(fit, legendtext = dist)
cdfcomp(fit, legendtext = dist)
qqcomp(fit, legendtext = dist)
ppcomp(fit, legendtext = dist)

gofstat(fit)

## Goodness-of-fit statistics
##                              1-mle-gamma 2-mle-lnorm 3-mle-weibull
## Kolmogorov-Smirnov statistic  0.07461236  0.05044292    0.07934941
## Cramer-von Mises statistic    0.05174037  0.01825121    0.05847634
## Anderson-Darling statistic    0.35230811  0.14142579    0.44537170
## 
## Goodness-of-fit criteria
##                                1-mle-gamma 2-mle-lnorm 3-mle-weibull
## Akaike's Information Criterion    288.2076    284.9146      290.3611
## Bayesian Information Criterion    291.6830    288.3900      293.8364

The Gamma Distribution with the below parameters would be used in the Simio expression Random.Gamma(1.8546097, 0.1555721).

fitdist(x$V1, 'gamma')

## Fitting of the distribution ' gamma ' by maximum likelihood 
## Parameters:
##        estimate Std. Error
## shape 1.8546097 0.37397891
## rate  0.1555721 0.03598171

Problem 6.5.2

The Excel file Problem_Dataset_06_02, available for download in the student area on the book’s website as described in the Preface, contains 47 observations on call durations (in minutes) at the call center of Problem 1. Use Stati::Fit (or other Software) to fit one or more probability distributions to these data. including goodness-of-fit testing and probability plots. What’s your recommendation for a distribution to be used in the simulation model from which to generate these call-duration times? Provide the and correct Simio expression, heeding any parameterization-difference issues.

file <- "Problem_Dataset_06_02"
x <- read.csv(paste0(base_url, file, ".csv"), header=F)
descdist(x$V1, discrete=FALSE, boot=500)

## summary statistics
## ------
## min:  3.8   max:  36.13 
## median:  9.14 
## mean:  10.89234 
## estimated sd:  6.364957 
## estimated skewness:  1.97635 
## estimated kurtosis:  7.933002

dist <- c("gamma", "lnorm", "weibull")
fit <- list()
for (i in 1:length(dist)) { fit[[i]] = fitdist(x$V1, dist[i]) }
par(mfrow=c(2,2))
denscomp(fit, legendtext = dist)
cdfcomp(fit, legendtext = dist)
qqcomp(fit, legendtext = dist)
ppcomp(fit, legendtext = dist)

gofstat(fit)

## Goodness-of-fit statistics
##                              1-mle-gamma 2-mle-lnorm 3-mle-weibull
## Kolmogorov-Smirnov statistic  0.09941791  0.06764782     0.1140618
## Cramer-von Mises statistic    0.11444195  0.03973639     0.2008969
## Anderson-Darling statistic    0.71667740  0.26739444     1.3230284
## 
## Goodness-of-fit criteria
##                                1-mle-gamma 2-mle-lnorm 3-mle-weibull
## Akaike's Information Criterion    288.3562    282.6829      296.2684
## Bayesian Information Criterion    292.0565    286.3832      299.9687

The Log-normal Distribution with the below parameters would be used in the Simio expression Random.Lognormal(2.2579185, 0.4905905).

fitdist(x$V1, 'lnorm')

## Fitting of the distribution ' lnorm ' by maximum likelihood 
## Parameters:
##          estimate Std. Error
## meanlog 2.2579185 0.07155999
## sdlog   0.4905905 0.05059961

Problem 6.5.3

The Excel file Problem_Dataset-06-03,available for download in the student area on the book’s website as described in the Preface, contains 45 observations on the number of extra tech-support people (beyond the initial person who originally took the call) needed to resolve the problem on a call to the call center of Problems 1 and 2. Use Stat::Fit (or each other software) to fit one or more probability distributions to these data, including goodness-of-fit testing and probability plots. What’s your recommendation for a distribution to be used in the simulation model from which to generate the number of extra tech-Support people needed for a call? Provide the correct Simio expression. heeding any parameterization difference issues.

file <- "Problem_Dataset_06_03"
x <- read.csv(paste0(base_url, file, ".csv"), header=F)
descdist(x$V1, discrete=TRUE, boot=500)

## summary statistics
## ------
## min:  1   max:  8 
## median:  3 
## mean:  2.955556 
## estimated sd:  1.536755 
## estimated skewness:  1.02143 
## estimated kurtosis:  4.428947

dist <- c("geom", "nbinom", "pois")
fit <- list()
for (i in 1:length(dist)) { fit[[i]] = fitdist(x$V1, dist[i], discrete=T) }
par(mfrow=c(2,2))
denscomp(fit, legendtext = dist)
cdfcomp(fit, legendtext = dist)
qqcomp(fit, legendtext = dist)
ppcomp(fit, legendtext = dist)

gofstat(fit)

## Chi-squared statistic:  31.0542 1.919848 1.920083 
## Degree of freedom of the Chi-squared distribution:  3 2 3 
## Chi-squared p-value:  8.280024e-07 0.3829221 0.5891583 
##    the p-value may be wrong with some theoretical counts < 5  
## Chi-squared table:
##      obscounts theo 1-mle-geom theo 2-mle-nbinom theo 3-mle-pois
## <= 1         7       19.876752          9.265752        9.264853
## <= 2        13        6.351383         10.230512       10.230086
## <= 3        11        4.745696         10.078506       10.078530
## <= 4         8        3.545942          7.446571        7.446914
## > 4          6       10.480227          7.978659        7.979617
## 
## Goodness-of-fit criteria
##                                1-mle-geom 2-mle-nbinom 3-mle-pois
## Akaike's Information Criterion   203.2825     166.2799   164.2799
## Bayesian Information Criterion   205.0891     169.8932   166.0866

The Poisson Distribution with the below parameters would be used in the Simio expression Random.Poisson(2.9555556).

fitdist(x$V1, 'pois')

## Fitting of the distribution ' pois ' by maximum likelihood 
## Parameters:
##        estimate Std. Error
## lambda 2.955556  0.2562791

Problem 6.5.4

Derive the inverse-CDF formula for generating random variates from a continuous uniform distribution between real numbers \(a\) and \(b\), \((a < b)\). \[F(x)=y=\frac{x-a}{b-a} \Rightarrow y(b-a)=x-a \Rightarrow\]\[y(b-a)+a=x \Rightarrow\]\[Q(x) = x(b-a)+a\]

Problem 6.5.5

Derive the inverse-CDF formula for generating random variates from a Weibull distribution. Look up its definition (including its CDF) in one of the references, either book or online, cited in this chapter. Check the Simio definition in its documentation to be sure you have your formula parameterized properly.

\[F(x;k,\lambda )=y=1+{ e }^{ { -\left( { x }/{ \lambda } \right) }^{ k } } \Rightarrow\]\[y-1={ e }^{ { -\left( { x }/{ \lambda } \right) }^{ k } }\Rightarrow \ln { \left( y-1 \right) } =-\left( \frac { x }{ \lambda } \right) ^{ k }\Rightarrow -\ln { \left( y-1 \right) } =\left( \frac { x }{ \lambda } \right) ^{ k }\Rightarrow \left( -\ln { \left( y-1 \right) } \right) ^{ 1/k }=\frac { x }{ \lambda } \Rightarrow \lambda \left( -\ln { \left( y-1 \right) } \right) ^{ 1/k }=x\Rightarrow\]\[Q(x;k,\lambda )=\lambda \left( -\ln { \left( x-1 \right) } \right) ^{ 1/k }\]

Problem 6.5.8

Re-do the produce-stand spreadsheet simulation (Problem 17 from Chapter 3), but now use more precise probability mass functions for daily demand; Walther has taken good data over recent years on this, and he also now allows customers to buy only in certain pre-packaged weights. For oats, use the distribution in Problem 7 in the present chapter [0.05, 0.07, 0.09, 0.11, 0.15, 0.25, 0.10, 0.09, 0.06, 0.03]; note that the pre-packaged sizes are (in pounds), 0 (meaning that a customer chooses not to buy any packages of oats), 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 7.5, and 10.0. For peas, the pre-packaged sizes are 0, 0.5, 1.0, 1.5, 2.0, and 3.0 pounds with respective demand probabilities 0.1, 0.2, 0.2, 0.3, 0.1, and 0.1. For beans, the pre-packaged sizes are 0, 1.0, 3.0, and 4.5 pounds with respective demand probabilities 0.2, 0.4, 0.3, and 0.1. For barley, the pre-packaged sizes are 0, 0.5, 1.0, and 3.5 pounds with respective demand probabilities 0.2, 0.4, 0.3, and 0.1. Refer to Problem 7 in the present chapter for the method to generate simulated daily demands on each product.

Compute the… expected value… of \(X\). Then, write a program… to generate first \(n = 100\) and then a separate \(n = 1000\) I.I.D. variates, and for each value of \(n\), compute the sample mean…use whatever random-number generator is built-in or convenient, which is good enough for this purpose.

\[\bar{x} = \frac{\sum_{i=1}^N w_i x_i}{\sum_{i=1}^N w_i}, \quad\quad s=\sqrt{ \frac{ \sum_{i=1}^N w_i (x_i - \bar{x}^*)^2 }{ \frac{(M-1)}{M} \sum_{i=1}^N w_i } }\]

demand <- function(product) {
  x <- product[1, ]
  w <- product[2, ]
  mu <- weighted.mean(x, w)
  numerator <- sum(w * (x - mu)^2)
  M <- sum(w > 0)
  denominator <- ((M - 1) / M) * sum(w)
  sigma <- sqrt(numerator / denominator)
  N <- 1000
  sample1 <- numeric(N)
  sample2 <- numeric(N / 10)
  for (i in 1:N) {
    sample1[i] <- abs(rnorm(1, mean = mu, sd = sigma))
    if (i < 101) {
      sample2[i] <- abs(rnorm(1, mean = mu, sd = sigma))
    }
  }
  return(mean(mean(sample1), mean(sample2)))
}

oats <- matrix(c(0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 7.5, 10.0, 0.05, 0.07, 
                 0.09, 0.11, 0.15, 0.25, 0.10, 0.09, 0.06, 0.03), 2 , 10, T)
peas <- matrix(c(0, 0.5, 1.0, 1.5, 2.0, 3.0, 0.1, 0.2, 0.2, 0.3, 0.1, 0.1), 2, 6, T)
beans <- matrix(c(0, 1.0, 3.0, 4.5, 0.2, 0.4, 0.3, 0.1), 2, 4, T)
barley <- matrix(c(0, 0.5, 1.0, 3.5, 0.2, 0.4, 0.3, 0.1), 2, 4, T)

n <- 90
day <- matrix(NA, n, 7)
for (i in 1:n) {
  day[i, 1] <- demand(oats)
  day[i, 2] <- demand(peas)
  day[i, 3] <- demand(beans)
  day[i, 4] <- demand(barley)
  day[i, 5] <- sum(day[i, 1:4] * c(1.05, 3.17, 1.99, 0.95))
  day[i, 6] <- sum(day[i, 1:4] * c(1.29, 3.76, 2.23, 1.65))
  day[i, 7] <- day[i, 6] - day[i, 5]
  }

t(data.frame(mean_cost = mean(day[, 5]), mean_revenue = mean(day[, 6]), mean_profit = mean(day[, 7]),
             total_cost = sum(day[, 5]), total_revenue = sum(day[, 6]), total_profit = sum(day[, 7])))

##                      [,1]
## mean_cost       12.538676
## mean_revenue    15.337758
## mean_profit      2.799083
## total_cost    1128.480805
## total_revenue 1380.398237
## total_profit   251.917432