DATA 604: Week 8 Assignment

From Simio and Simulation: Modeling, Analysis, Applications

Chapter 6 Problem # 1

The Excel file Problem_Dataset_06_01.xls contains 42 observations on interarrival times (in minutes) to a call center. Use Stat::Fit to fit one or more probability distributions to these data, including goodness-of-fit testing and probability plots. What’s your recommendation for a distribution to be used in the simulation model? Provice the correct Simio expression.

Looking at the results of Stats::Fit, an Exponential or Lognormal distribution could be appropriate for these data based on the “eyeball test” and the output of the goodness-of-fit tests.

The Exponential distribution would be the first one to try for modeling interarrival times in Simio. The expression would be:

2.+Random.Exponential(9.92)

(Lognormal: 2.+Random.Lognormal(1.72, 1.28))

Chapter 6 Problem # 2

The Excel file Problem_Dataset_06_02.xls contains 47 observations on call duration times (in minutes) to a call center. Use Stat::Fit to fit one or more probability distributions to these data, including goodness-of-fit testing and probability plots. What’s your recommendation for a distribution to be used in the simulation model? Provice the correct Simio expression.

Looking at the results of Stats::Fit, an Exponential or Lognormal distribution could be appropriate for these data based on the “eyeball test” and the output of the goodness-of-fit tests. Based on the p-value from the goodness-of-fit tests, I would recommend the lognomral distribution.

For importing the Lognormal distribution into Simio for modeling call duration, the expression would be:

2.76+Random.Lognormal(1.84, 0.717)

Chapter 6 Problem # 3

The Excel file Problem_Dataset_06_03.xls contains 45 observations on the number of extra tech-support people need to resolve the problems on a call to the call center in Problem 1 and 2. Use Stat::Fit to fit one or more probability distributions to these data, including goodness-of-fit testing and probability plots. What’s your recommendation for a distribution to be used in the simulation model? Provice the correct Simio expression.

For this problem, the data were modeled as a discrete distribution with Binomial and Poisson distributions looking to be appropriate.

Of these, the Binomial distribution has a slightly higher goodness-of-fit p-value. The P-P Plot in this case is too close to differentiate between the two.

For importing the Binomial distribution into Simio for modeling call center support staff needed, the expression would be:

Random.Binomial(0.185, 16.)

Chapter 6 Problem # 4

Derive the inverse-CDF formula for generating random variates from a continuous uniform distribution between real numbers a and b (a < b)

The CDF for a continuous uniform distribution between real numbers a and b is given by:

\[ F(x) = \begin{cases} 0, x \lt a \\ \frac{x - a}{b - a} , a \le x \le b \\ 1, x \gt b \end{cases} \]

Using the inverse transform method,

\[ F(X) = (X - a)/(b - a) = R \]

or \[ R = (X - a)/(b - a) \]

Solving for X:

$ R(b - a) = X - a $

$ a + (b - a)R = X $ or

\[ X = a + (b - a)R \]

Chapter 6 Problem # 5

Derive the inverse-CDF formula for generating random variates from a Weibull distribution.

The Weibull distribution a model for time failure for machines or electronic components.

\[ f(x) = \begin{cases} \frac{\beta}{\sigma^\beta}x^{\beta-1}e^{-(x/\sigma)^\beta}, x \ge 0 \\ \frac{x - a}{b - a} , a \le x \le b \\ 0, otherwise \end{cases} \]

To generate a Weibull variate:

Determine the CDF which is given by:

\[ F(X) = 1 - e^{-(x/\sigma)^\beta} \space\space where \space \space x \ge 0 \]

Set $F(X) = 1 - e^{-(X/)} = R $

\[ R = 1 - e^{-(X/\sigma)^\beta} \]

\[ e^{-(X/\sigma)^\beta} = 1 - R \]

\[ -(X/\sigma)^\beta = ln(1 - R) \] \[ (X/\sigma)^\beta = -ln(1 - R) \] \[ (X)^\beta = \sigma[-ln(1 - R)] \] Giving us X in terms of R:

\[ X = \sigma[-ln(1 - R)]^{1\beta} \]

Chapter 6 Problem # 8

Redo the produce-stand spreadsheet simulation (Problem 17 from Chapter 3), but now use more precise probability mass functions for the daily demand; Walther has taken good data over the recent years on this, and he also now allows customers to buy only certain pre-packaged weights.

Set up the weights, probability, and cumulative probability per crop

# oats
oats <- data.frame(type = 'oats',
                   qty=c(0, 0.5, 1.0, 1.5, 2.0, 3, 4, 5, 7.5, 10),
                   prob=c(0.05, 0.07, 0.09, 0.11, 0.15, 0.25, 0.1, 0.09, 0.06, 0.03))
                   
oats$cumulative_prob <- cumsum(oats$prob)

# peas
peas <- data.frame(type = 'peas',
                  qty = c(0,0.5,1,1.5,2,3),
                  prob= c(0.1,0.2,0.2,0.3,0.1,0.1))

peas$cumulative_prob <- cumsum(peas$prob)

# beans
beans <- data.frame(type = 'beans',
                    qty= c(0,1,3,4.5),
                    prob = c(0.2,0.4,0.3,0.1))
                    
beans$cumulative_prob <- cumsum(beans$prob)

# barley

barley  <- data.frame(type = 'barley',
                      qty=c(0,0.5,1,3.5),
                      prob = c(0.2,0.4,0.3,0.1))
                      

barley$cumulative_prob <- cumsum(barley$prob)                    

# Combine them all into single data frame
crop_prob_table <- rbind(oats, peas, beans, barley)
crop_prob_table$type <- as.character(crop_prob_table$type)


# create the crop reference data for acquisition cose and sale price for each crop
crop_data <- 
  data.frame(
     crop = c("oats", "peas", "beans", "barley"),
     wac = c(1.05, 3.17, 1.99, 0.95),     # acquisition cost
     sell = c(1.29, 3.76, 2.23, 1.65),    # selling price
     crop_demand = 0,                     # simulated daily demand for the given crop; uniform distribution
     cost = 0,                            # crop simulated cost (wac * simulated demand) 
     revenue = 0,                         # crop simulated revenue value (selling price * simulated demand)  
     profit = 0,                          # crop simulated profit value (selling price * simulated demand)
     stringsAsFactors = F) 
     


## Function to simulate a crop season
sim_crop_season <- function(run_num, days) {
  
  # replicate each crop entry by the number of days in the season to simulate
  crop__sim <- as.data.frame(lapply(crop_data, rep, days))
  
  # set the iteration number
  crop__sim$run_num <- run_num 
  
  crop__sim$crop <- as.character(crop__sim$crop)  # convert from factor to character
  
  # daily daily demain as a uniform random number
  crop__sim$daily_demand <- replicate(nrow(crop__sim), runif(1)) 
  
  # set the day number
  crop__sim$day <- ceiling(seq(1:nrow(crop__sim))/4)

  for (i in 1:nrow(crop__sim)) {
    
    crop__sim$crop_demand[i] <-subset(crop_prob_table, type == crop__sim$crop[i] & 
                                      cumulative_prob >= crop__sim$daily_demand[i], 
                                     select=qty)[1,1]
    
    crop__sim$cost[i] <- crop__sim$crop_demand[i] * crop__sim$wac[i]
    crop__sim$revenue[i] <- crop__sim$crop_demand[i] * crop__sim$sell[i]
    crop__sim$profit[i] <- crop__sim$revenue[i] - crop__sim$cost[i]
  }
 
 return (crop__sim)
}

# calcualte for one season
sim1 <- sim_crop_season(1, 90)

crop	wac	sell	crop_demand	cost	revenue	profit	run_num	daily_demand	day
oats	1.05	1.29	4.0	4.200	5.160	0.960	1	0.7883051	1
peas	3.17	3.76	1.0	3.170	3.760	0.590	1	0.4089769	1
beans	1.99	2.23	3.0	5.970	6.690	0.720	1	0.8830174	1
barley	0.95	1.65	3.5	3.325	5.775	2.450	1	0.9404673	1
oats	1.05	1.29	0.0	0.000	0.000	0.000	1	0.0455565	2
peas	3.17	3.76	1.5	4.755	5.640	0.885	1	0.5281055	2

Run 30 iterations

sim <-  rbindlist(lapply(1:30, function(x) sim_crop_season(x, 90))) %>% as.data.frame()


sim %>% 
  group_by(run_num) %>% summarise(mean_profit = sum(profit)/90,
                                  mean_cost =  sum(cost)/90,
                                  mean_revenue = sum(revenue)/90)  -> agg

## Warning: package 'bindrcpp' was built under R version 3.3.3

Results

Walther doesn’t appear to be making much money with this approach

run_num	mean_profit	mean_cost	mean_revenue
1	2.454500	10.92189	13.37639
2	2.413889	10.52683	12.94072
3	2.293889	10.73489	13.02878
4	2.528722	11.60683	14.13556
5	2.418389	10.86444	13.28283
6	2.432556	11.21150	13.64406
7	2.419333	11.49667	13.91600
8	2.477333	11.42444	13.90178
9	2.568556	11.71267	14.28122
10	2.475889	11.30211	13.77800
11	2.387056	11.33761	13.72467
12	2.321389	10.73328	13.05467
13	2.513111	11.42172	13.93483
14	2.369778	10.63772	13.00750
15	2.500056	11.21778	13.71783
16	2.566389	11.62422	14.19061
17	2.423222	11.32961	13.75283
18	2.480222	11.58050	14.06072
19	2.416778	11.23383	13.65061
20	2.363722	11.18600	13.54972
21	2.387500	10.46561	12.85311
22	2.477111	11.39967	13.87678
23	2.315889	10.79744	13.11333
24	2.368667	11.19406	13.56272
25	2.352778	10.75539	13.10817
26	2.476556	11.90239	14.37894
27	2.489167	11.65811	14.14728
28	2.547111	11.48178	14.02889
29	2.290222	10.42800	12.71822
30	2.571167	11.33056	13.90172

# Run t-tests on the mean values:

t.test(agg$mean_profit)

## 
##  One Sample t-test
## 
## data:  agg$mean_profit
## t = 164.01, df = 29, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  2.406313 2.467083
## sample estimates:
## mean of x 
##  2.436698

t.test(agg$mean_cost)

## 
##  One Sample t-test
## 
## data:  agg$mean_cost
## t = 152.02, df = 29, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  11.03345 11.33438
## sample estimates:
## mean of x 
##  11.18392

t.test(agg$mean_revenue)

## 
##  One Sample t-test
## 
## data:  agg$mean_revenue
## t = 159.72, df = 29, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  13.44620 13.79503
## sample estimates:
## mean of x 
##  13.62062