library(tidyverse)
Note: Please use the “Code” drop-down (upper-right of this window) to show the code.
Answers to a, b, & c are combined:
I’ll use a function dice_toss
that allows me to run toss_qty
number of times. The output is a data frame that is similar in appearance to the Excel workbook mentioned in the text:
dice_toss <- function(name, toss_qty, p){
# p is a vector of size equal to number discrete possibilities
d_prob <- p
toss <- c(1:toss_qty)
d1 <- sample(1:6, size = toss_qty, replace = TRUE, d_prob)
d2 <- sample(1:6, size = toss_qty, replace = TRUE, d_prob)
out <- cbind(run_name = name, toss, d1, d2) %>%
as.data.frame(stringsAsFactors=F) %>%
transmute(run_name = as.character(run_name),
toss = as.numeric(toss),
d1 = as.numeric(d1), d2 = as.numeric(d2),
d_tot = d1+d2,
run_tot = toss_qty)
return(out)
}
Below, I prepare the 4 data sets that will form the answer to this homework problem:
std_prob <- c(rep(1/6, 6))
ans_base <- dice_toss("50 Toss", 50, std_prob)
ans_a <- dice_toss("500 Toss", 500, std_prob)
ans_b <- dice_toss("500 Toss with p(1)=1/3", 500, c(1/3, 2/15, 2/15, 2/15, 2/15, 2/15))
ans_c <- dice_toss("10000 Toss", 10000, std_prob)
all_dat <- rbind(ans_base, ans_a, ans_b, ans_c)
Below is a sampling of a data set I created that contains the dice-toss runs from example 3.2.1 (50 tosses), 1a (500 tosses), 1b (500 w/ increased prob of getting a 1 on each die), and 1c (10,000 tosses).
run_name | toss | d1 | d2 | d_tot | run_tot |
---|---|---|---|---|---|
50 Toss | 48 | 2 | 1 | 3 | 50 |
50 Toss | 49 | 5 | 3 | 8 | 50 |
50 Toss | 50 | 4 | 1 | 5 | 50 |
500 Toss | 1 | 1 | 6 | 7 | 500 |
500 Toss | 2 | 2 | 5 | 7 | 500 |
Next I compare each run for parts a, b, and c:
new_order <- c("50 Toss","500 Toss","500 Toss with p(1)=1/3", "10000 Toss")
compare_runs <- all_dat %>%
select(run_name, d_tot) %>%
group_by(run_name) %>%
summarise_each(funs(n(), mean, min, max)) %>%
ungroup() %>%
as.data.frame()
compare_runs$run_name <- factor(compare_runs$run_name, levels = new_order)
# reorder
compare_runs<-compare_runs[order(compare_runs$run_name),]
row.names(compare_runs) <- NULL
run_name | n | mean | min | max |
---|---|---|---|---|
50 Toss | 50 | 7.1600 | 2 | 12 |
500 Toss | 500 | 7.1740 | 2 | 12 |
500 Toss with p(1)=1/3 | 500 | 5.9220 | 2 | 12 |
10000 Toss | 10000 | 6.9753 | 2 | 12 |
In the above table, note that the means between the 500 and the 10000 toss runs are close. The mean of the tosses for p(1) = 1/3 (i.e. the probability of rolling a 1 has been increased to 1/3 and each other number was assigned p = (1-1/3)/5 = 2/15) has been skewed enough to drop down to near 6 instead of the true mean 7.
Below, I’ve plotted each of the runs including the example run of 50 to visually inspect the different distributions:
Above, I note that the “500 Toss with p(1)=1/3” has been skewed significantly and no longer appears to be normal in nature.
Next, I’ll prepare the data set for comparison to the actual, calculated probability for each dice sum and review it in a table:
# prob of rolling 2, 3, ..., 10, 11, 12 with 2 dice:
actual_probs <- data.frame(run_name = rep("Actual",11),
d_tot = c(2.0,3.0,4.0,
5.0,6.0,7.0,
8.0,9.0,10.0,
11.0,12.0),
p_x = c(1,2,3,4,5,6,5,4,3,2,1)/36,
stringsAsFactors = F) %>% ungroup()
compare_p <- all_dat %>%
select(run_name, d_tot, run_tot) %>%
mutate(run = 1) %>%
group_by(run_name, d_tot, run_tot) %>%
summarise_each(funs(n())) %>%
mutate(p_x = run/run_tot) %>%
select(run_name, d_tot, p_x) %>% ungroup()
disp_p <- rbind(compare_p,actual_probs) %>%
mutate(p_x = round(p_x,digits=4)) %>%
spread(run_name, p_x) %>%
select(d_tot,
`50 Toss`:`500 Toss with p(1)=1/3`,
`10000 Toss`,
Actual)
row.names(disp_p) <- NULL
d_tot | 50 Toss | 500 Toss | 500 Toss with p(1)=1/3 | 10000 Toss | Actual |
---|---|---|---|---|---|
2 | 0.04 | 0.030 | 0.082 | 0.0281 | 0.0278 |
3 | 0.10 | 0.046 | 0.108 | 0.0569 | 0.0556 |
4 | 0.06 | 0.076 | 0.134 | 0.0833 | 0.0833 |
5 | 0.06 | 0.098 | 0.148 | 0.1136 | 0.1111 |
6 | 0.16 | 0.130 | 0.140 | 0.1422 | 0.1389 |
7 | 0.10 | 0.164 | 0.142 | 0.1644 | 0.1667 |
8 | 0.14 | 0.172 | 0.068 | 0.1367 | 0.1389 |
9 | 0.12 | 0.094 | 0.072 | 0.1059 | 0.1111 |
10 | 0.06 | 0.098 | 0.054 | 0.0872 | 0.0833 |
11 | 0.14 | 0.052 | 0.032 | 0.0559 | 0.0556 |
12 | 0.02 | 0.040 | 0.020 | 0.0258 | 0.0278 |
As expected, the table above shows that the run with 10,000 dice tosses is the closest to the calculated values.
In this problem we’re using static monte carlo simulation to generate an estimate of the area under a normal density curve with mean \(\mu\) and standard devation \(\sigma\) by simulating values \(X_i=U(b-a)+b\) in the function \(Y=(b-a)\phi(X_i)\) which should provide an estimate for the area given that \(E(Y)=\int_a^b \phi(x)dx\):
# some given variables and helper functions.
mu <- 5.8
sig <- 2.3
a <- 4.5
b <- 6.7
x_i <- function(a,b, t){
return(runif(t, a, b))
}
my_phi <- function(x, a, b, mu, sig){
return((b-a)*dnorm(x, mu, sig))
}
# z <- x_i(a, b, 30)
# q <- my_phi(z, a, b, mu, sig)
Below, I’ve created the my_monte
function that will create a data set employing the Static Monte Carlo Integration of the Normal Density Function where:
\(X_{i}=(b-a)U\) where \(U\) is distibuted continuously and uniformally between 0 and 1.
The normal density function is given by: \(\phi_{\mu,\sigma}(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-u)^2}{2\sigma^2}}\) where \(\mu\) is the mean and \(\sigma\) is the standard deviation.
# Monte Carlo integration of Normal Density function
my_monte <- function(n, a, b, mu, sig){
my_run <- data.frame(i = c(1:n),
X_i = c(x_i(a, b, n))) %>%
mutate(`(b-a)phi(x)` = my_phi(X_i, a, b, mu, sig))
return(my_run)
}
Here’s a small sample of the data set I’m working with:
i | X_i | (b-a)phi(x) |
---|---|---|
1 | 6.506592 | 0.3640076 |
2 | 5.301792 | 0.3727487 |
3 | 5.969329 | 0.3805642 |
4 | 5.283372 | 0.3720907 |
5 | 6.040188 | 0.3795219 |
6 | 5.243317 | 0.3705818 |
Below I calculate the confidence intervals using the sample data I generated:
exact_mu <- pnorm(b, mu, sig, log=F) - pnorm(a, mu, sig, log=F)
sample_mu <- mean(my_run$`(b-a)phi(x)`)
sample_sig <- sd(my_run$`(b-a)phi(x)`)
my_conf <- function(mu, s, n, i){
z_s <- c(qnorm(1-(1-0.90)/2), # z scores 90, 95, and 99
qnorm(1-(1-0.95)/2),
qnorm(1-(1-0.99)/2))
lower_l <- mu - z_s[i] * sample_sig/sqrt(n)
upper_l <- mu + z_s[i] * sample_sig/sqrt(n)
return(c(lower_l, upper_l))
}
Below is a table containing the confidence intervals for 90, 95, and 99% about the mean:
int_90 <- my_conf(sample_mu, sample_sig, nrow(my_run), 1)
int_95 <- my_conf(sample_mu, sample_sig, nrow(my_run), 2)
int_99 <- my_conf(sample_mu, sample_sig, nrow(my_run), 3)
my_intervals <- data.frame(names = c("90%", "95%", "99%"),
lower_b = c(int_90[1], int_95[1], int_99[1]),
sample_mu = rep(sample_mu, 3),
upper_b = c(int_90[2], int_95[2], int_99[2]),
stringsAsFactors = F) %>%
mutate(inside_conf = ifelse(sample_mu >= lower_b & sample_mu <= upper_b, "within", "out"))
names | lower_b | sample_mu | upper_b | inside_conf |
---|---|---|---|---|
90% | 0.3657184 | 0.3688065 | 0.3718946 | within |
95% | 0.3651269 | 0.3688065 | 0.3724862 | within |
99% | 0.3639706 | 0.3688065 | 0.3736424 | within |
Next, I run my monte-carlo simulation for the 90, 95, and 99 % confidence intervals 5,000 times to get the below plot that contains the sample mean (sample_mu
) with the 3 different confidence intervals.
run_qty <- 5000
many_runs <- NULL
for(i in 1:run_qty){
my_run <- my_monte(run_qty, a, b, mu, sig)
sample_mu <- mean(my_run$`(b-a)phi(x)`)
sample_sig <- sd(my_run$`(b-a)phi(x)`)
int_90 <- my_conf(sample_mu, sample_sig, nrow(my_run), 1)
int_95 <- my_conf(sample_mu, sample_sig, nrow(my_run), 2)
int_99 <- my_conf(sample_mu, sample_sig, nrow(my_run), 3)
my_intervals <- data.frame(run_name = c(rep(i,3)),
names = c("90%", "95%", "99%"),
lower_b = c(int_90[1], int_95[1], int_99[1]),
sample_mu = rep(sample_mu, 3),
upper_b = c(int_90[2], int_95[2], int_99[2]),
stringsAsFactors = F) %>%
mutate(inside_conf = ifelse(sample_mu >= lower_b &
sample_mu <= upper_b, "within", "out"))
many_runs<-rbind(many_runs, my_intervals)
}
As shown in the plot below, the upper and lower bounds of the confidence intervals (gray in color) become more well-defined around the true mean of 0.3663 as the confidence increases from 90 to 99%.
ggplot(many_runs, aes(names, sample_mu)) +
geom_jitter(aes(y=lower_b), color = "gray", alpha = .3) +
geom_jitter(aes(y=upper_b), color = "gray", alpha = .3) +
geom_jitter(aes(y=sample_mu, colour = names), alpha = .1)
Given the below data, simulate the total cost, total revenue, and total profit for each day and for a 90-day season (via problem 3.5.17 in the text pg71):
nm_code <- c(1:4)
produce <- c("oats", "peas", "beans", "barley")
w_price <- c(1.05, 3.17, 1.99, 0.95) # per pound wholesale $
r_price <- c(1.29, 3.76, 2.23, 1.65) # per pound retail $
D_min <- c(rep(0,4)) # zero vector min demand
D_max <- c(10, 8, 14, 11) # upper-range demand
t_season <- 90 # number of days to run
my_df <- cbind(w_price, r_price, D_max) %>%
as.data.frame()
rownames(my_df) <- produce
w_price | r_price | D_max | |
---|---|---|---|
oats | 1.05 | 1.29 | 10 |
peas | 3.17 | 3.76 | 8 |
beans | 1.99 | 2.23 | 14 |
barley | 0.95 | 1.65 | 11 |
Below I’ve created a function that generates a single day’s business and captures all of day’s output in single format - a business day.
D_gen <- function(i, d, w, r, nms){
day <- i
dmnd_qty <- round(runif(length(d))*d, digits = 0)
cost <- dmnd_qty * w
rev <- dmnd_qty * r
prof <- dmnd_qty * (r-w)
# below is the raw data frame containing cost, rev, & profit
# PER produce item. Detail not shown.
biz_day_det <- cbind(nms, day, dmnd_qty, cost, rev, prof) %>%
as.data.frame()
biz_day <- c(day=day,
daily_cost = sum(biz_day_det$cost),
daily_rev = sum(biz_day_det$rev),
daily_profit = sum(biz_day_det$prof)) %>%
t() %>%
as.data.frame()
return(biz_day)
}
Here’s an example of it running on day’s 1 through 4 - the results are randomly generated and driven by the specified ranges:
day | daily_cost | daily_rev | daily_profit |
---|---|---|---|
1 | 38.35 | 46.28 | 7.93 |
2 | 49.56 | 63.11 | 13.55 |
3 | 41.29 | 51.69 | 10.40 |
4 | 33.21 | 41.71 | 8.50 |
To capture the information for a season, I’ve created the below function that re-runs the above function for a given number of days, sums the results, and produces the season totals among cost, revenue, and profit.
season_run <- function(season_t){
a_season <- NULL #empty season
for(i in 1:season_t){
a_season <- rbind(a_season,
D_gen(i, D_max, w_price, r_price, nm_code)) %>%
as.data.frame()
}
season_results<- c(cost = sum(a_season$daily_cost),
revenue = sum(a_season$daily_rev),
profit = sum(a_season$daily_profit))
return(season_results)
}
Below, please find 10 different 90-day season runs - i.e. each row represents the total season’s cost, revenue, and profit:
cost | revenue | profit |
---|---|---|
3245.62 | 4063.71 | 818.09 |
3093.13 | 3865.93 | 772.80 |
3242.27 | 4052.72 | 810.45 |
3174.62 | 3952.55 | 777.93 |
3298.59 | 4100.95 | 802.36 |
3296.37 | 4139.39 | 843.02 |
3262.51 | 4045.60 | 783.09 |
3355.63 | 4140.04 | 784.41 |
3209.00 | 4006.75 | 797.75 |
3329.65 | 4135.00 | 805.35 |