HW3_604

library(tidyverse)

Note: Please use the “Code” drop-down (upper-right of this window) to show the code.

3.5.1:

Answers to a, b, & c are combined:

I’ll use a function dice_toss that allows me to run toss_qty number of times. The output is a data frame that is similar in appearance to the Excel workbook mentioned in the text:

dice_toss <- function(name, toss_qty, p){
  # p is a vector of size equal to number discrete possibilities
  d_prob <- p

  toss <- c(1:toss_qty)
  d1 <- sample(1:6, size = toss_qty, replace = TRUE, d_prob)
  d2 <- sample(1:6, size = toss_qty, replace = TRUE, d_prob)
  out <- cbind(run_name = name, toss, d1, d2) %>% 
    as.data.frame(stringsAsFactors=F)  %>% 
    transmute(run_name = as.character(run_name), 
              toss = as.numeric(toss),
              d1 = as.numeric(d1), d2 = as.numeric(d2),
              d_tot = d1+d2,
              run_tot = toss_qty)
  return(out)
}

Below, I prepare the 4 data sets that will form the answer to this homework problem:

std_prob <- c(rep(1/6, 6))
ans_base <- dice_toss("50 Toss", 50, std_prob)
ans_a <- dice_toss("500 Toss", 500, std_prob)
ans_b <- dice_toss("500 Toss with p(1)=1/3", 500, c(1/3, 2/15, 2/15, 2/15, 2/15, 2/15))
ans_c <- dice_toss("10000 Toss", 10000, std_prob) 

all_dat <- rbind(ans_base, ans_a, ans_b, ans_c)

Below is a sampling of a data set I created that contains the dice-toss runs from example 3.2.1 (50 tosses), 1a (500 tosses), 1b (500 w/ increased prob of getting a 1 on each die), and 1c (10,000 tosses).

Sample of dataset rows 48 to 52
run_name	toss	d1	d2	d_tot	run_tot
50 Toss	48	2	1	3	50
50 Toss	49	5	3	8	50
50 Toss	50	4	1	5	50
500 Toss	1	1	6	7	500
500 Toss	2	2	5	7	500

Next I compare each run for parts a, b, and c:

new_order <- c("50 Toss","500 Toss","500 Toss with p(1)=1/3", "10000 Toss")

compare_runs <- all_dat %>% 
  select(run_name, d_tot) %>% 
  group_by(run_name) %>% 
  summarise_each(funs(n(), mean, min, max)) %>% 
  ungroup() %>% 
  as.data.frame()

compare_runs$run_name <- factor(compare_runs$run_name, levels = new_order)

# reorder 
compare_runs<-compare_runs[order(compare_runs$run_name),]
row.names(compare_runs) <- NULL

Comparison of different runs:
run_name	n	mean	min	max
50 Toss	50	7.1600	2	12
500 Toss	500	7.1740	2	12
500 Toss with p(1)=1/3	500	5.9220	2	12
10000 Toss	10000	6.9753	2	12

In the above table, note that the means between the 500 and the 10000 toss runs are close. The mean of the tosses for p(1) = 1/3 (i.e. the probability of rolling a 1 has been increased to 1/3 and each other number was assigned p = (1-1/3)/5 = 2/15) has been skewed enough to drop down to near 6 instead of the true mean 7.

Below, I’ve plotted each of the runs including the example run of 50 to visually inspect the different distributions:

Above, I note that the “500 Toss with p(1)=1/3” has been skewed significantly and no longer appears to be normal in nature.

Next, I’ll prepare the data set for comparison to the actual, calculated probability for each dice sum and review it in a table:

# prob of rolling 2, 3, ..., 10, 11, 12 with 2 dice:  
actual_probs <- data.frame(run_name = rep("Actual",11),
                           d_tot = c(2.0,3.0,4.0, 
                                     5.0,6.0,7.0, 
                                     8.0,9.0,10.0, 
                                     11.0,12.0),
                           p_x = c(1,2,3,4,5,6,5,4,3,2,1)/36, 
                           stringsAsFactors = F) %>% ungroup()

compare_p <- all_dat %>% 
  select(run_name, d_tot, run_tot) %>% 
  mutate(run = 1) %>% 
  group_by(run_name, d_tot, run_tot) %>% 
  summarise_each(funs(n())) %>% 
  mutate(p_x = run/run_tot) %>% 
  select(run_name, d_tot, p_x) %>% ungroup()

disp_p <- rbind(compare_p,actual_probs) %>% 
  mutate(p_x = round(p_x,digits=4)) %>% 
  spread(run_name, p_x) %>% 
  select(d_tot, 
         `50 Toss`:`500 Toss with p(1)=1/3`,
         `10000 Toss`,
         Actual)

row.names(disp_p) <- NULL

P(Dice Total) by Run vs. Actual
d_tot	50 Toss	500 Toss	500 Toss with p(1)=1/3	10000 Toss	Actual
2	0.04	0.030	0.082	0.0281	0.0278
3	0.10	0.046	0.108	0.0569	0.0556
4	0.06	0.076	0.134	0.0833	0.0833
5	0.06	0.098	0.148	0.1136	0.1111
6	0.16	0.130	0.140	0.1422	0.1389
7	0.10	0.164	0.142	0.1644	0.1667
8	0.14	0.172	0.068	0.1367	0.1389
9	0.12	0.094	0.072	0.1059	0.1111
10	0.06	0.098	0.054	0.0872	0.0833
11	0.14	0.052	0.032	0.0559	0.0556
12	0.02	0.040	0.020	0.0258	0.0278

As expected, the table above shows that the run with 10,000 dice tosses is the closest to the calculated values.

3.5.5:

In this problem we’re using static monte carlo simulation to generate an estimate of the area under a normal density curve with mean $\mu$ and standard devation $\sigma$ by simulating values $X_i=U(b-a)+b$ in the function $Y=(b-a)\phi(X_i)$ which should provide an estimate for the area given that $E(Y)=\int_a^b \phi(x)dx$:

# some given variables and helper functions. 
mu <- 5.8
sig <- 2.3
a <- 4.5
b <- 6.7

x_i <- function(a,b, t){
  return(runif(t, a, b))
}

my_phi <- function(x, a, b, mu, sig){
 return((b-a)*dnorm(x, mu, sig))
}

# z <- x_i(a, b, 30)
# q <- my_phi(z, a, b, mu, sig)

Below, I’ve created the my_monte function that will create a data set employing the Static Monte Carlo Integration of the Normal Density Function where:

$X_{i}=(b-a)U$ where $U$ is distibuted continuously and uniformally between 0 and 1.

The normal density function is given by: $\phi_{\mu,\sigma}(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-u)^2}{2\sigma^2}}$ where $\mu$ is the mean and $\sigma$ is the standard deviation.

# Monte Carlo integration of Normal Density function
my_monte <- function(n, a, b, mu, sig){
  my_run <- data.frame(i = c(1:n), 
                       X_i = c(x_i(a, b, n))) %>% 
    mutate(`(b-a)phi(x)` = my_phi(X_i, a, b, mu, sig))
  
  return(my_run)
}

Here’s a small sample of the data set I’m working with:

Sample of Static Monte Carlo output
i	X_i	(b-a)phi(x)
1	6.506592	0.3640076
2	5.301792	0.3727487
3	5.969329	0.3805642
4	5.283372	0.3720907
5	6.040188	0.3795219
6	5.243317	0.3705818

Below I calculate the confidence intervals using the sample data I generated:

exact_mu <- pnorm(b, mu, sig, log=F) - pnorm(a, mu, sig, log=F)

sample_mu <- mean(my_run$`(b-a)phi(x)`)
sample_sig <- sd(my_run$`(b-a)phi(x)`)

my_conf <- function(mu, s, n, i){
  z_s <- c(qnorm(1-(1-0.90)/2), # z scores 90, 95, and 99
          qnorm(1-(1-0.95)/2), 
          qnorm(1-(1-0.99)/2))
  
  lower_l <- mu - z_s[i] * sample_sig/sqrt(n)
  upper_l <- mu + z_s[i] * sample_sig/sqrt(n)

  return(c(lower_l, upper_l))
}

Below is a table containing the confidence intervals for 90, 95, and 99% about the mean:

int_90 <- my_conf(sample_mu, sample_sig, nrow(my_run), 1)
int_95 <- my_conf(sample_mu, sample_sig, nrow(my_run), 2)
int_99 <- my_conf(sample_mu, sample_sig, nrow(my_run), 3)

my_intervals <- data.frame(names = c("90%", "95%", "99%"),
                      lower_b = c(int_90[1], int_95[1], int_99[1]), 
                      sample_mu = rep(sample_mu, 3),
                      upper_b = c(int_90[2], int_95[2], int_99[2]), 
                      stringsAsFactors = F) %>% 
  mutate(inside_conf = ifelse(sample_mu >= lower_b & sample_mu <= upper_b, "within", "out"))

Means within confidience intervals
names	lower_b	sample_mu	upper_b	inside_conf
90%	0.3657184	0.3688065	0.3718946	within
95%	0.3651269	0.3688065	0.3724862	within
99%	0.3639706	0.3688065	0.3736424	within

Next, I run my monte-carlo simulation for the 90, 95, and 99 % confidence intervals 5,000 times to get the below plot that contains the sample mean (sample_mu) with the 3 different confidence intervals.

run_qty <- 5000
many_runs <- NULL

for(i in 1:run_qty){
  my_run <- my_monte(run_qty, a, b, mu, sig)
  sample_mu <- mean(my_run$`(b-a)phi(x)`)
  sample_sig <- sd(my_run$`(b-a)phi(x)`)
  
  int_90 <- my_conf(sample_mu, sample_sig, nrow(my_run), 1)
  int_95 <- my_conf(sample_mu, sample_sig, nrow(my_run), 2)
  int_99 <- my_conf(sample_mu, sample_sig, nrow(my_run), 3)
  
  my_intervals <- data.frame(run_name = c(rep(i,3)),
                             names = c("90%", "95%", "99%"),
                             lower_b = c(int_90[1], int_95[1], int_99[1]), 
                             sample_mu = rep(sample_mu, 3),
                             upper_b = c(int_90[2], int_95[2], int_99[2]), 
                             stringsAsFactors = F) %>% 
  mutate(inside_conf = ifelse(sample_mu >= lower_b & 
                                sample_mu <= upper_b, "within", "out"))
  
  many_runs<-rbind(many_runs, my_intervals)
  
}

As shown in the plot below, the upper and lower bounds of the confidence intervals (gray in color) become more well-defined around the true mean of 0.3663 as the confidence increases from 90 to 99%.

ggplot(many_runs, aes(names, sample_mu)) + 
  geom_jitter(aes(y=lower_b), color = "gray", alpha = .3) + 
  geom_jitter(aes(y=upper_b), color = "gray", alpha = .3) + 
  geom_jitter(aes(y=sample_mu, colour = names), alpha = .1)

3.5.17:

Given the below data, simulate the total cost, total revenue, and total profit for each day and for a 90-day season (via problem 3.5.17 in the text pg71):

nm_code <- c(1:4)
produce <- c("oats", "peas", "beans", "barley")
w_price <- c(1.05, 3.17, 1.99, 0.95)                 # per pound wholesale $
r_price <- c(1.29, 3.76, 2.23, 1.65)                 # per pound retail $
D_min <- c(rep(0,4))                                 # zero vector min demand
D_max <- c(10, 8, 14, 11)                            # upper-range demand 
t_season <- 90                                       # number of days to run

my_df <- cbind(w_price, r_price, D_max) %>% 
  as.data.frame() 
rownames(my_df) <- produce

Wholesale $, Retail $, and Daily Peak Demand for Produce:
	w_price	r_price	D_max
oats	1.05	1.29	10
peas	3.17	3.76	8
beans	1.99	2.23	14
barley	0.95	1.65	11

Below I’ve created a function that generates a single day’s business and captures all of day’s output in single format - a business day.

D_gen <- function(i, d, w, r, nms){
  day <- i
  dmnd_qty <- round(runif(length(d))*d, digits = 0)
  cost <- dmnd_qty * w
  rev <- dmnd_qty * r
  prof <- dmnd_qty * (r-w)

  # below is the raw data frame containing cost, rev, & profit 
  # PER produce item.  Detail not shown. 
  biz_day_det <- cbind(nms, day, dmnd_qty, cost, rev, prof) %>% 
    as.data.frame()
  
  biz_day <- c(day=day, 
               daily_cost = sum(biz_day_det$cost),
               daily_rev = sum(biz_day_det$rev), 
               daily_profit = sum(biz_day_det$prof)) %>% 
    t() %>% 
    as.data.frame() 
  
  return(biz_day)
}

Here’s an example of it running on day’s 1 through 4 - the results are randomly generated and driven by the specified ranges:

Example of daily runs:
day	daily_cost	daily_rev	daily_profit
1	38.35	46.28	7.93
2	49.56	63.11	13.55
3	41.29	51.69	10.40
4	33.21	41.71	8.50

To capture the information for a season, I’ve created the below function that re-runs the above function for a given number of days, sums the results, and produces the season totals among cost, revenue, and profit.

season_run <- function(season_t){
  a_season <- NULL #empty season
  for(i in 1:season_t){
    a_season <- rbind(a_season, 
                      D_gen(i, D_max, w_price, r_price, nm_code)) %>% 
      as.data.frame()
    
  }
  season_results<- c(cost = sum(a_season$daily_cost), 
                   revenue = sum(a_season$daily_rev), 
                   profit = sum(a_season$daily_profit))
  return(season_results)
}

Below, please find 10 different 90-day season runs - i.e. each row represents the total season’s cost, revenue, and profit:

Example of 10 unique 90-day runs:
cost	revenue	profit
3245.62	4063.71	818.09
3093.13	3865.93	772.80
3242.27	4052.72	810.45
3174.62	3952.55	777.93
3298.59	4100.95	802.36
3296.37	4139.39	843.02
3262.51	4045.60	783.09
3355.63	4140.04	784.41
3209.00	4006.75	797.75
3329.65	4135.00	805.35

HW3_604

jbrnbrg

September 13, 2017

3.5.1:

3.5.5:

3.5.17: