About

In this lab we will focus on sensitivity analysis and Monte Carlo simulations.

Sensitivity analysis is the study of how the uncertainty in the output of a mathematical model or system (numerical or otherwise) can be apportioned to different sources of uncertainty in its inputs. We will use the lpSolveAPI R package as we did in the previous lab.

Monte Carlo simulations utilize repeated random sampling from a given universe or population to estimate quantities of interest. This type of simulation is known as a probabilistic simulation, as opposed to a deterministic one.

An example of a Monte Carlo simulation is one used to approximate the value of pi. The simulation generates random points within a unit square and counts how many fall within the circle inscribed in that square. Since the ratio of the circle's area to the square's area is pi/4, the fraction of points landing inside the circle approximates pi/4. The higher the number of sampled points, the closer the estimate is to the actual value; after sampling 30,000 random points, the estimate of pi approaches the actual value to within four decimal places of precision.
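
A minimal R sketch of this idea (the seed and the point count are illustrative, not prescribed by the lab):

# Estimate pi by sampling points uniformly in the unit square and counting
# the fraction that fall inside the inscribed circle (center (0.5, 0.5),
# radius 0.5); that fraction approximates pi/4
set.seed(123)                                # illustrative seed, for reproducibility
n = 30000                                    # number of random points
x = runif(n)
y = runif(n)
inside = (x - 0.5)^2 + (y - 0.5)^2 <= 0.25   # TRUE when a point lies inside the circle
4 * mean(inside)                             # Monte Carlo estimate of pi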

In this lab, we will learn how to generate random samples with various simulations and how to run a sensitivity analysis on the marketing use case covered so far.

Setup

Remember to always set your working directory to the source file location. Go to ‘Session’, scroll down to ‘Set Working Directory’, and click ‘To Source File Location’. Read the instructions below carefully and follow them to complete the tasks and answer any questions. Submit your work to RPubs as detailed in previous notes.

Note

For your assignment you may be using different data sets than the ones included here. Always read the instructions on Sakai carefully. Tasks/questions to be completed/answered are highlighted in larger bolded fonts and numbered according to their particular placement in the task section.


PART A: SENSITIVITY ANALYSIS

In order to conduct the sensitivity analysis, we will need to install the lpSolveAPI package again, unless it is already installed in your R environment.

# require() returns FALSE if the package is not already installed, in which case we install it
# dependencies = TRUE makes sure that any package dependencies are installed as well
if(!require("lpSolveAPI",quietly = TRUE))
  install.packages("lpSolveAPI",dependencies = TRUE, repos = "https://cloud.r-project.org")

We will revisit and solve again the marketing case discussed in class (also part of previous lab).

# We start with `0` constraints and `2` decision variables. The object name `lpmark` is arbitrary.
lpmark = make.lp(0, 2)

# Define the type of optimization as maximization and dump the screen output into a `dummy` variable
dummy = lp.control(lpmark, sense="max") 

# Set the objective function coefficients 
set.objfn(lpmark, c(275.691, 48.341))

# Add all constraints to the model
add.constraint(lpmark, c(1, 1), "<=", 350000)
add.constraint(lpmark, c(1, 0), ">=", 15000)
add.constraint(lpmark, c(0, 1), ">=", 75000)
add.constraint(lpmark, c(2, -1), "=", 0)
add.constraint(lpmark, c(1, 0), ">=", 0)
add.constraint(lpmark, c(0, 1), ">=", 0)

# Show the problem setup in tabular/matrix form. This is useful for checking that our constraints have been set properly.
lpmark
## Model name: 
##                C1       C2            
## Maximize  275.691   48.341            
## R1              1        1  <=  350000
## R2              1        0  >=   15000
## R3              0        1  >=   75000
## R4              2       -1   =       0
## R5              1        0  >=       0
## R6              0        1  >=       0
## Kind          Std      Std            
## Type         Real     Real            
## Upper         Inf      Inf            
## Lower           0        0
# Solve the linear programming problem; a return value of 0 indicates success
solve(lpmark)
## [1] 0
# The next two lines of code show the optimum results.
# First: display the objective function's optimum value, i.e. the optimum sales value.
get.objective(lpmark)
## [1] 43443517
# Second: display the optimum values of the decision variables, i.e. the optimum values for radio and tv ads.
get.variables(lpmark) 
## [1] 116666.7 233333.3

For the sensitivity part we will add two new code sections to obtain the sensitivity results.

# Display sensitivity to the COEFFICIENTS of the objective function.
get.sensitivity.obj(lpmark)
## $objfrom
## [1]  -96.6820 -137.8455
## 
## $objtill
## [1] 1e+30 1e+30
TASK 1: The results have two parts: the output labeled objfrom shows the lower limits of the coefficients, while the output labeled objtill shows the upper limits. Explain in a concise manner what the sensitivity results represent in reference to the marketing model.

ANSWER TASK 1: The sensitivity results show the lower limits of the objective coefficients for both radio and tv. Radio has a lower limit of -96.682, while tv has a lower limit of -137.846. There are no finite upper limits in this situation (1e+30 represents infinity). This means the radio and tv coefficients can move anywhere in the ranges of -96.682 to infinity and -137.846 to infinity, respectively, without changing the optimum solution. The optimum solution will only change if the values go beyond the stated lower limits.
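
One way to verify this independently (a minimal sketch, not part of the required tasks; the coefficient 200 is an arbitrary value chosen inside the reported range) is to re-solve the model with a changed objective coefficient and confirm that the optimal decision variables do not move:

# Rebuild the model with the first objective coefficient changed from
# 275.691 to 200, which is still above its lower limit of -96.682
lpcheck = make.lp(0, 2)
dummy = lp.control(lpcheck, sense = "max")
set.objfn(lpcheck, c(200, 48.341))
add.constraint(lpcheck, c(1, 1), "<=", 350000)
add.constraint(lpcheck, c(1, 0), ">=", 15000)
add.constraint(lpcheck, c(0, 1), ">=", 75000)
add.constraint(lpcheck, c(2, -1), "=", 0)
add.constraint(lpcheck, c(1, 0), ">=", 0)
add.constraint(lpcheck, c(0, 1), ">=", 0)
solve(lpcheck)
get.variables(lpcheck)   # expected to remain 116666.7 and 233333.3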

# Display sensitivity to the CONSTRAINTS (or the right-hand side values).
# There will be a total of m+n values, where m is the number of constraints and n is the number of decision variables
get.sensitivity.rhs(lpmark) 
## $duals
## [1] 124.12433   0.00000   0.00000  75.78333   0.00000   0.00000   0.00000
## [8]   0.00000
## 
## $dualsfrom
## [1]  1.125e+05 -1.000e+30 -1.000e+30 -3.050e+05 -1.000e+30 -1.000e+30
## [7] -1.000e+30 -1.000e+30
## 
## $dualstill
## [1] 1.00e+30 1.00e+30 1.00e+30 4.75e+05 1.00e+30 1.00e+30 1.00e+30 1.00e+30
TASK 2: For this exercise we are only interested in the first part of the output, which is labeled duals. Explain in a concise manner what the two non-zero sensitivity results represent. In your answer, distinguish between the binding and non-binding constraints, and include an explanation of the surplus/slack and marginal values.

ANSWER TASK 2: The values 124.12433 and 75.78333 are the dual (marginal) values of the two binding constraints: the budget constraint (R1) and the fourth constraint (R4). These constraints are binding because their resources are fully used at the optimum, leaving no slack or surplus, so any change in their right-hand side values impacts the optimal solution. The zero values correspond to the non-binding constraints; changing their right-hand side values has no impact on the optimal solution. Each non-zero dual is the marginal value of its constraint: the amount by which the optimal sales value changes per unit increase in the constraint's right-hand side.
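
Which constraints are binding can also be checked directly (a minimal sketch; get.constraints() returns each constraint's left-hand side value at the optimum):

# Compare each constraint's left-hand side at the optimum with its
# right-hand side; R1 and R4 show lhs equal to rhs, i.e. zero slack/surplus
lhs = get.constraints(lpmark)
rhs = c(350000, 15000, 75000, 0, 0, 0)
data.frame(constraint = paste0("R", 1:6), lhs, rhs)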

To acquire a better understanding of the sensitivity results, and to confirm the integrity of the calculations, we can conduct independent tests.

TASK 3: Run the linear programming solver again from the beginning by defining a new model object lpmark1. All else being equal, change the budget constraint by only $1 and solve. Specifically, change the first constraint X1 + X2 <= 350000 by only $1 so that the new constraint is X1 + X2 <= 350001. Note the optimum value for sales as given by the objective function.

The optimum value of sales as given by the objective function is 43443641.

# Define a new model object called lpmark1 with 0 constraints and 2 decision variables
lpmark1 = make.lp(0, 2)
# Repeat the rest of the commands, changing only the budget constraint. Solve and display the objective function's optimum value.

dummy = lp.control(lpmark1, sense="max") 

set.objfn(lpmark1, c(275.691, 48.341))
add.constraint(lpmark1, c(1, 1), "<=", 350001)
add.constraint(lpmark1, c(1, 0), ">=", 15000)
add.constraint(lpmark1, c(0, 1), ">=", 75000)
add.constraint(lpmark1, c(2, -1), "=", 0)
add.constraint(lpmark1, c(1, 0), ">=", 0)
add.constraint(lpmark1, c(0, 1), ">=", 0)

lpmark1
## Model name: 
##                C1       C2            
## Maximize  275.691   48.341            
## R1              1        1  <=  350001
## R2              1        0  >=   15000
## R3              0        1  >=   75000
## R4              2       -1   =       0
## R5              1        0  >=       0
## R6              0        1  >=       0
## Kind          Std      Std            
## Type         Real     Real            
## Upper         Inf      Inf            
## Lower           0        0
solve(lpmark1)
## [1] 0
get.objective(lpmark1)
## [1] 43443641
get.variables(lpmark1)
## [1] 116667 233334
TASK 4: Calculate the differential change in sales. Share your observations.

ANSWER TASK 4: The optimum sales for lpmark is 43443517. The optimum sales for lpmark1 is 43443641. The differential change in sales between the two values is 124 (as displayed), which matches, up to the rounding of the printed totals, the dual value of 124.12433 reported for the budget constraint: increasing the budget by $1 increases optimal sales by the constraint's marginal value.
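
The differential can also be computed directly from the two solved model objects:

# Difference in optimal sales between the $1-larger budget and the original
get.objective(lpmark1) - get.objective(lpmark)   # about 124.12, the dual value of R1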

TASK 5: Run the linear programming solver again from the beginning by defining a new model object lpmark2. All else being equal, change the fourth constraint 2X1 - X2 = 0 by only $1 and solve. The new constraint will be 2X1 - X2 = 1. Note the optimum value for sales as given by the objective function.

The optimum value for sales as given by the objective function is 43443592.

# Define a new model object called lpmark2 with 0 constraints and 2 decision variables
lpmark2 = make.lp(0, 2)
# Repeat the rest of the commands, changing only the fourth constraint. Solve and display the objective function's optimum value.

dummy = lp.control(lpmark2, sense="max") 

set.objfn(lpmark2, c(275.691, 48.341))
add.constraint(lpmark2, c(1, 1), "<=", 350000)
add.constraint(lpmark2, c(1, 0), ">=", 15000)
add.constraint(lpmark2, c(0, 1), ">=", 75000)
add.constraint(lpmark2, c(2, -1), "=", 1)
add.constraint(lpmark2, c(1, 0), ">=", 0)
add.constraint(lpmark2, c(0, 1), ">=", 0)

lpmark2
## Model name: 
##                C1       C2            
## Maximize  275.691   48.341            
## R1              1        1  <=  350000
## R2              1        0  >=   15000
## R3              0        1  >=   75000
## R4              2       -1   =       1
## R5              1        0  >=       0
## R6              0        1  >=       0
## Kind          Std      Std            
## Type         Real     Real            
## Upper         Inf      Inf            
## Lower           0        0
solve(lpmark2)
## [1] 0
get.objective(lpmark2)
## [1] 43443592
get.variables(lpmark2)
## [1] 116667 233333

TASK 6: Calculate the differential change in sales. Share your observations.

ANSWER TASK 6: The optimum sales for lpmark is 43443517. The optimum sales for lpmark2 is 43443592. The differential change in sales between these two values is 75 (as displayed), which matches, up to rounding, the dual value of 75.78333 reported for the fourth constraint. For comparison, the optimum sales for lpmark1 is 43443641, so the difference between lpmark1 and lpmark2 is 49.
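
As before, the differential can be computed directly:

# Difference in optimal sales between lpmark2 and the original model
get.objective(lpmark2) - get.objective(lpmark)   # about 75.78, the dual value of R4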


PART B: MONTE CARLO SIMULATION

For this task we will run a Monte Carlo simulation to calculate the probability that the daily return of the S&P will be greater than 5%. We will assume that the historical S&P daily return follows a normal distribution with an average daily return of 0.03 (%) and a standard deviation of 0.97 (%).

To begin, we will generate 100 random samples from the normal distribution. From the generated samples we will calculate the mean, the standard deviation, and the probability that a simulated result is greater than 5%.

To generate random samples from a normal distribution we will use the rnorm() function in R. In the example below we set the number of runs (or samples) to 100.

# number of simulations/samples
runs = 100
# generate random samples from the normal distribution with the given mean and standard deviation
sims =  rnorm(runs,mean=0.03,sd=0.97)
# Mean calculated from the random samples
average = mean(sims)
average
## [1] 0.07731953
# Standard deviation (STD) calculated from the random samples
std = sd(sims) 
std
## [1] 1.025724
# The probability of occurrence on any given day is estimated as the count (or sum) of samples greater than or equal to 5%, divided by the total number of samples.
prob = sum(sims >=0.05)/runs
prob
## [1] 0.51
TASK 7: Repeat the above calculations for the case where the number of simulations/samples is equal to 1000. Record the mean, standard deviation, and probability. Name all the required variables runs1, sims1, average1, std1, and prob1.
# Repeat calculations here
runs1 = 1000
sims1 = rnorm(runs1,mean=0.03,sd=0.97)
average1 = mean(sims1)
average1
## [1] -0.0002551315
std1 = sd(sims1)
std1
## [1] 0.9840458
prob1 = sum (sims1 >=0.05)/runs1
prob1
## [1] 0.471
TASK 8: Repeat the above calculations for the case where the number of simulations/samples is equal to 10000. Record the mean, standard deviation, and probability. Name all the required variables runs2, sims2, average2, std2, and prob2.
# Repeat calculations here
runs2 = 10000
sims2 = rnorm(runs2,mean=0.03,sd=0.97)
average2 = mean(sims2)
average2
## [1] 0.03491316
std2 = sd(sims2)
std2
## [1] 0.9751932
prob2 = sum (sims2 >=0.05)/runs2
prob2
## [1] 0.495
TASK 9: List in a tabular form the values for mean, standard deviation, and probability for all three cases: 100, 1000, and 10000 simulations.

Runs     Mean           Std Dev     Probability
100      0.09111678     0.9815205   0.58
1000     -0.009480934   0.9548793   0.475
10000    0.04620888     0.9792959   0.4996

(These values differ slightly from the outputs shown above because rnorm() draws new random samples on each run; calling set.seed() before sampling would make the results reproducible.)
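
The same summary can be assembled programmatically from the variables defined above (a minimal sketch; the column names are illustrative):

# Collect the three simulation sizes into one summary table
data.frame(runs = c(runs, runs1, runs2),
           mean = c(average, average1, average2),
           std  = c(std, std1, std2),
           prob = c(prob, prob1, prob2))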

TASK 10: Describe how the values change/behave as the number of simulations increases. What is your best bet for the probability of occurrence greater than 5%, and why? How is this similar to the pi-approximation use case presented in the introductory paragraph?

ANSWER TASK 10: With a larger sample, the estimates are expected to be more accurate, since the numbers are drawn at random from the assumed distribution. Therefore, our best bet is the probability from the 10000-sample run, which is 0.4996. This mirrors the pi example: with few data points the results look scattered and random, but as the number of sampled points increases, the pattern becomes clearer and the estimate stabilizes toward the true value. The 10000-sample run offers the most simulations, so we can conclude that its probability estimate is the most accurate of the three.
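
As an independent check (not part of the required tasks), the exact probability under the assumed normal distribution can be computed with pnorm() and compared with the simulated estimates:

# Exact P(daily return >= 0.05) for a normal with mean 0.03 and sd 0.97
1 - pnorm(0.05, mean = 0.03, sd = 0.97)   # approximately 0.492

The larger runs land closer to this exact value, consistent with the answer above.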

The last exercise (2C) is optional for those interested in further enhancing their subject-matter learning and refining their skills in R. Your work will be assessed, but you will not be graded on this exercise. You can follow the instructions presented in the equivalent Excel example video at [https://www.youtube.com/watch?v=wKdmEXCvo9s]

2C) Repeat the exercise for the S&P daily return where all is equal except that we are now interested in the weekly cumulative return and the probability that the weekly cumulative return is greater than 5%. Set the number of simulations to 10000.
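
A minimal sketch of one possible approach, assuming 5 trading days per week and simple additive aggregation of the daily returns (consistent with the units used above); the names runs3, sims3, and prob3 are illustrative:

# Simulate 10000 weeks; each week is the sum of 5 simulated daily returns
runs3 = 10000
sims3 = replicate(runs3, sum(rnorm(5, mean = 0.03, sd = 0.97)))
# Probability that the weekly cumulative return is greater than (or equal to) 5%
prob3 = sum(sims3 >= 0.05) / runs3
prob3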