About

In this lab, we will learn how to run a sensitivity analysis on the marketing use case, and how to generate random samples from a simple normal distribution similar to a Monte Carlo simulation.

Setup

Remember to always set your working directory to the source file location. Go to ‘Session’, scroll down to ‘Set Working Directory’, and click ‘To Source File Location’. Read carefully the below and follow the instructions to complete the tasks and answer any questions. Submit your work in Sakai as detailed in previous notes.

Note

For your assignment you may be using different data sets than what is included here. Always read carefully the instructions provided, before executing any included code chunks and/or adding your own code. For clarity, tasks/questions to be completed/answered are highlighted in red color and numbered according to their particular placement in the task section. The red color is only apparent when in Preview mode. Quite often you will need to add your own code chunk.

Execute all code chunks (already included and own added), preview, check integrity, and submit final work (\(html\) file) in Sakai.


Task 1: Sensitivity Analysis

Sensitivity analysis is the study of how the uncertainty in the output of a mathematical model or system (numerical or otherwise) can be apportioned to different sources of uncertainty in its inputs. We will use the lpSolveAPI R-package as we did in the previous lab.

In order to conduct a sensitivity analysis, we will need to download again the lpSolveAPI package unless you have it already installed in your R environment

# Require will load the package only if not installed 
# Dependencies = TRUE makes sure that dependencies are install
if(!require("lpSolveAPI",quietly = TRUE))
  install.packages("lpSolveAPI",dependencies = TRUE, repos = "https://cloud.r-project.org")

We will revisit and solve again the marketing case discussed in class (also part of previous lab).

# We start with `0` constraint and `2` decision variables. The object name `lpmark` is discretionary.
lpmark = make.lp(0, 2)

# Define type of optimization as maximum and dump the screen output into a `dummy` variable
dummy = lp.control(lpmark, sense="max") 
# Set the objective function coefficients 
set.objfn(lpmark, c(275.691, 48.341))

Add all constraints to the model.

add.constraint(lpmark, c(1, 1), "<=", 350000)
add.constraint(lpmark, c(1, 0), ">=", 15000)
add.constraint(lpmark, c(0, 1), ">=", 75000)
add.constraint(lpmark, c(2, -1), "=", 0)
add.constraint(lpmark, c(1, 0), ">=", 0)
add.constraint(lpmark, c(0, 1), ">=", 0)

Now, view the problem setting in tabular/matrix form. This is a good checkpoint to confirm that our contraints have been properly set.

lpmark
Model name: 
               C1       C2            
Maximize  275.691   48.341            
R1              1        1  <=  350000
R2              1        0  >=   15000
R3              0        1  >=   75000
R4              2       -1   =       0
R5              1        0  >=       0
R6              0        1  >=       0
Kind          Std      Std            
Type         Real     Real            
Upper         Inf      Inf            
Lower           0        0            
# solve
solve(lpmark) 
[1] 0

Next we get the optimum results.

# display the objective function optimum value
get.objective(lpmark)
[1] 43443517
# display the decision variables optimum values
get.variables(lpmark)
[1] 116666.7 233333.3

For sensitivity we will introduce the code command to obtain the sensitivities due to changes in resources as expressed in the right hand side (rhs) constraints.

# display sensitivity to rhs constraints. 
# There will be a total of m+n values where m is the number of contraints and n is the number of decision variables
get.sensitivity.rhs(lpmark) 
$duals
[1] 124.12433   0.00000   0.00000  75.78333   0.00000
[6]   0.00000   0.00000   0.00000

$dualsfrom
[1]  1.125e+05 -1.000e+30 -1.000e+30 -3.050e+05
[5] -1.000e+30 -1.000e+30 -1.000e+30 -1.000e+30

$dualstill
[1] 1.00e+30 1.00e+30 1.00e+30 4.75e+05 1.00e+30
[6] 1.00e+30 1.00e+30 1.00e+30

For this exercise we are only interested in the first six values (corresponding to the six constraints) of the output labeled duals.

##### 1A) Identify and explain what the binding/non-binding constraints, the surplus/slack, and the marginal values are and what they represent in the context of our marketing use case (4pts) Binding values have nonzero marginal values. In this case, it is the first and forth contraints. (max budget of 350,000 [x1+x2<=350000] and tv ads must be 2x more than radio ads [2x1-x1=0]) The marginal value for the first constraint means that for each $1 increase in the overall budget (350,000 to 350,001) sales will increase by 124.12 dollars The marginal value for the forth constraint is that for each dollar increase in the limit/contraint, the sales will increase by 75.78 dollars. The marginal value for the second, third, fifth and sixth constraint means that for every $1 increase in their limit (ex: second constraint goes from 15,000 to 15,001, etc.), sales will increase by zero dollars. The surplus of the first constraint is zero. This means that the outcome was equal to the limit. (116,666.7+223,3333.3 =350,000 350,000-350,000=0) The surplus of the second constraint is 101,666.7. This means the outcome was over the minimun by 101,666.7 (116,666.7-15,000=101,666.7) The Surplus of the third constraint is 233,333.3. This means the outcome was over the minimum by 233,333.3 (233,333.3-75,000=158,333.3) The surplus of the forth constraint is zero. This means that the outcome was equal to the limit. (2(116,666.7)-1(233,333.3)=0) The surplus of the fifth constraint is 116,666.7. This means the outcome was over the minimym by 116,666.7 (116,666.7-0=116,666.7) The surplus of the sixth constraint is 233,333.3. This means the outcome was over the minimum by 233,333.3 (233,333.3-0=233,333.3) The non binding contraints are the second, third, fifth, and sixth constraints. (x1>=15000, x2=>75000, x1>=0, and x2>=0)

To acquire a better understanding of the sensitivity results, and to confirm integrity of the calculations, independent tests can be conducted. To demonstrate, we will repeat the linear programming (LP) optimization problem by slightly tweaking one binding constraint and verifying the impact.

##### 1B) Define a new model object lpmark1 and add the constraints exactly as entered at beginning of this task. All being equal, change the budget constraint by a marginal $1 value and solve the LP problem. Note the new optimum value for sales. Calculate the differential change in optimum sales from the earlier computed optimum sales, and compare your calculation to the value obtained from the prior sensitivity calculation (4pts)

# Define a new model object called lpmark1
lpmark1 = make.lp(0, 2)
dummy = lp.control(lpmark1, sense="max") 
# Set the objective function coefficients 
set.objfn(lpmark1, c(275.691, 48.341))
# Repeat rest of commands with the one constraint change for budget. Solve and display the objective function optimum value
add.constraint(lpmark1, c(1, 1), "<=", 350001)
add.constraint(lpmark1, c(1, 0), ">=", 15000)
add.constraint(lpmark1, c(0, 1), ">=", 75000)
add.constraint(lpmark1, c(2, -1), "=", 0)
add.constraint(lpmark1, c(1, 0), ">=", 0)
add.constraint(lpmark1, c(0, 1), ">=", 0)
lpmark1
Model name: 
               C1       C2            
Maximize  275.691   48.341            
R1              1        1  <=  350001
R2              1        0  >=   15000
R3              0        1  >=   75000
R4              2       -1   =       0
R5              1        0  >=       0
R6              0        1  >=       0
Kind          Std      Std            
Type         Real     Real            
Upper         Inf      Inf            
Lower           0        0            
solve(lpmark1) 
[1] 0
get.objective(lpmark1)
[1] 43443641
get.variables(lpmark1)
[1] 116667 233334

This data (lpmark1) shows sales is 43,443,641 compared to 43443517 sales from lpmark. This difference is a $124 difference. This number matches the marginal value calculated in the previous park. This means for the dollar increase in budget, sales increased by 124 dollars, which it did.

1C) Following the logic of calculations in 1B) and without running another code, explain the steps to validate the integrity of the other remaining marginal value identified in 1A). Use a different line to explain each step (4pts)

You could use another lpmark to do that. You would define another lpmark, called lpmark2 and make it analze two variables. You would set the objective to maximize sales and use the function coefficients to have x1=275.691 and x2=48.341. Then you would set the contraints. You would use the same ones as lpmark, but the forth constraint would be 2x1-1x1=1, instead of 2x1-1x1=0. Then you would solve, get the objective, and get the variables. Then you could see the new maximized sales. You would use this to find the difference from lmpark sales and lpmark2 sales.This difference should equal $75, the same as the marginal value.


Task 2: Random Sampling & Monte Carlo Simulation

Monte Carlo Simulations utilize repeated random sampling from a given universe or population distribution to derive certain results. This type of simulation is known as a probabilistic simulation, as opposed to a deterministic simulation.

An example of a Monte Carlo simulation is the one applied to approximate the value of pi. The simulation is based on generating random points within a unit square and see how many points fall within the circle enclosed by the unit square (marked in red). The value of pi is estimated by the number of points inside the circle over the total number of points generated inside the square. The higher the number of sampled points the closer the result is to the actual result. After selecting 30,000 random points, the estimate for pi is much closer to the actual value within the four decimal points of precision. The interested and curious reader is encouraged to explore the mathematical foundation behind the logic.

For this task we will be running a Monte Carlo simulation to calculate the probability that the daily return from S&P will be > 5%. We will assume that the historical S&P daily return follows a normal distribution with an average daily return of 0.03 (%) and a standard deviation of 0.97 (%). Consider this as our population.

To begin we will generate 100 random samples from the normal distribution. For the generated samples we will calculate the mean, standard deviation, and probability of occurrence where the simulation result is greater than 5%.

To generate random samples from a normal distribution we will use the rnorm() function in R. In the example below we set the number of runs (or samples) to 100.

# number of simulations/samples
runs = 100
# random number generator per defined normal distribution with given mean and standard deviation
sims =  rnorm(runs,mean=0.03,sd=0.97)
# Mean calculated from the random distribution of samples
average = mean(sims)
average
[1] -0.04732575
# STD calculated from the random distribution of samples
std = sd(sims) 
std
[1] 0.9802492
# probability of occurrence on any given day based on samples will be equal to count (or sum) where sample result is greater than 5% divided by total number of samples. 
prob = sum(sims >=0.05)/runs
prob
[1] 0.474

##### 2A) Repeat the above calculations for the two cases where the number of simulations/samples is equal to 1000 and 10000. For each case record the mean, standard deviation, and probability (4pts)

# Repeat calculations here
# simulations/sample
runs = 1000
# random number generator per defined normal distribution with given mean and standard deviation
sims =  rnorm(runs,mean=0.03,sd=0.97)
##Mean calculated from the random distribution of samples
average = mean(sims)
average
[1] 0.03134824
# STD calculated from the random distribution of samples
std = sd(sims) 
std
[1] 0.9733819
prob = sum(sims >=0.05)/runs
probprob = sum(sims >=0.05)/runs
prob
[1] 0.483
# Repeat calculations here
# simulations/sample
runs = 10000
# random number generator per defined normal distribution with given mean and standard deviation
sims =  rnorm(runs,mean=0.03,sd=0.97)
##Mean calculated from the random distribution of samples
average = mean(sims)
average
[1] 0.02370166
# STD calculated from the random distribution of samples
std = sd(sims) 
std
[1] 0.9676935
prob = sum(sims >=0.05)/runs
probprob = sum(sims >=0.05)/runs
prob
[1] 0.4852

##### 2B) List in seperate lines the values for mean, standard deviation, and probability for all three cases: 100, 1000, and 10000 simulations. Describe how the values change/behave as the number of simulations is increased. What is your best bet on the probability of occurrence greater than 5% and why? How is this behavior similar to the use case demonstration to calculate pi? (4pts) 100 runs: mean: -0.04732575 standard deviation: 0.9802492 probability: 0.474

1000 runs: mean:0.03134824 standard deviation: 0.9733819 probability:0.483

10000 runs: mean:0.02370166 standard deviation:0.9676935 probability:0.4852

As the number of runs increased, the probability got bigger, but the increase in probability was not as great from 1000 to 10000 runs compared to the 100 to 1000 run change. The standard deviation decreased as the run count got larger. The same case was true here, as the standard deviation changed from 1000 to 10000 runs, the change was not as large as the change from 100 to 1000 runs. This is similar to a graph converging to a line. As the x value, or # of runs increases, it will get closer and closer to the true value. I think the 10,000 run simulation would be closer to the actual true value. The larger the sample/number of runs, the less impact an outlier would have on the data. Just like in statistics, the larger sample gives better data, in accuracy and representation. Just like pi, after selecting more random points, (pi was 30,000 but here it’s 10,000) the estimate will get closer to the actual value.

---
title: "Sensitivity Analysis & Simulations (lab09)"
author: "Lauren Kroll"
date: "4/9/2020"
output:
  html_notebook: default
  html_document: default
  pdf_document: default
subtitle: BSAD 343H, Business Analytics, Spring 2020
---


### About

In this lab, we will learn how to run a sensitivity analysis on the marketing use case, and how to generate random samples from a simple normal distribution similar to a Monte Carlo simulation.

### Setup

Remember to always set your working directory to the source file location. Go to 'Session', scroll down to 'Set Working Directory', and click 'To Source File Location'. Read carefully the below and follow the instructions to complete the tasks and answer any questions.  Submit your work in Sakai as detailed in previous notes. 

### Note

For your assignment you may be using different data sets than what is included here. Always read carefully the instructions provided, before executing any included code chunks and/or adding your own code.  For clarity, tasks/questions to be completed/answered are highlighted in red color and numbered according to their particular placement in the task section.  The red color is only apparent when in Preview mode. Quite often you will need to add your own code chunk.

Execute all code chunks (already included and own added), preview, check integrity, and submit final work ($html$ file) in Sakai.

--------------

### Task 1: Sensitivity Analysis

Sensitivity analysis is the study of how the uncertainty in the output of a mathematical model or system (numerical or otherwise) can be apportioned to different sources of uncertainty in its inputs. We will use the `lpSolveAPI` R-package as we did in the previous lab. 

In order to conduct a sensitivity analysis, we will need to download again the `lpSolveAPI` package unless you have it already installed in your R environment

```{r}
# Require will load the package only if not installed 
# Dependencies = TRUE makes sure that dependencies are install
if(!require("lpSolveAPI",quietly = TRUE))
  install.packages("lpSolveAPI",dependencies = TRUE, repos = "https://cloud.r-project.org")
```

We will revisit and solve again the marketing case discussed in class (also part of previous lab). 
```{r}
# We start with `0` constraint and `2` decision variables. The object name `lpmark` is discretionary.
lpmark = make.lp(0, 2)

# Define type of optimization as maximum and dump the screen output into a `dummy` variable
dummy = lp.control(lpmark, sense="max") 
# Set the objective function coefficients 
set.objfn(lpmark, c(275.691, 48.341))
```

Add all constraints to the model.

```{r}
add.constraint(lpmark, c(1, 1), "<=", 350000)
add.constraint(lpmark, c(1, 0), ">=", 15000)
add.constraint(lpmark, c(0, 1), ">=", 75000)
add.constraint(lpmark, c(2, -1), "=", 0)
add.constraint(lpmark, c(1, 0), ">=", 0)
add.constraint(lpmark, c(0, 1), ">=", 0)
```

Now, view the problem setting in tabular/matrix form. This is a good checkpoint to confirm that our contraints have been properly set.

```{r}
lpmark
# solve
solve(lpmark) 
```

Next we get the optimum results.
```{r}
# display the objective function optimum value
get.objective(lpmark)

# display the decision variables optimum values
get.variables(lpmark)
```

For sensitivity we will introduce the code command to obtain the sensitivities due to changes in resources as expressed in the right hand side (rhs) constraints. 

```{r}
# display sensitivity to rhs constraints. 
# There will be a total of m+n values where m is the number of contraints and n is the number of decision variables
get.sensitivity.rhs(lpmark) 
```

For this exercise we are only interested in the first six values (corresponding to the six constraints) of the output labeled `duals`. 

<span style="color:red">
##### 1A)  Identify and explain what the binding/non-binding constraints, the surplus/slack, and the marginal values are and what they represent in the context of our marketing use case (4pts)
</span>
Binding values have nonzero marginal values. In this case, it is the first and forth contraints. (max budget of 350,000 [x1+x2<=350000] and tv ads must be 2x more than radio ads [2x1-x1=0])
The marginal value for the first constraint means that for each $1 increase in the overall budget (350,000 to 350,001) sales will increase by 124.12 dollars
The marginal value for the forth constraint is that for each dollar increase in the limit/contraint, the sales will increase by 75.78 dollars.
The marginal value for the second, third, fifth and sixth constraint means that for every $1 increase in their limit (ex: second constraint goes from 15,000 to 15,001, etc.), sales will increase by zero dollars.
The surplus of the first constraint is zero. This means that the outcome was equal to the limit. (116,666.7+223,3333.3 =350,000 350,000-350,000=0)
The surplus of the second constraint is 101,666.7. This means the outcome was over the minimun by 101,666.7
(116,666.7-15,000=101,666.7)
The Surplus of the third constraint is 233,333.3. This means the outcome was over the minimum by 233,333.3
(233,333.3-75,000=158,333.3)
The surplus of the forth constraint is zero. This means that the outcome was equal to the limit.
(2(116,666.7)-1(233,333.3)=0)
The surplus of the fifth constraint is 116,666.7. This means the outcome was over the minimym by 116,666.7
(116,666.7-0=116,666.7)
The surplus of the sixth constraint is 233,333.3. This means the outcome was over the minimum by 233,333.3
(233,333.3-0=233,333.3)
The non binding contraints are the second, third, fifth, and sixth constraints. (x1>=15000, x2=>75000, x1>=0, and x2>=0)


To acquire a better understanding of the sensitivity results, and to confirm integrity of the calculations, independent tests can be conducted.  To demonstrate, we will repeat the linear programming (LP) optimization problem by slightly tweaking one binding constraint and verifying the impact.

<span style="color:red">
##### 1B)  Define a new model object `lpmark1` and add the constraints exactly as entered at beginning of this task. All being equal, change the budget constraint by a marginal $1 value and solve the LP problem.  Note the new optimum value for sales. Calculate the differential change in optimum sales from the earlier computed optimum sales,  and compare your calculation to the value obtained from the prior sensitivity calculation (4pts)
</span>

```{r}
# Define a new model object called lpmark1
lpmark1 = make.lp(0, 2)
dummy = lp.control(lpmark1, sense="max") 
# Set the objective function coefficients 
set.objfn(lpmark1, c(275.691, 48.341))
# Repeat rest of commands with the one constraint change for budget. Solve and display the objective function optimum value

```
```{r}
add.constraint(lpmark1, c(1, 1), "<=", 350001)
add.constraint(lpmark1, c(1, 0), ">=", 15000)
add.constraint(lpmark1, c(0, 1), ">=", 75000)
add.constraint(lpmark1, c(2, -1), "=", 0)
add.constraint(lpmark1, c(1, 0), ">=", 0)
add.constraint(lpmark1, c(0, 1), ">=", 0)
```
```{r}
lpmark1
solve(lpmark1) 
get.objective(lpmark1)
get.variables(lpmark1)
```
This data (lpmark1) shows sales is 43,443,641 compared to 43443517 sales from lpmark. This difference is a $124 difference. This number matches the marginal value calculated in the previous park. This means for the dollar increase in budget, sales increased by 124 dollars, which it did.
<span style="color:red">

##### 1C)  Following the logic of calculations in 1B) and without running another code, explain the steps to validate the integrity of the other remaining marginal value identified in 1A). Use a different line to explain each step (4pts)
You could use another lpmark to do that.
You would define another lpmark, called lpmark2 and make it analze two variables.
You would set the objective to maximize sales and use the function coefficients to have x1=275.691 and x2=48.341.
Then you would set the contraints. You would use the same ones as lpmark, but the forth constraint would be 2x1-1x1=1, instead of  2x1-1x1=0.
Then you would solve, get the objective, and get the variables.
Then you could see the new maximized sales. 
You would use this to find the difference from lmpark sales and lpmark2 sales.This difference should equal $75, the same as the marginal value.
</span>

----------

### Task 2: Random Sampling & Monte Carlo Simulation

Monte Carlo Simulations utilize repeated random sampling from a given universe or population distribution to derive certain results. This type of simulation is known as a probabilistic simulation, as opposed to a deterministic simulation.

An example of a Monte Carlo simulation is the one applied to approximate the value of `pi`. The simulation is based on generating random points within a unit square and see how many points fall within the circle enclosed by the unit square (marked in red). The value of `pi` is estimated by the number of points inside the circle over the total number of points generated inside the square.  The higher the number of sampled points the closer the result is to the actual result.  After selecting 30,000 random points, the estimate for `pi` is much closer to the actual value within the four decimal points of precision. The interested and curious reader is encouraged to explore the mathematical foundation behind the logic.

![](montecarlo.gif)


For this task we will be running a Monte Carlo simulation to calculate the probability that the daily return from S&P will be > 5%. 
We will assume that the historical S&P daily return follows a normal distribution with an average daily return of 0.03 (%) and a standard deviation of 0.97 (%). Consider this as our population.

To begin we will generate 100 random samples from the normal distribution. For the generated samples we will calculate the mean, standard deviation, and probability of occurrence where the simulation result is greater than 5%. 

To generate random samples from a normal distribution we will use the `rnorm()` function in R. In the example below we set the number of runs (or samples) to 100.

```{r}
# number of simulations/samples
runs = 100
# random number generator per defined normal distribution with given mean and standard deviation
sims =  rnorm(runs,mean=0.03,sd=0.97)
```

```{r}
# Mean calculated from the random distribution of samples
average = mean(sims)
average
```

```{r}
# STD calculated from the random distribution of samples
std = sd(sims) 
std
```

```{r}
# probability of occurrence on any given day based on samples will be equal to count (or sum) where sample result is greater than 5% divided by total number of samples. 
prob = sum(sims >=0.05)/runs
prob
```

<span style="color:red">
##### 2A)  Repeat the above calculations for the two cases where the number of simulations/samples is equal to 1000 and 10000.  For each case record the mean, standard deviation, and probability (4pts)
</span>

````{r}
# Repeat calculations here
# simulations/sample
runs = 1000
# random number generator per defined normal distribution with given mean and standard deviation
sims =  rnorm(runs,mean=0.03,sd=0.97)
##Mean calculated from the random distribution of samples
average = mean(sims)
average
# STD calculated from the random distribution of samples
std = sd(sims) 
std
prob = sum(sims >=0.05)/runs
probprob = sum(sims >=0.05)/runs
prob
```
```{r}
# Repeat calculations here
# simulations/sample
runs = 10000
# random number generator per defined normal distribution with given mean and standard deviation
sims =  rnorm(runs,mean=0.03,sd=0.97)
##Mean calculated from the random distribution of samples
average = mean(sims)
average
# STD calculated from the random distribution of samples
std = sd(sims) 
std
prob = sum(sims >=0.05)/runs
probprob = sum(sims >=0.05)/runs
prob
```

<span style="color:red">
##### 2B)  List in seperate lines the values for mean, standard deviation, and probability for all three cases: 100, 1000, and 10000 simulations. Describe how the values change/behave as the number of simulations is increased. What is your best bet on the probability of occurrence greater than 5% and why? How is this behavior similar to the use case demonstration to calculate `pi`?  (4pts) 
</span>
100 runs:
mean: -0.04732575
standard deviation: 0.9802492
probability: 0.474

1000 runs:
mean:0.03134824
standard deviation: 0.9733819
probability:0.483

10000 runs:
mean:0.02370166
standard deviation:0.9676935
probability:0.4852

As the number of runs increased, the probability got bigger, but the increase in probability was not as great from 1000 to 10000 runs compared to the 100 to 1000 run change. The standard deviation decreased as the run count got larger. The same case was true here, as the standard deviation changed from 1000 to 10000 runs, the change was not as large as the change from 100 to 1000 runs. This is similar to a graph converging to a line. As the x value, or # of runs increases, it will get closer and closer to the true value. I think the 10,000 run simulation would be closer to the actual true value. The larger the sample/number of runs, the less impact an outlier would have on the data. Just like in statistics, the larger sample gives better data, in accuracy and representation.
Just like pi, after selecting more random points, (pi was 30,000 but here it's 10,000) the estimate will get closer to the actual value. 



