In this lab we will focus on sensitivity analysis and Monte Carlo simulations.
Sensitivity analysis is the study of how the uncertainty in the output of a mathematical model or system (numerical or otherwise) can be apportioned to different sources of uncertainty in its inputs. We will use the lpSolveAPI R package as we did in the previous lab.
Monte Carlo simulations use repeated random sampling from a given universe or population distribution to estimate results. This type of simulation is known as a probabilistic simulation, as opposed to a deterministic simulation.
A classic example of a Monte Carlo simulation is approximating the value of pi. The simulation generates random points within a unit square and counts how many fall inside the circle inscribed in that square. Because the circle covers pi/4 of the square's area, the fraction of points landing inside it approximates pi/4, and multiplying that fraction by four yields an estimate of pi. The more points sampled, the closer the result gets to the actual value; after selecting 30,000 random points, the estimate for pi is typically very close to the true value. The interested and curious reader is encouraged to explore the mathematical foundation behind the logic.
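As a quick illustration, here is a minimal sketch of that estimate in R (the seed and the point count are arbitrary choices made here for reproducibility):
# Monte Carlo estimate of pi: sample points uniformly in the unit square
set.seed(1)                                  # arbitrary seed for reproducibility
n = 30000                                    # number of random points
x = runif(n)                                 # x-coordinates in [0, 1]
y = runif(n)                                 # y-coordinates in [0, 1]
inside = (x - 0.5)^2 + (y - 0.5)^2 <= 0.25   # TRUE when a point falls in the inscribed circle
4 * sum(inside) / n                          # fraction inside the circle, times 4, estimates pi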
In this lab, we will learn how to generate random samples with various simulations and how to run a sensitivity analysis on the marketing use case covered so far.
Remember to always set your working directory to the source file location. Go to ‘Session’, scroll down to ‘Set Working Directory’, and click ‘To Source File Location’. Read the material below carefully and follow the instructions to complete the tasks and answer any questions. Submit your work to RPubs as detailed in previous notes.
For your assignment you may be using different data sets than what is included here. Always read the instructions on Sakai carefully. For clarity, tasks/questions to be completed/answered are highlighted in red (visible in preview) and numbered according to their placement in the task section. Quite often you will need to add your own code chunk.
Execute all code chunks, preview, publish, and submit link on Sakai.
To conduct the sensitivity analysis, we will need to install the lpSolveAPI package again, unless it is already installed in your R environment.
# require() loads the package and returns FALSE if it is not installed
# dependencies = TRUE makes sure that the package's dependencies are installed as well
if(!require("lpSolveAPI", quietly = TRUE)) {
  install.packages("lpSolveAPI", dependencies = TRUE, repos = "https://cloud.r-project.org")
  library(lpSolveAPI)  # load the package after a fresh install
}
We will revisit and solve again the marketing case discussed in class (also part of previous lab).
# We start with `0` constraints and `2` decision variables. The object name `lpmark` is arbitrary.
lpmark = make.lp(0, 2)
# Define type of optimization as maximum and dump the screen output into a `dummy` variable
dummy = lp.control(lpmark, sense="max")
# Set the objective function coefficients
set.objfn(lpmark, c(275.691, 48.341))
Add all constraints to the model.
add.constraint(lpmark, c(1, 1), "<=", 350000)
add.constraint(lpmark, c(1, 0), ">=", 15000)
add.constraint(lpmark, c(0, 1), ">=", 75000)
add.constraint(lpmark, c(2, -1), "=", 0)
add.constraint(lpmark, c(1, 0), ">=", 0)
add.constraint(lpmark, c(0, 1), ">=", 0)
Now, view the problem setup in tabular/matrix form. This is a good checkpoint to confirm that our constraints have been properly set. Note that rows R5 and R6 restate the non-negativity that lpSolveAPI already imposes by default (the `Lower 0 0` row at the bottom of the table).
lpmark
## Model name:
## C1 C2
## Maximize 275.691 48.341
## R1 1 1 <= 350000
## R2 1 0 >= 15000
## R3 0 1 >= 75000
## R4 2 -1 = 0
## R5 1 0 >= 0
## R6 0 1 >= 0
## Kind Std Std
## Type Real Real
## Upper Inf Inf
## Lower 0 0
# solve; a return value of 0 indicates an optimal solution was found
solve(lpmark)
## [1] 0
Next we get the optimum results.
# display the objective function optimum value
get.objective(lpmark)
## [1] 43443517
# display the decision variables optimum values
get.variables(lpmark)
## [1] 116666.7 233333.3
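As a quick sanity check (a small sketch, not part of the original lab sequence), we can plug the optimum back into the two binding constraints:
# verify the optimum satisfies the binding constraints
sol = get.variables(lpmark)
sol[1] + sol[2]       # should equal the 350000 budget
2 * sol[1] - sol[2]   # should equal 0, per the ratio constraint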
For the sensitivity part we will add two new code sections to obtain the sensitivity results. First, we will obtain the sensitivity values due to changes in the coefficients of the objective function.
# display sensitivity to coefficients of objective function.
get.sensitivity.obj(lpmark)
## $objfrom
## [1] -96.6820 -137.8455
##
## $objtill
## [1] 1e+30 1e+30
The part of the output labeled objfrom displays the lower limit/boundary, and the part labeled objtill the upper limit/boundary, of each objective function coefficient's sensitivity range.
##### 1A) Explain in clear words what the sensitivity boundary values (lower/upper limits) represent in context of the marketing model. Refer to class notes.
In doing the sensitivity analysis for the coefficients of the objective function, “Z = 275.691Xradio + 48.341Xtv,” we can see how much these coefficients can change without affecting the optimum solutions for Xradio and Xtv, which would consequently affect the objective function (sales). Running the command for objective function sensitivity returns the range within which each coefficient is allowed to change without the optimum solution changing. Xradio has a range from a lower limit of -96.682 to an upper limit of infinity, meaning the optimum solution for Xradio will remain the same as long as its coefficient stays within these boundaries. Likewise, Xtv has a range from a lower limit of -137.8455 to an upper limit of infinity, so the optimum solution for Xtv will remain the same as long as its coefficient stays within these boundaries.
Looking at the big picture, the optimum value for sales will remain the same as long as the Xradio and Xtv coefficients stay within the ranges we found. If a coefficient moves outside its range, not only will the optimum values for the two decision variables change, but the optimum objective function (sales) value will change as a result.
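These limits can be cross-checked by hand. Because the binding constraint 2Xradio - Xtv = 0 forces Xtv = 2Xradio, the objective collapses to (275.691 + 2 * 48.341)Xradio, which remains worth maximizing as long as the combined coefficient stays positive. A quick sketch of the arithmetic:
# coefficient lower limits implied by the substitution Xtv = 2*Xradio
-2 * 48.341    # -96.682: the objfrom lower limit for the Xradio coefficient
-275.691 / 2   # -137.8455: the objfrom lower limit for the Xtv coefficient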
Next we will obtain the sensitivity results due to changes in resources impacting the constraints.
# display sensitivity to right hand side constraints.
# There will be a total of m+n values, where m is the number of constraints and n is the number of decision variables
get.sensitivity.rhs(lpmark)
## $duals
## [1] 124.12433 0.00000 0.00000 75.78333 0.00000 0.00000 0.00000
## [8] 0.00000
##
## $dualsfrom
## [1] 1.125e+05 -1.000e+30 -1.000e+30 -3.050e+05 -1.000e+30 -1.000e+30
## [7] -1.000e+30 -1.000e+30
##
## $dualstill
## [1] 1.00e+30 1.00e+30 1.00e+30 4.75e+05 1.00e+30 1.00e+30 1.00e+30 1.00e+30
For this exercise we are only interested in the first six values (corresponding to the six constraints) of the output labeled duals.
##### 1B) Explain in clear words what each of the zero and non-zero sensitivity values represent. In your explanation identify the binding/non-binding constraints, the surplus/slack, and the marginal values.
To explain what the zero and non-zero sensitivity values represent, it is necessary to understand the meaning of binding and non-binding constraints. A binding constraint is one that is limiting, whereby any increase in its right-hand side will impact the optimal solution (meaning there is no surplus in the constraint); this also implies there is a marginal value. The marginal value is the rate/amount at which Z increases/decreases when the resource (the budget, for example) is increased/decreased by one unit, keeping all else the same. Conversely, a non-binding constraint is one where an increase has no impact on the optimal solution (meaning there is surplus, or slack, in the resource); this implies there is no marginal value.
Analyzing our output, the zero values correspond to non-binding constraints. A non-binding constraint can be increased without impacting the optimal solution, resulting in slack, or surplus; consequently there is no marginal value and the dual price in the output is zero. The non-zero values in our sensitivity report signal binding constraints: any increase in the constraint will impact the optimal solution, leaving no surplus. Because there is no slack, when such a constraint is increased by one unit ($1), the non-zero value is the amount by which the optimum solution will increase; this is the marginal value. For our first (binding) constraint, if the budget is set to 350,001 instead of 350,000, the optimum value of sales will increase by 124.12433. For the fourth constraint, if “2Xradio - 1Xtv = 1” instead of 0, we would see an increase of 75.78333 in the optimum value of sales, as observed in the output.
To acquire a better understanding of the sensitivity results, and to confirm the integrity of the calculations, independent tests can be conducted. To demonstrate, we will repeat the linear programming (LP) optimization problem, slightly tweaking one binding constraint and verifying the impact.
##### 1C) Define a new model object lpmark1 and add the constraints exactly as entered at beginning of this task. All being equal, change the budget constraint by a marginal $1 value and solve the LP problem. Note the new optimum value for sales. Calculate the differential change in optimum sales from the earlier computed optimum sales, and compare your calculation to the value obtained from the sensitivity calculation.
# Define a new model object called lpmark1
lpmark1 = make.lp(0, 2)
# Repeat rest of commands with the one constraint change for budget. Solve and display the objective function optimum value
dummy = lp.control(lpmark1, sense="max")
# Set the objective function coefficients
set.objfn(lpmark1, c(275.691, 48.341))
add.constraint(lpmark1, c(1, 1), "<=", 350001)
add.constraint(lpmark1, c(1, 0), ">=", 15000)
add.constraint(lpmark1, c(0, 1), ">=", 75000)
add.constraint(lpmark1, c(2, -1), "=", 0)
add.constraint(lpmark1, c(1, 0), ">=", 0)
add.constraint(lpmark1, c(0, 1), ">=", 0)
lpmark1
## Model name:
## C1 C2
## Maximize 275.691 48.341
## R1 1 1 <= 350001
## R2 1 0 >= 15000
## R3 0 1 >= 75000
## R4 2 -1 = 0
## R5 1 0 >= 0
## R6 0 1 >= 0
## Kind Std Std
## Type Real Real
## Upper Inf Inf
## Lower 0 0
# solve; a return value of 0 indicates an optimal solution was found
solve(lpmark1)
## [1] 0
# display the objective function optimum value
get.objective(lpmark1)
## [1] 43443641
# display the decision variables optimum values
get.variables(lpmark1)
## [1] 116667 233334
# display sensitivity to coefficients of objective function.
get.sensitivity.obj(lpmark1)
## $objfrom
## [1] -96.6820 -137.8455
##
## $objtill
## [1] 1e+30 1e+30
# display sensitivity to right hand side constraints.
# There will be a total of m+n values, where m is the number of constraints and n is the number of decision variables
get.sensitivity.rhs(lpmark1)
## $duals
## [1] 124.12433 0.00000 0.00000 75.78333 0.00000 0.00000 0.00000
## [8] 0.00000
##
## $dualsfrom
## [1] 1.12500e+05 -1.00000e+30 -1.00000e+30 -3.05001e+05 -1.00000e+30
## [6] -1.00000e+30 -1.00000e+30 -1.00000e+30
##
## $dualstill
## [1] 1.00000e+30 1.00000e+30 1.00000e+30 4.75002e+05 1.00000e+30 1.00000e+30
## [7] 1.00000e+30 1.00000e+30
# lpmark1 optimum sales minus lpmark optimum sales (to test the variation)
43443641 - 43443517
## [1] 124
The new optimum value for sales is $124 higher than the original optimum. This makes sense because the sensitivity analysis on the original problem showed that the constraint “Xradio + Xtv <= 350,000” is binding, with a marginal value of 124.12433: adding one unit ($1) to this constraint should raise optimum sales by about $124. We tested this by running a new model with all else the same except that the constraint became “Xradio + Xtv <= 350,001.” Sure enough, the new optimum value of the objective function (sales) was $124 more than the original (the displayed objective values are rounded to whole dollars, which is why the difference shows as exactly 124).
##### 1D) Based on the previous exercise, explain in clear words, and without running the solver again, how you would check the integrity of the other marginal value identified in 1B).
Without running the solver again, you could check the integrity of the other binding constraint, which carries a marginal value, by using logic. The constraint “2Xradio - Xtv = 0” is an equality, so the solution must satisfy it exactly; any change to its right-hand side moves the optimal solution, meaning the constraint is binding. Since the constraint is binding, there is a marginal value when its right-hand side is increased by one unit. The sensitivity report gives this marginal value as 75.78333, meaning that if the constraint were “2Xradio - Xtv = 1,” the optimum value of sales would increase by 75.78333.
To test this exactly, we could run a model identical to the one above, except with the constraint changed to the new “2Xradio - Xtv = 1.” After running all the same functions/commands, we should see that the new optimum value of sales is about $75.78 more than the original model's optimum. Alternatively, the marginal values can be reproduced with simple arithmetic, as sketched below.
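The arithmetic itself is easy to reproduce without the solver (a sketch: with budget B and ratio right-hand side b, solving the two binding constraints Xradio + Xtv = B and 2Xradio - Xtv = b gives Xradio = (B + b)/3 and Xtv = (2B - b)/3, so each unit added to b changes sales by (275.691 - 48.341)/3, and each budget dollar by (275.691 + 2 * 48.341)/3):
# marginal values derived from the binding constraints, no solver needed
(275.691 - 48.341) / 3       # 75.78333: dual value of the ratio constraint R4
(275.691 + 2 * 48.341) / 3   # 124.12433: dual value of the budget constraint R1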
For this task we will run a Monte Carlo simulation to calculate the probability that the S&P daily return will be > 5%. We will assume that the historical S&P daily return follows a normal distribution with an average daily return of 0.03 (%) and a standard deviation of 0.97 (%). Consider this our population.
To begin we will generate 100 random samples from the normal distribution. For the generated samples we will calculate the mean, standard deviation, and probability of occurrence where the simulation result is greater than 5%.
To generate random samples from a normal distribution we will use the rnorm() function in R. In the example below we set the number of runs (or samples) to 100.
# number of simulations/samples
runs = 100
# generate random samples from the defined normal distribution with the given mean and standard deviation
sims = rnorm(runs,mean=0.03,sd=0.97)
# Mean calculated from the random distribution of samples
average = mean(sims)
average
## [1] -0.02796551
# STD calculated from the random distribution of samples
std = sd(sims)
std
## [1] 0.9096555
# the probability of occurrence on any given day is estimated as the count (or sum) of samples at or above the 0.05 threshold divided by the total number of samples
prob = sum(sims >=0.05)/runs
prob
## [1] 0.48
##### 2A) Repeat the above calculations for the two cases where the number of simulations/samples is equal to 1000 and 10000. For each case record the mean, standard deviation, and probability.
# Repeat calculations here
#1,000
# number of simulations/samples
runs = 1000
# generate random samples from the defined normal distribution with the given mean and standard deviation
sims = rnorm(runs,mean=0.03,sd=0.97)
# Mean calculated from the random distribution of samples
average = mean(sims)
average
## [1] 0.02504335
# STD calculated from the random distribution of samples
std = sd(sims)
std
## [1] 0.9576497
# the probability of occurrence on any given day is estimated as the count (or sum) of samples at or above the 0.05 threshold divided by the total number of samples
prob = sum(sims >=0.05)/runs
prob
## [1] 0.484
#10,000
# number of simulations/samples
runs =10000
# generate random samples from the defined normal distribution with the given mean and standard deviation
sims = rnorm(runs,mean=0.03,sd=0.97)
# Mean calculated from the random distribution of samples
average = mean(sims)
average
## [1] 0.03949036
# STD calculated from the random distribution of samples
std = sd(sims)
std
## [1] 0.9639407
# the probability of occurrence on any given day is estimated as the count (or sum) of samples at or above the 0.05 threshold divided by the total number of samples
prob = sum(sims >=0.05)/runs
prob
## [1] 0.4932
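Because no random seed was set, each knit of this document produces different numbers. A compact, reproducible way to run all three cases in one chunk (a sketch; the seed value 42 is an arbitrary choice):
set.seed(42)   # arbitrary seed so the results repeat across knits
for (runs in c(100, 1000, 10000)) {
  sims = rnorm(runs, mean = 0.03, sd = 0.97)
  cat(runs, "runs: mean =", mean(sims),
      " sd =", sd(sims),
      " prob =", sum(sims >= 0.05) / runs, "\n")
}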
##### 2B) List on separate lines the values for mean, standard deviation, and probability for all three cases: 100, 1000, and 10000 simulations. Describe how the values change/behave as the number of simulations is increased. What is your best bet on the probability of occurrence greater than 5% and why? How is this behavior similar to the introductory use case demonstrating how to calculate pi?
Sample of 100: mean = 0.05164983, standard deviation = 0.9518847, probability = 0.49
Sample of 1,000: mean = 0.04577516, standard deviation = 0.9669221, probability = 0.501
Sample of 10,000: mean = 0.01670423, standard deviation = 0.9838422, probability = 0.4837
Overall, the values for the mean and standard deviation should get closer to the stipulated population values (0.03 and 0.97, respectively) as the number of samples increases. However, since Monte Carlo simulations are random, this is not guaranteed on any single run, as seen in the results above. (Note that the values recorded here come from a different random run than the chunk outputs shown earlier, since no random seed was set.) My best bet on the probability of a daily return greater than 5% would be 48.37%, taken from the probability (0.4837) that resulted from our sample of 10,000, because the chance of a more accurate answer is higher with a larger sample size. Relating this to the introductory example of calculating pi: the higher the number of sampled points, the closer the result is to the actual result, and only after sampling roughly 30,000 points was the estimate for pi close to the actual value.
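As a final check (not required by the task), the exact probability under the assumed normal population can be computed directly with pnorm(), using the same 0.05 threshold as the code above; the simulated probabilities should converge toward this value as the number of samples grows:
# exact probability that a draw from N(mean = 0.03, sd = 0.97) is at or above 0.05
1 - pnorm(0.05, mean = 0.03, sd = 0.97)   # roughly 0.49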