In this lab we will focus on sensitivity analysis and Monte Carlo simulations.
Sensitivity analysis is the study of how the uncertainty in the output of a mathematical model or system (numerical or otherwise) can be apportioned to different sources of uncertainty in its inputs. We will use the lpSolveAPI R-package as we did in the previous lab.
Monte Carlo Simulations utilize repeated random sampling from a given universe or population to derive certain results. This type of simulation is known as a probabilistic simulation, as opposed to a deterministic simulation.
An example of a Monte Carlo simulation is the one used to approximate the value of pi. The simulation generates random points within a unit square and counts how many of them fall within the circle enclosed by that square (marked in red). The share of points inside the circle approximates the ratio of the two areas, pi/4, so multiplying it by 4 gives an estimate of pi. The more points that are sampled, the closer the estimate tends to be to the actual value. After sampling 30,000 random points, the estimate for pi is much closer to the actual value, within four decimal places of precision.
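The pi estimate described above can be sketched in a few lines of R. This is a minimal illustration, not the code behind the figure: the seed, the variable names, and the placement of the circle (centred at (0.5, 0.5) with radius 0.5) are assumptions made for this example.

```r
# Minimal sketch of a Monte Carlo estimate of pi (assumed setup, see the note above)
set.seed(123)                 # arbitrary seed so the run can be repeated
n_points <- 30000             # number of random points, as in the text
x <- runif(n_points)          # x-coordinates, uniform on [0, 1]
y <- runif(n_points)          # y-coordinates, uniform on [0, 1]
# A point is inside the inscribed circle if its distance from the centre is <= 0.5
inside <- (x - 0.5)^2 + (y - 0.5)^2 <= 0.5^2
# The circle covers pi/4 of the square, so 4 * (share of points inside) estimates pi
pi_estimate <- 4 * mean(inside)
pi_estimate
```

Rerunning the block with a larger `n_points` should, on average, bring `pi_estimate` closer to the built-in constant `pi`.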
In this lab, we will learn how to generate random samples with various simulations and how to run a sensitivity analysis on the marketing use case covered so far.
Remember to always set your working directory to the source file location. Go to ‘Session’, scroll down to ‘Set Working Directory’, and click ‘To Source File Location’. Read the material below carefully and follow the instructions to complete the tasks and answer any questions. Submit your work to RPubs as detailed in previous notes.
For your assignment you may be using different data sets than what is included here. Always read carefully the instructions on Sakai. Tasks/questions to be completed/answered are highlighted in larger bolded fonts and numbered according to their particular placement in the task section.
To conduct the sensitivity analysis, we need the lpSolveAPI package. The code below installs it only if it is not already installed in your R environment.
# require() tries to load the package and returns FALSE if it is not installed
# dependencies = TRUE makes sure that the package's dependencies are installed as well
if (!require("lpSolveAPI", quietly = TRUE)) {
  install.packages("lpSolveAPI", dependencies = TRUE, repos = "https://cloud.r-project.org")
  library(lpSolveAPI)  # load the package after a fresh install
}
We will revisit and solve again the marketing case discussed in class (also part of previous lab).
# We start with `0` constraints and `2` decision variables. The object name `lpmark` is discretionary.
lpmark = make.lp(0, 2)
# Define type of optimization as maximum and dump the screen output into a `dummy` variable
dummy = lp.control(lpmark, sense="max")
# Set the objective function coefficients
set.objfn(lpmark, c(275.691, 48.341))
Above, lpmark has 0 constraints so far and 2 decision variables. We set the objective coefficients of Xradio and Xtv to 275.691 and 48.341 respectively. Next, add all 6 constraints to the model.
add.constraint(lpmark, c(1, 1), "<=", 350000)
add.constraint(lpmark, c(1, 0), ">=", 15000)
add.constraint(lpmark, c(0, 1), ">=", 75000)
add.constraint(lpmark, c(2, -1), "=", 0)
add.constraint(lpmark, c(1, 0), ">=", 0)
add.constraint(lpmark, c(0, 1), ">=", 0)
Now, view the problem setting in tabular/matrix form. This is a good checkpoint to confirm that our constraints have been properly set.
lpmark
## Model name:
## C1 C2
## Maximize 275.691 48.341
## R1 1 1 <= 350000
## R2 1 0 >= 15000
## R3 0 1 >= 75000
## R4 2 -1 = 0
## R5 1 0 >= 0
## R6 0 1 >= 0
## Kind Std Std
## Type Real Real
## Upper Inf Inf
## Lower 0 0
# solve
solve(lpmark)
## [1] 0
In tabular form, we see that all constraints are correctly represented, with C1 = Xradio and C2 = Xtv. The value 0 returned by solve() indicates that an optimal solution was found. Next we get the optimum results.
# display the objective function optimum value
get.objective(lpmark)
## [1] 43443517
# display the decision variables optimum values
get.variables(lpmark)
## [1] 116666.7 233333.3
As in previous labs, we get that the optimum sales is $43,443,517, and the optimum values for Xradio and Xtv are ~$116,667 and ~$233,333 respectively. These two values satisfy all the constraints listed above. For the sensitivity part we will add two new code sections to obtain the sensitivity results.
# display sensitivity to coefficients of objective function.
get.sensitivity.obj(lpmark)
## $objfrom
## [1] -96.6820 -137.8455
##
## $objtill
## [1] 1e+30 1e+30
objfrom. Explain in a concise manner what the sensitivity results represent in reference to the marketing model.
The values above give the sensitivity of the solution to the coefficients of the objective function, z = 275.691Xradio + 48.341Xtv. The coefficient of Xradio can range from -96.68 ($objfrom) up to infinity ($objtill, reported as 1e+30) without changing the optimum values of the decision variables, provided the coefficient of Xtv stays at 48.341. Likewise, the coefficient of Xtv can range from about -137.85 up to infinity. Within these ranges the optimum values of Xradio and Xtv stay the same, although the optimum sales figure itself changes with the coefficients.
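The two lower limits can be reproduced by hand. Because the constraint 2Xradio - Xtv = 0 forces Xtv = 2Xradio, sales reduce to (c_radio + 2 * c_tv) * Xradio, and spending the full budget stays optimal as long as that combined coefficient is positive. The sketch below is an independent check based on this reasoning, not output from lpSolveAPI, and the variable names are made up for illustration.

```r
c_radio <- 275.691   # objective coefficient of Xradio
c_tv <- 48.341       # objective coefficient of Xtv
# Along Xtv = 2 * Xradio, sales = (c_radio + 2 * c_tv) * Xradio
combined <- c_radio + 2 * c_tv
# Lowest c_radio that keeps the combined coefficient non-negative (c_tv held fixed)
radio_lower <- -2 * c_tv
# Lowest c_tv that keeps the combined coefficient non-negative (c_radio held fixed)
tv_lower <- -c_radio / 2
c(radio_lower, tv_lower)
```

The two results, -96.682 and -137.8455, agree with the `$objfrom` values reported by get.sensitivity.obj().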
# display sensitivity to right hand side constraints.
# There will be a total of m+n values, where m is the number of constraints and n is the number of decision variables
get.sensitivity.rhs(lpmark)
## $duals
## [1] 124.12433 0.00000 0.00000 75.78333 0.00000 0.00000 0.00000
## [8] 0.00000
##
## $dualsfrom
## [1] 1.125e+05 -1.000e+30 -1.000e+30 -3.050e+05 -1.000e+30 -1.000e+30
## [7] -1.000e+30 -1.000e+30
##
## $dualstill
## [1] 1.00e+30 1.00e+30 1.00e+30 4.75e+05 1.00e+30 1.00e+30 1.00e+30 1.00e+30
duals. Explain in a concise manner what the two non-zero sensitivity results represent. Distinguish the binding/non-binding constraints, the surplus/slack, and marginal values.
The two non-zero results correspond to the binding constraints, and each one is a marginal value. The first value, 124.12, means that adding $1 to the budget constraint of $350,000 would increase optimum sales by $124.12. The second value, 75.78, means that raising the right-hand side of the fourth constraint, 2Xradio - Xtv = 0, by 1 would increase optimum sales by $75.78. When we are interested in the sensitivity to constraints, we focus on the right-hand side of the tabular form. The $dualsfrom and $dualstill values give the range of each right-hand side over which its marginal value holds, for example from 112,500 upward for the budget. Non-binding constraints have a marginal value of 0, meaning there is surplus or slack. To calculate the surplus or slack, take the left-hand side of the constraint at the optimum solution and subtract the right-hand side. For the second constraint, Xradio >= 15000, the surplus is 116,666.7 - 15,000 = 101,666.7.
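The surplus or slack of every constraint can be computed at once by comparing the left-hand side at the optimum with the right-hand side. The sketch below does this in base R, using the exact optimum (350000/3 and 700000/3) instead of the rounded printed values. The object names are illustrative and the tolerance of 1e-6 is an arbitrary choice.

```r
# Constraint coefficients (rows R1..R6) and right-hand sides from the model above
A <- rbind(c(1, 1), c(1, 0), c(0, 1), c(2, -1), c(1, 0), c(0, 1))
rhs <- c(350000, 15000, 75000, 0, 0, 0)
x_opt <- c(350000 / 3, 700000 / 3)   # optimum Xradio and Xtv
lhs <- as.vector(A %*% x_opt)        # left-hand side of each constraint at the optimum
gap <- abs(lhs - rhs)                # slack (<=) or surplus (>=) of each constraint
binding <- gap < 1e-6                # a constraint is binding when the gap is zero
data.frame(constraint = paste0("R", 1:6), lhs = lhs, rhs = rhs,
           gap = gap, binding = binding)
```

Only R1 and R4 come out as binding, which matches the two non-zero values in `$duals`.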
To acquire a better understanding of the sensitivity results, and to confirm integrity of the calculations, independent tests can be conducted.
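One such independent test is to raise a right-hand side by 1 and recompute the optimum. The sketch below assumes that R1 and R4 remain the only binding constraints after the change, which holds within the `$dualsfrom`/`$dualstill` limits reported above, so the optimum is the solution of a 2x2 linear system. It uses base R's solve() rather than lpSolveAPI, and the helper name `optimum_sales` is made up for this example.

```r
obj <- c(275.691, 48.341)            # objective coefficients
B <- rbind(c(1, 1), c(2, -1))        # coefficients of the binding constraints R1 and R4

# Hypothetical helper: optimum sales for given right-hand sides of R1 and R4
optimum_sales <- function(budget, ratio_rhs) {
  x <- base::solve(B, c(budget, ratio_rhs))   # Xradio and Xtv at the optimum
  sum(obj * x)
}

base_sales <- optimum_sales(350000, 0)
dual_budget <- optimum_sales(350001, 0) - base_sales   # effect of +1 on the budget (R1)
dual_ratio <- optimum_sales(350000, 1) - base_sales    # effect of +1 on the ratio constraint (R4)
c(base_sales, dual_budget, dual_ratio)
```

The two differences should agree with the non-zero `$duals` values (about 124.12 and 75.78), and `base_sales` with the optimum reported by get.objective().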
For this task we will run a Monte Carlo simulation to calculate the probability that the daily return from the S&P will be > 5%. We will assume that the historical S&P daily return follows a normal distribution with an average daily return of 0.03 (%) and a standard deviation of 0.97 (%). Note that the code in this section applies the threshold as 0.05, on the same scale as the mean and standard deviation.
To begin we will generate 100 random samples from the normal distribution. For the generated samples we will calculate the mean, standard deviation, and probability of occurrence where the simulation result is greater than 5%.
To generate random samples from a normal distribution we will use the rnorm() function in R. In the example below we set the number of runs (or samples) to 100.
# number of simulations/samples
runs = 100
# random number generator per defined normal distribution with given mean and standard deviation
sims = rnorm(runs,mean=0.03,sd=0.97)
# Mean calculated from the random distribution of samples
average = mean(sims)
average
## [1] 0.05315012
# STD calculated from the random distribution of samples
std = sd(sims)
std
## [1] 0.910721
# probability of occurrence on any given day based on samples will be equal to count (or sum) where sample result is greater than 5% divided by total number of samples.
prob = sum(sims >=0.05)/runs
prob
## [1] 0.46
With a random sample of n=100, the sample mean (0.053) and standard deviation (0.911) are only rough approximations of the population values of 0.03 and 0.97. The probability of getting a sample greater than 5% in our sample set is 0.46. Since no seed is set, these values change every time the code is run.
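Because rnorm() draws new random numbers on every run, the figures above cannot be reproduced exactly. The sketch below shows two additions that are assumptions of this example rather than part of the lab: fixing a seed with set.seed() (the value 2024 is arbitrary), and comparing the simulated probability with the exact normal probability from pnorm().

```r
set.seed(2024)                                  # arbitrary seed, makes the draw repeatable
runs <- 100
sims <- rnorm(runs, mean = 0.03, sd = 0.97)
prob_sim <- mean(sims >= 0.05)                  # share of samples at or above the threshold
# Exact probability under the assumed normal distribution, for comparison
prob_exact <- pnorm(0.05, mean = 0.03, sd = 0.97, lower.tail = FALSE)
c(simulated = prob_sim, exact = prob_exact)
```

The exact probability is about 0.49, which gives a reference point for judging how far any single simulated estimate is off.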
# Repeat calculations here
runs1 = 1000
sims1 = rnorm(runs1,mean=0.03,sd=0.97)
average1 = mean(sims1)
average1
## [1] 0.03309758
std1 = sd(sims1)
std1
## [1] 0.9714898
prob1 = sum(sims1 >=0.05)/runs1
prob1
## [1] 0.497
runs2 = 10000
sims2 = rnorm(runs2,mean=0.03,sd=0.97)
average2 = mean(sims2)
average2
## [1] 0.02477478
std2 = sd(sims2)
std2
## [1] 0.9645229
prob2 = sum(sims2 >=0.05)/runs2
prob2
## [1] 0.4917
pi that was presented in the introductory paragraph?
NoOfSims   Mean           Standard Dev.   Probability
100        0.0006465363   1.043655        0.44
1000       0.05475066     0.9681247       0.496
10,000     0.04871212     0.9629738       0.4974
The population has a mean of 0.03 and a standard deviation of 0.97. As “n” increases, that is, as more simulations are run, the sample mean gets closer to the population mean. The standard deviation in the table happens to be closest to 0.97 at 1,000 simulations, but it is still within 0.01 of 0.97 at 10,000. The estimated probability of a return greater than 5% settles between 0.49 and 0.50 for the larger runs. A larger “n” does not make the event more likely; it makes the estimate of its probability more reliable. This is similar to the pi example: the more sample points are taken, the closer the estimates tend to be to the actual values, here 0.03 and 0.97.
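The effect of the sample size can be seen side by side by running all three simulations in one loop. The sketch below is an illustration under assumptions: the seed is arbitrary, the object names are made up, and the extra column `se_mean` (the theoretical standard error of the mean, 0.97 / sqrt(n)) is added here to show how quickly the uncertainty shrinks.

```r
set.seed(1)                                     # arbitrary seed for repeatability
sizes <- c(100, 1000, 10000)                    # sample sizes used in the lab
results <- do.call(rbind, lapply(sizes, function(n) {
  s <- rnorm(n, mean = 0.03, sd = 0.97)         # one simulated sample of size n
  data.frame(runs = n,
             mean = mean(s),
             sd = sd(s),
             prob = mean(s >= 0.05),
             se_mean = 0.97 / sqrt(n))          # theoretical standard error of the mean
}))
results
```

Each tenfold increase in `runs` cuts `se_mean` by a factor of about 3.16, which is why the estimates for 10,000 runs are typically much tighter than those for 100.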
The last exercise, 2C), is optional for those interested in further enhancing their subject matter learning and refining their skills in R. Your work will be assessed but you will not be graded for this exercise. You can follow the instructions presented in the video of an equivalent Excel example at [https://www.youtube.com/watch?v=wKdmEXCvo9s]