Problem 1.
1. (Bayesian):
A new credit scoring system has been developed to predict the
likelihood of loan defaults. The system has a 90% sensitivity, meaning
that it correctly identifies 90% of those who will default on their
loans. It also has a 95% specificity, meaning that it correctly
identifies 95% of those who will not default. The default rate among
borrowers is 2%. Given these prevalence, sensitivity, and specificity
estimates, what is the probability that a borrower flagged by the system
as likely to default will actually default? If the average loss per
defaulted loan is $200,000 and the cost to run the credit scoring test
on each borrower is $500, what is the total first-year cost for
evaluating 10,000 borrowers? NOTE: There were many ways to think
about this problem.
dformat=function(w,x) noquote(paste0(w,"=","$", format(x, big.mark=",",nsmall=2)))
pformat=function(x) noquote(paste0("Pr=",x))
myformat=function(w,x) noquote(paste0(w,"=",x))
pformat((.9*.02)/(.9*.02 +.05*.98))
## [1] Pr=0.26865671641791
dformat("TC",500*10000)
## [1] TC=$5,000,000.00
dformat("Net Cost",(10000*500-.9*200*200000)) # testing cost minus losses on the 180 defaulters caught (90% of 200)
## [1] Net Cost=$-31,000,000.00
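One way to sanity-check the Bayes calculation is to lay out expected counts for the 10,000 borrowers. The sketch below assumes the stated rates hold exactly; the variable names are illustrative and not part of the original solution.

```r
# Expected counts for 10,000 borrowers (assumes the stated rates hold exactly)
n_borrowers <- 10000
prevalence  <- 0.02   # default rate
sensitivity <- 0.90   # P(flagged | default)
specificity <- 0.95   # P(not flagged | no default)

defaulters      <- n_borrowers * prevalence                         # 200 expected defaulters
true_positives  <- defaulters * sensitivity                         # 180 defaulters flagged
false_positives <- (n_borrowers - defaulters) * (1 - specificity)   # 490 good borrowers flagged

# Positive predictive value: share of flagged borrowers who actually default
ppv <- true_positives / (true_positives + false_positives)
round(ppv, 4)
```

Roughly 670 borrowers are flagged, of whom 180 are true defaulters, which is why the posterior probability stays near 27% despite the high sensitivity.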
2. (Binomial):
The probability that a stock will pay a dividend in any given quarter
is 0.7. What is the probability that the stock pays dividends exactly 6
times in 8 quarters? What is the probability that it pays dividends 6 or
more times? What is the probability that it pays dividends fewer than 6
times? What is the expected number of dividend payments over 8 quarters?
What is the standard deviation?
x=6; N=8; p=.7
pformat(dbinom(x,N,p))
## [1] Pr=0.29647548
pformat(sum(dbinom(x:N,N,p)))
## [1] Pr=0.55177381
pformat(pbinom(x-1,N,p))
## [1] Pr=0.44822619
myformat("EX",N*p)
## [1] EX=5.6
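The question also asks for the standard deviation. A minimal sketch, using the usual binomial formula sqrt(N p (1 - p)) with the same N and p as above (the name `sd_x` is illustrative):

```r
N <- 8; p <- 0.7
sd_x <- sqrt(N * p * (1 - p))  # binomial standard deviation, sqrt(1.68)
round(sd_x, 4)
```

This comes to roughly 1.30 dividend payments.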
3. (Poisson):
A financial analyst notices that there are an average of 12 trading
days each month when a certain stock’s price increases by more than 2%.
What is the probability that exactly 4 such days occur in a given month?
What is the probability that more than 12 such days occur in a given
month? How many such days would you expect in a 6-month period? What is
the standard deviation of the number of such days? If an investment
strategy requires at least 70 days of such price increases in a year for
profitability, what is the percent utilization and what are your
recommendations?
lambda=12; t=6
pformat(dpois(4,lambda))
## [1] Pr=0.00530859947327557
myformat("EX", lambda*t)
## [1] EX=72
myformat("SX", sqrt(lambda*t))
## [1] SX=8.48528137423857
pformat(ppois(69,lambda*t, lower.tail=FALSE)) # P(X >= 70) over the 6-month window (mean 72)
## [1] Pr=0.608943690746807
myformat("Run with it","Cool")
## [1] Run with it=Cool
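Two parts of the question can be sketched separately: the probability of more than 12 such days in a month, and a full-year reading of the 70-day requirement. The yearly rate of 144 (12 per month times 12 months) and the definition of utilization as required days divided by expected days are assumptions made here for illustration.

```r
lambda_month <- 12

# P(more than 12 days in one month) = 1 - P(X <= 12)
p_more_12 <- ppois(12, lambda_month, lower.tail = FALSE)

# Full-year reading of the profitability question (assumed: 12 months of arrivals)
lambda_year <- lambda_month * 12                        # 144 expected days
p_year      <- ppois(69, lambda_year, lower.tail = FALSE)  # P(X >= 70) in a year
utilization <- 70 / lambda_year                         # required / expected days (one possible reading)

c(p_more_12 = p_more_12, p_year = p_year, utilization = utilization)
```

Under the yearly reading the strategy needs under half of the expected days, so meeting the 70-day threshold is close to certain.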
4. (Hypergeometric):
A hedge fund has a portfolio of 25 stocks, with 15 categorized as
high-risk and 10 as low-risk. The fund manager randomly selects 7 stocks
to closely monitor. If the manager selected 5 high-risk stocks and 2
low-risk stocks, what is the probability of selecting exactly 5
high-risk stocks if the selection was random? How many high-risk and
low-risk stocks would you expect to be selected?
S=15; F=10; k=7; s=5 # S = high-risk stocks, F = low-risk stocks, k = stocks drawn
pformat(dhyper(s,S,F,k))
## [1] Pr=0.281121281464531
myformat("E(high-risk)", S*k/(S+F) )
## [1] E(high-risk)=4.2
myformat("E(low-risk)", F*k/(S+F) )
## [1] E(low-risk)=2.8
5. (Geometric):
The probability that a bond defaults in any given year is 0.5%. A
portfolio manager holds this bond for 10 years. What is the probability
that the bond will default during this period? What is the probability
that it will default in the next 15 years? What is the expected number
of years before the bond defaults? If the bond has already survived 10
years, what is the probability that it will default in the next 2
years?
p=.995; n1=10; n2=15 # p is the annual survival probability, 1 - 0.005
pformat(1-p^n1)
## [1] Pr=0.0488898695342281
pformat(1-p^n2)
## [1] Pr=0.0724310311816722
myformat("EX", 1/(1-p))
## [1] EX=200
pformat(1-p^2) #memoryless
## [1] Pr=0.00997499999999996
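The same answers can be cross-checked with R's built-in geometric CDF. This is a sketch, not the original method: `pgeom(q, prob)` counts failure-free years before the first default, so "default within n years" corresponds to `q = n - 1`. The variable names are illustrative.

```r
p_default <- 0.005

# Default within n years: at most n - 1 default-free years before the first default
within_10 <- pgeom(10 - 1, p_default)
within_15 <- pgeom(15 - 1, p_default)

# Memorylessness: P(default by year 12 | survived 10 years) equals P(default within 2 years)
cond_next_2 <- (pgeom(12 - 1, p_default) - pgeom(10 - 1, p_default)) /
  pgeom(10 - 1, p_default, lower.tail = FALSE)

c(within_10 = within_10, within_15 = within_15, cond_next_2 = cond_next_2)
```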
6. (Poisson):
A high-frequency trading algorithm experiences a system failure about
once every 1500 trading hours. What is the probability that the
algorithm will experience more than two failures in 1500 hours? What is
the expected number of failures?
lambda=1; x=2
pformat(ppois(x, lambda, lower.tail=FALSE)) # P(X > 2)
## [1] Pr=0.0803013970713942
myformat("EX", lambda)
## [1] EX=1
8. (Exponential Distribution):
A financial model estimates that the lifetime of a successful
start-up before it either goes public or fails follows an exponential
distribution with an expected value of 8 years. What is the expected
time until the start-up either goes public or fails? What is the
standard deviation? What is the probability that the start-up will go
public or fail after 6 years? Given that the start-up has survived for 6
years, what is the probability that it will go public or fail in the
next 2 years?
gamma=1/8
myformat("EX",1/gamma)
## [1] EX=8
myformat("SX",1/gamma)
## [1] SX=8
pformat(pexp(6,gamma, lower.tail=FALSE)) # P(T > 6): the event happens after year 6
## [1] Pr=0.472366552741015
pformat(pexp(2,gamma)) #memoryless
## [1] Pr=0.221199216928595
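The memoryless shortcut can be verified by computing the conditional probability directly from the CDF. A small sketch, assuming the same rate of 1/8 per year (the names `cond` and `fresh` are illustrative):

```r
rate <- 1 / 8

# P(T <= 8 | T > 6) computed directly from the exponential CDF
cond <- (pexp(8, rate) - pexp(6, rate)) / pexp(6, rate, lower.tail = FALSE)

# Memoryless shortcut: the same as P(T <= 2) for a brand-new start-up
fresh <- pexp(2, rate)

all.equal(cond, fresh)
```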
Problem 2.
1. (Product Selection):
A company produces 5 different types of green pens and 7 different
types of red pens. The marketing team needs to create a new promotional
package that includes 5 pens. How many different ways can the package be
created if it contains fewer than 2 green pens?
myformat("Ways", choose(5,0)*choose(7,5)+choose(5,1)*choose(7,4))
## [1] Ways=196
3. (Marketing Campaign Outcomes):
A marketing campaign involves three stages: first, a customer is sent
5 email offers; second, the customer is targeted with 2 different online
ads; and third, the customer is presented with 3 personalized product
recommendations. If the email offers, online ads, and product
recommendations are selected randomly, how many different possible
outcomes are there for the entire campaign?
myformat("Ways", 5*2*3) #multiplication rule
## [1] Ways=30
4. (Product Defect Probability):
A quality control team draws 3 products from a batch of size N
without replacement. What is the probability that at least one of the
products drawn is defective if the defect rate is known to be
consistent?
\[
P(\text{at least one defective}) = \sum_{k=1}^{3}
\frac{\binom{D}{k}\binom{N-D}{3-k}}{\binom{N}{3}}
= 1 - \frac{\binom{N-D}{3}}{\binom{N}{3}}, \quad D = \text{number of defective products in the batch}
\]
5. (Business Strategy Choices):
A business strategist is choosing potential projects to invest in,
focusing on 17 high-risk, high-reward projects and 14 low-risk,
steady-return projects.
o Step 1:
How many different combinations of 5 projects can the strategist
select?
myformat("Ways", choose(31,5))
## [1] Ways=169911
o Step 2:
How many different combinations of 5 projects can the strategist
select if they want at least one low-risk project?
myformat("Ways", choose(31,5)-choose(17,5)*choose(14,0))
## [1] Ways=163723
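The complement count can be checked by summing over the number of low-risk projects directly. This is only a verification sketch; the names `direct` and `complement` are illustrative.

```r
# Choose k low-risk projects (k = 1..5) and the remaining 5 - k from the 17 high-risk ones
k <- 1:5
direct <- sum(choose(14, k) * choose(17, 5 - k))

# Complement approach: all selections minus those with no low-risk project
complement <- choose(31, 5) - choose(17, 5)

direct == complement
```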
6. (Event Scheduling):
A business conference needs to schedule 9 different keynote sessions
from three different industries: technology, finance, and healthcare.
There are 4 potential technology sessions, 104 finance sessions, and 17
healthcare sessions to choose from. How many different schedules can be
made? Express your answer in scientific notation rounding to the
hundredths place.
options(scipen=0)
myformat("Ways", formatC(factorial(9)*choose(125,9), format="e", digits=2))
## [1] Ways=5.55e+18
7. (Book Selection for Corporate Training):
An HR manager needs to create a reading list for a corporate leadership
training program, which includes 13 books in total. The books are
categorized into 6 novels, 6 business case studies, 7 leadership theory
books, and 5 strategy books.
o Step 1:
If the manager wants to include no more than 4 strategy books, how
many different reading schedules are possible? Express your answer in
scientific notation rounding to the hundredths place.
myformat("Ways", formatC(factorial(13)*(choose(24,13)-choose(5,5)*choose(19,8)), format="e", digits=2))
## [1] Ways=1.51e+16
o Step 2:
If the manager wants to include all 6 business case studies, how many
different reading schedules are possible? Express your answer in
scientific notation rounding to the hundredths place.
myformat("Ways", formatC(factorial(13)*choose(24-6, 13-6), format="e", digits=2))
## [1] Ways=1.98e+14
8. (Product Arrangement):
A retailer is arranging 10 products on a display shelf. There are 5
different electronic gadgets and 5 different accessories. What is the
probability that all the gadgets are placed together and all the
accessories are placed together on the shelf? Express your answer as a
fraction or a decimal number rounded to four decimal places.
pformat(round(2*factorial(5)*factorial(5)/factorial(10), 4)) # exactly 1/126
## [1] Pr=0.0079
9. (Expected Value of a Business Deal):
A company is evaluating a deal where they either gain $4 for every
successful contract or lose $16 for every unsuccessful contract. A
“successful” contract is defined as drawing a queen or lower from a
standard deck of cards. (Aces are considered the highest card in the
deck.)
o Step 1:
Find the expected value of the deal. Round your answer to two decimal
places. Losses must be expressed as negative values.
ans=4*44/52-16*8/52
myformat("EX", formatC(ans, format="f", digits=2))
## [1] EX=0.92
o Step 2:
If the company enters into this deal 833 times, how much would they
expect to win or lose? Round your answer to two decimal places. Losses
must be expressed as negative values.
myformat("E833X",formatC(ans*833, format="f",digits=2))
## [1] E833X=768.92
Problem 3.
1. (Supply Chain Risk Assessment):
Let X1,X2,…,Xn represent the lead times (in days) for the delivery of
key components from n=5 different suppliers. Each lead time is uniformly
distributed across a range of 1 to k=20 days, reflecting the uncertainty
in delivery times. Let Y denote the minimum delivery time among all
suppliers. Understanding the distribution of Y is crucial for assessing
the earliest possible time you can begin production. Determine the
distribution of Y to better manage your supply chain and minimize
downtime.
Let \(Y = \min(X_1, X_2, \ldots, X_n)\); we want the distribution of \(Y\).
Equation 1 (CDF Formulation from Complement): \(P(Y \le y)=1-P(X_1>y, X_2>y, \ldots, X_n>y)\).
The minimum exceeds \(y\) only if every \(X_i\) is greater than \(y\).
Assuming the \(X_i\) are independent with a common CDF \(F\), the
probability that all \(X_i\) are greater than \(y\) is \((1-F(y))^n\).
Substituting into Equation 1, the cumulative distribution function is
then Equation 2.
Equation 2 (CDF of Minimum): \(F_Y(y)=1-(1-F(y))^n\).
Now, the CDF of a discrete uniform on \(\{a, \ldots, b\}\) is well known:
\(F(x)=\frac{\lfloor{x}\rfloor-a+1}{b-a+1}\) for \(a \le x \le b\).
Here, \(a\) is the minimum of the uniform (in our case, 1) and \(b\) is the
maximum (in our case, 20), so \(\lfloor{x}\rfloor\) takes the integer values \(1, 2, \ldots, 20\).
Substituting into Equation 2, we have Equation 3.
Equation 3 (CDF of Discrete Uniform Minimum): \(F_Y(y)=1-\left(1-\frac{\lfloor{y}\rfloor-a+1}{b-a+1}\right)^n\)
With a=1 and b=20, we have Equation 4.
Equation 4 (CDF of Our Discrete Uniform Minimum): \(F_Y(y)=1-\left(1-\frac{\lfloor{y}\rfloor}{20}\right)^n\)
Plugging in 1 through 20 for \(\lfloor{y}\rfloor\) generates the
distribution.
n <- 5
a <- 1
b <- 20
x_vals <- 1:20
F_Y <- 1 - (1 - ((x_vals - a + 1) / (b - a + 1)))^n # Equation 3 evaluated at each day
data.frame(Day = x_vals, P_Y_leq_x = round(F_Y, 4))
## Day P_Y_leq_x
## 1 1 0.2262
## 2 2 0.4095
## 3 3 0.5563
## 4 4 0.6723
## 5 5 0.7627
## 6 6 0.8319
## 7 7 0.8840
## 8 8 0.9222
## 9 9 0.9497
## 10 10 0.9688
## 11 11 0.9815
## 12 12 0.9898
## 13 13 0.9947
## 14 14 0.9976
## 15 15 0.9990
## 16 16 0.9997
## 17 17 0.9999
## 18 18 1.0000
## 19 19 1.0000
## 20 20 1.0000
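The table above is the CDF; the probability of each individual day follows by differencing it. The sketch below derives the PMF and the expected minimum lead time from Equation 4, assuming independent suppliers. The names `cdf`, `pmf`, and `expected_min` are illustrative.

```r
n <- 5; k <- 20
y <- 1:k

cdf <- 1 - (1 - y / k)^n        # Equation 4 at each integer day
pmf <- diff(c(0, cdf))          # P(Y = y) = F_Y(y) - F_Y(y - 1)
expected_min <- sum(y * pmf)    # mean of the earliest delivery day

round(head(pmf, 3), 4)
round(expected_min, 2)
```

The PMF is heavily front-loaded: with five suppliers, the earliest delivery is expected just under four days in.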
2. (Maintenance Planning for Critical Equipment):
Your organization owns a critical piece of equipment, such as a
high-capacity photocopier (for a law firm) or an MRI machine (for a
healthcare provider). The manufacturer estimates the expected lifetime
of this equipment to be 8 years, meaning that, on average, you expect
one failure every 8 years. It’s essential to understand the likelihood
of failure over time to plan for maintenance and replacements.
a. Geometric Model:
Calculate the probability that the machine will not fail for the
first 6 years. Also, provide the expected value and standard deviation.
This model assumes each year the machine either fails or does not,
independently of previous years.
pformat((7/8)^6)
## [1] Pr=0.448795318603516
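The expected value and standard deviation for the geometric model can be sketched as follows. This assumes the convention that counts the year in which the failure occurs (support 1, 2, ...); R's own `dgeom` family counts failure-free years before it, which would shift the mean down by one. The names are illustrative.

```r
p_fail <- 1 / 8

ex_geom <- 1 / p_fail                  # expected year of the first failure
sx_geom <- sqrt(1 - p_fail) / p_fail   # standard deviation (same under either convention)

c(EX = ex_geom, SX = round(sx_geom, 4))
```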
b. Exponential Model:
Calculate the probability that the machine will not fail for the
first 6 years. Provide the expected value and standard deviation,
modeling the time to failure as a continuous process.
mean_life=8; rate=1/mean_life; x=6 # rate = expected failures per year
pformat(pexp(x,rate,lower.tail=FALSE))
## [1] Pr=0.472366552741015
myformat("EX", 1/rate)
## [1] EX=8
myformat("SX",1/rate)
## [1] SX=8
c. Binomial Model:
Calculate the probability that the machine will not fail during the
first 6 years, given that it is expected to fail once every 8 years.
Provide the expected value and standard deviation, assuming a fixed
number of trials (years) with a constant failure probability each
year.
pformat(dbinom(0,6,1/8))
## [1] Pr=0.448795318603516
myformat("EX", 6*1/8)
## [1] EX=0.75
myformat("SX",sqrt(6*1/8*7/8))
## [1] SX=0.810092587300983
d. Poisson Model:
Calculate the probability that the machine will not fail during the
first 6 years, modeling the failure events as a Poisson process. Provide
the expected value and standard deviation.
pformat(dpois(0,6/8))
## [1] Pr=0.472366552741015
myformat("EX", 6/8)
## [1] EX=0.75
myformat("SX",sqrt(6/8))
## [1] SX=0.866025403784439
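Putting the four six-year survival probabilities side by side shows how the models pair up. This is a summary sketch using the same 1/8 annual failure probability (or rate) as above; the vector name `survival` is illustrative.

```r
p_fail <- 1 / 8; years <- 6

survival <- c(
  geometric   = (1 - p_fail)^years,                              # no failure in 6 yearly trials
  exponential = pexp(years, rate = p_fail, lower.tail = FALSE),  # continuous time to failure
  binomial    = dbinom(0, years, p_fail),                        # zero failures in 6 trials
  poisson     = dpois(0, years * p_fail)                         # zero events with mean 0.75
)
round(survival, 4)
```

The two discrete-time models (geometric, binomial) agree with each other, as do the two continuous-time models (exponential, Poisson).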
Problem 4.
1. MGF
You are managing two independent servers in a data center. The time
until the next failure for each server follows an exponential
distribution with different rates:
• Server A has a failure rate of \(\lambda_A = 0.5\) failures per hour.
• Server B has a failure rate of \(\lambda_B = 0.3\) failures per hour.
What is the distribution of the total time until both servers have
failed at least once? Use the moment generating function (MGF) to find
the distribution of the sum of the times to failure.
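For an exponential with rate \(\lambda\), \(M_X(t)=E(e^{tX})=\frac{\lambda}{\lambda-t}\) for \(t<\lambda\).
By independence, the MGF of the sum \(T_A+T_B\) is the product of the two MGFs.
\(M_{T_A+T_B}(t)=M_{T_A}(t)\times M_{T_B}(t)=\frac{0.5}{0.5-t}\cdot\frac{0.3}{0.3-t}, \quad t<0.3\)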
This distribution has a name: the hypoexponential!
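For two distinct rates, the hypoexponential density has a closed form, which can be checked numerically. The sketch below is an illustration rather than part of the original solution; `dhypo` is a hypothetical helper name, and the density formula assumes the two rates differ.

```r
rate_a <- 0.5; rate_b <- 0.3

# Density of the sum of two independent exponentials with distinct rates
dhypo <- function(t) {
  rate_a * rate_b / (rate_a - rate_b) * (exp(-rate_b * t) - exp(-rate_a * t))
}

total_mass <- integrate(dhypo, 0, Inf)$value                      # should be close to 1
mean_time  <- integrate(function(t) t * dhypo(t), 0, Inf)$value   # close to 1/0.5 + 1/0.3

c(total_mass = total_mass, mean_time = mean_time)
```

The mean of about 5.33 hours is the sum of the two individual means, 2 hours and 3.33 hours.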
2. Sum of Independent Normally Distributed Random Variables
An investment firm is analyzing the returns of two independent
assets, Asset X and Asset Y. The returns on these assets are normally
distributed:
\(X\sim \text{N}(\mu_X = 5\%, \sigma_X^2 = 4\%)\)
\(Y\sim \text{N}(\mu_Y = 7\%, \sigma_Y^2 = 9\%)\)
Question: Find the distribution of the combined return of the
portfolio consisting of these two assets using the moment generating
function (MGF).
First, the MGF of the normal is \(M_N(t)=\exp(\mu t+ 0.5 \sigma^2t^2)\)
As above, \(M_X(t) \times
M_Y(t)=\exp(0.05t+0.02t^2) \times \exp(0.07t+0.045t^2)\)
Simplifying… \(M_{X+Y}(t)=\exp(0.12t+0.065t^2)\)
Matching \(0.5\sigma^2=0.065\), we can see this is the MGF of a Normal
with a mean of 0.12 and a variance of \(0.13\), the sum \(0.04+0.09\).
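For independent normals the means add and the variances add, which a quick simulation can confirm. This is a sketch with an arbitrary seed and sample size; note that `rnorm()` takes a standard deviation, so the variances are square-rooted.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

mu_x <- 0.05; var_x <- 0.04
mu_y <- 0.07; var_y <- 0.09

mu_sum  <- mu_x + mu_y     # theoretical mean of X + Y
var_sum <- var_x + var_y   # theoretical variance of X + Y

# Simulated portfolio returns from independent draws
draws <- rnorm(1e5, mu_x, sqrt(var_x)) + rnorm(1e5, mu_y, sqrt(var_y))

c(theory_mean = mu_sum, sim_mean = mean(draws),
  theory_var = var_sum, sim_var = var(draws))
```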
3. Poisson MGF
• Region A: \(X_A \sim \text{Poisson}(\lambda_A = 3)\)
• Region B: \(X_B \sim \text{Poisson}(\lambda_B = 5)\)
Question: Determine the distribution of the total number of calls
received in an hour from both regions using the moment generating
function (MGF).
As above, multiply the MGFs: \(M_A(t) \times
M_B(t)=\exp(3(e^t-1)) \times \exp(5(e^t-1))=\exp(8(e^t-1))\)
This is the Poisson MGF with rate 8, so \(X_A+X_B \sim \text{Poisson}(8)\).
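The MGF result can be cross-checked by convolving the two PMFs directly. This is a verification sketch; the range 0 to 30 is an arbitrary choice that covers essentially all of the probability mass.

```r
n <- 0:30

# P(X_A + X_B = m) by direct convolution of the Poisson(3) and Poisson(5) PMFs
conv <- sapply(n, function(m) sum(dpois(0:m, 3) * dpois(m:0, 5)))

all.equal(conv, dpois(n, 8))
```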
Problem 5.
1. Customer Retention and Churn Analysis
A telecommunications company wants to model the behavior of its
customers regarding their likelihood to stay with the company
(retention) or leave for a competitor (churn). The company segments its
customers into three states:
• State 1: Active customers who are satisfied and likely to stay (Retention state).
• State 2: Customers who are considering leaving (At-risk state).
• State 3: Customers who have left (Churn state).
The company has historical data showing the following monthly transition
probabilities:
• From State 1 (Retention): 80% stay in State 1, 15% move to State 2, and 5% move to State 3.
• From State 2 (At-risk): 30% return to State 1, 50% stay in State 2, and 20% move to State 3.
• From State 3 (Churn): 100% stay in State 3.
The company wants to analyze the long-term behavior of its customer
base. Question: (a) Construct the transition matrix for this Markov
Chain. (b) If a customer starts as satisfied (State 1), what is the
probability that they will eventually churn (move to State 3)? (c)
Determine the steady-state distribution of this Markov Chain. What
percentage of customers can the company expect to be in each state in
the long run?
States: Retention (R), At-risk (A), Churn (C)
(a)
library(markovchain)
## Loading required package: Matrix
## Package: markovchain
## Version: 0.10.0
## Date: 2024-11-14 00:00:02 UTC
## BugReport: https://github.com/spedygiorgio/markovchain/issues
states <- c("R", "A", "C")
transitionMatrix <- matrix(c(
0.80, 0.15, 0.05, # From R
0.30, 0.50, 0.20, # From A
0.00, 0.00, 1.00 # From C (absorbing)
),
nrow = 3, byrow = TRUE)
mc_customer <- new("markovchain", states = states, transitionMatrix = transitionMatrix, name = "Customer Behavior")
mc_customer
## Customer Behavior
## A 3 - dimensional discrete Markov Chain defined by the following states:
## R, A, C
## The transition matrix (by rows) is defined as follows:
## R A C
## R 0.8 0.15 0.05
## A 0.3 0.50 0.20
## C 0.0 0.00 1.00
(b)
If a customer starts as satisfied (State 1), what is the probability
that they will eventually churn (move to State 3)?
absorbingStates(mc_customer)
## [1] "C"
(c)
Determine the steady-state distribution of this Markov Chain. What
percentage of customers can the company expect to be in each state in
the long run?
steadyStates(mc_customer)
## R A C
## [1,] 0 0 1
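Churn is the only absorbing state and it can be reached from both R and
A, so 100% of customers will eventually churn and the steady-state
distribution puts all of its mass on C.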
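The absorption probability, and how long churn takes on average, can be sketched in base R with the fundamental matrix of the absorbing chain. This is an illustration that does not rely on the markovchain package; the names `Q`, `R_abs`, and `N_fund` are illustrative.

```r
P <- matrix(c(0.80, 0.15, 0.05,
              0.30, 0.50, 0.20,
              0.00, 0.00, 1.00),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("R", "A", "C"), c("R", "A", "C")))

Q      <- P[c("R", "A"), c("R", "A")]            # transitions among transient states
R_abs  <- P[c("R", "A"), "C", drop = FALSE]      # transient -> absorbing transitions
N_fund <- solve(diag(2) - Q)                     # fundamental matrix (I - Q)^(-1)

absorb_prob     <- N_fund %*% R_abs              # P(eventually churn) starting from R and A
months_to_churn <- unname(rowSums(N_fund))       # expected months before churn, from R and A

absorb_prob
round(months_to_churn, 2)
```

Both absorption probabilities equal 1, and a satisfied customer is expected to churn after roughly 11.8 months under these transition rates.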
2. Inventory Management in a Warehouse
A warehouse tracks the inventory levels of a particular product using
a Markov Chain model. The inventory levels are categorized into three
states:
• State 1: High inventory (More than 100 units in stock).
• State 2: Medium inventory (Between 50 and 100 units in stock).
• State 3: Low inventory (Less than 50 units in stock).
The warehouse has the following transition probabilities for inventory
levels from one month to the next:
• From State 1 (High): 70% stay in State 1, 25% move to State 2, and 5% move to State 3.
• From State 2 (Medium): 20% move to State 1, 50% stay in State 2, and 30% move to State 3.
• From State 3 (Low): 10% move to State 1, 40% move to State 2, and 50% stay in State 3.
The warehouse management wants to optimize its restocking strategy by
understanding the long-term distribution of inventory levels. Question:
(a) Construct the transition matrix for this Markov Chain. (b) If the
warehouse starts with a high inventory level (State 1), what is the
probability that it will eventually end up in a low inventory level
(State 3)? (c) Determine the steady-state distribution of this Markov
Chain. What is the long-term expected proportion of time that the
warehouse will spend in each inventory state?
States: High Inventory (H), Medium Inventory (M), Low Inventory
(L)
(a)
Construct the transition matrix.
states <- c("H", "M", "L")
transitionMatrix <- matrix(c(
0.70, 0.25, 0.05, # From High
0.20, 0.50, 0.30, # From Medium
0.10, 0.40, 0.50 # From Low
), byrow = TRUE, nrow = 3)
mc_inventory <- new("markovchain", transitionMatrix = transitionMatrix, states = states, name = "Inventory")
mc_inventory
## Inventory
## A 3 - dimensional discrete Markov Chain defined by the following states:
## H, M, L
## The transition matrix (by rows) is defined as follows:
## H M L
## H 0.7 0.25 0.05
## M 0.2 0.50 0.30
## L 0.1 0.40 0.50
(b)
If the warehouse starts in State 1 (High), what is the probability
that it eventually ends up in Low (State 3)? This is not an absorbing
chain: every state can be reached from every other, so the warehouse
visits State 3 at some point with probability 1. Reading “eventually
ends up in” as the long-run probability of being in State 3 gives the
steady-state probability for State 3, which does not depend on the
starting state.
steady <- steadyStates(mc_inventory)
round(steady, 4)[3]
## [1] 0.2667
(c)
round(steady,4)
## H M L
## [1,] 0.3467 0.3867 0.2667
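The steady state can also be recovered in base R by solving the balance equations directly. This is a sketch that does not use the markovchain package; the names `A` and `pi_hat` are illustrative.

```r
P <- matrix(c(0.70, 0.25, 0.05,
              0.20, 0.50, 0.30,
              0.10, 0.40, 0.50),
            nrow = 3, byrow = TRUE)

# Solve pi %*% P = pi together with sum(pi) = 1:
# start from (t(P) - I) pi = 0 and replace the last equation with the normalization
A <- t(P) - diag(3)
A[3, ] <- 1
pi_hat <- solve(A, c(0, 0, 1))

round(pi_hat, 4)
```

The exact solution is 26/75, 29/75, and 20/75 for High, Medium, and Low, matching the rounded values above.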