Problem 1

1.1

(Bayesian): A new credit scoring system has been developed to predict the likelihood of loan defaults. The system has a 90% sensitivity, meaning that it correctly identifies 90% of those who will default on their loans as positive. It also has a 95% specificity, meaning that it correctly identifies 95% of those who will not default as not defaulting. The default rate among borrowers is 2%.

1.1.a

Given these prevalence, sensitivity, and specificity estimates, what is the probability that a borrower flagged by the system as likely to default will actually default?

The fast way…

.90*.02 / (.9*.02+.05*.98)
## [1] 0.2686567

Detailed…

#A=Default_Forecasted
#B=Default
#We want P(B|A) = P(A|B) x P(B) / [P(A|B) x P(B) + P(A|B') x P(B')]
P_B=prev=.02
P_not_B=1-P_B
P_A_given_B=sens=.9
P_not_A_given_not_B=spec=.95
P_A_given_not_B=1-.95 #complement of specificity, they add to 1

#Bayes' rule; no chained assignment, so P_A_given_B is not overwritten
(Default_Positive=P_A_given_B*P_B / (P_A_given_B*P_B+P_A_given_not_B*P_not_B))
## [1] 0.2686567
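
As a sanity check, the same posterior can be read off natural frequencies. The sketch below assumes a hypothetical cohort of 10,000 borrowers; the variable names are illustrative only.

```r
# Natural-frequency check of Bayes' rule on a hypothetical cohort
borrowers <- 10000
defaulters <- 0.02 * borrowers                              # expected defaulters
flagged_defaulters <- 0.90 * defaulters                     # true positives (sensitivity)
flagged_non_defaulters <- 0.05 * (borrowers - defaulters)   # false positives (1 - specificity)

# Share of flagged borrowers who actually default
ppv <- flagged_defaulters / (flagged_defaulters + flagged_non_defaulters)
ppv
```

Most flags are false positives because non-defaulters outnumber defaulters 49 to 1.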

1.1.b

If the average loss per defaulted loan is 200,000 and the cost to run the credit scoring test on each borrower is 500, what is the total first-year cost for evaluating 10,000 borrowers?

options(scipen = 999)
loss=200000
borrowers=10000
cost_per_test=500

#Estimate defaulters and non-defaulters
defaulters=0.02*borrowers
non_defaulters=10000-defaulters

#Estimate those funded / not funded by mistake
defaulters_funded=.1*defaulters
non_defaulters_not_funded=0.05*non_defaulters



total_cost=(defaulters_funded*loss+cost_per_test*borrowers)
#define a profit function, say an average of $10000 per loan
opportunity_cost=non_defaulters_not_funded*10000
costs=c(total_cost, opportunity_cost)
names(costs)=c("measurable costs", "opportunity cost @ 10K per loan")
costs
##                measurable costs opportunity cost @ 10K per loan 
##                         9000000                         4900000

1.2

(Binomial): The probability that a stock will pay a dividend in any given quarter is 0.7. What is the probability that the stock pays dividends exactly 6 times in 8 quarters? What is the probability that it pays dividends 6 or more times? What is the probability that it pays dividends fewer than 6 times? What is the expected number of dividend payments over 8 quarters? What is the standard deviation?

myB=function(low, high,N,pi){
  sum(dbinom(low:high, N,pi))
}
N=8
pi=.7
myB(6,6,N,pi)
## [1] 0.2964755
myB(6,8,N,pi)
## [1] 0.5517738
myB(0,5,N,pi)
## [1] 0.4482262
print(c(EX=N*pi, SDX=sqrt(N*pi*(1-pi))))
##       EX      SDX 
## 5.600000 1.296148
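
The three probabilities can be cross-checked with the built-in cumulative function instead of summing densities. This is a minimal sketch; the variable names are illustrative.

```r
N <- 8
p <- 0.7

exactly_6  <- dbinom(6, N, p)
at_least_6 <- pbinom(5, N, p, lower.tail = FALSE)  # P(X > 5) = P(X >= 6)
fewer_6    <- pbinom(5, N, p)                      # P(X <= 5)

c(exactly_6 = exactly_6, at_least_6 = at_least_6, fewer_6 = fewer_6)
```

The last two values are complements, so they sum to 1.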

1.3

(Poisson): A financial analyst notices that there are an average of 12 trading days each month when a certain stock’s price increases by more than 2%. What is the probability that exactly 4 such days occur in a given month? What is the probability that more than 12 such days occur in a given month? How many such days would you expect in a 6-month period? What is the standard deviation of the number of such days? If an investment strategy requires at least 70 days of such price increases in a year for profitability, what is the percent utilization and what are your recommendations?

myP=function(low,high,lambda, t){
  sum(dpois(low:high, lambda*t))
}

lambda=12
t=1

myP(4,4,lambda,t)
## [1] 0.005308599
myP(13,1000, lambda,t)
## [1] 0.4240348
print(c(EX=lambda*6, SDX=sqrt(lambda*6)))
##        EX       SDX 
## 72.000000  8.485281
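
The last part of the question (at least 70 days in a year) can be approached by scaling the rate to 12 months. The sketch below assumes that "percent utilization" means the required days as a share of the expected days; that reading is an assumption, not something the problem states.

```r
lambda_month <- 12
lambda_year  <- lambda_month * 12        # expected qualifying days per year

# Probability of reaching the 70-day threshold, P(X >= 70)
p_at_least_70 <- ppois(69, lambda_year, lower.tail = FALSE)

# Assumed reading of "percent utilization": required days / expected days
utilization <- 70 / lambda_year

c(expected_days = lambda_year, p_at_least_70 = p_at_least_70, utilization = utilization)
```

Under this model the threshold sits far below the expected annual count, so the strategy's requirement is met almost surely.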

1.4

(Hypergeometric): A hedge fund has a portfolio of 25 stocks, with 15 categorized as high-risk and 10 as low-risk. The fund manager randomly selects 7 stocks to closely monitor. If the manager selected 5 high-risk stocks and 2 low-risk stocks, what is the probability of selecting exactly 5 high-risk stocks if the selection was random? How many high-risk and low-risk stocks would you expect to be selected?

myH=function(low, high, n1, n2, draws){
  sum(dhyper(low:high, n1,n2, draws))
}

n1=15
n2=10
draws=7

myH(5,5,n1,n2,draws)
## [1] 0.2811213
print(c(En1=draws*n1/(n1+n2), En2=draws*n2/(n1+n2)))
## En1 En2 
## 4.2 2.8
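
The hypergeometric probability can also be written directly as a ratio of counts, which makes the logic visible. A minimal sketch:

```r
# Ways to pick 5 of 15 high-risk and 2 of 10 low-risk, over all ways to pick 7 of 25
p_exact <- choose(15, 5) * choose(10, 2) / choose(25, 7)
p_exact
```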

1.5

(Geometric): The probability that a bond defaults in any given year is 0.5%. A portfolio manager holds this bond for 10 years. What is the probability that the bond will default during this period? What is the probability that it will default in the next 15 years? What is the expected number of years before the bond defaults? If the bond has already survived 10 years, what is the probability that it will default in the next 2 years?

p1=.995^10
p2=.995^15
p3=1/.005
p4=.995^2 #memoryless
myl=c(p1,p2,p3,p4)
names(myl)=c("Survive 10", "Survive 15", "Expected Value", "Survive Additional 2")
myl
##           Survive 10           Survive 15       Expected Value 
##            0.9511101            0.9275690          200.0000000 
## Survive Additional 2 
##            0.9900250
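
The table above reports survival probabilities, while the question asks for default probabilities, which are the complements. A minimal sketch using `pgeom`, which counts the number of non-default years before the first default:

```r
p <- 0.005

default_10     <- pgeom(9, p)    # default within 10 years = 1 - 0.995^10
default_15     <- pgeom(14, p)   # default within 15 years = 1 - 0.995^15
default_next_2 <- pgeom(1, p)    # memoryless: same as default within the first 2 years

c(default_10 = default_10, default_15 = default_15, default_next_2 = default_next_2)
```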

1.6

(Poisson): A high-frequency trading algorithm experiences a system failure about once every 1500 trading hours. What is the probability that the algorithm will experience more than two failures in 1500 hours? What is the expected number of failures?

lambda=1
t=1

myP(3,1000,lambda,t)
## [1] 0.0803014
print(c(EX=lambda*t))
## EX 
##  1

1.7

(Uniform Distribution): An investor is trying to time the market and is monitoring a stock that they believe has an equal chance of reaching a target price between 20 and 60 days. What is the probability that the stock will reach the target price in more than 40 days? If it hasn’t reached the target price by day 40, what is the probability that it will reach it in the next 10 days? What is the expected time for the stock to reach the target price?

#Days are treated as discrete: X is uniform on the integers 20, 21, ..., 60 (41 values).
#A continuous Uniform(20, 60) model would give P(X>40)=0.5 instead of 20/41.
range=length((seq(20,60)))
a1=1/range*length(seq(41,60)) #P(X>=41)
a2=length(seq(41,50))/length(seq(41,60))
#P(41<=X<=50 | X>=41) = P(41<=X<=50)/P(X>=41) = (10/41)/(20/41) = 10/20 = .5

EX=(20+60)/2 #midpoint of the support
mya=c(a1,a2,EX)
names(mya)=c("P(X>=41)", "P(X<=50 | X>=41)", "EX")
mya
##         P(X>=41) P(X<=50 | X>=41)               EX 
##        0.4878049        0.5000000       40.0000000
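
If time is instead modeled as a continuous Uniform(20, 60) variable, the same three quantities follow from `punif`. This is a sketch under that continuous assumption; the variable names are illustrative.

```r
# P(X > 40)
p_gt_40 <- punif(40, min = 20, max = 60, lower.tail = FALSE)

# P(40 < X <= 50 | X > 40)
p_next_10 <- (punif(50, 20, 60) - punif(40, 20, 60)) / p_gt_40

# Mean of a uniform distribution is the midpoint of its range
mean_time <- (20 + 60) / 2

c(p_gt_40 = p_gt_40, p_next_10 = p_next_10, mean_time = mean_time)
```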

1.8

(Exponential Distribution): A financial model estimates that the lifetime of a successful start-up before it either goes public or fails follows an exponential distribution with an expected value of 8 years. What is the expected time until the start-up either goes public or fails? What is the standard deviation? What is the probability that the start-up will go public or fail after 6 years? Given that the start-up has survived for 6 years, what is the probability that it will go public or fail in the next 2 years?

myE=function(low, high, lambda)
{
  p1=pexp(high, lambda)-pexp(low, lambda) #P(low < X < high)
  p2=1/lambda #expected value of an exponential is 1/rate
  p3=1/lambda #standard deviation equals the mean
  mylist=c(p1,p2,p3)
  names(mylist)=c('P','EX','SDX')
  return(mylist)
  }


myE(6, 1000,1/8)
##         P        EX       SDX 
## 0.4723666 8.0000000 8.0000000
myE(0,2, 1/8)
##         P        EX       SDX 
## 0.2211992 8.0000000 8.0000000
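
Using the interval 0 to 2 for the conditional question relies on the memoryless property of the exponential. A minimal sketch that verifies it by computing the conditional probability directly:

```r
rate <- 1/8

# P(6 < X <= 8 | X > 6), computed from the definition of conditional probability
conditional <- (pexp(8, rate) - pexp(6, rate)) / pexp(6, rate, lower.tail = FALSE)

# P(X <= 2), the unconditional probability for a fresh start
unconditional <- pexp(2, rate)

c(conditional = conditional, unconditional = unconditional)
```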

Problem 2

2.1

(Product Selection): A company produces 5 different types of green pens and 7 different types of red pens. The marketing team needs to create a new promotional package that includes 5 pens. How many different ways can the package be created if it contains fewer than 2 green pens?

choose(5,0)*choose(7,5)+choose(5,1)*choose(7,4)
## [1] 196

2.2

(Team Formation for a Project): A project committee is being formed within a company that includes 14 senior managers and 13 junior managers. How many ways can a project team of 5 members be formed if at least 4 of the members must be junior managers?

#A team is an unordered selection, so count combinations.
#Exactly 4 juniors and 1 senior, plus all 5 juniors.
choose(13,4)*choose(14,1)+choose(13,5)*choose(14,0)
## [1] 11297
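
The number of 5-member teams with at least 4 junior managers can be verified by brute force, since there are only choose(27, 5) = 80730 possible teams. The sketch below enumerates them all; the `members` labels are illustrative.

```r
members <- c(rep("senior", 14), rep("junior", 13))

# Every unordered 5-member team, one per column (indices into members)
teams <- combn(length(members), 5)

# Count juniors per team and keep teams with at least 4
is_junior <- matrix(members[teams] == "junior", nrow = 5)
n_valid <- sum(colSums(is_junior) >= 4)
n_valid
```

The enumeration agrees with choose(13,4)*choose(14,1) + choose(13,5).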

2.3

(Marketing Campaign Outcomes): A marketing campaign involves three stages: first, a customer is sent 5 email offers; second, the customer is targeted with 2 different online ads; and third, the customer is presented with 3 personalized product recommendations. If the email offers, online ads, and product recommendations are selected randomly, how many different possible outcomes are there for the entire campaign?

#rule of multiplication
5*2*3
## [1] 30

2.4

(Product Defect Probability): A quality control team draws 3 products from a batch of size N without replacement. What is the probability that at least one of the products drawn is defective if the defect rate is known to be consistent?

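With a constant defect probability \(\pi\) per draw (a binomial approximation that is reasonable when the batch size \(N\) is large relative to the 3 draws), \(P(X \ge 1) = 1-P(X=0) = 1-{3\choose{0}} \pi^0(1-\pi)^{3} = 1-(1-\pi)^3\).

This is the geometric distribution's CDF evaluated at 3: the probability that the first defective item appears within the first three draws.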

2.5

(Business Strategy Choices): A business strategist is choosing potential projects to invest in, focusing on 17 high-risk, high-reward projects and 14 low-risk, steady-return projects.

2.5 Step 1:

How many different combinations of 5 projects can the strategist select?

temp=0
for (i in 0:5){
  p1=choose(17,i)*choose(14,5-i)
  temp=temp+p1
}
temp
## [1] 169911

2.5 Step 2

How many different combinations of 5 projects can the strategist select if they want at least one low-risk project?

temp=0
for (i in 0:4){
  p1=choose(17,i)*choose(14,5-i)
  temp=temp+p1
}
temp
## [1] 163723
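
Both loops have closed forms. By Vandermonde's identity the first sum is a single binomial coefficient, and the second is the total minus the selections with no low-risk project. A minimal sketch:

```r
# Step 1: any 5 of the 31 projects
total <- choose(17 + 14, 5)

# Step 2: remove the selections made up entirely of high-risk projects
at_least_one_low <- total - choose(17, 5)

c(total = total, at_least_one_low = at_least_one_low)
```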

2.6

(Event Scheduling): A business conference needs to schedule 9 different keynote sessions from three different industries: technology, finance, and healthcare. There are 4 potential technology sessions, 104 finance sessions, and 17 healthcare sessions to choose from. How many different schedules can be made? Express your answer in scientific notation rounding to the hundredths place.

# We are not told that we have to generate one or more selections from every industry!

answer=choose(4+104+17, 9)*factorial(9)

formatC(answer, format = "e", digits = 2)
## [1] "5.55e+18"

2.7

(Book Selection for Corporate Training): An HR manager needs to create a reading list for a corporate leadership training program, which includes 13 books in total. The books are categorized into 6 novels, 6 business case studies, 7 leadership theory books, and 5 strategy books.

2.7 Step 1

If the manager wants to include no more than 4 strategy books, how many different reading schedules are possible? Express your answer in scientific notation rounding to the hundredths place.

temp2=(choose(24,13)-choose(19,8)*choose(5,5))*factorial(13)
formatC(temp2, format = "e", digits = 2)
## [1] "1.51e+16"

2.7 Step 2

If the manager wants to include all 6 business case studies, how many different reading schedules are possible? Express your answer in scientific notation rounding to the hundredths place.

temp3=choose(18,7)*choose(6,6)*factorial(13)
formatC(temp3, format = "e", digits = 2)
## [1] "1.98e+14"

2.8

(Product Arrangement): A retailer is arranging 10 products on a display shelf. There are 5 different electronic gadgets and 5 different accessories. What is the probability that all the gadgets are placed together and all the accessories are placed together on the shelf? Express your answer as a fraction or a decimal number rounded to four decimal places.

formatC(factorial(2)*factorial(5)^2/factorial(10), digits=3)
## [1] "0.00794"

2.9

(Expected Value of a Business Deal): A company is evaluating a deal where they either gain 4 for every successful contract or lose 16 for every unsuccessful contract. A “successful” contract is defined as drawing a queen or lower from a standard deck of cards. (Aces are considered the highest card in the deck.)

2.9 Step 1

Find the expected value of the deal. Round your answer to two decimal places. Losses must be expressed as negative values.

44/52*4-8/52*16
## [1] 0.9230769

2.9 Step 2

If the company enters into this deal 833 times, how much would they expect to win or lose? Round your answer to two decimal places. Losses must be expressed as negative values.

833*(44/52*4-8/52*16)
## [1] 768.9231

Problem 3

3.1

(Supply Chain Risk Assessment): Let \(X_1, X_2, \ldots, X_n\) represent the lead times (in days) for the delivery of key components from \(n=5\) different suppliers. Each lead time is uniformly distributed across a range of 1 to \(k=20\) days, reflecting the uncertainty in delivery times. Let \(Y\) denote the minimum delivery time among all suppliers. Understanding the distribution of \(Y\) is crucial for assessing the earliest possible time you can begin production. Determine the distribution of \(Y\) to better manage your supply chain and minimize downtime.

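Assuming the lead times are independent and take the integer values 1 through 20 with equal probability, \(F_X(y)=\frac{y}{20}\), so \(F_Y(y)=P(Y \le y)=1-(1-F_X(y))^5=1-(1-\frac{y}{20})^5\) for \(y=1,\ldots,20\). The corresponding probability mass function is \(P(Y=y)=\frac{(21-y)^5-(20-y)^5}{20^5}\).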
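
The distribution of the minimum can be tabulated numerically. The sketch below assumes independent lead times on the integers 1 to 20 and checks that the probability mass function \(P(Y=y)=\frac{(k-y+1)^n-(k-y)^n}{k^n}\) sums to 1 and accumulates to the CDF \(1-(1-\frac{y}{k})^n\).

```r
n <- 5    # suppliers
k <- 20   # latest possible delivery day
y <- 1:k

# P(Y = y) = P(all lead times >= y) - P(all lead times >= y + 1)
pmf <- ((k - y + 1)^n - (k - y)^n) / k^n

# P(Y <= y) = 1 - P(all lead times > y)
cdf <- 1 - (1 - y / k)^n

round(head(data.frame(y = y, pmf = pmf, cdf = cdf)), 4)
```

The mass concentrates on the earliest days, as expected for a minimum of several draws.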

3.2

(Maintenance Planning for Critical Equipment): Your organization owns a critical piece of equipment, such as a high-capacity photocopier (for a law firm) or an MRI machine (for a healthcare provider). The manufacturer estimates the expected lifetime of this equipment to be 8 years, meaning that, on average, you expect one failure every 8 years. It’s essential to understand the likelihood of failure over time to plan for maintenance and replacements.

3.2.a

Geometric Model: Calculate the probability that the machine will not fail for the first 6 years. Also, provide the expected value and standard deviation. This model assumes each year the machine either fails or does not, independently of previous years.

#pgeom counts failures before the first success, so 1-pgeom(5, 1/8) gives the same value as the direct calculation below.

(7/8)^6
## [1] 0.4487953
1/(1/8)
## [1] 8
sqrt(7/8)/(1/8)
## [1] 7.483315

3.2.b

Exponential Model: Calculate the probability that the machine will not fail for the first 6 years. Provide the expected value and standard deviation, modeling the time to failure as a continuous process.

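# Working in days (6 years = 2191.5 days, mean life = 2922 days); the time unit does not change the probability.
#P(X > 6 years)
1-pexp(2191.5,1/2922)
## [1] 0.4723666
#expected value, converted back to years
2922/365.25
## [1] 8
#standard deviation (equal to the mean for an exponential), in years
2922/365.25
## [1] 8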

3.2.c

Binomial Model: Calculate the probability that the machine will not fail during the first 6 years, given that it is expected to fail once every 8 years. Provide the expected value and standard deviation, assuming a fixed number of trials (years) with a constant failure probability each year.

dbinom(0,6,1/8)
## [1] 0.4487953
6/8
## [1] 0.75
sqrt(6/8*(7/8))
## [1] 0.8100926

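3.2.c Refined

# Use monthly trials (72 months, failure probability 1/96 per month) for a refined estimate that more closely matches the exponential model.
dbinom(0,72,1/96)
## [1] 0.4705121
#expected number of failures in 72 months
72/96
## [1] 0.75
#standard deviation of the number of failures
sqrt(72/96*(95/96))
## [1] 0.861503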

3.2.d

Poisson Model: Calculate the probability that the machine will not fail during the first 6 years, modeling the failure events as a Poisson process. Provide the expected value and standard deviation.

#The Poisson mean is the rate times the window length (1/8 failures per year x 6 years), whatever time unit is used.

dpois(0,6/8)
## [1] 0.4723666
6/8
## [1] 0.75
sqrt(6/8)
## [1] 0.8660254
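
Putting the four models side by side shows how the discrete-time and continuous-time answers pair up. A minimal sketch; the vector name is illustrative.

```r
survive_6 <- c(
  geometric   = (7/8)^6,                                   # six yearly trials, no failure
  exponential = pexp(6, rate = 1/8, lower.tail = FALSE),   # continuous time to failure
  binomial    = dbinom(0, size = 6, prob = 1/8),           # zero failures in six trials
  poisson     = dpois(0, lambda = 6/8)                     # zero events, mean 6/8
)
round(survive_6, 4)
```

The geometric and binomial models agree exactly, as do the exponential and Poisson models; the gap between the two pairs comes from treating years as discrete trials.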

Problem 4

4.1

Scenario: You are managing two independent servers in a data center. The time until the next failure for each server follows an exponential distribution with different rates:

Server A has a failure rate of \(\lambda_\alpha\)=.5 failures per hour. Server B has a failure rate of \(\lambda_\beta\)=.3 failures per hour.

Question: What is the distribution of the total time until both servers have failed at least once? Use the moment generating function (MGF) to find the distribution of the sum of the times to failure.

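For an exponential with rate \(\lambda\), the MGF is \(M_X(s)=\int_0^\infty e^{sx}\lambda e^{-\lambda x}dx=\lambda \int_0^\infty e^{(s-\lambda)x} dx=\frac{\lambda}{\lambda-s}\) for \(s<\lambda\).

Because the servers are independent, the MGF of the sum \(T\) is the product of the two MGFs: \(M_T(s)=\frac{.5 \times .3}{(.5-s)(.3-s)}\) for \(s<.3\).

This is the MGF of the hypoexponential distribution, the distribution of a sum of independent exponentials with different rates. Its density is

\(f_T(t)=\frac{.5 \times .3}{.5-.3}(e^{-.3t}-e^{-.5t})=.75(e^{-.3t}-e^{-.5t})\) for \(t \ge 0\).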
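
A quick numerical check of the hypoexponential density \(f_T(t)=\frac{ab}{a-b}(e^{-bt}-e^{-at})\) with rates a = .5 and b = .3: it should integrate to 1, and its mean should equal the sum of the two exponential means, 1/.5 + 1/.3. This is a sketch using base R's `integrate`; the names `a`, `b`, and `f_T` are illustrative.

```r
a <- 0.5   # Server A failure rate
b <- 0.3   # Server B failure rate

# Density of the sum of two independent exponentials with rates a and b
f_T <- function(t) a * b / (a - b) * (exp(-b * t) - exp(-a * t))

total_prob <- integrate(f_T, 0, Inf)$value
mean_T     <- integrate(function(t) t * f_T(t), 0, Inf)$value

c(total_prob = total_prob, mean_T = mean_T, expected_mean = 1 / a + 1 / b)
```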

4.2

Sum of Independent Normally Distributed Random Variables

Scenario: An investment firm is analyzing the returns of two independent assets, Asset X and Asset Y. The returns on these assets are normally distributed:

Asset X: mean of 5%, variance of 4%

Asset Y: mean of 7%, variance of 9%

Question: Find the distribution of the combined return of the portfolio consisting of these two assets using the moment generating function (MGF).

This is quicker without the MGF. The mean of the sum of two independent normal returns is \(\mu_1 +\mu_2\) = 12%, and because the assets are independent the variances also add, giving 13%. The MGF route confirms it: \(M_X(s)M_Y(s)=e^{5s+\frac{4s^2}{2}}e^{7s+\frac{9s^2}{2}}=e^{12s+\frac{13s^2}{2}}\), which is the MGF of a normal distribution with mean 12 and variance 13.
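
A short simulation illustrates the result. The sketch below assumes returns are measured in percentage points, so the variances 4 and 9 correspond to standard deviations 2 and 3; the seed and sample size are arbitrary choices.

```r
set.seed(1)
n <- 1e5

x <- rnorm(n, mean = 5, sd = sqrt(4))   # Asset X
y <- rnorm(n, mean = 7, sd = sqrt(9))   # Asset Y, drawn independently of X
total <- x + y

# Sample moments should be close to 12 and 13
c(mean = mean(total), variance = var(total))
```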

4.3

Scenario: A call center receives calls independently from two different regions. The number of calls received from Region A and Region B in an hour follows a Poisson distribution:

Region A: \(X_A \sim Poisson(3)\)

Region B: \(X_B \sim Poisson(5)\)

Question: Determine the distribution of the total number of calls received in an hour from both regions using the moment generating function (MGF).

Again, the MGF makes this quick. The answer is Poisson(8), since the rates add. Multiplying the MGFs gives \(e^{3(e^s-1)}e^{5(e^s-1)}=e^{8(e^s-1)}\), which is the MGF of a Poisson again!
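
The same conclusion can be checked without MGFs by convolving the two probability mass functions directly. A minimal sketch over totals 0 to 30; the range is an arbitrary choice.

```r
totals <- 0:30

# P(X_A + X_B = m) = sum over k of P(X_A = k) * P(X_B = m - k)
conv <- sapply(totals, function(m) sum(dpois(0:m, 3) * dpois(m - 0:m, 5)))

# Largest discrepancy from the Poisson(8) mass function
max_gap <- max(abs(conv - dpois(totals, 8)))
max_gap
```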

Problem 5

5.1

Customer Retention and Churn Analysis

Scenario: A telecommunications company wants to model the behavior of its customers regarding their likelihood to stay with the company (retention) or leave for a competitor (churn). The company segments its customers into three states:

State 1: Active customers who are satisfied and likely to stay (Retention state).

State 2: Customers who are considering leaving (At-risk state).

State 3: Customers who have left (Churn state).

The company has historical data showing the following monthly transition probabilities:

From State 1 (Retention): 80% stay in State 1, 15% move to State 2, and 5% move to State 3.

From State 2 (At-risk): 30% return to State 1, 50% stay in State 2, and 20% move to State 3.

From State 3 (Churn): 100% stay in State 3.

The company wants to analyze the long-term behavior of its customer base.

5.1.a

Question: (a) Construct the transition matrix for this Markov Chain.

require(markovchain)
## Loading required package: markovchain
## Warning: package 'markovchain' was built under R version 4.4.2
## Package:  markovchain
## Version:  0.9.5
## Date:     2023-09-24 09:20:02 UTC
## BugReport: https://github.com/spedygiorgio/markovchain/issues
# Part a.
mymat=matrix(c(.8,.15,.05, .3,.5,.2, 0, 0, 1), nrow=3, byrow = T)
mymat=new("markovchain", states=c("retention", "at-risk", "churn"), transitionMatrix=mymat)
mymat
## Unnamed Markov chain 
##  A  3 - dimensional discrete Markov Chain defined by the following states: 
##  retention, at-risk, churn 
##  The transition matrix  (by rows)  is defined as follows: 
##           retention at-risk churn
## retention       0.8    0.15  0.05
## at-risk         0.3    0.50  0.20
## churn           0.0    0.00  1.00
plot(mymat)

5.1.b

If a customer starts as satisfied (State 1), what is the probability that they will eventually churn (move to State 3)?

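100%. Churn is the only absorbing state and it can be reached from both other states, so every customer is eventually absorbed there.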

#b.
steadyStates(mymat)
##      retention at-risk churn
## [1,]         0       0     1
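
The absorption probability can also be computed from the fundamental matrix of the chain, which additionally gives the expected number of months until churn. This is a sketch in base R, independent of the markovchain package; the names `Q`, `R`, and `N` follow the usual textbook notation and are not from the original.

```r
P <- matrix(c(.8, .15, .05,
              .3, .5,  .2,
              0,  0,   1), nrow = 3, byrow = TRUE)

Q <- P[1:2, 1:2]                 # transitions among the transient states
R <- P[1:2, 3, drop = FALSE]     # transitions from transient states into churn

N <- solve(diag(2) - Q)          # fundamental matrix (I - Q)^-1
absorb_prob <- N %*% R           # probability of eventually churning
mean_months <- N %*% c(1, 1)     # expected months until churn

cbind(absorb_prob = absorb_prob[, 1], mean_months = mean_months[, 1])
```

Row 1 corresponds to a customer starting in retention and row 2 to a customer starting at risk.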

5.1.c

Determine the steady-state distribution of this Markov Chain. What percentage of customers can the company expect to be in each state in the long run?

steadyStates(mymat)*100
##      retention at-risk churn
## [1,]         0       0   100

5.2

Inventory Management in a Warehouse

Scenario: A warehouse tracks the inventory levels of a particular product using a Markov Chain model. The inventory levels are categorized into three states:

State 1: High inventory (More than 100 units in stock).

State 2: Medium inventory (Between 50 and 100 units in stock).

State 3: Low inventory (Less than 50 units in stock).

The warehouse has the following transition probabilities for inventory levels from one month to the next:

From State 1 (High): 70% stay in State 1, 25% move to State 2, and 5% move to State 3.

From State 2 (Medium): 20% move to State 1, 50% stay in State 2, and 30% move to State 3.

From State 3 (Low): 10% move to State 1, 40% move to State 2, and 50% stay in State 3.

The warehouse management wants to optimize its restocking strategy by understanding the long-term distribution of inventory levels.

5.2.a

Question: (a) Construct the transition matrix for this Markov Chain.

# Part a.
mymat=matrix(c(.7,.25,.05,.2,.5,.3, .1,.4,.5), nrow=3, byrow = T)
mymat=new("markovchain", states=c("high", "medium", "low"), transitionMatrix=mymat)
mymat
## Unnamed Markov chain 
##  A  3 - dimensional discrete Markov Chain defined by the following states: 
##  high, medium, low 
##  The transition matrix  (by rows)  is defined as follows: 
##        high medium  low
## high    0.7   0.25 0.05
## medium  0.2   0.50 0.30
## low     0.1   0.40 0.50
plot(mymat)

5.2.b

If the warehouse starts with a high inventory level (State 1), what is the probability that it will eventually end up in a low inventory level (State 3)?

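Every state can reach every other state, so the chain is irreducible and, starting from high inventory, it reaches low inventory at some point with probability 1. The long-run share of months spent in low inventory is a different quantity, given by the third entry of the steady-state distribution:

steadyStates(mymat)[3]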
## [1] 0.2666667

5.2.c

Determine the steady-state distribution of this Markov Chain. What is the long-term expected proportion of time that the warehouse will spend in each inventory state?

steadyStates(mymat)
##           high    medium       low
## [1,] 0.3466667 0.3866667 0.2666667
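
The steady-state vector can be reproduced in base R by solving \(\pi P = \pi\) together with the constraint that the entries sum to 1. This is a sketch that does not depend on the markovchain package; the names `A` and `steady` are illustrative.

```r
P <- matrix(c(.7, .25, .05,
              .2, .5,  .3,
              .1, .4,  .5), nrow = 3, byrow = TRUE)

# pi %*% P = pi  is equivalent to  (t(P) - I) %*% pi = 0.
# One of those equations is redundant, so replace it with sum(pi) = 1.
A <- t(P) - diag(3)
A[3, ] <- 1
steady <- solve(A, c(0, 0, 1))

round(steady, 7)
```

In exact terms the solution is 26/75, 29/75, and 20/75 for high, medium, and low inventory.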