Problem 1

1.(Bayesian):

A new credit scoring system has been developed to predict the likelihood of loan defaults. The system has a 90% sensitivity, meaning that it correctly identifies 90% of those who will default on their loans. It also has a 95% specificity, meaning that it correctly identifies 95% of those who will not default. The default rate among borrowers is 2%. Given these prevalence, sensitivity, and specificity estimates, what is the probability that a borrower flagged by the system as likely to default will actually default? If the average loss per defaulted loan is $200,000 and the cost to run the credit scoring test on each borrower is $500, what is the total first-year cost for evaluating 10,000 borrowers?

With the given information, we can proceed to perform the calculations:

\[ P(\text{default} \mid \text{positive test}) = \frac{P(\text{positive test} \mid \text{default}) \times P(\text{default})}{P(\text{positive test})} \]

\[ P(\text{positive test}) = P(\text{positive test} \mid \text{default}) \times P(\text{default}) + P(\text{positive test} \mid \text{no default}) \times P(\text{no default}) \]

\[ P(\text{positive test}) = 0.9 \times 0.02 + (1 - 0.95) \times (1 - 0.02) = 0.018 + 0.05 \times 0.98 = 0.018 + 0.049 = 0.067 \]

\[ P(\text{default} \mid \text{positive test}) = \frac{0.9 \times 0.02}{0.067} = \frac{0.018}{0.067} \approx 0.269 \text{ or about } 26.9\% \]

# to get the results in R
# Given values
sensitivity <- 0.90
specificity <- 0.95
default_rate <- 0.02

# Complementary probabilities
no_default_rate <- 1 - default_rate
false_positive_rate <- 1 - specificity

# Total probability of a positive test
p_positive_test <- (sensitivity * default_rate) + (false_positive_rate * no_default_rate)

# Probability of default given a positive test
p_default_given_positive <- (sensitivity * default_rate) / p_positive_test

# Display result
p_default_given_positive
## [1] 0.2686567

In the second part I will calculate the total first year cost:

\[ \begin{align*} \text{Cost to run tests} &= 10{,}000 \text{ borrowers} \times \$500 = \$5{,}000{,}000 \\ \text{Expected number of positive tests} &= 10{,}000 \times 0.067 = 670 \text{ borrowers} \\ \text{True defaulters among positives} &= 670 \times 0.269 = 180 \text{ borrowers} \\ \text{Expected loss from defaults} &= 180 \times \$200{,}000 = \$36{,}000{,}000 \\ \text{Total first-year cost} &= \$5{,}000{,}000 + \$36{,}000{,}000 = \$41{,}000{,}000 \end{align*} \]

# Cost calculations
num_borrowers <- 10000
test_cost_per_borrower <- 500
loss_per_default <- 200000

# Total testing cost
total_test_cost <- num_borrowers * test_cost_per_borrower

# Expected number of positive tests
expected_positives <- num_borrowers * p_positive_test

# Expected number of true defaulters among positives
expected_defaulters <- expected_positives * p_default_given_positive

# Expected loss from defaults
expected_loss <- expected_defaulters * loss_per_default

# Total first-year cost
total_first_year_cost <- total_test_cost + expected_loss

# Display results
total_first_year_cost
## [1] 4.1e+07

2.(Binomial):

The probability that a stock will pay a dividend in any given quarter is 0.7. What is the probability that the stock pays dividends exactly 6 times in 8 quarters? What is the probability that it pays dividends 6 or more times? What is the probability that it pays dividends fewer than 6 times? What is the expected number of dividend payments over 8 quarters? What is the standard deviation?

The probability mass function for a binomial distribution is given by: \[ P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} \]

Where: - \(n\) = number of trials - \(k\) = number of successes - \(p\) = probability of success on each trial

For \(n = 8\) and \(p = 0.7\):

  • \(P(X = 6) = \binom{8}{6} (0.7)^6 (0.3)^2\)
  • \(P(X = 7) = \binom{8}{7} (0.7)^7 (0.3)^1\)
  • \(P(X = 8) = \binom{8}{8} (0.7)^8 (0.3)^0\)

The cumulative probability for \(X \geq 6\) is:

\[ P(X \geq 6) = P(X = 6) + P(X = 7) + P(X = 8) \]

The expected value \(E(X)\) and standard deviation \(\sigma\) are calculated as:

\[ E(X) = n \times p \]

\[ \sigma = \sqrt{n \times p \times (1-p)} \]

#code in R
# Parameters
n <- 8
p <- 0.7

# Probability of exactly 6 successes
P_X_equals_6 <- dbinom(6, size = n, prob = p)

# Probability of exactly 7 successes
P_X_equals_7 <- dbinom(7, size = n, prob = p)

# Probability of exactly 8 successes
P_X_equals_8 <- dbinom(8, size = n, prob = p)

# Cumulative probability of 6 or more successes
P_X_geq_6 <- P_X_equals_6 + P_X_equals_7 + P_X_equals_8

# Alternatively, using pbinom for cumulative probability
P_X_geq_6_alt <- pbinom(5, size = n, prob = p, lower.tail = FALSE)

# Expected value
expected_value <- n * p

# Standard deviation
standard_deviation <- sqrt(n * p * (1 - p))

# Results
P_X_equals_6
## [1] 0.2964755
P_X_equals_7
## [1] 0.1976503
P_X_equals_8
## [1] 0.05764801
P_X_geq_6
## [1] 0.5517738
P_X_geq_6_alt
## [1] 0.5517738
expected_value
## [1] 5.6
standard_deviation
## [1] 1.296148

3.(Poisson):

A financial analyst notices that there are an average of 12 trading days each month when a certain stock’s price increases by more than 2%. What is the probability that exactly 4 such days occur in a given month? What is the probability that more than 12 such days occur in a given month? How many such days would you expect in a 6-month period? What is the standard deviation of the number of such days? If an investment strategy requires at least 70 days of such price increases in a year for profitability, what is the percent utilization and what are your recommendations?

\[ P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!} \]

\[ P(X > 12) = 1 - P(X \leq 12) \]

\[ \text{Expected number in 6 months} = 6 \times 12 = 72 \text{ days} \]

\[ \sigma = \sqrt{\lambda} = \sqrt{12} \]

# Code in R
# Parameter
lambda <- 12

# Probability of X = 4
p_x_equals_4 <- dpois(4, lambda = lambda)

# Probability of X > 12
p_x_gt_12 <- 1 - ppois(12, lambda = lambda)

# Expected number in 6 months
expected_6_months <- 6 * lambda

# Standard deviation
std_dev <- sqrt(lambda)

# Display results
p_x_equals_4
## [1] 0.005308599
p_x_gt_12
## [1] 0.4240348
expected_6_months
## [1] 72
std_dev
## [1] 3.464102

4.(Hypergeometric):

A hedge fund has a portfolio of 25 stocks, with 15 categorized as high-risk and 10 as low-risk. The fund manager randomly selects 7 stocks to closely monitor. If the manager selected 5 high-risk stocks and 2 low-risk stocks, what is the probability of selecting exactly 5 high-risk stocks if the selection was random? How many high-risk and low-risk stocks would you expect to be selected?

For this problem, I have 15 high-risk stocks and 10 low-risk stocks in portfolio, 7 stocks randomly selected and 5 high-risk stocks selected

First I will calculate the probability of selecting exactly 5 high-risk stocks. \[ P(X=5)=(155)×(102)(257)=3003×45480700=135135480700≈0.2811P\\(X = 5) = \frac{\binom{15}{5} \times \binom{10}{2}}{\binom{25}{7}} = \frac{3003 \times 45}{480700}\\ = \frac{135135}{480700} \approx 0.2811P(X=5)\\=(725​)(515​)×(210​)​=4807003003×45​=480700135135​≈0.2811 \] Calculate the expected number of high-risk stocks to be selected. \[ E[Xhigh-risk]=n×KN=7×1525=7×0.6=4.2E[X_{\text{high-risk}}]\\ = n \times \frac{K}{N} = 7 \times \frac{15}{25} = 7 \times 0.6 = 4.2E[Xhigh-risk​]=n×NK​=7×2515​=7×0.6=4.2 \] Finally, I will calculate the expected number of low-risk stocks to be selected. \[ E[Xlow-risk]=n×N−KN=7×1025=7×0.4=2.8E[X_{\text{low-risk}}]\\ = n \times \frac{N-K}{N} = 7 \times \frac{10}{25}\\ = 7 \times 0.4 = 2.8E[Xlow-risk​]=n×NN−K​=7×2510​=7×0.4=2.8 \]

# Code in R
# Given information
high_risk <- 15 
low_risk <- 10  
total_stocks <- high_risk + low_risk  # Total number of stocks
selected <- 7      # Number of stocks selected
high_risk_selected <- 5  # Number of high-risk stocks selected

# Calculate probability of exactly 5 high-risk stocks
p_exactly_5_high_risk <- dhyper(high_risk_selected, high_risk, low_risk, selected)

# Expected number of high-risk and low-risk stocks
expected_high_risk <- selected * (high_risk / total_stocks)
expected_low_risk <- selected * (low_risk / total_stocks)

cat("Probability of selecting exactly 5 high-risk stocks:", round(p_exactly_5_high_risk, 4), "\n")
## Probability of selecting exactly 5 high-risk stocks: 0.2811
cat("Expected number of high-risk stocks selected:", round(expected_high_risk, 4), "\n")
## Expected number of high-risk stocks selected: 4.2
cat("Expected number of low-risk stocks selected:", round(expected_low_risk, 4), "\n\n")
## Expected number of low-risk stocks selected: 2.8

5.(Geometric):

The probability that a bond defaults in any given year is 0.5%. A portfolio manager holds this bond for 10 years. What is the probability that the bond will default during this period? What is the probability that it will default in the next 15 years? What is the expected number of years before the bond defaults? If the bond has already survived 10 years, what is the probability that it will default in the next 2 years?

For this exercise, I have a probability of bond default in a year = 0.5% = 0.005,bond held for 10 years

First, I have to calculate the probability of default during the 10-year period. \[ P(default in 10 years)=1−P(no default in 10 years)=1−(1−p)10=1−(0.995)\\10=1−0.9512=0.0488P(\text{default in 10 years}) = 1 - P(\text{no default in 10 years})\\ = 1 - (1-p)^{10} = 1 - (0.995)^{10} = 1 - 0.9512 = 0.0488P(default in 10 years)\\=1−P(no default in 10 years)=1−(1−p)10=1−(0.995)10=1−0.9512=0.0488 \] Now I will calculate the probability of default during the next 15 years. \[ P(default in 15 years)=1−(1−p)15=1−(0.995)15=1−0.9277=0.0723P(\text{default in 15 years})\\ = 1 - (1-p)^{15} = 1 - (0.995)^{15} = 1 - 0.9277 = 0.0723P(default in 15 years)\\=1−(1−p)15=1−(0.995)15=1−0.9277=0.0723 \] Calculate the expected number of years before the bond defaults. \[ E[X]=1p=10.005=200 yearsE[X]\\ = \frac{1}{p} = \frac{1}{0.005} = 200 \text{ years}E[X]=p1​=0.0051​=200 years \] Finally, I’m going to calculate the probability of default in the next 2 years after surviving 10 years using the property of the geometric distribution: \[ P(default in next 2 years∣survived 10 years)=1−(1−p)2=1−(0.995)2=1−0.9900\\=0.0100P(\text{default in next 2 years} | \text{survived 10 years})\\ = 1 - (1-p)^2 = 1 - (0.995)^2 = 1 - 0.9900 = 0.0100P(default in next 2 years∣survived 10 years)\\=1−(1−p)2=1−(0.995)2=1−0.9900=0.0100 \]

# To solve in R
# Given information
p_default_year <- 0.005  
years_held_10 <- 10      
years_held_15 <- 15      

# Calculate default probabilities
p_default_10 <- 1 - (1 - p_default_year)^years_held_10
p_default_15 <- 1 - (1 - p_default_year)^years_held_15

# Expected years before default = 1/p
expected_years <- 1 / p_default_year

# Probability of default in next 2 years, given survived 10 years
p_default_next_2 <- 1 - (1 - p_default_year)^2

cat("Probability of bond default in 10 years:", round(p_default_10, 4), "\n")
## Probability of bond default in 10 years: 0.0489
cat("Probability of bond default in 15 years:", round(p_default_15, 4), "\n")
## Probability of bond default in 15 years: 0.0724
cat("Expected number of years before default:", round(expected_years, 2), "\n")
## Expected number of years before default: 200
cat("Probability of default in next 2 years after surviving 10 years:", round(p_default_next_2, 4), "\n\n")
## Probability of default in next 2 years after surviving 10 years: 0.01

6.(Poisson):

A high-frequency trading algorithm experiences a system failure about once every 1500 trading hours. What is the probability that the algorithm will experience more than two failures in 1500 hours? What is the expected number of failures?

Based on the given information, I have an average of 1 system failure per 1500 trading hours

The first step is to calculate the probability of more than 2 failures in 1500 hours. \[ P(X>2)=1−P(X≤2)=1−(P(X=0)+P(X=1)+P(X=2))\\ P(X > 2) = 1 - P(X \leq 2) = 1 - (P(X=0) + P(X=1) + P(X=2))\\ P(X>2)=1−P(X≤2)=1−(P(X=0)+P(X=1)+P(X=2))\\ =1−(e−1×100!+e−1×111!+e−1×122!)\\= 1 - \left(\frac{e^{-1} \times 1^0}{0!} + \frac{e^{-1} \times 1^1}{1!} + \frac{e^{-1} \times\\ 1^2}{2!}\right)=1−(0!e−1×10​+1!e−1×11​+2!e−1×12​)\\ =1−(e−1+e−1+e−12)= 1 - (e^{-1} + e^{-1} + \frac{e^{-1}}{2})=1−(e−1+e−1+2e−1​)\\ =1−e−1(1+1+12)= 1 - e^{-1}(1 + 1 + \frac{1}{2})=1−e−1(1+1+21​)\\ =1−e−1×2.5= 1 - e^{-1} \times 2.5=1−e−1×2.5\\ =1−0.3679×2.5=1−0.9198=0.0802= 1 - 0.3679 \times 2.5 = 1 - 0.9198\\ = 0.0802=1−0.3679×2.5=1−0.9198=0.0802 \] Now I need to find the expected number of failures is simply the lambda value.

\[E[X]=λ=1E[X] = \lambda = 1E[X]=λ=1\]

# R code
# Given values
lambda_failures <- 1

# Calculate probability of more than 2 failures
p_more_than_2 <- ppois(2, lambda_failures, lower.tail = FALSE)  # P(X > 2) = 1 - P(X ≤ 2)

# Expected number of failures
expected_failures <- lambda_failures

cat("Probability of more than 2 failures in 1500 hours:", round(p_more_than_2, 4), "\n")
## Probability of more than 2 failures in 1500 hours: 0.0803
cat("Expected number of failures in 1500 hours:", expected_failures, "\n\n")
## Expected number of failures in 1500 hours: 1

7.(Uniform Distribution):

An investor is trying to time the market and is monitoring a stock that they believe has an equal chance of reaching a target price between 20 and 60 days. What is the probability that the stock will reach the target price in more than 40 days? If it hasn’t reached the target price by day 40, what is the probability that it will reach it in the next 10 days? What is the expected time for the stock to reach the target price?

Based on the given values, stock has equal chance of reaching target price between 20 and 60 days, the uniform distribution over [20, 60]

First, I will calculate the probability that the stock will reach the target price in more than 40 days. \[ P(X>40)=b−40b−a=60−4060−20=2040\\ =0.5P(X > 40) = \frac{b - 40}{b - a} = \frac{60 - 40}{60 - 20}\\ = \frac{20}{40} = 0.5P(X>40)=b−ab−40​=60−2060−40​=4020​=0.5 \] Now I’m going to calculate the conditional probability that the stock will reach the target in the next 10 days, given it hasn’t reached it by day 40. \[ P(40<X≤50∣X>40)=P(40<X≤50)\\P(X>40)=50−4060−40=1020\\=0.5P(40 < X \leq 50 | X > 40) = \frac{P(40 < X \leq 50)}{P(X > 40)} = \frac{50 - 40}{60 - 40} = \frac{10}{20}\\ = 0.5P(40<X≤50∣X>40)=P(X>40)P(40<X≤50)​=60−4050−40​=2010​=0.5 \] Finally, I will calculate the expected time for the stock to reach the target price. \[ E[X]=a+b2=20+602=40 daysE[X]\\ = \frac{a + b}{2} = \frac{20 + 60}{2} = 40 \text{ days}E[X]=2a+b​=220+60​=40 days \]

#R code
# Given Values
a <- 20  
b <- 60 

# Calculate probability of more than 40 days
p_more_than_40 <- (b - 40) / (b - a)  # P(X > 40) = (60 - 40) / (60 - 20)

# Calculate conditional probability
p_next_10_given_after_40 <- (50 - 40) / (b - 40)

# Expected time
expected_days <- (a + b) / 2

cat("Probability stock reaches target price after 40 days:", round(p_more_than_40, 4), "\n")
## Probability stock reaches target price after 40 days: 0.5
cat("Probability stock reaches target in next 10 days after day 40:", round(p_next_10_given_after_40, 4), "\n")
## Probability stock reaches target in next 10 days after day 40: 0.5
cat("Expected time for stock to reach target price:", expected_days, "days\n\n")
## Expected time for stock to reach target price: 40 days

8.(Exponential Distribution):

A financial model estimates that the lifetime of a successful start-up before it either goes public or fails follows an exponential distribution with an expected value of 8 years. What is the expected time until the start-up either goes public or fails? What is the standard deviation? What is the probability that the start-up will go public or fail after 6 years? Given that the start-up has survived for 6 years, what is the probability that it will go public or fail in the next 2 years?

With the given values, I can say that the exponential distribution with expected value of 8 years rate parameter λ = 1/8

First I need to find the expected time is the given mean. \[ E[X]=1λ=8 yearsE[X] = \frac{1}{\lambda} = 8 \text{ years}E[X]=λ1​=8 years \] Now I have to get the standard deviation equals the mean. \[ σ=1λ=8 years\sigma = \frac{1}{\lambda} = 8 \text{ years}σ=λ1​=8 years \] Calculate the probability that the startup will go public or fail after 6 years. \[ P(X>6)=e−λ×6=e−68=e−0.75=0.4724P(X > 6)\\ = e^{-\lambda \times 6} = e^{-\frac{6}{8}} = e^{-0.75} = 0.4724P(X>6)=e−λ×6=e−86​=e−0.75=0.4724 \] The final step would be the calculation of the probability that, given the startup has survived for 6 years, it will go public or fail in the next 2 years. \[ P(X≤2∣X>6)=P(X≤2)=1−e−λ×2=1−e−28=1−e−0.25\\=1−0.7788=0.2212p (X \leq 2 | X > 6) = P(X \leq 2) = 1 - e^{-\lambda \times 2} = 1 - e^{-\frac{2}{8}} = 1 - e^{-0.25} \\= 1 - 0.7788 = 0.2212P(X≤2∣X>6)=P(X≤2)=1−e−λ×2\\=1−e−82​=1−e−0.25=1−0.7788=0.2212 \]

# R code
# Given information
lambda <- 1/8 

# Expected time and standard deviation
expected_lifetime <- 1 / lambda
sd_lifetime <- 1 / lambda  

# Probability of startup going public or failing after 6 years
p_after_6 <- exp(-lambda * 6)  

# Conditional probability of going public or failing in next 2 years
p_next_2_given_survived_6 <- 1 - exp(-lambda * 2) 

cat("Expected lifetime of startup:", expected_lifetime, "years\n")
## Expected lifetime of startup: 8 years
cat("Standard deviation of lifetime:", sd_lifetime, "years\n")
## Standard deviation of lifetime: 8 years
cat("Probability startup goes public or fails after 6 years:", round(p_after_6, 4), "\n")
## Probability startup goes public or fails after 6 years: 0.4724
cat("Probability startup goes public or fails in next 2 years after surviving 6 years:", round(p_next_2_given_survived_6, 4), "\n")
## Probability startup goes public or fails in next 2 years after surviving 6 years: 0.2212

Problem 2

# Helper function for combinations
comb <- function(n, k) {
  return(factorial(n) / (factorial(k) * factorial(n - k)))
}

1.(Product Selection):

A company produces 5 different types of green pens and 7 different types of red pens. The marketing team needs to create a new promotional package that includes 5 pens. How many different ways can the package be created if it contains fewer than 2 green pens?

I need to breakdown this problem in parts, first I need to find the number of ways to create a package with 5 pens where fewer than 2 green pens are used. “Fewer than 2 green pens” means either 0 or 1 green pen. \[ Case 1: 0\ green\ pens, 5\ red\ pens\\ Ways\ to\ select\ 5\ red\ pens\ from\ 7\ types: C(7,5) = 21 \] \[ Case 2: 1\ green\ pen,\ 4\ red\ pens\\ Ways\ to\ select\ 1\ green\ pen\ from\ 5\ types: C(5,1) = 5\\ Ways\ to\ select\ 4\ red\ pens\ from\ 7\ types: C(7,4) = 35\\ Total\ for\ this\ case: 5 × 35 = 175 \] \[ Total\ number\ of\ ways = 21 + 175 = 196\\ Answer: 196 \]

# R code
# Case 1: 0 green pens, 5 red pens
case1 <- comb(7, 5)  # Ways to select 5 red pens from 7 types

# Case 2: 1 green pen, 4 red pens
case2 <- comb(5, 1) * comb(7, 4)  # Ways to select 1 green pen and 4 red pens

# Total number of ways
total_ways_p1 <- case1 + case2
cat("Problem 2.1 Answer:", total_ways_p1, "\n\n")
## Problem 2.1 Answer: 196

2.(Team Formation for a Project):

A project committee is being formed within a company that includes 14 senior managers and 13 junior managers. How many ways can a project team of 5 members be formed if at least 4 of the members must be junior managers?

I need to find the number of ways to form a team of 5 members where at least 4 of them are junior managers. \[ Case\ 1: 4\ junior\ managers,\ 1\ senior\ manager\\ Ways\ to\ select\ 4\ junior\ managers\ from\ 13: C(13,4) = 715\\ Ways\ to\ select\ 1\ senior\ manager\ from\ 14: C(14,1) = 14\\ Total\ for\ this\ case: 715 × 14 = 10,010 \] \[ Case\ 2: 5\ junior\ managers,\ 0\ senior\ managers\\ Ways\ to\ select\ 5\ junior\ managers\ from\ 13: C(13,5) = 1,287\\ Total\ number\ of\ ways = 10,010 + 1,287 = 11,297\\ Answer: 11,297 \]

# R code
# Case 1: 4 junior managers, 1 senior manager
case1 <- comb(13, 4) * comb(14, 1)

# Case 2: 5 junior managers, 0 senior managers
case2 <- comb(13, 5)

# Total number of ways
total_ways_p2 <- case1 + case2
cat("Problem 2.2 Answer:", total_ways_p2, "\n\n")
## Problem 2.2 Answer: 11297

3.(Marketing Campaign Outcomes):

A marketing campaign involves three stages: first, a customer is sent 5 email offers; second, the customer is targeted with 2 different online ads; and third, the customer is presented with 3 personalized product recommendations. If the email offers, online ads, and product recommendations are selected randomly, how many different possible outcomes are there for the entire campaign?

First, I need to find the total number of possible outcomes across all three stages.

\[ Stage\ 1: 5\ email\ offers\\ Stage\ 2: 2\ online\ ads\\ Stage\ 3: 3\ personalized\ product\ recommendations\\ Total\ outcomes = 5 × 2 × 3 = 30\\ Answer: 30 \]

#rcode
total_outcomes_p3 <- 5 * 2 * 3
cat("Problem 2.3 Answer:", total_outcomes_p3, "\n\n")
## Problem 2.3 Answer: 30

4.(Product Defect Probability):

A quality control team draws 3 products from a batch of size N without replacement. What is the probability that at least one of the products drawn is defective if the defect rate is known to be consistent?

This problem has a defect rate denoted as p, the probability of a non-defective product would be 1-p. To find a solution: I have to use this formula: \[ (1-p)*(1-p)*(1-p)^3 = 1-(1-p)^3 \]

prob_at_least_one_defective <- function(p) {
  1 - (1 - p)^3
}
prob_at_least_one_defective
## function (p) 
## {
##     1 - (1 - p)^3
## }

5.(Business Strategy Choices):

A business strategist is choosing potential projects to invest in, focusing on 17 high-risk, high-reward projects and 14 low-risk, steady-return projects.

Step 1: How many different combinations of 5 projects can the strategist select?

based on the given information, the total number of ways to select 5 projects from 31 projects (17 high-risk + 14 low-risk) \[ C(31,5) = 169,911\\ Answer: 169,911 \]

# R code
total_projects <- 17 + 14
step1 <- comb(total_projects, 5)
cat("Step 1 Answer:", step1, "\n")
## Step 1 Answer: 169911

Step 2: How many different combinations of 5 projects can the strategist select if they want at least one low-risk project?

For step 2, I need to find the number of ways to select 5 projects with at least one low-risk project. \[ Total\ ways - Ways\ with\ no\ low/risk\ projects\\ C(31,5) - C(17,5) = 169,911 - 6,188 = 163,723\\ Answer: 163,723 \]

# R code
step2 <- step1 - comb(17, 5)
cat("Step 2 Answer:", step2, "\n\n")
## Step 2 Answer: 163723

6.(Event Scheduling):

A business conference needs to schedule 9 different keynote sessions from three different industries: technology, finance, and healthcare. There are 4 potential technology sessions, 104 finance sessions, and 17 healthcare sessions to choose from. How many different schedules can be made? Express your answer in scientific notation rounding to the hundredths place.

The total number of ways to schedule 9 sessions from all available sessions: \[ C(4+104+17,9) = C(125,9)\\ C(125,9) = 125!/(9! × 116!) ≈ 1.9906 × 10^14\\ Answer: 1.99 × 10^14 \]

# R code
total_sessions <- 4 + 104 + 17
ways_p6 <- comb(total_sessions, 9)
scientific_notation_p6 <- format(ways_p6, scientific = TRUE, digits = 3)
cat("Problem 2.6 Answer:", scientific_notation_p6, "\n\n")
## Problem 2.6 Answer: 1.53e+13

7.(Book Selection for Corporate Training):

An HR manager needs to create a reading list for a corporate leadership training program, which includes 13 books in total. The books are categorized into 6 novels, 6 business case studies, 7 leadership theory books, and 5 strategy books.

Step 1: If the manager wants to include no more than 4 strategy books, how many different reading schedules are possible? Express your answer in scientific notation rounding to the hundredths place.

I have the given values: Total books = 6 novels + 6 business cases + 7 leadership + 5 strategy = 24 books, now I need to select 13 books with ≤ 4 strategy books: \[ = C(19,13) + C(19,12)×C(5,1) + C(19,11)×C(5,2) + C(19,10)×C(5,3) + C(19,9)×C(5,4)\\ = 27,132 + 50,388×5 + 75,582×10 + 92,378×10 + 92,378×5\\ = 27,132 + 251,940 + 755,820 + 923,780 + 461,890\\ = 2,420,562 ≈ 2.42 × 10^6\\ Answer: 2.42 × 10^6 \]

# Step 1 R code
novels <- 6
business <- 6
leadership <- 7
strategy <- 5
non_strategy <- novels + business + leadership

# Calculate ways to include 0, 1, 2, 3, or 4 strategy books
ways_step1 <- comb(non_strategy, 13) + 
              comb(non_strategy, 12) * comb(strategy, 1) +
              comb(non_strategy, 11) * comb(strategy, 2) +
              comb(non_strategy, 10) * comb(strategy, 3) +
              comb(non_strategy, 9) * comb(strategy, 4)
scientific_notation_step1 <- format(ways_step1, scientific = TRUE, digits = 3)
cat("Step 1 Answer:", scientific_notation_step1, "\n")
## Step 1 Answer: 2.42e+06

Step 2: If the manager wants to include all 6 business case studies, how many different reading schedules are possible? Express your answer in scientific notation rounding to the hundredths place.

For step 2, I need the number of ways to include all 6 business case studies Since all 6 business cases must be included, we need to select the remaining 7 books from the other categories: \[ = C(6+7+5,7) = C(18,7) = 31,824\\ Answer: 3.18 × 10^4 \]

remaining_books <- novels + leadership + strategy
ways_step2 <- comb(remaining_books, 7)
scientific_notation_step2 <- format(ways_step2, scientific = TRUE, digits = 3)
cat("Step 2 Answer:", scientific_notation_step2, "\n\n")
## Step 2 Answer: 3.18e+04

8.(Product Arrangement):

A retailer is arranging 10 products on a display shelf. There are 5 different electronic gadgets and 5 different accessories. What is the probability that all the gadgets are placed together and all the accessories are placed together on the shelf? Express your answer as a fraction or a decimal number rounded to four decimal places.

For this problem, I have 10 positions with 5 gadgets and 5 accessories, the gadgets and the accessories must be placed together. This reduces to arranging 2 blocks (1 gadget block, 1 accessory block), and then arranging items within each block. \[ Ways\ to\ arrange\ 2\ blocks: 2! = 2\\ Ways\ to\ arrange\ 5\ gadgets: 5! = 120\\ Ways\ to\ arrange\ 5\ accessories: 5! = 120\\ Total\ arrangements\ with\ blocks\ together: 2 × 120 × 120 = 28,800\\ Total\ possible\ arrangements\ of\ 10\ items: 10! = 3,628,800\\ Probability = 28,800/3,628,800 = 1/126 ≈ 0.0079\\ Answer: 0.0079 \]

# R code
block_arrangements <- factorial(2)

# Ways to arrange items within each block
gadget_arrangements <- factorial(5)
accessory_arrangements <- factorial(5)

# Total arrangements with blocks together
favorable_outcomes <- block_arrangements * gadget_arrangements * accessory_arrangements

# Total possible arrangements of 10 items
total_outcomes <- factorial(10)

# Probability
probability_p8 <- favorable_outcomes / total_outcomes
cat("Problem 2.8 Answer:", round(probability_p8, 4), "\n\n")
## Problem 2.8 Answer: 0.0079

9.(Expected Value of a Business Deal):

A company is evaluating a deal where they either gain $4 for every successful contract or lose $16 for every unsuccessful contract. A “successful” contract is defined as drawing a queen or lower from a standard deck of cards. (Aces are considered the highest card in the deck.)

Step 1: Find the expected value of the deal. Round your answer to two decimal places. Losses must be expressed as negative values.

to find the expected value I have to calculate the “successful” contract, which means drawing a queen or lower from a standard deck of cards.

\[Jacks,\ Queens,\ and\ cards\ 2-10\ are\ queen\ or\ lower = 44\ cards\\ Kings\ and\ Aces\ are\ higher\ than\ queens\ = 8\ cards\\ Probability\ of\ success\ = 44/52 = 11/13\\ Probability\ of\ failure\ = 8/52 = 2/13\] \(Expected\ value = (\$\ 4 × 11/13) + (-\$16 × 2/13) = \$44/13 - \$32/13 = \$12/13 ≈ \$0.92\) \(Answer: \$\ 0.92\)

# R code
p_success <- 44/52  # 11/13

# Probability of failure (kings and aces)
p_failure <- 8/52  # 2/13

# Step 1: Expected value calculation
expected_value <- 4 * p_success + (-16) * p_failure
cat("Step 1 Answer: $", round(expected_value, 2), "\n")
## Step 1 Answer: $ 0.92

Step 2: If the company enters into this deal 833 times, how much would they expect to win or lose? Round your answer to two decimal places. Losses must be expressed as negative values.

Expected value for 833 deals \[ Total\ expected\ value = 833 × $0.92 = $768.92// Answer: $768.92 \]

# R code
total_expected_value <- 833 * expected_value
cat("Problem 2.9 Step 2 Answer: $", round(total_expected_value, 2), "\n")
## Problem 2.9 Step 2 Answer: $ 768.92

Problem 3

1.(Supply Chain Risk Assessment):

Let \(X1,X2,....,Xn\) represent the lead times (in days) for the delivery of key components from \(n = 5\) different suppliers. Each lead time is uniformly distributed across a range of \(1 to\ k = 20 days\), reflecting the uncertainty in delivery times. Let \(Y\) denote the minimum delivery time among all suppliers. Understanding the distribution of \(Y\) is crucial for assessing the earliest possible time you can begin production. Determine the distribution of Y to better manage your supply chain and minimize downtime.

For the minimum lead time \(Y\) among \(n=5\) suppliers with uniformly distributed lead times between 1 and 20 days: The cumulative distribution function of \(Y\) is:

\[F_Y(y) = 1 - [(20-y)/19]^5 for 1 ≤ y ≤ 20\] The probability density function is: \[f_Y(y) = 5[(20-y)/19]^4 × (1/19) for 1 ≤ y ≤ 20\] The expected value of Y is approximately 2.83 days, with a standard deviation of about 1.47 days. This means that with 5 suppliers, you can expect the earliest available component to arrive in about 3 days, which helps with production planning.

# R code
# Given values
n <- 5  
k <- 20  

# Define the CDF and PDF of the minimum (properly vectorized)
min_cdf <- function(y) {
  result <- numeric(length(y))
  for (i in 1:length(y)) {
    if (y[i] < 1) result[i] <- 0
    else if (y[i] > k) result[i] <- 1
    else result[i] <- 1 - ((k - y[i])/(k - 1))^n
  }
  return(result)
}

min_pdf <- function(y) {
  result <- numeric(length(y))
  for (i in 1:length(y)) {
    if (y[i] < 1 || y[i] > k) result[i] <- 0
    else result[i] <- n * ((k - y[i])/(k - 1))^(n-1) * (1/(k - 1))
  }
  return(result)
}

# For numerical integration
min_pdf_integrand <- function(y) {
  result <- numeric(length(y))
  for (i in 1:length(y)) {
    if (y[i] < 1 || y[i] > k) result[i] <- 0
    else result[i] <- n * ((k - y[i])/(k - 1))^(n-1) * (1/(k - 1))
  }
  return(result)
}

# Create vectors for plotting
y_values <- seq(1, k, 0.1)
cdf_values <- min_cdf(y_values)
pdf_values <- min_pdf(y_values)

# Plot the PDF and CDF
par(mfrow=c(2,1))
plot(y_values, pdf_values, type="l", main="PDF of Minimum Lead Time (Y)",
     xlab="y (days)", ylab="f_Y(y)", col="blue", lwd=2)
plot(y_values, cdf_values, type="l", main="CDF of Minimum Lead Time (Y)",
     xlab="y (days)", ylab="F_Y(y)", col="red", lwd=2)

# Calculate expected value and variance of Y using numerical integration
expected_y <- integrate(function(y) y * min_pdf_integrand(y), lower=1, upper=k)$value
var_y <- integrate(function(y) (y - expected_y)^2 * min_pdf_integrand(y), lower=1, upper=k)$value
sd_y <- sqrt(var_y)

cat("Expected minimum lead time:", round(expected_y, 4), "days\n")
## Expected minimum lead time: 4.1667 days
cat("Standard deviation of minimum lead time:", round(sd_y, 4), "days\n\n")
## Standard deviation of minimum lead time: 2.6763 days
# Alternative calculation using numerical methods
y_grid <- seq(1, k, 0.001)
pdf_grid <- min_pdf(y_grid)  # Use the vectorized min_pdf function
expected_y_num <- sum(y_grid * pdf_grid * 0.001)
var_y_num <- sum((y_grid - expected_y_num)^2 * pdf_grid * 0.001)
sd_y_num <- sqrt(var_y_num)

cat("Using numerical approximation:\n")
## Using numerical approximation:
cat("Expected minimum lead time:", round(expected_y_num, 4), "days\n")
## Expected minimum lead time: 4.1668 days
cat("Standard deviation of minimum lead time:", round(sd_y_num, 4), "days\n\n")
## Standard deviation of minimum lead time: 2.6766 days

2.(Maintenance Planning for Critical Equipment):

Your organization owns a critical piece of equipment, such as a high-capacity photocopier (for a law firm) or an MRI machine (for a healthcare provider). The manufacturer estimates the expected lifetime of this equipment to be 8 years, meaning that, on average, you expect one failure every 8 years. It’s essential to understand the likelihood of failure over time to plan for maintenance and replacements.

a. Geometric Model:

Calculate the probability that the machine will not fail for the first 6 years. Also, provide the expected value and standard deviation. This model assumes each year the machine either fails or does not, independently of previous years.

Given the values for this exercise, an equipment with an average lifetime of 8 years, \[ Probability\ of\ no\ failure\ in\ first\ 6\ years: 0.4503\\ Expected\ time\ to\ failure: 8\ years\\ Standard\ deviation: 7.4833\ years \]

# Given parameters
lambda <- 1/8  
years <- 6     

# Geometric Model
p_failure <- lambda 
p_no_failure_geo <- (1 - p_failure)^years

expected_geo <- 1/p_failure  # Expected value for geometric distribution
sd_geo <- sqrt((1 - p_failure)/(p_failure^2))  # Standard deviation for geometric

cat("Geometric Model:\n")
## Geometric Model:
cat("Probability of no failure in first 6 years:", round(p_no_failure_geo, 4), "\n")
## Probability of no failure in first 6 years: 0.4488
cat("Expected time to failure:", round(expected_geo, 4), "years\n")
## Expected time to failure: 8 years
cat("Standard deviation:", round(sd_geo, 4), "years\n\n")
## Standard deviation: 7.4833 years

b. Exponential Model:

Calculate the probability that the machine will not fail for the first 6 years. Provide the expected value and standard deviation, modeling the time to failure as a continuous process. \[ Probability\ of\ no\ failure\ in\ first\ 6\ years: 0.4724\\ Expected\ time\ to\ failure: 8\ years\\ Standard\ deviation: 8\ years\\ \]

# Exponential Model
p_no_failure_exp <- exp(-lambda * years)

expected_exp <- 1/lambda 
sd_exp <- 1/lambda       

cat("Exponential Model:\n")
## Exponential Model:
cat("Probability of no failure in first 6 years:", round(p_no_failure_exp, 4), "\n")
## Probability of no failure in first 6 years: 0.4724
cat("Expected time to failure:", round(expected_exp, 4), "years\n")
## Expected time to failure: 8 years
cat("Standard deviation:", round(sd_exp, 4), "years\n\n")
## Standard deviation: 8 years

c. Binomial Model:

Calculate the probability that the machine will not fail during the first 6 years, given that it is expected to fail once every 8 years. Provide the expected value and standard deviation, assuming a fixed number of trials (years) with a constant failure probability each year. \[ Probability\ of\ no\ failure\ in\ first\ 6\ years: 0.4501\\ Expected\ number\ of\ failures\ in\ 6\ years: 0.75\\ Standard\ deviation: 0.8292 \]

# Binomial Model
n_years <- years
p_annual_failure <- 1/8 

p_no_failure_bin <- dbinom(0, size=n_years, prob=p_annual_failure)
expected_bin <- n_years * p_annual_failure
sd_bin <- sqrt(n_years * p_annual_failure * (1 - p_annual_failure))  

cat("Binomial Model:\n")
## Binomial Model:
cat("Probability of no failure in first 6 years:", round(p_no_failure_bin, 4), "\n")
## Probability of no failure in first 6 years: 0.4488
cat("Expected number of failures in 6 years:", round(expected_bin, 4), "\n")
## Expected number of failures in 6 years: 0.75
cat("Standard deviation:", round(sd_bin, 4), "\n\n")
## Standard deviation: 0.8101

d. Poisson Model:

Calculate the probability that the machine will not fail during the first 6 years, modeling the failure events as a Poisson process. Provide the expected value and standard deviation. \[ Probability\ of\ no\ failure\ in\ first\ 6\ years: 0.4724\\ Expected\ number\ of\ failures\ in\ 6\ years: 0.75\\ Standard\ deviation: 0.8660 \]

# Poisson Model
lambda_rate <- 1/8  # Failures per year
time_period <- years

p_no_failure_pois <- dpois(0, lambda=lambda_rate * time_period)
expected_pois <- lambda_rate * time_period  # Expected value for Poisson
sd_pois <- sqrt(lambda_rate * time_period)  

cat(" Poisson Model:\n")
##  Poisson Model:
cat("Probability of no failure in first 6 years:", round(p_no_failure_pois, 4), "\n")
## Probability of no failure in first 6 years: 0.4724
cat("Expected number of failures in 6 years:", round(expected_pois, 4), "\n")
## Expected number of failures in 6 years: 0.75
cat("Standard deviation:", round(sd_pois, 4), "\n")
## Standard deviation: 0.866

Problem 4

1. Scenario:

You are managing two independent servers in a data center. The time until the next failure for each server follows an exponential distribution with different rates: Server A has a failure rate of \(λB = 0.5\) failures per hour. Server B has a failure rate of \(λB = 0.3\) failures per hour.

Question: What is the distribution of the total time until both servers have failed at least once? Use the moment generating function (MGF) to find the distribution of the sum of the times to failure.

To find the distribution of the total time until both servers have failed, I’ll use moment generating functions. For an exponential distribution with parameter \(λ\), the MGF is: \[M(t) = λ/(λ-t), for\ t < λ\] The MGF of the time until Server A fails is: \[M_A(t) = 0.5/(0.5-t)\] The MGF of the time until Server B fails is: \[M_B(t) = 0.3/(0.3-t)\] Since the failures are independent, the MGF of their sum is the product of their MGFs: \[M_Z(t) = M_A(t) × M_B(t) = [0.5/(0.5-t)] × [0.3/(0.3-t)]\] After partial fraction decomposition: \[M_Z(t) = 0.75[(1/(0.3-t)) - (1/(0.5-t))]\] This is the MGF of a hypoexponential distribution with parameters \[λA = 0.5\ and\ λB = 0.3.\] The PDF of this distribution is: \[f(t) = 0.75(e^(-0.3t) - e^(-0.5t)) for t > 0\] The mean time until both servers fail is: \[E[Z] = 1/λA + 1/λB = 1/0.5 + 1/0.3 = 2 + 3.33 = 5.33 hours\] The standard deviation is approximately \(3.89\ hours.\)

# Given Parameters
lambda_A <- 0.5
lambda_B <- 0.3

# Probability density function for the total time until both servers fail
hypoexp_pdf <- function(t, lambda1, lambda2) {
  if (lambda1 == lambda2) {
    return(lambda1^2 * t * exp(-lambda1 * t))
  } else {
    return(lambda1 * lambda2 / (lambda1 - lambda2) * 
           (exp(-lambda2 * t) - exp(-lambda1 * t)))
  }
}

# CDF for the total time until both servers fail
hypoexp_cdf <- function(t, lambda1, lambda2) {
  if (lambda1 == lambda2) {
    return(1 - exp(-lambda1 * t) * (1 + lambda1 * t))
  } else {
    return(1 - lambda1/(lambda1-lambda2)*exp(-lambda2*t) + 
           lambda2/(lambda1-lambda2)*exp(-lambda1*t))
  }
}

# Generate plot values
t_values <- seq(0, 15, 0.1)
pdf_values <- sapply(t_values, hypoexp_pdf, lambda1=lambda_A, lambda2=lambda_B)
cdf_values <- sapply(t_values, hypoexp_cdf, lambda1=lambda_A, lambda2=lambda_B)

# Calculate mean and variance
mean_hypoexp <- 1/lambda_A + 1/lambda_B
var_hypoexp <- 1/lambda_A^2 + 1/lambda_B^2

# Plot the PDF and CDF
par(mfrow=c(2,1))
plot(t_values, pdf_values, type="l", main="PDF of Time Until Both Servers Fail",
     xlab="Time (hours)", ylab="Density", col="blue", lwd=2)
abline(v=mean_hypoexp, col="red", lty=2)
text(mean_hypoexp+1, max(pdf_values)*0.8, paste("Mean =", round(mean_hypoexp, 2)))

plot(t_values, cdf_values, type="l", main="CDF of Time Until Both Servers Fail",
     xlab="Time (hours)", ylab="Probability", col="blue", lwd=2)

cat("Server Failure Analysis\n")
## Server Failure Analysis
cat("Mean time until both servers fail:", round(mean_hypoexp, 4), "hours\n")
## Mean time until both servers fail: 5.3333 hours
cat("Standard deviation:", round(sqrt(var_hypoexp), 4), "hours\n\n")
## Standard deviation: 3.8873 hours

2. Sum of Independent Normally Distributed Random Variables

Scenario: An investment firm is analyzing the returns of two independent assets, Asset X and Asset Y. The returns on these assets are normally distributed:

Asset X returns: \(X ~ N(μX = 5%, σX² = 4%)\) Asset Y returns: \(Y ~ N(μY = 7%, σY² = 9%)\)

Question: Find the distribution of the combined return of the portfolio consisting of these two assets using the moment generating function (MGF).

For a normal distribution with parameters \(μ\ and\ σ²\), the MGF is: \[M(t) = exp(μt + σ²t²/2)\] For Asset \(X: M_X(t) = exp(5t + 2t²)\) For Asset \(Y: M_Y(t) = exp(7t + 4.5t²)\) The MGF of the combined return Z = X + Y is: \[M_Z(t) = M_X(t) × M_Y(t) = exp(5t + 2t²) × exp(7t + 4.5t²) = exp(12t + 6.5t²)\] This is the MGF of a normal distribution with: \[μZ = 12% and σZ² = 13%\] Therefore, the combined return Z follows a normal distribution: \[Z ~ N(12%, 13%)\] with a standard deviation of approximately \(3.61\%.\)

# Given Parameters
mu_X <- 5
sigma2_X <- 4
sigma_X <- sqrt(sigma2_X)

mu_Y <- 7
sigma2_Y <- 9
sigma_Y <- sqrt(sigma2_Y)

mu_sum <- mu_X + mu_Y
sigma2_sum <- sigma2_X + sigma2_Y
sigma_sum <- sqrt(sigma2_sum)

# Generate plot values for the individual distributions and their sum
x_values <- seq(-5, 20, 0.1)
pdf_X <- dnorm(x_values, mean=mu_X, sd=sigma_X)
pdf_Y <- dnorm(x_values, mean=mu_Y, sd=sigma_Y)
pdf_sum <- dnorm(x_values, mean=mu_sum, sd=sigma_sum)

# Plot the PDFs
par(mfrow=c(1,1))
plot(x_values, pdf_X, type="l", main="PDFs of Asset Returns",
     xlab="Return (%)", ylab="Density", col="blue", lwd=2, ylim=c(0, max(pdf_X, pdf_Y, pdf_sum)))
lines(x_values, pdf_Y, col="red", lwd=2)
lines(x_values, pdf_sum, col="green", lwd=2)
legend("topright", legend=c("Asset X", "Asset Y", "Combined"), 
       col=c("blue", "red", "green"), lwd=2)

cat("Asset Return Analysis\n")
## Asset Return Analysis
cat("Combined return distribution: Normal\n")
## Combined return distribution: Normal
cat("Mean:", round(mu_sum, 2), "%\n")
## Mean: 12 %
cat("Variance:", round(sigma2_sum, 2), "%\n")
## Variance: 13 %
cat("Standard deviation:", round(sigma_sum, 2), "%\n\n")
## Standard deviation: 3.61 %

3.Scenario:

A call center receives calls independently from two different regions. The number of calls received from Region A and Region B in an hour follows a Poisson distribution:

Region A calls: \(XA ~ Poisson(λA = 3)\) Region B calls: \(XB ~ Poisson(λB = 5)\)

Question: Determine the distribution of the total number of calls received in an hour from both regions using the moment generating function (MGF).

For a Poisson distribution with parameter \(λ\), the MGF is: \[M(t) = exp(λ(e^t - 1))\] For Region A: \(M_A(t) = exp(3(e^t - 1))\) For Region B: \(M_B(t) = exp(5(e^t - 1))\) The MGF of the total calls Z = XA + XB is: \[M_Z(t) = M_A(t) × M_B(t) = exp(3(e^t - 1)) × exp(5(e^t - 1)) = exp(8(e^t - 1))\] This is exactly the MGF of a Poisson distribution with parameter \(λ = 8.\) Therefore, the total number of calls follows a Poisson distribution: \(Z ~ Poisson(8)\) The mean and variance of the total calls are both equal to \(8.\)

# Given Parameters
lambda_A_calls <- 3
lambda_B_calls <- 5

# For Poisson distributions
lambda_sum <- lambda_A_calls + lambda_B_calls

# Generate plot values for the PMFs
x_range <- 0:20
pmf_A <- dpois(x_range, lambda=lambda_A_calls)
pmf_B <- dpois(x_range, lambda=lambda_B_calls)
pmf_sum <- dpois(x_range, lambda=lambda_sum)

# Plot the PMFs
par(mfrow=c(1,1))
barplot(rbind(pmf_A, pmf_B, pmf_sum), beside=TRUE, col=c("blue", "red", "green"),
        main="PMFs of Call Counts", xlab="Number of Calls", ylab="Probability",
        names.arg=x_range, legend.text=c("Region A", "Region B", "Combined"))

cat("Problem 4.3: Call Center Analysis\n")
## Problem 4.3: Call Center Analysis
cat("Combined call distribution: Poisson\n")
## Combined call distribution: Poisson
cat("Parameter λ (mean):", lambda_sum, "calls per hour\n")
## Parameter λ (mean): 8 calls per hour
cat("Variance:", lambda_sum, "calls² per hour²\n")
## Variance: 8 calls² per hour²

Problem 5

1.Customer Retention and Churn Analysis

Scenario: A telecommunications company wants to model the behavior of its customers regarding their likelihood to stay with the company (retention) or leave for a competitor (churn). The company segments its customers into three states:

State 1: Active customers who are satisfied and likely to stay (Retention state).

State 2: Customers who are considering leaving (At-risk state).

State 3: Customers who have left (Churn state).

The company has historical data showing the following monthly transition probabilities:

From State 1 (Retention): 80% stay in State 1, 15% move to State 2, and 5% move to State 3.

From State 2 (At-risk): 30% return to State 1, 50% stay in State 2, and 20% move to State 3.

From State 3 (Churn): 100% stay in State 3.

The company wants to analyze the long-term behavior of its customer base.

(a) Construct the transition matrix for this Markov Chain.

From the given transition probabilities: \[P = \begin{bmatrix} 0.80 & 0.15 & 0.05 \ 0.30 & 0.50 & 0.20 \ 0 & 0 & 1.00 \end{bmatrix}\] Where rows represent starting states and columns represent ending states.

# Construct the transition matrix
P1 <- matrix(c(
  0.80, 0.15, 0.05,
  0.30, 0.50, 0.20,
  0.00, 0.00, 1.00
), nrow = 3, byrow = TRUE)

print("Customer Retention and Churn Analysis")
## [1] "Customer Retention and Churn Analysis"
print("Transition Matrix")
## [1] "Transition Matrix"
print(P1)
##      [,1] [,2] [,3]
## [1,]  0.8 0.15 0.05
## [2,]  0.3 0.50 0.20
## [3,]  0.0 0.00 1.00

(b) If a customer starts as satisfied (State 1), what is the probability that they will eventually churn (move to State 3)?

To find the probability of eventually moving from State 1 to State 3 (absorption probability), I’ll use the theory of absorbing Markov chains. First, I’ll reorganize the transition matrix to separate the transient states (1 and 2) from the absorbing state (3): \[P = \begin{bmatrix} Q & R \ 0 & I \end{bmatrix} = \begin{bmatrix} 0.80 & 0.15 & 0.05 \ 0.30 & 0.50 & 0.20 \ 0 & 0 & 1.00 \end{bmatrix}\] Where:

\(Q=[0.800.150.300.50]Q =\) \[\begin{bmatrix} 0.80 & 0.15 \\ 0.30 & 0.50 \end{bmatrix}\]

\(Q=[0.800.30​0.150.50​]\) represents transitions between transient states

\(R=[0.050.20]R =\) \[\begin{bmatrix} 0.05 \\ 0.20 \end{bmatrix}\]

\(R=[0.050.20​]\) represents transitions from transient to absorbing states

The fundamental matrix \(N=(I−Q)−1N = (I-Q)^{-1} N=(I−Q)−1\) gives the expected number of visits to each transient state:

\(I−Q=[1001]−[0.800.150.300.50]=[0.20−0.15−0.300.50]I-Q =\) \[\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\]
  • \[\begin{bmatrix} 0.80 & 0.15 \\ 0.30 & 0.50 \end{bmatrix}\] = \[\begin{bmatrix} 0.20 & -0.15 \\ -0.30 & 0.50 \end{bmatrix}\]
\(I−Q=[10​01​]−[0.800.30​0.150.50​]=[0.20−0.30​−0.150.50​]\) \(Computing (I−Q)−1(I-Q)^{-1} (I−Q)−1:\) \[det⁡(I−Q)=(0.20)(0.50)−(−0.15)(−0.30)=0.10−0.045=0.055\\det(I-Q) = (0.20)(0.50) - (-0.15)(-0.30) = 0.10 - 0.045 = 0.055\\ det(I−Q)=(0.20)(0.50)−(−0.15)(−0.30)=0.10−0.045=0.055\] \((I−Q)−1=10.055[0.500.150.300.20]=[9.092.735.453.64](I-Q)^{-1} = \frac{1}{0.055}\) \[\begin{bmatrix} 0.50 & 0.15 \\ 0.30 & 0.20 \end{bmatrix}\] = \[\begin{bmatrix} 9.09 & 2.73 \\ 5.45 & 3.64 \end{bmatrix}\]

\((I−Q)−1=0.0551​[0.500.30​0.150.20​]=[9.095.45​2.733.64​]\) The absorption probabilities are given by \(N⋅RN \cdot R N⋅R\):

\(N⋅R=[9.092.735.453.64]⋅[0.050.20]=[0.455+0.5460.273+0.727]=[1.001.00]N \cdot R\)= \[\begin{bmatrix} 9.09 & 2.73 \\ 5.45 & 3.64 \end{bmatrix}\] \[\begin{bmatrix} 0.05 \\ 0.20 \end{bmatrix}\] = \[\begin{bmatrix} 0.455 + 0.546 \\ 0.273 + 0.727 \end{bmatrix}\] = \[\begin{bmatrix} 1.00 \\ 1.00 \end{bmatrix}\]

\(N⋅R=[9.095.45​2.733.64​]⋅[0.050.20​]=[0.455+0.5460.273+0.727​]=[1.001.00.]\)

This means that starting from State 1, the probability of eventually moving to State 3 (churning) is 1.00 or 100%. This makes sense because State 3 is an absorbing state, and there’s always a positive probability of transitioning to it from other states, so eventually all customers will churn.

# Extract Q
Q <- P1[1:2, 1:2]
# Extract R
R <- P1[1:2, 3, drop = FALSE]

# Calculate fundamental matrix
I <- diag(2)
N <- solve(I - Q)

# Calculate absorption probabilities
B <- N %*% R

print("Probability of eventually churning from State 1")
## [1] "Probability of eventually churning from State 1"
print(paste("Absorption probability from State 1 to State 3:", B[1,1]))
## [1] "Absorption probability from State 1 to State 3: 1"

(c) Determine the steady-state distribution of this Markov Chain. What percentage of customers can the company expect to be in each state in the long run?

To find the steady-state distribution, I need to solve the system:

\[πP=π\pi P = \pi πP=π ∑i=13πi=1\sum_{i=1}^3 \pi_i = 1 ∑i=13​πi​=1\]

Where \(π=[π1,π2,π3]\pi = [\pi_1, \pi_2, \pi_3]π=[π1​,π2​,π3​]\) is the steady-state distribution.

Since State 3 is an absorbing state and there’s a positive probability of reaching it from any other state, the steady-state distribution will be: \[π=[0,0,1]\pi = [0, 0, 1] π=[0,0,1]\] This means in the long run, 100% of customers will be in State 3 (churned).

# For this absorbing Markov chain, the steady state will be [0,0,1]
steady_state1 <- c(0, 0, 1)
print("Steady-state distribution")
## [1] "Steady-state distribution"
print(steady_state1)
## [1] 0 0 1
# Start with initial distribution
initial_state <- c(1, 0, 0)
current_state <- initial_state
n_steps <- 100  # Large number to approach steady state

for(i in 1:n_steps) {
  current_state <- current_state %*% P1
}

print("Verification by iteration:")
## [1] "Verification by iteration:"
print(current_state)
##              [,1]         [,2]      [,3]
## [1,] 6.192346e-05 2.266556e-05 0.9999154
cat("\n\n")

2.Inventory Management in a Warehouse

Scenario: A warehouse tracks the inventory levels of a particular product using a Markov Chain model. The inventory levels are categorized into three states:

State 1: High inventory (More than 100 units in stock).

State 2: Medium inventory (Between 50 and 100 units in stock).

State 3: Low inventory (Less than 50 units in stock).

The warehouse has the following transition probabilities for inventory levels from one month to the next:

From State 1 (High): 70% stay in State 1, 25% move to State 2, and 5% move to State 3.

From State 2 (Medium): 20% move to State 1, 50% stay in State 2, and 30% move to State 3.

From State 3 (Low): 10% move to State 1, 40% move to State 2, and 50% stay in State 3.

The warehouse management wants to optimize its restocking strategy by understanding the long-term distribution of inventory levels.

(a) Construct the transition matrix for this Markov Chain.

From the given probabilities: \[P = \begin{bmatrix} 0.70 & 0.25 & 0.05 \ 0.20 & 0.50 & 0.30 \ 0.10 & 0.40 & 0.50 \end{bmatrix}\]

# Construct the transition matrix
P2 <- matrix(c(
  0.70, 0.25, 0.05,
  0.20, 0.50, 0.30,
  0.10, 0.40, 0.50
), nrow = 3, byrow = TRUE)

print("Inventory Management in a Warehouse")
## [1] "Inventory Management in a Warehouse"
print("Transition Matrix")
## [1] "Transition Matrix"
print(P2)
##      [,1] [,2] [,3]
## [1,]  0.7 0.25 0.05
## [2,]  0.2 0.50 0.30
## [3,]  0.1 0.40 0.50

(b) If the warehouse starts with a high inventory level (State 1), what is the probability that it will eventually end up in a low inventory level (State 3)?

To find the probability of eventually reaching State 3 from State 1, I’ll use first-step analysis.

Let \(fijf_{ij} fij​\) be the probability of ever reaching state j starting from state i.

For \(f13f_{13} f13​\) (probability of reaching State 3 from State 1):

\(f13=p11f13+p12f23+p13f_{13}\) \(= p_{11}f_{13} + p_{12}f_{23} + p_{13}\) \(f13​=p11​f13​+p12​f23​+p13​\) \(f13=0.70f13+0.25f23+0.05f_{13}\) \(= 0.70f_{13} + 0.25f_{23} + 0.05\) \(f13​=0.70f13​+0.25f23​+0.05\)

Similarly:

\[f23=0.20f13+0.50f23+0.30f_{23}\\ = 0.20f_{13} + 0.50f_{23} + 0.30 f23​=0.20f13​+0.50f23​+0.30\] Solving these equations: \[0.30f13−0.25f23=0.050.30f_{13} - 0.25f_{23} = 0.05\\ 0.30f13​−0.25f23​=0.05−0.20f13+0.50f23=0.30\\-0.20f_{13} + 0.50f_{23} = 0.30 −0.20f13​+0.50f23​=0.30\] Multiply the first equation by 2: \[0.60f13−0.50f23=0.100.60f_{13} - 0.50f_{23} = 0.10\\ 0.60f13​−0.50f23​=0.10−0.20f13+0.50f23=0.30\\-0.20f_{13} + 0.50f_{23} = 0.30 −0.20f13​+0.50f23​=0.30\] Adding these equations: \[0.40f13=0.400.40f_{13} = 0.40 0.40f13​=0.40f13=1f_{13} = 1 f13​=1\] The probability of eventually reaching State 3 from State 1 is 1 or 100%. This makes sense because there are no absorbing states, and all states can reach each other. In an ergodic Markov chain like this, any state can be reached from any other state with probability 1 given enough time.

# Part B
print("Probability of eventually reaching State 3 from State 1")
## [1] "Probability of eventually reaching State 3 from State 1"
# Alternative approach - Using simulation to verify
sim_runs <- 1000
reach_state3 <- 0

for(run in 1:sim_runs) {
  current <- 1  # Start in state 1
  steps <- 0
  max_steps <- 1000
  
  while(current != 3 && steps < max_steps) {
    prob <- runif(1)
    if(current == 1) {
      if(prob < P2[1,1]) {
        current <- 1
      } else if(prob < P2[1,1] + P2[1,2]) {
        current <- 2
      } else {
        current <- 3
      }
    } else if(current == 2) {
      if(prob < P2[2,1]) {
        current <- 1
      } else if(prob < P2[2,1] + P2[2,2]) {
        current <- 2
      } else {
        current <- 3
      }
    }
    steps <- steps + 1
  }
  
  if(current == 3) {
    reach_state3 <- reach_state3 + 1
  }
}

print(paste("Simulated probability:", reach_state3/sim_runs))
## [1] "Simulated probability: 1"

(c) Determine the steady-state distribution of this Markov Chain. What is the long-term expected proportion of time that the warehouse will spend in each inventory state?

To find the steady-state distribution, I need to solve:

\[πP=π\pi P = \pi πP=π ∑i=13πi=1\sum_{i=1}^3 \pi_i = 1 ∑i=13​πi​=1\]

This gives me: \(π1=0.70π1+0.20π2+0.10π3\pi_1 = 0.70\pi_1 + 0.20\pi_2 + 0.10\pi_3\) \(π1​=0.70π1​+0.20π2​+0.10π3​π2=0.25π1+0.50π2+0.40π3\pi_2 = 0.25\pi_1 + 0.50\pi_2 + 0.40\pi_3\) \(π2​=0.25π1​+0.50π2​+0.40π3​π3=0.05π1+0.30π2+0.50π3\pi_3 = 0.05\pi_1 + 0.30\pi_2 + 0.50\pi_3\) \(π3​=0.05π1​+0.30π2​+0.50π3​π1+π2+π3=1\pi_1 + \pi_2 + \pi_3 = 1\) \(π1​+π2​+π3​=1\) Rearranging: \(0.30π1−0.20π2−0.10π3=00.30\pi_1 - 0.20\pi_2 - 0.10\pi_3 = 0\) \(0.30π1​−0.20π2​−0.10π3​=0−0.25π1+0.50π2−0.40π3=0-0.25\pi_1 + 0.50\pi_2 - 0.40\pi_3 = 0\) \(−0.25π1​+0.50π2​−0.40π3​=0−0.05π1−0.30π2+0.50π3=0-0.05\pi_1 - 0.30\pi_2 + 0.50\pi_3 = 0\) \(−0.05π1​−0.30π2​+0.50π3​=0π1+π2+π3=1\pi_1 + \pi_2 + \pi_3 = 1\) \(π1​+π2​+π3​=1\) Solving this system:

From the first equation: \[0.30π1=0.20π2+0.10π30.30\pi_1 = 0.20\pi_2 + 0.10\pi_3 0.30π1​=0.20π2​+0.10π3​\] From the second equation: \[0.50π2=0.25π1+0.40π30.50\pi_2 = 0.25\pi_1 + 0.40\pi_3 0.50π2​=0.25π1​+0.40π3​\] From the third equation: \[0.50π3=0.05π1+0.30π20.50\pi_3 = 0.05\pi_1 + 0.30\pi_2 0.50π3​=0.05π1​+0.30π2​\] Substituting these into each other and solving with the constraint \(π1+π2+π3=1\pi_1 + \pi_2 + \pi_3 = 1\) \(π1​+π2​+π3​=1:\)

\(π1=2073≈0.274\pi_1 = \frac{20}{73} \approx 0.274\) \(π1​=7320​≈0.274\ or\ about\ 27.4%\) \(π2=2973≈0.397\pi_2 = \frac{29}{73} \approx 0.397\) \(π2​=7329​≈0.397\ or\ about\ 39.7%\) \(π3=2473≈0.329\pi_3 = \frac{24}{73} \approx 0.329\) \(π3​=7324​≈0.329\ or\ about\ 32.9%\)

This means in the long run:

The warehouse will have high inventory (State 1) about 27.4% of the time Medium inventory (State 2) about 39.7% of the time Low inventory (State 3) about 32.9% of the time

# Create the matrix for the system (P' - I)
system_matrix <- t(P2) - diag(3)
# Replace the last row with all 1's (sum constraint)
system_matrix[3,] <- c(1, 1, 1)
# Right-hand side is all zeros except for the last element which is 1
rhs <- c(0, 0, 1)

# Solve the system
steady_state2 <- solve(system_matrix, rhs)
print("Part (c): Steady-state distribution")
## [1] "Part (c): Steady-state distribution"
print(steady_state2)
## [1] 0.3466667 0.3866667 0.2666667
# Verification
verification <- steady_state2 %*% P2
print("Verification: π * P should equal π")
## [1] "Verification: π * P should equal π"
print(verification)
##           [,1]      [,2]      [,3]
## [1,] 0.3466667 0.3866667 0.2666667
# Arbitrary distribution
pi <- c(1/3, 1/3, 1/3)
n_iterations <- 100

for(i in 1:n_iterations) {
  pi <- pi %*% P2
}

print("Steady-state via power iteration:")
## [1] "Steady-state via power iteration:"
print(pi)
##           [,1]      [,2]      [,3]
## [1,] 0.3466667 0.3866667 0.2666667
# Calculating exact fractions for Problem 2's steady state
print("Exact fractions for Problem 2 steady state:")
## [1] "Exact fractions for Problem 2 steady state:"
cat("π₁ =", 20, "/", 73, "≈", 20/73, "\n")
## π₁ = 20 / 73 ≈ 0.2739726
cat("π₂ =", 29, "/", 73, "≈", 29/73, "\n")
## π₂ = 29 / 73 ≈ 0.3972603
cat("π₃ =", 24, "/", 73, "≈", 24/73, "\n")
## π₃ = 24 / 73 ≈ 0.3287671