Homework #2

Problem 1

(Bayesian): A new credit scoring system has been developed to predict the likelihood of loan defaults. The system has a 90% sensitivity, meaning that it correctly identifies 90% of those who will default on their loans. It also has a 95% specificity, meaning that it correctly identifies 95% of those who will not default. The default rate among borrowers is 2%. Given these prevalence, sensitivity, and specificity estimates, what is the probability that a borrower flagged by the system as likely to default will actually default? If the average loss per defaulted loan is $200,000 and the cost to run the credit scoring test on each borrower is $500, what is the total first-year cost for evaluating 10,000 borrowers?

# Data From Problem
sensitivity <- 0.90
specificity <- 0.95
prevalence <- 0.02
loss <- 200000
cost <- 500
borrowers <- 10000
# Probability that a flagged borrower actually defaults (the positive predictive value).
# True positives: sensitivity * prevalence
# False positives (flagged but do not default): (1 - specificity) * (1 - prevalence)
p_flagged <- (sensitivity*prevalence) + ((1-specificity)*(1-prevalence))
# Positive Predictive Value (PPV) is the share of flagged borrowers who are true positives
ppv <- (sensitivity*prevalence)/p_flagged
total_cost <- borrowers*cost
# The number flagged uses the probability of being flagged, not the PPV
num_flagged <- round(p_flagged*borrowers)
expected_default <- round(num_flagged*ppv)
default_cost <- expected_default*loss

## Probability that a person flagged as a default risk actually defaults is:  26.9 %

## Total Number Flagged as Default Risk is: 670

## Expected Number of Actual Defaults Among Those Flagged is: 180

## Cost of Defaults in First Year is: 36000000

## Total First Year Costs (testing plus defaults) is: 41000000
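
The Bayes calculation can also be laid out as expected counts in a 2x2 table, which makes it easier to see where each number comes from. The following is only a sketch; the variable names are illustrative and are not part of the solution above.

```r
# Cross-check of the positive predictive value using expected counts.
# All names below are illustrative assumptions.
n_borrowers <- 10000
sens <- 0.90; spec <- 0.95; prev <- 0.02

defaulters     <- n_borrowers * prev          # borrowers expected to default
non_defaulters <- n_borrowers - defaulters    # borrowers expected to repay
true_pos  <- defaulters * sens                # defaulters correctly flagged
false_pos <- non_defaulters * (1 - spec)      # non-defaulters wrongly flagged

flagged   <- true_pos + false_pos             # everyone the system flags
ppv_check <- true_pos / flagged               # P(default | flagged)
```

Most flagged borrowers are false positives because non-defaulters outnumber defaulters 49 to 1, so even a 5% false-positive rate produces more flags than the true positives do.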

(Binomial): The probability that a stock will pay a dividend in any given quarter is 0.7. What is the probability that the stock pays dividends exactly 6 times in 8 quarters? What is the probability that it pays dividends 6 or more times? What is the probability that it pays dividends fewer than 6 times? What is the expected number of dividend payments over 8 quarters? What is the standard deviation?

# Data from problem
p <-  0.7 # Probability of Dividend Each Quarter
n <- 8 # Number of Quarters
# Exactly 6 we use dbinom function (PMF)
exactly_6 <- dbinom(6,n,p)
# 6 or more we use pbinom function (CDF)
six_or_more <- pbinom(5,n,p,lower.tail = FALSE)
# Less than 6 we use pbinom function (CDF)
less_than_6 <- pbinom(5,n,p,lower.tail = TRUE)
# Expected Number of Dividends over Period (an expected value need not be a whole number, so it is not rounded)
expected_dividends <- n*p
# Standard Deviation
sd <- sqrt(n*p*(1-p))

## Probability of exactly 6 dividend payments in 8 quarters is: 0.2964755

## Probability of 6 or more dividend payments in 8 quarters is: 0.5517738

## Probability of less than 6 dividend payments in 8 quarters is: 0.4482262

## Expected number of dividend payments in 8 quarters is: 5.6

## Standard deviation is: 1.296
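
As a cross-check, the tail probabilities, mean, and standard deviation can be recomputed directly from the full probability mass function. This is a minimal sketch with illustrative variable names.

```r
# Recompute the binomial summaries from the PMF itself.
n_q <- 8       # quarters (illustrative name)
p_div <- 0.7   # per-quarter dividend probability (illustrative name)
k_vals <- 0:n_q
pmf <- dbinom(k_vals, n_q, p_div)

six_plus   <- sum(pmf[k_vals >= 6])                      # P(X >= 6)
under_six  <- sum(pmf[k_vals < 6])                       # P(X < 6)
mean_check <- sum(k_vals * pmf)                          # equals n * p
sd_check   <- sqrt(sum((k_vals - mean_check)^2 * pmf))   # equals sqrt(n * p * (1 - p))
```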

(Poisson): A financial analyst notices that there are an average of 12 trading days each month when a certain stock’s price increases by more than 2%. What is the probability that exactly 4 such days occur in a given month? What is the probability that more than 12 such days occur in a given month? How many such days would you expect in a 6-month period? What is the standard deviation of the number of such days? If an investment strategy requires at least 70 days of such price increases in a year for profitability, what is the percent utilization and what are your recommendations?

# Average Number of Trading Days per Month
lambda <- 12
exactly_4 <- dpois(4,lambda)
more_than_12 <- ppois(12,lambda,lower.tail = FALSE)
expected_six_month <- lambda*6
std_month <- sqrt(lambda)
std_6_month <- sqrt(expected_six_month)
lambda_year <- lambda*12
more_than_69 <- ppois(69,lambda_year,lower.tail=FALSE)
percent_util <- round((more_than_69*100),3)

## Lambda for the average number of trading days per month that results in a price increase of 2% or more: 12

## The probability that exactly 4 days of 2% or more occur in a month: 0.005308599

## The probability that more than 12 days of 2% or more occur in a month: 0.4240348

## Expected number of days in a six (6) month period: 72

## The Standard Deviation of days of 2% or more price increase is: 3.464102 per month (8.485281 over the six-month period)

## A year would have an expected number of days of 2% or more price increase to be the monthly times 12 months, which would equal: 144 days.

## If the investment strategy requires a minimum of 70 days per year, that threshold is
## 48.61 % of the 144 days expected in a year, or about half the expected amount

## The probability of reaching that minimum 70 days is: 100 % (after rounding). This is logical since the 70-day minimum lies more than 3 standard deviations (one standard deviation is 12 days) below the mean of 144. My recommendation would be to execute the strategy.
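
To put a number on how far the 70-day requirement sits from the yearly mean, one could compute a z-score and compare the exact Poisson tail with a normal approximation. This is a sketch under the same Poisson assumption; the names are illustrative.

```r
# Distance of the 70-day threshold from the yearly mean, in standard deviations.
lambda_yr <- 12 * 12                 # expected qualifying days per year
threshold <- 70                      # minimum days needed for profitability
z_score <- (threshold - lambda_yr) / sqrt(lambda_yr)

# Exact Poisson probability of at least 70 days, and a normal
# approximation with a continuity correction for comparison.
p_exact  <- ppois(threshold - 1, lambda_yr, lower.tail = FALSE)
p_normal <- pnorm(threshold - 0.5, mean = lambda_yr, sd = sqrt(lambda_yr),
                  lower.tail = FALSE)
```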

(Hypergeometric): A hedge fund has a portfolio of 25 stocks, with 15 categorized as high-risk and 10 as low-risk. The fund manager randomly selects 7 stocks to closely monitor. If the manager selected 5 high-risk stocks and 2 low-risk stocks, what is the probability of selecting exactly 5 high-risk stocks if the selection was random? How many high-risk and low-risk stocks would you expect to be selected?

h <- 15 # High Risk Stocks
l <- 10 # Low Risk Stocks
k <- 7 # Sample Size
x <- 5 # Number of high risk stocks selected in sample
exactly_five <- dhyper(x,h,l,k)
expected_high <- k * (h/(h+l)) # an expected value need not be a whole number, so it is not rounded
expected_low <- k * (l/(h+l))

## The probability of selecting exactly 5 high risk stock out of 7 selected is: 0.2811213

## Expected number of high risk stocks selected: 4.2

## Expected number of low risk stocks selected: 2.8
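
The hypergeometric probability can be verified from first principles with binomial coefficients, and the expected count can be recovered from the full PMF. A minimal sketch with illustrative names:

```r
# Hypergeometric probability written out with choose().
high <- 15; low <- 10; draw <- 7
p_five <- choose(high, 5) * choose(low, 2) / choose(high + low, draw)

# Expected number of high-risk stocks from the PMF (no rounding applied).
pmf_high  <- dhyper(0:draw, high, low, draw)
mean_high <- sum(0:draw * pmf_high)
```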

(Geometric): The probability that a bond defaults in any given year is 0.5%. A portfolio manager holds this bond for 10 years. What is the probability that the bond will default during this period? What is the probability that it will default in the next 15 years? What is the expected number of years before the bond defaults? If the bond has already survived 10 years, what is the probability that it will default in the next 2 years?

prob_default <- 0.005 # 0.5% default rate per year
prob_default_10 <- 1 - (1 - prob_default)^10
prob_default_15 <- 1 - (1 - prob_default)^15
years_default <- 1/prob_default
prob_default_2 <- 1 - (1 - prob_default)^2 # Memoryless, probability of default is independent of previous information.

## Probability of default within 10 years is: 0.04888987

## Probability of default within the next 15 years is: 0.07243103

## Expected number of years for the bond to default is: 200 years.

## Probability of default within two years after surviving 10 years is: 0.009975 because this is a memoryless 
## application as the possibility of default is independent of the years survived.
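
The memoryless claim can be checked numerically by computing the conditional probability explicitly and comparing it with the unconditional two-year probability. This is a sketch; `surv` is a helper introduced here for illustration.

```r
# Check the memoryless property of the geometric model.
p_def <- 0.005
surv <- function(t) (1 - p_def)^t    # P(no default in the first t years)

# P(default in years 11-12 | survived 10 years)
cond_two <- (surv(10) - surv(12)) / surv(10)
# P(default in the first 2 years), with no conditioning
uncond_two <- 1 - surv(2)
```

The two values agree because the survival function is a pure power of (1 - p), so the first ten years cancel out of the ratio.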

(Poisson): A high-frequency trading algorithm experiences a system failure about once every 1500 trading hours. What is the probability that the algorithm will experience more than two failures in 1500 hours? What is the expected number of failures?

lambda2 <- 1 # in 1500 hours
more_than_two <- ppois(2,lambda2,lower.tail = FALSE)

## The probability of experiencing more than 2 failures within 1500 hours is: 0.0803014

## The expected number of failures within 1500 hours is lambda, which is equal to: 1

(Uniform Distribution): An investor is trying to time the market and is monitoring a stock that they believe has an equal chance of reaching a target price between 20 and 60 days. What is the probability that the stock will reach the target price in more than 40 days? If it hasn’t reached the target price by day 40, what is the probability that it will reach it in the next 10 days? What is the expected time for the stock to reach the target price?

min_days <- 20
max_days <- 60
prob_more_40 <- punif(40,min_days,max_days, lower.tail = FALSE)
prob_next_10 <- (50-40)/(max_days-40)
expected_time_target <- (min_days+max_days)/2

## Given a uniform distribution with a min of 20 days and max of 60 days:

## The probability that a stock will reach target price in more than 40 days is: 0.5

## The probability that a stock will reach target price in the next 10 days is: 0.5

## The expected time for a stock to reach a target price is: 40
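
Unlike the geometric and exponential cases, the uniform distribution is not memoryless, so the day-40 condition changes the answer. The sketch below (illustrative names) writes the conditional probability out in full.

```r
# Conditional probability for the uniform model, written out explicitly.
lo <- 20; hi <- 60
p_between <- punif(50, lo, hi) - punif(40, lo, hi)   # P(40 < T <= 50), unconditional
p_after40 <- punif(40, lo, hi, lower.tail = FALSE)   # P(T > 40)
p_cond    <- p_between / p_after40                   # P(T <= 50 | T > 40)
```

The unconditional chance of hitting the target between days 40 and 50 is 0.25; conditioning on not having reached it by day 40 doubles it to 0.5.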

(Exponential Distribution): A financial model estimates that the lifetime of a successful start-up before it either goes public or fails follows an exponential distribution with an expected value of 8 years. What is the expected time until the start-up either goes public or fails? What is the standard deviation? What is the probability that the start-up will go public or fail after 6 years? Given that the start-up has survived for 6 years, what is the probability that it will go public or fail in the next 2 years?

exp_fail <- 8
exp_SD <- exp_fail # mu = sigma
prob_after_six <- pexp(6,(1/exp_fail), lower.tail = FALSE)
prob_two_six <- pexp(2,(1/exp_fail), lower.tail = TRUE)

## Expected time until start_up going public or fail is the same as given expected value of: 8 years

## The standard deviation is the same as the mean or expected value, a feature of exponential distributions (mu=sigma) which is also: 8 years.

## The probability that a start-up will go public or fail after 6 years is: 0.4723666

## The probability a start-up will go public or fail in the next two years after six is: 0.2211992

## This is also memoryless since the event occurring is independent of what has happened before.

Problem 2

(Product Selection): A company produces 5 different types of green pens and 7 different types of red pens. The marketing team needs to create a new promotional package that includes 5 pens. How many different ways can the package be created if it contains fewer than 2 green pens?

green_pens <- 5
red_pens <- 7
package <- 5
max_green_pens <- 2
# Fewer than 2 green pens means the package has either 0 or 1 green pen.
greenpens_0 <- choose(red_pens,package)
# Exactly 1 green pen: choose which green pen, then 4 of the red pens
green_pens_1 <- choose(green_pens,1)*choose(red_pens,(package-1))
total_combinations <- green_pens_1 + greenpens_0

## Answer: Total combinations with 0 green pens or 1 green pen in a package of 5 pens is: 196
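
Because the pool is small (12 distinct pens), the count can also be confirmed by brute force: enumerate every 5-pen package and keep those with fewer than 2 green pens. This is a sketch with illustrative names.

```r
# Brute-force count over all 5-pen packages drawn from 12 distinct pens.
pen_colour <- c(rep("green", 5), rep("red", 7))
packs <- combn(length(pen_colour), 5)          # each column is one package (pen indices)
green_count <- apply(packs, 2, function(idx) sum(pen_colour[idx] == "green"))
n_valid <- sum(green_count < 2)                # packages with 0 or 1 green pen

# Closed-form count for comparison.
n_formula <- choose(7, 5) + choose(5, 1) * choose(7, 4)
```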

(Team Formation for a Project): A project committee is being formed within a company that includes 14 senior managers and 13 junior managers. How many ways can a project team of 5 members be formed if at least 4 of the members must be junior managers?

senior <- 14
junior <- 13
team <- 5
# Two possibilities of a committee is 5 junior or 4 junior and one senior manager
all_junior <- choose(junior,team)
junior_senior <- choose(junior, (team-1))*choose(senior,1)
total_combinations2 <- all_junior+junior_senior

## Answer: Total combinations of the two possible scenarios is: 11297

(Marketing Campaign Outcomes): A marketing campaign involves three stages: first, a customer is sent 5 email offers; second, the customer is targeted with 2 different online ads; and third, the customer is presented with 3 personalized product recommendations. If the email offers, online ads, and product recommendations are selected randomly, how many different possible outcomes are there for the entire campaign?

Answer: This problem does not actually give us the pool sizes for each of the stages: (E) email offers, (A) ads, and (P) personalized recommendations. We shall approach this generically.

Combinations would be calculated by:

EO = choose(E,5)

AO = choose(A,2)

PO = choose(P,3)

Total outcomes would be:

Total = EO*AO*PO
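
The generic formula can be wrapped in a small function. The pool sizes in the example call are purely hypothetical, since the problem does not supply them, and the function name is an illustration only.

```r
# Generic outcome count; E, A and P are the (unknown) pool sizes.
campaign_outcomes <- function(E, A, P) {
  choose(E, 5) * choose(A, 2) * choose(P, 3)
}

# Hypothetical pools: 10 emails, 4 ads, 6 recommendations.
example_total <- campaign_outcomes(E = 10, A = 4, P = 6)   # 252 * 6 * 20 = 30240
```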

(Product Defect Probability): A quality control team draws 3 products from a batch without replacement. What is the probability that at least one of the products drawn is defective if the defect rate is known to be consistent? Express your answer as a fraction or a decimal number rounded to four decimal places.

Let N = total batch size, D = defect count, and n = 3 (number drawn).

P(# defects > 0) = 1 - P(zero defects drawn)
P(zero defects drawn) = choose((N-D),n)/choose(N,n)

Sample code run with N = 20, R = 0.15 Defect Rate (15%), and D = defect count based upon batch and defect rate.

N <- 20
R <- 0.15
D <- round(N*R)
P_zero <- choose((N-D),3)/choose(N,3)
P_at_least_one <- 1 - P_zero

## Given a batch of N = 20 and defect rate of R = 0.15 the probability of at least one of the three drawn is defective is: 0.4035088

(Business Strategy Choices): A business strategist is choosing potential projects to invest in, focusing on 17 high-risk, high-reward projects and 14 low-risk, steady-return projects. Step 1: How many different combinations of 5 projects can the strategist select? Step 2: How many different combinations of 5 projects can the strategist select if they want at least one low-risk project?

high_risk <- 17
low_risk <- 14
projects <- 5
any_five_projects <- choose((low_risk+high_risk),projects)
least_one_low <- any_five_projects - choose(high_risk, projects)

## Answer: How many different combinations of 5 projects can the strategist select? 169911

## Answer: How many different combinations of 5 projects can the strategist select if they want at least one low-risk project? 163723

(Event Scheduling): A business conference needs to schedule 9 different keynote sessions from three different industries: technology, finance, and healthcare. There are 4 potential technology sessions, 104 finance sessions, and 17 healthcare sessions to choose from. How many different schedules can be made? Express your answer in scientific notation rounding to the hundredths place.

sessions <- 9
tech_num <- 4
finance_num <- 104
health_num <- 17
# We will use the expand.grid function to enumerate how many sessions come from each industry.
# We assume at least one (1) keynote from each industry, so no single industry can supply more than sessions-2 of the 9
combinations_conference <- expand.grid(tech = 1:min(tech_num, sessions-2),
                                       finance = 1:min(finance_num, sessions-2),
                                       healthcare = 1:min(health_num, sessions-2))
# Keep only the splits that total 9 sessions
combinations_conference <- combinations_conference[rowSums(combinations_conference)==sessions,]
# Calculate the number of schedules for each split (choose() is vectorized, so this gives one value per row)
schedules <- choose(tech_num, combinations_conference$tech) *
             choose(finance_num, combinations_conference$finance) *
             choose(health_num, combinations_conference$healthcare)
total_schedules <- formatC(sum(schedules), format = 'e', digits = 2)

## Answer: Total number of possible schedules made assuming at least one per industry is: 2.83e+12
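
Under the same "at least one session per industry" assumption, the count can be cross-checked with inclusion-exclusion: start from all ways to pick 9 of the 125 sessions, subtract the selections that miss an industry, and add back those that miss two. This is a sketch with illustrative names.

```r
# Inclusion-exclusion count of 9-session selections covering all three industries.
tech <- 4; fin <- 104; hlth <- 17; pick <- 9
pool <- tech + fin + hlth

incl_excl <- choose(pool, pick) -
  choose(pool - tech, pick) - choose(pool - fin, pick) - choose(pool - hlth, pick) +
  choose(tech, pick) + choose(fin, pick) + choose(hlth, pick)   # single-industry selections

# Direct enumeration over the per-industry split, one term per row.
grid <- expand.grid(t = 1:tech, f = 1:(pick - 2), h = 1:(pick - 2))
grid <- grid[rowSums(grid) == pick, ]
direct <- sum(choose(tech, grid$t) * choose(fin, grid$f) * choose(hlth, grid$h))
```

`choose(4, 9)` evaluates to 0 in R, so the impossible "technology only" case drops out on its own.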

(Book Selection for Corporate Training): An HR manager needs to create a reading list for a corporate leadership training program, which includes 13 books in total. The books are categorized into 6 novels, 6 business case studies, 7 leadership theory books, and 5 strategy books.

Step 1: If the manager wants to include no more than 4 strategy books, how many different reading schedules are possible? Express your answer in scientific notation rounding to the hundredths place.

Step 2: If the manager wants to include all 6 business case studies, how many different reading schedules are possible? Express your answer in scientific notation rounding to the hundredths place.

reading_list <- 13
novels <- 6
case_study <- 6
leadership <- 7
strategy <- 5
comb_1 <- 0
# There can be at most 4 strategy books (i = 0 to 4); the remaining 13 - i books come from the 19 novels, case studies, and leadership books
for (i in 0:4){
  comb_1 <- comb_1 + choose(strategy,i) * choose(novels+case_study+leadership, (reading_list-i))
}
comb_1 <- formatC(comb_1, format = 'e', digits = 2)
# Choose all 6 Case Studies
comb_2 <- choose((novels+leadership+strategy), (reading_list - case_study))
comb_2 <- formatC(comb_2, format = 'e', digits = 2)

## Answer: If the manager wants to include no more than 4 strategy books, how many different reading schedules are possible? 2.42e+06

## Answer: If the manager wants to include all 6 business case studies, how many different reading schedules are possible? 3.18e+04

(Product Arrangement): A retailer is arranging 10 products on a display shelf. There are 5 different electronic gadgets and 5 different accessories. What is the probability that all the gadgets are placed together and all the accessories are placed together on the shelf? Express your answer as a fraction or a decimal number rounded to four decimal places.

products <- 10
gadgets <- 5
accessory <- 5
# Total number of possible arrangements
total_number <- factorial(10)
# Total number of gadget arrangements
gadget_num <- factorial(5)
# Total number of accessory arrangements
accessory_num <- factorial(5)
# Total number of arrangements where all accessory and all gadgets are arranged as a group is 2
# All gadgets and then all accessory, or all accessory and then all gadgets
group_num <- factorial(2)
probability_products <- (group_num*gadget_num*accessory_num)/total_number

## Probability that all gadgets and all accessories are grouped together on a 10 product shelf is: 0.0079

(Expected Value of a Business Deal): A company is evaluating a deal where they either gain $4 for every successful contract or lose $16 for every unsuccessful contract. A “successful” contract is defined as drawing a queen or lower from a standard deck of cards. (Aces are considered the highest card in the deck.)

Step 1: Find the expected value of the deal. Round your answer to two decimal places. Losses must be expressed as negative values.

Step 2: If the company enters into this deal 833 times, how much would they expect to win or lose? Round your answer to two decimal places. Losses must be expressed as negative values.

gain <- 4
loss <- -16
success <- 4 * 11 # 4 suits of queen and lower
total_outcomes <- 52
total_deals <- 833
success_prob <- success/total_outcomes
ev <- (gain*success_prob) + (loss*(1-success_prob))
total_ev <- ev*total_deals

## Answer: Expected value of the deal. Round your answer to two decimal places is: $ 0.92

## Answer: The company enters into this deal 833 times, how much would they expect to win or lose is: $ 768.92

Problem 3

(Supply Chain Risk Assessment): Let X1, X2,…..Xn represent the lead times (in days) for the delivery of key components from n=5 different suppliers. Each lead time is uniformly distributed across a range of 1 to k=20 days, reflecting the uncertainty in delivery times. Let Y denote the minimum delivery time among all suppliers. Understanding the distribution of Y is crucial for assessing the earliest possible time you can begin production. Determine the distribution of Y to better manage your supply chain and minimize downtime.

n <- 5
k <- 20
y_values <- seq(1, k, by = 0.1)
cdf <- 1 - (1 - (y_values - 1)/(k - 1))^n
pdf <- n * (1 - (y_values - 1)/(k - 1))^(n - 1) * (1/(k - 1))
plot(y_values, cdf, type = "l", col = "blue", 
     xlab = "Minimum Lead Time (Y) in Days", ylab = "CDF",
     main = "Distribution of Minimum Lead Time")
lines(y_values, pdf, type="l", col="green")
legend("bottomright", legend = c("CDF", "PDF"),
   col = c("blue", "green"), lty = 1)

## Answer: By looking at the CDF and PDF over the range of 1 to k=20 days, we can determine what an effective lead
##  time for production from supply might be.  While this is more of a qualitative view, we can also identify an exact probability and day of potential production start by looking at the CDF and PDF values assigned to each value of Y.
## We would make our decision based upon production factors and costs, and then decide what probability would
## best suit our needs.  If we want at least a 90% probability that the first delivery has arrived, then our
## production lead time would be about 8 days (according to our plot).
##
## The cumulative probability first exceeds 0.90 at day 8.1.
## Cumulative Probability Days 7.7 through 8.1: 0.8863008 0.8908481 0.8952488 0.8995064 0.9036245
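
Rather than reading the 90% point off the plot, the CDF can be inverted in closed form and checked by simulation. This sketch assumes, as the code above does, that lead times are continuous uniform on [1, k]; the function and variable names are illustrative.

```r
# Quantile function of the minimum of n uniform(1, k) lead times,
# obtained by inverting F(y) = 1 - (1 - (y - 1)/(k - 1))^n.
n_sup <- 5; k_max <- 20
q_min <- function(prob) 1 + (k_max - 1) * (1 - (1 - prob)^(1 / n_sup))
q90 <- q_min(0.90)    # day by which the first delivery arrives with 90% probability

# Monte Carlo check: share of simulated minima at or below q90.
set.seed(1)
sim_min <- replicate(10000, min(runif(n_sup, 1, k_max)))
share_below <- mean(sim_min <= q90)
```

If the lead times are meant to be whole days, a discrete uniform on 1..k would be the more natural model and the quantile would shift slightly.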

(Maintenance Planning for Critical Equipment): Your organization owns a critical piece of equipment, such as a high-capacity photocopier (for a law firm) or an MRI machine (for a healthcare provider). The manufacturer estimates the expected lifetime of this equipment to be 8 years, meaning that, on average, you expect one failure every 8 years. It’s essential to understand the likelihood of failure over time to plan for maintenance and replacements.

Geometric Model: Calculate the probability that the machine will not fail for the first 6 years. Also, provide the expected value and standard deviation. This model assumes each year the machine either fails or does not, independently of previous years.

mean_life <- 8
prob_fail <- 1/mean_life
prob_no_fail <- (1 - prob_fail)^6
std_geo <- sqrt((1-prob_fail)/prob_fail^2)

## Probability of no failure within the first six (6) years is: 0.4487953

## The expected value is already given in the problem as the expected lifetime of : 8 years

## Standard deviation is: 7.483315 years

Exponential Model: Calculate the probability that the machine will not fail for the first 6 years. Provide the expected value and standard deviation, modeling the time to failure as a continuous process.

lambda3 <- 1/mean_life
prob_no_fail_exp <- pexp(6,lambda3, lower.tail = FALSE)

## Probability of no failure within the first six (6) years is: 0.4723666

## The expected value is already given in the problem, or 1/lambda, as the expected lifetime of : 8 years

## Standard deviation is also 1/lambda or: 8 years

Binomial Model: Calculate the probability that the machine will not fail during the first 6 years, given that it is expected to fail once every 8 years. Provide the expected value and standard deviation, assuming a fixed number of trials (years) with a constant failure probability each year.

n_years <- 6
prob_fail_binom <- 1/mean_life
prob_no_fail_binom <- 1 - prob_fail_binom
prob_no_fail_6_binom <- dbinom(0, n_years, prob_fail_binom )
exp_fails_binom <- n_years*prob_fail_binom
std_binom <- sqrt(n_years * prob_fail_binom * prob_no_fail_binom)

## Probability of no failures in the first six (6) years is: 0.4487953

## Expected value over six(6) years (trials) is: 0.75

## Standard deviation of failures over six (6) years is: 0.8100926

Poisson Model: Calculate the probability that the machine will not fail during the first 6 years, modeling the failure events as a Poisson process. Provide the expected value and standard deviation.

lambda_pois <- (1/mean_life)*n_years
prob_no_fail_6_pois <- dpois(0,lambda_pois)
std_pois <- sqrt(lambda_pois)

## Probability of no failures in the first six (6) years is: 0.4723666

## Expected value over six(6) years is: 0.75

## Standard deviation of failures over six (6) years is: 0.8660254
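
Putting the four models side by side shows why the answers pair up: the two discrete-time models (geometric, binomial) both give (1 - 1/8)^6, while the two continuous-time models (exponential, Poisson) both give exp(-6/8). A minimal sketch with illustrative names:

```r
# Probability of no failure in the first 6 years under each model.
life <- 8; yrs <- 6
p_yr <- 1 / life
comparison <- data.frame(
  model = c("Geometric", "Exponential", "Binomial", "Poisson"),
  p_no_failure = c((1 - p_yr)^yrs,
                   pexp(yrs, rate = 1 / life, lower.tail = FALSE),
                   dbinom(0, yrs, p_yr),
                   dpois(0, yrs / life))
)
```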

Problem 4

1. Scenario: You are managing two independent servers in a data center. The time until the next failure for each server follows an exponential distribution with different rates:

Server A has a failure rate of $\lambda$_a = 0.5 failures per hour.

Server B has a failure rate of $\lambda$_b = 0.3 failures per hour.

Question: What is the distribution of the total time until both servers have failed at least once? Use the moment generating function (MGF) to find the distribution of the sum of the times to failure.

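The mean of the total time would be the sum of the individual mean times to failure.

E[T] = E[T_A] + E[T_B] = 2 + 3.333 = 5.333 hours

T_A ~ Exp($\lambda$_a = 0.5) with mean of 1/$\lambda$_a = 2 hours

T_B ~ Exp($\lambda$_b = 0.3) with mean of 1/$\lambda$_b = 3.33 hours

For an exponential time to failure, the MGF is M_T(s) = E[e^(sT)] = $\lambda$/($\lambda$ - s), s < $\lambda$

For T_A, the MGF is M_TA(s) = 0.5/(0.5 - s)

For T_B, the MGF is M_TB(s) = 0.3/(0.3 - s)

T = T_A + T_B

Because T_A and T_B are independent, the MGF of the sum is the product of the individual MGFs:

M_T(s) = (0.5/(0.5 - s)) * (0.3/(0.3 - s)) = 0.15/((0.5 - s)(0.3 - s)), s < 0.3

This is the MGF of a hypoexponential distribution (a sum of independent exponentials with different rates). Because the rates differ, T is neither exponential nor gamma.

Mean(T) = Mean(T_A) + Mean(T_B) = 2 + 3.333 = 5.333 hours

For each exponential time, the variance is 1/$\lambda$²

Var(T) = Var(T_A) + Var(T_B) = 1/(0.5)² + 1/(0.3)² = 4 hours² + 11.11 hours² = 15.11 hours²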
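
For two independent exponentials with different rates, the sum has density f(t) = ($\lambda$_a $\lambda$_b / ($\lambda$_a - $\lambda$_b)) (e^(-$\lambda$_b t) - e^(-$\lambda$_a t)) for t >= 0. The sketch below checks numerically that this density integrates to 1 and reproduces the mean and variance above; the names are illustrative.

```r
# Density of T = T_A + T_B for independent exponentials with distinct rates.
rate_a <- 0.5; rate_b <- 0.3
dens_sum <- function(t) {
  rate_a * rate_b / (rate_a - rate_b) * (exp(-rate_b * t) - exp(-rate_a * t))
}

# Numerical moments via integrate().
total_mass <- integrate(dens_sum, 0, Inf)$value
mean_t <- integrate(function(t) t * dens_sum(t), 0, Inf)$value
var_t  <- integrate(function(t) t^2 * dens_sum(t), 0, Inf)$value - mean_t^2
```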

2. Sum of Independent Normally Distributed Random Variables

Scenario: An investment firm is analyzing the returns of two independent assets, Asset X and Asset Y. The returns on these assets are normally distributed:

Asset X: X ~ N ($\mu$_X = 5%, $\sigma$²_X = 4%)

Asset Y: Y ~ N ($\mu$_Y = 7%, $\sigma$²_Y = 9%)

Question: Find the distribution of the combined return of the portfolio consisting of these two assets using the moment generating function (MGF).

For N($\mu$,$\sigma$²) the MGF is M(t) = exp(t$\mu$ + 1/2 t²$\sigma$²)

M_X(t) = exp(t * 5% + 1/2 * t^2 * 4%) = exp(0.05 * t + 0.02 * t²)
M_Y(t) = exp(t * 7% + 1/2 * t^2 * 9%) = exp(0.07 * t + 0.045 * t²)
Z = X + Y
M_Z(t) = M_X(t) * M_Y(t) = exp(0.05 * t + 0.02 * t²) * exp(0.07 * t + 0.045 * t²)
We simplify the above, and it reduces to:
M_Z(t) = exp(0.12 * t + 0.065* t²)

Matching M_Z(t) to the general form exp(t$\mu$ + 1/2 t²$\sigma$²) gives $\mu$_Z = 0.12 and 1/2 $\sigma$²_Z = 0.065, so $\sigma$²_Z = 0.13. The combined return is therefore normally distributed:

Combined Assets: Z ~ N ($\mu$_Z = 12%, $\sigma$²_Z = 13%)
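
The MGF algebra can be spot-checked numerically: the product of the two normal MGFs should coincide with the MGF of a normal whose mean and variance are the sums of the individual means and variances. A sketch, with `mgf_norm` as an illustrative helper:

```r
# Normal MGF: M(t) = exp(t * mu + 0.5 * t^2 * variance)
mgf_norm <- function(t, mu, variance) exp(t * mu + 0.5 * t^2 * variance)

t_grid <- seq(-2, 2, by = 0.5)
mgf_product <- mgf_norm(t_grid, 0.05, 0.04) * mgf_norm(t_grid, 0.07, 0.09)
mgf_sum     <- mgf_norm(t_grid, 0.05 + 0.07, 0.04 + 0.09)   # mean 0.12, variance 0.13
```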

3. Scenario: A call center receives calls independently from two different regions. The number of calls received from Region A and Region B in an hour follows a Poisson distribution:

Region A: X_A ~ Poisson ($\lambda$_A = 3)
Region B: X_B ~ Poisson ($\lambda$_B = 5)

Question: Determine the distribution of the total number of calls received in an hour from both regions using the moment generating function (MGF).

The MGF for a Poisson distribution is represented by the following equation:
M_X(t) = exp($\lambda$ * (e^t - 1))
We can create the MGFs for each of the regions with the following:
For X_A: M_XA(t) = exp(3 * (e^t - 1))
For X_B: M_XB(t) = exp(5 * (e^t - 1))

As with the MGFs for the other parts of this problem, the combined MGF for Z = X_A + X_B is:
M_Z(t) = M_XA(t) * M_XB(t) = exp(3 * (e^t - 1)) * exp(5 * (e^t - 1)) = exp((3 + 5) * (e^t - 1)) = exp(8 * (e^t - 1))

This tells us that Z follows a Poisson distribution whose mean and variance are both 8 calls per hour for the two regions combined.
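
The same conclusion can be confirmed without MGFs by convolving the two PMFs directly and comparing against Poisson(8). A minimal sketch with illustrative names:

```r
# P(Z = k) = sum over j of P(X_A = j) * P(X_B = k - j)
k_vals <- 0:30
conv_pmf <- sapply(k_vals, function(k) sum(dpois(0:k, 3) * dpois(k:0, 5)))
pois8_pmf <- dpois(k_vals, 8)
```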

Problem 5

Customer Retention and Churn Analysis

Scenario: A telecommunications company wants to model the behavior of its customers regarding their likelihood to stay with the company (retention) or leave for a competitor (churn). The company segments its customers into three states:

State 1: Active customers who are satisfied and likely to stay (Retention state).
State 2: Customers who are considering leaving (At-risk state).
State 3: Customers who have left (Churn state).

The company has historical data showing the following monthly transition probabilities:

From State 1 (Retention): 80% stay in State 1, 15% move to State 2, and 5% move to State 3.
From State 2 (At-risk): 30% return to State 1, 50% stay in State 2, and 20% move to State 3.
From State 3 (Churn): 100% stay in State 3.

The company wants to analyze the long-term behavior of its customer base.

Question: (a) Construct the transition matrix for this Markov Chain.

transition_matrix <- matrix(c(0.80,0.15,0.05,
                              0.30,0.50,0.20,
                              0.00,0.00,1.00),
                            nrow = 3, byrow = TRUE)
rownames(transition_matrix) <- c('Retention', 'At-Risk', 'Churn')
colnames(transition_matrix) <- c('Retention', 'At-Risk', 'Churn')

## [1] "The transition matrix is:"

##           Retention At-Risk Churn
## Retention       0.8    0.15  0.05
## At-Risk         0.3    0.50  0.20
## Churn           0.0    0.00  1.00

(b) If a customer starts as satisfied (State 1), what is the probability that they will eventually churn (move to State 3)?
For this we construct the sub-matrices needed to calculate the fundamental matrix, which gives the probability that a customer will eventually churn.

# Create the sub-matrix of transient states.  This is from state 1 and state 2, state 3 does not have any.
Q <- transition_matrix[1:2,1:2]
# Create the sub-matrix from transient to absorption states.  This is specifically from state 1 and 2 going directly to state 3.
R <- transition_matrix[1:2,3]
# The fundamental matrix N = (I-Q)^-1
N <- solve(diag(2)- Q)
# Create the absorption probability matrix B
B <- N%*%R

## [1] "The probability of being absorbed from any transient state is:"

##           [,1]
## Retention    1
## At-Risk      1

So the probability of being absorbed from either of the transient states is 1.0
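
Since churn is certain in the long run, a more informative quantity is how long it takes. The row sums of the fundamental matrix give the expected number of months spent in transient states before absorption. This goes slightly beyond what the question asks and is offered only as a sketch; the names are illustrative.

```r
# Expected number of months until churn, from each transient state.
P_mat <- matrix(c(0.80, 0.15, 0.05,
                  0.30, 0.50, 0.20,
                  0.00, 0.00, 1.00), nrow = 3, byrow = TRUE)
Q_t <- P_mat[1:2, 1:2]                  # transient-to-transient block
N_f <- solve(diag(2) - Q_t)             # fundamental matrix (I - Q)^-1
months_to_churn <- as.numeric(N_f %*% rep(1, 2))
names(months_to_churn) <- c("Retention", "At-Risk")
```

Under these transition probabilities a satisfied customer is expected to last about 11.8 months and an at-risk customer about 9.1 months.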

(c) Determine the steady-state distribution of this Markov Chain. What percentage of customers can the company expect to be in each state in the long run?

We can raise the transition matrix to a large enough power to see where the results converge. In this case we will raise the matrix to the 100th power and look at the results.

library(expm) # provides the %^% matrix power operator
p_steady <- round(transition_matrix %^% 100)

##           Retention At-Risk Churn
## Retention         0       0     1
## At-Risk           0       0     1
## Churn             0       0     1

We can see that the steady state will result in 100% of the customers being in the absorbed state.
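
An alternative that needs no extra package is to solve for the stationary distribution directly as the left eigenvector of the transition matrix with eigenvalue 1. This is a sketch of that approach with illustrative names, not a replacement for the matrix-power method above.

```r
# Stationary distribution as the left eigenvector for eigenvalue 1.
P_mat <- matrix(c(0.80, 0.15, 0.05,
                  0.30, 0.50, 0.20,
                  0.00, 0.00, 1.00), nrow = 3, byrow = TRUE)
eig <- eigen(t(P_mat))                       # left eigenvectors of P
idx <- which.min(abs(eig$values - 1))        # locate the eigenvalue closest to 1
stationary <- Re(eig$vectors[, idx])
stationary <- stationary / sum(stationary)   # normalize to a probability vector
```

Because Churn is the only absorbing state and both other states can reach it, all of the long-run mass ends up in Churn.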

2: Inventory Management in a Warehouse

Scenario: A warehouse tracks the inventory levels of a particular product using a Markov Chain model. The inventory levels are categorized into three states:

State 1: High inventory (More than 100 units in stock).
State 2: Medium inventory (Between 50 and 100 units in stock).
State 3: Low inventory (Less than 50 units in stock).

The warehouse has the following transition probabilities for inventory levels from one month to the next:

From State 1 (High): 70% stay in State 1, 25% move to State 2, and 5% move to State 3.
From State 2 (Medium): 20% move to State 1, 50% stay in State 2, and 30% move to State 3.
From State 3 (Low): 10% move to State 1, 40% move to State 2, and 50% stay in State 3.

The warehouse management wants to optimize its restocking strategy by understanding the long-term distribution of inventory levels.