The following RMD contains CUNY SPS DATA 605 Fall 2025 context for
the Homework 02 assignment. This project has the following 5 problem
sections, with their answers:
Problem 1
1. Bayesian :
Probability that a borrower flagged by the system as likely to default will actually default: 26.87 %
If the average loss per defaulted loan is $200,000 and the cost to run the credit scoring test on each borrower is $500, this is the total first-year cost for evaluating 10,000 borrowers: 542,400,000
2. Binomial
The probability that the stock pays dividends exactly 6 times in 8 quarters: 0.2965
The probability that it pays dividends 6 or more times: 0.5518
The probability that it pays dividends fewer than 6 times: 0.4482
The expected number of dividend payments over 8 quarters: 5.6
Standard deviation: 1.296
3. Poisson
The probability that exactly 4 such days occur in a given month: 0.00531
The probability that more than 12 such days occur in a given month: 0.424
Expected in a 6-month period: 72
Standard deviation in 6 months: 8.485
Percent utilization in at least 70 days: 100 %
4. Hypergeometric
The probability of selecting exactly 5 high-risk stocks if the selection was random when the manager selected 5 high-risk stocks and 2 low-risk stocks:): 0.2811
Expected high-risk stocks selected: 4.2
Expected low-risk stocks selected: 2.8
5. Geometric
The probability that the bond will default during the 10 year period: 0.0489
The probability that it will default in the next 15 years: 0.0724
The expected number of years before the bond defaults: 200
The probability that it will default in the next 2 years if the bond has already survived 10 years: 0.01
6. Poisson
The probability that the algorithm will experience more than two failures in 1500 hours: 0.0803
The expected number of failures: 1
7. Uniform Distribution
The probability that the stock will reach the target price in more than 40 days: 0.5
The probability that it will reach it in the next 10 days if it hasn’t reached the target price by day 40: Inf
The expected time to reach the target price: 40
8. Exponential Distribution
The expected time until the start-up either goes public or fails: 8
Standard deviation: 8
The probability that the start-up will go public or fail after 6 years: 0.4724
The probability that it will go public or fail in the next 2 years given that the start-up has survived for 6 years: 0.2212
Problem 5
1. Customer Retention and Churn Analysis
a. Transition Matrix
Transition Matrix - Customer Retention:
Retention At-Risk Churn
Retention 0.8 0.15 0.05
At-Risk 0.3 0.50 0.20
Churn 0.0 0.00 1.00
b. Probability to Eventually Churn
Probability of eventually churning (move to State 3):
Churn
Retention 1
At-Risk 1
c. Steady-State Distribution
Steady-state distribution of this Markov Chain:
Retention At-Risk Churn
Retention 0 0 1
At-Risk 0 0 1
Churn 0 0 1
Percentage of customers can the company expect to be in each state in the long run:
All customers expected to be in the churned state (100% churn)
2. Inventory Management in a Warehouse
a. Transition Matrix
Transition Matrix - Inventory Management:
High Medium Low
High 0.7 0.25 0.05
Medium 0.2 0.50 0.30
Low 0.1 0.40 0.50
b. Probability to End up in a Low Inventory
Level
Probability of eventually ending in Low inventory from High: 0.267
c. Steady-State Distribution
Steady-state distribution of this Markov Chain:
High Medium Low
0.347 0.387 0.267
Problem 1
1. Bayesian
# Provided stats
sensitivity <- 0.90
specificity <- 0.95
prevalence <- 0.02
loss <- 200000
cost <- 500
borrowers <- 10000
# Positive predictive value
p_flagged <- sensitivity*prevalence + (1-specificity)*(1-prevalence)
ppv <- (sensitivity*prevalence) / p_flagged
# Costs
expected_defaults <- round(ppv*borrowers)
default_cost <- expected_defaults * loss
total_cost <- borrowers*cost + default_cost
# Results
cat("Probability that a borrower flagged by the system as likely to default will actually default:", round(ppv*100,2), "%\n")
## Probability that a borrower flagged by the system as likely to default will actually default: 26.87 %
cat("If the average loss per defaulted loan is $200,000 and the cost to run the credit scoring test on each borrower is $500, this is the total first-year cost for evaluating 10,000 borrowers: ", format(total_cost, big.mark=","))
## If the average loss per defaulted loan is $200,000 and the cost to run the credit scoring test on each borrower is $500, this is the total first-year cost for evaluating 10,000 borrowers: 542,400,000
2. Binomial
# Provided stats
p <- 0.7
n <- 8
# Calculations
exactly_6 <- dbinom(6,n,p)
six_or_more <- pbinom(5,n,p,lower.tail=FALSE)
less_than_6 <- pbinom(5,n,p,lower.tail=TRUE)
expected_dividends <- n*p
sd_div <- sqrt(n*p*(1-p))
# Results
cat("The probability that the stock pays dividends exactly 6 times in 8 quarters: ", round(exactly_6,4), "\n")
## The probability that the stock pays dividends exactly 6 times in 8 quarters: 0.2965
cat("The probability that it pays dividends 6 or more times: ", round(six_or_more,4), "\n")
## The probability that it pays dividends 6 or more times: 0.5518
cat("The probability that it pays dividends fewer than 6 times: ", round(less_than_6,4), "\n")
## The probability that it pays dividends fewer than 6 times: 0.4482
cat("The expected number of dividend payments over 8 quarters: ", expected_dividends, "\n")
## The expected number of dividend payments over 8 quarters: 5.6
cat("Standard deviation:", round(sd_div,3))
## Standard deviation: 1.296
3. Poisson
# Provided stats
lambda <- 12
# Calculations
exactly_4 <- dpois(4,lambda)
more_than_12 <- ppois(12,lambda,lower.tail=FALSE)
expected_6m <- lambda*6
sd_6m <- sqrt(expected_6m)
lambda_year <- lambda*12
percent_util <- round(ppois(69, lambda_year, lower.tail=FALSE)*100,3)
# Results
cat("The probability that exactly 4 such days occur in a given month:", round(exactly_4,5), "\n")
## The probability that exactly 4 such days occur in a given month: 0.00531
cat("The probability that more than 12 such days occur in a given month:", round(more_than_12,4), "\n")
## The probability that more than 12 such days occur in a given month: 0.424
cat("Expected in a 6-month period:", expected_6m, "\n")
## Expected in a 6-month period: 72
cat("Standard deviation in 6 months:", round(sd_6m,3), "\n")
## Standard deviation in 6 months: 8.485
cat("Percent utilization in at least 70 days:", percent_util)
## Percent utilization in at least 70 days: 100
4. Hypergeometric
# Provided stats
total_stocks <- 25
high_risk <- 15
low_risk <- 10
selected <- 7
x_high <- 5
# Calculations
prob_hyper <- dhyper(x_high, high_risk, low_risk, selected)
expected_high <- selected * (high_risk/total_stocks)
expected_low <- selected * (low_risk/total_stocks)
# Results
cat("The probability of selecting exactly 5 high-risk stocks if the selection was random when the manager selected 5 high-risk stocks and 2 low-risk stocks:):", round(prob_hyper,4), "\n")
## The probability of selecting exactly 5 high-risk stocks if the selection was random when the manager selected 5 high-risk stocks and 2 low-risk stocks:): 0.2811
cat("Expected high-risk stocks selected:", round(expected_high,2), "\n")
## Expected high-risk stocks selected: 4.2
cat("Expected low-risk stocks selected:", round(expected_low,2))
## Expected low-risk stocks selected: 2.8
5. Geometric
# Provided stats
p_default <- 0.005
years <- 10
# P(default in 10 years)
prob_geometric_10 <- 1 - (1 - p_default)^years
prob_geometric_15 <- 1 - (1 - p_default)^15
expected_years <- 1/p_default
# Conditional probability (survived 10 years)
prob_next_2 <- 1 - (1 - p_default)^2
# Results
cat("The probability that the bond will default during the 10 year period:", round(prob_geometric_10,4), "\n")
## The probability that the bond will default during the 10 year period: 0.0489
cat("The probability that it will default in the next 15 years:", round(prob_geometric_15,4), "\n")
## The probability that it will default in the next 15 years: 0.0724
cat("The expected number of years before the bond defaults:", round(expected_years,2), "\n")
## The expected number of years before the bond defaults: 200
cat("The probability that it will default in the next 2 years if the bond has already survived 10 years:", round(prob_next_2,4))
## The probability that it will default in the next 2 years if the bond has already survived 10 years: 0.01
6. Poisson
# Provided stats
lambda_hft <- 1/1500 # per hour
time_hours <- 1500
lambda_period <- lambda_hft * time_hours
# Calculations
prob_more_than_2 <- 1 - ppois(2, lambda_period)
expected_failures <- lambda_period
# Results
cat("The probability that the algorithm will experience more than two failures in 1500 hours:", round(prob_more_than_2,4), "\n")
## The probability that the algorithm will experience more than two failures in 1500 hours: 0.0803
cat("The expected number of failures:", round(expected_failures,4))
## The expected number of failures: 1
8. Exponential Distribution
# Provided stats
mean_time <- 8
lambda_exp <- 1/mean_time
t <- 6
# Probability survive past t
prob_gt_t <- exp(-lambda_exp * t)
# Conditional probability survive 6 years, then next 2
prob_next_2_exp <- 1 - exp(-lambda_exp*2)
expected_exp <- mean_time
sd_exp <- mean_time
# Results
cat("The expected time until the start-up either goes public or fails:", expected_exp, "\n")
## The expected time until the start-up either goes public or fails: 8
cat("Standard deviation:", sd_exp, "\n")
## Standard deviation: 8
cat("The probability that the start-up will go public or fail after 6 years:", round(prob_gt_t,4), "\n")
## The probability that the start-up will go public or fail after 6 years: 0.4724
cat("The probability that it will go public or fail in the next 2 years given that the start-up has survived for 6 years:", round(prob_next_2_exp,4))
## The probability that it will go public or fail in the next 2 years given that the start-up has survived for 6 years: 0.2212
Problem 2
1. Product Selection
# Provided stats
green <- 5
red <- 7
package <- 5
# Calculations
total_combinations <- choose(red,5) + choose(red,4)*choose(green,1)
# Results
cat("Answer:", total_combinations)
## Answer: 196
3. Marketing Campaign Outcomes
# Provided stats
emails_chosen <- 5
ads_chosen <- 2
products_chosen <- 3
# Sample size I chose for these, to get an exact numeric answer
total_emails <- 10
total_ads <- 8
total_products <- 6
# Total possible outcomes
total_outcomes <- choose(total_emails, emails_chosen) *
choose(total_ads, ads_chosen) *
choose(total_products, products_chosen)
# Results
cat("Answer:", format(total_outcomes, big.mark=","))
## Answer: 141,120
4. Product Defect Probability
# Provided stats
N <- 20
R <- 0.15
# Calculations
D <- round(N*R)
n_draw <- 3
P_at_least_one <- 1 - choose(N-D,n_draw)/choose(N,n_draw)
# Results
cat("Answer:", round(P_at_least_one,4))
## Answer: 0.4035
5. Business Strategy Choices
# Provided stats
high_risk <- 17
low_risk <- 14
projects <- 5
# Calculations
total_combinations_projects <- choose(high_risk + low_risk, projects)
combinations_at_least_one_low <- total_combinations_projects - choose(high_risk, projects)
# Results
cat("Step 1 Answer:", format(total_combinations_projects, big.mark=","), "\n")
## Step 1 Answer: 169,911
cat("Step 2 Answer:", format(combinations_at_least_one_low, big.mark=","))
## Step 2 Answer: 163,723
6. Event Scheduling
# Provided stats
tech <- 4
finance <- 104
health <- 17
sessions <- 9
# Calculations
grid <- expand.grid(tech = 1:min(tech, sessions-2),
finance = 1:min(finance, sessions-2),
healthcare = 1:min(health, sessions-2))
grid <- grid[rowSums(grid) == sessions,]
schedule_counts <- numeric(nrow(grid))
for (i in 1:nrow(grid)) {
schedule_counts[i] <- choose(tech, grid$tech[i]) *
choose(finance, grid$finance[i]) *
choose(health, grid$healthcare[i])
}
total_schedules <- formatC(sum(schedule_counts), format = 'e', digits = 2)
# Results
cat("Answer:", total_schedules)
## Answer: 2.83e+12
7. Book Selection for Corporate Training
# Provided stats
novels <- 6
case_study <- 6
leadership <- 7
strategy <- 5
reading_list <- 13
# Calculations
comb_1 <- 0
for (i in 0:4) {
comb_1 <- comb_1 + choose(strategy, i) * choose(novels + case_study + leadership, reading_list - i)
}
comb_1 <- formatC(comb_1, format = 'e', digits = 2)
comb_2 <- choose(novels + leadership + strategy, reading_list - case_study)
comb_2 <- formatC(comb_2, format = 'e', digits = 2)
# Results
cat("Step 1 Answer:", comb_1, "\n")
## Step 1 Answer: 2.42e+06
cat("Step 2 Answer:", comb_2)
## Step 2 Answer: 3.18e+04
8. Product Arrangement
# Provided stats
total_products <- 10
gadgets <- 5
accessories <- 5
# Calculations
total_arrangements <- factorial(total_products)
group_arrangements <- factorial(2) * factorial(gadgets) * factorial(accessories)
prob_grouped <- group_arrangements / total_arrangements
# Results
cat("Answer:", round(prob_grouped,4))
## Answer: 0.0079
9. Expected Value of a Business Deal
# Provided stats
gain <- 4
loss <- -16
queen_count <- 4
lower_than_queen_count <- 11
total_outcomes <- 52
total_deals <- 833
# Calculations
success <- queen_count * lower_than_queen_count
success_prob <- success / total_outcomes
ev <- gain * success_prob + loss * (1 - success_prob)
total_ev <- ev * total_deals
# Results
cat("Step 1 Answer:", dollar(round(ev, 2)), "\n")
## Step 1 Answer: $0.92
cat("Step 2 Answer:", dollar(round(total_ev, 2)))
## Step 2 Answer: $768.92
Problem 3
1. Supply Chain Risk Assessment
# Provided stats
n_suppliers <- 5
max_days <- 20
lead_times <- 1:max_days
# Calculations
y_values <- seq(1, max_days, by = 0.1)
cdf_min <- 1 - (1 - (y_values - 1)/(max_days - 1))^n_suppliers
pdf_min <- n_suppliers * (1/(max_days - 1)) * (1 - (y_values - 1)/(max_days - 1))^(n_suppliers - 1)
expected_min <- 1 + (max_days - 1)/(n_suppliers + 1)
# Plot CDF and PDF
plot(y_values, cdf_min, type = "l", col = "salmon", lwd = 2,
xlab = "Minimum Lead Time (Y) in Days", ylab = "Probability",
main = "Distribution of Minimum Lead Time from Suppliers")
lines(y_values, pdf_min, type = "l", col = "cyan", lwd = 2)
legend("bottomright", legend = c("CDF", "PDF"),
col = c("salmon", "cyan"), lty = 1, lwd = 2)

# Results
cat("The earliest possible time to begin production:", round(expected_min, 2), "days", "\n")
## The earliest possible time to begin production: 4.17 days
2. Maintenance Planning for Critical Equipment
# Provided stats
expected_lifetime <- 8
p <- 1 / expected_lifetime
a. Geometric Model
# Calculations
prob_no_fail_6yr_geom <- (1 - p)^6
ev_geom <- 1 / p
sd_geom <- sqrt((1 - p) / p^2)
# Results
cat("The probability that the machine will not fail for the first 6 years:", round(prob_no_fail_6yr_geom,4), "\n")
## The probability that the machine will not fail for the first 6 years: 0.4488
cat("Expected value:", ev_geom, "\n")
## Expected value: 8
cat("Standard deviation:", round(sd_geom,4))
## Standard deviation: 7.4833
b. Exponential Model
# Calculations
prob_no_fail_6yr_exp <- exp(-6 / expected_lifetime)
ev_exp <- expected_lifetime
sd_exp <- expected_lifetime
# Results
cat("The probability that the machine will not fail for the first 6 years:", round(prob_no_fail_6yr_exp,4), "\n")
## The probability that the machine will not fail for the first 6 years: 0.4724
cat("Expected value:", ev_exp, "\n")
## Expected value: 8
cat("Standard deviation:", sd_exp)
## Standard deviation: 8
c. Binomial Model
# Calculations
prob_no_fail_binom <- dbinom(0, size = 6, prob = p)
ev_binom <- 6 * p
sd_binom <- sqrt(6 * p * (1 - p))
# Results
cat("The probability that the machine will not fail for the first 6 years given that it is expected to fail once every 8 years:", round(prob_no_fail_binom,4), "\n")
## The probability that the machine will not fail for the first 6 years given that it is expected to fail once every 8 years: 0.4488
cat("Expected value:", ev_binom, "\n")
## Expected value: 0.75
cat("Standard deviation:", round(sd_binom,4))
## Standard deviation: 0.8101
d. Poisson Model
# Calculations
lambda <- 6 / expected_lifetime
prob_no_fail_poisson <- dpois(0, lambda = lambda)
ev_poisson <- lambda
sd_poisson <- sqrt(lambda)
# Results
cat("The probability that the machine will not fail for the first 6 years:", round(prob_no_fail_poisson,4), "\n")
## The probability that the machine will not fail for the first 6 years: 0.4724
cat("Expected value:", ev_poisson, "\n")
## Expected value: 0.75
cat("Standard deviation:", round(sd_poisson,4))
## Standard deviation: 0.866
Problem 4
1. Sum of Two Independent Exponential Times
# Provided stats
lambda_A <- 0.1
lambda_B <- 0.2
# Calculations
mean_total_exp <- 1/lambda_A + 1/lambda_B
var_total_exp <- 1/(lambda_A^2) + 1/(lambda_B^2)
sd_total_exp <- sqrt(var_total_exp)
# Results
cat("The distribution of the total time until both servers have failed at least once (mean):", mean_total_exp, "hours\n")
## The distribution of the total time until both servers have failed at least once (mean): 15 hours
cat("Standard deviation:", round(sd_total_exp,2), "hours")
## Standard deviation: 11.18 hours
2. Sum of Independent Normally Distributed Random Variables
# Provided stats
mean_X <- 0.05
sd_X <- 0.1
mean_Y <- 0.02
sd_Y <- 0.08
# Calculations
mean_portfolio <- mean_X + mean_Y
sd_portfolio <- sqrt(sd_X^2 + sd_Y^2)
# Results
cat("The distribution of the combined return of the portfolio consisting of these two assets (mean):", mean_portfolio, "hours\n")
## The distribution of the combined return of the portfolio consisting of these two assets (mean): 0.07 hours
cat("Standard deviation:", round(sd_portfolio,2), "hours")
## Standard deviation: 0.13 hours
3. Sum of Independent Poissons
# Provided stats
lambda_A_calls <- 5
lambda_B_calls <- 3
# Calculations
lambda_total <- lambda_A_calls + lambda_B_calls
# Results
cat("The distribution of the total number of calls received in an hour from both regions (mean):", lambda_total, "hours\n")
## The distribution of the total number of calls received in an hour from both regions (mean): 8 hours
cat("Standard deviation:", round(sqrt(lambda_total),2), "hours")
## Standard deviation: 2.83 hours
Problem 5
1. Customer Retention and Churn Analysis
a. Transition Matrix
# Transition matrix
transition_matrix <- matrix(c(
0.80, 0.15, 0.05,
0.30, 0.50, 0.20,
0.00, 0.00, 1.00
), nrow = 3, byrow = TRUE)
rownames(transition_matrix) <- colnames(transition_matrix) <- c('Retention', 'At-Risk', 'Churn')
# Results
cat("Transition Matrix - Customer Retention:\n")
## Transition Matrix - Customer Retention:
print(transition_matrix)
## Retention At-Risk Churn
## Retention 0.8 0.15 0.05
## At-Risk 0.3 0.50 0.20
## Churn 0.0 0.00 1.00
b. Probability to Eventually Churn
# Calculations
Q <- transition_matrix[1:2, 1:2] # transient states
R <- transition_matrix[1:2, 3, drop=FALSE] # transient to absorbing
# Fundamental matrix
N <- solve(diag(2) - Q)
# Absorption probabilities matrix
B <- N %*% R
# Results
cat("\nProbability of eventually churning (move to State 3):\n")
##
## Probability of eventually churning (move to State 3):
print(B)
## Churn
## Retention 1
## At-Risk 1
c. Steady-State Distribution
# Calculations
steady_state_customers <- round(transition_matrix %^% 100, 3)
# Results
cat("\nSteady-state distribution of this Markov Chain:\n")
##
## Steady-state distribution of this Markov Chain:
print(steady_state_customers)
## Retention At-Risk Churn
## Retention 0 0 1
## At-Risk 0 0 1
## Churn 0 0 1
cat("\nPercentage of customers can the company expect to be in each state in the long run:\n")
##
## Percentage of customers can the company expect to be in each state in the long run:
cat("All customers expected to be in the churned state (100% churn)")
## All customers expected to be in the churned state (100% churn)
2. Inventory Management in a Warehouse
a. Transition Matrix
# Transition matrix
transition_matrix_2 <- matrix(c(
0.70, 0.25, 0.05,
0.20, 0.50, 0.30,
0.10, 0.40, 0.50
), nrow = 3, byrow = TRUE)
rownames(transition_matrix_2) <- colnames(transition_matrix_2) <- c('High', 'Medium', 'Low')
# Results
cat("\nTransition Matrix - Inventory Management:\n")
##
## Transition Matrix - Inventory Management:
print(transition_matrix_2)
## High Medium Low
## High 0.7 0.25 0.05
## Medium 0.2 0.50 0.30
## Low 0.1 0.40 0.50
b. Probability to End up in a Low Inventory
Level
# Calculations
prob_limit <- transition_matrix_2 %^% 100
# Results
cat("\nProbability of eventually ending in Low inventory from High:", round(prob_limit[1, "Low"], 3), "\n")
##
## Probability of eventually ending in Low inventory from High: 0.267
c. Steady-State Distribution
# Calculations
eigen_decomp <- eigen(t(transition_matrix_2))
steady_state_inventory <- Re(eigen_decomp$vectors[,1])
steady_state_inventory <- steady_state_inventory / sum(steady_state_inventory)
names(steady_state_inventory) <- c('High', 'Medium', 'Low')
# Results
cat("\nSteady-state distribution of this Markov Chain:\n")
##
## Steady-state distribution of this Markov Chain:
print(round(steady_state_inventory, 3))
## High Medium Low
## 0.347 0.387 0.267