1- (Bayesian). A new test for multinucleoside-resistant (MNR) human immunodeficiency virus type 1 (HIV-1) variants was recently developed. The test maintains 96% sensitivity, meaning that, for those with the disease, it will correctly report “positive” for 96% of them. The test is also 98% specific, meaning that, for those without the disease, 98% will be correctly reported as “negative.” MNR HIV-1 is considered to be rare (albeit emerging), with about a .1% or .001 prevalence rate. Given the prevalence rate, sensitivity, and specificity estimates, what is the probability that an individual who is reported as positive by the new test actually has the disease? If the median cost (consider this the best point estimate) is about $100,000 per positive case total and the test itself costs $1000 per administration, what is the total first-year cost for treating 100,000 individuals?
Answer:
The probability that an individual actually has the disease given a positive test result is the positive predictive value (PPV). It is calculated as (true positives)/(true positives + false positives). By Bayes' theorem, this is the posterior probability that the person has the disease given a positive test result (P(Disease|Positive)): \[ P(\text{Disease} | \text{Positive}) = \frac{P(\text{Positive} | \text{Disease}) \cdot P(\text{Disease})}{P(\text{Positive})} \] where:
\[ P(\text{Positive}) = P(\text{Positive} | \text{Disease}) \cdot P(\text{Disease}) + P(\text{Positive} | \text{No Disease}) \cdot P(\text{No Disease}) \]
and:
\[ P(\text{Positive} | \text{No Disease}) = 1 - \text{Specificity} \]
\[ P(\text{No Disease}) = 1 - P(\text{Disease}) \]
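For the first-year cost, we assume that all 100,000 individuals are tested and that the treatment cost is incurred only for true positives: \[ \text{Total Cost} = (\text{Number of Tests} \times \text{Test Cost}) + (\text{Number of True Positives} \times \text{Treatment Cost per Case}) \]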
\[ \text{Number of True Positives} = P(\text{Disease} | \text{Positive}) \times \text{Expected Number of Positive Tests} \]
\[ \text{Expected Number of Positive Tests} = P(\text{Positive}) \times \text{Total Individuals} \]
sensitivity <- 0.96 # P(Positive|Disease)
specificity <- 0.98 # P(Negative|No Disease)
prevalence <- 0.001 # P(Disease)
cost_per_case <- 100000
cost_per_test <- 1000
population <- 100000
# P(Positive|No Disease)
p_pos_no_disease <- 1 - specificity
# P(No Disease)
p_no_disease <- 1 - prevalence
# P(Positive)
p_positive <- (sensitivity * prevalence) + (p_pos_no_disease * p_no_disease)
# P(Disease|Positive)
p_disease_given_positive <- (sensitivity * prevalence) / p_positive
# expected number of true positives
expected_true_positives <- p_disease_given_positive * population * p_positive
# total first-year cost
total_cost <- (expected_true_positives * cost_per_case) + (population * cost_per_test)
cat("Probability that an individual actually has the disease given a positive test result:", p_disease_given_positive, "\n")
## Probability that an individual actually has the disease given a positive test result: 0.04584527
cat("Expected number of true positives:", expected_true_positives, "\n")
## Expected number of true positives: 96
cat("Total first-year cost for treating 100,000 individuals:", total_cost, "\n")
## Total first-year cost for treating 100,000 individuals: 109600000
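As a sanity check on the Bayes calculation, the same posterior can be recovered from expected counts in a cohort of 100,000. This is only a sketch: the names `n_diseased`, `n_healthy`, `true_pos`, `false_pos`, `ppv_counts`, and `cost_all_positives` are not in the original. As a clearly labeled assumption, it also prices an alternative reading of the question in which every positive test, true or false, incurs the $100,000 cost.

```r
# Expected counts in a cohort of 100,000 (a cross-check, not a new method)
sensitivity <- 0.96
specificity <- 0.98
prevalence  <- 0.001
population  <- 100000

n_diseased <- population * prevalence        # people with the disease
n_healthy  <- population - n_diseased        # people without the disease
true_pos   <- sensitivity * n_diseased       # expected true positives
false_pos  <- (1 - specificity) * n_healthy  # expected false positives

# PPV = TP / (TP + FP)
ppv_counts <- true_pos / (true_pos + false_pos)

# ASSUMPTION (alternative reading): every positive test is treated,
# so false positives also incur the per-case cost
cost_all_positives <- (true_pos + false_pos) * 100000 + population * 1000

cat("PPV from counts:", ppv_counts, "\n")
cat("Cost if all positives are treated:", cost_all_positives, "\n")
```

The roughly 1,998 expected false positives swamp the 96 expected true positives, which is why the posterior stays below 5% despite the test's high sensitivity and specificity.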
2- (Binomial). The probability of your organization receiving a Joint Commission inspection in any given month is .05. What is the probability that, after 24 months, you received exactly 2 inspections? What is the probability that, after 24 months, you received 2 or more inspections? What is the probability that you received fewer than 2 inspections? What is the expected number of inspections you should have received? What is the standard deviation?
Answer:
Probability of Exactly 2 Inspections: Under the binomial distribution, the probability mass function (PMF) for exactly \(k\) inspections in \(n\) months is given by: \[ P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} \] where \(\binom{n}{k}\) is the binomial coefficient, \(p\) is the probability of an inspection in a given month, and \(1-p\) is the probability of no inspection.
n <- 24
p <- 0.05
prob_exactly_2 <- dbinom(2, size = n, prob = p)
cat("Probability of exactly 2 inspections: ", prob_exactly_2, "\n")
## Probability of exactly 2 inspections: 0.2232381
Probability of 2 or More Inspections: This is calculated as 1 minus the cumulative probability of getting 1 or fewer inspections.
prob_2_or_more <- 1 - pbinom(1, size = n, prob = p)
cat("Probability of 2 or more inspections: ", prob_2_or_more, "\n")
## Probability of 2 or more inspections: 0.3391827
Probability of Fewer Than 2 Inspections: This is the cumulative probability of getting 0 or 1 inspection.
prob_fewer_than_2 <- pbinom(1, size = n, prob = p)
cat("Probability of fewer than 2 inspections: ", prob_fewer_than_2, "\n")
## Probability of fewer than 2 inspections: 0.6608173
Expected Number of Inspections: The expected number of inspections is given by: \[ E[X] = np \]
expected_inspections <- n * p
cat("Expected number of inspections: ", expected_inspections, "\n")
## Expected number of inspections: 1.2
Standard Deviation: The standard deviation of the number of inspections is given by: \[ \sigma = \sqrt{np(1-p)} \]
std_deviation <- sqrt(n * p * (1 - p))
cat("Standard deviation: ", std_deviation, "\n")
## Standard deviation: 1.067708
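To connect the built-in functions back to the PMF formula, the following sketch evaluates the formula directly with `choose()`. The helper `pmf_manual` and the `*_manual` variable names are hypothetical and introduced only for this check.

```r
n <- 24
p <- 0.05

# Binomial PMF written out from the formula: C(n, k) * p^k * (1 - p)^(n - k)
pmf_manual <- function(k, n, p) choose(n, k) * p^k * (1 - p)^(n - k)

exactly_2_manual    <- pmf_manual(2, n, p)         # P(X = 2)
fewer_than_2_manual <- sum(pmf_manual(0:1, n, p))  # P(X = 0) + P(X = 1)
two_or_more_manual  <- 1 - fewer_than_2_manual     # complement of P(X <= 1)

cat("Exactly 2 (manual):", exactly_2_manual, "\n")
cat("Fewer than 2 (manual):", fewer_than_2_manual, "\n")
cat("2 or more (manual):", two_or_more_manual, "\n")
```

These values should agree with `dbinom()` and `pbinom()` up to floating-point error.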
3- (Poisson). You are modeling the family practice clinic and notice that patients arrive at a rate of 10 per hour. What is the probability that exactly 3 arrive in one hour? What is the probability that more than 10 arrive in one hour? How many would you expect to arrive in 8 hours? What is the standard deviation of the appropriate probability distribution? If there are three family practice providers that can see 24 templated patients each day, what is the percent utilization and what are your recommendations?
Answer:
The Poisson distribution is appropriate for modeling the number of events (in this case, patients arriving at a clinic) in a fixed interval of time or space, given a constant mean rate of occurrence.
Probability of Exactly 3 Patients in One Hour: The probability of observing exactly \(k\) arrivals in a fixed interval is given by the Poisson probability mass function: \[ P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!} \] where \(\lambda = 10\) patients per hour.
lambda <- 10
prob_exactly_3 <- dpois(3, lambda)
cat("Probability of exactly 3 patients: ", prob_exactly_3, "\n")
## Probability of exactly 3 patients: 0.007566655
Probability of More Than 10 Patients in One Hour: This requires calculating the complement of the cumulative probability of 10 or fewer arrivals.
prob_more_than_10 <- 1 - ppois(10, lambda)
cat("Probability of more than 10 patients: ", prob_more_than_10, "\n")
## Probability of more than 10 patients: 0.4169602
Expected Arrivals in 8 Hours and Standard Deviation: The expected number of arrivals over 8 hours is \(8 \times \lambda\), with a standard deviation of \(\sqrt{8 \times \lambda}\).
expected_8_hours <- 8 * lambda
std_dev_8_hours <- sqrt(8 * lambda)
cat("Expected patients in 8 hours: ", expected_8_hours, "\n")
## Expected patients in 8 hours: 80
cat("Standard deviation in 8 hours: ", std_dev_8_hours, "\n")
## Standard deviation in 8 hours: 8.944272
Percent Utilization: With three providers and a total daily capacity for 72 patient visits, we calculate the utilization based on the expected number of arrivals.
total_capacity <- 3 * 24
percent_utilization <- (8 * lambda) / total_capacity * 100
cat("Percent utilization: ", percent_utilization, "%\n")
## Percent utilization: 111.1111 %
recommendations <- ifelse(percent_utilization > 100, "May need to increase capacity (number of providers) or change/optimize appointment scheduling.", "Capacity is adequate.")
cat("Recommendations: ", recommendations, "\n")
## Recommendations: May need to increase capacity (number of providers) or change/optimize appointment scheduling.
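The utilization figure compares only the mean demand with capacity. A complementary view, sketched below, is the probability that a day's arrivals exceed the 72 templated slots. It assumes an 8-hour clinic day and Poisson arrivals at 10 per hour, as in the calculation above; the names `lambda_day`, `capacity`, and `prob_over_capacity` are illustrative.

```r
lambda_day <- 8 * 10   # expected arrivals in an (assumed) 8-hour day
capacity   <- 3 * 24   # three providers, 24 templated patients each

# P(arrivals > capacity) under a Poisson model for the daily total
prob_over_capacity <- ppois(capacity, lambda_day, lower.tail = FALSE)

cat("Probability daily arrivals exceed capacity:", prob_over_capacity, "\n")
```

Because the mean demand (80) already sits above capacity (72), this probability is well above one half, which supports the recommendation to add capacity or adjust scheduling.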
4- (Hypergeometric). Your subordinate with 30 supervisors was recently accused of favoring nurses. 15 of the subordinate’s workers are nurses and 15 are other than nurses. As evidence of malfeasance, the accuser stated that there were 6 company-paid trips to Disney World for which everyone was eligible. The supervisor sent 5 nurses and 1 non-nurse. If your subordinate acted innocently, what was the probability he/she would have selected five nurses for the trips? How many nurses would we have expected your subordinate to send? How many non-nurses would we have expected your subordinate to send?
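Answer: In this case, the population is the 30 workers, divided into two groups: 15 nurses and 15 non-nurses. The key question is the probability of selecting exactly 5 nurses for the 6 company-paid trips, given no favoritism (that is, random selection without replacement). The hypergeometric distribution has the following parameters: the population size \(N = 30\), the number of nurses in the population \(K = 15\), the number of trips awarded \(n = 6\), and the number of nurses selected \(k = 5\). Its probability mass function is \[ P(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}} \] and its expected value is \(E[X] = nK/N\).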
N <- 30
K <- 15
n <- 6
k <- 5
# probability of selecting 5 nurses out of 6 trips
prob_select_5_nurses <- dhyper(k, K, N-K, n)
# expected number of nurses
expected_nurses <- n * K / N
# expected number of non-nurses
expected_non_nurses <- n * (N - K) / N
cat("Probability of selecting 5 nurses: ", prob_select_5_nurses, "\n")
## Probability of selecting 5 nurses: 0.07586207
cat("Expected number of nurses to be sent: ", expected_nurses, "\n")
## Expected number of nurses to be sent: 3
cat("Expected number of non-nurses to be sent: ", expected_non_nurses, "\n")
## Expected number of non-nurses to be sent: 3
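A probability of about 7.6% means that sending exactly five nurses would be unusual under random selection, but it is not rare enough (it exceeds a conventional 5% threshold, for example) to establish favoritism on its own. The result is suggestive rather than conclusive.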
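When judging whether the outcome is surprising, the tail probability of five or more nurses is usually more informative than the probability of exactly five. The sketch below computes it two ways; `prob_5_or_more` and `prob_5_or_more_manual` are names introduced here for illustration.

```r
N <- 30  # workers
K <- 15  # nurses
n <- 6   # trips

# P(X >= 5) = 1 - P(X <= 4), using the hypergeometric CDF
prob_5_or_more <- phyper(4, K, N - K, n, lower.tail = FALSE)

# Same quantity from the counting formula: P(X = 5) + P(X = 6)
prob_5_or_more_manual <-
  (choose(K, 5) * choose(N - K, 1) + choose(K, 6) * choose(N - K, 0)) / choose(N, n)

cat("Probability of 5 or more nurses:", prob_5_or_more, "\n")
```

The tail probability is about 8.4%, so an outcome at least this lopsided toward nurses would arise by chance roughly once in twelve such selections.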
5- (Geometric). The probability of being seriously injured in a car crash in an unspecified location is about .1% per hour. A driver is required to traverse this area for 1200 hours in the course of a year. What is the probability that the driver will be seriously injured during the course of the year? In the course of 15 months? What is the expected number of hours that a driver will drive before being seriously injured? Given that a driver has driven 1200 hours, what is the probability that he or she will be injured in the next 100 hours?
Answer: Given a 0.1% chance (\(p = 0.001\)) of serious injury per hour of driving, we can explore the risk over different periods and the expected time until such an event. Treating each hour as an independent trial, the geometric distribution has key properties we'll use: the probability of at least one injury within \(n\) hours is \(1 - (1-p)^n\), the expected number of hours until the first injury is \(1/p\), and the distribution is memoryless, so hours already driven without injury do not change the risk going forward.
The periods of interest are 1200 hours (one year) and 15 months (equivalent to \(\frac{15}{12} \times 1200 = 1500\) hours). For each, the probability of at least one injury is the complement of the probability of no injury in any of the hours driven:
Probability of Being Injured Within a Year (1200 hours): The probability of at least one injury occurring within 1200 hours is calculated as: \[ P(\text{injury in 1200 hours}) = 1 - (1 - p)^{1200} \]
# probability of injury per hour
p <- 0.001
prob_injury_1200 <- 1 - (1 - p)^1200
cat("Probability of injury in 1200 hours: ", prob_injury_1200, "\n")
## Probability of injury in 1200 hours: 0.6989866
Probability of Being Injured Within 15 Months (1500 hours): Similarly, for 15 months: \[ P(\text{injury in 1500 hours}) = 1 - (1 - p)^{1500} \]
prob_injury_1500 <- 1 - (1 - p)^1500
cat("Probability of injury in 1500 hours: ", prob_injury_1500, "\n")
## Probability of injury in 1500 hours: 0.7770372
Expected Hours Before Being Injured: The expected number of hours until the first injury (counting the hour in which it occurs) is the mean of the geometric distribution: \[ E(X) = \frac{1}{p} \]
expected_hours <- 1 / p
cat("Expected number of hours before injury: ", expected_hours, "\n")
## Expected number of hours before injury: 1000
Risk After 1200 Hours for the Next 100 Hours: Because the geometric distribution is memoryless, the 1200 injury-free hours already driven do not change the risk, so the probability of injury in the next 100 hours is: \[ P(\text{injury in next 100 hours}) = 1 - (1 - p)^{100} \]
prob_injury_next_100 <- 1 - (1 - p)^100
cat("Probability of injury in next 100 hours: ", prob_injury_next_100, "\n")
## Probability of injury in next 100 hours: 0.09520785
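The same quantities can be cross-checked with R's built-in geometric CDF. Note that `pgeom()` counts the number of failures before the first success, so "injury within \(n\) hours" corresponds to `pgeom(n - 1, p)`. The sketch below also verifies the memoryless property numerically; all variable names are illustrative.

```r
p <- 0.001

# pgeom(q, p) = P(at most q injury-free hours before the first injury)
#             = 1 - (1 - p)^(q + 1)
injury_1200 <- pgeom(1200 - 1, prob = p)
injury_1500 <- pgeom(1500 - 1, prob = p)

# Conditional risk: P(injury in hours 1201..1300 | no injury in first 1200)
cond_next_100 <- (pgeom(1299, p) - pgeom(1199, p)) /
  pgeom(1199, p, lower.tail = FALSE)

# Unconditional risk over any 100 hours
uncond_100 <- pgeom(99, p)

cat("Conditional risk, next 100 hours:", cond_next_100, "\n")
cat("Unconditional risk, 100 hours:", uncond_100, "\n")
```

The conditional and unconditional values coincide, which is the memoryless property in action.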
6- You are working in a hospital that is running off of a primary generator which fails about once in 1000 hours. What is the probability that the generator will fail more than twice in 1000 hours? What is the expected value?
Answer: We can model the generator failures using the Poisson distribution. The Poisson distribution is appropriate here because we’re dealing with the number of occurrences of an event (generator failures) within a fixed period of time, given a known average rate of those events happening. The probability that the generator will fail more than twice in 1000 hours is given by: \[ P(X > 2) = 1 - (P(X=0) + P(X=1) + P(X=2)) \] where \(P(X=k) = \frac{\lambda^k e^{-\lambda}}{k!}\). The expected number of failures in 1000 hours is the rate of failure, \(\lambda\), which equals: \[ E(X) = \lambda \]
# rate of failure
lambda <- 1
# probability of more than two failures in 1000 hours
prob_more_than_2 <- 1 - (ppois(2, lambda))
# expected value (mean) of failures in 1000 hours
expected_value <- lambda
cat("Probability of more than two failures in 1000 hours: ", prob_more_than_2, "\n")
## Probability of more than two failures in 1000 hours: 0.0803014
cat("Expected number of failures in 1000 hours: ", expected_value, "\n")
## Expected number of failures in 1000 hours: 1
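With \(\lambda = 1\), the arithmetic behind this result can be written out by hand as a check on the code: \[ P(X > 2) = 1 - e^{-1}\left(\frac{1^0}{0!} + \frac{1^1}{1!} + \frac{1^2}{2!}\right) = 1 - \frac{2.5}{e} \approx 1 - 0.9197 = 0.0803 \]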
7- A surgical patient arrives for surgery precisely at a given time. Based on previous analysis (or a lack of knowledge assumption), you know that the waiting time is uniformly distributed from 0 to 30 minutes. What is the probability that this patient will wait more than 10 minutes? If the patient has already waited 10 minutes, what is the probability that he/she will wait at least another 5 minutes prior to being seen? What is the expected waiting time?
Answer: We can use the properties of the Uniform distribution, where there is an equal probability of all outcomes within a specified range. Here, the waiting time is uniformly distributed between 0 and 30 minutes. The probability that a patient will wait more than 10 minutes is given by: \[ P(X > 10) = \frac{30 - 10}{30 - 0} = \frac{20}{30} \] Given the patient has already waited 10 minutes, the probability of waiting at least another 5 minutes is: \[ P(X > 15 | X > 10) = \frac{30 - 15}{30 - 10} = \frac{15}{20} \] The expected waiting time for a patient is: \[ E(X) = \frac{0 + 30}{2} = 15 \] minutes
a <- 0 # minimum
b <- 30 # maximum
# probability of waiting more than 10 minutes
prob_more_than_10 <- (b - 10) / (b - a)
# given already waited 10 minutes, probability of waiting at least another 5 minutes
prob_at_least_5_more <- (b - 15) / (b - 10)
# expected waiting time
expected_waiting_time <- (a + b) / 2
cat("Probability of waiting more than 10 minutes: ", prob_more_than_10, "\n")
## Probability of waiting more than 10 minutes: 0.6666667
cat("Probability of waiting at least another 5 minutes after 10 minutes: ", prob_at_least_5_more, "\n")
## Probability of waiting at least another 5 minutes after 10 minutes: 0.75
cat("Expected waiting time: ", expected_waiting_time, " minutes\n")
## Expected waiting time: 15 minutes
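These results can be cross-checked with R's uniform CDF, `punif()`. The sketch below also contrasts the conditional probability with the unconditional probability of waiting more than 5 minutes, to show that the uniform distribution is not memoryless. Variable names are illustrative.

```r
a <- 0
b <- 30

# P(X > 10)
more_than_10 <- punif(10, min = a, max = b, lower.tail = FALSE)

# P(X > 15 | X > 10) = P(X > 15) / P(X > 10)
at_least_5_more <- punif(15, min = a, max = b, lower.tail = FALSE) / more_than_10

# Unconditional P(X > 5), for comparison
more_than_5 <- punif(5, min = a, max = b, lower.tail = FALSE)

cat("P(X > 10):", more_than_10, "\n")
cat("P(X > 15 | X > 10):", at_least_5_more, "\n")
cat("P(X > 5):", more_than_5, "\n")
```

Having already waited 10 minutes lowers the chance of a further 5-minute wait from about 0.83 to 0.75, because the remaining wait is now uniform on a shorter interval.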
8- Your hospital owns an old MRI, which has a manufacturer’s lifetime of about 10 years (expected value). Based on previous studies, we know that the failure of most MRIs obeys an exponential distribution. What is the expected failure time? What is the standard deviation? What is the probability that your MRI will fail after 8 years? Now assume that you have owned the machine for 8 years. Given that you already owned the machine 8 years, what is the probability that it will fail in the next two years?
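Answer: The failure of MRIs is described by an exponential distribution, which is often used to model the time between events in a continuously occurring process, such as the time until a machine fails. The key parameter of the exponential distribution is the rate parameter \(\lambda\), which is the reciprocal of the expected value \(\mu\) (the mean time to failure). Given an expected value of \(\mu = 10\) years, the rate is \(\lambda = 1/10\) per year. The standard deviation of an exponential distribution equals its mean, so it is also 10 years, and the probability of surviving past time \(t\) is \(P(T > t) = e^{-\lambda t}\):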
lambda <- 1 / 10
# probability of failing after 8 years, P(T > 8)
prob_fail_after_8 <- exp(-lambda * 8)
# probability of failing in the next 2 years, given 8 years of ownership;
# by memorylessness, P(T <= 10 | T > 8) = P(T <= 2) = 1 - exp(-2 * lambda)
prob_fail_next_2 <- 1 - exp(-lambda * 2)
cat("Expected failure time: 10 years\n")
## Expected failure time: 10 years
cat("Standard deviation: 10 years\n")
## Standard deviation: 10 years
cat("Probability of failing after 8 years: ", prob_fail_after_8, "\n")
## Probability of failing after 8 years: 0.449329
cat("Probability of failing in the next 2 years, given 8 years of ownership: ", prob_fail_next_2, "\n")
## Probability of failing in the next 2 years, given 8 years of ownership: 0.1812692
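As a cross-check using R's exponential CDF, `pexp()`, the sketch below computes the conditional probability of failure in years 8 to 10 directly from its definition and compares it with the unconditional probability of failure within 2 years. The exponential distribution is memoryless, so the two should agree. Variable names are illustrative.

```r
rate <- 1 / 10  # failures per year (mean lifetime of 10 years)

# P(T > 8): the machine survives past 8 years
survive_8 <- pexp(8, rate = rate, lower.tail = FALSE)

# P(8 < T <= 10 | T > 8), computed from the definition
fail_next_2_given_8 <- (pexp(10, rate = rate) - pexp(8, rate = rate)) / survive_8

# P(T <= 2): failure within 2 years for a new machine
fail_within_2 <- pexp(2, rate = rate)

cat("P(fail in next 2 years | survived 8):", fail_next_2_given_8, "\n")
cat("P(fail within 2 years):", fail_within_2, "\n")
```

Both values are \(1 - e^{-0.2} \approx 0.181\); the complementary value \(e^{-0.2} \approx 0.819\) is the probability that the machine survives the next two years.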