Haig Bedros

1. (Bayesian). A new test for multinucleoside-resistant (MNR) human immunodeficiency virus type 1 (HIV-1) variants was recently developed. The test maintains 96% sensitivity, meaning that, for those with the disease, it will correctly report “positive” for 96% of them. The test is also 98% specific, meaning that, for those without the disease, 98% will be correctly reported as “negative.” MNR HIV-1 is considered to be rare (albeit emerging), with about a .1% or .001 prevalence rate. Given the prevalence rate, sensitivity, and specificity estimates, what is the probability that an individual who is reported as positive by the new test actually has the disease? If the median cost (consider this the best point estimate) is about $100,000 per positive case total and the test itself costs $1000 peradministration, what is the total first-year cost for treating 100,000 individuals?

Part 1:

Given the prevalence rate, sensitivity, and specificity estimates, what is the probability that an individual who is reported as positive by the new test actually has the disease?

To find the probability $P(\text{Disease}|\text{Positive})$, we use Bayes’ theorem with the following inputs:

Prevalence ($P(\text{Disease})$): 0.1% or 0.001
Sensitivity ($P(\text{Positive}|\text{Disease})$): 96% or 0.96
Specificity ($P(\text{Negative}|\text{Not Disease})$): 98% or 0.98

Bayes’ theorem formula for this scenario is:

\[ P(\text{Disease}|\text{Positive}) = \frac{P(\text{Positive}|\text{Disease}) \times P(\text{Disease})}{P(\text{Positive})} \]

Where:

$P(\text{Positive}) = P(\text{Positive}|\text{Disease}) \times P(\text{Disease}) + P(\text{Positive}|\text{Not Disease}) \times P(\text{Not Disease})$
$P(\text{Positive}|\text{Not Disease}) = 1 - \text{Specificity}$

Let’s calculate $P(\text{Disease}|\text{Positive})$ using these inputs.

prevalence <- 0.001
sensitivity <- 0.96
specificity <- 0.98

# Calculate the probability of a positive test result given no disease (false positive rate)
prob_positive_given_not_disease <- 1 - specificity

# Calculate the overall probability of testing positive
prob_positive <- (sensitivity * prevalence) + (prob_positive_given_not_disease * (1 - prevalence))

# Apply Bayes' theorem to calculate P(Disease|Positive)
prob_disease_given_positive <- (sensitivity * prevalence) / prob_positive

print(prob_disease_given_positive)

## [1] 0.04584527

The probability that an individual actually has the disease given a positive test result, $P(\text{Disease}|\text{Positive})$, is approximately 4.58%. This means that even with a positive test result from this new test, the chance of actually having the MNR HIV-1 is relatively low, primarily due to the rarity of the disease (low prevalence rate).

Part 2

If the median cost (consider this the best point estimate) is about $100,000 per positive case total and the test itself costs $1000 peradministration, what is the total first-year cost for treating 100,000 individuals?

To answer the question we need to consider:

The cost of administering the test to 100,000 individuals.
The expected number of positive cases, which is the product of the total number of individuals tested and the overall probability of testing positive.
The total treatment cost, which is the product of the expected number of positive cases and the median cost per positive case.

The total first-year cost is the sum of the cost of administering the tests to all individuals and the total treatment cost for all expected positive cases.

Let’s calculate the total first-year cost using the previously calculated probability of testing positive ($P(\text{Positive})$) and the given costs.

# Cost parameters
test_cost_per_administration = 1000
median_cost_per_positive_case = 100000
total_individuals = 100000

# Calculate the total cost of administering the tests to all individuals
total_testing_cost = total_individuals * test_cost_per_administration

# Calculate the expected number of positive cases
expected_positive_cases = total_individuals * prob_positive

# Calculate the total treatment cost for all expected positive cases
total_treatment_cost = expected_positive_cases * median_cost_per_positive_case

# Calculate the total first-year cost
total_first_year_cost = total_testing_cost + total_treatment_cost

total_first_year_cost

## [1] 309400000

The total first-year cost for treating 100,000 individuals, considering the cost of test administration and the treatment of expected positive cases, is approximately $309,400,000. This includes the cost of administering the test to all individuals and the treatment costs for those who test positive, based on the test’s sensitivity, specificity, and the prevalence rate of the disease.

2. (Binomial). The probability of your organization receiving a Joint Commission inspection in any given month is .05. What is the probability that, after 24 months, you received exactly 2 inspections? What is the probability that, after 24 months, you received 2 or more inspections? What is the probability that your received fewer than 2 inspections? What is the expected number of inspections you should have received? What is the standard deviation?

Given:

Number of trials ($n$) = 24 months
Probability of success on a single trial ($p$) = 0.05 (probability of receiving an inspection in any given month)

We need to find:

$P(X = 2)$: The probability of exactly 2 inspections in 24 months.
$P(X \geq 2)$: The probability of 2 or more inspections in 24 months.
$P(X < 2)$: The probability of fewer than 2 inspections in 24 months.
The expected number of inspections ($\mu$) = $n \times p$.
The standard deviation ($\sigma$) = $\sqrt{n \times p \times (1 - p)}$.

Using the base R function dbinom.

Probability of exactly 2 inspections:

prob_exactly_2 <- dbinom(2, size = 24, prob = 0.05)
prob_exactly_2

## [1] 0.2232381

Probability of 2 or more inspections:

prob_2_or_more <- 1 - pbinom(1, size = 24, prob = 0.05)
prob_2_or_more

## [1] 0.3391827

Probability of fewer than 2 inspections:

prob_fewer_than_2 <- pbinom(1, size = 24, prob = 0.05)
prob_fewer_than_2

## [1] 0.6608173

Expected number of inspections: This is simply the product of the number of trials and the probability of success on each trial.

expected_inspections <- 24 * 0.05
expected_inspections

## [1] 1.2

Standard deviation: The standard deviation of a binomial distribution is calculated as the square root of $n \times p \times (1 - p)$.

std_dev_inspections <- sqrt(24 * 0.05 * (1 - 0.05))
std_dev_inspections

## [1] 1.067708

3. (Poisson). You are modeling the family practice clinic and notice that patients arrive at a rate of 10 per hour. What is the probability that exactly 3 arrive in one hour? What is the probability that more than 10 arrive in one hour? How many would you expect to arrive in 8 hours? What is the standard deviation of the appropriate probability distribution? If there are three family practice providers that can see 24 templated patients each day, what is the percent utilization and what are your recommendations?

I used the Poisson distribution to analyze patient arrivals at a family practice clinic, where the rate is 10 patients per hour. In R, I calculated the probability of exactly 3 patients arriving in one hour using the dpois function, and found the probability of more than 10 patients arriving in one hour by subtracting the cumulative probability of 10 or fewer patients (ppois) from 1. The expected number of patients in 8 hours is simply 80, as the arrival rate is 10 per hour. The standard deviation, which represents the variability in patient arrivals, can be calculated as the square root of the expected number in 8 hours.

With three providers, each capable of seeing 24 patients a day, the clinic can handle 72 patients daily. By calculating the utilization rate, I found that the clinic is overutilized at 111.11%, indicating that demand exceeds capacity. This analysis suggests that the clinic may need to adjust staffing or operational hours to better accommodate patient arrivals.

# Probability of exactly 3 arrivals in one hour
prob_exactly_3 <- dpois(3, lambda = 10)

# Probability of more than 10 arrivals in one hour
prob_more_than_10 <- 1 - ppois(10, lambda = 10)

# Expected number of arrivals in 8 hours
expected_arrivals_8_hours <- 10 * 8

# Standard deviation for 8 hours
std_dev_8_hours <- sqrt(expected_arrivals_8_hours)

# Percent utilization
providers <- 3
patients_per_provider_per_day <- 24
total_capacity <- providers * patients_per_provider_per_day
percent_utilization <- (expected_arrivals_8_hours / total_capacity) * 100

4. (Hypergeometric). Your subordinate with 30 supervisors was recently accused of favoring nurses. 15 of the subordinate’s workers are nurses and 15 are other than nurses. As evidence of malfeasance, the accuser stated that there were 6 company-paid trips to Disney World for which everyone was eligible. The supervisor sent 5 nurses and 1 non-nurse. If your subordinate acted innocently, what was the probability he/she would have selected five nurses for the trips? How many nurses would we have expected your subordinate to send? How many non-nurses would we have expected your subordinate to send?

In this scenario, I analyzed the probability of a supervisor selecting 5 nurses out of 6 company-paid trips to Disney World using the Hypergeometric distribution in R. The total workforce consisted of 30 individuals, evenly divided between nurses and non-nurses. The dhyper function in R was used to find the probability of choosing exactly 5 nurses out of 6 possible selections, which turned out to be approximately 7.59%.

To understand the expected behavior without any bias, I calculated the expected number of nurses and non-nurses that should have been selected for the trips. Given the equal distribution of nurses and non-nurses, one would typically expect 3 nurses and 3 non-nurses to be selected for the 6 trips, under the assumption of a fair selection process. This analysis highlights the deviation in the actual selection from what would be statistically expected in a fair scenario.

# Probability of selecting 5 nurses out of 6 trips
prob_5_nurses <- dhyper(5, 15, 15, 6)

# Expected number of nurses
expected_nurses <- 6 * (15 / 30)

# Expected number of non-nurses
expected_non_nurses <- 6 * (15 / 30)

5. (Geometric). The probability of being seriously injured in a car crash in an unspecified location is about .1% per hour. A driver is required to traverse this area for 1200 hours in the course of a year. What is the probability that the driver will be seriously injured during the course of the year? In the course of 15 months? What is the expected number of hours that a driver will drive before being seriously injured? Given that a driver has driven 1200 hours, what is the probability that he or she will be injured in the next 100 hours?

I calculated the probability of a driver being injured in a car crash using the Geometric distribution in R, given a .1% chance of injury per hour. Over 1200 hours in a year, the chance of at least one injury is about 69.9%. Extending to 15 months, the probability slightly increases due to more driving hours. The expected time before the first injury occurs is around 1000 hours, highlighting the rarity of such events. Even after driving for 1200 hours, the probability of an injury in the next 100 hours remains at about 9.5%, due to the memoryless property of the Geometric distribution.

p <- 0.001  # Injury probability per hour

# Probability of injury over 1200 hours
prob_year <- 1 - (1 - p) ^ 1200

# Probability of injury over 15 months (1200 * 15/12 hours)
prob_15_months <- 1 - (1 - p) ^ (1200 * 15/12)

# Expected hours before injury
expected_hours <- 1 / p

# Probability of injury in the next 100 hours after 1200 hours
prob_next_100 <- 1 - (1 - p) ^ 100

6. You are working in a hospital that is running off of a primary generator which fails about once in 1000 hours. What is the probability that the generator will fail more than twice in 1000 hours? What is the expected value?

To determine the likelihood of the hospital’s primary generator failing more than twice in a 1000-hour timeframe and its expected failure rate, I utilized the Poisson distribution in R. The calculation revealed that the probability of the generator failing more than twice in this period is about 8.03%, with an expected failure rate of once every 1000 hours.

library(stats)

# Given mean rate of failure
lambda_per_1000_hours <- 1 

# Probability of more than 2 failures in 1000 hours
prob_more_than_2_failures <- 1 - ppois(2, lambda_per_1000_hours)

# Expected value (mean number of failures in 1000 hours) is simply the lambda
expected_failures <- lambda_per_1000_hours

print(paste('Probability of more than 2 failures:', prob_more_than_2_failures))

## [1] "Probability of more than 2 failures: 0.0803013970713942"

print(paste('Expected number of failures:', expected_failures))

## [1] "Expected number of failures: 1"

This R code snippet calculates the probability of experiencing more than two generator failures in 1000 hours and the expected number of failures within the same timeframe using the Poisson distribution.

7. A surgical patient arrives for surgery precisely at a given time. Based on previous analysis (or a lack of knowledge assumption), you know that the waiting time is uniformly distributed from 0 to 30 minutes. What is the probability that this patient will wait more than 10 minutes? If the patient has already waited 10 minutes, what is the probability that he/she will wait at least another 5 minutes prior to being seen? What is the expected waiting time?

For a patient arriving for surgery, given the waiting time is uniformly distributed between 0 and 30 minutes:

The probability of waiting more than 10 minutes is simply calculated by the proportion of the distribution above 10 minutes. Since it’s uniform, this is straightforward: $(30 - 10) / (30 - 0) = 20 / 30$.
If the patient has already waited 10 minutes, the remaining waiting time is still uniformly distributed between the remaining times (10 to 30 minutes). The probability of waiting at least another 5 minutes (so waiting more than 15 minutes in total) is $(30 - 15) / (30 - 10) = 15 / 20$.
The expected waiting time for a uniform distribution is the average of the minimum and maximum times, which in this case is $(0 + 30) / 2 = 15$ minutes.

Here’s the R code to calculate these probabilities and the expected waiting time:

# Define the parameters for the uniform distribution
min_wait <- 0  # Minimum wait time in minutes
max_wait <- 30  # Maximum wait time in minutes

# Probability of waiting more than 10 minutes
prob_more_than_10 <- 1 - punif(10, min_wait, max_wait)

# Probability of waiting at least another 5 minutes after already waiting 10
prob_additional_5_after_10 <- 1 - punif(15, min_wait, max_wait)

# The expected waiting time for a uniform distribution
expected_wait <- (min_wait + max_wait) / 2

print(paste('Probability of waiting more than 10 minutes:', prob_more_than_10))

## [1] "Probability of waiting more than 10 minutes: 0.666666666666667"

print(paste('Probability of waiting at least another 5 minutes after 10:', prob_additional_5_after_10))

## [1] "Probability of waiting at least another 5 minutes after 10: 0.5"

print(paste('Expected waiting time:', expected_wait, 'minutes'))

## [1] "Expected waiting time: 15 minutes"

This R code snippet calculates the required probabilities and the expected waiting time using the uniform distribution functions in R.

8. Your hospital owns an old MRI, which has a manufacturer’s lifetime of about 10 years (expected value). Based on previous studies, we know that the failure of most MRIs obeys an exponential distribution. What is the expected failure time? What is the standard deviation? What is the probability that your MRI will fail after 8 years? Now assume that you have owned the machine for 8 years. Given that you already owned the machine 8 years, what is the probability that it will fail in the next two years?

For analyzing the lifetime of an old MRI with an expected lifetime of 10 years and failure times following an exponential distribution, you can use the following R code snippet. This code will calculate the expected failure time, standard deviation, the probability of failure after 8 years, and the probability of failure in the next 2 years given the machine has already lasted 8 years.

# Define the rate parameter for the exponential distribution
# The expected value (mean) for an exponential distribution is 1/lambda
lambda <- 1 / 10  # 10 years is the expected lifetime

# The expected failure time is the mean of the distribution, which is 1/lambda
expected_failure_time <- 1 / lambda

# The standard deviation of an exponential distribution is also 1/lambda
std_dev <- 1 / lambda

# Probability that the MRI will fail after 8 years
prob_fail_after_8_years <- exp(-lambda * 8)

# For the memoryless property of the exponential distribution,
# the probability that the machine will fail in the next 2 years, given it has already lasted 8 years,
# is the same as the probability of failing within 2 years from the start
prob_fail_next_2_years_after_8 <- exp(-lambda * 2)

print(paste('Expected failure time:', expected_failure_time, 'years'))

## [1] "Expected failure time: 10 years"

print(paste('Standard deviation:', std_dev, 'years'))

## [1] "Standard deviation: 10 years"

print(paste('Probability of failing after 8 years:', prob_fail_after_8_years))

## [1] "Probability of failing after 8 years: 0.449328964117222"

print(paste('Probability of failing in the next 2 years after 8 years:', prob_fail_next_2_years_after_8))

## [1] "Probability of failing in the next 2 years after 8 years: 0.818730753077982"

Haig Bedros - HW5

2024-02-27

Part 1:

Part 2

6. You are working in a hospital that is running off of a primary generator which fails about once in 1000 hours. What is the probability that the generator will fail more than twice in 1000 hours? What is the expected value?