hw5

Question 1

(Bayesian). A new test for multinucleoside-resistant (MNR) human immunodeficiency virus type 1 (HIV-1) variants was recently developed. The test maintains 96% sensitivity, meaning that, for those with the disease, it will correctly report “positive” for 96% of them. The test is also 98% specific, meaning that, for those without the disease, 98% will be correctly reported as “negative.” MNR HIV-1 is considered to be rare (albeit emerging), with about a .1% or .001 prevalence rate. Given the prevalence rate, sensitivity, and specificity estimates, what is the probability that an individual who is reported as positive by the new test actually has the disease?*

In probability notation, what we are looking to solve for is the $P(H|Pos)$, or the probability of a person genuinely having HIV-1 given that they got a positive test result. Bayes’ theorem allows for calculating such a probability–that is, the probability of an event (the person having HIV-1) given a known, related outcome (a positive test), along with the probabilities of relevant outcomes.

Per Bayes’ theorem, substituting in our variables: \[ P(H|Pos) = \frac{P(Pos|H) * P(H)}{P(Pos)} \] We mostly have what we need here. $P(Pos|H)$ is the probability of a positive test result when the patient actually has HIV-1, also known as the sensitivity of the test (0.98). $P(H)$ represents the likelihood that any given person has HIV-1 (0.001).

We are not given $P(Pos)$ directly, as there are multiple circumstances under which someone could test positive (ie. that they actually have the disease and the test catches it, or they do not and the test registers a false positive). So in our case, we must treat $P(Pos)$ as the union of the range of possibilities for testing positive. $P(Pos) = P(Pos|H\cup Pos|H')$. Using our knowledge of probability unions, this is equivalent to $P(Pos|H) * P(H) + P(Pos|H') * P(H')$. We now have all the values we need to calculate the probability of $P(H|Pos)$:

\[ P(H|Pos) = \frac{P(Pos|H) * P(H)}{P(Pos|H) * P(H) + P(Pos|H') * P(H')} \]

P_pos_given_H <- 0.096
P_H <- 0.001
P_pos_given_H_prime <- 0.02
P_H_prime <- 0.999

(P_pos_given_H * P_H) / ((P_pos_given_H * P_H) + (P_pos_given_H_prime * P_H_prime))

## [1] 0.004781829

\[ P(H|Pos) = \frac{(0.96) * (0.001)}{(0.96) * (0.001) + (0.02) * (0.999)} = 0.0478 \]

If the median cost (consider this the best point estimate) is about $100,000 per positive case total and the test itself costs $1000 per administration, what is the total first-year cost for treating 100,000 individuals?

In order to get this total cost figure, we must add the flat cost of testing 100,000 indidividuals ($\$1,000 * 100,000$ or $\$100,000,000$) with the cost of treating a person who tested positive ($\$100,000$) times the number of likely positives in a sample of 100,000 people.

That second number should be give by the combined probability $P(Pos)$ of testing positive given someone actually has the disease (represented by $P(Pos|H)$), and the probability of testing positive without the disease ($P(Pos|H')$). We know from the denominator of our above calculation that that amounts to $(0.98) * (0.001) + (0.02) * (0.999)$, or $0.02978$. Therefore, the total cost of treating those 100,000 people is:

\[ (n\;people) * (test\;cost) + (treatment\;cost) * ((n\;people) * (pos.\;test\:likelihood)) \]

n_people <- 100000
cost_per_test <- 1000

treatment_cost <- 100000
pos_test_likelihood <- ((P_pos_given_H * P_H) + (P_pos_given_H_prime * P_H_prime))

(n_people * cost_per_test) + treatment_cost * (n_people * pos_test_likelihood)

## [1] 300760000

\[ (100,000) * (1,000) + (100,000) * ((100,000) * (0.020076)) \]

That calculates out to $\$300,760,000$.

Question 2

(Binomial). The probability of your organization receiving a Joint Commission inspection in any given month is .05. What is the probability that, after 24 months, you received exactly 2 inspections?

For this problem, because it involves trials over a set period, where there is either success or failure at a given probability, we can rely on the binomial probability formula.

\[ P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} \] Per this formula, $P(X=k)$ represent the probability that in $n$ number of trials (in our case, months), there will be exactly $k$ successes (inspections…though I doubt the organization would consider those successes!), given the discrete probability $p$ of any one of those successes (our 5% chance of an inspection). Let’s plug in the appropriate values for our variables for this first question and solve accordingly.

Note that $\binom{n}{k}$ does not represent a fraction, but the number of ways in which $k$ successes can occur in $n$ trials. That can be calculated using factorials accordingly:

\[ \frac{n!}{(n-k)!k!} \]

n_months <- 24
k_inspections <- 2
p_of_inspection <- 0.05

(factorial(n_months)) / (factorial(n_months-k_inspections) * factorial(k_inspections)) * (p_of_inspection ^ k_inspections) * ((1 - p_of_inspection)^(n_months - k_inspections))

## [1] 0.2232381

So, the probability of getting exactly 2 inspections in 24 months is about 22.3%.

What is the probability that, after 24 months, you received 2 or more inspections?

The this probability could be represented as the sum of probabilities of 2 inspections, 3 inspections, 4, 5, etc. all the way up to 24 (the max number of possible inspections). While there are certainly easier ways of doing this, now that I have a formula for solving the probability for any $k$ number of inspections, I can iteratively sum those vales for a series of $k$s.

prob_of_k_inspections <- function(n_months,
                                  k_inspections,
                                  p_of_inspection) {
  return((factorial(n_months)) / (factorial(n_months-k_inspections) * factorial(k_inspections)) * (p_of_inspection ^ k_inspections) * ((1 - p_of_inspection)^(n_months - k_inspections)))
}

total_prob <- 0

for (i in 2:24) {
  total_prob <- total_prob + prob_of_k_inspections(24, i, 0.05)
}

print(total_prob)

## [1] 0.3391827

The total probability of 2 or more inspections is about 33.9%.

What is the probability that your received fewer than 2 inspections?

Using the same basic methodology:

total_prob <- 0

for (i in 0:1) {
  total_prob <- total_prob + prob_of_k_inspections(24, i, 0.05)
}

print(total_prob)

## [1] 0.6608173

The probability of receiving 1 or 0 inspections in 2 years is roughly 66.1%. Not the worst odds if you’re playing fast and loose on standards…

What is the expected number of inspections you should have received? What is the standard deviation?

The expected number of inspections is, quite simply, the number of trials (months) times the probability of success (inspection) for an individual trial.

24 * 0.05

## [1] 1.2

In our case, that works out to around 1.2 inspections.

Meanwhile, the standard deviation is given by the square root of the number of trials, times the probability of success, times the probability of failure (represented by 1 minus the success probability). In our case:

sqrt(24 * (0.05) * (1 - 0.05))

## [1] 1.067708

So, our standard deviation is about 1.07.

Question 3

(Poisson). You are modeling the family practice clinic and notice that patients arrive at a rate of 10 per hour. What is the probability that exactly 3 arrive in one hour?

Here, we rely on the Poisson distribution, because we are looking at the number of occurrences of an event over a specific period of time. That distribution is modeled as:

\[ P(X = k) = \frac{e^{-\lambda}\lambda^k}{k!} \] Here, $k$ represents our number of events (in our case, patients), and $\lambda$ is our average rate of events for our time period (in the first part of the question, 10 per hour).

#this function returns e^n, and e^1 = e.
e <- exp(1)
k_patients <- 3
lambda_patients_per_hour <- 10

(e^(-1*lambda_patients_per_hour) * lambda_patients_per_hour^k_patients) / factorial(k_patients)

## [1] 0.007566655

The probability of precisely 3 patients arriving in an hour is around 0.76%.

What is the probability that more than 10 arrive in one hour?

As in question 2, we can find probabilities for spans of values by running our Poisson distribution formula across all possible values. In this case, because there are technically infinitely many possibilities >10, we are better off subtracting the probabilities of 10 patients or less from 1.

poisson_clinic <- function (e, k_patients, lambda_patients_per_hour) {
  return((e^(-1*lambda_patients_per_hour) * lambda_patients_per_hour^k_patients) / factorial(k_patients))
}

p_of_10_or_fewer <- 0

for (i in 0:10) {
  p_of_10_or_fewer <- p_of_10_or_fewer + poisson_clinic(exp(1),i,10)
}

1 - p_of_10_or_fewer

## [1] 0.4169602

Therefore, the probability of getting more than 10 patients in a given hour is around 41.2%.

Now, as the number of patients increases, the chances should theoretically get so infinitesimally small that we could get roughly the same result by looping over some large number of patient counts above 10. Let’s try it:

p_of_10_or_more <- 0

for (i in 11:100) {
  p_of_10_or_more <- p_of_10_or_more + poisson_clinic(exp(1),i,10)
}

p_of_10_or_more

## [1] 0.4169602

Neat!

How many would you expect to arrive in 8 hours?

Quite simply, if 10 patients on average came in per hour, I would expect around 80 patients across 8 hours (10 times the number of hours).

What is the standard deviation of the appropriate probability distribution?

The standard deviation of a Poisson distribution is the square root of $\lambda$, so $\sqrt{10}$ or roughly:

sqrt(10)

## [1] 3.162278

If there are three family practice providers that can see 24 templated patients each day, what is the percent utilization and what are your recommendations?

If each of the three providers can see 24 patients a day, the total capacity of the clinic is $3 * 24$, or $72$ patients. Given our expected intake of $80$ patients, the facility would be at $80 / 72$ or roughly $111%* utilization. One more provider would make the capacity %96% patients, accounting for expected volume as well as a small cushion of variability in case a day’s intake is abnormally high.

Question 4

(Hypergeometric). Your subordinate with 30 supervisors was recently accused of favoring nurses. 15 of the subordinate’s workers are nurses and 15 are other than nurses. As evidence of malfeasance, the accuser stated that there were 6 company-paid trips to Disney World for which everyone was eligible. The supervisor sent 5 nurses and 1 non-nurse. If your subordinate acted innocently, what was the probability he/she would have selected five nurses for the trips?

Here, we will rely on the hypergeometric probability distribution for two reasons: because we are making selections without replacement from a finite population and, more imporantly, because the question tells us it’s a hypergeometric distribution problem in the beginning.

The formula for that distribution is as follows:

\[ P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}} \] Here, the “k”s represent successes (or in our case, nurses). Capital $K$ represents the number of possible successes in the population (so, nurses among the staff in question) and lower-case $k$ is the number of resulting successes (the number of nurses chosen for the trip). The “n”s are analagous, in the sense that capital $N$ stands in for the full population (all relevant staff) and lower-case $n$ is the number of “draws” from that population (the number of slots for the Disney trip).

Because binomial coefficients (each instance of a number over another in parentheses) are cumbersome, I will write a quick function to represent them on the pathway to solving this problem.

binom_coeff <- function(a, b) {
  return(factorial(a) / (factorial(b) * factorial(a-b)))
}

Now, plugging in my relevant values to the binomial coefficient formula and, extending further, in the hypergeometric distribution formula:

N <- 30
n <- 6
K <- 15
k  <- 5

P_of_5_nurses_chosen <- (binom_coeff(K,k) * binom_coeff(N-K,n-k)) / binom_coeff(N,n)

P_of_5_nurses_chosen

## [1] 0.07586207

The chances of picking fairly/randomly from among the staff and sending 5 nurses is about 7.6%.

The chances of at least 5 nurses being chosen can be determined by adding in the probability of all Disney-bound staff being nurses.

P_of_6_nurses_chosen <- (binom_coeff(K,6) * binom_coeff(N-K,n-6)) / binom_coeff(N,n)

P_of_5_nurses_chosen + P_of_6_nurses_chosen

## [1] 0.08429119

So the chance of selecting all 6 nurses raises our probability of at least 5 nurses only slightly.

How many nurses would we have expected your subordinate to send? How many non-nurses would we have expected your subordinate to send?

Given the even split among the staff, I would expect 3 nurses and 3 non nurses. 5 nurses is by no means impossible in this probability distribution…but it raises an eyebrow!

Question 5

(Geometric). The probability of being seriously injured in a car crash in an unspecified location is about .1% per hour. A driver is required to traverse this area for 1200 hours in the course of a year. What is the probability that the driver will be seriously injured during the course of the year?

Once again, we’ll trust the question and use the geometric distribution formula. The geometric distribution formula is given as follows:

\[ P(X=k) = (1-p)^{k-1} \] Here, $p$ is the probability of injury per hour, and $k$ is the number of hours. If we figure out the probability for $k=1200$, we will have found the probability that the first injury does NOT occur in the first 1200 trials (hours). Subtracting that from 1 tells us the probability that an injury DID occur in those first 1200.

1 - ((1 - 0.001) ^ (1201 -1))

## [1] 0.6989866

So there is about a 70% chance of injury in those 1200 hours.

In the course of 15 months? What is the expected number of hours that a driver will drive before being seriously injured?

Using the same logic, we can sub in $15 * 1200 = 18,000$ hours.

1 - ((1 - 0.001) ^ (18000 -1))

## [1] 1

According to my math, a serious injury is virtually guaranteed at this amount of driving. Ah!

Given that a driver has driven 1200 hours, what is the probability that he or she will be injured in the next 100 hours?

The geometric distribution is considered “memoryless,” meaning the fact that the driver has already driven 1200 hours has no bearing on the likelihood that they will be in an accident in the next 100. Therefore, we can use the same formula we’ve been using, plugging in 100.

1 - ((1 - 0.001) ^ (100 -1))

## [1] 0.09430216

There is a roughly 9.4% chance the driver will get in an accident in the next 100 hours.

Question 6

You are working in a hospital that is running off of a primary generator which fails about once in 1000 hours. What is the probability that the generator will fail more than twice in 1000 hours?

This feels ripe for a Poisson distribution, because we have an average rate of “success” (1 failure per 1000 hours) and we want the probability of a different number of successes in the same time period. Recall our formula:

\[ P(X = k) = \frac{e^{-\lambda}\lambda^k}{k!} \] In this case, $k$ represents the number of events (generator failures) for which we are trying to understand our probability, and $\lambda$ is our average rate of events for our time period (1 per 1000 hours). Since the question is asking for the probability of the generator failing more than twice, we can find the probability of it failing 0, 1 or 2 times and subtracting that from 1.

#this function returns e^n, and e^1 = e.
e <- exp(1)
lambda_failures_per_1000 <- 1

p_of_2_or_fewer_failures <- 0

for (k in 0:2) {
  p <- (e^(-1*lambda_failures_per_1000) * lambda_failures_per_1000^k) / factorial(k)
  p_of_2_or_fewer_failures <- p_of_2_or_fewer_failures + p
}

1 - p_of_2_or_fewer_failures

## [1] 0.0803014

The probability that the generator will fail more than two times in 1000 hours is about 8.03%.

What is the expected value?

The expected value is given: 1!

Question 7

A surgical patient arrives for surgery precisely at a given time. Based on previous analysis (or a lack of knowledge assumption), you know that the waiting time is uniformly distributed from 0 to 30 minutes. What is the probability that this patient will wait more than 10 minutes?

In a uniform distribution, probabilities can be understood as simple proportions. Of all the time in a half hour, the amount greater than 10 minutes is 20 minutes. Therefore, we can find the probability of waiting more than 10 minutes by dividing that block of time (20) by the total interval (30), resulting in about 66.7%.

If the patient has already waited 10 minutes, what is the probability that he/she will wait at least another 5 minutes prior to being seen?

The uniformity of the probability distribution does not change given how long the person has waited; all that changed is the overall amount of time from which the final value can be drawn. Given there are 20 minutes remaining in the half hour, and 15 of those minutes occur at least 5 minutes after the current time, the likelihood that person will have to wait another 5 minutes at last is 15/20, or 75%.

What is the expected waiting time?

In this case, the expected waiting time would be the average of the two ends of the interval, sitting right in the middle of it. In this case, it would be 15 minutes.

Question 8

Your hospital owns an old MRI, which has a manufacturer’s lifetime of about 10 years (expected value). Based on previous studies, we know that the failure of most MRIs obeys an exponential distribution. What is the expected failure time?

The expected failure time is 10 years.

What is the standard deviation?

The standard deviation is also 10 years, as it is given as the reciprocal of the rate of deterioration (which is 1/10 per year)

What is the probability that your MRI will fail after 8 years?

Here, because the failure follows an exponential distribution, we can rely on the cumulative distribution function, given by:

\[ F(x) = 1 - e^{-\lambda x} \] Here, $\lambda$ is the rate (in our case, 1/10 per year) and $x$ is the time period in question, 8 years. Solving for F:

1 - exp(1) ^ ((-1 * 0.1)*8)

## [1] 0.550671

The probability that the MRI machine fails within 8 years is roughly 55.1%. That means that the chance it will fail after that period is 1 minus 55.1%, or 44.9%.

Now assume that you have owned the machine for 8 years. Given that you already owned the machine 8 years, what is the probability that it will fail in the next two years?

Using the exponential distribution, we are not measuring decay per se. That is to say, the fact that the machine has survived 8 years does not impact our assessment of whether it will fail in the next two. Therefore, we approach the problem the same we did the previous, only with an $x$ of 2 years (and skipping subtracting from 1, as we are interested in the interval within 2 years.)

1 - exp(1) ^ ((-1 * 0.1)*2)

## [1] 0.1812692

The probability the machine will fail in the next two years is about 18.1%.

hw5

2024-02-24