Probability

Question 1

(Bayesian). A new test for multinucleoside-resistant (MNR) human immunodeficiency virus type 1 (HIV-1) variants was recently developed. The test maintains 96% sensitivity, meaning that, for those with the disease, it will correctly report “positive” for 96% of them. The test is also 98% specific, meaning that, for those without the disease, 98% will be correctly reported as “negative.” MNR HIV-1 is considered to be rare (albeit emerging), with about a .1% or .001 prevalence rate. Given the prevalence rate, sensitivity, and specificity estimates, what is the probability that an individual who is reported as positive by the new test actually has the disease?

Let $B$ be the event of a positive test, $A1$ the event that the individual actually has MNR HIV-1, and $A2$ the event that the individual does not.

Here we can use Bayes' theorem, expanding the denominator with the law of total probability:

\[P(A1 \mid B)=\frac{P(B \mid A1)P(A1)}{P(B \mid A1)P(A1)+P(B \mid A2)P(A2)}\]

# set probs
p_b_a1 <- 0.96
p_b_a2 <- 1-0.98
p_a1 <- 0.001
p_a2 <- 0.999

# compute
q1 <- round((p_b_a1*p_a1) / ((p_b_a1 * p_a1) + (p_b_a2 * p_a2)),4)
print(paste0("The probability that an individual who is reported as positive actually has the disease is ",q1))

## [1] "The probability that an individual who is reported as positive actually has the disease is 0.0458"
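The low value can be sanity-checked with natural frequencies. The following is a minimal sketch, assuming a hypothetical cohort of 100,000 people (the cohort size and variable names are illustrative and not part of the original answer):

```r
# Natural-frequency cross-check of the Bayes result
cohort <- 100000                     # hypothetical number of people tested
diseased <- cohort * 0.001           # expected people with the disease
healthy <- cohort - diseased         # expected people without the disease
true_pos <- diseased * 0.96          # diseased people who test positive
false_pos <- healthy * (1 - 0.98)    # healthy people who test positive
ppv <- true_pos / (true_pos + false_pos)
round(ppv, 4)
```

Because the disease is rare, false positives outnumber true positives by roughly 20 to 1, which is why fewer than 5% of positive results correspond to actual disease.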

If the median cost (consider this the best point estimate) is about $100,000 per positive case total, and the test itself costs $1000 per administration, what is the total first-year cost for treating 100,000 individuals?

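# set variables
test_per_cost <- 1000   # cost per test administration
cost_per_pos <- 100000  # median cost per positive case
ind <- 100000           # number of individuals tested

# compute
pos_cases <- ind * p_a1 # expected actual positive cases, using the prevalence from part a
test_cost <- test_per_cost * ind
pos_cost <- pos_cases * cost_per_pos
tot_cost <- test_cost + pos_cost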

class(format(tot_cost, scientific = F))

## [1] "character"

paste0("The total cost for treating 100,000 people would be $",format(tot_cost, scientific = F))

## [1] "The total cost for treating 100,000 people would be $110000000"
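The figure above counts only individuals who actually have the disease. If "positive case" is instead read as anyone who tests positive, false positives add to the cost. A minimal sketch under that alternative reading (the reading itself and the names `p_pos_test` and `tot_cost_alt` are assumptions, not part of the original answer):

```r
# Alternative reading: every positive test result incurs the per-case cost
ind <- 100000            # number of individuals tested
test_per_cost <- 1000    # cost per test administration
cost_per_pos <- 100000   # median cost per positive case

# P(positive test) = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
p_pos_test <- 0.96 * 0.001 + (1 - 0.98) * 0.999
pos_tests <- ind * p_pos_test                      # expected positive test results
tot_cost_alt <- test_per_cost * ind + pos_tests * cost_per_pos
format(tot_cost_alt, scientific = FALSE)
```

Under this reading the expected number of positive tests is about 2,094 and the total comes to roughly $309.4 million, so the interpretation of "positive case" matters a great deal.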

Question 2

(Binomial). The probability of your organization receiving a Joint Commission inspection in any given month is .05.

What is the probability that, after 24 months, you received exactly 2 inspections?

We can use the binomial formula:

\[P_x=\left(\begin{array}{l} n \\ x \end{array}\right) p^x q^{n-x}\]

# set variables
x <- 2
n <- 24
p <- 0.05
q <- 1 - p

# compute
(p_x_2 <- (factorial(n) / (factorial(n - x) * factorial(x))) * (p^x) * (q^(n-x)))

## [1] 0.2232381

# cross check
dbinom(x = 2, size = 24, prob = 0.05)

## [1] 0.2232381

What is the probability that, after 24 months, you received 2 or more inspections?

Here we take the complement of receiving zero or one inspection: $P(X \ge 2) = 1 - P(0) - P(1)$

# set variables
x_0 <- 0
x_1 <- 1

# compute
p_x_0 <- (factorial(n) / (factorial(n - x_0) * factorial(x_0))) * (p^x_0) * (q^(n-x_0))
p_x_1 <- (factorial(n) / (factorial(n - x_1) * factorial(x_1))) * (p^x_1) * (q^(n-x_1))
(p_x_2_more <- 1 - (p_x_0 + p_x_1))

## [1] 0.3391827

# crosscheck
1-(dbinom(1,24,.05)+dbinom(0,24,.05))

## [1] 0.3391827
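The same upper-tail probability can also be read directly from the binomial CDF. A short sketch using the `lower.tail` argument of `pbinom` (the variable name `p_two_or_more` is illustrative):

```r
# P(X >= 2) = P(X > 1), taken from the upper tail of the binomial CDF
n <- 24
p <- 0.05
p_two_or_more <- pbinom(1, size = n, prob = p, lower.tail = FALSE)
p_two_or_more
```

Using `lower.tail = FALSE` avoids the explicit subtraction from 1, which can lose precision when the tail probability is very small.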

What is the probability that you received fewer than 2 inspections?

# compute
(p_x_2_fewer <- p_x_0 + p_x_1)

## [1] 0.6608173

# cross check
pbinom(1,24,.05)

## [1] 0.6608173

What is the expected number of inspections you should have received?

n*p

## [1] 1.2

What is the standard deviation?

sqrt(n*p*(1-p))

## [1] 1.067708

Question 3

(Poisson). You are modeling the family practice clinic and notice that patients arrive at a rate of 10 per hour.

To model the arrival of patients at the clinic, we can use a Poisson distribution, where the parameter λ represents the average number of arrivals per hour. Given that the arrival rate is 10 per hour, we have:

λ = 10

The Poisson probability mass function is:

\[f(x)=\frac{\lambda^x}{x !} e^{-\lambda}\]

What is the probability that exactly 3 arrive in one hour?

# set variables
lambda <- 10
x <- 3
x_fac <- factorial(x)
e <- exp(1)

# compute
(p_3 <- (lambda^x / x_fac) * e^-lambda)

## [1] 0.007566655

# cross check
dpois(x,lambda)

## [1] 0.007566655

What is the probability that more than 10 arrive in one hour?

1-sum(dpois(0:10,lambda))

## [1] 0.4169602

How many would you expect to arrive in 8 hours?

lambda*8

## [1] 80

What is the standard deviation of the appropriate probability distribution?

sqrt(lambda*1)

## [1] 3.162278
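The value above is for a one-hour window. The question does not say which window it intends; if it refers to the 8-hour day from the previous part, then, assuming arrivals in different hours are independent, the daily count is Poisson with mean $8\lambda$ and the standard deviation becomes:

\[\sigma_{8\text{h}}=\sqrt{8 \lambda}=\sqrt{80} \approx 8.94\]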

If there are three family practice providers that can see 24 templated patients each day, what is the percent utilization and what are your recommendations?

# set variables
dr <- 3
hrs_day <- 8
pat_ea_dr <- 24
tot_seen <- dr*pat_ea_dr
exp_pat <- hrs_day*lambda

#compute
util_perc <- round(exp_pat/tot_seen * 100)

print(paste0("Percent utilization is ",util_perc,"%, they are understaffed, perhaps hire a part-time dr."))

## [1] "Percent utilization is 111%, they are understaffed, perhaps hire a part-time dr."
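To support the staffing recommendation, it may help to estimate how often daily demand exceeds the templated capacity. A rough sketch, assuming daily arrivals over an 8-hour day are Poisson with mean 80 (the names `daily_mean`, `capacity`, and `p_over_capacity` are illustrative):

```r
# Probability that daily arrivals exceed the 72 templated slots
daily_mean <- 8 * 10     # expected arrivals in an 8-hour day
capacity <- 3 * 24       # three providers, 24 templated patients each
p_over_capacity <- ppois(capacity, daily_mean, lower.tail = FALSE)  # P(X > 72)
p_over_capacity
```

Since the mean demand already sits above capacity, this probability is well over one half, which is consistent with the understaffing conclusion.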

Question 4

(Hypergeometric). Your subordinate with 30 supervisors was recently accused of favoring nurses. 15 of the subordinate’s workers are nurses and 15 are other than nurses. As evidence of malfeasance, the accuser stated that there were 6 company-paid trips to Disney World for which everyone was eligible. The supervisor sent 5 nurses and 1 non-nurse.

If your subordinate acted innocently, what was the probability he/she would have selected five nurses for the trips?

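We can use the hypergeometric distribution, which models sampling without replacement:

\[P(X=k)=\frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}\]

Here $N = 30$ is the number of workers, $K = 15$ the number of nurses, $n = 6$ the number of trips, and $k = 5$ the number of nurses selected.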

prob_5_nurses <- dhyper(5,15,15,6, log = F)
prob_5_nurses

## [1] 0.07586207
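As a cross-check, the same probability can be computed directly from binomial coefficients with `choose()`. The sketch below also shows the "five or more nurses" tail, which is a different question from the one asked but is often the quantity of interest when judging evidence of favoritism (the variable names are illustrative):

```r
# Hypergeometric probability from binomial coefficients
N <- 30   # total workers
K <- 15   # nurses
n <- 6    # trips awarded
k <- 5    # nurses selected
p_manual <- choose(K, k) * choose(N - K, n - k) / choose(N, n)
p_manual

# P(X >= 5): five or six nurses selected
p_five_or_more <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)
p_five_or_more
```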

How many nurses would we have expected your subordinate to send?

We can use the expected value formula.

# E(X) = n * K / N: expected number of nurses among the n = 6 trips
exp_nurses <- 6 * 15 / 30
exp_nurses

## [1] 3

How many non-nurses would we have expected your subordinate to send?

exp_non_nurses <- 6 * 15 / 30
exp_non_nurses

## [1] 3

Question 5

(Geometric). The probability of being seriously injured in a car crash in an unspecified location is about .1% per hour. A driver is required to traverse this area for 1200 hours in the course of a year.

What is the probability that the driver will be seriously injured during the course of the year?

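# P(at least 1 injury in 1200 hours) = 1 - (1 - p)^1200
# pgeom(q, p) counts injury-free hours before the first injury,
# so P(injured by hour h) = pgeom(h - 1, p)

pgeom(1200 - 1, .001)

## [1] 0.6989866

In the course of 15 months?

# 15 months equates to 1200 + 300 = 1500 hours
pgeom(1500 - 1, .001)

## [1] 0.7770372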

What is the expected number of hours that a driver will drive before being seriously injured?

#E[X]=1/p
1/.001

## [1] 1000

Given that a driver has driven 1200 hours, what is the probability that he or she will be injured in the next 100 hours?

# The geometric distribution is memoryless, so
# P(injured in next 100 hours | not injured in first 1200 hours)
# = P(injured within 100 hours) = 1 - (1 - p)^100
pgeom(100 - 1, .001)

## [1] 0.09520785
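For reference, the conditional probability can be worked out from the survival function of the geometric distribution. With $T$ the hour of the first serious injury and $p = 0.001$, we have $P(T > t) = (1-p)^t$, so:

\[P(T \le 1300 \mid T>1200)=1-\frac{P(T>1300)}{P(T>1200)}=1-\frac{(1-p)^{1300}}{(1-p)^{1200}}=1-(1-p)^{100} \approx 0.0952\]

The hours already driven cancel out, which is the memoryless property: the risk over the next 100 hours is the same as for a driver who is just starting.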

Question 6

You are working in a hospital that is running off of a primary generator which fails about once in 1000 hours.

What is the probability that the generator will fail more than twice in 1000 hours?

We can use the Poisson Distribution to solve this.

# model with Poisson, lambda = 1 failure per 1000 hours
# P(X > 2) = 1 - P(0) - P(1) - P(2) = 1 - ppois(2, lambda)
# set variables
lambda <- 1
x <- 2
1-ppois(x,lambda)

## [1] 0.0803014

What is the expected value?

lambda

## [1] 1

Question 7

A surgical patient arrives for surgery precisely at a given time. Based on previous analysis (or a lack of knowledge assumption), you know that the waiting time is uniformly distributed from 0 to 30 minutes.

What is the probability that this patient will wait more than 10 minutes?

# uniform on [0, 30]: P(X > 10) = 1 - F(10)
1-punif(10,0,30)

## [1] 0.6666667

If the patient has already waited 10 minutes, what is the probability that he/she will wait at least another 5 minutes prior to being seen?

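# conditional probability: P(A | B) = P(A and B) / P(B)
# A = waits more than 15 minutes, B = waits more than 10 minutes
# A implies B, so P(A and B) = P(A)
# PA = 1 - F(15), PB = 1 - F(10), with F the CDF of Uniform(0, 30)
PA <- 1 - punif(15, 0, 30)
PB <- 1 - punif(10, 0, 30)
PA/PB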

## [1] 0.75

What is the expected waiting time?

# expected waiting time E[X]=(a+b)/2
(0+30)/2

## [1] 15

Question 8

Your hospital owns an old MRI, which has a manufacturer’s lifetime of about 10 years (expected value). Based on previous studies, we know that the failure of most MRIs obeys an exponential distribution.

What is the expected failure time?

# define failure rate as lambda
lifetime <- 10
lambda <- 1/lifetime

# compute (expected failure time is the reciprocal of the failure rate)
(exp_fail_time <- 1/lambda)

## [1] 10
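The rate can also be used with the exponential CDF to ask how likely a failure is within a given horizon. A minimal sketch, where the 10-year horizon is chosen purely for illustration and is not part of the original question:

```r
# P(failure within 10 years) for an exponential lifetime with mean 10 years
lifetime <- 10
rate <- 1 / lifetime
p_fail_by_10 <- pexp(10, rate = rate)   # equals 1 - exp(-1)
p_fail_by_10
```

Note that the probability of failing before the expected lifetime is above one half, because the exponential distribution is right-skewed.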