data_605_hw5

Question 1 (Bayesian)

A new test for multinucleoside-resistant (MNR) human immunodeficiency virus type 1 (HIV-1) variants was recently developed. The test maintains 96% sensitivity, meaning that, for those with the disease, it will correctly report “positive” for 96% of them. The test is also 98% specific, meaning that, for those without the disease, 98% will be correctly reported as “negative.” MNR HIV-1 is considered to be rare (albeit emerging), with about a .1% or .001 prevalence rate. Given the prevalence rate, sensitivity, and specificity estimates, what is the probability that an individual who is reported as positive by the new test actually has the disease? If the median cost (consider this the best point estimate) is about 100,000 per positive case total and the test itself costs 1000 per administration, what is the total first-year cost for treating 100,000 individuals?

# What is the chance of really getting the disease given the positive result, i.e. P(A|B)? 
sens = .96
spec = .98
prev = .001

pool_of_individuals = 100000
num_detected_really_with_disease = prev * pool_of_individuals * sens
num_detected_but_actually_without_disease = (pool_of_individuals - prev * pool_of_individuals) * (1 - spec)
actual_num_patients = num_detected_really_with_disease + num_detected_but_actually_without_disease
chance_really_getting_disease = (sens * prev) / ((sens * prev) + ((1 - spec) * (1 - prev)))

print(paste0("Given the sensitivity is equal to 96%, specificity equal to 98% and prevalence equal to .1%, the chance of really getting the disease given a positive result, i.e. P(A|B), is roughly ", round({chance_really_getting_disease}*100, 2), "%"))

## [1] "Given the sensitivity is equal to 96%, specificity equal to 98% and prevalence equal to .1%, the chance of really getting the disease given a positive result, i.e. P(A|B), is roughly 4.58%"

print(paste0("In addition, the 100,000 tests for targeting the same number of individuals would have initially costed $100,000,000 (since that is $1000 per administration), and given the expected number of positive cases out of the targets to be roughly ", {actual_num_patients}, ", the medical cost for treating these patients would be $209,400,000. Together, the overall cost is roughly $309,400,000."))

## [1] "In addition, the 100,000 tests for targeting the same number of individuals would have initially costed $100,000,000 (since that is $1000 per administration), and given the expected number of positive cases out of the targets to be roughly 2094, the medical cost for treating these patients would be $209,400,000. Together, the overall cost is roughly $309,400,000."

Question 2 (Binomial)

The probability of your organization receiving a Joint Commission inspection in any given month is .05. What is the probability that, after 24 months, you received exactly 2 inspections? What is the probability that, after 24 months, you received 2 or more inspections? What is the probability that you received fewer than 2 inspections? What is the expected number of inspections you should have received? What is the standard deviation?

prob = .05
num_of_inspection = 2
trial = 24

prob.dist <- dbinom(1:trial, trial, prob)
mu <- sum(1:trial * prob.dist)
variance <- sum((1:trial)^2 * prob.dist) - mu^2
sd <- sqrt(variance)

exactly_two_inspection = dbinom(num_of_inspection, trial, prob)
print(glue(paste0("The probability of getting exactly {num_of_inspection} inspections after {trial} months is about ", round({exactly_two_inspection}*100, 2), "%")))

## The probability of getting exactly 2 inspections after 24 months is about 22.32%

two_or_more <- pbinom(1, trial, prob, lower.tail = FALSE)
print(glue(paste0("The probability of getting ", {num_of_inspection}, " or more inspections after {trial} months is about ", round({two_or_more}*100, 2), "%")))

## The probability of getting 2 or more inspections after 24 months is about 33.92%

less_than_two = 1 - two_or_more
print(glue(paste0("The probability of getting less than ", {num_of_inspection}, " inspections after {trial} months is about ", round({less_than_two}*100, 2), "%")))

## The probability of getting less than 2 inspections after 24 months is about 66.08%

print("The expected number of inspection is about 1")

## [1] "The expected number of inspection is about 1"

barplot(prob.dist, main = "Prob of number of inspection")

print(glue("The standard deviation is ", round({sd}, 2)))

## The standard deviation is 1.07

Question 3 (Poisson)

You are modeling the family practice clinic and notice that patients arrive at a rate of 10 per hour. What is the probability that exactly 3 arrive in one hour? What is the probability that more than 10 arrive in one hour? How many would you expect to arrive in 8 hours? What is the standard deviation of the appropriate probability distribution? If there are three family practice providers that can see 24 templated patients each day, what is the percent utilization and what are your recommendations?

lambda = 10

exact3 <- ppois(3, lambda = lambda) - ppois(2, lambda = 10)
print(glue("The probability of having exactly 3 patients arrived in one hour at a rate of 10 per hour is about ", round({exact3}*100, 2), "%"))

## The probability of having exactly 3 patients arrived in one hour at a rate of 10 per hour is about 0.76%

more10 <- ppois(10, lambda = lambda, lower = FALSE)
print(glue("The probability of having more than 10 patients arrived in one hour at a rate of 10 per hour is about ", round({more10}*100, 2), "%"))

## The probability of having more than 10 patients arrived in one hour at a rate of 10 per hour is about 41.7%

expect_8_hrs <- lambda * 8
print(glue("There are ", {expect_8_hrs}, " patients expected to arrive in 8 hours given the rate of 10 per hour, i.e. 10 * 8"))

## There are 80 patients expected to arrive in 8 hours given the rate of 10 per hour, i.e. 10 * 8

sd <- sqrt(lambda)
print(glue("The standard deviation for Poisson distribution is sqrt(lambda), i.e. ", round({sd}, 2)))

## The standard deviation for Poisson distribution is sqrt(lambda), i.e. 3.16

Question 4 (Hypergeometric)

Your subordinate with 30 supervisors was recently accused of favoring nurses. 15 of the subordinate’s workers are nurses and 15 are other than nurses. As evidence of malfeasance, the accuser stated that there were 6 company-paid trips to Disney World for which everyone was eligible. The supervisor sent 5 nurses and 1 non-nurse. If your subordinate acted innocently, what was the probability he/she would have selected five nurses for the trips? How many nurses would we have expected your subordinate to send? How many non-nurses would we have expected your subordinate to send?

K <- 15
k <- 5
N <- 30
n <- 6

prob5_nurses <- choose(K, k) * choose(N-K, n-k) / choose(N,n)

print(glue("The probability of selecting ", {k}, " nurses on the trip is about ", round({prob5_nurses}*100, 2), "%"))

## The probability of selecting 5 nurses on the trip is about 7.59%

dhd <- stats::dhyper(seq(0, n, by = 1), 
                     m = 15, 
                     n = 15, 
                     k = 6)

dhd.df <- dhd %>%
    as.data.frame() %>%
    dplyr::mutate(nurses = 0:6) %>%
    dplyr::select(nurses, `outcome_prob (%)` = ".") %>%
    dplyr::mutate(`outcome_prob (%)` = round(`outcome_prob (%)` * 100, 2)) 

dhd.df

##   nurses outcome_prob (%)
## 1      0             0.84
## 2      1             7.59
## 3      2            24.14
## 4      3            34.87
## 5      4            24.14
## 6      5             7.59
## 7      6             0.84

dhd.df %>%
    ggplot(aes(x = nurses, y = `outcome_prob (%)`)) +
    geom_bar(stat = "identity") +
    ggtitle("Prob of number of nurses chosen")

print("The most likely outcome from above table and chart using the stats::dhyper() function is 3, i.e. approximately 35%. In other words, it should be most likely the case to have 3 nurses and 3 non-nurses chosen for the trip.")

## [1] "The most likely outcome from above table and chart using the stats::dhyper() function is 3, i.e. approximately 35%. In other words, it should be most likely the case to have 3 nurses and 3 non-nurses chosen for the trip."

Question 5 (Geometric)

The probability of being seriously injured in a car crash in an unspecified location is about .1% per hour. A driver is required to traverse this area for 1200 hours in the course of a year. What is the probability that the driver will be seriously injured during the course of the year? In the course of 15 months? What is the expected number of hours that a driver will drive before being seriously injured? Given that a driver has driven 1200 hours, what is the probability that he or she will be injured in the next 100 hours?

crash_per_hour = .001
hrs = 1200

crash1200 = pgeom(q = hrs, prob = crash_per_hour)

print(glue("The chance of getting seriously injuried is about ", round({crash1200} *100, 2), "%. Given the fact that the person can no longer drive after the serious injury, and so we choose the geometric distribution instead of binomial distribution"))

## The chance of getting seriously injuried is about 69.93%. Given the fact that the person can no longer drive after the serious injury, and so we choose the geometric distribution instead of binomial distribution

crash1500 = pgeom(q = hrs +300, prob = crash_per_hour)

print(glue("Assume that there's 100 hours per month, we can redo the calculation by changing the parameters, so that we see that the chance of getting seriously injuried after driving 1500 hours is approximatley ", round({crash1500} *100, 2), "%"))

## Assume that there's 100 hours per month, we can redo the calculation by changing the parameters, so that we see that the chance of getting seriously injuried after driving 1500 hours is approximatley 77.73%

print(glue("The expected value is equal to 1/p, i.e. ", {1/crash_per_hour}, " hours"))

## The expected value is equal to 1/p, i.e. 1000 hours

crash100 = pgeom(q = 100, prob = crash_per_hour)

print(glue("The chance of getting seriously injuried in the next 100 hours is about ", round({crash100} *100, 2), "%"))

## The chance of getting seriously injuried in the next 100 hours is about 9.61%

Question 6

You are working in a hospital that is running off of a primary generator which fails about once in 1000 hours. What is the probability that the generator will fail more than twice in 1000 hours? What is the expected value?

num_of_failure = 2
trial = 1000
prob = 1 / trial

fail_twice <- pbinom(num_of_failure, size = trial, prob = prob, lower.tail = FALSE)

print(glue("The probability of failing it more than twice in ", {trial}, " hours is approximately ", round({fail_twice} *100, 2), "%"))

## The probability of failing it more than twice in 1000 hours is approximately 8.02%

print(glue("The expected value is already suggested by the probability, i.e. once in every 1000 hours and that is ", round({prob} *100, 2), "%. In other words, E(x) is equal to n * prob, i.e. ", round({trial * prob})))

## The expected value is already suggested by the probability, i.e. once in every 1000 hours and that is 0.1%. In other words, E(x) is equal to n * prob, i.e. 1

Question 7

A surgical patient arrives for surgery precisely at a given time. Based on previous analysis (or a lack of knowledge assumption), you know that the waiting time is uniformly distributed from 0 to 30 minutes. What is the probability that this patient will wait more than 10 minutes? If the patient has already waited 10 minutes, what is the probability that he/she will wait at least another 5 minutes prior to being seen? What is the expected waiting time?

# wait more than 10 min
wait10plus = punif(10, min = 0, max = 30, lower.tail = FALSE)

print(glue("The probability of waiting longer than 10 min is approximately ", round({wait10plus} *100, 2), "% in this uniform distribution"))

## The probability of waiting longer than 10 min is approximately 66.67% in this uniform distribution

# after first 10 min, wait for another 5 min before seen and it will happen in the remaining 20 min
wait5more = punif(5, min = 1, max = 20, lower.tail = TRUE)

print(glue("The probability of waiting for another 5 min after the first 10 min is approximately ", round({wait5more} *100, 2), "% in this uniform distribution"))

## The probability of waiting for another 5 min after the first 10 min is approximately 21.05% in this uniform distribution

# expected wait time
print(glue("The expected wait time is simply (0 + 30) / 2, i.e. ", {(0 + 30) / 2}, " min"))

## The expected wait time is simply (0 + 30) / 2, i.e. 15 min

Question 8

Your hospital owns an old MRI, which has a manufacturer’s lifetime of about 10 years (expected value). Based on previous studies, we know that the failure of most MRIs obeys an exponential distribution. What is the expected failure time? What is the standard deviation? What is the probability that your MRI will fail after 8 years? Now assume that you have owned the machine for 8 years. Given that you already owned the machine 8 years, what is the probability that it will fail in the next two years?

# What is the expected failure time? 
print("As stated, the lifetime of the machine is about 10 years (expected value). In other words, the mu or expected failure time is about 10 years")

## [1] "As stated, the lifetime of the machine is about 10 years (expected value). In other words, the mu or expected failure time is about 10 years"

# What is the standard deviation? 
print("For exponential distribution, the sd is the same as the mean. So in this case, it is also 10")

## [1] "For exponential distribution, the sd is the same as the mean. So in this case, it is also 10"

# What is the probability that your MRI will fail after 8 years? 
fail_after_8_yr = pexp(q = 8, rate = 0.1, lower.tail = FALSE)
print(glue("The probability of the machine to fail after owning it for 8 years is about ", round({fail_after_8_yr} *100, 2), "%"))

## The probability of the machine to fail after owning it for 8 years is about 44.93%

# Now assume that you have owned the machine for 8 years. Given that you already owned the machine 8 years, what is the probability that it will fail in the next two years?
fail_next_2_yr = pexp(q = 2, rate = 0.1)
print(glue("The probability of the machine to fail in the next two years is about ", round({fail_next_2_yr} *100, 2), "%"))

## The probability of the machine to fail in the next two years is about 18.13%