library(MASS)
  1. (Bayesian). A new test for multinucleoside-resistant (MNR) human immunodeficiency virus type 1 (HIV-1) variants was recently developed. The test maintains 96% sensitivity, meaning that, for those with the disease, it will correctly report “positive” for 96% of them. The test is also 98% specific, meaning that, for those without the disease, 98% will be correctly reported as “negative.” MNR HIV-1 is considered to be rare (albeit emerging), with about a .1% or .001 prevalence rate. Given the prevalence rate, sensitivity, and specificity estimates, what is the probability that an individual who is reported as positive by the new test actually has the disease? If the median cost (consider this the best point estimate) is about $100,000 per positive case total and the test itself costs $1000 per administration, what is the total first-year cost for treating 100,000 individuals?

Solution:

Sensitivity = True Positive for disease = 96% Specificity = True Negative for disease = 98% Prevalence rate = rare instance rate = 0.1% = .001 What is the probability that an individual who is reported as positive by the new test actually has the disease?

Median cost per positive case total = $100,000 Test administration cost = $1000 What is the total first-year cost for treating 100,000 individuals?

Bayesian Formula:

The Bayes Theorem finds the chance that an event will happen based on our prior knowledge of similar events.

\[P(A|B)=((P(B|A)⋅P(A))/P(B)\]

where

P(A|B) - conditional probability that event A will happen if event B has already happened.

P(B|A) - conditional probability of event B given that event A has already occurred.

P(A) - the earlier probability that event A will take place.

P(B) - the probability that event B will occur.

Question 1:

# Sensitivity = True Positive for disease = 96%

sensitivity <- 0.96
# Specificity = True Negative for disease = 98%

specificity <- 0.98
# Prevalence rate = rare instance rate

prevalence_rate <- 0.001
# False positive rate

false_positive_rate <- (1 - specificity)
false_positive_rate <- (1 - 0.98)
false_positive_rate
## [1] 0.02

Total Probability of getting the true positive result

total_prob_positive <- sensitivity * prevalence_rate + false_positive_rate * (1 - prevalence_rate)
total_prob_positive <- .96 * .001 + .02 * (1 - .001)
total_prob_positive
## [1] 0.02094

Probability of Positive given the positive test result

prob_disease_given_positiveTest <- sensitivity * prevalence_rate / total_prob_positive
prob_disease_given_positiveTest <- .96 * .001 / .02094
prob_disease_given_positiveTest 
## [1] 0.04584527

Rounding the result and calculating percentage of

cat(round(prob_disease_given_positiveTest * 100, 2), "%\n")
## 4.58 %

The probability of actually having the positive for disease by the new test is 4.58 %

Question 2:

Test administration cost = $1000

cost_per_head <- 4.58 * 1000
cost_per_head
## [1] 4580
total_cost <- 100000 * 4580
total_cost
## [1] 4.58e+08
  1. (Binomial). The probability of your organization receiving a Joint Commission inspection in any given month is .05. What is the probability that, after 24 months, you received exactly 2 inspections? What is the probability that, after 24 months, you received 2 or more inspections? What is the probability that your received fewer than 2 inspections? What is the expected number of inspections you should have received? What is the standard deviation?

Solution:

\[ P(x)= \left( \begin{bmatrix} n \\ x \end{bmatrix}\right) p^x(1-p)^{n-x}\]

The binomial distribution model is used for finding the probability of success of an event which has only two possible outcomes in a series of experiments.

R has four in-built functions to generate binomial distribution.

dbinom(x, size, prob) - the probability density distribution at each point. pbinom(x, size, prob) - the cumulative probability of an event. It is a single value representing the probability. qbinom(p, size, prob) - takes the probability value and gives a number whose cumulative value matches the probability value. rbinom(n, size, prob) - generates required number of random values of given probability from a given sample.

x is a vector of numbers.

p is a vector of probabilities.

n is number of observations.

size is the number of trials.

prob is the probability of success of each trial.

Mean = n * p Variance = n * p * (1 - p)

Probability that, after 24 months, you received exactly 2 inspection

prob_inspection = 0.05
months = 24
prob_2months <-  dbinom(2, 24, 0.05)

prob_2months
## [1] 0.2232381

Probability that, after 24 months, you received 2 or more inspections

prob_2month_or_more <- 1 - pbinom(1, 24, 0.05)

prob_2month_or_more
## [1] 0.3391827

Probability that you received fewer than 2 inspections

prob_2month_or_less <- pbinom(1, 24, 0.05)

prob_2month_or_less
## [1] 0.6608173

Expected number of inspections you should have received

expected_inspections <- 24 * 0.05
expected_inspections
## [1] 1.2

Standard deviation

standard_deviation <- sqrt(months * prob_inspection * (1-prob_inspection))

standard_deviation <- sqrt(24 * 0.05 * 0.95)

standard_deviation 
## [1] 1.067708
  1. (Poisson). You are modeling the family practice clinic and notice that patients arrive at a rate of 10 per hour. What is the probability that exactly 3 arrive in one hour? What is the probability that more than 10 arrive in one hour? How many would you expect to arrive in 8 hours? What is the standard deviation of the appropriate probability distribution? If there are three family practice providers that can see 24 templated patients each day, what is the percent utilization and what are your recommendations?

Solution:

\[ P(x)= \lambda^{x}e^{-\lambda}/x^{!}\]

where

x=0,1,2,3…..

𝑒 is the euler’s number

𝜆 is an average rate of expected value and λ=variance,λ>0

Probability that exactly 3 arrive in one hour

dpois() function calculates the probability mass function (PMF) for a given Poisson distribution. It takes two arguments: the value at which to evaluate the PMF and the mean of the Poisson distribution.

lambda<- 10
x<- 3
prob_3 <- dpois(x, lambda)
prob_3 <- dpois(3, 10)
prob_3
## [1] 0.007566655

Probability that more than 10 arrive in one hour

ppois() function calculates the cumulative distribution function (CDF) for a Poisson distribution.

It takes two arguments: the value at which to evaluate the CDF and the mean of the Poisson distribution.

prob <- ppois(10, lambda=10)   # lower tail

prob
## [1] 0.5830398
ppois(9, lambda=10, lower=FALSE)   # upper tail
## [1] 0.5420703

Expected to arrive in 8 hours

lambda<- 10
time <- 8
expected_arrive <- lambda * time
expected_arrive
## [1] 80

Standard Deviation of the appropriate probability distribution

\[SD=sqrt(λ)\]

lambda<- 10

std_dev_poisson <- sqrt(lambda)

std_dev_poisson 
## [1] 3.162278

For three family practice providers that can see 24 templated patients each day, what is the percent utilization and what are your recommendations?

utilization <- 30 * 24 / (3 * 24)

utilization
## [1] 10

Percent utilization of 3 providers each seeing 24 patients per day

cat(utilization * 100, "%\n")
## 1000 %
  1. (Hypergeometric). Your subordinate with 30 supervisors was recently accused of favoring nurses. 15 of the subordinate’s workers are nurses and 15 are other than nurses. As evidence of malfeasance, the accuser stated that there were 6 company-paid trips to Disney World for which everyone was eligible. The supervisor sent 5 nurses and 1 non-nurse. If your subordinate acted innocently, what was the probability he/she would have selected five nurses for the trips? How many nurses would we have expected your subordinate to send? How many non-nurses would we have expected your subordinate to send?

Solution:

The probability mass function (PMF) for the hypergeometric distribution is:

P(X = k) = [K choose k] * [(N - K) choose (n - k)] / [N choose n]

where: X is the random variable representing the number of objects of interest in the sample k is a specific value of X [a choose b] represents the binomial coefficient, which is the number of ways to choose b objects from a set of a objects

dhyper(x, m, n, k, log = FALSE) phyper(q, m, n, k, lower.tail = TRUE, log.p = FALSE) qhyper(p, m, n, k, lower.tail = TRUE, log.p = FALSE) rhyper(nn, m, n, k)

p - probability, it must be between 0 and 1.

x: represents the data set of values m: size of the population n: number of samples drawn k: number of items in the population

Probability 5 nurses were innocently selected.

prob_5_nurses <- dhyper(5,15,15,6)
prob_5_nurses
## [1] 0.07586207

Number of expected nurses

expected_nurses = 6 * (15/30)
expected_nurses
## [1] 3

Number of expected non nurses

expected_non_nurses = 6 * (15/30)
expected_non_nurses
## [1] 3
  1. (Geometric). The probability of being seriously injured in a car crash in an unspecified location is about .1% per hour. A driver is required to traverse this area for 1200 hours in the course of a year. What is the probability that the driver will be seriously injured during the course of the year? In the course of 15 months? What is the expected number of hours that a driver will drive before being seriously injured? Given that a driver has driven 1200 hours, what is the probability that he or she will be injured in the next 100 hours?

Solution:

Probability of seriously injured in the course of a year.

prob_injury = .001

hours = 1200

crash_1200hrs = pgeom(hours,prob_injury)

crash_1200hrs
## [1] 0.6992876

Probability of seriously injured in the course of 15 months.

crash_15_months = pgeom(hours * (15/12),prob_injury)
crash_15_months
## [1] 0.7772602

Expected number of hours that a driver will drive before being seriously injured.

Expected_hours <-1/prob_injury

Expected_hours
## [1] 1000

Probability that he or she will be injured in the next 100 hours

prob_100hours <- pgeom(100,prob_injury)

prob_100hours
## [1] 0.09611265
  1. You are working in a hospital that is running off of a primary generator which fails about once in 1000 hours. What is the probability that the generator will fail more than twice in 1000 hours? What is the expected value?

Solution:

Probability that the generator will fail more than twice in 1000 hours.

fail_more_than_twice = 1 - ppois(2,1)

fail_more_than_twice
## [1] 0.0803014

Expected value

expected_value <- 1/1000 * 1000

cat(expected_value, "\n")
## 1
  1. A surgical patient arrives for surgery precisely at a given time. Based on previous analysis (or a lack of knowledge assumption), you know that the waiting time is uniformly distributed from 0 to 30 minutes. What is the probability that this patient will wait more than 10 minutes? If the patient has already waited 10 minutes, what is the probability that he/she will wait at least another 5 minutes prior to being seen? What is the expected waiting time?

Solution:

Setting values of minimum and maximum waiting times

Probability that this patient will wait more than 10 minutes

wait = 10
min = 0
max = 30
wait_more_than_10 = 1- punif(wait,min,max)
wait_more_than_10 = 1- punif(10,0,30)

wait_more_than_10
## [1] 0.6666667

Probability that he/she will wait at least another 5 minutes prior to being seen.

wait_2 = 15
new_min = 10
wait_more_than_15 = 1 - punif(wait_2,new_min,max)
wait_more_than_15
## [1] 0.75

Expected waiting time E[X]=1/2(a+b)

expected_wait <- .5*(0+30)

expected_wait
## [1] 15
  1. Your hospital owns an old MRI, which has a manufacturer’s lifetime of about 10 years (expected value). Based on previous studies, we know that the failure of most MRIs obeys an exponential distribution. What is the expected failure time? What is the standard deviation? What is the probability that your MRI will fail after 8 years? Now assume that you have owned the machine for 8 years. Given that you already owned the machine 8 years, what is the probability that it will fail in the next two years?

Solution:

Expected failure time is after 10 years. As this is an exponential distribution,

the standard deviation is the mean - 10 years.

Probability that your MRI will fail after 8 years

years = 8

fail_rate = 1/10

fail_after_8yrs = 1- pexp(years,fail_rate)

fail_after_8yrs = 1- pexp(8,1/10)

fail_after_8yrs
## [1] 0.449329

Given that you already owned the machine 8 years, what is the probability that it will fail in the next two years.

next_years = 2
fail_after_8_wi_2 = pexp(next_years,fail_rate)
fail_after_8_wi_2 = pexp(2,1/10)
fail_after_8_wi_2
## [1] 0.1812692