Data 605 Week 5

1. Bayesian Calculation for HIV Test
2. Binomial Distribution for Inspection
3. Poisson Distribution for Patient Arrival
4. Hypergeometric Distribution for Favoring Nurses
5. Geometric Distribution for Car Injuries
6. Poisson Distribution for Independent, Fixed Interval Events (Generator Failure)
7. Continuous Uniform Distribution for Waiting Time
8. Exponential Distribution for MRI Failure

1. Bayesian Calculation for HIV Test

A new test for multinucleoside-resistant (MNR) human immunodeficiency virus type 1 (HIV-1) variants was recently developed. The test maintains 96% sensitivity, meaning that, for those with the disease, it will correctly report “positive” for 96% of them. The test is also 98% specific, meaning that, for those without the disease, 98% will be correctly reported as “negative.” MNR HIV-1 is considered to be rare (albeit emerging), with about a .1% or .001 prevalence rate. Given the prevalence rate, sensitivity, and specificity estimates, what is the probability that an individual who is reported as positive by the new test actually has the disease? If the median cost (consider this the best point estimate) is about $100,000 per positive case total and the test itself costs $1000 per administration, what is the total first-year cost for treating 100,000 individuals?

Using Bayes' theorem:

\[ P(\text{Disease} | \text{Positive Test}) = \frac{P(\text{Positive Test} | \text{Disease}) \times P(\text{Disease})}{P(\text{Positive Test})} \]

$P(\text{Disease})$ = Probability of having the disease
$P(\text{Positive Test} | \text{Disease})$ = Probability of a positive test given the disease
$P(\text{Positive Test})$ = Overall probability of a positive test
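
As a worked check of the formula, the denominator can be expanded with the law of total probability, where the false-positive rate is taken as $1 - \text{specificity} = 0.02$ and the disease-free share as $1 - \text{prevalence} = 0.999$:

\[ P(\text{Disease} | \text{Positive Test}) = \frac{0.96 \times 0.001}{(0.96 \times 0.001) + (0.02 \times 0.999)} = \frac{0.00096}{0.02094} \approx 0.0458 \]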

# Sensitivity, Specificity, and Prevalence
sensitivity <- 0.96
specificity <- 0.98
prevalence <- 0.001

# Calculate P(Positive Test)
p_positive_test <- (sensitivity * prevalence) + ((1 - specificity) * (1 - prevalence))

# Calculate P(Disease | Positive Test)
p_disease_given_positive <- (sensitivity * prevalence) / p_positive_test

p_disease_given_positive_rounded <- round(p_disease_given_positive, 4)

print(paste("The probability that an individual who tests positive actually has the disease is ", p_disease_given_positive_rounded))

## [1] "The probability that an individual who tests positive actually has the disease is  0.0458"
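
The cost part of the question is not worked above. The sketch below is one possible reading, not a definitive answer: it assumes all 100,000 individuals are tested once at $1,000 each, and that every positive result (true or false) incurs the $100,000 median cost. The variable names `n_individuals`, `cost_per_test`, and `cost_per_positive` are illustrative.

```r
# Assumed reading: everyone is tested once, and every positive test
# (true or false positive) incurs the per-case cost.
sensitivity <- 0.96
specificity <- 0.98
prevalence <- 0.001
n_individuals <- 100000
cost_per_test <- 1000
cost_per_positive <- 100000

# Overall probability of a positive test (law of total probability)
p_positive_test <- (sensitivity * prevalence) + ((1 - specificity) * (1 - prevalence))

# Expected number of positive results among those tested
expected_positives <- n_individuals * p_positive_test

# Testing cost plus cost attributed to positive cases
testing_cost <- n_individuals * cost_per_test
treatment_cost <- expected_positives * cost_per_positive
total_cost <- testing_cost + treatment_cost

print(paste("Expected positive tests:", round(expected_positives)))
print(paste("Estimated total first-year cost in dollars:",
            format(total_cost, big.mark = ",", scientific = FALSE)))
```

Under these assumptions there are roughly 2,094 expected positives, so the estimate is about $100 million for testing plus about $209.4 million for positive cases, or roughly $309.4 million in total. Counting only true positives as treated cases would give a much smaller figure.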

2. Binomial Distribution for Inspection

The probability of your organization receiving a Joint Commission inspection in any given month is .05. What is the probability that, after 24 months, you received exactly 2 inspections? What is the probability that, after 24 months, you received 2 or more inspections? What is the probability that you received fewer than 2 inspections? What is the expected number of inspections you should have received? What is the standard deviation?

The binomial distribution is frequently used to model the number of successes in a sample of size n drawn with replacement from a population of size N. - Wikipedia

\[ P(X=k) = \binom{n}{k} \times p^k \times (1-p)^{(n-k)} \]

Where:

n is the number of trials = 24 months
k is the number of successes = number of inspections
p is the probability of success in a single trial = 0.05

Probability of Exactly 2 Inspections

To find the probability of receiving exactly 2 inspections in 24 months:

\[ P(X=2) = \binom{24}{2} \times 0.05^2 \times (1 - 0.05)^{(24 - 2)} \]

Probability of 2 or More Inspections

To find the probability of receiving 2 or more inspections in 24 months, we can sum the probabilities of receiving exactly 2 to 24 inspections:

\[ P(X \geq 2) = \sum_{k=2}^{24} \binom{24}{k} \times 0.05^k \times (1 - 0.05)^{(24 - k)} \]

# Parameters
p <- 0.05  # Probability of inspection in a month
n <- 24  # Number of months 

# Probability of exactly 2 inspections in 24 months using dbinom
p_exactly_2 <- round(dbinom(2, n, p), 4)
print(paste("The probability of exactly 2 inspections in 24 months is", p_exactly_2))

## [1] "The probability of exactly 2 inspections in 24 months is 0.2232"

# Probability of 2 or more inspections in 24 months using pbinom
p_2_or_more <- round(1 - pbinom(1, n, p), 4)
print(paste("The probability of 2 or more inspections in 24 months is", p_2_or_more))

## [1] "The probability of 2 or more inspections in 24 months is 0.3392"

# Probability of fewer than 2 inspections in 24 months using pbinom
p_fewer_than_2 <- round(pbinom(1, n, p), 4)
print(paste("The probability of fewer than 2 inspections in 24 months is", p_fewer_than_2))

## [1] "The probability of fewer than 2 inspections in 24 months is 0.6608"

# Expected number of inspections and standard deviation
expected_inspections <- round(n * p, 4)
std_dev_inspections <- round(sqrt(n * p * (1 - p)), 4)
print(paste("The expected number of inspections is ", expected_inspections, 
            " and the standard deviation is ", std_dev_inspections))

## [1] "The expected number of inspections is  1.2  and the standard deviation is  1.0677"
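
The summation formula for 2 or more inspections and the complement used in the code should agree. A minimal cross-check, assuming the same `n` and `p` as above, compares the direct sum of the PMF with the two ways of reading the upper tail from `pbinom`:

```r
n <- 24    # number of months
p <- 0.05  # monthly inspection probability

# Direct sum of the PMF over k = 2, ..., 24
p_sum <- sum(dbinom(2:n, n, p))

# Complement of the CDF at 1
p_complement <- 1 - pbinom(1, n, p)

# Upper tail requested directly from pbinom
p_upper <- pbinom(1, n, p, lower.tail = FALSE)

print(all.equal(p_sum, p_complement))
print(all.equal(p_sum, p_upper))
```

Using `lower.tail = FALSE` avoids the explicit subtraction, which can matter numerically when the tail probability is very small.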

3. Poisson Distribution for Patient Arrival

You are modeling the family practice clinic and notice that patients arrive at a rate of 10 per hour. What is the probability that exactly 3 arrive in one hour? What is the probability that more than 10 arrive in one hour? How many would you expect to arrive in 8 hours? What is the standard deviation of the appropriate probability distribution? If there are three family practice providers that can see 24 templated patients each day, what is the percent utilization and what are your recommendations?

The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant mean rate and independently of the time since the last event. - Wikipedia

The Poisson distribution is given by:

\[ P(X=k) = \frac{\lambda^k \times e^{-\lambda}}{k!} \]

Where:

$\lambda$ is the average rate of occurrence = 10 patients per hour.
$k$ is the number of occurrences.
$e$ is the base of the natural logarithm = 2.71828

Probability of Exactly 3 Patients Arriving in One Hour

To find the probability of exactly 3 patients arriving in one hour:

\[ P(X=3) = \frac{10^3 \times e^{-10}}{3!} \]

Probability of More Than 10 Patients Arriving in One Hour

To find the probability of more than 10 patients arriving in one hour, we use the complement rule:

\[ P(X > 10) = 1 - \sum_{k=0}^{10} \frac{\lambda^k \times e^{-\lambda}}{k!} \]

# Define the rate of patient arrivals per hour
lambda <- 10

# Probability that exactly 3 patients arrive in one hour
prob_3_arrivals <- round(dpois(3, lambda), 4)
print(paste("Probability that exactly 3 patients arrive in one hour: ", prob_3_arrivals))

## [1] "Probability that exactly 3 patients arrive in one hour:  0.0076"

# Probability that more than 10 patients arrive in one hour
prob_more_than_10_arrivals <- round(1 - ppois(10, lambda), 4)
print(paste("Probability that more than 10 patients arrive in one hour: ", prob_more_than_10_arrivals))

## [1] "Probability that more than 10 patients arrive in one hour:  0.417"

# Expected number of arrivals in 8 hours
expected_8_hours <- round(8 * lambda, 4)
print(paste("Expected number of arrivals in 8 hours: ", expected_8_hours))

## [1] "Expected number of arrivals in 8 hours:  80"

# Standard deviation of the hourly Poisson distribution, sqrt(lambda); for the 8-hour total it would be sqrt(8 * lambda)
std_dev <- round(sqrt(lambda), 4)
print(paste("Standard deviation of the distribution: ", std_dev))

## [1] "Standard deviation of the distribution:  3.1623"

# Capacity is 24 patients per day x 3 providers = 72 patients per day. The clinic operates for 8 hours.
provider_capacity_per_hour <- 72 / 8

# Percent utilization is (actual rate / capacity rate) * 100
percent_utilization <- round((lambda / provider_capacity_per_hour) * 100, 4)
print(paste("Percent utilization: ", percent_utilization, "%"))

## [1] "Percent utilization:  111.1111 %"

# Recommendation
print("Given the over-utilization, the clinic would need to extend its hours, reduce arrivals by about 1 patient per hour, or hire one more provider to see the 80 expected arrivals")

## [1] "Given the over-utilization, the clinic would need to extend its hours, reduce arrivals by about 1 patient per hour, or hire one more provider to see the 80 expected arrivals"
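
The utilization figure compares average rates only. A rough sketch of the day-level picture, assuming the 8-hour day and 72-patient capacity used above and that hourly arrivals are independent, treats the daily total as Poisson with mean 80. The names `lambda_day` and `daily_capacity` are illustrative.

```r
lambda_hour <- 10         # arrivals per hour
hours_open <- 8           # clinic day length used above
daily_capacity <- 3 * 24  # three providers, 24 templated patients each

# The sum of independent Poisson counts is Poisson with the summed rate
lambda_day <- lambda_hour * hours_open

# Standard deviation of the daily arrival count
sd_day <- sqrt(lambda_day)

# Probability that daily arrivals exceed the templated capacity
p_over_capacity <- ppois(daily_capacity, lambda_day, lower.tail = FALSE)

print(paste("Expected daily arrivals:", lambda_day))
print(paste("Standard deviation of daily arrivals:", round(sd_day, 4)))
print(paste("Probability arrivals exceed capacity:", round(p_over_capacity, 4)))
```

The daily standard deviation is $\sqrt{80} \approx 8.94$, and the exceedance probability puts a number on how often the templated slots would fall short on a given day.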

4. Hypergeometric Distribution for Favoring Nurses

Your subordinate with 30 supervisors was recently accused of favoring nurses. 15 of the subordinate’s workers are nurses and 15 are other than nurses. As evidence of malfeasance, the accuser stated that there were 6 company-paid trips to Disney World for which everyone was eligible. The supervisor sent 5 nurses and 1 non-nurse. If your subordinate acted innocently, what was the probability he/she would have selected five nurses for the trips? How many nurses would we have expected your subordinate to send? How many non-nurses would we have expected your subordinate to send?

The hypergeometric distribution is the probability of k successes in n draws, without replacement, from a finite population of size N that contains exactly K objects with that feature, wherein each draw is either a success or a failure - Wikipedia

Hypergeometric Distribution

The hypergeometric distribution is given by:

\[ P(X=k) = \frac{\binom{K}{k} \times \binom{N-K}{n-k}}{\binom{N}{n}} \]

Where:

$K$ is the total number of successes in the population = 15 nurses.
$N$ is the total population size = 30 workers.
$n$ is the number of draws = 6 trips
$k$ is the number of observed successes = 5 nurses.

Probability of Selecting Five Nurses

To find the probability of selecting exactly 5 nurses for the trips, we can use the PMF:

\[ P(X=5) = \frac{\binom{15}{5} \times \binom{15}{1}}{\binom{30}{6}} \]

Expected Number of Nurses and Non-Nurses

In a hypergeometric distribution, the expected value ($\mu$) of the number of successes is given by:

\[ \mu = n \times \frac{K}{N} \]

Where $n = 6$ (number of trips), $K = 15$ (number of nurses), and $N = 30$ (total number of workers).

# Parameters
total_workers <- 30  
total_nurses <- 15  
total_non_nurses <- 15  
total_trips <- 6  
nurses_sent <- 5 

# Probability of selecting 5 nurses for the trips if acting innocently
prob_5_nurses <- round(dhyper(nurses_sent, total_nurses, total_non_nurses, total_trips), 4)
print(paste("The probability of innocently selecting 5 nurses for the trips is approximately ", prob_5_nurses))

## [1] "The probability of innocently selecting 5 nurses for the trips is approximately  0.0759"

# Expected number of nurses to be sent
expected_nurses <- round((total_nurses / total_workers) * total_trips, 4)
print(paste("The expected number of nurses to be sent is", expected_nurses))

## [1] "The expected number of nurses to be sent is 3"

# Expected number of non-nurses to be sent
expected_non_nurses <- round((total_non_nurses / total_workers) * total_trips, 4)
print(paste("The expected number of non-nurses to be sent is", expected_non_nurses))

## [1] "The expected number of non-nurses to be sent is 3"

print("With the probability of exactly 5 nurses at only 7.6%, the selection may have been intentional. An impartial selection would be expected to send 3 nurses and 3 non-nurses.")

## [1] "With the probability of exactly 5 nurses at only 7.6%, the selection may have been intentional. An impartial selection would be expected to send 3 nurses and 3 non-nurses."
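
When weighing evidence of favoritism, it can also help to look at the probability of an outcome at least this extreme, not only exactly 5 nurses. A minimal sketch, assuming the same population as above, computes $P(X \geq 5)$ two ways:

```r
total_nurses <- 15
total_non_nurses <- 15
total_trips <- 6

# Upper tail: P(X >= 5) = P(X > 4)
p_5_or_more <- phyper(4, total_nurses, total_non_nurses, total_trips, lower.tail = FALSE)

# Cross-check by summing the PMF at 5 and 6
p_check <- sum(dhyper(5:6, total_nurses, total_non_nurses, total_trips))

print(paste("Probability of sending 5 or more nurses:", round(p_5_or_more, 4)))
print(all.equal(p_5_or_more, p_check))
```

This works out to $50050 / 593775 \approx 0.0843$, slightly higher than the probability of exactly 5 nurses because it also includes the case of 6 nurses.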

5. Geometric Distribution for Car Injuries

The probability of being seriously injured in a car crash in an unspecified location is about .1% per hour. A driver is required to traverse this area for 1200 hours in the course of a year. What is the probability that the driver will be seriously injured during the course of the year? In the course of 15 months? What is the expected number of hours that a driver will drive before being seriously injured? Given that a driver has driven 1200 hours, what is the probability that he or she will be injured in the next 100 hours?

The geometric distribution gives the probability that the first occurrence of success requires k independent trials, each with success probability p.

Geometric Distribution:

\[ P(X = k) = q^{(k-1)} \times p \]

The expected value $E[X]$ for a geometric distribution with success probability $p$ is:

\[ E[X] = \frac{1}{p} \]

Probability of Being Injured in a Year (1200 hours)

The probability of not being injured in a single hour is $q = 1 - p = 1 - 0.001 = 0.999$.
The probability of not being injured in 1200 hours is $q^{1200} = 0.999^{1200} \approx 0.301$.
The probability of being injured at least once in 1200 hours is $1 - q^{1200} \approx 1 - 0.301 = 0.699$.

Probability of Being Injured in 15 Months (1500 hours)

At 1200 hours per year the driver covers 100 hours per month, so 15 months corresponds to 1500 hours. The probability of being injured at least once in 1500 hours is $1 - q^{1500} \approx 1 - 0.223 = 0.777$.

Expected Number of Hours Before Being Injured

\[ E[X] = \frac{1}{0.001} = 1000 \text{ hours} \]

Probability of Being Injured in the Next 100 Hours Given 1200 Hours Driven

The probability of being injured in the next 100 hours, given that the driver has already driven 1200 hours without injury, is $1 - q^{100}$.

# Probability of being seriously injured per hour
p <- 0.001

# Probability of not being seriously injured per hour
q <- 1 - p

# Probability of being seriously injured in 1200 hours
n1 <- 1200
prob_injury_year <- round(1 - (q ^ n1), 4)
print(paste("Probability of being injured in 1200 hours:", prob_injury_year))

## [1] "Probability of being injured in 1200 hours: 0.699"

# Probability of being seriously injured in the course of 15 months (1500 hours)
n2 <- 1500
prob_injury_15months <- round(1 - (q ^ n2), 4)
print(paste("Probability of being injured in 1500 hours:", prob_injury_15months))

## [1] "Probability of being injured in 1500 hours: 0.777"

# Expected number of hours before being seriously injured
expected_hours <- round(1 / p, 4)
print(paste("Expected number of hours before being injured:", expected_hours))

## [1] "Expected number of hours before being injured: 1000"

# Probability of being seriously injured in the next 100 hours given 1200 hours driven
n3 <- 100
prob_injury_next100 <- round(1 - (q ^ n3), 4)
print(paste("Probability of being injured in the next 100 hours given 1200 hours driven:", prob_injury_next100))

## [1] "Probability of being injured in the next 100 hours given 1200 hours driven: 0.0952"
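
The last answer relies on the memoryless property of the geometric distribution: hours already driven without injury do not change the probability for the next 100 hours. A small sketch checking this with R's `pgeom` follows. Note that `pgeom(k, p)` counts failures before the first success, so "injured within n hours" corresponds to `pgeom(n - 1, p)`.

```r
p <- 0.001  # hourly probability of serious injury

# P(injured within 1200 hours) via pgeom and via the closed form
p_within_1200 <- pgeom(1200 - 1, p)
p_closed_form <- 1 - (1 - p)^1200

# Conditional probability of injury in hours 1201-1300,
# given no injury in the first 1200 hours
p_conditional <- (pgeom(1300 - 1, p) - pgeom(1200 - 1, p)) / (1 - pgeom(1200 - 1, p))

# Unconditional probability of injury within any 100 hours
p_memoryless <- 1 - (1 - p)^100

print(all.equal(p_within_1200, p_closed_form))
print(all.equal(p_conditional, p_memoryless))
```

Both comparisons should report `TRUE`, confirming that the conditional probability equals $1 - q^{100}$.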

6. Poisson Distribution for Independent, Fixed Interval Events (Generator Failure)

You are working in a hospital that is running off of a primary generator which fails about once in 1000 hours. What is the probability that the generator will fail more than twice in 1000 hours? What is the expected value?

Poisson Distribution

$\lambda$: Average rate of failure per 1000 hours (1 failure/1000 hours)
$k$: Number of failures

\[ P(X = k) = \frac{\lambda^k \times e^{-\lambda}}{k!} \]

The expected value $E[X]$ is:

\[ E[X] = \lambda \]

Probability of More Than Two Failures in 1000 Hours

The probability that the generator will fail more than twice in 1000 hours can be calculated as $1 - P(X=0) - P(X=1) - P(X=2)$.

Expected Value

The expected value is simply $\lambda$, which is 1 failure per 1000 hours.

# Average rate of failure per 1000 hours 
lambda <- 1

# Probability of more than two failures in 1000 hours
prob_0_failures <- dpois(0, lambda)
prob_1_failure <- dpois(1, lambda)
prob_2_failures <- dpois(2, lambda)

# Probability of more than two failures
prob_more_than_2 <- 1 - (prob_0_failures + prob_1_failure + prob_2_failures)
print(paste("Probability of more than two failures in 1000 hours:", round(prob_more_than_2, 4)))

## [1] "Probability of more than two failures in 1000 hours: 0.0803"

# Expected value
expected_value <- lambda
print(paste("Expected value:", round(expected_value, 4), "failure per 1000 hours"))

## [1] "Expected value: 1 failure per 1000 hours"
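
Written out by hand with $\lambda = 1$, the complement calculation reduces to a single line of arithmetic:

\[ P(X > 2) = 1 - e^{-1}\left(\frac{1^0}{0!} + \frac{1^1}{1!} + \frac{1^2}{2!}\right) = 1 - 2.5\,e^{-1} \approx 1 - 0.9197 = 0.0803 \]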

7. Continuous Uniform Distribution for Waiting Time

A surgical patient arrives for surgery precisely at a given time. Based on previous analysis (or a lack of knowledge assumption), you know that the waiting time is uniformly distributed from 0 to 30 minutes. What is the probability that this patient will wait more than 10 minutes? If the patient has already waited 10 minutes, what is the probability that he/she will wait at least another 5 minutes prior to being seen? What is the expected waiting time?

The continuous uniform distribution describes an experiment where there is an arbitrary outcome that lies between certain bounds; the variable is equally likely to take any value within the specified range. - Wikipedia

Continuous Uniform Distribution Equations

$a$: Lower bound of the waiting time (0 minutes)
$b$: Upper bound of the waiting time (30 minutes)
$t$: Time in minutes

\[ f(t) = \frac{1}{b - a} \]

The expected value $E[T]$ is:

\[ E[T] = \frac{a + b}{2} \]

Probability of Waiting More Than 10 Minutes

The probability that the patient will wait more than 10 minutes is calculated as the area under the PDF curve from 10 to 30 minutes, which is $\frac{30 - 10}{30 - 0}$.

Probability of Waiting At Least Another 5 Minutes Given 10 Minutes Waited

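Given that the patient has already waited 10 minutes, the remaining waiting time is uniformly distributed between 10 and 30 minutes, so the probability of waiting at least another 5 minutes (that is, past the 15-minute mark) is $\frac{30 - 15}{30 - 10}$.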

Expected Waiting Time

The expected waiting time is $\frac{0 + 30}{2}$.

# Lower and upper bounds of the waiting time
a <- 0
b <- 30

# Probability of waiting more than 10 minutes
prob_wait_more_10 <- (b - 10) / (b - a)
print(paste("Probability of waiting more than 10 minutes:", round(prob_wait_more_10, 4)))

## [1] "Probability of waiting more than 10 minutes: 0.6667"

# Probability of waiting at least another 5 minutes given 10 minutes waited
prob_wait_another_5 <- (b - 15) / (b - 10)
print(paste("Probability of waiting at least another 5 minutes given 10 minutes waited:", round(prob_wait_another_5, 4)))

## [1] "Probability of waiting at least another 5 minutes given 10 minutes waited: 0.75"

# Expected waiting time
expected_wait_time <- (a + b) / 2
print(paste("Expected waiting time:", round(expected_wait_time, 4), "minutes"))

## [1] "Expected waiting time: 15 minutes"
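
The same quantities can be obtained from R's built-in uniform CDF rather than hand-computed ratios. This is a minimal sketch using `punif`, with the conditional probability written as a ratio of two upper-tail probabilities:

```r
a <- 0   # lower bound in minutes
b <- 30  # upper bound in minutes

# P(T > 10)
p_more_10 <- punif(10, min = a, max = b, lower.tail = FALSE)

# P(T > 15)
p_more_15 <- punif(15, min = a, max = b, lower.tail = FALSE)

# P(T > 15 | T > 10) = P(T > 15) / P(T > 10)
p_another_5 <- p_more_15 / p_more_10

print(paste("P(T > 10):", round(p_more_10, 4)))
print(paste("P(T > 15 | T > 10):", round(p_another_5, 4)))
```

Unlike the geometric and exponential cases, the uniform distribution is not memoryless: the conditional probability of 0.75 differs from the unconditional $P(T > 5) = 25/30$.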

8. Exponential Distribution for MRI Failure

Your hospital owns an old MRI, which has a manufacturer’s lifetime of about 10 years (expected value). Based on previous studies, we know that the failure of most MRIs obeys an exponential distribution. What is the expected failure time? What is the standard deviation? What is the probability that your MRI will fail after 8 years? Now assume that you have owned the machine for 8 years. Given that you already owned the machine 8 years, what is the probability that it will fail in the next two years? and what is the best probability technique?

The exponential distribution is the probability distribution of the time between events in a Poisson point process, i.e., a process in which events occur continuously and independently at a constant average rate. - Wikipedia

Exponential Distribution Equations

$\lambda$: Rate parameter, which is the reciprocal of the expected value ($\lambda = \frac{1}{\text{Expected Value}}$)
$t$: Time in years

\[ f(t) = \lambda e^{-\lambda t} \]

The expected value $E[T]$ is:

\[ E[T] = \frac{1}{\lambda} \]

The standard deviation $\sigma$ is:

\[ \sigma = \frac{1}{\lambda} \]

The probability that the MRI will fail after $t$ years is:

\[ P(T > t) = e^{-\lambda t} \]

Expected Failure Time

The expected failure time is given as 10 years, which means $\lambda = \frac{1}{10}$.

Standard Deviation

The standard deviation is $\frac{1}{\lambda}$ = 10 years.

Probability of Failure After 8 Years

The probability that the MRI will fail after 8 years is $e^{-\lambda \times 8}$.

Probability of Failure in the Next 2 Years Given 8 Years Owned

By the memoryless property, having already owned the machine for 8 years does not change the distribution of its remaining lifetime, so the probability of failure in the next 2 years is $P(T \leq 2) = 1 - e^{-\lambda \times 2}$.

# Expected value in years
expected_value <- 10

# Calculate lambda (rate)
lambda <- 1 / expected_value

# Expected failure time
print(paste("Expected failure time:", round(expected_value, 4), "years"))

## [1] "Expected failure time: 10 years"

# Standard deviation
std_dev <- 1 / lambda
print(paste("Standard deviation:", round(std_dev, 4), "years"))

## [1] "Standard deviation: 10 years"

# Probability of failure after 8 years
prob_failure_after_8 <- exp(-lambda * 8)
print(paste("Probability of failure after 8 years:", round(prob_failure_after_8, 4)))

## [1] "Probability of failure after 8 years: 0.4493"

# Probability of failure in the next 2 years given 8 years owned (memoryless: P(T <= 2))
prob_failure_next_2 <- 1 - exp(-lambda * 2)
print(paste("Probability of failure in the next 2 years given 8 years owned:", round(prob_failure_next_2, 4)))

## [1] "Probability of failure in the next 2 years given 8 years owned: 0.1813"
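
As a cross-check on the conditional question, the probability of failing within the next 2 years given 8 years of survival can be computed directly from the exponential CDF with `pexp` and compared with the memoryless shortcut. This is a sketch, with `rate` as an illustrative name for $\lambda$:

```r
rate <- 1 / 10  # failures per year, from the 10-year expected lifetime

# P(T > 8): the machine survives past 8 years
p_survive_8 <- pexp(8, rate, lower.tail = FALSE)

# P(8 < T <= 10 | T > 8): fails within the next 2 years, given survival to 8
p_fail_next_2 <- (pexp(10, rate) - pexp(8, rate)) / p_survive_8

# Memoryless shortcut: P(T <= 2)
p_memoryless <- pexp(2, rate)

print(paste("P(T > 8):", round(p_survive_8, 4)))
print(paste("P(fail in next 2 years | survived 8):", round(p_fail_next_2, 4)))
print(all.equal(p_fail_next_2, p_memoryless))
```

The conditional failure probability is about 0.1813. Its complement, about 0.8187, is the probability that the machine survives the next two years.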

Johnny Rodriguez

2023-10-07
