DATA 605 HW5-YINA QIAO

Problem 1: Bayesian

A new test for multinucleoside-resistant (MNR) human immunodeficiency virus type 1 (HIV-1) variants was recently developed. The test maintains 96% sensitivity, meaning that, for those with the disease, it will correctly report “positive” for 96% of them. The test is also 98% specific, meaning that, for those without the disease, 98% will be correctly reported as “negative.” MNR HIV-1 is considered to be rare (albeit emerging), with about a .1% or .001 prevalence rate. Given the prevalence rate, sensitivity, and specificity estimates, what is the probability that an individual who is reported as positive by the new test actually has the disease? If the median cost (consider this the best point estimate) is about $100,000 per positive case total and the test itself costs $1000 per administration, what is the total first-year cost for treating 100,000 individuals?

sensitivity <- 0.96
specificity <- 0.98
prevalence <- 0.001

p_positive <- (sensitivity * prevalence) + ((1 - specificity) * (1 - prevalence))
p_disease_given_positive <- (sensitivity * prevalence) / p_positive

test_cost <- 1000
treatment_cost_per_positive_case <- 100000
num_individuals <- 100000
total_test_cost <- num_individuals * test_cost
expected_true_positives <- p_disease_given_positive * num_individuals
total_treatment_cost <- expected_true_positives * treatment_cost_per_positive_case
total_first_year_cost <- total_test_cost + total_treatment_cost

cat("Probability disease given positive:", format(p_disease_given_positive, digits = 2))

## Probability disease given positive: 0.046

cat("Total first-year cost:", format(total_first_year_cost, big.mark = ",", scientific = FALSE))

## Total first-year cost: 558,452,722

Problem 2: Binomial Distribution

The probability of your organization receiving a Joint Commission inspection in any given month is .05. What is the probability that, after 24 months, you received exactly 2 inspections? What is the probability that, after 24 months, you received 2 or more inspections? What is the probability that your received fewer than 2 inspections? What is the expected number of inspections you should have received? What is the standard deviation?

n_months <- 24
p_inspection <- 0.05

p_exactly_2 <- dbinom(2, n_months, p_inspection)
p_2_or_more <- 1 - pbinom(1, n_months, p_inspection)
p_fewer_than_2 <- pbinom(1, n_months, p_inspection)
expected_inspections <- n_months * p_inspection
std_dev_inspections <- sqrt(expected_inspections * (1 - p_inspection))

cat("Probability of exactly 2 inspections:", format(p_exactly_2, digits = 2))

## Probability of exactly 2 inspections: 0.22

cat("Probability of 2 or more inspections:", format(p_2_or_more, digits = 2))

## Probability of 2 or more inspections: 0.34

cat("Probability of fewer than 2 inspections:", format(p_fewer_than_2, digits = 2))

## Probability of fewer than 2 inspections: 0.66

cat("Expected inspections:", expected_inspections, "\n")

## Expected inspections: 1.2

cat("Standard deviation of inspections:", format(std_dev_inspections, digits = 3))

## Standard deviation of inspections: 1.07

Problem 3: Poisson Distribution

(Poisson). You are modeling the family practice clinic and notice that patients arrive at a rate of 10 per hour. What is the probability that exactly 3 arrive in one hour? What is the probability that more than 10 arrive in one hour? How many would you expect to arrive in 8 hours? What is the standard deviation of the appropriate probability distribution? If there are three family practice providers that can see 24 templated patients each day, what is the percent utilization and what are your recommendations?

lambda <- 10

# Probability of exactly 3 arrivals
p_exactly_3 <- dpois(3, lambda)

# Probability of more than 10 arrivals
p_more_than_10 <- 1 - ppois(10, lambda)

# Expected arrivals in 8 hours
expected_8_hours <- 8 * lambda

# Standard deviation
std_dev <- sqrt(lambda)

# Utilization rate calculation
provider_capacity <- 3 * 24
utilization_rate <- (expected_8_hours / provider_capacity) * 100

cat("Probability of exactly 3 arrivals: ", p_exactly_3)

## Probability of exactly 3 arrivals:  0.007566655

cat("Probability of more than 10 arrivals: ", p_more_than_10)

## Probability of more than 10 arrivals:  0.4169602

cat("Expected arrivals in 8 hours: ", expected_8_hours)

## Expected arrivals in 8 hours:  80

cat("Standard deviation: ", std_dev)

## Standard deviation:  3.162278

cat("Utilization rate: ", utilization_rate,"%")

## Utilization rate:  111.1111 %

With three providers each able to see 24 patients a day, the utilization rate is 111.11%, indicating that the clinic is overutilized given the current arrival rate of patients.

Problem 4: Hypergeometric Distribution

(Hypergeometric). Your subordinate with 30 supervisors was recently accused of favoring nurses. 15 of the subordinate’s workers are nurses and 15 are other than nurses. As evidence of malfeasance, the accuser stated that there were 6 company-paid trips to Disney World for which everyone was eligible. The supervisor sent 5 nurses and 1 non-nurse. If your subordinate acted innocently, what was the probability he/she would have selected five nurses for the trips? How many nurses would we have expected your subordinate to send? How many non-nurses would we have expected your subordinate to send?

total_workers <- 30
total_nurses <- 15
total_selected <- 6
nurses_selected <- 5

# Probability of selecting 5 nurses
p_5_nurses <- dhyper(nurses_selected, total_nurses, total_workers - total_nurses, total_selected)
expected_nurse <- total_selected * (total_nurses/total_workers)
expected_non_nurse <- total_selected - expected_nurse

cat("Probability of selecting 5 nurses: ", p_5_nurses)

## Probability of selecting 5 nurses:  0.07586207

cat("Expected number of nurses to be send: ", expected_nurse)

## Expected number of nurses to be send:  3

cat("Expected number of non-nurses to be send: ", expected_non_nurse)

## Expected number of non-nurses to be send:  3

There’s a 7% chance that, if selections were made completely at random, 5 out of the 6 people chosen for the trip would be nurses. In other words, if you were to repeat this random selection process many times, approximately 7 out of every 100 selections would result in 5 nurses being chosen.

This probability doesn’t necessarily imply bias or fairness by itself but gives an idea of how likely or unlikely the outcome is under completely random circumstances. A 7% chance is relatively low, which suggests that such an outcome is somewhat unlikely but not impossible

Problem 5: Geometric Distribution

(Geometric). The probability of being seriously injured in a car crash in an unspecified location is about .1% per hour. A driver is required to traverse this area for 1200 hours in the course of a year. What is the probability that the driver will be seriously injured during the course of the year? In the course of 15 months? What is the expected number of hours that a driver will drive before being seriously injured? Given that a driver has driven 1200 hours, what is the probability that he or she will be injured in the next 100 hours?

p_injury <- 0.001
hours_per_year <- 1200
hours_per_15m <- 1200 * 1.25
# Probability of being seriously injured during the 1200h in a year
p_injury_year <- 1 - (1 - p_injury) ^ hours_per_year

# Driving Hours in 15 Months: 15 months would be 1.25 years, so 1200 hours * 1.25 = 1500 hours.
p_injury_15m <- 1 - (1 - p_injury) ^ hours_per_15m

# Expected Time to Injury: This is the average time until the first injury, which is simply the reciprocal of the injury probability per hour (1 / 0.001 = 1000 hours).

expected_hours_before_injury <- 1 / p_injury

# Probability of being seriously injured in the next 100 hours, given 1200 hours driven
#After 1200 Hours: The geometric distribution's "memoryless" property means past events don't affect future probabilities. The chance of getting injured in any 100-hour period is the same, regardless of previous hours driven. So, the probability of getting injured in the next 100 hours is 

p_injury_next_100 <- 1 - (1 - p_injury) ^ 100

cat("Probability of injury during the year: ", p_injury_year)

## Probability of injury during the year:  0.6989866

cat("Probability of injury during the 15 months: ", p_injury_15m)

## Probability of injury during the 15 months:  0.7770372

cat("Expected number of hours that a driver will drive before being seriously injured?: ", expected_hours_before_injury)

## Expected number of hours that a driver will drive before being seriously injured?:  1000

cat("Probability of injury in the next 100 hours: ", p_injury_next_100)

## Probability of injury in the next 100 hours:  0.09520785

Problem 6: Poisson Distribution for Generator Failure

You are working in a hospital that is running off of a primary generator which fails about once in 1000 hours. What is the probability that the generator will fail more than twice in 1000 hours? What is the expected value?

lambda_failure <- 1 / 1000 * 1000  # Rate per 1000 hours

# Probability of more than two failures in 1000 hours
#The probability of the generator failing more than twice in 1000 hours can be found using the sum of probabilities for 0, 1, and 2 failures, and then subtracting from 1.
p_fail_more_than_2 <- 1 - ppois(2, lambda_failure)

cat("Probability of more than two failures: ", p_fail_more_than_2, "\n")

## Probability of more than two failures:  0.0803014

Since the rate λ=1 for a 1000-hour period, the expected number of failures in 1000 hours is simply 1.

Problem 7: Uniform Distribution

A surgical patient arrives for surgery precisely at a given time. Based on previous analysis (or a lack of knowledge assumption), you know that the waiting time is uniformly distributed from 0 to 30 minutes. What is the probability that this patient will wait more than 10 minutes? If the patient has already waited 10 minutes, what is the probability that he/she will wait at least another 5 minutes prior to being seen? What is the expected waiting time?

total_waiting <- 30
p_wait_more_10 <- (total_waiting - 10) / total_waiting

# Given already waited 10 minutes, probability of waiting at least another 5 minutes
p_wait_another_5 <- 15 / 20

#The midpoint between 0 and 30 minutes is 15 minutes, so the average or expected waiting time is 15 minutes.
expected_weight_time <- (0 + 30) / 2 

cat("Probability of waiting more than 10 minutes: ", p_wait_more_10)

## Probability of waiting more than 10 minutes:  0.6666667

cat("Probability of waiting at least another 5 minutes after 10: ", p_wait_another_5)

## Probability of waiting at least another 5 minutes after 10:  0.75

cat("Expected waiting time: ", expected_weight_time)

## Expected waiting time:  15

Problem 8: Exponential Distribution

Your hospital owns an old MRI, which has a manufacturer’s lifetime of about 10 years (expected value). Based on previous studies, we know that the failure of most MRIs obeys an exponential distribution. What is the expected failure time? What is the standard deviation? What is the probability that your MRI will fail after 8 years? Now assume that you have owned the machine for 8 years. Given that you already owned the machine 8 years, what is the probability that it will fail in the next two years?

lambda_mri <- 1 / 10  # Rate parameter (1/expected lifetime)

# Expected failure time and standard deviation for an exponential distribution
expected_failure_time <- 1 / lambda_mri
# Same as the mean for exponential distribution
std_dev_failure_time <- 1 / lambda_mri  

# Probability that MRI will fail after 8 years
p_fail_after_8 <- 1 - pexp(8, lambda_mri)

# Given 8 years old, probability of failing in the next 2 years=probability of  MRI falling within 10 years-probability of falling within 8 years.
p_fail_next_2 <- pexp(10, lambda_mri) - pexp(8, lambda_mri)

cat("Expected failure time (years): ", expected_failure_time)

## Expected failure time (years):  10

cat("Standard deviation (years): ", std_dev_failure_time)

## Standard deviation (years):  10

cat("Probability MRI will fail after 8 years: ", p_fail_after_8)

## Probability MRI will fail after 8 years:  0.449329

cat("Probability of failing in the next 2 years, given 8 years old: ", p_fail_next_2)

## Probability of failing in the next 2 years, given 8 years old:  0.08144952