Quiz Chapter 5 and Probability

Author

Jack Hegarty

Description of Samples and Populations

  1. The carbon monoxide in cigarettes is thought to be hazardous to the fetus of a pregnant woman who smokes. In a study of this hypothesis, blood was drawn from pregnant women before and after smoking a cigarette. Measurements were made of the percent increase of blood hemoglobin bound to carbon monoxide (COHb). The results for 26 women are:
blooddata <- c(6.4, 2.6, 3.5, 2.9, 3.9, 2.2, 5.5, 4.4, 3.5, 3.2, 2.8, 2.4, 3.5, 3.3, 3.7, 2.6, 3.5, 4.5, 4.2, 2.9, 3.1, 3.3, 4.3, 2.6, 4.1, 3.7)
    a. (2 pts) Find the mean, median, sample standard deviation, and IQR. Be sure to include proper STATISTICAL NOTATION for each with their respective values.

Answer: xbar = 3.56, M = 3.5, s = 0.95, IQR = 1.15
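
The values above can be reproduced in R. This is a minimal sketch: it re-declares the data so that it runs on its own, and the name `stats` is introduced here only for illustration. R's `IQR()` uses the default quantile method (type 7), so a hand calculation with a different quartile rule may give a slightly different IQR.

```r
# COHb percent-increase data for the 26 women (copied from the question)
blooddata <- c(6.4, 2.6, 3.5, 2.9, 3.9, 2.2, 5.5, 4.4, 3.5, 3.2, 2.8, 2.4, 3.5,
               3.3, 3.7, 2.6, 3.5, 4.5, 4.2, 2.9, 3.1, 3.3, 4.3, 2.6, 4.1, 3.7)

# `stats` is an illustrative name, not part of the original solution
stats <- c(
  mean   = mean(blooddata),    # sample mean
  median = median(blooddata),  # sample median
  sd     = sd(blooddata),      # sample standard deviation (n - 1 denominator)
  IQR    = IQR(blooddata)      # Q3 - Q1 using R's default quantile type 7
)

# Round to two decimals to match the precision of the answer line
round(stats, 2)
```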

    b. (1 pt) Create a boxplot of these observations. If you are not using R, be sure your axis shows a proper scale (R will display the scale by default).
boxplot(blooddata, col = "red", horizontal = TRUE, main = "CO in Pregnant Smokers' Blood", xlab = "% increase of COHb")

    c. (1 pt) Create a histogram of these observations. If you are not using R, be sure your axis shows a proper scale (R will display the scale by default).
hist(blooddata, col = "red", main = "CO in Pregnant Smokers' Blood", xlab = "% increase of COHb")

  2. Suppose a medical test has a 96% chance of detecting a disease if the person has it and a 92% chance of correctly indicating that the disease is absent if the person really does not have the disease. Assume 12% of a particular population has the disease.

    a. (2 pts) Create a probability tree diagram to show probabilities of outcomes. Indicate on your diagram with labels the following: sensitivity, specificity, false positive, false negative.

library(DiagrammeR)

bayes_probability_tree <- function(prior, true_positive, true_negative) {
  
  # Validate inputs: every probability must lie strictly between 0 and 1.
  probs <- c(prior, true_positive, true_negative)
  if (!all(probs > 0) || !all(probs < 1)) {
    stop("probabilities must be greater than 0 and less than 1.",
         call. = FALSE)
  }
  c_prior <- 1 - prior
  c_tp <- 1 - true_positive
  c_tn <- 1 - true_negative
  
  round4 <- purrr::partial(round, digits = 4)
  
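  # Joint probabilities for the four leaves of the tree
  b1 <- round4(prior * true_positive)    # disease and positive test
  b2 <- round4(prior * c_tp)             # disease and negative test (false negative)
  b3 <- round4(c_prior * c_tn)           # no disease and positive test (false positive)
  b4 <- round4(c_prior * true_negative)  # no disease and negative test
  
  # Posterior P(disease | positive); computed here but not drawn on the diagram
  bp <- round4(b1/(b1 + b3))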
  
  labs <- c("X", prior, c_prior, true_positive, c_tp, true_negative, c_tn, b1, b2, b4, b3)
  
  tree <-
    create_graph() %>%
    add_n_nodes(
      n = 11,
      type = "path",
      label = labs,
      node_aes = node_aes(
        shape = "circle",
        height = 1,
        width = 1,
        x = c(0, 3, 3, 6, 6, 6, 6, 8, 8, 8, 8),
        y = c(0, 2, -2, 3, 1, -3, -1, 3, 1, -3, -1))) %>% 
    add_edge(
      from = 1,
      to = 2,
      edge_aes = edge_aes(
        label = "Disease"
      )
    ) %>% 
    add_edge(
      from = 1, 
      to = 3,
      edge_aes = edge_aes(
        label = "No Disease"
      )
    ) %>% 
    add_edge(
      from = 2,
      to = 4,
      edge_aes = edge_aes(
        label = "Sensitivity"
      )
    ) %>% 
    add_edge(
      from = 2,
      to = 5,
      edge_aes = edge_aes(
        label = "False Negative"
      )
    ) %>% 
    add_edge(
      from = 3,
      to = 7,
      edge_aes = edge_aes(
        label = "False Positive"
      )
    ) %>% 
    add_edge(
      from = 3,
      to = 6,
      edge_aes = edge_aes(
        label = "Specificity"
      )
    ) %>% 
    add_edge(
      from = 4,
      to = 8,
      edge_aes = edge_aes(
        label = "="
      )
    ) %>% 
    add_edge(
      from = 5,
      to = 9,
      edge_aes = edge_aes(
        label = "="
      )
    ) %>% 
    add_edge(
      from = 7,
      to = 11,
      edge_aes = edge_aes(
        label = "="
      )
    ) %>% 
    add_edge(
      from = 6,
      to = 10,
      edge_aes = edge_aes(
        label = "="
      )
    ) 
  print(render_graph(tree))
  invisible(tree)
}
bayes_probability_tree(prior = 0.12, true_positive = 0.96, true_negative = 0.92)
    b. (1 pt) What is the probability that a randomly chosen person will test positive? (Show calculations)

Answer: p(+) = (.12)(.96) + (.88)(.08) = .1152 + .0704 = .1856
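
The same calculation can be checked in R with the law of total probability. This is a sketch using only the numbers stated in the question; the variable names (`prevalence`, `sensitivity`, `specificity`, and so on) are illustrative and do not come from the original solution.

```r
# Inputs taken from the question
prevalence  <- 0.12  # P(D): proportion of the population with the disease
sensitivity <- 0.96  # P(+ | D)
specificity <- 0.92  # P(- | no D)

# A positive result arises in two ways: a true positive or a false positive
p_true_pos  <- prevalence * sensitivity              # P(D and +)
p_false_pos <- (1 - prevalence) * (1 - specificity)  # P(no D and +)

# Law of total probability: add the two disjoint branches
p_positive <- p_true_pos + p_false_pos
p_positive
```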

    c. (1 pt) Suppose that a randomly chosen person does test positive. What is the probability that this person really has the disease?

Answer: p(D|+) = .1152/.1856 = .621
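
One way to sanity-check this conditional probability is to think in counts rather than probabilities. The sketch below assumes a hypothetical cohort of 10,000 people; the cohort size and all variable names are illustrative choices, not part of the original solution. The result does not depend on the cohort size.

```r
# Hypothetical cohort size, chosen only to make the counts easy to read
n_people <- 10000

# Split the cohort by disease status (12% prevalence from the question)
n_disease    <- n_people * 0.12
n_no_disease <- n_people - n_disease

# Expected numbers of positive tests in each group
true_pos  <- n_disease * 0.96           # diseased people who test positive
false_pos <- n_no_disease * (1 - 0.92)  # healthy people who test positive

# Among all positives, the fraction who really have the disease
p_disease_given_pos <- true_pos / (true_pos + false_pos)
round(p_disease_given_pos, 3)
```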

  3. The prevalence of mild myopia in adults over age 40 is 23% in the U.S. Let random variable Y denote the number with myopia out of a random sample of 5.

    a. (2 pts) Complete the table below
n = 5, p = 0.23

Y (No. of Myopic) successes   No. of Non-myopic   Probability (Binomial expansion)   Probability (as a decimal; feel free to use R code here)
0                             5                   (.23^0)(1-.23)^5                   .271
1                             4                   5(.23^1)(1-.23)^4                  .404
2                             3                   10(.23^2)(1-.23)^3                 .242
3                             2                   10(.23^3)(1-.23)^2                 .072
4                             1                   5(.23^4)(1-.23)^1                  .011
5                             0                   (.23^5)(1-.23)^0                   .001
Total                                                                                1.000
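
The decimal column can be generated in R. The sketch below computes each probability twice, once from the binomial expansion written out with `choose()` and once with `dbinom()`, so the two columns of the table can be compared. The names `by_formula`, `by_dbinom`, and `binom_table` are illustrative only.

```r
n <- 5     # sample size
p <- 0.23  # prevalence of mild myopia
y <- 0:n   # possible numbers of myopic people in the sample

# Binomial expansion: C(n, y) * p^y * (1 - p)^(n - y)
by_formula <- choose(n, y) * p^y * (1 - p)^(n - y)

# The same probabilities from R's built-in binomial mass function
by_dbinom <- dbinom(y, size = n, prob = p)

# Assemble a table mirroring the one in the question
binom_table <- data.frame(
  myopic      = y,
  non_myopic  = n - y,
  coefficient = choose(n, y),
  probability = round(by_dbinom, 3)
)
binom_table

# The unrounded probabilities sum to 1 (up to floating-point error)
sum(by_dbinom)
```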
    b. (1 pt) Find Pr{Y>=3}

Answer: 1 - pbinom(2, 5, .23) = .084

    c. (1 pt) Find Pr{Y<=2}

Answer: pbinom(2, 5, .23) = .916
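
The two tail probabilities are complements, which the following sketch confirms. It also shows two equivalent ways of getting the upper tail: summing `dbinom()` over 3 to 5, and calling `pbinom()` with `lower.tail = FALSE`. Variable names are illustrative.

```r
n <- 5
p <- 0.23

# Lower tail: Pr{Y <= 2}
p_at_most_2 <- pbinom(2, size = n, prob = p)

# Upper tail: Pr{Y >= 3}, by summing the individual probabilities
p_at_least_3 <- sum(dbinom(3:5, size = n, prob = p))

# Upper tail again, as Pr{Y > 2}, straight from pbinom()
p_at_least_3_alt <- pbinom(2, size = n, prob = p, lower.tail = FALSE)

# Rounded to three decimals, as in the answers
round(c(at_most_2 = p_at_most_2, at_least_3 = p_at_least_3), 3)

# The two events partition the sample space, so they sum to 1
p_at_most_2 + p_at_least_3
```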