*Submit your homework to Canvas by the due date and time. Email your lecturer if you have extenuating circumstances and need to request an extension.

*If an exercise asks you to use R, include a copy of the code and output. Please edit your code and output to be only the relevant portions.

*If a problem does not specify how to compute the answer, you may use any appropriate method. I may ask you to use R or manual calculations on your exams, so practice accordingly.

*You must include an explanation and/or intermediate calculations for an exercise to be complete.

*Be sure to submit the HWK3 Auto grade Quiz which will give you ~20 of your 40 accuracy points.

*50 points total: 40 points accuracy, and 10 points completion

Probability

Exercise 1: A geneticist is studying two genes. Each gene can be either dominant or recessive. A collection of 100 individuals is categorized and found to have 58 individuals with both genes dominant, 6 individuals with both genes recessive and a total of 70 Gene 2 dominant individuals.

  1. Create a 2-way table to organize the counts of individuals within each of the 4 combinations of dominant and recessive for the two genes.

| | Gene 2 Dominant | Gene 2 Recessive | Total |
|---|---|---|---|
| Gene 1 Dominant | 58 | 24 | 82 |
| Gene 1 Recessive | 12 | 6 | 18 |
| Total | 70 | 30 | 100 |

  1. What is the probability that a randomly sampled individual from this group has Gene 1 dominant?

The probability that a randomly sampled individual from this group has Gene 1 dominant is 82/100 = 0.82, or 82%, since 82 of the 100 individuals have Gene 1 dominant.

  1. What is the probability that a randomly sampled individual from this group has Gene 1 or Gene 2 dominant?

The probability that a randomly sampled individual from this group has Gene 1 or Gene 2 dominant is 94/100 = 0.94, or 94%. The count includes every cell of the table except both recessive: 58 + 24 + 12 = 94.

  1. What is the probability that in a random sample of 3 individuals from this group (without replacement), at least one of the three has both recessive genes?

The probability that at least one of the three has both genes recessive is about 17.1%: 1 - (94/100)(93/99)(92/98) = 0.17103. The product is the probability that none of the three has both genes recessive.
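As a check on the complement calculation above, here is a minimal R sketch. It assumes the 6 individuals with both genes recessive are the "successes" in a hypergeometric draw of 3 from 100; the variable names are illustrative only.

```r
# 6 of the 100 individuals have both genes recessive; 94 do not.
# Direct product: P(none of the 3 draws is double recessive), without replacement.
p_none <- (94 / 100) * (93 / 99) * (92 / 98)
p_at_least_one <- 1 - p_none

# Same quantity from the hypergeometric distribution:
# dhyper(x, m, n, k) with m = 6 "successes", n = 94 "failures", k = 3 draws.
p_check <- 1 - dhyper(0, m = 6, n = 94, k = 3)

print(c(product = p_at_least_one, hypergeometric = p_check))
```

Both routes compute 1 - P(no double-recessive individual in the sample), so they should agree.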

  1. What is the probability that a randomly sampled individual from this group has Gene 2 dominant, given we know they have Gene 1 dominant?

The probability that a randomly sampled individual has Gene 2 dominant, given Gene 1 dominant, is 58/82 = 0.707, or 70.7%: of the 82 individuals with Gene 1 dominant, 58 also have Gene 2 dominant.

  1. The genes are said to be in linkage equilibrium if the event that Gene 1 is dominant is independent of the event that Gene 2 is dominant. Are these genes in linkage equilibrium in this group of 100 individuals?

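If the events are independent, then P(G2D and G1D) = P(G2D)P(G1D). Here P(G2D and G1D) = 58/100 = 0.58, while P(G2D)P(G1D) = (70/100)(82/100) = 0.574. Because 0.58 does not equal 0.574, the events are not exactly independent, so the genes are not in linkage equilibrium in this group, although the two values are close.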
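The table-based probabilities in this exercise can also be checked in R. The sketch below is offered as a check, not the required method; the matrix `genes` and the `p_*` names are illustrative and not part of the assignment.

```r
# Counts from the 2-way table (rows: Gene 1, columns: Gene 2).
genes <- matrix(c(58, 24,
                  12,  6),
                nrow = 2, byrow = TRUE,
                dimnames = list(Gene1 = c("Dominant", "Recessive"),
                                Gene2 = c("Dominant", "Recessive")))
n <- sum(genes)

p_g1d  <- sum(genes["Dominant", ]) / n        # P(Gene 1 dominant)
p_g2d  <- sum(genes[, "Dominant"]) / n        # P(Gene 2 dominant)
p_both <- genes["Dominant", "Dominant"] / n   # P(both dominant)

p_either        <- p_g1d + p_g2d - p_both     # P(Gene 1 or Gene 2 dominant)
p_g2d_given_g1d <- p_both / p_g1d             # P(Gene 2 dominant | Gene 1 dominant)

# Independence would require P(both) to equal P(G1D) * P(G2D).
p_product <- p_g1d * p_g2d
print(c(p_both = p_both, p_product = p_product))
```

Comparing `p_both` with `p_product` is the same comparison used in the linkage equilibrium question.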

  1. Now suppose in another group of 100 individuals, 6 individuals have both genes recessive and a total of 70 Gene 2 dominant individuals. How many individuals would have both genes dominant if the event: Gene 1 is dominant is independent of the event: Gene 2 is dominant in this group of 100 individuals? Make sure to show how you calculated your answer.

| | Gene 2 Dominant | Gene 2 Recessive | Total |
|---|---|---|---|
| Gene 1 Dominant | x = 56 | 24 | 80 |
| Gene 1 Recessive | 14 | 6 | 20 |
| Total | 70 | 30 | 100 |

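Because the event that Gene 1 is dominant is independent of the event that Gene 2 is dominant, P(G2D | G1D) = P(G2D). Let x be the number of individuals with both genes dominant. Of the 30 individuals with Gene 2 recessive, 6 have both genes recessive, so 30 - 6 = 24 have Gene 1 dominant and Gene 2 recessive. Then \[\frac{x}{x+24}=\frac{70}{100}, \quad x=56.\] So 56 individuals have both genes dominant if the two events are independent in this group of 100 individuals, and 56 + 24 = 80 individuals have Gene 1 dominant in total.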
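The same count can be reached by a second route, sketched in R below. This is an alternative check that starts from the both-recessive cell, not the calculation written above; all variable names are illustrative.

```r
n_total        <- 100
both_recessive <- 6
g2_dominant    <- 70
g2_recessive   <- n_total - g2_dominant   # 30

# Under independence, P(G1R and G2R) = P(G1R) * P(G2R),
# so P(G1R) = (6/100) / (30/100).
p_g1_recessive <- (both_recessive / n_total) / (g2_recessive / n_total)
g1_dominant    <- n_total * (1 - p_g1_recessive)

# Count with both genes dominant under independence: n * P(G1D) * P(G2D).
both_dominant <- n_total * (g1_dominant / n_total) * (g2_dominant / n_total)
print(c(g1_dominant = g1_dominant, both_dominant = both_dominant))
```

Because independence of the two dominant events implies independence of the two recessive events, this route should give the same row totals as the table.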

Exercise 2: Prevention after acute myocardial infarction (AMI) is primarily managed through medications. A large cohort study of post-AMI patients >65 years old (*) found that only 74% of patients filled all their discharge prescriptions by 120 days after discharge.

A physician at UW has 4 post-AMI patients >65 years old and would like to use 0.74 as his estimate for \(\pi\), the probability that each of his patients fills all of their discharge prescriptions by 120 days after discharge. Define a random variable F, the count of the physician’s four patients who fill all of their discharge prescriptions by 120 days after discharge. Assume that prescription-filling behavior is independent between the 4 patients and that \(\pi=0.74\).

  1. Determine the probability distribution of F (write out the pmf) using probability theory. You can check your answers with R dbinom.
dbinom(0,size=4,prob=.74)
## [1] 0.00456976
dbinom(1,size=4,prob=.74)
## [1] 0.05202496
dbinom(2,size=4,prob=.74)
## [1] 0.2221066
dbinom(3,size=4,prob=.74)
## [1] 0.421433
dbinom(4,size=4,prob=.74)
## [1] 0.2998658

| f | P(F = f) |
|---|---|
| 0 | 0.00457 |
| 1 | 0.05202 |
| 2 | 0.22211 |
| 3 | 0.42143 |
| 4 | 0.29987 |

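Since the question asks for the pmf from probability theory, here is a minimal R sketch that evaluates the binomial formula directly and compares it with `dbinom`. The variable names are illustrative.

```r
n <- 4
p <- 0.74
f <- 0:n

# Binomial pmf from the formula: P(F = f) = choose(n, f) * p^f * (1 - p)^(n - f).
pmf_formula <- choose(n, f) * p^f * (1 - p)^(n - f)

# Built-in check.
pmf_builtin <- dbinom(f, size = n, prob = p)

print(data.frame(f = f, formula = pmf_formula, dbinom = pmf_builtin))
```

The two columns should match, and the probabilities should sum to 1.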
  1. Compute the probability that F > 0. What does this value mean in the context of the scenario?

The probability that F is greater than 0 is 0.052025 + 0.222107 + 0.421433 + 0.299866 = 0.995431, or about 99.5%. Equivalently, 1 - P(F = 0) = 1 - 0.004570 = 0.995430. F > 0 means at least one of the four patients fills all of their discharge prescriptions by 120 days after discharge. In the context of the scenario, there is about a 99.5% chance that at least one of the physician's four patients does so.

  1. What is the expected value for F, \(\mu_F\)? What does that value mean in the context of the scenario?

The expected value for F is \(\mu_F = 0(0.004570) + 1(0.052025) + 2(0.222107) + 3(0.421433) + 4(0.299866) = 2.96\). In this scenario, this means that, on average over many groups of 4 such patients, about 2.96 patients per group would fill all of their discharge prescriptions by 120 days after discharge, assuming each patient does so with probability 0.74.

  1. What is the standard deviation for F, \(\sigma_F\)?

The variance of F is \(0.004570(0-2.96)^2 + 0.052025(1-2.96)^2 + 0.222107(2-2.96)^2 + 0.421433(3-2.96)^2 + 0.299866(4-2.96)^2 = 0.7696\), so the standard deviation is \(\sigma_F = \sqrt{0.7696} = 0.8773\).
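The mean and standard deviation can be checked in R from the definitions, and against the standard binomial shortcuts \(n\pi\) and \(\sqrt{n\pi(1-\pi)}\). This is a sketch with illustrative variable names.

```r
f   <- 0:4
pmf <- dbinom(f, size = 4, prob = 0.74)

mu_F    <- sum(f * pmf)                     # expected value from the definition
sigma_F <- sqrt(sum((f - mu_F)^2 * pmf))    # standard deviation from the definition

# Binomial shortcuts: mean = n * pi, sd = sqrt(n * pi * (1 - pi)).
mu_shortcut    <- 4 * 0.74
sigma_shortcut <- sqrt(4 * 0.74 * 0.26)

print(c(mu_F = mu_F, sigma_F = sigma_F))
```

The definition-based values and the shortcut values should agree up to floating-point error.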

  1. Explain (briefly, in ~2 sentences) how you can use the following simulation to check your answers for part 2a. Some questions to consider: Why did I define FilledPresc as I did? What values are stored into the CountFilled vector? What does the histogram show?

FilledPresc holds 74 ones and 26 zeros, so each draw with replacement is a "filled" outcome with probability 0.74, and each entry of CountFilled is the number of the 4 simulated patients who filled their prescriptions in one repetition. The histogram shows how often each value of F occurred in 100,000 repetitions, so dividing each frequency by 100,000 gives an estimate of P(F = f) to compare with part a. For example, for F = 4 the simulation gave a proportion of 29.85% and I calculated 29.99%. The values are not identical, but they are close enough to check the answers from part a.

FilledPresc <- c(rep(1, 74), rep(0, 26))
iterations <- 100000
CountFilled <- rep(0, iterations)
set.seed(1)
for (i in 1:iterations){
  samp <- sample(FilledPresc, 4, replace = TRUE)
  CountFilled[i] <- sum(samp)
}

hist(CountFilled, labels = TRUE,
     ylim = c(0, .5*iterations), breaks = seq(-0.5, 4.5, 1))
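To compare the simulation with the exact pmf numerically rather than by reading the histogram, the frequencies can be converted to proportions. The sketch below is a vectorized variant of the loop above (one `sample` call reshaped into rows of 4), so its draws are not guaranteed to match the loop's exactly; `draws`, `sim_prop`, and `comparison` are illustrative names.

```r
FilledPresc <- c(rep(1, 74), rep(0, 26))   # 74 "filled" and 26 "not filled" outcomes
iterations  <- 100000
set.seed(1)

# One row per simulated group of 4 patients; row sums are the simulated F values.
draws <- matrix(sample(FilledPresc, 4 * iterations, replace = TRUE),
                ncol = 4, byrow = TRUE)
CountFilled <- rowSums(draws)

# Relative frequencies of F = 0, ..., 4 next to the exact binomial pmf.
sim_prop <- as.numeric(table(factor(CountFilled, levels = 0:4))) / iterations
exact    <- dbinom(0:4, size = 4, prob = 0.74)

comparison <- data.frame(f = 0:4, simulated = sim_prop, exact = round(exact, 5))
print(comparison)
```

With 100,000 repetitions, the simulated proportions should typically fall within a few thousandths of the exact probabilities.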

  1. Suppose this physician now has 20 post-AMI patients >65 years and wants to use a Binomial model (n = 20, \(\pi=0.74\)) to describe the number of those 20 patients who will get all discharge prescriptions filled within 120 days.
  1. What is the probability that exactly 15 of those 20 patients get all discharge prescriptions filled within 120 days?

The probability that exactly 15 of the 20 patients get all discharge prescriptions filled within 120 days is about 20.1%.

dbinom(15,size=20,prob=74/100)
## [1] 0.2012734
  1. What is the probability that 15 or more of those 20 patients get all discharge prescriptions filled within 120 days?

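The probability that 15 or more of the 20 patients get all discharge prescriptions filled within 120 days is about 57.65%, found by summing the probabilities for 15 through 20 patients.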

dbinom(15,size=20,prob=74/100)
## [1] 0.2012734
dbinom(16,size=20,prob=74/100)
## [1] 0.1790172
dbinom(17,size=20,prob=74/100)
## [1] 0.1198848
dbinom(18,size=20,prob=74/100)
## [1] 0.05686843
dbinom(19,size=20,prob=74/100)
## [1] 0.01703751
dbinom(20,size=20,prob=74/100)
## [1] 0.002424568

0.2012734+0.1790172+0.1198848+0.05686843+0.01703751+0.002424568=0.5765.
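The same upper-tail probability can be obtained without adding six separate `dbinom` values. A minimal R sketch, with illustrative variable names:

```r
# P(X >= 15) by summing the pmf over 15, ..., 20.
p_sum <- sum(dbinom(15:20, size = 20, prob = 0.74))

# Same tail from the cdf: P(X >= 15) = 1 - P(X <= 14).
p_cdf <- 1 - pbinom(14, size = 20, prob = 0.74)

# Or directly, since lower.tail = FALSE returns P(X > 14).
p_upper <- pbinom(14, size = 20, prob = 0.74, lower.tail = FALSE)

print(c(p_sum = p_sum, p_cdf = p_cdf, p_upper = p_upper))
```

Note the cutoff of 14, not 15: `pbinom(q, ...)` returns P(X <= q), so "15 or more" is the complement of "14 or fewer".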

  1. Which histogram given below correctly shows the probability histogram for the binomial model described in f?

Graph B shows the probability histogram for the binomial model described in f. It is left skewed, and its location and peak match the model, whose mean is \(n\pi = 20(0.74) = 14.8\).

Exercise 3: For each of the following questions, say whether the random variable is reasonably approximated by a binomial random variable or not, and explain your answer. Comment on the reasonableness of each of the things that must be true for a variable to be a binomial random variable (ex: identify \(n\) the number of Bernoulli trials, \(\pi\) the probability of success, etc).

  1. A fair die is rolled until a 1 appears, and X denotes the number of rolls.

This is not reasonable because there is not a fixed number of trials. The die is rolled until a 1 appears, so the number of trials is unknown in advance. The probability of success on each roll would be 1/6.

  1. Twenty of the different Badger basketball players each attempt 1 free throw and X is the total number of successful attempts.

This is not reasonable because each player likely has a different probability of making a free throw. The number of trials would be 20, but there is no single probability of success \(\pi\) shared by all of the trials.

  1. A die is rolled 50 times. Let X be the face that lands up.

This is not reasonable because X is not a count of successes. X records the face that lands up, so there is no defined success or failure and no probability of success \(\pi\). There is a fixed number of trials, 50, but the variable being recorded is not a binomial random variable.

  1. In a bag of 10 batteries, I know 2 are old. Let X be the number of old batteries I choose when taking a sample of 4 to put into my calculator.

This is not reasonable because the batteries are not replaced after each draw, so the trials are not independent and the probability of choosing an old battery changes from draw to draw. The number of trials is 4, but with different probabilities for each trial, a binomial random variable would not be reasonable. For example, the probability that the first battery is old is 2/10, but for the second battery it is 1/9 if the first was old and 2/9 if it was not.
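To see how much sampling without replacement matters here, the exact probabilities can be compared with what a binomial model would give. The sketch below assumes the hypergeometric distribution (`dhyper`) as the exact model for drawing 4 of 10 batteries; it is an added illustration, and the variable names are not from the assignment.

```r
x <- 0:2   # possible numbers of old batteries in a sample of 4 (only 2 are old)

# Sampling without replacement: hypergeometric with m = 2 old, n = 8 new, k = 4 drawn.
p_hyper <- dhyper(x, m = 2, n = 8, k = 4)

# What a binomial model (n = 4, pi = 2/10) would give instead, for comparison only.
p_binom <- dbinom(x, size = 4, prob = 0.2)

print(data.frame(x = x, hypergeometric = p_hyper, binomial = p_binom))
```

The binomial column also puts positive probability on 3 and 4 old batteries, which is impossible with only 2 old batteries in the bag.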

  1. It is reported that 20% of Madison homeowners have installed a home security system. Let X be the number of homes without home security systems installed in a random sample of 100 houses in the Madison city limits.

This is reasonable because there is a fixed number of trials and a constant probability of success. The number of trials is n = 100, the houses in the random sample. Because X counts homes without a security system, a success is a home without one, so \(\pi = 1 - 0.20 = 0.80\). Assuming the 100 houses are a small fraction of all Madison homes, the trials are approximately independent, so X is reasonably approximated by a binomial random variable.