Defining Random Variables, Transforming, Binomial, and Normal RVs

Question 1 A chemical supply company ships a certain solvent in 10-gallon drums. Let X represent the number of drums ordered by a randomly chosen customer. Assume X has the following probability mass function (pmf). The mean and variance of X are \(\mu_X=2.3\) and \(\sigma^2_X=1.81\):

X P(X=x)
1 0.4
2 0.2
3 0.2
4 0.1
5 0.1
  1. Calculate \(P(X \le 2)\) and describe what it means in the context of the problem.

\(P(X \le 2)\) = \(P(X=2)\) + \(P(X=1)\)

0.2 + 0.4
## [1] 0.6

This means the probability that a randomly chosen customer orders 2 or fewer drums is 0.6.
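The same calculation can be sketched in R by storing the pmf as two vectors. The names `x` and `p` are illustrative and not part of the original assignment; the sketch also checks the stated mean and variance against the table.

```r
# pmf of X from the table above (vector names are illustrative)
x <- 1:5
p <- c(0.4, 0.2, 0.2, 0.1, 0.1)

# P(X <= 2): add the probabilities of every value that is at most 2
p_le_2 <- sum(p[x <= 2])

# Mean and variance computed directly from the pmf
mu_X  <- sum(x * p)
var_X <- sum((x - mu_X)^2 * p)

print(c(p_le_2 = p_le_2, mu_X = mu_X, var_X = var_X))
```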

  1. Let Y be the number of gallons ordered, so \(Y=10X\). Determine the probability mass function of Y.
Number of gallons ordered, Y 10*1 = 10 10*2 = 20 10*3 = 30 10*4 = 40 10*5 = 50
Probability 0.4 0.2 0.2 0.1 0.1
  1. Calculate the mean number of gallons ordered \(\mu_Y\).

\(\mu_Y\) = \(10*\mu_X\)

10*2.3
## [1] 23
  1. Calculate the standard deviation of the number of gallons ordered, \(\sigma_Y\).

\(\sigma_Y\) = \(10*\sigma_X\) = \(10*\sqrt{\sigma^2_X}\)

10*sqrt(1.81)
## [1] 13.45362
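As a check on the scaling rules used above, the mean and standard deviation of Y can also be computed directly from the pmf of Y. This is a minimal sketch; the vector names are illustrative.

```r
# pmf of X, then the transformation Y = 10 * X
x <- 1:5
p <- c(0.4, 0.2, 0.2, 0.1, 0.1)
y <- 10 * x   # the values are rescaled; the probabilities are unchanged

# Mean and standard deviation of Y computed from its own pmf
mu_Y <- sum(y * p)
sd_Y <- sqrt(sum((y - mu_Y)^2 * p))

print(data.frame(y = y, prob = p))
print(c(mu_Y = mu_Y, sd_Y = sd_Y))
```

Both values should agree with the shortcuts \(10*\mu_X\) and \(10*\sigma_X\).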

Exercise 2 Prevention after acute myocardial infarction (AMI) is primarily managed through medications. A large cohort study of post-AMI patients >65 years old (*) found that only 74% of patients filled all their discharge prescriptions by 120 days after discharge.

A physician at UW has 4 post-AMI patients >65 yo and would like to use 0.74 as his estimate for \(\pi\), the probability of each of his patients filling all of their discharge prescriptions by 120 days after discharge. Define a random variable F, the count of the physician’s four patients who fill all of their discharge prescriptions by 120 days after discharge. Assume that prescription-filling behavior is independent across the 4 patients and that \(\pi=0.74\).

  1. Determine the probability distribution of F (write out the pmf) using probability theory.
F P(F=f)
0 \((1-0.74)^4 = 0.26^4\)
1 \(4(0.74)(0.26)^3\)
2 \(6(0.74)^2(0.26)^2\)
3 \(4(0.74)^3(0.26)\)
4 \(0.74^4\)
dbinom(0, 4, .74, log = FALSE)
## [1] 0.00456976
dbinom(1, 4, .74, log = FALSE)
## [1] 0.05202496
dbinom(2, 4, .74, log = FALSE)
## [1] 0.2221066
dbinom(3, 4, .74, log = FALSE)
## [1] 0.421433
dbinom(4, 4, .74, log = FALSE)
## [1] 0.2998658
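The five `dbinom` calls can also be written as one vectorized call and compared with the formula \(\binom{n}{f}\pi^f(1-\pi)^{n-f}\). This is a sketch only; `prob_fill`, `pmf_formula`, and `pmf_dbinom` are illustrative names.

```r
n <- 4
prob_fill <- 0.74   # pi, the assumed probability of filling all prescriptions
f <- 0:n

# pmf from the binomial formula: choose(n, f) * pi^f * (1 - pi)^(n - f)
pmf_formula <- choose(n, f) * prob_fill^f * (1 - prob_fill)^(n - f)

# Same pmf from R's built-in binomial density
pmf_dbinom <- dbinom(f, size = n, prob = prob_fill)

print(data.frame(f, pmf_formula, pmf_dbinom))
```

Note that `choose(4, 2)` is 6, which is the coefficient for F = 2.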
  1. Compute the probability that F>0. What does this value mean in the context of the scenario?

P(F>0) = 1-P(F=0)

1-dbinom(0, 4, .74, log = FALSE)
## [1] 0.9954302

The probability that at least one of the 4 patients will fill all of their discharge prescriptions by 120 days after discharge is 99.54%.

  1. What is the expected value for F, \(\mu_F\)? What does that value mean in the context of the scenario?

\(\mu_F\) = 0*P(F=0)+1*P(F=1)+2*P(F=2)+3*P(F=3)+4*P(F=4)

0*(0.00456976)+1*(0.05202496)+2*(0.2221066)+3*(0.421433)+4*(0.2998658)
## [1] 2.96

This value is the long-run average. If we repeated the experiment many times, each time recording how many of 4 patients following this model fill all of their prescriptions by 120 days after discharge, the average of the observed counts would approach 2.96.

  1. What is the standard deviation for F, \(\sigma_F\)?

\(\sigma_F\) = \(\sqrt{\sum_{f=0}^{4}(f-\mu_F)^2\,P(F=f)}\) =

sqrt((0.00456976*(0-2.96)^2)+(0.05202496*(1-2.96)^2)+(0.2221066*(2-2.96)^2)+(0.421433*(3-2.96)^2)+(0.2998658*(4-2.96)^2))
## [1] 0.8772685
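Because F is binomial, the mean and standard deviation can also be checked with the shortcut formulas \(\mu_F = n\pi\) and \(\sigma_F = \sqrt{n\pi(1-\pi)}\). A minimal sketch, with illustrative variable names:

```r
n <- 4
prob_fill <- 0.74   # pi

# Binomial shortcuts for the mean and standard deviation
mu_F <- n * prob_fill
sd_F <- sqrt(n * prob_fill * (1 - prob_fill))

print(c(mu_F = mu_F, sd_F = sd_F))
```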
  1. Explain (briefly) how you can use the following simulation to check your answers for part 2a. Some questions to consider: Why did I define FilledPresc as I did? What values are stored into the CountFilled vector? What does the histogram show?
FilledPresc=c(rep(1,74), rep(0,26))
manytimes=100000
CountFilled=rep(0,manytimes)
set.seed(1)
for (i in 1:manytimes){
  samp=sample(FilledPresc,4, replace=TRUE)
  CountFilled[i]=sum(samp)
}

hist(CountFilled, labels=TRUE, ylim=c(0,.5*manytimes), breaks=seq(-0.5, 4.5, 1))

FilledPresc defines the population: 74% of its values are 1 (fills all prescriptions) and 26% are 0 (doesn’t fill all prescriptions). It is the vector we sample from inside the loop. The variable ‘manytimes’ is set to 100,000, and CountFilled starts as 100,000 zeros that act as placeholders. On each pass through the ‘for’ loop, we draw 4 values with replacement (one per patient), and their sum, the number of the 4 patients who filled all prescriptions, replaces one placeholder. The histogram shows how many of the 100,000 simulated groups of 4 gave each count from 0 to 4.

This simulation checks my answer to 2a because dividing each bar’s count by 100,000 gives a relative frequency that should be close to the corresponding probability in the pmf from part a.
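One way to make that comparison concrete is to tabulate the simulated counts and put the relative frequencies next to the exact pmf. The sketch below is an assumption-laden variant of the simulation: it draws all samples at once instead of looping, so its random draws will not match the loop above one for one, although they follow the same model.

```r
set.seed(1)
manytimes <- 100000
FilledPresc <- c(rep(1, 74), rep(0, 26))

# Draw 4 patients per simulated group; each row of the matrix is one group
draws <- matrix(sample(FilledPresc, 4 * manytimes, replace = TRUE), ncol = 4)
CountFilled <- rowSums(draws)

# Relative frequency of each count 0..4 versus the exact binomial pmf
sim_prop <- as.vector(table(factor(CountFilled, levels = 0:4))) / manytimes
exact <- dbinom(0:4, size = 4, prob = 0.74)

print(round(data.frame(f = 0:4, simulated = sim_prop, exact = exact), 4))
```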

  1. Suppose this physician now has 20 post-AMI patients >65 years and wants to use a Binomial model (n=20, \(\pi=0.74\)) to describe the count of those 20 patients who will get all discharge prescriptions filled within 120 days. What is the probability that exactly 15 of those 20 patients get all discharge prescriptions filled within 120 days?

The probability that exactly 15 of the 20 patients fill all their prescriptions within 120 days after discharge is 0.20127, or 20.127%.

dbinom(x=15, size=20, prob=74/100)
## [1] 0.2012734
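The same value follows from the binomial formula \(\binom{20}{15}(0.74)^{15}(0.26)^{5}\). A short sketch, with illustrative names:

```r
n <- 20
prob_fill <- 0.74   # pi
k <- 15

# Binomial formula written out by hand
p_by_hand <- choose(n, k) * prob_fill^k * (1 - prob_fill)^(n - k)

# Built-in equivalent
p_dbinom <- dbinom(k, size = n, prob = prob_fill)

print(c(by_hand = p_by_hand, dbinom = p_dbinom))
```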

Question 3 For each of the following questions, say whether the random variable is reasonably approximated by a binomial RV or not, and explain your answer. If it is not a binomial process, explain what assumptions are not well met. If it is a binomial process, comment on the validity of each of the things that must be true for a process to be a binomial process (e.g., identify \(n\), the number of Bernoulli trials, \(\pi\), the probability of success, etc.).

  1. A fair die is rolled until a 1 appears, and X denotes the number of rolls.

This is not a binomial random variable because the number of trials is not fixed in advance: we keep rolling until a 1 appears.
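A count of trials until the first success follows a geometric distribution rather than a binomial one. The sketch below computes the probability that the first 1 appears on roll 1, 2, ..., 5; the variable names are illustrative. R's `dgeom` counts failures before the first success, so the roll number is shifted by one.

```r
p_one <- 1 / 6   # probability of rolling a 1 on any single roll
rolls <- 1:5     # roll on which the first 1 appears

# P(first 1 on roll r) = (5/6)^(r - 1) * (1/6)
prob_first_one <- (1 - p_one)^(rolls - 1) * p_one

# dgeom takes the number of failures before the first success: r - 1
prob_dgeom <- dgeom(rolls - 1, prob = p_one)

print(data.frame(rolls, prob_first_one, prob_dgeom))
```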

  1. Twenty of the different Badger basketball players each attempt 1 free throw and X is the total number of successful attempts.

Not a binomial random variable because each player has a different probability of making a free throw, so \(\pi\) is not constant across the 20 trials. Independence is also questionable, since one player's success or failure could potentially affect the performance of another.

  1. A die is rolled 50 times. Let X be the face that lands up.

This is not a binomial random variable. A binomial RV counts the number of successes in n independent Bernoulli trials, but here X records which face lands up, so each roll has six possible outcomes (1 through 6) rather than two.

The 50 rolls are independent with constant probabilities, so a related count would be binomial: for example, the number of 1s in the 50 rolls has \(n=50\) and \(\pi=1/6\).

  1. In a bag of 10 batteries, I know 2 are old. Let X be the number of old batteries I choose when taking a sample of 4 to put into my calculator.

Not a binomial because the probability of picking an old battery will change as you pick each battery. For example, the beginning probability of you choosing an old battery is 2/10, but let’s say you choose an old battery on your first choice. Now, the probability of choosing an old battery will become 1/9, which is a different value from 2/10.
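Sampling without replacement from a small group like this is described by the hypergeometric distribution. The sketch below compares the exact hypergeometric probabilities with what a binomial model with \(n=4\) and \(\pi=0.2\) would give, to show how far off the binomial would be here. The variable names are illustrative.

```r
old_count <- 0:2   # possible numbers of old batteries in the sample of 4

# Exact: 2 old and 8 not-old batteries in the bag, 4 drawn without replacement
p_hyper <- dhyper(old_count, m = 2, n = 8, k = 4)

# Binomial with n = 4 and pi = 2/10, which wrongly assumes constant pi
p_binom <- dbinom(old_count, size = 4, prob = 0.2)

print(data.frame(old_count, p_hyper, p_binom))
```

The binomial model would also assign positive probability to 3 or 4 old batteries, which is impossible with only 2 in the bag.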

  1. It is reported that 20% of Madison homeowners have installed a home security system. Let X be the number of homes without home security systems installed in a random sample of 100 houses in the Madison city limits.

This would be well approximated by a binomial because we have a fixed number of trials, n=100.

Since X counts homes without a security system, a success is a home without one: \(\pi=0.8\), and the probability of failure (a home with a security system) is \(1-\pi=0.2\).

The probability of success also stays constant (or just about) from draw to draw because the population being sampled (houses in the Madison city limits) is very large relative to the sample of 100. Additionally, the outcomes are approximately independent because the houses are chosen at random, so whether one house has a system does not depend on another.

Exercise 4: Weights of female cats of a certain breed (A) are well approximated by a normal distribution with mean 4.1 kg and standard deviation of 0.6 kg: \(W_A \sim N(4.1, 0.6^2)\).

  1. What proportion of female cats of that breed (A) have weights between 3.7 and 4.4 kg?

\(P(3.7<W_A<4.4) =\)

pnorm(4.4,4.1,0.6)- pnorm(3.7,4.1,0.6)
## [1] 0.4389699
  1. A female cat of that breed (A) has a weight that is 0.5 standard deviations above the mean. What proportion of female cats of that breed (A) are heavier than this one?
4.1+(0.6/2)
## [1] 4.4
1 - pnorm(4.4, 4.1, 0.6)
## [1] 0.3085375
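Both of the probabilities above can also be found by standardizing, \(z = (w-\mu_A)/\sigma_A\), and using the standard normal distribution. A minimal sketch with illustrative names:

```r
mu_A <- 4.1
sd_A <- 0.6

# z-scores for the two weights in the first part
z_low  <- (3.7 - mu_A) / sd_A
z_high <- (4.4 - mu_A) / sd_A

# P(3.7 < W_A < 4.4) on the standard normal scale
p_between <- pnorm(z_high) - pnorm(z_low)

# A cat 0.5 standard deviations above the mean has z = 0.5;
# lower.tail = FALSE returns the upper-tail area directly
p_heavier <- pnorm(0.5, lower.tail = FALSE)

print(c(p_between = p_between, p_heavier = p_heavier))
```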
  1. How heavy is a female cat of this breed whose weight is at the 80th percentile?
qnorm(0.8,4.1,0.6)
## [1] 4.604973
  1. What is the IQR of weights for female cats of this breed using the normal distribution approximation?

\(IQR= Q3 -Q1\)

Q3: 75th percentile Q1: 25th percentile

qnorm(0.75,4.1,.6)-qnorm(0.25,4.1,.6)
## [1] 0.8093877
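For any normal distribution the IQR is a fixed multiple of the standard deviation, \(IQR = 2\,z_{0.75}\,\sigma\), which is about \(1.349\sigma\). The sketch below checks that shortcut against the direct calculation; the names are illustrative.

```r
mu_A <- 4.1
sd_A <- 0.6

# Direct calculation from the two quartiles
q1 <- qnorm(0.25, mu_A, sd_A)
q3 <- qnorm(0.75, mu_A, sd_A)
iqr_direct <- q3 - q1

# Shortcut: the quartiles sit qnorm(0.75) standard deviations from the mean
iqr_shortcut <- 2 * qnorm(0.75) * sd_A

print(c(q1 = q1, q3 = q3, iqr_direct = iqr_direct, iqr_shortcut = iqr_shortcut))
```

The mean drops out of the difference, so the IQR depends only on \(\sigma_A\).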
  1. Females from another breed of cats (breed B) have weights well approximated by a normal distribution with mean 10.6 lb and standard deviation of 0.9 lb: \(W_{B.lb} \sim N(10.6, 0.9^2)\). Transform the weights of cat breed B into kilograms using the conversion: 1 lb \(\approx\) 0.454 kg. You can use the transformation: \(W_{B}=0.454*W_{B.lb}\). Compare the shape, mean, and standard deviation of the two breeds.

\(\mu_{A}=4.1\) \(\sigma_{A}=0.6\)

\(\mu_{B}=0.454*\mu_{B.lb}\), computed here with the more precise factor 0.453592 kg per lb:

10.6*0.453592
## [1] 4.808075

\(\sigma_{B}=0.454*\sigma_{B.lb}\), computed here with the more precise factor 0.453592 kg per lb:

0.9*0.453592
## [1] 0.4082328

Shape: a linear transformation of a normal random variable is still normal, so the weights of both breeds follow symmetric, bell-shaped normal curves. Mean: breed B's mean (about 4.81 kg) is higher than breed A's (4.1 kg), so the curve for B is centered further to the right. Standard deviation: breed B's standard deviation (about 0.41 kg) is lower than breed A's (0.6 kg), so the weights for breed B are more tightly grouped around their mean, giving a narrower, taller curve.
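The comparison can be checked numerically by simulating breed B weights in pounds, converting them to kilograms, and comparing with the scaled mean and standard deviation. This is a sketch only: it uses the rounded factor 0.454 from the question, and the simulation size, seed, and variable names are arbitrary choices rather than part of the original solution.

```r
lb_to_kg <- 0.454   # rounded conversion factor given in the question

# Theoretical mean and sd of breed B in kg under W_B = 0.454 * W_B.lb
mu_B <- lb_to_kg * 10.6
sd_B <- lb_to_kg * 0.9

# Simulate weights in lb, convert to kg, and summarize
set.seed(1)
w_B_lb <- rnorm(100000, mean = 10.6, sd = 0.9)
w_B_kg <- lb_to_kg * w_B_lb
print(c(sim_mean = mean(w_B_kg), sim_sd = sd(w_B_kg), mu_B = mu_B, sd_B = sd_B))

# Height of each normal density at its own mean: a smaller sd gives a taller peak
peak_A <- dnorm(4.1, mean = 4.1, sd = 0.6)
peak_B <- dnorm(mu_B, mean = mu_B, sd = sd_B)
print(c(peak_A = peak_A, peak_B = peak_B))
```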