Part One
prob <- 0.15
Sample <- 7
Q1 Find the expected number of people that will develop a Condition
Knowing that \[E(x)=np\]
Binomial_expected <- function(prob, size) {
E=size*prob
return(E)
}
Expected_value=Binomial_expected(prob, Sample)
Expected_value
## [1] 1.05
Q2 Find the probability that exactly 4 smokers of the sample will develop a severe lung condition.
The easiest way to calculate this manually is with the Probability Mass Function forumla: \[\Pr(X = x) = p(x) = \binom{n}{x} p^x (1-p)^{n-x},\] which is the same as the pbinomial(x = x, n=sample, p=prob)
dbinom(4,size=Sample, p=prob)
## [1] 0.01088153
There is a way to verify that this is the correct value: We can use the Cumulative Distribution Function formula and the axiom \[F(x) = \Pr(X \leq x) = \sum_{i = 0}^{\lfloor{x}\rfloor}\binom{n}{i} p^i (1-p)^{n-i}\] \[p(x)=F(x)-F(x-1)\] so in this case we can use pbinom(x = x, n=sample, p=prob)`
pbinom(4,size=Sample, prob = prob)-pbinom(3, size=Sample, prob=prob)
## [1] 0.01088153
Since both give the same answer we can be confident that this is the correct number
Q3 Find the probability that more than 2 smokers of the sample will develop a severe lung condition.
Again we have 2 ways to find this: One uses the fact that \[\Pr(X > x) = 1-F(x)\] So we can find the CDF and take one from it
1-pbinom(2, size=Sample, prob=prob)
## [1] 0.07376516
The second, more computationally intensive method involves summing the PMFs together
\[Pr(X > x) = \sum_{i = x+1}^{Max Size} p(i) \]
sum(dbinom(3:Sample, size = Sample, prob=prob))
## [1] 0.07376516
Again they match
Q4 Find the probability that at least 3 but less than 6 smokers of the sample will develop a severe lung condition.
In this case the we want \[Pr(3 \le x < 6 )\] First thing to note is that \[Pr(x<X)= F(x-1)\] Secondly we know that \[Pr(X\le x \le Y ) = F(Y)-F(X-1)\] so we can use
pbinom(5, size=Sample, prob=prob) - pbinom(2, size=Sample, prob = prob)
## [1] 0.07369568
or again we can sum the probabilities being careful to note that it goes to 5 because the question explicity says less than
sum(dbinom(3:5, size = Sample, prob=prob))
## [1] 0.07369568
Q5 Find the conditional probability that no more than 5 smokers of the sample will develop a severe lung condition given that more than 3 smokers of the sample will develop a severe lung condition.
For this we need to make use of a number of function and capabilities and axioms
Firstly the conditional probability forumla
\[\Pr(A | B) = \frac{\Pr(A \cap B)}{\Pr(B)}.\] so we can infer that
\[\Pr(X\le 5 | X > 3) = \frac{\Pr(3 \le X < 6 ))}{\Pr(X > 3)}= \frac{Pr(X\le 5)-Pr(X \le 3)}{1-Pr(X \le 3)}.\]
Lets Calculate this
(pbinom(5, size=Sample, prob=prob)-pbinom(3, size=Sample, prob=prob))/(1-pbinom(3, size=Sample, prob=prob))
## [1] 0.9942591
We can also use the formula for our Sample size \[Pr(X\le5|X>3)=\frac{Pr(x=4)+Pr(x=5)}{Pr(x=4)+Pr(x=5)+Pr(x=6)+Pr(x=7)}\]
sum(dbinom(4:5, size=Sample, prob=prob))/sum(dbinom(4:Sample, size=Sample, prob=prob))
## [1] 0.9942591
Q6 Create a barplot of the PMF of the random variable representing the number of smokers of the sample who will develop a severe lung condition.
Lets write a function to do this and add some colours to seperate the liklihood of each event occuring
plotBinomPMF <- function(n, p) {
##Get parameters
x <- 0:n
y <- dbinom(x, size = n, prob=p)
yLim <- c(0, 1.1*max(y))
rounding<- round(y, digits=10)
breaks <- c(0, 0.25*max(y), 0.5*max(y), 0.75*max(y))
cols<-c("darkgreen", "darkblue", "orange", "darkred")[findInterval(y, vec=breaks)]
# Plot the value
plot = barplot(y, main='Liklihood of number of Smokers getting a Lung Condition', names.arg = x,ylim=yLim, xlab = "# Smokers with Lung condition", ylab="liklihood of # smokers getting condition", col= cols)
#Add some labels to the axes
text(x=plot, y=y, labels = rounding, pos=3, cex=0.5)
#Add legend to plot
legend("topright" ,
legend=c("below 25% of max prob","between 25% and 50%" , "Between 50 and 75%", "over 75%"),
text.col = c("darkGreen", "darkblue", "Orange", "darkRed") ,
col=c("darkGreen", "darkblue", "Orange", "darkRed"),
lty=1,
cex=0.8)
#return (plot)
}
Calling the function
plotBinomPMF(n = Sample,
p = prob)
we can also plot the Cumulative Density Function for this sample space
plotbinomPDF <- function(Sample, prob) {
x_C <- 0:Sample
y_C <- c(0, pbinom(x_C, size=Sample, prob=prob))
yLim <- c(0, 1.1*max(y_C))
rounding<- round(y_C, digits=10)
breaks <- c(0, 0.25, 0.5, 0.99)
cols<-c("darkred", "orange", "darkblue", "darkgreen")[findInterval(y_C, vec=breaks)]
CDF <- stepfun(x_C, y_C)
plot1<- plot.stepfun(CDF, main = 'Cumulative Density Function of the above distribution', ylab = 'Probability of max # of smokers with Condition',xlab="maximum number of smokers with condition", ylim = yLim, verticals = FALSE, do.points = TRUE, pch = 16, col.points = cols)
legend("bottomright",
legend=c("P<0.25","0.25<P<0.5" , "0.5<P<.75", "P> .75"),
text.col = c("darkRed", "Orange", "Darkblue", "darkgreen") ,
col=c("darkRed", "orange", "DarkBlue", "darkgreen"), )
}
Calling the function
plotbinomPDF(7, 0.15)
TASK 2 Mutations in a chromosone follows a Poisson Distribution with mean of 9 For a Poisson distribution we need to have the mean of the distribution \(\lambda\)
lambda <- 9
For a Poisson distribution the Probability Mass Function follows the formula \[Pr(X=x)=f(x) = \frac{\lambda^x e^{-\lambda}}{x!}, \qquad x = 0, 1, 2, \ldots\] We can also use the Cumulative Density Function \[Pr(X \le x)=F(x) = \sum_{i = 0}^{\lfloor{x}\rfloor}\frac{\lambda^i e^{-\lambda}}{i!}, \qquad x \ge 0.\] Q7 Find the probability of observing less than 3 or more than 7 mutations. Here we need \[Pr(X<3) \cap Pr(X>7)\]
For this we can use the CDF but we need to make use of the fact that \[Pr(X>x)=1-Pr(X \le x)\] So the for the above question since the events are independant (cannot have an event being 1 and 2) we can say that probabilities can be summed \[Px(X<3||X>7)=Pr(x \le 2)+[1-Pr(X \le 7)]\]
less3 <- ppois(2, lambda = lambda)
greater7 <- 1-ppois(7, lambda=lambda)
less3+greater7
## [1] 0.6823352
Q8 Find the probability of observing no more than 5 mutations. For the probability of no more than 5 mutations we can use one of 2 ways we can sum the PMF or we can use the CDF PMF
sum(dpois(0:5, lambda = lambda))
## [1] 0.1156905
CDF
ppois(5, lambda= lambda)
## [1] 0.1156905
Q9 Find the probability of observing either more than 5 or less than 2 mutations. \[Pr(X>5|X<2)= 1-Pr(2\le X \le 5)\]
1-(sum(dpois(2:5, lambda)))
## [1] 0.8855436
Q10 Create a barplot of the PMF of the random variable representing the number of mutations (constrain the number of mutations to a maximum of 13).
plotPMFPoisson <- function(max, lambda) {
x<-0:max
y<-dpois(x, lambda = lambda)
yLim <- c(0, 1.1*max(y))
rounding<- round(y, digits=4)
breaks <- c(0, max(y)*0.25, max(y)*0.5, max(y)*0.75)
cols<-c("darkgreen","darkblue", "orange", "darkred")[findInterval(y, vec=breaks)]
plot1 = barplot(y,
main='Liklihood of number of Mutations on this chromosone',
names.arg = x,
ylim=yLim,
xlab = "# mutations", ylab="liklihood of # mutations", col= cols)
text(x=plot1, y=y, labels = rounding, pos=3, cex=0.5)
legend("topleft",
legend=c("prob <25% of Max Prob","between 25% and 50% " , "between 50% and 75%", "over 75%"),
text.col = c("darkGreen", "Dark Blue", "Orange", "darkred") ,
col=c("darkgreen", "darkblue", "orange", "darkred") )
}
calling the function with maximum 13 mutations
plotPMFPoisson(13,9)