Part One

prob <- 0.15
Sample <- 7

Q1 Find the expected number of people that will develop a Condition

Knowing that \[E(x)=np\]

Binomial_expected <- function(prob, size) {
  E=size*prob
  return(E)
}

Expected_value=Binomial_expected(prob, Sample)
Expected_value
## [1] 1.05

Q2 Find the probability that exactly 4 smokers of the sample will develop a severe lung condition.

The easiest way to calculate this manually is with the Probability Mass Function forumla: \[\Pr(X = x) = p(x) = \binom{n}{x} p^x (1-p)^{n-x},\] which is the same as the pbinomial(x = x, n=sample, p=prob)

dbinom(4,size=Sample, p=prob)
## [1] 0.01088153

There is a way to verify that this is the correct value: We can use the Cumulative Distribution Function formula and the axiom \[F(x) = \Pr(X \leq x) = \sum_{i = 0}^{\lfloor{x}\rfloor}\binom{n}{i} p^i (1-p)^{n-i}\] \[p(x)=F(x)-F(x-1)\] so in this case we can use pbinom(x = x, n=sample, p=prob)`

pbinom(4,size=Sample, prob = prob)-pbinom(3, size=Sample, prob=prob)
## [1] 0.01088153

Since both give the same answer we can be confident that this is the correct number

Q3 Find the probability that more than 2 smokers of the sample will develop a severe lung condition.

Again we have 2 ways to find this: One uses the fact that \[\Pr(X > x) = 1-F(x)\] So we can find the CDF and take one from it

1-pbinom(2, size=Sample, prob=prob)
## [1] 0.07376516

The second, more computationally intensive method involves summing the PMFs together

\[Pr(X > x) = \sum_{i = x+1}^{Max Size} p(i) \]

sum(dbinom(3:Sample, size = Sample, prob=prob))
## [1] 0.07376516

Again they match

Q4 Find the probability that at least 3 but less than 6 smokers of the sample will develop a severe lung condition.

In this case the we want \[Pr(3 \le x < 6 )\] First thing to note is that \[Pr(x<X)= F(x-1)\] Secondly we know that \[Pr(X\le x \le Y ) = F(Y)-F(X-1)\] so we can use

pbinom(5, size=Sample, prob=prob) - pbinom(2, size=Sample, prob = prob)
## [1] 0.07369568

or again we can sum the probabilities being careful to note that it goes to 5 because the question explicity says less than

 sum(dbinom(3:5, size = Sample, prob=prob))
## [1] 0.07369568

Q5 Find the conditional probability that no more than 5 smokers of the sample will develop a severe lung condition given that more than 3 smokers of the sample will develop a severe lung condition.

For this we need to make use of a number of function and capabilities and axioms

Firstly the conditional probability forumla

\[\Pr(A | B) = \frac{\Pr(A \cap B)}{\Pr(B)}.\] so we can infer that

\[\Pr(X\le 5 | X > 3) = \frac{\Pr(3 \le X < 6 ))}{\Pr(X > 3)}= \frac{Pr(X\le 5)-Pr(X \le 3)}{1-Pr(X \le 3)}.\]

Lets Calculate this

(pbinom(5, size=Sample, prob=prob)-pbinom(3, size=Sample, prob=prob))/(1-pbinom(3, size=Sample, prob=prob))
## [1] 0.9942591

We can also use the formula for our Sample size \[Pr(X\le5|X>3)=\frac{Pr(x=4)+Pr(x=5)}{Pr(x=4)+Pr(x=5)+Pr(x=6)+Pr(x=7)}\]

sum(dbinom(4:5, size=Sample, prob=prob))/sum(dbinom(4:Sample, size=Sample, prob=prob))
## [1] 0.9942591

Q6 Create a barplot of the PMF of the random variable representing the number of smokers of the sample who will develop a severe lung condition.

Lets write a function to do this and add some colours to seperate the liklihood of each event occuring

plotBinomPMF <- function(n, p) {
    ##Get parameters
    x <- 0:n
    y <- dbinom(x, size = n, prob=p)
    yLim <- c(0, 1.1*max(y))
    rounding<- round(y, digits=10)
    breaks <- c(0, 0.25*max(y), 0.5*max(y), 0.75*max(y))
    cols<-c("darkgreen", "darkblue", "orange", "darkred")[findInterval(y, vec=breaks)]
    
    # Plot the value
    plot = barplot(y, main='Liklihood of number of Smokers getting a Lung Condition', names.arg = x,ylim=yLim, xlab = "# Smokers with Lung condition", ylab="liklihood of # smokers getting condition", col= cols)
    #Add some labels to the axes
    text(x=plot, y=y, labels = rounding, pos=3, cex=0.5)
    #Add legend to plot
    legend("topright" ,
           legend=c("below 25% of max prob","between 25% and 50%" , "Between 50 and 75%", "over 75%"),
           text.col = c("darkGreen", "darkblue", "Orange", "darkRed") ,
       col=c("darkGreen", "darkblue", "Orange", "darkRed"), 
       lty=1, 
       cex=0.8)
    
    #return (plot)
}

Calling the function

plotBinomPMF(n = Sample,
             p = prob)

we can also plot the Cumulative Density Function for this sample space

plotbinomPDF <- function(Sample, prob) {
  x_C <- 0:Sample
  y_C <- c(0, pbinom(x_C, size=Sample, prob=prob))
  yLim <- c(0, 1.1*max(y_C))
  rounding<- round(y_C, digits=10)
  breaks <- c(0, 0.25, 0.5, 0.99)
  cols<-c("darkred", "orange", "darkblue", "darkgreen")[findInterval(y_C, vec=breaks)]
  CDF <- stepfun(x_C, y_C)
  
  plot1<- plot.stepfun(CDF, main = 'Cumulative Density Function of the above distribution', ylab = 'Probability of max # of smokers with Condition',xlab="maximum number of smokers with condition", ylim = yLim, verticals = FALSE, do.points = TRUE, pch = 16, col.points = cols)
  
  legend("bottomright",
         legend=c("P<0.25","0.25<P<0.5" , "0.5<P<.75", "P> .75"),
           text.col = c("darkRed", "Orange", "Darkblue", "darkgreen") ,
       col=c("darkRed", "orange", "DarkBlue", "darkgreen"),  )
}

Calling the function

plotbinomPDF(7, 0.15)

TASK 2 Mutations in a chromosone follows a Poisson Distribution with mean of 9 For a Poisson distribution we need to have the mean of the distribution \(\lambda\)

lambda <- 9

For a Poisson distribution the Probability Mass Function follows the formula \[Pr(X=x)=f(x) = \frac{\lambda^x e^{-\lambda}}{x!}, \qquad x = 0, 1, 2, \ldots\] We can also use the Cumulative Density Function \[Pr(X \le x)=F(x) = \sum_{i = 0}^{\lfloor{x}\rfloor}\frac{\lambda^i e^{-\lambda}}{i!}, \qquad x \ge 0.\] Q7 Find the probability of observing less than 3 or more than 7 mutations. Here we need \[Pr(X<3) \cap Pr(X>7)\]

For this we can use the CDF but we need to make use of the fact that \[Pr(X>x)=1-Pr(X \le x)\] So the for the above question since the events are independant (cannot have an event being 1 and 2) we can say that probabilities can be summed \[Px(X<3||X>7)=Pr(x \le 2)+[1-Pr(X \le 7)]\]

less3 <- ppois(2, lambda = lambda)
greater7 <- 1-ppois(7, lambda=lambda)
less3+greater7
## [1] 0.6823352

Q8 Find the probability of observing no more than 5 mutations. For the probability of no more than 5 mutations we can use one of 2 ways we can sum the PMF or we can use the CDF PMF

sum(dpois(0:5, lambda = lambda))
## [1] 0.1156905

CDF

ppois(5, lambda= lambda)
## [1] 0.1156905

Q9 Find the probability of observing either more than 5 or less than 2 mutations. \[Pr(X>5|X<2)= 1-Pr(2\le X \le 5)\]

1-(sum(dpois(2:5, lambda)))
## [1] 0.8855436

Q10 Create a barplot of the PMF of the random variable representing the number of mutations (constrain the number of mutations to a maximum of 13).

plotPMFPoisson <- function(max, lambda) {
      x<-0:max
      y<-dpois(x, lambda = lambda)
      yLim <- c(0, 1.1*max(y))
      rounding<- round(y, digits=4)
      breaks <- c(0, max(y)*0.25, max(y)*0.5, max(y)*0.75)
      cols<-c("darkgreen","darkblue", "orange", "darkred")[findInterval(y, vec=breaks)]
      
      
      plot1 = barplot(y,
                      main='Liklihood of number of Mutations on this chromosone', 
                      names.arg = x,
                      ylim=yLim, 
                      xlab = "# mutations", ylab="liklihood of # mutations", col= cols)
      
      text(x=plot1, y=y, labels = rounding, pos=3, cex=0.5)
      legend("topleft",
         legend=c("prob <25% of Max Prob","between 25% and 50% " , "between 50% and 75%", "over 75%"),
           text.col = c("darkGreen", "Dark Blue", "Orange", "darkred") ,
       col=c("darkgreen", "darkblue", "orange", "darkred")  )
      
} 

calling the function with maximum 13 mutations

plotPMFPoisson(13,9)