Wk 7 Live Lab, Probability Distribution

1. 1. Binomial Distribution.

R makes the following functions available:

dbinom(x, size, prob)
- probability of exactly x successes in size trials when the probability of success on each trial is prob
pbinom(q, size, prob, lower.tail)
- cumulative probability of at most q successes, P(X <= q), when lower.tail = TRUE (the default, left tail); with lower.tail = FALSE it returns the right tail, P(X > q)
rbinom(n, size, prob)
- returns n random numbers drawn from a binomial distribution with the given size and prob

rbinom(n = 10, size = 1, prob = 0.5)

##  [1] 1 0 0 1 1 0 0 1 1 1

rbinom(1, 100, 0.3)

## [1] 33

A<-rbinom(10000,2,0.08)
head(A)

## [1] 0 0 0 0 0 0

table(A)

## A
##    0    1    2 
## 8505 1424   71

table(A)/10000

## A
##      0      1      2 
## 0.8505 0.1424 0.0071

dbinom(0,2,0.08)#three different lines

## [1] 0.8464

dbinom(1,2,0.08)

## [1] 0.1472

dbinom(2,2,0.08)

## [1] 0.0064

round(dbinom(0:2,2,0.08),4) #all 3 probabilities in one line

## [1] 0.8464 0.1472 0.0064

# Axis titles can be added via the xlab and ylab arguments. 
barplot(table(A)/10000,main="PDF",xlab="X",
          ylab="Probability",col=2)

cumsum(table(A))/10000

##      0      1      2 
## 0.8505 0.9929 1.0000

round(pbinom(0:2,2,0.08),4)

## [1] 0.8464 0.9936 1.0000
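The exact values from pbinom are running totals of the dbinom probabilities, which is why the simulated cumsum(table(A))/10000 lands close to them. A minimal sketch of that relationship follows; the names pmf and cdf are introduced here for illustration only.

```r
# Exact point probabilities for X ~ Binomial(size = 2, prob = 0.08)
pmf <- dbinom(0:2, size = 2, prob = 0.08)

# Accumulating the point probabilities should reproduce pbinom
cdf <- cumsum(pmf)
all.equal(cdf, pbinom(0:2, size = 2, prob = 0.08))
```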

plot(0:2,cumsum(table(A))/10000,type='b',
       xlab="X",ylab="Probability", main="CDF")

#change in shape when you increase n to 10
barplot(round(dbinom(0:10,10,0.08),4),main="PDF",xlab="X",
       names.arg = 0:10, ylab="Probability",col=2,ylim=c(0,1))

1.1.1 Worked example

A fair coin is tossed 60 times. Treating heads as a success, answer the following:

a. Display the probability distribution function on a graph.

A<-rbinom(100000,60,0.5)
head(A)

## [1] 27 24 30 33 34 26

table(A)

## A
##    15    16    17    18    19    20    21    22    23    24    25    26 
##     5    12    41    72   157   396   693  1251  2118  3210  4547  6110 
##    27    28    29    30    31    32    33    34    35    36    37    38 
##  7626  9011  9900 10279  9773  8816  7648  6120  4491  3100  2019  1222 
##    39    40    41    42    43    44    45 
##   715   349   184    84    34    10     7

table(A)/100000 #get prob

## A
##      15      16      17      18      19      20      21      22      23 
## 0.00005 0.00012 0.00041 0.00072 0.00157 0.00396 0.00693 0.01251 0.02118 
##      24      25      26      27      28      29      30      31      32 
## 0.03210 0.04547 0.06110 0.07626 0.09011 0.09900 0.10279 0.09773 0.08816 
##      33      34      35      36      37      38      39      40      41 
## 0.07648 0.06120 0.04491 0.03100 0.02019 0.01222 0.00715 0.00349 0.00184 
##      42      43      44      45 
## 0.00084 0.00034 0.00010 0.00007

barplot(table(A)/100000,main="PDF", xlab="X",
        ylab="Probability",col=2)

p<-round(dbinom(0:60,60,0.5),8) 
barplot(p,main="PDF",names.arg = 0:60, xlab="X",
        ylab="Probability",col=2)

b. What is the probability heads comes up 20 times?

dbinom(20,60,0.5)

## [1] 0.003635846

c. What is the probability heads comes up 20, 25 or 30 times?

# prob 20 or 25 or 30 heads out of 60
dbinom(20,60,0.5)+
dbinom(25,60,0.5)+
dbinom(30,60,0.5)

## [1] 0.1512435
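Because dbinom is vectorised over its first argument, the three terms can also be summed in a single call. This is only a sketch of an equivalent form; the name p_any is an illustrative choice.

```r
# P(X = 20) + P(X = 25) + P(X = 30) for X ~ Binomial(60, 0.5)
p_any <- sum(dbinom(c(20, 25, 30), size = 60, prob = 0.5))
p_any
```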

d. What is the probability heads comes up fewer than 20 times?

pbinom(19,60,0.5) #gives P(X<=19)

## [1] 0.003108801

e. What is the probability heads comes up between 20 and 30 times (inclusive)?

pbinom(30,60,0.5) #gives P(X<=30)

## [1] 0.5512891

pbinom(19,60,0.5) #gives P(X<=19)

## [1] 0.003108801

#p(20<=X<=30)
pbinom(30,60,0.5) - pbinom(19,60,0.5)

## [1] 0.5481803
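The subtraction uses pbinom(19, ...) rather than pbinom(20, ...) because the outcome X = 20 has to stay in the interval. One way to check this is to add up the point probabilities for 20 to 30 directly. The names p_sum and p_diff below are illustrative.

```r
# P(20 <= X <= 30) as a direct sum of point probabilities
p_sum <- sum(dbinom(20:30, size = 60, prob = 0.5))

# The same quantity as a difference of cumulative probabilities
p_diff <- pbinom(30, size = 60, prob = 0.5) - pbinom(19, size = 60, prob = 0.5)

all.equal(p_sum, p_diff)
```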

cum_p<-round(pbinom(0:60,60,0.5),8)

plot(0:60,cum_p,type='l',
     xlab="X",ylab="Probability", main="CDF",ylim=c(0,1))

1.1.2 Assessment 1

Q 3a

A lab network consisting of 20 computers was attacked by a computer virus. This virus enters each computer with probability 0.4, independently of other computers. Find the probability that the virus enters at least 10 computers.

$P(x) = \binom{n}{x} p^{x}(1-p)^{n-x}$

To find the probability of exactly 10 computers being infected:

$P(10) = \binom{20}{10} (0.4)^{10} (0.6)^{10} = 184756 \times (0.4)^{10} (0.6)^{10} \approx 0.11714155$

dbinom(10, size=20, prob=0.4)

## [1] 0.1171416
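As a check on the binomial formula above, the same probability can be computed by hand with choose(). This is a minimal sketch; the names n, x, p and manual are introduced here purely for illustration.

```r
n <- 20   # number of computers
x <- 10   # number infected
p <- 0.4  # probability the virus enters any one computer

# Binomial formula: choose(n, x) * p^x * (1 - p)^(n - x)
manual <- choose(n, x) * p^x * (1 - p)^(n - x)
manual

# Should agree with the built-in function
all.equal(manual, dbinom(x, size = n, prob = p))
```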

Now find the probability that the virus enters at least 10 computers:

dbinom(10, size=20, prob=0.4) +
dbinom(11, size=20, prob=0.4) +
dbinom(12, size=20, prob=0.4) +
dbinom(13, size=20, prob=0.4) +
dbinom(14, size=20, prob=0.4) +
dbinom(15, size=20, prob=0.4) +
dbinom(16, size=20, prob=0.4) +
dbinom(17, size=20, prob=0.4) +
dbinom(18, size=20, prob=0.4) +
dbinom(19, size=20, prob=0.4) +
dbinom(20, size=20, prob=0.4)

## [1] 0.2446628

Alternatively, we can use the cumulative probability function for the binomial distribution, pbinom, since P(X >= 10) = 1 - P(X <= 9):

1 - pbinom(9, size=20, prob=0.4)

## [1] 0.2446628
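Two shorter forms give the same answer: a vectorised sum over 10:20, and pbinom with lower.tail = FALSE, which returns P(X > 9) directly. The sketch below compares them; p_sum and p_upper are illustrative names.

```r
# Sum of the eleven point probabilities in one call
p_sum <- sum(dbinom(10:20, size = 20, prob = 0.4))

# Upper tail P(X > 9), which is P(X >= 10) for a whole-number count
p_upper <- pbinom(9, size = 20, prob = 0.4, lower.tail = FALSE)

all.equal(p_sum, p_upper)
```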

Q 5a

Ten percent of computer parts produced by a certain supplier are defective. What is the probability that a sample of 10 parts contains more than 3 defective ones?

$P(X > 3) = 1 - P(X \le 3)$

1 - pbinom(3, size=10, prob=0.1)

## [1] 0.0127952
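A quick simulation can serve as a sanity check on this answer, in the same spirit as the rbinom examples earlier in the lab. This is a rough sketch: the seed, the number of simulated samples and the names sims, est and exact are arbitrary choices made here, and the estimate will only be close to the exact value, not equal to it.

```r
set.seed(1)  # arbitrary seed so the run is reproducible

# Simulate 100000 samples of 10 parts, each part defective with probability 0.1
sims <- rbinom(100000, size = 10, prob = 0.1)

# Proportion of samples with more than 3 defective parts
est <- mean(sims > 3)

# Exact value for comparison
exact <- 1 - pbinom(3, size = 10, prob = 0.1)
c(simulated = est, exact = exact)
```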

1. 2. Geometric Distribution.

1.2.1 Assessment 1

Q 3b

A lab network consisting of 20 computers was attacked by a computer virus. This virus enters each computer with probability 0.4, independently of other computers. A computer manager checks the lab computers, one after another, to see if they were infected by the virus. What is the probability that she must test at least 6 computers to find the first infected one?

$P(x) = p(1-p)^{x-1}$

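The geometric distribution in R is zero-based: dgeom(x, prob) gives the probability of x failures before the first success. So to get P(6), the probability that the first infected computer is the sixth one tested (5 clean computers first):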

dgeom(5, prob=0.4)

## [1] 0.031104

But the question says “at least”, so we would need to sum P(6), P(7), and so on. It is easier to use the complement:

$P(X \ge 6) = 1 - P(X < 6) = 1 - P(X \le 5)$

1 - (dgeom(0, prob=0.4) +
     dgeom(1, prob=0.4) +  
     dgeom(2, prob=0.4) +
     dgeom(3, prob=0.4) +
     dgeom(4, prob=0.4))

## [1] 0.07776

1 - pgeom(4,0.4)

## [1] 0.07776

Alternatively, pgeom has a lower.tail argument. It defaults to TRUE, which returns P[X <= x]; setting lower.tail = FALSE returns P[X > x]. Here X counts the failures before the first success, so more than 4 failures means the first infected computer is found on the sixth test or later (at least 6):

pgeom(4,0.4,lower.tail = FALSE)

## [1] 0.07776
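There is also a direct way to see this number: needing at least 6 tests means the first 5 computers checked are all clean, which has probability $(1-p)^5$. A minimal sketch of that check, with p and tail_closed as illustrative names:

```r
p <- 0.4  # probability a given computer is infected

# First five computers all clean: (1 - p)^5
tail_closed <- (1 - p)^5
tail_closed

# Should agree with the upper tail from pgeom
all.equal(tail_closed, pgeom(4, prob = p, lower.tail = FALSE))
```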

1. 3. Normal Distribution.

1.3.1. Worked example

A random variable comes from a normal distribution with mean 22 and variance 25.

#the variance is 25
#a normal distribution is described by its mean and sd, so take the square root
mu<-22
sigma<-sqrt(25)
sigma

## [1] 5

a. Display the probability density function on a graph.

X<-rnorm(100000,mean=mu,sd=sigma)
hist(X)

plot(density(X),main="Normal PDF",xlab="X")
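To see how closely the simulated values follow the theoretical density, one option is to draw the histogram on the density scale and overlay dnorm. This is a sketch only: the seed, the number of breaks and the line styling are arbitrary choices made here.

```r
set.seed(1)  # arbitrary seed so the run is reproducible
mu <- 22
sigma <- 5
X <- rnorm(100000, mean = mu, sd = sigma)

# freq = FALSE puts the histogram on the density scale
hist(X, freq = FALSE, breaks = 50, main = "Simulated vs theoretical PDF")

# Overlay the theoretical normal density on the same axes
curve(dnorm(x, mean = mu, sd = sigma), add = TRUE, col = 2, lwd = 2)
```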

b. Display the cumulative distribution function on a graph.

x<-seq(0,44,length=1000) #create a sequence of numbers
x[1:10]

##  [1] 0.00000000 0.04404404 0.08808809 0.13213213 0.17617618 0.22022022
##  [7] 0.26426426 0.30830831 0.35235235 0.39639640

x[990:1000]

##  [1] 43.55956 43.60360 43.64765 43.69169 43.73574 43.77978 43.82382
##  [8] 43.86787 43.91191 43.95596 44.00000

dnorm(x[500:510],mu,sigma)

##  [1] 0.07978768 0.07978768 0.07978149 0.07976911 0.07975054 0.07972579
##  [7] 0.07969487 0.07965777 0.07961452 0.07956511 0.07950957

plot(x,dnorm(x,mu,sigma), type='l',
     main="Normal PDF")

plot(x,pnorm(x,mu,sigma),type='l',
     main="Normal CDF",xlab="X",
     ylab="Cumulative Probability")

c. What is the probability that the random variable lies between 16.2 and 27.5?

pnorm(27.5,mu,sigma)-pnorm(16.2,mu,sigma)

## [1] 0.7413095
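The same probability can be obtained by standardising first, which is how it would be looked up in a standard normal table. A minimal sketch, with z_lo and z_hi as illustrative names:

```r
mu <- 22
sigma <- 5

# Convert the limits to z-scores: z = (x - mu) / sigma
z_lo <- (16.2 - mu) / sigma  # -1.16
z_hi <- (27.5 - mu) / sigma  #  1.10

# pnorm with no mean or sd uses the standard normal (mean 0, sd 1)
pnorm(z_hi) - pnorm(z_lo)
```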

d. What is the probability that the random variable is greater than 29?

pnorm(29,mu,sigma) #p(x<29)

## [1] 0.9192433

1- pnorm(29,mu,sigma)#p(x>29)

## [1] 0.08075666
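As with pbinom and pgeom, pnorm accepts lower.tail = FALSE, which returns the upper tail without the explicit subtraction. A short sketch, with p_upper as an illustrative name:

```r
# P(X > 29) for X ~ Normal(mean = 22, sd = 5)
p_upper <- pnorm(29, mean = 22, sd = 5, lower.tail = FALSE)
p_upper
```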

e. What is the probability that the random variable is less than 17?

pnorm(17,mu,sigma) #P(X<=17)=P(X<17)

## [1] 0.1586553

f. What is the probability that the random variable is less than 15 or greater than 25?

1-pnorm(25,mu,sigma) #P(x>25)

## [1] 0.2742531

pnorm(15,mu,sigma) #P(x<15)

## [1] 0.08075666

1-pnorm(25,mu,sigma)+pnorm(15,mu,sigma)

## [1] 0.3550098

#P(X<15 or X>25) = P(X<15)+P(X>25)
pnorm(15,mu,sigma)+(1-pnorm(25,mu,sigma))

## [1] 0.3550098
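Another route to the same answer is the complement: "less than 15 or greater than 25" is everything except the interval from 15 to 25. A minimal sketch, with p_inside and p_outside as illustrative names:

```r
mu <- 22
sigma <- 5

# P(15 <= X <= 25)
p_inside <- pnorm(25, mu, sigma) - pnorm(15, mu, sigma)

# P(X < 15 or X > 25) = 1 - P(15 <= X <= 25)
p_outside <- 1 - p_inside
p_outside
```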