1 Class Example: Probability Distribution

1.1 Count of heads on a toss of two coins

Let X be the count of heads on a toss of two coins.

Input the data.

x  <- c(  0,  1,   2)
px <- c(.25, .5, .25)

Now, lets plot it.

?barplot

barplot(height = px, 
        names.arg = x, 
        main = "Discrete Distribution", 
        xlab = "Heads", 
        ylab = "Probability"
        )

Calculate the mean and variance of the probability distribution.

1.2 MEAN

# print 
x

## [1] 0 1 2

px

## [1] 0.25 0.50 0.25

1.2.1 3 different ways

# manually calculate the mean
mu <- 0* .25 + 1 *.5 + 2 * .25
mu

## [1] 1

x*px

## [1] 0.0 0.5 0.5

mu2 <-sum(x * px) 
mu2 # should be one too if we calculated things correctly

## [1] 1

mu3 <- x%*%px # dot product in R 
mu3

##      [,1]
## [1,]    1

Dot Product

1.3 VARIANCE

var(x) / sd(x) will assume each outcome is equally probable

## [1] 0 1 2

px

## [1] 0.25 0.50 0.25

# assumes each outcome is equally probable 

var(x)

## [1] 1

sd(x)

## [1] 1

Lets apply the correct weighting instead.

# manually 
sigma2 <- (0-1)^2 *.25 + (1-1)^2 * .50 + (2-1)^2 *.25 
sigma2

## [1] 0.5

sigma2_alt <- sum( (x-mu)^2 * px ) # PEDMAS
sigma2_alt

## [1] 0.5

# Var (X) = N * p * (1-p)
sigma2_alt2 <- 2 * .5 *(1-.5) 
sigma2_alt2

## [1] 0.5

Standard deviation will still be sqrt(variance).

# SD
sigma <- sqrt(sigma2)
sigma

## [1] 0.7071068

moments <- c(mu, sigma2)
moments

## [1] 1.0 0.5

2 Binomial Distribution: Class Example (# of Heads on coin)

Let X be the random variable that tracks the number of heads on n=5 flips of an unfair coin (p = .60). What is the probability of getting exactly 3 heads, P(X=3)?

Let X be the random variable that tracks the number of heads on \(n=5\) flips of an unfair coin (\(\pi = .60\)).

Our solution required us to calculate -

\(P \ (X=3 \ | \ n = 5, \ p = .6)\)

2.1 Logical Approach from 1st Principles

# P (X=3 | n = 5, p = .6)

n <-  5
p <- .6


p^3 * (1-p)^2  # prob of one sequence, say HHHTT

## [1] 0.03456

(1-p)^2 * p^3  # prob of another sequence, say TTHHH

## [1] 0.03456

(1-p)^2 * p^3  # prob of another sequence, say TTHHH...and so on...

## [1] 0.03456

choose(5,3)   # 10 different sequences in total

## [1] 10

# using pmf formula 
choose(5,3) * p^3 * (1-p)^2

## [1] 0.3456

2.1.1 Aside: Mean and Variance of Binomial

### expected mean of binomial
# N * p 
# size * prob 
 5 * .6    # 3 successes

## [1] 3

### expected variance of binomial 
# N * p * (1-p) 
# size * prob * (1-prob)
 5 * .6 * .4    # 1.2

## [1] 1.2

2.2 Calculate directly in R with `dbinom`

Use the dbinom function to calculate this -

?dbinom
dbinom(x = 3, 
       size = 5, 
       prob = .6
       )

## [1] 0.3456

3 Graph

This might help clarify the situation but requires a bit of coding and understanding the binomial distribution.

# Number of trials
n <- 5

# Probability of success
p <- 0.6

# Generate values for x (number of successes)
x <- 0:n

# Calculate the probabilities for each value of x
probabilities <- dbinom(x = x, 
                        size = n, 
                        prob = p
                        )

# Plot the binomial distribution
barplot(height = probabilities, 
        names.arg = x, 
        col = "skyblue", 
        main = "Binomial Distribution", 
        xlab = "Number of Successes", 
        ylab = "Probability"
        )

4 Binomial Distribution: Class Example (Hospital)

The national 30-day case fatality rate for intracranial hemorrhage (ICH) surgical intervention is 16%. The neurosurgical team performed 20 of these operations last year and experienced 6 deaths. Does sufficient evidence exist to suggest that the neurosurgical practice pattern requires review? In other words, what is the probability that this team would experience something this or more severe, i.e., 6 or more deaths in 20 trials if they were operating at the national rate of .16? Use an evidentiary standard that investigates if the event is rare (routinely defined as less than .05 probability.)

You should get the same answer through all the methods.

Lets plot the graph at the beginning now.

# Number of trials
n <- 20

# Probability of success
p <- 0.16

# Generate values for x (number of successes)
x <- 0:n

# Calculate the probabilities for each value of x (can be calculated for individual values as well instead of the entire x vector)
dbinom(x = 0:20,
           size = 20, 
           prob = .16)

##  [1] 3.059044e-02 1.165350e-01 2.108729e-01 2.409976e-01 1.950933e-01
##  [6] 1.189140e-01 5.662571e-02 2.157170e-02 6.676955e-03 1.695735e-03
## [11] 3.552968e-04 6.152325e-05 8.789035e-06 1.030217e-06 9.811587e-08
## [16] 7.475495e-09 4.449699e-10 1.994263e-11 6.330993e-13 1.269372e-14
## [21] 1.208926e-16

# store them 
probabilities <- dbinom(x = x, 
                        size = n, 
                        prob = p
                        )

# Plot the binomial distribution
barplot(height = probabilities, 
        names.arg = x, 
        col = "green", 
        main = "Binomial Distribution", 
        xlab = "Number of Successes", 
        ylab = "Probability"
        )

4.1 Brute Force - PDFs

Typing equations in R Markdown.

\[\begin{align*} choose(20,6) \* \pi\^6\*(1- \pi)\^14 & \\\\ \\ choose(20,7) \* \pi\^7 \* (1- \pi)\^13 & \\\\ \+ ... + & \\\\ choose(20,20) \* \pi\^20 \* (1- \pi)\^0 \\\\ \end{align*}\]

4.1.1 Add the PDFs directly

#  BRUTE FORCE CALCULATION
## P(X >= 6) = P(X=6) + P(X=7) + ... + P(X=20)
 
 pi <- .16
 
         choose(20,6)  * pi^6  * (1-pi)^14 + 
         choose(20,7)  * pi^7  * (1-pi)^13 + 
         choose(20,8)  * pi^8  * (1-pi)^12 + 
         choose(20,9)  * pi^9  * (1-pi)^11 + 
         choose(20,10) * pi^10 * (1-pi)^10 + 
         choose(20,11) * pi^11 * (1-pi)^9 +
         choose(20,12) * pi^12 * (1-pi)^8 + 
         choose(20,13) * pi^13 * (1-pi)^7 + 
         choose(20,14) * pi^14 * (1-pi)^6 + 
         choose(20,15) * pi^15 * (1-pi)^5 + 
         choose(20,16) * pi^16 * (1-pi)^4 + 
         choose(20,17) * pi^17 * (1-pi)^3 +
         choose(20,18) * pi^18 * (1-pi)^2 + 
         choose(20,19) * pi^19 * (1-pi)^1 + 
         choose(20,20) * pi^20 * (1-pi)^0

## [1] 0.08699685

Easy to type in R though if you combine with some basic functions like dbinom and sum. Else manually tying it would take too much time.

# PMF addition

sum(dbinom(x = 6:20,
           size = 20, 
           prob = .16)
    )

## [1] 0.08699685

4.1.2 Complement Rule and add the PDFs

\[1 = P(X<6) + P(X \geq 6) \] due to sum of the area under a curve (of a discrete variable) is equal to 1/complement rule. This simplifies to \[ \\P(X\geq6) = 1- P(X \leq 5)\] and
\[P(X\geq6) = 1 - P(X \leq 5) \\ = 1 – P(X=0) - P(X=1) – P(X=2) – P(X=3) - P(X=4) – P(X=5) \]

## Manually - use the pdf formula n choose x * pi^x * (1-pi)^x, smart way 
pi <- .16

1 - (choose(20,0) * pi^0 * (1-pi)^20 + 
     choose(20,1) * pi^1 * (1-pi)^19 + 
     choose(20,2) * pi^2 * (1-pi)^18 + 
     choose(20,3) * pi^3 * (1-pi)^17 + 
     choose(20,4) * pi^4 * (1-pi)^16 + 
     choose(20,5) * pi^5 * (1-pi)^15
     ) # can be added due to mutually exclusivity

## [1] 0.08699685

1 - sum(dbinom(x = 0:5,
          size = 20, 
          prob = .16
          )
        )

## [1] 0.08699685

4.2 CDF

We need the area to the right of the curve.

Two ways, either declare the lower tail = FALSE , or apply the complement rule.

4.2.1 Complement rule

Compute the regular CDF and take its complement to find the required area.

?pbinom

# CDF - P(X>=6) = 1- P(X<=5)
1 - pbinom(q = 5, 
           size = 20,
           prob = .16 
           )

## [1] 0.08699685

4.2.2 Lower Tail

Or simply compute the same areas above by changing the default options of lower.tail=TRUE

# CDF - P(X>=6) = 1- P(X<=5)
    pbinom(q = 5, 
           size = 20,
           prob = .16, 
           lower.tail =  FALSE
           )

## [1] 0.08699685

5 Normal Distribution

The normal human temperature is known from old studies to be 98.6 and the standard deviation is 1. What is the probability that a randomly selected person’s temperature will be less than 96? More than 99.5? Between 96 and 98?

5.1 Temperature Example

Lets begin with plotting the normal distribution centerted at 98.6 degrees Celsius and a standard deviation of 1 degree Celsius.

# Set the mean and standard deviation
mu    <- 98.6
sigma <- 1

# Generate a range of values around the mean
?seq
x <- seq(from       =  mu - 3*sigma,
         to         =  mu + 3*sigma, 
         length.out =  1000
         )

# Calculate the probability density function
pdf <- dnorm(x    = x, 
             mean = mu, 
             sd   = sigma
             )

# Plot the normal distribution

plot(x    = x, 
     y    = pdf,
     type = 'l', 
     col  = 'red', 
     lwd  = 2, 
     xlab = 'Temperature', 
     ylab = 'Density',
     main = 'Normal Distribution with Mean 98.6 and SD 1'
     )

5.1.1 Empirical Rule

dnorm (density norm) - density of an observation on the curve / pdf
pnorm (probability norm) - probability of area under the curve / cdf

Lets apply the empirical rule to check the area under 1, 2 and 3 sd of the mean.

?pnorm

pnorm(q = 98.6 + 1, mean = 98.6, sd = 1 ) - pnorm(q = 98.6 - 1, mean = 98.6, sd = 1 )

## [1] 0.6826895

pnorm(q = 98.6 + 2, mean = 98.6, sd = 1 ) - pnorm(q = 98.6 - 2, mean = 98.6, sd = 1 )

## [1] 0.9544997

pnorm(q = 98.6 + 3, mean = 98.6, sd = 1 ) - pnorm(q = 98.6 - 3, mean = 98.6, sd = 1 )

## [1] 0.9973002

5.2 Under 96 degree Celsius

Lets plot the area as well so that we can eyeball the probability size as well to double check our answer.

plot(x    = x, 
     y    = pdf,
     type = 'l', 
     col  = 'blue', 
     lwd  = 2, 
     xlab = 'Temperature', 
     ylab = 'Density',
     main = 'Normal Distribution with Mean 98.6 and SD 1'
     )


# Shade the area under the curve for values below 96
x_shade <- seq(from       = mu - 3*sigma, 
               to         = 96, 
               length.out = 100
               )

pdf_shade <- dnorm(x    = x_shade, 
                   mean = mu, 
                   sd   = sigma
                   )

?polygon
polygon(x      =  c(x_shade, rev(x_shade)),
        y      =  c(pdf_shade, rep(x      = 0, 
                                   times  = length(pdf_shade)
                                  )
                   ), 
        col    = 'red', 
        border = NA
        )

# Draw a vertical line at 98.6
abline(v = 98.6, col = 'darkgreen', lty = 2)
abline(v = 97.6, col = 'green', lty = 2)
abline(v = 99.6, col = 'green', lty = 2)
abline(v = 96.6,  col = 'cyan', lty = 2)
abline(v = 100.6, col = 'cyan', lty = 2)
abline(v = 95.6,  col = 'yellow', lty = 2)
abline(v = 101.6, col = 'yellow', lty = 2)

Lets calculate the probability -

pnorm(q = 96, mean = 98.6, sd = 1)

## [1] 0.004661188

# explicitly state the lower tail argument
pnorm(q = 96, mean = 98.6, sd = 1, lower.tail = TRUE)

## [1] 0.004661188

1 - pnorm(q = 96, mean = 98.6, sd = 1, lower.tail = FALSE)

## [1] 0.004661188

5.3 Over 99.5 degree Celsius

Lets plot the area as well so that we can eyeball the probability size as well to double check our answer.

plot(x    = x, 
     y    = pdf,
     type = 'l', 
     col  = 'blue', 
     lwd  = 2, 
     xlab = 'Temperature', 
     ylab = 'Density',
     main = 'Normal Distribution with Mean 98.6 and SD 1'
     )


# Shade the area under the curve for values below 96
x_shade <- seq(from       = 99.5, 
               to         = mu + 3*sigma, 
               length.out = 100
               )

pdf_shade <- dnorm(x    = x_shade, 
                   mean = mu, 
                   sd   = sigma
                   )

polygon(x      =  c(x_shade, rev(x_shade)),
        y      =  c(pdf_shade, rep(x      = 0, 
                                   times  = length(pdf_shade)
                                  )
                   ), 
        col    = 'red', 
        border = NA
        )

# Draw a vertical line at 98.6
abline(v = 98.6, col = 'darkgreen', lty = 2)
abline(v = 97.6, col = 'green', lty = 2)
abline(v = 99.6, col = 'green', lty = 2)
abline(v = 96.6,  col = 'cyan', lty = 2)
abline(v = 100.6, col = 'cyan', lty = 2)
abline(v = 95.6,  col = 'yellow', lty = 2)
abline(v = 101.6, col = 'yellow', lty = 2)

Lets calculate the probability -

pnorm(q = 99.5, mean = 98.6, sd = 1, lower.tail = FALSE)

## [1] 0.1840601

1 - pnorm(q = 99.5, mean = 98.6, sd = 1)

## [1] 0.1840601

# explicitly state the lower tail argument
1 - pnorm(q = 99.5, mean = 98.6, sd = 1, lower.tail = TRUE)

## [1] 0.1840601

5.4 Between 96 and 98 degree Celsius

Lets plot the area as well so that we can eyeball the probability size as well to double check our answer.

plot(x    = x, 
     y    = pdf,
     type = 'l', 
     col  = 'blue', 
     lwd  = 2, 
     xlab = 'Temperature', 
     ylab = 'Density',
     main = 'Normal Distribution with Mean 98.6 and SD 1'
     )


# Shade the area under the curve for values below 96
x_shade <- seq(from       = 96, 
               to         = 98, 
               length.out = 100
               )

pdf_shade <- dnorm(x    = x_shade, 
                   mean = mu, 
                   sd   = sigma
                   )
?rev
polygon(x      =  c(x_shade, rev(x_shade)),
        y      =  c(pdf_shade, rep(x      = 0, 
                                   times  = length(pdf_shade)
                                  )
                   ), 
        col    = 'red', 
        border = NA
        )

# Draw a vertical line at 98.6
abline(v = 98.6, col = 'darkgreen', lty = 2)
abline(v = 97.6, col = 'green', lty = 2)
abline(v = 99.6, col = 'green', lty = 2)
abline(v = 96.6,  col = 'cyan', lty = 2)
abline(v = 100.6, col = 'cyan', lty = 2)
abline(v = 95.6,  col = 'yellow', lty = 2)
abline(v = 101.6, col = 'yellow', lty = 2)

Lets calculate the probability -

pnorm(q = 98, mean = 98.6, sd = 1) - pnorm(q = 96, mean = 98.6, sd = 1)

## [1] 0.2695919

# explicitly state the lower tail argument
pnorm(q = 98, mean = 98.6, sd = 1, lower.tail = T) - pnorm(q = 96, mean = 98.6, sd = 1, lower.tail = T)

## [1] 0.2695919

1 - pnorm(q = 98, mean = 98.6, sd = 1, lower.tail = F) - (1 - pnorm(q = 96, mean = 98.6, sd = 1, lower.tail = F))

## [1] 0.2695919

6 Temperature of the median human? Continued Example from above.

In the context of a normal distribution with a mean of 98.6 and a standard deviation of 1, you can use the quantile function qnorm to calculate the temperature corresponding to the median (50th percentile) or any other quantile.

?qnorm # quantile function
qnorm(p = .5, 
      mean = 98.6, 
      sd = 1
      )

## [1] 98.6

This code is using the qnorm function to find the temperature corresponding to the 50th percentile (median) in a normal distribution with a mean of 98.6 and a standard deviation of 1.

The result would give you the temperature at which 50% of the population has a lower temperature, and 50% has a higher temperature, assuming a normal distribution. This would be the median temperature in this specific distribution.

Feel free to modify the p parameter in qnorm to calculate the temperature corresponding to different percentiles (e.g., p = 0.95 for the 95th percentile).

qnorm(p = .95, 
      mean = 98.6, 
      sd = 1
      )

## [1] 100.2449

This seems to be a very high temperature.

qnorm(p = .05, 
      mean = 98.6, 
      sd = 1
      )

## [1] 96.95515

This seems to be a very low temperature.

7 Poisson

The rate of medication errors across the United States is 2 per 1000 orders (volume). A sample from your local hospital finds 5 errors in a sample of 2000. What is the probability that this hospital is within the US standard? In other words, what is the probability of having 5 or more errors.

Let X be count of errors in 1000 orders.

We have 5 errors in sample of 2000.

\(Rate=2\), and \(t=\dfrac{2000}{1000}=2\).

Thus, \(E(X)=Var(X)=\lambda*t=2*2=4\).

t          <-    2        # 2000 sample / 1000 orders
lambda_t   <-    2 * t    # lambda = 2 (2 errors in 1000 orders); but there are 2000 orders in sample so scale lambda by time
   lambda_t

## [1] 4

# P (Y >=5 | lambda * t = 4) 

 
################################################ 
### Method 1: brute force
sum(dpois(x      = 5:2000, 
          lambda = lambda_t)
    )

## [1] 0.3711631

################################################  
### Method 2: Lower Tail TRUE
### P(Y >= 5 errors | lambda=2, t=2) = 1–P(Y<=4) = 1-ppois(y=4, lambda*t = 4)
   
1 - ppois( q          = 4,  # discrete variable adjustment - read lower tail argument specification P[X≤x]
           lambda     = lambda_t, 
           lower.tail = TRUE
           )

## [1] 0.3711631

################################################  
### Method 3: Lower Tail False / Upper Tail
### P(Y >= 5 errors | lambda=2, t=2) 
 ppois( q          = 4,   # discrete variable adjustment - read lower tail argument specification P[X>x]
        lambda     = lambda_t, 
        lower.tail = FALSE
        )

## [1] 0.3711631

 ?ppois
 
 
 
 ?rpois # random sample to generate distribution
hist(rpois(n      = 10000,
           lambda = 4)
      ) # mean of 10000 samples with lambda  4 plotted as histogram

Poisson is rate limiting distribution of Binomial…
Binomial will approximate well the Poisson distribution if \(n\) is large and \(\pi\) is small…If you reduce the sample size or probability of success, the numbers will be off. You can try it by chnagig the numbers below.

N  <- 2000
pi <- 2/1000  # rate of medication error is 2 in 1000 orders, the truth

lambda_approx <- N * pi # alternatively
   lambda_approx

## [1] 4

   lambda_t

## [1] 4

sum(dbinom(x    = 5:2000, 
           size = 2000, 
           prob = pi)   # .002
    )

## [1] 0.371163

sum(dpois(x    = 5:2000, 
          lambda = lambda_approx) # .002
    )

## [1] 0.3711631

pbinom(q          = 4 ,
       size       = 2000,
       prob       = pi, 
       lower.tail = F
       )

## [1] 0.371163

ppois(q          = 4 ,
      lambda     = lambda_approx, 
       lower.tail = F
       )

## [1] 0.3711631

7.1 SUMMARY OF THE POISSON QUESTION IN COMMANDS

7.1.1 A. Model as a Poisson

## A.  Model as a Poisson (# .371)
### P(Y >= 5 errors  | lambda = 2, t=2) = 1 – P(Y<=4) = 1-ppois(4,4)

sum(dpois(x = 5:2000, lambda = 4) )                      # brute force

## [1] 0.3711631

1 - ppois(q = 4,lambda = 4,lower.tail=T)  # adjust for discrete variable !!!

## [1] 0.3711631

ppois(q = 4,lambda = 4,lower.tail=F)      # adjust for discrete variable !!!

## [1] 0.3711631

7.1.2 B. Model as a Binomial

## B.  Model as a Binomial (# .371)
### P(X>=5 errors | n = 2000, p= .002) = 1-P(Y<=4) = 1-pbinom(4,2000,.002)

sum(dbinom(5:2000,size = 2000,prob = .002))           # brute force

## [1] 0.371163

1 - pbinom(4,2000,.002)                 # adjust for discrete variable !!!

## [1] 0.371163

pbinom(4,2000,.002,lower.tail=F)        # adjust for discrete variable !!!

## [1] 0.371163

Recall the Poisson distribution is an approximation of normal distribution when \(N\) is large and \(\pi\) is small.

7.2 Plot Poisson

# Parameters
lambda <- 4  # Mean of the Poisson distribution

# Generate x values (number of events)
x_values <- 0:50

# Calculate the PMF using dpois
pmf_values <- dpois(x = x_values, 
                    lambda = lambda
                    )

# Plot the PMF
plot(x = x_values, 
     y = pmf_values, 
     type = "h", 
     lwd = 2, 
     col = "blue",
     xlab = "Number of Events", 
     ylab = "Probability",
     main = "Poisson Distribution PMF")

7.3 Plot Binomial

# Parameters
N  <- 2000
pi <- 2/1000  # rate of medication error is 2 in 1000 orders, the truth

# Generate x values (number of events)
x_values <- 0:50

# Calculate the cumulative probabilities using ppois
pmf_values <- dbinom(x = x_values,
                     size = 2000, 
                     prob = pi
                     )

# Plot the CDF as a "step" plot
plot(x = x_values, 
     y = pmf_values, 
     type = "h", 
     lwd = 2, 
     col = "red",
     xlab = "Number of Events", 
     ylab = "Probability",
     main = "Binomial Distribution PMF"
     )

8 Hypergeometric Distribution

# 6 Choose 6 * 48 Choose 0 / 54 Choose 6 
choose(6,6) * choose(48,0)/ choose(54,6)

## [1] 3.871892e-08

?dhyper # (white balls successes, black balls are failure )
dhyper(x = 6, 
       m = 6,
       n = 48, 
       k = 6
       )

## [1] 3.871892e-08

# choose 3 or more correct balls to get the reward - mutually exclusive events so rule of addition applies.
# Prob (X >=3) 
# Prob (3, 4, 5 or 6 numbers) 

# Brute force - 

  choose(6,3) * choose(48,3) / choose(54,6) + 
  choose(6,4) * choose(48,2) / choose(54,6) +   
  choose(6,5) * choose(48,1) / choose(54,6) + 
  choose(6,6) * choose(48,0) / choose(54,6) # [1] 0.01405996

## [1] 0.01405996

sum(dhyper(x = 3:6, m = 6, n = 48, k = 6 ))

## [1] 0.01405996

1-phyper(q = 2, m = 6, n = 48, k = 6) # discrete distribution

## [1] 0.01405996

9 Extra

From Discovering Business Statistics.

## Binomial dist 
# Prob Dist:  P(X = 14 events, n trials = 19, success prob = .6) 
dbinom(x = 14, size = 19, prob = .6)

## [1] 0.09330877

# Prob Dist:  P(X <= 3  events, n trials=7, success prob = .95) 
pbinom(q = 3, size = 7,prob = .95)

## [1] 0.0001935781

1-pbinom(q = 3, size = 7,prob = .95, lower.tail = F) # because sum of area under distribution is one.

## [1] 0.0001935781

# Prob Dist:  P(X >= 5 events, n trials=14, success prob = .4) 
sum(dbinom(x = 5:14, size = 14,prob = .4))

## [1] 0.720743

1 - pbinom(q = 4, size = 14,prob = .4)  # because binomial is a discrete distribution, thus adjust X by one.

## [1] 0.720743

## Poisson dist 
# Prob Dist:  P(X = 4 events, lambda = 4.3, t = 1 )      
dpois(x = 4, lambda = 4.3)

## [1] 0.1932842

# Prob Dist:  P(X = 3 events, lambda = 3, t = 2 )   
dpois(x = 3, lambda = 6)

## [1] 0.08923508

# Prob Dist:  P(X = 4 events, lambda = 4, t = 3 )   
dpois(x = 3, lambda = 6)

## [1] 0.08923508

## HyperGeometric 

# 10 black balls and 7 red balls, choose 4 balls from 17 without replacement, prob that exactly 1 black ball drawn?

#Prob Dist:  P( )   
choose(10,1) * choose(7,3) / choose(17,4)

## [1] 0.1470588

# P(X = 1 black ball | S=10 , F= 17-10=7, n=4, N=17)
dhyper(x = 1, m = 10,  n = 7 ,k = 4)

## [1] 0.1470588

choose(4,3)

## [1] 4

factorial(3)

## [1] 6

choose(5,2) * choose(4,1) * factorial(3)

## [1] 240

Distributions

Arvind Sharma

2024-03-25

1 Class Example: Probability Distribution

1.1 Count of heads on a toss of two coins

1.2 MEAN

1.2.1 3 different ways

1.3 VARIANCE

2 Binomial Distribution: Class Example (# of Heads on coin)

2.1 Logical Approach from 1st Principles

2.1.1 Aside: Mean and Variance of Binomial

2.2 Calculate directly in R with `dbinom`

3 Graph

4 Binomial Distribution: Class Example (Hospital)

4.1 Brute Force - PDFs

4.1.1 Add the PDFs directly

4.1.2 Complement Rule and add the PDFs

4.2 CDF

4.2.1 Complement rule

4.2.2 Lower Tail

5 Normal Distribution

5.1 Temperature Example

5.1.1 Empirical Rule

5.2 Under 96 degree Celsius

5.3 Over 99.5 degree Celsius

5.4 Between 96 and 98 degree Celsius

6 Temperature of the median human? Continued Example from above.

7 Poisson

7.1 SUMMARY OF THE POISSON QUESTION IN COMMANDS

7.1.1 A. Model as a Poisson

7.1.2 B. Model as a Binomial

7.2 Plot Poisson

7.3 Plot Binomial

8 Hypergeometric Distribution

9 Extra

Distributions

Arvind Sharma

2024-03-25

1 Class Example: Probability Distribution

1.1 Count of heads on a toss of two coins

1.2 MEAN

1.2.1 3 different ways

1.3 VARIANCE

2 Binomial Distribution: Class Example (# of Heads on coin)

2.1 Logical Approach from 1st Principles

2.1.1 Aside: Mean and Variance of Binomial

2.2 Calculate directly in R with dbinom

3 Graph

4 Binomial Distribution: Class Example (Hospital)

4.1 Brute Force - PDFs

4.1.1 Add the PDFs directly

4.1.2 Complement Rule and add the PDFs

4.2 CDF

4.2.1 Complement rule

4.2.2 Lower Tail

5 Normal Distribution

5.1 Temperature Example

5.1.1 Empirical Rule

5.2 Under 96 degree Celsius

5.3 Over 99.5 degree Celsius

5.4 Between 96 and 98 degree Celsius

6 Temperature of the median human? Continued Example from above.

7 Poisson

7.1 SUMMARY OF THE POISSON QUESTION IN COMMANDS

7.1.1 A. Model as a Poisson

7.1.2 B. Model as a Binomial

7.2 Plot Poisson

7.3 Plot Binomial

8 Hypergeometric Distribution

9 Extra

2.2 Calculate directly in R with `dbinom`