A discrete distribution describes the probability of each value of a discrete random variable, that is, a random variable whose possible values are countable, such as the non-negative integers. In a discrete probability distribution, each possible value of the random variable can be assigned a nonzero probability.
A continuous distribution describes the probabilities of the possible values of a continuous random variable, which is a random variable whose set of possible values (known as the range) is infinite and uncountable.
The probability that a continuous random variable X falls in a range of values is defined as the area under the curve of its PDF over that range. Thus, only ranges of values can have a nonzero probability. The probability that a continuous random variable equals any single value is always zero.
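The probability density function is a curve describing the probability distribution of a continuous random variable. Its height at a point is a density rather than a probability, and the total area under the curve equals 1.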
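As a rough illustration of probability as area under a density curve, the sketch below uses the standard normal distribution. That choice is an assumption made only for demonstration and is not part of the exercises. It integrates the density numerically over an interval and compares the result with the difference of two CDF values.

```r
# Standard normal used purely as an illustrative continuous distribution
# P(-1 <= X <= 1) as the area under the density curve
area <- integrate(dnorm, lower = -1, upper = 1)$value

# Same probability from the cumulative distribution function
cdf_diff <- pnorm(1) - pnorm(-1)
c(area, cdf_diff)

# An interval of zero width has zero area, so P(X = 1) is 0
point_prob <- pnorm(1) - pnorm(1)
point_prob
```

Both approaches should give about 0.683 for the interval from -1 to 1, while the zero-width interval gives 0.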
A probability mass function is a function that gives the probability that a discrete random variable is exactly equal to some value.
A binomial distribution is the distribution of the number of successful outcomes in a given number of trials, each of which has the same probability of success. The number of observations or trials is fixed in advance, so the probability of a given number of successes is always computed for that fixed number of trials. Each observation or trial is independent: the outcome of one trial has no effect on the probability of success in the next.
The formula for the binomial probability is
$$P(k \text{ out of } n) = \frac{n!}{k!\,(n-k)!}\; p^{k}\,(1-p)^{n-k}$$
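The formula can be checked term by term in R. The sketch below evaluates it by hand with `choose()` and compares the result with the built-in `dbinom()`. The values n = 10, k = 4 and p = 0.3 are borrowed from exercise 5a further down, and the variable names are only illustrative.

```r
# Values borrowed from exercise 5a: 10 trials, 4 successes, success probability 0.3
n <- 10
k <- 4
p <- 0.3

# n! / (k! (n-k)!) * p^k * (1-p)^(n-k), with choose() supplying the binomial coefficient
by_hand <- choose(n, k) * p^k * (1 - p)^(n - k)

# Built-in binomial probability mass function
built_in <- dbinom(k, size = n, prob = p)

c(by_hand, built_in)
```

The two numbers should agree, and both should match the 0.2001209 reported for 5a.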
A normal distribution is a continuous probability distribution whose density is a symmetrical, bell-shaped curve determined by its mean and standard deviation. Many random variables are approximately normally distributed.
light smokers?
Sensitivity = P(T+|D+) = 0.050
moderate smokers?
Sensitivity = P(T+|D+) = 0.326
heavy smokers?
Sensitivity = P(T+|D+) = 0.652
Find the specificity of the test
False positive rate = P(T+|D-) = 0.033 = 1 - specificity
Specificity = P(T-|D-) = 1 - false positive rate = 1 - 0.033 = 0.967
#5a
dbinom(4, size=10, prob=0.3)
## [1] 0.2001209
#5b
pbinom(4, size=10, prob=0.3)
## [1] 0.8497317
# Same cumulative probability, summed term by term from the PMF
dbinom(4, size=10, prob=0.3) + dbinom(0, size=10, prob=0.3) + dbinom(1, size=10, prob=0.3) + dbinom(2, size=10, prob=0.3) + dbinom(3, size=10, prob=0.3)
## [1] 0.8497317
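The term-by-term sum can be written more compactly because `dbinom()` is vectorized over its first argument. A minimal sketch, with illustrative variable names:

```r
# P(X <= 4) for X ~ Binomial(10, 0.3), summing the PMF over 0, 1, 2, 3, 4
cum_by_sum <- sum(dbinom(0:4, size = 10, prob = 0.3))

# Same probability from the cumulative distribution function
cum_by_cdf <- pbinom(4, size = 10, prob = 0.3)

c(cum_by_sum, cum_by_cdf)
```

Both values should equal the 0.8497317 obtained above.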
# n = number of trials, p = probability of surviving 90 days post-op, q = probability of dying
n=10
p=0.7
q=0.3
c(n,p,q)
## [1] 10.0 0.7 0.3
#expected value
ev= n*p
ev
## [1] 7
#variance
variance=n*p*q
variance
## [1] 2.1
#std
std=sqrt(n*p*q)
std
## [1] 1.449138
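The shortcut formulas n*p and n*p*q can be cross-checked against the definitions of expected value and variance by summing over the whole PMF. The sketch below is one way to do that; the names `x`, `pmf`, `mean_from_pmf` and `var_from_pmf` are illustrative.

```r
n <- 10
p <- 0.7

# All possible numbers of successes and their probabilities
x <- 0:n
pmf <- dbinom(x, size = n, prob = p)

# E[X] = sum of x * P(X = x)
mean_from_pmf <- sum(x * pmf)

# Var(X) = sum of (x - E[X])^2 * P(X = x)
var_from_pmf <- sum((x - mean_from_pmf)^2 * pmf)

c(mean_from_pmf, var_from_pmf, sqrt(var_from_pmf))
```

Up to floating-point rounding, the results should agree with the 7, 2.1 and 1.449138 computed above.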
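# P(survive 90 days) = 0.7. Among those who survive 90 days, the probability of still being alive 5 years after surgery is 0.75.
# So P(alive 5 years after surgery) = 0.75 * 0.7
0.75*0.7
## [1] 0.525
# P(exactly 2 of the 10 are alive 5 years after surgery)
dbinom(2, size=10, prob=0.525)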
## [1] 0.03214253
Why would the binomial distribution provide an appropriate model? Hint: Remember the acronym B.I.N.S.
The outcome of each trial is binary (e.g. development of the disease or no development of the disease)
The outcome of each trial is independent (e.g. one phlebotomist developing the disease will not affect another phlebotomist)
There is a fixed number of trials (e.g. five phlebotomists)
The probability of developing the disease is the same for each of the five phlebotomists.
What are the parameters of the distribution of the values of X?
n = number of trials, p = probability of developing disease, q = probability of not developing disease
List the possible values for X
Possible values for X = 0, 1, 2, 3, 4, 5
What is the mean number of phlebotomists who will develop Hep-B via needle stick accident?
# n = number of trials, p = probability of developing disease, q = probability of not developing disease
n=5
p=0.3
q=0.7
c(n,p,q)
## [1] 5.0 0.3 0.7
#expected value
ev= n*p
ev
## [1] 1.5
# n = number of trials, p = probability of developing disease, q = probability of not developing disease
n=5
p=0.3
q=0.7
c(n,p,q)
## [1] 5.0 0.3 0.7
#standard deviation
std=sqrt(n*p*q)
std
## [1] 1.024695
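# 5! = 5*4*3*2*1
factorial(5)
## [1] 120
# Binomial coefficient: number of ways to choose 1 of 5
choose(5,1)
## [1] 5
# P(X = 1)
dbinom(1, size=5, prob=0.3)
## [1] 0.36015
# P(X = 0)
dbinom(0, size=5, prob=0.3)
## [1] 0.16807
# P(X >= 3), summed from the PMF
dbinom(3, size=5, prob=0.3)+dbinom(4, size=5, prob=0.3)+dbinom(5, size=5, prob=0.3)
## [1] 0.16308
# Same probability via the complement: 1 - P(X <= 2)
1- pbinom(2, size=5, prob=0.3)
## [1] 0.16308
# P(X <= 1), summed from the PMF
dbinom(0, size=5, prob=0.3)+dbinom(1, size=5, prob=0.3)
## [1] 0.52822
# Same probability from the CDF
pbinom(1, size=5, prob=0.3)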
## [1] 0.52822
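Upper-tail probabilities such as P(X >= 3) can be obtained in several equivalent ways. The sketch below puts three of them side by side for X ~ Binomial(5, 0.3); the variable names are illustrative. Note that `lower.tail = FALSE` returns P(X > q), so q = 2 is needed to get P(X >= 3).

```r
# Upper tail directly: P(X > 2) = P(X >= 3)
upper_direct <- pbinom(2, size = 5, prob = 0.3, lower.tail = FALSE)

# Complement of the lower tail: 1 - P(X <= 2)
upper_complement <- 1 - pbinom(2, size = 5, prob = 0.3)

# Sum of the PMF over 3, 4, 5
upper_sum <- sum(dbinom(3:5, size = 5, prob = 0.3))

c(upper_direct, upper_complement, upper_sum)
```

All three should give 0.16308, matching the results above.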
How many smokers would you expect to see in the study cohort, on average?
2000 adults * 0.193 = 386
What is the standard deviation of the number of smokers in the study cohort?
# n = number of people, p = probability of being a smoker, q = probability of not being a smoker
n=2000
p=0.193
q=1-p
c(n,p,q)
## [1] 2000.000 0.193 0.807
#standard deviation
std=sqrt(n*p*q)
std
## [1] 17.64942
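As an optional sanity check, the expected count and standard deviation can be compared with a simulation. The sketch below draws many hypothetical cohorts with `rbinom()`; the seed and the number of simulated cohorts are arbitrary assumptions, and the simulated values will only approximate the exact ones.

```r
set.seed(1)  # arbitrary seed, chosen only for reproducibility

n <- 2000
p <- 0.193

# Simulate the number of smokers in 100,000 hypothetical cohorts of 2000 adults
sim_counts <- rbinom(100000, size = n, prob = p)

# Simulated mean and standard deviation
c(mean(sim_counts), sd(sim_counts))

# Exact values from the formulas n*p and sqrt(n*p*q)
c(n * p, sqrt(n * p * (1 - p)))
```

The simulated mean and standard deviation should land close to 386 and 17.65.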
# P(exactly 386 smokers)
dbinom(386, size=2000, prob=0.193)
## [1] 0.0225986
# 25% of study population is .25*2000 = 500, so P(X >= 500) = 1 - P(X <= 499)
1- pbinom(499, size=2000, prob=0.193)
## [1] 2.402751e-10
# Same upper-tail probability computed directly with lower.tail = FALSE
pbinom(499, 2000, 0.193, lower.tail = FALSE, log.p = FALSE)
## [1] 2.402751e-10
# 20% of study population is .20*2000 = 400, so this is P(X <= 400)
pbinom(400, size=2000, prob=0.193)
## [1] 0.7948741