Probability Density Functions (pdfs):
Cumulative Distribution Functions (cdfs):
Example:
Take a six-sided die as an example: the probability of each side is \(\frac{1}{6}\).
The probability density function at 1 is the probability of landing on 1, which is \(\frac{1}{6}\) or 16.67%. The values at 2, 3, 4, 5, and 6 are all the same, \(\frac{1}{6}\). (Because the outcomes are discrete, this is strictly a probability mass function.)
The cumulative distribution at 1 is the probability that the next roll takes a value less than or equal to 1. That probability is \(\frac{1}{6}\) or 16.67%, because the only way to get it is to throw a 1. The cumulative distribution at 2 is \(\frac{1}{6} + \frac{1}{6} = \frac{1}{3}\) or 33.33%, as there are two ways of rolling a 2 or below.
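The die example can be reproduced in a few lines of R. This is a minimal sketch; the variable names `faces`, `pmf`, and `cdf` are illustrative and do not come from the original.

```r
# Faces of a fair six-sided die
faces <- 1:6

# Probability of each face (a probability mass function, since outcomes are discrete)
pmf <- rep(1 / 6, length(faces))

# Cumulative distribution: P(roll <= k) for k = 1, ..., 6
cdf <- cumsum(pmf)

# Show the two side by side
data.frame(face = faces, pmf = pmf, cdf = cdf)
```

The second entry of `cdf` is the 1/3 worked out above, and the last entry is 1 because every roll is at most 6.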
PDF & CDF Graphing Examples
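# Graph pdf: standard normal density on [-3, 3]
x <- seq(from = -3, to = 3, length.out = 100)
plot(x, dnorm(x))
# Graph cdf: standard normal cumulative distribution on [-3, 3]
library(ggplot2)
sample <- data.frame(x = c(-3, 3))
ggplot(sample, aes(x = x)) +
  stat_function(fun = pnorm)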
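The two curves are linked: for a continuous distribution, the cdf at a point is the area under the pdf up to that point. The sketch below checks this numerically for the standard normal. The evaluation point `q` is an arbitrary choice made for illustration.

```r
# Arbitrary evaluation point (an assumption for illustration)
q <- 1

# Area under the standard normal pdf from -Inf up to q
area <- integrate(dnorm, lower = -Inf, upper = q)$value

# The cdf evaluated at the same point
cdf_value <- pnorm(q)

# The two agree up to numerical integration error
all.equal(area, cdf_value, tolerance = 1e-6)
```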
Description: The normal distribution is symmetric about the mean of the data set; values closer to the mean occur more frequently than values further away from it.
Parameters: the mean (\(\mu\)) and the standard deviation (\(\sigma\)).
About 68% of the data lie within ±1 standard deviation of the mean
About 95% of the data lie within ±2 standard deviations of the mean
About 99.7% of the data lie within ±3 standard deviations of the mean
Examples: individual testosterone level, SAT scores, shoe size, birth weight…
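The 68%, 95%, and 99.7% figures can be checked against the normal cdf. A minimal sketch, assuming a standard normal (the result is the same for any mean and standard deviation, since the intervals are measured in standard deviations):

```r
# Number of standard deviations on each side of the mean
k <- 1:3

# P(mean - k*sd <= X <= mean + k*sd) for a normal variable
within_k_sd <- pnorm(k) - pnorm(-k)

# Express as percentages, rounded to one decimal place
round(100 * within_k_sd, 1)
```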
Description: The binomial distribution summarizes the probability of observing a given number of successes in a series of trials, each of which has only two possible outcomes.
Parameters: the success rate (p) and the number of observations (n). The total number of observations is fixed, each observation is independent, and each observation can take only one of two outcomes.
Example: coin flips, or any result that can be answered true/false.
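As a sketch of the coin-flip case, the code below computes the probability of exactly 5 heads in 10 fair flips, both with `dbinom` and from the binomial formula. The numbers 10 and 5 are hypothetical values chosen for illustration.

```r
# Hypothetical setup: 10 flips of a fair coin
n_flips <- 10
p_heads <- 0.5
k <- 5

# P(exactly k heads) using the built-in binomial pmf
p_exact <- dbinom(k, size = n_flips, prob = p_heads)

# The same probability from the formula choose(n, k) * p^k * (1 - p)^(n - k)
p_formula <- choose(n_flips, k) * p_heads^k * (1 - p_heads)^(n_flips - k)

# P(at most k heads) using the binomial cdf
p_at_most <- pbinom(k, size = n_flips, prob = p_heads)

c(exact = p_exact, formula = p_formula, at_most = p_at_most)
```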
Description: The Poisson distribution shows how many times an event is likely to occur over a specified period of time. It is a discrete distribution, which means the variable cannot take every value in a continuous range; it takes only whole-number counts such as 0, 1, 2, 3, 4…
Parameters: the average number of events (\(\lambda t\)), where \(\lambda\) is the rate and \(t\) is the length of the interval.
Example: the number of cars that pass an intersection in one cycle of a traffic light.
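The traffic-light example might be sketched as follows. The rate of 2 cars per minute and the 1.5-minute cycle are hypothetical values chosen for illustration, not figures from the original.

```r
# Hypothetical rate and interval (assumptions for illustration)
rate <- 2          # lambda: cars per minute
cycle <- 1.5       # t: length of one traffic-light cycle, in minutes
mean_count <- rate * cycle

# P(exactly k cars in one cycle) for k = 0, ..., 6
counts <- 0:6
p_counts <- dpois(counts, lambda = mean_count)

# P(at most 5 cars in one cycle)
p_at_most_5 <- ppois(5, lambda = mean_count)

data.frame(cars = counts, probability = round(p_counts, 4))
```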
We set the following parameters for the problem:
Total number of procedures (n): 20
Total number of deaths resulting from this procedure (x): 3
The success rate, which here is the death rate (\(\pi\)): 0.45
Binomial Model:
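# Set parameters
n <- 20    # number of procedures
x <- 3     # observed number of deaths
p <- 0.45  # death rate
# P(X > x) under the binomial model
pbinom(x, size = n, prob = p, lower.tail = FALSE)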
## [1] 0.9950666
Poisson Model:
# Apply the Poisson model: P(X > x) with mean lambda = n * p
ppois(
  q = x,
  lambda = n * p,
  lower.tail = FALSE,
  log.p = FALSE
)
## [1] 0.9787735
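As a cross-check, an upper-tail probability can also be obtained by summing the probability mass function directly. The sketch below assumes the quantity of interest in both models is P(X > 3), with the Poisson mean taken as n * p. The names `binom_tail` and `pois_tail` are illustrative.

```r
# Parameters from the problem statement
n <- 20
x <- 3
p <- 0.45

# Binomial: P(X > x) = 1 - sum of P(X = k) for k = 0, ..., x
binom_tail <- 1 - sum(dbinom(0:x, size = n, prob = p))

# Poisson with mean n * p: the same tail, evaluated at the observed count x (assumption)
pois_tail <- 1 - sum(dpois(0:x, lambda = n * p))

c(binomial = binom_tail, poisson = pois_tail)
```

The binomial value should match `pbinom(x, n, p, lower.tail = FALSE)`, since `lower.tail = FALSE` returns P(X > x) rather than P(X >= x).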