11/19/2024

Intro to Probability Distributions

What is a probability distribution? - A probability distribution describes how probability is assigned to the possible values of a random variable.

  • It gives a fuller picture than a single summary value such as the mean, showing how likely values are spread across a range.

  • Common Applications:

    • Health Science: Modeling the spread of a disease

    • Financial Industry: Risk assessment or stock performance assessment

    • Environmental Science: Modeling weather patterns like rainfall or temperature distributions

Types of Probability Distributions

Probability distributions fall into two main categories:

  • Discrete Distributions: Defined for distinct and countable values. Example: the binomial distribution, where the probability of exactly \(k\) successes in \(n\) trials, each with success probability \(p\), is \[P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}\]

  • Continuous Distributions: Defined over a continuous range of values. Example: normal distribution, with density function \[f(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{(x - \mu)^2}{2 \sigma^2}}\]

These categories help guide the choice of distribution in real-world scenarios.
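
As a quick check of the two formulas above, the sketch below evaluates each one by hand and compares the result with R's built-in dbinom() and dnorm(). The helper names binom_pmf and normal_pdf and the parameter values (n = 10, p = 0.5, k = 3, mu = 0, sigma = 1, x = 1) are chosen only for illustration.

```r
# Hand-written versions of the two formulas (helper names are illustrative)
binom_pmf <- function(k, n, p) {
  choose(n, k) * p^k * (1 - p)^(n - k)
}

normal_pdf <- function(x, mu, sigma) {
  1 / sqrt(2 * pi * sigma^2) * exp(-(x - mu)^2 / (2 * sigma^2))
}

# Discrete case: probability of exactly 3 successes in 10 trials
pmf_manual  <- binom_pmf(k = 3, n = 10, p = 0.5)
pmf_builtin <- dbinom(3, size = 10, prob = 0.5)

# Continuous case: density (not a probability) at x = 1 for a standard normal
pdf_manual  <- normal_pdf(x = 1, mu = 0, sigma = 1)
pdf_builtin <- dnorm(1, mean = 0, sd = 1)

print(c(manual = pmf_manual, builtin = pmf_builtin))
print(c(manual = pdf_manual, builtin = pdf_builtin))
```

The manual and built-in values should agree in each pair. Note that the continuous formula returns a density; probabilities for a continuous variable come from areas under the curve, not from single points.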

The Normal Distribution

The Normal Distribution is a continuous probability distribution that is symmetric around the mean, giving it a bell-shaped curve. It is defined by two parameters:

  • Mean (\(\mu\)) - Center of the distribution

  • Standard Deviation (\(\sigma\)) - Controls the spread

Properties:

  • Approximately 68% of values fall within 1 standard deviation of the mean.

  • Approximately 95% of values fall within 2 standard deviations.

  • Approximately 99.7% of values fall within 3 standard deviations.
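
These three percentages can be checked numerically. A minimal sketch, assuming a standard normal (the result is the same for any mean and standard deviation), uses the cumulative distribution function pnorm(); the helper name within_k_sd is illustrative.

```r
# Probability that a normal value lies within k standard deviations of the mean
within_k_sd <- function(k) {
  pnorm(k) - pnorm(-k)
}

coverage <- sapply(1:3, within_k_sd)
names(coverage) <- c("1 sd", "2 sd", "3 sd")

print(round(coverage, 4))
# roughly 0.6827, 0.9545, 0.9973
```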

Standard Deviation Formula

  • Formula (population standard deviation): \[ \sigma = \sqrt{\frac{1}{N} \sum_{i=1}^N (x_i - \mu)^2} \] where:

  • \(x_i\) is the \(i\)-th data point

  • \(N\) is the number of data points in the population

  • \(\mu\) is the population mean.
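
A small worked example may help here. The sketch below applies the population formula to a made-up data vector and compares it with R's sd(), which divides by N - 1 (the sample standard deviation) and so gives a slightly larger value. The data values are invented for illustration.

```r
# Illustrative data (not from any real data set)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

N  <- length(x)   # number of data points
mu <- mean(x)     # population mean

# Population standard deviation: divide by N, as in the formula
sigma_pop <- sqrt(sum((x - mu)^2) / N)

# R's built-in sd() divides by N - 1 instead
sigma_sample <- sd(x)

print(c(population = sigma_pop, sample = sigma_sample))
```

For this vector the mean is 5 and the squared deviations sum to 32, so the population standard deviation is sqrt(32 / 8) = 2.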

Normal Distribution Plot

Normal Distribution Density Plot
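
The code behind this plot is not shown. One way such a density plot could be drawn with ggplot2 is sketched below, assuming a standard normal (mu = 0, sigma = 1); the parameter values, variable names, and color are assumptions, not the original settings.

```r
library(ggplot2)

# Parameters (assumed: standard normal)
mu    <- 0
sigma <- 1

# Evaluate the density on a grid spanning 4 standard deviations either side
x <- seq(mu - 4 * sigma, mu + 4 * sigma, length.out = 200)
density <- dnorm(x, mean = mu, sd = sigma)
normal_data <- data.frame(x, density)

# Line plot of the bell-shaped density curve
normal_plot <- ggplot(normal_data, aes(x = x, y = density)) +
  geom_line(color = "darkseagreen4") +
  labs(title = "Normal Distribution Density (mu = 0, sigma = 1)",
       x = "x",
       y = "Density")

print(normal_plot)
```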

The Binomial Distribution

The Binomial Distribution models the number of successes in a fixed number of independent trials, where each trial has the same probability of success.

Example: Flipping a coin 10 times and measuring how many times it lands on heads.

Key Parameters:

  • n: Number of trials (flips, in the coin example)

  • p: Probability of success on each trial (heads, in the coin example)
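
For the coin example, specific probabilities can be read off with dbinom() and pbinom(). This is a minimal sketch assuming a fair coin (p = 0.5); the two questions asked are chosen for illustration.

```r
# Coin example: 10 flips of a fair coin (p = 0.5 is an assumption)
n <- 10
p <- 0.5

# Probability of exactly 5 heads
p_exactly_5 <- dbinom(5, size = n, prob = p)

# Probability of at least 8 heads: P(X > 7) from the upper tail
p_at_least_8 <- pbinom(7, size = n, prob = p, lower.tail = FALSE)

print(c(exactly_5 = p_exactly_5, at_least_8 = p_at_least_8))
```

With these values, exactly 5 heads has probability 252/1024 (about 0.246) and at least 8 heads has probability 56/1024 (about 0.055).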

The Binomial Distribution Plot Code

library(ggplot2)

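# Parameters: number of trials and probability of success on each trial
n <- 10
p <- 0.5

# Data generation: probability of each possible number of successes (0 to n)
x <- 0:n
prob <- dbinom(x, size = n, prob = p)
data <- data.frame(x, prob)

# Plot: one bar per number of successes, with height equal to its probability
ggplot(data, aes(x = x, y = prob)) +
  geom_col(fill = "darkseagreen3") +
  labs(title = "Binomial Distribution (n = 10, p = 0.5)",
       x = "Number of Successes",
       y = "Probability")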

The Binomial Distribution Plot

Conclusion

  • Probability Distributions are essential tools in statistics and are widely used in many fields of science and commerce.

    • Normal Distribution: Continuous, symmetric, commonly used for naturally occurring data.

    • Binomial Distribution: Discrete, used for modeling the number of successes in a fixed series of independent trials.

  • Key Point: Choosing the correct distribution allows for more accurate data modeling and analysis.