Probability Distribution

2024-10-20

Introduction to Probability Distributions

A probability distribution describes how likely each possible outcome of a random variable is. Probability distributions help us understand patterns in data and make predictions, whether we are predicting movie ratings or analyzing sales data, which makes them central to statistics. The main types are continuous distributions, such as the Normal distribution, and discrete distributions, such as the Binomial and Poisson distributions.

Normal Distribution

The Normal distribution is a continuous probability distribution that is symmetric about its mean. It describes data that cluster around a central value, with no skew to the left or right. Its shape is often called a bell curve.

The probability density function (PDF) of the Normal distribution with mean \(\mu\) and standard deviation \(\sigma\) is: \[f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^{2}}{2\sigma^{2}}}\]
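To connect the Normal PDF formula to code, the sketch below writes it out by hand in R and checks it against the built-in dnorm(). The helper name normal_pdf and the values mu = 6 and sigma = 1.5 are illustrative assumptions. They are not taken from the movie data.

```r
# Hypothetical helper: the Normal PDF written out term by term
normal_pdf <- function(x, mu, sigma) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

x <- seq(1, 10, by = 0.5)   # a grid of rating-like values
mu <- 6                     # assumed mean, for illustration only
sigma <- 1.5                # assumed standard deviation, for illustration only

# The hand-written version should agree with R's built-in dnorm()
all.equal(normal_pdf(x, mu, sigma), dnorm(x, mean = mu, sd = sigma))

# Symmetry about the mean: points equally far from mu get equal density
all.equal(normal_pdf(mu - 1, mu, sigma), normal_pdf(mu + 1, mu, sigma))
```

Both comparisons should return TRUE. The second one is a numeric restatement of the symmetry property described above.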

Example of a Normal Distribution Using Movie Ratings

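The histogram produced by the following code shows how often each rating occurs in the movies data set from the ggplot2movies package. Most ratings cluster around the average, and fewer movies receive very low or very high ratings. This is the roughly bell-shaped pattern that a Normal distribution describes.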


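library(ggplot2)
library(ggplot2movies)  # provides the movies data set (IMDb user ratings)

# Histogram of ratings in half-point bins
ggplot(movies, aes(x = rating)) +
  geom_histogram(binwidth = 0.5,
                 fill = "palegreen", color = "black", alpha = 0.7) +
  labs(title = "Distribution of Movie Ratings",
       x = "Rating", y = "Frequency") +
  theme_minimal()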
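A histogram alone can hide mild skew. One rough numeric check compares the share of values within one standard deviation of the mean against the roughly 68% that a Normal distribution predicts. The helpers share_within_sd and normal_share below are hypothetical names introduced for this sketch. The demonstration uses simulated Normal data, not the movie ratings.

```r
# Hypothetical helper: share of values within k standard deviations of the mean
share_within_sd <- function(x, k = 1) {
  x <- x[!is.na(x)]            # drop missing values
  m <- mean(x)
  s <- sd(x)
  mean(abs(x - m) <= k * s)    # proportion inside [m - k*s, m + k*s]
}

# Hypothetical helper: the share a Normal distribution predicts for the same band
normal_share <- function(k = 1) pnorm(k) - pnorm(-k)

# Demonstration on simulated Normal data (not the movie ratings)
set.seed(42)
sim <- rnorm(10000, mean = 6, sd = 1.5)
share_within_sd(sim, 1)   # should be close to normal_share(1)
normal_share(1)           # about 0.683
```

With ggplot2movies loaded, share_within_sd(movies$rating) could be compared with normal_share(1) in the same way. A large gap would suggest the ratings depart from a Normal shape more than the histogram makes obvious.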

Discrete Probability Distributions

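A discrete distribution describes the probabilities of outcomes in a countable sample space, such as the number of heads in ten coin flips. One of the most widely used discrete distributions is the Binomial distribution. It models the number of successes in a fixed number of independent trials that share the same probability of success.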

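The probability mass function (PMF) of the Binomial distribution, for \(n\) independent trials each with success probability \(p\), is: \[P(X = k) = \binom{n}{k} p^{k} (1 - p)^{n - k}, \quad k = 0, 1, \dots, n\]

By contrast, the Poisson distribution mentioned above models the number of events in a fixed interval that occur at an average rate \(\lambda\). Its PMF is: \[P(X = k) = \frac{e^{-\lambda} \lambda^{k}}{k!}, \quad k = 0, 1, 2, \dots\]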
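As a sketch, the Binomial PMF \(\binom{n}{k} p^{k} (1 - p)^{n - k}\) can be written directly in R and compared with the built-in dbinom(). The helper name binom_pmf and the success probability p = 0.1 are assumptions chosen for illustration.

```r
# Hypothetical helper: Binomial PMF, choose(n, k) * p^k * (1 - p)^(n - k)
binom_pmf <- function(k, n, p) {
  choose(n, k) * p^k * (1 - p)^(n - k)
}

n <- 10     # number of trials (e.g., movies sampled)
p <- 0.1    # assumed success probability, for illustration only
k <- 0:n    # every possible number of successes

manual  <- binom_pmf(k, n, p)
builtin <- dbinom(k, size = n, prob = p)

all.equal(manual, builtin)  # TRUE: the formula matches dbinom()
sum(manual)                 # probabilities over k = 0..n sum to 1
```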

Example of a Binomial Distribution Using Movie Ratings

Consider a scenario where you want to calculate the probability of finding a certain number of highly rated movies (e.g., rating ≥ 8) in a sample of 10 movies. Each sampled movie is one trial, and a rating of at least 8 counts as a success. The bar chart displays the probability of each possible count of high-rated movies. Because only a small share of movies are rated 8 or higher, a small count is more likely than a large one.
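A bar chart like the one described can be sketched as follows. This is a minimal version that treats the 10 sampled movies as independent trials with a common success probability. The value p_high = 0.1 is an assumed placeholder. With the ggplot2movies package loaded, one estimate would be mean(movies$rating >= 8).

```r
library(ggplot2)

n_sample <- 10   # movies drawn per sample
# Assumed share of movies rated 8 or higher (placeholder value);
# with ggplot2movies loaded, one estimate is mean(movies$rating >= 8)
p_high <- 0.1

# Binomial probability of each possible count of high-rated movies
probs <- data.frame(
  count = 0:n_sample,
  probability = dbinom(0:n_sample, size = n_sample, prob = p_high)
)

ggplot(probs, aes(x = factor(count), y = probability)) +
  geom_col(fill = "palegreen", color = "black", alpha = 0.7) +
  labs(title = "Binomial Probabilities of High-Rated Movies (n = 10)",
       x = "Number of movies rated 8 or higher", y = "Probability") +
  theme_minimal()
```

With the assumed p_high = 0.1, the most likely count is 1, and the bars shrink quickly for larger counts. A different estimate of p_high shifts the peak accordingly.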


3D Distribution of Movie Ratings

This interactive 3D plot shows how the probability of each rating relates to how frequently that rating occurs in the data.

Conclusion

In summary, understanding probability distributions is essential for analyzing data and making predictions. We explored the Normal and Binomial distributions using movie ratings, which allowed us to visualize how ratings are distributed among films. Probability distributions not only help us analyze current data but also guide future decisions in various fields, from entertainment to business.

References

Probability Distribution info: https://www.scribbr.com/statistics/probability-distributions/

Movie data set: https://cran.r-project.org/web/packages/ggplot2movies/ggplot2movies.pdf