- The normal distribution, also known as the Gaussian distribution, is one of the most important probability distributions in statistics.
- It is characterized by its bell-shaped curve and is symmetric around the mean.
2024-10-21
The probability density function (PDF) of a Normal Distribution is given by:
\[ f(x|\mu, \sigma^2) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \]
where \(\mu\) is the mean, \(\sigma\) is the standard deviation, and \(\sigma^2\) is the variance.
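As a quick check of the formula, the sketch below codes the PDF by hand and compares it with R's built-in `dnorm()`. The function name `normal_pdf` and the test values are illustrative assumptions, not part of the original post.

```r
# Evaluate the normal PDF directly from the formula above.
# `normal_pdf` is a hypothetical helper name used only for illustration.
normal_pdf <- function(x, mu = 0, sigma = 1) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

# Arbitrary example inputs and parameters.
x <- c(-1, 0, 2.5)
manual  <- normal_pdf(x, mu = 1, sigma = 2)
builtin <- dnorm(x, mean = 1, sd = 2)

# The two results should agree up to floating-point error.
all.equal(manual, builtin)
```

In practice `dnorm()` is the better choice; the hand-written version is only there to make the formula concrete.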
We can visualize the Normal Distribution using ggplot2. Below is the plot of the Standard Normal Distribution (mean = 0 and standard deviation = 1).
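A minimal sketch of one way to draw this curve with ggplot2 follows. The x range of -4 to 4, the colour, and the labels are assumptions chosen for illustration.

```r
library(ggplot2)

# Standard normal curve on [-4, 4]; the range is an arbitrary choice.
p <- ggplot(data.frame(x = c(-4, 4)), aes(x = x)) +
  # stat_function() evaluates dnorm() across the x range and draws the line.
  stat_function(fun = dnorm, args = list(mean = 0, sd = 1),
                color = "purple", linewidth = 1) +
  labs(title = "Standard Normal Distribution", x = "z", y = "Density") +
  theme_minimal()

print(p)
```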
The Standard Normal Distribution is a special case of the Normal Distribution. It has a mean of 0 and a standard deviation of 1.
Any Normal Distribution can be transformed into a Standard Normal Distribution using the formula:
\[ Z = \frac{X - \mu}{\sigma} \]
where \(Z\) is the standard score (z-score), \(X\) is a value from the original Normal Distribution, \(\mu\) is the mean of the original distribution, and \(\sigma\) is the standard deviation of the original distribution.
The Z-score represents how many standard deviations a data point \(X\) is from the mean \(\mu\).
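To make the transformation concrete, here is a small worked example. The numbers (mean 100, standard deviation 15, observed value 130) are hypothetical and chosen only so the arithmetic is easy to follow.

```r
# Hypothetical example: X ~ Normal(mean = 100, sd = 15), observed value 130.
mu <- 100
sigma <- 15
x <- 130

# Number of standard deviations between x and the mean.
z <- (x - mu) / sigma
z  # 2

# Standardizing does not change probabilities:
# P(X <= 130) under Normal(100, 15) equals P(Z <= 2) under Normal(0, 1).
p_original <- pnorm(x, mean = mu, sd = sigma)
p_standard <- pnorm(z)
all.equal(p_original, p_standard)
```

This is why a single table (or a single call to `pnorm()` with default arguments) is enough to answer probability questions about any Normal Distribution.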
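The Normal Distribution is widely used to model real-world measurements. Let’s use ggplot2 to visualize sepal length in the iris dataset by drawing the Normal Distribution that has the same mean and standard deviation as the data. The sepal lengths are only approximately normally distributed, so the curve is a fitted model rather than the empirical distribution.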
library(ggplot2)

# Load the iris dataset
data(iris)

# Sepal lengths and their sample mean and standard deviation
sepal_length <- iris$Sepal.Length
mean_sl <- mean(sepal_length)
sd_sl <- sd(sepal_length)

# Density of the fitted normal curve at each (sorted) observation
data_sepal_length <- data.frame(
  Sepal.Length = sort(sepal_length),
  Density = dnorm(sort(sepal_length), mean = mean_sl, sd = sd_sl))

ggplot(data_sepal_length, aes(Sepal.Length, Density)) +
  geom_line(color = "purple", linewidth = 1) +
  labs(title = "Distribution of Sepal Length in Iris Dataset",
       x = "Sepal Length",
       y = "Density") +
  theme_minimal() +
  # Mean (red), mean +/- 1 SD (green), mean +/- 2 SD (orange)
  geom_vline(xintercept = mean_sl, linetype = "dashed", color = "red") +
  geom_vline(xintercept = mean_sl + c(-1, 1) * sd_sl,
             linetype = "dashed", color = "green") +
  geom_vline(xintercept = mean_sl + c(-2, 2) * sd_sl,
             linetype = "dashed", color = "orange") +
  annotate("text", x = mean_sl, y = 0.05,
           label = paste("Mean =", round(mean_sl, 2)),
           vjust = -1, color = "red", size = 4)
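The dashed lines in the plot mark one and two standard deviations from the mean. As a rough sanity check on how well a normal model suits the data, the sketch below compares the share of sepal lengths inside those bands with the probabilities an exact Normal Distribution would give. The variable names are illustrative, and this is an informal check rather than a formal test of normality.

```r
data(iris)
sepal_length <- iris$Sepal.Length
m <- mean(sepal_length)
s <- sd(sepal_length)

# Share of observations within 1 and 2 standard deviations of the mean.
within_1sd <- mean(abs(sepal_length - m) <= s)
within_2sd <- mean(abs(sepal_length - m) <= 2 * s)

# Corresponding probabilities under an exact normal distribution.
normal_1sd <- pnorm(1) - pnorm(-1)   # about 0.683
normal_2sd <- pnorm(2) - pnorm(-2)   # about 0.954

# Side-by-side comparison of observed shares and normal probabilities.
round(c(within_1sd = within_1sd, normal_1sd = normal_1sd,
        within_2sd = within_2sd, normal_2sd = normal_2sd), 3)
```

Observed shares close to the normal probabilities are consistent with an approximately normal shape; large gaps would suggest the fitted curve is a poor description of the data.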
Let’s use Plotly to chart a 3D visualization of the mtcars dataset, showing the relationship between miles per gallon and horsepower. The Z axis represents PDF density, so the resulting surface is a Normal Distribution in two variables (a bivariate Normal Distribution) fitted to the data.
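One possible way to build such a surface is sketched below. It assumes that miles per gallon and horsepower are modelled as jointly normal, with parameters estimated from the sample means, standard deviations, and correlation. The helper `bvn_pdf`, the grid size, and the mean ± 3 standard deviation range are illustrative assumptions. The density is computed in base R, and the Plotly call runs only if the plotly package is installed.

```r
data(mtcars)
x <- mtcars$mpg
y <- mtcars$hp

# Sample estimates used as the parameters of the fitted bivariate normal.
mx <- mean(x); my <- mean(y)
sx <- sd(x);   sy <- sd(y)
rho <- cor(x, y)

# Bivariate normal PDF written out by hand (hypothetical helper, base R only).
bvn_pdf <- function(x, y) {
  zx <- (x - mx) / sx
  zy <- (y - my) / sy
  q <- (zx^2 - 2 * rho * zx * zy + zy^2) / (1 - rho^2)
  exp(-q / 2) / (2 * pi * sx * sy * sqrt(1 - rho^2))
}

# Evaluate the density on a grid covering mean +/- 3 standard deviations.
gx <- seq(mx - 3 * sx, mx + 3 * sx, length.out = 60)
gy <- seq(my - 3 * sy, my + 3 * sy, length.out = 60)
dens <- outer(gx, gy, bvn_pdf)   # dens[i, j] = f(gx[i], gy[j])

# Plotly's surface trace indexes z rows by y and columns by x, hence t().
if (requireNamespace("plotly", quietly = TRUE)) {
  fig <- plotly::plot_ly(x = gx, y = gy, z = t(dens), type = "surface")
  fig <- plotly::layout(fig, scene = list(
    xaxis = list(title = "Miles per gallon"),
    yaxis = list(title = "Horsepower"),
    zaxis = list(title = "Density")))
  fig
}
```

The surface shows what a bivariate normal model with these parameters looks like; it does not by itself establish that the mtcars data follow that model.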
