A Short Essay Describing Normal, t, chi-square, and F
Distributions, Their Assumptions, and Their Connections
The Normal Distribution
The Normal distribution is a continuous, unimodal distribution that
is characterized by its symmetric, bell-shaped curve. A Normal
distribution is characterized by two values, its mean, \(\mu\), and its variance, \(\sigma^2\), and is written as
\(N(\mu, \sigma^2)\). A Standard Normal is defined as a Normal
distribution with \(\mu = 0\) and \(\sigma^2 = 1\). This would be written as
\(N(0, 1)\).
For a random sample \(X_1, X_2, \ldots, X_n\) from a \(N(\mu, \sigma^2)\)
population, the sample mean, \(\bar{X}\), is used as an estimator of \(\mu\).
The mean of the distribution of \(\bar{X}\) is still \(\mu\). However, its
standard deviation is \(\sigma / \sqrt{n}\), so its variance is
\(\sigma^2 / n\). In the same notation as above, this is written as
\(\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)\). The sample mean can be
standardized by finding its Z-score. This Z-score represents how many standard
deviations a value is away from the mean. A positive Z-score means the value
is to the right of the mean, and a negative Z-score means it is to the left of
the mean. In this case, \(Z = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\). Once
this Z-score is calculated, we have a standardized value that follows
\(N(0, 1)\), the Standard Normal.
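As a small worked example of this standardization, the sketch below computes the Z-score of a sample mean and the matching upper-tail probability with `pnorm()`. The values of `mu`, `sigma`, `n`, and `xbar` are made up purely for illustration and do not come from any data in this essay.

```r
# Hypothetical values, chosen only to illustrate the calculation
mu    <- 50   # assumed population mean
sigma <- 6    # assumed (known) population standard deviation
n     <- 36   # sample size
xbar  <- 52   # observed sample mean

# Standard deviation of the sample mean (the standard error)
se <- sigma / sqrt(n)

# Z-score of the observed sample mean
z <- (xbar - mu) / se

# Probability of seeing a sample mean at least this large under N(0, 1)
p_upper <- pnorm(z, lower.tail = FALSE)

z
p_upper
```

With these numbers the standard error is 6 / sqrt(36) = 1, so the sample mean sits two standard errors above the population mean.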
Below is a visualization of several Normal distribution curves with
different means and variances to show how these values shift the
appearance of a Normal curve. This visualization includes a Standard
Normal curve with a mean of 0 and a variance of 1. Additionally, the
visualization includes two other Normal distributions with a mean of 0,
but with different variances. One of these distributions has a variance
of 4, and it can be seen how this curve is much flatter and wider than
the Standard Normal. The other of these two distributions has a variance
of 0.25, and it can be seen how this distribution is much more narrow
with a sharper and higher peak than the Standard Normal. This shows that
when a Normal distribution has a variance greater than that of a
Standard Normal, the curve becomes wider, but if it has a variance less
than that of a Standard Normal, the curve becomes narrower. Finally,
there is one more Normal distribution which has a mean of 2, and a
variance of 1. It can be seen that this distribution has the same spread
as a Standard Normal, due to having an equivalent variance, but is
shifted two units to the right due to having a mean of 2 rather than 0.
This shows that the mean of a Normal distribution affects how the curve
is shifted from that of a Standard Normal. A distribution with a
positive mean would be shifted to the right, while a distribution with a
negative mean would be shifted to the left. Overall, this visualization
shows how Normal distribution curves change based upon changes to their
mean and variance.
x <- seq(-6, 6, length = 1000)
# Standard Normal: mean = 0, var = 1
y_standard <- dnorm(x, mean = 0, sd = 1)
# mean = 0, var = 4
y_wide <- dnorm(x, mean = 0, sd = 2)
# mean = 0, var = 0.25
y_narrow <- dnorm(x, mean = 0, sd = 0.5)
# mean = 2, var = 1
y_shifted <- dnorm(x, mean = 2, sd = 1)
plot(x, y_standard,
type = "l",
lwd = 3,
col = "purple",
ylim = c(0, max(y_narrow)),
main = expression("Normal Distributions with Different Values of " * mu * " and " * sigma^2),
xlab = "x",
ylab = "Density")
lines(x, y_wide, col = "lightblue", lwd = 3, lty = 2)
lines(x, y_narrow, col = "green", lwd = 3, lty = 3)
lines(x, y_shifted, col = "pink", lwd = 3)
legend("topright",
legend = c(
expression(mu == 0 ~ "," ~ sigma^2 == 1),
expression(mu == 0 ~ "," ~ sigma^2 == 4),
expression(mu == 0 ~ "," ~ sigma^2 == 0.25),
expression(mu == 2 ~ "," ~ sigma^2 == 1)
),
col = c("purple", "lightblue", "green", "pink"),
lty = c(1, 2, 3, 1),
lwd = 3,
bty = "n")

The Normal distribution is defined from \(-\infty\) to \(\infty\).
Assumptions of a Normal Distribution
In order to use a Normal distribution, the following assumptions must
be met:
The observations are independent from one another.
The dependent variable must be continuous.
The sample errors are normally distributed.
The sample size is sufficiently large.
Regarding the last assumption, the exact sample size that counts as
sufficiently large can vary, but it is often given as n > 30. The importance
of this is seen through one of the most fundamental theorems in
statistics, the Central Limit Theorem (CLT). This theorem states that
the distribution of the sample mean approaches a Normal distribution
as the sample size increases. This occurs regardless of the shape of the
population distribution, as long as the population has a finite variance.
Typically n = 30 is the value used in statistics as the marker of a
sufficiently large sample; however, this can vary, as a highly skewed
population distribution would likely need a much larger sample size to
achieve an approximately Normal sampling distribution.
Below is a visualization of how the CLT applies to a sampling
distribution. In this visualization, 1,000 sample means are simulated from
an exponential population for each of three sample sizes: n = 5, n = 30,
and n = 100. This shows how, as the sample size increases, the sampling
distribution of the mean begins to follow a Normal distribution.
set.seed(123)
n_values <- c(5, 30, 100)
par(mfrow = c(1, 3))  # one row with one panel per sample size
for (n in n_values) {
sample_means <- replicate(1000, mean(rexp(n)))
hist(sample_means,
probability = TRUE,
breaks = 30,
col = "lavender",
border = "purple",
main = paste("Sampling Distribution (n =", n, ")"),
xlab = "Sample Mean")
lines(density(sample_means), lwd = 2, col = "purple")
curve(dnorm(x, mean(sample_means), sd(sample_means)),
add = TRUE, col = "darkmagenta", lwd = 2, lty = 2)
}
par(mfrow = c(1, 1))

As we can see, as the sample size, n, increases, the sampling
distribution of the mean becomes more like a Normal distribution,
even though the exponential population it is drawn from is strongly
right-skewed.
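One way to put a number on this convergence, offered as a sketch rather than as part of the analysis above: the mean of n independent Exponential(1) draws follows a Gamma distribution with shape n and rate n exactly, so its CDF can be compared directly with the Normal CDF that the CLT suggests. The helper name `clt_gap` and the evaluation grid are assumptions made for this illustration.

```r
# Largest absolute difference between the exact CDF of the mean of n
# Exponential(1) draws and its CLT Normal approximation.
# clt_gap is a hypothetical helper written for this illustration.
clt_gap <- function(n) {
  grid <- seq(0, 4, length.out = 4001)             # evaluation points
  exact  <- pgamma(grid, shape = n, rate = n)      # exact: Gamma(n, rate = n)
  approx <- pnorm(grid, mean = 1, sd = 1 / sqrt(n))  # CLT: N(1, 1/n)
  max(abs(exact - approx))
}

n_values <- c(5, 30, 100)
gaps <- sapply(n_values, clt_gap)

data.frame(n = n_values, max_cdf_gap = round(gaps, 4))
```

The gap shrinks as n grows, which is the same pattern the histograms show visually.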
The t-Distribution
The t-distribution is a continuous, unimodal distribution with a
symmetric, bell-shaped curve. This curve appears similar to that
of a Normal distribution; however, a t-distribution curve has a flatter
peak and thicker tails in comparison. A t-distribution is used in place
of a Normal distribution when the population standard deviation is
unknown and must be estimated by the sample standard deviation. This
matters most when the sample size is small, typically n < 30, because
the estimate of the standard deviation is then less precise.
From a random sample \(X_1, X_2, \ldots, X_n\), let the sample mean be
\(\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i\). In this case, the population
standard deviation is unknown, so we are interested in using a
t-distribution. The test statistic is \(t = \frac{\bar{X} - \mu}{S / \sqrt{n}}\), where
\(S\) is the sample standard deviation
and \(S^2\) is the sample variance,
\(S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2\).
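To make the formula concrete, the following sketch computes the t statistic by hand and checks it against R's built-in `t.test()`. The data vector `x` and the hypothesized mean `mu0` are invented for illustration only.

```r
# Hypothetical sample and hypothesized mean (illustrative values only)
x   <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3)
mu0 <- 5

n <- length(x)
s <- sd(x)  # sample standard deviation (divides by n - 1)

# t statistic from the formula: (x-bar - mu) / (s / sqrt(n))
t_manual <- (mean(x) - mu0) / (s / sqrt(n))

# Two-sided p-value from the t-distribution with n - 1 degrees of freedom
p_manual <- 2 * pt(abs(t_manual), df = n - 1, lower.tail = FALSE)

# Same quantities from the built-in one-sample t-test
tt <- t.test(x, mu = mu0)
t_builtin <- unname(tt$statistic)

c(manual = t_manual, builtin = t_builtin, p_value = p_manual)
```

The manual and built-in statistics agree, which confirms that `t.test()` uses the sample standard deviation with the n - 1 divisor.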
An important characteristic of a t-distribution is its degrees of
freedom. The degrees of freedom, often represented as v, equal n-1, where
n is the sample size. This is the only parameter of a t-distribution, and
it is fixed once the sample size is
known. The visualization below shows t-distribution curves for various
degrees of freedom values. A Normal curve is included for
comparison.
x <- seq(-4, 4, length = 1000)
y_df2 <- dt(x, df = 2)
y_df5 <- dt(x, df = 5)
y_df30 <- dt(x, df = 30)
y_df50 <- dt(x, df = 50)
y_norm <- dnorm(x)
y_max <- max(y_df2, y_df5, y_df30, y_df50, y_norm)
plot(x, y_df2,
type = "l",
lwd = 2,
col = "purple",
ylim = c(0, y_max),
main = "t-Distributions with Different Degrees of Freedom",
ylab = "Density",
xlab = "x")
lines(x, y_df5, lwd = 2, col = "lightblue")
lines(x, y_df30, lwd = 2, col = "green")
lines(x, y_df50, lwd = 2, col = "brown", lty = 2)
lines(x, y_norm, lwd = 2, col = "pink")
legend("topright",
legend = c("df = 2", "df = 5", "df = 30", "df = 50", "Normal"),
col = c("purple", "lightblue", "green", "brown", "pink"),
lty = c(1, 1, 1, 2, 1),  # one line type per legend entry
lwd = 2,
bty = "n")
The t-distribution is defined from \(-\infty\) to \(\infty\).
As seen in the visualization above, a t-distribution with smaller
degrees of freedom has a flatter peak with wider tails. On the other
hand, a t-distribution with larger degrees of freedom has a higher peak
with thinner tails. Also, the visualization shows that as the number
of degrees of freedom increases further and further, the curve of the
distribution becomes closer to that of a Normal distribution curve.
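The same convergence can be seen numerically. As a brief sketch, the code below compares the 97.5th percentile of the t-distribution, for the degrees of freedom plotted above, with the corresponding Standard Normal value. The choice of the 0.975 quantile is an assumption made here because it is the cutoff for a two-sided 95% interval.

```r
# Degrees of freedom used in the plot above
dfs <- c(2, 5, 30, 50)

# Upper 2.5% critical values for each t-distribution
t_crit <- qt(0.975, df = dfs)

# Corresponding Standard Normal critical value
z_crit <- qnorm(0.975)

data.frame(df = dfs,
           t_crit = round(t_crit, 3),
           excess_over_normal = round(t_crit - z_crit, 3))
```

The critical values are always larger than the Normal value, reflecting the thicker tails, and the excess shrinks as the degrees of freedom grow.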
Assumptions of a t-Distribution
In order to use a t-distribution, the following assumptions must be
met:
The observations are independent from one another.
The dependent variable must be continuous.
The data follows an approximately Normal distribution.
The population standard deviation is unknown.
The Chi-Square Distribution
Another commonly used distribution is the Chi-Square distribution.
The Chi-Square distribution is a special case of the Gamma distribution
that can also be represented as the sum of squared independent standard
Normal random variables. If \(Z_1, Z_2, \ldots, Z_k
\stackrel{iid}{\sim} N(0,1)\), then \(\sum_{i=1}^{k} Z_i^2 \sim \chi^2_k\), where
k is the degrees of freedom. For a random sample of size n from a Normal
distribution, the exact distribution of the scaled sample
variance is \(\frac{(n-1)S^2}{\sigma^2} \sim \chi_{n-1}^2\),
which is a Chi-Square distribution with \(n-1\) degrees of freedom.
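Both descriptions can be checked in R. The sketch below first confirms that the Chi-Square density matches a Gamma density with shape k/2 and rate 1/2, and then simulates the scaled sample variance to compare its mean and variance with the Chi-Square values of n - 1 and 2(n - 1). The seed, sample size, and population parameters are arbitrary choices made for this illustration.

```r
set.seed(42)  # arbitrary seed so the simulation is reproducible

# 1. Chi-Square(k) has the same density as Gamma(shape = k/2, rate = 1/2)
k <- 5
x_grid <- seq(0.5, 20, by = 0.5)
gamma_match <- all.equal(dchisq(x_grid, df = k),
                         dgamma(x_grid, shape = k / 2, rate = 1 / 2))

# 2. Simulate (n - 1) * S^2 / sigma^2 for Normal samples (illustrative values)
n     <- 6
sigma <- 3
scaled_var <- replicate(20000, {
  x <- rnorm(n, mean = 10, sd = sigma)
  (n - 1) * var(x) / sigma^2
})

gamma_match
c(sim_mean    = mean(scaled_var), theory_mean = n - 1,
  sim_var     = var(scaled_var),  theory_var  = 2 * (n - 1))
```

The simulated mean and variance should land close to 5 and 10, the mean and variance of a Chi-Square distribution with 5 degrees of freedom.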
The shape of a Chi-Square distribution depends on its degrees of
freedom, just as the shape of a t-distribution depends on its
degrees of freedom. For the scaled sample variance, the degrees of
freedom are once again n-1, where n is the sample size. One major
difference between the Chi-Square distribution and the Normal and
t-distributions is that the Chi-Square distribution is asymmetric and
does not follow the symmetric, bell-shaped curve seen with the previous
two distributions.
The visualization below shows the Chi-Square distributions for
various degrees of freedom values.
x <- seq(0, 30, length = 1000)
y_df2 <- dchisq(x, df = 2)
y_df5 <- dchisq(x, df = 5)
y_df15 <- dchisq(x, df = 15)
y_max <- max(y_df2, y_df5, y_df15)
plot(x, y_df2,
type = "l",
lwd = 2,
col = "purple",
ylim = c(0, y_max),
main = "Chi-Square Distributions with Different Degrees of Freedom",
xlab = "x",
ylab = "Density")
lines(x, y_df5, lwd = 2, col = "lightblue")
lines(x, y_df15, lwd = 2, col = "green")
legend("topright",
legend = c("df = 2", "df = 5", "df = 15"),
col = c("purple", "lightblue", "green"),
lwd = 2,
bty = "n")

The Chi-Square distribution is defined from 0 to \(\infty\).
In the visualization above, it can be seen that as the degrees of
freedom increase, the distribution curve becomes flatter and wider, and
shifts over to the right. The smaller the degrees of freedom, the higher
the peak of the distribution and the more quickly it flattens out. For
these smaller degrees of freedom values, the distribution is strongly
skewed to the right and asymmetric. As the degrees of freedom become
larger and larger, the distribution becomes less skewed, and for very
large degrees of freedom its shape approaches that of a Normal
distribution.
Assumptions of a Chi-Square Distribution
In order to use a Chi-Square distribution, the following assumptions
must be met:
The observations are independent from one another.
The sample size is sufficiently large.
The population follows a Normal distribution.
The Chi-Square statistic is formed from squared deviations.
The F Distribution
One other important distribution is the F distribution. The F
distribution is the sampling distribution for the ratio of two
independent sample variances. The F distribution is useful for comparing
variances and is used in ANOVA (analysis of variance) and regression
modeling.
For an F distribution, we have two independent random samples, \(\{X_1, X_2, \cdots,
X_{n_1}\} \overset{i.i.d}{\sim} N(\mu_1, \sigma_1^2) \quad\text{ and }
\quad \{Y_1, Y_2, \cdots, Y_{n_2}\} \overset{i.i.d}{\sim} N(\mu_2,
\sigma_2^2)\). From these two samples, we have the sample
variance for each of the two samples, respectively, \(S_1^2 = \frac{1}{n_1-1} \sum_{i=1}^{n_1} (X_i -
\bar{X})^2 \quad\text{ and } \quad S_2^2 = \frac{1}{n_2-1}
\sum_{i=1}^{n_2} (Y_i - \bar{Y})^2\). The F statistic is found
as follows, \(F =
\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{n_1-1,
n_2-1}\). Thus, the F statistic is the ratio of the two sample
variances, each scaled by its population variance. Once again, the F
distribution depends on the degrees of freedom for each of the two
independent samples. In this case, \(n_1-1\) and \(n_2-1\) are the degrees of freedom for
sample one and sample two, respectively, where \(n_1\) and \(n_2\) are
the two sample sizes. These numerator and denominator degrees of
freedom are the two parameters of an F
distribution.
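As an illustration of how this statistic is used, the sketch below computes the variance ratio by hand under the hypothesis that the two population variances are equal, in which case the statistic reduces to the ratio of the two sample variances, and compares it with R's built-in `var.test()`. The two data vectors are invented for this example.

```r
# Two hypothetical independent samples (illustrative values only)
x <- c(12.1, 9.8, 11.4, 10.9, 13.2, 10.1, 11.7)
y <- c(10.4, 10.9, 11.1, 10.6, 10.8, 11.3)

# Numerator and denominator degrees of freedom
df1 <- length(x) - 1
df2 <- length(y) - 1

# Under equal population variances, F is the ratio of sample variances
f_manual <- var(x) / var(y)

# Two-sided p-value from the F distribution
p_lower  <- pf(f_manual, df1, df2)
p_manual <- 2 * min(p_lower, 1 - p_lower)

# Built-in F test for equality of two variances
vt <- var.test(x, y)

c(f_manual = f_manual, f_builtin = unname(vt$statistic),
  p_manual = p_manual, p_builtin = vt$p.value)
```

`var.test()` tests a variance ratio of 1 by default, so its statistic and p-value should match the manual calculation.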
The following visualization shows F distributions for various values
of the degrees of freedom for each of the two independent samples. Each
F distribution has two parameters, df1 and df2, which are these two
degrees of freedom values. This visualization shows how the F
distribution changes in appearance based upon these two degrees of
freedom parameters.
x <- seq(0, 5, length = 1000)
y_2_10 <- df(x, df1 = 2, df2 = 10)
y_5_10 <- df(x, df1 = 5, df2 = 10)
y_10_10 <- df(x, df1 = 10, df2 = 10)
y_max <- max(y_2_10, y_5_10, y_10_10)
plot(x, y_2_10,
type = "l",
lwd = 2,
col = "purple",
ylim = c(0, y_max),
main = "F Distributions with Different Degrees of Freedom",
xlab = "x",
ylab = "Density")
lines(x, y_5_10, lwd = 2, col = "lightblue")
lines(x, y_10_10, lwd = 2, col = "green")
legend("topright",
legend = c("df1 = 2, df2 = 10",
"df1 = 5, df2 = 10",
"df1 = 10, df2 = 10"),
col = c("purple", "lightblue", "green"),
lwd = 2,
bty = "n")

The F distribution is defined from 0 to \(\infty\).
As seen above, the curves of an F distribution are skewed right and
asymmetric. These curves do not follow the symmetric, bell-shaped curve
that the Normal distribution was seen to follow. In fact, these curves
look quite similar to what was seen with the Chi-Square distribution.
Similarly to the Chi-Square distribution, for an F distribution, smaller
values of the degrees of freedom give steeper, more skewed distributions,
while larger values of the degrees of freedom give wider distributions with
less skew in comparison. In fact, the F distribution can be defined
in terms of two independent Chi-Square distributions. If the samples are
independent and normally distributed, then \(\frac{(n_1 -
1)S_1^2}{\sigma_1^2} \sim \chi^2_{n_1 - 1},\qquad\frac{(n_2 -
1)S_2^2}{\sigma_2^2} \sim \chi^2_{n_2 - 1}\). Dividing each by its degrees
of freedom and taking the ratio results in
\(\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{n_1 - 1,\; n_2 - 1}\),
which reduces to \(\frac{S_1^2}{S_2^2} \sim F_{n_1 - 1,\; n_2 - 1}\) when
\(\sigma_1^2 = \sigma_2^2\). Overall, the F distribution is a great way to
compare the variances of these two independent populations.
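The Chi-Square construction can be checked by simulation. In this sketch, two independent Chi-Square samples are each divided by their degrees of freedom, and the quantiles of their ratio are compared with those of the F distribution from `qf()`. The seed, the degrees of freedom, and the number of draws are arbitrary choices for illustration.

```r
set.seed(2024)  # arbitrary seed so the simulation is reproducible

df1 <- 5    # numerator degrees of freedom (illustrative)
df2 <- 10   # denominator degrees of freedom (illustrative)

# Independent Chi-Square draws
u <- rchisq(1e5, df = df1)
v <- rchisq(1e5, df = df2)

# Ratio of Chi-Square variables, each divided by its degrees of freedom
f_sim <- (u / df1) / (v / df2)

# Compare simulated quantiles with the theoretical F quantiles
probs <- c(0.5, 0.9, 0.95)
rbind(simulated   = quantile(f_sim, probs),
      theoretical = qf(probs, df1, df2))
```

The two rows should agree closely, up to simulation noise.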
Assumptions of an F Distribution
In order to use an F distribution, the following assumptions must be
met:
The observations are independent from one another.
Each of the two samples is drawn from a Normally distributed population.
The samples are drawn independently from one another.
The populations should have homogeneity of variances (equal
variances).
Connections Between These Distributions
All four of these distributions, the Normal distribution, the
t-distribution, the Chi-Square distribution, and the F distribution, are
incredibly important to statistical analysis and random sampling.
These distributions connect to one another in several ways. For
instance, the sum of squared independent standard Normal variables
follows a Chi-Square distribution. In particular, if \(Z \sim N(0,1)\), then \(Z^2 \sim \chi^2_1\).
The t-distribution fits the same pattern: a standard Normal variable
divided by the square root of an independent Chi-Square variable over its
degrees of freedom follows a t-distribution. Additionally, an F
statistic is the ratio of two independent Chi-Square random variables,
each divided by its degrees of freedom. So, while all four
distributions have distinctions from one another, they also overlap in
several ways and show clear connections with each other.
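Two of these connections can be verified directly from R's quantile functions, with no simulation needed. The sketch below checks that the squared two-sided Normal critical value equals the Chi-Square critical value with 1 degree of freedom, and that the squared two-sided t critical value equals the F critical value with 1 and v degrees of freedom. The probability level `p` and degrees of freedom `nu` are arbitrary illustrative choices.

```r
p  <- 0.95  # illustrative probability level
nu <- 12    # illustrative degrees of freedom

# If Z ~ N(0, 1), then Z^2 ~ Chi-Square with 1 degree of freedom
z_sq  <- qnorm(1 - (1 - p) / 2)^2
chi_1 <- qchisq(p, df = 1)

# If T ~ t with nu degrees of freedom, then T^2 ~ F(1, nu)
t_sq   <- qt(1 - (1 - p) / 2, df = nu)^2
f_1_nu <- qf(p, df1 = 1, df2 = nu)

c(z_squared = z_sq, chisq_1 = chi_1, t_squared = t_sq, f_1_nu = f_1_nu)
```

Each squared critical value matches its counterpart, which is the quantile form of the relationships described above.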
The table below shows a clear comparison of key features of the four
distributions. These features include a brief description of the shape
of each distribution, its parameters, and its support, which is the
set of values the distribution can take on.
library(knitr)  # provides kable()
dist_table <- data.frame(
Distribution = c("Normal", "t", "Chi-square", "F"),
Shape = c("Symmetric", "Symmetric, thicker tails", "Right-skewed", "Right-skewed"),
Support = c("$(-\\infty, \\infty)$", "$(-\\infty, \\infty)$", "$(0, \\infty)$", "$(0, \\infty)$"),
Parameters = c("$\\mu, \\sigma^2$", "df(v)", "df", "df$_1$, df$_2$")
)
kable(dist_table, format = "html", escape = FALSE)
| Distribution | Shape | Support | Parameters |
|---|---|---|---|
| Normal | Symmetric | \((-\infty, \infty)\) | \(\mu, \sigma^2\) |
| t | Symmetric, thicker tails | \((-\infty, \infty)\) | df(v) |
| Chi-square | Right-skewed | \((0, \infty)\) | df |
| F | Right-skewed | \((0, \infty)\) | df\(_1\), df\(_2\) |
Altogether, the Normal distribution, the t-distribution, the
Chi-Square distribution, and the F distribution are all important
statistical tools for describing sampling distributions and
making inferences about the overall population. These four
distributions differ from one another in their appearance and in the
parameters that define them. However, they are connected through
transformations of random variables, such as squaring, summing, and
taking ratios, which is why they work together in so many statistical
procedures.
