1 Introduction

This note covers the definitions and inter-relationships of normal, t, chi-square, and F distributions, and their assumptions.

2 Normal Distributions

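A normal distribution is a parametric distribution: it assumes a specific shape for the distribution, determined entirely by a small set of parameters (here, the mean \(\mu\) and the standard deviation \(\sigma\)). In other words, a parametric model assumes how the data are organized before any analysis is made.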

Under a normal distribution, approximately:
* 68% of the data lies within 1 standard deviation of the mean
* 95% of the data lies within 2 standard deviations of the mean
* 99.7% of the data lies within 3 standard deviations of the mean
(Wackerly, Mendenhall, Scheaffer, 1945, p.10).
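These percentages can be checked directly from the standard normal CDF. The sketch below is only an illustration; the variable names `k` and `within_k` are not from the original note.

```r
# Probability that a normal value falls within k standard deviations of its mean.
# Standardizing removes mu and sigma, so the standard normal CDF is enough.
k <- 1:3
within_k <- pnorm(k) - pnorm(-k)

# Rounded values: 0.6827, 0.9545, 0.9973
round(within_k, 4)
```

The exact values are slightly different from the rounded 68%, 95% and 99.7% figures, which is why the rule is stated as approximate.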

Figure: Normal distribution

The normal density traces a bell curve, given by the formula:

\[ f(y) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(y-\mu)^2}{2\sigma^2}} \]
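As a quick check of the formula, the density can be coded by hand and compared with R's built-in `dnorm()`. This is a minimal sketch; the helper name `normal_density` and the chosen values of `mu` and `sigma` are illustrative assumptions.

```r
# Hand-coded normal density, written term by term from the formula above
normal_density <- function(y, mu = 0, sigma = 1) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(y - mu)^2 / (2 * sigma^2))
}

y <- seq(-3, 3, by = 0.5)
manual  <- normal_density(y, mu = 1, sigma = 2)
builtin <- dnorm(y, mean = 1, sd = 2)

# TRUE when the two agree up to floating point tolerance
all.equal(manual, builtin)
```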

Note: What is a standard deviation? - A standard deviation measures how much a set of values varies (how dispersed it is) around the mean. A value that lies within a small number of standard deviations of the mean (e.g. 1 standard deviation) is close to the mean. A lower standard deviation, and therefore a lower variance, means the points are clustered tightly around the mean, which may suggest more precise estimates; a higher variance means the data is spread out and may contain outliers.
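A small made-up example may help: the two vectors below share the same mean but differ in spread. The data values are invented purely for illustration.

```r
# Two illustrative data sets with the same mean (10) but different spread
tight <- c(9, 10, 10, 10, 11)
wide  <- c(2, 6, 10, 14, 18)

c(mean_tight = mean(tight), mean_wide = mean(wide))

# sd() uses the n - 1 denominator, matching the sample variance formula
c(sd_tight = sd(tight), sd_wide = sd(wide))
```

The second data set has the larger standard deviation because its values sit further from the shared mean.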

2.1 Assumptions for Normal Distributions

  1. Data is continuous
  2. Symmetric with one peak
  3. Bell Shaped
  4. Mean, median and mode all assumed to be equal

3 Using Normal Distributions for Estimation

Although many random samples are approximately normally distributed, using a parametric normal distribution as the sole estimator of the true cumulative distribution function (CDF) should come with caution. Assuming the shape of a sample when that shape is unknown may add noise or bias to the estimator and the analysis, because the parametric model may not fit the sample or the population well. If we force a bell curve onto data that is actually skewed (leaning to one side rather than symmetric), our conclusions will be biased.

Note: in many cases the true population’s parameters or distribution are unknown, so assuming a shape may increase the chance of error.

# https://www.biologyforlife.com/skew.html

knitr::include_graphics("C:/Users/75ER969287/OneDrive - West Chester University of PA/STA 506 - Mathematical Statistics II/Weekly Modules/Week 3/Homework/skewness image image 2.png")
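To make the risk concrete, the sketch below forces a normal model onto a right-skewed population. The exponential population with rate 1 (mean 1, standard deviation 1) and the cutoff of 3 are assumptions chosen only for illustration.

```r
# Assumed skewed population: exponential with rate 1 (mean = 1, sd = 1)
cutoff <- 3

# True upper-tail probability under the skewed population
p_true <- pexp(cutoff, rate = 1, lower.tail = FALSE)

# Upper-tail probability if we wrongly assume a normal with the same mean and sd
p_forced_normal <- pnorm(cutoff, mean = 1, sd = 1, lower.tail = FALSE)

c(true = p_true, forced_normal = p_forced_normal)
```

Under these assumptions the forced bell curve understates how often large values occur, which is one form of the bias described above.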

3.1 Normal Distribution Advantages

Despite being a parametric distribution that assumes shape, the normal distribution has many advantages.

Uses for a normal distribution

  • A normal distribution acts as a comparison and validity check
    • Normality is often the assumption checked in residual tests for linear regression, t-tests and ANOVA
    • We will later discuss a normality assumption used to assess the quality of two models by their variance ratio, which follows an F distribution
    • Overall, we often use the normal distribution as the basis for conclusions or approximations because samples and populations often approach a normal distribution, and we can standardize values relatively easily onto a universal scale measured in standard deviations. Regardless of the data’s original units, standardizing shows how rare or common a data point is relative to the other values.
  • Estimation of the cumulative distribution function: a normal distribution can act as an estimator for the cumulative distribution function. If an empirical CDF is used, the theoretical normal CDF can serve as a baseline comparison to see whether the two are statistically different.

In this case the empirical CDF is a model based on the observed data. Comparing the empirical model with the theoretical normal model helps us understand any nuance in our observed data.

library(ggplot2)  # plotting
library(dplyr)    # provides the %>% pipe
library(tidyr)    # pivot_longer()
library(plotly)   # ggplotly()

set.seed(45)
# Generate sample data
sample_data <- rnorm(100, mean = 0, sd = 1)

# Create empirical CDF
empirical_cdf <- ecdf(sample_data)

# Evaluate both CDFs on the same grid that is plotted on the x-axis
x_grid <- seq(-5, 5, length.out = 2500)
plot_df <- data.frame(
  x = x_grid,
  Empirical = empirical_cdf(x_grid),
  Theoretical = pnorm(x_grid)
)

plot_df_long <- plot_df %>%
  pivot_longer(cols = c(Empirical, Theoretical), 
               names_to = "Type", values_to = "CDF")

Fn.plt <- ggplot(plot_df_long, aes(x = x, y = CDF, color = Type)) +
  geom_line(linewidth = 1) +
  scale_color_manual(values = c("Empirical" = "green", "Theoretical" = "purple")) +
  labs(title = "Empirical vs Theoretical CDF",
       subtitle = "Sample size n = 100 from Standard Normal",
       x = "x", y = "CDF") +
  theme(plot.title = element_text(hjust = 0.5),
        plot.margin = margin(t = 35, r = 20, b = 30, l = 30, unit = "pt"))
ggplotly(Fn.plt)
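One way to ask whether the empirical and theoretical CDFs are statistically different is a one-sample Kolmogorov-Smirnov test, which measures the largest vertical gap between the two curves. The choice of this particular test is an assumption of this sketch, not something the note prescribes; `D_manual` is an illustrative name.

```r
set.seed(45)
sample_data <- rnorm(100, mean = 0, sd = 1)

# Built-in one-sample KS test against the standard normal CDF
ks <- ks.test(sample_data, "pnorm", mean = 0, sd = 1)

# The same statistic by hand: largest gap between the ECDF steps and pnorm
x_sorted <- sort(sample_data)
n <- length(x_sorted)
F0 <- pnorm(x_sorted)
D_manual <- max(pmax(seq_len(n) / n - F0, F0 - (seq_len(n) - 1) / n))

c(D_builtin = unname(ks$statistic), D_manual = D_manual, p_value = ks$p.value)
```

A large p-value here would mean the data give no evidence against the normal model; it does not prove the data are normal.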

While the Normal distribution is the foundation of parametric inference, it can be applied to describe a sampling distribution in two distinct capacities:

  1. Exact distribution: the population is known to be normally distributed and the random variables are independent and identically distributed; the result holds even for a small sample size.

Because we know the population distribution is normal, the standardized sample mean follows the standard normal distribution exactly:

\[ Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0,1) \]

  2. Asymptotic distribution: the population distribution is unknown but the sample size is large.

3.2 Exact Distribution

An exact distribution assumes a normally distributed population with random variables that are independent and identically distributed; the result holds even for a small sample size. The observations are collected and treated independently, so the probability of one value does not affect the probability of the next.
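The exact case can be illustrated by simulation: even with a very small sample from a normal population, the standardized mean behaves like a standard normal. The values `n = 5`, `mu = 10`, `sigma = 3` and the number of replications are arbitrary choices for this sketch.

```r
set.seed(101)
n <- 5        # deliberately small sample size
mu <- 10      # assumed known population mean
sigma <- 3    # assumed known population standard deviation

# Standardized sample mean for many independent small samples
z <- replicate(20000, (mean(rnorm(n, mu, sigma)) - mu) / (sigma / sqrt(n)))

# Should be close to 0 and 1 respectively
c(mean = mean(z), sd = sd(z))

# Share of Z values inside +/- 1.96, which should be close to 0.95
coverage <- mean(abs(z) <= 1.96)
coverage
```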

3.3 Asymptotic Distribution Relation in Normal Distribution

As mentioned earlier, there are some advantages to using the normal distribution despite its shape assumption.

When we take a random sample from a population whose distribution we do not know, we use the normal distribution as an approximation to the true distribution.

With a large sample size, the Central Limit Theorem lets us treat the sampling distribution of our statistic (often the sample mean) as approximately normal. Even if the distribution does not look normal for small samples, it converges to a normal distribution as the sample size grows. If the original population is skewed, the distribution of the sample mean still settles into a bell curve once the sample is large enough.

# Set a seed so the random results are the same every time you 'knit'
set.seed(12)

# Define number of simulations and different sample sizes to test
n_simulations <- 10000
sample_sizes <- c(2, 5, 20, 50)

# Set up a 2x2 grid for the graphs
par(mfrow = c(2, 2))

for (n in sample_sizes) {
  # 1. Take 10,000 random samples of size 'n' from a skewed population
  # 2. Calculate the mean for each of those 10,000 samples
  sample_means <- replicate(n_simulations, mean(rexp(n, rate = 1)))
  
  # 3. Create the histogram
  hist(sample_means, 
       breaks = 40, 
       freq = FALSE, 
       main = paste("Sample Size n =", n),
       xlab = "Value of Sample Mean", 
       col = "skyblue", 
       border = "white")
  
  # 4. Add a theoretical Normal Curve (Red line) to see the fit
  # The mean of our population is 1, and SD is 1.
  curve(dnorm(x, mean = 1, sd = 1/sqrt(n)), 
        add = TRUE, col = "red", lwd = 2)
}

In these images we see that as the sample size (n) increases, the distribution of the sample mean becomes less skewed and closer to a normal distribution.

Because the observed distribution may not follow a normal distribution exactly, we standardize our statistic using the following formula. Note that this formula is an approximation: it acts as a best estimate given that we do not know the true distribution, unlike in the exact case.

\[ Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \overset{\text{approx}}{\sim} N(0,1) \]

By standardizing into Z scores, we can approximate the probability distribution of sample statistics such as the mean, a proportion, or a regression coefficient.
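As a worked sketch of this approximation, suppose the population is exponential with rate 1 (so \(\mu = 1\) and \(\sigma = 1\)) and we want the chance that the mean of 50 observations exceeds 1.2. The population, sample size and threshold are assumptions made for illustration; the exact comparison uses the fact that a sum of independent exponentials follows a gamma distribution.

```r
# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1
mu <- 1
sigma <- 1
n <- 50
threshold <- 1.2

# Standardize the threshold with the formula above, then use the normal tail
z <- (threshold - mu) / (sigma / sqrt(n))
approx_p <- pnorm(z, lower.tail = FALSE)

# Exact answer for comparison: the sum of n such exponentials is Gamma(n, rate = 1)
exact_p <- pgamma(threshold * n, shape = n, rate = 1, lower.tail = FALSE)

c(z = z, approx = approx_p, exact = exact_p)
```

The two probabilities are close but not identical, which is what "approximation" means in this setting.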

4 t-Distribution

We used a normal distribution when our population standard deviation \(\sigma\) was known. When we do not know our population’s standard deviation we use a t distribution.

When we do not know our population’s standard deviation we estimate it using the sample standard deviation \(S\), the square root of the sample variance \(S^2\).

\[ T = \frac{\bar{X} - \mu}{S / \sqrt{n}} \sim t_{n-1} \]

Our formula for the sample variance \(S^2\) is:

\[ S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \]

In the T statistic, because we do not know the population standard deviation, we divide by an estimate built from the sample. Dividing \(S\) by the square root of the sample size gives the estimated standard error of the sample mean, which accounts for the variation within the sample.
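The two formulas can be traced by hand on a small data set and compared with R's `t.test()`. The data values and the hypothesized mean `mu0` below are invented for illustration.

```r
# Illustrative data and hypothesized population mean
x <- c(4.1, 5.3, 6.2, 4.8, 5.9, 5.1, 4.4, 6.0)
mu0 <- 5
n <- length(x)

# Sample variance with the n - 1 denominator, as in the formula above
s2 <- sum((x - mean(x))^2) / (n - 1)

# T statistic: distance of the sample mean from mu0 in estimated standard errors
t_manual <- (mean(x) - mu0) / (sqrt(s2) / sqrt(n))

# Built-in one-sample t-test for comparison
t_builtin <- unname(t.test(x, mu = mu0)$statistic)

c(s2 = s2, t_manual = t_manual, t_builtin = t_builtin)
```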

To the naked eye the t-distribution and the normal distribution have very similar shapes, but using the estimate \(S /\sqrt{n}\) in the denominator of the t statistic adds uncertainty, which gives the t-distribution wider, “fatter” tails than the normal distribution. In the normal case we know the population variance, or spread of the data, and only the population mean is unknown; in the t case we know neither the population variance nor the population mean. This greater uncertainty shows up as fatter tails, although as n increases \(S\) becomes a more reliable estimate of \(\sigma\) and the extra spread shrinks.

Note that as our sample size grows, the tails of the t-distribution get thinner and the distribution converges to the standard normal. This is similar in spirit to the Central Limit Theorem, where a larger sample size brings the sampling distribution closer to the normal distribution. The sample size n sets the degrees of freedom \(n-1\), the only parameter of the t-distribution, and so controls its shape.
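The thinning of the tails can be seen in the critical values: the 97.5% quantile of the t-distribution shrinks toward the normal quantile as the degrees of freedom grow. The particular degrees of freedom listed below are arbitrary choices for this sketch.

```r
# 97.5% quantiles of t for increasing degrees of freedom
df_values <- c(2, 5, 14, 30, 100, 1000)
t_crit <- qt(0.975, df = df_values)
z_crit <- qnorm(0.975)

# The gap column shows how far each t quantile sits above the normal quantile
data.frame(df = df_values, t_crit = t_crit, z_crit = z_crit, gap = t_crit - z_crit)
```

A larger critical value means more probability sits far from the center, which is the "fat tail" described above.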

Below we compare a t-distribution with a normal distribution. Note the t-distribution has ‘fatter’ tails.

set.seed(359)
n <- 15
mu <- 5
sigma <- 2

# Generate t-statistics
n.samples <- 10000
t.stats <- numeric(n.samples)  # This defines a 10000 dimensional zero vector
                               # t.test <- NULL uses more computing resource
for(i in 1:n.samples) {
  sample.data <- rnorm(n, mu, sigma)
  x.bar <- mean(sample.data)
  s <- sd(sample.data)
  t.stats[i] <- (x.bar - mu) / (s/sqrt(n))
}

# Compare with theoretical t-distribution
x.vals <- seq(-4, 4, length.out = 200)
theoretical.t <- dt(x.vals, df = n-1)    # calling t-density function
theoretical.normal <- dnorm(x.vals)      # standard normal distribution

comparison.df <- data.frame(
  x = rep(x.vals, 2),
  density = c(theoretical.t, theoretical.normal),
  # label the t curve with its actual degrees of freedom (n - 1 = 14)
  distribution = rep(c(paste0("t(", n - 1, ")"), "N(0,1)"), each = length(x.vals))
)

t.plt <- ggplot(comparison.df, aes(x = x, y = density, color = distribution)) +
  geom_line(linewidth = 1) +
  labs(title = "t-Distribution vs Normal Distribution",
       x = "Value", y = "Density") +
    theme(plot.title = element_text(hjust = 0.5),
        plot.margin = margin(t = 35, r = 20, b = 30, l = 30, unit = "pt")) +
   scale_color_manual(values = c("blue", "orange"))
ggplotly(t.plt)

Why does the t-distribution connect with the normal distribution?

The t-distribution behaves similarly to the normal distribution. The t statistic and the standardized normal statistic share a numerator that measures the distance between the sample mean and the population mean (shared numerator: \(\bar{X} - \mu\)). In shape, both the t and standard normal distributions are bell curves centered at 0. As the sample size grows, the t distribution converges to the standard normal distribution. This convergence is valuable because with a large enough sample the sample variance becomes accurate enough to stand in for the true population variance, so our statistics more closely reflect the true population.

Our assumptions of the t-distribution include independent random observations, random sampling, and that our population is normally distributed.

Because t distributions converge to the normal distribution for large sample sizes, the t distribution matters most in practice when the sample size is small.

The t-distribution concerns estimation of the mean; to assess the variance, also known as spread, of the distribution we use a chi-squared distribution.

5 Chi-Squared Distribution

A chi-squared distribution is another distribution that can converge to a normal distribution. A chi-squared distribution is a special case of a gamma distribution, and both distributions are right-skewed.

Below is an example of a chi-square distribution:

set.seed(6)
n <- 5
sigma <- 2

# Generate chi-square statistics
n.samples <- 10000
chisq.stats <- numeric(n.samples)

for(i in 1:n.samples) {
  sample.data <- rnorm(n, 0, sigma)
  chisq.stats[i] <- sum((sample.data/sigma)^2)
}

# Compare with theoretical chi-square
x.vals <- seq(0, 30, length.out = 200)
theoretical.chisq <- dchisq(x.vals, df = n)
theory.df <- data.frame(x = x.vals, density = theoretical.chisq)

chi.plt <- ggplot(data.frame(x = chisq.stats), aes(x = x)) +
  geom_histogram(aes(y = after_stat(density)), bins = 50, alpha = 0.7, fill = "green") +
  geom_line(data = theory.df, aes(x = x, y = density), 
            color = "blue", linewidth = 1.5) +
  #stat_function(fun = dchisq, args = list(df = n), color = "red", size = 1) +
  labs(title = "Chi-Squared Distribution (n=5) ",
       subtitle = "Sum of squared standard normals",
       x = "Value", y = "Density") +
   theme(plot.title = element_text(hjust = 0.5),
        plot.margin = margin(t = 35, r = 20, b = 30, l = 30, unit = "pt"))
ggplotly(chi.plt)

Increasing the sample size, and with it the degrees of freedom, brings the distribution closer to a normal distribution, because more degrees of freedom reduce the skewness.

set.seed(6)
n <- 500
sigma <- 2

# Generate chi-square statistics
n.samples <- 10000
chisq.stats <- numeric(n.samples)

for(i in 1:n.samples) {
  sample.data <- rnorm(n, 0, sigma)
  chisq.stats[i] <- sum((sample.data/sigma)^2)
}

# Compare with theoretical chi-square
x.vals <- seq(0, 1100, length.out = 200)
theoretical.chisq <- dchisq(x.vals, df = n)
theory.df <- data.frame(x = x.vals, density = theoretical.chisq)

chi.plt <- ggplot(data.frame(x = chisq.stats), aes(x = x)) +
  geom_histogram(aes(y = after_stat(density)), bins = 50, alpha = 0.7, fill = "green") +
  geom_line(data = theory.df, aes(x = x, y = density), 
            color = "blue", linewidth =1.5) +
  #stat_function(fun = dchisq, args = list(df = n), color = "red", size = 1) +
  labs(title = "Chi-Squared Distribution (n=500) ",
       subtitle = "Sum of squared standard normals",
       x = "Value", y = "Density") +
   theme(plot.title = element_text(hjust = 0.5),
        plot.margin = margin(t = 35, r = 20, b = 30, l = 30, unit = "pt"))
ggplotly(chi.plt)

A chi-square distribution can be derived from an exact normal distribution. A squared standard normal variable follows a chi-square distribution with one degree of freedom, and a sum of k independent squared standard normals follows a chi-square distribution with k degrees of freedom. Because the chi-square is built from squares, it can never be negative, which produces its right skewness. In other words, a chi-squared distribution arises from normal variables, but squaring moves its center away from 0 and introduces skewness.

Because the variance is built from squared deviations, the sampling distribution of the sample variance is, after scaling, a chi-square distribution. As our sample size increases, the degrees of freedom \((n-1)\) increase. Since a chi-square variable is a sum of independent squared terms, the Central Limit Theorem pulls the chi-squared distribution toward a normal distribution, smoothing the skewness into a bell shape as the sample size increases.
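The loss of skewness can be quantified. A chi-square distribution with k degrees of freedom has mean k, variance 2k and skewness \(\sqrt{8/k}\), so the sketch below compares its 97.5% quantile with that of a normal distribution with the same mean and variance. The chosen degrees of freedom are illustrative.

```r
df_values <- c(1, 5, 50, 500)

# Theoretical skewness of a chi-square with k degrees of freedom: sqrt(8 / k)
skewness <- sqrt(8 / df_values)

# Compare upper quantiles with a normal that has the same mean (k) and variance (2k)
q_chisq  <- qchisq(0.975, df = df_values)
q_normal <- qnorm(0.975, mean = df_values, sd = sqrt(2 * df_values))
rel_gap  <- (q_chisq - q_normal) / q_chisq

data.frame(df = df_values, skewness, q_chisq, q_normal, rel_gap)
```

Both the skewness and the relative gap between the quantiles shrink as the degrees of freedom increase.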

The relationship between the chi-square distribution and a normal population:

\[ \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1} \]

Like the t-distribution, the chi-square result assumes that we sample from a normally distributed population and that the observations are independent and identically distributed. Both the chi-squared and t-distributions have degrees of freedom as their parameter, which dictates the shape of each distribution.
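The scaled sample variance can be simulated to check this relationship. A chi-square distribution with \(n-1\) degrees of freedom has mean \(n-1\) and variance \(2(n-1)\); the values `n = 10`, `sigma = 2` and the number of replications are assumptions of this sketch.

```r
set.seed(286)
n <- 10       # sample size, giving n - 1 = 9 degrees of freedom
sigma <- 2    # assumed known population standard deviation

# (n - 1) * S^2 / sigma^2 for many independent normal samples
scaled_var <- replicate(20000, (n - 1) * var(rnorm(n, mean = 0, sd = sigma)) / sigma^2)

# Simulated moments next to the theoretical chi-square moments
c(sim_mean = mean(scaled_var), theory_mean = n - 1,
  sim_var  = var(scaled_var),  theory_var  = 2 * (n - 1))
```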

6 F Distribution

The chi-squared distribution describes the variance of a single distribution. Often we have multiple distributions and need to relate their variances to each other to compare the quality of the two. Hence we build an F distribution as a ratio of two independent chi-squared variables, each divided by its degrees of freedom, formed from independent sample variances. As with an individual chi-squared, both samples are assumed to come from normal distributions, and the two populations are independent.

\[ X_1, X_2, \ldots, X_{n_1} \overset{i.i.d.}{\sim} N(\mu_1, \sigma^2_1) \quad \text{and} \quad Y_1, Y_2, \ldots, Y_{n_2} \overset{i.i.d.}{\sim} N(\mu_2, \sigma^2_2) \]

\[ S^2_1 = \frac{1}{n_1-1} \sum_{i=1}^{n_1} (X_i - \bar{X})^2 \quad \text{and} \quad S^2_2 = \frac{1}{n_2-1} \sum_{i=1}^{n_2} (Y_i - \bar{Y})^2 \]

Define

\[ F = \frac{S^2_1 / \sigma^2_1}{S^2_2 / \sigma^2_2} \sim F_{n_1-1,\, n_2-1} \]

Each chi-square carries its own degrees of freedom, so the F-distribution has two degrees-of-freedom parameters (\(n_1-1\) for the numerator and \(n_2-1\) for the denominator), and both contribute to the distribution’s shape.

The greater the F ratio is above 1, the more variance there is in the numerator’s distribution relative to the denominator’s. The further the ratio falls below 1, the larger the variance in the denominator’s distribution. If the F ratio is 1, the two variances in the numerator and denominator are equal.
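In practice the ratio is computed from two samples, and when the two population variances are assumed equal they cancel, leaving \(S^2_1 / S^2_2\). The sketch below computes that ratio by hand and compares it with R's `var.test()`; the sample sizes, means and shared standard deviation are invented for illustration.

```r
set.seed(296)
# Two independent normal samples with the same assumed population variance
x <- rnorm(21, mean = 0, sd = 2)   # n1 = 21, numerator df = 20
y <- rnorm(26, mean = 5, sd = 2)   # n2 = 26, denominator df = 25

# F ratio by hand: the equal population variances cancel out of the formula
f_manual <- var(x) / var(y)

# Built-in F test for equality of two variances
vt <- var.test(x, y)

# Two-sided p-value by hand from the F(20, 25) distribution
lower_tail <- pf(f_manual, df1 = length(x) - 1, df2 = length(y) - 1)
p_manual <- 2 * min(lower_tail, 1 - lower_tail)

c(f_manual = f_manual, f_builtin = unname(vt$statistic),
  p_manual = p_manual, p_builtin = vt$p.value)
```

A ratio near 1 with a large p-value is consistent with equal variances; the means of the two samples play no role in the statistic.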

set.seed(45)
df1 <- 20
df2 <- 25

# Generate F statistics
n.samples <- 10000
f.stats <- numeric(n.samples)

for(i in 1:n.samples) {
  u1 <- rchisq(1, df1)
  u2 <- rchisq(1, df2)
  f.stats[i] <- (u1/df1) / (u2/df2)
}

# Compare with theoretical F-distribution
x.vals <- seq(0, 5, length.out = 200)
theoretical.f <- df(x.vals, df1, df2)
theory.df <- data.frame(x = x.vals, density = theoretical.f)




f.plt <- ggplot(data.frame(x = f.stats), aes(x = x)) +
  geom_histogram(aes(y = after_stat(density)), bins = 50, alpha = 0.7, fill = "blue") +
  geom_line(data = theory.df, aes(x = x, y = density), 
            color = "red", linewidth = 1) +
  coord_cartesian(xlim = c(0, 5)) +
  labs(title = paste("F-Distribution \n F(", df1, ",", df2, ")", sep = ""),
       x = "Value", y = "Density") +
  theme(plot.title = element_text(hjust = 0.5),
        plot.margin = margin(t = 35, r = 20, b = 30, l = 30, unit = "pt"))
ggplotly(f.plt)

7 Conclusion

Assuming a normal population connects the t distribution to the normal distribution. Normal variables also build the chi-squared distribution, which is used to assess variance, and two independent chi-squared variables form the F distribution ratio, which compares the variances, and so the overall quality, of two distributions.

The normal distribution builds into more advanced analyses that allow us to consider the quality of our distributions. Without the standardized distribution shape that the normal distribution gives us, making these comparisons would be challenging, especially across different units. The normal distribution allows us to organize our data for analysis and to test the quality of an observed data set through either approximation or theoretical comparison.

