Distributions (N,z,t)

0.1 Normal Distribution:

0.1.1 Definition:

A normal distribution is a symmetric, bell-shaped probability distribution that is fully characterized by its mean (μ) and standard deviation (σ). In the context of student grades, a normal distribution suggests that a significant number of students perform close to the average grade, with fewer students performing exceptionally well or poorly.

0.1.2 Characteristics:

The grades follow a bell-shaped curve, with the majority of students clustered around the mean grade. The mean, median, and mode are all equal and located at the center of the distribution. About 68% of students fall within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three standard deviations. Grading distributions that closely resemble a normal distribution are common in large classes or when grades are determined by multiple factors.
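The 68-95-99.7 percentages quoted above can be checked with base R's `pnorm`, the cumulative distribution function of the normal distribution. The following is a minimal sketch; the helper name `within_k_sd` is introduced here for illustration only.

```r
# Probability that a normally distributed value lies within k standard
# deviations of its mean. `within_k_sd` is a hypothetical helper name.
within_k_sd <- function(k) pnorm(k) - pnorm(-k)

# Evaluate for one, two, and three standard deviations
coverage <- sapply(1:3, within_k_sd)
round(coverage, 4)
# approximately 0.6827 0.9545 0.9973
```

Because the result depends only on the number of standard deviations, the same percentages hold for any mean and standard deviation.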

0.1.3 Example:

If student grades in a class are normally distributed with a mean of 75 and a standard deviation of 10, about 68% of students would have grades between 65 and 85 (within one standard deviation of the mean), and only about 5% would have grades below 55 or above 95 (more than two standard deviations away).

0.2 Standard Normal Distribution (Z-distribution):

0.2.1 Definition:

The standard normal distribution, also known as the Z-distribution, is a specific type of normal distribution with a mean of 0 and a standard deviation of 1.
It serves as a reference distribution and is often used in statistical hypothesis testing and constructing confidence intervals.

0.2.2 Characteristics:

The Z-distribution is symmetric and bell-shaped, like every normal distribution.
A Z-score represents the number of standard deviations a data point lies from the mean: z = (x - μ) / σ.
In the context of grading, the Z-distribution is frequently used to determine the proportion of students scoring above or below a certain grade threshold.

0.2.3 Example:

If student grades follow a normal distribution with a known mean and standard deviation, you can use the Z-distribution to find the probability that a randomly selected student has a grade above or below a specified value.
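As a sketch of this calculation, the snippet below standardizes a grade threshold and looks up the tail probability with `pnorm`. The mean of 75 and standard deviation of 10 are taken from the normal-distribution example earlier; the threshold of 90 is a hypothetical value chosen for illustration.

```r
mu <- 75         # mean grade (from the earlier example)
sigma <- 10      # standard deviation (from the earlier example)
threshold <- 90  # hypothetical grade threshold

# Z-score: number of standard deviations the threshold lies above the mean
z <- (threshold - mu) / sigma

# Probability that a randomly selected student scores above / below the threshold
p_above <- pnorm(z, lower.tail = FALSE)
p_below <- pnorm(z)

# The same probability without standardizing by hand
p_above_direct <- pnorm(threshold, mean = mu, sd = sigma, lower.tail = FALSE)

c(z = z, p_above = p_above, p_below = p_below)
```

Standardizing first and calling `pnorm` with its default arguments gives the same answer as passing `mean` and `sd` directly, which is why the Z-distribution can serve as a single reference for any normal distribution.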

In summary, the normal, Z-, and t-distributions each have a specific use in the context of grading students: the normal distribution describes grades in larger datasets, the Z-distribution handles standardized scores and hypothesis tests when the population standard deviation is known, and the t-distribution, described next, covers smaller samples.

0.3 t-Distribution:

0.3.1 Definition:

The t-distribution is similar to the normal distribution but is used when the sample size is small and the population standard deviation is unknown, so that it must be estimated from the sample. It is bell-shaped like the normal distribution but has heavier tails, which yield wider confidence intervals and more conservative hypothesis tests in small samples.

0.3.2 Characteristics:

The shape of the t-distribution depends on its degrees of freedom (df), which are determined by the sample size (for a single sample mean, df = n - 1). As the degrees of freedom increase, the t-distribution approaches the standard normal distribution. In the context of grading, the t-distribution might be relevant when working with small groups of students or when the population standard deviation is not known.
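One way to see this convergence numerically is to compare the 97.5% critical value of the t-distribution (`qt`) with that of the standard normal distribution (`qnorm`) for increasing degrees of freedom. This is an illustrative sketch; the particular df values are arbitrary choices.

```r
# Degrees of freedom to compare (arbitrary illustrative values)
df_values <- c(2, 5, 15, 30, 120)

# Two-sided 95% critical values: t-distribution vs. standard normal
t_crit <- qt(0.975, df = df_values)
z_crit <- qnorm(0.975)

# The excess over the normal critical value shrinks as df grows
data.frame(df = df_values,
           t_crit = round(t_crit, 3),
           excess_over_z = round(t_crit - z_crit, 3))
```

The heavier tails show up as larger critical values at low df, which is what makes t-based intervals wider in small samples.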

0.3.3 Example:

If you want to calculate a confidence interval for the average grade of a small sample of students (e.g., n = 10), you would use a t-distribution with 9 degrees of freedom (df = n - 1).

In summary, while a normal distribution is commonly used to describe large datasets of student grades, a t-distribution becomes particularly useful when working with smaller samples or when the population standard deviation is unknown. Both distributions play crucial roles in statistical analysis and inference within the context of student assessment and grading.
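The confidence interval example with n = 10 can be sketched as follows. The grades are made-up values used only for illustration, and the 95% confidence level is an assumption; the interval is computed by hand with `qt` and then cross-checked against base R's `t.test`.

```r
# Hypothetical grades for a small sample of 10 students (illustrative data)
grades <- c(72, 85, 78, 90, 66, 81, 74, 88, 79, 69)

n <- length(grades)
x_bar <- mean(grades)          # sample mean
se <- sd(grades) / sqrt(n)     # standard error of the mean

# Critical value from the t-distribution with df = n - 1 = 9
t_crit <- qt(0.975, df = n - 1)

# 95% confidence interval computed by hand
ci_manual <- c(lower = x_bar - t_crit * se,
               upper = x_bar + t_crit * se)

# The same interval from the built-in one-sample t-test
ci_builtin <- t.test(grades, conf.level = 0.95)$conf.int

ci_manual
ci_builtin
```

Replacing `qt(0.975, df = n - 1)` with `qnorm(0.975)` would give a narrower interval, which understates the uncertainty when the standard deviation is estimated from only 10 observations.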

# Load necessary libraries
library(ggplot2)

library(dplyr)

# Set up a sequence of values for the x-axis
x_values <- seq(-4, 4, 0.01)

# Create data frames for normal and t-distributions with different degrees of freedom
normal_data <- data.frame(x = x_values, y = dnorm(x_values))
t_dist_df2 <- data.frame(x = x_values, y = dt(x_values, df = 2))
t_dist_df20 <- data.frame(x = x_values, y = dt(x_values, df = 20))
t_dist_df30 <- data.frame(x = x_values, y = dt(x_values, df = 30))

# Combine data frames
combined_data <- bind_rows(
  data.frame(distribution = "Normal", normal_data),
  data.frame(distribution = "t (df=2)", t_dist_df2),
  data.frame(distribution = "t (df=20)", t_dist_df20),
  data.frame(distribution = "t (df=30)", t_dist_df30)
)

# Plot the distributions with thick lines: solid for the normal distribution,
# dashed for the t-distributions
ggplot(data = combined_data, 
       mapping = aes(x = x, 
                     y = y, 
                     color = distribution,
                     linetype = distribution)
       ) +
  # `linewidth` replaces the `size` argument deprecated for lines in ggplot2 3.4.0
  geom_line(linewidth = 1.5) +
  scale_linetype_manual(values = c("Normal" = "solid",
                                   "t (df=2)" = "dashed",
                                   "t (df=20)" = "dashed",
                                   "t (df=30)" = "dashed")) +
  labs(title = "Normal and Student's t-Distributions",
       x = "Value",
       y = "Density") +
  theme_minimal()

Alternatively, you can compress the code into fewer lines, as below, which makes it easier to read and correct.

library(ggplot2)

x <- seq(-5, 5, length.out = 200)

data_normal <- data.frame(x = x, y = dnorm(x), distribution = 'Normal')
dof_values <- c(2, 5, 15, 30, 120)
data_t <- do.call(rbind, lapply(dof_values, function(dof) {
  # paste0 avoids the stray space that paste() would insert before the closing parenthesis
  data.frame(x = x, y = dt(x, df = dof), distribution = paste0('t (df = ', dof, ')'))
}))
data_all <- rbind(data_normal, data_t)

# Plot
ggplot(data_all, aes(x = x, y = y, color = distribution)) +
  geom_line() +
  labs(title = 'Normal and t Distributions',
       x = 'Value',
       y = 'Probability Density') +
  theme_minimal() +
  scale_color_brewer(palette = "Set1")

# Load necessary libraries
library(ggplot2)
library(dplyr)

# Set seed for reproducibility
set.seed(123)

# Generate normal data from a population with mean 50 and standard deviation 10
population_data <- rnorm(1000, mean = 50, sd = 10)

# Plot the original normal distribution
ggplot(data.frame(x = population_data), aes(x = x)) +
  geom_histogram(binwidth = 5, fill = "skyblue", color = "black", alpha = 0.7) +
  labs(title = "Original Normal Distribution",
       x = "Value",
       y = "Frequency") +
  theme_minimal()

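# Z-scale the data: subtract the sample mean and divide by the sample standard deviation.
# scale() returns a one-column matrix, so convert it to a plain numeric vector.
z_scaled_data <- as.numeric(scale(population_data))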

# Plot the Z-scaled distribution
ggplot(data.frame(x = z_scaled_data), aes(x = x)) +
  geom_histogram(binwidth = 0.2, fill = "lightgreen", color = "black", alpha = 0.7) +
  labs(title = "Z-scaled Distribution",
       x = "Z-score",
       y = "Frequency") +
  theme_minimal()