2/12/2023

Normal Distributions

A sample in Statistics is a sub-group of a larger population that is used to make evaluations and assumptions about its population. A phenomenon that statisticians have noticed is that if you take many random samples of the same size from the same population, calculate their means, and graph them, the graph takes on the shape of a bell. This bell-shaped distribution is called a Normal Distribution, and the tendency of sample means to follow it is known as the Central Limit Theorem.

Evaluating Normality

There is no single well-defined procedure for deciding whether data are normally distributed. Statisticians therefore often assume normality based on a few characteristics, including the shape of the graph.
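As a rough sketch of checking graph shape in R, a histogram and a normal Q-Q plot are two common visual checks. The vector name `x` and the simulated values below are assumptions made for illustration, and the Shapiro-Wilk test is included only as one widely used supplementary check, not as a method prescribed by this text.

```r
# Illustrative data only: 200 draws from a normal distribution.
set.seed(1)
x <- rnorm(200, mean = 50, sd = 15)

# Graph shape: a roughly symmetric, bell-shaped histogram suggests normality.
hist(x, breaks = 20, main = "Histogram of x")

# Q-Q plot: points lying close to the reference line suggest normality.
qqnorm(x)
qqline(x)

# Shapiro-Wilk test: a small p-value is evidence against normality.
sw <- shapiro.test(x)
sw$p.value
```

None of these checks proves normality; they only indicate whether the assumption looks reasonable for the data at hand.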

Normal distributions can be created in several ways in R. One way is to use the built-in norm functions (dnorm, pnorm, qnorm, rnorm). Each of these 4 functions takes a mean and a standard deviation, which default to 0 and 1 if they are not supplied.
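To make the roles of the four functions concrete, here is a small sketch. The mean of 50 and standard deviation of 15 are example values, and the names `mu`, `sigma`, and `draws` are introduced only for this illustration.

```r
# Example parameters (assumed for illustration).
mu <- 50
sigma <- 15

dnorm(50, mean = mu, sd = sigma)   # density: height of the curve at 50
pnorm(50, mean = mu, sd = sigma)   # cumulative probability P(X <= 50): 0.5 at the mean
qnorm(0.5, mean = mu, sd = sigma)  # quantile: value with 50% of the distribution below it, 50

draws <- rnorm(10, mean = mu, sd = sigma)  # 10 random draws from the distribution
draws
```

Note that `pnorm` and `qnorm` are inverses of each other, which is why the second and third calls return 0.5 and 50.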

Mean

To calculate the mean, we call our sample \(N\) and its size \(n\). We sum every observation in \(N\) and divide by the number of observations: \[\bar{x} = \frac{\sum{x}}{n}\] where \(x\) is each observation in \(N\) and \(\bar{x}\) is the sample mean.
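The calculation can be written out directly in R. The four observations below are a hypothetical sample chosen so the arithmetic is easy to follow.

```r
# Hypothetical sample of four observations.
N <- c(2, 4, 6, 8)
n <- length(N)        # sample size

x_bar <- sum(N) / n   # sum of the observations divided by their count
x_bar                 # 5
mean(N)               # built-in equivalent, also 5
```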

Standard Deviation

The standard deviation measures how far the observations typically fall from the mean of the sample and can be written mathematically as \[s = \sqrt{\frac{\sum{(x-\bar{x})^2}}{n-1}}\] where \(s\) is the sample standard deviation, \(x\) is each observation, \(\bar{x}\) is the sample mean, and \(n\) is the sample size.
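As a quick check of the formula, the sketch below computes \(s\) by hand and compares it with R's built-in `sd()`, which also uses the \(n-1\) denominator. The data values are hypothetical.

```r
# Hypothetical sample of eight observations.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)
x_bar <- mean(x)

# Sum of squared deviations from the mean, divided by n - 1, then square-rooted.
s <- sqrt(sum((x - x_bar)^2) / (n - 1))
s

# The hand-computed value agrees with the built-in function.
all.equal(s, sd(x))
```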

# Draw 100 random values from a normal distribution with mean 50 and sd 15
rnorm(100, mean = 50, sd = 15)

6. Random Data Generation (R code)

The next method of creating Normal Distributions in R is to generate the data programmatically. The code below draws 100 random samples, each made up of 5 distinct integers between 1 and 100 (sample() draws without replacement by default), and collects their means in the vector \(samp\).

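samp <- numeric(100)            # preallocate space for the 100 sample means
for (i in seq_along(samp)) {
  temp <- sample(1:100, 5)      # 5 integers from 1 to 100, without replacement
  samp[i] <- mean(temp)         # store the mean of this sample
}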
head(samp)
## [1] 46.4 48.6 54.0 35.6 71.8 41.6

7. Random Data plotted
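The plot for this section is not present in the text. The following is a minimal sketch of how the sample means could be plotted, assuming a histogram is what was intended. The vector is regenerated with `replicate()` so the block runs on its own, and the values will differ from run to run.

```r
# Regenerate 100 means, each from 5 integers drawn from 1 to 100.
samp <- replicate(100, mean(sample(1:100, 5)))

# Histogram of the sample means; the shape should be roughly bell-like,
# centered near 50.5 (the mean of 1:100).
hist(samp, main = "Means of 100 random samples", xlab = "Sample mean")
```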

8. Comparing Two Box Plots