Mean and Standard Deviation

The Mean of a sample, represented by \(\bar{x}\), is the mathematical average of that sample, found by \(\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i\). \(\newline\) \(\newline\)

The Standard Deviation of a sample, represented by \(s_x\), measures how far the elements of that sample typically fall from the sample Mean. It is found by the equation \(s_x = \sqrt{ \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2 }\), that is, the square root of the (nearly) average squared distance from \(\bar{x}\). \(\newline\) \(\newline\) \(\bar{x}\) and \(s_x\) are called statistics since they are properties of a sample.
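As a quick check of the two formulas above, the following R sketch computes \(\bar{x}\) and \(s_x\) by hand and compares them with R's built-in mean() and sd(). The data vector x is made up purely for illustration.

```r
# Hypothetical sample (values chosen only for illustration)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Sample mean: sum of the values divided by n
x_bar <- sum(x) / n

# Sample standard deviation: note the n - 1 in the denominator
s_x <- sqrt(sum((x - x_bar)^2) / (n - 1))

# Both hand-computed values should agree with R's built-in functions
all.equal(x_bar, mean(x))
all.equal(s_x, sd(x))
```

R's sd() uses the same n - 1 denominator as the formula for \(s_x\), which is why the two results agree.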

Population Parameters

Recall that \(\bar{x}\) is the sample mean and \(s_x\) is the sample standard deviation. The corresponding quantities for the whole population are called parameters instead of statistics: \(\mu\) is the mean of the population, and \(\sigma\) is the standard deviation of the population.

\(\newline\) \(\newline\)

Note that, in general, \(\bar{x} \neq \mu\) and \(s_x \neq \sigma\), as the sample collected from the population might not accurately represent the population. For instance, if the average length of a mouse in a given population is 3.2 inches (\(\mu = 3.2\)) and we randomly choose 10 mice out of the population, our sample mean, \(\bar{x}\), likely will not equal \(\mu\) exactly due to random noise; it will probably be some number close to it.
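The mouse example can be imitated with a small simulation. This is only a sketch: the text gives \(\mu\) = 3.2 inches, but the population standard deviation (0.4) and the assumption that lengths are normally distributed are made up here for illustration.

```r
set.seed(1)  # fixed seed so the example is reproducible

mu    <- 3.2  # population mean length in inches (from the text)
sigma <- 0.4  # population standard deviation (assumed for illustration)

# Draw 10 mice at random from an (assumed) normal population
mice <- rnorm(10, mean = mu, sd = sigma)

x_bar <- mean(mice)
x_bar        # close to 3.2, but almost never exactly 3.2
x_bar - mu   # how far this particular sample mean is from mu
```

Rerunning the code with a different seed gives a different \(\bar{x}\) each time, while \(\mu\) stays fixed.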

Distribution Curves

There are many different types of distribution curves, but we are going to keep things simple by only talking about one type: Standard Normal Density Curves. Here is what each word means:

\(\newline\)

Curve —> A bell-shaped graph which represents a distribution \(\newline\) Standard —> The curve has \(\mu\) = 0, \(\sigma\) = 1. \(\newline\) Normal —> The curve is bell-shaped and symmetrical about its mean \(\newline\) Density —> The curve has a total area = 1.

\(\newline\) \(\newline\) Normal curves are represented by N(\(\mu\), \(\sigma^2\)), and a Standard Normal Density Curve, specifically, can be represented by N(0,1). \(\newline\) The next slide will show you what exactly this curve looks like.
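The properties listed above can be checked numerically in R. The sketch below uses dnorm(), the normal density function, and integrate() for the area. The curve N(5, 4) in the last step is a hypothetical example, included only to show how the N(\(\mu\), \(\sigma^2\)) notation maps onto R's arguments.

```r
# Density: total area under the standard normal curve (should be 1)
total_area <- integrate(dnorm, lower = -Inf, upper = Inf)$value

# Normal: the height at -z equals the height at z (symmetry about 0)
z <- c(0.5, 1, 2)
symmetric <- all.equal(dnorm(-z), dnorm(z))

# dnorm() takes the standard deviation, not the variance,
# so the hypothetical curve N(5, 4) is written with sd = 2
height_at_mean <- dnorm(5, mean = 5, sd = 2)

total_area
symmetric
height_at_mean
```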

Distribution Curves Continued

The curve of N(0,1) is shown below.

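library(ggplot2)
# Evaluate the N(0,1) density on an evenly spaced grid from -4 to 4
x_vals <- seq(-4, 4, length.out = 1000)
df <- data.frame(
  x = x_vals,
  density = dnorm(x_vals, mean = 0, sd = 1)
)
# Draw the curve as a blue line, with the y-axis fixed to the range [0, 1]
ggplot(df, aes(x = x, y = density)) +
  geom_line(color = "blue") +
  coord_cartesian(ylim = c(0, 1))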

68-95-99.7 rule

From the curve N(0,1) on the previous slide, several properties can be found by integrating the curve, most notably the 68-95-99.7 rule, which states that, for the standard normal density curve, about 68% of the data falls within 1 standard deviation of the mean, about 95% falls within 2 standard deviations of the mean, and about 99.7% falls within 3 standard deviations of the mean. Below is an interactive example using plotly.
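The three percentages can also be checked numerically. The sketch below uses pnorm(), R's cumulative distribution function for the normal curve, which gives the area to the left of a value. The helper name within_k is made up for this example.

```r
# Area under N(0, 1) within k standard deviations of the mean:
# area to the left of k minus area to the left of -k
within_k <- function(k) pnorm(k) - pnorm(-k)

round(within_k(1:3), 4)
# 0.6827 0.9545 0.9973
```

The rule's 68, 95, and 99.7 are rounded versions of these three areas.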

Sampling Distributions and Confidence Intervals

A Sampling Distribution is the distribution of the sample mean \(\bar{x}\) over many samples, each of the same sample size n. As it happens, the mean of the sampling distribution is the same as the mean of the population, \(\mu\), whereas its standard deviation is \(\frac{\sigma}{\sqrt{n}}\). Combining this fact with the rule on the previous slide (and assuming the sampling distribution is approximately normal), about 95% of the samples of size n we get will have a mean that is within two standard deviations (2 \(\frac{\sigma}{\sqrt{n}}\)) of the true population mean \(\mu\). We can then flip this around: if we randomly choose a sample of size n that has mean \(\bar{x}\), there is a 95% chance that \(\bar{x}\) is within 2 \(\frac{\sigma}{\sqrt{n}}\) of \(\mu\) or, in other words, there is a 95% chance that \(\mu\) is within 2 \(\frac{\sigma}{\sqrt{n}}\) of \(\bar{x}\). The interval \(\bar{x} \pm 2 \frac{\sigma}{\sqrt{n}}\) is called a 95% Confidence Interval for \(\mu\).
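A short simulation can make these claims concrete. This is a sketch under stated assumptions: the population is taken to be normal, and the values of mu, sigma, n, and reps are made up for illustration.

```r
set.seed(42)  # fixed seed so the example is reproducible

mu    <- 50     # population mean (assumed for illustration)
sigma <- 10     # population standard deviation (assumed)
n     <- 25     # size of each sample
reps  <- 10000  # number of samples drawn

# Draw many samples of size n and record each sample's mean
sample_means <- replicate(reps, mean(rnorm(n, mean = mu, sd = sigma)))

mean(sample_means)  # should be close to mu = 50
sd(sample_means)    # should be close to sigma / sqrt(n) = 2

# Share of sample means within 2 * sigma / sqrt(n) of mu (about 95%)
se <- sigma / sqrt(n)
coverage <- mean(abs(sample_means - mu) <= 2 * se)
coverage
```

Increasing n shrinks sd(sample_means), which is why larger samples give narrower confidence intervals.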

Confidence Level

As we have seen, Confidence Intervals are useful estimates for \(\mu\), but the ones we have built so far are limited. For instance, what if I wanted to find the Confidence Interval for which there is a 90% chance that \(\mu\) is in it from a sample? I cannot use the 68-95-99.7 rule directly, so what can I use? \(\newline\) \(\newline\) A Confidence Level is the level of certainty (i.e., probability) that \(\mu\) will fall within a certain interval. For instance, if there is a 95% chance that \(\mu\) is in an interval, then the confidence level of the interval is 95%. From a given confidence level, we can find a multiplier that scales \(\frac{\sigma}{\sqrt{n}}\) so that the interval is just big enough to fulfil the confidence level. For instance, for a confidence level of 90%, the multiplier is 1.645. You can see more multiplier values here under “Confidence Interval Critical Values”. We then plug this multiplier into the formula \(\bar{x} \pm z^* \cdot \frac{\sigma}{\sqrt{n}}\), where \(z^*\) is the multiplier, to get our confidence interval at Confidence Level C.
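Instead of looking the multiplier up in a table, it can be computed from the standard normal curve. The sketch below uses qnorm(), R's normal quantile function. The helper name z_star is made up for this example.

```r
# A central area of conf_level leaves (1 - conf_level) / 2 in each tail,
# so z* is the point with area 1 - (1 - conf_level) / 2 to its left
z_star <- function(conf_level) qnorm(1 - (1 - conf_level) / 2)

round(z_star(c(0.90, 0.95, 0.99)), 3)
# 1.645 1.960 2.576
```

The 95% multiplier is 1.96, which the 68-95-99.7 rule rounds to 2.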

Confidence Intervals Example (1)

Now that we understand Confidence Levels, let’s look at an example. \(\newline\) Suppose that the average grade of a sample of 40 students on a math exam is 88 (\(\bar{x}\)), and that the standard deviation of students’ grades is 3 (\(\sigma\)). Find a 90% confidence interval for the average student grade on the math exam (\(\mu\)).

We can represent this with the following plot:

Notice how the center (mean) of the curve is our value for \(\bar{x}\) rather than \(\mu\). We do it this way to make it easier to infer information about \(\mu\).

Confidence Intervals Example (2)

Recall that for a sampling distribution with sample size n, the standard deviation is \(\frac{\sigma}{\sqrt{n}}\). Therefore, the sampling distribution in our problem has standard deviation \(\frac{3}{\sqrt{40}} \approx 0.474\). From the confidence interval table provided earlier, we find that, for the confidence level C of 90%, we have the multiplier \(z^*\) = 1.645. Lastly, recall that the mean of our sample was 88. From this, and from the formula \(\bar{x} \pm z^* \cdot \frac{\sigma}{\sqrt{n}}\), we get \(88 \pm 1.645 \cdot 0.474 \approx 88 \pm 0.78\), which is the confidence interval [87.22,88.78]. Thus, there is a 90% chance that the average grade on the math exam is in the interval [87.22,88.78].
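The same calculation can be reproduced in R as a check. The numbers come from the example. The variable names are chosen for this sketch, and qnorm() is used in place of the table lookup.

```r
x_bar      <- 88    # sample mean (from the example)
sigma      <- 3     # population standard deviation (from the example)
n          <- 40    # sample size (from the example)
conf_level <- 0.90  # confidence level C

z_star <- qnorm(1 - (1 - conf_level) / 2)  # about 1.645
margin <- z_star * sigma / sqrt(n)         # about 0.78

# Confidence interval: sample mean plus or minus the margin
ci <- c(lower = x_bar - margin, upper = x_bar + margin)
round(ci, 2)
# lower 87.22, upper 88.78
```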

Terms

(In order of appearance) \(\newline\) \(\bar{x}\) —> The Mathematical Average (Mean) of a Sample \(\newline\) \(s_x\) —> The Standard Deviation of a Sample \(\newline\) \(\mu\) —> The Mathematical Average of a Population \(\newline\) \(\sigma\) —> The Standard Deviation of a Population \(\newline\) Standard Deviation —> A measure of how far the members of a sample typically fall from its mean \(\newline\) Standard Normal Density Curve —> A symmetrical, bell-shaped distribution curve with area = 1 of form N(0,1) \(\newline\)

Terms (continued)

Sampling Distribution —> The Distribution of the Mean of Samples, each of size n \(\newline\) Confidence Level —> The Probability that \(\mu\) falls within a certain Interval \(\newline\) Confidence Interval —> An interval with a certain Confidence level which estimates the value of \(\mu\)