MAS 261 - Lecture 6

Introduction to the Normal Distribution

Penelope Pooler Eisenbies

2024-09-14

Housekeeping

Today’s plan 📋
- Review Question about Frequency data
- A few minutes for R Questions 🪄
- Introduction to the concept of Normal Data
  - Simplifying a Histogram to a Normal Curve Distribution
- Answering Questions using the Normal Distribution
- Z scores - Calculations and Interpretations
- A peek at the Z Table - “How we used to do things…”
- A tour of HW 3
- In-class Exercises

R and RStudio

In this course we will use R and RStudio to understand statistical concepts.
You will access R and RStudio through Posit Cloud.
- Sign up for a Free Posit Cloud Account
I will post R/RStudio files on Posit Cloud that you can access through the provided links.
I will also provide demo videos that show how to access files and complete exercises.
NOTE: The free Posit Cloud account is limited to 25 hours per month.
- I will demo how to download completed work so that you can use this allotment efficiently.
- For those who want to go further with R/RStudio:
  - After Test 1, I will provide videos on how to download the software (R/RStudio/Quarto) and lecture files to your computer.

💥Lecture 6 In-class Exercises - Q1 💥

Session ID: MAS261f24

In lecture 5, we discussed frequency distributions of quantitative and categorical data.

Recall the categorical data about Education Attainment:

# A tibble: 5 × 7
  Highest_Degree    Freq Cum_Freq Rel_Freq Cum_Rel_Freq Pct_Freq Cum_Pct_Freq
  <chr>            <dbl>    <dbl>    <dbl>        <dbl>    <dbl>        <dbl>
1 Left high school   330      330     0.13         0.13     13           13  
2 High school       1269     1598     0.5          0.63     50           63  
3 Junior college     186     1786     0.07         0.7       7.3         70.4
4 Bachelor's         472     2258     0.19         0.89     18.6         89  
5 Graduate           280     2537     0.11         1        11          100
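
Each derived column in this table can be rebuilt from the Freq column alone. The following is a minimal base R sketch, not the code used to produce the table above; the object names are made up for illustration. Because the printed counts appear to be rounded, a recomputed Cum_Freq can differ from the printed one by 1.

```r
# Counts copied from the Freq column of the table above
degree <- c("Left high school", "High school", "Junior college",
            "Bachelor's", "Graduate")
freq <- c(330, 1269, 186, 472, 280)

# Hypothetical re-creation of the derived columns (illustrative names)
edu_tbl <- data.frame(
  Highest_Degree = degree,
  Freq           = freq,
  Cum_Freq       = cumsum(freq),              # running total of counts
  Rel_Freq       = freq / sum(freq),          # proportion in each category
  Cum_Rel_Freq   = cumsum(freq) / sum(freq)   # running proportion
)

# Percent versions are the proportions multiplied by 100
edu_tbl$Pct_Freq     <- 100 * edu_tbl$Rel_Freq
edu_tbl$Cum_Pct_Freq <- 100 * edu_tbl$Cum_Rel_Freq

print(edu_tbl, digits = 3)
```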

What PERCENT of survey respondents have a Junior college degree or a HIGHER level of education?

Review of Histograms of Different Distributions

Histograms are an effective tool for examining the distribution of the data.

LEFT SKEWED

Tail pulled out to LEFT

Low outliers

e.g. Human Lifespan

NORMAL/SYMMETRIC

Data appear in a symmetric bell-shaped curve

No graphic evidence of outliers

e.g. Test scores

RIGHT SKEWED

Tail pulled out to RIGHT

High outliers

e.g. Movie Gross values

Hypothetical Histogram

Most of the data falls in the middle intervals
Distribution is symmetric, and bell-shaped with no outliers.

Histogram overlaid with Density Curve

Recall that the proportions of data in the intervals sum to 1.
The area under the curve ALSO equals 1.

Normal Density Curve

We “smooth out” the histogram to a curve.
Area under the curve equals 1
We use this distribution to find the probability (percent chance) that a data value falls in a certain range.
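
The "sums to 1" idea can be checked numerically. This is a sketch on simulated data (hypothetical, not the lecture's data set), using only base R: the bin proportions of a histogram sum to 1, the density-scaled bars have total area 1, and the smooth normal curve also has area 1.

```r
# Hypothetical data: 10,000 simulated standard normal values
set.seed(261)
x <- rnorm(10000)

# Histogram bins, computed without drawing the plot
h <- hist(x, breaks = 30, plot = FALSE)

# Proportion of the data in each interval; these sum to 1
bin_props <- h$counts / sum(h$counts)
sum(bin_props)

# Area of the density-scaled bars (height times width); also 1
bar_area <- sum(h$density * diff(h$breaks))
bar_area

# Area under the smooth standard normal curve
curve_area <- integrate(dnorm, lower = -Inf, upper = Inf)$value
curve_area
```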

Example: Annual Movie Gross Averages

We have the annual average movie theater gross for 1980 to 2022.
These are population data, not a sample
We exclude 2020 and 2021
- Movie theaters were closed for much of those years.
The data follow an approximately normal distribution with
- Population Mean ($\mu$) = $17.77 million
- Population Std. Dev. ($\sigma$) = $2.41 million

Normal Distribution - Annual Movie Gross Averages

Shaded regions show +/- 1 SD ($\sigma$), +/- 2 SD ($\sigma$), +/- 3 SD ($\sigma$)

Using this plot (and a simple command or two), we can find the probability (percent chance) of a particular value or range of values.
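
The edges of the shaded regions are the mean plus or minus 1, 2, and 3 standard deviations. A small base R sketch of that arithmetic, using the lecture's mean and SD (the object names are illustrative):

```r
mu    <- 17.77  # population mean, in $ millions (from the lecture)
sigma <- 2.41   # population SD, in $ millions (from the lecture)

k <- 1:3  # number of standard deviations from the mean

# Lower and upper edge of each shaded region
sd_bands <- data.frame(
  k     = k,
  lower = mu - k * sigma,
  upper = mu + k * sigma
)
sd_bands
```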

Annual Movie Gross Averages

Finding a Probability (Percent Chance)

What is the percent chance that next year’s average will be 20 million or more?

vdist_normal_prob(20, mean=17.77, sd=2.41, type="upper")

Annual Movie Gross Averages

Finding a Probability (Percent Chance)

What is the percent chance that next year’s average will be 15 million or less?

vdist_normal_prob(15, mean=17.77, sd=2.41, type="lower")
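
The same two tail areas can be computed without a plot using base R's pnorm. This sketch assumes nothing beyond the mean and SD given above; pnorm returns a proportion, so it is multiplied by 100 to get a percent chance.

```r
mu    <- 17.77  # population mean, in $ millions
sigma <- 2.41   # population SD, in $ millions

# Upper tail: P(X >= 20)
p_upper <- pnorm(20, mean = mu, sd = sigma, lower.tail = FALSE)

# Lower tail: P(X <= 15); lower.tail = TRUE is the default
p_lower <- pnorm(15, mean = mu, sd = sigma)

# Convert proportions to percent chances, rounded to 1 decimal place
round(100 * c(upper = p_upper, lower = p_lower), 1)
```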

💥 Lecture 6 In-class Exercises - Q2-Q3 💥

Session ID: MAS261f24

What is the probability (percent chance) that the average gross will be 18 million or more?

HINT: This is an upper tail probability (percent chance) question that can be answered with vdist_normal_prob, which also shows a plot, or with pnorm.

What is the probability that the average gross will be 17.77 million or less?

HINT: Recall that the population mean, $\mu = 17.77$, is the center of this symmetric normal distribution.

The area under the whole curve is 1, and the mean ($\mu = 17.77$) is at the center.
So the area to the left of this center point is half of the area under the curve, 0.5.
No calculation needed, but you can use a plot to confirm your answer.
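
As an alternative to a plot, the symmetry argument can be confirmed with pnorm. A minimal sketch; the extra SD values are only there to show that the result does not depend on the width of the curve.

```r
mu    <- 17.77
sigma <- 2.41

# Area to the left and to the right of the mean
left  <- pnorm(mu, mean = mu, sd = sigma)
right <- pnorm(mu, mean = mu, sd = sigma, lower.tail = FALSE)
c(left = left, right = right)

# The area left of the mean is one half for any SD
left_any_sd <- sapply(c(1.2, 2.41, 6.0),
                      function(s) pnorm(mu, mean = mu, sd = s))
left_any_sd
```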

Probability (Percent Chance) within an Interval

What is the probability (percent chance) that the average gross per movie in 2024 will be between 16 and 19 million?

vdist_normal_prob(c(16,19), mean=17.77, sd=2.41, type="both")
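
With pnorm, the area inside an interval is the area to the left of the upper endpoint minus the area to the left of the lower endpoint. A base R sketch of that subtraction for the same interval:

```r
mu       <- 17.77
sigma    <- 2.41
interval <- c(16, 19)

# Cumulative area to the left of each endpoint
cum_area <- pnorm(interval, mean = mu, sd = sigma)

# Area between the endpoints = area left of 19 minus area left of 16
p_inside <- diff(cum_area)
round(100 * p_inside, 1)
```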

💥 Lecture 6 In-class Exercises - Q4-Q5 💥

Session ID: MAS261f24

Probability (percent chance) within an Interval

We specify the interval as a vector built with the combine function, c(), e.g. c(14,20).

What is the percent chance that the average gross per movie in 2024 will be between 14 million and 20 million (mean ($\mu$) = 17.77 and sd ($\sigma$)= 2.41)?

Probability (Percent Chance) outside of an Interval

What is the percent chance that the average gross per movie in 2024 will be more than 20 million or less than 14 million (mean ($\mu$) = 17.77 and sd ($\sigma$) = 2.41)?

HINTS:

Area under the whole normal curve is 1.
Percent chance that the new average gross is OUTSIDE of that interval is 100 - Percent Chance from Question 4.
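
The complement rule in the hints can be checked two ways. This sketch deliberately uses the 16 to 19 interval from the earlier worked example, not the exercise interval: the outside area is 1 minus the inside area, and it also equals the two tails added together.

```r
mu       <- 17.77
sigma    <- 2.41
interval <- c(16, 19)  # worked-example interval, not the exercise interval

p_inside <- diff(pnorm(interval, mean = mu, sd = sigma))

# Route 1: complement of the inside area
p_outside_complement <- 1 - p_inside

# Route 2: add the lower tail and the upper tail directly
p_outside_tails <- pnorm(interval[1], mean = mu, sd = sigma) +
  pnorm(interval[2], mean = mu, sd = sigma, lower.tail = FALSE)

# Both routes give the same answer
all.equal(p_outside_complement, p_outside_tails)
```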

Effects of Changes to the Mean ($\mu$) or SD ($\sigma$)

The mean ($\mu$) specifies the location of the normal distribution on the number line.
The standard deviation ($\sigma$) specifies the width of the normal distribution.
On the next slide are three hypothetical alternatives showing
- A change in mean ($\mu$) from $17.77 million to $14.00 million
  - A change in the mean ($\mu$) shifts the distribution’s location.
- A change in standard deviation ($\sigma$) from $2.41 million to $6.0 million
  - A larger standard deviation ($\sigma$) means the distribution is wider.
- A change in standard deviation ($\sigma$) from $2.41 million to $1.2 million
  - A smaller standard deviation ($\sigma$) means the distribution is narrower.
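
To see how these changes move the percent chance, the sketch below computes the chance of landing between 16 and 19 million under each of the four scenarios. The helper p_between and the choice of interval are illustrative assumptions, not part of the lecture code.

```r
# Hypothetical helper: probability that X falls between lower and upper
p_between <- function(mu, sigma, lower = 16, upper = 19) {
  pnorm(upper, mean = mu, sd = sigma) - pnorm(lower, mean = mu, sd = sigma)
}

# The original parameters and the three alternatives listed above
scenarios <- data.frame(
  scenario = c("Original", "Lower mean", "Larger SD", "Smaller SD"),
  mu       = c(17.77, 14.00, 17.77, 17.77),
  sigma    = c(2.41, 2.41, 6.0, 1.2)
)

# pnorm is vectorized, so all four rows are computed at once
scenarios$pct_chance <- round(100 * p_between(scenarios$mu, scenarios$sigma), 1)
scenarios
```

Because this interval sits near the original mean, a narrower curve puts more area inside it and a wider curve puts less.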

Changes in Mean or SD Affect % Chance

Original Mean and Standard Deviation

Decrease in Mean

Increase in Standard Deviation

Decrease in Standard Deviation

💥Lecture 6 In-class Exercises - Q6 💥

Session ID: MAS261f24

How is the percent chance within an interval affected if the standard deviation increases or decreases?

Use the previous code and experiment with different values for sd.

We specify the interval as a vector built with the combine function, c(), e.g. c(14,20).

What is the percent chance that the average gross per movie in 2024 will be between 14 million and 20 million (mean ($\mu$) = 17.77 and sd = 4.7)?

Z score and Why It Is Important

In our data example, each observation is the average gross for a single year.
For a single observation, X, e.g. the average gross for 2024, the Z score is calculated as follows:
- $Z = \frac{X-\mu}{\sigma}$, observation minus mean divided by standard deviation.
- If we know the Z score, we can also find X: $X = (Z\times\sigma) + \mu$
Converting our data to Z scores converts the data to the Standard Normal Distribution
- The Standard Normal Distribution has
  - a mean, $\mu$, of 0
  - a standard deviation, $\sigma$, of 1.
Probability (Percent Chance) questions like those covered today previously required converting all values to Z scores and looking them up in a Z table.
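
The Z-table route and the direct route give the same area. In this base R sketch, pnorm with its default mean of 0 and SD of 1 plays the role of the table; X = 15 is the lower-tail example from earlier in the lecture.

```r
mu    <- 17.77
sigma <- 2.41
x     <- 15

# Convert X to a Z score
z <- (x - mu) / sigma

# Old route: find the area left of z on the standard normal curve
p_from_z <- pnorm(z)

# Direct route: give pnorm the original mean and SD
p_from_x <- pnorm(x, mean = mu, sd = sigma)

# The two routes agree
all.equal(p_from_z, p_from_x)
```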

Calculations and Interpretations of Z-scores

Today we can bypass the Z-table BUT it is still helpful to know what the Z-score tells us.
Z indicates how many standard deviations an observed value is away from the mean ($\mu$).
If we know X then
- $Z = \frac{X-\mu}{\sigma}$ is how many standard deviations our X value is from the mean ($\mu$).
If we know the Z score, i.e. how many standard deviations a value is away from the mean ($\mu$), then
- $X = (Z\times\sigma) + \mu$ will tell us what the original data value is.

Examples of Z score calculations from Today

Recall that in the original data, the mean ($\mu$) = 17.77 and the standard deviation ($\sigma$) is 2.41

We examined the probability that X (next year’s average) is less than 15, written P(X < 15).
- $Z = \frac{X-\mu}{\sigma} = \frac{15 - 17.77}{2.41} = -1.149$
- X = 15 is 1.149 standard deviations BELOW the population mean.
- A negative Z value indicates the observed value is below the population mean, $\mu$ (15 < 17.77).

Examples of Z score calculations from Today

We also examined the probability (percent chance) that X (next year’s average) is more than 20, written P(X > 20).
- $Z = \frac{X-\mu}{\sigma} = \frac{20 - 17.77}{2.41} = 0.925$
- X = 20 is 0.925 standard deviations ABOVE the population mean.
- A positive Z value indicates the observed value is above the population mean, $\mu$ (20 > 17.77).

Finding X, an observed value from Z

If we want to know what value of X is 2 standard deviations below the mean (Z = -2):
- $X = (Z\times\sigma) + \mu = (-2 \times 2.41) + 17.77 = 12.95$
- If we encounter a new average gross in a subsequent year of 12.95, that value is 2 standard deviations below the population mean $\mu$.
- In the next lecture we’ll learn that knowing Z tells us how likely an observed value is to occur.
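
The three worked examples (X = 15, X = 20, and Z = -2) can be reproduced with two one-line functions. The function names z_from_x and x_from_z are made up for this sketch; they implement the two formulas shown above.

```r
mu    <- 17.77
sigma <- 2.41

# Z = (X - mu) / sigma
z_from_x <- function(x, mu, sigma) (x - mu) / sigma

# X = (Z * sigma) + mu
x_from_z <- function(z, mu, sigma) z * sigma + mu

# Z scores for X = 15 and X = 20, rounded as on the slides
z_examples <- round(z_from_x(c(15, 20), mu, sigma), 3)
z_examples  # -1.149  0.925

# X value that is 2 standard deviations below the mean
x_example <- x_from_z(-2, mu, sigma)
x_example   # 12.95

# Round trip: converting X to Z and back returns the original X
x_from_z(z_from_x(15, mu, sigma), mu, sigma)
```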

💥Lecture 6 In-class Exercises - Q7 💥

Session ID: MAS261f24

Recall that in the UPDATED average movie gross data, the population mean ($\mu$) = 14 and the standard deviation ($\sigma$) is 2.41.

What would the average gross (X) be if it were 3 standard deviations ABOVE the population mean ($\mu$ = 14)?

In the next class we will talk about just how unlikely it is to observe a value far from the population mean.

Key Points from Today

Normal Distribution is symmetric and bell-shaped
- Width is determined by the population standard deviation, $\sigma$.
- Location is determined by the population mean ($\mu$).
  - we can find the probability of seeing a new value or one farther from the mean ($\mu$).
  - we can also find the probability (percent chance) of X being in a range or interval.
  - Probabilities (Percent Chances) will change if the mean ($\mu$) or standard deviation ($\sigma$) changes based on new information about the population.
- We can also convert our observed value X to a Z score and we can convert a Z score to an X value.
- Z tells the number of standard deviations ($\sigma$) X is above or below the mean ($\mu$).

To submit an Engagement Question or Comment about material from Lecture 6: Submit it by midnight today (day of lecture).
