# A tibble: 5 × 7
  Highest_Degree    Freq Cum_Freq Rel_Freq Cum_Rel_Freq Pct_Freq Cum_Pct_Freq
  <chr>            <dbl>    <dbl>    <dbl>        <dbl>    <dbl>        <dbl>
1 Left high school   330      330     0.13         0.13       13           13
2 High school       1269     1598      0.5         0.63       50           63
3 Junior college     186     1786     0.07          0.7      7.3         70.4
4 Bachelor's         472     2258     0.19         0.89     18.6           89
5 Graduate           280     2537     0.11            1       11          100
MAS 261 - Lecture 6
Introduction to the Normal Distribution
Housekeeping
Today’s plan
Review Question about Frequency data
A few minutes for R Questions 🪄
Introduction to the concept of Normal Data
- Simplifying a Histogram to a Normal Curve Distribution
Answering Questions using the Normal Distribution
Z scores - Calculations and Interpretations
A peek at the Z Table - “How we used to do things…”
A tour of HW 3
- In-class Exercises
R and RStudio
In this course we will use R and RStudio to understand statistical concepts.
You will access R and RStudio through Posit Cloud.
- Sign up for a Free Posit Cloud Account
I will post R/RStudio files on Posit Cloud that you can access in provided links.
I will also provide demo videos that show how to access files and complete exercises.
NOTE: The free Posit Cloud account is limited to 25 hours per month.
I will demo how to download completed work so that you can use this allotment efficiently.
For those who want to go further with R/RStudio:
- After Test 1, I will provide videos on how to download the software (R/RStudio/Quarto) and lecture files to your computer.
Lecture 6 In-class Exercises - Q1
Session ID: MAS261f24
In lecture 5, we discussed frequency distributions of quantitative and categorical data.
Recall the categorical data about Education Attainment:
What PERCENT of survey respondents have a Junior college degree or a HIGHER level of education?
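One way to check an answer against the table is to add the Pct_Freq values for Junior college and every level above it. The sketch below retypes the percentages from the Education Attainment table by hand; the object names (`edu`, `higher`, `pct_jc_or_higher`) are made up for illustration and are not from the course files.

```r
# Percent frequencies retyped from the Education Attainment table
edu <- data.frame(
  Highest_Degree = c("Left high school", "High school", "Junior college",
                     "Bachelor's", "Graduate"),
  Pct_Freq = c(13, 50, 7.3, 18.6, 11)
)

# Keep Junior college and every level above it (rows 3 to 5)
higher <- edu[3:5, ]

# Total percent with a Junior college degree or higher
pct_jc_or_higher <- sum(higher$Pct_Freq)
pct_jc_or_higher
```

Subtracting the Cum_Pct_Freq for High school from 100 is an equivalent route; the two can differ slightly because the table values are rounded.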
Review of Histograms of Different Distributions
Histograms are an effective tool for examining the distribution of the data.
LEFT SKEWED
Tail pulled out to LEFT
Low outliers
e.g. Human Lifespan
NORMAL/SYMMETRIC
Data appear in a symmetric bell-shaped curve
No graphic evidence of outliers
e.g. Test scores
RIGHT SKEWED
Tail pulled out to RIGHT
High outliers
e.g. Movie Gross values
Hypothetical Histogram
- Most of the data falls in the middle intervals
- Distribution is symmetric and bell-shaped, with no outliers.
Histogram overlaid with Density Curve
- Recall that the proportions of data in the intervals sum to 1
- Area under the curve ALSO sums to 1
Normal Density Curve
- We “smooth out” the histogram to a curve.
- Area under the curve equals 1
- We use this distribution to find the probability (percent chance) that a certain data value occurs.
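The "sums to 1" idea can be checked numerically. This is a minimal sketch using simulated data (not the movie data): the seed, the sample size, and the object names are arbitrary choices for illustration.

```r
set.seed(261)                                # arbitrary seed so the run is repeatable
x <- rnorm(1000, mean = 17.77, sd = 2.41)    # simulated values, NOT the real movie data

h <- hist(x, plot = FALSE)                   # compute the histogram without drawing it

# Proportion of data in each interval, summed over all intervals
prop_sum <- sum(h$counts / length(x))

# Area of the density-scaled histogram: bar height times bar width, summed
area_hist <- sum(h$density * diff(h$breaks))

# Area under the whole normal curve, from -Inf to Inf
area_curve <- pnorm(Inf, mean = 17.77, sd = 2.41) - pnorm(-Inf, mean = 17.77, sd = 2.41)

c(prop_sum = prop_sum, area_hist = area_hist, area_curve = area_curve)
```

All three values should come out as 1, which is why areas under the curve can be read as probabilities.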
Example: Annual Movie Gross Averages
We have the annual average movie theater gross for 1980 to 2022.
These are population data, not a sample
We exclude 2020 and 2021
- Movie theaters were closed for much of those years.
The data follow an approximately normal distribution with
- Population Mean (\(\mu\)) = $17.77 million
- Population Std. Dev. (\(\sigma\)) = $2.41 million
Normal Distribution - Annual Movie Gross Averages
Shaded regions show +/- 1 SD (\(\sigma\)), +/- 2 SD (\(\sigma\)), +/- 3 SD (\(\sigma\))
Using this plot (and a simple command or two), we can find the probability (percent chance) of a particular value or range of values.
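The shaded bands can be reproduced with a few lines of base R. This sketch assumes the bands are centered on the mean and uses `pnorm` on the standard normal scale; the data frame name `bands` is made up for illustration.

```r
mu <- 17.77     # population mean, in $ millions
sigma <- 2.41   # population standard deviation, in $ millions
k <- 1:3        # number of standard deviations on each side of the mean

bands <- data.frame(
  k = k,
  lower = mu - k * sigma,                      # left edge of each band
  upper = mu + k * sigma,                      # right edge of each band
  pct_inside = 100 * (pnorm(k) - pnorm(-k))    # percent of the curve inside the band
)
bands
```

The `pct_inside` column gives roughly 68%, 95%, and 99.7%, the usual rule of thumb for any normal distribution.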
Annual Movie Gross Averages
Finding a Probability (Percent Chance)
What is the percent chance that next year’s average will be 20 million or more?
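An upper tail question like this can be answered with base R's `pnorm`. This is a minimal sketch using the mean and standard deviation given earlier; the variable name `p_upper` is illustrative.

```r
# P(X >= 20) when X is normal with mean 17.77 and sd 2.41
# lower.tail = FALSE asks for the area to the RIGHT of 20
p_upper <- pnorm(20, mean = 17.77, sd = 2.41, lower.tail = FALSE)

# Convert the probability to a percent chance
round(100 * p_upper, 1)
```

The result is a little under 18 percent.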
Annual Movie Gross Averages
Finding a Probability (Percent Chance)
What is the percent chance that next year’s average will be 15 million or less?
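A lower tail question uses the same function with its default setting. A minimal sketch, with `p_lower` as an illustrative name:

```r
# P(X <= 15) when X is normal with mean 17.77 and sd 2.41
# By default pnorm returns the area to the LEFT of the value
p_lower <- pnorm(15, mean = 17.77, sd = 2.41)

# Convert the probability to a percent chance
round(100 * p_lower, 1)
```

The result is about 12.5 percent.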
Lecture 6 In-class Exercises - Q2-Q3
What is the probability (percent chance) that the average gross will be 18 million or more?
HINT: This is an upper tail probability (percent chance) question that can be answered with vdist_norm_prob, which shows the histogram, or pnorm.
What is the probability that the average gross will be 17.77 million or less?
HINT: Recall that the population mean, \(\mu = 17.77\), is the center of this symmetric normal distribution.
The area under the whole curve is 1 and the mean (\(\mu = 17.77\)) is at the center, so the area to the left of this center point is half of the area under the curve, 0.5.
No calculation needed, but you can use a plot to confirm your answer.
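If a numeric confirmation is preferred over a plot, `pnorm` evaluated at the mean gives the same answer. A short sketch; the second line uses a made-up standard deviation to show that the result does not depend on \(\sigma\).

```r
# Area to the left of the mean of a normal distribution
p_at_mean <- pnorm(17.77, mean = 17.77, sd = 2.41)

# Same calculation with a different (hypothetical) sd: the answer does not change
p_at_mean_wide <- pnorm(17.77, mean = 17.77, sd = 6.0)

c(p_at_mean, p_at_mean_wide)
```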
Probability (Percent Chance) within an Interval
What is the probability (percent chance) that the average gross per movie in 2024 will be between 16 and 19 million?
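The area between two values is the area to the left of the upper value minus the area to the left of the lower value. A minimal base R sketch, with illustrative variable names:

```r
mu <- 17.77
sigma <- 2.41

# Area to the left of 19 minus area to the left of 16
p_between <- pnorm(19, mean = mu, sd = sigma) - pnorm(16, mean = mu, sd = sigma)

# Convert the probability to a percent chance
round(100 * p_between, 1)
```

The result is roughly 46 percent.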
Lecture 6 In-class Exercises - Q4-Q5
Probability (percent chance) within an Interval
We specify the interval using the combine function, c(), e.g. c(14,20).
What is the percent chance that the average gross per movie in 2024 will be between 14 million and 20 million (mean (\(\mu\)) = 17.77 and sd (\(\sigma\)) = 2.41)?
Probability (Percent Chance) outside of an Interval
What is the percent chance that the average gross per movie in 2024 will be more than 20 million or less than 14 million (mean (\(\mu\)) = 17.77 and sd (\(\sigma\)) = 2.41)?
HINTS:
Area under the whole normal curve is 1.
Percent chance that the new average gross is OUTSIDE of that interval is 100 - Percent Chance from Question 4.
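The complement idea in the hints can be sketched as a small helper. The function name `pct_outside` is hypothetical, and the demonstration uses the 16 to 19 interval from the earlier example rather than the exercise interval.

```r
# Percent chance that X falls OUTSIDE the interval (lower, upper)
pct_outside <- function(lower, upper, mu, sigma) {
  p_inside <- pnorm(upper, mean = mu, sd = sigma) -
    pnorm(lower, mean = mu, sd = sigma)
  100 * (1 - p_inside)   # total area is 1, so outside = 1 - inside
}

outside_16_19 <- pct_outside(16, 19, mu = 17.77, sigma = 2.41)

# Cross-check: add the lower tail and the upper tail directly
two_tails <- 100 * (pnorm(16, 17.77, 2.41) +
                    pnorm(19, 17.77, 2.41, lower.tail = FALSE))

c(outside_16_19 = outside_16_19, two_tails = two_tails)
```

Both routes give the same number, which is a useful check on any "outside the interval" answer.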
Effects of Changes to the Mean (\(\mu\)) or SD (\(\sigma\))
The mean (\(\mu\)) specifies the location of the normal distribution on the number line.
The standard deviation (\(\sigma\)) specifies the width of the normal distribution.
On the next slide are three hypothetical alternatives showing
A change in mean (\(\mu\)) from $17.77 million to $14.00 million
- A change in the mean (\(\mu\)) shifts the distribution’s location.
A change in standard deviation (\(\sigma\)) from $2.41 million to $6.0 million
- A larger standard deviation (\(\sigma\)) means the distribution is wider.
A change in standard deviation (\(\sigma\)) from $2.41 million to $1.2 million
- A smaller standard deviation (\(\sigma\)) means the distribution is narrower.
Changes in Mean or SD Affect % Chance
Original Mean and Standard Deviation
Decrease in Mean
Increase in Standard Deviation
Decrease in Standard Deviation
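The four scenarios can be compared side by side. This sketch assumes the question of interest is the "20 million or more" upper tail from earlier in the lecture (the original plots may show a different region); the data frame and column names are made up for illustration.

```r
# Four parameter settings: the original plus the three hypothetical alternatives
scenarios <- data.frame(
  scenario = c("Original", "Decrease in Mean",
               "Increase in Standard Deviation", "Decrease in Standard Deviation"),
  mean = c(17.77, 14.00, 17.77, 17.77),
  sd   = c(2.41, 2.41, 6.0, 1.2)
)

# pnorm is vectorized, so each row uses its own mean and sd
scenarios$pct_20_or_more <- 100 * pnorm(20, mean = scenarios$mean,
                                        sd = scenarios$sd, lower.tail = FALSE)
scenarios
```

With these settings, a lower mean or a smaller standard deviation makes a value of 20 or more less likely, while a larger standard deviation makes it more likely.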
Lecture 6 In-class Exercises - Q6
How is the percent chance within an interval affected if the standard deviation increases or decreases?
Use the previous code and experiment with different values for sd.
We specify the interval using the combine function, c(), e.g. c(14,20).
What is the percent chance that the average gross per movie in 2024 will be between 14 million and 20 million (mean (\(\mu\)) = 17.77 and sd = 4.7)?
Z score and Why It Is Important
In our data example, each observation is the average gross for a single year.
For a single observation, X, e.g. the average gross for 2024, the Z score is calculated as follows:
\(Z = \frac{X-\mu}{\sigma}\), observation minus mean divided by standard deviation.
If we know the Z score, we can also find X: \(X = (Z\times\sigma) + \mu\)
Converting our data to Z scores converts the data to the Standard Normal Distribution.
The Standard Normal Distribution has
- a mean, \(\mu\), of 0
- a standard deviation, \(\sigma\), of 1.
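Converting to a Z score does not change the probability; it only changes the scale. A minimal sketch comparing the two routes for X = 15, with illustrative variable names:

```r
mu <- 17.77
sigma <- 2.41
x <- 15

# Convert X to a Z score
z <- (x - mu) / sigma

# Route 1: use the original scale (mean 17.77, sd 2.41)
p_from_x <- pnorm(x, mean = mu, sd = sigma)

# Route 2: use the Standard Normal Distribution (mean 0, sd 1, the pnorm defaults)
p_from_z <- pnorm(z)

all.equal(p_from_x, p_from_z)   # TRUE: both routes give the same area
```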
Probability (Percent Chance) questions like those covered today previously required converting all values to Z scores and looking up the probabilities in a Z table.
Calculations and Interpretations of Z-scores
Today we can bypass the Z-table BUT it is still helpful to know what the Z-score tells us.
Z indicates how many standard deviations an observed value is away from the mean (\(\mu\)).
If we know X then
- \(Z = \frac{X-\mu}{\sigma}\) is how many standard deviations our X value is from the mean (\(\mu\)).
If we know the Z score, i.e. how many standard deviations a value is away from the mean (\(\mu\)), then
- \(X = (Z\times\sigma) + \mu\) will tell us what the original data value is.
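The two formulas can be wrapped in small helper functions. The names `z_score` and `x_from_z` are hypothetical, chosen for this sketch only.

```r
# Z = (X - mu) / sigma: how many standard deviations X is from the mean
z_score <- function(x, mu, sigma) {
  (x - mu) / sigma
}

# X = (Z * sigma) + mu: recover the original data value from a Z score
x_from_z <- function(z, mu, sigma) {
  (z * sigma) + mu
}

mu <- 17.77
sigma <- 2.41

z_15 <- z_score(15, mu, sigma)        # negative, since 15 is below the mean
x_back <- x_from_z(z_15, mu, sigma)   # converting back returns 15

c(z_15 = round(z_15, 3), x_back = x_back)
```

Converting X to Z and back again returns the starting value, which is a quick way to check either calculation.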
Examples of Z score calculations from Today
Recall that in the original data, the mean (\(\mu\)) = 17.77 and the standard deviation (\(\sigma\)) is 2.41
We examined the probability that X (next year’s average) is less than 15, i.e. P(X < 15)
\(Z = \frac{X-\mu}{\sigma} = \frac{15 - 17.77}{2.41} = -1.149\)
X = 15 is 1.149 standard deviations BELOW the population mean.
A negative Z value indicates the observed value is below the population mean, \(\mu\) (15 < 17.77).
Examples of Z score calculations from Today
We also examined the probability (percent chance) that X (next year’s average) is more than 20, i.e. P(X > 20)
\(Z = \frac{X-\mu}{\sigma} = \frac{20 - 17.77}{2.41} = 0.925\)
X = 20 is 0.925 standard deviations ABOVE the population mean.
A positive Z value indicates the observed value is above the population mean, \(\mu\) (20 > 17.77).
Finding X, an observed value from Z
If we want to know what value of X is 2 standard deviations below the mean (Z = -2):
\(X = (Z\times\sigma) + \mu = (-2 \times 2.41) + 17.77 = 12.95\)
If we encounter a new average gross in a subsequent year of 12.95, that value is 2 standard deviations below the population mean \(\mu\).
In the next lecture we’ll learn that knowing Z tells us how likely an observed value is to occur.
Lecture 6 In-class Exercises - Q7
Recall that in the UPDATED average movie gross data, the population mean (\(\mu\)) = 14 and the standard deviation (\(\sigma\)) is 2.41.
What would the average gross (X) be if it were 3 standard deviations ABOVE the population mean (\(\mu\)) = 14?
In the next class we will talk about just how unlikely it is to observe a value far from the population mean.
Key Points from Today
Normal Distribution is symmetric and bell-shaped
Width is determined by the population standard deviation, \(\sigma\).
Location is determined by the population mean (\(\mu\)).
We can find the probability of seeing a new value or one farther from the mean (\(\mu\)).
We can also find the probability (percent chance) of X being in a range or interval.
Probabilities (Percent Chances) will change if the mean (\(\mu\)) or standard deviation (\(\sigma\)) changes based on new information about the population.
We can also convert our observed value X to a Z score, and we can convert a Z score to an X value.
Z tells us the number of standard deviations (\(\sigma\)) X is above or below the mean (\(\mu\)).
To submit an Engagement Question or Comment about material from Lecture 6: Submit it by midnight today (day of lecture).