MAS 261 - Lecture 7
Empirical Rule / Finding X from a Probability
Housekeeping
Today’s plan
Review Question about Normal Probability
A few minutes for R Questions 🪄
Review of the Normal Distribution
Empirical Rule
- Interpreting data values intuitively
Finding an observed value, X, from a probability (percent chance)
Questions about HW 3
In-class Exercises
R and RStudio
In this course we will use R and RStudio to understand statistical concepts.
You will access R and RStudio through Posit Cloud.
- Sign up for a Free Posit Cloud Account
I will post R/RStudio files on Posit Cloud that you can access in provided links.
I will also provide demo videos that show how to access files and complete exercises.
NOTE: The free Posit Cloud account is limited to 25 hours per month.
I will demo how to download completed work so that you can use this allotment efficiently.
For those who want to go further with R/RStudio:
- After Test 1, I will provide videos on how to download the software (R/RStudio/Quarto) and lecture files to your computer.
Lecture 7 In-class Exercises - Q1
Session ID: MAS261f24
The mean number of customers at a local cafe on a Monday morning is 39 with a standard deviation of 3
What is the percent chance that they will have 45 or more customers next Monday morning?
Use the vdist_normal_prob command to help you answer this question.
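For checking an answer outside the polling app, the same kind of upper-tail probability can be sketched in base R with pnorm(). This is only a sketch: it assumes the customer counts are approximately normal (the question implies this but does not state it), and the variable names are illustrative.

```r
# Assumption: Monday customer counts are approximately Normal(mean = 39, sd = 3)
mu    <- 39
sigma <- 3
x     <- 45

# Z score of the value of interest
z <- (x - mu) / sigma

# Upper-tail probability P(X >= 45); lower.tail = FALSE gives the area to the right
p_upper <- pnorm(x, mean = mu, sd = sigma, lower.tail = FALSE)

cat("Z score:", z, "\n")
cat("Percent chance of 45 or more customers:", round(100 * p_upper, 2), "\n")
```

Using lower.tail = FALSE avoids computing 1 - pnorm(...) by hand, which is an easy place to slip.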
Review of Histograms of Different Distributions
Histograms are an effective tool for examining the distribution of the data.
LEFT SKEWED
Tail pulled out to LEFT
Low outliers
e.g. Human Lifespan
NORMAL/SYMMETRIC
Data appear in a symmetric bell-shaped curve
No graphic evidence of outliers
e.g. Test scores
RIGHT SKEWED
Tail pulled out to RIGHT
High outliers
e.g. Movie Gross values
Hypothetical Histogram
- Most of the data falls in the middle intervals
- Distribution is symmetric, and bell-shaped with no outliers.
Histogram overlaid with Density Curve
- Recall the sum of the proportion of data in each interval equals 1
- Area under the curve ALSO sums to 1
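Both "sums to 1" statements can be checked numerically. The sketch below uses simulated data (the seed, sample size, and number of breaks are arbitrary choices, not values from the lecture) and base R only.

```r
set.seed(261)                       # arbitrary seed so the sketch is reproducible
x <- rnorm(1000, mean = 0, sd = 1)  # hypothetical symmetric, bell-shaped data

# Compute the histogram without drawing it
h <- hist(x, breaks = 20, plot = FALSE)

# Proportion of the data in each interval; these proportions sum to 1
props      <- h$counts / length(x)
total_prop <- sum(props)

# Area of the density-scaled bars: bar height (density) times bar width
bar_area <- sum(h$density * diff(h$breaks))

# Area under the standard normal density curve
curve_area <- integrate(dnorm, lower = -Inf, upper = Inf)$value

cat("Sum of proportions:", total_prop, "\n")
cat("Total bar area:    ", bar_area, "\n")
cat("Area under curve:  ", round(curve_area, 4), "\n")
```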
Normal Density Curve
- We “smooth out” the histogram to a curve.
- Area under the curve equals 1
- We use this distribution to find the probability (percent chance) that a certain data value occurs.
Normal Distribution
In lecture 6 we talked about the normal distribution
It is symmetric and bell-shaped.
Its location is determined by the population mean, \(\mu\)
Its width is determined by the population standard deviation, \(\sigma\)
Regardless of the values of \(\mu\) and \(\sigma\), the normal distribution has a consistent shape
That shape is well known and provides information about all normally distributed populations.
Also recall that all normally distributed populations can be converted to the standard normal distribution \(Z\)
Z is normally distributed with mean \(\mu = 0\) and SD \(\sigma = 1\).
If X is from a normal population, \(Z = \frac{X-\mu}{\sigma}\)
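A short sketch of why the conversion to Z is useful: a probability computed on the original scale matches the probability computed for the Z score on the standard normal scale. The values of mu, sigma, and x below are hypothetical and chosen only for illustration.

```r
# Hypothetical normal population (not from the lecture)
mu    <- 100
sigma <- 15
x     <- 130

# Standardize: Z = (X - mu) / sigma
z <- (x - mu) / sigma

# P(X <= x) on the original scale and P(Z <= z) on the standard normal scale
p_original <- pnorm(x, mean = mu, sd = sigma)
p_standard <- pnorm(z)

cat("Z score:", z, "\n")
cat("P(X <= 130):", round(p_original, 4), "\n")
cat("P(Z <= 2):  ", round(p_standard, 4), "\n")
```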
Normal Distribution - Empirical Rule
Part 1: In any Normal population, about 68% of values fall within one standard deviation of the mean (illustrated using Z distribution).
Normal Distribution - Empirical Rule
Part 2: In any Normal population, about 95% of values fall within two standard deviations of the mean (illustrated using Z distribution).
Normal Distribution - Empirical Rule
Part 3: In any Normal population, about 99.7% of values fall within three standard deviations of the mean (illustrated using Z distribution).
True probability is 0.9973, so I think this package rounds up.
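The exact probabilities behind the rule can be checked with base R's pnorm(), which returns the area to the left of a Z score. This sketch does not use the plotting package from the slides, so it says nothing about how that package rounds.

```r
# Number of standard deviations on each side of the mean
k <- 1:3

# P(-k <= Z <= k) for the standard normal distribution
within_k <- pnorm(k) - pnorm(-k)

# Show the three probabilities to four decimal places
print(round(within_k, 4))
```

To four decimal places these are 0.6827, 0.9545, and 0.9973, which the rule rounds to 68%, 95%, and 99.7%.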
Normal Distribution - Empirical Rule
Also Referred to as the 68-95-99.7 Rule
Summarizing the Empirical Rule in Words
68% of all values are within 1 std. dev, \(\sigma\), of the pop. mean, \(\mu\)
95% of all values are within 2 std. dev, \(2\times \sigma\), of the pop. mean, \(\mu\)
99.7% of all values are within 3 std. dev, \(3\times \sigma\), of the pop. mean, \(\mu\)
How is the 68-95-99.7 Rule Useful?
- R, Excel, other software, Normal tables, apps for phone or PC, etc. can ALL be used to find probabilities from a normal distribution.
BUT
Internalizing the Empirical Rule allows you to understand the probability of seeing observed data intuitively WITHOUT using a computer or phone.
Learning these rules and how to use them allows you to immediately evaluate data to determine
- Is the observation reasonable
- Is it unlikely but not too surprising
- Is it so unlikely that it may be due to an error in data collection or
- Is it so unlikely it might cause us to reevaluate our assumptions about the population distribution.
Example: Trading on the NYSE
Historic data indicates that the first 30 minutes of New York Stock Exchange (NYSE) trading volume (millions of shares) is normally distributed with
a mean of 200 million shares, \(\mu = 200\)
a standard deviation of 26 million shares, \(\sigma = 26\).
Answering Questions using the Empirical Rule
Use the Empirical Rule to find the probability that the trading volume will be in the range of 174 to 226 million shares.
- A good approach is to convert range endpoints to Z-scores:
174 to 226 is \(\mu \pm \sigma\) (mean +/- 1 SD): \(Z = \frac{174-200}{26} = -1\) and \(Z = \frac{226-200}{26} = 1\)
Recall the Rule:
- 68% of population within \(\mu \pm \sigma\)
- 95% of population within \(\mu \pm 2\sigma\)
- 99.7% of population within \(\mu \pm 3\sigma\)
Probability that trading will be between 174 and 226 million shares is about 68%
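The Empirical Rule answer can be compared with the exact normal probability. A minimal base R sketch using the NYSE values from this example (variable names are illustrative):

```r
mu    <- 200   # mean trading volume, millions of shares
sigma <- 26    # standard deviation, millions of shares

lower <- 174
upper <- 226

# Convert the range endpoints to Z scores
z_lower <- (lower - mu) / sigma
z_upper <- (upper - mu) / sigma

# Exact probability of falling in the range: area left of upper minus area left of lower
p_range <- pnorm(upper, mean = mu, sd = sigma) - pnorm(lower, mean = mu, sd = sigma)

cat("Z scores:", z_lower, "to", z_upper, "\n")
cat("Exact probability:", round(p_range, 4), "\n")
```

The exact value is slightly above 0.68, so the rule's 68% is a rounded figure that is easy to remember.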
Lecture 7 In-class Exercises - Q2
Use the Empirical Rule to find the probability that the NYSE morning trading volume will be in the range of 148 to 200 million shares.
Convert endpoints to Z scores
Hint: Normal distribution is SYMMETRIC
Lecture 7 In-class Exercises - Q3
Use the Empirical Rule to find the probability that the NYSE morning trading volume will be in the range of 200 to 278 million shares.
Convert endpoints to Z scores
Hint: Normal distribution is SYMMETRIC
Interpreting a Z score using the Empirical Rule
If Z is between -1 and 1, observed value is VERY LIKELY.
If Z is between -1 and -2 or between 1 and 2, observed value is NOT AT ALL UNLIKELY, BUT MAY NOT BE TOO COMMON (especially as Z gets closer to 2 or -2).
If Z value is between -2 and -3 or between 2 and 3, observed value is UNLIKELY, BUT NOT TOO SURPRISING (until Z gets closer to 3 or -3).
If Z value is less than -3 or greater than 3, observed value is EXTREMELY UNLIKELY and could be due to error if \(\vert{Z}\vert\) is very large.
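The guidance above can be written as a small helper function. This is a sketch only: interpret_z is a hypothetical name, the Z values passed to it are made up, and assigning the boundary values (exactly 1, 2, or 3) to the lower band is an assumption, since the slide does not say which band they belong to.

```r
# Hypothetical helper: maps a Z score to the interpretation bands on this slide
interpret_z <- function(z) {
  a <- abs(z)  # the bands are symmetric, so only the size of Z matters
  if (a <= 1) {
    "very likely"
  } else if (a <= 2) {
    "not at all unlikely, but may not be too common"
  } else if (a <= 3) {
    "unlikely, but not too surprising"
  } else {
    "extremely unlikely"
  }
}

# Made-up Z scores, one per band
z_values <- c(0.5, -1.5, 2.4, -3.6)
results  <- sapply(z_values, interpret_z)

print(data.frame(z = z_values, interpretation = results))
```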
Lecture 7 In-class Exercises - Q4
If trading in the first half hour is at 250 million shares, how should we interpret that?
Step 1. Convert 250 to Z score
Step 2. Use the guidance on the previous slide (based on the Empirical Rule) to interpret that Z score.
Finding X (observed value) from a Percentile
Sometimes what we want to know is what value would put us in
- the top 10%
- the bottom 5%
- etc.
For example, how high would trading volume have to be to put it in the top 5%?
To answer this question we use a command similar to one we already know, vdist_normal_perc.
perc stands for percentile.
We (the user) specify the percentile and the output shows the value needed to achieve that percentile.
Finding X from a Percentile - NYSE
How high would trading have to be to put it in the top 5%?
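The arguments passed to vdist_normal_perc are not shown in this text, so the sketch below uses base R's qnorm() as a stand-in. It finds the value that cuts off the top 5% of a normal distribution with the NYSE mean and standard deviation, then checks the result with pnorm().

```r
mu    <- 200   # mean trading volume, millions of shares
sigma <- 26    # standard deviation, millions of shares

# Top 5% means 95% of the distribution lies below the cutoff
x_top5 <- qnorm(0.95, mean = mu, sd = sigma)

# Check: the area to the right of the cutoff should be 0.05
p_check <- pnorm(x_top5, mean = mu, sd = sigma, lower.tail = FALSE)

cat("Cutoff for top 5% (millions of shares):", round(x_top5, 2), "\n")
cat("Upper-tail probability at cutoff:", round(p_check, 4), "\n")
```

The cutoff is about 242.77 million shares, roughly 1.645 standard deviations above the mean.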
Finding X from a Percentile - Average Movie Gross
Recall our Annual Average Movie Gross example from Lecture 6:
The data follows an approximately normal distribution with
- Population Mean (\(\mu\)) = $17.77 million
- Population Std. Dev. (\(\sigma\)) = $2.41 million
Two Polling Questions:
How low would the annual average gross have to be in 2024 to be in the bottom 10%?
How high would the annual average gross have to be in 2024 to be in the top 2%?
Lecture 7 In-class Exercises - Q5-Q6
Use vdist_normal_perc to answer each of these questions.
Round each answer to two decimal places.
How low would the annual average gross have to be in 2024 to be in the bottom 10%?
How high would the annual average gross have to be in 2024 to be in the top 2%?
A note about vdist_normal_perc
In each of the previous questions, one logical choice is to match the command inputs to how the question is written.
You can get the same answer two different ways
Example: How high would the annual average gross have to be in 2025 to be in the top 20%?
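One reading of "two different ways" is that the top 20% can be requested either as an upper tail of 0.20 or as a lower tail of 0.80. The slide's own commands are not shown in this text, so this sketch illustrates the idea with base R's qnorm() rather than vdist_normal_perc.

```r
mu    <- 17.77  # population mean, $ millions
sigma <- 2.41   # population standard deviation, $ millions

# Way 1: match the question's wording (top 20% is an upper tail of 0.20)
x_upper <- qnorm(0.20, mean = mu, sd = sigma, lower.tail = FALSE)

# Way 2: restate it as a lower tail (80% of values fall below the cutoff)
x_lower <- qnorm(0.80, mean = mu, sd = sigma)

cat("Upper-tail approach:", round(x_upper, 2), "\n")
cat("Lower-tail approach:", round(x_lower, 2), "\n")
```

Both approaches give a cutoff of about $19.80 million because the two tail areas always sum to 1.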
Preview of Lecture 8
In Lectures 6 and 7 we have talked about the normal distribution.
If we know our data are from a normal population, then we can easily find the probability of observing a single observation
greater than or equal to a specific value
less than or equal to a specific value
within a specified range
In Lecture 8 we will talk about the probability of observing a sample mean.
How does working with a sample mean, with sample size (n) greater than 1, change our calculations?
Spoiler Alert: The adjustment to our calculations is very straightforward.
Key Points from Today
Normal Distribution is symmetric and bell-shaped
Width is determined by the population standard deviation, \(\sigma\).
Location is determined by the population mean (\(\mu\)).
Empirical (68-95-99.7) Rule
- 68% of all values are within 1 std. dev, \(\sigma\), of the pop. mean, \(\mu\)
- 95% of all values are within 2 std. dev, \(2\times \sigma\), of the pop. mean, \(\mu\)
- 99.7% of all values are within 3 std. dev, \(3\times \sigma\), of the pop. mean, \(\mu\)
Convert values of interest to Z scores and then use the rule to determine how likely a value or range of values is.
Finding a value of interest from a percent chance or percentile?
- use vdist_normal_perc and interpret
- use
To submit an Engagement Question or Comment about material from Lecture 7: Submit it by midnight today (day of lecture).