MAS 261 - Lecture 7

Empirical Rule / Finding X from a Probability

Penelope Pooler Eisenbies

2024-09-14

Housekeeping

  • Today’s plan 📋

    • Review Question about Normal Probability

    • A few minutes for R Questions 🪄

    • Review of the Normal Distribution

    • Empirical Rule

      • Interpreting data values intuitively
    • Finding an observed value, X, from a probability (percent chance)

  • Questions about HW 3

  • In-class Exercises

R and RStudio

  • In this course we will use R and RStudio to understand statistical concepts.

  • You will access R and RStudio through Posit Cloud.

  • I will post R/RStudio files on Posit Cloud that you can access through the provided links.

  • I will also provide demo videos that show how to access files and complete exercises.

  • NOTE: The free Posit Cloud account is limited to 25 hours per month.

    • I will demo how to download completed work so that you can use this allotment efficiently.

    • For those who want to go further with R/RStudio:

      • After Test 1, I will provide videos on how to download the software (R/RStudio/Quarto) and lecture files to your computer.

💥Lecture 7 In-class Exercises - Q1 💥

Session ID: MAS261f24


The mean number of customers at a local cafe on a Monday morning is 39, with a standard deviation of 3.


What is the percent chance that they will have 45 or more customers next Monday morning?


Use the vdist_normal_prob command to help you answer this question.

Review of Histograms of Different Distributions

Histograms are an effective tool for examining the distribution of the data.

LEFT SKEWED

Tail pulled out to LEFT

Low outliers

e.g. Human Lifespan

NORMAL/SYMMETRIC

Data appear in a symmetric bell-shaped curve

No graphic evidence of outliers

e.g. Test scores

RIGHT SKEWED

Tail pulled out to RIGHT

High outliers

e.g. Movie Gross values

Hypothetical Histogram

  • Most of the data fall in the middle intervals.
  • The distribution is symmetric and bell-shaped, with no outliers.

Histogram overlaid with Density Curve

  • Recall that the proportions of data in all of the intervals sum to 1
  • The area under the curve ALSO sums to 1

Normal Density Curve

  • We “smooth out” the histogram into a curve.
  • The area under the curve equals 1.
  • We use this distribution to find the probability (percent chance) that a data value falls in a given range, such as at or above a certain value.
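As a quick numerical check that the area under a normal density curve equals 1, here is a minimal base R sketch. It uses only the built-in `dnorm()` and `integrate()` functions rather than the course's vdist commands, and the mean of 200 and SD of 26 in the second check are just illustrative values.

```r
# Area under the standard normal curve (mean 0, SD 1) over the whole real line
area_z <- integrate(dnorm, lower = -Inf, upper = Inf)$value

# Area under a normal curve with mean 200 and SD 26; integrating over
# mean +/- 10 SD captures essentially all of the area
mu <- 200
sigma <- 26
area_x <- integrate(dnorm, lower = mu - 10 * sigma, upper = mu + 10 * sigma,
                    mean = mu, sd = sigma)$value

round(c(area_z, area_x), 6)
```

Both areas come out as 1 (up to numerical integration error), whatever the mean and SD.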

Normal Distribution

In Lecture 6, we talked about the normal distribution:

  • It is symmetric and bell-shaped.

  • Its location is determined by the population mean, \(\mu\)

  • Its width is determined by the population standard deviation, \(\sigma\)

  • Regardless of the values of \(\mu\) and \(\sigma\), the normal distribution has a consistent shape

  • That shape is well known and provides information about all normally distributed populations.

  • Also recall that all normally distributed populations can be converted to the standard normal distribution \(Z\)

    • Z is normally distributed with mean \(\mu = 0\) and SD \(\sigma = 1\).

    • If X is from a normal population, \(Z = \frac{X-\mu}{\sigma}\)
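The standardization formula translates directly into R. The sketch below uses a hypothetical helper, `z_score()`, which is not part of the course files, and applies it to the NYSE range endpoints (174 and 226, with mean 200 and SD 26) used in this lecture. It also shows that base R's `pnorm()` gives the same lower-tail probability whether we work with X on its original scale or with Z on the standard normal scale.

```r
# Hypothetical helper (not part of the course files): standardize x
z_score <- function(x, mu, sigma) {
  (x - mu) / sigma
}

z_low  <- z_score(174, mu = 200, sigma = 26)  # -1
z_high <- z_score(226, mu = 200, sigma = 26)  #  1

# P(X <= 226) computed on the original scale and on the Z scale agree
p_x <- pnorm(226, mean = 200, sd = 26)
p_z <- pnorm(z_high)
all.equal(p_x, p_z)  # TRUE
```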

Normal Distribution - Empirical Rule

Part 1: 68% of the values in any normal population fall within one standard deviation of the mean (illustrated using the Z distribution).

Normal Distribution - Empirical Rule

Part 2: 95% of the values in any normal population fall within two standard deviations of the mean (illustrated using the Z distribution).

Normal Distribution - Empirical Rule

Part 3: 99.7% of the values in any normal population fall within three standard deviations of the mean (illustrated using the Z distribution).

The exact probability is 0.9973, so I think this package rounds up.

Normal Distribution - Empirical Rule

Also Referred to as the 68-95-99.7 Rule

Summarizing the Empirical Rule in Words

68% of all values are within 1 std. dev, \(\sigma\), of the pop. mean, \(\mu\)

95% of all values are within 2 std. dev, \(2\times \sigma\), of the pop. mean, \(\mu\)

99.7% of all values are within 3 std. dev, \(3\times \sigma\), of the pop. mean, \(\mu\)
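The three percentages can be checked with base R's `pnorm()`, which returns the area to the left of a Z value. The area within k standard deviations of the mean is `pnorm(k) - pnorm(-k)`; a minimal sketch:

```r
k <- 1:3
# Area between -k and k under the standard normal curve
within_k_sd <- pnorm(k) - pnorm(-k)
round(within_k_sd, 4)
# [1] 0.6827 0.9545 0.9973
```

The exact areas round to 68%, 95%, and 99.7%, which is where the rule's name comes from.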

How is the 68-95-99.7 Rule Useful?

  • R, Excel, other software, normal tables, apps for phone or PC, etc. can ALL be used to find probabilities from a normal distribution.

BUT

  • Internalizing the Empirical Rule allows you to understand the probability of seeing observed data intuitively WITHOUT using a computer or phone.

  • Learning these rules and how to use them allows you to immediately evaluate data to determine:

    • Is the observation reasonable?
    • Is it unlikely but not too surprising?
    • Is it so unlikely that it may be due to an error in data collection?
    • Is it so unlikely that it might cause us to reevaluate our assumptions about the population distribution?

Example: Trading on the NYSE

  • Historical data indicate that New York Stock Exchange (NYSE) trading volume in the first 30 minutes (millions of shares) is normally distributed with

    • a mean of 200 million shares, \(\mu = 200\)

    • a standard deviation of 26 million shares, \(\sigma = 26\).

Answering Questions using the Empirical Rule

Use the Empirical Rule to find the probability that the trading volume will be in the range of 174 to 226 million shares.

  • A good approach is to convert range endpoints to Z-scores:
(174 - 200)/26  # -1 means 1 sd below the mean
[1] -1
(226 - 200)/26  #  1 means 1 sd above the mean
[1] 1
  • The range 174 to 226 is \(\mu \pm \sigma\), i.e., the mean +/- 1 SD

  • Recall the Rule:

    • 68% of population within \(\mu \pm \sigma\)
    • 95% of population within \(\mu \pm 2\sigma\)
    • 99.7% of population within \(\mu \pm 3\sigma\)
  • The probability that trading volume will be between 174 and 226 million shares is about 68%
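The Empirical Rule gives a quick approximation. As a cross-check, base R's `pnorm()` gives the exact normal probability for the same range, working directly in millions of shares:

```r
mu <- 200     # mean trading volume (millions of shares)
sigma <- 26   # standard deviation (millions of shares)

# P(174 <= X <= 226) = P(X <= 226) - P(X <= 174)
p_range <- pnorm(226, mean = mu, sd = sigma) - pnorm(174, mean = mu, sd = sigma)
round(p_range, 4)
# [1] 0.6827
```

The exact answer, about 68.27%, agrees with the rule's 68%.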

💥Lecture 7 In-class Exercises - Q2 💥

Session ID: MAS261f24

Use the Empirical Rule to find the probability that the NYSE morning trading volume will be in the range of 148 to 200 million shares.

  • Convert endpoints to Z scores

  • Hint: Normal distribution is SYMMETRIC

💥Lecture 7 In-class Exercises - Q3 💥

Session ID: MAS261f24

Use the Empirical Rule to find the probability that the NYSE morning trading volume will be in the range of 200 to 278 million shares.

  • Convert endpoints to Z scores

  • Hint: Normal distribution is SYMMETRIC

Interpreting a Z score using the Empirical Rule

  • If Z is between -1 and 1, the observed value is VERY LIKELY.

  • If Z is between -2 and -1 or between 1 and 2, the observed value is NOT AT ALL UNLIKELY, BUT MAY NOT BE TOO COMMON (especially as Z gets closer to 2 or -2).

  • If Z is between -3 and -2 or between 2 and 3, the observed value is UNLIKELY, BUT NOT TOO SURPRISING (until Z gets closer to 3 or -3).

  • If Z is less than -3 or greater than 3, the observed value is EXTREMELY UNLIKELY and could be due to an error if \(\vert{Z}\vert\) is very large.
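The guidance above can be written as a small lookup on \(\vert{Z}\vert\). The sketch below is a hypothetical helper, `interpret_z()`, not a course function. It assumes that a Z exactly on a boundary (e.g. 1 or -2) belongs to the inner, more likely band, a choice the guidance does not specify.

```r
# Hypothetical helper: label Z scores using the Empirical Rule bands
interpret_z <- function(z) {
  labels <- c("VERY LIKELY",
              "NOT AT ALL UNLIKELY, BUT MAY NOT BE TOO COMMON",
              "UNLIKELY, BUT NOT TOO SURPRISING",
              "EXTREMELY UNLIKELY")
  # Bands on |Z|: [0, 1], (1, 2], (2, 3], (3, Inf]
  band <- cut(abs(z), breaks = c(0, 1, 2, 3, Inf),
              labels = labels, right = TRUE, include.lowest = TRUE)
  as.character(band)
}

interpret_z(c(0.4, -1.5, 2.2, -3.6))
```

Using `cut()` keeps the helper vectorized, so a whole column of Z scores can be labeled in one call.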

💥Lecture 7 In-class Exercises - Q4 💥

Session ID: MAS261f24


If trading in the first half hour is at 250 million shares, how should we interpret that?


Step 1. Convert 250 to Z score

Step 2. Use the guidance on the previous slide (based on the Empirical Rule) to interpret that Z score.

Finding X (observed value) from a Percentile

Sometimes we want to know what value would put an observation in

  • the top 10%
  • the bottom 5%
  • etc.


  • For example, how high would NYSE morning trading volume have to be to put it in the top 5%?

  • To answer this question, we use a command similar to one we already know:

    • vdist_normal_perc

    • perc stands for percentile.

  • We (the user) specify the percentile and the output shows the value needed to achieve that percentile.

Finding X from a Percentile - NYSE

How high would trading have to be to put it in the top 5%?

vdist_normal_perc(.05, mean=200, sd=26, type="upper")
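Base R's `qnorm()` is the inverse of `pnorm()`: given a tail probability, it returns the cutoff value. Assuming `vdist_normal_perc(.05, ..., type="upper")` reports the value with 5% of the distribution above it, this sketch should give the same cutoff:

```r
# Trading volume (millions of shares) with 5% of mornings above it (upper tail)
cutoff_top5 <- qnorm(0.05, mean = 200, sd = 26, lower.tail = FALSE)
round(cutoff_top5, 2)
# [1] 242.77
```

Equivalently, the cutoff is \(\mu + z\sigma\), where \(z\) = `qnorm(0.95)`, about 1.645.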

Finding X from a Percentile - Average Movie Gross

Recall our Annual Average Movie Gross example from Lecture 6:

  • The data follow an approximately normal distribution with

    • Population Mean (\(\mu\)) = $17.77 million
    • Population Std. Dev. (\(\sigma\)) = $2.41 million


  • Two Polling Questions:

    • How low would the annual average gross have to be in 2024 to be in the bottom 10%?

    • How high would the annual average gross have to be in 2024 to be in the top 2%?

💥Lecture 7 In-class Exercises - Q5-Q6 💥

Session ID: MAS261f24


Use vdist_normal_perc to answer each of these questions.

Round each answer to two decimal places.


How low would the annual average gross have to be in 2024 to be in the bottom 10%?


How high would the annual average gross have to be in 2024 to be in the top 2%?

A note about vdist_normal_perc

In each of the previous questions, one logical choice is to match the command inputs to how the question is written.

You can get the same answer two different ways, because the cutoff for the top 20% is also the cutoff for the bottom 80%:

Example: How high would the annual average gross have to be in 2025 to be in the top 20%?

vdist_normal_perc(.2, mean=17.77, sd=2.41, type="upper")

vdist_normal_perc(.8, mean=17.77, sd=2.41, type="lower")
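The same equivalence holds for base R's `qnorm()`, shown here as a cross-check rather than a replacement for the course command. The value with 20% of the distribution above it is the value with 80% below it:

```r
mu <- 17.77    # population mean annual average gross ($ millions)
sigma <- 2.41  # population SD ($ millions)

upper_way <- qnorm(0.20, mean = mu, sd = sigma, lower.tail = FALSE)  # top 20%
lower_way <- qnorm(0.80, mean = mu, sd = sigma)                      # bottom 80%

all.equal(upper_way, lower_way)  # TRUE
round(upper_way, 2)              # 19.8
```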

Preview of Lecture 8

  • In Lectures 6 and 7 we have talked about the normal distribution.

  • If we know our data are from a normal population, then we can easily find the probability of observing a single observation

    • greater than or equal to a specific value

    • less than or equal to a specific value

    • within a specified range

  • In Lecture 8 we will talk about the probability of observing a sample mean.

    • How does working with a sample mean with sample size (n) greater than 1 change our calculations?

    • Spoiler Alert: The adjustment to our calculations is very straightforward.

Key Points from Today

  • Normal Distribution is symmetric and bell-shaped

    • Width is determined by the population standard deviation, \(\sigma\).

    • Location is determined by the population mean (\(\mu\)).

  • Empirical (68-95-99.7) Rule

    • 68% of all values are within 1 std. dev, \(\sigma\), of the pop. mean, \(\mu\)
    • 95% of all values are within 2 std. dev, \(2\times \sigma\), of the pop. mean, \(\mu\)
    • 99.7% of all values are within 3 std. dev, \(3\times \sigma\), of the pop. mean, \(\mu\)
  • Convert values of interest to Z-scores, and then use the rule to determine how likely a value or range of values is.

  • Finding a value of interest from a percent chance or percentile?

    • use vdist_normal_perc and interpret

To submit an Engagement Question or Comment about material from Lecture 7: Submit it by midnight today (day of lecture).