2023-09-19
Today’s plan 📋
Review Question about Normal Probability
A few minutes for R Questions 🪄
Review of the Normal Distribution
Empirical Rule
Finding an observed value, X, from a probability (percent chance)
Questions about HW 3
In-class Exercises
Review: You have two options to facilitate your introduction to R and RStudio:
If you are comfortable with coding: Start with Option 1, but still sign up for a Posit Cloud account.
If you are nervous about coding: Choose Option 2.
For both options: I can help with download/install issues during office hours.
What I do: I maintain a Posit Cloud account for helping students but I do most of my work on my laptop.
NOTE: We will use R and RStudio in class during MOST lectures
The mean number of customers at a local cafe on a Monday morning is 39 with a standard deviation of 3
What is the percent chance that they will have 45 or more customers next Monday morning?
Use the vdist_normal_prob command to help you answer this question.
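As a cross-check on the cafe question, here is a minimal sketch that uses base R's pnorm rather than the lecture's plotting command (that swap is an assumption made for illustration; the numbers 39, 3, and 45 come from the question above).

```r
# Parameters from the cafe question
mu <- 39      # mean number of Monday-morning customers
sigma <- 3    # standard deviation

# P(X >= 45): upper-tail probability of a Normal(39, 3) population
p_upper <- pnorm(45, mean = mu, sd = sigma, lower.tail = FALSE)

# Express as a percent chance, rounded to two decimals
round(100 * p_upper, 2)   # about 2.28
```

Since 45 is two standard deviations above the mean, the result should be close to the (100% - 95%) / 2 = 2.5% that the Empirical Rule gives.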
Histograms are an effective tool for examining the distribution of the data.
LEFT SKEWED
Tail pulled out to LEFT
Low outliers
e.g. Human Lifespan
NORMAL/SYMMETRIC
Data appear in a symmetric bell-shaped curve
No graphic evidence of outliers
e.g. Test scores
RIGHT SKEWED
Tail pulled out to RIGHT
High outliers
e.g. Movie Gross values
In lecture 6 we talked about the normal distribution
It is symmetric and bell-shaped.
Its location is determined by the population mean, \(\mu\)
Its width is determined by the population standard deviation, \(\sigma\)
Regardless of the values of \(\mu\) and \(\sigma\), the normal distribution has a consistent shape
That shape is well known and provides information about all normally distributed populations.
Also recall that all normally distributed populations can be converted to the standard normal distribution \(Z\)
Z is normally distributed with mean, \(\mu\), of 0 and SD, \(\sigma\), of 1.
If X is from a normal population, \(Z = \frac{X-\mu}{\sigma}\)
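The conversion to Z can be sketched in a few lines of R. The helper name z_score is hypothetical (not from the lecture), and the inputs reuse the cafe example (mean 39, SD 3).

```r
# Hypothetical helper: convert an observed value X to a Z score
z_score <- function(x, mu, sigma) {
  (x - mu) / sigma
}

z_45 <- z_score(45, mu = 39, sigma = 3)   # 2: two SDs above the mean
z_36 <- z_score(36, mu = 39, sigma = 3)   # -1: one SD below the mean

# A probability computed on the X scale matches the one computed on the Z scale
same_prob <- isTRUE(all.equal(pnorm(45, mean = 39, sd = 3), pnorm(z_45)))
same_prob   # TRUE
```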
Part 1: In any Normal population, 68% of values fall within one standard deviation of the mean (illustrated using the Z distribution).
Part 2: In any Normal population, 95% of values fall within two standard deviations of the mean (illustrated using the Z distribution).
Part 3: In any Normal population, 99.7% of values fall within three standard deviations of the mean (illustrated using the Z distribution).
True probability is 0.9973, so I think this package rounds up.
Also Referred to as the 68-95-99.7 Rule
68% of all values are within 1 std. dev, \(\sigma\), of the pop. mean, \(\mu\)
95% of all values are within 2 std. dev, \(2\times \sigma\), of the pop. mean, \(\mu\)
99.7% of all values are within 3 std. dev, \(3\times \sigma\), of the pop. mean, \(\mu\)
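The three percentages in the rule can be checked against the standard normal distribution. This is a small sketch using base R's pnorm (the variable names are illustrative only).

```r
# Number of standard deviations on either side of the mean
k <- 1:3

# P(-k <= Z <= k) for the standard normal distribution
within_k <- pnorm(k) - pnorm(-k)

round(within_k, 4)   # 0.6827 0.9545 0.9973
```

The rule's 68, 95, and 99.7 are rounded versions of these values.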
BUT
Internalizing the Empirical Rule allows you to understand the probability of seeing observed data intuitively WITHOUT using a computer or phone.
Learning these rules and how to use them allows you to immediately evaluate data to determine
Historic data indicates that the first 30 minutes of New York Stock Exchange (NYSE) trading volume (millions of shares) is normally distributed with
a mean of 200 million shares, \(\mu = 200\)
a standard deviation of 26 million shares, \(\sigma = 26\).
Use the Empirical Rule to find the probability that the trading volume will be in the range of 174 to 226 million shares.
174 to 226 is the mean +/- 1 SD (\(\mu \pm \sigma\)): 200 - 26 = 174 and 200 + 26 = 226
Recall the Rule:
Probability that trading will be between 174 and 226 is 68%
Use the Empirical Rule to find the probability that the NYSE morning trading volume will be in the range of 148 to 200 million shares.
Convert endpoints to Z scores
Hint: Normal distribution is SYMMETRIC
Use the Empirical Rule to find the probability that the NYSE morning trading volume will be in the range of 200 to 278 million shares.
Convert endpoints to Z scores
Hint: Normal distribution is SYMMETRIC
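The two half-range questions (148 to 200 and 200 to 278) can be worked as follows. This is a sketch, not the slide's own solution: it converts the endpoints to Z scores, applies the rule with symmetry, and compares against exact values from base R's pnorm.

```r
mu <- 200     # mean trading volume (millions of shares)
sigma <- 26   # standard deviation

# Convert the endpoints to Z scores
z <- (c(148, 200, 278) - mu) / sigma   # -2  0  3

# Empirical Rule plus symmetry: each range is half of a full +/- k SD band
rule_148_200 <- 0.95 / 2    # half of the 2 SD band: 0.475
rule_200_278 <- 0.997 / 2   # half of the 3 SD band: 0.4985

# Exact probabilities for comparison
exact_148_200 <- pnorm(200, mu, sigma) - pnorm(148, mu, sigma)
exact_200_278 <- pnorm(278, mu, sigma) - pnorm(200, mu, sigma)
round(c(exact_148_200, exact_200_278), 4)   # 0.4772 0.4987
```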
If Z is between -1 and 1, observed value is VERY LIKELY.
If Z is between -1 and -2 or between 1 and 2, observed value is NOT AT ALL UNLIKELY, BUT MAY NOT BE TOO COMMON (especially as Z gets closer to 2 or -2).
If Z value is between -2 and -3 or between 2 and 3, observed value is UNLIKELY, BUT NOT TOO SURPRISING (until Z gets closer to 3 or -3).
If Z value is less than -3 or greater than 3, observed value is EXTREMELY UNLIKELY and could be due to error if \(\vert{Z}\vert\) is very large.
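The guidance above can be written as a small lookup. The function name interpret_z is hypothetical, and placing the exact boundary values (Z equal to 1, 2, or 3) in the lower band is an assumption, since the slide does not say which band they belong to.

```r
# Hypothetical helper: map a Z score to the interpretation bands above
interpret_z <- function(z) {
  a <- abs(z)   # the bands are symmetric, so only the size of Z matters
  if (a <= 1) {
    "very likely"
  } else if (a <= 2) {
    "not at all unlikely, but may not be too common"
  } else if (a <= 3) {
    "unlikely, but not too surprising"
  } else {
    "extremely unlikely"
  }
}

interpret_z(0.5)    # "very likely"
interpret_z(-2.5)   # "unlikely, but not too surprising"
interpret_z(3.4)    # "extremely unlikely"
```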
If trading in the first half hour is at 250 million shares, how should we interpret that?
Step 1. Convert 250 to Z score
Step 2. Use the guidance on the previous slide (based on the Empirical Rule) to interpret that Z score.
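One way to carry out the two steps in R, as a sketch. The upper-tail probability at the end is an extra check from base R's pnorm, not part of the slide's steps.

```r
mu <- 200     # mean trading volume (millions of shares)
sigma <- 26   # standard deviation

# Step 1: convert 250 to a Z score
z <- (250 - mu) / sigma
round(z, 2)   # 1.92

# Step 2: Z is between 1 and 2 and close to 2, so by the guidance the value
# is not at all unlikely, but not too common either
between_1_and_2 <- z > 1 && z < 2

# Extra check: chance of a volume of 250 or more (roughly 0.027)
p_upper <- pnorm(z, lower.tail = FALSE)
```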
Sometimes what we want to know is what value would put us in
For example, how high would trading volume have to be to put it in the top 5%?
To answer this question we use a command similar to one we already know: vdist_normal_perc
perc stands for percentile.
We (the user) specify the percentile and the output shows the value needed to achieve that percentile.
How high would trading have to be to put it in the top 5%?
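The same cutoff can be found without a plot using base R's qnorm. Using qnorm instead of the lecture's command is an assumption made for illustration; the mean of 200 and SD of 26 come from the NYSE example.

```r
mu <- 200     # mean trading volume (millions of shares)
sigma <- 26   # standard deviation

# Top 5% means 95% of values fall below the cutoff (the 95th percentile)
cutoff <- qnorm(0.95, mean = mu, sd = sigma)

round(cutoff, 2)   # 242.77 million shares
```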
Recall our Annual Average Movie Gross example from Lecture 6:
The data follows an approximately normal distribution with
Two Polling Questions:
How low would the annual average gross have to be in 2024 to be in the bottom 10%?
How high would the annual average gross have to be in 2024 to be in the top 2%?
Use vdist_normal_perc to answer each of these questions.
Round each answer to two decimal places.
In each of the previous questions, one logical choice is to match the command inputs to how the question is written.
You can get the same answer two different ways
Example: How high would the annual average gross have to be in 2025 to be in the top 20%?
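A sketch of the "two different ways" idea using base R's qnorm. The movie gross mean and SD are not restated here, so this illustration borrows the NYSE parameters (mean 200, SD 26) instead; only the pattern carries over to the movie example.

```r
mu <- 200     # NYSE example mean, used here only for illustration
sigma <- 26   # NYSE example standard deviation

# Way 1: "top 20%" read as the value with 80% of the population below it
way_lower <- qnorm(0.80, mean = mu, sd = sigma)

# Way 2: "top 20%" read directly as the value with 20% of the population above it
way_upper <- qnorm(0.20, mean = mu, sd = sigma, lower.tail = FALSE)

same_answer <- isTRUE(all.equal(way_lower, way_upper))
same_answer           # TRUE
round(way_lower, 2)   # 221.88
```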
In Lectures 6 and 7 we have talked about the normal distribution.
If we know our data are from a normal population, then we can easily find the probability of observing a single observation
greater than or equal to a specific value
less than or equal to a specific value
within a specified range
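Each of the three cases corresponds to one pnorm pattern in base R. The sketch below reuses the cafe parameters (mean 39, SD 3); the cutoffs 36 and 42 are illustrative values chosen here, not from the lecture.

```r
mu <- 39      # cafe example mean
sigma <- 3    # cafe example standard deviation

# Greater than or equal to a specific value: P(X >= 45)
p_ge <- pnorm(45, mean = mu, sd = sigma, lower.tail = FALSE)

# Less than or equal to a specific value: P(X <= 36), illustrative cutoff
p_le <- pnorm(36, mean = mu, sd = sigma)

# Within a specified range: P(36 <= X <= 42), illustrative range of mean +/- 1 SD
p_in <- pnorm(42, mean = mu, sd = sigma) - pnorm(36, mean = mu, sd = sigma)

round(c(p_ge, p_le, p_in), 4)   # 0.0228 0.1587 0.6827
```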
In Lecture 8 we will talk about the probability of observing a sample mean.
How does working with a sample mean, with sample size (n) greater than 1, change our calculations?
Spoiler Alert: The adjustment to our calculations is very straightforward.
Normal Distribution is symmetric and bell-shaped
Width is determined by the population standard deviation, \(\sigma\).
Location is determined by the population mean (\(\mu\)).
Empirical (68-95-99.7) Rule
Convert values of interest to Z scores and then use the rule to determine how likely a value or range of values is.
Finding a value of interest from a percent chance or percentile?
Use vdist_normal_perc and interpret
To submit an Engagement Question or Comment about material from Lecture 7: Submit by midnight today (day of lecture). Click on Link next to the ❓ under Lecture 7