X is the number of observations in the category of interest and n is the sample size
95% Confidence Interval
correct = F is used because the sample size is large enough that a continuity correction is not needed.
Students in MAS 261 are not required to learn when to use a continuity correction.
All examples and questions will have sufficiently large sample sizes.
prop.test(584, 1025, conf.level=.95, correct = F)
1-sample proportions test without continuity correction
data: 584 out of 1025, null probability 0.5
X-squared = 19.95, df = 1, p-value = 0.000007948
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.5392410 0.5997503
sample estimates:
p
0.5697561
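The pieces of this output can also be pulled out of the stored result, which can make the bounds easier to read off. This is a minimal sketch; the object names `ev_test`, `p_hat`, and `ci` are illustrative choices, not part of the lecture code.

```r
# Store the prop.test result instead of only printing it
# (the object name ev_test is an illustrative choice)
ev_test <- prop.test(584, 1025, conf.level = 0.95, correct = FALSE)

p_hat <- unname(ev_test$estimate)  # sample proportion, X / n
ci    <- ev_test$conf.int          # lower and upper bounds of the 95% CI

p_hat
ci[1]  # lower bound
ci[2]  # upper bound
```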
💥 Lecture 14 In-class Exercises - Q2 and Q3 💥
Interpreting the prop.test output:
Question 2:
The 95% lower bound for the true proportion of US adults that are currently ‘NOT Likely’ to buy an EV is ____.
Question 3:
Fill in the blank: We are 95% confident that a majority of US adults are ____ to buy an EV at this time.
Specify likely or not likely.
Would results change if we opt for a 99% Confidence Interval?
In Lecture 14 we discussed that if we want to be MORE confident that we have captured the true value, our interval will be wider.
Increased Confidence Level translates to WIDER Confidence Interval
prop.test(584, 1025, conf.level=.95, correct = F)
1-sample proportions test without continuity correction
data: 584 out of 1025, null probability 0.5
X-squared = 19.95, df = 1, p-value = 0.000007948
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.5392410 0.5997503
sample estimates:
p
0.5697561
prop.test(584, 1025, conf.level=.99, correct = F)
1-sample proportions test without continuity correction
data: 584 out of 1025, null probability 0.5
X-squared = 19.95, df = 1, p-value = 0.000007948
alternative hypothesis: true p is not equal to 0.5
99 percent confidence interval:
0.529599 0.609016
sample estimates:
p
0.5697561
In this case, the conclusion based on the confidence interval does not change: the sample size is large, so both the 95% and 99% intervals are narrow and lie entirely above 0.5.
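The widening can be seen by computing the interval at several confidence levels. The sketch below reuses the EV poll counts; the names `conf_levels`, `lowers`, and `widths` are illustrative.

```r
# Interval width for the EV poll data at several confidence levels
conf_levels <- c(0.80, 0.90, 0.95, 0.99)

# One row per confidence level: lower and upper bound from prop.test
bounds <- t(sapply(conf_levels, function(cl) {
  prop.test(584, 1025, conf.level = cl, correct = FALSE)$conf.int
}))

lowers <- bounds[, 1]
widths <- bounds[, 2] - bounds[, 1]  # upper bound minus lower bound

data.frame(conf_level = conf_levels, lower = lowers, width = widths)
```

Each step up in confidence level gives a wider interval, and the lower bound can be compared with 0.5 at every level.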
How is this Margin of Error, E, estimated?
In this case, t is NOT used to find E, AND S is calculated differently.
The t distribution is not appropriate for categorical data, but by the Central Limit Theorem (CLT) we can use the Z distribution:
80% CI: Z = 1.282
90% CI: Z = 1.645
95% CI: Z = 1.960
99% CI: Z = 2.576
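These Z values can be reproduced in R with `qnorm`. A short sketch, with `conf_levels` and `z_values` as illustrative names:

```r
# Two-sided critical Z values: (1 - confidence) / 2 goes in each tail
conf_levels <- c(0.80, 0.90, 0.95, 0.99)
z_values <- qnorm(1 - (1 - conf_levels) / 2)

round(z_values, 3)
```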
These are categorical (binomial) data so \(S = \sqrt{\hat{P}\times(1-\hat{P})}\)
Recall that \(\hat{P} = \frac{X}{n}\)
Margin of Error for Proportion Data:
\(E = \frac{S}{\sqrt{n}}\times Z = \sqrt{\frac{\hat{P}\times(1-\hat{P})}{n}}\times Z\)
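The formula translates directly into a few lines of R. The function name `margin_of_error` is a hypothetical helper written for this sketch, not a built-in R function, and the example uses the 95% level.

```r
# Margin of error for a sample proportion, following the formula above
# (margin_of_error is a hypothetical helper, not a base R function)
margin_of_error <- function(x, n, conf_level = 0.95) {
  p_hat <- x / n                            # sample proportion
  s     <- sqrt(p_hat * (1 - p_hat))        # standard deviation for proportion data
  z     <- qnorm(1 - (1 - conf_level) / 2)  # two-sided critical Z value
  s / sqrt(n) * z
}

# EV poll data at the 95% level
e_95 <- margin_of_error(584, 1025, conf_level = 0.95)
e_95
```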
💥 Lecture 14 In-class Exercises - Q4 and Q5 💥
Question 4: What is the standard deviation for our EV poll data?
Recall that \(\hat{P} = \frac{584}{1025} \approx 0.57\)
Question 5: If we are calculating a 99% Confidence Interval for a proportion, what Z value should we use?
See previous slide or check the bottom of this t-table.
💥 Lecture 14 In-class Exercises - Q6 💥
What is the margin of error, E, for the 99% Confidence Interval for the EV poll data?
\(E = \frac{S}{\sqrt{n}}\times Z = \sqrt{\frac{\hat{P}\times(1-\hat{P})}{n}}\times Z\)
Suggested strategy:
Divide answer to question 4 by \(\sqrt{n}=\sqrt{1025}\)
Multiply this ratio by Z = 2.576
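Check work by using prop.test output (\(E \approx \frac{UB-LB}{2}\)). prop.test uses a slightly different interval formula, so expect close but not exact agreement.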
prop.test(584, 1025, conf.level=.99, correct = F)
1-sample proportions test without continuity correction
data: 584 out of 1025, null probability 0.5
X-squared = 19.95, df = 1, p-value = 0.000007948
alternative hypothesis: true p is not equal to 0.5
99 percent confidence interval:
0.529599 0.609016
sample estimates:
p
0.5697561
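The check can be scripted as below. One caveat: `prop.test` builds its interval by inverting the score test (the Wilson score interval) rather than computing \(\hat{P} \pm E\), so its half-width is close to, but not identical to, the formula value. Variable names here are illustrative.

```r
# Compare the formula-based margin of error with the prop.test half-width
x <- 584
n <- 1025
p_hat <- x / n

z_99      <- qnorm(0.995)                         # about 2.576
e_formula <- sqrt(p_hat * (1 - p_hat) / n) * z_99 # formula from the slides

ci_99    <- prop.test(x, n, conf.level = 0.99, correct = FALSE)$conf.int
e_output <- (ci_99[2] - ci_99[1]) / 2             # (UB - LB) / 2

c(formula = e_formula, prop_test = e_output)
```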
Introduction to Hypotheses
So far this course has dealt with describing data and estimating values.
Often we want to go beyond description and estimation
Based on what we see in the data, we want to formulate and test hypotheses.
Hypotheses can have different formats based on type of data and questions being asked.
In the next lecture we’ll talk about the formal language of hypothesis testing.
Today we’ll discuss some concepts that show we have already been informally testing hypotheses.
When looking at data, it is natural to develop hypotheses based on what we notice.
Testing hypotheses is a formal way of examining the data, graphically and numerically.
When we test our hypotheses, we are asking “Do these data support the ideas I have developed about this population?”
One more Look at the EV Poll 95% CI
Hypothesis: US adults are evenly split (50-50) on whether or not to buy an EV.
Our data disprove this hypothesis IF the 95% confidence interval EXCLUDES 0.5.
prop.test(584, 1025, conf.level=.95, correct = F)
1-sample proportions test without continuity correction
data: 584 out of 1025, null probability 0.5
X-squared = 19.95, df = 1, p-value = 0.000007948
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.5392410 0.5997503
sample estimates:
p
0.5697561
Both endpoints of our 95% interval (and of our 99% interval) are ABOVE 0.5, which disproves this hypothesis.
This conclusion matches the other parts of the output.
The P-value is the probability of seeing sample data at least as extreme as these if the specified hypothesis is true.
This hypothesis (true p = 0.5) is the Null Hypothesis, and it is the default for proportion tests.
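The link between the confidence interval and the rest of the output can be checked by hand. In the sketch below, the Z statistic is computed under the hypothesis p = 0.5; without the continuity correction, its square equals the X-squared value that `prop.test` reports. Object names are illustrative.

```r
ev_test <- prop.test(584, 1025, p = 0.5, conf.level = 0.95, correct = FALSE)

# Does the 95% CI exclude the hypothesized value 0.5?
ci <- ev_test$conf.int
excludes_half <- ci[1] > 0.5 || ci[2] < 0.5

# Z statistic under the hypothesis p = 0.5
p_hat <- 584 / 1025
z <- (p_hat - 0.5) / sqrt(0.5 * 0.5 / 1025)
p_value <- 2 * pnorm(-abs(z))  # two-sided p-value

c(z_squared = z^2, x_squared = unname(ev_test$statistic))
c(by_hand = p_value, prop_test = ev_test$p.value)
excludes_half
```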
A Previous Confidence Interval Example
Recall, the global mean number of subscribers is 21.89 million for the top 1000 YouTubers.
Is this random sample of 60 US YouTubers typical of the global top 1000?
Null Hypothesis: These 60 randomly sampled US YouTubers are not different from the larger global population of top YouTubers.
If data disprove this hypothesis:
95% confidence interval will NOT contain 21.89
t.test(yt60$subscribers_mil, mu=21.89)
One Sample t-test
data: yt60$subscribers_mil
t = -0.55738, df = 59, p-value = 0.5794
alternative hypothesis: true mean is not equal to 21.89
95 percent confidence interval:
17.65192 24.28141
sample estimates:
mean of x
20.96667
💥 Lecture 14 In-class Exercises - Q7 💥
Null Hypothesis: These 60 randomly sampled US YouTubers are not different from the larger global population of top YouTubers which has a population mean of 21.89 million.
Does the t.test confidence interval output disprove this hypothesis?
t.test(yt60$subscribers_mil, mu=21.89)
One Sample t-test
data: yt60$subscribers_mil
t = -0.55738, df = 59, p-value = 0.5794
alternative hypothesis: true mean is not equal to 21.89
95 percent confidence interval:
17.65192 24.28141
sample estimates:
mean of x
20.96667
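The `yt60` data are not reproduced here, but the interval can be rebuilt from the numbers printed in the output, which shows how the t statistic, the standard error, and the confidence interval fit together. This is a sketch using only the printed values, so small rounding differences from the output are expected; the variable names are illustrative.

```r
# Values copied from the t.test output above
x_bar    <- 20.96667  # sample mean
t_stat   <- -0.55738  # t statistic
deg_free <- 59        # degrees of freedom (n - 1)
mu_0     <- 21.89     # hypothesized (global) mean

# Back out the standard error from t = (x_bar - mu_0) / SE
se <- (x_bar - mu_0) / t_stat

# For a mean, the 95% margin of error uses the t distribution
e  <- qt(0.975, deg_free) * se
ci <- c(lower = x_bar - e, upper = x_bar + e)
ci

# TRUE if the hypothesized mean falls inside the interval
contains_mu <- unname(ci["lower"] <= mu_0 && mu_0 <= ci["upper"])
contains_mu
```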
Key Points from Today
Categorical Data can be simplified to two categories for analysis purposes
Two category data - Estimate a proportion and a confidence interval.
\(\hat{P}=\frac{X}{n}\) where X = number of observations in category of interest.
The prop.test command is one option for estimating a confidence interval
Sample standard deviation for proportion data: \(S = \sqrt{\hat{P}\times(1-\hat{P})}\)
Margin of Error: \(E = \frac{S}{\sqrt{n}}\times Z = \sqrt{\frac{\hat{P}\times(1-\hat{P})}{n}}\times Z\)
We will cover hypothesis tests more formally in coming lectures
Today we tested hypotheses and drew conclusions based on estimated confidence intervals.
To submit an Engagement Question or Comment about material from Lecture 15: Submit by midnight today (day of lecture). Click on Link next to the ❓ under Lecture 15