Proportions, Percentages and Confidence Intervals
2025-10-20
Comments and Questions about HW 5
A few minutes for R Questions 🪄
Review of Confidence Interval Concepts and Definitions
Terminology for Proportion Estimates
Point Estimates and Confidence Interval for Proportions
Introduction to the concept of Hypotheses
Examining Hypotheses using Confidence Intervals
In this course we will use R and RStudio to understand statistical concepts.
You will access R and RStudio through Posit Cloud.
I will post R/RStudio files on Posit Cloud that you can access in provided links.
I will also provide demo videos that show how to access files and complete exercises.
NOTE: The free Posit Cloud account is limited to 25 hours per month.
For those who want to go further with R/RStudio:
If you are interested in downloading R and RStudio to your own computer, I can guide you through the process.
The software is completely free but it does have to be updated a couple times each year.
Poll Everywhere - My User Name: penelopepoolereisenbies685
In Lecture 14 and HW 5, we covered the three components of the margin of error, E, which is the half-width of the confidence interval.
Recall
CI Lower Bound = \(\overline{X}-E\)
CI Upper Bound = \(\overline{X}+E\)
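\(E = \frac{S}{\sqrt{n}}\times t\), where S is the sample standard deviation, n is the sample size, and t is the critical value from the t distribution for the chosen confidence level.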
Which component of E do we have no ability to control?
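As a reminder of how the three components combine, here is a minimal sketch in R. The values of S, n, and the confidence level are hypothetical, chosen only for illustration; they do not come from HW 5 or any course dataset.

```r
# Hypothetical summary values (assumptions for illustration only)
s <- 10             # sample standard deviation, S
n <- 25             # sample size
conf_level <- 0.95  # confidence level

# Critical t value with n - 1 degrees of freedom
t_crit <- qt(1 - (1 - conf_level) / 2, df = n - 1)

# Margin of error: E = S / sqrt(n) * t
E <- s / sqrt(n) * t_crit
E
```

Changing one input at a time (s, n, or conf_level) and rerunning shows how each component moves E.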
HW 5 is due Friday, 10/17 and the grace period is extended until 10/20 at midnight.
If there are questions from HW 5 or Quiz 1 that are general and would benefit everyone, please let me know.
Proportion: a part of a whole population
Expressed as a value between 0 and 1
Notation:
Population proportion: P (Often unknown)
Sample proportion: \(\hat{P}\) (from sample data)
Confidence Interval for a Proportion
Lower Bound: \(\hat{P} - E\)
Upper Bound: \(\hat{P} + E\)
E is the margin of error (calculated a little differently than for quantitative data).
These data were collected with a poll that had SIX response choices but we’ll use a common analytical technique to simplify the analyses.
We group the data into
NO’s (Responded NOT Likely)
Not NO’s (Did not respond NOT likely)
The pollsters interviewed 1025 Adults in the US (n = 1025)
584 of those interviewed responded ‘NOT Likely’
441 did not respond ‘NOT Likely’
Are we 95% confident that the majority of US adults are not ready for EVs?
Are we 99% confident that the majority of US adults are not ready for EVs?
Estimated proportion, \(\hat{P} = \frac{X}{n}=\frac{584}{1025} \approx 0.57\)
95% Confidence Interval
correct = F is used because the sample size is large enough that a continuity correction is not needed.
Students in MAS 261 are not required to learn when to use a continuity correction.
All examples and questions will have sufficiently large sample sizes.
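The output that follows can be reproduced with a call along these lines. This is a sketch: the object names x, n, p_hat, and ev_95 are illustrative labels, not names taken from the course files.

```r
# Counts from the EV poll
x <- 584    # respondents who answered 'NOT Likely'
n <- 1025   # total adults interviewed

# Point estimate of the proportion
p_hat <- x / n

# 95% confidence interval without a continuity correction
# (conf.level defaults to 0.95; the default null probability is 0.5)
ev_95 <- prop.test(x, n, correct = FALSE)
ev_95$conf.int
```

Printing ev_95 on its own shows the full test summary.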
1-sample proportions test without continuity correction
data: 584 out of 1025, null probability 0.5
X-squared = 19.95, df = 1, p-value = 0.000007948
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.5392410 0.5997503
sample estimates:
p
0.5697561
Interpreting the prop.test output:
Fill in the blank: We are 95% confident that a majority of US adults are ____ to buy an EV at this time.
Specify likely or not likely.
In Lecture 14 we discussed that if we want to be MORE confident that we have captured the true value, our interval will be wider.
1-sample proportions test without continuity correction
data: 584 out of 1025, null probability 0.5
X-squared = 19.95, df = 1, p-value = 0.000007948
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.5392410 0.5997503
sample estimates:
p
0.5697561
1-sample proportions test without continuity correction
data: 584 out of 1025, null probability 0.5
X-squared = 19.95, df = 1, p-value = 0.000007948
alternative hypothesis: true p is not equal to 0.5
99 percent confidence interval:
0.529599 0.609016
sample estimates:
p
0.5697561
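The two outputs above differ only in the confidence level. A short sketch of how the widths could be compared directly in R (the object names are illustrative):

```r
# Same poll counts, two confidence levels
ev_95 <- prop.test(584, 1025, conf.level = 0.95, correct = FALSE)
ev_99 <- prop.test(584, 1025, conf.level = 0.99, correct = FALSE)

# Interval width = upper bound - lower bound
width_95 <- ev_95$conf.int[2] - ev_95$conf.int[1]
width_99 <- ev_99$conf.int[2] - ev_99$conf.int[1]

c(width_95 = width_95, width_99 = width_99)

# Is the higher-confidence interval wider?
width_99 > width_95
```

The point estimate is the same in both runs; only the bounds move outward as the confidence level increases.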
In this case, t is NOT used to find E, AND S is calculated differently.
The t distribution is not appropriate for categorical data, but by the Central Limit Theorem (CLT) we can use the Z distribution.
These are categorical (binomial) data, so \(S = \sqrt{\hat{P}\times(1-\hat{P})}\)
Margin of Error for Proportion Data: \(E = \frac{S}{\sqrt{n}}\times Z = \sqrt{\frac{\hat{P}\times(1-\hat{P})}{n}}\times Z\)
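The Z value depends only on the confidence level. As a sketch, the values for common confidence levels can be looked up with base R's qnorm function instead of a printed table (the vector names here are illustrative):

```r
# Common confidence levels
conf_levels <- c(0.90, 0.95, 0.99)

# Two-sided critical values: put half of (1 - confidence) in each tail
z_values <- qnorm(1 - (1 - conf_levels) / 2)

round(z_values, 3)  # 1.645 1.960 2.576
```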
Question 4: What is the standard deviation for our EV poll data?
Recall that \(\hat{P} = \frac{584}{1025} \approx 0.57\)
Question 5: If we are calculating a 99% Confidence Interval for a proportion, what Z value should we use?
See previous slide or check the bottom of this t-table.
What is the margin of error, E, for the 99% Confidence Interval for the EV poll data?
\(E = \frac{S}{\sqrt{n}}\times Z = \sqrt{\frac{\hat{P}\times(1-\hat{P})}{n}}\times Z\)
Suggested strategy:
Divide answer to question 4 by \(\sqrt{n}=\sqrt{1025}\)
Multiply this ratio by Z
Check work by using prop.test output (\(E = \frac{UB-LB}{2}\))
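The suggested strategy can be sketched step by step in R. The variable names are illustrative. One caveat: without a continuity correction, prop.test builds its interval with the Wilson score method rather than the simple formula above, so the check should agree closely but not exactly.

```r
# EV poll counts
x <- 584
n <- 1025
p_hat <- x / n

# Standard deviation for proportion data
s <- sqrt(p_hat * (1 - p_hat))

# Step 1: divide S by sqrt(n)
se <- s / sqrt(n)

# Step 2: multiply by Z for 99% confidence
z <- qnorm(0.995)
E <- se * z

# Step 3: check against half the width of the prop.test interval
ev_99 <- prop.test(x, n, conf.level = 0.99, correct = FALSE)
E_check <- (ev_99$conf.int[2] - ev_99$conf.int[1]) / 2

c(by_hand = E, from_prop_test = E_check)
```

If the two values differ only in the third or fourth decimal place, the hand calculation is on track.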
1-sample proportions test without continuity correction
data: 584 out of 1025, null probability 0.5
X-squared = 19.95, df = 1, p-value = 0.000007948
alternative hypothesis: true p is not equal to 0.5
99 percent confidence interval:
0.529599 0.609016
sample estimates:
p
0.5697561
So far this course has dealt with describing data and estimating values.
Often we want to go beyond description and estimation.
Based on what we see in the data, we want to formulate and test hypotheses.
Hypotheses can have different formats based on type of data and questions being asked.
In the next lecture we’ll talk about the formal language of hypothesis testing.
Today we’ll discuss some concepts that show we have already been informally testing hypotheses.
When looking at data, it is natural to develop hypotheses based on what we notice.
Testing hypotheses is a formal way of examining the data, graphically and numerically.
When we test our hypotheses, we are asking “Do these data support the ideas I have developed about this population?”
Hypothesis: US adults are evenly split (50-50) on whether or not to buy an EV.
Our data disprove this hypothesis IF the 95% confidence interval EXCLUDES 0.5.
1-sample proportions test without continuity correction
data: 584 out of 1025, null probability 0.5
X-squared = 19.95, df = 1, p-value = 0.000007948
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.5392410 0.5997503
sample estimates:
p
0.5697561
The endpoints of our 95% interval (and of our 99% interval) are both ABOVE 0.5, which disproves this hypothesis.
This conclusion matches the other parts of the output.
The p-value is the probability of seeing sample data at least as extreme as these if the specified hypothesis is true.
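As a sketch, both checks (the confidence interval and the p-value) can be read from the prop.test result in R rather than by eye. The object names below are illustrative.

```r
# Test the 50-50 hypothesis with the EV poll counts
ev_test <- prop.test(584, 1025, p = 0.5, conf.level = 0.95, correct = FALSE)

# Check 1: does the 95% confidence interval exclude 0.5?
ci <- ev_test$conf.int
excludes_half <- ci[1] > 0.5 || ci[2] < 0.5
excludes_half

# Check 2: is the p-value below 0.05?
ev_test$p.value
ev_test$p.value < 0.05
```

Here the two checks agree with each other, which is why the conclusion from the interval matches the rest of the output.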
P = 0.5 is the Null Hypothesis that is the default for proportion tests.
Recall, the global mean number of subscribers is 21.89 million for the top 1000 YouTubers.
Is this random sample of 60 US YouTubers typical of the global top 1000?
Null Hypothesis: These 60 randomly sampled US YouTubers are not different from the larger global population of top YouTubers.
If data disprove this hypothesis:
The 95% confidence interval will NOT contain 21.89
The p-value will be less than 0.05
One Sample t-test
data: yt60$subscribers_mil
t = -0.55738, df = 59, p-value = 0.5794
alternative hypothesis: true mean is not equal to 21.89
95 percent confidence interval:
17.65192 24.28141
sample estimates:
mean of x
20.96667
Null Hypothesis: These 60 randomly sampled US YouTubers are not different from the larger global population of top YouTubers which has a population mean of 21.89 million.
Does the t.test confidence interval output disprove this hypothesis?
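One way to work through this question is to take the numbers straight from the t.test output. The sketch below uses only the reported values, because the yt60 data are not reproduced here. The variable names are illustrative.

```r
# Values copied from the t.test output
mu_0     <- 21.89      # hypothesized global mean (millions)
x_bar    <- 20.96667   # sample mean
ci_lower <- 17.65192
ci_upper <- 24.28141
df       <- 59

# Does the 95% confidence interval contain the hypothesized mean?
contains_mu0 <- ci_lower <= mu_0 && mu_0 <= ci_upper
contains_mu0

# Recover the margin of error and standard error from the interval
E  <- (ci_upper - ci_lower) / 2
se <- E / qt(0.975, df = df)

# Rebuild the t statistic and two-sided p-value as a consistency check
t_stat  <- (x_bar - mu_0) / se
p_check <- 2 * pt(-abs(t_stat), df = df)
c(t = t_stat, p_value = p_check)
```

The rebuilt t statistic and p-value should match the output to rounding, which confirms that the interval and the p-value tell the same story.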
Categorical Data can be simplified to two categories for analysis purposes
Two category data - Estimate a proportion and a confidence interval.
The prop.test command is one option for estimating a confidence interval.
Sample standard deviation for proportion data: \(S = \sqrt{\hat{P}\times(1-\hat{P})}\)
Margin of Error: \(E = \frac{S}{\sqrt{n}}\times Z = \sqrt{\frac{\hat{P}\times(1-\hat{P})}{n}}\times Z\)
We will cover hypothesis tests more formally in coming lectures.
To submit an Engagement Question or Comment about material from Lecture 16: Submit it by midnight today (day of lecture).