KITADA
Lab Activity #3: Inference for a single categorical variable of interest (with two categories): The One-Proportion methods
Objectives:
Part I: Examples
Example 1: the Referendum Example
A referendum was placed on a ballot in a local community. If more than 60% of the voters vote in favor of the referendum, it will pass. Otherwise, the referendum will not pass. A random sample of 25 registered voters was taken. Their responses are recorded in the REFERENDUM data set on Blackboard. The variable vote contains two responses: “1” for “Yes” and “0” for “No”. Based on these results, is there evidence to indicate the referendum will pass?
Questions
1. What is the variable of interest and population of interest? Is the variable of interest categorical or quantitative?
Variable of interest: a registered voter's vote on the referendum (“Yes” or “No”). It is categorical, with two categories.
Population of interest: all registered voters in the local community
2. What is the random variable? Is the random variable discrete or continuous?
X = the number of voters in the sample of 25 who vote “Yes”
The random variable is discrete; it takes the whole-number values 0, 1, 2, ..., 25.
3. Does the random variable have a binomial distribution? Explain.
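Yes, the random variable has a binomial distribution.
Verify the following conditions:
1) Two possible outcomes for each voter: “Yes” or “No”
2) Independent observations: the 25 voters were randomly sampled
3) Fixed number of trials: n = 25
4) Constant probability of a “Yes” vote (p) for every voter sampled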
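When these four conditions hold and the true proportion equals the null value of 0.6, X follows a Binomial(n = 25, p = 0.6) distribution. The R sketch below uses base R's dbinom() and barplot() to tabulate that null distribution; shading the counts of 18 or more (the values at least as extreme as the observed 18 “Yes” votes under a "greater than" alternative) is only an illustration choice.

```r
# Null distribution of X = number of "Yes" votes in a sample of n = 25 when p = 0.6
n  <- 25
p0 <- 0.6
x_values <- 0:n
probs <- dbinom(x_values, size = n, prob = p0)  # P(X = x) for x = 0, 1, ..., 25

# Checks: the probabilities sum to 1 and the mean equals n * p0 = 15
sum(probs)
sum(x_values * probs)

# Bar chart of the null distribution; counts of 18 or more are shaded darker
barplot(probs, names.arg = x_values,
        col = ifelse(x_values >= 18, "gray40", "gray85"),
        xlab = "x = number of Yes votes", ylab = "P(X = x)",
        main = "Binomial(n = 25, p = 0.6)")
```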
4. State the null and alternative hypotheses in words and statistical notation. Define the notation.
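\( H_0: p = 0.6 \) (the proportion of all registered voters in the community who vote “Yes” is 0.6)
\( H_A: p > 0.6 \) (the proportion of all registered voters in the community who vote “Yes” is greater than 0.6, so the referendum will pass)
p is the proportion of all registered voters in the local community who vote “Yes”.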
5. Exploring the sample data for a single categorical variable with two categories involves obtaining either a bar chart or a pie chart AND the frequency table. For two categories, a pie chart might be a better graph, but either can be used. Obtain a pie chart and a frequency table.
Frequency table
This command was covered in ST 351. Save the table as an object so it can be reused to create the pie chart:
votetable<-with(REFERENDUM, table(vote))
votetable
## vote
## 0 1
## 7 18
Note that the response of “0” comes first in the table, then “1”. This is important for labelling your pie chart.
Pie Chart:
You can make a simple pie chart in R using the pie() function on a tabled categorical variable.
pie(votetable,
labels=c("No", "Yes"),
main="Sample of Referendum Voters")
The order of the labels comes from the order in your table.
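Percentages on the slices can make the pie chart easier to read. A minimal sketch, assuming a stand-in vote vector rebuilt from the frequency table (7 zeros and 18 ones, in a hypothetical order) in place of the REFERENDUM data from Blackboard; prop.table() and paste0() are base R:

```r
# Stand-in for REFERENDUM$vote: 7 zeros and 18 ones, matching the frequency table
# (the order of the responses is hypothetical)
vote <- c(rep(0, 7), rep(1, 18))
votetable <- table(vote)

# Percentages in table order ("0" first, then "1"), rounded to whole numbers
pct <- round(100 * prop.table(votetable))
vote_labels <- paste0(c("No", "Yes"), " (", pct, "%)")
vote_labels

pie(votetable,
    labels = vote_labels,
    main = "Sample of Referendum Voters")
```

The labels are still typed in table order, so the same caution about the "0" category coming first applies.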
a. What proportion of voters in the sample said they’d vote for the referendum? What is the notation for this proportion?
\( \hat{p}=\frac{18}{25}=0.72 \)
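Because the responses are coded 0 and 1, \( \hat{p} \) is simply the mean of the vote column. A short sketch with the same kind of hypothetical stand-in for REFERENDUM$vote (counts taken from the frequency table, order made up):

```r
# Stand-in for REFERENDUM$vote (counts from the frequency table; order hypothetical)
vote <- c(rep(0, 7), rep(1, 18))

# With 0/1 coding, the proportion of "Yes" responses is the mean of the column
p_hat <- mean(vote)
p_hat

# Same value from the table of proportions, picking out the "1" category
p_hat_table <- as.numeric(prop.table(table(vote))["1"])
p_hat_table
```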
b. Based on the exploratory analysis of the pie chart and frequency table, do you feel there is evidence to say that the referendum will pass?
Yes. The sample proportion (0.72) is above 0.6, so based on the exploratory analysis it looks like we may be able to reject the null hypothesis in favor of the alternative; however, we still need to perform a rigorous statistical test.
See Part III for R commands for questions 6 and 7
6. Using R, find the EXACT p-value and 95% confidence interval using the binomial formula.
p-value:
### EXACT BINOMIAL TEST
binom.test(x=18, n=25, p = 0.6,
alternative = "greater",
conf.level = 0.95)
##
## Exact binomial test
##
## data: 18 and 25
## number of successes = 18, number of trials = 25, p-value = 0.1536
## alternative hypothesis: true probability of success is greater than 0.6
## 95 percent confidence interval:
## 0.5377911 1.0000000
## sample estimates:
## probability of success
## 0.72
p-value = 0.1536
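Assuming the null hypothesis is true, the exact p-value is the upper-tail binomial probability \( P(X \ge 18) = \sum_{k=18}^{25} \binom{25}{k}(0.6)^k(0.4)^{25-k} \). A quick check in base R, summing dbinom() terms and, equivalently, using pbinom() with lower.tail = FALSE:

```r
# Exact one-sided p-value for H0: p = 0.6 versus HA: p > 0.6 with 18 "Yes" out of 25:
# P(X >= 18) when X ~ Binomial(25, 0.6)
p_value_sum  <- sum(dbinom(18:25, size = 25, prob = 0.6))

# The same upper-tail probability, P(X > 17), in a single call
p_value_tail <- pbinom(17, size = 25, prob = 0.6, lower.tail = FALSE)

round(c(p_value_sum, p_value_tail), 4)
```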
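95% confidence interval for p (the one-sided test above gives a one-sided interval, so rerun binom.test() with a two-sided alternative):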
### EXACT BINOMIAL TEST
binom.test(x=18, n=25, p = 0.6,
alternative = "two.sided",
conf.level = 0.95)
##
## Exact binomial test
##
## data: 18 and 25
## number of successes = 18, number of trials = 25, p-value = 0.3073
## alternative hypothesis: true probability of success is not equal to 0.6
## 95 percent confidence interval:
## 0.5061232 0.8792833
## sample estimates:
## probability of success
## 0.72
95% confidence interval for p: (0.506, 0.879)
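binom.test() reports the Clopper-Pearson (“exact”) interval, which can be written with quantiles of a beta distribution. The sketch below reproduces the two-sided bounds with base R's qbeta(); it also shows where the one-sided interval (0.5377911, 1) from question 6 comes from, since alternative = "greater" puts the whole 5% in the lower tail.

```r
x <- 18   # number of "Yes" votes
n <- 25   # sample size
# (these beta-quantile formulas assume 0 < x < n; binom.test() uses 0 or 1 at the edges)

# Two-sided 95% Clopper-Pearson interval: 2.5% in each tail
ci_two_sided <- c(qbeta(0.025, x, n - x + 1),
                  qbeta(0.975, x + 1, n - x))
ci_two_sided

# One-sided 95% interval for alternative = "greater": all 5% in the lower tail, upper bound 1
ci_greater <- c(qbeta(0.05, x, n - x + 1), 1)
ci_greater

# Compare with binom.test()
binom.test(x = x, n = n, p = 0.6, alternative = "two.sided")$conf.int
```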
7. The oneprop macro can be used to approximate the exact p-value by simulation and to construct a confidence interval for the population proportion.
Copy and paste ALL of the oneprop macro code into your R script, then highlight all of the macro code and run it. This makes the macro available for use.
To use the macro, the column that contains the responses of the cases must be coded with a 0 or 1. (For this example, 0 = not in favor and 1 = in favor.) If the responses are not coded this way, the macro may not work or may not work properly.
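If a data set stores the responses as text, they can be recoded before calling the macro. A sketch with a hypothetical character vector named responses (not taken from the REFERENDUM data):

```r
# Hypothetical responses stored as text instead of 0/1
responses <- c("Yes", "No", "Yes", "Yes", "No")

# Recode for the macro: 1 = "Yes" (in favor), 0 = "No" (not in favor)
vote01 <- ifelse(responses == "Yes", 1, 0)
vote01

# Sanity check: every value is 0 or 1 (this is FALSE if any response is NA)
all(vote01 %in% c(0, 1))
```

Missing responses stay NA under ifelse(), so check for them before running the macro.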
The arguments for the macro are:
For our example here, the code to run the macro is as follows:
refBoot<-oneprop(original_sample=REFERENDUM$vote,
iterations=2000,
null.value = 0.6,
Alt_Hyp= 2,
ci_level=0.95)
b. Write the p-value and 95% confidence interval
p-value:
refBoot$pval
## [1] 0.1575
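The oneprop macro's code is not reproduced in this activity, so the sketch below is not its implementation. It is a hypothetical illustration of one common way to simulate a p-value for \( H_0: p = 0.6 \) versus \( H_A: p > 0.6 \): generate many samples of 25 voters under the null with rbinom(), then count how often the simulated number of “Yes” votes is 18 or more. The seed and object names are arbitrary.

```r
set.seed(352)   # arbitrary seed so the sketch is reproducible

n          <- 25     # voters per sample
p0         <- 0.6    # proportion under H0
x_obs      <- 18     # observed number of "Yes" votes
iterations <- 2000   # number of simulated samples

# Number of "Yes" votes in each of 2000 samples of 25 drawn with p = 0.6 (H0 true)
sim_counts <- rbinom(iterations, size = n, prob = p0)

# Simulated p-value for HA: p > 0.6: share of samples with 18 or more "Yes" votes
sim_pval <- mean(sim_counts >= x_obs)
sim_pval
```

A different seed gives a slightly different simulated p-value, which is why a simulation result such as 0.1575 does not exactly match the exact p-value of 0.1536.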
95% confidence interval for p using the formula method: (0.531, 0.909)
95% confidence interval for p using the percentile method: (0.439, 0.800)
refBoot$Confidence_Intervals
## CI_Percent CI_Formula
## 1 0.439 0.5310374
## 2 0.800 0.9089626
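For comparison, here is a sketch of a standard percentile bootstrap interval: resample the 25 responses with replacement, compute \( \hat{p} \) for each resample, and keep the middle 95%. Whether oneprop computes its CI_Percent column this way is not stated here, so the bounds from this sketch need not match the macro's output. The vote vector is a hypothetical stand-in rebuilt from the frequency table.

```r
set.seed(352)   # arbitrary seed; a different seed gives slightly different bounds

# Stand-in for REFERENDUM$vote (counts from the frequency table; order hypothetical)
vote <- c(rep(0, 7), rep(1, 18))
iterations <- 2000

# Resample the 25 responses with replacement and record each resample's proportion of "Yes"
boot_props <- replicate(iterations, mean(sample(vote, replace = TRUE)))

# 95% percentile interval: the 2.5th and 97.5th percentiles of the bootstrap proportions
boot_ci <- quantile(boot_props, probs = c(0.025, 0.975))
boot_ci
```

Rerunning with a different seed shifts the bounds slightly; this run-to-run variation is the randomness that makes simulation-based intervals less stable than the exact binomial interval.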
c. Is the p-value from the oneprop macro close to the exact p-value?
Yes. The simulated p-value from the macro (0.1575) is close to the exact p-value (0.1536).
8. Using the most appropriate method, answer the question of interest supported by a p-value.
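There is little to no evidence that the proportion of all registered voters in the community who vote “Yes” is greater than 0.6 (exact binomial test, p-value = 0.1536). Therefore, we fail to reject the null hypothesis; the data do not provide evidence that the referendum will pass.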
9. Are the bounds of the confidence interval for the proportion using the binomial formula method close to the bounds using the percentile method?
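Yes, the bounds are reasonably close: (0.506, 0.879) from the exact binomial method versus (0.439, 0.800) from the percentile method. However, the percentile bounds come from simulation and change each time the macro is run, while the exact binomial interval does not depend on randomness. Therefore, the exact binomial interval is the better choice for reporting and for comparisons.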
10. Using the most appropriate confidence interval, interpret the 95% confidence interval for the proportion of ALL voters in this community in favor of the referendum.
We are 95% confident that the true proportion of all registered voters in this community who are in favor of the referendum is between 0.506 and 0.879.
Example 2: the Smoking example
Beginning September 1, 2012, the Oregon State University campus became smoke-free. Smoking is not allowed anywhere on campus. What percentage of Oregon State University students does this affect? That is, what proportion of all OSU students smoke? To answer this question, use the SMOKE data set, which is based on survey responses from students in the ST 352 class during the Spring 2013 quarter. In the data set, 1 = currently smoke, and 0 = not a current smoker.