KITADA

Lab Activity #3 Inference for a single categorical variable of interest (with two categories): The One-Proportion methods

Objectives:

Use R to perform a hypothesis test and construct a confidence interval for a proportion using the binomial formula method
Use the oneprop macro in R to obtain a p-value from a hypothesis test and construct a confidence interval for a proportion
Interpret R output from each of the analyses listed above

Part I: Examples

Example 1: the Referendum Example

A referendum was placed on a ballot in a local community. If more than 60% of the voters vote in favor of the referendum, it will pass. Otherwise, the referendum will not pass. A random sample of 25 registered voters was taken. Their responses are recorded in the REFERENDUM data set on Blackboard. The variable vote contains two responses: “1” for “Yes” and “0” for “No”. Based on these results, is there evidence to indicate the referendum will pass?

Questions

1. What is the variable of interest and population of interest? Is the variable of interest categorical or quantitative?

Variable of interest: “Yes” vote for referendum (a categorical variable)

Population of interest: Registered voters in the local community

2. What is the random variable? Is the random variable discrete or continuous?

X = The number of voters who vote “yes”

The random variable is discrete and takes values from 0 to 25.

3. Does the random variable have a binomial distribution? Explain.

Yes, the random variable has a binomial distribution.

Verify the following:

1) Two outcome options

2) Independent observations

3) Fixed number of trials

4) Constant underlying proportions

4. State the null and alternative hypotheses in words and statistical notation. Define the notation.

\( H_0: p = 0.6 \)

\( H_A: p > 0.6 \)

p is the proportion of all registered voters in the community who vote “Yes”
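As a minimal sketch of what the null hypothesis implies, assuming the binomial conditions from question 3 hold: if \( H_0 \) were true, X would follow a Binomial(25, 0.6) distribution. The object names `n`, `p0`, `null_pmf`, and `expected_yes` are illustrative and are not part of the lab materials.

```r
# Sketch: distribution of X (number of "Yes" votes) if H_0: p = 0.6 were true.
# n and p0 come from the example; the object names are illustrative.
n  <- 25    # sample size
p0 <- 0.6   # null hypothesized proportion

# Probability of each possible count 0, 1, ..., 25 under the null
null_pmf <- dbinom(0:n, size = n, prob = p0)
names(null_pmf) <- 0:n

# Expected number of "Yes" votes under the null
expected_yes <- n * p0

round(null_pmf, 4)
expected_yes
```

Counts far above the expected count under the null are the ones that would support the "greater than" alternative.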

5. Exploring the sample data for a single categorical variable with two categories involves obtaining either a bar chart or pie chart AND the frequency table. For two categories, a pie chart might be a better graph, but either can be used. Obtain a pie chart and a frequency table.

Frequency table

This command was learned in ST 351. It will be beneficial to create an object that contains the table for creating the pie chart:

votetable<-with(REFERENDUM, table(vote))
votetable

## vote
##  0  1 
##  7 18

Note that the response of “0” comes first in the table, then “1”. This is important for labelling your pie chart.

Pie Chart:

You can make a simple pie chart in R using the pie() function on a tabled categorical variable.

with(REFERENDUM, pie(votetable, 
                     labels=c("No", "Yes"),
                     main="Sample of Referendum Voters"))

[Figure: pie chart titled "Sample of Referendum Voters" with slices labelled "No" and "Yes"]

The order of the labels comes from the order in your table.
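One way to avoid mislabelling the slices is to look the labels up from the table's own names instead of typing them in order. This is only a sketch: the `vote` vector is rebuilt from the counts in the frequency table (the order of the individual responses is an assumption), and `code_labels` and `slice_labels` are illustrative names.

```r
# Rebuild the sample from the frequency table above (7 "No", 18 "Yes").
# The order of the individual responses is assumed; only the counts matter here.
vote <- c(rep(0, 7), rep(1, 18))
votetable <- table(vote)

# Map each code to its label, then pull the labels in the table's own order
code_labels  <- c("0" = "No", "1" = "Yes")
slice_labels <- unname(code_labels[names(votetable)])
slice_labels

pie(votetable,
    labels = slice_labels,
    main = "Sample of Referendum Voters")
```

Because the labels are matched by code, they stay correct even if the table's categories appear in a different order.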

a. What proportion of voters in the sample said they’d vote for the referendum? What is the notation for this proportion?

\( \hat{p}=\frac{18}{25}=0.72 \)
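The same sample proportion can be obtained in R. The sketch below rebuilds the sample from the counts in the frequency table, since the REFERENDUM data set itself is not reproduced here. The order of the responses is an assumption, and `p_hat` is an illustrative name.

```r
# Rebuild the sample from the frequency table above (7 "No", 18 "Yes").
# The order of the individual responses is assumed; only the counts matter here.
vote <- c(rep(0, 7), rep(1, 18))

votetable <- table(vote)
prop.table(votetable)   # sample proportions for the "0" and "1" responses

# Because vote is coded 0/1, its mean is the sample proportion of "Yes"
p_hat <- mean(vote)
p_hat
```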

b. Based on the exploratory analysis of the pie chart and frequency table, do you feel there is evidence to say that the referendum will pass?

Yes. The sample proportion of 0.72 is above 0.60, so the exploratory analysis suggests that the referendum may pass; however, we still need to perform a formal statistical test.

See Part III for R commands for questions 6 and 7

6. Using R, find the EXACT p-value and 95% confidence interval using the binomial formula.

p-value:

### EXACT BINOMIAL TEST
binom.test(x=18, n=25, p = 0.6,
           alternative = "greater",
           conf.level = 0.95)

## 
##  Exact binomial test
## 
## data:  18 and 25
## number of successes = 18, number of trials = 25, p-value = 0.1536
## alternative hypothesis: true probability of success is greater than 0.6
## 95 percent confidence interval:
##  0.5377911 1.0000000
## sample estimates:
## probability of success 
##                   0.72

p-value = 0.1536
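The exact p-value above is the probability of seeing 18 or more “Yes” votes out of 25 when p = 0.6. As a check, it can be rebuilt directly from the binomial formula. The object names below are illustrative.

```r
n  <- 25    # number of trials
x  <- 18    # observed number of "Yes" votes
p0 <- 0.6   # null hypothesized proportion

# P(X >= 18) for X ~ Binomial(25, 0.6): add up the binomial formula terms
p_by_sum <- sum(dbinom(x:n, size = n, prob = p0))

# The same upper-tail probability from the cumulative distribution function
p_by_tail <- pbinom(x - 1, size = n, prob = p0, lower.tail = FALSE)

# The p-value reported by binom.test() for the "greater" alternative
p_from_test <- binom.test(x, n, p = p0, alternative = "greater")$p.value

c(p_by_sum, p_by_tail, p_from_test)
```

All three values should agree with the p-value of 0.1536 in the output above.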

95% confidence interval for p:

### EXACT BINOMIAL TEST
binom.test(x=18, n=25, p = 0.6,
           alternative = "two.sided",
           conf.level = 0.95)

## 
##  Exact binomial test
## 
## data:  18 and 25
## number of successes = 18, number of trials = 25, p-value = 0.3073
## alternative hypothesis: true probability of success is not equal to 0.6
## 95 percent confidence interval:
##  0.5061232 0.8792833
## sample estimates:
## probability of success 
##                   0.72

95 percent confidence interval:

0.5061232 0.8792833
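R's documentation for `binom.test()` attributes its confidence interval to the Clopper and Pearson procedure, whose bounds can be written as quantiles of beta distributions. The sketch below rebuilds the two-sided interval that way. The object names are illustrative.

```r
n <- 25             # number of trials
x <- 18             # observed number of "Yes" votes
conf_level <- 0.95
alpha <- 1 - conf_level

# Clopper-Pearson bounds written as beta quantiles (valid for 0 < x < n)
lower <- qbeta(alpha / 2, x, n - x + 1)
upper <- qbeta(1 - alpha / 2, x + 1, n - x)
c(lower, upper)

# Interval reported by binom.test() for comparison
ci_test <- binom.test(x, n, p = 0.6,
                      alternative = "two.sided",
                      conf.level = conf_level)$conf.int
ci_test
```

This also explains why the "greater" test in the p-value step reports an upper bound of 1: a one-sided alternative produces a one-sided interval, so the two-sided call is the one to use for the confidence interval.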

7. The oneprop macro can be used to estimate the exact p-value and construct a confidence interval for the population proportion.

If you do not have it already, download the oneprop.txt macro file from Canvas and save it to your z drive. Open the text file.
Copy and paste ALL the oneprop macro code in your R script, highlight all code for the macro and run this code. This will make the macro available for use.
To use the macro, the column that contains the responses of the cases must be coded with a 0 or 1. (For this example, 0 = not in favor and 1 = in favor.) If the responses are not coded this way, the macro may not work or may not work properly.
The arguments for the macro are:
- original_sample: the column that contains the responses of the cases
- iterations: the number of bootstrap sample proportions you want to generate (suggested: at least 2000)
- null.value: the null hypothesized value
- Alt_Hyp: a code of either 1, 2, or 3, where
  - 1: less than alternative
  - 2: greater than alternative
  - 3: two-sided alternative
- ci_level: the confidence level as a decimal between 0 and 1
- Summary_Stats: an optional TRUE/FALSE statement to display summary stats
  - by default this is TRUE
- Histogram: an optional TRUE/FALSE statement to display a histogram of the bootstrapped sample proportions
  - by default this is TRUE
For our example here, the code to run the macro is as follows:

refBoot<-oneprop(original_sample=REFERENDUM$vote,
                 iterations=2000,
                 null.value = 0.6,
                 Alt_Hyp= 2, 
                 ci_level=0.95)

[Figure: histogram of the bootstrapped sample proportions]

The following output will be displayed in the Console:
- The p-value from the hypothesis test. If it is an estimation problem only, the p-value will be displayed as NA.
- The standard deviation of the bootstrap sample proportions from the specified number of bootstrap samples.
- A confidence interval for the population proportion (at the confidence level specified in the macro), with bounds determined using the percentile method and the formula method.
- The alternative hypothesis selected.
- The adjusted bootstrapped sample proportions (number based on iterations).
- The sample proportion and a histogram of the bootstrapped sample proportions if Summary_Stats=TRUE and Histogram=TRUE.
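To give a feel for what a bootstrap analysis of a proportion involves, here is a hypothetical sketch. It is not the oneprop macro's code: the function name `boot_prop_sketch`, its arguments, and in particular the way the p-value is computed (shifting the bootstrap proportions so they are centered at the null value) are assumptions made for illustration, and the macro may do these steps differently.

```r
# Hypothetical sketch of a bootstrap one-proportion analysis.
# This is NOT the oneprop macro; names and the p-value approach are assumptions.
boot_prop_sketch <- function(sample01, iterations = 2000, null_value = 0.6,
                             ci_level = 0.95) {
  n     <- length(sample01)
  p_hat <- mean(sample01)

  # Resample the data with replacement and record each bootstrap proportion
  boot_props <- replicate(iterations,
                          mean(sample(sample01, size = n, replace = TRUE)))

  # Percentile method: the middle ci_level share of the bootstrap proportions
  alpha      <- 1 - ci_level
  ci_percent <- unname(quantile(boot_props, c(alpha / 2, 1 - alpha / 2)))

  # Formula method: estimate +/- z * (bootstrap standard deviation)
  boot_sd    <- sd(boot_props)
  ci_formula <- p_hat + c(-1, 1) * qnorm(1 - alpha / 2) * boot_sd

  # "Greater than" p-value (assumed approach): center the bootstrap
  # proportions at the null value, then find the share that is at least
  # as large as the observed sample proportion
  shifted <- boot_props - mean(boot_props) + null_value
  pval    <- mean(shifted >= p_hat)

  list(pval = pval, boot_sd = boot_sd,
       ci_percent = ci_percent, ci_formula = ci_formula)
}

set.seed(352)                      # arbitrary seed so the sketch is reproducible
vote <- c(rep(0, 7), rep(1, 18))   # counts taken from the frequency table
result <- boot_prop_sketch(vote)
result
```

Because the bootstrap samples are random, the results change from run to run unless a seed is set, so a bootstrap p-value will generally be close to, but not equal to, the exact binomial p-value.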

b. Write the p-value and 95% confidence interval

p-value:

refBoot$pval

## [1] 0.1575

95% confidence interval for p using the formula method (CI_Formula column): 0.531 to 0.909

95% confidence interval for p using the percentile method (CI_Percent column): 0.439 to 0.800

refBoot$Confidence_Intervals

##   CI_Percent CI_Formula
## 1      0.439  0.5310374
## 2      0.800  0.9089626

c. Is the p-value from the oneprop macro close to the exact p-value?

Yes, the p-value from the oneprop macro (0.1575) is close to the exact p-value (0.1536).

8. Using the most appropriate method, answer the question of interest supported by a p-value.

With an exact p-value of 0.1536, there is not sufficient evidence that the proportion of voters who vote “Yes” is greater than 60%, so we fail to reject the null hypothesis. The sample does not show that the referendum will pass.

9. Are the bounds of the confidence interval for the proportion using the binomial formula method close to the bounds using the percentile method?

Yes, the bounds are pretty close; however, the percentile method can be off due to randomness. Therefore, it makes more sense to use the formula for comparison's sake.

10. Using the most appropriate confidence interval, interpret the 95% confidence interval for the proportion of ALL voters in this community in favor of the referendum.

We are 95% confident that the true population proportion of community members who vote “Yes” is between 0.506 and 0.879.

Example 2: the Smoking Example

Beginning September 1, 2012, the Oregon State University campus became smoke-free. Smoking is not allowed anywhere on campus. What percentage of Oregon State University students does this affect? That is, what proportion of all OSU students smoke? To answer this question, use the SMOKE data set, which is based on survey responses from students in the ST 352 class during the Spring 2013 quarter. In the data set, 1 = currently smoke, and 0 = not a current smoker.