Topic 9: Hypothesis Testing for One and Two Sample Proportions


In Topic 9 we extended our coverage of hypothesis tests to include hypothesis tests of proportions. In this computer lab, we will practice carrying out hypothesis tests for both one-sample and two-sample proportions.


1 One-sample Test of Proportions

For this question, suppose we are interested in the proportion of first-year university students who drink coffee regularly. Further suppose that a recent claim in a newspaper stated that 65% of first-year university students drink coffee regularly.

To test this claim, we can carry out a one-sample test of proportions. Let \(p\) denote the proportion of first-year university students who drink coffee regularly. Suppose that we survey first-year students at La Trobe University and find that, out of the \(n=840\) respondents, \(x=582\) students said that they drink coffee regularly.

1.1

🏑 What is \(\widehat{p}\), our estimate of \(p\)?

🎧 Online students 💬 Enter your answer next to the question in the shared Google Doc.
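If you would like to check your hand calculation in R, the sample proportion is simply the number of successes divided by the number of respondents. The snippet below is a minimal sketch; the variable names x, n and p_hat are illustrative choices, not something prescribed by the lab.

```r
# Survey data from question 1
x <- 582   # respondents who said they drink coffee regularly
n <- 840   # total number of respondents

# Sample proportion: successes divided by sample size
p_hat <- x / n
p_hat
```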

1.2

🏑 What are the null and alternative hypotheses for this test?

🎧 Online students 💬 Enter your answer next to the question in the shared Google Doc.

1.3

🏑 Remember, just like for our other hypothesis tests, it is important to check that the assumptions of this hypothesis test are satisfied, before we proceed.

Check that the one-sample test of proportions conditions are satisfied for our example.

Hint: If you need a reminder, check the code chunk below:

# Remember, both n*p and n*(1-p) must be greater than or equal to 5
# in order for us to be able to use the Central Limit Theorem.
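The rule of thumb in the comment above can be checked directly in R. This is a sketch under one assumption: that p in the condition is taken to be the value claimed by the newspaper (0.65), since that is the value used under the null hypothesis. The names p0, expected_successes, expected_failures and conditions_met are illustrative.

```r
n  <- 840    # number of respondents in the survey
p0 <- 0.65   # assumed: proportion claimed by the newspaper (null value)

# Expected numbers of "successes" and "failures" if the claim were true
expected_successes <- n * p0
expected_failures  <- n * (1 - p0)

# Both quantities should be at least 5
conditions_met <- expected_successes >= 5 && expected_failures >= 5
conditions_met
```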

1.4

🏑 Assuming that the test conditions have been satisfied, write out the approximate distribution for our proportion.

🎧 Online students 💬 Enter your answer next to the question in the shared Google Doc.
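As a reminder of the general form, and not a substitute for writing out the answer with the numbers from this example, the normal approximation for a sample proportion is usually stated as below. The second parameter is written here as a variance; that is an assumption about notation, so follow whichever convention your lecture notes use.

```latex
\[
\widehat{p} \;\overset{\text{approx.}}{\sim}\; N\!\left(p,\ \frac{p(1-p)}{n}\right)
\]
```

When carrying out the test, \(p\) is replaced by its value under the null hypothesis.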

1.5

💻 We can use the prop.test R function to carry out a one-sample test of proportions for our data.

This function takes several arguments, which will be specified differently depending on the type of proportion test we are conducting. For a one-sample test of proportions, the arguments are:

  • x: The number of successes
  • n: The number of trials
  • p: The probability of success under the null hypothesis

Using this information, carry out a one-sample test of proportions for the data specified in 1 using the prop.test function.

Hint: Check the code chunk below if you are stuck - you will need to replace the ...s with your own values:

prop.test(x = ..., n = ..., p = ...)
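For reference, one way the template above might be filled in with the survey data from question 1 is sketched here. The variable names are illustrative. Note that prop.test applies a continuity correction by default (correct = TRUE) and performs a two-sided test unless the alternative argument is changed, so its statistic may differ slightly from a hand calculation that omits the correction.

```r
# Survey data from question 1
x  <- 582    # respondents who said they drink coffee regularly
n  <- 840    # total number of respondents
p0 <- 0.65   # proportion claimed in the newspaper (null value)

# One-sample test of proportions (two-sided, with continuity correction by default)
one_sample_result <- prop.test(x = x, n = n, p = p0)
one_sample_result
```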

1.6

🏑 Interpret the output of the one-sample test of proportions, and note down:

  • The test statistic value,
  • the \(p\)-value, and
  • the 95% confidence interval for the proportion.

🎧 Online students 💬 Based on these results, should you reject the null hypothesis? Enter your answer next to the question in the shared Google Doc.
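Rather than reading the values off the printed output, they can also be pulled out of the object that prop.test returns. The sketch below assumes a 5% significance level (the name alpha and the other variable names are illustrative); the comparison of the p-value with alpha is one common way to frame the decision, not the only one.

```r
result <- prop.test(x = 582, n = 840, p = 0.65)

test_statistic <- unname(result$statistic)  # chi-squared test statistic
p_value        <- result$p.value            # p-value of the test
conf_interval  <- result$conf.int           # confidence interval (95% by default)

alpha <- 0.05                   # assumed significance level
reject_null <- p_value < alpha  # TRUE if the p-value falls below alpha

test_statistic
p_value
conf_interval
reject_null
```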

1.7

🏑 Write a short conclusion summarising this test, and state your decision regarding the hypothesis, with reference to your findings from 1.6.

2 Two-sample Test of Proportions

Suppose that we would now like to check if the proportion of first-year university students who regularly drink coffee differs significantly from the proportion of final-year university students who regularly drink coffee.

Suppose that following a survey of final-year students at La Trobe University, we find that, out of the \(n=414\) respondents, \(x=302\) students said that they drink coffee regularly.

2.1

🏑 Let \(p_1\) now denote the proportion of first-year university students who drink coffee regularly, and let \(p_2\) denote the proportion of final-year university students who drink coffee regularly.

Using the survey data from questions 1 and 2, calculate \(\widehat{p}_1\) and \(\widehat{p}_2\).

🎧 Online students 💬 Enter your answer next to the question in the shared Google Doc.

2.2

🏑 What are the null and alternative hypotheses for this test?

2.3

🏑 Check that the assumptions for the two-sample test of proportions are satisfied.

Hint: Remember, we have two data samples we need to check this time.
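Because R works on vectors, both samples can be checked in one go. The sketch below assumes the same rule of thumb as in 1.3 is applied to each sample separately, using the sample proportions; your lecture notes may state the two-sample condition slightly differently, so treat this as one possible check. All variable names are illustrative.

```r
# First-year and final-year survey data, in that order
x <- c(582, 302)   # regular coffee drinkers in each sample
n <- c(840, 414)   # respondents in each sample

p_hat <- x / n     # sample proportions for the two groups

# Observed numbers of successes and failures in each sample
successes <- n * p_hat
failures  <- n * (1 - p_hat)

# Assumed rule of thumb: every count should be at least 5
all_conditions_met <- all(successes >= 5, failures >= 5)
all_conditions_met
```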

2.4

💻 Use the prop.test R function to carry out a two-sample test of proportions for our data.

Recall from 1.5 that the main arguments in the prop.test function are x, n and p. Now that we are considering two samples, x and n will be vectors rather than scalars, and p is left unspecified so that prop.test tests whether the two proportions are equal.

Hint: If you are not sure how to test multiple samples at once, check the code chunk below - you will need to replace the ...s with your own values:

prop.test(x = c(..., ...), n = c(..., ...))
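A possible way to fill in the template, with the first-year sample listed first and the final-year sample second, is sketched below. The variable names are illustrative. With two groups and no p argument, prop.test tests whether the two proportions are equal, and the confidence interval it reports is for the difference between them.

```r
# First-year and final-year survey data, in that order
x <- c(582, 302)   # regular coffee drinkers in each sample
n <- c(840, 414)   # respondents in each sample

# Two-sample test of proportions (two-sided, with continuity correction by default)
two_sample_result <- prop.test(x = x, n = n)

two_sample_result$estimate   # sample proportions for each group
two_sample_result$p.value    # p-value for the test of equal proportions
two_sample_result$conf.int   # confidence interval for the difference in proportions
```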

2.5

🏑 Interpret the output of the two-sample test of proportions, and write a brief conclusion summarising your findings.

🎧 Online students 💬 Volunteer to share your screen and explain your answers to this question.

3 Extension: Visualising the Data

It is always a good idea to try and visualise the data we are analysing. The typical plot used to visualise proportions data is a stacked bar chart, also known as a stacked bar plot. However, this type of plot can be a little tricky to create in R (which is why we have left this question as an extension option).

3.1

💻 For these graphs, we will use the plotly R package. Run the following code to load this package in R.

Note: If you haven’t already installed plotly, uncomment the first line of code below, and run that too.

# install.packages("plotly")
library(plotly)

3.2

💻 First, we will try to visualise the data for question 1. Run the following code:

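# Counts for the first-year sample from question 1
students <- "First Year"
coffee <- 582
noCoffee <- 840 - coffee
data <- data.frame(students, coffee, noCoffee)

# Start with a bar for the regular coffee drinkers
fig <- plot_ly(data, x = ~students, y = ~coffee, type = 'bar', name = 'Regular Coffee Drinker')
# Add a second bar for the students who are not regular coffee drinkers
fig <- fig %>% add_trace(y = ~noCoffee, name = 'Not Regular Coffee Drinker')
# Label the y-axis and stack the two bars on top of each other
fig <- fig %>% layout(yaxis = list(title = 'Count'), barmode = 'stack')

# Display the plot
fig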

3.3

💻 Using the code above as a base, try to now visualise the data for question 2 side-by-side in the one plot.

If you are able to produce this plot, you could also try the following:

  • Change the order of the plots by adding the code xaxis = list(categoryorder = 'category descending') to the layout(...) section of the code, so that the First Year students’ data is shown first.
  • Add numbers to each plot section, by adding the code text = ..., textposition = 'auto' to the plot_ly and add_trace sections of the code (you will have to work out what to replace the ... with).
  • Instead of plotting Count on the y-axis, change this to cumulative percentage, so that proportions between the two groups are easier to compare.\(^\dagger\)

Hint: Read through the code above carefully and try to understand what each of the commands is doing. If you get stuck, ask your lab demonstrator for guidance.

\(^\dagger\) Hint: To display your data as cumulative percentages rather than counts, try changing the values in coffee and noCoffee so that they are percentages rather than counts.
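One possible way to build such percentages for both groups is sketched below. This is an illustrative reconstruction rather than the lab's own code: the helper vectors coffee_count and total are hypothetical names, and the resulting data frame simply reuses the column names from 3.2 so that it could be passed to the same plotting code (with the y-axis title changed to match).

```r
students <- c("First Year", "Final Year")

# Counts from questions 1 and 2 (hypothetical helper vectors)
coffee_count <- c(582, 302)   # regular coffee drinkers in each sample
total        <- c(840, 414)   # respondents in each sample

# Convert counts to percentages of each group's total
coffee   <- 100 * coffee_count / total
noCoffee <- 100 - coffee

data <- data.frame(students, coffee, noCoffee)
data
```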

🎧 Online students 💬 Volunteer to share your screen and explain your answers to this question.


These notes have been prepared by Rupert Kuveke. The copyright for the material in these notes resides with the author named above, with the Department of Mathematics and Statistics and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License BY-NC-ND.