One-sample z-test of proportion

Mark Bounthavong

27 February 2026

Introduction

There are situations where you will be asked to compare the performance of your institution with another institution. This is commonly done with projects that I’m on where data collection occurs at a single site, and stakeholders want to compare the single site’s findings with a reference site. More commonly, stakeholders want to compare their performance to a published paper’s findings. In other words, we want to compare an observed finding to a theoretical one.

In the case of proportions, we can compare the proportion of individuals who experienced an event at a single site to the proportion from a published study. To do that, we can use the one-sample z-test of proportions.

The hypotheses for the one-sample z-test of proportions are:

\[\begin{align*} H_{0}: p_{observed} = p_{expected} \\ H_{a}: p_{observed} \ne p_{expected} \end{align*}\]

where \(p_{observed}\) is the observed proportion of individuals with the event and \(p_{expected}\) is the expected proportion of individuals with the event. Generally, \(p_{expected}\) will come from a published study, while \(p_{observed}\) comes from data collected at a single site.

For a two-sided alpha of 0.05, we use a z-score of 1.96.

z_score <- qnorm(p = 0.975, mean = 0 , sd = 1)
z_score
## [1] 1.959964

The test statistic for the one-sample z-test is:

\[\begin{align*} z = \frac{p_{observed} - p_{expected}}{SE} \end{align*}\]

where the standard error (SE):

\[\begin{align*} SE = \sqrt{\frac{p_{observed}(1 - p_{observed})}{n}} \end{align*}\]

If the absolute value of the calculated z-score is greater than 1.96, then \(p_{observed}\) is statistically significantly different from \(p_{expected}\).

Motivating example

Suppose we estimated that 7 out of 23 patients (or 30%) at a medical center experienced a myocardial infarction (MI) in January 2026. We want to know if the proportion of patients at our medical center who experienced an MI in January 2026 is statistically significantly different from the overall annual average of 27%.

To answer this question, we can perform a one-sample z-test of proportions.

We have the following parameters:

\[\begin{align*} p_{observed} = 0.30 \\ p_{expected} = 0.27 \end{align*}\]

From here, we can estimate the z-score and SE:

\[\begin{align*} SE = \sqrt{\frac{p_{observed}(1 - p_{observed})}{n}} = \sqrt{\frac{0.30(1 - 0.30)}{23}} = 0.096 \end{align*}\]

\[\begin{align*} z = \frac{p_{observed} - p_{expected}}{SE} = \frac{0.30 - 0.27}{0.096} = 0.313 \end{align*}\]

This yields a z-score of 0.313, which is less than 1.96 indicating that the \(p_{observed}\) is not statistically different from \(p_{expected}\). In other words, the proportion of patients at the medical center who experienced an MI (30%) was not statistically significantly different from the overall annual average of 27%.
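The hand calculation above can be reproduced in R. This is a minimal sketch rather than part of the original analysis: the variable names are made up for illustration, the observed proportion is rounded to 0.30 to mirror the worked equations, and the two-sided p-value step using pnorm() is an added step that the hand calculation does not show.

```r
# Inputs from the motivating example (p_observed rounded to 0.30, as in the text)
p_observed <- 0.30
p_expected <- 0.27
n <- 23

# Standard error based on the observed proportion
se <- sqrt(p_observed * (1 - p_observed) / n)

# z-score for the difference between the observed and expected proportions
z <- (p_observed - p_expected) / se

# Two-sided p-value from the standard normal distribution (added step)
p_value <- 2 * pnorm(-abs(z))

# Compare |z| with the critical value for a two-sided alpha of 0.05
z_crit <- qnorm(0.975)
significant <- abs(z) > z_crit

round(c(se = se, z = z, p_value = p_value), 3)
significant
```

Because the standard error is not rounded before dividing, the z-score may differ from the hand calculation in the third decimal place. The conclusion is the same: the absolute z-score is well below 1.96.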

We can estimate the 95% confidence interval (CI) for the proportion of patients who had an MI using the following equation:

\[\begin{align*} 95\% \text{ CI} = p_{observed} \pm 1.96 \times SE \end{align*}\]

Using this formula, the 95% CI for the proportion of patients who had an MI is 11% to 49%.

0.30 + 1.96*0.096
## [1] 0.48816
0.30 - 1.96*0.096
## [1] 0.11184

We can estimate the p-value using two R functions: binom.test() and prop.test().

Using the binom.test():

mi_event <- 7        # number of patients with an MI event in January 2026
total_patients <- 23 # number of patients seen in January 2026
prop_ref <- 0.27     # annual average proportion of MI events 

binom.test(x = mi_event, n = total_patients, p = prop_ref, alternative = "two.sided")
## 
##  Exact binomial test
## 
## data:  mi_event and total_patients
## number of successes = 7, number of trials = 23, p-value = 0.647
## alternative hypothesis: true probability of success is not equal to 0.27
## 95 percent confidence interval:
##  0.1321029 0.5291917
## sample estimates:
## probability of success 
##              0.3043478

Based on binom.test(), the p-value is 0.647, so we don’t have enough evidence to reject the null hypothesis that the proportion of patients at the medical center who experienced an MI (30%) equals the overall annual average of 27%.

Using the prop.test():

mi_event <- 7        # number of patients with an MI event in January 2026
total_patients <- 23 # number of patients seen in January 2026
prop_ref <- 0.27     # annual average proportion of MI events 

prop.test(x = mi_event, n = total_patients, p = prop_ref, correct = TRUE)
## 
##  1-sample proportions test with continuity correction
## 
## data:  mi_event out of total_patients, null probability prop_ref
## X-squared = 0.018552, df = 1, p-value = 0.8917
## alternative hypothesis: true p is not equal to 0.27
## 95 percent confidence interval:
##  0.1405633 0.5300578
## sample estimates:
##         p 
## 0.3043478

Based on prop.test(), the p-value is 0.892, so we don’t have enough evidence to reject the null hypothesis that the proportion of patients at the medical center who experienced an MI (30%) equals the overall annual average of 27%.

Why is there a difference in the p-values between these two methods? It comes down to the assumptions behind each one. I’ll be honest; I had to look up this answer, and I found it in Antoine Soetewey’s online article in Stats and R titled “One-proportion and chi-square goodness of fit test.” It’s a good read, and I highly recommend it.

The binom.test() performs an exact binomial test that is recommended for small sample sizes. The 95% confidence interval (CI) is estimated using the Clopper-Pearson method.
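To make the Clopper-Pearson method a little more concrete, the sketch below computes the interval from beta distribution quantiles and compares it with the interval that binom.test() reports. The variable names are illustrative, and the formulas as written assume 0 < x < n.

```r
x <- 7        # number of patients with an MI event
n <- 23       # number of patients seen
alpha <- 0.05 # two-sided alpha

# Clopper-Pearson limits expressed as beta distribution quantiles
# (valid as written when 0 < x < n)
cp_lower <- qbeta(alpha / 2, x, n - x + 1)
cp_upper <- qbeta(1 - alpha / 2, x + 1, n - x)

c(lower = cp_lower, upper = cp_upper)

# Interval reported by binom.test(), for comparison
binom.test(x = x, n = n, p = 0.27)$conf.int
```

The two intervals should agree, which is one way to check that binom.test() is using the exact method rather than a normal approximation.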

The prop.test() uses a normal approximation of the binomial distribution, so a larger sample is needed for accurate estimates. The 95% CI is estimated using the Wilson score method (with a continuity correction when correct = TRUE).
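The normal approximation can be checked with a short sketch. Without the continuity correction (correct = FALSE), the X-squared statistic from prop.test() should equal the square of a z-score whose standard error is computed from the expected proportion rather than the observed one. The variable names below are illustrative and not from the original analysis.

```r
x <- 7             # number of patients with an MI event
n <- 23            # number of patients seen
p_expected <- 0.27 # annual average proportion of MI events
p_observed <- x / n

# z-score with the standard error computed under the null (uses p_expected)
se_null <- sqrt(p_expected * (1 - p_expected) / n)
z_null <- (p_observed - p_expected) / se_null

# prop.test() without the continuity correction
res <- prop.test(x = x, n = n, p = p_expected, correct = FALSE)

# The chi-squared statistic should equal the squared z-score,
# and the two p-values should agree
c(z_squared = z_null^2, x_squared = unname(res$statistic))
c(p_from_z = 2 * pnorm(-abs(z_null)), p_from_prop_test = res$p.value)
```

Note that this standard error differs from the one in the hand calculation, which uses the observed proportion. Together with the continuity correction, that is part of why the by-hand z-score and the prop.test() output are not directly interchangeable.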

Results from the different methods are summarized in the table below.

| Method | Proportion (95% CI) | P-value |
|---|---|---|
| By hand | 30% (11%, 49%) | NS* |
| binom.test() | 30% (13%, 53%) | 0.647 |
| prop.test() | 30% (14%, 53%) | 0.892 |

*NS, not statistically significant based on z-score

Conclusions

If you want to compare the proportion of events at a site to a theoretical average or a reference, then you can use the one-sample z-test of proportions. You can do this by hand or use R to compute the p-value and 95% CI. It’s recommended to use binom.test() when the sample size is small.

Acknowledgements

I learned about the differences between the binom.test() and prop.test() from Antoine Soetewey’s Stats and R article, “One-proportion and chi-square goodness of fit test.”

Disclosures and Disclaimers

This is a work in progress, so expect some changes in the future.

This is for educational purposes only.