In cross-sectional studies, calculating the sample size is essential to estimate the prevalence (or incidence) of an effect with a confidence interval of the required precision. For infinite populations, the required sample size \(n_0\) can be estimated using Cochran's formula, presented in Equation (1).
\[\begin{equation} n_0 = \frac{{Z_{1-{\alpha}/2}}^2\cdot p\cdot(1-p)}{d^2} \tag{1} \end{equation}\]
where \(Z_{1-{\alpha}/2}\) is the \((1-\alpha/2)\) quantile of the standard normal distribution, \(\alpha\) being the allowed type I error; \(p\) is the estimated prevalence of the outcome in the survey, and \(d\) is the precision (also known as the margin of error) that the study must guarantee.
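As a quick numerical check of Equation (1), the sketch below evaluates Cochran's formula in Python. It takes \(Z_{1-\alpha/2}\) from the standard library's `statistics.NormalDist`. The function name `cochran_n0` is our own, and the example inputs (\(p = 0.5\), \(d = 0.0125\), \(\alpha = 0.01\)) are simply values that appear later in this section.

```python
from statistics import NormalDist


def cochran_n0(p: float, d: float, alpha: float) -> float:
    """Sample size for an infinite population, Equation (1)."""
    # Z_{1 - alpha/2}: the (1 - alpha/2) quantile of the standard normal distribution
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z**2 * p * (1 - p) / d**2


# alpha = 0.01 gives z of about 2.5758
print(round(cochran_n0(p=0.5, d=0.0125, alpha=0.01), 1))  # 10615.8
```

For a fixed \(d\), the product \(p(1-p)\) peaks at \(p = 0.5\), which is why \(p = 0.5\) is the conservative choice when the prevalence is unknown.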
The result \(n_0\) can be corrected with Equation (2) to take into account the size \(N\) of a finite population.
\[\begin{equation} n = \frac{n_0}{1 + \frac{(n_0-1)}{N}} \tag{2} \end{equation}\]
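Equation (2) is a one-liner, and a minimal sketch follows. The helper name `finite_population_correction` is hypothetical. Rounding the result up to the next whole participant with `math.ceil` is our assumption (a common convention). The input \(n_0\) is the Equation (1) value for \(p = 0.5\), \(d = 0.0125\), \(\alpha = 0.01\).

```python
import math


def finite_population_correction(n0: float, N: int) -> float:
    """Adjust an infinite-population sample size n0 for a population of size N, Equation (2)."""
    return n0 / (1 + (n0 - 1) / N)


n0 = 10615.83  # Equation (1) with p = 0.5, d = 0.0125, alpha = 0.01
n = finite_population_correction(n0, N=5_000_000)
print(math.ceil(n))  # 10594
```

As \(N\) grows, \((n_0 - 1)/N\) tends to 0 and \(n\) tends to \(n_0\). The correction therefore only matters when \(n_0\) is not negligible compared with \(N\).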
A cross-sectional study was designed in the Valencia Region to gain insight into the side effects of vaccination against COVID-19. In this case, the population size is \(N =\) 5 000 000 and the significance level is set to \(\alpha = 0.01\). As the incidence of the side effects is unknown, we require the margin of error \(d\) to be at most \(0.0125\) when the incidence is higher than \(0.05\), and at most one quarter of the prevalence (\(p/4\)) otherwise. Both rules coincide at \(p = 0.05\). Figure 1 shows the relationship between the (unknown) prevalence of side effects \(p\) and the needed sample size \(n\), for prevalences ranging from 0.01 to 1.
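Written as a formula, the precision requirement above reads as follows. This piecewise form is our restatement, taking \(d\) equal to its upper bound:

\[ d(p) = \min\left(0.0125, \frac{p}{4}\right) = \begin{cases} p/4, & p \le 0.05 \\ 0.0125, & p > 0.05 \end{cases} \]

Substituting \(d(p)\) into Equation (1) gives

\[ n_0(p) = \begin{cases} \dfrac{16\, Z_{1-\alpha/2}^2\,(1-p)}{p}, & p \le 0.05 \\[2ex] 6400\, Z_{1-\alpha/2}^2\, p\,(1-p), & p > 0.05 \end{cases} \]

since \(1/(1/4)^2 = 16\) and \(1/0.0125^2 = 6400\). The first branch decreases as \(p\) increases, so on \([0.01, 0.05]\) it is largest at \(p = 0.01\). The second branch is proportional to \(p(1-p)\) and is largest at \(p = 0.5\). Equation (2) is increasing in \(n_0\), so \(n\) inherits this shape.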
Figure 1: Needed sample size versus prevalence of side effects
The needed sample size in Figure 1 has two local maxima. One is at an incidence of side effects of \(p = 0.01\), with \(n_{0.01} =\) 10 488. The other is at \(p = 0.5\), with \(n_{0.5} =\) 10 594, which is the overall maximum.
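The two values can be reproduced by evaluating Equations (1) and (2) on a grid of prevalences. The sketch below rests on three assumptions. It uses a grid step of 0.01, since the resolution of Figure 1 is not stated. It reads the precision rule as \(d = \min(0.0125, p/4)\). It rounds up with `math.ceil`. All function names are hypothetical.

```python
import math
from statistics import NormalDist

ALPHA = 0.01      # significance level
N = 5_000_000     # population size


def precision(p: float) -> float:
    """Margin of error d: p/4 up to p = 0.05, then 0.0125 (assumed reading of the rule)."""
    return min(0.0125, p / 4)


def sample_size(p: float, alpha: float = ALPHA, population: int = N) -> int:
    """Equation (1) followed by Equation (2), rounded up to a whole participant."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    d = precision(p)
    n0 = z**2 * p * (1 - p) / d**2
    return math.ceil(n0 / (1 + (n0 - 1) / population))


# Prevalence grid 0.01, 0.02, ..., 1.00
grid = [i / 100 for i in range(1, 101)]
sizes = [sample_size(p) for p in grid]

# A grid point is a local maximum if it is not smaller than its neighbours
local_maxima = [
    (grid[i], sizes[i])
    for i in range(len(grid))
    if (i == 0 or sizes[i] >= sizes[i - 1])
    and (i == len(grid) - 1 or sizes[i] >= sizes[i + 1])
]
print(local_maxima)  # [(0.01, 10488), (0.5, 10594)]
print(max(sizes))    # 10594
```

Taking the overall maximum of the grid gives the sample size that satisfies the precision requirement for every prevalence in \([0.01, 1]\).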