Sample size
library(epiR)
Data
p <- 0.50 #expected prevalence
b <- 25 # units to be examined per cluster
EXAMPLE 1
The expected prevalence of disease in the population is 0.5. We wish to conduct a survey, sampling 25 persons per school. No data are available to provide an estimate of rho, though we suspect the intra-cluster correlation for this disease to be moderate. We wish to be 95% certain that our estimate lies within 10 percentage points of the true population prevalence of disease (an absolute error of 0.10, that is, a relative error of 0.10 / 0.50 = 0.2). How many schools should be sampled?
D <- 4 # assumed design effect, standing in for a "moderate" intra-cluster correlation
rho <- (D - 1) / (b - 1) # rho implied by D, since D = 1 + (b - 1) * rho
cluster_ss <- epi.clustersize(p = p, # expected prevalence
                b = b, # units sampled per school
                rho = rho, # intra-cluster correlation: share of the total variance due to differences between schools; a higher rho means persons within a school are more alike and schools differ more from each other
                epsilon.r = 0.2, # acceptable relative error = absolute error / expected prevalence = 0.10 / 0.50
                conf.level = 0.95)
We need to sample 16 schools (400 samples in total)
Reference: Otte J, Gumm I (1997). Intra-cluster correlation coefficients of 20 infections calculated from the results of cluster-sample surveys. Preventive Veterinary Medicine 31: 147 - 150.
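As a cross-check, the calculation behind Example 1 can be sketched by hand. This is a minimal sketch assuming the standard design-effect approach: compute the simple-random-sample size for the required absolute error, inflate it by D = 1 + (b - 1) * rho, and divide by b. It is not the internal code of epi.clustersize, and the helper name clusters_needed is hypothetical.

```r
# Hypothetical helper: clusters needed to estimate a prevalence with a given
# relative error, using the design-effect approach (not epiR's own code).
clusters_needed <- function(p, b, rho, epsilon.r, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)      # two-sided critical value (about 1.96 for 95%)
  epsilon.a <- epsilon.r * p                # absolute error, e.g. 0.2 * 0.5 = 0.10
  n_srs <- z^2 * p * (1 - p) / epsilon.a^2  # sample size under simple random sampling
  D <- 1 + (b - 1) * rho                    # design effect for equal-sized clusters
  clusters <- ceiling(n_srs * D / b)        # round up to whole clusters
  list(n_srs = n_srs, design_effect = D, clusters = clusters, units = clusters * b)
}

p <- 0.50
b <- 25
D <- 4
rho <- (D - 1) / (b - 1)   # 0.125, the rho implied by an assumed design effect of 4

ex1 <- clusters_needed(p = p, b = b, rho = rho, epsilon.r = 0.2)
ex1$clusters  # 16
ex1$units     # 400
```

Here about 96 persons would suffice under simple random sampling; the design effect of 4 raises this to about 384, or 16 schools of 25 once rounded up to whole clusters.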
EXAMPLE 2
A cross-sectional study is to be carried out to determine the prevalence of a given disease in a population using a two-stage cluster design. We estimate prevalence to be 0.5 and expect rho to be on the order of 0.02. We want to take enough samples to be 95% certain that our estimate of prevalence is within 5 percentage points of the true population value (that is, a relative error of 0.05 / 0.50 = 0.1). Assuming 25 responses from each cluster, how many clusters do we need to sample?
cluster_ss_2 <- epi.clustersize(p = p, # expected prevalence
b = b, # participants per cluster
rho = 0.02,
epsilon.r = 0.1,
conf.level = 0.95)
We need to sample 24 clusters (600 samples in total)
Reference: Bennett S, Woods T, Liyanage WM, Smith DL (1991). A simplified general method for cluster-sample surveys of health in developing countries. World Health Statistics Quarterly 44: 98 - 106.
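The same hand calculation can be applied to Example 2 and extended to show how the number of clusters responds to the cluster size b. This sketch rests on the same assumptions as the textbook design-effect formula (rounding up to whole clusters), and clusters_needed is a hypothetical helper, defined here so the block runs on its own. Because it does not reproduce the internals of epi.clustersize, it returns 23 clusters for Example 2 rather than the 24 reported above; treat it as an approximate cross-check only.

```r
# Hypothetical helper (design-effect approach, not epiR's own code):
# returns the number of clusters, rounded up.
clusters_needed <- function(p, b, rho, epsilon.r, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)
  n_srs <- z^2 * p * (1 - p) / (epsilon.r * p)^2  # simple-random-sample size
  D <- 1 + (b - 1) * rho                          # design effect
  ceiling(n_srs * D / b)
}

# Example 2: p = 0.5, rho = 0.02, relative error 0.1 (absolute error 0.05)
ex2 <- clusters_needed(p = 0.5, b = 25, rho = 0.02, epsilon.r = 0.1)
ex2  # 23

# Trade-off: larger clusters need fewer clusters but more units in total,
# because the design effect grows with b.
sizes <- c(10, 25, 50)
k <- sapply(sizes, function(bb) clusters_needed(0.5, bb, 0.02, 0.1))
data.frame(b = sizes, clusters = k, units = k * sizes)
```

With rho = 0.02 this gives 46, 23 and 16 clusters (460, 575 and 800 units) for b = 10, 25 and 50, which illustrates why adding more clusters is usually more efficient than enlarging each one.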
Sampling procedure (if required)
library(sampling)
Two-stage cluster sampling, using the ‘swissmunicipalities’ data. The variable ‘REG’ (region) has 7 categories and is used as the clustering variable in the first stage; the variable ‘CT’ (canton) has 26 categories and is used as the clustering variable in the second stage. Four regions are selected in the first stage, and one canton is selected from each sampled region in the second stage. Each stage uses simple random sampling without replacement (equal probabilities).
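data(swissmunicipalities)
swiss <- swissmunicipalities                  # avoid naming the data "c", which masks base::c()
swiss <- swiss[order(swiss$REG, swiss$CT), ]  # sort by region, then canton, as in the sampling package's mstage example
# the draw is random; call set.seed() beforehand for a reproducible selection
m <- mstage(swiss,
            stage = list("cluster", "cluster"),
            varnames = list("REG", "CT"),
            size = list(4, rep(1, 4)),          # 4 regions, then 1 canton in each
            method = list("srswor", "srswor"))
The first stage is stored in m[[1]] and the second stage in m[[2]]. The selected regions (from one random draw):
unique(m[[1]]$REG)
[1] 2 3 5 6
The selected cantons:
unique(m[[2]]$CT)
[1] 26 12 16 6
Extract the observed data for the second-stage sample:
x <- getdata(swiss, m)[[2]]
Check the output:
table(x$REG, x$CT)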
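The same two-stage design can also be sketched in base R, which makes the inclusion probabilities explicit. This is a minimal illustration on a small hypothetical frame (the objects frame, region and canton and the region sizes are made up), not the sampling package's implementation. Under equal-probability SRSWOR at both stages, each selected canton has inclusion probability (regions sampled / regions in frame) × (cantons sampled in its region / cantons in that region).

```r
set.seed(1)  # make the random draw reproducible

# Hypothetical frame: 7 regions containing 26 cantons in total
frame <- data.frame(
  region = rep(1:7, times = c(3, 4, 2, 5, 3, 6, 3)),
  canton = 1:26
)

n_regions <- 4  # first-stage sample size (regions)
n_per_reg <- 1  # second-stage sample size (cantons per sampled region)

# Stage 1: simple random sample of regions without replacement
regions <- sort(unique(frame$region))
sampled_regions <- regions[sample.int(length(regions), n_regions)]

# Stage 2: within each sampled region, SRSWOR of cantons
picks <- lapply(sampled_regions, function(r) {
  cantons <- frame$canton[frame$region == r]
  chosen <- cantons[sample.int(length(cantons), n_per_reg)]
  data.frame(region = r,
             canton = chosen,
             # inclusion probability = P(region selected) * P(canton selected | region)
             prob = (n_regions / length(regions)) * (n_per_reg / length(cantons)))
})
sample2 <- do.call(rbind, picks)
sample2
```

Indexing with sample.int() rather than calling sample(cantons, 1) sidesteps a known R pitfall: when given a single number, sample() draws from 1:x instead of returning that value. The reciprocal 1 / prob gives the design weight of each selected canton for later estimation.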