R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

This document is built from one of the MSF survey templates (Nutrition, Vaccination, Mortality). The examples are taken from the mortality survey. The first step in any survey is to determine its goal: is the purpose descriptive or exploratory, and is it focused on the past, the present, or the future? A good next step is to find out as much as possible about the population to be surveyed from existing sources, and then to determine the sample size.

For a more detailed explanation of the templates, visit https://r4epis.netlify.app/training/walk-through/template_setup/. The package itself can be found at https://r4epi.github.io/sitrep/.

Results

Survey inclusion

There are 120 households included across 10 clusters in this survey analysis. Among the 812 individuals excluded from the survey analysis, 695 (85.6%) individuals were excluded due to missing start- or end-dates and 117 (14.4%) were excluded for lack of consent. The reasons for no consent are shown below.

The median number of households per cluster was 11.5, with a range of 8–14. The median number of children per household was 1 (range: 1–4, standard deviation: 0.8). In this survey the non-response due to lack of consent is very high, which can bias the results. Non-response bias can occur when individuals who refuse to take part are systematically different from those who participate fully.

Characteristic                  N = 829
no_consent_reason
  No perceived benefit          13 (1.6%)
  No time                       17 (2.1%)
  Other                         13 (1.6%)
  Previous negative experience  15 (1.8%)
  Missing                       771 (93%)
n (%)

Demographics

The table shows the number of participants/respondents per age cohort in comparison with their proportion in the population. A binomial test with a p-value correction indicates if there is a significant difference between the cohort in the study group and in the population. If you repeat the binomial test for two proportions within the first cohort, you will see approximately the same result.
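As a rough cross-check of the binomial test described above, the following Python sketch (not the R code behind the template) runs an exact two-sided test for the 0-2 cohort, using 13 of 187 respondents and a population share of 1,360 / 20,000 from the demographics table. It applies no p-value correction, and the function names are illustrative assumptions.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_test_two_sided(k: int, n: int, p: float) -> float:
    """Exact two-sided test: add up every outcome that is no more
    likely than the observed one under the null proportion p."""
    observed = binom_pmf(k, n, p)
    # Small relative tolerance so that ties survive rounding error.
    threshold = observed * (1 + 1e-7)
    total = sum(
        binom_pmf(i, n, p) for i in range(n + 1) if binom_pmf(i, n, p) <= threshold
    )
    return min(1.0, total)


# Figures from the demographics table: cohort 0-2.
study_count = 13
study_total = 187
population_share = 1360 / 20000

p_value = binom_test_two_sided(study_count, study_total, population_share)
print(f"p-value for cohort 0-2: {p_value:.3f}")
```

A large p-value here means the share of the 0-2 cohort in the study group is compatible with its share in the population.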

Another way to judge the significance is with a z-test for two proportions using a calculator, such as this one. https://epitools.ausvet.com.au/statisticshome.
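For readers without access to an online calculator, a z-test for two proportions can be sketched in a few lines of Python. The example below uses the 45+ cohort (53 of 187 in the study, 2,644 of 20,000 in the population) and a pooled standard error; it treats the two groups as independent samples, which is a simplifying assumption, and the function name is illustrative.

```python
from math import erfc, sqrt


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Return the z statistic and two-sided p-value for H0: p1 == p2."""
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    # Two-sided tail probability of the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


z, p_value = two_proportion_z_test(53, 187, 2644, 20000)
print(f"z = {z:.2f}, p = {p_value:.2g}")
```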

In this case, the population consists of the citizens of 10 imaginary villages in 2 districts. In other cases, surveys are based on a population standard as a reference.

The World Standard Population is based on the populations of 46 countries and was developed in 1960. There are many “standard” populations. For example, the website of NHS Scotland provides informative details on the European Standard Population, World Standard Population, and Scotland Standard Population. You can find more information here. https://www.opendata.nhs.scot/dataset/standard-populations

Age group  Study population (n)  Study population (%)  Source population (n)  Source population (%)  P-value
0-2        13                    7.0                   1,360.0                6.8                    0.884
15-29      40                    21.4                  5,520.0                27.6                   0.06
3-14       43                    23.0                  7,244.0                36.2                   <0.001
30-44      38                    20.3                  3,232.0                16.2                   0.135
45+        53                    28.3                  2,644.0                13.2                   <0.001

Age-pyramid

Unweighted

This age pyramid shows the study population divided over two health districts. The population is spread over four villages in two districts belonging to the same region.

Age groups

In the database, there are two variables: one for age in years and one for age in months for those who are less than 12 months old. In the age_group variable, the first table aggregates those less than 12 months old into the first cohort for ages 0 to 2 years.

age_group / sex  Female      Male        Total
0-2              5 (2.7%)    8 (4.3%)    13 (7.0%)
15-29            20 (11%)    20 (11%)    40 (21%)
3-14             14 (7.5%)   29 (16%)    43 (23%)
30-44            16 (8.6%)   22 (12%)    38 (20%)
45+              24 (13%)    29 (16%)    53 (28%)
Total            79 (42%)    108 (58%)   187 (100%)

age_category / sex  Female      Male        Total
0-2 years           2 (1.1%)    7 (3.7%)    9 (4.8%)
0-5 months          2 (1.1%)    0 (0%)      2 (1.1%)
15-29 years         20 (11%)    20 (11%)    40 (21%)
3-14 years          14 (7.5%)   29 (16%)    43 (23%)
30-44 years         16 (8.6%)   22 (12%)    38 (20%)
45+ years           24 (13%)    29 (16%)    53 (28%)
6-8 months          1 (0.5%)    1 (0.5%)    2 (1.1%)
Total               79 (42%)    108 (58%)   187 (100%)

Survey Design

There are many pros, but perhaps even more cons, to data weighting, and the template didn’t provide an explanation for its use. It may be used because of the amount of non-consent in the survey (Borkowicz, 2023). In an article discussing whether surveys should be weighted, the abstract states, “Surveys have triggered a heated debate regarding their scientific validity. Many authors have adopted weighting methods to enhance the quality of online survey findings, while others did not find an advantage for this method” (Hadad et al., 2022).

In the template, there are four survey designs: simple, stratified, cluster, and the combination of cluster and stratified. Simple random sampling (SRS) requires a comprehensive sampling frame (e.g., a complete list of households inside a refugee camp, or GPS-based sampling in a known area). Cluster-based sampling is most commonly used in combination with sampling villages proportional to population size.

There are several methods to weight survey data to ensure it is representative of the population. If you are only interested in the respondents who took part in the survey, it is not necessary to use weights; you can use descriptive statistics tools. However, to analyze a sample to make predictions about larger populations, you need to use inferential statistics tools and weight cases properly.

Stratified design divides the sample into strata (e.g., age groups, gender, health districts) and assigns weights based on the population distribution. Cluster sampling involves dividing the population into clusters and randomly selecting some clusters to survey. Combining cluster and stratified sampling involves selecting clusters first and then forming strata within each cluster to improve precision.

Characteristic  Female, N = 79  Male, N = 108
age_group
  0-2           5 (6.3%)        8 (7.4%)
  15-29         20 (25%)        20 (19%)
  3-14          14 (18%)        29 (27%)
  30-44         16 (20%)        22 (20%)
  45+           24 (30%)        29 (27%)
n (%)

Age group (weighted)

The table shows the age groups weighted by strata.

Characteristic  District A Weighted Count (n), N = 10,000  District B Weighted Count (n), N = 10,000
age_group
  0-2           680 (6.8%)                                 680 (6.8%)
  15-29         2,760 (28%)                                2,760 (28%)
  3-14          3,622 (36%)                                3,622 (36%)
  30-44         1,616 (16%)                                1,616 (16%)
  45+           1,322 (13%)                                1,322 (13%)
n (%)

Registered causes of illness

The next two tables show the reported causes of illness. In the first table, age_group 0-2 contains one person who had diarrhoea, and this person is female. This means that 1 in 8 females in this age_group had this illness. Weighted by strata, there are 680 females in the first age_group. In the second table, weighted for strata, there are 85 persons with diarrhoea: 1/8 = 0.125, and 0.125 x 680 gives an estimate of 85 females with diarrhoea in the whole population of 20,000.
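The arithmetic behind the weighted count can be written out as a small Python sketch (not the template's R code). The figures come from the paragraph above; the variable names are illustrative assumptions.

```python
# Figures quoted in the text for age_group 0-2.
cases_in_sample = 1          # one respondent with diarrhoea
sample_group_size = 8        # respondents in the same group
weighted_group_size = 680    # size of that group after weighting by strata

# Share of the group with the illness, as observed in the sample.
sample_proportion = cases_in_sample / sample_group_size

# Each respondent stands for this many persons in the population.
weight_per_respondent = weighted_group_size / sample_group_size

# Weighted estimate of cases in the population.
weighted_cases = sample_proportion * weighted_group_size

print(sample_proportion, weight_per_respondent, weighted_cases)
```

Both routes give the same result: one case multiplied by a weight of 85 is the same as 12.5% of 680.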

Characteristic                            0-2, N = 13    15-29, N = 40  3-14, N = 43   30-44, N = 38  45+, N = 53
cause_illness
  Diarrhoea                               1 (7.7%)       5 (13%)        3 (7.0%)       3 (7.9%)       8 (15%)
  Don't know                              3 (23%)        6 (15%)        8 (19%)        6 (16%)        6 (11%)
  During delivery                         0 (0%)         3 (7.5%)       5 (12%)        4 (11%)        5 (9.4%)
  During pregnancy                        1 (7.7%)       4 (10%)        3 (7.0%)       2 (5.3%)       2 (3.8%)
  Fever/malaria                           3 (23%)        4 (10%)        5 (12%)        2 (5.3%)       8 (15%)
  Other                                   2 (15%)        4 (10%)        3 (7.0%)       7 (18%)        3 (5.7%)
  Post-partum (0-42 days after delivery)  1 (7.7%)       6 (15%)        3 (7.0%)       4 (11%)        6 (11%)
  Respiratory infection                   1 (7.7%)       3 (7.5%)       4 (9.3%)       5 (13%)        3 (5.7%)
  Trauma/accident                         0 (0%)         2 (5.0%)       4 (9.3%)       2 (5.3%)       2 (3.8%)
  Violence                                1 (7.7%)       3 (7.5%)       5 (12%)        3 (7.9%)       10 (19%)
n (%)

Characteristic                            0-2, N = 1,360    15-29, N = 5,520  3-14, N = 7,244   30-44, N = 3,232  45+, N = 2,644
cause_illness
  Diarrhoea                               85 (6.3%)         633 (11%)         340 (4.7%)        260 (8.0%)        396 (15%)
  Don't know                              510 (38%)         748 (14%)         1,566 (22%)       394 (12%)         293 (11%)
  During delivery                         0 (0%)            403 (7.3%)        738 (10%)         274 (8.5%)        203 (7.7%)
  During pregnancy                        85 (6.3%)         518 (9.4%)        537 (7.4%)        125 (3.9%)        121 (4.6%)
  Fever/malaria                           255 (19%)         575 (10%)         909 (13%)         159 (4.9%)        412 (16%)
  Other                                   170 (13%)         518 (9.4%)        366 (5.0%)        500 (15%)         133 (5.0%)
  Post-partum (0-42 days after delivery)  85 (6.3%)         920 (17%)         631 (8.7%)        673 (21%)         339 (13%)
  Respiratory infection                   85 (6.3%)         460 (8.3%)        598 (8.3%)        308 (9.5%)        160 (6.1%)
  Trauma/accident                         0 (0%)            345 (6.3%)        651 (9.0%)        168 (5.2%)        78 (2.9%)
  Violence                                85 (6.3%)         403 (7.3%)        909 (13%)         370 (11%)         509 (19%)
n (%)

Design Effect

The design effect (deff) is a measure used to assess the efficiency of a sample design compared to a simple random sample. It indicates how much larger the variance of an estimator is under the complex sample design than it would be under a simple random sample of the same size.

The design effect (deff) can vary depending on the complexity of the sample design and the degree of clustering within the sample. Generally, a design effect between 1 and 2 is considered acceptable. This means that the variance of the estimator in the complex sample design is at most twice as large as in a simple random sample.

1 to 1.5: This is usually considered good and indicates a relatively efficient sample design.

1.5 to 2: This is still acceptable, but the sample design is less efficient than a simple random sample.

Above 2: This may indicate significant clustering within the sample, which reduces the efficiency of the sample design. In such cases, it may be necessary to review or adjust the sample design.

It is important to remember that the acceptable value of the design effect depends on the specific context and objectives of the research.
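To make the definition concrete, the Python sketch below computes a design effect as the ratio of a design-based variance to the variance a simple random sample of the same size would give for a proportion. All input numbers are hypothetical and chosen only for illustration; they are not taken from the survey, and this is not the calculation performed by the template.

```python
def srs_variance(p: float, n: int) -> float:
    """Variance of a proportion under simple random sampling."""
    return p * (1 - p) / n


def design_effect(design_variance: float, p: float, n: int) -> float:
    """Ratio of the design-based variance to the SRS variance."""
    return design_variance / srs_variance(p, n)


def effective_sample_size(n: int, deff: float) -> float:
    """Size of a simple random sample with the same precision."""
    return n / deff


# Hypothetical inputs (assumptions, not survey results).
p_hat = 0.10                # estimated proportion
n = 187                     # number of respondents
design_variance = 0.00065   # variance estimated under the complex design

deff = design_effect(design_variance, p_hat, n)
n_eff = effective_sample_size(n, deff)
print(f"deff = {deff:.2f}, effective sample size = {n_eff:.0f}")
```

With these inputs the design effect is about 1.35, so the 187 respondents carry roughly the information of 138 respondents from a simple random sample.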

variable value n deff
cause_illness Diarrhoea 1712 0.9926287
cause_illness Don’t know 3511 1.5785082
cause_illness During delivery 1617 1.2308960
cause_illness During pregnancy 1386 1.3118863
cause_illness Fever/malaria 2310 1.3463442
cause_illness Other 1687 0.9909957
cause_illness Post-partum (0-42 days after delivery) 2648 1.4922421
cause_illness Respiratory infection 1611 1.2570116
cause_illness Trauma/accident 1242 1.4877225
cause_illness Violence 2276 1.3758310

Age pyramid weighted

The weighted data produce a different pyramid from the earlier one in this report. The difference lies mainly in the two age groups (3-14 and 45+) that differ significantly from the population cohort.

Survey designs

The three tables represent the outcomes of the strata design and cluster design. The last table shows the effect of the combined cluster and strata design, which gives the biggest difference compared to your survey outcome. To reproduce the outcome, you need to divide the figures into clusters and age groups and then sum them. For example, in the age group 15-29, females in cluster 1 have 92 persons in the strata design and 233 persons in the cluster design. Multiplying 92 by 233 gives the number of persons in the combined design.

The mathematical formula for the weight in a cluster design is: weight = (clusters available / clusters surveyed) x (households in each cluster / households surveyed in each cluster) x (individuals eligible in each household / individuals interviewed).
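The three ratios in the cluster formula multiply into a single weight per respondent. A minimal Python sketch, with a hypothetical function name and hypothetical input numbers (none of them come from the survey data):

```python
def cluster_weight(
    clusters_available: int,
    clusters_surveyed: int,
    households_in_cluster: int,
    households_surveyed: int,
    individuals_eligible: int,
    individuals_interviewed: int,
) -> float:
    """Inverse selection probability across the three sampling stages."""
    cluster_stage = clusters_available / clusters_surveyed
    household_stage = households_in_cluster / households_surveyed
    individual_stage = individuals_eligible / individuals_interviewed
    return cluster_stage * household_stage * individual_stage


# Hypothetical example: 10 of 40 clusters, 12 of 120 households,
# 2 of 4 eligible individuals interviewed.
weight = cluster_weight(40, 10, 120, 12, 4, 2)
print(weight)  # each respondent stands for this many persons
```

Each factor is the inverse of the selection probability at one stage, which is why the factors are multiplied rather than added.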

Characteristic  Female, N = 10,000  Male, N = 10,000
age_group
  0-2           680 (6.8%)          680 (6.8%)
  15-29         2,760 (28%)         2,760 (28%)
  3-14          3,622 (36%)         3,622 (36%)
  30-44         1,616 (16%)         1,616 (16%)
  45+           1,322 (13%)         1,322 (13%)
n (%)

Characteristic  Female, N = 15,083  Male, N = 21,725
age_group
  0-2           709 (4.7%)          1,329 (6.1%)
  15-29         4,723 (31%)         4,361 (20%)
  3-14          2,469 (16%)         5,537 (25%)
  30-44         2,736 (18%)         4,026 (19%)
  45+           4,446 (29%)         6,473 (30%)
n (%)

Characteristic  Female, N = 1,870,218  Male, N = 1,975,242
age_group
  0-2           83,801 (4.5%)          112,930 (5.7%)
  15-29         613,417 (33%)          576,124 (29%)
  3-14          638,884 (34%)          686,344 (35%)
  30-44         290,529 (16%)          301,875 (15%)
  45+           243,587 (13%)          297,969 (15%)
n (%)

Mortality and mortality rates

The World Health Organization publishes worldwide annual mortality rates per country and analyzes the leading causes of death. There is a clear difference in the leading causes of death between low and high-income countries. The WHO also publishes key facts about traffic injuries and fatalities. Road traffic injuries are currently estimated to be the 8th leading cause of death across all age groups globally and are predicted to become the 7th leading cause of death by 2030.

In epidemiology, the standardized mortality ratio (SMR) is a quantity expressed as either a ratio or percentage, quantifying the increase or decrease in mortality of a study cohort with respect to the general population (Wikipedia).

In the table, there are 12 persons who died during the observation time. This is the absolute figure of persons who died, not a ratio or rate.

age_group / died  FALSE       TRUE        Total
0-2               11 (5.9%)   2 (1.1%)    13 (7.0%)
15-29             39 (21%)    1 (0.5%)    40 (21%)
3-14              41 (22%)    2 (1.1%)    43 (23%)
30-44             33 (18%)    5 (2.7%)    38 (20%)
45+               51 (27%)    2 (1.1%)    53 (28%)
Total             175 (94%)   12 (6.4%)   187 (100%)

Mortality rate

The table shows the mortality rate within the age groups where mortality was > 0. For every age group, the observation days are the denominator. For example, for age group 0-2: 2 / 90 x 10,000 = 222.2 deaths per 10,000 person-days.

age_group     Deaths  Mortality per 10,000 person-days (95% CI)
0-2           2       222.2 (98.9-345.5)
15-29         1       1,428.6 (1428.6-1428.6)
3-14          2       1,666.7 (1233.2-2100.1)
30-44         5       555.6 (326.0-785.1)
45+           2       434.8 (228.3-641.3)
Total Deaths  12

Observation days

The observation days used as the denominator in the previous table.

Mortality per age cohort
age_group deaths total_obstime
0-2 2 90
15-29 1 7
3-14 2 12
30-44 5 90
45+ 2 46
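The point estimates in the mortality rate table follow directly from these deaths and observation days. A short Python sketch (not the template's R code) reproduces them; the confidence intervals are not recomputed here, and the names are illustrative.

```python
# (deaths, total observation days) per age group, from the table above.
observations = {
    "0-2": (2, 90),
    "15-29": (1, 7),
    "3-14": (2, 12),
    "30-44": (5, 90),
    "45+": (2, 46),
}


def rate_per_10000(deaths: int, person_days: int) -> float:
    """Deaths per 10,000 person-days of observation."""
    return deaths / person_days * 10_000


rates = {
    group: round(rate_per_10000(deaths, days), 1)
    for group, (deaths, days) in observations.items()
}

for group, rate in rates.items():
    print(f"{group:>6}: {rate:,.1f} per 10,000 person-days")
```

The very high rates for 15-29 and 3-14 come from their small denominators (7 and 12 observation days), not from a large number of deaths.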

Crude and directly standardized rates

The crude mortality rate does not account for age, gender, or other demographic factors, making it a rough estimate of mortality in a population. Direct standardization uses a common age structure as a standard. This can be an existing population (e.g., the U.S. population in 1999) or a hypothetical one. In the example, the age-specific mortality of the study population is applied to the reference population, which is the source population of 20,000 persons divided over the age cohorts (see Demographics, second table of this report). The table shows district A and district B as used in this study. For each age cohort, the number of deaths is divided by the study population in that cohort and multiplied by the source population in the same cohort. The sum of these figures over all age cohorts, divided by the total source population, gives the direct rate, which is 0.06045 x 100,000 for district A.

Note: to allow a comparison, I added 17 cases of mortality for district B and assigned all 12 mortality cases to district A.

District total_count total_pop value lowercl uppercl confidence statistic method
A 12 187 6045.989 2923.134 10866.41 95% dsr per 100000 Dobson
B 17 187 8776.088 4858.016 14413.96 95% dsr per 100000 Dobson
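The point estimate for district A can be reproduced by hand. The Python sketch below (not the R code used for the table) takes the deaths per age cohort from the mortality table, the study and source populations from the demographics table, and assumes all 12 deaths belong to district A. It recomputes only the value column; the Dobson confidence limits are not reproduced.

```python
# (deaths, study population, source population) per age cohort.
cohorts = {
    "0-2": (2, 13, 1360),
    "15-29": (1, 40, 5520),
    "3-14": (2, 43, 7244),
    "30-44": (5, 38, 3232),
    "45+": (2, 53, 2644),
}

total_source = sum(source for _, _, source in cohorts.values())

# Apply each age-specific death rate to the source population of that cohort.
expected_deaths = sum(
    deaths / study * source for deaths, study, source in cohorts.values()
)

# Directly standardized rate, expressed per 100,000.
dsr_per_100000 = expected_deaths / total_source * 100_000
print(f"{dsr_per_100000:.3f}")
```

Because the source population is used as the standard, cohorts that are under-represented in the study (such as 3-14) count more heavily than they do in the crude figure of 12 / 187.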

Standardized mortality ratio (SMR) or indirectly standardized mortality rate

The standardized mortality ratio (SMR) is a measure used to compare the mortality of a specific population to that of a standard or reference population. It is calculated by dividing the observed number of deaths in the study population by the number of deaths that would be expected if the study population had the same age-specific mortality rates as the reference population. In the example, the figures of district B are the reference. The observed number of deaths is 12 in district A, and the expected number is 17. The reference rate is based on the 17 deaths within the study population of district B (N = 187): ref_rate = 17 / 187 x 100,000 = 9,090.9. The SMR is observed / expected = 12 / 17, or about 0.71. Multiplying it by the reference rate gives value = 12 / 17 x 9,090.9 = 6,417.1, which is the indirectly standardised rate per 100,000 shown in the table.

Note: to allow a comparison, I added 17 cases of mortality for district B.

observed expected ref_rate value lowercl uppercl confidence statistic method
12 17 9090.909 6417.112 3312.037 11210.08 95% indirectly standardised rate per 100000 Byars
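The value and ref_rate columns of this table can be checked with a few lines of Python (a sketch, not the R code behind the table). The inputs are the observed and expected deaths and the district B population quoted above; the Byars confidence limits are not recomputed.

```python
observed_deaths = 12       # district A
expected_deaths = 17       # taken from reference district B
reference_population = 187

# Crude rate in the reference population, per 100,000.
ref_rate = expected_deaths / reference_population * 100_000

# Standardized mortality ratio: observed over expected deaths.
smr = observed_deaths / expected_deaths

# Indirectly standardized rate: SMR scaled by the reference rate.
indirect_rate = smr * ref_rate

print(f"ref_rate = {ref_rate:.3f}")
print(f"SMR = {smr:.3f}")
print(f"indirectly standardised rate = {indirect_rate:.3f}")
```

An SMR below 1 means fewer deaths were observed than expected from the reference.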

Weighted ratios

Weighted ratios are particularly useful in survey analysis because they account for the different probabilities of selection among strata or clusters in the population. By applying weights that reflect the survey design, including stratification and clustering, the estimates become more representative of the entire population, which reduces bias. Direct and indirect ratios may be simpler but are potentially biased. The confidence intervals of the three designs overlap, which indicates that the estimates are broadly consistent with each other. Note that the interval for the cluster design is much wider and that its lower limit is negative. The first table shows the proportion of mortality under each design, and the second table shows the weighted ratios.

##    Design Proportion
## 1  Simple 0.06666667
## 2  Strata 0.08095385
## 3 Cluster 0.09267817
##    Design Died_Numeric Mortality Mortality_Low Mortality_Upp Observation_Time
## 1  Simple           12  35.37736      15.08613      55.66858             3392
## 2  Strata           12  44.32901      16.64496      72.01305             3392
## 3 Cluster           12  49.24855     -14.26282     112.75993             3392
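Two things in this output can be verified with a short Python sketch. First, the mortality for the simple design matches 12 deaths over 3,392 units of observation time scaled by 10,000; the scaling factor is an assumption inferred from the numbers, not stated in the output. Second, the three confidence intervals can be checked pairwise for overlap.

```python
from itertools import combinations

deaths = 12
observation_time = 3392

# Assumed scaling: rate per 10,000 units of observation time.
simple_rate = deaths / observation_time * 10_000

# (lower, upper) limits copied from the output above.
intervals = {
    "Simple": (15.08613, 55.66858),
    "Strata": (16.64496, 72.01305),
    "Cluster": (-14.26282, 112.75993),
}


def overlaps(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True when two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


all_overlap = all(
    overlaps(intervals[x], intervals[y]) for x, y in combinations(intervals, 2)
)

print(f"simple rate = {simple_rate:.5f}")
print(f"all intervals overlap: {all_overlap}")
```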

Models and Odds

Odds are used to calculate the likelihood of a particular event occurring compared to the chance of it not happening. A variant of this is the log odds. These measures are often used in research on risk factors and in statistical models such as logistic regression.
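As an illustration of the difference between a probability, odds, and log odds, the Python sketch below uses the counts from the mortality table in this report (12 died, 175 did not). It is only a worked example; the template itself does not compute these quantities.

```python
from math import log

died = 12
survived = 175
total = died + survived

# Probability: events divided by everyone observed.
probability = died / total

# Odds: events divided by non-events.
odds = died / survived

# Log odds (logit): the scale used by logistic regression.
log_odds = log(odds)

print(f"probability = {probability:.4f}")
print(f"odds        = {odds:.4f}")
print(f"log odds    = {log_odds:.3f}")
```

For rare events the odds and the probability are close to each other, and log odds below zero indicate that the event is less likely than not.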

A related approach is survival analysis (time-to-event analysis). It models events such as mortality over a period of time. An example of this can be found in the Epidemiologist R Handbook.

In science, a combination of these methods and techniques is often used, depending on the research field and the specific questions. For example, in an epidemiological study of a disease, mortality rates can be used to measure the impact of the disease, while odds and log odds can help identify risk factors. In the Shiny dashboard I made about internally displaced persons (IDPs) in Sudan, the statistics page uses log odds to demonstrate the significance of the difference in the male/female ratio between internally displaced persons and the overall population of Sudan.

The mortality survey does not use any of these analytical tools.