Link to this report: https://rpubs.com/ibagur/cati_mca_report

EDA (Exploratory Data Analysis)

How are surveyed households distributed?

Let’s see the distribution of surveyed households by province:
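The code behind this report is not shown, so the following is only a minimal Python sketch of how such a distribution could be tabulated. The `households` list and the helper `province_distribution` are hypothetical, and the counts are made up for illustration.

```python
from collections import Counter

# Hypothetical data: one province label per surveyed household.
households = ["Tete", "Nampula", "Tete", "Zambezia",
              "Cabo Delgado", "Sofala", "Nampula", "Tete"]

def province_distribution(provinces):
    """Return {province: (count, share)} ordered by descending count."""
    counts = Counter(provinces)
    total = sum(counts.values())
    return {p: (n, n / total) for p, n in counts.most_common()}

distribution = province_distribution(households)
for province, (n, share) in distribution.items():
    print(f"{province:<13} {n:>3} {share:6.1%}")
```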

What water sources are used in each province?
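Questions of this kind ("which categories are used in each province?") boil down to a cross-tabulation with row shares. A minimal sketch under assumed data follows: the `records` pairs and the helper `row_shares` are hypothetical, and apart from "Agua canalizada" the water source labels are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical (province, water_source) pairs; not the survey's real data.
records = [
    ("Tete", "Rio"), ("Tete", "Poço"), ("Tete", "Rio"),
    ("Nampula", "Poço"), ("Nampula", "Agua canalizada"),
    ("Sofala", "Agua canalizada"),
]

def row_shares(pairs):
    """Cross-tabulate pairs and normalise each row so its shares sum to 1."""
    table = defaultdict(Counter)
    for group, category in pairs:
        table[group][category] += 1
    return {
        group: {cat: n / sum(counts.values()) for cat, n in counts.items()}
        for group, counts in table.items()
    }

shares = row_shares(records)
for province, row in shares.items():
    print(province, {cat: round(s, 2) for cat, s in row.items()})
```

The same helper would apply to the handwash, sanitation, water treatment and take-away questions by swapping the second element of each pair.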

What handwash methods are used in each province?

What sanitation systems are used in each province?

What water treatments are used in each province?

How are reported suspected cases distributed by province?

What is the consumption of take-away food by province?

MCA (Multiple Correspondence Analysis)

MCA categories correlation plot

This MCA (Multiple Correspondence Analysis) plot helps us understand how different practices and factors are related to each other in households across five provinces concerning suspected cholera cases.

Explanation of the MCA Plot:

  1. Axes: The plot is divided into two main directions: Dimension 1 (horizontal axis) and Dimension 2 (vertical axis). Each axis captures variations in the data; for instance, Dimension 1 captures 18.7% of the variation and Dimension 2 captures 10.4%.

  2. Points and Colors: Each point represents a category from the dataset, such as types of water sources, sanitation practices, or knowledge about cholera prevention. The different colors (although not specifically defined here) likely represent different types of categories, such as practices related to water (blue), knowledge about cholera (green), etc.

  3. Proximity and Association: Categories that are closer to each other on the plot are more often associated in the data. For example, categories that cluster on the right might represent practices or conditions more common in areas with higher risks or reports of cholera (like Tete).

  4. Interpretation of Directions:

Note: This interpretation is based on a visual reading of how the different categories are distributed along each axis.

  • Dimension 1: Could broadly represent the level of proactive health and hygiene awareness, where the negative direction might indicate higher proactive measures and positive values suggest lesser engagement in proactive health behaviors.
  • Dimension 2: Might represent the perceived severity of health risks, with positive values indicating higher direct health risks and corresponding behaviors or interventions, while negative values suggest either more controlled or lower perceived health risks.
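The percentages quoted for each dimension are shares of the total inertia. As a rough illustration of where such numbers come from, the sketch below runs a correspondence analysis of a one-hot indicator matrix using only the Python standard library. It is not the report's code: the toy `rows` are hypothetical, all function names are invented here, and power iteration stands in for the SVD a statistics package would normally use.

```python
import math

# Hypothetical toy survey: (handwash, water treatment, suspected case) per household.
rows = [
    ("Agua", "Nao trata", "Sim"),
    ("Agua", "Nao trata", "Sim"),
    ("Agua e sabao", "Certeza", "Nao"),
    ("Agua e sabao", "Ferve", "Nao"),
    ("Agua", "Ferve", "Sim"),
    ("Agua e sabao", "Certeza", "Nao"),
]

def indicator_matrix(rows):
    """One-hot encode every (question, answer) pair into a 0/1 matrix."""
    cats = sorted({(q, v) for r in rows for q, v in enumerate(r)})
    index = {c: j for j, c in enumerate(cats)}
    Z = [[0.0] * len(cats) for _ in rows]
    for i, r in enumerate(rows):
        for q, v in enumerate(r):
            Z[i][index[(q, v)]] = 1.0
    return Z, cats

def standardized_residuals(Z):
    """Residuals S = D_r^(-1/2) (P - r c^T) D_c^(-1/2) of the indicator matrix."""
    total = sum(map(sum, Z))
    r = [sum(row) / total for row in Z]          # row masses
    c = [sum(col) / total for col in zip(*Z)]    # column masses
    return [[(Z[i][j] / total - r[i] * c[j]) / math.sqrt(r[i] * c[j])
             for j in range(len(c))] for i in range(len(r))]

def top_eigenvalues(S, k=2, iters=500):
    """Leading eigenvalues of S^T S via power iteration with deflation."""
    m = len(S[0])
    A = [[sum(row[a] * row[b] for row in S) for b in range(m)] for a in range(m)]
    values = []
    for _ in range(k):
        v = [1.0 / (j + 1) for j in range(m)]    # arbitrary starting vector
        for _ in range(iters):
            w = [sum(A[a][b] * v[b] for b in range(m)) for a in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(A[a][b] * v[b] for b in range(m)) for a in range(m))
        values.append(lam)
        # Remove the component just found so the next pass finds the next one.
        A = [[A[a][b] - lam * v[a] * v[b] for b in range(m)] for a in range(m)]
    return values

Z, cats = indicator_matrix(rows)
S = standardized_residuals(Z)
total_inertia = sum(x * x for row in S for x in row)
shares = [lam / total_inertia for lam in top_eigenvalues(S)]
print([f"Dim {d + 1}: {s:.1%}" for d, s in enumerate(shares)])
```

With many categories the total inertia is spread over many dimensions, which is why modest percentages on the first two axes are common in MCA.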

MCA clustering plots

Province groups

  1. Color-coded Points: Each color represents a different province, showing the geographical distribution of the data collected. This plot is particularly useful to see if certain provinces have distinct characteristics or practices.

Sanitation groups

Handwash groups

  1. Blue and Green Points: These represent individual households and their reported handwashing method: either just water (blue) or water and soap/ash (green). The green points, clustered more towards the center and left side of the plot, suggest a grouping of households practicing more effective handwashing methods.

  2. Ellipses: The green ellipse encompasses most households that use water and soap/ash, indicating a potentially lower-risk group in terms of hygiene practices, which could correlate with lower susceptibility to or transmission of cholera.

  3. Ellipses and Alignment: The clustering of each province, such as Cabo Delgado (blue), highlights regional variations. Interestingly, the alignment of the Cabo Delgado cluster with the green cluster from the handwashing methods plot suggests that in Cabo Delgado, there might be a prevalence of using water and soap for handwashing, which is a safer practice.

  • Risk Assessment and Planning: Understanding these patterns helps health authorities and community leaders to tailor their efforts more effectively. For instance, areas with less optimal handwashing practices might need more resources or targeted educational programs to improve hygiene and prevent disease spread.
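The ellipses discussed above summarise where each group's points concentrate. The report does not state which ellipse type was drawn, so the sketch below shows just one common construction, an ellipse derived from the sample covariance of a group's 2D coordinates. The function name, the `scale` parameter and the sample points are all assumptions made for illustration.

```python
import math

def covariance_ellipse(points, scale=2.0):
    """Return (center, semi_axes, angle) of an ellipse from the 2x2 sample covariance.

    `scale` multiplies the standard deviation along each principal axis;
    2.0 is an arbitrary illustrative choice, not the report's setting.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    mean = (sxx + syy) / 2
    diff = math.hypot((sxx - syy) / 2, sxy)
    major, minor = mean + diff, max(mean - diff, 0.0)
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)  # orientation of the major axis
    return (mx, my), (scale * math.sqrt(major), scale * math.sqrt(minor)), angle

# Hypothetical (Dim 1, Dim 2) coordinates of households in one handwash group.
group_points = [(-2.0, 0.0), (2.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
center, axes, angle = covariance_ellipse(group_points)
print(center, axes, angle)
```

Two groups whose ellipses share a similar center and orientation occupy the same region of the factor map, which is the kind of alignment described for Cabo Delgado and the soap-using households.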

Water source groups

Water treatment groups

Events group

Suspected case groups

Take-away food groups

MCA factor map plots (combined)

Factor map 1 : handwash method

  • Correlation between Handwashing and Provincial Data: By aligning these plots, we can hypothesize that provinces like Cabo Delgado might have better hygiene practices, as indicated by their overlap with the safer handwashing method (using water and soap). This insight can be crucial for targeted public health interventions and educational campaigns.

Factor map 2 : water treatment

Factor map 3: sanitation

Factor map 4 (combined): water treatment + handwash method + suspected case

MCA after removing high entropy/noisy variables

We will remove the variables that added noise to the dataset (those that did not show a clear pattern):
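The report does not say how "high entropy" was measured. One possible way to score a categorical variable is its normalised Shannon entropy, sketched below. The column contents, the `THRESHOLD` value and the function name are hypothetical, and a high score only flags a near-uniform answer distribution, which is not by itself proof that a variable is noise.

```python
import math
from collections import Counter

def normalized_entropy(values):
    """Shannon entropy of a categorical column divided by log(k), so 0 <= H <= 1."""
    counts = Counter(values)
    k = len(counts)
    if k < 2:
        return 0.0  # a constant column carries no uncertainty
    n = len(values)
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(k)

# Hypothetical columns; answers are illustrative, not the survey's data.
survey = {
    "handwash": ["Agua"] * 9 + ["Agua e sabao"] * 1,
    "events": ["Sim", "Nao"] * 5,
}
scores = {name: normalized_entropy(col) for name, col in survey.items()}

THRESHOLD = 0.95  # assumed cut-off, chosen only for this example
noisy = [name for name, s in scores.items() if s > THRESHOLD]
print(scores, noisy)
```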

After removing these variables, MCA dimensions 1 and 2 together explain almost 40% of the variability in household responses, which is reasonable for a categorical dataset and better than in the previous analysis:

MCA categories correlation plot

  • Dimension 1: Could broadly represent the level of proactive health and hygiene awareness, where values towards the left indicate higher proactive measures and values towards the right suggest lesser engagement in proactive health behaviors.
  • Dimension 2: Might represent the perceived severity of health risks, where values higher up on the plot indicate higher perceived health risks and corresponding behaviors or interventions, while values lower down suggest more controlled or lower perceived health risks.

MCA factor map plots

Factor map 1: handwash method

  • Water Only (Água): Points representing the use of water only for handwashing are more spread and slightly skewed towards the right, suggesting less proactive health behavior. This method might be more common in contexts where soap is less available or awareness about the benefits of soap in handwashing is lower.
  • Water and Soap or Ash (Água e sabao ou cinza): These points are more tightly clustered and located towards the left, indicating that households using this method are generally more proactive about health. This is consistent with global health advice that emphasizes handwashing with soap as a critical behavior for preventing the spread of diseases.
  • The distribution along Dim 2 axis is fairly centered, suggesting that perceived health risks may not be a major differentiator in handwashing behavior in this specific plot, as both methods span a similar range vertically.

Factor map 2: water treatment

  • Certeza (Blue): Positioned further to the left, indicating a higher degree of proactive behavior in water treatment. This placement suggests that households using Certeza are more conscientious about ensuring water safety, consistent with its properties as a commercial water purification product.

  • Cloro (Green): Located more centrally but slightly to the right, suggesting a moderate level of proactive behavior. The vertical placement indicates these households are also responding to perceived water-related health risks but perhaps not as strongly as those using boiling (Ferve).

  • Ferve (Yellow): Positioned higher on Dimension 2 and slightly to the right on Dimension 1. The higher vertical position suggests a strong response to perceived health risks, likely due to the effectiveness of boiling in killing pathogens.

  • Nao trata (Orange): Located to the right and lower on Dimension 2, indicating lesser proactive behavior and a lower perceived health risk. This might reflect a lack of awareness or resources to engage in more effective water treatment practices.

  • Outro (Other; Dark Green): Positioned far to the right and high on Dimension 2, suggesting unusual or less common water treatment methods that are perceived as responding to high health risks but are not among the typical proactive methods.

Factor map 3: sanitation

  • Defecação ao ar livre (Green): Positioned on the right side, suggesting less proactive behavior in sanitation management. Its vertical position around the center implies a moderate perception of health risks, possibly indicating a normalization or lack of alternatives in these areas.

  • Latrinas comunitárias/Partilhada (Light Blue): Slightly towards the left but mostly centered, indicating a somewhat proactive approach, potentially due to the shared responsibility or communal efforts to manage sanitation.

  • SIM com tampa (Dark Blue) and SIM sem tampa (Yellow): Both positioned towards the left, which suggests more proactive behaviors. The “with lid” option is slightly more to the left, possibly indicating a slight preference or perception of this method being more effective or hygienic. Their different vertical positions could reflect varied perceptions of risk, with “with lid” being seen as more secure against potential health hazards.

Factor map 4: take-away food

  • No (Não; Blue): The households that do not eat food prepared outside the home are mostly clustered towards the left, suggesting a more proactive approach in managing dietary habits, possibly to avoid health risks associated with external food sources. This might be indicative of a preference for home-cooked meals which are perceived as safer.

  • Yes (Sim; Green): Positioned more towards the right, these households indicate less proactive behavior regarding food safety. Eating food prepared outside the home might be linked to convenience or socio-economic factors but could also suggest less control over food safety standards and higher risk of foodborne illnesses.

Factor map 5: suspected case

  • No (Não; Blue): These households are dispersed across the plot but predominantly lean towards the left, which may indicate more proactive health behaviors and lower overall association with suspected cholera cases. This could imply good preventive practices or lesser exposure to cholera.

  • Yes (Sim; Green): Concentrated more towards the center and right of the plot, suggesting these households might either be less proactive about health or are in environments where cholera exposure is more likely. Their central to rightward clustering on Dimension 1 could imply varied levels of health behaviors, with some being less effective in preventing disease.

UMAP exploration (Uniform Manifold Approximation and Projection)

Perform UMAP to detect overall pattern groups

Encode the CATI dataset and apply UMAP.

  • This tries to find whether there is an underlying overall pattern in the households’ answers.
  • It roughly identifies 5 groups. The dots represent households or groups of households clustered together. This means that households within each of these groups answered most of the questions in a roughly similar way.
  • Groups that appear more compact and darker, like the lump on the left, indicate a higher similarity in household answers across the whole survey.
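The encoding step is not shown in the report. A minimal sketch of what "encoding" a categorical survey can look like is given below: one-hot vectors plus a Hamming distance between households. The `answers` rows and both helper names are hypothetical. A projection method such as UMAP would then be run on `X`; that step is omitted because it needs a third-party library.

```python
def one_hot(rows):
    """Encode categorical rows as 0/1 vectors, one column per (question, answer) pair."""
    columns = sorted({(q, a) for row in rows for q, a in enumerate(row)})
    position = {col: j for j, col in enumerate(columns)}
    encoded = []
    for row in rows:
        vec = [0] * len(columns)
        for q, a in enumerate(row):
            vec[position[(q, a)]] = 1
        encoded.append(vec)
    return encoded, columns

def hamming(u, v):
    """Fraction of one-hot positions where two households differ."""
    return sum(a != b for a, b in zip(u, v)) / len(u)

# Hypothetical answers: (handwash, water treatment, suspected case).
answers = [
    ("Agua", "Nao trata", "Sim"),
    ("Agua", "Nao trata", "Sim"),
    ("Agua e sabao", "Certeza", "Nao"),
]
X, columns = one_hot(answers)
d_same = hamming(X[0], X[1])   # identical questionnaires
d_diff = hamming(X[0], X[2])   # every answer differs
print(d_same, d_diff)
```

Households that answered most questions the same way end up at a small distance from each other, which is what makes them land in the same compact group in the projection.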

Province groups

Let’s now see how these pre-identified groups align with the different questions asked. For example, we can check whether the identified groups are aligned with the geographical distribution of the surveyed households.

  1. Provincial Clustering: The plot shows that in some cases the distinct clusters clearly align with a province (especially Cabo Delgado, Tete and, to a certain extent, Zambezia), which suggests that there are significant differences in the categorical variables across provinces. This could be due to unique socio-economic, environmental, or cultural factors inherent to each province.

  2. Spatial Relationships: There is some proximity between certain provinces in the UMAP space (the big group at the bottom), which might indicate similar conditions or behaviors shared among Nampula and Zambezia.

  3. Isolated and Overlapping Areas: Some provinces, like Cabo Delgado or Tete, are relatively isolated in the UMAP space, suggesting unique characteristics. In contrast, Nampula and Zambezia have overlapping areas, which might indicate similarities in some of the measured variables. Sofala also appears somewhat differentiated, but we need to consider that the number of households surveyed in Sofala is significantly smaller than in the other provinces.
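The visual alignment between groups and provinces described above can also be put into numbers. The sketch below computes, for each group, its dominant province and the share of the group that province covers. It assumes every household already has a group label (for example from a clustering step the report does not show). The labels, the data and the function name are hypothetical.

```python
from collections import Counter, defaultdict

def cluster_purity(cluster_labels, provinces):
    """For each cluster, return (dominant province, share of the cluster it covers)."""
    members = defaultdict(Counter)
    for cluster, province in zip(cluster_labels, provinces):
        members[cluster][province] += 1
    result = {}
    for cluster, counts in members.items():
        province, n = counts.most_common(1)[0]
        result[cluster] = (province, n / sum(counts.values()))
    return result

# Hypothetical group labels and provinces, one entry per household.
clusters = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2]
provinces = (["Cabo Delgado"] * 3
             + ["Tete", "Tete", "Tete", "Zambezia"]
             + ["Nampula", "Nampula", "Zambezia", "Zambezia", "Nampula"])

purity = cluster_purity(clusters, provinces)
for cluster, (province, share) in purity.items():
    print(f"group {cluster}: {province} ({share:.0%})")
```

A share close to 100% corresponds to an isolated, single-province group, while a share near 50% corresponds to an overlapping area such as the one shared by Nampula and Zambezia.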

Suspected case groups

We now highlight the households that reported a suspected case (“Yes”), to see how these answers align with the pre-identified clusters.

  1. Cluster Patterns: There is a clear alignment with the groups that correspond to Cabo Delgado and Tete. This is not a surprise, as most of the households in Cabo Delgado answered ‘No’ to this question, whereas most households in Tete answered ‘Yes’. For the remaining groups the alignment is less clear, suggesting that in the other provinces households gave mixed answers to this question.

  2. Overlap Areas: While some clusters are distinct, there are areas of overlap, particularly where blue and red points are close. This could indicate similar socio-demographic or environmental conditions that might either contribute to or mitigate the risk of cholera, regardless of the outcome.

  3. Isolated Groups: The isolated group at the bottom left, predominantly blue (Não), might represent a unique subset of the data with specific characteristics that are very different from the rest (Cabo Delgado).

Suspected cases groups by province

We can carry out the same analysis split by province, to better see the situation in each province. Different patterns might arise if the analysis were performed at a lower level (district), but for simplicity we stick to province level here.

  • Cabo Delgado and Tete show clear patterns, as reported before. In Tete most households answered ‘Yes’, but the points are more spread out, which means the households may be similar on the other questions, though not as tightly as in Cabo Delgado.
  • Cabo Delgado clearly shows two separate groups of households. The predominant one comprises households that answered ‘No’ and looks very compact, suggesting that those households share a high degree of similarity in their answers to the CATI survey. There is also a separate, smaller but equally compact group of households that answered ‘Yes’, suggesting relatively high similarity as well. It could be interesting to perform this analysis at district level, to check how the ‘suspected case’ answers relate to districts.

Zambezia:

  • We previously saw, in both the MCA and the initial UMAP analysis, that the provinces clearly present different patterns and contextual dynamics.
  • For that reason, the next analysis looks at each province separately, in order to better spot patterns specific to that province.
  • We will also focus on the relation between the different indicators and the ‘suspected case’ indicator, though the actual scope of this indicator’s outcome is not entirely clear.

Overall grouping

  • Overall, there appear to be two distinct groups of households sharing similar characteristics in relation to the answers provided. Next, we can study how these two distinct groups align with specific indicator outcomes

Suspected case groups

  • The previously identified groups seem also to be relatively well aligned with the ‘suspected case’ indicator.
  • Most of the households that answered ‘No’ also seem to share similar answers to other questions. The same holds for the households that answered ‘Yes’, though there are also some ‘No’ households within this group.

Other indicators groupings (multiplot)

Let’s check how these underlying groups align with other indicators

  • Sanitation: Various sanitation methods appear both in households that reported suspected cases and in those that did not, including ‘defecaçao por livre’. However, in the main group of households that reported suspected cases, ‘latrinas comunitarias’ seem to be more common, whereas ‘latrina com tampa’ is reported less often.

  • Water sources: Various water sources also appear in both main groups. However, ‘agua canalizada’ is clearly absent from the group of households that mostly reported suspected cases, which also share some commonalities.

  • Handwash method: ‘agua e sabao’ is clearly underused in the group that reported suspected cases, where mostly only ‘agua’ is used. In the group that reported no suspected cases both methods are used. Water sources may be safer in this second group, whose households also mostly differ from the first group on other indicators (for example, they are located in other districts).

  • External food consumption: There appears to be little external food consumption in the group that reported cases, whereas it is more common in the other group. It might be that the second group has access to better water overall, combined with better handwashing. However, the question itself was not clear (does it mean food bought externally or just prepared outside the household?).
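Comparisons like the ones above ("less use of X in the group that reported cases") can be checked by contrasting category shares between the two groups. The following is a minimal sketch with made-up answers. The group contents and the helper `share_difference` are hypothetical and do not reproduce the survey's figures.

```python
from collections import Counter

def share_difference(group_a, group_b):
    """Per-category difference in answer shares (group_a minus group_b)."""
    ca, cb = Counter(group_a), Counter(group_b)
    categories = sorted(set(ca) | set(cb))
    return {c: ca[c] / len(group_a) - cb[c] / len(group_b) for c in categories}

# Hypothetical handwash answers in the two household groups.
cases_group = ["agua"] * 4 + ["agua e sabao"] * 1
no_cases_group = ["agua"] * 2 + ["agua e sabao"] * 3

diff = share_difference(cases_group, no_cases_group)
for category, delta in diff.items():
    print(f"{category:<13} {delta:+.0%}")
```

A large positive or negative difference marks an indicator category worth inspecting in the multiplot. With small groups the differences are unstable and should be read with caution.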

Nampula:

Overall grouping

  • Overall, there appear to be two distinct groups of households sharing similar characteristics in relation to the answers provided. Next, we can study how these two distinct groups align with specific indicator outcomes

Suspected case groups

  • The previously identified groups seem also to be relatively well aligned with the ‘suspected case’ indicator.
  • Most of the households that answered ‘No’ also seem to share similar answers to other questions. The same holds for the households that answered ‘Yes’, though there are also some ‘No’ households within this group.

Other indicators groupings (multiplot)

Let’s check how these underlying groups align with other indicators

  • Sanitation: TBC

  • Water sources: TBC

  • Handwash method: TBC

  • External food consumption: TBC

Tete:

Overall grouping

  • Overall, there appear to be two distinct groups of households sharing similar characteristics in relation to the answers provided. Next, we can study how these two distinct groups align with specific indicator outcomes

Suspected case groups

TBC

Other indicators groupings (multiplot)

Let’s check how these underlying groups align with other indicators

  • Sanitation: TBC

  • Water sources: TBC

  • Handwash method: TBC

  • External food consumption: TBC

Cabo Delgado:

Overall grouping

  • Overall, there appear to be two distinct groups of households sharing similar characteristics in relation to the answers provided. Next, we can study how these two distinct groups align with specific indicator outcomes

Suspected case groups

  • The previously identified groups seem also to be relatively well aligned with the ‘suspected case’ indicator.
  • Most of the households that answered ‘No’ also seem to share similar answers to other questions. The same holds for the households that answered ‘Yes’, though there are also some ‘No’ households within this group.

Other indicators groupings (multiplot)

Let’s check how these underlying groups align with other indicators

  • Sanitation: TBC

  • Water sources: TBC

  • Handwash method: TBC

  • External food consumption: TBC

Sofala:

Overall grouping

  • Overall, there appear to be roughly three groups of households sharing similar characteristics in relation to the answers provided. Next, we can study how these groups align with specific indicator outcomes

Suspected case groups

Other indicators groupings (multiplot)

Let’s check how these underlying groups align with other indicators

  • Sanitation: TBC

  • Water sources: TBC

  • Handwash method: TBC

  • External food consumption: TBC