Link to this report: https://rpubs.com/ibagur/cati_mca_report
Let’s see the distribution of surveyed households by province:
This MCA (Multiple Correspondence Analysis) plot helps us understand how different practices and factors are related to each other in households across five provinces concerning suspected cholera cases.
Explanation of the MCA Plot:
Axes: The plot is divided into two main directions: Dimension 1 (horizontal axis) and Dimension 2 (vertical axis). Each axis captures variations in the data; for instance, Dimension 1 captures 18.7% of the variation and Dimension 2 captures 10.4%.
Points and Colors: Each point represents a category from the dataset, such as types of water sources, sanitation practices, or knowledge about cholera prevention. The different colors (although not specifically defined here) likely represent different types of categories, such as practices related to water (blue), knowledge about cholera (green), etc.
Proximity and Association: Categories that lie closer to each other on the plot are more often associated in the data. For example, categories that cluster on the right might represent practices or conditions more common in areas with higher risks or reports of cholera (like Tete).
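To make the axes and percentages described above more concrete, here is a minimal sketch of MCA computed as a correspondence analysis of the one-hot indicator matrix. It is not the code used for this report; the function name `mca`, the toy column names and the toy answers are all hypothetical, and the percentages it prints are unrelated to the 18.7% and 10.4% quoted above.

```python
import numpy as np
import pandas as pd


def mca(df: pd.DataFrame, n_components: int = 2):
    """Correspondence analysis of the one-hot indicator matrix of `df`."""
    # One indicator (0/1) column per category of every variable.
    Z = pd.get_dummies(df.astype("category")).astype(float)
    P = Z.to_numpy() / Z.to_numpy().sum()   # correspondence matrix
    r = P.sum(axis=1)                       # row (household) masses
    c = P.sum(axis=0)                       # column (category) masses
    # Standardized residuals from the independence model r * c^T.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    inertia = s ** 2                        # principal inertia per dimension
    explained = inertia / inertia.sum()     # share of variation per dimension
    k = n_components
    # Principal coordinates: what is plotted on Dimension 1 and Dimension 2.
    households = pd.DataFrame((U[:, :k] * s[:k]) / np.sqrt(r)[:, None], index=df.index)
    categories = pd.DataFrame((Vt.T[:, :k] * s[:k]) / np.sqrt(c)[:, None], index=Z.columns)
    return households, categories, explained


# Hypothetical toy answers, for illustration only.
toy = pd.DataFrame({
    "water_treatment": ["Certeza", "Ferve", "Nao trata", "Certeza", "Nao trata", "Cloro"],
    "handwashing": ["soap", "soap", "water", "soap", "water", "water"],
    "outside_food": ["Nao", "Nao", "Sim", "Nao", "Sim", "Sim"],
})
households, categories, explained = mca(toy)
print(categories.round(2))                 # one point per category
print((100 * explained[:2]).round(1))      # % of variation on Dim 1 and Dim 2
```

Categories that tend to be chosen by the same households receive similar coordinates, which is why proximity on the plot can be read as association.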
Interpretation of Directions:
Note: The interpretation is visual, based on how prevalent each category is and how the categories are distributed along each axis.
Blue and Green Points: These represent individual households and how they reported their handwashing methods—either using just water (blue) or water and soap/cinders (green). The green points clustered more towards the center and left side of the plot suggest a grouping of households practicing more effective handwashing methods.
Ellipses: The green ellipse encompasses most households that use water and soap/cinders, indicating a potentially lower risk group in terms of hygiene practices, which could correlate with lower susceptibility or transmission of cholera.
Ellipses and Alignment: The clustering of each province, such as Cabo Delgado (blue), highlights regional variations. Interestingly, the alignment of the Cabo Delgado cluster with the green cluster from the handwashing methods plot suggests that in Cabo Delgado, there might be a prevalence of using water and soap for handwashing, which is a safer practice.
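The report does not state how the ellipses were drawn. One common choice is a normal-theory ellipse that contains roughly 95% of a group's points, and the sketch below computes such an ellipse under that assumption. The function name `concentration_ellipse` and the simulated coordinates are hypothetical.

```python
import numpy as np


def concentration_ellipse(points, level=0.95):
    """Return centre, semi-axis lengths (major first) and rotation in radians."""
    pts = np.asarray(points, dtype=float)
    centre = pts.mean(axis=0)
    cov = np.cov(pts, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    # The chi-square quantile with 2 degrees of freedom is -2 * ln(1 - level).
    scale = -2.0 * np.log(1.0 - level)
    semi_axes = np.sqrt(eigvals[::-1] * scale)    # major axis first
    major = eigvecs[:, -1]                        # direction of largest spread
    angle = np.arctan2(major[1], major[0])
    return centre, semi_axes, angle


# Simulated 2-D coordinates standing in for one group of households.
rng = np.random.default_rng(0)
cloud = rng.multivariate_normal([0.5, -0.2], [[0.3, 0.1], [0.1, 0.2]], size=2000)
centre, semi_axes, angle = concentration_ellipse(cloud)
print(centre.round(2), semi_axes.round(2), round(float(angle), 2))
```

Two groups whose ellipses overlap strongly, as described for Cabo Delgado and the soap-using households, occupy the same region of the MCA plane; the overlap by itself does not establish that the same households are involved.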
We will remove the variables that added noise to the dataset (those that did not show a clear pattern):
After removing these variables, MCA dimensions 1 and 2 together explain almost 40% of the variability in household responses, which is acceptable for a categorical dataset and an improvement on the previous analysis:
Certeza (Blue): Positioned further to the left, indicating a higher degree of proactive behavior in water treatment. This placement suggests that households using Certeza are more conscientious about ensuring water safety, consistent with its properties as a commercial water purification product.
Cloro (Green): Located more centrally but slightly to the right, suggesting a moderate level of proactive behavior. The vertical placement indicates these households are also responding to perceived water-related health risks but perhaps not as strongly as those using boiling (Ferve).
Ferve (Yellow): Positioned higher on Dimension 2 and slightly to the right on Dimension 1. The higher vertical position suggests a strong response to perceived health risks, likely due to the effectiveness of boiling in killing pathogens.
Nao trata (Orange): Located to the right and lower on Dimension 2, indicating lesser proactive behavior and a lower perceived health risk. This might reflect a lack of awareness or resources to engage in more effective water treatment practices.
Outro (Other; Dark Green): Positioned far to the right and high on Dimension 2, suggesting unusual or less common water treatment methods that are perceived as responding to high health risks but are not among the typical proactive methods.
Defecação ao ar livre (Green): Positioned on the right side, suggesting less proactive behavior in sanitation management. Its vertical position around the center implies a moderate perception of health risks, possibly indicating a normalization or lack of alternatives in these areas.
Latrinas comunitárias/Partilhada (Light Blue): Slightly towards the left but mostly centered, indicating a somewhat proactive approach, potentially due to the shared responsibility or communal efforts to manage sanitation.
SIM com tampa (Dark Blue) and SIM sem tampa (Yellow): Both positioned towards the left, which suggests more proactive behaviors. The “with lid” option is slightly more to the left, possibly indicating a slight preference or perception of this method being more effective or hygienic. Their different vertical positions could reflect varied perceptions of risk, with “with lid” being seen as more secure against potential health hazards.
No (Não; Blue): The households that do not eat food prepared outside the home are mostly clustered towards the left, suggesting a more proactive approach in managing dietary habits, possibly to avoid health risks associated with external food sources. This might be indicative of a preference for home-cooked meals which are perceived as safer.
Yes (Sim; Green): Positioned more towards the right, these households indicate less proactive behavior regarding food safety. Eating food prepared outside the home might be linked to convenience or socio-economic factors but could also suggest less control over food safety standards and higher risk of foodborne illnesses.
No (Não; Blue): These households are dispersed across the plot but predominantly lean towards the left, which may indicate more proactive health behaviors and lower overall association with suspected cholera cases. This could imply good preventive practices or lesser exposure to cholera.
Yes (Sim; Green): Concentrated more towards the center and right of the plot, suggesting these households might either be less proactive about health or live in environments where cholera exposure is more likely. Their central to rightward clustering on Dimension 1 could imply varied levels of health behaviors, with some being less effective in preventing disease.
UMAP (Uniform Manifold Approximation and Projection) is an unsupervised learning technique that reduces the complexity of high-dimensional data (high number of indicators) to lower dimensions for easier visualization and analysis. It maintains the overall structure and internal relations of the data, helping reveal inherent patterns and clusters of data with similar characteristics.
Unlike in MCA, the UMAP coordinates carry no direct interpretation; they only serve to place and separate the identified groupings and clusters.
We will focus part of the analysis on the relation between the different indicators and the ‘suspected case’ indicator, although it is not entirely clear what this indicator actually covers.
Encode the CATI dataset and apply UMAP.
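As an illustration of this step, the sketch below one-hot encodes categorical answers and defines a helper that passes them to UMAP. It is not the code behind this report: the column names, the toy answers, the `jaccard` metric and the seed are all assumptions, and `embed_umap` relies on the third-party `umap-learn` package.

```python
import pandas as pd


def encode_categorical(df: pd.DataFrame) -> pd.DataFrame:
    """One-hot encode every column: one 0/1 indicator per answer option."""
    return pd.get_dummies(df.astype("category")).astype(int)


def embed_umap(encoded: pd.DataFrame, metric: str = "jaccard", seed: int = 42):
    """Project the encoded answers to 2-D (requires the umap-learn package)."""
    import umap  # imported lazily so the encoding step works without it

    # The metric and the seed are assumptions, not the settings of this report.
    reducer = umap.UMAP(n_components=2, metric=metric, random_state=seed)
    return reducer.fit_transform(encoded.to_numpy())


# Hypothetical toy answers, for illustration only.
toy = pd.DataFrame({
    "province": ["Tete", "Cabo Delgado", "Nampula", "Tete"],
    "handwashing": ["agua", "agua e sabao", "agua", "agua"],
})
encoded = encode_categorical(toy)
print(encoded.columns.tolist())
```

On a real dataset, `embed_umap(encoded)` would return one pair of coordinates per household, which can then be colored by any indicator. The toy table is too small for a meaningful embedding, so the call is left out here.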
Let’s now see how these pre-identified groups align with the different questions asked. For example, we can check whether the identified groups align with the geographical distribution of the surveyed households.
Provincial Clustering: The plot shows that some of the distinct clusters clearly align with a province (especially Cabo Delgado, Tete and, to a certain extent, Zambezia), which suggests that there are significant differences in the categorical variables across provinces. This could be due to unique socio-economic, environmental, or cultural factors inherent to each province.
Spatial Relationships: There is some proximity between certain provinces in the UMAP space (the big group at the bottom), which might indicate similar conditions or behaviors shared among Nampula and Zambezia.
Isolated and Overlapping Areas: Some provinces, like Cabo Delgado or Tete, are relatively isolated in the UMAP space, suggesting unique characteristics. In contrast, provinces like Nampula and Zambezia have overlapping areas, which might indicate similarities in some of the measured variables. Sofala also appears somewhat differentiated, but the number of households surveyed in Sofala is considerably smaller than in the other provinces.
We highlight now the households that reported a suspected case “Yes”, to see how these answers align with the pre-identified clusters.
Cluster Patterns: There is a clear alignment with the groups that also correspond to Cabo Delgado and Tete. This is not a surprise, as most households in Cabo Delgado answered ‘No’ to this question, whereas most households in Tete answered ‘Yes’. For the remaining groups the alignment is less clear, suggesting that households in the other provinces gave mixed answers to this question.
Overlap Areas: While some clusters are distinct, there are areas of overlap, particularly where blue and red points are close. This could indicate similar socio-demographic or environmental conditions that might either contribute to or mitigate the risk of cholera, regardless of the outcome.
Isolated Groups: The isolated group at the bottom left, predominantly blue (Não), might represent a unique subset of the data with characteristics that are very different from the rest (Cabo Delgado).
We might carry out the same analysis split by province, to better see the situation in each province. Different patterns might arise if the analysis were performed at a lower level (district), but for simplicity we stick to the province level here.
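The link between province and the suspected-case answer noted above can be checked directly with a row-normalized cross-tabulation. The sketch below uses made-up answers and hypothetical column names (`province`, `suspected_case`); it is not taken from the survey data.

```python
import pandas as pd

# Made-up answers, for illustration only.
toy = pd.DataFrame({
    "province": ["Tete", "Tete", "Tete", "Cabo Delgado", "Cabo Delgado",
                 "Cabo Delgado", "Nampula", "Nampula"],
    "suspected_case": ["Sim", "Sim", "Não", "Não", "Não", "Não", "Sim", "Não"],
})

# Share of each answer within every province (each row sums to 1).
shares = pd.crosstab(toy["province"], toy["suspected_case"], normalize="index")
print(shares.round(2))
```

A province whose row is dominated by a single answer will tend to form a cluster that is homogeneous for this indicator, while a province with mixed shares will not.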
Let’s check how these underlying groups align with other indicators
Sanitation: Various sanitation methods, including ‘defecaçao por livre’, are used both in households that reported suspected cases and in those that did not. However, in the main group of households that reported suspected cases, ‘latrinas comunitarias’ seem to be more common, whereas ‘latrina com tampa’ seems to be reported less often.
Water sources: Various types of water sources are likewise used in both main groups. However, there is a clear lack of ‘agua canalizada’ in the group of households that mostly reported suspected cases, a group whose households also share other characteristics.
Handwash method: ‘agua e sabao’ is clearly underused in the group that reported suspected cases, where mostly only ‘agua’ is used. In the group that reported no suspected cases both methods are used. Water sources may be safer in this second group, which also differs from the first group in other indicators (for example, its households are located in other districts).
External food consumption: There appears to be little external food consumption in the group that reported cases, whereas it is more common in the other group. It might be that the second group has access to better water overall, combined with better handwashing. However, the question itself was not clear (does it mean food bought externally or just prepared outside the household?).
Let’s check how these underlying groups align with other indicators
Sanitation: TBC
Water sources: TBC
Handwash method: TBC
External food consumption: TBC
TBC
Let’s check how these underlying groups align with other indicators
Sanitation: TBC
Water sources: TBC
Handwash method: TBC
External food consumption: TBC
Let’s check how these underlying groups align with other indicators
Sanitation: TBC
Water sources: TBC
Handwash method: TBC
External food consumption: TBC
Let’s check how these underlying groups align with other indicators
Sanitation: TBC
Water sources: TBC
Handwash method: TBC
External food consumption: TBC