The global burden of HIV/AIDS continues to challenge public health systems worldwide, with marked disparities across regions, economic groups, and genders.
This analysis brings a fresh perspective by integrating:
The main goal is to uncover nuanced insights that can drive targeted interventions.
This analysis aims to deliver the following key insights:
Provide a detailed understanding of gender-specific disparities in HIV/AIDS prevalence and AIDS-related deaths.
Highlight the role of economic factors, such as GDP per capita, in shaping these disparities.
Explore potential anomalies, such as high prevalence rates coinciding with low AIDS-related deaths.
Determine whether gender differences in prevalence or deaths are statistically significant.
Identify priority areas for policy interventions and resource allocation.
Focus on countries with:
- Pronounced gender disparities, or
- Observable anomalies (e.g., high prevalence but low
death rates).
These findings aim to support evidence-based decision-making and contribute to addressing gender inequalities in the global HIV/AIDS burden.
The analysis is based on data from two key sources:
1. UNAIDS (ONUSIDA): For HIV/AIDS indicators such as
prevalence and AIDS-related deaths.
2. World Bank: For GDP data adjusted for
purchasing power parity (PPP).
You can access the datasets and the Jupyter
Notebook here:
Download
the ZIP file
The dataset used in this analysis comprises 11 columns and 117 rows, representing countries with up-to-date data for 2023. The data provides insights into various HIV/AIDS indicators, gender disparities, and economic factors across different countries. The key variables included are as follows:
# Load necessary libraries
library(dplyr)
library(ggplot2)
library(sf)
library(DT)
library(htmlwidgets)
library(htmltools)
library(FSA)
library(ggsignif)
library(corrplot)
library(psych)
library(tidyr)
# Load dataset
data <- read.csv("data_AIDS_and_GDP_cleaned.csv")
# Check for missing values
describe_result <- describe(data)
describe_result$na <- sapply(data, function(x) sum(is.na(x)))
result <- describe_result[, "na", drop = FALSE]
print(result)
## na
## Country* 0
## PeopleWithAIDS_All_adults_2023 0
## PeopleWithAIDS_Female_adults_2023 2
## PeopleWithAIDSe_Male_adults_2023 2
## AIDS_Prevalence_All_adults_2023 0
## AIDS_Prevalence_Female_adults_2023 2
## AIDS_Prevalence_Male_adults_2023 2
## AIDS_related_deaths_All_adults_2023 0
## AIDS_related_deaths_Female_adults_2023 2
## AIDS_related_deaths_Male_adults_2023 2
## GPD_PCAP_2023 0
Decision
Since only a few values are missing (two rows out of 117), we remove the rows that contain them.
This approach simplifies the analysis and has minimal impact on the overall results, because the missing data represent a very small fraction of the dataset.
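As a rough check on how much data is lost, the share of incomplete rows can be computed before dropping them. The sketch below uses a small made-up data frame (`toy`, with hypothetical column names and values), not the study data; with the real dataset the same calls would be applied to `data`.

```r
# Toy data frame (hypothetical values) with two incomplete rows out of ten
toy <- data.frame(
  country    = paste0("C", 1:10),
  prevalence = c(0.1, 0.2, NA, 0.5, 1.2, 0.3, 0.9, 4.1, 0.2, 0.7),
  deaths     = c(100, 200, 300, NA, 500, 100, 200, 900, 100, 300)
)

# complete.cases() is TRUE for rows that contain no NA at all
incomplete_share <- mean(!complete.cases(toy))
cat("Share of rows with at least one NA:", incomplete_share, "\n")

# na.omit() drops exactly those incomplete rows
toy_clean <- na.omit(toy)
cat("Rows kept:", nrow(toy_clean), "of", nrow(toy), "\n")
```

In this report, the summaries computed after cleaning show n = 115, so 2 of the 117 rows (about 1.7%) are dropped.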
# Handle missing values by removing rows with NA
data_clean <- na.omit(data)
na_count <- sum(is.na(data_clean))
cat("After removing rows with missing values, the number of NA values remaining is:", na_count, "\n")
## After removing rows with missing values, the number of NA values remaining is: 0
Objectives
- Compare HIV/AIDS prevalence and AIDS-related deaths between men and women.
- Identify countries where gender disparities are most pronounced.
# Subset data by gender
women_data <- data_clean %>% select(Country, GPD_PCAP_2023, AIDS_Prevalence_Female_adults_2023, AIDS_related_deaths_Female_adults_2023)
men_data <- data_clean %>% select(Country, GPD_PCAP_2023, AIDS_Prevalence_Male_adults_2023, AIDS_related_deaths_Male_adults_2023)
a- Summarize key statistics for HIV prevalence and deaths by gender
Women key statistics
# Summary statistics for women
women_data_summary <- describe(women_data)
women_data_summary[, c("n", "mean", "sd", "median", "min", "max")]
## n mean sd median min
## Country* 115 58.00 33.34 58.00 1.00
## GPD_PCAP_2023 115 25607.46 28752.09 16062.02 919.91
## AIDS_Prevalence_Female_adults_2023 115 1.46 4.55 0.20 0.10
## AIDS_related_deaths_Female_adults_2023 115 631.30 1035.71 200.00 100.00
## max
## Country* 115.0
## GPD_PCAP_2023 143809.5
## AIDS_Prevalence_Female_adults_2023 32.2
## AIDS_related_deaths_Female_adults_2023 5300.0
Men key statistics
# Summary statistics for men
men_data_summary <- describe(men_data)
men_data_summary[, c("n", "mean", "sd", "median", "min", "max")]
## n mean sd median min
## Country* 115 58.00 33.34 58.00 1.00
## GPD_PCAP_2023 115 25607.46 28752.09 16062.02 919.91
## AIDS_Prevalence_Male_adults_2023 115 1.13 3.00 0.50 0.10
## AIDS_related_deaths_Male_adults_2023 115 732.17 1009.04 500.00 100.00
## max
## Country* 115.0
## GPD_PCAP_2023 143809.5
## AIDS_Prevalence_Male_adults_2023 22.5
## AIDS_related_deaths_Male_adults_2023 5200.0
Objective: Create maps showing prevalence by gender.
Load spatial data and merge with HIV/AIDS data_clean
# Load spatial data (replace with actual shapefile for country boundaries)
world_shapefile <- st_read("ne_10m_admin_0_countries/ne_10m_admin_0_countries.shp")
## Reading layer `ne_10m_admin_0_countries' from data source
## `C:\Users\PDG Junior\Desktop\M.Sc biostatistics and epidemiology\Biostatistics\R_project\AIDS\ne_10m_admin_0_countries\ne_10m_admin_0_countries.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 258 features and 168 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -180 ymin: -90 xmax: 180 ymax: 83.6341
## Geodetic CRS: WGS 84
# Merge spatial data with HIV/AIDS data
map_data <- world_shapefile %>%
left_join(data_clean, by = c("NAME" = "Country"))
Prevalence Map for Women
# Map for women
map_women <- ggplot(map_data) +
geom_sf(aes(fill = AIDS_Prevalence_Female_adults_2023), color = "white") +
scale_fill_viridis_c() +
theme_minimal() +
labs(title = "HIV Prevalence among Women by Country",
fill = "Prevalence")
print(map_women)
Prevalence Map for Men
# Map for men
map_men <- ggplot(map_data) +
geom_sf(aes(fill = AIDS_Prevalence_Male_adults_2023), color = "white") +
scale_fill_viridis_c() +
theme_minimal() +
labs(title = "HIV Prevalence among Men by Country",
fill = "Prevalence")
print(map_men)
HIV Prevalence Map among Women by Country and GDP Categories
HIV Prevalence Map among Men by Country and GDP Categories
b- Calculate gender disparities in HIV prevalence and AIDS-related deaths
# Calculate disparities
data_clean_With_Disparities <- data_clean %>%
mutate(Prevalence_Disparity_2023 = round(abs(AIDS_Prevalence_Female_adults_2023 - AIDS_Prevalence_Male_adults_2023), 2),
Deaths_Disparity_2023 = abs(AIDS_related_deaths_Female_adults_2023 - AIDS_related_deaths_Male_adults_2023))%>%
select(Country, Prevalence_Disparity_2023, Deaths_Disparity_2023)
saveRDS(data_clean_With_Disparities, "data_clean_With_Disparities.rds")
datatable(data_clean_With_Disparities,
options = list(
),
caption = 'Gender disparities in HIV prevalence and AIDS-related deaths, 2023'
)
HIV prevalence
HIV related death
Objective: Highlight gender disparities and regional trends.
a. Bar Chart
b. Boxplot
# Boxplot of HIV prevalence among men
# (named boxplot_men to avoid masking base R's boxplot() function)
boxplot_men <- ggplot(data_clean, aes(x = "Men", y = AIDS_Prevalence_Male_adults_2023)) +
  geom_boxplot(fill = "lightgreen") +
  labs(title = "Boxplot of HIV Prevalence for Men",
       x = "", y = "HIV Prevalence (%)")
print(boxplot_men)
Check summary for any extreme outliers
# Check summary for any extreme outliers
summary(data_clean$AIDS_Prevalence_Female_adults_2023)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.100 0.100 0.200 1.461 1.000 32.200
summary(data_clean$AIDS_Prevalence_Male_adults_2023)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.100 0.200 0.500 1.126 0.900 22.500
Presence of Extreme Values
The summary statistics indicate the presence of extreme values: the maxima of 32.2 (women) and 22.5 (men) lie far above the third quartiles of 1.0 and 0.9.
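One common way to quantify "extreme" is Tukey's boxplot rule, which flags values above Q3 + 1.5 × IQR. The short sketch below applies that rule to the quartiles printed in the `summary()` output; the object names are illustrative.

```r
# Quartiles and maxima copied from the summary() output above
q1      <- c(women = 0.1, men = 0.2)
q3      <- c(women = 1.0, men = 0.9)
max_obs <- c(women = 32.2, men = 22.5)

# Tukey's rule: values above Q3 + 1.5 * IQR are drawn as outliers in a boxplot
upper_fence <- q3 + 1.5 * (q3 - q1)
print(upper_fence)

# TRUE for both groups: each maximum lies beyond its upper fence
print(max_obs > upper_fence)
```

Both maxima exceed their fences by a wide margin, which is why a boxplot on the full scale is dominated by a few countries.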
Boxplot with limited scale to exclude extreme values
Objective: Explore whether HIV prevalence varies significantly across GDP categories (low, moderate, high).
Null Hypothesis (H0): The distributions of AIDS_Prevalence_All_adults_2023 are the same across the GDP_Category groups. In other words, there is no difference in medians between the groups.
Alternative Hypothesis (Ha): At least one group has a different distribution or median compared to the others.
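The code that builds `GDP_Category` is not shown in this report. One plausible construction, given here purely as an assumption, is to cut GDP per capita into terciles; the cut-offs actually used in the analysis may differ, and the values below are invented.

```r
# Hypothetical GDP per capita values (PPP), not the study data
gdp <- c(950, 2100, 4800, 9000, 16000, 22000, 35000, 61000, 140000)

# Assumed rule: split at the terciles of the observed distribution
breaks <- quantile(gdp, probs = c(0, 1/3, 2/3, 1))

# include.lowest = TRUE keeps the minimum value inside the first interval
gdp_category <- cut(gdp,
                    breaks = breaks,
                    labels = c("Low", "Moderate", "High"),
                    include.lowest = TRUE)

# Number of countries per category
print(table(gdp_category))
```

Whatever rule is used, fixing the level order (Low, Moderate, High) keeps plots and test output in a readable sequence.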
We’ll use histograms and Q-Q plots to assess the distribution of AIDS_Prevalence_All_adults_2023 for each GDP category.
Histogram for each GDP category
ggplot(data_clean, aes(x = AIDS_Prevalence_All_adults_2023, fill = GDP_Category)) +
geom_histogram(aes(y = after_stat(density)), bins = 15, alpha = 0.4, position = "identity", color = "black") +
geom_density(alpha = 1, linewidth = 1.2, aes(color = GDP_Category)) +
facet_wrap(~GDP_Category) +
theme_minimal() +
scale_color_manual(values = c("red", "blue", "green")) + # Set distinct colors for curves
scale_fill_manual(values = c("lightpink", "lightblue", "lightgreen")) + # Set distinct colors for bars
labs(title = "Histogram and Frequency Curve of HIV Prevalence by GDP Category",
x = "HIV Prevalence (%)",
y = "Density") +
theme(legend.position = "top",
legend.title = element_blank())
Q-Q Plot for normality assessment
ggplot(data_clean, aes(sample = AIDS_Prevalence_All_adults_2023)) +
stat_qq() +
stat_qq_line() +
facet_wrap(~GDP_Category) +
theme_minimal() +
labs(title = "Q-Q Plot of HIV Prevalence by GDP Category",
x = "Theoretical Quantiles",
y = "Sample Quantiles")
The Shapiro-Wilk test will be conducted separately for each GDP category to statistically test the normality of AIDS_Prevalence_All_adults_2023.
# Shapiro-Wilk test for normality within each GDP category
normality_results <- lapply(unique(data_clean$GDP_Category), function(category) {
shapiro.test(data_clean$AIDS_Prevalence_All_adults_2023[data_clean$GDP_Category == category])
})
# Assign category names to the results
names(normality_results) <- unique(data_clean$GDP_Category)
# Display results
print("Shapiro-Wilk Test Results by GDP Category:")
## [1] "Shapiro-Wilk Test Results by GDP Category:"
print(normality_results)
## $Low
##
## Shapiro-Wilk normality test
##
## data: data_clean$AIDS_Prevalence_All_adults_2023[data_clean$GDP_Category == category]
## W = 0.39495, p-value = 1.684e-11
##
##
## $Moderate
##
## Shapiro-Wilk normality test
##
## data: data_clean$AIDS_Prevalence_All_adults_2023[data_clean$GDP_Category == category]
## W = 0.37255, p-value = 1.43e-11
##
##
## $High
##
## Shapiro-Wilk normality test
##
## data: data_clean$AIDS_Prevalence_All_adults_2023[data_clean$GDP_Category == category]
## W = 0.69954, p-value = 1.621e-07
The Shapiro-Wilk test rejects normality in all three GDP categories (every p-value is far below 0.05).
Perform a log transformation to try to normalize the distribution
data_clean$Log_Prevalence <- log(data_clean$AIDS_Prevalence_All_adults_2023 + 1)
Performing the Shapiro-Wilk test again
normality_results <- lapply(unique(data_clean$GDP_Category), function(category) {
shapiro.test(data_clean$Log_Prevalence[data_clean$GDP_Category == category])
})
# Assign category names to the results
names(normality_results) <- unique(data_clean$GDP_Category)
# Display results
print("Shapiro-Wilk Test Results by GDP Category:")
## [1] "Shapiro-Wilk Test Results by GDP Category:"
print(normality_results)
## $Low
##
## Shapiro-Wilk normality test
##
## data: data_clean$Log_Prevalence[data_clean$GDP_Category == category]
## W = 0.79939, p-value = 8.168e-06
##
##
## $Moderate
##
## Shapiro-Wilk normality test
##
## data: data_clean$Log_Prevalence[data_clean$GDP_Category == category]
## W = 0.60077, p-value = 5.814e-09
##
##
## $High
##
## Shapiro-Wilk normality test
##
## data: data_clean$Log_Prevalence[data_clean$GDP_Category == category]
## W = 0.76835, p-value = 2.519e-06
The distribution remains non-normal in every GDP category even after the log transformation, so we proceed with a non-parametric test.
Because the normality assumption is not met, we use the Kruskal-Wallis test rather than ANOVA to compare the groups. The Kruskal-Wallis test is a non-parametric, rank-based method used to determine whether there are statistically significant differences between three or more independent groups; it is commonly interpreted as a comparison of medians. It is often used when the assumptions of ANOVA (normality and homogeneity of variances) are not met.
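To make the rank-based logic concrete, the sketch below computes the Kruskal-Wallis statistic by hand on a small invented sample without ties, using H = 12 / (N(N + 1)) × Σ(R_i² / n_i) − 3(N + 1), where R_i is the rank sum and n_i the size of group i. It then compares the result with `kruskal.test()`. The data and object names are illustrative only.

```r
# Invented prevalence values for three groups of four countries (no ties)
values <- c(1.2, 3.4, 5.6, 7.8,   # "Low"
            0.9, 2.1, 2.8, 4.0,   # "Moderate"
            0.1, 0.3, 0.5, 1.0)   # "High"
group <- factor(rep(c("Low", "Moderate", "High"), each = 4),
                levels = c("Low", "Moderate", "High"))

# Rank all observations together, then sum the ranks within each group
r         <- rank(values)
N         <- length(values)
rank_sums <- tapply(r, group, sum)
n_i       <- tapply(r, group, length)

# Kruskal-Wallis H (no tie correction needed because there are no ties)
H_manual  <- 12 / (N * (N + 1)) * sum(rank_sums^2 / n_i) - 3 * (N + 1)
H_builtin <- unname(kruskal.test(values ~ group)$statistic)

cat("Manual H:", H_manual, " Built-in H:", H_builtin, "\n")
```

Because only ranks enter the formula, a single country with a very high prevalence cannot dominate the statistic the way it would dominate a mean.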
# Kruskal-Wallis test to compare HIV prevalence by GDP categories
kruskal_test <- kruskal.test(AIDS_Prevalence_All_adults_2023 ~ GDP_Category, data = data_clean)
print(kruskal_test)
##
## Kruskal-Wallis rank sum test
##
## data: AIDS_Prevalence_All_adults_2023 by GDP_Category
## Kruskal-Wallis chi-squared = 14.294, df = 2, p-value = 0.0007872
- Kruskal-Wallis chi-squared: 14.294. This is the test statistic, which measures the degree of difference in ranks among the groups. A higher value suggests greater differences.
- Degrees of freedom (df): 2. This corresponds to the number of groups minus one (3 GDP categories: Low, Moderate, High, so df = 3 − 1 = 2).
- p-value: 0.0007872. This is the probability of observing a test statistic at least as extreme as the one obtained if the null hypothesis were true. It is well below the common significance level (α = 0.05).
Therefore, we reject the null hypothesis.
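The reported p-value can be reproduced from the test statistic alone, since under the null hypothesis H approximately follows a chi-squared distribution with df = 2. The sketch below is a check on the printed numbers, not part of the original analysis; the object names are illustrative.

```r
H  <- 14.294   # Kruskal-Wallis statistic reported above
df <- 2        # three GDP categories minus one

# Upper-tail probability of the chi-squared distribution
p_value <- pchisq(H, df = df, lower.tail = FALSE)
print(p_value)

# For df = 2 the upper tail has the closed form exp(-H / 2)
p_closed_form <- exp(-H / 2)
print(p_closed_form)
```

Both expressions agree with the p-value printed by `kruskal.test()` up to the rounding of the statistic.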
The Kruskal-Wallis test shows a significant difference in AIDS_Prevalence_All_adults_2023 among GDP categories (p = 0.0008). This suggests that AIDS prevalence varies meaningfully with GDP levels. Post hoc analysis can reveal specific group differences.
We will use Dunn's test for pairwise comparisons.
dunn_test <- dunnTest(AIDS_Prevalence_All_adults_2023 ~ GDP_Category, data = data_clean, method = "bonferroni")
print(dunn_test)
## Dunn (1964) Kruskal-Wallis multiple comparison
## p-values adjusted with the Bonferroni method.
## Comparison Z P.unadj P.adj
## 1 High - Low -3.764237 0.0001670586 0.0005011757
## 2 High - Moderate -1.589149 0.1120268530 0.3360805590
## 3 Low - Moderate 2.164802 0.0304028298 0.0912084893
Interpretation of Post Hoc Results
The post hoc comparisons using Dunn’s test show the pairwise differences among GDP categories with adjusted p-values (Bonferroni correction). Here’s the breakdown:
- High vs. Low: Z = -3.76, adjusted p-value = 0.0005. The difference between the “High” and “Low” GDP categories in terms of AIDS prevalence is statistically significant (p < 0.05), indicating a notable disparity.
- High vs. Moderate: Z = -1.59, adjusted p-value = 0.336. The difference between the “High” and “Moderate” GDP categories is not statistically significant (p > 0.05).
- Low vs. Moderate: Z = 2.16, adjusted p-value = 0.091. The difference between the “Low” and “Moderate” GDP categories is not statistically significant after adjustment (p > 0.05).
Summary
There is a significant difference in AIDS prevalence between countries in the “High” and “Low” GDP categories. Differences between “High vs. Moderate” and “Low vs. Moderate” GDP categories are not statistically significant after Bonferroni adjustment.
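The Bonferroni adjustment itself is simple arithmetic: each unadjusted p-value is multiplied by the number of comparisons (three here) and capped at 1. The sketch below reproduces the adjusted column from the unadjusted p-values in the Dunn output; the object names are illustrative.

```r
# Unadjusted p-values copied from the Dunn test output above
p_unadj <- c("High - Low"      = 0.0001670586,
             "High - Moderate" = 0.1120268530,
             "Low - Moderate"  = 0.0304028298)

# Bonferroni by hand: multiply by the number of comparisons, cap at 1
p_manual <- pmin(p_unadj * length(p_unadj), 1)

# Same adjustment with base R's p.adjust()
p_builtin <- p.adjust(p_unadj, method = "bonferroni")

print(round(p_builtin, 4))
```

The Low vs. Moderate comparison shows the cost of the correction: its unadjusted p-value (0.030) is below 0.05, but the adjusted value (0.091) is not.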
Boxplots to illustrate variations.
P-Value and Significance Levels
- “NS” (Not Significant): P > 0.05. Confidence level: less than 95%. Significance: not significant.
- 0.01 < P ≤ 0.05. Confidence level: 95%. Significance: significant.
- 0.001 < P ≤ 0.01. Confidence level: 99%. Significance: very significant.
- P ≤ 0.001. Confidence level: 99.9%. Significance: highly significant.
The Dunn Test adjusts for the risk of Type I errors (false positives) when making multiple comparisons. While the difference between Low and Moderate is visible in the boxplot, the Dunn Test finds it not statistically significant at the 95% confidence level (with an adjusted p-value of 0.091).
The boxplot shows the raw difference without any statistical adjustment, so a gap can look meaningful even when the test does not support it.
Key Takeaway: Boxplots are useful for quickly observing trends, but they do not account for the adjustments needed when making multiple comparisons.
Test Normality
# Apply the Shapiro-Wilk test to all quantitative variables
shapiro_results <- sapply(data_clean[ , c("AIDS_Prevalence_All_adults_2023",
"AIDS_Prevalence_Female_adults_2023",
"AIDS_Prevalence_Male_adults_2023",
"AIDS_related_deaths_All_adults_2023",
"AIDS_related_deaths_Female_adults_2023",
"AIDS_related_deaths_Male_adults_2023",
"GPD_PCAP_2023")], shapiro.test)
# Shapiro-Wilk test results
shapiro_results
## AIDS_Prevalence_All_adults_2023 AIDS_Prevalence_Female_adults_2023
## statistic 0.3107968 0.3111598
## p.value 7.318499e-21 7.399975e-21
## method "Shapiro-Wilk normality test" "Shapiro-Wilk normality test"
## data.name "X[[i]]" "X[[i]]"
## AIDS_Prevalence_Male_adults_2023 AIDS_related_deaths_All_adults_2023
## statistic 0.3147373 0.6127807
## p.value 8.254949e-21 5.883565e-16
## method "Shapiro-Wilk normality test" "Shapiro-Wilk normality test"
## data.name "X[[i]]" "X[[i]]"
## AIDS_related_deaths_Female_adults_2023
## statistic 0.566939
## p.value 7.471252e-17
## method "Shapiro-Wilk normality test"
## data.name "X[[i]]"
## AIDS_related_deaths_Male_adults_2023 GPD_PCAP_2023
## statistic 0.6574065 0.7517411
## p.value 5.245459e-15 1.164334e-12
## method "Shapiro-Wilk normality test" "Shapiro-Wilk normality test"
## data.name "X[[i]]" "X[[i]]"
A Shapiro-Wilk p-value below 0.05 indicates that a variable departs significantly from a normal distribution. All seven variables have p-values far below this threshold, so none of them can be treated as normally distributed. This matters because non-normal distributions call for non-parametric methods in the analyses that follow.
Objective: Obtain an overview of the relationships between all quantitative variables using Spearman's rank correlation.
# Selecting relevant quantitative variables
quantitative_vars <- data_clean %>%
select(AIDS_Prevalence_All_adults_2023,
AIDS_Prevalence_Female_adults_2023,
AIDS_Prevalence_Male_adults_2023,
AIDS_related_deaths_All_adults_2023,
AIDS_related_deaths_Female_adults_2023,
AIDS_related_deaths_Male_adults_2023,
GPD_PCAP_2023)
# Calculating the correlation matrix
#cor_matrix <- cor(quantitative_vars, use = "complete.obs")
cor_matrix_2 <- cor(quantitative_vars, use = "complete.obs", method = "spearman")
#print(cor_matrix_2)
Objective: Display all correlation coefficients in a heatmap for a global view.
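The heatmap code is not reproduced in this report. A minimal sketch of how such a plot could be drawn with the `corrplot` package (loaded at the top of the analysis) is given below. It runs on simulated stand-in data with hypothetical column names; the styling arguments are one reasonable choice, not necessarily the ones used for the original figure.

```r
set.seed(1)

# Simulated stand-in for the quantitative variables (not the study data)
toy <- data.frame(prevalence = rexp(50))
toy$deaths <- toy$prevalence * 500 + rnorm(50, sd = 200)
toy$gdp    <- 30000 / (1 + toy$prevalence) + rnorm(50, sd = 3000)

# Spearman correlation matrix, as in the analysis
cor_toy <- cor(toy, method = "spearman")
print(round(cor_toy, 2))

# Draw the heatmap only if corrplot is installed
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(cor_toy,
                     method = "color",        # colored tiles
                     type = "upper",          # upper triangle only
                     addCoef.col = "black",   # print coefficients in the tiles
                     tl.col = "black")        # label color
}
```

With the study data, the same call would take `cor_matrix_2` in place of `cor_toy`.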
The strongest relationships are observed among HIV-related indicators, while GDP displays weaker and more indirect associations.
Indicators such as HIV prevalence among all adults, females, and males show very strong positive correlations (r > 0.99). This reflects their interdependence, where the trends across gender-specific and overall prevalence metrics are highly similar.
The correlation between HIV prevalence and AIDS-related deaths ranges from 0.56 to 0.63, indicating a moderate positive relationship. This suggests that higher prevalence tends to align with higher mortality, though not perfectly. The correlation suggests that other factors—such as access to treatment and healthcare—may also influence mortality rates.
GDP per capita shows weak to moderate negative correlations with HIV-related metrics, ranging from -0.26 to -0.58:
- AIDS_Prevalence_All_adults_2023: -0.3735
- AIDS_Prevalence_Female_adults_2023: -0.4748
- AIDS_Prevalence_Male_adults_2023: -0.2559
- AIDS_related_deaths_All_adults_2023: -0.5846
- AIDS_related_deaths_Female_adults_2023: -0.5807
- AIDS_related_deaths_Male_adults_2023: -0.5689
These findings are consistent with global trends, where higher GDP often correlates with better healthcare infrastructure, greater access to antiretroviral therapy, and more effective prevention programs. However, the modest magnitudes (roughly -0.26 to -0.58) show that the relationship is weak to moderate, so economic factors alone cannot fully explain variations in HIV prevalence or mortality. Other factors, such as health inequalities, access to healthcare, and national policies, likely play a substantial role in shaping this relationship.
The insights from this correlation matrix suggest the need for multivariate analysis to better understand the combined effects of economic and non-economic factors on HIV prevalence and mortality.
Relationship between GDP and HIV Prevalence
ggplot(data_clean, aes(x = GPD_PCAP_2023, y = AIDS_Prevalence_All_adults_2023)) +
geom_point(aes(color = as.factor(GDP_Category)), size = 2) +
geom_smooth(method = "lm", se = TRUE, color = "blue", linetype = "dashed") +
theme_minimal() +
labs(title = "Relationship between GDP and HIV Prevalence",
x = "GDP per Capita (2023)",
y = "HIV Prevalence (%)",
color = "GDP Category") +
theme(plot.title = element_text(hjust = 0.5)) # Center-align the title
## `geom_smooth()` using formula = 'y ~ x'
Relationship between HIV Prevalence and Deaths
ggplot(data_clean, aes(x = AIDS_related_deaths_All_adults_2023, y = AIDS_Prevalence_All_adults_2023)) +
geom_point(aes(color = as.factor(GDP_Category)), size = 2) +
geom_smooth(method = "lm", se = TRUE, color = "blue", linetype = "dashed") +
theme_minimal() +
labs(title = "Relationship between HIV Prevalence and Deaths",
x = "AIDS-Related Deaths",
y = "HIV Prevalence (%)",
color = "GDP Category") +
theme(plot.title = element_text(hjust = 0.5)) # Center-align the title
## `geom_smooth()` using formula = 'y ~ x'
Step 4: Statistical Significance Tests for Correlations
Objective: Confirm whether the detected correlations between GDP and HIV prevalence, and between HIV prevalence and deaths, are statistically significant.
Explanation of the significance test:
The correlation test (Spearman's here, since the variables are not normally distributed) evaluates whether the observed relationship between two variables is likely to have occurred by chance.
The p-value helps determine whether the null hypothesis (no correlation) can be rejected:
- p ≤ 0.05: The correlation is statistically significant.
- p > 0.05: We fail to reject the null hypothesis, meaning the correlation is not considered statistically significant.
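For readers unfamiliar with Spearman's method, the sketch below shows on a small invented sample (no ties) that rho is simply the Pearson correlation of the ranks, and that the S statistic printed by `cor.test()` relates to rho through S = (n³ − n)(1 − rho) / 6. The data and object names are illustrative.

```r
# Invented sample without ties (not the study data)
x <- c(2.1, 5.4, 3.3, 8.9, 7.2, 1.0, 6.5, 4.8)
y <- c(30, 12, 25, 3, 9, 40, 5, 14)

# Spearman test as used in the analysis
res <- cor.test(x, y, method = "spearman")

# rho is the Pearson correlation computed on the ranks
rho_from_ranks <- cor(rank(x), rank(y))

# The S statistic is a rescaling of rho
n          <- length(x)
S_from_rho <- (n^3 - n) * (1 - rho_from_ranks) / 6

cat("rho:", unname(res$estimate), " from ranks:", rho_from_ranks, "\n")
cat("S:  ", unname(res$statistic), " from rho:  ", S_from_rho, "\n")
```

Applied to this report's n = 115 and rho = -0.3735126, the same relation gives S of approximately 348130, in line with the test output for GDP and prevalence.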
Statistical Significance Test for GDP and Prevalence Correlation
# Correlation test for GDP and prevalence (all adults)
cor_GDP_prevalence <- cor.test(data_clean$GPD_PCAP_2023, data_clean$AIDS_Prevalence_All_adults_2023, method = "spearman")
print(cor_GDP_prevalence)
##
## Spearman's rank correlation rho
##
## data: data_clean$GPD_PCAP_2023 and data_clean$AIDS_Prevalence_All_adults_2023
## S = 348130, p-value = 3.93e-05
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## -0.3735126
Key Statistics
- Spearman’s rho = -0.3735126
- p-value = 3.93e-05
- Conclusion: There is a moderate negative correlation between GDP per capita and HIV prevalence among all adults. The p-value is less than 0.05, indicating that this correlation is statistically significant. This suggests that as GDP increases, HIV prevalence tends to decrease, although the relationship is not strong.
Statistical Significance Test for Prevalence and Deaths Correlation:
# Correlation test for prevalence and deaths (all adults)
cor_prevalence_deaths <- cor.test(data_clean$AIDS_Prevalence_All_adults_2023, data_clean$AIDS_related_deaths_All_adults_2023, method = "spearman")
print(cor_prevalence_deaths)
##
## Spearman's rank correlation rho
##
## data: data_clean$AIDS_Prevalence_All_adults_2023 and data_clean$AIDS_related_deaths_All_adults_2023
## S = 97925, p-value = 3.064e-13
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.6136468
Key Statistics
- Spearman’s rho = 0.6136468
- p-value = 3.064e-13
- Conclusion: There is a moderate positive correlation between HIV prevalence and AIDS-related deaths among all adults. The p-value is less than 0.05, indicating that this correlation is statistically significant. This suggests that as HIV prevalence increases, the number of AIDS-related deaths also increases, though the correlation is moderate in strength.
Objective: Identify unusual or unexpected observations in the dataset.
Potential types of anomalies:
- Countries with high prevalence and low deaths.
- Countries with high GDP and high prevalence (or the reverse).
The anomaly rules are defined on three variables: AIDS_Prevalence_All_adults_2023, AIDS_related_deaths_All_adults_2023, and GPD_PCAP_2023.
These rules aim to identify countries with unusual combinations of HIV prevalence, AIDS-related deaths, and GDP, which could highlight disparities or inefficiencies in healthcare systems, access to treatment, or disease management strategies.
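The threshold values behind the `anomaly` column are not shown in this report. The sketch below illustrates one way such rules could be written, assuming (hypothetically) that "high" and "low" mean above the 75th and below the 25th percentile. The toy data, the cut-offs, and every label other than "Normal" are illustrative, not the study's.

```r
# Toy data (hypothetical values, not the study data)
toy <- data.frame(
  Country    = c("A", "B", "C", "D", "E", "F", "G", "H"),
  prevalence = c(0.1, 0.2, 0.5, 0.9, 1.5, 20.0, 25.0, 0.3),
  deaths     = c(100, 4500, 500, 5000, 800, 100, 4000, 300),
  stringsAsFactors = FALSE
)

# Assumed thresholds: upper and lower quartiles of each variable
high_prev   <- toy$prevalence > quantile(toy$prevalence, 0.75)
low_prev    <- toy$prevalence < quantile(toy$prevalence, 0.25)
high_deaths <- toy$deaths > quantile(toy$deaths, 0.75)
low_deaths  <- toy$deaths < quantile(toy$deaths, 0.25)

# Every country starts as "Normal"; unusual combinations overwrite the label
toy$anomaly <- "Normal"
toy$anomaly[high_prev & low_deaths] <- "High prevalence, low deaths"
toy$anomaly[low_prev & high_deaths] <- "Low prevalence, high deaths"

# Keep only the flagged countries, as in the filter used by the analysis
anomalies_toy <- toy[toy$anomaly != "Normal", c("Country", "anomaly")]
print(anomalies_toy)
```

Percentile-based cut-offs adapt to the data but always flag some countries; fixed, clinically motivated thresholds would be an alternative design.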
Anomalies by Country and Type
# View anomalies in the data
anomalies <- data_clean %>% filter(anomaly != "Normal")
# Display anomalies with DT (Only Country and Anomaly)
datatable(anomalies[, c("Country", "anomaly")], options = list(
),
caption = 'Table: Anomalies by Country and Type')
Anomalies and Countries
# List countries by anomaly type, keeping the countries in one line
anomalies_list <- anomalies %>%
group_by(anomaly) %>%
summarise(Countries = paste(unique(Country), collapse = ", ")) %>%
arrange(anomaly)
# Display the anomalies list with countries on the same line
datatable(anomalies_list, options = list(
pageLength = 10,
autoWidth = TRUE,
columnDefs = list(list(targets = 1, width = '300px')) # Adjust column width for readability
),
caption = 'Anomalies and Countries')
Summary of anomalies by type
# Optionally view the summary of anomalies by type
anomalies_summary <- anomalies %>%
group_by(anomaly) %>%
summarise(Count = n())
datatable(anomalies_summary, options = list(
),
caption = 'Anomalies summary')
A moderate negative correlation (-0.3735) was observed between GDP per capita and HIV prevalence, indicating that countries with higher GDPs tend to have lower HIV prevalence. However, this relationship is not absolute. For instance, countries like South Africa exhibit high prevalence despite having a relatively high GDP for sub-Saharan Africa.
The general trend and its exceptions may be attributed to:
- Better healthcare infrastructure and prevention programs in wealthier countries, which help explain the overall negative correlation.
- Internal economic inequalities (among social classes) within these countries, which remain an influencing factor and help explain the exceptions.
Women represent a disproportionately high share of people living with HIV, especially in sub-Saharan Africa, where their prevalence rates are often double those of men.
The geographic maps included in the analysis highlight critical zones like West Africa and parts of South Asia.
The analysis presented in this study offers critical insights into the socio-economic and gendered dynamics of the HIV/AIDS epidemic. The findings contribute to existing literature while uncovering anomalies and trends that demand attention in public health policymaking.
The negative correlation observed between GDP per capita and HIV
prevalence (-0.3735) suggests that higher-income nations generally have
better health outcomes concerning HIV. This trend is largely
attributable to:
- Improved healthcare infrastructure: Including
prevention programs and widespread access to antiretroviral therapies
(ARVs).
- Greater investment in public health awareness
campaigns: Alongside early detection programs.
However, the correlation’s moderate strength underscores the complexity of the HIV epidemic. Middle- and high-income countries, such as South Africa, illustrate that economic strength alone cannot mitigate the epidemic. Structural inequalities, uneven healthcare distribution, and cultural factors can offset the benefits of economic growth.
This finding aligns with previous studies highlighting that the epidemic disproportionately affects marginalized populations, even in wealthier nations.
The disproportionate burden of HIV among women in low-income countries, especially in sub-Saharan Africa, is both a biological and socio-cultural issue. Women face higher biological susceptibility to infection, but social determinants—such as gender inequality, lack of education, and economic dependence—amplify their vulnerability.
This disparity highlights the urgent need for
gender-sensitive policies and interventions that
empower women through:
- Education
- Economic independence
- Healthcare access
The study identified anomalies where countries with high HIV prevalence demonstrated low AIDS-related mortality rates (e.g., Botswana). This suggests that healthcare access, specifically ARV coverage, plays a pivotal role in mitigating AIDS-related deaths.
Conversely, countries with low prevalence but high AIDS mortality highlight significant gaps in healthcare systems, particularly in reaching underserved populations.
These findings reveal that healthcare outcomes are not solely
dependent on prevalence but are significantly influenced by:
- Access to life-saving treatments.
- The efficiency of healthcare delivery systems.
The trends and disparities observed in this analysis reinforce the
need for tailored public health interventions. Policies
should prioritize:
1. Reducing systemic barriers to healthcare access, particularly for
rural and marginalized populations.
2. Addressing socio-economic and gender inequalities that amplify the
epidemic’s impact on women.
3. Strengthening healthcare systems to close gaps in treatment
availability and ensure equitable access.
Moreover, anomalies in the data emphasize the importance of localized strategies that consider the unique challenges faced by each country or region. A one-size-fits-all approach is insufficient in addressing the diverse determinants of the HIV epidemic.
While the study offers valuable insights, it is important to
acknowledge its limitations:
- The reliance on secondary data may not fully capture localized nuances
or undocumented populations, such as informal settlements or migrant
workers.
- Correlation does not imply causation; additional research is needed to
understand the causal pathways between socio-economic factors and HIV
prevalence.