First 6 rows of the dataset
Country AIDS_Prevalence_All_adults_2023 GPD_PCAP_2023
Afghanistan 0.1 2211.281
Albania 0.1 21925.608
Algeria 0.1 16824.488
Angola 1.4 8040.702
Argentina 0.4 30082.305
Armenia 0.3 21342.515

Objective 1: Identify whether HIV prevalence differs significantly between countries grouped by GDP level (low, moderate, high)

Step 1: Group the countries into three groups by GDP per capita

First six rows of the dataset with the new column created
Country AIDS_Prevalence_All_adults_2023 GPD_PCAP_2023 GDP_category
Afghanistan 0.1 2211.281 Low
Albania 0.1 21925.608 Moderate
Algeria 0.1 16824.488 Moderate
Angola 1.4 8040.702 Low
Argentina 0.4 30082.305 High
Armenia 0.3 21342.515 Moderate

Frequency table of GDP_category

GDP_category Share (%)
High 33.33
Low 33.33
Moderate 33.33
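
The equal 33.33% shares suggest that the categories were formed from tertiles of GDP per capita, although the report does not state the rule or the cut-points. The following is a minimal sketch under that assumption, using only the Python standard library. The function name `gdp_categories` and the example values are hypothetical and are not taken from the dataset.

```python
from collections import Counter
from statistics import quantiles


def gdp_categories(gdp_values):
    """Label each GDP value Low / Moderate / High using tertile cut-points.

    Assumption: the grouping rule is tertiles; the report does not state it.
    """
    # quantiles(..., n=3) returns the two cut-points that split the data in thirds.
    low_cut, high_cut = quantiles(gdp_values, n=3, method="inclusive")
    labels = []
    for value in gdp_values:
        if value <= low_cut:
            labels.append("Low")
        elif value <= high_cut:
            labels.append("Moderate")
        else:
            labels.append("High")
    return labels


# Hypothetical GDP per capita values (illustration only, not the real data).
example_gdp = [5000.0, 1000.0, 9000.0, 3000.0, 7000.0,
               2000.0, 8000.0, 4000.0, 6000.0]
example_labels = gdp_categories(example_gdp)

# Share of each category in percent, comparable to the frequency table.
shares = {
    label: round(100 * count / len(example_labels), 2)
    for label, count in Counter(example_labels).items()
}
print(shares)
```

Tertiles guarantee balanced groups, which keeps the group sizes comparable for the tests that follow, but the cut-points depend entirely on the sample and carry no economic meaning of their own.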

Step 2: check Normality of HIV prevalence among each GDP category

Shapiro-Wilk Test Results for Normality by GDP Category

GDP Category W Statistic P-Value
Low 0.395 < 0.001
Moderate 0.367 < 0.001
High 0.672 < 0.001

The Shapiro-Wilk test rejects normality of HIV prevalence in all three GDP categories (p < 0.001 in each). Since the distributions are non-normal, we proceed with a non-parametric test, the Kruskal-Wallis test.

Step 3: Perform Kruskal-Wallis test

Kruskal-Wallis Test Results
Test Chi-Squared Statistic Degrees of Freedom P-Value
Kruskal-Wallis Test 16.41 2 0.00027

p-value: 0.00027. Since p < 0.05, we reject the null hypothesis that AIDS prevalence has the same distribution in all three GDP categories. This suggests that AIDS prevalence differs across GDP levels. Post hoc analysis can reveal which specific groups differ.
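
To make the test statistic concrete, here is a minimal standard-library sketch of the tie-corrected Kruskal-Wallis H statistic. It is not the code used to produce the table. The function names and the toy groups are hypothetical, and the p-value helper covers only the three-group case (2 degrees of freedom), where the chi-squared upper tail reduces to exp(-H/2).

```python
import math
from collections import Counter
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a list of samples."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))


def p_value_df2(h):
    """Chi-squared upper-tail probability, valid only for 2 degrees of freedom."""
    return math.exp(-h / 2)


# Hypothetical toy groups (not the real data).
toy_h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(toy_h, 3))

# Consistency check on the reported statistic (chi-squared = 16.41, df = 2).
print(round(p_value_df2(16.41), 5))
```

The last line is a quick way to confirm that a reported p-value agrees with its test statistic when there are exactly three groups.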

Post Hoc Pairwise Comparisons

We will use the Dunn test for pairwise comparisons

Dunn’s Test Results (Bonferroni Adjusted)
Comparison Z-Statistic P-Value Adjusted P-Value
High - Low -4.049395 0.0000514 0.0001541
High - Moderate -1.927960 0.0538601 0.1615803
Low - Moderate 2.121435 0.0338852 0.1016557

Interpretation of Post Hoc Results

The post hoc comparisons using Dunn’s test show the pairwise differences among GDP categories, with Bonferroni-adjusted p-values:

  • High vs. Low: Z = -4.05, adjusted p-value = 0.00015. The difference between the “High” and “Low” GDP categories in AIDS prevalence is statistically significant (p < 0.05). The negative Z indicates that the “High” group has the lower mean rank.

  • High vs. Moderate: Z = -1.93, adjusted p-value = 0.162. The difference between the “High” and “Moderate” GDP categories is not statistically significant (p > 0.05).

  • Low vs. Moderate: Z = 2.12, adjusted p-value = 0.102. The difference between the “Low” and “Moderate” GDP categories is not statistically significant after adjustment (p > 0.05), although the unadjusted p-value (0.034) is below 0.05.

  • Summary: There is a significant difference in AIDS prevalence between countries in the “High” and “Low” GDP categories. The “High vs. Moderate” and “Low vs. Moderate” differences are not statistically significant after Bonferroni adjustment.
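
The relationship between the Z-statistics, the unadjusted p-values, and the Bonferroni-adjusted p-values in the Dunn table can be sketched with the standard library alone. This assumes that the unadjusted p-values are two-sided standard-normal tail probabilities and that the adjustment multiplies each p-value by the number of comparisons (three), capped at 1. The helper names are hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal Z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Z-statistics copied from the Dunn's test table.
z_scores = {
    "High - Low": -4.049395,
    "High - Moderate": -1.927960,
    "Low - Moderate": 2.121435,
}

raw = {name: two_sided_p(z) for name, z in z_scores.items()}
adjusted = dict(zip(raw, bonferroni(list(raw.values()))))

for name in z_scores:
    print(f"{name}: p = {raw[name]:.7f}, adjusted p = {adjusted[name]:.7f}")
```

Bonferroni is the most conservative of the common adjustments, which is why the Low vs. Moderate comparison loses significance once it is applied.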

Visualization of the pairwise comparisons

Objective 2: Explore whether GDP is associated with HIV prevalence.

Spearman Correlation Test Results between GDP and AIDS Prevalence
Test Correlation Coefficient (rho) P-Value
Spearman Correlation -0.391 < 0.001
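
Spearman's rho is the Pearson correlation computed on ranks, so it measures monotonic rather than linear association. A minimal standard-library sketch follows. The function names and the example values are hypothetical, and the example is chosen only to show a strictly decreasing but non-linear relationship.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical values: y falls steeply, then flattens, as x increases.
x_example = [1.0, 2.0, 3.0, 4.0, 5.0]
y_example = [100.0, 50.0, 20.0, 10.0, 1.0]

rho = spearman_rho(x_example, y_example)
r = pearson_r(x_example, y_example)
print(rho, r)
```

In this example the ranks are perfectly reversed, so rho is exactly -1, while the Pearson coefficient on the raw values is weaker because the relationship is curved.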

Visualization of the relationship between GDP and HIV prevalence

Objective 3: Assess whether GDP per capita is a significant predictor of HIV prevalence among adults

Linear Regression Results: GDP as a Predictor of HIV Prevalence
Term Estimate Standard Error t Value P Value R-squared Adjusted R-squared
(Intercept) 1.79311 0.46710 3.839 < 0.001 0.023 0.014
GPD_PCAP_2023 -0.00002 0.00001 -1.637 0.104 0.023 0.014

Linear Regression Equation

y = 1.79311 - 0.00002 * x
AIDS Prevalence = 1.79311 - 0.00002 * GDP per capita

Although the regression shows a slight negative relationship between GDP per capita and HIV prevalence, the slope is not statistically significant (p = 0.104). The model also explains very little of the variance in the data, with an adjusted R² of only 0.014, which suggests that factors not captured in this model are potentially more important in predicting HIV prevalence. Therefore, GDP per capita is not a significant linear predictor of HIV prevalence in this context. The significant Spearman correlation in Objective 2 may indicate that the association is monotonic but not linear, which a straight-line fit on heavily skewed prevalence data captures poorly.
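
As an illustration of what the fitted equation implies, the sketch below evaluates it for a given GDP per capita and shows how slope, intercept, R² and adjusted R² are obtained in a simple one-predictor least-squares fit. It uses the rounded coefficients from the table, so predictions are approximate. The function names and the toy data are hypothetical.

```python
def predict_prevalence(gdp_per_capita, intercept=1.79311, slope=-0.00002):
    """Evaluate the reported regression line (coefficients as rounded in the table)."""
    return intercept + slope * gdp_per_capita


def fit_line(x, y):
    """Ordinary least squares for one predictor.

    Returns (slope, intercept, r_squared, adjusted_r_squared).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    r_squared = 1 - ss_res / ss_tot
    # One predictor, so the residual degrees of freedom are n - 2.
    adjusted = 1 - (1 - r_squared) * (n - 1) / (n - 2)
    return slope, intercept, r_squared, adjusted


# Predicted prevalence at Argentina's GDP per capita from the first table;
# the observed value in that table is 0.4.
print(predict_prevalence(30082.305))

# Hypothetical toy data lying exactly on y = 1 + 2x, to show the fit routine.
toy_fit = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(toy_fit)
```

Comparing the prediction with the observed value for a single country gives a feel for how loosely the line tracks the data, in keeping with the low R².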