Abstract

Background: Cardiovascular disease is an important source of morbidity among people living with HIV (PLWH), particularly as survival improves with antiretroviral therapy (ART). This analysis examines whether ART duration and ARV regimen type are associated with elevated NT-proBNP, a biomarker of cardiac stress.

Methods: This cross-sectional analysis included PLWH aged 40 years and older receiving care in Almaty, Kazakhstan. Elevated NT-proBNP was defined as NT-proBNP ≥125 pg/mL. Logistic regression models estimated crude and adjusted associations between ART duration and elevated NT-proBNP.

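Results: Of 150 participants, 55 (36.7%) had elevated NT-proBNP. The unadjusted model showed a modest positive association between ART duration and elevated NT-proBNP that did not reach statistical significance (OR 1.07 per year; 95% CI 0.99, 1.15; p = 0.094). After adjustment for age, hypertension, ARV regimen type, sex, BMI, and smoking status, the association was attenuated (OR 1.05; 95% CI 0.97, 1.14; p = 0.23). Age was the only covariate significantly associated with elevated NT-proBNP (OR 1.08 per year; 95% CI 1.03, 1.13; p = 0.003).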

Conclusion: In this sample, ART duration was not independently associated with elevated NT-proBNP after adjustment. Findings suggest that aging-related cardiovascular risk may be more important than cumulative ART duration in this population.

1. Data Loading and Analytic Sample

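# Packages used throughout this report
library(readxl)     # read_excel()
library(dplyr)      # mutate(), case_when(), filter(), bind_rows(), %>%
library(knitr)      # kable()
library(ggplot2)    # figures
library(gtsummary)  # tbl_regression()
library(broom)      # tidy()

hiv_raw <- read_excel("C:/Users/userp/OneDrive/Рабочий стол/HSTA553/Project EPI 553/nursultan.xlsx")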

n_start <- nrow(hiv_raw)
n_start
## [1] 150
# Recode helper function for ARV regimen groups
classify_regimen <- function(x) {
  x <- tolower(as.character(x))

  has_pi <- grepl("резолста|rezolsta|калетра|kaletra|lopinavir|дарунавир|darun", x)
  has_insti <- grepl("теград|триград|dtg|долутегравир|dolutegravir|тивикай|tivicay|триумек|triumeq", x)
  has_nnrti <- grepl("efv|эфавиренз|атрипла|atripla|тенмифа|комплера|complera|rilp|рилпивирин", x)

  n_classes <- sum(c(has_pi, has_insti, has_nnrti))

  if (n_classes > 1) {
    "Other/mixed"
  } else if (has_pi) {
    "PI-based"
  } else if (has_insti) {
    "INSTI-based"
  } else if (has_nnrti) {
    "NNRTI-based"
  } else {
    "Other/mixed"
  }
}

hiv <- hiv_raw %>%
  mutate(
    bmi = Weight / (Height / 100)^2,
    elevated_ntprobnp = ifelse(proBNP >= 125, 1, 0),
    sex_label = factor(ifelse(sex == 1, "Male", "Female"),
                       levels = c("Female", "Male")),
    # %in% returns FALSE for NA, so participants with missing smokeday are coded 0 (non-current)
    current_smoker = ifelse(smoke100 == 1 & smokeday %in% c(1, 2), 1, 0),
    smoker_label = factor(ifelse(current_smoker == 1, "Current", "Non-current"),
                          levels = c("Non-current", "Current")),
    htn_label = factor(ifelse(HTN == 1, "Yes", "No"),
                       levels = c("No", "Yes")),
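    # vapply() enforces one label per row and drops the element names that sapply() would attach
    regimen_group4 = vapply(ART, classify_regimen, character(1), USE.NAMES = FALSE),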
    regimen_group3 = case_when(
      regimen_group4 %in% c("PI-based", "Other/mixed") ~ "PI/Other",
      TRUE ~ regimen_group4
    ),
    regimen_group3 = factor(regimen_group3,
                            levels = c("INSTI-based", "NNRTI-based", "PI/Other")),
    regimen_group4 = factor(regimen_group4,
                            levels = c("INSTI-based", "NNRTI-based", "PI-based", "Other/mixed")),
    art_duration_cat = case_when(
      ARTyrs < 5 ~ "<5 years",
      ARTyrs >= 5 & ARTyrs < 10 ~ "5-9 years",  # < 10 so non-integer durations (e.g. 9.5) are not left as NA
      ARTyrs >= 10 ~ ">=10 years",
      TRUE ~ NA_character_
    ),
    art_duration_cat = factor(art_duration_cat,
                              levels = c("<5 years", "5-9 years", ">=10 years"))
  ) %>%
  filter(age >= 40)

n_final <- nrow(hiv)
n_excluded_age <- n_start - n_final

analytic_sample <- data.frame(
  Step = c("Raw dataset", "Excluded age <40 years", "Final analytic sample"),
  N = c(n_start, n_excluded_age, n_final)
)

kable(analytic_sample)
| Step                   |   N |
|:-----------------------|----:|
| Raw dataset            | 150 |
| Excluded age <40 years |   0 |
| Final analytic sample  | 150 |

The dataset was imported from a single Excel worksheet. All 150 participants in the raw dataset were aged 40 years or older, so none were excluded and the final analytic sample included 150 participants.
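The scalar helper above is applied one row at a time. As a hedged illustration only, the same rules can be written in vectorized form. The name `classify_regimen_vec` and the example regimen strings are hypothetical and not part of the original analysis; the search patterns are copied from `classify_regimen`.

```r
# Vectorized sketch of the regimen classifier (hypothetical name; patterns copied from classify_regimen)
classify_regimen_vec <- function(x) {
  x <- tolower(as.character(x))

  has_pi    <- grepl("резолста|rezolsta|калетра|kaletra|lopinavir|дарунавир|darun", x)
  has_insti <- grepl("теград|триград|dtg|долутегравир|dolutegravir|тивикай|tivicay|триумек|triumeq", x)
  has_nnrti <- grepl("efv|эфавиренз|атрипла|atripla|тенмифа|комплера|complera|rilp|рилпивирин", x)

  # Number of drug classes detected in each regimen string
  n_classes <- has_pi + has_insti + has_nnrti

  # The default covers both "no class matched" and "more than one class matched"
  out <- rep("Other/mixed", length(x))
  out[n_classes == 1 & has_pi]    <- "PI-based"
  out[n_classes == 1 & has_insti] <- "INSTI-based"
  out[n_classes == 1 & has_nnrti] <- "NNRTI-based"
  out
}

# Made-up regimen strings, for illustration only
example_regimens <- c("Tivicay + TDF/FTC", "Atripla", "Kaletra + Tivicay", "unlisted drug", NA)
example_groups <- classify_regimen_vec(example_regimens)
print(example_groups)
```

Because `grepl()` returns FALSE for missing input, a missing regimen string falls into "Other/mixed" under both the scalar helper and this sketch.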

2. Variable Selection and Recoding

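The primary outcome was elevated NT-proBNP, defined as NT-proBNP ≥125 pg/mL. The primary exposure was ART duration in years, modeled as a continuous variable. ARV regimen type was first classified into four groups (INSTI-based, NNRTI-based, PI-based, and Other/mixed); for regression analysis, the PI-based (n = 7) and Other/mixed (n = 4) groups were combined into a single PI/Other category.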

Covariates included age, sex, BMI, hypertension, and current smoking status. These variables were selected because they are clinically relevant cardiovascular risk factors and potential confounders of the ART duration and NT-proBNP relationship.
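The derived variables described above can be sketched on a small toy data frame. The values are made up; only the column names (`Height`, `Weight`, `proBNP`, `ARTyrs`) follow the analysis dataset. Using `cut()` for the duration categories is an alternative shown for illustration, not the approach used in the recoding step.

```r
# Toy data (made-up values) using the same column names as the analysis
toy <- data.frame(
  Height = c(170, 160, 182, 175),   # cm
  Weight = c(70, 55, 95, 68),       # kg
  proBNP = c(40, 125, 310, 124.9),  # pg/mL
  ARTyrs = c(2, 5, 9.5, 10)         # years on ART
)

# BMI in kg/m^2: weight divided by height in metres, squared
toy$bmi <- toy$Weight / (toy$Height / 100)^2

# Outcome: 1 if NT-proBNP is at or above the 125 pg/mL threshold, else 0
toy$elevated_ntprobnp <- as.integer(toy$proBNP >= 125)

# Half-open intervals [-Inf, 5), [5, 10), [10, Inf) via right = FALSE
toy$art_duration_cat <- cut(
  toy$ARTyrs,
  breaks = c(-Inf, 5, 10, Inf),
  right = FALSE,
  labels = c("<5 years", "5-9 years", ">=10 years")
)

print(toy)
```

With half-open intervals, a non-integer duration such as 9.5 years is still assigned to the 5-9 years category.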

key_vars <- c("proBNP", "ARTyrs", "ART", "age", "sex", "Height", "Weight", "HTN", "smoke100", "smokeday")

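missing_table <- data.frame(
  Variable = key_vars,
  Missing_n = sapply(key_vars, function(v) sum(is.na(hiv[[v]]))),
  Missing_pct = sapply(key_vars, function(v) mean(is.na(hiv[[v]])) * 100)
)

# row.names = FALSE suppresses the names from sapply(), which would otherwise print as a duplicate first column
kable(missing_table, digits = 1, row.names = FALSE)

| Variable | Missing_n | Missing_pct |
|:---------|----------:|------------:|
| proBNP   |         0 |         0.0 |
| ARTyrs   |         0 |         0.0 |
| ART      |         0 |         0.0 |
| age      |         0 |         0.0 |
| sex      |         0 |         0.0 |
| Height   |         0 |         0.0 |
| Weight   |         0 |         0.0 |
| HTN      |         0 |         0.0 |
| smoke100 |         0 |         0.0 |
| smokeday |        31 |        20.7 |

Note: smokeday was the only variable with missing values (31 participants, 20.7%). Under the recoding rule for current_smoker, these participants are classified as non-current smokers.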
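The 31 missing smokeday values matter for the smoking variable because of how `%in%` treats NA. The sketch below uses made-up values to show the behaviour of the rule from the recoding step. The stricter variant, `current_smoker_strict`, is a hypothetical alternative and was not used in this analysis.

```r
# Made-up values: the last two participants have missing smokeday
smoke100 <- c(1, 1, 0, 1)
smokeday <- c(1, 3, NA, NA)

# Rule as written in the recoding step: NA in smokeday gives FALSE from %in%, hence 0
current_smoker <- ifelse(smoke100 == 1 & smokeday %in% c(1, 2), 1, 0)

# Hypothetical stricter variant: keep NA when smoke100 == 1 but smokeday is unknown
current_smoker_strict <- ifelse(smoke100 == 1 & is.na(smokeday), NA, current_smoker)

print(data.frame(smoke100, smokeday, current_smoker, current_smoker_strict))
```

The two codings differ only for participants with smoke100 equal to 1 and a missing smokeday, so cross-tabulating smoke100 against missing smokeday would show how many records are affected.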

3. Descriptive Statistics

# Helpers that each build one row of Table 1
row_mean_sd <- function(label, x) {
  data.frame(Characteristic = label,
             Summary = sprintf("%.1f (%.1f)", mean(x, na.rm = TRUE), sd(x, na.rm = TRUE)))
}

row_median_iqr <- function(label, x) {
  q <- unname(quantile(x, c(0.25, 0.75), na.rm = TRUE))
  data.frame(Characteristic = label,
             Summary = sprintf("%.1f [%.1f, %.1f]", median(x, na.rm = TRUE), q[1], q[2]))
}

row_n_pct <- function(label, x, level) {
  data.frame(Characteristic = label,
             Summary = sprintf("%d (%.1f%%)",
                               sum(x == level, na.rm = TRUE),
                               mean(x == level, na.rm = TRUE) * 100))
}

table1 <- bind_rows(
  row_mean_sd("Age, years", hiv$age),
  row_mean_sd("BMI, kg/m^2", hiv$bmi),
  row_median_iqr("NT-proBNP, pg/mL", hiv$proBNP),
  row_n_pct("Elevated NT-proBNP: No", hiv$elevated_ntprobnp, 0),
  row_n_pct("Elevated NT-proBNP: Yes", hiv$elevated_ntprobnp, 1),
  row_n_pct("Sex: Male", hiv$sex_label, "Male"),
  row_n_pct("Sex: Female", hiv$sex_label, "Female"),
  row_n_pct("Current smoker: Current", hiv$smoker_label, "Current"),
  row_n_pct("Current smoker: Non-current", hiv$smoker_label, "Non-current"),
  row_n_pct("Hypertension: No", hiv$htn_label, "No"),
  row_n_pct("Hypertension: Yes", hiv$htn_label, "Yes"),
  row_n_pct("ARV regimen type: INSTI-based", hiv$regimen_group4, "INSTI-based"),
  row_n_pct("ARV regimen type: NNRTI-based", hiv$regimen_group4, "NNRTI-based"),
  row_n_pct("ARV regimen type: PI-based", hiv$regimen_group4, "PI-based"),
  row_n_pct("ARV regimen type: Other/mixed", hiv$regimen_group4, "Other/mixed"),
  row_n_pct("ART duration: <5 years", hiv$art_duration_cat, "<5 years"),
  row_n_pct("ART duration: 5-9 years", hiv$art_duration_cat, "5-9 years"),
  row_n_pct("ART duration: >=10 years", hiv$art_duration_cat, ">=10 years")
)

kable(table1, col.names = c("Characteristic", paste0("Overall (n = ", n_final, ")")))
| Characteristic                | Overall (n = 150)  |
|:------------------------------|:-------------------|
| Age, years                    | 50.6 (7.9)         |
| BMI, kg/m^2                   | 23.8 (3.6)         |
| NT-proBNP, pg/mL              | 90.5 [46.2, 190.2] |
| Elevated NT-proBNP: No        | 95 (63.3%)         |
| Elevated NT-proBNP: Yes       | 55 (36.7%)         |
| Sex: Male                     | 82 (54.7%)         |
| Sex: Female                   | 68 (45.3%)         |
| Current smoker: Current       | 111 (74.0%)        |
| Current smoker: Non-current   | 39 (26.0%)         |
| Hypertension: No              | 128 (85.3%)        |
| Hypertension: Yes             | 22 (14.7%)         |
| ARV regimen type: INSTI-based | 76 (50.7%)         |
| ARV regimen type: NNRTI-based | 63 (42.0%)         |
| ARV regimen type: PI-based    | 7 (4.7%)           |
| ARV regimen type: Other/mixed | 4 (2.7%)           |
| ART duration: <5 years        | 56 (37.3%)         |
| ART duration: 5-9 years       | 59 (39.3%)         |
| ART duration: >=10 years      | 35 (23.3%)         |

Note: Continuous variables are reported as mean (SD), except NT-proBNP, which is reported as median [IQR] because of right skew.
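The choice of median [IQR] for a right-skewed biomarker can be illustrated with a toy vector. The values below are invented and are not drawn from the study data.

```r
# Invented NT-proBNP-like values with one large observation in the right tail
ntprobnp_toy <- c(30, 45, 60, 90, 120, 200, 1500)

toy_mean   <- mean(ntprobnp_toy)
toy_median <- median(ntprobnp_toy)
toy_iqr    <- quantile(ntprobnp_toy, c(0.25, 0.75))

# The single extreme value pulls the mean far above the median
cat(sprintf("mean = %.1f, median = %.1f, IQR = [%.1f, %.1f]\n",
            toy_mean, toy_median, toy_iqr[1], toy_iqr[2]))
```

In this toy vector one extreme value is enough to move the mean well above the typical observation, while the median and quartiles are unaffected by how large that value is.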

4. Exploratory Data Analysis

Figure 1. Distribution of NT-proBNP

ggplot(hiv, aes(x = proBNP)) +
  geom_histogram(bins = 30, color = "black", fill = "steelblue", alpha = 0.8) +
  labs(
    title = "Figure 1. Distribution of NT-proBNP",
    x = "NT-proBNP (pg/mL)",
    y = "Frequency"
  ) +
  theme_minimal()

NT-proBNP values were right-skewed, with most observations concentrated at lower values and a smaller number of high values.

Figure 2. NT-proBNP by ART Duration

ggplot(hiv, aes(x = ARTyrs, y = proBNP)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", se = TRUE) +
  labs(
    title = "Figure 2. NT-proBNP by ART Duration",
    x = "ART duration (years)",
    y = "NT-proBNP (pg/mL)"
  ) +
  theme_minimal()

The scatterplot suggests a weak positive pattern between ART duration and NT-proBNP, but there is substantial variability.

Figure 3. NT-proBNP Across ARV Regimen Groups

ggplot(hiv, aes(x = regimen_group4, y = proBNP)) +
  geom_boxplot() +
  labs(
    title = "Figure 3. NT-proBNP Across ARV Regimen Groups",
    x = "ARV regimen group",
    y = "NT-proBNP (pg/mL)"
  ) +
  theme_minimal()

NT-proBNP distributions overlapped across regimen groups. PI-based and Other/mixed groups were small, so these visual differences should be interpreted cautiously.
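The group sizes behind this caution come from Table 1. As a minimal sketch, the four-level factor is rebuilt here from those counts and the two sparse levels are merged into PI/Other, using base R rather than the `case_when()` call in the recoding step.

```r
# Rebuild the four-level regimen factor from the Table 1 counts (76, 63, 7, 4)
regimen_levels4 <- c("INSTI-based", "NNRTI-based", "PI-based", "Other/mixed")
regimen_group4 <- factor(rep(regimen_levels4, times = c(76, 63, 7, 4)),
                         levels = regimen_levels4)

# Merge the two sparse levels into a single PI/Other category
regimen_group3 <- as.character(regimen_group4)
regimen_group3[regimen_group3 %in% c("PI-based", "Other/mixed")] <- "PI/Other"
regimen_group3 <- factor(regimen_group3,
                         levels = c("INSTI-based", "NNRTI-based", "PI/Other"))

group3_counts <- table(regimen_group3)
print(group3_counts)
```

Even after merging, the PI/Other category contains 11 participants, so estimates for that group remain imprecise.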

5. Regression Model Specification

A logistic regression model was used because the outcome was binary: elevated NT-proBNP versus not elevated. Model 1 was unadjusted and included ART duration only. Model 2 adjusted for age, hypertension, ARV regimen group, sex, BMI, and smoking status.
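Written out, the two models described above take the following form. The notation is a sketch: the symbols Y_i, \pi_i, and \beta_j are introduced here for illustration, and the indicator terms follow the reference levels set in the recoding step (no hypertension, INSTI-based regimen, female, non-current smoker).

```latex
% Y_i = 1 if participant i has NT-proBNP >= 125 pg/mL, and \pi_i = P(Y_i = 1)
\begin{align*}
\text{Model 1:}\quad
\log\frac{\pi_i}{1-\pi_i} &= \beta_0 + \beta_1\,\mathrm{ARTyrs}_i \\[4pt]
\text{Model 2:}\quad
\log\frac{\pi_i}{1-\pi_i} &= \beta_0 + \beta_1\,\mathrm{ARTyrs}_i + \beta_2\,\mathrm{age}_i
  + \beta_3\,\mathbf{1}[\mathrm{HTN}_i = \text{Yes}] \\
&\quad + \beta_4\,\mathbf{1}[\mathrm{regimen}_i = \text{NNRTI-based}]
  + \beta_5\,\mathbf{1}[\mathrm{regimen}_i = \text{PI/Other}] \\
&\quad + \beta_6\,\mathbf{1}[\mathrm{sex}_i = \text{Male}]
  + \beta_7\,\mathrm{BMI}_i
  + \beta_8\,\mathbf{1}[\mathrm{smoker}_i = \text{Current}] \\[4pt]
\mathrm{OR}_j &= e^{\beta_j}
\end{align*}
```

Each reported odds ratio is the exponentiated coefficient, so the value for ARTyrs is the multiplicative change in the odds of elevated NT-proBNP per additional year on ART, holding the other terms fixed.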

model1 <- glm(
  elevated_ntprobnp ~ ARTyrs,
  family = binomial(link = "logit"),
  data = hiv
)

model2 <- glm(
  elevated_ntprobnp ~ ARTyrs + age + htn_label + regimen_group3 + sex_label + bmi + smoker_label,
  family = binomial(link = "logit"),
  data = hiv
)

6. Regression Results

tbl_regression(model1, exponentiate = TRUE) %>%
  modify_header(label = "**Term**") %>%
  modify_caption("**Model 1. Unadjusted logistic regression**")
Model 1. Unadjusted logistic regression

| Term   |   OR | 95% CI     | p-value |
|:-------|-----:|:-----------|--------:|
| ARTyrs | 1.07 | 0.99, 1.15 |   0.094 |

Abbreviations: CI = Confidence Interval, OR = Odds Ratio

tbl_regression(model2, exponentiate = TRUE) %>%
  modify_header(label = "**Term**") %>%
  modify_caption("**Model 2. Adjusted logistic regression**")
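Model 2. Adjusted logistic regression

| Term                                    |   OR | 95% CI     | p-value |
|:----------------------------------------|-----:|:-----------|--------:|
| ARTyrs                                  | 1.05 | 0.97, 1.14 |     0.2 |
| age                                     | 1.08 | 1.03, 1.13 |   0.003 |
| htn_label: No (reference)               |      |            |         |
| htn_label: Yes                          | 0.88 | 0.29, 2.56 |     0.8 |
| regimen_group3: INSTI-based (reference) |      |            |         |
| regimen_group3: NNRTI-based             | 0.95 | 0.45, 1.99 |     0.9 |
| regimen_group3: PI/Other                | 2.01 | 0.48, 8.69 |     0.3 |
| sex_label: Female (reference)           |      |            |         |
| sex_label: Male                         | 1.01 | 0.46, 2.24 |    >0.9 |
| bmi                                     | 0.98 | 0.88, 1.08 |     0.7 |
| smoker_label: Non-current (reference)   |      |            |         |
| smoker_label: Current                   | 0.72 | 0.30, 1.70 |     0.4 |

Abbreviations: CI = Confidence Interval, OR = Odds Ratio
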
model1_or <- tidy(model1, exponentiate = TRUE, conf.int = TRUE)
model2_or <- tidy(model2, exponentiate = TRUE, conf.int = TRUE)

model1_or %>% kable(digits = 3)
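| term        | estimate | std.error | statistic | p.value | conf.low | conf.high |
|:------------|---------:|----------:|----------:|--------:|---------:|----------:|
| (Intercept) |    0.380 |     0.307 |    -3.153 |   0.002 |    0.202 |     0.679 |
| ARTyrs      |    1.066 |     0.038 |     1.673 |   0.094 |    0.992 |     1.154 |
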
model2_or %>% kable(digits = 3)
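| term                      | estimate | std.error | statistic | p.value | conf.low | conf.high |
|:--------------------------|---------:|----------:|----------:|--------:|---------:|----------:|
| (Intercept)               |    0.021 |     1.774 |    -2.176 |   0.030 |    0.001 |     0.644 |
| ARTyrs                    |    1.049 |     0.040 |     1.211 |   0.226 |    0.972 |     1.139 |
| age                       |    1.076 |     0.025 |     2.948 |   0.003 |    1.026 |     1.133 |
| htn_labelYes              |    0.884 |     0.552 |    -0.224 |   0.823 |    0.287 |     2.559 |
| regimen_group3NNRTI-based |    0.948 |     0.378 |    -0.141 |   0.888 |    0.449 |     1.987 |
| regimen_group3PI/Other    |    2.009 |     0.728 |     0.958 |   0.338 |    0.476 |     8.686 |
| sex_labelMale             |    1.013 |     0.402 |     0.032 |   0.975 |    0.459 |     2.237 |
| bmi                       |    0.978 |     0.052 |    -0.431 |   0.666 |    0.880 |     1.081 |
| smoker_labelCurrent       |    0.716 |     0.437 |    -0.765 |   0.444 |    0.303 |     1.700 |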

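In the unadjusted model, each additional year of ART duration was associated with 7% higher odds of elevated NT-proBNP (OR 1.07; 95% CI 0.99, 1.15; p = 0.094), although the confidence interval included 1. After adjustment for age, hypertension, ARV regimen, sex, BMI, and smoking, the estimate was attenuated (OR 1.05; 95% CI 0.97, 1.14; p = 0.23). Age was the only covariate significantly associated with elevated NT-proBNP in the adjusted model (OR 1.08 per year; 95% CI 1.03, 1.13; p = 0.003). The PI/Other regimen group had a larger point estimate (OR 2.01) but a wide confidence interval (0.48, 8.69), reflecting its small size.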
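The reported quantities are linked by simple arithmetic, sketched below using values copied from the Model 2 `tidy()` output. The variable names are introduced here for illustration. The Wald limits computed this way are an approximation: the intervals in the table appear to be profile-likelihood intervals, so small differences are expected.

```r
# Values copied from the Model 2 tidy() output (odds-ratio scale, SE on the log-odds scale)
or_art <- 1.049
se_art <- 0.040
z_art  <- 1.211
or_age <- 1.076

# Back-transform the odds ratio to the log-odds coefficient
log_or_art <- log(or_art)

# Approximate 95% Wald interval on the odds-ratio scale
wald_ci_art <- exp(log_or_art + c(-1, 1) * qnorm(0.975) * se_art)

# Two-sided p-value implied by the reported z statistic
p_art <- 2 * pnorm(-abs(z_art))

# Odds ratio for a 10-year difference in age: the per-year OR raised to the 10th power
or_age_decade <- or_age^10

cat(sprintf("ART Wald 95%% CI: %.3f, %.3f; p = %.3f\n",
            wald_ci_art[1], wald_ci_art[2], p_art))
cat(sprintf("Age OR per 10 years: %.2f\n", or_age_decade))
```

Rescaling the age estimate this way shows that a per-year odds ratio of about 1.08 corresponds to roughly a doubling of the odds of elevated NT-proBNP per decade of age.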

7. Conclusion

Longer ART duration showed a modest, non-significant positive association with elevated NT-proBNP in the unadjusted model, and the estimate was further attenuated after adjustment. Age was the only variable significantly associated with elevated NT-proBNP. These results suggest that aging-related cardiovascular risk may be more important than ART duration alone in this sample of PLWH aged 40 years and older.
