Part 0: Data Preparation

  1. Load Required Packages
  2. Import bmd.csv
##Read CSV from workspace
bmd = read.csv('bmd.csv', header = T)
##Preview dataset
head(bmd)
##    SEQN RIAGENDR RIDAGEYR RIDRETH1 BMXBMI smoker totmet metcat DXXOFBMD tbmdcat
## 1 93705        2       66        4   31.7      2    240      0    1.058       0
## 2 93708        2       66        5   23.7      3    120      0    0.801       1
## 3 93709        2       75        4   38.9      1    720      1    0.880       0
## 4 93711        1       56        5   21.3      3    840      1    0.851       1
## 5 93713        1       67        3   23.5      1    360      0    0.778       1
## 6 93714        2       54        4   39.9      2     NA     NA    0.994       0
##   calcium vitd DSQTVD DSQTCALC
## 1   503.5 1.85 20.557   211.67
## 2   473.5 5.85 25.000   820.00
## 3      NA   NA     NA       NA
## 4  1248.5 3.85 25.000    35.00
## 5   660.5 2.35     NA    13.33
## 6   776.0 5.65     NA       NA
  1. Convert RIDRETH1 to factor with meaningful ethnicity labels.
  2. Convert RIAGENDR to factor (male/female)
  3. Convert smoker to factor (Current/Past/Never)
bmd <- bmd %>%
  mutate(
  ##Recode RIDRETH1 to V_ETH
  V_ETH = factor(RIDRETH1, 
                 levels = c(1,2,3,4,5),
                 labels = c("Mexican American", 
                            "Other Hispanic",
                            "Non-Hispanic White",
                            "Non-Hispanic Black",
                            "Other")
                ),
  ##Recode RIAGENDR to V_GEN
  V_GEN = factor(RIAGENDR, 
                 levels = c(1,2),
                 labels = c("Male",
                            "Female")
                ),
  ##Recode smoker to V_SMK
  V_SMK = factor(smoker,
                 levels = c(1,2,3),
                 labels = c("Current","Past","Never"))
)

#QA - Print top 3 rows, output for new var
print (bmd[1:3, (ncol(bmd)-2):ncol(bmd)])
##                V_ETH  V_GEN   V_SMK
## 1 Non-Hispanic Black Female    Past
## 2              Other Female   Never
## 3 Non-Hispanic Black Female Current
  1. Report total sample size
  2. Create analytic dataset removing missing BMD, ethnicity
  3. Report final analytic sample size
# Store the total size of the dataset
M_completesamplesize <- nrow(bmd)

#a_bmd is new df, not missing any values in V_ETH or BMXBMI
a_bmd <- bmd[(!is.na(bmd$V_ETH) & !is.na(bmd$DXXOFBMD)),]

#Store total size of nonmissing dataset
M_total <- nrow(a_bmd)
  • Size of Complete Dataset: 2898
  • Number of Rows Excluded: 612 rows, 21.12%
  • Working Dataset size: n = 2286

Part 1: One-Way ANOVA

Research Question: Do mean BMD values differ across ethnicity groups?

Step 1: State Hypotheses

Null Hypothesis (H₀): μ1 = μ2 = μ3 = μ4 = μ5

The null hypothesis states that the BMD mean of all 5 ethnicity groups (Mexican American, Other Hispanic, Non-Hispanic White, Non-Hispanic Black, Other) are equal

Alternative Hypothesis (H₁): i,j | μi≠μj

The alternative hypothesis states that at least one ethnicity group’s population mean (μ) differs from the others

Significance level: α = 0.05

Step 2: Exploratory Analysis

# Calculate summary statistics by ethnic group
s_bmd <- a_bmd %>%
  group_by(V_ETH) %>%
  summarise(
    n = n(),
    Mean = mean(DXXOFBMD),
    SD = sd(DXXOFBMD),
    Median = median(DXXOFBMD),
    Min = min(DXXOFBMD),
    Max = max(DXXOFBMD)
  )

##clean table
s_bmd %>% 
  rename("Ethnicity Group" = V_ETH) %>%
  kable(digits = 2, 
        caption = "Descriptive Statistics: Total Femur Bone Mineral Density (g/cm^2^) by Ethnicity Group",
        )
Descriptive Statistics: Total Femur Bone Mineral Density (g/cm2) by Ethnicity Group
Ethnicity Group n Mean SD Median Min Max
Mexican American 255 0.95 0.15 0.96 0.44 1.46
Other Hispanic 244 0.95 0.16 0.94 0.55 1.39
Non-Hispanic White 846 0.89 0.16 0.88 0.37 1.46
Non-Hispanic Black 532 0.97 0.16 0.98 0.44 1.55
Other 409 0.90 0.15 0.89 0.54 1.34
##plot
ggplot(a_bmd, 
  aes(x = V_ETH, y = DXXOFBMD, fill = V_ETH)) +
  geom_boxplot(alpha = 0.7, outlier.shape = NA) +
  geom_jitter(width = 0.2, alpha = 0.1, size = 0.5) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "Total Femur Bone Mineral Density by Ethnicity Group",
    subtitle = "NHANES Data",
    x = "Ethnicity Group",
    y = "Total Femur Bone Mineral Density (g/cm^2^)",
    fill = "V_ETH"
  ) +
  theme_classic(base_size = 12) +
  theme(legend.position = "none")

Step 3: ANOVA F-Test

# Fit ANOVA model
anova_bmd <- aov(DXXOFBMD ~ V_ETH, data = a_bmd)

# Display the ANOVA table
s_anova <- summary(anova_bmd)[[1]]
rownames(s_anova)[1] <- "Treatment: Ethnicity"
rownames(s_anova)[2] <- "Error"

s_anova %>% 
  rownames_to_column(var= " ") %>%
  kable(digits = 3)
Df Sum Sq Mean Sq F value Pr(>F)
Treatment: Ethnicity 4 2.948 0.737 30.185 0
Error 2281 55.697 0.024 NA NA

Interpretation: Based on an α = 0.05, at a p value of <0.001, we reject the null hypothesis - indicating that there is statistically significant difference in the mean total femur bone density across ethnicity groups. The F-Statistic is F(4, 2281) = 30.19.


Step 4: Assumptions

# Create diagnostic plots
par(mfrow = c(2, 2))
plot(anova_bmd)

Residuals vs. Fitted: Linearity 👍

Five distinct columns of points, one for each ethnicity group. The red line is flat on 0.0, which indicates that the model is unbiased and centered around group means.

Normal Q-Q: Normality 👍

The points follow the diagonal very closely, minimal skew away from the line indicating that the distribution is not stretched.

Scale-Location: Homoscedasticity ⚠️

This is only relatively stable, with some variation in height among the groups. It may be worth further investigating with Levene’s Test to get a specific number to interpet.

Residuals vs Leverage: Absence of Influential Outliers 👍

A handful of items are far enough to be labeled, but none pass Cook’s Distance lines, indicating that the outliers are not influential enough to warp the mean.

# Levene's test for homogeneity of variance
levene <- leveneTest(DXXOFBMD ~ V_ETH, data = a_bmd)
print(levene)
## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value Pr(>F)
## group    4  1.5728 0.1788
##       2281

Levene’s Test for Homogeneity of Variance

P value of 0.179 indicates that we fail to reject the null hypothesis, and we can proceed with the ANOVA without any violations of assumptions.


Step 5: Post-Hoc Comparisons

Tukey HSD

# Conduct Tukey HSD test
tukey_results <- TukeyHSD(anova_bmd)
Comparison Mean Diff 95% CI p-value Significant?
Other Hispanic - Mexican American -0.004 [-0.04, 0.03] 0.998 No
Non-Hispanic White - Mexican American -0.062 [-0.09, -0.03] 2.57e-07 Yes
Non-Hispanic Black - Mexican American 0.022 [-0.01, 0.05] 0.328 No
Other - Mexican American -0.053 [-0.09, -0.02] 0.000192 Yes
Non-Hispanic White - Other Hispanic -0.058 [-0.09, -0.03] 3.61e-06 Yes
Non-Hispanic Black - Other Hispanic 0.027 [-0.01, 0.06] 0.172 No
Other - Other Hispanic -0.049 [-0.08, -0.01] 0.00107 Yes
Non-Hispanic Black - Non-Hispanic White 0.085 [0.06, 0.11] 3.94e-11 Yes
Other - Non-Hispanic White 0.009 [-0.02, 0.03] 0.872 No
Other - Non-Hispanic Black -0.076 [-0.1, -0.05] 4.18e-11 Yes

Interpretation

  • Based on the Tukey HSD test, the mean BMD of the Non-Hispanic White treatment group differs statistically significantly from all other ethnicity groups except for Other.
  • The mean BMD of the Non-Hispanic Black group differs statistically significantly from the Non-Hispanic White and Other treatment groups.
  • The mean BMD for the Mexican American group differs statistically significantly from only the Other and Non-Hispanic White ethnicity groups.
  • The mean BMD for the Other Hispanic group statistically significantly differs from the Other and Non-Hispanic White ethnicity groups.
  • The mean BMD for the Other group statistically significantly differs from the Non-Hispanic Black, Other Hispanic, and Mexican American groups.

Effect Size

ss_treatment <- s_anova$`Sum Sq`[1]
ss_total <- sum(s_anova$`Sum Sq`)

# Calculate eta-squared
eta_squared <- ss_treatment / ss_total

cat("Eta-squared (η²):", round(eta_squared, 4), "\n")
## Eta-squared (η²): 0.0503
cat("Percentage of variance explained:", round(eta_squared * 100, 2), "%")
## Percentage of variance explained: 5.03 %
cat("Effect Size: ", 
    ifelse(eta_squared < 0.06, 
            "Small", ifelse(eta_squared < 0.14, 
                            "Medium", 
                            "Large")
           )
    )
## Effect Size:  Small

Part 2: Correlation Analysis

Research Question: Are there associations between BMI, Calcium Vitamin D, MET?

Step 1: Create total intake variables.

Calculate:

  • calcium_total = calcium + DSQTCALC (replace NA with 0)
  • vitd_total = vitd + DSQTVD (replace NA with 0)
a_bmd <- a_bmd %>% 
  mutate(
  ##Missing value handling
     DSQTCALC_0 = replace_na(DSQTCALC,0),
     ## Downstream, I run into -inf off the logs, and then NaN in table A, so- 
 
     #I see int he student's template (that I hadn't seen when I wrote this lol) 
        #does not transform the dietary variables
            ##CALCIUM_0 = replace_na(calcium,0),
     DSQTVD_0 = replace_na(DSQTVD, 0),
     ##VITD_0 = replace_na(vitd, 0),
  ##Totals off MV cols
    calcium_total = calcium + DSQTCALC_0,
    vitd_total = vitd + DSQTVD_0
  )

Step 2: Assess and Apply Transformations

Create histograms for continuous variables (BMI, calcium, vitamin D, MET)

  • Decide on transformations based on skewness
  • Document reasoning

Suggested transformations:

  • Log transformation: BMI, calcium
  • Square root: MET, vitamin D (handles zeros better)
##start with plots

p1 <- ggplot(a_bmd, aes(x=a_bmd$BMXBMI, fill="coral")) +
  geom_histogram() +
  labs(
    title = "BMI",
    subtitle = paste0("Skewness: ", round(skewness(a_bmd$BMXBMI, na.rm=TRUE),3)),
    x = "BMI",
    y = "Frequency",
  )  +
theme_classic() + 
  theme(legend.position = "none")

p2 <-  ggplot(a_bmd, aes(x=a_bmd$calcium_total, fill = "coral")) +
  geom_histogram() +
  labs(
    title = "Calcium",
    subtitle = paste0("Skewness: ", round(skewness(a_bmd$calcium_total, na.rm=TRUE),3)),
    x = "Total Calcium",
    y = "Frequency",
  )  +
theme_classic() + 
  theme(legend.position = "none")

p3 <-  ggplot(a_bmd, aes(x=a_bmd$totmet, fill = "coral")) +
  geom_histogram() +
  labs(
    title = "MET",
    subtitle = paste0("Skewness: ", round(skewness(a_bmd$totmet, na.rm=TRUE),3)),
    x = "Total MET",
    y = "Frequency",
  )  +
theme_classic() + 
  theme(legend.position = "none")

p4 <-  ggplot(a_bmd, aes(x=a_bmd$vitd_total, fill = "coral")) +
  geom_histogram() +
  labs(
    title = "Vitamin D",
    subtitle = paste0("Skewness: ", round(skewness(a_bmd$vitd_total, na.rm=TRUE),3)),
    x = "Total Vitamin D",
    y = "Frequency",
  )  +
theme_classic() + 
  theme(legend.position = "none")
p1+p2+p3+p4

##Based on these tables, likely do not need to transform BMI, this variable has a slight right skew, but is very close to normal. Calcium has a slight larger right skew, but is still fairly close to normal - this is worth testing to see if a transformation may help, but if the adjustment is not substantial, then we will proceed without it.

## MET and Vitamin D do appear to require transformation, and with the large number of 0s, a square root transform may help.

a_bmd <- a_bmd %>%
  mutate(
    ##we'll do em all and then decide
    log_vitd = log(vitd_total),
    log_calc = log(calcium_total),
    log_bmi = log(BMXBMI),
    log_met = log(totmet),
    sr_vitd = sqrt(vitd_total),
    sr_met = sqrt(totmet)
  )

p1 <- ggplot(a_bmd, aes(x=a_bmd$log_bmi, fill="coral")) +
  geom_histogram() +
  labs(
    title = "BMI - Log Transformed",
    subtitle = paste0("Skewness: ", round(skewness(a_bmd$log_bmi, na.rm=TRUE),3)),
    x = "BMI",
    y = "Frequency",
  )  +
theme_classic() + 
  theme(legend.position = "none")

p2 <-  ggplot(a_bmd, aes(x=a_bmd$log_calc, fill = "coral")) +
  geom_histogram() +
  labs(
    title = "Calcium - Log Transformed",
    subtitle = paste0("Skewness: ", round(skewness(a_bmd$log_calc, na.rm=TRUE),3)),
    x = "Total Calcium",
    y = "Frequency",
  )  +
theme_classic() + 
  theme(legend.position = "none")

p3 <-  ggplot(a_bmd, aes(x=a_bmd$log_met, fill = "coral")) +
  geom_histogram() +
  labs(
    title = "MET - Log Transformed",
    subtitle = paste0("Skewness: ", round(skewness(a_bmd$log_met, na.rm=TRUE),3)),
    x = "Total MET",
    y = "Frequency",
  )  +
theme_classic() + 
  theme(legend.position = "none")

p4 <-  ggplot(a_bmd, aes(x=a_bmd$log_vitd, fill = "coral")) +
  geom_histogram() +
  labs(
    title = "Vitamin D - Log Transformed",
    subtitle = paste0("Skewness: ", round(skewness(a_bmd$log_vitd, na.rm=TRUE),3)),
    x = "Total Vitamin D",
    y = "Frequency",
  )  +
theme_classic() + 
  theme(legend.position = "none")
##BMI, Calcium somewhat improved but not so substantially - may be worth seeing how the correlations look if I test both.
## MET and VITD somewhat improved but still remain quite far from normal.
p1+p2+p3+p4

p1 <-  ggplot(a_bmd, aes(x=a_bmd$sr_met, fill = "coral")) +
  geom_histogram() +
  labs(
    title = "MET - SqRt Transformed",
    subtitle = paste0("Skewness: ", round(skewness(a_bmd$sr_met, na.rm=TRUE),3)),
    x = "Total MET",
    y = "Frequency",
  )  +
theme_classic() + 
  theme(legend.position = "none")

p2<-  ggplot(a_bmd, aes(x=a_bmd$sr_vitd, fill = "coral")) +
  geom_histogram() +
  labs(
    title = "Vitamin D - SqRt Transformed",
    subtitle = paste0("Skewness: ", round(skewness(a_bmd$sr_vitd, na.rm=TRUE),3)),
    x = "Total Vitamin D",
    y = "Frequency",
  )  +
theme_classic() + 
  theme(legend.position = "none")
## These ones look workable
p1 + p2

While BMI and Calcium were not too far from normal, I will proceed with a LOG transformation with both for the Pearson Correlation, because this did improve their normality.

Both VitD and MET will be transformed using the SQRT functions to use for the Pearson Correlation.


Step 3: Three Correlation Tables

Table A: Predictors vs. Outcome (BMD)

  • Age (RIDAGEYR) vs. BMD
  • BMI (log-transformed) vs. BMD
  • Total calcium (log-transformed) vs. BMD
  • Total vitamin D (sqrt-transformed) vs. BMD

Create Corr_Helper function to use to generate the three tables:

# Function to calculate Pearson correlation for a pair of variables
corr_pair <- function(data, x, y) {
  # Select variables and remove missing values
  d <- data %>%
    select(all_of(c(x, y))) %>%
    drop_na()
  
  # Need at least 3 observations for correlation
  if(nrow(d) < 3) {
    return(tibble(
      x = x,
      y = y,
      n = nrow(d),
      estimate = NA_real_,
      p_value = NA_real_
    ))
  }
  
  # Calculate Pearson correlation
  ct <- cor.test(d[[x]], d[[y]], method = "pearson")
  
  # Return tidy results
  tibble(
    x = x,
    y = y,
    n = nrow(d),
    estimate = unname(ct$estimate),
    p_value = ct$p.value
  )
}

Table A Generation

##- Age (RIDAGEYR) vs. BMD 
##- BMI (log-transformed) vs. BMD  
##- Total calcium (log-transformed) vs. BMD 
##- Total vitamin D (sqrt-transformed) vs. BMD
x_list <- c("Age" = "RIDAGEYR",
       "BMI (log-transformed)" = "log_bmi",
       "Total Calcium (log-transformed)" = "log_calc",
       "Total Vitamin D (sqrt-transformed)" = "sr_vitd"
)

##loop through that list
table_a <- lapply(x_list, function(col) {
  corr_pair(a_bmd, x = col, y = "DXXOFBMD")
}) %>%
  ##convert lists to clean table
  bind_rows(.id="Predictor") %>%
  mutate(
    outcome = "BMD"
  ) %>%
  select(Predictor, outcome, everything())

table_a %>% 
  select(Predictor, outcome, n, estimate, p_value) %>% 
  kable(digits = 3, 
        caption = "**Table A:** Predictors vs. Outcome",
        col.names = c("Predictor", "Outcome", "n", "Estimate", "P-value")
        )
Table A: Predictors vs. Outcome
Predictor Outcome n Estimate P-value
Age BMD 2286 -0.232 0.000
BMI (log-transformed) BMD 2275 0.425 0.000
Total Calcium (log-transformed) BMD 2129 0.012 0.592
Total Vitamin D (sqrt-transformed) BMD 2129 -0.062 0.004

Interpretation: The initial Pearson Correlations indicate that there is a statistically significant negative association between BMD and Age (estimate = -0.232; p = <2e-16) and BMD and total Vitamin D (estimate = -0.0619;p = 0.00426), and a statistically significant positive association between BMD and BMI (log-transformed) (estimate = 0.425;p = <2e-16). The association between BMD and total calcium intake is not statistically significant.


Table B: Predictors vs. Exposure (Physical Activity)

  • Age vs. MET (sqrt-transformed)
  • BMI vs. MET - Total
  • calcium vs. MET - Total
  • vitamin D vs. MET

Table B Generation

## X-List can actually be the same for this - the outcome is different (MET), but predictors are the same.
x_list <- c("Age" = "RIDAGEYR",
       "BMI (log-transformed)" = "log_bmi",
       "Total Calcium (log-transformed)" = "log_calc",
       "Total Vitamin D (sqrt-transformed)" = "sr_vitd"
)

##loop through that list
table_b <- lapply(x_list, function(col) {
  corr_pair(a_bmd, x = col, y = "sr_met")
}) %>%
  ##convert lists to clean table
  bind_rows(.id="Predictor") %>%
  mutate(
    outcome = "MET"
  ) %>%
  select(Predictor, outcome, everything())

table_b %>% 
  select(Predictor, outcome, n, estimate, p_value) %>% 
  kable(digits = 3, 
        caption = "**Table B:** Predictors vs. Outcome",
        col.names = c("Predictor", "Outcome", "n", "Estimate", "P-value")
        )
Table B: Predictors vs. Outcome
Predictor Outcome n Estimate P-value
Age MET 1588 -0.105 0.000
BMI (log-transformed) MET 1582 0.009 0.732
Total Calcium (log-transformed) MET 1500 0.068 0.009
Total Vitamin D (sqrt-transformed) MET 1500 -0.003 0.893

The initial Pearson Correlations indicate that there is a statistically significant negative association between MET and Age (estimate = -0.105; p = 2.67e-05) and MET and total Vitamin D (estimate = -0.00349;p = 0.893), and a statistically significant positive association between MET and Calcium (log-transformed) (estimate = 0.0677;p = 0.00876). The association between MET and BMI is not statistically significant.


Table C: Predictors vs. Each Other All pairwise correlations among:

  • Age
  • BMI (transformed)
  • Total calcium (transformed)
  • Total vitamin D (transformed)

Table C Generation

##List of Predictors remains the same
x_list <- c("Age" = "RIDAGEYR",
       "BMI (log-transformed)" = "log_bmi",
       "Total Calcium (log-transformed)" = "log_calc",
       "Total Vitamin D (sqrt-transformed)" = "sr_vitd"
)
##I made this worse by just keeping the same named vector instead of making two lists
###all_pairs <- combn(x_list, 2, simplify=FALSE)

##Create a DF from the names
all_pairs <- as.data.frame(
  ##Transpose
  t(
    ##just doing names here
    combn(names(x_list), 2)), 
  stringsAsFactors = FALSE
)

# Add the actual variable codes
all_pairs <- all_pairs %>%
  rename(var1_name = V1, var2_name = V2) %>%
  mutate(
    var1_value = x_list[var1_name],
    var2_value = x_list[var2_name]
  )

##I think for this, it makes the most sense to just, do a regular loop 1-row, count, access rows?
table_c <- lapply(1:nrow(all_pairs), function(i) {
  ##yeah, this is whatever row we're on
  test <- all_pairs[i, ]
  ##dynamic function call
    corr_pair(a_bmd, 
            x = test$var1_value[[1]],
            y = test$var2_value[[1]]) %>%
    mutate(Predictor = test$var1_name[[1]], outcome = test$var2_name[[1]])}) %>%
bind_rows()


table_c %>% 
  select(Predictor, outcome, n, estimate, p_value) %>% 
  kable(digits = 3, 
        caption = "**Table C:** Predictors vs. Each Other",
        col.names = c("Predictor", "Outcome", "n", "Estimate", "P-value")
        )
Table C: Predictors vs. Each Other
Predictor Outcome n Estimate P-value
Age BMI (log-transformed) 2275 -0.093 0.000
Age Total Calcium (log-transformed) 2129 0.067 0.002
Age Total Vitamin D (sqrt-transformed) 2129 0.152 0.000
BMI (log-transformed) Total Calcium (log-transformed) 2122 0.009 0.685
BMI (log-transformed) Total Vitamin D (sqrt-transformed) 2122 0.026 0.226
Total Calcium (log-transformed) Total Vitamin D (sqrt-transformed) 2129 0.291 0.000
  • Age has a statistically significant negative association with BMI (estimate = -0.0928; p = 9.28e-06), and statistically significant positive associations with Calcium (0.0666; p = 0.00211) and Vitamin D (0.152; p = 2.02e-12).

  • BMI has a statistically significant negative association with Age (estimate = -0.0928; p = 9.28e-06), and no other predictors.

  • Calcium has a statistically significant positive association with Age (0.0666; p = 0.00211) and Vitamin D (0.291; p = <2e-16).

  • Vitamin D has statistically significant positive associations with Age(0.152; p = 2.02e-12) and Calcium (0.291; p = <2e-16).

  • No other pairs of predictors are statistically significant.

  • All of these predictors have very low correlations with each other, indicating that there are no major concerns for multicollinearity.


Part 3: Reflection

Both ANOVA and Correlation are informative. ANOVA is relevant when you’re looking to compare the differences in a continuous variable across a categorical variable, whereas Correlations are useful for identifying associations in two continuous variables. With ANOVA, we might be able to dig into which other categorical variables affect the mean BMD values – sex or education level may be a factor. With correlation, we can identify the strength and direction of any association between continuous variables, like diastolic blood pressure and BMD.

In this dataset, the normality needed for a linear correlation was the toughest assumption we needed to reach. We were able to transform the data to be relatively close to normal but if we weren’t able to, we would likely need to do a different type of correlation – perhaps the Spearman Correlation. ANOVA in general is quite robust at large enough sample sizes, because of the Central Limit Theorem.

Additionally, because NHANES data is cross-sectional, it cannot be used to establish temporality or causation. We can only see if there’s an association between variables. With BMD, which was the focus of this analysis, in particular, this measure is something that is linked to very long, years-long, factors in someone’s life, whereas something like dietary calcium is recently measured and may not reflect historical calcium intake. In particular – it might be confounded by indication, given that high dietary calcium now could be linked to specifically taking calcium supplements as a result of bone issues.

I do wish that I had found the template to work from ahead of time, but working without the template was very helpful for my coding skills in general. The toughest part was figuring out how to structure the loop to execute Table C. I actually decided to give up on using the loop, and rbind the function call output 6 times, and then when I ran into the same error (needing a numeric for the correlation function), I realized that I was calling the column and I needed to access the contents of the row with double brackets. So I was able to go back in and correct it by adding in the [[1]] into the loop.