Assignment 1: ANOVA and Correlation

Part 0: Data Preparation

Load Required Packages
Import bmd.csv

##Read CSV from workspace
bmd = read.csv('bmd.csv', header = T)
##Preview dataset
head(bmd)

##    SEQN RIAGENDR RIDAGEYR RIDRETH1 BMXBMI smoker totmet metcat DXXOFBMD tbmdcat
## 1 93705        2       66        4   31.7      2    240      0    1.058       0
## 2 93708        2       66        5   23.7      3    120      0    0.801       1
## 3 93709        2       75        4   38.9      1    720      1    0.880       0
## 4 93711        1       56        5   21.3      3    840      1    0.851       1
## 5 93713        1       67        3   23.5      1    360      0    0.778       1
## 6 93714        2       54        4   39.9      2     NA     NA    0.994       0
##   calcium vitd DSQTVD DSQTCALC
## 1   503.5 1.85 20.557   211.67
## 2   473.5 5.85 25.000   820.00
## 3      NA   NA     NA       NA
## 4  1248.5 3.85 25.000    35.00
## 5   660.5 2.35     NA    13.33
## 6   776.0 5.65     NA       NA

Convert RIDRETH1 to factor with meaningful ethnicity labels.
Convert RIAGENDR to factor (male/female)
Convert smoker to factor (Current/Past/Never)

bmd <- bmd %>%
  mutate(
  ##Recode RIDRETH1 to V_ETH
  V_ETH = factor(RIDRETH1, 
                 levels = c(1,2,3,4,5),
                 labels = c("Mexican American", 
                            "Other Hispanic",
                            "Non-Hispanic White",
                            "Non-Hispanic Black",
                            "Other")
                ),
  ##Recode RIAGENDR to V_GEN
  V_GEN = factor(RIAGENDR, 
                 levels = c(1,2),
                 labels = c("Male",
                            "Female")
                ),
  ##Recode smoker to V_SMK
  V_SMK = factor(smoker,
                 levels = c(1,2,3),
                 labels = c("Current","Past","Never"))
)

#QA - Print top 3 rows, output for new var
print (bmd[1:3, (ncol(bmd)-2):ncol(bmd)])

##                V_ETH  V_GEN   V_SMK
## 1 Non-Hispanic Black Female    Past
## 2              Other Female   Never
## 3 Non-Hispanic Black Female Current

Report total sample size
Create analytic dataset removing missing BMD, ethnicity
Report final analytic sample size

# Store the total size of the dataset
M_completesamplesize <- nrow(bmd)

#a_bmd is new df, not missing any values in V_ETH or BMXBMI
a_bmd <- bmd[(!is.na(bmd$V_ETH) & !is.na(bmd$DXXOFBMD)),]

#Store total size of nonmissing dataset
M_total <- nrow(a_bmd)

Size of Complete Dataset: 2898
Number of Rows Excluded: 612 rows, 21.12%
Working Dataset size: n = 2286

Part 1: One-Way ANOVA

Research Question: Do mean BMD values differ across ethnicity groups?

Step 1: State Hypotheses

Null Hypothesis (H₀): μ₁ = μ₂ = μ₃ = μ₄ = μ₅

The null hypothesis states that the BMD mean of all 5 ethnicity groups (Mexican American, Other Hispanic, Non-Hispanic White, Non-Hispanic Black, Other) are equal

Alternative Hypothesis (H₁): i,j | μ_i≠μ_j

The alternative hypothesis states that at least one ethnicity group’s population mean (μ) differs from the others

Significance level: α = 0.05

Step 2: Exploratory Analysis

# Calculate summary statistics by ethnic group
s_bmd <- a_bmd %>%
  group_by(V_ETH) %>%
  summarise(
    n = n(),
    Mean = mean(DXXOFBMD),
    SD = sd(DXXOFBMD),
    Median = median(DXXOFBMD),
    Min = min(DXXOFBMD),
    Max = max(DXXOFBMD)
  )

##clean table
s_bmd %>% 
  rename("Ethnicity Group" = V_ETH) %>%
  kable(digits = 2, 
        caption = "Descriptive Statistics: Total Femur Bone Mineral Density (g/cm^2^) by Ethnicity Group",
        )

Descriptive Statistics: Total Femur Bone Mineral Density (g/cm²) by Ethnicity Group
Ethnicity Group	n	Mean	SD	Median	Min	Max
Mexican American	255	0.95	0.15	0.96	0.44	1.46
Other Hispanic	244	0.95	0.16	0.94	0.55	1.39
Non-Hispanic White	846	0.89	0.16	0.88	0.37	1.46
Non-Hispanic Black	532	0.97	0.16	0.98	0.44	1.55
Other	409	0.90	0.15	0.89	0.54	1.34

##plot
ggplot(a_bmd, 
  aes(x = V_ETH, y = DXXOFBMD, fill = V_ETH)) +
  geom_boxplot(alpha = 0.7, outlier.shape = NA) +
  geom_jitter(width = 0.2, alpha = 0.1, size = 0.5) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "Total Femur Bone Mineral Density by Ethnicity Group",
    subtitle = "NHANES Data",
    x = "Ethnicity Group",
    y = "Total Femur Bone Mineral Density (g/cm^2^)",
    fill = "V_ETH"
  ) +
  theme_classic(base_size = 12) +
  theme(legend.position = "none")

Step 3: ANOVA F-Test

# Fit ANOVA model
anova_bmd <- aov(DXXOFBMD ~ V_ETH, data = a_bmd)

# Display the ANOVA table
s_anova <- summary(anova_bmd)[[1]]
rownames(s_anova)[1] <- "Treatment: Ethnicity"
rownames(s_anova)[2] <- "Error"

s_anova %>% 
  rownames_to_column(var= " ") %>%
  kable(digits = 3)

	Df	Sum Sq	Mean Sq	F value	Pr(>F)
Treatment: Ethnicity	4	2.948	0.737	30.185	0
Error	2281	55.697	0.024	NA	NA

Interpretation: Based on an α = 0.05, at a p value of <0.001, we reject the null hypothesis - indicating that there is statistically significant difference in the mean total femur bone density across ethnicity groups. The F-Statistic is F(4, 2281) = 30.19.

Step 4: Assumptions

# Create diagnostic plots
par(mfrow = c(2, 2))
plot(anova_bmd)

Residuals vs. Fitted: Linearity 👍

Five distinct columns of points, one for each ethnicity group. The red line is flat on 0.0, which indicates that the model is unbiased and centered around group means.

Normal Q-Q: Normality 👍

The points follow the diagonal very closely, minimal skew away from the line indicating that the distribution is not stretched.

Scale-Location: Homoscedasticity ⚠️

This is only relatively stable, with some variation in height among the groups. It may be worth further investigating with Levene’s Test to get a specific number to interpet.

Residuals vs Leverage: Absence of Influential Outliers 👍

A handful of items are far enough to be labeled, but none pass Cook’s Distance lines, indicating that the outliers are not influential enough to warp the mean.

# Levene's test for homogeneity of variance
levene <- leveneTest(DXXOFBMD ~ V_ETH, data = a_bmd)
print(levene)

## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value Pr(>F)
## group    4  1.5728 0.1788
##       2281

Levene’s Test for Homogeneity of Variance

P value of 0.179 indicates that we fail to reject the null hypothesis, and we can proceed with the ANOVA without any violations of assumptions.

Step 5: Post-Hoc Comparisons

Tukey HSD

# Conduct Tukey HSD test
tukey_results <- TukeyHSD(anova_bmd)

Comparison	Mean Diff	95% CI	p-value	Significant?
Other Hispanic - Mexican American	-0.004	[-0.04, 0.03]	0.998	No
Non-Hispanic White - Mexican American	-0.062	[-0.09, -0.03]	2.57e-07	Yes
Non-Hispanic Black - Mexican American	0.022	[-0.01, 0.05]	0.328	No
Other - Mexican American	-0.053	[-0.09, -0.02]	0.000192	Yes
Non-Hispanic White - Other Hispanic	-0.058	[-0.09, -0.03]	3.61e-06	Yes
Non-Hispanic Black - Other Hispanic	0.027	[-0.01, 0.06]	0.172	No
Other - Other Hispanic	-0.049	[-0.08, -0.01]	0.00107	Yes
Non-Hispanic Black - Non-Hispanic White	0.085	[0.06, 0.11]	3.94e-11	Yes
Other - Non-Hispanic White	0.009	[-0.02, 0.03]	0.872	No
Other - Non-Hispanic Black	-0.076	[-0.1, -0.05]	4.18e-11	Yes

Interpretation

Based on the Tukey HSD test, the mean BMD of the Non-Hispanic White treatment group differs statistically significantly from all other ethnicity groups except for Other.
The mean BMD of the Non-Hispanic Black group differs statistically significantly from the Non-Hispanic White and Other treatment groups.
The mean BMD for the Mexican American group differs statistically significantly from only the Other and Non-Hispanic White ethnicity groups.
The mean BMD for the Other Hispanic group statistically significantly differs from the Other and Non-Hispanic White ethnicity groups.
The mean BMD for the Other group statistically significantly differs from the Non-Hispanic Black, Other Hispanic, and Mexican American groups.

Effect Size

ss_treatment <- s_anova$`Sum Sq`[1]
ss_total <- sum(s_anova$`Sum Sq`)

# Calculate eta-squared
eta_squared <- ss_treatment / ss_total

cat("Eta-squared (η²):", round(eta_squared, 4), "\n")

## Eta-squared (η²): 0.0503

cat("Percentage of variance explained:", round(eta_squared * 100, 2), "%")

## Percentage of variance explained: 5.03 %

cat("Effect Size: ", 
    ifelse(eta_squared < 0.06, 
            "Small", ifelse(eta_squared < 0.14, 
                            "Medium", 
                            "Large")
           )
    )

## Effect Size:  Small

Part 2: Correlation Analysis

Research Question: Are there associations between BMI, Calcium Vitamin D, MET?

Step 1: Create total intake variables.

Calculate:

calcium_total = calcium + DSQTCALC (replace NA with 0)
vitd_total = vitd + DSQTVD (replace NA with 0)

a_bmd <- a_bmd %>% 
  mutate(
  ##Missing value handling
     DSQTCALC_0 = replace_na(DSQTCALC,0),
     ## Downstream, I run into -inf off the logs, and then NaN in table A, so- 
 
     #I see int he student's template (that I hadn't seen when I wrote this lol) 
        #does not transform the dietary variables
            ##CALCIUM_0 = replace_na(calcium,0),
     DSQTVD_0 = replace_na(DSQTVD, 0),
     ##VITD_0 = replace_na(vitd, 0),
  ##Totals off MV cols
    calcium_total = calcium + DSQTCALC_0,
    vitd_total = vitd + DSQTVD_0
  )

Step 2: Assess and Apply Transformations

Create histograms for continuous variables (BMI, calcium, vitamin D, MET)

Decide on transformations based on skewness
Document reasoning

Suggested transformations:

Log transformation: BMI, calcium
Square root: MET, vitamin D (handles zeros better)

##start with plots

p1 <- ggplot(a_bmd, aes(x=a_bmd$BMXBMI, fill="coral")) +
  geom_histogram() +
  labs(
    title = "BMI",
    subtitle = paste0("Skewness: ", round(skewness(a_bmd$BMXBMI, na.rm=TRUE),3)),
    x = "BMI",
    y = "Frequency",
  )  +
theme_classic() + 
  theme(legend.position = "none")

p2 <-  ggplot(a_bmd, aes(x=a_bmd$calcium_total, fill = "coral")) +
  geom_histogram() +
  labs(
    title = "Calcium",
    subtitle = paste0("Skewness: ", round(skewness(a_bmd$calcium_total, na.rm=TRUE),3)),
    x = "Total Calcium",
    y = "Frequency",
  )  +
theme_classic() + 
  theme(legend.position = "none")

p3 <-  ggplot(a_bmd, aes(x=a_bmd$totmet, fill = "coral")) +
  geom_histogram() +
  labs(
    title = "MET",
    subtitle = paste0("Skewness: ", round(skewness(a_bmd$totmet, na.rm=TRUE),3)),
    x = "Total MET",
    y = "Frequency",
  )  +
theme_classic() + 
  theme(legend.position = "none")

p4 <-  ggplot(a_bmd, aes(x=a_bmd$vitd_total, fill = "coral")) +
  geom_histogram() +
  labs(
    title = "Vitamin D",
    subtitle = paste0("Skewness: ", round(skewness(a_bmd$vitd_total, na.rm=TRUE),3)),
    x = "Total Vitamin D",
    y = "Frequency",
  )  +
theme_classic() + 
  theme(legend.position = "none")
p1+p2+p3+p4

##Based on these tables, likely do not need to transform BMI, this variable has a slight right skew, but is very close to normal. Calcium has a slight larger right skew, but is still fairly close to normal - this is worth testing to see if a transformation may help, but if the adjustment is not substantial, then we will proceed without it.

## MET and Vitamin D do appear to require transformation, and with the large number of 0s, a square root transform may help.

a_bmd <- a_bmd %>%
  mutate(
    ##we'll do em all and then decide
    log_vitd = log(vitd_total),
    log_calc = log(calcium_total),
    log_bmi = log(BMXBMI),
    log_met = log(totmet),
    sr_vitd = sqrt(vitd_total),
    sr_met = sqrt(totmet)
  )

p1 <- ggplot(a_bmd, aes(x=a_bmd$log_bmi, fill="coral")) +
  geom_histogram() +
  labs(
    title = "BMI - Log Transformed",
    subtitle = paste0("Skewness: ", round(skewness(a_bmd$log_bmi, na.rm=TRUE),3)),
    x = "BMI",
    y = "Frequency",
  )  +
theme_classic() + 
  theme(legend.position = "none")

p2 <-  ggplot(a_bmd, aes(x=a_bmd$log_calc, fill = "coral")) +
  geom_histogram() +
  labs(
    title = "Calcium - Log Transformed",
    subtitle = paste0("Skewness: ", round(skewness(a_bmd$log_calc, na.rm=TRUE),3)),
    x = "Total Calcium",
    y = "Frequency",
  )  +
theme_classic() + 
  theme(legend.position = "none")

p3 <-  ggplot(a_bmd, aes(x=a_bmd$log_met, fill = "coral")) +
  geom_histogram() +
  labs(
    title = "MET - Log Transformed",
    subtitle = paste0("Skewness: ", round(skewness(a_bmd$log_met, na.rm=TRUE),3)),
    x = "Total MET",
    y = "Frequency",
  )  +
theme_classic() + 
  theme(legend.position = "none")

p4 <-  ggplot(a_bmd, aes(x=a_bmd$log_vitd, fill = "coral")) +
  geom_histogram() +
  labs(
    title = "Vitamin D - Log Transformed",
    subtitle = paste0("Skewness: ", round(skewness(a_bmd$log_vitd, na.rm=TRUE),3)),
    x = "Total Vitamin D",
    y = "Frequency",
  )  +
theme_classic() + 
  theme(legend.position = "none")
##BMI, Calcium somewhat improved but not so substantially - may be worth seeing how the correlations look if I test both.
## MET and VITD somewhat improved but still remain quite far from normal.
p1+p2+p3+p4

p1 <-  ggplot(a_bmd, aes(x=a_bmd$sr_met, fill = "coral")) +
  geom_histogram() +
  labs(
    title = "MET - SqRt Transformed",
    subtitle = paste0("Skewness: ", round(skewness(a_bmd$sr_met, na.rm=TRUE),3)),
    x = "Total MET",
    y = "Frequency",
  )  +
theme_classic() + 
  theme(legend.position = "none")

p2<-  ggplot(a_bmd, aes(x=a_bmd$sr_vitd, fill = "coral")) +
  geom_histogram() +
  labs(
    title = "Vitamin D - SqRt Transformed",
    subtitle = paste0("Skewness: ", round(skewness(a_bmd$sr_vitd, na.rm=TRUE),3)),
    x = "Total Vitamin D",
    y = "Frequency",
  )  +
theme_classic() + 
  theme(legend.position = "none")
## These ones look workable
p1 + p2

While BMI and Calcium were not too far from normal, I will proceed with a LOG transformation with both for the Pearson Correlation, because this did improve their normality.

Both VitD and MET will be transformed using the SQRT functions to use for the Pearson Correlation.

Step 3: Three Correlation Tables

Table A: Predictors vs. Outcome (BMD)

Age (RIDAGEYR) vs. BMD
BMI (log-transformed) vs. BMD
Total calcium (log-transformed) vs. BMD
Total vitamin D (sqrt-transformed) vs. BMD

Create Corr_Helper function to use to generate the three tables:

# Function to calculate Pearson correlation for a pair of variables
corr_pair <- function(data, x, y) {
  # Select variables and remove missing values
  d <- data %>%
    select(all_of(c(x, y))) %>%
    drop_na()
  
  # Need at least 3 observations for correlation
  if(nrow(d) < 3) {
    return(tibble(
      x = x,
      y = y,
      n = nrow(d),
      estimate = NA_real_,
      p_value = NA_real_
    ))
  }
  
  # Calculate Pearson correlation
  ct <- cor.test(d[[x]], d[[y]], method = "pearson")
  
  # Return tidy results
  tibble(
    x = x,
    y = y,
    n = nrow(d),
    estimate = unname(ct$estimate),
    p_value = ct$p.value
  )
}

Table A Generation

##- Age (RIDAGEYR) vs. BMD 
##- BMI (log-transformed) vs. BMD  
##- Total calcium (log-transformed) vs. BMD 
##- Total vitamin D (sqrt-transformed) vs. BMD
x_list <- c("Age" = "RIDAGEYR",
       "BMI (log-transformed)" = "log_bmi",
       "Total Calcium (log-transformed)" = "log_calc",
       "Total Vitamin D (sqrt-transformed)" = "sr_vitd"
)

##loop through that list
table_a <- lapply(x_list, function(col) {
  corr_pair(a_bmd, x = col, y = "DXXOFBMD")
}) %>%
  ##convert lists to clean table
  bind_rows(.id="Predictor") %>%
  mutate(
    outcome = "BMD"
  ) %>%
  select(Predictor, outcome, everything())

table_a %>% 
  select(Predictor, outcome, n, estimate, p_value) %>% 
  kable(digits = 3, 
        caption = "**Table A:** Predictors vs. Outcome",
        col.names = c("Predictor", "Outcome", "n", "Estimate", "P-value")
        )

**Table A:** Predictors vs. Outcome
Predictor	Outcome	n	Estimate	P-value
Age	BMD	2286	-0.232	0.000
BMI (log-transformed)	BMD	2275	0.425	0.000
Total Calcium (log-transformed)	BMD	2129	0.012	0.592
Total Vitamin D (sqrt-transformed)	BMD	2129	-0.062	0.004

Interpretation: The initial Pearson Correlations indicate that there is a statistically significant negative association between BMD and Age (estimate = -0.232; p = <2e-16) and BMD and total Vitamin D (estimate = -0.0619;p = 0.00426), and a statistically significant positive association between BMD and BMI (log-transformed) (estimate = 0.425;p = <2e-16). The association between BMD and total calcium intake is not statistically significant.

Table B: Predictors vs. Exposure (Physical Activity)

Age vs. MET (sqrt-transformed)
BMI vs. MET - Total
calcium vs. MET - Total
vitamin D vs. MET

Table B Generation

## X-List can actually be the same for this - the outcome is different (MET), but predictors are the same.
x_list <- c("Age" = "RIDAGEYR",
       "BMI (log-transformed)" = "log_bmi",
       "Total Calcium (log-transformed)" = "log_calc",
       "Total Vitamin D (sqrt-transformed)" = "sr_vitd"
)

##loop through that list
table_b <- lapply(x_list, function(col) {
  corr_pair(a_bmd, x = col, y = "sr_met")
}) %>%
  ##convert lists to clean table
  bind_rows(.id="Predictor") %>%
  mutate(
    outcome = "MET"
  ) %>%
  select(Predictor, outcome, everything())

table_b %>% 
  select(Predictor, outcome, n, estimate, p_value) %>% 
  kable(digits = 3, 
        caption = "**Table B:** Predictors vs. Outcome",
        col.names = c("Predictor", "Outcome", "n", "Estimate", "P-value")
        )

**Table B:** Predictors vs. Outcome
Predictor	Outcome	n	Estimate	P-value
Age	MET	1588	-0.105	0.000
BMI (log-transformed)	MET	1582	0.009	0.732
Total Calcium (log-transformed)	MET	1500	0.068	0.009
Total Vitamin D (sqrt-transformed)	MET	1500	-0.003	0.893

The initial Pearson Correlations indicate that there is a statistically significant negative association between MET and Age (estimate = -0.105; p = 2.67e-05) and MET and total Vitamin D (estimate = -0.00349;p = 0.893), and a statistically significant positive association between MET and Calcium (log-transformed) (estimate = 0.0677;p = 0.00876). The association between MET and BMI is not statistically significant.

Table C: Predictors vs. Each Other All pairwise correlations among:

Age
BMI (transformed)
Total calcium (transformed)
Total vitamin D (transformed)

Table C Generation

##List of Predictors remains the same
x_list <- c("Age" = "RIDAGEYR",
       "BMI (log-transformed)" = "log_bmi",
       "Total Calcium (log-transformed)" = "log_calc",
       "Total Vitamin D (sqrt-transformed)" = "sr_vitd"
)
##I made this worse by just keeping the same named vector instead of making two lists
###all_pairs <- combn(x_list, 2, simplify=FALSE)

##Create a DF from the names
all_pairs <- as.data.frame(
  ##Transpose
  t(
    ##just doing names here
    combn(names(x_list), 2)), 
  stringsAsFactors = FALSE
)

# Add the actual variable codes
all_pairs <- all_pairs %>%
  rename(var1_name = V1, var2_name = V2) %>%
  mutate(
    var1_value = x_list[var1_name],
    var2_value = x_list[var2_name]
  )

##I think for this, it makes the most sense to just, do a regular loop 1-row, count, access rows?
table_c <- lapply(1:nrow(all_pairs), function(i) {
  ##yeah, this is whatever row we're on
  test <- all_pairs[i, ]
  ##dynamic function call
    corr_pair(a_bmd, 
            x = test$var1_value[[1]],
            y = test$var2_value[[1]]) %>%
    mutate(Predictor = test$var1_name[[1]], outcome = test$var2_name[[1]])}) %>%
bind_rows()


table_c %>% 
  select(Predictor, outcome, n, estimate, p_value) %>% 
  kable(digits = 3, 
        caption = "**Table C:** Predictors vs. Each Other",
        col.names = c("Predictor", "Outcome", "n", "Estimate", "P-value")
        )

**Table C:** Predictors vs. Each Other
Predictor	Outcome	n	Estimate	P-value
Age	BMI (log-transformed)	2275	-0.093	0.000
Age	Total Calcium (log-transformed)	2129	0.067	0.002
Age	Total Vitamin D (sqrt-transformed)	2129	0.152	0.000
BMI (log-transformed)	Total Calcium (log-transformed)	2122	0.009	0.685
BMI (log-transformed)	Total Vitamin D (sqrt-transformed)	2122	0.026	0.226
Total Calcium (log-transformed)	Total Vitamin D (sqrt-transformed)	2129	0.291	0.000

Age has a statistically significant negative association with BMI (estimate = -0.0928; p = 9.28e-06), and statistically significant positive associations with Calcium (0.0666; p = 0.00211) and Vitamin D (0.152; p = 2.02e-12).
BMI has a statistically significant negative association with Age (estimate = -0.0928; p = 9.28e-06), and no other predictors.
Calcium has a statistically significant positive association with Age (0.0666; p = 0.00211) and Vitamin D (0.291; p = <2e-16).
Vitamin D has statistically significant positive associations with Age(0.152; p = 2.02e-12) and Calcium (0.291; p = <2e-16).
No other pairs of predictors are statistically significant.
All of these predictors have very low correlations with each other, indicating that there are no major concerns for multicollinearity.

Part 3: Reflection

Both ANOVA and Correlation are informative. ANOVA is relevant when you’re looking to compare the differences in a continuous variable across a categorical variable, whereas Correlations are useful for identifying associations in two continuous variables. With ANOVA, we might be able to dig into which other categorical variables affect the mean BMD values – sex or education level may be a factor. With correlation, we can identify the strength and direction of any association between continuous variables, like diastolic blood pressure and BMD.

In this dataset, the normality needed for a linear correlation was the toughest assumption we needed to reach. We were able to transform the data to be relatively close to normal but if we weren’t able to, we would likely need to do a different type of correlation – perhaps the Spearman Correlation. ANOVA in general is quite robust at large enough sample sizes, because of the Central Limit Theorem.

Additionally, because NHANES data is cross-sectional, it cannot be used to establish temporality or causation. We can only see if there’s an association between variables. With BMD, which was the focus of this analysis, in particular, this measure is something that is linked to very long, years-long, factors in someone’s life, whereas something like dietary calcium is recently measured and may not reflect historical calcium intake. In particular – it might be confounded by indication, given that high dietary calcium now could be linked to specifically taking calcium supplements as a result of bone issues.

I do wish that I had found the template to work from ahead of time, but working without the template was very helpful for my coding skills in general. The toughest part was figuring out how to structure the loop to execute Table C. I actually decided to give up on using the loop, and rbind the function call output 6 times, and then when I ran into the same error (needing a numeric for the correlation function), I realized that I was calling the column and I needed to access the contents of the row with double brackets. So I was able to go back in and correct it by adding in the [[1]] into the loop.