Homework 1 overview

Due: Thursday, February 20, 2026 (11:59 pm ET)
Topics: One-way ANOVA + Correlation (based on Lectures/Labs 02–03)
Dataset: bmd.csv (NHANES 2017–2018 DEXA subsample)

AI policy (Homework 1): AI tools (ChatGPT, Gemini, Claude, Copilot, etc.) are not permitted for coding assistance on HW1. You must write/debug your code independently.

Data description

This homework uses bmd.csv, a subset of NHANES 2017–2018 respondents who completed a DEXA scan.

Key variables:

Variable	Description	Type
`DXXOFBMD`	Total femur bone mineral density	Continuous (g/cm²)
`RIDRETH1`	Race/ethnicity	Categorical (1–5)
`RIAGENDR`	Sex	Categorical (1=Male, 2=Female)
`RIDAGEYR`	Age in years	Continuous
`BMXBMI`	Body mass index	Continuous (kg/m²)
`smoker`	Smoking status	Categorical (1=Current, 2=Past, 3=Never)
`calcium`	Dietary calcium intake	Continuous (mg/day)
`DSQTCALC`	Supplemental calcium	Continuous (mg/day)
`vitd`	Dietary vitamin D intake	Continuous (mcg/day)
`DSQTVD`	Supplemental vitamin D	Continuous (mcg/day)
`totmet`	Total physical activity (MET-min/week)	Continuous

Part 0 — Load and prepare the data

0.1 Import

Load required packages and import the dataset.

# Load required packages
library(tidyverse)
library(car)
library(kableExtra)
library(broom)

# Import data (adjust path as needed)
bmd <- readr::read_csv("/Users/nataliasmall/Downloads/EPI 553/Assignments/HW01/bmd.csv", show_col_types = FALSE)

# Inspect the data
glimpse(bmd)

## Rows: 2,898
## Columns: 14
## $ SEQN     <dbl> 93705, 93708, 93709, 93711, 93713, 93714, 93715, 93716, 93721…
## $ RIAGENDR <dbl> 2, 2, 2, 1, 1, 2, 1, 1, 2, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2…
## $ RIDAGEYR <dbl> 66, 66, 75, 56, 67, 54, 71, 61, 60, 60, 64, 67, 70, 53, 57, 7…
## $ RIDRETH1 <dbl> 4, 5, 4, 5, 3, 4, 5, 5, 1, 3, 3, 1, 5, 4, 2, 3, 2, 4, 4, 3, 3…
## $ BMXBMI   <dbl> 31.7, 23.7, 38.9, 21.3, 23.5, 39.9, 22.5, 30.7, 35.9, 23.8, 2…
## $ smoker   <dbl> 2, 3, 1, 3, 1, 2, 1, 2, 3, 3, 3, 2, 2, 3, 3, 2, 3, 1, 2, 1, 1…
## $ totmet   <dbl> 240, 120, 720, 840, 360, NA, 6320, 2400, NA, NA, 1680, 240, 4…
## $ metcat   <dbl> 0, 0, 1, 1, 0, NA, 2, 2, NA, NA, 2, 0, 0, 0, 1, NA, 0, NA, 1,…
## $ DXXOFBMD <dbl> 1.058, 0.801, 0.880, 0.851, 0.778, 0.994, 0.952, 1.121, NA, 0…
## $ tbmdcat  <dbl> 0, 1, 0, 1, 1, 0, 0, 0, NA, 1, 0, 0, 1, 0, 0, 1, NA, NA, 0, N…
## $ calcium  <dbl> 503.5, 473.5, NA, 1248.5, 660.5, 776.0, 452.0, 853.5, 929.0, …
## $ vitd     <dbl> 1.85, 5.85, NA, 3.85, 2.35, 5.65, 3.75, 4.45, 6.05, 6.45, 3.3…
## $ DSQTVD   <dbl> 20.557, 25.000, NA, 25.000, NA, NA, NA, 100.000, 50.000, 46.6…
## $ DSQTCALC <dbl> 211.67, 820.00, NA, 35.00, 13.33, NA, 26.67, 1066.67, 35.00, …

0.2 Recode to readable factors

Create readable labels for the main categorical variables:

bmd <- bmd %>%
  mutate(
    sex = factor(RIAGENDR, 
                 levels = c(1, 2), 
                 labels = c("Male", "Female")),
    ethnicity = factor(RIDRETH1, 
                       levels = c(1, 2, 3, 4, 5),
                       labels = c("Mexican American", 
                                  "Other Hispanic",
                                  "Non-Hispanic White", 
                                  "Non-Hispanic Black", 
                                  "Other")),
    smoker_f = factor(smoker, 
                      levels = c(1, 2, 3),
                      labels = c("Current", "Past", "Never"))
  )

0.3 Missingness + analytic sample

Report the total sample size and create the analytic dataset for ANOVA by removing missing values:

# Total observations
n_total <- nrow(bmd)

# Create analytic dataset for ANOVA (complete cases for BMD and ethnicity / remove missing BMD and ethnicity)
anova_df <- bmd %>%
  filter(!is.na(DXXOFBMD), !is.na(ethnicity))

# Sample size for ANOVA analysis
n_anova <- nrow(anova_df)

# Display sample sizes
n_total

## [1] 2898

n_anova

## [1] 2286

2–4 sentences summarizing analytic sample

The total sample contains 2898 observations and 17 variables. 612 observations were removed due to missing data, which accounts for about 21% missing before cleaning. The final analytic sample size is 2286 for the ANOVA analysis.

Part 1 — One-way ANOVA: BMD by ethnicity

Research question: Do mean BMD values differ across ethnicity groups?

1.1 Hypotheses (required)

Statistical notation:

H₀: μ_1 = μ_2 = μ_3 = μ_4 = μ_5
Hₐ: At least one ethnicity group has a mean BMD value that differs from the others

Plain language:

H₀: Mean BMD is equal across all ethnicity groups
Hₐ: At least one ethnicity groups mean BMD differs from the others

1.2 Exploratory plot and group summaries

Table showing sample size, mean, and standard deviation of BMD for each ethnicity group:

anova_df %>%
  group_by(ethnicity) %>%
  summarise(
    n = n(),
    mean_bmd = mean(DXXOFBMD, na.rm = TRUE),
    sd_bmd = sd(DXXOFBMD, na.rm = TRUE)
  ) %>%
  arrange(desc(n)) %>%
  kable(digits = 3)

ethnicity	n	mean_bmd	sd_bmd
Non-Hispanic White	846	0.888	0.160
Non-Hispanic Black	532	0.973	0.161
Other	409	0.897	0.148
Mexican American	255	0.950	0.149
Other Hispanic	244	0.946	0.156

Visualization comparing BMD distributions across ethnicity groups:

ggplot(anova_df, aes(x = ethnicity, y = DXXOFBMD)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.15, alpha = 0.25) +
  labs(
    x = "Ethnicity group",
    y = "Bone mineral density (g/cm²)"
  ) +
  theme(axis.text.x = element_text(angle = 25, hjust = 1))

Interpret the plot: What patterns do you observe in BMD across ethnicity groups? “Non-Hispanic Black” individuals show the higher mean BMD compared to other ethnicity groups. The “Mexican American” and “Other Hispanic” groups follow closely behind, with “Non-Hispanic White” and “Other” groups displaying lower mean BMD values. The “Non-Hispanic Black” and “Non-Hispanic White” have the highest concentration of dots, indicating that they have greater number of people within their ethnicity groups, compared to the other ethnicity groups, like “Mexican American” and “Other Hispanic” who have lower a lower number of people in their groups.

1.3 Fit ANOVA model + report the F-test

Fit a one-way ANOVA with:

Outcome: BMD (DXXOFBMD)
Predictor: Ethnicity (5 groups)

# Fit ANOVA model
fit_aov <- aov(DXXOFBMD ~ ethnicity, data = anova_df)

# Display ANOVA table
summary(fit_aov)

##               Df Sum Sq Mean Sq F value Pr(>F)    
## ethnicity      4   2.95  0.7371   30.18 <2e-16 ***
## Residuals   2281  55.70  0.0244                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Report the F-statistic, degrees of freedom, and p-value in a formatted table:

# Create tidy ANOVA table
broom::tidy(fit_aov) %>%
  kable(digits = 4)

term	df	sumsq	meansq	statistic	p.value
ethnicity	4	2.9482	0.7371	30.185	0
Residuals	2281	55.6973	0.0244	NA	NA

Write your interpretation here: Since p < 0.05, we reject H₀. There is statistically significant evidence that at least one mean BMD value differs across ethnicity groups.

1.4 Assumption checks (normality + equal variances)

ANOVA requires three assumptions: independence, normality of residuals, and equal variances. Check the latter two graphically and with formal tests.

If assumptions are not satisfied, say so clearly, but still proceed with the ANOVA and post-hoc tests.

Normality of residuals

Create diagnostic plots:

par(mfrow = c(1, 2))
plot(fit_aov, which = 1)  # Residuals vs fitted
plot(fit_aov, which = 2)  # QQ plot

Equal variances (Levene’s test)

car::leveneTest(DXXOFBMD ~ ethnicity, data = anova_df)

## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value Pr(>F)
## group    4  1.5728 0.1788
##       2281

Summarize what you see in 2–4 sentences:

Based on the QQ plot, points do follow the diagonal line, so normality assumption is reasonable. Based on Levene’s test, since p > 0.05, fail to reject equal variances. Equal variance assumption is met. Residuals vs Fitted show random scatter with no pattern, indicating that the independence assumption holds.

1.5 Post-hoc comparisons (pairwise tests)

Because ethnicity has 5 groups, you must do a multiple-comparisons procedure.

Use Tukey HSD and report:

Pairwise comparisons
Mean differences
95% confidence intervals
Adjusted p-values

tuk <- TukeyHSD(fit_aov)
tuk

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = DXXOFBMD ~ ethnicity, data = anova_df)
## 
## $ethnicity
##                                               diff          lwr         upr
## Other Hispanic-Mexican American       -0.004441273 -0.042644130  0.03376158
## Non-Hispanic White-Mexican American   -0.062377249 -0.092852618 -0.03190188
## Non-Hispanic Black-Mexican American    0.022391619 -0.010100052  0.05488329
## Other-Mexican American                -0.053319344 -0.087357253 -0.01928144
## Non-Hispanic White-Other Hispanic     -0.057935976 -0.088934694 -0.02693726
## Non-Hispanic Black-Other Hispanic      0.026832892 -0.006150151  0.05981593
## Other-Other Hispanic                  -0.048878071 -0.083385341 -0.01437080
## Non-Hispanic Black-Non-Hispanic White  0.084768868  0.061164402  0.10837333
## Other-Non-Hispanic White               0.009057905 -0.016633367  0.03474918
## Other-Non-Hispanic Black              -0.075710963 -0.103764519 -0.04765741
##                                           p adj
## Other Hispanic-Mexican American       0.9978049
## Non-Hispanic White-Mexican American   0.0000003
## Non-Hispanic Black-Mexican American   0.3276613
## Other-Mexican American                0.0001919
## Non-Hispanic White-Other Hispanic     0.0000036
## Non-Hispanic Black-Other Hispanic     0.1722466
## Other-Other Hispanic                  0.0010720
## Non-Hispanic Black-Non-Hispanic White 0.0000000
## Other-Non-Hispanic White              0.8719109
## Other-Non-Hispanic Black              0.0000000

Create and present a readable table (you can tidy the Tukey output):

# YOUR CODE HERE to create a tidy table from Tukey results
# Hint: Convert to data frame, add rownames as column, filter for significance

# Extract and format results
tukey_summary <- as.data.frame(tuk$ethnicity)
tukey_summary$Comparison <- rownames(tukey_summary)
tukey_summary <- tukey_summary[, c("Comparison", "diff", "lwr", "upr", "p adj")]

# Add interpretation columns
tukey_summary$Significant <- ifelse(tukey_summary$`p adj` < 0.05, "Yes", "No")
tukey_summary$Direction <- ifelse(
  tukey_summary$Significant == "Yes",
  ifelse(tukey_summary$diff > 0, "First group higher", "Second group higher"),
  "No difference"
)

kable(tukey_summary, digits = 4,
      caption = "Tukey HSD post-hoc comparisons with 95% confidence intervals")

Tukey HSD post-hoc comparisons with 95% confidence intervals
	Comparison	diff	lwr	upr	p adj	Significant	Direction
Other Hispanic-Mexican American	Other Hispanic-Mexican American	-0.0044	-0.0426	0.0338	0.9978	No	No difference
Non-Hispanic White-Mexican American	Non-Hispanic White-Mexican American	-0.0624	-0.0929	-0.0319	0.0000	Yes	Second group higher
Non-Hispanic Black-Mexican American	Non-Hispanic Black-Mexican American	0.0224	-0.0101	0.0549	0.3277	No	No difference
Other-Mexican American	Other-Mexican American	-0.0533	-0.0874	-0.0193	0.0002	Yes	Second group higher
Non-Hispanic White-Other Hispanic	Non-Hispanic White-Other Hispanic	-0.0579	-0.0889	-0.0269	0.0000	Yes	Second group higher
Non-Hispanic Black-Other Hispanic	Non-Hispanic Black-Other Hispanic	0.0268	-0.0062	0.0598	0.1722	No	No difference
Other-Other Hispanic	Other-Other Hispanic	-0.0489	-0.0834	-0.0144	0.0011	Yes	Second group higher
Non-Hispanic Black-Non-Hispanic White	Non-Hispanic Black-Non-Hispanic White	0.0848	0.0612	0.1084	0.0000	Yes	First group higher
Other-Non-Hispanic White	Other-Non-Hispanic White	0.0091	-0.0166	0.0347	0.8719	No	No difference
Other-Non-Hispanic Black	Other-Non-Hispanic Black	-0.0757	-0.1038	-0.0477	0.0000	Yes	Second group higher

Write a short paragraph: The groups that differed were Non-Hispanic White - Mexican American, Other - Mexican American, Non-Hispanic White - Other Hispanic, Other - Other Hispanic, Non-Hispanic Black - Non-Hispanic White, and Other - Non-Hispanic Black. Moreover, Non-Hispanic White - Mexican American, Other - Mexican American, Non-Hispanic White - Other Hispanic, Other - Other Hispanic, Non-Hispanic Black - Non-Hispanic White, and Other - Non-Hispanic Black were specific ethnicity pairs that showed statistically significant differences in mean BMD.

1.6 Effect size (eta-squared)

Compute η² and interpret it (small/moderate/large is okay as a qualitative statement).

Formula: η² = SS_between / SS_total

# Get ANOVA table with sum of squares
anova_tbl <- broom::tidy(fit_aov)
anova_tbl

## # A tibble: 2 × 6
##   term         df sumsq meansq statistic   p.value
##   <chr>     <dbl> <dbl>  <dbl>     <dbl>     <dbl>
## 1 ethnicity     4  2.95 0.737       30.2  1.65e-24
## 2 Residuals  2281 55.7  0.0244      NA   NA

# YOUR CODE HERE to compute eta-squared
# Hint: Extract SS for ethnicity and Residuals, then calculate proportion
ss_between<- anova_tbl$sumsq[1]
ss_residuals<- anova_tbl$sumsq [2]
ss_total <- ss_between + ss_residuals
eta_squared <- ss_between / ss_total

Interpret in 1–2 sentences:

Our calculated effect size is η² = 0.0503 which is around medium. This means ethnicity explains approximately 5.03% of the variance in BMD.

Part 2 — Correlation analysis (40 points)

In this section you will:

Create total intake variables (dietary + supplemental)
Assess whether transformations are needed
Produce three correlation tables examining relationships between:
- Table A: Predictors vs. outcome (BMD)
- Table B: Predictors vs. exposure (physical activity)
- Table C: Predictors vs. each other

2.1 Create total intake variables

Calculate total nutrient intake by adding dietary and supplemental sources:

# Create total calcium and vitamin D variables
# Note: Replace NA in supplement variables with 0 (no supplementation)
bmd2 <- bmd %>%
  mutate(
    DSQTCALC_0 = replace_na(DSQTCALC, 0),
    DSQTVD_0 = replace_na(DSQTVD, 0),
    calcium_total = calcium + DSQTCALC_0,
    vitd_total = vitd + DSQTVD_0
  )

2.2 Decide on transformations (show evidence)

Before calculating correlations, examine the distributions of continuous variables. Based on skewness and the presence of outliers, you may need to apply transformations.

Create histograms to assess distributions:

# YOUR CODE HERE: Create histograms for key continuous variables
# Recommended: BMXBMI, calcium_total, vitd_total, totmet

library(patchwork)
p1 <- ggplot(bmd2, aes(x = BMXBMI)) + geom_histogram(bins = 30)
p2 <- ggplot(bmd2, aes(x = calcium_total)) + geom_histogram(bins = 30)
p3 <- ggplot(bmd2, aes(x = vitd_total)) + geom_histogram(bins = 30)
p4 <- ggplot(bmd2, aes(x = totmet)) + geom_histogram(bins = 30)
(p1 | p2) / (p3 | p4)

Based on your assessment, apply transformations as needed:

bmd2 <- bmd2 %>%
  mutate(
    # Add your transformation code here
    #doing log bmi for table A further down
    log_calcium_total = log(calcium_total),
    sqrt_totmet = sqrt(totmet),
    sqrt_vitd_total = sqrt(vitd_total),
    log_bmi = log(BMXBMI)
  )

Document your reasoning: I applied transformations to vitd_total,calcium_total, and totmet because relationships are clearly non-linear. vitd_total and totmet appear severely right skewed, and calcium_total while not as severely affected can also be seen as rightly skewed.

2.3 Create three correlation tables (Pearson)

For each correlation, report:

Correlation coefficient (r)
p-value
Sample size (n)

Helper function for correlation pairs

# Function to calculate Pearson correlation for a pair of variables
corr_pair <- function(data, x, y) {
  # Select variables and remove missing values
  d <- data %>%
    select(all_of(c(x, y))) %>%
    drop_na()
  
  # Need at least 3 observations for correlation
  if(nrow(d) < 3) {
    return(tibble(
      x = x,
      y = y,
      n = nrow(d),
      estimate = NA_real_,
      p_value = NA_real_
    ))
  }
  
  # Calculate Pearson correlation
  ct <- cor.test(d[[x]], d[[y]], method = "pearson")
  
  # Return tidy results
  tibble(
    x = x,
    y = y,
    n = nrow(d),
    estimate = unname(ct$estimate),
    p_value = ct$p.value
  )
}

Table A: Variables associated with the outcome (BMD)

Examine correlations between potential predictors and BMD:

Age vs. BMD
BMI vs. BMD
Total calcium vs. BMD
Total vitamin D vs. BMD

Use transformed versions where appropriate.

# YOUR CODE HERE
# Use the corr_pair function to create correlations
# Use bind_rows() to combine multiple correlation results
# Format with kable()

tableA <- bind_rows(
corr_pair(bmd2, "RIDAGEYR", "DXXOFBMD"),
corr_pair(bmd2, "log_bmi", "DXXOFBMD"),
corr_pair(bmd2, "log_calcium_total", "DXXOFBMD"),
corr_pair(bmd2, "sqrt_vitd_total", "DXXOFBMD")
) %>%
mutate(
estimate = round(estimate, 3),
p_value = signif(p_value, 3)
)

tableA %>% kable()

x	y	n	estimate	p_value
RIDAGEYR	DXXOFBMD	2286	-0.232	0.00000
log_bmi	DXXOFBMD	2275	0.425	0.00000
log_calcium_total	DXXOFBMD	2129	0.012	0.59200
sqrt_vitd_total	DXXOFBMD	2129	-0.062	0.00426

Interpret the results: RIDAGEYR, log_bmi, and sqrt_vitd_total show statistically significant correlations with BMD. RIDAGEYR is going in the negative direction, and has a weak correlation relationship. log_bmi is going in the positive direction, and has a moderate correlation relationship. sqrt_vitd_total is going in the negative direction, and has a weak correlation relationship.

Table B: Variables associated with the exposure (MET)

Examine correlations between potential confounders and physical activity:

Age vs. MET
BMI vs. MET
Total calcium vs. MET
Total vitamin D vs. MET

# YOUR CODE HERE
# Follow the same structure as Table A
# Use sqrt_totmet (or transformed version) as the outcome variable
tableB <- bind_rows(
corr_pair(bmd2, "RIDAGEYR", "sqrt_totmet"),
corr_pair(bmd2, "log_bmi", "sqrt_totmet"),
corr_pair(bmd2, "log_calcium_total", "sqrt_totmet"),
corr_pair(bmd2, "sqrt_vitd_total", "sqrt_totmet")
) %>%
mutate(
estimate = round(estimate, 3),
p_value = signif(p_value, 3)
)

tableB %>% kable()

x	y	n	estimate	p_value
RIDAGEYR	sqrt_totmet	1934	-0.097	1.96e-05
log_bmi	sqrt_totmet	1912	0.001	9.51e-01
log_calcium_total	sqrt_totmet	1777	0.086	2.82e-04
sqrt_vitd_total	sqrt_totmet	1777	-0.002	9.32e-01

Interpret the results: RIDAGEYR, log_calcium_total, and sqrt_vitd_total are associated with physical activity levels.

Table C: Predictors associated with each other

Examine correlations among all predictor variables (all pairwise combinations):

Age, BMI, Total calcium, Total vitamin D

# YOUR CODE HERE
# Create all pairwise combinations of predictors
# Hint: Use combn() to generate pairs, then map_dfr() with corr_pair()

preds <- c("RIDAGEYR", "log_bmi", "log_calcium_total", "sqrt_vitd_total")
pairs <- combn(preds, 2, simplify = FALSE)
tableC <- map_dfr(pairs, ~corr_pair(bmd2, .x[1], .x[2])) %>%
mutate(
estimate = round(estimate, 3),
p_value = signif(p_value, 3)
)

tableC %>% kable()

x	y	n	estimate	p_value
RIDAGEYR	log_bmi	2834	-0.098	2.00e-07
RIDAGEYR	log_calcium_total	2605	0.048	1.36e-02
RIDAGEYR	sqrt_vitd_total	2605	0.153	0.00e+00
log_bmi	log_calcium_total	2569	0.000	9.81e-01
log_bmi	sqrt_vitd_total	2569	0.007	7.31e-01
log_calcium_total	sqrt_vitd_total	2605	0.314	0.00e+00

Interpret the results:

There are no strong correlations among predictors that might indicate multicollinearity concerns in future regression analyses.

Optional: Visualization examples

Create scatterplots to visualize key relationships:

#BMD vs BMI
ggplot(bmd2, aes(x = log_bmi, y = DXXOFBMD)) +
  geom_point(alpha = 0.25) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    x = "log(BMI)",
    y = "BMD (g/cm²)",
    title = "BMD vs BMI (transformed)"
  )

#BMD vs Physical Activity
ggplot(bmd2, aes(x = sqrt_totmet, y = DXXOFBMD)) +
  geom_point(alpha = 0.25) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    x = "sqrt(totmet)",
    y = "BMD (g/cm²)",
    title = "BMD vs MET (transformed)"
  )

Part 3 — Reflection

You would use ANOVA versus correlation in epidemiological research when one variable is categorical. ANOVA addresses the multiple comparisons problem. When comparing 3+ groups. While, correlation measures the strength and direction of the LINEAR relationship between two continuous variables. One example of a research question better suited to ANOVA is: How does respiratory hospital admission rate vary by air pollution level (low/moderate/high)? One example of a research question better suited to correlation is: Is there a correlation between age and systolic blood pressure among US adults?

In this assignment the independence assumption was most challenging to meet.I feel like the QQ plot made it clear to assume normality, Levene for variance, and although we has Residuals vs Fitted for independence, it was still tricky. Violation of assumptions can affect conclusions because they can distort results. For example: outliers can have large effects on F-test validity. The limitations of cross-sectional correlation analysis for understanding relationships between nutrition/activity and bone health is that you can’t determine causality, correlation does not equal causation.

The most difficult part of the R coding for this assignment was making sure that the right variables were going into the right places, especially when creating correlation tables. Also, the process to get eta_squared was difficult because I wanted to use the SUM function like previous in-class examples to get the total, but I was running into errors because of the order I was placing things. I ultimately determined that I could simply just use a plus sign instead.I also assumed ss_ethnicity was ss_between because there were 2 rows in the table ethnicity and residuals. And from previous examples I knew that I would have to sum the two in the table so, process of elimination. The R skill that I strengthen through completing this assignment was probably pearson correlation tables.

EPI 553 Homework 1: ANOVA & Correlation

EPI 553/STA 553: Principles of Statistical Inference II (Spring 2026)

Natalia Small

2026-02-20