```{r setup-data}
# Load NHANES data
data(NHANES)

# Select adult participants with complete data
nhanes_adult <- NHANES %>%
  filter(Age >= 18, Age <= 80) %>%
  select(Age, Weight, Height, BMI, BPSysAve, BPDiaAve, Pulse, PhysActive, SleepHrsNight) %>%
  na.omit()

# Display sample size
data.frame(
  Metric = "Sample Size",
  Value = paste(nrow(nhanes_adult), "adults")
) %>%
  kable()
```
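
Because `na.omit()` drops every row that has a missing value in any selected column, it can help to see which variables account for the loss. The sketch below uses a small made-up data frame (`toy`, with invented values, not NHANES records) and base R only; the same checks could be run on the selected NHANES columns before calling `na.omit()`.

```r
# Toy data for illustration only; values are invented, not NHANES records
toy <- data.frame(
  Age = c(25, 40, 17, 63, 80, 55),
  Weight = c(70.2, NA, 55.0, 88.1, 61.4, 92.3),
  SleepHrsNight = c(7, 6, 8, NA, 5, 7)
)

# Restrict to adults first, mirroring the lab's filter() step
toy_adult <- toy[toy$Age >= 18 & toy$Age <= 80, ]

# Count missing values per variable to see what drives the row loss
missing_by_var <- colSums(is.na(toy_adult))
missing_by_var

# na.omit() keeps only rows with no missing value in any column
toy_complete <- na.omit(toy_adult)

n_before <- nrow(toy_adult)
n_after <- nrow(toy_complete)
c(before = n_before, after = n_after)
```

Selecting only the variables needed for a given analysis before calling `na.omit()` keeps more rows than dropping incomplete cases across every column at once.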


PART B: YOUR TURN - Practice Problems

Now it’s your turn to practice! Use the same NHANES dataset and follow the examples above.

Total Points: 25 points


Problem 1: Weight and Height (10 points)

Research Question: Is there a correlation between weight and height among US adults?

Your tasks:

  1. Create a scatterplot with a fitted line (2 points)

  2. Calculate Pearson correlation using cor.test() and display with tidy() (3 points)

  3. Test for statistical significance and state your conclusion (2 points)

  4. Calculate r² and interpret in 2-3 sentences (3 points)

# YOUR CODE HERE

# a. Scatterplot

ggplot(nhanes_adult, aes(x = Weight, y = Height)) +
  geom_point(alpha = 0.3, color = "darkgreen") +
  geom_smooth(method = "lm", se = TRUE, color = "purple2") +
  labs(
    title = "Weight vs Height",
    subtitle = "NHANES Data, Adults Weighing 35-230 kg",
    x = "Weight (kg)",
    y = "Height (cm)"
  ) +
  theme_minimal()

# b. Correlation test with tidy() display

cor_weight_height <- cor.test(nhanes_adult$Weight, nhanes_adult$Height)
tidy(cor_weight_height) %>%
  select(estimate, statistic, p.value, conf.low, conf.high) %>%
  kable(
    digits = 3,
    col.names = c("r", "t-statistic", "p-value", "95% CI Lower", "95% CI Upper"),
    caption = "Pearson Correlation: Weight and Height"
  )


# c. Statistical significance
# Significance is judged from the p-value and 95% CI in the table above;
# the summary below reports r and r² for part d.
r_weight_height <- unname(cor_weight_height$estimate)
r_squared <- r_weight_height^2

data.frame(
  Measure = c("Correlation (r)", "Coefficient of Determination (r²)",
              "Variance Explained"),
  Value = c(
    round(r_weight_height, 3),
    round(r_squared, 3),
    paste0(round(r_squared * 100, 1), "%")
  )
) %>%
  kable(caption = "Summary of Correlation Strength")



# d. r² and interpretation (write as comment)
# Interpretation: There is a statistically significant, moderate positive correlation between weight and height: as weight increases, height tends to increase as well. However, weight explains only about 20.3% of the variation in height (r² ≈ 0.203), suggesting that other factors also play important roles.
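
The "variance explained" reading of r² can be checked against a simple linear regression, where the squared Pearson correlation equals the model's R². The following is a minimal sketch on simulated data; the variables `height_sim` and `weight_sim` and their generating parameters are made up for illustration and are not NHANES values.

```r
# Simulated data for illustration only (not NHANES values)
set.seed(553)
n <- 500
height_sim <- rnorm(n, mean = 168, sd = 10)               # cm
weight_sim <- 0.9 * height_sim - 70 + rnorm(n, sd = 15)   # kg

# Pearson correlation and its square
r <- cor(weight_sim, height_sim)
r_squared <- r^2

# R-squared from a simple linear regression of height on weight
fit <- lm(height_sim ~ weight_sim)
r_squared_lm <- summary(fit)$r.squared

# The last two values agree up to floating-point error
c(r = r, r_squared = r_squared, lm_r_squared = r_squared_lm)
```

The agreement holds only for a regression with a single predictor; with several predictors, R² no longer equals the square of any one pairwise correlation.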

Problem 2: Correlation Matrix Analysis (10 points)

Research Question: What are the relationships among BMI, weight, and height?

Your tasks:

  1. Create a correlation matrix for: Weight, Height, BMI (3 points)
  2. Visualize the matrix using corrplot (3 points)
  3. Identify which pair has the strongest correlation (2 points)
  4. Explain why that correlation makes sense biologically/mathematically (2 points)
# YOUR CODE HERE

# a. Correlation matrix

# Select body-size variables
bmi_vars <- nhanes_adult %>%
  select(BMI, Weight, Height)

# Calculate correlation matrix
cor_matrix <- cor(bmi_vars, use = "complete.obs")

# Display as table
cor_matrix %>%
  kable(digits = 3, caption = "Health Correlation Matrix")
# b. Visualize with corrplot
corrplot(cor_matrix, 
         method = "circle",
         type = "upper",
         tl.col = "black",
         tl.srt = 45,
         addCoef.col = "black",
         number.cex = 0.7,
         col = colorRampPalette(c("#3498db", "white", "#e74c3c"))(200),
         title = "Health Correlations",
         mar = c(0,0,2,0))

# c. Strongest correlation:
# Strength labels are derived from |r| using conventional cutoffs
# (assumed here: up to 0.3 weak, 0.3-0.7 moderate, above 0.7 strong).
cor_pairs <- data.frame(
  Relationship = c("Height & Weight", "Height & BMI", "Weight & BMI"),
  Correlation = c(
    cor_matrix["Height", "Weight"],
    cor_matrix["Height", "BMI"],
    cor_matrix["Weight", "BMI"]
  ),
  stringsAsFactors = FALSE
)
cor_pairs$Strength <- as.character(cut(
  abs(cor_pairs$Correlation),
  breaks = c(0, 0.3, 0.7, 1),
  labels = c("Weak", "Moderate", "Strong"),
  include.lowest = TRUE
))

cor_pairs %>%
  kable(digits = 3, caption = "Notable Correlations Summary")

# d. Explanation (write as comment)
# Weight and BMI show the strongest correlation (r = 0.880), which makes sense because BMI is calculated directly from weight and height (BMI = weight in kg / (height in m)²). Among adults, height is relatively stable, so increases in weight translate almost directly into increases in BMI. Height and weight show a moderate positive correlation (r = 0.451): taller individuals tend to weigh more, but the relationship is far from perfect.
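
Since BMI is defined as weight in kilograms divided by height in metres squared, the pattern in the correlation matrix can be reproduced on simulated data. The sketch below is an illustration under assumed parameters (`height_cm`, `weight_kg`, and their distributions are invented, not NHANES values); it builds BMI from the formula and then finds the pair with the largest absolute correlation.

```r
# Simulated data for illustration only (not NHANES values)
set.seed(553)
n <- 1000
height_cm <- rnorm(n, mean = 168, sd = 10)
weight_kg <- 0.9 * height_cm - 70 + rnorm(n, sd = 15)

# BMI definition: weight in kg divided by height in metres squared
bmi <- weight_kg / (height_cm / 100)^2

sim <- data.frame(BMI = bmi, Weight = weight_kg, Height = height_cm)
cor_sim <- cor(sim)
round(cor_sim, 3)

# Find the off-diagonal pair with the largest absolute correlation
abs_cor <- abs(cor_sim)
diag(abs_cor) <- 0
strongest <- which(abs_cor == max(abs_cor), arr.ind = TRUE)[1, ]
strongest_pair <- rownames(cor_sim)[strongest]
strongest_pair
```

Zeroing the diagonal before taking the maximum avoids picking up the trivial correlation of each variable with itself.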

Problem 3: Sleep and Age (5 points)

Research Question: Is there a relationship between hours of sleep and age?

Your tasks:

  1. Create a scatterplot (1 point)
  2. Calculate Pearson correlation and display with tidy() (2 points)
  3. Interpret whether the relationship is statistically significant (2 points)
# YOUR CODE HERE

# a. Scatterplot
ggplot(nhanes_adult, aes(x = SleepHrsNight, y = Age)) +
  geom_point(alpha = 0.3, color = "pink") +
  geom_smooth(method = "lm", se = TRUE, color = "darkblue") +
  labs(
    title = "Sleep Duration vs Age among Adults",
    subtitle = "NHANES Data, Adults 18-80 years",
    x = "Sleep duration (hours per night)",
    y = "Age (years)"
  ) +
  theme_minimal()

# b. Correlation with tidy()
cor_sleep_age <- cor.test(nhanes_adult$SleepHrsNight, nhanes_adult$Age)

tidy(cor_sleep_age) %>%
  select(estimate, statistic, p.value, conf.low, conf.high) %>%
  kable(
    digits = 3,
    col.names = c("r", "t-statistic", "p-value", "95% CI Lower", "95% CI Upper"),
    caption = "Pearson Correlation: Sleep and Age"
  )

# c. Interpretation (write as comment)
# The Pearson correlation between sleep duration and age is positive but very weak (r = 0.023). The p-value (p = 0.057) is slightly above the 0.05 significance level, so the relationship is not statistically significant. Consistent with this, the 95% confidence interval (-0.001, 0.046) includes zero. There is therefore no evidence of a meaningful linear association between sleep duration and age in this sample.
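
The t-statistic, p-value, and confidence interval that `cor.test()` reports for a Pearson correlation can be recomputed by hand from r and n, which shows why a p-value above 0.05 and a 95% interval containing zero go together. This is a minimal sketch on simulated data (`age_sim` and `sleep_sim` are invented, not NHANES values), using the t-test with n - 2 degrees of freedom and the Fisher z-transformation for the interval.

```r
# Simulated data for illustration only (not NHANES values)
set.seed(553)
n <- 2000
age_sim <- runif(n, min = 18, max = 80)
sleep_sim <- 6.8 + 0.002 * age_sim + rnorm(n, sd = 1.3)

r <- cor(sleep_sim, age_sim)

# t-statistic for H0: rho = 0, with n - 2 degrees of freedom
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

# 95% CI via Fisher's z: transform, add the margin, back-transform
ci <- tanh(atanh(r) + c(-1, 1) * qnorm(0.975) / sqrt(n - 3))

# Compare with the built-in test
ct <- cor.test(sleep_sim, age_sim)
rbind(
  by_hand = c(t = t_stat, p = p_value, lower = ci[1], upper = ci[2]),
  cor_test = c(unname(ct$statistic), ct$p.value, ct$conf.int[1], ct$conf.int[2])
)
```

With a large sample, even a very small r can approach significance, so the size of r and the width of the interval deserve as much attention as the p-value.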

Bonus (Optional, 5 extra points)

Challenge: Investigate the relationship between two variables of your choice from the NHANES dataset. Include:

  • Scatterplot
  • Correlation test with clean display
  • Assumption checks
  • Thoughtful interpretation
# YOUR CODE HERE
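
One possible shape for the assumption checks is sketched below on simulated data; `x_sim` and `y_sim` are placeholder variables with invented parameters, to be swapped for the two NHANES variables chosen. It looks at normality with Q-Q plots, at linearity and spread with a residual plot, and computes a Spearman correlation as a rank-based alternative.

```r
# Simulated data for illustration only (not NHANES values)
set.seed(553)
n <- 300
x_sim <- rnorm(n, mean = 120, sd = 15)
y_sim <- 0.3 * x_sim + rnorm(n, sd = 10)

# Normality: points should fall close to the reference line
op <- par(mfrow = c(1, 2))
qqnorm(x_sim, main = "Q-Q plot: x_sim")
qqline(x_sim, col = "red")
qqnorm(y_sim, main = "Q-Q plot: y_sim")
qqline(y_sim, col = "red")
par(op)

# Linearity and constant spread: residuals from a straight-line fit
# should scatter evenly around zero, with no curve or funnel shape
fit <- lm(y_sim ~ x_sim)
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# If the checks look doubtful, a rank-based (Spearman) correlation is a
# common alternative that does not rely on normality
pearson <- cor.test(x_sim, y_sim, method = "pearson")
spearman <- cor.test(x_sim, y_sim, method = "spearman")
c(pearson = unname(pearson$estimate), spearman = unname(spearman$estimate))
```

Real survey variables often contain ties, in which case `cor.test()` with `method = "spearman"` warns that it cannot compute an exact p-value and falls back to an approximation.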

Grading Rubric

Problem 1: Weight and Height (10 points)

    1. Scatterplot properly formatted with labels: 2 points
    2. Correct correlation with clean display: 3 points
    3. Significance test correctly interpreted: 2 points
    4. r² calculated and interpreted: 3 points

Problem 2: Correlation Matrix (10 points)

    1. Correct matrix calculated: 3 points
    2. Well-formatted correlation plot: 3 points
    3. Strongest correlation identified: 2 points
    4. Biological/mathematical explanation: 2 points

Problem 3: Sleep and Age (5 points)

    1. Scatterplot: 1 point
    2. Correlation calculated and displayed: 2 points
    3. Interpretation of significance: 2 points

Submission Instructions

  1. Save your work with your name: Correlation_Lab_YourName.Rmd

  2. Knit to HTML to create your report

  3. Publish to RPubs:

    • Click the Publish button (blue icon) in the HTML preview window
    • Choose RPubs from the options
    • Follow the prompts to publish (create account if needed)
    • Copy your RPubs URL
  4. Submit to Brightspace:

    • Upload your .Rmd file
    • Paste your RPubs link in the assignment comments or submission text box
  5. Due: End of class today

Grading: This lab is worth 15% of your in-class lab grade. The lowest 2 lab grades are dropped.


Additional Resources

R Functions Used Today

  • cor.test() - Calculate correlation and test significance
  • tidy() - Clean display of statistical test results
  • cor() - Calculate correlation matrix
  • corrplot() - Visualize correlation matrix
  • ggplot() + geom_point() - Scatterplots
  • geom_smooth(method="lm") - Add fitted regression line
  • qqnorm() / qqline() - Check normality

For More Help

  • Textbook: Chapter 6 - Correlation Analysis
  • Office Hours: See syllabus
  • TA Help: See syllabus
  • R Documentation: Type ?cor.test in console

Remember:

✓ Correlation measures LINEAR relationships only
✓ Always visualize your data first
✓ Correlation ≠ Causation
✓ Check your assumptions
✓ Consider confounding and alternative explanations
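
The first reminder can be illustrated with a short sketch: a perfect but curved relationship can produce a Pearson correlation of essentially zero. The data here are constructed for the purpose (a quadratic on a symmetric range), not drawn from NHANES.

```r
# Perfect but non-linear (quadratic) relationship on a symmetric x range
x <- seq(-3, 3, length.out = 201)
y_curve <- x^2
r_curve <- cor(x, y_curve)

# Perfect linear relationship for comparison
y_line <- 2 * x + 1
r_line <- cor(x, y_line)

# Quadratic gives r of essentially zero; linear gives r of 1
c(quadratic = r_curve, linear = r_line)
```

A scatterplot of `x` against `y_curve` would show the dependence immediately, which is why plotting comes before computing r.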


This lab activity was created for EPI 553: Principles of Statistical Inference II
University at Albany, College of Integrated Health Sciences
Spring 2026