```{r setup-data}
# Load NHANES data
data(NHANES)

# Keep adults aged 18-80 with complete data on the variables used in this lab
nhanes_adult <- NHANES %>%
  filter(Age >= 18, Age <= 80) %>%
  select(Age, Weight, Height, BMI, BPSysAve, BPDiaAve, Pulse, PhysActive, SleepHrsNight) %>%
  na.omit()

# Report the analytic sample size
data.frame(
  Metric = "Sample Size",
  Value = paste(nrow(nhanes_adult), "adults")
) %>%
  kable()
```
Now it’s your turn to practice! Use the same NHANES dataset and follow the examples above.
Total Points: 25 points
Research Question: Is there a correlation between weight and height among US adults?
Your tasks:
Create a scatterplot with a fitted line (2 points)
Calculate Pearson correlation using cor.test() and display with tidy() (3 points)
Test for statistical significance and state your conclusion (2 points)
Calculate r² and interpret in 2-3 sentences (3 points)
# YOUR CODE HERE
# a. Scatterplot
ggplot(nhanes_adult, aes(x = Weight, y = Height)) +
  geom_point(alpha = 0.3, color = "darkgreen") +
  geom_smooth(method = "lm", se = TRUE, color = "purple2") +
  labs(
    title = "Weight vs Height",
    subtitle = "NHANES Data, Adults Weighing 35-230 kg",
    x = "Weight (kg)",
    y = "Height (cm)"
  ) +
  theme_minimal()
# b. Correlation test with tidy() display
cor_weight_height <- cor.test(nhanes_adult$Weight, nhanes_adult$Height)
tidy(cor_weight_height) %>%
  select(estimate, statistic, p.value, conf.low, conf.high) %>%
  kable(
    digits = 3,
    col.names = c("r", "t-statistic", "p-value", "95% CI Lower", "95% CI Upper"),
    caption = "Pearson Correlation: Weight and Height"
  )
# c. Statistical significance: read the p-value and 95% CI from the table above
# r² (needed for part d) is computed from the estimated correlation
r_squared <- cor_weight_height$estimate^2
data.frame(
  Measure = c("Correlation (r)", "Coefficient of Determination (r²)",
              "Variance Explained"),
  Value = c(
    round(cor_weight_height$estimate, 3),
    round(r_squared, 3),
    paste0(round(r_squared * 100, 1), "%")
  )
) %>%
  kable(caption = "Summary of Correlation Strength")
# d. r² and interpretation (write as comment)
# Interpretation: There is a statistically significant, moderate positive correlation between weight and height (r = 0.451): as weight increases, height tends to increase as well. However, weight explains only about 20.3% of the variation in height, suggesting that other factors also play important roles.
Research Question: What are the relationships among BMI, weight, and height?
Your tasks:
# YOUR CODE HERE
# a. Correlation matrix
# Select body-size variables
bmi_vars <- nhanes_adult %>%
  select(BMI, Weight, Height)
# Calculate correlation matrix
cor_matrix <- cor(bmi_vars, use = "complete.obs")
# Display as table
cor_matrix %>%
  kable(digits = 3, caption = "Health Correlation Matrix")
# b. Visualize with corrplot
corrplot(cor_matrix,
         method = "circle",
         type = "upper",
         tl.col = "black",
         tl.srt = 45,
         addCoef.col = "black",
         number.cex = 0.7,
         col = colorRampPalette(c("#3498db", "white", "#e74c3c"))(200),
         title = "Health Correlations",
         mar = c(0, 0, 2, 0))
# c. Strongest correlation:
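notable <- data.frame(
  Relationship = c(
    "Height & Weight",
    "Height & BMI",
    "Weight & BMI",
    "Height, Weight, & BMI"
  ),
  Correlation = c(
    round(cor_matrix["Height", "Weight"], 3),
    round(cor_matrix["Height", "BMI"], 3),
    round(cor_matrix["Weight", "BMI"], 3),
    NA # no single correlation for 3 variables
  ),
  stringsAsFactors = FALSE
)
# Label strength from |r| instead of hard-coding it, so labels match the values.
# The cutoffs (0.3 and 0.7) are an assumed rule of thumb; adjust to the course's convention.
notable$Strength <- as.character(cut(
  abs(notable$Correlation),
  breaks = c(0, 0.3, 0.7, 1),
  labels = c("Weak", "Moderate", "Strong"),
  include.lowest = TRUE
))
notable %>%
  kable(caption = "Notable Correlations Summary")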
# d. Explanation (write as comment)
# Weight and BMI show the strongest correlation (r = 0.880), which makes sense because BMI is calculated directly from weight (and height), so increases in weight strongly increase BMI when height is relatively stable. Height and weight show a moderate positive correlation (r = 0.451), suggesting that taller individuals tend to weigh more, though not perfectly.
Research Question: Is there a relationship between hours of sleep and age?
Your tasks:
tidy()
(2 points)
# YOUR CODE HERE
# a. Scatterplot
ggplot(nhanes_adult, aes(x = SleepHrsNight, y = Age)) +
  geom_point(alpha = 0.3, color = "pink") +
  geom_smooth(method = "lm", se = TRUE, color = "darkblue") +
  labs(
    title = "Sleep Duration vs Age Among Adults",
    subtitle = "NHANES Data, Adults 18-80 years",
    x = "Sleep per night (hours)",
    y = "Age (years)"
  ) +
  theme_minimal()
# b. Correlation with tidy()
cor_sleep_age <- cor.test(nhanes_adult$SleepHrsNight, nhanes_adult$Age)
tidy(cor_sleep_age) %>%
  select(estimate, statistic, p.value, conf.low, conf.high) %>%
  kable(
    digits = 3,
    col.names = c("r", "t-statistic", "p-value", "95% CI Lower", "95% CI Upper"),
    caption = "Pearson Correlation: Sleep and Age"
  )
# c. Interpretation (write as comment)
# The Pearson correlation between sleep duration and age is very weak and positive (r = 0.023). The p-value (p = 0.057) is slightly above the 0.05 significance level, so the relationship is not statistically significant. Consistent with this, the 95% confidence interval (-0.001, 0.046) includes zero, so the data provide no evidence of a meaningful linear association between sleep duration and age in this sample.
Save your work with your name:
Correlation_Lab_YourName.Rmd
Knit to HTML to create your report
Publish to RPubs:
Submit to Brightspace:
Due: End of class today
Grading: This lab is worth 15% of your in-class lab grade. The lowest 2 lab grades are dropped.
cor.test() - Calculate correlation and test significance
tidy() - Clean display of statistical test results
cor() - Calculate correlation matrix
corrplot() - Visualize correlation matrix
ggplot() + geom_point() - Scatterplots
geom_smooth(method="lm") - Add fitted regression line
qqnorm() / qqline() - Check normality
?cor.test in console
Remember:
✓ Correlation measures LINEAR relationships only
✓ Always visualize your data first
✓ Correlation ≠ Causation
✓ Check your assumptions
✓ Consider confounding and alternative explanations
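The first two reminders can be illustrated with a minimal sketch. The data below are made up for illustration (they are not from NHANES), and the names `x`, `y`, `r_linear`, and `r_half` are hypothetical. Here `y` is completely determined by `x`, yet the Pearson correlation over the full range is essentially zero because the relationship is curved rather than linear.

```r
# Hypothetical toy data: y depends on x exactly, but not linearly
x <- -10:10
y <- x^2

# Over the full range the falling and rising halves cancel out,
# so Pearson's r is essentially zero despite the perfect dependence
r_linear <- cor(x, y)

# Restricted to x >= 0 the trend is monotone increasing, so r is large
r_half <- cor(x[x >= 0], y[x >= 0])

round(c(full_range = r_linear, right_half = r_half), 3)
```

A scatterplot of `x` against `y` would reveal the U-shape immediately, which is why plotting comes before computing a correlation.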
This lab activity was created for EPI 553: Principles of Statistical Inference II
University at Albany, College of Integrated Health Sciences
Spring 2026