Lab Overview

Time: ~30 minutes

Goal: Practice correlation analysis from start to finish using real public health data

Learning Objectives:

Understand when and why to use correlation analysis
Calculate and interpret Pearson correlation coefficients
Test hypotheses about correlation
Check correlation assumptions
Distinguish between correlation and causation
Use Spearman correlation for non-normal data

Structure:

Part A: Guided Examples (follow along with instructor)
Part B: Your Turn (independent practice)

Submission: Publish to RPubs and submit your .Rmd file + RPubs link to Brightspace by end of class

Background: Why Correlation Matters

What is Correlation?

Correlation measures the strength and direction of the LINEAR relationship between two continuous variables.

Range: -1 ≤ r ≤ 1
r = 1: Perfect positive relationship (as X ↑, Y ↑)
r = -1: Perfect negative relationship (as X ↑, Y ↓)
r = 0: No linear relationship
Common rules of thumb for strength (cutoffs vary by field):
|r| > 0.7: Strong correlation
0.3 < |r| < 0.7: Moderate correlation
|r| < 0.3: Weak correlation
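The definition above can be checked directly. This sketch uses six made-up (x, y) pairs, not NHANES data. It computes Pearson's r from its formula: the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations. It then compares the result with base R's cor() and labels it using the rule-of-thumb cutoffs listed above. The helper name strength() is hypothetical.

```r
# Hypothetical example data (not NHANES): six paired observations
x <- c(2, 4, 5, 7, 9, 11)
y <- c(1, 3, 4, 4, 8, 9)

# Pearson r = sum of cross-products of deviations /
#             sqrt(product of the sums of squared deviations)
dx <- x - mean(x)
dy <- y - mean(y)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_builtin <- cor(x, y)

# Hypothetical helper: label |r| with the rule-of-thumb cutoffs above
# (values exactly at 0.3 or 0.7 are assigned to the weaker category here)
strength <- function(r) {
  a <- abs(r)
  if (a > 0.7) "Strong" else if (a > 0.3) "Moderate" else "Weak"
}

cat("Manual r:", round(r_manual, 3),
    "| cor():", round(r_builtin, 3),
    "| Strength:", strength(r_manual), "\n")
```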

When to Use Correlation

✅ Use correlation when:

Both variables are continuous (or at least ordinal)
You want to measure strength/direction of linear relationship
You’re exploring data before regression
You want to describe associations (not causation)

❌ Don’t use when:

One variable is categorical → use t-test or ANOVA
Relationship is clearly non-linear → consider transformation
You want to establish causation → use experimental design
You want to predict values → use regression

Important Warning

⚠️ CORRELATION ≠ CAUSATION

Just because two variables are correlated does NOT mean one causes the other!

Classic Example: Ice cream sales and drowning deaths are highly correlated. Does ice cream cause drowning? NO! Both increase in summer (confounding by temperature/season).
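A small simulation makes the confounding argument concrete. In this hypothetical sketch, temperature drives both ice cream sales and drownings, and there is no direct link between the two. The raw correlation is strong. The partial correlation, however, is close to zero; it is the correlation between the residuals left after regressing each variable on temperature. The coefficients, noise levels, and seed are arbitrary assumptions chosen for illustration.

```r
set.seed(553)  # arbitrary seed for reproducibility
n <- 1000

# Hypothetical simulation: temperature drives BOTH variables, and
# ice cream sales have no direct effect on drownings (or vice versa)
temperature <- rnorm(n, mean = 25, sd = 5)
ice_cream   <- 2.0 * temperature + rnorm(n, sd = 3)
drownings   <- 0.5 * temperature + rnorm(n, sd = 1)

# Raw correlation: strong, driven entirely by the shared cause
r_raw <- cor(ice_cream, drownings)

# Partial correlation controlling for temperature: correlate the residuals
# left after removing the linear effect of temperature from each variable
res_ice   <- resid(lm(ice_cream ~ temperature))
res_drown <- resid(lm(drownings ~ temperature))
r_partial <- cor(res_ice, res_drown)

round(c(raw = r_raw, partial = r_partial), 3)
```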

PART A: GUIDED EXAMPLES

Setup: Load Packages and Data

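# Load packages
library(NHANES)     # NHANES survey data
library(tidyverse)  # dplyr and ggplot2
library(broom)      # tidy() for test results
library(knitr)      # kable() tables
library(corrplot)   # correlation matrix plots

# Load NHANES data
data(NHANES)

# Select adults aged 18-80 and drop rows with any missing value
nhanes_adult <- NHANES %>%
  filter(Age >= 18, Age <= 80) %>%
  select(Age, Weight, Height, BMI, BPSysAve, BPDiaAve,
         Pulse, PhysActive, SleepHrsNight) %>%
  na.omit()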

# Display sample size
data.frame(
  Metric = "Sample Size",
  Value = paste(nrow(nhanes_adult), "adults")
) %>%
  kable()

Metric	Value
Sample Size	7133 adults

head(nhanes_adult, 8) %>%
  kable(digits = 1, caption = "NHANES Adult Data Sample")

NHANES Adult Data Sample
Age	Weight	Height	BMI	BPSysAve	BPDiaAve	Pulse	PhysActive	SleepHrsNight
34	87.4	164.7	32.2	113	85	70	No	4
34	87.4	164.7	32.2	113	85	70	No	4
34	87.4	164.7	32.2	113	85	70	No	4
49	86.7	168.4	30.6	112	75	86	No	8
45	75.7	166.7	27.2	118	64	62	Yes	8
45	75.7	166.7	27.2	118	64	62	Yes	8
45	75.7	166.7	27.2	118	64	62	Yes	8
66	68.0	169.5	23.7	111	63	60	Yes	7

Dataset Description:

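Age: Age in years
Weight: Weight in kg
Height: Standing height in cm
BMI: Body Mass Index (kg/m²)
BPSysAve: Average systolic blood pressure (mmHg)
BPDiaAve: Average diastolic blood pressure (mmHg)
Pulse: Pulse rate in beats per minute (60-second count)
PhysActive: Does moderate or vigorous-intensity sports, fitness, or recreational activities (Yes/No)
SleepHrsNight: Self-reported hours of sleep per night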

Example 1: Age and Blood Pressure

Research Question

Is there a correlation between age and systolic blood pressure among US adults?

Public Health Context: Understanding age-related changes in blood pressure helps identify at-risk populations and inform screening guidelines.

Step 1: Visualize the Relationship

Always start with a scatterplot!

# Create scatterplot
ggplot(nhanes_adult, aes(x = Age, y = BPSysAve)) +
  geom_point(alpha = 0.3, color = "steelblue") +
  geom_smooth(method = "lm", se = TRUE, color = "red") +
  labs(
    title = "Age vs Systolic Blood Pressure",
    subtitle = "NHANES Data, Adults 18-80 years",
    x = "Age (years)",
    y = "Systolic Blood Pressure (mmHg)"
  ) +
  theme_minimal()

What we observe:

Positive trend: older adults tend to have higher blood pressure
Points scattered around the line (not perfect relationship)
Relationship appears roughly linear
Some variability at all age levels

Step 2: Calculate Correlation

# Calculate Pearson correlation
cor_age_bp <- cor.test(nhanes_adult$Age, nhanes_adult$BPSysAve)

# Display results in clean table
tidy(cor_age_bp) %>%
  select(estimate, statistic, p.value, conf.low, conf.high) %>%
  kable(
    digits = 3,
    col.names = c("r", "t-statistic", "p-value", "95% CI Lower", "95% CI Upper"),
    caption = "Pearson Correlation: Age and Systolic BP"
  )

Pearson Correlation: Age and Systolic BP
r	t-statistic	p-value	95% CI Lower	95% CI Upper
0.415	38.54	0	0.396	0.434

Step 3: Interpret Results

Hypothesis Test:

H₀: ρ = 0 (no correlation between age and BP in population)
H₁: ρ ≠ 0 (correlation exists)
α = 0.05

Results:

r = 0.415: Moderate positive correlation
p < 0.001: Statistically significant (reject H₀)
95% CI [0.396, 0.434]: Doesn’t contain zero (confirms significance)
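The t-statistic and confidence interval in the table can be reproduced approximately from r and n alone. For H₀: ρ = 0, cor.test() uses t = r·√(n − 2)/√(1 − r²) with n − 2 degrees of freedom. Its 95% CI for Pearson's r is based on Fisher's z-transformation, z = atanh(r), with standard error 1/√(n − 3). This sketch plugs in the rounded values reported above (r = 0.415, n = 7133), so the results match the table only to about two or three decimal places.

```r
# Rounded values reported above (so results are approximate)
r <- 0.415
n <- 7133

# t-statistic for H0: rho = 0, with n - 2 degrees of freedom
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# 95% CI via Fisher's z-transformation: z = atanh(r), SE = 1 / sqrt(n - 3),
# then back-transform the interval limits with tanh()
z      <- atanh(r)
se     <- 1 / sqrt(n - 3)
z_crit <- qnorm(0.975)
ci     <- tanh(z + c(-1, 1) * z_crit * se)

round(c(t = t_stat, lower = ci[1], upper = ci[2]), 3)
```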

# Calculate r-squared
r_squared <- cor_age_bp$estimate^2

data.frame(
  Measure = c("Correlation (r)", "Coefficient of Determination (r²)", 
              "Variance Explained"),
  Value = c(
    round(cor_age_bp$estimate, 3),
    round(r_squared, 3),
    paste0(round(r_squared * 100, 1), "%")
  )
) %>%
  kable(caption = "Summary of Correlation Strength")

Summary of Correlation Strength
Measure	Value
Correlation (r)	0.415
Coefficient of Determination (r²)	0.172
Variance Explained	17.2%

Interpretation:

There is a statistically significant moderate positive correlation between age and systolic blood pressure: as age increases, systolic BP tends to increase. However, age explains only about 17.2% of the variation in systolic BP, suggesting that other factors also play important roles.
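The 17.2% figure is the same quantity as R² from a simple linear regression. For two variables, the squared Pearson correlation equals the R² from regressing Y on X, and also the R² from regressing X on Y. This sketch checks that identity on simulated data (hypothetical, not NHANES).

```r
set.seed(42)  # arbitrary seed for reproducibility

# Hypothetical simulated data (not NHANES)
x <- rnorm(200)
y <- 0.4 * x + rnorm(200)

r2_cor    <- cor(x, y)^2                    # squared Pearson r
r2_y_on_x <- summary(lm(y ~ x))$r.squared   # R^2 regressing y on x
r2_x_on_y <- summary(lm(x ~ y))$r.squared   # R^2 regressing x on y

round(c(r_squared = r2_cor, lm_y_on_x = r2_y_on_x, lm_x_on_y = r2_x_on_y), 4)
```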

Public Health Implication: Age-appropriate BP screening is important, but individual risk assessment should consider multiple factors beyond age alone.

Step 4: Check Assumptions

Assumption 1: Linearity (already checked with scatterplot ✓)

Assumption 2: Bivariate Normality

# Q-Q plots for normality
par(mfrow = c(1, 2))

qqnorm(nhanes_adult$Age, main = "Q-Q Plot: Age")
qqline(nhanes_adult$Age, col = "red")

qqnorm(nhanes_adult$BPSysAve, main = "Q-Q Plot: Systolic BP")
qqline(nhanes_adult$BPSysAve, col = "red")

par(mfrow = c(1, 1))

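Assessment: Both variables appear approximately normally distributed (points follow the red line reasonably well), with some deviation in the tails. Q-Q plots check only each variable's marginal distribution; marginal normality is necessary but not sufficient for bivariate normality. With a large sample (n = 7133), the correlation test is robust to minor violations.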

Assumption 3: No Extreme Outliers (scatterplot shows no extreme outliers ✓)

Example 2: BMI and Diastolic Blood Pressure

Research Question

Is BMI correlated with diastolic blood pressure?

Why this matters: Understanding the relationship between obesity and blood pressure helps inform weight management interventions.

Step 1: Visualize

ggplot(nhanes_adult, aes(x = BMI, y = BPDiaAve)) +
  geom_point(alpha = 0.3, color = "darkgreen") +
  geom_smooth(method = "lm", se = TRUE, color = "red", fill = "pink") +
  labs(
    title = "BMI vs Diastolic Blood Pressure",
    x = "Body Mass Index (kg/m²)",
    y = "Diastolic Blood Pressure (mmHg)"
  ) +
  theme_minimal()

Observation: There is a weak positive relationship, with considerable scatter around the line.

Step 2: Calculate Correlation

# Pearson correlation
cor_bmi_bp <- cor.test(nhanes_adult$BMI, nhanes_adult$BPDiaAve)

# Display results
tidy(cor_bmi_bp) %>%
  select(estimate, statistic, p.value, conf.low, conf.high) %>%
  kable(
    digits = 3,
    col.names = c("r", "t-statistic", "p-value", "95% CI Lower", "95% CI Upper"),
    caption = "Pearson Correlation: BMI and Diastolic BP"
  )

Pearson Correlation: BMI and Diastolic BP
r	t-statistic	p-value	95% CI Lower	95% CI Upper
0.117	9.966	0	0.094	0.14

# Calculate r-squared (unname() drops the "cor" label so it does not become a row name)
r_squared_bmi <- unname(cor_bmi_bp$estimate^2)

data.frame(
  Measure = c("r²", "Variance Explained"),
  Value = c(
    round(r_squared_bmi, 4),
    paste0(round(r_squared_bmi * 100, 2), "%")
  )
) %>%
  kable(caption = "Effect Size")

Effect Size
Measure	Value
r²	0.0137
Variance Explained	1.37%

Interpretation:

Weak positive correlation (r = 0.117) between BMI and diastolic BP
Statistically significant (p < 0.001)
BMI explains only 1.4% of variation in diastolic BP

Key Insight: Although BMI and diastolic blood pressure are significantly related, BMI alone explains less than 2% of the variation in diastolic BP. Other factors (genetics, diet, physical activity, stress, age) play substantial roles.
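Sample size is the main reason statistical significance and practical importance can diverge. This sketch computes the two-sided p-value for H₀: ρ = 0 at the same r = 0.117 in two settings: this lab's sample size (n = 7133) and a hypothetical small study (n = 30). The helper p_from_r() is a name introduced here for illustration.

```r
# Hypothetical helper: two-sided p-value for H0: rho = 0 given r and n
p_from_r <- function(r, n) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
  2 * pt(-abs(t_stat), df = n - 2)
}

r <- 0.117                          # BMI vs diastolic BP, from Example 2
p_nhanes <- p_from_r(r, n = 7133)   # this lab's sample size
p_small  <- p_from_r(r, n = 30)     # hypothetical small study

# Same effect size (r^2 about 1.4%), very different p-values
c(p_nhanes = p_nhanes, p_small = p_small, r_squared = r^2)
```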

Example 3: Correlation Matrix

Research Question

How are cardiovascular health indicators related to each other?

Step 1: Calculate Correlation Matrix

# Select cardiovascular variables
cardio_vars <- nhanes_adult %>%
  select(Age, BMI, BPSysAve, BPDiaAve, Pulse)

# Calculate correlation matrix
cor_matrix <- cor(cardio_vars, use = "complete.obs")

# Display as table
cor_matrix %>%
  kable(digits = 3, caption = "Cardiovascular Health Correlation Matrix")

Cardiovascular Health Correlation Matrix
	Age	BMI	BPSysAve	BPDiaAve	Pulse
Age	1.000	0.065	0.415	-0.019	-0.153
BMI	0.065	1.000	0.135	0.117	0.112
BPSysAve	0.415	0.135	1.000	0.340	-0.022
BPDiaAve	-0.019	0.117	0.340	1.000	0.106
Pulse	-0.153	0.112	-0.022	0.106	1.000

Step 2: Visualize Correlation Matrix

# Create correlation plot
corrplot(cor_matrix, 
         method = "circle",
         type = "lower",
         tl.col = "black",
         tl.srt = 45,
         addCoef.col = "black",
         number.cex = 0.7,
         col = colorRampPalette(c("#3498db", "white", "#e74c3c"))(200),
         title = "Cardiovascular Health Correlations",
         mar = c(0,0,2,0))

Key Findings:

# Create summary table of notable correlations
data.frame(
  Relationship = c(
    "Systolic BP & Diastolic BP",
    "Age & Systolic BP",
    "Age & Diastolic BP",
    "BMI & Systolic BP",
    "BMI & Pulse"
  ),
  Correlation = c(
    round(cor_matrix["BPSysAve", "BPDiaAve"], 3),
    round(cor_matrix["Age", "BPSysAve"], 3),
    round(cor_matrix["Age", "BPDiaAve"], 3),
    round(cor_matrix["BMI", "BPSysAve"], 3),
    round(cor_matrix["BMI", "Pulse"], 3)
  ),
  Strength = c("Moderate", "Moderate", "Very Weak", "Weak", "Weak")
) %>%
  kable(caption = "Notable Correlations Summary")

Notable Correlations Summary
Relationship	Correlation	Strength
Systolic BP & Diastolic BP	0.340	Moderate
Age & Systolic BP	0.415	Moderate
Age & Diastolic BP	-0.019	Very Weak
BMI & Systolic BP	0.135	Weak
BMI & Pulse	0.112	Weak

Interpretation: Age and systolic BP show the strongest correlation in the matrix (r = 0.415), followed by systolic and diastolic BP (r = 0.34). The latter is expected because both are components of the same blood pressure measurement. Pulse rate shows only weak correlations with the other indicators (|r| ≤ 0.153), suggesting it is influenced by different factors.
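With many variables, ranking the off-diagonal entries is a reliable way to find the strongest pair. This sketch rebuilds the matrix from the rounded values in the table above, so it runs without the NHANES data. It lists each pair once using the upper triangle and sorts the pairs by |r|; Age and systolic BP (r = 0.415) come first. The object names are illustrative.

```r
# Rounded values copied from the correlation matrix table above
vars <- c("Age", "BMI", "BPSysAve", "BPDiaAve", "Pulse")
cor_tab <- matrix(
  c( 1.000,  0.065,  0.415, -0.019, -0.153,
     0.065,  1.000,  0.135,  0.117,  0.112,
     0.415,  0.135,  1.000,  0.340, -0.022,
    -0.019,  0.117,  0.340,  1.000,  0.106,
    -0.153,  0.112, -0.022,  0.106,  1.000),
  nrow = 5, byrow = TRUE, dimnames = list(vars, vars)
)

# Each pair appears once in the upper triangle (the diagonal is excluded)
idx <- which(upper.tri(cor_tab), arr.ind = TRUE)
pair_df <- data.frame(
  var1 = rownames(cor_tab)[idx[, "row"]],
  var2 = colnames(cor_tab)[idx[, "col"]],
  r    = cor_tab[idx]
)

# Sort by absolute correlation, strongest first
pair_df <- pair_df[order(-abs(pair_df$r)), ]
print(pair_df, row.names = FALSE)
```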

Example 4: Spearman vs Pearson

When to Use Spearman Correlation

Use Spearman’s rank correlation when:

Data are ordinal (ranked)
Relationship is monotonic but not linear
Data contain outliers
Normality assumption is violated
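Spearman's rho is Pearson's r computed on the ranks of the data. Because it uses only ranks, it captures any monotonic relationship and is less sensitive to outliers. This sketch uses a hypothetical relationship that is perfectly monotonic but curved. Spearman's rho equals 1, while the curvature pulls Pearson's r below 1.

```r
# Hypothetical monotonic but curved relationship
x <- 1:20
y <- exp(x / 4)

pearson       <- cor(x, y)                       # measures linear fit only
spearman      <- cor(x, y, method = "spearman")  # uses ranks only
ranks_pearson <- cor(rank(x), rank(y))           # Pearson r on the ranks

round(c(pearson = pearson, spearman = spearman,
        pearson_on_ranks = ranks_pearson), 3)
```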

Example: Age vs Pulse Rate

# Visualize relationship
ggplot(nhanes_adult, aes(x = Age, y = Pulse)) +
  geom_point(alpha = 0.3, color = "purple") +
  geom_smooth(method = "lm", se = TRUE, color = "red") +
  labs(
    title = "Age vs Pulse Rate",
    x = "Age (years)",
    y = "Pulse Rate (bpm)"
  ) +
  theme_minimal()

# Calculate both correlations
# exact = FALSE: Age and Pulse contain tied values, so an exact Spearman
# p-value cannot be computed; request the large-sample p-value explicitly
pearson_r <- cor.test(nhanes_adult$Age, nhanes_adult$Pulse, method = "pearson")
spearman_r <- cor.test(nhanes_adult$Age, nhanes_adult$Pulse,
                       method = "spearman", exact = FALSE)

# Compare in table (unname() keeps the "cor"/"rho" labels out of the row names)
data.frame(
  Method = c("Pearson", "Spearman"),
  Correlation = c(
    round(unname(pearson_r$estimate), 3),
    round(unname(spearman_r$estimate), 3)
  ),
  p_value = c(
    format.pval(pearson_r$p.value),
    format.pval(spearman_r$p.value)
  ),
  Difference = c(
    "—",
    round(abs(unname(pearson_r$estimate - spearman_r$estimate)), 3)
  )
) %>%
  kable(caption = "Pearson vs Spearman Comparison")

Pearson vs Spearman Comparison
Method	Correlation	p_value	Difference
Pearson	-0.153	< 2.22e-16	—
Spearman	-0.162	< 2.22e-16	0.008

Interpretation:

Results are very similar (difference < 0.01)
Both show a weak negative correlation
With a large sample and only mild assumption violations, either method is appropriate
For these data, Pearson is fine (and, when its assumptions hold, it has more statistical power than Spearman)

Key Takeaways from Examples

Always visualize first - scatterplots reveal patterns correlation coefficients miss
Context matters - statistical significance ≠ practical importance
Correlation ≠ Causation - always consider confounding
r² tells you explained variance - even significant correlations may explain little
Check assumptions - especially for small samples
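Anscombe's quartet is a classic illustration of the first takeaway, and it ships with base R as the anscombe data frame. It contains four data sets with nearly identical Pearson correlations (about 0.816) but very different scatterplots: a noisy linear trend, a smooth curve, a tight line with one outlier, and a single influential point. This sketch computes r for each set and plots all four.

```r
# anscombe is a built-in data frame (datasets package) with columns x1-x4, y1-y4
r_by_set <- sapply(1:4, function(i) {
  cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
})
round(r_by_set, 3)

# Plot the four sets together: similar r, very different patterns
par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i),
       main = paste0("Set ", i, ": r = ", round(r_by_set[i], 3)))
}
par(mfrow = c(1, 1))
```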

PART B: YOUR TURN - Practice Problems

Now it’s your turn to practice! Use the same NHANES dataset and follow the examples above.

Total Points: 25 points

Problem 1: Weight and Height (10 points)

Research Question: Is there a correlation between weight and height among US adults?

Your tasks:

Create a scatterplot with a fitted line (2 points)
Calculate Pearson correlation using cor.test() and display with tidy() (3 points)
Test for statistical significance and state your conclusion (2 points)
Calculate r² and interpret in 2-3 sentences (3 points)

# YOUR CODE HERE

# a. Scatterplot
ggplot(nhanes_adult, aes(x = Weight, y = Height)) +
  geom_point(alpha = 0.3, color = "steelblue") +
  geom_smooth(method = "lm", se = TRUE, color = "red") +
  labs(
    title = "Weight vs Height among US Adults",
    subtitle = "NHANES Data, Adults 18-80 years",
    x = "Weight (kg)",
    y = "Height (cm)"
  ) +
  theme_minimal()

# b. Correlation test with tidy() display
cor_weight_height <- cor.test(nhanes_adult$Weight, nhanes_adult$Height)

# Display results in clean table
tidy(cor_weight_height) %>%
  select(estimate, statistic, p.value, conf.low, conf.high) %>%
  kable(
    digits = 3,
    col.names = c("r", "t-statistic", "p-value", "95% CI Lower", "95% CI Upper"),
    caption = "Pearson Correlation: Weight and Height"
  )

Pearson Correlation: Weight and Height
r	t-statistic	p-value	95% CI Lower	95% CI Upper
0.451	42.618	0	0.432	0.469

c. Statistical significance

The Pearson correlation analysis showed a moderate positive association between weight and height among U.S. adults (r = 0.451). This relationship was statistically significant, t(7131) = 42.618, p < 0.001, so we reject H₀: ρ = 0. In other words, if there were no linear correlation between weight and height in the population, a sample correlation this large would be extremely unlikely. The 95% confidence interval (0.432, 0.469) does not include zero, further confirming statistical significance.
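The p-value column in the tables above shows 0 only because kable() rounds to three digits; a p-value is never exactly zero. This sketch recomputes the two-sided p-value for the weight-height test from the reported t = 42.618 and df = n − 2 = 7131. The value is so small that it underflows to 0 in double precision. Its order of magnitude can still be obtained on the log scale with pt(..., log.p = TRUE). Calling format.pval() with eps = 0.001 then gives the conventional "p < 0.001" reporting form.

```r
# Reported results for weight vs height (Problem 1)
t_stat <- 42.618
df_t   <- 7133 - 2

# Two-sided p-value: probability of |T| at least this large if rho = 0
p_value <- 2 * pt(-abs(t_stat), df = df_t)

# The same tail probability on the log scale avoids underflow
log10_p <- (log(2) + pt(-abs(t_stat), df = df_t, log.p = TRUE)) / log(10)

# Conventional reporting form for very small p-values
p_report <- format.pval(p_value, eps = 0.001)

p_value   # 0: the true value is below the smallest representable double
log10_p   # base-10 exponent of the true p-value
p_report
```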

d. r² and interpretation (write as comment)

r_squared <- cor_weight_height$estimate^2

data.frame(
  Measure = c("Correlation (r)", "Coefficient of Determination (r²)", 
              "Variance Explained"),
  Value = c(
    round(cor_weight_height$estimate, 3),
    round(r_squared, 3),
    paste0(round(r_squared * 100, 1), "%")
  )
) %>%
  kable(caption = "Summary of Correlation Strength")

Summary of Correlation Strength
Measure	Value
Correlation (r)	0.451
Coefficient of Determination (r²)	0.203
Variance Explained	20.3%

r² interpretation:

The correlation coefficient (r = 0.451) indicates a moderate positive relationship between height and weight.

The coefficient of determination (r² = 0.203) shows that approximately 20.3% of the variation in height is explained by weight in this sample.

This means that although weight is significantly related to height, about 79.7% of the variation in height is not accounted for by weight and reflects other factors not included in this analysis.

Problem 2: Correlation Matrix Analysis (10 points)

Research Question: What are the relationships among BMI, weight, and height?

Your tasks:

Create a correlation matrix for: Weight, Height, BMI (3 points)
Visualize the matrix using corrplot (3 points)
Identify which pair has the strongest correlation (2 points)
Explain why that correlation makes sense biologically/mathematically (2 points)

YOUR CODE HERE

a. Correlation matrix

# Select body-size variables
bmi_vars <- nhanes_adult %>%
  select(Weight, Height, BMI)

# Calculate correlation matrix
cor_matrix <- cor(bmi_vars, use = "complete.obs")

# Display as table
cor_matrix %>%
  kable(digits = 3, caption = "BMI with Weight and Height Correlation Matrix")

BMI with Weight and Height Correlation Matrix
	Weight	Height	BMI
Weight	1.000	0.451	0.880
Height	0.451	1.000	-0.012
BMI	0.880	-0.012	1.000

b. Visualize with corrplot

corrplot(cor_matrix, 
         method = "circle",
         type = "lower",
         tl.col = "black",
         tl.srt = 45,
         addCoef.col = "black",
         number.cex = 0.7,
         col = colorRampPalette(c("#3498db", "white", "#e74c3c"))(200),
         title = "BMI with Weight and Height Correlation",
         mar = c(0,0,2,0))

c. Strongest correlation:

The strongest correlation is between Weight and BMI (r = 0.880).

d. Explanation (write as comment)

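An r of 0.880 indicates a very strong positive relationship: as weight increases, BMI also increases substantially. This is expected because BMI is calculated directly from weight, BMI = weight (kg) / height (m)², so for a given height BMI is proportional to weight.

The correlation between Height and BMI (r = -0.012) is extremely weak (essentially no relationship). This is consistent with the purpose of BMI: dividing weight by height squared adjusts for body size so that BMI is largely independent of height. The correlation between Weight and Height (r = 0.451) is moderate but much weaker than the BMI–Weight relationship.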
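BMI is defined as weight in kilograms divided by the square of height in meters, which is why it tracks weight so closely. As a quick check, this sketch recomputes BMI from weight and height for four distinct rows of the NHANES sample table shown earlier. Those table values are rounded to one decimal, so agreement with the table's BMI column (32.2, 30.6, 27.2, 23.7) is expected only to one decimal place.

```r
# Four distinct rows from the NHANES sample table shown earlier (rounded values)
weight_kg <- c(87.4, 86.7, 75.7, 68.0)
height_cm <- c(164.7, 168.4, 166.7, 169.5)

# BMI = weight (kg) / height (m)^2; convert cm to m first
bmi <- weight_kg / (height_cm / 100)^2
round(bmi, 1)
```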

Problem 3: Sleep and Age (5 points)

Research Question: Is there a relationship between hours of sleep and age?

Your tasks:

Create a scatterplot (1 point)
Calculate Pearson correlation and display with tidy() (2 points)
Interpret whether the relationship is statistically significant (2 points)

# YOUR CODE HERE

# a. Scatterplot
ggplot(nhanes_adult, aes(x = SleepHrsNight, y = Age)) +
  geom_point(alpha = 0.3, color = "darkgreen") +
  geom_smooth(method = "lm", se = TRUE, color = "purple") +
  labs(
    title = "Sleep Duration vs Age among US Adults",
    subtitle = "NHANES Data, Adults 18-80 years",
    x = "Sleep (hours per night)",
    y = "Age (years)"
  ) +
  theme_minimal()

# b. Correlation test with tidy() display
cor_sleep_age <- cor.test(nhanes_adult$SleepHrsNight, nhanes_adult$Age)

# Display results in clean table
tidy(cor_sleep_age) %>%
  select(estimate, statistic, p.value, conf.low, conf.high) %>%
  kable(
    digits = 3,
    col.names = c("r", "t-statistic", "p-value", "95% CI Lower", "95% CI Upper"),
    caption = "Pearson Correlation: Sleep and Age"
  )

Pearson Correlation: Sleep and Age
r	t-statistic	p-value	95% CI Lower	95% CI Upper
0.023	1.904	0.057	-0.001	0.046

c. Interpretation (write as comment)

The Pearson correlation between sleep duration and age is very weak and positive (r = 0.023). The p-value (p = 0.057) is slightly above the conventional significance level of 0.05, so we fail to reject H₀: ρ = 0, and the relationship is not statistically significant. The 95% confidence interval (-0.001, 0.046) includes zero, so the data are consistent with no linear association between sleep time and age. Failing to reject H₀ does not prove that the correlation is zero. However, even the upper bound of the interval (r = 0.046, r² ≈ 0.2%) would represent a negligible association.

Bonus (Optional, 5 extra points)

Challenge: Investigate the relationship between two variables of your choice from the NHANES dataset. Include:

Scatterplot
Correlation test with clean display
Assumption checks
Thoughtful interpretation

I chose to analyze the relationship between sleep duration (SleepHrsNight) and systolic blood pressure (BPSysAve).

# YOUR CODE HERE

# a. Scatterplot
ggplot(nhanes_adult, aes(x = SleepHrsNight, y = BPSysAve)) +
  geom_point(alpha = 0.3, color = "gold") +
  geom_smooth(method = "lm", se = TRUE, color = "darkred") +
  labs(
    title = "Sleep Duration vs Systolic BP among US Adults",
    subtitle = "NHANES Data, Adults 18-80 years",
    x = "Sleep (hours per night)",
    y = "Systolic BP (mmHg)"
  ) +
  theme_minimal()

# b. Correlation test with tidy() display
cor_sleep_sysBP <- cor.test(nhanes_adult$SleepHrsNight, nhanes_adult$BPSysAve)

# Display results in clean table
tidy(cor_sleep_sysBP) %>%
  select(estimate, statistic, p.value, conf.low, conf.high) %>%
  kable(
    digits = 3,
    col.names = c("r", "t-statistic", "p-value", "95% CI Lower", "95% CI Upper"),
    caption = "Pearson Correlation: Sleep and Systolic BP"
  )

Pearson Correlation: Sleep and Systolic BP
r	t-statistic	p-value	95% CI Lower	95% CI Upper
-0.031	-2.59	0.01	-0.054	-0.007

Assumption Check:

The scatterplot suggests a roughly linear relationship with no major curvature, and the spread of systolic blood pressure appears relatively consistent across sleep durations, indicating no serious violation of homoscedasticity. Although some outliers are present, the large sample size supports the robustness of the Pearson correlation results.

Interpretation:

There is a very weak negative correlation between sleep duration and systolic blood pressure (r = −0.031). Although the result is statistically significant (p = 0.01; 95% CI: −0.054 to −0.007), the effect size is extremely small (r² ≈ 0.001), meaning sleep explains less than 0.1% of the variation in systolic blood pressure. Therefore, this relationship is not practically meaningful.
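SleepHrsNight is recorded in whole hours, so the data contain many tied values, and Spearman's rho is a natural robustness check for the Pearson result. This hypothetical sketch uses made-up values, not NHANES data, to show the call. Setting exact = FALSE requests the large-sample p-value; when ties are present, R cannot compute an exact Spearman p-value and would otherwise issue a warning. With the real data, the same call would use nhanes_adult$SleepHrsNight and nhanes_adult$BPSysAve.

```r
# Hypothetical data (made-up values, not NHANES): sleep in whole hours, systolic BP in mmHg
sleep_hrs <- c(5, 6, 6, 7, 7, 7, 8, 8, 8, 9, 6, 7, 8, 5, 9)
sbp       <- c(128, 122, 130, 118, 125, 121, 119, 124, 116, 117, 126, 120, 118, 131, 115)

# Pearson (linear) versus Spearman (rank-based, monotonic) correlation.
# exact = FALSE requests the large-sample p-value; with tied ranks an exact
# Spearman p-value is not available and R would otherwise issue a warning.
pearson_check  <- cor.test(sleep_hrs, sbp, method = "pearson")
spearman_check <- cor.test(sleep_hrs, sbp, method = "spearman", exact = FALSE)

round(c(pearson  = unname(pearson_check$estimate),
        spearman = unname(spearman_check$estimate)), 3)
```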

Grading Rubric

Problem 1: Weight and Height (10 points)

1. Scatterplot properly formatted with labels: 2 points
2. Correct correlation with clean display: 3 points
3. Significance test correctly interpreted: 2 points
4. r² calculated and interpreted: 3 points

Problem 2: Correlation Matrix (10 points)

1. Correct matrix calculated: 3 points
2. Well-formatted correlation plot: 3 points
3. Strongest correlation identified: 2 points
4. Biological/mathematical explanation: 2 points

Problem 3: Sleep and Age (5 points)

1. Scatterplot: 1 point
2. Correlation calculated and displayed: 2 points
3. Interpretation of significance: 2 points

Submission Instructions

Save your work with your name: Correlation_Lab_YourName.Rmd
Knit to HTML to create your report
Publish to RPubs:
- Click the Publish button (blue icon) in the HTML preview window
- Choose RPubs from the options
- Follow the prompts to publish (create account if needed)
- Copy your RPubs URL
Submit to Brightspace:
- Upload your .Rmd file
- Paste your RPubs link in the assignment comments or submission text box
Due: End of class today

Grading: This lab is worth 15% of your in-class lab grade. The lowest 2 lab grades are dropped.

Additional Resources

R Functions Used Today

cor.test() - Calculate correlation and test significance
tidy() - Clean display of statistical test results
cor() - Calculate correlation matrix
corrplot() - Visualize correlation matrix
ggplot() + geom_point() - Scatterplots
geom_smooth(method="lm") - Add fitted regression line
qqnorm() / qqline() - Check normality

For More Help

Textbook: Chapter 6 - Correlation Analysis
Office Hours: See syllabus
TA Help: See syllabus
R Documentation: Type ?cor.test in console

Remember:

✓ Correlation measures LINEAR relationships only
✓ Always visualize your data first
✓ Correlation ≠ Causation
✓ Check your assumptions
✓ Consider confounding and alternative explanations

This lab activity was created for EPI 553: Principles of Statistical Inference II
University at Albany, College of Integrated Health Sciences
Spring 2026

Lab 03: In-Class Lab Activity: Correlation Analysis

EPI 553: Principles of Statistical Inference II

Muntasir Masum

2026-02-11
