Weekly Lab Homework Assignment: Correlations & Bivariate Regression

Instructions

In this lab assignment, you will apply what you’ve learned about correlation by calculating, visualizing, and interpreting correlations, and you will practice fitting and interpreting bivariate linear models, using several datasets. Follow the instructions carefully and complete all parts of each exercise.

Correlations

Exercise 1: Calculating Pearson Correlation Coefficient

# Sample data
exercise_hours <- c(1, 3, 4, 6, 8, 10)
happiness_scores <- c(50, 55, 60, 65, 70, 75)

# Calculate Pearson's correlation coefficient
correlation <- cor(exercise_hours, happiness_scores)
correlation

## [1] 0.9962062

Calculated Pearson Correlation Coefficient: 0.996
Interpretation: There is a very strong positive linear relationship between exercise hours and happiness scores. As exercise hours increase, happiness scores tend to increase.
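
To see what cor() computes, the coefficient can be reproduced from its definition: the sum of cross-products of the deviations, divided by the square root of the product of the two sums of squared deviations. The following is a minimal sketch; the names x_dev, y_dev, and r_manual are introduced here for illustration only.

```r
# Same sample data as in the exercise
exercise_hours <- c(1, 3, 4, 6, 8, 10)
happiness_scores <- c(50, 55, 60, 65, 70, 75)

# Deviations of each observation from its variable's mean
x_dev <- exercise_hours - mean(exercise_hours)
y_dev <- happiness_scores - mean(happiness_scores)

# Pearson's r: cross-products over the square root of the
# product of the two sums of squared deviations
r_manual <- sum(x_dev * y_dev) / sqrt(sum(x_dev^2) * sum(y_dev^2))

# Should agree with cor(exercise_hours, happiness_scores)
round(r_manual, 3)
```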

Exercise 2: Visualizing Correlation with ggplot2

library(ggplot2)

# Sample data
water_intake <- c(0.5, 1, 1.5, 2, 2.5, 3)
energy_levels <- c(40, 50, 60, 65, 70, 80)

water_data <- data.frame(water_intake, energy_levels)

# Create scatter plot with trend line
ggplot(water_data, aes(x = water_intake, y = energy_levels)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Water Intake vs. Energy Levels",
       x = "Water Intake (liters)",
       y = "Energy Levels")

## `geom_smooth()` using formula = 'y ~ x'

Interpretation: The scatter plot shows a strong positive relationship; as water intake increases, energy levels also increase.
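
A visual impression of strength can be backed up with a number. As a small sketch, the same two vectors can be passed to cor(); the name water_r is an assumption made for this example.

```r
# Same sample data as in the plot
water_intake <- c(0.5, 1, 1.5, 2, 2.5, 3)
energy_levels <- c(40, 50, 60, 65, 70, 80)

# Pearson's r for the plotted variables; a value close to 1
# corresponds to points lying close to the fitted line
water_r <- cor(water_intake, energy_levels)
water_r
```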

Exercise 3: Analyzing the Size of Correlation

Answer: A correlation of -0.4 represents a moderate negative relationship between screen time and sleep duration. As screen time increases, sleep duration tends to decrease. This is meaningful and could warrant further investigation, though it’s not a strong correlation.
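
One way to judge the size of a correlation is to square it, which gives the proportion of variance the two variables share in a linear sense. A minimal sketch, with shared_variance as an illustrative name:

```r
# Correlation between screen time and sleep duration from the exercise
r <- -0.4

# Squaring r gives the proportion of variance in one variable
# that is linearly associated with the other
shared_variance <- r^2
shared_variance
```

Here the result is 0.16, so about 16% of the variance is shared, which fits the "moderate, not strong" reading.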

Exercise 4: Impact of a Third Variable (Confounder)

Answer: Physical health might influence both the amount of outdoor time and academic performance. Healthier students might spend more time outdoors and also perform better academically. To control for this, researchers could measure physical health and include it as a control variable in their analysis, or use methods like matching or statistical control (e.g., multiple regression).
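
The idea of statistical control can be illustrated with simulated data. The sketch below is hypothetical throughout: the variable names, effect sizes, sample size, and seed are all assumptions made for illustration, and outdoor time is deliberately given no direct effect on academic performance.

```r
set.seed(1)  # arbitrary seed, used only for reproducibility
n <- 500

# Hypothetical confounder that drives both variables
physical_health <- rnorm(n)
outdoor_time <- physical_health + rnorm(n)
academic_performance <- 2 * physical_health + rnorm(n)  # no direct outdoor_time effect

# Unadjusted model: outdoor_time absorbs the effect of physical_health
unadjusted <- lm(academic_performance ~ outdoor_time)

# Adjusted model: physical_health is included as a control variable
adjusted <- lm(academic_performance ~ outdoor_time + physical_health)

# Compare the outdoor_time coefficient before and after adjustment
coef(unadjusted)["outdoor_time"]
coef(adjusted)["outdoor_time"]
```

In this setup the unadjusted coefficient is clearly positive, while the adjusted coefficient should sit near zero, because the apparent relationship was produced by the confounder.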

Exercise 5: Evaluating Correlation and Causality

Answer: Correlation does not imply causation. The fact that children who eat breakfast have better cognitive performance doesn’t mean breakfast is the cause. It could be related to other factors like socioeconomic status or parental involvement. To explore causality, researchers could conduct an experiment where one group eats breakfast and another skips it, and then compare cognitive performance under controlled conditions.
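
The experimental design described above could be sketched as follows. Everything here is simulated and hypothetical: the group labels, the sample size, the score distribution, and the 5-point breakfast effect are assumptions chosen only to show random assignment followed by a group comparison.

```r
set.seed(2)  # arbitrary seed, used only for reproducibility
n <- 60

# Random assignment: shuffle an equal number of labels for each condition
group <- sample(rep(c("breakfast", "no_breakfast"), each = n / 2))

# Simulated cognitive scores with an assumed 5-point benefit of breakfast
cognitive_score <- rnorm(n, mean = 100, sd = 15) +
  ifelse(group == "breakfast", 5, 0)

# Compare the two randomly assigned groups (Welch two-sample t-test)
experiment_test <- t.test(cognitive_score ~ group)
experiment_test
```

Because assignment is random, factors such as socioeconomic status are balanced across groups on average, which is what allows a causal reading of the group difference.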

Bivariate Regression

Exercise 1: Daily Exercise and Happiness

set.seed(101)
daily_exercise <- rnorm(100, mean = 1, sd = 0.5)
happiness <- 50 + 5 * daily_exercise + rnorm(100, mean = 0, sd = 5)
exercise_data <- data.frame(daily_exercise, happiness)

# Linear model
exercise_model <- lm(happiness ~ daily_exercise, data = exercise_data)
summary(exercise_model)

## 
## Call:
## lm(formula = happiness ~ daily_exercise, data = exercise_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.3174  -3.1136  -0.7943   3.1798  11.2470 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      48.653      1.172  41.501  < 2e-16 ***
## daily_exercise    6.159      1.080   5.705 1.24e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.017 on 98 degrees of freedom
## Multiple R-squared:  0.2493, Adjusted R-squared:  0.2416 
## F-statistic: 32.54 on 1 and 98 DF,  p-value: 1.238e-07

# Residual plot
plot(residuals(exercise_model), main = "Residuals: Exercise and Happiness", ylab = "Residuals")

Slope Interpretation: For every one-unit increase in daily exercise, predicted happiness increases by approximately 6.16 units.

Residuals Interpretation: The residuals are scattered randomly around zero with no visible pattern, suggesting that a linear model is appropriate for the data.
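
Reading the slope directly from the fitted object, rather than retyping it from the printed summary, helps keep the written interpretation in sync with the model. The sketch below refits the same model and also reports a 95% confidence interval; the names slope and slope_ci are introduced for illustration.

```r
# Recreate the data and model from this exercise
set.seed(101)
daily_exercise <- rnorm(100, mean = 1, sd = 0.5)
happiness <- 50 + 5 * daily_exercise + rnorm(100, mean = 0, sd = 5)
exercise_data <- data.frame(daily_exercise, happiness)
exercise_model <- lm(happiness ~ daily_exercise, data = exercise_data)

# Extract the estimated slope by name
slope <- coef(exercise_model)["daily_exercise"]
round(slope, 2)

# 95% confidence interval for the slope
slope_ci <- confint(exercise_model, "daily_exercise", level = 0.95)
slope_ci
```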

Exercise 2: Screen Time and Sleep Quality

set.seed(102)
screen_time <- rnorm(100, mean = 3, sd = 1)
sleep_quality <- 80 - 4 * screen_time + rnorm(100, mean = 0, sd = 8)
sleep_data <- data.frame(screen_time, sleep_quality)

# Linear model
sleep_model <- lm(sleep_quality ~ screen_time, data = sleep_data)
summary(sleep_model)

## 
## Call:
## lm(formula = sleep_quality ~ screen_time, data = sleep_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -19.4357  -5.7620  -0.2427   5.9956  17.8661 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  77.6638     2.6209  29.633  < 2e-16 ***
## screen_time  -3.1779     0.7974  -3.986  0.00013 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.495 on 98 degrees of freedom
## Multiple R-squared:  0.1395, Adjusted R-squared:  0.1307 
## F-statistic: 15.88 on 1 and 98 DF,  p-value: 0.0001296

# Residual plot
plot(residuals(sleep_model), main = "Residuals: Screen Time and Sleep Quality", ylab = "Residuals")

Slope Interpretation: For every one-unit increase in screen time, predicted sleep quality decreases by approximately 3.18 units.

Residuals Interpretation: The residuals appear randomly scattered around zero, indicating that a linear model is appropriate for these data.
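
Plotting residuals against observation order, as above, is one option. A common alternative is to plot them against the fitted values, which makes curvature or changing spread easier to spot. This is a sketch of that alternative, not a replacement for the assigned plot.

```r
# Recreate the data and model from this exercise
set.seed(102)
screen_time <- rnorm(100, mean = 3, sd = 1)
sleep_quality <- 80 - 4 * screen_time + rnorm(100, mean = 0, sd = 8)
sleep_data <- data.frame(screen_time, sleep_quality)
sleep_model <- lm(sleep_quality ~ screen_time, data = sleep_data)

# Residuals against fitted values, with a dashed reference line at zero
plot(fitted(sleep_model), residuals(sleep_model),
     main = "Residuals vs. Fitted: Screen Time and Sleep Quality",
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

A roughly even band of points around the dashed line, with no curve or funnel shape, supports the linearity and constant-variance assumptions.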

Exercise 3: Coffee Consumption and Productivity

set.seed(103)
coffee_consumption <- rpois(100, lambda = 3)
productivity <- 60 + 2.5 * coffee_consumption + rnorm(100, mean = 0, sd = 7)
coffee_data <- data.frame(coffee_consumption, productivity)

# Linear model
coffee_model <- lm(productivity ~ coffee_consumption, data = coffee_data)
summary(coffee_model)

## 
## Call:
## lm(formula = productivity ~ coffee_consumption, data = coffee_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.6485  -4.6556  -0.5024   5.0264  13.4723 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         60.8890     1.4471  42.075  < 2e-16 ***
## coffee_consumption   2.5653     0.4348   5.901 5.19e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.947 on 98 degrees of freedom
## Multiple R-squared:  0.2621, Adjusted R-squared:  0.2546 
## F-statistic: 34.82 on 1 and 98 DF,  p-value: 5.191e-08

# Residual plot
plot(residuals(coffee_model), main = "Residuals: Coffee and Productivity", ylab = "Residuals")

Slope Interpretation: For every one-cup increase in coffee consumption, predicted productivity increases by approximately 2.57 units.

Residuals Interpretation: The residuals are scattered around zero without a clear pattern, suggesting the linear model is appropriate.
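
The slope can also be read through predictions: each additional cup shifts the predicted value by the slope. A minimal sketch using predict(); the data frame new_cups and the cup counts 0, 2, and 4 are assumptions chosen for illustration.

```r
# Recreate the data and model from this exercise
set.seed(103)
coffee_consumption <- rpois(100, lambda = 3)
productivity <- 60 + 2.5 * coffee_consumption + rnorm(100, mean = 0, sd = 7)
coffee_data <- data.frame(coffee_consumption, productivity)
coffee_model <- lm(productivity ~ coffee_consumption, data = coffee_data)

# Predicted productivity at a few illustrative cup counts
new_cups <- data.frame(coffee_consumption = c(0, 2, 4))
predicted <- predict(coffee_model, newdata = new_cups)
predicted
```

The prediction at 0 cups is the intercept, and consecutive predictions here differ by twice the slope because the cup counts are two apart.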

Exercise 4: Social Media and Loneliness

set.seed(104)
social_media <- rnorm(100, mean = 2, sd = 1)
loneliness <- 40 + 7 * social_media + rnorm(100, mean = 0, sd = 6)
social_media_data <- data.frame(social_media, loneliness)

# Linear model
social_model <- lm(loneliness ~ social_media, data = social_media_data)
summary(social_model)

## 
## Call:
## lm(formula = loneliness ~ social_media, data = social_media_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.3983  -3.7439  -0.1331   4.2972  11.4068 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   40.7694     1.4539   28.04   <2e-16 ***
## social_media   6.6069     0.6448   10.25   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.811 on 98 degrees of freedom
## Multiple R-squared:  0.5172, Adjusted R-squared:  0.5123 
## F-statistic:   105 on 1 and 98 DF,  p-value: < 2.2e-16

# Residual plot
plot(residuals(social_model), main = "Residuals: Social Media and Loneliness", ylab = "Residuals")

Slope Interpretation: For every one-unit increase in social media use, predicted loneliness increases by approximately 6.61 units.

Residuals Interpretation: The residuals do not show any clear pattern, indicating that a linear model is appropriate for these data.
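
The two halves of this lab connect here: in a bivariate regression, the model's R-squared equals the squared Pearson correlation between the two variables. The sketch below checks this on the social media data; the names r and r_squared are introduced for illustration.

```r
# Recreate the data and model from this exercise
set.seed(104)
social_media <- rnorm(100, mean = 2, sd = 1)
loneliness <- 40 + 7 * social_media + rnorm(100, mean = 0, sd = 6)
social_media_data <- data.frame(social_media, loneliness)
social_model <- lm(loneliness ~ social_media, data = social_media_data)

# Pearson correlation between predictor and outcome
r <- cor(social_media, loneliness)

# R-squared reported by the regression
r_squared <- summary(social_model)$r.squared

# The squared correlation and the model R-squared should match
c(r = r, r_squared_from_r = r^2, r_squared_from_model = r_squared)
```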

Submission Instructions: Knit this document to PDF and upload it to Canvas as instructed.