In this lab assignment, you will apply what you’ve learned about correlation by calculating, visualizing, and interpreting correlations across several datasets. You will also practice fitting and interpreting bivariate linear models. Follow the instructions carefully and complete all parts of each exercise.
# Sample data
exercise_hours <- c(1, 3, 4, 6, 8, 10)
happiness_scores <- c(50, 55, 60, 65, 70, 75)
# Calculate Pearson's correlation coefficient
correlation <- cor(exercise_hours, happiness_scores)
correlation
## [1] 0.9962062
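To see what `cor()` is computing, the coefficient can be reproduced from the definition of Pearson's r. This is a minimal sketch; the names `dx`, `dy`, and `r_manual` are introduced here for illustration and are not part of the lab.

```r
# Same sample data as above
exercise_hours <- c(1, 3, 4, 6, 8, 10)
happiness_scores <- c(50, 55, 60, 65, 70, 75)

# Deviations of each variable from its own mean
dx <- exercise_hours - mean(exercise_hours)
dy <- happiness_scores - mean(happiness_scores)

# Pearson's r: sum of cross-products divided by the square root
# of the product of the two sums of squares
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_manual

# Should agree with the built-in function up to floating-point error
all.equal(r_manual, cor(exercise_hours, happiness_scores))
```

The value is close to 1 because the happiness scores rise in even steps of 5 while the exercise hours rise in nearly even steps, so the points fall almost on a straight line.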
library(ggplot2)
# Sample data
water_intake <- c(0.5, 1, 1.5, 2, 2.5, 3)
energy_levels <- c(40, 50, 60, 65, 70, 80)
water_data <- data.frame(water_intake, energy_levels)
# Create scatter plot with trend line
ggplot(water_data, aes(x = water_intake, y = energy_levels)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Water Intake vs. Energy Levels",
       x = "Water Intake (liters)",
       y = "Energy Levels")
## `geom_smooth()` using formula = 'y ~ x'
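The trend line drawn by `geom_smooth(method = "lm")` is an ordinary least-squares fit of `y ~ x`, so its intercept and slope can be recovered with `lm()`. The sketch below assumes the same sample data; `water_model` is a name introduced here for illustration.

```r
# Same sample data as above
water_intake <- c(0.5, 1, 1.5, 2, 2.5, 3)
energy_levels <- c(40, 50, 60, 65, 70, 80)
water_data <- data.frame(water_intake, energy_levels)

# Fit the same least-squares line that the plot's trend line shows
water_model <- lm(energy_levels ~ water_intake, data = water_data)

# Intercept and slope of the trend line
coef(water_model)

# Strength of the linear association shown in the scatter plot
cor(water_data$water_intake, water_data$energy_levels)
```

Reading the slope next to the plot ties the picture to a number: it is the change in energy level associated with one additional liter of water.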
set.seed(101)
daily_exercise <- rnorm(100, mean = 1, sd = 0.5)
happiness <- 50 + 5 * daily_exercise + rnorm(100, mean = 0, sd = 5)
exercise_data <- data.frame(daily_exercise, happiness)
# Linear model
exercise_model <- lm(happiness ~ daily_exercise, data = exercise_data)
summary(exercise_model)
##
## Call:
## lm(formula = happiness ~ daily_exercise, data = exercise_data)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -13.3174  -3.1136  -0.7943   3.1798  11.2470
##
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)      48.653      1.172  41.501  < 2e-16 ***
## daily_exercise    6.159      1.080   5.705 1.24e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.017 on 98 degrees of freedom
## Multiple R-squared: 0.2493, Adjusted R-squared: 0.2416
## F-statistic: 32.54 on 1 and 98 DF, p-value: 1.238e-07
# Residual plot
plot(residuals(exercise_model), main = "Residuals: Exercise and Happiness", ylab = "Residuals")
Slope Interpretation: For every one-unit increase in daily exercise, predicted happiness increases by approximately 6.16 units (the `daily_exercise` estimate in the summary above).
Residuals Interpretation: The residuals scatter randomly around zero with no visible pattern, suggesting that a linear model is appropriate for the data.
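The slope can be read programmatically rather than copied from the printed summary, which avoids transcription errors. The sketch below also computes a 95% confidence interval and checks that, in a bivariate model, R-squared equals the squared Pearson correlation. The names `slope`, `slope_ci`, and `r` are introduced here for illustration.

```r
# Recreate the simulated data and model from above
set.seed(101)
daily_exercise <- rnorm(100, mean = 1, sd = 0.5)
happiness <- 50 + 5 * daily_exercise + rnorm(100, mean = 0, sd = 5)
exercise_data <- data.frame(daily_exercise, happiness)
exercise_model <- lm(happiness ~ daily_exercise, data = exercise_data)

# Estimated slope, taken directly from the fitted model
slope <- coef(exercise_model)["daily_exercise"]
slope

# 95% confidence interval for the slope
slope_ci <- confint(exercise_model, "daily_exercise", level = 0.95)
slope_ci

# With a single predictor, R-squared is the squared correlation
r <- cor(exercise_data$daily_exercise, exercise_data$happiness)
all.equal(r^2, summary(exercise_model)$r.squared)
```

Because the data are simulated with a true slope of 5, the interval offers a quick check on whether the estimate is consistent with the value used to generate the data.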
set.seed(102)
screen_time <- rnorm(100, mean = 3, sd = 1)
sleep_quality <- 80 - 4 * screen_time + rnorm(100, mean = 0, sd = 8)
sleep_data <- data.frame(screen_time, sleep_quality)
# Linear model
sleep_model <- lm(sleep_quality ~ screen_time, data = sleep_data)
summary(sleep_model)
##
## Call:
## lm(formula = sleep_quality ~ screen_time, data = sleep_data)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -19.4357  -5.7620  -0.2427   5.9956  17.8661
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  77.6638     2.6209  29.633  < 2e-16 ***
## screen_time  -3.1779     0.7974  -3.986  0.00013 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.495 on 98 degrees of freedom
## Multiple R-squared: 0.1395, Adjusted R-squared: 0.1307
## F-statistic: 15.88 on 1 and 98 DF, p-value: 0.0001296
# Residual plot
plot(residuals(sleep_model), main = "Residuals: Screen Time and Sleep Quality", ylab = "Residuals")
Slope Interpretation: For every one-unit increase in screen time, predicted sleep quality decreases by approximately 3.18 units (the `screen_time` estimate in the summary above).
Residuals Interpretation: The residuals appear randomly scattered around zero, indicating that a linear fit is reasonable.
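The call `plot(residuals(sleep_model))` plots residuals against observation order. A common complementary check, sketched here as an optional addition rather than a required part of the lab, plots residuals against fitted values, which makes curvature or a funnel shape easier to spot.

```r
# Recreate the simulated data and model from above
set.seed(102)
screen_time <- rnorm(100, mean = 3, sd = 1)
sleep_quality <- 80 - 4 * screen_time + rnorm(100, mean = 0, sd = 8)
sleep_data <- data.frame(screen_time, sleep_quality)
sleep_model <- lm(sleep_quality ~ screen_time, data = sleep_data)

# Residuals against fitted values, with a reference line at zero
plot(fitted(sleep_model), residuals(sleep_model),
     main = "Residuals vs. Fitted: Screen Time and Sleep Quality",
     xlab = "Fitted Sleep Quality", ylab = "Residuals")
abline(h = 0, lty = 2)
```

By construction, least-squares residuals average to zero and are uncorrelated with the fitted values, so the thing to look for is a systematic shape, not an overall tilt.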
set.seed(103)
coffee_consumption <- rpois(100, lambda = 3)
productivity <- 60 + 2.5 * coffee_consumption + rnorm(100, mean = 0, sd = 7)
coffee_data <- data.frame(coffee_consumption, productivity)
# Linear model
coffee_model <- lm(productivity ~ coffee_consumption, data = coffee_data)
summary(coffee_model)
##
## Call:
## lm(formula = productivity ~ coffee_consumption, data = coffee_data)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -11.6485  -4.6556  -0.5024   5.0264  13.4723
##
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)         60.8890     1.4471  42.075  < 2e-16 ***
## coffee_consumption   2.5653     0.4348   5.901 5.19e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.947 on 98 degrees of freedom
## Multiple R-squared: 0.2621, Adjusted R-squared: 0.2546
## F-statistic: 34.82 on 1 and 98 DF, p-value: 5.191e-08
# Residual plot
plot(residuals(coffee_model), main = "Residuals: Coffee and Productivity", ylab = "Residuals")
Slope Interpretation: For each additional cup of coffee, predicted productivity increases by approximately 2.57 units (the `coffee_consumption` estimate in the summary above).
Residuals Interpretation: The residuals are scattered around zero without an obvious pattern, suggesting the linear model is appropriate.
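One way to make the slope interpretation concrete is to ask the model for predictions at specific coffee counts. The sketch below uses `predict()` and then reproduces the same values by hand from the intercept and slope. The cup counts 0, 2, and 4 and the names `new_days`, `predicted`, and `by_hand` are illustrative assumptions, not part of the lab.

```r
# Recreate the simulated data and model from above
set.seed(103)
coffee_consumption <- rpois(100, lambda = 3)
productivity <- 60 + 2.5 * coffee_consumption + rnorm(100, mean = 0, sd = 7)
coffee_data <- data.frame(coffee_consumption, productivity)
coffee_model <- lm(productivity ~ coffee_consumption, data = coffee_data)

# Hypothetical days with 0, 2, and 4 cups of coffee
new_days <- data.frame(coffee_consumption = c(0, 2, 4))

# Model-based predictions for those days
predicted <- predict(coffee_model, newdata = new_days)
predicted

# The same predictions computed as intercept + slope * cups
b <- coef(coffee_model)
by_hand <- b["(Intercept)"] + b["coffee_consumption"] * new_days$coffee_consumption
all.equal(unname(predicted), unname(by_hand))
```

Each step of two cups changes the prediction by twice the slope, which is exactly what the slope interpretation describes.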