Rationale

Cultivation theory describes attitude stability stemming from heavy media exposure. Heavy media viewers think differently on certain topics (depending on what they watch) than light media viewers.

Considering what the theory suggests, a person’s unhappiness can be influenced based on hours spent on social media.

Hypothesis

An individuals happiness will have a positive correlation by how much time they spend on social media.

Variables & Method

The dependent variable in the analysis is a continuous measure of unhappiness on a 50 point scale. The independent variable in the analysis is a continuous measure of average weekly screen time, with a minimum of 0 hours, maximum of 112 hours, average of 50 hours and standard deviation of 7. The method used in the analysis is a Linear Regression model.

Results & Discussion

The graphs below show the group distributions and averages of the dependent and independent variables by themselves and all together. The tables show the results for the leverage estimates for outliers, results of the t-test, and model fit statistics.

Graph 1

## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.

Graph 2

## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.

Graph 3

## `geom_smooth()` using formula = 'y ~ x'

Table 1

Leverage estimates for 10 largest outliers
Row # Leverage
164 0.0303
360 0.0199
359 0.0190
371 0.0178
72 0.0170
265 0.0164
201 0.0152
392 0.0151
97 0.0151
44 0.0149

Table 2

Regression Analysis Results
Coefficient Estimates
Term Estimate Std. Error t p-value
(Intercept) 3.9888 1.5531 2.5683 0.0106
IV 0.4206 0.0307 13.6953 0.0000

Table 3

Model Fit Statistics
Overall Regression Performance
R-squared Adj. R-squared F-statistic df (model) df (residual) Residual Std. Error
0.3203 0.3186 187.5609 1.0000 398.0000 4.1619

With R-squared equaling about .32, the group distributions show a weak positive correlation between unhappiness and hours spent on social media. The results supported the hypothesis but the statistics are not significant.

Code

##################################################
# 1. Install and load required packages
##################################################
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("gt")) install.packages("gt")
if (!require("gtExtras")) install.packages("gtExtras")

library(tidyverse)
library(gt)
library(gtExtras)


##################################################
# 2. Read in the dataset
##################################################
# Replace "YOURFILENAME.csv" with the actual filename
mydata <- read.csv("RegressionData.csv")


# ################################################
# # (Optional) 2b. Remove specific cases by row number
# ################################################
# # Example: remove rows 10 and 25
# rows_to_remove <- c(10, 25) # Edit and uncomment this line
# mydata <- mydata[-rows_to_remove, ] # Uncomment this line


##################################################
# 3. Define dependent variable (DV) and independent variable (IV)
##################################################
# Replace YOURDVNAME and YOURIVNAME with actual column names
mydata$DV <- mydata$Unhappiness
mydata$IV <- mydata$Hours


##################################################
# 4. Explore distributions of DV and IV
##################################################
# Make a histogram for DV
DVGraph <- ggplot(mydata, aes(x = DV)) + 
  geom_histogram(color = "black", fill = "#1f78b4")

# Make a histogram for IV
IVGraph <- ggplot(mydata, aes(x = IV)) + 
  geom_histogram(color = "black", fill = "#1f78b4")


##################################################
# 5. Fit and summarize initial regression model
##################################################
# Suppress scientific notation
options(scipen = 999)

# Fit model
myreg <- lm(DV ~ IV, data = mydata)

# Model summary
summary(myreg)


##################################################
# 6. Visualize regression and check for bivariate outliers
##################################################
# Create scatterplot with regression line as a ggplot object
RegressionPlot <- ggplot(mydata, aes(x = IV, y = DV)) +
  geom_point(color = "#1f78b4") +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(
    title = "Scatterplot of DV vs IV with Regression Line",
    x = "Independent Variable (IV)",
    y = "Dependent Variable (DV)"
  ) +
  theme_minimal()


##################################################
# 7. Check for potential outliers (high leverage points)
##################################################
# Calculate leverage values
hat_vals <- hatvalues(myreg)

# Rule of thumb: leverage > 2 * (number of predictors + 1) / n may be influential
threshold <- 2 * (length(coef(myreg)) / nrow(mydata))

# Create table showing 10 largest leverage values
outliers <- data.frame(
  Obs = 1:nrow(mydata),
  Leverage = hatvalues(myreg)
) %>%
  arrange(desc(Leverage)) %>%
  slice_head(n = 10)

# Format as a gt table
outliers_table <- outliers %>%
  gt() %>%
  tab_header(
    title = "Leverage estimates for 10 largest outliers"
  ) %>%
  cols_label(
    Obs = "Row #",
    Leverage = "Leverage"
  ) %>%
  fmt_number(
    columns = Leverage,
    decimals = 4
  )


##################################################
# 8. Create nicely formatted regression results tables
##################################################
# --- Coefficient-level results ---
reg_results <- as.data.frame(coef(summary(myreg))) %>%
  tibble::rownames_to_column("Term") %>%
  rename(
    Estimate = Estimate,
    `Std. Error` = `Std. Error`,
    t = `t value`,
    `p-value` = `Pr(>|t|)`
  )

reg_table <- reg_results %>%
  gt() %>%
  tab_header(
    title = "Regression Analysis Results",
    subtitle = "Coefficient Estimates"
  ) %>%
  fmt_number(
    columns = c(Estimate, `Std. Error`, t, `p-value`),
    decimals = 4
  )


# --- Model fit statistics ---
reg_summary <- summary(myreg)

fit_stats <- tibble::tibble(
  `R-squared` = reg_summary$r.squared,
  `Adj. R-squared` = reg_summary$adj.r.squared,
  `F-statistic` = reg_summary$fstatistic[1],
  `df (model)` = reg_summary$fstatistic[2],
  `df (residual)` = reg_summary$fstatistic[3],
  `Residual Std. Error` = reg_summary$sigma
)

fit_table <- fit_stats %>%
  gt() %>%
  tab_header(
    title = "Model Fit Statistics",
    subtitle = "Overall Regression Performance"
  ) %>%
  fmt_number(
    columns = everything(),
    decimals = 4
  )


##################################################
# 9. Final print of key graphics and tables
##################################################
DVGraph
IVGraph
RegressionPlot
outliers_table
reg_table
fit_table