Rationale

Cultivation theory suggests that prolonged exposure to media content can shape an individual's perceptions and attitudes. While the theory was originally developed around television, its logic can also extend to social media, where constant phone use may influence how users perceive the real world. This study examines whether the amount of time individuals spend on their phones each week is associated with higher levels of unhappiness. By treating weekly screen time as the independent variable and self-reported unhappiness (measured on a 50-point scale) as the dependent variable, this test evaluates cultivation theory's core assumption that heavier media use predicts negative outcomes in perception and experience.

Hypothesis

On average, unhappiness scores will be positively associated with the number of hours of weekly screen time. Individuals who report higher weekly phone use are expected to score higher on the 50-point unhappiness scale than those with lower screen time.

Variables & method

The dependent variable in this study was unhappiness, measured as a continuous variable on a 50-point self-report scale. The independent variable was weekly screen time, also measured as a continuous variable in hours reported from the participants’ phones. A total of 400 individuals participated, reporting their weekly screen time and then completing the unhappiness questionnaire. A bivariate linear regression was conducted to examine whether weekly screen time significantly predicted unhappiness.
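Written out, the bivariate model described above can be expressed as follows. This is a sketch in standard regression notation: the symbols \beta_0, \beta_1, and \varepsilon_i are introduced here for illustration and do not appear in the original write-up.

```latex
\[
\text{Unhappiness}_i = \beta_0 + \beta_1\,\text{Hours}_i + \varepsilon_i,
\qquad i = 1, \dots, 400
\]
\[
H_0:\ \beta_1 = 0 \quad \text{versus} \quad H_1:\ \beta_1 \neq 0
\]
```

The hypothesis predicts a positive slope, so support for it requires both rejecting the null hypothesis and observing an estimate of \beta_1 greater than zero.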

Results & discussion

The graphs and tables below show that higher weekly screen time was associated with greater unhappiness, supporting the hypothesis. The linear regression indicated that weekly screen time significantly and positively predicted unhappiness (b = 0.42, SE = 0.03, t(398) = 13.70, p < .001): each additional hour of weekly screen time was associated with an increase of about 0.42 points on the 50-point unhappiness scale. Screen time accounted for roughly 32% of the variance in unhappiness (R-squared = 0.32). The scatterplot illustrates this trend, with a clear upward slope showing that participants who reported more screen time also reported higher unhappiness scores. These findings are consistent with cultivation theory, which suggests that greater media exposure can shape emotional well-being. The large t-value and very small p-value indicate that the relationship is statistically significant and unlikely to be due to chance alone.
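To make the size of the effect concrete, the fitted line can be evaluated by hand from the coefficients reported in the regression table (intercept 3.9888, slope 0.4206). The sketch below is illustrative only: the screen-time values of 20 and 40 hours are hypothetical inputs chosen for the example, not observations from the dataset, and `predict_unhappiness` is a helper name introduced here.

```r
# Coefficients copied from the regression table in this report
intercept <- 3.9888
slope     <- 0.4206

# Hypothetical helper: predicted unhappiness (50-point scale) for given weekly hours
predict_unhappiness <- function(hours) {
  intercept + slope * hours
}

# Illustrative inputs, not taken from the data
hours_example <- c(20, 40)
predicted <- predict_unhappiness(hours_example)
print(predicted)

# Expected change in predicted unhappiness for 10 additional hours per week
ten_hour_change <- slope * 10
print(ten_hour_change)
```

Under this model, 10 additional hours of weekly screen time correspond to a predicted increase of about 4.2 points on the unhappiness scale.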

[Figures: histogram of unhappiness (DV); histogram of weekly screen time (IV); scatterplot of unhappiness against weekly screen time with fitted regression line]

Leverage estimates for 10 largest outliers

Row #   Leverage
164     0.0303
360     0.0199
359     0.0190
371     0.0178
72      0.0170
265     0.0164
201     0.0152
392     0.0151
97      0.0151
44      0.0149

Regression Analysis Results
Coefficient Estimates

Term          Estimate   Std. Error   t         p-value
(Intercept)   3.9888     1.5531       2.5683    0.0106
IV            0.4206     0.0307       13.6953   0.0000

Model Fit Statistics
Overall Regression Performance

R-squared   Adj. R-squared   F-statistic   df (model)   df (residual)   Residual Std. Error
0.3203      0.3186           187.5609      1.0000       398.0000        4.1619
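The reported statistics can be cross-checked against one another, since in a one-predictor regression t, F, and R-squared are tied together arithmetically. The following is a minimal sketch using only the rounded values printed in the tables above; the variable names are introduced here for illustration, and small discrepancies are expected because the inputs are rounded to four decimals.

```r
# Values copied from the regression tables in this report
estimate  <- 0.4206
std_error <- 0.0307
t_value   <- 13.6953
f_value   <- 187.5609
df_resid  <- 398
r_squared <- 0.3203

# t is the estimate divided by its standard error (inputs are rounded,
# so this matches the reported t only approximately)
t_check <- estimate / std_error

# With a single predictor, F equals t squared
f_check <- t_value^2

# R-squared can be recovered from F and the degrees of freedom (df model = 1)
r2_check <- f_value / (f_value + 1 * df_resid)

print(round(c(t = t_check, F = f_check, R2 = r2_check), 4))

# Two-sided p-value for the slope; it is far below 0.0001, which is why
# the table shows 0.0000 after rounding to four decimals
p_check <- 2 * pt(-abs(t_value), df = df_resid)
print(p_check)
```

The p-value of 0.0000 in the coefficient table is therefore a rounding artifact and is better read as p < .0001 than as exactly zero.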

Code

##################################################
# 1. Install and load required packages
##################################################
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("gt")) install.packages("gt")
if (!require("gtExtras")) install.packages("gtExtras")

library(tidyverse)
library(gt)
library(gtExtras)


##################################################
# 2. Read in the dataset
##################################################
# Read the study data; edit the filename if your CSV is named differently
mydata <- read.csv("RegressionData.csv")


# ################################################
# # (Optional) 2b. Remove specific cases by row number
# ################################################
# # Example: remove rows 10 and 25
# rows_to_remove <- c(10, 25) # Edit and uncomment this line
# mydata <- mydata[-rows_to_remove, ] # Uncomment this line


##################################################
# 3. Define dependent variable (DV) and independent variable (IV)
##################################################
# Copy the study columns (Unhappiness, Hours) into generic DV and IV columns
mydata$DV <- mydata$Unhappiness
mydata$IV <- mydata$Hours


##################################################
# 4. Explore distributions of DV and IV
##################################################
# Make a histogram for DV (bins set explicitly to avoid the stat_bin() default-bins message)
DVGraph <- ggplot(mydata, aes(x = DV)) + 
  geom_histogram(bins = 30, color = "black", fill = "#1f78b4")

# Make a histogram for IV
IVGraph <- ggplot(mydata, aes(x = IV)) + 
  geom_histogram(bins = 30, color = "black", fill = "#1f78b4")


##################################################
# 5. Fit and summarize initial regression model
##################################################
# Suppress scientific notation
options(scipen = 999)

# Fit model
myreg <- lm(DV ~ IV, data = mydata)

# Model summary
summary(myreg)


##################################################
# 6. Visualize regression and check for bivariate outliers
##################################################
# Create scatterplot with regression line as a ggplot object
# (formula given explicitly to avoid the geom_smooth() default-formula message)
RegressionPlot <- ggplot(mydata, aes(x = IV, y = DV)) +
  geom_point(color = "#1f78b4") +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "red") +
  labs(
    title = "Unhappiness by Weekly Screen Time with Regression Line",
    x = "Weekly screen time (hours)",
    y = "Unhappiness (50-point scale)"
  ) +
  theme_minimal()


##################################################
# 7. Check for potential outliers (high leverage points)
##################################################
# Calculate leverage (hat) values once and reuse them below
hat_vals <- hatvalues(myreg)

# Rule of thumb: leverage > 2 * (number of predictors + 1) / n may be influential
# (kept for reference when reading the table; not applied as a filter here)
threshold <- 2 * (length(coef(myreg)) / nrow(mydata))

# Create table showing 10 largest leverage values
outliers <- data.frame(
  Obs = seq_len(nrow(mydata)),
  Leverage = hat_vals
) %>%
  arrange(desc(Leverage)) %>%
  slice_head(n = 10)

# Format as a gt table
outliers_table <- outliers %>%
  gt() %>%
  tab_header(
    title = "Leverage estimates for 10 largest outliers"
  ) %>%
  cols_label(
    Obs = "Row #",
    Leverage = "Leverage"
  ) %>%
  fmt_number(
    columns = Leverage,
    decimals = 4
  )


##################################################
# 8. Create nicely formatted regression results tables
##################################################
# --- Coefficient-level results ---
reg_results <- as.data.frame(coef(summary(myreg))) %>%
  tibble::rownames_to_column("Term") %>%
  # Estimate and Std. Error already have the desired names; rename the rest
  rename(
    t = `t value`,
    `p-value` = `Pr(>|t|)`
  )

reg_table <- reg_results %>%
  gt() %>%
  tab_header(
    title = "Regression Analysis Results",
    subtitle = "Coefficient Estimates"
  ) %>%
  fmt_number(
    columns = c(Estimate, `Std. Error`, t, `p-value`),
    decimals = 4
  )


# --- Model fit statistics ---
reg_summary <- summary(myreg)

fit_stats <- tibble::tibble(
  `R-squared` = reg_summary$r.squared,
  `Adj. R-squared` = reg_summary$adj.r.squared,
  `F-statistic` = reg_summary$fstatistic[1],
  `df (model)` = reg_summary$fstatistic[2],
  `df (residual)` = reg_summary$fstatistic[3],
  `Residual Std. Error` = reg_summary$sigma
)

fit_table <- fit_stats %>%
  gt() %>%
  tab_header(
    title = "Model Fit Statistics",
    subtitle = "Overall Regression Performance"
  ) %>%
  fmt_number(
    columns = everything(),
    decimals = 4
  )


##################################################
# 9. Final print of key graphics and tables
##################################################
DVGraph
IVGraph
RegressionPlot
outliers_table
reg_table
fit_table
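
The script computes a leverage threshold in step 7 but never applies it. With 400 cases and two coefficients the rule of thumb gives 2 * 2 / 400 = 0.01, and all ten leverage values in the table above (0.0149 to 0.0303) exceed it. The sketch below shows one way the threshold could be used to flag cases. It runs on simulated stand-in data, not the study data, and the names `sim`, `fit`, `flagged`, and `high_leverage` are assumptions introduced for this example.

```r
# Simulated stand-in data (NOT the study data): 400 cases, one predictor
set.seed(123)
sim <- data.frame(IV = rnorm(400, mean = 50, sd = 10))
sim$DV <- 4 + 0.4 * sim$IV + rnorm(400, sd = 4)

fit <- lm(DV ~ IV, data = sim)

# Leverage (hat) values and the 2 * (p + 1) / n rule of thumb
hat_vals  <- hatvalues(fit)
threshold <- 2 * length(coef(fit)) / nrow(sim)

# Flag every case whose leverage exceeds the threshold
flagged <- data.frame(
  Obs      = seq_len(nrow(sim)),
  Leverage = unname(hat_vals),
  Flagged  = unname(hat_vals) > threshold
)

# Keep only flagged cases, largest leverage first
high_leverage <- flagged[flagged$Flagged, ]
high_leverage <- high_leverage[order(-high_leverage$Leverage), ]

print(threshold)
print(nrow(high_leverage))
```

Exceeding the threshold marks a case as worth inspecting rather than as an error: high leverage only means the case has an unusual predictor value, so whether it distorts the fit should be judged separately before removing anything.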