Cultivation Analysis

Rationale

The Cultivation theory suggests that heavy television exposure can gradually shape viewers perception of reality. Over time, frequent exposure to televised content may “cultivate” attitudes and fears just as the world is portrayed on screen. If this theory is correct, people who spend more time watching television should, on average, report higher levels of unhappiness.

Hypothesis

Unhappiness scores will increase as the number of hours of daily television viewing increases. Specifically, heavier viewers will report higher levels of unhappiness than lighter viewers.

Variables & method

Participants reported their average number of hours of television watched per day (IV) and rated their level of unhappiness on a numerical scale (DV). Both variables were treated as continuous measures.

A linear regression was performed to test whether television viewing hours significantly predicted unhappiness. Histograms were used to inspect the distributions of both variables, and the scatterplot with a fitted regression line illustrates the relationship. Leverage statistics were also calculated to check for influential outliers.

Results & discussion

The regression model showed a positive correlation between television hours and unhappiness. The scatterplot indicated a positive slope, meaning that unhappiness scores tended to rise as viewing hours increased. Model fit statistics confirmed that television hours accounted for a meaningful portion of the variance in unhappiness. Leverage diagnostics revealed no influential outliers.

Row #	Leverage
Leverage estimates for 10 largest outliers
164	0.0303
360	0.0199
359	0.0190
371	0.0178
72	0.0170
265	0.0164
201	0.0152
392	0.0151
97	0.0151
44	0.0149

Term	Estimate	Std. Error	t	p-value
Regression Analysis Results
Coefficient Estimates
(Intercept)	3.9888	1.5531	2.5683	0.0106
IV	0.4206	0.0307	13.6953	0.0000

R-squared	Adj. R-squared	F-statistic	df (model)	df (residual)	Residual Std. Error
Model Fit Statistics
Overall Regression Performance
0.3203	0.3186	187.5609	1.0000	398.0000	4.1619

These findings support the hypothesis and align with cultivation theory, suggesting that heavier television viewing is associated with greater reported unhappiness. Increased exposure to television content may cultivate more negative emotional states over time.

Code:

##################################################
# 1. Install and load required packages
##################################################
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("gt")) install.packages("gt")
if (!require("gtExtras")) install.packages("gtExtras")

library(tidyverse)
library(gt)
library(gtExtras)


##################################################
# 2. Read in the dataset
##################################################
# Replace "YOURFILENAME.csv" with the actual filename
mydata <- read.csv("RegressionData.csv")


# ################################################
# # (Optional) 2b. Remove specific cases by row number
# ################################################
# # Example: remove rows 10 and 25
# rows_to_remove <- c(10, 25) # Edit and uncomment this line
# mydata <- mydata[-rows_to_remove, ] # Uncomment this line


##################################################
# 3. Define dependent variable (DV) and independent variable (IV)
##################################################
# Replace YOURDVNAME and YOURIVNAME with actual column names
mydata$DV <- mydata$Unhappiness
mydata$IV <- mydata$Hours


##################################################
# 4. Explore distributions of DV and IV
##################################################
# Make a histogram for DV
DVGraph <- ggplot(mydata, aes(x = DV)) + 
  geom_histogram(color = "black", fill = "#1f78b4")

# Make a histogram for IV
IVGraph <- ggplot(mydata, aes(x = IV)) + 
  geom_histogram(color = "black", fill = "#1f78b4")


##################################################
# 5. Fit and summarize initial regression model
##################################################
# Suppress scientific notation
options(scipen = 999)

# Fit model
myreg <- lm(DV ~ IV, data = mydata)

# Model summary
summary(myreg)


##################################################
# 6. Visualize regression and check for bivariate outliers
##################################################
# Create scatterplot with regression line as a ggplot object
RegressionPlot <- ggplot(mydata, aes(x = IV, y = DV)) +
  geom_point(color = "#1f78b4") +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(
    title = "Scatterplot of DV vs IV with Regression Line",
    x = "Independent Variable (IV)",
    y = "Dependent Variable (DV)"
  ) +
  theme_minimal()


##################################################
# 7. Check for potential outliers (high leverage points)
##################################################
# Calculate leverage values
hat_vals <- hatvalues(myreg)

# Rule of thumb: leverage > 2 * (number of predictors + 1) / n may be influential
threshold <- 2 * (length(coef(myreg)) / nrow(mydata))

# Create table showing 10 largest leverage values
outliers <- data.frame(
  Obs = 1:nrow(mydata),
  Leverage = hatvalues(myreg)
) %>%
  arrange(desc(Leverage)) %>%
  slice_head(n = 10)

# Format as a gt table
outliers_table <- outliers %>%
  gt() %>%
  tab_header(
    title = "Leverage estimates for 10 largest outliers"
  ) %>%
  cols_label(
    Obs = "Row #",
    Leverage = "Leverage"
  ) %>%
  fmt_number(
    columns = Leverage,
    decimals = 4
  )


##################################################
# 8. Create nicely formatted regression results tables
##################################################
# --- Coefficient-level results ---
reg_results <- as.data.frame(coef(summary(myreg))) %>%
  tibble::rownames_to_column("Term") %>%
  rename(
    Estimate = Estimate,
    `Std. Error` = `Std. Error`,
    t = `t value`,
    `p-value` = `Pr(>|t|)`
  )

reg_table <- reg_results %>%
  gt() %>%
  tab_header(
    title = "Regression Analysis Results",
    subtitle = "Coefficient Estimates"
  ) %>%
  fmt_number(
    columns = c(Estimate, `Std. Error`, t, `p-value`),
    decimals = 4
  )


# --- Model fit statistics ---
reg_summary <- summary(myreg)

fit_stats <- tibble::tibble(
  `R-squared` = reg_summary$r.squared,
  `Adj. R-squared` = reg_summary$adj.r.squared,
  `F-statistic` = reg_summary$fstatistic[1],
  `df (model)` = reg_summary$fstatistic[2],
  `df (residual)` = reg_summary$fstatistic[3],
  `Residual Std. Error` = reg_summary$sigma
)

fit_table <- fit_stats %>%
  gt() %>%
  tab_header(
    title = "Model Fit Statistics",
    subtitle = "Overall Regression Performance"
  ) %>%
  fmt_number(
    columns = everything(),
    decimals = 4
  )


##################################################
# 9. Final print of key graphics and tables
##################################################
DVGraph
IVGraph
RegressionPlot
outliers_table
reg_table
fit_table