Cultivation theory holds that heavy exposure to television, or television-like programming, gradually shapes and stabilizes viewers’ attitudes about the real world. The theory was developed by George Gerbner in the late 1960s. At its core is the claim that television is a powerful mass communication medium that shapes a society’s culture and norms in proportion to how much attention and viewing it receives.
If the theory holds, the number of hours a person spends watching commercially produced television stories should affect that person’s perception of what is “normal” within the culture. This analysis specifically examines the relationship between the level of television viewership and perceptions of what types of jobs most people hold within society as a whole.
Based on level of television viewership, heavy viewers should be more likely than light viewers to think that most people in society have a job in law enforcement, medicine, or emergency response services.
For this analysis, 400 volunteer study participants were recruited from a random sample of all U.S. adults. Participants agreed to connect a monitoring device to their household television (or televisions and any other devices used to watch television content) that allowed the researchers to record the number of hours per week each participant spent watching television content. Each device was configured to count only the hours during which the study participant was watching, so time spent watching by other household members was not recorded unless the participant was also watching.
Data was collected for six months. At the end of the six-month monitoring period, study participants completed a questionnaire. The analysis uses two variables, one drawn from the monitoring data and one from the questionnaire; both are continuous.
The independent variable is the average weekly hours each participant spent watching television over the six-month period. The dependent variable is each participant’s estimate of the percentage of the U.S. population employed full time in law enforcement/criminal justice (police, investigators, prosecutors, criminal defense lawyers, security guards, etc.), medicine (doctors, nurses, hospital personnel, medical interns, etc.), or emergency response services (firefighters, paramedics, search-and-rescue personnel, etc.). Participants gave a separate percentage estimate for each worker category, and the three percentages were summed.
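The summing step can be done directly in R when the questionnaire responses are stored as separate columns. The sketch below assumes hypothetical column names pct_law, pct_med, and pct_ems; the analysis script at the end of this report instead reads a dataset in which the summed value is already stored in a pct column.

```r
# Sketch only: combine three hypothetical per-category estimates into the single
# dependent-variable percentage used in the analysis
mydata <- read.csv("Cultivation.csv")
mydata$pct <- mydata$pct_law + mydata$pct_med + mydata$pct_ems
```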
The data was analyzed using a simple linear regression to estimate the strength and direction of the relationship between the two variables.
As part of the regression analysis, two histograms, a scatter plot, and three tables were produced from the data using R code in RStudio; the full script appears at the end of this report.
The dependent variable (DV) histogram shows a slightly skewed bell curve, with most values falling between the mid-40s and the 60s. There appear to be a few outliers at the low end of the scale, but they do not appear substantial enough to affect the overall results of the analysis.
The independent variable (IV) histogram also shows a bell curve, with most values occurring between 19 and 50 and a peak at approximately 35. There is one mild outlier on the right side of the graph, near the 60 mark, but it does not appear substantial enough to affect the overall results.
The regression scatterplot shows a positive relationship between the two variables, but the points are widely scattered around the regression line, indicating that the relationship is moderate rather than strong.
The outliers table lists the leverage values for the 10 observations with the highest leverage. As the histograms already suggested, none of these cases diverges from the rest of the data enough to meaningfully distort the results.
The regression table shows a p-value below 0.0001 (displayed as 0.0000 after rounding to four decimals), indicating that the relationship between the independent and dependent variables is statistically significant rather than a product of chance. The table also shows an intercept of 23.2076, meaning a participant who watched zero hours of television would be predicted to estimate that about 23% of people work in law enforcement, medicine, or emergency response services; this value serves as the baseline for the analysis. The IV coefficient of 0.8440 indicates that each additional hour of weekly viewing is associated with an increase of roughly 0.84 percentage points in that estimate.
The model fit statistics table shows an R-squared value of 0.3111, which is consistent with the scatterplot: hours of television viewing explain roughly 31% of the variance in viewers’ estimates of how many people work in law enforcement, medicine, or emergency response services (a correlation of about 0.56), a positive relationship of moderate strength.
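As a quick illustration of how these numbers fit together (a sketch only, using the coefficient and fit values reported in the tables below and a hypothetical viewer who averages 40 hours of television per week):

```r
# Predicted DV for a hypothetical participant watching 40 hours of television per week
23.2076 + 0.8440 * 40    # roughly 57, i.e., an estimate that ~57% work in these fields

# In a simple regression, the IV-DV correlation is the square root of R-squared
sqrt(0.3111)             # roughly 0.56
```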

Leverage estimates for 10 largest outliers

| Row # | Leverage |
|---|---|
| 164 | 0.0305 |
| 360 | 0.0207 |
| 359 | 0.0194 |
| 371 | 0.0174 |
| 72 | 0.0162 |
| 201 | 0.0159 |
| 265 | 0.0159 |
| 392 | 0.0148 |
| 44 | 0.0144 |
| 97 | 0.0144 |

Regression Analysis Results: Coefficient Estimates

| Term | Estimate | Std. Error | t | p-value |
|---|---|---|---|---|
| (Intercept) | 23.2076 | 2.2026 | 10.5363 | 0.0000 |
| IV | 0.8440 | 0.0630 | 13.4056 | 0.0000 |

Model Fit Statistics: Overall Regression Performance

| R-squared | Adj. R-squared | F-statistic | df (model) | df (residual) | Residual Std. Error |
|---|---|---|---|---|---|
| 0.3111 | 0.3093 | 179.7107 | 1.0000 | 398.0000 | 9.7373 |
##################################################
# 1. Install and load required packages
##################################################
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("gt")) install.packages("gt")
if (!require("gtExtras")) install.packages("gtExtras")
library(tidyverse)
library(gt)
library(gtExtras)
##################################################
# 2. Read in the dataset
##################################################
# Replace "YOURFILENAME.csv" with the actual filename
mydata <- read.csv("Cultivation.csv")
# ################################################
# # (Optional) 2b. Remove specific cases by row number
# ################################################
# # Example: remove rows 10 and 25
# rows_to_remove <- c(10, 25) # Edit and uncomment this line
# mydata <- mydata[-rows_to_remove, ] # Uncomment this line
##################################################
# 3. Define dependent variable (DV) and independent variable (IV)
##################################################
# Map the dataset columns to the DV (summed job-percentage estimate) and IV (weekly viewing hours)
mydata$DV <- mydata$pct
mydata$IV <- mydata$video
##################################################
# 4. Explore distributions of DV and IV
##################################################
# Make a histogram for DV
DVGraph <- ggplot(mydata, aes(x = DV)) +
  geom_histogram(color = "black", fill = "#1f78b4")
# Make a histogram for IV
IVGraph <- ggplot(mydata, aes(x = IV)) +
  geom_histogram(color = "black", fill = "#1f78b4")
##################################################
# 5. Fit and summarize initial regression model
##################################################
# Suppress scientific notation
options(scipen = 999)
# Fit model
myreg <- lm(DV ~ IV, data = mydata)
# Model summary
summary(myreg)
##################################################
# 6. Visualize regression and check for bivariate outliers
##################################################
# Create scatterplot with regression line as a ggplot object
RegressionPlot <- ggplot(mydata, aes(x = IV, y = DV)) +
  geom_point(color = "#1f78b4") +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(
    title = "Scatterplot of DV vs IV with Regression Line",
    x = "Independent Variable (IV)",
    y = "Dependent Variable (DV)"
  ) +
  theme_minimal()
##################################################
# 7. Check for potential outliers (high leverage points)
##################################################
# Calculate leverage values
hat_vals <- hatvalues(myreg)
# Rule of thumb: leverage > 2 * (number of predictors + 1) / n may be influential
threshold <- 2 * (length(coef(myreg)) / nrow(mydata))
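# (Optional, added check) list row numbers whose leverage exceeds the rule-of-thumb
# threshold computed above; this is a quick diagnostic and does not change the model
which(hat_vals > threshold)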
# Create table showing 10 largest leverage values
outliers <- data.frame(
  Obs = 1:nrow(mydata),
  Leverage = hat_vals
) %>%
  arrange(desc(Leverage)) %>%
  slice_head(n = 10)
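# (Optional) inspect the raw IV and DV values for the flagged cases
# mydata[outliers$Obs, c("IV", "DV")]  # Uncomment this line to run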
# Format as a gt table
outliers_table <- outliers %>%
  gt() %>%
  tab_header(
    title = "Leverage estimates for 10 largest outliers"
  ) %>%
  cols_label(
    Obs = "Row #",
    Leverage = "Leverage"
  ) %>%
  fmt_number(
    columns = Leverage,
    decimals = 4
  )
##################################################
# 8. Create nicely formatted regression results tables
##################################################
# --- Coefficient-level results ---
reg_results <- as.data.frame(coef(summary(myreg))) %>%
  tibble::rownames_to_column("Term") %>%
  rename(
    Estimate = Estimate,
    `Std. Error` = `Std. Error`,
    t = `t value`,
    `p-value` = `Pr(>|t|)`
  )
reg_table <- reg_results %>%
  gt() %>%
  tab_header(
    title = "Regression Analysis Results",
    subtitle = "Coefficient Estimates"
  ) %>%
  fmt_number(
    columns = c(Estimate, `Std. Error`, t, `p-value`),
    decimals = 4
  )
# --- Model fit statistics ---
reg_summary <- summary(myreg)
fit_stats <- tibble::tibble(
  `R-squared` = reg_summary$r.squared,
  `Adj. R-squared` = reg_summary$adj.r.squared,
  `F-statistic` = reg_summary$fstatistic[1],
  `df (model)` = reg_summary$fstatistic[2],
  `df (residual)` = reg_summary$fstatistic[3],
  `Residual Std. Error` = reg_summary$sigma
)
fit_table <- fit_stats %>%
  gt() %>%
  tab_header(
    title = "Model Fit Statistics",
    subtitle = "Overall Regression Performance"
  ) %>%
  fmt_number(
    columns = everything(),
    decimals = 4
  )
##################################################
# 9. Final print of key graphics and tables
##################################################
DVGraph
IVGraph
RegressionPlot
outliers_table
reg_table
fit_table