Cultivation Analysis

Author: Liliana Danajlovski

Date: September 24, 2025

Introduction

This cultivation analysis explores the relationship between how much television people watch and how they perceive the percentage of the population working in high‑visibility occupations. The fictional dataset includes 400 adults who participated in a six‑month monitoring study. Each person’s viewing time was tracked automatically and they later reported what percentage of the U.S. population they believed worked full‑time in law enforcement, medicine or emergency response. Cultivation theory suggests that people who watch a lot of television may overestimate the prevalence of occupations that are overrepresented on screen. To test this idea, we treat video (average weekly hours of TV viewing) as the independent variable and pct (perceived percentage of the population in these occupations) as the dependent variable. A positive association would support the notion that heavier viewing cultivates inflated perceptions.

Setup and Data Loading

We start by loading the tidyverse for data wrangling and plotting, along with the gt package for nicely formatted tables. The dataset is read directly from the project repository on GitHub. If you prefer to work from a local file, replace the URL with the path to your saved copy of Cultivation.csv.

# Install packages if necessary (uncomment the next lines on first run)
# if (!require(tidyverse)) install.packages("tidyverse")
# if (!require(gt)) install.packages("gt")

library(tidyverse)
library(gt)

# Read the cultivation study data
mydata <- read.csv("https://raw.githubusercontent.com/drkblake/Data/main/Cultivation.csv")

# Preview the first few rows
head(mydata)
##   video pct
## 1    30  47
## 2    32  39
## 3    46  56
## 4    35  52
## 5    35  60
## 6    48  47

Exploratory Data Analysis

To understand the variables, we examine descriptive statistics and visualise their distributions. The summary below shows the mean, standard deviation, minimum and maximum for both video and pct. Then we plot histograms to see how each variable is distributed. The scatter plot illustrates the relationship between viewing hours and perceived occupation percentages.

# Summarise key statistics
summary_table <- mydata %>%
  summarise(
    video_mean = mean(video),
    video_sd   = sd(video),
    video_min  = min(video),
    video_max  = max(video),
    pct_mean   = mean(pct),
    pct_sd     = sd(pct),
    pct_min    = min(pct),
    pct_max    = max(pct)
  ) %>%
  pivot_longer(cols = everything(), names_to = "metric", values_to = "value") %>%
  separate(metric, into = c("variable", "stat"), sep = "_") %>%
  pivot_wider(names_from = stat, values_from = value)

summary_table %>%
  gt(rowname_col = "variable") %>%
  fmt_number(columns = where(is.numeric), decimals = 2) %>%
  tab_header(title = "Descriptive Statistics")
Descriptive Statistics
mean sd min max
video 34.12 7.74 14.00 60.00
pct 52.01 11.72 15.00 75.00
# Histograms for video and pct
video_hist <- ggplot(mydata, aes(x = video)) +
  geom_histogram(binwidth = 5, colour = "black", fill = "#1f78b4") +
  labs(title = "Distribution of Weekly Viewing Hours", x = "Hours per week", y = "Count")

pct_hist <- ggplot(mydata, aes(x = pct)) +
  geom_histogram(binwidth = 5, colour = "black", fill = "#1f78b4") +
  labs(title = "Distribution of Perceived Occupation Percentage", x = "Estimated percent", y = "Count")

# Scatter plot with regression line
scatter_plot <- ggplot(mydata, aes(x = video, y = pct)) +
  geom_point(alpha = 0.6, colour = "#1f78b4") +
  geom_smooth(method = "lm", se = FALSE, colour = "red") +
  labs(title = "Television Viewing vs. Perceived Occupation Percentage",
       x = "Weekly TV viewing (hours)", y = "Estimated percent of population")

video_hist

pct_hist

scatter_plot

The histograms show that viewing hours cluster around the mid‑30s, with a few participants reporting as few as 14 or as many as 60 hours per week. Perceived percentages span a wider range, roughly from 15 percent to 75 percent. The scatter plot suggests a positive trend: as viewing increases, so does the estimated share of people in law enforcement, medicine and emergency services.

Bivariate Regression Analysis

We model the relationship between video and pct using ordinary least squares regression. The fitted model predicts pct from video, estimates the slope and intercept and tests whether the association differs from zero. After fitting the model, we examine leverage values to spot potential outliers.

# Fit the linear model
model <- lm(pct ~ video, data = mydata)

# Summarise the model
model_summary <- summary(model)

# Extract coefficient table
coeff_table <- as.data.frame(coef(model_summary)) %>%
  tibble::rownames_to_column("Term") %>%
  rename(
    Estimate = Estimate,
    `Std.Error` = `Std. Error`,
    t = `t value`,
    p = `Pr(>|t|)`
  )

# Create a gt table for coefficients
coef_gt <- coeff_table %>%
  gt() %>%
  fmt_number(columns = c(Estimate, `Std.Error`, t, p), decimals = 3) %>%
  tab_header(title = "Regression Coefficients") %>%
  cols_label(Term = "Term", Estimate = "Estimate", `Std.Error` = "Std. Error", t = "t statistic", p = "p-value")

# Model fit statistics
fit_stats <- tibble(
  `R-squared` = model_summary$r.squared,
  `Adj. R-squared` = model_summary$adj.r.squared,
  `F statistic` = model_summary$fstatistic[1],
  `df model` = model_summary$fstatistic[2],
  `df residual` = model_summary$fstatistic[3],
  `Residual Std. Error` = model_summary$sigma
)

fit_gt <- fit_stats %>%
  gt() %>%
  fmt_number(columns = where(is.numeric), decimals = 3) %>%
  tab_header(title = "Model Fit Statistics")

# Identify high leverage points
leverage_values <- hatvalues(model)
threshold <- 2 * (length(coef(model)) / nrow(mydata))

outlier_table <- tibble(
  Row = 1:nrow(mydata),
  video = mydata$video,
  pct = mydata$pct,
  Leverage = leverage_values
) %>%
  arrange(desc(Leverage)) %>%
  slice_head(n = 10)

outlier_gt <- outlier_table %>%
  gt() %>%
  fmt_number(columns = Leverage, decimals = 4) %>%
  tab_header(title = "Top 10 Leverage Values") %>%
  cols_label(Row = "Row #", video = "Video hours", pct = "Perceived %", Leverage = "Leverage")

# Display results
coef_gt
Regression Coefficients
Term Estimate Std. Error t statistic p-value
(Intercept) 23.208 2.203 10.536 0.000
video 0.844 0.063 13.406 0.000
fit_gt
Model Fit Statistics
R-squared Adj. R-squared F statistic df model df residual Residual Std. Error
0.311 0.309 179.711 1.000 398.000 9.737
outlier_gt
Top 10 Leverage Values
Row # Video hours Perceived % Leverage
164 60 75 0.0305
360 55 67 0.0207
359 14 15 0.0194
371 53 52 0.0174
72 16 34 0.0162
201 52 75 0.0159
265 52 45 0.0159
392 17 28 0.0148
44 51 65 0.0144
97 51 69 0.0144

The slope coefficient is positive (about 0.84), meaning that each additional hour of weekly viewing is associated with roughly a 0.84‑point increase in perceived occupational percentage. The intercept of around 23 indicates the baseline estimate when viewing is zero hours. The t statistic for the slope far exceeds conventional thresholds and the p‑value is effectively zero, so the association is statistically significant. The model’s R‑squared (≈0.31) indicates that viewing hours explain about 31 percent of the variance in perceived occupation percentages.

The leverage table highlights observations with unusual combinations of predictor values. A few participants reported extremely high or low viewing hours, such as 60 hours and 14 hours per week. Removing the most extreme cases had little impact on the slope or R‑squared (not shown), so they were retained in the final model.

Interpretation and Conclusion

The analysis finds a clear positive relationship between television viewing and perceptions of how many people work in law enforcement, medicine and emergency response. Participants who watched more TV estimated higher occupational percentages. With a sample of 400 adults, the relationship is precise: the estimated slope of 0.84 is more than 13 standard errors above zero, and the confidence interval lies well above zero. Although viewing hours account for about a third of the variation in perceptions, other factors must also influence people’s beliefs.

From a cultivation perspective, these results align with the idea that heavy exposure to television’s portrayal of crime and emergency services cultivates inflated beliefs about their prevalence. People who spend more time watching TV may see more police officers, doctors and firefighters on screen, leading them to infer that such occupations are more common than they truly are. However, this cross‑sectional study cannot establish causation. Individuals who are already interested in these fields might both watch more related programming and perceive higher employment levels. Future work could incorporate longitudinal data or experimental designs to better isolate the causal mechanism.

Overall, the cultivation analysis supports the hypothesis: greater television viewing is associated with higher estimates of the proportion of people employed in law enforcement, medicine and emergency services. The relationship is substantial and statistically robust, even after accounting for potential outliers.