First-Level Agenda Setting Theory states that issues in media commonly end up being ever present in the minds of those who consume it. There is only so much media coverage to go around of certain topics, and that means that in certain instances, some topics or people will be covered more in comparison to others. However, the amount that certain topics or people get featured can differ depending on the time of year, and due to different circumstances. The amount of coverage on the individual topics is not beholden to a set limit. One can be higher then lower or the exact same as another media topic.
For my project, I will compare the amount of coverage or media prominence for Men’s and Women’s Sports in our modern sports media landscape. This will compare their numbers from stories per week on APNews.com between Jan 1 and Sept 30, 2025.
This will help build comprehension of the media coverage similarities and/or disparity between men’s sports and women’s sports.
The coverage from APNews.com when it came to Men’s and Women’s Sports favored one set of sports compared to the other.
The amount of coverage of Men’s Sports and Women’s Sports served as the Dependent Variable in this situation. This was translated as the number of stories in a week that the two had. The independent variable was the content of the story, specifically whether it was a story about a Men’s Sport or Women’s Sport. The key words and phrases I used for Male Sports included: “NFL”, “NBA”, “NHL”, “MLB”, and “MLS.” The key words and phrases I used for Female Sports included: “WNBA”, “College Softball”, “PWHL”, and “NWSL.”
There will be a Paired Samples T-Test that I plan to use in due time to find the statistical significance between these two story topics.
The plot shows the disparity between the story coverage for Men’s Sports and Women’s Sports, and it shows that Men’s Sports are virtually always getting more coverage than Women’s Sports, at least when it comes to the AP.
The peak for Men’s Sports stories was back in Week 17 which would be at the time of the NFL Draft, which was a huge boom this year due to the sheer amount of coverage on draftees like Shedeur Sanders and Cam Ward. All 4 major sports had some sort of action taking place during this week, which likely contributed to the major spike of 192 stories. This was compared to only 10 stories that week for Women’s Sports.
The peak for Women’s Sports in 2025 so far was in Week 29 which would fall right around the time of the WNBA All-Star Weekend which featured heavy discussion on Caitlin Clark who is a very popular subject in the world of sports. This spike led to 32 stories being written at the time, however Men’s Sports still had the advantage with 63 stories, which is nearly double the amount of Women’s Sports stories.
These results show us that the total amount of coverage for Men’s Sports is far more than that of Women’s Sports. Throughout the entire year, there has been a clear advanatage for the major men’s leagues while the women’s leagues have rarely even come close. This demonstrates a noticeable approach to how the AP and other major networks may view covering these sports overall.
There were an average of 67 Men’s Sports stories per week from January-September. Women’s Sports averaged just 11 stories per week with an average difference between the two of 56 stories. The pair differences failed a Shapiro-Wilk Normality Test, but since there 40 cases, a paired-sample t-test was still okay to use for finding the statistical significance.
Below this is a box plot of the weekly story measures followed by the aforementioned t-test.
| Paired-Samples t-Test | |||||
| statistic | parameter | p.value | conf.low | conf.high | method |
|---|---|---|---|---|---|
| 13.1865 | 39 | 0.0000 | 47.6006 | 64.8494 | Paired t-test |
| Group Means and SDs (t-Test) | |||
| V1_Mean | V2_Mean | V1_SD | V2_SD |
|---|---|---|---|
| 11.100 | 67.325 | 6.547 | 26.602 |
The t-test result supported the hypothesis that there would be a difference in coverage between Men’s Sports and Women’s Sports in the first 9 months of 2025, and it turned out to fairly easy to spot.
The code below generated the prior results displayed in the document.
# ============================================
# APNews text analysis (First-level agenda-setting theory version)
# ============================================
# ============================================
# --- Load required libraries ---
# ============================================
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("tidytext")) install.packages("tidytext")
library(tidyverse)
library(tidytext)
# ============================================
# --- Load the APNews data ---
# ============================================
# Read the data from the web
FetchedData <- readRDS(url("https://github.com/drkblake/Data/raw/refs/heads/main/APNews.rds"))
# Save the data on your computer
saveRDS(FetchedData, file = "APNews.rds")
# remove the downloaded data from the environment
rm (FetchedData)
APNews <- readRDS("APNews.rds")
# ============================================
# --- Flag Topic1-related stories ---
# ============================================
# --- Define Topic1 phrases ---
phrases <- c(
"NFL",
"NBA",
"NHL",
"MLB",
"MLS"
)
# --- Escape regex special characters ---
escaped_phrases <- str_replace_all(
phrases,
"([\\^$.|?*+()\\[\\]{}\\\\])",
"\\\\\\1"
)
# --- Build whole-word/phrase regex pattern ---
pattern <- paste0("\\b", escaped_phrases, "\\b", collapse = "|")
# --- Apply matching to flag Topic1 stories ---
APNews <- APNews %>%
mutate(
Full.Text.clean = str_squish(Full.Text), # normalize whitespace
Topic1 = if_else(
str_detect(Full.Text.clean, regex(pattern, ignore_case = TRUE)),
"Yes",
"No"
)
)
# ============================================
# --- Flag Topic2-related stories ---
# ============================================
# --- Define Topic2 phrases ---
phrases <- c(
"WNBA",
"College Softball",
"NWSL",
"PWHL"
)
# --- Escape regex special characters ---
escaped_phrases <- str_replace_all(
phrases,
"([\\^$.|?*+()\\[\\]{}\\\\])",
"\\\\\\1"
)
# --- Build whole-word/phrase regex pattern ---
pattern <- paste0("\\b", escaped_phrases, "\\b", collapse = "|")
# --- Apply matching to flag Topic2 stories ---
APNews <- APNews %>%
mutate(
Full.Text.clean = str_squish(Full.Text),
Topic2 = if_else(
str_detect(Full.Text.clean, regex(pattern, ignore_case = TRUE)),
"Yes",
"No"
)
)
# ============================================
# --- Visualize weekly counts of Topic1- and Topic2-related stories ---
# ============================================
# --- Load plotly if needed ---
if (!require("plotly")) install.packages("plotly")
library(plotly)
# --- Summarize weekly counts for Topic1 = "Yes" ---
Topic1_weekly <- APNews %>%
filter(Topic1 == "Yes") %>%
group_by(Week) %>%
summarize(Count = n(), .groups = "drop") %>%
mutate(Topic = "Men's Sports") # Note custom Topic1 label
# --- Summarize weekly counts for Topic2 = "Yes" ---
Topic2_weekly <- APNews %>%
filter(Topic2 == "Yes") %>%
group_by(Week) %>%
summarize(Count = n(), .groups = "drop") %>%
mutate(Topic = "Women's Sports") # Note custom Topic2 label
# --- Combine both summaries into one data frame ---
Weekly_counts <- bind_rows(Topic2_weekly, Topic1_weekly)
# --- Fill in missing combinations with zero counts ---
Weekly_counts <- Weekly_counts %>%
tidyr::complete(
Topic,
Week = full_seq(range(Week), 1), # generate all week numbers
fill = list(Count = 0)
) %>%
arrange(Topic, Week)
# --- Create interactive plotly line chart ---
AS1 <- plot_ly(
data = Weekly_counts,
x = ~Week,
y = ~Count,
color = ~Topic,
colors = c("steelblue", "firebrick"),
type = "scatter",
mode = "lines+markers",
line = list(width = 2),
marker = list(size = 6)
) %>%
layout(
title = "Weekly Counts of Topic1- and Topic2-Related AP News Articles",
xaxis = list(
title = "Week Number (starting with Week 1 of 2025)",
dtick = 1
),
yaxis = list(title = "Number of Articles"),
legend = list(title = list(text = "Topic")),
hovermode = "x unified"
)
# ============================================
# --- Show the chart ---
# ============================================
AS1
# ============================================================
# Setup: Install and Load Required Packages
# ============================================================
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("plotly")) install.packages("plotly")
if (!require("gt")) install.packages("gt")
if (!require("gtExtras")) install.packages("gtExtras")
if (!require("broom")) install.packages("broom")
library(tidyverse)
library(plotly)
library(gt)
library(gtExtras)
library(broom)
options(scipen = 999)
# ============================================================
# Data Import
# ============================================================
# Reshape to wide form
mydata <- Weekly_counts %>%
pivot_wider(names_from = Topic, values_from = Count)
names(mydata) <- make.names(names(mydata))
# Specify the two variables involved
mydata$V1 <- mydata$Women.s.Sports # <== Customize this
mydata$V2 <- mydata$Men.s.Sports # <== Customize this
# ============================================================
# Compute Pair Differences
# ============================================================
mydata$PairDifferences <- mydata$V2 - mydata$V1
# ============================================================
# Interactive Histogram of Pair Differences
# ============================================================
hist_plot <- plot_ly(
data = mydata,
x = ~PairDifferences,
type = "histogram",
marker = list(color = "#1f78b4", line = list(color = "black", width = 1))
) %>%
layout(
title = "Distribution of Pair Differences",
xaxis = list(title = "Pair Differences"),
yaxis = list(title = "Count"),
shapes = list(
list(
type = "line",
x0 = mean(mydata$PairDifferences, na.rm = TRUE),
x1 = mean(mydata$PairDifferences, na.rm = TRUE),
y0 = 0,
y1 = max(table(mydata$PairDifferences)),
line = list(color = "red", dash = "dash")
)
)
)
# ============================================================
# Descriptive Statistics
# ============================================================
desc_stats <- mydata %>%
summarise(
count = n(),
mean = mean(PairDifferences, na.rm = TRUE),
sd = sd(PairDifferences, na.rm = TRUE),
min = min(PairDifferences, na.rm = TRUE),
max = max(PairDifferences, na.rm = TRUE)
)
desc_table <- desc_stats %>%
gt() %>%
gt_theme_538() %>%
tab_header(title = "Descriptive Statistics: Pair Differences") %>%
fmt_number(columns = where(is.numeric), decimals = 3)
# ============================================================
# Normality Test (Shapiro-Wilk)
# ============================================================
shapiro_res <- shapiro.test(mydata$PairDifferences)
shapiro_table <- tidy(shapiro_res) %>%
select(statistic, p.value, method) %>%
gt() %>%
gt_theme_538() %>%
tab_header(title = "Normality Test (Shapiro-Wilk)") %>%
fmt_number(columns = c(statistic, p.value), decimals = 4) %>%
tab_source_note(
source_note = "If the P.VALUE is 0.05 or less, the number of pairs is fewer than 40, and the distribution of pair differences shows obvious non-normality or outliers, consider using the Wilcoxon Signed Rank Test results instead of the Paired-Samples t-Test results."
)
# ============================================================
# Reshape Data for Repeated-Measures Plot
# ============================================================
df_long <- mydata %>%
pivot_longer(cols = c(V1, V2),
names_to = "Measure",
values_to = "Value")
# ============================================================
# Repeated-Measures Boxplot (Interactive, with Means)
# ============================================================
group_means <- df_long %>%
group_by(Measure) %>%
summarise(mean_value = mean(Value), .groups = "drop")
boxplot_measures <- plot_ly() %>%
add_trace(
data = df_long,
x = ~Measure, y = ~Value,
type = "box",
boxpoints = "outliers",
marker = list(color = "red", size = 4),
line = list(color = "black"),
fillcolor = "royalblue",
name = ""
) %>%
add_trace(
data = group_means,
x = ~Measure, y = ~mean_value,
type = "scatter", mode = "markers",
marker = list(
symbol = "diamond", size = 9,
color = "black", line = list(color = "white", width = 1)
),
text = ~paste0("Mean = ", round(mean_value, 2)),
hoverinfo = "text",
name = "Group Mean"
) %>%
layout(
title = "Boxplot of Repeated Measures (V1 vs V2) with Means",
xaxis = list(title = "Measure"),
yaxis = list(title = "Value"),
showlegend = FALSE
)
# ============================================================
# Parametric Test (Paired-Samples t-Test)
# ============================================================
t_res <- t.test(mydata$V2, mydata$V1, paired = TRUE)
t_table <- tidy(t_res) %>%
select(statistic, parameter, p.value, conf.low, conf.high, method) %>%
gt() %>%
gt_theme_538() %>%
tab_header(title = "Paired-Samples t-Test") %>%
fmt_number(columns = c(statistic, p.value, conf.low, conf.high), decimals = 4)
t_summary <- mydata %>%
select(V1, V2) %>%
summarise_all(list(Mean = mean, SD = sd)) %>%
gt() %>%
gt_theme_538() %>%
tab_header(title = "Group Means and SDs (t-Test)") %>%
fmt_number(columns = everything(), decimals = 3)
# ============================================================
# Nonparametric Test (Wilcoxon Signed Rank)
# ============================================================
wilcox_res <- wilcox.test(mydata$V1, mydata$V2, paired = TRUE,
exact = FALSE)
wilcox_table <- tidy(wilcox_res) %>%
select(statistic, p.value, method) %>%
gt() %>%
gt_theme_538() %>%
tab_header(title = "Wilcoxon Signed Rank Test") %>%
fmt_number(columns = c(statistic, p.value), decimals = 4)
wilcox_summary <- mydata %>%
select(V1, V2) %>%
summarise_all(list(Mean = mean, SD = sd)) %>%
gt() %>%
gt_theme_538() %>%
tab_header(title = "Group Means and SDs (Wilcoxon)") %>%
fmt_number(columns = everything(), decimals = 3)
# ============================================================
# Results Summary (in specified order)
# ============================================================
hist_plot
desc_table
shapiro_table
boxplot_measures
t_table
t_summary
wilcox_table
wilcox_summary