Rationale:

The Los Angeles Dodgers repeatedly made headlines this year with a strong season, followed by an exciting postseason run that ended with the team winning the World Series. Two of their stars, two-way player Shohei Ohtani and Mookie Betts, also made news in their own right this season, sometimes for their play on the field and sometimes for off-field matters that were often out of their control. For Ohtani, it was the sentencing of his former interpreter for stealing millions of dollars from him to pay off sports gambling debts. Betts, meanwhile, was sidelined for part of the season by what was first described as a “mystery illness” and a stomach bug and was eventually diagnosed as norovirus, which caused him to lose a noticeable amount of weight. Both situations unfolded early in the year, around the start of the baseball season, as both players tried to carry on with their careers in Los Angeles.

First-level agenda-setting theory would suggest that, because newsrooms have limited resources for creating and publishing news, media attention is likely to focus on one of these players and his story at a time rather than on both at once, so that when media interest in one rises, interest in the other dips.

In this study, based on first-level agenda-setting theory, I will compare the media coverage of two Dodgers players: Shohei Ohtani and Mookie Betts.

Specifically, I will compare the number of APNews.com stories mentioning each player, week by week, between Jan. 1 and Sept. 30, 2025.

Based on first-level agenda-setting theory, I anticipate that although Ohtani and Betts are teammates rather than rivals on the field, they are in effect “competing” for coverage, whether or not they are trying to. In other words, when one player is making headlines, the other likely will not be, or will be to a much lesser extent. In this view, news outlets have room for only so much baseball coverage at any one time.


Hypothesis:

Weekly APNews.com coverage of Los Angeles Dodgers players Shohei Ohtani and Mookie Betts will fluctuate inversely during the first nine months of 2025: when coverage of one player rises, coverage of the other will fall.


Variables & Method:

The dependent variable in my analysis is the amount of coverage: the number of APNews.com stories mentioning each of the two Dodgers players in each week from Jan. 1 through Sept. 30, 2025. This is a count, which the analysis treats as a continuous measure. The independent variable is the player, Ohtani or Betts. I would eventually like to break down each player's coverage further to determine whether he drew more attention for what happened on the field or off it, but this preliminary analysis compares only the players' overall coverage. For that reason, I used just one topic phrase for each player: his full name, “Shohei Ohtani” or “Mookie Betts.” A story counts toward a player's weekly total if that phrase appears anywhere in its text, regardless of capitalization, so a story that names a player several times still counts only once.
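
The matching step can be illustrated with a minimal base R sketch. It is a simplified stand-in, not the study's script (that script, which uses the tidyverse and the APNews.rds data, appears in the Code section), and the three sample stories and their week numbers are hypothetical. The helper names mentions() and weekly_story_counts() are made up for this illustration. The sketch shows the story-level flag and the zero-filling of weeks in which no story names the player.

```r
# Hypothetical sample stories for illustration only (not APNews data)
stories <- data.frame(
  Week = c(1, 2, 3),
  Full.Text = c(
    "Shohei Ohtani homered twice. Later, SHOHEI OHTANI also pitched.",
    "Mookie Betts rejoined the lineup.",
    "The Dodgers beat the Cubs as Shohei Ohtani and Mookie Betts starred."
  ),
  stringsAsFactors = FALSE
)

# TRUE if the full name appears as a whole phrase, ignoring case.
# Neither name contains regex special characters, so no escaping is done here.
mentions <- function(text, name) {
  pattern <- paste0("\\b", name, "\\b")
  grepl(pattern, text, ignore.case = TRUE, perl = TRUE)
}

# Number of flagged stories in each week; weeks with none get 0
weekly_story_counts <- function(df, name, weeks) {
  flagged_weeks <- df$Week[mentions(df$Full.Text, name)]
  as.integer(table(factor(flagged_weeks, levels = weeks)))
}

all_weeks <- 1:3
ohtani <- weekly_story_counts(stories, "Shohei Ohtani", all_weeks)
betts  <- weekly_story_counts(stories, "Mookie Betts", all_weeks)
print(ohtani)  # [1] 1 0 1  (story 1 counts once despite two mentions)
print(betts)   # [1] 0 1 1
```

Building a factor with every week as a level plays the same role as tidyr::complete() in the full script: weeks without a matching story appear as zeros instead of disappearing from the series.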


Results & Discussion:

The accompanying graph lays out the findings: Shohei Ohtani got more APNews.com coverage than Mookie Betts in nearly every week. There were few stories about either player in the first weeks of the year, before the baseball season began, and what coverage there was concerned off-field news. In week 10, Ohtani's coverage spiked as the Dodgers traveled to Japan for what was dubbed the “Tokyo Series,” the two regular-season opening games between the Dodgers and the Chicago Cubs. A bump in coverage is unsurprising for a Japanese-born star playing in his home country. Betts' coverage likewise spiked after he got sick ahead of the trip to Japan and was then sidelined by his illness after returning to the U.S. Beyond those early spikes, however, both men also saw large jumps in coverage when they played well on the field. Betts had his highest weekly number of stories in week 14, while he was sick, and his second-highest in week 37, after a hot streak that included a grand slam. Ohtani's coverage jumped when he pitched strong games and also during a couple of good hitting streaks, a combination that is rare for a pitcher.

According to the analysis, Ohtani appeared in an average of just over five (5.333) APNews.com stories a week during the period studied, while Betts appeared in slightly fewer than one (0.974) a week.
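
Because each average is a story total divided by the number of weeks, the two averages can also be read as approximate totals. The short R sketch below assumes 39 weeks (the number of weekly pairs in the analysis) and that the reported means were rounded to three decimals; under those assumptions, they imply about 208 stories mentioning Ohtani and about 38 mentioning Betts over the period. These totals are derived from the reported averages, not from a separate count.

```r
# Reported averages and the number of weeks (pairs) in the analysis
n_weeks     <- 39
mean_ohtani <- 5.333
mean_betts  <- 0.974

# Implied totals, assuming the means were rounded to three decimals
total_ohtani <- round(mean_ohtani * n_weeks)  # 207.987 rounds to 208
total_betts  <- round(mean_betts * n_weeks)   # 37.986 rounds to 38

# For paired weekly data, the difference in means equals the mean of the
# weekly differences (Ohtani minus Betts)
mean_diff <- mean_ohtani - mean_betts

cat("Implied totals - Ohtani:", total_ohtani, " Betts:", total_betts, "\n")
cat("Mean weekly difference:", mean_diff, "\n")
```

The mean weekly difference of 4.359 stories is the same figure reported as the mean of the pair differences in the descriptive statistics.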

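Coverage of any single baseball player would be expected to ebb and flow over a season. Even so, the chart appears consistent with first-level agenda-setting theory: when coverage of one player on APNews.com rose, coverage of the other tended to drop, and vice versa. The statistical tests reported below, however, compare the two players' average weekly coverage; they do not directly test whether the two players' coverage moved in opposite directions.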
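
One way to check the week-to-week pattern directly, offered as a suggestion rather than as part of the original analysis, is to correlate the week-to-week changes in the two series. A negative correlation would mean that weeks in which one player's coverage rose tended to be weeks in which the other's fell. The sketch uses base R's cor() with Spearman's rank method; the two short series are hypothetical numbers chosen only to show the mechanics.

```r
# Hypothetical weekly story counts (illustration only, not the study data)
ohtani <- c(2, 6, 9, 4, 7, 3)
betts  <- c(1, 0, 0, 2, 0, 2)

# Week-to-week changes in each series
d_ohtani <- diff(ohtani)  # 4, 3, -5, 3, -4
d_betts  <- diff(betts)   # -1, 0, 2, -2, 2

# Spearman correlation of the changes; values near -1 mean that rises in
# one player's coverage coincide with drops in the other's
rho <- cor(d_ohtani, d_betts, method = "spearman")
print(rho)  # [1] -0.7894737
```

With the study data, the same lines could take mydata$V2 and mydata$V1 (the Ohtani and Betts columns built in the Code section), and cor.test(d_ohtani, d_betts, method = "spearman", exact = FALSE) would add a p-value. Correlating changes rather than raw counts keeps the shared seasonal pattern (little coverage of either player before the season) from masking or mimicking an inverse relationship.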

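The Shapiro-Wilk normality test on the weekly pair differences returned a p-value of 0.0509, just above the .05 threshold, so normality is not rejected, though only narrowly. Because the number of pairs (39) is also just under 40, I report the Wilcoxon signed rank test alongside the paired-samples t-test. The two tests agree: the t-test (t = 9.37, df = 38, p < .001) and the Wilcoxon test (p < .001) both indicate that Ohtani received significantly more weekly coverage on APNews.com than Betts, by an average of 4.359 stories per week (95% confidence interval: 3.417 to 5.301).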

Descriptive Statistics: Pair Differences (Shohei Ohtani minus Mookie Betts, per week)
count     mean     sd      min      max
39.000    4.359    2.906   −1.000   11.000

Normality Test (Shapiro-Wilk)
statistic   p.value   method
0.9438      0.0509    Shapiro-Wilk normality test
If the P.VALUE is 0.05 or less, the number of pairs is fewer than 40, and the distribution of pair differences shows obvious non-normality or outliers, consider using the Wilcoxon Signed Rank Test results instead of the Paired-Samples t-Test results.

Paired-Samples t-Test
statistic   parameter   p.value   conf.low   conf.high   method
9.3664      38          0.0000    3.4169     5.3011      Paired t-test

Group Means and SDs (t-Test); V1 = Mookie Betts, V2 = Shohei Ohtani
V1_Mean   V2_Mean   V1_SD   V2_SD
0.974     5.333     1.088   2.941

Wilcoxon Signed Rank Test
statistic   p.value   method
2.0000      0.0000    Wilcoxon signed rank test with continuity correction

Group Means and SDs (Wilcoxon); V1 = Mookie Betts, V2 = Shohei Ohtani
V1_Mean   V2_Mean   V1_SD   V2_SD
0.974     5.333     1.088   2.941
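
As a check on the tables above, the paired t statistic and its 95% confidence interval can be reproduced by hand from the descriptive statistics of the pair differences, using t = mean / (sd / sqrt(n)) with n - 1 degrees of freedom. The inputs are the rounded values from the Descriptive Statistics table, so this sketch matches the t-test table only to within rounding (about three decimal places).

```r
# Pair-difference summary (Ohtani minus Betts) from the Descriptive Statistics table
n      <- 39
mean_d <- 4.359
sd_d   <- 2.906

se     <- sd_d / sqrt(n)              # standard error of the mean difference
t_stat <- mean_d / se                 # paired t statistic
dof    <- n - 1                       # degrees of freedom
margin <- qt(0.975, dof) * se         # half-width of the 95% confidence interval
ci     <- c(mean_d - margin, mean_d + margin)
p_val  <- 2 * pt(-abs(t_stat), dof)   # two-sided p-value

print(round(c(t = t_stat, df = dof, conf.low = ci[1], conf.high = ci[2]), 3))
print(p_val < 0.001)
```

This also shows why the p-value prints as 0.0000: with t above 9 on 38 degrees of freedom, the two-sided p-value is far below .001 but not literally zero.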

Code:

Here is the R code used for this analysis and to produce the accompanying graph.

# ============================================
# APNews text analysis (First-level agenda-setting theory version)
# ============================================

# ============================================
# --- Load required libraries ---
# ============================================

if (!require("tidyverse")) install.packages("tidyverse")
if (!require("tidytext")) install.packages("tidytext")

library(tidyverse)
library(tidytext)

# ============================================
# --- Load the APNews data ---
# ============================================

# Read the data from the web
FetchedData <- readRDS(url("https://github.com/drkblake/Data/raw/refs/heads/main/APNews.rds"))
# Save the data on your computer
saveRDS(FetchedData, file = "APNews.rds")
# Remove the downloaded data from the environment
rm(FetchedData)

APNews <- readRDS("APNews.rds")

# ============================================
# --- Flag Topic1-related stories ---
# ============================================

# --- Define Topic1 phrases ---
phrases <- c(
  "Shohei Ohtani"
)

# --- Escape regex special characters ---
escaped_phrases <- str_replace_all(
  phrases,
  "([\\^$.|?*+()\\[\\]{}\\\\])",
  "\\\\\\1"
)

# --- Build whole-word/phrase regex pattern ---
pattern <- paste0("\\b", escaped_phrases, "\\b", collapse = "|")

# --- Apply matching to flag Topic1 stories ---
APNews <- APNews %>%
  mutate(
    Full.Text.clean = str_squish(Full.Text),  # normalize whitespace
    Topic1 = if_else(
      str_detect(Full.Text.clean, regex(pattern, ignore_case = TRUE)),
      "Yes",
      "No"
    )
  )

# ============================================
# --- Flag Topic2-related stories ---
# ============================================

# --- Define Topic2 phrases ---
phrases <- c(
  "Mookie Betts"
)

# --- Escape regex special characters ---
escaped_phrases <- str_replace_all(
  phrases,
  "([\\^$.|?*+()\\[\\]{}\\\\])",
  "\\\\\\1"
)

# --- Build whole-word/phrase regex pattern ---
pattern <- paste0("\\b", escaped_phrases, "\\b", collapse = "|")

# --- Apply matching to flag Topic2 stories ---
APNews <- APNews %>%
  mutate(
    Full.Text.clean = str_squish(Full.Text),
    Topic2 = if_else(
      str_detect(Full.Text.clean, regex(pattern, ignore_case = TRUE)),
      "Yes",
      "No"
    )
  )

# ============================================
# --- Visualize weekly counts of Topic1- and Topic2-related stories ---
# ============================================

# --- Load plotly if needed ---
if (!require("plotly")) install.packages("plotly")
library(plotly)

# --- Summarize weekly counts for Topic1 = "Yes" ---
Topic1_weekly <- APNews %>%
  filter(Topic1 == "Yes") %>%
  group_by(Week) %>%
  summarize(Count = n(), .groups = "drop") %>%
  mutate(Topic = "Shohei Ohtani") # Note custom Topic1 label

# --- Summarize weekly counts for Topic2 = "Yes" ---
Topic2_weekly <- APNews %>%
  filter(Topic2 == "Yes") %>%
  group_by(Week) %>%
  summarize(Count = n(), .groups = "drop") %>%
  mutate(Topic = "Mookie Betts") # Note custom Topic2 label

# --- Combine both summaries into one data frame ---
Weekly_counts <- bind_rows(Topic2_weekly, Topic1_weekly)

# --- Fill in missing combinations with zero counts ---
Weekly_counts <- Weekly_counts %>%
  tidyr::complete(
    Topic,
    Week = full_seq(range(Week), 1),  # generate all week numbers
    fill = list(Count = 0)
  ) %>%
  arrange(Topic, Week)

# --- Create interactive plotly line chart ---
AS1 <- plot_ly(
  data = Weekly_counts,
  x = ~Week,
  y = ~Count,
  color = ~Topic,
  colors = c("steelblue", "firebrick"),
  type = "scatter",
  mode = "lines+markers",
  line = list(width = 2),
  marker = list(size = 6)
) %>%
  layout(
    title = "Weekly Counts of Topic1- and Topic2-Related AP News Articles",
    xaxis = list(
      title = "Week Number (starting with Week 1 of 2025)",
      dtick = 1
    ),
    yaxis = list(title = "Number of Articles"),
    legend = list(title = list(text = "Topic")),
    hovermode = "x unified"
  )

# ============================================
# --- Show the chart ---
# ============================================

AS1

# ============================================================
#  Setup: Install and Load Required Packages
# ============================================================
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("plotly")) install.packages("plotly")
if (!require("gt")) install.packages("gt")
if (!require("gtExtras")) install.packages("gtExtras")
if (!require("broom")) install.packages("broom")

library(tidyverse)
library(plotly)
library(gt)
library(gtExtras)
library(broom)

options(scipen = 999)

# ============================================================
#  Data Preparation (uses Weekly_counts from the step above)
# ============================================================
# Reshape to wide form

mydata <- Weekly_counts %>%
  pivot_wider(names_from = Topic, values_from = Count)
names(mydata) <- make.names(names(mydata))

# Specify the two variables involved
mydata$V1 <- mydata$Mookie.Betts # <== Customize this
mydata$V2 <- mydata$Shohei.Ohtani # <== Customize this

# ============================================================
#  Compute Pair Differences
# ============================================================
mydata$PairDifferences <- mydata$V2 - mydata$V1

# ============================================================
#  Interactive Histogram of Pair Differences
# ============================================================
hist_plot <- plot_ly(
  data = mydata,
  x = ~PairDifferences,
  type = "histogram",
  marker = list(color = "#1f78b4", line = list(color = "black", width = 1))
) %>%
  layout(
    title = "Distribution of Pair Differences",
    xaxis = list(title = "Pair Differences"),
    yaxis = list(title = "Count"),
    shapes = list(
      list(
        type = "line",
        x0 = mean(mydata$PairDifferences, na.rm = TRUE),
        x1 = mean(mydata$PairDifferences, na.rm = TRUE),
        y0 = 0,
        y1 = max(table(mydata$PairDifferences)),
        line = list(color = "red", dash = "dash")
      )
    )
  )

# ============================================================
#  Descriptive Statistics
# ============================================================
desc_stats <- mydata %>%
  summarise(
    count = n(),
    mean = mean(PairDifferences, na.rm = TRUE),
    sd = sd(PairDifferences, na.rm = TRUE),
    min = min(PairDifferences, na.rm = TRUE),
    max = max(PairDifferences, na.rm = TRUE)
  )

desc_table <- desc_stats %>%
  gt() %>%
  gt_theme_538() %>%
  tab_header(title = "Descriptive Statistics: Pair Differences") %>%
  fmt_number(columns = where(is.numeric), decimals = 3)

# ============================================================
#  Normality Test (Shapiro-Wilk)
# ============================================================
shapiro_res <- shapiro.test(mydata$PairDifferences)
shapiro_table <- tidy(shapiro_res) %>%
  select(statistic, p.value, method) %>%
  gt() %>%
  gt_theme_538() %>%
  tab_header(title = "Normality Test (Shapiro-Wilk)") %>%
  fmt_number(columns = c(statistic, p.value), decimals = 4) %>%
  tab_source_note(
    source_note = "If the P.VALUE is 0.05 or less, the number of pairs is fewer than 40, and the distribution of pair differences shows obvious non-normality or outliers, consider using the Wilcoxon Signed Rank Test results instead of the Paired-Samples t-Test results."
  )

# ============================================================
#  Reshape Data for Repeated-Measures Plot
# ============================================================
df_long <- mydata %>%
  pivot_longer(cols = c(V1, V2),
               names_to = "Measure",
               values_to = "Value")

# ============================================================
#  Repeated-Measures Boxplot (Interactive, with Means)
# ============================================================
group_means <- df_long %>%
  group_by(Measure) %>%
  summarise(mean_value = mean(Value), .groups = "drop")

boxplot_measures <- plot_ly() %>%
  add_trace(
    data = df_long,
    x = ~Measure, y = ~Value,
    type = "box",
    boxpoints = "outliers",   
    marker = list(color = "red", size = 4),
    line = list(color = "black"),
    fillcolor = "royalblue",
    name = ""
  ) %>%
  add_trace(
    data = group_means,
    x = ~Measure, y = ~mean_value,
    type = "scatter", mode = "markers",
    marker = list(
      symbol = "diamond", size = 9,
      color = "black", line = list(color = "white", width = 1)
    ),
    text = ~paste0("Mean = ", round(mean_value, 2)),
    hoverinfo = "text",
    name = "Group Mean"
  ) %>%
  layout(
    title = "Boxplot of Repeated Measures (V1 vs V2) with Means",
    xaxis = list(title = "Measure"),
    yaxis = list(title = "Value"),
    showlegend = FALSE
  )

# ============================================================
#  Parametric Test (Paired-Samples t-Test)
# ============================================================
t_res <- t.test(mydata$V2, mydata$V1, paired = TRUE)
t_table <- tidy(t_res) %>%
  select(statistic, parameter, p.value, conf.low, conf.high, method) %>%
  gt() %>%
  gt_theme_538() %>%
  tab_header(title = "Paired-Samples t-Test") %>%
  fmt_number(columns = c(statistic, p.value, conf.low, conf.high), decimals = 4)

t_summary <- mydata %>%
  select(V1, V2) %>%
  summarise_all(list(Mean = mean, SD = sd)) %>%
  gt() %>%
  gt_theme_538() %>%
  tab_header(title = "Group Means and SDs (t-Test)") %>%
  fmt_number(columns = everything(), decimals = 3)

# ============================================================
#  Nonparametric Test (Wilcoxon Signed Rank)
# ============================================================
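# Note: the argument order (V1, V2) is the reverse of the t-test above, so the
# signed ranks are computed on Betts minus Ohtani. The two-sided p-value is
# unaffected, but the statistic (V) is the sum of ranks for weeks in which
# Betts had more stories than Ohtani, which is why it is so small.
wilcox_res <- wilcox.test(mydata$V1, mydata$V2, paired = TRUE,
                          exact = FALSE)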
wilcox_table <- tidy(wilcox_res) %>%
  select(statistic, p.value, method) %>%
  gt() %>%
  gt_theme_538() %>%
  tab_header(title = "Wilcoxon Signed Rank Test") %>%
  fmt_number(columns = c(statistic, p.value), decimals = 4)

wilcox_summary <- mydata %>%
  select(V1, V2) %>%
  summarise_all(list(Mean = mean, SD = sd)) %>%
  gt() %>%
  gt_theme_538() %>%
  tab_header(title = "Group Means and SDs (Wilcoxon)") %>%
  fmt_number(columns = everything(), decimals = 3)

# ============================================================
#  Results Summary (in specified order)
# ============================================================
hist_plot
desc_table
shapiro_table
boxplot_measures
t_table
t_summary
wilcox_table
wilcox_summary