Overview

Which person - Donald Trump or Joe Biden - got more coverage in U.S. online news content between Jan. 1 and Dec. 31 of 2022? You might want to know if you were a media researcher, a political strategist, or just a news junkie. This R script will find and graph an answer for you. Furthermore, you can tweak it to get the same information about any two public figures, for any recent range of dates. The coverage data come from the GDELT 2.0 API.

Installing required packages

First, install and load the tidyverse, plotly, and readr packages, which provide tools the script needs. The if (!require(...)) check means that R installs a package only if it is not already installed.

if (!require("tidyverse")) install.packages("tidyverse")
if (!require("plotly")) install.packages("plotly")
if (!require("readr")) install.packages("readr")
library(tidyverse)
library(plotly)
library(readr)

Defining the date range

Now, define the beginning and end dates for the time period you want to search. Use four digits for the year, two digits for the month, and two digits for the day. For example, use 20220101 for Jan. 1, 2022. If you edit the dates below, be sure to leave the quote marks around each date.

startdate <- "20220101"
enddate <- "20221231"
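
If you would rather type dates in the more familiar YYYY-MM-DD form, a small helper can do the conversion and catch typos such as a 13th month. This is only a sketch: to_gdelt_date() is a hypothetical name, not part of the original script or of any package.

```r
# Hypothetical helper (not in the original script): converts an ISO
# "YYYY-MM-DD" string to the YYYYMMDD text the script expects.
to_gdelt_date <- function(x) {
  d <- as.Date(x, format = "%Y-%m-%d")  # returns NA if the text is not a valid date
  if (any(is.na(d))) stop("Could not parse date: ", x)
  format(d, "%Y%m%d")
}

startdate <- to_gdelt_date("2022-01-01")
enddate <- to_gdelt_date("2022-12-31")

# Both strings are eight digits, so comparing them as text also compares the dates
stopifnot(nchar(startdate) == 8, nchar(enddate) == 8, startdate <= enddate)
```

The result is the same pair of quoted strings defined above, so the rest of the script can use them unchanged.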

Searching for “Trump” articles

This code searches for stories that mention “Donald Trump” and that were published by U.S.-based online news sources during the specified time frame. You can change 'Donald Trump' in the query to another search term, if you like. Most of the code constructs and encodes a URL that the GDELT 2.0 API will recognize and respond to. The rest correctly formats the data’s “Date” variable and stores the data in a data frame called “VolumeTrump.”

query <- "'Donald Trump' SourceCountry:US"
#Building the Volume dataframe
vp1 <- "https://api.gdeltproject.org/api/v2/doc/doc?query="
vp2 <- "&mode=timelinevolinfo&startdatetime="
vp3 <- "000000&enddatetime="
vp4 <- "000000&format=CSV"
text_v_url <- paste0(vp1, query, vp2, startdate, vp3, enddate, vp4)
v_url <- URLencode(text_v_url)
v_url

## [1] "https://api.gdeltproject.org/api/v2/doc/doc?query='Donald%20Trump'%20SourceCountry:US&mode=timelinevolinfo&startdatetime=20220101000000&enddatetime=20221231000000&format=CSV"

Volume <- read_csv(v_url)

## Rows: 365 Columns: 23
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (21): Series, TopArtURL1, TopArtTitle1, TopArtURL2, TopArtTitle2, TopAr...
## dbl   (1): Value
## date  (1): Date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Volume$Date <- as.Date(Volume$Date, "%Y-%m-%d")
VolumeTrump <- Volume

Searching for “Biden” articles

This is a ‘rinse and repeat’ of the code above. This time, though, it searches for stories that mention “Joe Biden” and stores the data in a data frame called “VolumeBiden.”

query <- "'Joe Biden' SourceCountry:US"
#Building the Volume dataframe
vp1 <- "https://api.gdeltproject.org/api/v2/doc/doc?query="
vp2 <- "&mode=timelinevolinfo&startdatetime="
vp3 <- "000000&enddatetime="
vp4 <- "000000&format=CSV"
text_v_url <- paste0(vp1, query, vp2, startdate, vp3, enddate, vp4)
v_url <- URLencode(text_v_url)
v_url

## [1] "https://api.gdeltproject.org/api/v2/doc/doc?query='Joe%20Biden'%20SourceCountry:US&mode=timelinevolinfo&startdatetime=20220101000000&enddatetime=20221231000000&format=CSV"

Volume <- read_csv(v_url)

## Rows: 365 Columns: 23
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (21): Series, TopArtURL1, TopArtTitle1, TopArtURL2, TopArtTitle2, TopAr...
## dbl   (1): Value
## date  (1): Date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Volume$Date <- as.Date(Volume$Date, "%Y-%m-%d")
VolumeBiden <- Volume
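
Because the Trump and Biden blocks differ only in the search term, the URL-building steps could be wrapped in a function. The sketch below is one way to do that, not the author's method. The name build_volume_url() is hypothetical, and the function simply reassembles the same pieces (vp1 through vp4) used above.

```r
# Hypothetical helper, not part of the original script: builds the same
# request URL as the two blocks above for any search term.
build_volume_url <- function(term, startdate, enddate) {
  query <- paste0("'", term, "' SourceCountry:US")
  URLencode(paste0(
    "https://api.gdeltproject.org/api/v2/doc/doc?query=", query,
    "&mode=timelinevolinfo&startdatetime=", startdate,
    "000000&enddatetime=", enddate,
    "000000&format=CSV"
  ))
}

biden_url <- build_volume_url("Joe Biden", "20220101", "20221231")
biden_url

# Fetching the data needs an internet connection, so it is left commented out:
# VolumeBiden <- readr::read_csv(biden_url)
```

Calling the function with "Donald Trump" instead would produce the URL for the first search, so each new public figure needs only one extra line.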

Merging and plotting the data

This block of code merges the two data frames and draws an interactive plot showing, by day, the percentage of all GDELT-monitored articles mentioning “Trump” and the percentage of all GDELT-monitored articles mentioning “Biden.”

## Merging
VolumeTrumpBiden <- merge(VolumeTrump, VolumeBiden, by = "Date")
VolumeTrumpBiden$TrumpVolume <- VolumeTrumpBiden$Value.x
VolumeTrumpBiden$BidenVolume <- VolumeTrumpBiden$Value.y

#Plotting volume by date
library(plotly)
fig <- plot_ly(VolumeTrumpBiden, x = ~Date, y = ~TrumpVolume,
               name = 'Trump', 
               type = 'scatter', 
               mode = 'lines+markers') 
fig <- fig %>% add_trace(y = ~BidenVolume, 
                         name = 'Biden', 
                         mode = 'lines+markers') 
fig
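
The Value.x and Value.y names in the merging code come from merge() itself: when both data frames have a column with the same name that is not used for matching, merge() adds the suffixes .x (first data frame) and .y (second data frame). The toy example below uses made-up numbers, not GDELT data, to show this, and also shows that only dates present in both data frames are kept.

```r
# Toy data, invented for illustration only
a <- data.frame(Date = as.Date(c("2022-01-01", "2022-01-02", "2022-01-03")),
                Value = c(1.0, 2.0, 3.0))
b <- data.frame(Date = as.Date(c("2022-01-02", "2022-01-03", "2022-01-04")),
                Value = c(0.5, 0.7, 0.9))

# Rows are matched on Date; the clashing Value columns get .x and .y suffixes
both <- merge(a, b, by = "Date")
names(both)  # "Date" "Value.x" "Value.y"
nrow(both)   # 2, because only Jan. 2 and Jan. 3 appear in both data frames
```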

Saving data to a local .csv file

This code will export the merged data in comma-separated value format and save the exported file in R's current working directory, which is typically the same directory as the script.

write_csv(VolumeTrumpBiden,"VolumeTrumpBiden.csv")
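
If you are unsure where the file ended up, getwd() reports the current working directory. You can also write to an explicit path. The sketch below is an illustration only: it uses a temporary folder, made-up numbers, and base R's write.csv() so that it runs without extra packages. write_csv() accepts a full path in the same way.

```r
# Build an explicit path instead of relying on the working directory
out_file <- file.path(tempdir(), "VolumeTrumpBiden.csv")

# Made-up numbers standing in for the merged data frame
toy <- data.frame(Date = as.Date(c("2022-01-01", "2022-01-02")),
                  TrumpVolume = c(0.8, 0.9),
                  BidenVolume = c(1.0, 1.1))

write.csv(toy, out_file, row.names = FALSE)

# Confirm the file exists, then read it back to check its contents
file.exists(out_file)
check <- read.csv(out_file)
names(check)
```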

Performing a paired-samples t-test

Finally, here’s paired-samples t-test code for evaluating the null hypothesis that daily coverage of Biden equaled daily coverage of Trump during the specified time period. The code first draws a histogram of each series, with a vertical line at the mean, and then runs the test. In the code and output, V1 represents Biden’s volume, and V2 represents Trump’s volume.

mydata <- VolumeTrumpBiden
mydata$V1 <- mydata$BidenVolume
mydata$V2 <- mydata$TrumpVolume
ggplot(mydata, aes(x = V1))+geom_histogram()+
  geom_vline(xintercept = mean(mydata$V1))

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(mydata, aes(x = V2))+geom_histogram()+
  geom_vline(xintercept = mean(mydata$V2))

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

mean.V1 <- mean(mydata$V1, na.rm = TRUE)
mean.V2 <- mean(mydata$V2, na.rm = TRUE)
mean.V1

## [1] 1.057778

mean.V2

## [1] 0.8064332

options(scipen = 999)
t.test(mydata$V1, mydata$V2,
       paired = TRUE)

## 
##  Paired t-test
## 
## data:  mydata$V1 and mydata$V2
## t = 14.898, df = 364, p-value < 0.00000000000000022
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  0.2181688 0.2845205
## sample estimates:
## mean difference 
##       0.2513447
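
As a rough check on what paired = TRUE does, the sketch below uses invented toy numbers (not the GDELT results above) to show that a paired t-test is equivalent to a one-sample t-test on the day-by-day differences. The t statistic is the mean difference divided by its standard error.

```r
# Toy values, invented for illustration only
v1 <- c(1.2, 0.9, 1.1, 1.4, 1.0)
v2 <- c(0.8, 0.7, 1.0, 0.9, 0.6)

# Day-by-day differences, and the t statistic computed by hand
d <- v1 - v2
t_manual <- mean(d) / (sd(d) / sqrt(length(d)))

paired <- t.test(v1, v2, paired = TRUE)
one_sample <- t.test(d, mu = 0)

# Both comparisons return TRUE
all.equal(unname(paired$statistic), t_manual)
all.equal(paired$p.value, one_sample$p.value)
```

This is why the test needs the two series to line up by date: each Biden value is compared with the Trump value from the same day.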

Full script

Just want the script? Here it is, all in one piece, ready to copy and paste.

#Installing and loading the required packages
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("plotly")) install.packages("plotly")
if (!require("readr")) install.packages("readr")
library(tidyverse)
library(plotly)
library(readr)

### Date range
startdate <- "20220101"
enddate <- "20221231"

### Trump
query <- "'Donald Trump' SourceCountry:US"
#Building the Volume dataframe
vp1 <- "https://api.gdeltproject.org/api/v2/doc/doc?query="
vp2 <- "&mode=timelinevolinfo&startdatetime="
vp3 <- "000000&enddatetime="
vp4 <- "000000&format=CSV"
text_v_url <- paste0(vp1, query, vp2, startdate, vp3, enddate, vp4)
v_url <- URLencode(text_v_url)
v_url
Volume <- read_csv(v_url)
Volume$Date <- as.Date(Volume$Date, "%Y-%m-%d")
VolumeTrump <- Volume

### Biden
query <- "'Joe Biden' SourceCountry:US"
#Building the Volume dataframe
vp1 <- "https://api.gdeltproject.org/api/v2/doc/doc?query="
vp2 <- "&mode=timelinevolinfo&startdatetime="
vp3 <- "000000&enddatetime="
vp4 <- "000000&format=CSV"
text_v_url <- paste0(vp1, query, vp2, startdate, vp3, enddate, vp4)
v_url <- URLencode(text_v_url)
v_url
Volume <- read_csv(v_url)
Volume$Date <- as.Date(Volume$Date, "%Y-%m-%d")
VolumeBiden <- Volume

### Merging
VolumeTrumpBiden <- merge(VolumeTrump, VolumeBiden, by = "Date")
VolumeTrumpBiden$TrumpVolume <- VolumeTrumpBiden$Value.x
VolumeTrumpBiden$BidenVolume <- VolumeTrumpBiden$Value.y

#Plotting volume by date
library(plotly)
fig <- plot_ly(VolumeTrumpBiden, x = ~Date, y = ~TrumpVolume,
               name = 'Trump', 
               type = 'scatter', 
               mode = 'lines+markers') 
fig <- fig %>% add_trace(y = ~BidenVolume, 
                         name = 'Biden', 
                         mode = 'lines+markers') 
fig

### Saving the data to a local .csv file
write_csv(VolumeTrumpBiden,"VolumeTrumpBiden.csv")

### Running a paired-samples t-test
mydata <- VolumeTrumpBiden
mydata$V1 <- mydata$BidenVolume
mydata$V2 <- mydata$TrumpVolume
ggplot(mydata, aes(x = V1))+geom_histogram()+
  geom_vline(xintercept = mean(mydata$V1))
ggplot(mydata, aes(x = V2))+geom_histogram()+
  geom_vline(xintercept = mean(mydata$V2))
mean.V1 <- mean(mydata$V1, na.rm = TRUE)
mean.V2 <- mean(mydata$V2, na.rm = TRUE)
mean.V1
mean.V2
options(scipen = 999)
t.test(mydata$V1, mydata$V2,
       paired = TRUE)

Biden v Trump coverage script

Ken Blake

2023-02-02