NYT Archive API

Using the New York Times Archive API, we analyze the articles published in November 2020, the month of the 2020 U.S. presidential election, to gauge the election's weight in the news industry.

library(jsonlite)
library(tidyverse)
library(wordcloud)

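# Assumption: the API key is stored in an environment variable named NYT_API_KEY
key <- Sys.getenv("NYT_API_KEY")
url <- "https://api.nytimes.com/svc/archive/v1/2020/11.json?api-key="
data <- as.data.frame(fromJSON(paste0(url, key)))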

d <- data %>%
  select(Abstract = response.docs.abstract, Snippet = response.docs.snippet,
         Lead_Paragraph = response.docs.lead_paragraph, response.docs.type_of_material,
         Section = response.docs.section_name, Subsection = response.docs.subsection_name,
         Doc_Type = response.docs.document_type)
d["Headline"] <- data$response.docs.headline$main
n <- nrow(d)

What are the top 5 sections and subsections?

data.frame(table(d$Section)) %>% mutate(Proportion = Freq/n)  %>% arrange(desc(Proportion)) %>% head(5)

##           Var1 Freq Proportion
## 1         U.S. 1777 0.34747751
## 2        World  514 0.10050841
## 3      Opinion  353 0.06902620
## 4 Business Day  265 0.05181854
## 5         Arts  252 0.04927650

data.frame(table(d$Subsection)) %>% mutate(Proportion = Freq/n)  %>% arrange(desc(Proportion)) %>% head(5)

##          Var1 Freq Proportion
## 1   Elections  786 0.15369574
## 2    Politics  575 0.11243645
## 3      Europe  140 0.02737583
## 4 Book Review  113 0.02209621
## 5  Television   97 0.01896754

The top five sections are U.S., World, Opinion, Business Day, and Arts. The top five subsections are Elections, Politics, Europe, Book Review, and Television. Of all New York Times articles published that month, 34.7% appeared in the U.S. section.
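The proportion tables above come down to tabulating one column and dividing each count by the total number of rows. A minimal sketch of that computation in base R, run on an invented `sections` vector rather than the real data (the vector and the names `n_toy`, `section_counts`, and `top_sections` are assumptions for illustration):

```r
# Toy stand-in for d$Section; the values are invented for illustration.
sections <- c("U.S.", "U.S.", "U.S.", "World", "World", "Opinion", "Arts", NA)

# table() drops NA by default, but the denominator counts every row,
# mirroring Freq / n in the analysis above.
n_toy <- length(sections)
section_counts <- as.data.frame(table(sections), stringsAsFactors = FALSE)
section_counts$Proportion <- section_counts$Freq / n_toy

# Sort by descending proportion and keep at most five entries.
top_sections <- head(section_counts[order(-section_counts$Proportion), ], 5)
print(top_sections)
```

Because rows with a missing section still count toward the denominator, the proportions need not sum to 1.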

Election

Because we are interested in the 2020 election, we filter the data to rows whose section is one of the top three: U.S., World, or Opinion.

election <- d %>% filter(Section == "U.S." | Section == "World" | Section == "Opinion")
head(election,3)

##                                                                                                                                              Abstract
## 1 PHILADELPHIA — Add Debra Messing and Kathy Najimy to the thousands of canvassers spread out across Philadelphia for the Biden campaign on Saturday.
## 2                                                             Here are 20 counties in battleground states that are crucial for a White House victory.
## 3                                    SUN CITY, Ariz. — Don’t believe the polls. Don’t believe the media. And definitely do not believe the Democrats.
##                                                                                   Snippet
## 1                                                                                        
## 2 Here are 20 counties in battleground states that are crucial for a White House victory.
## 3                                                                                        
##                                                                                                                                        Lead_Paragraph
## 1 PHILADELPHIA — Add Debra Messing and Kathy Najimy to the thousands of canvassers spread out across Philadelphia for the Biden campaign on Saturday.
## 2                                                             Here are 20 counties in battleground states that are crucial for a White House victory.
## 3                                    SUN CITY, Ariz. — Don’t believe the polls. Don’t believe the media. And definitely do not believe the Democrats.
##   response.docs.type_of_material Section Subsection   Doc_Type
## 1                           News    U.S.  Elections    article
## 2            Interactive Feature    U.S.   Politics multimedia
## 3                           News    U.S.  Elections    article
##                                                                 Headline
## 1 Celebrities lend Biden a hand in turning out the vote in Philadelphia.
## 2                                 The Battlegrounds Within Battlegrounds
## 3   ‘They’re coming after our state,’ McSally warns Arizona Republicans.

The “Headlines” function takes a word and returns the number of times it appears in the “Headline” column of the “election” data frame. We use this function to get the frequency of each word in the word bank.

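Headlines <- function(x){
  # Lowercase every headline, split on spaces, and tabulate the resulting tokens
  counts <- data.frame(table(unlist(strsplit(tolower(election$Headline), " ")))) %>% arrange(desc(Freq))
  # Keep only the row whose token exactly matches the requested word
  counts[which(counts$Var1 == x),]
}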
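To make the matching rule concrete, here is a small sketch that applies the same tokenization to three invented headlines. It shows that only exact tokens are counted, so a word followed by punctuation is missed. The headlines and the names `exact_count` and `clean_count` are assumptions for illustration, and the punctuation-stripping step is an optional variation that the analysis here does not use.

```r
# Hypothetical headlines, invented for illustration.
toy_headlines <- c(
  "Election Results: Biden Wins",
  "What the election means for Trump",
  "Trump disputes the election."
)

# Same tokenization as Headlines(): lowercase, then split on single spaces.
tokens <- unlist(strsplit(tolower(toy_headlines), " "))

# Exact-token matching misses "election." because of the trailing period.
exact_count <- sum(tokens == "election")

# Optional variation: strip punctuation before matching.
clean_tokens <- gsub("[[:punct:]]", "", tokens)
clean_count <- sum(clean_tokens == "election")

print(c(exact = exact_count, cleaned = clean_count))
```

Stripping punctuation is a trade-off: it would also turn a token such as "covid-19" into "covid19", so the word bank would need to be adjusted to match.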

We create a word bank of keywords related to the 2020 election. These words are the targets of our analysis: we want to see how often each one appears in article headlines.

words <- c("biden", "trump", "harris", "election", "2020", "vote", "president", "vice", "coronavirus", "covid-19", "american", "democrat", "republican", "swing", "state", "votes", "results", "fraud", "battleground", "progressive", "ballot")

# Look up each word's headline count, keeping the word-bank order.
# sum() turns a missing word (zero matching rows) into a count of 0.
counts <- sapply(words, function(w) sum(Headlines(w)$Freq), USE.NAMES = FALSE)
results <- data.frame(Word = words, Count = counts)

results %>% arrange(desc(Count))

##            Word Count
## 1      election   745
## 2         trump   233
## 3         biden   205
## 4       results   170
## 5   coronavirus    65
## 6          vote    36
## 7         state    31
## 8    republican    28
## 9     president    24
## 10        votes    24
## 11         2020    20
## 12     covid-19    18
## 13        fraud    16
## 14       harris    13
## 15        swing     9
## 16       ballot     9
## 17     american     8
## 18     democrat     7
## 19         vice     5
## 20  progressive     3
## 21 battleground     2

From the analysis we can see that, across the headlines of the 2,644 articles in the U.S., World, and Opinion sections, the word “election” appeared 745 times, followed by “trump” (233) and “biden” (205).
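These counts are token occurrences, so a headline that uses a word twice contributes two to the total. To relate a word to the number of articles instead, we could count the headlines that contain it at least once. A minimal sketch on invented headlines (the data and the names `occurrences`, `headline_count`, and `headline_share` are assumptions for illustration):

```r
# Hypothetical headlines, invented for illustration.
toy_headlines <- c(
  "Election Day Is Here",
  "Biden Leads as Election Count Continues",
  "A Quiet Week in Europe",
  "Election by Election the Map Shifts"
)

# Split each headline into lowercase tokens, as in Headlines().
token_lists <- strsplit(tolower(toy_headlines), " ")

# TRUE when the word is one of the headline's tokens.
has_word <- vapply(token_lists, function(tokens) "election" %in% tokens, logical(1))

occurrences <- sum(unlist(token_lists) == "election")   # every token occurrence
headline_count <- sum(has_word)                          # headlines containing the word
headline_share <- headline_count / length(toy_headlines)

print(c(occurrences = occurrences, headlines = headline_count, share = headline_share))
```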

wordcloud(words = results$Word, freq = results$Count, max.words = 200, random.order = FALSE, rot.per = 0.35, colors = brewer.pal(8, "Dark2"))

Conclusion

Based on these results, we can say that the 2020 election was a major news event in November 2020. The word with the highest count was “election”, followed by the names of the two presidential candidates. As a continuation of this work, we would like to create a more neutral word bank and apply it to the articles published in the election month of the past few presidential elections.
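As a starting point for that follow-up, the request URL for each earlier election month can be built from the same pattern used at the top of this analysis. The sketch below is only an outline: `archive_url` is a hypothetical helper, the choice of years is an assumption, and the API key is assumed to live in an environment variable named `NYT_API_KEY`.

```r
# Hypothetical helper: build the Archive API URL for a given year and month.
# The path follows the URL used earlier in this analysis.
archive_url <- function(year, month, key) {
  sprintf("https://api.nytimes.com/svc/archive/v1/%d/%d.json?api-key=%s",
          year, month, key)
}

# Assumption: the key is stored in an environment variable named NYT_API_KEY.
key <- Sys.getenv("NYT_API_KEY")

# November of recent presidential election years (assumed selection).
election_years <- c(2008, 2012, 2016, 2020)
urls <- vapply(election_years, archive_url, character(1), month = 11, key = key)

# Each URL could then be fetched as before, for example:
# data_2016 <- as.data.frame(jsonlite::fromJSON(urls[3]))
```

Running the same word bank over each month's headlines would then allow a like-for-like comparison across elections.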
