Data 607: Assignment 9

Introduction

This week’s assignment focuses on APIs. I chose one of the New York Times APIs, specifically the Top Stories API for the US section. After signing up for an API key, I used httr to request the data and jsonlite to parse the JSON response, then transformed the result into an R dataframe.
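One way to keep the key out of the script is to read it from an environment variable and to check the HTTP status before parsing. The sketch below is only an illustration under assumptions: the helper names `build_top_stories_url` and `fetch_top_stories_text`, the environment variable name `NYT_API_KEY`, and the idea of swapping `us` for another section are not part of the original code.

```r
# Hypothetical helpers; NYT_API_KEY is an assumed environment variable name.

# Build the request URL, following the same pattern as the us.json URL in this report.
build_top_stories_url <- function(section, api_key) {
  paste0("https://api.nytimes.com/svc/topstories/v2/", section,
         ".json?api-key=", api_key)
}

# Request the stories and return the raw JSON text.
fetch_top_stories_text <- function(section = "us",
                                   api_key = Sys.getenv("NYT_API_KEY")) {
  if (!nzchar(api_key)) {
    stop("No API key found; set the NYT_API_KEY environment variable first.")
  }
  response <- httr::GET(build_top_stories_url(section, api_key))
  httr::stop_for_status(response)  # turn a 4xx/5xx response into an R error
  httr::content(response, as = "text", encoding = "UTF-8")
}

# Offline check of the URL builder with a placeholder key
build_top_stories_url("us", "DEMO_KEY")
```

Calling `fetch_top_stories_text()` needs a real key and network access, so only the URL builder is exercised here.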

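# Removed API key from code; api_key must be defined before this runs

library(tidyverse)  # %>%, drop_na(), ggplot(), str_subset()
library(httr)       # GET(), content()
library(jsonlite)   # fromJSON()

# Build the request URL for the US top stories endpoint
top_stories_url <- paste0("https://api.nytimes.com/svc/topstories/v2/us.json?api-key=", api_key)

# Send the request and read the response body as JSON text
top_stories <- GET(top_stories_url)

stories_text <- content(top_stories, as = "text", encoding = "UTF-8")

# Parse the JSON and flatten it into a dataframe
stories_df <- fromJSON(stories_text) %>% as.data.frame()

stories_df$results.title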

##  [1] "How Trump Connected With So Many Americans"                                            
##  [2] "Resist or Retreat? Democratic Voters Are Torn About Whether to Keep Fighting."         
##  [3] "Plea Deals for Accused 9/11 Plotters Are Valid, Judge Rules"                           
##  [4] "Vindman Wins Virginia House Race, Keeping a Key Seat in Democratic Hands"              
##  [5] "Will a Woman Ever Be President?"                                                       
##  [6] "Grant Ujifusa, 82, Dies; Lobbied for Redress for Japanese Americans"                   
##  [7] "7 Salisbury U. Students Beat Person Because of Sexual Orientation, Police Say"         
##  [8] "Early Results Suggest the Polls Were Notably Accurate"                                 
##  [9] "How Harris’s Loss Could Haunt Biden’s Legacy"                                          
## [10] "Democrats Again Banked on the ‘Blue Wall.’ It Crumbled."                               
## [11] "Harris Says She Concedes the Election, but Not Her Fight"                              
## [12] "Robert F. Kennedy Jr., Foe of Drug Makers and Regulators, Is Poised to Wield New Power"
## [13] "Elon Musk Helped Elect Trump. What Does He Expect in Return?"                          
## [14] "Voters Reject Efforts to Loosen Drug Laws in Several States"                           
## [15] "Harris Asked Voters to Protect Democracy. Here’s Why It Didn’t Land."                  
## [16] "Zelensky Urges Trump to Help Defend Ukraine Against Russia"                            
## [17] "An Emboldened G.O.P. Senate Majority Is Ready to Empower Trump"                        
## [18] "Why Does It Take So Long to Call the House?"                                           
## [19] "Jack Smith Assesses How to Wind Down Trump’s Federal Cases, Official Says"             
## [20] "Voters Were Fed Up Over Immigration. They Voted for Trump."                            
## [21] "Top Republicans Hail Trump’s Victory, and Even G.O.P. Critics Congratulate Him"        
## [22] "Slotkin Defeats Rogers in Michigan Senate Race, Holding a Democratic Seat"             
## [23] "What Is Project 2025, and Why Did Trump Distance Himself From It During the Campaign?"
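To see why the title column is called `results.title`, it can help to parse a small response by hand. The JSON below is a hypothetical, heavily trimmed stand-in for the real response (the field values are invented and most fields are left out); it only illustrates how `fromJSON()` and `as.data.frame()` treat the nested `results` array.

```r
library(jsonlite)

# Hypothetical, trimmed-down response with the same general shape as the API output
toy_json <- '{
  "status": "OK",
  "num_results": 2,
  "results": [
    {"title": "First headline", "subsection": "politics", "abstract": "A short summary."},
    {"title": "Second headline", "subsection": "", "abstract": "Another summary."}
  ]
}'

parsed <- fromJSON(toy_json)

# fromJSON() simplifies the array of article objects into a data frame
articles <- parsed$results
class(articles)

# as.data.frame() on the whole list repeats the top-level scalars on every row
# and prefixes the nested columns with "results."
flat <- as.data.frame(parsed)
names(flat)
```

Working with `parsed$results` directly gives shorter column names (`title` instead of `results.title`), at the cost of dropping the top-level metadata.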

Analysis

Let’s take a look at the “subsection” field of the US top stories and plot the counts to see what most of the stories are about. I also used regular expressions to search for certain words in the articles’ abstracts.

stories_us <- stories_df %>%
  select(results.subsection) %>%
  group_by(results.subsection) %>%
  summarise(count = n())

stories_us$results.subsection[stories_us$results.subsection == ""] <- NA

stories_us <- stories_us %>% drop_na()

ggplot(data = stories_us, aes(x = results.subsection, y = count)) +
  geom_bar(stat = "identity", fill = "darkgreen", position = "dodge") +
  ggtitle("Top Articles in the US Subsections") +
  ylab("Frequency") +
  xlab("Article Subsections")
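The counting step above can also be written more compactly. The sketch below uses a small made-up data frame (`toy_stories` is hypothetical, not the API data) to show `filter()` and `count()` giving the same kind of per-subsection tally, with blank subsections dropped.

```r
library(dplyr)

# Hypothetical stand-in for stories_df with only the subsection column
toy_stories <- data.frame(
  results.subsection = c("politics", "politics", "", "elections", "politics"),
  stringsAsFactors = FALSE
)

# Drop blank subsections, then count the rows in each remaining subsection
subsection_counts <- toy_stories %>%
  filter(results.subsection != "") %>%
  count(results.subsection, name = "count", sort = TRUE)

subsection_counts
```

The resulting `count` column can be passed to `ggplot()` in the same way as `stories_us`.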

str_subset(stories_df$results.abstract, pattern = "elect")

## [1] "After Kamala Harris became the second woman to lose a presidential election to Donald J. Trump, some women wondered if the glass ceiling would ever break."                                          
## [2] "Her commitment to a peaceful transfer of power was more than President-elect Trump ever offered to President Biden and Vice President Kamala Harris after they defeated him in 2020."                
## [3] "President-elect Donald Trump has encouraged him to “go wild on health” but has not made clear what role Mr. Kennedy will play."                                                                      
## [4] "The world’s richest man gave his money and time in campaigning for the president-elect and now is putting in his requests for a friendlier regulatory environment."                                  
## [5] "President-elect Donald J. Trump has often shown disdain toward Ukraine and its leader, President Volodymyr Zelensky."                                                                                
## [6] "With a decisive margin in the Senate, Republicans, who have shown their willingness to accommodate the president-elect, will have the numbers to overcome divisions over his personnel and policies."

str_subset(stories_df$results.abstract, pattern = "policy")

## [1] "The decision is in keeping with a longstanding Justice Department policy that bars prosecution of sitting presidents."                                                        
## [2] "Democrats had attacked Donald J. Trump’s ties to the conservative policy blueprint for reshaping the federal government. Several of its authors served in his administration."
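The patterns above are case-sensitive substring matches, so "elect" also matches inside longer words such as "election" and would miss a capitalized "Election". The sketch below shows how the results change with a case-insensitive pattern and with word boundaries. The abstracts are invented for illustration and are not from the API.

```r
library(stringr)

# Hypothetical abstracts, made up for this example
abstracts <- c(
  "The president-elect spoke on Tuesday.",
  "Election officials certified the results.",
  "A new policy was announced.",
  "Policymakers met in Washington."
)

# Case-sensitive substring match, as used in this report
str_subset(abstracts, pattern = "elect")

# Case-insensitive match also picks up "Election"
elect_any_case <- str_detect(abstracts, regex("elect", ignore_case = TRUE))
sum(elect_any_case)

# Word boundaries restrict the match to the whole word "policy"
policy_word <- str_detect(abstracts, regex("\\bpolicy\\b", ignore_case = TRUE))
sum(policy_word)
```

Using `str_detect()` with `sum()` gives a count of matching abstracts directly, which is handy when only the number of articles matters.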

Conclusion

The dataframe I pulled from the NYT API has 24 variables, and the pull shown above returned 23 articles. I took a quick look at the titles of these current US articles and found that most of the top stories were about the 2024 election, or about politics in ways that most likely still relate to the election. I also searched the article abstracts for certain words to see how many articles discuss those topics. In the future, I would look at how top US articles compare to world articles, or look at other methods of reading in JSON data and transforming it into a dataframe.

2024-11-04
