The New York Times web site provides a rich set of APIs, as described here: https://developer.nytimes.com/apis
You’ll need to start by signing up for an API key. Your task is to choose one of the New York Times APIs, construct an interface in R to read in the JSON data, and transform it into an R data frame.
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.4 v dplyr 1.0.7
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 2.0.1 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(jsonlite)
##
## Attaching package: 'jsonlite'
## The following object is masked from 'package:purrr':
##
## flatten
library(stringr)
library(ggplot2)
library(DT)
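# API key: read from an environment variable instead of hard-coding it
# (the variable name NYT_API_KEY is an assumption; set it in .Renviron)
apiKey <- Sys.getenv("NYT_API_KEY")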
I will be using the Top Stories API, restricted to the Technology section.
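# Build the request URL for the technology section
url <- paste0("https://api.nytimes.com/svc/topstories/v2/technology.json?api-key=", apiKey)
# Parse the JSON response, flatten it into a data frame,
# and drop the nested multimedia column
techData <- fromJSON(url) %>%
  as.data.frame() %>%
  select(-results.multimedia)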
datatable(techData,extensions='Scroller',options=list(scrollY=500,scroller=TRUE))
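The request above hard-codes the technology section. As a minimal sketch, the URL construction could be wrapped in a small helper so that other Top Stories sections can be requested the same way. The function name `top_stories_url` and its argument checks are assumptions for illustration and are not part of the original analysis.

```r
# Hypothetical helper (not part of the original analysis): build the
# Top Stories request URL for a given section and API key.
top_stories_url <- function(section, api_key) {
  # Both arguments must be single, non-empty strings
  stopifnot(
    is.character(section), length(section) == 1, nzchar(section),
    is.character(api_key), length(api_key) == 1, nzchar(api_key)
  )
  paste0(
    "https://api.nytimes.com/svc/topstories/v2/",
    section, ".json?api-key=", api_key
  )
}

# Same URL pattern as the one built by hand above, with a placeholder key
top_stories_url("technology", "YOUR_API_KEY")
```

The result can be passed to `fromJSON()` exactly as `url` is above.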
techData_df <- techData %>%
  select(last_updated, results.published_date, results.section, results.subsection,
         results.title, results.abstract, results.url, results.byline, results.des_facet)
datatable(techData_df,extensions='Scroller',options=list(scrollY=500,scroller=TRUE))
# Use a separate name for the vector so it does not shadow base::colnames()
newNames <- c('LAST_UPDATED', 'PUBLISHED_DATE', 'WEBSITE_SECTION', 'WEBSITE_SUBSECTION', 'TITLE', 'ABSTRACT', 'URL', 'AUTHOR', 'TAGS')
colnames(techData_df) <- newNames
datatable(techData_df,extensions='Scroller',options=list(scrollY=500,scroller=TRUE))
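The selection and renaming steps above can also be combined. The sketch below shows the idea on a toy data frame with made-up values (the `toy` and `renamed` names are illustrative only): `select()` accepts `new_name = old_name` pairs, so columns are picked and renamed in one call.

```r
library(dplyr)

# Toy stand-in for techData, with made-up values
toy <- data.frame(
  last_updated  = "2021-10-23",
  results.title = "Example title",
  results.url   = "https://example.com/story",
  stringsAsFactors = FALSE
)

# Pick and rename columns in a single select() call
renamed <- toy %>%
  select(LAST_UPDATED = last_updated, TITLE = results.title, URL = results.url)

names(renamed)
```

Renaming by name rather than by position avoids mislabeling a column if the selection order changes.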
Now that we have the data tidied up, let's do some visualization.
finalData <- unnest(techData_df, TAGS)
finalData
## # A tibble: 141 x 9
## LAST_UPDATED PUBLISHED_DATE WEBSITE_SECTION WEBSITE_SUBSECT~ TITLE ABSTRACT
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 2021-10-23T1~ 2021-10-23T15:~ technology "" In I~ Interna~
## 2 2021-10-23T1~ 2021-10-23T15:~ technology "" In I~ Interna~
## 3 2021-10-23T1~ 2021-10-23T15:~ technology "" In I~ Interna~
## 4 2021-10-23T1~ 2021-10-23T15:~ technology "" In I~ Interna~
## 5 2021-10-23T1~ 2021-10-23T15:~ technology "" In I~ Interna~
## 6 2021-10-23T1~ 2021-10-23T15:~ technology "" In I~ Interna~
## 7 2021-10-23T1~ 2021-10-23T15:~ technology "" In I~ Interna~
## 8 2021-10-23T1~ 2021-10-23T15:~ technology "" In I~ Interna~
## 9 2021-10-23T1~ 2021-10-23T15:~ technology "" In I~ Interna~
## 10 2021-10-23T1~ 2021-10-23T15:~ technology "" In I~ Interna~
## # ... with 131 more rows, and 3 more variables: URL <chr>, AUTHOR <chr>,
## # TAGS <chr>
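The repeated titles in the output above are expected: `TAGS` is a list column, and `unnest()` emits one row per tag, so each article appears once for every tag it carries. A small sketch on made-up data (the story and tag names are placeholders) shows the behavior, including what happens to an article with no tags.

```r
library(tidyr)

# Toy data: three stories, with a list column of tags
toy <- data.frame(TITLE = c("Story A", "Story B", "Story C"),
                  stringsAsFactors = FALSE)
toy$TAGS <- list(c("Tag X", "Tag Y"), "Tag X", character(0))

# One row per (story, tag) pair; Story C has no tags and is dropped
long <- unnest(toy, TAGS)
nrow(long)

# keep_empty = TRUE keeps Story C with TAGS set to NA
long_keep <- unnest(toy, TAGS, keep_empty = TRUE)
nrow(long_keep)
```

Because articles are duplicated per tag, the unnested table suits tag counts, while per-article counts such as articles per author should use the table before unnesting.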
finalCounts <- as.data.frame(sort(table(finalData$TAGS), decreasing = TRUE))
colnames(finalCounts) <- c('Tag', 'Frequency')
top_n(finalCounts, n = 10, Frequency) %>%
  ggplot(aes(x = Tag, y = Frequency)) +
  geom_bar(stat = 'identity') +
  ggtitle("Top Tags") +
  xlab("Tags") + ylab("# of articles") +
  theme(axis.text.x = element_text(angle = 90))
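The tag counts above are built with `table()` and `top_n()`. Note that `top_n()` keeps ties, so it can return more than 10 rows, and dplyr now marks it as superseded by `slice_max()`. A possible alternative, sketched here on made-up tags (the `toy` and `tagCounts` names are illustrative), uses `count()` and `slice_max()`.

```r
library(dplyr)

# Toy stand-in for the unnested TAGS column
toy <- data.frame(
  TAGS = c("Tag X", "Tag Y", "Tag X", "Tag Z", "Tag X", "Tag Y"),
  stringsAsFactors = FALSE
)

# Count each tag, sort by frequency, and keep exactly the top 2
tagCounts <- toy %>%
  count(TAGS, name = "Frequency", sort = TRUE) %>%
  slice_max(Frequency, n = 2, with_ties = FALSE)

tagCounts
```

With `with_ties = FALSE` the result has exactly `n` rows, which keeps the bar chart at a fixed number of bars.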
# geom_bar() counts rows per author; geom_histogram(stat = "count") gives the
# same plot but warns about ignored binning parameters
ggplot(techData_df, aes(x = AUTHOR)) +
  geom_bar() +
  theme(axis.text.x = element_text(angle = 90))
The top tag among the top technology stories is Computers and the Internet, followed by Social Media, and Shira Ovide wrote the most articles.