Week 9 Homework-Data 607

New York Times API

The New York Times web site provides a rich set of APIs, as described here: http://developer.nytimes.com/docs. You’ll need to start by signing up for an API key. Your task is to choose one of the New York Times APIs, construct an interface in R to read in the JSON data, and transform it to an R data frame.

Defined the parameters of the query (search keywords, start/end dates, and the API key).

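# Packages used below
library(jsonlite)  # fromJSON(), rbind_pages()
library(dplyr)     # %>%, select(), rename(), group_by(), summarise()
library(knitr)     # kable()
library(ggplot2)   # ggplot(), geom_bar()

# March Madness Query: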

term <- "march+madness+basketball"
begin_date <- "20180301"
end_date <- "20180331"
api.key<- "&api-key=ace27904c56848de8cf3c9a855163697"

madness_url <- paste0("http://api.nytimes.com/svc/search/v2/articlesearch.json?q=",term,"&begin_date=",begin_date,"&end_date=",
                  end_date)
madness_url

## [1] "http://api.nytimes.com/svc/search/v2/articlesearch.json?q=march+madness+basketball&begin_date=20180301&end_date=20180331"
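
The query string above can also be assembled by a small function, which makes it easier to reuse with other keywords or dates. This is a minimal sketch, not the method used in this assignment: `build_search_url` is a hypothetical helper name, and `NYT_API_KEY` is an assumed environment variable name chosen for illustration so the key does not have to be written into the script.

```r
# Hypothetical helper (not in the original): build an Article Search URL
# from a vector of keywords and a date range in YYYYMMDD form.
build_search_url <- function(words, begin_date, end_date) {
  base <- "http://api.nytimes.com/svc/search/v2/articlesearch.json"
  paste0(base,
         "?q=", paste(words, collapse = "+"),
         "&begin_date=", begin_date,
         "&end_date=", end_date)
}

madness_url <- build_search_url(c("march", "madness", "basketball"),
                                "20180301", "20180331")

# Assumption: the key is stored in an environment variable named NYT_API_KEY
# (a made-up name); Sys.getenv() returns "" if it is not set.
api.key <- paste0("&api-key=", Sys.getenv("NYT_API_KEY"))
```

Called with the three keywords and the March 2018 date range, the helper reproduces the same URL string printed above.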

Used the “fromJSON” call to query the API with “madness_url” and parse the JSON response.

# Fetch the data; the API returns only 10 results per request
march_madness_query <- data.frame(fromJSON(paste0(madness_url, api.key)))%>%
                      select (-c(status, copyright, response.docs.blog,
                                 response.docs.multimedia, response.docs.headline,
                                 response.docs.keywords, response.docs.byline))
march_madness_query

	response.docs.web_url
1	https://www.nytimes.com/2018/03/13/learning/march-madness.html
2	https://www.nytimes.com/2018/03/27/opinion/poetry-march-madness-basketball.html
3	https://www.nytimes.com/aponline/2018/03/21/sports/ncaabasketball/ap-bkc-ncaa-tournament-regional-guide.html
4	https://www.nytimes.com/aponline/2018/03/19/sports/ncaabasketball/ap-bkc-ncaa-amped-up-madness.html
5	https://www.nytimes.com/2018/03/15/briefing/pennsylvania-elizabeth-holmes-march-madness.html
6	https://www.nytimes.com/2018/03/15/arts/television/whats-on-tv-thursday-march-madness-and-how-to-get-away-with-murder.html
7	https://www.nytimes.com/2018/03/17/sports/32-teams-7-days-1-gym.html
8	https://www.nytimes.com/2018/03/22/learning/student-walkouts-march-madness-and-risk-enhanced-playgrounds-our-favorite-student-comments-of-the-week.html
9	https://www.nytimes.com/2018/03/12/sports/march-madness-predictions-experts.html
10	https://www.nytimes.com/interactive/2018/03/20/learning/20StudentNewsQuiz-Hawking-Basketball-Protests.html

	response.docs.snippet
1	Who do you predict to win the tournament this year?
2	At Loyola, our poetry workshop was energized by our team’s win, not overshadowed by it.
3	The opening weekend of the NCAA Tournament was not madness. It was straight bonkers.
4	One word succinctly describes what’s transpired so far in the NCAA Tournament:
5	Here’s what you need to know to start your day.
6	March Madness begins with games all afternoon and evening. And watch season finales with RuPaul and Annalise Keating.
7	For true madness, see the N.A.I.A. basketball tournament in Kansas City.
8	The best teenage comments from last week’s writing prompts, and an invitation to join the conversation this week.
9	There is no shortage of experts with advice to guide you to all the winners in the N.C.A.A. tournament. Here’s a roundup of selections from people in a position to know.
10	How well did you follow the news this past week? How many of these 10 questions can you get right?

	response.docs.source	response.docs.pub_date	response.docs.document_type	response.docs.new_desk	response.docs.type_of_material
1	The New York Times	2018-03-13T07:00:01+0000	article	Learning	News
2	The New York Times	2018-03-28T00:05:01+0000	article	OpEd	Op-Ed
3	AP	2018-03-21T18:48:29+0000	article	None	News
4	AP	2018-03-19T07:09:55+0000	article	None	News
5	The New York Times	2018-03-15T09:33:51+0000	article	NYTNow	briefing
6	The New York Times	2018-03-15T05:00:05+0000	article	Culture	Schedule
7	The New York Times	2018-03-17T20:49:48+0000	article	Sports	News
8	The New York Times	2018-03-22T19:53:13+0000	article	Learning	News
9	The New York Times	2018-03-12T13:21:38+0000	article	Sports	News
10	The New York Times	2018-03-20T11:41:10+0000	multimedia	The Learning Network	Interactive Feature

	response.docs._id	response.docs.word_count	response.docs.score	response.docs.uri
1	5aa776f747de81a90120def1	108	0.04048129	nyt://article/f990a3f7-b3e0-54c0-8973-ce8afb7f7bb8
2	5abadc3047de81a90121843f	926	0.02787079	nyt://article/9ea6717c-a398-5437-b60c-615212754e62
3	5ab2a90047de81a901214a7a	737	0.02224132	nyt://article/ccf9aebc-5436-5d2d-b15c-9b8a1fe27763
4	5aaf624547de81a9012125f3	1043	0.02179713	nyt://article/9454a0bb-9bcf-5206-9582-7f95b946f38d
5	5aaa3e0247de81a901210423	1191	0.01992367	nyt://article/68ae1f67-82e3-5142-85ee-b6b114c41fbf
6	5aa9fdd947de81a90121027f	480	0.01820390	nyt://article/9760a8f1-ad18-5fc6-9286-09a6f6d74c87
7	5aad7f7047de81a901211f17	1637	0.01780204	nyt://article/a352017e-2f03-5fad-961b-a54760dedf33
8	5ab409b047de81a90121580d	6452	0.01732480	nyt://article/1b248820-b37c-5928-b9f3-db14d0312c54
9	5aa67ee547de81a90120d71b	1393	0.01672325	nyt://article/3bb032a8-298a-5574-8ecc-9d9fe7cc2a95
10	5ab0f35f47de81a901213223	0	0.01666914	nyt://interactive/c4d1deca-9d9e-5463-ae19-7d9743f413c7

	response.docs.print_page	response.docs.section_name	response.meta.hits	response.meta.offset	response.meta.time
1			172	0	19
2	23		172	0	19
3		College Basketball	172	0	19
4		College Basketball	172	0	19
5			172	0	19
6	7	Television	172	0	19
7	5		172	0	19
8			172	0	19
9	8		172	0	19
10			172	0	19

Find the max pages available. The “response.meta.hits” column in the output gives the total number of matching articles. The API returns 10 results per page and numbers pages from 0, so dividing the hits by 10 and subtracting 1 gives the upper bound of the page range for the loop.

max_pages<-round((march_madness_query$response.meta.hits[1] / 10)-1) 
max_pages

## [1] 16
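
With 172 hits, `round(172 / 10 - 1)` gives 16, so pages 0 through 16 cover 170 articles and the last two hits, which would sit on page 17, are not requested. If the goal is to include the final partial page, one option is `ceiling()` in place of `round()`. The sketch below is only an illustration of that arithmetic; `last_page_index` is a hypothetical helper name, not part of the original code.

```r
# Hypothetical helper (not in the original): zero-based index of the last
# page, given the total number of hits and the page size.
last_page_index <- function(hits, per_page = 10) {
  # ceiling() counts a partial final page as a full page;
  # max() guards against a negative index when there are no hits
  max(0, ceiling(hits / per_page) - 1)
}

last_page_index(172)  # 17: pages 0 through 17
last_page_index(170)  # 16: pages 0 through 16
```

Using this value as the loop bound would request one more page than the run shown here.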

Looped through the response pages and combined the results

# Request each page of results (pages are zero-indexed) and keep the columns of interest
responses <- list()
for (i in 0:max_pages) {
  madness.search <- data.frame(fromJSON(paste0(madness_url, api.key, "&page=", i), flatten = TRUE)) %>%
    select(response.docs.web_url, response.docs.source, response.docs.word_count,
           response.docs.new_desk, response.docs.type_of_material, response.docs.pub_date) %>%
    rename(url = response.docs.web_url, source = response.docs.source,
           word_count = response.docs.word_count, news_desk = response.docs.new_desk,
           material_type = response.docs.type_of_material, publish_date = response.docs.pub_date)
  responses[[i + 1]] <- madness.search
  Sys.sleep(1)  # pause one second between requests
}
# Stack the per-page data frames into a single data frame
combined_responses <- rbind_pages(responses)

kable(head(combined_responses,20),  caption = "March Madness Articles")

March Madness Articles
url	source	word_count	news_desk	material_type	publish_date
https://www.nytimes.com/2018/03/13/learning/march-madness.html	The New York Times	108	Learning	News	2018-03-13T07:00:01+0000
https://www.nytimes.com/2018/03/27/opinion/poetry-march-madness-basketball.html	The New York Times	926	OpEd	Op-Ed	2018-03-28T00:05:01+0000
https://www.nytimes.com/aponline/2018/03/21/sports/ncaabasketball/ap-bkc-ncaa-tournament-regional-guide.html	AP	737	None	News	2018-03-21T18:48:29+0000
https://www.nytimes.com/aponline/2018/03/19/sports/ncaabasketball/ap-bkc-ncaa-amped-up-madness.html	AP	1043	None	News	2018-03-19T07:09:55+0000
https://www.nytimes.com/2018/03/15/briefing/pennsylvania-elizabeth-holmes-march-madness.html	The New York Times	1191	NYTNow	briefing	2018-03-15T09:33:51+0000
https://www.nytimes.com/2018/03/15/arts/television/whats-on-tv-thursday-march-madness-and-how-to-get-away-with-murder.html	The New York Times	480	Culture	Schedule	2018-03-15T05:00:05+0000
https://www.nytimes.com/2018/03/17/sports/32-teams-7-days-1-gym.html	The New York Times	1637	Sports	News	2018-03-17T20:49:48+0000
https://www.nytimes.com/2018/03/22/learning/student-walkouts-march-madness-and-risk-enhanced-playgrounds-our-favorite-student-comments-of-the-week.html	The New York Times	6452	Learning	News	2018-03-22T19:53:13+0000
https://www.nytimes.com/2018/03/12/sports/march-madness-predictions-experts.html	The New York Times	1393	Sports	News	2018-03-12T13:21:38+0000
https://www.nytimes.com/interactive/2018/03/20/learning/20StudentNewsQuiz-Hawking-Basketball-Protests.html	The New York Times	0	The Learning Network	Interactive Feature	2018-03-20T11:41:10+0000
https://www.nytimes.com/2018/03/12/sports/ncaabasketball/bracket-myths.html	The New York Times	1297	Sports	News	2018-03-12T07:00:09+0000
https://www.nytimes.com/2018/03/16/sports/march-madness-ncaa-tournament.html	The New York Times	1875	Sports	News	2018-03-16T13:15:16+0000
https://www.nytimes.com/aponline/2018/03/25/sports/ncaabasketball/ap-bkc-ap-sports-special-events-podcast.html	AP	153	None	News	2018-03-25T12:59:54+0000
https://www.nytimes.com/aponline/2018/03/23/sports/ncaabasketball/ap-bkc-ap-special-event-podcast.html	AP	120	None	News	2018-03-23T15:47:18+0000
https://www.nytimes.com/aponline/2018/03/22/sports/ncaabasketball/ap-bkc-ap-special-events-podcast.html	AP	107	None	News	2018-03-22T14:21:03+0000
https://www.nytimes.com/2018/03/14/nyregion/hoop-dreams-deferred.html	The New York Times	1112	Metropolitan	News	2018-03-14T21:54:27+0000
https://www.nytimes.com/aponline/2018/03/19/us/ap-bkc-ncaa-umbc-fairytale-ends.html	AP	772	None	News	2018-03-19T06:18:38+0000
https://www.nytimes.com/aponline/2018/03/19/sports/ncaabasketball/ap-bkc-ncaa-regional-reset.html	AP	836	None	News	2018-03-19T18:09:24+0000
https://www.nytimes.com/2018/03/12/sports/ncaa-snubs-tournament.html	The New York Times	451	Sports	News	2018-03-12T13:04:55+0000
https://www.nytimes.com/2018/03/22/sports/basketball/loyola-sister-jean-nun-ncaa-tournament.html	The New York Times	1111	Sports	News	2018-03-22T18:00:12+0000

Found which news desks published the most March Madness articles. Unsurprisingly, the Sports desk published the most articles.

combined_responses %>% 
  group_by(news_desk) %>%
  summarize(count=n()) %>%
  filter(news_desk != "None")%>%
  mutate(percent = (count / sum(count))*100) %>%
  ggplot() +
  geom_bar(aes(y=percent, x=reorder(news_desk, -percent)), fill= "tomato3", stat = "identity") + coord_flip()+
  labs(x='News Desk', 
       y='Percent Total',
       title="Publishing Desk", 
       caption="Source: New York Times API") + 
  theme(axis.text.x = element_text(angle=65, vjust=0.6))
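
The percentage step in this pipeline can be checked by hand on a small sample. The sketch below uses base R on the news desks of the 20 articles listed in the table above (not the full result set), drops the "None" entries first, and then computes each desk's share of what remains. The variable names are chosen for illustration.

```r
# News desks of the 20 articles shown in the table above
news_desk <- c("Learning", "OpEd", "None", "None", "NYTNow", "Culture",
               "Sports", "Learning", "Sports", "The Learning Network",
               "Sports", "Sports", "None", "None", "None", "Metropolitan",
               "None", "None", "Sports", "Sports")

# Drop "None" before computing shares, as the pipeline does with filter()
counts <- table(news_desk[news_desk != "None"])

# Share of each desk among the remaining articles, in percent
percent <- round(100 * prop.table(counts), 1)
sort(percent, decreasing = TRUE)
```

Because the "None" rows are removed before the division, the percentages are shares of articles that have a named desk, not shares of all articles returned.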

Explored the distribution of the publishing date. It is very interesting to see the bulk of the articles came out toward the of the month.

combined_responses %>%
  mutate(day=gsub("T.*","",publish_date)) %>%
  group_by(day) %>%
  summarise(count=n()) %>%
  ggplot() +
  geom_bar(aes(x=reorder(day, -count), y=count), fill= "yellow", stat="identity") + coord_flip()+
  labs(x='Publishing Date', 
       y='Count',
       title="Publishing Date", 
       caption="Source: New York Times API") + 
  theme(axis.text.x = element_text(angle=65, vjust=0.6))
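
The plot above orders days by article count. To read the distribution over time, it can help to convert the timestamp strings to `Date` values so that days sort chronologically. The following is a minimal base R sketch on four timestamps copied from the table above; it is an alternative view under that assumption, not the approach used for the plot.

```r
# Sample timestamps copied from the table above
publish_date <- c("2018-03-13T07:00:01+0000",
                  "2018-03-28T00:05:01+0000",
                  "2018-03-15T09:33:51+0000",
                  "2018-03-15T05:00:05+0000")

# The first 10 characters hold the calendar date (YYYY-MM-DD)
day <- as.Date(substr(publish_date, 1, 10))

# table() on Date values lists the days in chronological order
counts <- table(day)
counts
```

In this sample, 2018-03-15 appears twice and the other two days once each.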

Explored the distribution of the sources. I found it very interesting that the Associated Press sourced more of the content than The New York Times.

combined_responses %>%
  group_by(source) %>%
  summarise(count=n()) %>%
  ggplot() +
  geom_bar(aes(x=reorder(source, -count), y=count), fill= "blue", stat="identity") + coord_flip()+
  labs(x='Source', 
       y='Count',
       title="Article Source", 
       caption="Source: New York Times API") + 
  theme(axis.text.x = element_text(angle=65, vjust=0.6))


Meaghan Burke

March 27, 2018
