Week9Assignment

Assignment Instruction: The New York Times web site provides a rich set of APIs, as described here: http://developer.nytimes.com/docs. You’ll need to start by signing up for an API key. Your task is to choose one of the New York Times APIs, construct an interface in R to read in the JSON data, and transform it to an R dataframe.

The first step of this assignment is to obtain an API key from the NYT website. I chose the Books API, which provides information about book reviews and The New York Times best-seller lists.

# Install the packages this document needs.
# jsonlite's fromJSON() relies on the curl package to read data from a URL.
#install.packages("jsonlite")
#install.packages("RCurl")
install.packages("curl", repos = "http://cran.us.r-project.org")

## Installing package into 'C:/Users/blin261/Documents/R/win-library/3.3'
## (as 'lib' is unspecified)

## package 'curl' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\blin261\AppData\Local\Temp\RtmpigP3NU\downloaded_packages
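Calling install.packages() inside a knitted document reinstalls the package on every knit. A minimal sketch of one common alternative is to install only what is missing. The helper name `missing_packages` is hypothetical; it relies on base R's `system.file(package = ...)`, which returns an empty string when a package is not installed.

```r
# Hypothetical helper: return the entries of `pkgs` that are not installed.
# system.file(package = p) returns "" when package p cannot be found.
missing_packages <- function(pkgs) {
  installed <- nzchar(vapply(pkgs, function(p) system.file(package = p), character(1)))
  pkgs[!installed]
}

# Packages used in this document.
needed <- c("jsonlite", "curl", "dplyr")
to_install <- missing_packages(needed)
to_install
```

`install.packages(to_install, repos = "http://cran.us.r-project.org")` can then be run only when `length(to_install) > 0`, so re-knitting does not download anything that is already present.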

library(jsonlite)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

According to the README on the developer website, best-seller lists can be retrieved with the following URL structure:

“http://api.nytimes.com/svc/books/{version}/lists/[date/]{list-name}[.response_format]?[optional-param1=value1]&[…]&api-key={your-API-key}”. Curly braces {} indicate required items. Square brackets [] indicate optional items.
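The template above can be filled in with a small function instead of pasting URLs by hand. This is a sketch under stated assumptions: the function name `nyt_list_url` is hypothetical, it covers only the required items plus the optional date segment (no extra query parameters), and it assumes the API key is kept in an environment variable named `NYT_API_KEY` rather than written into the document.

```r
# Hypothetical helper that fills in the README's URL template:
# http://api.nytimes.com/svc/books/{version}/lists/[date/]{list-name}[.response_format]?api-key={your-API-key}
nyt_list_url <- function(list_name, api_key, date = NULL,
                         version = "v3", format = "json") {
  # The optional date segment sits between "lists/" and the list name.
  date_part <- if (is.null(date)) "" else paste0(date, "/")
  paste0("http://api.nytimes.com/svc/books/", version, "/lists/",
         date_part, list_name, ".", format,
         "?api-key=", api_key)
}

# Assumption: the key lives in an environment variable called NYT_API_KEY.
# Sys.getenv() returns "" if that variable is not set.
key <- Sys.getenv("NYT_API_KEY")
nyt_list_url("names", key)
```

With `list_name = "names"` this produces the same lists/names.json request used below, and the `date` argument fills in the optional `[date/]` part of the template.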

The README says the version should be v3 and the response format .json. However, it does not give the available list names. Therefore, I first queried the list-names endpoint (lists/names.json) to find out which best-seller lists the Books API provides.
I used the fromJSON() function from jsonlite to read the JSON returned by the Books API and convert it to R data frames.

url <- "http://api.nytimes.com/svc/books/v3/lists/names.json?api-key=b3b20ef4ff3a4a0ca48764fd9d73e0e5"
json_data <- fromJSON(url)
str(json_data)

## List of 4
##  $ status     : chr "OK"
##  $ copyright  : chr "Copyright (c) 2016 The New York Times Company.  All Rights Reserved."
##  $ num_results: int 53
##  $ results    :'data.frame': 53 obs. of  6 variables:
##   ..$ list_name            : chr [1:53] "Combined Print and E-Book Fiction" "Combined Print and E-Book Nonfiction" "Hardcover Fiction" "Hardcover Nonfiction" ...
##   ..$ display_name         : chr [1:53] "Combined Print & E-Book Fiction" "Combined Print & E-Book Nonfiction" "Hardcover Fiction" "Hardcover Nonfiction" ...
##   ..$ list_name_encoded    : chr [1:53] "combined-print-and-e-book-fiction" "combined-print-and-e-book-nonfiction" "hardcover-fiction" "hardcover-nonfiction" ...
##   ..$ oldest_published_date: chr [1:53] "2011-02-13" "2011-02-13" "2008-06-08" "2008-06-08" ...
##   ..$ newest_published_date: chr [1:53] "2016-11-06" "2016-11-06" "2016-11-06" "2016-11-06" ...
##   ..$ updated              : chr [1:53] "WEEKLY" "WEEKLY" "WEEKLY" "WEEKLY" ...
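The reason `results` arrives as a data frame is that fromJSON() simplifies a JSON array of objects into a data frame by default (`simplifyVector = TRUE`). The sketch below shows this offline on a hand-written, two-record JSON string shaped like the names.json response; the field values are copied from the str() output above and it is not a real API response.

```r
library(jsonlite)

# Hand-written stand-in for the names.json response (not fetched from the API).
sample_json <- '{
  "status": "OK",
  "num_results": 2,
  "results": [
    {"list_name": "Hardcover Fiction", "list_name_encoded": "hardcover-fiction", "updated": "WEEKLY"},
    {"list_name": "Hardcover Nonfiction", "list_name_encoded": "hardcover-nonfiction", "updated": "WEEKLY"}
  ]
}'

# The top-level object becomes a named list; the array of objects under
# "results" becomes a data frame with one row per object.
parsed <- fromJSON(sample_json)
str(parsed$results)
```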

The str() output shows that the results element, a data frame with 53 rows and 6 columns, holds the information about the best-seller lists, and that each list's name is stored in the list_name column. I picked a list I find interesting: Hardcover Business Books.

raw_data <- json_data$results
colnames(raw_data)

## [1] "list_name"             "display_name"          "list_name_encoded"    
## [4] "oldest_published_date" "newest_published_date" "updated"

raw_data$list_name

##  [1] "Combined Print and E-Book Fiction"   
##  [2] "Combined Print and E-Book Nonfiction"
##  [3] "Hardcover Fiction"                   
##  [4] "Hardcover Nonfiction"                
##  [5] "Trade Fiction Paperback"             
##  [6] "Mass Market Paperback"               
##  [7] "Paperback Nonfiction"                
##  [8] "E-Book Fiction"                      
##  [9] "E-Book Nonfiction"                   
## [10] "Hardcover Advice"                    
## [11] "Paperback Advice"                    
## [12] "Advice How-To and Miscellaneous"     
## [13] "Chapter Books"                       
## [14] "Childrens Middle Grade"              
## [15] "Childrens Middle Grade E-Book"       
## [16] "Childrens Middle Grade Hardcover"    
## [17] "Childrens Middle Grade Paperback"    
## [18] "Paperback Books"                     
## [19] "Picture Books"                       
## [20] "Series Books"                        
## [21] "Young Adult"                         
## [22] "Young Adult E-Book"                  
## [23] "Young Adult Hardcover"               
## [24] "Young Adult Paperback"               
## [25] "Hardcover Graphic Books"             
## [26] "Paperback Graphic Books"             
## [27] "Manga"                               
## [28] "Combined Print Fiction"              
## [29] "Combined Print Nonfiction"           
## [30] "Animals"                             
## [31] "Business Books"                      
## [32] "Celebrities"                         
## [33] "Crime and Punishment"                
## [34] "Culture"                             
## [35] "Education"                           
## [36] "Espionage"                           
## [37] "Expeditions Disasters and Adventures"
## [38] "Fashion Manners and Customs"         
## [39] "Food and Fitness"                    
## [40] "Games and Activities"                
## [41] "Hardcover Business Books"            
## [42] "Health"                              
## [43] "Humor"                               
## [44] "Indigenous Americans"                
## [45] "Relationships"                       
## [46] "Paperback Business Books"            
## [47] "Family"                              
## [48] "Hardcover Political Books"           
## [49] "Race and Civil Rights"               
## [50] "Religion Spirituality and Faith"     
## [51] "Science"                             
## [52] "Sports"                              
## [53] "Travel"
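Because the names response also carries a `list_name_encoded` column, the URL-ready name can be looked up from the data rather than typed by hand. A minimal sketch: the helper `encoded_list_name` and the two-row stand-in data frame are hypothetical, with rows taken from the str() output above.

```r
# Hypothetical helper: find the list_name_encoded value that matches a
# human-readable list_name in the data frame returned by lists/names.json.
encoded_list_name <- function(lists, name) {
  idx <- match(name, lists$list_name)
  if (is.na(idx)) stop("No list named '", name, "'")
  lists$list_name_encoded[idx]
}

# Small stand-in for raw_data, using rows visible in the str() output.
lists <- data.frame(
  list_name = c("Hardcover Fiction", "Hardcover Nonfiction"),
  list_name_encoded = c("hardcover-fiction", "hardcover-nonfiction"),
  stringsAsFactors = FALSE
)
encoded_list_name(lists, "Hardcover Nonfiction")
```

On the real data, `encoded_list_name(raw_data, "Hardcover Business Books")` would return the encoded form of the chosen list, and a misspelled name fails loudly instead of producing a bad request URL.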

I then passed this list name into the URL structure described above. The following code repeats the same steps to see which variables the API returns for each book on the list.

url <- "http://api.nytimes.com/svc/books/v3/lists/Hardcover-Business-Books.json?api-key=b3b20ef4ff3a4a0ca48764fd9d73e0e5"
json_data <- fromJSON(url)
raw_data <- json_data$results$books
colnames(raw_data)

##  [1] "rank"                 "rank_last_week"       "weeks_on_list"       
##  [4] "asterisk"             "dagger"               "primary_isbn10"      
##  [7] "primary_isbn13"       "publisher"            "description"         
## [10] "price"                "title"                "author"              
## [13] "contributor"          "contributor_note"     "book_image"          
## [16] "book_image_width"     "book_image_height"    "amazon_product_url"  
## [19] "age_group"            "book_review_link"     "first_chapter_link"  
## [22] "sunday_review_link"   "article_chapter_link" "isbns"               
## [25] "buy_links"
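Some of these columns, such as isbns and buy_links, hold arrays for each book, and fromJSON() turns nested arrays of objects into list columns. A sketch of dropping such columns generically, as an alternative to naming the columns to keep: the two-book JSON string is hand-written, with rank, title, and author copied from the printed output, and the isbns values are placeholders.

```r
library(jsonlite)

# Hand-written stand-in for json_data$results$books; isbns values are placeholders.
books_json <- '[
  {"rank": 1, "title": "LEAN IN", "author": "Sheryl Sandberg with Nell Scovell",
   "isbns": [{"isbn10": "placeholder-a", "isbn13": "placeholder-b"}]},
  {"rank": 2, "title": "DUCK COMMANDERS", "author": "Willie and Korie Robertson with Mark Schlabach",
   "isbns": [{"isbn10": "placeholder-c", "isbn13": "placeholder-d"}]}
]'
books_df <- fromJSON(books_json)

# Nested arrays (like isbns) become list columns; keep only the flat ones.
is_nested <- vapply(books_df, is.list, logical(1))
flat_books <- books_df[, !is_nested, drop = FALSE]
names(flat_books)
```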

I used dplyr to select the variables most relevant to the best-seller ranking, especially each book's rank. Unfortunately, the NYT API only returned the top 15 books for this list. I also expected non-zero values for rank_last_week and weeks_on_list so that I could perform some analysis, but both variables are 0 for all 15 books, which may be due to missing data for this list.

# Keep the ranking columns plus publisher, title and author
books <- raw_data %>%
  select(rank, rank_last_week, weeks_on_list, publisher, title, author)
books

##    rank rank_last_week weeks_on_list      publisher
## 1     1              0             0          Knopf
## 2     2              0             0   Howard Books
## 3     3              0             0 Crown Business
## 4     4              0             0  Business Plus
## 5     5              0             0    McGraw-Hill
## 6     6              0             0 Crown Business
## 7     7              0             0      Doubleday
## 8     8              0             0          Wiley
## 9     9              0             0          Wiley
## 10   10              0             0   Random House
## 11   11              0             0         Gallup
## 12   12              0             0  Thomas Nelson
## 13   13              0             0 HarperBusiness
## 14   14              0             0 Crown Business
## 15   15              0             0          Crown
##                         title
## 1                     LEAN IN
## 2             DUCK COMMANDERS
## 3            BEFORE HAPPINESS
## 4                    LOOPTAIL
## 5                  GREAT WORK
## 6         MISSION IN A BOTTLE
## 7                      SYSTEM
## 8                   YOU FIRST
## 9               VALUE OF DEBT
## 10         THE POWER OF HABIT
## 11 STRENGTHS-BASED LEADERSHIP
## 12       TOTAL MONEY MAKEOVER
## 13        WINNING FROM WITHIN
## 14               LEAN STARTUP
## 15            4-HOUR WORKWEEK
##                                            author
## 1               Sheryl Sandberg with Nell Scovell
## 2  Willie and Korie Robertson with Mark Schlabach
## 3                                     Shawn Achor
## 4                                  Bruce Poon Tip
## 5                                     David Sturt
## 6                 Seth Goldman and Barry Nalebuff
## 7                Jeff Benedict and Armen Keteyian
## 8                                     Liane Davey
## 9                               Thomas J Anderson
## 10                                 Charles Duhigg
## 11                     Tom Rath and Barry Conchie
## 12                                    Dave Ramsey
## 13                                Erica Ariel Fox
## 14                                      Eric Ries
## 15                                Timothy Ferriss
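Rather than scanning the printout, a quick base-R check can list every numeric column that contains only zeros. A minimal sketch; `books_sample` is a hypothetical stand-in that copies the first three rows of the output above.

```r
# Stand-in for `books`, copied from the first three printed rows.
books_sample <- data.frame(
  rank = 1:3,
  rank_last_week = c(0L, 0L, 0L),
  weeks_on_list = c(0L, 0L, 0L),
  publisher = c("Knopf", "Howard Books", "Crown Business"),
  stringsAsFactors = FALSE
)

# For each numeric column, TRUE if every non-missing value is 0.
numeric_cols <- vapply(books_sample, is.numeric, logical(1))
all_zero <- vapply(books_sample[numeric_cols],
                   function(x) all(x == 0, na.rm = TRUE), logical(1))
names(all_zero)[all_zero]
```

Applied to the full `books` data frame, the same two lines should report rank_last_week and weeks_on_list, matching the all-zero columns shown above.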


Bin Lin

2016-10-30