For this week’s assignment, we will be working primarily with APIs. As an avid fan of movies, I of course picked the Movie Reviews API:

Movie API Link

As the APIs all appear to be accessed through a URI, they should be interesting to work with. There are 4 basic URIs available to us, but the 2 that I will be using for this project are as follows:

As these URIs use a GET request, the primary tool that we can use for this project is the jsonlite package. Its fromJSON function can fetch the constructed URL directly and parse the JSON response.

library(jsonlite)
## 
## Attaching package: 'jsonlite'
## 
## The following object is masked from 'package:utils':
## 
##     View
library(stringr)
library(tidyr)

From here, the easiest way to pull information is to “build” a URI using the above format. Here is a sample URI built with the search key “The Martian”:

url<- "http://api.nytimes.com/svc/movies/v2/reviews/search.json?query='The+Martian'&api-key=e2d581c1a5059550bc8711ca7e9bc86a:17:73348668"

url_data <- fromJSON(url)
data <- url_data$results
colnames(data)
##  [1] "nyt_movie_id"     "display_title"    "sort_name"       
##  [4] "mpaa_rating"      "critics_pick"     "thousand_best"   
##  [7] "byline"           "headline"         "capsule_review"  
## [10] "summary_short"    "publication_date" "opening_date"    
## [13] "dvd_release_date" "date_updated"     "seo_name"        
## [16] "link"             "related_urls"     "multimedia"

This is just a generic pull; it shows the column names of the results that fromJSON returns.
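The sample URL above is typed out by hand, including the “+” that stands in for the space. A minimal sketch of assembling the same URL from its parts is shown here. The helper name build_review_url and the placeholder key are assumptions for illustration, not part of the original code; URLencode comes from the utils package that ships with R.

```r
# Hypothetical helper (not part of the original post): assemble the search URL from its parts.
build_review_url <- function(query, api_key,
                             base = "http://api.nytimes.com/svc/movies/v2/reviews/search.json") {
  # Percent-encode reserved characters, then use "+" for spaces as in the sample URL
  encoded <- gsub("%20", "+", utils::URLencode(query, reserved = TRUE), fixed = TRUE)
  # The single quotes around the query mirror the sample URL
  paste0(base, "?query='", encoded, "'&api-key=", api_key)
}

# "YOUR_KEY" is a placeholder, not a working key
build_review_url("The Martian", "YOUR_KEY")
```

The result can be passed to fromJSON in the same way as the hand-written url.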

The assignment this week appeared to be rather open-ended (and truthfully I was a little perturbed by that), so I decided to have a little fun: rather than pull a vast dataset, I created a simple, generic keyword search for this particular API. The data from this API is also not very useful statistically. I don’t really read the NY Times, and I had assumed the reviewers gave a “rating”, but it turns out they simply mark a film as a “critics pick” or not. I looked at the column data and found that the only fields I would be particularly interested in for a query are the movie title, the MPAA rating, critics pick, the top 1000 flag, the opening date, and the DVD release date. So, I created a simple function:

Keyword_Search <- function(keyword) {
  # Split the keyword on spaces and rejoin with "+", the separator used in the URL
  new_key <- paste(unlist(strsplit(toString(keyword), " ")), collapse = "+")

  # URL pieces; the single quotes around the query are kept to make it more restrictive
  URI_1 <- "http://api.nytimes.com/svc/movies/v2/reviews/search.json?query='"
  URI_key <- "'&api-key=e2d581c1a5059550bc8711ca7e9bc86a:17:73348668"

  # Combine the pieces into one URL and parse the JSON response
  data <- fromJSON(paste0(URI_1, new_key, URI_key))
  data_frame <- data$results

  # Cleaning: keep only the six columns of interest, selected by name, then rename them
  keep <- c("display_title", "mpaa_rating", "critics_pick", "thousand_best",
            "opening_date", "dvd_release_date")
  data_frame <- data_frame[, keep]
  names(data_frame) <- c("Movie", "MPAA_Rating", "Critics_Pick", "Thousand_Best",
                         "Opening_Date", "Dvd_Release")
  data_frame
}
Keyword_Search("Apollo")
##                 Movie MPAA_Rating Critics_Pick Thousand_Best Opening_Date
## 1           Apollo 18        PG13            0             0   2011-09-02
## 2  House of Pleasures        <NA>            0             0   2011-11-25
## 3           Apollo 13          PG            1             1   1995-06-30
## 4         Purple Rain           R            0             0   1984-07-27
## 5 Broadway Danny Rose        <NA>            1             0   1984-01-01
##   Dvd_Release
## 1        <NA>
## 2        <NA>
## 3  2006-08-22
## 4        <NA>
## 5  2001-11-06
Keyword_Search("Saving Private Ryan")
##                              Movie MPAA_Rating Critics_Pick Thousand_Best
## 1              Saving Private Ryan           R            1             1
## 2             Saving Private Perez        PG13            0             0
## 3                   Private School           R            0             0
## 4                    Woman in Gold        PG13            0             0
## 5                       Goosebumps          PG            0             0
## 6                Mississippi Grind           R            1             0
## 7             The Young Kieslowski           R            0             0
## 8        Jack Ryan: Shadow Recruit        PG13            0             0
## 9  Kirk Cameron's Saving Christmas          PG            0             0
## 10   The Admiral: Roaring Currents        <NA>            0             0
## 11       Cabin Fever: Patient Zero          NR            0             0
## 12                Reasonable Doubt           R            0             0
## 13               Jackie &amp; Ryan        PG13            0             0
## 14                      Catch Hell                        0             0
## 15                      Breathe In           R            0             0
## 16                    Devil's Knot          NR            0             0
## 17                     Escape Plan           R            0             0
## 18                        R.I.P.D.        PG13            0             0
## 19                      Code Black          NR            1             0
## 20                  Good Ol' Freda          PG            0             0
##    Opening_Date Dvd_Release
## 1    1998-07-24  1999-11-02
## 2    2011-09-02        <NA>
## 3    1983-07-29        <NA>
## 4    2015-04-03        <NA>
## 5    2015-10-16        <NA>
## 6    2015-09-25  2015-08-18
## 7    2015-07-24        <NA>
## 8    2014-01-17        <NA>
## 9    2014-11-14        <NA>
## 10   2014-08-15        <NA>
## 11   2014-08-01        <NA>
## 12   2014-01-17        <NA>
## 13   2015-07-03        <NA>
## 14   2014-10-10        <NA>
## 15   2013-03-28        <NA>
## 16   2014-05-09        <NA>
## 17   2013-10-18        <NA>
## 18   2013-07-19        <NA>
## 19   2014-06-20  2015-02-24
## 20   2013-09-13        <NA>

The API’s query handling is not the greatest. Apparently it uses an index and combines the words with “or”, so any search with multiple words matches on each word separately. As it doesn’t search for exact matches, it tends to pull more results than needed. Fortunately, it does find the most likely matches and lists them first.
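Since the API returns loose matches, one option is to narrow the results locally after the call. A minimal sketch follows, assuming the data frame has the Movie column produced by Keyword_Search. The helper name filter_titles is hypothetical, and the sample titles are copied from the “Apollo” output above so that the example runs without calling the API.

```r
# Hypothetical helper: keep only rows whose title contains every word of the keyword.
filter_titles <- function(results, keyword) {
  words <- tolower(unlist(strsplit(keyword, " ", fixed = TRUE)))
  titles <- tolower(results$Movie)
  # A row is kept only if each word appears somewhere in its title (case-insensitive)
  keep <- Reduce(`&`,
                 lapply(words, function(w) grepl(w, titles, fixed = TRUE)),
                 rep(TRUE, length(titles)))
  results[keep, , drop = FALSE]
}

# Titles copied from the "Apollo" output above
sample_results <- data.frame(
  Movie = c("Apollo 18", "House of Pleasures", "Apollo 13", "Purple Rain", "Broadway Danny Rose"),
  stringsAsFactors = FALSE
)
filter_titles(sample_results, "Apollo")
```

This is a plain substring check on the title, so it only trims the list the API already returned; it does not change how the API itself ranks or matches results.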

Anyway, that is a simple piece of code. I didn’t put in any error checks or stops for the function (i.e., if you pass in a non-string value). I expected those to raise an automatic error from the R console anyway, so they seemed superfluous.
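One caveat: because the function passes its input through toString, a number such as 42 is converted to the string "42" and searched rather than rejected. If an explicit check were wanted, a minimal sketch could look like the following. The name check_keyword is hypothetical and is not part of the function above.

```r
# Hypothetical guard (not in the original function): accept only one non-empty string.
check_keyword <- function(keyword) {
  if (!is.character(keyword) || length(keyword) != 1 ||
      is.na(keyword) || !nzchar(trimws(keyword))) {
    stop("keyword must be a single non-empty character string", call. = FALSE)
  }
  # Return the keyword with surrounding whitespace removed
  trimws(keyword)
}

check_keyword(" Apollo ")              # returns the trimmed keyword
try(check_keyword(42), silent = TRUE)  # fails with the message above
```

Calling check_keyword at the top of Keyword_Search would stop bad input before any request is sent.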