MSDS Spring 2018

DATA 607 Data Acquisition and Management

Jiadi Li

Week 9 Assignment: Web APIs

Choose one of the New York Times APIs, construct an interface in R to read in the JSON data, and transform it to an R dataframe.

The New York Times website provides a rich set of APIs: http://developer.nytimes.com/docs

The Geographic API

Base URL:
http://api.nytimes.com/svc/semantic/v2/geocodes

Scope:
The New York Times controlled vocabulary (over 2,000 places used to classify New York Times article metadata) and New York Times articles from 1981 to today (excluding wire services such as the Associated Press).

The general form of a Geographic API request, with the query parameters followed by the API key:
http://api.nytimes.com/svc/semantic/v2/geocodes/query.json?(query parameters)&api-key=your-API-key
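Assembling the query string by hand gets error-prone once there is more than one parameter or a value contains spaces. A minimal sketch in base R, assuming every parameter value should be percent-encoded; `build_geo_url()` is a hypothetical helper (not part of any NYT library), and `country_code` is the only parameter name taken from this assignment:

```r
# Hypothetical helper: build a Geographic API request URL from named query parameters.
# Assumes each value should be percent-encoded; the api-key is appended last,
# following the general form shown above.
build_geo_url <- function(params, api_key,
                          base = "http://api.nytimes.com/svc/semantic/v2/geocodes/query.json") {
  all_params <- c(params, list(`api-key` = api_key))
  # reserved = TRUE also escapes characters such as '&', '=' and spaces
  encoded <- vapply(all_params,
                    function(v) utils::URLencode(as.character(v), reserved = TRUE),
                    character(1))
  paste0(base, "?", paste0(names(all_params), "=", encoded, collapse = "&"))
}

url <- build_geo_url(list(country_code = "US"), api_key = "your-API-key")
print(url)
```

Encoding each value separately keeps a stray `&` or space inside a value from being read as a parameter separator.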

  1. Load packages
library(httr)      # HTTP client utilities (optional here: fromJSON() can read a URL directly)
library(jsonlite)  # fromJSON() to parse JSON, flatten() to un-nest data frame columns
  2. Load data from the NYT API
API_key <- Sys.getenv('NYT_API_KEY') # API key required by the NYT API; read from an environment variable (NYT_API_KEY is an assumed name) instead of hard-coding it
query_parameters <- 'country_code=US'

API_geo_url <- paste0('http://api.nytimes.com/svc/semantic/v2/geocodes/query.json?', query_parameters, '&api-key=', API_key) # build the request URL for the JSON response

# download and parse the JSON, take its result list, and flatten any nested columns into one data frame
geo.data <- flatten(fromJSON(API_geo_url)$result, recursive = TRUE)
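`flatten()` only changes anything when `fromJSON()` has produced nested data frame columns. The sketch below uses a small hand-written JSON string; the `status`/`results` envelope and the nested `geocode` object are assumptions made for illustration, not the documented NYT response format (the coordinates are borrowed from the `head(geo.data)` output):

```r
library(jsonlite)

# Hypothetical response: the nested "geocode" object is invented to show what flatten() does
sample_json <- '{
  "status": "OK",
  "results": [
    {"concept_name": "Philadelphia (Pa)", "geocode": {"latitude": 39.95233, "longitude": -75.16379}},
    {"concept_name": "Nantucket (Mass)",  "geocode": {"latitude": 41.28346, "longitude": -70.09946}}
  ]
}'

parsed <- fromJSON(sample_json)   # the "results" array is simplified to a data frame
str(parsed$results)               # "geocode" arrives as a nested data frame column

flat <- flatten(parsed$results, recursive = TRUE)
names(flat)                       # nested fields become dotted column names
```

When no column is a nested data frame, `flatten()` returns its input unchanged. The column names in the output carry no dotted prefixes, which suggests the NYT results were already flat and the call acts as a safeguard here.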
head(geo.data)
##   concept_id                    concept_name geocode_id geoname_id
## 1      24012            Charlottesville (Va)       2840    4752031
## 2      28132               Philadelphia (Pa)        436    4560349
## 3      28848 San Juan National Forest (Colo)       7240    5437675
## 4      27356                Nantucket (Mass)       1312    4944903
## 5      71052                   Yamhill (Ore)       8680    5761959
## 6      27744                      Ohio River       3916    4401696
##                       name latitude  longitude elevation population
## 1          Charlottesville 38.02931  -78.47668       142      34703
## 2             Philadelphia 39.95233  -75.16379        12    1517550
## 3 San Juan National Forest 37.69166 -107.80895      3472         NA
## 4                Nantucket 41.28346  -70.09946        13      14775
## 5                  Yamhill 45.34150 -123.18733        60       1024
## 6               Ohio River 36.98672  -89.13062        87         NA
##   country_code  country_name admin_code1 admin_code2 admin_code3
## 1           US United States          VA         540          NA
## 2           US United States          PA         101          NA
## 3           US United States          CO         111          NA
## 4           US United States          MA         019          NA
## 5           US United States          OR         071          NA
## 6           US United States          MO         133          NA
##   admin_code4   admin_name1             admin_name2 admin_name3
## 1          NA      Virginia City of Charlottesville          NA
## 2          NA  Pennsylvania     Philadelphia County          NA
## 3          NA      Colorado         San Juan County          NA
## 4          NA Massachusetts        Nantucket County          NA
## 5          NA        Oregon          Yamhill County          NA
## 6          NA      Missouri      Mississippi County          NA
##   admin_name4 feature_class feature_code feature_code_name
## 1          NA             P          PPL   populated place
## 2          NA             P          PPL   populated place
## 3          NA             V         FRST         forest(s)
## 4          NA             P          PPL   populated place
## 5          NA             P          PPL   populated place
## 6          NA             H          STM            stream
##           time_zone_id dst_offset gmt_offset          geocodes_created
## 1     America/New_York         -4         -5 2013-02-25 15:10:12-05:00
## 2     America/New_York         -4         -5 2013-02-25 15:10:12-05:00
## 3     America/Shiprock         -6         -7 2013-02-25 15:10:12-05:00
## 4     America/New_York         -4         -5 2013-02-25 15:10:12-05:00
## 5  America/Los_Angeles         -7         -8 2013-02-25 15:10:12-05:00
## 6 America/Indiana/Knox         -5         -6 2013-02-25 15:10:12-05:00
##            geocodes_updated
## 1 2013-02-25 15:10:12-05:00
## 2 2013-02-25 15:10:12-05:00
## 3 2013-02-25 15:10:12-05:00
## 4 2013-02-25 15:10:12-05:00
## 5 2013-02-25 15:10:12-05:00
## 6 2013-02-25 15:10:12-05:00
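The `head()` output is wide. A sketch for narrowing it to populated places and a few columns with base R `subset()`; so that it runs without an API key, `geo_sample` is a hypothetical data frame rebuilt by hand from the six rows printed above (with the real `geo.data` the same calls apply):

```r
# Six rows copied from the head(geo.data) output above (subset of columns only)
geo_sample <- data.frame(
  name              = c("Charlottesville", "Philadelphia", "San Juan National Forest",
                        "Nantucket", "Yamhill", "Ohio River"),
  latitude          = c(38.02931, 39.95233, 37.69166, 41.28346, 45.34150, 36.98672),
  longitude         = c(-78.47668, -75.16379, -107.80895, -70.09946, -123.18733, -89.13062),
  population        = c(34703, 1517550, NA, 14775, 1024, NA),
  feature_code_name = c("populated place", "populated place", "forest(s)",
                        "populated place", "populated place", "stream"),
  stringsAsFactors  = FALSE
)

# Keep only populated places and a handful of columns for a compact view
places <- subset(geo_sample, feature_code_name == "populated place",
                 select = c(name, latitude, longitude, population))

places[order(-places$population), ]   # largest population first
```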