library(tidyverse)
library(kableExtra)
library(httr)
library(jsonlite)
library(rmongodb)

For this assignment I chose to use MongoDB for NoSQL data migration.

To reproduce this process, choose “Knit with Parameters…” and provide a value for the NYTimes api_key parameter.

The steps below illustrate the migration process.


Create a function for migrating data into a MongoDB database

The function mongoimport.df imports a given data frame (table) into MongoDB by converting each row into a JSON document and inserting that document into the database.
Parameter List:
1. mongo - connection to MongoDB
2. ns - namespace "<db name>.<collection>"
3. df - data frame

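mongoimport.df <- function(mongo, ns, df) {
  # seq_len() skips the loop for an empty data frame; 1:0 would iterate over c(1, 0)
  for (i in seq_len(nrow(df))) {
    # auto_unbox = TRUE stores length-1 values as scalars, not 1-element arrays
    json <- toJSON(as.list(df[i, ]), auto_unbox = TRUE)
    b <- mongo.bson.from.JSON(json)
    mongo.insert(mongo, ns, b)
  }
}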
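
The row-to-document step can be tried on its own, without a database. The sketch below uses a hypothetical one-row data frame (placeholder values, not real search results) and shows that jsonlite's toJSON() writes length-1 values as one-element arrays unless auto_unbox = TRUE is passed, which affects the shape of the documents that end up in MongoDB.

```r
library(jsonlite)

# Hypothetical one-row data frame standing in for a query result
df <- data.frame(
  headline = "Example headline",
  pub_date = "2018-01-05",
  stringsAsFactors = FALSE
)

# One row as a named list: one element per column
row <- as.list(df[1, ])

# Default: every length-1 value is serialized as a 1-element array
json.boxed <- toJSON(row)

# auto_unbox = TRUE: length-1 values become plain JSON scalars
json.unboxed <- toJSON(row, auto_unbox = TRUE)

print(json.boxed)
print(json.unboxed)
```

The unboxed form is usually the one wanted for document fields such as a headline or a URL, since each field then holds a single string rather than an array.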

A function like this can be used to migrate from a relational database: first run a SQL query and collect the results in a data frame, then pass that data frame to mongoimport.df to import it into MongoDB. For this assignment, however, instead of using a relational database, I reused a function I had built to query the NYTimes API for travel-related articles and return the search results as a data frame. This illustrates function reusability: by coding to a common data structure such as a data frame, the import step is agnostic as to where the data comes from.
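
For the relational case, a minimal sketch of the "SQL result into a data frame" step might look like the following. It assumes the DBI and RSQLite packages and uses an in-memory SQLite database with a hypothetical articles table; neither is part of this assignment, and any DBI-compatible driver would play the same role.

```r
library(DBI)

# Assumption: an in-memory SQLite database stands in for the relational source
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table, created here only so the query has something to return
dbWriteTable(con, "articles", data.frame(
  headline = c("First example", "Second example"),
  pub_date = c("2018-01-04", "2018-01-05"),
  stringsAsFactors = FALSE
))

# Run the SQL and collect the result set as a data frame
df.sql <- dbGetQuery(
  con,
  "SELECT headline, pub_date FROM articles ORDER BY pub_date DESC"
)
dbDisconnect(con)

str(df.sql)

# With an open rmongodb connection, the same import function would apply
# (the namespace below is a placeholder):
# mongoimport.df(mongo, "mydb.articles", df.sql)
```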


Create an R wrapper function for the NYTimes Article Search API

The function searches the Travel section and news desk for articles matching the given query term(s), such as a country name. It returns a data frame with 4 columns:
1. headline
2. publication date
3. web url
4. snippet

nytimes.articleSearch.on.travel <- function(api_key = NA, qryTermTravelPlace = "New York", begin_date = "yyyymmdd", end_date = "yyyymmdd") {
  baseUrl <- "https://api.nytimes.com/svc/search/v2/articlesearch.json"
  
  baseUrlParam <- URLencode(
    sprintf("?fq=section_name:(\"Travel\") OR news_desk:(\"Travel\")&fl=%s&api-key=%s",
            "web_url,snippet,headline,pub_date,print_page", api_key))
  
  qryTermParam <- URLencode(sprintf("&q=%s", qryTermTravelPlace))
  
  qryBeginDate <- ""
  if (begin_date != "yyyymmdd" && !is.na(begin_date)) {
    qryBeginDate <- sprintf("&begin_date=%s", begin_date)
  }
  
  qryEndDate <- ""
  if (end_date != "yyyymmdd" && !is.na(end_date)) {
    qryEndDate <- sprintf("&end_date=%s", end_date)
  }
  
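  qryResult <- GET(paste0(baseUrl, baseUrlParam, qryTermParam, qryBeginDate, qryEndDate))
  # stop with the HTTP error (e.g. invalid API key) rather than parsing an error payload
  stop_for_status(qryResult)

  # state the encoding explicitly so content() does not have to guess it
  df.content <- fromJSON(content(qryResult, "text", encoding = "UTF-8"))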
  
  df.on.Travel <- data.frame(
    headline = df.content[["response"]][["docs"]][["headline"]][["main"]],
    pub_date = df.content[["response"]][["docs"]][["pub_date"]],
    web_url = df.content[["response"]][["docs"]][["web_url"]],
    snippet = df.content[["response"]][["docs"]][["snippet"]],
    stringsAsFactors = FALSE
  )
  
  return(df.on.Travel)
}
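
To see how the nested response turns into the four columns, the parsing step can be run offline. The sketch below uses a hypothetical, heavily trimmed response body with placeholder values; only the response, docs, and headline.main nesting is taken from the wrapper above, and the real API returns many more fields.

```r
library(jsonlite)

# Hypothetical response body: placeholder values, only the fields the wrapper reads
sample.body <- '{
  "response": {
    "docs": [
      {"headline": {"main": "First example"},
       "pub_date": "2018-01-04T18:50:09+0000",
       "web_url": "https://example.com/first",
       "snippet": "First placeholder snippet."},
      {"headline": {"main": "Second example"},
       "pub_date": "2018-01-05T20:02:42+0000",
       "web_url": "https://example.com/second",
       "snippet": "Second placeholder snippet."}
    ]
  }
}'

parsed <- fromJSON(sample.body)
docs <- parsed[["response"]][["docs"]]

# fromJSON() simplifies the array of objects into a data frame; the nested
# "headline" object becomes a data-frame column with a "main" column inside it
df.sample <- data.frame(
  headline = docs[["headline"]][["main"]],
  pub_date = docs[["pub_date"]],
  web_url = docs[["web_url"]],
  snippet = docs[["snippet"]],
  stringsAsFactors = FALSE
)

str(df.sample)
```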

Retrieve the travel search data using the NYTimes API and import the results into MongoDB

# the API key comes from the api_key parameter supplied via "Knit with Parameters..."
df.nytimes.Travel <- nytimes.articleSearch.on.travel(params$api_key, "Costa Rica") %>% arrange(desc(pub_date))
kable_styling(knitr::kable(df.nytimes.Travel, "html", caption = "Travel Search Results for Costa Rica"), bootstrap_options = "striped")
Travel Search Results for Costa Rica

| headline | pub_date | web_url | snippet |
| --- | --- | --- | --- |
| Sustainable Travel: It’s Not Just About the Environment | 2018-04-13T09:00:12+0000 | https://www.nytimes.com/2018/04/13/travel/sustainable-travel.html | A look at tours and programs that address the impact travelers have on the communities they visit. |
| Place 7 of 52: In Peru, a Still-Hidden Alternative to Machu Picchu | 2018-04-03T09:00:06+0000 | https://www.nytimes.com/2018/04/03/travel/kuelap-peru-ruins-52-places.html | The pre-Incan ruins of Kuélap share similarities with their more famous cousin. But getting there can be a herculean effort. |
| Place 6 of 52: On the Costa Rican Coast, Finding Fun by Escaping Exclusivity | 2018-03-27T09:00:08+0000 | https://www.nytimes.com/2018/03/27/travel/peninsula-papagayo-costa-rica-52-places.html | Peninsula Papagayo is where the ultrarich go to avoid having to interact with the regular rich. But our 52 Places columnist finds a way to get away from the traps of luxury. |
| Celebrating International Women’s Day With Free Hotel Amenities | 2018-02-27T10:00:43+0000 | https://www.nytimes.com/2018/02/27/travel/international-womens-day-hotel-deals.html | A range of activities that allow female travelers to connect are being offered free of charge at hotels in celebration of International Women’s Day. |
| After a Tragedy, Making the Case for Costa Rica | 2018-01-05T20:02:42+0000 | https://www.nytimes.com/2018/01/05/travel/costa-rica-crash-ecotravel.html | The tragedy of the crash is particularly difficult to reconcile with the remarkable beauty of Costa Rica — a place I wouldn’t hesitate to return to. |
| After a Plane Crash, Questions About Travel in Costa Rica | 2018-01-04T18:50:09+0000 | https://www.nytimes.com/2018/01/04/travel/costa-rica-air-travel-safety.html | The country is a popular ecotourism destination for Americans — but the fatal crash of a charter flight has prompted inquiries about transportation alternatives. |
| In the Rainy Season, Savings in Asia and Central America | 2017-09-06T13:59:10+0000 | https://www.nytimes.com/2017/09/06/travel/rainy-season-savings.html | Resorts in Cambodia, Thailand, Costa Rica and Guatemala are offering packages and lower prices in what is sometimes called the green season. |
| On the Costa Rican Coast, Finding Pura Vida on a Budget | 2017-05-31T09:00:35+0000 | https://www.nytimes.com/2017/05/31/travel/costa-rica-eco-travel-budget-drake-bay.html | In Drake Bay, it’s possible to truly disconnect. Better yet, I was able to relish this enjoyable tropical getaway for a very reasonable sum. |
| Five Ways to be a Savvy Medical Tourist and Enjoy a Vacation | 2017-03-08T10:00:21+0000 | https://www.nytimes.com/2017/03/08/travel/five-ways-to-be-a-medical-dental-tourist-vacation.html | When costly dental work became unavoidable, the Frugal Family decided to get it done abroad and let the savings pay for a tropical week in Thailand. |
| In Costa Rica, Photographing Jaguars to Help Save Them | 2016-08-11T10:00:04+0000 | https://www.nytimes.com/2016/08/14/travel/costa-rica-eco-tourism.html | Conservationists and eco-lodges in Costa Rica are using camera “traps” to record jaguars and assess the environment. |

Establish connection to MongoDB

mongo <- mongo.create(host = "localhost")
# test if connection is active
mongo.is.connected(mongo)
## [1] TRUE

Import the data

ns <- "nytimes.travel"
mongoimport.df(mongo, ns, df.nytimes.Travel)

Inspect the imported documents using MongoDB Compass (admin tool)
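
Where Compass is not at hand, a rough equivalent check can be run from R. This is only a sketch: it assumes a MongoDB server on localhost and the nytimes.travel namespace used in the import step, and it relies on rmongodb's mongo.count, mongo.find.one, and mongo.bson.to.list to report the document count and show one stored document.

```r
library(rmongodb)

# Assumes a MongoDB server on localhost, as in the connection step
mongo <- mongo.create(host = "localhost")
ns <- "nytimes.travel"

if (mongo.is.connected(mongo)) {
  # Number of documents currently in the collection
  print(mongo.count(mongo, ns))

  # Fetch one document (NULL if the collection is empty) and
  # convert it from BSON to an R list for inspection
  doc <- mongo.find.one(mongo, ns)
  if (!is.null(doc)) {
    str(mongo.bson.to.list(doc))
  }

  # Release the connection when done
  mongo.destroy(mongo)
}
```

If the import ran once against an empty collection, the count should match the number of rows in the data frame; running the import again adds duplicates, because each insert creates a new document.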


Conclusion

Advantages and Disadvantages of SQL vs. NoSQL databases
- SQL databases enforce ACID (Atomicity, Consistency, Isolation and Durability) compliance
- NoSQL databases focus on better response time and scale (availability and performance)
- SQL can be an easier syntax to work with than the JSON-based, Java-based, and other interfaces used by NoSQL databases
- NoSQL provides more flexibility in terms of structure, so development can be faster
- SQL databases help design and enforce “normalized” data

NYTimes Attribution Requirement
*The logo links directly to http://developer.nytimes.com*