library(tidyverse)
library(kableExtra)
library(httr)
library(jsonlite)
library(rmongodb)
To reproduce this process, choose “Knit with Parameters…” and provide a value for the NYTimes api_key parameter.
The steps below illustrate the migration process.
The function mongoimport.df imports a data frame (table) into MongoDB by converting each row into a JSON document and inserting that document into the database.
Parameter List:
1. mongo - connection to MongoDB
2. ns - namespace "<db name>.<collection>"
3. df - data frame
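mongoimport.df <- function(mongo, ns, df) {
  # seq_len() yields an empty sequence for a zero-row data frame,
  # whereas 1:nrow(df) would iterate over c(1, 0)
  for (i in seq_len(nrow(df))) {
    # auto_unbox = TRUE stores each field as a scalar rather than a one-element array
    json <- toJSON(as.list(df[i, ]), auto_unbox = TRUE)
    b <- mongo.bson.from.JSON(json)
    mongo.insert(mongo, ns, b)
  }
}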
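To see what each inserted document looks like, the row-to-JSON step can be tried on its own, without a MongoDB connection. This is a minimal sketch: the data frame contents and the helper name row_to_json are made up for illustration, and auto_unbox = TRUE is a choice made here so that each field is written as a scalar rather than a one-element array.

```r
library(jsonlite)

# Hypothetical two-row data frame standing in for real query results
df <- data.frame(
  headline = c("First article", "Second article"),
  pub_date = c("2018-01-01", "2018-01-02"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: serialize row i of df as one JSON document string.
# auto_unbox = TRUE writes length-1 vectors as scalars instead of arrays.
row_to_json <- function(df, i) {
  as.character(toJSON(as.list(df[i, , drop = FALSE]), auto_unbox = TRUE))
}

# One JSON string per row; these are the strings that would be turned into BSON
docs <- vapply(seq_len(nrow(df)), function(i) row_to_json(df, i), character(1))
print(docs)
```

Each element of docs is a flat JSON object whose keys are the data frame's column names, which is the shape the import function hands to MongoDB.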
Such a function can be used to migrate from a relational database: run a SQL query, collect the results into a data frame, and pass that data frame to mongoimport.df to import it into MongoDB. For this assignment, however, instead of using a relational database, I reused a function I had built to query the NYTimes API for travel-related articles and return the search results as a data frame. This illustrates function reusability: by coding to a common data structure such as a data frame, the import function is agnostic as to where the data comes from.
The function searches the Travel section and the Travel news desk for articles matching the given query term(s), such as a country name. It is designed to return a data frame with 4 columns:
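The relational path mentioned above can be sketched as follows. This is only an illustration under stated assumptions: it uses the DBI and RSQLite packages with an in-memory SQLite database as a stand-in for a real relational source, and the table name, columns, and rows are invented. The final import call is left commented out because it needs a live MongoDB connection.

```r
library(DBI)

# In-memory SQLite database standing in for a real relational source (assumption)
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table with two rows
dbWriteTable(con, "articles", data.frame(
  headline = c("First article", "Second article"),
  pub_date = c("2018-01-01", "2018-01-02"),
  stringsAsFactors = FALSE
))

# Run a SQL query and collect the result set as a data frame
df.sql <- dbGetQuery(
  con,
  "SELECT headline, pub_date FROM articles ORDER BY pub_date DESC"
)
dbDisconnect(con)

print(df.sql)

# With a live MongoDB connection, the result could then be imported:
# mongoimport.df(mongo, "nytimes.travel", df.sql)
```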
1. headline
2. publication date
3. web url
4. snippet
nytimes.articleSearch.on.travel <- function(api_key = NA, qryTermTravelPlace = "New York", begin_date = "yyyymmdd", end_date = "yyyymmdd") {
  baseUrl <- "https://api.nytimes.com/svc/search/v2/articlesearch.json"

  # Restrict the search to the Travel section or news desk and request only the needed fields
  baseUrlParam <- URLencode(
    sprintf("?fq=section_name:(\"Travel\") OR news_desk:(\"Travel\")&fl=%s&api-key=%s",
            "web_url,snippet,headline,pub_date,print_page", api_key))
  qryTermParam <- URLencode(sprintf("&q=%s", qryTermTravelPlace))

  # Add the optional date filters only when real dates are supplied
  qryBeginDate <- ""
  if (!is.na(begin_date) && begin_date != "yyyymmdd") {
    qryBeginDate <- sprintf("&begin_date=%s", begin_date)
  }
  qryEndDate <- ""
  if (!is.na(end_date) && end_date != "yyyymmdd") {
    qryEndDate <- sprintf("&end_date=%s", end_date)
  }

  # Call the API and parse the JSON body
  qryResult <- GET(paste0(baseUrl, baseUrlParam, qryTermParam, qryBeginDate, qryEndDate))
  df.content <- fromJSON(content(qryResult, "text", encoding = "UTF-8"))

  # Flatten the nested response into a four-column data frame
  df.on.Travel <- data.frame(
    headline = df.content[["response"]][["docs"]][["headline"]][["main"]],
    pub_date = df.content[["response"]][["docs"]][["pub_date"]],
    web_url = df.content[["response"]][["docs"]][["web_url"]],
    snippet = df.content[["response"]][["docs"]][["snippet"]],
    stringsAsFactors = FALSE
  )
  return(df.on.Travel)
}
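The two less obvious steps in the function, assembling the encoded request URL and flattening the nested response, can be tried offline. The sketch below is illustrative only: DEMO_KEY is a placeholder rather than a real key, the query term is encoded together with the rest of the query string for brevity, and sample_body is a made-up, heavily trimmed response that only mimics the nesting the function relies on.

```r
library(jsonlite)

# 1. Request URL assembly with a placeholder key (DEMO_KEY is not a real key)
base_url <- "https://api.nytimes.com/svc/search/v2/articlesearch.json"
query <- utils::URLencode(sprintf(
  "?fq=section_name:(\"Travel\") OR news_desk:(\"Travel\")&fl=%s&api-key=%s&q=%s",
  "web_url,snippet,headline,pub_date", "DEMO_KEY", "Costa Rica"
))
request_url <- paste0(base_url, query)
print(request_url)

# 2. Hypothetical response body with the same nesting: response -> docs -> headline -> main
sample_body <- '{
  "response": {
    "docs": [
      {"headline": {"main": "First headline"},
       "pub_date": "2018-01-05T20:02:42+0000",
       "web_url": "https://example.com/a",
       "snippet": "First snippet"},
      {"headline": {"main": "Second headline"},
       "pub_date": "2018-01-04T18:50:09+0000",
       "web_url": "https://example.com/b",
       "snippet": "Second snippet"}
    ]
  }
}'

# fromJSON() simplifies the docs array into a data frame, one row per article;
# the nested headline object becomes a data frame column with a "main" field
parsed <- fromJSON(sample_body)
docs <- parsed[["response"]][["docs"]]

articles <- data.frame(
  headline = docs[["headline"]][["main"]],
  pub_date = docs[["pub_date"]],
  web_url = docs[["web_url"]],
  snippet = docs[["snippet"]],
  stringsAsFactors = FALSE
)
print(articles)
```

Because jsonlite turns the array of article objects into a data frame, each chained [[ ]] lookup returns a whole column, which is why the function can build its result without looping over articles.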
df.nytimes.Travel <- nytimes.articleSearch.on.travel(api.key, "Costa Rica") %>% arrange(desc(pub_date))
kable_styling(knitr::kable(df.nytimes.Travel, "html", caption = "Travel Search Results for Costa Rica"), bootstrap_options = "striped")
| headline | pub_date | web_url | snippet |
|---|---|---|---|
| Sustainable Travel: It’s Not Just About the Environment | 2018-04-13T09:00:12+0000 | https://www.nytimes.com/2018/04/13/travel/sustainable-travel.html | A look at tours and programs that address the impact travelers have on the communities they visit. |
| Place 7 of 52: In Peru, a Still-Hidden Alternative to Machu Picchu | 2018-04-03T09:00:06+0000 | https://www.nytimes.com/2018/04/03/travel/kuelap-peru-ruins-52-places.html | The pre-Incan ruins of Kuélap share similarities with their more famous cousin. But getting there can be a herculean effort. |
| Place 6 of 52: On the Costa Rican Coast, Finding Fun by Escaping Exclusivity | 2018-03-27T09:00:08+0000 | https://www.nytimes.com/2018/03/27/travel/peninsula-papagayo-costa-rica-52-places.html | Peninsula Papagayo is where the ultrarich go to avoid having to interact with the regular rich. But our 52 Places columnist finds a way to get away from the traps of luxury. |
| Celebrating International Women’s Day With Free Hotel Amenities | 2018-02-27T10:00:43+0000 | https://www.nytimes.com/2018/02/27/travel/international-womens-day-hotel-deals.html | A range of activities that allow female travelers to connect are being offered free of charge at hotels in celebration of International Women’s Day. |
| After a Tragedy, Making the Case for Costa Rica | 2018-01-05T20:02:42+0000 | https://www.nytimes.com/2018/01/05/travel/costa-rica-crash-ecotravel.html | The tragedy of the crash is particularly difficult to reconcile with the remarkable beauty of Costa Rica, a place I wouldn’t hesitate to return to. |
| After a Plane Crash, Questions About Travel in Costa Rica | 2018-01-04T18:50:09+0000 | https://www.nytimes.com/2018/01/04/travel/costa-rica-air-travel-safety.html | The country is a popular ecotourism destination for Americans but the fatal crash of a charter flight has prompted inquiries about transportation alternatives. |
| In the Rainy Season, Savings in Asia and Central America | 2017-09-06T13:59:10+0000 | https://www.nytimes.com/2017/09/06/travel/rainy-season-savings.html | Resorts in Cambodia, Thailand, Costa Rica and Guatemala are offering packages and lower prices in what is sometimes called the green season. |
| On the Costa Rican Coast, Finding Pura Vida on a Budget | 2017-05-31T09:00:35+0000 | https://www.nytimes.com/2017/05/31/travel/costa-rica-eco-travel-budget-drake-bay.html | In Drake Bay, it’s possible to truly disconnect. Better yet, I was able to relish this enjoyable tropical getaway for a very reasonable sum. |
| Five Ways to be a Savvy Medical Tourist and Enjoy a Vacation | 2017-03-08T10:00:21+0000 | https://www.nytimes.com/2017/03/08/travel/five-ways-to-be-a-medical-dental-tourist-vacation.html | When costly dental work became unavoidable, the Frugal Family decided to get it done abroad and let the savings pay for a tropical week in Thailand. |
| In Costa Rica, Photographing Jaguars to Help Save Them | 2016-08-11T10:00:04+0000 | https://www.nytimes.com/2016/08/14/travel/costa-rica-eco-tourism.html | Conservationists and eco-lodges in Costa Rica are using camera “traps” to record jaguars and assess the environment. |
Establish connection to MongoDB
mongo <- mongo.create(host = "localhost")
# test if connection is active
mongo.is.connected(mongo)
## [1] TRUE
Import the data
ns <- "nytimes.travel"
mongoimport.df(mongo, ns, df.nytimes.Travel)
Inspect Imported Documents using the MongoDB Compass (admin tool)
Advantages and Disadvantages of SQL vs. NoSQL databases
- SQL databases enforce ACID (Atomicity, Consistency, Isolation, Durability) compliance.
- NoSQL databases focus on better response time and scale (availability and performance).
- SQL can be an easier syntax to work with than the JSON-based or Java-based query interfaces used by NoSQL databases.
- NoSQL databases provide more flexibility in terms of structure, so development can be faster.
- SQL databases help design and enforce “normalized” data.
NYTimes Attribution Requirement
*The logo links directly to http://developer.nytimes.com*