For Project 4, you should take information from a relational database and migrate it to a NoSQL database of your own choosing.
For the relational database, you might use the flights database, the tb database, the “data skills” database your team created for Project 3, or another database of your own choosing or creation.
For the NoSQL database, you may use MongoDB, Neo4j (which we introduce in Week 12), or another NoSQL database of your choosing.
Your migration process needs to be reproducible. R code is encouraged, but not required. You should also briefly describe the advantages and disadvantages of storing the data in a relational database vs. your NoSQL database.
I watched the videos that were posted on how to import data into MongoDB using mongoimport. It took several attempts, but I was able to connect. The one additional step required was creating the folder c:/data/db, which is MongoDB's default data directory. Once this folder was created, its permissions also had to be updated, because the default permissions did not allow full control.
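The export half of the mongoimport route can be scripted so the migration stays reproducible. The following is a minimal sketch in base R, assuming the airports table has already been read from the relational database into a data frame. Here that data frame is a one-row stand-in built from the first airport in the str() output further down, and the file name airports.csv is hypothetical. The CSV is written to R's temporary directory so the sketch leaves nothing behind; in practice you would point it at a permanent path. The mongoimport call is left as a comment because it runs from a shell on a machine with MongoDB installed.

```r
# One-row stand-in for the airports table pulled from the relational database
# (values copied from the first record in the str() output; hypothetical subset).
airports_sql <- data.frame(
  faa  = "06C",
  name = "Schaumburg Regional",
  lat  = 41.9893408,
  lon  = -88.1012428,
  alt  = 801,
  tz   = -6,
  dst  = "A",
  stringsAsFactors = FALSE
)

# Write a CSV with a header row and no row names, which is the layout
# mongoimport expects when called with --type csv --headerline.
csv_path <- file.path(tempdir(), "airports.csv")
write.csv(airports_sql, csv_path, row.names = FALSE)

# Then, from a shell (not run here):
# mongoimport --db flights --collection airports --type csv --headerline --file <path to airports.csv>
```

Writing the CSV from R rather than exporting it by hand means the whole relational-to-MongoDB path can be rerun from a script.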
I also found an rmongodb tutorial; the steps below apply it to the flights database.
Load libraries
library(devtools)
## Warning: package 'devtools' was built under R version 3.2.5
library(plyr)
## Warning: package 'plyr' was built under R version 3.2.4
library(RCurl)
## Loading required package: bitops
library(rjson)
library(rmongodb)
Connect to MongoDB
mongo = mongo.create(host = "localhost")
mongo.is.connected(mongo)
## [1] TRUE
What’s in MongoDB
mongo.get.databases(mongo)
## [1] "flights"
mongo.get.database.collections(mongo, db = "flights")
## character(0)
DBNS = "flights.airports"
mongo.count(mongo, ns = DBNS)
## [1] 1397
Query the data
#tmp = mongo.find.one(mongo, ns = DBNS)
#tmp
#tmp = mongo.bson.to.list(tmp)
#class(tmp)
#names(tmp)
#tmp
#find_all <- mongo.find.all(mongo, ns = DBNS)
#length(find_all)  # mongo.find.all() returns a list, so nrow() would give NULL
Create an empty data frame
airports = data.frame(stringsAsFactors = FALSE)  # rows are appended in the loop below
Create the namespace (database.collection)
DBNS = "flights.airports"
Replicate SQL's SELECT * by creating a cursor that iterates over the whole collection
cursor = mongo.find(mongo, DBNS)
Create the counter
i = 1  # counter (not referenced by the loop below)
Iterate over the cursor:
1. Grab the next record
2. Make it a data frame
3. Bind it to the master data frame
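while (mongo.cursor.next(cursor)) {
  # convert the current BSON document to an R list
  tmp = mongo.bson.to.list(mongo.cursor.value(cursor))
  # flatten to a one-row data frame (unlist() makes every value character)
  tmp.df = as.data.frame(t(unlist(tmp)), stringsAsFactors = FALSE)
  # append, filling columns missing on either side with NA
  airports = rbind.fill(airports, tmp.df)
}
mongo.cursor.destroy(cursor)  # release the cursor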
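Calling rbind.fill() inside the loop copies the growing data frame on every iteration, which is fine for 1,397 airports but scales quadratically on larger collections. A common alternative is to collect each record in a list and bind once at the end. The sketch below uses base R only; the two hand-written records are hypothetical stand-ins for what mongo.bson.to.list(mongo.cursor.value(cursor)) returns, and the second one deliberately omits name to show that missing fields are padded with NA, as rbind.fill() does.

```r
# Hypothetical stand-ins for the lists returned by
# mongo.bson.to.list(mongo.cursor.value(cursor)); the second lacks "name".
records <- list(
  list(faa = "06C", name = "Schaumburg Regional", alt = 801),
  list(faa = "06A", alt = 264)
)

# Union of all field names, so every row gets the same columns.
all_fields <- unique(unlist(lapply(records, names)))

# Turn each record into a one-row data frame, using NA for absent fields.
rows <- lapply(records, function(rec) {
  vals <- lapply(all_fields, function(f) if (is.null(rec[[f]])) NA else rec[[f]])
  names(vals) <- all_fields
  as.data.frame(vals, stringsAsFactors = FALSE)
})

# Bind everything in a single call instead of once per iteration.
airports_fast <- do.call(rbind, rows)
```

In the real loop, the body would only append records[[length(records) + 1]] <- mongo.bson.to.list(mongo.cursor.value(cursor)), and the conversion and binding would run once after the cursor is exhausted.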
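Check to see what we have. This output was generated with the arguments misspelled as strinASFactors (in the data.frame() call) and stringAsFactors (in the loop). The first typo adds a stray strinASFactors column plus an all-NA first row, giving 1398 rows for 1397 documents; the second is silently ignored, so R 3.x converted every column to a factor.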
dim(airports)
## [1] 1398 9
str(airports)
## 'data.frame': 1398 obs. of 9 variables:
## $ strinASFactors: logi FALSE NA NA NA NA NA ...
## $ _id : Factor w/ 159 levels "74905224","-10",..: NA 1 2 3 4 3 5 6 7 8 ...
## $ faa : Factor w/ 1396 levels "06C","06A","04G",..: NA 1 2 3 4 5 6 7 8 9 ...
## $ name : Factor w/ 1381 levels "Schaumburg Regional",..: NA 1 2 3 4 5 6 7 8 9 ...
## $ lat : Factor w/ 1395 levels "41.9893408","32.4605722",..: NA 1 2 3 4 5 6 7 8 9 ...
## $ lon : Factor w/ 1397 levels "-88.1012428",..: NA 1 2 3 4 5 6 7 8 9 ...
## $ alt : Factor w/ 889 levels "801","264","1044",..: NA 1 2 3 4 5 6 7 8 9 ...
## $ tz : Factor w/ 12 levels "-6","-5","-4",..: NA 1 2 2 3 2 2 2 4 2 ...
## $ dst : Factor w/ 3 levels "A","U","N": NA 1 1 1 1 1 2 1 1 1 ...
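Because unlist() turns every field into character, lat, lon, alt and tz arrive as factors (or character) rather than numbers and need converting before any analysis. Calling as.numeric() directly on a factor returns the internal level codes, so the values have to pass through as.character() first. The sketch below assumes a data frame shaped like the str() output above; airports_demo and its two rows are hypothetical stand-ins, and it also drops the stray strinASFactors column and the all-NA row.

```r
# Hypothetical stand-in shaped like the airports output above:
# a stray strinASFactors column, a leading all-NA row, and factor columns.
airports_demo <- data.frame(
  strinASFactors = c(FALSE, NA),
  faa = factor(c(NA, "06C")),
  lat = factor(c(NA, "41.9893408")),
  lon = factor(c(NA, "-88.1012428")),
  alt = factor(c(NA, "801")),
  tz  = factor(c(NA, "-6"))
)

# Drop the stray column, then the leftover row that has no airport code.
airports_demo$strinASFactors <- NULL
airports_demo <- airports_demo[!is.na(airports_demo$faa), ]

# Factor -> character -> numeric, so the labels (not the level codes) are parsed.
num_cols <- c("lat", "lon", "alt", "tz")
airports_demo[num_cols] <- lapply(airports_demo[num_cols],
                                  function(x) as.numeric(as.character(x)))
airports_demo$faa <- as.character(airports_demo$faa)
```

The as.numeric(as.character(x)) pattern is safe whether a column is a factor or already character, so the same lines work with or without the stringsAsFactors spelling fixed.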