For this assignment, you should take information from a relational database and migrate it to a NoSQL database of your own choosing.
For the relational database, you might use the flights database, the tb database, the “data skills” database your team created for Project 3, or another database of your own choosing or creation.
For the NoSQL database, you may use MongoDB (which we introduced in week 7), Neo4j, or another NoSQL database of your choosing.
Your migration process needs to be reproducible. R code is encouraged, but not required. You should also briefly describe the advantages and disadvantages of storing the data in a relational database vs. your NoSQL database.
Install packages for MySQL DB
library(RMySQL)
## Warning: package 'RMySQL' was built under R version 3.3.3
## Loading required package: DBI
## Warning: package 'DBI' was built under R version 3.3.2
#install.packages('rJava')
Connect to MySQL Instance
mydb <- dbConnect(MySQL(), user='root', password='meZt8pkaA', host='localhost')
dbSendQuery(mydb, "USE tb")
## <MySQLResult:341708704,0,0>
Let’s store the table in tb database into a dataframe
tb_df<-dbGetQuery(mydb,"SELECT * FROM tb;")
tb_df_10 <- head(tb_df, 10)
tb_df_10
## country year sex child adult elderly
## 1 Afghanistan 1995 female NA NA NA
## 2 Afghanistan 1995 male NA NA NA
## 3 Afghanistan 1996 female NA NA NA
## 4 Afghanistan 1996 male NA NA NA
## 5 Afghanistan 1997 female 5 96 1
## 6 Afghanistan 1997 male 0 26 0
## 7 Afghanistan 1998 female 45 1142 20
## 8 Afghanistan 1998 male 30 500 41
## 9 Afghanistan 1999 female 25 484 8
## 10 Afghanistan 1999 male 8 212 8
Let’s close and disconnect from MySQL database, and also detach the RMySQL package
dbDisconnect(mydb)
## [1] TRUE
detach("package:RMySQL", unload=TRUE)
Start the MongoDB Server on your system by running the below command in the command prompt
“C:Files.4.exe”
Once the MongoDB has started with waiting message proceed to install R packages Install Packagees
library(devtools)
## Warning: package 'devtools' was built under R version 3.3.3
install_github(repo = "mongosoup/rmongodb")
## Skipping install of 'rmongodb' from a github remote, the SHA1 (8eb2bca2) has not changed since last install.
## Use `force = TRUE` to force installation
library(rJava)
## Warning: package 'rJava' was built under R version 3.3.2
library(RMongo)
## Warning: package 'RMongo' was built under R version 3.3.3
##
## Attaching package: 'RMongo'
## The following objects are masked from 'package:DBI':
##
## dbDisconnect, dbGetQuery
library(jsonlite)
## Warning: package 'jsonlite' was built under R version 3.3.3
library(stringr)
## Warning: package 'stringr' was built under R version 3.3.3
library(rmongodb)
# connect to MongoDB
mongo = mongo.create(host = "localhost",db ="tb")
#Confirm sucessful connection
mongo.is.connected(mongo)
## [1] TRUE
mongo.get.databases(mongo)
## [1] "tb"
In this section we would try to upload the dataframe into MongoDB. This assumes you have installed the MongoDB
#Convert Dataframe into BSON
tb_list=mongo.bson.from.df(tb_df)
b = mongo.bson.from.list(tb_list)
#insert BSON into MongoDB collection
mongo.insert(mongo,"tb.tb_collection1",b)
## [1] TRUE
#confirm we have data in the collection
mongo.count(mongo, "tb.tb_collection1")
## [1] 2
Query the collection that we just inserted
tmp = mongo.find.one(mongo, ns = "tb.tb_collection1")
Display some Mongo data
tb_list_10=mongo.bson.from.df(tb_df_10)
b_10 = mongo.bson.from.list(tb_list_10)
mongo.insert(mongo,"tb.tb_collection10",b_10)
## [1] TRUE
mongo.find.one(mongo, ns = "tb.tb_collection10")
## _id : 7 5906884756b76e730ddbff8b
## 1 : 3
## country : 2 Afghanistan
## year : 16 1995
## sex : 2 female
## child : 10 BSON_NULL
## adult : 10 BSON_NULL
## elderly : 10 BSON_NULL
##
## 2 : 3
## country : 2 Afghanistan
## year : 16 1995
## sex : 2 male
## child : 10 BSON_NULL
## adult : 10 BSON_NULL
## elderly : 10 BSON_NULL
##
## 3 : 3
## country : 2 Afghanistan
## year : 16 1996
## sex : 2 female
## child : 10 BSON_NULL
## adult : 10 BSON_NULL
## elderly : 10 BSON_NULL
##
## 4 : 3
## country : 2 Afghanistan
## year : 16 1996
## sex : 2 male
## child : 10 BSON_NULL
## adult : 10 BSON_NULL
## elderly : 10 BSON_NULL
##
## 5 : 3
## country : 2 Afghanistan
## year : 16 1997
## sex : 2 female
## child : 16 5
## adult : 16 96
## elderly : 16 1
##
## 6 : 3
## country : 2 Afghanistan
## year : 16 1997
## sex : 2 male
## child : 16 0
## adult : 16 26
## elderly : 16 0
##
## 7 : 3
## country : 2 Afghanistan
## year : 16 1998
## sex : 2 female
## child : 16 45
## adult : 16 1142
## elderly : 16 20
##
## 8 : 3
## country : 2 Afghanistan
## year : 16 1998
## sex : 2 male
## child : 16 30
## adult : 16 500
## elderly : 16 41
##
## 9 : 3
## country : 2 Afghanistan
## year : 16 1999
## sex : 2 female
## child : 16 25
## adult : 16 484
## elderly : 16 8
##
## 10 : 3
## country : 2 Afghanistan
## year : 16 1999
## sex : 2 male
## child : 16 8
## adult : 16 212
## elderly : 16 8
Lets confirm that the data can be queried in MongoDB
library(RMongo)
library(knitr)
query = mongo.bson.buffer.create()
mongo.bson.buffer.append(query, 'country', c("Afghanistan","Algeria"))
## [1] TRUE
result =mongo.bson.from.buffer(query)
result
## country : 4
## 0 : 2 Afghanistan
## 1 : 2 Algeria
Advantages of storing Data in Relational Database 1. Minimize redudancy of data. Data could be stored once and maintained at one place but referenced many other places 2. It prevents data becoming out of sync. Database Atomicity property, you can either have a debit or a credit 3. It helps prevent bad data like a phone number that has alphabets 4. Relational database can perform certain numeric computations at a faster rate than NOSQL 5. Requires less processing effort
Advantages of NOSQL over Relational Database 1. When you are deal with unstructured data like documents, photos, videos 2. It offers much more flexibility in data storage 3. Storage structure is intuitive for example a blog post like with pictures, videos, and likes can be stored in one place 4. NoSQL have a better relationship with object oriented programing langugaes 5. NoSQL makes the most out of cloud computing 6. Rapid development is encouraged in NOSQL shop due to the design or architecture is not needed