For this assignment, you should take information from a relational database and migrate it to a NoSQL database of your own choosing.
For the relational database, you might use the flights database, the tb database, the “data skills” database your team created for Project 3, or another database of your own choosing or creation.
For the NoSQL database, you may use MongoDB (which we introduced in week 7), Neo4j, or another NoSQL database of your choosing.
Your migration process needs to be reproducible. R code is encouraged, but not required. You should also briefly describe the advantages and disadvantages of storing the data in a relational database vs. your NoSQL database.
# Different kinds of packages that will be used in this project put in one place.
library(RMySQL) #connecting to MySQL and performing db function
## Warning: package 'RMySQL' was built under R version 3.5.1
## Loading required package: DBI
## Warning: package 'DBI' was built under R version 3.5.1
library(mongolite) #connecting to MongoDB
## Warning: package 'mongolite' was built under R version 3.5.1
library(stringr) #string functions
## Warning: package 'stringr' was built under R version 3.5.1
I used a website to download the sample dataset which has 10,000 records of different employees.
https://www.sample-videos.com/download-sample-sql.php
Then, I downloaded the .sql file and ran the queries on MYSQL Workbench.
Lets connect to MYSQL
# connecting to MYSQL using my username and password
mydb <- dbConnect(MySQL(), user='root', password='everest8848', dbname = 'test', host='localhost')
dbSendQuery(mydb, 'use test')
## <MySQLResult:0,0,0>
To store database for further processing
#storing data
emp_df <- dbGetQuery(mydb, 'select * from user_details')
#to make sure data is correct
emp_df_25 <- head(emp_df, 25)
emp_df_25
## user_id username first_name last_name gender
## 1 1 rogers63 david john Female
## 2 2 mike28 rogers paul Male
## 3 3 rivera92 david john Male
## 4 4 ross95 maria sanders Male
## 5 5 paul85 morris miller Female
## 6 6 smith34 daniel michael Female
## 7 7 james84 sanders paul Female
## 8 8 daniel53 mark mike Male
## 9 9 brooks80 morgan maria Female
## 10 10 morgan65 paul miller Female
## 11 11 sanders84 david miller Female
## 12 12 maria40 chrishaydon bell Female
## 13 13 brown71 michael brown Male
## 14 14 james63 morgan james Male
## 15 15 jenny0993 rogers chrishaydon Female
## 16 16 john96 morgan wright Male
## 17 17 miller64 morgan wright Male
## 18 18 mark46 david ross Female
## 19 19 jenny0988 maria morgan Female
## 20 20 mark80 mike bell Male
## 21 21 morris72 miller michael Male
## 22 22 wright39 ross rogers Female
## 23 23 paul68 brooks mike Male
## 24 24 smith60 miller daniel Male
## 25 25 bell43 mike wright Male
## password status
## 1 e6a33eee180b07e563d74fee8c2c66b8 1
## 2 2e7dc6b8a1598f4f75c3eaa47958ee2f 1
## 3 1c3a8e03f448d211904161a6f5849b68 1
## 4 62f0a68a4179c5cdd997189760cbcf18 1
## 5 61bd060b07bddfecccea56a82b850ecf 1
## 6 7055b3d9f5cb2829c26cd7e0e601cde5 1
## 7 b7f72d6eb92b45458020748c8d1a3573 1
## 8 299cbf7171ad1b2967408ed200b4e26c 1
## 9 aa736a35dc15934d67c0a999dccff8f6 1
## 10 a28dca31f5aa5792e1cefd1dfd098569 1
## 11 0629e4f9f0e01e6f20bc2066175e09f7 1
## 12 17f286a78c74db7ee24374c608a2f20c 1
## 13 fa0c46cc4339a8a51a7da1b33e9d2831 1
## 14 b945416fa907fac533d94efe1974ec07 1
## 15 388823cb9249d4cebc9d677a99e1d79d 1
## 16 d0bb977705c3cdad1e346c898f32a1b7 1
## 17 58b207ee33794b046511203967c8e0d7 1
## 18 21cdcb68a932871524e16680fac72e18 1
## 19 ec9ed18ae2a13fef709964af24bb60e6 1
## 20 084489b355edd349bca1c798788de19a 1
## 21 bdb047eb9ea511052fc690a8ac72a7d3 1
## 22 1b6859df2da2a416c5b0fa044b1c6a75 1
## 23 12d836bf64839f987338414ccbec657f 1
## 24 494610644518624d05e2bdc8b9df3c36 1
## 25 2bd4e16a15f5527cb43282ee0ef94619 1
It’s always good to clean up after you get the data and disconnect from MYSQL
#disconnecting MYSQL
dbDisconnect(mydb)
## [1] TRUE
# to stop certain packages to autorun everytime we run the RStudio
detach("package:RMySQL", unload=TRUE)
# including more libraries needed for further work
library(devtools)
## Warning: package 'devtools' was built under R version 3.5.1
## Warning: package 'usethis' was built under R version 3.5.1
library(rJava)
library(jsonlite)
## Warning: package 'jsonlite' was built under R version 3.5.1
library(rmongodb)
# connect to MongoDB
mongo = mongo.create(host = "localhost",db ="test")
#Confirm sucessful connection
mongo.is.connected(mongo)
## [1] TRUE
mongo.get.databases(mongo)
## [1] "config" "emp"
So, now we are ready to upload the dataframe into MongoDB.
#Convert Dataframe into BSON for mongoDB
emp_list=mongo.bson.from.df(emp_df)
emp = mongo.bson.from.list(emp_list)
#now lets insert BSON into MongoDB collection
mongo.insert(mongo,"emp.emp_collection1",emp)
## [1] TRUE
#once again lets confirm that we have the data
mongo.count(mongo, "emp.emp_collection1")
## [1] 22
## lets see what we have collection so far
tmp = mongo.find.one(mongo, ns = "emp.emp_collection1")
## now finally lets see the real data that we have imported to mongoDB
emp_list_25=mongo.bson.from.df(emp_df_25)
emp_25 = mongo.bson.from.list(emp_list_25)
mongo.insert(mongo,"emp.emp_collection25",emp_25)
## [1] TRUE
mongo.find.one(mongo, ns = "emp.emp_collection25")
## _id : 7 5bfa528cca58c00c4a178322
## 1 : 3
## user_id : 16 1
## username : 2 rogers63
## first_name : 2 david
## last_name : 2 john
## gender : 2 Female
## password : 2 e6a33eee180b07e563d74fee8c2c66b8
## status : 16 1
##
## 2 : 3
## user_id : 16 2
## username : 2 mike28
## first_name : 2 rogers
## last_name : 2 paul
## gender : 2 Male
## password : 2 2e7dc6b8a1598f4f75c3eaa47958ee2f
## status : 16 1
##
## 3 : 3
## user_id : 16 3
## username : 2 rivera92
## first_name : 2 david
## last_name : 2 john
## gender : 2 Male
## password : 2 1c3a8e03f448d211904161a6f5849b68
## status : 16 1
##
## 4 : 3
## user_id : 16 4
## username : 2 ross95
## first_name : 2 maria
## last_name : 2 sanders
## gender : 2 Male
## password : 2 62f0a68a4179c5cdd997189760cbcf18
## status : 16 1
##
## 5 : 3
## user_id : 16 5
## username : 2 paul85
## first_name : 2 morris
## last_name : 2 miller
## gender : 2 Female
## password : 2 61bd060b07bddfecccea56a82b850ecf
## status : 16 1
##
## 6 : 3
## user_id : 16 6
## username : 2 smith34
## first_name : 2 daniel
## last_name : 2 michael
## gender : 2 Female
## password : 2 7055b3d9f5cb2829c26cd7e0e601cde5
## status : 16 1
##
## 7 : 3
## user_id : 16 7
## username : 2 james84
## first_name : 2 sanders
## last_name : 2 paul
## gender : 2 Female
## password : 2 b7f72d6eb92b45458020748c8d1a3573
## status : 16 1
##
## 8 : 3
## user_id : 16 8
## username : 2 daniel53
## first_name : 2 mark
## last_name : 2 mike
## gender : 2 Male
## password : 2 299cbf7171ad1b2967408ed200b4e26c
## status : 16 1
##
## 9 : 3
## user_id : 16 9
## username : 2 brooks80
## first_name : 2 morgan
## last_name : 2 maria
## gender : 2 Female
## password : 2 aa736a35dc15934d67c0a999dccff8f6
## status : 16 1
##
## 10 : 3
## user_id : 16 10
## username : 2 morgan65
## first_name : 2 paul
## last_name : 2 miller
## gender : 2 Female
## password : 2 a28dca31f5aa5792e1cefd1dfd098569
## status : 16 1
##
## 11 : 3
## user_id : 16 11
## username : 2 sanders84
## first_name : 2 david
## last_name : 2 miller
## gender : 2 Female
## password : 2 0629e4f9f0e01e6f20bc2066175e09f7
## status : 16 1
##
## 12 : 3
## user_id : 16 12
## username : 2 maria40
## first_name : 2 chrishaydon
## last_name : 2 bell
## gender : 2 Female
## password : 2 17f286a78c74db7ee24374c608a2f20c
## status : 16 1
##
## 13 : 3
## user_id : 16 13
## username : 2 brown71
## first_name : 2 michael
## last_name : 2 brown
## gender : 2 Male
## password : 2 fa0c46cc4339a8a51a7da1b33e9d2831
## status : 16 1
##
## 14 : 3
## user_id : 16 14
## username : 2 james63
## first_name : 2 morgan
## last_name : 2 james
## gender : 2 Male
## password : 2 b945416fa907fac533d94efe1974ec07
## status : 16 1
##
## 15 : 3
## user_id : 16 15
## username : 2 jenny0993
## first_name : 2 rogers
## last_name : 2 chrishaydon
## gender : 2 Female
## password : 2 388823cb9249d4cebc9d677a99e1d79d
## status : 16 1
##
## 16 : 3
## user_id : 16 16
## username : 2 john96
## first_name : 2 morgan
## last_name : 2 wright
## gender : 2 Male
## password : 2 d0bb977705c3cdad1e346c898f32a1b7
## status : 16 1
##
## 17 : 3
## user_id : 16 17
## username : 2 miller64
## first_name : 2 morgan
## last_name : 2 wright
## gender : 2 Male
## password : 2 58b207ee33794b046511203967c8e0d7
## status : 16 1
##
## 18 : 3
## user_id : 16 18
## username : 2 mark46
## first_name : 2 david
## last_name : 2 ross
## gender : 2 Female
## password : 2 21cdcb68a932871524e16680fac72e18
## status : 16 1
##
## 19 : 3
## user_id : 16 19
## username : 2 jenny0988
## first_name : 2 maria
## last_name : 2 morgan
## gender : 2 Female
## password : 2 ec9ed18ae2a13fef709964af24bb60e6
## status : 16 1
##
## 20 : 3
## user_id : 16 20
## username : 2 mark80
## first_name : 2 mike
## last_name : 2 bell
## gender : 2 Male
## password : 2 084489b355edd349bca1c798788de19a
## status : 16 1
##
## 21 : 3
## user_id : 16 21
## username : 2 morris72
## first_name : 2 miller
## last_name : 2 michael
## gender : 2 Male
## password : 2 bdb047eb9ea511052fc690a8ac72a7d3
## status : 16 1
##
## 22 : 3
## user_id : 16 22
## username : 2 wright39
## first_name : 2 ross
## last_name : 2 rogers
## gender : 2 Female
## password : 2 1b6859df2da2a416c5b0fa044b1c6a75
## status : 16 1
##
## 23 : 3
## user_id : 16 23
## username : 2 paul68
## first_name : 2 brooks
## last_name : 2 mike
## gender : 2 Male
## password : 2 12d836bf64839f987338414ccbec657f
## status : 16 1
##
## 24 : 3
## user_id : 16 24
## username : 2 smith60
## first_name : 2 miller
## last_name : 2 daniel
## gender : 2 Male
## password : 2 494610644518624d05e2bdc8b9df3c36
## status : 16 1
##
## 25 : 3
## user_id : 16 25
## username : 2 bell43
## first_name : 2 mike
## last_name : 2 wright
## gender : 2 Male
## password : 2 2bd4e16a15f5527cb43282ee0ef94619
## status : 16 1
NOW, since we have got all the positive output, lets finally check if you can pull out the information from MongoDB
# checking queries
query = mongo.bson.buffer.create()
mongo.bson.buffer.append(query, 'firstname', c("david","marie","mike", "mark"))
## [1] TRUE
# testing if there are records or not
result =mongo.bson.from.buffer(query)
result
## firstname : 4
## 0 : 2 david
## 1 : 2 marie
## 2 : 2 mike
## 3 : 2 mark
RDBMS
1. Relational database is well defined. It is a table based database.
2. Data are stored in tables as records (columns and rows).
3. Object names are not case-insensitive.
4. RDBMS can perform numeric computations at a faster rate than NoSQL.
5. Since data is stored in records, you can link with other databases.
6. You can join tables and records and databases in RDBMS and use the records as references also.
7. Maintaining RDBMS systems is expensive and need trained manpower.
NoSQL
1. NoSQL database doesn’t have a fixed schema.
2. Data are stored in collections as key - value pairs.
3. Object names are case-sensitive.
4. NoSQL have a better relationship with object oriented programing like photos, documents, videos,etc.
5. You cannot use joins in NoSQL and document nesting is allowed.
6. NoSQL is more popular in blogs data collection and cloud computing.
7. It support automatic repair, easier data distribution, simpler data models and less manpower.