The object of this assignment is to develop understanding of data storage and retrieval mechanism for a NoSQL database. For this conversion, I have chosen the SQL database of my own. Database ER diagram is illustrated below. Amongst the NoSQL database, I have chosen mongoDB for the migration.
ER Diagram: Banking DAtabase
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# SQL Database Connector
library(RMySQL)
## Loading required package: DBI
#Mongolite Database Connector
library(mongolite)
Here we pull the data down from MySQL:
mydb <- dbConnect(MySQL(), user='ahmshahparan', password='11223344', host='db4free.net')
dbSendQuery(mydb, "USE ahmshahparan;")
## <MySQLResult:2,0,0>
customers <- dbGetQuery(mydb, "SELECT * FROM CDW_SAPP_CUSTOMER;")
branches <- dbGetQuery(mydb, "SELECT * FROM CDW_SAPP_BRANCH;")
transactions <- dbGetQuery(mydb, "SELECT * FROM CDW_SAPP_CREDITCARD;")
## Warning in .local(conn, statement, ...): Decimal MySQL column 8 imported as
## numeric
#Disconnect and detach RMySQL to prevent masking of functions below
dbDisconnect(mydb)
## [1] TRUE
detach("package:RMySQL", unload=TRUE)
Data can be also downloaded from here.
Once we pull the movie ratings from mySQL, we connect to a MongoDB, and use the count() function to make sure the database is empty.
customersNoSQL <- mongo(collection = 'CDW_SAPP_CUSTOMER', db = 'ahmshahparan', url = "mongodb://ahmshahparan:11223344@ds263619.mlab.com:63619/ahmshahparan")
branchesNoSQL <- mongo(collection = 'CDW_SAPP_BRANCH', db = 'ahmshahparan', url = "mongodb://ahmshahparan:11223344@ds263619.mlab.com:63619/ahmshahparan")
transactionsNoSQL <- mongo(collection = 'CDW_SAPP_CREDITCARD', db = 'ahmshahparan', url = "mongodb://ahmshahparan:11223344@ds263619.mlab.com:63619/ahmshahparan")
customersNoSQL$count("{}")
## [1] 0
branchesNoSQL$count("{}")
## [1] 0
transactionsNoSQL$count("{}")
## [1] 0
From here we insert the ratings table into Mongo and run a count() to make sure the data was uploaded.
customersNoSQL$insert(customers)
## List of 5
## $ nInserted : num 952
## $ nMatched : num 0
## $ nRemoved : num 0
## $ nUpserted : num 0
## $ writeErrors: list()
branchesNoSQL$insert(branches)
## List of 5
## $ nInserted : num 115
## $ nMatched : num 0
## $ nRemoved : num 0
## $ nUpserted : num 0
## $ writeErrors: list()
transactionsNoSQL$insert(transactions)
## List of 5
## $ nInserted : num 46694
## $ nMatched : num 0
## $ nRemoved : num 0
## $ nUpserted : num 0
## $ writeErrors: list()
customersNoSQL$count("{}")
## [1] 952
branchesNoSQL$count("{}")
## [1] 115
transactionsNoSQL$count("{}")
## [1] 46694
The data is successfully migrated and can be queried now.
Drop the created tables from Mongo if needed and the connection through mongolite automatically disconnects when the connection is removed.
customersNoSQL$drop()
branchesNoSQL$drop()
transactionsNoSQL$drop()
customersNoSQL$count("{}")
## [1] 0
branchesNoSQL$count("{}")
## [1] 0
transactionsNoSQL$count("{}")
## [1] 0
rm(customersNoSQL)
rm(branchesNoSQL)
rm(transactionsNoSQL)
SQL databases are table based databases whereas NoSQL databases are document based, key-value pairs, graph databases or wide-column stores. This means that SQL databases represent data in form of tables which consists of n number of rows of data whereas NoSQL databases are the collection of key-value pair, documents, graph databases or wide-column stores which do not have standard schema definitions which it needs to adhered to.
NoSQL uses documents that have flexible schema. Each instance can have a fluid number of named schemas, for example, customers can be set up without having to change the schema.
NoSQL allows easy migration of data because of the easy and flexible schema. No-SQL databases such as MongoDB also scale better and have higher performance compared to SQL databases.
A disadvantage that I see over SQL databases is that it is harder to view the data in its entirety. We are so used to seeing data in a tabular form. We cannot directly see what the table looks like in its column names, typical values, etc.
Overall the conversion process was smooth. The more difficult part was to identify proper database storage method (SQL vs. NoSQL) and proper database structure.