Assignment
The given task is to take information from a relational database and migrate it to a NoSQL database. In this assignment I have picked the RDBMS database as MySQL and NoSQL database as Mongo DB
The entire code should be reproducible
Approach
Fetch data from MySQL Database using RMySQL library and generate a table like structure inorder to migrate to NoSQL database
Mongo DB - NoSQL database, load the table
Assumption is both MongoDB and MySQL databases are running in system
#Library for MySQL database
library(RMySQL)
library(mongolite)
library(knitr)
library(kableExtra)
Fetch Data from MySQL
Used the Week2 Assignment MySQL database table movie for this assignment
#Connection Details
movie = dbConnect(MySQL(), user='root', password='mohamed', dbname='movie', host='localhost')
#Fetching Oscars2019 table
Oscars2019 <- dbSendQuery(movie, "select * from movie.Oscars2019")
Oscars2019_List <- fetch(Oscars2019)
#Fetching Ratings table
Ratings <- dbSendQuery(movie, "select MovieID, RatingCategory, GivenRating from movie.MovieRating")
MovieRatings <- fetch(Ratings)
#Query for getting Average Ratings
BestMovieRating <- dbSendQuery(movie, "select m.MovieName,m.Category, CAST(AVG(r.GivenRating) AS DECIMAL(3,2)) as AvgRating from movie.Oscars2019 m inner join movie.MovieRating r on r.MovieId = m.MovieId group by m.MovieName order by AvgRating desc")
Rating_List <- fetch(BestMovieRating)
#Display Ratings Results
kable(data.frame(Rating_List)) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
row_spec(0, bold = T, color = "white", background = "#ea7872")
|
MovieName
|
Category
|
AvgRating
|
|
BLACK PANTHER
|
PG-13
|
4.10
|
|
GREEN BOOK
|
PG-13
|
3.90
|
|
ROMA
|
R
|
3.70
|
|
BOHEMIAN RHAPSODY
|
PG-13
|
3.50
|
|
VICE
|
R
|
3.34
|
|
THE FAVOURITE
|
R
|
3.20
|
|
A STAR IS BORN
|
R
|
3.10
|
|
BLACKkKLANSMAN
|
R
|
2.90
|
#Disconnect from database
dbDisconnect(movie)
## [1] TRUE
Migrate to Mongo DB
Initiate connection with Mongo DB, requires mongolite library
Using insert command, push the date in to mondo db
Query to find out the data has been loaded
Disconnect from mongo db
#Connecting to Mongo DB
mongodb <- mongo(collection = "cmovie", db = "movie", url = "mongodb://localhost",verbose = FALSE, options = ssl_options())
if(mongodb$count() > 0) mongodb$drop()
#Insert the dataframe into R
mongodb$insert(Rating_List,stop_on_error = TRUE)
## List of 5
## $ nInserted : num 8
## $ nMatched : num 0
## $ nRemoved : num 0
## $ nUpserted : num 0
## $ writeErrors: list()
stopifnot(mongodb$count() == nrow(Rating_List))
# Query data
mongo_data <- mongodb$find()
stopifnot(all.equal(mongo_data, Rating_List))
#mongodb$drop()
#Display Output
kable(data.frame(mongo_data)) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
row_spec(0, bold = T, color = "white", background = "#ea7872")
|
MovieName
|
Category
|
AvgRating
|
|
BLACK PANTHER
|
PG-13
|
4.10
|
|
GREEN BOOK
|
PG-13
|
3.90
|
|
ROMA
|
R
|
3.70
|
|
BOHEMIAN RHAPSODY
|
PG-13
|
3.50
|
|
VICE
|
R
|
3.34
|
|
THE FAVOURITE
|
R
|
3.20
|
|
A STAR IS BORN
|
R
|
3.10
|
|
BLACKkKLANSMAN
|
R
|
2.90
|
# Automatically disconnect when connection is removed
rm(mongodb)
gc()
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 742997 39.7 1250844 66.9 1250844 66.9
## Vcells 1333809 10.2 8388608 64.0 2020610 15.5
Comparison
Listed some of the points to state the best of RDBMS and NoSQL databases
Reasons why NoSQL is best
|
NoSQL
|
RDBMS
|
|
Primary goal of flexibility and speed, rather than 100% data integrity
|
ACID compliancy(Atomicity, Consistency, Isolation, Durability)
|
|
Rapid development
|
Has to follow SDLC process involving vendors and data administors
|
|
Using cloud computing and storage
|
Very Limited options available
|
|
Storing large volumes of data without structure
|
Expensive to buy storage
|
|
highly and easily scalable
|
Need Expertise and pricy
|
|
Maintaining NoSQL Servers is Less Expensive
|
Maintainance is high
|
|
No Schema or Fixed Data model
|
It has to follow some Schema or fixed data model
|
Reasons why RDBMS is best
|
|
RDBMS
|
NoSQL
|
|
8
|
Most of the leading RDMS databases are Commerical and efficient
|
NoSQL database is Open Source
|
|
9
|
Stored Procedures and Function
|
No Stored Procedures
|
|
10
|
Since it videly used, finding experts is easy
|
Difficult for finding nosql experts
|
|
11
|
Most of the databases are equal on their foundation like (DML/DDL etc) but might differ on architecture
|
No two NoSQL databases are created equal
|
|
12
|
Logic-related discrete data requirements which can be identified up front
|
Evolving databases
|
|
13
|
Standards-based proven technology with good developer experience and support.
|
Since it is in early stage, not proven at it best
|
|
14
|
Flexible queries
|
Limited queries
|
Conclusion
Both RDBMS and NoSQL databases has its own advantages and limitations as listed above, its purely depend on project and how the organizations has to store data based on factors like durability, cost and security