This assigment is to do the migration from relational db to noSql database. The relational db used here is mysql and the nosql used is mongodb. MongoDb is document oriented noSql db.
The data I’m using is movie review data which is persisted in mySQL db in two tables (movie_ratings , movies) I’m trying 3 different approaches to migrate the data into mongoDB.
Flat migration(Normalised) This is the table to table migration. Just like data is normalized in relational database, the data will be kept normalized in noSql also.
De-Normalised migration. Here we are using a query that joins all the data and migrate into a single collection in mongoDb. The disadvantage is that the some of the data will be redudant.
De-Normalised with nested migration Here the data is De-Normalised, but trying to remove the redudant data by nesting certain data types.
Based on the requirments, we can use different approaches. For e.g, for analytics and maching learning purpose, the 2nd type of migration hold good. For Api development, the 3rd type is usefull.
Below are some of the benefits of noSQL.
ABILITY TO HANDLE CHANGE OVER TIME Schema free. unlike relational db, it is not tightly coupled with schema. We can change the schema in real time.
ABILITY TO SCALE HORIZONTALLY ON COMMODITY HARDWARE
Low Latency and High performance.
mydb = dbConnect(MySQL(), user='root', password='mysql@123', dbname='movies', host='localhost')
movie_ratings <- dbGetQuery(mydb, "select * from movie_ratings")
movies <- dbGetQuery(mydb, "select * from movies")
head(movie_ratings)
## movie_id viewer_name rating
## 1 1001 John Antony 3
## 2 1002 John Antony NA
## 3 1003 John Antony 4
## 4 1004 John Antony 4
## 5 1005 John Antony 3
## 6 1006 John Antony 2
head(movies)
## movie_id movie_name movie_type year
## 1 1001 Womb Ghosts Horror 2010
## 2 1002 Paranormal Activity: The Ghost Dimension Horror 2015
## 3 1003 Ex Machine-1 Sci-Fi 2016
## 4 1004 Matrix Sci-Fi 1999
## 5 1005 Gravity Sci-Fi 2013
## 6 1006 Incredibles Animation 2017
Here, we are loading the data as it is in mySQL. Both ‘movies’ and ‘movie_ratings’ goes into the collection ‘movies’ and ‘movie_ratings’ respectively in mongo db.
Connect to mongo db and persist the data frame into mongo db collection.
mongodb <- mongo(collection = "movies", db = "moviedb")
mongodb$insert(movies)
## List of 5
## $ nInserted : num 8
## $ nMatched : num 0
## $ nRemoved : num 0
## $ nUpserted : num 0
## $ writeErrors: list()
mongodb <- mongo(collection = "movie_ratings", db = "moviedb")
mongodb$insert(movie_ratings)
## List of 5
## $ nInserted : num 72
## $ nMatched : num 0
## $ nRemoved : num 0
## $ nUpserted : num 0
## $ writeErrors: list()
Below is the snapshot from mongodb Compas.
movies Collection
Movie Ratings Collection
Now, lets dnormalise the data and keep the review data into one collection. In NoSql, this is quite okay to denormalize the data in one collection. This means that data is not segregated in multiple tables. Instead the data are grouped together and noSql gives a flexibilty to evolve the schema without impacting existing records.
movies_denorm <- dbGetQuery(mydb, "select movies.movie_id, movies.movie_name , movies.movie_type , movies.year, movie_ratings.viewer_name, movie_ratings.rating from movies
join movie_ratings on movies.movie_id = movie_ratings.movie_id and movie_ratings.rating is not null")
head(movies_denorm)
## movie_id movie_name movie_type year viewer_name rating
## 1 1001 Womb Ghosts Horror 2010 John Antony 3
## 2 1003 Ex Machine-1 Sci-Fi 2016 John Antony 4
## 3 1004 Matrix Sci-Fi 1999 John Antony 4
## 4 1005 Gravity Sci-Fi 2013 John Antony 3
## 5 1006 Incredibles Animation 2017 John Antony 2
## 6 1007 Coco Animation 2017 John Antony 3
mongodb1 <- mongo(collection = "movies_data", db = "moviedb_denorm")
mongodb1$insert(movies_denorm)
## List of 5
## $ nInserted : num 55
## $ nMatched : num 0
## $ nRemoved : num 0
## $ nUpserted : num 0
## $ writeErrors: list()
mongodb1$find()
## movie_id movie_name movie_type year
## 1 1001 Womb Ghosts Horror 2010
## 2 1003 Ex Machine-1 Sci-Fi 2016
## 3 1004 Matrix Sci-Fi 1999
## 4 1005 Gravity Sci-Fi 2013
## 5 1006 Incredibles Animation 2017
## 6 1007 Coco Animation 2017
## 7 1008 Shrek Animation 2012
## 8 1001 Womb Ghosts Horror 2010
## 9 1002 Paranormal Activity: The Ghost Dimension Horror 2015
## 10 1003 Ex Machine-1 Sci-Fi 2016
## 11 1004 Matrix Sci-Fi 1999
## 12 1006 Incredibles Animation 2017
## 13 1007 Coco Animation 2017
## 14 1008 Shrek Animation 2012
## 15 1001 Womb Ghosts Horror 2010
## 16 1002 Paranormal Activity: The Ghost Dimension Horror 2015
## 17 1003 Ex Machine-1 Sci-Fi 2016
## 18 1004 Matrix Sci-Fi 1999
## 19 1005 Gravity Sci-Fi 2013
## 20 1006 Incredibles Animation 2017
## 21 1007 Coco Animation 2017
## 22 1008 Shrek Animation 2012
## 23 1001 Womb Ghosts Horror 2010
## 24 1002 Paranormal Activity: The Ghost Dimension Horror 2015
## 25 1003 Ex Machine-1 Sci-Fi 2016
## 26 1004 Matrix Sci-Fi 1999
## 27 1005 Gravity Sci-Fi 2013
## 28 1006 Incredibles Animation 2017
## 29 1007 Coco Animation 2017
## 30 1008 Shrek Animation 2012
## 31 1001 Womb Ghosts Horror 2010
## 32 1002 Paranormal Activity: The Ghost Dimension Horror 2015
## 33 1003 Ex Machine-1 Sci-Fi 2016
## 34 1006 Incredibles Animation 2017
## 35 1004 Matrix Sci-Fi 1999
## 36 1005 Gravity Sci-Fi 2013
## 37 1006 Incredibles Animation 2017
## 38 1007 Coco Animation 2017
## 39 1008 Shrek Animation 2012
## 40 1001 Womb Ghosts Horror 2010
## 41 1002 Paranormal Activity: The Ghost Dimension Horror 2015
## 42 1004 Matrix Sci-Fi 1999
## 43 1005 Gravity Sci-Fi 2013
## 44 1006 Incredibles Animation 2017
## 45 1001 Womb Ghosts Horror 2010
## 46 1004 Matrix Sci-Fi 1999
## 47 1007 Coco Animation 2017
## 48 1001 Womb Ghosts Horror 2010
## 49 1002 Paranormal Activity: The Ghost Dimension Horror 2015
## 50 1003 Ex Machine-1 Sci-Fi 2016
## 51 1004 Matrix Sci-Fi 1999
## 52 1005 Gravity Sci-Fi 2013
## 53 1006 Incredibles Animation 2017
## 54 1007 Coco Animation 2017
## 55 1008 Shrek Animation 2012
## viewer_name rating
## 1 John Antony 3
## 2 John Antony 4
## 3 John Antony 4
## 4 John Antony 3
## 5 John Antony 2
## 6 John Antony 3
## 7 John Antony 2
## 8 Joseph Thomas 2
## 9 Joseph Thomas 2
## 10 Joseph Thomas 2
## 11 Joseph Thomas 3
## 12 Joseph Thomas 4
## 13 Joseph Thomas 4
## 14 Joseph Thomas 3
## 15 Smith Davis 2
## 16 Smith Davis 2
## 17 Smith Davis 1
## 18 Smith Davis 4
## 19 Smith Davis 3
## 20 Smith Davis 2
## 21 Smith Davis 4
## 22 Smith Davis 2
## 23 Mike Peter 3
## 24 Mike Peter 2
## 25 Mike Peter 3
## 26 Mike Peter 3
## 27 Mike Peter 2
## 28 Mike Peter 2
## 29 Mike Peter 4
## 30 Mike Peter 2
## 31 Tom Jacob 2
## 32 Tom Jacob 2
## 33 Tom Jacob 2
## 34 Tom Jacob 3
## 35 George 4
## 36 George 3
## 37 George 2
## 38 George 3
## 39 George 2
## 40 Jeff 1
## 41 Jeff 3
## 42 Jeff 1
## 43 Jeff 2
## 44 Jeff 3
## 45 Mark 1
## 46 Mark 2
## 47 Mark 3
## 48 Joe 3
## 49 Joe 2
## 50 Joe 2
## 51 Joe 3
## 52 Joe 2
## 53 Joe 2
## 54 Joe 3
## 55 Joe 2
mongodb1$find(query = '{"movie_id":1001}')
## movie_id movie_name movie_type year viewer_name rating
## 1 1001 Womb Ghosts Horror 2010 John Antony 3
## 2 1001 Womb Ghosts Horror 2010 Joseph Thomas 2
## 3 1001 Womb Ghosts Horror 2010 Smith Davis 2
## 4 1001 Womb Ghosts Horror 2010 Mike Peter 3
## 5 1001 Womb Ghosts Horror 2010 Tom Jacob 2
## 6 1001 Womb Ghosts Horror 2010 Jeff 1
## 7 1001 Womb Ghosts Horror 2010 Mark 1
## 8 1001 Womb Ghosts Horror 2010 Joe 3
mongodb1$find(query = '{"viewer_name":"Tom Jacob"}')
## movie_id movie_name movie_type year viewer_name rating
## 1 1003 Ex Machine-1 Sci-Fi 2016 Tom Jacob 2
Below is the snapshot from mongodb Compas.
movies Data Collection
Here, the review data is nested in movie data. Benefits: Movie data is not redudant and review data is in nested format. This is upto your requirment. Lets say I need the list of review and movie data on a particular movie name, I just give movie name as input paramter and it returns the movie and it review data.
Below is the code snippet that demonstrate this design.
mongodb2 <- mongo(collection = "movies_reviews", db = "moviedb_denorm")
for (i in 1:nrow(movies)){
reviews <- movie_ratings %>% filter(movie_id == movies$movie_id[i]) %>% select(viewer_name ,rating)
movieDf <- movies[i,]
movieDf$ratings <- c(list(reviews))
movie_rating_json <- toJSON(movieDf)
print(movie_rating_json)
mongodb2$insert(fromJSON(movie_rating_json))
}
## [{"movie_id":1001,"movie_name":"Womb Ghosts","movie_type":" Horror","year":2010,"ratings":[{"viewer_name":" John Antony","rating":3},{"viewer_name":" Joseph Thomas","rating":2},{"viewer_name":" Smith Davis","rating":2},{"viewer_name":" Mike Peter","rating":3},{"viewer_name":" Tom Jacob","rating":2},{"viewer_name":" George "},{"viewer_name":" Jeff ","rating":1},{"viewer_name":" Mark ","rating":1},{"viewer_name":" Joe ","rating":3}]}]
## [{"movie_id":1002,"movie_name":"Paranormal Activity: The Ghost Dimension","movie_type":" Horror","year":2015,"ratings":[{"viewer_name":" John Antony"},{"viewer_name":" Joseph Thomas","rating":2},{"viewer_name":" Smith Davis","rating":2},{"viewer_name":" Mike Peter","rating":2},{"viewer_name":" Tom Jacob","rating":2},{"viewer_name":" George"},{"viewer_name":" Jeff","rating":3},{"viewer_name":" Mark"},{"viewer_name":" Joe","rating":2}]}]
## [{"movie_id":1003,"movie_name":"Ex Machine-1","movie_type":" Sci-Fi","year":2016,"ratings":[{"viewer_name":"John Antony","rating":4},{"viewer_name":"Joseph Thomas","rating":2},{"viewer_name":"Smith Davis","rating":1},{"viewer_name":"Mike Peter","rating":3},{"viewer_name":"Tom Jacob","rating":2},{"viewer_name":"George"},{"viewer_name":"Jeff"},{"viewer_name":"Mark"},{"viewer_name":"Joe","rating":2}]}]
## [{"movie_id":1004,"movie_name":"Matrix","movie_type":" Sci-Fi","year":1999,"ratings":[{"viewer_name":" John Antony","rating":4},{"viewer_name":" Joseph Thomas","rating":3},{"viewer_name":" Smith Davis","rating":4},{"viewer_name":" Mike Peter","rating":3},{"viewer_name":" Tom Jacob"},{"viewer_name":" George","rating":4},{"viewer_name":" Jeff","rating":1},{"viewer_name":" Mark","rating":2},{"viewer_name":" Joe","rating":3}]}]
## [{"movie_id":1005,"movie_name":"Gravity","movie_type":" Sci-Fi","year":2013,"ratings":[{"viewer_name":" John Antony","rating":3},{"viewer_name":" Joseph Thomas"},{"viewer_name":" Smith Davis","rating":3},{"viewer_name":" Mike Peter","rating":2},{"viewer_name":" Tom Jacob"},{"viewer_name":" George","rating":3},{"viewer_name":" Jeff","rating":2},{"viewer_name":" Mark"},{"viewer_name":" Joe","rating":2}]}]
## [{"movie_id":1006,"movie_name":"Incredibles","movie_type":" Animation","year":2017,"ratings":[{"viewer_name":" John Antony","rating":2},{"viewer_name":" Joseph Thomas","rating":4},{"viewer_name":" Smith Davis","rating":2},{"viewer_name":" Mike Peter","rating":2},{"viewer_name":" Tom Jacob","rating":3},{"viewer_name":" George","rating":2},{"viewer_name":" Jeff","rating":3},{"viewer_name":" Mark"},{"viewer_name":" Joe","rating":2}]}]
## [{"movie_id":1007,"movie_name":" Coco","movie_type":" Animation","year":2017,"ratings":[{"viewer_name":" John Antony","rating":3},{"viewer_name":" Joseph Thomas","rating":4},{"viewer_name":" Smith Davis","rating":4},{"viewer_name":" Mike Peter","rating":4},{"viewer_name":" Tom Jacob"},{"viewer_name":" George","rating":3},{"viewer_name":" Jeff"},{"viewer_name":" Mark","rating":3},{"viewer_name":" Joe","rating":3}]}]
## [{"movie_id":1008,"movie_name":"Shrek","movie_type":" Animation","year":2012,"ratings":[{"viewer_name":" John Antony","rating":2},{"viewer_name":" Joseph Thomas","rating":3},{"viewer_name":" Smith Davis","rating":2},{"viewer_name":" Mike Peter","rating":2},{"viewer_name":" Tom Jacob"},{"viewer_name":" George","rating":2},{"viewer_name":" Jeff"},{"viewer_name":" Mark"},{"viewer_name":" Joe","rating":2}]}]
Below shows the pretty print version of Json data from Mongo db.
movie_Womb_Ghosts <- mongodb2$find(query = '{"movie_name":"Womb Ghosts"}')
print(prettify(toJSON(movie_Womb_Ghosts)))
## [
## {
## "movie_id": 1001,
## "movie_name": "Womb Ghosts",
## "movie_type": " Horror",
## "year": 2010,
## "ratings": [
## {
## "viewer_name": " John Antony",
## "rating": 3
## },
## {
## "viewer_name": " Joseph Thomas",
## "rating": 2
## },
## {
## "viewer_name": " Smith Davis",
## "rating": 2
## },
## {
## "viewer_name": " Mike Peter",
## "rating": 3
## },
## {
## "viewer_name": " Tom Jacob",
## "rating": 2
## },
## {
## "viewer_name": " George "
## },
## {
## "viewer_name": " Jeff ",
## "rating": 1
## },
## {
## "viewer_name": " Mark ",
## "rating": 1
## },
## {
## "viewer_name": " Joe ",
## "rating": 3
## }
## ]
## }
## ]
##
Below is the snapshot from mongodb Compas.
movies Reviews Collection