Migration from relational db to mongo.

This assigment is to do the migration from relational db to noSql database. The relational db used here is mysql and the nosql used is mongodb. MongoDb is document oriented noSql db.

The data I’m using is movie review data which is persisted in mySQL db in two tables (movie_ratings , movies) I’m trying 3 different approaches to migrate the data into mongoDB.

  1. Flat migration(Normalised) This is the table to table migration. Just like data is normalized in relational database, the data will be kept normalized in noSql also.

  2. De-Normalised migration. Here we are using a query that joins all the data and migrate into a single collection in mongoDb. The disadvantage is that the some of the data will be redudant.

  3. De-Normalised with nested migration Here the data is De-Normalised, but trying to remove the redudant data by nesting certain data types.

Based on the requirments, we can use different approaches. For e.g, for analytics and maching learning purpose, the 2nd type of migration hold good. For Api development, the 3rd type is usefull.

Below are some of the benefits of noSQL.

  1. ABILITY TO HANDLE CHANGE OVER TIME Schema free. unlike relational db, it is not tightly coupled with schema. We can change the schema in real time.

  2. ABILITY TO SCALE HORIZONTALLY ON COMMODITY HARDWARE

  3. Low Latency and High performance.

Load the data from mysql into dataframe.

mydb = dbConnect(MySQL(), user='root', password='mysql@123', dbname='movies', host='localhost')
movie_ratings <- dbGetQuery(mydb, "select * from movie_ratings")
movies <- dbGetQuery(mydb, "select * from movies")
head(movie_ratings)
##   movie_id  viewer_name rating
## 1     1001  John Antony      3
## 2     1002  John Antony     NA
## 3     1003  John Antony      4
## 4     1004  John Antony      4
## 5     1005  John Antony      3
## 6     1006  John Antony      2
head(movies)
##   movie_id                               movie_name movie_type year
## 1     1001                              Womb Ghosts     Horror 2010
## 2     1002 Paranormal Activity: The Ghost Dimension     Horror 2015
## 3     1003                             Ex Machine-1     Sci-Fi 2016
## 4     1004                                   Matrix     Sci-Fi 1999
## 5     1005                                  Gravity     Sci-Fi 2013
## 6     1006                              Incredibles  Animation 2017

Load data in normalised way.

Here, we are loading the data as it is in mySQL. Both ‘movies’ and ‘movie_ratings’ goes into the collection ‘movies’ and ‘movie_ratings’ respectively in mongo db.

Connect to mongo db and persist the data frame into mongo db collection.

mongodb <- mongo(collection = "movies", db = "moviedb")
mongodb$insert(movies)
## List of 5
##  $ nInserted  : num 8
##  $ nMatched   : num 0
##  $ nRemoved   : num 0
##  $ nUpserted  : num 0
##  $ writeErrors: list()
mongodb <- mongo(collection = "movie_ratings", db = "moviedb")
mongodb$insert(movie_ratings)
## List of 5
##  $ nInserted  : num 72
##  $ nMatched   : num 0
##  $ nRemoved   : num 0
##  $ nUpserted  : num 0
##  $ writeErrors: list()

Below is the snapshot from mongodb Compas.

movies Collection

movies Collection

Movie Ratings Collection

Movie Ratings Collection

Load data in denormalised way.

Now, lets dnormalise the data and keep the review data into one collection. In NoSql, this is quite okay to denormalize the data in one collection. This means that data is not segregated in multiple tables. Instead the data are grouped together and noSql gives a flexibilty to evolve the schema without impacting existing records.

movies_denorm <- dbGetQuery(mydb, "select movies.movie_id, movies.movie_name , movies.movie_type , movies.year, movie_ratings.viewer_name, movie_ratings.rating from movies
join movie_ratings on movies.movie_id = movie_ratings.movie_id  and movie_ratings.rating is not null")
head(movies_denorm)
##   movie_id   movie_name movie_type year  viewer_name rating
## 1     1001  Womb Ghosts     Horror 2010  John Antony      3
## 2     1003 Ex Machine-1     Sci-Fi 2016  John Antony      4
## 3     1004       Matrix     Sci-Fi 1999  John Antony      4
## 4     1005      Gravity     Sci-Fi 2013  John Antony      3
## 5     1006  Incredibles  Animation 2017  John Antony      2
## 6     1007         Coco  Animation 2017  John Antony      3
mongodb1 <- mongo(collection = "movies_data", db = "moviedb_denorm")
mongodb1$insert(movies_denorm)
## List of 5
##  $ nInserted  : num 55
##  $ nMatched   : num 0
##  $ nRemoved   : num 0
##  $ nUpserted  : num 0
##  $ writeErrors: list()
mongodb1$find()
##    movie_id                               movie_name movie_type year
## 1      1001                              Womb Ghosts     Horror 2010
## 2      1003                             Ex Machine-1     Sci-Fi 2016
## 3      1004                                   Matrix     Sci-Fi 1999
## 4      1005                                  Gravity     Sci-Fi 2013
## 5      1006                              Incredibles  Animation 2017
## 6      1007                                     Coco  Animation 2017
## 7      1008                                    Shrek  Animation 2012
## 8      1001                              Womb Ghosts     Horror 2010
## 9      1002 Paranormal Activity: The Ghost Dimension     Horror 2015
## 10     1003                             Ex Machine-1     Sci-Fi 2016
## 11     1004                                   Matrix     Sci-Fi 1999
## 12     1006                              Incredibles  Animation 2017
## 13     1007                                     Coco  Animation 2017
## 14     1008                                    Shrek  Animation 2012
## 15     1001                              Womb Ghosts     Horror 2010
## 16     1002 Paranormal Activity: The Ghost Dimension     Horror 2015
## 17     1003                             Ex Machine-1     Sci-Fi 2016
## 18     1004                                   Matrix     Sci-Fi 1999
## 19     1005                                  Gravity     Sci-Fi 2013
## 20     1006                              Incredibles  Animation 2017
## 21     1007                                     Coco  Animation 2017
## 22     1008                                    Shrek  Animation 2012
## 23     1001                              Womb Ghosts     Horror 2010
## 24     1002 Paranormal Activity: The Ghost Dimension     Horror 2015
## 25     1003                             Ex Machine-1     Sci-Fi 2016
## 26     1004                                   Matrix     Sci-Fi 1999
## 27     1005                                  Gravity     Sci-Fi 2013
## 28     1006                              Incredibles  Animation 2017
## 29     1007                                     Coco  Animation 2017
## 30     1008                                    Shrek  Animation 2012
## 31     1001                              Womb Ghosts     Horror 2010
## 32     1002 Paranormal Activity: The Ghost Dimension     Horror 2015
## 33     1003                             Ex Machine-1     Sci-Fi 2016
## 34     1006                              Incredibles  Animation 2017
## 35     1004                                   Matrix     Sci-Fi 1999
## 36     1005                                  Gravity     Sci-Fi 2013
## 37     1006                              Incredibles  Animation 2017
## 38     1007                                     Coco  Animation 2017
## 39     1008                                    Shrek  Animation 2012
## 40     1001                              Womb Ghosts     Horror 2010
## 41     1002 Paranormal Activity: The Ghost Dimension     Horror 2015
## 42     1004                                   Matrix     Sci-Fi 1999
## 43     1005                                  Gravity     Sci-Fi 2013
## 44     1006                              Incredibles  Animation 2017
## 45     1001                              Womb Ghosts     Horror 2010
## 46     1004                                   Matrix     Sci-Fi 1999
## 47     1007                                     Coco  Animation 2017
## 48     1001                              Womb Ghosts     Horror 2010
## 49     1002 Paranormal Activity: The Ghost Dimension     Horror 2015
## 50     1003                             Ex Machine-1     Sci-Fi 2016
## 51     1004                                   Matrix     Sci-Fi 1999
## 52     1005                                  Gravity     Sci-Fi 2013
## 53     1006                              Incredibles  Animation 2017
## 54     1007                                     Coco  Animation 2017
## 55     1008                                    Shrek  Animation 2012
##       viewer_name rating
## 1     John Antony      3
## 2     John Antony      4
## 3     John Antony      4
## 4     John Antony      3
## 5     John Antony      2
## 6     John Antony      3
## 7     John Antony      2
## 8   Joseph Thomas      2
## 9   Joseph Thomas      2
## 10  Joseph Thomas      2
## 11  Joseph Thomas      3
## 12  Joseph Thomas      4
## 13  Joseph Thomas      4
## 14  Joseph Thomas      3
## 15    Smith Davis      2
## 16    Smith Davis      2
## 17    Smith Davis      1
## 18    Smith Davis      4
## 19    Smith Davis      3
## 20    Smith Davis      2
## 21    Smith Davis      4
## 22    Smith Davis      2
## 23     Mike Peter      3
## 24     Mike Peter      2
## 25     Mike Peter      3
## 26     Mike Peter      3
## 27     Mike Peter      2
## 28     Mike Peter      2
## 29     Mike Peter      4
## 30     Mike Peter      2
## 31      Tom Jacob      2
## 32      Tom Jacob      2
## 33      Tom Jacob      2
## 34      Tom Jacob      3
## 35         George      4
## 36         George      3
## 37         George      2
## 38         George      3
## 39         George      2
## 40          Jeff       1
## 41           Jeff      3
## 42           Jeff      1
## 43           Jeff      2
## 44           Jeff      3
## 45          Mark       1
## 46           Mark      2
## 47           Mark      3
## 48           Joe       3
## 49            Joe      2
## 50            Joe      2
## 51            Joe      3
## 52            Joe      2
## 53            Joe      2
## 54            Joe      3
## 55            Joe      2
mongodb1$find(query  = '{"movie_id":1001}')
##   movie_id  movie_name movie_type year    viewer_name rating
## 1     1001 Womb Ghosts     Horror 2010    John Antony      3
## 2     1001 Womb Ghosts     Horror 2010  Joseph Thomas      2
## 3     1001 Womb Ghosts     Horror 2010    Smith Davis      2
## 4     1001 Womb Ghosts     Horror 2010     Mike Peter      3
## 5     1001 Womb Ghosts     Horror 2010      Tom Jacob      2
## 6     1001 Womb Ghosts     Horror 2010          Jeff       1
## 7     1001 Womb Ghosts     Horror 2010          Mark       1
## 8     1001 Womb Ghosts     Horror 2010           Joe       3
mongodb1$find(query  = '{"viewer_name":"Tom Jacob"}')
##   movie_id   movie_name movie_type year viewer_name rating
## 1     1003 Ex Machine-1     Sci-Fi 2016   Tom Jacob      2

Below is the snapshot from mongodb Compas.

movies Data Collection

movies Data Collection

Load data with nested data.

Here, the review data is nested in movie data. Benefits: Movie data is not redudant and review data is in nested format. This is upto your requirment. Lets say I need the list of review and movie data on a particular movie name, I just give movie name as input paramter and it returns the movie and it review data.

Below is the code snippet that demonstrate this design.

mongodb2 <- mongo(collection = "movies_reviews", db = "moviedb_denorm")


for (i in 1:nrow(movies)){
    reviews <- movie_ratings %>% filter(movie_id == movies$movie_id[i]) %>%  select(viewer_name ,rating)
    movieDf <- movies[i,]
    movieDf$ratings <-  c(list(reviews))
    movie_rating_json <- toJSON(movieDf)
    print(movie_rating_json)
    mongodb2$insert(fromJSON(movie_rating_json))
}
## [{"movie_id":1001,"movie_name":"Womb Ghosts","movie_type":" Horror","year":2010,"ratings":[{"viewer_name":" John Antony","rating":3},{"viewer_name":" Joseph Thomas","rating":2},{"viewer_name":" Smith Davis","rating":2},{"viewer_name":" Mike Peter","rating":3},{"viewer_name":" Tom Jacob","rating":2},{"viewer_name":" George "},{"viewer_name":" Jeff ","rating":1},{"viewer_name":" Mark ","rating":1},{"viewer_name":" Joe ","rating":3}]}] 
## [{"movie_id":1002,"movie_name":"Paranormal Activity: The Ghost Dimension","movie_type":" Horror","year":2015,"ratings":[{"viewer_name":" John Antony"},{"viewer_name":" Joseph Thomas","rating":2},{"viewer_name":" Smith Davis","rating":2},{"viewer_name":" Mike Peter","rating":2},{"viewer_name":" Tom Jacob","rating":2},{"viewer_name":" George"},{"viewer_name":" Jeff","rating":3},{"viewer_name":" Mark"},{"viewer_name":" Joe","rating":2}]}] 
## [{"movie_id":1003,"movie_name":"Ex Machine-1","movie_type":" Sci-Fi","year":2016,"ratings":[{"viewer_name":"John Antony","rating":4},{"viewer_name":"Joseph Thomas","rating":2},{"viewer_name":"Smith Davis","rating":1},{"viewer_name":"Mike Peter","rating":3},{"viewer_name":"Tom Jacob","rating":2},{"viewer_name":"George"},{"viewer_name":"Jeff"},{"viewer_name":"Mark"},{"viewer_name":"Joe","rating":2}]}] 
## [{"movie_id":1004,"movie_name":"Matrix","movie_type":" Sci-Fi","year":1999,"ratings":[{"viewer_name":" John Antony","rating":4},{"viewer_name":" Joseph Thomas","rating":3},{"viewer_name":" Smith Davis","rating":4},{"viewer_name":" Mike Peter","rating":3},{"viewer_name":" Tom Jacob"},{"viewer_name":" George","rating":4},{"viewer_name":" Jeff","rating":1},{"viewer_name":" Mark","rating":2},{"viewer_name":" Joe","rating":3}]}] 
## [{"movie_id":1005,"movie_name":"Gravity","movie_type":" Sci-Fi","year":2013,"ratings":[{"viewer_name":" John Antony","rating":3},{"viewer_name":" Joseph Thomas"},{"viewer_name":" Smith Davis","rating":3},{"viewer_name":" Mike Peter","rating":2},{"viewer_name":" Tom Jacob"},{"viewer_name":" George","rating":3},{"viewer_name":" Jeff","rating":2},{"viewer_name":" Mark"},{"viewer_name":" Joe","rating":2}]}] 
## [{"movie_id":1006,"movie_name":"Incredibles","movie_type":" Animation","year":2017,"ratings":[{"viewer_name":" John Antony","rating":2},{"viewer_name":" Joseph Thomas","rating":4},{"viewer_name":" Smith Davis","rating":2},{"viewer_name":" Mike Peter","rating":2},{"viewer_name":" Tom Jacob","rating":3},{"viewer_name":" George","rating":2},{"viewer_name":" Jeff","rating":3},{"viewer_name":" Mark"},{"viewer_name":" Joe","rating":2}]}] 
## [{"movie_id":1007,"movie_name":" Coco","movie_type":" Animation","year":2017,"ratings":[{"viewer_name":" John Antony","rating":3},{"viewer_name":" Joseph Thomas","rating":4},{"viewer_name":" Smith Davis","rating":4},{"viewer_name":" Mike Peter","rating":4},{"viewer_name":" Tom Jacob"},{"viewer_name":" George","rating":3},{"viewer_name":" Jeff"},{"viewer_name":" Mark","rating":3},{"viewer_name":" Joe","rating":3}]}] 
## [{"movie_id":1008,"movie_name":"Shrek","movie_type":" Animation","year":2012,"ratings":[{"viewer_name":" John Antony","rating":2},{"viewer_name":" Joseph Thomas","rating":3},{"viewer_name":" Smith Davis","rating":2},{"viewer_name":" Mike Peter","rating":2},{"viewer_name":" Tom Jacob"},{"viewer_name":" George","rating":2},{"viewer_name":" Jeff"},{"viewer_name":" Mark"},{"viewer_name":" Joe","rating":2}]}]

Below shows the pretty print version of Json data from Mongo db.

movie_Womb_Ghosts <- mongodb2$find(query  = '{"movie_name":"Womb Ghosts"}')
print(prettify(toJSON(movie_Womb_Ghosts)))
## [
##     {
##         "movie_id": 1001,
##         "movie_name": "Womb Ghosts",
##         "movie_type": " Horror",
##         "year": 2010,
##         "ratings": [
##             {
##                 "viewer_name": " John Antony",
##                 "rating": 3
##             },
##             {
##                 "viewer_name": " Joseph Thomas",
##                 "rating": 2
##             },
##             {
##                 "viewer_name": " Smith Davis",
##                 "rating": 2
##             },
##             {
##                 "viewer_name": " Mike Peter",
##                 "rating": 3
##             },
##             {
##                 "viewer_name": " Tom Jacob",
##                 "rating": 2
##             },
##             {
##                 "viewer_name": " George "
##             },
##             {
##                 "viewer_name": " Jeff ",
##                 "rating": 1
##             },
##             {
##                 "viewer_name": " Mark ",
##                 "rating": 1
##             },
##             {
##                 "viewer_name": " Joe ",
##                 "rating": 3
##             }
##         ]
##     }
## ]
## 

Below is the snapshot from mongodb Compas.

movies Reviews Collection

movies Reviews Collection