Assignment

The given task is to take information from a relational database and migrate it to a NoSQL database. In this assignment I have picked the RDBMS database as MySQL and NoSQL database as Mongo DB
The entire code should be reproducible

Approach

  • Fetch data from MySQL Database using RMySQL library and generate a table like structure inorder to migrate to NoSQL database
  • Mongo DB - NoSQL database, load the table
  • Assumption is both MongoDB and MySQL databases are running in system

  • #Library for MySQL database
    library(RMySQL)
    library(mongolite)
    library(knitr)
    library(kableExtra)

    Fetch Data from MySQL

  • Used the Week2 Assignment MySQL database table movie for this assignment

  • #Connection Details
    movie = dbConnect(MySQL(), user='root', password='mohamed', dbname='movie', host='localhost')
    
    #Fetching Oscars2019 table
    Oscars2019 <- dbSendQuery(movie, "select * from movie.Oscars2019")
    Oscars2019_List <- fetch(Oscars2019)
    
    #Fetching Ratings table
    Ratings <- dbSendQuery(movie, "select MovieID,  RatingCategory, GivenRating from movie.MovieRating")
    MovieRatings <- fetch(Ratings)
    
    #Query for getting Average Ratings
    BestMovieRating <- dbSendQuery(movie, "select m.MovieName,m.Category, CAST(AVG(r.GivenRating) AS DECIMAL(3,2)) as AvgRating from movie.Oscars2019 m inner join movie.MovieRating r on r.MovieId = m.MovieId group by m.MovieName order by AvgRating desc")
    Rating_List <- fetch(BestMovieRating)
    #Display Ratings Results
    
    kable(data.frame(Rating_List)) %>%
      kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
      row_spec(0, bold = T, color = "white", background = "#ea7872")
    MovieName Category AvgRating
    BLACK PANTHER PG-13 4.10
    GREEN BOOK PG-13 3.90
    ROMA R 3.70
    BOHEMIAN RHAPSODY PG-13 3.50
    VICE R 3.34
    THE FAVOURITE R 3.20
    A STAR IS BORN R 3.10
    BLACKkKLANSMAN R 2.90
    #Disconnect from database
    dbDisconnect(movie)
    ## [1] TRUE

    Migrate to Mongo DB

  • Initiate connection with Mongo DB, requires mongolite library
  • Using insert command, push the date in to mondo db
  • Query to find out the data has been loaded
  • Disconnect from mongo db

  • #Connecting to Mongo DB
    mongodb <- mongo(collection = "cmovie", db = "movie", url = "mongodb://localhost",verbose = FALSE, options = ssl_options())
    if(mongodb$count() > 0) mongodb$drop()
    
    #Insert the dataframe into R
    mongodb$insert(Rating_List,stop_on_error = TRUE)
    ## List of 5
    ##  $ nInserted  : num 8
    ##  $ nMatched   : num 0
    ##  $ nRemoved   : num 0
    ##  $ nUpserted  : num 0
    ##  $ writeErrors: list()
    stopifnot(mongodb$count() == nrow(Rating_List))
    
    # Query data
    mongo_data <- mongodb$find()
    stopifnot(all.equal(mongo_data, Rating_List))
    #mongodb$drop()
    
    #Display Output
    kable(data.frame(mongo_data)) %>%
      kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
      row_spec(0, bold = T, color = "white", background = "#ea7872")
    MovieName Category AvgRating
    BLACK PANTHER PG-13 4.10
    GREEN BOOK PG-13 3.90
    ROMA R 3.70
    BOHEMIAN RHAPSODY PG-13 3.50
    VICE R 3.34
    THE FAVOURITE R 3.20
    A STAR IS BORN R 3.10
    BLACKkKLANSMAN R 2.90
    # Automatically disconnect when connection is removed
    rm(mongodb)
    gc()
    ##           used (Mb) gc trigger (Mb) max used (Mb)
    ## Ncells  742997 39.7    1250844 66.9  1250844 66.9
    ## Vcells 1333809 10.2    8388608 64.0  2020610 15.5

    Comparison

  • Listed some of the points to state the best of RDBMS and NoSQL databases
  • Reasons why NoSQL is best

    NoSQL RDBMS
    Primary goal of flexibility and speed, rather than 100% data integrity ACID compliancy(Atomicity, Consistency, Isolation, Durability)
    Rapid development Has to follow SDLC process involving vendors and data administors
    Using cloud computing and storage Very Limited options available
    Storing large volumes of data without structure Expensive to buy storage
    highly and easily scalable Need Expertise and pricy
    Maintaining NoSQL Servers is Less Expensive Maintainance is high
    No Schema or Fixed Data model It has to follow some Schema or fixed data model

    Reasons why RDBMS is best

    RDBMS NoSQL
    8 Most of the leading RDMS databases are Commerical and efficient NoSQL database is Open Source
    9 Stored Procedures and Function No Stored Procedures
    10 Since it videly used, finding experts is easy Difficult for finding nosql experts
    11 Most of the databases are equal on their foundation like (DML/DDL etc) but might differ on architecture No two NoSQL databases are created equal
    12 Logic-related discrete data requirements which can be identified up front Evolving databases
    13 Standards-based proven technology with good developer experience and support. Since it is in early stage, not proven at it best
    14 Flexible queries Limited queries

    Conclusion

    Both RDBMS and NoSQL databases has its own advantages and limitations as listed above, its purely depend on project and how the organizations has to store data based on factors like durability, cost and security