The problem statement is as follows:

“For this assignment, you should take information from a relational database and migrate it to a NoSQL database of your own choosing. For the relational database, you might use the flights database, the tb database, the”data skills" database your team created for Project 3, or another database of your own choosing or creation.

For the NoSQL database, you may use MongoDB (which we introduced in week 7), Neo4j, or another NoSQL database of your choosing. Your migration process needs to be reproducible. R code is encouraged, but not required. You should also briefly describe the advantages and disadvantages of storing the data in a relational database vs. your NoSQL database.

You may work on a small team on this project. Due end of day next Sunday".

## Warning: package 'RMySQL' was built under R version 3.6.3
## Loading required package: DBI
## Warning: package 'DBI' was built under R version 3.6.3
## Warning: package 'mongolite' was built under R version 3.6.3
## Warning: package 'knitr' was built under R version 3.6.3

I chose the movies database from an earlier assignment. Right below, I am connecting to the MySQL database.

Displaying all tables in the MySQL database.

## [1] "movies"     "responders" "responses"

Joining two tables in MySQL database and getting the data in the R variable df2.

movie_name resp_name response city state ethnicity
Shakespeare in Love John 3 New York New York White
Solaris John NA New York New York White
Enter the Dragon John 5 New York New York White
Star Wars John 3 New York New York White
Rumble in the Bronx John NA New York New York White
Autumn Sonata John 5 New York New York White

Connecting to Mongo, and creating a databse, called “movie_responses_db”, and a collection, called ‘movie_responses’, in it.

Exporting data from R variable into Mongo database.

## List of 5
##  $ nInserted  : num 30
##  $ nMatched   : num 0
##  $ nRemoved   : num 0
##  $ nUpserted  : num 0
##  $ writeErrors: list()

Checking Mongo commands at my disposal.

## <Mongo collection> 'movie_responses' 
##  $aggregate(pipeline = "{}", options = "{\"allowDiskUse\":true}", handler = NULL, pagesize = 1000, iterate = FALSE) 
##  $count(query = "{}") 
##  $disconnect(gc = TRUE) 
##  $distinct(key, query = "{}") 
##  $drop() 
##  $export(con = stdout(), bson = FALSE, query = "{}", fields = "{}", sort = "{\"_id\":1}") 
##  $find(query = "{}", fields = "{\"_id\":0}", sort = "{}", skip = 0, limit = 0, handler = NULL, pagesize = 1000) 
##  $import(con, bson = FALSE) 
##  $index(add = NULL, remove = NULL) 
##  $info() 
##  $insert(data, pagesize = 1000, stop_on_error = TRUE, ...) 
##  $iterate(query = "{}", fields = "{\"_id\":0}", sort = "{}", skip = 0, limit = 0) 
##  $mapreduce(map, reduce, query = "{}", sort = "{}", limit = 0, out = NULL, scope = NULL) 
##  $remove(query, just_one = FALSE) 
##  $rename(name, db = NULL) 
##  $replace(query, update = "{}", upsert = FALSE) 
##  $run(command = "{\"ping\": 1}", simplify = TRUE) 
##  $update(query, update = "{\"$set\":{}}", filters = NULL, upsert = FALSE, multiple = FALSE)

Selecting from Mongo database.

movie_name resp_name response city state ethnicity
Shakespeare in Love John 3 New York New York White
Solaris John NA New York New York White
Enter the Dragon John 5 New York New York White
Star Wars John 3 New York New York White
Rumble in the Bronx John NA New York New York White
Autumn Sonata John 5 New York New York White

Here’s a screen shot of what was actually inserted into Mongo database

Comparison between RDBMS and NoSQL database:

  1. RDBMS is structured, and therefore roots out data, not conforming to the database structure, during import or Insert phase.
    NoSQL (in our case, Mongo) doesn’t have any structure, so it doesn’t check whether any new field is in the database. This may have unintended consequences.

  2. RDBMS tables store data in the form of rows, separated in several columns.
    NoSQL databases store data in Collections (equivalent of tables), in JSON data format, which is a Key:Value data store. A NoSQL row is called Document.

  3. RDBMS supports indices.
    Mongo supports indices.

  4. RDBMS supports Primary key, which can be applied on any column that has unique data.
    Mongo supports indices, but provides default key _id

  5. RDBMS supports Foreign Key contraints, for referential integrity.
    Mongo does not support referential integrity.

  6. RDBMS provides Group By clause, for data aggregation.
    Mongo provides Aggregate clause, for data aggregation.

  7. RDBMS support Joins, Triggers.
    Mongo does not.

  8. RDBMS contains predefined schemas.
    Mongo contains dynamic schemas.

  9. RDBMS is not suitable for hierarchical data, though in Oracle queries, it can be achieved by “Connect by prior”.
    Mongo is suitable for hierarchical data.

  10. RDBMS emphasizes on ACID properties (Atomicity, Consistency, Isolation and Durability).
    Mongo emphasizes on CAP theorem (Consistency, Availability, Partition tolerance)

Marker: 607-12