For Project 4, you should take information from a relational database and migrate it to a NoSQL database of your own choosing.
For the relational database, you might use the flights database, the tb database, the “data skills” database your team created for Project 3, or another database of your own choosing or creation.
For the NoSQL database, you may use MongoDB, Neo4j (which we introduce in Week 12), or another NoSQL database of your choosing.
Your migration process needs to be reproducible. R code is encouraged, but not required. You should also briefly describe the advantages and disadvantages of storing the data in a relational database vs. your NoSQL database.
The following steps were taken:
c:\data\dbC:\Program Files\MongoDB\Server\3.2\binThe objects to migrate were exported using Workbench Export Wizard. Tables were exported to json file format, while the result of the table-joins query was exported to CSV file.
These 4 files can be saved in local directory (e.g. c:\Mongo)
TABLES:
movies - contains 6 popular movies in 2016 Link to github
people - contains 10 names of survey participants Link to github
movie_ranking - contains ranking of the 6 movies by the 10 survey participants. The rank ranges 1-5, representing 1-5 stars Link to github
QUERY RESULT
A batch file (sql2mongo.bat) was created to execute mongoimport commands which creates corresponding collections in Mongo. This file has to be saved to same folder location as the SQL export files above. Output of the script is directed to console. A pause has been provided as a way to review the outcome before exiting.
Migration Output
I found Mongo simple to install and to immediately start using – no issues with local service and connections, which I experienced with SQL install. Having no knowledge of NoSQL prior to starting this project, there is learning curve, but there seems to be a simplicity in the syntax and construct, which helped. I think one of the strengths of NoSQL is retrieving data faster since information can be stored in de-normalized form. With SQL, the relationships force table joins which sometimes gets the result slower. On the other hand, relationship can enforced data integrity which is important in more traditional processses/transactions.