For Project 4, you should take information from a relational database and migrate it to a NoSQL database of your own choosing. For the relational database, you might use the flights database, the tb database, the “data skills” database your team created for Project 3, or another database of your own choosing or creation. For the NoSQL database, you may use MongoDB, Neo4j (which we introduce in Week 12), or another NoSQL database of your choosing.

Your migration process needs to be reproducible. R code is encouraged, but not required. You should also briefly describe the advantages and disadvantages of storing the data in a relational database vs. your NoSQL database.

Loading required libraries

library(RMySQL)
## Loading required package: DBI
library(knitr)
library(rmongodb)
library(mongolite)

Connecting to MySQL

We connect to MySQL with dbConnect and select the existing database ds_skills, which our team created for Project 3.

The connection is made with the code below. The original chunk is hidden with echo=FALSE to protect the password.

mydb = dbConnect(MySQL(), user='root', password='****', dbname='ds_skills', host='localhost')
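An alternative to hiding the chunk is to read the credentials from environment variables, so the code can stay visible. The following is a minimal sketch, not the method used in this report: the helper name mysql_settings and the variable names MYSQL_USER and MYSQL_PASSWORD are assumptions made for illustration.

```r
# Hypothetical helper: collect MySQL connection settings from environment
# variables so that no password appears in the R Markdown source.
# MYSQL_USER and MYSQL_PASSWORD are assumed names, not from the original project.
mysql_settings <- function(dbname = "ds_skills", host = "localhost") {
  password <- Sys.getenv("MYSQL_PASSWORD", unset = NA)
  if (is.na(password)) {
    stop("Set the MYSQL_PASSWORD environment variable before knitting.")
  }
  list(
    user     = Sys.getenv("MYSQL_USER", unset = "root"),
    password = password,
    dbname   = dbname,
    host     = host
  )
}

# Usage (requires a running MySQL server and the RMySQL package):
# settings <- mysql_settings()
# mydb <- do.call(DBI::dbConnect, c(list(RMySQL::MySQL()), settings))
```

Each user then sets the two variables once (for example in ~/.Renviron) and can knit the report without editing the code.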

dbListTables(mydb) #list all the tables in database 
## [1] "doc_category"   "doc_skills"     "documents"      "skill_category"
## [5] "skills"
dbListFields(mydb, 'documents') # choose one table from the database and list all fields in the table
## [1] "doc_id"    "dc_id"     "doc_path"  "doc_title"
doc = dbGetQuery(mydb, "select * from documents")

kable(head(doc))
| doc_id | dc_id | doc_path | doc_title |
|---|---|---|---|
| 1 | 2 | http://www.kdnuggets.com/2014/11/9-must-have-skills-data-scientist.html | Must-Have Skills You Need to Become a Data Scientist |
| 2 | 2 | https://www.quora.com/What-are-the-most-valuable-skills-to-learn-for-a-data-scientist-now | What are the most valuable skills to learn for a data scientist ? |
| 3 | 2 | http://dataconomy.com/top-10-data-science-skills-and-how-to-learn-them/ | The Top 10 Data Science Skills, and How to Learn Them |
| 4 | 2 | http://blog.udacity.com/2014/11/data-science-job-skills.html | 8 Skills You Need to Be a Data Scientist \| Udacity |
| 5 | 2 | http://www.mastersindatascience.org/careers/data-scientist/ | Data Scientist Careers \| How to Become a Data Scientist |
| 6 | 2 | https://adtmag.com/articles/2016/01/08/data-science-skills.aspx | What Are the Most-Wanted Data Science Skills for 2016 ? |
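The query above reads only the documents table, while dbListTables reports five tables. A full migration could loop over all of them. The sketch below is one possible shape for that loop; migrate_tables is a hypothetical helper, and the reader and writer are passed in as functions so the loop can be tried without a database.

```r
# Hypothetical helper: copy every table to a collection of the same name.
# read_table(name) must return a data frame; write_table(name, rows) stores it.
migrate_tables <- function(tables, read_table, write_table) {
  counts <- vapply(tables, function(tbl) {
    rows <- read_table(tbl)
    write_table(tbl, rows)
    nrow(rows)
  }, integer(1))
  counts  # named integer vector: rows copied per table
}

# Usage with the connection from this report (requires MySQL and MongoDB):
# migrate_tables(
#   dbListTables(mydb),
#   read_table  = function(tbl) dbReadTable(mydb, tbl),
#   write_table = function(tbl, rows) {
#     mongolite::mongo(collection = tbl, db = "ds_skills")$insert(rows)
#   }
# )
```

The returned counts give a quick per-table check that every row read from MySQL was handed to the writer.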

MongoDB

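mongo <- mongo.create() # rmongodb: open a connection to the local MongoDB server
mongo.is.connected(mongo) # test the connection
## [1] TRUE
mongo = mongo(collection = "documents",  db = "ds_skills") # mongolite: handle to the documents collection in ds_skills (replaces the rmongodb connection object)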


mongo$insert(doc) # insert each row of doc as one document
## 
Complete! Processed total of 218 rows.
## [1] TRUE
kable(head(mongo$find('{}')))  # retrieve all documents in the collection and show the first six
## 
 Found 1000 records...
 Found 2000 records...
 Found 3000 records...
 Found 4000 records...
 Found 4578 records...
 Imported 4578 records. Simplifying into dataframe...
| dc_id | doc_path | doc_title | doc_id |
|---|---|---|---|
| 2 | http://www.kdnuggets.com/2014/11/9-must-have-skills-data-scientist.html | Must-Have Skills You Need to Become a Data Scientist | NA |
| 2 | https://www.quora.com/What-are-the-most-valuable-skills-to-learn-for-a-data-scientist-now | What are the most valuable skills to learn for a data scientist ? | NA |
| 2 | http://dataconomy.com/top-10-data-science-skills-and-how-to-learn-them/ | The Top 10 Data Science Skills, and How to Learn Them | NA |
| 2 | http://blog.udacity.com/2014/11/data-science-job-skills.html | 8 Skills You Need to Be a Data Scientist \| Udacity | NA |
| 2 | http://www.mastersindatascience.org/careers/data-scientist/ | Data Scientist Careers \| How to Become a Data Scientist | NA |
| 2 | https://adtmag.com/articles/2016/01/08/data-science-skills.aspx | What Are the Most-Wanted Data Science Skills for 2016 ? | NA |
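The insert above processed 218 rows, yet find() returned 4578 records. That would be consistent with the collection already holding documents from earlier runs, although this is an inference from the output and not something the report states. For a reproducible migration, one option is to empty the collection before inserting and then compare counts. The sketch below relies on mongolite's count(), drop() and insert() methods; reload_collection is a hypothetical helper name.

```r
# Hypothetical helper: make the load repeatable by clearing the target first.
# `coll` is any object exposing count(), drop() and insert(), such as a
# mongolite collection handle.
reload_collection <- function(coll, rows) {
  if (coll$count() > 0) {
    coll$drop()          # remove documents left over from earlier runs
  }
  coll$insert(rows)
  stopifnot(coll$count() == nrow(rows))  # target must match the source
  invisible(nrow(rows))
}

# Usage (requires a running MongoDB server):
# docs <- mongolite::mongo(collection = "documents", db = "ds_skills")
# reload_collection(docs, doc)
```

With this pattern, knitting the report twice leaves the same number of documents in MongoDB as rows in the MySQL table.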

Advantages of non-relational databases over relational databases:

No schema required: Data can be inserted in a NoSQL database without first defining a rigid database schema. As a corollary, the format of the data being inserted can be changed at any time, without application disruption. This provides immense application flexibility, which ultimately delivers substantial business flexibility.

Auto elasticity: Many NoSQL databases can spread data across multiple servers without requiring application assistance. Servers can be added to or removed from the data layer without application downtime.

Key-value stores: As the name implies, a key-value store is a system that stores values indexed for retrieval by keys. These systems can hold structured or unstructured data.

Column-oriented databases: Rather than storing information in a heavily structured table of columns and rows with uniformly sized fields for each record, as relational databases do, column-oriented databases group closely related data into columns that can be extended as needed.

Document-based stores: These databases store and organize data as collections of documents, rather than as structured tables with uniformly sized fields for each record. Users can add any number of fields of any length to a document.

Flexibility: The major NoSQL systems are flexible enough to let developers model and query data in ways that meet the needs of their applications.

Faster: NoSQL databases can process some workloads faster than relational databases, because their data models are simpler.
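The document-store point above can be illustrated with the ds_skills data: instead of keeping a join table such as doc_skills, the linked skill ids can be embedded in each document. The following base R sketch uses toy data. The column skill_id in doc_skills is an assumption, since the report lists only the fields of documents, and embed_skills is a hypothetical helper.

```r
# Toy data: the columns of doc_skills are assumed for illustration.
documents <- data.frame(doc_id = c(1, 2),
                        doc_title = c("Title A", "Title B"),
                        stringsAsFactors = FALSE)
doc_skills <- data.frame(doc_id = c(1, 1, 2),
                         skill_id = c(10, 11, 10))

# Hypothetical helper: turn each row of `documents` into a list that embeds
# the skill ids linked to it, instead of keeping them in a join table.
embed_skills <- function(documents, doc_skills) {
  skills_by_doc <- split(doc_skills$skill_id, doc_skills$doc_id)
  lapply(seq_len(nrow(documents)), function(i) {
    id <- documents$doc_id[i]
    skills <- skills_by_doc[[as.character(id)]]
    list(doc_id = id,
         doc_title = documents$doc_title[i],
         skill_ids = if (is.null(skills)) numeric(0) else skills)
  })
}

nested <- embed_skills(documents, doc_skills)
str(nested[[1]])
```

Each element of nested carries its own list of skills, so documents with many skills, few skills, or none can sit side by side in one collection without a schema change.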

Disadvantages of non-relational databases compared with relational databases:

Lack of community: NoSQL is relatively new, and its community is smaller and less established. Meanwhile, MySQL has a seasoned community willing to help you start on your journey into the world of databases.

Lack of reporting tools: A major problem with NoSQL databases is the lack of reporting tools for analysis and performance testing. However, with MySQL, you can find a wide array of reporting tools to help you prove your application’s validity.

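Weaker ACID guarantees: Relational database systems function on the ACID paradigm (Atomicity, Consistency, Isolation, Durability). Many NoSQL databases relax one or more of these guarantees in exchange for scalability, although some, including MongoDB since version 4.0, support multi-document transactions.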

Lack of standardization: In order for NoSQL to grow, it needs a standard query language like SQL. This is a major issue highlighted by researchers at Microsoft, who claim that NoSQL's lack of standardization can cause problems during migration. Besides this, standardization is important for the database industry to unify itself in the future.

References:

http://www.thewindowsclub.com/difference-sql-nosql-comparision

http://blog.monitor.us/2013/05/cc-in-review-the-key-differences-between-mysql-and-nosql-dbs/

Additional Information: To create ds_skills in your MySQL environment, please use the SQL script described in the Project 3 GitHub link. Running the script creates ds_skills in your relational database. You can then run the code above by connecting to MySQL with your own username and password.