Assignment

For this assignment, you should take information from a relational database and migrate it to a NoSQL database of your own choosing.

For the relational database, you might use the flights database, the tb database, the “data skills” database your team created for Project 3, or another database of your own choosing or creation.

For the NoSQL database, you may use MongoDB (which we introduced in week 7), Neo4j, or another NoSQL database of your choosing.

Your migration process needs to be reproducible. R code is encouraged, but not required.

You should also briefly describe the advantages and disadvantages of storing the data in a relational database vs. your NoSQL database.

Introduction

The assignment will use the database created in Assignment 2 of this course - a movie recommendation database with the following schema:

Realational Database Schema

Setup R and install packages to be used

Make Connection to MySQL

ratingsdb = dbConnect(MySQL(), user='data607nn', password='password', dbname='data607Assignment2nn', host='localhost')

Run queries to save database to csv.

MoviesTable <- dbGetQuery(ratingsdb, "SELECT * FROM Movies;")
write.csv(MoviesTable, file="movies.csv")
ReviewersTable <- dbGetQuery(ratingsdb, "SELECT * FROM Reviewers")
write.csv(ReviewersTable, file="reviewers.csv")
RatingsTable <- dbGetQuery(ratingsdb, "SELECT * FROM Ratings")
write.csv(RatingsTable, file="ratings.csv")

Load CSV into Neo4j

graphdb = startGraph("http://localhost:7474/db/data", username = "neo4j", password = "admin")
clear(graphdb, input = F)

query1 = "USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/NNedd/DATA607-Submissions/master/Week%2013%20Assignment/movies.csv' AS row
CREATE (:Movies {MovieID: row.MovieID, MovieName: row.MovieName, ReleaseDate: row.ReleaseDate, BoxOfficeGross: row.BoxOfficeGross});

"
cypher(graphdb, query1)

query2 = "USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/NNedd/DATA607-Submissions/master/Week%2013%20Assignment/reviewers.csv' AS row
CREATE (:Reviewers {ReviewerID: row.ReviewerID, ReviewerName: row.ReviewerName});

"
cypher(graphdb, query2)

query3 = "USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/NNedd/DATA607-Submissions/master/Week%2013%20Assignment/ratings.csv' AS row
CREATE (:Ratings {RatingsID: row.RatingsID, Reviewer: row.Reviewer, Movie: row.Movie, Rating: row.Rating});

"
cypher(graphdb, query3)

Comparison

With a graph database like Neo4j relationship are defined on a row by row basis rather than through the entire table. This makes it more suitable for complex structures.

On the other hand, the relational database is geared towards faster processing and less storage space, very useful advantages when dealing with large datasets.

DATA607-Week 13 Assignment

N Nedd

April 30, 2017