DATA 607 - WEEK 13 Assignment

Clean-up and Setup
Create the SQL Tables
- User Password
- The SQL data
Migrate to Mongo
- Method 1 - CSV
- Method 2 - Right from MySQL

The task here is to migrate from an SQL database to a NoSQL database. I decided to recycle my movie data from week 2 and move it over to mongoDB.

All the relevant data for this work can be found on github here and the output can be found on rpubs here

Clean-up and Setup

library(RMySQL)

## Loading required package: DBI

library(mongolite)

## Warning: package 'mongolite' was built under R version 3.4.4

library(stringr)
library(miniUI)
library(shiny)
library(knitr)

Create the SQL Tables

User Password

I’ve borrowed an awesome snippet that was posted on Slack as a much nicer way allow for users to input passwords.

#a shout out to Justin for posting this awesome snippet on slack!

get_password <- function() {
 ui <- miniPage(
   gadgetTitleBar("Please enter password for database: data_607 "),
   miniContentPanel(
     passwordInput("password", "")
   )
 )

 server <- function(input, output) {
   observeEvent(input$done, {
     stopApp(input$password)
   })
   observeEvent(input$cancel, {
     stopApp(stop("No password.", call. = FALSE))
   })
 }

 runGadget(ui, server, viewer = dialogViewer("Password", height = 200))
}

pw <- get_password()

The SQL data

In the interest of reproducibility, I’m going to create the mysql db right here. The only action that the user would need to perform in order to make this code work is to create a mysql database entitled data_607.

#connect to the 
con = dbConnect(MySQL(),
                user="root",
                password=pw,
                dbname='DATA_607',
                host='localhost')

#load the data
ratings <- read.table("https://raw.githubusercontent.com/plb2018/DATA607/master/ml-latest-small/ratings.csv",header = TRUE, sep = ",",quote = "\"")

movies <- read.table("https://raw.githubusercontent.com/plb2018/DATA607/master/ml-latest-small/movies.csv",header = TRUE, sep = ",",fill = TRUE,quote = "\"")

#create tables from the data
dbWriteTable(con, name='ratings', value=ratings, overwrite=TRUE)

## [1] TRUE

dbWriteTable(con, name='movies', value=movies, overwrite=TRUE)

## [1] TRUE

#join the data into a third table
dbSendQuery(con, "CREATE TABLE IF NOT EXISTS movie_ratings AS (
SELECT  movies.movieId,ratings.userId, ratings.rating, movies.title,movies.genres
FROM movies
INNER JOIN ratings ON  movies.movieId = ratings.movieId);")

## <MySQLResult:311234192,0,10>

Migrate to Mongo

From what I gather, there are a few ways to migrate from SQL to NoSQL, however, it seems to be highly dependent on the input SQL DB and the desired outcome. In the case of my movie DB, the requirements are pretty simple, so i decided to try 2 methods of migration.

Method 1 - CSV

A common method seems to be using CSV as a go-between. It appears as though this method offers good versatility, however, it’s probably slow and impractival for extremely large databases.

First I write my SQL tables to files

sql.tables <- dbListTables(con)

for (i in 1:length(sql.tables)){
  data<- dbReadTable(con,sql.tables[i])
  write.table(data,
              paste(sql.tables[i],".csv",sep=""),
              row.names=FALSE,
              sep=",")
}

For the ease of the user, I’ve put the output from the above on github, and I’ll work from that source. We load 1000 rows of the DB from github back into R

movie.ratings <- read.table("https://github.com/plb2018/DATA607/raw/master/DATA_607_week13/movie_ratings.csv",sep=",",header=TRUE, nrows=1000)

Then we connect to mongo and add the data. Note that we assume that the user here already has a local copy of mongo running which contains a db called “DATA_607” and a collection called “movies”

mongo <- mongo(db="DATA_607",collection="movies")
mongo$insert(movie.ratings)

## List of 5
##  $ nInserted  : num 1000
##  $ nMatched   : num 0
##  $ nRemoved   : num 0
##  $ nUpserted  : num 0
##  $ writeErrors: list()

Now we check and see what’s in the new mongo db:

kable(mongo$find(
  query = '{"rating" : {"$gte" : 5}}', 
  fields = '{"title":true, "rating":true}',
  limit = 10)
  )

_id	rating	title
5ae5f32a6118ff1b6c000df9	5	Sense and Sensibility (1995)
5ae5f32a6118ff1b6c000dfa	5	Clueless (1995)
5ae5f32a6118ff1b6c000e01	5	Apollo 13 (1995)
5ae5f32a6118ff1b6c000e09	5	Circle of Friends (1995)
5ae5f32a6118ff1b6c000e10	5	Like Water for Chocolate (Como agua para chocolate) (1992)
5ae5f32a6118ff1b6c000e11	5	Legends of the Fall (1994)
5ae5f32a6118ff1b6c000e37	5	Nightmare Before Christmas, The (1993)
5ae5f32a6118ff1b6c000e39	5	Brady Bunch Movie, The (1995)
5ae5f32a6118ff1b6c000e3d	5	Terminator 2: Judgment Day (1991)
5ae5f32a6118ff1b6c000e3e	5	Dances with Wolves (1990)

The data looks good!

Method 2 - Right from MySQL

Another way, which it appears to be less feasible for complicated SQL DBs is to just go directly from a SQL query into mongo. Here we’re really only loading 1 table, so it’s not a big deal, but i’m less confident that this will work for complicated DBs.

movie.ratings <- dbReadTable(con,"movie_ratings")
mongo <- mongo(db="DATA_607",collection="movies2")
mongo$insert(movie.ratings)

## List of 5
##  $ nInserted  : num 1e+05
##  $ nMatched   : num 0
##  $ nRemoved   : num 0
##  $ nUpserted  : num 0
##  $ writeErrors: list()

kable(mongo$find(
  query = '{"rating" : {"$gte" : 5}}', 
  fields = '{"title":true, "rating":true}',
  limit = 10)
  )

_id	rating	title
5ae5f3ae6118ff1b6c0011e2	5	Sense and Sensibility (1995)
5ae5f3ae6118ff1b6c0011e3	5	Clueless (1995)
5ae5f3ae6118ff1b6c0011ea	5	Apollo 13 (1995)
5ae5f3ae6118ff1b6c0011f2	5	Circle of Friends (1995)
5ae5f3ae6118ff1b6c0011f9	5	Like Water for Chocolate (Como agua para chocolate) (1992)
5ae5f3ae6118ff1b6c0011fa	5	Legends of the Fall (1994)
5ae5f3ae6118ff1b6c001220	5	Nightmare Before Christmas, The (1993)
5ae5f3ae6118ff1b6c001222	5	Brady Bunch Movie, The (1995)
5ae5f3ae6118ff1b6c001226	5	Terminator 2: Judgment Day (1991)
5ae5f3ae6118ff1b6c001227	5	Dances with Wolves (1990)

The data looks good here too!