The task here is to migrate from an SQL database to a NoSQL database. I decided to recycle my movie data from week 2 and move it over to mongoDB.
All the relevant data for this work can be found on github here and the output can be found on rpubs here
library(RMySQL)## Loading required package: DBI
library(mongolite)## Warning: package 'mongolite' was built under R version 3.4.4
library(stringr)
library(miniUI)
library(shiny)
library(knitr)I’ve borrowed an awesome snippet that was posted on Slack as a much nicer way allow for users to input passwords.
#a shout out to Justin for posting this awesome snippet on slack!
get_password <- function() {
ui <- miniPage(
gadgetTitleBar("Please enter password for database: data_607 "),
miniContentPanel(
passwordInput("password", "")
)
)
server <- function(input, output) {
observeEvent(input$done, {
stopApp(input$password)
})
observeEvent(input$cancel, {
stopApp(stop("No password.", call. = FALSE))
})
}
runGadget(ui, server, viewer = dialogViewer("Password", height = 200))
}
pw <- get_password() In the interest of reproducibility, I’m going to create the mysql db right here. The only action that the user would need to perform in order to make this code work is to create a mysql database entitled data_607.
#connect to the
con = dbConnect(MySQL(),
user="root",
password=pw,
dbname='DATA_607',
host='localhost')
#load the data
ratings <- read.table("https://raw.githubusercontent.com/plb2018/DATA607/master/ml-latest-small/ratings.csv",header = TRUE, sep = ",",quote = "\"")
movies <- read.table("https://raw.githubusercontent.com/plb2018/DATA607/master/ml-latest-small/movies.csv",header = TRUE, sep = ",",fill = TRUE,quote = "\"")
#create tables from the data
dbWriteTable(con, name='ratings', value=ratings, overwrite=TRUE)## [1] TRUE
dbWriteTable(con, name='movies', value=movies, overwrite=TRUE)## [1] TRUE
#join the data into a third table
dbSendQuery(con, "CREATE TABLE IF NOT EXISTS movie_ratings AS (
SELECT movies.movieId,ratings.userId, ratings.rating, movies.title,movies.genres
FROM movies
INNER JOIN ratings ON movies.movieId = ratings.movieId);")## <MySQLResult:311234192,0,10>
From what I gather, there are a few ways to migrate from SQL to NoSQL, however, it seems to be highly dependent on the input SQL DB and the desired outcome. In the case of my movie DB, the requirements are pretty simple, so i decided to try 2 methods of migration.
A common method seems to be using CSV as a go-between. It appears as though this method offers good versatility, however, it’s probably slow and impractival for extremely large databases.
First I write my SQL tables to files
sql.tables <- dbListTables(con)
for (i in 1:length(sql.tables)){
data<- dbReadTable(con,sql.tables[i])
write.table(data,
paste(sql.tables[i],".csv",sep=""),
row.names=FALSE,
sep=",")
}For the ease of the user, I’ve put the output from the above on github, and I’ll work from that source. We load 1000 rows of the DB from github back into R
movie.ratings <- read.table("https://github.com/plb2018/DATA607/raw/master/DATA_607_week13/movie_ratings.csv",sep=",",header=TRUE, nrows=1000)Then we connect to mongo and add the data. Note that we assume that the user here already has a local copy of mongo running which contains a db called “DATA_607” and a collection called “movies”
mongo <- mongo(db="DATA_607",collection="movies")
mongo$insert(movie.ratings)## List of 5
## $ nInserted : num 1000
## $ nMatched : num 0
## $ nRemoved : num 0
## $ nUpserted : num 0
## $ writeErrors: list()
Now we check and see what’s in the new mongo db:
kable(mongo$find(
query = '{"rating" : {"$gte" : 5}}',
fields = '{"title":true, "rating":true}',
limit = 10)
)| _id | rating | title |
|---|---|---|
| 5ae5f32a6118ff1b6c000df9 | 5 | Sense and Sensibility (1995) |
| 5ae5f32a6118ff1b6c000dfa | 5 | Clueless (1995) |
| 5ae5f32a6118ff1b6c000e01 | 5 | Apollo 13 (1995) |
| 5ae5f32a6118ff1b6c000e09 | 5 | Circle of Friends (1995) |
| 5ae5f32a6118ff1b6c000e10 | 5 | Like Water for Chocolate (Como agua para chocolate) (1992) |
| 5ae5f32a6118ff1b6c000e11 | 5 | Legends of the Fall (1994) |
| 5ae5f32a6118ff1b6c000e37 | 5 | Nightmare Before Christmas, The (1993) |
| 5ae5f32a6118ff1b6c000e39 | 5 | Brady Bunch Movie, The (1995) |
| 5ae5f32a6118ff1b6c000e3d | 5 | Terminator 2: Judgment Day (1991) |
| 5ae5f32a6118ff1b6c000e3e | 5 | Dances with Wolves (1990) |
The data looks good!
Another way, which it appears to be less feasible for complicated SQL DBs is to just go directly from a SQL query into mongo. Here we’re really only loading 1 table, so it’s not a big deal, but i’m less confident that this will work for complicated DBs.
movie.ratings <- dbReadTable(con,"movie_ratings")
mongo <- mongo(db="DATA_607",collection="movies2")
mongo$insert(movie.ratings)## List of 5
## $ nInserted : num 1e+05
## $ nMatched : num 0
## $ nRemoved : num 0
## $ nUpserted : num 0
## $ writeErrors: list()
kable(mongo$find(
query = '{"rating" : {"$gte" : 5}}',
fields = '{"title":true, "rating":true}',
limit = 10)
)| _id | rating | title |
|---|---|---|
| 5ae5f3ae6118ff1b6c0011e2 | 5 | Sense and Sensibility (1995) |
| 5ae5f3ae6118ff1b6c0011e3 | 5 | Clueless (1995) |
| 5ae5f3ae6118ff1b6c0011ea | 5 | Apollo 13 (1995) |
| 5ae5f3ae6118ff1b6c0011f2 | 5 | Circle of Friends (1995) |
| 5ae5f3ae6118ff1b6c0011f9 | 5 | Like Water for Chocolate (Como agua para chocolate) (1992) |
| 5ae5f3ae6118ff1b6c0011fa | 5 | Legends of the Fall (1994) |
| 5ae5f3ae6118ff1b6c001220 | 5 | Nightmare Before Christmas, The (1993) |
| 5ae5f3ae6118ff1b6c001222 | 5 | Brady Bunch Movie, The (1995) |
| 5ae5f3ae6118ff1b6c001226 | 5 | Terminator 2: Judgment Day (1991) |
| 5ae5f3ae6118ff1b6c001227 | 5 | Dances with Wolves (1990) |
The data looks good here too!