In this exercise we will migrate a database from MySQL to MongoDB.


Load libraries

library(mongolite)
## Warning: package 'mongolite' was built under R version 3.5.3
library(dplyr)
library(tidyr)
library(rvest)
library(ggplot2)
library(knitr)


Preparation

Make sure you have MySQL installed. Then please run the following sql script in order to create the movie_review database:

Github_Link


Loading the SQL tables into R

Be sure you use the correct username and password for your particular MySQL setup! We then loop through all of the tables in the db and store them each as a data frame in one list called df_list.

my_db <- src_mysql(dbname = 'movie_reviews', username = 'user', password = 'password')
my_tables <- db_list_tables(my_db$con)
print(my_tables)
## [1] "critic" "movie"  "review"
df_list <- list()

for (i in my_tables){
  df_list[[i]] <- tbl(my_db, i) %>% collect(n=Inf)
}

df_list %>% head()
## $critic
## # A tibble: 5 x 2
##   critic_id critic_name
##       <int> <chr>      
## 1         1 Aaron      
## 2         2 Beverly    
## 3         3 Charlie    
## 4         4 Dolores    
## 5         5 Edward     
## 
## $movie
## # A tibble: 6 x 2
##   movie_id title               
##      <int> <chr>               
## 1        1 Acquaman            
## 2        2 Black Panther       
## 3        3 The Greatest Showman
## 4        4 Ready Player One    
## 5        5 Mary Poppins Returns
## 6        6 Annihilation        
## 
## $review
## # A tibble: 30 x 3
##    movie_id critic_id movie_rating
##       <int>     <int>        <dbl>
##  1        1         1          3.1
##  2        1         2          3.9
##  3        1         3          3.9
##  4        1         4          3.1
##  5        1         5          3.5
##  6        2         1          4.8
##  7        2         2          4.6
##  8        2         3          4.1
##  9        2         4          5  
## 10        2         5          5  
## # ... with 20 more rows


Migrate to MongoDB

We then use the mongolite package in R to easily insert the data frames as collections into a Mongo database.

Please be sure to set the variable mongo_url to your particular mongo database location. The default is “mongodb://localhost” but you may have a cloud version you prefer. For security purposes I saved my particular URL in a local file and access it with the readLines() function.

You can also choose your own value for db= within the mongo() function. The default is “test”.

mongo_url <- readLines("mongodb_server_url.txt")

for (i in my_tables){
  con <- mongo(collection = i, db = "test", url = mongo_url)
  con$insert(df_list[[i]])
  con$count()
  print(con$find() %>% head(10))
  con$drop()
}
##   critic_id critic_name
## 1         1       Aaron
## 2         2     Beverly
## 3         3     Charlie
## 4         4     Dolores
## 5         5      Edward
##   movie_id                title
## 1        1             Acquaman
## 2        2        Black Panther
## 3        3 The Greatest Showman
## 4        4     Ready Player One
## 5        5 Mary Poppins Returns
## 6        6         Annihilation
##    movie_id critic_id movie_rating
## 1         1         1          3.1
## 2         1         2          3.9
## 3         1         3          3.9
## 4         1         4          3.1
## 5         1         5          3.5
## 6         2         1          4.8
## 7         2         2          4.6
## 8         2         3          4.1
## 9         2         4          5.0
## 10        2         5          5.0

We can see that the information in the collections is the same as the SQL tables.

The advantages of using a NoSQL database are that it is more flexible in its structure and also takes up less disk space. This may be desirable if working with a very large dataset. However, the flexibility can be a disadvantage as well. Data relationships are not strictly enforced, which could lead to duplication and disorganization. Relational databases are often preferred if a dataset is not overly immense and / or cost is not an issue.