Migrating from RDB to MongoDB

The task of moving data can sometimes be daunting but with the help of R and packages like mongolite and RMySQL it is fairly easy.

Creating collection in Mongo db ‘flights’ and creating connection

# creating collection in mongo flights db and connecting to it
airlines_col <- mongo(collection = 'airlines',
                      db = 'flights',
                      url = 'mongodb://localhost')

airports_col <- mongo(collection = 'airports',
                      db = 'flights',
                      url = 'mongodb://localhost')

flights_col <- mongo(collection = 'flights',
                      db = 'flights',
                      url = 'mongodb://localhost')

planes_col <- mongo(collection = 'planes',
                      db = 'flights',
                      url = 'mongodb://localhost')

weather_col <- mongo(collection = 'weather',
                      db = 'flights',
                      url = 'mongodb://localhost')

Connect to MySQL

# connecting to MySQL flights db
flights_sql <- dbConnect(MySQL(), user='root', password='root12345', dbname='flights', host='localhost')

# getting dataframes from all tables flights db
airlines <- dbGetQuery(flights_sql, 'SELECT * FROM airlines')
airports <- dbGetQuery(flights_sql, 'SELECT * FROM airports')
flights <- dbGetQuery(flights_sql, 'SELECT * FROM flights')
planes <- dbGetQuery(flights_sql, 'SELECT * FROM planes')
weather <- dbGetQuery(flights_sql, 'SELECT * FROM weather')

Some quick tidying to remove new line character ‘’

# remove "\r" from end of line in airlines data frame
airlines$name <- sub('\r$','',airlines$name)

Inserting the tables from MySQL to the Mongo collections

# insert tables into collections, dropping anything in the collection first
airlines_col$drop()
airlines_col$insert(airlines)
## List of 5
##  $ nInserted  : num 16
##  $ nMatched   : num 0
##  $ nRemoved   : num 0
##  $ nUpserted  : num 0
##  $ writeErrors: list()
airports_col$drop()
airports_col$insert(airports)
## List of 5
##  $ nInserted  : num 1397
##  $ nMatched   : num 0
##  $ nRemoved   : num 0
##  $ nUpserted  : num 0
##  $ writeErrors: list()
flights_col$drop()
flights_col$insert(flights)
## List of 5
##  $ nInserted  : num 336776
##  $ nMatched   : num 0
##  $ nRemoved   : num 0
##  $ nUpserted  : num 0
##  $ writeErrors: list()
planes_col$drop()
planes_col$insert(planes)
## List of 5
##  $ nInserted  : num 3322
##  $ nMatched   : num 0
##  $ nRemoved   : num 0
##  $ nUpserted  : num 0
##  $ writeErrors: list()
weather_col$drop()
weather_col$insert(weather)
## List of 5
##  $ nInserted  : num 8719
##  $ nMatched   : num 0
##  $ nRemoved   : num 0
##  $ nUpserted  : num 0
##  $ writeErrors: list()

Quick check to make sure all of the data is now store in the collections

# checking that the collections now match the MySQL tables
airlines_col$count('{}') == nrow(airlines)
## [1] TRUE
airports_col$count('{}') == nrow(airports)
## [1] TRUE
flights_col$count('{}') == nrow(flights)
## [1] TRUE
planes_col$count('{}') == nrow(planes)
## [1] TRUE
weather_col$count('{}') == nrow(weather)
## [1] TRUE

Conclusion

  • Storing data in relational databases, like MySQL, are useful for structured relational data, ensuring data integrity.
  • However, NoSQL databases like MongoDB increase felxiblilty in database structure and is more easily scaled
  • Once connections are established, the insertion of a table from MySQL to MongoDB collection can be done with one line like below:
airports_col$insert(dbGetQuery(flights_sql, 'SELECT * FROM airports'))
## List of 5
##  $ nInserted  : num 1397
##  $ nMatched   : num 0
##  $ nRemoved   : num 0
##  $ nUpserted  : num 0
##  $ writeErrors: list()