Assignment Overview

This assignment was to work with converting a SQL type DB’s data into an noSQL type DB. The databases chosen to work with on this project was mySQL for the SQL DB and MongoDB for the noSQL db. Obviously there are various ways to do such a task, export data into a file from the SQL DB, and then load that file into the noSQL DB would be one such method. Here, though, R was used to programatically read the data from the mySQL DB and then insert into noSQL Mongo DB.

The Code…

The code is a bunch of R functions, and one just needs to call the function “convertMySQLToMongo” with a parameter of the source mySQL schema, and another parameter being a list of tables that one wants to import. The code below is commented so for most details, I refer you to those comments.

But, one function will be called out: the “popMongoTable” function. This function does the actual work, of reading data from mySQL to Mongo for a passed in table name parameter. The work needing to be done, is just one line:

mongo(collection = tableName)$insert(dbGetQuery(dbConn, str_c(“select * from”, tableName)))

So essentially to do a simple migration, requires one line of code.

library(jsonlite)
library(mongolite)
library(DBI)
library(RMySQL)
library(stringr)

rDBName <- "flights"
passWordFile <- "L:\\school\\cuny\\dbPassword.txt"


##this will drop any existing collections with the same names we wish to load...beware :)
dropMongo <- function(collections)
{
    #we will get error messages if we run this clean up and the collection doesn't exist in mongo
    ##so turn off messages
    options(show.error.messages = FALSE)
    
    for(c in collections)
    {
        m <- mongo(collection = c)
        try(m$drop())   ##returns error if doesn't exist, could do existance checks, but why bother
    }

    options(show.error.messages = TRUE)
}

##just a function to return a mysql db connection
getDBConn <- function(dbName)
{
    db <- dbDriver("MySQL")
    dbConn <-dbConnect(db, user='root',scan(passWordFile, what = "character"),dbname= dbName);
    return(dbConn)
}

##takes in a table name that exists on mySQL db, and a mysql connection, and copies data from one db to the other
popMongoTable <- function(tableName, dbConn)
{
    mongo(collection = tableName)$insert(dbGetQuery(dbConn, str_c("select * from ", tableName)))
}

##this takes in my sql schema name, connects to it, and then loops through the list of passed in tables
##calling popMongoTable to copy the data from mysql to mongo
convertMySQLToMongo <- function(db, tables)
{
    conn <- getDBConn(db)
    lapply(tables, popMongoTable, conn)
    dbDisconnect(conn)
}

#get a list of tables for a given schema on mysql
getMySqlTables <- function(db)
{
    conn <- getDBConn(db)
    tables <- dbListTables(conn)
    dbDisconnect(conn)
    return(tables)
}


#function that will compare (and print) counts from mysql and mongo, so we can see all data transferred
#also prints head of collection from mongo for visual verification that data looks right
compareDBs <- function(sqlDB, table)
{
    conn <- getDBConn(sqlDB)
    print(str_c("Count from mySql table: ", table, " is ", dbGetQuery(conn, str_c("select count(*) from ", table))))
    c <- mongo(collection = table)
    print(str_c("Count from Mongo Collection: ", table, " is ", c$count()))
    print(head(c$find(),5))
    dbDisconnect(conn)
    
}

Let’s Use it…

To test the functionality, we will migrate the “flights” schema stored in mySQL. This consists of tables, airlines, airports, flights, planes, weather. We will first run a command to drop any existing collections with the names of the tables we will import…just so we are sure to start fresh.

dropMongo(getMySqlTables("flights"))

Ok, now let’s move stuff from mySQL to Mongo:

convertMySQLToMongo("flights", getMySqlTables("flights"))
## 
Complete! Processed total of 16 rows.
## 
Processed 1000 rows...
Complete! Processed total of 1397 rows.
## 
Processed 1000 rows...
Processed 2000 rows...
Processed 3000 rows...
Processed 4000 rows...
Processed 5000 rows...
Processed 6000 rows...
Processed 7000 rows...
Processed 8000 rows...
Processed 9000 rows...
Processed 10000 rows...
Processed 11000 rows...
Processed 12000 rows...
Processed 13000 rows...
Processed 14000 rows...
Processed 15000 rows...
Processed 16000 rows...
Processed 17000 rows...
Processed 18000 rows...
Processed 19000 rows...
Processed 20000 rows...
Processed 21000 rows...
Processed 22000 rows...
Processed 23000 rows...
Processed 24000 rows...
Processed 25000 rows...
Processed 26000 rows...
Processed 27000 rows...
Processed 28000 rows...
Processed 29000 rows...
Processed 30000 rows...
Processed 31000 rows...
Processed 32000 rows...
Processed 33000 rows...
Processed 34000 rows...
Processed 35000 rows...
Processed 36000 rows...
Processed 37000 rows...
Processed 38000 rows...
Processed 39000 rows...
Processed 40000 rows...
Processed 41000 rows...
Processed 42000 rows...
Processed 43000 rows...
Processed 44000 rows...
Processed 45000 rows...
Processed 46000 rows...
Processed 47000 rows...
Processed 48000 rows...
Processed 49000 rows...
Processed 50000 rows...
Processed 51000 rows...
Processed 52000 rows...
Processed 53000 rows...
Processed 54000 rows...
Processed 55000 rows...
Processed 56000 rows...
Processed 57000 rows...
Processed 58000 rows...
Processed 59000 rows...
Processed 60000 rows...
Processed 61000 rows...
Processed 62000 rows...
Processed 63000 rows...
Processed 64000 rows...
Processed 65000 rows...
Processed 66000 rows...
Processed 67000 rows...
Processed 68000 rows...
Processed 69000 rows...
Processed 70000 rows...
Processed 71000 rows...
Processed 72000 rows...
Processed 73000 rows...
Processed 74000 rows...
Processed 75000 rows...
Processed 76000 rows...
Processed 77000 rows...
Processed 78000 rows...
Processed 79000 rows...
Processed 80000 rows...
Processed 81000 rows...
Processed 82000 rows...
Processed 83000 rows...
Processed 84000 rows...
Processed 85000 rows...
Processed 86000 rows...
Processed 87000 rows...
Processed 88000 rows...
Processed 89000 rows...
Processed 90000 rows...
Processed 91000 rows...
Processed 92000 rows...
Processed 93000 rows...
Processed 94000 rows...
Processed 95000 rows...
Processed 96000 rows...
Processed 97000 rows...
Processed 98000 rows...
Processed 99000 rows...
Processed 1e+05 rows...
Processed 101000 rows...
Processed 102000 rows...
Processed 103000 rows...
Processed 104000 rows...
Processed 105000 rows...
Processed 106000 rows...
Processed 107000 rows...
Processed 108000 rows...
Processed 109000 rows...
Processed 110000 rows...
Processed 111000 rows...
Processed 112000 rows...
Processed 113000 rows...
Processed 114000 rows...
Processed 115000 rows...
Processed 116000 rows...
Processed 117000 rows...
Processed 118000 rows...
Processed 119000 rows...
Processed 120000 rows...
Processed 121000 rows...
Processed 122000 rows...
Processed 123000 rows...
Processed 124000 rows...
Processed 125000 rows...
Processed 126000 rows...
Processed 127000 rows...
Processed 128000 rows...
Processed 129000 rows...
Processed 130000 rows...
Processed 131000 rows...
Processed 132000 rows...
Processed 133000 rows...
Processed 134000 rows...
Processed 135000 rows...
Processed 136000 rows...
Processed 137000 rows...
Processed 138000 rows...
Processed 139000 rows...
Processed 140000 rows...
Processed 141000 rows...
Processed 142000 rows...
Processed 143000 rows...
Processed 144000 rows...
Processed 145000 rows...
Processed 146000 rows...
Processed 147000 rows...
Processed 148000 rows...
Processed 149000 rows...
Processed 150000 rows...
Processed 151000 rows...
Processed 152000 rows...
Processed 153000 rows...
Processed 154000 rows...
Processed 155000 rows...
Processed 156000 rows...
Processed 157000 rows...
Processed 158000 rows...
Processed 159000 rows...
Processed 160000 rows...
Processed 161000 rows...
Processed 162000 rows...
Processed 163000 rows...
Processed 164000 rows...
Processed 165000 rows...
Processed 166000 rows...
Processed 167000 rows...
Processed 168000 rows...
Processed 169000 rows...
Processed 170000 rows...
Processed 171000 rows...
Processed 172000 rows...
Processed 173000 rows...
Processed 174000 rows...
Processed 175000 rows...
Processed 176000 rows...
Processed 177000 rows...
Processed 178000 rows...
Processed 179000 rows...
Processed 180000 rows...
Processed 181000 rows...
Processed 182000 rows...
Processed 183000 rows...
Processed 184000 rows...
Processed 185000 rows...
Processed 186000 rows...
Processed 187000 rows...
Processed 188000 rows...
Processed 189000 rows...
Processed 190000 rows...
Processed 191000 rows...
Processed 192000 rows...
Processed 193000 rows...
Processed 194000 rows...
Processed 195000 rows...
Processed 196000 rows...
Processed 197000 rows...
Processed 198000 rows...
Processed 199000 rows...
Processed 2e+05 rows...
Processed 201000 rows...
Processed 202000 rows...
Processed 203000 rows...
Processed 204000 rows...
Processed 205000 rows...
Processed 206000 rows...
Processed 207000 rows...
Processed 208000 rows...
Processed 209000 rows...
Processed 210000 rows...
Processed 211000 rows...
Processed 212000 rows...
Processed 213000 rows...
Processed 214000 rows...
Processed 215000 rows...
Processed 216000 rows...
Processed 217000 rows...
Processed 218000 rows...
Processed 219000 rows...
Processed 220000 rows...
Processed 221000 rows...
Processed 222000 rows...
Processed 223000 rows...
Processed 224000 rows...
Processed 225000 rows...
Processed 226000 rows...
Processed 227000 rows...
Processed 228000 rows...
Processed 229000 rows...
Processed 230000 rows...
Processed 231000 rows...
Processed 232000 rows...
Processed 233000 rows...
Processed 234000 rows...
Processed 235000 rows...
Processed 236000 rows...
Processed 237000 rows...
Processed 238000 rows...
Processed 239000 rows...
Processed 240000 rows...
Processed 241000 rows...
Processed 242000 rows...
Processed 243000 rows...
Processed 244000 rows...
Processed 245000 rows...
Processed 246000 rows...
Processed 247000 rows...
Processed 248000 rows...
Processed 249000 rows...
Processed 250000 rows...
Processed 251000 rows...
Processed 252000 rows...
Processed 253000 rows...
Processed 254000 rows...
Processed 255000 rows...
Processed 256000 rows...
Processed 257000 rows...
Processed 258000 rows...
Processed 259000 rows...
Processed 260000 rows...
Processed 261000 rows...
Processed 262000 rows...
Processed 263000 rows...
Processed 264000 rows...
Processed 265000 rows...
Processed 266000 rows...
Processed 267000 rows...
Processed 268000 rows...
Processed 269000 rows...
Processed 270000 rows...
Processed 271000 rows...
Processed 272000 rows...
Processed 273000 rows...
Processed 274000 rows...
Processed 275000 rows...
Processed 276000 rows...
Processed 277000 rows...
Processed 278000 rows...
Processed 279000 rows...
Processed 280000 rows...
Processed 281000 rows...
Processed 282000 rows...
Processed 283000 rows...
Processed 284000 rows...
Processed 285000 rows...
Processed 286000 rows...
Processed 287000 rows...
Processed 288000 rows...
Processed 289000 rows...
Processed 290000 rows...
Processed 291000 rows...
Processed 292000 rows...
Processed 293000 rows...
Processed 294000 rows...
Processed 295000 rows...
Processed 296000 rows...
Processed 297000 rows...
Processed 298000 rows...
Processed 299000 rows...
Processed 3e+05 rows...
Processed 301000 rows...
Processed 302000 rows...
Processed 303000 rows...
Processed 304000 rows...
Processed 305000 rows...
Processed 306000 rows...
Processed 307000 rows...
Processed 308000 rows...
Processed 309000 rows...
Processed 310000 rows...
Processed 311000 rows...
Processed 312000 rows...
Processed 313000 rows...
Processed 314000 rows...
Processed 315000 rows...
Processed 316000 rows...
Processed 317000 rows...
Processed 318000 rows...
Processed 319000 rows...
Processed 320000 rows...
Processed 321000 rows...
Processed 322000 rows...
Processed 323000 rows...
Processed 324000 rows...
Processed 325000 rows...
Processed 326000 rows...
Processed 327000 rows...
Processed 328000 rows...
Processed 329000 rows...
Processed 330000 rows...
Processed 331000 rows...
Processed 332000 rows...
Processed 333000 rows...
Processed 334000 rows...
Processed 335000 rows...
Processed 336000 rows...
Complete! Processed total of 336776 rows.
## 
Processed 1000 rows...
Processed 2000 rows...
Processed 3000 rows...
Complete! Processed total of 3322 rows.
## 
Processed 1000 rows...
Processed 2000 rows...
Processed 3000 rows...
Processed 4000 rows...
Processed 5000 rows...
Processed 6000 rows...
Processed 7000 rows...
Processed 8000 rows...
Complete! Processed total of 8719 rows.
## [1] TRUE

Ok done :)

Lets check our work. First let’s just query the airlines collection on Mongo, and see if we get anything back:

c <- mongo(collection = "airlines")
c$find('{"carrier" : "AA"} ')
## 
 Found 1 records...
 Imported 1 records. Simplifying into dataframe...
##   carrier                     name
## 1      AA American Airlines Inc.\r

Ok, we did, so now let’s run the compare function through all the tables and see if all the counts match, as well as see the top few rows of each collection on Mongo.

compareDBs(rDBName, "airlines")
## [1] "Count from mySql table: airlines is 16"
## [1] "Count from Mongo Collection: airlines is 16"
## 
 Found 16 records...
 Imported 16 records. Simplifying into dataframe...
##   carrier                     name
## 1      9E      Endeavor Air Inc.\r
## 2      AA American Airlines Inc.\r
## 3      AS   Alaska Airlines Inc.\r
## 4      B6        JetBlue Airways\r
## 5      DL   Delta Air Lines Inc.\r
## [1] TRUE
compareDBs(rDBName, "airports")
## [1] "Count from mySql table: airports is 1397"
## [1] "Count from Mongo Collection: airports is 1397"
## 
 Found 1000 records...
 Found 1397 records...
 Imported 1397 records. Simplifying into dataframe...
##   faa                          name      lat       lon  alt tz dst
## 1 04G             Lansdowne Airport 41.13047 -80.61958 1044 -5   A
## 2 06A Moton Field Municipal Airport 32.46057 -85.68003  264 -5   A
## 3 06C           Schaumburg Regional 41.98934 -88.10124  801 -6   A
## 4 06N               Randall Airport 41.43191 -74.39156  523 -5   A
## 5 09J         Jekyll Island Airport 31.07447 -81.42778   11 -4   A
## [1] TRUE
compareDBs(rDBName, "flights")
## [1] "Count from mySql table: flights is 336776"
## [1] "Count from Mongo Collection: flights is 336776"
## 
 Found 1000 records...
 Found 2000 records...
 Found 3000 records...
 Found 4000 records...
 Found 5000 records...
 Found 6000 records...
 Found 7000 records...
 Found 8000 records...
 Found 9000 records...
 Found 10000 records...
 Found 11000 records...
 Found 12000 records...
 Found 13000 records...
 Found 14000 records...
 Found 15000 records...
 Found 16000 records...
 Found 17000 records...
 Found 18000 records...
 Found 19000 records...
 Found 20000 records...
 Found 21000 records...
 Found 22000 records...
 Found 23000 records...
 Found 24000 records...
 Found 25000 records...
 Found 26000 records...
 Found 27000 records...
 Found 28000 records...
 Found 29000 records...
 Found 30000 records...
 Found 31000 records...
 Found 32000 records...
 Found 33000 records...
 Found 34000 records...
 Found 35000 records...
 Found 36000 records...
 Found 37000 records...
 Found 38000 records...
 Found 39000 records...
 Found 40000 records...
 Found 41000 records...
 Found 42000 records...
 Found 43000 records...
 Found 44000 records...
 Found 45000 records...
 Found 46000 records...
 Found 47000 records...
 Found 48000 records...
 Found 49000 records...
 Found 50000 records...
 Found 51000 records...
 Found 52000 records...
 Found 53000 records...
 Found 54000 records...
 Found 55000 records...
 Found 56000 records...
 Found 57000 records...
 Found 58000 records...
 Found 59000 records...
 Found 60000 records...
 Found 61000 records...
 Found 62000 records...
 Found 63000 records...
 Found 64000 records...
 Found 65000 records...
 Found 66000 records...
 Found 67000 records...
 Found 68000 records...
 Found 69000 records...
 Found 70000 records...
 Found 71000 records...
 Found 72000 records...
 Found 73000 records...
 Found 74000 records...
 Found 75000 records...
 Found 76000 records...
 Found 77000 records...
 Found 78000 records...
 Found 79000 records...
 Found 80000 records...
 Found 81000 records...
 Found 82000 records...
 Found 83000 records...
 Found 84000 records...
 Found 85000 records...
 Found 86000 records...
 Found 87000 records...
 Found 88000 records...
 Found 89000 records...
 Found 90000 records...
 Found 91000 records...
 Found 92000 records...
 Found 93000 records...
 Found 94000 records...
 Found 95000 records...
 Found 96000 records...
 Found 97000 records...
 Found 98000 records...
 Found 99000 records...
 Found 1e+05 records...
 Found 101000 records...
 Found 102000 records...
 Found 103000 records...
 Found 104000 records...
 Found 105000 records...
 Found 106000 records...
 Found 107000 records...
 Found 108000 records...
 Found 109000 records...
 Found 110000 records...
 Found 111000 records...
 Found 112000 records...
 Found 113000 records...
 Found 114000 records...
 Found 115000 records...
 Found 116000 records...
 Found 117000 records...
 Found 118000 records...
 Found 119000 records...
 Found 120000 records...
 Found 121000 records...
 Found 122000 records...
 Found 123000 records...
 Found 124000 records...
 Found 125000 records...
 Found 126000 records...
 Found 127000 records...
 Found 128000 records...
 Found 129000 records...
 Found 130000 records...
 Found 131000 records...
 Found 132000 records...
 Found 133000 records...
 Found 134000 records...
 Found 135000 records...
 Found 136000 records...
 Found 137000 records...
 Found 138000 records...
 Found 139000 records...
 Found 140000 records...
 Found 141000 records...
 Found 142000 records...
 Found 143000 records...
 Found 144000 records...
 Found 145000 records...
 Found 146000 records...
 Found 147000 records...
 Found 148000 records...
 Found 149000 records...
 Found 150000 records...
 Found 151000 records...
 Found 152000 records...
 Found 153000 records...
 Found 154000 records...
 Found 155000 records...
 Found 156000 records...
 Found 157000 records...
 Found 158000 records...
 Found 159000 records...
 Found 160000 records...
 Found 161000 records...
 Found 162000 records...
 Found 163000 records...
 Found 164000 records...
 Found 165000 records...
 Found 166000 records...
 Found 167000 records...
 Found 168000 records...
 Found 169000 records...
 Found 170000 records...
 Found 171000 records...
 Found 172000 records...
 Found 173000 records...
 Found 174000 records...
 Found 175000 records...
 Found 176000 records...
 Found 177000 records...
 Found 178000 records...
 Found 179000 records...
 Found 180000 records...
 Found 181000 records...
 Found 182000 records...
 Found 183000 records...
 Found 184000 records...
 Found 185000 records...
 Found 186000 records...
 Found 187000 records...
 Found 188000 records...
 Found 189000 records...
 Found 190000 records...
 Found 191000 records...
 Found 192000 records...
 Found 193000 records...
 Found 194000 records...
 Found 195000 records...
 Found 196000 records...
 Found 197000 records...
 Found 198000 records...
 Found 199000 records...
 Found 2e+05 records...
 Found 201000 records...
 Found 202000 records...
 Found 203000 records...
 Found 204000 records...
 Found 205000 records...
 Found 206000 records...
 Found 207000 records...
 Found 208000 records...
 Found 209000 records...
 Found 210000 records...
 Found 211000 records...
 Found 212000 records...
 Found 213000 records...
 Found 214000 records...
 Found 215000 records...
 Found 216000 records...
 Found 217000 records...
 Found 218000 records...
 Found 219000 records...
 Found 220000 records...
 Found 221000 records...
 Found 222000 records...
 Found 223000 records...
 Found 224000 records...
 Found 225000 records...
 Found 226000 records...
 Found 227000 records...
 Found 228000 records...
 Found 229000 records...
 Found 230000 records...
 Found 231000 records...
 Found 232000 records...
 Found 233000 records...
 Found 234000 records...
 Found 235000 records...
 Found 236000 records...
 Found 237000 records...
 Found 238000 records...
 Found 239000 records...
 Found 240000 records...
 Found 241000 records...
 Found 242000 records...
 Found 243000 records...
 Found 244000 records...
 Found 245000 records...
 Found 246000 records...
 Found 247000 records...
 Found 248000 records...
 Found 249000 records...
 Found 250000 records...
 Found 251000 records...
 Found 252000 records...
 Found 253000 records...
 Found 254000 records...
 Found 255000 records...
 Found 256000 records...
 Found 257000 records...
 Found 258000 records...
 Found 259000 records...
 Found 260000 records...
 Found 261000 records...
 Found 262000 records...
 Found 263000 records...
 Found 264000 records...
 Found 265000 records...
 Found 266000 records...
 Found 267000 records...
 Found 268000 records...
 Found 269000 records...
 Found 270000 records...
 Found 271000 records...
 Found 272000 records...
 Found 273000 records...
 Found 274000 records...
 Found 275000 records...
 Found 276000 records...
 Found 277000 records...
 Found 278000 records...
 Found 279000 records...
 Found 280000 records...
 Found 281000 records...
 Found 282000 records...
 Found 283000 records...
 Found 284000 records...
 Found 285000 records...
 Found 286000 records...
 Found 287000 records...
 Found 288000 records...
 Found 289000 records...
 Found 290000 records...
 Found 291000 records...
 Found 292000 records...
 Found 293000 records...
 Found 294000 records...
 Found 295000 records...
 Found 296000 records...
 Found 297000 records...
 Found 298000 records...
 Found 299000 records...
 Found 3e+05 records...
 Found 301000 records...
 Found 302000 records...
 Found 303000 records...
 Found 304000 records...
 Found 305000 records...
 Found 306000 records...
 Found 307000 records...
 Found 308000 records...
 Found 309000 records...
 Found 310000 records...
 Found 311000 records...
 Found 312000 records...
 Found 313000 records...
 Found 314000 records...
 Found 315000 records...
 Found 316000 records...
 Found 317000 records...
 Found 318000 records...
 Found 319000 records...
 Found 320000 records...
 Found 321000 records...
 Found 322000 records...
 Found 323000 records...
 Found 324000 records...
 Found 325000 records...
 Found 326000 records...
 Found 327000 records...
 Found 328000 records...
 Found 329000 records...
 Found 330000 records...
 Found 331000 records...
 Found 332000 records...
 Found 333000 records...
 Found 334000 records...
 Found 335000 records...
 Found 336000 records...
 Found 336776 records...
 Imported 336776 records. Simplifying into dataframe...
##   year month day dep_time dep_delay arr_time arr_delay carrier tailnum
## 1 2013     1   1      517         2      830        11      UA  N14228
## 2 2013     1   1      533         4      850        20      UA  N24211
## 3 2013     1   1      542         2      923        33      AA  N619AA
## 4 2013     1   1      544        -1     1004       -18      B6  N804JB
## 5 2013     1   1      554        -6      812       -25      DL  N668DN
##   flight origin dest air_time distance hour minute
## 1   1545    EWR  IAH      227     1400    5     17
## 2   1714    LGA  IAH      227     1416    5     33
## 3   1141    JFK  MIA      160     1089    5     42
## 4    725    JFK  BQN      183     1576    5     44
## 5    461    LGA  ATL      116      762    6     54
## [1] TRUE
compareDBs(rDBName, "planes")
## [1] "Count from mySql table: planes is 3322"
## [1] "Count from Mongo Collection: planes is 3322"
## 
 Found 1000 records...
 Found 2000 records...
 Found 3000 records...
 Found 3322 records...
 Imported 3322 records. Simplifying into dataframe...
##   tailnum year                    type     manufacturer     model engines
## 1  N10156 2004 Fixed wing multi engine          EMBRAER EMB-145XR       2
## 2  N102UW 1998 Fixed wing multi engine AIRBUS INDUSTRIE  A320-214       2
## 3  N103US 1999 Fixed wing multi engine AIRBUS INDUSTRIE  A320-214       2
## 4  N104UW 1999 Fixed wing multi engine AIRBUS INDUSTRIE  A320-214       2
## 5  N10575 2002 Fixed wing multi engine          EMBRAER EMB-145LR       2
##   seats    engine speed
## 1    55 Turbo-fan    NA
## 2   182 Turbo-fan    NA
## 3   182 Turbo-fan    NA
## 4   182 Turbo-fan    NA
## 5    55 Turbo-fan    NA
## [1] TRUE
compareDBs(rDBName, "weather")
## [1] "Count from mySql table: weather is 8719"
## [1] "Count from Mongo Collection: weather is 8719"
## 
 Found 1000 records...
 Found 2000 records...
 Found 3000 records...
 Found 4000 records...
 Found 5000 records...
 Found 6000 records...
 Found 7000 records...
 Found 8000 records...
 Found 8719 records...
 Imported 8719 records. Simplifying into dataframe...
##   origin year month day hour  temp  dewp humid wind_dir wind_speed
## 1    EWR 2013     1   1    0 37.04 21.92 53.97      230   10.35702
## 2    EWR 2013     1   1    1 37.04 21.92 53.97      230   13.80936
## 3    EWR 2013     1   1    2 37.94 21.92 52.09      230   12.65858
## 4    EWR 2013     1   1    3 37.94 23.00 54.51      230   13.80936
## 5    EWR 2013     1   1    4 37.94 24.08 57.04      240   14.96014
##   wind_gust precip pressure visib
## 1  11.91865      0   1013.9    10
## 2  15.89154      0   1013.0    10
## 3  14.56724      0   1012.6    10
## 4  15.89154      0   1012.7    10
## 5  17.21583      0   1012.8    10
## [1] TRUE

Clearly, the compare function is not very efficient, as it loads the whole Mongo collection into a dataframe, and then does a “head”" to get the top 5 rows. Clearly if querying from Mongo itself through R, you’d want to write a query to return a subset of rows…but often in R we are just going to load the whole collection into a dataframe, and work from there, so not sure our inefficiency here is that critical.

Next, let’s do a table from a different schema. The “Diamonds” from the ggplot package was imported into mySQL, so lets see if we can load that into Mongo.

dropMongo("diamonds")
convertMySQLToMongo("diamonds", "diamonds")
## 
Processed 1000 rows...
Processed 2000 rows...
Processed 3000 rows...
Processed 4000 rows...
Processed 5000 rows...
Processed 6000 rows...
Processed 7000 rows...
Processed 8000 rows...
Processed 9000 rows...
Processed 10000 rows...
Processed 11000 rows...
Processed 12000 rows...
Processed 13000 rows...
Processed 14000 rows...
Processed 15000 rows...
Processed 16000 rows...
Processed 17000 rows...
Processed 18000 rows...
Processed 19000 rows...
Processed 20000 rows...
Processed 21000 rows...
Processed 22000 rows...
Processed 23000 rows...
Processed 24000 rows...
Processed 25000 rows...
Processed 26000 rows...
Processed 27000 rows...
Processed 28000 rows...
Processed 29000 rows...
Processed 30000 rows...
Processed 31000 rows...
Processed 32000 rows...
Processed 33000 rows...
Processed 34000 rows...
Processed 35000 rows...
Processed 36000 rows...
Processed 37000 rows...
Processed 38000 rows...
Processed 39000 rows...
Processed 40000 rows...
Processed 41000 rows...
Processed 42000 rows...
Processed 43000 rows...
Processed 44000 rows...
Processed 45000 rows...
Processed 46000 rows...
Processed 47000 rows...
Processed 48000 rows...
Processed 49000 rows...
Processed 50000 rows...
Processed 51000 rows...
Processed 52000 rows...
Processed 53000 rows...
Complete! Processed total of 53940 rows.
## [1] TRUE
compareDBs("diamonds", "diamonds")
## [1] "Count from mySql table: diamonds is 53940"
## [1] "Count from Mongo Collection: diamonds is 53940"
## 
 Found 1000 records...
 Found 2000 records...
 Found 3000 records...
 Found 4000 records...
 Found 5000 records...
 Found 6000 records...
 Found 7000 records...
 Found 8000 records...
 Found 9000 records...
 Found 10000 records...
 Found 11000 records...
 Found 12000 records...
 Found 13000 records...
 Found 14000 records...
 Found 15000 records...
 Found 16000 records...
 Found 17000 records...
 Found 18000 records...
 Found 19000 records...
 Found 20000 records...
 Found 21000 records...
 Found 22000 records...
 Found 23000 records...
 Found 24000 records...
 Found 25000 records...
 Found 26000 records...
 Found 27000 records...
 Found 28000 records...
 Found 29000 records...
 Found 30000 records...
 Found 31000 records...
 Found 32000 records...
 Found 33000 records...
 Found 34000 records...
 Found 35000 records...
 Found 36000 records...
 Found 37000 records...
 Found 38000 records...
 Found 39000 records...
 Found 40000 records...
 Found 41000 records...
 Found 42000 records...
 Found 43000 records...
 Found 44000 records...
 Found 45000 records...
 Found 46000 records...
 Found 47000 records...
 Found 48000 records...
 Found 49000 records...
 Found 50000 records...
 Found 51000 records...
 Found 52000 records...
 Found 53000 records...
 Found 53940 records...
 Imported 53940 records. Simplifying into dataframe...
##   seq carat     cut color clarity depth table price    x    y    z
## 1   1  0.23   Ideal     E     SI2  61.5    55   326 3.95 3.98 2.43
## 2   2  0.21 Premium     E     SI1  59.8    61   326 3.89 3.84 2.31
## 3   3  0.23    Good     E     VS1  56.9    65   327 4.05 4.07 2.31
## 4   4  0.29 Premium     I     VS2  62.4    58   334 4.20 4.23 2.63
## 5   5  0.31    Good     J     SI2  63.3    58   335 4.34 4.35 2.75
## [1] TRUE

Looks good.

Discussion of SQL vs. noSQL Databases.

Ok, so great we did this conversion, would we want to use a noSQL DB, or a SQL DB to store data going forward? Well first thing to note, the data we imported from flights and diamonds, had no foreign keys to other tables, and hence we didn’t need to worry about referential integrity. While RI can be worked into noSQL, it is much harder, while RI is inherent in a (properly) designed SQL schema. Another data advantage to SQL is transactional Integrity. With the structure of noSQL DB’s, it is hard to validate that a “transactional” upsert/delete type query and be sure stale data will not be read, or only one record of a given type is inserted. One last advantage mentioned is SQL language itself, JSON CRUD operations for complex queries become much more complicated to code than the SQL query counterparts.

But noSQL DB’s clearly have advantages over SQL versions (one needs the right tool for the job). SQL databases are more difficult to scale, so for storing massive amounts of data (that doesn’t have as strict data integerity requirements) noSQL is a better choice as it scales more easily. noSQL is also simpler for getting started and getting to work. To work with a SQL db, one needs to define the metadata (data types, keys, etc.), while you don’t have to do that with noSQL. In our exercise, it was quite easy to import the SQL tables into Mongo collections, but the reverse would have been more difficult (granted one could default all the data to varchar(max) upon import but that isn’t really what one should do). Lastly, noSQL DB’s are generaly faster at retrieving data than a SQL DB (obviously one would need to get correctly tune indexes to do so…).

Clearly noSQL DB’s aren’t going to replace SQL DB’s (at least not in noSQL DB’s current state), but they have their place, just like the SQL DB’s that have been around much longer.