Introduction

For this assignment we will migrate a MySQL database to MongoDB. We will be connecting directly to both database servers and migrating the various tables directly.
Finally, we will validate that all rows of data are present and accounted for in Mongo.

Connect to MySQL

Here we connect to MySQL which already contains the flights database used during the SQL bridge course. We’ll query the table names to ensure that it contains the tables we’re interested in.

#connect to MySQL
flights_db <- dbConnect(MySQL(), user="root", password="1234", dbname='flights', host='localhost')

#Get names of tables
(tables <- dbListTables(flights_db))
## [1] "airlines" "airports" "flights"  "planes"   "weather"

We’ve verified that the database contains five tables, airlines, airports, flights, planes, and weather. I’ve saved these table names to a vector for use in the next step where we will iterate over each of those elements to copy them from one database to the other. Next we will copy these over to Mongo.

Migration

In the chunk below I loop through the elements of the tables vector, which contains the names of the tables currently on MySQL. For each table I query MySQL returning all columns and rows. I then connect to and create a collection for that same table in Mongo. Next, I check if that collection has any existing data, if so, it gets dropped. Then finally I insert the full table into the Mongo collection.

#programmatically query the MySQL DB for each table and assign to a temporary variable, then insert to mongodb
for(i in 1:length(tables)){
    tbl <- dbGetQuery(flights_db,paste("SELECT * FROM",tables[i]))
    con <- mongo(collection = tables[i], db = "flights")
    if (con$count() > 0) {
        con$drop() 
    }
    con$insert(tbl)
}

Verification

To ensure data was copied over, here we print the head of each table from the Mongo server.

for(i in 1:length(tables)){
    print(head(mongo(collection=tables[i],db="flights")$find()))  
}
##   carrier                       name
## 1      9E        Endeavor Air Inc.\r
## 2      AA   American Airlines Inc.\r
## 3      AS     Alaska Airlines Inc.\r
## 4      B6          JetBlue Airways\r
## 5      DL     Delta Air Lines Inc.\r
## 6      EV ExpressJet Airlines Inc.\r
##   faa                           name      lat       lon  alt tz dst
## 1 04G              Lansdowne Airport 41.13047 -80.61958 1044 -5   A
## 2 06A  Moton Field Municipal Airport 32.46057 -85.68003  264 -5   A
## 3 06C            Schaumburg Regional 41.98934 -88.10124  801 -6   A
## 4 06N                Randall Airport 41.43191 -74.39156  523 -5   A
## 5 09J          Jekyll Island Airport 31.07447 -81.42778   11 -4   A
## 6 0A9 Elizabethton Municipal Airport 36.37122 -82.17342 1593 -4   A
##   year month day dep_time dep_delay arr_time arr_delay carrier tailnum
## 1 2013     1   1      517         2      830        11      UA  N14228
## 2 2013     1   1      533         4      850        20      UA  N24211
## 3 2013     1   1      542         2      923        33      AA  N619AA
## 4 2013     1   1      544        -1     1004       -18      B6  N804JB
## 5 2013     1   1      554        -6      812       -25      DL  N668DN
## 6 2013     1   1      554        -4      740        12      UA  N39463
##   flight origin dest air_time distance hour minute
## 1   1545    EWR  IAH      227     1400    5     17
## 2   1714    LGA  IAH      227     1416    5     33
## 3   1141    JFK  MIA      160     1089    5     42
## 4    725    JFK  BQN      183     1576    5     44
## 5    461    LGA  ATL      116      762    6     54
## 6   1696    EWR  ORD      150      719    6     54
##   tailnum year                    type     manufacturer     model engines
## 1  N10156 2004 Fixed wing multi engine          EMBRAER EMB-145XR       2
## 2  N102UW 1998 Fixed wing multi engine AIRBUS INDUSTRIE  A320-214       2
## 3  N103US 1999 Fixed wing multi engine AIRBUS INDUSTRIE  A320-214       2
## 4  N104UW 1999 Fixed wing multi engine AIRBUS INDUSTRIE  A320-214       2
## 5  N10575 2002 Fixed wing multi engine          EMBRAER EMB-145LR       2
## 6  N105UW 1999 Fixed wing multi engine AIRBUS INDUSTRIE  A320-214       2
##   seats    engine speed
## 1    55 Turbo-fan    NA
## 2   182 Turbo-fan    NA
## 3   182 Turbo-fan    NA
## 4   182 Turbo-fan    NA
## 5    55 Turbo-fan    NA
## 6   182 Turbo-fan    NA
##   origin year month day hour  temp  dewp humid wind_dir wind_speed
## 1    EWR 2013     1   1    0 37.04 21.92 53.97      230   10.35702
## 2    EWR 2013     1   1    1 37.04 21.92 53.97      230   13.80936
## 3    EWR 2013     1   1    2 37.94 21.92 52.09      230   12.65858
## 4    EWR 2013     1   1    3 37.94 23.00 54.51      230   13.80936
## 5    EWR 2013     1   1    4 37.94 24.08 57.04      240   14.96014
## 6    EWR 2013     1   1    6 39.02 26.06 59.37      270   10.35702
##   wind_gust precip pressure visib
## 1  11.91865      0   1013.9    10
## 2  15.89154      0   1013.0    10
## 3  14.56724      0   1012.6    10
## 4  15.89154      0   1012.7    10
## 5  17.21583      0   1012.8    10
## 6  11.91865      0   1012.0    10

Finally, to verify that all rows have been copied over, I will compare the number of rows for each table on both MySQL and Mongo directly. If they match up, R will print “ok” if there is a deviation, “not ok”.

for(i in 1:length(tables)){
    mongo_nrow <- nrow(mongo(collection=tables[i],db="flights")$find())
    sql_nrow <- nrow(dbGetQuery(flights_db,paste("SELECT * FROM ",tables[i])))
    if(mongo_nrow == sql_nrow){
        print(paste(tables[i],"ok"))
    }else{
        print(paste(tables[i],"not ok"))
    }
}
## [1] "airlines ok"
## [1] "airports ok"
## [1] "flights ok"
## [1] "planes ok"
## [1] "weather ok"