For this assignment we will migrate a MySQL database to MongoDB. We will be connecting directly to both database servers and migrating the various tables directly.
Finally, we will validate that all rows of data are present and accounted for in Mongo.
Here we connect to MySQL which already contains the flights database used during the SQL bridge course. We’ll query the table names to ensure that it contains the tables we’re interested in.
#connect to MySQL
flights_db <- dbConnect(MySQL(), user="root", password="1234", dbname='flights', host='localhost')
#Get names of tables
(tables <- dbListTables(flights_db))
## [1] "airlines" "airports" "flights" "planes" "weather"
We’ve verified that the database contains five tables, airlines, airports, flights, planes, and weather. I’ve saved these table names to a vector for use in the next step where we will iterate over each of those elements to copy them from one database to the other. Next we will copy these over to Mongo.
In the chunk below I loop through the elements of the tables vector, which contains the names of the tables currently on MySQL. For each table I query MySQL returning all columns and rows. I then connect to and create a collection for that same table in Mongo. Next, I check if that collection has any existing data, if so, it gets dropped. Then finally I insert the full table into the Mongo collection.
#programmatically query the MySQL DB for each table and assign to a temporary variable, then insert to mongodb
for(i in 1:length(tables)){
tbl <- dbGetQuery(flights_db,paste("SELECT * FROM",tables[i]))
con <- mongo(collection = tables[i], db = "flights")
if (con$count() > 0) {
con$drop()
}
con$insert(tbl)
}
To ensure data was copied over, here we print the head of each table from the Mongo server.
for(i in 1:length(tables)){
print(head(mongo(collection=tables[i],db="flights")$find()))
}
## carrier name
## 1 9E Endeavor Air Inc.\r
## 2 AA American Airlines Inc.\r
## 3 AS Alaska Airlines Inc.\r
## 4 B6 JetBlue Airways\r
## 5 DL Delta Air Lines Inc.\r
## 6 EV ExpressJet Airlines Inc.\r
## faa name lat lon alt tz dst
## 1 04G Lansdowne Airport 41.13047 -80.61958 1044 -5 A
## 2 06A Moton Field Municipal Airport 32.46057 -85.68003 264 -5 A
## 3 06C Schaumburg Regional 41.98934 -88.10124 801 -6 A
## 4 06N Randall Airport 41.43191 -74.39156 523 -5 A
## 5 09J Jekyll Island Airport 31.07447 -81.42778 11 -4 A
## 6 0A9 Elizabethton Municipal Airport 36.37122 -82.17342 1593 -4 A
## year month day dep_time dep_delay arr_time arr_delay carrier tailnum
## 1 2013 1 1 517 2 830 11 UA N14228
## 2 2013 1 1 533 4 850 20 UA N24211
## 3 2013 1 1 542 2 923 33 AA N619AA
## 4 2013 1 1 544 -1 1004 -18 B6 N804JB
## 5 2013 1 1 554 -6 812 -25 DL N668DN
## 6 2013 1 1 554 -4 740 12 UA N39463
## flight origin dest air_time distance hour minute
## 1 1545 EWR IAH 227 1400 5 17
## 2 1714 LGA IAH 227 1416 5 33
## 3 1141 JFK MIA 160 1089 5 42
## 4 725 JFK BQN 183 1576 5 44
## 5 461 LGA ATL 116 762 6 54
## 6 1696 EWR ORD 150 719 6 54
## tailnum year type manufacturer model engines
## 1 N10156 2004 Fixed wing multi engine EMBRAER EMB-145XR 2
## 2 N102UW 1998 Fixed wing multi engine AIRBUS INDUSTRIE A320-214 2
## 3 N103US 1999 Fixed wing multi engine AIRBUS INDUSTRIE A320-214 2
## 4 N104UW 1999 Fixed wing multi engine AIRBUS INDUSTRIE A320-214 2
## 5 N10575 2002 Fixed wing multi engine EMBRAER EMB-145LR 2
## 6 N105UW 1999 Fixed wing multi engine AIRBUS INDUSTRIE A320-214 2
## seats engine speed
## 1 55 Turbo-fan NA
## 2 182 Turbo-fan NA
## 3 182 Turbo-fan NA
## 4 182 Turbo-fan NA
## 5 55 Turbo-fan NA
## 6 182 Turbo-fan NA
## origin year month day hour temp dewp humid wind_dir wind_speed
## 1 EWR 2013 1 1 0 37.04 21.92 53.97 230 10.35702
## 2 EWR 2013 1 1 1 37.04 21.92 53.97 230 13.80936
## 3 EWR 2013 1 1 2 37.94 21.92 52.09 230 12.65858
## 4 EWR 2013 1 1 3 37.94 23.00 54.51 230 13.80936
## 5 EWR 2013 1 1 4 37.94 24.08 57.04 240 14.96014
## 6 EWR 2013 1 1 6 39.02 26.06 59.37 270 10.35702
## wind_gust precip pressure visib
## 1 11.91865 0 1013.9 10
## 2 15.89154 0 1013.0 10
## 3 14.56724 0 1012.6 10
## 4 15.89154 0 1012.7 10
## 5 17.21583 0 1012.8 10
## 6 11.91865 0 1012.0 10
Finally, to verify that all rows have been copied over, I will compare the number of rows for each table on both MySQL and Mongo directly. If they match up, R will print “ok” if there is a deviation, “not ok”.
for(i in 1:length(tables)){
mongo_nrow <- nrow(mongo(collection=tables[i],db="flights")$find())
sql_nrow <- nrow(dbGetQuery(flights_db,paste("SELECT * FROM ",tables[i])))
if(mongo_nrow == sql_nrow){
print(paste(tables[i],"ok"))
}else{
print(paste(tables[i],"not ok"))
}
}
## [1] "airlines ok"
## [1] "airports ok"
## [1] "flights ok"
## [1] "planes ok"
## [1] "weather ok"