I first start by connecting to MySQL database in my local computer. The database i will be migrating is the Flights database given at the beginning of the course.
Flight Schema
After connecting to the local database I created a separate query for each table. Using Tidyr I modified the dates columns in the flightsdf and weatherdf data frame to a standard date format. Packages used in the below step are RMYSQL and Tidyr.
# set MySQL information
Yuser = 'root'
Ydbname ='flights'
Yhost = 'localhost'
# create db connection with password promt
mydb = dbConnect(MySQL(), user= Yuser, password=getPass('Enter Database Password'), dbname=Ydbname, host=Yhost)
# query for the view
planesq = dbSendQuery(mydb,"select * from planes;")
# place query values in data frame
planesdf = data.frame( fetch(planesq, n=-1))
head(planesdf,5)
tailnum year type manufacturer model engines
1 N10156 2004 Fixed wing multi engine EMBRAER EMB-145XR 2
2 N102UW 1998 Fixed wing multi engine AIRBUS INDUSTRIE A320-214 2
3 N103US 1999 Fixed wing multi engine AIRBUS INDUSTRIE A320-214 2
4 N104UW 1999 Fixed wing multi engine AIRBUS INDUSTRIE A320-214 2
5 N10575 2002 Fixed wing multi engine EMBRAER EMB-145LR 2
seats speed engine
1 55 NA Turbo-fan
2 182 NA Turbo-fan
3 182 NA Turbo-fan
4 182 NA Turbo-fan
5 55 NA Turbo-fan
flightsq = dbSendQuery(mydb,"select * from flights;")
# place query values in data frame
flightsdf = data.frame( fetch(flightsq, n=-1))
head(flightsdf,5)
year month day dep_time dep_delay arr_time arr_delay carrier tailnum
1 2013 1 1 517 2 830 11 UA N14228
2 2013 1 1 533 4 850 20 UA N24211
3 2013 1 1 542 2 923 33 AA N619AA
4 2013 1 1 544 -1 1004 -18 B6 N804JB
5 2013 1 1 554 -6 812 -25 DL N668DN
flight origin dest air_time distance hour minute
1 1545 EWR IAH 227 1400 5 17
2 1714 LGA IAH 227 1416 5 33
3 1141 JFK MIA 160 1089 5 42
4 725 JFK BQN 183 1576 5 44
5 461 LGA ATL 116 762 6 54
airportsq = dbSendQuery(mydb,"select * from airports;")
# place query values in data frame
airportsdf = data.frame( fetch(airportsq, n=-1))
head(airportsdf,5)
faa name lat lon alt tz dst
1 04G Lansdowne Airport 41.13047 -80.61958 1044 -5 A
2 06A Moton Field Municipal Airport 32.46057 -85.68003 264 -5 A
3 06C Schaumburg Regional 41.98934 -88.10124 801 -6 A
4 06N Randall Airport 41.43191 -74.39156 523 -5 A
5 09J Jekyll Island Airport 31.07447 -81.42778 11 -4 A
weatherq = dbSendQuery(mydb,"select * from weather;")
# place query values in data frame
weatherdf = data.frame( fetch(weatherq, n=-1))
head(weatherdf,5)
origin year month day hour temp dewp humid wind_dir wind_speed
1 EWR 2013 1 1 0 37.04 21.92 53.97 230 10.35702
2 EWR 2013 1 1 1 37.04 21.92 53.97 230 13.80936
3 EWR 2013 1 1 2 37.94 21.92 52.09 230 12.65858
4 EWR 2013 1 1 3 37.94 23.00 54.51 230 13.80936
5 EWR 2013 1 1 4 37.94 24.08 57.04 240 14.96014
wind_gust precip pressure visib
1 11.91865 0 1013.9 10
2 15.89154 0 1013.0 10
3 14.56724 0 1012.6 10
4 15.89154 0 1012.7 10
5 17.21583 0 1012.8 10
airlinesq = dbSendQuery(mydb,"select * from airlines;")
# place query values in data frame
airlinesdf = data.frame( fetch(airlinesq, n=-1))
head(airlinesdf,5)
carrier name
1 9E Endeavor Air Inc.\r
2 AA American Airlines Inc.\r
3 AS Alaska Airlines Inc.\r
4 B6 JetBlue Airways\r
5 DL Delta Air Lines Inc.\r
# close database connection
dbDisconnect(mydb)
Warning: Closing open result sets
[1] TRUE
flightdfF = unite( flightsdf, "date", month,day,year, sep = "/")
head(flightdfF,5)
date dep_time dep_delay arr_time arr_delay carrier tailnum flight
1 1/1/2013 517 2 830 11 UA N14228 1545
2 1/1/2013 533 4 850 20 UA N24211 1714
3 1/1/2013 542 2 923 33 AA N619AA 1141
4 1/1/2013 544 -1 1004 -18 B6 N804JB 725
5 1/1/2013 554 -6 812 -25 DL N668DN 461
origin dest air_time distance hour minute
1 EWR IAH 227 1400 5 17
2 LGA IAH 227 1416 5 33
3 JFK MIA 160 1089 5 42
4 JFK BQN 183 1576 5 44
5 LGA ATL 116 762 6 54
weatherdfF = unite(weatherdf, "date", month,day,year, sep = "/")
head(weatherdfF,5)
origin date hour temp dewp humid wind_dir wind_speed wind_gust
1 EWR 1/1/2013 0 37.04 21.92 53.97 230 10.35702 11.91865
2 EWR 1/1/2013 1 37.04 21.92 53.97 230 13.80936 15.89154
3 EWR 1/1/2013 2 37.94 21.92 52.09 230 12.65858 14.56724
4 EWR 1/1/2013 3 37.94 23.00 54.51 230 13.80936 15.89154
5 EWR 1/1/2013 4 37.94 24.08 57.04 240 14.96014 17.21583
precip pressure visib
1 0 1013.9 10
2 0 1013.0 10
3 0 1012.6 10
4 0 1012.7 10
5 0 1012.8 10
After modifying the data I created a separate connection for each data frame to be inserted into a collection/table.The code below will always drop the collection upon execution, insert the corresponding data frame, count the documents/rows, and display to documents.
weather = mongo(collection = "Weather" ,db = "flights", url = "mongodb://localhost")
weather$drop()
[1] TRUE
weather$insert(weatherdfF)
Processed 1000 rows...
Processed 2000 rows...
Processed 3000 rows...
Processed 4000 rows...
Processed 5000 rows...
Processed 6000 rows...
Processed 7000 rows...
Processed 8000 rows...
Complete! Processed total of 8719 rows.
$nInserted
[1] 8719
$nMatched
[1] 0
$nRemoved
[1] 0
$nUpserted
[1] 0
$writeErrors
list()
weather$count()
[1] 8719
weather$find(limit = 10)
Found 10 records...
Imported 10 records. Simplifying into dataframe...
origin date hour temp dewp humid wind_dir wind_speed wind_gust
1 EWR 1/1/2013 0 37.04 21.92 53.97 230 10.35702 11.918651
2 EWR 1/1/2013 1 37.04 21.92 53.97 230 13.80936 15.891535
3 EWR 1/1/2013 2 37.94 21.92 52.09 230 12.65858 14.567241
4 EWR 1/1/2013 3 37.94 23.00 54.51 230 13.80936 15.891535
5 EWR 1/1/2013 4 37.94 24.08 57.04 240 14.96014 17.215830
6 EWR 1/1/2013 6 39.02 26.06 59.37 270 10.35702 11.918651
7 EWR 1/1/2013 7 39.02 26.96 61.63 250 8.05546 9.270062
8 EWR 1/1/2013 8 39.02 28.04 64.43 240 11.50780 13.242946
9 EWR 1/1/2013 9 39.92 28.04 62.21 250 12.65858 14.567241
10 EWR 1/1/2013 10 39.02 28.04 64.43 260 12.65858 14.567241
precip pressure visib
1 0 1013.9 10
2 0 1013.0 10
3 0 1012.6 10
4 0 1012.7 10
5 0 1012.8 10
6 0 1012.0 10
7 0 1012.3 10
8 0 1012.5 10
9 0 1012.2 10
10 0 1011.9 10
Weather
flights = mongo( collection = "Flights",db = "flights", url = "mongodb://localhost")
flights$drop()
[1] TRUE
flights$insert(flightdfF)
Processed 1000 rows...
Processed 2000 rows...
Processed 3000 rows...
Processed 4000 rows...
Processed 5000 rows...
Processed 6000 rows...
Processed 7000 rows...
Processed 8000 rows...
Processed 9000 rows...
Processed 10000 rows...
Processed 11000 rows...
Processed 12000 rows...
Processed 13000 rows...
Processed 14000 rows...
Processed 15000 rows...
Processed 16000 rows...
Processed 17000 rows...
Processed 18000 rows...
Processed 19000 rows...
Processed 20000 rows...
Processed 21000 rows...
Processed 22000 rows...
Processed 23000 rows...
Processed 24000 rows...
Processed 25000 rows...
Processed 26000 rows...
Processed 27000 rows...
Processed 28000 rows...
Processed 29000 rows...
Processed 30000 rows...
Processed 31000 rows...
Processed 32000 rows...
Processed 33000 rows...
Processed 34000 rows...
Processed 35000 rows...
Processed 36000 rows...
Processed 37000 rows...
Processed 38000 rows...
Processed 39000 rows...
Processed 40000 rows...
Processed 41000 rows...
Processed 42000 rows...
Processed 43000 rows...
Processed 44000 rows...
Processed 45000 rows...
Processed 46000 rows...
Processed 47000 rows...
Processed 48000 rows...
Processed 49000 rows...
Processed 50000 rows...
Processed 51000 rows...
Processed 52000 rows...
Processed 53000 rows...
Processed 54000 rows...
Processed 55000 rows...
Processed 56000 rows...
Processed 57000 rows...
Processed 58000 rows...
Processed 59000 rows...
Processed 60000 rows...
Processed 61000 rows...
Processed 62000 rows...
Processed 63000 rows...
Processed 64000 rows...
Processed 65000 rows...
Processed 66000 rows...
Processed 67000 rows...
Processed 68000 rows...
Processed 69000 rows...
Processed 70000 rows...
Processed 71000 rows...
Processed 72000 rows...
Processed 73000 rows...
Processed 74000 rows...
Processed 75000 rows...
Processed 76000 rows...
Processed 77000 rows...
Processed 78000 rows...
Processed 79000 rows...
Processed 80000 rows...
Processed 81000 rows...
Processed 82000 rows...
Processed 83000 rows...
Processed 84000 rows...
Processed 85000 rows...
Processed 86000 rows...
Processed 87000 rows...
Processed 88000 rows...
Processed 89000 rows...
Processed 90000 rows...
Processed 91000 rows...
Processed 92000 rows...
Processed 93000 rows...
Processed 94000 rows...
Processed 95000 rows...
Processed 96000 rows...
Processed 97000 rows...
Processed 98000 rows...
Processed 99000 rows...
Processed 1e+05 rows...
Processed 101000 rows...
Processed 102000 rows...
Processed 103000 rows...
Processed 104000 rows...
Processed 105000 rows...
Processed 106000 rows...
Processed 107000 rows...
Processed 108000 rows...
Processed 109000 rows...
Processed 110000 rows...
Processed 111000 rows...
Processed 112000 rows...
Processed 113000 rows...
Processed 114000 rows...
Processed 115000 rows...
Processed 116000 rows...
Processed 117000 rows...
Processed 118000 rows...
Processed 119000 rows...
Processed 120000 rows...
Processed 121000 rows...
Processed 122000 rows...
Processed 123000 rows...
Processed 124000 rows...
Processed 125000 rows...
Processed 126000 rows...
Processed 127000 rows...
Processed 128000 rows...
Processed 129000 rows...
Processed 130000 rows...
Processed 131000 rows...
Processed 132000 rows...
Processed 133000 rows...
Processed 134000 rows...
Processed 135000 rows...
Processed 136000 rows...
Processed 137000 rows...
Processed 138000 rows...
Processed 139000 rows...
Processed 140000 rows...
Processed 141000 rows...
Processed 142000 rows...
Processed 143000 rows...
Processed 144000 rows...
Processed 145000 rows...
Processed 146000 rows...
Processed 147000 rows...
Processed 148000 rows...
Processed 149000 rows...
Processed 150000 rows...
Processed 151000 rows...
Processed 152000 rows...
Processed 153000 rows...
Processed 154000 rows...
Processed 155000 rows...
Processed 156000 rows...
Processed 157000 rows...
Processed 158000 rows...
Processed 159000 rows...
Processed 160000 rows...
Processed 161000 rows...
Processed 162000 rows...
Processed 163000 rows...
Processed 164000 rows...
Processed 165000 rows...
Processed 166000 rows...
Processed 167000 rows...
Processed 168000 rows...
Processed 169000 rows...
Processed 170000 rows...
Processed 171000 rows...
Processed 172000 rows...
Processed 173000 rows...
Processed 174000 rows...
Processed 175000 rows...
Processed 176000 rows...
Processed 177000 rows...
Processed 178000 rows...
Processed 179000 rows...
Processed 180000 rows...
Processed 181000 rows...
Processed 182000 rows...
Processed 183000 rows...
Processed 184000 rows...
Processed 185000 rows...
Processed 186000 rows...
Processed 187000 rows...
Processed 188000 rows...
Processed 189000 rows...
Processed 190000 rows...
Processed 191000 rows...
Processed 192000 rows...
Processed 193000 rows...
Processed 194000 rows...
Processed 195000 rows...
Processed 196000 rows...
Processed 197000 rows...
Processed 198000 rows...
Processed 199000 rows...
Processed 2e+05 rows...
Processed 201000 rows...
Processed 202000 rows...
Processed 203000 rows...
Processed 204000 rows...
Processed 205000 rows...
Processed 206000 rows...
Processed 207000 rows...
Processed 208000 rows...
Processed 209000 rows...
Processed 210000 rows...
Processed 211000 rows...
Processed 212000 rows...
Processed 213000 rows...
Processed 214000 rows...
Processed 215000 rows...
Processed 216000 rows...
Processed 217000 rows...
Processed 218000 rows...
Processed 219000 rows...
Processed 220000 rows...
Processed 221000 rows...
Processed 222000 rows...
Processed 223000 rows...
Processed 224000 rows...
Processed 225000 rows...
Processed 226000 rows...
Processed 227000 rows...
Processed 228000 rows...
Processed 229000 rows...
Processed 230000 rows...
Processed 231000 rows...
Processed 232000 rows...
Processed 233000 rows...
Processed 234000 rows...
Processed 235000 rows...
Processed 236000 rows...
Processed 237000 rows...
Processed 238000 rows...
Processed 239000 rows...
Processed 240000 rows...
Processed 241000 rows...
Processed 242000 rows...
Processed 243000 rows...
Processed 244000 rows...
Processed 245000 rows...
Processed 246000 rows...
Processed 247000 rows...
Processed 248000 rows...
Processed 249000 rows...
Processed 250000 rows...
Processed 251000 rows...
Processed 252000 rows...
Processed 253000 rows...
Processed 254000 rows...
Processed 255000 rows...
Processed 256000 rows...
Processed 257000 rows...
Processed 258000 rows...
Processed 259000 rows...
Processed 260000 rows...
Processed 261000 rows...
Processed 262000 rows...
Processed 263000 rows...
Processed 264000 rows...
Processed 265000 rows...
Processed 266000 rows...
Processed 267000 rows...
Processed 268000 rows...
Processed 269000 rows...
Processed 270000 rows...
Processed 271000 rows...
Processed 272000 rows...
Processed 273000 rows...
Processed 274000 rows...
Processed 275000 rows...
Processed 276000 rows...
Processed 277000 rows...
Processed 278000 rows...
Processed 279000 rows...
Processed 280000 rows...
Processed 281000 rows...
Processed 282000 rows...
Processed 283000 rows...
Processed 284000 rows...
Processed 285000 rows...
Processed 286000 rows...
Processed 287000 rows...
Processed 288000 rows...
Processed 289000 rows...
Processed 290000 rows...
Processed 291000 rows...
Processed 292000 rows...
Processed 293000 rows...
Processed 294000 rows...
Processed 295000 rows...
Processed 296000 rows...
Processed 297000 rows...
Processed 298000 rows...
Processed 299000 rows...
Processed 3e+05 rows...
Processed 301000 rows...
Processed 302000 rows...
Processed 303000 rows...
Processed 304000 rows...
Processed 305000 rows...
Processed 306000 rows...
Processed 307000 rows...
Processed 308000 rows...
Processed 309000 rows...
Processed 310000 rows...
Processed 311000 rows...
Processed 312000 rows...
Processed 313000 rows...
Processed 314000 rows...
Processed 315000 rows...
Processed 316000 rows...
Processed 317000 rows...
Processed 318000 rows...
Processed 319000 rows...
Processed 320000 rows...
Processed 321000 rows...
Processed 322000 rows...
Processed 323000 rows...
Processed 324000 rows...
Processed 325000 rows...
Processed 326000 rows...
Processed 327000 rows...
Processed 328000 rows...
Processed 329000 rows...
Processed 330000 rows...
Processed 331000 rows...
Processed 332000 rows...
Processed 333000 rows...
Processed 334000 rows...
Processed 335000 rows...
Processed 336000 rows...
Complete! Processed total of 336776 rows.
$nInserted
[1] 336776
$nMatched
[1] 0
$nRemoved
[1] 0
$nUpserted
[1] 0
$writeErrors
list()
flights$count()
[1] 336776
flights$find(limit = 10)
Found 10 records...
Imported 10 records. Simplifying into dataframe...
date dep_time dep_delay arr_time arr_delay carrier tailnum flight
1 1/1/2013 517 2 830 11 UA N14228 1545
2 1/1/2013 533 4 850 20 UA N24211 1714
3 1/1/2013 542 2 923 33 AA N619AA 1141
4 1/1/2013 544 -1 1004 -18 B6 N804JB 725
5 1/1/2013 554 -6 812 -25 DL N668DN 461
6 1/1/2013 554 -4 740 12 UA N39463 1696
7 1/1/2013 555 -5 913 19 B6 N516JB 507
8 1/1/2013 557 -3 709 -14 EV N829AS 5708
9 1/1/2013 557 -3 838 -8 B6 N593JB 79
10 1/1/2013 558 -2 753 8 AA N3ALAA 301
origin dest air_time distance hour minute
1 EWR IAH 227 1400 5 17
2 LGA IAH 227 1416 5 33
3 JFK MIA 160 1089 5 42
4 JFK BQN 183 1576 5 44
5 LGA ATL 116 762 6 54
6 EWR ORD 150 719 6 54
7 EWR FLL 158 1065 6 55
8 LGA IAD 53 229 6 57
9 JFK MCO 140 944 6 57
10 LGA ORD 138 733 6 58
Flights
planes = mongo( collection = "Planes",db = "flights", url = "mongodb://localhost")
planes$drop()
[1] TRUE
planes$insert(planesdf)
Processed 1000 rows...
Processed 2000 rows...
Processed 3000 rows...
Complete! Processed total of 3322 rows.
$nInserted
[1] 3322
$nMatched
[1] 0
$nRemoved
[1] 0
$nUpserted
[1] 0
$writeErrors
list()
planes$count()
[1] 3322
planes$find(limit = 10)
Found 10 records...
Imported 10 records. Simplifying into dataframe...
tailnum year type manufacturer model engines
1 N10156 2004 Fixed wing multi engine EMBRAER EMB-145XR 2
2 N102UW 1998 Fixed wing multi engine AIRBUS INDUSTRIE A320-214 2
3 N103US 1999 Fixed wing multi engine AIRBUS INDUSTRIE A320-214 2
4 N104UW 1999 Fixed wing multi engine AIRBUS INDUSTRIE A320-214 2
5 N10575 2002 Fixed wing multi engine EMBRAER EMB-145LR 2
6 N105UW 1999 Fixed wing multi engine AIRBUS INDUSTRIE A320-214 2
7 N107US 1999 Fixed wing multi engine AIRBUS INDUSTRIE A320-214 2
8 N108UW 1999 Fixed wing multi engine AIRBUS INDUSTRIE A320-214 2
9 N109UW 1999 Fixed wing multi engine AIRBUS INDUSTRIE A320-214 2
10 N110UW 1999 Fixed wing multi engine AIRBUS INDUSTRIE A320-214 2
seats engine
1 55 Turbo-fan
2 182 Turbo-fan
3 182 Turbo-fan
4 182 Turbo-fan
5 55 Turbo-fan
6 182 Turbo-fan
7 182 Turbo-fan
8 182 Turbo-fan
9 182 Turbo-fan
10 182 Turbo-fan
Planes
airports = mongo( collection = "Airports",db = "flights", url = "mongodb://localhost")
airports$drop()
[1] TRUE
airports$insert(airportsdf)
Processed 1000 rows...
Complete! Processed total of 1397 rows.
$nInserted
[1] 1397
$nMatched
[1] 0
$nRemoved
[1] 0
$nUpserted
[1] 0
$writeErrors
list()
airports$count()
[1] 1397
airports$find(limit = 10)
Found 10 records...
Imported 10 records. Simplifying into dataframe...
faa name lat lon alt tz dst
1 04G Lansdowne Airport 41.13047 -80.61958 1044 -5 A
2 06A Moton Field Municipal Airport 32.46057 -85.68003 264 -5 A
3 06C Schaumburg Regional 41.98934 -88.10124 801 -6 A
4 06N Randall Airport 41.43191 -74.39156 523 -5 A
5 09J Jekyll Island Airport 31.07447 -81.42778 11 -4 A
6 0A9 Elizabethton Municipal Airport 36.37122 -82.17342 1593 -4 A
7 0G6 Williams County Airport 41.46731 -84.50678 730 -5 A
8 0G7 Finger Lakes Regional Airport 42.88356 -76.78123 492 -5 A
9 0P2 Shoestring Aviation Airfield 39.79482 -76.64719 1000 -5 U
10 0S9 Jefferson County Intl 48.05381 -122.81064 108 -8 A
Airports
airlines = mongo( collection = "AirLines",db = "flights", url = "mongodb://localhost")
airlines$drop()
[1] TRUE
airlines$insert(airlinesdf)
Complete! Processed total of 16 rows.
$nInserted
[1] 16
$nMatched
[1] 0
$nRemoved
[1] 0
$nUpserted
[1] 0
$writeErrors
list()
airlines$count()
[1] 16
airlines$find(limit = 10)
Found 10 records...
Imported 10 records. Simplifying into dataframe...
carrier name
1 9E Endeavor Air Inc.\r
2 AA American Airlines Inc.\r
3 AS Alaska Airlines Inc.\r
4 B6 JetBlue Airways\r
5 DL Delta Air Lines Inc.\r
6 EV ExpressJet Airlines Inc.\r
7 F9 Frontier Airlines Inc.\r
8 FL AirTran Airways Corporation\r
9 HA Hawaiian Airlines Inc.\r
10 MQ Envoy Air\r
AirLines
SQL and NOSQL databases each have their own strengths and weakness, and should be implemented according to the business scenario. Using Mongobd was interesting and fun endeavor. I encountered problems during the installation and creation phase.The Mongodb package used was mongolite due to the incompatibility of rmongodb with my R version. In the future I wish to study the query syntax deeper and perform left outer joins using $lookup to simulate SQL.
https://www.techwalla.com/articles/advantages-disadvantages-of-microsoft-sql
https://www.mongodb.com/scale/advantages-of-nosql
http://www.thegeekstuff.com/2014/01/sql-vs-nosql-db/?utm_source=tuicool
http://www.hadoop360.com/blog/advantages-and-disadvantages-of-nosql-databases-what-you-should-k
https://cran.r-project.org/web/packages/mongolite/mongolite.pdf
https://cran.r-project.org/web/packages/mongolite/vignettes/intro.html