1 Introduction

In this assignment, we will take flights database from MYSQL relational Database and migrate to NoSQL Neo4j database. As a conclusion, we will discuss the advantages and disadvantages of NoSQL Database.

2 Loading Libraries

library(RMySQL)
## Loading required package: DBI
library(neo4r)

3 Data Collection

3.1 MySQL Relational Database - nycflights13 database

user = 'root'
password = 'GGanuil1'
dbname = 'nycflights13'
host = 'localhost'
myDb <- dbConnect(MySQL(),user = user, password = password,  dbname = dbname, host = host)
tables <- dbListTables(myDb)

tables
## [1] "planes"  "weather"

We have currently planes and weather tables.

planes <- dbGetQuery(myDb, "select * from planes")
weather <- dbGetQuery(myDb, "select * from weather")
head(planes)
##   tailnum year                    type     manufacturer     model engines
## 1  N10156 2004 Fixed wing multi engine          EMBRAER EMB-145XR       2
## 2  N102UW 1998 Fixed wing multi engine AIRBUS INDUSTRIE  A320-214       2
## 3  N103US 1999 Fixed wing multi engine AIRBUS INDUSTRIE  A320-214       2
## 4  N104UW 1999 Fixed wing multi engine AIRBUS INDUSTRIE  A320-214       2
## 5  N10575 2002 Fixed wing multi engine          EMBRAER EMB-145LR       2
## 6  N105UW 1999 Fixed wing multi engine AIRBUS INDUSTRIE  A320-214       2
##   seats speed    engine
## 1    55    NA Turbo-fan
## 2   182    NA Turbo-fan
## 3   182    NA Turbo-fan
## 4   182    NA Turbo-fan
## 5    55    NA Turbo-fan
## 6   182    NA Turbo-fan
head(weather)
##   origin year month day hour  temp  dewp humid wind_dir wind_speed
## 1    EWR 2013     1   1    1 39.02 26.06 59.37      270   10.35702
## 2    EWR 2013     1   1    2 39.02 26.96 61.63      250    8.05546
## 3    EWR 2013     1   1    3 39.02 28.04 64.43      240   11.50780
## 4    EWR 2013     1   1    4 39.92 28.04 62.21      250   12.65858
## 5    EWR 2013     1   1    5 39.02 28.04 64.43      260   12.65858
## 6    EWR 2013     1   1    6 37.94 28.04 67.21      240   11.50780
##   wind_gust precip pressure visib            time_hour
## 1        NA      0   1012.0    10 2013-01-01T06:00:00Z
## 2        NA      0   1012.3    10 2013-01-01T07:00:00Z
## 3        NA      0   1012.5    10 2013-01-01T08:00:00Z
## 4        NA      0   1012.2    10 2013-01-01T09:00:00Z
## 5        NA      0   1011.9    10 2013-01-01T10:00:00Z
## 6        NA      0   1012.4    10 2013-01-01T11:00:00Z

4 Data Export

write.csv(planes, "planes.csv", row.names = TRUE, na="")
# the exported and imported csv file is in https://raw.githubusercontent.com/anilak1978/nosql-migration/master/planes.csv url

4.1 Neo4j Graph - database migration

In order to work with Neo4J Graph , we have to install the desktop version, create a database(graph) under the My Project section.

We can further access to the database using package neo4j R package. We need to make sure to start the instance before we attempt to connect.

con <- neo4j_api$new(url = "http://localhost:7474", 
                     user = "neo4j", password = "GGanuil1")

con$ping() # to check if we can access the server by pinging it
## [1] 200

200 is telling us that the connection was successful.

on_load_query <- 'CREATE (n:Price) SET n = row,
n.tailnum = toString(row.tailnum),
n.year = toInteger(row.year),
n.manufacturer = toString(row.manufacturer),
n.model = toString(row.model),
n.engines = toString(row.engines),
n.seats = toString(row.seats),
n.speed = toInteger(row.speed),
n.engine = toString(row.engine);'
# Send the csv 
load_csv(url = "https://raw.githubusercontent.com/anilak1978/nosql-migration/master/planes-2.csv", 
         con = con, header = TRUE, periodic_commit = 50, 
         as = "row", on_load = on_load_query)
## No data returned.
## # A tibble: 12 x 2
##    type                  value
##    <chr>                 <dbl>
##  1 contains_updates          1
##  2 nodes_created          3252
##  3 nodes_deleted             0
##  4 properties_set        55284
##  5 relationships_created     0
##  6 relationship_deleted      0
##  7 labels_added           3252
##  8 labels_removed            0
##  9 indexes_added             0
## 10 indexes_removed           0
## 11 constraints_added         0
## 12 constraints_removed       0

We can see that we were able to load the table and successfully migrated the data into our newly created graph (database). We can also see the data in the neo4j browser.

5 Conclusion

Based on this assignment, the advantages of neo4j nosql graph or database is that, we can create nodes and relationship across the database and tables. The disadvantage i see is that, there are not many r packages out there that will make the load and queriying easy. It does look agile as , easy to filter as my first impression.

LS0tDQp0aXRsZTogIk5vU1FMIE1pZ3JhdGlvbiINCmF1dGhvcjogQW5pbCBBa3lpbGRpcmltDQpkYXRlOiAiMTAvMTkvMjAxOSINCm91dHB1dDoNCiAgaHRtbF9kb2N1bWVudDoNCiAgICBjb2RlX2Rvd25sb2FkOiB5ZXMNCiAgICBjb2RlX2ZvbGRpbmc6IGhpZGUNCiAgICBoaWdobGlnaHQ6IHB5Z21lbnRzDQogICAgbnVtYmVyX3NlY3Rpb25zOiB5ZXMNCiAgICB0aGVtZTogZmxhdGx5DQogICAgdG9jOiB5ZXMNCiAgICB0b2NfZmxvYXQ6IHllcw0KICBwZGZfZG9jdW1lbnQ6DQogICAgdG9jOiB5ZXMNCi0tLQ0KDQoNCiMgSW50cm9kdWN0aW9uDQoNCkluIHRoaXMgYXNzaWdubWVudCwgd2Ugd2lsbCB0YWtlIGZsaWdodHMgZGF0YWJhc2UgZnJvbSBNWVNRTCByZWxhdGlvbmFsIERhdGFiYXNlIGFuZCBtaWdyYXRlIHRvIE5vU1FMIE5lbzRqIGRhdGFiYXNlLiBBcyBhIGNvbmNsdXNpb24sIHdlIHdpbGwgZGlzY3VzcyB0aGUgYWR2YW50YWdlcyBhbmQgZGlzYWR2YW50YWdlcyBvZiBOb1NRTCBEYXRhYmFzZS4NCg0KDQoNCiMgTG9hZGluZyBMaWJyYXJpZXMNCg0KYGBge3J9DQpsaWJyYXJ5KFJNeVNRTCkNCmxpYnJhcnkobmVvNHIpDQpgYGANCg0KDQoNCiMgRGF0YSBDb2xsZWN0aW9uDQoNCiMjIE15U1FMIFJlbGF0aW9uYWwgRGF0YWJhc2UgLSBueWNmbGlnaHRzMTMgZGF0YWJhc2UgDQoNCmBgYHtyfQ0KDQp1c2VyID0gJ3Jvb3QnDQpwYXNzd29yZCA9ICdHR2FudWlsMScNCmRibmFtZSA9ICdueWNmbGlnaHRzMTMnDQpob3N0ID0gJ2xvY2FsaG9zdCcNCm15RGIgPC0gZGJDb25uZWN0KE15U1FMKCksdXNlciA9IHVzZXIsIHBhc3N3b3JkID0gcGFzc3dvcmQsICBkYm5hbWUgPSBkYm5hbWUsIGhvc3QgPSBob3N0KQ0KdGFibGVzIDwtIGRiTGlzdFRhYmxlcyhteURiKQ0KDQp0YWJsZXMNCg0KYGBgDQoNCldlIGhhdmUgY3VycmVudGx5IHBsYW5lcyBhbmQgd2VhdGhlciB0YWJsZXMuIA0KDQpgYGB7cn0NCnBsYW5lcyA8LSBkYkdldFF1ZXJ5KG15RGIsICJzZWxlY3QgKiBmcm9tIHBsYW5lcyIpDQp3ZWF0aGVyIDwtIGRiR2V0UXVlcnkobXlEYiwgInNlbGVjdCAqIGZyb20gd2VhdGhlciIpDQpoZWFkKHBsYW5lcykNCmhlYWQod2VhdGhlcikNCg0KDQpgYGANCg0KIyBEYXRhIEV4cG9ydA0KDQpgYGB7cn0NCg0Kd3JpdGUuY3N2KHBsYW5lcywgInBsYW5lcy5jc3YiLCByb3cubmFtZXMgPSBUUlVFLCBuYT0iIikNCiMgdGhlIGV4cG9ydGVkIGFuZCBpbXBvcnRlZCBjc3YgZmlsZSBpcyBpbiBodHRwczovL3Jhdy5naXRodWJ1c2VyY29udGVudC5jb20vYW5pbGFrMTk3OC9ub3NxbC1taWdyYXRpb24vbWFzdGVyL3BsYW5lcy5jc3YgdXJsDQoNCmBgYA0KDQoNCg0KIyMgTmVvNGogR3JhcGggLSBkYXRhYmFzZSBtaWdyYXRpb24NCg0KSW4gb3JkZXIgdG8gd29yayB3aXRoIE5lbzRKIEdyYXBoICwgd2UgaGF2ZSB0byBpbnN0YWxsIHRoZSBkZXNrdG9wIHZlcnNpb24sIGNyZWF0ZSBhIGRhdGFiYXNlKGdyYXBoKSB1bmRlciB0aGUgTXkgUHJvamVjdCBzZWN0aW9uLg0KDQohW10oaHR0cDovL2FpcmxvY2tlZDUwLmNvbS93cC1jb250ZW50L3VwbG9hZHMvMjAxOS8xMS9uZW80ai5qcGcpDQoNCldlIGNhbiBmdXJ0aGVyIGFjY2VzcyB0byB0aGUgZGF0YWJhc2UgdXNpbmcgcGFja2FnZSBuZW80aiBSIHBhY2thZ2UuIFdlIG5lZWQgdG8gbWFrZSBzdXJlIHRvIHN0YXJ0IHRoZSBpbnN0YW5jZSBiZWZvcmUgd2UgYXR0ZW1wdCB0byBjb25uZWN0LiANCg0KDQpgYGB7cn0NCg0KY29uIDwtIG5lbzRqX2FwaSRuZXcodXJsID0gImh0dHA6Ly9sb2NhbGhvc3Q6NzQ3NCIsIA0KICAgICAgICAgICAgICAgICAgICAgdXNlciA9ICJuZW80aiIsIHBhc3N3b3JkID0gIkdHYW51aWwxIikNCg0KY29uJHBpbmcoKSAjIHRvIGNoZWNrIGlmIHdlIGNhbiBhY2Nlc3MgdGhlIHNlcnZlciBieSBwaW5naW5nIGl0DQpgYGANCg0KMjAwIGlzIHRlbGxpbmcgdXMgdGhhdCB0aGUgY29ubmVjdGlvbiB3YXMgc3VjY2Vzc2Z1bC4gDQoNCiFbXShodHRwOi8vYWlybG9ja2VkNTAuY29tL3dwLWNvbnRlbnQvdXBsb2Fkcy8yMDE5LzExL25lbzRqLTIuanBnKQ0KDQoNCmBgYHtyfQ0KDQpvbl9sb2FkX3F1ZXJ5IDwtICdDUkVBVEUgKG46UHJpY2UpIFNFVCBuID0gcm93LA0Kbi50YWlsbnVtID0gdG9TdHJpbmcocm93LnRhaWxudW0pLA0Kbi55ZWFyID0gdG9JbnRlZ2VyKHJvdy55ZWFyKSwNCm4ubWFudWZhY3R1cmVyID0gdG9TdHJpbmcocm93Lm1hbnVmYWN0dXJlciksDQpuLm1vZGVsID0gdG9TdHJpbmcocm93Lm1vZGVsKSwNCm4uZW5naW5lcyA9IHRvU3RyaW5nKHJvdy5lbmdpbmVzKSwNCm4uc2VhdHMgPSB0b1N0cmluZyhyb3cuc2VhdHMpLA0Kbi5zcGVlZCA9IHRvSW50ZWdlcihyb3cuc3BlZWQpLA0Kbi5lbmdpbmUgPSB0b1N0cmluZyhyb3cuZW5naW5lKTsnDQojIFNlbmQgdGhlIGNzdiANCmxvYWRfY3N2KHVybCA9ICJodHRwczovL3Jhdy5naXRodWJ1c2VyY29udGVudC5jb20vYW5pbGFrMTk3OC9ub3NxbC1taWdyYXRpb24vbWFzdGVyL3BsYW5lcy0yLmNzdiIsIA0KICAgICAgICAgY29uID0gY29uLCBoZWFkZXIgPSBUUlVFLCBwZXJpb2RpY19jb21taXQgPSA1MCwgDQogICAgICAgICBhcyA9ICJyb3ciLCBvbl9sb2FkID0gb25fbG9hZF9xdWVyeSkNCg0KDQpgYGANCg0KV2UgY2FuIHNlZSB0aGF0IHdlIHdlcmUgYWJsZSB0byBsb2FkIHRoZSB0YWJsZSBhbmQgc3VjY2Vzc2Z1bGx5IG1pZ3JhdGVkIHRoZSBkYXRhIGludG8gb3VyIG5ld2x5IGNyZWF0ZWQgZ3JhcGggKGRhdGFiYXNlKS4gV2UgY2FuIGFsc28gc2VlIHRoZSBkYXRhIGluIHRoZSBuZW80aiBicm93c2VyLg0KDQohW10oaHR0cDovL2FpcmxvY2tlZDUwLmNvbS93cC1jb250ZW50L3VwbG9hZHMvMjAxOS8xMS9uZW80ai00LmpwZykNCg0KDQoNCiMgQ29uY2x1c2lvbg0KDQpCYXNlZCBvbiB0aGlzIGFzc2lnbm1lbnQsIHRoZSBhZHZhbnRhZ2VzIG9mIG5lbzRqIG5vc3FsIGdyYXBoIG9yIGRhdGFiYXNlIGlzIHRoYXQsIHdlIGNhbiBjcmVhdGUgbm9kZXMgYW5kIHJlbGF0aW9uc2hpcCBhY3Jvc3MgdGhlIGRhdGFiYXNlIGFuZCB0YWJsZXMuIFRoZSBkaXNhZHZhbnRhZ2UgaSBzZWUgaXMgdGhhdCwgdGhlcmUgYXJlIG5vdCBtYW55IHIgcGFja2FnZXMgb3V0IHRoZXJlIHRoYXQgd2lsbCBtYWtlIHRoZSBsb2FkIGFuZCBxdWVyaXlpbmcgZWFzeS4gSXQgZG9lcyBsb29rIGFnaWxlIGFzICwgZWFzeSB0byBmaWx0ZXIgYXMgbXkgZmlyc3QgaW1wcmVzc2lvbi4gDQoNCg0KDQoNCg0KDQo=