In this assignment, we will take flights database from MYSQL relational Database and migrate to NoSQL Neo4j database. As a conclusion, we will discuss the advantages and disadvantages of NoSQL Database.
user = 'root'
password = 'GGanuil1'
dbname = 'nycflights13'
host = 'localhost'
myDb <- dbConnect(MySQL(),user = user, password = password, dbname = dbname, host = host)
tables <- dbListTables(myDb)
tables## [1] "planes" "weather"
We have currently planes and weather tables.
planes <- dbGetQuery(myDb, "select * from planes")
weather <- dbGetQuery(myDb, "select * from weather")
head(planes)## tailnum year type manufacturer model engines
## 1 N10156 2004 Fixed wing multi engine EMBRAER EMB-145XR 2
## 2 N102UW 1998 Fixed wing multi engine AIRBUS INDUSTRIE A320-214 2
## 3 N103US 1999 Fixed wing multi engine AIRBUS INDUSTRIE A320-214 2
## 4 N104UW 1999 Fixed wing multi engine AIRBUS INDUSTRIE A320-214 2
## 5 N10575 2002 Fixed wing multi engine EMBRAER EMB-145LR 2
## 6 N105UW 1999 Fixed wing multi engine AIRBUS INDUSTRIE A320-214 2
## seats speed engine
## 1 55 NA Turbo-fan
## 2 182 NA Turbo-fan
## 3 182 NA Turbo-fan
## 4 182 NA Turbo-fan
## 5 55 NA Turbo-fan
## 6 182 NA Turbo-fan
## origin year month day hour temp dewp humid wind_dir wind_speed
## 1 EWR 2013 1 1 1 39.02 26.06 59.37 270 10.35702
## 2 EWR 2013 1 1 2 39.02 26.96 61.63 250 8.05546
## 3 EWR 2013 1 1 3 39.02 28.04 64.43 240 11.50780
## 4 EWR 2013 1 1 4 39.92 28.04 62.21 250 12.65858
## 5 EWR 2013 1 1 5 39.02 28.04 64.43 260 12.65858
## 6 EWR 2013 1 1 6 37.94 28.04 67.21 240 11.50780
## wind_gust precip pressure visib time_hour
## 1 NA 0 1012.0 10 2013-01-01T06:00:00Z
## 2 NA 0 1012.3 10 2013-01-01T07:00:00Z
## 3 NA 0 1012.5 10 2013-01-01T08:00:00Z
## 4 NA 0 1012.2 10 2013-01-01T09:00:00Z
## 5 NA 0 1011.9 10 2013-01-01T10:00:00Z
## 6 NA 0 1012.4 10 2013-01-01T11:00:00Z
write.csv(planes, "planes.csv", row.names = TRUE, na="")
# the exported and imported csv file is in https://raw.githubusercontent.com/anilak1978/nosql-migration/master/planes.csv urlIn order to work with Neo4J Graph , we have to install the desktop version, create a database(graph) under the My Project section.
We can further access to the database using package neo4j R package. We need to make sure to start the instance before we attempt to connect.
con <- neo4j_api$new(url = "http://localhost:7474",
user = "neo4j", password = "GGanuil1")
con$ping() # to check if we can access the server by pinging it## [1] 200
200 is telling us that the connection was successful.
on_load_query <- 'CREATE (n:Price) SET n = row,
n.tailnum = toString(row.tailnum),
n.year = toInteger(row.year),
n.manufacturer = toString(row.manufacturer),
n.model = toString(row.model),
n.engines = toString(row.engines),
n.seats = toString(row.seats),
n.speed = toInteger(row.speed),
n.engine = toString(row.engine);'
# Send the csv
load_csv(url = "https://raw.githubusercontent.com/anilak1978/nosql-migration/master/planes-2.csv",
con = con, header = TRUE, periodic_commit = 50,
as = "row", on_load = on_load_query)## No data returned.
## # A tibble: 12 x 2
## type value
## <chr> <dbl>
## 1 contains_updates 1
## 2 nodes_created 3252
## 3 nodes_deleted 0
## 4 properties_set 55284
## 5 relationships_created 0
## 6 relationship_deleted 0
## 7 labels_added 3252
## 8 labels_removed 0
## 9 indexes_added 0
## 10 indexes_removed 0
## 11 constraints_added 0
## 12 constraints_removed 0
We can see that we were able to load the table and successfully migrated the data into our newly created graph (database). We can also see the data in the neo4j browser.
Based on this assignment, the advantages of neo4j nosql graph or database is that, we can create nodes and relationship across the database and tables. The disadvantage i see is that, there are not many r packages out there that will make the load and queriying easy. It does look agile as , easy to filter as my first impression.