Neo4j Graphical Database

For this assignment, you should take information from a relational database and migrate it to a NoSQL database of your own choosing.

For the relational database, you might use the flights database, the tb database, the “data skills” database your team created for Project 3, or another database of your own choosing or creation. For the NoSQL database, you may use MongoDB (which we introduced in week 7), Neo4j, or another NoSQL database of your choosing. Your migration process needs to be reproducible. R code is encouraged, but not required. You should also briefly describe the advantages and disadvantages of storing the data in a relational database vs. your NoSQL database.

Introduction

This rmd,sql code and all related input data can be found on my github.

Run these two applications first in order to install ‘RNeo4j’ package in R

#library(devtools)
#install_git("https://github.com/nicolewhite/RNeo4j")

Data

library(RNeo4j)
library(knitr)

airports <- read.csv("https://raw.githubusercontent.com/omerozeren/DATA607/master/HMW_12/neo4j-airport-csv-raw.csv", stringsAsFactors=F)
flights <- read.csv("https://raw.githubusercontent.com/omerozeren/DATA607/master/HMW_12/neo4j-flight-csv-raw.csv", stringsAsFactors=F)
kable(head(flights,10))

flight	airline	depart	arrive	capacity
23	Delta	ATL	DTW	160
24	Delta	DTW	ATL	160
27	Delta	DTW	BOS	160
28	Delta	BOS	DTW	160
35	Delta	DTW	PIT	128
36	Delta	PIT	DTW	128
37	Delta	ATL	BOS	160
38	Delta	BOS	ATL	160
43	Delta	ATL	PIT	160
44	Delta	PIT	ATL	160

Query

flight_nodes <- startGraph("http://localhost:7474/db/data/", username = "neo4j", password = "omerozeren")
query <- "MERGE (:Airport {label:{label_name}, city:{city_name}, state:{state_name}})"

# Create a transaction object
t <- newTransaction(flight_nodes)

# Append the cypher queries to the transaction object
for (i in 1:nrow(airports)){
  appendCypher(t,
               query,
               label_name = airports[i,]$label,
               city_name = airports[i,]$city,
               state_name = airports[i,]$state)
}

# Execute the transaction
commit(t)

RNeo4j

query <- "
MERGE (f:Flight {flight:{flnm}, airline:{alnm}, capacity:{cpct}})
WITH f
MATCH (a:Airport {label:{depart}}), (c:Airport {label:{arrive}}) 
CREATE (a) -[:DEPART {takeoff:{tkof}}]-> (f) -[:ARRIVE {landing:{ldng}}]-> (c)
"

t <- newTransaction(flight_nodes)

for (i in 1:nrow(flights)){
  appendCypher(t,
               query,
               flnm = flights[i,]$flight,
               alnm = flights[i,]$airline,
               cpct = flights[i,]$capacity,
               tkof = flights[i,]$takeoff,
               ldng = flights[i,]$landing,
               depart = flights[i,]$depart ,
               arrive = flights[i,]$arrive)
}

commit(t)

knitr::include_graphics("https://raw.githubusercontent.com/omerozeren/DATA607/master/HMW_12/flight_nodes.png")

Pros vs. Cons of Graphical Databases

Pros

So what are the pros and cons of having graphical databases. A source that I had referred to for this is listed below:

References : https://scraperwiki.com/2015/01/book-review-graph-databases-by-ian-robinson-jim-webber-and-emil-eifrem/

Three reasons why they(authors) believe graphical databases are better.

Performance: “In contrast to relational databases, where join-intensive query performance deteriorates as the dataset gets bigger, with a graph database performance tends to remain relatively constant, even as the dataset grows. This is because queries are localized to a portion of the graph. As a result, the execution time for each query is proportional only to the size of the part of the graph traversed to satisfy that query, rather than the size of the overall graph.” In other words, they had noted improved speed and efficiency (despite the contrary of what I had encountered.)
Flexibility: “Graphs are naturally additive, meaning we can add new kinds of relationships, new nodes, new labels, and new subgraphs to an existing structure without disturbing existing queries and application functionality. These things have generally positive implications for developer productivity and project risk. Because of the graph model’s flexibility, we don’t have to model our domain in exhaustive detail ahead of time-a practice that is all but foolhardy in the face of changing business requirements.” They noted that neo4j is a much more flexible database system.
Agility: “We want to be able to evolve our data model in step with the rest of our application, using a technology aligned with today’s incremental and iterative software delivery practices. Modern graph databases equip us to perform frictionless development and graceful systems maintenance.” They also state that graphical databases are easier to maintain.

Cons

This is newer technology, so there is less time for the technology to mature.
Given that this is new, a new language i.e. Cypher needs to be learned and requires to rewire your brain.
If the databases do not rely on relationships, then it may not be necessary to use a graphical database such as neo4j.

Omer Ozeren DATA 607- Homework 12

OMER OZEREN

April 21, 2019