It is a graph databases which can find relationship between data and extract their true value. Its service implemented in Java, which developed by the company Neo Technology. It has been widely used in fraud detection, real-time recommendations, social networks, data center management system.
A graph in Neo4j is defined by consists of vertices and edges. There are several types of the graphs:undirected graphs, directed graphs, graphs with weight, graphs with labels, property graphs.
Comparing MySQL I used in previous homework, Neo4j has difference ways to relate data.
1.Difference to relational database which use primary key attributes via foreign-key columns, Neo4j enable to build sophisticatedmodels by assembling the simple abstractions of nodes and relationship into connected structures.
More details about the difference of graph db and relational db: https://neo4j.com/developer/graph-db-vs-rdbms/
Load Libraries
if (!require(RCurl)) install.packages('RCurl')
## Loading required package: RCurl
## Loading required package: bitops
if (!require(getPass)) install.packages('getPass')
## Loading required package: getPass
if (!require(RNeo4j)) install.packages('RNeo4j')
## Loading required package: RNeo4j
if (!require(knitr)) install.packages('knitr')
## Loading required package: knitr
if (!require(png)) install.packages('jpeg')
## Loading required package: png
PartI:Data Preparation
R is an analysis tool to importe data to Neo4j database to build the relationship and manipulate the data we need throught Neo4j. In this assignment, we will use two data sets, one for airport and another for flight.
Load the raw data into R from my github which were download from cuny data files.
airport_raw <- getURL("https://raw.githubusercontent.com/ada2802/607-W12-Graph-DB/master/neo4j-airport-csv-raw.csv")
flight_raw <- getURL("https://raw.githubusercontent.com/ada2802/607-W12-Graph-DB/master/neo4j-flight-lab-data.csv")
airport <- read.csv(textConnection(airport_raw), header=T)
flight <- read.csv(textConnection(flight_raw), header=T)
Get connection to Neo4j
library(RNeo4j)
library(getPass)
neo4j <- getPass::getPass("Enter Neo4J username")
## Please enter password in TK window (Alt+Tab)
Password <- getPass::getPass("Enter Neo4J password")
## Please enter password in TK window (Alt+Tab)
graph = startGraph("http://127.0.0.1:7474/db/data/",
username = neo4j,
password = Password)
clear(graph, input = FALSE)
Creat an object airport and store each row data of the airport table to a node in Neo4j. There are three attributions for each node - code of the airport, city and state.
addConstraint(graph, "airport","label")
nrow <- length(airport$label)
for(i in 1:nrow)
{
createNode(graph, "airport",
label = airport[i,]$label,
code=airport[i,]$label,
city=airport[i,]$city,
state = airport[i,]$state
)
}
Creat an object flight and store each row data of the flight table to a node in Neo4j. There are two attributions for each node - airline and capacity.
nflight <- length(flight$flight)
for(i in 1:nflight)
{
createNode(graph, "flights",
flight = flight[i,]$flight,
airline = flight[i,]$airline,
capacity = flight[i,]$capacity
)
}
Build the relationship between airport and flight and query landing time by match the conection in Neo4j.
query = "match (a:airport {code:{airport_code}}), (b:flights {flight:{flight_num}}) create (b) -[r:Arrive {landing:{landing_time}}]->(a)"
t = newTransaction(graph)
for(i in 1:nflight)
{
airport_code = flight[i,]$arrive
flight_num = flight[i,]$flight
landing_time = flight[i,]$landing
appendCypher(t, query, airport_code = airport_code, flight_num = flight_num,landing_time = landing_time)
}
commit(t)
Build the relationship between airport and flight and query takeoff time by match the conection in Neo4j.
query = "match (a:airport {code:{airport_code}}), (b:flights {flight:{flight_num}}) create (a) -[r:Depart {takeoff:{takeoff_time}}]->(b)"
t = newTransaction(graph)
for(i in 1:nflight)
{
airport_code = flight[i,]$depart
flight_num = flight[i,]$flight
takeoff_time = flight[i,]$takeoff
appendCypher(t, query, airport_code = airport_code,flight_num = flight_num, takeoff_time = takeoff_time )
}
commit(t)
Connect depart airport, flights and arrive airport, which connect all nodes - flights and airport in Neo4j.
query = "MATCH (d:airport)-[t:Depart]->(f:flights)-[l:Arrive]->(a:airport)
RETURN d.label as FromAirport,t.takeoff as TakeOff, f.airline as AirLine, f.flight as FlightNum,l.landing as Landing, a.label as ToAirport"
match_table = cypher(graph, query)
kable(match_table)
| FromAirport | TakeOff | AirLine | FlightNum | Landing | ToAirport |
|---|---|---|---|---|---|
| DTW | 914 | Delta | 24 | 1123 | ATL |
| PIT | 1100 | Delta | 44 | 1203 | ATL |
| BOS | 816 | Delta | 38 | 1040 | ATL |
| DTW | 1310 | American | 1257 | 1455 | ATL |
| BOS | 1315 | American | 1258 | 1455 | ATL |
| DTW | 945 | Delta | 27 | 1210 | BOS |
| PIT | 955 | Delta | 46 | 1103 | BOS |
| ATL | 1201 | Delta | 37 | 1444 | BOS |
| DTW | 1600 | Southwest | 101 | 1810 | BOS |
| DTW | 1300 | American | 1231 | 1520 | BOS |
| ATL | 1340 | American | 1292 | 1545 | BOS |
| BOS | 1009 | Delta | 28 | 1228 | DTW |
| ATL | 926 | Delta | 23 | 1109 | DTW |
| PIT | 1620 | Southwest | 104 | 1735 | DTW |
| BOS | 1605 | Southwest | 102 | 1815 | DTW |
| PIT | 1117 | Delta | 36 | 1219 | DTW |
| BOS | 1310 | American | 1277 | 1530 | DTW |
| PIT | 1400 | American | 1278 | 1505 | DTW |
| ATL | 1330 | American | 1291 | 1530 | DTW |
| BOS | 744 | Delta | 45 | 855 | PIT |
| ATL | 901 | Delta | 43 | 1001 | PIT |
| DTW | 1615 | Southwest | 103 | 1730 | PIT |
| DTW | 955 | Delta | 35 | 1106 | PIT |
| DTW | 1305 | American | 1232 | 1410 | PIT |
Part II: Data Analysis
Now that we have some data loaded, let’s answer some basic questions. Write Cypher queries to answer each of the following:
query = "MATCH (d:airport)-->(f:flights)-->(a:airport) where d.code = 'DTW' and a.code = 'ATL'
RETURN d.label as FromAirport, f.airline as AirLine, f.flight as FlightNum, a.label as ToAirport"
DTW_To_ATL = cypher(graph, query)
kable(DTW_To_ATL)
| FromAirport | AirLine | FlightNum | ToAirport |
|---|---|---|---|
| DTW | Delta | 24 | ATL |
| DTW | American | 1257 | ATL |
Part III: Data Modification
Let’s make a few simple changes to our data. Perform the following tasks: 15. Suppose the Pittsburgh airport designation code has been changed from PIT to PGH. Write a query to update the airport’s code. match (a:airport {label: ‘PIT’}) set a.label = ‘PGH’ return a
#get value "PGH" of from the airport talbe which match "PIT"
Pittsburgh_node = getNodes(graph, "Match (a:airport) where a.code = 'PIT' return a ")
#replace the value from code list
Pittsburgh_node = Pittsburgh_node[[1]]
match (b:Airport {label: ‘BOS’})-[r:Arrives]-(f:Flight {airline: ‘Southwest’})-[d:Departs]-(a:Airport {label: ‘DTW’}) delete f,r,d match (b:Airport {label: ‘DTW’})-[r:Arrives]-(f:Flight {airline: ‘Southwest’})-[d:Departs]-(a:Airport {label: ‘BOS’}) remove f,r,d
Alternatively, you can remove all of them at once with: match (b:Airport {label: ‘BOS’})-[r]-(f:Flight {airline: ‘Southwest’})-[d]-(a:Airport {label: ‘DTW’}) delete f,r,d
Notice that this single query sets up the basic pattern but does not specify which relationship is Arrives and which is Departs.
#get the flight from both directions between DTW and BOS
query = "match (a:airport)-[d:Depart]->(f:flights)-[l:Arrive]->(b:airport) where (a.code = 'DTW' or a.code = 'BOS') and (b.code = 'BOS' or b.code = 'DTW') and f.airline = 'Southwest' delete d, f, l"
cypher(graph, query)
Part IV: Clean Up
Once all of the lab has been completed, you may wish to remove the data from the database. What query removes all nodes and relationships from the database?
query = "match (a) optional match (a)-[r]-() delete a,r"
cypher(graph, query)