In this assignment, I intend to try out RNeo4j, a R driver package for Neo4j. The CRAN documentation can be found here: https://cran.r-project.org/web/packages/RNeo4j/RNeo4j.pdf
I will use RNeo4j to migrate the small flight and airport data, provided by the class Neo4j Query Lab, to a Neo4j database.
To reproduce the codes and results of this assignment, the reader must first have a Neo4j database set up and running in the back.
Below code will connect to the Neo4j database. Reader should adjust the username/password.
library(RNeo4j)
library(knitr)
graph <- startGraph("http://localhost:7474/db/data/", username = "neo4j", password = "12345")
Below is an optional step to clear any existing nodes and relations in the database.
clear(graph, input = F)
Let’s load the data and examine them.
airports <- read.csv("https://raw.githubusercontent.com/Tyllis/Data607/master/neo4j-airport-csv-raw.csv", stringsAsFactors=F)
flights <- read.csv("https://raw.githubusercontent.com/Tyllis/Data607/master/neo4j-flight-lab-data.csv", stringsAsFactors=F)
kable(airports)
| label | city | state |
|---|---|---|
| DTW | Detroit | Michigan |
| ATL | Atlanta | Georgia |
| PIT | Pittsburgh | Pennsylvania |
| BOS | Boston | Massachussetts |
kable(flights)
| flight | airline | depart | arrive | capacity | takeoff | landing |
|---|---|---|---|---|---|---|
| 23 | Delta | ATL | DTW | 160 | 926 | 1109 |
| 24 | Delta | DTW | ATL | 160 | 914 | 1123 |
| 27 | Delta | DTW | BOS | 160 | 945 | 1210 |
| 28 | Delta | BOS | DTW | 160 | 1009 | 1228 |
| 35 | Delta | DTW | PIT | 128 | 955 | 1106 |
| 36 | Delta | PIT | DTW | 128 | 1117 | 1219 |
| 37 | Delta | ATL | BOS | 160 | 1201 | 1444 |
| 38 | Delta | BOS | ATL | 160 | 816 | 1040 |
| 43 | Delta | ATL | PIT | 160 | 901 | 1001 |
| 44 | Delta | PIT | ATL | 160 | 1100 | 1203 |
| 45 | Delta | BOS | PIT | 160 | 744 | 855 |
| 46 | Delta | PIT | BOS | 160 | 955 | 1103 |
| 101 | Southwest | DTW | BOS | 136 | 1600 | 1810 |
| 102 | Southwest | BOS | DTW | 136 | 1605 | 1815 |
| 103 | Southwest | DTW | PIT | 136 | 1615 | 1730 |
| 104 | Southwest | PIT | DTW | 136 | 1620 | 1735 |
| 1231 | American | DTW | BOS | 160 | 1300 | 1520 |
| 1232 | American | DTW | PIT | 160 | 1305 | 1410 |
| 1257 | American | DTW | ATL | 128 | 1310 | 1455 |
| 1258 | American | BOS | ATL | 160 | 1315 | 1455 |
| 1277 | American | BOS | DTW | 160 | 1310 | 1530 |
| 1278 | American | PIT | DTW | 160 | 1400 | 1505 |
| 1291 | American | ATL | DTW | 128 | 1330 | 1530 |
| 1292 | American | ATL | BOS | 160 | 1340 | 1545 |
First, I will write the query to create airport nodes.
query <- "
MERGE (:Airport {label:{label_name}, city:{city_name}, state:{state_name}})
"
To migrate the airport data, I will use a for-loop to go thru each row, retrieve the airport data, and use the query above to create cypher query. The appendCypher function will collect the cypher query created in each loop into a transaction object.
# Create a transaction object
t <- newTransaction(graph)
# Append the cypher queries to the transaction object
for (i in 1:nrow(airports)){
appendCypher(t,
query,
label_name = airports[i,]$label,
city_name = airports[i,]$city,
state_name = airports[i,]$state)
}
# Execute the transaction
commit(t)
The commit function will execute the migration.
Similarly, I will now add the flights nodes. I will also add the relations between flights and airports.
query <- "
MERGE (f:Flight {flight:{flnm}, airline:{alnm}, capacity:{cpct}})
WITH f
MATCH (a:Airport {label:{depart}}), (c:Airport {label:{arrive}})
CREATE (a) -[:DEPART {takeoff:{tkof}}]-> (f) -[:ARRIVE {landing:{ldng}}]-> (c)
"
t <- newTransaction(graph)
for (i in 1:nrow(flights)){
appendCypher(t,
query,
flnm = flights[i,]$flight,
alnm = flights[i,]$airline,
cpct = flights[i,]$capacity,
tkof = flights[i,]$takeoff,
ldng = flights[i,]$landing,
depart = flights[i,]$depart ,
arrive = flights[i,]$arrive)
}
commit(t)
We can now check the migration result in the browser.
browse(graph, viewer = T)
Setting viewer = F will open the graph in the default browser; otherwise, it will be opened in R’s viewer pane.
Calling MATCH (n) RETURN (n) will return the following visual representation of the graph:
To further verify the migration, we can write queries and use cypher function to retrieve the data from the Neo4j database and turn into a data.frame object in R.
Below query will match and return any nodes with Airport tag.
query <- "
MATCH (a:Airport)
RETURN a.label, a.city, a.state
"
kable(cypher(graph, query))
| a.label | a.city | a.state |
|---|---|---|
| DTW | Detroit | Michigan |
| ATL | Atlanta | Georgia |
| PIT | Pittsburgh | Pennsylvania |
| BOS | Boston | Massachussetts |
Below query will match and return the flight data, by looking at the relations between airports and flights.
query <- "
MATCH (a:Airport) -[b:DEPART]-> (c:Flight) -[d:ARRIVE]-> (e:Airport)
RETURN c.flight, c.airline, a.label AS depart, e.label AS arrive, c.capacity, b.takeoff, d.landing
ORDER BY c.flight
"
kable(cypher(graph, query))
| c.flight | c.airline | depart | arrive | c.capacity | b.takeoff | d.landing |
|---|---|---|---|---|---|---|
| 23 | Delta | ATL | DTW | 160 | 926 | 1109 |
| 24 | Delta | DTW | ATL | 160 | 914 | 1123 |
| 27 | Delta | DTW | BOS | 160 | 945 | 1210 |
| 28 | Delta | BOS | DTW | 160 | 1009 | 1228 |
| 35 | Delta | DTW | PIT | 128 | 955 | 1106 |
| 36 | Delta | PIT | DTW | 128 | 1117 | 1219 |
| 37 | Delta | ATL | BOS | 160 | 1201 | 1444 |
| 38 | Delta | BOS | ATL | 160 | 816 | 1040 |
| 43 | Delta | ATL | PIT | 160 | 901 | 1001 |
| 44 | Delta | PIT | ATL | 160 | 1100 | 1203 |
| 45 | Delta | BOS | PIT | 160 | 744 | 855 |
| 46 | Delta | PIT | BOS | 160 | 955 | 1103 |
| 101 | Southwest | DTW | BOS | 136 | 1600 | 1810 |
| 102 | Southwest | BOS | DTW | 136 | 1605 | 1815 |
| 103 | Southwest | DTW | PIT | 136 | 1615 | 1730 |
| 104 | Southwest | PIT | DTW | 136 | 1620 | 1735 |
| 1231 | American | DTW | BOS | 160 | 1300 | 1520 |
| 1232 | American | DTW | PIT | 160 | 1305 | 1410 |
| 1257 | American | DTW | ATL | 128 | 1310 | 1455 |
| 1258 | American | BOS | ATL | 160 | 1315 | 1455 |
| 1277 | American | BOS | DTW | 160 | 1310 | 1530 |
| 1278 | American | PIT | DTW | 160 | 1400 | 1505 |
| 1291 | American | ATL | DTW | 128 | 1330 | 1530 |
| 1292 | American | ATL | BOS | 160 | 1340 | 1545 |
As you can see, the two tables are identical to the csv files we imported earlier. The migration is successful.
The advantage of storing the data in a graph-based database such as Neo4j is that the search can be done through relationship-based query, as demonstrated above. These searches can be very fast. The disadvantage is that importing/uploading a large volume of data can be slow.