Starting Up

In this assignment, I intend to try out RNeo4j, a R driver package for Neo4j. The CRAN documentation can be found here: https://cran.r-project.org/web/packages/RNeo4j/RNeo4j.pdf

I will use RNeo4j to migrate the small flight and airport data, provided by the class Neo4j Query Lab, to a Neo4j database.

To reproduce the codes and results of this assignment, the reader must first have a Neo4j database set up and running in the back.

Below code will connect to the Neo4j database. Reader should adjust the username/password.

library(RNeo4j)
library(knitr)
graph <- startGraph("http://localhost:7474/db/data/", username = "neo4j", password = "12345")

Below is an optional step to clear any existing nodes and relations in the database.

clear(graph, input = F)

Let’s load the data and examine them.

airports <- read.csv("https://raw.githubusercontent.com/Tyllis/Data607/master/neo4j-airport-csv-raw.csv", stringsAsFactors=F)
flights <- read.csv("https://raw.githubusercontent.com/Tyllis/Data607/master/neo4j-flight-lab-data.csv", stringsAsFactors=F)
kable(airports)
label city state
DTW Detroit Michigan
ATL Atlanta Georgia
PIT Pittsburgh Pennsylvania
BOS Boston Massachussetts
kable(flights)
flight airline depart arrive capacity takeoff landing
23 Delta ATL DTW 160 926 1109
24 Delta DTW ATL 160 914 1123
27 Delta DTW BOS 160 945 1210
28 Delta BOS DTW 160 1009 1228
35 Delta DTW PIT 128 955 1106
36 Delta PIT DTW 128 1117 1219
37 Delta ATL BOS 160 1201 1444
38 Delta BOS ATL 160 816 1040
43 Delta ATL PIT 160 901 1001
44 Delta PIT ATL 160 1100 1203
45 Delta BOS PIT 160 744 855
46 Delta PIT BOS 160 955 1103
101 Southwest DTW BOS 136 1600 1810
102 Southwest BOS DTW 136 1605 1815
103 Southwest DTW PIT 136 1615 1730
104 Southwest PIT DTW 136 1620 1735
1231 American DTW BOS 160 1300 1520
1232 American DTW PIT 160 1305 1410
1257 American DTW ATL 128 1310 1455
1258 American BOS ATL 160 1315 1455
1277 American BOS DTW 160 1310 1530
1278 American PIT DTW 160 1400 1505
1291 American ATL DTW 128 1330 1530
1292 American ATL BOS 160 1340 1545

Data Migration

First, I will write the query to create airport nodes.

query <- "
MERGE (:Airport {label:{label_name}, city:{city_name}, state:{state_name}})
"

To migrate the airport data, I will use a for-loop to go thru each row, retrieve the airport data, and use the query above to create cypher query. The appendCypher function will collect the cypher query created in each loop into a transaction object.

# Create a transaction object
t <- newTransaction(graph)

# Append the cypher queries to the transaction object
for (i in 1:nrow(airports)){
  appendCypher(t,
               query,
               label_name = airports[i,]$label,
               city_name = airports[i,]$city,
               state_name = airports[i,]$state)
}

# Execute the transaction
commit(t)

The commit function will execute the migration.

Similarly, I will now add the flights nodes. I will also add the relations between flights and airports.

query <- "
MERGE (f:Flight {flight:{flnm}, airline:{alnm}, capacity:{cpct}})
WITH f
MATCH (a:Airport {label:{depart}}), (c:Airport {label:{arrive}}) 
CREATE (a) -[:DEPART {takeoff:{tkof}}]-> (f) -[:ARRIVE {landing:{ldng}}]-> (c)
"

t <- newTransaction(graph)

for (i in 1:nrow(flights)){
  appendCypher(t,
               query,
               flnm = flights[i,]$flight,
               alnm = flights[i,]$airline,
               cpct = flights[i,]$capacity,
               tkof = flights[i,]$takeoff,
               ldng = flights[i,]$landing,
               depart = flights[i,]$depart ,
               arrive = flights[i,]$arrive)
}

commit(t)

Checking Results

We can now check the migration result in the browser.

browse(graph, viewer = T)

Setting viewer = F will open the graph in the default browser; otherwise, it will be opened in R’s viewer pane.

Calling MATCH (n) RETURN (n) will return the following visual representation of the graph:

To further verify the migration, we can write queries and use cypher function to retrieve the data from the Neo4j database and turn into a data.frame object in R.

Below query will match and return any nodes with Airport tag.

query <- "
MATCH (a:Airport)
RETURN a.label, a.city, a.state
"
kable(cypher(graph, query))
a.label a.city a.state
DTW Detroit Michigan
ATL Atlanta Georgia
PIT Pittsburgh Pennsylvania
BOS Boston Massachussetts

Below query will match and return the flight data, by looking at the relations between airports and flights.

query <- "
MATCH (a:Airport) -[b:DEPART]-> (c:Flight) -[d:ARRIVE]-> (e:Airport)
RETURN c.flight, c.airline, a.label AS depart, e.label AS arrive, c.capacity, b.takeoff, d.landing
ORDER BY c.flight
"
kable(cypher(graph, query))
c.flight c.airline depart arrive c.capacity b.takeoff d.landing
23 Delta ATL DTW 160 926 1109
24 Delta DTW ATL 160 914 1123
27 Delta DTW BOS 160 945 1210
28 Delta BOS DTW 160 1009 1228
35 Delta DTW PIT 128 955 1106
36 Delta PIT DTW 128 1117 1219
37 Delta ATL BOS 160 1201 1444
38 Delta BOS ATL 160 816 1040
43 Delta ATL PIT 160 901 1001
44 Delta PIT ATL 160 1100 1203
45 Delta BOS PIT 160 744 855
46 Delta PIT BOS 160 955 1103
101 Southwest DTW BOS 136 1600 1810
102 Southwest BOS DTW 136 1605 1815
103 Southwest DTW PIT 136 1615 1730
104 Southwest PIT DTW 136 1620 1735
1231 American DTW BOS 160 1300 1520
1232 American DTW PIT 160 1305 1410
1257 American DTW ATL 128 1310 1455
1258 American BOS ATL 160 1315 1455
1277 American BOS DTW 160 1310 1530
1278 American PIT DTW 160 1400 1505
1291 American ATL DTW 128 1330 1530
1292 American ATL BOS 160 1340 1545

As you can see, the two tables are identical to the csv files we imported earlier. The migration is successful.

The advantage of storing the data in a graph-based database such as Neo4j is that the search can be done through relationship-based query, as demonstrated above. These searches can be very fast. The disadvantage is that importing/uploading a large volume of data can be slow.