The relational database script was created in MySQL the script can be found at the [Github Repository] (https://github.com/sortega7878/DATA607HW12) you can recreate the same environment.
Second step used the export wizard and upload CSV’s in the same github referenced in the last paragraph.
For this assignment I used a package called RNeo4j that even in early stages is very practical for connection with RNeo4j but still some work to go. I took the source files in the Neo4J extra materials and upload the CSV file into my github.
Requirements for Reproduction:
-Have Neo4j and and empty database -Change where specified the user password for Neo4j
Let’s start connecting to the database in Neo4j
library(RNeo4j)
library(knitr)
## Warning: package 'knitr' was built under R version 3.3.3
#graph <- startGraph("http://localhost:7474/db/data/", username = "neo4j", password = "yourpassword")
Below is an optional step to clear any existing nodes and relations in the database.
clear(graph, input = F)
Load the data for examination
airports <- read.csv("https://raw.githubusercontent.com/sortega7878/DATA607HW12/master/neo4j-airport-csv-raw.csv", stringsAsFactors=F)
flights <- read.csv("https://raw.githubusercontent.com/sortega7878/DATA607HW12/master/neo4j-flight-csv-raw.csv", stringsAsFactors=F)
kable(airports)
| label | city | state |
|---|---|---|
| DTW | Detroit | Michigan |
| ATL | Atlanta | Georgia |
| PIT | Pittsburgh | Pennsylvania |
| BOS | Boston | Massachussetts |
kable(flights)
| flight | airline | depart | arrive | capacity |
|---|---|---|---|---|
| 23 | Delta | ATL | DTW | 160 |
| 24 | Delta | DTW | ATL | 160 |
| 27 | Delta | DTW | BOS | 160 |
| 28 | Delta | BOS | DTW | 160 |
| 35 | Delta | DTW | PIT | 128 |
| 36 | Delta | PIT | DTW | 128 |
| 37 | Delta | ATL | BOS | 160 |
| 38 | Delta | BOS | ATL | 160 |
| 43 | Delta | ATL | PIT | 160 |
| 44 | Delta | PIT | ATL | 160 |
| 45 | Delta | BOS | PIT | 160 |
| 46 | Delta | PIT | BOS | 160 |
| 101 | Southwest | DTW | BOS | 136 |
| 102 | Southwest | BOS | DTW | 136 |
| 103 | Southwest | DTW | PIT | 136 |
| 104 | Southwest | PIT | DTW | 136 |
| 1231 | American | DTW | BOS | 160 |
| 1232 | American | DTW | PIT | 160 |
| 1257 | American | DTW | ATL | 128 |
| 1258 | American | BOS | ATL | 160 |
| 1277 | American | BOS | DTW | 160 |
| 1278 | American | PIT | DTW | 160 |
| 1291 | American | ATL | DTW | 128 |
| 1292 | American | ATL | BOS | 160 |
Query to create airport nodes
query <- "MERGE (:Airport {label:{label_name}, city:{city_name}, state:{state_name}})"
The migration process uses a loop to construct the query that will create each node founded in the CSV File
# Create a transaction object
t <- newTransaction(graph)
# Append the cypher queries to the transaction object
for (i in 1:nrow(airports)){
appendCypher(t,
query,
label_name = airports[i,]$label,
city_name = airports[i,]$city,
state_name = airports[i,]$state)
}
# Execute the transaction
commit(t)
Now repeating the process for flights and adding the query that will create the relationships
query <- "
MERGE (f:Flight {flight:{flnm}, airline:{alnm}, capacity:{cpct}})
WITH f
MATCH (a:Airport {label:{depart}}), (c:Airport {label:{arrive}})
CREATE (a) -[:DEPART {takeoff:{tkof}}]-> (f) -[:ARRIVE {landing:{ldng}}]-> (c)
"
t <- newTransaction(graph)
for (i in 1:nrow(flights)){
appendCypher(t,
query,
flnm = flights[i,]$flight,
alnm = flights[i,]$airline,
cpct = flights[i,]$capacity,
tkof = flights[i,]$takeoff,
ldng = flights[i,]$landing,
depart = flights[i,]$depart ,
arrive = flights[i,]$arrive)
}
commit(t)
Displaying the Generated graph as a static image the code will open Neo4j browser
knitr::include_graphics("C:\\Users\\sergioor\\Documents\\images\\graph.png")
browse(graph, viewer = TRUE)
Querying the information in the database in Neo4j
query <- "MATCH (a:Airport) RETURN a.label, a.city, a.state"
kable(cypher(graph, query))
| a.label | a.city | a.state |
|---|---|---|
| DTW | Detroit | Michigan |
| ATL | Atlanta | Georgia |
| PIT | Pittsburgh | Pennsylvania |
| BOS | Boston | Massachussetts |
Query will return the flight data, by looking the relations between airports and flights
query <- "
MATCH (a:Airport) -[b:DEPART]-> (c:Flight) -[d:ARRIVE]-> (e:Airport)
RETURN c.flight, c.airline, a.label AS depart, e.label AS arrive, c.capacity, b.takeoff, d.landing
ORDER BY c.flight
"
kable(cypher(graph, query))
| c.flight | c.airline | depart | arrive | c.capacity | b.takeoff | d.landing |
|---|---|---|---|---|---|---|
| 23 | Delta | ATL | DTW | 160 | NA | NA |
| 24 | Delta | DTW | ATL | 160 | NA | NA |
| 27 | Delta | DTW | BOS | 160 | NA | NA |
| 28 | Delta | BOS | DTW | 160 | NA | NA |
| 35 | Delta | DTW | PIT | 128 | NA | NA |
| 36 | Delta | PIT | DTW | 128 | NA | NA |
| 37 | Delta | ATL | BOS | 160 | NA | NA |
| 38 | Delta | BOS | ATL | 160 | NA | NA |
| 43 | Delta | ATL | PIT | 160 | NA | NA |
| 44 | Delta | PIT | ATL | 160 | NA | NA |
| 45 | Delta | BOS | PIT | 160 | NA | NA |
| 46 | Delta | PIT | BOS | 160 | NA | NA |
| 101 | Southwest | DTW | BOS | 136 | NA | NA |
| 102 | Southwest | BOS | DTW | 136 | NA | NA |
| 103 | Southwest | DTW | PIT | 136 | NA | NA |
| 104 | Southwest | PIT | DTW | 136 | NA | NA |
| 1231 | American | DTW | BOS | 160 | NA | NA |
| 1232 | American | DTW | PIT | 160 | NA | NA |
| 1257 | American | DTW | ATL | 128 | NA | NA |
| 1258 | American | BOS | ATL | 160 | NA | NA |
| 1277 | American | BOS | DTW | 160 | NA | NA |
| 1278 | American | PIT | DTW | 160 | NA | NA |
| 1291 | American | ATL | DTW | 128 | NA | NA |
| 1292 | American | ATL | BOS | 160 | NA | NA |
As you can see, the two tables are identical to the csv files we imported earlier.With these results we can conclude migration is complete and successful
NoSQL Databases seem to have a faster response time and are prone to structure, store and process unstructured data in a more efficient way aggregating some functionality in a more efficient way for things like workflows, fraud detection, intel analysis etc.
disadvntage is constant elasticity of needs such as Hardware and hence that cloud providers are focusing in this kind of platforma and change in the paradigm of data design so talent might be less or scarce in cxomparison of the other data platforms.