Database

The relational database script was created in MySQL the script can be found at the [Github Repository] (https://github.com/sortega7878/DATA607HW12) you can recreate the same environment.

Second step used the export wizard and upload CSV’s in the same github referenced in the last paragraph.

Starting Up

For this assignment I used a package called RNeo4j that even in early stages is very practical for connection with RNeo4j but still some work to go. I took the source files in the Neo4J extra materials and upload the CSV file into my github.

Requirements for Reproduction:

-Have Neo4j and and empty database -Change where specified the user password for Neo4j

Let’s start connecting to the database in Neo4j

library(RNeo4j)
library(knitr)
## Warning: package 'knitr' was built under R version 3.3.3
#graph <- startGraph("http://localhost:7474/db/data/", username = "neo4j", password = "yourpassword")

Below is an optional step to clear any existing nodes and relations in the database.

clear(graph, input = F)

Load the data for examination

airports <- read.csv("https://raw.githubusercontent.com/sortega7878/DATA607HW12/master/neo4j-airport-csv-raw.csv", stringsAsFactors=F)
flights <- read.csv("https://raw.githubusercontent.com/sortega7878/DATA607HW12/master/neo4j-flight-csv-raw.csv", stringsAsFactors=F)
kable(airports)
label city state
DTW Detroit Michigan
ATL Atlanta Georgia
PIT Pittsburgh Pennsylvania
BOS Boston Massachussetts
kable(flights)
flight airline depart arrive capacity
23 Delta ATL DTW 160
24 Delta DTW ATL 160
27 Delta DTW BOS 160
28 Delta BOS DTW 160
35 Delta DTW PIT 128
36 Delta PIT DTW 128
37 Delta ATL BOS 160
38 Delta BOS ATL 160
43 Delta ATL PIT 160
44 Delta PIT ATL 160
45 Delta BOS PIT 160
46 Delta PIT BOS 160
101 Southwest DTW BOS 136
102 Southwest BOS DTW 136
103 Southwest DTW PIT 136
104 Southwest PIT DTW 136
1231 American DTW BOS 160
1232 American DTW PIT 160
1257 American DTW ATL 128
1258 American BOS ATL 160
1277 American BOS DTW 160
1278 American PIT DTW 160
1291 American ATL DTW 128
1292 American ATL BOS 160

Data migration

Query to create airport nodes

query <- "MERGE (:Airport {label:{label_name}, city:{city_name}, state:{state_name}})"

The migration process uses a loop to construct the query that will create each node founded in the CSV File

# Create a transaction object
t <- newTransaction(graph)

# Append the cypher queries to the transaction object
for (i in 1:nrow(airports)){
  appendCypher(t,
               query,
               label_name = airports[i,]$label,
               city_name = airports[i,]$city,
               state_name = airports[i,]$state)
}

# Execute the transaction
commit(t)

Now repeating the process for flights and adding the query that will create the relationships

query <- "
MERGE (f:Flight {flight:{flnm}, airline:{alnm}, capacity:{cpct}})
WITH f
MATCH (a:Airport {label:{depart}}), (c:Airport {label:{arrive}}) 
CREATE (a) -[:DEPART {takeoff:{tkof}}]-> (f) -[:ARRIVE {landing:{ldng}}]-> (c)
"

t <- newTransaction(graph)

for (i in 1:nrow(flights)){
  appendCypher(t,
               query,
               flnm = flights[i,]$flight,
               alnm = flights[i,]$airline,
               cpct = flights[i,]$capacity,
               tkof = flights[i,]$takeoff,
               ldng = flights[i,]$landing,
               depart = flights[i,]$depart ,
               arrive = flights[i,]$arrive)
}

commit(t)

RESULT

Displaying the Generated graph as a static image the code will open Neo4j browser

knitr::include_graphics("C:\\Users\\sergioor\\Documents\\images\\graph.png")

browse(graph, viewer = TRUE)

Querying the information in the database in Neo4j

query <- "MATCH (a:Airport) RETURN a.label, a.city, a.state"
kable(cypher(graph, query))
a.label a.city a.state
DTW Detroit Michigan
ATL Atlanta Georgia
PIT Pittsburgh Pennsylvania
BOS Boston Massachussetts

Query will return the flight data, by looking the relations between airports and flights

query <- "
MATCH (a:Airport) -[b:DEPART]-> (c:Flight) -[d:ARRIVE]-> (e:Airport)
RETURN c.flight, c.airline, a.label AS depart, e.label AS arrive, c.capacity, b.takeoff, d.landing
ORDER BY c.flight
"
kable(cypher(graph, query))
c.flight c.airline depart arrive c.capacity b.takeoff d.landing
23 Delta ATL DTW 160 NA NA
24 Delta DTW ATL 160 NA NA
27 Delta DTW BOS 160 NA NA
28 Delta BOS DTW 160 NA NA
35 Delta DTW PIT 128 NA NA
36 Delta PIT DTW 128 NA NA
37 Delta ATL BOS 160 NA NA
38 Delta BOS ATL 160 NA NA
43 Delta ATL PIT 160 NA NA
44 Delta PIT ATL 160 NA NA
45 Delta BOS PIT 160 NA NA
46 Delta PIT BOS 160 NA NA
101 Southwest DTW BOS 136 NA NA
102 Southwest BOS DTW 136 NA NA
103 Southwest DTW PIT 136 NA NA
104 Southwest PIT DTW 136 NA NA
1231 American DTW BOS 160 NA NA
1232 American DTW PIT 160 NA NA
1257 American DTW ATL 128 NA NA
1258 American BOS ATL 160 NA NA
1277 American BOS DTW 160 NA NA
1278 American PIT DTW 160 NA NA
1291 American ATL DTW 128 NA NA
1292 American ATL BOS 160 NA NA

As you can see, the two tables are identical to the csv files we imported earlier.With these results we can conclude migration is complete and successful

Conclusion

NoSQL Databases seem to have a faster response time and are prone to structure, store and process unstructured data in a more efficient way aggregating some functionality in a more efficient way for things like workflows, fraud detection, intel analysis etc.

disadvntage is constant elasticity of needs such as Hardware and hence that cloud providers are focusing in this kind of platforma and change in the paradigm of data design so talent might be less or scarce in cxomparison of the other data platforms.