DATA606 Homework Week 12

Database

The relational database script was created in MySQL the script can be found at the [Github Repository] (https://github.com/sortega7878/DATA607HW12) you can recreate the same environment.

Second step used the export wizard and upload CSV’s in the same github referenced in the last paragraph.

Starting Up

For this assignment I used a package called RNeo4j that even in early stages is very practical for connection with RNeo4j but still some work to go. I took the source files in the Neo4J extra materials and upload the CSV file into my github.

Requirements for Reproduction:

-Have Neo4j and and empty database -Change where specified the user password for Neo4j

Let’s start connecting to the database in Neo4j

library(RNeo4j)
library(knitr)

## Warning: package 'knitr' was built under R version 3.3.3

#graph <- startGraph("http://localhost:7474/db/data/", username = "neo4j", password = "yourpassword")

Below is an optional step to clear any existing nodes and relations in the database.

clear(graph, input = F)

Load the data for examination

airports <- read.csv("https://raw.githubusercontent.com/sortega7878/DATA607HW12/master/neo4j-airport-csv-raw.csv", stringsAsFactors=F)
flights <- read.csv("https://raw.githubusercontent.com/sortega7878/DATA607HW12/master/neo4j-flight-csv-raw.csv", stringsAsFactors=F)
kable(airports)

label	city	state
DTW	Detroit	Michigan
ATL	Atlanta	Georgia
PIT	Pittsburgh	Pennsylvania
BOS	Boston	Massachussetts

kable(flights)

flight	airline	depart	arrive	capacity
23	Delta	ATL	DTW	160
24	Delta	DTW	ATL	160
27	Delta	DTW	BOS	160
28	Delta	BOS	DTW	160
35	Delta	DTW	PIT	128
36	Delta	PIT	DTW	128
37	Delta	ATL	BOS	160
38	Delta	BOS	ATL	160
43	Delta	ATL	PIT	160
44	Delta	PIT	ATL	160
45	Delta	BOS	PIT	160
46	Delta	PIT	BOS	160
101	Southwest	DTW	BOS	136
102	Southwest	BOS	DTW	136
103	Southwest	DTW	PIT	136
104	Southwest	PIT	DTW	136
1231	American	DTW	BOS	160
1232	American	DTW	PIT	160
1257	American	DTW	ATL	128
1258	American	BOS	ATL	160
1277	American	BOS	DTW	160
1278	American	PIT	DTW	160
1291	American	ATL	DTW	128
1292	American	ATL	BOS	160

Data migration

Query to create airport nodes

query <- "MERGE (:Airport {label:{label_name}, city:{city_name}, state:{state_name}})"

The migration process uses a loop to construct the query that will create each node founded in the CSV File

# Create a transaction object
t <- newTransaction(graph)

# Append the cypher queries to the transaction object
for (i in 1:nrow(airports)){
  appendCypher(t,
               query,
               label_name = airports[i,]$label,
               city_name = airports[i,]$city,
               state_name = airports[i,]$state)
}

# Execute the transaction
commit(t)

Now repeating the process for flights and adding the query that will create the relationships

query <- "
MERGE (f:Flight {flight:{flnm}, airline:{alnm}, capacity:{cpct}})
WITH f
MATCH (a:Airport {label:{depart}}), (c:Airport {label:{arrive}}) 
CREATE (a) -[:DEPART {takeoff:{tkof}}]-> (f) -[:ARRIVE {landing:{ldng}}]-> (c)
"

t <- newTransaction(graph)

for (i in 1:nrow(flights)){
  appendCypher(t,
               query,
               flnm = flights[i,]$flight,
               alnm = flights[i,]$airline,
               cpct = flights[i,]$capacity,
               tkof = flights[i,]$takeoff,
               ldng = flights[i,]$landing,
               depart = flights[i,]$depart ,
               arrive = flights[i,]$arrive)
}

commit(t)

RESULT

Displaying the Generated graph as a static image the code will open Neo4j browser

knitr::include_graphics("C:\\Users\\sergioor\\Documents\\images\\graph.png")

browse(graph, viewer = TRUE)

Querying the information in the database in Neo4j

query <- "MATCH (a:Airport) RETURN a.label, a.city, a.state"
kable(cypher(graph, query))

a.label	a.city	a.state
DTW	Detroit	Michigan
ATL	Atlanta	Georgia
PIT	Pittsburgh	Pennsylvania
BOS	Boston	Massachussetts

Query will return the flight data, by looking the relations between airports and flights

query <- "
MATCH (a:Airport) -[b:DEPART]-> (c:Flight) -[d:ARRIVE]-> (e:Airport)
RETURN c.flight, c.airline, a.label AS depart, e.label AS arrive, c.capacity, b.takeoff, d.landing
ORDER BY c.flight
"
kable(cypher(graph, query))

c.flight	c.airline	depart	arrive	c.capacity	b.takeoff	d.landing
23	Delta	ATL	DTW	160	NA	NA
24	Delta	DTW	ATL	160	NA	NA
27	Delta	DTW	BOS	160	NA	NA
28	Delta	BOS	DTW	160	NA	NA
35	Delta	DTW	PIT	128	NA	NA
36	Delta	PIT	DTW	128	NA	NA
37	Delta	ATL	BOS	160	NA	NA
38	Delta	BOS	ATL	160	NA	NA
43	Delta	ATL	PIT	160	NA	NA
44	Delta	PIT	ATL	160	NA	NA
45	Delta	BOS	PIT	160	NA	NA
46	Delta	PIT	BOS	160	NA	NA
101	Southwest	DTW	BOS	136	NA	NA
102	Southwest	BOS	DTW	136	NA	NA
103	Southwest	DTW	PIT	136	NA	NA
104	Southwest	PIT	DTW	136	NA	NA
1231	American	DTW	BOS	160	NA	NA
1232	American	DTW	PIT	160	NA	NA
1257	American	DTW	ATL	128	NA	NA
1258	American	BOS	ATL	160	NA	NA
1277	American	BOS	DTW	160	NA	NA
1278	American	PIT	DTW	160	NA	NA
1291	American	ATL	DTW	128	NA	NA
1292	American	ATL	BOS	160	NA	NA

As you can see, the two tables are identical to the csv files we imported earlier.With these results we can conclude migration is complete and successful

Conclusion

NoSQL Databases seem to have a faster response time and are prone to structure, store and process unstructured data in a more efficient way aggregating some functionality in a more efficient way for things like workflows, fraud detection, intel analysis etc.

disadvntage is constant elasticity of needs such as Hardware and hence that cloud providers are focusing in this kind of platforma and change in the paradigm of data design so talent might be less or scarce in cxomparison of the other data platforms.