NoSQL database: Neo4j
relational database: flights database
match (n) return (n)
and checking that no nodes are found.
If you need to empty a database that has nodes and relationships, you can use the query:
match (a) optional match (a)-[r]-() delete a,r
load csv with headers from "https://raw.githubusercontent.com/PriyaShaji/Data607/master/Assignment_12/neo4j-airport-csv-raw(1).csv" as airports create (a1:Airport {label: airports.label, city: airports.city, state: airports.state})
load csv with headers from "https://raw.githubusercontent.com/PriyaShaji/Data607/master/Assignment_12/neo4j-flight-lab-data.csv" as flights create (n:Flight {number: flights.flight, airline: flights.airline, capacity: flights.capacity})
load csv with headers from "https://raw.githubusercontent.com/PriyaShaji/Data607/master/Assignment_12/neo4j-flight-lab-data.csv" as flights match (a: Flight {number: flights.flight}),(b:Airport {label: flights.arrive}) create (a) -[r:Arrives {landing: flights.landing}]-> (b)
load csv with headers from "https://raw.githubusercontent.com/PriyaShaji/Data607/master/Assignment_12/neo4j-flight-lab-data.csv" as flights match (a: Flight {number: flights.flight}),(b:Airport {label: flights.depart}) create (a) -[r:Departs {takeoff: flights.takeoff}]-> (b)
match (n) return (n)
to ensure that the data have loaded correctly. You should see four airport nodes, 24 flight nodes, and the various attributes assigned correctly. Organize the graph in a way that makes these features clear.
You should see a graph that looks something like this.
Now let’s write Cypher queries to answer each of the following:
match (n:Airport) return n
match (n:Airport) return count(n)
match (f:Flight {airline: "Delta"}) return f
match (f:Flight)-[d:Departs]-(a:Airport {label: 'BOS'}) return f
match (b:Airport {label: 'ATL'})-[r:Arrives]-(f:Flight)-[d:Departs]-(a:Airport {label: 'DTW'}) return f
match (f:Flight) -[d:Departs]-() where toInt(d.takeoff) < 1100 return f
match (f:Flight) where toInt(f.capacity) > 150 return f
match (f:Flight {airline: 'Delta'}) -[r:Arrives]-(a:Airport {label: 'BOS'}) return f
Let’s make a few simple changes to our data. Perform the following tasks:
match (a:Airport {label: 'PIT'}) set a.label = 'PGH' return a
match (f:Flight {number: '28'}) set f.number = '29' return f
match (b:Airport {label: 'BOS'})-[r:Arrives]-(f:Flight {airline: 'Southwest'})-[d:Departs]-(a:Airport {label: 'DTW'}) delete f,r,d
Connect the neo4j database with R and access the flight and airport database
#devtools::install_git("https://github.com/nicolewhite/RNeo4j")
library(RNeo4j)
graph=startGraph("http://127.0.0.1:7474/db/data/", username = "neo4j", password = "data607")
clear(graph, input = FALSE)
addConstraint(graph, "Airport","label")
addConstraint(graph, "Flight", "flight" )
Flight<-read.csv("https://raw.githubusercontent.com/PriyaShaji/Data607/master/Assignment_12/neo4j-flight-lab-data.csv",header = T)
Airport<-read.csv("https://raw.githubusercontent.com/PriyaShaji/Data607/master/Assignment_12/neo4j-airport-csv-raw(1).csv",header = T)
for ( i in 1: nrow(Airport) ) {
createNode (graph, "Airport",
label = Airport[[i,"label" ]],
city = Airport[[i,"city"]],
state = Airport[[i,"state"]]
)
}
for ( j in 1: nrow(Flight) ) {
createNode (graph, "Flight",
flight = Flight[[j,"flight"]],
airline = Flight[[j,"airline"]] ,
capacity = Flight[[j,"capacity" ]],
arrive = Flight[[j,"arrive" ]]
)
}
query1 = "MATCH (a:Flight), (b:Airport) where a.arrive =b.label RETURN a.flight,a.airline, a.capacity, a.arrive, b.label, b.city, b. state"
cypher(graph,query1)
## a.flight a.airline a.capacity a.arrive b.label b.city
## 1 27 Delta 160 BOS BOS Boston
## 2 37 Delta 160 BOS BOS Boston
## 3 45 Delta 160 PIT PIT Pittsburgh
## 4 103 Southwest 136 PIT PIT Pittsburgh
## 5 1257 American 128 ATL ATL Atlanta
## 6 1291 American 128 DTW DTW Detroit
## 7 24 Delta 160 ATL ATL Atlanta
## 8 36 Delta 128 DTW DTW Detroit
## 9 44 Delta 160 ATL ATL Atlanta
## 10 102 Southwest 136 DTW DTW Detroit
## 11 1232 American 160 PIT PIT Pittsburgh
## 12 1278 American 160 DTW DTW Detroit
## 13 23 Delta 160 DTW DTW Detroit
## 14 35 Delta 128 PIT PIT Pittsburgh
## 15 43 Delta 160 PIT PIT Pittsburgh
## 16 101 Southwest 136 BOS BOS Boston
## 17 1231 American 160 BOS BOS Boston
## 18 1277 American 160 DTW DTW Detroit
## 19 28 Delta 160 DTW DTW Detroit
## 20 38 Delta 160 ATL ATL Atlanta
## 21 46 Delta 160 BOS BOS Boston
## 22 104 Southwest 136 DTW DTW Detroit
## 23 1258 American 160 ATL ATL Atlanta
## 24 1292 American 160 BOS BOS Boston
## b. state
## 1 Massachussetts
## 2 Massachussetts
## 3 Pennsylvania
## 4 Pennsylvania
## 5 Georgia
## 6 Michigan
## 7 Georgia
## 8 Michigan
## 9 Georgia
## 10 Michigan
## 11 Pennsylvania
## 12 Michigan
## 13 Michigan
## 14 Pennsylvania
## 15 Pennsylvania
## 16 Massachussetts
## 17 Massachussetts
## 18 Michigan
## 19 Michigan
## 20 Georgia
## 21 Massachussetts
## 22 Michigan
## 23 Georgia
## 24 Massachussetts
Question:
Briefly describe the advantages and disadvantages of storing the data in a relational database vs. your NoSQL database.
Answer)
Relational database
Relational db nomally require us to infer connection between entities suing speical properties such as foreign keys, or out-of-band processing like map-reduce. Relational db can also be called relational database management systems (RDBMS) or SQL databases. The most popular of these are Microsoft SQL Server, Oracle Database, MySQL, and IBM DB2.
Pros:
Expressivge query language & secondary Indexes: Users should be able to access and manipulate their data in sophisticated ways to support both operational and analytical applications. Indexes play a critical role in providing efficient access to data, supported natively by the database rather than maintained in application code.
Strong consistency: It is much more complex to build applications around an eventually consistent model, imposing significant work on the developer, even for the most sophisticated engineering teams.
Enterprise Management and Integrations: Organizations need a database that can be secured, monitored, automated, and integrated with their existing technology infrastructure, processes, and staff, including operations teams, DBAs, and data analysts.
Cons:
Performance problems associated with re-assembling simple data structures into their more complicated real-world representations.
Lack of support for complex base types, e.g., drawings.
SQL is limited when accessing complex data.
Knowledge of the database structure is required to create ad hoc queries.
Locking mechanisms defined by RDBMSs do not allow design transactions to be supported, e.g., the “check in” and “check out” type of feature that would allow an engineer to modify a drawing over the course of several working days.
NoSql database
Graph db is assembling the simple abstractions of nodes and relationships into connected structures. It enable to bnuild sophisticated models that map closely to the problem domain.Graph db also called NoSQL databases, the most popular being MongoDB, DocumentDB, Cassandra, Coachbase, HBase, Redis, and Neo4j.
Pros:
Flexible data model:Whether document,graph, key-value, or wide-column, all offer a flexible data model making it easy to store and combine data of any structure and allow dynamic modification of the schema without downtime or performance impace.
Scalability and performance:This allows the database to scale out on commodity hardware deployed on-premises or in the cloud, enabling almost unlimited growth with higher throughput and lower latency than relational databases.
Always-on global deployments:They are designed to run across many nodes, including replication to automatically synchronize data across servers, racks, and data centers.
Cons:
It is a growing technology that will be mature in a few years, so, you have to bet for one and pray for it becoming successful.
Most of them do not have a declarative language, and those that have it lack the capability to optimise queries in a proper way.
You have to use an API.
Many lack native implementations for different platforms, except for Sparksee high-performance graph database, which has native implementations for Linux, Windows, MacOS, iOS, Android and BB10.
Reference
“http://www.jamesserra.com/archive/2015/08/relational-databases-vs-non-relational-databases/”