SQL vs NoSQL

One of the most widely used database types is relational databases. Relational databases, like SQL, ORACLE, ACCESS etc., use relational algebra techniques (Unions, Intersects etc.) to combine and relate databases together. SQL is the langauge used to retreive information and join tables together. Other databases, like NEO4J are not relational, they are graphical databases. They use a venn diagram type database to link nodes together.Each node can be connected to ā€œnā€ other nodes and that linking is how relationships are established.

This project will take a table from a SQL/Relational Database and move it to a NoSQL/Graphical database.

The following packages are required for this assignement:

SQL flights table

There is a need to grab the information from the SQL database. This query will connect to the flights database in SQL. It will then got out an retreive the first 500 entries from the data base. It will then create a csv file on the users current working directory. This file will be used to import into the NoSQL database later.

select <- dbSendQuery(con, "select * from flights limit 500")
data <- fetch(select, n=-1)
wd <- getwd()
csvdir <- paste("file:///",wd, "/", "test.csv", sep = "")
write.csv(data, csvdir)

NEO4J flights table

We take the csv that was created from the flights SQL database and convert it into a NoSQL graphical data base. The query this database, we use cypher. Cypher is the way that you can loase information and query the data base in R.

dlgMessage(message = "Please open and connnect to NEO4J database before continuing", type = "ok")

graphname <- dlgInput(message = "What is your NEO4J URL (Must have /db/data ending)?", gui = .GUI)$res
username <- dlgInput(message = "What is your NEO4J username?", gui = .GUI)$res
password <- dlgInput(message = "What is your NEO4J password?", gui = .GUI)$res

graph <- startGraph("http://localhost:7474/db/data", user = "neo4j", password = "MHSdefLAX39")
query <- "LOAD CSV WITH HEADERS FROM {csv} AS csvLine CREATE (p:flights {year: csvLine.year, month: csvLine.mo, dep_time: csvLine.dep_time, arr_time: csvLine.arr_time, carrier: csvLine.carrier, tailnum: csvLine.tailnum, flight: csvLine.flight, origin: csvLine.origin, dest: csvLine.dest, air_time: csvLine.air_time, distance: csvLine.distance, hour: csvLine.hour, minute: csvLine.minute })"

cypher(graph, query, csv = "file:///C:/Users/brook/Documents/test.csv")

Query Comparisons

This is a simple comparison between the two databases. Both sets of the querries are doing the exact same thing. They are both going out and asking the databases for the same information, they just ask the database in different ways.

#SQL
SQLmin <- dbSendQuery(con, "select minute from flights where minute > 54")
data <- fetch(SQLmin, n = -1)
head(data)
##   minute
## 1     55
## 2     57
## 3     57
## 4     58
## 5     58
## 6     58
#Neo4j
query <- "match (n) where n.minute > '54' return n.minute"
neomin <- cypher(graph, query)
head(neomin)
##   n.minute
## 1       55
## 2       57
## 3       57
## 4       58
## 5       58
## 6       58
#SQL
SQLtail <- dbSendQuery(con, "select tailnum from flights where minute between 50 and 60 ")
data <- fetch(SQLtail, n = -1)
head(data)
##   tailnum
## 1  N668DN
## 2  N39463
## 3  N516JB
## 4  N829AS
## 5  N593JB
## 6  N3ALAA
#Neo4j
query <- "match (n) where n.minute >= '50' and n.minute <= '60' return n.tailnum"
neotail <- cypher(graph, query)
head(neotail)
##   n.tailnum
## 1    N668DN
## 2    N39463
## 3    N516JB
## 4    N829AS
## 5    N593JB
## 6    N3ALAA

NEO4J and MySQL differences

There are a lot of differences between the two databases.

  1. The way that we connect in R to the databases.
  1. The way you select the data
  1. Readbility of Language